
Pergamon

Transpn Res.-B, Vol. 32, No. 1, pp. 61-75, 1998 © 1998 Elsevier Science Ltd. All rights reserved. Printed in Great Britain 0191-2615/98 $19.00+0.00

PII: S0191-2615(97)00014-3

A COMPARISON OF TWO ALTERNATIVE BEHAVIORAL CHOICE MECHANISMS FOR HOUSEHOLD AUTO OWNERSHIP DECISIONS

CHANDRA R. BHAT*

Department of Civil Engineering, University of Texas at Austin, ECJ Hall, Austin, TX 78712, U.S.A. and

VAMSI PULUGURTA

Department of Civil Engineering, University of Massachusetts at Amherst, Marston Hall, Amherst, MA 01003, U.S.A.

(Received 10 April 1996; in revised form 18 March 1997)

Abstract: Auto ownership modeling plays an important role in travel demand analysis because it is a key determinant of the travel behavior of individuals and households. Discrete-choice auto ownership models use either an ordered-response choice mechanism or an unordered-response choice mechanism. The ordered-response mechanism is based on the hypothesis that a uni-dimensional continuous latent auto ownership propensity index determines the level of car ownership. The unordered-response mechanism is based on the Random Utility Maximization principle. This paper presents the underlying theoretical structures, and identifies the advantages and disadvantages, of the two alternative response mechanisms. The paper also compares the ordered-response mechanism (represented by the ordered-response logit model) and the unordered-response mechanism (represented by the multinomial logit model) empirically using several data sets. This comparative analysis offers strong evidence that the appropriate choice mechanism is the unordered-response structure. As a general guideline, auto ownership modeling must be pursued using the unordered-response class of models (such as the multinomial logit or probit model) and not using the ordered-response class of models (such as the ordered-response logit or probit). © 1998 Elsevier Science Ltd

Keywords: ordered-response mechanism, unordered-response mechanism, car ownership modeling, discrete choice modeling, ordered-response logit, multinomial logit

1. INTRODUCTION

Auto ownership modeling has received considerable attention in the travel demand analysis literature because of the important role it plays in the overall transportation and land use planning process. It is now well recognized that auto ownership is one of the key determinants of the activity-travel behavior of individuals and households. Among other things, it affects trip frequency choice (Meurs, 1990), destination choice for non-work activity participation (Wrigley, 1990), mode choice to work and to non-work activity destinations (Uncles, 1987; Bhat, 1996), and the propensity to chain activities in a tour (Hamed and Mannering, 1993).

Auto ownership forecasts can be obtained using aggregate extrapolation models, which model auto ownership directly at the aggregate level (such as a zonal, regional or national level), or using disaggregate auto ownership models, which use the household as the decision making unit and obtain zonal, regional, or national level forecasts by aggregating over households. The disaggregate models are structurally more behavioral than aggregate models and are better able to capture the causal relationship between auto ownership determinants and auto ownership levels (see Oi and Shuldiner, 1963 and Schor, 1989). Consequently, disaggregate methods have become the preferred approach to model auto ownership choice. Since auto ownership is a categorical variable, disaggregate auto ownership models usually take the form of discrete choice models. Within the class of disaggregate discrete choice models, two general decision mechanisms have been used for auto ownership modeling: the ordered-response mechanism and the unordered-response mechanism.

*Author for correspondence.


C. R. Bhat and V. Pulugurta

The ordered-response mechanism is based on the hypothesis that a single continuous variable $C_i^*$ represents the latent auto ownership propensity of household $i$. The observed auto ownership level $C_i$ for household $i$ is assumed to be related to the latent auto ownership propensity as follows:

$$C_i = k \ \text{if and only if} \ \psi_{k-1} < C_i^* \le \psi_k, \quad k = 0, 1, \ldots, K, \quad \psi_{-1} = -\infty, \ \psi_K = +\infty, \tag{1}$$

where the $\psi_k$ terms represent the threshold values of the latent propensity demarcating the discrete outcomes (Amemiya, 1985, p. 292). Thus, in the ordered-response structure, the level of auto ownership corresponds to an ordered partition of the real line. The latent auto ownership propensity is specified to be the sum of a deterministic component (which is a function of household attributes) and a random component (that represents the effects of unobserved attributes of the household). The econometric specification of the ordered-response mechanism is completed by assuming a particular continuous probability density function for the random component.

The unordered-response mechanism does not consider the levels of car ownership to correspond to an ordered-response; that is, the auto ownership levels are not assumed to correspond to the successive partition of a uni-dimensional latent variable. Rather, the auto levels are assumed to correspond to the 'partition of a higher dimensional Euclidean space' (Amemiya, 1985, p. 292). The unordered-response mechanism is based on the Random Utility Maximization (RUM) principle, which assumes that households associate a utility value with each auto ownership level and select the auto ownership level that provides the highest utility. The utility that a household associates with each auto ownership level is specified to be the sum of a deterministic component and a random component. Thus, if $U_{ik}$ is the utility that household $i$ associates with auto ownership level $k$ ($k = 0, 1, 2, \ldots, K$), then household $i$ will choose auto ownership level $j$ only if $U_{ij} > U_{ik}$ for all $k$ not equal to $j$. In this structure, $K$ continuous variables ($U_{i1} - U_{i0}, U_{i2} - U_{i0}, \ldots, U_{iK} - U_{i0}$) determine the choice of car ownership level; that is, the level of car ownership corresponds to the partition of a $K$-dimensional Euclidean space. The econometric specification of the unordered-response mechanism is completed by assuming a continuous multivariate probability density function for the random components of utilities across auto ownership levels. Different assumptions of the multivariate density function give rise to different models such as the multinomial logit, the nested logit, and the multinomial probit.
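The RUM choice rule can be illustrated with a short simulation. The following is a minimal sketch, not part of the original analysis: the deterministic utilities in `v` are hypothetical values, and the random components are assumed to be IID standard Gumbel, which is the assumption that yields the multinomial logit. The function names are illustrative only.

```python
import math
import random

def gumbel_draw(rng):
    """Draw one standard Gumbel variate by inverting the CDF exp(-exp(-z))."""
    u = rng.random()
    while u == 0.0:  # log(0) is undefined, so redraw in that (rare) case
        u = rng.random()
    return -math.log(-math.log(u))

def choose_level(deterministic_utils, rng):
    """Return the ownership level whose total (deterministic + random) utility is highest."""
    totals = [v_k + gumbel_draw(rng) for v_k in deterministic_utils]
    return max(range(len(totals)), key=totals.__getitem__)

rng = random.Random(0)
v = [0.0, 1.2, 1.5, 0.3]  # hypothetical deterministic utilities for 0, 1, 2, 3 autos
draws = [choose_level(v, rng) for _ in range(20000)]
shares = [draws.count(k) / len(draws) for k in range(len(v))]

# Closed-form logit probabilities implied by IID Gumbel errors, for comparison
denom = sum(math.exp(v_k) for v_k in v)
logit_probs = [math.exp(v_k) / denom for v_k in v]
```

With a large number of simulated households, the simulated `shares` should lie close to `logit_probs`, which is the sense in which the Gumbel assumption on the random components gives rise to the multinomial logit form.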

Between the ordered- and unordered-response mechanisms, the ordered-response mechanism has the advantage of being parsimonious in structure. However, it may be an over-simplification to assume that auto ownership decisions are determined by a single continuous propensity measure. If households make a choice of car ownership based on a utility-maximization principle (i.e. if the true choice process is unordered), then using an ordered-response model can lead to serious biases in the estimation of the probabilities (see Amemiya, 1985, p. 293). The unordered-response mechanism, on the other hand, is appealing because it is based on a utility maximization hypothesis. In many discrete choice situations, it may be a reasonable representation of the choice process. However, if households make a choice of car ownership based on a single latent propensity measure (i.e. if the true choice process is ordered), then using the unordered-response mechanism leads to a loss in efficiency (since the unordered decision mechanism disregards the strictly one-dimensional nature of the propensity determining the discrete outcomes).

In the context of auto ownership decisions, both the ordered- and the unordered-response behavioral choice mechanisms are plausible and researchers have used both decision structures. Examples of the use of the ordered-response mechanism include Kitamura (1987, 1988), Golob and Van Wissen (1988), Kitamura and Bunch (1989), Golob (1990), and Bhat and Koppelman (1993). Examples of the use of the unordered-response mechanism include Mannering and Winston (1985), Train (1986), Bunch and Kitamura (1990), Hensher (1992), Purvis (1994) and Agostino et al. (1996).*

An issue of interest is: Which one of the two behavioral mechanisms, the ordered-response or the unordered-response, is a more appropriate representation of the process underlying auto ownership choices? In this paper, we examine the underlying theoretical structure of the two

*Many of the studies listed here model auto ownership jointly with other travel-related decisions such as vehicle use, vehicle type choice, trip generation, and work travel mode choice. Some of these studies also model auto ownership using longitudinal data to accommodate inter-temporal influences on auto ownership levels. Here, we only identify the behavioral choice structure adopted for car ownership modeling, the subject of central focus of the current paper.

Household auto ownership decisions


behavioral mechanisms and evaluate the empirical performance of the two structures in the context of auto ownership modeling. Several data sets are used in the empirical analysis, the rationale being that if we obtain consistent results regarding the preferred model structure, this may shed light on the behavioral mechanism driving auto ownership decisions and suggest the structure to use for auto ownership modeling. We use the multinomial logit (MNL) model to represent the unordered-response decision mechanism. The MNL model is obtained by assuming independent and identically distributed (IID) Gumbel random components of utilities across auto ownership levels and households (we use the MNL model because it is the most frequently used unordered-response model for auto ownership analysis; see Schor, 1989; and also because the MNL formulation is reasonably robust to violations of the IID assumption; see Horowitz, 1980). Corresponding to the Gumbel distribution for the latent utilities in the unordered-response mechanism, we use an IID (across households) Gumbel distribution for the latent auto ownership propensity index in the ordered-response decision mechanism. This results in the ordered-response logit (ORL) model (the ORL model and the more commonly used ordered-response probit model provide similar results; see Han and Hausman, 1990).

The next section of this paper examines the theoretical structure of the ordered and unordered behavioral mechanisms. Section 3 presents the elasticity expressions for the ORL model and the MNL model, and discusses the measures used in evaluating the empirical performance of the two models. Section 4 identifies the data sources and samples used in the empirical analysis. Section 5 focuses on empirical results. The final section summarizes the important findings from the research.

2. THEORETICAL STRUCTURE

2.1. Ordered-response mechanism

The ordered-response decision structure is not consistent with global utility-maximization. As indicated earlier, it is based on the hypothesis that a single continuous variable represents the latent auto ownership propensity of household $i$, and that the observed car ownership level is a reflection of this underlying latent propensity. A more useful approach, however, to understand the ordered-response mechanism (vis-a-vis the unordered mechanism) is to characterize it as a decision process associated with a series of binary choice decisions. Specifically, consider that a household assigns utility values for the range of alternatives on either side of each possible car ownership level. Thus, if there are four possible car ownership levels (0, 1, 2, and 3), the household assigns utility values to

(a) zero car ownership [$U_0$] and more than zero car ownership [$U_{>0}$],

(b) less than or equal to one car ownership [$U_{\le 1}$] and more than one car ownership [$U_{>1}$], and

(c) less than or equal to two car ownership [$U_{\le 2}$] and more than two car ownership [$U_{>2}$] (note that $U_{>2} = U_3$, where $U_3$ is the utility of owning three cars, since three cars is the maximum possible car ownership level in this example).

The household then makes an independent utility-maximizing decision for each range. Based on the decision outcome for each range, the actual choice is implicitly determined. For example, if

$$U_{>0} > U_0 \ \text{and} \ U_{\le 1} > U_{>1},$$

then a car ownership level of one is chosen. For uniqueness of choice, the condition $U_{\le 2} > U_{>2}$ must hold in the example above. More generally, if $U_{\le k} > U_{>k}$, then it must be true that

$$U_{\le j} > U_{>j} \ \text{for all} \ j > k.$$

Similarly, if $U_{>k} > U_{\le k}$, then it must be true that

$$U_{>j} > U_{\le j} \ \text{for all} \ j < k.$$

These are essentially the ordering conditions in the ordered-response mechanism. The structure of the ordered-response mechanism is completed by specifying the 'range-based' utilities as the sum of a deterministic component (which is a function of household characteristics) and a random component. To satisfy the ordering conditions, the following specification is adopted (the index for households is suppressed):

$$
\begin{aligned}
U_0 &= \beta_0' x + \varepsilon_0, & U_{>0} &= (\beta_0 + \beta)' x + (\varepsilon_0 + \varepsilon), \\
U_{\le 1} &= \beta_{\le 1}' x + \varepsilon_{\le 1}, & U_{>1} &= -\psi_1 + (\beta_{\le 1} + \beta)' x + (\varepsilon_{\le 1} + \varepsilon), \\
U_{\le 2} &= \beta_{\le 2}' x + \varepsilon_{\le 2}, & U_{>2} &= -\psi_2 + (\beta_{\le 2} + \beta)' x + (\varepsilon_{\le 2} + \varepsilon).
\end{aligned} \tag{2}
$$

In the above equation, $x$ is a row vector of household characteristics (whose first element is a constant of one), the $\psi$ and $\beta$ terms are parameters, and the $\varepsilon$ terms correspond to random error terms. It is easy to verify that the ordering conditions are always satisfied in the above specification as long as $\psi_{K-1} > \psi_{K-2} > \ldots > \psi_k > \ldots > \psi_2 > \psi_1 > 0$.

Equation (2) represents the underlying structure of the ordered-response mechanism. To see this, note that a household chooses car ownership level $k$ (i.e. $C = k$) if

$$U_{>k-1} > U_{\le k-1} \ \text{and} \ U_{\le k} > U_{>k},$$

that is,

$$C = k \ \text{if and only if} \ \psi_{k-1} < \beta' x + \varepsilon < \psi_k, \quad \psi_{-1} = -\infty, \ \psi_0 = 0, \ \psi_K = +\infty. \tag{3}$$

Equation (3) is the same as the latent index formulation of the ordered-response mechanism (eqn (1)) with $C^* = \beta' x + \varepsilon$ (the index $i$ for households is suppressed). Thus, the ordered-response mechanism with $K+1$ categories ($k = 0, 1, \ldots, K$) is equivalent to a choice process imputed from $K$ binary choice models in which the parameters (other than the constant) are constrained to be equal and the correlations among the error terms of the $K$ equations are all equal to one (note that each binary choice model has an equation of the form $U_{>k} - U_{\le k} = -\psi_k + \beta' x + \varepsilon$).
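The equivalence between the threshold partition and the set of binary choices can be checked numerically. The sketch below is illustrative only: the threshold values are hypothetical, and `propensity` stands for the single quantity (deterministic index plus error term) that is shared by all of the binary equations because their error terms are perfectly correlated.

```python
import bisect

def level_from_partition(propensity, thresholds):
    """Latent index form: count the finite thresholds lying strictly below the propensity."""
    return bisect.bisect_left(thresholds, propensity)

def level_from_binary_choices(propensity, thresholds):
    """Binary-choice form: count the decisions won by the 'more than k autos' range,
    i.e. those with U_{>k} - U_{<=k} = -psi_k + propensity > 0."""
    return sum(1 for psi in thresholds if -psi + propensity > 0)

thresholds = [0.0, 1.0, 2.5]  # hypothetical psi_0, psi_1, psi_2, so K = 3
propensities = [-1.0, 0.0, 0.5, 1.0, 2.0, 2.5, 3.0]

partition_levels = [level_from_partition(p, thresholds) for p in propensities]
binary_levels = [level_from_binary_choices(p, thresholds) for p in propensities]
```

Because the thresholds are increasing, the binary decisions can never contradict one another, and both functions return the same ownership level for every propensity value.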

The equality constraints on the $\beta$ parameters across the $K$ binary equations in the ordered-response decision structure can be restrictive in capturing the effect of household characteristics on car ownership level. Consider the effect of household income. It is at least conceivable that income has a strong positive effect on owning more than zero cars relative to zero cars, a strong but less significant effect on owning > 1 car relative to ≤ 1 car, but little effect on owning more than two cars relative to ≤ 2 cars. However, since the ordered-response structure constrains the income parameter to be the same across the binary choices, the result would be an under-estimation of the negative effect of income on zero car ownership and an over-estimation of the positive effect of income on > 2 car ownership. Further, the ordered structure will not allow abrupt trend changes in the effect of variables on successive car ownership levels.

The ORL model that we use to represent the ordered-response mechanism in the empirical analysis is obtained by assuming a Gumbel distribution for the error term $\varepsilon$ in eqn (3). Re-introducing the index $i$ for the household, the probability that household $i$ chooses car ownership level $k$ can be written as (see McKelvey and Zavoina, 1975):

$$\text{Prob}[C_i = k] = P_{ik} = \Lambda(\psi_k - \beta' x_i) - \Lambda(\psi_{k-1} - \beta' x_i), \quad \text{where} \ \Lambda(z) = \exp[-\exp(-z)]. \tag{4}$$

The parameters to be estimated in the ORL model are the threshold parameters (the $\psi$ terms) and the vector $\beta$. $\Lambda(\cdot)$ represents the standard cumulative Gumbel distribution.
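As an illustration of eqn (4), the following minimal sketch computes the ORL probabilities for one household. The parameter values, the household vector, and the function names are hypothetical and are not estimates from this paper; the list of finite thresholds is assumed to start with the normalized threshold of zero.

```python
import math

def gumbel_cdf(z):
    """Standard cumulative Gumbel distribution exp(-exp(-z)), with guards for the tails."""
    if z == math.inf:
        return 1.0
    if z < -700.0:  # exp(-z) would overflow; the CDF is numerically zero here (covers -inf)
        return 0.0
    return math.exp(-math.exp(-z))

def orl_probabilities(x, beta, finite_thresholds):
    """Eqn (4): probability of each ownership level for one household."""
    index = sum(b * x_m for b, x_m in zip(beta, x))
    # Pad with psi_{-1} = -inf and psi_K = +inf so that psi[k + 1] holds psi_k
    psi = [-math.inf] + list(finite_thresholds) + [math.inf]
    return [gumbel_cdf(psi[k + 1] - index) - gumbel_cdf(psi[k] - index)
            for k in range(len(psi) - 1)]

# Hypothetical household: constant, two working adults, income of 45 (thousands of dollars)
x = [1.0, 2.0, 45.0]
beta = [-1.0, 0.6, 0.0125]             # hypothetical parameter vector
thresholds = [0.0, 1.5, 3.0, 4.5]      # hypothetical psi_0..psi_3, giving five levels
probs = orl_probabilities(x, beta, thresholds)
```

The probabilities telescope, so they sum to one for any increasing set of thresholds.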

2.2. Unordered-response mechanism

The unordered-response mechanism is consistent with global utility-maximization. It is based on the principle that households associate a utility value with each car ownership level and choose the one with maximum utility. The structure for the unordered mechanism is much more straightforward and explicit than the ordered mechanism. Let the utility of car ownership level $k$ ($U_k$) be written as (we suppress the index $i$ for households):

$$U_k = \beta_k' x + \varepsilon_k, \tag{5}$$

where $x$ is a vector of exogenous variables associated with the household (as earlier) and $\beta_k$ is a corresponding vector of parameters to be estimated for each auto ownership level $k$ [$k = 1, 2, \ldots, K$; as usual, because of identification considerations, the parameter vector corresponding to one of the ($K+1$) car ownership levels needs to be normalized to zero]. The notation in eqn (5) assumes that all exogenous variables correspond to characteristics of the household. We adopt this notation because the exogenous variables in auto ownership models are generally associated with household characteristics and not auto level characteristics. Further, for the purpose of comparing the ordered- and unordered-response mechanisms, we can use only characteristics specific to the household because characteristics associated with auto ownership levels cannot be introduced in the ordered-response mechanism.

An important feature of the unordered mechanism of eqn (5) is that it does not place any restrictions on the effect of household characteristics across car ownership levels. Thus, for example, it can allow the effect of income to be highly negative for the utility of zero car ownership, positive for 1 and 2 car ownership and zero for ownership of three or more cars. The flexibility in parameter effect, however, comes at the price of substantially more parameters; there are $(K-1) \times M$ additional parameters to be estimated in the unordered structure compared to the ordered structure, where $M$ is the total number of household variables.

The primary issue in comparing the ordered- and unordered-response mechanisms is whether the restrictive nature of the ordered mechanism is a reasonable representation of car ownership choice-making behavior or whether the flexibility offered by the unordered-response mechanism is required to analyze car ownership decisions. If the former is true, then the ordered-response mechanism should provide statistically superior results compared to the unordered mechanism in empirical performance. If the latter is true, then the unordered mechanism should provide statistically better results. It is important to emphasize that the ordered and unordered mechanisms have different structures and different numbers of parameters. The statistical tests that we will use to assess empirical performance will be non-nested to recognize the different structures and will penalize the unordered structure for having more parameters than the ordered structure (see Section 3.2). The tests, therefore, will indicate that the unordered mechanism is better if and only if the penalty of having more parameters is outweighed by the statistical benefit (in terms of data fit) of including the additional parameters.

The MNL model that is used to represent the unordered-response mechanism in the empirical analysis is obtained by assuming that the error term in eqn (5) is independently and identically distributed (across alternatives) with a Gumbel distribution. With this assumption (and introducing the index $i$ for the household), the utility maximizing framework provides the following form for the probability of household $i$ choosing car ownership level $k$:

$$P_{ik} = \frac{\exp(\gamma_k' x_i)}{\sum_{k'=0}^{K} \exp(\gamma_{k'}' x_i)}, \tag{6}$$

where $\gamma_k$ is a vector of parameters to be estimated for each car ownership level (except a base car ownership level).
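A minimal sketch of eqn (6) follows. The parameter vectors are hypothetical, the function name is illustrative, and the base level is represented by a vector of zeros, which is one common way to impose the normalization.

```python
import math

def mnl_probabilities(x, gammas):
    """Eqn (6): MNL probabilities for one household.

    gammas[k] is the parameter vector of ownership level k; the base level
    is represented here by a vector of zeros.
    """
    v = [sum(g * x_m for g, x_m in zip(gamma_k, x)) for gamma_k in gammas]
    v_max = max(v)  # subtracting the maximum avoids overflow without changing the result
    exp_v = [math.exp(v_k - v_max) for v_k in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

# Hypothetical two-level example: x = [constant, number of working adults]
p = mnl_probabilities([1.0, 2.0], [[0.0, 0.0], [0.5, 0.25]])
# With all parameters equal to zero, every level is equally likely
p_equal = mnl_probabilities([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
```

In the first example the deterministic utility of the non-base level is 0.5 + 0.25 × 2 = 1, so the odds of that level relative to the base level are exp(1).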

3. ELASTICITY EFFECTS AND MEASURES OF DATA FIT

3.1. Elasticity effects

The parameters of the ORL and MNL models are not directly comparable because of their different structures. However, the aggregate-level elasticity effects can be used to compare the exogenous variable effects in the two models.


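The aggregate-level elasticity effect of a continuous exogenous variable $x_c$ (such as income) on the expected share of choice of auto ownership level $k$ ($P_k$) can be written in the ORL model as:

$$\eta^{P_k}_{x_c} = \frac{\beta_c}{\sum_i P_{ik}} \times \sum_{i=1}^{N} \left[ \lambda(\psi_{k-1} - \beta' x_i) - \lambda(\psi_k - \beta' x_i) \right] x_{ic}, \quad \text{where} \ \lambda(z) = e^{-z} e^{-e^{-z}} \tag{7}$$

and $\beta_c$ is the element of $\beta$ corresponding to $x_c$.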

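The corresponding expression for the aggregate-level elasticity of a continuous variable in the MNL model is:

$$\eta^{P_k}_{x_c} = \frac{\sum_{i=1}^{N} P_{ik} \left( \gamma_{kc} - \bar{\gamma}_{ic} \right) x_{ic}}{\sum_i P_{ik}}, \quad \text{where} \ \bar{\gamma}_{ic} = \sum_{k'=0}^{K} P_{ik'} \gamma_{k'c}. \tag{8}$$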
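The MNL elasticity expression can be sketched in code and checked against a numerical derivative. Everything below is an illustrative assumption: the households, the parameter vectors, and the function names are hypothetical, and the numerical check scales the variable by a small proportion for every household, which is one way to read the aggregate elasticity.

```python
import math

def mnl_probabilities(x, gammas):
    """MNL probabilities for one household (base level has a zero parameter vector)."""
    v = [sum(g * x_m for g, x_m in zip(gamma_k, x)) for gamma_k in gammas]
    v_max = max(v)
    exp_v = [math.exp(v_k - v_max) for v_k in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def mnl_aggregate_elasticity(households, gammas, k, c):
    """Analytic aggregate elasticity of the expected share of level k w.r.t. variable c."""
    numerator = 0.0
    denominator = 0.0
    for x in households:
        p = mnl_probabilities(x, gammas)
        gamma_bar = sum(p_j * gamma_j[c] for p_j, gamma_j in zip(p, gammas))
        numerator += p[k] * (gammas[k][c] - gamma_bar) * x[c]
        denominator += p[k]
    return numerator / denominator

def numerical_elasticity(households, gammas, k, c, h=1e-6):
    """Central-difference check: scale variable c by (1 +/- h) for every household."""
    def share(scale):
        total = 0.0
        for x in households:
            x_scaled = list(x)
            x_scaled[c] *= scale
            total += mnl_probabilities(x_scaled, gammas)[k]
        return total
    return (share(1.0 + h) - share(1.0 - h)) / (2.0 * h * share(1.0))

# Hypothetical data: x = [constant, income in thousands of dollars]
households = [[1.0, 20.0], [1.0, 45.0], [1.0, 80.0]]
gammas = [[0.0, 0.0], [0.5, 0.04], [-0.5, 0.06]]  # level 0 is the base

analytic = [mnl_aggregate_elasticity(households, gammas, k, 1) for k in range(3)]
numeric = [numerical_elasticity(households, gammas, k, 1) for k in range(3)]
```

In this hypothetical example income raises the utility of owning one or two autos relative to none, so the elasticity of the zero-auto share is negative.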

To compute an aggregate-level 'elasticity' of an ordinal exogenous variable (such as the number of working adults in the household), we increase the value of the ordinal variable by one unit for each household and obtain the relative change in expected aggregate shares. Thus, the 'elasticities' for the ordinal exogenous variables can be viewed as the relative change in expected aggregate shares due to an increase of one unit in the ordinal variable across all households.

Finally, to compute an aggregate-level 'elasticity' of a dummy exogenous variable (such as urban residential location of a household), we change the value of the variable to one for the subsample of observations for which the variable takes a value of zero and to zero for the subsample of observations for which the variable takes a value of one. We then sum the shifts in expected aggregate shares in the two subsamples after reversing the sign of the shifts in the second subsample and compute an effective proportional change in expected aggregate shares in the entire sample due to a change in the dummy variable from 0 to 1.
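The unit-increase calculation described above for ordinal variables can be sketched as follows. This is a minimal illustration under stated assumptions: it uses MNL probabilities, although the same steps apply to any model that returns choice probabilities, and the households, parameter values, and function names are hypothetical.

```python
import math

def mnl_probabilities(x, gammas):
    """MNL probabilities for one household (base level has a zero parameter vector)."""
    v = [sum(g * x_m for g, x_m in zip(gamma_k, x)) for gamma_k in gammas]
    v_max = max(v)
    exp_v = [math.exp(v_k - v_max) for v_k in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def expected_shares(households, gammas):
    """Expected aggregate share of each ownership level (average of the probabilities)."""
    totals = [0.0] * len(gammas)
    for x in households:
        for k, p_k in enumerate(mnl_probabilities(x, gammas)):
            totals[k] += p_k
    return [t / len(households) for t in totals]

def ordinal_pseudo_elasticity(households, gammas, c):
    """Relative change in expected shares when variable c rises by one unit for every household."""
    base = expected_shares(households, gammas)
    shifted_households = [x[:c] + [x[c] + 1] + x[c + 1:] for x in households]
    shifted = expected_shares(shifted_households, gammas)
    return [(s - b) / b for s, b in zip(shifted, base)]

# Hypothetical data: x = [constant, number of working adults]
households = [[1.0, 0], [1.0, 1], [1.0, 2], [1.0, 1]]
gammas = [[0.0, 0.0], [1.0, 0.4], [0.2, 1.7]]  # level 0 is the base
base_shares = expected_shares(households, gammas)
effects = ordinal_pseudo_elasticity(households, gammas, 1)
```

Since the shares sum to one before and after the change, the share-weighted sum of the relative changes is zero.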

3.2. Evaluation criteria for data fit

The elasticity effects can be used to compare the effects of exogenous variables in the ORL and MNL models. But they do not provide an evaluation of the performance of the two models. In this section, we discuss the evaluation criteria used to assess the data fit offered by the ORL and MNL models. We examine the fit of the models on both an estimation sample (used in estimation) and a holdout sample (that is not used in estimation).

A standard measure of fit in the estimation sample is the $\bar{\rho}^2$ value (referred to as the adjusted likelihood ratio index or McFadden's adjusted $R^2$; see Windmeijer, 1995) defined as follows:

$$\bar{\rho}^2 = 1 - \frac{\mathcal{L}(\hat{\beta}) - M}{\mathcal{L}(C)}, \tag{9}$$

where $\mathcal{L}(\hat{\beta})$ and $\mathcal{L}(C)$ are the log-likelihood function values at convergence and at sample shares, respectively, and $M$ is the number of parameters estimated in the model (besides the threshold parameters for the ORL model and besides the alternative specific constants for the MNL model).* To test the performance of the two non-nested models (i.e. the ORL and MNL models) statistically, Ben-Akiva and Lerman's (1985) adjusted likelihood ratio test may be used. This test determines if the adjusted likelihood ratio indices of two non-nested models are significantly different. In particular, if the difference in the indices is $\tau$, then the probability that this difference could have occurred by chance is no larger than $\Phi\{-[-2\tau\mathcal{L}(C) + (M_2 - M_1)]^{0.5}\}$ in the asymptotic limit ($\Phi\{\cdot\}$ is the cumulative standard normal distribution function). A small value of the probability of chance occurrence indicates that the difference $\tau$ is statistically significant and that the model with the higher value of adjusted likelihood ratio index is to be preferred.

We also evaluate the performance of the ORL and MNL models on a holdout (validation) sample to verify that the results obtained from the estimation sample are not an artifact of overfitting and are in fact stable. We use both aggregate and disaggregate measures of fit. At the aggregate level, we compare the predicted and actual auto ownership level shares and compute the

*The log-likelihood value at sample shares is the log-likelihood at convergence with only the threshold bounds and the constant in the ORL model and with only the alternative specific constants in the MNL model.


root mean square error (RMSE) and the mean percentage absolute error of the predicted shares. At the disaggregate level, we use two measures of fit. The first measure is the predictive adjusted likelihood ratio index. This measure is computed by calculating the predictive log-likelihood function value at the parameter estimates obtained by maximizing the estimation likelihood function and then computing the corresponding adjusted likelihood ratio index. The second measure is the average probability of correct prediction computed as $N^{-1} \sum_i \sum_k \delta_{ik} P_{ik}$, where $N$ is the number of observations in the validation sample, $\delta_{ik}$ is a dummy variable indicating if household $i$ owns $k$ cars, and $P_{ik}$ is the predicted probability of household $i$ choosing $k$ cars.
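The holdout measures can be sketched as follows. The exact definitions of the aggregate error measures are assumptions made for illustration: both are averaged over ownership levels, the RMSE is expressed in percentage points, and the absolute percentage error is taken relative to the actual share. The predicted probabilities, observed levels, and function names are hypothetical.

```python
import math

def aggregate_share_errors(predicted_probs, observed_levels):
    """Return (RMSE, mean absolute percentage error) of predicted vs actual shares."""
    n = len(observed_levels)
    n_levels = len(predicted_probs[0])
    actual = [100.0 * observed_levels.count(k) / n for k in range(n_levels)]
    predicted = [100.0 * sum(p[k] for p in predicted_probs) / n for k in range(n_levels)]
    rmse = math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n_levels)
    # Levels with a zero actual share are skipped to avoid division by zero
    pct_errors = [100.0 * abs(p - a) / a for p, a in zip(predicted, actual) if a > 0.0]
    return rmse, sum(pct_errors) / len(pct_errors)

def average_probability_correct(predicted_probs, observed_levels):
    """Average predicted probability assigned to the level each household actually owns."""
    total = sum(p[level] for p, level in zip(predicted_probs, observed_levels))
    return total / len(observed_levels)

# Hypothetical validation sample of four households and three ownership levels
predicted_probs = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6], [0.6, 0.3, 0.1]]
observed_levels = [0, 1, 2, 0]

rmse, mape = aggregate_share_errors(predicted_probs, observed_levels)
avg_correct = average_probability_correct(predicted_probs, observed_levels)
```

Here the actual shares are 50, 25 and 25 percent and the predicted shares are 40, 32.5 and 27.5 percent, so the mean absolute percentage error is 20 percent and the average probability of correct prediction is 0.6.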

4. DATA SOURCES AND SAMPLES USED

We use four different data sources for comparing the performance of the ORL and MNL models. This provides the opportunity to draw general (as opposed to data-specific) conclusions regarding the behavioral mechanism (ordered-response vs unordered-response) underlying auto ownership decisions. The four data sources include three regional data sets from the United States and a Dutch national dataset. The U.S. regional data sets are obtained from the 1991 Boston Region Household Activity Survey, the 1990 Bay Area Household Travel Survey, and the 1991 wave (a wave refers to cross-sectional data at one time point) of the Puget Sound Household Travel Panel Survey. The Dutch national dataset is based on the 1987 wave of the Dutch Mobility Panel Survey. We now provide a brief overview of each survey and discuss the samples drawn for analysis.

The 1991 Boston Region Household Activity Survey was conducted by the Central Transportation Planning Staff (CTPS) of the Boston Metropolitan Planning Organization. The mail-back survey was conducted in April of 1991 and collected data on socio-demographic/trip characteristics of the household and each individual in the household (see Stopher, 1992). There are a total of 3896 households in this dataset. After removing households with inconsistent and missing values on relevant variables, 3665 households remained. We selected 2500 households randomly for estimation and set aside the remaining 1165 households as the validation sample.

The 1990 Bay Area Household Travel Survey was conducted by the Metropolitan Transportation Commission (MTC) in the Spring and Fall of 1990 (see White and Company, Inc., 1991). About 10 800 households were contacted by telephone and provided information on household and individual characteristics. For the current analysis, we focus on the 9359 households which represent the 'single-weekday' sample. After data cleaning, we were left with 9140 households. From these, we draw a random sample of 3500 households and then further split this sample randomly into a sample of 2500 households for estimation and 1000 households for validation.

The Puget Sound Household Travel Panel Survey comprises four waves collected between 1989 and 1993 (Fitzroy, 1994). The survey was conducted by the Puget Sound Council of Governments (PSCOG). In this paper, we use the 1990 wave of the panel comprising 1822 households. Of these, 1731 were useable (after removing households with missing values on car ownership and other relevant exogenous variables). We randomly selected 1231 households for the estimation sample and the remaining 500 households for validation purposes.

The Dutch National Mobility Panel Survey was a mail-back survey involving weekly travel diaries and household and personal questionnaires collected at biannual and annual intervals (see van Wissen and Meurs, 1989). Ten waves were collected between March 1984 and March 1989. Each wave consists of about 1800 households. In the current analysis, we use data on 1807 households collected from the survey in March 1987. We split the 1807 households into an estimation sample of 1307 households and a validation sample of 500 households.

We specify five auto ownership level alternatives in the U.S. regional data sets: zero, one, two, three, and four autos (the share of households with more than four autos was very small and hence we assigned households with more than four autos to the four autos category). We specify three auto ownership level alternatives in the Dutch data: zero, one, and two autos (the two autos category represents the choice of two or more autos).

The auto ownership shares and the mean auto ownership level in the estimation sample for the different data sets are provided in Table 1.

Among the U.S. regional data sets, the car ownership level is highest in the Puget Sound region and lowest in the Boston region. The car ownership level in the Netherlands is much lower than in the U.S. Detailed statistics on the exogenous variables in the various data sets are provided in Pulugurta (1996).

Table 1. Descriptive statistics on auto ownership in estimation sample (shares expressed as percentages)

Auto ownership level        Boston data   Bay Area data   Puget Sound data   Dutch data
0                           10.0          7.2             2.9                27.6
1                           34.8          32.2            23.4               63.3
2                           41.0          39.2            44.0               9.1
3                           10.1          15.1            21.3               -
4                           4.2           6.3             8.4                -
Mean auto ownership level   1.64          1.81            2.09               0.81

5. EMPIRICAL ANALYSIS

5.1. Variable specification

The choice of variables for potential inclusion in the auto ownership models was guided by previous theoretical and empirical work on car ownership modeling, intuitive arguments regarding the effects of exogenous variables, and data availability considerations. We consider two broad classes of variables for inclusion: socio-economic variables and residential location/type attributes.

A number of variables representing socio-economic characteristics were included, but only three variables emerged as consistently significant across the different data sets. These variables were number of working adults, number of non-working adults, and household income.* We did not consider number of licensed individuals in the household as an explanatory variable because this variable is likely to be co-determined with car ownership levels. We found the differentiation of adults by workers and non-workers to be statistically important in the modeling of auto ownership levels. However, household size was insignificant or only marginally significant (after controlling for number of workers and non-workers). This suggests that the number of non-adult household members does not significantly impact auto ownership choices. We included a race variable as part of the socio-economic attributes of a household to capture cultural and life-style differences. Our analysis indicated that the only differentiation of consequence was whether a household was non-Caucasian or Caucasian. Therefore, in the final specification for the Bay Area dataset (the differentiation of households by race was available only from the Bay Area Travel Survey), we use a single dummy variable representing non-Caucasian households to represent the effect of race.†

The residential location and type variables capture attributes of a household's activity-travel environment. We use simple descriptors of residential location based on the degree of urbanization of the area of residence. These location descriptors serve as 'proxy' variables for a wide range of inter-related activity-travel characteristics affecting car ownership levels including the spatial opportunities to pursue activities by transit, overall transportation level-of-service offered by auto and transit modes, and costs of auto maintenance and insurance.‡ The residential (housing) type variable provides a good description of the activity-travel environment in the immediate

*An individual is labeled as an 'adult' if she or he is 16 years or older in the U.S. data sets and 18 years or older in the Dutch data set. This definition was based on the age at which an individual can acquire a driving license and also on the distribution of employed individuals by age (almost all workers in the U.S. data samples were over 16 years and all workers in the Netherlands sample were over 18).

†Some other specifications of the socio-economic variables that we explored included a 'household income divided by household size' variable to capture a 'per-individual' income effect, and the logarithm-transformation of household income. These specifications did not provide better results than the simple untransformed income specification.

‡The location descriptors for the different data sets are developed as follows: In the Bay Area dataset, San Francisco County is classified as an urban area. In the Puget Sound data, the city of Seattle is designated as an urban area. In the Boston data, Rings 0 and 1 of the Boston Metropolitan Planning Organization's geographical classification system (see Harrington et al., 1995) are defined to be urban areas; Ring 2 is classified as a suburban area; and Rings 3 and 4 are designated rural areas. In the Dutch dataset, the classification as an urban area was based on an urbanization index developed by the Central Bureau of Statistics of the Dutch Government (see Central Bureau Voor de Statistiek, 1989).

Table 2. Parameter estimates for Boston data

| Variable | ORL parameter | ORL t-statistic | MNL parameter | MNL t-statistic |
|---|---|---|---|---|
| Socio-economic variables | | | | |
| Number of non-working adults | 0.3989 | 11.11 | | |
| 1 Auto | | | 0.4320 | 2.79 |
| 2 Autos | | | 1.3735 | 7.97 |
| 3 Autos | | | 2.1963 | 11.03 |
| 4 Autos | | | 2.6923 | 11.99 |
| Number of working adults | 0.6381 | 22.22 | | |
| 1 Auto | | | 0.3935 | 2.54 |
| 2 Autos | | | 1.7200 | 10.04 |
| 3 Autos | | | 2.9077 | 14.87 |
| 4 Autos | | | 3.5211 | 16.21 |
| Annual household income (in 000s of dollars) | 0.0125 | 13.11 | | |
| 1 Auto | | | 0.0423 | 7.48 |
| 2 Autos | | | 0.0623 | 10.55 |
| 3 Autos | | | 0.0661 | 10.28 |
| 4 Autos | | | 0.0702 | 9.79 |
| Residential location/type variables | | | | |
| Urban residential location | -1.0509 | -15.19 | | |
| 1 Auto | | | -1.9741 | -8.09 |
| 2 Autos | | | -3.1091 | -11.58 |
| 3 Autos | | | -3.4850 | -9.68 |
| 4 Autos | | | -3.8958 | -8.55 |
| Suburban residential location | -0.4026 | -6.50 | | |
| 1 Auto | | | -0.6962 | -2.39 |
| 2 Autos | | | -1.2275 | -4.01 |
| 3 Autos | | | -1.2751 | -3.66 |
| 4 Autos | | | -1.6026 | -3.83 |
| Single-family residential housing | 0.5647 | 9.33 | | |
| 1 Auto | | | 0.7568 | 3.02 |
| 2 Autos | | | 1.7020 | 6.49 |
| 3 Autos | | | 2.3811 | 7.17 |
| 4 Autos | | | 1.7504 | 4.32 |

neighborhood of a household's residence (the residential location variables, on the other hand, may be viewed as more aggregate spatial representations of the activity-travel environment). We distinguished two housing types in the analysis, single-family and multiple-family, with the expectation that higher density development and a transit-friendly environment would be associated with multiple-family dwelling units relative to single-family homes. Thus, we anticipate that single-family housing units would be associated with higher auto ownership levels than multiple-family housing units. Housing type information was available only for the Boston and Bay Area data sets.

5.2. Calibration results

The ORL model and the MNL model were estimated for each of the four data sets and the elasticity effects were computed as discussed in Section 3.1. Here we present the detailed estimation results and the elasticity effects for the Boston area data set. The parameter estimates and elasticity effects for the other data sets are not included in this paper.*

The ORL and MNL parameter estimates for the Boston dataset are shown in Table 2.

The parameter estimates in the ORL model indicate the effect of exogenous variables on the latent auto ownership propensity of households; those in the MNL model represent the effect of variables on the utility of each auto ownership alternative relative to the zero car alternative

*The parameter estimates and elasticity effects for the other data sets may be obtained from the authors. These are also available in the Masters thesis of the second author.

(we do not show the estimates of the threshold bounds in the ORL model or the estimates of the alternative-specific constants in the MNL model because of space limitations). Prima facie, the signs of all the parameters are consistent with a priori expectations in both models. As the number of non-working adults and the number of working adults in a household increase, the household tends to prefer higher auto ownership levels. This reflects, of course, the greater mobility needs of households with many adults. The effect of working adults is greater than that of non-working adults (except for the one-car alternative in the MNL model, where the effects are not significantly different), possibly because working adults face greater time-space constraints in pursuing activities and therefore have a greater need for the auto mode. Higher household income is associated with higher levels of auto ownership, a finding consistent with the microeconomic theory of consumer choice. The parameters on the residential location variables and the single-family housing variable also have the expected signs, and their magnitudes in the MNL model show the expected trend across auto ownership levels (except that single-family households associate lower utility with four or more cars than with three cars).

The parameter estimates of the ORL and MNL models in Table 2 are not directly comparable because the models have different structures. Further, in both models, the parameters do not directly provide the directionality and magnitude of the effects of exogenous variables on the choice probabilities of each auto ownership level (see Greene, 1990, pp. 696 and 705). To address both these issues, we obtain the aggregate-level elasticity effects. The results are shown in Table 3.

The ORL model estimates a larger percentage decrease in the share of zero-auto households than in the share of one-auto households due to an increase in the number of non-working and working adults; the MNL model, on the other hand, indicates almost equal percentage decreases in the expected shares of the zero- and one-auto ownership levels. Also, the ORL model estimates much smaller elasticities than the MNL model for the three- and four-auto alternatives due to an increase in non-working and working adults. The income elasticity is lower for the zero-car alternative and higher for the two-, three- and four-auto alternatives in the ORL model relative to the MNL model. The percentage decrease in the shares of the three- and four-auto levels because of residing in an urban or suburban area is estimated to be higher in the ORL model than in the MNL model. Finally, the ORL model suggests a higher share of the four-auto alternative for single-family households (compared to multi-family households). The MNL model, however, indicates that the share of the four-auto alternative is only marginally affected by residential housing type. To summarize, there are quite substantial differences in the estimated elasticities from the ORL and MNL models (similar results were observed in the other data sets). The MNL model, because it allows alternative-specific effects of exogenous variables, appears to be able to capture a flexible pattern of elasticity effects of variables across alternatives; the ORL model, on the other hand, is constrained to have a more rigid (and monotonic) trend in elasticity effects from the zero-car to the four-car alternative (this difference in the elasticity patterns between the ORL and MNL models is a reflection of the theoretical structure of the two models, as discussed in Section 2).
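The contrast between the two elasticity patterns can be illustrated with a minimal Python sketch. It is not the computation of Section 3.1: it assumes an arc elasticity, defined here as the percentage change in the mean predicted share per 1% increase in household income. The income coefficients are taken from Table 2, but the ORL thresholds, the MNL constants, and the income sample are hypothetical placeholders (the paper does not report the thresholds or constants), so the output will not match Table 3.

```python
import numpy as np

def orl_probs(index, thresholds):
    """Ordered-response logit probabilities from a latent propensity index."""
    gaps = np.asarray(thresholds)[None, :] - index[:, None]
    cdf = 1.0 / (1.0 + np.exp(-gaps))  # P(level <= k) at each threshold
    edges = np.hstack([np.zeros((len(index), 1)), cdf, np.ones((len(index), 1))])
    return np.diff(edges, axis=1)      # P(level == k), k = 0..4

def mnl_probs(utilities):
    """Multinomial logit probabilities from alternative-specific utilities."""
    shifted = utilities - utilities.max(axis=1, keepdims=True)  # numerical stability
    expu = np.exp(shifted)
    return expu / expu.sum(axis=1, keepdims=True)

def aggregate_elasticity(prob_fn, income, pct_change=1.0):
    """Percent change in mean predicted shares per percent change in income."""
    base = prob_fn(income).mean(axis=0)
    changed = prob_fn(income * (1.0 + pct_change / 100.0)).mean(axis=0)
    return 100.0 * (changed - base) / base / pct_change

# Income coefficients from Table 2 (zero-auto alternative is the MNL base).
ORL_INCOME = 0.0125
MNL_INCOME = np.array([0.0, 0.0423, 0.0623, 0.0661, 0.0702])

# Hypothetical values: not reported in the paper.
THRESHOLDS = np.array([-1.0, 1.0, 3.0, 4.5])
MNL_CONSTANTS = np.array([0.0, 0.5, 0.0, -2.0, -3.0])
income = np.linspace(10.0, 120.0, 200)  # synthetic incomes, $000/year

orl_elast = aggregate_elasticity(
    lambda x: orl_probs(ORL_INCOME * x, THRESHOLDS), income)
mnl_elast = aggregate_elasticity(
    lambda x: mnl_probs(MNL_CONSTANTS + np.outer(x, MNL_INCOME)), income)

print("ORL income elasticities by auto level:", np.round(orl_elast, 3))
print("MNL income elasticities by auto level:", np.round(mnl_elast, 3))
```

In the ORL sketch a single coefficient shifts the latent index against fixed thresholds, whereas the MNL sketch gives each auto ownership level its own income coefficient, which is the source of the additional flexibility discussed above.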

The differences in elasticity effects between the ORL and MNL models suggest the need to apply formal statistical tests to determine the structure that is most consistent with the data. Table 4 provides the measures of fit in estimation for all the data sets. The adjusted likelihood ratio index favors the MNL model in all cases. To test if the adjusted likelihood ratio values of the two models are significantly different, we use the non-nested likelihood ratio test. The final row of Table 4

Table 3. Aggregate-level elasticity effects of exogenous variables for Boston data

| Variable | ORL: 0 Autos | ORL: 1 Auto | ORL: 2 Autos | ORL: 3 Autos | ORL: 4 Autos | MNL: 0 Autos | MNL: 1 Auto | MNL: 2 Autos | MNL: 3 Autos | MNL: 4 Autos |
|---|---|---|---|---|---|---|---|---|---|---|
| Number of non-working adults | -0.430 | -0.155 | 0.099 | 0.339 | 0.444 | -0.390 | -0.319 | 0.071 | 0.679 | 1.235 |
| Number of working adults | -0.618 | -0.259 | 0.135 | 0.571 | 0.791 | -0.443 | -0.458 | 0.044 | 1.052 | 1.867 |
| Annual household income (in $000/year) | -0.554 | -0.276 | 0.126 | 0.593 | 0.804 | -0.938 | -0.189 | 0.281 | 0.258 | 0.426 |
| Urban residential location | 1.574 | 0.345 | -0.422 | -0.672 | -0.730 | 1.639 | 0.152 | -0.385 | -0.357 | -0.528 |
| Suburban residential location | 0.539 | 0.129 | -0.128 | -0.291 | -0.343 | 0.569 | 0.084 | -0.161 | -0.069 | -0.301 |
| Single-family residential housing type | -0.684 | -0.244 | 0.212 | 0.415 | 0.464 | -0.598 | -0.319 | 0.274 | 0.604 | 0.059 |

Table 4. Measures of fit in the estimation sample for all data sets [table contents not recoverable from the source]

Table 5. Aggregate measures of performance in validation sample

| Auto level | Boston* Actual | Boston ORL | Boston MNL | Bay Area Actual | Bay Area ORL | Bay Area MNL | Puget Sound Actual | Puget Sound ORL | Puget Sound MNL | Dutch Actual | Dutch ORL | Dutch MNL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 Autos | 10.21 | 9.72 | 9.31 | 6.60 | 7.38 | 7.20 | 2.20 | 3.28 | 2.98 | 24.80 | 26.29 | 26.11 |
| 1 Auto | 32.88 | 31.84 | 33.12 | 33.50 | 31.93 | 33.46 | 27.20 | 22.94 | 24.61 | 65.00 | 63.78 | 64.06 |
| 2 Autos | 42.83 | 42.62 | 42.50 | 39.50 | 40.06 | 39.18 | 42.20 | 44.31 | 43.20 | 10.20 | 9.92 | 9.83 |
| 3 Autos | 9.87 | 11.29 | 10.53 | 13.90 | 14.60 | 14.29 | 18.80 | 21.17 | 20.96 | | | |
| 4 Autos | 4.21 | 4.53 | 4.54 | 6.50 | 6.03 | 5.87 | 9.60 | 8.30 | 8.25 | | | |
| RMSE† | | 0.84 | 0.55 | | 0.91 | 0.45 | | 2.49 | 1.72 | | 1.12 | 0.96 |
| MPAE‡ | | 6.14 | 4.97 | | 6.03 | 4.50 | | 19.18 | 14.58 | | 3.54 | 2.97 |

*The share of each auto ownership level is expressed in percentage form in the table. The entries under the ORL and MNL columns are the predicted shares obtained from the ORL and MNL models, respectively.

†RMSE represents the root mean square error of the predicted auto ownership shares.

‡MPAE represents the mean absolute percentage error. It is computed as the mean (across auto ownership levels) of the percentage absolute error between predicted and actual shares.

presents the upper bound of the probability that the estimated difference in adjusted likelihood ratio index values between the ORL and the MNL models could have occurred by chance. The results provide very strong evidence that the MNL model is to be preferred over the ORL model for all data sets.
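The bound reported in the final row of Table 4 can be sketched as follows. The sketch assumes the form of the non-nested test given in Ben-Akiva and Lerman (1985), in which the probability that the adjusted likelihood ratio index of an incorrect model exceeds that of the correct model by more than z is bounded by Φ(-[-2·z·L + (K2 - K1)]^0.5). Whether the base log-likelihood L is evaluated at zero or at sample shares in this paper is an assumption left to the caller, and the input values shown are hypothetical, not the Table 4 entries.

```python
import math

def nonnested_lr_bound(rho_diff, loglik_base, k_lower_fit, k_higher_fit):
    """Upper bound on the chance that the observed gap in adjusted
    likelihood ratio indices arose even though the lower-fit model is correct.

    rho_diff      -- difference in adjusted indices (higher minus lower), >= 0
    loglik_base   -- base log-likelihood used in the indices (negative)
    k_lower_fit   -- number of parameters in the model with the lower index
    k_higher_fit  -- number of parameters in the model with the higher index
    """
    if rho_diff < 0:
        raise ValueError("rho_diff must be non-negative")
    arg = -2.0 * rho_diff * loglik_base + (k_higher_fit - k_lower_fit)
    if arg < 0:
        raise ValueError("bound is undefined for these inputs")
    z = math.sqrt(arg)
    # Standard normal CDF at -z, written with erfc for accuracy in the far tail.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical inputs: an index gap of 0.02, a base log-likelihood of -5000,
# 6 parameters in the lower-fit model and 24 in the higher-fit model.
bound = nonnested_lr_bound(0.02, -5000.0, 6, 24)
print(f"upper bound on probability: {bound:.3e}")
```

Even a small gap in the adjusted index yields a very small bound when the sample is large, because the base log-likelihood scales with the number of observations.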

5.3. Validation results

We now examine the predictive performance of the MNL and ORL models on the validation samples based on the aggregate and disaggregate fit measures identified in Section 2.3.

The predicted aggregate shares (expressed as percentages) obtained from the ORL and MNL models in the validation sample are presented in Table 5.

As can be observed, in about two-thirds of the cases, the predicted shares from the MNL model are closer to the actual shares than those from the ORL model. It is also important to note that five of the six cases in which the ORL model provides better results are associated with either the zero-, three- or four-auto levels in the U.S. data sets or the two-auto level in the Dutch data set. These auto levels have very small actual aggregate shares. The RMSE statistic and the mean percentage absolute error (MPAE) statistic, two overall measures of fit across auto levels, are presented in the final two rows of the table. These measures show that the MNL model performs better than the ORL model overall.
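The two aggregate measures can be recomputed from the shares in Table 5. The short Python sketch below does so for the Boston validation sample, following the footnote definitions (root mean square error of the shares, and the mean across auto levels of the absolute percentage error). Because the published shares are rounded to two decimals, the recomputed values may differ slightly from the table entries.

```python
import math

def rmse(actual, predicted):
    """Root mean square error across auto ownership levels (percentage points)."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mpae(actual, predicted):
    """Mean absolute percentage error across auto ownership levels (percent)."""
    return 100.0 * sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)

# Boston validation shares from Table 5 (percent), for 0, 1, 2, 3 and 4 autos.
actual = [10.21, 32.88, 42.83, 9.87, 4.21]
orl = [9.72, 31.84, 42.62, 11.29, 4.53]
mnl = [9.31, 33.12, 42.50, 10.53, 4.54]

boston_fit = {
    "ORL": (rmse(actual, orl), mpae(actual, orl)),
    "MNL": (rmse(actual, mnl), mpae(actual, mnl)),
}
for name, (r, m) in boston_fit.items():
    print(f"{name}: RMSE = {r:.2f}, MPAE = {m:.2f}")
```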

The share predictions in Table 5 offer only an aggregate measure of fit. Given that our validation sample is very similar in characteristics to the estimation sample (the validation and estimation samples are random sub-samples of the same overall sample), and also since errors tend to average out in the aggregate, it is not surprising that the aggregate fit measures in Table 5 do not

Table 6. Disaggregate measures of fit in validation sample

| Summary statistic | Boston ORL | Boston MNL | Bay Area ORL | Bay Area MNL | Puget Sound ORL | Puget Sound MNL | Dutch ORL | Dutch MNL |
|---|---|---|---|---|---|---|---|---|
| Log-likelihood at zero | -1875.00 | -1875.00 | -1609.44 | -1609.44 | -804.72 | -804.72 | -549.31 | -549.31 |
| Log-likelihood at sample shares | -1542.18 | -1542.18 | -1368.97 | -1368.97 | -670.68 | -670.68 | -429.33 | -429.33 |
| Predictive log-likelihood* | -1178.43 | -1083.83 | -1146.43 | -1065.69 | -534.48 | -509.48 | -369.38 | -359.96 |
| Number of parameters† | 6 | 24 | 6 | 24 | 4 | 16 | 4 | 8 |
| Number of observations | 1165 | 1165 | 1000 | 1000 | 500 | 500 | 500 | 500 |
| Predictive adjusted likelihood ratio index | 0.232 | 0.282 | 0.158 | 0.204 | 0.197 | 0.216 | 0.130 | 0.143 |
| Average probability of correct prediction | 0.442 | 0.486 | 0.383 | 0.420 | 0.392 | 0.432 | 0.572 | 0.588 |

*The predictive log-likelihood is the log-likelihood function value in the validation sample computed at the parameter estimates obtained from maximizing the estimation likelihood function.

†The number of parameters refers to the parameters on the exogenous variables; it does not include the threshold bounds in the ORL model or the alternative-specific constants in the MNL model.

discriminate much between the ORL and MNL models. However, we have noted in the previous section that the aggregate elasticity effects from the two models are different and also that the MNL model is substantially better than the ORL model at the individual household level. These observations suggest that if the characteristics of the 'validation' sample were quite different from those of the estimation sample (i.e. if the estimation and application contexts are quite different, as in the case of forecasting car ownership levels for a future year with substantial changes in household characteristics over time), the discrepancy in aggregate fit measures between the MNL and ORL models would become significant. The implication is that it is the performance of the two models at the disaggregate household level that is critically important. We now examine whether the superiority of the MNL model over the ORL model at the disaggregate level in the estimation sample carries over to the validation sample. We compute the predictive adjusted likelihood ratio index and the average probability of correct prediction. These disaggregate measures of fit are presented in the final two rows of Table 6 and indicate that the MNL model clearly outperforms the ORL model at the disaggregate level.
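The disaggregate measures can be sketched in Python as follows. The definitions are given in Section 2.3; the forms below are assumptions chosen because they appear to reproduce the Boston entries of Table 6: the log-likelihood at zero is taken as -N·ln(J) for N households and J auto ownership levels, and the adjusted index as 1 - (L - K)/L(c), where L(c) is the log-likelihood at sample shares. The parameter counts (6 for the ORL, 24 for the MNL) follow from the Table 2 specification of six variables and four non-base alternatives. The probabilities passed to `avg_prob_correct` are hypothetical.

```python
import math

def loglik_at_zero(n_obs, n_alternatives):
    """Log-likelihood when every alternative is equally likely."""
    return -n_obs * math.log(n_alternatives)

def adjusted_lri(loglik, loglik_shares, n_params):
    """Adjusted likelihood ratio index relative to the sample-shares model."""
    return 1.0 - (loglik - n_params) / loglik_shares

def avg_prob_correct(probs, chosen):
    """Mean predicted probability of the alternative each household chose."""
    return sum(row[c] for row, c in zip(probs, chosen)) / len(chosen)

# Boston validation sample, values from Table 6.
boston_orl = adjusted_lri(-1178.43, -1542.18, 6)
boston_mnl = adjusted_lri(-1083.83, -1542.18, 24)
print(f"log-likelihood at zero: {loglik_at_zero(1165, 5):.2f}")
print(f"adjusted index, ORL: {boston_orl:.3f}  MNL: {boston_mnl:.3f}")

# Hypothetical predicted probabilities for two households and three levels.
example = avg_prob_correct([[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]], [0, 1])
print(f"average probability of correct prediction (toy data): {example:.2f}")
```

The adjusted index penalizes the MNL model for its additional parameters, so its advantage in Table 6 is not simply a consequence of having more coefficients.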

6. SUMMARY AND CONCLUSIONS

Two alternative behavioral mechanisms, the ordered-response mechanism and the unordered-response mechanism, have been widely used to represent the choice process underlying households' auto ownership level decisions. It is important to identify which of these two mechanisms (or structures) more closely represents the true underlying choice process to enable accurate auto ownership forecasts in response to changing socio-economic conditions and activity-travel environment attributes.

In this paper, we have discussed the theoretical structure of the ordered and unordered choice mechanisms. Specifically, we have shown that the ordered-response mechanism can be viewed as a decision structure associated with a series of binary choices. Each binary choice is associated with the range of alternatives on one side or the other of each possible car ownership level. Based on the decision outcome for each range, the actual choice is implicitly determined. The ordered-response mechanism is not consistent with the global utility maximizing hypothesis. The unordered-response mechanism, on the other hand, is consistent with global utility maximization. It involves assigning a utility value to each car ownership level and selecting the car ownership level with the highest utility. The paper identifies the theoretical constraints imposed by the ordered-response mechanism, and presents the advantages and limitations of the ordered and unordered-response mechanisms.

To identify which of the two behavioral mechanisms more closely represents car ownership choice-making behavior, we have compared the ordered-response and unordered-response behavioral structures empirically using several data sets (three regional data sets from the United States and a Dutch national data set). The use of many data sets offers the opportunity to draw general conclusions regarding the behavioral process underlying auto ownership choices. The ordered-response structure is represented by the ordered-response logit (ORL) model. The unordered-response structure is represented by the multinomial logit (MNL) model.

An examination of the parameters from the ORL and MNL models shows that both models provide estimates which are reasonable. However, there are substantial differences in the magnitude of the effects of relevant exogenous variables on the choice probabilities of different auto ownership levels (as indicated by the aggregate-level elasticities). The MNL model, because it allows alternative-specific effects of exogenous variables, appears to be able to capture a flexible pattern of elasticity effects of variables across alternatives; the ORL model, on the other hand, is constrained to have a more rigid trend in elasticity effects. The differences in elasticity effects imply that the forecasts from the ORL and MNL models are likely to be quite different in response to temporal changes in household socio-economic characteristics and the activity-travel environment.

We assess the relative performances of the ORL and MNL models using several measures of fit in the estimation sample and in a validation sample. A consistent result that emerges from all the different fit measures and for all data sets is that the MNL model outperforms the ORL model. Thus, the ORL model is mis-specified. Since the aggregate elasticity effects from the ORL model are very different from those of the MNL, the implication is that the use of the ORL model for

auto ownership modeling is likely to lead to incorrect and inaccurate forecasts. From a behavioral viewpoint, our results clearly indicate that the unordered-response choice mechanism is a better representation of households' auto ownership decision process than the ordered-response mechanism.

In summary, our comparative analysis offers strong evidence that the appropriate choice mechanism is the unordered-response structure. As a general guideline, auto ownership modeling must be pursued using the unordered-response class of models (such as the MNL model or the multinomial probit model) and not using the ordered-response class of models (such as the ordered-response logit or probit).

Acknowledgements: This research was supported by National Science Foundation grants DMS 9208758 and DMS 9313013 to the National Institute of Statistical Sciences (NISS). The comments of Professor Eric Pas, Dr Alan Karr, and Dr Agostino Nobile during a presentation of the initial results of this research at NISS are greatly appreciated. The authors would also like to thank Mr Ian Harrington, Mr Charles Purvis, and Mr Neil Kilgren for providing the Boston area, Bay area, and the Puget Sound area data sets, respectively. Several anonymous referees and Prof. Michael Bell provided valuable comments and suggestions on an earlier draft of the paper.

REFERENCES

Agostino, A., Bhat, C. R. and Pas, E. I. (1996) A random effects multinomial probit model of car ownership choice. Proceedings of the Third Workshop on Bayesian Statistics in Science and Technology: Case Studies (forthcoming).

Amemiya, T. (1985) Advanced Econometrics. Harvard University Press, Cambridge.

Ben-Akiva, M. and Lerman, S. R. (1985) Discrete Choice Analysis: Theory and Application to Travel Demand. The MIT Press, Cambridge.

Bhat, C. R. (1996) A model of work travel mode choice and number of non-work commute stops. Transportation Research B 31, 41-54.

Bhat, C. R. and Koppelman, F. S. (1993) An endogenous switching simultaneous equation system of employment, income, and car ownership. Transportation Research A 27, 447-459.

Bunch, D. S. and Kitamura, R. (1990) Multinomial probit model estimation revisited: testing estimable model specifications, maximum likelihood algorithms and probit integral approximations for trinomial models of car ownership. Institute of Transportation Studies Technical Report, University of California, Davis.

Central Bureau Voor de Statistiek (1989) Bevolking Der Gemeenten Van Nederland. SDU Publishers, The Hague.

Fitzroy, S. S. (1994) Puget Sound Transportation Panel. Presented at the Travel Model Improvement Program Workshop, August, Fort Worth, Texas.

Golob, T. F. (1990) The dynamics of household travel time expenditures and car ownership decisions. Transportation Research A 24, 443-465.

Golob, T. F. and van Wissen, L. (1988) A joint household travel distance generation and car ownership model. Working paper WP-88-15, Institute of Transportation Studies, University of California, Irvine.

Greene, W. H. (1990) Econometric Analysis. MacMillan Publishing Company, New York.

Hamed, M. M. and Mannering, F. L. (1993) Modeling travelers' postwork activity involvement: toward a new methodology. Transportation Science 27(4), 381-394.

Han, A. and Hausman, J. A. (1990) Flexible parametric estimation of duration and competing risk models. Journal of Applied Econometrics 5, 1-28.

Harrington, I. E., Wang, C.-Y. and Kuttner, W. (1995) Expansion of 1991 regional household-based travel survey. Technical Memorandum, Central Transportation Planning Staff, Boston Metropolitan Organization.

Hensher, D. A., Smith, N. C., Milthorpe, F. W. and Barnard, P. O. (1992) Dimensions of Automobile Demand: A Longitudinal Study of Household Automobile Ownership and Use, Studies in Regional Science and Urban Economics, p. 22. Elsevier Science Publishers, Amsterdam.

Horowitz, J. L. (1980) Sources of error and uncertainty in behavioral travel demand models. In New Horizons in Travel Behavior Research, eds P. R. Stopher, A. H. Meyburg and W. Brog, pp. 543-558. Lexington Books, MA.

Kitamura, R. (1987) A panel analysis of household car ownership and mobility, infrastructure planning and management. Proceedings of the Japan Society of Civil Engineers, 383/IV-7, pp. 13-27.

Kitamura, R. (1988) A dynamic model system of household car ownership, trip generation, and modal split: model development and simulation experiments. In Proceedings of the 14th Australian Road Research Board Conference, Part 3, Australian Road Research Board, Vermont South, Victoria, Australia, pp. 96-111.

Kitamura, R. and Bunch, D. S. (1989) Heterogeneity and state dependence in household car ownership: a panel analysis using ordered-response probit models with error components. Research report, UCD-TRG-RR-89-6, Transportation Research Group, University of California at Davis.

Mannering, F. L. and Winston, C. (1985) A dynamic analysis of household vehicle ownership and utilization. Rand Journal of Economics 16, 215-236.

McKelvey, R. D. and Zavoina, W. (1975) A statistical model for the analysis of ordinal level dependent variables. Journal of Mathematical Sociology 4, 103-120.

Meurs, H. (1990) Trip generation models with permanent unobserved effects. Transportation Research B 24, 145-158.

Oi, W. Y. and Shuldiner, P. W. (1963) An Analysis of Urban Travel Demand. Northwestern University Press, Evanston, IL.

Pulugurta, V. (1996) Household auto ownership choice: a comparison of two alternative behavioral mechanisms. Unpublished Masters thesis, University of Massachusetts, Amherst.

Purvis, L. C. (1994) Using census public use micro data sample to estimate demographic and automobile ownership models. Transportation Research Record 1443, 21-30.

Schor, N. (1989) Use of historical information in auto ownership modelling. Unpublished Masters thesis, Transportation Center, Northwestern University, Evanston, IL.

Stopher, P. R. (1992) Use of an activity-based diary to collect household travel data. Transportation 19, 159-176.

Train, K. (1986) Qualitative Choice Analysis: Theory, Econometrics, and an Application to Automobile Demand. The MIT Press, Cambridge, MA.

Uncles, M. D. (1987) A beta-logistic model of mode choice: goodness of fit and intertemporal dependence. Transportation Research B 21(3), 195-205.

White, E. H. and Company, Inc. (1991) 1990 Bay Area travel survey: final report. Submitted to the Metropolitan Transportation Commission, Oakland, California.

Windmeijer, F. A. G. (1995) Goodness-of-fit measures in binary choice models. Econometric Reviews 14, 101-116.

van Wissen, L. and Meurs, H. (1989) The Dutch mobility panel: experiences and evaluation. Transportation 16, 99-119.

Wrigley, N. (1990) Dirichlet-logistic models of spatial choice. In Spatial Choices and Processes, eds M. M. Fischer, P. Nijkamp, and Y. Papageorgiou. North Holland, Amsterdam.
