CHARLES A. LAVE
Dept. of Economics, University of California, Irvine, CA 92717, U.S.A.
and
KENNETH TRAIN
Cambridge Systematics Inc., Berkeley, CA 94704, U.S.A.
Abstract: Previous models of auto-type choice have not been able to disentangle very much of the structure of the household's auto-choice decision: the models assumed that very few auto characteristics affect choice, and often these few parameters were estimated with low precision. Hence the models had only limited use in forecasting the effects of government policies to influence transportation energy consumption. The present paper introduces a multinomial logit model for the type of car that households will choose to buy. The model includes a large variety of auto characteristics as explanatory variables, as well as a large number of characteristics of the household and the driving environment. The model fits the data quite well, and all of the variables enter with the correct signs and plausible magnitudes.
able, stating that “the estimated own-price coefficient for medium car shares and large car shares are only significant at a 25 and 40% level of error respectively”. Other models leave some important variables out of some of the equations, indicating that their standard errors must have been too large to allow inclusion. For example, the model of Chase Econometrics does not include the price of compacts in the equation for the market share of compacts; the equation for the market share of subcompacts includes neither the price of subcompacts nor the fuel economy. Similarly, the EEA1 model does not include fuel economy of compacts in the compact auto share equation or the price of subcompacts in the subcompact auto share equation.

A second problem confronting the builders of aggregate models is that, over time, automobile characteristics tend to vary together. Weight and external dimensions tend to move in tandem, with both increasing or decreasing at the same time. Similarly, horsepower and price tend to be correlated. To obtain reliable estimates of the relationships between variables, however, each explanatory variable must vary fairly independently of the other explanatory variables. Otherwise, it is not possible to determine which variable's variation is explaining the variation in the dependent variables.

The only automobile characteristics which Difiglio and Kulash entered were price and fuel economy. Yet, with only these two variables, the problem of collinearity was encountered. Kulash (1975) stated the difficulty explicitly:

“Car prices, gasoline prices, and fuel economy all have a bearing on the overall cost of owning and operating a vehicle. But over the last 15 yr, these three measures have all changed in highly interrelated ways, as reflected by simple correlation coefficients between them of 0.9 or greater. These interrelations make it difficult to use statistical techniques to separate the impact on sales of one price element relative to others. On the other hand, automotive fuel economy price proposals would influence various combinations of each of these elements, making it essential that the effect of each be isolated.”

Difiglio and Kulash were forced to specify a “generalized” price combining auto price, gas price, and fuel economy; and then entered this single variable into their model rather than estimating the separate effects of auto price, gas price, and fuel economy.

Because of these two problems (lack of sufficient variation over time and across regions in automobile characteristics, and large correlations among automobile characteristics) the possibility seems small of any aggregate econometric model being able to include a full array of automobile characteristics. Consequently, it seems that a disaggregate econometric approach should be attempted.

DESCRIPTION OF THE DATA

Household data

We utilized a stratified random sample of new car buyers, collected during the summer of 1976 by Arthur D. Little, Inc., in seven cities: Atlanta, Buffalo, Chicago, Denver, Indianapolis, Los Angeles and New Orleans. The stratification produced an equal number of households who had purchased small, medium and large cars (classes 1-4/5-6/7-10; as explained below). Each home interview recorded the type of auto purchased and a variety of household characteristics. The means of some of the socioeconomic variables are shown in Table 1.

Table 1

Variable                                                    Mean    Standard deviation
Number of people in household who are 18 years and over     2.36     .92
Number of people in household who are 0-17 years old         .94    1.13
Location of home (1 = urban, 2 = suburban, 3 = rural)       1.78     .48

Automobile data

Information on various physical characteristics for each type of car was taken from the 1976 Automotive News Market Data Book. Price data for each car were constructed as the sum of (a) sticker price, (b) destination charges specific to each city and (c) taxes specific to each city. Destination charge information was obtained from the auto manufacturers. Tax information (registration fees, personal property taxes, local sales tax and state sales tax) was obtained from various federal publications, and from individual cities.

Definition of choice categories

All of the makes and models of cars were classified into ten categories, with cars in each category chosen to be relatively homogeneous in size and price. The household is then choosing between these ten categories. We constructed a “representative” car for each category by taking a sales-weighted average of the characteristics of the cars in that category. The categories are summarized at the top of Table 2.

A MODEL OF AUTO CHOICE

The model estimates the probability that a given household will choose to buy a new auto within a particular class of auto types (where the classes are defined as above), conditional upon the household's choice of a new auto over a used one. The model is thereby restrictive in two ways: (1) only the choice of new auto types is considered and (2) classes of autos rather than makes and models are the choice alternatives. The former restriction was imposed by the data: only a sample of new auto purchasers was available for analysis. The latter restriction was imposed by computer capacity: estimation of a model with a separate alternative for each make and model would entail more computer space than is available.

The model is multinomial logit (MNL) with the probability of choosing a new auto within class i defined as:

$$P_i = \frac{e^{\beta' z(x_i, S)}}{\sum_{j=1}^{10} e^{\beta' z(x_j, S)}} \tag{1}$$

where $P_i$ is the probability of choosing a new auto in class i, $x_i$ is a vector of attributes of autos of class i (e.g. cost, weight, horsepower, etc.), $S$ is a vector of attributes of the household, $z$ is a vector-valued function of $x_i$ and $S$, and $\beta$ is a vector of coefficients. The term $\beta' z(x_i, S)$ is called the “representative” utility of auto class i. The MNL model and a method for estimating the model parameters are discussed in McFadden (1973).†

†The model was estimated as if the sample were drawn exogenously, whereas the sample was actually choice-based. This inconsistency is discussed below, in “Problems with the Model”.

Table 2 presents the model of new auto class choice. The independent variables are the elements of $z(x_i, S)$ in (1). The second and third columns record the estimates and t-statistics, respectively, of the elements of $\beta$. The independent variables are fairly complex and require explanation.

Table 2
Columns: independent variable (the variable takes the described value in the alternatives listed in parentheses and zero in non-listed alternatives); estimated coefficient; t-statistic.

Notes to Table 2:
1. Initial auto cost is sticker price, taxes, and destination charges, in dollars. Income is in dollars per year.
2. Auto operating cost per mile is in cents per mile and is defined as the price of a gallon of gasoline divided by the auto's miles per gallon. The price for each city is from the Oil and Gas Journal.
3. Weighted seats is a variable which gives a weight of one for each seat in an auto up to the number of persons in the household and a weight of one-half for each seat in an auto more than the number of persons in the household. Thus, a household with three members will have a value of 2 for auto classes with 2 seats, a value of 3 for auto classes with 3 seats, and a value of 3.5 for auto classes with 4 seats.
4. Weight is in hundreds of pounds. Age is coded as follows:
   1 = teens
   2 = twenties
   3 = thirties
   4 = forties
   5 = fifties
   6 = sixties and above.
5. Weight is in hundreds of pounds. Education is coded as follows:
   1 = high school not complete
   2 = high school complete
   3 = one to three years of college
   4 = four or more years of college.
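As a rough illustration of eqn (1) and of two of the variable definitions in the notes to Table 2, the following sketch computes the weighted-seats variable, the operating cost per mile, and the MNL choice probabilities. It is only a sketch under stated assumptions: the function names are ours, the gasoline price is assumed to be expressed in cents per gallon, and the utility values are hypothetical numbers chosen for illustration, not estimates from Table 2.

```python
import math


def weighted_seats(seats, household_size):
    """Weight of 1 per seat up to household size, 1/2 per seat beyond it."""
    used = min(seats, household_size)
    extra = max(seats - household_size, 0)
    return used + 0.5 * extra


def operating_cost_per_mile(gas_price_cents_per_gallon, miles_per_gallon):
    """Cents per mile: price of a gallon of gasoline divided by miles per gallon."""
    return gas_price_cents_per_gallon / miles_per_gallon


def mnl_probabilities(utilities):
    """Eqn (1): P_i = exp(V_i) / sum_j exp(V_j), with V_i = beta'z(x_i, S).

    Subtracting the maximum utility first avoids overflow and leaves
    the probabilities unchanged.
    """
    v_max = max(utilities)
    exps = [math.exp(v - v_max) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical representative utilities for three auto classes
# (illustrative values only, not taken from the estimated model).
V = [0.2, -0.1, 0.5]
P = mnl_probabilities(V)
```

The weighted-seats function reproduces the worked case in the notes to Table 2 (a three-person household facing classes with 2, 3 and 4 seats). Under eqn (1) the ratio of any two probabilities depends only on the two utilities involved, which is the property examined in “Problems with the Model”.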
Auto weight (in hundreds of pounds)
where WEIGHT and EDUCATION take the values given in the row and column heads.
the car. But the degree of positive slope increases greatly with age: an increase in car weight has a much stronger positive influence on old people than it does on young people.

The “performance” variable approximates the relative acceleration of different cars; and again we have allowed for a possible non-linear relationship. The results seem quite reasonable: for any given age class of people, performance has a positive slope; i.e. an increase in the performance of autos in a particular class increases the probability that the class will be selected. But the degree of positive slope goes down with increasing age: young people are much more influenced by an increase in performance than older people.

In summary, the flexible form allowed by our various non-linear interaction terms has enabled us to discover a number of complex, but highly plausible interactions between household characteristics and car choices.

A dummy variable was included for each auto class† to capture the common effects, on all consumers, of variables which are not included in the model. In particular, these variables capture the effects of the relative prestige and comfort of the autos in each class. Since these variables cannot be measured, they cannot enter the model directly.

†A dummy variable was not included for auto class 1 since doing so would produce an identification problem. Not including the variable is equivalent to normalizing the representative utility function such that the coefficient of this variable is zero.

The “auto operating cost” variable

Auto purchase price was divided by income to allow the importance of cost to vary with income, but auto operating costs were not divided by income. The reason we did not treat the two kinds of costs in a parallel fashion was that we could not do so. A variety of alternative specifications to accomplish this were tried, but they simply did not yield significant results. In particular, entering “auto cost per mile divided by income” rather than “auto cost per mile” decreased the log likelihood of the model; the t-statistic for this new term was only 0.49. This indicates that the importance of auto cost per mile does not vary with the inverse of income.‡

‡The correct specification test is to include both variables in a more general model and test the hypothesis that the coefficient of the “auto cost divided by income” variable is zero. This procedure was not possible, however, because the capacity of the computer was insufficient to allow estimation of the model of Table 2 with an extra variable added.

It was thought that, perhaps, households consider auto cost per month when choosing an auto rather than the cost per mile. That is, the importance of auto cost per mile varies with the vehicle miles traveled by a household. To test the proposition, the model of Table 2 was estimated with “auto cost per month” replacing “auto cost per mile”, the former being defined as the product of the latter and the vehicle miles traveled by the household. This model again attained a lower log likelihood than the model of Table 2, and the t-statistic for the coefficient of “auto cost per month” was only 1.25.

Finally, to test whether the importance of auto cost per mile varies with both vehicle miles traveled and income, the model of Table 2 was estimated with “auto cost per month divided by income” replacing “auto cost per mile”. This model also attained a lower log likelihood than the model of Table 2, and the t-statistic for the coefficient of “auto cost per month divided by income” was only 0.18.

PROBLEMS WITH THE MODEL

Several problems limit the plausibility and consequently the applicability of the model. First is the problem of the restrictions implied by multinomial logit (MNL). The MNL model, expressed in eqn (1), assumes that the ratio of the probabilities of choosing any two alternatives is independent of the availability or attributes of other alternatives. This property is called the independence from irrelevant alternatives (IIA) property and can be demonstrated as follows. Consider the ratio of the probability of a person choosing alternative i to that of choosing alternative k, given that set C of alternatives is available:

$$\frac{P(i \mid C)}{P(k \mid C)} = \frac{e^{\beta' z(x_i, S)} \big/ \sum_{j \in C} e^{\beta' z(x_j, S)}}{e^{\beta' z(x_k, S)} \big/ \sum_{j \in C} e^{\beta' z(x_j, S)}} = \frac{e^{\beta' z(x_i, S)}}{e^{\beta' z(x_k, S)}}$$

This ratio is constant for any C which contains i and k (including, of course, the set containing only i and k) and any attributes of alternatives (except i and k) in C.

The model of Table 2 seems particularly likely to violate the IIA assumption. For example, it seems doubtful that the probability of choosing a class 9 auto over a class 3 auto remains the same whether or not the possibility of owning a class 8 auto exists. If the probability is not constant, then the property of independence from irrelevant alternatives does not hold and the MNL model is inappropriate.

Two factors mitigate the severity of this problem. First, it has recently been found that if alternative-specific constants are included as explanatory variables when an MNL model is being estimated, then these constants partially “correct” for violations of the IIA property. That is, if an MNL model is estimated in an application in which the IIA property does not hold, then the estimated values of the alternative-specific constants are automatically adjusted partially to correct for this problem. (See Train (1977) for a full discussion.) Since the model of Table 2 was estimated with an alternative-specific constant for each auto class, the problem of IIA is less severe than it would be if such constants were not included.

The second reason to discount the problem of the IIA property is based on empirical testing. McFadden et al. (1976) developed methods for testing the IIA property of MNL models. These tests were applied to the model of Table 2, and in all cases the model passed. That is, the hypothesis that the IIA property holds was not rejected with any of the tests, indicating that perhaps the IIA property does not present a problem in this application. (The details of these tests are available from the authors.) However, it must be mentioned that the power of these tests seems to be low. Consequently, it is quite possible that violations of the IIA property exist in this application but were not detected by the tests.

Another problem with the model concerns the method by which the model was estimated. As mentioned above, a stratified sampling procedure was used to obtain the households upon which the model was estimated. The stratification was based on auto size so as to obtain an equal number of households who had purchased small, medium and large cars. As a result, households were selected on the basis of their chosen auto, rather than on the basis of some variable which is exogenous to the decision process being modeled.

Manski and Lerman (1976) show that the estimation method given in McFadden (1973), which was developed for use with exogenously chosen samples (and was used for estimating the model of Table 2), is not appropriate if the sample was chosen on the basis of the household's chosen alternatives. They demonstrate two differences between the maximum likelihood estimator which is appropriate when the sample is choice-based and the McFadden estimator. First, the estimated alternative-specific constants are different with the two estimators (though all the other estimated parameters are the same). Second, the estimated standard errors of all the parameters are different with McFadden's method than with the method which is appropriate for choice-based samples.

As a result of these findings, the estimated alternative-specific constants and the t-statistics in Table 2 should be viewed with caution. Unfortunately, a software routine with the appropriate correction for choice-based samples was not available at the time the model was estimated.

A last, and fundamental, problem with the model lies in the fact that most of the auto attributes do not vary over the population. That is, the weight, size, horsepower, number of seats, and so on, of a particular auto type are the same for all households in all parts of the country.† Because of this, auto attributes cannot enter directly into a model which has alternative-specific constants for each alternative (since the attributes are collinear with the constants). The only way an auto attribute which does not vary over the population can enter the model is by interacting with some characteristic of the household (which does vary over the population, of course) and/or by removing one or more of the alternative-specific constants. Both of these approaches have drawbacks.

†Initial cost varies because of differences in taxes and destination charges. Operating cost varies because of differences in the price of gasoline.

First consider the approach of removing one of the constants. Say that auto weight is included as an explanatory variable and that the constant for auto class 10 is removed to allow the weight variable to enter. The estimated coefficient of the weight variable would be exactly equal to the constant which had previously been estimated for class 10 autos divided by the difference between the weight of class 10 and class 1 autos (since the constant for class 1 autos is normalized to zero). All of the constants for the other auto classes would be adjusted such that the sum of the new constant and the weight term would equal the previously-estimated constant. All the other estimated parameters would remain the same with this change in model specification.

Two points are important with regard to this approach. First, the coefficient of the weight variable can be calculated without actually entering it. Second, the coefficient of the weight variable would be different depending on which alternative-specific constant is removed.

This discussion shows that entering an auto attribute by removing one constant tells us nothing about the value of that attribute to consumers. It is simply a different, but equivalent, method for entering an alternative-specific constant.

If two or more alternative-specific constants are
removed and one attribute is entered, it is possible that the estimated coefficient of this variable contains meaningful information. The more constants that are removed when one attribute is entered, the more information might be contained in the estimated coefficient. However, two problems occur in this regard. First, as mentioned above, the alternative-specific constants are useful in correcting for violations of the IIA property. If they are removed (and an equal number of attributes are not entered), then violations of the IIA property could cause important problems for the model. Second, the auto type choice model contains only nine alternative-specific constants (the tenth is normalized to zero). Consequently, even if all the constants were removed, few auto attributes could be entered with meaningful results.†

†MNL models have been estimated that include several attributes of the alternatives which do not vary over the population; destination choice models in urban travel demand analysis are an example. However, these models describe choice situations for which there are numerous alternatives (and alternative-specific constants are not included). For the model of auto type choice, there are only ten alternatives.

Because of the drawbacks involved with adding auto attributes by eliminating alternative-specific constants, the approach was adopted of interacting the attributes with the socioeconomic characteristics of the household. This approach allows the auto attributes to enter the model without eliminating the constants, but it has other drawbacks. The variation which occurs in an explanatory variable that is defined as an interaction of an auto attribute with a socioeconomic variable is entirely due to variation in the socioeconomic variable. Consequently, it is questionable whether the estimated coefficient of such an explanatory variable contains any information about the effect of the attribute on the choice of the decision-maker, rather than the effect of the decision-maker's tastes as captured by his socioeconomic characteristics.

The basic question is simply: how much can be learned about the effect of changes in auto characteristics from a sample in which such attributes do not vary? No simple, definitive answer to that question is available at the moment; and hence the estimated coefficients must be used with some degree of caution.

IMPLICATIONS OF THE MODEL

We have begun the lengthy process of exploring alternative policy scenarios with the aid of the model, and have two preliminary results: the effects of an increase in gasoline taxes, and the effects of an excise tax on larger cars. Utilizing the model of Table 2, we calculate each household's individual probabilities of buying a new car in each of the ten car classes. When these probabilities are summed across all households in the sample we obtain the first column of Table 4, the initial market shares of new car types.

To calculate the effects of a change in gasoline prices we make a 10% increase in the variable “auto operating cost per mile” for each separate household, and sum across households. These results are shown in the second column. The strong gainers are the subcompacts, classes 1 and 3; and the biggest losers are the intermediate and large cars, classes 7-10. The market share of compacts, classes 5 and 6, remains about the same, in apparent terms. But what is happening is that the increase in gasoline prices is shifting the entire profile of car choices downward. That is, the results are compatible with the notion that the tax causes people to move down one or two classes, rather than making a major shift: some luxury car buyers shift down to class 9; some intermediate buyers shift down to class 6; the net downshift from the compacts is balanced by the net downshift into the compacts; and more people are piling up at the bottom end, in the smallest cars.

The results of a 10% excise tax on intermediate, large and luxury cars are shown in Column 3. The excise tax has