
Spatial Transferability of Neural Network Models

in Travel Demand Modeling


Liang Tang 1; Chenfeng Xiong 2; and Lei Zhang 3

Abstract: Neural network (NN) models have been widely used in travel demand modeling in recent years. However, there are few studies about the spatial transferability of NN models. In this paper, the spatial transferability of NN models in travel demand modeling, especially in mode choice models, is analyzed. This paper first discusses the performance of naïve transfer when no data are available in an application context. Then, a NN model adaptation method is proposed using the classification adjustment weight vector when limited local data are available. Using the 2007/2008 Transportation Planning Board–Baltimore Metropolitan Council Household Travel Survey data, five NN models are built using trips within five areas in the Washington, DC, and Baltimore regions. Each of the five NN models is applied to the other four areas to study spatial transferability using both individual-level and aggregate-level performance measures. The results show that the naïve transfer of NN models can perform very well between areas that share many similarities. They also indicate that the transferability of NN models is not symmetric. The performance of the proposed adaptation method is evaluated for different sample sizes of local training data. For transfer between areas that have significant differences, the proposed NN model adaptation method can improve performance significantly, even with a small sample size, compared to naïve transfer. DOI: 10.1061/(ASCE)CP.1943-5487.0000752. © 2018 American Society of Civil Engineers.

Downloaded from ascelibrary.org by Tufts University on 02/21/18. Copyright ASCE. For personal use only; all rights reserved.

1 Graduate Research Assistant, Dept. of Civil and Environmental Engineering, Univ. of Maryland, 1173 Glenn Martin Hall, College Park, MD 20742. ORCID: https://orcid.org/0000-0003-4138-1543. E-mail: liang@umd.edu
2 Assistant Research Professor, Dept. of Civil and Environmental Engineering, Univ. of Maryland, 1173 Glenn Martin Hall, College Park, MD 20742. E-mail: cxiong@umd.edu
3 Herbert Rabin Distinguished Professor, Dept. of Civil and Environmental Engineering, Univ. of Maryland, 1173 Glenn Martin Hall, College Park, MD 20742 (corresponding author). E-mail: lei@umd.edu
Note. This manuscript was submitted on May 9, 2017; approved on October 16, 2017; published online on February 10, 2018. Discussion period open until July 10, 2018; separate discussions must be submitted for individual papers. This paper is part of the Journal of Computing in Civil Engineering, © ASCE, ISSN 0887-3801.
J. Comput. Civ. Eng., 2018, 32(3): 04018010

Introduction

Travel demand models are important tools in the transportation planning process, and they are used to analyze people’s travel behaviors and predict travel demand changes in different scenarios. However, these travel demand models usually need various data as input from different kinds of travel surveys, like household travel surveys (HHTSs), stated preference (SP) surveys, or global positioning system (GPS) surveys. These surveys are usually very time-consuming and effort-consuming to conduct. For those regions that want to do some transportation analysis but lack those survey data, spatial transfer sometimes is a more practical and efficient solution than spending money and time in conducting travel surveys and building local travel demand models. Spatial transfer refers to the practice that applies a model to an area other than the estimation context. Spatial transfer can save significant time and money because it does not require any data from application contexts, or it needs only limited data to update the transferred model. Because of these benefits, spatial transfer has become a common practice when resources are limited (Rossi and Bhat 2014; Sikder et al. 2013b). Many studies have been conducted to study the transferability of travel demand models, especially for the most widely used logit-based models (Bowman et al. 2014; Ziemke et al. 2015; Yasmin et al. 2015; Sikder et al. 2013a; Wafa et al. 2015; Bowman and Bradley 2017). Based on data availability, people can use different transfer methods. When no local data are available, naïve transfer is the only available transferring method, which means directly applying the model from the estimated context to the application context without changing the model specification or parameters. When limited local data are available, people can use local data to update the parameters of the transferred models. Different transfer methods have been proposed and studied for logit models, including naïve transfer, transfer scaling, Bayesian updating, combined transfer estimator, and joint context estimation (Karasmaa 2007; Xiong et al. 2015; Xiong and Zhang 2013). Karasmaa (2007) studied the spatial transferability of mode and destination choice models; both were logit models. Different transfer methods (transfer scaling, Bayesian updating, combined transfer, or joint context estimation) were compared using different sample sizes, and the researchers concluded that joint context estimation gives the best prediction performance in almost all cases. Detailed suggestions have been provided about how to conduct a successful spatial transfer for logit models (Rossi and Bhat 2014).

In recent years, computational intelligence (CI) methods, which are based on learning, adaptation, evolution, and fuzzy logic, are more and more used in transportation fields. Neural networks (NNs) are one of the most popular CI models and have been successfully applied to travel demand modeling in different choice dimensions (Shmueli et al. 1996; Sayed and Razavi 2000). Many studies have been conducted applying NNs in mode choice modeling. Hensher and Ton (2000) compared the predictive capability of neural network models and nested logit models in the context of a commuter mode choice problem. The results showed that nested logit models are better at matching the overall market share, whereas neural network models are better at predicting an individual’s travel mode. Cantarella and de Luca (2005) discussed how to successfully apply the multilayer feedforward network (MLFFN) to support travel demand analysis, like trip generation, trip distribution, and modal split. Then, they built MLFFN models to analyze transportation mode choice and compare with random utility
models. They concluded that MLFFN models may outperform random utility models when the values of model shares are quite similar. Xie et al. (2003) compared decision trees, neural networks, and the multinomial logit model in mode choice modeling and concluded that the neural network model gives superior prediction performance in most cases. Zhang and Xie (2008) compared three different kinds of models for travel mode choice modeling: the multinomial logit model, multilayer feedforward neural network model, and support vector machine model. They found that the support vector machine model performed better than the multinomial logit and multilayer feedforward neural network models. Researchers also applied NN in other choice dimensions of travel demand modeling. AMOS, an activity-based microsimulation model system, uses neural network models to model individual responses to various transportation control measures, which is the travel demand management response option generator module of AMOS (Pendyala et al. 1997). Transportation control measures considered in the paper include parking pricing, improved bicycle or pedestrian facilities, employer-supplied commuter vouchers, congestion pricing and travel time reduction, and so on. Under different transportation control measures, a neural network model is used to predict an individual’s responses, which can be one of eight possible options: do nothing different, change the departure time to work or school, walk to work or school, bicycle to work or school, take a car or van pool to work or school, take transit to work or school, work at home, and other. Pang et al. (1999) used a fuzzy neural approach, which is a fuzzy system using a particularly structured neural network, to model the decision-making process of an individual’s route choice. The proposed approach is adaptive to the decision making of the driver, which shows the driver’s preference. Dia and Panwai (2007) used neural network models to analyze commuters’ route choice behavior in response to traveler information systems. Dia and Panwai (2009) then compared the performance of neural network models to discrete choice models. The results showed that neural network models perform better than binary probit and logit models. Longhi et al. (2005) used artificial neural networks to predict the growth rate of total regional employment at a certain time. Mozolin et al. (2000) compared the performance of multilayer perceptron neural networks and maximum-likelihood doubly constrained models for commuter trip distribution. The results showed that NN models may fit data better, but their predictive accuracy is poor in comparison to that of maximum-likelihood doubly constrained models. Tillema et al. (2006) applied neural networks in trip distribution modeling and compared the results to doubly constrained gravity models. They concluded that neural networks outperform gravity models when data are scarce. Shmueli et al. (1996) used neural networks to predict the number of work, leisure, and maintenance trips and total number of trips. Sarvareddy et al. (2005) tried two neural network models, the backpropagation and the fully recurrent neural network, for truck trip generation using vessel freight data. Zhou et al. (2007) explored the application of backpropagation neural networks to travel demand analysis. They first applied backpropagation neural networks to model trip generation, trip distribution, and mode choice separately. Then, they discussed integrated models that can be built in two ways: a simple combination of separate models or a multilayer backpropagation network model. Mostafa (2004) used neural network models to forecast maritime traffic flows and compared the model performance with the performance of the autoregressive integrated moving average (ARIMA) model. The results showed that neural network models generally perform better than the ARIMA model.

As shown in previous discussion, NN models have been applied in problems like mode choice, trip generation, trip distribution, route choice, and modeling travelers’ behavior under different policies or traveler information systems. Many studies have been conducted to compare NN models with other models, like logit models, doubly constrained gravity models, and support vector machine models, depending on the application context. Sometimes NN models win, and sometimes they lose. However, it is widely acknowledged that the NN model is a good alternative method for different transportation problems because of its ability to work with massive amounts of multidimensional data, its modeling flexibility, its learning and generalization ability, its adaptability, and its good predictive ability (Karlaftis and Vlahogianni 2011). Although many studies have been conducted applying NN models in different transportation problems, most of these previous studies focus on the accuracy or performance in applying NN models to various transportation problems. Few studies have been conducted to study the transferability of NN models. One such study is Hensher and Ton (2000). This study focused on commuters’ mode choice and compared the generalization ability between NN and nested logit models using SP data from Sydney and Melbourne. Naïve transfer was used in the study. The researchers found that NN has prediction ability comparable to the nested logit model and concluded that there is no clear indication which approach is better. Another study that analyzed the transferability of CI models in travel demand modeling was Arentze et al. (2002). They tested the spatial transferability of different components of Albatross, which is a rule-based model. Choice rules trained from Hendrik-Ido-Ambacht were applied directly to Voorhout and Apeldoorn, which is essentially naïve transfer. The researchers concluded that the overall spatial transferability of the model was satisfactory, except for mode choice models.

When studying model transferability, various assessment measures are used, representing three different aspects: model equivalence between the transferred and local models, predictive ability at either the individual or aggregate level, and sensitivity analysis. Depending on whether one is assessing the performance of the transferred model relative to the model estimated based on local data, these measures can also be categorized as absolute or relative measures. Rossi and Bhat (2014) gave a more detailed summary of assessment measures used for evaluating model transferability.

Through the literature review, it is found that few studies consider the spatial transferability of NN models in travel demand modeling. The few existing studies simply compare the naïve transferability between the NN and logit models with no discussion about what leads to a NN model with better transferability. Additionally, there is a gap in how to update the NN model when some local data are available. Although several transfer methods have been proposed and studied, including transfer scaling, Bayesian updating, combined transfer estimator, and joint context estimation, they are targeted for traditional statistical models like logit-based models and have not been used on NN models. No study has been done to adapt NN models using available local data in travel demand modeling. In this paper, the spatial transferability of NN models in travel demand modeling, especially in mode choice models, will be analyzed. The paper first studies the naïve transferability of the NN models when no data are available in an application context. Then, a NN model adaptation method is proposed when limited local data are available. In order to test how the proposed NN model adaptation method performs for different sample sizes of local data, the NN model adaptation method is evaluated using five different sample sizes. Five areas in the Washington, DC, and Baltimore regions are chosen to conduct the transferability analysis: DC, Baltimore city, Baltimore County, Anne Arundel County, and Montgomery County. The 2007/2008 Transportation Planning Board–Baltimore Metropolitan Council Household Travel Surveys
are used to train the NN models and conduct the transferability analysis.

The rest of the paper is organized as follows. The next section introduces the NN models and the proposed NN model adaptation method. The “Data” section introduces the data and some descriptive statistics. The results of the spatial transferability analysis for the five NN models will be presented in the “Results” section. The last section concludes the main findings of the paper and discusses several possible future research directions.

Methodology

Neural Network Model

Different types of NN models have currently been developed and applied, like feedforward neural networks, recurrent neural networks, radial basis function networks, and so on. In this paper, one of the most widely used neural network structures is used: the feedforward neural network (FNN), which normally consists of an input layer, one or more hidden layers, and an output layer. In FNN, information moves in one direction: from the input nodes, through hidden nodes, and to the output nodes. In this paper, the single hidden-layer FNN or three-layer FNN is used with the structure shown in Fig. 1. NN models can be applied to solve both classification and regression problems. For K-class classification (K > 2), there are K output nodes in the output layer, with the kth node modeling the probability of class $k$ ($k = 1, 2, \ldots, K$). For binary classification problems, usually only one output node will be included in the output layer. For a regression problem, there is only one output node in the output layer.

Input information enters the NN model through the input layer. The input layer usually standardizes the input to avoid saturation and facilitate the training process (Hastie et al. 2009). Then, the hidden layer creates derived features $Z_m$ ($m = 1, 2, \ldots, M$) based on linear combinations of the $X_p$ ($p = 1, 2, \ldots, P$) from the input layer as Eq. (1). The derived features $Z_m$ then feed into the output layer to calculate the target $Y_k$ as Eq. (2)

$$Z_m = \sigma(v) = \sigma(\alpha_{0m} + \alpha_m^T X), \quad m = 1, \ldots, M \tag{1}$$

$$Y_k = g_k(T) = g_k(\beta_{0k} + \beta_k^T Z), \quad k = 1, \ldots, K \tag{2}$$

where $\alpha_{0m}$, $\alpha_m$, $\beta_{0k}$, and $\beta_k$ = unknown parameters in NN models, often called weights.

The function $\sigma(v)$ is called the activation function, where the sigmoid function $\sigma(v) = 1/(1 + e^{-v})$ is usually used. The function $g_k(T)$ is called the output function, where the softmax function $g_k(T) = e^{T_k} / \sum_{l=1}^{K} e^{T_l}$ is usually used for K-class classification. The softmax function produces positive values of $Y_k$ that sum to one. In application, the predicted class $k$ would be the one with the largest value of $Y_k$

$$k = \arg\max_i Y_i, \quad i = 1, \ldots, K \tag{3}$$

When fitting the NN model, one needs to find a set of weights to fit the training data well. In classification problems, cross-entropy is usually used as the measure of fit

$$R(\theta) = -\sum_{n=1}^{N} \sum_{k=1}^{K} t_{kn} \ln y_k(x_n) \tag{4}$$

where $N$ = training sample size; and $t_{kn}$ = true output value of the sample $n$, $t_{kn} = 1$ if sample $n$ belongs to class $k$, $t_{kn} = 0$ otherwise. The backpropagation algorithm is the most commonly used training algorithm for NN models (Wasserman 1989).

There are several issues that need to be taken care of when training the NN model. Before starting training, the structure of the NN model needs to be determined, which means modelers need to decide the number of hidden layers and number of nodes in each hidden layer. The three-layer FNN is the most commonly used NN structure in travel demand modeling and has shown good performance (Sayed and Razavi 2000; Xie et al. 2003; Zhang and Xie 2008). Therefore, this study will continue using the three-layer FNN structure. Different numbers of nodes in the hidden layer will be tried, and the one with best performance based on cross-validation will be taken.

The second issue is the overfitting problem. The model may fit the training data very well but have very poor generalization ability if modelers try to achieve the global minimum value of the cross-entropy $R(\theta)$. Early stopping (Caruana et al. 2001) is adopted in this study to deal with this problem, which means that the model is trained only for a while before it reaches the global optimum. Validation data are usually used to determine when to stop because it is preferred to stop training when the validation error starts to increase.

The third issue in NN training is the multiple minima problem. Because the error function $R(\theta)$ is nonconvex, there will be many local minima, and the final trained NN will be quite dependent on the choice of starting weights. A common practice is to try several starting weights and choose the solution with the best performance.

Another issue is the imbalanced data set problem. If NN models are directly trained on the training data set, there are two assumptions made (Provost 2000):
1. The goal is to maximize accuracy; and
2. The classifier will operate on a data set drawn from the same distribution as the training data.
However, these assumptions may not be true. For example, for the mode choice problem, sometimes modelers may not care that much about predicting each individual accurately and care more about getting the overall market shares of different modes correctly. Sometimes the target of interest is a minor class like bus and identifying bus users, or sometimes the sample data are not representative of the whole population. If either of the two assumptions is violated, the imbalanced data set may cause a problem that needs to be handled. The imbalanced data set issue can be solved by instance reweighting or resampling (Elkan 2001; Lin et al. 2002; Du Plessis and Sugiyama 2014).
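To make the single hidden-layer FNN of Eqs. (1) to (4) concrete, the following is a minimal pure-Python sketch of the forward pass, the argmax prediction rule, and the cross-entropy measure of fit. It is an illustration under stated assumptions, not the implementation used in this study: all function names and the toy weights are hypothetical, labels are passed as class indices rather than as the indicator values $t_{kn}$, and the optional `class_weights` argument with inverse-frequency weights is only one possible form of the instance reweighting mentioned above.

```python
import math


def sigmoid(v):
    """Activation function sigma(v) = 1 / (1 + e^{-v})."""
    return 1.0 / (1.0 + math.exp(-v))


def softmax(t):
    """Output function g_k(T) = e^{T_k} / sum_l e^{T_l}.

    Subtracting max(t) does not change the result and avoids overflow.
    """
    t_max = max(t)
    exps = [math.exp(t_k - t_max) for t_k in t]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, alpha0, alpha, beta0, beta):
    """Eqs. (1)-(2): derived features Z_m, then class outputs Y_k.

    alpha0[m], alpha[m][p]: hidden-layer intercepts and weights
    beta0[k],  beta[k][m]:  output-layer intercepts and weights
    """
    z = [sigmoid(a0 + sum(a * x_p for a, x_p in zip(a_m, x)))
         for a0, a_m in zip(alpha0, alpha)]
    t = [b0 + sum(b * z_m for b, z_m in zip(b_k, z))
         for b0, b_k in zip(beta0, beta)]
    return softmax(t)


def predict(y):
    """Eq. (3): index of the class with the largest output Y_k."""
    return max(range(len(y)), key=lambda k: y[k])


def cross_entropy(probs, labels, class_weights=None):
    """Eq. (4) with labels given as class indices.

    class_weights is NOT part of Eq. (4); it is an assumed extension that
    scales each observation by the weight of its true class.
    """
    total = 0.0
    for y, k in zip(probs, labels):
        w = 1.0 if class_weights is None else class_weights[k]
        total -= w * math.log(y[k])
    return total


def inverse_frequency_weights(labels, n_classes):
    """Assumed reweighting rule: weight_k = N / (K * N_k)."""
    counts = [0] * n_classes
    for k in labels:
        counts[k] += 1
    n = len(labels)
    return [n / (n_classes * c) if c else 0.0 for c in counts]


if __name__ == "__main__":
    # Toy network with P = 2 inputs, M = 2 hidden nodes, K = 3 classes.
    y = forward([0.5, -1.0],
                alpha0=[0.0, 0.1], alpha=[[1.0, -1.0], [0.5, 0.5]],
                beta0=[0.0, 0.0, 0.0],
                beta=[[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
    print(y, predict(y))
```

The outputs of `forward` are positive and sum to one, so they can be read as class probabilities; passing `inverse_frequency_weights(...)` to `cross_entropy` makes errors on rare classes count more, which is one way to address a minor class such as bus.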

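Two of the training practices described in this section, early stopping on validation data and trying several starting weights, can be sketched generically as follows. This is a hedged illustration rather than the procedure used in this study: the helper names, the `patience` rule (stop after a fixed number of epochs without improvement in validation loss), and the one-dimensional toy losses are all assumptions introduced for the example. The `step` argument stands for one pass of any training algorithm, such as backpropagation.

```python
def train_with_early_stopping(init_weights, step, val_loss,
                              max_epochs=200, patience=5):
    """Train with `step` and return the weights with the lowest validation loss.

    step(weights)     -> updated weights after one training epoch
    val_loss(weights) -> loss on held-out validation data
    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs (an assumed stopping rule).
    """
    weights = init_weights
    best_weights, best_loss = weights, val_loss(weights)
    epochs_without_improvement = 0
    for _ in range(max_epochs):
        weights = step(weights)
        loss = val_loss(weights)
        if loss < best_loss:
            best_weights, best_loss = weights, loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_weights, best_loss


def best_of_several_starts(make_start, step, val_loss, n_starts=5, **kwargs):
    """Repeat training from several starting weights and keep the best run."""
    runs = [train_with_early_stopping(make_start(), step, val_loss, **kwargs)
            for _ in range(n_starts)]
    return min(runs, key=lambda run: run[1])


if __name__ == "__main__":
    # Toy example: the training loss (w - 3)^2 and validation loss (w - 2)^2
    # disagree, so training to convergence would overshoot the validation optimum.
    step = lambda w: w - 0.1 * 2.0 * (w - 3.0)   # gradient step on training loss
    val_loss = lambda w: (w - 2.0) ** 2
    starts = iter([0.0, 2.5, 5.0])
    print(best_of_several_starts(lambda: next(starts), step, val_loss,
                                 n_starts=3, patience=3))
```

In the toy run, training from 0 keeps improving the training loss while the validation loss first falls and then rises, so the returned weights are the ones from just before the validation loss starts to increase.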
Fig. 1. Single hidden-layer FNN (input layer $X_1, X_2, \ldots, X_P$; hidden layer $Z_1, Z_2, \ldots, Z_M$; output layer $Y_1, Y_2, \ldots, Y_K$)

NN models can easily be used to solve transportation problems. As mentioned in the introduction, NN models have been successfully applied in problems like mode choice, trip generation, trip distribution, route choice, and modeling travelers’ behavior under
different policies or traveler information systems. When defining a NN model for a discrete choice problem like a mode choice problem, one input node is introduced for each independent variable (like sociodemographic variables, attributes of different alternatives, etc.) and one output node for each alternative in the choice set. On the other hand, when defining a NN model for a continuous problem like trip generation, only one output node is needed.

Model Transfer Method

When transferring a NN model from the estimation context to the application context, it is necessary to check the aforementioned two assumptions. If both assumptions stand, which means that the goal of the model application is to maximize accuracy and the application context is similar to the estimation context, the model transferability is enhanced. Here, a similar application context means that its data follow a distribution similar to that of the estimation context in different aspects, like socioeconomic makeup and mode share. If both assumptions stand, a simple naïve transfer can work very well, and it is the simplest way for model transfer and requires minimal resources. It directly applies the model from the estimated context to the application context without changing the model specification or parameters and does not require any local data. However, usually, the second assumption is violated and the distribution of the estimation data is different from the data in the application context, which will cause a systematic bias. When some local data are available, the systematic bias can be reduced by updating the parameters of the transferred model using local data. Previous studies have proposed several parameter-updating methods, including transfer scaling, Bayesian updating, combined transfer estimator, and joint context estimation. However, these methods are targeted toward traditional statistical models like logit-based models. No study has been done to adapt NN models using available local data in travel demand modeling.

In the field of freeway incident detection, FNN models are usually adapted through adjusting the output threshold value (Srinivasan et al. 2004). However, this method is applicable only for binary classification problems. For most travel demand modeling problems, there are usually more than two alternatives in the choice set. In order to handle this problem, this paper proposes a new model adaptation method for FNN models. The proposed model adaptation method works for any discrete choice problems, like mode choice, route choice, modeling travelers’ behavior under different policies, or any other transportation problems with discrete dependent variables.

Here, the classification adjustment weight vector is introduced

$$W = (w_1 \cdots w_k \cdots w_K) \tag{5}$$

When predicting, instead of choosing $k$ with the largest value of $Y_k$, $k$ with the largest value of $w_k Y_k$ is chosen

$$k = \arg\max_i (w_i Y_i), \quad i = 1, \ldots, K \tag{6}$$

If all classification adjustment weights $w_k$ equal one, it is the base case in which the $k$ with the largest value of $Y_k$ is chosen. When some local data are available, the classification adjustment weight vector can be tuned to reflect the local characteristics. Because only the relative values between those weights make sense, any of the weights can be fixed to be one. Here, $w_1$ is assumed to be one; thus, the classification adjustment weight vector is

$$W = (1 \cdots w_k \cdots w_K) \tag{7}$$

When $w_k > 1$, the classifier is more in favor of the kth class compared to the first class; when $w_k < 1$, the classifier is more in favor of the first class. In order to prevent adjusting the classifier too much in one direction and to save computation time, upper and lower bounds can be set when searching for suitable values of the classification adjustment weights.

By introducing the classification adjustment weight vector, the classifier’s classification ability is adjusted based on the prevalence of different classes in the local data, which essentially reduces the systematic bias of the different data distributions between the estimation and application contexts.

Measures

In this study, the spatial transferability of NN models is measured mainly based on the predictive ability at either the individual or aggregate level. Some measures from previous literature about the spatial transferability of logit models are used, as well as some others that are more suitable for NN models. These measures can be used to evaluate discrete choice models.

The individual-level measures assess the ability of the transferred model to predict individual-level outcomes in the application context. Two individual-level measures are used in the study: the hit ratio (HT) and overall prediction accuracy (ACC)

$$HT_k = \frac{N_{pk}}{N_k} \tag{8}$$

$$ACC = \frac{\sum_k N_{pk}}{\sum_k N_k} \tag{9}$$

where $N_{pk}$ = number of correctly predicted individual observations for choice alternative $k$; and $N_k$ = number of actual observations for choice alternative $k$.

The aggregate measures assess the ability of the transferred model to predict aggregate travel behaviors in the application context. Three aggregate measures are used in this study: the mean absolute relative error measure (REM), the root mean square error (RMSE), and the relative aggregate transfer error (RATE). REM measures the aggregate predictive ability of the transferred model for each choice alternative. RMSE gives an overall measurement for all choice alternatives. RATE compares the predictive ability of the transferred model to the local model

$$REM_k = \frac{PS_k - OS_k}{OS_k} \tag{10}$$

$$RMSE = \left( \frac{\sum_k PS_k \times REM_k^2}{\sum_k PS_k} \right)^{1/2} \tag{11}$$

$$RATE = \frac{RMSE_i(\beta_j)}{RMSE_i(\beta_i)} \tag{12}$$

where $PS_k$ and $OS_k$ = predicted shares and observed shares, respectively, for a choice alternative $k$; $RMSE_i(\beta_j)$ = RMSE of the transferred model applied to the application context data; and $RMSE_i(\beta_i)$ = RMSE of the local model applied to the application context data.

Data

The 2007/2008 Transportation Planning Board–Baltimore Metropolitan Council Household Travel Survey (TPB-BMC HHTS) is used in this study, which covers approximately 14,000 households
in the Washington, DC, and Baltimore regions (National Capital Region Transportation Planning Board Metropolitan Washington Council of Governments 2010). Representative households were asked to complete a travel diary that documented the activities of all household members on a randomly assigned weekday. Various information was collected, including trip, person, household, and vehicle information. In order to study the spatial transferability, trips within five regions are used: District of Columbia (DC) and Montgomery County (MG) in the Washington region, and Baltimore city (BAL), Baltimore County (BC), and Anne Arundel County (AA) in the Baltimore region. Trips in other areas and cross-region trips are not considered in this study.

In order to build mode choice models, alternative-specific variables are required for each travel mode. However, TPB-BMC HHTS records travel information only for the mode actually taken. Additional information is needed to represent the level of service for all alternative modes. In this study, travel time skim matrices from the Metropolitan Washington Council of Governments (MWCOG) Travel Forecasting Model version 2.3 are used for DC and MG, and travel time skim matrices from the Baltimore Region Travel Demand Model version 4.0 are used for BAL, BC, and AA. Four travel modes are considered when modeling travelers’ mode choice decisions: transit, driving (auto driver), carpool (auto passenger), and walk or bike, which represents the primary travel mode. Travel times of transit, driving, and carpool are provided by the travel time skim matrices. For walk or bike, travel time is estimated using reported trip distance divided by speed. The speed is exogenously defined as 4.87 mi/h.

Explanatory variables used in this study include age; income; driver license; employment; housing tenure; number of workers in household; number of vehicles; number of bikes; trip purpose; trip distance; travel time for each alternative mode; and some transportation benefit policies employers provide to employees, like free parking, share parking cost, subsidies for transit, and so on. These explanatory variables are summarized in Table 1.

Table 1. Explanatory Variables
Variable | Definition | Values
Age | Age in years | Numeric value
Income | Household income | 1 = less than $50,000; 2 = $50,000–$99,999; 3 = $100,000–$149,999; 4 = $150,000 or more
Lic | Have driver license? (persons 16+) | 1 = yes; 2 = no; −9 = not applicable
Emply | Currently employed? | 1 = yes; 2 = no; −9 = not applicable
Tenure | Housing tenure | 1 = owned; 2 = rented
Workers | Number of workers in household | Numeric value
Vehicles | Number of vehicles in the household | Numeric value
Bikes | Number of bicycles in the household | Numeric value
Opurp | Origin trip purpose | 01 = home; 02 = work; 04 = shop; 08 = school; 09 = other; 11 = drop off or pick up someone
Dpurp | Destination trip purpose | 01 = home; 02 = work; 04 = shop; 08 = school; 09 = other; 11 = drop off or pick up someone
Time | Travel time for each mode | Continuous (min)
Free parking | Employer provides free parking | 1 = yes; 2 = no; −9 = not applicable
Share parking | Employer and employee share parking cost | 1 = yes; 2 = no; −9 = not applicable
Subsidies | Employer provides subsidies for transit or vanpooling | 1 = yes; 2 = no; −9 = not applicable
Ride home | Guaranteed ride home available to employee | 1 = yes; 2 = no; −9 = not applicable

Table 2 summarizes some descriptive statistics of the recorded trips within the five study areas. DC and BAL are higher-density urban areas, whereas MG, BC, and AA are lower-density areas. Significant differences can be observed between the higher-density and lower-density areas. For example, in DC and BAL, there are many transit and walk or bike trips, whereas in MG, BC, and AA, driving is dominant and constitutes more than 64% of the trips. People in DC and BAL have shorter trips and more HBW trips compared to other areas. Additionally, in DC and BAL, there are fewer elders and a lower number of vehicles and bikes in each household. More people in these two areas rent houses.

There are differences in some aspects between DC and BAL. For example, BAL has many more low-income people, almost 30% of the total sample. This may be because BAL has the lowest average number of workers per household among these five regions. DC has more transit and walk or bike trips than BAL, which may be because DC has better transit facilities and trips in DC are shorter.

The three lower-density areas, MG, BC, and AA, are very similar in many aspects like age, employment, house ownership, number of vehicles, number of bikes, trip purpose, trip distance, modal share, and so on. Thus, models are expected to be more transferable between these three areas because they share so many similarities.

Results

NN Model Training

The data of each area are first randomly separated into two parts: 80% for the training data set and 20% for the test data set. The training data set is used for NN model training, whereas the test data set is used to evaluate model transferability.

In this study, it is assumed that the goal of the mode choice model is to maximize accuracy. For each region, the sample data set is assumed to be representative of the whole population. With these two assumptions, the imbalanced data set problem is not that problematic and is not handled in the case study.

Before starting NN training, the structure of the NN model needs to be specified, which means determining the number of hidden layers and the number of nodes in each hidden layer. As discussed in the “Methodology” section, a three-layer FNN model with one hidden layer is used in this study. In order to choose the number of nodes in the hidden layer, different numbers of nodes are tried, from 4 to 20. For each case, 10-fold cross-validation is used to evaluate the model performance. In 10-fold cross-validation, the training samples are first arbitrarily divided into 10 subsets. Then, one of the subsets is left out as the validation data for testing the model, and the NN model is trained using the samples in the remaining nine subsets. The process is repeated 10 times with each of

© ASCE 04018010-5 J. Comput. Civ. Eng.

J. Comput. Civ. Eng., 2018, 32(3): 04018010


Table 2. Descriptive Statistics

| Categories | DC | MG | BAL | BC | AA |
|---|---|---|---|---|---|
| Sample size | 8,968 | 6,355 | 5,114 | 5,620 | 4,521 |
| Age <18 years | 9.5% | 19.9% | 14.4% | 14.6% | 15.8% |
| Age 18–29 years | 12.8% | 7.4% | 12.7% | 8.8% | 8.0% |
| Age 30–64 years | 65.4% | 56.2% | 59.4% | 56.2% | 55.6% |
| Age >65 years | 12.2% | 16.5% | 13.4% | 20.4% | 20.6% |
| Income < $30,000 | 10.6% | 7.7% | 28.8% | 10.5% | 8.0% |
| Income $30,000–$74,999 | 27.6% | 32.3% | 36.5% | 42.3% | 33.4% |
| Income ≥ $75,000 | 61.8% | 60.0% | 34.7% | 47.3% | 58.6% |
| Driver | 82.6% | 77.5% | 73.2% | 84.2% | 84.8% |
| Employed | 68.7% | 54.4% | 59.2% | 54.0% | 53.4% |
| Own the house | 68.9% | 83.5% | 67.5% | 83.6% | 89.1% |
| Average number of workers | 1.3 | 1.4 | 1.2 | 1.3 | 1.4 |
| Average number of vehicles | 1.3 | 1.9 | 1.4 | 2.1 | 2.2 |
| Average number of bikes | 1.2 | 1.6 | 1.0 | 1.4 | 1.6 |
| HBW | 16.3% | 11.7% | 18.4% | 12.9% | 11.2% |
| HBSCH | 3.6% | 5.9% | 7.3% | 4.5% | 3.6% |
| HBS | 12.8% | 18.5% | 14.8% | 18.1% | 20.1% |
| HBO | 26.2% | 36.7% | 32.0% | 31.6% | 33.5% |
| NHW | 22.8% | 17.2% | 13.7% | 21.1% | 20.0% |
| Average trip distance (miles) | 1.8 | 3.2 | 2.3 | 4.2 | 4.3 |
| Transit | 18.3% | 2.4% | 10.8% | 0.7% | 0.4% |
| Driving | 30.2% | 64.5% | 49.3% | 72.4% | 71.4% |
| Carpool | 10.2% | 23.9% | 17.4% | 22.2% | 23.2% |
| Walk or bike | 41.3% | 9.2% | 22.5% | 4.7% | 5.1% |

Note: HBO = home-based other trips; HBS = home-based shopping trips; HBSCH = home-based school trips; HBW = home-based work trips; NHW = non–home-based work trips.

the 10 subsets used exactly once as the validation data. The average of the 10 validation error results is used as the final performance evaluation. When training the NN model, the early stopping technique is used to mitigate the problem of overfitting. Because of the multiple minima problem, for each training data set, the NN model is trained 10 times, and the one with the best performance is used. Fig. 2 shows the calculated 10-fold cross-validation error using different numbers of nodes in the hidden layer when training the BAL NN model. Here, the cross-entropy based on the validation data is used as the error measure. Based on this plot, the NN model has the best performance when the hidden layer has 17 nodes for the BAL training data. The number of nodes in the hidden layer is also determined for the DC, MG, BC, and AA training data in the same way, and is 18, 14, 9, and 16, respectively.

After specifying the number of nodes in the hidden layer, the NN model structure is determined. Then, five NN models are trained using the training data set of the five areas with early stopping and multiple runs.

Fig. 2. Number of hidden nodes selection for BAL NN model (x-axis: number of nodes, 4 to 20; y-axis: cross-validation error, 0.0875 to 0.09)

Naïve Transfer

The model estimated for each of the five areas is evaluated by the five test data sets. The performance for the test data of the same area represents the performance of the local model, whereas the performance for the test data of other areas represents the performance when the model is directly transferred to these areas. Because each of the five models is transferred to the other four areas, there are, in total, 20 transfers. Table 3 shows the performance of the 20 naïve transfers between the five areas, as well as the performance of the local model. Performance measures including HT, ACC, REM, RMSE, and RATE are presented in Table 3. For the DC test data, the BAL model has good performance based on ACC, RMSE, and RATE, whereas the MG, BC, and AA models are not transferable, with low ACC and very high RMSE and RATE. For the MG test data, the BC and AA models have very good performance, whereas the DC and BAL models do not perform very well. For the BAL test data, the MG and DC models perform better than the AA and BC models. For the BC test data, the MG and AA models perform better than the DC and BAL models, whereas the DC model has a very high RMSE value. For the AA test data, the MG and BC models have very good performance, whereas the DC model has a very high RMSE value. Table 4 shows the transferability ranking for each of the five areas when they are used as the application context. Tables 3 and 4 clearly show that the transferability between the three lower-density areas is very good, as expected, because they share similar demographic characteristics. The two higher-density urban areas have reasonably good transferability, but not as good as the transferability between the three lower-density areas, probably because DC and BAL have some significant differences, such as BAL having more low-income people and longer trips. In terms of ACC, the DC model has good transferability to MG, BC, and AA, whereas the MG, BC, and AA models have poor transferability to DC, which indicates that the transferability of NN models is not symmetric, similar to the logit models (Nowrouzian and Srinivasan 2012; Sikder and Pinjari 2013).
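The measures named above are defined in the “Methodology” section, which is not reproduced here. As a rough illustration only, the sketch below computes HT, ACC, REM, and RMSE from the naïve-transfer confusion matrix of the MG model applied to DC (Table 5). The formulas are assumptions: rows are taken as observed modes and columns as predicted modes, REM is the relative error of the predicted count against the observed count, and RMSE weights squared REM by predicted share. The function and variable names are hypothetical.

```python
import math

MODES = ["transit", "driving", "carpool", "walk_or_bike"]

# Naive-transfer confusion matrix, MG model applied to DC test data (Table 5).
# Assumption: rows are observed modes, columns are predicted modes.
CONFUSION = [
    [64, 116, 26, 9],
    [1, 369, 0, 8],
    [7, 51, 54, 7],
    [16, 343, 46, 106],
]


def transfer_measures(cm):
    """Return HT, ACC, REM, and RMSE as fractions (assumed definitions)."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    observed = [sum(row) for row in cm]
    predicted = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # HT: share of each observed mode that is predicted correctly.
    ht = [cm[i][i] / observed[i] for i in range(k)]
    # ACC: share of all trips that is predicted correctly.
    acc = sum(cm[i][i] for i in range(k)) / total
    # REM: relative error of the predicted count for each mode.
    rem = [(predicted[j] - observed[j]) / observed[j] for j in range(k)]
    # RMSE: root of the predicted-share-weighted mean of squared REM.
    shares = [p / total for p in predicted]
    rmse = math.sqrt(sum(s * r * r for s, r in zip(shares, rem)) / sum(shares))
    return {"HT": ht, "ACC": acc, "REM": rem, "RMSE": rmse}


def rate(rmse_transfer, rmse_local):
    """RATE, assumed to be transferred-model RMSE over local-model RMSE."""
    return rmse_transfer / rmse_local


measures = transfer_measures(CONFUSION)
```

Under these assumptions the sketch gives ACC of about 48.5% and RMSE of about 1.16, which agrees with the values reported for the MG model applied to DC. RATE for that transfer would need the unrounded local DC RMSE, so it is not evaluated here.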



Table 3. Naïve Transfer Performance

| Application context | Estimation context | HT Transit (%) | HT Driving (%) | HT Carpool (%) | HT Walk or bike (%) | ACC (%) | REM Transit (%) | REM Driving (%) | REM Carpool (%) | REM Walk or bike (%) | RMSE | RATE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DC | DC | 62.3 | 77.2 | 37.0 | 84.1 | 73.6 | −16.7 | 15.3 | −48.7 | 7.0 | 0.16 | 1.00 |
| DC | MG | 29.8 | 97.6 | 45.4 | 20.7 | 48.5 | −59.1 | 132.5 | 5.9 | −74.6 | 1.16 | 7.12 |
| DC | BAL | 52.1 | 77.8 | 37.0 | 76.1 | 68.6 | −27.4 | 25.4 | −32.8 | 0.4 | 0.20 | 1.25 |
| DC | BC | 0.9 | 98.9 | 47.9 | 5.7 | 37.8 | −98.6 | 162.4 | 50.4 | −90.4 | 1.49 | 9.13 |
| DC | AA | 0.0 | 95.8 | 48.7 | 32.1 | 47.8 | −100.0 | 127.5 | 25.2 | −58.1 | 1.10 | 6.75 |
| MG | DC | 54.5 | 85.4 | 50.0 | 73.2 | 75.2 | 104.5 | −3.4 | −47.5 | 112.2 | 0.58 | 2.78 |
| MG | MG | 45.5 | 99.6 | 67.8 | 31.7 | 84.4 | −31.8 | 16.8 | −19.8 | −57.3 | 0.21 | 1.00 |
| MG | BAL | 54.5 | 84.1 | 57.4 | 51.2 | 74.0 | 45.5 | −2.5 | −22.8 | 61.0 | 0.27 | 1.30 |
| MG | BC | 0.0 | 99.1 | 72.3 | 12.2 | 82.1 | −100.0 | 17.5 | −3.5 | −84.1 | 0.18 | 0.88 |
| MG | AA | 0.0 | 96.8 | 65.8 | 28.0 | 80.6 | −100.0 | 14.4 | −14.4 | −36.6 | 0.17 | 0.80 |
| BAL | DC | 66.7 | 78.2 | 28.7 | 83.1 | 70.9 | 20.2 | −9.4 | −63.4 | 45.9 | 0.33 | 2.17 |
| BAL | MG | 51.2 | 98.5 | 56.4 | 36.6 | 71.4 | −38.1 | 35.9 | −11.9 | −45.3 | 0.35 | 2.35 |
| BAL | BAL | 67.9 | 93.8 | 47.5 | 73.3 | 78.9 | −7.1 | 12.4 | −39.6 | 2.3 | 0.15 | 1.00 |
| BAL | BC | 4.8 | 99.4 | 61.4 | 10.5 | 60.5 | −94.0 | 44.7 | 65.3 | −80.8 | 0.53 | 3.51 |
| BAL | AA | 0.0 | 98.2 | 55.4 | 39.5 | 65.7 | −100.0 | 37.6 | 20.8 | −37.8 | 0.35 | 2.34 |
| BC | DC | 80.0 | 88.4 | 46.5 | 81.1 | 78.7 | 360.0 | −0.4 | −51.8 | 194.6 | 0.98 | 5.17 |
| BC | MG | 80.0 | 99.6 | 62.4 | 10.8 | 86.9 | 20.0 | 14.4 | −32.4 | −70.3 | 0.20 | 1.06 |
| BC | BAL | 80.0 | 82.1 | 67.1 | 35.1 | 76.5 | 120.0 | −6.9 | 18.2 | 2.7 | 0.18 | 0.96 |
| BC | BC | 0.0 | 98.9 | 63.5 | 13.5 | 86.3 | −100.0 | 13.9 | −28.2 | −64.9 | 0.19 | 1.00 |
| BC | AA | 0.0 | 98.2 | 62.9 | 10.8 | 85.5 | −100.0 | 12.6 | −29.4 | −40.5 | 0.18 | 0.94 |
| AA | DC | 80.0 | 86.8 | 41.5 | 79.4 | 75.5 | 180.0 | 0.5 | −56.5 | 211.8 | 0.94 | 4.08 |
| AA | MG | 40.0 | 99.3 | 57.1 | 5.9 | 83.6 | −20.0 | 18.6 | −35.4 | −79.4 | 0.23 | 1.02 |
| AA | BAL | 40.0 | 65.9 | 71.4 | 44.1 | 65.8 | −60.0 | −22.0 | 71.4 | −20.6 | 0.49 | 2.12 |
| AA | BC | 0.0 | 98.6 | 57.8 | 0.0 | 82.7 | −100.0 | 18.3 | −29.3 | −91.2 | 0.22 | 0.94 |
| AA | AA | 0.0 | 99.5 | 58.5 | 17.6 | 84.4 | −100.0 | 18.1 | −33.3 | −70.6 | 0.23 | 1.00 |

Note: The local models are the rows in which the estimation context equals the application context (set in bold in the original table).
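As a small worked example, the ACC and RMSE columns of Table 3 can be turned into a transferability ranking of the kind shown in Table 4. The sketch below does this for the DC application context; the dictionary and function names are hypothetical, and ties between equal rounded values (which occur for some other application contexts) would need the unrounded measures to resolve.

```python
# ACC (%) and RMSE of the four transferred models on the DC test data (Table 3).
DC_ACC = {"MG": 48.5, "BAL": 68.6, "BC": 37.8, "AA": 47.8}
DC_RMSE = {"MG": 1.16, "BAL": 0.20, "BC": 1.49, "AA": 1.10}


def rank(scores, higher_is_better):
    """Order estimation contexts from most to least transferable."""
    return sorted(scores, key=scores.get, reverse=higher_is_better)


acc_ranking = rank(DC_ACC, higher_is_better=True)     # ACC: larger is better
rmse_ranking = rank(DC_RMSE, higher_is_better=False)  # RMSE: smaller is better
```

Both orderings put BAL first for DC, and they differ only in the order of MG and AA.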

Table 4. Transferability Ranking

| Application context | Transferability ranking based on ACC | Transferability ranking based on RMSE |
|---|---|---|
| DC | BAL ≫ MG > AA > BC | BAL ≫ AA > MG > BC |
| MG | BC > AA > DC > BAL | AA > BC > BAL > DC |
| BAL | MG > DC > AA > BC | DC > AA > MG > BC |
| BC | MG > AA > DC > BAL | AA > BAL > MG ≫ DC |
| AA | MG > BC > DC > BAL | BC > MG > BAL ≫ DC |

Model Adaptation

Because part of the transfer error comes from sampling the local data or random variation (Karasmaa 2007), it would be interesting to test how the proposed NN model adaptation method performs for different sizes of local data. In this study, the proposed NN model adaptation method is applied using five different local data sample sizes: 100, 200, 300, 400, and 500. The model adaptation is conducted five times for each sample size. Each time, the specified size of data is randomly drawn from the local training data set and used to adapt the model. Because it is a four-class classification problem, there are four classification adjustment weights in the classification adjustment weight vector, W = (1, w2, w3, w4). The lower and upper bounds for each w are set as 0.2 and 5 to prevent adjusting the classifier too much toward one direction and to save computation time. The enumeration method is used to search for the optimal combination of the three free weights. The searching step is set as 0.1.

Tables 6 and 7 show the performance of the 20 transferred NN models after model adaptation using different sample sizes. The performance of naïve transfer is also included in the tables for comparison.

Table 5 shows the detailed results of the adapted model from MG to DC using 100 local observations, compared with naïve transfer. The model adaptation is conducted five times, and the average classification adjustment weight vector is shown in the table, as well as the confusion matrix, ACC, and HT calculated using the weight vector. A large weight, 1.74, is assigned to walk or bike mode, whereas small weights, 0.26 and 0.28, are assigned to driving and carpool, respectively. This is because DC has more transit

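Table 5. NN Model Transferred from MG to DC

| Transfer method | ACC (%) | Confusion matrix | Transit | Driving | Carpool | Walk or bike | Weight | HT (%) |
|---|---|---|---|---|---|---|---|---|
| Naïve transfer | 48.5 | Transit | 64 | 116 | 26 | 9 | — | 29.8 |
| | | Driving | 1 | 369 | 0 | 8 | — | 97.6 |
| | | Carpool | 7 | 51 | 54 | 7 | — | 45.4 |
| | | Walk or bike | 16 | 343 | 46 | 106 | — | 20.7 |
| Model adaptation | 65.9 | Transit | 87 | 89 | 15 | 24 | 1 | 40.5 |
| | | Driving | 4 | 295 | 0 | 79 | 0.26 | 78.0 |
| | | Carpool | 7 | 39 | 34 | 39 | 0.28 | 28.6 |
| | | Walk or bike | 18 | 94 | 9 | 390 | 1.74 | 76.3 |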
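The weights reported in Table 5 come from an enumeration search over the classification adjustment weight vector. The exact adjustment rule is given in the “Methodology” section and is not reproduced here, so the sketch below rests on a loudly stated assumption: each class probability from the transferred NN is multiplied by its weight, the class with the largest weighted score is predicted, and the weights that maximize accuracy on the local sample are kept. All function and parameter names are hypothetical.

```python
from itertools import product


def predict(prob_row, weights):
    """Predict the class with the largest weighted probability (assumed rule)."""
    scores = [w * p for w, p in zip(weights, prob_row)]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(probs, labels, weights):
    """Share of local observations predicted correctly under the weights."""
    hits = sum(predict(row, weights) == y for row, y in zip(probs, labels))
    return hits / len(labels)


def search_weights(probs, labels, lower=0.2, upper=5.0, step=0.1):
    """Enumerate W = (1, w2, ..., wK) on a grid and keep the most accurate one.

    probs  : class probabilities from the transferred NN, one row per trip
    labels : observed class index for each trip in the local sample
    """
    n_steps = round((upper - lower) / step)
    grid = [round(lower + i * step, 10) for i in range(n_steps + 1)]
    n_free = len(probs[0]) - 1  # the first weight is fixed at 1
    # Start from the naive transfer so the search never does worse locally.
    best_w = (1.0,) * (n_free + 1)
    best_acc = accuracy(probs, labels, best_w)
    for combo in product(grid, repeat=n_free):
        w = (1.0,) + combo
        acc = accuracy(probs, labels, w)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy local sample: the transferred model over-predicts class 1 (driving).
toy_probs = [
    [0.1, 0.5, 0.1, 0.3],
    [0.1, 0.6, 0.1, 0.2],
    [0.3, 0.4, 0.1, 0.2],
    [0.1, 0.7, 0.1, 0.1],
]
toy_labels = [3, 3, 0, 1]
naive_acc = accuracy(toy_probs, toy_labels, (1.0, 1.0, 1.0, 1.0))
toy_w, toy_acc = search_weights(toy_probs, toy_labels, step=0.8)
```

With the bounds and step used in the study (0.2 to 5 in steps of 0.1), the grid has 49 values per free weight, so a four-class problem needs 49^3 = 117,649 accuracy evaluations per run. The toy call uses a coarser step only to keep the example fast.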



Table 6. Model Adaptation Performance Based on ACC

| Application context | Estimation context | Naïve transfer | 100 | 200 | 300 | 400 | 500 |
|---|---|---|---|---|---|---|---|
| DC | MG | 0.48 | 0.66 (+) | 0.66 (+) | 0.66 (+) | 0.66 (+) | 0.66 (+) |
| DC | BAL | 0.69 | 0.68 (×) | 0.68 (×) | 0.69 (×) | 0.68 (×) | 0.69 (×) |
| DC | BC | 0.38 | 0.56 (+) | 0.55 (+) | 0.55 (+) | 0.55 (+) | 0.55 (+) |
| DC | AA | 0.48 | 0.56 (+) | 0.60 (+) | 0.60 (+) | 0.59 (+) | 0.58 (+) |
| MG | DC | 0.75 | 0.83 (+) | 0.83 (+) | 0.82 (+) | 0.81 (+) | 0.83 (+) |
| MG | BAL | 0.74 | 0.82 (+) | 0.81 (+) | 0.81 (+) | 0.81 (+) | 0.81 (+) |
| MG | BC | 0.82 | 0.82 (×) | 0.81 (×) | 0.82 (×) | 0.82 (×) | 0.81 (×) |
| MG | AA | 0.81 | 0.81 (×) | 0.82 (+) | 0.82 (+) | 0.82 (+) | 0.82 (+) |
| BAL | DC | 0.71 | 0.72 (+) | 0.72 (+) | 0.74 (+) | 0.74 (+) | 0.74 (+) |
| BAL | MG | 0.71 | 0.71 (×) | 0.73 (+) | 0.72 (×) | 0.73 (+) | 0.73 (+) |
| BAL | BC | 0.61 | 0.62 (+) | 0.62 (+) | 0.64 (+) | 0.64 (+) | 0.64 (+) |
| BAL | AA | 0.66 | 0.67 (+) | 0.69 (+) | 0.69 (+) | 0.69 (+) | 0.69 (+) |
| BC | DC | 0.79 | 0.84 (+) | 0.85 (+) | 0.85 (+) | 0.85 (+) | 0.86 (+) |
| BC | MG | 0.87 | 0.85 (−) | 0.87 (×) | 0.87 (×) | 0.86 (×) | 0.87 (×) |
| BC | BAL | 0.77 | 0.81 (+) | 0.81 (+) | 0.81 (+) | 0.81 (+) | 0.82 (+) |
| BC | AA | 0.86 | 0.85 (×) | 0.86 (×) | 0.86 (×) | 0.86 (×) | 0.86 (×) |
| AA | DC | 0.76 | 0.82 (+) | 0.82 (+) | 0.83 (+) | 0.83 (+) | 0.83 (+) |
| AA | MG | 0.84 | 0.84 (×) | 0.84 (×) | 0.84 (×) | 0.84 (×) | 0.84 (×) |
| AA | BAL | 0.66 | 0.78 (+) | 0.78 (+) | 0.78 (+) | 0.78 (+) | 0.79 (+) |
| AA | BC | 0.83 | 0.82 (×) | 0.83 (×) | 0.82 (×) | 0.83 (×) | 0.83 (×) |
| Number of performance increases compared to naïve transfer | | | 12 | 14 | 13 | 14 | 14 |
| Number of performance decreases compared to naïve transfer | | | 1 | 0 | 0 | 0 | 0 |

Note: (+) indicates the ACC of the adapted model is larger than the ACC of the naïve transferred model plus 0.01; (−) indicates the ACC of the adapted model is less than the ACC of the naïve transferred model minus 0.01; (×) indicates that the adapted model has similar performance to the naïve transferred model.

Table 7. Model Adaptation Performance Based on RMSE

| Application context | Estimation context | Naïve transfer | 100 | 200 | 300 | 400 | 500 |
|---|---|---|---|---|---|---|---|
| DC | MG | 1.16 | 0.32 (+) | 0.31 (+) | 0.30 (+) | 0.25 (+) | 0.26 (+) |
| DC | BAL | 0.20 | 0.22 (−) | 0.22 (−) | 0.23 (−) | 0.28 (−) | 0.16 (+) |
| DC | BC | 1.49 | 0.34 (+) | 0.42 (+) | 0.40 (+) | 0.37 (+) | 0.36 (+) |
| DC | AA | 1.10 | 0.43 (+) | 0.25 (+) | 0.29 (+) | 0.32 (+) | 0.37 (+) |
| MG | DC | 0.58 | 0.19 (+) | 0.20 (+) | 0.19 (+) | 0.17 (+) | 0.19 (+) |
| MG | BAL | 0.27 | 0.19 (+) | 0.18 (+) | 0.17 (+) | 0.15 (+) | 0.16 (+) |
| MG | BC | 0.18 | 0.19 (×) | 0.20 (−) | 0.21 (−) | 0.21 (−) | 0.19 (×) |
| MG | AA | 0.17 | 0.20 (−) | 0.21 (−) | 0.20 (−) | 0.21 (−) | 0.22 (−) |
| BAL | DC | 0.33 | 0.17 (+) | 0.15 (+) | 0.18 (+) | 0.18 (+) | 0.18 (+) |
| BAL | MG | 0.35 | 0.29 (+) | 0.26 (+) | 0.32 (+) | 0.20 (+) | 0.20 (+) |
| BAL | BC | 0.53 | 0.43 (+) | 0.39 (+) | 0.34 (+) | 0.36 (+) | 0.22 (+) |
| BAL | AA | 0.35 | 0.24 (+) | 0.32 (+) | 0.30 (+) | 0.26 (+) | 0.29 (+) |
| BC | DC | 0.98 | 0.29 (+) | 0.16 (+) | 0.19 (+) | 0.25 (+) | 0.21 (+) |
| BC | MG | 0.20 | 0.17 (+) | 0.22 (−) | 0.22 (−) | 0.23 (−) | 0.20 (×) |
| BC | BAL | 0.18 | 0.38 (−) | 0.38 (−) | 0.42 (−) | 0.51 (−) | 0.51 (−) |
| BC | AA | 0.18 | 0.16 (+) | 0.19 (−) | 0.20 (−) | 0.21 (−) | 0.19 (−) |
| AA | DC | 0.94 | 0.27 (+) | 0.23 (+) | 0.23 (+) | 0.24 (+) | 0.22 (+) |
| AA | MG | 0.23 | 0.23 (×) | 0.24 (×) | 0.24 (×) | 0.24 (×) | 0.23 (×) |
| AA | BAL | 0.49 | 0.13 (+) | 0.13 (+) | 0.15 (+) | 0.14 (+) | 0.15 (+) |
| AA | BC | 0.22 | 0.23 (−) | 0.22 (×) | 0.21 (×) | 0.22 (−) | 0.24 (−) |
| Number of performance increases compared to naïve transfer | | | 14 | 12 | 12 | 12 | 13 |
| Number of performance decreases compared to naïve transfer | | | 4 | 6 | 6 | 6 | 4 |

Note: (+) indicates the RMSE of the adapted model is less than the RMSE of the naïve transferred model minus 0.01, that is, performance improves; (−) indicates the RMSE of the adapted model is larger than the RMSE of the naïve transferred model plus 0.01; (×) indicates that the adapted model has similar performance to the naïve transferred model.
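The markers in Tables 6 and 7 follow a simple threshold rule, sketched below under the assumption, implied by the table entries, that a higher ACC and a lower RMSE both count as an improvement. The function name and the `higher_is_better` flag are hypothetical. Because the tables print rounded values while the markers were presumably assigned from unrounded ones, applying the rule to the printed numbers will not reproduce every marker, in particular where the printed difference is exactly 0.01.

```python
def mark(naive, adapted, higher_is_better, tol=0.01):
    """Classify an adapted model against naive transfer with a 0.01 threshold."""
    gain = (adapted - naive) if higher_is_better else (naive - adapted)
    if gain > tol:
        return "+"  # adapted model is clearly better
    if gain < -tol:
        return "-"  # adapted model is clearly worse
    return "×"      # similar performance


# ACC examples from Table 6 (higher is better).
acc_mg_to_dc = mark(0.48, 0.66, higher_is_better=True)
acc_mg_to_bc = mark(0.87, 0.85, higher_is_better=True)
# RMSE examples from Table 7 (lower is better).
rmse_mg_to_dc = mark(1.16, 0.32, higher_is_better=False)
rmse_bal_to_bc = mark(0.18, 0.38, higher_is_better=False)
rmse_mg_to_aa = mark(0.23, 0.23, higher_is_better=False)
```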

and walk or bike trips and fewer driving and carpool trips compared to MG. By doing this, the adapted model is able to predict transit and walk or bike with much higher accuracy, thus achieving higher overall prediction accuracy.

Table 6 shows the performance evaluated by ACC. Each ACC value of the adapted model of different sizes represents the mean value of five runs. It shows that the adaptation method can improve the individual-level performance of NN models most of the time. For those transfers between areas where there exist many differences and naïve transfer does not work very well, the model adaptation method will improve the accuracy significantly, even with a small sample size, such as MG transferred to DC (ACC increased from 0.48 to 0.66 with sample size of 100), BC transferred to DC (ACC increased from 0.38 to 0.56 with sample size of 100), AA transferred to DC (ACC increased from 0.48 to 0.56 with sample size of 100), DC transferred to BC (ACC increased from 0.79 to



0.84 with sample size of 100), and DC transferred to AA (ACC increased from 0.76 to 0.82 with sample size of 100). For transfer between areas that share many similarities and for which naïve transfer works well, the gain of using model adaptation is not as significant, such as transfer between DC and BAL and transfers among MG, BC, and AA. Generally speaking, there is not much difference between the adapted models using different sizes of local data, which means that even a small local sample can significantly improve the model transferability if the sample is representative of the local data. However, when the sample size is smaller, the randomness of local data sampling will have an increasing impact, which will affect the robustness of the adapted model.

Table 7 shows the performance evaluated by RMSE. Each RMSE value of the adapted model of different sizes represents the mean value of five runs. Similar conclusions can be reached as those for ACC. As Table 7 shows, the adapted model can improve significantly, even with a small sample size, for those transfers between areas where there exist many differences and for which naïve transfer does not work very well, such as in the case of MG transferred to DC (RMSE decreased from 1.16 to 0.32 with sample size of 100), BC transferred to DC (RMSE decreased from 1.49 to 0.34 with sample size of 100), AA transferred to DC (RMSE decreased from 1.10 to 0.43 with sample size of 100), DC transferred to MG (RMSE decreased from 0.58 to 0.19 with sample size of 100), DC transferred to BC (RMSE decreased from 0.98 to 0.29 with sample size of 100), and DC transferred to AA (RMSE decreased from 0.94 to 0.27 with sample size of 100). The gain of using model adaptation for transfer between similar areas is not as significant, such as transfers among MG, BC, and AA. There is not much difference between the adapted models using different sizes of local data. Even a small size of local data can improve the performance significantly. However, in some cases, the adapted models may perform worse than the naïve transfer based on RMSE. This is because the proposed adaptation method is targeted to maximize accuracy in the application context instead of matching the market share. Model transferability in terms of aggregate level may be harmed sometimes.

Conclusions

In this paper, the spatial transferability of NN models in travel demand modeling, especially in mode choice models, is analyzed. The paper first discusses the naïve transferability of NN models when no data are available in the application context. Then, a NN model adaptation method is proposed using the classification adjustment weight vector when limited data from the application context are available. The performance of the adaptation method is evaluated for different sample sizes. Using the 2007/2008 TPB-BMC HHTS data, five NN models are built using trips within five areas: DC and MG from the Washington, DC, region, and BAL, BC, and AA from the Baltimore region. Cross-validation is used to choose the number of nodes in the hidden layer. When training the NN models, the early stopping technique is used to avoid overfitting, and multiple runs are used to deal with the multiple minima problem. Each of the five NN models built is applied to four other areas to test the performance of naïve transfer. Different performance measures are used to evaluate the transferability, including individual-level measures like HT and ACC, as well as aggregate measures like REM, RMSE, and RATE. After analyzing the performance of naïve transfer, it is found that the model transferability is good between areas that share many similarities in characteristics like land use, travel modal share, and so on. In the five study areas, the transferability between the three lower-density areas is very good. The two higher-density urban areas, DC and BAL, have reasonably good transferability, but not as good as the transferability among the three lower-density areas because DC and BAL have some significant differences, such as BAL having more low-income people and longer trips. It is also noticed that the transferability of NN models is not symmetric.

The proposed NN model adaptation method is used to adapt the transferred models using five different local data sample sizes: 100, 200, 300, 400, and 500. The performances of the adapted models are evaluated by the individual-level measure ACC and aggregate measure RMSE. Similar conclusions can be drawn from individual-level and aggregate measures. The proposed NN model adaptation method yields significant gains for transfers between the areas where there exist many differences and for which naïve transfer does not work very well, even with small sample size, as long as the local sample is representative. For transfer between areas that are very similar, the gain of using model adaptation is not as significant. Because the proposed adaptation method is targeted to maximize accuracy in the application context, the model transferability in terms of individual-level performance is boosted most of the time, whereas the model transferability in terms of aggregate level may be harmed sometimes.

In this study, the lower and upper bounds for each classification adjustment weight are set as 0.2 and 5 to prevent adjusting the classifier too much in one direction and to save computation time. If no bound is set for searching the classification adjustment weight, it would be expected that those classes with few observations in the local training data may be neglected, especially when the local sample size is small. In the future, it would be interesting to study how different settings of the lower and upper bounds will affect the performance of the adapted NN models. Additionally, this study focuses on the transferability of NN models without comparing them to other models, like logit models, random forest, support vector machine, and so on. How different methods differ in their spatial transferability is an interesting research direction. Moreover, in this paper, the three lower-density areas, MG, BC, and AA, are considered more “similar” compared to DC and BAL through qualitatively comparing aspects like age, employment, house ownership, number of vehicles, number of bikes, trip purpose, trip distance, modal share, and so on. The NN models of these three regions are also shown to have higher transferability among each other. However, no quantitative definition is given to define the level of similarity between different regions. How to quantitatively define the level of similarity and how the level of similarity will affect model transferability will be studied in future research. Finally, the 2007/2008 TPB-BMC HHTS data used in this study are slightly outdated. The 2017/2018 household travel survey for the TPB-BMC regions will start in September 2017. According to MWCOG, the data will be ready for modelers and practitioners in 2019. This new data set will be explored in future studies.

Acknowledgments

This research is financially supported by the National Science Foundation CAREER Award Project “Reliability as an Emergent Property of Transportation Networks” and U.S. Federal Highway Administration Exploratory Advanced Research Program. The authors are solely responsible for the statements in the paper.

References

Arentze, T., Hofman, F., Van Mourik, H., and Timmermans, H. (2002). “Spatial transferability of the albatross model system: Empirical evidence from two case studies.” Transp. Res. Rec., 1805, 1–7.



Bowman, J. L., and Bradley, M. (2017). “Testing spatial transferability of activity-based travel forecasting models.” Proc., 96th Transportation Research Board Annual Meeting, Washington, DC.
Bowman, J. L., Bradley, M., Castiglione, J., and Yoder, S. L. (2014). “Making advanced travel forecasting models affordable through model transferability.” Proc., 93rd Transportation Research Board Annual Meeting, Washington, DC.
Cantarella, G. E., and de Luca, S. (2005). “Multilayer feedforward networks for transportation mode choice analysis: An analysis and a comparison with random utility models.” Transp. Res. Part C: Emerg. Technol., 13(2), 121–155.
Caruana, R., Lawrence, S., and Giles, L. (2001). “Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping.” Advances in Neural Information Processing Systems 13: Proc., 2000 Conf., MIT Press, Cambridge, MA, 402.
Dia, H., and Panwai, S. (2007). “Modelling drivers’ compliance and route choice behaviour in response to travel information.” Nonlinear Dyn., 49(4), 493–509.
Dia, H., and Panwai, S. (2009). “Evaluation of discrete choice and neural network approaches for modelling driver compliance with traffic information.” Transportmetrica, 6(4), 1–22.
Du Plessis, M. C., and Sugiyama, M. (2014). “Semi-supervised learning of class balance under class-prior change by distribution matching.” Neural Networks, 50, 110–119.
Elkan, C. (2001). “The foundations of cost-sensitive learning.” Proc., 17th Int. Joint Conf. on Artificial Intelligence, Vol. 17, Morgan Kaufmann Publishers, Inc., San Francisco, 973–978.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). “Overview of supervised learning.” The elements of statistical learning, Springer, New York, 9–41.
Hensher, D., and Ton, T. (2000). “A comparison of the predictive potential of artificial neural networks and nested logit models for commuter mode choice.” Transp. Res. Part E: Logist. Transp. Rev., 36(3), 155–172.
Karasmaa, N. (2007). “Evaluation of transfer methods for spatial travel demand models.” Transp. Res. Part A: Policy Pract., 41(5), 411–427.
Karlaftis, M. G., and Vlahogianni, E. I. (2011). “Statistical methods versus neural networks in transportation research: Differences, similarities and some insights.” Transp. Res. Part C: Emerg. Technol., 19(3), 387–399.
Lin, Y., Lee, Y., and Wahba, G. (2002). “Support vector machines for classification in nonstandard situations.” Mach. Learn., 46(1–3), 191–202.
Longhi, S., Nijkamp, P., Reggianni, A., and Maierhofer, E. (2005). “Neural network modeling as a tool for forecasting regional employment patterns.” Int. Reg. Sci. Rev., 28(3), 330–346.
Mostafa, M. (2004). “Forecasting the Suez Canal traffic: A neural network analysis.” Marit. Policy Manage., 31(2), 139–156.
Mozolin, M., Thill, J. C., and Lynn Usery, E. (2000). “Trip distribution forecasting with multilayer perceptron neural networks: A critical evaluation.” Transp. Res. Part B: Methodol., 34(1), 53–73.
National Capital Region Transportation Planning Board Metropolitan Washington Council of Governments. (2010). “2007/2008 TPB household travel survey: Technical documentation.” 〈http://www.mwcog.org/uploads/committee-documents/Zl5YWV5W20100903131244.pdf〉 (Aug. 27, 2010).
Nowrouzian, R., and Srinivasan, S. (2012). “Empirical analysis of spatial transferability of tour-generation models.” Transp. Res. Rec., 2302, 14–22.
Pang, G. K. H., Takahashi, K., Yokota, T., and Takenaga, H. (1999). “Adaptive route selection for dynamic route guidance system based on fuzzy-neural approaches.” IEEE Trans. Veh. Technol., 48(6), 2028–2041.
Pendyala, R. M., Kitamura, R., Chen, C., and Pas, E. I. (1997). “An activity-based microsimulation analysis of transport control measures.” Transp. Policy, 4(3), 183–192.
Provost, F. (2000). “Machine learning from imbalanced data sets 101.” Proc., AAAI’2000 Workshop on Imbalanced Data Sets, AAAI, Menlo Park, CA, 1–3.
Rossi, T. F., and Bhat, C. R. (2014). “Guide for travel model transfer.” FHWA Research Rep. No. FHWA-HEP-15-006, Transportation Research Board, Washington, DC.
Sarvareddy, P., Al-Deek, H. M., Klodzinski, J., and Anagnostopoulos, G. (2005). “Evaluation of two modeling methods for generating heavy-truck trips at an intermodal facility by using vessel freight data.” Transp. Res. Rec., 1906(1), 113–120.
Sayed, T., and Razavi, A. (2000). “Comparison of neural and conventional approaches to mode choice analysis.” J. Comput. Civ. Eng., 10.1061/(ASCE)0887-3801(2000)14:1(23), 23–30.
Shmueli, D., Salomon, I., and Shefer, D. (1996). “Neural network analysis of travel behavior: Evaluating tools for prediction.” Transp. Res. Part C: Emerg. Technol., 4(3), 151–166.
Sikder, S., Augustin, B., Pinjari, A. R., and Eluru, N. (2013a). “Spatial transferability of tour-based time-of-day choice models: An empirical assessment.” Proc.–Soc. Behav. Sci., 104(2429), 640–649.
Sikder, S., and Pinjari, A. R. (2013). “Spatial transferability of person-level daily activity generation and time use models: Empirical assessment.” Transp. Res. Rec., 2343, 95–104.
Sikder, S., Pinjari, A. R., Srinivasan, S., and Nowrouzian, R. (2013b). “Spatial transferability of travel forecasting models: A review and synthesis.” Int. J. Adv. Eng. Sci. Appl. Math., 5(2–3), 104–128.
Srinivasan, D., Jin, X., and Cheu, R. L. (2004). “Evaluation of adaptive neural network models for freeway incident detection.” IEEE Trans. Intell. Transport. Syst., 5(1), 1–11.
Tillema, F., van Zuilekom, K. M., and van Maarseveen, M. F. A. M. (2006). “Comparison of neural networks and gravity models in trip distribution.” Comput.-Aided Civ. Infrastruct. Eng., 21(2), 104–119.
Wafa, Z., Bhat, C. R., Pendyala, R. M., and Garikapati, V. M. (2015). “A latent-segmentation based approach to investigating the spatial transferability of activity-travel models.” Transp. Res. Rec., 2493, 136–144.
Wasserman, P. D. (1989). Neural computing, Van Nostrand Reinhold, New York.
Xie, C., Lu, J., and Parkany, E. (2003). “Work travel mode choice modeling with data mining: decision trees and neural networks.” Transp. Res. Rec., 1854, 50–61.
Xiong, C., Yang, D., Chen, X., and Zhang, L. (2015). “Model transferability of hidden Markov models and a Bayesian approach to recalibrating travel demand models.” Presentation at the 14th Int. Conf. on Travel Behavior Research, The International Association for Travel Behaviour Research, Windsor, U.K.
Xiong, C., and Zhang, L. (2013). “A descriptive Bayesian approach to modeling and calibrating drivers’ En route diversion behavior.” IEEE Transac. Intell. Transp. Syst., 14(4), 1817–1824.
Yasmin, F., Morency, C., and Roorda, M. J. (2015). “Assessment of spatial transferability of an activity-based model, TASHA.” Transp. Res. Part A: Policy Pract., 78, 200–213.
Zhang, Y., and Xie, Y. (2008). “Travel mode choice modeling with support vector machines.” Transp. Res. Rec., 2076, 141–150.
Zhoul, Q., Lu, H., and Xu, W. (2007). “New travel demand models with back-propagation network.” Proc., 3rd Int. Conf. on Natural Computation, Vol. 3, IEEE, New York, 311–317.
Ziemke, D., Nagel, K., and Bhat, C. R. (2015). “Integrating CEMDAP and MATSim to increase the transferability of transport demand models.” Transp. Res. Rec., 2493, 117–125.
