Spatial Transferability of Neural Network Models
Abstract: Neural network (NN) models have been widely used in travel demand modeling in recent years. However, there are few studies
about the spatial transferability of NN models. In this paper, the spatial transferability of NN models in travel demand modeling, especially in
mode choice models, is analyzed. This paper first discusses the performance of naïve transfer when no data are available in an application
Downloaded from ascelibrary.org by Tufts University on 02/21/18. Copyright ASCE. For personal use only; all rights reserved.
context. Then, a NN model adaptation method is proposed using the classification adjustment weight vector when limited local data are
available. Using the 2007/2008 Transportation Planning Board—Baltimore Metropolitan Council Household Travel Survey data, five NN
models are built using trips within five areas in the Washington, DC, and Baltimore regions. Each of the five NN models is applied to the other
four areas to study spatial transferability using both individual-level and aggregate-level performance measures. The result shows that the
naïve transfer of NN models can perform very well between areas that share many similarities. It also indicates the transferability of NN
models is not symmetric. The performance of the proposed adaptation method is evaluated for different sample sizes of local training data. For
transfer between areas that have significant differences, the proposed NN model adaptation method can improve performance significantly,
even with a small sample size, compared to naïve transfer. DOI: 10.1061/(ASCE)CP.1943-5487.0000752. © 2018 American Society of
Civil Engineers.
1 Graduate Research Assistant, Dept. of Civil and Environmental Engineering, Univ. of Maryland, 1173 Glenn Martin Hall, College Park, MD 20742. ORCID: https://orcid.org/0000-0003-4138-1543. E-mail: liang@umd.edu
2 Assistant Research Professor, Dept. of Civil and Environmental Engineering, Univ. of Maryland, 1173 Glenn Martin Hall, College Park, MD 20742. E-mail: cxiong@umd.edu
3 Herbert Rabin Distinguished Professor, Dept. of Civil and Environmental Engineering, Univ. of Maryland, 1173 Glenn Martin Hall, College Park, MD 20742 (corresponding author). E-mail: lei@umd.edu
Note. This manuscript was submitted on May 9, 2017; approved on October 16, 2017; published online on February 10, 2018. Discussion period open until July 10, 2018; separate discussions must be submitted for individual papers. This paper is part of the Journal of Computing in Civil Engineering, © ASCE, ISSN 0887-3801.

Introduction

Travel demand models are important tools in the transportation planning process, and they are used to analyze people’s travel behaviors and predict travel demand changes in different scenarios. However, these travel demand models usually need various data as input from different kinds of travel surveys, like household travel surveys (HHTSs), stated preference (SP) surveys, or global positioning system (GPS) surveys. These surveys are usually very time-consuming and effort-consuming to conduct. For those regions that want to do some transportation analysis but lack those survey data, spatial transfer sometimes is a more practical and efficient solution than spending money and time in conducting travel surveys and building local travel demand models. Spatial transfer refers to the practice that applies a model to an area other than the estimation context. Spatial transfer can save significant time and money because it does not require any data from application contexts, or it needs only limited data to update the transferred model. Because of these benefits, spatial transfer has become a common practice when resources are limited (Rossi and Bhat 2014; Sikder et al. 2013b). Many studies have been conducted to study the transferability of travel demand models, especially for the most widely used logit-based models (Bowman et al. 2014; Ziemke et al. 2015; Yasmin et al. 2015; Sikder et al. 2013a; Wafa et al. 2015; Bowman and Bradley 2017). Based on data availability, people can use different transfer methods. When no local data are available, naïve transfer is the only available transferring method, which means directly applying the model from the estimated context to the application context without changing the model specification or parameters. When limited local data are available, people can use local data to update the parameters of the transferred models. Different transfer methods have been proposed and studied for logit models, including naïve transfer, transfer scaling, Bayesian updating, combined transfer estimator, and joint context estimation (Karasmaa 2007; Xiong et al. 2015; Xiong and Zhang 2013). Karasmaa (2007) studied the spatial transferability of mode and destination choice models; both were logit models. Different transfer methods (transfer scaling, Bayesian updating, combined transfer, or joint context estimation) were compared using different sample sizes, and the researchers concluded that joint context estimation gives the best prediction performance in almost all cases. Detailed suggestions have been provided about how to conduct a successful spatial transfer for logit models (Rossi and Bhat 2014).

In recent years, computational intelligence (CI) methods, which are based on learning, adaptation, evolution, and fuzzy logic, are more and more used in transportation fields. Neural networks (NNs) are one of the most popular CI models and have been successfully applied to travel demand modeling in different choice dimensions (Shmueli et al. 1996; Sayed and Razavi 2000). Many studies have been conducted applying NNs in mode choice modeling. Hensher and Ton (2000) compared the predictive capability of neural network models and nested logit models in the context of a commuter mode choice problem. The results showed that nested logit models are better at matching the overall market share, whereas neural network models are better at predicting an individual’s travel mode. Cantarella and de Luca (2005) discussed how to successfully apply the multilayer feedforward network (MLFFN) to support travel demand analysis, like trip generation, trip distribution, and modal split. Then, they built MLFFN models to analyze transportation mode choice and compare with random utility
mand management response option generator module of AMOS (Pendyala et al. 1997). Transportation control measures considered in the paper include parking pricing, improved bicycle or pedestrian facilities, employer-supplied commuter vouchers, congestion pricing and travel time reduction, and so on. Under different transportation control measures, a neural network model is used to predict individual’s responses, which can be one of eight possible options: do nothing different, change the departure time to work or school, walk to work or school, bicycle to work or school, take a car or van pool to work or school, take transit to work or school, work at home, and other. Pang et al. (1999) used a fuzzy neural approach, which is a fuzzy system using a particularly structured neural network, to model the decision-making process of an individual’s route choice. The proposed approach is adaptive to the decision making of the driver, which shows the driver’s preference. Dia and Panwai (2007) used neural network models to analyze commuters’ route choice behavior in response to traveler information systems. Dia and Panwai (2009) then compared the performance of neural network models to discrete choice models. The results showed that neural network models perform better than binary probit and logit models. Longhi et al. (2005) used artificial neural networks to predict the growth rate of total regional employment at a certain time. Mozolin et al. (2000) compared the performance of multilayer perceptron neural networks and maximum-likelihood doubly constrained models for commuter trip distribution. The results showed that NN models may fit data better, but their predictive accuracy is poor in comparison to that of maximum-likelihood doubly constrained models. Tillema et al. (2006) applied neural networks in trip distribution modeling and compared the results to doubly constrained gravity models. They concluded that neural networks outperform gravity models when data are scarce. Shmueli et al. (1996) used neural networks to predict the number of work, leisure, and maintenance trips and total number of trips. Sarvareddy et al. (2005) tried two neural network models, the backpropagation and the fully recurrent neural network, for truck trip generation using vessel freight data. Zhou et al. (2007) explored the application of backpropagation neural networks to travel demand analysis. They first applied backpropagation neural networks to model trip generation, trip distribution, and mode choice separately. Then, they discussed integrated models that can be built in two ways: a simple combination of separate models or a multilayer backpropagation network model. Mostafa (2004) used neural network models to forecast maritime traffic flows and compared the model performance with the performance of the autoregressive integrated moving average (ARIMA) model. The results showed that neural network models generally perform better than the ARIMA model.

As shown in previous discussion, NN models have been applied in problems like mode choice, trip generation, trip distribution, route choice, and modeling travelers’ behavior under different

Ton (2000). This study focused on commuters’ mode choice and compared the generalization ability between NN and nested logit models using SP data from Sydney and Melbourne. Naïve transfer was used in the study. The researchers found that NN has prediction ability comparable to the nested logit model and concluded that there is no clear indication which approach is better. Another study that analyzed the transferability of CI models in travel demand modeling was Arentze et al. (2002). They tested the spatial transferability of different components of Albatross, which is a rule-based model. Choice rules trained from Hendrik-Ido-Ambacht were applied directly to Voorhout and Apeldoorn, which is essentially naïve transfer. The researchers concluded that the overall spatial transferability of the model was satisfactory, except for mode choice models.

When studying model transferability, various assessment measures are used, representing three different aspects: model equivalence between the transferred and local models, predictive ability at either the individual or aggregate level, and sensitivity analysis. Depending on whether one is assessing the performance of the transferred model relative to the model estimated based on local data, these measures can also be categorized as absolute or relative measures. Rossi and Bhat (2014) gave a more detailed summary of assessment measures used for evaluating model transferability.

Through the literature review, it is found that few studies consider the spatial transferability of NN models in travel demand modeling. The few existing studies simply compare the naïve transferability between the NN and logit models with no discussion about what leads to a NN model with better transferability. Additionally, there is a gap in how to update the NN model when some local data are available. Although several transfer methods have been proposed and studied, including transfer scaling, Bayesian updating, combined transfer estimator, and joint context estimation, they are targeted for traditional statistical models like logit-based models and have not been used on NN models. No study has been done to adapt NN models using available local data in travel demand modeling. In this paper, the spatial transferability of NN models in travel demand modeling, especially in mode choice models, will be analyzed. The paper first studies the naïve transferability of the NN models when no data are available in an application context. Then, a NN model adaptation method is proposed when limited local data are available. In order to test how the proposed NN model adaptation method performs for different sample sizes of local data, the NN model adaptation method is evaluated using five different sample sizes. Five areas in the Washington, DC, and Baltimore regions are chosen to conduct the transferability analysis: DC, Baltimore city, Baltimore County, Anne Arundel County, and Montgomery County. The 2007/2008 Transportation Planning Board—Baltimore Metropolitan Council Household Travel Surveys
analysis.

The rest of the paper is organized as follows. The next section introduces the NN models and the proposed NN model adaptation method. The “Data” section introduces the data and some descriptive statistics. The results of the spatial transferability analysis for the five NN models will be presented in the “Results” section. The last section concludes the main findings of the paper and discusses several possible future research directions.

Methodology

Neural Network Model

Different types of NN models have currently been developed and applied, like feedforward neural networks, recurrent neural networks, radial basis function networks, and so on. In this paper, one of the most widely used neural network structures is used: the feedforward neural network (FNN), which normally consists of an input layer, one or more hidden layers, and an output layer. In FNN, information moves in one direction: from the input nodes, through hidden nodes, and to the output nodes. In this paper, the single hidden-layer FNN or three-layer FNN is used with the structure shown in Fig. 1. NN models can be applied to solve both classification and regression problems. For K-class classification (K > 2), there are K output nodes in the output layer, with the kth node modeling the probability of class $k$ ($k = 1, 2, \ldots, K$). For binary classification problems, usually only one output node will be included in the output layer. For a regression problem, there is only one output node in the output layer.

Input information enters the NN model through the input layer. The input layer usually standardizes the input to avoid saturation and facilitate the training process (Hastie et al. 2009). Then, the hidden layer creates derived features $Z_m$ ($m = 1, 2, \ldots, M$) based on linear combinations of the $X_p$ ($p = 1, 2, \ldots, P$) from the input layer as Eq. (1). The derived features $Z_m$ then feed into the output layer to calculate the target $Y_k$ as Eq. (2)

$$Z_m = \sigma(v) = \sigma(\alpha_{om} + \alpha_m^T X), \quad m = 1, \ldots, M \tag{1}$$

$$Y_k = g_k(T) = g_k(\beta_{ok} + \beta_k^T Z), \quad k = 1, \ldots, K \tag{2}$$

where $\alpha_{om}$, $\alpha_m$, $\beta_{ok}$, and $\beta_k$ = unknown parameters in NN models, often called weights.

The function $\sigma(v)$ is called the activation function, where the sigmoid function $\sigma(v) = 1/(1 + e^{-v})$ is usually used. The function $g_k(T)$ is called the output function, where the softmax function

Fig. 1. Single hidden-layer FNN (input layer $X_1, \ldots, X_P$; hidden layer $Z_1, \ldots, Z_M$; output layer $Y_1, \ldots, Y_K$)

The softmax function can produce positive values of $Y_k$ that sum to one. In application, the predicted class $k$ would be the one with the largest value of $Y_k$

$$k = \arg\max_i Y_i, \quad i = 1, \ldots, K \tag{3}$$

When fitting the NN model, one needs to find a set of weights to fit the training data well. In classification problems, cross-entropy is usually used as the measure of fit

$$R(\theta) = -\sum_{n=1}^{N} \sum_{k=1}^{K} t_{kn} \ln y_k(x_n) \tag{4}$$

where $N$ = number of the training sample size; and $t_{kn}$ = true output value of the sample $n$, $t_{kn} = 1$ if sample $n$ belongs to class $k$, $t_{kn} = 0$ otherwise. The backpropagation algorithm is the most commonly used training algorithm for NN models (Wasserman 1989).

There are several issues that need to be taken care of when training the NN model. Before starting training, the structure of the NN model needs to be determined, which means modelers need to decide the number of hidden layers and number of nodes in each hidden layer. The three-layer FNN is the most commonly used NN structure in travel demand modeling and has shown good performance (Sayed and Razavi 2000; Xie et al. 2003; Zhang and Xie 2008). Therefore, this study will continue using the three-layer FNN structure. Different numbers of nodes in the hidden layer will be tried, and the one with best performance based on cross-validation will be taken.

The second issue is the overfitting problem. The model may fit the training data very well but have very poor generalization ability if modelers try to achieve the global minimum value of the cross-entropy $R(\theta)$. Early stopping (Caruana et al. 2001) is adopted in this study to deal with this problem, which means that the model is trained only for a while before it reaches the global optimum. Validation data are usually used to determine when to stop because it is preferred to stop training when the validation error starts to increase.

The third issue in NN training is the multiple minima problem. Because the error function $R(\theta)$ is nonconvex, there will be many local minima, and the final trained NN will be quite dependent on the choice of starting weights. A common practice is to try several starting weights and choose the solution with the best performance.

Another issue is the imbalanced data set problem. If NN models are directly trained on the training data set, there are two assumptions made (Provost 2000):
1. The goal is to maximize accuracy; and
2. The classifier will operate on a data set drawn from the same distribution as the training data.

However, these assumptions may not be true. For example, for the mode choice problem, sometimes modelers may not care that much about predicting each individual accurately and care more about getting the overall market shares of different modes correctly. Sometimes the target of interest is a minor class like bus and identifying bus users, or sometimes the sample data are not representative of the whole population. If either of the two assumptions is violated, the imbalanced data set may cause a problem that needs to be handled. The imbalanced data set issue can be solved by instance reweighting or resampling (Elkan 2001; Lin et al. 2002; Du Plessis and Sugiyama 2014).

NN models can easily be used to solve transportation problems. As mentioned in the introduction, NN models have been successfully applied in problems like mode choice, trip generation, trip distribution, route choice, and modeling travelers’ behavior under
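
As a minimal sketch of Eqs. (1)-(4), the forward pass of a single hidden-layer FNN and its cross-entropy can be written in plain Python. The function names, the toy weights, and the explicit softmax form (the standard exponential normalization, which is assumed here because its formula is not reproduced in the text) are illustrative assumptions, not the authors' implementation.

```python
import math

def sigmoid(v):
    """Activation function sigma(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

def softmax(t):
    """Output function (standard form, assumed): positive values summing to one."""
    m = max(t)  # subtract the maximum for numerical stability
    e = [math.exp(x - m) for x in t]
    s = sum(e)
    return [x / s for x in e]

def forward(x, alpha0, alpha, beta0, beta):
    """Eq. (1) builds hidden features Z; Eq. (2) maps them to class scores Y."""
    z = [sigmoid(a0 + sum(w * xi for w, xi in zip(a, x)))
         for a0, a in zip(alpha0, alpha)]
    t = [b0 + sum(w * zi for w, zi in zip(b, z))
         for b0, b in zip(beta0, beta)]
    return softmax(t)

def predict(y):
    """Eq. (3): index (0-based) of the class with the largest Y_k."""
    return max(range(len(y)), key=lambda k: y[k])

def cross_entropy(samples, labels, params):
    """Eq. (4): with one-hot targets only the true class contributes ln y_k(x_n)."""
    return -sum(math.log(forward(x, *params)[k])
                for x, k in zip(samples, labels))

# Hypothetical toy network: P = 2 inputs, M = 2 hidden nodes, K = 3 classes.
params = (
    [0.0, 0.1],                               # alpha_om (hidden biases)
    [[0.5, -0.2], [0.3, 0.8]],                # alpha_m  (hidden weights)
    [0.0, 0.0, 0.0],                          # beta_ok  (output biases)
    [[1.0, -1.0], [0.2, 0.4], [-0.5, 0.9]],   # beta_k   (output weights)
)
y = forward([1.0, 2.0], *params)
loss = cross_entropy([[1.0, 2.0]], [predict(y)], params)
print(y, predict(y), loss)
```

Training would then adjust the four weight arrays to reduce the cross-entropy, for example by backpropagation; that step is omitted from this sketch.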
$$k = \arg\max_i (w_i Y_i), \quad i = 1, \ldots, K \tag{6}$$

If the classification adjustment weight $w_k$ equals one, it is the base case in which the $k$ with the largest value of $Y_k$ is chosen. When some local data are available, the classification adjustment weight vector can be tuned to reflect the local characteristics. Because only the relative values between those weights make sense, any of the weights can be fixed to be one. Here, $w_1$ is assumed to be one; thus, the classification adjustment weight vector is

$$W = (1 \; \cdots \; w_k \; \cdots \; w_K) \tag{7}$$

where $PS_k$ and $OS_k$ = predicted shares and observed shares, respectively, for a choice alternative $k$; $RMSE_i(\beta_j)$ = RMSE of the transferred model applied to the application context data; and $RMSE_i(\beta_i)$ = RMSE of the local model applied to the application context data.

Data

The 2007/2008 Transportation Planning Board—Baltimore Metropolitan Council Household Travel Survey (TPB-BMC HHTS) is used in this study, which covers approximately 14,000 households in the Washington, DC, and Baltimore regions (National Capital Region Transportation Planning Board Metropolitan Washington Council of Governments 2010). Representative households were asked to complete a travel diary that documented the activities of all household members on a randomly assigned weekday. Various information was collected, including trip, person, household, and vehicle information. In order to study the spatial transferability, trips within five regions are used: District of Columbia (DC) and Montgomery County (MG) in the Washington region, and Baltimore city (BAL), Baltimore County (BC), and Anne Arundel County (AA) in the Baltimore region. Trips in other areas and cross-region trips are not considered in this study.

In order to build mode choice models, alternative specific variables are required for each travel mode. However, TPB-BMC HHTS records travel information only for the real-taken mode. Additional information is needed to represent the level of service for all alternative modes. In this study, travel time skim matrices from the Metropolitan Washington Council of Governments (MWCOG) Travel Forecasting Model version 2.3 are used for DC and MG, and travel time skim matrices from the Baltimore Region Travel Demand Model version 4.0 are used for BAL, BC, and AA. Four travel modes are considered when modeling travelers’ mode choice decisions: transit, driving (auto driver), carpool (auto passenger), and walk or bike, which represents the primary travel mode. Travel times of transit, driving, and carpool are provided by the travel time skim matrices. For walk or bike, travel time is estimated using reported trip distance divided by speed. The speed is exogenously defined as 4.87 mi/h.

Explanatory variables used in this study include age; income; driver license; employment; housing tenure; number of workers in household; number of vehicles; number of bikes; trip purpose; trip distance; travel time for each alternative mode; and some transportation benefit policies employers provide to employees, like free parking, share parking cost, subsidies for transit, and so on. These explanatory variables are summarized in Table 1.

Table 2 summarizes some descriptive statistics of the recorded trips within the five study areas. DC and BAL are higher-density urban areas, whereas MG, BC, and AA are lower-density areas. Significant differences can be observed between the higher-density and lower-density areas. For example, in DC and BAL, there are many transit and walk or bike trips, whereas in MG, BC, and AA, driving is dominant and constitutes more than 64% of the trips. People in DC and BAL have shorter trips and more HBW trips compared to other areas. Additionally, in DC and BAL, there are fewer elders and a lower number of vehicles and bikes in each household. More people in these two areas rent houses.

There are differences in some aspects between DC and BAL. For example, BAL has many more low-income people, almost 30% of the total sample. This may be because BAL has the lowest average number of workers per household among these five regions. DC has more transit and walk or bike trips than BAL, which may be because DC has better transit facilities and trips in DC are shorter.

The three lower-density areas, MG, BC, and AA, are very similar in many aspects like age, employment, house ownership, number of vehicles, number of bikes, trip purpose, trip distance, modal share, and so on. Thus, models are expected to be more transferable between these three areas because they share so many similarities.

Results

NN Model Training

The data of each area are first randomly separated into two parts: 80% for the training data set and 20% for the test data set. The training data set is used for NN model training, whereas the test data set is used to evaluate model transferability.

In this study, it is assumed that the goal of the mode choice model is to maximize accuracy. For each region, the sample data set is assumed to be representative of the whole population. With these two assumptions, the imbalanced data set problem is not that problematic and is not handled in the case study.

Before starting NN training, the structure of the NN model needs to be specified, which means to determine the number of hidden layers and the number of nodes in each hidden layer. As discussed in the “Methodology” section, a three-layer FNN model with one hidden layer is used in this study. In order to choose the number of nodes in the hidden layer, different numbers of nodes are tried, from 4 to 20. For each case, 10-fold cross-validation is used to evaluate the model performance. In 10-fold cross-validation, the training samples are first arbitrarily divided into 10 subsets. Then, one of the subsets is left out as the validation data for testing the model, and the NN model is trained using the samples in the remaining nine subsets. The process is repeated 10 times with each of the 10 subsets used exactly once as the validation data. The average of the 10 validation error results is used as the final performance evaluation. When training the NN model, the early stopping technique is used to mitigate the problem of overfitting. Because of the multiple minima problem, for each training data set, the NN model is trained 10 times, and the one with the best performance is used. Fig. 2 shows the calculated 10-fold cross-validation error using different numbers of nodes in the hidden layer when training the BAL NN model. Here, the cross-entropy based on the validation data is used as the error measure. Based on this plot, the NN model has the best performance when the hidden layer has 17 nodes for the BAL training data. The number of nodes in the hidden layers is also determined for DC, MG, BC, and AA training data in the same way, and is 18, 14, 9, and 16, respectively.

After specifying the number of nodes in the hidden layer, the NN model structure is determined. Then, five NN models are trained using the training data set of the five areas with early stopping and multiple runs.

Fig. 2 (plot not reproduced; y-axis: cross-validation error)

Naïve Transfer

The model estimated for each of the five areas is evaluated by the five test data sets. The performance for the test data of the same area represents the performance of the local model, whereas the performance for the test data of other areas represents the performance when the model is directly transferred to these areas. Because each of the five models is transferred to the other four areas, there are, in total, 20 transfers. Table 3 shows the performance of the 20 naïve transfers between the five areas, as well as the performance of the local model. Performance measures including HT, ACC, REM, RMSE, and RATE are presented in Table 3. For the DC test data, the BAL model has good performance based on ACC, RMSE, and RATE, whereas the MG, BC, and AA models are not transferable, with low ACC and very high RMSE and RATE. For the MG test data, the BC and AA models have very good performance, whereas the DC and BAL models do not perform very well. For the BAL test data, the MG and DC models perform better than the AA and BC models. For the BC test data, MG and AA perform better than the DC and BAL models, whereas the DC model has a very high RMSE value. For the AA test data, MG and BC have very good performance, whereas the DC model has a very high RMSE value. Table 4 shows the transferability ranking for each of the five areas when they are used as the application context. Tables 3 and 4
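
The naïve transfer design described in this section (five models, five test sets, 20 transfers plus five local evaluations) can be sketched as a nested evaluation loop. The `accuracy` helper, the dictionary layout, and the toy constant classifiers below are hypothetical stand-ins for the trained NN models and the survey data; they only illustrate the bookkeeping.

```python
AREAS = ["DC", "MG", "BAL", "BC", "AA"]
URBAN = ("DC", "BAL")  # higher-density areas in the study

def accuracy(model, test_set):
    """Share of test trips whose predicted mode equals the observed mode."""
    hits = sum(1 for x, mode in test_set if model(x) == mode)
    return hits / len(test_set)

def transfer_matrix(models, test_sets):
    """acc[(est, app)]: model estimated in `est`, applied unchanged to `app`."""
    return {(est, app): accuracy(models[est], test_sets[app])
            for est in AREAS for app in AREAS}

# Toy stand-ins (assumptions): each "model" always predicts a single mode,
# and each test set holds (features, observed_mode) pairs.
models = {a: (lambda x, m=("transit" if a in URBAN else "driving"): m)
          for a in AREAS}
test_sets = {a: [(None, "transit")] * (6 if a in URBAN else 1)
                + [(None, "driving")] * (4 if a in URBAN else 9)
             for a in AREAS}

acc = transfer_matrix(models, test_sets)
local = {k: v for k, v in acc.items() if k[0] == k[1]}      # local models
transfers = {k: v for k, v in acc.items() if k[0] != k[1]}  # naive transfers
print(len(local), len(transfers))  # prints: 5 20
```

In this toy setup `acc[("MG", "DC")]` and `acc[("DC", "MG")]` differ, which mirrors the point that transferability between two areas need not be symmetric.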
MG 51.2 98.5 56.4 36.6 71.4 −38.1 35.9 −11.9 −45.3 0.35 2.35
BAL 67.9 93.8 47.5 73.3 78.9 −7.1 12.4 −39.6 2.3 0.15 1.00
BC 4.8 99.4 61.4 10.5 60.5 −94.0 44.7 65.3 −80.8 0.53 3.51
AA 0.0 98.2 55.4 39.5 65.7 −100.0 37.6 20.8 −37.8 0.35 2.34
BC DC 80.0 88.4 46.5 81.1 78.7 360.0 −0.4 −51.8 194.6 0.98 5.17
MG 80.0 99.6 62.4 10.8 86.9 20.0 14.4 −32.4 −70.3 0.20 1.06
BAL 80.0 82.1 67.1 35.1 76.5 120.0 −6.9% 18.2 2.7 0.18 0.96
BC 0.0 98.9 63.5 13.5 86.3 −100.0 13.9 −28.2 −64.9 0.19 1.00
AA 0.0 98.2 62.9 10.8 85.5 −100.0 12.6 −29.4 −40.5 0.18 0.94
AA DC 80.0 86.8 41.5 79.4 75.5 180.0 0.5 −56.5 211.8 0.94 4.08
MG 40.0 99.3 57.1 5.9 83.6 −20.0 18.6 −35.4 −79.4 0.23 1.02
BAL 40.0 65.9 71.4 44.1 65.8 −60.0 −22.0% 71.4 −20.6 0.49 2.12
BC 0.0 98.6 57.8 0.0 82.7 −100.0 18.3 −29.3 −91.2 0.22 0.94
AA 0.0 99.5 58.5 17.6 84.4 −100.0 18.1 −33.3 −70.6 0.23 1.00
Note: Bold is used for the performance of the local models.
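
The aggregate-level columns of the table can be illustrated with a short sketch. The defining equations are not reproduced in this excerpt, so the formulas below are assumptions based on definitions commonly used in transferability studies: REM as the relative error of the predicted share, RMSE as the predicted-share-weighted root mean square of REM, and RATE as the ratio of the transferred model's RMSE to the local model's RMSE. The mode labels and toy trip counts are hypothetical.

```python
import math
from collections import Counter

MODES = ["transit", "driving", "carpool", "walk_bike"]

def shares(choices):
    """Market share of each mode in a list of chosen (or predicted) modes."""
    counts = Counter(choices)
    return {m: counts[m] / len(choices) for m in MODES}

def rem(predicted, observed):
    """Assumed REM_k = (PS_k - OS_k) / OS_k; requires OS_k > 0 for every mode."""
    ps, os_ = shares(predicted), shares(observed)
    return {m: (ps[m] - os_[m]) / os_[m] for m in MODES}

def rmse(predicted, observed):
    """Assumed RMSE = sqrt(sum_k PS_k * REM_k^2 / sum_k PS_k)."""
    ps, r = shares(predicted), rem(predicted, observed)
    return math.sqrt(sum(ps[m] * r[m] ** 2 for m in MODES) / sum(ps.values()))

def rate(pred_transferred, pred_local, observed):
    """Assumed RATE = RMSE_i(beta_j) / RMSE_i(beta_i); equals 1 for the local model."""
    return rmse(pred_transferred, observed) / rmse(pred_local, observed)

def expand(n_transit, n_driving, n_carpool, n_walk_bike):
    """Build a toy list of mode labels from counts."""
    return (["transit"] * n_transit + ["driving"] * n_driving
            + ["carpool"] * n_carpool + ["walk_bike"] * n_walk_bike)

observed = expand(10, 60, 20, 10)        # hypothetical application-context trips
pred_local = expand(8, 64, 18, 10)       # hypothetical local-model predictions
pred_transferred = expand(0, 80, 15, 5)  # hypothetical transferred-model predictions

print(rem(pred_transferred, observed))
print(rate(pred_transferred, pred_local, observed))
```

A transferred model that never predicts transit gives a transit REM of -1 (that is, -100%), the same pattern as the -100.0 entries in the table.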
MG 0.87 0.85 (−) 0.87 (×) 0.87 (×) 0.86 (×) 0.87 (×)
BAL 0.77 0.81 (+) 0.81 (+) 0.81 (+) 0.81 (+) 0.82 (+)
AA 0.86 0.85 (×) 0.86 (×) 0.86 (×) 0.86 (×) 0.86 (×)
AA DC 0.76 0.82 (+) 0.82 (+) 0.83 (+) 0.83 (+) 0.83 (+)
MG 0.84 0.84 (×) 0.84 (×) 0.84 (×) 0.84 (×) 0.84 (×)
BAL 0.66 0.78 (+) 0.78 (+) 0.78 (+) 0.78 (+) 0.79 (+)
BC 0.83 0.82 (×) 0.83 (×) 0.82 (×) 0.83 (×) 0.83 (×)
Amount of performance increase compared to naïve transfer 12 14 13 14 14
Amount of performance decrease compared to naïve transfer 1 0 0 0 0
Note: (+) indicates the ACC of the adapted model is larger than the ACC of the naïve transferred model plus 0.01; (−) indicates the ACC of the adapted model
is less than the ACC of the naïve transferred model minus 0.01; (×) indicates that the adapted model has similar performance to the naïve transferred model.
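
The adapted models in this table rely on the classification adjustment weight vector of Eqs. (6) and (7). A minimal sketch of how such a vector could be tuned on a small local sample follows. The coordinate-wise grid search, the log-spaced candidate grid, and all function names are assumptions made for illustration, since the search procedure is not described in this excerpt; the bounds 0.2 and 5, the fixed first weight, and the 0.01 comparison threshold come from the paper.

```python
def predict_adjusted(y, w):
    """Eq. (6): choose the class (0-based) with the largest w_k * Y_k."""
    return max(range(len(y)), key=lambda k: w[k] * y[k])

def accuracy(probs, labels, w):
    """Share of local samples classified correctly under weights w."""
    hits = sum(1 for y, k in zip(probs, labels) if predict_adjusted(y, w) == k)
    return hits / len(labels)

def tune_weights(probs, labels, lower=0.2, upper=5.0, steps=25, sweeps=3):
    """Assumed search: coordinate-wise grid search maximizing local accuracy."""
    n_classes = len(probs[0])
    # Log-spaced candidates between the lower and upper bounds.
    grid = [lower * (upper / lower) ** (i / (steps - 1)) for i in range(steps)]
    w = [1.0] * n_classes               # base case: all weights equal one
    best = accuracy(probs, labels, w)
    for _ in range(sweeps):
        for k in range(1, n_classes):   # the first weight stays fixed at one
            for cand in grid:
                trial = w[:k] + [cand] + w[k + 1:]
                score = accuracy(probs, labels, trial)
                if score > best:        # keep only strict improvements
                    best, w = score, trial
    return w, best

def compare(acc_adapted, acc_naive, tol=0.01):
    """Symbols from the table note, written in ASCII."""
    if acc_adapted > acc_naive + tol:
        return "(+)"
    if acc_adapted < acc_naive - tol:
        return "(-)"
    return "(x)"

# Hypothetical outputs of a transferred model that favors class 0,
# evaluated on local trips that mostly belong to class 1.
probs = [[0.60, 0.40], [0.55, 0.45], [0.70, 0.30], [0.52, 0.48]]
labels = [1, 1, 0, 1]

naive_acc = accuracy(probs, labels, [1.0, 1.0])
w, adapted_acc = tune_weights(probs, labels)
print(w, naive_acc, adapted_acc, compare(adapted_acc, naive_acc))
```

Only the decision rule changes in this sketch; the NN weights themselves are left untouched, which is why a small local sample can be enough to tune the vector.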
and walk or bike trips and fewer driving and carpool trips compared to MG. By doing this, the adapted model is able to predict transit and walk or bike with much higher accuracy, thus achieving higher overall prediction accuracy.

Table 6 shows the performance evaluated by ACC. Each ACC value of the adapted model of different sizes represents the mean value of five runs. It shows that the adaptation method can improve the individual-level performance of NN models most of the time. For those transfers between areas where there exist many differences and naïve transfer does not work very well, the model adaptation method will improve the accuracy significantly, even with a small sample size, such as MG transferred to DC (ACC increased from 0.48 to 0.66 with sample size of 100), BC transferred to DC (ACC increased from 0.38 to 0.56 with sample size of 100), AA transferred to DC (ACC increased from 0.48 to 0.56 with sample size of 100), DC transferred to BC (ACC increased from 0.79 to
as those for ACC. As Table 7 shows, the adapted model can gain significantly for those transfers even with a small sample size between areas where there exist many differences and for which naïve transfer does not work very well, such as in the case of MG transferred to DC (RMSE decreased from 1.16 to 0.32 with sample size of 100), BC transferred to DC (RMSE decreased from 1.49 to 0.34 with sample size of 100), AA transferred to DC (RMSE decreased from 1.10 to 0.43 with sample size of 100), DC transferred to MG (RMSE decreased from 0.59 to 0.19 with sample size of 100), DC transferred to BC (RMSE decreased from 0.98 to 0.29 with sample size of 100), and DC transferred to AA (RMSE decreased from 0.94 to 0.27 with sample size of 100). The gain of using model adaptation for transfer between similar areas is not as significant, such as transfers among MG, BC, and AA. There is not much difference between the adapted models using different sizes of local data. Even a small size of local data can improve the performance significantly. However, in some cases, the adapted models may perform worse than the naïve transfer based on RMSE. This is because the proposed adaptation method is targeted to maximize accuracy in the application context instead of matching the market share. Model transferability in terms of aggregate level may be harmed sometimes.

Conclusions

In this paper, the spatial transferability of NN models in travel demand modeling, especially in mode choice models, is analyzed. The paper first discusses the naïve transferability of NN models when no data are available in the application context. Then, a NN model adaptation method is proposed using the classification adjustment weight vector when limited data from the application context are available. The performance of the adaptation method is evaluated for different sample sizes. Using the 2007/2008 TPB-BMC HHTS data, five NN models are built using trips within

similar, the gain of using model adaptation is not as significant. Because the proposed adaptation method is targeted to maximize accuracy in the application context, the model transferability in terms of individual-level performance is boosted most of the time, whereas the model transferability in terms of aggregate level may be harmed sometimes.

In this study, the lower and upper bounds for each classification adjustment weight are set as 0.2 and 5 to prevent adjusting the classifier too much in one direction and to save computation time. If no bound is set for searching the classification adjustment weight, it would be expected that those classes with few observations in the local training data may be neglected, especially when the local sample size is small. In the future, it would be interesting to study how different settings of the lower and upper bounds will affect the performance of the adapted NN models. Additionally, this study focuses on the transferability of NN models without comparing them to other models, like logit models, random forest, support vector machine, and so on. How different methods differ in their spatial transferability is an interesting research direction. Moreover, in this paper, the three lower-density areas, MG, BC, and AA, are considered more “similar” compared to DC and BAL through qualitatively comparing aspects like age, employment, house ownership, number of vehicles, number of bikes, trip purpose, trip distance, modal share, and so on. The NN models of these three regions are also shown to have higher transferability among each other. However, no quantitative definition is given to define the level of similarity between different regions. How to quantitatively define the level of similarity and how the level of similarity will affect model transferability will be studied in future research. Finally, the 2007/2008 TPB-BMC HHTS data used in this study are slightly outdated. The 2017/2018 household travel survey for the TPB-BMC regions will start in September 2017. According to MWCOG, the data will be ready for modelers and practitioners in 2019. This new data set will be explored in future studies.
five areas: DC and MG from the Washington, DC, region, and
BAL, BC, and AA from the Baltimore region. Cross-validation
is used to choose the number of nodes in the hidden layer. When Acknowledgments
training the NN models, the early stopping technique is used to
This research is financially supported by the National Science
avoid overfitting, and multiple runs are used to deal with the multi-
Foundation CAREER Award Project “Reliability as an Emergent
ple minima problem. Each of the five NN models built is applied to
Property of Transportation Networks” and U.S. Federal Highway
four other areas to test the performance of naïve transfer. Different
Administration Exploratory Advanced Research Program. The
performance measures are used to evaluate the transferability, in-
authors are solely responsible for the statements in the paper.
cluding individual-level measures like HT and ACC, as well as ag-
gregate measures like REM, RMSE, and RATE. After analyzing
the performance of naïve transfer, it is found that the model trans-
References
ferability is good between areas that share many similarities in
characteristics like land use, travel modal share, and so on. In Arentze, T., Hofman, F., Van Mourik, H., and Timmermans, H. (2002).
the five study areas, the transferability between the three lower- “Spatial transferability of the albatross model system: Empirical evi-
density areas is very good. The two higher-density urban areas, dence from two case studies.” Transp. Res. Rec., 1805, 1–7.
choice behaviour in response to travel information.” Nonlinear Dyn., Shmueli, D., Salomon, I., and Shefer, D. (1996). “Neural network analysis
49(4), 493–509. of travel behavior: Evaluating tools for prediction.” Transp. Res. Part
Dia, H., and Panwai, S. (2009). “Evaluation of discrete choice and neural C: Emerg. Technol., 4(3), 151–166.
network approaches for modelling driver compliance with traffic infor- Sikder, S., Augustin, B., Pinjari, A. R., and Eluru, N. (2013a). “Spatial
mation.” Transportmetrica, 6(4), 1–22. transferability of tour-based time-of-day choice models: An empirical
Du Plessis, M. C., and Sugiyama, M. (2014). “Semi-supervised learning assessment.” Proc.–Soc. Behav. Sci., 104(2429), 640–649.
of class balance under class-prior change by distribution matching.” Sikder, S., and Pinjari, A. R. (2013). “Spatial transferability of person-level
Neural Networks, 50, 110–119. daily activity generation and time use models: Empirical assessment.”
Elkan, C. (2001). “The foundations of cost-sensitive learning.” Proc., 17th Transp. Res. Rec., 2343, 95–104.
Int. Joint Conf. on Artificial Intelligence, Vol. 17, Morgan Kaufmann Sikder, S., Pinjari, A. R., Srinivasan, S., and Nowrouzian, R. (2013b).
Publishers, Inc., San Francisco, 973–978. “Spatial transferability of travel forecasting models: A review and
Hastie, T., Tibshirani, R., and Friedman, J. (2009). “Overview of supervised synthesis.” Int. J. Adv. Eng. Sci. Appl. Math., 5(2–3), 104–128.
learning.” The elements of statistical learning, Springer, New York,
Srinivasan, D., Jin, X., and Cheu, R. L. (2004). “Evaluation of adaptive
9–41.
neural network models for freeway incident detection.” IEEE Trans.
Hensher, D., and Ton, T. (2000). “A comparison of the predictive potential
Intell. Transport. Syst., 5(1), 1–11.
of artificial neural networks and nested logit models for commuter mode
Tillema, F., van Zuilekom, K. M., and van Maarseveen, M. F. A. M. (2006).
choice.” Transp. Res. Part E: Logist. Transp. Rev., 36(3), 155–172.
“Comparison of neural networks and gravity models in trip distribu-
Karasmaa, N. (2007). “Evaluation of transfer methods for spatial travel de-
tion.” Comput.-Aided Civ. Infrastruct. Eng., 21(2), 104–119.
mand models.” Transp. Res. Part A: Policy Pract., 41(5), 411–427.
Karlaftis, M. G., and Vlahogianni, E. I. (2011). “Statistical methods versus Wafa, Z., Bhat, C. R., Pendyala, R. M., and Garikapati, V. M. (2015).
neural networks in transportation research: Differences, similarities “A latent-segmentation based approach to investigating the spatial trans-
and some insights.” Transp. Res. Part C: Emerg. Technol., 19(3), ferability of activity-travel models.” Transp. Res. Rec., 2493, 136–144.
387–399. Wasserman, P. D. (1989). Neural computing, Van Nostrand Reinhold,
Lin, Y., Lee, Y., and Wahba, G. (2002). “Support vector machines for New York.
classification in nonstandard situations.” Mach. Learn., 46(1–3), Xie, C., Lu, J., and Parkany, E. (2003). “Work travel mode choice modeling
191–202. with data mining: decision trees and neural networks.” Transp. Res.
Longhi, S., Nijkamp, P., Reggianni, A, and Maierhofer, E. (2005). “Neural Rec., 1854, 50–61.
network modeling as a tool for forecasting regional employment pat- Xiong, C., Yang, D., Chen, X., and Zhang, L. (2015). “Model transferabil-
terns.” Int. Reg. Sci. Rev., 28(3), 330–346. ity of hidden Markov models and a Bayesian approach to recalibrating
Mostafa, M. (2004). “Forecasting the Suez Canal traffic: A neural network travel demand models.” Presentation at the 14th Int. Conf. on Travel
analysis.” Marit. Policy Manage., 31(2), 139–156. Behavior Research, The International Association for Travel Behaviour
Mozolin, M., Thill, J. C., and Lynn Usery, E. (2000). “Trip distribution Research, Windsor, U.K.
forecasting with multilayer perceptron neural networks: A critical Xiong, C., and Zhang, L. (2013). “A descriptive Bayesian approach to mod-
evaluation.” Transp. Res. Part B: Methodol., 34(1), 53–73. eling and calibrating drivers’ En route diversion behavior.” IEEE
National Capital Region Transportation Planning Board Metropolitan Transac. Intell. Transp. Syst., 14(4), 1817–1824.
Washington Council of Governments. (2010). “2007/2008 TPB house- Yasmin, F., Morency, C., and Roorda, M. J. (2015). “Assessment of spatial
hold travel survey: Technical documentation.” 〈http://www.mwcog transferability of an activity-based model, TASHA.” Transp. Res. Part
.org/uploads/committee-documents/Zl5YWV5W20100903131244.pdf〉 A: Policy Pract., 78, 200–213.
(Aug. 27, 2010). Zhang, Y., and Xie, Y. (2008). “Travel mode choice modeling with support
Nowrouzian, R., and Srinivasan, S. (2012). “Empirical analysis of spatial vector machines.” Transp. Res. Rec., 2076, 141–150.
transferability of tour-generation models.” Transp. Res. Rec., 2302, Zhoul, Q., Lu, H., and Xu, W. (2007). “New travel demand models with
14–22. back-propagation network.” Proc., 3rd Int. Conf. on Natural Compu-
Pang, G. K. H., Takahashi, K., Yokota, T., and Takenaga, H. (1999). tation, Vol. 3, IEEE, New York, 311–317.
“Adaptive route selection for dynamic route guidance system based Ziemke, D., Nagel, K., and Bhat, C. R. (2015). “Integrating CEMDAP and
on fuzzy-neural approaches.” IEEE Trans. Veh. Technol., 48(6), MATSim to increase the transferability of transport demand models.”
2028–2041. Transp. Res. Rec., 2493, 117–125.