
Journal of Transport Geography 19 (2011) 434–442


A GIS-based toolkit for route choice analysis

Dominik Papinski 1, Darren M. Scott
TransLAB (Transportation Research Lab), School of Geography & Earth Sciences, McMaster University, 1280 Main Street West, Hamilton, ON, Canada L8S 4K1

Article info

Keywords:
Geographic information system
Global positioning system
Halifax STAR project
Route choice
Travel behavior

Abstract
This paper develops and applies the route choice analysis (RCA) toolkit. This GIS-based toolkit generates a
suite of over 40 variables describing route characteristics such as distance, travel time, speed statistics,
number of intersections, number of turns, number of stop signs/stop lights, and a measure of route
circuity, to name a few. The input to the toolkit is one or more routes, which can be obtained from global
positioning system (GPS) data or some other means (e.g., shortest path). While the toolkit is designed to
support route choice modeling by generating variables that have been tested in previous modeling
efforts, we demonstrate its utility by testing the hypothesis that workers choose routes to minimize
either travel time or distance between home and work. A GPS-enhanced data set of 237 observed routes
for home-to-work trips collected for auto drivers in Halifax, Nova Scotia, Canada is used in our analysis.
We find that the null hypothesis is refuted; that is, a comparison of observed routes to their
shortest-path alternatives based on time and distance via inferential statistics indicates that observed
routes are significantly longer compared to their alternatives. This finding suggests that workers may choose
routes based on other route attributes. The attributes generated by the RCA toolkit for observed, shortest
time, and shortest distance routes are compared and significant differences are noted.
© 2010 Elsevier Ltd. All rights reserved.

1. Introduction
Interest in modeling route choice behavior has grown in recent
years in a renewed effort to understand factors governing route
choice decisions. In part, this is driven by the emphasis on microsimulation
to replace the four-step urban transportation modeling
system (UTMS). However, as Prato (2009, p. 66) notes in his recent
review of the field, route choice research is confronted with four
challenges: data collection, processing of large data sets, generation
of choice sets of alternative routes, and estimation of discrete
choice models. Beyond these challenges, other issues loom: map
matching and derivation of variables used to describe routes. Decisions
regarding route choice are often difficult to observe using
conventional means of data collection. Recently, the global positioning
system (GPS) has been used as an effective means of collecting
activity-travel data. In terms of route choice, GPS provides
detailed travel information about trip speeds and precise observation
of routes. To support route choice analysis, tools are required
to process and describe observed and alternative routes. These
choice sets can become extremely large and a toolkit is necessary
to create such variables in a one-stop fashion.
Corresponding author. Tel.: +1 905 525 9140x24953; fax: +1 905 546 0463.
E-mail addresses: (D. Papinski),
(D.M. Scott).
Present address: Department of Geography and Environmental Studies, Wilfrid
Laurier University, Waterloo, ON, Canada N2L 3C5.

This paper presents the route choice analysis (RCA) toolkit,
which efficiently processes such routes. The RCA toolkit automatically
imports route information and generates over 40 variables
describing characteristics of that route. More importantly, the
tool is designed to rapidly generate variables that would otherwise
have to be created manually. The RCA toolkit allows the
study of traditional variables such as route distance, travel time,
and trip speeds. Other variables are also incorporated, such as
the number of intersections, stop signs and/or stop lights, along
with measures of route circuity/directness. ArcGIS is used as
the development platform for the tool. Variables were selected
based on previous route choice studies and the development
and application of new route choice variables such as the route
directness index and the longest leg. These two variables are
based on cognitive route choice studies, which surmise that drivers
tend to select simple, straightforward routes (Bailenson et al.,
In general, the route choice problem has been approached from
the idea that people seek to minimize travel distance or time, or to
maximize route reliability (Lam and Small, 2001). To demonstrate the
usefulness of the toolkit, we examine the efficiency of observed
routes. Taking an efficient route implies that people select routes
with the least amount of effort or cost. We compare the attributes
of observed routes to those corresponding to shortest paths, one of
which is based on travel time and the other on distance. We use a
sample of home-to-work trips (without stops along routes) from
the Halifax STAR (Space-Time Activity Research) project. This
innovative survey of time use employed GPS tracking to georeference
respondent locations throughout a 48-h period (TURP, 2007).
The remainder of this paper is organized as follows. Section 2
reviews literature in the field of route choice, including route
observation, choice set generation, and influences on route choice
decision-making. Section 3 discusses development of the RCA toolkit
within a geographic information system (GIS) framework. Section 4
demonstrates the RCA toolkit by applying it to a set of 237
home-to-work trips. The observed routes are compared to their
shortest-path alternatives, which epitomize route efficiency. In
Section 5, we discuss the utility of the RCA toolkit and the results of the
empirical study, and highlight further applications of the toolkit.
2. Background
The simplest route choice model is based on the deterministic
shortest path. Static traffic assignment models were first developed
by Wardrop (1952) and later refined by Frank and Wolfe
(1956). Equilibrium traffic assignment models are a common
means of modeling observed traffic patterns. Key assumptions
underlying these assignment models are based on Wardrop's first
principle, which states that no driver can unilaterally reduce his
or her travel costs by shifting to another route. Wardrop's second
principle states that drivers cooperate with one another in order
to minimize total system travel time. These assumptions suggest
that individuals seek to minimize their travel costs. However, other
factors may influence an individual's choice of route, which calls
into question the validity of traffic assignment models. As mentioned
earlier, GPS tracking facilitates the investigation of factors
governing route choice decisions by precisely recording actual
routes in both space and time.
2.1. Observing route choice
Minimizing generalized cost, time, or even distance may not be
the sole influence governing route choice decisions (Golledge and
Stimson, 1997). Specific types of path selection criteria include
shortest path by time or distance, least generalized cost, turn
minimization, longest leg first, fewest obstacles (stop signs or stop
lights), avoiding congestion, minimizing number of roads, restricting
to a known area or corridor, maximizing aesthetics (comfort),
minimizing intermodal transfers, optimizing fastest routes (freeways),
and avoiding unsafe areas (Golledge and Gärling, 2002).
Bailenson et al. (1998), for example, focused on a technique called
"road climbing" when selecting routes between two locations.
Subjects were found to start off by choosing the longest and
straightest roads for their ideal route choice.
Route choice data can be collected using stated or revealed preference
surveys or through direct observation (e.g., GPS tracking).
One of the primary challenges facing route choice research is to
empirically determine factors influencing route choice decisions.
GPS provides exact information on observed route choice; however,
it cannot capture cognitive or psychological variables
such as comfort or route familiarity. Table 1 depicts the
variety of elements that people may consider when making their
route choice. Peeta and Yu (2006) propose a behavior-based
consistency-seeking (BBCS) modeling approach, which takes into
account the utilities of several variables and generates route choice
probabilities. Most notably, Chen et al. (2001) explore several route
choice criteria from a behavioral perspective. A method is presented
to analyze criterion weights for determining the route
selection criteria without the need for independence between
From Table 1, it is clear that the three most studied route choice
attributes are travel time, reliability, and distance. Selten et al.
(2007), for example, examined a simplified route choice scenario

Table 1
Variables tested in route choice studies.

Study
Selten et al. (2007)
Peeta and Yu (2006)
Lo et al. (2006)
Bar-Gera et al. (2006)
Li et al. (2005)
de Palma and Picard (2005)
Avineri and Pashker (2005)
Eby and Molnar (2002)
Cascetta et al. (2002)
Lam and Small (2001)
Chen et al. (2001)
Hato et al. (1999)

Explanatory Variables
B, C, D, E, F, G, H, I, J, K, L
C, S, W
C, F, H, M, U, V
C, M, N
D, I, O
B, C, E, K, P, Q, R, S, T, U
C, E, J, M

Time of day
Trip purpose
Travel time
Travel time
Travel distance
Toll on route
Number of
Number of nodes/
Level of service
En-route stops or
Average speed

in which 18 participants had a choice between two routes. By
constraining the route choice problem, researchers can better
investigate the influences of specific variables. However, individuals have
a variety of personal preferences regarding route choice. For example,
Papinski et al. (2009) surveyed 31 individuals who stated that
they seek to minimize travel time for their home-to-work commutes.
At the same time, the survey results also demonstrated that
individuals selected routes that avoided congestion and routes
characterized as direct.
Revealed and stated preference surveys may suggest that different
criteria are important to a range of drivers. However, without
observing drivers directly, it is difficult to test the range of factors
that might influence decision-making. Li et al. (2005) demonstrated
how GPS could effectively record observed route choice
information for 182 subjects over a 10-day period. In many cases,
GPS data can supplement existing diary data originally recorded
by hand (Wolf et al., 2001). GPS data can provide trip information
on exact start and end times, route information, and also information
on transportation modes (Wolf et al., 2003). However, concerns
with GPS include battery consumption, instrument reliability,
urban canyon effects, and respondent accuracy, amongst others.
Information on travel time and trip speeds can be obtained in an
accurate, economical, and timely manner. Matching GPS points to
a road network is a powerful way to provide the exact route in
terms of links and junctions. The map matching algorithm of
Chung and Shalaby (2005) correctly identified 78.5% of
links traveled.
Map matching algorithms demonstrate that trip reconstruction
is effective at decreasing the burden placed on subjects to record
their exact routes, as would be the case in a stated or revealed
preference survey. In turn, more time can be spent generating
explanatory variables and testing assumptions concerning route
choice. Geographic information systems (GIS) can be used to support
travel behavior research. GIS allows us to view, query, interpret,
and visualize spatio-temporal data. Observed and derived
routes can be imported and processed within this programming
and analytical environment. With a growing number of route
choice data sets, there is a need to evaluate routes based on their



2.2. Route choice models

To reiterate, two primary challenges confronting route choice
research are generating feasible sets of alternative routes to compare
to observed routes and evaluating attributes governing route
choice. Route choice researchers have used several methods to
derive alternative routes between origins and destinations.
In terms of route selection, travel time has been used extensively
as the sole characteristic governing route choice decisions
(Wardrop, 1952). Shortest path algorithms are reviewed extensively
in the literature (Hall, 1986). Variations of the shortest path
algorithm have been used to generate feasible choice sets. One
approach is the k-shortest path algorithm, which generates a specific
number of shortest paths (k = 1, 2, ..., n). Two approaches following
this algorithm are the link penalty and link elimination methods.
The link penalty method iteratively increases the impedance
of all links along the shortest path and then selects the next shortest
path. In link elimination, the links along the shortest path are
eliminated. Another approach described by Ben-Akiva et al.
(1984) is link labeling, which labels routes according to specific
criteria (e.g., minimize time or distance, or maximize aesthetics).
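The link penalty and link elimination methods described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any cited study: the toy network `NETWORK`, the function names, the penalty factor of 1.5, and the use of Dijkstra's algorithm as the underlying shortest path solver are all hypothetical choices made for the example.

```python
import heapq

# Hypothetical directed toy network: node -> {neighbor: impedance}.
NETWORK = {
    "A": {"B": 1, "C": 2},
    "B": {"D": 1},
    "C": {"D": 2},
    "D": {},
}


def shortest_path(graph, origin, destination):
    """Dijkstra's algorithm. Returns (cost, node list), or (inf, []) if unreachable."""
    queue = [(0.0, origin, [origin])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []


def _copy(graph):
    """Copy the network so the caller's impedances are left untouched."""
    return {node: dict(neighbors) for node, neighbors in graph.items()}


def link_penalty_routes(graph, origin, destination, k=3, penalty=1.5):
    """Link penalty: inflate the impedance of every link on each path found, then re-solve."""
    work, routes = _copy(graph), []
    for _ in range(k):
        _, path = shortest_path(work, origin, destination)
        if not path:
            break
        if path not in routes:  # a penalized path may still be the cheapest
            routes.append(path)
        for u, v in zip(path, path[1:]):
            work[u][v] *= penalty
    return routes


def link_elimination_routes(graph, origin, destination, k=3):
    """Link elimination: remove the links of each path found, then re-solve."""
    work, routes = _copy(graph), []
    for _ in range(k):
        _, path = shortest_path(work, origin, destination)
        if not path:
            break
        routes.append(path)
        for u, v in zip(path, path[1:]):
            del work[u][v]
    return routes
```

On the toy network both functions return the route A-B-D followed by A-C-D. Link elimination stops as soon as the origin and destination become disconnected, whereas link penalty can keep returning a path that is already in the choice set, which is why duplicates are filtered.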
Every person exhibits a unique travel pattern based on personal
preferences, mode choice, and the perception of distances and
times across transportation networks (Bekhor et al., 2006). Knowledge gained from trip information (e.g., trip speed, distance) can
provide great insight into individual travel behavior (Lam and
Small, 2001). Studies have used discrete choice analysis to study
route-switching behavior (Mahmassani and Liu, 1999), route
diversion (Peeta et al., 2000), and weather effects on travel behavior (Khattak and de Palma, 1997).
Using stated preference, Abdel-Aty et al. (1997) found route
reliability, in terms of time of arrival, to be a critical part of the
decision-making process for route selection. The study indicates that
there is a trade-off in speed between a reliable route and a route
with a high degree of temporal uncertainty. However, stated preference
surveys have distinct limitations. First, respondents are frequently
overly optimistic when responding to hypothetical
questions (Hunt and Abraham, 2006). Respondents have to imagine
the set of alternatives instead of experiencing the alternatives
directly. Therefore, they may not be able to properly assess the
merits of each choice. Second, respondents may simply give the
answers that they think the interviewer would like to hear or select
a response that they would not realistically pursue (e.g., being
environmentally friendly vs. being economically stable).
Research on route choice has made limited progress incorporating a large number of route attributes in studies (see Table 1). Traditionally, studies have focused on exploring the issues of
minimizing travel time and route reliability. This is partly due to
limited survey information and small sample sizes.
Bekhor et al. (2006) generated a choice set and modeled route
choice for large-scale urban networks. They explored differences
between several route choice generation approaches. Four choice
generation methods were applied: labeling, link elimination, link
penalty, and simulation. The study highlights the importance of
properly generating a set of random yet feasible alternatives along
with proper model estimation.
Traffic assignment has moved towards more sophisticated modeling
techniques, but still treats all drivers as being equal. There is a
growing departure from simulation-based studies to field testing.
Large-scale empirical studies to evaluate different route choices
represent a step forward towards the creation of a more comprehensive
urban transportation model. New tools to evaluate routes
efficiently would help support route choice research.
3. RCA toolkit design and implementation
The route choice analysis (RCA) toolkit is designed to perform
one basic function: to generate a series of variables describing
characteristics of a route. The toolkit is programmed within the
framework of ArcGIS using Visual Basic for Applications (VBA)
and ArcObjects. The benefits of using GIS as the development
environment are substantial. For instance, spatial data can be
displayed, manipulated, and analyzed using native spatial and aspatial functions.
Routes are defined as paths through a network that visit specified
network locations. These paths have an inherent spatial
component that is easily managed within a GIS. Routes traverse
the road network, which contains a set of attributes such as
posted speed limits and road type. Within a GIS, one can easily
manipulate spatial data using spatial joins and intersect select
queries not readily available in other software packages. This
programming flexibility allows the tool to analyze multiple aspects
of the route simultaneously without involving different software

Fig. 1. Route choice analysis (RCA) toolkit.

The RCA toolkit customizes functions within ArcGIS using VBA.
Fig. 1 shows the graphical user interface for the toolkit. Users can
select which route shapefiles are to be used as inputs and select
individual route attributes to be calculated. The toolkit builds upon
the features present in the Network Analyst extension of ArcGIS.
Two separate forms are available, each requiring a different level of
user interaction.
Table 2 lists the set of variables generated by the toolkit. The
underlying goal is to process multiple routes efficiently while
simultaneously generating route attributes. The toolkit supports
the exploration of several route variables that have been tested
previously in route choice studies (see Table 1). The results generated
by the toolkit are stored in either a comma- or tab-delimited
text file format and are expressed in either imperial or metric units.
The toolkit is divided into two parts: Network Analyst functions
and attribute/spatial functions.
3.1. Network Analyst functions
Network Analyst provides the necessary tools to perform a
number of tasks. These include finding the best route, finding
the closest feature of interest, calculating service areas, creating
origin-destination cost matrices, and creating models for route
analysis. Network Analyst is an optional extension for ArcGIS
specifically tailored to solving transportation-related problems.
Prior to using the RCA toolkit, one must prepare the network
and the route(s) to be analyzed. Network connectivity, turns,
restrictions, and impedances must be properly defined. Restrictions
are declared to include one-way roads in the road network.
Impedances such as travel distance or time are typically used. After
building the road network, the next step is to create a route using
ESRI's ArcGIS. Fig. 2 illustrates this process for routes captured
using GPS tracking. To begin, users must define a route's origin

Table 2
Variables generated by the RCA toolkit.

Variable                        Description
                                Route travel time
                                Route distance
Link count                      Number of line segments along route
Speed statistics                Statistics based on posted speed limit
                                % of route where observed link speeds are less than posted speeds
Route by road type              % of route based on road type
Longest road by road type       % of longest road based on road type by time and distance
Number of unique roads          Number of unique roads based on street name
Number of intersections         Number of intersections along route
Number of stop signs            Number of stop signs along route
Number of stop lights           Number of stop lights along route
Straight-line distance (a)      Straight-line distance between origin and destination
Route directness index (a)      Route distance divided by straight-line distance
Longest leg distance (a)        Longest segment of a route based on distance
Longest leg time (a)            Longest segment of a route based on time
Left turns (a)                  Number of turns between 270° and 330°
Right turns (a)                 Number of turns between 30° and 90°
Sharp left turns (a)            Number of turns between 270° and 210°
Sharp right turns (a)           Number of turns between 90° and 150°
Total turns (a)                 Total number of turns
Latitude/longitude of           Geographic location of the route start position
Latitude/longitude of           Geographic location of the route end position

and destination using stops (Fig. 2A). GPS data points are then
overlaid onto the road network to display the observed route
(Fig. 2B). Initially, a shortest path algorithm is used to define a
route through the road network from the origin to the destination
(Fig. 2C). If there is a mismatch between the observed route and
the shortest path, intermediate stops (waypoints) are defined.
Where necessary, waypoints are added en route to force the routing
engine to reproduce the observed route (Fig. 2D). The RCA toolkit
requires as input this link-based observed route, which is
derived using the Network Analyst extension. In this study, waypoints
are added manually to reproduce the observed route. However,
automated GPS map matching algorithms can also be used to
reproduce routes captured via GPS tracking (e.g., Chung and
Shalaby, 2005). It is important to note that any type of link-based
route may be used as input to the toolkit. These include observed
routes derived from GPS data and routes based on shortest paths.
The spatial configuration of a route is an important factor in
route choice (Peeta and Yu, 2006). Observed routes may be more
circuitous in nature depending on trip purpose, time of day or
congestion patterns (Eby and Molnar, 2002). To study route circuity,
the toolkit generates a route directness index (RDI). The index is
based on the total route distance (TRD) divided by the straight-line
distance (SLD). SLD is calculated based on the distance between
origin and destination coordinates using the haversine formula
(Robusto, 1957):

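\[
\mathrm{SLD} = \arccos\left(\sin\varphi_1 \sin\varphi_2 + \cos\varphi_1 \cos\varphi_2 \cos\Delta\lambda\right) \cdot R
\]

where SLD is the straight-line distance between the two points, $R$ is
the radius of the sphere representing the Earth, $\varphi_1$ is the latitude of
point 1, $\varphi_2$ is the latitude of point 2, and $\Delta\lambda$ is the longitude
difference.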
The RDI is an expression of route circuity. Values equal to 1
represent direct routes while values greater than 1 represent routes
that are more circuitous in nature. The RDI has been used in traffic
management studies since the 1960s, albeit under other names
such as "route factor" and "detour index". For example, Cole and
King (1968) and later Barbour (1977) used the index to study the
comparative efficiency of different trip patterns using direct or
minimized travel distances. More recently, Cardillo et al. (2006)
examined planar graphs of urban street patterns and how routes
deviate from straight-line distance.
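As a worked illustration, the SLD and the RDI can be computed as follows. The sketch implements the expression as printed in the SLD equation, which is the spherical law of cosines form of the great-circle distance. The function names and the Earth radius of 6,371 km are assumptions, since the text does not state the value of R used by the toolkit.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in metres (value of R not given in the text)


def straight_line_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_lambda = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(delta_lambda))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) * radius


def route_directness_index(total_route_distance, sld):
    """RDI = total route distance (TRD) divided by straight-line distance (SLD)."""
    if sld <= 0:
        raise ValueError("straight-line distance must be positive")
    return total_route_distance / sld
```

Applied to the mean values reported in Table 3 (route distance 15,316 m, straight-line distance 10,451 m), `route_directness_index(15316, 10451)` gives about 1.47. This differs from the mean RDI of 1.53 in Table 3 because the mean of route-level ratios is generally not equal to the ratio of the means.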
Computation of the RDI is an important feature of the RCA toolkit.
While the RDI can be computed for an observed route, it can
also be computed for an observed route's shortest-path alternatives
based on time and distance. Such routes are important benchmarks
in that they are the most efficient routes through a network.
Comparison of an observed route's RDI value to that of a
shortest-path alternative reveals how efficient an observed route is.
In addition to the RDI, other route variables are generated by
the toolkit using Network Analyst. Turns, for instance, are classified
into four types. The first pair of turns simply records the frequency
of normal turns, defined as "turn left" and "turn right". The second
pair of turns is based on the severity of the turn, defined as
"sharp left" and "sharp right". The toolkit also identifies attributes
based on the longest leg. The longest leg is defined as the road that
has the longest length or travel time between successive turns.
Two longest leg statistics are generated by the toolkit: length
and travel time. The following section discusses the use of attribute
tables and spatial queries to generate more explanatory variables.
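The turn and longest leg variables can be illustrated with a short sketch. The angular ranges follow Table 2. Everything else is an assumption made for the example: the turn angle is taken as the clockwise change in heading between consecutive links, turns of exactly 90° and 270° are counted as normal turns, and the function names and the (heading, length, time) link tuples are hypothetical rather than the toolkit's VBA implementation.

```python
def classify_turn(heading_in, heading_out):
    """Classify a heading change (degrees, clockwise) using the ranges in Table 2."""
    angle = (heading_out - heading_in) % 360
    if 30 <= angle <= 90:
        return "right"
    if 90 < angle <= 150:
        return "sharp right"
    if 210 <= angle < 270:
        return "sharp left"
    if 270 <= angle <= 330:
        return "left"
    return None  # near-straight movements and reversals are not counted as turns


def turn_counts(headings):
    """Count each turn type along a sequence of link headings."""
    counts = {"left": 0, "right": 0, "sharp left": 0, "sharp right": 0}
    for h_in, h_out in zip(headings, headings[1:]):
        kind = classify_turn(h_in, h_out)
        if kind is not None:
            counts[kind] += 1
    counts["total"] = sum(counts.values())
    return counts


def longest_legs(links):
    """Return (longest leg distance, longest leg time) between successive turns.

    links: list of (heading_deg, length_m, time_min) tuples in travel order.
    """
    legs = []
    leg_length, leg_time = 0.0, 0.0
    previous_heading = None
    for heading, length, minutes in links:
        if previous_heading is not None and classify_turn(previous_heading, heading):
            legs.append((leg_length, leg_time))  # a turn closes the current leg
            leg_length, leg_time = 0.0, 0.0
        leg_length += length
        leg_time += minutes
        previous_heading = heading
    legs.append((leg_length, leg_time))
    return max(l for l, _ in legs), max(t for _, t in legs)


# Hypothetical route: two links north, a right turn, two links east, then a left turn.
EXAMPLE_LINKS = [(0, 500, 0.6), (0, 700, 0.8), (90, 300, 0.5), (90, 400, 0.7), (20, 1000, 0.9)]
```

For `EXAMPLE_LINKS` the route has one right turn and one left turn, and the longest leg is the initial northbound stretch of 1200 m. Note that the longest leg by distance and the longest leg by time need not be the same leg, which is why the two maxima are taken separately.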
3.2. Attribute and spatial functions

Geographic location of the route end position

(a) Denotes variables requiring the ArcGIS Network Analyst extension.

The second set of variables is generated using spatial and attribute
table queries. A separate user form is used where multiple
shapefiles representing routes are selected. Descriptive statistics
are calculated based on posted speed limits, distance traveled,
and the amount of time and distance spent along a specific road
type. Each of these must be specified in the attribute table and
is programmed using the embedded statistics interface. The
interface provides access to statistical information and unique values
for the specified field.

Fig. 2. Generation of an observed route from GPS data points for input to the RCA toolkit.

Road types can be segregated into different categories with the
traditional breakdown between four classes: primary highway,
expressway, major road, and minor/local roads. The toolkit also
generates the total number of links corresponding to a route. GPS
data can be linked to the underlying road network using
map-matching techniques for creating a route (e.g., Chung and Shalaby,
2005). Average speeds can be assigned on a link-by-link basis by
joining GPS points to network links. Alternatively, speed can be
simulated on roadways or posted speed limits can be used. A basic
measure of congestion is also generated. This is calculated by taking
the percentage of distance and time where observed link
speeds are less than posted speed limits.
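A minimal sketch of the road type and congestion measures follows. The dictionary keys (`road_type`, `length_m`, `time_min`, `observed_kmh`, `posted_kmh`), the function names, and the sample links are hypothetical; in the toolkit these values come from the ArcGIS attribute table.

```python
def share_by_road_type(links, measure):
    """Percentage of a route's distance or time spent on each road type.

    measure: "length_m" for distance shares or "time_min" for time shares.
    """
    total = sum(link[measure] for link in links)
    if total == 0:
        raise ValueError("route has zero total " + measure)
    sums = {}
    for link in links:
        sums[link["road_type"]] = sums.get(link["road_type"], 0.0) + link[measure]
    return {road_type: 100.0 * value / total for road_type, value in sums.items()}


def congestion_share(links, measure):
    """Percentage of distance or time on links where observed speed is below posted speed."""
    total = sum(link[measure] for link in links)
    if total == 0:
        raise ValueError("route has zero total " + measure)
    congested = sum(link[measure] for link in links
                    if link["observed_kmh"] < link["posted_kmh"])
    return 100.0 * congested / total


# Hypothetical three-link route (values invented for illustration).
ROUTE_LINKS = [
    {"road_type": "local", "length_m": 1000, "time_min": 2.0, "observed_kmh": 30, "posted_kmh": 50},
    {"road_type": "highway", "length_m": 6000, "time_min": 4.0, "observed_kmh": 90, "posted_kmh": 100},
    {"road_type": "major", "length_m": 3000, "time_min": 4.0, "observed_kmh": 45, "posted_kmh": 40},
]
```

With these sample links, 70% of the route distance and 60% of the route time fall on links where the observed speed is below the posted limit. The two percentages differ because slower links account for a larger share of time than of distance.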
Lastly, statistics such as mean, median, minimum, maximum,
and standard deviation of the variable speed are calculated,
including speed percentiles at intervals of 10, which provide a
distribution of trip speeds. In this study, observed route travel times
and speeds were based on posted speed limits and estimated network
travel times. Since GPS data were not available for the shortest
paths, this approach allows for a direct comparison of observed
routes to their shortest-path alternatives. The following section
demonstrates the utility of the RCA toolkit via a real-world
application, that is, testing the hypothesis that workers choose routes
to minimize either travel time or distance between home and work.
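The speed statistics can be sketched with the Python standard library. This is an illustration only: the function name, the unweighted treatment of links, the use of the sample standard deviation, and the default percentile method of `statistics.quantiles` are assumptions, since the text does not specify how the toolkit weights links or interpolates percentiles.

```python
import statistics


def speed_statistics(speeds_kmh):
    """Summary statistics and 10th to 90th percentiles for a list of link speeds."""
    if len(speeds_kmh) < 2:
        raise ValueError("at least two link speeds are required")
    # n=10 returns the nine cut points separating the data into ten equal groups.
    deciles = statistics.quantiles(speeds_kmh, n=10)
    summary = {
        "min": min(speeds_kmh),
        "max": max(speeds_kmh),
        "mean": statistics.mean(speeds_kmh),
        "median": statistics.median(speeds_kmh),
        "std": statistics.stdev(speeds_kmh),
    }
    for i, value in enumerate(deciles, start=1):
        summary[f"p{10 * i}"] = value
    return summary


# Hypothetical posted speeds (km/h) for the links of one route.
LINK_SPEEDS = [50, 50, 60, 60, 80, 80, 100, 100, 100, 110]
```

Calling `speed_statistics(LINK_SPEEDS)` returns a dictionary with the keys `min`, `max`, `mean`, `median`, `std`, and `p10` through `p90`, mirroring the speed rows of Table 3 for a single route.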

4. Demonstration
The Halifax STAR (Space-Time Activity Research) project surveyed
households across the Halifax Regional Municipality, Nova
Scotia, Canada between April 2007 and May 2008. The RCA toolkit's
utility is demonstrated by analyzing observed commuting routes
for 237 auto drivers who did not make additional stops on their
journeys from home to work. Work trips have been the focus of
traffic assignment models. These trips are very common and contribute
significantly to congestion and vehicle emissions (Scott
et al., 1997). Also, given the regularity of the trip, it is likely to be
optimized by time or distance. In the case study, we test this
hypothesis by comparing observed routes to their shortest-path
alternatives (see Fig. 3). We begin by comparing route characteristics
that have been tested by previous route choice studies (see
Table 1). These attributes have been found to impact the route
choice decision-making process.
Location-tracking technologies, such as GPS, allow data to be
collected at very fine spatial and temporal scales. In recent years,
GPS has been used to observe trips, which involves recording positional
data at regular intervals (e.g., 1-s) with great precision. Mobile
devices such as smart phones, PDAs, and Bluetooth-enabled
GPS receivers are capable of tracking individuals and vehicles. Previous
studies have demonstrated that GPS data can be used to infer
trip purpose, reconstruct travel patterns for congestion studies,
and examine underlying scheduling decisions (Wolf et al., 2001;
Faghri and Hamad, 2002; Doherty and Papinski, 2004).
Fig. 3. Observed route (a) compared to its shortest paths based on distance (b) and time (c).

The Halifax STAR project utilized a GPS-assisted, prompted-recall
computer-assisted telephone interview for collecting travel
data. The equipment carried by each respondent consisted of a
Hewlett Packard iPAQ hw6955. GPS tracking was conducted over
a 2-day period and the primary respondent was asked to complete
a pencil-and-paper survey instrument (memory jogger). A follow-up
telephone interview was conducted after the 2-day travel period
was completed. Within this framework, a set of controls was
established to ensure that there were no missing data. If there
was a conflict between diary and GPS data or any missing data,

Table 3
Route attribute statistics for observed routes and shortest paths based on time and distance (n = 237).

Attribute                       Observed route      Shortest path based on time   Shortest path based on distance
                                (mean ± std.)       (mean ± std.)                 (mean ± std.)

Number of unique roads          14.4 ± 7.8          10.2 ± 4.1                    12.5 ± 5.6
Time (min)                      12.6 ± 8.0          11.3 ± 7.1                    12.2 ± 8.1
Distance (m)                    15,316 ± 15,316     13,880 ± 10,360               13,495 ± 10,110
Straight-line distance (m)      10,451 ± 8234       10,451 ± 8234                 10,451 ± 8234
Route directness index          1.53 ± 0.41         1.38 ± 0.23                   1.34 ± 0.22
Link count                      86 ± 47             85 ± 45                       83 ± 45
Longest leg distance (m)        6447 ± 6550         6209 ± 6439                   5394 ± 5013
Longest leg time (min)          4.6 ± 4.0           4.5 ± 4.0                     4.3 ± 3.4

Turn statistics
Left turns                      3.0 ± 1.9           2.8 ± 1.8                     3.7 ± 2.5
Right turns                     3.0 ± 2.0           2.7 ± 1.6                     3.5 ± 2.3
Sharp left turns                0.4 ± 0.8           0.4 ± 0.7                     0.3 ± 0.7
Sharp right turns               0.4 ± 0.7           0.3 ± 0.5                     0.4 ± 0.6
Total turns                     6.9 ± 3.7           6.2 ± 3.1                     7.8 ± 4.5

Speed statistics (km/h)
Minimum speed                   50.4 ± 4.8          50.6 ± 4.0                    47.9 ± 11.0
Maximum speed                   87.6 ± 16.5         86.2 ± 15.8                   83.2 ± 16.3
Mean speed                      67.5 ± 9.0          68.4 ± 8.8                    65.1 ± 8.6
Standard deviation speed        13.3 ± 6.7          12.0 ± 6.0                    11.5 ± 6.1
10th speed percentile           52.3 ± 5.9          53.9 ± 6.8                    51.9 ± 7.2
20th speed percentile           55.5 ± 8.2          57.9 ± 9.1                    55.4 ± 8.6
30th speed percentile           58.7 ± 10.1         61.7 ± 10.3                   58.5 ± 9.8
40th speed percentile           62.2 ± 11.1         65.2 ± 11.5                   61.9 ± 10.5
50th speed percentile           66.3 ± 12.9         68.7 ± 12.3                   64.9 ± 11.8
60th speed percentile           70.9 ± 14.5         71.9 ± 13.0                   68.1 ± 12.6
70th speed percentile           75.8 ± 15.8         75.7 ± 14.0                   72.0 ± 13.5
80th speed percentile           80.0 ± 16.8         78.8 ± 15.5                   75.2 ± 14.7
90th speed percentile           84.3 ± 17.1         82.4 ± 15.6                   78.5 ± 16.0

% of route based on road type
% distance on highway           27.7 ± 29.1         20.9 ± 27.9                   13.8 ± 22.5
% distance on expressway        15.7 ± 20.6         23.7 ± 25.6                   20.5 ± 24.0
% distance on main roads        18.9 ± 23.8         19.7 ± 24.8                   17.2 ± 22.3
% distance on local roads       23.1 ± 23.8         19.7 ± 22.0                   28.7 ± 25.2
% time on highway               24.3 ± 26.2         18.5 ± 25.2                   11.8 ± 19.6
% time on expressway            14.4 ± 19.3         21.7 ± 23.9                   17.8 ± 21.8
% time on main roads            19.8 ± 23.5         20.8 ± 24.8                   17.3 ± 22.0
% time on local roads           26.6 ± 24.4         22.7 ± 22.6                   31.8 ± 25.8

% of longest road based on road type
% distance on highway           17.1 ± 23.0         13.3 ± 22.3                   7.9 ± 16.6
% distance on expressway        5.9 ± 17.5          10.7 ± 21.0                   9.8 ± 20.7
% distance on main roads        6.6 ± 16.5          8.4 ± 18.7                    7.5 ± 17.6
% distance on local roads       4.7 ± 14.3          3.8 ± 13.6                    5.9 ± 15.7
% time on highway               13.3 ± 18.6         10.6 ± 18.5                   5.8 ± 12.7
% time on expressway            5.3 ± 16.4          9.7 ± 19.4                    8.7 ± 19.1
% time on main roads            6.6 ± 16.4          8.4 ± 18.6                    7.3 ± 17.2
% time on local roads           4.9 ± 14.6          3.9 ± 14.0                    5.6 ± 15.4



the participant was contacted for further clarification. This redundancy
between the diary and GPS data ensured a high degree of
data quality.
The GPS data were manually verified and validated to confirm
observed route information. Poor-quality trips
containing missing trip segments or low signal quality based on
horizontal dilution of precision (HDOP) values were eliminated
from the sample. Signal degradation can make it difficult to assess
the actual route and therefore such routes were not used in the
To begin the process, we manually recreated observed routes
using the Network Analyst extension and ran the routes through
the RCA toolkit. Empirical analysis was based on three types of
routes: observed, shortest path based on distance, and shortest
path based on travel time. Table 3 contains summary statistics
(mean, standard deviation) for route attributes generated by the
toolkit by route type.
Using SPSS, we applied the paired-samples t-test to test for differences between observed routes and those based on shortest paths
(time and distance). Table 4 lists the t-statistics and p-values used
to decide whether to reject the null hypothesis
that observed routes and their
shortest-path alternatives do not differ with respect
to specific route attributes. Bold values represent significant differences at the 0.05 significance level.
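The test itself was run in SPSS. As a rough sketch of what the paired-samples t-test computes, the function below derives the t-statistic from the per-route differences (observed minus shortest path, the sign convention stated in the note to Table 4). The function name and the sample values are hypothetical, and the p-value uses a normal approximation that is only reasonable for large samples such as n = 237; it is not the exact t-distribution value SPSS reports.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t(observed, alternative):
    """Return (t, degrees of freedom, approximate two-sided p-value)."""
    if len(observed) != len(alternative):
        raise ValueError("paired samples must have equal length")
    diffs = [o - a for o, a in zip(observed, alternative)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    # Normal approximation to the t distribution (an assumption; adequate for large n).
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, n - 1, p_approx


# Hypothetical travel times (min) for four routes, for illustration only.
t_stat, dof, p_value = paired_t([10, 12, 9, 11], [8, 11, 9, 8])
```

A positive statistic indicates that the observed route attribute is larger than that of the shortest-path alternative.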
A comparison between observed routes and their corresponding
shortest paths shows significant differences for several attributes.
Interestingly, observed routes are significantly longer in terms of
time and distance than their shortest-path alternatives based on
time. The same holds true when comparing observed routes to
their shortest-path alternatives based on distance. These findings
alone suggest that observed routes are not optimized for time or distance. Instead, individuals may choose routes based on one or more other characteristics of routes. In our analysis, differences are found in terms of
route directness, longest leg characteristics, sharp and normal
turns, speed statistics, and road type usage, such as
the use of highways and local roads. Although the home-to-work
commute is relatively fixed spatially and temporally, observed route attributes differ significantly
from those of the corresponding shortest paths, implying that factors other than
time or distance play a role in route choice decisions.
The composition of shortest distance routes in terms of road
type differs from that of observed routes. When measured in terms
of route distance, on average, highways and expressways comprise
only 34% of shortest distance routes compared to 44% for observed
routes. Shortest distance routes have fewer unique roads and more
left and right turns compared to observed routes.
Routes that minimize travel time tend to follow expressways
(24% of route distance) and, to a lesser extent, highways (21% of
route distance), a trend that is reversed for observed routes
(16% of route distance for expressways, 28% of route distance for
highways). Shortest time routes have fewer unique roads and fewer left and right turns compared to observed routes.
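The road type percentages discussed above are shares of a route's total distance or time. A minimal sketch of that calculation follows; the link tuples and the `road_type_shares` function are hypothetical and do not reproduce the toolkit's internal data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def road_type_shares(
    links: Iterable[Tuple[str, float, float]]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """links holds (road_type, distance_m, time_min); returns (% distance, % time) by type."""
    dist: Dict[str, float] = defaultdict(float)
    time: Dict[str, float] = defaultdict(float)
    for road_type, distance_m, time_min in links:
        dist[road_type] += distance_m
        time[road_type] += time_min
    total_d, total_t = sum(dist.values()), sum(time.values())
    if total_d <= 0 or total_t <= 0:
        raise ValueError("route must have positive total distance and time")
    pct_distance = {k: 100.0 * v / total_d for k, v in dist.items()}
    pct_time = {k: 100.0 * v / total_t for k, v in time.items()}
    return pct_distance, pct_time


# Hypothetical three-link route, for illustration only.
example_route = [("highway", 3000.0, 2.0), ("local", 1000.0, 2.0), ("highway", 1000.0, 1.0)]
pct_distance, pct_time = road_type_shares(example_route)
```

In this example highways account for 80% of route distance but only 60% of route time, which shows why the distance-based and time-based shares are reported separately.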
As mentioned in Section 3.1, the RCA toolkit computes the
straight-line distance (SLD) between an origin and a destination.
This is used to derive the route directness index (RDI) for a given
route, whether the route is observed or based on a shortest path.
The RDI measures route circuity, which, for an observed route, is
based on an individual's behavior and the configuration of the
underlying road network. For a shortest path route, however, the
RDI describes the coarseness of a network. For instance, on a
network that is very dense, where roads are arranged in a fine,
grid-like pattern, a shortest path would have a relatively low index
value (close to 1). Conversely, on a road network that is coarse, a
shortest path route would tend to have a higher index value (greater than 1). Comparison of an observed route's RDI value to that of a

Table 4
Attributes of observed routes compared to their shortest-path alternatives (time and
distance) via paired-samples t-tests (test statistics and p-values are shown with the
latter in parentheses). Bolded values indicate differences significant at the 0.05
significance level (n = 237).

                              Obs. vs. time    Obs. vs. distance
Number of unique roads        4.02 (0.000)     4.30 (0.000)
Time (min)                    9.43 (0.000)     1.99 (0.047)
Distance (m)                  7.19 (0.000)     8.97 (0.000)
Route directness index        8.26 (0.000)     10.20 (0.000)
Link count                    0.71 (0.476)     2.50 (0.013)
Longest leg distance (m)      1.50 (0.133)     5.41 (0.000)
Longest leg time (min)        0.96 (0.335)     0.00 (0.999)

Turn statistics
Left turns
Right turns
Sharp left turns
Sharp right turns
Total turns





Speed statistics (km/h)
Minimum speed                 1.20 (0.226)
Maximum speed                 2.39 (0.017)
Mean speed                    3.30 (0.001)
Standard deviation speed      5.09 (0.000)
10th speed percentile         4.40 (0.000)
20th speed percentile         5.20 (0.000)
30th speed percentile         5.50 (0.000)
40th speed percentile         5.60 (0.000)
50th speed percentile         4.20 (0.000)
60th speed percentile         1.60 (0.098)
70th speed percentile         0.11 (0.908)
80th speed percentile         1.48 (0.139)
90th speed percentile         2.91 (0.004)



% of route based on road type
% distance on highway         5.17 (0.000)
% distance on expressway      6.50 (0.000)
% distance on main roads      0.80 (0.417)
% distance on local roads     3.33 (0.001)
% time on highway             4.97 (0.000)
% time on expressway          6.50 (0.000)
% time on main roads          1.00 (0.316)
% time on local roads         3.76 (0.000)



% of longest road based on road type
% distance on highway         3.37 (0.001)
% distance on expressway      4.60 (0.000)
% distance on main roads      2.00 (0.044)
% distance on local roads     1.48 (0.140)
% time on highway             3.02 (0.003)
% time on expressway          4.60 (0.000)
% time on main roads          2.00 (0.043)
% time on local roads         1.50 (0.134)



Differences for the paired-samples t-tests are computed as observed route attribute
minus shortest path attribute. This means that if the statistic is positive, the
observed route attribute is larger. In turn, if the statistic is negative, the shortest
path attribute is larger.

shortest path reveals how efficient an observed route is. This measure of efficiency is related directly to an individual's behavior.
There is a statistically significant difference between observed
routes and shortest path routes in terms of RDI. The shortest paths
based on time and distance have average index values of 1.38 and
1.34, respectively. These values mean that such routes are, on average, 38% and 34% longer than the straight-line distances between
origins and destinations. By comparison, the average RDI value
for observed routes is 1.53. Measured against straight-line distance, observed routes are therefore 15 and 19
percentage points less efficient than their shortest-time and shortest-distance alternatives, respectively.
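The RDI arithmetic can be sketched as follows, treating the index as route length divided by straight-line distance, consistent with the interpretation given above. Using the haversine formula for the SLD is an assumption here, as are the function names and the Earth radius constant. Only the three mean RDI values come from the text.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def straight_line_distance_m(origin, destination):
    """Great-circle distance between two (lat, lon) points in degrees (haversine)."""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, destination)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def route_directness_index(route_length_m, origin, destination):
    """Route length divided by straight-line distance; 1.0 is a perfectly direct route."""
    return route_length_m / straight_line_distance_m(origin, destination)


# Mean RDI values reported in the text.
rdi_observed, rdi_time, rdi_distance = 1.53, 1.38, 1.34

# Differences expressed as a fraction of the straight-line distance.
excess_vs_time = rdi_observed - rdi_time          # 0.15
excess_vs_distance = rdi_observed - rdi_distance  # 0.19

# Ratio of the mean indices, i.e. excess relative to the shortest-path route itself.
relative_vs_time = rdi_observed / rdi_time - 1
relative_vs_distance = rdi_observed / rdi_distance - 1
```

The two ways of expressing the gap differ: relative to the shortest-path routes themselves, the ratio of mean indices gives roughly 11% and 14%, although a ratio of means is not the same as the mean of per-route ratios.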
We also compared the speed distributions for shortest time and
shortest distance routes to that of observed routes (see Table 4 for
the statistical analysis). Fig. 4 presents the values for speed percentiles by route type. Minimum, median, and maximum average
speeds are shown. From the 50th to 90th speed percentiles, observed routes have statistically higher average speeds than shortest

Fig. 4. Distribution of speed percentiles for observed and shortest path (time and
distance) routes.

distance routes. For instance, at the 50th speed percentile, the average speed for an observed route is 66.3 km/h compared to 64.9 km/h
for a shortest distance route. One aspect of the comparison is that
routes minimizing travel time typically traverse high-speed links
(i.e., expressways and highways). This explains why observed
routes have statistically lower average speeds than shortest time
routes for the 10th to 50th speed percentiles. However, at the
90th percentile the situation is reversed: observed routes have faster average speeds.
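The speed percentiles compared above can be derived for a single route along the following lines. This is a sketch only: the text does not state how the toolkit interpolates percentiles, so the `inclusive` method of Python's `statistics.quantiles` is an assumption, as are the function name and the sample speeds.

```python
from statistics import quantiles
from typing import Dict, Sequence


def speed_deciles(speeds_kmh: Sequence[float]) -> Dict[int, float]:
    """Return the 10th to 90th speed percentiles for one route's speed samples."""
    # n=10 yields nine cut points; "inclusive" treats the data as the full population.
    cuts = quantiles(speeds_kmh, n=10, method="inclusive")
    return {10 * (i + 1): cut for i, cut in enumerate(cuts)}


# Hypothetical speed samples (km/h) along one route, for illustration only.
sample_speeds = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
deciles = speed_deciles(sample_speeds)
```

Averaging each percentile across all routes of a given type would then give values comparable to those plotted in Fig. 4.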
As a final point, we compared the shortest-path alternatives
(time and distance) using the paired-samples t-test. We found that
over half of the variables generated by the RCA toolkit were statistically different at the 0.05 significance level between the two
route types. Even more importantly, shortest path routes do not
represent observed routes accurately for the home-to-work commute. The findings from this demonstration of the RCA toolkit suggest that individuals do not necessarily select routes that minimize
travel time or distance. In turn, this affects traffic congestion levels,
automobile fuel efficiency, and automobile emission levels.
5. Conclusions
Transportation planning models require detailed information
for the purpose of accurately forecasting future travel demand.
Routing applications such as those available in ArcGIS, ArcLogistics,
and TransCAD are capable of solving traditional routing problems
such as the shortest path and traveling salesman problems. However, none are available for generating variables for evaluating
routing behavior in an effort to improve traffic assignment models.
The limitations of the urban transportation modeling system
(UTMS) have concerned researchers for quite some time. These
limitations include, among others, the need to model elderly drivers (Maoh et al., 2009), land use not affected by travel patterns
(Waddell, 2002), and the zonal aggregation of decision makers
(Kwan and Weber, 2008; Martínez et al., 2009). Travel behavior
is a complex process that involves decision-making by household
members (Scott and Kanaroglou, 2002). Traffic analysis zones are
often too large to study at a fine scale. Trips do not begin and
end at a single point in a zone (i.e., centroid). Instead, based on
traveler decisions, trips start and end at precise locations with distinct routes between them.
With the widespread availability of GPS receivers for tracking
people and vehicles, we find route reconstruction more accessible
than previously envisioned. Use of these precise data sets allows us
to observe exact route choice. With the emergence of more disaggregate models to represent travel (i.e., microsimulation), the RCA
toolkit has an important role to play in generating route attributes
for furthering our understanding of route choice decisions.
Dynamic traffic assignment models, such as AIMSUN2 (Barcelo,
1996), MITSIM (Ben-Akiva et al., 2001), Paramics (Quadstone
Paramics Ltd., 2010), and VisSim (Visual Studios Inc., 2010), are
widely used.
Using our case study as an example, we have demonstrated our
toolkit's ability to generate several types of explanatory variables
for route choice analysis. In this study, we compared observed
routes to their corresponding shortest paths and found significant
differences in many variables. These differences are present in
route distance and time, as well as certain measures of trip speed,
road type usage, and number of turns. Comparisons could also be
drawn between observed routes and other choice sets. These
choice sets could comprise randomly selected alternative
routes or k-shortest paths. Each comparison could enhance our
understanding of route choice decision-making.
While the underlying need for travel is important, we also recognize many unanswered questions relating to urban form. Network configuration and urban form,
especially density and network design, play an important role in shaping route
choice options. Fine grid-like road networks in dense urban areas
allow people to more easily minimize their commutes. Low-density areas with coarse road networks may cause individuals to take
more circuitous routes (Cervero and Gorham, 1995). The RCA toolkit addresses this issue through the route directness index. The use
of this measure in conjunction with trip purpose can provide valuable insights on whether frequently traveled home-to-work trips
are more circuitous than discretionary trips in terms of travel time
and distance.
The findings in this research indicate that shortest paths do not
represent observed routes for the home-to-work commute accurately. On average, observed routes are significantly longer in
terms of time and distance than their shortest-path alternatives
based on time and distance. This implies that algorithms that use
shortest paths to represent routes may not capture real-world
route choice decisions.
To conclude, current traffic assignment models focus on minimizing perceived travel time. However, many other factors influence route choice decisions (see Table 1). Policy makers place
considerable faith in current traffic assignment models, which may underestimate the number of kilometers traveled per vehicle. In turn,
this means that emission levels of harmful pollutants produced
by vehicles will most likely be underestimated. Empirical findings
may reveal travel patterns not captured in current modeling efforts. The combination of detailed route choice data and proper
evaluation tools can provide a better understanding of the underlying route choice decision-making process.
Acknowledgements
We would like to thank the editor (Prof. Shih-Lung Shaw) and
two anonymous reviewers for providing insightful comments to
improve our paper. We would also like to thank Matthew Armenti,
Kimberlee Evering, Benjamin Garden, and Alexander Mitra for their
help in processing data. The research was supported financially by
a grant awarded to Darren M. Scott from the Natural Sciences and
Engineering Research Council of Canada (261850-2009).
References
Abdel-Aty, M.A., Kitamura, R., Jovanis, P.P., 1997. Using stated preference data for
studying the effect of advanced traffic information on drivers' route choice.
Transportation Research Part C: Emerging Technologies 5, 39–50.
Avineri, E., Prashker, J.N., 2005. Sensitivity to travel time variability: travelers'
learning perspective. Transportation Research Part C: Emerging Technologies
13, 157–183.
Bailenson, J.N., Shum, M.S., Uttal, D.H., 1998. Road climbing: principles governing
asymmetric route choices on maps. Journal of Environmental Psychology 18,


Barbour, K.M., 1977. Rural road lengths and farm-market distances in north-east
Ulster. Geografiska Annaler Series B: Human Geography 59, 14–27.
Barcelo, J., 1996. The parallelization of AIMSUN2 microscopic traffic simulator for
ITS applications. In: Paper presented at the 3rd World Conference on Intelligent
Transportation Systems, Orlando, FL.
Bar-Gera, H., Mirchandani, P., Fan, W., 2006. Evaluating the assumption of
independent turning probabilities. Transportation Research Part B:
Methodological 40, 903–916.
Bekhor, S., Ben-Akiva, M.E., Ramming, M.S., 2006. Evaluation of choice set
generation algorithms for route choice models. Annals of Operations Research
144, 235–247.
Ben-Akiva, M., Bergman, M.J., Daly, A.J., Ramaswamy, R., 1984. Modelling inter urban
route choice behaviour. In: Volmuller, R., Hamerslag, R. (Eds.), Proceedings of
the 9th International Symposium on Transportation and Traffic Theory. VNU
Press, Utrecht, pp. 299–330.
Ben-Akiva, M., Cortes, M., Davol, A., Koutsopoulos, H., Toledo, T., 2001. MITSIMLab:
enhancements and applications for urban networks. In: Paper Presented at the
9th World Conference on Transportation Research (WCTR), Seoul, South Korea,
Cardillo, A., Scellato, S., Latora, V., Porta, S., 2006. Structural properties of planar
graphs of urban street patterns. Physical Review E: Statistical, Nonlinear, and
Soft Matter Physics 73, 066107.
Cascetta, E., Russo, F., Viola, F.A., Vitetta, A., 2002. A model of route perception in
urban road networks. Transportation Research Part B: Methodological 36, 577
Cervero, R., Gorham, R., 1995. Commuting in transit versus automobile
neighborhoods. Journal of the American Planning Association 61, 210–225.
Chen, T.Y., Chang, H.L., Tzeng, G.H., 2001. Using a weight-assessing model to
identify route choice criteria and information effects. Transportation Research
Part A: Policy and Practice 35, 97–224.
Chung, E.H., Shalaby, A., 2005. A trip reconstruction tool for GPS-based personal
travel surveys. Transport Planning and Technology 28, 381–401.
Cole, J.P., King, C.A.M., 1968. Quantitative Geography. John Wiley and Sons, London.
De Palma, A., Picard, N., 2005. Route choice decision under travel time uncertainty.
Transportation Research Part A: Policy and Practice 39, 295–324.
Doherty, S.T., Papinski, D., 2004. Is it possible to automatically trace activity
scheduling decisions? In: Paper presented at Progress in Activity-Based
Analysis, Maastricht, The Netherlands.
Eby, D.W., Molnar, L.J., 2002. Importance of scenic byways in route choice: a survey
of driving tourists in the United States. Transportation Research Part A: Policy
and Practice 36, 95–106.
Faghri, A., Hamad, K., 2002. Application of GPS in traffic management systems. GPS
Solutions 5, 52–60.
Frank, M., Wolfe, P., 1956. An algorithm for quadratic programming. Naval Research
Logistics Quarterly 3, 95–110.
Golledge, R.G., Gärling, T., 2002. Spatial behaviour in transportation modeling and
planning. In: Goulias, K. (Ed.), Transportation Systems Planning: Methods and
Applications. CRC Press, New York, pp. 3-1 to 3-21.
Golledge, R.G., Stimson, R.J., 1997. Spatial Behaviour: A Geographic Perspective.
Guilford, New York.
Hall, R.W., 1986. The fastest path through a network with random time-dependent
travel times. Transportation Science 20, 182–188.
Hato, E., Taniguchi, M., Sugie, Y., Kuwahara, M., Morita, H., 1999. Incorporating an
information acquisition process into a route choice model with multiple
information sources. Transportation Research Part C: Emerging Technologies 7,
Hunt, J.D., Abraham, J.E., 2006. Influences on bicycle use. Transportation 34, 453
Khattak, A.J., De Palma, A., 1997. The impact of adverse weather conditions on the
propensity to change travel decisions: a survey of Brussels commuters.
Transportation Research Part A: Policy and Practice 31, 181–203.

Kwan, M.-P., Weber, J., 2008. Scale and accessibility: implications for the analysis of
land use-travel interaction. Applied Geography 28, 110–123.
Lam, T.C., Small, K.A., 2001. The value of time and reliability: measurement from a
value pricing experiment. Transportation Research Part E: Logistics and
Transportation Review 37, 231–251.
Li, H., Guensler, R., Ogle, J., 2005. Analysis of morning commute route choice
patterns using global positioning system-based vehicle activity data.
Transportation Research Record: Journal of the Transportation Research Board
1926, 162–170.
Lo, H.K., Luo, X.W., Siu, Y., 2006. Degradable transport network: travel time and
budget of travelers with heterogeneous risk aversion. Transportation Research
Part B: Methodological 40, 792–806.
Mahmassani, H.S., Liu, Y., 1999. Dynamics of commuting decision behavior under
advanced traveller information systems. Transportation Research Part C:
Emerging Technologies 7, 91–107.
Maoh, H., Kanaroglou, P., Scott, D., Páez, A., Newbold, B., 2009. IMPACT: an
integrated GIS-based model for population aging consequences on
transportation. Computers, Environment and Urban Systems 33, 200–210.
Martínez, L.M., Viegas, J.M., Silva, E.A., 2009. A traffic analysis zone definition: a new
methodology and algorithm. Transportation 36, 581–599.
Papinski, D., Scott, D.M., Doherty, S.T., 2009. Exploring the route choice decision-making
process: a comparison of planned and observed routes obtained using
person-based GPS. Transportation Research Part F: Traffic Psychology and
Behaviour 12, 347–358.
Peeta, S., Yu, J.W., 2006. Behavior-based consistency-seeking models as deployment
alternatives to dynamic traffic assignment models. Transportation Research
Part C: Emerging Technologies 14, 114–138.
Peeta, S., Ramos, J.L., Pasupathy, R., 2000. Content of variable message signs and
online driver behavior. Transportation Research Record: Journal of the
Transportation Research Board 1725, 102–108.
Prato, C.G., 2009. Route choice modeling: past, present and future research
directions. Journal of Choice Modelling 2, 65–100.
Quadstone Paramics Ltd., 2010. Paramics. <>.
(accessed April 2010).
Robusto, C.C., 1957. The cosine-haversine formula. The American Mathematical
Monthly 64, 38–40.
Scott, D.M., Kanaroglou, P.S., 2002. An activity-episode generation model that
captures interactions between household heads: development and empirical
analysis. Transportation Research Part B: Methodological 36, 875–896.
Scott, D.M., Kanaroglou, P.S., Anderson, W.P., 1997. Impacts of commuting efficiency
on congestion and emissions: case of the Hamilton CMA, Canada.
Transportation Research Part D: Transport and Environment 2, 245–257.
Selten, R., Chmura, T., Pitz, T., Kube, S., Schreckenberg, M., 2007. Commuters route
choice behaviour. Games and Economic Behavior 58, 394–406.
TURP, 2007. Halifax Regional Space–Time Activity Research (STAR) Survey: Project
Description. Saint Mary's University, Time Use Research Program, Halifax, Nova
Visual Studios Inc., 2010. VisSim. <>. (accessed April
Waddell, P., 2002. UrbanSim: modeling urban development for land use,
transportation, and environmental planning. Journal of the American Planning
Association 68, 297–314.
Wardrop, J.G., 1952. Some theoretical aspects of road traffic research. Proceedings of
the Institute of Civil Engineers Part 2 1, 325–378.
Wolf, J., Guensler, R., Bachman, W., 2001. Elimination of the travel diary: an
experiment to derive trip purpose from GPS travel data. Transportation
Research Record: Journal of the Transportation Research Board 1768, 124.
Wolf, J., Oliveira, M., Thompson, M., 2003. The impact of trip underreporting on VMT
and travel time estimates. Transportation Research Record: Journal of the
Transportation Research Board 1854, 121.