Origin-Destination Estimation of Bus Users by Smart Card Data

Chapter 17
Mona Mosallanejad, Sekhar Somenahalli and David Mills
Abstract Public transport smart cards offer transit planners access to a tremendous
source of spatial-temporal data, providing opportunities to infer passengers’
mobility patterns and path choices. Accurately estimating the origin and
destination (OD) matrix is essential for understanding travel demand. This research
develops a new approach that uses a trip chain model to estimate public transport
commuters’ trajectories in multi-legged journeys. It proposes new algorithms that
link passengers’ journeys involving mode transfers, using assumptions about
passenger paths between successive boardings and about acceptable
walking distances. The study also develops assumptions to distinguish a ‘transfer’
from an ‘activity’ in order to predict the passenger’s destination accurately. The results
will enable public transport agencies to optimise their routes and
schedules, which will ultimately lead to public transport system improvements
and higher patronage.
Keywords Origin-destination matrix · Public transport · Trip chain model · Smart card
M. Mosallanejad (✉) · S. Somenahalli
University of South Australia, Adelaide, Australia
e-mail: mosmy007@mymail.unisa.edu.au
S. Somenahalli
e-mail: Sekhar.Somenahalli@unisa.edu.au
D. Mills
Department of Planning Transport and Infrastructure, Adelaide, Australia
e-mail: David.mills@sa.gov.au
© Springer Nature Switzerland AG 2019
S. Geertman et al. (eds.), Computational Urban Planning and Management
for Smart Cities, Lecture Notes in Geoinformation and Cartography,
https://doi.org/10.1007/978-3-030-19424-6_17

1 Introduction
Transport planners attempt to design transit facilities that will encourage people to
use public transport instead of private vehicles. As public transport agencies increasingly
adopt automatic data collection systems, a significant amount of
boarding data becomes available, providing an excellent opportunity for transit planners
to access spatial-temporal data (Rahbar et al. 2017; Tao 2018) which can be
used for a better understanding of human mobility and the performance of a transit
system (El Mahrsi et al. 2017). In comparison with traditional surveys, which are
usually time-consuming, expensive and of the ‘snapshot’ type, smart card data can be
used to examine a whole network regularly and to make realistic estimates of passenger
origin-destination (OD) patterns.
Developing approaches for estimating accurate OD matrices from smart card
data is critical for transit planners (Alsger et al. 2015). Having knowledge of travel
demand will facilitate the design of appropriate public transport routes, and lead to
the optimisation of schedules. In turn, this will enhance public transport patronage,
with the potential of improving the public transport system’s performance.
In this paper, a one-month (May 2017) dataset was used. The data was provided
by the Department of Planning Transport and Infrastructure (DPTI) in Adelaide,
South Australia. A new methodology was developed, using the trip chain model, to
estimate an OD matrix for Adelaide’s bus users. Adelaide was chosen for this study
because unlike in other cities, commuters scan their smart card upon boarding but
not on alighting. This allows the algorithm to be generic and therefore applicable
elsewhere.
2 Origin-Destination Estimation Methods
Demand for public transport depends on factors such as time of travel, weather, and
service reliability (Morency et al. 2007). Many procedures have been used to make
such predictions, and estimates of OD matrices based on smart card data have been
carried out since the 20th century. These methodologies and their accuracy vary,
depending on the availability of data and the period the data cover, which can range from
a week to a year. Before the evolution of new technologies for collecting data, most
studies were based on household and on-board survey data, used in a variety of
methods to estimate an OD matrix. These methods included non-iterative algorithms
(Tsygalnitsky 1977), fluid mechanics (Tsygalnitsky 1977), passenger on-off counts
and checker records at each stop (Simon and Furth 1985), constrained least squares
and the Fratar model (Gur and Ben-Shabat 1997), and fuzzy theory (Friedrich et al.
2000).
The introduction of the automatic fare collection system made it possible to
develop different methods for estimating an OD matrix. Initially a new method-
ology was proposed to compare OD trips versus the number of passengers (Barry
et al. 2002); since then, researchers have explored the potential of smart card data
to infer trip rates, turnover rates, and travel behaviour to improve planning aims
(Bagchi and White 2005; Utsunomiya et al. 2006). Methods based on automatic data
collection systems for OD matrix estimation include the Furness model (Lianfu et al.
2007), fusion approaches (Kusakabe and Asakura 2011), multiple linear regression
(Kalaanidhi and Gunasekaran 2013), iterative proportional fitting (Cui 2006; Gordon
et al. 2013; Horváth et al. 2014; Li and Cassidy 2007), maximum likelihood estimation
(Cui 2006; Ickowicz and Sparks 2015; Li and Cassidy 2007), inferring the
alighting station via the straightforward algorithm and iterative method (Chapleau
et al. 2008; Seaborn et al. 2009; Zhao 2004; Zhao et al. 2007), and the trip chain
model (Ali et al. 2015; Alsger et al. 2018; Munizaga and Palma 2012; Nassir et al.
2011; Wang 2010).
The time-dependent OD matrix is estimated from passenger counts at both board-
ing and alighting stations and is based on the forecasting method linking boarding
and alighting data (Horváth 2012). This method included transfer time, and its vali-
dation is based on an application in the Hungarian capital city. Yang and Jun (2018)
develop a new methodology to visualise the travel patterns of transit commuters
in Seoul, South Korea, by calculating trajectories and using Carto to create a map.
The moth-flame optimisation (MFO) algorithm is a population-based metaheuristic
inspired by the celestial navigation of moths, and it has been applied to estimate
the OD matrix (Heidari et al. 2017). Li et al. (2018) compare different studies using
smart card information, to examine passengers’ travel behaviours and provide a com-
prehensive review of them. The trip chain model is a recently devised method for
determining travel patterns and travel behaviours, first utilised by Barry et al. (2002)
to estimate destinations (Li et al. 2018). Although there is no exact definition for a
trip chain, a basic description is that each chain consists of one or more stops to the
next destination, and a trip chain is specified according to the number of stops. The
algorithm which will be used here to estimate the alighting stop is based on the trip
chain model (Alsger et al. 2016; Langlois et al. 2016; Li et al. 2018).
3 Data Structure
The smart card must be tapped, swiped or waved at the station, stop or vehicle.
Flat fare policies and some zonal fare policies require commuters to tap only once,
on boarding, so only this single transaction is recorded. However, in some cities
where an exit reader is also available and the fare policy is based on distance or
zone, two records are available for each trip: one for boarding and one for alighting
(Kurauchi and Schmöcker 2016).
The data used in this paper is based on the ‘MetroCard’ database used in Adelaide
and is collected by the DPTI for a specific period: May 2017. Each MetroCard
contains spatial and temporal information. In Adelaide, where a flat fare policy
operates, commuters validate their cards when they board a public vehicle but not on
alighting. Three modes of transport are available: bus, train and tram. The information
for each smart card transaction contains card identification, fare type, transport mode
used, time, date, stop code, route code and direction for each boarding (see Table 1).
When passengers swipe their card and pay an initial transaction, the fare is valid
for two hours, and passengers can use any public transport within this time without
incurring further costs.
Table 1 Individual MetroCard information

Media code  Fare type  Transport mode  Date and time        Stop code  Route code  Direction
807***CB    SV         4               2017-05-01 09:49:35  8089       Tram        1
94E***FB    TICKETS    1               2017-05-01 10:39:15  3351       251         1
11C***89    28DAY      1               2017-05-05 10:46:32  3285       271         1
707***27    OTHER      1               2017-05-01 11:04:05  2072       H22         1
584***97    SV         5               2017-05-08 11:06:36  1852       GWC         1
Note Transport mode: 1 = Bus, 3 = Station, 4 = Tram, 5 = Train
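As an illustration of the record structure in Table 1, the following Python sketch parses a transaction into a typed record and keeps only bus boardings, the subset this study works with. The names `Transaction`, `parse_transaction` and `bus_transactions` are hypothetical and are not part of the DPTI dataset; the field order is assumed to follow the columns of Table 1.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Sequence

# Mode codes as given in the note to Table 1.
TRANSPORT_MODES = {1: "Bus", 3: "Station", 4: "Tram", 5: "Train"}


@dataclass(frozen=True)
class Transaction:
    """One boarding record (hypothetical structure mirroring Table 1)."""
    media_code: str
    fare_type: str
    transport_mode: int
    boarded_at: datetime
    stop_code: str
    route_code: str
    direction: int

    @property
    def mode_name(self) -> str:
        return TRANSPORT_MODES.get(self.transport_mode, "Unknown")


def parse_transaction(fields: Sequence[str]) -> Transaction:
    """Build a Transaction from seven text fields in Table 1 column order."""
    media, fare, mode, timestamp, stop, route, direction = fields
    return Transaction(
        media_code=media,
        fare_type=fare,
        transport_mode=int(mode),
        boarded_at=datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S"),
        stop_code=stop,
        route_code=route,
        direction=int(direction),
    )


def bus_transactions(rows: Iterable[Sequence[str]]) -> List[Transaction]:
    """Keep bus boardings only, sorted by card and then by boarding time."""
    parsed = (parse_transaction(row) for row in rows)
    buses = [t for t in parsed if t.transport_mode == 1]
    return sorted(buses, key=lambda t: (t.media_code, t.boarded_at))


# Three rows copied from Table 1 (one tram boarding, two bus boardings).
rows = [
    ("807***CB", "SV", "4", "2017-05-01 09:49:35", "8089", "Tram", "1"),
    ("94E***FB", "TICKETS", "1", "2017-05-01 10:39:15", "3351", "251", "1"),
    ("707***27", "OTHER", "1", "2017-05-01 11:04:05", "2072", "H22", "1"),
]
buses = bus_transactions(rows)
```

Sorting by card and then by time gives the per-card sequence of boardings on which a trip chain model operates.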
There are some deviations from the one-swipe rule. Railway stations in Adelaide
operate under a closed system, in which swiping is required for both boarding and
alighting. In addition, various systemic and user issues mean that transfers between the
train and other modes cannot be estimated directly from the MetroCard data. Also, there
is a free tram zone in Adelaide where passengers do not need to swipe their cards; this
means that the tram boarding point is not available. Given these limitations, this study
focuses on bus users.
4 Methodology
Knowledge of transit demand plays a decisive role in public transport plans to improve
the performance of the system. One common method for estimating the destination
is the trip chain model. As mentioned previously, each smart card can provide the
boarding location and time of each bus trip but not the alighting location. This study
used various assumptions (as listed below) to estimate the passenger’s ultimate
destination. In the case of transfer trips, the trip chain model assumes that the alighting
stop is located within an acceptable walking distance of the next boarding stop; the
walking distance was calculated as the Euclidean distance.
Some assumptions considered in this algorithm are:
• The initial boarding location of a trip leg is the ‘origin’.
• A passenger’s alighting point is assumed to be within walking distance of the next
boarding stop in the case of transfer trips.
• Passengers return to the place where they first boarded that day, or to some other
nearby station.
• Commuters take the first available service after arriving at a boarding place.
• Each smart card is used by a single commuter and cannot be used by multiple
passengers.
• Commuters who use the public transport system do not use any other mode of
transport on that same day.
The terms used in this study are defined as follows:
• Media code: the unique identifier for each MetroCard in Adelaide.
• Time threshold: the waiting time between two consecutive transactions.
• Trip leg: the trip made by an individual commuter between a boarding stop and an
alighting stop.
• Walking distance: the maximum distance that commuters walk between two consecutive
trip legs to transfer to another public transport service.
• Trip ID: the ID of each trip, which is unique for every service.
• Route ID: a unique ID for each route.
• Stop ID: a unique ID for an individual stop or station entrance; multiple
routes may use the same stop.
• Service ID: a unique ID of the available service for one or more routes.
• Block ID: the block to which a specific trip belongs. A block can consist
of one or more trips made by the same vehicle.
4.1 Estimating the Alighting Stop
A new heuristic algorithm is used to estimate stop-level origins and destinations,
based on the boarding transactions in the MetroCard datasets. The algorithm used
to estimate the alighting stop is shown in Fig. 1. This flowchart finds
the alighting stop rather than the destination, because not all alighting stops are the
destination of a trip: some alighting stops are used for transferring to
other modes or other buses. For OD estimation, criteria such as trip ID and service
ID were extracted from the Google Transit Feed Specification (GTFS) dataset. In
the database provided by DPTI, the stop ID for each MetroCard transaction differs from
the stop code in the GTFS data, so the two need to be matched. Once that was done, the
data were sorted by transaction time, and a MetroCard ID was selected. In
the trip chain model, the transaction that follows each trip leg is the key to
inferring the alighting stop. By considering the following transaction of a MetroCard
(the next boarding), the alighting stop was estimated by calculating the minimum
Euclidean distance. Based on the algorithm, for each transaction, the trip ID, service
ID and block ID from ‘stop_times.txt’ in GTFS data were selected. These criteria
are unique for each service for various modes of public transport: for example, a bus
which departs at a specific time from its origin has its own trip ID, service ID and
block ID, which may be different from the subsequent bus. By matching the time of
each transaction in MetroCard data with the arrival and departure time in GTFS data,
and by considering the day that the commuter swiped the card, a trip ID is chosen. If
there is no trip ID relevant to the MetroCard data, an interval of five minutes was
considered for selecting the trip ID. If no trip ID was selected in this period, then the
next available trip ID was chosen for the algorithm by considering a delay.

Fig. 1 Estimation of alighting stop (flowchart). Boxes: read MetroCard ID; match the stop
ID from the MetroCard; sort by time; is it the first transaction of a day? (yes: label as
"Origin"); sort the following stops by distance using stop code and route ID and label
them X0, Y0 (if there is a thru route, select stops for both route numbers); read the
following transaction; is it the last transaction of a day?; read the latitude and
longitude of the following transaction and label them X1, Y1; calculate the Euclidean
distance ((X1-X0)^2+(Y1-Y0)^2)^0.5; find the stop with the minimum Euclidean distance; is
the distance less than the walking distance? (yes: label as "Alighting"; no: label as
"Cannot be inferred"); label as "Destination".
In Adelaide, some buses change their route ID in the middle of the route for some
specific hours, especially before entering the central business district (CBD). This is
known as a thru-linking route. The first stage is to infer the stop at which the route
ID changed to another one: in other words, by identifying the last stop for the current
route ID, the changing location can be inferred. To find the last stop, the data were
sorted based on arrival time. Then, based on the trip ID which was selected for the
transaction and the existing route ID, the last stop and its arrival time were chosen.
By entering the chosen stop and relevant time in the timetable database, the available
route could be selected. Routes with the same service ID and block ID could be
chosen and labelled as thru-link routes.
In the next step, the Euclidean distance was calculated between all stops along the
current route and the following transaction (next boarding). Using the stop code
and route ID, the subsequent stops were identified and ordered by distance. The latitude
and longitude of these stops were labelled X0, Y0, and the latitude and longitude
of the successive transaction (next boarding) were labelled X1, Y1. The Euclidean
distance was then calculated with the formula in the algorithm, and the stop ID with the
minimum Euclidean distance was selected. This distance was
compared with the maximum acceptable walking distance of 1000 m, derived for
Adelaide through sensitivity analysis; this threshold will vary from city to city. If the
distance to the selected stop was less than the walking distance, the stop was labelled
‘alighting stop’; otherwise, the alighting stop was labelled ‘cannot be inferred’.
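The minimum-distance step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation (which the conclusions describe as SQL based). The text applies the Euclidean formula to stop coordinates; converting latitude and longitude to metres with a local equirectangular projection is an assumption made here so that the result can be compared with the 1000 m threshold. The function names and the sample coordinates are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Stop = Tuple[str, float, float]  # (stop_id, latitude, longitude) in degrees

EARTH_RADIUS_M = 6_371_000.0
MAX_WALK_M = 1000.0  # walking threshold reported for Adelaide in the text


def planar_distance_m(lat0: float, lon0: float, lat1: float, lon1: float) -> float:
    """Euclidean distance in metres after a local equirectangular projection
    (an assumption; adequate only for distances of a few kilometres)."""
    mean_lat = math.radians((lat0 + lat1) / 2.0)
    dx = math.radians(lon1 - lon0) * math.cos(mean_lat) * EARTH_RADIUS_M
    dy = math.radians(lat1 - lat0) * EARTH_RADIUS_M
    return math.hypot(dx, dy)


def infer_alighting_stop(
    downstream_stops: Sequence[Stop],
    next_boarding: Tuple[float, float],
    max_walk_m: float = MAX_WALK_M,
) -> Optional[Tuple[str, float]]:
    """Return (stop_id, distance_m) for the downstream stop closest to the
    next boarding, or None when it lies beyond the walking threshold
    ('cannot be inferred'). downstream_stops should hold the stops after the
    boarding stop on the current route, plus those of any thru-linked route."""
    if not downstream_stops:
        return None
    lat1, lon1 = next_boarding
    distance, stop_id = min(
        (planar_distance_m(lat, lon, lat1, lon1), sid)
        for sid, lat, lon in downstream_stops
    )
    return (stop_id, distance) if distance < max_walk_m else None


# Invented coordinates for illustration only; not real Adelaide stops.
route = [("A", -34.920, 138.600), ("B", -34.925, 138.600), ("C", -34.930, 138.600)]
near = infer_alighting_stop(route, (-34.926, 138.600))
far = infer_alighting_stop(route, (-34.950, 138.600))
```

In this example `near` selects stop "B", roughly 110 m from the next boarding, while `far` is `None` because the closest stop is more than 2 km away.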
Figure 2 depicts an example of a trip chain model for inferring a passenger’s
alighting stop. If a commuter starts the trip at stop i on route 1 and the next transaction
is at stop j on route 3, then the alighting point can be estimated. As mentioned earlier,
some routes in Adelaide change their route ID, but passengers are not required to
revalidate their cards. For example, if route 1 changes to route 2 as shown in Fig. 2
(a thru-linking route), the Euclidean distance is used to find the alighting stop: all
distances from the stops on route 1 and route 2 to stop j (ED1, ED2, ED3 and ED4) are
calculated (see Fig. 2), and the stop with the minimum Euclidean distance is selected
as the alighting stop, provided that this distance is less than the acceptable walking
distance. For instance, if the first boarding is at stop i and the second boarding at stop
j, then the commuter alighted at stop m on route 2 (the thru-linking route for route 1).
Stop i is also the origin of the first trip leg because it is the first transaction of the
day. If the next transaction, at stop k, is the last transaction of the day, then based on
the assumptions the destination should be near i, the first origin of the day. By using
the minimum Euclidean distance for the trip from stop k on route 4, the alighting stop
will be i, which is the final destination of the day, as there is no other transaction
afterwards (Mosallanejad et al. 2018).
In some cases, the alighting stop could not be inferred because the distance to the next
boarding was greater than the acceptable walking distance. Manual analysis showed
that the GPS incorrectly selected stops in certain situations because of their proximity to
a stop on the other side of the road. If the alighting stop could not be inferred, then
the opposite stop was considered in the algorithm to check whether the alighting
stop could be estimated. An earlier study in Chile (Munizaga and Palma 2012)
estimated the alighting stop by considering a trajectory time, minimising the time
distance to the next boarding position, for bus routes that use the same
street in both directions. They estimated this variable by adding the time associated
with position i to the walking time from position i to the next boarding, multiplied by a
penalisation factor. In this research, however, a new algorithm was developed for
non-inferred OD pairs that arise from observed GPS data errors at some boarding
locations. The improved trip chain model algorithm developed in this research made it
possible to locate an additional 5% of alighting stops. The additional algorithm
developed for locating the opposite stop is shown in Fig. 3.
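One possible reading of the opposite-stop search is sketched below in Python: candidate stops are ranked by Euclidean distance from the recorded stop, and the nearest one that serves the required route is returned. The function name, the use of projected coordinates in metres, and the interpretation of "route ID is available" as "the route serves that stop" are all assumptions made for illustration.

```python
import math
from typing import Dict, Optional, Set, Tuple

Point = Tuple[float, float]  # projected (x, y) in metres; an assumption


def find_opposite_stop(
    recorded_stop: str,
    stop_reference: Dict[str, Point],
    routes_by_stop: Dict[str, Set[str]],
    route_id: str,
) -> Optional[str]:
    """Return the nearest other stop served by route_id, or None if no
    stop in the reference table is served by that route."""
    x0, y0 = stop_reference[recorded_stop]
    # Rank every other stop by its Euclidean distance to the recorded stop.
    ranked = sorted(
        (math.hypot(x1 - x0, y1 - y0), stop_id)
        for stop_id, (x1, y1) in stop_reference.items()
        if stop_id != recorded_stop
    )
    # Take the closest stop; if the route is not available there,
    # move on to the next minimum distance.
    for _, stop_id in ranked:
        if route_id in routes_by_stop.get(stop_id, set()):
            return stop_id
    return None


# Invented stops and routes for illustration only.
stop_reference = {"S1": (0.0, 0.0), "S2": (5.0, 20.0), "S3": (0.0, 30.0), "S4": (400.0, 0.0)}
routes_by_stop = {"S2": {"271"}, "S3": {"251"}, "S4": {"251"}}
opposite = find_opposite_stop("S1", stop_reference, routes_by_stop, "251")
```

Here the nearest stop, S2, is skipped because route 251 does not serve it, so S3 is returned.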
Fig. 2 An example of a trip chain for inferring the alighting stop

Fig. 3 Estimation of the opposite stop (flowchart). Boxes: select the media code with no
alighting information in the previous step; read its latitude and longitude and label them
X0, Y0; read the latitude and longitude of the other stops in the stop reference and label
them X1, Y1; calculate the Euclidean distance ((X1-X0)^2+(Y1-Y0)^2)^0.5; choose the minimum
Euclidean distance; for the selected stop, check whether the route ID is available (yes:
label as "Opposite stop"; no: select the next minimum Euclidean distance).

If a commuter used a different mode on the return trip in special circumstances
(for example, occasional use of a friend’s car), it is difficult to track those trips. In
such situations, the study tracked the travel pattern of the commuter over a week,
and the alighting stops were then derived from that pattern. Such an approach further
improved the accuracy of OD pair estimation by about 3%.
4.2 Estimating the Alighting Time
To estimate the alighting time, the trip ID, which is unique for each service, first has
to be identified. The trip ID is selected using the route ID and stop ID of the boarding
transaction, based on the boarding time and date. If a trip ID is found within the 5-min
interval, then the alighting time is selected based on the alighting stop and trip ID. If
this time is later than the boarding time, then it is labelled as the alighting time.
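The trip matching and alighting time steps can be illustrated with the short Python sketch below. It assumes a simplified timetable in which each record holds a trip ID, a stop ID and an arrival time as a `datetime`; real GTFS feeds store times as text and spread these fields across several files, so the `StopTime` structure and both function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional, Sequence


class StopTime(NamedTuple):
    """Simplified timetable record (hypothetical, not the GTFS layout)."""
    trip_id: str
    stop_id: str
    arrival: datetime


WINDOW = timedelta(minutes=5)  # matching interval mentioned in the text


def match_trip(
    boarding_stop: str,
    boarded_at: datetime,
    stop_times: Sequence[StopTime],
    window: timedelta = WINDOW,
) -> Optional[str]:
    """Pick the trip whose arrival at the boarding stop is closest to the
    tap time within the window; otherwise fall back to the next later trip."""
    at_stop = [st for st in stop_times if st.stop_id == boarding_stop]
    within = [st for st in at_stop if abs(st.arrival - boarded_at) <= window]
    if within:
        return min(within, key=lambda st: abs(st.arrival - boarded_at)).trip_id
    later = [st for st in at_stop if st.arrival > boarded_at]
    if later:
        return min(later, key=lambda st: st.arrival).trip_id
    return None


def alighting_time(
    trip_id: Optional[str],
    alighting_stop: str,
    boarded_at: datetime,
    stop_times: Sequence[StopTime],
) -> Optional[datetime]:
    """Scheduled arrival of the matched trip at the alighting stop, accepted
    only if it is later than the boarding time."""
    for st in stop_times:
        if (st.trip_id == trip_id and st.stop_id == alighting_stop
                and st.arrival > boarded_at):
            return st.arrival
    return None


# Invented timetable for illustration only.
day = datetime(2017, 5, 1)
stop_times = [
    StopTime("T1", "S1", day.replace(hour=8, minute=0)),
    StopTime("T1", "S2", day.replace(hour=8, minute=10)),
    StopTime("T2", "S1", day.replace(hour=8, minute=30)),
    StopTime("T2", "S2", day.replace(hour=8, minute=40)),
]
boarded_at = day.replace(hour=8, minute=3)
trip = match_trip("S1", boarded_at, stop_times)
alighted_at = alighting_time(trip, "S2", boarded_at, stop_times)
```

A tap at 08:03 falls within five minutes of trip T1, so the alighting time at S2 is taken from T1; a tap at 08:12 matches no trip within the window and falls back to the next trip, T2.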
4.3 Destination Estimation
After estimating the alighting stop, four criteria were considered to infer the
destination. First, the data were checked to see whether the transaction was the last of
the day; if so, the inferred alighting stop was labelled as the destination. If the
alighting stop for the last transaction of a day could not be inferred, the destination
could not be estimated. Second, the data were checked to see whether a commuter used the
same route twice, or used a parallel route, to reach a destination in a single day; if so,
this was an ‘activity’, since no one alights from a direct route and takes the same or a
parallel route again. Thus, the alighting stop was taken as the destination point. This use
of parallel route information is an improvement on the standard trip chain model. The
third criterion was the time threshold between
two consecutive transactions. If the time threshold was less than 20 min, then the
commuter was assumed to have transferred to another bus, and the inferred stop was
treated as an alighting (transfer) point rather than a destination. For a time threshold
of more than 20 min but less than an hour, the label ‘short activity’ was used; if the
time threshold was more than 1 h, the label ‘long activity’ was used. Both short and long
activities were labelled as the destination. The fourth criterion was the
distance between the boarding stop and the subsequent alighting stop. If this value
was less than 400 m, then the alighting stop was labelled as the destination (see
Fig. 4).
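The four criteria can be expressed as a small decision function. The Python sketch below is one possible reading: the text does not state the precedence between the time and distance criteria or how the exact 20 min and 1 h boundaries are handled, so the order of checks and the boundary handling here are assumptions, as are the function and label names.

```python
from typing import NamedTuple

TRANSFER_MAX_MIN = 20.0        # below this gap the stop is a transfer
SHORT_ACTIVITY_MAX_MIN = 60.0  # above this gap the activity is 'long'
SHORT_LEG_M = 400.0            # distance criterion from the text


class Decision(NamedTuple):
    label: str
    is_destination: bool


def classify_alighting(
    is_last_of_day: bool,
    same_or_parallel_route: bool,
    gap_minutes: float,
    boarding_to_alighting_m: float,
) -> Decision:
    """Label an inferred alighting stop as a transfer or a destination."""
    # Criterion 1: the last transaction of the day ends at a destination.
    if is_last_of_day:
        return Decision("last transaction of day", True)
    # Criterion 2: re-boarding the same or a parallel route implies an activity.
    if same_or_parallel_route:
        return Decision("activity", True)
    # Criterion 3: time between consecutive transactions.
    # Assumption: gaps of exactly 20 or 60 min count as short activities.
    if gap_minutes > SHORT_ACTIVITY_MAX_MIN:
        return Decision("long activity", True)
    if gap_minutes >= TRANSFER_MAX_MIN:
        return Decision("short activity", True)
    # Criterion 4: a distance under 400 m is labelled a destination.
    # Assumption: this check overrides the transfer label.
    if boarding_to_alighting_m < SHORT_LEG_M:
        return Decision("short distance", True)
    return Decision("transfer", False)


transfer = classify_alighting(False, False, 10.0, 2000.0)
short_activity = classify_alighting(False, False, 30.0, 2000.0)
```

Only the final branch yields a transfer; every other outcome marks the alighting stop as a destination, which matches the statement that both short and long activities are labelled as the destination.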
Fig. 4 Distinguishing transfer from activity (flowchart). Boxes: read the alighting stop;
check if it is the last transaction of a day; check if, for the subsequent transaction, the
same route or a parallel route is selected; check if the distance between boarding and
subsequent alighting for the transaction is less than 400 m; check if the time threshold
between two transactions is more than 1 hour; if it is less than 20 min, label it as
transfer; if more than 20 min and less than 1 hour, label it as short activity; label it
as activity; label as "Destination".

5 Origin-Destination Analysis
One of the critical considerations when planning transit services is estimating the
demand for each route, to determine the frequency and capacity of the vehicles
(Tamblay et al. 2018). An OD matrix provides critical information for transit planners
by estimating the number of journeys between different zones, information which
can be used in transportation planning, design and management. After analysing
the data based on the trip chain model, bus users’ origin and destination counts
during the morning peak were derived for each suburb (Fig. 5). Most trips originated
from the Paradise, Modbury, Adelaide and Klemzig suburbs. Three of them (Modbury,
Paradise and Klemzig) are major interchanges for the O-Bahn busway. Adelaide, Bedford Park
and Modbury were the destinations of most journeys during the day.
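Once each trip has an inferred origin and destination suburb, the OD matrix and the per-suburb counts of the kind plotted in Fig. 5 are simple tallies. The Python sketch below shows the idea; the function names are hypothetical and the sample pairs are invented, not results from the study.

```python
from collections import Counter
from typing import Iterable, Tuple

ODPair = Tuple[str, str]  # (origin suburb, destination suburb)


def build_od_matrix(pairs: Iterable[ODPair]) -> Counter:
    """Count trips for every (origin, destination) cell."""
    return Counter(pairs)


def marginal_counts(matrix: Counter) -> Tuple[Counter, Counter]:
    """Sum the matrix by origin and by destination."""
    origins: Counter = Counter()
    destinations: Counter = Counter()
    for (origin, destination), trips in matrix.items():
        origins[origin] += trips
        destinations[destination] += trips
    return origins, destinations


# Invented OD pairs for illustration only.
pairs = [
    ("Modbury", "Adelaide"),
    ("Modbury", "Bedford Park"),
    ("Paradise", "Adelaide"),
    ("Modbury", "Adelaide"),
]
matrix = build_od_matrix(pairs)
origins, destinations = marginal_counts(matrix)
```

The row and column sums give the origin and destination counts per suburb, and `matrix.most_common()` ranks the busiest OD pairs for shortlisting.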
5.1 Discussion of Origin-Destination Analysis
The origin-destination analysis showed that bus movements were radial, and most
trips during the morning peak ended in the CBD. These movements were further
explored to rationalise the existing routes. The findings below came from the OD
analysis and were used to identify specific routes. Suburbs with the highest origin and
destination counts were shortlisted and analysed further; Fig. 6 shows movement patterns
from these suburbs as percentages of trips originated or attracted, represented by the
thickness of the desire lines.
• Modbury–Bedford Park: the OD analysis showed high demand from Modbury
to Flinders University during the morning peak, but just one route (G40) runs
between the suburbs, going through the CBD. The results indicate that a direct
route is required from Modbury to Bedford Park.
• Paradise–Bedford Park: there are two bus routes between these two suburbs (W90
and G40), and both pass through the CBD, which is heavily congested during the
morning peak. It is worth exploring the option of a direct route from Paradise to
Flinders University that avoids congested city links.
• Modbury–North Adelaide: bus routes between these two suburbs run through the
CBD. Given the high demand on this route, it would be better to explore another
direct route and divert some buses.
Fig. 5 Origin and destination counts for each suburb (bus system)
Fig. 6 Percentage of trip movements between suburbs with high origins and destinations
6 Validation
The best way to examine a model’s accuracy is to validate its results with independent data.
This was done through a survey in which fifteen volunteers were recruited randomly,
and their 407 transactions were analysed. This differed from the approach of earlier
studies, which undertook a household survey or utilised data from a closed system
where both boarding and alighting records are available. For example, Barry et al.
(2009) validated their assumptions by taking passenger counts at the exit and entrance
of the subway station in New York. Later, Devillaine et al. (2012) validated their
smart card findings by undertaking a travel survey in which users’ smart card
IDs were recorded. Munizaga et al. (2014) validated the assumptions they used in
the trip chain model through a travel survey of a small group of volunteers, which
returned 90% confirmation. In Brisbane, where the ‘tap on, tap off’ system records
data for both boarding and alighting, the trip chain model assumptions were validated
against the go card dataset (Alsger 2016; He et al. 2015).
6.1 Estimating the Sample Size for a Survey
Estimating the sample size is critical for obtaining accurate results, and it is necessary
to investigate how far an increase in the sample size reduces the error in the results.
In the context of survey objectives, two rationales can be considered:
the first is estimating specific population parameters, and the second is testing
statistical hypotheses. In this paper, the objective of the survey is related to population
parameters, and in such a case the factors that should be taken into account (Richardson
et al. 1995) include the variability of parameters across the population, the required
degree of precision, and the population size.
Some approaches that consider estimating the sample size, such as that of Ceder
(2016), employ a procedure involving a survey for OD matrix, by taking into account
the percentage of passengers who travel between specific origins and destinations, the
population of each suburb, and the accuracy of each cell in the OD matrix. Previous
studies’ sample sizes vary as follows: 37 volunteers (Ebadi and Kang 2016), 53
(Munizaga et al. 2014), 306 (Lee and Hickman 2014) and 8000 households (Seaborn
et al. 2009).
This paper takes a different approach, based on a random sample method for discrete
variables. For this dataset, which includes discrete variables, the standard
error for estimating a proportion p is given in Eq. 1 (Richardson et al. 1995).

\[ \mathrm{s.e.}(p) = \sqrt{\frac{N - n}{N} \cdot \frac{p(1 - p)}{n}} \qquad (1) \]

where
n: sample size
N: population
In this study, the sample size was based on the population of the whole dataset
and assumed there would be a 95% correlation with the results. In the present study,
only the number of commuters who used buses was considered, and N = the number
of transactions per day by these passengers: 139,187. To calculate the population of
the whole dataset, the first week of May 2017 was considered, and Wednesday’s data
were selected as showing the most transactions. As per the equation, the minimum
sample size (with 95% confidence) for the transactions is estimated as 105. However,
this study analysed 407 transactions.
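For readers who want to experiment with Eq. 1, the Python sketch below evaluates the standard error of a proportion, assuming the usual form with a finite population correction, sqrt(((N - n)/N) * p(1 - p)/n). The function name and the value p = 0.5 used in the example are assumptions; the text does not state the proportion or margin of error behind the reported minimum sample size, so the sketch does not attempt to reproduce that figure.

```python
import math


def standard_error(p: float, n: int, N: int) -> float:
    """Standard error of a proportion p estimated from a sample of size n
    drawn from a population of size N (finite population correction)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= N")
    return math.sqrt((N - n) / N * p * (1.0 - p) / n)


N = 139_187  # daily bus transactions reported in the text
# p = 0.5 is an assumed worst case, not a value taken from the study.
se_minimum = standard_error(0.5, 105, N)
se_analysed = standard_error(0.5, 407, N)
```

With any fixed p, the 407 analysed transactions give a smaller standard error than the minimum sample of 105, and the error falls to zero when the whole population is sampled.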
6.2 Survey
In this study, a survey was conducted by recruiting volunteers who usually used bus
services. Fifteen volunteers were randomly identified, and their smart card details
were collected after obtaining ethics approval and their written consent. The Department
of Planning Transport and Infrastructure provided the media code (unique
identifier in the dataset) for the smart card numbers, so that the two sources of data
could be matched by using the relevant ID. For the fifteen participants over five months,
1686 transactions were collected, of which 1177 were related to the bus system. The
interview data helped in validating the estimated OD pairs derived from the trip
chain model developed in this study. Of the 1177 bus transactions collected from
the interview survey, only 944 OD pairs were considered error-free.
The errors, which are minor, are due to the reporting of unusual
walking distances and to trip ID errors. Only these 944 OD pairs were therefore
used for validation. When they were compared with the
OD pairs reported in the interview survey, 926
matched the model results, which amounts to 98% accuracy
(see Table 2).
Table 2 Survey data information

Number of volunteers                               15
Number of transactions (5 months)                  1686
Number of transactions for the bus system          1177
Number of inferred OD pairs                        944
Number of accurate OD pairs based on an interview  926
Accuracy level (%)                                 98.09
7 Conclusions
The transit OD matrix is a useful prerequisite for planners to optimise public transport
systems. The reliability of the system is an important criterion in encouraging
people to leave their vehicles at home and take public transport instead. The primary
aim of this paper is to estimate an accurate OD matrix. A new methodology has
been developed, using SQL software and based on the trip chain model, to create an
OD matrix for Adelaide’s bus users and, as a result, to estimate the demand on the
transit system. The methodology assumes that passengers’ alighting points can be
determined using the Euclidean distance to the next boarding stop, subject to
a maximum acceptable walking distance. The approach includes several improvements over
traditional methods that increase the accuracy of the estimated OD pairs: (i)
minimising GPS errors by using the stops on the opposite side of the road; (ii)
increasing the OD estimation accuracy by observing commuter travel patterns over
a one-week period; and (iii) improving the estimated OD accuracy by using parallel
routes.
This study presents an overview of ridership patterns, using one month of MetroCard
data in Adelaide to estimate a more accurate OD matrix. The survey indicates that the
method used in this paper is 98% accurate and can be utilised elsewhere. An accurate
estimation of public transport OD will be a significant help to public agencies
involved in route rationalisation, which will lead to higher public transport patronage.
In further studies, census data could be used to validate this algorithm, and sensitivity
analysis could also be considered for the various assumptions. It may also be possible
to estimate the purposes of various trips based on smart card information, if access
to such information is made available.
References
Ali A, Kim J, Lee S (2015) Travel behavior analysis using smart card data. KSCE J Civil Eng: 1–8
Alsger AA (2016) Estimation of transit origin destination matrices using smart card fare data.
School of Civil Engineering, The University of Queensland
Alsger A, Mesbah M, Ferreira L, Safi H (2015) Public transport origin-destination estimation using
smart card fare data. In: Transportation research board 94th annual meeting
Alsger A, Assemi B, Mesbah M, Ferreira L (2016) Validating and improving public transport
origin–destination estimation algorithm using smart card fare data. Transp Res Part C: Emerg
Technol 68:490–506
Alsger A, Tavassoli A, Mesbah M, Ferreira L, Hickman M (2018) Public transport trip purpose
inference using smart card fare data. Transp Res Part C: Emerg Technol 87:123–137
Bagchi M, White P (2005) The potential of public transport smart card data. Transp Policy
12(5):464–474
Barry J, Newhouser R, Rahbee A, Sayeda S (2002) Origin and destination estimation in New York
City with automated fare system data. Transp Res Record: J Transp Res Board 1817:183–187
Barry J, Freimer R, Slavin H (2009) Use of entry-only automatic fare collection data to estimate
linked transit trips in New York City. Transp Res Record: J Transp Res Board 2112:53–61
Devillaine F, Munizaga M, Trépanier M (2012) Detection of activities of public transport users by

analyzing smart card data. Transport Res Rec: J Transport Res Board 2276: 48–55
Ceder A (2016) Public transit planning and operation: modeling, practice and behavior. CRC Press
Chapleau R, Trépanier M, Chu KK (2008) The ultimate survey for transit planning: complete
information with smart card data and GIS. In: Proceedings of the 8th international conference on
survey methods in transport: harmonisation and data comparability, pp 25–31
Cui A (2006) Bus passenger origin-destination matrix estimation using automated data collection
systems. Massachusetts Institute of Technology
Ebadi N, Kang JE (2016) Constructing activity-mobility patterns of university at buffalo students
based on UB card transactions
El Mahrsi M, Come E, Oukhellou L, Verleysen M (2017) Clustering smart card data for urban
mobility analysis. IEEE Trans. Intell Transp Syst 18(3):712–728
Friedrich M, Mott P, Noekel K (2000) Keeping passenger surveys up to date: A fuzzy approach.
Transp Res Record: J Transp Res Board 1735:35–42
Gordon J, Koutsopoulos H, Wilson N, Attanucci J (2013) Automated inference of linked transit
journeys in London using fare-transaction and vehicle location data. Transp Res Record: J Transp
Res Board 2343:17–24
Gur Y, Ben-Shabat E (1997) Estimating bus boarding matrix using boarding counts in individual
vehicles. Transp Res Record: J Transp Res Board 1607:81–86
He L, Nassir N, Trépanier M, Hickman M (2015) Validating and calibrating a destination estimation
algorithm for public transport smart card fare collection systems. CIRRELT
Heidari A, Moayedi A, Abbaspour RA (2017) Estimating origin-destination matrices using an
efiicient moth flame- based spatial clustering approach. Int Arch Photogrammetry Remote Sens
Spat Inf Sci 42
Horváth B (2012) A simple method to forecast travel demand in urban public transport. Acta
Polytech Hung 9(4):165–176
Horváth B, Horváth R, Gaál B (2014) A new iterative method to estimate origin-destination matrix
in urban public transport. Transport Research Arena Europe, pp 14–17
Ickowicz A, Sparks R (2015) Estimation of an origin/destination matrix: application to a ferry
transport data. Public Transport 7(2):235–258
Kalaanidhi S, Gunasekaran K (2013) Estimation of bus transport ridership accounting accessibility.
Procedia-Soc Behav Sci 104:885–893
Kurauchi F, Schmöcker J-D (2016) Public transport planning with smart card data. CRC Press
Kusakabe T, Asakura Y (2011) Behavioural data mining for railway travellers with smart card data.
Behavioural data mining for railway travellers with smart card data
Langlois GG, Koutsopoulos HN, Zhao J (2016) Inferring patterns in the multi-week activity
sequences of public transport users. Transp Res Part C: Emerg Technol 64:1–16
Lee S, Hickman M (2014) Trip purpose inference using automated fare collection data. Public
Transport 6(1–2):1–20
Li Y, Cassidy MJ (2007) A generalized and efficient algorithm for estimating transit route ODs
from passenger counts. Transp Res Part B: Methodol 41(1):114–125
Li T, Sun D, Jing P, Yang K (2018) Smart card data mining of public transport destination: a literature
review. Information 9(1):18
Lianfu Z, ShuzhiZ, Yonggang Z, Ziyin Z (2007) Study on the method of constructing bus stops
OD matrix based on IC card data. In: International conference on wireless communications,
networking and mobile computing, 2007. WiCom 2007. IEEE, pp 3147–3150
Morency C, Trepanier M, Agard B (2007) Measuring transit use variability with smart-card data.
Transp Policy 14(3):193–203
Mosallanejad M, Somenahalli S, Vij A, Mills D (2018) Estimation of the Origin-Destination matrix
for bus system using MetroCard data, HKSTS conference
Munizaga MA, Palma C (2012) Estimation of a disaggregate multimodal public transport orig-
in–destination matrix from passive smartcard data from Santiago, Chile. Transp Res Part C:
Emerg Technol 24:9–18
Munizaga M, Devillaine F, Navarrete C Silva D (2014) Validating travel behavior estimated from
smartcard data. Transp Res Part C: Emerg Technol 44:70–79
Nassir N, Khani A, Lee S, Noh H, Hickman M (2011) Transit stop-level origin-destination estimation
through use of transit schedule and automated data collection system. Transp Res Record: J Transp
Res Board 2263:140–150
Rahbar M, Mesbah M, Hickman M, Tavassoli A (2017) Determining route-choice behaviour of
public transport passengers using Bayesian statistical inference. Road Trans Res: J Aust NZ Res
Pract 26(1):64
Richardson AJ, Ampt ES, Meyburg AH (1995) Survey methods for transport planning. Eucalyptus
Press Melbourne
Seaborn C, Attanucci J, Wilson N (2009) Analyzing multimodal public transport journeys in London
with smart card fare payment data. Transp Res Record: J Transp Res Board 2121:55–62
Simon J, Furth PG (1985) Generating a bus route OD matrix from on-off data. J Transp Eng
111(6):583–593
Tamblay S, Muñoz JC, Ortúzar JdD (2018) Extended methodology for the estimation of a zonal
origin–destination matrix: a planning software application based on smartcard trip data
Tao S (2018) Public transport planning with smart card data. In: Kurauchi F, Schmöcker JD (eds)
Boca Raton: CRC Press. ISBN 9781498726580, Elsevier
Tsygalnitsky S (1977) Simplified methods for transportation planning. Master’s thesis, Mas-
sachusetts Institute of Technology Cambridge
Utsunomiya M, Attanucci J, Wilson N (2006) Potential uses of transit smart card registration
and transaction data to improve transit planning. Transp Res Record: J Transp Res Board
1971:119–126
Wang W (2010) Bus passenger origin-destination estimation and travel behavior using automated
data collection systems in London. Massachusetts Institute of Technology, UK
Yang H, Jun C (2018) Visualization of public bus passenger travel for travel pattern analysis. In:
Adjunct proceedings of the 14th international conference on location based services, pp 121–126
Zhao J (2004) The planning and analysis implications of automated data collection systems: rail
transit OD matrix inference and path choice modeling examples. Massachusetts Institute of Tech-
nology
Zhao J, Rahbee A, Wilson NH (2007) Estimating a rail passenger trip origin-destination matrix
using automatic data collection systems. Comput-Aided Civil Infrastruct Eng 22(5):376–387
