
Big Data Analytics: Estimation of Destination for Users of Bus Rapid Transit (BRT) Public Transportation in Jakarta

Muhammad Syarif
Department of Electrical and Information Engineering
Universitas Gadjah Mada
Yogyakarta, Indonesia
muhammad.syarif@mail.ugm.ac.id

Widyawan
Department of Electrical and Information Engineering
Universitas Gadjah Mada
Yogyakarta, Indonesia
widyawan@ugm.ac.id

Teguh Bharata Adji
Department of Electrical and Information Engineering
Universitas Gadjah Mada
Yogyakarta, Indonesia
adji@ugm.ac.id

978-1-5386-8448-1/19/$31.00 ©2019 IEEE

Abstract—The Origin-Destination (OD) matrix is a metric that has become a main requirement in the analysis of transportation systems. It serves as a model for management, development, and transportation planning. With the arrival of the big data era, the infrastructure used to measure OD matrices has also changed. With the development of Intelligent Transportation Systems (ITS), estimating the origin-destination flow of transportation users is used as an approach to observe the travel behavior of public transport users, formulated as an OD matrix. Much infrastructure has been developed to capture user behavior by recording transactions. This research presents a method for estimating alighting stations from Automated Fare Collection (AFC) transaction data of Bus Rapid Transit (BRT) smart card users between bus stations and corridors, with OD matrices as its parameters. Travel destinations are estimated with a same-day trip chaining approach. Validation is done by comparing the estimated OD matrices with the available entry-exit AFC data. The estimation results are compared between corridors and stops to show how the scope of the analysis affects the validity of the results. The purpose of this study is to determine the level of validity of the OD matrix estimation results, validated with AFC data entries, using the proposed approach. The comparisons at each level (corridor to bus stop) show that the approach achieves an accuracy of up to 94%. The comparison of destination estimation at each level shows that OD matrix validity is better at higher levels.

Keywords—OD Matrices, AFC, Trip Chaining

I. INTRODUCTION
The Origin-Destination matrix is an important element in transportation management. The matrix is a measurement for modeling transportation conditions at various levels. Studies on system modeling in strategic management that use the OD matrix as an input have been carried out to solve various transportation cases. The matrix represents how users travel: where they start their trips and where they stop [1].

AFC is a ticketing system that has been widely used in public transportation to record user transactions. These records allow the OD matrix of the transportation system to be analyzed over time. The AFC system has been studied and implemented in many public transportation systems around the world, across various modes [2]. Jakarta, the capital city of Indonesia with a high population density, has implemented the AFC system in the BRT transportation mode as the means of user transactions. As one of the main public transportation systems, BRT has become the solution to the severe congestion problems in Jakarta; it was the first to be implemented in Southeast Asia and has the longest line in the world [3]. AFC in many countries around the world relies on entry-only systems, where users are only recorded when boarding. Such a system is used to reduce the possible delay at the device when users log data at the alighting station. However, an entry-only system makes it difficult to analyze the travel behavior and density of users, since it requires destination inference before an accurate OD matrix can be estimated [4].

The estimation of the OD matrix for BRT Jakarta has been carried out by Widyawan et al. using entry-only AFC data. The study was able to model the number of transactions registered in the system and describe them as OD matrices [1]. The first problem of that research is that the estimation results cannot explain the actual behavior of BRT users. This is due to the large number of anomalies described in the study, which prevented destinations from being estimated properly. The second problem is that the estimated results were not validated, either internally or externally, as there was no exit data available.

The aims of this research are to address both problems by estimating the destinations of BRT passengers in Jakarta at the various levels of the available routes, producing OD matrices as a result, and validating the results using AFC exit data as the valid destination.

In the following sections, the current studies, the description of the data, the flow of the destination estimation method, and the results of estimation and validation are presented.

II. CURRENT STUDIES
There are many ways to perform destination inference in order to generate an OD matrix, depending on the dataset available in each case. Trip chaining is a method commonly used by researchers to find the boarding and alighting stations of users for estimating the OD matrix. Many studies modified the travel assumptions for the object of study and validated the assumptions made.

Wang et al. [2] performed trip chaining with three assumptions to estimate the alighting station: the final
destination station of the user’s trip is the same as the station of the first transaction on the same day, there is no private mode of transportation in the daily trip sequence, and no passengers walk long distances between intermediary stations. The study used a combination of entry-only AFC and AVL (Automated Vehicle Location) data for public transportation cases, and the minimum share of destinations inferred was 57% using manual survey validation. Because of the assumption that excludes private modes, the approach did not always work, especially for intermediary trips. In addition, the study stated no actual validation or method for obtaining the distance parameter used to measure the walking range of passengers between stations.

In terms of OD matrix validation, Nassir et al. [5] validated OD matrices using an entry-only AFC dataset to infer boarding and alighting stops based on location and GTFS (Google’s General Transit Feed Specification), which provides schedule information. The verification was performed by comparing the estimated boarding and alighting locations with the combined APC-VL (Automated Passenger Count-Vehicle Location) data provided in the study, and achieved a result of 98%. This was possible because the verification data sources were fully available, which is not always the case elsewhere.

Munizaga et al. [6] estimated alighting stations to obtain an OD matrix from smart card data of a passive entry-only AFC system for bus and metro. AFC and GPS data from public transportation vehicles were used to predict the alighting station of each user. 80% of the alighting stations across all modes of transportation could be estimated. The study used the same trip chaining assumptions as Barry et al. [7] and Trépanier et al. [8], and added objective functions involving generalized time, a walking distance parameter of 1000 m determined by the shortest distance using Dijkstra’s algorithm, and other influencing factors. However, the study was only able to estimate user journeys through data recorded on the metro, buses, and bus stops. The estimation results were not validated, so the correctness of the predicted OD matrix was not established. Munizaga et al. [9] then validated the assumptions behind the OD matrix estimation results for the metro transaction database on user cards. The validation was performed using an OD matrix survey of registered card users.

Furthermore, Alsger et al. added a transfer time assumption in the form of ITT (Inter Transfer Time) for intermediary trips, which explains how users make transfers over time, in the case of an entry-exit AFC interconnected network of public buses, trains, and ferries. The study investigated transfer times between 15 and 90 minutes in 15-minute increments to determine whether a trip is complete or still in transfer [10]. Alsger et al. then developed OD matrix estimation on the same dataset by extending the trip chaining assumptions and validating against actual passenger boarding and alighting data [11]. That study changed the assumption for the final destination of the trip, as a correction to the final destination of the system, based on the following:
1. Finding the last alighting station using the route trip ID.
2. Finding the alighting station on the same route that is closest to the first boarding station of the same day.
3. Using the station as the last destination.

Spatial validation was performed by A. A. Nunes et al. to validate the trip chaining algorithm for an entry-only AFC system integrated with AVL on the bus. AVL, in this case, was the tool that records bus stops to be included in the user’s smart card transaction. Candidate destinations were recorded using Euclidean distance as the tolerance distance between the candidates for the first trip and the next origin. The validation was performed experimentally, before and after the implementation of spatial validation, by looking at the number of destinations that could be estimated [4]. However, the spatial validation did not provide quantitative results that compare the estimated OD matrices with actual counts. The same holds for the research by Hora et al. [12], who assumed that a maximum transfer distance measured with Manhattan distance is a more realistic measure of walking distance.

In summary, many assumptions have been made to model trip chaining for destination inference and OD matrix generation with the datasets available in each case. Nevertheless, the model has not been validated against actual passenger boarding and alighting behavior as ground truth. Therefore, the exit data drawn from entry-exit AFC is considered in this research a better basis for validating the estimated model and its assumptions for every passenger transaction at every level.

III. DATA DESCRIPTION
This study uses actual data from BRT transactions in Jakarta, which has 14 corridors with 242 stops serving 6 cities and 34 districts. The transaction data is obtained from the BRT AFC system through users’ smart cards. In 2014, the government implemented an AFC payment system with an entry-only mechanism, i.e. passengers tap in to enter the departure stop as the payment method. The mechanism changed in 2017, when the government implemented an entry-exit system to obtain OD matrix data by recording passenger transactions when entering and leaving the bus stop.

This study uses BRT transaction data in Jakarta from 01 to 08 December 2017, with the tap-out records as the ground truth for destination estimation. The data amounts to more than 1 million transactions, with an average of around 140,000 transactions per day after preprocessing. The OD matrix is used as the metric for estimating and validating origins and destinations. To implement the algorithm, this research uses the following information: the timestamp of the transaction, the user’s smart card serial number, the bus stop, the flag status (in or out), and the reference of corridors, sub-corridors, and stops.

IV. DESTINATION INFERENCE METHOD
This section describes the process of inferring users’ travel destinations, which is divided into two stages: data preprocessing and estimation using the trip chaining algorithm. The two stages are explained in Subsections IV-A and IV-B. Data preprocessing explains how the data is filtered to reduce anomalies caused by procedural errors that occur at the application level in the field, such as data redundancy, user transaction confusion, and system abuse. The estimation method subsection describes the method used to estimate a destination, with an OD matrix as its output. In the matrix validation part, an endogenous validation mechanism is explained, which describes the validation method based on the primary data available in the internal system.

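
As a rough illustration of the OD matrix that the estimation stage outputs, the sketch below counts origin-destination pairs from paired tap-in and tap-out records, which is also how an observed (ground truth) matrix could be built from entry-exit data. The `Tap` class, its field names, the stop labels, and the sample records are assumptions made for illustration; they are not the operator's schema or real data.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Tap:
    """One AFC record (illustrative field names, not the real schema)."""
    timestamp: datetime
    card_id: str   # smart card serial number
    stop: str      # bus stop where the tap happened
    flag: str      # "in" for boarding, "out" for alighting


def observed_od_matrix(taps):
    """Count (origin, destination) pairs from in/out taps of each card."""
    od = Counter()
    open_trip = {}  # card_id -> origin stop of the trip in progress
    for tap in sorted(taps, key=lambda t: (t.card_id, t.timestamp)):
        if tap.flag == "in":
            open_trip[tap.card_id] = tap.stop
        elif tap.flag == "out" and tap.card_id in open_trip:
            od[(open_trip.pop(tap.card_id), tap.stop)] += 1
    return od


# Made-up sample: one card commuting out and back on the same day.
taps = [
    Tap(datetime(2017, 12, 1, 7, 0), "card-A", "S1", "in"),
    Tap(datetime(2017, 12, 1, 7, 40), "card-A", "S2", "out"),
    Tap(datetime(2017, 12, 1, 17, 5), "card-A", "S2", "in"),
    Tap(datetime(2017, 12, 1, 17, 50), "card-A", "S1", "out"),
]
od = observed_od_matrix(taps)
```

A sparse `Counter` keyed by stop pairs is used here instead of a dense 242 x 242 array only to keep the sketch short; either representation holds the same counts.
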
A. Data Preprocessing
The data processed in this study is BRT transaction data covering 8 days. The data satisfies the attributes of the AFC entry-exit system, namely the presence of in and out flags on each transaction to mark the user’s boarding or alighting status. The data is then cleaned by removing duplicates and deleting users who do not fulfill the system attributes. This transaction screening is carried out to obtain valid user transaction data. The filtering process is illustrated in Fig. 1 and consists of the following steps:
1. Remove duplicate data that occurs due to errors at the application level.
2. Remove single transactions, i.e. transactions made only once a day.
3. Keep the travel data of users who have completed their transactions, namely those who perform both tap-in and tap-out.
4. Group transactions of users who tap at adjacent times.
The result of preprocessing is transaction data with an entry flag that is passed to the destination estimation process. The data is sorted by transaction time and user ID.

Fig. 1. Sample of a filtering transaction process

Between these dates, the AFC tap-in and tap-out system recorded a total of 4.7 million transactions, with an average of more than 590,000 transactions per day. All daily transactions at all levels are then analyzed. The frequency of daily transactions over all dates in the raw data is shown in Fig. 2.

Fig. 2. The frequency of raw transactions each day

After the preprocessing stage, the number of transactions decreases. The filtering in the preprocessing phase is required to isolate the data that belongs to valid users. The number of transactions at the validated user level is calculated and illustrated in Fig. 3.

Fig. 3. The frequency of filtered transactions each day

Fig. 2 and 3 show a significant reduction in the number of user transactions from raw data to filtered data. This indicates that a large number of BRT users in Jakarta do not use a personal smart card as their transaction tool.

The filtering process identifies valid user data, but it is not enough to detect smart cards that are shared by several people. Accounts that perform transactions together are therefore grouped in order to filter them better; they are identified from data records that are close in time and carry the same flag. The number of transactions validated by user grouping is illustrated in Fig. 4.

Fig. 4. The frequency of grouped transactions each day

Fig. 4 is expected to show a reduction from the filtered transactions, because transactions performed by a user within a limited time range are grouped into a single transaction. In practice, the resulting reduction was not significant. The study also analyzes daily transactions in the raw and validated data, both of which show a decrease in transactions on holidays. The decrease in both indicates reduced activity on personal cards as well as on the cards of BRT officers.

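
The four filtering steps above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the Spark pipeline used in the study: the tuple layout, the function name `preprocess`, and especially the 60-second `GROUP_WINDOW` are hypothetical, since the text does not state the time threshold used for grouping.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed grouping threshold; the paper does not give a value.
GROUP_WINDOW = timedelta(seconds=60)


def preprocess(records):
    """records: iterable of (timestamp, card_id, stop, flag) tuples."""
    # Step 1: drop exact duplicates (sorted by time, then card).
    unique = sorted(set(records))

    # Bucket by (card, calendar day) so later steps work per user-day.
    by_card_day = defaultdict(list)
    for rec in unique:
        by_card_day[(rec[1], rec[0].date())].append(rec)

    cleaned = []
    for day_records in by_card_day.values():
        # Step 2: a card seen only once that day cannot form a chain.
        if len(day_records) < 2:
            continue
        # Step 3: keep only cards with both a tap-in and a tap-out.
        if {rec[3] for rec in day_records} != {"in", "out"}:
            continue
        # Step 4: collapse same-flag taps that fall inside the window.
        kept = []
        for rec in day_records:
            same_flag = kept and kept[-1][3] == rec[3]
            if same_flag and rec[0] - kept[-1][0] <= GROUP_WINDOW:
                continue
            kept.append(rec)
        cleaned.extend(kept)
    return sorted(cleaned)


t0 = datetime(2017, 12, 1, 7, 0, 0)
sample = [
    (t0, "c1", "S1", "in"),
    (t0, "c1", "S1", "in"),                          # exact duplicate
    (t0 + timedelta(seconds=10), "c1", "S1", "in"),  # group tap
    (t0 + timedelta(minutes=40), "c1", "S2", "out"),
    (t0, "c2", "S1", "in"),                          # single transaction
    (t0, "c3", "S1", "in"),                          # never taps out
    (t0 + timedelta(hours=1), "c3", "S3", "in"),
]
result = preprocess(sample)
```

In this made-up sample only card `c1` survives, reduced to one tap-in and one tap-out. The sketch keeps both flags in its output; selecting only entry-flag rows for the estimation stage would be a further filter on the result.
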
B. Trip Chaining Algorithm
Trip chaining is a method used to build an OD matrix from transactions. In this study, trip chaining is carried out by applying assumptions appropriate to the object of research. The assumptions are as follows:
1. No single transaction is counted as an OD pair on the trip.
2. There is no transaction with unpaired flags.
3. The alighting stop of a transaction is the same as the next boarding stop of the same user.
4. At the end of the day’s trips, passengers return to the place where they first boarded that day.
Fig. 5 describes how the assumptions were implemented in the model to estimate the alighting station of every passenger transaction. In the study conducted by Widyawan et al. [1], a single transaction in the entry-only AFC system was included in the analysis by assigning the origin of the transaction as its destination. In principle, destination inference for a single transaction cannot be performed by the trip chaining method. Therefore, in this study, single transactions were removed at the preprocessing stage.

Fig. 5. Trip Chaining for destination inference

V. RESULT AND DISCUSSION
The algorithm in this study was implemented using Apache Spark as the analytic processor and Jupyter Notebook as the processing interface. This section presents an analysis of the anomalies found during data preprocessing and a comparison with the processed data. An analysis of the multilevel estimation results is also presented to compare the estimated matrix with actual data.

A. Analysis of Data Anomalies
Data anomalies were detected by assessing unusual transactions found in the dataset. One problem is that a very large number of transactions were made with a single card. These were possibly carried out by officers and by users who perform transactions in groups. Another problem is transactions that do not form complete trips. Table I describes the differences between the raw and processed transactions to distinguish data anomalies.

$$\frac{\text{Total Processed Transaction}}{\text{Total Raw Transaction}} \times 100\% \qquad (1)$$

TABLE I. COMPARISON BETWEEN RAW AND PROCESSED TRANSACTION

Transactions                       Result Count
Raw
  Total                            4,753,334
  Mean Transactions Each User      226.5
Processed
  Total                            1,162,300
  Mean Transactions Each User      13.75
  Sum of OD Pair                   573,222

Table I shows the overall results from the preprocessing step through the algorithm implementation. The data shows that many transactions were carried out by officers, which causes transaction anomalies. The anomalies can be seen from the average number of transactions per user in the raw data, which reaches more than 200 transactions each day. These transactions are filtered at the preprocessing stage to obtain non-officer transactions, which are considered actual user transactions. Preprocessing changes the average number of transactions per user significantly, which indicates that attendant cards were dominantly used when the AFC entry-exit system was first applied. Filtering transactions by user at this stage influences the number of transactions obtained. Comparing the processed data with the total raw transactions as described in (1) shows a significant decrease in the number of transactions: the transactions of passengers who had OD pairs amount to 24% of the total transactions (1,162,300 of 4,753,334). Furthermore, destination inference for the users is analyzed at various levels, with the OD matrix describing the trip patterns generated from the transactions.

B. Multilevel Analysis
The BRT in Jakarta has 14 corridors, representing 14 lanes served by the fleet. In destination inference analysis at this level, trip chaining is applied to determine the pattern of passenger travel from one corridor to another.

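
To make the chaining rule concrete, the sketch below applies assumptions 3 and 4 of Section IV-B to one card's ordered boarding stops for a day and then aggregates the inferred trips to the corridor level. It is a simplified illustration, not the authors' Spark implementation; the function names and the `STOP_TO_CORRIDOR` lookup are hypothetical, and the stop and corridor labels are made up.

```python
from collections import Counter


def chain_destinations(boardings):
    """boardings: one card's tap-in stops for one day, in time order."""
    if len(boardings) < 2:
        return []  # single transactions are removed, not guessed
    trips = []
    for i, origin in enumerate(boardings):
        # Assumption 3: alight where the next boarding happens.
        # Assumption 4: the last trip returns to the first boarding stop
        # (the modulo wraps the final index back to position 0).
        destination = boardings[(i + 1) % len(boardings)]
        trips.append((origin, destination))
    return trips


def od_matrix(trips, level=None):
    """Count OD pairs, optionally mapping stops to a coarser level."""
    od = Counter()
    for origin, destination in trips:
        if level is not None:
            origin, destination = level[origin], level[destination]
        od[(origin, destination)] += 1
    return od


# Hypothetical stop-to-corridor reference table.
STOP_TO_CORRIDOR = {"S1": "K1", "S2": "K1", "S3": "K2"}

trips = chain_destinations(["S1", "S3", "S2"])
stop_level = od_matrix(trips)
corridor_level = od_matrix(trips, level=STOP_TO_CORRIDOR)
```

Passing a different lookup (for example stop to sub-corridor) would produce the matrix at another level from the same inferred trips, which is one way to read the multilevel comparison.
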
Fig. 6. Heatmap of Line Level Trip Chaining OD Matrix Sample

Each of the 14 corridor lines contains sub-corridors and stops. A total of 60 sub-corridors with 242 stops are available in this case. The generated OD matrix shown in Fig. 6 provides an overview of the density of users at each level. In OD matrix generation, the bus stop level is processed differently from the corridor and sub-corridor levels when the origin and destination are the same point. At the bus stop level, transactions that have the same origin and destination are not considered, because they indicate that the user did not move, so travel behavior cannot be analyzed. This differs from corridors and sub-corridors, which contain different stops even when the origin and destination are the same corridor or sub-corridor. Using the assumptions applied, the average number of daily transactions is obtained at all levels.

Fig. 7 shows the frequency of users who board and alight at the same stop in the predicted OD matrix at the bus stop level. The daily error frequency was calculated by dividing the miscalculated count obtained from the algorithm implementation by the total transactions of the day. The average predictive error of destinations in this case is less than 2%. The predictive errors, which appear as equal origin and destination values in the matrix, are caused by possible user behavior that has not been included in the trip chaining assumptions. They occur when passengers return to the boarding stop of the previous trip using other modes of transportation, so that the next trip departs from the same stop as the previous one. One assumed cause of this behavior is that some passengers want to change destinations on the next trip, on a different route from the first trip, and thus rely on large stops to do so. These passengers depart from the main bus station nearest to the destination stop on the first trip and return to the main stop to start the second trip. Another assumed cause is that users look for empty buses on other lines for the same purpose.

Fig. 7. The frequency of predicted destination which has the same stop as origin at bus stop level

C. Travel Estimation Ability
In this section, the experimental analysis is done by comparing the OD matrix estimation results at each level with the actual OD matrix. Several preparations were made to obtain the comparison between levels. The 8-day transaction data from 01 to 08 December 2017 is processed to obtain the boarding and alighting stations of each transaction. The processed data is then extracted in the form of a matrix at each level. The matrix is validated against the actual matrix, which passes through the same preprocessing, to obtain the accuracy of the predicted results. The user destination used to complete the OD matrix is endogenously validated using the out-flag data of the same transaction. The purpose of validation is to test the validity of the destination estimation of each transaction at each level. Table II shows the number of daily estimated transactions and the actual transaction data. The data shows that the number of transactions is the same in the actual and predicted results, both for the total transaction data and for the total number of OD pairs produced. The difference between the number of transactions and the OD matrix totals arises because the calculated OD matrix does not count group transactions, namely transactions carried out by more than one person at adjacent times using the same smart card. The same treatment is applied when calculating the accuracy between the two matrices presented in Table III, because group transactions bias the OD matrix calculation. One case that causes bias is when the number of taps to enter a bus stop differs from the number of taps to exit. The difficulty of assigning values in the matrix would then produce different totals in the predicted and the actual matrices.

TABLE II. PROCESSED TRANSACTION AND OD MATRIX GENERATION

Date   Sum Truth   Sum Estimated   Sum OD Truth   Sum OD Predicted
1      42,508      42,508          41,793         41,793
2      50,693      50,693          49,753         49,753
3      43,594      43,594          42,708         42,708
4      90,360      90,360          89,186         89,186
5      90,464      90,464          89,441         89,441
6      91,252      91,252          90,203         90,203
7      86,835      86,835          85,781         85,781
8      85,444      85,444          84,357         84,357

In terms of the data used for validation, researchers have applied a variety of methods to verify predicted OD matrices, which fall into two main types:
a. Endogenous data, i.e. validation data obtained from internal transportation systems, such as data in transportation database systems, GPS data attached to public transportation, passengers’ mobile phone data, or other internal systems.
b. Exogenous data, i.e. data from sources external to the main system, such as data obtained from Automated Passenger Count (APC), volunteer surveys, and interviews.
The accuracy is calculated at each level by averaging the accuracy of the matrix comparison over the daily transactions in the available data. Table III shows that the highest accuracy is obtained at the corridor level and the lowest at the bus station level, which indicates that the higher the level of estimation carried out using the
assumed trip chaining offered, the higher the accuracy obtained. Estimation at each higher level corresponds to a wider scope of the study.

TABLE III. CALCULATION ACCURACY AT EVERY LEVEL

Level of Estimation   Accuracy
Corridor/Line         94%
Sub-Corridor          84%
Station               57%

VI. CONCLUSION
This study makes two major contributions to the case of Jakarta’s BRT. First, it estimates destinations using the trip chaining assumptions after filtering for actual users. The estimation is carried out on entry-exit AFC records of tapping transactions with entry and exit status, with the exit data set aside during the estimation process. Second, it validates the estimation results using the AFC exit data as the actual data.

In terms of estimation, this study has identified the OD matrix by filtering BRT smart card users at the ID level. Users are identified by removing data anomalies, i.e. cards with an abnormal number of transactions, users who do not have complete trips, and user IDs that perform transactions in groups, detected from transaction data whose entry and exit status have contiguous intervals. Identification is also performed by filtering user travel behavior, eliminating single transactions, i.e. transactions that are paired only once a day.

From the validation results, the OD matrix estimation has been assessed by comparing it with the OD matrix of the complete transactions of the same user IDs. The results give the highest accuracy at the corridor level, at 94%, which leads to the conclusion that the accuracy of OD matrix estimation is higher when it is performed at a higher level. The trip chaining estimation and the validation with exit data in this case are considered to describe users’ travel behavior better and to give more accurate results, since they represent the actual transaction data of each user. This could not be done in previous studies, since there was no exit data or other system connected directly to user transactions.

The estimation and validation process in the BRT case still requires further investigation of user behavior to dig deeper into the travel data that cannot yet be estimated, so that it yields sufficient information. Data remains unestimated because many BRT passengers do not have a personal card and rely on attendant cards to tap in and out when entering and exiting bus stops.

REFERENCES
[1] Widyawan, B. Prakasa, D. W. Putra, S. S. Kusumawardani, B. T. Y. Widhiyanto, and F. Habibie, “Big data analytic for estimation of origin-destination matrix in Bus Rapid Transit system,” in Proceeding - 2017 3rd International Conference on Science and Technology-Computer, ICST 2017, 2017.
[2] W. Wang, J. Attanucci, and N. Wilson, “Bus Passenger Origin-Destination Estimation and Related Analyses Using Automated Data Collection Systems,” J. Public Transp., vol. 14, no. 4, pp. 131–150, 2011.
[3] R. Cervero, “Bus Rapid Transit (BRT): An Efficient and Competitive Mode of Public Transport. Working Paper 2013-01, Institute of Urban and Regional Development, University of California,” no. August, 2013.
[4] A. A. Nunes, T. G. Dias, and J. Falcão E Cunha, “Passenger journey destination estimation from automated fare collection system data using spatial validation,” IEEE Trans. Intell. Transp. Syst., 2016.
[5] N. Nassir, A. Khani, S. Lee, H. Noh, and M. Hickman, “Transit Stop-Level Origin-Destination Estimation Through Use of Transit Schedule and Automated Data Collection System,” Transp. Res. Rec. J. Transp. Res. Board, vol. 2263, pp. 140–150, 2011.
[6] M. A. Munizaga and C. Palma, “Estimation of a disaggregate multimodal public transport Origin–Destination matrix from passive smartcard data from Santiago, Chile,” Transp. Res. Part C Emerg. Technol., vol. 24, pp. 9–18, Oct. 2012.
[7] J. Barry, R. Newhouser, A. Rahbee, and S. Sayeda, “Origin and Destination Estimation in New York City with Automated Fare System Data,” Transp. Res. Rec. J. Transp. Res. Board, vol. 1817, pp. 183–187, Jan. 2002.
[8] M. Trépanier, N. Tranchant, and R. Chapleau, “Individual Trip Destination Estimation in a Transit Smart Card Automated Fare Collection System,” J. Intell. Transp. Syst., vol. 11, no. 1, pp. 1–14, 2007.
[9] M. Munizaga, F. Devillaine, C. Navarrete, and D. Silva, “Validating travel behavior estimated from smartcard data,” Transp. Res. Part C Emerg. Technol., 2014.
[10] A. A. Alsger, M. Mesbah, L. Ferreira, and H. Safi, “Use of Smart Card Fare Data to Estimate Public Transport Origin–Destination Matrix,” Transp. Res. Rec. J. Transp. Res. Board, vol. 2535, pp. 88–96, Jan. 2015.
[11] A. Alsger, B. Assemi, M. Mesbah, and L. Ferreira, “Validating and improving public transport origin-destination estimation algorithm using smart card fare data,” Transp. Res. Part C Emerg. Technol., 2016.
[12] J. Hora, T. G. Dias, A. Camanho, and T. Sobral, “Estimation of Origin-Destination matrices under Automatic Fare Collection: the case study of Porto transportation system,” Transp. Res. Procedia, vol. 27, pp. 664–671, Jan. 2017.
