
Use of Entry-Only Automatic Fare Collection Data to Estimate Linked Transit Trips in New York City
James J. Barry, Robert Freimer, and Howard Slavin

J. J. Barry, New York City Transit, Metropolitan Transit Authority, 72 Lindbergh Lane, New York, NY 10956. R. Freimer and H. Slavin, Caliper Corporation, 1172 Beacon Street, Suite 300, Newton, MA 02461. Corresponding author: R. Freimer, robert@caliper.com.

Transportation Research Record: Journal of the Transportation Research Board, No. 2112, Transportation Research Board of the National Academies, Washington, D.C., 2009, pp. 53–61. DOI: 10.3141/2112-07

Many large transit systems use automatic fare collection (AFC) systems. Most AFC systems were designed solely for revenue management, but they contain a wealth of customer use data that can be mined to create inputs to operations planning and demand forecasting models for transportation planning. More detailed information than could ever be collected by any travel survey is potentially available if it is assumed that the transactional data can be processed to produce the desired information. Previous work in this field focused primarily on rail transit, since boardings at fixed stations are easier to locate than boardings of buses, which move around. This paper presents a case study for the Metropolitan Transit Authority’s New York City Transit, a transit system in which a rider swipes a fare card only to enter a station or board a bus. This is the first work to include trips by all transit modes in a system that records the transaction only on rider entry, which is significantly more challenging because all the alighting locations need to be inferred and the bus boarding locations need to be estimated. No location information (from automated vehicle location technology or a Global Positioning System) was available for buses. Software that processes the 7 million–plus daily transactions and that creates a data set of linked transit trips was created. The data set can then be analyzed by using geographic information system-based query software to create reports, maps, origin–destination matrices, load profiles, and new data sets. Subway journeys are assigned by using a schedule-based shortest-path algorithm.

Many large transit systems use automatic fare collection (AFC) systems, in which a rider swipes a card to enter a subway station or to board a bus. Once AFC systems came into use, transportation planners realized that they contained a wealth of customer use data that could be mined to create inputs to operations planning and demand forecasting models. Because most AFC systems were designed solely for the efficient collection of fares from a huge number of riders per day, special and often difficult processing of their transactional data sets is required to derive information useful for transportation planning (1).

Early work by Barry et al. showed that AFC transactions could be used to estimate subway origin–destination patterns in New York City (2). Rahbee and Czerwinski then applied similar techniques to the Chicago, Illinois, Transit Authority’s (CTA’s) system (3). These earlier works focused on rail transit trips and did not include all bus trips. The Massachusetts Institute of Technology, working with CTA and Transport for London, has conducted a detailed analysis of fare card data, typically with more and better-quality information than that available for New York City (1, 4–6).

AFC systems for transit can be classified by whether or not they require a swipe to leave the system. Exit swipes are required by distance-based fare systems (e.g., those in San Francisco, California, and London) and facilitate a more straightforward approach to the recovery of information about the transit trips that riders make. Other cities (e.g., New York and Chicago) use a flat-rate fare system and collect only boarding information. Such entry-only transit systems require additional logic to impute the alighting locations of transit trips.

This paper presents the findings of work conducted for the New York City Transit (NYCT) of the Metropolitan Transit Authority (MTA) to extend the methods of Barry et al. (2) to include all transit modes: subway, local and express buses, ferry, and tramway. This effort and its relevance to demand forecasting have been described earlier (7). The goals included improving the process for estimation of subway trips, generating origin–destination patterns by traffic analysis zone, creating load profiles for bus and subway routes, and extending the analysis beyond the morning peak period.

MTA’s MetroCard system is an entry-only AFC system in which a rider swipes a MetroCard to enter a subway station or dips a MetroCard to board a bus (a “bus dip”), generating a unique transaction that is logged onto a mainframe computer database. No transaction occurs when a rider exits a station or a bus. The system was designed in the early 1990s to collect fares efficiently from millions of riders per day on a system with a flat-rate fare structure. It was not set up to capture the details of the trips made by each rider. The transaction times saved are truncated to tenths of an hour (6-min intervals).

The basis of the approach described is that for each transaction, an attempt is made to identify the route and the specific boarding and alighting stops that define a trip leg. Multiple trip legs are combined into a linked trip when it is inferred that a rider uses his or her MetroCard two or more times to complete a single journey.

The remainder of this paper is organized in three parts. The first part describes the methods used to process the available AFC and schedule data to produce data sets consisting of geographically referenced linked transit trips, known by NYCT as the Citywide Transit Travel Database. Each data set contains all the transactions for a single day. The second part describes the user-oriented geographic information
system-based query and analysis tools that can be used to study and analyze the trip data by creating reports, maps, origin–destination matrices, load profiles, and new data sets. The final part describes how the system is being used in practice by NYCT.

CREATING TRANSIT TRIPS

This section discusses how the MetroCard transactions were processed to create geographically located, linked transit trips. Figure 1 provides a schematic flowchart of the extensive data processing required. The other inputs used include a geographic representation of transit routes, the log of actual bus trips, and the subway and bus schedules.

The procedure generates a table of linked passenger trips for a given day, with each trip having an origin and a destination that is either a subway station or a bus stop. A second table details the component legs for each linked trip. These two tables are the principal inputs used by the query software.

FIGURE 1 Data flow (TAZ = traffic analysis zone). [Flowchart: MetroCard AFC transactions, the AFC bus trip log, the NYCT bus schedules, the subway and ferry schedules, and the TransCAD route system are processed into located boardings, chained and imputed destinations, estimated arrivals, and linked legs, which are allocated to zones to produce the linked passenger trips and passenger legs tables used by the query software.]

AFC Transactions

The development project was based on a sample data set consisting of MetroCard transactions provided for a 2-week study period in late April 2004. Almost 95 million records were provided in the raw mainframe format and converted into a fixed-format binary table on a personal computer.

Unlike transit systems in many cities, NYCT never shuts down completely and runs 24 h a day, so 3 a.m. was chosen as the start and end of a day to minimize overlapping trips. The procedure processes a day’s worth of transactions at a time. Weekdays include more than 7 million transactions, corresponding to more than 9 million transit legs, because many riders transfer within the subway system.

MetroCard transactions are generated for a high percentage of boardings: 97% for the subway and 86% for buses. The remaining trips are paid for by single-use MetroCards or cash fares that are not captured.

Accurate Geographic Route System

Many aspects of the processing rely on an accurate representation of the transit system, with a route for each pattern and stations and bus stops being correctly located. These include the schedule-based shortest-path (SSP) algorithm used to estimate the paths through the subway system, the locations of transfers between routes, and estimates of arrival times so that trips can be linked.

Before the project described here began, NYCT had been using a route system with only the major patterns for the morning peak period. In conjunction with this project, NYCT migrated over to a new route system that was based on a more accurate base map and that was extended to include all the subway patterns, bus patterns for other time periods, and routes for other bus systems that also use the MetroCard system. Geographically accurate bus stops with standardized identifiers were added so that the schedules could be
matched to routes. The creation of such a route system required a lot of time and effort.

AFC Bus Trip Log

The MetroCard system records a log of events for each bus and the event times. These events include destination sign changes, which can usually be used to determine the route pattern that a bus followed. It does not record any location information for a bus or a trip identifier that can be synchronized with the schedule. A significant effort was required to clean up this data set because multiple records frequently need to be combined into a single trip and some records are missing or are incorrect because they are manually generated by the driver changing the sign, which does not always happen as planned.

Subway Schedules

For each subway line, there are up to three separate schedule files, corresponding to weekdays, Saturday, and Sunday. Each file details the stations, the list of trips, and the sequence of stations that comprise a trip along the route.

These were converted into a fixed-format binary table with 596,876 records, with each record corresponding to a schedule event. These events occur at 2,549 unique route stops, corresponding to fewer physical locations.

The schedules were used to update the route system to contain all the subway patterns, including short turns. They are also used as inputs to the SSP algorithm, which is used to determine paths through the subway system.

The MetroCard system is also used by the Staten Island Railway (SIR), the Roosevelt Island Tramway, parts of the Port Authority Trans-Hudson (PATH) system under the Hudson River, and the AirTrain to and from John F. Kennedy International Airport (JFK AirTrain). All of these function similarly to the subway, in that they have fixed stations. Routes were added for all these operators. Schedules for the SIR and the free Staten Island Ferry were added so that the SSP algorithm could be used to assign trips between Staten Island and Manhattan.

Bus Schedules

The NYCT bus schedules are similar to the subway schedules. For each bus route, there are up to four separate schedule files, corresponding to weekdays (when school is open), weekdays (when school is closed), Saturday, and Sunday. These were converted into a fixed-format binary file with more than 5 million schedule events, which occurred at 63,049 unique route stops. The schedules had to be converted from the service schedule order into the equipment order, so that they could be synchronized with the actual trips from the AFC bus trip log.

MetroCard transactions were also provided for the other bus operators within New York City, including Long Island Bus, routes franchised by the New York City Department of Transportation (NYCDOT), Atlantic Express, and the Metro-North Hudson Rail Link. Each of these provided their schedules in a different format that had to be converted and integrated separately. For a variety of reasons, the schedule information for the NYCDOT routes was sometimes incomplete or inaccurate, and during this project these routes were being taken over by the MTA and grouped into the MTA Bus Company.

Boarding Locations

Subway and tramway boardings are located by the use of the turnstile fare collection information. This provides an immediate identification of the station where the boarding occurred but does not identify the routes boarded when multiple lines serve the same station complex or the rider switches subway trains. Even the direction traveled is usually unavailable, because both platforms are often accessible from the same entrance.

Bus boardings are located by estimating the location of the bus at the trip boarding transaction time. This is rather challenging in New York City, because the bus fleet does not employ automated vehicle location technology and the transaction times are truncated to 6-min intervals. Instead, the AFC bus trip log is combined with the bus schedules and positional information derived from certain MetroCard transactions to obtain approximate bus locations for most trips. The times for intermediate stops between derived locations were interpolated by using distance as the weighting factor.

The most difficult portion of this project was trying to determine a method that could be used to clean up the bus trip tables and match them against the schedule so that they could be used to locate the bus boardings. The principal difficulty was the inconsistent number of records corresponding to an actual trip. To create a single trip record, there is a frequent need to combine multiple trip records. In other cases, records need to be split to correct for a driver’s failure to change the sign. There were several false starts before a final strategy was selected.

The original strategy used to locate bus transactions involved the cleaning up of the AFC bus trip log to contain one record per actual trip. The scheduled trip, along with the TransCAD (a commercial software package) route identified by using the schedule files, would control the cleanup process. The scheduled stop times would be adjusted to reflect the actual times observed, and passenger bus dips would be located on the basis of interpolation along the trip route. Unfortunately, a variety of data imperfections and matching issues combined to foil the cleanup and matching strategy. After several false starts, a successful approach was developed on the basis of the following observations:

1. The most important goals of matching the bus trip to the schedule are to determine the route pattern, the locations of stops, and the relative times between stops. Thus, it is sufficient to find a similar trip made by a bus, as long as it uses the same pattern. It is no longer problematic if two trips match an identical scheduled trip, since only the pattern of stops and their relative times are used by the location procedure.
2. Bus positions can be localized by using the transfers observed in the data. For example, if a passenger takes two bus legs within a short period of time and the routes intersect, then the second bus must be near the intersection point at the transaction time. Similarly, some subway-to-bus transfer locations can be used as well.

A two-pass method is used to locate bus transactions. During the first pass the primary goal is to pinpoint the location of the bus at several stops during each trip by using bus-to-bus and subway-to-bus transfer points. Interpolated times are then assigned to intermediate stops, and these are used to assign locations to the remaining bus dips during the second pass. Adjustments to the boarding stop used are made during the trip-linking procedure.

The boarding locations developed are geographically less accurate for buses than for subways because of the 6-min truncation of
recorded boarding times, the estimation process, and the cleanup required to use the AFC bus trip log.

Alighting Locations

Two assumptions were made to allow alighting locations to be determined: (a) most riders start their next trip at or near the destination of their previous trip, and (b) most riders end their last trip of the day at or near the start of their first trip of the day. These were shown by NYCT to be reasonable for subway riders by use of a travel diary survey to validate the destinations (2). The methodology resulted in 90% valid destinations.

An additional assumption was made: the pattern of single-fare card users is similar to that of multiple-fare card users at a given boarding location.

A chaining procedure is applied to determine the likely alighting locations for riders with two or more MetroCard transactions on a particular day. Each transaction defines a leg (unlinked trip) in the passenger’s journey. The location of the next transaction by a MetroCard is used to find a nearby bus stop or subway station for the alighting location of the current leg, assuming that a consistent one exists. This assumes that many riders start their next movement near the conclusion of their prior movement. For the final leg, the procedure loops back to the first transaction, unless the passenger makes only a single linked trip during the day. This logic is justified by the fact that many riders return to their origin at the end of the day. Impossible destinations, that is, those that are unreachable by the subway system or bus in question, are discarded. Destinations for single trips and other trips with no chained destination are assigned by using a random sampling of distributions derived from other riders who have the same trip origins.

The arrival time for each imputed alighting location is estimated by running a query in the SSP algorithm for each subway journey or by using the estimated stop times computed for the bus trip. Legs are combined into a single linked trip if the expected arrival time of the first leg is within 18 min of the start of the next leg. The location consistency is ensured by the fact that alighting locations have not yet been assigned for inconsistent legs.

Expansion

Not all transactions will have their alighting locations determined by the chaining procedure because it fails to assign an alighting location when either the rider made a single trip (which could possibly be a multimodal trip) or no alighting stop is consistent with the next boarding location. Two expansion procedures that use sampling to assign alighting locations are applied. For subway transactions, an alighting stop is assigned by uniform sampling on the basis of the observed distribution from riders boarding at the same station with assigned alighting stations. For bus passengers, a similar approach based on a distribution of alighting stops for all passengers boarding at the same stop during the day for that route pattern is used.

Linkage

Two or more movements for a rider are linked together into a single trip when they occur within a short period of time. The alighting times for subway trips are determined by use of the SSP algorithm, which uses the complete subway schedule and geographic representations of all route patterns to predict the route traveled through the subway system and the time of arrival. This method also handles the use of the Staten Island Ferry, which is a free connector between the NYCT subway system and SIR. Bus alighting times are determined by using the estimated arrival time for the bus at the alighting stop.

Zone Allocation

The procedure generates a table of linked passenger trips for a given day, with each trip having an origin and a destination that is either a subway station or a bus stop. Nearby origin and destination zones (2000 census block groups) were assigned to each trip by a logit allocation procedure that distributes the trips to nearby zones on the basis of a weighting of walking distance and population or employment, depending on the time of day. Special handling was added to assign the trips starting and ending on commuter rail or a bus.

Validation

A variety of methods were used to validate the location and linkage procedure while it was being developed. In particular, some automated procedures were implemented to tabulate the transactions located so that the results could be compared with other available information for validation purposes.

For subway transactions, the entrance and exit counts were tabulated by subway station complex for the full day and by 4-h periods, which correspond to the periods used at each station for polling the registers; these periods vary in their starting hours. These results can then be used for comparisons with subway register counts (exit counts should be treated as a lower bound, since not all exiting passengers are counted) and to check the balance between the entrance and exit totals for a station, since they should usually be close.

For bus transactions, ride check data are an alternate source of information. Counts of boardings, alightings, and overall loads are provided for each stop along each bus trip. Only a couple of routes were checked during the 2-week study period. The available ride check data were imported and successfully matched to the scheduled trips. The bus transactions were then tabulated by trip stop, so that the ride check data could be compared with the transactions that were located.

To compare the results obtained by the automated location procedure with those for some known trips, some MetroCards were purchased and 10 predetermined tours were taken. The boarding and alighting times and locations were logged during those tours. The results derived by the automated location procedure were compared with the results derived from actual trips. By and large, the results were quite promising, given the quality of some of the data sets. A couple of subway journeys used different pairs of trains to get between the same pair of stations, which happens when actual choices do not match the SSP algorithm query expansion.

QUERY SOFTWARE

Powerful query software that allows almost any conceivable query to be answered was created for this project. The software works in two steps: trip or leg selection and output creation. Queries can be made on either the linked trips table or the unlinked legs table.
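
Bus legs can enter these tables only after each bus dip has been placed at a stop. The distance-weighted interpolation of stop times described under Boarding Locations can be sketched as follows. This is a minimal illustration, not the project's bus location software: the function names, the list-based inputs, and the nearest-time rule used to place a dip are assumptions, and stops outside the anchored range are simply left without a time.

```python
from typing import Dict, List, Optional


def interpolate_stop_times(
    cum_dist: List[float],
    anchors: Dict[int, float],
) -> List[Optional[float]]:
    """Estimate a time at every stop of one bus trip.

    cum_dist[i] is the cumulative distance of stop i along the route pattern.
    anchors maps a stop index to a time at which the bus is taken to have
    been there (for example, from an observed transfer). Stops between two
    anchors get times weighted by distance; all other stops stay None.
    """
    times: List[Optional[float]] = [None] * len(cum_dist)
    known = sorted(anchors.items())
    for idx, t in known:
        times[idx] = t
    for (i0, t0), (i1, t1) in zip(known, known[1:]):
        span = cum_dist[i1] - cum_dist[i0]
        for i in range(i0 + 1, i1):
            frac = (cum_dist[i] - cum_dist[i0]) / span if span > 0 else 0.0
            times[i] = t0 + frac * (t1 - t0)
    return times


def locate_dip(times: List[Optional[float]], dip_time: float) -> Optional[int]:
    """Return the index of the stop whose estimated time is closest to a dip."""
    candidates = [
        (abs(t - dip_time), i) for i, t in enumerate(times) if t is not None
    ]
    return min(candidates)[1] if candidates else None
```

With anchors at the first and last of four stops (times 100 and 120) and cumulative distances 0, 1, 3, and 4, the intermediate stops receive times 105 and 115, so a dip recorded at 114 is placed at the third stop. Because recorded times are truncated to 6-min intervals, any such placement is approximate.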

The tool is set up to work with multiple days of data processed from the 2-week study period from April 19 to May 2, 2004, which was selected as a representative period for data collection and which was free of exceptional events, such as snowstorms, disasters, and school vacations.

The software is a custom application of TransCAD that is accessed through a new custom toolbox. The software was designed to be easy for anyone to use to get the information that they desire. No special knowledge or training is required to use the system. Users can interactively select the geographic location and time period of interest.

The query builder step defines a query by combining one or more selection primitives that conceptually select a set of trips or legs (Figure 2). The query can require that either any or all of the primitives be matched. For linked trips, the primitives include selection by the mode, route, or pattern of a particular leg in the trip sequence; by the origin or destination of the trip by the specification of stops, zones, census tracts, or boroughs; by the inclusion of a particular type of transfer between modes within the trip; or by a general structured query language (SQL) query. For unlinked legs, the primitives include selection by the mode, route, or pattern of the leg; by the origin or destination of the leg by the specification of stops, zones, census tracts, or boroughs; by specification of the mode of either the preceding or following leg used as part of a linked trip; by a general SQL query; or by the position of the leg within its linked trip. Queries can be saved for later reuse.

The execute step specifies the day to be used for the query, which is selected from the list of available days. The trips or legs can be further restricted on the basis of a time period or an existing selection set of linked trips. The choices for the output include reports (Figure 3), maps (Figure 4), origin–destination matrices, or a TransCAD selection set that can be used to create external tables or spreadsheets. The reports can be summarized by arrival time, departure time, mode or route of a particular leg, origin, destination, or origin–destination pair. A ridership report by either route pattern or route segment can also be produced. Maps can depict the origins and destinations of the trips or legs selected. Maps can also include a scaled theme depicting the ridership by street or track segment. The origin–destination matrices summarize the trips or legs by stop, zone, census tract, or borough. In addition to creating export tables, the selection set can be used for general analysis in TransCAD or for examination of individual linked trips in a customized trip browser that depicts each leg of the linked trip on the map (Figure 5).

CURRENT USE OF CITYWIDE TRANSIT TRAVEL DATABASE

The Citywide Transit Travel Database with its query tool in TransCAD provides a unique resource for short- and long-term transportation planning in New York City as well as for the planning and scheduling of subway and bus services by NYCT. The millions of transaction records that the MetroCard fare collection system produces daily provide unprecedented details on the spatial and temporal aspects of transit use. In addition to NYCT, MetroCards are accepted on Long Island Bus, MTA Bus, Atlantic Express, Metro-North Hudson Rail Link, the Bee Line bus system in Westchester
County, SIR, Roosevelt Island Tramway, the PATH system, and the JFK AirTrain.

FIGURE 2 Query builder example for the No. 6 Train.

FIGURE 3 Report generated for the No. 6 Train example.

The retrieval of basic origin and destination information by route for transit users in New York City is a daunting task. On an average weekday, there are 5 million subway entries and 2.5 million bus boardings. This database provides origin and destination information by route, including the key distinction between linked and unlinked trips that is essential for an understanding of existing travel patterns and the development of forecasts. This trip information at the subway station and bus stop levels was also converted into traditional zone (block group)-to-zone trip tables for use in various travel demand models. NYCT maintains a detailed transit network model for analysis of the effects of subway and bus service alternatives on route choice, travel time savings, and convenience. For regional transit analysis focused on commuter rail and subway use, MTA maintains a model (a regional transit forecasting model) for mode choice, trip assignment, and benefit analysis. The metropolitan planning organization for the New York City portion of the region (the New York Metropolitan Transportation Council) maintains a full-fledged travel demand forecasting model that incorporates features from both the MTA and the NYCT models. Application examples are as follows.

Long- and Short-Term Service Planning

1. The zone-to-zone trip tables are essential for the demand forecasting tasks required for major additions to the subway system, such as the Second Avenue Subway. Zone-level trip tables are needed to project future travel as population, employment, and land use changes occur.
2. Detailed trip information helps address short-term service planning issues, such as capacity and crowding constraints, fleet additions, and the allocation of operating and capital resources.

Reconstruction Service Planning

3. Portions of subway lines sometimes need to be taken out of service for extended periods of time for reconstruction. Information on how passengers use the line is used to help plan for the service interruption, and the network model is used to estimate how passengers will adjust their route use and determine whether supplementary service is needed.
4. Travel pattern information is also used to determine the
impacts on passengers of routine late-night and weekend maintenance
and replacement work that requires track outages.

Bus Route Analysis

5. Origin and destination information is used to evaluate the
proposed splitting of long bus routes in Manhattan and Brooklyn
into two shorter and more reliable routes while minimizing the need
to transfer.
6. The origin and destination locations of express routes in
Staten Island were analyzed.
7. For bus rapid transit corridor planning, origin and destination
information was used to help determine existing bus trip lengths and
subway transfers in the corridor.

FIGURE 4 Ridership map generated for the No. 6 Train example.

FIGURE 5 A three-legged subway–bus trip being browsed.

Route Usage

8. Station-to-station subway route use information was provided
during off-peak periods and weekends to plan diversions during
reconstruction work.


Barry, Freimer, and Slavin 61

9. The total number of riders using each subway route, including transfers, was extracted by time of day to improve the information provided to government agencies and the press.

Model Enhancements and Updates

10. For the first time, the linked trip product provides a transit trip table that includes all of the bus and subway links that form a single trip. This is important for understanding and modeling the thousands of ways that passengers use the extensive bus and subway network in New York City.
11. The modeling results are used to calibrate the NYCT demand model and for validation purposes when the results are compared with other sources of transit usage information.
12. The zone-to-zone trip tables are incorporated into the regional trip tables used in the regional models.

Household Travel Surveying

13. As part of the updating of the model, MTA has been conducting a household travel survey within New York City, and the project team has been using this MetroCard-based information to validate the origin and destination trip tables being produced by the survey.
14. Because of logistical and financial constraints, the household survey is limited to the collection of detailed information from a sample of adult residents of New York City and is subject to response bias. The MetroCard-based information, however, is based on millions of daily transactions and includes all paying passengers, regardless of their age or residence. These two sources of transit demand information complement each other.

CONCLUSION

The project described here involved the creation of custom software that processes MetroCard data and creates geographically located linked passenger trips. These trips can then be queried by the use of user-friendly software to produce reports, maps, extracts, and matrices, as needed, to support various planning and operational needs. Numerous unexpected hurdles were overcome to create a functioning system that is, the authors believe, the first to actually handle buses and mixed-mode trips for a transit system with entry-only AFC data.

The processing procedure developed to generate the origin–destination trip database is highly complex and involves numerous intermediate steps, many assumptions, and some sampling approximations. Undoubtedly, there are errors in the trip tables produced, especially for bus trips. Nevertheless, the portrait of NYCT system use is probably the most accurate that has ever been available, and it has many potential applications.

The data development approach could be improved in many ways. The following changes would improve the accuracy of the results that could be obtained from MetroCard transaction data. Some of them would also allow simpler algorithms to be used for the processing.

1. Improve the accuracy of the recorded MetroCard transaction times to a minute, a second, or better. A bus can travel a significant distance in 6 min.
2. Preserve the orderings of the MetroCard bus boardings.
3. Improve the route system to the point at which it matches the schedules exactly and also has accurate geographic locations for all the stops.
4. Improve the bus trip-logging system to allow the easier recovery of trip records. This could include having the drivers enter sign codes more consistently. It would be even better to have a Global Positioning System-based system to provide bus locations.

The data provide the opportunity to perform much more sophisticated types of forecasting of the use of new services by making it possible to use dynamic (i.e., time-dependent) methods of transit assignment. Further calibration of the transit assignment procedure could work hand in hand to make fuller use of the data generated. The wealth of data now available also makes it possible to study nontraditional commuting patterns, which previously would have been ignored by onboard surveys. For example, it is now possible to examine commuting between or within the four boroughs outside of Manhattan instead of the more common commute into Manhattan.

ACKNOWLEDGMENT

The authors acknowledge the invaluable contributions of the NYCT project manager, Larry Hirsch, whose insights were instrumental in developing a solution to many of the difficult problems encountered in this work. His observation that certain transfer sequences could be used to infer the location of a bus was critical for the development of the bus location procedure.

REFERENCES

1. Zhao, J. Rail Transit OD Planning and Analysis Implications of Automated Data Collection Systems: Rail OD Matrix Inference and Path Choice Modeling Examples. MS thesis. Department of Urban Studies and Planning and Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, 2004.
2. Barry, J. J., R. Newhouser, A. Rahbee, and S. Sayeda. Origin and Destination Estimation in New York City with Automated Fare System Data. In Transportation Research Record: Journal of the Transportation Research Board, No. 1817, Transportation Research Board of the National Academies, Washington, D.C., 2002, pp. 183–187.
3. Rahbee, A., and D. Czerwinski. Using Entry-Only Automatic Fare Collection Data to Estimate Rail Transit Passenger Flows at CTA. Proc., 2002 Transport Chicago Conference, Chicago, Ill., 2002.
4. Wilson, N. H. M., J. Zhao, and A. Rahbee. The Potential Impact of Automated Data Collection Systems on Urban Public Transport Planning. In Schedule-Based Modeling of Transportation Networks. Springer, New York, 2009, pp. 75–99.
5. Utsunomiya, M., J. Attanucci, and N. H. Wilson. Potential Uses of Transit Smart Card Registration and Transaction Data to Improve Transit Planning. In Transportation Research Record: Journal of the Transportation Research Board, No. 1971, Transportation Research Board of the National Academies, Washington, D.C., 2006, pp. 119–126.
6. Zhao, J., A. Rahbee, and N. H. M. Wilson. Estimating a Rail Passenger Trip Origin–Destination Matrix Using Automatic Data Collection Systems. Computer-Aided Civil and Infrastructure Engineering, Vol. 22, No. 5, 2007, pp. 376–387.
7. Slavin, H., A. Rabinowicz, J. Brandon, G. Flammia, and R. Freimer. Using Automatic Fare Collection Data, GIS, and Dynamic Schedule Queries to Improve Transit Data and Transit Assignment Models. In Schedule-Based Modeling of Transportation Networks (N. H. M. Wilson and A. Nuzzolo, eds.). Springer, New York, 2009, pp. 101–118.

The Rail Transit Systems Committee sponsored publication of this paper, which was selected by the Public Transportation Group to receive the First Annual Outstanding Research Paper in Public Transportation Award.
