
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

Modeling and Analyzing Impact Factors of Metro Station Ridership: An Approach Based on a General Estimating Equation
Yuxin He, Yang Zhao, and Kwok Leung Tsui
The authors are with City University of Hong Kong, China.
Email: yuxinhe2-c@my.cityu.edu.hk; yang.zhao@my.cityu.edu.hk; kltsui@cityu.edu.hk


Abstract: Modeling and analyzing metro station ridership is of great importance to passenger flow management and transportation planning operations. In practice, ridership can be affected by multiple factors, including spatial factors (e.g., distance and network topology), temporal factors (e.g., period and trend), and external factors (e.g., land use and socioeconomics). However, existing studies mainly focus on external factors and rarely investigate the temporal factors that influence metro station ridership.
In this article, we propose a novel data-driven method for estimating metro ridership and identifying influencing factors at a refined granular level based on general estimating equation (GEE) models. Unlike prior research, this study examines longitudinal station-level metro ridership at varied time resolutions. Longitudinal ridership data from the Taipei Metro for the year 2015, together with data on potential influencing factors in the urban environment, are used to validate the effectiveness of the proposed method.
The results demonstrate that the proposed method performs well in estimating longitudinal metro ridership. They indicate that land use for shopping, feeder bus systems, days since stations were opened, and transportation hub status are significant factors influencing ridership at every time resolution considered. Temporal factors, entered as categorical variables, are also found to be crucial for determining metro ridership, which could facilitate the implementation of flexible transportation planning strategies adapted to temporal changes in practice.

Digital Object Identifier 10.1109/MITS.2020.3014438


Date of current version: 1 September 2020 *Corresponding author

IEEE INTELLIGENT TRANSPORTATION SYSTEMS MAGAZINE • 2 • WINTER 2020 1939-1390/20©2020IEEE


Authorized licensed use limited to: University of Canberra. Downloaded on October 04,2020 at 09:46:55 UTC from IEEE Xplore. Restrictions apply.

Transit ridership modeling and estimation are essential tasks in transportation planning, including traffic demand analysis, route planning, and feasibility and sustainability evaluation. In metro transportation, ridership estimation at the station level plays a critical role in determining the scale of stations and access facilities. Various methods have been proposed for transit ridership estimation. As one of the best-known techniques, the four-step (generation, distribution, mode choice, and assignment) model has dominated the history of transport modeling since the 1950s [1]. However, the four-step model has many drawbacks in practice, such as low model accuracy, low data precision, insensitivity to land use, institutional barriers, and high expense [2]. The four-step model is generally effective for estimating transit ridership on a regional scale rather than on more detailed scales (such as the station level) [3].

As an alternative to the four-step model, direct-demand models have drawn growing attention for ridership estimation in recent decades. Direct-demand models estimate ridership as a function of influencing factors within the pedestrian catchment areas (PCAs) via regression analysis, which enables identifying factors that contribute to higher transit ridership [2], [4]–[7]. In these models, the PCA is a geographic area from which a station attracts passengers. The size and shape of a PCA depend on how accessible a station is and how far it is from alternative stations. One can use buffers to create circular PCAs of a specific radius or use Thiessen polygons to delineate the area most accessible to each station.

The major advantages of direct-demand models in travel analysis are simplicity of use, easy interpretation of results, immediate response, and low cost. A comprehensive review of direct-demand models can be found in the work by Walters and Cervero (2003) [8] and Cardozo et al. (2012) [3]. Ordinary least squares (OLS) multiple regression is the most widely used direct-demand model; it can handle both numerical and dummy variables and is flexible and easily understood [9]. He et al. (2018) [10] investigated the factors influencing Taipei Metro ridership at the station level over varying time periods by adopting OLS multiple regression models.

However, despite the rich literature on transit ridership modeling, OLS and other traditional direct-demand models have limitations, such as ignoring temporal dependencies when modeling and analyzing ridership. In practice, acquiring insights into metro ridership at different time resolutions under multiple influencing factors is important for effective passenger flow management and transportation planning operations. Nonetheless, few existing direct-demand models have included temporal factors in ridership estimation.

This study contributes to the ridership estimation literature by investigating spatiotemporal and external influencing factors on longitudinal station-level ridership. To the best of our knowledge, research on modeling longitudinal transit ridership has yet to appear. In this study, we propose a novel direct-demand model for identifying influencing factors at a refined granular level and for estimating metro ridership.

The first objective of this study is to identify the association between multiple factors and metro station ridership. The second objective is to estimate longitudinal ridership at different time resolutions (day of the week, week of the month, and month of the year). The aim of aggregating ridership into three time resolutions is to explore the temporal factors, such as period and trend. A GEE is used to analyze the significance of various factors that impact longitudinal metro station ridership. A GEE is a general statistical approach to fit a marginal model for longitudinal/clustered data analysis, and it has been widely applied in clinical trials and biomedical studies [11]–[15].

The longitudinal ridership data of the Taipei Metro and data on potential influencing factors in the urban environment for the year 2015 are used to illustrate the effectiveness of our proposed model. To the best of our knowledge, this is the first work that investigates the utility of the GEE model for estimating longitudinal metro ridership and analyzing the influencing factors. Such an approach allows a better explanation of the temporal dynamics of transit ridership and contributes to the literature on longitudinal ridership and its time-varying impact factors. It is worth noting that this study not only broadens the GEE model's application domain but also advances the literature on the use of marginal models in transportation generally.

Empirical Study Area and Data

In this article, we investigate factors influencing transit ridership at the metro station level in the Taipei–Keelung metropolitan area, also commonly known as the Taipei metropolitan area, which includes Taipei City, New Taipei City, and Keelung. This area is served by a relatively large metro network, consisting of five lines [BR (Wenhu Line), R (Tamsui-Xinyi Line), G (Songshan-Xindian Line), O (Zhonghe-Xinlu Line), and BL (Bannan Line)] and 108 stations, operating on 131.1 km of revenue track. The Taipei metropolitan area and a route map of the Taipei Metro are shown in Figure 1.

[Figure 1 map legend: metro lines, metro stations, Taipei City, New Taipei City, Keelung City, water.]
FIG 1 The Taipei metropolitan area and a route map of the Taipei Metro.

The population of the Taipei metropolitan area is about 7,040,386, and Taipei City, as the core city of the area, has a population of approximately 2,682,721; the population density is 9,870 persons/km². This density ranks Taipei as the seventh-most densely populated city in the world [32].

Data Description

All 108 metro stations were taken into consideration during data collection. The Taipei Metro boarding and alighting ridership data used in the research were collected from the Taipei Rapid Transit website [16]. The data cover the whole year of 2015 and include boarding and alighting ridership counts. We sum the boarding and alighting ridership counts at three time resolutions: day of the week (October 12–18), week of the month (June 1–28), and month of the year (January 1–December 31). The ridership data at the three time resolutions are adopted as dependent variables in model fitting. The explanatory variables represent factors hypothesized to influence station ridership. (A detailed description is provided in Table 1.)

Table 1. A summary of explanatory variables.

Categories | Explanatory Variables | Abbreviations of Variables | Data Source
Land use (the number of ***) | Residential units | Residence | Google Maps
Land use (the number of ***) | Hotels | Hotel | Google Maps
Land use (the number of ***) | Shopping malls | Shopping | Google Maps
Land use (the number of ***) | Schools | School | Google Maps
Land use (the number of ***) | Offices | Offices | Google Maps
Land use (the number of ***) | Banks | Bank | Google Maps
Land use (the number of ***) | Hospitals | Hospital | Google Maps
Land use (the number of ***) | Universities | University | Google Maps
Network structure | Distance to the city center | Dis_to_center | Calculated
Network structure | Degree centrality | Degree | Calculated
Network structure | Betweenness centrality | Betweenness | Calculated
Socioeconomic | Population | Pop | WorldPop
Socioeconomic | Days since opened | Days_open | The Taipei Rapid Transit website
Intermodal transportation accessibility | Dummy variable for distinguishing transportation hub | Trans_hub | \
Intermodal transportation accessibility | The number of bus stations | Bus | Google Maps
***: locations where units served as explanatory variables; \: no data source.

Dependent Variables

As mentioned, travel demands and travel patterns vary over time in practice. According to the descriptive statistics shown in Figure 2, the statistical distribution of ridership is not exactly the same across the time units of a given time resolution. Figure 2(a)–(c) shows the time series of all station-level ridership at three time resolutions: average ridership of each day of the week, average daily ridership of each week of the month, and average daily ridership of each month of the year, respectively. To better depict groups of ridership through their quartiles, we present the boxplots of ridership in each time unit of the three time resolutions in Figure 2(d)–(f). The boxplots display outliers as individual points, corresponding to the individual lines far from the bulk of the distribution in Figure 2(a)–(c).

Specifically, ridership at different stations varies in different patterns (e.g., for some stations, ridership is largest on Saturday, while for other stations, the maximum ridership appears on Friday). In addition, there is no significant difference among the ridership distributions of the weeks within the month. Motivated by the differences among the temporal distribution characteristics of the three time resolutions, three models considering temporal dependencies were built to find the factors influencing station-level ridership at different times.

Figure 3 shows the spatial distribution of ridership on average weekdays (October 12–16). The ridership at Taipei Main Station is much larger than that of the other stations, which corresponds to the outlier line shown in Figure 2(a)–(c); this station is the main transportation hub for both the city and northern Taiwan. Taipei Main Station is connected to the following transportation services: Taipei Mass Rapid Transit (MRT) (metro), Taiwan Railways and Taiwan High-Speed Rail (train), and Taiwan Taoyuan International Airport MRT (airport express). Moreover, it is worth noting that ridership is distributed more densely in the central region of the Taipei metropolitan area, covering the central business district of Taipei City.
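The aggregation described under Data Description can be sketched as follows. This is a minimal illustration in Python, not the authors' processing code: the record layout `(date, boardings, alightings)` and the function names are assumptions made for the example, the counts are made up, and weeks of the month are taken as consecutive 7-day blocks starting on the first day of the month, which matches the June 1–28 window used in the article.

```python
from collections import defaultdict
from datetime import date


def average_daily_ridership(records, key):
    """Average daily ridership (boardings + alightings) per time unit.

    records: iterable of (date, boardings, alightings) for ONE station
             (hypothetical layout, assumed for this sketch).
    key:     function mapping a date to its time unit.
    """
    totals = defaultdict(int)
    days = defaultdict(int)
    for day, boardings, alightings in records:
        unit = key(day)
        totals[unit] += boardings + alightings  # ridership = boarding + alighting
        days[unit] += 1
    return {unit: totals[unit] / days[unit] for unit in totals}


# Time-unit keys for the three resolutions.
def day_of_week(d):
    return d.isoweekday()        # 1 = Monday ... 7 = Sunday


def week_of_month(d):
    return (d.day - 1) // 7 + 1  # days 1-7 -> week 1, days 8-14 -> week 2, ...


def month_of_year(d):
    return d.month


# Made-up counts for two weeks of June 2015 (illustrative only).
records = [(date(2015, 6, day), 10, 5) for day in range(1, 8)]
records += [(date(2015, 6, day), 20, 10) for day in range(8, 15)]
weekly = average_daily_ridership(records, week_of_month)
monthly = average_daily_ridership(records, month_of_year)
```

Passing a different key function switches the time resolution without changing the summation, which keeps the three dependent variables consistent with one another.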


[Figure 2 panels: (a)–(c) ridership (×10^5) plotted against day of the week, week of the month, and month of the year; (d)–(f) the corresponding boxplots, with ridership (×10^5) on the horizontal axis.]

FIG 2 The temporal distribution of ridership. (a) The ridership of all metro stations at the level of day of the week. (b) The average daily ridership of all
metro stations at the level of week of the month. (c) The average daily ridership of all metro stations at the level of month of the year. (d) A boxplot of the
ridership at the level of day of the week. (e) A boxplot of the average daily ridership at the level of week of the month. (f) A boxplot of the average daily
ridership at the level of month of the year.

[Figure 3 map legend: ridership classes ≤4,418; 4,418–59,185; 59,185–113,948; 113,948–168,711; 168,711–223,474; 223,474–278,237. Taipei Main Station is labeled.]
FIG 3 The spatial distribution of ridership on average weekdays.

Explanatory Variables

The explanatory variables can be divided into four groups: socioeconomic, land use, intermodal transportation accessibility, and network structure variables.

Socioeconomic Variables

Socioeconomic variables consist of the population distribution of the Taipei metropolitan area in the year 2015 and operation days since metro stations were opened. The information on days since metro lines and stations opened was collected from the Taipei Rapid Transit website [17]. The population data were collected from a related website that provides the raster files of population distribution in the year 2015 [18]. The population file is in GeoTIFF format. It provides the estimated number of people per grid square at a spatial resolution of 8.33 × 10^-4 degrees (approximately 100 m at the equator), which can be projected to the "GCS_WGS_1984" geographic coordinate system.

Figure 4 shows the population distribution of the Taipei metropolitan area in the year 2015. The buffers of metro stations, each with a radius of 500 m, were created by using ArcGIS 10.2; these are also illustrated in Figure 4. As shown in Figure 4, it is hard to discern a direct visual relationship between population density and the distribution of metro stations. Therefore, the population data are processed and aggregated within 500-m buffers with ArcGIS 10.2, and their influence on ridership is analyzed in the subsequent modeling stage.

[Figure 4 map legend: Taipei population 2015, raster value from 0 (low) to 410.863 (high); 500-m station buffers.]
FIG 4 The population distribution of the Taipei metropolitan area and 500-m buffers of metro stations.

Land Use Variables

With regard to land use variables, the numbers of residences, hotels, schools, universities, offices, hospitals, banks, and shopping malls within a station's 500-m PCA were collected from Google Maps with the assistance of an application programming interface.

Intermodal Transportation Accessibility Variables

For intermodal transportation accessibility, we consider the feeder bus system. The number of bus stations within each metro station's PCA was collected from Google Maps. A dummy variable for the transportation hub is also included to test whether stations serving as transportation hubs generate substantial additional ridership. It is hypothesized that the important services (e.g., metro, train, and high-speed rail) provided by a transportation hub might be a positive inducement to ridership at the station.

Defining the transportation hub of a region is a fundamental task. To do so quantitatively, we performed outlier tests on the linear regression model (excluding transportation hub information) for average daily ridership over a whole week (October 12–18). According to the influence plot shown in Figure 5(a), sample point no. 81 is the most influential point, with the largest residual and the highest leverage value.
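The quantities behind an influence plot (leverage, studentized residuals, and Cook's distance) can be sketched for the simplest case. The example below is an illustrative Python sketch, not the authors' analysis: it uses a single predictor rather than the full set of explanatory variables, internally studentized residuals (the article does not say which variant was used), and made-up data in which the last point plays the role of an influential station.

```python
import math


def influence_measures(x, y):
    """Hat values, internally studentized residuals, and Cook's distances
    for the simple linear regression y = a + b*x (p = 2 parameters)."""
    n = len(x)
    p = 2
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)           # residual variance
    hat = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]  # leverage of each point
    student = [e / math.sqrt(s2 * (1.0 - h)) for e, h in zip(residuals, hat)]
    cooks = [r * r * h / (p * (1.0 - h)) for r, h in zip(student, hat)]
    return hat, student, cooks


# Made-up data: the last point is far from the others in both x and y.
x = [1, 2, 3, 4, 5, 10]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 20.0]
hat, student, cooks = influence_measures(x, y)
most_influential = max(range(len(x)), key=lambda i: cooks[i])
```

A point with both high leverage and a sizable residual dominates Cook's distance, which is the pattern the article reports for sample point no. 81.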


[Figure 5 panels: (a) influence plot of studentized residuals against hat values, with Cook's distance indicated and sample point 81 labeled; (b) map of average daily ridership (classes 4,336–20,099; 20,100–41,201; 41,201–68,932; 68,933–139,110; 139,111–300,416) over the population raster (0 to 410.863), with Taipei Main Station labeled.]

FIG 5 The influential point in the sample of Taipei Metro stations. (a) The influence plot of the regression model for the whole-week average daily
ridership. (b) The spatial distribution of the whole-week average daily ridership and population.

It turns out that no. 81 is Taipei Main Station, which is a mega transportation hub for both the city and northern Taiwan, with much larger ridership than that of other stations [Figure 5(b)], as mentioned earlier. We therefore include a dummy variable for the transportation hub in our models to improve their explanatory power.

Network Structure Variables

In this article, we consider several network structure variables: the degree centrality and betweenness centrality of the nodes in the metro network and the distance to the city center. In the field of complex networks, degree is a simple centrality measure that counts how many neighbors a node has, and the betweenness centrality of a node refers to the number of shortest paths that pass through the node [19]. In the context of metro networks, degree is related to whether a station is a transfer station or a terminal station [20]. Betweenness describes the importance of a station in terms of its control over flows passing between other stations in the metro network. The distance $\mathrm{Dist}_i$ of the $i$th station to the city center, i.e., the Taipei city government located in Hsinyi District, is calculated by (1), considering the effect of the radius of Earth:

$$\mathrm{Dist}_i = R \cdot \arccos\big(\cos(\mathrm{Lat}_0) \cdot \cos(\mathrm{Lat}_i) \cdot \cos(\mathrm{Lon}_0 - \mathrm{Lon}_i) + \sin(\mathrm{Lat}_i) \cdot \sin(\mathrm{Lat}_0)\big) \cdot \frac{\pi}{180}, \tag{1}$$

where $R$ is the radius of Earth, and $(\mathrm{Lat}_0, \mathrm{Lon}_0)$ and $(\mathrm{Lat}_i, \mathrm{Lon}_i)$ are the latitude and longitude of the city center and station $i$, respectively. The related geographical data were collected from Google Maps.

Methodology

We developed a data-driven approach based on a GEE to model longitudinal ridership and analyze the importance of different factors that can potentially impact ridership. Many factors, such as population, number of nearby facilities, the bus feeder system, and the network structure, have great potential to impact the ridership of a metro station. As confirmed in this study, the ridership values observed repeatedly in each time unit at different time resolutions for the same station are strongly correlated (see Figure 6). Therefore, the GEE model is employed, as it is a widely used statistical model for longitudinal data collected from repeated measurements on the same statistical units (in this case, the metro stations).

A series of models were built in R software [21] by using the "gee" and "geepack" packages [22], [23]. All of the aforementioned potential influencing factors are used as input variables, and their individual significance in the models is calculated. The model with the smallest quasi Akaike information criterion (QIC) is chosen as the final model. The proposed GEE-based model and the model selection criteria are introduced next.
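As an illustration of the distance in (1) from the Network Structure Variables subsection, the following Python sketch computes a great-circle distance with the spherical law of cosines. It is an assumption-laden sketch rather than the authors' code: the function name and the Earth radius of 6371 km are choices made here, not values from the article. Equation (1) includes a factor of π/180, which is consistent with an arccos evaluated in degrees; the sketch works in radians throughout, so that factor does not appear.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed value, not given in the article


def distance_to_center(lat0, lon0, lat_i, lon_i, radius=EARTH_RADIUS_KM):
    """Great-circle distance via the spherical law of cosines.

    All coordinates are in decimal degrees; the result is in the units of `radius`.
    """
    phi0, phi_i = math.radians(lat0), math.radians(lat_i)
    d_lon = math.radians(lon0 - lon_i)
    cos_angle = (math.cos(phi0) * math.cos(phi_i) * math.cos(d_lon)
                 + math.sin(phi_i) * math.sin(phi0))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding outside [-1, 1]
    return radius * math.acos(cos_angle)


# A quarter of the equator: the central angle is pi/2.
quarter_equator = distance_to_center(0.0, 0.0, 0.0, 90.0)
```

The clamp before `math.acos` matters for stations very close to the reference point, where floating-point rounding can push the cosine marginally above 1.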


The Proposed Model Based on the GEE

Notation and Model Building

In the context of our empirical study area, the longitudinal data consist of $K$ subjects, which correspond to the $K$ ($K = 108$) metro stations. For subject $i$ ($i = 1, 2, \ldots, K$), there are $n_i$ observations, and $Y_{it}$ denotes the $t$th response ($t = 1, \ldots, n_i$), representing metro ridership at the $i$th station observed at the $t$th given time. Let $X_{it}$ denote a $p \times 1$ vector of covariates, corresponding to $p$ influencing factors at station $i$ observed at the $t$th given time. Accordingly, $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})^T$ is the response vector for the $i$th subject, with mean vector $\mu_i = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{in_i})^T$, where $\mu_{it}$ is the corresponding $t$th mean.

The following assumptions were used: (i) each metro station is independent, and (ii) the longitudinal observations within a station are dependent. These assumptions follow the main assumption of GEEs that cases are dependent within subjects and independent between subjects. In addition, the independence assumption for stations (setting aside spatiotemporal dependencies among observations) is widely used in OLS models, which treat each observed individual as independent [3], [6], [24].

GEE models are generalized linear marginal models. That is, they combine the generalized linear model for a nonnormal residual with the repeated measures of a marginal model. A GEE is used when there are repeated measures on each subject and a generalized linear regression model is needed. Thus, GEE models are an extension of the generalized linear regression model. Therefore, it is reasonable to follow the assumption of OLS models applied in the same applications when fitting GEE models.

The relationship between $\mu_{it}$ and the covariates $X_{it}$ can be expressed by the marginal model

$$g(\mu_{it}) = X_{it}^T \beta, \tag{2}$$

where $g(\cdot)$ is a given link function (e.g., identity, log, logit, and so on), and $\beta$ is a $p \times 1$ vector of regression coefficients to be estimated. The conditional variance of $Y_{it}$ given $X_{it}$ can be expressed as $\mathrm{Var}(Y_{it} \mid X_{it}) = \nu(\mu_{it})\phi$, where $\nu(\cdot)$ is a known variance function of $\mu_{it}$, and $\phi$ is a scale parameter to be estimated. Generally, $\nu$ and $\phi$ depend on the distribution of the responses. For instance, if $Y_{it}$ is continuous, $\nu(\mu_{it})$ is equal to one, and $\phi$ denotes the error variance; if $Y_{it}$ is a count, $\nu(\mu_{it}) = \mu_{it}$, and $\phi$ is specified as one.

Moreover, the variance–covariance matrix of the response vector $Y_i$ can be expressed as $V_i = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$, where $A_i = \mathrm{Diag}\{\nu(\mu_{i1}), \ldots, \nu(\mu_{in_i})\}$, and $R_i(\alpha)$ is the "working" correlation structure, that is, the $n_i \times n_i$ matrix describing the dependency among observations within the subject. $R_i(\alpha)$ depends on a vector of association parameters denoted by $\alpha$. Several "working" correlation structures are commonly adopted: "independent," "exchangeable," "k-dependent," "autoregressive," "stationary," "nonstationary," and "unstructured." (For more details, see [13] and [14].)

Estimating Parameters

The estimator of $\alpha$ depends on the chosen correlation structure and can be obtained through the iterative algorithm using the Pearson residuals $e_{it} = (Y_{it} - \mu_{it}) / \sqrt{\nu(\mu_{it})}$ calculated from the current value of $\beta$. In addition, the scale parameter $\phi$ can be estimated by

$$\hat{\phi} = \frac{1}{N - p} \sum_{i=1}^{K} \sum_{t=1}^{n_i} e_{it}^2, \tag{3}$$

where $N = \sum_{i=1}^{K} n_i$ is the total number of observations, and $p$ is the dimension of the covariates.

With the estimating equation approach, no likelihood has been specified, so maximum likelihood inference is not available for these estimators. Instead, robust or sandwich inference is typically provided. For given estimates $(\hat{\phi}, \hat{\alpha})$ of $(\phi, \alpha)$, $\beta$ can be estimated by solving the GEE

$$U(\beta) = \sum_{i=1}^{K} D_i^T V_i^{-1} r_i = 0, \tag{4}$$

where $D_i = \partial \mu_i^T / \partial \beta$, and $r_i = Y_i - \mu_i$. For each $i$, $U_i(\beta, \alpha) = D_i^T V_i^{-1} r_i$. $\hat{\beta}$ is asymptotically normally distributed, with mean $\beta_0$ and a covariance matrix estimated by the "sandwich" estimator

$$\hat{V}_{LZ} = \left( \sum_{i=1}^{K} D_i^T V_i^{-1} D_i \right)^{-1} \hat{M}_{LZ} \left( \sum_{i=1}^{K} D_i^T V_i^{-1} D_i \right)^{-1}, \tag{5}$$

with

$$\hat{M}_{LZ} = \sum_{i=1}^{K} D_i^T V_i^{-1} \, \mathrm{Cov}(Y_i) \, V_i^{-1} D_i. \tag{6}$$

$\mathrm{Cov}(Y_i) = \hat{r}_i \hat{r}_i^T$, with $\hat{r}_i = Y_i - \hat{\mu}_i$, is an estimator of the variance–covariance matrix of $Y_i$ [25], [26]. The "sandwich" estimator is robust even when the correlation structure ($V_i$) is misspecified. When $V_i$ is specified exactly, $\hat{V}_{LZ}$ reduces to $\left( \sum_{i=1}^{K} D_i^T V_i^{-1} D_i \right)^{-1}$, which is generally called the model-based variance estimator [27]. On this basis, a Wald Z test can be conducted since the test statistic is asymptotically normally distributed.

Calculating Procedures of Estimators

The Gauss–Newton method can be adopted to compute the estimator $\hat{\beta}$ [28] by iterating between a modified Fisher scoring for $\beta$ and moment estimation of $\alpha$ and $\phi$. For given current estimates $(\hat{\phi}, \hat{\alpha})$ of $(\phi, \alpha)$, the iterative procedure for $\beta$ is as follows:

$$\hat{\beta}_{j+1} = \hat{\beta}_j - \left\{ \sum_{i=1}^{K} D_i^T(\hat{\beta}_j) \, \tilde{V}_i^{-1}(\hat{\beta}_j) \, D_i(\hat{\beta}_j) \right\}^{-1} \times \left\{ \sum_{i=1}^{K} D_i^T(\hat{\beta}_j) \, \tilde{V}_i^{-1}(\hat{\beta}_j) \, r_i(\hat{\beta}_j) \right\}, \tag{7}$$


[Figure 6 panels: scatterplot matrices of station ridership (×10^5) with pairwise correlation coefficients (Corr) for (a) Monday to Sunday, (b) Week 1 to Week 4, and (c) January to December; all extracted coefficients are 0.95 or higher.]

FIG 6 The correlation matrices of ridership observed at different times. (a) The correlation among days of the week. (b) The correlation among weeks of
the month. (c) The correlation among months of the year. Corr: correlation coefficient.

where $\tilde{V}_i(\beta) = V_i[\beta, \hat{\alpha}\{\beta, \hat{\phi}(\beta)\}]$. Then, define $D = (D_1^T, \ldots, D_K^T)^T$, $r = (r_1^T, \ldots, r_K^T)^T$, and $\tilde{V}$ as an $nK \times nK$ block diagonal matrix with the $\tilde{V}_i$ values as the diagonal elements. Define the modified dependent variable $Z = D\beta - r$; then the iterative procedure (7) for calculating $\hat{\beta}$ is equivalent to carrying out an iteratively reweighted linear regression of $Z$ on $D$ with weight $\tilde{V}^{-1}$. For brevity, specific estimators and calculating procedures for $\hat{\phi}$ and $\hat{\alpha}$ are not explained in this article but can be found in [13].
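The moment estimate of the scale parameter in (3) can be sketched in a few lines. The Python below is an illustrative sketch under stated assumptions, not the routine used by the "gee" or "geepack" packages: the function names are hypothetical, the fitted means are taken as given rather than estimated, and the data are made up.

```python
import math


def pearson_residuals(y, mu, variance):
    """Pearson residuals (Y_it - mu_it) / sqrt(nu(mu_it)) for one subject."""
    return [(yt - mt) / math.sqrt(variance(mt)) for yt, mt in zip(y, mu)]


def scale_estimate(residuals_by_subject, p):
    """Moment estimate of the scale parameter, as in (3)."""
    n_total = sum(len(e) for e in residuals_by_subject)           # N
    sum_sq = sum(r * r for e in residuals_by_subject for r in e)  # sum of e_it^2
    return sum_sq / (n_total - p)


def identity_variance(mu):
    return 1.0  # continuous response: nu(mu) = 1


# Made-up responses and fitted means for K = 2 subjects with n_i = 3 each.
y = [[10.0, 12.0, 11.0], [20.0, 19.0, 22.0]]
mu = [[11.0, 11.0, 11.0], [20.0, 20.0, 21.0]]
residuals = [pearson_residuals(yi, mi, identity_variance) for yi, mi in zip(y, mu)]
phi_hat = scale_estimate(residuals, p=2)
```

Passing `lambda m: m` as the variance function gives the count-data case described in the article, where each residual is divided by the square root of its fitted mean.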


Model Selection

Variable selection is necessary for determining which covariates are included in the final regression model by identifying significant predictors; in addition, correctly specifying the "working" correlation structure can enhance the efficiency of the parameter estimates, particularly when the sample size is not large enough [27]. Therefore, different criteria are adopted for different goals of model selection [28]. The most widely used criterion for GEE model selection, the quasi-likelihood under the independence model criterion (QIC), is briefly introduced here.

The QIC was proposed by Pan [29], who adapted the Akaike information criterion to a GEE. Since a GEE is not likelihood based, the criterion is called the quasi-likelihood under the independence model criterion (QIC) [30]. The basic idea of the criterion is to calculate the expected Kullback–Leibler discrepancy using the quasi-likelihood under the independence "working" correlation assumption, because no general quasi-likelihood is available for correlated data under more complex "working" correlation structures. QIC(R) is defined by

$$\mathrm{QIC}(R) = -2\Psi(\hat{\beta}(R); I) + 2\,\mathrm{trace}(\hat{\Omega}_I \hat{V}_{LZ}), \tag{8}$$

where the quasi-likelihood is $\Psi(\hat{\beta}(R); I) = \sum_{i=1}^{K} \sum_{t=1}^{n_i} Q(\hat{\beta}(R), \hat{\phi}; \{Y_{it}, X_{it}\})$ with $Q(\mu, \hat{\phi}; y) = \int_{y}^{\mu} \frac{y - t}{\hat{\phi}\,\nu(t)} \, dt$; $\hat{\beta}$ and $\hat{\phi}$ are obtained under the hypothesized "working" correlation structure $R$; $\hat{\Omega}_I = \sum_{i=1}^{K} D_i^T V_i^{-1} D_i \big|_{\beta = \hat{\beta},\, R = I}$; and $\hat{V}_{LZ}$ is defined as described earlier with $\beta$ replaced by $\hat{\beta}(R)$. In this article, the QIC is adopted for model selection.
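As a worked illustration of the integral that defines the quasi-likelihood, consider the two variance functions named earlier in the article. The closed forms below follow from direct integration and are standard results rather than expressions taken from this article; the second case assumes $y > 0$.

```latex
\begin{align*}
\nu(t) = 1:\quad Q(\mu, \hat{\phi}; y)
  &= \int_{y}^{\mu} \frac{y - t}{\hat{\phi}} \, dt
   = -\frac{(y - \mu)^2}{2\hat{\phi}}, \\
\nu(t) = t:\quad Q(\mu, \hat{\phi}; y)
  &= \int_{y}^{\mu} \frac{y - t}{\hat{\phi}\, t} \, dt
   = \frac{1}{\hat{\phi}} \left[ y \log\frac{\mu}{y} - (\mu - y) \right].
\end{align*}
```

In the first case, summing over all observations shows that $-2\Psi$ equals the residual sum of squares divided by $\hat{\phi}$, so the first term of the QIC rewards fit and the trace term penalizes model complexity.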

Table 2. The model for the level of day of the week (October 12–18).

Coefficients     Estimate     Standard Error   Wald      P Value (>|W|)   Significance
Intercept        -2.15e+04    6.84e+03         9.93      0.00162          **
Pop              2.57e+01     1.62e+01         2.53      0.11147
Office           1.06e+03     4.60e+02         5.32      0.0211           *
Shopping         1.56e+03     3.16e+02         24.46     7.6e-07          ***
Bus              7.24e+02     1.91e+02         14.38     0.00015          ***
Dis_to_center    8.34e+02     4.46e+02         3.49      0.06174          .
Days_open        3.22e+00     5.68e-01         32.21     1.4e-08          ***
Trans_hub        2.21e+05     8.44e+03         687.55    <2e-16           ***
Tuesday          7.86e+02     2.07e+02         14.4      0.00015          ***
Wednesday        1.44e+03     1.93e+02         55.82     8.0e-14          ***
Thursday         1.74e+03     2.02e+02         73.88     <2e-16           ***
Friday           5.01e+03     6.81e+02         54.13     1.9e-13          ***
Saturday         1.17e+03     1.28e+03         0.84      0.35993
Sunday           -4.45e+03    1.15e+03         15.07     0.0001           ***

Correlation structure: AR-1
QIC: 14,814

Working Correlation
        [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]
[1,]    1        0.977    0.954    0.932    0.91     0.889    0.868
[2,]    0.977    1        0.977    0.954    0.932    0.91     0.889
[3,]    0.954    0.977    1        0.977    0.954    0.932    0.91
[4,]    0.932    0.954    0.977    1        0.977    0.954    0.932
[5,]    0.91     0.932    0.954    0.977    1        0.977    0.954
[6,]    0.889    0.91     0.932    0.954    0.977    1        0.977
[7,]    0.868    0.889    0.91     0.932    0.954    0.977    1

Significance levels: ***: p < 0.001; **: p < 0.01; *: p < 0.05; .: p < 0.1; no mark: p ≥ 0.1.

Numbers in brackets refer to days of the week.
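
The AR-1 working correlation shown in Table 2, and the exchangeable structure used for the coarser time resolutions, can be sketched in a few lines of Python. In an AR-1 matrix the (j, k) entry is alpha^|j - k|, so correlation decays with the time lag; in an exchangeable matrix every off-diagonal entry equals the same alpha. The function names are hypothetical, and alpha = 0.9767 is an assumed value chosen only because it reproduces the rounded entries of Table 2; the article reports the rounded matrix, not alpha itself.

```python
# Sketch of the two working correlation structures discussed in the article.
# Function names and the value of ALPHA_DAY are illustrative assumptions.

def ar1_correlation(n, alpha):
    """AR-1 structure: corr(j, k) = alpha ** |j - k|."""
    return [[alpha ** abs(j - k) for k in range(n)] for j in range(n)]


def exchangeable_correlation(n, alpha):
    """Exchangeable structure: every off-diagonal entry equals alpha."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]


ALPHA_DAY = 0.9767  # assumed; picked so the rounded values match Table 2

# Seven repeated measures per station (one per day of the week).
day_matrix = ar1_correlation(7, ALPHA_DAY)
first_row = [round(value, 3) for value in day_matrix[0]]

# Four repeated measures per station (one per week of the month),
# using the common correlation of 0.982 reported for the weekly model.
week_matrix = exchangeable_correlation(4, 0.982)

for row in day_matrix:
    print("  ".join(f"{value:.3f}" for value in row))
```

The contrast between the two outputs is the point: the AR-1 rows fall off steadily away from the diagonal, which encodes an ordering of the observations, whereas the exchangeable matrix treats all pairs of time points alike.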


Results and Discussion

Model Implementation and Results Analysis
We built three models, one for each time resolution ($n_i = 7$ for the level of day of the week, $n_i = 4$ for the level of week of the month, and $n_i = 12$ for the level of month of the year). For each time resolution, the model with the lowest QIC was selected as the final model (which fixes both the working correlation structure and the variables), and the results are summarized in Tables 2–4.
First, we compare the GEE model for the level of day of the week (Table 2) with the OLS model for estimating average weekday ridership [10]. The selected variables (except for the temporal factors identifying each day/week/month) and their significance in the GEE model are consistent with those of the OLS model, indicating the effectiveness of the GEE model. In addition, the GEE takes into account the dependency of observations by specifying a "working correlation structure." OLS regression, however, assumes that observations at different time points are independent and that the variance is homogeneous across time (the mean squared error in OLS); therefore, OLS regression is not suited to longitudinal data modeling and cannot be used to interpret the effects of temporal factors on attracting ridership. In general, the advantage of the GEE model over the OLS model is that the GEE can capture temporal factors that OLS cannot.
Focusing on the model for the level of day of the week (Table 2), the first-order autoregressive structure (AR-1), which assumes that the correlation between two observations decays exponentially with the time lag between them (see the working correlation in Table 2), is selected as the working correlation structure for the repeated measures. Theoretically, a structure that accounts for time information (e.g., autoregressive) is recommended when the observations are collected at different time points [15]. The selection of AR-1 therefore indicates that there is a logical ordering to the observations of each day of the week. The variables selected by the QIC are the population, number of offices, number of shopping malls, nearby bus stations, distance to the city center, days since opened, dummy variable for transportation hub, and day of the week (i.e., Monday to Sunday) as the categorical parameter.
According to the QIC, exchangeable correlation structures are selected for both the week-of-the-month model and the month-of-the-year model (Tables 3 and 4). An exchangeable correlation structure assumes constant time dependency; thus, all of the off-diagonal elements of the correlation matrix are equal, indicating that there is no logical ordering to the observations of each week of the month and each month of the year. In addition, the variables of the model for the level of month of the year are the same as those of the model for day of the week, and the variables of the model for week of the month differ from those of the other two models only in that "distance to city center" is not included.

Table 3. The model for the level of week of the month (June 1–28).

Coefficients     Estimate     Standard Error   Wald      P Value (>|W|)   Significance
Intercept        -1.42e+04    5.42e+03         6.9       0.086            **
Pop              3.22e+01     1.61e+01         3.98      0.04611          *
Office           9.42e+02     4.68e+02         4.06      0.04401          *
Shopping         1.49e+03     3.09e+02         23.23     1.4e-06          ***
Bus              6.78e+02     2.01e+02         11.4      0.00073          ***
Days_open        3.14e+00     5.90e-01         28.46     9.6e-08          ***
Trans_hub        2.17e+05     8.48e+03         652.3     <2e-16           ***
Week 2           -1.12e+02    1.79e+02         0.39      0.53201
Week 3           -3.59e+03    2.58e+02         193.8     <2e-16           ***
Week 4           -2.70e+02    1.13e+02         5.69      0.01705          *

Correlation structure: exchangeable
QIC: 8,319

Working Correlation
        [,1]     [,2]     [,3]     [,4]
[1,]    1        0.982    0.982    0.982
[2,]    0.982    1        0.982    0.982
[3,]    0.982    0.982    1        0.982
[4,]    0.982    0.982    0.982    1

Significance levels: ***: p < 0.001; **: p < 0.01; *: p < 0.05; .: p < 0.1; no mark: p ≥ 0.1.

Discussion and Implications for Planning
The proposed GEE-based models generalize the linear model to account for temporal factors, so they can be used to estimate longitudinal station ridership in different time periods.

Interpretation of Coefficients
The coefficients for predictors of the GEE are interpreted in the same way as in OLS. The only consideration (and key departure from linear regression) is that these measured


effects are considered to be at a "population" level. The modeling results show that the p values of the variables for number of shopping malls within a station's PCA, nearby bus stations, days since stations opened, and transportation hub are all less than 0.01 for the model at any time resolution, indicating that they are all statistically significant factors in determining the station ridership in any time period. In addition, their coefficients are all positive, which indicates that those factors play key roles in attracting passengers to metro stations.
As for the categorical parameters related to temporal factors, we can first identify which days/weeks/months are significant factors according to the p values, and then, through the coefficient of each day/week/month, we can identify the degree of their effects on attracting metro passengers. For example, relative to Monday (the reference category), Sunday has a negative and statistically significant coefficient (according to the p value), while the other days of the week have positive coefficients, which indicates that people ride the metro less frequently on Sunday than on the other days of the week since it is part of the weekend (Table 2).
Likewise, the month of the year also plays an important role in determining metro ridership. Relative to January (the reference category), February, June, August, and September have negative and statistically significant coefficients; July is also negative but not significant, and the remaining months have positive coefficients (Table 4). This demonstrates that February, June, August, and September had negative effects on attracting travelers by metro compared with other months in the year 2015, which can be verified by the fact that February, August, September, and June are the four months with the lowest passenger volumes according to the table of monthly average daily passenger volumes [31].

Table 4. The model for the level of month of the year (January–December 2015).

Coefficients     Estimate     Standard Error   Wald      P Value (>|W|)   Significance
Intercept        -2.35e+04    6.98e+03         11.31     0.00077          ***
Pop              2.75e+01     1.70e+01         2.63      0.10506
Office           1.16e+03     4.84e+02         5.76      0.01644          *
Shopping         1.61e+03     3.40e+02         22.54     2.1e-06          ***
Bus              7.86e+02     1.95e+02         16.16     5.8e-05          ***
Dis_to_center    7.21e+02     4.48e+02         2.59      0.10756
Days_open        3.28e+00     5.97e-01         30.07     4.2e-08          ***
Trans_hub        2.17e+05     9.11e+03         565.88    <2e-16           ***
Feb              -1.04e+03    2.78e+02         14.08     0.00018          ***
Mar              1.71e+03     2.22e+02         59.16     1.5e-14          ***
Apr              5.81e+01     1.23e+02         0.22      0.63683
May              2.54e+02     1.66e+02         2.32      0.12745
Jun              -6.16e+02    1.92e+02         10.31     0.00133          **
Jul              -5.25e+02    3.34e+02         2.47      0.11603
Aug              -1.22e+03    3.66e+02         11.07     0.00088          ***
Sep              -1.57e+03    2.42e+02         41.96     9.3e-11          ***
Oct              5.68e+02     2.46e+02         5.33      0.02097          *
Nov              8.46e+02     2.74e+02         9.52      0.00203          **
Dec              2.91e+03     4.80e+02         36.84     1.3e-09          ***

Correlation structure: exchangeable
QIC: 25,006

Working Correlation
         [,1]     [,2]     ...     [,12]
[1,]     1        0.97     ...     0.97
[2,]     0.97     1        ...     0.97
...      ...      ...      ...     ...
[12,]    0.97     0.97     ...     1

Significance levels: ***: p < 0.001; **: p < 0.01; *: p < 0.05; .: p < 0.1; no mark: p ≥ 0.1.

Implications for Planning
These findings from the empirical study of ridership in Taipei Metro stations have major implications for transportation and related land use planning. First, the periphery development of stations should be incorporated when planning metro lines. In particular, days since stations opened is positively correlated with metro ridership, which is consistent with the first metro line usually being built in the areas of the city where activity and population density are the highest.
Second, attention needs to be paid to the interaction between commercial development and metro ridership so that commercial development and the construction of metro stations can be planned in coordination. For instance, Taipei 101 is the trade center of Taipei, with intensive commercial development extending into its surrounding area, where commercial development and metro traffic reinforce each other.
Third, a feeder bus system and transportation hub could be more strategically positioned in the planning for


the metro network to achieve a more balanced distribution of passenger flows. Taking Taipei Main Station (the transportation hub) as an example, extra attention needs to be paid to evacuating and diverting the huge passenger flow brought by multimodal transport.
Finally, flexible planning strategies should be adopted in response to changes over time. For example, seasonal dynamic fares could be adopted to stimulate ridership during time periods with a negative coefficient. These findings can also inform the metro planning and periphery development of other cities.

Conclusion
In summary, through modeling with GEEs, this article identified the influencing factors on Taipei Metro ridership for different time resolutions (day of the week, week of the month, and month of the year). Different from previous works, the proposed direct-demand model based on GEEs incorporated temporal factors and generalized the linear model to estimate the longitudinal ridership during a period. Various factors, including land use, socioeconomic, intermodal transportation accessibility, and network structure information, were considered in the initial models.
According to the QIC, the correlation structure and variables were specified in the final models. The correlation structure of the model for the level of day of the week was AR-1, which assumes that the correlation among the repeated measures of different days of the week decays exponentially with the time lag. The correlation structures for the levels of week of the month and month of the year were exchangeable, indicating that there is no logical ordering to the observations of each week of the month and each month of the year.
The results showed that the significant factors in determining station-level ridership of the Taipei Metro at different time resolutions were nearly the same. The land use for commerce, bus feeder systems, days since stations opened, and transportation hub were significant factors in attracting ridership. Temporal factors as categorical parameters were also crucial for determining the metro ridership. Sunday of the week and February, June, August, and September of the year (negative coefficients with statistical significance) had negative effects on attracting passengers who travel by metro in the Taipei metropolitan area compared with the other days and months.
In general, the modeling results enable us to estimate longitudinal metro station ridership and interpret the influencing factors of metro travel demand at different time resolutions; thus, they provide a theoretical basis for metro planning and operation as well as periphery development for passengers, transit operators, and public agencies. This methodology is also applicable to other transportation issues that involve longitudinal traffic volume estimation, and it advances the literature on the use of these types of statistical models in transportation generally.

Acknowledgments
This work was supported by the National Natural Science Foundation of China (71901188) and the Research Grants Council Theme-Based Research Scheme (T32-101/15-R). We would like to thank Prof. Javier Cabrera from Rutgers University for the useful discussion on the modeling part of the article.

About the Authors
Yuxin He (yuxinhe2-c@my.cityu.edu.hk) earned her B.S. and M.S. degrees in transportation planning and management from Central South University, Changsha, China, in 2014 and 2017, respectively. She is pursuing her Ph.D. degree at the School of Data Science, City University of Hong Kong, China. Her research interests include network analysis, traffic flow theory, and data mining.

Yang Zhao (yang.zhao@my.cityu.edu.hk) earned her B.S. degree in statistics from Shandong University of Science and Technology in 2011 and her Ph.D. degree from City University of Hong Kong in 2015. She is currently the scientific officer in the Centre for Systems Informatics Engineering and Hong Kong Institute for Data Science at City University of Hong Kong. Her research interests are in machine learning and statistics, especially their application to real problems.

Kwok Leung Tsui (kltsui@cityu.edu.hk) is a chair professor of industrial engineering with the School of Data Science, City University of Hong Kong, and a professor at the Department of Industrial and Systems Engineering, Virginia Polytechnic Institute and State University. He was a recipient of the National Science Foundation Young Investigator Award. He is a Fellow of the American Statistical Association, American Society for Quality, and International Society of Engineering Asset Management. His research interests include data mining, surveillance in healthcare and public health, prognostics and systems health management, calibration and validation of computer models, process control and monitoring, and robust design and Taguchi methods.

References
[1] M. G. McNally, "The four step model," eScholarship, Univ. of California, Oakland, 2008. [Online]. Available: https://escholarship.org/uc/item/0r75311t


[2] J. Gutiérrez, O. D. Cardozo, and J. C. García-Palomares, "Transit ridership forecasting at station level: An approach based on distance-decay weighted regression," J. Transp. Geogr., vol. 19, no. 6, pp. 1081–1092, 2011. doi: 10.1016/j.jtrangeo.2011.05.004.
[3] O. D. Cardozo, J. C. García-Palomares, and J. Gutiérrez, "Application of geographically weighted regression to the direct forecasting of transit ridership at station-level," Appl. Geogr., vol. 34, no. 4, pp. 548–558, 2012. doi: 10.1016/j.apgeog.2012.01.005.
[4] J. Choi, Y. J. Lee, T. Kim, and K. Sohn, "An analysis of Metro ridership at the station-to-station level in Seoul," Transportation, vol. 39, no. 3, pp. 705–722, 2012. doi: 10.1007/s11116-011-9368-3.
[5] R. Cervero, "Alternative approaches to modeling the travel-demand impacts of smart growth," J. Am. Plan. Assoc., vol. 72, no. 3, pp. 285–295, 2006. doi: 10.1080/01944360608976751.
[6] M. Kuby, A. Barranda, and C. Upchurch, "Factors influencing light-rail station boardings in the United States," Transp. Res. A, Pol. Pract., vol. 38, no. 3, pp. 223–247, 2004. doi: 10.1016/j.tra.2003.10.006.
[7] X. Chu, "Ridership models at the stop level," National Center for Transit Research, Univ. of South Florida, Tech. Rep., 2004.
[8] G. Walters and R. Cervero, "Forecasting transit demand in a fast growing corridor: The direct-ridership model approach," Fehr & Peers Associates, Walnut Creek, CA, 2003. [Online]. Available: http://www.fehrandpeers.com/fplib/public/forecast-td-dirRidershp-approach.pdf
[9] J. Zhao, W. Deng, Y. Song, and Y. Zhu, "What influences metro station ridership in China? Insights from Nanjing," Cities, vol. 35, pp. 114–124, Dec. 2013. doi: 10.1016/j.cities.2013.07.002.
[10] Y. He, Y. Zhao, and K. L. Tsui, "An analysis of factors influencing metro station ridership: Insights from Taipei Metro," in Proc. IEEE Conf. Intelligent Transportation Systems (ITSC), 2018, pp. 1598–1603. doi: 10.1109/ITSC.2018.8569948.
[11] Z. Feng, P. Diehr, A. Peterson, and D. McLerran, "Selected statistical issues in group randomized trials," Annu. Rev. Public Health, vol. 22, no. 1, pp. 167–187, 2001. doi: 10.1146/annurev.publhealth.22.1.167.
[12] G. Fitzmaurice, N. Laird, and J. Ware, Applied Longitudinal Analysis, 2nd ed. Hoboken, NJ: Wiley, 2011.
[13] J. W. Hardin and J. M. Hilbe, Generalized Estimating Equations. New York: Chapman and Hall/CRC, 2002.
[14] M. Wang, "Generalized estimating equations in longitudinal data analysis: A review and recent developments," Adv. Stat., vol. 2014, no. 1, pp. 1–11, 2014. doi: 10.1155/2014/303728.
[15] P. Ghisletta and D. Spini, "An introduction to generalized estimating equations and an application to assess selectivity effects in a longitudinal study on very old individuals," J. Educ. Behav. Stat., vol. 29, no. 4, pp. 421–437, 2004. doi: 10.3102/10769986029004421.
[16] "Ridership counts," Taipei Metro, Taipei. [Online]. Available: http://english.metro.taipei/ct.asp?xItem=1056489&ctNode=70217&mp=122036
[17] "Chronicles," Taipei Metro, Taipei. [Online]. Available: https://english.metro.taipei/News_Content.aspx?n=07DAD5F7351B8882&sms=2190547C60526D6B&s=FD5C094216AE77AB
[18] "Data types," WorldPop, Southampton. [Online]. Available: http://www.worldpop.org.uk/data/get_data/
[19] K. Erciyes, Complex Networks: An Algorithmic Perspective. Boca Raton, FL: CRC, 2014.
[20] X. Yang, T. Tang, Y. Qu, J. Wu, H. Yin, and Z. Gao, "Recognizing the critical stations in urban rail networks: an analysis method based on the smart-card data," IEEE Intell. Transp. Syst. Mag., vol. 11, no. 1, pp. 29–35, 2018. doi: 10.1109/MITS.2018.2884492.
[21] R Development Core Team, "R: A language and environment for statistical computing," R Foundation for Statistical Computing, Vienna, Austria, 2009. [Online]. Available: http://www.R-project.org.
[22] J. C Vincent, "GEE: Generalized estimation equation solver," R Package Version 4.13-19. 2015. [Online]. Available: https://CRAN.R-project.org/package=gee
[23] U. Halekoh, S. Højsgaard, and J. Yan, "The R Package GEEPACK for generalized estimating equations," J. Stat. Softw., vol. 15, no. 2, pp. 1–11, 2006. doi: 10.18637/jss.v015.i02.
[24] D. Zhang and X. C. Wang, "Transit ridership estimation with network Kriging: A case study of Second Avenue Subway, NYC," J. Transp. Geogr., vol. 41, pp. 107–115, Dec. 2014. doi: 10.1016/j.jtrangeo.2014.08.021.
[25] B. Lu, J. S. Preisser, B. F. Qaqish, C. Suchindran, S. I. Bangdiwala, and M. Wolfson, "A comparison of two bias-corrected covariance estimators for generalized estimating equations," Biometrics, vol. 63, no. 3, pp. 935–941, 2007. doi: 10.1111/j.1541-0420.2007.00764.x.
[26] A. Qu, B. G. Lindsay, and B. Li, "Improving generalised estimating equations using quadratic inference functions," Biometrika, vol. 87, no. 4, pp. 823–836, 2000. doi: 10.1093/biomet/87.4.823.
[27] G. Kauermann and R. J. Carroll, "A note on the efficiency of sandwich covariance matrix estimation," J. Am. Stat. Assoc., vol. 96, no. 456, pp. 1387–1396, 2001. doi: 10.1198/016214501753382309.
[28] A. L. Hin, V. J. Carey, Y. Wang, and J. Carey, "Criteria for working–correlation–structure selection in GEE," Am. Stat., vol. 61, no. 4, pp. 360–364, 2013. doi: 10.1198/000313007X245122.
[29] W. Pan, "Akaike's information criterion in generalized estimating equations," Biometrics, vol. 57, no. 1, pp. 120–125, 2001. doi: 10.1111/j.0006-341X.2001.00120.x.
[30] J. A. Nelder and Y. Lee, "Likelihood, quasi-likelihood and pseudolikelihood: Some comparisons," J. R. Stat. Soc. Series B Methodol., vol. 54, no. 1, pp. 273–284, 1992. doi: 10.1111/j.2517-6161.1992.tb01881.x.
[31] "Ridership statistics," Taipei Metro, Taipei. [Online]. Available: https://english.metro.taipei/cp.aspx?n=78C1963041B43CBF
[32] "City Mayors Statistics." [Online]. Available: http://www.citymayors.com/statistics/largest-cities-density-125.html
