Spatio-Temporal Modeling of Yellow Taxi Demands Time Series

International Journal of Forecasting
Spatio-temporal modeling of yellow taxi demands in New York City using generalized STAR models
Abolfazl Safikhani a,*, Camille Kamga b, Sandeep Mudigonda c, Sabiheh Sadat Faghih b, Bahman Moghimi b
a Columbia University, Department of Statistics, 1255 Amsterdam Avenue, New York, NY 10027-6938, United States
b City University of New York, New York, NY, United States
c City College of New York, New York, NY, United States
Keywords: STARMA; Spatio-temporal; Time series; Taxi demand prediction

Abstract

The spatio-temporal variation in the demand for transportation, particularly taxis, in the highly dynamic urban space of a metropolis such as New York City is impacted by various factors such as commuting, weather, road work and closures, disruptions in transit services, etc. This study endeavors to explain the user demand for taxis through space and time by proposing a generalized spatio-temporal autoregressive (STAR) model. It deals with the high dimensionality of the model by proposing the use of LASSO-type penalized methods for tackling parameter estimation. The forecasting performance of the proposed models is measured using the out-of-sample mean squared prediction error (MSPE), and the proposed models are found to outperform other alternative models such as vector autoregressive (VAR) models. The proposed modeling framework has an easily interpretable parameter structure and is suitable for practical application by taxi operators. The efficiency of the proposed model also helps with model estimation in real-time applications.
© 2018 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
* Corresponding author. E-mail address: as5012@columbia.edu (A. Safikhani).
https://doi.org/10.1016/j.ijforecast.2018.10.001

1. Introduction

Taxi services have long been an important part of urban transportation. Traditionally, ride hailing has been performed by customers from the curbside of streets; however, such street-hail taxis could be inefficient at addressing spatio-temporal variations in demand. The highly dynamic urban space in a metropolis such as New York City means that the spatio-temporal variation in the demand for taxis is impacted by various factors, such as commuting, weather, special events, parades, road work and closures, disruptions in transit services, etc. However, the rise of transportation network companies (TNCs) is making ride hailing more on-demand. TNCs use economic means such as surge pricing to address some of the inefficiencies, but accurate predictions of the demand for taxis could lead to a much greater efficiency and more nuanced economic measures. It has been shown that providing more accurate predictions of the demand for taxis can improve the efficiency of the transportation system (Moreira-Matias, Gama, Ferreira, Mendes-Moreira, & Damas, 2013), since a better knowledge of the demand in various parts of the city at different times allows cab drivers to reduce their ride-seeking trips, resulting in a reduction in traffic congestion and pollution. On the other hand, inaccurate taxi demand forecasts could leave passengers in some parts of the city with long wait times. Hence, accurate short-term taxi demand prediction enables the TNCs to dynamically reroute, schedule and optimize operations.

The demand for ride hailing taxis in New York City (NYC) is highly variable, with between about 150,000 and 600,000 trips per day being provided by 21,263 street hail taxis in 2015 (New York City Taxi & Limousine Commission, 2016; see also Fig. 1). This demand also has a high spatial variability, with about 383,000 pickups in Manhattan and
only 3,150 pickups in the Bronx on an average day. The use of GPS enabled the spatio-temporal historical demand for taxis in the year of 2015 to be disaggregated to several sub-regions within the city (see Fig. 2).

Fig. 1. Temporal and spatial variation in taxi demand.
Fig. 2. Spatial variation of taxi demand aggregated by zip code in Manhattan.

These taxis travelled approximately 460 million miles in 2015 (New York City Taxi & Limousine Commission, 2016), the distribution of which can be seen in Fig. 3. Due to the myriad of factors that can impact demand, which may or may not be known in advance, there is scope for taxis to drive around seeking rides, some of which could be included in the number of trips under one mile in Fig. 3.

This study models the demand for taxis as a dynamic spatio-temporal process. The historical GPS-enabled spatio-temporal demand for taxis in the year 2015 (provided by the Taxi and Limousine Commission of New York City) is used and aggregated to several sub-regions within the city.

We examine the demand for taxis in NYC through space and time using a spatio-temporal autoregressive model. The spatio-temporal autoregressive moving average (STARMA) model is a well-established spatio-temporal process that was first introduced by Pfeifer and Deutsch (1980, 1981) and has been applied in many different disciplines since, such as economics and finance (Giacomini & Granger, 2004; Hernández-Murillo & Owyang, 2006; Shoesmith, 2013), social science (Pfeifer & Deutsch, 1980; Sartoris, 2005), transportation (Cheng, Wang, Harworth, Heydecker, & Chow, 2011; Duan, Mao, Zhang, & Wang, 2016; Kamarianakis & Prastacos, 2003), climatology (Kyriakidis & Journel, 1999), health sciences (Baklanov et al., 2007), etc. Modeling the demand through time in all sub-regions simultaneously, as in a vector autoregressive (VAR) model, is a high-dimensional problem, since the number of parameters in the model is proportional to the square of the number of sub-regions. A STAR-type modeling approach reduces the number of parameters dramatically by governing the neighborhood structure between the regions. This structure is also useful for capturing the spatial dependence of the demand between the regions, and makes the results more interpretable.

In this research, we use the spatio-temporal structure for predicting the yellow taxi demand for different zip codes of Manhattan, a borough of NYC. For this purpose, GPS taxi demand data for two ordinary days in October
2015 are disaggregated based on the zip code. Such an approach requires the use of a zoning system for studying the spatial characteristics of the data. The system should divide the study area into smaller districts in order to assess the existence of spatial correlations among them. However, districts cannot be too small (like census tracts), since the GPS devices are not 100% accurate and can cause pick-up counts in each district to be less accurate.

Fig. 3. Distribution of distance travelled by taxis in 2015.

We also ensure temporal variation in the demand by disaggregating the GPS data into 15-min intervals, resulting in 96 time points per day. Other studies (e.g. Qian, Ukkusuri, Yang, & Yan, 2017) have also decomposed such data by time (15-min intervals) and space (zip code basis). In addition to the use of spatio-temporal structure for taxi demand prediction, the main contribution of this paper is the introduction of new penalty functions in the STAR model. Various different penalty functions are used in this paper for parameter estimation in the proposed generalized STAR model, and we show that they improve the accuracy of the taxi demand prediction. The double hierarchical group LASSO (DHGLASSO) is the new penalty scheme that is introduced in this paper for penalizing parameters in the hierarchical structure of modeling in both the space and time dimensions. We evaluate the forecasting performance of the proposed method using the out-of-sample mean squared prediction error (MSPE), and the results show that the proposed model outperforms some of the alternative algorithms such as ARMA and VAR models.

Given that there are about 12 million taxi trips a month, amounting to 2 GB of data, a demand forecasting model with a good spatial and temporal predictive accuracy is very useful. In particular, the proposed model has the ability to forecast the taxi demand a few steps into the future at various locations in NYC, and this enables the agency's framework to plan taxi positioning and provide demand-sensitive taxi dispatching for various locations and specific times of the day over the year. When using a model for placement of taxis, the yellow taxis may not operate as yellow taxis per se. In other words, they could serve demand-based rides as well as curb-side pickups. In fact, some yellow taxis are already using app-based ride providers such as Curb. The objective of the proposed modeling framework would be to direct taxis to certain zones at certain times. Actual dispatch to pick up a certain passenger is not a part of the proposed framework, though it could be done using the actual location (within the zone) of a passenger requesting a ride through an app-based ride-providing service. The main benefit of this approach is that it avoids taxis cruising for rides, especially across different zones, which would decrease the unnecessary distance travelled, with the associated fuel costs and pollution. In addition, from a policy standpoint, the spatio-temporal structure inferred from the demand data provides a basis to allow regulating agencies to explore cordon pricing initiatives.

This paper is organized as follows. Section 2 discusses the literature on time series modeling in transportation and short-term taxi demand prediction. Section 3 provides a detailed description of the spatio-temporal modeling and formulation of taxi demand using the STARMA approach. Section 4 presents findings for various types of STARMA models and prediction errors, and compares them with those from other time series models. Finally, we present our conclusions and future research directions.

2. Literature review

With the rise of intelligent systems and increases in the availability of data, taxi pick-up demand prediction has recently come to the attention of many scholars. It is obvious that the taxi demand in any given zone changes from one time interval to another in real time. Hence, time series models can form a strong statistical tool for capturing this time variation in taxi demand, figuring out the correlations within the taxi data, and producing real-time predictions. For univariate data, a well-known family of time series models called ARIMA (autoregressive integrated moving average) models can be beneficial, and have been applied to many transportation-related problems (Moghimi, Safikhani, Kamga, & Hao, 2017). However,
in the dense urban transportation network that we consider, with many different areas or zip codes that each have their own demand dynamics and may be correlated with one another, the taxi demand variation of a given zip code is not related only to its own values, but also to the demand in the neighboring zip codes. Since there are too many parameters to estimate due to multiple zip codes, VAR models, the most common multivariate time series models, will not perform well in forecasting the taxi demand. This study attempts to mitigate this problem by applying a multivariate spatial–temporal time series model, which is discussed in detail in Section 3.

Some of the primary research about taxi demand has aimed to find the factors that influence taxi demand. Schaller (1999) developed a citywide empirical time series regression model of NYC taxis in an attempt to understand the relationship between the taxicab revenue per mile and economic activity in the city, taxi supply, taxi fares, and bus fares. Later, Schaller (2005) tried to figure out the relationships between taxi demand and other factors including the city size, the availability and cost of privately owned autos, the use of complements to taxicabs, the cost of taxi usage, the taxi service quality, the presence of competing modes, and the presence of a senior or disabled population. With the emergence of GPS technology, subsequent extensive research into the use of spatial information has been applied in the context of transportation-related problems. GPS-based systems are also used to track the taxis of New York City and to analyze the taxi ridership. Yang and Gonzales (2017) processed the New York City GPS taxi data and used the negative binomial method to capture the variation in the taxi pick-up demand. Their study used six explanatory variables, namely population, education, median age, median income per capita, employment by industry sector, and transit accessibility. Correa, Xie, and Ozbay (2017) performed an empirical analysis in order to explore the spatial dependence between Uber and taxi pick-up data. The results from Moran's I tests confirmed the significant spatial correlation in both taxi and Uber demand.

Several studies have considered prepositioning taxis so as to reduce wait times (Chang, Tai, & Hsu, 2010; Yuan, Zheng, Zhang, Xie, & Sun, 2011) using spatio-temporal clustering. Time series models such as ARIMA have also been tested for taxi demand prediction (Moreira-Matias et al., 2013; Qian et al., 2017; Sayarshad & Chow, 2016). Moreira-Matias et al. (2013) proposed a methodology for the prediction of short-term taxi demand at 30-min time intervals. Their methodology is an ensemble of three predictive models, namely a time-varying Poisson model, a weighted time-varying Poisson model, and an ARIMA model. They found that their proposed model outperformed all three models run individually. The recent study by Qian et al. (2017) also used artificial neural networks to combat nonlinearities in the taxi demand. Furthermore, they attempted to capture spatio-temporal variations using conditional random fields. The proposed model and two other algorithms (ARIMA and ANN) were run in four different scenarios and their performances evaluated. The reported results showed that the proposed model outperformed both of the other algorithms, with mean absolute percentage errors (MAPE) close to 0.1. The case study used in the current paper is the same as that used by Qian et al. (2017).

It is well known that spatial information can increase the accuracy of prediction, especially for traffic congestion and at longer horizons. The idea of capturing spatial information in time series studies of transportation-related problems was first introduced in the study by Okutani and Stephanedes (1984) for the prediction of traffic flow. Later, the spatial concept was deployed in the study by Kamarianakis and Prastacos (2003) for forecasting the relative velocity on major roads in Athens, Greece, in a method referred to as the space–time autoregressive integrated moving average (STARIMA) model. The model is quite different from traditional ARIMA models due to the inclusion of spatial information regarding neighboring links for traffic forecasting. They compared the forecasting performances of four models, namely the historical average, ARIMA, VARMA, and STARIMA. The results demonstrated that there are no significant differences among the last three models, although all three performed better than the historical average. Spatial–temporal modeling is also used in various other areas of transportation; for example, the traffic condition of the downstream section of a road is highly correlated with that observed upstream. Stathopoulos and Karlaftis (2003) considered the spatial information from four consecutive loop detectors in the area upstream of the study section for predicting the traffic flow in the downstream of an urban corridor, while the same idea was used in the studies by Cheng et al. (2011) and Duan et al. (2016) for predicting the traffic speed of the downstream link.

One of the most important parts of STARIMA modeling is the spatial weighting matrix, which indicates the spatial dependency between multiple time series. The optimal spatial weighting matrix varies with the nature of the problem, and determining it requires some engineering judgment. In general, two approaches have been used for selecting the neighboring dependence: (a) correlation-coefficient assessment and (b) distance adjustment. The values in STARIMA's weighting matrix can vary by time and location. In one method that has been developed, called General STARIMA, the spatial parameters are designed to vary by location, instead of having fixed values over all locations (Min, Hu, & Zhang, 2010). Another approach that is associated with the weighting matrix is to consider only the link/zone that is adjacent to the target link/zone. It can be elaborated by a ring of dependency, defined by the "order". For instance, a first-order adjacency matrix represents the dependency between the study link/zone and its immediately adjacent links/zones (first-order links/zones). A second-order adjacency matrix captures the dependency on zones that are not directly adjacent to the study zone, but are immediately adjacent to one of its first-order links/zones. It can also be expanded to a third-order adjacency matrix, and so forth. First- and second-order adjacency-weighting matrices were used in the study by Kamarianakis, Prastacos, and Kotzinos (2004). On the other hand, it is more practical to use the distance between the two links/zones, where the value of the dependency is reduced by increasing the distance.
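
The ring-of-dependency idea can be made concrete with a short sketch. It assumes that a symmetric first-order adjacency matrix is available and takes the neighbors of order l of a zone to be the zones exactly l adjacency steps away; each resulting matrix is then row-normalized. The function name `neighbor_order_matrices` and the five-zone chain are hypothetical and serve only as an illustration, not as the zoning used in any of the cited studies.

```python
import numpy as np

def neighbor_order_matrices(adj, max_order):
    """Return [W0, W1, ..., W_max_order], where W_l[i, j] > 0 iff zone j is
    exactly l adjacency steps away from zone i (rows normalized to sum to 1)."""
    adj = np.asarray(adj, dtype=bool)
    k = adj.shape[0]
    # Breadth-first search from every zone gives shortest-path step counts.
    dist = np.full((k, k), -1, dtype=int)
    for start in range(k):
        dist[start, start] = 0
        frontier = [start]
        steps = 0
        while frontier:
            steps += 1
            next_frontier = []
            for u in frontier:
                for v in np.flatnonzero(adj[u]):
                    if dist[start, v] < 0:
                        dist[start, v] = steps
                        next_frontier.append(v)
            frontier = next_frontier
    weights = []
    for order in range(max_order + 1):
        w = (dist == order).astype(float)
        row_sums = w.sum(axis=1, keepdims=True)
        # Rows with no neighbors of this order stay all-zero.
        w = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)
        weights.append(w)
    return weights

# Hypothetical chain of five zones: 0 - 1 - 2 - 3 - 4.
adj = np.zeros((5, 5), dtype=int)
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[a, b] = adj[b, a] = 1

W = neighbor_order_matrices(adj, max_order=2)
```

Order 0 reduces to the identity matrix, so a zone's own past values are treated as the innermost ring. A distance-based alternative would replace the 0/1 entries with weights that decay with distance before normalizing.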
Fig. 4. Sample ACFs of the first 5 components.
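
Sample autocorrelation functions such as those reported in Fig. 4 can be computed directly from each demand series. The following is a minimal sketch using the standard estimator (lag-h autocovariance divided by the lag-0 autocovariance); the synthetic sinusoidal series is a stand-in for one zip code's 15-min pickup counts and is an assumption made purely for illustration.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r(0), ..., r(max_lag) of a 1-D series."""
    x = np.asarray(x, dtype=float)
    n = x.size
    centered = x - x.mean()
    denom = np.dot(centered, centered)
    return np.array([np.dot(centered[: n - h], centered[h:]) / denom
                     for h in range(max_lag + 1)])

# Synthetic stand-in for one zip code's 15-min pickup counts over a day
# (96 points); real counts would come from the taxi trip records.
t = np.arange(96)
demand = 50 + 30 * np.sin(2 * np.pi * t / 96)
acf = sample_acf(demand, max_lag=10)
```

Autocorrelations that decay slowly with the lag, as in this smooth synthetic series, are the kind of pattern that indicates strong temporal dependence.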
3. Methodology

This section introduces the proposed spatio-temporal model and briefly discusses the implementation of the model. Suppose that $k$ different time series are observed over a period of duration $T$. If one chooses vector autoregression (VAR) models with a maximum time lag of $p$ to fit the data, a total of $k^2 \times p$ parameters need to be estimated using the $k \times T$ observed data points. Now, if $k$ is relatively large compared to $T$, then the number of parameters in the model will be more than the number of data points observed. This is called a high-dimensional problem. The typical least squares methods cannot be used, as the design matrix will not be invertible. The data set that we explore in this paper has features similar to high-dimensional time series. More specifically, we consider the yellow taxi demand in NYC for the day October 6th, 2015, a date chosen because it is a typical day without any holidays or special events nearby. The demand is aggregated spatially over the zip codes, and temporally every 15 min, making it a multivariate time series with more than 100 components. However, only 39 of the zip codes have enough non-zero counts to keep them in the model. It is worth noting that the zip code zoning system includes some small zones (even as small as a block/building) that should be removed from the inputs, as the demand in these small zones is not of interest. Thus, the dataset ultimately consists of $k = 39$ locations and $T = 96$ time points. Fig. 4 shows the sample ACFs of the first five components of the data, which imply the existence of a strong temporal dependence. Hence, a multivariate time series model is chosen for analyzing this dataset.

Due to the high dimensionality of the data, simple VAR models are not appropriate here. Instead, this section develops a generalized version of the STARMA model, which takes into account the topology of the locations at which the data is observed, with the aim of improving the prediction performance. STARMA models, introduced by Pfeifer and Deutsch (1980, 1981), are spatio-temporal models that reduce the number of parameters in a typical VAR model by introducing neighborhood structures. Here, we focus only on the autoregressive (AR) part of this model, since it is more interpretable. A multivariate time series $Y(t) = (Y_1(t), \ldots, Y_k(t))'$, $t = 1, 2, \ldots, T$, is called a generalized STAR of order $p$ (see Di Giacinto, 1994, and Terzi, 1995, for an introduction, and Di Giacinto, 2006, for its application to regional unemployment analysis) if for each $t = 1, 2, \ldots, T$ and $i = 1, 2, \ldots, k$,

$$ Y_i(t) = \sum_{j=1}^{p} \sum_{l=0}^{\eta_j - 1} \phi_i^{(j,l)} W_i^{(l)} Y(t-j) + \varepsilon_i(t), \tag{1} $$

where $\varepsilon(t) = (\varepsilon_1(t), \ldots, \varepsilon_k(t))'$ is a $k$-variate normal variable with mean zero and

$$ E\left(\varepsilon(t)\,\varepsilon(t+s)'\right) = \begin{cases} \sigma^2 I_k, & s = 0, \\ 0, & \text{otherwise.} \end{cases} $$

Also, the $W^{(l)}$s are $k \times k$ weighting matrices that govern the $l$th neighborhood location, with $W^{(0)} = I_k$. Denote the $i$th row of $W^{(l)}$ by $W_i^{(l)}$. One possible choice for $W^{(l)}$ is to set $W^{(l)}(i, j) = 1$ if the $i$th and $j$th locations are $l$th level neighbors, and $W^{(l)}(i, j) = 0$ otherwise. These matrices are then normalized in such a way that the sum of each row is 1. Finally, for each $i = 1, 2, \ldots, k$, and
$j = 1, 2, \ldots, p$, $\phi_i^{(j,0:\eta_j-1)} = \left(\phi_i^{(j,0)}, \phi_i^{(j,1)}, \ldots, \phi_i^{(j,\eta_j-1)}\right)$ is a vector of coefficients of size $\eta_j$ that relates the current observation at location $i$, $Y_i(t)$, to all weighted observations in $\eta_j$ different neighborhoods $j$ time lags in the past. Without loss of generality, it is assumed that $\eta_1 = \cdots = \eta_p = \eta$ (if they are different, one can choose $\eta = \max(\eta_1, \ldots, \eta_p)$ and set some of the $\phi_i^{(j,l)}$ coefficients to zero). Furthermore, denote $\Phi_i = \left(\phi_i^{(1,0:\eta-1)}, \ldots, \phi_i^{(p,0:\eta-1)}\right)$. In the generalized STAR model in Eq. (1), the current value $Y_i(t)$ depends on the previous values observed at nearby locations through both the $\phi$ parameters and the weighting matrices $W$. The difference between VAR and STAR models is that VAR models have the current value depending on the previous values of all other time series components, whether they are nearby observations or not. Thus, VAR models do not utilize the spatial structure, meaning that they may perform poorly for prediction, especially when the spatial correlations are high. It is worth noting that there are also various other models that use the spatial structure, including the conditional autoregressive model (CAR), spatial autoregressive models, and the simultaneously autoregressive model (SAR; see Cressie, 2015, and LeSage, 1997, for more details). However, none of these have any notion of time, meaning that they are not applicable for our time series data, especially given that our objective is to predict the values of demand in the future.

It would be more convenient to write Eq. (1) in a compact matrix form. For that, let $Y_i = (Y_i(1), \ldots, Y_i(T))'$, $\varepsilon_i = (\varepsilon_i(1), \ldots, \varepsilon_i(T))'$, and define $Z_i$ as $T \times \eta p$ with $Z_i(t, (j-1)\eta + l) = W_i^{(l)} Y(t-j)$ for $t = 1, 2, \ldots, T$, $j = 1, 2, \ldots, p$, and $l = 0, 1, \ldots, \eta - 1$. Now, one can write the data equation for the $i$th time series component as

$$ Y_i = Z_i \Phi_i + \varepsilon_i. \tag{2} $$

This model reduces the number of parameters from $k^2 \times p$ in the VAR model to $k \times \eta \times p$, assuming $\eta \ll k$. Least squares estimation can be implemented for parameter estimation; i.e., for $i = 1, 2, \ldots, k$,

$$ \hat{\Phi}_i = \operatorname{argmin}_{\Phi_i} \frac{1}{2} \left\| Y_i - Z_i \Phi_i \right\|_2^2, \tag{3} $$

with $\|\cdot\|_2$ being the Euclidean norm. However, for the cases where $T$ is small compared to $k$, it might still be beneficial to reduce the number of parameters in the model with the aim of improving the forecast performance. For that purpose, a penalty function $\Omega(\Phi)$ will be added to Eq. (3) with the aim of setting some of the small parameters to zero so as to increase the forecast efficiency. More specifically,

$$ \hat{\Phi}_i = \operatorname{argmin}_{\Phi_i} \frac{1}{2} \left\| Y_i - Z_i \Phi_i \right\|_2^2 + \lambda \Omega(\Phi_i), \tag{4} $$

where $\lambda$ is the tuning parameter to be selected by cross-validation techniques. We define several penalty functions and evaluate their performances on the yellow taxi demand data. More specifically, we consider the following penalty functions.

• LASSO: a simple element-wise $L_1$ penalty on all the components of $\Phi_i$; i.e., for $i = 1, 2, \ldots, k$,

$$ \Omega(\Phi_i) = \sum_{j=1}^{p} \sum_{l=0}^{\eta-1} \left| \phi_i^{(j,l)} \right|. \tag{5} $$

• HGLASSO (hierarchical group LASSO): similar to the HVAR method introduced by Nicholson, Bien, and Matteson (2014) and Nicholson, Matteson, and Bien (2017) for sparse VAR models. The coefficients for each time lag are grouped together, and they are penalized more if the time lags are higher by means of a time-lag hierarchical group structure. More specifically, denoting $\Phi_i^{(j:p)} = \left(\phi_i^{(j,0:\eta-1)}, \ldots, \phi_i^{(p,0:\eta-1)}\right)$ for $j = 1, 2, \ldots, p$,

$$ \Omega(\Phi_i) = \sum_{j=1}^{p} \left\| \Phi_i^{(j:p)} \right\|_2. \tag{6} $$

• DHGLASSO (double hierarchical group LASSO): similar to HGLASSO, but with an additional neighborhood-lag hierarchical group structure penalty term. Denoting

$$ \Phi_i^{(j:p,\,l:\eta-1)} = \left(\phi_i^{(j,l:\eta-1)}, \phi_i^{(j+1,0:\eta-1)}, \ldots, \phi_i^{(p,0:\eta-1)}\right), \quad j = 1, 2, \ldots, p, \; l = 0, 1, \ldots, \eta - 1, \tag{7} $$

one can write the penalty function as

$$ \Omega(\Phi_i) = \sum_{j=1}^{p} \sum_{l=0}^{\eta-1} \left\| \Phi_i^{(j:p,\,l:\eta-1)} \right\|_2. \tag{8} $$

3.1. Implementation

The solving of optimization problems of the type in Eq. (4) has been studied extensively under the penalty terms introduced previously (see Tibshirani, 1996, and references therein). Due to the hierarchical structure of the group penalties in HGLASSO and DHGLASSO, here we apply the proximal gradient method introduced by Jenatton, Mairal, Obozinski, and Bach (2011). Furthermore, the convergence rate of the proximal gradient method has been improved by Beck and Teboulle (2009) by introducing the fast iterative soft-thresholding algorithm (FISTA), where a sequence of matrix coefficients $\hat{\Phi}_i[r]$, $r = 1, 2, \ldots$, are introduced iteratively through

$$ \hat{\phi} = \hat{\Phi}_i[r-1] + \frac{r-2}{r+1} \left( \hat{\Phi}_i[r-1] - \hat{\Phi}_i[r-2] \right), $$

$$ \hat{\Phi}_i[r] = \operatorname{Prox}_{s\lambda\Omega} \left( \hat{\phi} - s \nabla f_i(\hat{\phi}) \right), \tag{9} $$

where $f_i(\Phi_i) = \frac{1}{2} \| Y_i - Z_i \Phi_i \|_2^2$, $\nabla f_i(\Phi_i) = -Z_i' (Y_i - Z_i \Phi_i)$ is the vector of derivatives of $f_i(\Phi_i)$, $s$ is the step size (here, we choose $s$ to be $1/\sigma_1(Z_i)^2$, where $\sigma_1(Z_i)$ is the largest singular value of $Z_i$), and

$$ \operatorname{Prox}_{s\lambda\Omega}(u) = \operatorname{argmin}_{\upsilon} \left( \frac{1}{2} \| u - \upsilon \|_2^2 + s\lambda\Omega(\upsilon) \right). \tag{10} $$

The proximal function may not have a closed form in general, and in that case, it must itself be approximated numerically. However, in the case of a hierarchical group penalty, this function does in fact have a simple closed form (see for example algorithm 2 of Nicholson et al. (2014)), which makes the whole optimization efficient. The tuning parameter $\lambda$ is selected based on a rolling scheme cross-validation procedure that was used also by Nicholson et al.
(2014, 2017) and Song and Bickel (2011). For this purpose, the time points are divided into three parts (usually at equal distances), $0 < T_1 < T_2 < T$. The estimation procedure for fixed values of $\lambda$ is applied for the first part, i.e., $t = 1, 2, \ldots, T_1$; then the mean squared prediction error (MSPE) for predicting one step ahead is calculated over all $k$ time series components on the time interval $[T_1 + 1, T_2]$:

$$ \mathrm{MSPE} = \frac{1}{k (T_2 - T_1)} \sum_{i=1}^{k} \sum_{t=T_1+1}^{T_2} \left( Y_i(t) - P_{T_1} Y_i(t) \right)^2, \tag{11} $$

$$ \mathrm{MRPE} = \frac{1}{k (T_2 - T_1)} \sum_{i=1}^{k} \sum_{t=T_1+1}^{T_2} \frac{\left| Y_i(t) - P_{T_1} Y_i(t) \right|}{\left| Y_i(t) \right|}, \tag{12} $$

where $P_{T_1} Y_i(t)$ is the best linear predictor of $Y_i(t)$ based on the first $T_1$ observations. The mean of the relative prediction error (MRPE) is also shown in Eq. (12). Now, we select the tuning parameter $\lambda$ that minimizes this MSPE, and quantify the model performance based on the MSPE on the last part of the data, which covers the time interval $[T_2 + 1, T]$.

Table 1
Results for October 6th data with η = 1.
Model                  MSPE    MRPE    AIC       BIC
VAR                    1.7153  4.8259  216.4933  257.1222
STAR (Univariate AR)   0.2815  2.4463  176.9854  178.0271
LASSO                  0.2977  1.8467  173.8735  174.9153
HGLASSO                0.2977  1.8467  173.8735  174.9153
DHGLASSO               0.2977  1.8467  173.8735  174.9153

Table 2
Results for October 6th data with η = 2.
Model                  MSPE    MRPE    AIC       BIC
STAR                   0.2707  1.9913  177.3313  179.4148
LASSO                  0.2728  1.9616  177.0052  179.0353
HGLASSO                0.2909  1.8942  176.0614  178.1449
DHGLASSO               0.2907  1.9543  178.6925  180.6425

Table 3
Results for October 6th data with η = 3.
Model                  MSPE    MRPE    AIC       BIC
STAR                   0.2932  2.1346  178.7531  181.8784
LASSO                  0.3254  2.1413  177.3218  179.1115
HGLASSO                0.301   2.114   175.1811  178.3064
DHGLASSO               0.2821  1.9472  176.3991  179.3107

Table 4
Results for October 6th data with η = 4.
Model                  MSPE    MRPE    AIC       BIC
STAR                   0.3582  2.4261  182.2474  186.3877
LASSO                  0.3506  2.2577  177.8353  180.3196
HGLASSO                0.3412  2.2928  177.0926  181.2329
DHGLASSO               0.2968  1.882   176.5145  180.0939
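
The estimation and rolling evaluation steps of Section 3.1 can be illustrated with a minimal sketch. It covers only the plain LASSO penalty of Eq. (5), whose proximal operator is element-wise soft-thresholding; the hierarchical group penalties would need a nested group-wise proximal step that is not implemented here. The simulated data, the ring-shaped toy weighting matrices, the candidate grid `lambdas`, and all function names are assumptions made for illustration. The forecasts are assumed to use coefficients fitted on the first $T_1$ points together with the observed values at time $t - 1$, which is one reading of the one-step-ahead predictor in Eqs. (11) and (12).

```python
import numpy as np

def soft_threshold(u, tau):
    """Proximal operator of tau * ||.||_1 (closed form for the LASSO penalty)."""
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def fista_lasso(Z, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - Z phi||_2^2 + lam * ||phi||_1 by FISTA."""
    step = 1.0 / np.linalg.norm(Z, 2) ** 2      # s = 1 / sigma_1(Z)^2
    phi_prev = np.zeros(Z.shape[1])
    phi = phi_prev.copy()
    for r in range(1, n_iter + 1):
        # Momentum step with weight (r - 2) / (r + 1), as in Eq. (9).
        v = phi + (r - 2) / (r + 1) * (phi - phi_prev)
        grad = -Z.T @ (y - Z @ v)
        phi_prev, phi = phi, soft_threshold(v - step * grad, step * lam)
    return phi

def design_matrix(Y, W_list, i, p=1):
    """Build Z_i and Y_i for rows t = p..T-1 of Y (shape T x k).
    Column (j - 1) * eta + l holds W_l[i] @ Y[t - j]."""
    T = Y.shape[0]
    cols = [Y[p - j:T - j] @ W[i] for j in range(1, p + 1) for W in W_list]
    return np.column_stack(cols), Y[p:, i]

def one_step_errors(Y, W_list, phis, start, stop, p=1):
    """MSPE and MRPE of one-step-ahead forecasts for rows start..stop-1."""
    sq, rel = [], []
    for i, phi in enumerate(phis):
        Z, y = design_matrix(Y[start - p:stop], W_list, i, p)
        pred = Z @ phi
        sq.append((y - pred) ** 2)
        rel.append(np.abs(y - pred) / np.abs(y))
    return np.mean(sq), np.mean(rel)

# Simulated stand-in for standardized demand: k zones on a ring, eta = 2.
rng = np.random.default_rng(0)
k, T = 6, 96
W_list = [np.eye(k), np.roll(np.eye(k), 1, axis=1)]   # toy W^(0), W^(1)
Y = np.zeros((T, k))
for t in range(1, T):
    Y[t] = 0.5 * Y[t - 1] + 0.3 * W_list[1] @ Y[t - 1] + rng.normal(size=k)

# Three-way split: fit on [1, T1], tune on [T1 + 1, T2], test on [T2 + 1, T].
T1, T2 = T // 3, 2 * T // 3
lambdas = [0.0, 0.5, 1.0, 2.0, 5.0]
best = None
for lam in lambdas:
    phis = [fista_lasso(*design_matrix(Y[:T1], W_list, i), lam) for i in range(k)]
    mspe, _ = one_step_errors(Y, W_list, phis, T1, T2)
    if best is None or mspe < best[0]:
        best = (mspe, lam, phis)
val_mspe, best_lam, best_phis = best
test_mspe, test_mrpe = one_step_errors(Y, W_list, best_phis, T2, T)
```

Because the model in Eq. (2) is written separately for each component $i$, the $k$ fits are independent and could be run in parallel. With `lam = 0.0` the routine reduces to the least squares problem of Eq. (3).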
4. Results

This section applies the proposed methods to the yellow taxi demand data on different days, and calculates their prediction performances under different scenarios. Based on the sample ACFs of the data, p is chosen to be 1, and the calculation of the AIC/BIC also supports this selection. However, before applying different methods to this dataset, it needs to be scaled properly. For this purpose, the sample mean is subtracted from each time series that corresponds to a zip code, and the resulting series are divided by the sample standard deviation, so that all time series have the same scale. Also, the weighting matrices W are chosen for five different neighborhood levels based on the authors' judgment, or, more specifically, by counting the numbers of boundaries between the target zip code and its neighbors. For example, a zip code that is adjacent to the target zip code is considered as the first-order neighborhood; zip codes adjacent to the first-order neighborhood are the second-order neighborhood for the target zip code, and so on. This study extends the neighborhood up to five levels by means of an eyeballing procedure. October 6th and 7th are chosen for this research because they are typical weekdays, being away from both weekends and days with special events. Two different approaches have been considered for evaluating the performance of the developed model. First, we consider time points from October 6th only; and, second, the two days of October 6th and 7th are merged to give a longer range of time points.

4.1. Case study using data from October 6th only

Considering data from October 6th, only T = 96 time points are available. A rolling window scheme is used to divide T into three parts, with T1 being set to ⌊T/3⌋ and T2 to ⌊2T/3⌋. Different orders of neighborhood (η) are chosen, and the MSPE, mean relative prediction error (MRPE), AIC and BIC (see Lutkepohl, 2007, for the definitions and formulas) values are reported in each case. Tables 1–4 show the results for η = 1, 2, 3, 4, respectively. In simple terms, η = 1 uses only the past values of the study zone itself, with no neighborhood information; η = 2 adds the information of the first-order neighborhood; and so on up to the third-order neighborhood, which corresponds to η = 4. The VAR model does not perform well relative to STAR-based models due to the huge number of parameters involved. Based on the MSPE, the STAR and LASSO models for η = 2 outperform the rest. The difference between the prediction performances of these two methods, STAR and LASSO, and those of the other methods is statistically significant, with p-values of less than 0.01 based on the Diebold–Mariano test statistics developed by Diebold and Mariano (1995). This means that the inclusion of the first neighborhood structure improves the forecasting performance of the STAR model. Meanwhile, the proposed spatio-temporal structure using the topology and zip code based disaggregation of Manhattan with a first-order neighborhood (η = 2) performs the best in this case study. Also, it
Table 5
Results for October 6th and 7th data combined, with η = 1.
Model                  MSPE    MRPE    AIC       BIC
VAR                    0.7103  14.544  204.3445  230.15
STAR (univariate AR)   0.253   3.9068  182.6419  183.3035
LASSO                  0.2527  3.8983  182.5923  183.254
HGLASSO                0.2527  3.8983  182.5923  183.254
DHGLASSO               0.2527  3.8983  182.5923  183.254

Table 6
Results for October 6th and 7th data combined, with η = 2.
Model                  MSPE    MRPE    AIC       BIC
STAR                   0.2273  4.0633  178.3005  179.6239
LASSO                  0.2273  4.0633  178.3005  179.6239
DHGLASSO               0.2273  4.0633  178.3003  179.6237

Table 7
Results for October 6th and 7th data combined, with η = 3.
Model                  MSPE    MRPE    AIC       BIC
STAR                   0.2249  4.1741  177.9721  179.9571
LASSO                  0.2248  4.1703  177.9496  179.9347
DHGLASSO               0.2238  4.1062  177.6838  179.6519

Table 8
Results for October 6th and 7th data combined, with η = 4.
Model                  MSPE    MRPE    AIC       BIC
STAR                   0.2247  4.4178  178.271   180.9008
LASSO                  0.2244  4.3892  178.0977  180.6765
HGLASSO                0.2247  4.419   178.2357  180.8654
DHGLASSO               0.2224  4.162   177.6367  180.2156

Table 9
Results for October 6th and 7th data combined, with η = 5.
Model                  MSPE    MRPE    AIC       BIC
STAR                   0.2279  3.5851  178.7162  182.0077
LASSO                  0.2257  3.5265  178.0827  181.1196
HGLASSO                0.2277  3.5857  178.6503  181.9418

Table 10
Results for October 6th and 7th data combined, with η = 6.
Model                  MSPE    MRPE    AIC       BIC
STAR                   0.2405  3.5611  178.4606  182.4137
LASSO                  0.238   3.4261  177.7624  181.2913
HGLASSO                0.238   3.5376  178.2116  182.1647
Another benefit of the use of STAR-based models is

is worth mentioning that the DHGLASSO penalty function that one can infer the neighborhood influences of other
provides a consistent model performance overall, since zip codes’ demands on a given target zip code. Figs. 6, 7,
its MSPE and MRPE values do not increase dramatically and 8 show the inferred neighborhood correlations among
with η. In other words, the DHGLASSO penalty structure is the η = 5 different neighborhood orders for lower, mid-
better at correcting for the increase in the parameter space town, and upper Manhattan, respectively. The colors on
dimension. Also, increasing the number of neighborhood these plots are basically |Φi | for different components of
levels η, and thus the number of parameters in Φ , allows i based on the DHGLASSO method. It is clear from these
the DHGLASSO method to reduce the MSPE by around 3% plots that the correlation/influence between neighboring
relative to the STAR model when η = 3, and by around 17% zip codes decreases as they get farther apart. This corre-
when η = 4. If the MRPE is selected as the measure of the lation structure is reasonable and well-aligned with the
forecasting performance, the DHGLASSO is comparable to assumptions involved in using a spatio-temporal model
the other leading models when η = 4. such as the STARMA model for predicting taxi demand in
Manhattan, New York. In other words, a knowledge of the
4.2. Case study using data for October 6th and 7th combined short-term demand histories of neighboring zip codes in
lower Manhattan will be more informative for predicting
The same models and methods that were applied in the taxi demand over the next 15 min for a zip code in lower
the previous case study are applied again here to the taxi Manhattan than a knowledge of the short-term demand
demand for two days, October 6th and 7th. This makes histories of zip codes in upper Manhattan. The proposed
the total number of time points 192, instead of 96 as in DHGLASSO model is able to capture this decreasing trend
the previous case study. Increasing T while keeping k fixed accurately within the STARMA structure, reaching the low-
reduces the effect of the penalization on parameter esti- est prediction error among all methods considered.
mation, and hence on the forecasting performance, as can Another notable feature that can be highlighted using
be seen from Tables 5–10, which show the performances the proposed generalized STAR model with DHGLASSO is
of the methods when η = 1, 2, 3, 4, 5, 6, respectively. the variation in the spatial differences in the dependence of
In this scenario, DHGLASSO for η = 5 outperforms the demand of neighboring zip codes. It can be seen from Fig. 6
other methods in terms of MSPE. The DHGLASSO with that the values of the coefficients of the second and third
η = 5 reduces the MSPE significantly relative to other levels of neighboring zip codes are not the same among the
methods with different values of η. The Diebold–Mariano zip codes, even in lower Manhattan. More specifically, for
test statistics developed by Diebold and Mariano (1995) zip code 10280, the coefficient for the second-level neigh-
are applied to the different components of the data, and bors’ demand is less than that for the third-level neighbors.
the test shows that the improvement in the performance However, for zip code 10002, the coefficients for first-,
of DHGLASSO relative to those of the other methods is sta- second- and third-level neighbors’ zip codes demands all
tistically significant, with p-values of less than 0.01 in most decrease with the level of neighborhood. This non-linear
cases. Fig. 5 summarizes the MSPE results of all methods for trend in the coefficients for neighboring zip codes could be
the six levels of η. Again, DHGLASSO is the most consistent due to the smaller areas of the zip codes – particularly for
penalty function with respect to the increase in η. zip codes 10004 and 10280.
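The neighborhood levels discussed above (adjacent zip codes form the first-order neighborhood, zones adjacent to those form the second-order neighborhood, and so on) can be illustrated with a breadth-first search over a zone adjacency map. The following is only a minimal sketch: the function names, the toy adjacency map, and the uniform row-normalized weights are assumptions made for illustration, not the authors' actual construction of W, which was based on their own judgment of zip code boundaries.

```python
from collections import deque


def neighborhood_orders(adjacency, target, max_order):
    """Group zones by the number of boundaries separating them from `target`.

    `adjacency` maps each zone label to the zones sharing a boundary with it.
    Returns {order: set_of_zones} for orders 1..max_order.
    """
    distance = {target: 0}
    queue = deque([target])
    while queue:
        zone = queue.popleft()
        if distance[zone] == max_order:
            continue  # do not expand beyond the requested order
        for neighbor in adjacency.get(zone, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[zone] + 1
                queue.append(neighbor)
    orders = {k: set() for k in range(1, max_order + 1)}
    for zone, d in distance.items():
        if d >= 1:
            orders[d].add(zone)
    return orders


def weight_row(adjacency, target, order, zones):
    """One row of a weight matrix for a given neighborhood order.

    Uniform, row-normalized weights are an assumption of this sketch.
    """
    members = neighborhood_orders(adjacency, target, order)[order]
    if not members:
        return [0.0] * len(zones)
    w = 1.0 / len(members)
    return [w if z in members else 0.0 for z in zones]


# Hypothetical zone labels; this is not real Manhattan zip code geography.
toy_adjacency = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B", "E"],
    "D": ["B"],
    "E": ["C"],
}
orders = neighborhood_orders(toy_adjacency, "A", 3)
row = weight_row(toy_adjacency, "A", 2, ["A", "B", "C", "D", "E"])
```

Note that in the paper's indexing η = 1 corresponds to using no neighbors, so neighborhood order k in this sketch would correspond to η = k + 1.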
Fig. 5. MSPE results for October 6th and 7th data combined, with η = 1, 2, . . . , 6.
Fig. 6. Neighborhood-level estimated coefficients for lower Manhattan (zip codes: 10004, 10002, 10280).
Fig. 7. Neighborhood-level estimated coefficients for midtown Manhattan (zip codes: 10019, 10022, 10128).
Fig. 8. Neighborhood-level estimated coefficients for upper Manhattan (zip codes: 10021, 10028, 10027).
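The pairwise comparisons of forecasting methods reported above rely on the Diebold–Mariano test (Diebold & Mariano, 1995). As a rough illustration of how such a comparison could be computed for one zip code, the sketch below uses squared-error loss, a truncated autocovariance estimate of the long-run variance, and a normal approximation for the p-value. The function name, the choice of loss, the `horizon` parameter, and the error values are all assumptions for illustration; the paper does not state which variant the authors used.

```python
import math


def diebold_mariano(errors_a, errors_b, horizon=1):
    """Diebold-Mariano statistic and two-sided p-value (squared-error loss).

    errors_a, errors_b: forecast errors of two methods on the same time points.
    horizon: forecast horizon h; autocovariances up to lag h - 1 are included.
    A positive statistic means method A has the larger average loss.
    """
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n

    def autocov(lag):
        return sum((d[t] - mean_d) * (d[t - lag] - mean_d) for t in range(lag, n)) / n

    # Long-run variance of the loss differential; with horizon=1 this is just
    # the sample variance. It can be non-positive for larger horizons in small
    # samples, which this sketch does not handle.
    long_run_var = autocov(0) + 2 * sum(autocov(k) for k in range(1, horizon))
    stat = mean_d / math.sqrt(long_run_var / n)
    # Two-sided p-value under the standard normal approximation.
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value


# Hypothetical one-step-ahead forecast errors for a single zip code.
errors_baseline = [1.0, -1.2, 0.9, -1.1, 1.3, -0.8]
errors_proposed = [0.1, -0.2, 0.1, 0.0, 0.2, -0.1]
dm_stat, dm_p = diebold_mariano(errors_baseline, errors_proposed)
```

In a setting like the one described in the paper, the test would be repeated for each component (zip code), and the share of components with p-values below 0.01 could then be reported.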
5. Conclusion

The accurate prediction of yellow taxi demand in large, populous, and dense areas of cities like New York is hard to achieve, since there are numerous factors that affect the demand. Moreover, the demand for taxis in such densely populated areas is highly variable in different parts of the city, depending on the time of the day. In this study, taxi demand data derived from the historical usage of individual (GPS-enabled) taxis, obtained from NYC TLC, are aggregated both spatially by zip code and temporally by 15-min time intervals. We propose a multivariate spatio-temporal method called STARMA that reduces the number of parameters dramatically compared to typical multivariate time series models such as VAR, by means of the neighborhood structure between the regions. This structure is useful for capturing the spatial dependence of the demand between the regions, and makes the results more interpretable. We also present a new method for penalizing prediction parameters, called the double hierarchical group LASSO (DHGLASSO). DHGLASSO penalizes parameters that are farther away not only temporally but also spatially to a larger extent, thus establishing a ‘double’ hierarchy.

The proposed model and several other comparable time series models and penalty functions are applied to the yellow taxi demand of Manhattan for a typical day of the week. The results reveal that the proposed model captures the structure of the data well, with lower prediction errors than the other time series models considered, such as VAR, STAR with and without LASSO, etc. Using data from both a single day and two consecutive days, the proposed generalized STAR model with DHGLASSO performed the best in terms of predictive performance. For the model using data from two consecutive days, a maximum level of five neighborhood orders performed the best. Additionally, DHGLASSO is shown to be the most consistent and stable when dealing with increasing parameter dimensions.

The proposed generalized STAR model and penalty function are able to capture the spatial variation in the demand for taxis among zip codes very well. The effect of the neighborhood structure changes depending on the location of interest. The influence of neighborhood taxi demand levels is easy to interpret, especially for agencies that manage taxi operations and other TNCs. Taxi companies and TNCs can easily use the neighborhood taxi demand dependence to direct taxi drivers to remain in certain areas, depending on the time of day and location. This helps to reduce the lengths of empty taxi trips seeking new rides, thus reducing emissions, improving air quality and reducing fuel costs for the operators. The increased computational efficiency due to the DHGLASSO penalization structure also helps with estimating the model in real-time.

The proposed modeling approach can be used as a demand prediction tool in its current format for planning the positioning of taxis. However, in addition, one could also include several other covariates that may potentially represent or act as surrogates for the demand being served by other providers for a real-time implementation. Also, ideally, the parameters may be re-estimated periodically to account for changes in the demand for yellow taxis, which would account indirectly for the passenger demand being served by other providers and help to predict the demand for more current conditions. Even with such modifications, the computational efficiency offered by the double-hierarchical LASSO structure means that the proposed modeling approach could still be useful for a real-time implementation.

As part of ongoing and future work, the modeling framework is being extended using other forms of disaggregation. The utilization of additional travel demand-related information such as subway and bus ridership, bike demand, weather, etc., will also be considered through the addition of exogenous variables to the time series regime.

Acknowledgment

This work was supported in part by NSF, United States, Grant IIS-1302423.
References

Baklanov, A., Hänninen, O., Slørdal, L. H., Kukkonen, J., Bjergene, N., Fay, B., et al. (2007). Integrated systems for forecasting urban meteorology, air pollution and population exposure. Atmospheric Chemistry and Physics, 7, 855–874.
Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183–202.
Chang, H., Tai, Y., & Hsu, J. Y. (2010). Context-aware taxi demand hotspots prediction. International Journal of Business Intelligence and Data Mining, 5(1), 3–18.
Cheng, T., Wang, J., Harworth, J., Heydecker, B. G., & Chow, A. H. F. (2011). Modeling dynamic space–time autocorrelations of urban transport network. GeoComputation, Session 5A: Network Complexity, 215–220.
Correa, D., Xie, K., & Ozbay, K. (2017). Exploring the taxi and uber demands in New York City: an empirical analysis and spatial modeling. In Transportation research board’s 96th annual meeting, Washington, D.C.
Cressie, N. (2015). Statistics for spatial data. John Wiley & Sons.
Di Giacinto, V. (1994). Su una generalizzazione dei modelli spazio-temporali autoregressivi media mobile (STARMAG). In Atti della XXXVII riunione scientifica SIS, Sanremo, Aprile 1994, vol. H.
Di Giacinto, V. (2006). A generalized space–time ARMA model with an application to regional unemployment analysis in Italy. International Regional Science Review, 29(2), 159–198.
Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business & Economic Statistics, 13, 134–145.
Duan, P., Mao, G., Zhang, C., & Wang, S. (2016). STARIMA-based traffic prediction with time-varying lags. In IEEE 19th international conference on intelligent transportation system, Rio, Brazil.
Giacomini, R., & Granger, C. W. J. (2004). Aggregation of space–time processes. Journal of Econometrics, 118, 7–26.
Hernández-Murillo, R., & Owyang, M. T. (2006). The information content of regional employment data for forecasting aggregate conditions. Economics Letters, 90(3), 335–339.
Jenatton, R., Mairal, J., Obozinski, G., & Bach, F. (2011). Proximal methods for hierarchical sparse coding. Journal of Machine Learning Research (JMLR), 12, 2297–2334.
Kamarianakis, Y., & Prastacos, P. (2003). Forecasting traffic flow conditions in an urban network: comparison of multivariate and univariate approaches. Transportation Research Record: Journal of the Transportation Research Board, 1857, 74–84.
Kamarianakis, Y., Prastacos, P., & Kotzinos, D. (2004). Bivariate traffic relations: A space–time modeling approach. In AGILE proceedings (pp. 465–474).
Kyriakidis, P. C., & Journel, A. G. (1999). Geostatistical space–time models: a review. Mathematical Geology, 31, 651–683.
LeSage, J. P. (1997). Bayesian estimation of spatial autoregressive models. International Regional Science Review, 20(1–2), 113–129.
Lutkepohl, H. (2007). New introduction to multiple time series analysis. Springer.
Min, X., Hu, J., & Zhang, Z. (2010). Urban traffic network modeling and short-term traffic flow forecasting based on GSTARIMA model. In 13th international IEEE annual conference on intelligent transportation systems, September 19–22, Madeira Island, Portugal.
Moghimi, B., Safikhani, A., Kamga, C., & Hao, W. (2017). Cycle-length prediction in actuated traffic-signal control using ARIMA model. Journal of Computing in Civil Engineering, 32(2), 04017083.
Moreira-Matias, L., Gama, J., Ferreira, M., Mendes-Moreira, J., & Damas, L. (2013). Predicting taxi-passenger demand using streaming data. IEEE Transactions on Intelligent Transportation Systems, 14(3), 1393–1402.
New York City Taxi & Limousine Commission (2016). TLC factbook. http://www.nyc.gov/html/tlc/downloads/pdf/2016_tlc_factbook.pdf.
Nicholson, W. B., Bien, J., & Matteson, D. S. (2014). Hierarchical vector autoregression. arXiv preprint, arXiv:1412.5250.
Nicholson, W. B., Matteson, D. S., & Bien, J. (2017). VARX-L: structured regularization for large vector autoregressions with exogenous variables. International Journal of Forecasting, 33(3), 627–651.
Okutani, I., & Stephanedes, Y. J. (1984). Dynamic prediction of traffic volume through Kalman filtering theory. Transportation Research, Part B (Methodological), 18(1), 1–11.
Pfeifer, P. E., & Deutsch, S. J. (1980). A three-stage iterative procedure for space–time modeling. Technometrics, 22(1), 35–47.
Pfeifer, P. E., & Deutsch, S. J. (1981). Variance of the sample space–time autocorrelation function. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 43, 28–33.
Qian, X., Ukkusuri, S. V., Yang, C., & Yan, F. (2017). A model for short-term taxi demand forecasting accounting for spatio-temporal correlations. In Transportation research board annual 2017, Washington D.C.
Sartoris, A. (2005). A STARMA model for homicides in the city of Sao Paulo. In Proceedings of the spatial economics workshop, Kiel Institute for World Economics, 8–9 April, 2005, Kiel, Germany.
Sayarshad, H. R., & Chow, J. Y. J. (2016). Survey and empirical evaluation of nonhomogeneous arrival process models with taxi data. Journal of Advanced Transportation, 50, 1275–1294.
Schaller, B. (1999). Elasticities for taxicab fares and service availability. Transportation, 26, 283–297.
Schaller, B. A. (2005). Regression model of the number of taxicabs in U.S. cities. Journal of Public Transportation, 8, 63–78.
Shoesmith, G. L. (2013). Space–time autoregressive models and forecasting national, regional and state crime rates. International Journal of Forecasting, 29(1), 191–201.
Song, S., & Bickel, P. (2011). Large vector auto regressions. arXiv preprint arXiv:1106.3915.
Stathopoulos, A., & Karlaftis, M. G. (2003). A multivariate state space approach for urban traffic flow modeling and prediction. Transportation Research Part C: Emerging Technologies, 11(2), 121–135.
Terzi, S. (1995). Maximum likelihood estimation of a generalized STAR(p, lp) model. Journal of the Italian Statistical Society, 4(3), 377–393.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 58, 267–288.
Yang, C., & Gonzales, E. (2017). Modeling taxi demand and supply in New York City using large-scale taxi GPS data. In P. Thakuriah, N. Tilhun, & M. Zellner (Eds.), Seeing cities through big data–research, methods and applications in urban informatics (pp. 405–425). Springer.
Yuan, J., Zheng, Y., Zhang, L., Xie, X., & Sun, G. (2011). Where to find my next passenger. In Proceedings of the 13th international conference on ubiquitous computing, Beijing, China, September 17–21 (pp. 109–118). New York, NY: ACM.
