Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 147 (2019) 562–566
www.elsevier.com/locate/procedia
10.1016/j.procs.2019.01.217

2018 International Conference on Identification, Information and Knowledge in the Internet of Things, IIKI 2018

Predicting bike sharing demand using recurrent neural networks

Yan Pan a,∗, Ray Chen Zheng a, Jiaxi Zhang a, Xin Yao b

a The High School Affiliated to Renmin University of China, Beijing, 100080, China
b Institute of Remote Sensing and Geographical Information Systems, Peking University, Beijing, 100871, China

∗ Corresponding author. Tel.: +86-187-0137-1618. E-mail address: topanyan@sina.com

Abstract

Predicting bike sharing demand can help bike sharing companies allocate bikes better and ensure a more sufficient circulation of bikes for customers. This paper proposes a real-time method for predicting bike renting and returning in different areas of a city during a future period based on historical data, weather data, and time data. We construct a network of bike trips from the data, apply a community detection method to the network, and find the two communities with the most demand for shared bikes. We use data of the stations in these two communities as our dataset and train a deep LSTM model with two layers to predict bike renting and returning, making use of the gating mechanism of long short term memory and the ability of recurrent neural networks to process sequence data. We evaluate the model with the Root Mean Squared Error (RMSE) and show that the proposed model outperforms other deep learning models by comparing their RMSEs.

© 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the 2018 International Conference on Identification, Information and Knowledge in the Internet of Things.

Keywords: Shared bike demand prediction; time series forecasting; recurrent neural networks; long short term memory
1. Introduction

Bikes have long played an important part in city transportation. As a consequence, bike-sharing has recently received increasing attention around the world. Bike-sharing customers prefer to quickly find a bike whenever they need one. Thus, bike provider companies need to allocate bikes efficiently according to the demand. Appropriate prediction of bike demand across different areas and times is thus crucial.

Because many underlying factors — for example, time of the day, day of the week, events, weather, and correlations between stations — contribute to the demand for shared bikes [1], predicting bike demand is very challenging. Several studies show that analyzing usage data of taxicabs [11], subways [2], buses [4], and bikes [14] could predict future transport
usage. Kaltenbrunner et al.[9] discovered that temporal and spatial mobility patterns exist within the city; Vogel et
al.[13] discovered that there are spatio-temporal dependencies between rents and returns of bikes at stations.
In this paper, we propose a real-time method for predicting bike demands in different areas of a city during a future
period based on historical data from Citi Bike System Data and meteorology data. We use the time sequence of bike
rents and returns as the dataset. We train a deep long short term memory (LSTM) [8] recurrent neural network (RNN) with this data, making use of the self-loop and forget gate of LSTM. The model proves effective in experiments comparing it with various other approaches. The method can handle large amounts of data in an acceptable amount of time, and the same method can be applied to other bike sharing systems.

2. Methodology

2.1. Deep LSTM

We choose the LSTM sequence learning model because of its ability to process sequential data and memorize data of past time steps [7]. LSTM is a type of gated RNN that is capable of learning long-term dependencies and is designed to avoid the vanishing and exploding gradient problems [8]. Fig. 1a shows the mechanism of LSTM. An LSTM has an internal recurrence and a self-loop [5] in addition to the outer recurrence, which allows the network to accumulate information. The self-loop weight of LSTM is controlled by a forget gate, a sigmoid unit that sets the self-loop weight between 0 and 1. The external input gate and output gate have computations similar to the forget gate. Thus, the network gains the ability to learn long-term dependencies.
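For concreteness, the gates named in Fig. 1a (forget gate f, update/input gate i, candidate state, and output gate o) follow the standard LSTM formulation of [8] and [5]. Writing $x^{\langle t\rangle}$ for the input, $a^{\langle t\rangle}$ for the hidden state, and $c^{\langle t\rangle}$ for the cell state at timestep t, a common form of these computations is:

$f^{\langle t\rangle} = \sigma(W_f[a^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_f)$ (forget gate)
$i^{\langle t\rangle} = \sigma(W_i[a^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_i)$ (update gate)
$\tilde{c}^{\langle t\rangle} = \tanh(W_c[a^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_c)$ (candidate state)
$c^{\langle t\rangle} = f^{\langle t\rangle} \odot c^{\langle t-1\rangle} + i^{\langle t\rangle} \odot \tilde{c}^{\langle t\rangle}$ (self-loop over the cell state)
$o^{\langle t\rangle} = \sigma(W_o[a^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_o)$ (output gate)
$a^{\langle t\rangle} = o^{\langle t\rangle} \odot \tanh(c^{\langle t\rangle})$ (hidden state)

The sigmoid forget gate keeps the self-loop weight $f^{\langle t\rangle}$ between 0 and 1, which is what allows the cell state to carry information across many timesteps.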

[Fig. 1 images omitted: (a) The mechanism of LSTM; (b) The structure of the deep LSTM sequence learning model.]

Fig. 1: Fig. 1a shows the complete mechanism of LSTM using a flowchart. Fig. 1b shows the implementation of a deep LSTM model with two
layers of LSTM.

Like other neural networks in deep learning, RNNs can be stacked into deeper versions that contain more than one layer of RNN. Because RNNs are especially computationally expensive to train, a deep RNN model normally contains no more than three layers of LSTM. Deep RNNs are very useful for learning complex functions. We use two LSTM layers in our model. In a deep LSTM, the model contains multiple layers, and the parameters of the different layers are learned independently. The first layer of LSTM computes a sequence of hidden units from the input. Then, the second layer of LSTM calculates the output based on the hidden units. Finally, the network calculates the loss function and tries to minimize it. Fig. 1b shows the structure of the sequence model; a code sketch is given below. With the deep LSTM sequence learning model, we are able to learn complex functions and predict sequential data accurately.
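As an illustration only (this is not the authors' released code), such a two-layer stacked LSTM can be sketched with the Keras API; the layer width of 1000 follows Table 1a, while the optimizer choice and other training details are assumptions:

```python
# Minimal sketch of a two-layer deep LSTM sequence model (assumed Keras API);
# the hidden width of 1000 matches Table 1a, the optimizer is an assumption.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense

def build_deep_lstm(n_timesteps, n_features, n_outputs, n_hidden=1000):
    model = Sequential([
        Input(shape=(n_timesteps, n_features)),
        # First LSTM layer: maps the input sequence to a sequence of hidden units.
        LSTM(n_hidden, return_sequences=True),
        # Second LSTM layer: computes its outputs from the first layer's hidden units.
        LSTM(n_hidden, return_sequences=True),
        # Linear readout from each timestep's hidden state to the predicted values.
        Dense(n_outputs),
    ])
    # Training minimizes the mean squared error loss defined in Section 3.2.
    model.compile(optimizer="adam", loss="mse")
    return model
```

With the input and output shapes reported in Section 3.2, this would be instantiated as build_deep_lstm(24, 728, 720): 24 timesteps, 728 input features, and 720 predicted values per timestep.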

3. Experiment

3.1. Data Description & Processing

We use data from the Citi Bike System Data of 2017 as the training set and data from January, February, and March of 2018 as the test set to conduct the experimental study. Citi Bike has more than 800 bike stations in New York City and Jersey City. However, Fig. 2a and 2b show that the number of bike rents in a given hour can vary

hugely with location. Furthermore, analyzing every station on its own reveals a repetitive weekly pattern, as shown in Fig. 2c.
Stations with only a small number of related trips cause two problems. First, they may lead to data scarcity, and LSTM has strict requirements on data quality [3]. Second, since they have few rents and returns, their bikes hardly ever run out, so analyzing their time sequences is much less meaningful. We use the community detection method proposed by Rosvall et al. (2008) [12] to detect the station community structure. The method yields 12 large communities with more than 3 stations each, plus other small communities. We choose only the two communities with the largest number of related trips as our dataset. By using data of stations in a community as the dataset, we preserve the interactions between stations while filtering out low-quality data.
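As a hedged sketch of this step (not the paper's code), one can build a station-to-station trip graph and partition it into communities. The paper uses the Infomap method of Rosvall and Bergstrom [12]; the stand-in below uses NetworkX's greedy modularity communities instead, so the exact partition would differ, and trips are assumed to be given as (start_station, end_station) pairs:

```python
# Hypothetical sketch: build a weighted trip network and detect station communities.
# The paper uses Infomap [12]; greedy modularity is used here only as a stand-in.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def busiest_communities(trips, top_k=2):
    """trips: iterable of (start_station_id, end_station_id) pairs."""
    g = nx.Graph()
    for start, end in trips:
        if g.has_edge(start, end):
            g[start][end]["weight"] += 1        # accumulate trip counts as edge weights
        else:
            g.add_edge(start, end, weight=1)
    communities = greedy_modularity_communities(g, weight="weight")
    # Rank communities by their total number of related trips (sum of edge weights).
    ranked = sorted(communities,
                    key=lambda c: g.subgraph(c).size(weight="weight"),
                    reverse=True)
    return ranked[:top_k]                        # stations to keep in the dataset
```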

[Fig. 2 images omitted: (a) The accumulative spatial distribution of bike rents during 8:00-9:00 in March 2018. (b) The accumulative spatial distribution of bike rents during 18:00-19:00 in March 2018. (c) The accumulative number of rents and returns for a single station in different hours of the week in February 2018 (W 52 St & 6 Ave, station id: 3443). (d) The community map; community 1 and community 3 are chosen because of their large number of related trips.]

Fig. 2: The spatial distribution of bike rents and its repetitive pattern. Fig. 2a and 2b show that the demand for bikes varies significantly across stations, and the comparison of the two sub-figures shows the difference between the distributions of rent and return behaviors. Fig. 2c shows the repetitive pattern of rents and returns for a single station at different times of the week; the curve shows similar patterns from Monday to Friday, but differs on Saturday and Sunday. Fig. 2d shows the positions of all communities and the stations included in each community.

The raw data contains information in many dimensions, including spatial information, temporal information, and customer information. In this model we use only the start time, end time, start station, and end station of each trip. We first convert this information into per-station data by dividing each day into several time steps and counting the numbers of rents and returns separately, denoted by $X_{rent}$ and $X_{return}$. We also consider the importance of different influence factors in our model, including Weather, Date, and Day of Week. Because people are more exposed to harsh weather conditions during bike rides, the demand for shared bikes is greatly influenced by weather [1]. We consider the potential influence of three different weather indicators — Temperature, Precip Intensity, and Wind Speed. We use both the historical weather and the future weather in the dataset, with separate matrices $X_{weather}$ and $X_{weather}'$ denoting weather data of the past day and the target day, and $X_{year}$ and $X_{week}$ denoting day of year and day of week. Therefore, the input data structure is a combined matrix of all these matrices. The input matrix consists of N rows, denoting the features, and T columns, denoting the time steps. As we only need to predict future data of rents and returns, the output is the combination of the two matrices $Y_{rent}$ and $Y_{return}$.
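As a hedged illustration of this preprocessing (column names follow the public Citi Bike trip-data schema and are assumptions as far as the paper's own code is concerned), the hourly rent and return counts per station could be built as follows, with weather and calendar features appended afterwards:

```python
# Hypothetical sketch: hourly rent/return counts per station from raw trip records.
import pandas as pd

def hourly_counts(trips: pd.DataFrame):
    trips = trips.copy()
    trips["starttime"] = pd.to_datetime(trips["starttime"])
    trips["stoptime"] = pd.to_datetime(trips["stoptime"])
    # X_rent: trips starting at each station in each hourly time step.
    x_rent = (trips.groupby(["start station id", trips["starttime"].dt.floor("h")])
                   .size().unstack(fill_value=0))
    # X_return: trips ending at each station in each hourly time step.
    x_return = (trips.groupby(["end station id", trips["stoptime"].dt.floor("h")])
                     .size().unstack(fill_value=0))
    return x_rent, x_return

# Weather, day-of-year, and day-of-week rows would then be stacked with these
# counts to form the combined N x T input matrix described above.
```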

3.2. Deep LSTM Sequence Learning Model

To avoid the potentially strong influence of Day of Week, we use previous data to predict the data 7 days later. That is, we use $X_{rent}^{(d)}$ and $X_{return}^{(d)}$ to predict $X_{rent}^{(d+7)}$ and $X_{return}^{(d+7)}$. We have 360 stations in total. We use data from January 1, 2017 to December 31, 2017 as the training set and data from January 1, 2018 to March 31, 2018 as the test set. The input and output shapes are 358 × 24 × 728 and 358 × 24 × 720 for the training set, and 90 × 24 × 728 and 90 × 24 × 720 for the test set. The experimental parameters are shown in Table 1a. We use the mean squared error as the loss function: $MSE^{(d)} = \frac{1}{N \cdot T} \sum_{i=1}^{N} \sum_{j=1}^{T} \left( \hat{Y}_i^{\langle j \rangle (d)} - Y_i^{\langle j \rangle (d)} \right)^2$, where N and T are the number of training samples and the total number of timesteps, d represents the day, and $\hat{Y}_i^{\langle j \rangle (d)}$ and $Y_i^{\langle j \rangle (d)}$ are the predicted and real values of the ith training sample at the jth timestep on day d, respectively.
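A minimal numpy rendering of this per-day loss, assuming the predictions and targets for day d are arrays of shape (N, T):

```python
# Sketch of the per-day mean squared error defined above; names are illustrative.
import numpy as np

def mse_day(y_pred, y_true):
    """MSE over N training samples and T timesteps for a single day."""
    n, t = y_true.shape
    return float(np.sum((y_pred - y_true) ** 2) / (n * t))
```

The RMSE used for evaluation in Section 4 is simply the square root of this quantity.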

Table 1: The experimental parameters, as well as average RMSEs for each method.

(a) Experimental Parameters

  Parameter                        Value
  Number of trips                  16364502
  Number of stations               360
  Time step length                 1 hour
  Time sequence length             24 hours
  Number of sequences              358
  Number of hidden layers          0-2
  Number of nodes in hidden layer  1000

(b) Average RMSEs for each method

  Method          Training set   Test set
  LSTM            3.7046         2.7128
  DNN             4.6083         3.1117
  LSTM+LSTM       3.6752         2.7069
  LSTM+DNN        3.4953         2.7731
  DNN+DNN         3.8289         2.9361
  LSTM+LSTM+DNN   4.4015         3.1106

[Fig. 3 boxplots omitted: (a) Training set; (b) Test set; (c) Net demand of training set; (d) Net demand of test set.]

Fig. 3: Fig. 3a and 3b compare the RMSEs for different neural networks; the comparison shows the deep LSTM model fits the test set best. Fig. 3c and 3d show the RMSEs of net demand for different deep learning models on the training set and the test set.

[Fig. 4 maps omitted: (a) Real; (b) Predicted.]

Fig. 4: Real and predicted number of rents at 18:00-19:00, January 14, 2018.

4. Result Analysis

To evaluate the performance of our proposed model, we use different deep learning models to predict the demand and compare their results. Apart from LSTM, we also use deep neural networks (DNN) [6], which do not take the sequential nature of the data into consideration, to predict the result. We use the Root Mean Squared Error (RMSE) [10] as the performance metric, calculated as $RMSE^{(d)} = \sqrt{\frac{1}{N \cdot T} \sum_{i=1}^{N} \sum_{j=1}^{T} \left( \hat{Y}_i^{\langle j \rangle (d)} - Y_i^{\langle j \rangle (d)} \right)^2}$.
The result shows a mean RMSE of 3.6752 for the training set and a mean RMSE of 2.7069 for the test set. Considering the number of docks in each station, the error is acceptable. The RMSEs for the test set are significantly lower than those for the training set, which indicates that the model does not overfit. Figure 3 shows the boxplot of RMSE for each model. The comparison shows that our model with two layers of LSTM fits the test set best, indicating that LSTM is better at predictions with sequential data than DNN. Fig. 4 shows an example of prediction. The prediction is accurate in the areas around Central Park and the New York Stock Exchange. However, the prediction is less accurate in the areas around the Museum of Modern Art and the Empire State Building, possibly due to the influence of events.

To assist the allocation of bikes and predict the actual demand for each area, we need to compute the difference between rents and returns, which we define as the net demand: $\widehat{NetDemand}_n^{\langle t \rangle (d)} = \hat{Y}_{rent,n}^{\langle t \rangle (d)} - \hat{Y}_{return,n}^{\langle t \rangle (d)}$. We can therefore evaluate our performance by calculating the RMSE for NetDemand. The mean RMSE is 3.0203 for the training set and 1.9323 for the test set. Fig. 3c and 3d show the boxplots of RMSE for net demand. We conclude that the prediction is even more precise on the net demand than on rents and returns.
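As a small illustrative sketch (array names are assumed), the net demand and its RMSE follow directly from the predicted and observed rent and return counts:

```python
# Hypothetical sketch: net demand per station and timestep, and its RMSE.
import numpy as np

def net_demand_rmse(pred_rent, pred_return, true_rent, true_return):
    """All inputs are arrays of shape (stations, timesteps) for one day."""
    pred_net = pred_rent - pred_return      # predicted NetDemand
    true_net = true_rent - true_return      # observed net demand
    return float(np.sqrt(np.mean((pred_net - true_net) ** 2)))
```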

5. Conclusion & Future Application

In response to the unequal spatio-temporal distribution of demand for shared bikes, we propose a model based on long short-term memory to predict the rents and returns of each bike sharing station in different areas of a city based on historical bike data, weather data, and time data. We evaluate our model on data from the Citi Bike System Data. Experimental results show that the model achieves an average RMSE of 2.7069. We further evaluate our model by comparing its RMSE to the RMSEs of predictions made by other deep learning networks. We obtain the net demand by calculating the difference between the numbers of rents and returns. The result for net demand is even better, showing that our model can predict the demand accurately.
By learning from historical bike data and past weather data, the proposed deep LSTM model can predict the rents and returns of bikes for the entire city as well as the demand for bikes at a certain time. Based on the prediction, we can make suggestions for bike companies about how to distribute bikes to each station so as to satisfy customer needs while avoiding the unnecessary cost of keeping idle bikes. The application of the proposed model would be a win-win solution for both the bike companies and their customers.

Acknowledgments

This work is funded by Studies on Talent Cultivation Model: International Experience and Domestic Reform (Project ID: ADA160004), a Key National Project under the 13th Five Year Plan for National Education Science, supported by the National Social Science Foundation of China.

References

[1] Campbell, A.A., Cherry, C.R., Ryerson, M.S., Yang, X., 2016. Factors influencing the choice of shared bicycles and shared electric bikes in Beijing. Transportation Research Part C: Emerging Technologies 67, 399–414.
[2] Ding, C., Wang, D., Ma, X., Li, H., 2016. Predicting short-term subway ridership and prioritizing its influential factors using gradient boosting
decision trees. Sustainability 8, 1100.
[3] Dong, D., Wu, H., He, W., Yu, D., Wang, H., 2015. Multi-task learning for multiple language translation, in: Proceedings of the 53rd Annual
Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume
1: Long Papers), pp. 1723–1732.
[4] Foell, S., Phithakkitnukoon, S., Kortuem, G., Veloso, M., Bento, C., 2015. Predictability of public transport usage: A study of bus rides in Lisbon, Portugal. IEEE Transactions on Intelligent Transportation Systems 16, 2955–2960.
[5] Gers, F.A., Schmidhuber, J., Cummins, F., 1999. Learning to forget: Continual prediction with LSTM.
[6] Glorot, X., Bordes, A., Bengio, Y., 2011. Deep sparse rectifier neural networks, in: Proceedings of the fourteenth international conference on
artificial intelligence and statistics, pp. 315–323.
[7] Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. Volume 1. MIT Press, Cambridge.
[8] Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural computation 9, 1735–1780.
[9] Kaltenbrunner, A., Meza, R., Grivolla, J., Codina, J., Banchs, R., 2010. Urban cycles and mobility patterns: Exploring and predicting trends in
a bicycle-based public transport system. Pervasive and Mobile Computing 6, 455–466.
[10] Lv, Y., Duan, Y., Kang, W., Li, Z., Wang, F.Y., et al., 2015. Traffic flow prediction with big data: A deep learning approach. IEEE Trans.
Intelligent Transportation Systems 16, 865–873.
[11] Phithakkitnukoon, S., Veloso, M., Bento, C., Biderman, A., Ratti, C., 2010. Taxi-aware map: Identifying and predicting vacant taxis in the city,
in: International Joint Conference on Ambient Intelligence, Springer. pp. 86–95.
[12] Rosvall, M., Bergstrom, C.T., 2008. Maps of random walks on complex networks reveal community structure. Proceedings of the National
Academy of Sciences 105, 1118–1123.
[13] Vogel, P., Greiser, T., Mattfeld, D.C., 2011. Understanding bike-sharing systems using data mining: Exploring activity patterns. Procedia-Social
and Behavioral Sciences 20, 514–523.
[14] Yang, Z., Hu, J., Shu, Y., Cheng, P., Chen, J., Moscibroda, T., 2016. Mobility modeling and prediction in bike-sharing systems, in: Proceedings
of the 14th Annual International Conference on Mobile Systems, Applications, and Services, ACM. pp. 165–178.
