ScienceDirect
Yan Pan et al. / Procedia Computer Science 147 (2019) 562–566
www.elsevier.com/locate/procedia
2018 International Conference on Identification, Information and Knowledge in the Internet of Things, IIKI 2018

∗ Corresponding author. Tel.: +86-187-0137-1618.
E-mail address: topanyan@sina.com
1877-0509 © 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the 2018 International Conference on Identification,
Information and Knowledge in the Internet of Things.
10.1016/j.procs.2019.01.217

1. Introduction
Bikes have long played an important part in city transportation. As a consequence, bike-sharing has recently
received increasing attention around the world. Bike-sharing customers prefer to quickly find a bike whenever they
need one. Thus, bike provider companies need to allocate bikes efficiently according to the demand. Appropriate
prediction of bike demands across different areas at different times is thus crucial.
As many underlying factors (for example, time of the day, day of the week, events, weather, and correlation between
stations) contribute to the demand of shared bikes [1], predicting bike demand is very challenging. Several studies
show that analyzing usage data of taxicabs [11], subways [2], buses [4], and bikes [14] could predict future transport
usage. Kaltenbrunner et al. [9] discovered that temporal and spatial mobility patterns exist within the city; Vogel et
al. [13] discovered that there are spatio-temporal dependencies between rents and returns of bikes at stations.
In this paper, we propose a real-time method for predicting bike demand in different areas of a city during a future
period, based on historical data from the Citi Bike System Data and meteorological data. We use the time sequences of
bike rents and returns as the dataset. We train a deep long short-term memory (LSTM) [8] recurrent neural network
(RNN) on this data, making use of the self-loop and forget gate of LSTM. Experiments against alternative approaches
show that the model is effective. The method can handle large volumes of data in an acceptable amount of time, and
the same method can be applied to other bike-sharing systems.
2. Methodology
We choose the LSTM sequence learning model because of its ability to process sequential data and memorize data
from past time steps [7]. LSTM is a type of gated RNN that is capable of learning long-term dependencies. It was
designed to mitigate the vanishing-gradient problem that affects plain RNNs [8]. Fig. 1a shows the mechanism of
LSTM. An LSTM cell has an internal recurrence (a self-loop) [5] in addition to the outer recurrence of the RNN, which
allows the network to accumulate information. The self-loop weight is controlled by a forget gate, a sigmoid unit
that sets the weight to a value between 0 and 1. The external input gate and the output gate are computed in a
similar way to the forget gate. Together, these gates give the network the ability to learn long-term dependencies.
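The gating described above can be sketched as a single LSTM time step. This is a minimal illustration of the standard LSTM equations, not the authors' implementation: the names `lstm_step`, `W`, `U`, `b`, the gate ordering inside the stacked weight matrices, and the toy sizes are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step (hypothetical helper, standard formulation).

    W: (4*H, D), U: (4*H, H), b: (4*H,).
    Assumed gate order in the stacked weights: forget, input, output, candidate.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:H])          # forget gate: self-loop weight in (0, 1)
    i = sigmoid(z[H:2 * H])      # input gate
    o = sigmoid(z[2 * H:3 * H])  # output gate
    g = np.tanh(z[3 * H:4 * H])  # candidate cell update
    c = f * c_prev + i * g       # internal recurrence (self-loop) on the cell state
    h = o * np.tanh(c)           # hidden state passed to the outer recurrence
    return h, c, f

# Toy sizes chosen only for illustration.
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c, f = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
```

The forget gate `f` multiplies the previous cell state, so values close to 1 carry information forward across many time steps and values close to 0 discard it.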
Fig. 1: (a) The mechanism of LSTM, shown as a flowchart. (b) The structure of the deep LSTM sequence learning model,
implemented with two layers of LSTM.
Like other neural networks in deep learning, RNNs can be stacked into deeper versions that contain more than one
recurrent layer. Because RNNs are especially computationally expensive to train, a deep RNN model normally contains
no more than 3 layers of LSTM. Deep RNNs are very useful for learning complex functions. We use two LSTM layers in
our model. In a deep LSTM, the model contains multiple layers, and each layer has its own parameters, which are
calculated independently. The first LSTM layer computes a layer of hidden units from the input. The second LSTM
layer then calculates the output from these hidden units. Finally, the network calculates the loss function and
tries to minimize it. Fig. 1b shows the structure of the sequence model. With the deep LSTM sequence learning model,
we aim to learn complex functions and predict sequential data accurately.
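The two-layer stacking can be sketched as a forward pass in which the hidden sequence of the first layer becomes the input sequence of the second. This is only a sketch under assumptions: the helper names `init_layer` and `lstm_layer`, the weight initialization, and the toy dimensions are hypothetical, and the source does not state the hidden size or how outputs are scaled to bike counts.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layer(rng, input_dim, hidden_dim):
    """Random parameters for one LSTM layer (hypothetical initialization)."""
    return {
        "W": rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim)),
        "U": rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
    }

def lstm_layer(xs, p):
    """Run one LSTM layer over xs of shape (T, input_dim); returns (T, hidden_dim)."""
    H = p["U"].shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    outputs = []
    for x in xs:
        z = p["W"] @ x + p["U"] @ h + p["b"]
        f, i, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

# Toy dimensions for illustration only.
rng = np.random.default_rng(0)
T, n_in, n_hidden, n_out = 24, 8, 5, 6
layer1 = init_layer(rng, n_in, n_hidden)   # each layer has its own parameters
layer2 = init_layer(rng, n_hidden, n_out)
x_day = rng.normal(size=(T, n_in))
hidden = lstm_layer(x_day, layer1)         # first layer: hidden units from the input
y_hat = lstm_layer(hidden, layer2)         # second layer: output from the hidden units
```

Raw LSTM outputs lie in (-1, 1), so predicting counts would require scaling the targets or adding a readout layer; the source does not specify which, so none is shown here.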
3. Experiment
We use data from the Citi Bike System Data of 2017 as the training set and data from January, February, and
March of 2018 as the test set in our experimental study. Citi Bike has more than 800 bike stations in New York City
and Jersey City. However, Fig. 2a and 2b show that the number of bike rents in a given hour varies greatly by
location. Furthermore, when each station is analyzed on its own, a repetitive pattern emerges, as shown in Fig. 2c.
There are two problems with stations that have few related trips. First, they may lead to data scarcity, while LSTM
places strict requirements on data quality [3]. Second, since they have few rents and returns, their bikes hardly
ever run out, so analyzing their time sequences is much less meaningful. We use the community detection method
proposed by Rosvall and Bergstrom (2008) [12] to detect the community structure of the stations. The method yields
12 large communities with more than 3 stations each, plus other small communities. We choose only the two communities
with the largest numbers of related trips as our dataset. By using the stations of a community as the dataset, we
retain the interactions between stations while filtering out low-quality data.
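The selection step that follows community detection can be sketched as below. The community assignment itself is taken as given (the detection method of [12] is not reimplemented here), and the function name `top_communities`, the toy data, and the rule that a trip counts once for each of its endpoint stations are assumptions made for illustration.

```python
from collections import Counter

def top_communities(trips, station_to_community, k=2):
    """Return the k communities with the most related trips.

    Assumption: a trip is counted once for its start station's community
    and once for its end station's community.
    """
    counts = Counter()
    for start_station, end_station in trips:
        for station in (start_station, end_station):
            community = station_to_community.get(station)
            if community is not None:
                counts[community] += 1
    return [community for community, _ in counts.most_common(k)]

# Hypothetical toy data: five stations in three communities.
station_to_community = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 2}
trips = [("A", "B"), ("A", "C"), ("B", "A"), ("C", "D"),
         ("E", "E"), ("D", "C"), ("A", "B")]

selected = top_communities(trips, station_to_community)
kept_stations = sorted(s for s, c in station_to_community.items() if c in selected)
```

Only stations in the selected communities are kept, so the low-traffic station "E" is filtered out while trips between the remaining stations are preserved.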
Fig. 2: The spatial distribution of bike rents and its repetitive pattern. Fig. 2a and 2b show that the demand for bikes varies significantly
across stations, and the comparison of the two sub-figures shows how the distributions of rent and return behaviors differ. Fig. 2c shows the
repetitive pattern of rents and returns for a single station at different times of the week. The curve shows similar patterns from Monday to
Friday, but different ones on Saturday and Sunday. Fig. 2d shows the positions of all communities and the stations included in each community.
The raw data contains information in many dimensions, including spatial information, temporal information, and
customer information. In this model we use only the start time, end time, start station, and end station of each
trip. We first convert this information into per-station data by dividing each day into several time steps and
counting the number of rents and returns separately, denoted by $X_{rent}$ and $X_{return}$. We also consider the
importance of different influence factors in our model, including Weather, Date, and Day of Week. Because people are
more exposed to harsh weather conditions during bike rides, the demand for shared bikes is greatly influenced by
weather [1]. We consider the potential influence of 3 weather indicators: Temperature, Precip Intensity, and Wind
Speed. We use both the historical weather and the future weather in the dataset, with $X_{weather}^{(d)}$ and
$X_{weather}^{(d+7)}$ denoting the weather data of the past day and the target day, and $X_{year}$ and $X_{week}$
denoting the day of year and the day of week. The input data structure is therefore a combined matrix of all these
matrices. The input matrix consists of N rows, denoting the features, and T columns, denoting the time steps. As we
only need to predict future rents and returns, the output is the combination of two matrices $Y_{rent}$ and
$Y_{return}$.
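The conversion from trip records to per-station counts can be sketched as follows. This is an illustrative assumption about the preprocessing, not the authors' code: the function name `count_rents_returns`, the tuple layout of a trip, and the choice of 24 hourly steps per day are hypothetical.

```python
from datetime import datetime
import numpy as np

def count_rents_returns(trips, stations, first_day, n_days, steps_per_day=24):
    """Count rents and returns per day, time step, and station.

    trips: iterable of (start_time, end_time, start_station, end_station).
    Returns two integer arrays of shape (n_days, steps_per_day, n_stations).
    """
    index = {station: k for k, station in enumerate(stations)}
    x_rent = np.zeros((n_days, steps_per_day, len(stations)), dtype=int)
    x_return = np.zeros_like(x_rent)
    seconds_per_step = 24 * 3600 // steps_per_day

    def locate(ts):
        day = (ts.date() - first_day.date()).days
        step = (ts.hour * 3600 + ts.minute * 60 + ts.second) // seconds_per_step
        return day, step

    for start_time, end_time, start_station, end_station in trips:
        d, t = locate(start_time)
        if 0 <= d < n_days and start_station in index:
            x_rent[d, t, index[start_station]] += 1    # a rent at the start station
        d, t = locate(end_time)
        if 0 <= d < n_days and end_station in index:
            x_return[d, t, index[end_station]] += 1    # a return at the end station
    return x_rent, x_return

# Hypothetical toy trips over two days and two stations.
trips = [
    (datetime(2017, 1, 1, 8, 5), datetime(2017, 1, 1, 8, 40), "A", "B"),
    (datetime(2017, 1, 1, 8, 50), datetime(2017, 1, 1, 9, 10), "A", "A"),
    (datetime(2017, 1, 2, 23, 55), datetime(2017, 1, 3, 0, 10), "B", "A"),
]
x_rent, x_return = count_rents_returns(trips, ["A", "B"], datetime(2017, 1, 1), n_days=2)
```

In this toy run the last trip ends after the two-day window, so its rent is counted but its return is dropped.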
To avoid the potentially strong influence of Day of Week, we use previous data to predict the data 7 days later. That
is, we use $X_{rent}^{(d)}$ and $X_{return}^{(d)}$ to predict $X_{rent}^{(d+7)}$ and $X_{return}^{(d+7)}$. We have
360 stations in total. We use data from January 1, 2017 to December 31, 2017 as the training set and data from
January 1, 2018 to March 31, 2018 as the test set. The input and output shapes are 358 × 24 × 728 and 358 × 24 × 720
for the training set, and 90 × 24 × 728 and 90 × 24 × 720 for the test set. The experimental parameters are shown in
Table 1a. We use the mean squared error as the loss function:
$$MSE^{(d)} = \frac{1}{N \cdot T} \sum_{i=1}^{N} \sum_{j=1}^{T} \left( \hat{Y}_i^{\langle j \rangle (d)} - Y_i^{\langle j \rangle (d)} \right)^2,$$
where N and T are the number of training samples and the total number of timesteps, d represents the day, and
$\hat{Y}_i^{\langle j \rangle (d)}$ and $Y_i^{\langle j \rangle (d)}$ are the predicted value and the real value of
the ith training sample at the jth timestep on day d, respectively.
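The stated shapes are consistent with one possible reading, offered here as an assumption rather than something the source spells out: 720 = 2 × 360 (rents and returns for 360 stations), 728 = 720 + 8 (3 past-day weather indicators, 3 target-day weather indicators, day of year, and day of week), 24 hourly time steps, and 358 = 365 − 7 training pairs. The sketch below builds such (d, d+7) pairs; `make_pairs` and `extras` are hypothetical names, and the extra features are assumed to be already aligned with each input day.

```python
import numpy as np

def make_pairs(counts, extras, lag=7):
    """Pair each input day d with the target day d + lag.

    counts: (n_days, T, 2*S) rents and returns per station.
    extras: (n_days, T, E) additional features stored with input day d.
    Returns X of shape (n_days - lag, T, 2*S + E) and Y of shape (n_days - lag, T, 2*S).
    """
    x = np.concatenate([counts[:-lag], extras[:-lag]], axis=-1)
    y = counts[lag:]
    return x, y

# Placeholder arrays with the assumed sizes for 2017 (365 days).
n_days, T, S, E = 365, 24, 360, 8
counts = np.zeros((n_days, T, 2 * S), dtype=np.float32)
extras = np.zeros((n_days, T, E), dtype=np.float32)
X, Y = make_pairs(counts, extras)
```

Under this reading the 90 test samples would need inputs from the last 7 days of 2017; the source does not say how those inputs were obtained.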
Table 1: The experimental parameters, as well as average RMSEs for each method.
Fig. 3: (a) Training set. (b) Test set. (c) Net demand of training set. (d) Net demand of test set. Fig. 3a and 3b compare the RMSEs for
different neural networks. The comparison shows the deep LSTM model fits the test set best. Fig. 3c and 3d show the RMSEs of net demand for
different deep learning models on the training set and test set.
Fig. 4: Real and predicted number of rents at 18:00-19:00, January 14, 2018.
4. Result Analysis
To evaluate the performance of our proposed model, we use different deep learning models to predict the demand and
compare their results. Apart from LSTM, we also use deep neural networks (DNN) [6], which do not take the sequential
nature of the data into consideration. We use the Root Mean Squared Error (RMSE) [10] as the performance metric,
which is calculated by
$$RMSE^{(d)} = \sqrt{\frac{1}{N \cdot T} \sum_{i=1}^{N} \sum_{j=1}^{T} \left( \hat{Y}_i^{\langle j \rangle (d)} - Y_i^{\langle j \rangle (d)} \right)^2}.$$
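The metric can be sketched in a few lines of NumPy. The array layout (day, timestep, sample index) and the function name `rmse_per_day` are assumptions for illustration; the computation itself is the usual root of the mean squared difference over the two summed indices.

```python
import numpy as np

def rmse_per_day(y_pred, y_true):
    """RMSE for each day d.

    y_pred, y_true: arrays of shape (n_days, T, N); the mean is taken over
    the timestep axis (j) and the sample axis (i).
    """
    squared_error = (y_pred - y_true) ** 2
    return np.sqrt(squared_error.mean(axis=(1, 2)))

# Toy example: one day, two timesteps, two samples.
y_true = np.array([[[1.0, 2.0], [3.0, 4.0]]])
y_pred = np.array([[[1.0, 4.0], [3.0, 2.0]]])
daily = rmse_per_day(y_pred, y_true)
```

Here the squared errors are 0, 4, 0, and 4, so the mean is 2 and the RMSE for the single day is the square root of 2.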
The results show a mean RMSE of 3.6752 for the training set and a mean RMSE of 2.7069 for the test set. Considering
the number of docks in each station, the error is acceptable. The RMSEs for the test set are significantly lower than
those for the training set, which suggests that the model is not overfitting. Fig. 3 shows the boxplot of RMSE for
each model. The comparison shows that our model with two layers of LSTM fits the test set best, indicating that LSTM
is better than DNN at predictions with sequential data. Fig. 4 shows an example of prediction. The prediction is
accurate in the areas around Central Park and the New York Stock Exchange. However, the prediction is less accurate
in the areas around the Museum of Modern Art and the Empire State Building, possibly due to the influence of events.
To assist the allocation of bikes and predict the actual demand for each area, we need to compute the difference
between rents and returns, which we define as NetDemand:
$$\widehat{\mathrm{NetDemand}}_n^{\langle t \rangle (d)} = \hat{Y}_{rent,n}^{\langle t \rangle (d)} - \hat{Y}_{return,n}^{\langle t \rangle (d)}.$$
We can therefore evaluate our performance by calculating the RMSE for NetDemand. The mean RMSE is 3.0203 for the
training set and 1.9323 for the test set. Fig. 3c and 3d show the boxplot of RMSE for net demand. We conclude that
the prediction is even more precise on the net demand than on rents and returns.
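The net-demand evaluation can be sketched as follows, using rents minus returns as in the definition above. The function names and the toy numbers are hypothetical and are not taken from the experiments.

```python
import numpy as np

def net_demand(y_rent, y_return):
    """Net demand: rents minus returns, element-wise."""
    return y_rent - y_return

def rmse(a, b):
    """Root mean squared error between two arrays of the same shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Toy values with shape (timesteps, stations); made up for illustration.
rent_true = np.array([[5.0, 2.0], [4.0, 1.0]])
return_true = np.array([[3.0, 2.0], [1.0, 3.0]])
rent_pred = np.array([[6.0, 2.0], [4.0, 2.0]])
return_pred = np.array([[4.0, 1.0], [1.0, 3.0]])

net_true = net_demand(rent_true, return_true)
net_pred = net_demand(rent_pred, return_pred)
net_rmse = rmse(net_pred, net_true)
```

In the first cell the rent and return predictions are both too high by one, so the errors cancel in the net demand. This is one way the net-demand error can end up smaller than the separate errors, though the source does not analyze the cause.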
In response to the unequal spatio-temporal distribution of demand for shared bikes, we propose a model based on
long short-term memory to predict the rents and returns of each bike-sharing station in different areas of a city
from historical bike data, weather data, and time data. We evaluate our model on data from the Citi Bike System Data.
Experimental results show that the model achieves a mean RMSE of 2.7069 on the test set. We further evaluate our
model by comparing its RMSE with the RMSE of predictions made by other deep neural networks. We obtain the net demand
by calculating the difference between the numbers of rents and returns. The result for net demand is even better,
with a mean test RMSE of 1.9323, showing that our model can predict the demand accurately.
By learning from historical bike data and past weather data, the proposed deep LSTM model can predict the rents
and returns of bikes for the entire city as well as the demand for bikes at a certain time. Based on the prediction,
we can make suggestions to bike companies about how to distribute bikes to each station so as to satisfy the needs of
customers while avoiding the unnecessary cost of keeping idle bikes. Applying the proposed model would benefit both
the bike company and its customers.
Acknowledgments
This work is funded by Studies on Talent Cultivation Model: International Experience and Domestic Reform
(Project ID: ADA160004), a Key National Project under the 13th Five Year Plan for National Education Science,
supported by the National Social Science Foundation of China.
References
[1] Campbell, A.A., Cherry, C.R., Ryerson, M.S., Yang, X., 2016. Factors influencing the choice of shared bicycles and shared electric bikes in
Beijing. Transportation Research Part C: Emerging Technologies 67, 399–414.
[2] Ding, C., Wang, D., Ma, X., Li, H., 2016. Predicting short-term subway ridership and prioritizing its influential factors using gradient boosting
decision trees. Sustainability 8, 1100.
[3] Dong, D., Wu, H., He, W., Yu, D., Wang, H., 2015. Multi-task learning for multiple language translation, in: Proceedings of the 53rd Annual
Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume
1: Long Papers), pp. 1723–1732.
[4] Foell, S., Phithakkitnukoon, S., Kortuem, G., Veloso, M., Bento, C., 2015. Predictability of public transport usage: A study of bus rides in
Lisbon, Portugal. IEEE Transactions on Intelligent Transportation Systems 16, 2955–2960.
[5] Gers, F.A., Schmidhuber, J., Cummins, F., 1999. Learning to forget: Continual prediction with LSTM.
[6] Glorot, X., Bordes, A., Bengio, Y., 2011. Deep sparse rectifier neural networks, in: Proceedings of the fourteenth international conference on
artificial intelligence and statistics, pp. 315–323.
[7] Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep learning. volume 1. MIT press Cambridge.
[8] Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural computation 9, 1735–1780.
[9] Kaltenbrunner, A., Meza, R., Grivolla, J., Codina, J., Banchs, R., 2010. Urban cycles and mobility patterns: Exploring and predicting trends in
a bicycle-based public transport system. Pervasive and Mobile Computing 6, 455–466.
[10] Lv, Y., Duan, Y., Kang, W., Li, Z., Wang, F.Y., et al., 2015. Traffic flow prediction with big data: A deep learning approach. IEEE Trans.
Intelligent Transportation Systems 16, 865–873.
[11] Phithakkitnukoon, S., Veloso, M., Bento, C., Biderman, A., Ratti, C., 2010. Taxi-aware map: Identifying and predicting vacant taxis in the city,
in: International Joint Conference on Ambient Intelligence, Springer. pp. 86–95.
[12] Rosvall, M., Bergstrom, C.T., 2008. Maps of random walks on complex networks reveal community structure. Proceedings of the National
Academy of Sciences 105, 1118–1123.
[13] Vogel, P., Greiser, T., Mattfeld, D.C., 2011. Understanding bike-sharing systems using data mining: Exploring activity patterns. Procedia-Social
and Behavioral Sciences 20, 514–523.
[14] Yang, Z., Hu, J., Shu, Y., Cheng, P., Chen, J., Moscibroda, T., 2016. Mobility modeling and prediction in bike-sharing systems, in: Proceedings
of the 14th Annual International Conference on Mobile Systems, Applications, and Services, ACM. pp. 165–178.