
2017 14th Web Information Systems and Applications Conference

Multi-step ahead time series forecasting for different data patterns based on LSTM recurrent neural network
Liu Yunpeng: Institute of Computer Software, Xi’an Jiaotong University, Xi’an, China
Hou Di: Institute of Computer Software, Xi’an Jiaotong University, Xi’an, China
Bao Junpeng: Institute of Computer Software, Xi’an Jiaotong University, Xi’an, China
Qi Yong: Institute of Computer Software, Xi’an Jiaotong University, Xi’an, China

Abstract—Time series prediction plays an important role in many areas, and multi-step ahead forecasts, such as river flow and stock price forecasts, can help people make the right decisions. Many predictive models do not work well in multi-step ahead prediction. LSTM (Long Short-Term Memory) is an iterative structure in the hidden layer of a recurrent neural network that can capture the long-term dependency in time series. In this paper, we model different types of data patterns, use an LSTM RNN for multi-step ahead prediction, and compare the prediction results with those of other traditional models.

Keywords—time series; LSTM; multi-step ahead

I. INTRODUCTION

Time series forecasting is an important problem in many domains, and a good prediction is crucial evidence for decision making. There are many mature time series prediction methods based on statistical knowledge that can solve forecasting problems in many fields. But the statistics-based methods usually have a complex modeling process, let alone the domain expertise they require. Besides these statistics-based methods, many other models have been proposed for time series forecasting, such as the Hidden Markov Model, Random Forest [1], and SVM regression [2]. Many of these models work very well for one-step ahead prediction, but when it comes to multi-step ahead forecasting, most of them do not perform well.

LSTM is a structure in the hidden layer of a recurrent neural network that can capture the long-term dependency in time series. Based on this structure, the time series forecasting problem can be solved by neural networks, which lets us leave out many manual steps of the traditional modeling methods, such as stationarity checking, autocorrelation function checking, partial autocorrelation function checking, and differencing order selection. Although the use of an RNN (Recurrent Neural Network) can greatly simplify the modeling process, in practical applications we still need different modeling methods for different data patterns to get the best out of the neural network.

A time series is a time-oriented or chronological sequence of observations on a variable of interest [3]. A lot of data in our daily life can be modeled as time series, such as the hourly change in the number of remaining parking spaces in a parking lot, the number of daily visitors to a museum, or the monthly electricity expenses of a shopping mall. LSTM has been used for single-step ahead traffic flow prediction [4], but multi-step ahead forecasting is still a tough task. In order to build the proper model, we need a deeper understanding of the data. Different time series have different features: some have a mean value around zero, some are periodic, and some have a rising or falling trend. We call these features, as well as combinations of them, data patterns.

In this paper, we sort out three kinds of data patterns of time series and present a different modeling solution for each. For the first two data patterns, we use ARIMA and GRNN (Generalized Regression Neural Network) as comparisons; for the last data pattern, an ARIMA/GRNN model cannot be fitted properly, so we only give a general assessment of the forecast results of the LSTM recurrent neural network.

II. RELATED WORK

A. Time series forecast

There are many solutions/models based on statistical knowledge for the time series forecasting problem, such as the linear regression model, the Holt-Winters method, and the ARIMA model.

The linear regression model assumes a linear relationship between the input and the output, which can be expressed as:

$$y_t = \theta^{T} x_t + \varepsilon_t$$

where $\theta$ are the parameters we need to fit, and $\varepsilon_t$ represents the error, which follows a Gaussian distribution.

Holt-Winters is the common name of triple (three-order) exponential smoothing. It was developed from the moving average method, and the original intention of exponential smoothing is to give the recent data points bigger weights. The model can be described by the following formulas in additive state-space form ($\alpha$, $\beta$, $\gamma$ are the parameters, $\varepsilon_t$ is the white noise at time step t, and m is the season length):

$$\ell_t = \ell_{t-1} + b_{t-1} + \alpha \varepsilon_t, \qquad b_t = b_{t-1} + \beta \varepsilon_t, \qquad s_t = s_{t-m} + \gamma \varepsilon_t$$
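As a minimal sketch of this model, the additive Holt-Winters method can be fitted with statsmodels; the synthetic monthly series and the season length of 12 below are illustrative assumptions, not data from this paper:

```python
# Minimal sketch: additive Holt-Winters (triple exponential smoothing)
# fitted with statsmodels on a synthetic monthly series (an assumption,
# not the paper's data).
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

t = np.arange(120)
y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)  # trend + seasonality

fit = ExponentialSmoothing(y, trend="add", seasonal="add",
                           seasonal_periods=12).fit()
print(fit.forecast(12))  # 12-step ahead forecast
```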
ARIMA is the abbreviation of the autoregressive integrated moving average model. The main difference between exponential smoothing and ARIMA is that exponential smoothing assumes the random noise is a completely independent process. However, this assumption is often violated in practice: successive observations often show sequential dependencies, and the ARIMA model can correctly incorporate this dependency structure.

The moving average (MA) process with order q:

$$x_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}$$

where $\varepsilon_t$ is the white noise process.

The MA process deals with the noise, which means we need some preprocessing to obtain a stationary time series before fitting this model. At the same time, the AR process models the trend. Here is the autoregressive (AR) process with order p:

$$x_t = c + \varphi_1 x_{t-1} + \dots + \varphi_p x_{t-p} + \varepsilon_t$$

where $\varepsilon_t$ is the white noise sequence and c is a constant value.
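For reference, here is a minimal sketch of fitting an ARIMA(p, d, q) model and producing a multi-step forecast with statsmodels; the order (2, 1, 2) and the synthetic series are assumptions, not values from this paper:

```python
# Minimal sketch: fit ARIMA(2, 1, 2) with statsmodels and forecast ahead.
# The order and the synthetic random-walk series are assumptions.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(0.1, 1.0, 300))  # synthetic trending series

fit = ARIMA(y, order=(2, 1, 2)).fit()
print(fit.forecast(steps=10))  # 10-step ahead forecast
```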
B. LSTM and RNN

RNN means “Recurrent Neural Network”, which grew out of the feedforward neural network by adding the ability to transfer information across time steps [5]. The input at time step t contains the original input x_t and the output of the last time step S_{t-1}.

Fig. 1. Structure of Recurrent Neural Network.

Just like other kinds of neural networks, an RNN is divided into three layers: input layer, hidden layer, and output layer, and it has the same training algorithm, back propagation (in particular, the training algorithm of an RNN is called Back Propagation Through Time, BPTT).

LSTM denotes “Long Short-Term Memory”, a special structure of the hidden layer presented by Hochreiter and Schmidhuber in 1997 [6]. It is designed to solve the vanishing gradient problem in long series; remembering long-term information is actually its default behavior.

The primitive structure of the hidden layer in an RNN is a tanh function, while the LSTM structure contains three modules: the forget gate, the input gate, and the output gate. The forget gate and the input gate control which part of the information should be removed from or retained in the network; the output gate uses the processed information to generate the correct output. LSTM also introduces the cell state to the network, which allows information to be saved for a long time.

Fig. 2. Structure of Recurrent Neural Network.

C. Multi-step ahead predictions

In recent years, many researchers have focused on time series forecasting, but most of them pay attention to one-step-ahead prediction, which is often not what practical applications need. For multi-step ahead prediction, there are two basic strategies: iterated and direct methods (sketched below). The iterated method [7] uses the output at time step t as one of the inputs at time step t+1; its main disadvantage is that the error accumulates to a very large value a few time steps later. The direct method [8] builds different training examples to get different models: one model predicts the next time step, another predicts the time step after that, and so on; its main disadvantage is the need for too many computing resources.
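The two strategies can be sketched schematically as follows; `one_step_model` and `models` are hypothetical trained predictors, not components defined in this paper:

```python
# Schematic sketch of the two multi-step strategies. The predictors are
# placeholders: any callable mapping an input window to one value.
import numpy as np

def iterated_forecast(one_step_model, history, horizon):
    """Feed each prediction back in as an input for the next step."""
    window = list(history)
    preds = []
    for _ in range(horizon):
        p = one_step_model(np.array(window))
        preds.append(p)
        window = window[1:] + [p]  # slide the window forward by one step
    return preds

def direct_forecast(models, history, horizon):
    """Use a separately trained model for each horizon h = 1..H."""
    x = np.array(history)
    return [models[h](x) for h in range(1, horizon + 1)]
```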

III. TIME SERIES PREDICTION BASED ON LSTM RNN FOR DIFFERENT DATA PATTERNS

A lot of data in our daily life can be expressed in the form of time series: macro data such as the sales of shopping malls, the gross national product, and the proportion of carbon dioxide in the air; micro data such as computer memory usage, the voltage of a capacitor in an instrument, etc. If we want to model such a wide range of data and make predictions in the traditional ways, we need to learn a lot of expertise. Besides, we also need to implement many preprocessing procedures to match the data to a certain model.

The LSTM RNN is a very powerful tool that allows us to make time series predictions without too much expertise. But different time series have different features, so we still need to choose the right modeling approach based on the data pattern.

In this part, we propose three different data patterns and model these types of data separately, using different modeling strategies and preprocessing methods so that the data can be trained in the LSTM RNN.

A. Strong periodic data

Strong periodic data means data with strong periodicity, for which it is very easy to build models. This data pattern has a fixed cycle, no rising or falling trend, and a fixed mean value. In order to make the data fit the LSTM RNN model, we need a preprocessing step that normalizes the raw data to between 0 and 1. The normalization process is quite simple. First, we find the maximum value MAX and the minimum value MIN in the training examples. Then we map the maximum value to 1 and the minimum value to 0, and the values in between are mapped evenly onto the range [0,1] by the following formula:

$$\hat{x}_t = \frac{x_t - MIN}{MAX - MIN} \qquad (1)$$

where $x_t$ represents the raw input at time step t, while $\hat{x}_t$ means the normalized input at time step t. After the normalization, all the inputs lie in the range [0,1], which means we can build the LSTM RNN model and train its parameters. When we use this model to make predictions, we can only get a number between 0 and 1. At this point, we need one more step to get the real number:

$$p_t = \hat{p}_t \,(MAX - MIN) + MIN \qquad (2)$$

where $\hat{p}_t$ represents the output of our model, while $p_t$ means the real prediction.
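A minimal sketch of this scaling, assuming MAX and MIN are taken from the training examples only:

```python
# Min-max scaling of Eq. (1) and its inverse, Eq. (2). MAX and MIN are
# computed on the training examples only and reused at prediction time.
import numpy as np

def normalize(x, lo, hi):
    return (x - lo) / (hi - lo)        # Eq. (1): [MIN, MAX] -> [0, 1]

def denormalize(p, lo, hi):
    return p * (hi - lo) + lo          # Eq. (2): [0, 1] -> real values

train = np.sin(np.linspace(0, 20 * np.pi, 2000))  # stand-in periodic series
MIN, MAX = train.min(), train.max()
scaled = normalize(train, MIN, MAX)
assert np.allclose(denormalize(scaled, MIN, MAX), train)
```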
B. Periodic data accompanied with the ascending/descending trend

When we have data with an ascending/descending trend, it seems impossible to directly normalize the raw data to a certain range. The traditional ARIMA model has many preprocessing steps, such as data smoothing, logarithmic transformation, and differencing, to turn the data into a stable sequence for modeling. But when we use the LSTM RNN to build a model, the input data must be within the range [0,1], which means normalization is one of the necessary preprocessing steps. Although the raw data does not stay within a certain range, the rate of growth/decline at each time step should, so we build the model on the growth/decline rate of the raw data. The rate can be calculated as follows:

$$r_t = \frac{x_t - x_{t-1}}{x_{t-1}} \qquad (3)$$

where $r_t$ represents the growth/decline rate at time step t. Then we find the maximum value MAX and the minimum value MIN in the rate list. The next preprocessing step is normalization, and Eq. (1) can be used to finish this work. Now we have the data in the range [0,1], which means we can use the LSTM RNN to build a model. When we get a prediction from our model, we can use Eq. (2) to recover the growth/decline rate; then we combine the last raw data point in the training examples with the growth/decline rate to calculate our prediction.

In order to get multi-step ahead predictions, we use the iteration method: after we get a prediction (p1), we put p1 into the input series to get the prediction (p2) of the next time step; then we put p2 into the input series to get another new prediction (p3), and so on. After several steps of iteration, we have a series of predictions (p1, p2, p3, ...), that is, a series of growth/decline rates, which we can use to calculate the predicted values.
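A minimal sketch of the rate transform and of rebuilding values from predicted rates; the short series below is illustrative:

```python
# Growth/decline rate of Eq. (3) and its inverse: since
# r_t = (x_t - x_{t-1}) / x_{t-1}, we have x_t = x_{t-1} * (1 + r_t).
import numpy as np

def to_rates(x):
    return (x[1:] - x[:-1]) / x[:-1]   # Eq. (3)

def rates_to_values(last_value, rates):
    """Rebuild predicted values from a series of predicted rates."""
    values, current = [], last_value
    for r in rates:
        current = current * (1.0 + r)
        values.append(current)
    return values

x = np.array([112.0, 118.0, 132.0, 129.0, 121.0])  # illustrative values
assert np.allclose(rates_to_values(x[0], to_rates(x)), x[1:])
```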
C. Super-long periodic data

Super-long periodic data means data with a very long cycle; a cycle might contain more than 100 time steps. If we model it directly, we have to enlarge the window length to cover the periodicity, which means we need to trace back more than 100 time steps for one training example. This makes the computation volume rise sharply; besides, the prediction accuracy begins to decrease as the time step increases.

For this kind of data pattern, we propose a method to shorten the long cycle. Inspired by the moving average algorithm, we set a parameter F that represents the factor by which the cycle is shortened. Next, we partition the raw data, with every F data points used as an interval: the first interval runs from time step 0 to time step F-1, the second from time step F to time step 2F-1, and so on; suppose we get V intervals after the partition operation. Then we calculate the mean value of the data points within each interval and use this mean value to represent the interval. After this, we get a sequence of interval mean values containing V data points, which we name SA. At the same time, for every data point in the raw series, we subtract the mean value of its interval to get the difference; all the differences form the sequence SB.

$$SA_i = \frac{1}{F} \sum_{j=iF}^{(i+1)F-1} x_j \qquad (4)$$

$$SB_n = x_n - SA_{\lfloor n/F \rfloor} \qquad (5)$$

In the above two formulas, $x_n$ represents the n-th element of the raw data, and the subscript i lies in the range [0,V) (not including V). $\lfloor n/F \rfloor$ means rounding down.

After the above steps, we obtain two sequences SA and SB, where SA is a short-period sequence and SB has a mean value of zero and a fixed range of fluctuation. We can model both of them using the method for strong periodic data described before.
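A minimal sketch of the split, assuming any incomplete final interval is dropped (a detail the text does not specify); F=21 matches the setting later used in Fig. 6:

```python
# Cycle-shortening split of Eqs. (4) and (5): SA holds the interval means,
# SB holds each point's deviation from its interval mean.
import numpy as np

def split_series(x, F):
    V = len(x) // F                       # number of complete intervals
    x = np.asarray(x[:V * F], float)      # drop incomplete tail (assumption)
    SA = x.reshape(V, F).mean(axis=1)     # Eq. (4): interval means
    SB = x - np.repeat(SA, F)             # Eq. (5): deviations from the mean
    return SA, SB

x = np.random.default_rng(1).normal(size=3500)
SA, SB = split_series(x, F=21)
# Adding the two sequences back together recovers the (truncated) raw data.
assert np.allclose(np.repeat(SA, 21) + SB, x[:len(SA) * 21])
```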

IV. EXPERIMENTS

We did some experiments to verify the effectiveness of the proposed method. For the first two data patterns, we also compared our method with the ARIMA model as well as GRNN (Generalized Regression Neural Network). For the last data pattern, the time series cannot be modeled by ARIMA/GRNN, so we only validate our proposed method.

A. Strong periodic data

For strong periodic data, we use the electric current values of an electronic instrument whose curve is approximately sinusoidal. In the training set, we have 2000 equally spaced data points, and in the test set, we have 350 equally spaced data points. We build training examples in the following way (see the sketch after this list):

• Create a sliding window with a width of 100.
• The input of the first sample is the first 100 points, and the output of the first sample is the 101st point.
• Slide one time step at a time to get the inputs and outputs of the other training examples.
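A minimal sketch of this window construction; with 2000 training points and a width of 100 it yields exactly the 1900 examples mentioned below:

```python
# Sliding-window construction of (input, target) pairs as in the list above.
import numpy as np

def make_windows(series, width=100):
    n = len(series) - width
    X = np.array([series[i:i + width] for i in range(n)])
    y = np.array([series[i + width] for i in range(n)])
    return X, y

train = np.sin(np.linspace(0, 40 * np.pi, 2000))  # stand-in current data
X, y = make_windows(train, width=100)
print(X.shape, y.shape)  # (1900, 100) (1900,)
```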
After these steps, we get 1900 training examples. Then, we perform similar processing on the test set data to get 250 inputs and 250 correct outputs. We use the methods presented in Section III to model the data and train the parameters. When we get the proper parameters, we can use this model to make predictions. In this experiment, we predict the last 50 data points, using the 200th input vector in the test set as the initial input; we then add the output of each forecast step to the current input sequence and delete the first element of the input sequence to keep the size of the input vector equal to 100. Using this method, we iteratively get the 50 predicted values.
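A minimal sketch of this iterative loop, assuming `model` is a trained one-step Keras LSTM mapping a (1, 100, 1) input window to a single value; the exact network used here is not specified in the text:

```python
# Iterative multi-step forecast: append each prediction to the window and
# drop the oldest value, keeping the input length fixed at 100.
import numpy as np

def iterate_forecast(model, seed_window, steps=50):
    window = list(seed_window)            # the length-100 initial input
    preds = []
    for _ in range(steps):
        x = np.array(window).reshape(1, len(window), 1)
        p = float(model.predict(x, verbose=0)[0, 0])
        preds.append(p)
        window = window[1:] + [p]         # slide: drop first, append pred
    return np.array(preds)
```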
As a comparison, we also use ARIMA/GRNN for modeling and forecasting. As with the LSTM RNN, we use the iterative method to produce multi-step ahead prediction results. The prediction results of this experiment are shown in Fig. 3.

Fig. 3. Experiment results for the strong periodic data pattern: a. ARIMA result with MSE=0.02; b. GRNN result with MSE=0.20409; c. LSTM result with MSE=0.02408.

In Fig. 3, panel a shows the multi-step iterative prediction of the ARIMA model; as we can see, the prediction values are almost the same as the true values, and its MSE (mean square error) is 0.02. Panel b in the middle shows the multi-step iterative prediction of GRNN, where the blue line represents the raw data and the red line denotes the predictions; this prediction has a larger deviation, with an MSE of 0.20409. Panel c presents the prediction result of the LSTM RNN, where the blue line represents the raw data and the green line denotes the predictions; it is also a very accurate prediction, with an MSE of 0.02408. For strong periodic data, ARIMA and LSTM have similar performance and both make accurate predictions, while GRNN's predictions are relatively poor.

B. Periodic data accompanied with the ascending/descending trend

For this kind of data pattern, we use the Airline Passenger data (from the Time Series Data Library) for our experiment. This traditional data set contains 144 data points representing the passenger traffic of an airport per month from 1949 to 1960. We take the first 100 points as the training examples, while the other 44 points form the test set. Because the training sample is relatively small, we trained on the examples for more than one thousand epochs. The prediction results are shown in Fig. 4 (MSPE represents the mean square root of percentage error).

Fig. 4. Experiment results for periodic data accompanied with the ascending/descending trend: a. ARIMA result with MSPE=0.0916; b. GRNN result with MSPE=0.9874; c. LSTM result with MSPE=0.05908.

Because we modeled the growth/decline rate of the data, we use another way to evaluate our prediction results, called MSPE (mean square root of percentage error). We can calculate MSPE by the following formula:

$$MSPE = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( \frac{p_t - r_t}{r_t} \right)^2} \qquad (6)$$

where $p_t$ means the prediction result at time step t while $r_t$ means the real value at time step t. MSPE represents the root mean square of the percentage deviation of the prediction.
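A minimal sketch of Eq. (6):

```python
# MSPE of Eq. (6): root mean square of the relative errors, so an MSPE of
# 0.0916 corresponds to an accuracy of roughly 90.84%.
import numpy as np

def mspe(pred, real):
    pred = np.asarray(pred, float)
    real = np.asarray(real, float)
    return np.sqrt(np.mean(((pred - real) / real) ** 2))

print(mspe([105, 98], [100, 100]))  # ~0.0381
```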
In Fig. 4, the graph on the top shows the prediction result of the ARIMA model. The red line represents the prediction result while the blue line indicates the real values. As we can see on the graph, the forecast result is relatively accurate; its MSPE equals 0.0916, which means the accuracy of the prediction is about 90.84%.

Panel b in the middle shows the predictions of the GRNN model. This model performs badly on this data because the range of the independent variable keeps increasing as the time steps go on, and GRNN cannot extract the features correctly. The error rises rapidly after the first cycle, and the prediction goes in the wrong direction.

In panel c, after training for thousands of epochs, the loss converged to a very small value. The window length was set to 24: on the one hand, one training example then contains a sequence long enough for the neural network to find the features; on the other hand, we can still get a relatively adequate number of training examples. The multi-step ahead predictions generated by the iteration method do not drift far from the raw data. The MSPE is 0.05908, which means that the accuracy of the prediction is about 94.09%. For multi-step ahead prediction, this accuracy is acceptable and practical.

C. Super-long periodic data

For this kind of data pattern, we choose the output power data of a piece of communication equipment. The data set does not have an obvious periodicity, and the distribution of the data points is very messy. There are 3500 points in total; the distribution is shown in Fig. 5.

Fig. 5. The raw data of the experimental dataset.

As we can see in Fig. 5, the fluctuations in the raw data seem to have some inconspicuous regularity. By visual observation of the first few cycles, each cycle contains about 250 time steps, which is very long. Just as discussed in Section III, we need to split the raw data into two different sequences SA and SB. The splitting results are shown in Fig. 6.

Fig. 6. Split sequences SA and SB: a. mean value sequence (SA, F=21); b. difference sequence (SB, F=21).

For sequence SA, we use a similar method to model the data, just like what we did for the airline passenger dataset. We use the first 120 data points as the training set and the other 46 points as the test set. After modeling and training, we get the result shown in Fig. 7.

Fig. 7. The forecast result of sequence SA.

From Fig. 7, we can conclude that the prediction of the first cycle is relatively accurate; after that, the performance of our model is not very good. Overall, the root mean square error of the prediction result is 8.82.

Sequence SB has a certain range of fluctuations and a mean value of 0. We can directly normalize it and put it into the LSTM RNN for training. The result is shown in Fig. 8.

Fig. 8. The forecast result of sequence SB.

In Fig. 8, the green line represents the multi-step ahead predictions. As we can see, there are some fluctuations at the beginning, but as the iteration goes on, the predictions gradually converge to 0. The MSE over all the prediction values is 6.8560. Finally, we add the results of SA and SB to get the final forecast, shown in Fig. 9.
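A hedged sketch of this recombination, assuming each predicted interval mean of SA is repeated F times before the pointwise SB predictions are added back (the inverse of Eqs. (4) and (5)); `sa_pred` and `sb_pred` stand for the two models' outputs:

```python
# Recombine the two forecasts: upsample the SA (interval-mean) predictions
# by repeating each value F times, then add the SB (deviation) predictions.
import numpy as np

def combine_forecasts(sa_pred, sb_pred, F=21):
    upsampled = np.repeat(np.asarray(sa_pred, float), F)
    n = min(len(upsampled), len(sb_pred))  # align lengths (assumption)
    return upsampled[:n] + np.asarray(sb_pred, float)[:n]
```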

Fig. 9. The final forecast result.

Although the result of the multi-step ahead prediction is not particularly accurate (MSE=16.452), it still has a certain reference value.

V. CONCLUSION

In this paper, we studied the multi-step time series forecasting problem and put forward different modeling ideas according to the different data patterns. We have also done experiments to verify the effectiveness of the proposed method and compared it with the results produced by the ARIMA/GRNN models.

Compared with the traditional statistical prediction methods, our proposed method based on the LSTM RNN has the following advantages:

• It can fit a wider range of data patterns compared to the traditional models.
• We do not need to spend too much time on the modeling process, since manual steps such as stationarity checking, autocorrelation function checking, and partial autocorrelation function checking are omitted; besides, we do not need to know too much domain expertise.
• After proper training, our model has higher predictive accuracy.

But the traditional statistics-based models still have advantages such as lower resource consumption and faster forecasting speed. As for the data patterns, compared with the widely varied data in our daily life, the data patterns we propose in this paper are relatively simple, and there is still a lot of room to mine in the field of multi-step ahead time series prediction.

REFERENCES

[1] Y. Lin, U. Kruger, J. Zhang, Q. Wang, L. Lamont and L. E. Chaar, "Seasonal Analysis and Prediction of Wind Energy Using Random Forests and ARX Model Structures," IEEE Transactions on Control Systems Technology, vol. 23, no. 5, pp. 1994-2002, Sept. 2015.
[2] V. Vapnik, S. E. Golowich, and A. J. Smola, "Support Vector Method for Function Approximation, Regression Estimation and Signal Processing," NIPS, pp. 281-287, 1996.
[3] D. C. Montgomery and C. L. Jennings, "Introduction to Time Series Analysis and Forecasting," John Wiley & Sons, p. 2, 2015.
[4] Y. Tian and L. Pan, "Predicting Short-Term Traffic Flow by Long Short-Term Memory Recurrent Neural Network," 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity), Chengdu, 2015, pp. 153-158.
[5] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.
[6] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, pp. 1735-1780, 1997.
[7] A. F. Atiya, S. M. El-Shoura, S. I. Shaheen and M. S. El-Sherif, "A comparison between neural-network forecasting techniques - case study: river flow forecasting," IEEE Transactions on Neural Networks, vol. 10, no. 2, pp. 402-409, Mar. 1999.
[8] G. Chevillon, "Direct multi-step estimation and forecasting," Journal of Economic Surveys, vol. 21, no. 4, pp. 746-785, 2007.

