
2022 IEEE 22nd International Conference on Communication Technology

Short-Term Load Forecasting Method Based on ARIMA and LSTM
Shuo Chen, Rongheng Lin
Beijing University of Posts and Telecommunications
School of Computer Science
State Key Lab of Networking and Switching Technology
Beijing, China
cshuo@bupt.edu.cn, rhlin@bupt.edu.cn

Wei Zeng
State Grid Jiangxi Electric Power Research Institute
Jiangxi, China
eric.zengw@gmail.com
2022 IEEE 22nd International Conference on Communication Technology (ICCT) | 978-1-6654-7067-4/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICCT56141.2022.10073051

Abstract—This paper explores classic prediction algorithms such as the Autoregressive Integrated Moving Average model (ARIMA) and the Long Short-Term Memory neural network (LSTM), combines the advantages of both, and proposes a short-term forecasting method, the ARIMA-LSTM fusion model. The model computes the final predicted value by applying a linear correction to the LSTM model error. With ARIMA and LSTM as comparison algorithms, ARIMA-LSTM is trained to predict the next day's load. The root mean square error (RMSE), mean absolute percentage error (MAPE), and worst relative error (WRE) are used to evaluate the performance of the proposed algorithm. In testing, the RMSE of the proposed model is 0.433, while those of ARIMA and LSTM are 0.461 and 0.445, respectively. ARIMA-LSTM also performs well on different types of power datasets.

Index Terms—forecast, power load, ARIMA, LSTM, ARIMA-LSTM

I. INTRODUCTION

Orderly electricity consumption refers to staggering the electricity consumption of different users during peak consumption periods. The precise distribution of power resources depends on cooperation among many parties. If the power supply is insufficient, there will be power outages that affect people's lives and factory production, and that can even cut power to important facilities. Conversely, if the power supply is excessive, resources are wasted. Therefore, the government and other relevant departments need to allocate electricity resources rationally through orderly regulation of electricity consumption.

Accurate load prediction can provide a reference for the orderly regulation of power consumption. Relevant departments or institutions can understand the load pattern of the power grid through the predicted load and formulate relevant measures. At the same time, by comparing the actual load with the forecast results, they can keep abreast of changes in the grid load and investigate possible problems as early as possible. In addition, the user's electricity load curve is an important basis on which e-commerce companies decide to purchase electricity from the spot market. Accurate daily load forecasting can better meet users' electricity demand and reduce the risk of e-commerce companies purchasing electricity from the real-time market the next day. Therefore, the focus of this paper is on short-term load forecasting.

The randomness and large scale of power data are the main difficulties in predicting it. Existing forecasting methods fall roughly into two categories: traditional regression methods, represented by time series analysis, and artificial intelligence methods, represented by neural networks.

The basic approach of traditional regression prediction is to establish a fitted curve and mathematical model from historical load data: treat the load data as a random variable, derive how the load changes from its statistical regularities, deduce a mathematical expression, and use this expression to forecast the load. The more common methods are the autoregressive model (AR), the moving average model (MA), and the autoregressive moving average model (ARMA) [1]. ARMA can only deal with stationary sequences. A non-stationary sequence must first be differenced to obtain a stationary sequence, to which ARMA is then applied; this is the autoregressive integrated moving average model (ARIMA) [2], [3]. The traditional regression prediction method needs only endogenous variables, without the help of exogenous variables, and does not require manual intervention. It is also relatively fast. Its disadvantage is that it relies too much on historical data and seldom considers other influencing factors, which prevents it from achieving high accuracy, and it fits nonlinear data especially poorly. Such models also have the inherent deficiencies of a fixed structure and poor adaptability to changing raw data. In addition, grey prediction can also be used [4].

The artificial intelligence approach performs load prediction through machine learning or deep learning. Typical representatives are the artificial neural network (ANN) [5] and the recurrent neural network (RNN) [6]. The predictive performance of an ANN largely depends on its hyperparameters. If an ANN contains too few connections, it has limited learning ability and does not predict well. On the other hand, if an ANN is given too many connections, it may learn noise during the training phase and thus cannot predict the regularity accurately [7], [8]. Compared with the



Authorized licensed use limited to: Institut Teknologi Sepuluh Nopember. Downloaded on August 29,2023 at 09:02:33 UTC from IEEE Xplore. Restrictions apply.
traditional neural network, the RNN has great advantages. It can save learned information in the network and use it in later processing. This feature enables the RNN to process time series better [9]. However, information far from the current moment is often not fully utilized, which is the vanishing gradient problem of RNNs [10]. As the number of neural network layers increases, the gradient update can also grow exponentially, which makes the network prone to the exploding gradient problem [11]. To overcome the vanishing and exploding gradients of RNNs, variants such as Long Short-Term Memory (LSTM) [12] and the Gated Recurrent Unit (GRU) [13] have emerged. Later, variants of LSTM such as Convolutional LSTM [14] and eight variants of vanilla LSTM [15] appeared. Unlike traditional regression prediction, neural networks such as LSTM can cope well with nonlinear load data.

As the above analysis shows, different forecasting algorithms have different advantages. In this paper, a new short-term prediction model called ARIMA-LSTM is proposed by combining the advantages of the ARIMA and LSTM algorithms. The power data is decomposed into a nonlinear part and an error part, predictions are made for each part separately, and the prediction results are finally merged. Experimental verification shows that the new model has better performance in short-term prediction.

The main contributions of this paper are:
• A new short-term prediction model, ARIMA-LSTM, is proposed, and good results are obtained.
• The predictions of the new model are compared with those of existing models and tested on different types of load curves.

II. RESEARCH METHODS

Based on the above analysis, this paper first selects ARIMA and LSTM as the representatives of the traditional regression method and the neural network method, respectively, for short-term prediction. It then proposes a short-term prediction method based on ARIMA-LSTM.

A. ARIMA

ARIMA is a classic time series forecasting model. It adds a difference operation to ARMA so that ARMA can adapt to non-stationary time series. ARMA consists of an autoregressive model (AR) and a moving average model (MA).

Let $B$ be the delay operator; then the ARMA($p$, $q$) model can be defined as:

$$\beta(B)\,Y_t = \varphi(B)\,\varepsilon_t \tag{1}$$

where

$$\beta(B) = 1 - \beta_1 B - \beta_2 B^2 - \dots - \beta_p B^p \tag{2}$$

$$\varphi(B) = 1 + \varphi_1 B + \varphi_2 B^2 + \dots + \varphi_q B^q \tag{3}$$

Suppose $\nabla$ is the difference operator and $Y_t$ is a $d$-order homogeneous non-stationary sequence; then $\nabla^d Y_t$ is a stationary sequence, and ARIMA($p$, $d$, $q$) can be defined as:

$$\beta(B)\,(\nabla^d Y_t) = \varphi(B)\,\varepsilon_t \tag{4}$$

B. LSTM

LSTM stores information in gated units outside the normal information flow of the recurrent network. These units determine which information to store and when to allow reading, writing or clearing of information by opening and closing gates. That is, the memory unit learns when to allow data to enter, leave or be deleted through the iterative process of making guesses, back-propagating the error, and adjusting weights by gradient descent.

Fig. 1. The structure of the LSTM gating unit

C. ARIMA-LSTM

Assuming that the training set is $y_t$, there exists a perfect function $f(x_t)$ such that $y_t = f(x_t) + \varepsilon_t$, where $\varepsilon_t$ is the error caused by noise. Considering that most power data is nonlinear, the LSTM model is first trained on the load data to obtain an approximation $\hat{f}(x_t)$ of $f(x_t)$. This approximation has an error, $e_t = y_t - \hat{f}(x_t)$. In this paper, the ARIMA model is used to fit this error to obtain the prediction model of the error:

$$\beta(B)\,(\nabla^d e_t) = \varphi(B)\,\varepsilon_t \tag{5}$$

The prediction result $\hat{f}_{pred}(x_t)$ is obtained from the trained LSTM, and the error correction $\hat{e}_{pred}$ from the trained ARIMA model. The result of the ARIMA-LSTM model is:

$$\hat{y}_{pred} = \hat{f}_{pred}(x_t) + \hat{e}_{pred} \tag{6}$$

The detailed process of ARIMA-LSTM is shown in Fig. 2.

III. EXPERIMENTAL RESULTS AND ANALYSIS

This section verifies the performance of the ARIMA-LSTM model. Experiments are run on real load data, and the results are compared with ARIMA and LSTM in terms of mean absolute percentage error (MAPE), root mean squared error (RMSE), and worst relative error (WRE), defined as follows:

$$\mathrm{MAPE}(y, \hat{y}) = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \tag{7}$$

$$\mathrm{RMSE}(y, \hat{y}) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{8}$$

$$\mathrm{WRE}(y, \hat{y}) = \max_{i=1,2,\dots,n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \tag{9}$$
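The three metrics can be written down directly. The following is a minimal NumPy sketch, not the evaluation code used in the experiments: the function names and the toy inputs are illustrative assumptions, and WRE is taken as the largest relative error, as its name suggests and as the reported values (WRE larger than MAPE) require.

```python
import numpy as np


def mape(y, y_hat):
    """Mean absolute percentage error, Eq. (7), returned in percent."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))


def rmse(y, y_hat):
    """Root mean squared error, Eq. (8), in the units of the load."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def wre(y, y_hat):
    """Worst (largest) relative error, Eq. (9), returned in percent."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(100.0 * np.max(np.abs((y - y_hat) / y)))


# Toy values (assumption: illustrative only, not taken from the datasets).
y_true = [2.0, 4.0, 5.0]
y_pred = [1.0, 4.0, 6.0]
print(mape(y_true, y_pred), rmse(y_true, y_pred), wre(y_true, y_pred))
```

MAPE and WRE divide by the actual load, so they are undefined when a reading is zero and grow very large when readings are close to zero; RMSE does not have this sensitivity.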

TABLE II
PERFORMANCE OF INDIVIDUAL MODELS (POWER DATA)

Model        RMSE    MAPE     WRE
ARIMA        0.461   9.371%   24.118%
LSTM         0.445   8.965%   23.570%
ARIMA-LSTM   0.433   8.534%   20.789%

Fig. 2. The detailed process of ARIMA-LSTM model
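The residual-correction structure of ARIMA-LSTM (base forecast plus a forecast of the base model's error) can be illustrated on synthetic data. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the base forecasts are hard-coded stand-ins for the output of a trained LSTM, and a least-squares AR(2) model stands in for the ARIMA error model (difference order 0, moving-average terms omitted). The helper names `fit_ar`, `forecast_ar` and `hybrid_forecast` are hypothetical.

```python
import numpy as np


def fit_ar(e, p=2):
    """Least-squares AR(p) fit: e[t] ~ a1*e[t-1] + ... + ap*e[t-p].

    Assumption: a simplified stand-in for the ARIMA error model (d = 0, no MA terms).
    """
    e = np.asarray(e, dtype=float)
    # Column k holds the lag-k values aligned with the targets e[p:].
    lags = np.column_stack([e[p - k:len(e) - k] for k in range(1, p + 1)])
    coeffs, *_ = np.linalg.lstsq(lags, e[p:], rcond=None)
    return coeffs


def forecast_ar(e, coeffs, steps):
    """Roll the fitted AR recursion forward `steps` points past the end of e."""
    history = [float(v) for v in e]
    for _ in range(steps):
        history.append(sum(c * history[-k] for k, c in enumerate(coeffs, start=1)))
    return np.array(history[-steps:])


def hybrid_forecast(y_train, base_fit, base_pred, p=2):
    """Final forecast = base forecast + predicted base-model error."""
    # Training residuals: e_t = y_t - f_hat(x_t)
    residuals = np.asarray(y_train, dtype=float) - np.asarray(base_fit, dtype=float)
    coeffs = fit_ar(residuals, p)
    e_pred = forecast_ar(residuals, coeffs, len(base_pred))
    return np.asarray(base_pred, dtype=float) + e_pred


# Synthetic demo: the stand-in base model captures the trend but misses an oscillation.
t = np.arange(120)
y = 10.0 + 0.1 * t + np.sin(0.3 * t)
base = 10.0 + 0.1 * t          # hypothetical f_hat(x_t), in place of a trained LSTM
n_train = 112

y_hat = hybrid_forecast(y[:n_train], base[:n_train], base[n_train:])
base_rmse = np.sqrt(np.mean((y[n_train:] - base[n_train:]) ** 2))
hybrid_rmse = np.sqrt(np.mean((y[n_train:] - y_hat) ** 2))
print(f"base RMSE = {base_rmse:.3f}, hybrid RMSE = {hybrid_rmse:.3g}")
```

The synthetic residual is a pure sinusoid, which an AR(2) recursion reproduces exactly, so the correction removes almost all of the base error here. On real load data the residuals are noisy, and the correction can only be as good as the error model fitted to them.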

A. Experimental setup

The experiment uses the electricity load data of a city. The dataset contains 96-point load curves from January 1, 2014 to December 31, 2014. The data from January 1, 2014 to November 30, 2014 is used for training, the data from December 1, 2014 to December 30, 2014 is used for validation, and the data for December 31, 2014 is predicted.

The ARIMA, LSTM, and ARIMA-LSTM model settings are as follows:

TABLE I
ARIMA, LSTM, ARIMA-LSTM PARAMETER SETTINGS

Model        Parameters
ARIMA        autoregressive order p: 2
             difference order d: 0
             moving average order q: 2
LSTM         layers: 3
             hidden nodes: 256
ARIMA-LSTM   autoregressive order p: 2
             difference order d: 0
             moving average order q: 3
             layers: 3
             hidden nodes: 256

ARIMA and LSTM are used as comparison algorithms, and they are trained and tested together with ARIMA-LSTM. The Adam optimizer is used in training, with the mean squared error (MSE) as the loss function, a learning rate of 1e-4, a batch size of 256, and 100 training epochs.

B. Forecast Result

Table II shows the RMSE, MAPE, and WRE of the ARIMA, LSTM, and ARIMA-LSTM models on the dataset. ARIMA-LSTM outperforms the other two models on all three metrics.

Fig. 3 shows the results of each model compared with the true value.

Fig. 3. Comparison of the predicted results of each model with the actual value

C. Universality

This section uses the ElectricityLoadDiagrams20112014 dataset, which contains the 96-point daily load curves of 370 users from January 1, 2011 to January 1, 2015, that is, one load value every 15 minutes from 00:00 to 23:45. The data is a txt file in which each line holds the load of all users for one 15-minute interval.

The data from January 1, 2012 to November 30, 2012 is used for training, the data from December 1, 2012 to December 30, 2012 is used for validation, and the data for December 31, 2012 is forecast.

The parameters of the test models are again taken from Table I. The test data is the 2012 power data of user 2, user 4 and user 5. Fig. 4 shows the approximate load curves of the three users (values are sampled at intervals of 500).

Fig. 5 is a box plot of the electricity data of user 2, user 4 and user 5 for the whole year of 2012.

Table III shows the mean and population variance of the electricity data of user 2, user 4 and user 5 for the whole year of 2012.

TABLE III
MEAN AND POPULATION VARIANCE OF ELECTRICITY DATA FOR THREE USERS

        User2     User4      User5
Mean    26.269    53.716     40.788
Var.    129.582   1057.596   451.848
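The 2012 train/validation/forecast split and the statistics reported in Table III can be sketched as follows. This is a minimal NumPy sketch on synthetic data: the `load` series is a made-up stand-in for one user's readings rather than the real dataset, and all variable names are illustrative. Note that `ndarray.var()` uses `ddof=0` by default, which is the population variance named in the table title.

```python
import numpy as np

# One timestamp every 15 minutes for all of 2012 (96 points per day).
stamps = np.arange(
    np.datetime64("2012-01-01T00:00"),
    np.datetime64("2013-01-01T00:00"),
    np.timedelta64(15, "m"),
)

# Synthetic stand-in for a single user's load (assumption: not the real data).
rng = np.random.default_rng(0)
minutes = (stamps - stamps[0]) / np.timedelta64(1, "m")
load = 25.0 + 10.0 * np.sin(2 * np.pi * minutes / 1440.0) + rng.normal(0.0, 2.0, stamps.size)

# Date-based split: Jan 1 to Nov 30 / Dec 1 to Dec 30 / Dec 31.
dec_1 = np.datetime64("2012-12-01")
dec_31 = np.datetime64("2012-12-31")
train = load[stamps < dec_1]
val = load[(stamps >= dec_1) & (stamps < dec_31)]
test = load[stamps >= dec_31]

mean = load.mean()
pop_var = load.var()  # ddof=0 by default, i.e. the population variance
print(len(train) // 96, len(val) // 96, len(test) // 96)
```

Splitting by timestamp rather than by row index keeps the three sets aligned to whole days, so the forecast target is exactly one 96-point daily curve.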

Fig. 4. Load curve of three users (value interval: 500)

Fig. 5. Box plot of annual electricity data for three users

The above analysis shows that the load data of user 2 is relatively stable. The load data of user 4 fluctuates the most, but its distribution is relatively uniform. The load data of user 5 fluctuates greatly, and its distribution is concentrated near the median, but the extreme values are far from the median and there are many sudden changes in value.

Since this data may contain more outliers, MAPE is replaced with the mean absolute error (MAE):

$$\mathrm{MAE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{10}$$

Table IV, Table V and Table VI show the RMSE, MAE and WRE of the ARIMA, LSTM, and ARIMA-LSTM models on the dataset. ARIMA-LSTM has the lowest RMSE and MAE for all three users, but not the lowest WRE.

Fig. 6, Fig. 7 and Fig. 8 show the comparison of the results of each model with the true value.

TABLE IV
PERFORMANCE OF INDIVIDUAL MODELS (USER 2)

Model        RMSE     MAE     WRE
ARIMA        2.003    1.709   16.676%
LSTM         1.839    1.631   10.498%
ARIMA-LSTM   1.233    0.751   22.279%

TABLE V
PERFORMANCE OF INDIVIDUAL MODELS (USER 4)

Model        RMSE     MAE     WRE
ARIMA        13.753   7.672   736.260%
LSTM         12.133   7.777   1222.923%
ARIMA-LSTM   11.099   7.443   945.210%

TABLE VI
PERFORMANCE OF INDIVIDUAL MODELS (USER 5)

Model        RMSE     MAE     WRE
ARIMA        12.419   8.028   712.501%
LSTM         13.131   8.330   715.045%
ARIMA-LSTM   12.306   7.558   906.491%

For the data in Fig. 6, the predicted values of ARIMA and LSTM are significantly lower than the actual values at most of the extreme values. The error correction of ARIMA-LSTM addresses this problem and enables the proposed model to fit the actual load curve well.

The data of Fig. 7 can be divided into a high-value part and a low-value part. In the high-value part, ARIMA-LSTM gives a good prediction result. At the junction of the two parts, ARIMA-LSTM converges to the real value faster than LSTM. In the low-value part, ARIMA-LSTM also predicts well.

For the data in Fig. 8, ARIMA-LSTM achieves better prediction performance than ARIMA and LSTM in the less volatile parts. However, in the parts with large fluctuations, the model's prediction of extreme values needs to be improved.

The experimental results show that ARIMA-LSTM also performs well on other datasets and has a certain generality. ARIMA-LSTM performs very well on relatively stationary datasets. However, the model does not perform well on the worst relative error, probably due to ARIMA errors during error correction. In addition, the model performs poorly on the volatile data of user 5.

IV. CONCLUSIONS

The forecast of day-ahead load can provide strategic support for electricity sales companies. Accurate forecasts allow decision makers to better regulate electricity consumption in an orderly manner and specify power generation plans.

Existing prediction models such as LSTM can achieve good results. Building on existing research, this paper improves on these models.

This paper proposes the ARIMA-LSTM model for short-term load forecasting. The model improves prediction accuracy by using ARIMA to correct the model error of LSTM. On the tested data, the prediction error of the ARIMA-LSTM model is usually lower than that of the other models. However, the performance of this model on volatile datasets needs to be improved. In addition, the lack of rules for setting hyperparameters is a shortcoming of this model and is one of the directions for future research. The choice of ARIMA parameters is also an area that can be optimized; this could be improved by using a heuristic algorithm. In conclusion, ARIMA-LSTM is one of the best options for short-term electricity forecasting, and roles such as decision makers and e-commerce retailers can benefit from it.
Fig. 6. Comparison of the predicted results of each model (User2)

Fig. 7. Comparison of the predicted results of each model (User4)

Fig. 8. Comparison of the predicted results of each model (User5)

ACKNOWLEDGEMENT

Research in this paper is supported by the Key Research and Development Program of Jiangxi Province (20212BBE51002).

REFERENCES

[1] Moon J, Hossain M B, Chon K H. AR and ARMA model order selection for time-series modeling with ImageNet classification[J]. Signal Processing, 2021, 183: 108026.
[2] Tang L, Yi Y, Peng Y. An ensemble deep learning model for short-term load forecasting based on ARIMA and LSTM[C]//2019 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm). IEEE, 2019: 1-6.
[3] Satrio C B A, Darmawan W, Nadia B U, et al. Time series analysis and forecasting of coronavirus disease in Indonesia using ARIMA model and PROPHET[J]. Procedia Computer Science, 2021, 179: 524-532.
[4] Ding S, Hipel K W, Dang Y. Forecasting China’s electricity consumption using a new grey prediction model[J]. Energy, 2018, 149: 314-328.
[5] Arvanitidis A I, Bargiotas D, Daskalopulu A, et al. Enhanced Short-Term Load Forecasting Using Artificial Neural Networks[J]. Energies, 2021, 14(22): 7788.
[6] Shi H, Xu M, Li R. Deep learning for household load forecasting—A novel pooling deep RNN[J]. IEEE Transactions on Smart Grid, 2017, 9(5): 5271-5280.
[7] Runge J, Zmeureanu R. Forecasting energy use in buildings using artificial neural networks: A review[J]. Energies, 2019, 12(17): 3254.
[8] Mao G, Wang M, Liu J, et al. Comprehensive comparison of artificial neural networks and long short-term memory networks for rainfall-runoff simulation[J]. Physics and Chemistry of the Earth, Parts A/B/C, 2021, 123: 103026.
[9] Hewamalage H, Bergmeir C, Bandara K. Recurrent neural networks for time series forecasting: Current status and future directions[J]. International Journal of Forecasting, 2021, 37(1): 388-427.
[10] Noh S H. Analysis of Gradient Vanishing of RNNs and Performance Comparison[J]. Information, 2021, 12(11): 442.
[11] Pascanu R, Mikolov T, Bengio Y. On the difficulty of training recurrent neural networks[C]//International conference on machine learning. PMLR, 2013: 1310-1318.
[12] Cui C, He M, Di F, et al. Research on power load forecasting method based on LSTM model[C]//2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC). IEEE, 2020: 1657-1660.
[13] Fu R, Zhang Z, Li L. Using LSTM and GRU neural network methods for traffic flow prediction[C]//2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC). IEEE, 2016: 324-328.
[14] Shi X, Chen Z, Wang H, et al. Convolutional LSTM network: A machine learning approach for precipitation nowcasting[J]. Advances in neural information processing systems, 2015, 28.
[15] Graves A, Schmidhuber J. Framewise phoneme classification with bidirectional LSTM networks[C]//Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005. IEEE, 2005, 4: 2047-2052.
