A Hybrid Model Based on Bidirectional Long Short-Term Memory Neural Network and CatBoost for Short-Term Electricity Spot Price Forecasting
To cite this article: Fan Zhang, Hasan Fleyeh & Chris Bales (2020): A hybrid model based on bidirectional long short-term memory neural network and Catboost for short-term electricity spot price forecasting, Journal of the Operational Research Society, DOI: 10.1080/01605682.2020.1843976
ORIGINAL ARTICLE
CONTACT Fan Zhang fzh@du.se Department of Microdata Analysis, Dalarna University, Falun 79188, Sweden
© 2020 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.
short-term demand at the least cost generation resources. In addition, short-term forecasting results can be utilised by a firm to develop bidding strategies to gain maximised profit (Girish & Vijayalakshmi, 2015).

Although neural networks are considered state-of-the-art techniques for forecasting tasks, deep neural networks have not been studied comprehensively with respect to electricity price forecasting, and none of them has been applied to the Nord Pool market. This represents a strong motivation to study deep learning neural networks and their performance in electricity price forecasting. The other motivation is that, in recent years, boosting algorithms have become increasingly popular among researchers for feature selection, and hybrid models have proved capable of tackling complex real-life problems, but there are very few hybrid deep neural network applications in the existing literature.

The main contribution of this article is to propose a novel hybrid approach for short-term electricity spot price forecasting. The proposed approach consists of two main building blocks: CatBoost and a bidirectional long short-term memory (BDLSTM) neural network. The CatBoost algorithm is applied for feature selection and ranking. Conventional boosting algorithms, such as XGBoost (Chen & Carlos, 2016) and LightGBM (Ke et al., 2017), require categorical input variables to be converted into numeric representations before being processed. The CatBoost algorithm, however, automatically converts categorical values into numbers using various statistics on combinations of categorical features as well as combinations of both categorical and numerical features, which reduces the explicit pre-processing. Moreover, the procedure of conventional gradient boosting algorithms is prone to overfitting because models are trained using the same data points in each iteration. To reduce overfitting, a random permutation mechanism is introduced in CatBoost when dividing a given dataset.

In addition, BDLSTM is used as the main forecasting engine of the proposed approach. It tackles the gradient vanishing problem by introducing various gating mechanisms and therefore performs better in learning dependencies of a time series than conventional neural networks. Besides, by preserving information from both past and future, BDLSTM has been proved to be superior to LSTM in various application areas (Graves & Schmidhuber, 2005; Graves et al., 2013). The proposed hybrid approach is novel and has not been found in the state-of-the-art literature.

The rest of this article is organised as follows: Section 2 reviews past literature on applying recurrent neural networks (RNNs) and deep neural networks for electricity price forecasting. Overviews of the combination of LSTMs and CatBoost algorithms are described in Section 3. In Section 4, the proposed forecasting approach and the data used in the experiment are presented in detail. Details of the experiment as well as the analysis of experimental results are presented in Section 5. Finally, limitations and conclusions of this study are summarised in Sections 6 and 7.

2. Literature review

Mirikitani and Nikolaev (2011) proposed an RNN-based approach for one-hour-ahead electricity spot price forecasting. The major contribution of this study was the utilisation of the Expectation Maximisation (EM) algorithm with Kalman filtering and smoothing, which estimates both noise in the data and model uncertainty. Hourly MCP data of Ontario HOEP year 2004 and the Spanish power exchange year 2002 were used in the case study. For the Ontario case, 48 days' data from Spring, Summer, and Winter were selected for training, while the testing set consisted of two weeks' data. The least MAPE of the proposed model was 15.09, 10.21, and 15.71 for Spring, Summer, and Winter, respectively. In terms of the Spanish market dataset, 42 days' data of four seasons prior to the week to be forecasted were used for training. MAPE of the proposed model was 4.87, 10.38, 8.93, and 4.26 for Spring, Summer, Autumn, and Winter, respectively.

Anbazhagan and Kumarappan (2013) applied an Elman neural network for day-ahead electricity price forecasting. The architecture of the proposed network consisted of an input layer with 16 neurons, one hidden layer with 10 neurons, and an output layer with one neuron. Lagged electricity price was used as the input feature. Day-ahead data of the Spanish market 2002 and New York 2010 were used in the case study. For the Spanish market, 42 days prior to the week to be forecasted were used for training. MAPE of the proposed model was 4.11, 4.37, 9.09, and 8.66 for the Winter, Spring, Summer, and Autumn week, respectively. In terms of the result for the New York market, MAPE of the presented model was 5.06, 3.98, 3.30, and 2.93 for the Winter, Spring, Summer, and Autumn week, respectively.

Vardhan and Chintham (2015) presented an Elman neural network to forecast the day-ahead electricity price of a deregulated market. MCP data of the Spanish market were used in the case study; 42 days' data were used to build the model and 16 lagged prices were selected as the model input. The result showed that MAPE of the proposed method was 5.43 for the Winter and 3.00 for the Summer week, respectively. It was also reported that the proposed method outperforms ARIMA, Wavelet-ARIMA, fuzzy neural network, and Wavelet-ARIMA-RBF in terms of MAPE.
Wang et al. (2017) proposed an extended stacked denoising autoencoder based model (RS-SDA) for short-term electricity price forecasting. The proposed method was validated using hourly electricity price data collected from American hubs. Online hourly forecasting and day-ahead hourly forecasting were performed. The proposed method was compared with classical ANN, SVM, multivariate adaptive regression splines (MARS), and the least absolute shrinkage and selection operator (Lasso). Performance metrics used in the study were hit rate (HR), MAPE, and different variations of MAPE. Experiment results showed that the proposed method outperforms the other baseline models considered in the study. One important conclusion of the study was that the performance of the models deteriorates when fluctuations or spikes are present in the series to be forecasted.

Ugurlu et al. (2018) presented a Gated Recurrent Unit (GRU) based recurrent neural network for electricity price forecasting. Hourly price data from 1 January 2013 to 21 December 2016 of the Turkish day-ahead market were employed in the case study. Data from 1 January 2013 to 21 December 2015 were used for training. The trained model was used to forecast the hourly price of the next day by a 24-steps-ahead forecasting approach. Input features consisted of lagged prices along with exogenous variables such as forecast Demand/Supply (D/S), temperature, realised D/S, and balancing market prices. Two groups of case studies were presented: one with shallow (one hidden layer) and one with deep (three hidden layers) architectures. The result showed that deep neural networks outperform shallow networks in most cases.

Lago et al. (2018) proposed a hybrid deep neural network approach for day-ahead electricity spot price forecasting. Two hybrid deep neural network models were presented in the study, namely LSTM-DNN and GRU-DNN. The motivation of LSTM-DNN was to include both a recurrent layer and a regular layer for modelling relations inside the sequential time series data and the non-sequential data. A GRU (Cho et al., 2014) layer was used in GRU-DNN, which is faster to train than LSTM. EPEX-Belgium market data from 1 January 2010 to 31 November 2016 were employed in the case study. sMAPE of LSTM-DNN and GRU-DNN was 13.06 and 13.04, respectively.

Kuo and Huang (2018) presented a hybrid deep neural network model for electricity price forecasting. The proposed hybrid model consisted of two deep neural network layers: CNN and LSTM. In the first step, CNN was used to extract the features, which were fed to LSTM for forecasting. The model input was the historic electricity price of 24 h and the output was the forecasted price of the next hour. PJM Regulation Zone Preliminary Billing Data, which is composed of the regulation market capacity clearing price of every half hour in 2017, was employed in the study. Ten datasets were used, with three months' data per set used for training and one month's data for testing. The average MAE of the proposed hybrid model was 8.85, which was lower than that of a single LSTM and a single CNN.

3. Theoretical background

3.1. Overview of LSTMs

LSTM is a variation of the recurrent neural network (RNN) (Sulehria & Zhang, 2007) which was first proposed in Hochreiter and Schmidhuber (1997). To tackle the problem of vanishing gradients of the conventional recurrent neural network, LSTM cells are introduced in its architecture (Bengio et al., 1994; Hochreiter, 1991). A standard topology of LSTM is shown in Figure 1 (Olah, 2015). At each iteration t, the input of the LSTM cell is x_t and h_t denotes its output. The current cell input and output state are denoted
by \tilde{C}_t and C_t, while the cell output state of the previous time step is denoted by C_{t-1}.

As mentioned earlier, the gated cell structure enables LSTM to model long-term dependencies of sequence data. Gates serve to control cell states of the LSTM by allowing information to pass through optionally. There are three types of gates: the input gate, the forget gate, and the output gate, denoted by i_t, f_t, and o_t, respectively. Values of the cell input state and gates are calculated by Equations (1)–(4):

i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)  (1)
f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)  (2)
o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)  (3)
\tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)  (4)

where W_i, W_f, W_o, W_c denote the weight matrices between the input of the hidden layer and the input gate, the forget gate, the output gate, and the input cell state; U_i, U_f, U_o, U_c denote the weight matrices between the previous cell output state and the input gate, forget gate, output gate, and input cell state; and b_i, b_f, b_o, b_c denote the corresponding bias vectors.

3.2. Overview of bidirectional LSTM (BDLSTM)

BDLSTM is derived from the idea of the bidirectional recurrent neural network (Baldi et al., 1999; Schuster & Paliwal, 1997). In a bidirectional recurrent neural network, each training sequence is presented forwards and backwards to two recurrent networks separately, both of which are connected to the same output layer. This means that the complete sequential information of all points before or after a given point in the sequence can be retrieved using a bidirectional recurrent neural network (Graves & Schmidhuber, 2005). Similarly, in a BDLSTM, sequence data are processed in both directions with a forward LSTM layer and a backward LSTM layer, and these two hidden layers are connected to the same output layer. A standard topology of BDLSTM is shown in Figure 2 (Yildirim, 2018).

Following Equations (1)–(4), at each iteration t, the cell output state C_t and the LSTM layer output h_t are calculated using Equations (5) and (6):

C_t = f_t \odot C_{t-1} + \tilde{C}_t \odot i_t  (5)
h_t = o_t \odot \tanh(C_t)  (6)

BDLSTMs have been applied in the fields of trajectory prediction (Xue et al., 2017; Zhao et al., 2018), speech recognition (Zeyer et al., 2017; Zheng et al., 2016), biomedical event analysis (Wang et al., 2017), natural language processing (Xu et al., 2018), traffic speed prediction (Cui & Wang, 2018), etc. It is reported in the literature that BDLSTM outperforms conventional LSTM in some areas such as frame-wise phoneme classification (Graves & Schmidhuber, 2005) as well as automatic speech recognition and understanding (Graves et al., 2013).
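For illustration, the computations in Equations (1)–(6) and the bidirectional pass can be sketched as a minimal NumPy forward pass. This is a sketch only: the hidden size, weight shapes, and parameter layout below are illustrative assumptions, not the configuration used in this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell update following Equations (1)-(6); W, U, b are
    dicts of weight matrices / bias vectors keyed by 'i', 'f', 'o', 'c'."""
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])      # Eq. (1), input gate
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])      # Eq. (2), forget gate
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])      # Eq. (3), output gate
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # Eq. (4), cell input state
    c_t = f_t * c_prev + c_tilde * i_t                          # Eq. (5), cell output state
    h_t = o_t * np.tanh(c_t)                                    # Eq. (6), layer output
    return h_t, c_t

def bdlstm_outputs(xs, params_fwd, params_bwd, n_hidden):
    """Process the sequence with a forward and a backward LSTM layer and
    concatenate the two hidden sequences, as in a BDLSTM."""
    def run(seq, params):
        h, c = np.zeros(n_hidden), np.zeros(n_hidden)
        outputs = []
        for x_t in seq:
            h, c = lstm_step(x_t, h, c, *params)
            outputs.append(h)
        return outputs
    forward = run(xs, params_fwd)
    backward = run(xs[::-1], params_bwd)[::-1]  # backward layer sees the reversed input
    return [np.concatenate([hf, hb]) for hf, hb in zip(forward, backward)]
```

Each concatenated output vector combines information from both the past and the future of the corresponding time step, which is what gives BDLSTM its advantage over a unidirectional LSTM.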
3.3. Overview of CatBoost

Boosting is an ensemble algorithm that trains and combines weak learners into a strong learner in a systematic manner (Freund & Schapire, 1997). However, pre-processing steps that convert categorical input variables into numeric representations are required by conventional boosting algorithms. For example, one of the most common approaches to pre-process categorical features is one-hot encoding (Micci-Barreca, 2001), which replaces the original categorical feature with binary values for each category. This approach consumes a large amount of memory and is computationally intensive, especially when dealing with categorical features of high cardinality. Another approach to deal with categorical inputs, adopted by the LightGBM algorithm, converts categorical features into gradient statistics at each gradient boosting step. However, this approach results in a high computation cost due to the fact that the statistics calculation is performed for each categorical feature at each step (Prokhorenkova et al., 2018).

A more efficient boosting approach, namely categorical boosting (CatBoost) (Dorogush et al., 2018), is proposed to tackle this problem. To be more specific, a modified target-based statistics (TBS) algorithm is used in CatBoost. Assume a dataset D = {(X_i, Y_i)}_{i=1,...,n}, where X_i = (x_{i,1}, ..., x_{i,m}) is a vector consisting of both numerical and categorical features, m is the number of features, and Y_i \in \mathbb{R} is the corresponding label. First, the dataset is randomly permutated. Then, for each sample, the average value of the label is calculated for samples with the same category value prior to the given one in the permutation. Let \sigma = (\sigma_1, ..., \sigma_n) denote the permutation. The permutated observation x_{\sigma_p, k} is then replaced with \hat{x}_{\sigma_p, k}, calculated by

\hat{x}_{\sigma_p, k} = \frac{\sum_{j=1}^{p-1} [x_{\sigma_j, k} = x_{\sigma_p, k}] \, Y_{\sigma_j} + a \cdot P}{\sum_{j=1}^{p-1} [x_{\sigma_j, k} = x_{\sigma_p, k}] + a}  (7)

where [x_{j,k} = x_{i,k}] = 1 if x_{j,k} = x_{i,k} and 0 otherwise, P denotes the prior value, and a is the corresponding weight. The prior is the average label value for regression and the a priori probability of encountering a positive label for classification. Adding the prior serves to reduce the noise from minor categories (Cestnik, 1990). On the one hand, the proposed method utilises the whole dataset for training; on the other hand, it avoids the overfitting problem by performing random permutations.

Moreover, to overcome the biased-gradients problem of conventional boosting algorithms (Friedman, 2002), a new schema for calculating leaf values when selecting the tree structure is presented in CatBoost. To be more specific, assume F^i denotes the built model and g^i(X_k, Y_k) denotes the gradient value of the k-th training sample after building i trees. To keep the gradient unbiased, for each sample X_k a separate model M_k is trained, which is not updated using a gradient estimate for this sample. The gradient on X_k is estimated using M_k. Then, the resulting tree is scored according to the estimation. Detailed steps of the algorithm are presented in Table 1 (Dorogush et al., 2018).

Table 1. Gradient estimation by CatBoost.
Algorithm: Gradient estimation by CatBoost
1. Input: training data {(X_k, Y_k)}_{k=1}^{n} after random permutation, number of trees I, loss function L(y, a)
2. Initialisation: M_i <- 0 for i = 1, ..., n
3. For iter = 1, ..., I:
     For i = 1, ..., n:
       For j = 1, ..., i-1:
         g_j <- (d/da) L(y_j, a) |_{a = M_i(X_j)}
       M <- BuildOneTree((X_j, g_j) for j = 1, ..., i-1)
       M_i <- M_i + M
4. Output: M_1, ..., M_n; M_1(X_1), M_2(X_2), ..., M_n(X_n)

4. The proposed method and data description

The overall process of the proposed method is shown in Figure 3. After data collection and visualisation, a two-phase feature selection step is performed. Then, the data are normalised and split into training and testing sets for training and testing the model. To verify the effectiveness of the proposed method, historical hourly Stockholm electricity price data of the Nord Pool market are employed as a case study. Details of the proposed method and the dataset are discussed in the following subsections.

4.1. Data description

Nord Pool (as of May 19, 2020; "Nord Pool Website", 2015) runs the leading power market in Europe, with both day-ahead and intraday markets being offered. Four series, randomly selected from each season, are used in this study. Details of each series are shown in Table 2. Plots of the four series are shown in Figures 4–7, respectively. It can be seen from the plots that series 1 and 3 have greater fluctuation compared with the other two series, while series 2 and 4 present stronger seasonality.

4.2. The proposed method

After data collection and visualisation, autocorrelation tests of the four series are performed and the corresponding results are plotted in Figures 8–11. The blue areas represent the approximate 95% confidence intervals of the autocorrelations. Dots that appear outside the blue area are statistically significant, which indicates potential autocorrelation at a 95% confidence level. Initial input features are selected from the lags of the original series according to the Autocorrelation Function (ACF) plots.
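This ACF-based screening step can be sketched as follows. The sketch uses the simple ±1.96/√n white-noise band as the significance threshold; the exact band drawn in the figures may differ, and the lag horizon passed in is a placeholder.

```python
import numpy as np

def significant_lags(series, max_lag):
    """Return the lags whose sample autocorrelation falls outside an
    approximate 95% confidence band -- the 'dots outside the blue area'."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = x @ x
    bound = 1.96 / np.sqrt(n)  # approximate 95% band under a white-noise null
    lags = []
    for lag in range(1, max_lag + 1):
        r = (x[:-lag] @ x[lag:]) / denom  # sample autocorrelation at this lag
        if abs(r) > bound:
            lags.append(lag)
    return lags
```

Lags flagged this way form the initial candidate feature set for each series.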
Apart from the numeric features, there are three categorical variables derived from the dataset, which are the hour of the day, weekend (whether the current day is a weekend or not), and the day name.

Figures 8–11 show that there exists significant correlation near the 200th lag for series 1, and significant autocorrelation values are observed after the 300th lag in series 2 and 3. For series 4, there is significant correlation near the 400th lag. Therefore, the first 200, 400, 400, and 450 lags of the electricity price, along with the three categorical features, are chosen as the initial candidate features for series one to four, respectively.

Table 2. Details of each series.
Series  Season  Start date       End date
1       Spring  4/11/2018 1:00   5/30/2018 0:00
2       Summer  7/3/2018 9:00    8/22/2018 8:00
3       Autumn  9/10/2018 13:00  10/30/2018 11:00
4       Winter  1/5/2018 5:00    2/24/2018 4:00

[Figures 4–7: hourly electricity price of Stockholm (SEK/MWh) plotted against date for series 1–4.]

To eliminate features that present less useful information for forecasting, the initial candidate features are fed to the CatBoost algorithm first. After model fitting, the importance of each feature is calculated by Equation (8):

FeatureImportance = \sum_{trees, leaves} \left( v_1 - \frac{v_1 c_1 + v_2 c_2}{c_1 + c_2} \right)^2 c_1 + \left( v_2 - \frac{v_1 c_1 + v_2 c_2}{c_1 + c_2} \right)^2 c_2  (8)

where v_1, v_2 are the values in the two leaves produced by a split on the feature and c_1, c_2 are the numbers of training samples in those leaves. There are 123, 136, 156, and 143 features selected for series 1–4, respectively. The selected features are presented in Appendix 2.
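For illustration, Equation (8) can be evaluated per feature as in the sketch below. The `splits` argument, listing the pairs of leaves a feature produces across all trees, is a hypothetical representation chosen for clarity; the real calculation is performed internally by CatBoost.

```python
def prediction_values_change(splits):
    """Equation (8): sum, over all splits produced by one feature, the
    weighted squared deviation of the two leaf values v1, v2 (holding
    c1 and c2 training samples) from their weighted mean."""
    score = 0.0
    for v1, c1, v2, c2 in splits:
        avg = (v1 * c1 + v2 * c2) / (c1 + c2)  # weighted mean of the leaf pair
        score += (v1 - avg) ** 2 * c1 + (v2 - avg) ** 2 * c2
    return score
```

Ranking features by this score and keeping the highest-ranked ones yields the reduced feature sets described above.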
After feature selection, top rows with NA values are removed. As a result, Series 1 consists of data points from 19 April 2018, 10:00 am to 30 May 2018, 0:00 am; Series 2 consists of data points from 20 July 2018, 2:00 am to 22 August 2018, 8:00 am; Series 3 consists of data points from 27 September 2018, 6:00 am to 30 October 2018, 11:00 am; and Series 4 consists of data points from 23 January 2018, 0:00 am to 24 February 2018, 4:00 am.

Then, data normalisation is performed by Equation (9):

x_s = (x - \mu) / \sigma  (9)

where \mu denotes the mean value of the series data and \sigma denotes the corresponding standard deviation.

The resultant preprocessed data are split so that 80% of the normalised dataset is used for training and 20% for testing. The experiment is repeated 15 times for each model, and the initial weights of BDLSTM, GRU, LSTM, and MLP are randomly initialised for each run.

The state of the BDLSTM is initialised by predicting on the training data first. After that, to forecast n time steps ahead on the testing set, a one-time-step forecasting approach is used. To be more specific, the prediction of the first testing sample is made by the initialised BDLSTM model. Next, the predicted value is utilised to update the network state; after that, the test sample of the next time step is predicted using the updated model. The process is iterated for the remaining time steps to be predicted.

5. Results and discussions

The forecasting results of the proposed BDLSTM together with the other baseline models (SVR, MLP, ARIMA, ensemble tree, GRU, and LSTM) for each series are presented in Figures 12–19.

Series 1: Figures 12 and 13 show that the proposed BDLSTM model captures the overall trend of the original price. Especially from the beginning time step until approximately the 130th time step, forecast values of the proposed model fit the actual electricity price closely. From approximately the 130th time step afterwards, where spikes and valleys are present, the proposed model tends to underestimate the actual values, although the trend is still fully captured by the model. There are two major reasons that contribute to this phenomenon. The first is the presence of spikes in the entire series. Spikes are generally caused by certain short-term events or gaming behaviours, which are not the long-term trends of market factors. These events are usually subjective and difficult to forecast (Zhao et al., 2007). Amjady and Keynia (2011) reported that price spikes, in general, cannot be modelled effectively by conventional electricity price forecasting approaches due to their highly erratic behaviour and dependency on complex factors (L. Wang et al., 2017). There are particular studies focussing on the forecasting of electricity price spikes (Amjady & Keynia, 2011; Fragkioudaki et al., 2015; Manner et al., 2016; Sandhu et al., 2016; Voronin & Partanen, 2013). However, forecasting of electricity price spikes is not within the scope of this study and is considered future work. The other factor is that one-step-ahead forecasting suffers from the problem of error accumulation, as predicted values serve as inputs to predict the next time step (Chevillon, 2007; Ching-Kang, 2003). Overall, the figures show that the proposed BDLSTM model outperforms the rest.

The plot of MLP shows that the trends of certain time steps, e.g., time steps between 38 and 51, are not captured correctly by the model. The curve of actual prices displays a small peak between time steps 38 and 51; however, the curve predicted by MLP shows a small valley. The SVR model tends to overestimate peak values and underestimate valleys more than the proposed model; e.g., at around time steps 135 and 152, predicted values of SVR deviate far from actual prices. Ensemble tree outperforms the MLP and SVR models. However, predicted values of the ensemble tree model are over-smoothed, this being
Figure 12. A plot of the actual price and the predictions by non-deep learning models for series 1.
Figure 13. A plot of the actual price and the predictions by deep learning models for series 1.
more pronounced for series two and three than for series one and four. Therefore, the ensemble tree fails to capture the detailed dynamics presented in the series. The ARIMA model overestimates for almost all forecasting time steps and predicts a constant price after approximately time step 60. In addition, predictions of the GRU model deviate from actual prices at the start and the end of the forecasting time steps, while LSTM shows an overall better performance than that of GRU, though it underestimates the valleys more than BDLSTM.

Series 2: this series contains certain seasonality, but with fewer spikes and valleys than series 1. The proposed model fits the data very well for series 2
Figure 14. A plot of the actual price and the predictions by non-deep learning models for series 2.
Figure 15. A plot of the actual price and the predictions by deep learning models for series 2.
as depicted in Figures 14 and 15, with slightly underestimated predictions between time steps 20 and 40 as well as time steps 80 and 100. MLP and SVR overestimate high values as well as underestimate low values. ARIMA predicts a similar seasonality to that presented in the actual series, though predicted values are overestimated for almost all forecasting time steps. In terms of the forecasting results of the deep learning models, LSTM tends to overestimate the peaks, while GRU predictions deviate more from time steps 50 to 60 and the valley at time step 110 is underestimated more by GRU
Figure 16. A plot of the actual price and the predictions by non-deep learning models for series 3.
Figure 17. A plot of the actual price and the predictions by deep learning models for series 3.
compared to the other two deep learning models. Due to the presence of spikes and less significant valleys in Series 2 compared to Series 1, the proposed model tends to overestimate these peak values rather than underestimate valleys.

Series 3: as depicted in Figures 16 and 17, the overall trend is well captured by BDLSTM; however, the same problem of predicting valleys and spikes remains at approximately time steps 9 and 61. MLP underestimates valleys and overestimates peaks over almost the entire series, while predictions by the ensemble tree remain the same after the 20th time step. ARIMA suffers from a similar over-smoothing problem as the ensemble tree does. In addition, the GRU
Figure 18. A plot of the actual price and the predictions by non-deep learning models for series 4.
Figure 19. A plot of the actual price and the predictions by deep learning models for series 4.
model shows a relatively large deviation from the actual price at approximately time step 58, while LSTM underestimates the peak between time steps 60 and 70. Overall, the predicted values of the proposed model follow the actual prices more closely than the predicted values of the rest of the models.

Series 4: from Figures 18 and 19, the predicted values of BDLSTM follow the actual prices closely; obvious deviation from the actual prices is not observed for almost all time steps. Besides, the over-smoothing problem of the ensemble tree and ARIMA is less obvious. It is also observed that the forecasted values of the ARIMA model display certain seasonality present in the actual electricity prices. However, other dynamics such as trend, peaks, and valleys are not captured properly by ARIMA. This is due to the fact that when complex nonlinear dynamics are present in a time series, the performance of
ARIMA suffers (Deb et al., 2017). Concerning SVR, other models. Ensemble tree is the most time effi-
the major problem is that it overestimates the peak cient model in terms of forecasting efficiency but
values of the actual prices most among all models has lower accuracy. To further compare predictive
considered. In terms of the performance of deep accuracy, a Diebold–Mariano test (Diebold &
learning models, except BDLSTM, the other two Mariano, 2012) is performed. The null hypothesis is
models suffer to predict the first forecasting time that the proposed BDLSTM is as accurate as the
steps. To be more specific, the GRU model shows noisy predictions at the first 20 forecasting time steps. On the contrary, the LSTM predictions are over-smoothed before the 40th time step. The plots of Series 4 show more regular seasonality patterns than the other three series. Therefore, the forecasting errors of the proposed model are more evenly distributed, instead of showing clearly overestimated or underestimated predictions at certain time steps, especially where peaks and valleys are present, as is the case for the other three series.

In general, BDLSTM outperforms the other models for all series: SVR and MLP tend to overestimate peaks and underestimate valleys, while the predicted values of the ensemble tree and ARIMA are over-smoothed. The error measures adopted in this study are MAPE, RMSE, and MAE. Apart from these error measures, the training and forecasting time of each model is also measured. Average results are reported in Table 3, and the best-run results with the lowest errors for each model are presented in Table 4. The detailed results of all 15 runs can be found in Appendix 1.

The results show that the proposed model outperforms the other models considered in the experiment for all series on the chosen error measures. The lowest MAPE is achieved by the proposed model, with values of 7.015%, 4.441%, 5.265%, and 7.137% for Series 1–4, respectively. However, the average time for training the BDLSTM model is higher than for training the other models except ARIMA. Furthermore, the trained BDLSTM is less time efficient in forecasting than the other models.

For the Diebold–Mariano test, the null hypothesis is that BDLSTM is as accurate as the other model it is compared with, while the alternative hypothesis is that the model to be compared with is less accurate than BDLSTM. The p values of the Diebold–Mariano test are reported in Table 5. The test results show that BDLSTM is more accurate than the other models considered in this study for almost all tested series at a significance level of 0.05, apart from the Series 3 forecasting result of SVR and the Series 1 forecasting result of LSTM. Although the overall errors of BDLSTM reported in Tables 3 and 4 are lower, there is not enough evidence to conclude that BDLSTM is more accurate than SVR for Series 3 or than LSTM for Series 1 at a significance level of 0.05 in this case.

6. Limitations and notes for future researchers

According to the latest review of short-term electricity price forecasting (Zhang & Fleyeh, 2019), there is a lack of a recognised benchmarking procedure and of a standard dataset for benchmarking different models, which makes the direct comparison of different results difficult. To be more specific, researchers use different error measures, datasets, lengths of training/testing periods, start/end dates, and so forth. To make future benchmarking easier, instructions for accessing the dataset used in this study are provided: the dataset can be downloaded from the Nord Pool ("Nord Pool Historical Market Data") website in ".xls" format, and the filtering criteria used to retrieve the historical hourly electricity spot prices are "Elspot Prices" in the "Filter by category" filter and "Hourly" in the "Filter by resolution" filter.
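For reference, the three error measures adopted in Section 5 (MAPE, RMSE, and MAE) can be computed as in the minimal sketch below; the arrays are synthetic illustrations, not values from the study:

```python
import numpy as np

def mape(actual, forecast):
    # Mean absolute percentage error, in percent.
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)

def rmse(actual, forecast):
    # Root mean squared error, in the units of the series (e.g. EUR/MWh).
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

def mae(actual, forecast):
    # Mean absolute error.
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs(actual - forecast)))

actual = [40.0, 42.0, 38.0, 45.0]    # synthetic hourly spot prices
forecast = [41.0, 40.0, 39.0, 44.0]  # synthetic model output
print(mape(actual, forecast), rmse(actual, forecast), mae(actual, forecast))
```

Note that MAPE is scale-free but inflates when actual values approach zero, which is worth keeping in mind for electricity prices that can dip towards zero.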
Table 5. DM test p values of models to be compared with the proposed BDLSTM for each series.
Series BDLSTM-MLP BDLSTM-SVR BDLSTM-ET BDLSTM-ARIMA BDLSTM-GRU BDLSTM-LSTM
Series 1 4.18E-05 6.70E-06 5.53E-05 1.46E-13 8.26E-06 0.09303
Series 2 1.13E-05 5.72E-08 6.17E-04 <2.2e-16 1.29E-04 1.13E-05
Series 3 0.01191 0.08409 0.02924 4.34E-15 4.21E-06 5.28E-05
Series 4 2.77E-05 3.39E-08 2.39E-03 <2.2e-16 5.03E-06 7.77E-05
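The p values in Table 5 come from one-sided Diebold–Mariano tests. The following sketch shows the basic form of such a test under a squared-error loss and the asymptotic normal approximation; the exact loss function and any small-sample or autocorrelation corrections used in the study may differ:

```python
import numpy as np
from math import erf, sqrt

def dm_test_pvalue(e1, e2):
    """One-sided Diebold-Mariano test on two forecast-error series.

    H0: both forecasts are equally accurate (squared-error loss).
    H1: forecast 1 (e.g. BDLSTM) is more accurate than forecast 2.
    Returns the p value under the asymptotic N(0, 1) approximation,
    ignoring autocorrelation in the loss differential.
    """
    e1, e2 = np.asarray(e1, float), np.asarray(e2, float)
    d = e1 ** 2 - e2 ** 2                          # loss differential
    dm = d.mean() / sqrt(d.var(ddof=1) / d.size)   # DM statistic
    return 0.5 * (1.0 + erf(dm / sqrt(2.0)))       # P(Z <= dm); small => reject H0

rng = np.random.default_rng(0)
e_bdlstm = rng.normal(0.0, 1.0, 500)  # synthetic errors of a more accurate model
e_other = rng.normal(0.0, 1.5, 500)   # synthetic errors of a less accurate model
p = dm_test_pvalue(e_bdlstm, e_other)
```

With the synthetic errors above, the p value is far below 0.05, mirroring how most entries in Table 5 are read: a small p value rejects equal accuracy in favour of BDLSTM.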
The motivation for providing these dataset-access instructions is to make it easy for other researchers to obtain the dataset used in this study, and to encourage future researchers to use the published dataset for benchmarking, or to upload their own datasets and provide instructions on how to access them where possible. In addition, it is advised to use the same error measures, training/testing lengths, and other factors involved in the benchmarking procedure.

7. Conclusion and future work

In this article, a novel hybrid model is proposed for short-term electricity price forecasting. It combines the CatBoost algorithm for feature selection with a BDLSTM neural network for forecasting. The major advantages of the proposed method are that categorical features are handled more efficiently and that BDLSTM is superior to the other methods at modelling complex dependencies inside the series data. The experimental results show that the proposed approach outperforms the other models in terms of MAPE, RMSE, and MAE, both for series with large fluctuations of electricity prices and for series with smaller fluctuations but with seasonality present. A limitation of the proposed model is that it consumes more time for both training and forecasting compared with the other models.

There are four suggestions for future studies. Firstly, optimisation techniques such as particle swarm optimisation (PSO), genetic algorithms (GA), and differential evolution (DE) can be explored to optimise the model structure and parameters, such as the number of hidden layers, the number of neurons, and the weights of each connection. Secondly, to reduce the accumulated error introduced by the one-step-ahead forecasting approach, other approaches can be examined, such as training a separate model for each time horizon using only past observations (the multi-step-ahead approach). Thirdly, electricity price spikes should be handled more carefully by a separate approach to further improve the forecasting accuracy. Fourthly, to enhance time efficiency, graphics processing units (GPUs) and parallel computing techniques can be considered. Finally, instructions for accessing the dataset used in this study are provided so that it can serve as a benchmarking dataset for future researchers, and suggestions for making future benchmarking procedures smoother are given as well.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

Amjady, N., & Keynia, F. (2011). A new prediction strategy for price spike forecasting of day-ahead electricity markets. Applied Soft Computing Journal, 11(6), 4246–4256. https://doi.org/10.1016/j.asoc.2011.03.024
Anbazhagan, S., & Kumarappan, N. (2013). Day-ahead deregulated electricity market price forecasting using recurrent neural network. IEEE Systems Journal, 7(4), 866–872. https://doi.org/10.1109/JSYST.2012.2225733
Baldi, P., Brunak, S., Frasconi, P., Soda, G., & Pollastri, G. (1999). Exploiting the past and the future in protein secondary structure prediction. Bioinformatics, 15(11), 937–946. https://doi.org/10.1093/bioinformatics/15.11.937
Barnes, P. M. (2017). The politics of nuclear energy in the European Union: Framing the discourse. Barbara Budrich Publishers.
Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166. https://doi.org/10.1109/72.279181
Berglund, S. (2009). Putting politics into perspective: A study of the implementation of EU public utilities directives. Eburon Uitgeverij B.V.
Cabero, J., Baillo, A., Cerisola, S., Ventosa, M., Garcia-Alcalde, A., Peran, F., & Relano, G. (2005). A medium-term integrated risk management model for a hydrothermal generation company. IEEE Transactions on Power Systems, 20(3), 1379–1388. https://doi.org/10.1109/TPWRS.2005.851934
Cestnik, B. (1990). Estimating probabilities: A crucial task in machine learning. In ECAI'90: Proceedings of the 9th European Conference on Artificial Intelligence (pp. 147–149).
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). https://doi.org/10.1145/2939672.2939785
Chevillon, G. (2007). Direct multi-step estimation and forecasting. Journal of Economic Surveys, 21(4), 746–785. https://doi.org/10.1111/j.1467-6419.2007.00518.x
Ching-Kang, I. (2003). Multistep prediction in autoregressive processes. Econometric Theory, 19(2), 254–279. https://doi.org/10.1017/S0266466603192031
Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. http://arxiv.org/abs/1406.1078
Cui, Z., & Wang, Y. (2018). Deep stacked bidirectional and unidirectional LSTM recurrent neural network for network-wide traffic speed prediction. In 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 22–25). Association for Computing Machinery.
Deb, C., Zhang, F., Yang, J., Lee, S. E., & Shah, K. W. (2017). A review on time series forecasting techniques for building energy consumption. Renewable and Sustainable Energy Reviews, 74, 902–924. https://doi.org/10.1016/j.rser.2017.02.085
Diebold, F. X., & Mariano, R. S. (2012). Comparing predictive accuracy. Journal of Business & Economic Statistics, 13(3), 253–263. https://doi.org/10.1080/07350015.1995.10524599
Dorogush, A. V., Ershov, V., & Gulin, A. (2018). CatBoost: Gradient boosting with categorical features support. arXiv:1810.11363.
Fragkioudaki, A., Marinakis, A., & Cherkaoui, R. (2015). Forecasting price spikes in European day-ahead electricity markets using decision trees. In 2015 International Conference on the European Energy Market (EEM). https://doi.org/10.1109/EEM.2015.7216672
Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139. https://doi.org/10.1006/jcss.1997.1504
Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367–378. https://doi.org/10.1016/S0167-9473(01)00065-2
Girish, G. P., & Vijayalakshmi, S. (2015). Role of energy exchanges for power trading in India. International Journal of Energy Economics and Policy, 5(3), 673–676.
Graves, A., & Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5-6), 602–610. https://doi.org/10.1016/j.neunet.2005.06.042
Graves, A., Jaitly, N., & Mohamed, A. (2013). Hybrid speech recognition with deep bidirectional LSTM. In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (pp. 273–278). https://doi.org/10.1109/ASRU.2013.6707742
Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen [Diploma thesis]. Technische Universität München.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., & Liu, T. (2017). LightGBM: A highly efficient gradient boosting decision tree. In NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems (pp. 3149–3157). Curran Associates Inc.
Kuo, P.-H., & Huang, C.-J. (2018). An electricity price forecasting model by hybrid structured deep neural networks. Sustainability, 10(4), 1280. https://doi.org/10.3390/su10041280
Lago, J., De Ridder, F., & De Schutter, B. (2018). Forecasting spot electricity prices: Deep learning approaches and empirical comparison of traditional algorithms. Applied Energy, 221, 386–405. https://doi.org/10.1016/j.apenergy.2018.02.069
Manner, H., Türk, D., & Eichler, M. (2016). Modeling and forecasting multivariate electricity price spikes. Energy Economics, 60, 255–265. https://doi.org/10.1016/j.eneco.2016.10.006
Micci-Barreca, D. (2001). A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems. ACM SIGKDD Explorations Newsletter, 3(1), 27–32. https://doi.org/10.1145/507533.507538
Mirikitani, D., & Nikolaev, N. (2011). Nonlinear maximum likelihood estimation of electricity spot prices using recurrent neural networks. Neural Computing and Applications, 20(1), 79–89. https://doi.org/10.1007/s00521-010-0344-1
Nord Pool Historical Market Data. (n.d.). Nord Pool. Retrieved November 5, 2020, from https://www.nordpoolgroup.com/historical-market-data/
Nord Pool Website. (2015). Nord Pool. Retrieved November 5, 2020, from https://www.nordpoolgroup.com/
Olah, C. (2015). Understanding LSTM networks. Retrieved November 5, 2020, from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Pandey, N., & Upadhyay, K. G. (2016). Different price forecasting techniques and their application in deregulated electricity market: A comprehensive study. In 2016 International Conference on Emerging Trends in Electrical Electronics & Sustainable Energy Systems (ICETEESES) (pp. 1–4). https://doi.org/10.1109/ICETEESES.2016.7581342
Pezzutto, S., Grilli, G., Zambotti, S., & Dunjic, S. (2018). Forecasting electricity market price for end users in EU28 until 2020—Main factors of influence. Energies, 11(6), 1418–1460. https://doi.org/10.3390/en11061460
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. (2018). CatBoost: Unbiased boosting with categorical features. In NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems (pp. 6639–6649). ACM.
Sandhu, H. S., Fang, L., & Guan, L. (2016). Forecasting day-ahead price spikes for the Ontario electricity market. Electric Power Systems Research, 141, 450–459. https://doi.org/10.1016/j.epsr.2016.08.005
Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), 2673–2681. https://doi.org/10.1109/78.650093
Sulehria, H. K., & Zhang, Y. (2007). Hopfield neural networks: A survey. In AIKED'07: Proceedings of the 6th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases (Vol. 6, pp. 125–130).
Ugurlu, U., Oksuz, I., & Tas, O. (2018). Electricity price forecasting using recurrent neural networks. Energies, 11(5), 1255. https://doi.org/10.3390/en11051255
Vardhan, N. H., & Chintham, V. (2015). Electricity price forecasting of deregulated market using Elman neural network. In 2015 Annual IEEE India Conference (INDICON) (pp. 1–5). https://doi.org/10.1109/INDICON.2015.7443460
Ventosa, M., Baíllo, A., Ramos, A., & Rivier, M. (2005). Electricity market modeling trends. Energy Policy, 33(7), 897–913. https://doi.org/10.1016/j.enpol.2003.10.013
Voronin, S., & Partanen, J. (2013). Price forecasting in the day-ahead energy market by an iterative method with separate normal price and price spike frameworks. Energies, 6(11), 5897–5920. https://doi.org/10.3390/en6115897
Wang, L., Zhang, Z., & Chen, J. (2017). Short-term electricity price forecasting with stacked denoising autoencoders. IEEE Transactions on Power Systems, 32(4), 2673–2681. https://doi.org/10.1109/TPWRS.2016.2628873
Wang, Y., Wang, J., Lin, H., Zhang, S., & Li, L. (2017). Biomedical event trigger detection based on bidirectional LSTM and CRF. In 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 445–450). https://doi.org/10.1109/BIBM.2017.8217689
Weron, R. (2014). Electricity price forecasting: A review of the state-of-the-art with a look into the future. International Journal of Forecasting, 30(4), 1030–1081. https://doi.org/10.1016/j.ijforecast.2014.08.008
Xu, C., Xie, L., & Xiao, X. (2018). A bidirectional LSTM approach with word embeddings for sentence boundary detection. Journal of Signal Processing Systems, 90(7), 1063–1075. https://doi.org/10.1007/s11265-017-1289-8
Xue, H. Q., Huynh, D., & Reynolds, M. (2017). Bi-prediction: Pedestrian trajectory prediction based on bidirectional LSTM classification. In 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA) (pp. 1–8). https://doi.org/10.1109/DICTA.2017.8227412
Yıldırım, Ö. (2018). A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification. Computers in Biology and Medicine, 96, 189–202. https://doi.org/10.1016/j.compbiomed.2018.03.016
Zeyer, A., Doetsch, P., Voigtlaender, P., & Schlüter, R. (2017). A comprehensive study of deep bidirectional LSTM RNNs for acoustic modeling in speech recognition. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2462–2466). https://doi.org/10.1109/ICASSP.2017.7952599
Zhang, F., & Fleyeh, H. (2019). A review of single artificial neural network models for electricity spot price forecasting. In 2019 16th International Conference on the European Energy Market (EEM) (pp. 1–6). https://doi.org/10.1109/EEM.2019.8916423
Zhao, J. H., Dong, Z. Y., Li, X., & Wong, K. P. (2007). A framework for electricity price spike analysis with advanced data mining methods. IEEE Transactions on Power Systems, 22(1), 376–385. https://doi.org/10.1109/TPWRS.2006.889139
Zhao, Y., Yang, R., Chevalier, G., Shah, R. C., & Romijnders, R. (2018). Applying deep bidirectional LSTM and mixture density network for basketball trajectory prediction. Optik – International Journal for Light and Electron Optics, 158, 266–272. https://doi.org/10.1016/j.ijleo.2017.12.038
Zheng, D., Chen, Z., Wu, Y., & Yu, K. (2016). Directed automatic speech transcription error correction using bidirectional LSTM. In 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) (pp. 1–5). https://doi.org/10.1109/ISCSLP.2016.7918446
Continued.
Series 1 Series 2 Series 3 Series 4
Feature Score Feature Score Feature Score Feature Score
Lag_174 0.197 Lag_275 0.434 Lag_214 0.302 Lag_195 0.362
Lag_11 0.188 Lag_20 0.425 Lag_62 0.297 Lag_316 0.351
Lag_197 0.186 Lag_266 0.422 Lag_190 0.289 Lag_197 0.341
Lag_190 0.183 Lag_243 0.419 Lag_51 0.282 Lag_271 0.333
Lag_127 0.181 Lag_228 0.398 Lag_335 0.281 Lag_272 0.329
Lag_59 0.170 Lag_316 0.392 Lag_115 0.272 Lag_71 0.323
Lag_156 0.169 Lag_17 0.380 Lag_313 0.264 Lag_410 0.320
Lag_27 0.169 Lag_223 0.368 Lag_262 0.258 Lag_189 0.317
Lag_164 0.167 Lag_111 0.366 Lag_176 0.227 Lag_5 0.311
Lag_86 0.164 Lag_287 0.365 Lag_178 0.227 Lag_310 0.285
Lag_189 0.163 Lag_154 0.363 Lag_124 0.224 Lag_236 0.281
Lag_94 0.157 Lag_289 0.335 Lag_167 0.222 Lag_432 0.280
Lag_52 0.150 Lag_200 0.333 Lag_158 0.221 Lag_356 0.272
Lag_5 0.148 Lag_80 0.326 Lag_159 0.216 Lag_263 0.269
Lag_173 0.132 Lag_302 0.321 Lag_30 0.214 Lag_23 0.255
Lag_165 0.126 Lag_199 0.319 Weekend 0.206 Lag_302 0.251
Lag_98 0.123 Lag_69 0.312 Lag_381 0.201 Lag_66 0.246
Lag_198 0.123 Lag_140 0.303 Lag_271 0.198 Lag_433 0.241
Lag_89 0.117 Lag_334 0.288 Lag_100 0.192 Lag_342 0.234
Lag_72 0.115 Lag_157 0.272 Lag_253 0.186 Lag_175 0.220
Lag_69 0.115 Lag_6 0.258 Lag_391 0.182 Lag_293 0.217
Lag_26 0.113 Lag_229 0.246 Lag_85 0.176 Lag_174 0.213
Lag_159 0.110 Lag_171 0.246 Lag_292 0.175 Lag_116 0.213
Lag_192 0.109 Lag_226 0.240 Lag_303 0.170 Lag_347 0.211
Lag_66 0.104 Lag_56 0.209 Lag_34 0.159 Lag_338 0.209
Lag_163 0.103 Lag_377 0.209 Lag_168 0.157 Lag_411 0.192
Lag_188 0.101 Lag_208 0.208 Lag_279 0.152 Lag_163 0.176
Lag_166 0.090 Lag_49 0.199 Lag_29 0.148 Lag_136 0.174
Lag_110 0.089 Lag_361 0.198 Lag_246 0.145 Lag_192 0.173
Lag_162 0.088 Lag_272 0.183 Lag_98 0.140 Lag_11 0.171
Lag_122 0.086 Lag_246 0.173 Lag_228 0.139 Lag_37 0.171
Lag_113 0.081 Lag_63 0.163 Lag_268 0.138 Lag_210 0.171
Lag_87 0.079 Lag_325 0.161 Lag_22 0.137 Lag_73 0.158
Lag_12 0.075 Lag_270 0.159 Lag_166 0.133 Lag_350 0.154
Lag_85 0.073 Lag_306 0.147 Lag_384 0.132 Lag_33 0.153
Lag_144 0.073 Lag_350 0.132 Lag_53 0.125 Lag_262 0.142
Lag_167 0.072 Lag_8 0.131 Lag_353 0.122 Lag_29 0.140
Lag_128 0.069 Lag_14 0.124 Lag_107 0.119 Lag_96 0.140
Lag_81 0.067 Lag_370 0.123 Lag_306 0.116 Lag_279 0.139
Lag_40 0.067 Lag_41 0.123 Lag_270 0.116 Lag_273 0.139
Lag_6 0.066 Lag_330 0.118 Lag_336 0.116 Lag_41 0.136
Lag_82 0.065 Lag_40 0.112 Lag_157 0.115 Lag_63 0.133
Lag_172 0.063 Lag_207 0.108 Lag_379 0.114 Lag_164 0.132
Lag_14 0.063 Lag_237 0.104 Lag_194 0.112 Lag_276 0.128
Lag_28 0.062 Lag_85 0.104 Lag_234 0.110 Lag_94 0.126
Lag_77 0.062 Lag_188 0.103 Lag_4 0.110 Lag_68 0.123
Day_name 0.059 Lag_322 0.094 Lag_233 0.110 Lag_10 0.122
Lag_79 0.059 Lag_283 0.093 Lag_174 0.109 Lag_172 0.118
Lag_50 0.058 Lag_354 0.090 Lag_31 0.109 Lag_225 0.117
Lag_30 0.052 Lag_213 0.078 Lag_32 0.106 Lag_9 0.117
Lag_179 0.051 Lag_55 0.077 Lag_239 0.104 Lag_14 0.114
Lag_10 0.051 Lag_176 0.074 Lag_298 0.098 Lag_123 0.113
(– indicates that the corresponding series has no further features.)
– – Lag_329 0.071 Lag_142 0.095 Lag_274 0.112
– – Lag_145 0.070 Lag_149 0.095 Lag_321 0.112
– – Lag_192 0.070 Lag_192 0.095 Lag_89 0.112
– – Lag_339 0.069 Lag_217 0.094 Lag_397 0.111
– – Lag_47 0.066 Lag_273 0.092 Lag_49 0.099
– – Lag_28 0.064 Lag_321 0.092 Lag_183 0.098
– – Lag_126 0.064 Lag_278 0.092 Lag_205 0.095
– – Lag_364 0.058 Lag_396 0.085 Lag_101 0.083
– – Lag_23 0.054 Lag_251 0.083 Lag_150 0.083
– – Lag_286 0.054 Lag_369 0.079 Lag_122 0.083
– – Lag_273 0.053 Lag_119 0.078 Lag_111 0.083
– – Lag_311 0.052 Lag_141 0.078 Lag_85 0.080
– – Lag_7 0.050 Lag_305 0.077 Lag_204 0.080
– – – – Lag_329 0.076 Lag_74 0.075
– – – – Lag_106 0.074 Lag_360 0.070
– – – – Lag_113 0.074 Lag_213 0.070
– – – – Lag_221 0.073 Lag_340 0.060
– – – – Lag_165 0.073 Lag_389 0.054
– – – – Lag_104 0.068 Lag_126 0.053
– – – – Lag_72 0.066 Lag_64 0.050
– – – – Lag_318 0.061 – –
– – – – Lag_45 0.059 – –
– – – – Lag_236 0.059 – –
(continued)
Continued.
Series 1 Series 2 Series 3 Series 4
Feature Score Feature Score Feature Score Feature Score
– – – – Lag_83 0.059 – –
– – – – Lag_280 0.058 – –
– – – – Lag_212 0.057 – –
– – – – Lag_238 0.057 – –
– – – – Lag_359 0.056 – –
– – – – Lag_290 0.054 – –
– – – – Lag_259 0.053 – –
– – – – Lag_41 0.052 – –
– – – – Lag_25 0.052 – –
– – – – Lag_120 0.051 – –
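A note on reading these tables: each "Lag_k" feature denotes the spot price k hours before the hour being forecast, and "Weekend" and "Day_name" are calendar features; the tables list each retained feature with its CatBoost importance score. A minimal sketch of how such lag features can be constructed (the study's own preprocessing may differ; the function and column names follow the tables):

```python
import pandas as pd

def make_lag_features(prices: pd.Series, lags) -> pd.DataFrame:
    """Build Lag_k columns (price k steps earlier) plus a Weekend flag."""
    feats = {f"Lag_{k}": prices.shift(k) for k in lags}
    out = pd.DataFrame(feats)
    out["Weekend"] = (prices.index.dayofweek >= 5).astype(int)
    return out.dropna()  # drop the first max(lags) rows, which lack full history

# Tiny synthetic hourly price series for illustration.
idx = pd.date_range("2018-01-01", periods=10, freq="h")
prices = pd.Series(range(10), index=idx, dtype=float)
X = make_lag_features(prices, lags=[1, 5])
```

With lags=[1, 5], the first usable row is the sixth hour, whose Lag_1 and Lag_5 columns hold the prices one and five hours earlier.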