Jin Xiao, Yuxi Li, Ling Xie, Dunhu Liu, Jing Huang
PII: S0360-5442(18)31226-X
DOI: 10.1016/j.energy.2018.06.161
Reference: EGY 13203
Please cite this article as: Xiao J, Li Y, Xie L, Liu D, Huang J, A hybrid model based on
selective ensemble for energy consumption forecasting in China, Energy (2018), doi: 10.1016/
j.energy.2018.06.161.
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to
our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and all
legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
A hybrid model based on selective ensemble for energy consumption forecasting in China

Jin Xiao a, Yuxi Li a, Ling Xie a, Dunhu Liu b, Jing Huang c,1

c School of Public Administration, Sichuan University, Chengdu 610064, China
Abstract: It is of great significance to develop accurate forecasting models for China's energy consumption. Energy consumption time series are often complex and nonlinear, and a single model cannot achieve satisfactory forecasting results. Therefore, in recent years, more and more scholars have tried to build hybrid models to handle this issue, among which the divide and rule method is the most popular. However, the existing divide and rule models often predict the decomposed energy consumption subseries with a single forecasting model. This study introduces the group method of data handling (GMDH) technique for energy consumption forecasting in China, and constructs a hybrid forecasting model based on GMDH selective ensemble. It mainly focuses on predicting the nonlinear variation of energy consumption. The model first predicts the linear trend of the energy consumption time series through the GMDH-based autoregressive model and then obtains the residual subseries of energy consumption. Considering the highly nonlinear characteristics of the residual subseries, this study introduces AdaBoost ensemble technology to enhance the forecasting performance of four single nonlinear prediction models, the back propagation neural network, support vector regression machine, genetic programming, and radial basis function neural network, respectively, to obtain four different versions of the ensemble model on the nonlinear subseries. Further, the prediction results of these four AdaBoost ensemble models are used as the initial input, and the selective combination prediction for the nonlinear subseries is obtained by using GMDH. Finally, the two parts are added up to obtain the final prediction. The empirical analysis of total energy consumption and total oil consumption in China shows that the forecasting performance of the proposed model is better than that of the GMDH-based autoregressive model and seven other hybrid models, and this study gives the out-of-sample forecasts of the two time series from 2015 to 2020.
Key words: prediction of energy consumption; GMDH; AdaBoost ensemble technology; selective ensemble
1. Introduction
Since the economic reform ("reform and opening-up"), the Chinese economy has developed rapidly, and energy consumption has increased continuously. The BP Statistical Review of World Energy 2016 [1] pointed out that the Chinese economy grew slowly and was undergoing a structural transformation, while China remained the country with the largest energy consumption, production, and net imports in the world. In 2015, China's energy consumption was 23% of the total global consumption, and comprised 34% of the net increase in global energy consumption. Among fossil energies, China's consumption increase rate for oil was the fastest, at 6.7%. Among non-fossil energies, solar energy increased the fastest, at 69.7%; China surpassed Germany and the USA and became the largest solar electricity generation country in the world. Therefore, it is of realistic significance to construct a scientific energy consumption model and accurately predict the future gap between supply and demand, for sustainable economic and social development, energy industry development, the reasonable use of energy resources, the construction of a conservation-minded society, and the creation of a national energy strategy.

1 Corresponding author. E-mail address: 147895715@qq.com.
Nomenclature
ARIMA  autoregressive integrated moving average
GM  grey prediction model
ANN  artificial neural network
DR  demand response
MLP  multi-layer perceptron
ANFIS  adaptive neuro fuzzy inference system
DEMD  differential empirical mode decomposition
LSSVR  least square support vector regression
IMF  intrinsic mode function
GMDH  group method of data handling
HFGSE  hybrid forecasting model based on GMDH selective ensemble
AS  asymmetric stability
MR  mean regularization
RMSE  root mean square error
SRMSE  symmetrical root mean squared error
SMAPE  symmetrical mean absolute percentage error
yt  original energy consumption time series
e_t  nonlinear subseries of the energy consumption
ê_t(i)  forecasting result of the t-th sample by the i-th ensemble model on the nonlinear subseries
ŷt(W)  forecasted output of the t-th sample in the entire training set W by the model trained on W
ŷt(A)  forecasted output of the t-th sample in the model selection set B by the model trained on the model learning set A
ŷt(B)  forecasted output of the t-th sample in the model learning set A by the model trained on the model selection set B
mA  sample size of the model learning set A
mB  sample size of the model selection set B
a  coefficient vector in Eq. (1)
m  number of initial models in Eq. (3)
wk  estimated output in Eq. (4)
ετ  relative forecasting error in the τ-th iteration
βτ  weight of the weak learner in the τ-th iteration
k  the largest lagged order of the BP neural network
γ  kernel width of SVR
1.1. Literature review
With social development and progress, people have realized the important effect of energy on economic development. After the energy crises of 1973 and 1979, the entire world became conscious of the constraint energy places on the economy and of the significance of consumption forecasting. During that period, a great deal of research on energy consumption demand forecasting appeared abroad. Dupree and Corsentino [2] presented the future energy consumption within the major consuming sectors and the energy supply sources of America. Thompson [3] proposed a weather-sensitive electric load and energy forecasting method which could be used in both long-term and short-term prediction. Parikh and Rothkopf [4] studied the long-run elasticity of US energy demand and proposed an effective process analysis method. Because there are no energy consumption data from before the reform, Chinese research initially remained at the level of policy suggestions. Yang [5] put forward several ways to save energy in China. Wu [6] proposed the idea of using forecasting technology to solve the energy crisis. After that, the availability of energy consumption data brought great progress to domestic studies; for example, Shi [7] pointed out that the improvement of China's energy utilization efficiency had been very significant since the reform and opening-up. The State Planning and Energy-saving Commission [8] focused on the construction and application of energy forecasting models. Recently, scholars have proposed many methods for predicting energy consumption, and they can be divided into two classes: single forecasting models and hybrid forecasting models.
Table 1 Typical literatures using single forecasting models

Time series models — Sen et al. (2016) [9]; Clements et al. (2016) [10]; Boroojeni et al. (2017) [11]. Advantages: intuitive and explainable functional form; low computational complexity. Disadvantages: pre-assumed form of the model; data independence assumption; low accuracy for nonlinear series.
Nonlinear models — Kovačič and Šarler (2014) [14]; … . Advantages: do not need a pre-assumed form of the model. Disadvantages: results cannot be easily …
Some typical literatures using single forecasting models are summarized in Table 1. The commonly used single models include: 1) Time series models, including autoregressive integrated moving average (ARIMA) models [9], regression analysis models [10], and grey prediction models (GM) [13]. For example, Sen et al. [9] focused on how to select the best possible ARIMA model for short-term forecasting, and found that ARIMA (1,0,0)×(0,1,1) was the best model for energy consumption and ARIMA (0,1,4)×(0,1,1) the best one for GHG (greenhouse gas) emission, respectively. Clements et al. [10] proposed a multiple-equation time series model to forecast the day-ahead electricity load in Austria, and found that this model could achieve the same or even better performance than complex nonlinear and nonparametric forecasting models. Ding et al. [13] developed a novel optimized grey model based on the principle of "new information priority" to predict China's electricity consumption, which combined a new initial condition and a rolling mechanism. The empirical results showed that the model was superior to some benchmark models. 2) Nonlinear forecasting models, including genetic programming (GP) [14], artificial neural networks (ANN) [15], support vector regression (SVR) [17], etc. For instance, Kovačič and Šarler [14] applied the GP model to forecasting the natural gas consumption of a steel plant, and the results showed the high accuracy of this model. Szoplik [15] used the multilayer perceptron (MLP), an ANN, to forecast the gas demand in Szczecin, Poland, and the results showed that this model performed well when used to forecast the gas consumption on any day of the year and at any hour of the day. Chen et al. [17] proposed a new SVR model, which used the ambient temperature of the two hours before a demand response (DR) event as the input variables, for forecasting the DR baselines of office buildings.
Economic time series often have the characteristics of complexity and nonlinearity, and a single model cannot always analyze and predict the energy demand accurately. Therefore, in recent years, more and more scholars have tried to build hybrid models to handle this issue, and these models can be roughly classified into two types: 1) The combination forecasting method, which trains several models to predict the original time series, and then combines these models with appropriate weights to obtain the final forecasting result. For example, Zhang et al. [19] constructed a weighted model combining nu-SVR and epsilon-SVR, in which the differential evolution algorithm was employed to determine the weight of each model. This model was utilized to forecast the daily and half-hourly energy consumption of a building in Singapore, and the results showed that the proposed model had higher accuracy than some other models. Yuan et al. [21] combined GM and ARIMA models with the same weight to forecast China's primary energy consumption, and found that the forecasting performance of this model was better than that of the single GM and ARIMA models. Li et al. [26] improved the traditional combination method by allowing the weight coefficient of a participating model to be negative; the experimental results on oil consumption in China indicated that this new method performed better than the traditional combination methods. 2) The divide and rule method, which first decomposes the original time series into several subseries, then models and predicts each subseries with an appropriate model, and finally integrates the prediction results according to certain rules. This method is used most frequently. For instance, Fan et al. [33] proposed a model to forecast the electric load in Australia and the USA. Firstly, this method used differential empirical mode decomposition (DEMD) to decompose the original time series into several intrinsic mode functions (IMFs) and a residual subseries; secondly, the SVR model was employed to forecast the IMFs and the autoregression model the residual subseries; finally, all the results were summed up to obtain the final prediction result. The empirical results illustrated that this model could provide both accurate prediction and interpretative ability. Panapakidis and Dagoumas [34] proposed a hybrid model to predict the day-ahead natural gas demand. First, it decomposed the original time series into several subseries by wavelet transform, then employed a genetic-algorithm-optimized adaptive neuro-fuzzy inference system (ANFIS) to forecast each subseries, and finally a feed-forward neural network (FFNN) was used to aggregate the forecasting results of all the subseries. The experimental results showed that the model had good robustness. In addition to energy consumption forecasting, hybrid models are widely applied in energy price forecasting. For example, Zhu et al. [35] developed an EMD-based least square support vector regression (LSSVR) model to predict the carbon price. It first decomposed the carbon price time series into several IMFs and a residue by EMD, then used LSSVR to forecast the IMFs and the residue, respectively; finally, all the forecasting values were aggregated into the final prediction results. Compared with some traditional forecasting methods, the proposed model had better performance and robustness. More typical literatures regarding hybrid forecasting models can be found in Table 2.
Table 2 Typical literatures using hybrid forecasting models

Combination forecasting method — Zhang et al. (2016) [19]; Xiao et al. (2016) [20]; Liu et al. (2016) [23]; Zhang et al. (2016) [24]. Advantages: convenient and simple; robust for the complex … into consideration. Disadvantages: computationally intensive; difficult to decide which …
Divide and rule method — Zhu and Wei (2013) [28]; … . Advantages: assign appropriate … . Disadvantages: complex model; …
… selecting and combining the forecasting results of a subset of the models for a final decision, i.e., selective ensemble. The factor screening function of the group method of data handling (GMDH) neural network proposed by Ivakhnenko [39] can objectively and automatically choose the factors that critically influence the research object [40]. Thus, GMDH can reduce the effect of multicollinearity on the performance of the ensemble model to some extent.
To fill the gaps mentioned above, this study introduces the GMDH technique and proposes a hybrid forecasting model based on GMDH selective ensemble (HFGSE). It uses the GMDH-based autoregressive (GAR) model proposed in the authors' previous work [40] to predict the linear trend of the energy consumption time series, and obtains the nonlinear residual subseries. Considering the highly nonlinear characteristics of the residual subseries, this study introduces AdaBoost ensemble technology [38] to enhance the forecasting performance of four single nonlinear prediction models, the back propagation (BP) neural network, support vector regression (SVR) machine [41], genetic programming (GP), and radial basis function (RBF) neural network, respectively, to obtain four different versions of the ensemble model on the nonlinear subseries: AdaBoost.BP, AdaBoost.SVR, AdaBoost.GP, and AdaBoost.RBF. Further, the prediction results of these four AdaBoost ensemble models are used as the initial input, and the selective combination prediction for the nonlinear subseries is obtained by using GMDH. Finally, the predictions of the two parts are integrated to obtain the final forecasting results. The empirical analysis on China's total energy consumption and total oil consumption time series verifies the effectiveness of the HFGSE model.
The novelty of this study can be summarized as follows:
1) This study employs ensemble learning to predict the nonlinear subseries after decomposition, […] selective ensemble to avoid redundancy and multicollinearity to some extent. To the best of our knowledge, this study, for the first time, employs selective ensemble in the energy consumption forecasting field.
1.3. Organization of the paper
This study is organized as follows: Section 2 describes the methodology underlying the proposed model, including the AdaBoost ensemble method, the GMDH neural network, and the GAR model. Section 3 discusses the hybrid forecasting model based on GMDH selective ensemble, that is, HFGSE, in detail. Section 4 presents the empirical study. Finally, the findings of this study are summarized in Section 5.

2. Related Theories
In this study, the AdaBoost ensemble model, the GMDH neural network, and the GAR model are used to construct the hybrid model HFGSE. A brief description of these related methods is given in this section.
2.1. AdaBoost ensemble model
In machine learning, ensemble learning is an effective method for increasing learning accuracy by combining the outputs of many weak learners. Boosting is a commonly used ensemble algorithm. It has many different versions, and AdaBoost is the most popular one.
AdaBoost was proposed by Freund and Schapire [38]. To improve the learning performance of a weak learner with the AdaBoost algorithm, one first needs to initialize the sample weight distribution in the training set, assigning the same initial weight to each sample; that is, if the training set contains n samples, the weight of each sample is 1/n. Therefore, in the first iteration of training the weak learner with AdaBoost, each sample will be selected with the same probability. The selected samples are used to train the first weak learner, h1, under the appointed learning rule. Then, AdaBoost calculates the classification error of the training samples at the current iteration, and the weight distribution of the samples for the next iteration is updated according to this error. The update rule is: increase the weights of misclassified samples and decrease the weights of correctly classified samples. Repeating the process T times yields T weak learners: h1, h2, …, hT. Finally, the final prediction value is obtained by weighting the forecasting results of the T weak learners.
Initially, the application of the AdaBoost algorithm focused on classification [42], such as face recognition, vehicle license plate recognition, and so on. In recent years, it has also been applied to forecasting [43]. For example, Solomatine and Shrestha [44] proposed the AdaBoost.RT algorithm for forecasting. The algorithm is similar to the AdaBoost algorithm; the difference between them is that the latter, at the end of each iteration, increases the weight of each sample whose relative error is greater than a pre-set threshold value φ. For the detailed process, please refer to [44].
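The reweighting rule described above can be illustrated for a single boosting round. The snippet below is a minimal sketch, not the authors' implementation, using the standard exponential update with labels and predictions coded as ±1:

```python
import numpy as np

def adaboost_reweight(weights, predictions, labels):
    """One AdaBoost round on +/-1 labels: compute the weighted error,
    derive the learner weight alpha, raise the weights of misclassified
    samples, lower those of correct ones, and renormalize."""
    eps = weights[predictions != labels].sum()       # weighted error rate
    eps = min(max(eps, 1e-10), 1 - 1e-10)            # keep the log finite
    alpha = 0.5 * np.log((1 - eps) / eps)            # weak-learner weight
    new_w = weights * np.exp(-alpha * labels * predictions)
    return new_w / new_w.sum(), alpha

# four samples start with the same weight 1/n, as described above
n = 4
w0 = np.full(n, 1.0 / n)
y = np.array([1, 1, -1, -1])
h = np.array([1, -1, -1, -1])    # the weak learner misclassifies sample 2
w1, alpha = adaboost_reweight(w0, h, y)
```

After the update, the misclassified sample carries more weight than each correctly classified one, so the next weak learner concentrates on it.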
2.2. Group method of data handling neural network
The GMDH neural network is the core technique of self-organizing data mining [45], and it can decide the variables to enter the model and the structure and parameters of the model in a […] into two subsets, namely, the model learning set A for estimation of model parameters, and the model selection set B for performance evaluation of intermediate candidate models [47]. GMDH constructs the general relation between the input and output variables through the reference function. Generally speaking, […] as follows:
Let
f(x1, x2, …, xm) = a1 x1 + a2 x2 + … + am xm, (2)
and take all its sub-items as the initial models of the modeling network structure:
v1 = a1 x1, v2 = a2 x2, …, vm = am xm. (3)
Set the initial models of Eq. (3) as the inputs of the GMDH network, and combine all their possible pairs to generate the C(m,2) intermediate candidate models of the first layer [48]. The transfer function is as follows:
wk = f(vi, vj); i, j = 1, 2, …, m; i ≠ j, (4)
where wk is the estimated output. Obtain the parameters through least squares (LS) estimation on the model learning set A. Work out the external criterion value of every intermediate candidate model on the model selection set B. Generally speaking, the smaller the external criterion value, the higher the performance of the intermediate candidate model. Rank the external criteria from small to large, select the optimal m1 (≤ C(m,2)) models as the inputs of the second layer, and combine all their possible pairs to generate C(m1,2) intermediate candidate models:
zk = f(wi, wj); i, j = 1, 2, …, m1; i ≠ j. (5)
Estimate the parameters of each intermediate candidate model and calculate its external criterion value, select m2 (≤ C(m1,2)) intermediate candidate models again as the inputs of the third layer, and combine all their possible pairs to generate C(m2,2) intermediate candidate models:
uk = f(zi, zj); i, j = 1, 2, …, m2; i ≠ j. (6)
This process repeats continuously, and the intermediate candidate models of the fourth, fifth, …, layers can be obtained in turn. The termination rule of the model is given by the optimal complexity theory [49]: as the complexity of the intermediate candidate models increases, the external criterion values first become smaller and then larger. Therefore, when the external criterion value reaches its minimum, the corresponding model is the optimal complexity model y* (see Fig. 1). Finally, in order to find the initial models contained in the optimal complexity model y*, one just needs to trace the GMDH network structure back from the last layer until the initial input layer is reached. From Fig. 1, it can be seen that the initial input models v1, v3, v4 and v5 are chosen; in other words, x1, x3, x4, and x5 are chosen. However, v2 is eliminated during the self-adaption process; in other words, x2 is eliminated [50].
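A single GMDH layer, as described by Eqs. (3)-(5), can be sketched as follows. This is a minimal illustration under two stated assumptions: a linear transfer function fitted by least squares on set A (GMDH implementations commonly use a quadratic polynomial instead), and a squared-error external criterion evaluated on set B:

```python
import itertools
import numpy as np

def fit_pair(vi, vj, y, A):
    """LS-fit the transfer function w = b0 + b1*vi + b2*vj on learning set A
    and return the candidate model's output on all samples."""
    X = np.column_stack([np.ones_like(vi), vi, vj])
    beta, *_ = np.linalg.lstsq(X[A], y[A], rcond=None)
    return X @ beta

def gmdh_layer(inputs, y, A, B, keep):
    """One GMDH layer: pair every two inputs, rank the candidates by the
    external criterion (squared error on selection set B), keep the best."""
    cands = [fit_pair(inputs[i], inputs[j], y, A)
             for i, j in itertools.combinations(range(len(inputs)), 2)]
    crit = [np.sum((y[B] - c[B]) ** 2) for c in cands]
    order = np.argsort(crit)[:keep]
    return [cands[i] for i in order], [crit[i] for i in order]

# demo: four initial inputs, but only x0 and x2 actually drive y
rng = np.random.default_rng(0)
x = [rng.normal(size=40) for _ in range(4)]
y = x[0] + 2.0 * x[2]
A, B = np.arange(30), np.arange(30, 40)
outs, crits = gmdh_layer(x, y, A, B, keep=3)
```

Because the pair (x0, x2) reproduces y exactly, its external criterion on set B is essentially zero and it is ranked first, which is the screening behavior described above.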
[Figure 1. Schematic of the GMDH network: initial models v1-v5 are combined layer by layer into intermediate candidate models (w, z, y layers; e.g., the 3rd layer) until the optimal complexity model y* = f(v) is reached; v2 is eliminated during the process.]
2.3. Group method of data handling-based autoregression model
In time series forecasting, the ARIMA model is usually adopted to predict the linear trend of a time series. However, before constructing the ARIMA (p, d, q) model, a unit root test should be conducted to determine whether the sequence is stationary. In addition, it is necessary to find the optimal parameter values, namely the autoregressive order p and the moving average order q, through trial and error, whereas the GMDH neural network is a data-driven method that requires little prior knowledge and few assumptions. Thus, the authors' previous work [40] combined a GMDH-type neural network with an ARIMA model, and constructed a GMDH-based autoregression (GAR) model for forecasting energy consumption. In this model, the original single time series is first converted to a matrix. In the matrix, starting from the second column, each column represents a new variable. Specifically, yt in the second column is the current period of the energy consumption time series, that is, the dependent variable Y, and from the third column to the end are the energy consumption time series with lag orders 1, 2, …, k, respectively, which make up the input vector X. In addition, the new data set is divided into a training set and a test set, and the training set is further divided into a model learning set and a model selection set. Secondly, a GMDH neural network is trained to find the optimal complexity model and decide the optimal autoregression order p. Finally, the energy consumption in the test set is forecasted by the optimal complexity model.
This model ensures a self-organized modeling process, including finding the optimal complexity model, determining the optimal autoregression order, and estimating the model parameters, largely devoid of human interference. The empirical analysis on three energy consumption time series shows that the GAR model outperforms the ARIMA model.
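The series-to-matrix conversion that the GAR model starts from can be sketched as follows; a minimal illustration of the column layout described above (current value in the first position, followed by lags 1 to k):

```python
import numpy as np

def lag_matrix(series, k):
    """Convert a univariate series into rows [y_t, y_{t-1}, ..., y_{t-k}]:
    the first column is the dependent variable Y (current value), and the
    remaining columns are the lagged inputs X, as in the GAR matrix."""
    y = np.asarray(series, dtype=float)
    return np.array([y[t - k:t + 1][::-1] for t in range(k, len(y))])

# toy series: each row holds the current value followed by lags 1..2
M = lag_matrix([1, 2, 3, 4, 5, 6], k=2)
```

Splitting the rows of this matrix into training/test and learning/selection subsets then yields the data sets that the GMDH network is trained on.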
3. Hybrid Forecasting Model Based on Selective Ensemble
In this section, the proposed hybrid model HFGSE is described in detail, including its basic idea, the construction of the external criteria, and the modeling steps.
Table 3 The transfer matrix for the nonlinear subseries

Data set        e_t      ê_t(1)   ê_t(2)   ê_t(3)   ê_t(4)
Training set W  e_1      ê_1(1)   ê_1(2)   ê_1(3)   ê_1(4)
                …        …        …        …        …
Test set P      …        …        …        …        …

(Each row t contains the nonlinear subseries value e_t, the dependent variable, followed by the forecasts of the four AdaBoost ensemble models, the independent variables; the rows are divided into the model training set W and the test set P.)
3.1. Basic idea
The hybrid model proposed in this study belongs to the divide and rule method. Because China's energy consumption time series is annual data, no seasonal factor exists. Therefore, this study uses the previously proposed GAR model [40] to predict its linear trend. The remaining residual sequence is the nonlinear subseries. Because forecasting the linear trend is relatively simple while forecasting the nonlinear subseries is more difficult, this study mainly focuses on the latter. Most existing hybrid forecasting models construct a single prediction model for the nonlinear subseries, although the forecasting effect is usually better than that of models that consider the linear trend only. Considering the complexity of the nonlinear subseries, it is hard to obtain a good forecasting effect with a commonly used single time series prediction model. This study therefore first utilizes the AdaBoost ensemble learning algorithm for prediction. It selects four nonlinear forecasting models, namely BP, SVR, GP, and RBF, to train the weak learners of the AdaBoost algorithm, and constructs four ensemble prediction models: AdaBoost.BP, AdaBoost.SVR, AdaBoost.GP, and AdaBoost.RBF. Further, it considers combining the four ensemble forecasting results. However, if all four trained ensemble models are combined, multicollinearity among the models may exist, which will degrade the forecasting accuracy of the model. Forecasting performance can be improved by selecting and combining the forecasting results of a subset of the models for the final decision. Thus, this study introduces a GMDH neural network to establish the selective combination forecasting. With the automatic modeling mechanism of GMDH, it selects some of the ensemble forecasting models, self-organizes their combination, and determines their weights.
Suppose that the original energy consumption time series is yt. The HFGSE model proposed in this study includes four steps: 1) Obtain the energy consumption nonlinear subseries: construct a GAR model to predict the linear trend, and suppose the forecast is ŷt^L. Then the difference between them, et = yt − ŷt^L, is the energy consumption nonlinear subseries. 2) AdaBoost ensemble prediction on the nonlinear subseries: select the above four nonlinear single models as the weak learners of AdaBoost ensemble learning, and obtain the forecasting results of the four ensemble models on the nonlinear subseries; suppose these are ê(i), i = 1, 2, 3, 4. 3) GMDH-based selective combination prediction on the nonlinear subseries: First, transfer the original nonlinear time series et and all the forecasting results of the ensemble models ê(i), i = 1, 2, 3, 4, to a data set stored in matrix form (see Table 3), where et denotes the energy consumption nonlinear subseries at the current period, that is, the dependent variable. From the third column to the sixth column, ê(1), ê(2), ê(3), and ê(4) construct the independent variable vector X = (ê(1), ê(2), ê(3), ê(4)). Next, divide the whole data set of the table into the model training set W and the test set P (see the first column of Table 3). Further, divide the model training set W horizontally into two subsets, the model learning set A and the model selecting set B, and find the optimal complexity model through the GMDH algorithm. Finally, predict the test set using the optimal complexity model and record the forecasting result as ê*. 4) Calculate the final forecasting value of the energy consumption time series: add the forecasting value of the GAR model ŷt^L to that of the nonlinear part ê*, and obtain the final energy consumption forecasting value, that is, ŷt = ŷt^L + ê*.
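The four steps can be sketched end to end. In this minimal illustration, the GAR forecast and the ensemble forecasts are assumed to be given as arrays, and an ordinary least-squares combination stands in for the GMDH selective combination of step 3:

```python
import numpy as np

def hfgse_sketch(y, linear_pred, ensemble_preds):
    """Sketch of the HFGSE pipeline: (1) subtract the GAR linear forecast
    to get the nonlinear subseries, (2-3) combine the ensemble forecasts
    of that subseries (plain least squares here, GMDH selection in the
    paper), (4) add the combined nonlinear forecast back to the trend."""
    e = y - linear_pred                         # step 1: nonlinear subseries
    P = np.column_stack(ensemble_preds)         # ensemble forecasts of e
    w, *_ = np.linalg.lstsq(P, e, rcond=None)   # stand-in for GMDH weights
    e_hat = P @ w                               # combined nonlinear forecast
    return linear_pred + e_hat                  # step 4: final forecast

y = np.array([10.0, 12.0, 15.0, 19.0])
lin = np.array([9.0, 11.0, 14.0, 18.0])         # hypothetical GAR output
preds = [np.ones(4), np.array([0.0, 1.0, 0.0, 1.0])]
final = hfgse_sketch(y, lin, preds)
```

Here the first stand-in forecast reproduces the residual exactly, so the least-squares combination recovers the original series; with the real GMDH step, weak or redundant ensemble forecasts would additionally be screened out.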
3.2. Construction of external criteria
In realistic system modeling, different requirements arise, which may come from the aims of modeling or from prior system knowledge. In GMDH modeling, the external criteria are the mathematical descriptions of these specified requirements, and they select the "optimal" model from the candidate model set. GMDH has an external criteria system [50], from which different external criteria can be selected according to different modeling aims, and new external criteria can be constructed according to need.
This study chooses two external criteria from the existing GMDH external criteria system: the asymmetric stability (AS) criterion and the mean regularization (MR) criterion. They are described as follows:
(1) Asymmetric stability criterion:
AS = Σ_{t∈W} (yt − ŷt(A))², (7)
where yt is the actual output of the t-th sample in the model training set W, and ŷt(A) is its forecasted output for data set W by the model trained on the model learning set A. This criterion means that the model is first trained on subset A, and then the sum of squared errors between the actual outputs and the forecasted outputs is calculated on the entire training set W.
(2) Mean regularization criterion:
MR = Σ_{t∈W} (yt − ŷt(W))², (8)
where ŷt(W) is the forecasted output of the t-th sample in the entire training set W by the model trained on the same data set; that is, the model learning process and the calculation of the external criterion are both carried out on the training set W.
Furthermore, considering that the root mean square error (RMSE) and the mean absolute percentage error (MAPE) are two commonly used indexes for evaluating the performance of models in energy consumption prediction, this study constructs two new criteria: the symmetrical root mean squared error (SRMSE) criterion and the symmetrical mean absolute percentage error (SMAPE) criterion. They are described as follows:
(3) Symmetrical root mean squared error criterion:
SRMSE = sqrt( Σ_{t∈A} (yt − ŷt(B))² / mA ) + sqrt( Σ_{t∈B} (yt − ŷt(A))² / mB ), (9)
where mA and mB stand for the sample sizes of data sets A and B, respectively, ŷt(B) is the forecasted output of the t-th sample in the model learning set A by the model trained on the model selection set B, and ŷt(A) is the forecasted output of the t-th sample in the model selection set B by the model trained on the model learning set A. The SRMSE criterion calculates the root mean square error in subset A and that in subset B simultaneously.
(4) Symmetrical mean absolute percentage error criterion:
SMAPE = (1/mA) Σ_{t∈A} |(yt − ŷt(B)) / yt| + (1/mB) Σ_{t∈B} |(yt − ŷt(A)) / yt|. (10)
The SMAPE criterion calculates the mean absolute percentage error in subset A and that in subset B simultaneously, which uses the information in subsets A and B symmetrically, as in the SRMSE criterion.
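The SRMSE and SMAPE criteria of Eqs. (9) and (10) can be computed directly from the cross-predictions of the two subsets; a minimal sketch (the argument names are illustrative):

```python
import numpy as np

# Hypothetical names: yA/yB are actual outputs on subsets A and B, predA is
# the forecast of subset A by the model trained on B, and predB the
# forecast of subset B by the model trained on A.

def srmse(yA, predA, yB, predB):
    """Eq. (9): RMSE on subset A plus RMSE on subset B."""
    rA = np.sqrt(np.mean((yA - predA) ** 2))
    rB = np.sqrt(np.mean((yB - predB) ** 2))
    return rA + rB

def smape_criterion(yA, predA, yB, predB):
    """Eq. (10): mean absolute percentage error on A plus that on B."""
    pA = np.mean(np.abs((yA - predA) / yA))
    pB = np.mean(np.abs((yB - predB) / yB))
    return pA + pB

yA, predA = np.array([2.0, 4.0]), np.array([1.0, 5.0])
yB, predB = np.array([10.0]), np.array([8.0])
crit_rmse = srmse(yA, predA, yB, predB)
crit_mape = smape_criterion(yA, predA, yB, predB)
```

Both criteria add the error measured on each subset under the model trained on the other, which is what makes them symmetrical in A and B.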
Figure 2. The modeling flowchart of the HFGSE model.

3.3. Modeling steps
SC
Step 1: Obtain the energy consumption nonlinear subseries. Construct a GAR model on the original energy consumption time series $y_t$ and predict the linear trend. Suppose the forecasting result is $\hat{y}_t^{l}$; then, the energy consumption nonlinear subseries is $\tilde{y}_t = y_t - \hat{y}_t^{l}$;
Step 2: AdaBoost ensemble forecasting on $\tilde{y}_t$. Suppose $\tilde{y}_t$ contains $m$ sample points, the maximum number of iterations is $T$, and the threshold value of the relative forecasting error is $\phi$. The process of integrating the nonlinear single forecasting model with the AdaBoost.RT algorithm is as follows [45]:
(1) Initialize the weight vector of the samples: $w_1(i) = 1/m$, $i = 1, 2, \cdots, m$;
(2) For $k = 1, 2, \cdots, T$:
a. Train the weak learner $f_k$ on the samples weighted by $w_k$ and obtain the forecasting model $f_k(x) \rightarrow y$;
b. Calculate the error rate of $f_k$: $\varepsilon_k = \sum_{i:\, e_k(i) > \phi} w_k(i)$, where $e_k(i) = \left|f_k(x_i) - y_i\right|/y_i$ is the relative forecasting error of the $i$-th sample;
c. Assign the weight $\beta_k = \varepsilon_k^2$ to the weak learner;
d. Update the weight vector of the samples:
$$w_{k+1}(i) = \frac{w_k(i)}{Z_k}\times\begin{cases}\beta_k, & \text{if } \left|f_k(x_i) - y_i\right|/y_i \le \phi,\\ 1, & \text{otherwise},\end{cases}\qquad(11)$$
where $Z_k$ is a normalization factor such that $w_{k+1}$ remains a distribution;
(3) Output the final strong learner:
$$f_{fin}(x) = \sum_{k=1}^{T}\log\!\left(\frac{1}{\beta_k}\right)f_k(x)\Big/\sum_{k=1}^{T}\log\!\left(\frac{1}{\beta_k}\right).\qquad(12)$$
This study selects four single nonlinear forecasting models in turn to train the weak learners, obtains four ensemble models (AdaBoost.BP, AdaBoost.SVR, AdaBoost.RBF, and AdaBoost.GP), and records their forecasting results as $\hat{y}_1, \hat{y}_2, \hat{y}_3, \hat{y}_4$, respectively.
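The AdaBoost.RT procedure can be sketched as below. This is a minimal illustration under stated assumptions: `weighted_mean_learner` is a deliberately trivial stand-in weak learner (the paper uses BP, SVR, GP, and RBF regressors), and all function names are hypothetical:

```python
import math

def weighted_mean_learner(x, y, w):
    """Illustrative weak learner: ignores x and predicts the weighted mean of y.
    Any regressor that accepts sample weights could be plugged in here."""
    c = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return lambda x_new: c

def adaboost_rt(x, y, weak_learner, phi=0.1, T=10):
    m = len(y)
    w = [1.0 / m] * m                       # initial sample weights
    models, betas = [], []
    for _ in range(T):
        f = weak_learner(x, y, w)
        are = [abs(f(xi) - yi) / abs(yi) for xi, yi in zip(x, y)]  # relative errors
        eps = sum(wi for wi, e in zip(w, are) if e > phi)          # error rate
        if eps <= 0 or eps >= 1:
            break
        beta = eps ** 2                     # weak-learner weight
        # down-weight samples predicted within the threshold, then normalize
        w = [wi * (beta if e <= phi else 1.0) for wi, e in zip(w, are)]
        z = sum(w)
        w = [wi / z for wi in w]
        models.append(f)
        betas.append(beta)
    if not models:                          # degenerate case: single weak learner
        return weak_learner(x, y, [1.0 / m] * m)
    logs = [math.log(1.0 / b) for b in betas]
    def strong(x_new):                      # log-weighted combination
        return sum(l * f(x_new) for l, f in zip(logs, models)) / sum(logs)
    return strong

model = adaboost_rt(list(range(4)), [1.0, 1.0, 1.0, 1.2],
                    weighted_mean_learner, phi=0.1, T=5)
pred = model(0)
```

With the constant weak learner above, the strong learner is a convex combination of weighted means, so its prediction stays within the range of the training targets.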
Step 3: Conduct selective combination forecasting with a GMDH neural network on $\tilde{y}_t$.
(1) Transfer and prepare the data: Transfer the original nonlinear time series data and the forecasting results $\hat{y}_1, \hat{y}_2, \hat{y}_3, \hat{y}_4$ of the four ensemble models into matrix form as in Table 3, and divide the matrix data into the model training set $W$ and the model test set. Further, divide the training set into the model learning set $A$ and the model selection set $B$;
(2) Run the GMDH algorithm on the model training set $W$, and find the combination forecasting model with the optimal complexity:
a. Construct the general relation between the output and input variables:
$$y = a_1\hat{y}_1 + a_2\hat{y}_2 + a_3\hat{y}_3 + a_4\hat{y}_4,\qquad(13)$$
and regard all the sub-items as the initial input models of the GMDH neural network:
$$v_1 = a_1\hat{y}_1,\quad v_2 = a_2\hat{y}_2,\quad v_3 = a_3\hat{y}_3,\quad v_4 = a_4\hat{y}_4;\qquad(14)$$
b. Combine all possible pairs of the four initial models to generate the six candidate models of the first layer, and estimate the parameters of these intermediate candidate models with the LS method;
c. Calculate the external criterion values of all intermediate candidate models, select the four intermediate candidate models with the smallest external criterion values for the next layer, and regard them as the inputs of the second layer of the GMDH neural network;
d. Repeat Steps b and c to generate the intermediate candidate models of the second, third, ..., $L$-th layers in turn, and find the combination forecasting model with the optimal complexity $u^*$ according to the optimal complexity theory;
(3) Predict the energy consumption nonlinear subseries on the test set with the optimal complexity model $u^*$, and denote the result by $\hat{\tilde{y}}_t$;
Step 4: Calculate the final energy consumption time series forecasting value. Add the forecasting value $\hat{y}_t^{l}$ of the linear GAR model and that of the nonlinear part $\hat{\tilde{y}}_t$ to obtain the final energy consumption time series forecasting value, that is, $\hat{y}_t = \hat{y}_t^{l} + \hat{\tilde{y}}_t$.
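A single GMDH layer, as in steps a-c above, can be sketched as follows. The pure-Python least-squares solver and the mean-squared external criterion are simplifying assumptions (the paper's external criteria include AS, MR, SRMSE, and SMAPE), and all names are illustrative: every pair of inputs is fitted by LS on the learning set and scored on the selection set, and only the best candidates survive to the next layer:

```python
import itertools

def lstsq_3(X, y):
    """Solve the 3-parameter normal equations (X'X)b = X'y by Gaussian elimination."""
    n = len(X)
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(3)] for r in range(3)]
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(3)]
    for col in range(3):                        # forward elimination with pivoting
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                         # back substitution
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, 3))) / A[r][r]
    return coef

def gmdh_layer(inputs_A, inputs_B, y_A, y_B, keep=4):
    """One GMDH layer: fit z = c0 + c1*u + c2*v on learning set A for every
    input pair, score each candidate on selection set B (external criterion),
    and keep the `keep` best candidates as inputs for the next layer."""
    candidates = []
    for i, j in itertools.combinations(range(len(inputs_A)), 2):
        XA = [[1.0, u, v] for u, v in zip(inputs_A[i], inputs_A[j])]
        c = lstsq_3(XA, y_A)
        out_B = [c[0] + c[1] * u + c[2] * v for u, v in zip(inputs_B[i], inputs_B[j])]
        crit = sum((yt - ot) ** 2 for yt, ot in zip(y_B, out_B)) / len(y_B)
        out_A = [c[0] + c[1] * u + c[2] * v for u, v in zip(inputs_A[i], inputs_A[j])]
        candidates.append((crit, out_A, out_B))
    candidates.sort(key=lambda t: t[0])
    return candidates[:keep]

# Toy data: the target depends linearly on the first two inputs only,
# so the (input 0, input 1) pair should yield a near-zero criterion.
x1_A, x2_A, x3_A = [1, 2, 3, 4], [2, 1, 4, 3], [3, 1, 2, 5]
x1_B, x2_B, x3_B = [5, 6, 7, 8], [1, 2, 3, 4], [2, 4, 1, 3]
y_A = [2 * a + 3 * b for a, b in zip(x1_A, x2_A)]
y_B = [2 * a + 3 * b for a, b in zip(x1_B, x2_B)]
best = gmdh_layer([x1_A, x2_A, x3_A], [x1_B, x2_B, x3_B], y_A, y_B, keep=2)
```

Stacking such layers until the best external-criterion value stops improving gives the model with optimal complexity in the GMDH sense.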
4. Empirical Analysis
To verify the performance of the proposed model, this study selects two time series for the experiments: the total energy consumption and the total oil consumption in China. Firstly, to analyze the impact of the AdaBoost.RT algorithm on the model's forecasting performance on the nonlinear subseries, this study compares the forecasting performance of the AdaBoost.RT ensembles with that of the four single models: BP, SVR, GP and RBF. Secondly, to investigate the effect of selective combination forecasting, it analyzes the forecasting results of four different versions of the GMDH combination forecasting model to find the optimal one, and then compares the best one with the models participating in the combination. Thirdly, it compares the forecasting performance of the HFGSE model with that of other hybrid forecasting models. Finally, the out-of-sample forecasting of the HFGSE model is conducted on the total energy consumption and total oil consumption time series in China from 2015 to 2020.
4.1. Data
To evaluate the forecasting performance of the HFGSE model proposed in this study, an empirical analysis is conducted on the annual time series of Chinese total energy consumption and total oil consumption from 1978 to 2014 (see Fig. 3). The data are from the China Statistical Yearbook. Because the key of the HFGSE model is to predict the nonlinear subseries of energy consumption, this study does not discuss the forecasting result of the linear trend in detail, but utilizes the GAR model proposed above to predict the linear trend of the original series $y_t$ and obtains the nonlinear subseries $\tilde{y}_t$. Fig. 4 shows the change of the total energy consumption and total oil consumption nonlinear subseries. It can be seen from the figure that the nonlinear subseries of the two energy consumption time series fluctuate to a large extent.
Figure 3. Energy consumption time series (unit: ten thousand tons of standard coal).
Figure 4. The nonlinear subseries of total energy and oil consumption (unit: ten thousand tons of standard coal).
4.2. Experiment setting
This study selected the time series of energy consumption from 1978 to 2009 as the training set, and those from 2010 to 2014 as the test set. The models mentioned in this study were trained on the training set, and their performance was evaluated on the test set. It is worth noting that the training set and test set here are different from those in Table 3, but they are related. In Table 3, the GAR model is first conducted on the original energy consumption time series $y_t$ to obtain the linear trend prediction; the nonlinear subseries $\tilde{y}_t$ is then calculated, and finally the data from 1978 to 2009 are used as the training set and those from 2010 to 2014 as the test set.
This study used the original energy consumption time series as the dependent variable, and its lagged items as the independent variables to train the models. The four nonlinear forecasting models were regarded as the weak learners to train the AdaBoost.RT ensemble models. The parameter settings of the four models were as follows: 1) BP neural network: It includes two important parameters: the largest lagged order $p$ and the number of nodes in the hidden layer $h$. In predicting different energy consumption time series, the optimal values of the two parameters are always different. After repeated experiments, it was found that the BP neural network attains a satisfactory forecasting performance for the total energy consumption and total oil consumption time series when $p$ = 5 and 4, and $h$ = 3 and 3, respectively. 2) SVR model: This study used the Libsvm-3.1 toolbox to implement the SVR model. It chose the most commonly used RBF kernel function because of its nonlinear mapping ability. Through experiments, it was found that the SVR model had the best forecasting performance on the total energy consumption and total oil consumption time series when $p$ = 1 and 2, respectively. There are two other important parameters in the SVR model, i.e., the penalty parameter $C$ and the kernel width $\gamma$. This study used the grid search method in the toolbox to find the best parameter values. Finally, $C$ = 0.2 and $\gamma$ = 15.76 for the total energy consumption, and $C$ = 7.1 and $\gamma$ = 24.20 for the total oil consumption. 3) GP model: In its modeling process, the parameter setting is relatively important for its performance. Through repeated trials, the GP model attains the optimal forecasting effect for the total energy consumption and total oil consumption time series, respectively, when the number of initial trees is 50 and 60, the crossover probability 0.8 and 0.85, the threshold value of goodness of fit 0.85 and 0.85, and the maximum number of iterations 50 and 50. 4) RBF neural network: The expanding speed of the radial basis function, spread, is an important parameter, and the lagged order of the time series $p$ is also very important. Through experimental comparison, it was found that the RBF model attained the best forecasting performance for the two energy consumption time series when spread = 3 and $p$ = 1.
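The search over $(C, \gamma)$ can be sketched as a generic exhaustive grid search. The `evaluate` objective below is a synthetic stand-in (an assumption) for training an SVR and scoring it on a validation split, so the grid values and the "best" result are purely illustrative:

```python
import itertools

def grid_search(param_grid, evaluate):
    """Exhaustive grid search: return the parameter dict with the lowest score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)            # e.g. validation MAPE of a trained SVR
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Synthetic stand-in objective with a known minimum at C=0.2, gamma=16
# (for illustration only; a real run would train and validate an SVR here).
def evaluate(p):
    return (p["C"] - 0.2) ** 2 + (p["gamma"] - 16) ** 2

grid = {"C": [0.1, 0.2, 1.0, 7.1], "gamma": [4, 16, 24.2]}
best_params, best_score = grid_search(grid, evaluate)
```

In practice the grid is usually laid out on a logarithmic scale for both parameters, which is what libsvm's bundled grid tool does.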
For the threshold value $\phi$ of the AdaBoost.RT ensemble algorithm, after repeated experimental comparisons, this study took $\phi$ = 10% because the performance of the model is best at this value. Although the forecasting error of the final strong learner decreases as the number of iterations $T$ increases, increasing $T$ also increases the model running time; therefore, the number of iterations was set at $T$ = 50.
Finally, all experiments were performed on the platform Matlab2011b. At the same time, this study repeated the above procedure 10 times and took the average value as the experimental result.

4.3. Evaluation criteria
This study adopts the RMSE and MAPE criteria to evaluate the forecasting performance of the models:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2},\qquad(15)$$
$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\frac{\left|y_t - \hat{y}_t\right|}{y_t}\times 100\%,\qquad(16)$$
where $y_t$ is the real value of the $t$-th sample, $\hat{y}_t$ is its corresponding forecasting value, and $n$ is the number of test samples. Obviously, the smaller the value of the evaluation criterion is, the better the forecasting performance of the model is [53].
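The two evaluation criteria can be computed as follows; the toy series and function names are illustrative assumptions:

```python
import math

def rmse(actual, forecast):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

def mape(actual, forecast):
    """Mean absolute percentage error, expressed as a percentage."""
    n = len(actual)
    return 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / n

# Toy series: errors of 10, 10, and 0 on three observations.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
r = rmse(actual, forecast)
m = mape(actual, forecast)
```

Note that RMSE is scale-dependent (it is reported in the units of the series), whereas MAPE is scale-free, which is why both are reported side by side in the comparisons that follow.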
Table 4 Comparison of AdaBoost ensemble and single models on two nonlinear subseries
(columns: BP, AdaBoost.BP, SVR, AdaBoost.SVR, GP, AdaBoost.GP, RBF, AdaBoost.RBF)
Total energy consumption nonlinear subseries
RMSE rank:  5, 2, 3, 1, 8, 6, 7, 4
MAPE:       103.51%, 11.31%, 98.62%, 28.22%, 255.74%, 19.63%, 134.87%, 36.45%
MAPE rank:  6, 1, 5, 3, 8, 2, 7, 4
Total oil consumption nonlinear subseries
RMSE rank:  8, 4, 7, 6, 3, 1, 5, 2
MAPE:       156.44%, 79.36%, 94.62%, 70.85%, 125.28%, 72.18%, 128.44%, 53.01%
MAPE rank:  8, 4, 5, 2, 6, 3, 7, 1
Average rank: 6.75, 2.75, 5.00, 3.00, 6.25, 3.00, 6.50, 2.75
4.4. AdaBoost ensemble forecasting on the nonlinear subseries
To analyze the impact of the AdaBoost.RT ensemble algorithm on the models' forecasting performance, this study compares the forecasting results of the AdaBoost ensembles of the BP neural network, SVR model, GP model, and RBF neural network with those of the corresponding original single nonlinear models. Table 4 shows the comparison of each model's forecasting performance on the two energy consumption nonlinear subseries. The table gives the rank of each model on the two evaluation criteria, from low to high (the smaller the rank, the better the model's performance). The last row is the average value of each model's evaluation criteria ranks over the two nonlinear subseries.
The following conclusions can be obtained after carefully analyzing Table 4: 1) For both the total energy consumption nonlinear subseries and the total oil consumption nonlinear subseries, the values of RMSE and MAPE of the AdaBoost ensemble models are smaller than those of the corresponding single nonlinear models. This demonstrates that the AdaBoost.RT algorithm can improve the single nonlinear models' forecasting performance to different extents. 2) On the total energy consumption nonlinear subseries, it can be seen from the ranks that the AdaBoost.SVR model performs best according to the RMSE evaluation criterion, and AdaBoost.BP performs best according to the MAPE evaluation criterion; on the total oil consumption nonlinear subseries, AdaBoost.GP performs best according to the RMSE evaluation criterion, and AdaBoost.RBF performs best according to the MAPE evaluation criterion. This demonstrates that the ensemble models can always achieve better performance than the four single nonlinear forecasting models. From the average ranks in the last row of Table 4, the eight models in order of forecasting performance are: AdaBoost.BP, AdaBoost.RBF, AdaBoost.SVR, AdaBoost.GP, SVR, GP, RBF and BP. The four ensemble models rank better than the four single models, which verifies the above conclusions again.
4.5. Analysis of selective combination forecasting
This part focuses on the effect of selective combination forecasting. It first analyzes the forecasting results of four different versions of the GMDH combination forecasting model to find the optimal one, and then compares the best one with the models participating in the combination.

4.5.1. Comparisons of different versions of selective combination forecasting models
In the HFGSE model proposed in this study, four different versions of the model are constructed according to the different external criteria used in the GMDH selective combination prediction: AS.GMDH, MR.GMDH, SRMSE.GMDH, and SMAPE.GMDH. In this section, the four versions of the GMDH model are used to make selective combinations of the models that were enhanced by the AdaBoost.RT algorithm in the previous section. Table 5 shows the comparison of the selective combination performance of the four GMDH versions. The number in parentheses indicates the rank of the model in the row; the smaller the rank, the better the model's performance. The last row is the average value of the evaluation criteria ranks of each model over the two consumption time series, which represents the overall predictive performance of the models well.
According to Table 5, for the total energy consumption time series, MR.GMDH has the best performance according to the RMSE evaluation criterion, followed by AS.GMDH and SRMSE.GMDH, and the poorest performer in the group is SMAPE.GMDH. Meanwhile, according to the MAPE evaluation criterion, AS.GMDH has the best performance, followed by SMAPE.GMDH, and the poorest performers are MR.GMDH and SRMSE.GMDH. Therefore, each of these four models has its own advantages and disadvantages. However, for the total oil consumption time series, AS.GMDH has the smallest value of both RMSE and MAPE, indicating its superior prediction performance. Finally, from the average ranks in the last row of Table 5, the AS.GMDH model has the smallest value, followed by the MR.GMDH model, and finally the SMAPE.GMDH and SRMSE.GMDH models. This indicates that, among the four versions of the GMDH selective combination forecasting model, the AS.GMDH model has the best overall predictive performance. Therefore, in the following experiments of this study, the AS.GMDH version is adopted for selective combination forecasting.
Furthermore, Table 6 gives the models that participate in the optimal combination model for the two consumption nonlinear subseries selected by the AS.GMDH model. It can be seen from the table that, for each nonlinear subseries, the AS.GMDH model chooses two of the four candidates (AdaBoost.BP, AdaBoost.GP, AdaBoost.RBF, and AdaBoost.SVR) to participate in the optimal combination. Two conclusions can thus be drawn. On the one hand, the optimal combination selected by the GMDH selective combination forecasting model with its self-organizing modeling technology is not a single candidate model, which effectively compensates for the weakness of any single prediction model with poor performance. On the other hand, it does not include all candidate models either, which overcomes the information redundancy that combining all candidate models (namely, the traditional combination forecasting model) may introduce, thus improving the prediction performance of the model.
Table 6 Models participating in the optimal combination model constructed by AS.GMDH
Nonlinear subseries          Selected models
Total oil consumption        AdaBoost.GP, AdaBoost.RBF
4.5.2. Comparisons of the selective combination model with the models participating in the combination
To verify the performance of the GMDH-based selective combination forecasting model, this study compares the GMDH-based combination model AS.GMDH with the four models participating in the combination: AdaBoost.BP, AdaBoost.SVR, AdaBoost.GP, and AdaBoost.RBF. Fig. 5 and Fig. 6 show the comparison results for the total energy consumption nonlinear subseries and the total oil consumption nonlinear subseries, respectively.
Figure 5. Comparison of the GMDH combination model with the models participating in the combination for the total energy consumption nonlinear subseries (the AS.GMDH model attains RMSE 0.5738 and MAPE 8.54%).
As can be seen from Fig. 5, for the total energy consumption nonlinear subseries, according to the RMSE evaluation criterion, the AS.GMDH model is optimal, followed by the AdaBoost.SVR and AdaBoost.BP models, and finally the AdaBoost.RBF and AdaBoost.GP models. Moreover, according to the MAPE evaluation criterion, AS.GMDH is still the optimal model, followed by AdaBoost.BP, AdaBoost.GP, AdaBoost.SVR, and AdaBoost.RBF. Thus, the conclusion can be drawn that, for the total energy consumption nonlinear subseries, the AS.GMDH model proposed in this study has a better forecasting performance than the four models participating in the combination. According to Fig. 6, it can be seen that for the total oil consumption nonlinear subseries, the GMDH selective combination forecasting model has the smallest value on both evaluation criteria, especially on the MAPE evaluation criterion: the value of AS.GMDH is 13.84% lower than that of AdaBoost.RBF. This shows that AS.GMDH still has the best forecasting performance for the total oil consumption nonlinear subseries.
Figure 6. Comparison of the GMDH combination model with the models participating in the combination for the total oil consumption nonlinear subseries.
4.6. Comparisons of the proposed hybrid model with other models
To verify the overall forecasting performance of the proposed hybrid model HFGSE, this article compared it with other commonly used time series models. First, it compared the HFGSE model with the GAR model put forward earlier (which only predicts the linear trend of the energy consumption time series and discards the nonlinear residual subseries directly); the results are shown in Table 7. It can be seen from the table that for both the total energy consumption time series and the total oil consumption time series, the errors of the HFGSE model, which predicts the nonlinear residual series, are always smaller than those of the GAR model. The conclusion can be drawn that for both consumption time series, the nonlinear residual series do carry useful information for prediction modeling.

Table 7 Comparison of the forecasting performance of the HFGSE and GAR models

Next, this study compared the HFGSE model with four simple hybrid models that first use the GAR model to predict the linear trend, and then employ the BP, SVR, GP, and RBF models, respectively, to predict the nonlinear fluctuations, finally combining the two parts for the forecasting result. Furthermore, it compared the HFGSE model with three recently proposed hybrid forecasting models: the combination forecasting method GM-ARIMA [21], and the divide and rule methods EMD-LSSVR [35] and DEMD-SVR-AR [33]. The results are shown in Table 8. The bold value in the table corresponds to the smallest error in the current row. The number in parentheses indicates the rank of the model in the row; the smaller the rank, the better the model's performance. The last row shows the average rank of each model.
According to Table 8, the following conclusions can be obtained: 1) For both the total energy consumption and total oil consumption time series, HFGSE, put forward by this study, has the smallest value of the MAPE evaluation criterion; the RMSE value of HFGSE is larger only than that of DEMD-SVR-AR for the total oil consumption time series. In addition, from the average rank in the last row of the table, it can be seen that the average rank of HFGSE is also the smallest. Thus, compared with the seven other hybrid models, HFGSE has the best overall forecasting performance. 2) Among the seven other hybrid models, the average rank of DEMD-SVR-AR is larger only than that of the HFGSE model proposed in this study, followed by EMD-LSSVR, GAR&BP, GM-ARIMA, GAR&SVR, and GAR&GP, and finally GAR&RBF. This indicates that the overall forecasting performance of the DEMD-SVR-AR model is superior to those of the six other models, whereas that of GAR&RBF is the worst.

Table 8 Comparisons of the forecasting performance of the HFGSE and the other seven hybrid models
           HFGSE     GAR&BP    GAR&SVR   GAR&GP    GAR&RBF   GM-ARIMA  EMD-LSSVR  DEMD-SVR-AR
Total energy consumption time series
MAPE       1.20%(1)  2.93%(4)  3.05%(5)  3.16%(6)  4.56%(8)  3.40%(7)  1.30%(2)   1.42%(3)
Total oil consumption time series
4.7. Out-of-sample forecasting of the proposed hybrid model
Based on the above analyses and comparisons, the HFGSE model can accurately predict energy consumption. Furthermore, Table 9 shows the out-of-sample forecasting results of the HFGSE model for the two consumption time series from 2015 to 2020. It can be seen from the table that China's energy consumption will continue to rise from 2015 to 2020, and the total energy consumption and total oil consumption will reach 5261.47 and 1017.56 million tons of standard coal by 2020, respectively. The average annual growth rate of total energy consumption in 2015-2020 is 4.14%, whereas that of total oil consumption is 5.24%.
Table 9 Forecasting of the HFGSE model for two consumption time series from 2015 to 2020 (unit: ten thousand tons of standard coal)
Year                                    2015    2016    2017    2018    2019    2020
Total energy consumption time series    435637  448275  453746  485768  499398  526147
Total oil consumption time series       77059   81498   86148   91064   96262   101756
Meanwhile, since the real energy consumption data of 2015 and 2016 in China are available now, the forecasting accuracy for 2015 and 2016 is shown in Table 10. Comparing Tables 10 and 8, it can be found that the HFGSE model shows little difference in prediction performance between the out-of-sample period and the test set, which indicates that the HFGSE model has a strong generalization ability.

Table 10 Forecasting accuracy of the HFGSE model for two consumption time series in 2015 and 2016
Total energy consumption time series    Total oil consumption time series
Figure 7 depicts the predicted results of energy consumption and compares them with the real values; the triangle-dotted line and the cross-dotted line represent the predicted values of total energy consumption and total oil consumption, respectively, while the circle solid line and the square solid line represent the real values of total energy consumption and total oil consumption, respectively. The dotted lines and solid lines for 1978-2014 in the figure almost overlap completely, which further indicates that the HFGSE model fits the energy consumption time series well. After 2015, the triangle-dotted line and the cross-dotted line still maintain a growth trend, but the growth rate of total energy consumption begins to decelerate, whereas that of total oil consumption is basically unchanged.
Figure 7. Comparison between the predicted and the real values of energy consumption (unit: ten thousand tons of standard coal).
5. Conclusion
Researching and building scientific energy consumption models and accurately predicting the future gap between energy supply and demand have important practical significance for our country's sustainable economic and social development, the development of the energy industry, and the rational use of energy. This study constructs the hybrid forecasting model HFGSE and applies it to the time series of total energy consumption and total oil consumption in China; the main conclusions are as follows:
1) Compared with the four single models (BP, SVR, GP and RBF), the AdaBoost.RT ensemble algorithm achieves better forecasting performance on the nonlinear subseries.
2) This study compares four different versions of the GMDH selective combination forecasting model, and the results show that the AS.GMDH model has the best overall forecasting performance.
3) The comparisons of the AS.GMDH combination forecasting model with the models participating in the combination show that AS.GMDH performs best on the nonlinear subseries.
4) Compared with the GAR model and seven other hybrid models, the HFGSE model has the best forecasting performance. In addition, the out-of-sample forecasting proves the superiority of the HFGSE model again.
5) The HFGSE model is applied to out-of-sample forecasting, and the results demonstrate that the total energy consumption and total oil consumption in China will keep growing until 2020.
In the process of constructing the GMDH neural network, the reference function only considers first-order linear K-G polynomials, without further study of other forms of reference function. In fact, in the real world, the relationship between the dependent and independent variables may not be a simple first-order linear one. Therefore, considering more complex nonlinear reference functions would be more in line with the actual relationship and may further improve the performance of the model; this is also a further research direction of this study.
Acknowledgments
Thanks for the constructive suggestions of the editor and anonymous reviewers. This study is partly supported by the National Natural Science Foundation of China under Grant Nos. 71471124 and 71273036, and the Excellent Youth Fund of Sichuan University under Grant Nos. skqx201607, sksyl201709, and skzx2016-rcrw14.
TE
25 References
26 [1] B. World, BP Statistical Review of World Energy 2016. Available from: http://www.bp.com/en/global/corporate
EP
27 /energy-economics/statistical-review-of-world-energy.html.
28 [2] W.G.J. Dupree, J.S. Corsentino, United States energy through the year 2000, Nasa Sti/recon Technical Report, 1975.
29 [3] R.P. Thompson, Weather sensitive electric demand and energy analysis on a large geographically diverse power system
C
30 application to short term hourly electric demand forecasting, IEEE Transactions on Power Apparatus and Systems 95 (1)
31
AC
(1976) 385-393.
32 [4] S. Parikh, M.H. Rothkopf, Long-run elasticity of US energy demand: A process analysis approach, Energy Economics 2 (1)
33 (1980) 31-36.
34 [5] Z.R. Yang, The potential and means of saving energy, China's Energy 3 (4) (1980) 5-8. (in Chinese).
35 [6] Z.H. Wu, See the Way Out of the Energy Crisis from Energy Science and Technology, Knowledge Press, 1980. (in Chinese)
36 [7] D. Shi, The improvement of energy utilization efficiency in China's economic growth, Economic Research Journal 48 (9)
38 [8] The State Planning and Energy-saving Commission, Development and Application of Energy Prediction Model, China
40 [9] P. Sen, M. Roy, P. Pal, Application of ARIMA for forecasting energy consumption and GHG emission: A case study of an
41 Indian pig iron manufacturing organization, Energy 116 (12) (2016) 1031-1038.
22
ACCEPTED MANUSCRIPT
1 [10] A.E. Clements, A.S. Hurn, Z. Li, Forecasting day-ahead electricity load using a multiple equation time series approach,
3 [11] K.G. Boroojeni, M.H. Amini, S. Bahrami, S.S. Iyengar, A.F. Sarwat, O. Karabasoglu, A novel multi-time-scale modeling for
4 electric power demand forecasting: From short-term to medium-term horizon, Electric Power Systems Research 142 (1)
5 (2017) 58-73.
6 [12] F. Shaikh, Q. Ji, P.H. Shaikh, N.H. Mirjat, M.A. Uqaili, Forecasting China’s natural gas demand based on optimized
8 [13] S. Ding, K.W. Hiple, Y.G. Dang, Forecasting China's electricity consumption using a new grey prediction model, Energy
PT
9 149 (4) (2018) 314-328.
10 [14] M. Kovačič, B. Šarler, Genetic programming prediction of the natural gas consumption in a steel plant, Energy 66 (3) (2014)
11
RI
273-284.
12 [15] J. Szoplik, Forecasting of natural gas consumption with artificial neural networks, Energy 85 (6) (2015) 208-220.
13 [16] E.S. Irdemoosa, S.R. Dindarloo, Prediction of fuel consumption of mining dump trucks: a neural networks approach,
SC
14 Applied Energy 115 (8) (2015) 77-84.
15 [17] Y. Chen, P. Xu, Y. Chu, W.L. Li, Y.T. Wu, L.Z. Ni, Y. Bao, K. Wang, Short-term electrical load forecasting using the support
16 vector regression (SVR) model to calculate the demand response baseline for office buildings, Applied Energy 195 (6) (2017)
U
17 659-670.
18 [18] A. Rahman, V. Srikumar, A.D. Smith, Predicting electricity consumption for commercial and residential buildings using
AN
19 deep recurrent neural networks, Applied energy 212 (2) (2018) 372-385.
20 [19] F. Zhang, C. Deb, S.E. Lee, J.J. Yang, K.W. Shah, Time series forecasting for building energy consumption using weighted
21 support vector regression with differential evolution optimization technique, Energy and Buildings 126 (8) (2016) 94-103.
M
22 [20] L.Y. Xiao, C. Wang, T.L Liang, W. Shao, A combined model based on multiple seasonal patterns and modified firefly
23 algorithm for electrical load forecasting, Applied Energy 167 (4) (2016) 135-153.
D
24 [21] C.Q Yuan, S.F. Liu, Z.G. Fang, Comparison of China's primary energy consumption forecasting by using ARIMA (the
25 autoregressive integrated moving average) model and GM (1,1) model, Energy 100 (4) (2016) 384-390.
TE
26 [22] J. Nowotarski, B. Liu, R. Weron, T. Hong, Improving short term load forecast accuracy via combining sister forecasts,
28 [23] X.L. Liu, B. Moreno, A.S. García, A grey neural network and input-output combined forecasting model. Primary energy
EP
29 consumption forecasts in Spanish economic sectors, Energy 115 (11) (2016) 1042-1054.
30 [24] F. Zhang, C. Deb, S.E. Lee, J.J. Yang, K.W. Shah, Time series forecasting for building energy consumption using weighted
31 support vector regression with differential evolution optimization technique, Energy & Buildings 126 (8) (2016) 94-103.
[25] Y. Karadede, G. Ozdemir, E. Aydemir, Breeder hybrid algorithm approach for natural gas demand forecasting model, Energy
[26] J.R. Li, R. Wang, J.Z. Wang, Y.F. Li, Analysis and forecasting of the oil consumption in China based on combination models
[27] Y.J. Zhang, F. Ma, B.S. Shi, D.S. Huang, Forecasting the prices of crude oil: An iterated combination approach, Energy
[28] B.Z. Zhu, Y.M. Wei, Carbon price forecasting with a novel hybrid ARIMA and least squares support vector machines
[29] N. Liu, Q.F. Tang, J.H. Zhang, W. Fan, J. Liu, A hybrid forecasting model with parameter optimization for short-term load
[30] A. Addoos, M. Hemmati, A.A. Abdoos, Short term load forecasting using a hybrid intelligent method, Knowledge-Based
[31] J.L. Zhang, Y.J. Zhang, L. Zhang, A novel hybrid method for crude oil price forecasting, Energy Economics 49 (5) (2015) 649-659.
[32] L. Yu, Z.S. Wang, L. Tang, A decomposition–ensemble model with data-characteristic-driven reconstruction for crude oil
[33] G.F. Fan, L.L. Peng, W.C. Hong, F. Sun, Electricity load forecasting by the SVR model with differential empirical mode
[34] I.P. Panapakidis, A.S. Dagoumas, Day-ahead natural gas demand forecasting based on the combination of wavelet transform and ANFIS/genetic algorithm/neural network model, Energy 118 (1) (2017) 231-245.
[35] B.Z. Zhu, D. Han, P. Wang, Z.C. Wu, T. Zhang, T.M. Wei, Forecasting carbon price using empirical mode decomposition and evolutionary least squares support vector regression, Applied Energy 191 (4) (2017) 521-530.
[36] E.M. Oliveira, F.L.C. Oliveira, Forecasting mid-long term electric energy consumption through bagging ARIMA and
[37] D.L. Wang, Y.D. Wang, X.F. Song, Y. Liu, Coal overcapacity in China: multiscale analysis and prediction, Energy Economics 70 (2) (2018) 244-257.
[38] Y. Freund, R.E. Schapire, Experiments with a new boosting algorithm, in: ICML, pp. 148-156.
[39] A. Ivakhnenko, The group method of data handling in prediction problems, Soviet Autom Control 9 (6) (1976) 21-30.
[40] L. Xie, J. Xiao, H. Zhao, Y. Xiao, Y. Hu, China's energy consumption forecasting by GMDH based auto-regressive model,
[41] … decomposition, SVM and cuckoo search algorithm, Journal of Systems Science and Information 2 (6) (2014) 481-504.
[42] P. Viola, M.J. Jones, Robust real-time object detection, International Journal of Computer Vision 57 (2) (2001) 34-47.
[43] L. Gao, P. Kou, F. Gao, X.H. Guan, AdaBoost regression algorithm based on classification-type loss, in: 8th World Congress
[44] D.P. Solomatine, D.L. Shrestha, AdaBoost.RT: a boosting algorithm for regression problems, in: International Joint
[45] J. Xiao, C.Z. He, X.Y. Jiang, Structure identification of Bayesian classifiers based on GMDH, Knowledge-Based Systems
[46] J. Xiao, C.Z. He, X.Y. Jiang, D.H. Liu, A dynamic classifier ensemble selection approach for noise data, Information
[47] J. Xiao, L. Xie, C.Z. He, Dynamic classifier ensemble model for customer classification with imbalanced class distribution,
[48] J. Xiao, Y. Xiao, A. Huang, D.H. Liu, S. Wang, Feature-selection-based dynamic transfer ensemble model for customer
[49] J. Xiao, H.W. Cao, X.Y. Jiang, X. Gu, L. Xie, GMDH-based semi-supervised feature selection for customer classification,
[50] J.A. Mueller, F. Lemke, Self-organizing Data Mining: An Intelligent Approach to Extract Knowledge from Data, Libri, 2000.
[51] Y. Xiao, J.J. Liu, Y. Hu, Y.F. Wang, Time series forecasting using a hybrid adaptive particle swarm optimization and neural network model, Journal of Systems Science and Information 2 (4) (2014) 335-344.
[52] J. Xiao, X.Y. Jiang, C.Z. He, G. Teng, Churn prediction in customer relationship management via GMDH-based multiple
[53] S.W. Yu, K.J. Zhu, A hybrid procedure for energy demand forecasting in China, Energy 37 (1) (2012) 396-404.
Highlights
This study employs the selective ensemble method for the nonlinear subseries.
The selective ensemble method performs better than its constituent models.
The hybrid model outperforms seven other models on the original time series.
Out-of-sample forecasts for the two time series from 2015 to 2020 are presented.