
Accepted Manuscript

A hybrid model based on selective ensemble for energy consumption forecasting in China

Jin Xiao, Yuxi Li, Ling Xie, Dunhu Liu, Jing Huang

PII: S0360-5442(18)31226-X
DOI: 10.1016/j.energy.2018.06.161
Reference: EGY 13203

To appear in: Energy

Received Date: 29 November 2017
Revised Date: 19 June 2018
Accepted Date: 24 June 2018

Please cite this article as: Xiao J, Li Y, Xie L, Liu D, Huang J, A hybrid model based on selective ensemble for energy consumption forecasting in China, Energy (2018), doi: 10.1016/j.energy.2018.06.161.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
A hybrid model based on selective ensemble for energy consumption forecasting in China

Jin Xiao a, Yuxi Li a, Ling Xie a, Dunhu Liu b, Jing Huang c,1

a Business School, Sichuan University, Chengdu 610064, China
b Management Faculty, Chengdu University of Information Technology, Chengdu 610103, China
c School of Public Administration, Sichuan University, Chengdu 610064, China
Abstract: It is of great significance to develop accurate forecasting models for China's energy consumption. Energy consumption time series are often complex and nonlinear, and a single model cannot achieve satisfactory forecasting results. Therefore, in recent years, more and more scholars have tried to build hybrid models to handle this issue, among which the divide and rule method is the most popular. However, the existing divide and rule models often predict the decomposed energy consumption subseries with a single forecasting model. This study introduces the group method of data handling (GMDH) technique for energy consumption forecasting in China and constructs a hybrid forecasting model based on GMDH selective ensemble, focusing mainly on predicting the nonlinear variation of energy consumption. The model first predicts the linear trend of the energy consumption time series with the GMDH-based autoregressive model and then obtains the residual subseries of energy consumption. Considering the highly nonlinear characteristics of the residual subseries, this study introduces AdaBoost ensemble technology to enhance the forecasting performance of four single nonlinear prediction models (back propagation neural network, support vector regression, genetic programming, and radial basis function neural network), obtaining four different ensemble models on the nonlinear subseries. Further, the prediction results of these four AdaBoost ensemble models are used as initial inputs, and the selective combination prediction for the nonlinear subseries is obtained with GMDH. Finally, the two parts are added up to obtain the final prediction. The empirical analysis of total energy consumption and total oil consumption in China shows that the forecasting performance of the proposed model is better than that of the GMDH-based autoregressive model and seven other hybrid models, and this study gives out-of-sample forecasts of the two time series from 2015 to 2020.

Key words: prediction of energy consumption; GMDH; AdaBoost ensemble technology; selective combination forecasting; hybrid forecasting

1. Introduction

Since the economic reform ("reform and opening-up"), the Chinese economy has developed rapidly, and energy consumption has increased continuously. The BP Statistical Review of World Energy 2016 [1] pointed out that, although the Chinese economy grew slowly and was undergoing a structural transformation, China remained the country with the largest energy consumption, production, and net imports in the world. In 2015, China's energy consumption was 23% of total global consumption and comprised 34% of the net increase in global energy consumption. Among fossil energies, China's consumption of oil increased at the fastest rate, 6.7%. Among non-fossil energies, solar energy increased the fastest, at 69.7%; China surpassed Germany and the USA and became the largest solar electricity generation country in the world. Therefore, it is of practical significance to construct a scientific energy consumption model and accurately predict the future gap between supply and demand, which matters for sustainable economic and social development, energy industry development, the reasonable use of energy resources, the construction of a conservation-minded society, and the creation of a national energy strategy.

1 Corresponding author. E-mail address: 147895715@qq.com.

Nomenclature

ARIMA    autoregressive integrated moving average
GM       grey prediction model
GP       genetic programming
SVR      support vector regression
ANN      artificial neural network
DR       demand response
MLP      multi-layer perceptron
ANFIS    adaptive neuro-fuzzy inference system
EMD      empirical mode decomposition
DEMD     differential empirical mode decomposition
LSSVR    least squares support vector regression
IMF      intrinsic mode function
GMDH     group method of data handling
HFGSE    hybrid forecasting model based on GMDH selective ensemble
AR       autoregression
GAR      GMDH-based AR
BP       back propagation
RBF      radial basis function
yt       original energy consumption time series
lt       forecasted linear trend
rt       nonlinear subseries of the energy consumption
r̂t,i     forecasting result of the t-th sample by the i-th ensemble model in the nonlinear subseries
Y        dependent variable
X        independent variable
W_train  training set
W_test   test set
y        output in Eq. (1)
a        coefficient vector in Eq. (1)
m        number of initial models in Eq. (3)
wk       estimated output in Eq. (4)
AS       asymmetric stability
MR       mean regularization
RMSE     root mean square error
MAPE     mean absolute percentage error
SRMSE    symmetrical root mean squared error
SMAPE    symmetrical mean absolute percentage error
mA       sample size of the model learning set A
mB       sample size of the model selection set B
T        maximum number of iterations
∅        threshold value of the relative forecasting error
C        penalty parameter of SVR
γ        kernel width of SVR
ŷt(W)    forecasted output of the t-th sample in the entire training set by the model trained on the same data set in Eq. (8)
ŷt(A)    forecasted output of the t-th sample in the selection set B by the model trained in the model learning set A in Eqs. (9) and (10)
ŷt(B)    forecasted output of the t-th sample in the model learning set A by the model trained in the model selection set B in Eqs. (9) and (10)
-        initialization weight of the i-th sample
-        weight of the i-th sample in the τ-th iteration
-        relative forecasting error in the τ-th iteration
-        weight of the weak learner in the τ-th iteration
-        the largest lag order of the BP neural network
-        number of nodes in the hidden layer of the BP neural network

1.1. Literature review

With social development and progress, people have realized the important effect of energy on economic development. After the energy crises of 1973 and 1979, the entire world became conscious of the energy constraint on the economy and of the significance of consumption forecasting. During that period, a great deal of research on energy consumption demand forecasting appeared abroad. Dupree and Corsentino [2] presented the future energy consumption within the major consuming sectors and the energy supply sources of America. Thompson [3] proposed a weather-sensitive electric load and energy forecasting method that could be used for both long-term and short-term prediction. Parikh and Rothkopf [4] studied the long-run elasticity of US energy demand and proposed an effective process analysis method. Because no energy consumption data existed before the reform, research in China remained at the level of policy suggestions. Yang [5] put forward several ways to save energy in China. Wu [6] proposed the idea of using forecasting technology to address the energy crisis. After that, the availability of energy consumption data brought great progress in domestic studies; for example, Shi [7] pointed out that the improvement of China's energy utilization efficiency had been very significant since the reform and opening-up. The State Planning and Energy-saving Commission [8] focused on the construction and application of energy forecasting models. Recently, scholars have proposed many methods for predicting energy consumption, and these can be divided into two classes: single forecasting models and hybrid forecasting models.
Table 1 Typical literature using single forecasting models

Model type: Time series model
  Typical literature: Sen et al. (2016) [9]; Clements et al. (2016) [10]; Boroojeni et al. (2017) [11]; Shaikh et al. (2017) [12]; Ding et al. (2018) [13]
  Advantages: intuitive and explainable functional form; low computational complexity; do not require extended data
  Disadvantages: pre-assumed form of the model; data independence assumption; low accuracy for nonlinear variation

Model type: Nonlinear forecasting model
  Typical literature: Kovačič and Šarler (2014) [14]; Szoplik (2015) [15]; Irdemoosa and Dindarloo (2015) [16]; Chen et al. (2017) [17]; Rahman et al. (2018) [18]
  Advantages: do not need a pre-assumed form of the model; strong nonlinear mapping ability; can solve complex nonlinear problems
  Disadvantages: results cannot be easily explained; high computational complexity
Some typical literature using single forecasting models is summarized in Table 1. The commonly used single models include: 1) Time series models, including autoregressive integrated moving average (ARIMA) models [9], regression analysis models [10], and grey prediction models (GM) [13]. For example, Sen et al. [9] focused on how to select the best possible ARIMA model for short-term forecasting and found that ARIMA(1,0,0)x(0,1,1) was the best model for energy consumption and ARIMA(0,1,4)x(0,1,1) the best for greenhouse gas (GHG) emission. Clements et al. [10] proposed a multiple equation time series model to forecast the day-ahead electricity load in Australia, and found that this model could achieve the same or even better performance than complex nonlinear and nonparametric forecasting models. Ding et al. [13] developed a novel optimized grey model based on the principle of "new information priority" to predict China's electricity consumption, combining a new initial condition with a rolling mechanism; the empirical results showed that the model was superior to several benchmark models. 2) Nonlinear forecasting models, including genetic programming (GP) [14], artificial neural networks (ANN) [15], support vector regression (SVR) [17], etc. For instance, Kovačič and Šarler [14] applied the GP model to forecasting the natural gas consumption in a steel plant, and the results showed high accuracy. Szoplik [15] used the multilayer perceptron (MLP) to forecast the gas demand in Szczecin, Poland, and the results showed good performance when forecasting the gas consumption on any day of the year and any hour of the day. Chen et al. [17] proposed a new SVR model, which used the ambient temperature two hours before a demand response (DR) event as the input variable, for forecasting the DR baselines of office buildings.
Economic time series often have the characteristics of complexity and nonlinearity, and a single model cannot always analyze and predict energy demand accurately. Therefore, in recent years, more and more scholars have tried to build hybrid models to handle this issue, and these models can be roughly classified into two types: 1) The combination forecasting method, which trains several models to predict the original time series and then combines them with appropriate weights to obtain the final forecasting result. For example, Zhang et al. [19] constructed a weighted model combining nu-SVR and epsilon-SVR, in which the differential evolution algorithm was employed to determine the weight of each model; this model was used to forecast the daily and half-hourly energy consumption of a building in Singapore, and the results showed higher accuracy than several other models. Yuan et al. [21] combined GM and ARIMA models with equal weights to forecast China's primary energy consumption, and found that the forecasting performance of the combination was better than that of the single GM and ARIMA models. Li et al. [26] improved the traditional combination method by allowing the weight coefficient of a participating model to be negative; the experimental results on China's oil consumption indicated that this new method outperformed the traditional combination methods. 2) The divide and rule method, which first decomposes the original time series into several subseries, then models and predicts each subseries with an appropriate model, and finally integrates the prediction results according to certain rules. This method is used most frequently. For instance, Fan et al. [33] proposed a model to forecast the electric load in Australia and the USA: it first used differential empirical mode decomposition (DEMD) to decompose the original time series into several intrinsic mode functions (IMFs) and a residual subseries; secondly, the SVR model was employed to forecast the IMFs and an autoregression model the residual subseries; finally, all the results were summed up to obtain the final prediction. The empirical results illustrated that this model could provide both accurate prediction and interpretability. Panapakidis and Dagoumas [34] proposed a hybrid model to predict the day-ahead natural gas demand. It first decomposed the original time series into several subseries by wavelet transform, then employed a genetic-algorithm-optimized adaptive neuro-fuzzy inference system (ANFIS) to forecast each subseries, and finally used a feed-forward neural network (FFNN) to aggregate the forecasting results of all the subseries. The experimental results showed that the model had good robustness. In addition to energy consumption forecasting, hybrid models are widely applied in energy price forecasting. For example, Zhu et al. [35] developed an EMD-based least squares support vector regression (LSSVR) model to predict the carbon price: it first decomposed the carbon price time series into several IMFs and a residue by EMD, then used LSSVR to forecast the IMFs and the residue, respectively; finally, all the forecasting values were aggregated into the final prediction. Compared with some traditional forecasting methods, the proposed model had better performance and robustness. More typical literature on hybrid forecasting models can be found in Table 2.
Table 2 Typical literature using hybrid forecasting models

Model type: Combination forecasting method
  Typical literature: Zhang et al. (2016) [19]; Xiao et al. (2016) [20]; Yuan et al. (2016) [21]; Nowotarski et al. (2016) [22]; Liu et al. (2016) [23]; Zhang et al. (2016) [24]; Karadede et al. (2017) [25]; Li et al. (2018) [26]; Zhang et al. (2018) [27]
  Advantages: convenient and simple; robust for complex problems
  Disadvantages: computationally intensive; difficult to decide which models to combine; does not take data characteristics into consideration

Model type: Divide and rule method
  Typical literature: Zhu and Wei (2013) [28]; Liu et al. (2014) [29]; Abdoos et al. (2015) [30]; Zhang et al. (2015) [31]; Yu et al. (2015) [32]; Fan et al. (2016) [33]; Panapakidis and Dagoumas (2017) [34]; Zhu et al. (2017) [35]; Oliveira and Oliveira (2018) [36]; Wang et al. (2018) [37]
  Advantages: assigns an appropriate forecasting method based on the data characteristics; robust for complex problems
  Disadvantages: complex model; computationally intensive
1.2. Our Contributions

The above research has contributed much to energy demand forecasting, yet some gaps remain in the current state of the art: 1) The existing divide and rule methods often predict the decomposed energy consumption subseries with a single forecasting model. In fact, for subseries with strong nonlinear fluctuation, it is hard to obtain satisfactory results with a single model. Ensemble learning [38], which has arisen in recent years, provides a good way to handle this issue: its basic idea is to combine a series of weak learners to enhance their prediction performance. 2) Most existing ensemble methods allow all the trained models to participate in the combination, so redundancy and multicollinearity may exist, which may degrade the performance of the ensemble model. Forecasting performance may be improved by selecting and combining the forecasting results of a subset of the models for the final decision, i.e., selective ensemble. The factor-screening function of the group method of data handling (GMDH) neural network proposed by Ivakhnenko [39] can objectively and automatically choose the factors that critically influence the research object [40]. Thus, GMDH can reduce the effect of multicollinearity on the performance of the ensemble model to some extent.

To fill the gaps mentioned above, this study introduces the GMDH technique and proposes a hybrid forecasting model based on GMDH selective ensemble (HFGSE). It uses the GMDH-based autoregressive (GAR) model proposed in the authors' previous work [40] to predict the linear trend of the energy consumption time series and obtains the nonlinear residual subseries. Considering the highly nonlinear characteristics of the residual subseries, this study introduces AdaBoost ensemble technology [38] to enhance the forecasting performance of four single nonlinear prediction models, back propagation (BP) neural network, support vector regression (SVR) [41], genetic programming (GP), and radial basis function (RBF) neural network, to obtain four different ensemble models on the nonlinear subseries: AdaBoost.BP, AdaBoost.SVR, AdaBoost.GP, and AdaBoost.RBF. Further, the prediction results of these four AdaBoost ensemble models are used as initial inputs, and the selective combination prediction for the nonlinear subseries is obtained with GMDH. Finally, the predictions of the two parts are integrated to obtain the final forecasting results. The empirical analysis on China's total energy consumption and total oil consumption time series verifies the effectiveness of the HFGSE model.

The novelty of this study can be summarized as follows:

1) This study employs ensemble learning to predict the nonlinear subseries after decomposition, to improve the forecasting performance of the single models.

2) Instead of integrating all the predictors on the nonlinear subseries, the proposed method utilizes selective ensemble to avoid redundancy and multicollinearity to some extent. To the best of our knowledge, this study employs selective ensemble in the energy consumption forecasting field for the first time.
1.3. Organization of the paper

The rest of this study is organized as follows: Section 2 describes the methodology underlying the proposed model, including the AdaBoost ensemble method, the GMDH neural network, and the GAR model. Section 3 discusses the hybrid forecasting model based on GMDH selective ensemble, that is, HFGSE, in detail. Section 4 presents the empirical study. Finally, the findings of this study are summarized in Section 5.

2. Related Theories

In this study, the AdaBoost ensemble model, the GMDH neural network, and the GAR model are used to construct the hybrid model HFGSE. A brief description of these methods is given in this section.
2.1. AdaBoost ensemble model

In machine learning, ensemble learning is an effective method for increasing learning accuracy by combining the outputs of many weak learners. Boosting is a commonly used ensemble algorithm; it has many different versions, of which AdaBoost is the most popular.

AdaBoost was proposed by Freund and Schapire [38]. To improve the learning performance of a weak learner with the AdaBoost algorithm, one first initializes the sample weight distribution over the training set, assigning every sample the same initial weight: if the training set contains n samples, the weight of each sample is 1/n. Therefore, in the first iteration of training the weak learner with AdaBoost, each sample is selected with the same probability. Training on the selected samples under the appointed learning rule yields the first weak learner, h1. AdaBoost then calculates the classification error of the training samples at the current iteration, and the weight distribution of the samples for the next iteration is updated according to this error. The update rule is: increase the weights of misclassified samples and decrease the weights of correctly classified samples. Repeating the process T times produces T weak learners: h1, h2, ..., hT. Finally, the prediction is obtained by weighting the forecasting results of the T weak learners.
At the beginning, applications of the AdaBoost algorithm focused on classification [42], such as face recognition and vehicle license plate recognition. In recent years, it has also been applied to forecasting [43]. For example, Solomatine and Shrestha [44] proposed the AdaBoost.RT algorithm for forecasting. The algorithm is similar to AdaBoost; the difference is that AdaBoost.RT, at the end of each iteration, increases the weights of the samples whose relative error is greater than a pre-set threshold ∅. For the detailed process, please refer to [44].
21 2.2. Group method of data handling neural network
M

22 The GMDH neural network is the core technique of self-organizing data mining [45], and it can
23 decide the variables to enter the model and the structure and parameters of the model in a
D

24 self-organizing way [46].


25 Generally speaking, before GMDH modeling, the training set, W, needs to be randomly divided
TE

26 into two subsets, namely, model learning set A for estimation of model parameters, and model selection
27 set B for performance evaluation of intermediate candidate models [47]. GMDH constructs the general
28 relation between the inputs and outputs variables through the reference function. Generally speaking, as
EP

29 a reference function, it takes the discrete form of a K-G polynomial:


30 = ' + ∑*  + + ∑* ∑,* , + +, + ∑* ∑,* ∑-* ,- + +, +- + ⋯, (1)
31 where y is the output, . = + , +# , ⋯ +
is the input vector, and  is the coefficient or weight vector.
Specifically, the form of the first-order linear K-G polynomial including  variables can be expressed
C

32
33
AC

as follows:
34 Let
35 !+ , +# , ⋯ , +
=  + + # +# + ⋯ +  + , (2)
36 and take all its sub-items as the  initial models of the modeling network structure:
37 / =  + , /# = # +# , ⋯ , / =  + . (3)
38 Set the  initial models of Eq. (3) as the inputs of the GMDH network, combine all their possible

pairs and generate the 1# =


2

#
39 intermediate candidate models of the first layer [48]. The transfer

40 function is as follows:
41  = !3/ , /, 4; , 6 = 1, 2, ⋯ , ; ≠ 6, (4)

7
ACCEPTED MANUSCRIPT
1 where  is the estimated output. Obtain parameters through least squares (LS) estimation on the
2 model learning set A. Work out the external criterion value of every intermediate candidate model on
3 the model selection set B. Generally speaking, the smaller the external criterion value is, the higher the
4 performance of the intermediate candidate model is. Rank the external criteria from small to large,
5 select the optimal 9 ≤ 1#
models as the inputs of the second layer, and combine all their possible
6 pairs to generate 1;#< intermediate candidate models:
7 = = !3 , , 4; , 6 = 1, 2, ⋯ , 9 ; ≠ 6. (5)
8 Estimate the parameter of each intermediate candidate model and calculate its external criterion value,

PT
9 select 9# 3≤ 1;#< 4 intermediate candidate models again as the inputs of the third layer, and combine all
10 their possible pairs to generate 1;#> intermediate candidate models:
= !3= , =, 4; , 6 = 1, 2, ⋯ , 9# ; ≠ 6.

RI
11 (6)
12 The process repeats continuously, and the intermediate candidate models of the fourth, fifth, …, layer
13 can be obtained in turn. The termination rule of the model is given through the optimal complexity

SC
14 theory [49]: with the increase of the intermediate candidate models’ complexity, the external criteria
15 values will first become smaller and then larger. Therefore, when the external criteria value reaches its
16 minimum, the corresponding model is the optimal complexity model ∗ (see Fig. 1). Finally, in order
to seek the initial model contained in the optimal complexity model ∗ , one just needs to reconstruct

U
17
18 the GMDH network structure from the last layer until the initial input layer is reached. From Fig. 1, it
AN
19 can be seen that the initial input models v1, v3, v4 and v5 are chosen. In other words, x1, x3, x4, and x5 are
20 chosen. However, v2 is eliminated during the self-adaption process; in other words, x2 is eliminated
21 [50].
[Figure 1 here: the layered GMDH network, with initial inputs v1-v5 feeding first-layer candidates w1, w2, ..., w10, second-layer candidates z (e.g., z2, z6), and a third-layer candidate y2, until the optimal model y* = f(v) is reached; unselected models are eliminated, selected models are reserved.]

Figure 1. The process of GMDH neural network modeling.
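The layer-by-layer mechanism of Eqs. (3)-(6) can be sketched in a few lines of code. This is a simplified, hedged illustration, not the authors' implementation: it assumes a quadratic partial description as the transfer function f, a fixed 70/30 split of W into learning set A and selection set B, and the sum of squared errors on B as the external criterion; `gmdh`, `n_keep`, and `max_layers` are names introduced here.

```python
import numpy as np
from itertools import combinations

def _design(z1, z2):
    # assumed quadratic partial description: 1, z1, z2, z1*z2, z1^2, z2^2
    return np.column_stack([np.ones_like(z1), z1, z2, z1 * z2, z1 ** 2, z2 ** 2])

def gmdh(X, y, n_keep=4, max_layers=5, split=0.7):
    """LS-estimate each pairwise candidate on set A, rank candidates by
    the external criterion on set B, keep the best few as the next
    layer's inputs, and stop when the best criterion stops falling."""
    m = int(split * len(y))
    A, B = slice(0, m), slice(m, None)          # learning / selection split
    layer = X.copy()
    best_val, best_out = np.inf, None
    for _ in range(max_layers):
        cands = []
        for i, j in combinations(range(layer.shape[1]), 2):
            D = _design(layer[:, i], layer[:, j])
            coef, *_ = np.linalg.lstsq(D[A], y[A], rcond=None)  # fit on A
            out = D @ coef
            crit = np.sum((y[B] - out[B]) ** 2)  # external criterion on B
            cands.append((crit, out))
        cands.sort(key=lambda c: c[0])
        if cands[0][0] >= best_val:
            break                                # criterion rose: stop
        best_val, best_out = cands[0]
        layer = np.column_stack([c[1] for c in cands[:n_keep]])
    return best_out, best_val
```

A usage note: if the target depends on only some inputs, the pairwise selection tends to drop the irrelevant columns automatically, which is the factor-screening behavior exploited later for selective combination.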
2.3. Group method of data handling-based autoregression model

In time series forecasting, the ARIMA model is usually adopted to predict the linear trend of a time series. However, before constructing an ARIMA(p, d, q) model, a unit root test should be conducted to determine whether the series is stationary. In addition, the optimal parameter values, namely the autoregressive order p and the moving average order q, must be found through trial and error, whereas the GMDH neural network is a data-driven method that requires little prior knowledge and few assumptions. Thus, the authors' previous work [40] combined a GMDH-type neural network with an ARIMA model and constructed a GMDH-based autoregression (GAR) model for forecasting energy consumption. In this model, the original univariate time series is first converted to a matrix. In the matrix, starting from the second column, each column represents a new variable: yt in the second column is the current period of the energy consumption time series, that is, the dependent variable Y, and the third column to the last are the energy consumption series with lag orders 1, 2, ..., k, respectively, which make up the input vector X. The new data set is then divided into a training set and a test set, and the training set is divided further into a model learning set and a model selection set. Secondly, a GMDH neural network is trained to find the optimal complexity model and decide the optimal autoregression order p. Finally, the energy consumption in the test set is forecasted by the optimal complexity model.

This model ensures a self-organized modeling process, including finding the optimal complexity model, determining the optimal autoregression order, and estimating the model parameters, largely without human interference. The empirical analysis on three energy consumption time series shows that the GAR model outperforms the ARIMA model.
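The series-to-matrix conversion in the first step of GAR can be illustrated as follows. Only the data layout is shown (the GMDH training on top of it is omitted), and the helper name `lag_matrix` is introduced here purely for illustration.

```python
import numpy as np

def lag_matrix(series, k):
    """Build the GAR design matrix: column 0 holds y_t (the dependent
    variable Y); columns 1..k hold the series lagged by 1..k periods
    (the input vector X)."""
    s = np.asarray(series, dtype=float)
    return np.array([s[t - np.arange(k + 1)] for t in range(k, len(s))])

# e.g. with k = 2, each row is [y_t, y_{t-1}, y_{t-2}]:
M = lag_matrix([1, 2, 3, 4, 5], 2)
# rows: [3, 2, 1], [4, 3, 2], [5, 4, 3]
```

The first k observations are consumed as lags, so a series of length n yields n - k usable rows, which are then split into the training and test sets.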

3. Hybrid Forecasting Model Based on Selective Ensemble

In this section, the proposed hybrid model HFGSE is described in detail, including its basic idea, the construction of the external criteria, and the modeling steps.
Table 3 The transfer matrix for the nonlinear subseries

Set        rt      r̂t,1    r̂t,2    r̂t,3    r̂t,4
W_train    r1      r̂1,1    r̂1,2    r̂1,3    r̂1,4
           ...     ...      ...      ...      ...
W_test     ...     ...      ...      ...      ...
           rn      r̂n,1    r̂n,2    r̂n,3    r̂n,4

20 3.1. Basic idea


21
AC

The hybrid model proposed in this study belongs to the divide and rule method. Because China’s
22 energy consumption time series is annual data, no seasonal factor exists. Therefore, this study uses the
23 GAR [40] model proposed earlier to predict its linear trend. The left residual sequence is the non-linear
24 subseries. Because the forecasting of the linear trend is relatively simple, and that of nonlinear
25 subseries is more difficult, it mainly focuses on the latter. Most existing hybrid forecasting models on
26 nonlinear subseries are for constructing a single prediction model, although the forecasting effect is
27 always better than that of the models that consider the linear trend only. Considering the complexity of
28 non-linear subseries, it is hard to obtain a better forecasting effect with the commonly used single time
29 series prediction model. This study first utilizes the ensemble learning model AdaBoost algorithm to
30 predict. It selects four nonlinear subseries classification models, namely, BP, SVR, GP, and RBF, to

9
ACCEPTED MANUSCRIPT
1 train the weak learner of the AdaBoost algorithm, and construct four ensemble prediction models:
2 AdaBoost.BP, AdaBoost.SVR, AdaBoost.GP, and AdaBoost.RBF. Further, it considers combining the
3 four ensemble forecasting results. However, if all four trained ensemble models are combined, then
4 multicollinearity among the models may exist, which will degrade the forecasting accuracy of the
5 model. Forecasting performance can be improved by selecting and combining the forecasting results of
6 a subset of the models for a final decision. Thus, this study introduces a GMDH neural network to
7 establish selective combination forecasting. With the automatic modeling mechanism of GMDH, it
8 selects parts of models from all the ensemble forecasting models, self-organizes to combine them, and

PT
9 ensures their weights.
Suppose that the original energy consumption time series is $y_t$. The HFGSE model proposed in this study includes four steps: 1) Obtain the energy consumption nonlinear subseries: construct a GAR model to predict the linear trend; suppose the result is $\hat{L}_t$. Then the difference between them, $N_t = y_t - \hat{L}_t$, is the energy consumption nonlinear subseries. 2) AdaBoost ensemble prediction on the nonlinear subseries: select the above four nonlinear single models as the weak learners of AdaBoost ensemble learning, and obtain the forecasting results of the four ensemble models on the nonlinear subseries; suppose these are $\hat{N}_t^i$ ($i = 1, 2, 3, 4$). 3) GMDH-based selective combination prediction on the nonlinear subseries: First, transfer the original nonlinear time series $N_t$ and all forecasting results of the ensemble models $\hat{N}_t^i$ ($i = 1, 2, 3, 4$) into a data set stored in matrix form (see Table 3), where $N_t$ denotes the energy consumption nonlinear subseries at the current period, that is, the dependent variable. From the third column to the sixth column, $\hat{N}_t^1$, $\hat{N}_t^2$, $\hat{N}_t^3$, and $\hat{N}_t^4$ construct the independent variable $x = (\hat{N}_t^1, \hat{N}_t^2, \hat{N}_t^3, \hat{N}_t^4)$. Next, divide the whole data set of the table into the model training set $W$ and the test set $G$ (see the first column of Table 3). Further, divide the model training set $W$ horizontally into two subsets, the model learning set A and the model selecting set B, and find the optimal complexity model through the GMDH algorithm. Finally, predict on the test set with the optimal complexity model and record the forecasting result as $\hat{N}_t$. 4) Calculate the final forecasting value of the energy consumption time series: add the forecasting value of the GAR model $\hat{L}_t$ to that of the nonlinear part $\hat{N}_t$ to obtain the final energy consumption forecasting value, that is, $\hat{y}_t = \hat{L}_t + \hat{N}_t$.
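The four steps can be sketched end to end as follows (a minimal illustration only: a plain least-squares line stands in for the GAR model, and a naive persistence forecast stands in for the AdaBoost/GMDH nonlinear stage; all function names are hypothetical):

```python
# Minimal sketch of the HFGSE decompose-forecast-recombine pipeline.

def linear_trend_forecast(y):
    """Fit y_t ~ a*t + b by least squares and return the fitted values L_t."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (yt - y_mean) for t, yt in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    a = num / den
    b = y_mean - a * t_mean
    return [a * t + b for t in range(n)]

def hfgse_like_forecast(y):
    L = linear_trend_forecast(y)                   # step 1: linear trend
    N = [yt - lt for yt, lt in zip(y, L)]          # nonlinear subseries N_t = y_t - L_t
    N_hat = [0.0] + N[:-1]                         # steps 2-3 stand-in: persistence forecast
    return [lt + nh for lt, nh in zip(L, N_hat)]   # step 4: y_hat = L_hat + N_hat
```

In the paper the persistence stand-in is replaced by the AdaBoost.RT ensembles and the GMDH selective combination; only the decompose/recombine structure is shown here.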
3.2. Construction of external criteria

In realistic system modeling, different requirements arise, which may come from the aims of modeling or from prior system knowledge. In GMDH modeling, the external criteria are the mathematical descriptions of these specified requirements, and they select the "optimal" model from the candidate model set. GMDH has an external criteria system [50], from which different external criteria can be selected according to different modeling aims, and new external criteria can be constructed as needed.

This study chooses two external criteria from the existing GMDH external criteria system: the asymmetric stability (AS) criterion and the mean regularization (MR) criterion. Their descriptions are as follows:
(1) Asymmetric stability criterion:

$J_1^2(A) = \sum_{t \in W} \big(y_t - \hat{y}_t(A)\big)^2$, (7)

where $y_t$ is the actual output of the t-th sample in the model training set W, and $\hat{y}_t(A)$ is its forecasted output on W by the model trained on the model learning set A. This criterion first trains the model on subset A and then calculates the sum of squared errors between the actual and forecasted outputs over the entire training set W.
(2) Mean regularization criterion:

$J_2^2(W) = \sum_{t \in W} \big(y_t - \hat{y}_t(W)\big)^2$, (8)

where $\hat{y}_t(W)$ is the forecasted output of the t-th sample in the entire training set W by the model trained on the same dataset; that is, the model learning process and the calculation of the external criterion are both carried out on the training set W.
Furthermore, considering that the root mean square error (RMSE) and the mean absolute percentage error (MAPE) are two commonly used indexes for evaluating model performance in energy consumption prediction, this study constructs two new criteria: the symmetrical root mean squared error (SRMSE) criterion and the symmetrical mean absolute percentage error (SMAPE) criterion. Their descriptions are as follows:

(3) Symmetrical root mean squared error criterion:

$J_3^2 = \sqrt{\dfrac{\sum_{t \in A} \big(y_t - \hat{y}_t(B)\big)^2}{m_A}} + \sqrt{\dfrac{\sum_{t \in B} \big(y_t - \hat{y}_t(A)\big)^2}{m_B}}$, (9)

where $m_A$ and $m_B$ stand for the sample sizes of subsets A and B, respectively, $\hat{y}_t(B)$ is the forecasted output of the t-th sample in the model learning set A by the model trained on the model selecting set B, and $\hat{y}_t(A)$ is the forecasted output of the t-th sample in the model selecting set B by the model trained on the model learning set A. The SRMSE criterion thus calculates the root mean square error on subset A and on subset B simultaneously.


(4) Symmetrical mean absolute percentage error criterion:

$J_4^2 = \dfrac{1}{m_A} \sum_{t \in A} \left| \dfrac{y_t - \hat{y}_t(B)}{y_t} \right| + \dfrac{1}{m_B} \sum_{t \in B} \left| \dfrac{y_t - \hat{y}_t(A)}{y_t} \right|$. (10)

The SMAPE criterion calculates the mean absolute percentage error on subset A and on subset B simultaneously, using the information in subsets A and B symmetrically, as the SRMSE criterion does.
According to different external criteria, different GMDH selective combination forecasting models can be constructed: AS.GMDH, MR.GMDH, SRMSE.GMDH, and SMAPE.GMDH.
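As an illustration, the four criteria in Eqs. (7)-(10) can be computed directly from the actual outputs and the cross-subset predictions (a sketch; the prediction vectors are assumed to be produced elsewhere by the models trained on A, B, or W):

```python
import math

def as_criterion(y_W, yhat_W_from_A):
    """Eq. (7): squared error over the whole training set W of the model trained on A."""
    return sum((y - f) ** 2 for y, f in zip(y_W, yhat_W_from_A))

def mr_criterion(y_W, yhat_W_from_W):
    """Eq. (8): squared error on W of the model trained on W itself."""
    return sum((y - f) ** 2 for y, f in zip(y_W, yhat_W_from_W))

def srmse_criterion(y_A, yhat_A_from_B, y_B, yhat_B_from_A):
    """Eq. (9): RMSE on A (model trained on B) plus RMSE on B (model trained on A)."""
    rmse_A = math.sqrt(sum((y - f) ** 2 for y, f in zip(y_A, yhat_A_from_B)) / len(y_A))
    rmse_B = math.sqrt(sum((y - f) ** 2 for y, f in zip(y_B, yhat_B_from_A)) / len(y_B))
    return rmse_A + rmse_B

def smape_criterion(y_A, yhat_A_from_B, y_B, yhat_B_from_A):
    """Eq. (10): MAPE on A (model trained on B) plus MAPE on B (model trained on A)."""
    mape_A = sum(abs((y - f) / y) for y, f in zip(y_A, yhat_A_from_B)) / len(y_A)
    mape_B = sum(abs((y - f) / y) for y, f in zip(y_B, yhat_B_from_A)) / len(y_B)
    return mape_A + mape_B
```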


3.3. Modeling steps

The modeling flowchart of the energy consumption forecasting model HFGSE proposed in this study is shown in Fig. 2. Its specific modeling steps are as follows:

Figure 2. The modeling flowchart of the HFGSE model.
Step 1: Obtain the energy consumption nonlinear subseries. Construct a GAR model on the original energy consumption time series $y_t$ and predict the linear trend. Suppose the forecasting result is $\hat{L}_t$; then, the energy consumption nonlinear subseries is $N_t = y_t - \hat{L}_t$;

Step 2: AdaBoost ensemble forecasting on $N_t$. Suppose $N_t$ contains $m$ sample points, the maximum number of iterations is $T$, and the threshold value of the relative forecasting error is $\phi$. The process of integrating a nonlinear single forecasting model with the AdaBoost.RT algorithm is as follows [45]:

(1) Initialize the weight vector $w_i^{(1)} = \frac{1}{m}$, $i = 1, 2, \cdots, m$;

(2) For $t = 1, 2, \cdots, T$:

a. Calculate the sample weight distribution $D_t(i) = w_i^{(t)} / \sum_{j=1}^{m} w_j^{(t)}$, and train one weak learner $f_t(x) \rightarrow y$;

b. Calculate the relative forecasting error rate $\varepsilon_t = \sum_{i:\, |f_t(x_i) - y_i|/y_i > \phi} D_t(i)$, where $y_i$ is the real output of the i-th sample, and $f_t(x_i)$ is the forecasting output of the weak learner;

c. Assign the weight $\beta_t = \varepsilon_t^2$ to the weak learner;

d. Update the weight vector of the samples:

$w_i^{(t+1)} = w_i^{(t)} \times \begin{cases} \beta_t, & |f_t(x_i) - y_i|/y_i \le \phi \\ 1, & \text{otherwise} \end{cases}$. (11)

(3) Output the final hypothesis:

$f_{fin}(x) = \sum_{t=1}^{T} \log\!\left(\frac{1}{\beta_t}\right) f_t(x) \Big/ \sum_{t=1}^{T} \log\!\left(\frac{1}{\beta_t}\right)$. (12)

This study selects the four single nonlinear forecasting models in turn to train the weak learners; obtains four models, AdaBoost.BP, AdaBoost.SVR, AdaBoost.RBF, and AdaBoost.GP; and records their forecasting results as $\hat{N}_t^1$, $\hat{N}_t^2$, $\hat{N}_t^3$, $\hat{N}_t^4$, respectively.
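The AdaBoost.RT loop of Eqs. (11)-(12) can be sketched as a short, runnable routine (the weak learner is passed in as a training function; the weighted least-squares line used here is a hypothetical stand-in, not one of the paper's four models):

```python
import math

def adaboost_rt(X, y, train_weak, T=50, phi=0.10):
    """Boost a weak regression learner with AdaBoost.RT.
    train_weak(X, y, D) must fit under weight distribution D and return f(x)."""
    m = len(X)
    w = [1.0] * m                                   # sample weights (uniform start)
    models, betas = [], []
    for _ in range(T):
        total = sum(w)
        D = [wi / total for wi in w]                # step a: weight distribution
        f = train_weak(X, y, D)
        # step b: weighted rate of samples whose relative error exceeds phi
        eps = sum(D[i] for i in range(m) if abs(f(X[i]) - y[i]) / abs(y[i]) > phi)
        if eps == 0.0:                              # perfect round: keep it and stop
            models.append(f)
            betas.append(1e-12)
            break
        if eps >= 1.0:                              # degenerate round: stop boosting
            break
        beta = eps ** 2                             # step c: beta_t = eps_t^2
        models.append(f)
        betas.append(beta)
        for i in range(m):                          # step d: shrink weights of easy samples
            if abs(f(X[i]) - y[i]) / abs(y[i]) <= phi:
                w[i] *= beta
    def strong(x):                                  # Eq. (12): log(1/beta)-weighted average
        den = sum(math.log(1.0 / b) for b in betas)
        return sum(math.log(1.0 / b) * f(x) for f, b in zip(models, betas)) / den
    return strong

def weighted_line(X, y, D):
    """Hypothetical weak learner: weighted least-squares line on a scalar input."""
    mx = sum(d * x for d, x in zip(D, X))
    my = sum(d * v for d, v in zip(D, y))
    cov = sum(d * (x - mx) * (v - my) for d, x, v in zip(D, X, y))
    var = sum(d * (x - mx) ** 2 for d, x in zip(D, X))
    a = cov / var if var else 0.0
    b = my - a * mx
    return lambda x: a * x + b
```

In the paper, `train_weak` would be one of the BP, SVR, GP, or RBF models, with $\phi$ = 10% and $T$ = 50 as set in Section 4.2.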
Step 3: Conduct selective combination forecasting with a GMDH neural network on $N_t$.

(1) Transfer and prepare the data: Transfer the original nonlinear time series $N_t$ and the forecasting results of the four ensemble models $\hat{N}_t^1, \hat{N}_t^2, \hat{N}_t^3, \hat{N}_t^4$ into matrix form as in Table 3; divide the matrix data into the model training set W and the model test set G. Further, divide the training set into the model learning set A and the model selecting set B;

(2) Run the GMDH algorithm on the model training set W, and find the combination forecasting model with the optimal complexity:

a. Construct the general relation between the output and input variables:

$\hat{N}_t = a_1 \hat{N}_t^1 + a_2 \hat{N}_t^2 + a_3 \hat{N}_t^3 + a_4 \hat{N}_t^4$, (13)

and regard all the sub-items as the initial input models of the GMDH neural network:

$v_1 = a_1 \hat{N}_t^1,\ v_2 = a_2 \hat{N}_t^2,\ v_3 = a_3 \hat{N}_t^3,\ v_4 = a_4 \hat{N}_t^4$; (14)

b. Combine all the possible pairs of the four initial models to generate the six candidate models of the first layer, and estimate the parameters of the intermediate candidate models with the least squares (LS) method;

c. Calculate the external criterion values of all intermediate candidate models, select the four intermediate candidate models with the smallest external criterion values for the next layer, and regard them as inputs of the second layer of the GMDH neural network;

d. Repeat steps b and c to generate the intermediate candidate models of the second, third, ..., L-th layers in turn, and find the combination forecasting model with optimal complexity $u^*$ according to the optimal complexity theory;

(3) Predict the energy consumption nonlinear subseries on the test set G with the optimal complexity model $u^*$, and let the result be $\hat{N}_t$;

Step 4: Calculate the final energy consumption time series forecasting value. Add the forecasting value of the linear GAR model $\hat{L}_t$ and that of the nonlinear part $\hat{N}_t$ to obtain the final energy consumption time series forecasting value, that is, $\hat{y}_t = \hat{L}_t + \hat{N}_t$.
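Step 3(2) can be sketched as the following self-organizing loop (a simplified illustration: pairwise candidates are fitted by least squares on the learning set A and screened by a regularity-style sum-of-squares criterion on the selecting set B; the helper names are hypothetical, and the paper's external criteria of Section 3.2 could be substituted for the screening step):

```python
import itertools

def solve3(A, b):
    """Gaussian elimination with partial pivoting; returns None if near-singular."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            return None
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_pair(uA, vA, yA):
    """LS fit of y ~ c0 + c1*u + c2*v on the learning set (one paired candidate)."""
    cols = [[1.0] * len(yA), uA, vA]
    G = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    h = [sum(c * yy for c, yy in zip(ci, yA)) for ci in cols]
    c = solve3(G, h)
    if c is None:
        return None
    return lambda u, v: [c[0] + c[1] * uu + c[2] * vv for uu, vv in zip(u, v)]

def gmdh_select(XA, yA, XB, yB, width=4, max_layers=4):
    """Layer by layer: fit all pairs on A, screen by SSE on B, keep `width` best."""
    layerA, layerB = list(XA), list(XB)
    best = (float("inf"), None)
    for _ in range(max_layers):
        cands = []
        for i, j in itertools.combinations(range(len(layerA)), 2):
            g = fit_pair(layerA[i], layerA[j], yA)
            if g is None:
                continue
            outA = g(layerA[i], layerA[j])
            outB = g(layerB[i], layerB[j])
            crit = sum((p - t) ** 2 for p, t in zip(outB, yB))  # external criterion
            cands.append((crit, outA, outB))
        if not cands:
            break
        cands.sort(key=lambda c: c[0])
        if cands[0][0] >= best[0]:
            break                        # criterion stopped improving: optimal complexity
        best = (cands[0][0], cands[0][2])
        layerA = [c[1] for c in cands[:width]]
        layerB = [c[2] for c in cands[:width]]
    return best                          # (criterion value, predictions on B)
```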

4. Empirical Analysis

To verify the performance of the proposed model, this study selects two time series for the experiments: the total energy consumption and the total oil consumption in China. Firstly, to analyze the impact of the AdaBoost.RT algorithm on the model's forecasting performance on the nonlinear subseries, this study compares the forecasting performance of the AdaBoost.RT ensembles with the four single models: BP, SVR, GP, and RBF. Secondly, to investigate the effect of selective combination forecasting, it analyzes the forecasting results of four different versions of the GMDH combination forecasting model to find the optimal one, and then compares the best one with the models participating in the combination. Thirdly, it compares the forecasting performance of the HFGSE model with that of other hybrid forecasting models. Finally, out-of-sample forecasting with the HFGSE model is performed on the total energy consumption and total oil consumption time series in China from 2015 to 2020.
4.1. Data

To evaluate the forecasting performance of the HFGSE model proposed in this study, an empirical analysis is conducted on the annual time series of Chinese total energy consumption and total oil consumption from 1978 to 2014 (see Fig. 3). The data are from the China Statistical Yearbook. Because the key of the HFGSE model is to predict the nonlinear subseries of energy consumption, this study does not discuss the forecasting result of the linear trend in detail, but utilizes the GAR model proposed above to predict the linear trend of the original series $y_t$ and obtains the nonlinear subseries $N_t$. Fig. 4 shows the total energy consumption and total oil consumption nonlinear subseries. It can be seen from the figure that the nonlinear subseries of the two energy consumption time series fluctuate to a large extent.

Figure 3. Energy consumption time series (total energy consumption and total oil consumption; unit: ten thousand tons of standard coal).
AN
6
M

20000
Ten thousand tons of standard coal

15000
D

10000
TE

5000

-5000
EP

-10000

-15000

-20000
C

The nonlinear subseries of total energy consumption


The nonlinear subseries of total oil consumption
AC

7
8 Figure 4. The nonlinear subseries of total energy and oil consumption.
9
4.2. Experiment setting

This study selected the energy consumption time series from 1978 to 2009 as the training set, and the ones from 2010 to 2014 as the test set. The models mentioned in this study were trained on the training set, and their performance was evaluated on the test set. It is worth noting that the training set and test set here are different from those in Table 3, but they are related. For Table 3, the GAR model is first applied to the original energy consumption time series $y_t$ to obtain the linear trend prediction $\hat{L}_t$; the nonlinear subseries $N_t$ is then calculated; and finally 1978-2009 is used as the training set and 2010-2014 as the test set.
This study used the original energy consumption time series as the dependent variable and its lagged items as the independent variables to train the models. The four nonlinear forecasting models were used as the weak learners to train the AdaBoost.RT ensemble models. The parameter settings of the four models were as follows: 1) BP neural network: It includes two important parameters, the largest lagged order $p$ and the number of nodes in the hidden layer $q$. In predicting different energy consumption time series, the optimal values of the two parameters are usually different. After repeated experiments, it was found that the BP neural network attains a satisfactory forecasting performance for the total energy consumption and total oil consumption time series when $p$ = 5 and 4, and $q$ = 3 and 3, respectively. 2) SVR model: This study used the Libsvm-3.1 toolbox to implement the SVR model. It chose the most commonly used RBF kernel function because of its nonlinear mapping ability. Through experiments, it was found that the SVR model had the best forecasting performance on the total energy consumption and total oil consumption time series when $p$ = 1 and 2, respectively. There are two other important parameters in the SVR model, i.e., the penalty parameter C and the kernel width γ. This study used the grid search method in the toolbox to find the best parameter values. Finally, C = 0.2 and γ = 15.76 were used for the total energy consumption, and C = 7.1 and γ = 24.20 for the total oil consumption. 3) GP model: In its modeling process, the parameter setting is relatively important for its performance. Through repeated trials, the GP model attains the optimal forecasting effect for the total energy consumption and total oil consumption time series, respectively, when the number of initial trees is 50 and 60, the crossover probability 0.8 and 0.85, the threshold value of goodness of fit 0.85 and 0.85, and the maximum number of iterations 50 and 50. 4) RBF neural network: The expanding speed of the radial basis function, spread, is an important parameter, and the lagged order of the time series $p$ is also important. Through experimental comparison, it was found that the RBF model attains the best forecasting performance for the two energy consumption time series when spread = 3 and $p$ = 1.
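The lagged design described at the start of this subsection (dependent variable $y_t$, independent variables $y_{t-1}, \ldots, y_{t-p}$) can be built as follows (a sketch; $p$ is the lag order tuned above, and putting the most recent lag first is an arbitrary convention of this illustration):

```python
def lag_embed(series, p):
    """Turn a series into (X, y) pairs with X_t = (y_{t-1}, ..., y_{t-p})."""
    X, y = [], []
    for t in range(p, len(series)):
        X.append(series[t - p:t][::-1])   # most recent lag first
        y.append(series[t])
    return X, y
```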
For the threshold value $\phi$ of the AdaBoost.RT ensemble algorithm, after repeated experimental comparisons, this study took $\phi$ = 10%, because the performance of the model is best at this value. Although the forecasting error of the final strong learner decreases as the number of iterations $T$ increases, a larger $T$ also increases the model running time; therefore, the number of iterations was set at $T$ = 50.

Finally, all experiments were performed on the Matlab2011b platform. This study repeated the above procedure 10 times and took the average value as the experimental result.
4.3. Model evaluation criteria

To evaluate the forecasting performance of the models, this study utilizes two commonly used evaluation criteria: the root mean square error (RMSE) [51] and the mean absolute percentage error (MAPE) [52]. Their definitions are as follows:

$RMSE = \sqrt{\dfrac{1}{m}\sum_{t=1}^{m} (y_t - \hat{y}_t)^2}$, (15)

$MAPE = \dfrac{1}{m}\sum_{t=1}^{m} \left|\dfrac{y_t - \hat{y}_t}{y_t}\right|$, (16)

where $y_t$ is the real value of the t-th sample, $\hat{y}_t$ is its corresponding forecasting value, and $m$ is the number of test samples. Obviously, the smaller the value of the evaluation criterion, the better the forecasting performance of the model [53].
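Eqs. (15) and (16) in code (a straightforward sketch):

```python
import math

def rmse(y_true, y_pred):
    """Eq. (15): root mean square error."""
    m = len(y_true)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / m)

def mape(y_true, y_pred):
    """Eq. (16): mean absolute percentage error (a fraction; multiply by 100 for %)."""
    m = len(y_true)
    return sum(abs((y - f) / y) for y, f in zip(y_true, y_pred)) / m
```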
Table 4 Comparison of AdaBoost ensemble and single models on the two nonlinear subseries

Model          BP        AdaBoost.BP   SVR       AdaBoost.SVR   GP        AdaBoost.GP   RBF       AdaBoost.RBF
Total energy consumption nonlinear subseries
RMSE           0.7738    0.6818        0.7500    0.6709         2.0351    0.9200        1.1497    0.7602
Rank           5         2             3         1              8         6             7         4
MAPE           103.51%   11.31%        98.62%    28.22%         255.74%   19.63%        134.87%   36.45%
Rank           6         1             5         3              8         2             7         4
Total oil consumption nonlinear subseries
RMSE           0.2905    0.2368        0.2688    0.2392         0.2367    0.1931        0.2388    0.2319
Rank           8         4             7         6              3         1             5         2
MAPE           156.44%   79.36%        94.62%    70.85%         125.28%   72.18%        128.44%   53.01%
Rank           8         4             5         2              6         3             7         1
Average rank   6.75      2.75          5.00      3.00           6.25      3.00          6.50      2.75
4.4. AdaBoost ensemble forecasting on the nonlinear subseries

To analyze the impact of the AdaBoost.RT ensemble algorithm on the models' forecasting performance, this study compares the forecasting results of the AdaBoost ensembles of the BP neural network, SVR model, GP model, and RBF neural network with those of the corresponding original single nonlinear models. Table 4 compares each model's forecasting performance on the two energy consumption nonlinear subseries. The table gives the rank of each model on the two evaluation criteria, from low to high (the smaller the rank, the better the model's performance). The last row is the average of each model's evaluation criterion ranks over the two nonlinear subseries.

The following conclusions can be obtained from a careful analysis of Table 4: 1) For both the total energy consumption nonlinear subseries and the total oil consumption nonlinear subseries, the RMSE and MAPE values of the AdaBoost ensemble models are smaller than those of the corresponding single nonlinear models. This demonstrates that the AdaBoost.RT algorithm improves the single nonlinear models' forecasting performance to different extents. 2) On the total energy consumption nonlinear subseries, it can be seen from the ranks that AdaBoost.SVR performs best according to the RMSE criterion and AdaBoost.BP performs best according to the MAPE criterion; on the total oil consumption nonlinear subseries, AdaBoost.GP performs best according to the RMSE criterion and AdaBoost.RBF performs best according to the MAPE criterion. This demonstrates that the ensemble models always achieve better performance than the four single nonlinear forecasting models. From the average ranks in the last row of Table 4, the eight models in order of forecasting performance are: AdaBoost.BP, AdaBoost.RBF, AdaBoost.SVR, AdaBoost.GP, SVR, GP, RBF, and BP. The four ensemble models all rank better than the four single models, which verifies the above conclusions again.
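The ranks and average ranks used in Table 4 can be reproduced as follows (a sketch; ties are broken by position here, and the test values are the energy-subseries RMSE row of Table 4):

```python
def ranks(errors):
    """Rank models by error, 1 = smallest (best); ties broken by position."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    r = [0] * len(errors)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def average_ranks(rank_rows):
    """Average each model's ranks over several criterion/series rows."""
    m = len(rank_rows[0])
    return [sum(row[i] for row in rank_rows) / len(rank_rows) for i in range(m)]
```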
4.5. Analysis of selective combination forecasting

This part focuses on the effect of selective combination forecasting. It first analyzes the forecasting results of four different versions of the GMDH combination forecasting model to find the optimal one, and then compares the best one with the models participating in the combination.
4.5.1. Comparisons of different versions of the selective combination forecasting model

In the HFGSE model proposed in this study, four different versions of the model are constructed according to the different external criteria used in the GMDH selective combination prediction: AS.GMDH, MR.GMDH, SRMSE.GMDH, and SMAPE.GMDH. In this section, the four versions of the GMDH model are used to make selective combinations of the models enhanced by the AdaBoost.RT algorithm in the previous section. Table 5 compares the selective combination performance of the four GMDH versions. The number in parentheses indicates the rank of the model in its row; the smaller the rank, the better the model's performance. The last row is the average of the evaluation criterion ranks of each model over the two consumption time series, which represents the overall predictive performance of the models well.

According to Table 5, for the total energy consumption time series, MR.GMDH performs best according to the RMSE criterion, followed by AS.GMDH and SRMSE.GMDH, with SMAPE.GMDH the poorest in the group. Meanwhile, according to the MAPE criterion, AS.GMDH performs best, followed by SMAPE.GMDH, and the poorest performers are MR.GMDH and SRMSE.GMDH. Therefore, each of the four models has its own advantages and disadvantages on this series. However, for the total oil consumption time series, AS.GMDH has the smallest value of both RMSE and MAPE, indicating its superior prediction performance. Finally, from the average ranks in the last row of Table 5, the AS.GMDH model has the smallest value, followed by the MR.GMDH and SMAPE.GMDH models (tied), and finally the SRMSE.GMDH model. This indicates that, among the four versions of the GMDH selective combination forecasting model, AS.GMDH has the best overall predictive performance. Therefore, in the following experiments, the AS.GMDH model is chosen for the selective combination forecasting.
Table 5 Comparisons of different versions of the GMDH model on the energy and oil consumption nonlinear subseries

Model          AS.GMDH      MR.GMDH      SRMSE.GMDH    SMAPE.GMDH
Total energy consumption time series
RMSE           0.5738(2)    0.5669(1)    0.5887(3)     0.6003(4)
MAPE           8.541%(1)    10.01%(3)    10.32%(4)     9.981%(2)
Total oil consumption time series
RMSE           0.1789(1)    0.1903(4)    0.1867(2)     0.1899(3)
MAPE           39.17%(1)    47.23%(3)    52.11%(4)     45.01%(2)
Average rank   1.25         2.75         3.25          2.75
Furthermore, Table 6 gives the models that participate in the optimal combination model selected by the AS.GMDH model for the two consumption nonlinear subseries. It can be seen from the table that the AS.GMDH model chooses two models from the four candidates (AdaBoost.BP, AdaBoost.GP, AdaBoost.RBF, and AdaBoost.SVR) to participate in the optimal combination for each of the two consumption nonlinear subseries. Two conclusions can be drawn. On the one hand, the optimal combination selected by the GMDH selective combination forecasting model with its self-organizing modeling technology is not a single candidate model, which effectively compensates for the shortcomings of any single prediction model with poor performance. On the other hand, it does not include all candidate models, which overcomes the information redundancy that the combination of all candidate models, namely the traditional combination forecasting model, may introduce, thus improving the prediction performance of the model.

Table 6 Models participating in the optimal combination model constructed by AS.GMDH

Nonlinear subseries        Selected models
Total energy consumption   AdaBoost.BP, AdaBoost.GP
Total oil consumption      AdaBoost.GP, AdaBoost.RBF
9
10 4.5.2. Comparisons of the selective combination model with the models participating in the

SC
11 combination
12 To verify the performance of the GMDH-based selective combination forecasting model, this
13 study compares the GMDH-based combination model AS.GMDH with the four models participating in

U
14 combination: AdaBoost.BP, AdaBoost.SVR, AdaBoost.GP, and AdaBoost.RBF. Fig. 5 and Fig. 6 show
15 the comparison results for the total energy consumption nonlinear series and the total oil consumption
AN
16 nonlinear series, respectively.

Figure 5. Comparison of the GMDH combination model with the models participating in the combination for the nonlinear subseries of the total energy consumption (RMSE: AS.GMDH 0.5738, AdaBoost.BP 0.6818, AdaBoost.SVR 0.6709, AdaBoost.GP 0.9200, AdaBoost.RBF 0.7602; MAPE: 8.54%, 11.31%, 28.22%, 19.63%, 36.45%).
As can be seen from Fig. 5, for the total energy consumption nonlinear subseries, according to the RMSE criterion, the AS.GMDH model is optimal, followed by the AdaBoost.SVR and AdaBoost.BP models, and finally the AdaBoost.RBF and AdaBoost.GP models. Moreover, according to the MAPE criterion, AS.GMDH is still the optimal model, followed by AdaBoost.BP, AdaBoost.GP, AdaBoost.SVR, and AdaBoost.RBF. Thus, for the total energy consumption nonlinear subseries, the AS.GMDH model proposed in this study has a better forecasting performance than the four models participating in the combination.

According to Fig. 6, for the total oil consumption nonlinear subseries, the GMDH selective combination forecasting model has the smallest value on both evaluation criteria, especially on the MAPE criterion: the value of AS.GMDH is 13.84% lower than that of AdaBoost.RBF. This shows that AS.GMDH also has the best forecasting performance for the total oil consumption nonlinear subseries.
Figure 6. Comparison of the GMDH combination model with the models participating in the combination for the total oil consumption nonlinear subseries (RMSE: AS.GMDH 0.1789, AdaBoost.BP 0.2368, AdaBoost.SVR 0.2392, AdaBoost.GP 0.1931, AdaBoost.RBF 0.2319; MAPE: 39.17%, 79.36%, 70.85%, 72.18%, 53.01%).
4.6. Comparisons of the proposed hybrid model with other models

To verify the overall forecasting performance of the proposed hybrid model HFGSE, this study compared it with other commonly used time series models. First, it compared the HFGSE model with the GAR model put forward earlier (which predicts only the linear trend of the energy consumption time series and discards the nonlinear residual subseries directly); the results are shown in Table 7. It can be seen from the table that for both the total energy consumption time series and the total oil consumption time series, the errors of the HFGSE model, which also predicts the nonlinear residual series, are always smaller than those of the GAR model. The conclusion can be drawn that for both consumption time series, the nonlinear residual series do carry useful information for prediction modeling.
Table 7 Comparison of the forecasting performance of the HFGSE and GAR models

         Errors of total energy consumption    Errors of total oil consumption
Model    RMSE       MAPE                       RMSE       MAPE
GAR      1.7010     3.62%                      1.2908     6.99%
HFGSE    0.4672     1.20%                      0.2341     2.84%
Next, this study compared the HFGSE model with four simple hybrid models that first use the GAR model to predict the linear trend and then employ the BP, SVR, GP, and RBF models, respectively, to predict the nonlinear fluctuations, finally combining the two parts for the forecasting result. Furthermore, it compared the HFGSE model with three recently proposed hybrid forecasting models: the combination forecasting method GM-ARIMA [21] and the divide and rule methods EMD-LSSVR [35] and DEMD-SVR-AR [33]. The results are shown in Table 8. The bold value in each row of the table corresponds to the smallest error in that row. The number in parentheses indicates the rank of the model in the row; the smaller the rank, the better the model's performance. The last row shows the average rank of each model.

According to Table 8, the following conclusions can be obtained: 1) For both the total energy consumption and total oil consumption time series, HFGSE, put forward by this study, has the smallest MAPE value; the RMSE of HFGSE is larger only than that of DEMD-SVR-AR, and only for the total oil consumption time series. In addition, the average rank of HFGSE in the last row of the table is also the smallest. Thus, compared with the other seven hybrid models, HFGSE has the best overall forecasting performance. 2) Among the seven other hybrid models, the average rank of DEMD-SVR-AR is the smallest, second only to the HFGSE model proposed in this study, followed by the EMD-LSSVR, GAR&BP, GM-ARIMA, GAR&SVR, and GAR&GP models, and finally GAR&RBF. This indicates that the overall forecasting performance of the DEMD-SVR-AR model is superior to those of the six other hybrid models, whereas that of GAR&RBF is the worst.
Table 8 Comparisons of the forecasting performance of the HFGSE and the other seven hybrid models

               HFGSE      GAR&BP     GAR&SVR    GAR&GP     GAR&RBF    GM-ARIMA   EMD-LSSVR  DEMD-SVR-AR
Total energy consumption time series
RMSE           0.4672(1)  1.4722(5)  1.3310(4)  1.6320(6)  2.3675(7)  3.780(8)   0.6231(3)  0.5156(2)
MAPE           1.20%(1)   2.93%(4)   3.05%(5)   3.16%(6)   4.56%(8)   3.40%(7)   1.30%(2)   1.42%(3)
Total oil consumption time series
RMSE           0.2341(2)  0.7428(5)  0.8461(8)  0.7572(6)  0.7816(7)  0.3152(3)  0.3907(4)  0.1713(1)
MAPE           2.84%(1)   5.49%(6)   5.18%(5)   6.88%(7)   7.19%(8)   4.33%(3)   4.78%(4)   2.94%(2)
Average rank   1.25       5.0        5.5        6.25       7.5        5.25       3.25       2

4.7. Out-of-sample forecasting of the proposed hybrid model

Based on the above analyses and comparisons, the HFGSE model can accurately predict energy consumption. Furthermore, Table 9 shows the out-of-sample forecasting results of the HFGSE model for the two consumption time series from 2015 to 2020. It can be seen from the table that China's energy consumption will continue to rise from 2015 to 2020, and the total energy consumption and total oil consumption will reach 5261.47 and 1017.56 million tons of standard coal by 2020, respectively. The average annual growth rate of total energy consumption in 2015-2020 is 4.14%, whereas that of total oil consumption is 5.24%.
Table 9 Forecasting of the HFGSE model for the two consumption time series from 2015 to 2020 (unit: ten thousand tons of standard coal)

Year                                   2015     2016     2017     2018     2019     2020
Total energy consumption time series   435637   448275   453746   485768   499398   526147
Total oil consumption time series      77059    81498    86148    91064    96262    101756
Meanwhile, since the real energy consumption data of China for 2015 and 2016 are now available, the forecasting accuracy for 2015 and 2016 is shown in Table 10. Comparing Tables 10 and 8, it can be found that the HFGSE model shows little difference in prediction performance between the out-of-sample data and the test set, which shows that the HFGSE model has strong generalization ability.

Table 10 Forecasting accuracy of the HFGSE model for the two consumption time series in 2015 and 2016

        Total energy consumption time series    Total oil consumption time series
Year    MAPE      RMSE                          MAPE      RMSE
2015    1.33%     0.5732                        2.05%     0.1614
2016    2.81%     1.2274                        2.14%     0.2729

Figure 7 depicts the predicted results of energy consumption and their comparison with the real values; the triangle-dotted line and the cross-dotted line represent the predicted values of total energy consumption and total oil consumption, respectively, while the circle solid line and the square solid line represent the corresponding real values. The dotted and solid lines for 1978-2014 in the figure almost overlap completely, which further indicates that the HFGSE model fits the energy consumption time series well. After 2015, the triangle-dotted line and the cross-dotted line still maintain a growth trend, but the growth rate of total energy consumption begins to decelerate, whereas the growth rate of total oil consumption is basically unchanged.
Figure 7. Comparison between the predicted and the real values of energy consumption (unit: ten thousand tons of standard coal).

13 5. Conclusion
Researching and building scientific energy consumption models and accurately predicting the future gap between energy supply and demand have important practical significance for China's sustainable economic and social development, the development of the energy industry, the rational use of energy resources, the construction of a conservation-oriented society, and the formulation of a national energy strategy. This study proposed a new GMDH-based selective ensemble hybrid forecasting model. The model first uses the GAR model to predict the linear trend of the energy consumption time series and obtains the nonlinear residual subseries. Considering the highly nonlinear characteristic of the residual subseries, this study introduces AdaBoost ensemble technology to enhance the forecasting performance of single nonlinear prediction models, obtaining the prediction results of four different versions of the ensemble model on the nonlinear subseries. Further, the prediction results of these four AdaBoost ensemble models are used as initial input, and the combination predictive value of the nonlinear subseries is obtained by using GMDH for selective combination prediction. Finally, the two parts are summed to obtain the final prediction. The experiment was conducted on the time series of total energy consumption and total oil consumption in China, and the main conclusions are as follows:
1) Compared with the four single models (BP, SVR, GP, and RBF), the AdaBoost.RT ensemble algorithm can achieve better forecasting performance on the nonlinear subseries.
2) This study compares four different versions of the GMDH selective combination forecasting model, and the results show that the AS.GMDH model has the best overall forecasting performance.
3) The comparisons of the AS.GMDH combination forecasting model with the models participating in the combination show that AS.GMDH has the best performance on the nonlinear subseries.
4) Compared with the GAR model and the other seven hybrid models, the HFGSE model has the best forecasting performance. In addition, the out-of-sample forecasting confirms the superiority of the HFGSE model.
5) The HFGSE model is applied to out-of-sample forecasting, and the results demonstrate that the total energy consumption and total oil consumption in China will keep growing until 2020.
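For readers who want to follow the overall workflow, the four-step pipeline summarized above can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: the GAR model is approximated by an ordinary least-squares autoregression, two boosted SVR variants stand in for the four base models (BP, SVR, GP, and RBF), and the GMDH selective combination step is approximated by a linear meta-model over the members' predictions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

def lagged(series, p):
    """Build a lag-p design matrix X and the aligned target y from a 1-D series."""
    X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
    return X, series[p:]

def hfgse_sketch(series, p=3):
    X, y = lagged(np.asarray(series, dtype=float), p)
    # Step 1: fit the linear trend (stand-in for the GAR model) and
    # take the nonlinear residual subseries.
    linear = LinearRegression().fit(X, y)
    residual = y - linear.predict(X)
    # Step 2: AdaBoost ensembles of single nonlinear learners on the residual.
    bases = [SVR(kernel="rbf"), SVR(kernel="poly", degree=2)]
    member_preds = np.column_stack([
        AdaBoostRegressor(b, n_estimators=10, random_state=0)
        .fit(X, residual).predict(X)
        for b in bases
    ])
    # Step 3: combine the members' outputs (stand-in for GMDH selective combination).
    meta = LinearRegression().fit(member_preds, residual)
    # Step 4: final fitted values = linear trend + combined nonlinear part.
    return linear.predict(X) + meta.predict(member_preds), y
```

The in-sample fitted values returned here correspond to the summation step of the hybrid model; real out-of-sample forecasting would iterate the fitted models forward from the last p observations.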

In the process of constructing the GMDH neural network, the reference function considers only first-order linear K-G polynomials, without further study of other forms of reference function. In fact, in the real world, the relationship between the dependent and independent variables may not be a simple first-order linear one. Therefore, a more complex nonlinear reference function may match the actual relationship better and may further improve the performance of the model; this is a direction for further research.
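To make this limitation concrete, the two candidate reference functions can be contrasted for a single GMDH partial description of two inputs. The sketch below is illustrative (the function names are ours, and coefficients are fitted by ordinary least squares): the first form is the first-order linear K-G polynomial used in this study, the second is the full second-order Kolmogorov-Gabor polynomial.

```python
import numpy as np

def kg_first_order(x1, x2):
    """Design matrix for the first-order linear K-G reference function
    used in this study: y = a0 + a1*x1 + a2*x2."""
    return np.column_stack([np.ones_like(x1), x1, x2])

def kg_second_order(x1, x2):
    """Design matrix for the full second-order K-G polynomial:
    y = a0 + a1*x1 + a2*x2 + a3*x1*x2 + a4*x1**2 + a5*x2**2."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

def fit_partial_description(design, x1, x2, y):
    """Fit one GMDH neuron's coefficients by least squares
    and return its fitted values."""
    A = design(x1, x2)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef
```

On a target that contains an interaction term, e.g. y = 1 + 2*x1 - x2 + 0.5*x1*x2, the second-order neuron can fit exactly while the first-order one cannot, which is precisely the gap noted above.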

Acknowledgments
The authors thank the editor and anonymous reviewers for their constructive suggestions. This study is partly supported by the National Natural Science Foundation of China under Grant Nos. 71471124 and 71273036, and the Excellent Youth Fund of Sichuan University under Grant Nos. skqx201607, sksyl201709, and skzx2016-rcrw14.
References
[1] BP, BP Statistical Review of World Energy 2016. Available from: http://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-world-energy.html.
[2] W.G.J. Dupree, J.S. Corsentino, United States energy through the year 2000, NASA STI/Recon Technical Report, 1975.
[3] R.P. Thompson, Weather sensitive electric demand and energy analysis on a large geographically diverse power system application to short term hourly electric demand forecasting, IEEE Transactions on Power Apparatus and Systems 95 (1) (1976) 385-393.
[4] S. Parikh, M.H. Rothkopf, Long-run elasticity of US energy demand: A process analysis approach, Energy Economics 2 (1) (1980) 31-36.
[5] Z.R. Yang, The potential and means of saving energy, China's Energy 3 (4) (1980) 5-8. (in Chinese).
[6] Z.H. Wu, See the Way Out of the Energy Crisis from Energy Science and Technology, Knowledge Press, 1980. (in Chinese).
[7] D. Shi, The improvement of energy utilization efficiency in China's economic growth, Economic Research Journal 48 (9) (2002) 49-56. (in Chinese).
[8] The State Planning and Energy-saving Commission, Development and Application of Energy Prediction Model, China Planning Press, 1988. (in Chinese).
[9] P. Sen, M. Roy, P. Pal, Application of ARIMA for forecasting energy consumption and GHG emission: A case study of an Indian pig iron manufacturing organization, Energy 116 (12) (2016) 1031-1038.
[10] A.E. Clements, A.S. Hurn, Z. Li, Forecasting day-ahead electricity load using a multiple equation time series approach, European Journal of Operational Research 251 (2) (2016) 522-530.
[11] K.G. Boroojeni, M.H. Amini, S. Bahrami, S.S. Iyengar, A.F. Sarwat, O. Karabasoglu, A novel multi-time-scale modeling for electric power demand forecasting: From short-term to medium-term horizon, Electric Power Systems Research 142 (1) (2017) 58-73.
[12] F. Shaikh, Q. Ji, P.H. Shaikh, N.H. Mirjat, M.A. Uqaili, Forecasting China's natural gas demand based on optimized nonlinear grey models, Energy 140 (12) (2017) 941-951.
[13] S. Ding, K.W. Hipel, Y.G. Dang, Forecasting China's electricity consumption using a new grey prediction model, Energy 149 (4) (2018) 314-328.
[14] M. Kovačič, B. Šarler, Genetic programming prediction of the natural gas consumption in a steel plant, Energy 66 (3) (2014) 273-284.
[15] J. Szoplik, Forecasting of natural gas consumption with artificial neural networks, Energy 85 (6) (2015) 208-220.
[16] E.S. Irdemoosa, S.R. Dindarloo, Prediction of fuel consumption of mining dump trucks: a neural networks approach, Applied Energy 115 (8) (2015) 77-84.
[17] Y. Chen, P. Xu, Y. Chu, W.L. Li, Y.T. Wu, L.Z. Ni, Y. Bao, K. Wang, Short-term electrical load forecasting using the support vector regression (SVR) model to calculate the demand response baseline for office buildings, Applied Energy 195 (6) (2017) 659-670.
[18] A. Rahman, V. Srikumar, A.D. Smith, Predicting electricity consumption for commercial and residential buildings using deep recurrent neural networks, Applied Energy 212 (2) (2018) 372-385.
[19] F. Zhang, C. Deb, S.E. Lee, J.J. Yang, K.W. Shah, Time series forecasting for building energy consumption using weighted support vector regression with differential evolution optimization technique, Energy and Buildings 126 (8) (2016) 94-103.
[20] L.Y. Xiao, C. Wang, T.L. Liang, W. Shao, A combined model based on multiple seasonal patterns and modified firefly algorithm for electrical load forecasting, Applied Energy 167 (4) (2016) 135-153.
[21] C.Q. Yuan, S.F. Liu, Z.G. Fang, Comparison of China's primary energy consumption forecasting by using ARIMA (the autoregressive integrated moving average) model and GM (1,1) model, Energy 100 (4) (2016) 384-390.
[22] J. Nowotarski, B. Liu, R. Weron, T. Hong, Improving short term load forecast accuracy via combining sister forecasts, Energy 98 (3) (2016) 40-49.
[23] X.L. Liu, B. Moreno, A.S. García, A grey neural network and input-output combined forecasting model. Primary energy consumption forecasts in Spanish economic sectors, Energy 115 (11) (2016) 1042-1054.
[24] F. Zhang, C. Deb, S.E. Lee, J.J. Yang, K.W. Shah, Time series forecasting for building energy consumption using weighted support vector regression with differential evolution optimization technique, Energy and Buildings 126 (8) (2016) 94-103.
[25] Y. Karadede, G. Ozdemir, E. Aydemir, Breeder hybrid algorithm approach for natural gas demand forecasting model, Energy 141 (12) (2017) 1269-1284.
[26] J.R. Li, R. Wang, J.Z. Wang, Y.F. Li, Analysis and forecasting of the oil consumption in China based on combination models optimized by artificial intelligence algorithms, Energy 44 (2) (2018) 243-264.
[27] Y.J. Zhang, F. Ma, B.S. Shi, D.S. Huang, Forecasting the prices of crude oil: An iterated combination approach, Energy Economics 70 (2) (2018) 472-483.
[28] B.Z. Zhu, Y.M. Wei, Carbon price forecasting with a novel hybrid ARIMA and least squares support vector machines methodology, Omega 41 (3) (2013) 517-524.
[29] N. Liu, Q.F. Tang, J.H. Zhang, W. Fan, J. Liu, A hybrid forecasting model with parameter optimization for short-term load forecasting of micro-grids, Applied Energy 129 (12) (2014) 336-345.
[30] A. Abdoos, M. Hemmati, A.A. Abdoos, Short term load forecasting using a hybrid intelligent method, Knowledge-Based Systems 76 (3) (2015) 139-147.
[31] J.L. Zhang, Y.J. Zhang, L. Zhang, A novel hybrid method for crude oil price forecasting, Energy Economics 49 (5) (2015) 649-659.
[32] L. Yu, Z.S. Wang, L. Tang, A decomposition–ensemble model with data-characteristic-driven reconstruction for crude oil price forecasting, Applied Energy 156 (10) (2015) 251-267.
[33] G.F. Fan, L.L. Peng, W.C. Hong, F. Sun, Electricity load forecasting by the SVR model with differential empirical mode decomposition and auto regression, Neurocomputing 173 (1) (2016) 958-970.
[34] I.P. Panapakidis, A.S. Dagoumas, Day-ahead natural gas demand forecasting based on the combination of wavelet transform and ANFIS/genetic algorithm/neural network model, Energy 118 (1) (2017) 231-245.
[35] B.Z. Zhu, D. Han, P. Wang, Z.C. Wu, T. Zhang, T.M. Wei, Forecasting carbon price using empirical mode decomposition and evolutionary least squares support vector regression, Applied Energy 191 (4) (2017) 521-530.
[36] E.M. Oliveira, F.L.C. Oliveira, Forecasting mid-long term electric energy consumption through bagging ARIMA and exponential smoothing methods, Energy 144 (2) (2018) 776-788.
[37] D.L. Wang, Y.D. Wang, X.F. Song, Y. Liu, Coal overcapacity in China: multiscale analysis and prediction, Energy Economics 70 (2) (2018) 244-257.
[38] Y. Freund, R.E. Schapire, Experiments with a new boosting algorithm, in: Proceedings of the 13th International Conference on Machine Learning (ICML), 1996, pp. 148-156.
[39] A. Ivakhnenko, The group method of data handling in prediction problems, Soviet Automatic Control 9 (6) (1976) 21-30.
[40] L. Xie, J. Xiao, H. Zhao, Y. Xiao, Y. Hu, China's energy consumption forecasting by GMDH based auto-regressive model, Journal of Systems Science and Complexity 30 (6) (2017) 1332-1349.
[41] X.F. Li, Z.S. Zhang, C. Huang, An EPC forecasting method for stock index based on integrating empirical mode decomposition, SVM and cuckoo search algorithm, Journal of Systems Science and Information 2 (6) (2014) 481-504.
[42] P. Viola, M.J. Jones, Robust real-time object detection, International Journal of Computer Vision 57 (2) (2001) 34-47.
[43] L. Gao, P. Kou, F. Gao, X.H. Guan, AdaBoost regression algorithm based on classification-type loss, in: 8th World Congress on Intelligent Control and Automation (WCICA), IEEE, 2010, pp. 682-687.
[44] D.P. Solomatine, D.L. Shrestha, AdaBoost.RT: a boosting algorithm for regression problems, in: International Joint Conference on Neural Networks, IEEE, 2004, pp. 1163-1168.
[45] J. Xiao, C.Z. He, X.Y. Jiang, Structure identification of Bayesian classifiers based on GMDH, Knowledge-Based Systems 22 (6) (2009) 461-470.
[46] J. Xiao, C.Z. He, X.Y. Jiang, D.H. Liu, A dynamic classifier ensemble selection approach for noise data, Information Sciences 180 (18) (2010) 3402-3421.
[47] J. Xiao, L. Xie, C.Z. He, Dynamic classifier ensemble model for customer classification with imbalanced class distribution, Expert Systems with Applications 39 (3) (2012) 3668-3675.
[48] J. Xiao, Y. Xiao, A. Huang, D.H. Liu, S. Wang, Feature-selection-based dynamic transfer ensemble model for customer churn prediction, Knowledge and Information Systems 43 (1) (2015) 29-51.
[49] J. Xiao, H.W. Cao, X.Y. Jiang, X. Gu, L. Xie, GMDH-based semi-supervised feature selection for customer classification, Knowledge-Based Systems 132 (9) (2017) 236-248.
[50] J.A. Mueller, F. Lemke, Self-organizing Data Mining: An Intelligent Approach to Extract Knowledge from Data, Libri, 2000.
[51] Y. Xiao, J.J. Liu, Y. Hu, Y.F. Wang, Time series forecasting using a hybrid adaptive particle swarm optimization and neural network model, Journal of Systems Science and Information 2 (4) (2014) 335-344.
[52] J. Xiao, X.Y. Jiang, C.Z. He, G. Teng, Churn prediction in customer relationship management via GMDH-based multiple classifiers ensemble, IEEE Intelligent Systems 31 (2) (2016) 37-44.
[53] S.W. Yu, K.J. Zhu, A hybrid procedure for energy demand forecasting in China, Energy 37 (1) (2012) 396-404.

Highlights
• A selective ensemble based hybrid energy consumption prediction model is proposed.
• This study employs the selective ensemble method for the nonlinear subseries.
• The selective ensemble method performs better than its constituent models.
• The hybrid model outperforms the other seven models on the original time series.
• The out-of-sample forecasts for the two time series from 2015 to 2020 are shown.
