
Accepted Manuscript

A hybrid model based on selective ensemble for energy consumption forecasting in China

Jin Xiao, Yuxi Li, Ling Xie, Dunhu Liu, Jing Huang

PII: S0360-5442(18)31226-X
DOI: 10.1016/j.energy.2018.06.161
Reference: EGY 13203

To appear in: Energy

Received Date: 29 November 2017
Revised Date: 19 June 2018
Accepted Date: 24 June 2018

Please cite this article as: Xiao J, Li Y, Xie L, Liu D, Huang J, A hybrid model based on selective ensemble for energy consumption forecasting in China, Energy (2018), doi: 10.1016/j.energy.2018.06.161.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
A hybrid model based on selective ensemble for energy consumption forecasting in China

Jin Xiao a, Yuxi Li a, Ling Xie a, Dunhu Liu b, Jing Huang c,1

a Business School, Sichuan University, Chengdu 610064, China
b Management Faculty, Chengdu University of Information Technology, Chengdu 610103, China
c School of Public Administration, Sichuan University, Chengdu 610064, China
Abstract: It is of great significance to develop accurate forecasting models for China's energy consumption. Energy consumption time series are often complex and nonlinear, and a single model cannot achieve satisfactory forecasting results. Therefore, in recent years, more and more scholars have tried to build hybrid models to handle this issue, among which the divide and rule method is the most popular. However, the existing divide and rule models often predict the decomposed energy consumption subseries with a single forecasting model. This study introduces the group method of data handling (GMDH) technique for energy consumption forecasting in China and constructs a hybrid forecasting model based on GMDH selective ensemble, focusing mainly on predicting the nonlinear variation of energy consumption. The model first predicts the linear trend of the energy consumption time series with the GMDH-based autoregressive model and then obtains the residual subseries of energy consumption. Considering the highly nonlinear characteristics of the residual subseries, this study introduces AdaBoost ensemble technology to enhance the forecasting performance of four single nonlinear prediction models (back propagation neural network, support vector regression, genetic programming, and radial basis function neural network), obtaining four different ensemble models on the nonlinear subseries. Further, the prediction results of these four AdaBoost ensemble models are used as initial inputs, and the selective combination prediction for the nonlinear subseries is obtained with GMDH. Finally, the two parts are added up to obtain the final prediction. The empirical analysis of total energy consumption and total oil consumption in China shows that the forecasting performance of the proposed model is better than that of the GMDH-based autoregressive model and seven other hybrid models, and this study gives out-of-sample forecasts of the two time series from 2015 to 2020.

Key words: prediction of energy consumption; GMDH; AdaBoost ensemble technology; selective combination forecasting; hybrid forecasting

1. Introduction

Since the economic reform ("reform and opening-up"), the Chinese economy has developed rapidly, and energy consumption has increased continuously. The BP Statistical Review of World Energy 2016 [1] pointed out that, although the Chinese economy grew slowly and was undergoing a structural transformation, China remained the country with the largest energy consumption, production, and net imports in the world. In 2015, China's energy consumption was 23% of total global consumption and comprised 34% of the net increase in global energy consumption. Among fossil energies, China's consumption of oil increased at the fastest rate, 6.7%. Among non-fossil energies, solar energy increased the fastest, at 69.7%; China surpassed Germany and the USA and became the largest solar electricity generation country in the world. Therefore, it is of practical significance to construct a scientific energy consumption model and accurately predict the future gap between supply and demand, which matters for sustainable economic and social development, energy industry development, the reasonable use of energy resources, the construction of a conservation-minded society, and the creation of a national energy strategy.

1 Corresponding author. E-mail address: 147895715@qq.com.

Nomenclature

ARIMA    autoregressive integrated moving average
GM       grey prediction model
GP       genetic programming
SVR      support vector regression
ANN      artificial neural network
DR       demand response
MLP      multi-layer perceptron
ANFIS    adaptive neuro-fuzzy inference system
EMD      empirical mode decomposition
DEMD     differential empirical mode decomposition
LSSVR    least squares support vector regression
IMF      intrinsic mode function
GMDH     group method of data handling
HFGSE    hybrid forecasting model based on GMDH selective ensemble
AR       autoregression
GAR      GMDH-based AR
BP       back propagation
RBF      radial basis function
yt       original energy consumption time series
lt       forecasted linear trend
rt       nonlinear subseries of the energy consumption
r̂t,i     forecasting result of the t-th sample by the i-th ensemble model in the nonlinear subseries
Y        dependent variable
X        independent variable
W_train  training set
W_test   test set
y        output in Eq. (1)
a        coefficient vector in Eq. (1)
m        number of initial models in Eq. (3)
wk       estimated output in Eq. (4)
AS       asymmetric stability
MR       mean regularization
RMSE     root mean square error
MAPE     mean absolute percentage error
SRMSE    symmetrical root mean squared error
SMAPE    symmetrical mean absolute percentage error
mA       sample size of the model learning set A
mB       sample size of the model selection set B
T        maximum number of iterations
∅        threshold value of the relative forecasting error
C        penalty parameter of SVR
γ        kernel width of SVR
ŷt(W)    forecasted output of the t-th sample in the entire training set by the model trained on the same data set in Eq. (8)
ŷt(A)    forecasted output of the t-th sample in the selection set B by the model trained in the model learning set A in Eqs. (9) and (10)
ŷt(B)    forecasted output of the t-th sample in the model learning set A by the model trained in the model selection set B in Eqs. (9) and (10)
-        initialization weight of the i-th sample
-        weight of the i-th sample in the τ-th iteration
-        relative forecasting error in the τ-th iteration
-        weight of the weak learner in the τ-th iteration
-        the largest lag order of the BP neural network
-        number of nodes in the hidden layer of the BP neural network

1.1. Literature review

With social development and progress, people have realized the important effect of energy on economic development. After the energy crises of 1973 and 1979, the entire world became conscious of the energy constraint on the economy and of the significance of consumption forecasting. During that period, a great deal of research on energy consumption demand forecasting appeared abroad. Dupree and Corsentino [2] presented the future energy consumption within the major consuming sectors and the energy supply sources of America. Thompson [3] proposed a weather-sensitive electric load and energy forecasting method that could be used for both long-term and short-term prediction. Parikh and Rothkopf [4] studied the long-run elasticity of US energy demand and proposed an effective process analysis method. Because no energy consumption data existed before the reform, research in China remained at the level of policy suggestions. Yang [5] put forward several ways to save energy in China. Wu [6] proposed the idea of using forecasting technology to address the energy crisis. After that, the availability of energy consumption data brought great progress in domestic studies; for example, Shi [7] pointed out that the improvement of China's energy utilization efficiency had been very significant since the reform and opening-up. The State Planning and Energy-saving Commission [8] focused on the construction and application of energy forecasting models. Recently, scholars have proposed many methods for predicting energy consumption, and these can be divided into two classes: single forecasting models and hybrid forecasting models.
Table 1 Typical literature using single forecasting models

Model type: Time series model
  Typical literature: Sen et al. (2016) [9]; Clements et al. (2016) [10]; Boroojeni et al. (2017) [11]; Shaikh et al. (2017) [12]; Ding et al. (2018) [13]
  Advantages: intuitive and explainable functional form; low computational complexity; do not require extended data
  Disadvantages: pre-assumed form of the model; data independence assumption; low accuracy for nonlinear variation

Model type: Nonlinear forecasting model
  Typical literature: Kovačič and Šarler (2014) [14]; Szoplik (2015) [15]; Irdemoosa and Dindarloo (2015) [16]; Chen et al. (2017) [17]; Rahman et al. (2018) [18]
  Advantages: do not need a pre-assumed form of the model; strong nonlinear mapping ability; can solve complex nonlinear problems
  Disadvantages: results cannot be easily explained; high computational complexity
Some typical literature using single forecasting models is summarized in Table 1. The commonly used single models include: 1) Time series models, including autoregressive integrated moving average (ARIMA) models [9], regression analysis models [10], and grey prediction models (GM) [13]. For example, Sen et al. [9] focused on how to select the best possible ARIMA model for short-term forecasting and found that ARIMA(1,0,0)x(0,1,1) was the best model for energy consumption and ARIMA(0,1,4)x(0,1,1) the best for greenhouse gas (GHG) emission. Clements et al. [10] proposed a multiple equation time series model to forecast the day-ahead electricity load in Australia, and found that this model could achieve the same or even better performance than complex nonlinear and nonparametric forecasting models. Ding et al. [13] developed a novel optimized grey model based on the principle of "new information priority" to predict China's electricity consumption, combining a new initial condition with a rolling mechanism; the empirical results showed that the model was superior to several benchmark models. 2) Nonlinear forecasting models, including genetic programming (GP) [14], artificial neural networks (ANN) [15], support vector regression (SVR) [17], etc. For instance, Kovačič and Šarler [14] applied the GP model to forecasting the natural gas consumption in a steel plant, and the results showed high accuracy. Szoplik [15] used the multilayer perceptron (MLP) to forecast the gas demand in Szczecin, Poland, and the results showed good performance when forecasting the gas consumption on any day of the year and any hour of the day. Chen et al. [17] proposed a new SVR model, which used the ambient temperature two hours before a demand response (DR) event as the input variable, for forecasting the DR baselines of office buildings.
Economic time series often have the characteristics of complexity and nonlinearity, and a single model cannot always analyze and predict energy demand accurately. Therefore, in recent years, more and more scholars have tried to build hybrid models to handle this issue, and these models can be roughly classified into two types: 1) The combination forecasting method, which trains several models to predict the original time series and then combines them with appropriate weights to obtain the final forecasting result. For example, Zhang et al. [19] constructed a weighted model combining nu-SVR and epsilon-SVR, in which the differential evolution algorithm was employed to determine the weight of each model; this model was used to forecast the daily and half-hourly energy consumption of a building in Singapore, and the results showed higher accuracy than several other models. Yuan et al. [21] combined GM and ARIMA models with equal weights to forecast China's primary energy consumption, and found that the forecasting performance of the combination was better than that of the single GM and ARIMA models. Li et al. [26] improved the traditional combination method by allowing the weight coefficient of a participating model to be negative; the experimental results on China's oil consumption indicated that this new method outperformed the traditional combination methods. 2) The divide and rule method, which first decomposes the original time series into several subseries, then models and predicts each subseries with an appropriate model, and finally integrates the prediction results according to certain rules. This method is used most frequently. For instance, Fan et al. [33] proposed a model to forecast the electric load in Australia and the USA: it first used differential empirical mode decomposition (DEMD) to decompose the original time series into several intrinsic mode functions (IMFs) and a residual subseries; secondly, the SVR model was employed to forecast the IMFs and an autoregression model the residual subseries; finally, all the results were summed up to obtain the final prediction. The empirical results illustrated that this model could provide both accurate prediction and interpretability. Panapakidis and Dagoumas [34] proposed a hybrid model to predict the day-ahead natural gas demand. It first decomposed the original time series into several subseries by wavelet transform, then employed a genetic-algorithm-optimized adaptive neuro-fuzzy inference system (ANFIS) to forecast each subseries, and finally used a feed-forward neural network (FFNN) to aggregate the forecasting results of all the subseries. The experimental results showed that the model had good robustness. In addition to energy consumption forecasting, hybrid models are widely applied in energy price forecasting. For example, Zhu et al. [35] developed an EMD-based least squares support vector regression (LSSVR) model to predict the carbon price: it first decomposed the carbon price time series into several IMFs and a residue by EMD, then used LSSVR to forecast the IMFs and the residue, respectively; finally, all the forecasting values were aggregated into the final prediction. Compared with some traditional forecasting methods, the proposed model had better performance and robustness. More typical literature on hybrid forecasting models can be found in Table 2.
Table 2 Typical literature using hybrid forecasting models

Model type: Combination forecasting method
  Typical literature: Zhang et al. (2016) [19]; Xiao et al. (2016) [20]; Yuan et al. (2016) [21]; Nowotarski et al. (2016) [22]; Liu et al. (2016) [23]; Zhang et al. (2016) [24]; Karadede et al. (2017) [25]; Li et al. (2018) [26]; Zhang et al. (2018) [27]
  Advantages: convenient and simple; robust for complex problems
  Disadvantages: computationally intensive; difficult to decide which models to combine; does not take data characteristics into consideration

Model type: Divide and rule method
  Typical literature: Zhu and Wei (2013) [28]; Liu et al. (2014) [29]; Abdoos et al. (2015) [30]; Zhang et al. (2015) [31]; Yu et al. (2015) [32]; Fan et al. (2016) [33]; Panapakidis and Dagoumas (2017) [34]; Zhu et al. (2017) [35]; Oliveira and Oliveira (2018) [36]; Wang et al. (2018) [37]
  Advantages: assigns an appropriate forecasting method based on the data characteristics; robust for complex problems
  Disadvantages: complex model; computationally intensive
1.2. Our Contributions

The above research has contributed much to energy demand forecasting, yet some gaps remain in the current state of the art: 1) The existing divide and rule methods often predict the decomposed energy consumption subseries with a single forecasting model. In fact, for subseries with strong nonlinear fluctuation, it is hard to obtain satisfactory results with a single model. Ensemble learning [38], which has arisen in recent years, provides a good way to handle this issue: its basic idea is to combine a series of weak learners to enhance their prediction performance. 2) Most existing ensemble methods allow all the trained models to participate in the combination, so redundancy and multicollinearity may exist, which may degrade the performance of the ensemble model. Forecasting performance may be improved by selecting and combining the forecasting results of a subset of the models for the final decision, i.e., selective ensemble. The factor-screening function of the group method of data handling (GMDH) neural network proposed by Ivakhnenko [39] can objectively and automatically choose the factors that critically influence the research object [40]. Thus, GMDH can reduce the effect of multicollinearity on the performance of the ensemble model to some extent.

To fill the gaps mentioned above, this study introduces the GMDH technique and proposes a hybrid forecasting model based on GMDH selective ensemble (HFGSE). It uses the GMDH-based autoregressive (GAR) model proposed in the authors' previous work [40] to predict the linear trend of the energy consumption time series and obtains the nonlinear residual subseries. Considering the highly nonlinear characteristics of the residual subseries, this study introduces AdaBoost ensemble technology [38] to enhance the forecasting performance of four single nonlinear prediction models, back propagation (BP) neural network, support vector regression (SVR) [41], genetic programming (GP), and radial basis function (RBF) neural network, to obtain four different ensemble models on the nonlinear subseries: AdaBoost.BP, AdaBoost.SVR, AdaBoost.GP, and AdaBoost.RBF. Further, the prediction results of these four AdaBoost ensemble models are used as initial inputs, and the selective combination prediction for the nonlinear subseries is obtained with GMDH. Finally, the predictions of the two parts are integrated to obtain the final forecasting results. The empirical analysis on China's total energy consumption and total oil consumption time series verifies the effectiveness of the HFGSE model.

The novelty of this study can be summarized as follows:

1) This study employs ensemble learning to predict the nonlinear subseries after decomposition, to improve the forecasting performance of the single models.

2) Instead of integrating all the predictors on the nonlinear subseries, the proposed method utilizes selective ensemble to avoid redundancy and multicollinearity to some extent. To the best of our knowledge, this study employs selective ensemble in the energy consumption forecasting field for the first time.
1.3. Organization of the paper

The rest of this study is organized as follows: Section 2 describes the methodology underlying the proposed model, including the AdaBoost ensemble method, the GMDH neural network, and the GAR model. Section 3 discusses the hybrid forecasting model based on GMDH selective ensemble, that is, HFGSE, in detail. Section 4 presents the empirical study. Finally, the findings of this study are summarized in Section 5.

2. Related Theories

In this study, the AdaBoost ensemble model, the GMDH neural network, and the GAR model are used to construct the hybrid model HFGSE. A brief description of these methods is given in this section.
2.1. AdaBoost ensemble model

In machine learning, ensemble learning is an effective method for increasing learning accuracy by combining the outputs of many weak learners. Boosting is a commonly used ensemble algorithm; it has many different versions, of which AdaBoost is the most popular.

AdaBoost was proposed by Freund and Schapire [38]. To improve the learning performance of a weak learner with the AdaBoost algorithm, one first initializes the sample weight distribution over the training set, assigning every sample the same initial weight: if the training set contains n samples, the weight of each sample is 1/n. Therefore, in the first iteration of training the weak learner with AdaBoost, each sample is selected with the same probability. Training on the selected samples under the appointed learning rule yields the first weak learner, h1. AdaBoost then calculates the classification error of the training samples at the current iteration, and the weight distribution of the samples for the next iteration is updated according to this error. The update rule is: increase the weights of misclassified samples and decrease the weights of correctly classified samples. Repeating the process T times produces T weak learners: h1, h2, ..., hT. Finally, the prediction is obtained by weighting the forecasting results of the T weak learners.
At the beginning, applications of the AdaBoost algorithm focused on classification [42], such as face recognition and vehicle license plate recognition. In recent years, it has also been applied to forecasting [43]. For example, Solomatine and Shrestha [44] proposed the AdaBoost.RT algorithm for forecasting. The algorithm is similar to AdaBoost; the difference is that AdaBoost.RT, at the end of each iteration, increases the weights of the samples whose relative error is greater than a pre-set threshold ∅. For the detailed process, please refer to [44].
21 2.2. Group method of data handling neural network
M

22 The GMDH neural network is the core technique of self-organizing data mining [45], and it can
23 decide the variables to enter the model and the structure and parameters of the model in a
D

24 self-organizing way [46].


25 Generally speaking, before GMDH modeling, the training set, W, needs to be randomly divided
TE

26 into two subsets, namely, model learning set A for estimation of model parameters, and model selection
27 set B for performance evaluation of intermediate candidate models [47]. GMDH constructs the general
28 relation between the inputs and outputs variables through the reference function. Generally speaking, as
EP

29 a reference function, it takes the discrete form of a K-G polynomial:


30 = ' + ∑*  + + ∑* ∑,* , + +, + ∑* ∑,* ∑-* ,- + +, +- + ⋯, (1)
31 where y is the output, . = + , +# , ⋯ +
is the input vector, and  is the coefficient or weight vector.
Specifically, the form of the first-order linear K-G polynomial including  variables can be expressed
C

32
33
AC

as follows:
34 Let
35 !+ , +# , ⋯ , +
=  + + # +# + ⋯ +  + , (2)
36 and take all its sub-items as the  initial models of the modeling network structure:
37 / =  + , /# = # +# , ⋯ , / =  + . (3)
38 Set the  initial models of Eq. (3) as the inputs of the GMDH network, combine all their possible

pairs and generate the 1# =


2

#
39 intermediate candidate models of the first layer [48]. The transfer

40 function is as follows:
41  = !3/ , /, 4; , 6 = 1, 2, ⋯ , ; ≠ 6, (4)

7
ACCEPTED MANUSCRIPT
1 where  is the estimated output. Obtain parameters through least squares (LS) estimation on the
2 model learning set A. Work out the external criterion value of every intermediate candidate model on
3 the model selection set B. Generally speaking, the smaller the external criterion value is, the higher the
4 performance of the intermediate candidate model is. Rank the external criteria from small to large,
5 select the optimal 9 ≤ 1#
models as the inputs of the second layer, and combine all their possible
6 pairs to generate 1;#< intermediate candidate models:
7 = = !3 , , 4; , 6 = 1, 2, ⋯ , 9 ; ≠ 6. (5)
8 Estimate the parameter of each intermediate candidate model and calculate its external criterion value,

PT
9 select 9# 3≤ 1;#< 4 intermediate candidate models again as the inputs of the third layer, and combine all
10 their possible pairs to generate 1;#> intermediate candidate models:
= !3= , =, 4; , 6 = 1, 2, ⋯ , 9# ; ≠ 6.

RI
11 (6)
12 The process repeats continuously, and the intermediate candidate models of the fourth, fifth, …, layer
13 can be obtained in turn. The termination rule of the model is given through the optimal complexity

SC
14 theory [49]: with the increase of the intermediate candidate models’ complexity, the external criteria
15 values will first become smaller and then larger. Therefore, when the external criteria value reaches its
16 minimum, the corresponding model is the optimal complexity model ∗ (see Fig. 1). Finally, in order
to seek the initial model contained in the optimal complexity model ∗ , one just needs to reconstruct

U
17
18 the GMDH network structure from the last layer until the initial input layer is reached. From Fig. 1, it
AN
19 can be seen that the initial input models v1, v3, v4 and v5 are chosen. In other words, x1, x3, x4, and x5 are
20 chosen. However, v2 is eliminated during the self-adaption process; in other words, x2 is eliminated
21 [50].
[Figure 1 here: the layered GMDH network, with initial inputs v1-v5 feeding first-layer candidates w1, w2, ..., w10, second-layer candidates z (e.g., z2, z6), and a third-layer candidate y2, until the optimal model y* = f(v) is reached; unselected models are eliminated, selected models are reserved.]

Figure 1. The process of GMDH neural network modeling.
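The layer-by-layer mechanism of Eqs. (3)-(6) can be sketched in a few lines of code. This is a simplified, hedged illustration, not the authors' implementation: it assumes a quadratic partial description as the transfer function f, a fixed 70/30 split of W into learning set A and selection set B, and the sum of squared errors on B as the external criterion; `gmdh`, `n_keep`, and `max_layers` are names introduced here.

```python
import numpy as np
from itertools import combinations

def _design(z1, z2):
    # assumed quadratic partial description: 1, z1, z2, z1*z2, z1^2, z2^2
    return np.column_stack([np.ones_like(z1), z1, z2, z1 * z2, z1 ** 2, z2 ** 2])

def gmdh(X, y, n_keep=4, max_layers=5, split=0.7):
    """LS-estimate each pairwise candidate on set A, rank candidates by
    the external criterion on set B, keep the best few as the next
    layer's inputs, and stop when the best criterion stops falling."""
    m = int(split * len(y))
    A, B = slice(0, m), slice(m, None)          # learning / selection split
    layer = X.copy()
    best_val, best_out = np.inf, None
    for _ in range(max_layers):
        cands = []
        for i, j in combinations(range(layer.shape[1]), 2):
            D = _design(layer[:, i], layer[:, j])
            coef, *_ = np.linalg.lstsq(D[A], y[A], rcond=None)  # fit on A
            out = D @ coef
            crit = np.sum((y[B] - out[B]) ** 2)  # external criterion on B
            cands.append((crit, out))
        cands.sort(key=lambda c: c[0])
        if cands[0][0] >= best_val:
            break                                # criterion rose: stop
        best_val, best_out = cands[0]
        layer = np.column_stack([c[1] for c in cands[:n_keep]])
    return best_out, best_val
```

A usage note: if the target depends on only some inputs, the pairwise selection tends to drop the irrelevant columns automatically, which is the factor-screening behavior exploited later for selective combination.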
2.3. Group method of data handling-based autoregression model

In time series forecasting, the ARIMA model is usually adopted to predict the linear trend of a time series. However, before constructing an ARIMA(p, d, q) model, a unit root test should be conducted to determine whether the series is stationary. In addition, the optimal parameter values, namely the autoregressive order p and the moving average order q, must be found through trial and error, whereas the GMDH neural network is a data-driven method that requires little prior knowledge and few assumptions. Thus, the authors' previous work [40] combined a GMDH-type neural network with an ARIMA model and constructed a GMDH-based autoregression (GAR) model for forecasting energy consumption. In this model, the original univariate time series is first converted to a matrix. In the matrix, starting from the second column, each column represents a new variable: yt in the second column is the current period of the energy consumption time series, that is, the dependent variable Y, and the third column to the last are the energy consumption series with lag orders 1, 2, ..., k, respectively, which make up the input vector X. The new data set is then divided into a training set and a test set, and the training set is divided further into a model learning set and a model selection set. Secondly, a GMDH neural network is trained to find the optimal complexity model and decide the optimal autoregression order p. Finally, the energy consumption in the test set is forecasted by the optimal complexity model.

This model ensures a self-organized modeling process, including finding the optimal complexity model, determining the optimal autoregression order, and estimating the model parameters, largely without human interference. The empirical analysis on three energy consumption time series shows that the GAR model outperforms the ARIMA model.
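The series-to-matrix conversion in the first step of GAR can be illustrated as follows. Only the data layout is shown (the GMDH training on top of it is omitted), and the helper name `lag_matrix` is introduced here purely for illustration.

```python
import numpy as np

def lag_matrix(series, k):
    """Build the GAR design matrix: column 0 holds y_t (the dependent
    variable Y); columns 1..k hold the series lagged by 1..k periods
    (the input vector X)."""
    s = np.asarray(series, dtype=float)
    return np.array([s[t - np.arange(k + 1)] for t in range(k, len(s))])

# e.g. with k = 2, each row is [y_t, y_{t-1}, y_{t-2}]:
M = lag_matrix([1, 2, 3, 4, 5], 2)
# rows: [3, 2, 1], [4, 3, 2], [5, 4, 3]
```

The first k observations are consumed as lags, so a series of length n yields n - k usable rows, which are then split into the training and test sets.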

3. Hybrid Forecasting Model Based on Selective Ensemble

In this section, the proposed hybrid model HFGSE is described in detail, including its basic idea, the construction of the external criteria, and the modeling steps.
Table 3 The transfer matrix for the nonlinear subseries

Set        rt      r̂t,1    r̂t,2    r̂t,3    r̂t,4
W_train    r1      r̂1,1    r̂1,2    r̂1,3    r̂1,4
           ...     ...      ...      ...      ...
W_test     ...     ...      ...      ...      ...
           rn      r̂n,1    r̂n,2    r̂n,3    r̂n,4

20 3.1. Basic idea


21
AC

The hybrid model proposed in this study belongs to the divide and rule method. Because China’s
22 energy consumption time series is annual data, no seasonal factor exists. Therefore, this study uses the
23 GAR [40] model proposed earlier to predict its linear trend. The left residual sequence is the non-linear
24 subseries. Because the forecasting of the linear trend is relatively simple, and that of nonlinear
25 subseries is more difficult, it mainly focuses on the latter. Most existing hybrid forecasting models on
26 nonlinear subseries are for constructing a single prediction model, although the forecasting effect is
27 always better than that of the models that consider the linear trend only. Considering the complexity of
28 non-linear subseries, it is hard to obtain a better forecasting effect with the commonly used single time
29 series prediction model. This study first utilizes the ensemble learning model AdaBoost algorithm to
30 predict. It selects four nonlinear subseries classification models, namely, BP, SVR, GP, and RBF, to

9
ACCEPTED MANUSCRIPT
1 train the weak learner of the AdaBoost algorithm, and construct four ensemble prediction models:
2 AdaBoost.BP, AdaBoost.SVR, AdaBoost.GP, and AdaBoost.RBF. Further, it considers combining the
3 four ensemble forecasting results. However, if all four trained ensemble models are combined, then
4 multicollinearity among the models may exist, which will degrade the forecasting accuracy of the
5 model. Forecasting performance can be improved by selecting and combining the forecasting results of
6 a subset of the models for a final decision. Thus, this study introduces a GMDH neural network to
7 establish selective combination forecasting. With the automatic modeling mechanism of GMDH, it
8 selects parts of models from all the ensemble forecasting models, self-organizes to combine them, and

PT
9 ensures their weights.
Suppose that the original energy consumption time series is $y_t$. The HFGSE model proposed in this study includes four steps: 1) Obtain the energy consumption nonlinear subseries: construct a GAR model to predict the linear trend; suppose the result is $\hat{L}_t$. Then the difference between them, $N_t = y_t - \hat{L}_t$, is the energy consumption nonlinear subseries. 2) AdaBoost ensemble prediction on the nonlinear subseries: select the above four nonlinear single models as the weak learners of AdaBoost ensemble learning, and obtain the forecasting results of the four ensemble models on the nonlinear subseries; suppose these are $\hat{N}_t^i$ ($i = 1, 2, 3, 4$). 3) GMDH-based selective combination prediction on the nonlinear subseries: First, transfer the original nonlinear time series $N_t$ and all forecasting results of the ensemble models $\hat{N}_t^i$ ($i = 1, 2, 3, 4$) into a data set stored in matrix form (see Table 3), where $N_t$ denotes the energy consumption nonlinear subseries at the current period, that is, the dependent variable. From the third column to the sixth column, $\hat{N}_t^1$, $\hat{N}_t^2$, $\hat{N}_t^3$, and $\hat{N}_t^4$ construct the independent variable $x = (\hat{N}_t^1, \hat{N}_t^2, \hat{N}_t^3, \hat{N}_t^4)$. Next, divide the whole data set of the table into the model training set $W$ and the test set $G$ (see the first column of Table 3). Further, divide the model training set $W$ horizontally into two subsets, the model learning set A and the model selecting set B, and find the optimal complexity model through the GMDH algorithm. Finally, predict on the test set with the optimal complexity model and record the forecasting result as $\hat{N}_t$. 4) Calculate the final forecasting value of the energy consumption time series: add the forecasting value of the GAR model $\hat{L}_t$ to that of the nonlinear part $\hat{N}_t$ to obtain the final energy consumption forecasting value, that is, $\hat{y}_t = \hat{L}_t + \hat{N}_t$.
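The four steps can be sketched end to end as follows (a minimal illustration only: a plain least-squares line stands in for the GAR model, and a naive persistence forecast stands in for the AdaBoost/GMDH nonlinear stage; all function names are hypothetical):

```python
# Minimal sketch of the HFGSE decompose-forecast-recombine pipeline.

def linear_trend_forecast(y):
    """Fit y_t ~ a*t + b by least squares and return the fitted values L_t."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (yt - y_mean) for t, yt in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    a = num / den
    b = y_mean - a * t_mean
    return [a * t + b for t in range(n)]

def hfgse_like_forecast(y):
    L = linear_trend_forecast(y)                   # step 1: linear trend
    N = [yt - lt for yt, lt in zip(y, L)]          # nonlinear subseries N_t = y_t - L_t
    N_hat = [0.0] + N[:-1]                         # steps 2-3 stand-in: persistence forecast
    return [lt + nh for lt, nh in zip(L, N_hat)]   # step 4: y_hat = L_hat + N_hat
```

In the paper the persistence stand-in is replaced by the AdaBoost.RT ensembles and the GMDH selective combination; only the decompose/recombine structure is shown here.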
3.2. Construction of external criteria

In realistic system modeling, different requirements arise, which may come from the aims of modeling or from prior system knowledge. In GMDH modeling, the external criteria are the mathematical descriptions of these specified requirements, and they select the "optimal" model from the candidate model set. GMDH has an external criteria system [50], from which different external criteria can be selected according to different modeling aims, and new external criteria can be constructed as needed.

This study chooses two external criteria from the existing GMDH external criteria system: the asymmetric stability (AS) criterion and the mean regularization (MR) criterion. Their descriptions are as follows:
(1) Asymmetric stability criterion:

$J_1^2(A) = \sum_{t \in W} \big(y_t - \hat{y}_t(A)\big)^2$, (7)

where $y_t$ is the actual output of the t-th sample in the model training set W, and $\hat{y}_t(A)$ is its forecasted output on W by the model trained on the model learning set A. This criterion first trains the model on subset A and then calculates the sum of squared errors between the actual and forecasted outputs over the entire training set W.
(2) Mean regularization criterion:

$J_2^2(W) = \sum_{t \in W} \big(y_t - \hat{y}_t(W)\big)^2$, (8)

where $\hat{y}_t(W)$ is the forecasted output of the t-th sample in the entire training set W by the model trained on the same dataset; that is, the model learning process and the calculation of the external criterion are both carried out on the training set W.
Furthermore, considering that the root mean square error (RMSE) and the mean absolute percentage error (MAPE) are two commonly used indexes for evaluating model performance in energy consumption prediction, this study constructs two new criteria: the symmetrical root mean squared error (SRMSE) criterion and the symmetrical mean absolute percentage error (SMAPE) criterion. Their descriptions are as follows:

(3) Symmetrical root mean squared error criterion:

$J_3^2 = \sqrt{\dfrac{\sum_{t \in A} \big(y_t - \hat{y}_t(B)\big)^2}{m_A}} + \sqrt{\dfrac{\sum_{t \in B} \big(y_t - \hat{y}_t(A)\big)^2}{m_B}}$, (9)

where $m_A$ and $m_B$ stand for the sample sizes of subsets A and B, respectively, $\hat{y}_t(B)$ is the forecasted output of the t-th sample in the model learning set A by the model trained on the model selecting set B, and $\hat{y}_t(A)$ is the forecasted output of the t-th sample in the model selecting set B by the model trained on the model learning set A. The SRMSE criterion thus calculates the root mean square error on subset A and on subset B simultaneously.


(4) Symmetrical mean absolute percentage error criterion:

$J_4^2 = \dfrac{1}{m_A} \sum_{t \in A} \left| \dfrac{y_t - \hat{y}_t(B)}{y_t} \right| + \dfrac{1}{m_B} \sum_{t \in B} \left| \dfrac{y_t - \hat{y}_t(A)}{y_t} \right|$. (10)

The SMAPE criterion calculates the mean absolute percentage error on subset A and on subset B simultaneously, using the information in subsets A and B symmetrically, as the SRMSE criterion does.
According to different external criteria, different GMDH selective combination forecasting models can be constructed: AS.GMDH, MR.GMDH, SRMSE.GMDH, and SMAPE.GMDH.
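As an illustration, the four criteria in Eqs. (7)-(10) can be computed directly from the actual outputs and the cross-subset predictions (a sketch; the prediction vectors are assumed to be produced elsewhere by the models trained on A, B, or W):

```python
import math

def as_criterion(y_W, yhat_W_from_A):
    """Eq. (7): squared error over the whole training set W of the model trained on A."""
    return sum((y - f) ** 2 for y, f in zip(y_W, yhat_W_from_A))

def mr_criterion(y_W, yhat_W_from_W):
    """Eq. (8): squared error on W of the model trained on W itself."""
    return sum((y - f) ** 2 for y, f in zip(y_W, yhat_W_from_W))

def srmse_criterion(y_A, yhat_A_from_B, y_B, yhat_B_from_A):
    """Eq. (9): RMSE on A (model trained on B) plus RMSE on B (model trained on A)."""
    rmse_A = math.sqrt(sum((y - f) ** 2 for y, f in zip(y_A, yhat_A_from_B)) / len(y_A))
    rmse_B = math.sqrt(sum((y - f) ** 2 for y, f in zip(y_B, yhat_B_from_A)) / len(y_B))
    return rmse_A + rmse_B

def smape_criterion(y_A, yhat_A_from_B, y_B, yhat_B_from_A):
    """Eq. (10): MAPE on A (model trained on B) plus MAPE on B (model trained on A)."""
    mape_A = sum(abs((y - f) / y) for y, f in zip(y_A, yhat_A_from_B)) / len(y_A)
    mape_B = sum(abs((y - f) / y) for y, f in zip(y_B, yhat_B_from_A)) / len(y_B)
    return mape_A + mape_B
```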


3.3. Modeling steps

The modeling flowchart of the energy consumption forecasting model HFGSE proposed in this study is shown in Fig. 2. Its specific modeling steps are as follows:

Figure 2. The modeling flowchart of the HFGSE model.
Step 1: Obtain the energy consumption nonlinear subseries. Construct a GAR model on the original energy consumption time series $y_t$ and predict the linear trend. Suppose the forecasting result is $\hat{L}_t$; then, the energy consumption nonlinear subseries is $N_t = y_t - \hat{L}_t$;

Step 2: AdaBoost ensemble forecasting on $N_t$. Suppose $N_t$ contains $m$ sample points, the maximum number of iterations is $T$, and the threshold value of the relative forecasting error is $\phi$. The process of integrating a nonlinear single forecasting model with the AdaBoost.RT algorithm is as follows [45]:

(1) Initialize the weight vector $w_i^{(1)} = \frac{1}{m}$, $i = 1, 2, \cdots, m$;

(2) For $t = 1, 2, \cdots, T$:

a. Calculate the sample weight distribution $D_t(i) = w_i^{(t)} / \sum_{j=1}^{m} w_j^{(t)}$, and train one weak learner $f_t(x) \rightarrow y$;

b. Calculate the relative forecasting error rate $\varepsilon_t = \sum_{i:\, |f_t(x_i) - y_i|/y_i > \phi} D_t(i)$, where $y_i$ is the real output of the i-th sample, and $f_t(x_i)$ is the forecasting output of the weak learner;

c. Assign the weight $\beta_t = \varepsilon_t^2$ to the weak learner;

d. Update the weight vector of the samples:

$w_i^{(t+1)} = w_i^{(t)} \times \begin{cases} \beta_t, & |f_t(x_i) - y_i|/y_i \le \phi \\ 1, & \text{otherwise} \end{cases}$. (11)

(3) Output the final hypothesis:

$f_{fin}(x) = \sum_{t=1}^{T} \log\!\left(\frac{1}{\beta_t}\right) f_t(x) \Big/ \sum_{t=1}^{T} \log\!\left(\frac{1}{\beta_t}\right)$. (12)

This study selects the four single nonlinear forecasting models in turn to train the weak learners; obtains four models, AdaBoost.BP, AdaBoost.SVR, AdaBoost.RBF, and AdaBoost.GP; and records their forecasting results as $\hat{N}_t^1$, $\hat{N}_t^2$, $\hat{N}_t^3$, $\hat{N}_t^4$, respectively.
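The AdaBoost.RT loop of Eqs. (11)-(12) can be sketched as a short, runnable routine (the weak learner is passed in as a training function; the weighted least-squares line used here is a hypothetical stand-in, not one of the paper's four models):

```python
import math

def adaboost_rt(X, y, train_weak, T=50, phi=0.10):
    """Boost a weak regression learner with AdaBoost.RT.
    train_weak(X, y, D) must fit under weight distribution D and return f(x)."""
    m = len(X)
    w = [1.0] * m                                   # sample weights (uniform start)
    models, betas = [], []
    for _ in range(T):
        total = sum(w)
        D = [wi / total for wi in w]                # step a: weight distribution
        f = train_weak(X, y, D)
        # step b: weighted rate of samples whose relative error exceeds phi
        eps = sum(D[i] for i in range(m) if abs(f(X[i]) - y[i]) / abs(y[i]) > phi)
        if eps == 0.0:                              # perfect round: keep it and stop
            models.append(f)
            betas.append(1e-12)
            break
        if eps >= 1.0:                              # degenerate round: stop boosting
            break
        beta = eps ** 2                             # step c: beta_t = eps_t^2
        models.append(f)
        betas.append(beta)
        for i in range(m):                          # step d: shrink weights of easy samples
            if abs(f(X[i]) - y[i]) / abs(y[i]) <= phi:
                w[i] *= beta
    def strong(x):                                  # Eq. (12): log(1/beta)-weighted average
        den = sum(math.log(1.0 / b) for b in betas)
        return sum(math.log(1.0 / b) * f(x) for f, b in zip(models, betas)) / den
    return strong

def weighted_line(X, y, D):
    """Hypothetical weak learner: weighted least-squares line on a scalar input."""
    mx = sum(d * x for d, x in zip(D, X))
    my = sum(d * v for d, v in zip(D, y))
    cov = sum(d * (x - mx) * (v - my) for d, x, v in zip(D, X, y))
    var = sum(d * (x - mx) ** 2 for d, x in zip(D, X))
    a = cov / var if var else 0.0
    b = my - a * mx
    return lambda x: a * x + b
```

In the paper, `train_weak` would be one of the BP, SVR, GP, or RBF models, with $\phi$ = 10% and $T$ = 50 as set in Section 4.2.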
Step 3: Conduct selective combination forecasting with a GMDH neural network on $N_t$.

(1) Transfer and prepare the data: Transfer the original nonlinear time series $N_t$ and the forecasting results of the four ensemble models $\hat{N}_t^1, \hat{N}_t^2, \hat{N}_t^3, \hat{N}_t^4$ into matrix form as in Table 3; divide the matrix data into the model training set W and the model test set G. Further, divide the training set into the model learning set A and the model selecting set B;

(2) Run the GMDH algorithm on the model training set W, and find the combination forecasting model with the optimal complexity:

a. Construct the general relation between the output and input variables:

$\hat{N}_t = a_1 \hat{N}_t^1 + a_2 \hat{N}_t^2 + a_3 \hat{N}_t^3 + a_4 \hat{N}_t^4$, (13)

and regard all the sub-items as the initial input models of the GMDH neural network:

$v_1 = a_1 \hat{N}_t^1,\ v_2 = a_2 \hat{N}_t^2,\ v_3 = a_3 \hat{N}_t^3,\ v_4 = a_4 \hat{N}_t^4$; (14)

b. Combine all the possible pairs of the four initial models to generate the six candidate models of the first layer, and estimate the parameters of the intermediate candidate models with the least squares (LS) method;

c. Calculate the external criterion values of all intermediate candidate models, select the four intermediate candidate models with the smallest external criterion values for the next layer, and regard them as inputs of the second layer of the GMDH neural network;

d. Repeat steps b and c to generate the intermediate candidate models of the second, third, ..., L-th layers in turn, and find the combination forecasting model with optimal complexity $u^*$ according to the optimal complexity theory;

(3) Predict the energy consumption nonlinear subseries on the test set G with the optimal complexity model $u^*$, and let the result be $\hat{N}_t$;

Step 4: Calculate the final energy consumption time series forecasting value. Add the forecasting value of the linear GAR model $\hat{L}_t$ and that of the nonlinear part $\hat{N}_t$ to obtain the final energy consumption time series forecasting value, that is, $\hat{y}_t = \hat{L}_t + \hat{N}_t$.
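Step 3(2) can be sketched as the following self-organizing loop (a simplified illustration: pairwise candidates are fitted by least squares on the learning set A and screened by a regularity-style sum-of-squares criterion on the selecting set B; the helper names are hypothetical, and the paper's external criteria of Section 3.2 could be substituted for the screening step):

```python
import itertools

def solve3(A, b):
    """Gaussian elimination with partial pivoting; returns None if near-singular."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            return None
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_pair(uA, vA, yA):
    """LS fit of y ~ c0 + c1*u + c2*v on the learning set (one paired candidate)."""
    cols = [[1.0] * len(yA), uA, vA]
    G = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    h = [sum(c * yy for c, yy in zip(ci, yA)) for ci in cols]
    c = solve3(G, h)
    if c is None:
        return None
    return lambda u, v: [c[0] + c[1] * uu + c[2] * vv for uu, vv in zip(u, v)]

def gmdh_select(XA, yA, XB, yB, width=4, max_layers=4):
    """Layer by layer: fit all pairs on A, screen by SSE on B, keep `width` best."""
    layerA, layerB = list(XA), list(XB)
    best = (float("inf"), None)
    for _ in range(max_layers):
        cands = []
        for i, j in itertools.combinations(range(len(layerA)), 2):
            g = fit_pair(layerA[i], layerA[j], yA)
            if g is None:
                continue
            outA = g(layerA[i], layerA[j])
            outB = g(layerB[i], layerB[j])
            crit = sum((p - t) ** 2 for p, t in zip(outB, yB))  # external criterion
            cands.append((crit, outA, outB))
        if not cands:
            break
        cands.sort(key=lambda c: c[0])
        if cands[0][0] >= best[0]:
            break                        # criterion stopped improving: optimal complexity
        best = (cands[0][0], cands[0][2])
        layerA = [c[1] for c in cands[:width]]
        layerB = [c[2] for c in cands[:width]]
    return best                          # (criterion value, predictions on B)
```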

4. Empirical Analysis

To verify the performance of the proposed model, this study selects two time series for the experiments: the total energy consumption and the total oil consumption in China. Firstly, to analyze the impact of the AdaBoost.RT algorithm on the model's forecasting performance on the nonlinear subseries, this study compares the forecasting performance of the AdaBoost.RT ensembles with the four single models: BP, SVR, GP, and RBF. Secondly, to investigate the effect of selective combination forecasting, it analyzes the forecasting results of four different versions of the GMDH combination forecasting model to find the optimal one, and then compares the best one with the models participating in the combination. Thirdly, it compares the forecasting performance of the HFGSE model with that of other hybrid forecasting models. Finally, out-of-sample forecasting with the HFGSE model is performed on the total energy consumption and total oil consumption time series in China from 2015 to 2020.
4.1. Data

To evaluate the forecasting performance of the HFGSE model proposed in this study, an empirical analysis is conducted on the annual time series of Chinese total energy consumption and total oil consumption from 1978 to 2014 (see Fig. 3). The data are from the China Statistical Yearbook. Because the key of the HFGSE model is to predict the nonlinear subseries of energy consumption, this study does not discuss the forecasting result of the linear trend in detail, but utilizes the GAR model proposed above to predict the linear trend of the original series $y_t$ and obtains the nonlinear subseries $N_t$. Fig. 4 shows the total energy consumption and total oil consumption nonlinear subseries. It can be seen from the figure that the nonlinear subseries of the two energy consumption time series fluctuate to a large extent.

Figure 3. Energy consumption time series (total energy consumption and total oil consumption; unit: ten thousand tons of standard coal).
AN
6
M

20000
Ten thousand tons of standard coal

15000
D

10000
TE

5000

-5000
EP

-10000

-15000

-20000
C

The nonlinear subseries of total energy consumption


The nonlinear subseries of total oil consumption
AC

7
8 Figure 4. The nonlinear subseries of total energy and oil consumption.
9
4.2. Experiment setting

This study selected the energy consumption time series from 1978 to 2009 as the training set, and the ones from 2010 to 2014 as the test set. The models mentioned in this study were trained on the training set, and their performance was evaluated on the test set. It is worth noting that the training set and test set here are different from those in Table 3, but they are related. For Table 3, the GAR model is first applied to the original energy consumption time series $y_t$ to obtain the linear trend prediction $\hat{L}_t$; the nonlinear subseries $N_t$ is then calculated; and finally 1978-2009 is used as the training set and 2010-2014 as the test set.
This study used the original energy consumption time series as the dependent variable and its lagged items as the independent variables to train the models. The four nonlinear forecasting models were used as the weak learners to train the AdaBoost.RT ensemble models. The parameter settings of the four models were as follows: 1) BP neural network: It includes two important parameters, the largest lagged order $p$ and the number of nodes in the hidden layer $q$. In predicting different energy consumption time series, the optimal values of the two parameters are usually different. After repeated experiments, it was found that the BP neural network attains a satisfactory forecasting performance for the total energy consumption and total oil consumption time series when $p$ = 5 and 4, and $q$ = 3 and 3, respectively. 2) SVR model: This study used the Libsvm-3.1 toolbox to implement the SVR model. It chose the most commonly used RBF kernel function because of its nonlinear mapping ability. Through experiments, it was found that the SVR model had the best forecasting performance on the total energy consumption and total oil consumption time series when $p$ = 1 and 2, respectively. There are two other important parameters in the SVR model, i.e., the penalty parameter C and the kernel width γ. This study used the grid search method in the toolbox to find the best parameter values. Finally, C = 0.2 and γ = 15.76 were used for the total energy consumption, and C = 7.1 and γ = 24.20 for the total oil consumption. 3) GP model: In its modeling process, the parameter setting is relatively important for its performance. Through repeated trials, the GP model attains the optimal forecasting effect for the total energy consumption and total oil consumption time series, respectively, when the number of initial trees is 50 and 60, the crossover probability 0.8 and 0.85, the threshold value of goodness of fit 0.85 and 0.85, and the maximum number of iterations 50 and 50. 4) RBF neural network: The expanding speed of the radial basis function, spread, is an important parameter, and the lagged order of the time series $p$ is also important. Through experimental comparison, it was found that the RBF model attains the best forecasting performance for the two energy consumption time series when spread = 3 and $p$ = 1.
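The lagged design described at the start of this subsection (dependent variable $y_t$, independent variables $y_{t-1}, \ldots, y_{t-p}$) can be built as follows (a sketch; $p$ is the lag order tuned above, and putting the most recent lag first is an arbitrary convention of this illustration):

```python
def lag_embed(series, p):
    """Turn a series into (X, y) pairs with X_t = (y_{t-1}, ..., y_{t-p})."""
    X, y = [], []
    for t in range(p, len(series)):
        X.append(series[t - p:t][::-1])   # most recent lag first
        y.append(series[t])
    return X, y
```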
For the threshold value $\phi$ of the AdaBoost.RT ensemble algorithm, after repeated experimental comparisons, this study took $\phi$ = 10%, because the performance of the model is best at this value. Although the forecasting error of the final strong learner decreases as the number of iterations $T$ increases, a larger $T$ also increases the model running time; therefore, the number of iterations was set at $T$ = 50.

Finally, all experiments were performed on the Matlab2011b platform. This study repeated the above procedure 10 times and took the average value as the experimental result.
4.3. Model evaluation criteria

To evaluate the forecasting performance of the models, this study utilizes two commonly used evaluation criteria: the root mean square error (RMSE) [51] and the mean absolute percentage error (MAPE) [52]. Their definitions are as follows:

$RMSE = \sqrt{\dfrac{1}{m}\sum_{t=1}^{m} (y_t - \hat{y}_t)^2}$, (15)

$MAPE = \dfrac{1}{m}\sum_{t=1}^{m} \left|\dfrac{y_t - \hat{y}_t}{y_t}\right|$, (16)

where $y_t$ is the real value of the t-th sample, $\hat{y}_t$ is its corresponding forecasting value, and $m$ is the number of test samples. Obviously, the smaller the value of the evaluation criterion, the better the forecasting performance of the model [53].
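Eqs. (15) and (16) in code (a straightforward sketch):

```python
import math

def rmse(y_true, y_pred):
    """Eq. (15): root mean square error."""
    m = len(y_true)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / m)

def mape(y_true, y_pred):
    """Eq. (16): mean absolute percentage error (a fraction; multiply by 100 for %)."""
    m = len(y_true)
    return sum(abs((y - f) / y) for y, f in zip(y_true, y_pred)) / m
```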
Table 4 Comparison of AdaBoost ensemble and single models on the two nonlinear subseries

Model          BP        AdaBoost.BP   SVR       AdaBoost.SVR   GP        AdaBoost.GP   RBF       AdaBoost.RBF
Total energy consumption nonlinear subseries
RMSE           0.7738    0.6818        0.7500    0.6709         2.0351    0.9200        1.1497    0.7602
Rank           5         2             3         1              8         6             7         4
MAPE           103.51%   11.31%        98.62%    28.22%         255.74%   19.63%        134.87%   36.45%
Rank           6         1             5         3              8         2             7         4
Total oil consumption nonlinear subseries
RMSE           0.2905    0.2368        0.2688    0.2392         0.2367    0.1931        0.2388    0.2319
Rank           8         4             7         6              3         1             5         2
MAPE           156.44%   79.36%        94.62%    70.85%         125.28%   72.18%        128.44%   53.01%
Rank           8         4             5         2              6         3             7         1
Average rank   6.75      2.75          5.00      3.00           6.25      3.00          6.50      2.75
4.4. AdaBoost ensemble forecasting on the nonlinear subseries

To analyze the impact of the AdaBoost.RT ensemble algorithm on the models' forecasting performance, this study compares the forecasting results of the AdaBoost ensembles of the BP neural network, SVR model, GP model, and RBF neural network with those of the corresponding original single nonlinear models. Table 4 compares each model's forecasting performance on the two energy consumption nonlinear subseries. The table gives the rank of each model on the two evaluation criteria, from low to high (the smaller the rank, the better the model's performance). The last row is the average of each model's evaluation criterion ranks over the two nonlinear subseries.

The following conclusions can be obtained from a careful analysis of Table 4: 1) For both the total energy consumption nonlinear subseries and the total oil consumption nonlinear subseries, the RMSE and MAPE values of the AdaBoost ensemble models are smaller than those of the corresponding single nonlinear models. This demonstrates that the AdaBoost.RT algorithm improves the single nonlinear models' forecasting performance to different extents. 2) On the total energy consumption nonlinear subseries, it can be seen from the ranks that AdaBoost.SVR performs best according to the RMSE criterion and AdaBoost.BP performs best according to the MAPE criterion; on the total oil consumption nonlinear subseries, AdaBoost.GP performs best according to the RMSE criterion and AdaBoost.RBF performs best according to the MAPE criterion. This demonstrates that the ensemble models always achieve better performance than the four single nonlinear forecasting models. From the average ranks in the last row of Table 4, the eight models in order of forecasting performance are: AdaBoost.BP, AdaBoost.RBF, AdaBoost.SVR, AdaBoost.GP, SVR, GP, RBF, and BP. The four ensemble models all rank better than the four single models, which verifies the above conclusions again.
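The ranks and average ranks used in Table 4 can be reproduced as follows (a sketch; ties are broken by position here, and the test values are the energy-subseries RMSE row of Table 4):

```python
def ranks(errors):
    """Rank models by error, 1 = smallest (best); ties broken by position."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    r = [0] * len(errors)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def average_ranks(rank_rows):
    """Average each model's ranks over several criterion/series rows."""
    m = len(rank_rows[0])
    return [sum(row[i] for row in rank_rows) / len(rank_rows) for i in range(m)]
```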
4.5. Analysis of selective combination forecasting

This part focuses on the effect of selective combination forecasting. It first analyzes the forecasting results of four different versions of the GMDH combination forecasting model to find the optimal one, and then compares the best one with the models participating in the combination.
4.5.1. Comparisons of different versions of the selective combination forecasting model

In the HFGSE model proposed in this study, four different versions of the model are constructed according to the different external criteria used in the GMDH selective combination prediction: AS.GMDH, MR.GMDH, SRMSE.GMDH, and SMAPE.GMDH. In this section, the four versions of the GMDH model are used to make selective combinations of the models enhanced by the AdaBoost.RT algorithm in the previous section. Table 5 compares the selective combination performance of the four GMDH versions. The number in parentheses indicates the rank of the model in its row; the smaller the rank, the better the model's performance. The last row is the average of the evaluation criterion ranks of each model over the two consumption time series, which represents the overall predictive performance of the models well.

According to Table 5, for the total energy consumption time series, MR.GMDH performs best according to the RMSE criterion, followed by AS.GMDH and SRMSE.GMDH, with SMAPE.GMDH the poorest in the group. Meanwhile, according to the MAPE criterion, AS.GMDH performs best, followed by SMAPE.GMDH, and the poorest performers are MR.GMDH and SRMSE.GMDH. Therefore, each of the four models has its own advantages and disadvantages on this series. However, for the total oil consumption time series, AS.GMDH has the smallest value of both RMSE and MAPE, indicating its superior prediction performance. Finally, from the average ranks in the last row of Table 5, the AS.GMDH model has the smallest value, followed by the MR.GMDH and SMAPE.GMDH models (tied), and finally the SRMSE.GMDH model. This indicates that, among the four versions of the GMDH selective combination forecasting model, AS.GMDH has the best overall predictive performance. Therefore, in the following experiments, the AS.GMDH model is chosen for the selective combination forecasting.
Table 5 Comparisons of different versions of the GMDH model on the energy and oil consumption nonlinear subseries

Model          AS.GMDH      MR.GMDH      SRMSE.GMDH    SMAPE.GMDH
Total energy consumption time series
RMSE           0.5738(2)    0.5669(1)    0.5887(3)     0.6003(4)
MAPE           8.541%(1)    10.01%(3)    10.32%(4)     9.981%(2)
Total oil consumption time series
RMSE           0.1789(1)    0.1903(4)    0.1867(2)     0.1899(3)
MAPE           39.17%(1)    47.23%(3)    52.11%(4)     45.01%(2)
Average rank   1.25         2.75         3.25          2.75
Furthermore, Table 6 gives the models that participate in the optimal combination model selected by the AS.GMDH model for the two consumption nonlinear subseries. It can be seen from the table that the AS.GMDH model chooses two models from the four candidates (AdaBoost.BP, AdaBoost.GP, AdaBoost.RBF, and AdaBoost.SVR) to participate in the optimal combination for each of the two consumption nonlinear subseries. Two conclusions can be drawn. On the one hand, the optimal combination selected by the GMDH selective combination forecasting model with its self-organizing modeling technology is not a single candidate model, which effectively compensates for the shortcomings of any single prediction model with poor performance. On the other hand, it does not include all candidate models, which overcomes the information redundancy that the combination of all candidate models, namely the traditional combination forecasting model, may introduce, thus improving the prediction performance of the model.

Table 6 Models participating in the optimal combination model constructed by AS.GMDH

Nonlinear subseries        Selected models
Total energy consumption   AdaBoost.BP, AdaBoost.GP
Total oil consumption      AdaBoost.GP, AdaBoost.RBF
9
10 4.5.2. Comparisons of the selective combination model with the models participating in the

SC
11 combination
12 To verify the performance of the GMDH-based selective combination forecasting model, this
13 study compares the GMDH-based combination model AS.GMDH with the four models participating in

U
14 combination: AdaBoost.BP, AdaBoost.SVR, AdaBoost.GP, and AdaBoost.RBF. Fig. 5 and Fig. 6 show
15 the comparison results for the total energy consumption nonlinear series and the total oil consumption
AN
16 nonlinear series, respectively.

Figure 5. Comparison of the GMDH combination model with the models participating in the combination for the nonlinear subseries of the total energy consumption (RMSE: AS.GMDH 0.5738, AdaBoost.BP 0.6818, AdaBoost.SVR 0.6709, AdaBoost.GP 0.9200, AdaBoost.RBF 0.7602; MAPE: 8.54%, 11.31%, 28.22%, 19.63%, 36.45%).
As can be seen from Fig. 5, for the total energy consumption nonlinear subseries, according to the RMSE criterion, the AS.GMDH model is optimal, followed by the AdaBoost.SVR and AdaBoost.BP models, and finally the AdaBoost.RBF and AdaBoost.GP models. Moreover, according to the MAPE criterion, AS.GMDH is still the optimal model, followed by AdaBoost.BP, AdaBoost.GP, AdaBoost.SVR, and AdaBoost.RBF. Thus, for the total energy consumption nonlinear subseries, the AS.GMDH model proposed in this study has a better forecasting performance than the four models participating in the combination.

According to Fig. 6, for the total oil consumption nonlinear subseries, the GMDH selective combination forecasting model has the smallest value on both evaluation criteria, especially on the MAPE criterion: the value of AS.GMDH is 13.84% lower than that of AdaBoost.RBF. This shows that AS.GMDH also has the best forecasting performance for the total oil consumption nonlinear subseries.
Figure 6. Comparison of the GMDH combination model with the models participating in the combination for the total oil consumption nonlinear subseries (RMSE: AS.GMDH 0.1789, AdaBoost.BP 0.2368, AdaBoost.SVR 0.2392, AdaBoost.GP 0.1931, AdaBoost.RBF 0.2319; MAPE: 39.17%, 79.36%, 70.85%, 72.18%, 53.01%).
4.6. Comparisons of the proposed hybrid model with other models

To verify the overall forecasting performance of the proposed hybrid model HFGSE, this study compared it with other commonly used time series models. First, it compared the HFGSE model with the GAR model put forward earlier (which predicts only the linear trend of the energy consumption time series and discards the nonlinear residual subseries directly); the results are shown in Table 7. It can be seen from the table that for both the total energy consumption time series and the total oil consumption time series, the errors of the HFGSE model, which also predicts the nonlinear residual series, are always smaller than those of the GAR model. The conclusion can be drawn that for both consumption time series, the nonlinear residual series do carry useful information for prediction modeling.
Table 7 Comparison of the forecasting performance of the HFGSE and GAR models

         Errors of total energy consumption    Errors of total oil consumption
Model    RMSE       MAPE                       RMSE       MAPE
GAR      1.7010     3.62%                      1.2908     6.99%
HFGSE    0.4672     1.20%                      0.2341     2.84%
Next, this study compared the HFGSE model with four simple hybrid models that first use the GAR model to predict the linear trend and then employ the BP, SVR, GP, and RBF models, respectively, to predict the nonlinear fluctuations, finally combining the two parts for the forecasting result. Furthermore, it compared the HFGSE model with three recently proposed hybrid forecasting models: the combination forecasting method GM-ARIMA [21] and the divide and rule methods EMD-LSSVR [35] and DEMD-SVR-AR [33]. The results are shown in Table 8. The bold value in each row of the table corresponds to the smallest error in that row. The number in parentheses indicates the rank of the model in the row; the smaller the rank, the better the model's performance. The last row shows the average rank of each model.

According to Table 8, the following conclusions can be obtained: 1) For both the total energy consumption and total oil consumption time series, HFGSE, put forward by this study, has the smallest MAPE value; the RMSE of HFGSE is larger only than that of DEMD-SVR-AR, and only for the total oil consumption time series. In addition, the average rank of HFGSE in the last row of the table is also the smallest. Thus, compared with the other seven hybrid models, HFGSE has the best overall forecasting performance. 2) Among the seven other hybrid models, the average rank of DEMD-SVR-AR is the smallest, second only to the HFGSE model proposed in this study, followed by the EMD-LSSVR, GAR&BP, GM-ARIMA, GAR&SVR, and GAR&GP models, and finally GAR&RBF. This indicates that the overall forecasting performance of the DEMD-SVR-AR model is superior to those of the six other hybrid models, whereas that of GAR&RBF is the worst.
Table 8 Comparisons of the forecasting performance of the HFGSE and the other seven hybrid models

               HFGSE      GAR&BP     GAR&SVR    GAR&GP     GAR&RBF    GM-ARIMA   EMD-LSSVR  DEMD-SVR-AR
Total energy consumption time series
RMSE           0.4672(1)  1.4722(5)  1.3310(4)  1.6320(6)  2.3675(7)  3.780(8)   0.6231(3)  0.5156(2)
MAPE           1.20%(1)   2.93%(4)   3.05%(5)   3.16%(6)   4.56%(8)   3.40%(7)   1.30%(2)   1.42%(3)
Total oil consumption time series
RMSE           0.2341(2)  0.7428(5)  0.8461(8)  0.7572(6)  0.7816(7)  0.3152(3)  0.3907(4)  0.1713(1)
MAPE           2.84%(1)   5.49%(6)   5.18%(5)   6.88%(7)   7.19%(8)   4.33%(3)   4.78%(4)   2.94%(2)
Average rank   1.25       5.0        5.5        6.25       7.5        5.25       3.25       2

4.7. Out-of-sample forecasting of the proposed hybrid model

Based on the above analyses and comparisons, the HFGSE model can accurately predict energy consumption. Furthermore, Table 9 shows the out-of-sample forecasting results of the HFGSE model for the two consumption time series from 2015 to 2020. It can be seen from the table that China's energy consumption will continue to rise from 2015 to 2020, and the total energy consumption and total oil consumption will reach 5261.47 and 1017.56 million tons of standard coal by 2020, respectively. The average annual growth rate of total energy consumption in 2015-2020 is 4.14%, whereas that of total oil consumption is 5.24%.
Table 9 Forecasting of the HFGSE model for the two consumption time series from 2015 to 2020 (unit: ten thousand tons of standard coal)

Year                                   2015     2016     2017     2018     2019     2020
Total energy consumption time series   435637   448275   453746   485768   499398   526147
Total oil consumption time series      77059    81498    86148    91064    96262    101756
Meanwhile, since the real energy consumption data of China for 2015 and 2016 are now available, the forecasting accuracy for 2015 and 2016 is shown in Table 10. Comparing Tables 10 and 8, it can be found that the HFGSE model shows little difference in prediction performance between the out-of-sample data and the test set, which shows that the HFGSE model has strong generalization ability.

Table 10 Forecasting accuracy of the HFGSE model for the two consumption time series in 2015 and 2016

        Total energy consumption time series    Total oil consumption time series
Year    MAPE      RMSE                          MAPE      RMSE
2015    1.33%     0.5732                        2.05%     0.1614
2016    2.81%     1.2274                        2.14%     0.2729

Figure 7 depicts the predicted results of energy consumption and their comparison with the real values; the triangle-dotted line and the cross-dotted line represent the predicted values of total energy consumption and total oil consumption, respectively, while the circle solid line and the square solid line represent the corresponding real values. The dotted and solid lines for 1978-2014 in the figure almost overlap completely, which further indicates that the HFGSE model fits the energy consumption time series well. After 2015, the triangle-dotted line and the cross-dotted line still maintain a growth trend, but the growth rate of total energy consumption begins to decelerate, whereas the growth rate of total oil consumption is basically unchanged.
Figure 7. Comparison between the predicted and the real values of energy consumption (unit: ten thousand tons of standard coal).

13 5. Conclusion
Researching and building scientific energy consumption models and accurately predicting the future gap between energy supply and demand have important practical significance for China's sustainable economic and social development, the development of the energy industry, the rational use of energy resources, the construction of a conservation-oriented society, and the formulation of a national energy strategy. This study proposed a new GMDH-based selective ensemble hybrid forecasting model. The model first uses the GAR model to predict the linear trend of the energy consumption time series and obtains the nonlinear residual subseries. Considering the highly nonlinear characteristic of the residual subseries, this study introduces AdaBoost ensemble technology to enhance the forecasting performance of single nonlinear prediction models, obtaining the prediction results of four different versions of the ensemble model on the nonlinear subseries. Further, the prediction results of these four AdaBoost ensemble models are used as initial input, and the combination predictive value of the nonlinear subseries is obtained by using GMDH for selective combination prediction. Finally, the two parts are summed to obtain the final prediction. The experiment was conducted on the time series of total energy consumption and total oil consumption in China, and the main conclusions are as follows:
1) Compared with the four single models (BP, SVR, GP, and RBF), the AdaBoost.RT ensemble algorithm can achieve better forecasting performance on the nonlinear subseries.
2) This study compares four different versions of the GMDH selective combination forecasting model, and the results show that the AS.GMDH model has the best overall forecasting performance.
3) The comparisons of the AS.GMDH combination forecasting model with the models participating in the combination show that AS.GMDH has the best performance on the nonlinear subseries.
4) Compared with the GAR model and the other seven hybrid models, the HFGSE model has the best forecasting performance. In addition, the out-of-sample forecasting confirms the superiority of the HFGSE model.
5) The HFGSE model is applied to out-of-sample forecasting, and the results demonstrate that the total energy consumption and total oil consumption in China will keep growing until 2020.
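For readers who want to follow the overall workflow, the four-step pipeline summarized above can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: the GAR model is approximated by an ordinary least-squares autoregression, two boosted SVR variants stand in for the four base models (BP, SVR, GP, and RBF), and the GMDH selective combination step is approximated by a linear meta-model over the members' predictions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

def lagged(series, p):
    """Build a lag-p design matrix X and the aligned target y from a 1-D series."""
    X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
    return X, series[p:]

def hfgse_sketch(series, p=3):
    X, y = lagged(np.asarray(series, dtype=float), p)
    # Step 1: fit the linear trend (stand-in for the GAR model) and
    # take the nonlinear residual subseries.
    linear = LinearRegression().fit(X, y)
    residual = y - linear.predict(X)
    # Step 2: AdaBoost ensembles of single nonlinear learners on the residual.
    bases = [SVR(kernel="rbf"), SVR(kernel="poly", degree=2)]
    member_preds = np.column_stack([
        AdaBoostRegressor(b, n_estimators=10, random_state=0)
        .fit(X, residual).predict(X)
        for b in bases
    ])
    # Step 3: combine the members' outputs (stand-in for GMDH selective combination).
    meta = LinearRegression().fit(member_preds, residual)
    # Step 4: final fitted values = linear trend + combined nonlinear part.
    return linear.predict(X) + meta.predict(member_preds), y
```

The in-sample fitted values returned here correspond to the summation step of the hybrid model; real out-of-sample forecasting would iterate the fitted models forward from the last p observations.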

In the process of constructing the GMDH neural network, the reference function considers only first-order linear K-G polynomials, without further study of other forms of reference function. In fact, in the real world, the relationship between the dependent and independent variables may not be a simple first-order linear one. Therefore, a more complex nonlinear reference function may match the actual relationship better and may further improve the performance of the model; this is a direction for further research.
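To make this limitation concrete, the two candidate reference functions can be contrasted for a single GMDH partial description of two inputs. The sketch below is illustrative (the function names are ours, and coefficients are fitted by ordinary least squares): the first form is the first-order linear K-G polynomial used in this study, the second is the full second-order Kolmogorov-Gabor polynomial.

```python
import numpy as np

def kg_first_order(x1, x2):
    """Design matrix for the first-order linear K-G reference function
    used in this study: y = a0 + a1*x1 + a2*x2."""
    return np.column_stack([np.ones_like(x1), x1, x2])

def kg_second_order(x1, x2):
    """Design matrix for the full second-order K-G polynomial:
    y = a0 + a1*x1 + a2*x2 + a3*x1*x2 + a4*x1**2 + a5*x2**2."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

def fit_partial_description(design, x1, x2, y):
    """Fit one GMDH neuron's coefficients by least squares
    and return its fitted values."""
    A = design(x1, x2)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef
```

On a target that contains an interaction term, e.g. y = 1 + 2*x1 - x2 + 0.5*x1*x2, the second-order neuron can fit exactly while the first-order one cannot, which is precisely the gap noted above.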

Acknowledgments
The authors thank the editor and anonymous reviewers for their constructive suggestions. This study is partly supported by the National Natural Science Foundation of China under Grant Nos. 71471124 and 71273036, and the Excellent Youth Fund of Sichuan University under Grant Nos. skqx201607, sksyl201709, and skzx2016-rcrw14.
References
[1] BP, BP Statistical Review of World Energy 2016. Available from: http://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-world-energy.html.
[2] W.G.J. Dupree, J.S. Corsentino, United States energy through the year 2000, NASA STI/Recon Technical Report, 1975.
[3] R.P. Thompson, Weather sensitive electric demand and energy analysis on a large geographically diverse power system application to short term hourly electric demand forecasting, IEEE Transactions on Power Apparatus and Systems 95 (1) (1976) 385-393.
[4] S. Parikh, M.H. Rothkopf, Long-run elasticity of US energy demand: A process analysis approach, Energy Economics 2 (1) (1980) 31-36.
[5] Z.R. Yang, The potential and means of saving energy, China's Energy 3 (4) (1980) 5-8. (in Chinese).
[6] Z.H. Wu, See the Way Out of the Energy Crisis from Energy Science and Technology, Knowledge Press, 1980. (in Chinese).
[7] D. Shi, The improvement of energy utilization efficiency in China's economic growth, Economic Research Journal 48 (9) (2002) 49-56. (in Chinese).
[8] The State Planning and Energy-saving Commission, Development and Application of Energy Prediction Model, China Planning Press, 1988. (in Chinese).
[9] P. Sen, M. Roy, P. Pal, Application of ARIMA for forecasting energy consumption and GHG emission: A case study of an Indian pig iron manufacturing organization, Energy 116 (12) (2016) 1031-1038.
[10] A.E. Clements, A.S. Hurn, Z. Li, Forecasting day-ahead electricity load using a multiple equation time series approach, European Journal of Operational Research 251 (2) (2016) 522-530.
[11] K.G. Boroojeni, M.H. Amini, S. Bahrami, S.S. Iyengar, A.F. Sarwat, O. Karabasoglu, A novel multi-time-scale modeling for electric power demand forecasting: From short-term to medium-term horizon, Electric Power Systems Research 142 (1) (2017) 58-73.
[12] F. Shaikh, Q. Ji, P.H. Shaikh, N.H. Mirjat, M.A. Uqaili, Forecasting China's natural gas demand based on optimized nonlinear grey models, Energy 140 (12) (2017) 941-951.
[13] S. Ding, K.W. Hipel, Y.G. Dang, Forecasting China's electricity consumption using a new grey prediction model, Energy 149 (4) (2018) 314-328.
[14] M. Kovačič, B. Šarler, Genetic programming prediction of the natural gas consumption in a steel plant, Energy 66 (3) (2014) 273-284.
[15] J. Szoplik, Forecasting of natural gas consumption with artificial neural networks, Energy 85 (6) (2015) 208-220.
[16] E.S. Irdemoosa, S.R. Dindarloo, Prediction of fuel consumption of mining dump trucks: a neural networks approach, Applied Energy 115 (8) (2015) 77-84.
[17] Y. Chen, P. Xu, Y. Chu, W.L. Li, Y.T. Wu, L.Z. Ni, Y. Bao, K. Wang, Short-term electrical load forecasting using the support vector regression (SVR) model to calculate the demand response baseline for office buildings, Applied Energy 195 (6) (2017) 659-670.
[18] A. Rahman, V. Srikumar, A.D. Smith, Predicting electricity consumption for commercial and residential buildings using deep recurrent neural networks, Applied Energy 212 (2) (2018) 372-385.
[19] F. Zhang, C. Deb, S.E. Lee, J.J. Yang, K.W. Shah, Time series forecasting for building energy consumption using weighted support vector regression with differential evolution optimization technique, Energy and Buildings 126 (8) (2016) 94-103.
[20] L.Y. Xiao, C. Wang, T.L. Liang, W. Shao, A combined model based on multiple seasonal patterns and modified firefly algorithm for electrical load forecasting, Applied Energy 167 (4) (2016) 135-153.
[21] C.Q. Yuan, S.F. Liu, Z.G. Fang, Comparison of China's primary energy consumption forecasting by using ARIMA (the autoregressive integrated moving average) model and GM (1,1) model, Energy 100 (4) (2016) 384-390.
[22] J. Nowotarski, B. Liu, R. Weron, T. Hong, Improving short term load forecast accuracy via combining sister forecasts, Energy 98 (3) (2016) 40-49.
[23] X.L. Liu, B. Moreno, A.S. García, A grey neural network and input-output combined forecasting model. Primary energy consumption forecasts in Spanish economic sectors, Energy 115 (11) (2016) 1042-1054.
[24] F. Zhang, C. Deb, S.E. Lee, J.J. Yang, K.W. Shah, Time series forecasting for building energy consumption using weighted support vector regression with differential evolution optimization technique, Energy and Buildings 126 (8) (2016) 94-103.
[25] Y. Karadede, G. Ozdemir, E. Aydemir, Breeder hybrid algorithm approach for natural gas demand forecasting model, Energy 141 (12) (2017) 1269-1284.
[26] J.R. Li, R. Wang, J.Z. Wang, Y.F. Li, Analysis and forecasting of the oil consumption in China based on combination models optimized by artificial intelligence algorithms, Energy 44 (2) (2018) 243-264.
[27] Y.J. Zhang, F. Ma, B.S. Shi, D.S. Huang, Forecasting the prices of crude oil: An iterated combination approach, Energy Economics 70 (2) (2018) 472-483.
[28] B.Z. Zhu, Y.M. Wei, Carbon price forecasting with a novel hybrid ARIMA and least squares support vector machines methodology, Omega 41 (3) (2013) 517-524.
[29] N. Liu, Q.F. Tang, J.H. Zhang, W. Fan, J. Liu, A hybrid forecasting model with parameter optimization for short-term load forecasting of micro-grids, Applied Energy 129 (12) (2014) 336-345.
[30] A. Abdoos, M. Hemmati, A.A. Abdoos, Short term load forecasting using a hybrid intelligent method, Knowledge-Based Systems 76 (3) (2015) 139-147.
[31] J.L. Zhang, Y.J. Zhang, L. Zhang, A novel hybrid method for crude oil price forecasting, Energy Economics 49 (5) (2015) 649-659.
[32] L. Yu, Z.S. Wang, L. Tang, A decomposition–ensemble model with data-characteristic-driven reconstruction for crude oil price forecasting, Applied Energy 156 (10) (2015) 251-267.
[33] G.F. Fan, L.L. Peng, W.C. Hong, F. Sun, Electricity load forecasting by the SVR model with differential empirical mode decomposition and auto regression, Neurocomputing 173 (1) (2016) 958-970.
[34] I.P. Panapakidis, A.S. Dagoumas, Day-ahead natural gas demand forecasting based on the combination of wavelet transform and ANFIS/genetic algorithm/neural network model, Energy 118 (1) (2017) 231-245.
[35] B.Z. Zhu, D. Han, P. Wang, Z.C. Wu, T. Zhang, T.M. Wei, Forecasting carbon price using empirical mode decomposition and evolutionary least squares support vector regression, Applied Energy 191 (4) (2017) 521-530.
[36] E.M. Oliveira, F.L.C. Oliveira, Forecasting mid-long term electric energy consumption through bagging ARIMA and exponential smoothing methods, Energy 144 (2) (2018) 776-788.
[37] D.L. Wang, Y.D. Wang, X.F. Song, Y. Liu, Coal overcapacity in China: multiscale analysis and prediction, Energy Economics 70 (2) (2018) 244-257.
[38] Y. Freund, R.E. Schapire, Experiments with a new boosting algorithm, in: Proceedings of the 13th International Conference on Machine Learning (ICML), 1996, pp. 148-156.
[39] A. Ivakhnenko, The group method of data handling in prediction problems, Soviet Automatic Control 9 (6) (1976) 21-30.
[40] L. Xie, J. Xiao, H. Zhao, Y. Xiao, Y. Hu, China's energy consumption forecasting by GMDH based auto-regressive model, Journal of Systems Science and Complexity 30 (6) (2017) 1332-1349.
[41] X.F. Li, Z.S. Zhang, C. Huang, An EPC forecasting method for stock index based on integrating empirical mode decomposition, SVM and cuckoo search algorithm, Journal of Systems Science and Information 2 (6) (2014) 481-504.
[42] P. Viola, M.J. Jones, Robust real-time object detection, International Journal of Computer Vision 57 (2) (2001) 34-47.
[43] L. Gao, P. Kou, F. Gao, X.H. Guan, AdaBoost regression algorithm based on classification-type loss, in: 8th World Congress on Intelligent Control and Automation (WCICA), IEEE, 2010, pp. 682-687.
[44] D.P. Solomatine, D.L. Shrestha, AdaBoost.RT: a boosting algorithm for regression problems, in: International Joint Conference on Neural Networks, IEEE, 2004, pp. 1163-1168.
[45] J. Xiao, C.Z. He, X.Y. Jiang, Structure identification of Bayesian classifiers based on GMDH, Knowledge-Based Systems 22 (6) (2009) 461-470.
[46] J. Xiao, C.Z. He, X.Y. Jiang, D.H. Liu, A dynamic classifier ensemble selection approach for noise data, Information Sciences 180 (18) (2010) 3402-3421.
[47] J. Xiao, L. Xie, C.Z. He, Dynamic classifier ensemble model for customer classification with imbalanced class distribution, Expert Systems with Applications 39 (3) (2012) 3668-3675.
[48] J. Xiao, Y. Xiao, A. Huang, D.H. Liu, S. Wang, Feature-selection-based dynamic transfer ensemble model for customer churn prediction, Knowledge and Information Systems 43 (1) (2015) 29-51.
[49] J. Xiao, H.W. Cao, X.Y. Jiang, X. Gu, L. Xie, GMDH-based semi-supervised feature selection for customer classification, Knowledge-Based Systems 132 (9) (2017) 236-248.
[50] J.A. Mueller, F. Lemke, Self-organizing Data Mining: An Intelligent Approach to Extract Knowledge from Data, Libri, 2000.
[51] Y. Xiao, J.J. Liu, Y. Hu, Y.F. Wang, Time series forecasting using a hybrid adaptive particle swarm optimization and neural network model, Journal of Systems Science and Information 2 (4) (2014) 335-344.
[52] J. Xiao, X.Y. Jiang, C.Z. He, G. Teng, Churn prediction in customer relationship management via GMDH-based multiple classifiers ensemble, IEEE Intelligent Systems 31 (2) (2016) 37-44.
[53] S.W. Yu, K.J. Zhu, A hybrid procedure for energy demand forecasting in China, Energy 37 (1) (2012) 396-404.

Highlights
• A selective ensemble based hybrid energy consumption prediction model is proposed.
• This study employs the selective ensemble method for the nonlinear subseries.
• The selective ensemble method performs better than its constituent models.
• The hybrid model outperforms the other seven models on the original time series.
• The out-of-sample forecasts for the two time series from 2015 to 2020 are shown.
