Neurocomputing
journal homepage: www.elsevier.com/locate/neucom
ARTICLE INFO

Article history:
Received 26 December 2014
Received in revised form 5 June 2015
Accepted 20 August 2015
Bijaya Ketan Panigrahi
Available online 1 September 2015

Keywords:
Electric load forecasting
Support vector regression
Differential empirical mode decomposition
Auto regression

ABSTRACT

Electric load forecasting is an important issue for power utility, associated with the management of daily operations such as energy transfer scheduling, unit commitment, and load dispatch. Inspired by strong non-linear learning capability of support vector regression (SVR), this paper presents a SVR model hybridized with the differential empirical mode decomposition (DEMD) method and auto regression (AR) for electric load forecasting. The differential EMD method is used to decompose the electric load into several detail parts associated with high frequencies (intrinsic mode function (IMF)) and an approximate part associated with low frequencies. The electric load data from the New South Wales (NSW, Australia) market and the New York Independent System Operator (NYISO, USA) are employed for comparing the forecasting performances of different alternative models. The results illustrate the validity of the idea that the proposed model can simultaneously provide forecasting with good accuracy and interpretability.

© 2015 Elsevier B.V. All rights reserved.

http://dx.doi.org/10.1016/j.neucom.2015.08.051
0925-2312/© 2015 Elsevier B.V. All rights reserved.
G.-F. Fan et al. / Neurocomputing 173 (2016) 958–970 959
The empirical mode decomposition (EMD) with auto regression (AR), which is a fast, easy, and reliable unsupervised clustering algorithm, has been successfully applied to many fields, such as communication, economy, and engineering [21–23], with good results. Meanwhile, the EMD method can effectively extract the basic mode components from non-linear or non-stationary time series [21,23–26]. By employing EMD, the original complex (multi-scale) time series can be locally separated into the sum of a low-frequency part (the residual) and a high-frequency part (the IMFs), i.e., the time series can be transformed into a series with more apparent components by reducing noise [26]. However, the sifting process in the EMD modeling phase stops when the residual becomes either over-distorted or a monotonic function from which no further IMF can be extracted [27,28]. Therefore, Bhusana and Chris [23] proposed the differential empirical mode decomposition (DEMD) to overcome the fluctuation problem that the original EMD method does not handle well. In their model, a derived signal is obtained by differentiating the original signal several times, which eliminates the fluctuating gradient so that the signal better meets the requirements of EMD. The new signal is then processed by EMD and integrated to obtain the intrinsic mode function (IMF) of each order and the residual of the original signal. The DEMD method is used to decompose the electric load into several detail parts associated with high frequencies (IMFs) and an approximate part associated with low frequencies. It can effectively reduce the interactions among many singular values and improve the forecasting performance of a single kernel function. Thus, it is useful to employ suitable kernel functions for forecasting the medium- and long-term tendencies of the time series.

In this paper, we present a new hybrid model that provides clear, human-understandable knowledge of the training data while achieving satisfactory forecasting accuracy. The principal idea is to hybridize DEMD with SVR and AR, namely the DEMD–SVR–AR model, to obtain better forecasting performance. The rationale of our forecasting model is as follows: (1) the raw data can be divided into two parts by DEMD, the high-frequency item and the residuals; (2) the high-frequency item contains less redundant information and trend information than the raw data, because this information goes to the residuals, so the SVR model is employed to forecast the high-frequency item, and its accuracy is higher than that of the original SVR model, particularly around peak and valley values; (3) the residuals are monotonic and stationary, so the AR model is appropriate for forecasting them; (4) the final forecasting results are obtained from the forecasts of the high-frequency item and the residuals. The proposed DEMD–SVR–AR model is capable of smoothing and reducing noise (inherited from DEMD), filtering the dataset and improving forecasting performance (inherited from SVR), and effectively forecasting future tendencies (inherited from AR). The forecasting outputs of the hybrid method are described in the following section.

To show the applicability, generality, and superiority of the proposed model, firstly, half-hourly electric load data (48 data points per day) from New South Wales (NSW, Australia) with two different sample sizes are employed to compare the forecasting performance of the proposed model with four alternative models from the literature, namely the PSO–BP model (BP neural network trained by a particle swarm optimization algorithm), the SVR model, the PSO–SVR model (SVR parameters determined by the PSO algorithm), and the AFCM model (an adaptive fuzzy combination model based on a self-organizing map and support vector regression). Secondly, hourly electric load data (24 data points per day) from the New York Independent System Operator (NYISO, USA), also with two different sample sizes, are used to further compare the forecasting performance of the proposed model with three alternative models from the literature, namely the ARIMA model, the BPNN model (artificial neural network trained by the back-propagation algorithm), and the GA–ANN model (artificial neural network trained by a genetic algorithm). These experimental results indicate that the proposed DEMD–SVR–AR model has the following advantages: (1) it simultaneously achieves higher accuracy and interpretability; (2) it can tolerate more redundant information than the original SVR model and thus has better generalization ability.

The rest of this paper is organized as follows. In Section 2, the DEMD–SVR–AR forecasting model is introduced and the main steps of the model are given. In Section 3, the data description and the research design are outlined. The numerical results and comparisons are presented and discussed in Section 4. A brief conclusion and directions for future research are provided in Section 5.

2. Support vector regression with differential empirical mode decomposition

2.1. Differential empirical mode decomposition (DEMD)

The EMD method is based on the simple assumption that any signal consists of different simple intrinsic modes of oscillation. Each linear or non-linear mode has the same number of extrema and zero-crossings, with only one extremum between successive zero-crossings. Each mode should be independent of the others. Since the original work on EMD, several studies have been presented to improve it. One improvement is the differential EMD [23], which is described in this section. In this way, each signal can be decomposed into a number of intrinsic mode functions (IMFs), each of which should satisfy the following two definitions [25]:

a. Over the whole data set, the number of extrema and the number of zero-crossings should either be equal or differ by at most one.
b. At any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.

An IMF represents a simple oscillatory mode, comparable to a simple harmonic function. With this definition, any signal $x(t)$ can be decomposed by the following steps; the flowchart is shown in Fig. 1.

1. Identify all local extrema, and then connect all the local maxima by a cubic spline as the upper envelope.
2. Repeat the procedure for the local minima to produce the lower envelope. The upper and lower envelopes should cover all the data between them.
3. The mean of the upper and lower envelopes is designated as $m_1$, and the difference between the signal $x(t)$ and $m_1$ is the first component, $h_1$, as shown in Eq. (1),

$$h_1 = x(t) - m_1 \tag{1}$$

Generally speaking, $h_1$ will not necessarily meet the requirements of an IMF, because $h_1$ is not a standard IMF. The sifting needs to be repeated $k$ times until the mean envelope tends to zero. Then, the first intrinsic mode function $c_1$ is introduced, which stands for the highest-frequency component of the original data sequence. At this point, the data can be represented as Eq. (2),

$$h_{1k} = h_{1(k-1)} - m_{1k} \tag{2}$$

where $h_{1k}$ is the data after $k$ siftings and $h_{1(k-1)}$ is the data after $k-1$ siftings. The standard deviation (SD) is used to determine whether the result of each sifting meets the IMF requirements or not. SD is defined as Eq. (3),
[Flowchart omitted. Recoverable boxes: Start; input signal x(t); r = x(t), n = 1; determination of local maxima and minima of x(t); fitting the envelopes E1 and E2; m = (E1 + E2)/2; h = x(t) − m; x(t) = h; x(t) = r.]
Fig. 1. Differential EMD algorithm flowchart.

$$SD = \sum_{k=1}^{T} \frac{\left| h_{1(k-1)}(t) - h_{1k}(t) \right|^{2}}{h_{1(k-1)}^{2}(t)} \tag{3}$$

where $T$ is the length of the data.

The value of SD is limited to the range of 0.2 to 0.3, which means that when 0.2 < SD < 0.3 the decomposition process can be finished. The rationale for this criterion is that it should not only ensure that $h_k(t)$ meets the IMF requirements, but also control the number of decomposition iterations. In this way, the IMF components retain the amplitude modulation information of the original signal.

4. When $h_{1k}$ meets the SD requirement, set $c_1 = h_{1k}$; the first IMF component $c_1$ of the signal $x(t)$ is thus obtained directly, and a new series $r_1$ is obtained after removing the high-frequency component. This relationship is expressed as Eq. (4),

$$r_1 = x_1(t) - c_1 \tag{4}$$

The new sequence is treated as the original data, and steps 1 to 3 are repeated to obtain the second intrinsic mode function $c_2$.

5. Repeat steps 1 to 4 until $r_n$ cannot be decomposed into further IMFs. The sequence $r_n$ is called the remainder of the original data $x(t)$. It is a monotonic sequence that indicates the overall trend or mean of the raw data $x_1(t)$, and it is usually referred to as the trend item. It has clear physical significance. The process is expressed as Eqs. (5) and (6):

$$r_1 = x_1(t) - c_1, \quad r_2 = r_1 - c_2, \quad \ldots, \quad r_n = r_{n-1} - c_n \tag{5}$$

$$x_1(t) = \sum_{i=1}^{n} c_i + r_n \tag{6}$$

The sifting process stops when the residual $r_n(t)$ becomes either over-distorted or a monotonic function from which no further IMF can be extracted. The power density of white Gaussian noise has a normal distribution, so eliminating the IMF that represents the normal distribution is assumed to cancel the white Gaussian noise. Next, the last IMF, the lagged IMF before the monotonic function emerges, is the most suitable because its local curves have a normal distribution. Subsequently, we subtract the last IMF, denoted as $c_0(t)$ in Eq. (7), from the original signal. Finally, the differential EMD is given by Eq. (7),

$$DEMD = x_n(t) - c_0(t) \tag{7}$$

where $x_n(t)$ refers to the dependent variables.

The original data can thus be expressed as the IMF components and the remainder.

$$f(X) = W^{T} \varphi(X) + b \tag{8}$$

where $f(X)$ denotes the forecasting values; the coefficients $W$ ($W \in R^{n_h}$) and $b$ ($b \in R$) are adjustable. As mentioned above, the SVM method aims at minimizing the empirical risk, shown as Eq. (9),

$$R_{emp}(f) = \frac{1}{N} \sum_{i=1}^{N} \Theta_{\varepsilon}\left(y_i, W^{T} \varphi(X_i) + b\right) \tag{9}$$

where $\Theta_{\varepsilon}(y, f(x))$ is the ε-insensitive loss function, defined as Eq. (10),

$$\Theta_{\varepsilon}(Y, f(X)) = \begin{cases} \left| f(X) - Y \right| - \varepsilon, & \text{if } \left| f(X) - Y \right| \ge \varepsilon \\ 0, & \text{otherwise} \end{cases} \tag{10}$$

In addition, $\Theta_{\varepsilon}(Y, f(X))$ is employed to find an optimal hyperplane in the high-dimensional feature space (Fig. 1b) that maximizes the distance separating the training data into two subsets. Thus, SVR focuses on finding the optimal hyperplane and minimizing the training error between the training data and the ε-insensitive loss function. SVR then minimizes the overall errors, shown as Eq. (11),

$$\min_{W, b, \xi^{*}, \xi} R_{\varepsilon}(W, \xi^{*}, \xi) = \frac{1}{2} W^{T} W + C \sum_{i=1}^{N} \left( \xi_i^{*} + \xi_i \right) \tag{11}$$

with the constraints:

$$\begin{aligned} Y_i - W^{T} \varphi(X_i) - b &\le \varepsilon + \xi_i^{*}, & i = 1, 2, \ldots, N \\ -Y_i + W^{T} \varphi(X_i) + b &\le \varepsilon + \xi_i, & i = 1, 2, \ldots, N \\ \xi_i^{*} &\ge 0, & i = 1, 2, \ldots, N \\ \xi_i &\ge 0, & i = 1, 2, \ldots, N \end{aligned} \tag{12}$$

The first term of Eq. (11), employing the concept of maximizing the distance of two separated training data, is used to regularize
weight sizes to penalize large weights, and to maintain regression function flatness. The second term penalizes training errors between $f(x)$ and $y$ by using the ε-insensitive loss function. $C$ is the parameter that trades off these two terms. Training errors above $\varepsilon$ are denoted as $\xi_i^{*}$, whereas training errors below $-\varepsilon$ are denoted as $\xi_i$.

After the quadratic optimization problem with inequality constraints is solved, the parameter vector $W$ in Eq. (8) is obtained as Eq. (13),

$$W = \sum_{i=1}^{N} \left( \beta_i^{*} - \beta_i \right) \varphi(X_i) \tag{13}$$

where $\beta_i^{*}$ and $\beta_i$ are the Lagrangian multipliers, obtained by solving a quadratic program. Finally, the SVR regression function is obtained as Eq. (14) in the dual space:

$$f(X) = \sum_{i=1}^{N} \left( \beta_i^{*} - \beta_i \right) K(X_i, X) + b \tag{14}$$

[Flowchart omitted. Recoverable boxes: Input (data); DEMD; IMF1, IMF2, IMF3, ..., IMFk; Residuals; SVR; AR; Prediction.]
Fig. 2. The full flowchart of DEMD–SVR–AR model.

from SVR model and AR model, respectively, the final forecasting results would be eventually obtained from the high frequency item and the residuals.
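The ε-insensitive loss of Eq. (10) and the empirical risk of Eq. (9) can be illustrated with a minimal sketch. The function names `eps_insensitive_loss` and `empirical_risk` and the sample numbers are illustrative assumptions, not part of the original study, and the sketch evaluates the loss on given predictions only; it does not train an SVR model.

```python
from typing import Sequence


def eps_insensitive_loss(y_true: float, y_pred: float, eps: float) -> float:
    """Eq. (10): zero inside the eps-tube, linear in the deviation outside it."""
    deviation = abs(y_pred - y_true)
    return deviation - eps if deviation >= eps else 0.0


def empirical_risk(y_true: Sequence[float], y_pred: Sequence[float], eps: float) -> float:
    """Eq. (9): mean eps-insensitive loss over N training pairs."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(eps_insensitive_loss(a, p, eps) for a, p in zip(y_true, y_pred))
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise both branches of Eq. (10).
    actual = [10.0, 12.0, 11.0]
    predicted = [10.05, 12.5, 10.0]
    print(empirical_risk(actual, predicted, eps=0.1))
```

In this toy case the first deviation (0.05) falls inside the tube and contributes nothing, while the other two contribute 0.4 and 0.9, which is the sense in which small errors are ignored and only larger ones are penalized.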
[Plots omitted; x-axis: Time (half hour).]
Fig. 3. (a) Half-hourly electric load in NSW from 2 to 8 May 2007; (b) half-hourly electric load in NSW from 2 to 24 May 2007.
[Figure panels omitted; y-axis: Electric load (MW).]
forecast effects are 0.9976 and 0.9984, accordingly. This implies that the decomposition helps to improve the forecasting accuracy. The parameters of the SVR model for data-I are shown in Table 1, in which the forecasting error for the high-frequency item decomposed by the modified DEMD and forecast by SVR has been reduced.

Fig. 5. Comparison of the data-I and the forecasted electric load of training and testing by the SVR model for the small sample and large sample data in Case 1: (a) one-day ahead prediction of May 8, 2007 are performed by the model; (b) one-week ahead prediction from 18 to 24 May 2007 are performed by the model.

Table 1
The SVR's parameters for data-I in Case 1.
Sample size              m     σ      C     ε        Testing MAPE
The small sample size    20    0.1    100   0.0047   9.72
The large sample size    20    0.24   128   0.0021   4.9

3.1.3. Forecasting using AR for data-II (the residuals in Case 1)

As shown in Fig. 4(h), the residuals are locally linear and stable, so the AR technique is well suited to forecasting them.

Then, according to the geometric decay of the correlation coefficients and the fourth-order truncation of the partial correlation coefficients for data-II (the residuals), the model can be denoted as an AR(4) model. The parameters of the AR model for data-II are also shown in Table 2.

As shown in Fig. 6(a) and (b), the residuals, for both the small and large sample data, lie almost on a straight line. In addition, it is not difficult to find a straight line in Fig. 4(h), which is also an advantage of the DEMD technique. The good forecasting results are shown in Table 2, and the errors reach the level of $10^{-5}$ for both the small and large amounts of data, which demonstrates the superiority of the AR model. In Table 2, the forecasting error of the residuals obtained by the improved DEMD decomposition is significantly reduced.

3.2. The experimental results of Case 2

For Case 2, firstly, the proposed model is trained on electric load data from 1 January 2015 to 12 January 2015 (i.e., the training data set), and the testing electric load data are from 13 to 14 January 2015. The electric load data are hourly (i.e., 24 data points per day). The data set contains only 14 days; to distinguish it from the other example with more sample data, this example is called the small sample size data and is illustrated in Fig. 7(a).

Secondly, the second experiment with 46 days (1104 data points from 1 January to 15 February 2015) is modeled by using part of the samples as the training set, i.e., from 1 January to 1 February 2015, and the testing electric load data are from 2 to 15 February 2015. This example is called the large sample size data and is illustrated in Fig. 7(b).

3.2.2. Forecasting using SVR for data-I (the high frequency item in Case 2)

As shown in Fig. 7, the high-frequency data and the raw data share the same characteristics, such as nonlinearity and chaos. The SVR model is well suited to such forecasting problems.

Firstly, for both the small and large sample data, the high-frequency item is employed for SVR modeling, and the performances on the training and testing (forecasting) sets are shown in Fig. 9(a) and (b), respectively. The correlation coefficients of the training effects are 0.9901 and 0.9915, respectively, and those of the forecast effects are 0.9936 and 0.9957, accordingly. This implies that the decomposition helps to improve the forecasting accuracy. The parameters of the SVR model for data-I are shown in Table 3, in which the forecasting error for the high-frequency item decomposed by the modified DEMD and forecast by SVR has been reduced.

3.2.3. Forecasting using AR for data-II (the residuals in Case 2)

As shown in Fig. 8(h), the residuals are locally linear and stable, so the AR technique is well suited to forecasting them.

Then, according to the geometric decay of the correlation coefficients and the fourth-order truncation of the partial correlation coefficients for data-II (the residuals), the model can be denoted as an AR(4) model. The parameters of the AR model for data-II are also shown in Table 4.

As shown in Fig. 10(a) and (b), the residuals, for both the small and large sample data, lie almost on a straight line. In addition, it is not difficult to find a straight line in Fig. 8(h), which is also an advantage of the DEMD technique. The good forecasting results are shown in Table 4, and the errors reach the level of $10^{-5}$ for both the small and large amounts of data, which demonstrates the superiority of the AR model. In Table 4, the forecasting error of the residuals obtained by the improved DEMD decomposition is significantly reduced.
Table 2
Summary of results of the AR forecasting model for data-II in Case 1.
The small sample size    $9.7725 \times 10^{-5}$    $x_n = 5523.894 + 1.01x_{n-1} + 0.372176x_{n-2} + 0.002791x_{n-3} - 0.791445x_{n-4}$
The large sample size    $7.5921 \times 10^{-5}$    $x_n = 5538.269 + 1.0022x_{n-1} + 0.369828x_{n-2} + 0.001914x_{n-3} - 0.753692x_{n-4}$
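An AR(4) model of the form used in Table 2, $x_n = c + a_1 x_{n-1} + a_2 x_{n-2} + a_3 x_{n-3} + a_4 x_{n-4}$, can be applied recursively to produce multi-step forecasts. The sketch below assumes the intercept and coefficients are already fitted; the function name `ar_forecast` and the toy numbers in the usage lines are hypothetical and are not the fitted values reported in the table.

```python
from typing import List, Sequence


def ar_forecast(history: Sequence[float], intercept: float,
                coeffs: Sequence[float], steps: int) -> List[float]:
    """Recursive AR(p) forecast: x_n = intercept + sum_j coeffs[j-1] * x_{n-j}.

    coeffs[0] multiplies the most recent value, coeffs[1] the one before, etc.
    Each forecast is appended to the window and reused for the next step.
    """
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("history must contain at least p observations")
    window = list(history[-p:])  # oldest value first, newest last
    forecasts = []
    for _ in range(steps):
        nxt = intercept + sum(c * window[-j] for j, c in enumerate(coeffs, start=1))
        forecasts.append(nxt)
        window = window[1:] + [nxt]
    return forecasts


if __name__ == "__main__":
    # Toy AR(2) example with made-up coefficients, for illustration only.
    print(ar_forecast([1.0, 2.0], intercept=1.0, coeffs=[0.5, 0.25], steps=2))
```

For an AR(4) model, `coeffs` would hold four values in the order $a_1, \ldots, a_4$, and `history` would be the residual series (data-II) up to the forecast origin.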
[Plots omitted; legend: Actual Values, Predicted Values; x-axis: Time (half hour).]
Fig. 6. Comparison of the data-II and the forecasted electric load by the AR model for the two experiments in Case 1: (a) one-day ahead prediction of 8 May 2007 performed by the model; (b) one-week ahead prediction from 18 to 24 May 2007 performed by the model.
[Plots omitted; legend: original, data-I; y-axis: Electric load (MW); x-axis: Time (hour).]
Fig. 7. (a) Hour electric load in NYISO from 1 to 14 January 2015; (b) hour electric load in NYISO from 1 to 15 February 2015.
4.1. Forecasting evaluation methods

To evaluate the forecasting capability, we examine the forecasting accuracy by calculating three statistical metrics: the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE). The definitions of RMSE, MAE and MAPE are expressed as Eqs. (16)–(18):

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} \left( P_i - A_i \right)^{2}}{n}} \tag{16}$$

$$MAE = \frac{\sum_{i=1}^{n} \left| P_i - A_i \right|}{n} \tag{17}$$

$$MAPE = \frac{\sum_{i=1}^{n} \left| \dfrac{P_i - A_i}{A_i} \right|}{n} \times 100 \tag{18}$$

where $P_i$ and $A_i$ are the $i$-th predicted and actual values, respectively, and $n$ is the total number of predictions.

4.2. Parameter settings of the employed forecasting models

As mentioned by Taylor [31], and to keep the same comparison conditions as Che et al. [32], in Case 1 the parameters of the employed forecasting models are set as follows. For the PSO–BP model, as mentioned in [32], 90% of all collected samples are employed as the training set, and the rest as the evaluation set. The parameters used in the PSO–BP model are set as follows: (i) for the BP neural network, the input layer dimension (indim) is 2, the hidden layer dimension (hiddennum) is 3, and the output layer dimension (outdim) is 1; (ii) for the PSO, as mentioned in [32], the maximum iteration number (itmax) is 300, the number of particles N is 40, the length of particle D is 3, and the weights c1 and c2 are set as 2. Because the PSO–SVR model embeds the construction and prediction algorithm of SVR in the fitness value iteration step of PSO, it takes a long time to train the PSO–SVR model using the full training dataset. For this reason, we draw a small part of all training samples as the training set, and the rest as the evaluation set. The parameters of PSO used in this case are
as follows: for the small sample size, the maximum iteration number (itmax) is 50, the number of particles N is 20, the length of particle D is 3, and the weights c1 and c2 are set as 2; for the large sample size, the maximum iteration number (itmax) is 20, the number of particles N is 5, the length of particle D is 3, and the weights c1 and c2 are set as 2.

Regarding Case 2, to further verify the applicability, generality, and superiority of the proposed model, the newest electric load data from NYISO are employed for modeling, and three alternative forecasting models from the literature (the ARIMA model, the BPNN model, and the GA–ANN model) are selected for comparison with the proposed model. The parameters of the employed forecasting models are set as follows. For the BPNN model, the node numbers of its structure differ between the small and large sample sizes: for the former, the input layer dimension is 240, the hidden layer dimension is 12, and the output layer dimension is 48; for the latter, they are 480, 12, and 336, respectively. The parameters of the GA–ANN model used in this case are as follows: the number of generations is 5, the population size is 100, the number of bits is 50, the mutation rate is 0.8, and the crossover rate is 0.05.

4.3. Empirical results and analysis

For the first experiment in Case 1, the forecasting results (the electric load on 8 May 2007) of the original SVR model, the PSO–SVR model, and the proposed DEMD–SVR–AR model are shown in Fig. 11(a). Notice that the forecasting curve of the proposed DEMD–SVR–AR model (red solid dot and red curve) fits better than those of the alternative models. For Case 2, the forecasting results (the electric load from 13 to 14 January 2015) of the ARIMA model, the BPNN model, the GA–ANN model, and the proposed DEMD–SVR–AR model are shown in Fig. 12(a). Similarly, the forecasting curve of the proposed DEMD–SVR–AR model (red solid triangle and red curve) also fits better than the others.

The second experiments in Cases 1 and 2 show the one-week-ahead forecasting for the large sample size data. The peak load values of the testing set are larger than those of the training set, as shown in Figs. 5(b) and 9(b), respectively. The detailed forecasted results of this experiment are shown in Figs. 11(b) and 12(b). They indicate that the results obtained from the DEMD–SVR–AR model fit the peak load values exceptionally well. In other words, the DEMD–SVR–AR model has better generalization ability than the three comparison models in both cases. Particularly in Case 1, for example, the local enlargement (peak) details of Fig. 11(a) and (b) are shown in Fig. 13(a) and (b), respectively. It is clearer that the forecasting curve of the proposed DEMD–SVR–AR model (red solid dot and red curve) fits more precisely than those of the alternative models, i.e., it is able to follow the changing trend of the data, including its fluctuation tendency.
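The three accuracy metrics of Eqs. (16)–(18) used in these comparisons can be computed with a short sketch. The function names and the sample values below are illustrative assumptions rather than code or data from the study; MAPE is returned in percent and requires all actual values to be non-zero.

```python
import math
from typing import Sequence


def _check(pred: Sequence[float], actual: Sequence[float]) -> int:
    """Validate inputs and return n, the number of predictions."""
    if len(pred) != len(actual) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return len(pred)


def rmse(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Eq. (16): root mean square error."""
    n = _check(pred, actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / n)


def mae(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Eq. (17): mean absolute error."""
    n = _check(pred, actual)
    return sum(abs(p - a) for p, a in zip(pred, actual)) / n


def mape(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Eq. (18): mean absolute percentage error, in percent."""
    n = _check(pred, actual)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((p - a) / a) for p, a in zip(pred, actual)) / n * 100


if __name__ == "__main__":
    # Hypothetical predicted and actual loads (MW), for illustration only.
    predicted = [110.0, 190.0]
    actual = [100.0, 200.0]
    print(rmse(predicted, actual), mae(predicted, actual), mape(predicted, actual))
```

Because MAPE divides by the actual value, it is scale-free and suits load series that are far from zero, whereas RMSE and MAE are expressed in the units of the load itself.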
[Plots omitted; legend: original, predict; y-axes: data-I (MW), forecast (MW); x-axis: Time (hour).]
Fig. 9. Comparison of the data-I and the forecasted electric load of training and testing by the SVR model for the small sample and large sample data in Case 2: (a) one-day ahead prediction from 13 to 14 January 2015 are performed by the model; (b) one-week ahead prediction from 2 to 15 February 2015 are performed by the model.
Table 4
Summary of results of the AR forecasting model for data-II in Case 2.
The small sample size    $6.7345 \times 10^{-5}$    $x_n = 10372.441 - 0.998x_{n-1} + 0.65218x_{n-2} - 0.3316x_{n-3} + 0.00072x_{n-4}$
The large sample size    $7.8579 \times 10^{-5}$    $x_n = 11013.26 + 0.9782x_{n-1} + 0.11x_{n-2} - 0.4783x_{n-3} + 0.36437x_{n-4}$
[Plots omitted; legend: Electric load Trend, forecast, data-II; x-axis: Time (hour).]
Fig. 10. Comparison of the data-II and the forecasted electric load by the AR model for the two experiments in Case 2: (a) one-day ahead prediction of 13 to 14 January 2015 are performed by the model; (b) one-week ahead prediction from 2 to 15 February 2015 are performed by the model.
[Plots omitted; legend: Raw data, Forecasted load by DEMDSVRAR, Forecasted load by SVR, Forecasted load by PSOSVR; x-axis: Time (half hour).]
Fig. 11. Comparison of the original data and the forecasted electric load by the DEMD–SVR–AR Model, the SVR model and the PSO–SVR model for: (a) the small sample size (One-day ahead prediction of May 8, 2007 are performed by the models); (b) the large sample size (one-week ahead prediction from 18 to 24 May 2007 are performed by the models). (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)
5. Conclusions

The proposed model achieves superior performance and significantly outperforms the original SVR model when forecasting on unbalanced data. In addition, the goal of model training is not to learn an exact representation of the training set itself, but rather to build a statistical model that generalizes well to new inputs. In practical applications of an SVR model, if the model is over-trained on sub-classes of overwhelming size, it memorizes the training data and generalizes poorly to other sub-classes of small size. The DEMD term of the proposed DEMD–SVR–AR model has been employed in the present research, details of which have been discussed in the above section.

The interest in applying DEMD forecasting systems arises from the fact that such systems consider both the accuracy and the comprehensibility of the forecast result simultaneously. To this end, a hybrid model has been proposed, and its effectiveness in forecasting electric load data has been compared with three other alternative models. In this study, various data characteristics of electric load are identified where the proposed model performs better than
[Plots omitted; y-axis: Electric load (MW).]
Fig. 12. Comparison of the original data and the forecasted electric load by the DEMD–SVR–AR Model, the ARIMA model, the BPNN model and the GA–ANN model for: (a) the small sample size (one-day ahead prediction from 13 to 14 January 2015 are performed by the models); (b) the large sample size (one-week ahead prediction from 2 to 15 February 2015 are performed by the models). (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)
[Plots omitted; y-axis: Electric load (MW).]
Fig. 13. The local enlargement (peak) comparison of the DEMD–SVR–AR Model, the SVR model and the PSO–SVR model for (a) the small sample size; (b) the large sample size. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)
Table 5
Summary of results of the forecasting models in Case 1.

Table 6
Summary of results of the forecasting models in Case 2.
Table 7
Wilcoxon signed-rank test in Case 1.
Compared models | Wilcoxon signed-rank test (α = 0.025; W = 4) | Wilcoxon signed-rank test (α = 0.05; W = 6)
DEMD–SVR–AR vs. original SVR | 8 | 3a
DEMD–SVR–AR vs. PSO–SVR | 6 | 2a
DEMD–SVR–AR vs. PSO–BP | 6 | 2a
DEMD–SVR–AR vs. AFCM | 6 | 2a
DEMD–SVR–AR vs. EMD–SVR–AR | 6 | 2a
a: Denotes that the DEMD–SVR–AR model significantly outperforms other alternative models.
Table 8
Wilcoxon signed-rank test in Case 2.
Compared models | Wilcoxon signed-rank test (α = 0.025; W = 4) | Wilcoxon signed-rank test (α = 0.05; W = 6)
DEMD–SVR–AR vs. ARIMA | 6 | 2a
DEMD–SVR–AR vs. BPNN | 6 | 2a
DEMD–SVR–AR vs. GA–ANN | 6 | 2a
DEMD–SVR–AR vs. EMD–SVR–AR | 6 | 2a
a: Denotes that the DEMD–SVR–AR model significantly outperforms other alternative models.
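As an illustration of how a statistic like the W values reported for the Wilcoxon signed-rank test could be computed, the following is a minimal sketch. It assumes the test is applied to paired absolute forecast errors of two models over the same forecast points; the paper does not spell out its exact procedure, so this is the standard textbook form rather than the authors' implementation. The function names and the error values are hypothetical, and the critical value of 6 at α = 0.05 is taken from the table header.

```python
def wilcoxon_signed_rank(errors_a, errors_b):
    """Return (w_plus, w_minus, w) for two paired samples of forecast errors.

    w is min(w_plus, w_minus), the statistic compared with a critical value.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("paired samples must have equal length")

    # Signed differences; zero differences are discarded, as is conventional.
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)


def is_significant(w, critical_value):
    """The difference is significant when W does not exceed the critical value."""
    return w <= critical_value


# Hypothetical absolute errors (MW) for a proposed model and a baseline.
proposed = [12, 8, 15, 9, 11, 7, 13, 10]
baseline = [20, 19, 14, 25, 22, 16, 29, 17]

w_plus, w_minus, w = wilcoxon_signed_rank(proposed, baseline)
print(w_plus, w_minus, w, is_significant(w, critical_value=6))
```

Integer errors are used in the example so that tied differences compare exactly; with floating-point errors, ties may need a tolerance.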
the forecasting accuracy of the SVR model. Meanwhile, even when the interference is decomposed into the residuals, the AR model still achieves good forecasting performance.
Acknowledgments
This work was supported by the Startup Foundation for Doctors (No. PXY-BSQD-2014001), Educational Commission of Henan Province of China (No. 15A530010), the Youth Foundation of Ping Ding Shan University (No. PXY-QNJJ-2014008), and Ministry of Science and Technology, Taiwan (NSC 100-2628-H-161-001-MY4 and MOST 104-2410-H-161-002).
Guo-Feng Fan was born in Shanxi Province, China, on May 29, 1985. He received his doctoral degree from the Engineering Research Center of Metallurgical Energy Conservation and Emission Reduction, Ministry of Education, Kunming University of Science and Technology, Kunming, in 2013. His research interests are ferrous metallurgy, energy forecasting, optimization, and system identification.
Li-ling Peng was born in Hunan Province, China, on February 15, 1985. She received her master's degree from the Faculty of Science, Kunming University of Science and Technology, Kunming, in 2013. Her research interests are pattern recognition in images and computers; she is especially skilled at meteorological recognition and prediction.
Wei-Chiang Hong received his Ph.D. degree in Management from Da-Yeh University, Taiwan, in 2008. Since September 2006, he has been with the Department of Information Management of the Oriental Institute of Technology, where he is currently a professor. His research interests mainly include applications of forecasting technology and computational intelligence. He is currently the Editor-in-Chief of the International Journal of Applied Evolutionary Computation and is on the Editorial Board of several journals, including Neurocomputing, Applied Soft Computing, The Scientific World Journal, Journal of Applied Mathematics, and Energy Sources, Part B: Economics, Planning, and Policy.
Fan Sun was born in Henan, China, on November 13, 1972. She received her B.S. degree in mathematics education from Henan University, China, in 1996. Her research interests are mathematics education and applied mathematics.