
Applied Energy 261 (2020) 114368

Contents lists available at ScienceDirect

Applied Energy
journal homepage: www.elsevier.com/locate/apenergy

Short-term electrical load forecasting based on error correction using dynamic mode decomposition

Xiangyu Kong a,⁎, Chuang Li a,⁎, Chengshan Wang a, Yusen Zhang b, Jian Zhang c

a Key Laboratory of Smart Grid of Ministry of Education (Tianjin University), Nankai District, Tianjin 300072, China
b Xiong’an New Area Branch of State Grid Hebei Electric Power Co., Ltd., Baoding 071600, Hebei Province, China
c State Grid Tianjin Electric Power Company, Tianjin 300010, China

HIGHLIGHTS

• An error correction method based on dynamic mode decomposition is proposed.
• Developed an extreme value constraint method to further correct outliers.
• The proposed method does not depend on any previous assumptions.
• The proposed method can improve the prediction accuracy of different models.

ARTICLE INFO

Keywords:
Short-term load forecasting
Error correction
Dynamic mode decomposition
Grey relational analysis
Extreme value constraint method

ABSTRACT

Accurate short-term load forecasting (STLF) is an important basis for daily dispatching of the power grid, but the non-stationary characteristics of the load series add to the challenge of this task. Many researchers have been working to improve the accuracy and speed of forecasting models, but stability is equally important. This paper develops a forecasting method based on error correction using dynamic mode decomposition (DMD) for STLF, including data selection, error forecasting, and error correction. In the data selection stage, three types of data are selected as input data of the model: previous day data, same day data in the previous week, and similar day data obtained by grey relational analysis (GRA). In the error forecasting stage, the data-driven characteristics of the DMD algorithm are used to capture the potential spatiotemporal dynamics of the error series, thereby realizing the error forecasting. In the error correction stage, on the basis of combining the forecasting results of load and error, an extreme value constraint method (EVCM) is developed to further correct the load demand series. Based on the load data of different regions, this paper selects different performance indicators, such as MAPE, MAE, RMSE, Variance and direction accuracy (DA), to prove that the proposed method has the advantages of accuracy and stability.

1. Introduction

In recent years, many countries have actively promoted low-carbon energy transformation and the economic access of renewable energy to power systems. However, the high proportion of renewable energy connected to the grid has an impact on the users’ power consumption behavior and on electrical load forecasting. As a part of power planning, load forecasting plays a fundamental role in achieving safe operation and scientific management of the system. Accurate load forecasting is an important guarantee for improving the utilization of generation equipment and the effectiveness of economic dispatch. Furthermore, it has important significance for the optimal combination of units, economic dispatch, optimal trend, electricity market transactions, etc. [1].

Electrical load forecasting can be divided into long-term (years ahead), medium-term (months to a year ahead), short-term (one day to weeks ahead) and ultra-short-term (minutes to hours ahead) according to forecasting periods [2]. For short-term load forecasting (STLF), the 24-hour demand curves have obvious similarities, including (1) between different working days, (2) the same day data in the previous weeks, (3) between different rest days [3]. In addition, the load is sensitive to changes in the external environment, such as climate change, demand response and social activities, which increases the randomness of the load series [4]. At the same time, with the large-scale access of energy storage,


Corresponding authors.
E-mail addresses: eekongxy@tju.edu.cn (X. Kong), lchuangsky@tju.edu.cn (C. Li), cswang@tju.edu.cn (C. Wang).

https://doi.org/10.1016/j.apenergy.2019.114368
Received 17 September 2019; Received in revised form 7 December 2019; Accepted 12 December 2019
Available online 30 December 2019
0306-2619/ © 2019 Elsevier Ltd. All rights reserved.

electric vehicles and renewable energy to the grid, which poses more serious challenges for STLF.

1.1. Related works and motivation

In the past decades, various STLF methods can be roughly divided into three categories according to the technical characteristics, including (1) statistical method (e.g., linear model, non-linear model, correlation analysis method, probabilistic load forecasting, etc.), (2) neural network model (e.g., artificial neural network, deep belief network, deep Boltzmann machine, etc.), (3) hybrid forecasting method.

In general, the linear model is simple and easy to implement [5], such as auto-regressive moving average (ARMA), auto-regressive integrated moving average (ARIMA), exponential smoothing, semi-parametric model (SPM), multiple linear regression (MLR), etc. Zhang et al. [6] proposed a hybrid forecasting method including wavelet transform, kernel extreme learning machine (KELM) and ARMA, and proved that this method has better versatility and practicability than the single model. Based on the electrical demand data from different European countries, Taylor [7] proved that the two-season Holt-Winters exponential smoothing has the advantage of simplicity and robustness compared to ARIMA. However, Gould et al. [8] believed that the use of Holt-Winters exponential smoothing (HWES) needs to be based on the assumption that the intraday cycle is the same for each day of the week, which limits its scope of application. To solve this problem, Gould et al. proposed an intraday cycle exponential smoothing method that allows intraday periods of different dates to be represented by different seasonal components. Goude et al. [9] used a semi-parametric model to estimate the relationship between load and influencing factors, and effectively predicted the load demand of more than 2200 substations in France. In the 2012 Global Energy Forecasting Competition (GEFCom2012), MLR was widely used. Research group CountingLab developed a prediction method that combines 5-best fitted models based on MLR and won the competition. Other research groups, Quadrivio and Tao’s Vanilla Benchmark, used a simple MLR model and ranked 7th and 25th respectively [10].

Non-linear model is transformed into a linear model by embedding input data into high-dimensional space based on non-linear mapping [11]. Since the performance of the model depends on the choice of parameter values, researchers improve the prediction accuracy of the support vector machine (SVM) by finding suitable optimization algorithms [12]. SVM and its variants are widely used for STLF. Yang et al. [13] improved the performance of the model by combining the sub-sampling method with support vector regression (SVR). However, the introduction of asymptotic normality increases the complexity of the method. The kernel function directly determines the performance of SVM, but the choice of the kernel function is still an unsolved problem. In addition, generalized additive model (GAM) is also a type of nonlinear model. Serinaldi [14] changed the model parameters according to the relationship between multi-linear and non-linear, and established the dynamic distribution model of GAM to describe the period and trend of electricity price. GAM won the challenge of GEFCom2014 and is one of the more popular technologies at present.

Correlation analysis methods forecast future values by calculating the correlation between variables to match the approximation variables in the system, and they include pattern sequence forecasting (PSF), gray model (GM), etc. PSF is a prediction method that mixes the nearest neighbor algorithm with clustering techniques [15]. Alvarez et al. [16] used clustering techniques to obtain labeled samples and then searched for matching pattern series in historical data. However, this method only considers the labels associated with each pattern to forecast future values, so the impact of chance factors is underestimated. Koprinska et al. [15] used the time series predicted by PSF as one of the input characteristics of the neural network, but ignored the influence of error accumulation. In STLF, GM is good at solving the uncertainty problem and processing the sample data that is small or has bad data [17]. Zhao and Guo [17] proposed a hybrid grey model (HGM) to improve the prediction accuracy by combining the rolling mechanism, but it cannot balance multiple influencing factors. In order to make up for this shortcoming, Li et al. [18] proposed a multivariate grey model (MGM).

Probabilistic load forecasting (PLF) can be obtained in the following ways: (1) using probabilistic predictive models; (2) simulating predictive input scenarios; (3) converting point load forecasting into PLF by residual simulation or combination forecasting [19]. Among the various methods of PLF, the simulated temperature scene (STS) is generally accepted for its simplicity and interpretability [19]. Xie and Hong [19] used error measurements to quantitatively compare the generation techniques of temperature scene, including the fixed-date, shifted-date and bootstrap methods, and proposed an empirical formula to select the parameters of temperature scene. PLF has attracted wide attention because it provides more comprehensive prediction information. In GEFCom2014, GAM showed good predictive performance in the electrical load forecasting [20]. In GEFCom2017, a black analysis technique is developed to establish a set of base point prediction models for each region and hour of the day, and to achieve more accurate predictions [21]. However, compared with point load forecasting, PLF is more difficult to integrate into other applications, such as control systems.

Neural network consists of the input layer, hidden layer and output layer [22]. It is widely used in the field of STLF, such as artificial neural network (ANN) [1], generalized regression neural network (GRNN) [23], wavelet neural network (WNN) [24], etc. Based on the decomposed load time series, Qiu et al. [25] used the deep belief network (DBN) to model each component separately and then weighted their prediction results. Zhang et al. [26] used the deep Boltzmann machine (DBM) to forecast wind speed, but it is difficult to determine the optimal topology of the model. In theory, the more complex the structure of depth model, the stronger the fitting ability, which means it can approach complex time series [22], but the training efficiency of depth model will be significantly reduced [26]. With the advent of cloud computing and big data era, the significant increase in computing power can alleviate the inefficiency of training, which has led to a research boom in deep learning. In fact, the learning process of the neural network involves many parameters that are set according to experience, so it often falls into the local minimum [25].

Most of the above references use a hybrid prediction method. Due to the complexity of the prediction environment, it is difficult for a single model to meet the requirements for prediction accuracy, so researchers combine the advantages of multiple technologies to predict future load values [27]. From the characteristics of hybrid prediction, this paper roughly divides it into the following three types.

(1) Hybrid of decomposition technique and prediction model. The main function of decomposition technique is to decompose and extract the characteristics of load time series, and it includes wavelet transform (WT) [6], wavelet packet transform (WPT) [28], empirical mode decomposition (EMD) [24,25], ensemble empirical mode decomposition (EEMD) [29], fast ensemble empirical mode decomposition (FEEMD) [23], complementary ensemble empirical mode decomposition (CEEMD) [27], singular spectrum analysis (SSA) [30], variational mode decomposition (VMD) [27] and dynamic mode decomposition (DMD) [5], etc. In response to the pattern mixing problem of EMD, EEMD is proposed. However, EEMD cannot completely eliminate the white noise, and its decomposition rate is slow. Therefore, CEEMD and FEEMD are developed to deal with white noise and improve decomposition speed separately [27]. In order to adapt to the prediction environment, the use of decomposition technique has become diverse, such as single-layer decomposition [25] and two-layer decomposition [27].
(2) Hybrid of feature selection technique and prediction model. In general, the main purpose of the feature selection technique is to select strongly correlated features for the model to improve the


accuracy and speed of the prediction [2]. Koprinska et al. [2] evaluated the performance of four feature selection techniques, namely auto-correlation (AC), mutual information (MI), RReliefF (RF), and correlation-based feature selection (CFS). Furthermore, ant colony optimization (ACO) and minimal redundancy maximal relevance (mRMR) are also used for feature selection [31]. Since the feature selection is a process of removing redundant features, it does not fully exploit the useful information in redundant features.
(3) Hybrid of error correction technique and prediction model. Error correction techniques extract useful information from the error values to correct the predicted values. Hao and Tian [32] used the extreme learning machine (ELM) optimized by the multi-objective grey wolf optimizer (MOGWO) to separately predict the error and wind power components, and then integrated the predicted values of all components and errors to improve the accuracy of wind power prediction. Based on data transformation, Yu et al. [33] constructed an error forecasting model using the gray model (GM(1, 1)). Mao et al. [34] proposed a novel error forecasting model that combines with the wind speed forecasting model to obtain more accurate prediction results. Furthermore, Cai et al. [35] used SVR as an error prediction model to correct the prediction results of the seasonal ARIMA model, thereby reducing the impact of numerical weather prediction (NWP) errors on STLF.

In addition to the hybrid model based on the error correction technique, other hybrid models can improve the prediction results to some extent, but the importance of error data is ignored [33]. Therefore, this paper makes full use of error data to achieve error correction, which further improves the accuracy of short-term electrical load forecasting. Table 1 shows a detailed review of the above references.

1.2. Objective and contributions

The purpose of this paper is to provide a stable and accurate error correction method for the load forecasting model. By establishing a comparative experiment of singular spectrum analysis (SSA) and DMD, Sanei et al. [36] proved that DMD has higher accuracy in predicting univariate time series. In this paper, an error correction method based on DMD is proposed, and it can identify the fluctuation characteristics of the error and improve the prediction accuracy of the complex non-linear error curve. The main contributions of this paper are as follows.

(1) Considering that the load forecasting model is always accompanied by the existence of error, a DMD-based error prediction method is established to effectively capture the dynamic trend of error series and achieve accurate error prediction. It does not need to extract input features and select parameters, which makes the prediction of errors easier to implement.
(2) An error correction strategy is proposed, which can accurately forecast load demand series in a complex environment by providing correction values for load forecasting results. It has good generalization ability and can be mixed with various load prediction models to form a more accurate and stable model.
(3) By obtaining the extreme values of the three data types, namely the previous day data, the same day data in the previous week and the similar day data, an extreme value constraint method (EVCM) is developed to further correct the load time series corrected by DMD, thereby improving the stability of the prediction model.

The remainder of this paper is organized as follows. The mathematical principle and application process of DMD are given in Section 2. The proposed method is described in detail in Section 3. Experimental analysis is presented in Section 4. Finally, this study is summarized and prospected in Section 5.

2. Mathematical principle of DMD

DMD is a data-driven algorithm, which is originally used in the field

Table 1
Summary of various prediction models.
Type Category Technique Data source Interval Index

Statistical methods Linear model ARMA [6] America, Spain One-hour MAE, RMSE, MAPE
HWES [8] America One-hour MSFE
SPM [9] France Ten minutes MAPE
MLR [10] GEFCom2012 One-hour WRMSE
Non-linear model SVM [12] India One-hour MAPE
SVR [13] China, America Half-an-hour MAE, MAPE
GAM [14] America, Italy One-hour WME
Correlation analysis PSF [15] Spain One-hour MAE
HGM [17] China One-year MAPE, RMSE
MGM [18] China One-year RE, MSE
PLF STS [19] North Carolina One-hour QS
Neural network models Shallow network model ANN [19] Mauritius One-month RMSE, RE
Deep network model DBN [25] Australia Half-an-hour RMSE, MAPE
DBM [26] China Ten minutes MSE, MAPE
Hybrid methods Decomposition technique and model WT-KELM-ARMA [6] America, Australia One-hour MAE, RMSE, MAPE
WPT-LSSVM [28] America One-hour MSE, MAPE, ESD
EMD-ARIMA [24] Australia Half-an-hour MAPE, MAE, RMSE
EEMD-Random forest [29] China One-day MAE, RMSE, MAPE
FEEMD-GRNN [23] Australia Half-an-hour MAE, MSE, MAPE
CSV-ENN [27] China Ten minutes, Half-an-hour MAE, RMSE, DA, MAPE
SSA-AR [30] Iran One-hour DME, WME
Feature selection technique and model AC-Neural network [2] Australia Five minutes MAE, MAPE
mRMR-GRNN [31] China One-hour MAE, MAPE, TIC
Error correction technique and model MOGWO-ELM [32] Spain, Canada One-hour MAE, RMSE, MAPE
LSSVR-GM(1,1) [33] China One-year MAPE
MFEC-SVM [34] China Fifteen minutes RMSE, MAPE
SVR-ARIMA [35] New England One-hour MAPE

WRMSE: Weighted Root Mean Square Error; MSFE: Mean Square Forecast Error; TIC: Theil Inequality Coefficient; DME: Daily Mean Error; WME: Weekly Mean Error;
ESD: Error Standard Deviation; MAPE: Mean Absolute Percent Error; RMSE: Root Mean Square Error; DA: Direction Accuracy; MAE: Mean Absolute Error; MSE: Mean
Square Error; RE: Relative Error; QS: Quantile Score; CSV: hybridization of CEEMD, sample entropy, and VMD; ENN: Elman Neural Network; AR: Auto-Regressive
model; LSSVR: Least Squares Support Vector Regression; MFEC: Model of Forecasting Error Correction.


of fluid dynamics to extract complex spatiotemporal features from the flow data. Compared with feature extraction (using spatial samples) and system identification (using time series and input-output samples), the data-driven nature of DMD has the unique advantage of space-time coupled modeling [5]. The error forecasting based on DMD consists of four phases: (1) constructing error Hankel matrix, (2) pattern decomposition of the error, (3) reconstructing the error series, and (4) forecasting the error series.

2.1. Constructing error Hankel matrix

Before using the DMD, it is necessary to normalize the error data. The goal of data normalization is to transform dimensioned data into dimensionless data and map the data to a range of 0 to 1, which can improve the convergence speed of the forecasting model.

For a given error sample data $Y = [x_1, x_2, x_3, \cdots, x_i]$, the data normalization can be expressed as

$$x_i^{*} = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$

where $x_i^{*}$ is the normalized value of $x_i$, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of $Y$, respectively.

To capture the complete dynamic of the error sequence, the normalized error sequence is converted to a multidimensional data matrix. Let the error time series with length $N$ after normalization be $X = [x_1, x_2, x_3, \cdots, x_i, \cdots, x_N]$, where $x_i$ is the error snapshot (or observation) at the $i$th moment. The time interval between any two adjacent snapshots is $\Delta t$ (1 h or 15 min).

The error sequence $X$ is expanded into a multi-dimensional data matrix whose window length is $L$ ($2 \le L \le N/2$) and the number of overlapping segments is $K = N - L + 1$,

$$X^{*} = [x_1^{*}, x_2^{*}, \cdots, x_L^{*}] = \begin{bmatrix} x_1 & x_2 & x_3 & \cdots & x_L \\ x_2 & x_3 & x_4 & \cdots & x_{L+1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_K & x_{K+1} & x_{K+2} & \cdots & x_{K+L-1} \end{bmatrix} \tag{2}$$

Since the elements on each anti-diagonal of $X^{*}$ are equal, it is a Hankel matrix. Generating the error matrix of this special structure is helpful to better capture the dynamic change pattern of errors.

2.2. Pattern decomposition of the error

Using the error snapshots from 1 to $L$, the snapshot matrices $X_1^{*} = [x_1^{*}, x_2^{*}, \cdots, x_{L-1}^{*}]$ and $X_2^{*} = [x_2^{*}, x_3^{*}, \cdots, x_L^{*}]$ can be constructed. The average interval of the k-lag vector $x_L^{*}$ is $\Delta t$, so there is a mapping $A$ between successive lag vectors such that $X_2^{*}$ is equal to $A X_1^{*}$. $A$ is a high-dimensional error system matrix that reflects the dynamic characteristics of the error system. Since the dimension of $A$ is high, it is necessary to calculate it from the error sequence by means of order reduction.

The similarity transformation of high-order operators is performed by singular value decomposition to achieve system reconstruction, which has better numerical stability [37]. For $X_1^{*}$, a similar matrix $B$ can be provided instead of $A$. To find the orthogonal subspace of the similarity transformation, the singular value decomposition of $X_1^{*}$ can be used to obtain

$$X_1^{*} = U \Sigma V^{H}, \quad A = U B U^{H} \tag{3}$$

where $U$ and $V$ contain the left and right singular vectors, $V^{H}$ is the adjoint matrix of $V$, and $\Sigma$ is the singular value diagonal matrix of $X_1^{*}$, whose diagonal contains $n$ singular values. In the singular value decomposition process, in order to reduce the numerical noise, only the larger singular values are preserved.

The calculation of $B$ can be regarded as the minimization of the Frobenius norm, which can be expressed as

$$\min_{A} \left\| X_2^{*} - A X_1^{*} \right\|_F^{2} \tag{4}$$

Combining Eqs. (3) and (4), $A$ can be approximated as

$$A \approx B = U^{H} X_2^{*} V \Sigma^{-1} \tag{5}$$

Matrix $B$ contains the main error characteristics of $A$, since $A \cong B$. If the $i$th eigenvalue of $B$ is $\mu_i$ and the corresponding eigenvector is $w_i$, then the $i$th error mode $\Phi_i$ can be expressed as $U w_i$. The growth rate $g_i$ and frequency $f_i$ of $\Phi_i$ can be defined as

$$g_i = \mathrm{Re}\{\lg(\mu_i)\}/\Delta t, \quad f_i = \mathrm{Im}\{\lg(\mu_i)\}/\Delta t \tag{6}$$

After the above decomposition, we can extract the dynamic mode of the error and further estimate the evolution process of the error according to the matrix $B$.

2.3. Reconstructing the error time series

Through the singular value decomposition, the error snapshot $x_i^{*} \in X_1^{*}$ is mapped to the subspace $z_i$, and the eigendecomposition of the matrix $B$ is expressed as

$$B = W D W^{-1}, \quad D = \mathrm{diag}(\mu_1, \mu_2, \cdots, \mu_r) \tag{7}$$

where $W$ is the matrix of error eigenvectors $w_i$, and $D$ is the diagonal matrix of the eigenvalues of $B$. According to Eq. (7), the error snapshot at any time can be estimated as

$$x_i^{*} = A x_{i-1}^{*} = U B U^{H} x_{i-1}^{*} = U W D^{i-1} W^{-1} U^{H} x_1^{*} \tag{8}$$

Combining with $\Phi_i = U w_i$, the amplitude of the error modes can be defined as

$$\alpha = W^{-1} z_1 = W^{-1} U^{H} x_1^{*} \tag{9}$$

where $\alpha = [\alpha_1, \alpha_2, \cdots, \alpha_r]^{T}$, and $\alpha_r$ is the amplitude of the $r$th mode, representing the contribution of this mode to the error snapshot $x_1^{*}$.

Combining Eqs. (8) and (9), the reconstructed error snapshot at any time can be expressed as

$$x_i^{*} = \sum_{j=1}^{r} \Phi_j \alpha_j (\mu_j)^{i-1} \tag{10}$$

The reconstructed error time series matrix $X_R^{*}$ with a window length of $p$ ($p > r$) can be expressed as

$$X_R^{*} = [x_{R(1)}^{*}, x_{R(2)}^{*}, \cdots, x_{R(p)}^{*}] = \Phi \alpha \mathrm{Vand}(p) = [\Phi_1, \cdots, \Phi_r] \begin{pmatrix} \alpha_1 & & \\ & \ddots & \\ & & \alpha_r \end{pmatrix} \begin{bmatrix} 1 & \mu_1 & \cdots & \mu_1^{p-1} \\ \vdots & \vdots & & \vdots \\ 1 & \mu_r & \cdots & \mu_r^{p-1} \end{bmatrix} \tag{11}$$

where $\mathrm{Vand}(p)$ is a standard Vandermonde matrix for reconstruction. By using it to form the time series of the DMD model, $x_{R(p)}^{*}$ can be extended to the appropriate error values.

2.4. Forecasting the error series

Since the reconstructed error values lie between 0 and 1, inverse normalization is required to restore the true error value, according to Eq. (12).

$$x_{R(ij)} = x_{R(ij)}^{*} (x_{\max} - x_{\min}) + x_{\min} \tag{12}$$

where $x_{R(ij)}$ is the $j$th error value of the $i$th snapshot after inverse normalization, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the sample $Y$, respectively.

In order to predict the error time series of length $f$, the inverse-normalized error matrix $X_R = [x_{R(1)}, x_{R(2)}, \cdots, x_{R(p)}]$ is reorganized into $\hat{X}_R \in \mathbb{R}^{g \times f}$, where $g = p - f + 1$ and the elements are $\hat{X}_{R(ij)}$, $1 \le i \le g$, $1 \le j \le f$.
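As a minimal numerical sketch of Sections 2.1 to 2.3 and the inverse normalization of Eq. (12), the following Python code builds the Hankel matrix, computes a rank-r DMD and reconstructs p snapshots. It is an illustration under stated assumptions, not the authors' implementation: the function names, the truncation rank r, and the synthetic error series are hypothetical, and the reduced operator is computed with the standard DMD projection $U^{H} X_2^{*} V \Sigma^{-1}$. The diagonal averaging step of Section 2.4 is not covered.

```python
import numpy as np

def minmax_normalize(y):
    """Eq. (1): map a series to [0, 1]; also return (min, max) for Eq. (12)."""
    y = np.asarray(y, dtype=float)
    lo, hi = y.min(), y.max()
    return (y - lo) / (hi - lo), lo, hi

def hankel_matrix(x, L):
    """Eq. (2): K = N - L + 1 overlapping windows of length L, one per row."""
    x = np.asarray(x, dtype=float)
    K = x.size - L + 1
    return np.array([x[k:k + L] for k in range(K)])

def dmd_reconstruct(X, r, p):
    """Rank-r DMD of the snapshot columns of X, then p reconstructed snapshots."""
    X1, X2 = X[:, :-1], X[:, 1:]                           # shifted snapshot matrices
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)      # Eq. (3)
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]                  # keep the r largest singular values
    B = U.conj().T @ X2 @ Vh.conj().T @ np.diag(1.0 / s)   # reduced operator (assumed form)
    mu, W = np.linalg.eig(B)                               # Eq. (7)
    Phi = U @ W                                            # modes Phi_i = U w_i
    alpha = np.linalg.solve(W, U.conj().T @ X1[:, 0])      # Eq. (9)
    vand = np.vander(mu, N=p, increasing=True)             # rows [1, mu_i, ..., mu_i^(p-1)]
    return (Phi @ np.diag(alpha) @ vand).real              # Eq. (11)

# Synthetic "error" series (assumption): a decaying term plus an oscillation.
t = np.arange(60)
err = 0.9 ** t + 0.5 * np.cos(0.3 * t)
err_n, lo, hi = minmax_normalize(err)
X = hankel_matrix(err_n, L=20)
X_rec = dmd_reconstruct(X, r=4, p=25)   # 5 snapshots beyond the 20 used for fitting
err_rec = X_rec * (hi - lo) + lo        # Eq. (12): inverse normalization
```

In this toy case the normalized series is a sum of four exponential components, so r = 4 captures it and the first row of `err_rec` tracks the first 25 values of `err`, including the five extrapolated snapshots. For real error series the rank would have to be chosen from the singular value spectrum.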


$$\hat{X}_R = \begin{bmatrix} x_{R(1)} & x_{R(2)} & \cdots & x_{R(f)} \\ x_{R(f+1)} & x_{R(f+2)} & \cdots & x_{R(2f)} \\ \vdots & \vdots & \ddots & \vdots \\ x_{R(g)} & x_{R(g+1)} & \cdots & x_{R(g+f-1)} \end{bmatrix} \tag{13}$$

In order to obtain an accurate error sequence with a window length of $f$, it is necessary to perform an averaging operation on the diagonal of $\hat{X}_R$ according to Eq. (14).

$$x_n = \begin{cases} \dfrac{1}{n+1} \displaystyle\sum_{m=1}^{n+1} x_{R(m,\, n-m+2)}, & 1 \le n < (g-1) \\[2ex] \dfrac{1}{g} \displaystyle\sum_{m=1}^{g} x_{R(m,\, n-m+2)}, & (g-1) \le n < f \\[2ex] \dfrac{1}{p-n} \displaystyle\sum_{m=n-f+2}^{p-f+1} x_{R(m,\, n-m+2)}, & f \le n < p \end{cases} \tag{14}$$

where $m$ and $n$ represent the position of the element in the matrix $\hat{X}_R$, $i + j = n + 2$. The final predicted error time series $x$ can be expressed as

$$x = [x_1, x_2, \cdots, x_f] \tag{15}$$

Finally, the process of predicting the error time series with DMD can be briefly illustrated in Fig. 1.

Fig. 1. Process of forecasting the error time series based on DMD.

3. Forecasting method based on error correction and dynamic mode decomposition

This section focuses on the selection of input data and the construction of the prediction method. Specifically, the allocation and application of the data required for the experiment are described in Section 3.1, and the specific process of the proposed method is detailed in Section 3.2.

3.1. Data description and selection

This paper selects the actual load demand data from the grid in Tianjin, China including history load data from Jan 2016 to Dec 2017, which can be divided into two types: (1) large regional level load demand time series (interval: 1 h, 24 collection points a day), (2) small area level load demand time series (interval: 15 min, 96 collection points a day). The details are shown in Table 2 and Fig. 2.

Table 2
Data characteristics of the sample data set.
Types                  Years   Interval   Range (MW)   Median (MW)   Mean (MW)   Variance (MW)
Large regional level   2016    1 h        5550.35      3505.81       3184.11     620599.77
                       2017    1 h        5613.13      3576.36       3241.41     593530.62
Small area level       2016    15 min     1065.65      1168.14       1048.50     11694.13
                       2017    15 min     1133.21      1183.27       1138.58     11356.60

Fig. 2. Sample data set including large regional load with 1-hour interval and small area load with 15 min interval.

In order to match the data with the proposed method, this paper divides the data set into three parts including the load forecasting model training set (A), the forecasting error training set (B), and the test set (C). In Fig. 3, the ratio of the three parts is set to 2:2:1, which means that 40% of the data is used for the training of the load forecasting model, 40% of the data is used for the training of the error forecasting model, and 20% of the data is used to test the final result of the prediction. Based on the divided data, the specific use of each part of the data in the proposed method is described in Fig. 4.

Fig. 3. Schematic diagram of the experimental data is divided into three parts of A, B, and C according to the ratio of 2:2:1.

Fig. 4. The application process of each part of the data in the experiment.

With the advent of grid big data, the variables of the input model are not limited to the electrical load data. Entering multiple variables to improve the performance of the model is widely used, such as date type [9], temperature [2,9], humidity [2], weather, electricity price [28], etc. In addition, in the context of renewable energy [38] and demand response [39], the choice of input variables is more diverse.

In fact, inputting a large number of weakly correlated variables has no positive significance for the training of predictive models, which means that the number of input variables is not necessarily related to the prediction accuracy [40]. Electrical load series has some special properties, such as (1) similarity (similar load demand on different dates), (2) correlation (susceptible to the load of the previous day), and (3) weekly periodicity (the same day data in the previous week is similar). Therefore, this paper selects three types of data from the historical sample dataset to predict load demand, including (1) the previous day data, (2) the same day data in the previous week, and (3) the similar day data.

For the selection of similar day data, grey relational analysis (GRA) is used to process the load time series in this paper [41]. The method uses the degree of association to represent the similarity of the shape between the associated sequence and the reference sequence, and the higher the similarity of the sequence shape, the greater the correlation. The specific calculation process is as follows.

(1) Selecting the load sequence.

The reference sequence $X_0 = [x_0(1), x_0(2), \cdots, x_0(n)]$ and the comparison sequence $X_i = [x_i(1), x_i(2), \cdots, x_i(n)]$ are defined, where $x_0(n)$ and $x_i(n)$ are the value of $X_0$ and $X_i$ in the $n$ moment, and $n$ is the length of time.

(2) After the sequence is standardized, the correlation coefficient at different times is calculated, which can be expressed as

$$F(k) = \frac{\min\limits_{i} \min\limits_{k} |x_0(k) - x_i(k)| + \lambda \max\limits_{i} \max\limits_{k} |x_0(k) - x_i(k)|}{|x_0(k) - x_i(k)| + \lambda \max\limits_{i} \max\limits_{k} |x_0(k) - x_i(k)|} \tag{16}$$

where $F(k)$ is the $k$th time correlation coefficient, $k \in (0, n)$; $\lambda \in (0, 1)$ is the resolution coefficient; $\min_i \min_k |x_0(k) - x_i(k)|$ is the minimum value of the difference between the reference sequence and all comparison sequences; $\max_i \max_k |x_0(k) - x_i(k)|$ is the maximum value of the difference between the reference sequence and all comparison sequences.

(3) By calculating the average value of the correlation coefficients at different times, the grey correlation degree $r$ between the sequences can be expressed as

$$r = \frac{1}{n} \sum_{k=1}^{n} F(k) \tag{17}$$
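The grey relational degree of Eqs. (16) and (17) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the toy sequences are hypothetical, the inputs are assumed to be already standardized, and the resolution coefficient λ = 0.5 is a common default rather than a value stated in this paper.

```python
import numpy as np

def grey_relational_degree(x0, candidates, lam=0.5):
    """Grey relational degree of each candidate day to the reference x0."""
    x0 = np.asarray(x0, dtype=float)
    C = np.asarray(candidates, dtype=float)            # shape (num_candidates, n)
    diff = np.abs(C - x0)                              # |x0(k) - xi(k)| for all i, k
    d_min, d_max = diff.min(), diff.max()              # extremes over all i and k
    F = (d_min + lam * d_max) / (diff + lam * d_max)   # Eq. (16)
    return F.mean(axis=1)                              # Eq. (17)

# Toy standardized load curves (assumption): the first candidate equals the reference.
reference = [0.2, 0.5, 0.9, 0.6]
candidates = [
    [0.2, 0.5, 0.9, 0.6],
    [0.3, 0.6, 0.8, 0.5],
    [0.9, 0.1, 0.2, 0.9],
]
degrees = grey_relational_degree(reference, candidates)
similar_day = int(np.argmax(degrees))   # index of the most similar candidate
```

The candidate with the largest degree would be taken as the similar day. Note that the sketch does not guard against the degenerate case where all candidates equal the reference, in which the denominator is zero.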

3.2. Forecasting procedure

The traditional method is to obtain the result directly through the load forecasting model, which does not fully exploit the value of the error. In response to this problem, this paper establishes an error correction model based on DMD to predict the error value, which is used to correct the predicted value of load forecasting model, as shown in Fig. 5.

Fig. 5. Forecasting process of the proposed method including data selection in Fig. A, error forecasting in Fig. B, and error correction in Fig. C.

The specific forecasting process is as follows.

Step 1 (preprocessing the abnormal data): based on the forecasting model, the error sequence is obtained to determine the abnormal data, which is then corrected or deleted [42].
Step 2 (data selection): the previous day data, the same day data in the previous week and the similar day data are selected respectively from load model training data and error training data, as shown in Fig. A of Fig. 5.
Step 3 (load forecasting): the selected three types of data are input into the load forecasting model to predict the load time series of the error training dataset in Fig. B of Fig. 5 and the load time series of the test dataset in Fig. C of Fig. 5.
Step 4 (obtaining the error): the original value is subtracted from the predicted value to obtain the error time series, as shown in Fig. B of Fig. 5.
Step 5 (error forecasting): DMD is used to predict the error time series in Fig. C of Fig. 5. The prediction process of DMD is detailed in Section 2.
Step 6 (obtaining the forecasting result): the final load forecasting results can be obtained by adding the load predicted value obtained by Step 3 to the error predicted value obtained by Step 5.

Based on the above steps, two explanations are needed.

(1) The abnormal data may be generated during data acquisition and transmission, so simple identification and processing are required before prediction. If the outliers disturb the normal training process of the model, they should be corrected or deleted. The definition of the outliers in the load time series: (1) selecting the value of the load L at a certain moment and the mean of the moment M in the


previous week, (2) if the absolute value of L − M is greater than 30% of L, L is an abnormal value and is replaced by M; otherwise, L is a normal value.
(2) Some models require parameter optimization, such as SVM (regularization parameters and kernel function parameters) and back-propagation neural networks (weights and thresholds). A slight deviation in the parameter settings usually leads to large fluctuations in the prediction results. To address this problem, this paper develops EVCM to correct the prediction results simply and efficiently. Since the three types of electrical load time series selected in Section 3.1 are similar, EVCM takes their maximum and minimum values as the upper and lower limits of the final load forecasting values to achieve error correction, as shown in Fig. 6. The correction rules are: (1) if the predicted value is lower than the minimum value, the corrected value equals the minimum value; (2) if the predicted value is higher than the maximum value, the corrected value equals the maximum value; and (3) if the predicted value lies between the minimum and maximum values (inclusive), the corrected value equals the predicted value.

Fig. 6. Schematic diagram of correcting the load forecasting results with EVCM.

4. Experiments and analysis

This section verifies the performance of the proposed method in practical application. Based on the historical data described in Section 3.1, Experiments I and II are designed in Sections 4.2 and 4.3, respectively. This paper selects a variety of models to enrich the experiments, such as ARMA [6], ARIMA [7], GAM [14], the persistence model (PM) [43], ANN [44], SVR [45], least squares support vector machine (LSSVM) [45], ELM [42,46], and the back propagation (BP) neural network [47].

ARMA and ARIMA are traditional methods in time series analysis, and their parameters are estimated by the nonlinear least squares method. GAM can perform multivariate analysis because its extended structure allows separation and representation of different characteristics of the signal. PM is the baseline for evaluating the predictive models, and it follows the basic rule that tomorrow's load value is equal to today's. An ANN with only one hidden layer is used to reduce the training difficulty of the model. SVR and LSSVM use the radial basis function (RBF), and their kernel function parameter C and regularization parameter γ are determined from the training data set. ELM uses a single-layer feedforward neural network to achieve optimal results by selecting appropriate mapping functions at different hidden layer nodes. The BP neural network has 3 neurons in the input layer, 4 in the hidden layer, and 1 in the output layer, and its transfer function uses the Sigmoid function. The details of each model can be found in the corresponding references.

4.1. Performance metrics

This paper selects five performance metrics to evaluate the prediction model: mean absolute percentage error (MAPE), mean absolute error (MAE), root mean square error (RMSE), Variance, and direction accuracy (DA). MAPE is the most widely used measure. MAE prevents errors from offsetting each other and reflects the actual magnitude of the error. RMSE is sensitive to anomalies in the dataset and reflects the degree of dispersion of the error set. Variance and DA are used to evaluate the stability and direction accuracy of the prediction model, respectively [48]. Except for DA, the smaller the value of a performance indicator, the better the predictive performance of the model.

(1) MAPE is defined as

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{A_i - F_i}{A_i}\right| \times 100\% \tag{18}$$

(2) MAE is defined as

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|A_i - F_i\right| \tag{19}$$

(3) RMSE is defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(A_i - F_i\right)^2} \tag{20}$$

(4) Variance is defined as

$$\mathrm{Variance} = \frac{1}{K}\sum_{r=1}^{K}\left(\mathrm{MAPE}_r - M\right)^2 \tag{21}$$

(5) DA is defined as

$$\mathrm{DA} = \frac{1}{N}\sum_{i=1}^{N}\begin{cases}1, & (A_{i+1} - A_i)(F_{i+1} - A_i) > 0\\ 0, & \text{otherwise}\end{cases} \tag{22}$$

where N is the number of sampling points of the time series, A_i and F_i are the actual value and the predicted value of the ith sampling point, respectively, K is the number of MAPE values in the sample set, MAPE_r is the rth MAPE value in the sample set, and M is the average of the MAPE values.

4.2. Experiment I

A statistical method is used to analyze the stability and accuracy of the predictive models. To show the role of EVCM, Experiment I establishes three comparison settings for each load forecasting model: (1) using only the load forecasting model, (2) using EVCM to correct the predicted result of the load forecasting model, and (3) using the proposed method to correct the predicted result of the load forecasting model. Specifically, PM [43], ARMA [6], ANN [44], and SVR [45] are selected as load forecasting models to form four comparative schemes: (1) PM, PM-EVCM, PM-Proposal, (2) ARMA, ARMA-EVCM, ARMA-Proposal, (3) ANN, ANN-EVCM, ANN-Proposal, and (4) SVR, SVR-EVCM, SVR-Proposal.

To obtain a large number of statistical samples, Experiment I performs day-ahead load forecasting and obtains 61 days of forecast results from Nov to Dec 2017 based on the large regional data of Tianjin, China. In Fig. 7, the boxplot summarizes the predicted results of each model over the 61 days: its horizontal centerline represents the median, the box edges correspond to the upper and lower quartiles of the MAPE value, and the "+" symbol marks outliers.
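The procedure of Section 3.2 (the 30% outlier rule, Steps 4 to 6, and the EVCM clipping rules) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: DMD is applied here to a time-delay (Hankel) embedding of the one-dimensional error series, the error is taken as actual minus predicted so that the addition in Step 6 moves the forecast toward the actual load, and the EVCM bounds are taken pointwise across the three reference series. All function and parameter names (`replace_outliers`, `dmd_forecast`, `evcm`, `corrected_forecast`, `n_delays`, `rank`) are hypothetical.

```python
import numpy as np


def replace_outliers(load, weekly_mean, threshold=0.30):
    """Outlier rule: replace L by M wherever |L - M| > 30% of L."""
    load = np.asarray(load, dtype=float)
    weekly_mean = np.asarray(weekly_mean, dtype=float)
    is_outlier = np.abs(load - weekly_mean) > threshold * np.abs(load)
    return np.where(is_outlier, weekly_mean, load)


def dmd_forecast(series, n_delays, horizon, rank=None):
    """Forecast `horizon` future points of a 1-D series with DMD.

    Assumption: snapshots are delay vectors (Hankel embedding) of the series.
    """
    x = np.asarray(series, dtype=float)
    n_snapshots = len(x) - n_delays + 1
    hankel = np.column_stack([x[i:i + n_delays] for i in range(n_snapshots)])
    X, Y = hankel[:, :-1], hankel[:, 1:]          # snapshot pairs with Y ~ A X
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    r = rank if rank is not None else int(np.sum(s > 1e-10 * s[0]))
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    A_tilde = (U.T @ Y @ Vh.T) / s                # reduced operator U^T Y V S^-1
    z = U.T @ hankel[:, -1]                       # last delay vector in reduced coordinates
    forecast = []
    for _ in range(horizon):
        z = A_tilde @ z                           # advance one step
        forecast.append((U @ z)[-1])              # newest entry of the delay vector
    return np.array(forecast)


def evcm(pred, prev_day, prev_week, similar_day):
    """EVCM: clip the forecast to the envelope of the three reference series."""
    refs = np.vstack([prev_day, prev_week, similar_day])
    return np.clip(pred, refs.min(axis=0), refs.max(axis=0))


def corrected_forecast(actual_b, pred_b, pred_c, n_delays=8):
    """Steps 4-6: obtain the error, forecast it with DMD, add it to the load forecast."""
    error_b = np.asarray(actual_b, dtype=float) - np.asarray(pred_b, dtype=float)  # Step 4
    error_c = dmd_forecast(error_b, n_delays, horizon=len(pred_c))                 # Step 5
    return np.asarray(pred_c, dtype=float) + error_c                               # Step 6
```

Advancing the reduced operator `A_tilde` step by step is equivalent to summing the projected DMD modes with their eigenvalue powers, so the eigendecomposition is omitted for brevity. If a single global range is intended for EVCM rather than a pointwise envelope, `refs.min()` and `refs.max()` can be used as the bounds instead.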


Fig. 7. Statistical results of the MAPE values obtained for 61 days forecasting for the large region in Nov and Dec 2017 using five forecasting methods.
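The five metrics of Section 4.1 (Eqs. (18) to (22)) can be computed with a few lines of NumPy. This is a minimal sketch with hypothetical function names. One assumption is made for DA: a series of N points has only N − 1 consecutive transitions, so the sketch averages over those N − 1 terms rather than dividing by N.

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error in percent, Eq. (18)."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((a - f) / a))


def mae(actual, forecast):
    """Mean absolute error, Eq. (19)."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return np.mean(np.abs(a - f))


def rmse(actual, forecast):
    """Root mean square error, Eq. (20)."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return np.sqrt(np.mean((a - f) ** 2))


def mape_variance(mape_values):
    """Variance of a set of K MAPE values around their mean M, Eq. (21)."""
    m = np.asarray(mape_values, dtype=float)
    return np.mean((m - m.mean()) ** 2)


def direction_accuracy(actual, forecast):
    """Share of steps whose predicted direction matches the actual one, Eq. (22).

    Assumption: averaged over the N - 1 available transitions.
    """
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    hits = (a[1:] - a[:-1]) * (f[1:] - a[:-1]) > 0
    return hits.mean()
```

For example, `mape_variance` would be applied to the 61 daily MAPE values of one model in Experiment I to quantify its stability.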

The bar graph and line graph in Fig. 7 depict the mean and variance of MAPE, respectively. Table 3 further supplements Fig. 7: it shows the mean and variance of MAE, the mean and variance of RMSE, and DA for each model, with the best values given in bold.

In this paper, a stable forecast is one where the variation among the forecasts is low. To quantify stability, we use variance as one of the evaluation indicators. The smaller the variance, the better the stability. Comparing the statistical results, the variance of the models based on the proposal or EVCM is significantly smaller than that of the single model, which can be seen from the boxplot and line chart of Fig. 7 or from Table 3. In fact, the use of EVCM avoids obvious outliers caused by accidental factors such as improper parameter settings of the forecasting model or the input of abnormal data, which helps to reduce variance and build a more stable model. In addition, the model based on the proposed method has smaller MAE and RMSE values and a larger DA value than the other models, which indicates that the proposed method can improve the prediction accuracy of the model.

4.3. Experiment II

For completeness, Experiment II establishes five case studies: Case 1, Case 2, Case 3, Case 4 and Case 5.

Case 1: PM [43], ARMA [6], ELM [46], SVR [45] and GAM [14] are selected as load forecasting models, and the proposed method is used to correct the error, which forms five sets of comparative schemes: (1)


ARMA, ARMA-Proposal, (2) PM, PM-Proposal, (3) ELM, ELM-Proposal, (4) SVR, SVR-Proposal, and (5) GAM, GAM-Proposal. For the selection of datasets, Case 1 uses the 2016 historical load data for the large and small regions of Tianjin, China, which fully considers the seasonal and regional characteristics of the power load and provides multi-level experimental results for Case 1, as shown in Table 4.

From Table 4, we can observe the contribution of the proposal to forecasting accuracy and direction accuracy. Compared to large areas, the load demand time series of small areas is usually more random and non-stationary. Therefore, the prediction accuracy in small areas is significantly lower. In terms of direction accuracy, DMD is good at capturing the trend characteristics of the error, so the model based on the proposal has a larger DA value.

Table 3
MAE, RMSE and DA for different forecasting methods (The best values are given in bold).
Data  Methods        DA    Mean RMSE (MW)  Mean MAE (MW)  Variance RMSE (MW)  Variance MAE (MW)
2017  ARMA           0.57  504.14          435.62         4194.04             3836.21
      ARMA-EVCM      0.74  485.29          410.37         1948.18             1567.44
      ARMA-DMD       0.79  480.02          391.46         2351.50             2015.93
      ARMA-Proposal  0.81  476.17          384.25         2598.51             2145.2
      ANN            0.43  578.12          465.25         5557.06             4954.39
      ANN-EVCM       0.69  551.20          433.21         3101.69             2799.57
      ANN-DMD        0.77  545.25          419.09         2398.72             1904.08
      ANN-Proposal   0.80  540.46          416.74         2305.17             1827.35
      PM             –     624.65          489.35         2965.61             2377.54
      PM-EVCM        –     613.59          477.35         2332.25             1942.62
      PM-DMD         –     604.17          469.68         2201.44             1824.70
      PM-Proposal    –     596.14          468.30         1969.67             1536.88
      SVR            0.64  442.32          358.58         3487.39             2836.25
      SVR-EVCM       0.68  450.05          375.44         2159.65             1845.91
      SVR-DMD        0.81  440.31          348.60         1950.18             1813.02
      SVR-Proposal   0.87  434.12          346.98         2456.31             2095.06

Table 4
Performance metrics for different forecasting methods based on different data sets (The best values are given in bold).
Datasets Seasons Metrics ARMA ARMA-Proposal PM PM-Proposal ELM ELM-Proposal SVR SVR-Proposal GAM GAM-Proposal
Large regional level Spring MAPE 5.596 4.841 8.676 7.583 4.276 4.312 2.868 2.729 4.867 4.605
MAE 185.708 176.292 303.417 208.167 161.958 170.042 111.042 108.625 180.551 164.320
RMSE 232.364 222.420 323.486 260.310 196.878 219.881 127.877 121.476 231.272 201.033
DA 0.647 0.824 – – 0.606 0.801 0.654 0.821 0.569 0.795
Summer MAPE 6.171 6.119 8.301 7.433 3.491 3.298 3.451 3.399 4.140 3.532
MAE 337.500 331.875 399.121 341.458 253.25 236.166 241.792 212.250 281.208 261.377
RMSE 400.289 352.917 461.349 405.667 283.574 264.110 272.802 237.724 302.530 293.105
DA 0.508 0.811 – – 0.553 0.711 0.620 0.901 0.561 0.848
Fall MAPE 5.447 5.366 7.273 7.011 4.635 4.551 4.560 4.385 4.703 4.566
MAE 189.583 159.510 260.752 210.125 141.458 125.417 133.875 112.752 172.250 134.359
RMSE 203.566 182.009 291.605 221.696 174.622 150.724 162.724 146.261 191.224 177.802
DA 0.622 0.793 – – 0.436 0.767 0.516 0.833 0.568 0.811
Winter MAPE 6.796 6.614 7.768 7.359 5.640 5.554 3.521 3.492 3.501 3.257
MAE 245.125 256.833 308.451 295.708 202.791 195.541 183.258 115.708 119.023 110.266
RMSE 307.256 295.058 376.892 351.659 264.453 225.224 217.383 203.574 205.467 198.209
DA 0.540 0.784 – – 0.562 0.878 0.490 0.867 0.602 0.877
Small area level Spring MAPE 8.422 8.497 12.269 12.001 7.956 8.179 7.218 6.899 6.115 6.024
MAE 112.875 128.625 160.333 154.791 109.041 113.831 105.416 95.041 89.218 82.081
RMSE 125.753 142.761 180.756 179.269 116.667 129.719 123.798 112.652 111.427 95.034
DA 0.539 0.670 – – 0.501 0.736 0.545 0.810 0.466 0.789
Summer MAPE 9.175 9.006 10.461 10.393 11.554 10.489 7.366 7.337 7.542 7.220
MAE 117.087 113.411 145.125 143.708 151.166 148.583 94.208 92.502 106.137 90.021
RMSE 143.856 137.428 158.301 157.869 160.401 159.461 110.005 108.067 122.421 95.465
DA 0.408 0.619 – – 0.452 0.794 0.469 0.755 0.454 0.728
Fall MAPE 9.866 9.276 13.825 12.933 10.819 10.011 7.838 7.639 8.542 8.399
MAE 116.255 114.374 203.568 184.958 149.375 141.758 97.541 95.416 108.028 100.544
RMSE 145.750 137.992 224.011 202.079 179.368 151.447 137.168 103.050 141.157 138.041
DA 0.521 0.763 – – 0.461 0.807 0.499 0.782 0.612 0.805
Winter MAPE 9.424 9.375 10.473 10.456 9.195 9.070 8.823 8.764 6.890 6.813
MAE 126.034 124.581 143.219 143.125 119.041 113.176 111.958 101.168 92.546 90.369
RMSE 142.980 129.084 160.027 149.483 126.854 123.149 120.302 115.709 110.994 104.076
DA 0.514 0.707 – – 0.522 0.769 0.565 0.808 0.467 0.790

Case 2: ARIMA [7], BP neural network [47] and LSSVM [45] are selected as the load forecasting models, and the proposed method is used as the error correction model, which forms three sets of comparative schemes: (1) ARIMA, ARIMA-Proposal, (2) BP, BP-Proposal, and (3) LSSVM, LSSVM-Proposal. Based on the 2017 large regional data set of Tianjin, China, Case 2 forecasts the last two days of Jan, Feb and Mar 2017, respectively. The details of the forecasting results are shown in Fig. 8.

Fig. 8 plots the load demand curves for two consecutive days based on the different methods and shows the values of the performance metrics on the right side. It can be observed from Fig. 8 that the model based on the proposed method has smaller index values, which means that the proposal can effectively implement error correction to improve the prediction accuracy.

Case 3: A small regional dataset for 2017 in Tianjin, China is considered to forecast the load demand for a certain day in the future. Since the load demand is sensitive to temperature changes, the demand curves differ significantly between months. To take this difference into account as much as possible, Case 3 forecasts the load demand of the last day of Jan, Mar, May, Jul, Sept and Dec 2017 based on different forecasting models.

Specifically, Case 3 selects ARIMA [7], BP neural network [47], and SVR [45] to set up three sets of comparative schemes, namely (1) ARIMA-ARIMA, ARIMA-Proposal, (2) BP-BP, BP-Proposal, and (3) SVR-SVR, SVR-Proposal, to predict the load demand in Jan, Mar, and May. For Jul, Sept, and Dec, Case 3 uses six forecasting models to forecast separately: SVR-PM, SVR-ARMA, SVR-ARIMA, SVR-BP, SVR-SVR, and SVR-Proposal.

In Fig. 9, based on different models, the prediction results of Jan,


Mar, and May are displayed by plotting the load demand curves, and the prediction accuracy for Jul, Sept, and Dec is shown by plotting the radar maps of MAPE, MAE, and RMSE, respectively, which shows that the proposed method can help different load forecasting models to further improve the prediction accuracy.

Fig. 8. Forecasting results of two days in advance using different models. Forecasting results for (a) 30 and 31-Jan-2017, (b) 27 and 28-Feb-2017, (c) 30 and 31-Mar-2017. On the left is the forecasting curve and on the right are the performance metrics.

Case 4: Since DMD is a decomposition technique, Case 4 selects several similar techniques for comparative experiments, including WT, EMD, SSA, and VMD. WT has localized characteristics, but it is difficult to reasonably select the wavelet basis based on the sample. EMD is prone to modal aliasing and lacks rigorous mathematical theory. SSA is an efficient nonparametric time series analysis algorithm. It can extract trend, oscillation, period and noise components from the original data. VMD is a non-recursive signal processing algorithm with complete mathematical theory. Compared with the above decomposition techniques, each DMD mode has a single frequency and growth rate, which helps to capture the changing trend of the data. However, DMD is susceptible to the characteristics of the sample dimension, which limits its application to a certain extent.

Case 4 selects WT-SVM [49], EMD-SVM [50], SSA-AR [51], VMD-ELM [52], and SVR-Proposal to forecast the load demand of the large


region of China’s Tianjin from April 11 to 17, 2016 (one week). The values of MAPE and RMSE for the five methods are shown in Table 5, and the best values are indicated in bold. Based on the prediction results, Case 4 further shows that the proposed method can help the prediction model to improve the prediction accuracy.

Fig. 9. Forecasting results of one day in advance using different models. Forecasting results for (a) 31-Jan-2017, (b) 31-Mar-2017, (c) 31-May-2017, (d) 31-Jul-2017, (e) 30-Sept-2017, and (f) 31-Dec-2017.

Case 5: The public dataset GEFCom 2012 used in this case study is available from [53]. Three types of data in the GEFCom 2012 dataset are used: Holiday_list, Temperature_history and Load_history. Case 5 selects ARIMA, ANN and SVR as load forecasting models, and forms three comparison schemes to forecast the load value from Jan 1 to 7, 2008, including (1) ARIMA and ARIMA-Proposal, (2) ANN and ANN-Proposal, (3) SVR and SVR-Proposal. Based on multiple experiments, the mean values of MAPE for each method are shown in Table 6.

It can be clearly seen from Table 6 that the proposed method can still enhance the load forecasting when both load data and weather data are used as input. In fact, using all possible load explanatory variables can improve the accuracy of the prediction results, and it also reduces the volatility of the error time series, which helps DMD to capture the trend characteristics of the error and achieve error correction.

Based on the experimental results of five cases, Experiment II proved the superiority of the proposed method in terms of accuracy and


stability, and specific advantages include (1) the generalization ability for different date type patterns, (2) simple calculation and easy implementation, (3) no additional parameter selection or feature extraction, and (4) no reliance on any prior assumptions. Taking EMD as an example, the input data must satisfy the assumption that they include at least two extreme values before EMD can be used.

Table 5
Forecasting results for electricity demand from April 11 to 17, 2016 (The best values are given in bold).
Range Methods Metrics Mon. Tues. Wed. Thur. Fri. Sat. Sun.
April 11–17, 2016 WT-SVM MAPE 6.527 7.070 5.605 6.721 5.129 10.075 7.832
RMSE 751.192 840.098 703.190 625.139 445.584 1084.848 872.776
EMD-SVM MAPE 6.348 4.192 5.125 3.496 4.527 5.530 8.224
RMSE 789.735 427.026 626.164 327.844 489.608 549.227 801.836
SSA-AR MAPE 5.301 6.245 4.936 4.526 6.169 8.122 7.347
RMSE 647.626 711.608 533.114 568.374 683.545 822.764 719.409
VMD-ELM MAPE 4.867 3.969 5.447 3.649 4.651 7.592 4.168
RMSE 515.162 405.536 611.540 376.117 548.870 844.482 375.351
SVR-Proposal MAPE 4.772 4.502 3.150 4.304 3.319 5.032 4.452
RMSE 416.871 434.194 289.948 470.519 318.805 513.433 444.161

Table 6
Forecasting results for load demand from Jan 1 to 7, 2008 (The best values are given in bold).
Dataset       Methods         MAPE (%)
GEFCom 2012   ARIMA           5.481
              ARIMA-Proposal  4.716
              ANN             4.650
              ANN-Proposal    4.093
              SVR             3.622
              SVR-Proposal    3.104

5. Conclusion

Short-term load forecasting plays a dominant role in the development of microgrids and smart grids. To improve the prediction results, this paper proposes a short-term electrical load forecasting method based on error correction using dynamic mode decomposition (DMD) from the perspective of error analysis. In the data selection stage, three types of strongly correlated input data are selected to reduce the impact of data volume and weakly correlated data on prediction accuracy. In the error prediction stage, the ability of DMD to capture trend characteristics is used to predict the error value and reduce the incidence of error accumulation. In the error correction stage, error correction is achieved by combining the predicted values of load and error, and outliers are replaced by the extreme value constraint method (EVCM). Based on the experimental results, the proposed method shows better accuracy and stability. In addition, the following problems were found during the experiments.

(1) EVCM limits the load time series corrected by DMD to a reasonable range to further improve forecasting results, which places higher requirements on the accuracy of historical data.
(2) Since the load demand curve of the small area is more random than that of the large area, the prediction accuracy for small areas is usually lower with the same prediction model.
(3) When the predicted errors are used to correct the load values, the error accumulation phenomenon cannot be avoided. However, using DMD to track the fluctuation trend of the error can reduce its probability of occurrence.

By mining the value of the error, the proposed method achieves effective error correction, which provides a new idea for improving the prediction results. In fact, the application of DMD is not limited to forecasting the error time series. In the future, we can try to use it directly to forecast the electrical load or apply it to other fields, such as wind speed, photoelectric, electricity prices, and even stock forecasting.

CRediT authorship contribution statement

Xiangyu Kong: Methodology. Chuang Li: Validation. Chengshan Wang: Funding acquisition. Yusen Zhang: Data curation. Jian Zhang: Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported by the National Natural Science Foundation of China (grant number 51877145).

References

[1] Adam NRB, Elahee MK, Dauhoo MZ. Forecasting of peak electricity demand in Mauritius using the non-homogeneous Gompertz diffusion process. Energy 2011;36(12):6763–9.
[2] Koprinska I, Rana M, Agelidis VG. Correlation and instance based feature selection for electricity load forecasting. Knowl-Based Syst 2015;82:29–40.
[3] Černe G, Dovžan D, Škrjanc I. Short-term load forecasting by separating daily profiles and using a single fuzzy model across the entire domain. IEEE Trans Ind Electron 2018;65(9):7406–15.
[4] Sousa JC, Jorge HM, Neves LP. Short-term load forecasting based on support vector regression and load profiling. Int J Energy Res 2014;38(3):350–62.
[5] Mohan N, Soman KP, Kumar SS. A data-driven strategy for short-term electric load forecasting using dynamic mode decomposition model. Appl Energy 2018;232:229–44.
[6] Zhang Y, Li C, Li L. Electricity price forecasting by a hybrid model, combining wavelet transform, ARMA and kernel-based extreme learning machine methods. Appl Energy 2017;190:291–305.
[7] Taylor JW, Mcsharry PE. Short-term load forecasting methods: an evaluation based on European data. IEEE Trans Power Syst 2007;22(4):2213–9.
[8] Gould PG, Koehler AB, Ord JK, Snyder RD, Hyndman RJ, Vahid-Araghi F. Forecasting time series with multiple seasonal patterns. Eur J Oper Res 2008;191(1):207–22.
[9] Goude Y, Nedellec R, Kong N. Local short and middle term electricity load forecasting with semi-parametric additive models. IEEE Trans Smart Grid 2014;5(1):440–6.
[10] Tao H, Pierre P, Shu F. Global energy forecasting competition 2012. Int J Forecast 2014;30(2):357–63.
[11] Hong WC. Application of chaotic ant swarm optimization in electric load forecasting. Energy Policy 2010;38(10):5830–9.
[12] Barman M, Choudhury NBD, Sutradhar S. A regional hybrid GOA-SVM model based on similar day approach for short-term load forecasting in Assam, India. Energy 2018;145:710–20.
[13] Yang Y, Che J, Deng C, Li L. Sequential grid approach based support vector regression for short-term electric load forecasting. Appl Energy 2019;238:1010–21.
[14] Serinaldi F. Distributional modeling and short-term forecasting of electricity prices by generalized additive models for location, scale and shape. Energy Econ 2011;33(6):1216–26.
[15] Lora AT, Santos JMR, Exposito AG, Ramos JLM, Santos JCR. Electricity market price forecasting based on weighted nearest neighbors techniques. IEEE Trans Power Syst 2007;22(3):1294–301.


[16] Alvarez FM, Troncoso A, Riquelme JC, Ruiz JSA. Energy time series forecasting based on pattern sequence similarity. IEEE Trans Knowl Data Eng 2011;23(8):1230–43.
[17] Zhao H, Guo S. An optimized grey model for annual power load forecasting. Energy 2016;107:272–86.
[18] Li XB, Jing ZX, Wu QH. Application of improved GM (1, N) models in annual electricity demand forecasting. 2015 IEEE innovative smart grid technologies-Asia (ISGT ASIA). IEEE; 2015. p. 1–6.
[19] Xie J, Hong T. Temperature scenario generation for probabilistic load forecasting. IEEE Trans Smart Grid 2018;9(3):1680–7.
[20] Tao H, Pinson P, Shu F, Zareipour H, Troccoli A, Rob JH. Probabilistic energy forecasting: global energy forecasting competition 2014 and beyond. Int J Forecast 2016;32(3):896–913.
[21] Tao H, Xie J, Jonathan B. Global energy forecasting competition 2017: Hierarchical probabilistic load forecasting. Int J Forec 2019;35(4):1389–99.
[22] Parisi GI, Kemker R, Part JL, Kanan C, Wermter S. Continual lifelong learning with neural networks: a review. Neural Netw 2019;113:54–71.
[23] Wu Z, Zhao X, Ma Y, Zhao X-Y. A hybrid model based on modified multi-objective cuckoo search algorithm for short-term load forecasting. Appl Energy 2019;237:896–909.
[24] Zhang J, Wei Y-M, Li D, Tan Z, Zhou J. Short term electricity load forecasting using a hybrid model. Energy 2018;158:774–81.
[25] Qiu X, Ren Y, Suganthan PN, Amaratunga GAJ. Empirical mode decomposition based ensemble deep learning for load demand time series forecasting. Appl Soft Comput 2017;54:246–55.
[26] Zhang C-Y, Chen CLP, Gan M, Chen L. Predictive deep Boltzmann machine for multiperiod wind speed forecasting. IEEE Trans Sustain Energy 2015;6(4):1416–25.
[27] Tian C, Hao Y, Hu J. A novel wind speed forecasting system based on hybrid data preprocessing and multi-objective optimization. Appl Energy 2018;231:301–19.
[28] Shayeghi H, Ghasemi A, Moradzadeh M, Nooshyar M. Simultaneous day-ahead forecasting of electricity price and load in smart grids. Energy Convers Manage 2015;95:371–84.
[29] Li C, Tao Y, Ao W, Yang S, Bai Y. Improving forecasting accuracy of daily enterprise electricity consumption using a random forest based on ensemble empirical mode decomposition. Energy 2018;165:1220–7.
[30] Afshar K, Bigdeli N. Data analysis and short term load forecasting in Iran electricity market using singular spectral analysis (SSA). Energy 2011;36(5):2620–7.
[31] Liang Y, Niu D, Hong W-C. Short term load forecasting based on feature extraction and improved general regression neural network model. Energy 2019;166:653–63.
[32] Hao Y, Tian C. A novel two-stage forecasting model based on error factor and ensemble method for multi-step wind power forecasting. Appl Energy 2019;238:368–83.
[33] Yu Z, Yang C, Zhang Z, Jiao J. Error correction method based on data transformational GM (1,1) and application on tax forecasting. Appl Soft Comput 2015;37:554–60.
[34] Mao M, Ling J, Chang L, Hatziargyriou ND, Zhang J, Ding Y. A novel short-term wind speed prediction based on MFEC. IEEE J Emerg Sel Top Power Electr 2016;4(4):1206–16.
[35] Cai G, Wang W, Lu J. A novel hybrid short term load forecasting model considering the error of numerical weather prediction. Energies 2016;9(12):994.
[36] Sanei S, Lee TK-M, Abolghasemi V. A new adaptive line enhancer based on singular spectrum analysis. IEEE Trans Biomed Eng 2012;59(2):428–34.
[37] Duke D, Soria J, Honnery D. An error analysis of the dynamic mode decomposition. Exp Fluids 2012;52(2):529–42.
[38] Kaur A, Nonnenmacher L, Coimbra CFM. Net load forecasting for high renewable energy penetration grids. Energy 2016;114:1073–84.
[39] Giaouris D, Papadopoulos AI, Patsios C, Walker S. A systems approach for management of microgrids considering multiple energy carriers, stochastic loads, forecasting and demand side response. Appl Energy 2018;226:546–59.
[40] Jiang H. Model forecasting based on two-stage feature selection procedure using orthogonal greedy algorithm. Appl Soft Comput 2018;63:110–23.
[41] Yang W, Wang J, Niu T, Du P. A hybrid forecasting system based on a dual decomposition strategy and multi-objective optimization for electricity price forecasting. Appl Energy 2019;235:1205–25.
[42] Cecati C, Kolbusz J, Różycki P, Siano P, Wilamowski BM. A novel RBF training algorithm for short-term electric load forecasting and comparative studies. IEEE Trans Ind Electron 2015;62(10):6519–29.
[43] Dutta S, Li Y, Venkataraman A, Costa LM, Jiang T. Load and renewable energy forecasting for a microgrid using persistence technique. Energy Proc 2017;143:617–22.
[44] Wilamowski BM. Neural network architectures and learning algorithms. IEEE Ind Electron Mag 2009;3(4):56–63.
[45] Chang C-C, Lin C-J. LIBSVM: A library for support vector machines; 2001. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[46] Available: http://nng.wsiz.rzeszow.pl/Download.html.
[47] Fan GF, Peng L, Hong W-C, Sun F. Electric load forecasting by the SVR model with differential empirical mode decomposition and auto regression. Neurocomputing 2016;173:958–70.
[48] Wang J, Yang W, Du P, et al. A novel hybrid forecasting system of wind speed based on a newly developed multi-objective sine cosine algorithm. Energy Convers Manage 2018;163:134–50.
[49] Liu D, Niu D, Wang H, Fan L. Short-term wind speed forecasting using wavelet transform and support vector machines optimized by genetic algorithm. Renew Energy 2014;62:592–7.
[50] Zhang W, Liu F, Zheng X, Li Y. A hybrid EMD-SVM based short-term wind power forecasting model. 2015 IEEE PES Asia-Pacific power and energy engineering conference (APPEEC). 2015. p. 1–5.
[51] Vahabie AH, Yousefi MMR, Araabi BN, Lucas C, Barghinia S. Combination of singular spectrum analysis and autoregressive model for short term load forecasting. 2007 IEEE Lausanne power tech. IEEE; 2007. p. 1090–3.
[52] Abdoos Akbar A. A new intelligent method based on combination of VMD and ELM for short term wind power forecasting. Neurocomputing 2016;203:111–20.
[53] GEFCom 2012 Dataset. URL <http://blog.drhongtao.com/2016/07/gefcom2012-load-forecasting-data.html>.
