VMD Cat

Available online at www.sciencedirect.com
ScienceDirect
Energy Reports 9 (2023) 199–211

www.elsevier.com/locate/egyr
The 7th International Conference on New Energy and Future Energy Systems (NEFES 2022),
7th NEFES, 25–28 October 2022, Nanjing (virtually), China
VMD-CAT: A hybrid model for short-term wind power prediction

Huan Zheng a, Zhenda Hu a, Xuguang Wang b,∗, Junhong Ni b, Mengqi Cui b
a Institute of Economic and Technology, State Grid Fujian Electric Power Co., Ltd., Fuzhou 350002, China
b North China Electric Power University, Baoding 071003, China
Received 19 February 2023; accepted 25 February 2023

Available online xxxx
Abstract
Accurate wind power prediction is essential for optimizing wind power scheduling and maximizing profits. However,
the inertia and time-varying properties of the wind speed pose a challenge to the wind power prediction task. Existing
prediction models fail to efficiently mitigate the negative influence of these properties on the prediction results, so their
generalization ability requires further improvement. In this paper, the historical wind power segment is decomposed into
sub-signals, which are considered as the fluctuation patterns of the wind power series. The variable support is then employed to
describe the inertia and time-varying properties of the fluctuation patterns. The component-attention mechanism is introduced to
formulate the correlation-relationship between each fluctuation pattern and the historical wind power segment; this mechanism
replaces the self-attention mechanism of the Transformer model. A hybrid model combining VMD and the Transformer
is used to predict the future wind power. Experiments performed on actual wind power series validate the efficiency
of the proposed model.
© 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the scientific committee of the 7th International Conference on New Energy and Future Energy Systems, NEFES,
2022.
Keywords: Wind power prediction; Correlation relationship; VMD; Transformer
1. Introduction
As one of the most promising clean energy sources, wind energy has attracted increasing attention. Recently,
extreme weather events have occurred frequently, increasing the volatility and intermittency of wind energy.
This makes it more difficult to balance the power system in real time. Therefore, improving the accuracy of wind
power prediction plays a key role in the balanced dispatching of wind power and the stable operation of the power
system [1–3].
Various wind power prediction models have been described in the literature, such as [4–7]. These models can
be categorized differently according to diverse criteria.
Due to different theoretical sources, the prediction models are classified into physical, statistical and machine
learning-based models.
∗ Corresponding author.
E-mail address: wangxg@ncepu.edu.cn (X. Wang).
https://doi.org/10.1016/j.egyr.2023.02.061
The physical models are developed according to the physical laws of the airflow. These models are often based on
rigorous preconditions and are very complex. Therefore, they have a high computational load. The statistical models
are based on statistical characteristics of the wind. They include the autoregressive (AR) model [8], autoregressive
integrated moving average (ARIMA) model and its variants [9,10], and the non-linear regression model [11]. The
statistical models are simple and user-friendly; however, their application is generally limited to stationary
time series. The machine learning-based models, such as the back propagation (BP) [12], least squares support vector
machine (LS-SVM) [13], long short-term memory (LSTM) [14], and deep belief network (DBN) [15] models, are
commonly used due to their ability to extract and learn complex quantitative relationships hidden in the
wind power data. However, when violent oscillations occur in the wind power series, their prediction accuracy is
significantly reduced. Therefore, they are usually used as the predictive module in a hybrid model.
According to the model structure, the prediction models can also be classified into single and hybrid models. The
hybrid models are usually expected to have a better wind power prediction accuracy than single models [16,17].
Early hybrid models usually assembled several single models and summed their prediction results
with learned or manually set weights, e.g., [18–20]. The currently popular hybrid models
combine a signal decomposition method with a time series prediction method. They usually decompose
the wind power series with the Wavelet Transform (WT) [21], Empirical Mode Decomposition (EMD) [22],
Variational Mode Decomposition (VMD) [23] or variants of these three decomposition methods. The successful
hybrid models, such as [24–27] (referred to as WEE, EAW, RWA and FVAD, respectively, in this paper), all belong
to this type.
Wind power series always present the inertia and time-varying properties, which are
closely related to the future wind power trend. The previously mentioned prediction models are unable to efficiently
describe the influence of these two properties on the future wind power trend and should therefore be further
improved. To address this issue, the variable support is proposed to trace the dynamic relation between the historical
and future wind power segments. The parameters of the variable support are designed to describe the impacts
of the inertia and time-varying properties of the wind power series. The prediction is performed using a hybrid
model. The historical wind power segment is then decomposed into several components using the VMD method,
and the component-attention mechanism is introduced to assist in the estimation of the variable support. Finally, the
Transformer model [28] is modified to implement the variable support estimation and future wind power prediction.
The main contributions of this paper are summarized as follows:
• Problem Definition. The variable support is proposed to describe the impacts of the inertia and time-varying
properties of the wind power series. The prediction then consists in approximating the functional relationship
between the future wind power segment and its variable support.
• Analysis and Model. The historical wind power segment is decomposed into fluctuation patterns using VMD.
The component-attention mechanism is then introduced to formulate the correlation-relationship between each
fluctuation pattern and the historical wind power segment. Based on the correlation-relationship, a hybrid
model which combines VMD and a modified Transformer model is proposed to estimate the variable support
and predict the future wind power.
• Experiments. Experiments are conducted to evaluate the efficiency of the proposed model. The experimental
results demonstrate that the component-attention mechanism can efficiently describe the correlation-relation
between each fluctuation pattern and the future wind power segment. In addition, the proposed model achieves
a better prediction accuracy than the baseline models.
The remainder of this paper is organized as follows. Section 2 formulates the wind power prediction problem
and defines the variable support. Section 3 briefly reviews the background theory of the proposed model. Section 4
details the modification of the Transformer model and the proposed hybrid model. Section 5 validates the proposed
model using numerical experiments. Finally, Section 6 concludes the paper.
2. Problem formulation
For ease of reading, a nomenclature is presented in Table 1.
Table 1. Notation and description.

Notation                                                    Description
$F(\cdot)$                                                  Function between wind speed segments.
$F_k(\cdot)$                                                Function between segments of sub-signals.
$G(\cdot)$                                                  Function that transforms segments of sub-signals into the future wind speed segment.
$x_f$, $x_h$                                                Wind speed segments.
$x_h^{(k)}$, $x_f^{(k)}$                                    Segments of sub-signals.
$a_{p,\tau}^{(k)}$, $a_{p,\tau}$                            Variable support.
$x_i$                                                       Wind speed measurement.
$x_i^{(k)}$                                                 Measurement of the sub-signal.
$H_i$, $W_i^K$, $W_i^Q$, $W_i^V$, $\tilde{W}_i$, $\tilde{W}_i^j$    Weight matrix.
$Q$, $K$, $V$                                               Query matrix, key matrix and value matrix.
$s_j$                                                       Slice.
$s^{(i)}$                                                   Union of slices.
$p$, $p_i$                                                  Length of the variable support.
$\tau$, $\tau_i$                                            Delay-related nonnegative integer.
Assuming that the future wind power measurements can be predicted by the historical wind power data, several
models are proposed to approximate the following functional relationship:
$$x_f = F(x_h) \tag{1}$$
where $x_f = [x_i, \ldots, x_{i+L-1}]$ represents the future wind power segment, $x_i$ is the wind power sample measured at
time $t_i$, $x_h = [x_{i-P}, \ldots, x_{i-1}]$ denotes the historical wind power segment, $P$ and $L$ are non-negative integers, and
$F(\cdot): \mathbb{R}^P \to \mathbb{R}^L$ is the function to be approximated. Note that, if $L = 1$, Eq. (1) corresponds to a one-step-ahead
prediction problem. Otherwise, it corresponds to a multi-step-ahead prediction problem.
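The pairing of historical and future segments in Eq. (1) can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `make_segments` and the toy series are hypothetical and are not part of the proposed model.

```python
import numpy as np

def make_segments(series, P, L=1):
    """Build (x_h, x_f) pairs: P historical samples followed by L future samples."""
    series = np.asarray(series, dtype=float)
    if len(series) < P + L:
        raise ValueError("series is too short for the requested P and L")
    # Each window holds P + L consecutive samples of the series.
    windows = np.lib.stride_tricks.sliding_window_view(series, P + L)
    return windows[:, :P].copy(), windows[:, P:].copy()

# Toy series (hypothetical values); L = 1 gives the one-step-ahead setting.
series = np.arange(10.0)
X_h, X_f = make_segments(series, P=4, L=1)
```

Each row of `X_h` is one historical segment and the matching row of `X_f` is the segment that a model approximating F(·) would be trained to output.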
A wind power series generally presents an unstable future trend; therefore, directly approximating the function
F(·) in Eq. (1) is a challenging task. In the frequency domain, a wind power series is a signal with a
large bandwidth, while series with narrow bandwidths generally show more stable future trends.
Consequently, the wind power data is first decomposed into sub-signals (fluctuation patterns), which are then
predicted. Afterwards, the predicted sub-signals are summed to reconstruct the future wind power segment. This
decomposition–prediction–summation process can be expressed as:
$$x_f^{(k)} = F_k\left(x_h^{(k)}\right) \tag{2a}$$
$$x_f = \sum_k x_f^{(k)} \tag{2b}$$
where $x_h^{(k)} = [x_{i-P_k}^{(k)}, \ldots, x_{i-1}^{(k)}]$ denotes a sub-signal decomposed from the historical wind power segment,
$x_f^{(k)} = [x_i^{(k)}, \ldots, x_{i+L-1}^{(k)}]$ in Eq. (2b) represents the future wind power segment predicted from $x_h^{(k)}$,
and $F_k$ is the function that relates $x_h^{(k)}$ and $x_f^{(k)}$.
However, this process does not take into consideration the influence of the inertia and time-varying properties
on the sub-signals, and is unable to describe the interaction among the sub-signals, both of which can compromise or
even outweigh the benefits of the stable future trends of the sub-signals. In this paper, the wind power prediction task
is reformulated as:
$$x_f = G\left(a_{p_1,\tau_1}^{(1)}, \ldots, a_{p_K,\tau_K}^{(K)}\right) \tag{3}$$
where $a_{p_k,\tau_k}^{(k)} = [x_{i-p_k-\tau_k}^{(k)}, \ldots, x_{i-1-\tau_k}^{(k)}]$ in Eq. (3) is the variable support of the kth sub-signal, and $G(\cdot)$ denotes the
functional relationship between $a_{p_1,\tau_1}^{(1)}, \ldots, a_{p_K,\tau_K}^{(K)}$ and $x_f$. $\tau_k$ is the delay-related nonnegative integer, so that
the inertia of a time series is characterized by a time-delay process. The non-negative integer $p_k$ denotes the
length of the variable support segment. Note that both $\tau_k$ and $p_k$ are mutable. Thus, the time-varying property of
the wind power series can be efficiently described.
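As a small illustration of the definition of the variable support, the following sketch extracts the samples $x_{i-p-\tau}, \ldots, x_{i-1-\tau}$ from one sub-signal. The function name `variable_support` and the toy sub-signal are hypothetical; in the proposed model p and τ are estimated rather than given.

```python
import numpy as np

def variable_support(sub_signal, i, p, tau):
    """Return [x_{i-p-tau}, ..., x_{i-1-tau}] for the sample to be predicted at index i."""
    start, stop = i - p - tau, i - tau
    if p < 1 or tau < 0 or start < 0 or stop > len(sub_signal):
        raise ValueError("support falls outside the available history")
    return np.asarray(sub_signal)[start:stop]

# Toy sub-signal whose values equal their own indices (hypothetical data).
x_k = np.arange(20)
a = variable_support(x_k, i=12, p=7, tau=2)   # samples x_3 ... x_9
```

With τ = 0 the support ends at the most recent sample; a larger τ shifts it further into the past, which is how the delay describes inertia.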
A schematic diagram of the variable support is shown in Fig. 1, where the coloured cubes arranged in
chronological order represent the predicted future wind power segment and the sky blue circles represent the segment
of a sub-signal decomposed from the historical wind power segment.
Fig. 1. Variable support. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of
this article.)
Note that the circles connected to the red (or green) cube by red (or green) arrows form the variable support.
Since we aim to address the one-step-ahead wind power prediction problem, the length L of the predicted future wind
power segment is set to 1. It can be observed from Fig. 1 that the parameters τ and p of a variable support
may vary with different future wind power measurements.
The variable support makes the functional relationship G flexible enough to describe the inertia and time-varying
properties of the sub-signals. Therefore, adaptively estimating the variable supports for the sub-signals becomes the
key problem to be solved. In this paper, sub-signals are decomposed from the historical wind power segment using
VMD. The variable support is then estimated with a modified Transformer model, based on the correlation-relationship
between each sub-signal and the historical wind power segment.
3. VMD and Transformer

This section briefly reviews the VMD method and the Transformer model.
3.1. VMD
VMD is completely non-recursive. It is designed to iteratively decompose real-valued signals into a given number
of intrinsic mode functions (narrowband components).
In VMD, the intrinsic mode function is redefined as:
$$u_k(t) = A_k(t)\cos(\varphi_k(t)) \tag{4}$$
where $A_k(t)$ and $\varphi_k(t)$ are the amplitude and phase of $u_k(t)$, respectively.
The mode functions can be obtained by solving the following constrained optimization problem:
$$\begin{cases} \min\limits_{\{u_k(t)\},\{w_k\}} \sum\limits_k \left\| \partial_t\left[\left(\delta(t) + \dfrac{i}{\pi t}\right) * u_k(t)\right] e^{-i w_k t} \right\|_2^2 \\ \text{s.t.} \ \sum\limits_k u_k(t) = x(t) \end{cases} \tag{5}$$
where $w_k$ denotes the centre frequency of $u_k(t)$, $\delta(t)$ represents the impulse signal, $x(t)$ is the original signal to be
decomposed, and "$\partial_t$" and "$*$" are the time-derivative and convolution operators, respectively.
The mode functions can be obtained by solving the ) following]2 constraint optimization problem:
∑ [(
i
L({u k (t)} , {wk } , λ(t)) = α ∂t δ(t) +
 
∗ u k (t) 
 πt 
2
2 ⟨ k
(6)
 ⟩
 ∑  ∑
+ x(t) − u k (t) + λ(t), x(t) − u k (t)
 
 
k 2 k
202
where α is a weight factor and λ(t) is the Lagrangian multiplier.

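Afterwards, the Fourier transform $\hat{u}_k(w)$ of each mode $u_k(t)$ and its centre frequency $w_k$ are iteratively updated as:
$$\hat{u}_k^{n+1}(w) = \frac{\hat{x}(w) - \sum_{i<k} \hat{u}_i^{n+1}(w) - \sum_{i>k} \hat{u}_i^{n}(w) + \dfrac{\hat{\lambda}^n(w)}{2}}{1 + 2\alpha (w - w_k^n)^2}, \qquad w_k^{n+1} = \frac{\int_0^\infty w \left|\hat{u}_k^{n+1}(w)\right|^2 dw}{\int_0^\infty \left|\hat{u}_k^{n+1}(w)\right|^2 dw} \tag{7}$$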
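where $\hat{x}(w)$ and $\hat{\lambda}^n(w)$ are the Fourier transforms of $x(t)$ and $\lambda^n(t)$, respectively. $\hat{\lambda}^n(w)$ is updated by:
$$\hat{\lambda}^{n+1}(w) = \hat{\lambda}^{n}(w) + \beta \left( \hat{x}(w) - \sum_k \hat{u}_k^{n+1}(w) \right) \tag{8}$$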
where β is a constant factor.

The iteration process in Eq. (7) terminates when the following condition is met:
$$\sum_k \frac{\left\| \hat{u}_k^{n+1}(w) - \hat{u}_k^{n}(w) \right\|_2^2}{\left\| \hat{u}_k^{n}(w) \right\|_2^2} < \varepsilon \tag{9}$$
where ε is a stop threshold.
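The updates of Eqs. (7)–(9) can be sketched in a few lines of NumPy. This is a simplified illustration rather than a reference implementation: it works on the one-sided spectrum returned by `np.fft.rfft`, omits the mirror extension of the signal that full VMD implementations commonly apply, initialises the centre frequencies uniformly, and uses β = 0 by default (no multiplier update). These choices, the function name `vmd` and the test signal are assumptions made for the example.

```python
import numpy as np

def vmd(x, K, alpha=2000.0, beta=0.0, eps=1e-7, max_iter=500):
    """Simplified VMD on the one-sided spectrum; returns (modes, centre_freqs)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    w = np.fft.rfftfreq(T)                      # frequencies in cycles/sample, 0..0.5
    x_hat = np.fft.rfft(x)
    u_hat = np.zeros((K, w.size), dtype=complex)
    w_k = 0.5 * (np.arange(K) + 0.5) / K        # uniformly spread initial centre frequencies
    lam_hat = np.zeros(w.size, dtype=complex)

    for n in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Eq. (7), mode update: modes i<k are already updated, modes i>k are from step n.
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (x_hat - others + lam_hat / 2) / (1 + 2 * alpha * (w - w_k[k]) ** 2)
            # Eq. (7), centre frequency: centre of gravity of the mode's power spectrum.
            power = np.abs(u_hat[k]) ** 2
            w_k[k] = np.sum(w * power) / np.sum(power)
        # Eq. (8): multiplier update (no effect when beta = 0).
        lam_hat = lam_hat + beta * (x_hat - u_hat.sum(axis=0))
        # Eq. (9): relative change of the modes (skipped at n = 0, where u_prev is zero).
        if n > 0:
            num = np.sum(np.abs(u_hat - u_prev) ** 2, axis=1)
            den = np.sum(np.abs(u_prev) ** 2, axis=1)
            if np.sum(num / den) < eps:
                break

    order = np.argsort(w_k)
    modes = np.fft.irfft(u_hat[order], n=T, axis=1)
    return modes, w_k[order]

# Two-tone test signal (hypothetical data): 0.04 and 0.25 cycles per sample.
t = np.arange(500)
x = np.cos(2 * np.pi * 0.04 * t) + np.cos(2 * np.pi * 0.25 * t)
modes, centre = vmd(x, K=2)
```

On this test signal the two estimated centre frequencies settle near the two tone frequencies, and the sum of the modes closely reproduces the input.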
3.2. Transformer
The Transformer model [28] was developed to address long-distance information extraction and memory in
sequence-to-sequence tasks. It overcomes the sequential computation limitation of RNN-based models
by using parallel matrix operations, and achieves high performance in the machine translation field.
The Transformer model adopts the encoder–decoder structure. The encoder stack and decoder stack contain six
serially connected encoders and decoders, respectively. The input layer contains a word embedding module and
a positional encoding module, and the word embedding module is designed to receive a sentence. In the output
layer, the linear module maps the output of the decoder stack into a vector of fixed dimension and the softmax
module transforms this vector into a probability vector. Each encoder consists of a multi-head attention module and
a feed forward neural network module, while each decoder contains three serially connected modules, including
a multi-head attention module, an encoder–decoder attention module and a feed forward neural network module.
A residual connection exists in each module, and the output of the module is normalized. The structure of the
Transformer model is illustrated in Fig. 2.
Fig. 2. Structure of the Transformer model.
4. Proposed model
With the purpose of estimating the variable support and approximating the functional relationship of Eq. (3), the
VMD and Transformer are combined to form the wind power prediction model.
The original Transformer model cannot be directly applied to the wind power prediction process, because the
word embedding module in the input layer is designed to receive a sentence but not a numerical sequence. Therefore,
this module should be modified. The positional encoding module should be retained, since it can be used to encode
the position information of the historical wind power measurements. The output layer of the Transformer should
also be adjusted, as the softmax module transforms the output vector into a probability vector. The modified input
and output layer are shown in Fig. 3.
Fig. 3. The modified input and output layer.
It can be seen that the original output layer is simply replaced by a fully connected neural network. The modified
input layer contains two input modules arranged in parallel. One is designed to receive the overlapped slices of the
historical wind power segment (i.e. the left module of the input layer), and the other is used to receive the
sub-signals (i.e. the right module of the input layer).
The sub-signals are first concatenated and then transformed to a vector of specified dimension using a fully connected
neural network. Afterwards, they are fed into the lowermost encoder after positional encoding.
An overlapped slicing submodule is designed to produce overlapped segments from the historical wind power
segment. The overlapped slices of the historical wind power segment are shown in Fig. 4.
Fig. 4. Overlapped slices of the historical wind power segment.
In order to adapt to the input layer modification, the self-attention mechanism used in the original multi-head
attention module of the lowermost encoder is adjusted to the component-attention mechanism, as shown in Fig. 5.
Here, Q, K and V represent the query, key and value matrix, respectively. In the self-attention mechanism, Q,
K and V are calculated from the same input, while in the component-attention mechanism, K and V are calculated
from the slices, but Q is calculated from the sub-signals. Moreover, each row of V is computed from a unique slice.
Fig. 5. Component-attention module.
According to the structure of the multi-head attention module, the ith head outputs a weighted matrix:
$$H_i = \mathrm{softmax}\left[\frac{K W_i^K \left(Q W_i^Q\right)^T}{\sqrt{d}}\right] V W_i^V \tag{10}$$
where $W_i^K$, $W_i^Q$ and $W_i^V$ are matrices used for linear transformation, and $d$ is the scale factor.
For the sake of presentation, a weight matrix $\tilde{W}_i$ is separated from Eq. (10):
$$\tilde{W}_i = \mathrm{softmax}\left[\frac{K W_i^K \left(Q W_i^Q\right)^T}{\sqrt{d}}\right] \tag{11}$$
This weight matrix can indicate which slices contribute to $H_i$. Eq. (10) can be reformulated as:
$$H_i = \left(\tilde{W}_i V\right) W_i^V = \tilde{V} W_i^V \tag{12}$$
Hence, the jth row of $\tilde{V}$ is equal to the weighted sum of all the rows of $V$, and the jth row of $\tilde{W}_i$ is the
corresponding weight vector.
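A single component-attention head can be sketched as follows. The sketch uses the orientation in which each row of the weight matrix corresponds to one query (sub-signal component) and each column to one slice, which is what the column-wise reading of the weight matrix in Eq. (13) requires. The dimensions, the random inputs and the function names are hypothetical; in particular, using one query row per sub-signal is an assumption made for the example.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def component_attention_head(Q, K, V, W_q, W_k, W_v):
    """One head: Q comes from the sub-signals, K and V come from the slices."""
    q, k = Q @ W_q, K @ W_k
    d = q.shape[-1]
    W_tilde = softmax(q @ k.T / np.sqrt(d), axis=-1)   # rows: components, columns: slices
    H = (W_tilde @ V) @ W_v                            # same grouping as in Eq. (12)
    return H, W_tilde

rng = np.random.default_rng(0)
n_comp, n_slices, d_model, d_head = 9, 10, 64, 16      # hypothetical sizes
Q = rng.normal(size=(n_comp, d_model))                 # embedded sub-signals (random stand-ins)
K = rng.normal(size=(n_slices, d_model))               # embedded slices (random stand-ins)
V = rng.normal(size=(n_slices, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) * 0.1 for _ in range(3))
H, W_tilde = component_attention_head(Q, K, V, W_q, W_k, W_v)
```

Each row of `W_tilde` sums to one, so each row of `W_tilde @ V` is a weighted sum of the rows of `V`, with one weight per slice.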
Empirically, the length of the historical wind power segment xh is set to 12, and xh is decomposed into 10 slices.
Each slice s j contains three consecutive wind power samples, and the adjacent slices contain two common samples.
Fig. 6 presents a weight matrix calculated for a head.
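The slicing described above (12 samples, slices of three consecutive samples, adjacent slices sharing two samples) can be reproduced with a sliding window. The function name `overlapped_slices` and the toy segment are hypothetical.

```python
import numpy as np

def overlapped_slices(x_h, width=3, stride=1):
    """Cut x_h into overlapped slices of `width` samples taken every `stride` samples."""
    x_h = np.asarray(x_h)
    return np.lib.stride_tricks.sliding_window_view(x_h, width)[::stride]

# Toy historical segment whose values equal their positions (hypothetical data).
x_h = np.arange(12)
slices = overlapped_slices(x_h)   # 12 - 3 + 1 = 10 slices
```

With a width of 3 and a stride of 1, slice j covers positions j, j+1 and j+2 of the segment, which gives the 10 slices stated in the text.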
To avoid confusion, this weight matrix is also indexed by i and denoted as $\tilde{W}_i$. It can be seen that this weight
matrix is sparse, and its non-zero values are distributed in a portion of the matrix's columns. Thus, the elements of
$x_h$ that contribute to $H_i$ can be expressed as:
$$s^{(i)} = \bigcup_{j:\ \max(\tilde{W}_i^j) > 0} s_j \tag{13}$$
where $\tilde{W}_i^j$ denotes the jth column of $\tilde{W}_i$ and $\max(\tilde{W}_i^j)$ is the maximum element in $\tilde{W}_i^j$.
In Eq. (13), $s^{(i)}$ can be considered as the variable support of $H_i$ with length p = 7 and delay-related parameter
τ = 2. Therefore, the variable support $a_{p,\tau}$ can be obtained by pooling all the wind power elements of $x_h$ that
contribute to the output of at least one head:
$$a_{p,\tau} = \bigcup_i s^{(i)} \tag{14}$$
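Eqs. (13) and (14) can be illustrated with a short sketch that collects the slices whose weight-matrix columns are non-zero and pools them over the heads. Several points are assumptions made for the example: slice j is taken to cover positions j, j+1 and j+2 of $x_h$, the last position of $x_h$ is taken to be $x_{i-1}$, the pooled support is assumed to be contiguous when p and τ are read off, and the two weight matrices are hand-made. A tolerance `tol` is exposed because a softmax output is never exactly zero in floating point.

```python
import numpy as np

def head_support(W_tilde, width=3, tol=0.0):
    """Eq. (13): positions of x_h covered by slices whose column maximum exceeds tol."""
    active = np.flatnonzero(W_tilde.max(axis=0) > tol)
    return {int(j) + m for j in active for m in range(width)}

def pooled_support(heads, width=3, tol=0.0):
    """Eq. (14): union of the per-head supports, returned as sorted positions."""
    idx = set()
    for W_tilde in heads:
        idx |= head_support(W_tilde, width, tol)
    return sorted(idx)

def length_and_delay(idx, P=12):
    """Read p and tau from a contiguous support inside a segment of length P."""
    p = idx[-1] - idx[0] + 1
    tau = (P - 1) - idx[-1]
    return p, tau

# Two hand-made sparse weight matrices (hypothetical values), 4 queries x 10 slices.
W_a = np.zeros((4, 10)); W_a[:, 3:6] = 1 / 3    # slices 3..5 are active
W_b = np.zeros((4, 10)); W_b[:, 6:8] = 0.5      # slices 6..7 are active
idx = pooled_support([W_a, W_b])
p, tau = length_and_delay(idx)
```

In this constructed example the pooled support covers positions 3 to 9 of the 12-sample segment, which corresponds to p = 7 and τ = 2.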
In the sequel, the modified Transformer model is referred to as the M-Transformer. The structure of the proposed
model is illustrated in Fig. 7.
It is known that a high frequency residue is usually generated after the wind power series decomposition process
using VMD. This high frequency residue can be the high frequency component or noise of the wind power series.
Therefore, the permutation entropy of the residue is used as a randomness metric, and the residue will be discarded
if its permutation entropy reaches a given threshold η.
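A minimal sketch of the permutation entropy used as the randomness metric is given below. It counts ordinal patterns of embedded vectors and returns the Shannon entropy of their relative frequencies. The use of the natural logarithm without normalisation is an assumption, since the text does not state the convention; the function name and the test signals are hypothetical.

```python
import numpy as np

def permutation_entropy(x, order=3, delay=1):
    """Shannon entropy (natural log) of the ordinal patterns of length `order`."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay
    if n <= 0:
        raise ValueError("signal is too short for the requested order and delay")
    # Row r holds x[r], x[r + delay], ..., x[r + (order - 1) * delay].
    idx = np.arange(n)[:, None] + np.arange(order) * delay
    patterns = np.argsort(x[idx], axis=1, kind="stable")
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    prob = counts / counts.sum()
    return float(-np.sum(prob * np.log(prob)))

rng = np.random.default_rng(0)
pe_ramp = permutation_entropy(np.arange(50), order=3)            # a single ordinal pattern
pe_noise = permutation_entropy(rng.normal(size=5000), order=3)   # close to log(3!)
```

A regular signal yields a low value and a random signal approaches the maximum log(order!), which is why a high permutation entropy of the residue is read as noise.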
Fig. 6. Weight matrix calculated for a head.
Fig. 7. Structure of the proposed model.
To avoid confusion, the model combining the VMD and the component-attention mechanism-based Transformer
is referred to as VMD-CAT.
5. Numerical results
5.1. Wind power datasets
Two wind power datasets collected from different wind farms in China are adopted in this part to validate the
proposed hybrid model. One was measured at a wind farm in Hebei province and the other at a
wind farm in Shanxi province. Both datasets record a whole year of data with a one-hour time scale.
Fig. 8 shows the seasonal statistics of the two datasets. As can be seen, the statistics of each dataset vary with
the seasons. The Hebei dataset reaches its maximum and minimum values in summer and spring, respectively, while
the Shanxi dataset reaches its maximum and minimum values in winter and autumn, respectively. The Hebei dataset
fluctuates most sharply in summer and least sharply in autumn; the same is true for the Shanxi dataset. In each
season, the kurtosis value of the Hebei dataset is larger than that of the Shanxi dataset, and the difference in
skewness between the two datasets is obvious.
Fig. 8. Statistics of the two datasets. (a) maximum value; (b) standard deviation; (c) kurtosis; (d) skewness.
5.2. Data decomposition and randomness metric
According to the structure of VMD-CAT (cf. Fig. 7), the historical wind power segment is decomposed into
sub-signals before entering the M-Transformer model. The fundamental parameter K (i.e. the number of modes)
should be set in advance, because it cannot be adaptively set in the VMD method. In this paper, K is set to ensure
that the minimum deviation between adjacent centre frequencies of the relatively higher-frequency sub-signals
is not less than a given threshold. The threshold is set to one-tenth of the bandwidth of the input wind power
segment. In addition, the parameter α in VMD is empirically set to 2000.
When a wind power segment is decomposed, a residual component is always generated. For instance, the
decomposition results of the Hebei wind power series collected in June are listed in Table 2. It can be seen that the
centre frequencies of the modes vary slightly, and the difference between adjacent centre frequencies of the lower-
frequency sub-signals is relatively small. The sub-signals fluctuate regularly, while the vibration of the residue is
random.
Since the residue may carry high frequency information, simply discarding it may have a negative impact on
the wind power prediction accuracy. Therefore, the permutation entropy is used as a signal randomness metric, and
the residue is filtered using a threshold-based method. The order and delay time of the permutation entropy are
set to 7 and 2, respectively, while the threshold η is adaptively estimated as:
$$\eta = \mathrm{mean}(e_p) + 3\,\mathrm{std}(e_p) \tag{15}$$
where $e_p$ denotes the vector whose elements are the permutation entropy values of the sub-signals.
The permutation entropy values for the sub-signals and the residue are also provided in Table 2.
In this case, threshold η is equal to 4.73 according to Eq. (15). Therefore, the residue should be retained and
treated as a sub-signal.
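The threshold of Eq. (15) can be checked against the permutation entropy values reported in Table 2. The sketch assumes the sample standard deviation (`ddof=1`), a choice that is not stated in the text but that reproduces the reported value of 4.73; the variable names are hypothetical.

```python
import numpy as np

# Permutation entropies of sub-signals S1..S8 and of the residue, copied from Table 2.
e_p = np.array([2.08, 1.55, 1.76, 2.31, 2.81, 3.15, 3.53, 3.20])
residue_pe = 4.46

# Eq. (15) with the sample standard deviation (ddof=1 is an assumption).
eta = e_p.mean() + 3 * e_p.std(ddof=1)

# The residue is kept when its permutation entropy stays below the threshold.
keep_residue = residue_pe < eta
```

With these values the residue entropy of 4.46 lies below the threshold, which agrees with the decision to retain the residue as a sub-signal.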
5.3. Comparative experiments and discussion
The assessment indexes adopted in this part include the mean absolute error (MAE), the root mean square error
(RMSE) and the mean absolute percent error (MAPE).
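For reference, the three assessment indexes can be computed as in the sketch below, which follows their standard definitions. MAPE is returned in percent and assumes that no actual value is zero; the function names and the toy data are hypothetical.

```python
import numpy as np

def mae(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mape(y, y_hat):
    # Percentage error relative to the actual values; undefined when an actual value is zero.
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

# Toy actual and predicted values (hypothetical data).
y_true = [2.0, 4.0]
y_pred = [1.0, 5.0]
scores = (mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```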
Table 2. The centre frequencies and permutation entropies of the decomposition.

Sub-signal    $w_k$ (Hz)    $e_p$
S1            0.07          2.08
S2            14.36         1.55
S3            40.20         1.76
S4            77.78         2.31
S5            139.02        2.81
S6            232.73        3.15
S7            321.62        3.53
S8            431.30        3.20
Residue       –             4.46
To verify the performance of VMD-CAT, comparative experiments are implemented. The single models including
ARIMA, BP and LS-SVM, and the hybrid models including WEE, EAW, RWA and FVAD are used as the
comparative models.
During the training and testing process, the parameters of VMD-CAT are set as follows: each encoder (decoder) stack
includes 4 encoders (decoders), and a 64-dimensional vector is produced from each slice with the fully connected neural
network. The dropout rate, learning rate and number of epochs are set to 0.1, 0.0001 and 5000, respectively. All the
experiments are implemented in Python 3.8.
Here, the wind power series of the Hebei dataset measured in the last week of February, May, August and
October are used as the testing data, while the wind power samples collected four weeks before each testing data
are selected as the corresponding training data. Similarly, the wind power segments of the Shanxi dataset measured
in the last week of February, May, August and November are randomly selected as the testing data, and the wind power
samples collected four weeks before each testing segment are considered as the corresponding training data. The testing
errors of each model are presented in Fig. 9. Moreover, the testing results evaluated using MAE, RMSE and MAPE
are provided in Tables 3 and 4.
Fig. 9. Testing errors shown with box-whisker figures. (a) Hebei dataset; (b) Shanxi dataset.
Table 3. Testing results on the Hebei dataset.

            Feb.                    May                     Aug.                    Oct.
Model       MAE    RMSE   MAPE      MAE    RMSE   MAPE      MAE    RMSE   MAPE      MAE    RMSE   MAPE
ARIMA       1.59   1.82   14.46%    1.61   1.91   15.32%    1.72   1.87   18.21%    1.61   1.94   18.42%
BP          1.50   1.76   13.23%    1.54   1.86   14.23%    1.67   1.71   17.21%    1.58   1.75   17.45%
LS-SVM      1.32   1.33   12.02%    1.39   1.41   13.21%    1.54   1.43   14.83%    1.43   1.39   14.29%
WEE         0.82   0.91   6.34%     0.90   0.93   6.23%     1.14   1.02   8.35%     1.05   1.11   7.83%
EAW         0.83   0.97   6.43%     0.92   0.96   6.43%     1.19   1.03   9.51%     0.93   1.00   8.67%
RWA         0.80   0.93   5.76%     0.84   0.93   5.87%     0.99   0.95   7.32%     0.93   0.99   6.88%
FVAD        0.79   0.90   4.35%     0.79   0.91   5.41%     0.92   0.94   6.83%     0.86   0.95   5.68%
VMD-CAT     0.76   0.88   3.29%     0.78   0.90   2.07%     0.80   0.92   4.91%     0.79   0.93   3.98%
Table 4. Testing results on the Shanxi dataset.

            Feb.                    May                     Aug.                    Oct.
Model       MAE    RMSE   MAPE      MAE    RMSE   MAPE      MAE    RMSE   MAPE      MAE    RMSE   MAPE
ARIMA       1.64   1.87   15.26%    1.52   1.84   15.84%    1.68   1.84   17.55%    1.68   1.88   17.79%
BP          1.57   1.81   14.52%    1.48   1.78   15.03%    1.56   1.82   16.87%    1.51   1.86   16.56%
LS-SVM      1.38   1.67   14.25%    1.21   1.57   14.27%    1.39   1.65   14.25%    1.39   1.41   14.30%
WEE         0.86   1.04   6.80%     0.86   0.97   7.32%     0.85   1.02   9.77%     0.89   1.03   8.42%
EAW         0.89   1.07   8.20%     0.87   0.99   8.41%     0.88   1.05   11.30%    0.90   1.12   9.09%
RWA         0.85   0.99   5.96%     0.85   0.95   6.78%     0.83   0.98   9.26%     0.87   1.01   7.45%
FVAD        0.84   0.96   4.54%     0.80   0.93   5.15%     0.83   0.97   6.89%     0.85   0.97   6.04%
VMD-CAT     0.81   0.93   4.01%     0.77   0.87   2.06%     0.80   0.94   5.18%     0.82   0.95   4.27%
It can be seen that the performance of all the models varies with the seasons. On the Hebei dataset, all the
models perform best on the data collected in February, and perform similarly on the data
sampled in August and October. On the Shanxi dataset, each model shows a similar performance across the different
seasons. VMD-CAT performs best on the wind power series of May, and performs similarly on the data
collected in February, August and November. In addition, the hybrid models clearly outperform the single models.
This is because the mode decomposition process used by the hybrid models reduces the difficulty of
describing the characteristics of the wind power series. Moreover, VMD-CAT achieves the best prediction results
on both datasets.
Based on the assessment indexes, the improvement index used to measure the relative prediction improvement
achieved by VMD-CAT is expressed as:
$$IM = \frac{A_c - A_p}{A_c} \tag{16}$$
where Ac is an assessment index obtained with a comparative model and A p denotes the same assessment index
obtained with VMD-CAT. Note that the larger the improvement index value, the more improvement achieved by
VMD-CAT.
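As a worked instance of Eq. (16), the sketch below computes the improvement index from the February MAE values of the Hebei dataset reported in Table 3. The function and variable names are hypothetical.

```python
def improvement_index(a_c, a_p):
    """Eq. (16): relative reduction of an assessment index achieved by VMD-CAT."""
    return (a_c - a_p) / a_c

# February MAE values on the Hebei dataset, copied from Table 3.
mae_feb = {"ARIMA": 1.59, "BP": 1.50, "LS-SVM": 1.32,
           "WEE": 0.82, "EAW": 0.83, "RWA": 0.80, "FVAD": 0.79}
mae_vmd_cat = 0.76

im = {model: improvement_index(value, mae_vmd_cat) for model, value in mae_feb.items()}
```

For example, the index is about 0.52 relative to ARIMA and about 0.04 relative to FVAD, so the gain over the single models is far larger than the gain over the strongest hybrid baseline.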
Due to space limitations, only the improvement indexes calculated from the MAE values in Tables 3 and 4 are
shown in Fig. 10. The MAE values are effectively reduced by VMD-CAT on both datasets.
The proposed model combines VMD and the M-Transformer. In order to validate the advantage of this
combination, WT, EMD and EEMD [25] are each combined with the M-Transformer. The
resulting combinations are referred to as WT-CAT, EMD-CAT and EEMD-CAT, respectively. The settings of the
testing and training data remain the same. The predicted wind power measurements in May and October for the Hebei
dataset are shown in Fig. 11, and the assessment indexes of the testing results are provided in Table 5.
It can be observed from Fig. 11 and Table 5 that VMD-CAT performs slightly better than the other three hybrid
comparative models, which validates the efficiency of the VMD method.
6. Conclusions
In this paper, a hybrid model referred to as VMD-CAT was proposed for short-term wind power prediction.
The proposed model used VMD to decompose the historical wind power data into sub-signals, and then adopted a
modified Transformer model to learn a variable support for the future wind power measurements. Several experiments
were conducted on real wind power data collected from two typical wind farms, in order to validate the proposed
model. The outperformance of the proposed model is attributed to three aspects: (1) The impacts of the inertia property
and time-varying property of the wind power series were efficiently described by the variable support. (2) The
component-attention mechanism was introduced to formulate the correlation-relationship between each sub-signal
and the historical wind power data. (3) A modified Transformer model was used to accurately learn the functional
relationship.
Fig. 10. Improvements achieved by VMD-CAT. (a) Hebei dataset; (b) Shanxi dataset.
Fig. 11. Wind power prediction results of the M-Transformer based models. (a) May; (b) October.
Table 5. Testing results of different combinations.

              Feb.                    May                     Aug.                    Oct.
Combination   MAE    RMSE   MAPE      MAE    RMSE   MAPE      MAE    RMSE   MAPE      MAE    RMSE   MAPE
WT-CAT        0.79   0.93   4.43%     0.83   0.94   3.13%     0.82   0.98   5.21%     0.85   0.98   4.67%
EMD-CAT       0.78   0.91   4.06%     0.80   0.92   2.97%     0.82   0.95   5.11%     0.82   0.96   4.28%
EEMD-CAT      0.78   0.90   3.95%     0.79   0.92   2.41%     0.81   0.94   5.03%     0.80   0.95   4.14%
VMD-CAT       0.76   0.88   3.29%     0.78   0.90   2.07%     0.80   0.92   4.91%     0.79   0.93   3.98%
In future work, we expect to focus on leveraging multiple meteorological factors to tackle the wind power
prediction when the wind power series suffers from sudden and violent changes.
Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could
have appeared to influence the work reported in this paper.
Data availability
Data will be made available on request.
Acknowledgements
This work was fully supported by the scientific and technological project of State Grid Fujian Electric Power Co.
Ltd., China (SGTYHT/20-JS-223(SGFJJY00GHJS2200054)). The authors would also like to express their gratitude
to Edit Springs for the expert linguistic services provided.
References
[1] Z. Ma, G. Mei, A hybrid attention-based deep learning approach for wind power prediction, Appl Energy 323 (2022) 119608.
[2] A. Meng, S. Chen, Z. Ou, et al., A hybrid deep learning architecture for wind power prediction based on bi-attention mechanism and
crisscross optimization, Energy 238 (2022) 121795.
[3] C. Sasser, M. Yu, R. Delgado, Improvement of wind power prediction from meteorological characterization with machine learning
models, Renew Energy 183 (2022) 491–501.
[4] Y. Wang, H. Xu, R.R. Zou, et al., A deep asymmetric Laplace neural network for deterministic and probabilistic wind power forecasting,
Renew Energy 196 (2022) 497–517.
[5] C. Tian, T. Niu, W. Wei, Developing a wind power forecasting system based on deep learning with attention mechanism, Energy 257
(2022) 124750.
[6] S. Khazaei, M. Ehsan, S.S. Soleymani, et al., A high-accuracy hybrid method for short-term wind power forecasting, Energy 238
(2022) 122020.
[7] P. Lu, L. Ye, M. Pei, et al., Short-term wind power forecasting based on meteorological feature extraction and optimization strategy,
Renew Energy 184 (2022) 642–661.
[8] P. Poggi, et al., Forecasting and simulating wind speed in Corsica by using an autoregressive model, Energy Convers Manage 44 (20)
(2003) 3177–3196.
[9] G.P. Zhang, Time series forecasting using a hybrid ARIMA and neural network model, Neurocomputing 50 (17) (2003) 159–175.
[10] R.G. Kavasseri, K. Seetharaman, Day-ahead wind speed forecasting using f-ARIMA models, Renew Energy 34 (5) (2009) 1388–1393.
[11] M. Lydia, et al., Linear and non-linear autoregressive models for short-term wind speed forecasting, Energy Convers Manage 112 (2016)
115–124.
[12] R.S. Michalski, J.G. Carbonell, T.M. Mitchell, Machine learning: an artificial intelligence approach, Springer Science & Business
Media, 1983.
[13] J.A.K. Suykens, J. Vandewalle, Least squares support vector machine classifiers, Neural Process Lett 9 (3) (1999) 293–300.
[14] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput 9 (8) (1997) 1735–1780.
[15] G.E. Hinton, S. Osindero, Y.W. Teh, A fast learning algorithm for deep belief nets, Neural Comput 18 (7) (2006) 1527–1554.
[16] Z. Peng, et al., A novel deep learning ensemble model with data de-noising for short-term wind speed forecasting, Energy Convers
Manage 207 (1) (2020) 112524.
[17] Z. Liu, et al., Ensemble forecasting system for short-term wind speed forecasting based on optimal sub-model selection and
multi-objective version of mayfly optimization algorithm, Expert Syst Appl 177 (3) (2021) 114974.
[18] Z.X. Qu, et al., Research and application of ensemble forecasting based on a novel multiobjective optimization algorithm for wind-speed
forecasting, Energy Convers Manage 154 (2017) 440–454.
[19] P. Jiang, C. Li, Research and application of an innovative combined model based on a modified optimization algorithm for wind speed
forecasting, Measurement 124 (2018) 395–412.
[20] M.H. Li, et al., Research and application of a combined model based on variable weight for short term wind speed forecasting, Renew
Energy 116 (2018) 669–684.
[21] S. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans Pattern Anal Mach Intell 11
(1989) 674–693.
[22] N.E. Huang, et al., The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,
Proc R Soc A-Math Phys Eng Sci 454 (1971) (1998) 903–995.
[23] K. Dragomiretskiy, D. Zosso, Variational mode decomposition, IEEE Trans Signal Process 62 (3) (2014) 531–543.
[24] H. Liu, W.X. Mi, Y.F. Li, An experimental investigation of three new hybrid wind speed forecasting models using multi-decomposing
strategy and ELM algorithm, Renew Energy 123 (2018) 694–705.
[25] M. Santhosh, C. Venkaiah, D.M. Vinod Kumar, Ensemble empirical mode decomposition based adaptive wavelet neural network method
for wind speed prediction, Energy Convers Manage 168 (15) (2018) 482–493.
[26] Aasim, S.N. Singh, A. Mohapatra, Repeated wavelet transform based ARIMA model for very short-term wind speed forecasting,
Renew Energy 136 (2019) 758–768.
[27] J.L. Zhang, Y.M. Wei, Z.F. Tan, An adaptive hybrid model for short term wind speed forecasting, Energy 190 (2020) 115615.
[28] A. Vaswani, et al., Attention is all you need, in: Proc. NIPS, 2017, pp. 5998–6008.