
17521424, 2021, 5, Downloaded from https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/rpg2.12085 by Cochrane Peru, Wiley Online Library on [01/03/2023].

See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Received: 26 June 2020 Revised: 22 September 2020 Accepted: 19 October 2020 IET Renewable Power Generation
DOI: 10.1049/rpg2.12085

ORIGINAL RESEARCH PAPER

Ultra-short-term multi-step wind power forecasting based on CNN-LSTM

Qianyu Wu1, Fei Guan1, Chen Lv2, Yongzhang Huang1

1 State Key Laboratory of Alternate Electrical Power System with Renewable Energy Sources, North China Electric Power University, Beijing, People’s Republic of China
2 China Electric Power Research Institute, Beijing, China

Correspondence
Qianyu Wu, State Key Laboratory of Alternate Electrical Power System with Renewable Energy Sources, North China Electric Power University, Beijing, People’s Republic of China.
Email: 18810663690@163.com

Funding: Funder: Science and Technology Project of State Grid. Grant No.: 5201011600TS.

Abstract
The fluctuation and intermittency of large-scale wind power integration is a serious threat to the stability and security of the power system. Accurate prediction of wind power is of great significance to the safety of wind power grid connection. This study proposes a novel spatio-temporal correlation model (STCM) for ultra-short-term wind power prediction based on convolutional neural networks-long short-term memory (CNN-LSTM). The original meteorological factors at multiple historical time points of different sites throughout the target wind farm are reconstructed into the input window of the model, and thus a new data reconstruction method is presented. CNN is used to extract the spatial correlation feature vectors of meteorological factors of different sites and the temporal correlation vectors of the meteorological features in the ultra-short term, which are reconstructed in time series and used as the input data of LSTM. Then, LSTM extracts the temporal feature relationship between the historical time points for multi-step wind power forecasting. The STCM based on CNN-LSTM proposed in this study is suitable for wind farms that can collect meteorological factors at different locations. Taking the measured meteorological factors and wind power dataset of a wind farm in China as an example, four evaluation metrics are obtained for the CNN-LSTM model and for CNN and LSTM used individually for multi-step wind power prediction. The results show that the proposed STCM based on CNN-LSTM has better spatial and temporal feature extraction ability than the traditional structure models and can forecast the power of the wind farm more accurately.

1 INTRODUCTION

In recent years, wind power has developed rapidly all over the world. The fluctuation and intermittency of wind power output bring unstable factors to the power system. Improving the forecasting accuracy of wind power is an effective way to reduce the instability of the power system caused by large-scale wind power integration.
The forecasting models of wind power are mainly divided into physical, statistical, machine learning and combined models [1]. The physical models convert the numerical weather prediction (NWP) data into wind speed at the height of the wind turbine by means of microscale meteorology and computational fluid dynamics and forecast wind power indirectly by the conversion calculation [2]. The physical models can be applied to new wind farms without a large amount of historical data, and it is easy to realise the medium and long-term forecasting of wind power. The forecasting accuracy mainly depends on the accuracy of NWP data, the information of the physical environment around the wind farm and the accuracy of the physical model [3]. However, physical models are not suitable for short-term wind power forecasting because of the high calculation cost. Based on a large amount of historical data of wind farms, the statistical methods use algorithms including the Kalman filter [4], the autoregressive (AR) model and the autoregressive moving average (ARMA) model [5] to extract the linear relation between input features (NWP, historical measured data) and wind power. Statistical models can achieve short-term wind speed forecasting, but they cannot analyse non-linear relationships between the variables [6]. Machine learning models such as backpropagation network [7], radial basis function [8], extreme learning machine [9], support vector machine (SVM) and Gaussian process [10]

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is
properly cited.
© 2021 The Authors. IET Renewable Power Generation published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology

IET Renew. Power Gener. 2021;15:1019–1029. wileyonlinelibrary.com/iet-rpg 1019



establish a black-box model to fit the non-linear relationship between the input characteristics and the output wind power by learning and training on a large amount of historical measured data. However, the shallow machine learning models can only extract very superficial features and have weak learning ability for multi-dimensional big data.
Compared with shallow machine learning, the deep learning models have a stronger ability of computing and complex function fitting. Through non-linear optimisation of a multi-layer network structure, deep learning models can automatically extract the inherent features in data from the lowest to the highest level [11]. Some scholars have applied deep learning models to wind power forecasting based on historical data to improve the accuracy of wind power forecasting [12]. In [13], ultra-short-term and short-term wind speed forecasting models with three hidden layers were established using the deep Boltzmann machine. In [14], a transfer learning model was applied to transfer wind speed forecasting models trained on wind farms with rich historical data to wind farms with less historical data. In [15], the deep belief network model was applied to short-term wind speed forecasting and obtained high prediction accuracy in practical examples. A new forecasting model based on a neural network (NN) and a novel chaotic shark smell optimisation algorithm was proposed, and the effectiveness of the proposed forecasting model was tested on two real-world case studies [16]. The authors in [17] proposed a new wind power prediction approach which included an improved version of the Kriging interpolation method, empirical mode decomposition (EMD), an information-theoretic feature selection method, and a closed-loop forecasting engine. In [18], a prediction approach based on the improved EMD (IEMD) in conjunction with a hybrid framework consisting of the bagging NN (BaNN), K-means clustering method, and a stochastic optimisation algorithm was proposed.
CNN and LSTM are two main deep learning models [19]. In [20], a probability forecasting model of ultra-short-term wind power based on CNN was proposed, and the accuracy of the model was verified. In [21], CNN and a physical model were combined for forecasting, which further reduced the forecasting error of short-term wind power. In [22], LSTM models were applied to short-term forecasting of wind speed and wind power. In [23], short-term wind power interval prediction based on two typical recurrent NN (RNN) models, the Elman network and the nonlinear autoregressive with exogenous inputs (NARX) model, and the lower upper bound estimation method was investigated. Taking into account the impact of meteorological information data on wind power prediction, the authors in [24] sifted multivariate meteorological information data highly relevant to wind power with distance analysis as the input data of the LSTM model and modelled the time series from the viewpoint of time with LSTM. The authors in [25] combined CNN and LSTM to forecast wind speed and considered the influence of various meteorological factors such as temperature, wind speed, and wind direction on the wind speed in time and space. In [26], the authors investigated the combined performance of the wavelet packet decomposition and the CNN and CNN-LSTM in wind speed multi-step prediction to extract the hidden features of the wind speed time series.
Existing individual CNN and LSTM models can establish the non-linear correlation between output and input variables through a large amount of historical data to predict wind speed or wind power, and each model has its advantages and disadvantages. The combination of CNN and LSTM can realise the complementary advantages of each model to further improve the accuracy of forecasting [25]. The power of a wind farm is related to meteorological factors such as wind speed and wind direction at various sites of the wind farm. The meteorological factors are continuous in time and space, resulting in a significant cross-correlation between the factors of a target site and its adjacent sites [27]. However, the existing ultra-short-term forecasting methods of wind power usually ignore the influence of the spatio-temporal correlation of meteorological factors at different sites of the wind farm on the wind power.
Thus, this study combines the advantage of CNN in extracting spatial features and the advantage of LSTM in extracting time-series features to extract the spatio-temporal correlation between multiple meteorological factors and wind power.
The contributions of this study are as follows. (1) Considering the influence of multiple meteorological factors of different sites throughout the target wind farm on wind power, a novel spatio-temporal correlation model (STCM) based on CNN-LSTM for ultra-short-term wind power prediction is proposed. (2) A new data reconstruction method is proposed. The input matrix is constructed with the meteorological factors at different sites of the wind farm as the vertical axis and the ultra-short-term historical time as the horizontal axis. CNN is used to extract the spatial correlation of meteorological factors on the vertical axis and the ultra-short-term temporal correlation of the features on the horizontal axis. The correlation vectors of each input matrix extracted by CNN are constructed in a long-term time series and used as the input data of LSTM to extract the long-term historical temporal relationship for multi-step wind power prediction. (3) Based on the data reconstruction method, the study uses multiple independent models that share the same input matrix to achieve a multi-step prediction of wind power and reduce the time for data processing.
The organisation of this study is as follows. Section 2 introduces the structure of STCM based on CNN-LSTM. In Section 3, the spatio-temporal correlation of multiple meteorological factors (specifically, wind speed and wind direction in this study) at different sites of the wind farm is analysed. The wind speeds and wind directions measured by benchmark wind turbines are reconstructed and used as the input of the model, and the STCM based on CNN-LSTM for multi-step wind power forecasting and the error calculation method are presented. In Section 4, taking a wind farm in China as an example, the calculation results of four evaluation indexes show that the STCM based on CNN-LSTM established in this study can predict the wind farm power more accurately than the individual deep learning models (CNN, LSTM).

FIGURE 1 The structure of convolutional neural networks (CNN)

2 STRUCTURE OF STCM BASED ON CNN-LSTM

2.1 The principle of CNN algorithm

CNN is a deep feedforward NN with convolution structure using a supervised learning method [28]. The structure includes a convolutional layer, pooling layer and full connection layer as shown in Figure 1. Convolutional and pooling layers are the core modules of CNN feature extraction. The input features are convolved through the convolutional layer. The pooling layer samples information from the preceding convolutional layer and minimises the spatial size [29]. Then, the full connection layer maps the two-dimensional feature to one-dimensional data for output.

1. Convolutional layer: The convolution operation of the convolutional layer can reduce the noise and enhance the key information of the original input features. Assuming that v is the input features of the original wind speeds and directions, w is the convolution kernel of order J × I, and the spatial correlation characteristic matrix y of the wind speeds and directions of order M × N is output after the activation function, the element in row m and column n is defined as [30]

$$y_{mn} = f\left(\sum_{j=0}^{J-1}\sum_{i=0}^{I-1} v_{m+i,\,n+j}\, w_{ij} + b\right) \tag{1}$$

where m = 0,1,…,M – 1; n = 0,1,…,N – 1; $w_{ij}$ is the element in row i and column j of w; $v_{m+i,n+j}$ is the element in row m + i and column n + j of v; b is the bias; f is the activation function.

2. Pooling layer: In the pooling layer, the spatial correlation matrix y of wind speeds and directions output by the convolutional layer is further reduced by taking the average value (average pooling) or the maximum value (maximum pooling) of the area to save useful information while reducing the amount of data processing. Assuming that the dimension of the pool area is $S_1 \times S_2$, the dimension of the spatial correlation characteristic matrix C after pooling is $(M/S_1) \times (N/S_2)$, and the calculation formula of average pooling is defined as [31]

$$C_{ab} = \frac{1}{S_1 S_2}\sum_{j=0}^{S_2-1}\sum_{i=0}^{S_1-1} y_{aS_1+i,\,bS_2+j} \tag{2}$$

where $C_{ab}$ is the element in row a and column b of C; a = 0,1,…,$M/S_1$ – 1; b = 0,1,…,$N/S_2$ – 1; $y_{aS_1+i,\,bS_2+j}$ is the element in row $aS_1+i$ and column $bS_2+j$ of y.

3. Fully connected layer: In the full connection layer, the spatial correlation characteristic matrix C processed by the pooling layer is expanded into one-dimensional data output, which is expressed as [31]

$$p = f\left(\sum_{i=1}^{n} k_i c_i + b\right) \tag{3}$$

where $c = [c_1, c_2, \ldots, c_i, \ldots, c_n]$ is the n-dimensional input variable; $k = [k_1, k_2, \ldots, k_i, \ldots, k_n]$ is the connection weight; p is the one-dimensional output value of the spatial correlation characteristics of wind speeds and directions.

2.2 The principle of LSTM algorithm

LSTM is a special RNN with stronger feature extraction ability for processing sequence data [32]. LSTM introduces a memory unit on the basis of RNN, which is controlled by input, output and forgetting gates. It can better realise the storage, screening and control of information flow under the time feedback mechanism, effectively avoid information loss and solve the problem of gradient disappearance and explosion.
The structure of LSTM is shown in Figure 2. The symbol σ represents the sigmoid activation function, and tanh is the tanh activation function. The input information flow enters from the output variable $p_{t-1}$ at the previous time and the input variable $v_t$ at the current time. Through the control of input, output and forgetting gates, the memory unit $c_{t-1}$ is updated to $c_t$, and the output value at the current time is $p_t$. The output values of input, output and forgetting gates are $i_t$, $o_t$ and $f_t$, respectively.
The transformation equation is defined as [33]

$$\begin{aligned}
i_t &= \sigma\left(k_{xi} v_t + k_{hi} p_{t-1} + k_{ci} \cdot c_{t-1} + b_i\right)\\
f_t &= \sigma\left(k_{xf} v_t + k_{hf} p_{t-1} + k_{cf} \cdot c_{t-1} + b_f\right)\\
o_t &= \sigma\left(k_{xo} v_t + k_{ho} p_{t-1} + k_{co} \cdot c_t + b_o\right)\\
c_t &= f_t \cdot c_{t-1} + i_t \cdot \tanh\left(k_{xc} v_t + k_{hc} p_{t-1} + b_c\right)\\
p_t &= o_t \cdot \tanh(c_t)
\end{aligned} \tag{4}$$

where $k_{xi}$, $k_{hi}$, $k_{ci}$ are the weight matrices from the input, the output at the previous moment and the memory unit to the input gate, respectively; $k_{xf}$, $k_{hf}$, $k_{cf}$ are the weight matrices from the input, the output at the previous moment and the memory unit to the forgetting gate, respectively;

$k_{xo}$, $k_{ho}$, $k_{co}$ are the weight matrices from the input, the output at the previous moment and the memory unit to the output gate, respectively; $b_i$, $b_f$, $b_o$, $b_c$ are the bias values of the input, forgetting and output gates and the memory unit, respectively [33]; “⋅” is the Hadamard product.

FIGURE 2 The architecture of the long short-term memory (LSTM) model

FIGURE 3 Output mode of CNN-LSTM multi-step forecasting model

FIGURE 4 The architecture of the single CNN-LSTM model

2.3 The STCM based on CNN-LSTM and data reconstruction method

Meteorological factors, including wind speed, wind direction, temperature, air pressure, and humidity, are closely related to wind power forecasting, and the meteorological features of a region are similar to those of its adjacent regions [27]. Making full use of the meteorological information of different sites throughout the target wind farm can improve the accuracy and reliability of wind power forecasting.
The CNN-LSTM model combines the advantages of CNN and LSTM and can extract spatial local features while modelling the time series. CNN is used to extract the spatial correlation feature vectors of meteorological factors of different sites, which are constructed in time series and used as the input data of the LSTM network, and then the LSTM network is used for ultra-short-term wind power forecasting. The multi-step forecasting model of wind power based on CNN-LSTM proposed in this study is shown in Figure 3. N represents the number of meteorological features and M represents the number of historical time points. The M × N input window of the CNN-LSTM model includes multiple correlations of N meteorological factors at different sites and at M historical time points. Each point can be defined as $IW(t-m, f_n)$, which represents the value of the n-th meteorological factor at historical time point t–m. The purpose of the model is to calculate the wind power of the target wind farm at the next M moments from the correlation coefficients between N meteorological factor values at M historical time points. The formula is defined as

$$P(p_t, p_{t+1}, \ldots, p_{t+m}, \ldots, p_{t+M-1}) = \sum_{n=1}^{N}\sum_{m=0}^{M-1}\left(IW(t-m, f_n) \cdot C\left(IW(t-m, f_n),\, p_{t+m}\right) + \xi_{nm}\right) \tag{5}$$

Here, $P(p_t, p_{t+1}, \ldots, p_{t+m}, \ldots, p_{t+M-1})$ is the value of the wind powers of the target wind farm at the next M moments, $C(IW(t-m, f_n), p_{t+m})$ is the correlation coefficient between the value of the n-th meteorological factor at historical time point t–m and the wind power value at the moment t+m, and $\xi_{nm}$ is the error term.
The CNN-LSTM forecasting model with an output of M steps has M CNN-LSTM structures that share the same input, simplifying the process of data preprocessing. The M models are trained independently and do not interfere with each other. The structure of a single CNN-LSTM model is shown in Figure 4. N meteorological factors at different positions of the wind farm at M times form an input window. After convolution by multiple two-dimensional convolution kernels, CNN can extract the spatial correlation of N meteorological factors on the vertical axis and the ultra-short-term temporal correlation of the features at M times on the horizontal axis. After pooling, the correlation vectors of each input matrix extracted by CNN are constructed in a long-term time series and used as the input data of LSTM to extract the long-term historical temporal relationship for multi-step wind power prediction. Combining the time window with the long-term history, the power forecasting results of the wind farm are achieved.

TABLE 1 Programming platform and model parameters

Programming platform

| Development platform | Keras’s functions | Development language | Basic frequency |
|---|---|---|---|
| Google Keras | Conv2D, MaxPooling2D, Dense, LSTM, Activation, Dropout, plot_model | Python | 3.2 GHz |

Parameters

| Iterations | Convolution kernel size | Number of convolution kernels | Number of long short-term memory (LSTM) neurons | Time window | Batch size | Random inactivation rate | L2 |
|---|---|---|---|---|---|---|---|
| 30 | 2×2 | 4 | 128 | 254 | 14 | 0.2 | 0.005 |

3 THE STCM BASED ON CNN-LSTM FOR MULTI-STEP WIND POWER FORECASTING

3.1 Spatio-temporal correlation of wind speeds and wind directions at different positions of the wind farm

A wind farm has dozens to hundreds of wind turbines. Influenced by distance and the eddy current effect of wind turbine blades, the wind speed and direction of each wind turbine are not exactly the same but are correlated. The change of wind speeds with time, measured by four benchmark wind turbines in a wind farm in China, is shown in Figure 5, where v is the wind speed value, t is the time, and the data resolution is 5 min. The distribution of wind directions is represented by wind direction rose maps, and the wind direction rose maps of the four benchmark wind turbines based on the wind speed and wind direction data of 2017 are shown in Figure 6. The wind direction in the figure is represented by eight directions, namely east (E), west (W), south (S), north (N), southeast (S-E), southwest (W-S), northeast (N-E), northwest (N-W). Different colours represent different wind speed ranges, and the number above each colour represents the frequency statistics of the corresponding wind speed section in this direction.

FIGURE 5 Wind speeds measured by four benchmark wind turbines in a wind farm

From the figures above, it can be seen that the wind speed and wind direction distributions of the four benchmark wind turbines in a wind farm have an obvious similarity. Therefore, it is necessary to explore the temporal and spatial correlation of wind speeds and wind directions of different wind turbines in the wind farm so as to achieve the power forecasting of the wind farm.
The wind speeds and wind directions measured at different benchmark wind turbines of the wind farm are taken as input features, and the power values of the wind farm are taken as output variables to construct the learning model. In order to extract the spatial correlations of the input features and consider the temporal correlation of historical input data, the relationship between wind speeds and wind directions of the benchmark wind turbines in the previous hour and the power value of the wind farm in the next hour is established by the CNN-LSTM algorithm (the data resolution is 5 min, and the input features of the previous hour are used to predict the wind power of the next hour). The input characteristic matrix of each data sample is represented as

$$V_t = \begin{bmatrix}
v^1_{t-11} & d^1_{t-11} & v^2_{t-11} & d^2_{t-11} & \cdots & v^h_{t-11} & d^h_{t-11}\\
v^1_{t-10} & d^1_{t-10} & v^2_{t-10} & d^2_{t-10} & \cdots & v^h_{t-10} & d^h_{t-10}\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
v^1_{t-i} & d^1_{t-i} & v^2_{t-i} & d^2_{t-i} & \cdots & v^h_{t-i} & d^h_{t-i}\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
v^1_{t-1} & d^1_{t-1} & v^2_{t-1} & d^2_{t-1} & \cdots & v^h_{t-1} & d^h_{t-1}\\
v^1_{t} & d^1_{t} & v^2_{t} & d^2_{t} & \cdots & v^h_{t} & d^h_{t}
\end{bmatrix} \tag{6}$$

where t is the current time, $v^h_{t-i}$ is the wind speed of the h-th benchmark wind turbine at t–i, and $d^h_{t-i}$ is the wind direction of the h-th benchmark wind turbine at t–i.
The corresponding output is the wind farm power in the next hour, which is shown below as

$$P_t = \begin{bmatrix} p_t & p_{t+1} & \cdots & p_{t+11} \end{bmatrix}^{T} \tag{7}$$

The wind power forecasting is to find the mapping relationship between $V_t$ and $P_t$ and consider the influence of historical sequence data. It is a high-dimensional regression problem, which needs the CNN-LSTM algorithm to mine the temporal and spatial correlation of features.

FIGURE 6 Wind rose maps of four benchmark wind turbines in a wind farm. (a) Benchmark wind turbine 1, (b) benchmark wind turbine 2, (c) benchmark wind turbine 3, (d) benchmark wind turbine 4

FIGURE 7 The wind prediction model based on CNN-LSTM
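The construction of one sample in the layout of Equations (6) and (7) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `build_sample`, the list-of-lists data layout and the argument names are assumptions, and the default window of 12 steps follows the 5-min resolution and one-hour horizon stated in Section 3.1.

```python
def build_sample(speeds, directions, power, t, history=12, horizon=12):
    """Build one (V_t, P_t) pair in the layout of Equations (6) and (7).

    speeds[h][k] and directions[h][k] are the readings of benchmark turbine h
    at time step k; power[k] is the wind farm power at time step k.
    All names and the list-of-lists layout are illustrative assumptions.
    """
    if t - history + 1 < 0 or t + horizon > len(power):
        raise ValueError("not enough data around time index t")

    window = []
    for k in range(t - history + 1, t + 1):      # rows t-11, ..., t
        row = []
        for h in range(len(speeds)):             # columns v^1, d^1, ..., v^h, d^h
            row.append(speeds[h][k])
            row.append(directions[h][k])
        window.append(row)

    # Targets p_t, ..., p_{t+11}, indexed exactly as written in Equation (7).
    target = [power[k] for k in range(t, t + horizon)]
    return window, target
```

With four benchmark turbines, each window has 12 rows and 8 columns, and all 12 forecasting steps can reuse the same window, which is the input sharing described in Section 2.3.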
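For reference, the convolution and average pooling operations of Equations (1) and (2) in Section 2.1 can be written out directly. The sketch below is illustrative only: the function names are hypothetical, the choice of ReLU for the activation f and the absence of padding are assumptions (the text does not specify either), and the kernel is indexed as $w_{ij}$ with row i and column j as in Equation (1). Note that Table 1 lists MaxPooling2D among the Keras functions used, whereas Equation (2) defines average pooling.

```python
def relu(x):
    # Assumed activation f; the paper does not state which one is used.
    return max(0.0, x)


def conv2d_valid(v, w, b=0.0, f=relu):
    """Equation (1): y[m][n] = f(sum_{i,j} v[m+i][n+j] * w[i][j] + b).

    No padding is applied (an assumption), so the output shrinks by the
    kernel size minus one along each axis.
    """
    rows_w, cols_w = len(w), len(w[0])
    out_rows = len(v) - rows_w + 1
    out_cols = len(v[0]) - cols_w + 1
    return [
        [
            f(sum(v[m + i][n + j] * w[i][j]
                  for i in range(rows_w) for j in range(cols_w)) + b)
            for n in range(out_cols)
        ]
        for m in range(out_rows)
    ]


def average_pool(y, s1, s2):
    """Equation (2): mean of each non-overlapping s1 x s2 block of y."""
    out_rows, out_cols = len(y) // s1, len(y[0]) // s2
    return [
        [
            sum(y[a * s1 + i][b * s2 + j]
                for i in range(s1) for j in range(s2)) / (s1 * s2)
            for b in range(out_cols)
        ]
        for a in range(out_rows)
    ]
```

Applied to a 12 × 8 input window with a 2 × 2 kernel, `conv2d_valid` would give an 11 × 7 feature map under these assumptions.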
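One time step of the LSTM transformation in Equation (4) of Section 2.2 can be traced with scalar quantities, for which the Hadamard product reduces to ordinary multiplication. The sketch below is a hedged illustration: the function name `lstm_step` and the dictionary keys for weights and biases are assumptions chosen to mirror the subscripts of Equation (4), not part of the original work.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(v_t, p_prev, c_prev, k, b):
    """One scalar LSTM step following Equation (4).

    k maps subscripts such as "xi" or "co" to weights and b maps
    "i", "f", "o", "c" to biases (hypothetical naming).
    """
    i_t = sigmoid(k["xi"] * v_t + k["hi"] * p_prev + k["ci"] * c_prev + b["i"])
    f_t = sigmoid(k["xf"] * v_t + k["hf"] * p_prev + k["cf"] * c_prev + b["f"])
    # The memory unit is updated before the output gate, because the output
    # gate in Equation (4) depends on c_t rather than c_{t-1}.
    c_t = f_t * c_prev + i_t * math.tanh(k["xc"] * v_t + k["hc"] * p_prev + b["c"])
    o_t = sigmoid(k["xo"] * v_t + k["ho"] * p_prev + k["co"] * c_t + b["o"])
    p_t = o_t * math.tanh(c_t)
    return p_t, c_t
```

With all weights and biases set to zero, every gate equals 0.5, so the memory unit is halved at each step; this is a convenient sanity check of the update order.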

FIGURE 8 The analysis of model generalisation ability. (a) the errors of the 12 networks with 1–12, (b) the errors of the 12 networks with 1–12

TABLE 2 Mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE) and normalised RMSE (NRMSE) for 5 min–1 h (1–12 steps) ahead prediction of convolutional neural networks (CNN), LSTM and CNN-LSTM

| Time | eMAE (MW): CNN | eMAE (MW): LSTM | eMAE (MW): CNN-LSTM | eMAPE (%): CNN | eMAPE (%): LSTM | eMAPE (%): CNN-LSTM | eRMSE (MW): CNN | eRMSE (MW): LSTM | eRMSE (MW): CNN-LSTM | eNRMSE (%): CNN | eNRMSE (%): LSTM | eNRMSE (%): CNN-LSTM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 min | 0.648 | 0.510 | 0.452 | 1.309 | 1.030 | 0.911 | 1.483 | 1.038 | 0.836 | 2.996 | 2.097 | 1.689 |
| 10 min | 1.162 | 0.912 | 0.721 | 2.347 | 1.842 | 1.452 | 2.294 | 1.769 | 1.542 | 4.634 | 3.573 | 3.115 |
| 15 min | 1.701 | 1.398 | 1.288 | 3.436 | 2.824 | 2.478 | 2.873 | 2.564 | 1.978 | 5.804 | 5.179 | 3.996 |
| 20 min | 2.276 | 1.922 | 1.589 | 4.598 | 3.883 | 3.398 | 3.212 | 2.833 | 2.461 | 6.489 | 5.732 | 4.971 |
| 25 min | 2.832 | 2.310 | 1.984 | 5.721 | 4.667 | 3.934 | 3.791 | 3.263 | 2.968 | 6.91 | 6.63 | 6.68 |
| 30 min | 3.211 | 2.705 | 2.372 | 6.487 | 5.465 | 4.848 | 4.138 | 3.715 | 3.469 | 8.359 | 7.658 | 7.008 |
| 35 min | 3.731 | 3.013 | 2.705 | 7.537 | 6.087 | 5.752 | 4.684 | 4.173 | 3.735 | 9.462 | 8.430 | 7.545 |
| 40 min | 4.133 | 3.521 | 3.156 | 8.349 | 7.113 | 6.681 | 5.047 | 4.834 | 4.121 | 10.19 | 9.765 | 8.325 |
| 45 min | 4.590 | 3.995 | 3.601 | 9.273 | 8.071 | 7.897 | 5.392 | 5.087 | 4.327 | 10.89 | 10.27 | 8.741 |
| 50 min | 4.998 | 4.313 | 3.976 | 10.09 | 8.713 | 8.018 | 5.661 | 5.229 | 4.889 | 11.43 | 10.56 | 9.876 |
| 55 min | 5.679 | 4.645 | 4.089 | 11.47 | 9.383 | 8.791 | 5.993 | 5.564 | 5.025 | 12.10 | 11.64 | 10.15 |
| 60 min | 6.337 | 5.286 | 4.923 | 12.80 | 10.68 | 9.545 | 6.524 | 6.099 | 5.136 | 13.10 | 12.13 | 10.98 |
| Average errors | 3.442 | 2.879 | 2.573 | 6.948 | 5.813 | 5.316 | 4.258 | 3.864 | 3.398 | 8.53 | 7.80 | 6.92 |

3.2 Wind farm power forecasting model and error calculation method based on CNN-LSTM

The construction process of the wind power prediction model based on CNN-LSTM is shown in Figure 7.
The specific steps are as follows:

1. Data preprocessing
Normalisation ensures that no feature is weighted more heavily than another merely because its values have a larger magnitude. In this scenario, the wind direction would naturally be weighted more, and thus wind speeds and directions must be normalised.
Input features with different attributes have different correlations with wind power, and the orders of magnitude of the features are different. Therefore, the input features are processed by normalisation, and the wind power feature data is mapped to the range 0–1 to form the initial feature set [34]:

$$v_t' = \frac{v_t - \min\{v_j\}}{\max\{v_j\} - \min\{v_j\}}, \quad j = 1, 2, \ldots, k \tag{8}$$

where $v_t$ is the actual value of the input feature, $v_t'$ is the normalised input feature, and the number of samples is k.
The final wind power forecasting data shall be obtained by inverse normalisation according to Equation (9) [34]:

$$p_t^{pre} = p_t^{pre'} \times \left(\max\{p_j\} - \min\{p_j\}\right) + \min\{p_j\}, \quad j = 1, 2, \ldots, k \tag{9}$$

where $p_t^{pre}$ is the predicted value after inverse normalisation; $p_t^{pre'}$ is the predicted value of wind power before inverse normalisation, $p_t$ is the actual value of wind power, and the number of sample points is k.

FIGURE 9 The comparison for prediction results of next hour (12 steps) in four samples between CNN-LSTM, CNN and LSTM. (a) The 5000th sample, (b) the 10,000th sample, (c) the 15,000th sample, (d) the 20,000th sample

2. Model training
The normalised data is divided into a training set and a test set according to a ratio of 8:2. The training set is used for model training, and the error prediction is obtained by post-departure operation. The mean squared error (MSE) is used as the objective function, and the parameters of the whole model are updated from the back to the front by using the Adam optimisation algorithm to find a set of parameters that minimise the MSE. Adam is a first-order gradient-based algorithm for optimising stochastic objective functions, based on adaptive estimates of lower-order moments [35]. In order to

prevent overfitting, L2 parameter regularisation is used in the fully connected layer. The objective function with the regularisation term added is shown in Equation (10) [36]:

$$e_{\mathrm{MSE}} = \frac{1}{n}\sum_{t=1}^{n}\left(p_t^{\mathrm{pre}\prime} - p_t^{\prime}\right)^{2} + \frac{\alpha}{2}\, w^{T} w \tag{10}$$

where $p_t^{\prime}$ is the normalised wind power, $p_t^{\mathrm{pre}\prime}$ is its normalised forecast, $\alpha$ is the regularisation parameter, $w$ is the connection weight vector of the fully connected layer, and $n$ is the number of training samples.

In addition, random deactivation (dropout) is introduced. During learning, part of the weights or outputs of the hidden layer is randomly set to zero to reduce the interdependence between the hidden-layer neurons and thereby regularise the NN.

1. Evaluating index

The test set is used to evaluate the accuracy of the model. In this study, the MAE, the mean absolute percentage error (MAPE), the root mean square error (RMSE) and the normalised RMSE (NRMSE) are used as the evaluation indexes [36]:

$$e_{\mathrm{MAE}} = \frac{1}{N}\sum_{t=1}^{N}\left|p_t^{\mathrm{pre}} - p_t\right| \tag{11}$$

$$e_{\mathrm{MAPE}} = \frac{100\%}{N}\sum_{t=1}^{N}\left|\frac{p_t^{\mathrm{pre}} - p_t}{p_t}\right| \tag{12}$$

$$e_{\mathrm{RMSE}} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(p_t^{\mathrm{pre}} - p_t\right)^{2}} \tag{13}$$

$$e_{\mathrm{NRMSE}} = \frac{1}{P_N}\sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(p_t^{\mathrm{pre}} - p_t\right)^{2}} \tag{14}$$

Here, $N$ is the total number of samples in the forecasting series, $P_N$ is the rated installed capacity, and $p_t$ and $p_t^{\mathrm{pre}}$ are the actual and forecast values of the $t$-th wind power sample, respectively.

TABLE 3 Forecast power values of CNN, LSTM and CNN-LSTM for the four samples over the next hour (12 steps). All values are wind farm power in kW.

| Sample | Time (min) | Real values | CNN | LSTM | CNN-LSTM |
|---|---|---|---|---|---|
| 1 | 5 | 34,720.41 | 36,380.652 | 34,368.746 | 33,937.188 |
| | 10 | 31,119.301 | 35,480.43 | 33,185.14 | 32,741.406 |
| | 15 | 30,705.721 | 35,168.082 | 33,803.734 | 31,940.156 |
| | 20 | 29,876.85 | 35,083.605 | 34,133.953 | 32,985.223 |
| | 25 | 32,880.91 | 31,598.648 | 34,355.18 | 32,112.92 |
| | 30 | 31,392.381 | 33,140.293 | 33,629.34 | 31,194.94 |
| | 35 | 29,753.711 | 34,199.05 | 33,884.062 | 30,657.139 |
| | 40 | 30,271.99 | 34,769.38 | 32,084.904 | 32,161.428 |
| | 45 | 29,890.359 | 31,584.9 | 33,973.277 | 31,764.613 |
| | 50 | 31,280.33 | 34,558.62 | 33,142.387 | 31,635.4 |
| | 55 | 31,546.961 | 33,692.637 | 32,215.523 | 29,640.443 |
| | 60 | 29,098.99 | 33,436.055 | 34,129.875 | 31,754.287 |
| 2 | 5 | 12,716.36 | 9453.897 | 10,412.582 | 14,127.441 |
| | 10 | 11,395.87 | 10,747.163 | 9846.695 | 14,485.338 |
| | 15 | 9569.8 | 10,052.157 | 10,286.979 | 13,551.66 |
| | 20 | 11,200.85 | 10,841.17 | 10,920.691 | 14,157.615 |
| | 25 | 8012.09 | 10,622.405 | 10,959.345 | 14,323.592 |
| | 30 | 9911.2 | 9954.355 | 10,223.284 | 12,792.364 |
| | 35 | 9654.22 | 8679.003 | 9572.729 | 15,278.311 |
| | 40 | 18,176.539 | 10,885.018 | 9644.081 | 15,593.974 |
| | 45 | 16,207.82 | 11,181.835 | 10,554.634 | 16,554.01 |
| | 50 | 17,934.699 | 10,975.403 | 10,056.064 | 16,016.255 |
| | 55 | 18,670.631 | 10,002.928 | 11,194.049 | 14,377.775 |
| | 60 | 16,668.301 | 10,330.784 | 11,382.005 | 12,437.075 |
| 3 | 5 | 11,414.45 | 10,034.183 | 10,903.952 | 12,437.857 |
| | 10 | 13,295.92 | 10,047.826 | 10,967.445 | 12,544.46 |
| | 15 | 12,332.99 | 11,117.005 | 11,358.424 | 12,956.259 |
| | 20 | 11,913.91 | 11,434.607 | 11,600.791 | 13,085.642 |
| | 25 | 10,530.96 | 10,777.605 | 11,857.121 | 13,700.035 |
| | 30 | 9599.46 | 11,184.91 | 11,671.862 | 12,625.193 |
| | 35 | 8368.22 | 11,091.872 | 10,623.982 | 13,039.483 |
| | 40 | 7587.65 | 12,063.932 | 10,858.019 | 13,511.718 |
| | 45 | 6144.46 | 11,219.689 | 11,978.992 | 13,685.547 |
| | 50 | 5900.34 | 12,035.183 | 11,729.29 | 13,867.837 |
| | 55 | 6129.53 | 12,239.117 | 12,965.97 | 13,407.783 |
| | 60 | 6060.73 | 11,132.08 | 13,039.952 | 14,411.142 |
| 4 | 5 | 6933.62 | 3853.386 | 3707.6467 | 7190.1113 |
| | 10 | 7400.749 | 3690.3926 | 3814.5557 | 6733.379 |
| | 15 | 6943.53 | 4185.029 | 4076.5503 | 8260.487 |
| | 20 | 6922.69 | 4670.879 | 4390.6025 | 7143.3296 |
| | 25 | 6042.29 | 4080.2886 | 4210.668 | 8417.876 |
| | 30 | 5127.18 | 4113.7373 | 3863.0166 | 8129.9614 |
| | 35 | 2813.58 | 3262.7854 | 3324.9043 | 8510.74 |
| | 40 | 3953.06 | 4744.015 | 3969.952 | 9167.915 |
| | 45 | 5025.35 | 4008.8337 | 4438.9307 | 7972.131 |
| | 50 | 5289.72 | 5355.1353 | 3498.628 | 8273.88 |
| | 55 | 6619.679 | 5108.5923 | 3498.628 | 7546.9165 |
| | 60 | 6162.95 | 5946.305 | 5182.143 | 8813.828 |
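As a rough illustration of how the evaluation indexes in Equations (11)–(14) can be computed, the following minimal Python sketch applies them to the first three steps of Sample 1 in Table 3 (real values against the CNN-LSTM forecasts). The function name `evaluation_indexes` and the dictionary layout are hypothetical choices made for this sketch, and the rated capacity is taken as the 49.5 MW (49,500 kW) installed capacity reported for the wind farm. Three rows are far too few to reproduce the test-set errors of the study; the sketch only shows the arithmetic.

```python
import math


def evaluation_indexes(actual, forecast, rated_capacity):
    """Return MAE, MAPE (%), RMSE and NRMSE following Equations (11)-(14)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    n = len(actual)
    errors = [f - a for a, f in zip(actual, forecast)]
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the actual value, so it is undefined when an actual value is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # NRMSE is the RMSE divided by the rated installed capacity (a fraction, not a percentage).
    nrmse = rmse / rated_capacity
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "NRMSE": nrmse}


# Sample 1 of Table 3, steps at 5, 10 and 15 min (kW).
real = [34720.41, 31119.301, 30705.721]
cnn_lstm = [33937.188, 32741.406, 31940.156]
RATED_CAPACITY_KW = 49_500.0  # assumption: P_N equals the 49.5 MW installed capacity

scores = evaluation_indexes(real, cnn_lstm, RATED_CAPACITY_KW)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

MAE, RMSE and NRMSE are scale-dependent, whereas MAPE is relative to the actual power and therefore grows quickly when the real output is small, as in the later rows of Sample 4.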
4 EXAMPLE ANALYSIS

4.1 Sample processing and parameter settings

To verify the validity of the proposed model, data from a wind farm in China are used for wind power prediction. The wind farm has 33 wind turbines with a total installed capacity of 49.5 MW. It is located in low and medium mountain terrain that is undulating, so the turbines stand at different heights. The historical power values and meteorological factors of the target wind farm were measured throughout 2017 at a resolution of 5 min, giving 105,120 time points. After the missing data were deleted, 104,800 time points remained. After data reconstruction (12 time points per sample), there are 8732 samples in total. The first 6732 samples are taken as the training set and the remaining 2000 samples as the test set. The training set is used to calculate the gradient and update the weights, and the test set gives the final evaluation indexes.

The multi-model method is selected for multi-step prediction. With input features built from historical sequences of length 12, ultra-short-term forecasting of the wind farm power time series is carried out for horizons from 5 min to 1 h (steps 1–12). Twelve CNN-LSTM multi-step models with the same network structure share the same input, and each model is trained independently without interference. The programming platform and the hyperparameter settings are shown in Table 1.

4.2 Generalisation capability of the model

To prevent overfitting, the L2 regularisation coefficient and dropout are set so as to reduce network complexity and improve the generalisation ability of the model. The generalisation ability is analysed by comparing how the RMSE $e_{\mathrm{RMSE}}$ of the 5 min–1 h (1–12 step) forecasts on the training and test sets varies with the number of iterations, as shown in Figure 8. The training-set errors of all 12 networks with 1–12 predicted values decrease as the number of iterations increases, and the test-set errors still converge well; the model therefore has strong generalisation ability.

4.3 Comparison of forecasting performance between CNN, LSTM and CNN-LSTM

CNN and LSTM can significantly improve the forecasting accuracy of wind farm power compared with conventional machine learning algorithms such as ANN and SVM [37, 38]. Therefore, this study only analyses the advantages of CNN-LSTM over CNN and LSTM in extracting spatial correlation features of multivariate time series. The CNN and LSTM networks use the same input data to predict the wind farm power for the next hour. The prediction errors $e_{\mathrm{MAE}}$, $e_{\mathrm{MAPE}}$, $e_{\mathrm{RMSE}}$ and $e_{\mathrm{NRMSE}}$ of the CNN, LSTM and CNN-LSTM models on the test set for the next hour (12 steps) are compared in Table 2.

Table 2 shows that the $e_{\mathrm{MAE}}$, $e_{\mathrm{MAPE}}$, $e_{\mathrm{NRMSE}}$ and $e_{\mathrm{RMSE}}$ of the CNN-LSTM model are lower than those of the CNN and LSTM models at every step in the next hour. The average MAE, MAPE, RMSE and NRMSE of the overall model decrease by 33.77%, 30.69%, 25.3% and 23.3% compared with CNN, and by 12.0%, 10.6%, 14% and 12.7% compared with LSTM, respectively. The wind farm power forecasts of the three algorithms are compared with the real power values of the test set in Figure 9. Four samples of next-hour (12-step) prediction results are randomly selected, and their power values are listed in Table 3. Figure 9 shows that CNN-LSTM forecasts better than CNN and LSTM.

5 CONCLUSIONS

Considering the temporal and spatial correlation of meteorological factors at different sites throughout the wind farm, a novel STCM based on CNN-LSTM for ultra-short-term wind power prediction is proposed. To extract these temporal and spatial correlations in depth, a new data reconstruction method is proposed, which saves data processing time while extracting spatio-temporal correlations for multi-step wind power prediction. Specifically, the wind speeds and wind directions measured by benchmark wind turbines are reconstructed and used as the input of the model. The STCM-based wind farm power forecasting model and its error calculation method are presented. To verify the accuracy and superiority of the proposed model, the wind speeds and wind directions measured by four benchmark wind turbines and the wind power data from a wind farm in China are used in the experiments. The effectiveness and superiority of the proposed CNN-LSTM model are verified by comparing four evaluation metrics with those of CNN and LSTM used individually. The experimental results show that the average MAE, MAPE, RMSE and NRMSE of the overall model decrease by 33.77%, 30.69%, 25.3% and 23.3% compared with CNN, and by 12.0%, 10.6%, 14% and 12.7% compared with LSTM, respectively. The proposed STCM based on CNN-LSTM for multi-step wind power forecasting fully considers the spatio-temporal correlation of meteorological factors throughout the wind farm and can forecast the wind farm power more accurately.

However, some shortcomings remain to be addressed. The different degrees to which the meteorological factors of different sites contribute to the wind farm power need to be explored further. Assigning different weights to factors according to their contribution during learning will help further improve the accuracy of deep-learning-based wind power forecasting, and this is the next research focus. In addition, the optimiser algorithm of the NN needs to be improved.
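The sample counts reported in Section 4.1 can be checked with a short sketch. The source states only that each sample holds 12 time points, so the windowing below is an assumption: each sample pairs 12 consecutive input points with the following 12 points as the 1–12 step targets, and consecutive samples are shifted by 12 points. Under that reading, 104,800 time points yield exactly the 8732 samples reported. The function `make_samples` and its parameters are hypothetical names, the series is a stand-in of integer indices rather than real measurements, and gaps left by the deleted missing data are ignored.

```python
def make_samples(series, n_in=12, n_out=12, stride=12):
    """Cut a series into (input window, target window) pairs.

    n_in, n_out and stride are assumptions; the source only states
    that each sample contains 12 time points.
    """
    samples = []
    last_start = len(series) - n_in - n_out  # last index where a full pair still fits
    for start in range(0, last_start + 1, stride):
        inputs = series[start:start + n_in]
        targets = series[start + n_in:start + n_in + n_out]
        samples.append((inputs, targets))
    return samples


# Stand-in for the 104,800 cleaned 5-min readings (indices, not real data).
series = list(range(104_800))
samples = make_samples(series)

# Chronological split reported in Section 4.1: first 6732 for training, the rest for testing.
train, test = samples[:6732], samples[6732:]
print(len(samples), len(train), len(test))
```

Keeping the split chronological, rather than shuffling before splitting, means the test samples all come from later in the year than the training samples.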
REFERENCES

1. Hu, Q., et al.: Short-term wind speed or power forecasting with heteroscedastic support vector regression. IEEE Trans. Sustainable Energy 7(1), 241–249 (2015)
2. Feng, S., et al.: Study on the physical approach to wind power prediction. Proc. CSEE 30(2), 1–6 (2010)
3. Xue, Y., et al.: A review on short-term and ultra-short-term wind power prediction. Autom. Electr. Power Syst. 39(6), 141–151 (2015)
4. Xiu, C., et al.: Short-term prediction method of wind speed series based on Kalman filtering fusion. Trans. China Electrotech. Soc. 29(2), 253–259 (2014)
5. Sun, G., Wei, Z., Zhai, W.: Short term wind speed forecasting based on RVM and ARMA error correcting. Trans. China Electrotech. Soc. 27(8), 187–193 (2012)
6. Quan, H., Srinivasan, D., Khosravi, A.: Short-term load and wind power forecasting using neural network-based prediction intervals. IEEE Trans. Neural Networks Learn. Syst. 25(2), 303–315 (2013)
7. Santamaría-Bonfil, G., Reyes-Ballesteros, A., Gershenson, C.: Wind speed forecasting for wind farms: A method based on support vector regression. Renewable Energy 85, 790–809 (2016)
8. Lee, D., Baldick, R.: Short-term wind power ensemble prediction based on Gaussian processes and neural networks. IEEE Trans. Smart Grid 5(1), 501–510 (2013)
9. Haque, A.U., Nehrir, M.H., Mandal, P.: A hybrid intelligent model for deterministic and quantile regression approach for probabilistic wind power forecasting. IEEE Trans. Power Syst. 29(4), 1663–1672 (2014)
10. Noorollahi, Y., Jokar, M.A., Kalhor, A.: Using artificial neural networks for temporal and spatial wind speed forecasting in Iran. Energy Convers. Manage. 115, 17–25 (2016)
11. Zhao, Z., et al.: LSTM network: A deep learning approach for short-term traffic forecast. IET Intel. Transport Syst. 11(2), 68–75 (2017)
12. Lu, Z., et al.: New internet based operation pattern design of wind power forecasting system. Power Syst. Technol. 40(1), 125–131 (2016)
13. Zhang, C., et al.: Predictive deep Boltzmann machine for multiperiod wind speed forecasting. IEEE Trans. Sustainable Energy 6(4), 1416–1425 (2015)
14. Hu, Q., Zhang, R., Zhou, Y.: Transfer learning for short-term wind speed prediction with deep neural networks. Renewable Energy 85, 83–95 (2016)
15. Wang, H., et al.: Deep belief network based deterministic and probabilistic wind speed forecasting approach. Appl. Energy 182, 80–93 (2016)
16. Abedinia, O., Amjady, N.: Short-term wind power prediction based on Hybrid Neural Network and chaotic shark smell optimization. Int. J. Precis. Eng. Manuf. Green Technol. 2, 245–254 (2015)
17. Amjady, N., Abedinia, O.: Short term wind power prediction based on improved kriging interpolation, empirical mode decomposition, and closed-loop forecasting engine. Sustainability 9(11), 2104 (2017)
18. Abedinia, O., et al.: Shafie-khah, M., Catalão, J.P.S., Improved EMD-based complex prediction model for wind power forecasting. IEEE Trans. Sustainable Energy 11(4), 2790–2802 (2020)
19. Yao, G., Lei, T., Zhong, J.: A review of convolutional-neural-network-based action recognition. Pattern Recognit. Lett. 118, 14–22 (2019)
20. Wang, H., et al.: Deep learning based ensemble approach for probabilistic wind power forecasting. Appl. Energy 188, 56–70 (2017)
21. Mi, X., Liu, H., Li, Y.: Wind speed prediction model using singular spectrum analysis, empirical mode decomposition and convolutional support vector machine. Energy Convers. Manage. 180, 196–205 (2019)
22. Liu, H., Mi, X., Li, Y.: Wind speed forecasting method based on deep learning strategy using empirical wavelet transform, long short term memory neural network and Elman neural network. Energy Convers. Manage. 156, 498–514 (2018)
23. Shi, Z., Liang, H., Dinavahi, V.: Direct interval forecast of uncertain wind power based on recurrent neural networks. IEEE Trans. Sustainable Energy 9(3), 1177–1187 (2017)
24. Zhu, Q., et al.: Short-term wind power forecasting based on LSTM. Power Syst. Technol. 41(12), 3797–3802 (2017)
25. Chen, Y., et al.: Multifactor spatio-temporal correlation model based on a combination of convolutional neural network and long short-term memory neural network for wind speed forecasting. Energy Convers. Manage. 185, 783–799 (2019)
26. Liu, H., Mi, X., Li, Y.: Smart deep learning based wind speed prediction model using wavelet packet decomposition, convolutional neural network and convolutional long short term memory network. Energy Convers. Manage. 166, 120–131 (2018)
27. Ak, R., Vitelli, V., Zio, E.: An interval-valued neural network approach for uncertainty quantification in short-term wind speed prediction. IEEE Trans. Neural Networks Learn. Syst. 26(11), 2787–2800 (2015)
28. LeCun, Y., et al.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)
29. LeCun, Y., Bottou, L.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
30. Huang, H., et al.: Pure electric vehicle nonstationary interior sound quality prediction based on deep CNNs with an adaptable learning rate tree. Mech. Syst. Sig. Process. 148, 107170 (2021)
31. Li, Y., et al.: Small-signal stability assessment of power system based on convolutional neural network. Autom. Electr. Power Syst. 43(2), 50–57 (2019)
32. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
33. Feng, X., Qiu, L., Guo, X.: Student feedback text academic emotion recognition method based on LSTM model. Open Educ. Res. 2, 1–13 (2019)
34. Lin, Z., Liu, X.: Wind power forecasting of an offshore wind turbine based on high-frequency SCADA data and deep learning neural network. Energy 201, 117693 (2020)
35. Kingma, D.P., Ba, J.L.: Adam: A method for stochastic optimization. In: International Conference on Learning Representations, San Diego, USA, pp. 1–15 (2015)
36. Zang, H., et al.: Short-term global horizontal irradiance forecasting based on a hybrid CNN-LSTM model with spatiotemporal correlations. Renewable Energy 160, 26–41 (2020)
37. Wang, K., Qi, X., Liu, H.: Photovoltaic power forecasting based LSTM-Convolutional Network. Energy 189, 116225 (2019)
38. Qin, Y., et al.: Hybrid forecasting model based on long short term memory network and deep learning neural network for wind signal. Appl. Energy 236, 262–272 (2019)

How to cite this article: Wu Q, Guan F, Lv C, Huang Y. Ultra-short-term multi-step wind power forecasting based on CNN-LSTM. IET Renewable Power Generation. 2021;15:1019–1029. https://doi.org/10.1049/rpg2.12085
