
Comparative performance study of DBN, LSTM, CNN and SAE models for wind speed and direction forecasting

1st Régis Donald HONTINFINDE*
Laboratory of Science Engineering and Mathematics (LSIMA)
University of Abomey (UNSTIM)
Abomey, Republic of Benin
donald.hontinfinde@yahoo.com
*Corresponding author

2nd K. B. Audace DIDAVI
Department of Electrical Engineering
Polytechnic School of Abomey-Calavi (EPAC)
Cotonou, Republic of Benin
didavia@gmail.com

3rd Faith Y. Perseverance J. HOUESSOU
Laboratory of Science Engineering and Mathematics (LSIMA)
University of Abomey (UNSTIM)
Abomey, Republic of Benin
faith03houessou@gmail.com

4th Macaire B. AGBOMAHENA
Department of Electrical Engineering
Polytechnic School of Abomey-Calavi (EPAC)
Cotonou, Republic of Benin
agbomac@yahoo.fr

Abstract—

Index Terms—Genetic Algorithm, Cloud Computing, Telecommunications, Greedy Algorithm, Distributed Computing, Makespan, Optimization, Intelligent Networking

I. INTRODUCTION

Africa's favorable climatic and geographical conditions give it considerable renewable energy potential. Renewable energies include solar, wind, hydro, biomass and geothermal power, with theoretical annual production estimated at 1,449,772 TWh/year, 978,066 TWh/year, 1,478 TWh/year, 2,374 TWh/year and 105 TWh/year respectively [1]. Africa enjoys abundant sunshine in many regions, so solar photovoltaic panels can be widely deployed to produce electricity. Many parts of Africa also have considerable wind energy potential, particularly along the coast and in desert areas, which can be harnessed using wind turbines. All this theoretical renewable potential should enable the whole of Africa to have access to electricity.

However, despite this significant potential, many African countries still face relatively low rates of access to electricity. In 2021, 567 million people in Sub-Saharan Africa had no access to electricity, representing more than 80% of the world's population without access [2], [3]. The lack of access in the region remained almost identical to the situation observed in 2010. In fact, apart from investment issues, the integration of renewable energies into existing energy systems presents a number of challenges: these sources are generally time- and space-dependent and have low generation capacity and low energy efficiency. As a result, their integration (mainly solar photovoltaic and wind power) complicates grid balancing, degrades grid reliability and increases the cost of electricity due to technical constraints [4]–[6].

Faced with this random variation in the power generated by solar photovoltaic and wind power sources, several solutions can be envisaged, including forecasting the variability of output power [7]–[9]. Several forecasting models have therefore been developed: the persistence method [10], physical methods [11]–[17] and statistical methods, which are generally more accurate and include artificial intelligence techniques such as the Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Stacked Autoencoder (SAE) and Deep Belief Network (DBN) [18]–[30]. In this paper, we propose a comparative study of Deep Learning models for forecasting wind speed variability, with the aim of efficiently controlling wind generation. The second section describes the methodology adopted step by step, together with the hardware used. The third section presents the results obtained and concludes the paper.
II. METHOD

A. Research chart presentation

Figure 1 below shows the research chart followed in this work. The aim is to come up with an effective model for predicting mainly the variability of wind speed, but also that of its direction. To achieve this, the approach begins with bibliographic and bibliometric reviews, enabling us to identify the Deep Learning models most widely used for this task and to propose techniques for data pre-processing and model evaluation. This is followed by data acquisition and processing, optimization of model hyperparameters, modeling and forecasting. Finally, the performance of the selected models is compared to identify the best performer.

Fig. 1. Research chart

B. Literature review on Deep Learning models used for predicting wind power
The number of publications reporting on deep learning-based methods has grown rapidly in recent years. This is illustrated in Figure 2-a, which shows the number of articles on deep learning-based wind energy research found in the SCOPUS database for the period 2014-2023 (the last ten years). Most of these articles were published in the last three years: 4292 articles out of a total of 7215 published from January 2014 to February 2023. This exponential growth shows the importance of forecasting and the current interest in it within the scientific community.

Recently, interest in hybrid models has increased: almost half of the articles published in SCOPUS in 2022 concern hybrid forecasting models. This trend is explained by the superiority of these models over simple deep learning models, as shown by the comparative experiments reported in those studies. These hybrid models are either a combination of several Deep Learning models, or a combination of a Deep Learning model and a data processing technique. Among the hybrid models encountered, the majority include CNNs, LSTMs, DBNs, SAEs or GRUs. RNNs and all their newer variants, such as LSTM and GRU, are the most popular models after hybrid models, which makes sense because renewable energy data are time series, and RNN models are known for their ability to extract temporal features. This is why researchers use them alone or in combination with other methods. Figure 2-b shows the percentage use of each deep learning architecture in papers published in SCOPUS in 2022 (for wind power forecasting). In this study, we have therefore selected the CNN, LSTM, DBN and SAE models for a comparative study, combined with data processing methods.

Fig. 2. (a) Number of publications reporting on deep learning-based methods in SCOPUS, (b) Deep learning models for wind energy forecasting used in 2022 publications in SCOPUS.
C. Data acquisition and pre-processing
Once the meteorological parameters to be studied (wind speed and direction) were known, a database was identified that could provide the related data for all the samples. For our work, NASA's POWER — Data Access Viewer database was identified and exploited [31]. Information on the data downloaded from it is presented below:
• Period: January 1, 2012 to December 31, 2021;
• Satellite: MERRA-2;
• Resolution: (1/2)° × (5/8)° latitude/longitude;
• Longitude: 2.3855;
• Latitude: 6.3471;
• Data step: hourly;
• Parameters:
  – Ws: wind speed at 50 meters above ground (m/s);
  – Wd: wind direction at 50 meters above ground (°).

To obtain high-performance forecasting models, input data usually needs to be pre-processed. This is the most important step, and usually the largest part of a forecasting task, as a forecasting model only draws information from the data it is presented with. Data pre-processing is the process of preparing raw data and making it suitable for a machine learning model; it is the crucial first step in creating one. In our proposed data processing technique, we extract the most important frequencies in the data to integrate the concept of periodicity into our dataset and thus increase model accuracy. Since we are dealing here with meteorological time series, they often have clear daily or annual periodicity, for example. To facilitate forecasting, it is important to make the models understand these notions of frequency. To this end, the Fast Fourier Transform (FFT), computed in Python 3, is used to determine the important frequencies in these series.
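As an illustration of this step, the sketch below shows how dominant periods could be extracted with NumPy's real FFT. The synthetic hourly series and the choice of the three strongest peaks are assumptions for the example, not the paper's exact procedure.

import numpy as np

# Hypothetical hourly wind-speed series standing in for the POWER data:
# a daily (24 h) cycle plus noise.
hours = np.arange(24 * 365)
ws = 5 + 2 * np.sin(2 * np.pi * hours / 24) + np.random.normal(0, 0.5, hours.size)

# Real FFT of the de-meaned series; frequencies are in cycles per hour.
spectrum = np.abs(np.fft.rfft(ws - ws.mean()))
freqs = np.fft.rfftfreq(ws.size, d=1.0)
spectrum[0] = 0.0  # ignore the DC component

# The strongest peaks give the important periods T (in hours).
top = np.argsort(spectrum)[-3:][::-1]
print(1.0 / freqs[top])  # e.g. ~24 h for the daily cycle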
It is seldom possible to exploit parameters and their data as downloaded or acquired in a forecasting task. Typically, these parameters are either transformed, combined or supplemented to create a new dataset that gives the models a precise understanding of the target information. The first variables added to the dataset are those created from the significant frequencies identified earlier. For each significant frequency, the two variables described by equations (1) and (2) are added:

xsin = sin(t ∗ 2π/T) (1)

xcos = cos(t ∗ 2π/T) (2)

where xsin and xcos are the newly added variables, t is the time index and T is the identified important period.
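A minimal sketch of how these two variables could be added with pandas, assuming an hourly time index t and, for illustration only, daily (24 h) and yearly (8766 h) periods:

import numpy as np
import pandas as pd

df = pd.DataFrame({"t": np.arange(1000, dtype=float)})  # hours since start

# One (x_sin, x_cos) pair per important period T, following eqs. (1)-(2).
for T in (24.0, 8766.0):  # illustrative periods; the actual ones come from the FFT
    df[f"sin_{int(T)}h"] = np.sin(df["t"] * 2 * np.pi / T)
    df[f"cos_{int(T)}h"] = np.cos(df["t"] * 2 * np.pi / T)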
In our case, it’s also important to make the models understand hidden state.
the concept of wind direction. Wind directions are angles, and • Input Gate: This gate decides what new information
models need to understand that 0° and 360° are identical. should be added to the cell state. It consists of two parts:
What’s more, wind direction isn’t very useful if the wind – A sigmoid layer, which decides which input values
is weak. The two variables (wind speed and direction) are are to be updated.
therefore combined to create two new variables, as shown in – A tanh layer that creates a vector of new candidate
equations (3) and (4). values, which could be added to the cell state.
• Output Gate: Based on the cell state and the current
W D1 = Ws sin(Wd ) (3)
input, this gate decides on the hidden state output for the
W D2 = Ws cos(Wd ) (4) current sequence.
W D1 and W D2 are the two new variables created, Ws Mathematically, the operations in an LSTM are defined as
wind speeds in m/s and Wd wind directions in radian. follows, xt is the input at time t, ht is the hidden state at time
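The transformation of equations (3) and (4) could be implemented as below. The POWER data deliver direction in degrees, so the sketch converts to radians first (an assumption about the raw units):

import numpy as np

def wind_to_components(ws, wd_deg):
    """Combine wind speed (m/s) and direction into the WD1/WD2 vector
    components of eqs. (3)-(4); 0 deg and 360 deg map to the same point."""
    wd = np.deg2rad(wd_deg)
    return ws * np.sin(wd), ws * np.cos(wd)

wd1, wd2 = wind_to_components(np.array([3.2, 5.0]), np.array([0.0, 360.0]))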
Once data cleansing has been carried out and new variables added, the data must be consolidated in other forms by modifying their value, structure or format, using data transformation strategies such as normalization or standardization. Normalization, often simply called min-max scaling, reduces the extent of the data so that it lies between 0 and 1 (or between -1 and 1 if there are negative values). It is most effective in cases where standardization does not work as well: if the distribution is not Gaussian, or if the standard deviation is very small, min-max scaling works better. Normalization is generally performed using the following equation:

xnorm = (x − min(x)) / (max(x) − min(x)) (5)

Finally, the dataset is split. The aim of splitting time-series data is similar to that of random splitting, namely to validate the predictability of the model irrespective of how the training and test datasets are divided. However, time-series splitting ensures that the test samples are more recent (or older) than the training samples, which is more realistic since we will not be able to train on "future" data. We have opted for a split into three parts: Training, Validation and Test, with 70%, 20% and 10% respectively.
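A sketch of both steps, with the scaler fitted on the training split only (a common-practice assumption, not stated explicitly in the paper):

import numpy as np

def chronological_split(data, train=0.7, val=0.2):
    """70/20/10 time-ordered split: no shuffling, so the test samples
    are strictly more recent than the training samples."""
    n = len(data)
    i, j = int(n * train), int(n * (train + val))
    return data[:i], data[i:j], data[j:]

def minmax_scale(train, *others):
    """Eq. (5), applied column-wise with the training min/max."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return [(d - lo) / span for d in (train, *others)]

train, val, test = chronological_split(np.random.rand(1000, 4))
train_n, val_n, test_n = minmax_scale(train, val, test)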
D. Modeling and hyperparameter optimization

1) Modeling: As indicated in section II-B, the models selected for this study are LSTM (Long Short-Term Memory), CNN (Convolutional Neural Network), DBN (Deep Belief Network) and SAE (Stacked Autoencoder).

Long Short-Term Memory
A Long Short-Term Memory recurrent neural network (LSTM) is a specialized type of recurrent neural network (RNN) capable of capturing long-term temporal dependencies in sequential data. Unlike traditional RNNs, LSTMs are designed to solve the vanishing gradient problem, enabling them to handle longer sequences and better capture long-term relationships in the data. The structure of an LSTM model is composed of three main gates:
• Forget Gate: This gate decides what information from the previous cell state is to be forgotten or retained, based on the sequence's current input and previous hidden state.
• Input Gate: This gate decides what new information should be added to the cell state. It consists of two parts:
  – A sigmoid layer, which decides which input values are to be updated.
  – A tanh layer, which creates a vector of new candidate values that could be added to the cell state.
• Output Gate: Based on the cell state and the current input, this gate decides on the hidden state output for the current sequence.

Mathematically, the operations in an LSTM are defined as follows, where xt is the input at time t, ht is the hidden state at time t, ct is the cell state at time t, σ is the sigmoid function, and tanh is the hyperbolic tangent function.
• Forget gate calculation (ft): ft = σ(Wf ∗ [ht−1, xt] + bf)
• Input gate calculation (it): it = σ(Wi ∗ [ht−1, xt] + bi), c′t = tanh(Wc ∗ [ht−1, xt] + bc)
• Cell state update (ct): ct = ft ∗ ct−1 + it ∗ c′t
• Output gate calculation (ot): ot = σ(Wo ∗ [ht−1, xt] + bo)
• Hidden state calculation (ht): ht = ot ∗ tanh(ct)
In these equations, Wf, Wi, Wc and Wo are the weight matrices for the different gates, and bf, bi, bc and bo are the corresponding bias vectors.
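In Keras, such a model can be expressed in a few lines. The window length, feature count and 64 units below are placeholders, since the paper obtains the actual values from the hyperparameter search described later:

import tensorflow as tf

# Minimal LSTM forecaster: windows of 24 hourly steps with 6 assumed
# features (e.g. Ws, WD1, WD2 and periodic variables), one-step output.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(24, 6)),
    tf.keras.layers.LSTM(64),   # gates f_t, i_t, o_t as defined above
    tf.keras.layers.Dense(1),   # predicted wind speed
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")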
Convolutional Neural Network
A convolutional neural network (CNN or ConvNet) is a type of deep neural network specially designed to process gridded data, such as images or time sequences. A CNN model is structured as follows:
• Convolution layer: The first layer of a CNN is generally a convolution layer. It applies a number of filters (kernels) to the input data to extract relevant features. Each filter traverses the data and produces an "activation map" by applying a convolution operation. Let I be the input data, K the convolution filter, and B the associated bias. The convolution operation for an output feature O is given by:

O(i, j) = Σm Σn I(i + m, j + n) ∗ K(m, n) + B (6)

• Activation function: After each convolution operation, an activation function (such as ReLU, the Rectified Linear Unit) is applied element by element to the resulting activation map. This introduces non-linearity into the model, enabling it to learn complex patterns.

ReLU(x) = max(0, x) (7)

• Pooling layer: After certain convolution layers, pooling operations (such as max-pooling) are performed to reduce the spatial dimension of the activation map and the number of parameters in the model, while retaining the essential features. For a given region of the activation map, max-pooling retains the maximum value:

Max-Pooling(X) = max(region) (8)

• Fully Connected (Dense) layers: One or more fully connected layers can follow the convolution and pooling layers. These layers perform linear operations on the extracted features to carry out the final classification or other tasks. Let X be the output of the previous layer, W the weight matrix and b the associated bias. The fully connected operation is given by:

Y = XW + b (9)

• Output layer: The last layer of the CNN produces the output results, usually using an appropriate activation function (such as softmax for classification) to obtain probabilities or class scores.
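For time series, the convolution of eq. (6) becomes one-dimensional. A comparable sketch, with placeholder filter counts and layer sizes rather than the paper's tuned values:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(24, 6)),
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),  # eqs. (6)-(7)
    tf.keras.layers.MaxPooling1D(pool_size=2),                     # eq. (8)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),                  # eq. (9)
    tf.keras.layers.Dense(1),                                      # output layer
])
model.compile(optimizer="adam", loss="mse")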
Deep Belief Network
A Deep Belief Network (DBN) is a type of probabilistic neural network. It consists of several layers of neurons, each layer being connected to the previous and next layers. DBNs are used in tasks such as classification, regression, data generation and other machine learning problems. They are based on Boltzmann machines. A DBN consists of two main kinds of layers:
• Visible layer: The first layer of the network, called the visible layer, represents the model inputs. These inputs can be binary or real variables.
• Hidden layers: The intermediate layers are called hidden layers. They are organized into several levels, with each level connected to the visible and other hidden layers. The neurons in these layers are generally binary, representing features extracted from the input data.
Mathematically, a DBN can be represented as follows:
• Restricted Boltzmann Machine (RBM): Each pair of adjacent layers in a DBN is modeled as an RBM, a probabilistic model with a visible layer v and a hidden layer h. Activations of the visible layer are denoted v = (v1, v2, ..., vn) and those of the hidden layer are denoted h = (h1, h2, ..., hm). The energy of a particular state (v, h) in an RBM is given by:

E(v, h) = − Σi Σj Wij vi hj − Σi bi vi − Σj cj hj, with i = 1..n and j = 1..m

where Wij are the weights between the visible units vi and hidden units hj, and bi and cj are the biases of the visible and hidden units respectively.
• Top-down propagation (fine-tuning): Once the RBMs have been pre-trained layer by layer, the network is usually fine-tuned using a technique such as gradient back-propagation to adjust the weights and biases of the entire network, thus optimizing model performance for the specific task.
In short, a DBN is a deep neural network composed of visible and hidden layers, where each pair of adjacent layers is modeled as an RBM. These RBMs are pre-trained in an unsupervised way, then the overall network is fine-tuned for specific tasks using supervised learning techniques.
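The energy function above is straightforward to evaluate directly. A small NumPy transcription, with random weights and states that are purely illustrative:

import numpy as np

def rbm_energy(v, h, W, b, c):
    """E(v, h) = -v.W.h - b.v - c.h for one visible/hidden layer pair."""
    return -(v @ W @ h) - b @ v - c @ h

rng = np.random.default_rng(0)
v = rng.integers(0, 2, size=5).astype(float)  # binary visible units
h = rng.integers(0, 2, size=3).astype(float)  # binary hidden units
print(rbm_energy(v, h, rng.normal(size=(5, 3)), np.zeros(5), np.zeros(3)))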
Stacked Autoencoder
Stacked Autoencoders (SAEs) are built from the classic autoencoder used for unsupervised learning and feature extraction. A sparsity (parsimony) constraint is often imposed on the internal representation of the data, meaning that only a few units/neurons are activated at any one time. This can lead to a better representation of the data, highlighting important features and reducing dimensionality. An SAE typically consists of:
• Encoder: The encoder transforms the input data into an internal representation (encoding) using a layer of neurons called the encoding layer. This layer is generally narrower than the input layer, resulting in information compression.
• Decoder: The decoder reconstructs the input data from the internal representation obtained from the encoder. The aim of the decoder is to minimize the reconstruction error between the input data and the reconstructed data.
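A minimal encoder/decoder pair in Keras. The 8-unit bottleneck is an assumed size; for the stacked variant, several such autoencoders would be pre-trained and their encoders stacked before fine-tuning:

import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(24,))
encoded = tf.keras.layers.Dense(8, activation="relu")(inputs)  # encoder
decoded = tf.keras.layers.Dense(24)(encoded)                   # decoder
autoencoder = tf.keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")  # reconstruction error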
2) Hyperparameter optimization: As presented previously, the LSTM, CNN, DBN and SAE networks are made up of several layers whose configuration can influence their learning performance. Thus, for a given forecasting task, the optimal parameterization achieving good forecasting performance with reduced computation time must be sought. In general, building an efficient machine learning model is a complex and tedious process, involving determining the appropriate algorithm and obtaining an optimal model architecture by tuning its hyperparameters (HP). To build an optimal model, we need to explore a range of possibilities. Table 1 below presents the model hyperparameters and their variation ranges:

Table 1. Model hyperparameters and their variation ranges

• We vary the input data size between 2h and 24h;
• We vary the forecast horizon between 1h and 23h in steps of 1 hour;
• The number of units in the LSTM, CNN and SAE models
is varied between 10 and 1000 in steps of 10;
• The number of filters in the CNN model ranges from 10
to 300 in steps of 10;
• The number of hidden layers in the DBN model ranges
from 1 to 6 in steps of 1;
• The size of the hidden layers in the DBN model is varied
between 32 and 260 in steps of 32;
• Three learning rates are tested: 10−2, 10−3 and 10−4.
Hyperparameter optimization is performed using the Keras Tuner library in Python 3 [32].
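A sketch of how such a search could be set up with Keras Tuner, mirroring the ranges listed above for the LSTM case; the random-search strategy and trial count are assumptions for illustration:

import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    # Search space following the ranges above: units 10-1000 in steps
    # of 10, and the three candidate learning rates.
    units = hp.Int("units", min_value=10, max_value=1000, step=10)
    lr = hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(24, 6)),
        tf.keras.layers.LSTM(units),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr), loss="mse")
    return model

tuner = kt.RandomSearch(build_model, objective="val_loss", max_trials=20)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val))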
E. Model training

Once the optimal parameters of the models have been determined, the models are trained on the dataset. They are developed and trained using TensorFlow under Python 3. TensorFlow is a leading open-source platform for machine learning. It features a comprehensive and flexible ecosystem of tools, libraries and community resources that enable researchers to advance machine learning and developers to easily create and deploy machine learning-based applications [33].
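A training sketch using the Training and Validation splits defined earlier; the random stand-in arrays, epoch count and early-stopping callback are illustrative assumptions, not the paper's reported setup:

import numpy as np
import tensorflow as tf

# Stand-ins for the prepared Training/Validation splits (70%/20%).
x_train, y_train = np.random.rand(700, 24, 6), np.random.rand(700, 1)
x_val, y_val = np.random.rand(200, 24, 6), np.random.rand(200, 1)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(24, 6)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Early stopping restores the weights from the best validation epoch.
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=50, batch_size=32,
          callbacks=[tf.keras.callbacks.EarlyStopping(
              patience=5, restore_best_weights=True)])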
F. Model performance evaluation
III. CONCLUSION

REFERENCES

[1] N. Funabiki, K. S. Lwin, Y. Aoyagi, M. Kuribayashi, and W.-C. Kao, "A user-PC computing system as ultralow-cost computation platform for small groups," Appl. Theory Comput. Tech., vol. 2, no. 3, pp. 10-24, 2017.
[2] H. Htet, N. Funabiki, A. Kamoyedji, and M. Kuribayashi, "Design and implementation of improved user-PC computing system," IEICE Technical Report, vol. 120, no. 69, pp. 37-42, 2020.
[3] N. Funabiki, Y. Asai, Y. Aoyagi, and W.-C. Kao, "A worker implementation on Windows OS for user-PC computing system," 2015 Third International Symposium on Computing and Networking (CANDAR), Sapporo, Japan, pp. 279-282, 2015.
[4] A. Kamoyedji, N. Funabiki, H. Htet, and M. Kuribayashi, "A proposal of static job scheduling algorithm considering CPU core utilization for user-PC computing system," 2021 9th International Conference on Information and Education Technology (ICIET), Okayama, Japan, pp. 374-379, 2021.
[5] N. Funabiki, Y. Aoyagi, M. Kuribayashi, and W.-C. Kao, "Worker PC performance measurements using benchmarks for user-PC computing system," 2016 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW), Taiwan, pp. 1-2, 2016.
[6] T. Xie, A. Sung, and X. Qin, "Dynamic task scheduling with security awareness in real-time systems," in Proc. IEEE International Parallel and Distributed Processing Symposium, pp. 8-15, 2005.