
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2019.2929542, IEEE Access

Deep Belief Networks with Genetic Algorithms in Forecasting Wind Speed

KUO-PING LIN 1,2, Senior Member, IEEE, PING-FENG PAI 3, Senior Member, IEEE, AND YI-JU TING 3

1 Institute of Innovation and Circular Economy, Asia University, 500, Lioufeng Rd., Wufeng, Taichung 41354, Taiwan
2 Department of Industrial Engineering and Enterprise Information, Tunghai University, No. 1727, Sec. 4, Taiwan Boulevard, Taichung 40704, Taiwan
3 Department of Information Management, National Chi Nan University, 1 University Rd., Puli, Nantou 54561, Taiwan

Corresponding author: Ping-Feng Pai (e-mail: paipf@ncnu.edu.tw).

This work was financially supported in part by the Ministry of Science and Technology of Taiwan, Republic of China, under Contract Numbers MOST 103-2410-H-260-020, MOST 104-2410-H-260-018, MOST 105-2410-H-260-017-MY2 and MOST 107-2410-H-260-012.

ABSTRACT Wind is one of the most essential sources of clean, renewable energy, and is therefore a critical
element in responsible power consumption and production. The accurate prediction of wind speed plays a
key role in decision-making and management in wind power generation. This study proposes a model using
a deep belief network with genetic algorithms (DBNGA) for wind speed forecasting. The genetic algorithms
are used to determine parameters for deep belief networks. Wind speed and weather-related data are collected
from Taiwan’s Central Weather Bureau for this purpose. This study uses both time series data and multivariate
regression data to forecast wind speed. The seasonal autoregressive integrated moving average (SARIMA)
method and the least squares support vector regression for time series with genetic algorithms (LSSVRTSGA)
are used to forecast wind speed in a time series, and the least squares support vector regression with genetic
algorithms (LSSVRGA) and DBNGA models are used to predict wind speed in a multivariate format.
Empirical results show that forecasting wind speed by DBNGA models outperforms the other forecasting
models in terms of forecasting accuracy. Thus, the DBNGA model is a feasible and effective approach for
wind speed forecasting.

INDEX TERMS Forecast, wind speed, deep belief networks, genetic algorithms, multivariate regression,
time series

I. INTRODUCTION

As global population growth, economic development and the progress of science and technology have all rapidly increased, so too has the global demand for energy. This increasing energy demand has resulted in energy shortages and global warming. Wind power, however, is free and clean, and has thus become one of the most popular sources of renewable energy. The accurate prediction of wind speed plays a crucial role in effectively generating, distributing and managing wind power, and, to date, a number of wind speed forecasting models have been proposed [1-5]. Deep learning is an emerging technology that has been applied in many fields, including wind speed prediction. Wang et al. [6] presented a hybrid model including wavelet transform, a deep belief network and spline quantile regression to forecast wind speed. Wind farm hourly data with high-level nonlinear and non-stationary characteristics were predicted by the proposed hybrid model. A trial-and-error method was employed to determine the number of hidden layers and hidden nodes required in a deep belief network for predicting wind speed. The numerical result indicated that the proposed hybrid model outperformed the auto-regressive and moving average models, back-propagation neural networks, and the Morlet wavelet neural network in terms of forecasting accuracy. Hu et al. [7] developed a shared-hidden-layer deep neural network model for predicting wind speed. The numerical outcomes demonstrated that the proposed shared-hidden-layer deep neural network model achieved more accurate forecasting results than models using support vector regression, deep neural networks, or extreme learning machines. Liu et al. [8] designed a hybrid deep-learning model consisting of empirical wavelet transform, long short term memory neural networks and Elman neural networks for forecasting wind speed. The empirical wavelet transform divided original wind speed data


This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.

into low-frequency wind speed data to be processed by the long short term memory neural network, and high-frequency wind speed data to be processed by the Elman neural network. The experiment results showed that the proposed hybrid model outperformed other forecasting techniques in terms of forecasting accuracy. Liu et al. [9] proposed a multistep forecasting model including variational mode decomposition, singular spectrum analysis, long short term memory networks, and an extreme learning machine for predicting wind speed. The variational mode decomposition and the singular spectrum analysis were employed to decompose data and extract data trends; the long short term memory network and the extreme learning machine were used to deal with low-frequency and high-frequency data, respectively. The empirical results revealed that the presented hybrid model could generate more accurate results in forecasting wind speed than the other models. Chen et al. [10] developed a combined model by applying the ensemble learning technique to long short term memory neural networks and a support vector regression machine for wind speed forecasting. Simulation results indicated that the proposed model achieved a better forecasting performance than the autoregressive integrated moving average models, support vector regression, the k-nearest neighbor technique, artificial neural networks, and the gradient boosting regression tree for wind speed forecasting. Liu et al. [11] proposed a hybrid model using the wavelet packet decomposition technique, a convolutional neural network and a long short term memory network for predicting wind speed. The wavelet packet decomposition technique was used to decompose wind speed data into high-frequency and low-frequency data patterns. The convolutional neural network was employed to cope with high-frequency wind speed data, while the low-frequency wind speed data was processed by the convolutional long short term memory network. The numerical results demonstrated that the developed hybrid model outperformed other forecasting models in terms of wind speed prediction accuracy. Li et al. [12] developed a hybrid model for forecasting wind speed. The proposed model included empirical wavelet transform decomposition, a long short term memory network, a regularized extreme learning machine network and inverse empirical wavelet transform. The models performed raw wind speed data decomposition, time series wind speed data forecasting, forecasting error series modeling, final forecasting series integration, and outlier filtering. The numerical results showed that the performance of the presented hybrid model was superior to that of the other seven compared forecasting techniques in terms of forecasting accuracy.

Nevertheless, the choice of parameters for deep belief networks significantly influences their forecasting accuracy. Experimental or trial-and-error methods have been adopted to cope with such parameter selection [6, 13-16]. In addition, some metaheuristics, including the tabu search algorithm [17], particle swarm optimization [18-20] and genetic algorithms [21-24], have been used for deep learning network parameter selection. In this study, genetic algorithms [25] are used for deep belief network parameter selection for predicting wind speed by weather-related data. The remainder of this paper is arranged as follows. Section 2 presents the methodologies used in this study. The proposed wind speed forecasting framework and numerical results are described in Section 3. Finally, conclusions are drawn in Section 4.

II. METHODOLOGIES

A. THE SEASONAL AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODEL

Developed by Box and Jenkins [26], the autoregressive integrated moving average (ARIMA) model is a popular method for coping with time series data. The ARIMA method includes two components, the autoregressive and the moving average, and a future variable value is determined by a linear combination of prior values and errors. Three parameters, the order of the autoregressive model (p), the degree of differencing (d), and the order of the moving average model (q), are contained in the ARIMA model. The autoregressive model, AR(p), represents the association between present and past values. For a sequence with variables Y, the values of the prior periods are employed to predict the present period value (Y_t), as expressed by Eq. (1):

Y_t = C + \sum_{i=1}^{p} \phi_i Y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}   (1)

where C is a constant, \phi_i and \theta_j are the autoregressive coefficient and the moving average coefficient respectively, and \varepsilon_t is white noise at time t.

The seasonal autoregressive integrated moving average (SARIMA) model [26, 27] is a revised form of the ARIMA model. The SARIMA model was designed for forecasting time series data with trend and seasonal tendency, and uses regular and seasonal differences to remove trends and seasonal presences. A SARIMA (p, d, q)(P, D, Q)_s model produces a time series, {Y_t, t = 1, 2, ..., k}, with a mean M from an ARIMA [26] time series model satisfying Eq. (2):

\phi_p(L) \Phi_P(L^s) (1-L)^d (1-L^s)^D (Y_t - M) = \theta_q(L) \Theta_Q(L^s) \varepsilon_t   (2)

where L is the backward shift operator, \varepsilon_t is white noise at time t, and p, d, q, P, D and Q are nonnegative integers; the orders p and P belong to a regular autoregressive operator and a seasonal autoregressive operator, the orders q and Q belong to a regular moving average operator and a seasonal moving average operator, d and D denote the degrees of regular differencing and seasonal differencing, and s indicates the seasonal interval. Here \phi_p(L) = (1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p) is the regular autoregressive operator with order p, \theta_q(L) = (1 - \theta_1 L - \theta_2 L^2 - \cdots - \theta_q L^q) is the regular moving average operator with order q, \Phi_P(L^s) = (1 - \Phi_1 L^s - \Phi_2 L^{2s} - \cdots - \Phi_P L^{Ps}) is the seasonal autoregressive operator with order P, and \Theta_Q(L^s) = (1 - \Theta_1 L^s - \Theta_2 L^{2s} - \cdots - \Theta_Q L^{Qs}) is the seasonal moving average operator with order Q. The degrees of differencing, d and D, are determined in terms of eliminating the tendency and seasonality of data. The autocorrelation function and the


partial autocorrelation function of the differenced series are employed to determine the values of p, P, q and Q. In general, a four-step procedure involving identification, estimation, validation, and forecasting is performed to find suitable parameters and to make a forecast in the SARIMA model [26-29].

B. THE LEAST SQUARE SUPPORT VECTOR REGRESSION MODEL

Revised from the support vector regression model [30-32] by decreasing the computation load, the least square support vector regression (LSSVR) model [33] is a popular and powerful technique for dealing with forecasting problems in both time series and multivariate regression formats. By introducing a least squares approach to the support vector regression model for a training data set {X_i, Y_i}, i = 1, ..., N in order to formulate a regression problem, the LSSVR model can be depicted as the following optimization problem [33, 34]:

Min:  \frac{1}{2} w^T w + Ʊ \sum_{i=1}^{N} e_i^2   (3)
subject to  Y_i = w \cdot \rho(X_i) + b + e_i,  i = 1, ..., N

where w is the weight vector of the same dimension as the feature space, Ʊ is the regularization constant signifying the trade-off between the empirical error and the flatness of the function, N is the number of data points, e_i is the i-th training error of X_i, X_i and Y_i are the input and output data points for the i-th pattern, \rho(X) is a nonlinear mapping function, and b is the bias. By the Lagrange multipliers technique, the Lagrange form of Eq. (3) is rewritten as:

L(w, b, e, \alpha) = \frac{1}{2} \|w\|^2 + Ʊ \sum_{i=1}^{N} e_i^2 - \sum_{i=1}^{N} \alpha_i [w \cdot \rho(X_i) + b + e_i - Y_i]   (4)

where \alpha_i are the Lagrange multipliers. Then, the partial derivatives of Eq. (4) are taken with respect to w, b, e and \alpha, and all derivatives are set equal to zero according to the Karush-Kuhn-Tucker conditions [35-37]. Using the least squares method to resolve the linear equations, the LSSVR model can be represented as Eq. (5):

Y(X) = \sum_{i=1}^{N} \alpha_i \rho(X)^T \cdot \rho(X_i) + b   (5)

With respect to Mercer’s principle [38], the following kernel function is defined:

K(X, X_i) = \rho(X)^T \cdot \rho(X_i)   (6)

In this study, the radial basis function with the kernel width \sigma, indicated by Eq. (7), serves as the kernel function:

K(X, X_i) = \exp(-\|X - X_i\|^2 / (2\sigma^2))   (7)

In this investigation, suitable LSSVR model parameters are provided by genetic algorithms [25, 39].

C. DEEP BELIEF NETWORKS

A deep belief network [40] consists of several stacked restricted Boltzmann machines (RBMs), which are the main components of a deep belief network. The RBM was revised from the original Boltzmann machine (BM) [41], which has node connections both within layers and between layers; the RBM only allows links between nodes in adjacent layers. An RBM consists of two layers: a visible layer and a hidden layer. Each node in the visible layer is linked to all hidden nodes without directions. Deep belief network learning strategies include unsupervised learning in the pre-training stage, and supervised learning in the fine-tuning stage. The main task of unsupervised learning is to select appropriate initial parameters, including biases and weights, for the supervised learning using only independent variables. Thus, the pre-training phase rebuilds training samples by adjusting parameters in order to maximize the likelihood estimation. Based on the initial parameters provided by the pre-training phase, the supervised learning phase can further tune biases and weights efficiently. Figure 1 illustrates an RBM structure consisting of a visible layer with i units, and a hidden layer with j units. The outputs of units in a visible layer are treated as inputs of units in the hidden layer. An energy function of a joint structure (v, h) of visible and hidden units can be depicted as Eq. (8):

Energy(v, h) = -\sum_m \alpha_m v_m - \sum_n \beta_n h_n - \sum_m \sum_n v_m W_{m,n} h_n   (8)

where W_{m,n} is the weight of the connection between unit m in the visible layer, with bias \alpha_m, and unit n in the hidden layer, with bias \beta_n. Values of v_m and h_n satisfy a binary distribution stochastically, and the joint probability can be represented as Eq. (9):

p(v, h) = \frac{\exp(-Energy(v, h))}{\sum_v \sum_h \exp(-Energy(v, h))}   (9)

where \sum_v \sum_h \exp(-Energy(v, h)) is a normalization element. When the state vector v of the visible layer or h of the hidden layer is given, the activated probabilities of h_n and v_m can be expressed as Eq. (10) and Eq. (11), respectively:

p(h_n = 1 | v) = sigm(\beta_n + \sum_m v_m W_{m,n})   (10)
p(v_m = 1 | h) = sigm(\alpha_m + \sum_n W_{m,n} h_n)   (11)

The sigm(·) is a sigmoid function and can be represented as Eq. (12):

sigm(x) = \frac{1}{1 + e^{-x}}   (12)

The aim of the RBM is to search for three appropriate parameters, namely \alpha, \beta and W, represented collectively as a parameter set \Omega. The pre-trained parameter set serves as an initial parameter set for the supervised learning stage of the deep belief network. Thus, features of the training data set can be depicted well by the parameter set, and the likelihood function can be maximized. A contrastive divergence algorithm [42] was presented to optimize the likelihood function when distributions of the visible layer and the hidden layer are in steady states [43]. The likelihood function is expressed as Eq. (13):

\ln L(\Omega) = \ln \prod_{t=1}^{T} P(V^t) = \sum_{t=1}^{T} \ln P(V^t)   (13)

where T is the number of training data. The gradient of Eq. (13) with respect to the parameter set can be expressed as Eq. (14):

\frac{\partial \ln L(\Omega)}{\partial \Omega} = \sum_{t=1}^{T} \frac{\partial \ln P(V^t)}{\partial \Omega}   (14)

Then, the derivative of \ln P(V) with respect to \Omega can be


expressed as Eq. (15). Thus, the gradient of the likelihood function can be represented as Eq. (16):

\frac{\partial \ln P(V)}{\partial \Omega} = \frac{\partial}{\partial \Omega} \left( \ln \sum_h e^{-Energy(V,h)} - \ln \sum_{v,h} e^{-Energy(v,h)} \right)
= -\sum_h P(h|V) \frac{\partial Energy(V,h)}{\partial \Omega} + \sum_{v,h} P(v,h) \frac{\partial Energy(v,h)}{\partial \Omega}   (15)

\frac{\partial \ln L(\Omega)}{\partial \Omega} = \sum_{t=1}^{T} \left( -\sum_h P(h|V^t) \frac{\partial Energy(V^t,h)}{\partial \Omega} + \sum_{v,h} P(v,h) \frac{\partial Energy(v,h)}{\partial \Omega} \right)   (16)

Finally, the learning algorithm with the momentum and the learning rate of the RBM is generated and shown as Eq. (17):

\Omega(t+1) = MO_1 \Omega(t) + \eta_1 \frac{\partial \ln L(\Omega)}{\partial \Omega}   (17)

where MO_1 and \eta_1 are the momentum and the learning rate, respectively, in the unsupervised learning stage. Because the computations of Eq. (17) are hard, a k-step contrastive divergence (CD-k) algorithm [42] is employed to address this issue. In this study, the CD-1 policy is employed. Thus, the learning rules of the three parameters can be represented as Eqs. (18)-(20):

W_{m,n}(t+1) = MO_1 W_{m,n}(t) + \eta_1 \sum_t \left[ P(h_n = 1 | v^{(0)}) v_m^{(0)} - P(h_n = 1 | v^{(1)}) v_m^{(1)} \right]   (18)
\alpha_m(t+1) = MO_1 \alpha_m(t) + \eta_1 \sum_t \left[ v_m^{(0)} - v_m^{(1)} \right]   (19)
\beta_n(t+1) = MO_1 \beta_n(t) + \eta_1 \sum_t \left[ P(h_n = 1 | v^{(0)}) - P(h_n = 1 | v^{(1)}) \right]   (20)

where v^{(0)} is a training sample and v^{(1)} is its one-step Gibbs reconstruction.

FIGURE 1. The construct of RBMs. [40, 44]

When the RBM unsupervised learning phase is complete, a supervised learning mechanism using the initial parameter set generated from the unsupervised learning stage is engaged. In the supervised learning stage, backpropagation algorithms are employed to further adjust weights and biases for a multilayer perceptron network (MLP). Figure 2 illustrates a deep belief network model with a total of L layers, an input layer on the left hand side, and an output layer on the right hand side. In this study, the deep belief network model with multivariate regression weather-related data is used to forecast wind speed. For a certain training data set in the output layer, the error value can be represented as Eq. (21) [45]:

E = \frac{1}{2} \sum_{n=1}^{N} (A_n - Y_n)^2   (21)

where N is the number of output nodes, Y_n is the output value of the n-th output node, and A_n is the n-th target value. Thus, the updating algorithms with the gradient-descent method and chain rules for the weights (W_{MLP}) and biases (\gamma) can be expressed as Eqs. (22)-(23):

W_{MLP}(t+1) = MO_2 W_{MLP}(t) + \eta_2 \tau \, a(NET)   (22)
\gamma(t+1) = MO_2 \gamma(t) + \eta_2 \tau   (23)

In this study, for simplicity, the following relation and symbols shown as Eq. (24) are employed:

\Delta W(t) = -\frac{\partial E}{\partial W_{MLP}(t)} = \tau \, a(NET)   (24)

In addition, \tau is depicted as Eq. (25):

\tau = (A_n - Y_n) a'(NET_n), when connections are hidden to output;
\tau = a'(NET_j) \sum_k \tau_k W_{j,k}, otherwise   (25)

where NET is the input a node receives from a previous layer, a(·) is an activation function, and a(NET) is the output of a node, serving as an input for the next layer.

FIGURE 2. An illustration of a deep belief network. [44-46]

III. WIND SPEED FORECASTING BY THE DBNGA MODEL AND NUMERICAL RESULTS

A. THE PROPOSED FRAMEWORK

Taiwan has fairly plentiful wind energy resources [47], and the Penghu archipelago area has the strongest wind speed and produces the most wind power among several regions in Taiwan [48]. In this study, a deep belief network with genetic algorithms model was designed to forecast wind speed at Penghu. To examine the feasibility and applicability of the developed model, datasets from two other weather stations in Taiwan, namely Kinmen and Matsu, were used to forecast wind speed. The flowchart of this study is presented in Fig. 3, and stated as follows. First, the hourly wind speed and weather-related data at the Penghu station in Taiwan were collected from the Central Weather Bureau in Taiwan (http://e-service.cwb.gov.tw/HistoryDataQuery/index.jsp). Originally, there are a total of 14 weather attributes, including wind speed, on this web-site. However, due to some missing or uncertain values, a total of 11 weather attributes were used in this study. The data, including wind speed, station pressure, sea pressure, temperature, dew point temperature, relative humidity, wind direction, max gust, direction of max gust, precipitation amount, precipitation


hours, and sunshine hours were used in this study to predict wind speed by time series models and multivariate models. These 11 weather attributes were taken from the work of Wang et al. [5]. The hourly weather data from January 1, 2017 to December 31, 2017 were used to build the wind speed prediction models, and the hourly wind speed data from January 1, 2018 to January 14, 2018 were used as testing data. For multivariate forecasting models, a rolling, one-hour forecast format was used to forecast wind speed. Fourteen time windows from 12 to 168 hours, with 12 hour time intervals, were examined to test the performance of the forecasting models.

In this study, both time series forecasting methods and multivariate forecasting methods were used for predicting wind speed. The time series forecasting models used were the seasonal autoregressive integrated moving average method (SARIMA), and the least squares support vector regression for time series with genetic algorithms method (LSSVRTSGA). These forecasting methods used past wind speed data as input data to forecast future wind speed.

Using the principles of evolution and natural selection, Holland [25] developed genetic algorithms as a structured random search method for discovering optimal or near optimal solutions with relatively mild computational burden. In this study, genetic algorithms are employed to select the parameters for the DBN, LSSVR and LSSVRTS models. In the deep belief network model, two parameters, namely the momentum and the learning rate, are included in both the unsupervised learning and supervised learning stages. In the unsupervised learning stage and the supervised learning stage, the negative values of the energy function of the restricted Boltzmann machines and the negative RMSE values of the wind speed forecast serve as the fitness of the genetic algorithms, respectively. Furthermore, one hidden layer is employed for the deep belief network in this study. Similarly, two parameters of the LSSVR and LSSVRTS models are provided by genetic algorithms using negative RMSE values of the wind speed forecast as the fitness function in the training stage.

In this investigation, parameters are expressed by a chromosome composed of 40 genes in the form of binary numbers. The population size is set to 10. The crossover and mutation rates are both set to 0.7. The single-point crossover method is employed for the genetic algorithms in this study. The varying range of the momentum and the learning rate is set to 0.1 to 0.9, correspondingly. The searching range of the two parameters for LSSVR and LSSVRTS is set to 1 to 500. Thirty generations is used as the stop criterion for the genetic algorithms. The goal of selecting proper controlling parameters for metaheuristics is to optimize problems. The selection of the controlling parameters of the genetic algorithms used in this study can provide the forecasting models with appropriate parameters, and result in accurate forecasting. Figure 4 shows the parameter selection mechanism for both the unsupervised pre-training stage and the supervised fine-tuning stage in a deep belief network.

Figure 3. The proposed flowchart of wind speed forecasting.

Figure 4. The parameter selection mechanism of a deep belief network.
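The GA configuration described above (a 40-gene binary chromosome, population size 10, single-point crossover and mutation rates of 0.7, a 30-generation stop criterion, and a search range of 0.1 to 0.9 for the momentum and learning rate) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the fitness function here is a toy surrogate standing in for the negative RMSE of the wind speed forecast (or the negative RBM energy in the pre-training stage), and the selection scheme (elitism plus parents drawn from the fittest individuals) is an assumption.

```python
import random

random.seed(7)                         # reproducible demo run
GENES_PER_PARAM = 20                   # 40-gene chromosome, 20 genes per parameter
POP_SIZE, GENERATIONS = 10, 30
CROSSOVER_RATE = MUTATION_RATE = 0.7
LOW, HIGH = 0.1, 0.9                   # search range for momentum and learning rate

def decode(chrom):
    """Map each 20-bit slice of the chromosome to a real value in [LOW, HIGH]."""
    params = []
    for i in range(0, len(chrom), GENES_PER_PARAM):
        bits = chrom[i:i + GENES_PER_PARAM]
        frac = int("".join(map(str, bits)), 2) / (2 ** GENES_PER_PARAM - 1)
        params.append(LOW + frac * (HIGH - LOW))
    return params                      # [momentum, learning rate]

def fitness(chrom):
    # Toy surrogate fitness; the paper uses the negative forecast RMSE
    # (fine-tuning stage) or the negative RBM energy (pre-training stage).
    mo, lr = decode(chrom)
    return -((mo - 0.5) ** 2 + (lr - 0.3) ** 2)

def evolve():
    pop = [[random.randint(0, 1) for _ in range(2 * GENES_PER_PARAM)]
           for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:2]                              # elitism: keep the two best
        while len(nxt) < POP_SIZE:
            p1, p2 = random.sample(pop[:5], 2)     # parents from the fittest half
            child = p1[:]
            if random.random() < CROSSOVER_RATE:   # single-point crossover
                cut = random.randrange(1, len(child))
                child = p1[:cut] + p2[cut:]
            if random.random() < MUTATION_RATE:    # flip one random gene
                j = random.randrange(len(child))
                child[j] ^= 1
            nxt.append(child)
        pop = nxt
    return decode(max(pop, key=fitness))

momentum, learning_rate = evolve()
```

The same loop applies to the LSSVR and LSSVRTS parameters (Ʊ, σ) by changing only the decoding range to 1-500 and the fitness evaluation.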


B. NUMERICAL RESULTS

This study employed the least squares support vector regression with genetic algorithms and deep belief networks with genetic algorithms to predict wind speed with various weather attributes. Two forecasting performance measurements, the root mean square error (RMSE) and the mean absolute percentage error (MAPE), were used to measure the forecasting accuracy of the tested models, expressed as Eqs. (26)-(27):

RMSE = \sqrt{ \frac{1}{N} \sum_{t=1}^{N} (A_t - F_t)^2 }   (26)

MAPE(%) = \frac{100}{N} \sum_{t=1}^{N} \frac{|A_t - F_t|}{A_t}   (27)

where N is the number of forecasting periods, A_t is the actual value at period t, and F_t is the forecasting value at period t.

Tables 1-3 list the parameters of the four forecasting models at the three weather stations. The wind speed forecasting values of the different forecasting models and the actual values at the three weather stations are shown in Figs. 5-7. Figures 5-7 show that the DBNGA model can capture wind speed patterns well, especially at turning points. However, the other three models are unable to effectively follow wind speed data patterns in mid-term time windows. Tables 4-6 list the measurements of forecasting accuracy by the four models with various time windows at the three weather stations. Table 7 lists the averages of the forecasting measurements for the three weather stations. In terms of forecasting accuracy, it is clear that the DBNGA model outperforms the other three forecasting models in mid-term time windows, as well as in terms of the average for all time windows.

TABLE 1. Parameters of forecasting models at the Penghu station.

Station | Model      | Parameters
Penghu  | DBNGA      | (MO_1, η_1) = (0.83, 0.45); (MO_2, η_2) = (0.99, 0.83)
Penghu  | LSSVRGA    | (Ʊ, σ) = (347.11, 1.22)
Penghu  | LSSVRTSGA  | (Ʊ, σ) = (359.46, 6.47)
Penghu  | SARIMA     | (p, d, q)(P, D, Q)_s = (1,1,1)(0,0,2)_6

TABLE 2. Parameters of forecasting models at the Kinmen station.

Station | Model      | Parameters
Kinmen  | DBNGA      | (MO_1, η_1) = (0.65, 0.04); (MO_2, η_2) = (0.32, 0.93)
Kinmen  | LSSVRGA    | (Ʊ, σ) = (313.01, 1.68)
Kinmen  | LSSVRTSGA  | (Ʊ, σ) = (351.20, 4.05)
Kinmen  | SARIMA     | (p, d, q)(P, D, Q)_s = (0,1,5)(0,0,2)_6

TABLE 3. Parameters of forecasting models at the Matsu station.

Station | Model      | Parameters
Matsu   | DBNGA      | (MO_1, η_1) = (0.88, 0.49); (MO_2, η_2) = (0.98, 0.94)
Matsu   | LSSVRGA    | (Ʊ, σ) = (335.44, 8.92)
Matsu   | LSSVRTSGA  | (Ʊ, σ) = (463.41, 4.86)
Matsu   | SARIMA     | (p, d, q)(P, D, Q)_s = (0,1,1)(0,0,1)_24

FIGURE 5. Actual and forecasted values of hourly wind speed at the Penghu station.

FIGURE 6. Actual and forecasted values of hourly wind speed at the Kinmen station.

FIGURE 7. Actual and forecasted values of hourly wind speed at the Matsu station.
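As a concrete check of Eqs. (26)-(27), the two accuracy measurements can be computed as follows. The wind speed values in the demo are hypothetical, chosen only for illustration, and are not taken from the paper's tables.

```python
import math

def rmse(actual, forecast):
    """Root mean square error, Eq. (26)."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

def mape(actual, forecast):
    """Mean absolute percentage error in percent, Eq. (27)."""
    n = len(actual)
    return 100.0 / n * sum(abs(a - f) / a for a, f in zip(actual, forecast))

# Hypothetical hourly wind speeds (m/s), for illustration only.
actual   = [5.2, 4.8, 6.1, 5.5]
forecast = [5.0, 5.0, 5.9, 5.6]
print(round(rmse(actual, forecast), 3))   # 0.18
print(round(mape(actual, forecast), 2))   # 3.28
```

Note that Eq. (27) assumes all actual values are nonzero, which holds for the wind speed series considered here.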


TABLE 4. Measurements of forecasting accuracy by four models with various time windows at the Penghu station.

Time window (hours) | Measurement | DBNGA  | LSSVRGA | LSSVRTSGA | SARIMA
12  | MAPE(%) | 6.034  | 8.738  | 9.460  | 7.713
12  | RMSE    | 0.492  | 0.669  | 0.725  | 0.596
24  | MAPE(%) | 6.499  | 8.990  | 10.225 | 12.759
24  | RMSE    | 0.494  | 0.725  | 0.756  | 0.981
36  | MAPE(%) | 7.048  | 8.559  | 10.556 | 19.493
36  | RMSE    | 0.531  | 0.723  | 0.737  | 1.315
48  | MAPE(%) | 8.820  | 10.062 | 19.111 | 32.986
48  | RMSE    | 0.576  | 0.667  | 1.152  | 1.883
60  | MAPE(%) | 13.439 | 14.720 | 46.443 | 67.811
60  | RMSE    | 0.628  | 0.777  | 1.883  | 2.700
72  | MAPE(%) | 12.839 | 14.626 | 52.855 | 76.930
72  | RMSE    | 0.598  | 0.773  | 2.050  | 2.945
84  | MAPE(%) | 12.516 | 13.993 | 49.929 | 74.462
84  | RMSE    | 0.604  | 0.958  | 1.982  | 2.910
96  | MAPE(%) | 14.156 | 13.895 | 53.156 | 79.220
96  | RMSE    | 0.659  | 1.545  | 2.055  | 3.023
108 | MAPE(%) | 14.113 | 15.329 | 54.202 | 81.199
108 | RMSE    | 0.648  | 1.787  | 2.083  | 3.081
120 | MAPE(%) | 13.659 | 14.659 | 49.841 | 74.557
120 | RMSE    | 0.667  | 1.742  | 1.993  | 2.942
132 | MAPE(%) | 13.146 | 14.353 | 46.069 | 70.529
132 | RMSE    | 0.660  | 1.771  | 1.909  | 2.852
144 | MAPE(%) | 13.192 | 14.776 | 43.535 | 67.794
144 | RMSE    | 0.684  | 2.074  | 1.848  | 2.792
156 | MAPE(%) | 13.606 | 18.627 | 43.633 | 68.533
156 | RMSE    | 0.704  | 2.205  | 1.848  | 2.819
168 | MAPE(%) | 18.930 | 24.018 | 62.948 | 92.852
168 | RMSE    | 0.743  | 2.146  | 2.010  | 2.993

TABLE 5. Measurements of forecasting accuracy by four models with various time windows at the Kinmen station.

Time window (hours) | Measurement | DBNGA  | LSSVRGA | LSSVRTSGA | SARIMA
12  | MAPE(%) | 13.263 | 11.776 | 17.174 | 16.471
12  | RMSE    | 0.743  | 0.661  | 0.934  | 0.794
24  | MAPE(%) | 11.831 | 11.299 | 21.360 | 16.609
24  | RMSE    | 0.708  | 0.655  | 1.338  | 0.875
36  | MAPE(%) | 13.116 | 12.505 | 21.472 | 29.550
36  | RMSE    | 0.708  | 0.655  | 1.183  | 1.301
48  | MAPE(%) | 14.949 | 15.036 | 21.780 | 34.894
48  | RMSE    | 0.744  | 0.800  | 1.105  | 1.433
60  | MAPE(%) | 15.401 | 18.347 | 33.218 | 58.489
60  | RMSE    | 0.690  | 0.780  | 1.216  | 1.876
72  | MAPE(%) | 14.789 | 19.454 | 30.892 | 52.429
72  | RMSE    | 0.686  | 0.879  | 1.216  | 1.767
84  | MAPE(%) | 16.756 | 21.275 | 32.401 | 58.075
84  | RMSE    | 0.720  | 0.901  | 1.204  | 1.868
96  | MAPE(%) | 18.704 | 25.621 | 39.766 | 72.517
96  | RMSE    | 0.709  | 0.951  | 1.264  | 2.066
108 | MAPE(%) | 23.780 | 33.197 | 55.580 | 97.224
108 | RMSE    | 0.726  | 1.003  | 1.376  | 2.252
120 | MAPE(%) | 22.466 | 31.625 | 52.624 | 88.730
120 | RMSE    | 0.716  | 1.022  | 1.390  | 2.148
132 | MAPE(%) | 21.572 | 31.123 | 49.602 | 82.180
132 | RMSE    | 0.719  | 1.059  | 1.355  | 2.061
144 | MAPE(%) | 20.951 | 29.835 | 47.116 | 79.179
144 | RMSE    | 0.711  | 1.030  | 1.317  | 2.033
156 | MAPE(%) | 20.529 | 29.916 | 47.316 | 82.041
156 | RMSE    | 0.696  | 1.014  | 1.308  | 2.096
168 | MAPE(%) | 21.789 | 31.824 | 52.042 | 91.408
168 | RMSE    | 0.694  | 1.011  | 1.349  | 2.219

TABLE 6. Measurements of forecasting accuracy by four models with various time windows at the Matsu station.

Time window (hours) | Measurement | DBNGA  | LSSVRGA | LSSVRTSGA | SARIMA
12  | MAPE(%) | 16.284 | 10.950 | 14.880 | 11.467
12  | RMSE    | 0.880  | 0.669  | 0.656  | 0.626
24  | MAPE(%) | 14.119 | 11.971 | 16.484 | 11.622
24  | RMSE    | 0.734  | 0.630  | 0.629  | 0.589
36  | MAPE(%) | 12.528 | 12.181 | 16.609 | 10.601
36  | RMSE    | 0.659  | 0.615  | 0.710  | 0.533
48  | MAPE(%) | 12.560 | 12.432 | 22.333 | 13.838
48  | RMSE    | 0.621  | 0.609  | 0.891  | 0.611


TABLE 6 (continued).

Time window (hours) | Measurement | DBNGA  | LSSVRGA | LSSVRTSGA | SARIMA
60  | MAPE(%) | 14.077 | 15.199 | 32.731 | 22.525
60  | RMSE    | 0.652  | 0.661  | 1.163  | 0.846
72  | MAPE(%) | 13.799 | 16.232 | 30.632 | 22.517
72  | RMSE    | 0.671  | 0.790  | 1.174  | 1.000
84  | MAPE(%) | 13.407 | 15.764 | 27.826 | 22.200
84  | RMSE    | 0.678  | 0.801  | 1.116  | 1.020
96  | MAPE(%) | 17.010 | 18.585 | 48.938 | 39.818
96  | RMSE    | 0.679  | 0.786  | 1.410  | 1.231
108 | MAPE(%) | 17.471 | 23.154 | 52.414 | 42.570
108 | RMSE    | 0.741  | 0.964  | 1.485  | 1.296
120 | MAPE(%) | 16.681 | 21.728 | 49.262 | 41.365
120 | RMSE    | 0.735  | 0.943  | 1.472  | 1.366
132 | MAPE(%) | 15.791 | 20.666 | 45.612 | 39.365
132 | RMSE    | 0.712  | 0.917  | 1.412  | 1.338
144 | MAPE(%) | 15.780 | 20.460 | 43.966 | 37.636
144 | RMSE    | 0.709  | 0.919  | 1.385  | 1.302
156 | MAPE(%) | 17.831 | 23.888 | 50.277 | 42.257
156 | RMSE    | 0.733  | 0.980  | 1.484  | 1.351
168 | MAPE(%) | 21.015 | 30.071 | 62.415 | 52.073
168 | RMSE    | 0.739  | 1.032  | 1.604  | 1.427

IV. CONCLUSIONS

This study proposed a deep belief network with genetic algorithms model for predicting wind speed in Taiwan. Both wind speed data and various weather factors are employed to forecast wind speed at three weather stations in Taiwan. Numerical results indicate that DBNGA models can achieve more accurate forecasting results over mid-term periods, and in terms of the average forecasting accuracy for all time windows, than the other models. Thus, the proposed DBNGA model is a feasible and promising method for forecasting wind speed. However, this study only used limited weather factors for the multivariate forecasting methods of predicting wind speed. Future work will include some other factors, such as landforms, to increase the forecasting accuracy. Another possible direction for future study is to include the number of hidden layers and the number of hidden nodes in the fitness function of the genetic algorithms used to select deep belief network models. Finally, other metaheuristics could be employed to select deep belief network models in order to improve forecasting performance.

REFERENCES
1. H. Samet, F. Marzbani, "Quantizing the deterministic nonlinearity in wind speed time series," Renew. Sust. Energ. Rev., vol. 39, pp. 1143-1154, 2014.
2. A. Tascikaraoglu, M. Uzunoglu, "A review of combined approaches for prediction of short-term wind speed and power," Renew. Sust. Energ. Rev., vol. 34, pp. 243-254, 2014.
3. J. Jung, R. P. Broadwater, "Current status and future advances for wind speed and power forecasting," Renew.
TABLE 7. Averages of forecasting measurements at three stations.
Sust. Energ. Rev. vol. 31, pp. 762-777, 2014.
Forecasting models 4. L. Xiao, J. Wang, Y. Dong, J. Wu, “Combined
Measureme forecasting models for wind energy forecasting: A case
nts of
Stations
forecasting DBN LSSVRG LSSVRT study in China,” Renew. Sust. Energ. Rev. vol. 31, pp.
SARIMA
accuracy GA A SGA 271-288, 2015.
5. J. Wang, Y. Song, F. Liu, R. Hou, “Analysis and
MAPE(%)
12.000 13.953 39.426 59.060 application of forecasting models in wind power
Penghu
RMSE integration: A review of multi-step-ahead wind speed
0.621 1.326 1.645 2.417
MAPE(%)
forecasting models,” Renew. Sust. Energ. Rev. vol. 60, pp.
Kinmen
17.850 23.060 37.310 61.414 960-981, 2016.
RMSE 6. H. Z. Wang, G. B. Wang, G. Q .Li, J. C. Peng, Y. T. Liu,
0.712 0.887 1.254 1.771
MAPE(%) “Deep belief network based deterministic and
15.597 18.092 36.741 29.275 probabilistic wind speed forecasting approach,” Appl.
Matsu
RMSE
0.710 0.808 1.185 1.038 Energy. vol. 182, pp. 80-93, 2016.
7. Q. Hu, R. Zhang, Y. Zhou, “Transfer learning for short-
term wind speed prediction with deep neural networks,”
IV. CONCLUSIONS Renew. Energy. vol. 85, pp. 83-95, 2016.
As global fossil fuel reserves have shrunk, and global 8. H. Liu, X. W. Mi, Y. F. Li, “Wind speed forecasting
warming has become an ever more serious concern, energy method based on deep learning strategy using empirical
policy has begun to receive increasingly more attention, and wavelet transform, long short term memory neural
options for clean, renewable energy, such as wind power, network and Elman neural network,” Energ. Convers.
have become an essential issue. Wind speed is always
Manage, vol. 156, pp. 498-514, 2018.
intermittent, making its prediction very challenging. Due to
9. H. Liu, X. Mi, Y. Li, “Smart multi-step deep learning
topographical features, and strong annual South West and
model for wind speed forecasting based on variational
North East monsoons, Taiwan has an abundance of wind,
and wind energy utilization, as well as forecasting, has been mode decomposition, singular spectrum analysis, LSTM
indispensable. This study develops a deep belief network network and ELM,” Energ. Convers. Manage, vol. 159,
pp. 54-64, 2018.
10. J. Chen, G. Q. Zeng, W. Zhou, W. Du, K. D. Lu, "Wind speed forecasting using nonlinear-learning ensemble of deep learning time series prediction and extremal optimization," Energ. Convers. Manage., vol. 165, pp. 681-695, 2018.
11. H. Liu, X. Mi, Y. Li, "Smart deep learning based wind speed prediction model using wavelet packet decomposition, convolutional neural network and convolutional long short term memory network," Energ. Convers. Manage., vol. 166, pp. 120-131, 2018.
12. Y. Li, H. Wu, "Multi-step wind speed forecasting using EWT decomposition, LSTM principal computing, RELM subordinate computing and IEWT reconstruction," Energ. Convers. Manage., vol. 167, pp. 203-219, 2018.
13. M. Berglund, T. Raiko, K. Cho, "Measuring the usefulness of hidden units in Boltzmann machines with mutual information," Neural Networks, vol. 64, pp. 12-18, 2015.
14. F. Shen, J. Chao, J. Zhao, "Forecasting exchange rate using deep belief networks and conjugate gradient method," Neurocomputing, vol. 167, pp. 243-253, 2015.
15. G. Wang, J. Qiao, X. Li, L. Wang, "Improved classification with semi-supervised deep belief network," IFAC-PapersOnLine, vol. 50, pp. 4174-4179, 2017.
16. Z. Geng, Z. Li, Y. Han, "A new deep belief network based on RBM with glial chains," Information Sciences, vol. 463, pp. 294-306, 2018.
17. T. Hayashida, I. Nishizaki, S. Sekizaki, M. Nishida, M. D. Prasetio, "Structural optimization of deep belief network by evolutionary computation methods including tabu search," Transactions on Machine Learning and Artificial Intelligence, vol. 6, p. 69, 2018.
18. T. Kuremoto, S. Kimura, K. Kobayashi, M. Obayashi, "Time series forecasting using a deep belief network with restricted Boltzmann machines," Neurocomputing, vol. 137, pp. 47-56, 2014.
19. B. Qolomany, M. Maabreh, A. Al-Fuqaha, A. Gupta, D. Benhaddou, "Parameters optimization of deep learning models using particle swarm optimization," in Proc. 2017 13th International Wireless Communications and Mobile Computing Conference (IWCMC), IEEE, pp. 1285-1290, 2017.
20. Q. Sun, Y. Wang, Y. Jiang, L. Shao, D. Chen, "Fault diagnosis of SEPIC converters based on PSO-DBN and wavelet packet energy spectrum," in Proc. Prognostics and System Health Management Conference (PHM-Harbin), IEEE, pp. 1-7, 2017.
21. D. Hossain, G. Capi, "Genetic algorithm based deep learning parameters tuning for robot object recognition and grasping," Int. Sch. Sci. Res. Innov., vol. 11, pp. 629-633, 2017.
22. F. Ghasemi, A. Mehridehnavi, A. Fassihi, H. Pérez-Sánchez, "Deep neural network in QSAR studies using deep belief network," Appl. Soft Comput., vol. 62, pp. 251-258, 2018.
23. H. Shao, H. Jiang, X. Li, T. Liang, "Rolling bearing fault detection using continuous deep belief network with locally linear embedding," Comput. Ind., vol. 96, pp. 27-39, 2018.
24. Y. Zhang, G. Huang, "Traffic flow prediction model based on deep belief network and genetic algorithm," IET Intell. Transp. Syst., vol. 12, no. 6, pp. 533-541, 2018.
25. J. Holland, "Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence," University of Michigan Press, Ann Arbor, pp. 439-444, 1975.
26. G. E. P. Box, G. M. Jenkins, "Time Series Analysis: Forecasting and Control," revised ed., Holden-Day, San Francisco, CA, 1976.
27. G. E. P. Box, G. M. Jenkins, G. C. Reinsel, G. M. Ljung, "Time Series Analysis: Forecasting and Control," 5th ed., John Wiley and Sons Inc., Hoboken, New Jersey, 2015.
28. A. K. Palit, D. Popovic, "Computational Intelligence in Time Series Forecasting," Springer-Verlag, London, 2005.
29. R. H. Shumway, D. S. Stoffer, "Time Series Analysis and Its Applications: With R Examples," Springer, New York, 2011.
30. S. Mukherjee, E. Osuna, F. Girosi, "Nonlinear prediction of chaotic time series using support vector machines," in Proc. 1997 IEEE Signal Processing Society Workshop, IEEE, pp. 511-520, 1997.
31. K. R. Müller, A. J. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, V. Vapnik, "Predicting time series with support vector machines," in Proc. International Conference on Artificial Neural Networks, Springer, Berlin, Heidelberg, pp. 999-1004, 1997.
32. V. Vapnik, S. E. Golowich, A. J. Smola, "Support vector method for function approximation, regression estimation and signal processing," in Advances in Neural Information Processing Systems, pp. 281-287, 1997.
33. J. A. K. Suykens, J. Vandewalle, "Least squares support vector machine classifiers," Neural Processing Lett., vol. 9, pp. 293-300, 1999.
34. J. A. K. Suykens, J. De Brabanter, L. Lukas, J. Vandewalle, "Weighted least squares support vector machines: robustness and sparse approximation," Neurocomputing, vol. 48, pp. 85-105, 2002.
35. R. Fletcher, "Practical Methods of Optimization," John Wiley & Sons, New York, 2013.
36. W. Karush, "Minima of functions of several variables with inequalities as side constraints," M.Sc. dissertation, Department of Mathematics, University of Chicago, 1939.
37. H. W. Kuhn, A. W. Tucker, "Nonlinear programming," in Proc. 2nd Berkeley Symposium on
Mathematical Statistics and Probability, pp. 481-492, 1951.
38. J. Mercer, "Functions of positive and negative type, and their connection with the theory of integral equations," Philosophical Transactions of the Royal Society of London, Series A, vol. 209, pp. 415-446, 1909.
39. P. F. Pai, K. C. Hung, K. P. Lin, "Tourism demand forecasting using novel hybrid system," Expert Syst. Appl., vol. 41, pp. 3691-3702, 2014.
40. G. E. Hinton, S. Osindero, Y. W. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, pp. 1527-1554, 2006.
41. D. H. Ackley, G. E. Hinton, T. J. Sejnowski, "A learning algorithm for Boltzmann machines," Cognitive Sci., vol. 9, pp. 147-169, 1985.
42. G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Comput., vol. 14, pp. 1771-1800, 2002.
43. N. Le Roux, Y. Bengio, "Representational power of restricted Boltzmann machines and deep belief networks," Neural Comput., vol. 20, pp. 1631-1649, 2008.
44. D. H. Ackley, G. E. Hinton, T. J. Sejnowski, "A learning algorithm for Boltzmann machines," Cogn. Sci., vol. 9, pp. 147-169, 1985.
45. C. T. Lin, C. G. Lee, "Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems," Prentice Hall, New Jersey, 1996.
46. P. Smolensky, "Information processing in dynamical systems: Foundations of harmony theory," University of Colorado at Boulder, Department of Computer Science, 1986.
47. Y. Zhang, C. Zhang, Y. C. Chang, W. H. Liu, Y. Zhang, "Offshore wind farm in marine spatial planning and the stakeholders engagement: Opportunities and challenges for Taiwan," Ocean Coast. Manag., vol. 149, pp. 69-80, 2017.
48. T. A. T. Nguyen, S. Y. Chou, "Maintenance strategy selection for improving cost-effectiveness of offshore wind systems," Energ. Convers. Manage., vol. 157, pp. 86-95, 2018.
