
International Journal of Forecasting 37 (2021) 949–970

Contents lists available at ScienceDirect

International Journal of Forecasting


journal homepage: www.elsevier.com/locate/ijforecast

U-Convolutional model for spatio-temporal wind speed forecasting

Bruno Quaresma Bastos (a), Fernando Luiz Cyrino Oliveira (a, ∗), Ruy Luiz Milidiú (b)
(a) Department of Industrial Engineering, Pontifical Catholic University of Rio de Janeiro, RJ, Brazil
(b) Department of Informatics, Pontifical Catholic University of Rio de Janeiro, RJ, Brazil

Article info
Keywords: Convolutional neural networks; Spatio-temporal forecasting; Renewable energy; Deep learning; Time series

Abstract
The increasing penetration of intermittent renewable energy in power systems brings operational challenges. One way of addressing them is to enhance the predictability of renewables through accurate forecasting. Convolutional Neural Networks (Convnets) provide a successful technique for processing space-structured multi-dimensional data. In our work, we propose the U-Convolutional model to predict hourly wind speeds for a single location using spatio-temporal data with multiple explanatory variables as input. The U-Convolutional model is composed of a U-Net part, which synthesizes the input information, and a Convnet part, which maps the synthesized data into a single-site wind prediction. We compare our approach with advanced Convnets, a fully connected neural network, and univariate models. We use time series from the Climate Forecast System Reanalysis as datasets and select temperature and the u- and v-components of wind as explanatory variables. The proposed models are evaluated at multiple locations (totaling 181 target series) and multiple forecasting horizons. The results indicate that our proposal is promising for spatio-temporal wind speed prediction, showing competitive performance at both forecasting horizons for all datasets.
© 2020 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

∗ Corresponding author.
E-mail addresses: brunoq.b@aluno.puc-rio.br (B.Q. Bastos), cyrino@puc-rio.br (F.L. Cyrino Oliveira), milidiu@inf.puc-rio.br (R.L. Milidiú).
https://doi.org/10.1016/j.ijforecast.2020.10.007
0169-2070/© 2020 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

1. Introduction

Society is moving towards adopting cleaner energy solutions for power generation. These solutions prioritize resources that are sustainable on a long-term basis and do not contribute to climate change. Policies and regulations worldwide have created a basis for the growth of renewables through financial incentives (e.g., feed-in tariff policies), penetration targets, and other mechanisms (e.g., renewables quotas in energy portfolios). As a consequence of such policies, the penetration of intermittent renewable energy, such as wind and solar energy, has been continuously increasing (IEA, 2017).

The increased penetration of intermittent renewables brings new challenges to modern power systems (Le Xie et al., 2011). Higher variability in power generation may increase the need for operating reserves. It may also require better controls for balancing very-short-term generation and demand (Le Xie et al., 2011), which incurs higher operational costs. One way to support operational planning and reduce operational costs is to enhance the predictability of renewable power generation via accurate and informative forecasting.

Many studies have addressed wind speed and wind power forecasting (for reviews, refer to Jung and Broadwater (2014), Lei, Shiyan, Chuanwen, Hongling, and Yan (2009), Vargas et al. (2019), Zhu and Genton (2012), and the references therein). More recently, studies have also addressed solar irradiance and solar power forecasting (Antonanzas
et al., 2016; David, Luis, & Lauret, 2018). Technological advances in metering and communication have made data measured at different locations over time increasingly available. As a consequence, spatio-temporal approaches have begun to gain attention in forecasting practice (Tascikaraoglu et al., 2016). The use of spatially distributed information in modeling has improved the accuracy of temporal forecasts. For example, in Dowell and Pinson (2016), the spatio-temporal sparse vector auto-regression (sVAR) approach outperformed single-site benchmark models in one-step-ahead wind power forecasting. These benchmarks included auto-regressive (AR) and vector auto-regressive (VAR) models. In Persson, Bacher, Shiga, and Madsen (2017), the spatio-temporal gradient boosted regression trees (GBRTs) approach provided competitive solar power forecasts compared to single-site benchmark models such as AR and GBRT, among others.

The notion that spatio-temporal information may improve wind and solar power generation forecasts is quite reasonable. Wind and solar power generation are affected by meteorological conditions, and these conditions may span large regions. For example, information from nearby meteorological stations could be useful to predict conditions at a target wind farm. Spatio-temporal information on meteorological variables such as humidity, temperature, and precipitation may also be useful for wind and solar modeling. Statistical and machine learning methods have been adopted to forecast renewable power generation with spatio-temporal approaches, and some of these methods consider information on meteorological variables (e.g., Persson et al. (2017), Tascikaraoglu, Sanandaji, and Chicco et al. (2016)). Examples of statistical methods used in spatio-temporal forecasting include:
• the compressive spatio-temporal forecasting (CSTF) model (Tascikaraoglu, Sanandaji, & Chicco et al., 2016);
• the spatio-temporal vector auto-regressive model (André, Dabo-Niang, Soubdhan, & Ould-Baba, 2016);
• the sparse vector auto-regressive model (sVAR) (Dowell & Pinson, 2016);
• the trigonometric direction diurnal model (TDD) (Xie, Gu, Zhu, & Genton, 2014);
• others (Aryaputera, Yang, Zhao, & Walsh, 2015; He, Yang, Zhang, & Vittal, 2014; Koivisto et al., 2016).
Examples of machine learning methods include ensembles of decision trees (DTs) and support vector regression (SVR) (Heinermann & Kramer, 2016), gradient boosted regression trees (GBRTs) (Persson et al., 2017), and a fuzzy model (Damousis, Alexiadis, Theocharis, & Dokopoulos, 2004).

Recently, deep learning frameworks and techniques have been gaining ever more attention in different areas (e.g., see Li, Shang, and Wang (2018), Liu, Luo, Lian, and Gao (2018), Shi et al. (2015)). This attention is due to breakthroughs in computer vision, machine translation, speech recognition, and other complex tasks. Convolutional neural networks (Convnets) have been one of the techniques responsible for breakthroughs in computer vision (see Krizhevsky, Sutskever, and Hinton (2012)). Convnets are neural networks specially designed to process multi-dimensional data in which the ordering of the elements matters, such as images (2D arrays). Advances in Convnets' elements (e.g., factorized convolutions (Chollet, 2016) and residual mappings (He, Zhang, Ren, & Sun, 2016a)) have allowed further improvements to the technique.

Spatio-temporal prediction using Convnet-based models is an ongoing research topic in many areas, such as meteorology (Shi et al., 2015) and computer vision (Denton & Birodkar, 2017; Mathieu, Couprie, & LeCun, 2015; Villegas, Yang, Hong, Lin, & Lee, 2017). Shi et al. (2015) proposed a model known as ConvLSTM, which extends recurrent neural networks (RNNs) with Long Short-Term Memory (LSTM) cells (Hochreiter & Schmidhuber, 1997) to perform convolutional operations. The model was used to solve precipitation nowcasting and video frame prediction. Mathieu et al. (2015) proposed a multi-scale framework based on Convnets and adversarial training (Goodfellow et al., 2014) to predict future video frames. More recent work on video frame prediction suggests decomposing video into content and motion (Denton & Birodkar, 2017; Villegas et al., 2017); moreover, these works combine a multitude of techniques such as Convnets, RNNs, and adversarial training. U-Net (Ronneberger, Fischer, & Brox, 2015), a technique originally developed for image segmentation, has recently been used for video frame prediction (Liu, Luo et al., 2018).

Convnets have already been adopted in the realm of renewable energy prediction. Zhu, Li, Mo, and Wu (2017) propose a Convnet that provides univariate wind power predictions by rearranging the univariate data into a 2-D array. Liu, Mi, and Li (2018) decompose wind speed data into low- and high-frequency series. The low-frequency series is predicted via a Convnet with an LSTM layer on top of it, whereas the high-frequency series are predicted with standard Convnets. Chen, Zhang, Zhang, Peng, and Cai (2019) use a combination of multi-factor correlation, Convnets, and LSTM to predict wind speeds at a target site. In Chen et al. (2019), the authors build a 3D matrix containing meteorological factors for all sites at historical times. They use Convnets to extract spatial features of the meteorological factors at various sites and times, and LSTM is then used to extract temporal features.

This work proposes an architecture which combines U-Net and a standard Convnet: the U-Convolutional model. The U-Net was recently used to predict video frames (see Liu, Luo et al. (2018)). In this architecture, it is used to synthesize and produce high-level features from spatio-temporal data. These high-level features are then fed into a Convnet, which produces the wind speed prediction. The proposed architecture is trained jointly. This means that the high-level features obtained via the U-Net are produced by optimizing a single value (the wind speed error for a single coordinate). In the context of this work, the effect of adding calendar variables and identity mappings (He et al., 2016a) to the proposed U-Convolutional model is also investigated. The proposed U-Convolutional model is compared to other architectures that use convolutional layers as a basis, such as standard Convnets and Convnets built with simplified Inception modules (Chollet, 2016). It is also compared to fully connected neural networks with spatio-temporal information and to benchmark univariate models (such as Box & Jenkins (Box
& Jenkins, 1976) and BATS (Livera, Hyndman, & Snyder, 2011)). In the convolutional-based models, each input channel receives one explanatory variable (Maçaira, Thomé, Oliveira, & Ferrer, 2018) that is a spatio-temporal process.

This work differs from Chen et al. (2019) in its architecture and also in another aspect: meteorological variables (temperature and the u- and v-components of wind) are adopted to directly map wind speed at a single site. Specifically, the proposed architecture learns the relations between the explanatory variables and the output data without being given any specific information on the correlation between the variables. The contributions of this work are the following:
• The proposition of a deep learning architecture that combines the U-Net architecture and a Convnet to process spatio-temporal information from multiple variables and forecast wind speeds at a single location;
• The investigation of Convnet architectures which include simplified inception modules, identity mappings, and calendar variables in the task of spatio-temporal wind speed prediction;
• A comparison of the proposed and Convnet architectures with a fully connected neural network modeled in a spatio-temporal setting and with traditional univariate models such as BATS (Livera et al., 2011), Box and Jenkins (1976), and Naïve models;
• An outline of a way forward in modeling wind speeds with multiple explanatory variables.

This paper is structured as follows. Section 2 presents the fundamentals of the proposed U-Convolutional model. Section 3 outlines the proposed approach to spatio-temporal wind prediction with Convnets and describes the other Convnet architectures. Section 4 presents the dataset and the experimental set-up adopted in this study. Section 5 presents and discusses the results, comparing the proposed U-Convolutional models against other Convnet architectures, a fully connected neural network, the BATS model, the ARIMA model, and the Naïve model. The final section provides the conclusions of this work and perspectives for future work.

2. Fundamentals

This work proposes an architecture which combines U-Net and Convnet to process spatio-temporal data related to multiple explanatory variables and to predict wind speeds at a single site. Before detailing the approach, the elements that comprise the proposed architecture are introduced.

2.1. Feature extraction in convolutional layers

Let $X$ be a 2-D input array; i.e., $X = (x_{i,j})$. The array $X$ may hold the values of an input plane preceding a given convolutional layer. Here, $i$ and $j$ are axes that retain the spatial disposition of values in the 2-D input array (e.g., the height and width axes of an image). Furthermore, let $W$ be a 2-D filter with $M$ elements in the $i$-axis and $N$ elements in the $j$-axis; i.e., $W = (w_{i,j}) \in \mathbb{R}^{M \times N}$. Then, the convolution operation of the filter $W$ over the 2-D input array $X$ may be described as in Eq. (1) (Goodfellow, Bengio, & Courville, 2017).¹

$$S_{i,j} = (X * W)_{i,j} = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} X_{i+m,\,j+n}\, W_{m,n} \tag{1}$$

The output $S_{i,j}$ in Eq. (1) corresponds to one single unit in the feature map of a convolutional layer. Performing the operation for every position $(i, j)$ yields all the units of the resulting feature map, $S = (S_{i,j})$.

¹ As explained in Goodfellow et al. (2017), Eq. (1) is, strictly speaking, the cross-correlation function, which many machine learning libraries implement under the name convolution. For further details, please refer to Goodfellow et al. (2017).

In deep learning, inputs to Convnets are usually 3-D arrays with height, width, and depth axes. In images, the height and width axes relate to the spatial disposition of pixels, whereas the depth axis relates to the number of color channels. To account for a 3-D input array, the kernel of the convolutional layer is made volumetric. Each channel of the kernel performs a convolution operation on a specific channel of the input array (Dumoulin & Visin, 2016). To form a feature map, the outputs of the convolution operations on all channels are summed, and then a bias term is added (element-wise) to each resulting unit in the spatial grid. After the bias term is added, a non-linearity is applied element-wise to the resulting units in the feature map. In this way, the convolutional layer maps features that are non-linear transformations of the input. In deep learning, a usual non-linearity is the ReLU activation (Glorot, Bordes, & Bengio, 2011), defined as $g(z) = \max(0, z)$. This activation function has a number of properties that are desirable for training deep neural networks and extracting features, such as producing sparse representations. As discussed in LeCun, Bengio, and Hinton (2015), the ReLU activation typically learns much faster in networks with many layers.

2.2. Pooling layers

The pooling layer, also known as the subsampling layer (LeCun, Bottou, Bengio, & Haffner, 1998), reduces the dimension of the feature map by summarizing the units in a local neighborhood of the feature map with one single value. This procedure makes the output of the pooling layer approximately invariant to small translations and distortions in the data (Goodfellow et al., 2017; LeCun et al., 2015, 1998). A pooling layer may take, for instance, the maximum value or the average value of the units in the local neighborhood. These options are known, respectively, as max pooling and average pooling.

Let there be a pooling layer of size $P \times P$, applied to a single output feature map of a convolutional layer of size $Q \times Q$. Then, for a stride of size $N_s$, the output of the pooling operation is a map of size

$$\left(\frac{Q - P}{N_s} + 1\right) \times \left(\frac{Q - P}{N_s} + 1\right),$$

assuming no padding and that $Q - P$ is divisible by $N_s$ (otherwise, the quotient is rounded down).
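The operations in Sections 2.1 and 2.2 can be made concrete with a minimal NumPy sketch. This is an illustration under our own assumptions, not the authors' implementation.

The assumptions are as follows:
- The function names are hypothetical.
- Arrays use a (height, width, channels) layout.
- "Valid" positions are used, with no padding.

The sketch covers three operations:
- Eq. (1) computed as a cross-correlation.
- A volumetric kernel whose per-channel results are summed, followed by a bias and the ReLU.
- Max pooling whose output size follows the ((Q − P)/N_s) + 1 rule.

```python
import numpy as np


def cross_correlate_2d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Eq. (1) at all valid positions: S[i, j] = sum_m sum_n x[i+m, j+n] * w[m, n]."""
    M, N = w.shape
    rows, cols = x.shape[0] - M + 1, x.shape[1] - N + 1
    s = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            s[i, j] = np.sum(x[i:i + M, j:j + N] * w)
    return s


def conv_relu_layer(x: np.ndarray, kernels: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Convolutional layer with volumetric kernels, one bias per feature map, and ReLU.

    x:       (height, width, channels), e.g. latitude x longitude x variables
    kernels: (n_filters, M, N, channels)
    bias:    (n_filters,)
    returns: (height - M + 1, width - N + 1, n_filters)
    """
    maps = []
    for kernel, b in zip(kernels, bias):
        # Each kernel channel filters the matching input channel; results are summed.
        s = sum(cross_correlate_2d(x[:, :, c], kernel[:, :, c]) for c in range(x.shape[2]))
        maps.append(np.maximum(0.0, s + b))  # add bias, then ReLU g(z) = max(0, z)
    return np.stack(maps, axis=-1)


def max_pool_2d(fmap: np.ndarray, P: int, stride: int) -> np.ndarray:
    """P x P max pooling of a single square Q x Q feature map with stride N_s."""
    Q = fmap.shape[0]
    size = (Q - P) // stride + 1  # ((Q - P) / N_s) + 1, rounded down
    out = np.empty((size, size))
    for i in range(size):
        for j in range(size):
            r, c = i * stride, j * stride
            out[i, j] = fmap[r:r + P, c:c + P].max()
    return out


if __name__ == "__main__":
    # Hypothetical 10 x 10 grid with 9 input channels (three variables at three lags).
    rng = np.random.default_rng(0)
    x = rng.normal(size=(10, 10, 9))
    fmaps = conv_relu_layer(x, rng.normal(size=(4, 2, 2, 9)), np.zeros(4))
    print(fmaps.shape)                                       # (9, 9, 4)
    print(max_pool_2d(fmaps[:, :, 0], P=2, stride=1).shape)  # (8, 8)
```

Deep learning frameworks provide far more efficient versions of these layers. The explicit loops are there only to expose the indexing in Eq. (1) and the pooling size rule. In the usage example, a 2 × 2 kernel shrinks a 10 × 10 map to 9 × 9, and 2 × 2 pooling at stride 1 then shrinks it to 8 × 8.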

2.3. Fully connected layers

Fully connected (FC) layers are usually the final layers in a Convnet and are stacked on top of modules of convolutional and pooling layers. Since these modules extract features from the data, the FC layers act directly upon the extracted features and map them to the final target of the Convnet.

Each neuron in an FC layer is connected to all the neurons in the preceding layer. Each connection is associated with a trainable parameter (weight), and each neuron also has a bias term. Therefore, a neuron in the FC layer may be computed by taking the dot product of its inputs and their respective weights, adding a bias term, and applying an activation function. Let a neuron in an FC layer have $n$ inputs, let $f$ be an activation function, let $b$ be the bias term, and let $x_i$ and $w_i$ be the input and weight values of that neuron's $i$th input. Then, the neuron may be computed as $f\left(\left(\sum_{i=1}^{n} w_i x_i\right) + b\right)$.

The configuration of the Convnet's output layer depends on the task being solved by the network. For example, to solve a multi-class classification with $N$ output classes, the output layer may be designed to have $N$ neurons with a softmax activation function. In the case of wind speed prediction for a single location, the output layer has only one neuron.

2.4. Identity mappings

In He et al. (2016a), one of the most prominent deep learning architectures (Gu et al., 2018) was proposed: the ResNet. The architecture aimed to address the degradation problem of very deep neural networks. To do so, it introduced residual blocks so that the layers of the network would perform residual mappings. The hypothesis was that the network would be easier to optimize with layers that performed residual mappings. Without residual mappings, the network would have to learn unconstrained mappings, which would be harder.

Formally, the residual mapping may be formulated as follows:

$$y = x + F(x, W_i) \tag{2}$$

In Eq. (2), $y$ and $x$ are the output and input vectors of the layer(s) performing the residual mapping. $W_i$ is the parameter vector of the mapping performed by the layer(s), and $F(x, W_i)$ is the residual function to be learned.

2.5. U-Net

Convnets were originally developed to solve image classification tasks where the output was a class label (e.g., CIFAR, ImageNet). However, many other visual tasks require pixel-by-pixel classification, often with very limited data (e.g., fewer than thousands of samples). The U-Net (Ronneberger et al., 2015) was proposed in the context of biomedical image segmentation. It addresses these issues of traditional Convnets, producing more precise segmentation with very few training examples.

Introduced in 2015, the U-Net is a special kind of Convnet that takes its name from the U shape of its architecture. The U-Net is based upon the fully convolutional network presented in Long, Shelhamer, and Darrell (2014) and uses only convolutional, subsampling, and upsampling layers. The architecture has a contracting path (convolution followed by downsampling), which extracts high-level features from the input image. This is followed by an expansive path (upsampling followed by convolution), which creates a high-resolution segmentation map (for more details, refer to Ronneberger et al. (2015)).

3. Methodology

The use of spatial information has been shown to enhance the predictability of wind (Dowell & Pinson, 2016; Persson et al., 2017; Tascikaraoglu, Sanandaji, & Chicco et al., 2016). Thus, more advanced techniques that process spatio-temporal information to predict wind speed may lead to better tools for power systems operations. Convnets are a powerful technique for processing spatial data. This work develops architectures based on convolutional and fully convolutional neural networks. These architectures leverage spatio-temporal information about multiple explanatory variables with the goal of predicting wind speeds at a single location.

Let there be explanatory variables $\{X_1, \ldots, X_M\}$ which are spatio-temporal processes. Each explanatory variable is then a collection of random variables indexed by location $s \in S$ and time $t \in T$. Here, $S$ is the spatial domain (in our case, $S \subset \mathbb{R}^2$) and $T \subset \mathbb{R}$ is the time domain. In the context of wind prediction, potential explanatory variables are temperature, pressure, presence of rain, the u- and v-components of wind, and others.

To process spatio-temporal variables with Convnets, this work proposes assigning each explanatory variable to one of the input channels of the Convnet. In this set-up, for a given input channel, the height and width axes of the Convnet represent latitude and longitude, respectively. Each value at a given position in the input channel is the value of the associated explanatory variable at the respective latitude-longitude pair. Moreover, all values in a given input channel relate to the same time step. For clarity, let the input variable $X_1$ at time $t$ be written as $X_{1,t}$. Information about a given variable at different time steps can be added by including lags of the spatio-temporal variables in the input channels. Multiple explanatory variables can also be included in the input channels. Let $l \in \{0, \ldots, L\}$ denote lagged steps and let $m \in \{1, \ldots, M\}$ denote explanatory variables. The multivariate input to the Convnets, $X_t$, may then be written as $X_t = \{X_{m,t-l}\}$.

In this setting, the objective is to use the multivariate 3D input tensor, $X_t$, to predict a single value $y_{t+h}$. This value denotes the wind speed $h$ steps ahead for a single site (i.e., for a single latitude-longitude coordinate). This work proposes the use of neural networks to perform such a mapping. Specifically, deep learning techniques are adopted to design neural network architectures that extract high-level
features from the multi-dimensional tensors and map the extracted features into the output variable in $\mathbb{R}$. The backbones of the proposed architecture are the U-Net and Convnet architectures.

3.1. U-Convolutional models

The U-Net maps an image to a segmentation mask, solving pixel-by-pixel classification, and it is a powerful technique for image segmentation. More recently, the U-Net was adopted to predict future frames in videos. Specifically, Liu, Luo et al. (2018) adopt an adversarial framework in which the generator, which produces the prediction for the next frame, is a U-Net. As pointed out in Liu, Luo et al. (2018), the U-Net was selected as the predictor due to its good performance in image-to-image translation. The inputs to the U-Net in Liu, Luo et al. (2018) are frames at present and past times, and the output is the frame one step ahead of the present time.

In the proposed architecture, referred to as the U-Conv model, the U-Net is used differently. Its final layer is not associated with a target tensor (e.g., a segmentation mask, image, or video frame). Instead, the final layer of the U-Net is connected to a Convnet, which outputs a single value (i.e., the wind speed $h$ steps ahead). Consequently, the U-Net's final layer is a feature map obtained from an optimization process that adjusts the weights to minimize the prediction error of a single output variable. In this setting, we design the U-Net to map the 3D input tensor, $X_t$, to a 3D output tensor with the same dimensions as the input tensor. This output tensor is connected to a Convnet-based architecture, which produces the final single-value output for our problem (the wind speed prediction).

3.1.1. Input design

In the U-Conv model design for wind speed prediction, the input variables are the u-component of wind, the v-component of wind, and the temperature at times $t$, $t-1$, and $t-2$. Let these three variables be, respectively, $U$, $V$, and $K$. The input tensor may then be written as $X_t = \{U_{t-l}, V_{t-l}, K_{t-l}\}$, $l \in \{0, 1, 2\}$. Hence, $X_t$ totals nine input variables, and each variable is assigned to one input channel. Thus, the input array has the shape $n_{\text{latitude}} \times n_{\text{longitude}} \times 9$. These dimensions account, respectively, for the number of latitude points (height axis), the number of longitude points (width axis), and the three explanatory variables at times $t$, $t-1$, and $t-2$ (channels axis).

The output variable is the wind speed at time $t+h$ for a given $(i, j)$ coordinate; i.e., $y_{t+h}^{(i,j)}$. The wind speed is composed of the u- and v-components of wind, which is why these variables are selected as explanatory variables for the proposed task. Let $u_t^{(i,j)}$ and $v_t^{(i,j)}$ be the u- and v-components of wind for a given $(i, j)$ coordinate at time $t$. Then, the wind speed for the same coordinate is calculated as

$$y_t^{(i,j)} = \sqrt{\left(u_t^{(i,j)}\right)^2 + \left(v_t^{(i,j)}\right)^2}.$$

Fig. 1 illustrates the proposed solution for spatio-temporal wind speed prediction with the U-Conv model:
(A) The variables that comprise the input tensor: $U$, $V$, and $K$ at times $t$, $t-1$, and $t-2$.
(B) The structure of the input tensor: each 2D tensor in (A) is concatenated along the channel axis to form the input tensor of dimensions $n_{\text{latitude}} \times n_{\text{longitude}} \times 9$.
(C) The overall structure of the U-Conv model, which is composed of two parts: a U-Net model and a Convnet-based model. The U-Conv model outputs a single value (the $h$-step-ahead wind speed for a given $(i, j)$ coordinate).
(D) The target variable of a single coordinate.

3.1.2. U-Net design

The design of the U-Net part of the U-Conv model follows the structure proposed in Ronneberger et al. (2015). However, the spatial resolution of the input and the output is smaller in the present work. The datasets used here have spatial dimensions of 10 × 10 and 9 × 9 (accounting for specific regions in Brazil). In Ronneberger et al. (2015), by contrast, the spatial dimension is 512 × 512 (accounting for medical images with a 512 × 512 pixel resolution). In Ronneberger et al. (2015), the contracting path is composed of 3 × 3 convolutions and 2 × 2 max pooling with a stride of 2. Given the reduced spatial information in our input setting, the proposed architecture instead uses 2 × 2 convolutions and 2 × 2 pooling with a stride of 1 in the contracting path. The proposed design follows the same idea of doubling the number of filters at each 2 × 2 subsampling. It differs, however, in the number of filters at the convolutional layers: instead of the 64 filters in the first convolutional layer, the proposed model uses 24 or 48 filters, which are selected in a hyperparameter grid search. The expansive path follows the idea of the original work. It consists of upsampling of the feature maps, followed by a convolution that halves the number of features. The result is concatenated with the corresponding output maps from the contracting path and then passed through two convolutional layers. The expansive path is symmetric to the contracting path in terms of the number of filters in the convolutional layers.

3.1.3. Convnet design

To build the Convnet part of the model, convolution and subsampling layers are combined, increasing the number of feature maps (representational power) as the size of the feature maps (resolution) decreases (LeCun et al., 1998). In this setting, a module comprising two convolutional layers followed by one pooling layer is created. Two such modules are stacked to act as the feature extractor of the Convnet part of the model. This design is inspired by the first layers of the VGG-19 model (He et al., 2016a; Simonyan & Zisserman, 2014).

The Convnet part of the proposed model has eight or nine layers. The first six layers perform the feature extraction:
• Layers 1–2: convolutional layers with 2 × 2 kernels.
• Layer 3: a 2 × 2 pooling layer.
• Layers 4–5: convolutional layers with 2 × 2 kernels.
• Layer 6: a 2 × 2 pooling layer.
After the sixth layer, the output map is reshaped into a one-dimensional vector. This vector is then fed to a fully connected layer (the seventh layer of
953
B.Q. Bastos, F.L. Cyrino Oliveira and R.L. Milidiú International Journal of Forecasting 37 (2021) 949–970

Fig. 1. Illustration of input-output mapping with U-Conv model.

the Convnet design). The last layer is the output layer The full architectures of the U-Conv model and the
– comprising a single neuron with linear activation.2 variants are illustrated in Figs. 2. In the Figures, the filters
A dropout layer between the fully connected layer in the first convolutional module of the U-Net part of
and the output layer may be included in the Convnet the model are referred to as f0; the filters in the first
design. The adoption of the dropout layer is treated as and second convolutional modules of the Convnet part
a hyperparameter choice; since other techniques that of the model are referred to as f1 and f2, respectively.
prevent overfitting are adopted (e.g., early stop in train- Fully connected layers are referred to as fc, and the output
ing), a dropout might not be needed. The hyperparam- layer is named as out. The number of filters and the
eter choices are made through experiments, which are regularization are hyperparameter choices. The selected
described in Section 4. hyperparameters are specified in Appendix A.

3.1.4. Model variants

A second version of the U-Conv model is proposed, named the residual U-Conv model (or U-ConvRes), in which residual connections are added to the Convnet part of the model, one in every convolutional module of the Convnet. Specifically, the U-ConvRes model adopts a variant of the identity mapping in which a 1 × 1 convolution is applied in the residual connection. This variant is adopted so that the input and output sizes of the feature maps match (for details, see He et al. (2016a) and He, Zhang, Ren, and Sun (2016b)).

Another model variant investigated in this study uses calendar variables to capture seasonality in the proposed problem. The goal of this variant is to determine experimentally whether adding seasonal indicator variables improves the forecasting accuracy of the proposed models. Specifically, the calendar variables are dummy variables for (a) the hour of the day, (b) the month of the year, and (c) the day of the week, totaling 43 exogenous variables (24 + 12 + 7). These variables are incorporated (via concatenation) into the input of the fully connected part of the U-Conv and U-ConvRes models.

² The number of filters in the convolutional layers and the number of neurons in the fully connected layer may vary in the experiments. Nevertheless, all U-Conv models follow the idea that the number of filters in the first convolutional module is lower than that in the second convolutional module.

3.2. Convnet and inception models

Following the idea of using explanatory variables in the channel axis of the 3D input tensor, other Convnet models can also be developed for spatio-temporal wind speed forecasting. Specifically, this work develops a standard Convnet, a Convnet with identity mappings, a Convnet with simplified inception modules (for details, see Chollet (2016)), and a Convnet with simplified inception modules and identity mappings. These architectures, which we refer to as Convnet, residual Convnet (or ConvRes), Inception, and residual Inception (or InceptionRes), respectively, are illustrated in Figs. 3(a), 3(b), 3(c), and 3(d). By investigating these other Convnet architectures, one may be able to identify which other types of architectures are suitable for spatio-temporal wind speed forecasting.

In He et al. (2016a), the residual mapping is applied every few layers; for example, in the 34-layer residual network, each residual mapping encompasses two consecutive layers. In the architectures tested in this work, an identity mapping is adopted for every convolutional (or inception) module, as illustrated in Fig. 3(b) (and Fig. 3(d)). As in the U-ConvRes, the 1 × 1 convolution variant of the identity mapping is used to create the ConvRes and InceptionRes models.

Concerning the inception-based models, the architecture design replaces the convolutional layers of the Convnet and residual Convnet with inception modules (see Figs. 3(a) and (b) versus Figs. 3(c) and (d)). Standard convolution
B.Q. Bastos, F.L. Cyrino Oliveira and R.L. Milidiú International Journal of Forecasting 37 (2021) 949–970

Fig. 2. (a) U-Conv; (b) U-ConvRes; (c) U-Conv w/calendar info; (d) U-ConvRes w/calendar info.

attempts to learn cross-channel correlations and spatial correlations jointly with its filters (Chollet, 2016). Conversely, the Inception idea aims to split the processing of cross-channel correlations and spatial correlations by factoring the convolution into multiple kernels. In this context, the goal of this work is to understand whether adopting this decoupling technique (the inception idea) improves spatio-temporal wind forecasting.

As well as experimenting with identity mappings in the Convnet and Inception architectures, this work also tests seasonal variables in the four models depicted in Fig. 3. The seasonal variables are added following the same principles as in the U-Conv model: they total 43 dummy variables, which are concatenated to the flattened output of the feature extractor, serving as the input to the fully connected part of the models.

Fig. 3. (a) Convnet; (b) ConvRes; (c) Inception; (d) InceptionRes.
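The 1 × 1 convolution variant of the identity mapping used in the U-ConvRes, ConvRes, and InceptionRes models can be illustrated with a small pure-Python sketch. A 1 × 1 convolution is a per-pixel linear map across channels, so it can change the channel count of the shortcut to match the output of the module, F(x), before the two are added. The function names, the nested-list layout (height × width × channels), and the absence of an activation after the addition are assumptions for illustration only; in Keras, such a projection would typically be a `Conv2D` layer with `kernel_size=1`.

```python
def conv1x1(x, weights, bias):
    """1x1 convolution on an H x W x C_in nested list.

    weights has shape C_in x C_out and bias has length C_out; each pixel's
    channel vector is mapped independently, so H and W are unchanged.
    """
    c_in, c_out = len(weights), len(bias)
    return [
        [
            [sum(pixel[c] * weights[c][k] for c in range(c_in)) + bias[k] for k in range(c_out)]
            for pixel in row
        ]
        for row in x
    ]


def add_maps(a, b):
    """Element-wise sum of two H x W x C nested lists of identical shape."""
    return [[[u + v for u, v in zip(pa, pb)] for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def residual_module(x, module_fn, proj_weights, proj_bias):
    """Output = F(x) + W_s x, where W_s is the 1x1-convolution projection shortcut."""
    return add_maps(module_fn(x), conv1x1(x, proj_weights, proj_bias))


def module(t):
    """Stand-in for the convolutional module F: returns a 1 x 1 x 3 feature map."""
    return [[[10.0, 20.0, 30.0]]]


# 1 x 1 spatial grid with 2 input channels; the module outputs 3 channels,
# so the shortcut projects 2 -> 3 channels before the addition.
x = [[[1.0, 2.0]]]
w_s = [[1.0, 0.0, 1.0],
       [0.0, 1.0, 1.0]]
b_s = [0.0, 0.0, 0.0]
print(residual_module(x, module, w_s, b_s))  # [[[11.0, 22.0, 33.0]]]
```

A plain identity shortcut would fail here because the 2-channel input cannot be added to the 3-channel module output; the learned projection is what makes the addition well defined.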

3.3. Training

To build the proposed models, a loss function, which measures how poorly the model fits the data, needs to be defined. The loss function selected for model training was the mean squared error, a popular loss function for regression tasks (see Eq. (3)). To estimate the models, we also need to define the learning algorithm that minimizes the selected loss function. To search the weight space of the Convnets for values that minimize the loss function of our task, the Adam algorithm (Kingma & Ba, 2014) was selected. Adam was chosen because it combines properties of Adagrad (Duchi, Hazan, & Singer, 2011), which deals with sparse gradients, and RMSprop, which deals with non-stationary objectives, and because Adam is usually considered robust to hyperparameter choices (Goodfellow et al., 2017).

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 \tag{3}$$

where $y_i$ is the true value for the $i$th example and $\hat{y}_i$ is the predicted value for the $i$th example.

4. Empirical evaluation

4.1. Datasets

In this work, two samples from the Climate Forecast System Reanalysis (CFSR) dataset (Saha et al., 2014) are used to evaluate the proposed architectures. The dataset was obtained from the Research Data Archive (RDA) website. Both samples from the CFSR dataset contain data on the u-component of wind, the v-component of wind, and temperature for latitude-longitude pairs. The u- and v-components of wind were obtained at a height of 10 m above ground, whereas temperature was acquired at ground surface level.³ The first sample contains data for latitude-longitude pairs where latitudes range from −9.5° to −14.0° and longitudes range from −44.5° to −40.0°, both at a 0.5° spatial resolution. These coordinates correspond to an area in Brazil comprising, approximately, the entire state of Bahia. Hence, this first sample of the CFSR dataset is referred to as Bahia Wind.

³ For wind generation prospecting, because wind behavior differs across height levels, special attention should be paid to the heights of wind measurements when considering the proposed approach for wind speed modeling.

For the other sample, data were acquired for latitude-longitude pairs where latitudes range from −4.191° to −5.826° and longitudes range from −36.000° to −37.841°, both at a spatial resolution of approximately 0.204°. These coordinates correspond to a region in the State of Rio Grande do Norte, one of the regions with the highest incidence of constant wind in Brazil. This second sample of the CFSR dataset is referred to as Rio Grande do Norte Wind or RN Wind.

The data acquired for each variable at each coordinate consist of an hourly time series running from 2011-04-01 00:00 until 2017-01-01 00:00 for Bahia Wind, and from 2011-04-01 00:00 until 2018-03-01 00:00 for Rio Grande do Norte Wind.

In summary, there are 50,448 hourly observations for each of the 10 × 10 locations (latitude-longitude pairs) for each of the 3 variables considered for Bahia Wind, and 60,648 hourly observations for each of the 9 × 9 locations for each of the 3 variables considered for Rio Grande do Norte Wind.

In Figs. 4 and 5, the explanatory variables (at time step t = 0 hours) are illustrated as heatmaps for Bahia Wind and Rio Grande do Norte Wind, respectively.

4.2. Experimental setup

The proposed models were used to predict wind speed one hour and six hours ahead of time for all coordinates of the Bahia Wind and Rio Grande do Norte Wind datasets. Thus, models were developed for each (i, j) latitude-longitude pair of each dataset. All models map a 3D input tensor with spatio-temporal information, $X_t$, into a single target value, $y_{t+h}^{(i,j)}$, which is the wind

Fig. 4. Heatmaps of meteorological variables for time step t = 0 hours of Bahia Wind.

Fig. 5. Heatmaps of meteorological variables for time step t = 0 hours of Rio Grande do Norte Wind.

speed h steps ahead for a given (i, j) coordinate. In the experiments, the same input tensor, $X_t$, was used for all target coordinates. Convergence in neural networks is faster when input data are normalized (LeCun, Bottou, Orr, & Müller, 2012). Thus, to develop the neural network models, a preprocessing step was adopted in which each input variable in $X_t$ was normalized to have an approximately zero mean and a standard deviation of one.

Considering the spatial size of Bahia Wind and Rio Grande do Norte Wind (10 × 10 and 9 × 9, respectively), and that four modeling variants were selected for experimentation (with or without calendar variables, and with or without residual connections) for each coordinate with four different neural network architectures (U-Conv, Inception, Convnet, and a benchmark fully connected neural network detailed below), a large number of models need to be developed for each forecasting horizon: 1600 models for Bahia Wind (100 coordinates × 4 architectures × 4 variants) and 1296 models for Rio Grande do Norte Wind (81 × 4 × 4).

In such a setting, optimizing the hyperparameters (such as the learning rate, batch size, number of neurons, and so forth) using a grid search for each location and all variants may be highly time consuming. To avoid this, the following strategy was adopted: a single location is selected for a hyperparameter search on the simplest model variant (no calendar variables and no residual connections) of each algorithm. The best hyperparameter configuration is then adopted for all other variants of that algorithm to train models for all other coordinates. The best hyperparameter configuration is selected considering only the one-hour-ahead problem; the same configuration is used to model the six-hour-ahead problem. Additionally, to avoid an extensive grid search, the best Convnet configuration (obtained through a hyperparameter search) was adopted as the Convnet part of the U-Convolutional models. Consequently, the hyperparameter search of the U-Convolutional models was restricted to the U-Net part of the model (as well as learning and regularization hyperparameters). These strategies may produce sub-optimal models for the variants (and locations) under investigation, since the variants are not subjected to a hyperparameter search. However, they allow us to evaluate whether including (a) calendar variables and (b) residual connections improves models with fixed hyperparameter designs, and to compare the different architectures, since they are all subject to the same strategy.

Before performing the hyperparameter search, the Bahia and RN datasets are split into training and test sets. The test set contains the last 20% of samples in each dataset (10,089 samples for Bahia Wind and 12,129 samples for Rio Grande do Norte Wind) and is used to evaluate how well the models perform on unseen data. To search the hyperparameters of the proposed architectures, 5-fold cross-validation (CV) is applied to the training set (thus, each validation set in the CV setting contains 20% of the samples of the training set). The best hyperparameter configuration is the one whose model returns the lowest average MAE over the 5 folds of the CV setting. After selecting the best hyperparameters, the training set is divided into fixed training and validation sets, with the validation set containing the last 20% of the training data. Following this, all variants and architectures are trained for all coordinates and horizons, optimizing the parameters (weights) of the architectures based on the loss on the validation set. After estimating all models, predictions are made on the test data, and the accuracy measures are evaluated.

In the hyperparameter search, regularization techniques (such as dropout (Srivastava, Hinton, Krizhevsky,

Sutskever, & Salakhutdinov, 2014) and weight regularization) are tested; the goal is to obtain hyperparameter configurations that are robust to overfitting. The same training protocol was adopted for all architectures (both in the hyperparameter search and in standard training): training runs for up to 200 epochs, with early stopping when the validation loss does not decrease for 5 epochs (early stopping is itself a regularization technique). The batch size was set to 256 for all algorithms.

4.3. Accuracy measures

The accuracy measure adopted to select the best hyperparameter configuration was the Mean Absolute Error (MAE), evaluated on the validation set. MAE is a scale-dependent metric that is very common in the forecasting literature in general (Hyndman & Koehler, 2006) and in wind speed prediction in particular (e.g., Hu, Zhang, Yu, and Xie (2016), Li, Shi, and Zhou (2011), Tascikaraoglu, Sanandaji, Poolla, and Varaiya (2016)).

To evaluate how well the proposed models perform on unseen data, other metrics are also adopted, so that the results do not rest on MAE alone. In this context, the Root Mean Squared Error (RMSE), the Symmetric Mean Absolute Percentage Error (sMAPE), and the Mean Absolute Scaled Error (MASE) are also provided. RMSE is selected as a second scale-dependent measure, whereas sMAPE and MASE provide percentage-error and scaled-error references, respectively. For a more detailed discussion of accuracy measures for forecasting, see Hyndman and Koehler (2006).

Let $y_i$ and $\hat{y}_i$ be, respectively, the true and predicted values of the $i$th example. Then, the aforementioned accuracy measures are defined as follows:

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \tag{4}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\lvert y_i - \hat{y}_i\rvert \tag{5}$$

$$\mathrm{sMAPE} = 200 \times \frac{1}{N}\sum_{i=1}^{N}\frac{\lvert y_i - \hat{y}_i\rvert}{\lvert y_i + \hat{y}_i\rvert} \tag{6}$$

$$\mathrm{MASE} = \frac{1}{N}\sum_{i=1}^{N}\left\lvert\frac{y_i - \hat{y}_i}{\frac{1}{N-1}\sum_{j=2}^{N}\lvert y_j - y_{j-1}\rvert}\right\rvert \tag{7}$$

5. Results and discussion

The work was developed in Python using Keras (Chollet et al., 2015) to create and train the neural network architectures. The proposed U-Convolutional architectures were trained on an NVIDIA GTX 1660Ti GPU (training for a single coordinate could take up to 15 min, depending on early stopping and the dataset); the other models were executed on a CPU with 8 cores (training the Inception architecture, which has more parameters than the Convnet and ANN, can take two or more hours for a single coordinate).

The first step of the experiments was to select the best hyperparameters for the simplest model variant of each algorithm at a single location (the center coordinate for both datasets), as discussed in Section 4.2. For each architecture, a different set of hyperparameter configurations was tested. In Appendix A, the hyperparameter search is detailed, and the best configurations and respective results (average MAE on the validation folds) are presented (results for a fully connected neural network benchmark with spatio-temporal information, detailed below, are also presented). Notably, the best hyperparameter configurations of the same algorithms differed between Bahia Wind and RN Wind. This could be due either to weight initialization or to intrinsic differences between the datasets.

The models created for each algorithm differ in terms of capacity (i.e., total number of parameters). For example, for the Bahia Wind dataset, the Convnet model with no calendar variables and no residual connections has 456,353 parameters (weights), while the U-Convolutional model with no calendar variables and no residual connections has 993,194 parameters. A model with a higher capacity may have more room to overfit the dataset, in which case regularization is especially important. The best hyperparameter configuration for every algorithm included weight regularization. Only the Inception architecture combined dropout and weight regularization (for the Bahia Wind dataset). This indicates that, in order to perform well on the validation folds (and, possibly, to generalize effectively to unseen data), it is crucial that the architectures be designed with regularization, especially weight regularization.

The second step of the experiments was to train the models for all coordinates, using the best hyperparameter configuration found for a single coordinate. The final step was to evaluate the models on the test set. In this section, the Convnet models are compared with each other and with more traditional benchmark models: fully connected neural networks (ANN) with 4 hidden layers, the Box & Jenkins (B&J) model (Box & Jenkins, 1976), the BATS model (Livera et al., 2011), and the Naïve model (also known as the persistence model). The ANN benchmark also uses spatio-temporal inputs (the same input variables as the Convnet models). Additionally, variants with calendar variables and residual connections are also tested for the ANN algorithm, which was subject to the same training process as the Convnet models (a hyperparameter search followed by full training and model evaluation). The last three methods (i.e., B&J, BATS, and Naïve) were developed in a univariate forecasting setting (each coordinate has a wind speed time series, which is used to train the models). BATS was originally developed to model time series that exhibit complex seasonal patterns (Livera et al., 2011), and the B&J method is a traditional and robust method for univariate time series forecasting. BATS and B&J were implemented with the forecast package in R (Hyndman & Khandakar, 2008); the B&J model was fitted with the auto.arima function, and the BATS model with the bats function.
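The Naïve (persistence) benchmark and the accuracy measures in Eqs. (4)-(7) are simple enough to sketch directly. The code below assumes plain Python lists, uses the sMAPE denominator |y_i + ŷ_i| exactly as written in Eq. (6), and computes the MASE scale from the evaluated series as written in Eq. (7); other MASE definitions (e.g., scaling by the in-sample naive errors of the training series, as in Hyndman and Koehler (2006)) would give different values. All function names are hypothetical.

```python
import math


def persistence_forecast(series, h):
    """Naive forecast: predict y[t + h] with y[t] (requires h >= 1). Returns (targets, forecasts)."""
    return series[h:], series[:-h]


def mae(y, y_hat):
    # Eq. (5)
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    # Eq. (4): square root of the MSE in Eq. (3).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def smape(y, y_hat):
    # Eq. (6), with |y_i + y_hat_i| in the denominator as written in the paper.
    return 200.0 * sum(abs(a - b) / abs(a + b) for a, b in zip(y, y_hat)) / len(y)


def mase(y, y_hat):
    # Eq. (7): MAE scaled by the mean absolute one-step difference of y.
    scale = sum(abs(y[j] - y[j - 1]) for j in range(1, len(y))) / (len(y) - 1)
    return mae(y, y_hat) / scale


wind = [2.0, 3.0, 5.0, 4.0]                   # toy hourly wind speeds
y, y_hat = persistence_forecast(wind, h=1)    # y = [3.0, 5.0, 4.0], y_hat = [2.0, 3.0, 5.0]
print(round(mae(y, y_hat), 4), round(rmse(y, y_hat), 4))  # 1.3333 1.4142
```

With `h=6`, the same `persistence_forecast` call gives the six-hour-ahead Naïve benchmark; the table entries for the Naïve model omit MASE, which is consistent with MASE being a comparison against a naive scale.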

5.1. Global accuracy measures

To provide a global view of the performance of the models, the average value of each accuracy measure over all coordinates is provided for Bahia Wind and RN Wind in Tables 1 and 2, respectively. These results make it possible to answer some of the questions that motivated the experiments. For example, it is possible to analyze whether adding calendar variables improves the forecast accuracy of the proposed models. Considering the global measures, adding calendar variables improves the accuracy of all architectures for both datasets and forecasting horizons. For example, in the case of Bahia Wind, the average MAE of the U-Conv model is 0.2084 with calendar variables against 0.2255 without them for the one-hour-ahead forecast, and 0.5359 against 0.5538 for the six-hour-ahead forecast. The same happens for all other architectures, indicating that incorporating calendar variables after the feature extraction improves the forecast accuracy of the Convnet-based models (and that incorporating them as input data to ANN models improves their performance as well). The variants with calendar variables outperform those without calendar variables despite not being fully optimized (i.e., the best hyperparameter configuration was selected based on a grid search of the variant without calendar variables).

Conversely, the effect of adding identity mappings on model performance is not consistent. For example, U-Conv models (with and without calendar variables) are slightly improved by the addition of identity mappings in almost all cases, the exception being the six-hour-ahead forecast horizon on the Bahia Wind dataset; this is also the case for the Convnet model without calendar variables. On the other hand, ANN models are worsened by the addition of residual connections in almost all cases; Inception without calendar variables and Convnet with calendar variables behave similarly to the ANN models. It could be hypothesized that adding identity mappings would improve models with higher capacities (such as the U-Conv models, which are more than ten times the size of the ANN on the RN Wind dataset, and more than 2.5 times the size of the ANN on the Bahia Wind dataset, as detailed in Appendix A). However, this might not be the case, since Inception models also have a higher capacity (than the ANN and Convnet models, for instance) and present mixed results when the variant with identity mappings is compared against the one without. Therefore, based only on the global accuracy measures, no conclusive remarks can be drawn on how (or whether) identity mappings improve forecast accuracy.

Looking at the different algorithms, it is possible to note that, despite not directly considering the spatial disposition of the input data (3D input tensors are transformed into a 1D vector), fully connected neural networks (ANN architectures) are competitive. ANN architectures have the best global accuracy measures in one-hour-ahead forecasts for Bahia Wind (ANN and ANNRes,⁴ both with calendar variables, present the best and second-best accuracies) and are close behind for RN Wind (ANN and ANNRes, both with calendar variables, present the second-best and third-best accuracies). However, when six-hour-ahead forecasts are considered, ANN architectures do not present the best accuracy measures, and Convnet-based architectures, which specialize in processing spatial data, obtain the best results. Considering this, it may be hypothesized that as the forecasting horizon grows, spatial information gains relevance in the modeling set-up (and, since ANN architectures do not directly take spatial information into account, ANN is outperformed by Convnet-based models). Such a hypothesis seems plausible because weather systems span and move over large regions.

⁴ ANNRes is the ANN architecture with the addition of identity mappings connecting the output of the first hidden layer to the output of the fourth hidden layer.

The proposed U-Convolutional models are competitive at both forecasting horizons for both datasets. For example, the proposed U-ConvRes model (with calendar variables) presents the best accuracy measures for one-hour-ahead forecasting in the RN Wind dataset, and the third-best global accuracy measures in all other cases. Additionally, the proposed U-Conv model (with calendar variables) is ranked in the top 5 in global accuracy measures in all cases. Specifically, the U-Conv model presents the second-best results for six-hour-ahead forecasting in the Bahia Wind dataset.

The standard Convnet model with calendar variables (and no residual connections) presents the best global accuracy measures for six-hour-ahead forecasts in both datasets. This result might indicate that traditional convolution is able to leverage the different explanatory variables in the input channels to build a model that performs well in this task. One might have expected a more complex processing of the spatial information to outperform standard convolution for six-hour-ahead forecasting in the global measures. However, the Inception architectures (i.e., Inception and InceptionRes) are clearly outperformed by the Convnet model (with calendar variables) in the six-hour-ahead forecasting task. Conversely, for the proposed U-Conv architectures (also with calendar variables), the global accuracy measures in the six-hour-ahead problem are quite similar to those of the aforementioned Convnet model (e.g., for Bahia Wind, the U-Conv model presents an MAE of 0.5359 and an RMSE of 0.6994, against an MAE of 0.5346 and an RMSE of 0.6992 for the Convnet model), which might indicate that adopting the U-Net may be as beneficial as using only standard convolutional layers. It is important to note, however, that the U-ConvRes model with calendar variables outperforms the Convnet model on one-hour-ahead forecasting in terms of global accuracy measures. It may be hypothesized that, by adopting the U-Net to synthesize 3D input data, the features extracted by the U-ConvRes model are more refined, which might make the U-ConvRes model more "stable" than the Convnet model when modeling multiple coordinates, at least in the short-term case.

5.2. Best models per location

After analyzing global accuracy measures, a second perspective which might be evaluated is that of model

Table 1
Average accuracy measures (over all coordinates) - Bahia Wind.
Horizon: One-hour horizon (first four measures); Six-hour horizon (last four measures)
Model Calendar MAE RMSE sMAPE MASE MAE RMSE sMAPE MASE
ANN – 0.2091 0.2838 8.8800 0.6906 0.5786 0.7555 21.9273 0.5785
ANN Yes 0.1987 0.2700 8.5624 0.6575 0.5674 0.7411 21.6148 0.5677
ANNRes – 0.2126 0.2896 9.0763 0.7021 0.5911 0.7694 22.6331 0.5920
ANNRes Yes 0.2014 0.2755 8.7036 0.6660 0.5791 0.7543 22.2715 0.5807
Convnet – 0.2239 0.3051 9.6481 0.7381 0.5537 0.7206 21.1251 0.5549
Convnet Yes 0.2101 0.2882 9.2198 0.6948 0.5346 0.6992 20.5221 0.5370
ConvRes – 0.2220 0.3026 9.5769 0.7337 0.5542 0.7217 21.1677 0.5553
ConvRes Yes 0.2106 0.2892 9.2777 0.6975 0.5422 0.7083 20.8874 0.5448
Inception – 0.2445 0.3298 10.3200 0.8059 0.5887 0.7626 22.2959 0.5900
Inception Yes 0.2342 0.3171 9.9698 0.7735 0.5768 0.7494 21.9465 0.5794
InceptionRes – 0.2455 0.3314 10.3634 0.8100 0.6006 0.7779 22.7818 0.6022
InceptionRes Yes 0.2330 0.3152 9.9230 0.7700 0.5858 0.7605 22.2649 0.5882
U-Conv – 0.2255 0.3081 9.5343 0.7397 0.5538 0.7196 21.0210 0.5552
U-Conv Yes 0.2084 0.2871 8.9977 0.6858 0.5359 0.6994 20.4807 0.5380
U-ConvRes – 0.2133 0.2921 9.1534 0.7039 0.5590 0.7276 21.2563 0.5605
U-ConvRes Yes 0.2017 0.2772 8.8035 0.6676 0.5389 0.7029 20.6847 0.5421
ARIMA – 0.2513 0.3519 10.7021 0.8245 0.8754 1.0943 30.8176 0.8486
BATS – 0.2507 0.3495 10.5057 0.8225 0.8579 1.0707 30.1451 0.8354
Naïve – 0.3145 0.4218 12.6965 – 1.0691 1.3309 38.0337 –

Table 2
Average accuracy measures (over all coordinates) - RN Wind.
Horizon: One-hour horizon (first four measures); Six-hour horizon (last four measures)
Model Calendar MAE RMSE sMAPE MASE MAE RMSE sMAPE MASE
ANN – 0.1984 0.2844 4.1143 0.5680 0.6203 0.8146 12.0766 0.5021
ANN Yes 0.1845 0.2657 3.8090 0.5290 0.5998 0.7883 11.5990 0.4864
ANNRes – 0.2006 0.2883 4.1848 0.5748 0.6213 0.8152 12.1215 0.5031
ANNRes Yes 0.1851 0.2675 3.8491 0.5306 0.5992 0.7879 11.6114 0.4857
Convnet – 0.2066 0.2948 4.2750 0.5911 0.6064 0.7986 11.8557 0.4905
Convnet Yes 0.1932 0.2779 4.0106 0.5542 0.5752 0.7597 11.1402 0.4663
ConvRes – 0.2047 0.2930 4.2641 0.5866 0.6030 0.7949 11.8076 0.4879
ConvRes Yes 0.1880 0.2733 3.9356 0.5392 0.5754 0.7603 11.1730 0.4664
Inception – 0.2046 0.2928 4.2418 0.5854 0.6160 0.8104 12.0223 0.4984
Inception Yes 0.1926 0.2763 3.9636 0.5519 0.5883 0.7765 11.3996 0.4768
InceptionRes – 0.2041 0.2926 4.2323 0.5841 0.6182 0.8153 12.0437 0.5003
InceptionRes Yes 0.1880 0.2716 3.8875 0.5386 0.5932 0.7835 11.4876 0.4807
U-Conv – 0.2030 0.2883 4.1883 0.5811 0.6097 0.8030 11.8969 0.4932
U-Conv Yes 0.1905 0.2719 3.9170 0.5459 0.5813 0.7670 11.2479 0.4711
U-ConvRes – 0.2001 0.2850 4.1372 0.5723 0.6065 0.7977 11.8275 0.4910
U-ConvRes Yes 0.1816 0.2622 3.7530 0.5202 0.5777 0.7616 11.1952 0.4683
ARIMA – 0.2844 0.3933 5.9123 0.8048 1.0384 1.2914 20.1717 0.8291
BATS – 0.2834 0.3890 5.8790 0.8024 0.9819 1.2265 18.8075 0.7882
Naïve – 0.3523 0.4756 7.1160 – 1.2798 1.5804 24.7799 –

performance per location. Since there are 100 coordinates for Bahia Wind, 81 coordinates for RN Wind, and 17 algorithms, only the number of coordinates where each architecture performed best (i.e., had the lowest MAE) in the test set is presented in this section, in order to save space. In Appendix B, heatmaps of the test-set errors (MAE) are illustrated for each algorithm, dataset, and forecast horizon.

Tables 3 and 4 present the number of coordinates where each algorithm had the lowest MAE in the test set for the Bahia Wind and RN Wind datasets, respectively. The Diebold-Mariano (DM) test was performed with the null hypothesis that the given best model is less accurate than the other models. The numbers in parentheses in Tables 3 and 4 give the same count, restricted to coordinates where the test is significant at α = 5%. No single model performs best at all coordinates. Overall, the analyses based on global accuracy measures are corroborated by the results shown in Tables 3 and 4: most "winning" models (i.e., those selected as the best model for a greater number of coordinates) have calendar variables, ANN models appear to be more prominent in the one-hour forecasting problem, and U-Conv models appear to be in the top tier for one-hour- and six-hour-ahead forecasting in both datasets. Specifically, considering the one-hour forecast for the RN Wind dataset, the U-ConvRes model with calendar variables is the go-to model for 31 coordinates (almost 40% of the coordinates) at the α = 5% significance level, and presents the lowest MAE in the test set for 42 coordinates (more than 50% of the coordinates).

One interesting point is that the InceptionRes model with calendar variables was the best model for 17 coordinates of Bahia Wind in one-hour-ahead forecasting (and had the lowest MAE in the test set for 21 coordinates). However, looking at the global accuracy measures, it did not appear to be a top-tier model. This

Table 3
Number of coordinates where each model has the lowest MAE in the test set (in parentheses: coordinates where this is significant at α = 5% under the DM test) - Bahia Wind.
One-hour horizon | Six-hour horizon
Architecture Calendar Qty | Architecture Calendar Qty
InceptionRes Yes 21 (17) | U-Conv Yes 33 (10)
U-Conv Yes 19 (7) | Convnet Yes 27 (6)
ANN Yes 17 (15) | U-ConvRes Yes 18 (7)
U-ConvRes Yes 16 (8) | ConvRes Yes 17 (5)
ANNRes Yes 11 (9) | ANN Yes 3 (-)
Convnet Yes 8 (2) | Convnet – 1 (-)
Inception Yes 3 (-) | ConvRes – 1 (-)
ConvRes Yes 3 (1) | ANN – - (-)
ANNRes – 1 (1) | ANNRes Yes - (-)
U-Conv – 1 (-) | ANNRes – - (-)
ANN – - (-) | Inception Yes - (-)
Convnet – - (-) | Inception – - (-)
ConvRes – - (-) | InceptionRes Yes - (-)
Inception – - (-) | InceptionRes – - (-)
InceptionRes – - (-) | U-Conv – - (-)
U-ConvRes – - (-) | U-ConvRes – - (-)
Box & Jenkins – - (-) | Box & Jenkins – - (-)
BATS – - (-) | BATS – - (-)
Naïve – - (-) | Naïve – - (-)

Table 4
Number of coordinates where each model has the lowest MAE in the test set (in parentheses: coordinates where this is significant at α = 5% under the DM test) - RN Wind.
One-hour horizon | Six-hour horizon
Architecture Calendar Qty | Architecture Calendar Qty
U-ConvRes Yes 42 (31) | ConvRes Yes 25 (7)
ANN Yes 20 (12) | Convnet Yes 23 (3)
ANNRes Yes 9 (2) | U-ConvRes Yes 18 (2)
InceptionRes Yes 6 (1) | U-Conv Yes 8 (1)
U-Conv Yes 2 (-) | Convnet – 3 (-)
ConvRes Yes 2 (-) | ConvRes – 1 (-)
ANN – - (-) | Inception Yes 1 (-)
ANNRes – - (-) | InceptionRes Yes 1 (-)
Convnet Yes - (-) | U-Conv – 1 (-)
Convnet – - (-) | ANN Yes - (-)
ConvRes – - (-) | ANN – - (-)
Inception Yes - (-) | ANNRes Yes - (-)
Inception – - (-) | ANNRes – - (-)
InceptionRes – - (-) | Inception – - (-)
U-Conv – - (-) | InceptionRes – - (-)
U-ConvRes – - (-) | U-ConvRes – - (-)
Box & Jenkins – - (-) | Box & Jenkins – - (-)
BATS – - (-) | BATS – - (-)
Naïve – - (-) | Naïve – - (-)

means that the model performed well at a few coordinates and poorly at others (this is illustrated in Appendix B, where the MAE at corner coordinates is especially high, i.e., over 0.40, for the Inception architectures one hour ahead for Bahia Wind). Conversely, the ANN model is not ranked first for one-hour-ahead forecasting of Bahia Wind (in Table 3), yet it obtains the best global accuracy measures in this task. This means that, overall, the model performs well for most coordinates, despite not being the best model for most of them. This is also the case for the Convnet model with calendar variables in the six-hour-ahead forecasting task for both datasets.

The ranking illustrated in Tables 3 and 4 might be related to different locations in the region grid. With an aggregated evaluation of models, it is not possible to say whether specific algorithms are better at specific coordinates (or regions). However, Appendix B shows that the coordinates at the top-right corner of the grid are usually associated with the highest MAEs. This is probably related to how the wind blows in the region (from the top right to the bottom left). Convnet architectures (Convnet and ConvRes) and U-Conv architectures (U-Conv and U-ConvRes) with calendar variables are better at predicting the top-right corner for both datasets at the six-hour forecasting horizon. This might be why these models (1) have better global accuracy measures and (2) are selected as the best models more often than the ANN and Inception architectures in the six-hour-ahead forecasting task.

5.3. Additional remarks

The proposed approach could be applied in a real-world setting where information (on the meteorological variables) is updated hourly and the models are executed hourly to predict h hours ahead of time. However, real
B.Q. Bastos, F.L. Cyrino Oliveira and R.L. Milidiú International Journal of Forecasting 37 (2021) 949–970

power system operation may require the models to provide hourly predictions 24 h ahead of time. This is the case for the Brazilian Power System, where the centralized operation considers hourly forecasts 24 h ahead of time to plan daily operations. To apply the proposed approach in such circumstances, one could create 24 models, where each model would use any information available at time t to predict wind speeds for a given step ahead (i.e., one model would be developed to predict the wind speed at time t + 1, a second model would be developed to predict the wind speed at time t + 2, and so on). Despite these implementation drawbacks for 24-h-ahead operation, models for h-hours-ahead predictions are frequently proposed and studied in the literature, which indicates the relevance of studying such models (Dowell & Pinson, 2016; Feng, Cui, Hodge, & Zhang, 2017; Hu & Wang, 2015). Other designs based on the proposed U-Conv model may be developed to perform 24-hour-ahead forecasting (e.g., by giving the last layer 24 output neurons instead of a single output neuron).

6. Conclusions

The increasing penetration of intermittent renewable energy in modern power systems is a challenge for the operation of those systems. One way of supporting system operation is to develop advanced models that provide accurate forecasts for the intermittent resources. In this work, the U-Convolutional model is proposed. It combines the U-Net architecture and a Convnet to predict wind speed for a single location, using multiple spatio-temporal explanatory variables as the input to the model. This work also investigates other Convnet architectures for the same task; these include Convnets with inception modules and standard Convnets. Additionally, this work investigates whether adding seasonal indicator variables and identity mappings improves the models that are part of the experiments.

The results indicate that the proposed U-Conv architecture is promising, and that adding calendar variables after feature extraction considerably improves the performance of all neural network architectures tested in this work. The proposed U-Convolutional architectures provided competitive global accuracy measures (the best accuracies for one-hour-ahead forecasting in one dataset, and third-best accuracies in all other cases), and were selected as best models for different coordinates at both time horizons (in one case, the U-ConvRes model was selected as the best model for almost 40% of the coordinates in one-hour-ahead forecasting for one dataset). The results also indicate that the proposed U-Convolutional architectures are more stable than the other architectures across time horizons and datasets. This may be due to the greater capability of the U-Net part of the design to synthesize the input data.

In this work, to account for temporality, we adopted a modeling setup in which features at present and past times are included in the channel axis. This type of setup is used in video frame prediction tasks. For future work, we propose accounting for temporality in our task by hybridizing Convnets with Recurrent Neural Networks (RNNs). RNNs are neural networks with feedback loops in their architecture, which allow them to retain information about past inputs. Due to this property, RNNs are well suited to modeling data that are generated sequentially, such as time series and speech signals.

One topic of interest for future work is the design of the proposed spatio-temporal forecasting task with fully convolutional models, which would map 3D input tensors to 2D or 3D output tensors, directly providing a forecast for all coordinates in the region. In a fully convolutional setting, transfer learning studies may also be beneficial. For example, a segmentation task (e.g., classifying extreme wind speeds in the region) could be used to train the fully convolutional neural network, which would then be fine-tuned on the regression (prediction) task. Additionally, it would be of special interest to compare the fully convolutional model with the models that generate reanalysis data.

With these observations, we outline a way forward for modeling wind speeds with multiple explanatory variables using Convnets.

Acknowledgments

We would like to thank the editor and the reviewers for their insightful comments, which allowed us to substantially improve our paper. This work was supported by the Brazilian Coordination for the Improvement of Higher Level Personnel (CAPES) under Grant 001, by the Carlos Chagas Filho Research Support Foundation of the State of Rio de Janeiro (FAPERJ) under Grants 202.673/2018 and 211.086/2019, by the Brazilian National Council for Scientific and Technological Development (CNPq) under Grant 307403/2019-0, and by the International Institute of Forecasters IIF/SAS 2017-2018 Grant Award. This material is also based upon work supported by the Air Force Office of Scientific Research (AFOSR), USA under award number FA9550-19-1-0020.

Appendix A. Hyperparameter configurations

In this section, the hyperparameter configurations tested in the experiments of this work are listed. The hyperparameter configurations include:
(1) architectural information, such as the number of filters in the first and second convolutional modules in Convnet designs;
(2) learning protocol information, which includes the learning rate, weight decay, batch size, and early stopping;
(3) regularization techniques, such as dropout and weight regularization.
Additionally, this section provides the best hyperparameter configuration for each dataset, detailing the total number of parameters and the average Mean Absolute Error (MAE) on validation folds. As detailed in Section 4.2, the best hyperparameters were selected based on experiments at a single coordinate, located in the center of each dataset's grid. Furthermore, the grid search considered only the one-hour-ahead task.
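As described above, each configuration is scored by its average MAE over the validation folds. Scoring uses a single central coordinate and the one-hour-ahead task only, and the lowest score wins. The Python sketch below shows that selection step under a few assumptions. The function names are our own. The per-fold MAEs are assumed to have been computed already by whatever training loop produced them. The fold scheme itself (Section 4.2) is not reproduced here.

```python
from statistics import mean
from typing import Dict, Hashable, Sequence, Tuple


def mean_validation_mae(fold_maes: Sequence[float]) -> float:
    """Average MAE over the validation folds of one configuration."""
    if not fold_maes:
        raise ValueError("at least one validation fold is required")
    return mean(fold_maes)


def select_best_configuration(
    results: Dict[Hashable, Sequence[float]],
) -> Tuple[Hashable, float]:
    """Return the configuration with the lowest average validation MAE.

    `results` maps a configuration key (e.g., a tuple of hyperparameter
    values) to the MAEs obtained on each validation fold. All MAEs are
    assumed to come from the same target coordinate and horizon.
    """
    scored = {cfg: mean_validation_mae(maes) for cfg, maes in results.items()}
    best = min(scored, key=scored.get)  # first key with the smallest mean MAE
    return best, scored[best]
```

In this setup, the ranking depends only on the central coordinate's one-hour-ahead validation folds. Any per-coordinate or per-horizon tuning would require repeating the search.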

A.1. Bahia wind dataset

A.1.1. Standard convolutional neural networks (Convnet)
For the standard Convolutional Neural Network (Convnet) architecture, a total of 140 hyperparameter configurations were tested. The grid search was performed in two steps: a full grid search (on 128 configurations), followed by a refinement phase (grid search on 12 configurations), where new values were tested for a few hyperparameters. Below, the values tested for each hyperparameter in the full grid search are listed.

• activation - convolutional layers: ReLU;
• activation - fully connected layers: ReLU;
• activation - output layer: linear;
• filters - 1st convolutional module: 24, 32;
• filters - 2nd convolutional module: 168, 224;
• pooling function: maximum, average;
• neurons - fully connected layer: 168, 224;
• dropout rate - convolutional layers: None;
• dropout rate - fully connected layer: None, 0.25;
• weight regularization - convolutional layers: None, 0.00005;
• weight regularization - fully connected layers: None, 0.00005;
• early stopping (epochs without loss decrease): 5;
• learning rate: 0.001;
• weight decay: 0.0001;
• batch size: 256;
• epochs: 200.

After the grid search on 128 configurations, an additional round of hyperparameter search was carried out. It tested a different number of filters in the first convolutional module (i.e., 48 filters) and different values for weight regularization. Below, the hyperparameters tested in this new grid search are listed.

• activation - convolutional layers: ReLU;
• activation - fully connected layers: ReLU;
• activation - output layer: linear;
• filters - 1st convolutional module: 32, 48;
• filters - 2nd convolutional module: 224;
• pooling function: average;
• neurons - fully connected layer: 224;
• dropout rate - convolutional layers: None;
• dropout rate - fully connected layer: None;
• weight regularization - convolutional layers: 0.00005, 0.0001;
• weight regularization - fully connected layers: None, 0.000025, 0.00005;
• early stopping (epochs without loss decrease): 5;
• learning rate: 0.001;
• weight decay: 0.0001;
• batch size: 256;
• epochs: 200.

The best configuration of the Convnet architecture had 48 filters in the first convolutional module, 224 filters in the second convolutional module, and 224 neurons on the fully connected layers. It used the average pooling function, with a weight regularization of 0.00005 on convolutional layers and of 0.000025 on fully connected layers. The other hyperparameters are the ones described above. The average MAE on validation folds for this hyperparameter configuration was 0.2232, and the total number of parameters for the architecture was 456,353.

A.1.2. U-Convolutional architecture
The best architecture found for the Convnet (in Appendix A.1.1) was adopted as a reference for the Convnet part of the U-Convolutional architecture. By doing this, only hyperparameters related to the U-Net, the learning protocol, and regularization were subject to a grid search. In total, 64 hyperparameter configurations were tested. Below, the values of the hyperparameters tested in the grid search are listed.

• activation - U-Net: ELU;
• activation - convolutional layers: ReLU;
• activation - fully connected layers: ReLU;
• activation - output layer: linear;
• filters - 1st convolutional module (U-Net part): 24, 48;
• filters - 1st convolutional module (Convnet part): 48;
• filters - 2nd convolutional module (Convnet part): 224;
• pooling function: average;
• neurons - fully connected layers: 224;
• dropout rate - convolutional layers: None, 0.1;
• dropout rate - fully connected layer: None, 0.25;
• weight regularization - convolutional layers: None, 0.00005;
• weight regularization - fully connected layers: None, 0.000025;
• early stopping (epochs without loss decrease): 5;
• learning rate: 0.001, 0.002;
• weight decay: 0.0001;
• batch size: 256;
• epochs: 200.

The best configuration of the U-Convolutional model had 24 filters in the first module of the U-Net part of the architecture. The other modules follow the logic of the U-Net design, e.g., doubling values along the contracting path. It used no dropout, no weight regularization on convolutional layers, a weight regularization of 0.000025 on fully connected layers, and a learning rate of 0.001. The average MAE on validation folds for the best configuration was 0.2082. The best configuration produced a model with 993,194 parameters (weights).

A.1.3. Inception architecture
For the Inception architecture, a total of 128 hyperparameter configurations were tested. Below, the values tested for each hyperparameter in the full grid search are listed. The best configuration for the Inception model had 72 filters in the first inception module, 504 filters in the second inception module, and 168 neurons on the fully connected layer. It used max pooling, a dropout rate of 0.25, and a weight regularization of 0.00005 for convolutional and fully connected layers. The average MAE on validation folds was 0.2250. The best configuration produced a model with 900,697 parameters.

Fig. B.6. MAE on test set for Bahia Wind (horizon = 1 h ahead). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

• activation - convolutional layers: ReLU;
• activation - fully connected layers: ReLU;
• activation - output layer: linear;
• filters - 1st inception module: 24, 72;
• filters - 2nd inception module: 168, 504;
• pooling function: maximum, average;
• neurons - fully connected layers: 168, 504;
• dropout - fully connected layers: None, 0.25;
• weight regularization - convolutional layers: None, 0.00005;
• weight regularization - fully connected layers: None, 0.00005;
• early stopping (epochs without loss decrease): 5;
• learning rate: 0.001;
• weight decay: 0.0001;
• batch size: 256;
• epochs: 200.
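The configuration counts quoted in Appendices A.1.1 to A.1.3 follow from taking the Cartesian product of the value lists above. For the Convnet, that is 128 configurations plus 12 refinement configurations; for the U-Convolutional model, 64; and for the Inception model, 128. The sketch below re-enumerates those grids with Python's `itertools.product` as a consistency check. The dictionary keys are our own shorthand for the bullet labels, and `None` stands for "not used". Hyperparameters with a single value are omitted because they do not change the count.

```python
from itertools import product
from typing import Dict, List, Sequence


def expand_grid(grid: Dict[str, Sequence]) -> List[Dict[str, object]]:
    """Enumerate every configuration of a full (Cartesian) grid search."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


# Bahia Wind, Convnet full grid (Appendix A.1.1).
convnet_full = {
    "filters_1st": [24, 32],
    "filters_2nd": [168, 224],
    "pooling": ["max", "average"],
    "fc_neurons": [168, 224],
    "fc_dropout": [None, 0.25],
    "conv_weight_reg": [None, 0.00005],
    "fc_weight_reg": [None, 0.00005],
}

# Convnet refinement grid (second step of Appendix A.1.1).
convnet_refine = {
    "filters_1st": [32, 48],
    "conv_weight_reg": [0.00005, 0.0001],
    "fc_weight_reg": [None, 0.000025, 0.00005],
}

# U-Convolutional grid (Appendix A.1.2).
uconv = {
    "unet_filters_1st": [24, 48],
    "conv_dropout": [None, 0.1],
    "fc_dropout": [None, 0.25],
    "conv_weight_reg": [None, 0.00005],
    "fc_weight_reg": [None, 0.000025],
    "learning_rate": [0.001, 0.002],
}

# Inception grid (Appendix A.1.3).
inception = {
    "filters_1st": [24, 72],
    "filters_2nd": [168, 504],
    "pooling": ["max", "average"],
    "fc_neurons": [168, 504],
    "fc_dropout": [None, 0.25],
    "conv_weight_reg": [None, 0.00005],
    "fc_weight_reg": [None, 0.00005],
}

grids = [
    ("convnet_full", convnet_full),
    ("convnet_refine", convnet_refine),
    ("uconv", uconv),
    ("inception", inception),
]
counts = {name: len(expand_grid(grid)) for name, grid in grids}
print(counts)
# {'convnet_full': 128, 'convnet_refine': 12, 'uconv': 64, 'inception': 128}
```

The 128 and 12 Convnet configurations add up to the 140 reported for Bahia Wind. Each expanded dictionary can be handed directly to a training routine, which keeps the enumeration separate from the model code.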

Fig. B.7. MAE on test set for Bahia Wind (horizon = 6 h ahead). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

A.1.4. Fully Connected Neural Networks (ANN)
For the ANN architecture, a total of 108 hyperparameter configurations were tested. Below, the values tested for each hyperparameter in the full grid search are listed. The best configuration for the ANN model had 200 neurons on the 1st and 4th layers and 300 neurons on the 2nd and 3rd layers. It used no dropout, a weight regularization of 0.0001, and a learning rate of 0.001. The best model had an average MAE on validation folds of 0.2081. The best configuration produced a model with 391,201 parameters.

• activation - fully connected layers: ReLU;
• activation - output layer: linear;
• neurons - 1st and 4th layers: 100, 200;
• neurons - 2nd and 3rd layers: 200, 300;
• dropout rate: None, 0.1, 0.25;
• weight regularization - all layers: None, 0.0001, 0.00005;
• early stopping (epochs without loss decrease): 5;
• learning rate: 0.005, 0.001, 0.0005;
• weight decay: 0.0001;
• training epochs: 200;
• batch size: 256.

Fig. B.8. MAE on test set for RN Wind (horizon = 1 h ahead). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

A.2. Rio Grande do Norte wind dataset

A.2.1. Standard convolutional neural networks (Convnet)
For the Rio Grande do Norte (RN) Wind dataset, 128 hyperparameter configurations were tested for the Convnet architecture (unlike Bahia Wind, there was no refinement step). These 128 configurations are the same as those in the full grid search for Bahia Wind (see Appendix A.1.1). The best configuration of the Convnet architecture had 32 filters in the first convolutional module, 168 filters in the second convolutional module, and 224 neurons in the fully connected layers. It used an average pooling function, a weight regularization of 0.00005 in convolutional layers, and no regularization in fully connected layers. The average MAE on validation folds for this hyperparameter configuration was 0.1910, and the total number of parameters for the architecture was 291,025.

Fig. B.9. MAE on test set for RN Wind (horizon = 6 h ahead). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

A.2.2. U-Convolutional architecture
The best architecture found for the Convnet (see Appendix A.2.1) was adopted as a reference for the Convnet part of the U-Convolutional architecture. By doing this, only hyperparameters related to the U-Net, the learning protocol, and regularization were subject to a grid search. In total, 64 hyperparameter configurations were tested; these are the same configurations listed in Appendix A.1.2 for Bahia Wind. The best configuration of the U-Convolutional model had 48 filters in the first module of the U-Net part of the architecture and no dropout. It used a weight regularization of 0.00005 on convolutional layers and of 0.000025 on fully connected layers, with a learning rate of 0.001. The average MAE on validation folds for the best configuration was 0.2000. The best configuration produced a model with 2,433,754 parameters.

A.2.3. Inception architecture
For the Inception architecture, the same 128 hyperparameter configurations listed for Bahia Wind (see Appendix A.1.3) were tested for RN Wind. The best configuration for the Inception model had 72 filters in the first inception module, 504 filters in the second inception module, and 504 neurons in the fully connected layer. It used max pooling, no dropout, and a weight regularization of 0.00005 for convolutional and fully connected layers. The average MAE on validation folds was 0.1947. The best configuration produced a model with 1,070,713 parameters.

A.2.4. Fully Connected Neural Networks (ANN)
For the ANN architecture, the same 108 hyperparameter configurations listed for Bahia Wind (see Appendix A.1.4) were tested for RN Wind. The best configuration for the ANN model had 100 neurons on the 1st and 4th layers and 300 neurons on the 2nd and 3rd layers. It used no dropout, a weight regularization of 0.00005, and a learning rate of 0.0005. The best model had an average MAE on validation folds of 0.1879. The best configuration produced a model with 223,801 parameters.

Appendix B. Mean absolute errors on all coordinates

In this section, heatmaps of the Mean Absolute Errors (MAEs) are provided for each dataset and forecasting horizon. The heatmaps allow errors of the different architectures to be compared at all coordinates of each dataset. For reference, Table B.5 shows the position of the heatmap for each model.

Table B.5
Model reference on heatmap figures.
ANN (no calendar) | ANN (calendar) | ANNRes (no calendar) | ANNRes (calendar)
Convnet (no calendar) | Convnet (calendar) | ConvRes (no calendar) | ConvRes (calendar)
Inception (no calendar) | Inception (calendar) | InceptionRes (no calendar) | InceptionRes (calendar)
U-Conv (no calendar) | U-Conv (calendar) | U-ConvRes (no calendar) | U-ConvRes (calendar)
B&J | BATS | Naïve | –

B.1. Bahia wind dataset

Figs. B.6 and B.7 show the MAEs for the one-hour- and six-hour-ahead forecasts of Bahia Wind. The colour scale ranges from 0.14 MAE (darkest blue) to 0.44 or higher MAE (darkest red) for the one-hour-ahead heatmaps. For the six-hour-ahead heatmaps, it ranges from 0.40 MAE (darkest blue) to 0.78 or higher MAE (darkest red).

B.2. Rio Grande do Norte wind dataset

Figs. B.8 and B.9 show the MAEs for the one-hour- and six-hour-ahead forecasts of Rio Grande do Norte Wind. The colour scale ranges from 0.15 MAE (darkest blue) to 0.30 or higher MAE (darkest red) for the one-hour-ahead heatmaps. For the six-hour-ahead heatmaps, it ranges from 0.45 MAE (darkest blue) to 0.90 or higher MAE (darkest red).

References

André, M., Dabo-Niang, S., Soubdhan, T., & Ould-Baba, H. (2016). Predictive spatio-temporal model for spatially sparse global solar radiation data. Energy, 111, 599–608. http://dx.doi.org/10.1016/j.energy.2016.06.004, URL http://www.sciencedirect.com/science/article/pii/S0360544216307769.
Antonanzas, J., Osorio, N., Escobar, R., Urraca, R., Martinez-de Pison, F., & Antonanzas-Torres, F. (2016). Review of photovoltaic power forecasting. Solar Energy, 136, 78–111. http://dx.doi.org/10.1016/J.SOLENER.2016.06.069, URL https://www.sciencedirect.com/science/article/pii/S0038092X1630250X.
Aryaputera, A. W., Yang, D., Zhao, L., & Walsh, W. M. (2015). Very short-term irradiance forecasting at unobserved locations using spatio-temporal kriging. Solar Energy, 122, 1266–1278. http://dx.doi.org/10.1016/j.solener.2015.10.023, URL http://www.sciencedirect.com/science/article/pii/S0038092X15005745.
Box, G., & Jenkins, G. (1976). Time series analysis: Forecasting and control. San Francisco, California: Holden-Day.
Chen, Y., Zhang, S., Zhang, W., Peng, J., & Cai, Y. (2019). Multifactor spatio-temporal correlation model based on a combination of convolutional neural network and long short-term memory neural network for wind speed forecasting. Energy Conversion and Management, 185, 783–799. http://dx.doi.org/10.1016/j.enconman.2019.02.018, URL http://www.sciencedirect.com/science/article/pii/S0196890419302006.
Chollet, F. (2016). Xception: Deep learning with separable convolutions. (pp. 1–14). http://dx.doi.org/10.1109/CVPR.2017.195, arXiv preprint arXiv:1610.02357.
Chollet, F., et al. (2015). Keras. URL https://keras.io.
Damousis, I. G., Alexiadis, M. C., Theocharis, J. B., & Dokopoulos, P. S. (2004). A fuzzy model for wind speed prediction and power generation in wind parks using spatial correlation. IEEE Transactions on Energy Conversion, 19(2), 352–361. http://dx.doi.org/10.1109/TEC.2003.821865, URL http://ieeexplore.ieee.org/document/1300701/.
David, M., Luis, M., & Lauret, P. (2018). Comparison of intraday probabilistic forecasting of solar irradiance using only endogenous data. International Journal of Forecasting, 34(3), 529–547. http://dx.doi.org/10.1016/j.ijforecast.2018.02.003, URL https://www.scopus.com/inward/record.uri?eid=2-s2.0-85046850329&doi=10.1016%2fj.ijforecast.2018.02.003&partnerID=40&md5=e741d71b53ed081adbd954a8c81253bc.
Denton, E., & Birodkar, V. (2017). Unsupervised learning of disentangled representations from video. CoRR abs/1705.10915, arXiv:1705.10915.
Dowell, J., & Pinson, P. (2016). Very-short-term probabilistic wind power forecasts by sparse vector autoregression. IEEE Transactions on Smart Grid, 7(2), 763–770. http://dx.doi.org/10.1109/TSG.2015.2424078.
Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul), 2121–2159. http://dx.doi.org/10.1109/CDC.2012.6426698, URL http://jmlr.org/papers/v12/duchi11a.html.
Dumoulin, V., & Visin, F. (2016). A guide to convolution arithmetic for deep learning. Tech. rep., URL https://arxiv.org/abs/1603.07285.


Feng, C., Cui, M., Hodge, B.-M., & Zhang, J. (2017). A data-driven multi-model methodology with deep feature selection for short-term wind forecasting. Applied Energy, 190, 1245–1257. http://dx.doi.org/10.1016/j.apenergy.2017.01.043, URL http://www.sciencedirect.com/science/article/pii/S0306261917300508.
Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In Proceedings of the 14th international conference on artificial intelligence and statistics (Vol. 15) (pp. 315–323).
Goodfellow, I., Bengio, Y., & Courville, A. (2017). Deep learning (1st ed.). Cambridge, MA: MIT Press.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014). Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, & K. Q. Weinberger (Eds.), Advances in neural information processing systems (Vol. 27) (pp. 2672–2680). Curran Associates, Inc., URL http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf.
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., et al. (2018). Recent advances in convolutional neural networks. Pattern Recognition, 77, 354–377. http://dx.doi.org/10.1016/j.patcog.2017.10.013, URL http://www.sciencedirect.com/science/article/pii/S0031320317304120.
He, M., Yang, L., Zhang, J., & Vittal, V. (2014). A spatio-temporal analysis approach for short-term forecast of wind farm generation. IEEE Transactions on Power Systems, 29(4), 1611–1622. http://dx.doi.org/10.1109/TPWRS.2014.2299767, URL http://ieeexplore.ieee.org/document/6727513/.
He, K., Zhang, X., Ren, S., & Sun, J. (2016a). Deep residual learning for image recognition. In 2016 IEEE conference on computer vision and pattern recognition (pp. 770–778).
He, K., Zhang, X., Ren, S., & Sun, J. (2016b). Identity mappings in deep residual networks. In B. Leibe, J. Matas, N. Sebe, & M. Welling (Eds.), Computer vision – ECCV 2016 (pp. 630–645). Cham: Springer International Publishing.
Heinermann, J., & Kramer, O. (2016). Machine learning ensembles for wind power prediction. Renewable Energy, 89, 671–679. http://dx.doi.org/10.1016/j.renene.2015.11.073, URL http://www.sciencedirect.com/science/article/pii/S0960148115304894.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. http://dx.doi.org/10.1162/neco.1997.9.8.1735.
Hu, J., & Wang, J. (2015). Short-term wind speed prediction using empirical wavelet transform and Gaussian process regression. Energy, 93, 1456–1466. http://dx.doi.org/10.1016/j.energy.2015.10.041, URL http://www.sciencedirect.com/science/article/pii/S0360544215014097.
Hu, Q., Zhang, S., Yu, M., & Xie, Z. (2016). Short-term wind speed or power forecasting with heteroscedastic support vector regression. IEEE Transactions on Sustainable Energy, 7(1), 241–249. http://dx.doi.org/10.1109/TSTE.2015.2480245, URL http://ieeexplore.ieee.org/document/7335638/.
Hyndman, R. J., & Khandakar, Y. (2008). Automatic time series forecasting: the forecast package for R. Journal of Statistical Software, 27(3), 1–22, URL http://www.jstatsoft.org/article/view/v027i03.
Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679–688. http://dx.doi.org/10.1016/j.ijforecast.2006.03.001, URL https://www.sciencedirect.com/science/article/pii/S0169207006000239?via%3Dihub.
IEA (2017). World energy outlook 2017 (pp. 1–15). IEA, http://dx.doi.org/10.1016/0301-4215(73)90024-4, URL https://www.iea.org/weo2017/.
Jung, J., & Broadwater, R. P. (2014). Current status and future advances for wind speed and power forecasting. Renewable & Sustainable Energy Reviews, 31, 762–777. http://dx.doi.org/10.1016/J.RSER.2013.12.054, URL https://www.sciencedirect.com/science/article/pii/S1364032114000094.
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. (pp. 1–15). URL http://arxiv.org/abs/1412.6980.
Koivisto, M., Seppänen, J., Mellin, I., Ekström, J., Millar, J., Mammarella, I., et al. (2016). Wind speed modeling using a vector autoregressive process with a time-dependent intercept term. International Journal of Electrical Power & Energy Systems, 77, 91–99. http://dx.doi.org/10.1016/j.ijepes.2015.11.027, URL https://www.sciencedirect.com/science/article/pii/S0142061515004470.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 1–9. http://dx.doi.org/10.1016/j.protcy.2014.09.007.
Le Xie, Carvalho, P. M. S., Ferreira, L. A. F. M., Juhua Liu, Krogh, B. H., Popli, N., et al. (2011). Wind integration in power systems: Operational challenges and possible solutions. Proceedings of the IEEE, 99(1), 214–232. http://dx.doi.org/10.1109/JPROC.2010.2070051.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. http://dx.doi.org/10.1038/nature14539, URL http://www.nature.com/doifinder/10.1038/nature14539.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2323. http://dx.doi.org/10.1109/5.726791.
LeCun, Y. A., Bottou, L., Orr, G. B., & Müller, K.-R. (2012). Efficient backprop. In G. Montavon, G. B. Orr, & K.-R. Müller (Eds.), Neural networks: Tricks of the trade: Second edition (pp. 9–48). Berlin, Heidelberg: Springer Berlin Heidelberg, http://dx.doi.org/10.1007/978-3-642-35289-8_3.
Lei, M., Shiyan, L., Chuanwen, J., Hongling, L., & Yan, Z. (2009). A review on the forecasting of wind speed and generated power. Renewable & Sustainable Energy Reviews, 13(4), 915–920. http://dx.doi.org/10.1016/J.RSER.2008.02.002, URL https://www.sciencedirect.com/science/article/pii/S1364032108000282.
Li, X., Shang, W., & Wang, S. (2018). Text-based crude oil price forecasting: A deep learning approach. International Journal of Forecasting, http://dx.doi.org/10.1016/j.ijforecast.2018.07.006, URL https://www.scopus.com/inward/record.uri?eid=2-s2.0-85055654907&doi=10.1016%2fj.ijforecast.2018.07.006&partnerID=40&md5=7ed2dec017f48dada0e1fa9664828e4c.
Li, G., Shi, J., & Zhou, J. (2011). Bayesian adaptive combination of short-term wind speed forecasts from neural network models. Renewable Energy, 36(1), 352–359. http://dx.doi.org/10.1016/j.renene.2010.06.049, URL https://www.sciencedirect.com/science/article/pii/S0960148110003228.
Liu, W., Luo, W., Lian, D., & Gao, S. (2018). Future frame prediction for anomaly detection – A new baseline. In The IEEE conference on computer vision and pattern recognition.
Liu, H., Mi, X., & Li, Y. (2018). Smart deep learning based wind speed prediction model using wavelet packet decomposition, convolutional neural network and convolutional long short term memory network. Energy Conversion and Management, 166, 120–131. http://dx.doi.org/10.1016/j.enconman.2018.04.021, URL http://www.sciencedirect.com/science/article/pii/S019689041830356X.
Livera, A. M. D., Hyndman, R. J., & Snyder, R. D. (2011). Forecasting time series with complex seasonal patterns using exponential smoothing. Journal of the American Statistical Association, 106(496), 1513–1527. http://dx.doi.org/10.1198/jasa.2011.tm09771.
Long, J., Shelhamer, E., & Darrell, T. (2014). Fully convolutional networks for semantic segmentation. CoRR abs/1411.4038, arXiv:1411.4038.
Maçaira, P. M., Thomé, A. M. T., Oliveira, F. L. C., & Ferrer, A. L. C. (2018). Time series analysis with explanatory variables: A systematic literature review. Environmental Modelling & Software, 107, 199–209. http://dx.doi.org/10.1016/j.envsoft.2018.06.004, URL http://www.sciencedirect.com/science/article/pii/S136481521730542X.
Mathieu, M., Couprie, C., & LeCun, Y. (2015). Deep multi-scale video prediction beyond mean square error. CoRR abs/1511.05440, arXiv:1511.05440.
Persson, C., Bacher, P., Shiga, T., & Madsen, H. (2017). Multi-site solar power forecasting using gradient boosted regression trees. Solar Energy, 150, 423–436. http://dx.doi.org/10.1016/j.solener.2017.04.066, URL http://www.sciencedirect.com/science/article/pii/S0038092X17303717.


Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention – MICCAI 2015 (pp. 234–241). Springer International Publishing, http://dx.doi.org/10.1007/978-3-319-24574-4_28.
Saha, S., Moorthi, S., Wu, X., Wang, J., Nadiga, S., Tripp, P., et al. (2014). The NCEP climate forecast system version 2. Journal of Climate, 27(6), 2185–2208. http://dx.doi.org/10.1175/JCLI-D-12-00823.1, URL http://journals.ametsoc.org/doi/abs/10.1175/JCLI-D-12-00823.1.
Shi, X., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W.-k., & Woo, W.-c. (2015). Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 28) (pp. 802–810). Curran Associates, Inc.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. URL https://arxiv.org/abs/1409.1556.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929–1958. http://dx.doi.org/10.1214/12-AOS1000.
Tascikaraoglu, A., Sanandaji, B. M., Chicco, G., Cocina, V., Spertino, F., Erdinc, O., et al. (2016). Compressive spatio-temporal forecasting of meteorological quantities and photovoltaic power. IEEE Transactions on Sustainable Energy, 7(3), 1295–1305. http://dx.doi.org/10.1109/TSTE.2016.2544929.
Tascikaraoglu, A., Sanandaji, B. M., Poolla, K., & Varaiya, P. (2016). Exploiting sparsity of interconnections in spatio-temporal wind speed forecasting using wavelet transform. Applied Energy, 165, 735–747. http://dx.doi.org/10.1016/j.apenergy.2015.12.082.
Vargas, S. A., Esteves, G. R. T., Maçaira, P. M., Bastos, B. Q., Oliveira, F. L. C., & Souza, R. C. (2019). Wind power generation: A review and a research agenda. Journal of Cleaner Production, 218, 850–870. http://dx.doi.org/10.1016/j.jclepro.2019.02.015, URL http://www.sciencedirect.com/science/article/pii/S0959652619303944.
Villegas, R., Yang, J., Hong, S., Lin, X., & Lee, H. (2017). Decomposing motion and content for natural video sequence prediction. CoRR abs/1706.08033, arXiv:1706.08033.
Xie, L., Gu, Y., Zhu, X., & Genton, M. G. (2014). Short-term spatio-temporal wind power forecast in robust look-ahead power system dispatch. IEEE Transactions on Smart Grid, 5(1), 511–520. http://dx.doi.org/10.1109/TSG.2013.2282300.
Zhu, X., & Genton, M. G. (2012). Short-term wind speed forecasting for power system operations. International Statistical Review, 80(1), 2–23. http://dx.doi.org/10.1111/j.1751-5823.2011.00168.x, URL http://doi.wiley.com/10.1111/j.1751-5823.2011.00168.x.
Zhu, A., Li, X., Mo, Z., & Wu, R. (2017). Wind power prediction based on a convolutional neural network. In 2017 international conference on circuits, devices and systems (pp. 131–135). IEEE, http://dx.doi.org/10.1109/ICCDS.2017.8120465, URL http://ieeexplore.ieee.org/document/8120465/.
