
956 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 13, NO. 3, JUNE 2017

Predictive Modeling of PV Energy Production: How to Set Up the Learning Task for a Better Prediction?

Michelangelo Ceci, Roberto Corizzo, Fabio Fumarola, Donato Malerba, Member, IEEE, and Aleksandra Rashkovska, Member, IEEE

Abstract—In this paper, we tackle the problem of power prediction for several photovoltaic (PV) plants spread over an extended geographic area and connected to a power grid. The paper is intended to be a comprehensive study of one-day-ahead forecasting of PV energy production along several dimensions of analysis: 1) the consideration of the spatio-temporal autocorrelation, which characterizes geophysical phenomena, to obtain more accurate predictions; 2) the learning setting to be considered, i.e., using simple output prediction for each hour or structured output prediction for each day; 3) the learning algorithms: we compare artificial neural networks, most often used for PV forecasting, and regression trees for learning adaptive models. The results obtained on two PV power plant datasets show that: taking into account spatio-temporal autocorrelation is beneficial; the structured output prediction setting significantly outperforms the nonstructured output prediction setting; and regression trees provide better models than artificial neural networks.

Index Terms—Artificial neural networks (ANNs), photovoltaic (PV) energy prediction, regression trees, spatial and temporal autocorrelations, structured output.

Manuscript received April 20, 2016; revised July 15, 2016; accepted August 15, 2016. Date of publication August 31, 2016; date of current version June 1, 2017. This work was supported by the projects "Vi-POC: Virtual Power Operation Center" (PAC02L1_00269), funded by the Italian Ministry of University and Research, and "MAESTRA – Learning from Massive, Incompletely annotated, and Structured Data" under Grant ICT-2013-612944, funded by the European Commission. The computational work has been carried out on the resources provided by the projects ReCaS (PONa3_00052) and PRISMA (PON04a2_A). Paper no. TII-16-0385. (Corresponding author: M. Ceci.)

M. Ceci, R. Corizzo, and D. Malerba are with the Department of Computer Science, University of Bari Aldo Moro, Bari 70125, Italy (e-mail: michelangelo.ceci@uniba.it; roberto.corizzo@uniba.it; fabio.fumarola@uniba.it; donato.malerba@uniba.it).

A. Rashkovska is with the Department of Computer Science, University of Bari Aldo Moro, Bari 70125, Italy, and also with the Department of Communication Systems, Jožef Stefan Institute, Ljubljana 1000, Slovenia (e-mail: aleksandra.rashkovska@ijs.si).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TII.2016.2604758

1551-3203 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

THE urgent need to reduce pollution emission has made renewable energy a strategic European Union and international sector. This has resulted in an increasing presence of renewable energy sources and, thus, significant distributed power generation. The main challenges faced by this new energy market are grid integration, load balancing, and energy trading. First, integrating such distributed and renewable power sources into the power grid, while avoiding decreased reliability and distribution losses, is a demanding task for the smart grid effort. In fact, renewable power sources, such as photovoltaic arrays, are variable and intermittent in their energy output, because the energy produced may also depend on uncontrollable factors, such as weather conditions. Second, the main players in the energy market – the distributors and smaller companies that act between offer (traders) and request in the supply chain – have to face uncertainty not only in the request but also in the offer, when planning the energy supply for their customers. Third, the power produced by each single source contributes to defining the final clearing price in the daily or hourly market [1], thus making the energy market very competitive and a true maze for outsiders.

In order to face these challenges, it is of paramount importance to monitor the production and consumption of energy, both at the local and global levels, to store historical data, and to design new reliable prediction tools. In this paper, we focus our attention on photovoltaic (PV) power plants, due to their wide distribution in Europe. During the last few years, the forecasting of PV energy production has received significant attention, since photovoltaics are becoming a major source of renewable energy for the world. Forecasts may apply to a single renewable power generation system [2], or refer to an aggregation of a large number of systems spread over an extended geographic area [3], [4]. Accordingly, different forecasting methods are used. Furthermore, forecasting methods also depend on the tools and information available. Diverse resources are used to generate solar and PV forecasts depending on the forecast horizon considered, ranging from measured weather and PV system data, to satellite and sky imagery cloud observations used for very short-term forecasts (0–6 h ahead), to Numerical Weather Prediction (NWP) models used for horizons beyond approximately 6 h [5]. Several works clarify that the best approaches make use of both measured data and NWP [4], [6].

In the literature, several data mining approaches have been proposed for renewable energy power forecasting. Researchers typically distinguish between two classes of approaches: physical and statistical. The former relies on the refinement of NWP forecasts with physical considerations (e.g., obstacles and

orography) [7] or measured data (an approach often referred to as Model Output Statistics or MOS) [4], [6], while the latter is based on models that establish a relationship between historical values and forecasted variables. Statistical approaches may or may not take into account NWP data. Some of them are based on time series [8], while others learn adaptive models from data, like autoregressive (AR) models [3], artificial neural networks (ANNs) [2], or support vector machine (SVM) classifiers [9]. In this respect, it has been noted that physical property behavior (e.g., wind speed and solar irradiation) exhibits a trait called concept drift, i.e., it changes characteristics over time [1]: Adaptive models are generally considered to produce more reliable predictions under concept drift, but require a continuous training phase. Combinations of statistical (ANN and SVM) and physical (MOS) approaches for renewable energy power forecasting have also been recently investigated [10].

Despite the existence of such data mining algorithms applied in renewable energy power forecasting for learning adaptive models [1]–[3], [9], there is no consensus about the spatio-temporal information to be taken into account, the learning setting to be considered, and the learning algorithms to be used. This paper considers all these aspects as dimensions of analysis and investigates their real contribution in renewable energy power forecasting. The paper is structured as follows. Section II gives the motivation and contribution of the study. Section III formalizes the problem, how spatial and temporal autocorrelation components are considered, and how the (non)structured output learning task is defined. Section IV presents the data collection and preprocessing. The dimensions of analysis stated above are addressed in Section V, where experiments are reported and results are discussed. Finally, Section VI concludes the paper.

II. MOTIVATION AND CONTRIBUTION

The motivation for this paper comes from the different learning settings that can be found in the literature for the task at hand. Concerning the spatio-temporal information to be taken into account, it is noteworthy that most of the previously referenced works consider forecasting solutions for single plants and ignore the information collected from/at other plants/sites in the vicinity, even when this would be easily accessible (through spatial information). In this paper, we show that this information loss may result in reduced accuracy of the forecasting models, and we advocate an approach for learning forecasting models from data related to multiple plants. Differently from the few works in the literature that consider multiple plants [3], [4], our analysis also leverages the spatio-temporal autocorrelation that characterizes geophysical phenomena, such as weather conditions, to make more accurate predictions.

Indeed, site proximity of PV plants introduces spatial autocorrelation¹ in functional annotations and leads to the violation of the usual assumption that observations are independently and identically distributed (i.i.d.). Although the explicit consideration of these spatial dependencies brings additional complexity to the learning process, it generally leads to an increased accuracy of the learned models [11].

¹Correlation among data values which is strictly due to the relative spatial proximity of the objects that the data refer to.

Fig. 1. Trend of pressure (hPa), cloud cover (%), humidity (%), wind speed (m/s), irradiance (W/m²) and temperature (°C) (X-axis) with respect to the energy produced (kWh) (Y-axis).

In addition, the production of PV plants is also affected by temporal autocorrelation, since it:
1) tends to have similar values at a given time on close days;
2) has a cyclic and seasonal (over days and years) behavior;
3) tends to show the same trend over time.
While adaptive approaches (which typically employ stream mining algorithms) deal with 1) and 3), they may fail to consider 2), since they tend to better represent the most recently observed concepts, forgetting previously learned ones [12]. On the contrary, time series-based approaches are able to deal with 3), but may fail to consider 1) and 2). In fact, they typically require the size of the temporal horizon as an input: Considering a short-term horizon (e.g., daily) excludes a long-term horizon (e.g., seasonal) and vice versa.

Although several approaches combine both the spatial and the temporal dimensions [13] from stream data, they have not been applied in the context of renewable energy power prediction. Moreover, they do not take the spatial autocorrelation phenomenon into account and, thus, do not exploit the spatial structure of the data [14]. In this paper, similar to other approaches, we exploit NWP to benefit from uncontrollable factors, but, in addition, we evaluate to what extent taking into account spatial and temporal autocorrelations is beneficial. While spatial autocorrelation is taken into account by resorting to two well-known techniques from spatial statistics, temporal autocorrelation is considered by resorting to directional statistics.

Concerning the learning setting, in the existing work the classical solution is to learn a predictive model which predicts the energy that will be produced at a specific hour and day in the future. This approach, however, appears to be too limiting if we consider that two consecutive hours are seen as independent examples. An alternative solution is to resort to approaches that learn predictive models for structured output (structured output prediction) [15]. In principle, this approach should be able to catch (and model) the dependence between consecutive hours of the same day, just as some time-series methods do for short-term prediction.

Concerning the data mining algorithm, we investigate the predictive performance obtained with two different algorithms, one which learns ANNs and another one which learns regression trees. While ANNs have been extensively used for energy prediction [1], regression trees have received significantly less attention. Both algorithms are also able to deal with the nonlinearity of production with respect to some of the considered features (see Fig. 1). In any case, we study and compare
their performance for the problem at hand. Actually, in our framework, we can plug in any learner which can learn models for both nonstructured and structured outputs.

The contributions of this paper aim to provide a comprehensive analysis of the problems described before and to understand what matters if we want to achieve good and reliable predictions. In particular, the contributions include:
1) The explicit consideration, with two different solutions, of the spatial autocorrelation, and the investigation of its effect at different extents in predictive modeling of energy production. In this way, weather conditions (e.g., temperature, solar radiation, wind direction and (wind) speed, and quantity of rain) are appropriately exploited for forecasting purposes.
2) The explicit consideration of the temporal autocorrelation, and the investigation of its effect at different extents in predictive modeling of energy production, in order to deal with nonstationary (cyclic and seasonal) data in an adaptive model.
3) An investigation of the effect of the structured output prediction approach.
4) An investigation of the effect of the specific learning algorithm used.
5) The identification, by means of a feature selection step, of the best variables to be used for prediction.
We orthogonally investigate all the aspects discussed before through an extensive experimental evaluation on two datasets for PV power plants in Italy and the USA.

III. METHOD

The task we intend to perform is to predict PV power generation from:
1) historical data on power production;
2) weather forecast data provided by NWP systems;
3) weather information collected by sensors;
4) geographic coordinates of the plants;
5) additional features representing spatial and temporal autocorrelations.
Similar to [8], the output is a fine-grained prediction for the next day at one-hour intervals.

The learning algorithms update the prediction models every day. We use historical weather information collected by sensors as features in the training phase, whereas we use weather forecast data provided by NWP systems as features for predictions.

Formally, let Pi be the i-th plant; we define:
1) πi = (x, y), the geographic coordinates of Pi;
2) αi,j,h, a vector representing the sun's position at the location of Pi, on day j and at hour h;
3) pi,j,h, a vector representing properties of Pi;
4) wi,j,h, a vector for the observed weather data of Pi;
5) w′i,j,h, a vector that represents NWP forecast data for Pi;
6) si,j,h, a vector modeling spatial autocorrelation at Pi;
7) ti,j,h, a vector modeling temporal autocorrelation at Pi;
8) yi,j,h, a value representing the production (kWh) of Pi.
In our approach, we use πi, αi,j,h, pi,j,h, wi,j,h, si,j,h, ti,j,h (independent input features) and yi,j,h (dependent/target variable) for training purposes, and πi, αi,j,h, pi,j,h, w′i,j,h, si,j,h, and ti,j,h for prediction purposes. The value of the predicted attribute – the energy production – is available only during training. It is noteworthy that pi,j,h is used to represent data about the plants which are valid at a certain time point. Examples of features used in pi,j,h are: age of the plant, number of working inverters, and maximum plant production. These data allow the learning algorithm to directly predict the production, instead of the percentage of the production with respect to the maximum plant production.

Fig. 2. Different types of autocorrelation (dashed lines): Squares and circles represent target and input features, respectively; bigger circles represent different sites; N(v) represents the neighborhood of the site v. (a) Spatial lag model. (b) and (c) Two forms of spatial cross-regressive model.

The applied spatial and temporal autocorrelation techniques are discussed in the next two sections. The last section describes the two learning settings for predicting yi,j,h, namely structured and nonstructured output prediction.

A. Spatial Autocorrelation

The proximity of sensors induces spatial autocorrelation in the data. According to [16], the inappropriate treatment of sample data with spatial dependence could obfuscate important insights, and observed patterns may even be inverted when spatial autocorrelation is ignored. Taking autocorrelation into account allows the models to avoid overfitting and to exploit information coming from close sites. To accommodate several forms of spatial correlation, various models have been developed in the field of spatial statistics. The best known types are the spatial lag model and the spatial cross-regressive model [17]. While the former considers autocorrelation on the target variables, the latter considers cross correlation between input features at one site and target variables at other sites, as well as cross correlation between input features at one site and input features at other sites (see Fig. 2).

We use two spatial statistics when building the cross-regressive model between PV plants: 1) the Local Indicator of Spatial Association (LISA) for representing a local measure of spatial autocorrelation [18]; 2) the Principal Coordinates of Neighbor Matrices (PCNM) for representing the spatial structure in the data [19]. Compared to the classical STAR model [20], which combines spatial and temporal information in an AR model, the advantage of the solution we adopt is that we are able to embed spatial and temporal information in new features rather than in the model (but still taking autocorrelation into account). This gives us the opportunity to plug in any off-the-shelf learning algorithm and to separately investigate the contribution of spatial and temporal information.
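As a minimal illustration of the LISA statistic, the following numpy sketch derives one local Moran's I value per plant for a single feature, combining a row-normalized, distance-based neighborhood matrix [see (1)] with z-scored feature values [see (2)]. The coordinates, feature values, and `max_dist` threshold are hypothetical, and excluding each plant from its own neighborhood is an assumption of this sketch, not a detail stated in the paper:

```python
import numpy as np

def local_morans_i(coords, x, max_dist):
    """coords: (n, 2) plant positions; x: (n,) one feature (e.g., temperature)
    observed at a fixed day/hour. Returns one local Moran's I per plant."""
    # Pairwise Euclidean distances between plants.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Neighborhood matrix delta: 1 if dist < max_dist (self excluded here).
    delta = ((d < max_dist) & (d > 0)).astype(float)
    # Row-normalize: each row of delta' sums to 0 or 1.
    rows = delta.sum(axis=1, keepdims=True)
    delta_prime = np.divide(delta, rows, out=np.zeros_like(delta),
                            where=rows > 0)
    # z-score the feature, then I_a = z_a * sum_b delta'[a, b] * z_b.
    z = (x - x.mean()) / x.std()
    return z * (delta_prime @ z)

# Two tight pairs of plants, with similar feature values within each pair:
# every local Moran's I comes out positive (positive spatial autocorrelation).
coords = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
x = np.array([1.0, 1.0, -1.0, -1.0])
lisa = local_morans_i(coords, x, max_dist=2.0)
```

In the paper's setting, one such value would be computed per feature (temperature, humidity, etc.) and appended to the spatial vector s of each plant.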
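A minimal numpy sketch of the PCNM computation (distance matrix, truncation, double-centering, and eigendecomposition) could look as follows; the plant coordinates are hypothetical and the positive-eigenvalue cutoff is a numerical tolerance assumed here, not a value from the paper:

```python
import numpy as np

def pcnm_features(coords):
    """coords: (n, 2) plant positions. Returns one column per retained
    positive eigenvalue, scaled to length sqrt(lambda_k)."""
    n = len(coords)
    # Step 1: Euclidean distance matrix D between plants.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Step 2: truncated matrix D*; with t = max distance nothing is replaced,
    # which keeps all plants connected (entries beyond t would become 4t).
    t = d.max()
    d_star = np.where(d <= t, d, 4.0 * t)
    # Step 3: Delta = -1/2 (I - 11^T/n) D*^2 (I - 11^T/n), then diagonalize.
    center = np.eye(n) - np.ones((n, n)) / n
    delta = -0.5 * center @ (d_star ** 2) @ center
    eigvals, eigvecs = np.linalg.eigh(delta)
    # Keep only eigenvectors with positive eigenvalues (positive spatial
    # autocorrelation), each scaled by the square root of its eigenvalue.
    keep = eigvals > 1e-9
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])

# Four collinear plants: a single positive eigenvalue survives, and the
# resulting coordinate reproduces the pairwise distances.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
feats = pcnm_features(coords)
```

The retained columns play the role of the spatial descriptors added to s in the paper; truncation has no effect in this toy case precisely because t is set to the maximum inter-plant distance.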

To compute the LISA, the spatial neighborhood of an entity (i.e., a PV plant) is expressed as a matrix, and for each spatial entity and data point observed in a time horizon, the local Moran's I index [18] is computed using the matrix. More precisely, given n PV plants, the first step is to define a neighborhood matrix δ of size n × n, such that:

    δ[Pa, Pb] = 1 if dist(Pa, Pb) < maxDist, 0 otherwise    (1)

where Pa and Pb are two of the n plants. The maxDist threshold is a user-defined parameter that determines the extent of the autocorrelation. The elements of δ are then transformed: δ′[Pa, Pb] = δ[Pa, Pb] / |N(Pa)|, where N(Pa) is the set of nodes which are at a distance less than or equal to maxDist from Pa. In this way, the sum of the elements on each row of δ′ is either 0 or 1.

The subsequent step consists in computing the deviation of the variable of interest with respect to the mean, according to the z-score normalization [21]. Since in our approach we are interested in identifying the contribution of the neighborhood for each feature, this computation is performed for each feature considered (e.g., temperature, humidity, etc.). More formally, for a plant Pa at day j and hour h, we compute:

    z(x)a,j,h = (xa,j,h − x̄) / σx    (2)

where x is a generic variable used for representing either a weather condition, a weather forecast, or plant information (that is, a generic element of the vectors w, w′, and p), x̄ represents the average of the variable x, and σx represents the standard deviation of x. On the basis of z(x)a,j,h, it is possible to compute the local Moran's I for the variable x of the plant Pa for day j and hour h (according to [18]):

    I(x)a,j,h = z(x)a,j,h · Σ_{Pi ∈ N(Pa)} δ′[Pa, Pi] · z(x)i,j,h.

According to [22], we incorporate local spatial autocorrelation terms as predictors in regression equations (in the vector sa,j,h, which has one element for each possible variable x), giving place to AR models.

The PCNM is used to define new features which represent the spatial structure of the data, so as to exploit autocorrelation in the learning phase. Its computation has three steps:
1) The Euclidean (geographic) distance matrix D between plants is computed (D = [di,j] = [dist(Pi, Pj)]).
2) A threshold value t is chosen to construct a truncated distance matrix D* = [d*ij] as follows:

    d*ij = dij if dij ≤ t, 4t otherwise.

3) A principal coordinate analysis of the truncated distance matrix D* is performed. This analysis consists in the diagonalization of Δ, where:

    Δ = −(1/2) (I − 1·1ᵀ/n) D*² (I − 1·1ᵀ/n)    (3)

with D*² = [(d*ij)²], I being the identity matrix, and 1 being a vector of 1s.

Fig. 3. (a) Daily circumference. (b) Yearly circumference.

It has been proven (see [19]) that the eigenvectors of Δ are vectors with unit norm maximizing Moran's I under the constraint of orthogonality, whereas the eigenvalues of this matrix are equal to Moran's I coefficients of spatial autocorrelation (postmultiplied by a constant). They can also be either positive or negative (because the original Euclidean distance matrix has been truncated). Eigenvectors associated with high positive (or negative) eigenvalues have high positive (or negative) autocorrelation and describe global and local structures. Since we are interested in considering only positive spatial autocorrelation, only the eigenvectors corresponding to positive eigenvalues are kept and used as spatial descriptors.

The principal coordinates of each spatial descriptor (used as features in si,j,h) are obtained by scaling each eigenvector uk of Δ to the length √λk, where λk is the eigenvalue associated with eigenvector uk. Finally, the value of t used in D* is defined as the maximum distance between two PV plants (t = max_{i,j} di,j), in order to guarantee that the data are all connected.

B. Temporal Autocorrelation

Weather data are inherently seasonal/cyclical. For instance, summer days feature an increased irradiance compared to winter days. Moreover, if we consider an absolute time reference, the irradiance is almost equal for two (close) days at the same hour. Hence, we expect to increase the reliability of the predictions if days closer to the prediction (target) day are more influential in the model.

To account for both seasonal and daily cyclicities, we use two alternative solutions. The first solution is simple and consists in considering, as input features (independent variables), the values of time and day scaled in the range [0,1] for both training and target days. The second solution is to resort to directional statistics, which represent directional or circular distributions of the data by "wrapping" the probability density function around the circumference of a circle of unit radius. We follow this approach by "wrapping" the information collected at a specific time point around two circumferences, namely the daily circumference and the yearly circumference (see Fig. 3). The former catches the distribution of the production over the different hours of the day, while the latter catches the distribution of the production over the 365 days of the year.

More specifically, for each day in the training (historical) data, we compute its radial distance dr on the circumference from the target day. Since the learning algorithm is able to consider the similarity between two values, the value 2π − dr is incorporated
as an input feature in the model (in ti,j,h). Note that this approach requires an update of the model every day, which is in line with the learning setting we consider. However, the same approach cannot be adopted to catch the distribution of the production over the different hours of the day, since this would require updating the model every hour. The solution we adopt in this case is to directly include in the model the radial value of the specific hour hr as an input feature (independent variable in ti,j,h). The disadvantage is that we cannot catch similarities in the energy produced between the last and the first hours of the day when, however, PV energy is not produced.

C. Learning Setting: Structured and Nonstructured

As previously mentioned, we also investigate the application of structured output prediction models to the specific task. In particular, we consider multitarget models where, instead of a single model that predicts the production at a single hour, we have a single model that predicts 24 output variables at the same time. In principle, learning multitarget models should present some benefits over learning a local model for predicting the production at each hour. Indeed, it is generally recognized that structured models are typically easier to interpret, perform better, and overfit less than single-target predictors [15]. In this specific application, multitarget models can also exploit the dependencies between the productions at two different hours of the same day. The main issue is the collinearity problem, since the linear dependence between attributes may negatively affect the learned model.

Formally, when a nonstructured output prediction setting is used, the predicted value is yi,j,h (for the plant i, at day j and hour h). On the contrary, if a structured output prediction setting is used, the predicted value is a vector [yi,j,1, yi,j,2, ..., yi,j,24] (for the plant i, at day j). Since in the structured output prediction setting the unit of analysis is the day, and in the nonstructured output prediction setting the unit of analysis is an hour, the independent variables are different. In fact, in the nonstructured output setting each training instance represents a single hour, while in the structured output approach each training instance represents a single day. This means that, in order to keep the same information between the two settings, it is necessary to reserve 24 variables in the structured output setting to represent the values observed for the same property. Formally, the correspondences defined in Table I hold.

TABLE I
CORRESPONDENCES BETWEEN INDEPENDENT VARIABLES IN THE NONSTRUCTURED AND STRUCTURED OUTPUT SETTINGS

    Nonstructured    Structured
    αi,j,h           αi,j,h=1:24
    pi,j,h           pi,j,h=1:24
    wi,j,h           wi,j,h=1:24
    w′i,j,h          w′i,j,h=1:24
    si,j,h           si,j,h=1:24
    ti,j,h           ti,j (only daily information)

IV. DATA COLLECTION AND PREPROCESSING

Coherently with the problem definition provided in Section III, we distinguish between data locally collected by sensors installed on the PV power plants and data collected from external sources, such as weather forecast services (NWP). While locally collected data are used to initialize part of the features in wi,j,h and the dependent variable yi,j,h, external sources are used to initialize the remaining features in wi,j,h (that are not collected by sensors), as well as the features in w′i,j,h. The raw data are preprocessed and normalized before subjecting them to the data mining algorithms. In this way, we solve problems related to measurement errors, null values, and outliers, as explained in the following.

A. Locally Collected Data

The first problem we consider is how to deal with missing values related to sensor failures or communication problems for the features in wi,j,h. It is noteworthy that most of the PV plants collect data every 10 or 15 min; thus, we can still estimate reliable hourly averages if few measurement values are missing. However, if for a feature we cannot reliably compute its average due to the presence of several null values, we have to substitute them with a value returned by external systems for the specific position and time. To make the values collected by the sensors comparable with the values returned by external systems for the same feature (e.g., temperature), they are both z-score normalized [see (2)]. If external systems provide us with no information, we replace the null value with the average of the feature observed for the same month of the same year at the same hour. Similarly, in the (rare) cases in which we are not able to reliably compute the hourly average for the dependent variable yi,j,h, we substitute it with the average value observed by the sensors in the same month at the same hour.

After replacing the missing values, we check for the presence of outliers: if the value of the feature x in wi,j,h observed by the sensors is outside the range [x̄ − 4·σx; x̄ + 4·σx] (a relaxed 3-sigma rule), we consider it an outlier and handle it in the same way we handle null values.

B. External Data

In order to obtain external data, we query external services which provide interpolated data for past days and NWP values for future days. Interpolated data are used either to initialize features in wi,j,h which are not collected by sensors, or, as previously mentioned, to correct incomplete and wrong data (in wi,j,h) obtained by the sensors. On the contrary, NWP values are used to initialize the features in w′i,j,h.

Independent of the use of data obtained from external sources, the input parameters we use for querying data are the latitude, longitude, and time stamp of interest. The obtained features are: pressure, percentage of cloud cover, type of precipitation, intensity of precipitation, temperature, dew point, ozone, wind speed, humidity, wind bearing, and irradiance. Once retrieved, all the values are z-score normalized [see (2)].

Finally, in order to appropriately learn a prediction model (especially for ANNs), data must be scaled to the unit interval.
Hence, we apply a min–max normalization [21] for each final TABLE II


INDEPENDENT VARIABLES USED IN DIFFERENT SCENARIOS
feature, considering the min and max of the values observed

(for wi,j,h and wi,j,h , obtained after z-score normalization).
Nontemporal Noncyclic Cyclic
Actually, we consider the max increased by 30%, to handle
future situations in which the observed values of each feature NoSpat. p, α , w j , h, p, α , w j , h, t, p, α , w
might exceed the current maximum. LatLon p, α , w , π j , h, p, α , w , π j , h, t, p, α , w , π
LISA p, α , w , s l i s a j , h, p, α , w , s l i s a j , h, t, p, α , w , s l i s a
PCNM p, α , w , s p c n m j , h, p, α , w , s p c n m j , h, t, p, α , w , s p c n m
V. EXPERIMENTS

A. Datasets

In our empirical evaluation, we consider two datasets: a real dataset, named PVItaly, collected by an Italian company, and a dataset concerning PV production in the USA, available at the National Renewable Energy Laboratory (NREL) web site (http://www.nrel.gov/) and henceforth referred to as NREL.

PVItaly data are collected at regular intervals of 15 min (measurements start at 2:00 and stop at 20:00 every day) by sensors located on 18 plants in Italy. The time period spans from January 1st, 2012 to May 4th, 2014. The installed peak power of the PV arrays is between 982.80 kW peak and 999.99 kW peak, with an average of 995.71 kW peak. The amount of missing values and outliers in the data is relatively small compared to the total amount of data: around 1% missing values and around 3% outliers.

The NREL dataset originally consists of simulated PV data for 6000 plants for the year 2006. We perform cluster sampling over the original dataset by first selecting the 16 States with the highest Global Horizontal Irradiation. Then, from each State, we select three PV plants, resulting in PV data from 48 plants. The plant capacity is between 7 MW and 200 MW, with an average of 82.37 MW.

The weather data are queried from Forecast.io (http://forecast.io/), while the irradiance for the PVItaly dataset is queried from PVGIS (http://re.jrc.ec.europa.eu/pvgis/apps4/pvest.php). Forecast.io data come from a wide range of data sources, which are statistically aggregated to provide the most accurate forecast possible for a given location. PVGIS makes use of monthly averages of daily sums of global and diffuse irradiation, representing the period 1981–1990. We queried the PVGIS database for each day and plant separately using a custom-made wrapper, specifying the date and the coordinates (latitude and longitude) of the plants of interest.

More formally, for the vectors defined in Section III, the following input features are considered:
1) π_i: latitude, longitude;
2) j, h: day and hour, respectively;
3) α_{i,j,h}: altitude and azimuth, queried from SunPosition (http://www.susdesign.com/sunposition/index.php);
4) p_i: site ID, brand ID, model ID, age in months;
5) w_{i,j,h} and w′_{i,j,h}: ambient temperature, irradiance, pressure, wind speed, wind bearing, humidity, dew point, cloud cover, descriptive weather summary;
6) s^lisa_{i,j,h}: LISA indexes I^(x)_{i,j,h} for each variable x from w_{i,j,h} (or w′_{i,j,h}) and α_{i,j,h};
7) s^pcnm_i: n PCNM coordinates (n = 15 for the PVItaly dataset, n = 30 for the NREL dataset);
8) t_{i,j,h}: radial day distance d_r and radial representation of the hour h_r.

All datasets, the results, and the system are available at: http://www.di.uniba.it/~ceci/energyprediction/.

TABLE III
NUMBER OF ATTRIBUTES PER SCENARIO

PVItaly           Hourly (N = 276 811)                Daily (N = 14 569)
          Nontemporal  Noncyclic  Cyclic     Nontemporal  Noncyclic  Cyclic
NoSpat.   15           18         19         230          233        234
LatLon    17           20         21         232          235        236
LISA      26           29         30         420          423        424
PCNM      30           33         34         245          248        249

NREL              Hourly (N = 331 968)                Daily (N = 17 520)
          Nontemporal  Noncyclic  Cyclic     Nontemporal  Noncyclic  Cyclic
NoSpat.   12           14         15         209          211        212
LatLon    14           16         17         211          213        214
LISA      21           23         24         380          382        383
PCNM      42           44         45         239          241        242

N: number of examples.

B. Experimental Settings

We distinguish between hourly and daily settings. In the hourly setting, we investigate nonstructured models with a single output – the production y_{i,j,h} of the plant P_i at a specified day j and specified hour h. In the daily setting, we investigate structured models with 24 outputs – the productions y_{i,j} of the plant P_i for the hours from 1:00 to 24:00 on a specified day j (actually, we only consider the interval 2:00–20:00, because of the available data). For both settings, we investigate several scenarios with increasing spatio/temporal complexity (see Table II). Moreover, in the daily scenarios, we consider the representation summarized in Table I. The number of attributes and the number of examples considered for each scenario on both datasets are given in Table III.

For the evaluation, the datasets are randomly split into training days (85%) and testing days (15%). The learning strategy is iterative – for each testing day, the model is learned on all the previous days and tested on the considered day (example(s) unseen by the trained model). After testing, the testing day becomes part of the training set. This testing-retraining procedure is repeated for each testing day, and the error on each testing day contributes to the reported result. The evaluation procedure is based on the "landmark window model" and on the "count-based sliding window model" [23]. While the first model takes into account all the historical data, the second only takes into account the most recent window of a given number of instances. We performed experiments with the count-based sliding window model for window lengths of 1000, 2000, and 4000 examples (see Fig. 4). We also investigate different values of the maxDist threshold for the LISA method. Experiments are run three times with different random splits into training and testing sets, and the average error on the test set is reported.

Fig. 4. Evaluation (training and testing) procedure with the landmark window model (left) and the count-based sliding window model (right).

For ANNs, we use the encog implementation of the Resilient Propagation (RPROP+) algorithm for training neural networks (http://www.heatonresearch.com/wiki/Resilient_Propagation#Implementing_RPROP.2B). RPROP+ is one of the best general-purpose neural network training methods implementing the backpropagation technique. It performs a direct adaptation of the weight step based on local gradient information. The basic principle is to eliminate the harmful influence of the size of the partial derivative on the weight step – it considers only the sign of the derivative to indicate the direction of the weight update. We use RPROP+ since it has been proven effective for renewable energy prediction [1]. Furthermore, RPROP+ does not require typical ANN parameters, such as the learning rate and momentum, since they are automatically tuned during the training phase. The ANN topology has one hidden layer, with the number of hidden neurons equal to 2/3 of the sum of the number of inputs and outputs.

For regression trees, we use the system CLUS, which views a tree as a hierarchy of clusters (Predictive Clustering Trees (PCTs)): the top node corresponds to one cluster containing all the data, which is recursively partitioned into smaller clusters while moving down the tree. CLUS, including PCTs for multitarget regression [15], is available at clus.sourceforge.net.

For the evaluation of the results, we consider three indicators of predictive performance, namely, the root mean squared error (RMSE), the mean absolute error (MAE) (MAE results are available at www.di.uniba.it/~ceci/energyprediction/), and the improvement with respect to the persistence model (i.e., the model that forecasts the same production observed 24 h before). For an analysis at a disaggregated level, we also perform feature selection with the aim of automatically identifying the most relevant input features for the task at hand. According to [24], the feature selection step has been performed as a best-first search in backward mode: starting from the complete feature set, the worth of each subset of attributes is evaluated by considering the individual predictive ability of each feature along with their degree of redundancy.

C. Results and Discussion

The results on the PVItaly and NREL datasets for the investigated hourly and daily scenarios are reported in Table IV for RPROP+ and CLUS. Negative percentages of improvement mean that the investigated model does not outperform the persistence model. The improvements of the best-performing results are highlighted in bold. The results show that the best results for the PVItaly dataset are obtained in the setting LISA, Noncyclic, CLUS, and Daily, while for the NREL dataset the best results are obtained in the setting PCNM, Cyclic, CLUS, and Daily. Moreover, from the results, it is clear that the landmark model outperforms the count-based sliding window model. This is also confirmed statistically (see Table V, sections "TRAINING WINDOW"). Finally, although there is no statistical evidence, LISA with small values of maxDist shows the best performance for PVItaly. This is not confirmed in NREL, where distances between plants are larger and behaviors at coarse-grained granularity are caught by the models.

TABLE IV
RPROP+ AND CLUS AVERAGE PERFORMANCE (3 RUNS) ON THE PVITALY AND NREL DATASETS
For LISA, A, B, and C indicate different values of maxDist: 15, 30, and 45 km for PVItaly, and 300, 600, and 900 km for NREL. S1000, S2000, and S4000 indicate the count-based sliding window model with different window lengths.

In addition to the results reported in the tables, we also perform a more systematic analysis in order to understand what matters if we want to achieve good and reliable predictions. According to the contributions stated in Section I, we compare the results across four dimensions:
1) Spatial: NoSpat. versus LatLon versus LISA versus PCNM;
2) Temporal: Nontemporal versus Noncyclic versus Cyclic;
3) Structural: Hourly versus Daily;
4) Algorithmic: RPROP+ versus CLUS.
All the above comparisons are orthogonally performed on both datasets. The different dimensions of analysis for the landmark window model are graphically presented in Fig. 5. In addition, the analysis is complemented with statistical (Wilcoxon signed rank) tests (see Table V). The considered LISA results are for a maxDist of 15 km for the PVItaly dataset and 600 km for the NREL dataset.

1) Spatial: The comparison in Fig. 5(a) shows that, by varying the spatial complexity, there is no single spatial configuration which outperforms all the others. However, by considering the statistical tests for spatial autocorrelation, we can see that PCNM is the best-performing method. The PCNM method outperforms all other spatial configurations (significantly outperforming NoSpat. and LISA). The LISA method is not able to outperform even the simple LatLon configuration. This means that modeling autocorrelation by considering the spatial structure of the data (as PCNM does) is, in the application at hand, much more important than directly considering AR information (as LISA does). Finally, what is clear from the results is that considering spatial autocorrelation (in any form) is generally beneficial (LatLon, LISA, and PCNM outperform NoSpat.).

2) Temporal: By analyzing the contribution of temporal autocorrelation, we can see that the results are not uniform, in the sense that there is no single temporal configuration which outperforms the others over the two learning settings, the two learning algorithms, and the two datasets. However, the statistical tests show that the two configurations which exploit temporal autocorrelation (Noncyclic and Cyclic) significantly outperform the configuration that does not (Nontemporal), with the cyclic configuration not outperforming the noncyclic one. This can be due to collinearity problems caused by the high correlation between cyclic temporal features and noncyclic ones. Moreover, since in two out of four daily settings the best improvement is obtained with the cyclic configuration, we conclude that temporal autocorrelation is properly exploited in the case of structured output prediction, where dependencies between the productions at two different hours of the same day provide a sort of "contiguity" with respect to the temporal autocorrelation captured by the directional statistics.

3) Structural: By comparing the nonstructured output prediction setting with structured output prediction, we can clearly see that the comparison is significantly in favor of the structured output prediction setting, confirming that the dependence between the predictions obtained at different hours of the same day is of fundamental importance for improving predictions.
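To make the structural distinction concrete, the following sketch shows how examples are shaped in the two settings, using hypothetical sizes (5 days, the 19 hourly slots from 2:00 to 20:00, 3 features per hour); all names and values are illustrative:

```python
# Hypothetical data for one plant: 5 days x 19 hourly slots (2:00-20:00),
# each hour described by a feature vector; values are illustrative.
n_days, n_hours = 5, 19
features = [[[d, h, 0.5] for h in range(n_hours)] for d in range(n_days)]
production = [[0.1 * h for h in range(n_hours)] for d in range(n_days)]

# Hourly (nonstructured) setting: one example per (day, hour) pair,
# with a single output -- the production at that hour.
hourly_examples = [
    (features[d][h], production[d][h])
    for d in range(n_days) for h in range(n_hours)
]

# Daily (structured) setting: one example per day; the input concatenates
# the hourly feature vectors and the output is the whole daily profile,
# so the model can exploit dependencies between hours of the same day.
daily_examples = [
    ([x for hour in features[d] for x in hour], production[d])
    for d in range(n_days)
]

print(len(hourly_examples), len(daily_examples))  # 95 5
```

The daily layout is what lets a structured learner such as a multi-target PCT predict all hourly outputs of a day jointly.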

TABLE V
p-VALUES OF THE WILCOXON SIGNED RANK TESTS FOR DIFFERENT DIMENSIONS OF ANALYSIS

  Comparison                      p-Value     Winner
SPATIAL (with Bonferroni correction, < 0.05/6)
  NoSpat. versus LatLon           0.0002      LatLon
  NoSpat. versus LISA             0.954       LISA
  NoSpat. versus PCNM             0.0008      PCNM
  LatLon versus LISA              3.43E−05    LatLon
  LatLon versus PCNM              0.775       PCNM
  LISA versus PCNM                4.67E−05    PCNM
TEMPORAL (with Bonferroni correction, < 0.05/3)
  Nontemp. versus Noncyclic       0.005       Noncyclic
  Nontemp. versus Cyclic          0.594       Cyclic
  Noncyclic versus Cyclic         0.007       Noncyclic
STRUCTURAL
  Hourly versus Daily             5.22E−09    Daily
ALGORITHMIC
  RPROP+ versus CLUS              4.39E−08    CLUS
TRAINING WINDOW, RPROP+ (with Bonferroni correction, < 0.05/3)
  Sliding 1000 versus Landmark    8.15E−10    Landmark
  Sliding 2000 versus Landmark    8.68E−10    Landmark
  Sliding 4000 versus Landmark    4.63E−09    Landmark
TRAINING WINDOW, CLUS (with Bonferroni correction, < 0.05/3)
  Sliding 1000 versus Landmark    1.96E−04    Landmark
  Sliding 2000 versus Landmark    2.44E−04    Landmark
  Sliding 4000 versus Landmark    9.83E−05    Landmark

In bold: statistically significant values (confidence = 0.05, unless specified otherwise).
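The p-values in Table V come from paired Wilcoxon signed rank tests with Bonferroni correction. A minimal, self-contained sketch of such a paired comparison, in its normal-approximation form, is given below; the numbers are illustrative, not the paper's data, and in practice a library routine such as scipy.stats.wilcoxon would be used:

```python
import math

def wilcoxon_signed_rank_p(a, b):
    """Two-sided p-value of the Wilcoxon signed rank test for paired
    samples, using the normal approximation (illustrative sketch)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    n = len(diffs)
    # Rank the absolute differences (average rank for ties).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))

# Paired RMSEs of two configurations over 12 runs (illustrative numbers;
# configuration B is consistently worse than A).
rmse_a = [0.10, 0.11, 0.12, 0.09, 0.13, 0.10, 0.11, 0.12, 0.10, 0.09, 0.11, 0.12]
rmse_b = [0.12, 0.14, 0.13, 0.12, 0.15, 0.13, 0.13, 0.15, 0.12, 0.12, 0.14, 0.16]

p = wilcoxon_signed_rank_p(rmse_a, rmse_b)
alpha = 0.05 / 6  # Bonferroni correction for six pairwise comparisons
print(p < alpha)  # the difference is significant at the corrected level
```

Dividing the significance level by the number of pairwise comparisons (6 for the spatial dimension, 3 for the temporal and training-window dimensions) is what the "Bonferroni correction" rows in Table V refer to.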

TABLE VI
AVERAGE RMSE (OVER THREE RUNS) WITH AND WITHOUT FEATURE SELECTION (FS)

PVItaly        RPROP                     CLUS
               With FS    Without FS     With FS    Without FS
  Hourly       0.121      0.116          0.113      0.107
  Daily        0.117      0.102          0.083      0.071

NREL           RPROP                     CLUS
               With FS    Without FS     With FS    Without FS
  Hourly       0.127      0.113          0.115      0.102
  Daily        0.122      0.097          0.105      0.087

Results without feature selection are obtained with the configuration PCNM–Noncyclic.
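The three evaluation indicators (RMSE, MAE, and the percentage improvement over the persistence model) can be computed as in the following sketch; the function names and hourly values are illustrative, and a negative improvement would mean the model does not beat the persistence forecast:

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def improvement_over_persistence(actual, predicted, same_hours_day_before):
    """Percentage RMSE improvement w.r.t. the persistence model, i.e.,
    the model that forecasts the production observed 24 h before."""
    return 100.0 * (1.0 - rmse(actual, predicted) / rmse(actual, same_hours_day_before))

# Illustrative hourly productions (normalized units)
yesterday = [0.0, 0.2, 0.6, 0.8, 0.5, 0.1]
today = [0.0, 0.3, 0.7, 0.9, 0.6, 0.2]
forecast = [0.0, 0.28, 0.68, 0.88, 0.58, 0.18]

print(round(improvement_over_persistence(today, forecast, yesterday), 6))  # 80.0
```

Here the model's errors are one fifth of the persistence model's, so the RMSE is reduced by 80%.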

Fig. 5. Different dimensions of analysis for the landmark window model: (a) spatial, (b) temporal, (c) structural, and (d) algorithmic.

Fig. 6. Predicted (solid line) versus actual (dotted line) production for three consecutive days (in January and May) of a single plant of PVItaly and NREL. Results for PVItaly are obtained with the CLUS-Daily-LatLon-Noncyclic configuration, while those for NREL are obtained with the CLUS-Daily-PCNM-Noncyclic configuration. The time interval considered is 2:00 AM–8:00 PM. The considered plant for PVItaly is located in Sannicandro di Bari, Italy (latitude 40.984261, longitude 16.831031); the considered plant for NREL is located in Riverside, CA, USA (latitude 33.671418, longitude −115.558126). (a) PVItaly – January 1st, 2nd, 3rd; (b) PVItaly – May 4th, 5th, 6th; (c) NREL – January 16th, 17th, 18th; (d) NREL – May 2nd, 3rd, 4th.
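The testing-retraining procedure of Section V-B, under either the landmark window model or the count-based sliding window model, can be sketched as follows; all names and the toy "model" are illustrative, not the paper's implementation:

```python
def evaluate(days, train_fn, test_fn, window=None):
    """For each testing day: train on all previous days (landmark window,
    window=None) or on the most recent `window` examples (count-based
    sliding window), test on that day, then add it to the history."""
    errors, history = [], []
    for day in days:
        train_set = history if window is None else history[-window:]
        if train_set:  # skip days with no training data yet
            model = train_fn(train_set)
            errors.append(test_fn(model, day))
        history.append(day)  # the testing day joins the training set
    return errors

# Toy example: the "model" is the mean daily production seen so far,
# and the error is the absolute deviation on the testing day.
days = [1.0, 1.1, 0.9, 1.2, 1.0]
mean_model = lambda xs: sum(xs) / len(xs)
abs_error = lambda m, d: abs(m - d)

landmark_errors = evaluate(days, mean_model, abs_error)            # all history
sliding_errors = evaluate(days, mean_model, abs_error, window=2)   # last 2 days
```

The only difference between the two regimes is the slice of history handed to the learner at each step, which is why the landmark model can exploit strictly more data.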

4) Algorithmic: CLUS outperforms RPROP+ by a large margin, especially in the structured output prediction setting, where it is able to improve the predictions of the persistence model by more than 45% in the case of PVItaly (with the combination Noncyclic – LatLon) and by almost 30% in the case of NREL (with the combination Cyclic – PCNM).

Concerning feature selection, the results reported in Table VI show that it does not lead to improved performance. This means that many distinct features contribute information for better predictions. However, from the application of the feature selection algorithm, we can still gain additional insights. In fact, the selected features confirm that the added spatial and temporal features are considered important for the prediction. For PVItaly, the selected features are: day, date, hour (hourly setting), irradiance, humidity, irradianceLISA, and azimuthLISA; for NREL, they are: date, day, hour (hourly setting), longitude, windspeed, altitude, azimuth, cloudcoverLISA, azimuthLISA, PCNM27, and PCNM28.

A complementary perspective on the results is provided in Fig. 6, where we compare the actual and predicted production curves for three consecutive winter and summer days on both datasets under the landmark window model. For the PVItaly dataset, a decrease in predictive performance can be observed under cloudy and rainy weather conditions (typical of winter days) compared to sunny weather conditions (typical of summer days). Moreover, the PVItaly dataset has proven more challenging than NREL. Namely, PVItaly is a real-world dataset, while the NREL dataset is simulated. Therefore, it is not surprising that PVItaly exhibits greater oscillations than NREL.

VI. CONCLUSION

This paper studies the problem of PV energy prediction by considering different dimensions of analysis: spatio/temporal autocorrelation, the learning setting (structured output versus nonstructured output), and the learning algorithm (ANNs versus regression trees), aiming to identify the aspects relevant to the problem at hand. The results clearly show that structured output prediction models are much more accurate than models that predict single outputs, because structured output prediction models can capture dependencies between different hours of the same day. Moreover, the experimental results confirm that both forms of autocorrelation should be taken into account in this specific application: While PCNM is the best way to consider spatial autocorrelation, the simple consideration of hour and day information is enough to properly catch temporal autocorrelation. Finally, regression trees produce significantly better predictions than ANNs, indicating that, also in PV energy prediction, hierarchical models are better than flat regression functions (this was also observed in [25] for predicting electricity energy consumption).

As future work, we intend to directly consider autocorrelation in the cost function (heuristic) used to learn the ANN. We also plan to investigate more sophisticated learning methods based on ensemble techniques.

ACKNOWLEDGMENT

The authors would like to thank L. Rudd for her help in reading the manuscript.

REFERENCES

[1] R. Bessa, V. Miranda, and J. Gama, "Entropy and correntropy against minimum square error in offline and online three-day ahead wind power forecasting," IEEE Trans. Power Syst., vol. 24, no. 4, pp. 1657–1666, Nov. 2009.
[2] A. Rashkovska, J. Novljan, M. Smolnikar, M. Mohorčič, and C. Fortuna, "Online short-term forecasting of photovoltaic energy production," in Proc. IEEE Power Energy Soc. Innovative Smart Grid Technol. Conf., 2015, pp. 1–5.
[3] P. Bacher, H. Madsen, and H. A. Nielsen, "Online short-term solar power forecasting," Sol. Energy, vol. 83, no. 10, pp. 1772–1783, 2009.
[4] S. Pelland, G. Galanis, and G. Kallos, "Solar and photovoltaic forecasting through post-processing of the global environmental multiscale numerical weather prediction model," Prog. Photovolt. Res. Appl., vol. 21, no. 3, pp. 284–296, 2013.
[5] J. Kleissl, Solar Resource Assessment and Forecasting. Amsterdam, The Netherlands: Elsevier, 2013.
[6] P. Mathiesen and J. Kleissl, "Evaluation of numerical weather prediction for intra-day solar forecasting in the continental United States," Sol. Energy, vol. 85, no. 5, pp. 967–977, 2011.
[7] S. Bofinger and G. Heilscher, "Solar electricity forecast—Approaches and first results," in Proc. 20th Eur. PV Conf., 2006.
[8] P. Chakraborty, M. Marwah, M. F. Arlitt, and N. Ramakrishnan, "Fine-grained photovoltaic output prediction using a Bayesian ensemble," in Proc. 26th AAAI Conf. Artif. Intell., 2012, pp. 274–280.
[9] N. Sharma, P. Sharma, D. E. Irwin, and P. J. Shenoy, "Predicting solar generation from weather forecasts using machine learning," in Proc. IEEE Int. Conf. Smart Grid Commun., 2011, pp. 528–533.
[10] S. Buhan and I. Cadirci, "Multistage wind-electric power forecast by using a combination of advanced statistical methods," IEEE Trans. Ind. Inform., vol. 11, no. 5, pp. 1231–1242, Oct. 2015.
[11] D. Stojanova, M. Ceci, A. Appice, and S. Džeroski, "Network regression with predictive clustering trees," Data Mining Knowl. Discovery, vol. 25, no. 2, pp. 378–413, 2012.
[12] P. M. Gonçalves Jr. and R. S. De Barros, "RCD: A recurring concept drift framework," Pattern Recognit. Lett., vol. 34, no. 9, pp. 1018–1025, 2013.
[13] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, "Mining data streams: A review," ACM SIGMOD Rec., vol. 34, no. 2, pp. 18–26, Jun. 2005.
[14] D. Borcard, P. Legendre, C. Avois-Jacquet, and H. Tuomisto, "Dissecting the spatial structure of ecological data at multiple scales," Ecology, vol. 85, no. 7, pp. 1826–1832, Jul. 2004.
[15] D. Kocev, C. Vens, J. Struyf, and S. Džeroski, "Tree ensembles for predicting structured outputs," Pattern Recognit., vol. 46, no. 3, pp. 817–833, 2013.
[16] I. Kühn, "Incorporating spatial autocorrelation may invert observed patterns," Diversity Distributions, vol. 13, no. 1, pp. 66–69, 2007.
[17] L. Anselin, "Spatial dependence in linear regression models with an introduction to spatial econometrics," pp. 43–53, 1996.
[18] L. Anselin, "Local indicators of spatial association—LISA," Geographical Anal., vol. 27, no. 2, pp. 93–115, 1995.
[19] S. Dray, P. Legendre, and P. R. Peres-Neto, "Spatial modelling: A comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM)," Ecological Model., vol. 196, no. 3–4, pp. 483–493, 2006.
[20] M. Szummer and R. W. Picard, "Temporal texture modeling," in Proc. IEEE Int. Conf. Image Process., vol. 3, Sep. 1996, pp. 823–826.
[21] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. San Mateo, CA, USA: Morgan Kaufmann, 2011, pp. 113–114.
[22] H. H. Wagner and M. J. Fortin, "Spatial analysis of landscapes: Concepts and statistics," Ecology, vol. 86, no. 8, pp. 1975–1987, 2005.
[23] J. Gama and M. M. Gaber, Eds., Learning From Data Streams. New York, NY, USA: Springer, 2007.
[24] M. A. Hall, "Correlation-based feature subset selection for machine learning," Ph.D. dissertation, Univ. Waikato, Hamilton, New Zealand, 1998.
[25] G. K. Tso and K. K. Yau, "Predicting electricity energy consumption: A comparison of regression analysis, decision tree and neural networks," Energy, vol. 32, no. 9, pp. 1761–1768, 2007.

Michelangelo Ceci received the Ph.D. degree in computer science from the University of Bari, Italy, in 2005, where he is currently an Associate Professor in the Department of Computer Science. He is the Unit Coordinator of European and national projects. He is on the Program Committee of many conferences, including the IEEE International Conference on Data Mining series, the International Joint Conference on Artificial Intelligence, the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery, the SIAM International Conference on Data Mining, Dynamical Systems, and the International Symposium on Methodologies for Intelligent Systems. He is on the editorial board of the International Journal of Distributed Sensor Networks, the International Journal of Social Network Mining, "Intelligenza Artificiale," the International Journal of Data Analysis Techniques and Strategies, and of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD) 2014–2016 journal tracks. He was program (co-)Chair of SEBD 2007 and Discovery Science 2016, and General Chair of ECML-PKDD 2017. He has published more than 150 papers in reviewed journals and conferences. His research interests include data mining and machine learning.

Roberto Corizzo received the M.Sc. degree in computer science from the University of Bari, Bari, Italy, in 2012, where he is currently working toward the Ph.D. degree in predictive models for streams of sensor data in the Department of Computer Science. He has been involved in the development of algorithms for renewable energy power forecasting in the Vi-POC project. His research interests include Big Data analytics, data mining, and predictive modeling techniques for sensor networks.

Fabio Fumarola received the Ph.D. degree in computer science from the University of Bari, Italy, in 2011, where he is currently a Research Assistant in the Department of Computer Science, University of Bari, Bari, Italy. He was a Visiting Researcher at the University of Illinois, Champaign, IL, USA. He founded a startup working on data mining on Big Data. He has published more than 20 papers in refereed journals and conferences. He has served on the Program Committees of several conferences, including the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD), Biocomputation, Dynamical Systems (DS), and the International Symposium on Methodologies for Intelligent Systems (ISMIS). His research interests include big data, web mining, and data stream mining.

Donato Malerba (M'87) received the M.Sc. degree in computer science from the University of Bari, Italy, in 1987, where he is currently a Full Professor in the Department of Computer Science. He has been responsible for the local research unit of several European and national projects, and received an IBM Faculty Award in 2004. He is the Director of the Computer Science Department, University of Bari, and of the CINI Lab on Big Data. He is on the Board of Directors of the Big Data Value Association and in the Partnership Board of the PPP Big Data Value. He was Program (co-)Chair of the International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA-AIE) 2005, the International Symposium on Methodologies for Intelligent Systems (ISMIS) 2006, SEBD 2007, and the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD) 2011, and was the General Chair of ALT/DS 2016. He is on the editorial board of several international journals. He has published more than 200 papers in international journals and conference proceedings. His research interests include machine learning, data mining, and Big Data.

Aleksandra Rashkovska (M'14) received the Ph.D. degree in computer science from the Jožef Stefan International Postgraduate School, Ljubljana, Slovenia, in 2013. She is currently a Research Associate in the Department of Communication Systems, Jožef Stefan Institute, Ljubljana, Slovenia. She was a Visiting Researcher in the Department of Computer Science, University of Bari, Bari, Italy. Her research interests include advanced bio-signal analysis, computer simulations and data mining in biomedicine, and data mining in sensor networks.
