DOI 10.1007/s00521-015-1842-y
ORIGINAL ARTICLE
Received: 5 February 2014 / Accepted: 11 February 2015 / Published online: 1 March 2015
© The Natural Computing Applications Forum 2015
Abstract  Forecasting models for photovoltaic energy production are important tools for managing energy flows. The aim of this study was to accurately predict the energy production of a PV plant in Italy, using a methodology based on support vector machines. The model uses historical data of solar irradiance, environmental temperature and past energy production to predict the PV energy production for the next day with an interval of 15 min. The technique used is based on ν-SVR, a support vector regression model in which the number of support vectors can be controlled. The forecasts of energy production obtained with the proposed methodology are very accurate, with the R² coefficient exceeding 90 %. The quality of the predicted values strongly depends on the goodness of the weather forecast, and the R² value decreases if the predictions of irradiance and temperature are not very accurate.

Keywords  Forecasting model · Support vector machines · PV energy production

R. De Leone (✉) · M. Pietrini
School of Science and Technologies, University of Camerino, Camerino, MC, Italy
e-mail: renato.deleone@unicam.it
M. Pietrini
e-mail: maila.pietrini@studenti.unicam.it
A. Giovannelli
Research@energy, Loccioni Group, Angeli di Rosora, AN, Italy
e-mail: a.giovannelli@loccioni.com

1 Introduction

During the last years, research on sustainable energy has considered the growing use of renewable sources a strategic option. Therefore, increasing attention has been devoted to energy production from renewable sources, because they represent a valid alternative to traditional fossil fuel resources, whose future availability is uncertain and whose cost is constantly increasing. One of the main renewable energy sources available in nature is the sun, usable for the direct production of electricity through photovoltaic (PV) systems. Besides being an inexhaustible resource, PV solar energy is an example of clean and directly available energy that can be obtained simply by exploiting the radiation from the sun to the Earth. Solar PV is one of the renewable energy sources most suitable for Italy, thanks to a particularly favorable level of radiation.

The major problem related to PV solar energy, and the consequent scientific challenge, is that its production greatly depends on the weather conditions of the area where the PV plant is installed. Prediction of the PV solar energy production for hours or days ahead can contribute to an efficient and economic use of this resource and makes it possible to manage the amount of energy obtained by PV plants in order to satisfy the growing demand. Furthermore, the increasing development of electrical smart grids has led to the need of knowing in advance the energy production from renewable sources in order to manage the energy flows within the smart grid itself: forecasting the power output of a PV plant for the next hours or days is necessary for the optimal integration of this production into power systems.

Many scientific investigations in this area have been carried out in recent years, and different forecasting strategies have been used to achieve the desired goals. In most cases, the studies have focused on the prediction of solar radiation, while there have been few articles devoted to the estimation of the production of PV solar energy. The
Neural Comput & Applic (2015) 26:1955–1962
state of the art in this field is reported in [14]. The traditional prediction of PV energy production is based on the classical approach of time series forecasting of solar energy and weather conditions, which are used to calculate the electrical energy of PV systems: autoregressive (AR), moving average (MA) and autoregressive moving average (ARMA) models [2] are often used to model linear dynamic structures. Furthermore, Fourier series models [3] can be used to predict the PV production. Another approach to this problem is based on physical modeling, which seems to be less effective for complex nonlinear systems such as the forecast of irradiation fluctuation [1]. Moreover, different studies show that nonlinear and non-stationary models are more flexible in capturing the characteristics of these data and that, in some cases, they are better in terms of estimation and forecasting. Many scientific articles [24] show that a predictive model based only on a database of historical data is expected to be more effective for the forecast of the energy production of a PV plant, because the influence of the specific system is somehow implicit in the past data. Nowadays, advanced models based on nonlinear approaches are rapidly spreading in power production forecasting, using artificial neural networks [6], support vector machines [7, 11] and hybrid models [17].

The aim of this work was to show how advanced machine learning techniques can be used for the prediction of PV energy generation in a real-world scenario. Specifically, the adopted approach uses support vector machines for regression. The goal was to obtain a daily forecast of PV energy production with a quarter-hour frequency, as explained in the next sections.

The paper is organized in the following way. The next section explains support vector machines (SVM) for regression; the following section describes the data used and the model implemented; finally, computational results and conclusions are presented.

We briefly describe our notation now. All vectors are column vectors and are indicated with lower-case Latin letters $(x, z, \ldots)$. Subscripts indicate components of a vector, while superscripts identify different vectors. The set of real numbers is denoted by $\mathbb{R}$. The space of $n$-dimensional vectors with real components is denoted by $\mathbb{R}^n$. The symbol $\|x\|$ indicates the Euclidean norm of a vector $x$. The superscript $T$ indicates transpose. The scalar product of two vectors $x$ and $y$ in $\mathbb{R}^n$ is denoted by $x^T y$.

2 Support vector machines for regression

SVM is a promising nonparametric technique for data classification and regression [29], developed over the last four decades within the framework of statistical learning theory, or VC (Vapnik–Chervonenkis) theory [27, 28, 30]. VC theory studies the properties of learning machines that enable them to generalize well to unseen data. SVMs were developed at AT&T Laboratories by Vapnik et al., and, owing to this industrial context, SV research has an orientation toward real-world applications [21, 25]. In this section, we briefly introduce support vector regression (SVR), which can be used for time series prediction models, where excellent performance has been obtained so far [8–10, 15, 16, 26].

Given training data $\{(x^1, y_1), \ldots, (x^l, y_l)\} \subset X \times \mathbb{R}$, where the $x^i$ are input vectors, $X \subseteq \mathbb{R}^n$ denotes the space of input patterns and the $y_i$ are the output values associated with the $x^i$, the goal in SVR is to determine a function $f(x)$ that has at most $\varepsilon$ deviation from the target values $y_i$ $(i = 1, \ldots, l)$ for all the training data and, at the same time, is as flat as possible. The SVR model [4] requires the solution of the following optimization problem:

$$\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*} \quad & \frac{1}{2} w^T w + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \\
\text{subject to} \quad & y_i - (w^T \phi(x^i) + b) \le \varepsilon + \xi_i, \\
& (w^T \phi(x^i) + b) - y_i \le \varepsilon + \xi_i^*, \\
& \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \ldots, l,
\end{aligned} \tag{1}$$

where the vector $x^i$ is mapped into a higher dimensional space by the function $\phi$. The quantity $\xi_i$ is an upper bound on the training error (and $\xi_i^*$ is a lower bound) subject to the $\varepsilon$-insensitive tube

$$\left| y - (w^T \phi(x) + b) \right| \le \varepsilon. \tag{2}$$

The parameters which control the regression quality are the cost of error $(C)$, the width of the tube $(\varepsilon)$ and the mapping function $(\phi)$. The positive constant $C$ determines the trade-off between the flatness of the function

$$f(x) = \sum_{i=1}^{l} w_i^T \phi(x^i) + b \tag{3}$$

and the amount up to which deviations larger than $\varepsilon$ are tolerated. If inequality (2) is not satisfied by $x^i$, there is an error $\xi_i$ or $\xi_i^*$ which the SVR model minimizes in the objective function. In fact, SVR avoids underfitting and overfitting of the training data by minimizing both the training error $C \sum_{i=1}^{l} (\xi_i + \xi_i^*)$ and the regularization term $\frac{1}{2} w^T w = \frac{1}{2} \|w\|^2$. For traditional least-squares regression, the quantity $\varepsilon$ is always zero, and data are not mapped into
Table 1  Examples of the principal kernel functions

Kernel function               Formulation
Linear                        $k(x, y) = x^T y$
Polynomial                    $k(x, y) = (x^T y + c)^d$
Radial basis function (RBF)   $k(x, y) = e^{-\gamma \|x - y\|^2}$

higher dimensional spaces. Hence, SVR is a more general and flexible tool for regression problems.

The use of the mapping function $\phi$ makes it possible to take into account the fact that normally the function $f(x)$ that best fits the training data is nonlinear. Hence, the function $\phi$ is a way to make the SV model nonlinear. The function $\phi: \mathbb{R}^n \to H$ maps $\mathbb{R}^n$ into a higher-dimensional Hilbert space $H$. In the dual formulation of problem (1), the kernel function is introduced:

$$k(x^i, x) = \phi(x^i)^T \phi(x).$$

Examples of kernel functions are shown in Table 1. In recent years, several methods have been proposed to combine multiple kernels, instead of the classical kernel-based algorithm using a single kernel [13]. These methods are known as Multiple Kernel Learning and Infinite Kernel Learning [12, 13, 20]. Nevertheless, in our study, we prefer to use a single kernel, as the literature suggests.

For the experiments in this paper, we use the open-source library LIBSVM [5], available for MATLAB. It provides two different versions of SVR: $\varepsilon$-SVR and $\nu$-SVR [4]. The version illustrated above is $\varepsilon$-SVR, while the $\nu$-SVR [4, 22, 23] problem is defined as follows:

$$\begin{aligned}
\min_{w,\,b,\,\varepsilon,\,\xi,\,\xi^*} \quad & \frac{1}{2} w^T w + C \left( \nu \varepsilon + \frac{1}{l} \sum_{i=1}^{l} (\xi_i + \xi_i^*) \right) \\
\text{subject to} \quad & y_i - (w^T \phi(x^i) + b) \le \varepsilon + \xi_i, \\
& (w^T \phi(x^i) + b) - y_i \le \varepsilon + \xi_i^*, \\
& \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \ldots, l, \quad \varepsilon \ge 0.
\end{aligned} \tag{4}$$

Since it is difficult to select an appropriate $\varepsilon$, Schölkopf et al. [23] introduced the new parameter $\nu$, which makes it possible to control the number of support vectors and training errors. To be more precise, $\nu$ is an upper bound on the fraction of margin errors, that is to say of training samples which are errors (badly predicted), and a lower bound on the fraction of support vectors. For regression, the parameter $\nu$ replaces $\varepsilon$, and in our situation it might be easier to use $\nu$-SVR [4]. The main motivation for the $\nu$ version of SVR is that it has a more meaningful interpretation than $\varepsilon$: $\varepsilon$ and $\nu$ are just different versions of the penalty parameter.

3 The photovoltaic energy production model

The aim of this paper was to predict the daily PV energy production. The work was carried out using historical data of an existing solar PV plant. These data were provided by the Loccioni Group (located in Angeli di Rosora, AN, Italy). A detailed study of the PV plant has been possible thanks to the availability of a large amount of recorded measurements, stored during the past years. Specifically, the data used are those related to the PV plant placed at their location in Angeli di Rosora, consisting of Solyndra panels, positioned on the roof of a building, with a nominal power of 112 kWp.

The collected measurements are the energy and power produced by the plant, the radiation (recorded directly by sensors placed on the roof, in the same position as the PV modules) and the external environmental temperature. As the PV modules have a particular cylindrical shape, module temperature has little effect on their performance and is not measured.

3.1 Data organization

In the previous section, we briefly described the SVR model. Now we need to prepare the datasets used to build this model. In order to train the SVR model, we have to choose a diversified dataset of input vectors, representative of the real situation. Each component of the training data is called a feature (or attribute). The first analysis consists in the selection of the necessary features of the input data. Scientific research has shown that PV production depends on many physical parameters: environmental temperature, solar irradiance, atmospheric pressure, wind speed, humidity and cloud coverage. However, many of these values are not available. For this reason, the model will consider only the external temperature and the solar irradiance as features in the training input data. Moreover, a preliminary study on the historical collected data has proved that the solar irradiance is the physical quantity that most affects the PV energy production. The scatterplot in Fig. 1 shows the correlation between the PV energy produced and the measured irradiance. The value of the Pearson correlation coefficient between the two measures is approximately 0.95, a high value suggesting a strong relationship between energy and irradiance. However, this strong relationship is not perfectly linear, in particular during the central hours of the day, and this is the reason why a model based on $\nu$-SVR with a nonlinear kernel, as explained in the following section, is created for the energy production forecast.
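The $\nu$-SVR setup just described can be sketched in Python with scikit-learn, whose NuSVR class wraps the same LIBSVM solver the authors call from MATLAB. The data below are synthetic stand-ins for the plant's irradiance/temperature/energy measurements (the functional form and all parameter values are illustrative assumptions, not the paper's settings); the sketch mainly illustrates the property that $\nu$ is a lower bound on the fraction of support vectors.

```python
# Minimal nu-SVR sketch with an RBF kernel (cf. Table 1).
# All data are synthetic; feature names mimic the paper's inputs.
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
n = 200

# Hypothetical features: solar irradiance (W/m^2) and air temperature (deg C)
irradiance = rng.uniform(0, 1000, n)
temperature = rng.uniform(5, 35, n)
X = np.column_stack([irradiance, temperature])

# Hypothetical target: energy roughly proportional to irradiance, mildly
# nonlinear (saturating), plus noise -- only mimics the behavior in the text
y = 0.1 * irradiance * (1 - 2e-4 * irradiance) + 0.05 * temperature \
    + rng.normal(0, 1, n)

# nu controls the trade-off described above: it upper-bounds the fraction of
# margin errors and lower-bounds the fraction of support vectors
model = NuSVR(nu=0.5, C=100.0, kernel="rbf", gamma="scale")
model.fit(X, y)

frac_sv = len(model.support_) / n
print(f"fraction of support vectors: {frac_sv:.2f}")  # expected >= nu
```

In practice the authors train on historical quarter-hour measurements and feed the fitted model the forecasted irradiance and temperature for the next day; the snippet above only shows the model object and its $\nu$ semantics.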
steps of 15 min. Figure 2 plots the real and forecasted energy production values for a sunny day in June 2012.

For the purpose of evaluating the quality of the forecasting, we examine the prediction accuracy by calculating three different evaluation statistics: the root mean square error (RMSE), the mean absolute percentage error (MAPE) and the coefficient of determination ($R^2$).

The RMSE, as in Eq. (5),

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \tag{5}$$

denotes the mean square discrepancy between the values of the observed data $(y_i)$ and the estimated data $(\hat{y}_i)$. The closer this value is to 0, the more accurate the model.

The MAPE, defined in Eq. (6),

$$\mathrm{MAPE} = \frac{100\,\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \tag{6}$$

is another measure of accuracy, to be interpreted as follows: in case of a perfect fit its value is 0, while there is no restriction on its upper bound.

Finally, the coefficient $R^2$, as in Eq. (7), where $\bar{y}$ indicates the mean of the actual data,

$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{7}$$

measures the proportion of the variability in the data explained by the model; $R^2$ varies in $[0, 1]$: it is 0 when the model does not explain the data at all, and 1 when the model explains the data perfectly.

For the first result, we obtain good values for the three measures of accuracy illustrated above:
– RMSE = 0.5275;
– MAPE = 0.0785;
– $R^2$ = 0.9944.

The quality of the prediction is confirmed by the scatterplot of the correlation between the measured energy production and forecasted values, as illustrated in Fig. 3.

Fig. 3  The scatterplot of the correlation between real and forecasted energy production. The red line is the regression line (color figure online)
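The three accuracy measures of Eqs. (5)–(7) can be computed directly; the NumPy sketch below uses illustrative values, not the paper's data. Note that the $R^2$ of Eq. (7) is the ratio of explained to total variability, which in general differs from the $1 - SS_{res}/SS_{tot}$ definition used by many libraries (the two coincide for ordinary least squares).

```python
import numpy as np

def rmse(y, yhat):
    # Eq. (5): root mean square error
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return np.sqrt(np.mean((y - yhat) ** 2))

def mape(y, yhat):
    # Eq. (6): mean absolute percentage error, in percent
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return 100.0 / len(y) * np.sum(np.abs((y - yhat) / y))

def r2(y, yhat):
    # Eq. (7): explained variability over total variability
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    ybar = y.mean()
    return np.sum((yhat - ybar) ** 2) / np.sum((y - ybar) ** 2)

# Illustrative observed and predicted values (not the paper's data)
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(y_true, y_pred), mape(y_true, y_pred), r2(y_true, y_pred))
# approx. 0.1581, 6.6667 %, 0.9
```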
Figure 4 plots the real and forecasted PV energy production values based on the ν-SVR model for a cloudy day in April 2012. The comparison between actual and predicted data shows an almost full agreement between the two series. The accuracy measures confirm this result: RMSE = 0.2810, MAPE = 0.1143 and R² = 0.9926.

Things are different when the input values of solar irradiance and environmental temperature for the next day are not fully accurate. Figure 5 shows this situation for a day in October 2013, where the forecasted irradiance provided by a weather site is far from reality, as pointed out in Fig. 6. The error made in the irradiance prediction is crucial for the forecast of the solar energy production. If the irradiance data used for the prediction were the measured ones, in fact, the production forecast would have good results: RMSE = 0.9955, MAPE = 0.3585 and R² = 0.9508. Instead, these measures get significantly worse if the solar irradiance and external temperature provided by the weather site are not accurate. In this case, the values of the three previous measures of goodness are, respectively, RMSE = 3.5876, MAPE = 3.5718 and R² = 0.3616. Also, the scatterplot in Fig. 7 shows that the predicted values are far from the regression line. As observed before about the influence of irradiance on the PV energy production, the forecast follows the error in the solar irradiance prediction. A correct irradiance forecast is decisive for the energy prediction by the ν-SVR model. In fact, previous analyses show that the correlation between the two variables (irradiance and energy production) exceeds 95 %.
20. Özögür-Akyüz S, Ünay D, Smola AJ (2011) Guest editorial: model selection and optimization in machine learning. Mach Learn 85(1–2):1–2
21. Pellegrini M, De Leone R, Maponi P (2013) Reducing power consumption in hydrometric level sensor network using support vector machines. In: Proceedings of the PECCS 2013 international conference on pervasive and embedded computing and communication systems, pp 229–232
22. Schölkopf B, Bartlett P, Smola A, Williamson RC (1999) Shrinking the tube: a new support vector regression algorithm. In: Kearns MS, Solla SA, Cohn DA (eds) Advances in neural information processing systems 11. MIT Press, Cambridge, pp 330–336
23. Schölkopf B, Smola AJ, Williamson RC, Bartlett PL (2000) New support vector algorithms. Neural Comput 12(5):1207–1245. doi:10.1162/089976600300015565
24. Simonov M, Mussetta M, Grimaccia F, Leva S, Zich R (2012) Artificial intelligence forecast of PV plant production for integration in smart energy systems. Int Rev Electr Eng 7:3454–3460
25. Smola AJ, Schölkopf B (2004) A tutorial on support vector regression. Stat Comput 14(3):199–222. doi:10.1023/B:STCO.0000035301.49549.88
26. Stitson M, Gammerman A, Vapnik V, Vovk V, Watkins C, Weston J (1997) Support vector regression with ANOVA decomposition kernels. http://citeseer.nj.nec.com/stitson97support.html
27. Vapnik V (1982) Estimation of dependences based on empirical data. Springer Series in Statistics, Springer, New York
28. Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
29. Vapnik V (1998) Statistical learning theory. Wiley, New York
30. Vapnik V, Chervonenkis A (1964) A note on one class of perceptrons. Automation and Remote Control 25