Article in Renewable and Sustainable Energy Reviews, June 2017. DOI: 10.1016/j.rser.2017.05.212

Review on probabilistic forecasting of photovoltaic power production and
electricity consumption

D.W. van der Meera,∗, J. Widéna , J. Munkhammara


a Built Environment Energy Systems Group (BEESG), Division of Solid State Physics, Department of Engineering Sciences,
Uppsala University, P.O. Box 534, SE-751 21 Uppsala, Sweden

Abstract
Accurate forecasting becomes simultaneously more important and more challenging due to the increasing
penetration of photovoltaic (PV) systems in the built environment on the one hand, and the increasingly
stochastic nature of electricity consumption, e.g., through electric vehicles (EVs), on the other. Until
recently, research has mainly focused on deterministic forecasting. However, such forecasts convey little
information about the possible future state of a system, and since a forecast is inherently erroneous, it
is important to quantify this error. This paper therefore focuses on recent advances in the area of
probabilistic solar power forecasting (PSPF) and probabilistic load forecasting (PLF). The goal of a probabilistic
forecast is to provide either a complete predictive density of the future state or a prediction that the future
state of a system will fall within an interval, defined by a confidence level. The aim of this paper is to analyze the
state of the art and assess the different approaches in terms of their performance, but also in terms of how well
these approaches generalize, i.e., whether they not only perform well on the data set for which they were
designed, but also on other data sets or different case studies. In addition, growing interest in net demand
forecasting, i.e., demand less generation, is another important motivation to combine PSPF and PLF into
one review paper and assess their compatibility. One important finding is that there is no single preferred model
that can be applied to every circumstance. In fact, one study has shown that the same model, with adapted
parameters, applied to different case studies performed well but did not excel when compared to models
that were optimized for the specific task. Furthermore, there is a need for standardization, in particular in
terms of filtering night-time data, normalizing results, and performance metrics.
Keywords: Probabilistic forecasting; electricity consumption; photovoltaic; solar radiation; irradiance;
prediction interval

Contents

1 Introduction

2 Basic definitions and comparison models
  2.1 Spatial and temporal horizon and resolution
  2.2 Deterministic forecast vs. probabilistic forecast
  2.3 Clear sky models, and clear sky and clearness indices
  2.4 Benchmark methods
    2.4.1 Benchmarks for deterministic forecast
    2.4.2 Benchmarks for probabilistic forecast
  2.5 Performance metrics
    2.5.1 Deterministic forecasting
    2.5.2 Probabilistic forecasting

3 Forecasting techniques
  3.1 Statistical approach
    3.1.1 Parametric
    3.1.2 Nonparametric
  3.2 Physical approach
    3.2.1 Parametric
    3.2.2 Nonparametric
  3.3 Hybrid approach

4 Review sorted on temporal horizon
  4.1 Intra-hour
  4.2 Intra-day
  4.3 Day-ahead
  4.4 Comparison between PSPF and PLF

5 Discussion

6 Conclusion

Preprint submitted to Renewable & Sustainable Energy Reviews, May 24, 2017

1. Introduction

In recent months, global temperature records have consistently been broken. June 2016 was the hottest
June since measurements began in 1880, while also being the 14th consecutive month in which the monthly
global temperature record was broken [1]. According to NASA, July 2016 was the hottest month in recorded
history [2]. To mitigate the disastrous consequences of anthropogenic global warming, 196 countries signed
a document known as the Paris Agreement, in which they committed themselves to limiting global warming
to well below 2 °C while pursuing efforts to limit the increase to 1.5 °C, implying that emissions are to be
reduced to 40 gigatonnes CO2 equivalent in 2030 [3].
In order to achieve this reduction a transition towards renewable energy sources (RESs), such as solar
and wind power, is imperative. Particularly solar power generated through photovoltaics (PV) has seen
tremendous growth over the last decade, with a total of 227.1 GW installed at the end of 2015 [4]. This
technology offers several advantages, such as low cost and easy installation. Specifically, in 2014 the cost of
crystalline silicon (c-Si) modules had been reduced by roughly 83% since 2000 [5]. However, solar irradiance
received on Earth is variable due to seasonal and diurnal variations and various meteorological processes
that occur in the atmosphere, in particular the stochastic movement of clouds. While the former variations
are deterministic, the latter physical processes are complex to model in high resolution, both spatial and
temporal, and come with substantial computational cost [6].
Nevertheless, if worldwide installation of PV power capacity continues along the current trend, this
technology will produce a significant share of the total generated electricity in the near future. Examples
of this can already be seen in Germany, where 7.5% of the annual net electricity consumption was covered
by PV in 2015, which can increase to 50% on sunny weekend days [7]. Moreover, RESs such as solar power
have dispatch priority due to legislation. Other examples are the Spanish and Italian governments, which
have implemented legislation that penalizes poor and rewards correct predictions of solar power production
in the day-ahead market [8, 9]. Furthermore, many studies have been performed that investigated problems
that may arise with increasing PV integration, such as voltage fluctuations, feeder loading, grid losses and
increased short-circuit current [10, 11]. Additionally, after a literature review, Ropp et al. [12] concluded
that distributed generation (DG) penetration levels of 40–50% can cause significant problems in terms
of abnormal voltage fluctuations due to variability in cloud cover, islanding and increases in short-circuit
current.

∗ Corresponding author. Tel.: +46 18 471 31 33
Email address: dennis.vandermeer@angstrom.uu.se (D.W. van der Meer)

Therefore, in order to reduce both the number of conventional power plants on standby and the
costs incurred due to uncertainties and erroneous forecasts, accurate PV power generation or solar irradiance
prediction becomes increasingly important for the proper integration of RESs into the electricity generation
mix. Although a substantial amount of research exists on this topic, it is still regarded as relatively immature
[13].
Generally speaking, a distinction is being made between solar irradiance forecasting and PV power
forecasting. The result of the former can be used in a physical model that describes the PV system and its
specifications, whereas the latter can be directly used by e.g., utilities to plan their operations accordingly.
As solar irradiance and PV system output naturally show a strong correlation [14], the approaches for
statistical forecasting are similar in this respect. In contrast, in case of physical forecasting, e.g., numerical
weather prediction (NWP), one forecasts the state of the atmosphere and therefore a physical model of
the PV system is required as well. Extensive review papers regarding solar irradiance forecasting and PV
power forecasting have been published in recent years, such as [14–23], which, in combination with the
book by Kleissl [24] provide a clear overview about what has been done, which methodologies exist and the
mathematical aspects of these methodologies.
On the other hand, consumption of electricity is known to be well predictable and is seen as a mature
domain [13], with large and medium utilities achieving forecast errors of 3% or lower in the day-ahead market
[25]. However, with the emergence of smart grids, new problems arise, especially related to demand-response
(DR) and demand-side management (DSM). These technologies allow consumers to shift their electricity
consumption based on e.g., price incentives where the purchase tariff is reduced during off-peak hours, and
consequently, this poses new challenges for grid operators. Hong & Fan identified that in such cases, utilities
will be required to perform one backcast and two forecasts since one needs to discern the normal consumption
pattern from the one influenced by the price incentive [26]. Recently, the term dynamic demand response,
or D²R, was coined by Aman et al., which implies that through the emergence of smart meter data, demand
response will become more dynamic, which in turn will pose additional challenges in terms of forecasting,
since residential load at high temporal resolution tends to be more volatile [27].
In terms of load forecasting, a general division is made between very short-term load forecasting (VSTLF),
short-term load forecasting (STLF), medium-term load forecasting (MTLF) and long-term load forecasting
(LTLF), in which the forecast horizons are one day, two weeks, three years and thirty years, respectively
[28]. However, a more coarse distinction can be achieved by dividing load forecasting into STLF and LTLF,
where the former has a forecast horizon of up to two weeks [26]. It should be noted that this is by no means
standardized, as the authors of [29] divide load forecasting into three categories, namely: STLF, MTLF
and LTLF with forecast windows of one hour to one week, one week to a year and longer than a year. It
should also be noted that LTLF is generally considered of interest for policymakers and transmission system
operators (TSOs), since building the necessary infrastructure depends on many socio-economic variables.
As this particular form of forecasting is out of the scope of this review paper, this topic is excluded. Many
review papers have been dedicated to the topic of load forecasting, such as [25, 26, 29–31]. Additionally,
since 40% of total energy consumption in the European Union (EU) is due to buildings [32], several review
papers focus on the prediction of energy consumption in buildings [33–36].
Several of the aforementioned review papers offer a wide scope, see e.g., [15] regarding the theory behind
statistical and physical methods available to forecast solar irradiance or PV power, or [21] for a thorough
overview of results achieved in recent studies. With respect to load forecasting, Hong & Fan [26] discussed
17 load forecasting review papers, after which they focused on probabilistic load forecasting (PLF), which,
according to them, had not been done before. Nevertheless, the majority of the aforementioned review
papers focused on a particular aspect of forecasting or a certain methodology. For example, Ren et al.
[19] focused in particular on ensemble forecasting for PV power and solar irradiance, whereas Raza et al.
[14] covered the well-known models regarding PV power forecasting but paid special attention to artificial
intelligence (AI) and Diagne et al. [16] and Wan et al. [20] provided an in-depth discussion regarding general
forecasting techniques. Furthermore, Singh et al. [29], Takiyar & Singh [30] and Raza et al. [31] reviewed
load forecasting techniques with a focus on statistical models, whereas Hong [25] gave a broad overview of
the history of energy forecasting. More recent review papers, such as those by Voyant et al. [23] and Yildiz et
al. [36], uncovered a trend that although artificial neural networks (ANNs) are still among the most utilized
models for both irradiance and load forecasting, other methods such as gradient boosting, random forests
and support vector machines are getting increasing attention in the scientific community. Additionally,
Wang et al. [22], evaluated and compared three different types of ANNs in terms of their performance for
predicting the daily global irradiation in China. They found that accuracy strongly depended on both the
model and location, sometimes yielding significant different results, but that overall multi-layer perceptron
(MLP) and radial basis neural network (RBNN) outperformed the remaining ANN and a physical model.
An important note is that with increasing PV penetration and new technologies that allow consumers
to actively participate in the electricity market, it becomes important for utilities to be able to forecast net
demand, i.e., demand less residential generation [26]. This is particularly challenging in urban or residential
areas where on the one hand specifics regarding the PV systems, e.g. tilt, azimuth and additional hardware,
are often not known [37], while on the other hand stochastic human behavior poses challenges to accurately
model and predict electricity consumption [38–40]. The most recent review paper that considered both
generation and load forecasting was published in 2014 by Lazos et al. [41], and focused on commercial
buildings. There thus appears to be a need for a literature review that summarizes past research on net
demand forecasting. However, a brief survey showed that the body of research in this particular area is still
limited. Therefore, this paper reviews recent advances separately for probabilistic
forecasting of solar power (PSPF) and load (PLF) on different temporal horizons, to map the opportunities
for combining PSPF and PLF for net demand forecasting. First, an overview is given regarding the state-
of-the-art solar irradiance and PV power forecasting, followed by a similar overview of the state-of-the-art
load forecasting with special focus on urban and residential electricity consumption, although that research
area is similarly scarce [42]. In both cases the focus lies on probabilistic forecasting, since that conveys much
more information regarding future generation and consumption than a deterministic forecast. In this way,
stakeholders can manage risks more thoroughly and improve efficiency of the system. It should be noted
that this review includes all papers that we were able to find until 16 May 2017.
The objectives of this review paper can therefore be summarized as follows: Firstly, we aim to provide a
broad overview of performance metrics, methods and recent advances in PSPF and PLF. Secondly, we focus
on identifying research gaps. Thirdly, we aim to find common ground between PSPF and PLF, in any
form that may take, which could open up the way towards net demand forecasting. Finally, we assess the
necessity for standardization for, e.g., the aforementioned performance metrics.
This paper is organized as follows: Section 2 introduces basic definitions, benchmark models and perfor-
mance metrics for both deterministic and probabilistic forecasts. Although this paper focuses on probabilistic
forecasting, deterministic forecasting still plays an important role due to the fact that these methods can
be dressed into probabilistic forecasts, and because the mean of a probabilistic forecast can be interpreted
as the deterministic forecast. Therefore, basic definitions and performance metrics for both are elaborated
upon in this review. Then, Section 3 provides an overview of forecasting techniques that have been found
to be commonly applied in both PSPF and PLF. In this section, the nonparametric statistical approach
dominates. This is because the parametric approach tends to be utilized to dress deterministic forecasts into
probabilistic ones, and the deterministic models are out of the scope of this paper. In addition, the non-
parametric statistical approach can be utilized to forecast directly, but also to create a probabilistic forecast
from a deterministic physical model such as NWP. Section 4 summarizes the state of the art, sorted on the
temporal horizon on which the studies have been performed and compares PSPF and PLF, among others,
in terms of methods used, forecast horizons, and spatial and temporal resolution. Finally, a discussion and
conclusions are presented in Sections 5 and 6.

2. Basic definitions and comparison models

This section presents basic definitions regarding spatial and temporal horizon and resolution, elabo-
rates on the difference between deterministic and probabilistic forecasts, and provides an overview of the
comparison models and performance metrics for both deterministic and probabilistic forecasts.

Figure 1: Spatial and temporal resolution of statistical and physical methods, inspired by [16, 18].

2.1. Spatial and temporal horizon and resolution


The accuracy of any forecast is greatly influenced by the spatial and temporal horizon and resolution.
In fact, the choice of a suitable method depends mainly on these four parameters, in combination with
the data that is available. Figure 1 illustrates the spatial and temporal resolution of statistical and physical
methods, inspired by [16, 18]. From this figure we can infer several interesting facts.
First, there exists more variation in approaches for solar power forecasting (SPF) than for load forecasting.
The reason for this is that physical approaches are rarely used in case of load forecasting, and a distinction
is usually made between time-series and ANNs [26]. However, it should be noted that this does not imply
that physical variables such as temperature are not taken into account, as these variables have a significant
impact on electricity consumption.
Second, although several physical approaches exist for SPF, few of them are used for probabilistic SPF
(PSPF), as we will show in this review paper. For example, a NWP forecast is run with certain initial and
boundary conditions. In order to create a PSPF, one can adjust these conditions and construct a probability
density function (PDF) from the subsequent forecasts. As NWP models are computationally demanding,
this is a time consuming process.
Third, as noted before, PLF is usually achieved through time-series or ANNs. As can be seen from
the figure, these methods can be utilized on a wider variety of spatial resolutions, depending on the level
of aggregation. The reason for this is that electricity consumption on an aggregated level is less variable
than solar irradiance and has a strong repetitive character that is less stochastic and consequently, complex
physical modeling can be avoided.
Fourth, due to the complex nature of the NWP models and limited computing power, their resolution in
both space and time is coarse, and they are therefore utilized for forecasts with a larger time horizon.
Finally, there is a discrepancy between the definition with respect to forecast horizon for short-term solar
power and load forecasting. The horizon for short-term SPF is usually intra-day and day-ahead, whereas
STLF is on a horizon of one day to two weeks. This review will focus on short-term PSPF and very
short-term to short-term PLF, as defined in Section 1. Section 3 will provide an elaborate overview of the
forecasting methods presented in Figure 1.

2.2. Deterministic forecast vs. probabilistic forecast
Previous research has mainly focused on deterministic or point forecasting. Although it is not clear why
this is the case, Hong & Fan [26] suggested that the reason might be the fact that probabilistic forecasts were
assessed with the same performance metrics as those used to assess deterministic forecasts, and subsequently
performed worse than their deterministic counterpart. From Section 2.5, which presents the most common
performance metrics, it can be concluded that assessing probabilistic forecasts with metrics formulated for
point forecasts could lead to invalid conclusions.
Nevertheless, providing a utility with a PDF or a prediction interval, i.e., an interval in which the random
variable is predicted to be measured with a specific probability, of future production and demand is arguably
more valuable than a single value, since it allows for risk management. It should be noted that a prediction
interval and confidence interval are not the same, but unfortunately are sometimes used as if they are
interchangeable. Whereas a prediction interval is concerned with a random variable, a confidence interval
is associated with an unknown parameter and the interval is computed from the data. To construct a PDF
in probabilistic forecasting, there are generally two approaches. First, one can assume a density function,
which is the parametric approach. Second is the nonparametric approach in which no such assumption is
made. However, an assumed distribution is rarely representative of the observations and is usually invalid, or
at least sub-optimal [43, 44]. This is also the reason that the majority of the papers reviewed here consider
the nonparametric approach, which is elaborated upon in Section 4.
Finally, it is interesting to note the level of maturity for (P)SPF and PLF. Hong et al. [13] presented
a graphical overview of this and concluded that (P)SPF are both still immature, while PLF has recently
reached maturity in the scientific community.
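As a minimal illustration of the nonparametric approach, a deterministic forecast can be dressed into a prediction interval using empirical quantiles of its past errors. The function name, the toy data and the crude quantile rule below are illustrative assumptions, not taken from the reviewed literature:

```python
def empirical_interval(point_forecast, past_errors, alpha=0.1):
    """Dress a deterministic forecast into a (1 - alpha) prediction
    interval using empirical quantiles of past forecast errors
    (a simple nonparametric approach; illustrative sketch only)."""
    errors = sorted(past_errors)
    n = len(errors)
    # crude empirical quantile rule for illustration
    lo = errors[int(alpha / 2 * (n - 1))]
    hi = errors[int((1 - alpha / 2) * (n - 1))]
    return point_forecast + lo, point_forecast + hi

# toy past errors, roughly symmetric around zero
errors = [-3, -2, -1, -0.5, 0, 0.5, 1, 2, 3, 4]
low, high = empirical_interval(10.0, errors, alpha=0.2)  # 80% interval
```

Because the interval is built directly from observed errors, no distributional assumption is needed, which is precisely the appeal of the nonparametric approach discussed above.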

2.3. Clear sky models, and clear sky and clearness indices
Clear sky models aim to model the direct normal irradiance (DNI) and diffuse horizontal irradiance
(DHI) under clear sky conditions, while taking into account solar geometry and meteorological inputs such
as water vapor content. The interested reader is referred to a comprehensive study by Ineichen [45], who
concluded that no single model performs best under all circumstances, but that performance mainly depends
on input parameters.
Clear sky irradiance is often used to normalize solar irradiance, so as to stationarize the time series.
The normalization is achieved by either the clear sky irradiance, as modeled by a clear sky model, or the
extraterrestrial irradiance. The result of the former is called the clear sky index and is formulated as follows:

$$ k_t = \frac{I_t}{I_t^{\mathrm{clr}}}, \qquad (2.1) $$

where $I_t$ is the irradiance and $I_t^{\mathrm{clr}}$ is the clear sky irradiance. Subsequently, $k_t$ is often used in the persistence
forecast, as defined in eq. (2.4).
Normalization by the extraterrestrial irradiance is defined as the clearness index $K_t$ as follows:

$$ K_t = \frac{I_t}{I_t^{\mathrm{extr}}}, \qquad (2.2) $$

where $I_t^{\mathrm{extr}}$ is the extraterrestrial irradiance. Naturally, extraterrestrial irradiance is more straightforward
to model than clear sky irradiance due to the absence of the atmosphere. A simple extraterrestrial irradiance
model mentioned by Inman et al. [15] is formulated as:

$$ I_t^{\mathrm{extr}} = I_0 \cos(\theta_t), \qquad (2.3) $$

where the solar constant $I_0 = 1360\ \mathrm{W/m^2}$ and $\theta_t$ is the solar zenith angle at time $t$. Finally, it is interesting
to note that the clear sky index is always greater than the clearness index, since the clear sky irradiance is
always lower than the extraterrestrial irradiance [15].
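The indices of eqs. (2.1)-(2.3) can be sketched in a few lines of code; the toy irradiance values and zenith angle are assumptions for illustration:

```python
import math

I0 = 1360.0  # solar constant, W/m^2 (value used in the text)

def extraterrestrial(theta_deg):
    """Extraterrestrial irradiance on a horizontal plane, eq. (2.3)."""
    return I0 * math.cos(math.radians(theta_deg))

def clear_sky_index(I, I_clr):
    """Clear sky index k_t, eq. (2.1)."""
    return I / I_clr

def clearness_index(I, theta_deg):
    """Clearness index K_t, eq. (2.2)."""
    return I / extraterrestrial(theta_deg)

# toy values: measured irradiance 500 W/m^2, modeled clear sky
# irradiance 800 W/m^2, solar zenith angle 30 degrees
kt = clear_sky_index(500.0, 800.0)
Kt = clearness_index(500.0, 30.0)
# k_t exceeds K_t, since clear sky irradiance < extraterrestrial irradiance
```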
2.4. Benchmark methods
In order to quantify the performance of a proposed forecast method, it is common practice to compare it
to a benchmark method to evaluate the relative improvement. In general, typical conditions for a benchmark
are that it is not computationally demanding and nonparametric in case of probabilistic forecasts [46]. For
a deterministic forecast, a common benchmark method is the persistence model, which is elaborated upon
in Section 2.5.1. Regarding probabilistic forecasting, this literature review shows that a single benchmark
method is not yet being utilized, although several potential ones have been introduced in the literature,
which is discussed in Section 2.4.2.

2.4.1. Benchmarks for deterministic forecast


Solar. The persistence forecast, also known as the naive predictor, supposes that conditions at time t will
persist into time period t + h. This method is very effective on a forecast horizon in the order of seconds
or even minutes, but its effectiveness greatly diminishes when the forecast horizon increases to one hour
and beyond [16]. Furthermore, it should be noted that the forecast skill score S (eqs. (2.36) and (2.37)) is
often used to quantify the relative improvement of the proposed model compared to the persistence method.
A skill score S of zero implies that the uncertainty is as great as the variability, which is the case in the
persistence forecast. Therefore, if the proposed model has a skill score S greater than zero, it can be
considered valid, whereas a skill score S ≤ 0 implies that the proposed model should be rejected, because
the uncertainty is greater than the variability [6]. The persistence method can be formulated as:

$$ \hat{x}_{t+h} = x_t, \qquad (2.4) $$

where $\hat{x}_{t+h}$ is the forecast with horizon $h$ and $x_t$ is the measured value at time $t$ of the time series of interest,
e.g., the clear sky index.
Although persistence is the most common benchmark method, another similar method has recently been
proposed, named smart persistence [47]. Rather than letting forecast x̂t+h with forecast horizon h depend
solely on the current observation, smart persistence takes the mean of previous h observations into account.
This can be formulated as

$$ \hat{x}_{t+h} = \mathrm{mean}\,[x_t, \ldots, x_{t-h}]. \qquad (2.5) $$

Finally, the climatology model is introduced here, which is a benchmark adopted from meteorology. In
contrast to the persistence model and its application to short lead times, i.e., forecast horizon, the climatology
model is generally difficult to outperform on longer forecast horizons, as it takes into account the mean of
all measurements [46]. It can be formulated as follows:

$$ \hat{x}_{t+h} = \mathrm{mean}\,(x_1, \ldots, x_t). \qquad (2.6) $$

It is important to note that although the climatology model can be applied to point forecasts, it is uncommon.
However, its use as benchmark model in probabilistic forecasts is frequent due to the possibility to construct
a model-free distribution based on the distribution of past observations, as we will show in the next section.
It is worth mentioning that other, well-established methods such as the autoregressive integrated moving
average (ARIMA) are sometimes used as benchmark as well, which will be explained in more detail in the
next section.
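The three deterministic benchmarks above can be sketched as follows; note that eq. (2.5) averages the h + 1 observations from x_t back to x_{t−h}, and the series values below are illustrative:

```python
def persistence(series, h):
    """Persistence forecast, eq. (2.4): the last observation persists
    over the forecast horizon h."""
    return series[-1]

def smart_persistence(series, h):
    """Smart persistence, eq. (2.5): mean of x_t down to x_{t-h}."""
    window = series[-h - 1:]
    return sum(window) / len(window)

def climatology(series):
    """Climatology, eq. (2.6): mean of all past observations."""
    return sum(series) / len(series)

x = [0.8, 0.9, 0.7, 0.6, 0.65]   # e.g., a clear sky index series
p = persistence(x, h=1)
sp = smart_persistence(x, h=2)   # mean of [0.7, 0.6, 0.65]
c = climatology(x)               # mean of the whole series
```

Persistence reacts instantly but ignores history, whereas climatology ignores the present; this is why the former suits very short horizons and the latter long ones, as noted above.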

Load. Regarding load forecasting, a single prevalent benchmark is not utilized but rather a wide variety
exists. This paragraph presents several of the benchmarks that have been used in the papers reviewed here,
of which the simplest one is the persistence method, i.e., naive predictor, as was the case with SPF.
Another benchmark that is commonly used is double seasonal Holt-Winters-Taylor (HWT) exponential
smoothing (ES). The advantage of this method is that it is able to incorporate two seasonal patterns e.g.,
intra-day and intra-week cycles. It is mathematically formulated as follows [48]:
$$ \text{Level:}\quad S_t = \alpha \frac{x_t}{D_{t-s_1} W_{t-s_2}} + (1 - \alpha)(S_{t-1} + T_{t-1}) \qquad (2.7) $$
$$ \text{Trend:}\quad T_t = \gamma (S_t - S_{t-1}) + (1 - \gamma) T_{t-1} \qquad (2.8) $$
$$ \text{Seasonality 1:}\quad D_t = \delta \frac{x_t}{S_t W_{t-s_2}} + (1 - \delta) D_{t-s_1} \qquad (2.9) $$
$$ \text{Seasonality 2:}\quad W_t = \omega \frac{x_t}{S_t D_{t-s_1}} + (1 - \omega) W_{t-s_2} \qquad (2.10) $$
$$ \text{Forecast:}\quad \hat{x}_{t+k} = (S_t + k T_t) D_{t-s_1+k} W_{t-s_2+k}, \qquad (2.11) $$

where $\alpha$, $\gamma$, $\delta$ and $\omega$ are the smoothing parameters and $\hat{x}_{t+k}$ is the forecast with horizon $k$. Furthermore,
the seasonal indices $s_1$ and $s_2$ determine the seasonal cycles under investigation, e.g., $s_1 = 24$ and $s_2 = 168$
for the aforementioned intra-day and intra-week cycles. The interested reader is referred to [48] for more
information.
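A minimal sketch of the HWT recursions, eqs. (2.7)-(2.11), might look as follows; the function names, toy initialization and parameter values are assumptions for illustration:

```python
def hwt_step(x_t, S_prev, T_prev, D_hist, W_hist, alpha, gamma, delta, omega):
    """One update of the double seasonal HWT smoother, eqs. (2.7)-(2.10).
    D_hist and W_hist hold the last s1 and s2 seasonal indices, oldest
    first, so D_hist[0] is D_{t-s1} and W_hist[0] is W_{t-s2}."""
    D_s1, W_s2 = D_hist[0], W_hist[0]
    S = alpha * x_t / (D_s1 * W_s2) + (1 - alpha) * (S_prev + T_prev)
    T = gamma * (S - S_prev) + (1 - gamma) * T_prev
    D = delta * x_t / (S * W_s2) + (1 - delta) * D_s1
    W = omega * x_t / (S * D_s1) + (1 - omega) * W_s2
    return S, T, D_hist[1:] + [D], W_hist[1:] + [W]

def hwt_forecast(S, T, D_hist, W_hist, k):
    """k-step-ahead forecast, eq. (2.11); assumes k <= s1 and k <= s2."""
    return (S + k * T) * D_hist[k - 1] * W_hist[k - 1]

# toy run with s1 = 2, s2 = 4: a constant series with neutral seasonal
# indices keeps level 1, zero trend, and forecasts 1
S, T, D, W = 1.0, 0.0, [1.0, 1.0], [1.0, 1.0, 1.0, 1.0]
for _ in range(3):
    S, T, D, W = hwt_step(1.0, S, T, D, W, 0.2, 0.1, 0.3, 0.3)
fcst = hwt_forecast(S, T, D, W, k=1)
```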
Vanilla is a benchmark method proposed by Hong [28] in 2010, which takes the recency effect into
account, a term adopted from psychology that implies that people tend to remember things they read last
best. This implies that the dependence of current load demand on past temperature measurements decreases
when looking further back in time. An appropriate mathematical representation of this is the moving average
(MA) of the temperature. The Vanilla benchmark supposes that future electricity demand depends solely
on calendar variables and temperature, and also includes the interaction between for example hour and week
or month and temperature. It is formulated as follows [49]:

$$ \hat{x}_t = \beta_0 + \beta_1 M_t + \beta_2 W_t + \beta_3 H_t + \beta_4 W_t H_t + f(T_t), \qquad (2.12) $$

where $\beta_i$ are the parameters, $M_t$, $W_t$ and $H_t$ are the monthly, weekly and hourly calendar variables at time
$t$, $T_t$ is the temperature at time $t$ and $f(T_t)$ is defined as

$$ f(T_t) = \beta_5 T_t + \beta_6 T_t^2 + \beta_7 T_t^3 + \beta_8 T_t M_t + \beta_9 T_t^2 M_t + \beta_{10} T_t^3 M_t + \beta_{11} T_t H_t + \beta_{12} T_t^2 H_t + \beta_{13} T_t^3 H_t. \qquad (2.13) $$

The interested reader is referred to [28] for more information.
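The regressors of eqs. (2.12)-(2.13) can be sketched as a feature vector. Note that in Hong's original formulation the calendar variables are categorical (dummy-encoded) class variables; they are simplified to numeric variables here for brevity, and the function names are illustrative:

```python
def vanilla_features(month, weekday, hour, temp):
    """Regressors of the Vanilla benchmark, eqs. (2.12)-(2.13).
    Calendar variables are kept numeric here for brevity; Hong's
    original model dummy-encodes them."""
    T, T2, T3 = temp, temp ** 2, temp ** 3
    return [1.0,                      # intercept (beta_0)
            month, weekday, hour,     # M_t, W_t, H_t
            weekday * hour,           # W_t * H_t interaction
            T, T2, T3,                # polynomial temperature terms
            T * month, T2 * month, T3 * month,  # temperature-month terms
            T * hour, T2 * hour, T3 * hour]     # temperature-hour terms

def vanilla_predict(beta, month, weekday, hour, temp):
    """x_hat_t = beta . features, given fitted coefficients beta,
    typically estimated by ordinary least squares."""
    feats = vanilla_features(month, weekday, hour, temp)
    return sum(b * f for b, f in zip(beta, feats))
```

The moving-average temperature terms that capture the recency effect would simply be appended as additional regressors of the same polynomial form.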


Finally, the ARIMA method is often applied as a benchmark, as it is a well-established method. It was
proposed in 1970 by G. Box and G. Jenkins [50], and has been applied to many research areas. An important
condition is that the time series under investigation is stationary, i.e., has constant mean and variance, and
several methods exist to stationarize the time series at hand e.g., differencing, transforming to a logarithmic
scale, or normalizing. The ARIMA(p,d,q) model can be formulated in polynomial form as [51]:

$$ \phi(b)(1 - b)^d x_t = \theta(b) \varepsilon_t, \qquad (2.14) $$

where $p$ and $q$ represent the orders of the autoregressive (AR) and MA processes, $\phi(b)$ and $\theta(b)$ are the
AR and MA polynomials, respectively, and $\varepsilon_t$ is a white noise term. Furthermore, $d$ represents the $d$th difference
and $b$ the backward shift operator necessary to write the ARIMA process in polynomial form, defined as:

$$ x_t - x_{t-1} = (1 - b) x_t \qquad (2.15) $$

for the first order difference.
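The differencing operator of eq. (2.15) and a minimal least-squares stand-in for the AR part can be sketched as follows; the series is illustrative, and a full ARIMA fit would of course use a dedicated library:

```python
def difference(x, d=1):
    """Apply the backward shift d times: (1 - b)^d x_t, eq. (2.15)."""
    for _ in range(d):
        x = [x[i] - x[i - 1] for i in range(1, len(x))]
    return x

def fit_ar1(x):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + eps_t,
    a minimal stand-in for the AR component of ARIMA."""
    num = sum(x[i] * x[i - 1] for i in range(1, len(x)))
    den = sum(v * v for v in x[:-1])
    return num / den

# a trending series becomes (roughly) stationary after one difference
x = [1, 2, 3, 5, 6, 8, 9, 11]
dx = difference(x, d=1)
phi = fit_ar1(dx)
```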

2.4.2. Benchmarks for probabilistic forecast
As of yet, there is no prevalent benchmark method being used to compare the performance of a newly
proposed method, as is the case with e.g., the persistence method in combination with SPF. However, several
benchmarks have been proposed, which are presented in this section.

Solar. Apart from using well-established methods such as quantile regression (QR) as benchmark, two other
methods have recently been proposed that build on methods introduced in Section 2.5.1. The persistence
ensemble (PeEn), proposed by Alessandrini et al. [9], produces a PDF from a number m of the most recent
measurements at the same time of day, which can subsequently be ranked into quantile intervals. This can
be expressed as

PDF [x̂t+h ] = PDF (xt , xt−1 , ..., xt−m ) . (2.16)


The second method extends the climatology model to a probabilistic one, in which past measurements
are utilized to construct a PDF [46]. However, modifications have been made to improve accuracy, for
example by Zamo et al. [52], who ordered the measurements by month in order to attain realistic forecasts.
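A minimal numpy sketch of the PeEn of eq. (2.16): the m most recent same-hour measurements form the ensemble, which is then ranked into quantiles. The hour-indexed history, the value of m, and the quantile levels below are illustrative assumptions.

```python
import numpy as np

def persistence_ensemble(history, m, quantiles=(0.1, 0.5, 0.9)):
    """Persistence ensemble (PeEn): use the m most recent measurements
    taken at the same time of day as ensemble members (eq. 2.16) and
    rank them into the requested quantiles."""
    members = np.asarray(history[-m:], dtype=float)
    return np.quantile(members, quantiles)

# Hypothetical example: ten past 12:00 PV power measurements (kW).
noon_history = [4.1, 3.8, 4.5, 0.9, 4.2, 3.9, 4.4, 4.0, 1.2, 4.3]
q10, q50, q90 = persistence_ensemble(noon_history, m=10)
```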

Load. To the best of the authors’ knowledge, no dedicated probabilistic benchmark exists regarding PLF.
Rather, the deterministic benchmarks presented in Section 2.5.1 are utilized to create a probabilistic one.
For example, Arora & Taylor [53] used a Monte Carlo (MC) simulation method to generate probabilistic
forecasts for the double seasonal HWT ES method. Furthermore, Liu et al. [49] utilized sister models
i.e., models with different numbers of lagged temperatures and lagged daily moving averages, to construct a
density distribution forecast. Finally, Quan et al. [54] utilized the direct one-step approach on the ARIMA
model in order to construct 90% prediction intervals.

2.5. Performance metrics


Any forecast should be subjected to accuracy assessment, so that performance can be compared and one
can state whether or not the proposed method is an improvement over the benchmark method, usually the
naive method, and existing methods. However, many metrics exist and depending on the time series and
their scale, the number of meaningless zero values, e.g., night values in case of (P)SPF, or the magnitude of
the forecast error of the naive benchmark method, one has to consider which metrics are most appropriate.
Therefore, this section aims to guide researchers to select the proper metrics depending on the time series
at hand and whether one is dealing with a point forecast or probabilistic forecast, and also to provide
necessary background to the review of papers. Since the accuracy measures can be used for both solar and
load forecasting, no distinction is made in that respect. Comprehensive overviews of accuracy measures
regarding point forecasting can be found in [6, 55, 56], whereas [57, 58] provide thorough overviews of the
aforementioned measures with respect to probabilistic forecasting. It should be noted that although this
paper focuses on probabilistic forecasting, point forecast measures can still be applied to e.g. the mean of a
probabilistic forecast and provide valuable information, especially in case of short-term forecasting [26].

2.5.1. Deterministic forecasting


Hoff et al. [59] identified that, in order to describe the performance of a model, several performance
metrics should be applied so that one can assess its accuracy. The reason for this is the characteristics
of individual metrics, e.g. the root mean square error (RMSE), which penalizes outliers more heavily than
other methods due to the squaring of the errors. What follows is a summary of performance metrics and
their respective advantages and disadvantages.
The mean absolute error (MAE) shows the accuracy of the forecast compared with measurements by
calculating the average error between these two, which can be formulated as follows:

MAE = (1/N) Σᵢ₌₁ᴺ |x̂ᵢ − xᵢ| , (2.17)

where N is the length of the time series, x̂i is the forecast value and xi is the measured value. This metric
is useful when comparing several forecasts on the same time series. However, because it is scale
dependent, it cannot be used across forecasts on different time series owing to the inherent differences in
scale. Furthermore, many relatively small errors can disguise a small number of large errors, which can be
troublesome if the forecast displays noise [56].
The mean square error (MSE) and RMSE are defined as:

MSE = (1/N) Σᵢ₌₁ᴺ (x̂ᵢ − xᵢ)² (2.18)

RMSE = √[ (1/N) Σᵢ₌₁ᴺ (x̂ᵢ − xᵢ)² ] , (2.19)

and are similarly limited in their applicability because of scale dependency. Additionally, because of the
squared error, these are more sensitive to outliers than MAE. Nevertheless, these metrics are widely used
due to their theoretical relevance in statistical modeling [55] and since they provide quick insight into the
variance and standard deviation of the errors, respectively.
As stated before, the measures formulated in eqs. (2.17) to (2.19) cannot be utilized to compare forecast
results among different time series, and in addition are not very comprehensive without background knowl-
edge of e.g., the PV power plant under investigation. In order to allow comparison between forecasts on
different temporal and spatial scales, the percent error measures can be used. As investigated by Hoff et
al. [59] for the case of SPF, there are several denominators that can be used to normalize the error. They
found three to be most appropriate, namely the average produced irradiance or power, a weighted average
thereof, and the capacity of the PV power plant. However, they subjectively concluded that
MAE normalized by the average output would be most desirable, although if one would choose to utilize
both MAE and RMSE, normalizing by the capacity would be more appropriate. It is likely that a similar
reasoning holds for load forecasting, although no literature was found to support this. The mean absolute
percentage error (MAPE) and normalized root mean square error (NRMSE), normalized by the capacity,
are formulated as follows:

MAPE = (100/N) Σᵢ₌₁ᴺ |x̂ᵢ − xᵢ| / P0 (2.20)

NRMSE = 100 √[ (1/N) Σᵢ₌₁ᴺ ((x̂ᵢ − xᵢ)/P0)² ] , (2.21)

where P0 is the rated power of the PV power plant for the case of SPF. The advantage of the MAPE measure
is that it is straightforward and widely accepted, for example in the wind turbine industry [59]. Like
RMSE, NRMSE is also more sensitive to outliers than MAPE. Sometimes, eqs. (2.20) and (2.21) are
normalized by the measured value, i.e. xi, rather than P0, such as in [55], which has the disadvantage that
a meaningful absolute zero is assumed, as is the case e.g. for temperature on the Kelvin scale.
To assess the bias of the forecast, i.e. whether the model over- or underestimates, one should utilize
the mean bias error (MBE). The advantage of this measure is that one can immediately ascertain the
average bias, where a large and positive MBE represents a large overestimate, but disadvantages are scale
dependency and a lack of information about the distribution of the errors. Nevertheless, the MBE provides
valuable information because it can be reduced or removed during post-processing or can directly be taken
into account by e.g. the utility [56]. The MBE is formulated as follows:

MBE = (1/N) Σᵢ₌₁ᴺ (x̂ᵢ − xᵢ) . (2.22)

Recently, Hyndman & Koehler [55] proposed a new measure, designed to be independent of scale and less
sensitive to outliers than methods based on relative measures and measures based on relative errors. These
methods are not mentioned in this paper because of their limited occurrence in solar and load forecasting
literature, however, the interested reader is referred to [55]. In contrast to the aforementioned methods,
in the mean absolute scaled error (MASE), the error is scaled by the in-sample MAE of the benchmark
method, i.e. the persistence method. The MASE is formulated as follows:

MASE = MAE / [ (1/(N − 1)) Σᵢ₌₂ᴺ |xᵢ − xᵢ₋₁| ] . (2.23)

According to the authors, this measure is widely applicable and can still be used in case zero values occur
in the time series. A value greater than one implies that the forecast is worse than the forecast produced
by the persistence method and a value lower than one means that the method is an improvement over the
persistence method. It should be noted, however, that the authors indicate that traditional methods such
as MAE and MAPE may be preferred in case the time series are on the same scale and have values greater
than zero, mainly because of their simplicity.
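The scale-dependent and normalized measures of eqs. (2.17) to (2.23) can be collected in one short routine. Below is a minimal numpy sketch, where p0 is the rated capacity used for normalization and the MASE denominator is the in-sample MAE of the persistence forecast; the function name is ours.

```python
import numpy as np

def point_metrics(forecast, observed, p0):
    """MAE, RMSE, MBE (eqs. 2.17-2.19, 2.22), capacity-normalized MAPE
    and NRMSE (eqs. 2.20-2.21), and MASE (eq. 2.23) scaled by the
    in-sample MAE of the persistence forecast x_hat_t = x_{t-1}."""
    e = np.asarray(forecast, dtype=float) - np.asarray(observed, dtype=float)
    mae = np.mean(np.abs(e))
    rmse = np.sqrt(np.mean(e ** 2))
    mbe = np.mean(e)                    # > 0 indicates overestimation
    mape = 100.0 * np.mean(np.abs(e) / p0)
    nrmse = 100.0 * np.sqrt(np.mean((e / p0) ** 2))
    naive_mae = np.mean(np.abs(np.diff(observed)))   # persistence MAE
    return dict(MAE=mae, RMSE=rmse, MBE=mbe, MAPE=mape,
                NRMSE=nrmse, MASE=mae / naive_mae)
```

A MASE of exactly one then indicates a forecast no better than persistence, mirroring the discussion above.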
The linear dependency between two variables, i.e. measurement x and forecast x̂, can be determined by
Pearson’s correlation coefficient, and is defined as follows:

ρ = cov(x, x̂) / (σx σx̂) . (2.24)
Here, ρ = 1 implies total positive correlation, ρ = 0 no correlation and ρ = −1 total negative correlation.
Since the covariance expresses how much two variables change together, the correlation coefficient can be
interpreted in a similar way, e.g., a coefficient of 1 means that both variables show the same trend. Correlation
can be utilized in two manners. First, one can assess the correlation between the forecast and observation,
as is defined in eq. (2.24). Another approach is to utilize the correlation between sites in studies where
dispersion of e.g., PV systems is of interest, in which the smoothing effect is used to reduce the average
forecast error by reducing forecast variability [60].
The coefficient of determination, R2 , is a measure that indicates to what extent the statistical model
fits the data and describes to what extent the variance of the errors and variance of the measured values
coincide. R2 can be defined as follows:

R² = 1 − σ²(x̂ − x) / σ²(x) . (2.25)

The Kolmogorov-Smirnov Integral (KSI) quantifies to what extent the forecast and measurements come
from the same distribution, and is suitable for comparison on longer timescales. It is a nonparametric test
that is formulated as follows:

KSI = ∫[xmin, xmax] Dn dx , (2.26)

where Dn is the difference between the cumulative distribution functions for each time interval, defined as:


Dn = max |F(xᵢ) − F̂(xᵢ)| , xᵢ ∈ [xmin + (n − 1)p, xmin + np] , (2.27)

where F and F̂ represent the cumulative distributions of forecast and measured values, respectively [61].
Furthermore, p is the interval distance and formulated as:

p = (xmax − xmin) / m , (2.28)
where m is the level of discretization, which according to Espinar et al. [61], is chosen to be 100 since a
higher m will not improve accuracy but will increase computational time. In order to
determine whether the null hypothesis, i.e. if the forecast and measured values have identical distributions,
can be accepted with a 99% level of confidence, Dn should lie below a threshold value Vc , which is formulated
as follows:

Vc = 1.63 / √N , N ≥ 35, (2.29)
where N is the length of the time series [61]. As can be seen from eq. (2.26), one can assess the departure
of the cumulative distribution of the forecast from that of the measured values over the entire range of
measurements, where a KSI value of zero implies that the two distributions are equal. The advantage of the
KSI test over the Kolmogorov-Smirnov (KS) test is that quantification of the error is possible, in addition
to concluding whether to reject or accept the aforementioned null hypothesis [61].
Like the KSI parameter, the OVER parameter is based on the KS test. However, the OVER
parameter is calculated only when the differences between F and F̂ , i.e. Dn , exceed Vc . In order to do this,
a vector a needs to be constructed that contains the differences between Dn and Vc if Dn is larger than Vc
or zero values in case Dn is smaller than Vc , which is formulated as follows [61]:

a = Dn − Vc if Dn > Vc ,
a = 0 if Dn ≤ Vc . (2.30)

Then, the OVER parameter can be calculated [61]:

OVER = ∫[xmin, xmax] a dx . (2.31)

Both KSI and OVER parameters can be normalized by the critical area ac , defined as ac = Vc (xmax − xmin ),
which allows comparison of the aforementioned parameters from different tests [61]. Including these param-
eters in the analysis of a forecast model is valuable since, in contrast to RMSE and MBE, one can compare
the distributions of the forecast and measured values [61].
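A sketch of the KSI and OVER parameters of eqs. (2.26) to (2.31), returned as percentages of the critical area a_c. Two simplifications are assumed: the empirical CDFs are evaluated once per interval (at its right edge) rather than taking the maximum within each interval, and the integrals are approximated by the rectangle rule.

```python
import numpy as np

def ksi_over(forecast, observed, m=100):
    """KSI and OVER (in percent of the critical area a_c): compare the
    empirical CDFs of forecast and measured values on m intervals of
    width p = (xmax - xmin)/m, with the 99% critical value Vc."""
    forecast, observed = np.asarray(forecast), np.asarray(observed)
    xmin, xmax = observed.min(), observed.max()
    p = (xmax - xmin) / m                        # interval distance, eq. (2.28)
    edges = xmin + np.arange(1, m + 1) * p       # right edge of each interval
    F_obs = np.searchsorted(np.sort(observed), edges, side="right") / observed.size
    F_fc = np.searchsorted(np.sort(forecast), edges, side="right") / forecast.size
    Dn = np.abs(F_obs - F_fc)                    # CDF difference per interval
    Vc = 1.63 / np.sqrt(observed.size)           # critical value, N >= 35
    ksi = np.sum(Dn) * p                         # rectangle-rule integral
    over = np.sum(np.maximum(Dn - Vc, 0.0)) * p  # only the excess above Vc
    ac = Vc * (xmax - xmin)                      # critical area
    return 100.0 * ksi / ac, 100.0 * over / ac
```

Identical forecast and measurement distributions yield KSI = OVER = 0, consistent with the interpretation given above.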
Although traditional measures such as RMSE and MAE have great value because of their ease of inter-
pretation, they do not allow for comparison between data sets in case the forecast errors have identical mean
and variance but different distributions. Evidently, it is important to know whether a model is consistently
under-predicting or over-predicting in case of unit dispatch, due to relatively slow ramp-down and ramp-up
times of traditional thermal generators. In order to get additional information about the distribution, one
can calculate the skewness and kurtosis of the forecast errors. The skewness is a measure of asymmetry,
where negative skewness implies that the tail on the left side of the probability distribution is longer or fatter
than on the right side, although it does not distinguish between these features. In contrast, positive skewness
indicates a longer or fatter tail on the right side than on the left side. Skewness is the third standardized
moment and can be defined as follows [62]:

µ̂₃ = E[(X − µ)³] / ( E[(X − µ)²] )^(3/2) , (2.32)

where X is a random variable and µ is the mean.


Like skewness, kurtosis describes the shape of the distribution function. Specifically, it measures
the significance of the tails of the distribution and is often compared to a univariate normal distribution,
which always has a kurtosis of three. A distribution with kurtosis higher than three is called leptokurtic
and is an indication that the model produces more outliers than the normal distribution. Conversely, a
distribution with a kurtosis lower than three indicates that the model produces fewer outliers than the normal
distribution, and is called platykurtic. In case of forecasts, a leptokurtic distribution implies a more accurate
forecast [56]. Kurtosis is the fourth standardized moment and is defined as [43]:

µ̂₄ = E[(X − µ)⁴] / ( E[(X − µ)²] )² (2.33)

Other metrics have to be used in combination with skewness and kurtosis, as these measures alone do not
provide enough information regarding the forecast errors [56].
In forecasting of solar irradiance or PV power output, it is common practice to compare the proposed
model to the persistence method. In order to assess the improvement of the proposed forecast model over
the persistence method, one can calculate the forecast skill S. It is defined as the ratio of the uncertainty U
and variability V of solar irradiance or PV power. The uncertainty U is formulated as follows [6]:

U = √[ (1/Nw) Σᵢ₌₁ᴺʷ ((x̂ᵢ − xᵢ)/xclear,i)² ] , (2.34)

where Nw is a subset time window and xclear,i is the expected clear-sky irradiance. When compared to
eq. (2.19), one can see the similarity between these formulations. In fact, as stated before, the RMSE can
be seen as the standard deviation of the errors, which is closely related to the uncertainty of the forecast [6].
The solar variability is defined by Coimbra & Kleissl [6] as the standard deviation of the step-change of
the clear-sky index, and is therefore independent of diurnal variability. The variability V is formulated as
follows

V = √[ (1/N) Σᵢ₌₁ᴺ (∆ktᵢ)² ] . (2.35)

Subsequently, forecast skill S can be defined as follows [6]:

S = 1 − U/V . (2.36)
From eq. (2.36), one can see that in case there is zero uncertainty, i.e. a perfect forecast, the forecast skill is
1. Conversely, if the uncertainty and variability are equal, the forecast skill is zero, which is by definition the
forecast skill of the persistence method [6]. Therefore, this measure can be used to assess the improvement
over the persistence method. Also note that S can become negative, implying that the proposed forecast
model performs worse than the persistence method. Finally, Coimbra & Kleissl [6] noted that S can be
approximated by the following formulation:

S ≈ 1 − RMSE/RMSEp , (2.37)
RMSEp
where RMSEp is the RMSE achieved by the persistence method. Evidently, eq. (2.37) is easier to calculate
than eq. (2.36), but it can also be used retroactively to compare results of different models if researchers
have calculated both RMSE and RMSEp .
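The RMSE-based approximation of eq. (2.37) is easily computed. The sketch below assumes the simple persistence forecast x̂_t = x_{t-1} as reference; the function name is ours.

```python
import numpy as np

def forecast_skill(forecast, observed):
    """Forecast skill via eq. (2.37): S ~ 1 - RMSE / RMSE_p, where the
    persistence reference forecasts each value by its predecessor."""
    f = np.asarray(forecast, dtype=float)
    x = np.asarray(observed, dtype=float)
    rmse = np.sqrt(np.mean((f[1:] - x[1:]) ** 2))      # skip t=0, no reference
    rmse_p = np.sqrt(np.mean((x[:-1] - x[1:]) ** 2))   # persistence RMSE
    return 1.0 - rmse / rmse_p
```

A perfect forecast yields S = 1, while feeding the persistence forecast itself yields S = 0, in line with the definitions above.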

2.5.2. Probabilistic forecasting


The measures of the previous section can be used in case of short-term forecasting [26], although dedicated
measures have been developed as well. However, Hong & Fan [26] identified that one of the primary reasons
probabilistic forecasting remains in a relatively immature stage is the scarcity of well-established
evaluation methods. It should be noted that they made this remark specifically for PLF, but
this is also valid for probabilistic irradiance forecasting (PIF).
In contrast to PIF and PLF, probabilistic wind power forecasting (PWPF) is a mature area of research
according to Hong et al. [13], and it therefore seems fitting to adopt properties that are required for PWPF.
Reliability, sharpness and resolution were introduced by Pinson et al. [63] and will be briefly explained here.
Reliability, also referred to as calibration [57], is an important quality of a probabilistic forecast, as it
indicates how similar the distribution of the observation is to that of the forecast. A probabilistic forecast
is said to be of perfect reliability in case the probabilities arising from the quantiles of the forecast model
and the observed ones are identical, and a deviation from this will lead to a reduction in the reliability
[63]. It is related to the bias of the forecast, in the sense that a high reliability corresponds to a low bias
[26]. To assess the reliability, a time series is constructed that monitors the times that the model over- or
under-predicts, after which a reliability diagram can be drawn. If this time series is close to the diagonal,
it is said to have high reliability [63]. Another method to evaluate the reliability is through assessing the
histograms of the probability integral transform (PIT). By definition, the PIT histograms are uniform if the
probabilistic forecast has perfect reliability [57].
Sharpness can be envisaged as the extent to which the information contained in the probabilistic forecast
is concentrated [63]. Therefore, high sharpness is an essential characteristic of a probabilistic forecast, since
it reduces uncertainty. In fact, Gneiting & Katzfuss [57] proposed that in case of probabilistic forecasting,
one should maximize sharpness while imposing reliability as a constraint to the optimization problem.
Resolution has been defined by Pinson et al. [63] as the variability of the probabilistic outcome, depending
on forecast conditions, and is also called dispersion by Gneiting & Katzfuss [57]. A comprehensible analogy
is provided by Hong & Fan [26], who note that resolution can be compared to the variance of forecasts in
point forecasts. In terms of probabilistic forecasting, one desires zero resolution, or dispersion, implying
that the width of the prediction interval remains constant.
Evidently, the aim of a probabilistic forecast is to ensure that the probability distribution of observations
lies within the prediction interval. In order to assess whether this is the case or not, one can calculate the
prediction interval coverage probability (PICP), which is formulated as follows [64]:

PICP = (1/N) Σᵢ₌₁ᴺ εᵢ , (2.38)

where the indicator εᵢ is defined as:

εᵢ = 1 if xᵢ ∈ [Lᵢ , Uᵢ] ,
εᵢ = 0 if xᵢ ∉ [Lᵢ , Uᵢ] . (2.39)

where Li and Ui represent the lower and upper bound of the prediction interval, respectively. From the
formulation of the indicator we can deduce that a high value for PICP implies that more results lie within the bounds
of the prediction interval, which is evidently desirable. The PICP measure is a quantitative expression of
reliability and should be higher than the nominal confidence level, since the prediction intervals are otherwise invalid and
should be discarded [65].
However, if one solely analyzes the quality of the forecast based on the PICP, it is possible to choose
a wide range between lower bound Li and upper bound Ui so that the coverage probability is artificially
improved, while the variance of the forecast can be undesirably large and decision makers are provided with
little valuable information. In fact, informativeness of prediction intervals is determined by their width [64].
Therefore, the PICP should be concurrently analyzed with the prediction interval normalized average width
(PINAW), which is a measure that quantitatively assesses the width of the prediction intervals. The PINAW
is defined as follows [64]:

PINAW = (1/(N R)) Σᵢ₌₁ᴺ (Uᵢ − Lᵢ) , (2.40)

where R is meant to normalize the prediction interval average width and represents the maximum forecast
value less the minimum forecast value.
Khosravi et al. [66] noted that PICP and PINAW usually have a direct relationship in which high
width of the prediction interval (PINAW) implies high coverage (PICP) of results, and therefore proposed a
quantitative measure to assess both simultaneously. Although the authors named it coverage-length-based
criterion (CLC), it was later renamed the coverage width-based criterion (CWC), see e.g., [64]. CWC is
formulated as follows [64]:

 
CWC = PINAW [ 1 + γ(PICP) e^(−η(PICP−µ)) ] , (2.41)

where η and µ are controlling parameters and γ(PICP) = 1 during training. Parameter µ represents the
preassigned PICP that is to be achieved during the training phase and to select this parameter, the nominal
confidence level [(1 − α) %] can be used as guidance. Furthermore, η is a penalizing term that will cause
CWC to grow exponentially if the preassigned PICP is not satisfied. When PICP ≈ µ, one has achieved
balance between PICP and PINAW and can continue with testing of the model [64]. Then, CWC is to be
determined with γ(PICP) depending on µ, which is formulated as follows:

γ(PICP) = 0 if PICP ≥ µ ,
γ(PICP) = 1 if PICP < µ . (2.42)

During the testing phase, CWCs are compared by using eq. (2.42), where CWC increases in case the PICP
lies below the preassigned PICP. The aim of CWC is to compromise between the amount of information
(PINAW) and coverage probability (PICP) of the prediction intervals [64], and can be interpreted as the
optimal balance between information and coverage.
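Eqs. (2.38) to (2.42) can be combined into one evaluation routine. In the numpy sketch below, the normalization range R is taken as the range of the observed values (one possible choice), and the defaults for µ and η are purely illustrative.

```python
import numpy as np

def interval_scores(lower, upper, observed, mu=0.90, eta=50.0, training=False):
    """PICP (eq. 2.38), PINAW (eq. 2.40) and CWC (eq. 2.41) for a set
    of prediction intervals [lower_i, upper_i]."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    observed = np.asarray(observed)
    inside = (observed >= lower) & (observed <= upper)  # indicator of eq. (2.39)
    picp = np.mean(inside)
    R = observed.max() - observed.min()                 # normalization range
    pinaw = np.mean(upper - lower) / R
    # gamma(PICP) = 1 during training, otherwise eq. (2.42).
    gamma = 1.0 if training else float(picp < mu)
    cwc = pinaw * (1.0 + gamma * np.exp(-eta * (picp - mu)))
    return picp, pinaw, cwc
```

When the preassigned PICP is met, γ = 0 and the CWC reduces to the PINAW, as eq. (2.42) intends.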
Reliability and sharpness are important properties of a probabilistic forecast, but since they are often
assessed graphically, they are open to interpretation, which in turn may lead to subjective
conclusions, especially when methods are compared [63]. Therefore, skill scores or scoring rules are intro-
duced that are required to be proper, i.e. scores that ensure that the best forecast obtains the highest score.
In the following, three measures will be discussed, namely the continuous ranked probability score (CRPS),
the pinball loss function and the Winkler score.
The CRPS is a robust score that is designed in such a way that it measures both reliability and sharpness.
An advantage of the CRPS is that it reduces to the absolute error if the forecast is deterministic, and this
score therefore allows for comparison between probabilistic and point forecasts [57]. The CRPS can be
formulated as follows [57]:
CRPS(F, x) = ∫₋∞^∞ (F(y) − 1{x ≤ y})² dy (2.43)
= E_F |X − x| − (1/2) E_F |X − X′| , (2.44)
where 1 is the Heaviside step function, x represents the observation, and X and X′ are independent random
variables with the forecast cumulative distribution function (CDF) F and finite first moment [57]. Another
advantage of the CRPS is the fact that it has the same unit as the variable that was forecast [57], which
improves interpretability of the score. Furthermore, due to its relation with the absolute error, a low CRPS
indicates an accurate probabilistic forecast. Finally, as can be seen from the formulation in eq. (2.43), the
CRPS considers the entire distribution of forecasts, which contrasts with the pinball loss function that will
be discussed in the next paragraph.
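For an ensemble (or quantile-based) forecast, the expectation form of eq. (2.44) gives a direct empirical estimator, sketched below in numpy with the expectations taken over the ensemble members.

```python
import numpy as np

def crps_ensemble(members, x):
    """Empirical CRPS of eq. (2.44): E|X - x| - (1/2) E|X - X'|, with
    the expectations taken over the ensemble members."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - x))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2
```

With a single member the second term vanishes and the CRPS reduces to the absolute error, as noted above.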
Similarly as the CRPS, the pinball loss function takes both reliability and sharpness into consideration,
and is specifically designed for quantile forecasts [26], which is explained in Section 3. It is defined as follows
[67]:

Lτ(x̂τ , x) = τ |x̂τ − x| if x̂τ ≤ x ,
Lτ(x̂τ , x) = (1 − τ) |x̂τ − x| if x̂τ > x , (2.45)

where x̂τ is the point forecast for quantile τ and x the observation. After calculating for each quantile,
these can be summed over the forecast horizon to attain the pinball loss [26]. It is interesting to note that
the function defined in eq. (2.45) is to be minimized in case of quantile regression. A low pinball score is
indicative of an accurate probabilistic forecast model [26]. Furthermore, averaging pinball losses over all
quantiles over the forecast horizon produces the quantile score [68].
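The pinball loss translates directly into code, using the standard convention in which underestimates are weighted by τ and overestimates by (1 − τ); averaging over quantiles and lead times then gives the quantile score. The example values are illustrative.

```python
import numpy as np

def pinball_loss(q_forecast, x, tau):
    """Pinball loss for the tau-quantile forecast q_forecast given the
    observation x: underestimates weigh tau, overestimates (1 - tau)."""
    if q_forecast <= x:
        return tau * abs(q_forecast - x)
    return (1.0 - tau) * abs(q_forecast - x)

# Averaging over all quantiles (and lead times) yields the quantile score.
taus = np.arange(0.1, 1.0, 0.1)
score = np.mean([pinball_loss(5.0, 6.0, t) for t in taus])
```

Note the asymmetry: under-forecasting the 0.9 quantile is penalized nine times as heavily as over-forecasting it.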
The Winkler score [69] allows for simultaneous assessment of reliability and sharpness, similar as CRPS
and the pinball loss function. Let (1 − α) be the nominal probability, then the Winkler score is defined as
[49]:


Sci = δ if Li ≤ xi ≤ Ui ,
Sci = δ + 2 (Li − xi)/α if xi < Li ,
Sci = δ + 2 (xi − Ui)/α if xi > Ui , (2.46)

where δ = Ui − Li with Li and Ui representing the lower and upper bounds of the prediction interval
calculated on the previous day. As can be seen from eq. (2.46), the score increases when the observation
lies outside the prediction interval and an erroneous forecast is penalized more when the error increases.
A lower Winkler score therefore represents a better probabilistic forecast. In order to assess
the overall performance, the average Winkler score can be defined as

S̄c = (1/N) Σᵢ₌₁ᴺ Scᵢ . (2.47)
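The case distinction of eq. (2.46) translates directly into code; a minimal sketch, with a hypothetical function name:

```python
def winkler_score(lower, upper, x, alpha=0.10):
    """Winkler score (eq. 2.46) of a (1 - alpha) prediction interval
    [lower, upper] for observation x: the interval width delta, plus a
    penalty proportional to the miss when x falls outside."""
    delta = upper - lower
    if x < lower:
        return delta + 2.0 * (lower - x) / alpha
    if x > upper:
        return delta + 2.0 * (x - upper) / alpha
    return delta

# Averaging winkler_score over all intervals gives eq. (2.47).
```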

3. Forecasting techniques

As this paper is mainly concerned with probabilistic forecasting, this section is dedicated to these par-
ticular forecasting methods. For an extensive overview of deterministic SPF methods, the reader is referred
to Inman et al. [15] and Diagne et al. [16]. In addition, the reader is referred to Hong & Fan [26] for
information on deterministic load forecasting.
Since statistical and hybrid approaches can be utilized for both PSPF and PLF, no distinction is made
between the two in this section. While in SPF time series methods such as ARIMA and artificial intelligence
(AI) based methods such as ANNs are both considered to be statistical methods, in load forecasting usually
a distinction is made between statistical and AI based methods. For clarity, this paper adopts the former
terminology since data-driven approaches are statistical in nature.
Finally, it is important to note that this is by no means an exhaustive list of all methods available, but
more a representative list of methods most commonly utilized in probabilistic forecasting.

3.1. Statistical approach


As stated before, a distinction can be made between approaches where one assumes a PDF beforehand
i.e., parametric, or where no such assumption is made, i.e., nonparametric. In line with that reasoning, this
section is similarly organized.

3.1.1. Parametric
The parametric approach relies on fitting a known density function to the errors of a forecast or by
assuming a density function around a deterministic forecast. For statistical approaches, these deterministic
forecasts are performed by well-known methods such as ANN or ARIMA. However, these methods are
outside the scope of this paper and the interested reader is referred to [15, 16, 26] for more information. As
a consequence of the parametric approach being dependent on deterministic models, this section is very brief.
However, to illustrate how this method works, David et al. [70] provide a fitting example. Their model
was based on a generalized autoregressive conditional heteroscedasticity (GARCH) model that estimates
non-constant variance. The errors of this model were assumed to be normal and thus modeled accordingly
to construct prediction intervals.
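As a minimal illustration of the parametric approach, the sketch below assumes Gaussian errors with constant mean and variance around a deterministic point forecast. Note that the GARCH model of David et al. [70] would instead let the variance evolve over time, so this constant-variance version is a simplification, and the function name is ours.

```python
import numpy as np
from statistics import NormalDist

def parametric_intervals(point_forecast, residuals, alpha=0.10):
    """Parametric prediction interval around a deterministic forecast,
    assuming the historical forecast errors are Gaussian with constant
    mean and variance (a simplification of the GARCH approach)."""
    mu, sigma = np.mean(residuals), np.std(residuals)
    # Two-sided z-value for a (1 - alpha) nominal coverage level.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return point_forecast + mu - z * sigma, point_forecast + mu + z * sigma
```

With alpha = 0.10 this yields a 90% interval; in practice the residuals would come from a held-out validation period.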

3.1.2. Nonparametric
Quantile Regression. As can be seen from table 1, the most common method to construct a nonparametric
PDF is QR. This method was introduced by Koenker & Bassett [71] in 1978, who argued that it is far from
realistic to assume normality, or any other distribution for that matter, since a few errors can cause
deviation from these distributions, rendering them speculative. In order to establish a nonparametric
approach, Koenker & Bassett realized that the median can be defined as the solution to the minimization
of the absolute residuals due to the symmetric definition of the median, which yields the 0.5 quantile.
In fact, QR is based on defining a regression model for each of the τ quantiles under investigation and
combining these so as to create a probabilistic forecast.
Let X̂ be a random response variable, let X be a predictor, let x̂ and x be realizations of the random
variables, and let F (x̂|X = x) = P (X̂ ≤ x̂|X = x) be the cumulative distribution function, then the
conditional quantile of order τ , qτ (x), can be defined as:

qτ (x) = F −1 (x̂|X = x) = inf{x̂ ∈ R, F (x̂|X = x) ≥ τ }, (3.1)


where τ ∈ [0, 1]. As stated before, the median can be defined as the minimization of the absolute residuals,
which can be generalized to acquire other quantiles through solving the following minimization problem [67]:

qτ (x) = arg min_x̂ E{Lτ (X̂τ , x)|X = x}, (3.2)

where Lτ (X̂τ , x) is the pinball loss function defined in eq. (2.45). It is important to note that QR can also
be utilized as a post-processing technique to acquire density functions from point forecasting techniques.
Another important note is that since each quantile is predicted independently, quantile crossing can occur,
which violates the monotonicity property [72]

q̂τi (x) ≤ q̂τi+1 (x) for all i = 1, . . . , τ quantiles, so that τi ≤ τi+1 . (3.3)

Many approaches have been proposed to circumvent this, such as monotone rearranging or joint estimation
[72]. The interested reader is referred to [67, 71, 73] for more information.
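A small sketch of eq. (3.2) for a linear model x̂ = a + bx: the summed pinball loss is minimized over the coefficients. For brevity a generic Nelder-Mead search replaces the linear-programming formulation actually used in quantile regression software, and the data are synthetic.

```python
import numpy as np
from scipy.optimize import minimize

def fit_linear_quantile(X, y, tau):
    """Fit the tau-quantile line q_tau(x) = a + b*x by minimizing the
    summed pinball loss (a Nelder-Mead stand-in for the usual
    linear-programming solution of Koenker & Bassett)."""
    def loss(beta):
        resid = y - (beta[0] + beta[1] * X)
        return np.sum(np.where(resid >= 0, tau * resid, (tau - 1.0) * resid))
    res = minimize(loss, x0=np.zeros(2), method="Nelder-Mead",
                   options={"maxiter": 2000, "xatol": 1e-6, "fatol": 1e-6})
    return res.x

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 200)
y = 2.0 * X + rng.normal(0.0, 0.1, 200)            # synthetic data
beta_med = fit_linear_quantile(X, y, tau=0.5)      # ~ conditional median
```

Repeating the fit for a grid of τ values yields the family of quantile curves that together form the probabilistic forecast.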

Quantile Regression Forests. Another method that is utilized to construct nonparametric density func-
tions is quantile regression forests (QRFs), which builds on random forests (RFs), an ensemble learning
method for regression developed by Breiman in 2001 [74]. QRFs was first proposed by Meinshausen in 2006
[67] and is designed to store all information regarding the observations and is able to construct a conditional
distribution based on that information, which contrasts with RFs, where only the mean of the observations
in a certain node is stored.
QRFs works as follows [67]: First, similarly as in RFs, one grows k trees T (θt ) with θt being the random
parameter vector that governs the variables under consideration at each splitpoint of the branch of tree T
and t = 1, ..., k, where the difference with RFs is that all information is stored instead of solely the mean.
The next step is to compute weights wi (x, θt ) and wi (x) of observations i ∈ {1, ..., n} for every tree
and every observation, respectively, for a certain realization x of predictor X. These weights are defined as
follows:

wᵢ(x, θₜ) = 1{Xᵢ ∈ R_ℓ(x,θₜ)} / #{j : Xⱼ ∈ R_ℓ(x,θₜ)} , (3.4)

wᵢ(x) = k⁻¹ Σₜ₌₁ᵏ wᵢ(x, θₜ) , (3.5)

where R_ℓ(x,θ) is a rectangular subset of the space S in which X is located, for every leaf ℓ = 1, ..., L.
Furthermore, there is only one leaf ℓ for every x ∈ S and therefore also x ∈ R_ℓ , which can then be defined
as ℓ(x, θ) for tree T (θ).
Finally, the estimate of the distribution function can be calculated as follows:

F̂(x̂|X = x) = Σᵢ₌₁ⁿ wᵢ(x) 1{X̂ᵢ ≤ x̂} , (3.6)

after which F̂ (x̂|X = x) can be plugged into eq. (3.1) instead of F (x̂|X = x) to obtain quantiles q̂τ (x).
Once the random forest is built and trained, observations from the test data set can be dropped down the
trees, where they are compared at each splitpoint and directed toward the most similar
branch, after which the output can be estimated. In a sense, this can be compared to the nearest
neighbors method, explained later in this section.
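The final step of QRF, inverting the weighted empirical CDF of eq. (3.6) as in eq. (3.1), can be sketched as follows; the weights are assumed to have been produced by eqs. (3.4) and (3.5) and are purely illustrative here.

```python
import numpy as np

def weighted_quantile(y_train, weights, tau):
    """Invert the weighted empirical CDF F_hat(x_hat|X = x) of eq. (3.6)
    at probability tau, i.e. return the smallest value with F_hat >= tau."""
    order = np.argsort(y_train)
    y_sorted, w_sorted = y_train[order], weights[order]
    cdf = np.cumsum(w_sorted)               # F_hat evaluated at each y_i
    idx = np.searchsorted(cdf, tau)         # first index with F_hat >= tau
    return y_sorted[min(idx, y_sorted.size - 1)]

# Hypothetical forest weights w_i(x) for one query point (sum to one).
y_train = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([0.1, 0.4, 0.4, 0.1])
median = weighted_quantile(y_train, weights, tau=0.5)
```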

Gaussian Processes. A method that is relatively underrepresented in Section 4 is the use of Gaussian
Processes (GPs), extensively discussed in the book by Rasmussen and Williams [75]. This nonparametric
and probabilistic approach is based on Bayes’ theorem, which is formulated as follows:

p(θ|y) = p(θ, y)/p(y) = p(y|θ) p(θ)/p(y) , (3.7)

where θ is a set of unknown parameters, y = {y1 , . . . , yn }, p(y|θ) represents the PDF of data y given model
parameters θ, p(θ) is the prior, representing prior belief on the model parameters, and p(θ|y) the posterior
distribution, which is an updated version of p(θ) after we have made observations y. In other words, the
aim is to renew our belief of the prior upon observing new data. In this way, parameters θ can be learned
in a probabilistic manner, where the PDF represents the uncertainty accompanying these parameters.
The definition of the GP states that it is a collection of random variables, and that any subset of these
random variables has a joint multivariate Gaussian distribution with mean µ and covariance matrix K [75].

More intuitively, one can imagine a GP to be a representation of some function f between, e.g., observations
x1 and x2 , generating outputs f (x1 ) and f (x2 ), which are then assumed to be jointly Gaussian distributed
according to N (µ, K). However, this does not have to be limited to two observations and we can therefore
broaden this concept to any number of inputs x = {x1 , . . . , xn }, such that the covariance matrix K can be
defined as [76]:

 
K(x, x) = \begin{pmatrix} k(x_1, x_1) & k(x_1, x_2) & \cdots & k(x_1, x_n) \\ k(x_2, x_1) & k(x_2, x_2) & \cdots & k(x_2, x_n) \\ \vdots & \vdots & \ddots & \vdots \\ k(x_n, x_1) & k(x_n, x_2) & \cdots & k(x_n, x_n) \end{pmatrix},   (3.8)

where k(xi , xj ) is a covariance function, or kernel, that represents the correlation between any of the inputs
x. For more information on kernels, the reader is referred to [75]. Additionally, we may define the mean
function as µ(x), so that the multivariate Gaussian distribution amounts to:

p(f(x)) = N (µ(x), K(x,x)) . (3.9)

In case a new observation is made, e.g., x∗ , the posterior distribution can be computed by first defining
the new joint distribution

     
p\!\begin{pmatrix} f \\ f(x_*) \end{pmatrix} = N\!\left( \begin{pmatrix} \mu(x) \\ \mu(x_*) \end{pmatrix}, \begin{pmatrix} K(x,x) & K(x,x_*) \\ K(x_*,x) & k(x_*,x_*) \end{pmatrix} \right).   (3.10)

Subsequently, we can compute the posterior distribution according to

m_* = \mu(x_*) + K(x_*, x) K(x,x)^{-1} (f - \mu(x))   (3.11)


\sigma_*^2 = k(x_*, x_*) - K(x_*, x) K(x,x)^{-1} K(x, x_*).   (3.12)

For more information on multiple-step ahead prediction with GPs, the interested reader is referred to
Girard et al. [77]. Likewise, the interested reader is referred to Roberts et al. [76] for more information on
GPs in case of time series modeling.
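The posterior computations of eqs. (3.11) and (3.12) can be sketched in a few lines; the squared-exponential kernel and the zero mean function used here are illustrative assumptions, not choices prescribed by the text:

```python
import numpy as np

def rbf_kernel(a, b, length=1.0):
    # squared-exponential covariance k(x_i, x_j) for 1-D inputs
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x, f, x_star, length=1.0, jitter=1e-9):
    # eqs. (3.11)-(3.12), assuming mu(x) = 0 and k(x*, x*) = 1
    K = rbf_kernel(x, x, length) + jitter * np.eye(len(x))
    k_s = rbf_kernel(x_star, x, length)            # K(x*, x)
    K_inv = np.linalg.inv(K)
    mean = k_s @ K_inv @ f                         # eq. (3.11)
    var = 1.0 - np.sum((k_s @ K_inv) * k_s, axis=1)  # eq. (3.12)
    return mean, var
```

Near a training point the posterior variance collapses towards zero, while far away it reverts to the prior variance, which is the mechanism that makes GPs attractive for probabilistic forecasting.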

Bootstrapping. Bootstrapping was proposed by Efron in 1979 [78] as a method to estimate the proba-
bility distribution of a random variable R(X, F ) from random sample X = (X1 , X2 , ..., Xn ), drawn with
replacement from an unknown parent distribution F . The bootstrap method is widely applied in many
research areas due to its simplicity and consists of three steps. First, a sample PDF F̂ is constructed from n
realizations of Xi , i.e., x1 , x2 , ..., xn . Second, a random sample of size n, i.e., the bootstrap sample,
Xi∗ = (X1∗ , X2∗ , ..., Xn∗ ), is created by drawing with replacement from F̂ , where Xi∗ = x∗i . Finally, the distribution
of R(X, F ) can be approximated by the bootstrap distribution R∗ (X ∗ , F̂ ), for which MC is often used
as it allows good approximation of the parent distribution in an efficient way, although a disadvantage of
bootstrapping is the amount of required data and the consequent computational burden.
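The three steps can be sketched generically as follows; the Gaussian sample and the mean statistic are arbitrary illustrations, not tied to any study in this review:

```python
import numpy as np

def bootstrap(sample, statistic, n_boot=2000, seed=0):
    # draw n_boot bootstrap samples of size n with replacement from the
    # empirical distribution F-hat and evaluate the statistic on each
    rng = np.random.default_rng(seed)
    n = len(sample)
    return np.array([statistic(sample[rng.integers(0, n, n)])
                     for _ in range(n_boot)])

# example: 90% interval for the sampling distribution of the mean
rng = np.random.default_rng(1)
x = rng.normal(5.0, 2.0, size=200)
reps = bootstrap(x, np.mean)
lo, hi = np.percentile(reps, [5, 95])
```

The loop over resamples makes the computational burden mentioned above explicit: the statistic is evaluated n_boot times on samples of the full size n.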

Lower Upper Bound Estimate. The Lower Upper Bound Estimate (LUBE) method was introduced by
Khosravi et al. in 2011 [79] because, they argued, the prevalent methods utilized to construct prediction
intervals were questionable: these methods are mainly based on minimizing the prediction error, whereas
a method should instead focus on improving prediction interval quality, i.e., PICP, PINAW and CWC, as
defined in eqs. (2.38), (2.40) and (2.41), which are the key characteristics of prediction intervals.
The LUBE method begins with the construction of several neural networks (NNs) with two outputs
rather than one, one being the upper bound of the prediction interval and the other the lower bound.
Traditional learning methods can be utilized to train the NN on a training data set and obtain the initial
parameters and weights, or these can be randomly assigned. The candidate with the lowest PINAW, whilst
satisfying PICP, is chosen as the optimal structure, after which NN weights and an optimization algorithm,
e.g., particle swarm optimization (PSO) [64], are initialized. Subsequently, this optimal structure is used
to construct prediction intervals on the training data set, of which CWC is calculated. If, after several
iterations, CWC does not improve anymore, the optimal set of parameters of the NN is utilized for testing
the NN on the test data set, and prediction intervals can be created accordingly.
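The interval-quality metrics that drive LUBE can be sketched as below; note that the exponential penalty form and the parameters mu and eta are one common formulation of CWC and may differ in detail from eq. (2.41):

```python
import numpy as np

def picp(y, lower, upper):
    # coverage: fraction of observations inside the interval
    return np.mean((y >= lower) & (y <= upper))

def pinaw(y, lower, upper):
    # average interval width, normalised by the range of the target
    return np.mean(upper - lower) / (y.max() - y.min())

def cwc(y, lower, upper, mu=0.95, eta=50.0):
    # coverage-width criterion: penalise intervals whose PICP falls
    # below the nominal confidence level mu (one common formulation)
    p, w = picp(y, lower, upper), pinaw(y, lower, upper)
    gamma = 1.0 if p < mu else 0.0
    return w * (1.0 + gamma * np.exp(-eta * (p - mu)))
```

Minimizing such a criterion trades off narrow intervals against the reliability constraint, which is exactly the tension the LUBE optimization addresses.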

Gradient Boosting. Gradient boosting (GB) was proposed by Friedman in 2001 [80] with the aim of
linearly combining weak learners, i.e., independent variables with limited predictive information, into a
single prediction model. As in other boosting methods, the prediction model is built iteratively, in so-called
boosts, starting from an initial guess function, often the mean of the data set, to which the functions fitted
to the subsequent residuals are added. This means that the model improves itself by learning from the errors
of the previous model, which are the instances that are difficult to fit, after which all models are weighted
and combined into a set of predictors. The final model can then be formulated as follows [80]:

\hat{F}(x) = \hat{f}_0(x) + \sum_{m=1}^{M} \hat{f}_m(x),   (3.13)

where fˆ0 (x) is the initial guess, fˆm (x) is the model of the residuals at boost m and M is the total number
of boosts. During the training stage, the aim is to find the function that describes the errors according to a
differentiable loss function L(x̂, F (x)), e.g., the quantile loss function that is formulated in eq. (2.45), at each
boost based on steepest descent, according to [80]:

\hat{f}_m(x) = -\rho_m g_m(x),   (3.14)


where:

 
g_m(x) = E_{\hat{x}}\left[ \frac{\partial L(\hat{x}, F(x))}{\partial F(x)} \,\middle|\, x \right]_{F(x) = \hat{F}_{m-1}(x)}   (3.15)

\rho_m = \arg\min_{\rho}\; E_{\hat{x},x}\, L\big(\hat{x},\, \hat{F}_{m-1}(x) - \rho\, g_m(x)\big).   (3.16)

An interesting feature of GB is that variable selection is carried out intrinsically; moreover, a separate
model can be trained for each quantile τ to attain the density function. It is also interesting to note that
extensions of this method exist, such as the one proposed by Bühlmann, in which only one predictor among
d predictors is selected. The interested reader is referred to [80, 81].
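A minimal sketch of the boosting loop behind eq. (3.13), using squared loss and depth-one stumps for clarity; to obtain quantiles, the pinball loss of eq. (2.45) would replace the squared loss:

```python
import numpy as np

def fit_stump(x, r):
    # best single-split regressor on the residuals r (least squares)
    best_err, best_split = np.inf, None
    for s in np.unique(x):
        left, right = r[x <= s], r[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best_err:
            best_err, best_split = err, (s, left.mean(), right.mean())
    return best_split

def gradient_boost(x, y, M=50, lr=0.1):
    # eq. (3.13): start from the mean, then fit each stump to the
    # current residuals (the negative gradient of the squared loss)
    f0 = y.mean()
    pred = np.full_like(y, f0, dtype=float)
    models = []
    for _ in range(M):
        s, lv, rv = fit_stump(x, y - pred)
        pred += lr * np.where(x <= s, lv, rv)
        models.append((s, lv, rv))
    return f0, models, pred
```

The learning rate lr shrinks each boost, which is the usual regularization that lets many weak stumps combine into an accurate predictor.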

Kernel Density Estimation. Kernel density estimation (KDE) is a nonparametric method to estimate
density F̂ of a random variable drawn from an unknown density F , independently proposed by Rosenblatt
[82] and Parzen [83]. Imagine drawing a sample (x1 , x2 , ..., xn ) from the aforementioned density F and
subsequently organizing them as bins in a histogram. Some bins of the histogram reach higher than others,
depending on the distance between the sample drawings; e.g., if the values of the drawings are close
to each other and the resolution is coarse, these drawings will be added to the same bin. This means that
the histogram is non-smooth, and the kernel density estimator aims to smooth out the contribution of each
sample point xi , where i ∈ {1, ..., n}, by imposing a kernel function with a certain width on each point. The
kernel density estimator is formulated as follows:

\hat{F}_h(x) = \frac{1}{N} \sum_{i=1}^{N} K_h(x - x_i) = \frac{1}{Nh} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right),   (3.17)

where K(·) is the kernel function and h > 0 is the bandwidth, which is a smoothing parameter. Kernel K(·)
is required to be a function that integrates to one and has zero mean, e.g., uniform, triangular or normal.
Special attention is required when selecting h, since setting the parameter too low implies an undersmoothed
KDE in which sampling noise distorts F̂ . On the other hand, setting h too high means
that information from the underlying distribution F will be lost. A common method to determine the optimal
bandwidth hopt is by minimizing the asymptotic mean integrated square error (AMISE). The AMISE can
be expressed as follows [82]


\mathrm{AMISE}(h) = \frac{1}{2hN} + \frac{h^4}{36} \int_{-\infty}^{\infty} |F''(x)|^2\, dx.   (3.18)

It is important to note that the AMISE(h) depends on the second derivative of the underlying distribution
F , which is the distribution one wants to ascertain and is therefore unknown. However, it can be shown
that similar expressions can be formulated utilizing higher derivatives and that this needs only be done two
or three times before F can be assumed normal. This is however out of the scope of this paper and the
interested reader is referred to [82–84].
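Eq. (3.17) with a Gaussian kernel can be sketched as follows; as a practical stand-in for the AMISE minimization, the sketch uses Silverman's rule of thumb, which is optimal under a Gaussian reference density (an assumption of this illustration):

```python
import numpy as np

def kde(grid, sample, h):
    # eq. (3.17) with a Gaussian kernel K
    u = (grid[:, None] - sample[None, :]) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return K.sum(axis=1) / (len(sample) * h)

def silverman_bandwidth(sample):
    # rule-of-thumb bandwidth; a closed-form proxy for minimizing AMISE
    return 1.06 * sample.std() * len(sample) ** (-0.2)

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, size=1000)
grid = np.linspace(-5.0, 5.0, 501)
density = kde(grid, sample, silverman_bandwidth(sample))
```

The resulting curve integrates to one by construction, in contrast to the raw histogram it smooths.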

k-Nearest Neighbor. k -Nearest Neighbor (k -NN) is a relatively simple machine learning method and
relies on comparing an observation to k similar past observations in a training sample to create a probability
distribution [85]. The algorithm calculates the distance in a hyperspace, e.g., Euclidean distance, between
the observation and past observations to ascertain the k neighbors that are closest to the current observation.
For example, if k = 1, the algorithm would simply select the closest neighbor.
The value for k needs to be relatively high so as to reduce the overall noise and can be empirically selected
or chosen by cross-validation. Furthermore, it is common to assign more weight to past observations that
are closer to the current observation, e.g., by giving each neighbor a weight of 1/d, where d is the distance
between the observation and a neighbor.
A key aspect to consider for k-NN is that the dimension needs to be kept relatively small because of the
curse of dimensionality. This implies that the search space grows exponentially with increasing dimension,
creating significant sparsity, until clusters of observations become too far apart and statistical significance
reduces substantially.
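A distance-weighted k-NN predictive distribution can be sketched as follows; the inverse-distance weighting 1/d matches the scheme mentioned above:

```python
import numpy as np

def knn_predictive(X_train, y_train, x, k=5):
    # Euclidean distances from the query x to all training points
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]             # the k nearest neighbours
    w = 1.0 / (d[idx] + 1e-12)          # inverse-distance weights (1/d)
    w /= w.sum()
    # support points and weights of the empirical predictive distribution
    return y_train[idx], w
```

The returned support/weight pair defines a discrete predictive distribution from which quantiles or prediction intervals can be read off directly.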

Analog Ensemble. The analog ensemble (AnEn) was proposed by Delle Monache et al. [86] in 2013 and
can be seen as a hybrid approach, as it incorporates NWP forecasts, NWP past forecasts and PV power
production measurements. It shows many similarities with k -NN in the sense that the algorithm searches
for past forecasts of meteorological variables that are similar to the current forecast, under the assumption
that the errors of the past forecasts are likely to be similar to those of the current NWP forecast. Then,
the measured power production related to past forecasts that are similar to the current NWP forecast is used
to construct a density function. The similarity, or distance, is formulated as follows [86]:

\|\hat{F}_t, A_t\| = \sum_{i=1}^{N} \frac{w_i}{\sigma_{\hat{F}_i}} \sqrt{ \sum_{j=-\tilde{t}}^{\tilde{t}} \left( \hat{F}_{i,t+j} - A_{i,t+j} \right)^2 },   (3.19)

where F̂t and At are the current and analog past forecasts for lead time t of deterministic NWP models,
respectively; N , wi and σF̂i are the number of physical variables, their weights and the standard deviation
of their respective time series, and t̃ represents the half-width of the time window over which the metric is
computed, with j indexing the time steps within that window. Weights wi
can be computed by minimizing the CRPS, defined in eq. (2.43), over the training set. After the distances
have been calculated, it is possible to construct a rank of which the best n can be selected to construct the
density function [86].
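The similarity metric of eq. (3.19) can be sketched as follows; evaluating the window around the centre of the forecast trajectory is an illustrative choice:

```python
import numpy as np

def anen_distance(F_hat, A, weights, sigma, half_width):
    # eq. (3.19): F_hat and A are (N, T) arrays holding the current and
    # an analog past forecast for N physical variables over T lead times;
    # the metric is evaluated in a window of half-width t-tilde around t
    t = F_hat.shape[1] // 2
    j = np.arange(t - half_width, t + half_width + 1)
    diff2 = (F_hat[:, j] - A[:, j]) ** 2
    return np.sum(weights / sigma * np.sqrt(diff2.sum(axis=1)))
```

Ranking all archived forecasts by this distance and keeping the best n measured power values then yields the empirical density function described above.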

Delle Monache et al. [86] point out that the advantage of AnEn in comparison to an NWP-based
ensemble is the fact that AnEn only requires the physical model to run once, whereas an NWP ensemble
needs multiple runs with perturbations to construct a density function. A potential disadvantage is that no
post-processing is applied on the NWP forecasts, which have a tendency to be biased.

3.2. Physical approach


With the physical approach, it is common practice to assume a density function that describes the errors.
The reason for this is that the physical approach allows for less variation in terms of nonparametric methods,
since these require post-processing in the form of a statistical method. This is
commonly referred to as a hybrid method and is elaborated upon in Section 3.3.

3.2.1. Parametric
The parametric approach relies on modeling the errors of the forecast method, in this case a physical
model, as a certain density function, e.g., normal, beta or gamma. This can be achieved in several manners.
For example, Lorenz et al. [8] modeled the forecast errors as a normal distribution and subsequently the
error was assessed for dependency on the clear sky index and solar zenith angle. This dependency was then
modeled as a fourth order polynomial, after which the future errors can be estimated.
Another approach is that by Fonseca Jr. et al. [87], where both a normal and a Laplacian distribution
were assumed, after which the prediction interval limits were calculated in which the forecast can be found
with a certain predefined probability.

3.2.2. Nonparametric
As stated before, there is little variation in terms of possible methods for the nonparametric approach.
In fact, currently one method is being utilized, which is called ensemble forecasting. This method relies on
multiple runs of an NWP model with minor perturbations in the initial and boundary conditions, designed
to be statistically identical. In this way, several deterministic forecasts are produced, after which a density
forecast can be constructed from these. A disadvantage of this method is that running an NWP model is
computationally demanding, especially several runs with perturbations.

3.3. Hybrid approach


As stated before, NWP models have limited capability to construct density functions for probabilistic
forecasts. Similarly, forecasts based on sky imagery also lack the capability to produce a PDF. Therefore,
hybrid approaches exist where post-processing of a physical approach is done by a statistical approach in
order to remove bias and construct a density function.
Chu et al. [88] proposed a hybrid approach based on sky imagery and five statistical models, i.e., four
ANNs and one support vector machine (SVM) to predict mean DNI and the associated standard deviation
and classify variability periods, respectively. The variance was utilized to construct prediction intervals,
under the assumption that these were normally distributed.
Another example is the work done by Sperati et al. [89], who utilize an ensemble forecast of the NWP
model by the European Centre for Medium-Range Weather Forecasts (ECMWF) Ensemble Prediction Sys-
tem (EPS), after which an NN is used to reduce bias and to create a PDF, in combination with two other
post-processing techniques, which will be elaborated upon in the following section.

4. Review sorted on temporal horizon

The present section reviews the most recent studies regarding PSPF and PLF, in combination with several
other studies that are deemed important to the field of forecasting. The section is organized according to
forecast horizon, since the horizon is a prominent feature of a model. In addition, this division allows us to
distinguish studies from each other at a higher resolution than, for example, a division based upon parametric or
nonparametric approaches. Furthermore, Table 1 provides an overview of the studies under review in this paper, and is
sorted in a chronological order with respect to the year it was published.
4.1. Intra-hour
Intra-hour forecasting is generally based on statistical methods although in the case of PSPF, it can
also be achieved through sky imagery, as can be seen in Figure 1. Regarding load forecasting, intra-hour
forecasting is rather uncommon, which is due to the coarse resolution of consumption data. However, recent
developments of smart meters permit measuring at higher temporal resolution, which allows reduced forecast
horizons.

Solar. The study that works on the shortest forecast lead time is by far the work done by Torregrossa et al.
[90]. They argued that since solar irradiance is extremely variable on a sub-second time scale, it is interesting
to find bounds of solar irradiance, which in turn might aid real-time control of smart grids. The proposed
method, named dynamic interval predictor (DIP), works on the premise that a significant correlation can
be found between the derivative of solar irradiance and the deterministic forecast error, and that prediction
intervals can be estimated based on the aforementioned correlation. An advantage of this method is that
it is not dependent on the deterministic forecast model. However, one poor assessment of Torregrossa
et al. is that they state that "all the works presented in the domain of PV forecasting assume a Gaussian
innovation", while they refer to Bacher et al. [91], in which QR has been utilized to construct nonparametric
prediction intervals. Nevertheless, the authors proposed an innovative method that worked with a temporal
resolution of 250 and 750 ms and a lead time of 2 to 6 seconds, which showed good performance in terms of
the coverage probability. Depending on the resolution and horizon, PICP was between 97.94 - 99.92%. A
measure was utilized to quantify the width of the prediction intervals, but unfortunately not PINAW as
defined in eq. (2.40). Rather than taking all observations into account, the authors excluded the forecasts
that fell outside the prediction interval.
As a continuation on the work above, Scolari et al. [92] worked on enhancing the performance of the DIP
model by applying two main improvements. Firstly, the authors found that defining the error as an absolute
error between forecast and measurement yielded more realistic results than a relative error, since the latter
could be misleading at times of low measured alternating current (AC) power. Secondly, they clustered the
correlation between the derivative of the measured AC power and deterministic forecast error as a function
of the AC power itself. The authors went on to show that indeed the absolute error performed better than
the relative error, where the former method achieved PICP consistently higher than the nominal confidence
level. Finally, they showed that the inclusion of clustering led to better performance, most notably leading
to lower values for PINAW (0.0055 - 0.24%) and CWC (0.055 - 0.24%) for lead times of 100 ms to 500 ms,
respectively.
The authors of the study above continued working on the (very) short time horizon in [93]. Similarly,
clustering by means of the k-means algorithm, was performed to group observations of the clear sky index
into specific clusters, depending on the values of explanatory variables. However, in contrast to the previous
studies the derivatives were not taken into account in this paper. Clustering was done for both the original
and differentiated clear-sky index time series, where the latter was done to further stationarize the original
time series. After clustering, the prediction intervals could be calculated for each cluster. In order to
guarantee computational performance, the aforementioned calculations, i.e., clustering and calculation of
the prediction intervals, can both be done off-line. Upon running the model, observations of explanatory
variables were compared with the clusters and the prediction interval corresponding to the closest cluster
was returned. The results showed the effectiveness of the model: for a forecast horizon and resolution of 500
ms, the proposed model achieved a PICP between 96.1 - 98.2%, PINAW between 0.047 - 0.27% and CWC
between 0.047 - 0.27%, depending on the season. When the forecast horizon was increased and the temporal
resolution decreased to 1 minute, the model achieved a PICP between 96.9 - 97.8%, PINAW between 3.26
- 10.5% and CWC between 3.26 - 10.5%, depending on the season. Finally, for a forecast horizon and
temporal resolution of 5 minutes, PICP was found to be between 96.1 - 96.7%, PINAW between 6.70 - 17.9%
and CWC between 6.70 - 17.9%. It is interesting to note that the PICP results presented here were achieved
at a nominal confidence level of 95% and can therefore be considered valid.
Wan et al. [94] proposed a method to forecast power generation of a 10 kW PV system in Denmark with
a forecast horizon of 5 minutes and identical temporal resolution. The utilized method was based on QR
and an extreme learning machine (ELM), which is a form of a feedforward NN. The advantages of ELMs
are that input weights are randomly selected and weights between hidden nodes and output are learned
in a single step. Therefore, it is effectively similar to a linear system, which significantly reduces training
time. Good results were achieved in terms of coverage rate, with a Score of -0.0222, while outperforming all
benchmark models, i.e., persistence, bootstrap based NN (BNN) and granule computing (GC).
A hybrid approach was utilized by Chu et al. [88] to forecast direct normal irradiance (DNI), which
worked with a forecast lead time of 5 to 20 minutes and a temporal resolution of one minute. The method
was based on sky imagery, SVMs and ANNs sub-models with the aim to produce real-time prediction
intervals. First, sky images were analyzed based on the ratio of red intensity and blue intensity, since cloud
pixels tend to have higher red intensity than clear sky pixels. Then, SVM was utilized to classify sky images
and DNI time series into two categories: days with high and low variability. Finally, two ANNs were trained
for both categories: one that predicts irradiance and one that predicts the standard deviation, for which a
normal distribution was assumed. The results showed superior performance in terms of CWC on all fronts
when compared to benchmark models, i.e., persistence and BNN, most notably on days with high variability,
where CWC was between 0.554 and 8.733.
Chai et al. [95] approached the problem of constructing prediction intervals of a highly variable time
series by segmenting the time series into uniform time windows with lower and upper bounds, and utilizing
all acquired granular time series as inputs to a random vector forward link (RVFL) network. Forecast lead
time was 10 minutes, with a resolution of 1 minute. Similarly as with the LUBE method, the authors realized
that higher coverage probability, i.e., high reliability, can be achieved by increasing prediction interval width.
Therefore, they solved this problem by minimizing the average coverage error and the Score by PSO. Results
revealed that the PICP and PINAW were 91.20% and 16.94%, respectively, and it was shown that reliability
was significantly improved during periods with high variability, when compared to a model earlier
proposed by the authors.
In order to forecast solar irradiance, David et al. [70] proposed to utilize ARMA and GARCH models in
combination with recursive estimation of the parameters to construct prediction intervals in a parametric
manner, under a normality assumption. The forecast horizon was 10 minutes, as was the resolution. Since
the time series should be stationary, the authors used the clear sky index rather than GHI. The recursive
estimation of the parameters is based on a recursive least square (RLS) and is used to incorporate short-term
patterns such as hurricanes, that have a profound impact on irradiance. The results showed an improvement
in terms of CRPS of 7.8% to 25.1% when compared to a persistence ensemble. However, the authors note
that although a normal distribution has been assumed, this is not a valid assumption, which in turn caused
the model to be overconfident under certain conditions.
Another ELM in combination with QR was employed by Golestaneh et al. [44] for PSPF with lead
times up to one hour. The authors selected ELM because of its extremely fast learning mechanism and
utilized PSO to ascertain the optimal weights of output nodes with respect to the skill score. Furthermore,
the forecast window of each day was limited to certain hours, the number of which was kept constant
throughout the year. As benchmarks a persistence, climatology, hybrid intelligent algorithm (HIA) and
bootstrap ELM (BELM) were applied, where HIA stems from advances in PWPF to find nonparametric
predictive densities. As case studies, two separate sites were investigated with 10 minute and one hour
forecast horizons at minutely resolution. In addition, k -fold cross validation was utilized to determine the
optimal value for the number of lags that were incorporated. It was shown that the benchmarks BELM and
persistence performed well in terms of quantile score but lacked reliability, which deviated up to 20% in case
of BELM. Furthermore, climatology and HIA performed poorly in terms of both sharpness and reliability,
whereas the proposed method achieved high reliability while having acceptable sharpness. Quantitatively,
the proposed method improved performance over persistence in terms of quantile score with 4% to 14%.
Boland [96] applied the coupled autoregressive and dynamical system (CARDS) to forecast solar radiation
on three sites in the French West Indies by utilizing partial correlations between these sites to improve
forecasts, assessed on 10 minute and hourly scale. However, it was found that correlation on the highest
resolution was insignificant, while being small but significant on the hourly time scale. The method works
as follows: First, the power spectrum is modeled by using a Fourier series, after which the contribution
of the model is subtracted from the data, leaving the residual series that is subsequently to be modeled
with the CARDS approach. Due to the correlation, the author took into account lagged measurements of
a single site, but also those of the other two sites, on the hourly time scale. Then, a similar approach was
taken to model the variance with an autoregressive conditional heteroscedastic (ARCH) model, under the
assumption that the errors were normally distributed. Unfortunately, no probabilistic performance metrics
were utilized to assess the performance of the proposed method.
In order to construct a probabilistic forecast of PV generation forecasts, Wang & Jia [97] proposed a
nonparametric model based on radial basis function (RBF) for the deterministic forecast and the LUBE
method for the prediction intervals. The forecast horizon was one hour, with a temporal resolution of
15 minutes. To improve training of the model, the authors organized historical data based on a similar
day approach, in which the sample was constructed based on seasonal type, day type and atmospheric
temperature. Similarities of the latter were based on calculation of the Euclidean distance. The RBF
network was selected due to the fact that it is a feedforward network and therefore does not require the
backpropagation method to train it, which increases learning speed. Although the method utilized in this
paper is promising, no probabilistic performance metrics were utilized.
Chu & Coimbra [98] aimed at predicting DNI by utilizing k-NN, with forecast horizon of 5 to 20 minutes
at 1 minute resolution. In this case, k was set to 30 and the neighbors were weighted based on distance
between them and the observation. A key aspect for k-NN is to reduce the dimensionality as much as possible,
as explained in Section 3.1.2, and therefore the authors used lagged DNI observations as endogenous inputs
and lagged DHI and sky image features as exogenous inputs. The results showed that the k-NN ensemble
outperformed both the persistence ensemble and k-NN with assumed Gaussian distribution. The authors
reported, for a nominal confidence level of 90%, PICP between 0.93 - 0.96 and PINAW between 0.22 - 0.57
for a 5 minutes horizon, and PICP between 0.91 - 0.93 and PINAW between 0.31 - 0.70 for a 20 minutes
horizon. The model outperformed the benchmarks as well in terms of CRPS, reportedly achieving 0.031 -
0.098 for a 5 minute horizon and 0.049 - 0.137 for a 20 minute horizon. Unfortunately it is not clear whether
the units of the aforementioned results are W/m2 or kW/m2 , since the former would yield very impressive
results although these would not be in line with reported RMSE.

Load. As stated before, it is rather unusual to forecast electricity demand with an intra-hour, or even intra-
day, horizon. Therefore, only two studies performed on this horizon will be reviewed in this section. The
first one is by Bracale et al. [99]. In this study, a stochastic time series approach in combination with
Bayesian Inference (BI) is taken to create probabilistic forecasts on horizons of 15 minutes and 24 to 48
hours. Moreover, several density functions were utilized to construct prediction intervals for a single domestic
load and an aggregate of five domestic loads, depending on whether the time series was differenced (normal
distribution) or not (Weibull or Log-Normal distribution). The proposed model uses measurements and
a prior PDF of the parameters in combination with ARIMA forecast of the mean to derive a conjugate
distribution of the prior PDF by means of BI, in order to establish a predictive posterior distribution of the
domestic load. The results showed an improvement of 27-31% when compared to probabilistic persistence.
Furthermore, it was shown that the method that assumed normal distribution provided the best reliability,
with a maximum deviation from the ideal reliability of under 3%.
The second one is the study by Guan et al. [100], which forecasts the load on an hourly horizon with a
temporal resolution of 5 minutes, implying 12 forecasts, one for each 5-minute interval during the following hour. In
order to achieve that, the authors decomposed load data into three components at different frequencies, to
be used in three wavelet NNs (WNNs). In addition, calendar variables were used as inputs for the WNNs,
so as to assist these in recognizing periodical patterns of load data. The WNNs were then trained by hybrid
Kalman filters, which have as one of the outputs an innovation covariance that can be used to derive prediction
intervals. From the covariances, the variance estimates can be attained and added together, by orthogonality of
the frequencies, to ascertain the overall variance, under the assumption of a normal distribution. Although no
probabilistic performance metrics were used to assess the prediction intervals, the authors showed that the
normality assumption is only valid after removing the tails, as these were heavier than those of a Gaussian
distribution.

4.2. Intra-day
Forecasting of solar power and electricity demand with an intra-day horizon is common since generally
two markets exist where energy is traded: intra-day and day-ahead. Therefore, intra-day forecasting of both
these aspects is important to balance production and consumption. As we will reveal, the majority of the
methods that will be reviewed in this section rely on statistical methods, since physical models tend to be
too coarse in terms of temporal resolution.

Solar. Bracale et al. [101] proposed a BI approach in combination with an AR linear model, of which the
aforementioned study [99] is a continuation, in order to forecast PV power production with a 1 - 3 hour
horizon and 1 hour temporal resolution. However, in this study the authors utilized a modified Gamma
distribution to model the clearness index distribution, of which the only unknown is the mean clearness
index at the next time step, which is estimated by the AR model. Unfortunately, no probabilistic metrics
were used to assess the prediction intervals.
One of the few studies analyzed in this review that utilized satellite observations is the one by Bilionis
et al. [102], in which they employ a recursive Gaussian Process (rGP). As a first step, in order to reduce
the dimensionality of the satellite images, they employ factor analysis (FA), which is a generalization of
probabilistic principal component analysis (PCA). The general idea of reducing the dimensionality is to
construct two maps: a reduction map and a reconstruction map, where the former has as few dimensions as
possible without losing too much information. Subsequently, the authors applied the rGP to learn the
dynamics of the reduced input space to perform iterative predictions with a lead time of 8 hours and 30
minute resolution. Although the predictive density is not Gaussian anymore due to the nonlinearity of the
reduced dynamics, numerical methods could still be applied to produce a predictive density. The results
showed that the proposed satellite-based method performs slightly worse than the ground-based model in
terms of the one-step-ahead forecast but outperforms it for larger time horizons, with an average CRPS of
0.18, although it is not directly clear which unit the CRPS has in this case.
A statistical method to forecast a full density of solar irradiance for a horizon and resolution of one
hour was proposed by Grantham et al. [103]. The method was based on the CARDS model in combination
with bootstrapping and a map of solar positions, with the aim to show how a deterministic forecast can
be transformed into a nonparametric probabilistic forecast. The authors stated that irradiance depends on
cyclical, autoregressive and error components, of which the latter is assumed to be caused by solar position.
Therefore, by plotting the residuals of the in-sample forecasts against sun hour angle and sun elevation,
the authors organize the systematic variation in the variance. It should be noted that a similar approach
is taken by Lorenz et al. [8], although Lorenz et al. assumed a normal distribution, whereas Grantham et
al. take a nonparametric approach. In order to assess the performance, the authors used the CRPS, which
showed an improvement of 10% over the benchmark ensemble. In addition, the proposed method produced
narrower prediction intervals than the benchmark model, in combination with higher coverage rates.
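The core idea, turning a deterministic forecast into a nonparametric probabilistic one by resampling residuals conditioned on solar position, can be illustrated with a simplified sketch. The data are synthetic, and a single solar-elevation binning stands in for the hour-angle/elevation map of Grantham et al.:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic in-sample observations, deterministic forecasts and solar elevations.
elev = rng.uniform(5, 85, 1000)
obs = 900 * np.sin(np.radians(elev)) + 40 * rng.standard_normal(1000)
fc = 900 * np.sin(np.radians(elev))          # deterministic forecast
resid = obs - fc

# Bin residuals by solar elevation so the error distribution varies with sun position.
edges = [30, 60]
bins = np.digitize(elev, edges)              # three elevation classes

def predictive_quantiles(point_fc, elev_new, q=(0.1, 0.5, 0.9), n_boot=2000):
    """Bootstrap binned residuals around a point forecast -> nonparametric quantiles."""
    pool = resid[bins == np.digitize(elev_new, edges)]
    samples = point_fc + rng.choice(pool, size=n_boot, replace=True)
    return np.quantile(samples, q)

q10, q50, q90 = predictive_quantiles(500.0, 45.0)
```

Because the residual pool is conditioned on solar position, interval width automatically adapts to the systematic variance structure the authors identified.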
In order to perform risk assessment for distribution networks with high penetration of PV, Tao et al.
[104] proposed a framework where a dynamic Bayesian network (DBN) performs the probabilistic forecast.
However, since the main focus of that paper is to perform risk assessment, no attention was paid to
assessing the probabilistic forecasts in any way.
AlHakeem et al. [105] proposed a generalized regression NN (GRNN) of which the weights and biases
were optimized by PSO to perform a deterministic forecast, while bootstrap was applied to construct
prediction intervals. However, the time series of measured power production was first pre-processed by wavelet
transform (WT) to reduce noise and stationarize the time series. The forecast horizon was 1 - 6 hours at an
hourly resolution. Interestingly, rather than training the GRNN with a vast amount of data, hourly data of
15 days before the forecast were used. In addition to the decomposed time series, the GRNN was fed with
irradiance and temperature. After the GRNN had produced the forecasts for each frequency, wavelets were
reconstructed and bootstrapping could be applied. Unfortunately, only deterministic performance metrics
were used to assess the forecasts, although plots showing the prediction intervals revealed that these were
rather wide.
An interesting study is one performed by Bessa et al. [106], where the authors proposed a method that
combines distributed PV production measurements in both a vector autoregressive (VAR) and a VAR with
exogenous inputs (VARX) framework to forecast with a horizon of 6 hours at an hourly resolution. Two
levels of aggregation were used for the measured data: household level, i.e., low voltage (LV), and secondary
level, i.e., medium voltage (MV)/LV (MV/LV). The method begins with normalization of measured solar
power by utilizing the clear-sky generation in order to stationarize the time series. Then, the models are
established so that they incorporate measurements of solar power of a particular site in combination with
lagged measurements of neighbor sites, and therefore they use both time and spatial information. However,
no details were given regarding correlation between the sites that were used. Furthermore, RLS is used to
estimate parameters of the models, which subsequently reduces the amount of data required. Finally, GB is
deployed to select predictors and construct a predictive density. The results showed that the improvement
on the secondary level of the VAR model over the AR benchmark in terms of CRPS was between 1.4%
and 5.9%, while the VARX model showed improvements up to 16.4% over the benchmark. However, the
CRPS improvement of the VAR model on the household level ranged between -2.8% and 4.6%, which was
because of poor performance in some quantiles, as explained by the authors, who stated that in some cases
information from distributed sensors decreased the forecast skill. The authors deemed this an interesting
result because "an improvement in point forecast skill is not translated to an improvement in some quantile
forecasts", which contrasts with a remark by the authors of [107] in the case of wind power forecasting, as
pointed out by the authors.
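The spatio-temporal idea of the VAR model, forecasting each site from lagged measurements of all neighboring sites, can be sketched with a one-lag VAR fitted by least squares. The data below are synthetic and assumed to be already clear-sky normalized; the RLS estimation and gradient boosting steps of the paper are omitted:

```python
import numpy as np

rng = np.random.default_rng(2)

# Normalized power at three neighboring sites (synthetic): one persistent
# cloud process drives all sites, plus local noise.
T = 300
common = np.zeros(T)
for t in range(1, T):
    common[t] = 0.9 * common[t - 1] + rng.standard_normal()
Y = common[:, None] + 0.2 * rng.standard_normal((T, 3))

# One-lag VAR: y[t] = A @ y[t-1] + e[t], fitted by least squares per equation.
X, Ynext = Y[:-1], Y[1:]
A = np.linalg.lstsq(X, Ynext, rcond=None)[0].T   # (3, 3) coefficient matrix

# Forecast all three sites one step ahead from the latest joint observation.
y_hat = A @ Y[-1]
```

Off-diagonal entries of A carry the cross-site information; setting them to zero recovers the per-site AR benchmark the authors compared against.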
The study performed by Liu et al. [108] is one of the few studies that utilizes NWP ensembles to
generate a nonparametric probabilistic forecast on the intra-day horizon, although day-ahead and 2-day
ahead forecasting is also performed. In that paper, the Weather Research and Forecasting (WRF) model
was used, due to its ability to simulate with high resolution, which is the reason that the temporal resolution
was 30 minutes. In order to create an ensemble forecast, the authors employed the lagged averaged forecast
(LAF) method, where the model produces three separate forecasts: intra-day, day-ahead and 2-day ahead.
The following day, the WRF produces forecasts for the same lead times and the intra-day forecast of this
day is combined with the day-ahead forecast of the previous day to create an ensemble. In this study, a
total of three members is used to create the ensemble, although the authors note that including more might
improve results. Furthermore, the LAF method computes each member with different initial conditions and
different initial time. The results showed that the empirical coverage rate is generally 20% to 30% lower than
the nominal coverage rate, which is likely due to the overestimation of GHI by the WRF model, as pointed
out by the authors. However, it should be noted that the coverage probability of prediction intervals should
be higher than the nominal confidence level, since these are invalid otherwise and should be discarded [65].
In light of the Global Energy Forecasting Competition 2014 (GEFCom2014), Nagy et al. [109] proposed
a method based on four ensembling techniques, i.e., voting, bagging, boosting and stacking, as previous
studies had shown that using multiple predictors tends to lead to better results, according to the authors.
The organizers of the competition provided a significant amount of data, as can be seen from Table 1.
Two models were built to construct a full predictive density for an intra-day horizon with hourly resolution: a
voted ensemble of a QRF and a stacked RF - GB decision tree (GBDT). The results showed that performance
in terms of the pinball loss gradually improved over the course of the competition, with a final result of
0.006 - 0.009, resulting in a second place in the competition. Finally, the authors noted that stacking RF
- GB led to the best results for both solar power and wind power forecasting, but that model training was
very time consuming, although no specifics were mentioned.
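The pinball (quantile) loss used to rank GEFCom2014 entries penalizes quantile forecasts asymmetrically: underestimating a high quantile costs more than overestimating it. A minimal implementation:

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Average pinball loss of predicted quantile q_pred at level tau."""
    y, q_pred = np.asarray(y, float), np.asarray(q_pred, float)
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

y = np.array([1.0, 2.0, 3.0, 4.0])
print(pinball_loss(y, np.full(4, 2.5), tau=0.5))   # 0.5
```

Averaging this loss over all requested quantile levels gives the single score reported in the competition.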
Like the previous study, the work by Juban et al. [110] also competed in GEFCom2014. However, the
aim of Juban et al. was to create a generic framework for probabilistic forecasting,
and was applied to wind, solar and price forecasting. The proposed framework approached the problem
as follows: First, a multiple QR (MQR) framework was established. Second, the most relevant predictors
were selected by the forward-stepwise procedure. Then, features were generated by means of radial basis
functions (RBFs) in order to map the non-linear relationships in the aforementioned input data. Finally,
the authors proposed an optimization method based on the ADMM algorithm to minimize the quantile loss
function in combination with ℓ2 regularization over all quantiles and all inputs and outputs in order to fit
a set of parameters that can be used to predict each quantile. Although the proposed framework did not
win the PSPF competition, ranking fifth with a pinball loss of 0.0086, its true value lay in the generality
of the framework, which resulted in top-five rankings for wind, solar and price
forecasting.
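The core of that pipeline, nonlinear RBF features followed by quantile-loss minimization with ℓ2 regularization, can be sketched as follows. Plain subgradient descent stands in for their ADMM solver, and all data, centers and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy regression data with heteroscedastic noise.
x = np.sort(rng.uniform(0, 1, 300))
y = np.sin(2 * np.pi * x) + (0.1 + 0.3 * x) * rng.standard_normal(300)

# Radial basis function features on a fixed grid of centers, plus an intercept.
centers = np.linspace(0, 1, 10)
Phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * 0.05**2))
Phi = np.column_stack([np.ones(len(x)), Phi])

def fit_quantile(Phi, y, tau, lam=1e-3, lr=0.05, n_iter=3000):
    """Subgradient descent on the pinball loss with l2 regularization."""
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        diff = y - Phi @ w
        grad = -Phi.T @ np.where(diff > 0, tau, tau - 1) / len(y) + lam * w
        w -= lr * grad
    return w

w10 = fit_quantile(Phi, y, 0.1)
w90 = fit_quantile(Phi, y, 0.9)
cover = np.mean((Phi @ w10 <= y) & (y <= Phi @ w90))
```

Fitting one weight vector per quantile level yields the set of quantile curves that together approximate the full predictive density.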
Zhang et al. [111] applied Gaussian Conditional Random Fields (GCRFs) to predict one-step-ahead solar
power with hourly resolution. GCRFs were utilized because they allow modeling of both spatial and temporal
correlations, a feature the authors applied to a city in California. Moreover, the authors investigated the fact that
GCRFs still can perform relatively well with missing data, e.g., in case of equipment failure or communication
issues. The results indicated that, in terms of RMSE and MAE, the proposed model outperformed the ARX
benchmark significantly when a moderate or substantial amount of data was missing. Since GCRFs are
able to provide a predictive density, the authors presented the PICP for several standard deviations during
different seasons, for the scenario where there was no missing data. During winter, the GCRF did not
manage to attain a PICP high enough to be considered valid, likely due to increased variability in the
weather, whereas it did for the rest of the year. Unfortunately, no other probabilistic measures were utilized.
A different approach was taken by Aryaputera et al. [112]. In their study, the authors aimed to compare
the performance of Bayesian Model Averaging (BMA) and Ensemble Model Output Statistics (EMOS)
when predicting the intra-day accumulated solar irradiance for Singapore. Since these are post-processing
techniques, the forecasts were retrieved from ECMWF, the Japanese Meteorological Agency (JMA) and
Korean Meteorological Agency (KMA). As a first step, the authors identified that a skew-normal PDF was
most suitable for both methods. Next, linear regression is utilized for both BMA and EMOS to remove
bias and the optimal number of training days is determined using the Exhaustive Search (ES) approach. In
order to assess the quality of the forecasts, the authors looked at both reliability diagrams and CRPS. In
terms of reliability, the BMA with skew-normal PDF performed best, since it showed relatively narrow
prediction intervals with low error. Furthermore, this method also achieved the lowest CRPS, with a value
of 292 Wh/m2 .
Takeda [113] took an interesting approach to forecasting solar power for a large area in Japan. The author
first identified that a bottom-up strategy, i.e., predicting PV generators separately and aggregating afterwards,
can reduce MAE by 3% when compared to a direct strategy, where one forecasts the entire aggregate
immediately [52]. However, the author also identified that smart meters are not common enough and
therefore utilities cannot track hourly PV generation accurately. Therefore, PV power generation on a local
level is estimated via weather observations and forecasts, and monthly purchased PV volumes. In addition,
monthly installed capacities were also considered as an exogenous input. In order to be able to predict and
analyze, an ensemble Kalman filter (EnKF) in combination with state-space models (SSMs) was utilized.
The reason for using SSMs in combination with EnKF, the author argued, is that statistical methods, such as
ANNs or MLR, do not provide any insight into structural changes in electricity consumption. Furthermore,
EnKF is able to estimate nonlinear SSMs. The resulting CRPS was found to be 24.06 GWh, which was 5.6
GWh lower than the MAE, indicating that the results from the ensembles are appropriate when compared
to the deterministic forecast.

Load. Almeida & Gama [114] proposed a method to construct prediction intervals based on NNs with a
lead time of 0 to 24 hours and with hourly resolution. The authors used aggregated load demand from 45
substations to which different types of consumers were connected. They argued that since many different
load profiles existed, these needed to be clustered so as to improve forecast performance. Clustering was
performed through the Kullback-Leibler distance, since the Euclidean distance poses difficulties when working
with less stable data, such as residential load. In order to create nonparametric prediction intervals, two
separate approaches were employed. The first was the dual perturb and combine method (DPC), where
predictions are made using slightly perturbed data. The second approach was conformal prediction (CP),
which looks at past data to determine the level of confidence in future predictions, under the assumption
that the data was identically and independently distributed (i.i.d.). The inputs of the multi-layer perceptron
(MLP) were calendar variables and past values of the load curves, belonging to a certain cluster. From the
results it appeared that the DPC method showed more consistent performance regarding PINAW over
all clusters than CP, with an average of 20%. In addition, the reliability diagrams showed that coverage
probability reduces significantly in case of a cluster where load demand is quite variable. Unfortunately, only
one plot stating the PICP was given, and it was not mentioned at which confidence levels the PICPs of 63%
and 96% were achieved.
4.3. Day-ahead
In case of PSPF, forecasts with lead times of day-ahead and longer are mainly performed with the use of
an NWP model, after which a statistical post-processing technique is applied to create prediction intervals,
reduce bias and generally improve results. This contrasts with PLF, where no physical models have been
utilized, as can be seen from Table 1. However, in the latter case temperature is often taken into account,
since that is an important driver of electricity consumption, especially when heating is electric.

Solar. In order to take inertia of meteorological systems into account, Golestaneh et al. [115] realized that
spatio-temporal dependencies have to be considered, since PV power generation on an aggregated scale
shows strong dependencies in time and space. Therefore, time series with an hourly resolution of three
neighboring PV sites were studied, in combination with output variables of the NWP model provided by
the ECMWF. To begin with, the authors applied QR to construct predictive marginal densities for each
location and lead time. In order to model the dependency between the locations at different lead times, a
Gaussian copula was employed, which is a form of a multivariate distribution. The copula can then be used
to generate several scenarios, i.e., trajectories, of future power generation at a specific location and lead
time. The CRPS scores of individual zones showed a profile resembling PV power generation, where the error
was highest at noon or late in the morning, amounting to 9% to 11% of nominal production. Reliability of
the copula forecast was assessed through the PIT histogram, which showed a near-uniform distribution; the
trajectories can therefore be regarded as statistically consistent with the predictive distributions.
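Scenario generation with a Gaussian copula reduces to a few lines: latent correlated Gaussians are mapped to uniforms and then through the inverse CDF of each marginal. The normal marginals and the correlation value below are placeholders for the quantile-regression marginals and fitted correlation of the paper:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Marginal predictive distributions for two lead times (illustrative normals).
marginals = [stats.norm(50, 8), stats.norm(45, 10)]

# Correlation matrix of the Gaussian copula linking the lead times.
R = np.array([[1.0, 0.7],
              [0.7, 1.0]])

# Sample latent Gaussians, map to uniforms, then through each inverse CDF.
z = rng.multivariate_normal(np.zeros(2), R, size=1000)
u = stats.norm.cdf(z)
scenarios = np.column_stack([m.ppf(u[:, i]) for i, m in enumerate(marginals)])
```

Each row of `scenarios` is one trajectory whose marginals match the per-lead-time densities while respecting the imposed temporal dependence.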
One of the first papers that considered PSPF, and to the best of the authors their knowledge, the first
paper to consider nonparametric PSPF, was written by Bacher et al. [91]. In their study, the authors used
measurements of power production from 21 PV systems to predict power generation up to 36 hours ahead
with an AR and AR with exogenous inputs (ARX) model, where NWP variables were inputs to the latter
model. Since the aforementioned models were constrained by stationary time series, measurement data was
normalized by the clear sky power production, which was found by statistical smoothing through weighted
QR, the weights of which were determined by a two-dimensional Gaussian smoothing kernel. In addition,
RLS in combination with an adaptive linear model for PV power generation was employed to convert NWP
forecasts, in order to account for changes in conditions, such as dirt on panels. Furthermore, QR was
employed to construct predictive densities for several quantiles but unfortunately no probabilistic measures
were utilized to assess the performance of the forecast. In fact, this is quite common for older studies and
we will discuss this in Section 5.
Another notable paper that was one of the first to consider PSPF was the study performed by Lorenz
et al. [8]. The aim was to provide probabilistic forecasts for 11 dispersed PV systems in southern Germany
up to three days ahead at an hourly resolution, based on the NWP model from the ECMWF. However,
since both temporal and spatial resolution were too coarse, three methods were introduced to increase
resolution. The first method included spatial averaging and linear temporal interpolation, which was shown
to reduce rRMSE significantly for clear sky days. The second method was based on replacing the forecasts of
ECMWF by the clear sky irradiance for clear sky days, i.e., when total cloud cover (tcc) was lower than
0.03. The final method was based on removing systematic deviations between forecast and measurements,
in which the bias was modeled by a polynomial function dependent on zenith angle and clear sky index and
then subtracted from the NWP forecast. By using ensembles of PV sites, the authors clearly showed the
reduction in error of an ensemble with increasing region size, although they noted that increasing the number
of sites will eventually cause saturation and not lead to additional error reduction. Finally, the prediction
intervals were established in a parametric manner, by assuming normal distribution of the forecast errors,
where the standard deviation is modeled by a fourth order polynomial function dependent on zenith angle
and clear sky index. Furthermore, to establish the prediction intervals for ensembles, the standard deviation
of a single site was multiplied with the error reduction factor. To assess the prediction intervals, the relative
standard error was taken into consideration, and an empirical coverage rate of 91% was achieved for the 95%
nominal confidence level. It was noted that this was likely due to the fact that the error reduction factor
was kept constant over all meteorological conditions.
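The construction of such parametric intervals is straightforward once the error standard deviation has been modeled: evaluate the fitted function at the forecast conditions and scale by the normal quantile. The polynomial form and coefficients below are hypothetical, not those fitted in [8] (which used a fourth-order polynomial):

```python
import numpy as np
from scipy import stats

# Hypothetical fitted model giving the error standard deviation (W/m^2)
# as a function of cosine of zenith angle and clear-sky index kstar.
def sigma_model(cos_zen, kstar, coeffs=(5.0, 40.0, 60.0)):
    c0, c1, c2 = coeffs                      # illustrative coefficients, not fitted
    return c0 + c1 * cos_zen + c2 * kstar * (1 - kstar)

fc = np.array([420.0, 310.0])                # point forecasts
sig = sigma_model(np.array([0.8, 0.5]), np.array([0.9, 0.6]))
z = stats.norm.ppf(0.975)                    # 95% central interval under normality
lower, upper = fc - z * sig, fc + z * sig
```

For an ensemble of sites, the paper multiplied the single-site standard deviation by the error reduction factor before applying the same construction.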
Almeida et al. [116] investigated the potential of nonparametric probabilistic forecasting with QRFs in
combination with NWP inputs for five PV plants in northern Spain. In order to deal with the spatial and
temporal uncertainties caused by WRF runs for different locations and different consecutive WRF model
runs, respectively, the authors took forecasts for nearby locations into account. However, no relation could
be found to describe the link between forecast errors in the consecutive runs. Furthermore, three different
training sets were constructed in order to assess which method would be most effective. The first training set
was simply N previous days, whereas the second training set was constructed depending on the similarity
between clearness index of the day to be predicted and that of the days in the database. The final training
set was based on the similarity of the entire distribution of the forecast and days in the database, and it
was found that this method achieved the best results. In addition, a multitude of scenarios were devised
to assess the relative impact of different input variables, and the authors noted that including predicted
and calculated irradiance data led to better results. More importantly, they concluded that increasing the
number of NWP variables was no guarantee for improved accuracy. A final interesting result was that the
length of the training set had no substantial impact on performance, as long as the time series was longer
than 15 days. Unfortunately, no probabilistic metrics as defined in Section 2.5.2 were used.
An AnEn based on a historical set of NWP forecasts was compared to QR and a PeEn by Alessandrini et
al. [9] for three sites in Italy, representing different climate zones. As these methods have been thoroughly
evaluated in Section 3, we will immediately discuss the results here. As a first measure, statistical consistency
of the three methods was compared by means of the rank histogram, which is a tool to assess if the ensemble
members are statistically indistinguishable from the observations. AnEn showed superior performance over the other
methods and produced a more reliable predictive density. Surprising, however, was the underdispersive
behavior of QR, which forecast without enough spread. The authors noted that this was likely
due to the optimization process, which set the forgetting factor relatively low in favor of CRPS. Consequently,
QR shows good performance on average in terms of CRPS, although at times it performs worse than AnEn
and PeEn, mainly during periods with low solar elevation. The authors pointed out that this is likely due
to low correlation between past NWP forecasts and observations of power generation at those periods, which
is better dealt with by AnEn, as it only takes into account certain past NWP forecasts, rather than all of
them.
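The AnEn principle, searching past NWP forecasts similar to the current one and using the corresponding observations as ensemble members, reduces to a nearest-neighbor query. The sketch below uses synthetic data and plain Euclidean distance on normalized predictors; the predictor weighting and trend term of the full AnEn metric are omitted:

```python
import numpy as np

rng = np.random.default_rng(5)

# Historical NWP forecasts (GHI, cloud cover) and matching power observations.
hist_nwp = rng.uniform([0, 0], [1000, 1], size=(500, 2))
hist_obs = (0.8 * hist_nwp[:, 0] * (1 - 0.5 * hist_nwp[:, 1])
            + 20 * rng.standard_normal(500))

def analog_ensemble(nwp_now, k=20):
    """Return past observations whose NWP forecasts were closest to today's."""
    mean, std = hist_nwp.mean(0), hist_nwp.std(0)
    d = np.linalg.norm((hist_nwp - mean) / std - (nwp_now - mean) / std, axis=1)
    return hist_obs[np.argsort(d)[:k]]

ens = analog_ensemble(np.array([800.0, 0.2]))
p10, p90 = np.quantile(ens, [0.1, 0.9])
```

The empirical quantiles of the analog observations directly form the nonparametric predictive density, with no distributional assumption.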
Le Cadre et al. [117] proposed an ELM to construct parametric prediction intervals with a 30 minute
resolution for a region in the south of France. By using experts from several stations in the region and
assessing which expert provides the most valuable information, the authors could discriminate against inputs
that could potentially introduce bias into the model. This was achieved by introducing a loss function for
each station in the region, after which weights were determined in order to select valuable experts, i.e.,
stations, and discard obsolete ones. They found that precipitation measurements in combination with PV
power generation provided the highest accuracy with a PICP of 0.916 and PINAW of 0.098, whilst utilizing
data from 8 out of 13 stations in the region.
In a similar manner as is done with AnEn, Yamazaki et al. [118] proposed prediction interval estimation
at an hourly resolution based on k-NN in order to look for similar historical events in the database. This
study is an improvement of the study in [119]. The k neighbors were selected by means of the Euclidean
distance between the query point and historical data. However, the main difference between
AnEn and this method is that the predictive density was estimated by means of KDE rather than combining
past forecasts to construct a predictive density. The Gaussian function was utilized as kernel to estimate
the density function that best described the probabilistic relation between historical data and the current
observation. However, the presented setup showed significant bias for the highest quantiles due to the fact
that the kernel functions did not have any information regarding the distance between the query point
and neighbors, and consequently weighted all neighbors equally to construct the predictive density. In order
to improve performance, the authors introduced weighted kernel density estimation by means of a tricube
function that assigned weights to neighbors in a non-linear fashion. Furthermore, the standard deviation of
the Gaussian function was also adjusted based on the Euclidean distance, so that the nearest neighbors were
assigned a lower standard deviation. Bias was significantly reduced and reliability improved. Unfortunately,
no quantitative metrics were used to assess the prediction intervals.
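The weighted KDE step can be sketched directly: tricube weights computed from the neighbor distances scale each Gaussian kernel's contribution to the density. All numbers below are illustrative:

```python
import numpy as np

# Power observations (kW) of the k nearest historical neighbors and their
# distances to the query point (normalized to (0, 1]).
neighbors = np.array([480.0, 505.0, 520.0, 460.0, 550.0])
dist = np.array([0.1, 0.3, 0.5, 0.7, 0.9])

# Tricube weights: nearby analogs contribute more to the density.
w = (1 - (dist / dist.max()) ** 3) ** 3
w /= w.sum()

def weighted_kde(x, h=15.0):
    """Weighted Gaussian KDE over the neighbor observations."""
    x = np.atleast_1d(x)[:, None]
    k = (np.exp(-0.5 * ((x - neighbors[None, :]) / h) ** 2)
         / (h * np.sqrt(2 * np.pi)))
    return (k * w[None, :]).sum(axis=1)

grid = np.linspace(400, 600, 201)
dens = weighted_kde(grid)
```

The distance-dependent bandwidth adjustment of the paper would additionally shrink `h` for the closest neighbors; a fixed bandwidth is used here for brevity.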
Similarly to the study by Yamazaki et al. [118], Fonseca Jr. et al. [87] utilized the Euclidean distance
to identify similarities between input data of hourly forecast of the previous 60 days and input data of
the present forecast. The authors found that 42 hours needed to be selected to construct good prediction
intervals. Their method was based on support vector regression (SVR) that was utilized to organize input
data, in combination with calculated extraterrestrial insolation and grid-point value forecasts with a
mesoscale model (GPV-MSM), where the values of the forecast hour and the preceding hour of the latter two were
used as inputs. The prediction intervals were modeled with both Gaussian and Laplacian distributions. As a
benchmark, the authors adopted an unconventional method that cannot be deemed effective. It consists
of calculating the maximum and minimum power output of the PV system, given a forecast for a certain
hour, and this would then provide a nominal coverage level of 100%. Evidently, the width of these intervals
would be so large that they were not practical, as also pointed out by the authors, but if the proposed
method were to perform like the benchmark, it would have to be discarded. The results indicated that the Gaussian
distribution had a tendency to underestimate the forecast error coverage for low confidence levels and that
a Laplacian assumption of the error distribution showed more resemblance to the ideal curve. The authors
did not quantify the width of the prediction intervals in terms of PINAW but normalized the width by the
nominal capacity of the PV system. They found that for a confidence level of 95%, interval widths were
0.28 and 0.25 for Laplacian and Gaussian distributions, respectively. PICP was found to be 97.1% to 98.2%,
depending on the confidence level.
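Why a Laplacian error assumption can match empirical coverage better than a Gaussian one is easy to demonstrate: fitted to the same data, the heavier-tailed Laplace distribution yields wider high-confidence intervals. The errors below are drawn from a Laplace distribution purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
err = rng.laplace(0, 30, 2000)               # heavy-tailed forecast errors (kW)

# Fit both candidate error distributions by matching their scale to the data.
sigma = err.std()                            # Gaussian scale
b = np.mean(np.abs(err - np.median(err)))    # Laplace scale (MLE)

level = 0.95
g_lo, g_hi = stats.norm.ppf([(1 - level) / 2, (1 + level) / 2], 0, sigma)
l_lo, l_hi = stats.laplace.ppf([(1 - level) / 2, (1 + level) / 2], 0, b)

# Empirical coverage of each assumed-distribution interval.
g_cov = np.mean((err > g_lo) & (err < g_hi))
l_cov = np.mean((err > l_lo) & (err < l_hi))
```

For heavy-tailed errors the Gaussian interval undercovers at high confidence levels, mirroring the tendency the authors observed.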
As participants of GEFCom2014, Huang & Perry [120] predicted hourly power generation of three PV
plants with ECMWF NWP data. Since the temporal resolution was one hour, the authors decided to create
a model for each hour where radiation was higher than zero. To de-trend the data, the annual cycle of both
irradiance and power time series were modeled using a low-pass filter based on a Fourier transformation. Then,
GB was employed to create the deterministic forecast, by building a model for each plant and each hour. To
account for spatial and temporal correlation caused by the fact that the plants are adjacent to each other,
predictors of all plants were utilized as input of the GB. k -NN was applied to create predictive densities by
looking for similar scenarios, where k was empirically set to 200. A satisfactory quantile score of 0.0121 was
achieved, although it is important to note that the model took 13 minutes to finish on a 256-core parallel
platform.
An extensive and interesting study into the behavior of several data-driven methods that used NWP
data from ECMWF and WRF as input, was performed by Pierro et al. [121]. The aim was to investigate to
what extent a multi-model ensemble (MME) could outperform the best performing member of that ensemble.
The potential members of the ensemble were a seasonal autoregressive integrated moving average with
exogenous inputs (SARIMAX), a SVM and two different MLPs of which one utilized several NWP variables
(called RHNN) and one that only used NWP GHI and temperature, in combination with a clear sky model
(GTNN). The MLPs were created using an optimization procedure described in [122], that effectively creates
a large ensemble that is able to outperform a single MLP. As a first step, the aforementioned models were
assessed with deterministic metrics individually, where it was found that GTNN with ECMWF input was
the outperforming model, with a skill score of 42.5%, as defined in eq. (2.37). Furthermore, it was found
that the outperforming models used ECMWF rather than WRF as input, even though that sometimes led
to increased bias. The best performing MME was found to be the one that utilized all NWP input data,
i.e., ECMWF and WRF, and all data-driven models, which improved RMSE by 6.3% when compared to
the best performing member. The most important reason for this is that averaging reduces the noise of single
predictors. In order to provide predictive densities, a normal distribution was assumed and a
similar approach as Lorenz et al. [8] was taken to assess these. Although no quantitative measures were
used to assess the prediction intervals, it could be seen that these were reliable although too wide, especially
for low confidence levels. An interesting conclusion concerned the capability of the machine learning model
to correct bias in the NWP model and, accordingly, that performance is not always related to the accuracy of
the NWP model.
Sperati et al. [89] applied the ensemble prediction system (EPS) of the ECMWF to provide probabilistic
forecasts for a 0 - 72 hour horizon. The EPS creates its ensemble by running the model several times with
perturbed initial conditions but is known to be under-dispersive, which is why post-processing needs to
be applied. First, a NN was employed to reduce bias and construct a predictive density, after which two
statistical post-processing techniques were applied and compared. The first method estimates the variance
deficit (VD) that causes the EPS to be under-dispersive, whereas the second method, i.e., EMOS, minimizes
CRPS of the ensemble data set. A wide variety of metrics was utilized to assess performance. Statistical
consistency was assessed with rank histograms and although improvement over the PeEn was achieved by
smoothing out the histograms, the proposed models with VD and EMOS were still over-confident, i.e.,
they showed lack of spread and produced too narrow prediction intervals. Furthermore, the BSS, with
PeEn as reference, showed that the proposed models performed better than PeEn, except for low solar
angles, i.e., when there is less correlation between forecasts and observations, possibly due to shading. In
terms of reliability, both methods outperform PeEn, while sharpness is slightly better. Finally, both methods
significantly outperform PeEn in terms of CRPS, where both peak at 8% to 9% when normalized by nominal
power (NP).
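The rank histogram used for these consistency checks simply counts where each observation falls among its ensemble members: a flat histogram indicates statistical consistency, while a U shape reveals the lack of spread discussed above. A synthetic sketch:

```python
import numpy as np

rng = np.random.default_rng(8)

def rank_histogram(obs, ens):
    """Count of each observation's rank among its ensemble members."""
    ranks = (ens < obs[:, None]).sum(axis=1)
    return np.bincount(ranks, minlength=ens.shape[1] + 1)

# Consistent ensemble: members and observations share one distribution -> flat.
ens = rng.standard_normal((5000, 9))
obs = rng.standard_normal(5000)
flat = rank_histogram(obs, ens)

# Under-dispersive ensemble: too little spread -> U-shaped histogram,
# observations frequently fall outside all members.
narrow = 0.3 * rng.standard_normal((5000, 9))
u_shaped = rank_histogram(obs, narrow)
```

The over-confidence of the VD and EMOS models would show up here as inflated counts in the two outermost bins.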
The main focus of the study by Bracale et al. [123] was not to create the most competitive forecast
model, but to propose new cost-based indices. These indices were designed to take economic consequences
of forecasts into account as well. The forecast model was based on BI and interestingly did not take NWP
forecasts into account but rather linked the estimated mean value of GHI and clearness index, modeled by
Beta and Gamma distributions respectively, to measurements of meteorological variables. Regarding the
cost-based metrics, the authors proposed extending the CRPS by multiplying it with CEt/CEmax, where CEt
and CEmax represent the economic value and maximum economic value of energy at hour t, respectively. They
argued that since the price of energy is variable, the economic consequence of forecasts will vary, which is
important to take into account by "forecast consumers", e.g., utilities. The selected hours to forecast ranged
from 07:00 am to 08:00 pm, which remained constant throughout the study. Furthermore, the authors
selected the most influential variables through cross-correlation in order to reduce the computational burden
and found that these varied for the same site, based on the response variables, i.e., solar irradiance or clearness
index. The cost-based CRPS was consistently lower than CRPS, which should be the case according to its
definition. In addition, the model that was based on clearness index showed 25% improvement in terms of
CRPS over persistence, whereas the model based on solar irradiance actually performed significantly worse,
with an increase of 52% of CRPS when compared to the persistence. Although the authors did not go into
detail as to why this might be the case, it can be explained by the fact that the CDFs for particular hours
showed larger spread and therefore less sharpness than the model based on clearness index.
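The proposed cost-based index can be sketched with the sample-based CRPS estimator, each hourly score multiplied by the ratio of that hour's energy price to the maximum price. Data and prices below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(9)

def crps_ensemble(obs, ens):
    """Sample-based CRPS estimator: E|X - y| - 0.5 E|X - X'|."""
    term1 = np.abs(ens - obs).mean()
    term2 = 0.5 * np.abs(ens[:, None] - ens[None, :]).mean()
    return term1 - term2

# Hourly ensembles, observations and energy prices (synthetic).
hours = 6
price = np.array([20.0, 25.0, 60.0, 80.0, 55.0, 30.0])   # EUR/MWh
obs = rng.uniform(100, 500, hours)
ens = obs[:, None] + 40 * rng.standard_normal((hours, 50))

crps = np.array([crps_ensemble(obs[t], ens[t]) for t in range(hours)])
cost_crps = crps * price / price.max()       # weight each hour by its price ratio
```

Since the price ratio never exceeds one, the cost-based score is bounded above by the plain CRPS, consistent with the behavior the authors reported.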
Davò et al. [124] took a different approach and aimed at forecasting daily solar irradiance for Oklahoma,
USA. Since this approach entailed a vast amount of data from 11 NWP ensemble members for 144 grid
points and daily irradiance measurements from 98 sites, principal component analysis (PCA) was employed
to reduce the amount of variables. PCA assesses the correlated data and looks for linear combinations
of the observed variables, which are then converted to a set of linearly uncorrelated variables, ordered
according to their variance. The authors went on to show that PCA significantly reduced computational
time by roughly 90% for both the deterministic (NN) and probabilistic (AnEn) cases. Furthermore,
deterministic performance for both NN and AnEn greatly benefited from PCA, while NN showed slightly
better performance than AnEn. Unfortunately, NN was not utilized for probabilistic forecasting, e.g., by
means of the LUBE method, and therefore only AnEn was assessed, but it would have been interesting to
see if the results from the deterministic forecasts can be translated to the probabilistic one. Nevertheless,
AnEn in combination with PCA proved to be reliable, whilst producing sharp prediction intervals, albeit
that reliability showed high variance around the median. In addition, CRPS normalized with maximum
observed radiation energy density (MED) varied between 0.03% and 0.06% depending on the season, where
spring showed the highest CRPS due to increased variability.
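The dimensionality-reduction step can be illustrated with a small sketch; the matrix sizes below are invented, not those of the Oklahoma dataset, and PCA is computed via an SVD of the centered predictor matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical predictor matrix: 300 days x 40 correlated NWP-derived features,
# generated from three latent drivers plus a small amount of noise.
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 40))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 40))

Xc = X - X.mean(axis=0)                     # center before PCA
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)             # variance ratio per component

k = 3
scores = Xc @ Vt[:k].T                      # reduced predictors (300 x 3)
print(scores.shape, round(float(explained[:k].sum()), 3))
```

The components are ordered by explained variance, so truncating to the first few retains most of the information while shrinking the regressor set, which is what drives the reported speed-up.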
The study performed by Zamo et al. [52] considered the longest forecast horizon, i.e., 66 hours, in
combination with a coarse temporal resolution of daily averages. The aim was to assess the performance
and characteristics of several statistical methods that were fed with ensemble NWP data to construct
nonparametric predictive densities. However, rather than using the ensemble NWP directly to provide
densities, first the QR and QRF models were trained with the unperturbed NWP member, which was
subsequently used to produce a control forecast. Then, the control model forecast quantiles for all members, after which an empirical CDF was computed for each member. Finally, these CDFs were averaged and the quantiles were read back from the averaged CDF, so as to acquire the averaged forecast. The results showed that
the improvement over the benchmark, i.e., climatology model, ranged from 25% to 50%. Interestingly, of
the total of eight QR-based forecast models, none appeared to perform consistently better than the others.
Furthermore, similar as in previous studies discussed in this paper, the rank histograms showed that the
corrected forecasts were under-dispersive, i.e., did not show enough spread. A final interesting note was
that the authors could not be certain whether or not including all members of the ensemble significantly
improved predictive performance.
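The CDF-averaging procedure can be sketched as follows; this is a minimal numpy illustration with two hypothetical members, using linear interpolation as a stand-in for the empirical CDFs.

```python
import numpy as np

def averaged_quantiles(member_quantiles, probs, grid_size=500):
    """Average the empirical CDFs implied by per-member quantile forecasts,
    then read the requested quantiles back off the averaged CDF.
    member_quantiles: (n_members, n_quantiles) array, rows nondecreasing."""
    q = np.asarray(member_quantiles, dtype=float)
    grid = np.linspace(q.min(), q.max(), grid_size)
    # Each member CDF: probability as a function of value on a common grid.
    cdfs = np.array([np.interp(grid, row, probs, left=0.0, right=1.0) for row in q])
    mean_cdf = cdfs.mean(axis=0)
    # Invert the averaged CDF back to quantiles.
    return np.interp(probs, mean_cdf, grid)

probs = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
members = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],    # hypothetical member A
                    [2.0, 3.0, 4.0, 5.0, 6.0]])   # hypothetical member B
avg = averaged_quantiles(members, probs)
print(np.round(avg, 2))
```

Averaging the CDFs of two shifted members yields quantiles lying between those of the individual members, e.g., a median between the two member medians.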
Similar to the study by Takeda [113], Saint-Drenan et al. [125] proposed a method to estimate regional
PV power production through a probabilistic approach and, although a probabilistic forecast was not made in this study, it would be possible to retrieve probabilistic information. The motivation for this method was
twofold: Firstly, the authors highlighted that using NWP variables as input to a forecast model of a set
of reference PV plants in a region is suboptimal, because it does not make use of all information produced
by the NWP model. Secondly, that an error may occur when selecting a set of reference PV plants, for
which it is therefore proposed to utilize statistical data on the parameters of the PV plants. In order to find
these parameters, a sensitivity analysis was performed where a compromise was made between minimizing
the amount of information needed and maximizing model accuracy. As a result, it was found that the
two orientation angles were most valuable. Then, to estimate the relative occurrence of these parameters, a
database with 35,000 PV plants was used, after which the plants were binned by nominal capacity, since a clear relationship between capacity and orientation was found. These relative occurrences
would then serve as the joint probability distributions of the parameters, and a relatively simple PV power
model could then be utilized to estimate power production. The results showed that the model on average
performed worse than the utilities, with an error that was on average 0.5% higher. However, the authors noted that they did not intend to reduce the forecast error but to calculate the aggregated power generation of a region, and this case study served to validate the model.
Chai et al. [126] utilized KDE and copula to not only forecast for a specific time horizon, but also
inform about the interdependent relationships between output power and forecasts of all intermediate time
horizons. The reasoning behind the aforementioned method is that the correlation between forecast error
and observation has a certain time-dependent impact on the overall uncertainty, which should be taken into
account. Here, copula is used to establish the interdependence between measurement and forecast from
KDE. As performance metric, the authors applied the interval score which, similar to the Winkler score, tests for both sharpness and reliability, although the assessment was done in such a way that it is not possible to quantify performance with a single number.
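For reference, the interval score for a central (1 - alpha) prediction interval charges the interval width plus a penalty of 2/alpha per unit the observation falls outside; a small numpy sketch with made-up intervals and observations:

```python
import numpy as np

def interval_score(lower, upper, obs, alpha):
    """Interval (Winkler-type) score for a central (1 - alpha) prediction
    interval: width plus a penalty of 2/alpha per unit of exceedance.
    Lower is better; it rewards both sharpness and reliability."""
    lower, upper, obs = map(np.asarray, (lower, upper, obs))
    width = upper - lower
    below = (2.0 / alpha) * np.clip(lower - obs, 0.0, None)
    above = (2.0 / alpha) * np.clip(obs - upper, 0.0, None)
    return width + below + above

# Hypothetical 90% intervals for three load observations.
s = interval_score(np.array([90.0, 95.0, 80.0]),
                   np.array([110.0, 105.0, 100.0]),
                   np.array([100.0, 112.0, 85.0]), alpha=0.1)
print(s)  # covered cases pay width only; the miss at 112 adds 20 * (112 - 105)
```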
Load. Liu et al. [49] proposed an interesting methodology in which several deterministic forecasts are
shaped into a probabilistic forecast using QR averaging (QRA). The deterministic forecasts were created by
so-called sister models, i.e., regression models that have similar structure but were run with different time
lags and different training data lengths. Furthermore, these regression models possessed the recency effect,
mentioned in Section 2.4. Then, QRA was applied, which minimized the quantile loss function for quantile q based
on all point forecasts to estimate the optimal set of parameters. Four different training data lengths were
utilized and these were applied with a rolling scheme, meaning that parameters were updated. In selecting
the best composition of the QRA model, the authors showed that, depending on the metric used, i.e., pinball,
Winkler score (50%) or Winkler score (90%), the QRA required 7 or 8 sister models and 183 or 365 days of
calibration data. The significant advantage of the proposed methodology is that it can be used with many
point forecasting methods, using either different methods in the same QRA model or a single method with different
training schemes. The best QRA models showed a pinball score of 2.85, Winkler score (50%) of 25.04 and
Winkler score (90%) of 55.85.
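The core of QRA, fitting a combination of point forecasts by minimizing the quantile (pinball) loss, can be sketched with two synthetic sister forecasts; the brute-force grid search below stands in for a proper quantile regression fit.

```python
import numpy as np

def pinball(q_hat, y, q):
    """Average quantile (pinball) loss at quantile level q."""
    diff = y - q_hat
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

rng = np.random.default_rng(1)
y = 100 + 10 * rng.standard_normal(500)    # synthetic load
f1 = y + 5 * rng.standard_normal(500)      # hypothetical sister forecast 1
f2 = y + 8 * rng.standard_normal(500)      # hypothetical sister forecast 2

# QRA idea: regress the target quantile on the point forecasts. Here we
# brute-force the weight w and offset b of q_hat = w*f1 + (1-w)*f2 + b
# that minimize the pinball loss for the 90th percentile.
q = 0.9
loss, w, b = min(((pinball(w * f1 + (1 - w) * f2 + b, y, q), w, b)
                  for w in np.linspace(0, 1, 21)
                  for b in np.linspace(-20, 20, 81)),
                 key=lambda t: t[0])
print(round(loss, 3), w, b)
```

Because the pinball loss is minimized at the true conditional quantile, the fitted combination shifts the unbiased point forecasts upward (b > 0) to approximate the 0.9 quantile.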
Although studies (see e.g., [43, 44]) have provided evidence that assuming a distribution to represent
errors is often invalid, or at least sub-optimal, Xie et al. [68] attempted to approach assumptions regarding
a predictive density from another angle. Rather than trying to prove whether the assumption is invalid or
not, the authors aimed at trying to improve the quality of probabilistic forecasts by assuming a Gaussian
distribution. Among the regression models used in the paper is the Vanilla model described in Section 2.4.
These models depend on temperature and therefore, in order to create predictive densities, 30 years of
historical data was used to create 30 weather scenarios, which in turn were used to create 30 deterministic
forecasts with which the required quantiles could be computed. It was shown that the normality assumption
is indeed invalid; however, when residuals were grouped based on calendar variables, the passing rate of the Kolmogorov-Smirnov (KS) test was significant. Then, additional simulated Gaussian residuals were added to check if the higher passing
rate led to better quantile scores, which led to interesting conclusions. First, if the underlying model has
poor accuracy, this method helps to improve forecasts, but this is negligible if the underlying model showed
good accuracy. Second, no trend could be discovered between the KS test and quantile score, which led to
the conclusion that a higher passing rate is not indicative of a better grouping option. In order to assess the
validity of their conclusions, ANNs were also employed and showed similar results [68].
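The grouping idea can be reproduced on synthetic data: residuals drawn from an hour-dependent Gaussian fail a KS test for normality when pooled, yet the per-hour groups largely pass. This sketch uses scipy and an invented volatility pattern, not the paper's data (and, strictly, estimating the mean and standard deviation makes the nominal KS p-values conservative).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical residuals whose spread depends on hour of day: pooled they
# form a non-Gaussian scale mixture, while each hourly group is Gaussian.
hours = np.repeat(np.arange(24), 200)
sigma = 1.0 + 4.0 * (hours >= 18)          # evenings are more volatile
resid = rng.normal(0.0, sigma)

def ks_passes(x, level=0.05):
    """KS test of standardized residuals against N(0, 1)."""
    z = (x - x.mean()) / x.std()
    return stats.kstest(z, "norm").pvalue > level

pooled = ks_passes(resid)
per_hour = np.mean([ks_passes(resid[hours == h]) for h in range(24)])
print(pooled, round(float(per_hour), 2))
```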
One of the few studies that investigated PLF at the household level with the use of smart meter data
was performed by Taieb et al. [72]. The forecasting method was based on boosting additive QR, which
combines additive QR with GB under the additivity assumption of quantiles, i.e., that one may add several
models by e.g., GB to estimate each quantile. In the paper, both aggregated and individual demand profiles
were considered to generate forecasts for each hour one day ahead. Apart from taking demand profiles
into account, temperature profiles from a nearby airport were also taken into consideration because of the
high correlation between temperature and electricity consumption. Furthermore, since individual demand
was often close to zero, the authors performed a square root transformation in order to guarantee nonzero
forecasts. It was shown that the benchmark that was conditioned on the period of day, i.e., taking into
account calendar variables, showed significant improvement over the unconditional benchmark, which led
to the confirmation of the importance of these variables. Moreover, for increased forecast horizon, that
benchmark and QR showed similar performance in terms of CRPS, which was indicative of the importance
of these variables over lagged ones. Finally, QR outperformed the benchmark with normality assumption on
disaggregated scale by producing wide enough predictive densities to cover the volatile demand. However,
at the aggregated level QR was shown to lack sharpness.
Another study that considered smart meter data was performed by Arora & Taylor [53], who based
their nonparametric approach on conditional kernel density (CKD) estimation. However, contrary to the
previous study discussed on smart meter data forecasting by Taieb et al. [72], the present authors did
not include weather variables due to their potentially limited availability and affordability. It was found that the
demand pattern of residential consumers did not change notably over the course of the week and seasonality
was therefore chosen to be daily and weekly. In order to take holidays into account, such a day was compared to the previous holiday and the previous Sunday, after which it was treated as the more similar of the two. Regarding CKD, which is an extension of KDE, the authors estimated the response variable x̂ conditioned on variable x, effectively estimating kernels in two dimensions rather than one. Furthermore,
several constructions of the CKD have been proposed in order to ascertain the most effective one, ranging from
conditioned on period of week and period of day to type of intra-day cycle. Among these methods, it was
interesting to see that in fact four methods based on CKD and one on KDE performed very similarly, where
the latter was based on taking into account the intra-day cycle. The CRPS was found to lie between 0.013
and 0.055, with the biggest error occurring during volatile time periods. Furthermore, this method also
showed the highest reliability for the six-hour forecast horizon, while for longer lead times the other methods,
as well as the HWT benchmark, showed remarkable accuracy too.
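The two-dimensional kernel construction behind CKD can be sketched directly; the demand series below is synthetic and the bandwidths are fixed by hand, whereas Arora & Taylor tuned theirs.

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def ckd(y_grid, x0, x_hist, y_hist, hx, hy):
    """Conditional kernel density f(y | x = x0): product kernels in two
    dimensions, normalized by the kernel mass at x0."""
    wx = gauss((x0 - x_hist) / hx)                        # weight per past day
    ky = gauss((y_grid[:, None] - y_hist[None, :]) / hy) / hy
    return (ky * wx).sum(axis=1) / wx.sum()

rng = np.random.default_rng(3)
x_hist = rng.uniform(0, 24, 2000)                         # hypothetical period of day
y_hist = 1.0 + 0.5 * np.sin(2 * np.pi * x_hist / 24) \
         + 0.1 * rng.standard_normal(2000)                # hypothetical demand [kWh]
y_grid = np.linspace(0, 2.5, 251)
dens = ckd(y_grid, x0=6.0, x_hist=x_hist, y_hist=y_hist, hx=1.0, hy=0.1)
mode = y_grid[np.argmax(dens)]
print(round(float(mode), 2))
```

Conditioning on the period of day shifts the density to the locally typical demand level, here the peak of the synthetic daily cycle.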
Barta et al. [127] aimed at creating a forecasting framework for national energy consumption by utilizing
open access data from the European Network of Transmission System Operators for Electricity (ENTSO-E).
In order to construct predictive densities, GB regression trees (GBRTs) were employed and benchmarked
against actual load data and forecasts provided by the countries themselves. The proposed framework
begins with collecting and storing data, after which GBRT models were built and used to forecast. The
data consisted only of lagged values and were aggregated into hourly values. In terms of point forecasts,
the proposed framework showed good performance although relative improvement over existing methods
depended on the country, as some already produced very accurate forecasts themselves, e.g., the Scandinavian and Benelux countries. Furthermore, the authors argued that it is difficult to compare against the models that achieved these results, since neither those models nor the data they used are transparent. The average pinball loss was
38.144, but could not be placed into context since only point forecasts were published by the countries.
Rather than focusing on residential electricity consumption, Kou & Gao proposed a PLF method for
energy-intensive enterprises (EIEs), specifically a 1000 MW steel plant. This method was based on the Gaussian process (GP), which assumes normally distributed noise with constant variance. However, since the assumption of homoscedastic Gaussian variance was not considered to be realistic, the authors applied the heteroscedastic GP (HGP) for this
study. Furthermore, as the computational burden of HGP is substantial, the authors sparsified the data,
creating the sparsified HGP (SHGP) model. In order to determine the most valuable data as input for the
regression model, the authors took a forward greedy approach that incorporates each predictor separately
and calculates the predictive error of the subsequent model, after which the combination with the lowest error
is selected and the predictor is removed from the candidate set. In terms of reliability, SHGP outperformed
benchmark GP and splines QR (SQR), where it was interesting to see that the latter method consistently
underestimated demand in every quantile. Finally, the sharpness of the SHGP model was lower than that of SQR, although not significantly so, leading the authors to conclude that the proposed model is a competitive alternative.
Another parametric approach was taken by Wijaya et al. [128] who extended generalized additive
models (GAM), in which the response variable depends additively on smooth functions of the regressors, to GAM2, in which a second GAM was applied to the squared residuals. This approach can be compared to modeling squared residuals using the GARCH model, and is necessary because these are rarely homoscedastic. First, the mean was estimated using a GAM, which was subsequently employed
to forecast, after which it is assessed on errors when compared to the training set. Another GAM is then
fitted on these residuals. Furthermore, after each day the proposed model is fed with data of that day so
as to assess its errors and update the parameters if necessary, by means of the online learning algorithm for
additive models. Without the online learning algorithm, the prediction intervals did not reach the nominal confidence level, which was, however, achieved once this algorithm was added. Furthermore, the authors
showed that the width of the prediction intervals was satisfactory, although width as defined in Section 2.5.2 was not utilized. In addition, no benchmark was used in this paper, which impedes comparison of relative improvement.
Two papers by Quan et al. [54, 64] considered the LUBE method in combination with NN, elaborated
upon in Section 3, to forecast electric load with one week horizon. Other applications of this method have
focused on minimizing CWC; however, the authors argued that since CWC has many parameters and is sensitive to these, the optimization problem can be solved more efficiently by minimizing the width whilst constraining
the coverage probability. The latter is chosen as the constraint since that determines the validity of the
prediction intervals [65]. To solve the optimization problem, the authors employed PSO. Furthermore, the
performance of the proposed method is sensitive to the NN structure, and therefore 100 candidate structures, i.e., up to ten neurons in each of two hidden layers, were constructed and tested for performance, using PINAW as
the assessment index. The model showed consistency over several runs with a median PICP of 90.81% to
91.03%, median PINAW of 14.52% to 36.53%, CWC of 14.52% to 36.53% and Score of 86.59 to 4725.06,
depending on the load variability per city. In addition, it outperformed benchmark models ARIMA, ES and
naive models. Finally, computational performance was very high with prediction interval construction time
under 10 ms on a desktop computer.
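The two interval metrics at the heart of the LUBE objective are easy to state; the sketch below evaluates a naive constant-width 90% interval on synthetic load, not the paper's NN-generated bounds.

```python
import numpy as np

def picp(lower, upper, obs):
    """PI coverage probability: share of observations inside the interval."""
    return float(np.mean((obs >= lower) & (obs <= upper)))

def pinaw(lower, upper, obs):
    """PI normalized average width: mean width divided by the target range."""
    return float(np.mean(upper - lower) / (obs.max() - obs.min()))

rng = np.random.default_rng(5)
load = 1000 + 100 * rng.standard_normal(1000)              # hypothetical load [MW]
lo = (load.mean() - 1.64 * load.std()) * np.ones_like(load)  # naive 90% interval
hi = (load.mean() + 1.64 * load.std()) * np.ones_like(load)
print(round(picp(lo, hi, load), 3), round(pinaw(lo, hi, load), 3))
```

The constrained formulation discussed above minimizes PINAW subject to PICP staying at or above the nominal confidence level.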
In order to improve forecast accuracy of QR and to allow it to take nonlinear relationships into account,
He et al. [129] proposed utilizing NNs in combination with QR, based on KDE. Since QR is a linear
model, whereas the relationship between regressors and regressands is more accurately modeled through nonlinear dependencies, the authors argued for this combination. This implied that each quantile
is estimated by the NN, after which the quantile function is utilized as input of the KDE to estimate and
smoothen the density function. Although the method is an interesting one, the numerical results that were presented do not allow a fair comparison between the benchmark, i.e., radial basis function QR (RBFQR), and
the proposed method. More specifically, the RBFQR achieves a PICP of 0% to 5.95% and PINAW of 1.06e-
05% to 1.24%, depending on the case study. Evidently, PINAWs near zero in combination with extremely
low PICP convey little information regarding the probability of the response variable and consequently, the
proposed model will always outperform such a benchmark. Unfortunately, the authors did not elaborate on
these statistics, especially because their method showed promising results.
As another attempt to take into account nonlinear relationships, He et al. [130] proposed to utilize a
kernel-based SVR in combination with QR, since it is difficult to solve nonlinear problems with the latter.
By introducing the kernel-based SVR, the loss function of QR can be employed in the optimization problem
instead of the complex penalty function of SVR, and the kernel was then used as a similarity function to
approach the nonlinear dependency of the input vector. Finally, in order to quantify and take into account
the correlation between the input variables, i.e., electricity price and power load, the authors utilized copula
theory. The interesting premise in this study was indeed the correlation between real-time price and power
load, which showed substantial interdependency. The amount of training data depended on the
lead time, i.e., ten days of training data to forecast one day ahead and 25 days of training data to forecast
4 days ahead. For day-ahead forecasting, the results showed that the proposed models performed well in
terms of PICP and PINAW, with PICP at 100% and PINAW between 15.76% and 16.48%, depending on
the selected kernel function. Furthermore, when the real-time power price was taken into account via the
copula, PINAW was significantly reduced to 11.75% - 11.62%. The case study that considered a horizon
of 4 days showed a reduction in PICP and increase of PINAW to 82.81% - 96.35% and 23.69% - 30.65%,
respectively. Moreover, the improvement by including the real-time power price was not as substantial as
the first case study. Finally, the authors noted that real-time prices should be included, regardless of the
method that is being used.
Rather than focusing on a model, Xie and Hong [131] compared two model selection frameworks, specif-
ically for PLF. The authors argued that model selection for PLF can be done either by point error measures
(Section 2.5.1) or probabilistic error measures (Section 2.5.2), and that the former can be considered less computationally intensive but also less accurate when applied to PLF. Therefore, the authors investigated to what extent the accuracy can be improved when probabilistic error measures are utilized. As a model,
multiple linear regression (MLR) was used, which was fed multiple temperature scenarios to create proba-
bilistic forecasts. The error metrics used were MAPE for the deterministic forecast and the quantile score for
the probabilistic one. The results showed that the latter metric indeed led to better probabilistic forecasts
than when MAPE was used, but that the difference was negligible.
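The two selection criteria can be written down in a few lines; the sketch below uses synthetic data and a Gaussian predictive density around a hypothetical point forecast, and merely illustrates that the quantile score, unlike MAPE, penalizes an over-dispersed density.

```python
import numpy as np

def mape(forecast, obs):
    """Mean absolute percentage error: a deterministic selection criterion."""
    return float(np.mean(np.abs((obs - forecast) / obs)) * 100)

def quantile_score(quantile_forecasts, probs, obs):
    """Pinball loss averaged over quantile levels: a probabilistic criterion."""
    total = 0.0
    for q, q_hat in zip(probs, quantile_forecasts):
        diff = obs - q_hat
        total += np.mean(np.maximum(q * diff, (q - 1) * diff))
    return float(total / len(probs))

rng = np.random.default_rng(2)
obs = 500 + 50 * rng.standard_normal(365)            # synthetic daily load
point = obs + 20 * rng.standard_normal(365)          # hypothetical point forecast
probs = np.arange(0.1, 1.0, 0.1)
z = np.array([-1.2816, -0.8416, -0.5244, -0.2533, 0.0,
              0.2533, 0.5244, 0.8416, 1.2816])       # standard normal quantiles
qs_sharp = quantile_score([point + zi * 20 for zi in z], probs, obs)   # sd matches error
qs_wide = quantile_score([point + zi * 100 for zi in z], probs, obs)   # over-dispersed
print(round(mape(point, obs), 2), round(qs_sharp, 2), round(qs_wide, 2))
```

Both densities share the same MAPE (the point forecast is identical), but the quantile score separates them, which is why it is the more informative selection criterion for PLF.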
The following papers have been published in light of GEFCom2014 and are discussed here in no particular order. First, Gaillard et al. [132] also utilized an extension of GAM, i.e., quantile GAM (quantGAM), to
participate in, and eventually win, GEFCom2014. First, GAM was used to fit the mean, capturing non-linear relationships between the mean and the regressors. Then, the smooth functions that were
the result of a minimization process to find the mean were used as regressors to train the QR in order to
get the estimate of each quantile. Since electricity demand depends heavily on the time of day, the authors
split the time series up into 24, one for each hour of the day, and consequently, 24 different models were fitted. Furthermore, due to the influence of temperature and its uncertainty, the forecast was split into two stages: temperature as a function of the time of year was predicted first, after which a forecast of the
load conditional on temperature was performed. Finally, the latter was averaged over the former in order
to acquire the final forecast model. The proposed method achieved first place in the competition, with a
pinball score of 3.98 to 10.73 depending on the month, where summer months clearly showed better results
due to less variability. Furthermore, a significant improvement was achieved over the benchmark, although it was not specified which model was utilized as the benchmark.
Xie & Hong [133] proposed a framework for PLF in which pre-processing, forecasting and post-processing
were performed. The pre-processing step was initialized by cleansing the data with the Vanilla benchmark
model. This was done by training the model with all available data and subsequently calculating the
absolute percentage error (APE) between observation and prediction. If the error was greater than 50%,
this observation would be replaced by the prediction, which was the case for 0.05% of the data. The second
part of pre-processing was involved in weather station selection, i.e., which anonymous weather station
would provide the most valuable information. The first forecast was performed by multiple linear regression
(MLR), i.e., the Vanilla benchmark, after which the residual forecast was the average of four methods, i.e.,
unobserved component models (UCMs), ES, ANN and ARIMA. The final forecast was then the combination
of the first forecast and the averaged forecast. Post-processing was based on the paper by Xie et al. [68],
where it was found that the normality assumption for residuals may improve probabilistic forecasts and
therefore this method was applied here. The results showed that post-processing of residuals led to an
improvement in terms of the quantile score, where it was interesting to see that in that category, on average
the benchmark actually performed best with a quantile score of 7.908, thus outperforming the model that
combined the benchmark and the four aforementioned models. Unfortunately, the authors did not mention why
this had occurred.
A nonparametric approach was taken by Mangalova & Shesterneva [134], based on fitting a sequence of
Nadaraya-Watson estimators. In order to optimize the basic model, the authors minimized the quantile score
to find the optimal bandwidth of the kernel. To further improve the proposed model, temperature was utilized
as an input variable, although they found no significant improvement of the forecast. Quantile scores ranged
from 3.93 to 12.74, depending on the month. During summer months, when there is typically less variability,
the quantile score was on average lower. The main advantage of this model is that predictive densities can
be acquired that depend on only one parameter. Finally, the authors concluded that a modification of the
transformation into quantiles that was implemented after the competition led to substantial improvements
in accuracy.
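The Nadaraya-Watson estimator itself is compact; this sketch omits the bandwidth optimization via the quantile score and simply fixes h on synthetic data.

```python
import numpy as np

def nadaraya_watson(x0, x, y, h):
    """Kernel-weighted local average:
    m(x0) = sum_i K((x0 - x_i)/h) * y_i / sum_i K((x0 - x_i)/h)."""
    w = np.exp(-0.5 * ((x0 - x[:, None]) / h) ** 2)   # Gaussian kernel weights
    return (w * y[:, None]).sum(axis=0) / w.sum(axis=0)

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 1000)
y = np.sin(x) + 0.2 * rng.standard_normal(1000)       # hypothetical noisy signal
x0 = np.array([np.pi / 2, 3 * np.pi / 2])             # query points
est = nadaraya_watson(x0, x, y, h=0.3)
print(np.round(est, 2))
```

The bandwidth h is the single parameter on which the predictive density depends, which is the property highlighted above.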
Dordonnat et al. [135] utilized a GAM temperature dependent deterministic load model, for which the
anonymous weather stations were selected based on the generalized cross-validation (GCV) score. Then,
an AR model was built for temperature deviations from the moving average and employed to generate N
samples, which were in turn plugged into the deterministic load model to acquire N load samples. Finally, the
deterministic load model prediction samples were compared with observations to assess errors and quantify
uncertainty, and quantiles were derived thereof. The authors pointed out that the model that performed best in terms of MAPE, a deterministic metric, did not necessarily perform best when assessed with the quantile score, which is a clear sign that performance metrics cannot be interchanged. The best performing model achieved an average quantile score of 7.37, a score that placed Dordonnat et al. in the top five of the competition.
A method based on least absolute shrinkage and selection operator (lasso) estimation was proposed by
Ziel & Liu [136]. A VAR model with load and temperature as input was selected, with the extension of
adding thresholds that represent piecewise linear functions. The reason for this addition was the nonlinear
relationship between load and temperature. However, to reduce the number of potential threshold functions
and consequently computation time, the lasso algorithm was applied that only selects significant nonlinear
impacts. Since the forecast horizon is one month with hourly resolution, the previous 1200 lags, i.e., 1200 h,
were taken into account. Furthermore, eight coefficients were selected to be time-varying, thus reflecting the
most important ones regarding seasonality and reducing parameter space as much as possible. In contrast,
the authors assumed the residuals to be homoscedastic, an assumption that rarely holds true in PLF. The
proposed model outperformed the Vanilla benchmark in terms of the quantile score with an average of 7.44
over the competition.
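The variable-selection effect of the lasso can be demonstrated with a bare coordinate-descent sketch (soft-thresholding on synthetic data; the paper's piecewise-linear threshold functions and time-varying coefficients are not reproduced here).

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for the lasso: cyclically soft-threshold each
    coefficient; a sufficiently large lam zeroes out weak predictors."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]      # partial residual
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

rng = np.random.default_rng(6)
X = rng.standard_normal((200, 10))                    # ten candidate predictors
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(200)  # two are real
beta = lasso_cd(X, y, lam=50.0)
print(np.round(beta, 2), int(np.sum(np.abs(beta) > 1e-8)))
```

Only the two informative coefficients survive the thresholding (slightly shrunk toward zero), mirroring how the lasso keeps only the significant nonlinear impacts in the VAR model above.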
The final paper on PLF that has been published in light of GEFCom2014 that we include in our review
is written by Haben & Giasemidis [137]. The authors built on the work by Arora & Taylor [53] by applying
CKD conditioned on period of the week and on temperature. However, the decay parameter used here was
symmetrical, meaning that similar days of the year were incorporated into the forecast. Furthermore, QR was also included to create hybrid models, so as to ascertain the relative improvement of adding it. It was
found that CKD conditioned on temperature performed best for day-ahead forecasts, but poorly for longer
horizons. This was due to inaccurate temperature forecasting. In addition, combining different CKDs with
QR led to the conclusion that QR contributed much of the improvement seen in those combinations, and was
found to be the best non-hybrid forecast on average. Unfortunately, no average quantile score was stated
but estimating from the presented graph leads to a score of approximately 8.
Takeda et al. [138] used the EnKF in combination with SSMs to predict and analyze electricity load
for an area that covers Tokyo and its surroundings. It should be noted that the authors did not perform
a probabilistic forecast, but that the main author later studied the EnKF for PV power generation and therein did in fact perform a PSPF [113]. The reason for using SSMs in combination with the EnKF, the authors
argued, is that statistical methods, such as ANNs or MLR, do not provide any insight into structural changes
in electricity consumption. In order to further increase the accuracy, the authors applied lasso and MLR.
Since the results did not show significant difference between applying lasso or MLR, they suggested to use
lasso because of its ability to avoid over-fitting. In terms of MAPE, the EnKF + lasso model achieved a score of 1.87% and although it outperformed an MLR model currently in use by the utility, it did not outperform
the second MLR model of the utility.
4.4. Comparison between PSPF and PLF


Table 1 presents an overview of the papers that have been discussed in the previous sections. It aims to
provide an overview of the most important aspects that accompany probabilistic forecasting. In
addition, figs. 2 and 3 present an overview of the approaches that have been taken in the reviewed papers,
[Figure 2 consists of two pie charts with the following shares. PSPF: Regressive 24%, Other 22%, Hybrid-statistical 19%, Hybrid-physical 14%, ANN 14%, Physical 7%. PLF: Regressive 38%, ANN 24%, Other 24%, GAM 14%.]

Figure 2: Overview of forecasting techniques used in the reviewed studies.

[Figure 3 consists of two pie charts with the following shares. PSPF: Day-ahead 41%, Intra-day 33%, Intra-hour 26%. PLF: Two days or more 48%, Day-ahead 33%, Intra-day 10%, Intra-hour 9%.]

Figure 3: Overview of lead times in the reviewed studies.

[Figure 4 consists of two pie charts with the following shares. PSPF: Nonparametric 69%, Normal 26%, Beta 3%, Gamma 2%. PLF: Nonparametric 73%, Normal 23%, Weibull 4%.]

Figure 4: Overview of assumed distributions in the reviewed studies.

together with the lead times that were considered. From fig. 2 it can clearly be seen that regressive methods
are the most utilized ones for both PSPF and PLF. More specifically, in case of PSPF, 25% of the papers
take either a physical or hybrid-physical approach, which is a relatively small share. The main reason for this can also be deduced from fig. 3, from which it can be seen that the majority of researchers investigated intra-day and intra-hour horizons, for which a statistical approach is preferred. In addition, statistical approaches
have been employed for day-ahead forecasts as well. In case of PLF, regressive methods are preferred due to
their simplicity and the fact that PLF usually considers aggregated demand, implying that the time series
is smoother than in case of individual demand. This leads to the conclusion that PSPF and PLF could be
combined to perform a net demand forecast, if e.g., resolution and horizon are identical. Another observation
[Figure 5 consists of two pie charts with the following shares. PSPF: 1 h 55%, Other 17%, 10 min 10%, 1 min 9%, 30 min 7%, 3 h 2%. PLF: 1 h 77%, 15 min 9%, 30 min 9%, 5 min 5%.]

Figure 5: Overview of temporal resolutions in the reviewed studies.

[Figure 6 consists of two pie charts with the following shares. PSPF: Site 40%, Region 28%, Several sites 23%, Municipality 7%, Country 2%. PLF: Municipality 50%, Region 14%, Neighborhood 14%, Several houses 13%, Country 9%.]

Figure 6: Overview of spatial resolutions in the reviewed studies.

that can be made is that lead times are longer in case of PLF, which is mainly due to the fact that the
reviewed methods considered calendar variables, i.e., utilized the highly repetitive character of electricity
demand.
The extent of differences in data requirements between PSPF and PLF depends on the method that is
used and the correlations between input data that can be exploited. For example, the literature study has
shown that there is significant correlation between temperature and electricity demand, something that is
often exploited in case of PLF, as shown in Table 1. Although there is also correlation between temperature
and irradiance, it is less evident and therefore, in order to reduce the number of input variables, this relationship
is usually ignored and other variables such as precipitation may be regarded as more valuable. However,
if one utilizes a method that is suitable for both PSPF and PLF in combination with variables that are
correlated, both can be predicted simultaneously via net demand forecasting. In addition, performance
assessment with metrics defined in Section 2.5.2 allows for direct comparison between PSPF and PLF.
Although this is usually not necessary in case of separate PSPF and PLF, it can be useful to assess and
compare this due to the differences in variability and subsequent difference in forecast accuracy.
It can be seen from fig. 4 that the nonparametric approach is dominant in both PSPF and PLF. However,
the assumption of a normal distribution is still substantial, which is generally utilized in combination with
autoregressive methods such as AR or ARIMA models, or physical models. The normality assumption is inherent to autoregressive methods, since their errors are assumed to be normally distributed. It
is interesting to note that more recent papers generally do not make assumptions regarding the distribution
and take a nonparametric approach. Therefore, if one were to look at papers on probabilistic forecasting over a longer historical time span, a larger share of those papers would have taken a parametric approach. In essence, this
development is desirable since the assumption of a distribution is generally not appropriate [43, 44].
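The practical consequence of the parametric assumption can be illustrated with a small synthetic sketch in pure Python (all numbers artificial): for skewed forecast errors, a symmetric interval derived from a fitted normal distribution misallocates its coverage between the tails, whereas empirical quantiles recover the nominal tail probabilities by construction:

```python
import math
import random
import statistics

random.seed(0)
# synthetic, right-skewed forecast errors with (population) mean zero; PV power
# errors near the production limits are similarly non-Gaussian
residuals = [random.lognormvariate(0.0, 1.0) - math.exp(0.5) for _ in range(5000)]

# parametric route: fit a normal distribution and form a symmetric 95% interval
mu, sigma = statistics.fmean(residuals), statistics.stdev(residuals)
param_lo, param_hi = mu - 1.96 * sigma, mu + 1.96 * sigma

# nonparametric route: read the 2.5% and 97.5% empirical quantiles directly
srt = sorted(residuals)
emp = lambda p: srt[min(int(p * len(srt)), len(srt) - 1)]
np_lo, np_hi = emp(0.025), emp(0.975)

# fraction of observations below / above each interval
tail = lambda lo, hi: (sum(r < lo for r in residuals) / len(residuals),
                       sum(r > hi for r in residuals) / len(residuals))
print("parametric tails   :", tail(param_lo, param_hi))  # lower tail empty, upper too heavy
print("nonparametric tails:", tail(np_lo, np_hi))        # close to (0.025, 0.025)
```

For these skewed residuals the fitted normal places its lower bound below the smallest error that can physically occur, so the nominal 2.5% lower-tail probability is wasted while the upper tail is undercovered; the empirical quantiles avoid this by construction.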
Increasing penetration of smart meters allows researchers to increase temporal resolution and it can be
seen from fig. 5 in combination with Table 1 that this is a rather recent trend. The lion’s share of papers
is still focused on hourly resolution, which is satisfactory for the day-ahead market but rather coarse for
the increasingly important intra-day market. In this market, short-term fluctuations in both generation and
demand have to be balanced to maintain the efficiency and safety of the system, which is not possible when the
temporal resolution is too coarse. Evidently, if PSPF and PLF were to be combined into net demand
forecasting, the resolution of the data on which the statistical model is trained should be identical.
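As a sketch of this alignment step (with hypothetical 15 min load and hourly PV values), the higher-resolution series can be aggregated by block averaging before the net demand is formed:

```python
def downsample_mean(series, factor):
    """Aggregate a fixed-interval series by averaging non-overlapping blocks."""
    return [sum(series[i:i + factor]) / factor
            for i in range(0, len(series), factor)]

# hypothetical two hours of 15 min load (kW) and hourly PV production (kW)
load_15min = [2.0, 2.4, 2.2, 2.6, 3.0, 3.4, 3.2, 2.8]
pv_hourly = [1.5, 2.0]

load_hourly = downsample_mean(load_15min, 4)   # approximately [2.3, 3.1]
net_demand = [l - p for l, p in zip(load_hourly, pv_hourly)]
print([round(x, 2) for x in net_demand])       # [0.8, 1.1]
```

Only after this step do the two series share a common time base, so that a single model can be trained on their difference.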
As a final remark, we would like to point out the similarities and differences between PSPF and PLF
regarding spatial resolution with Figure 6. As the review has shown, the majority of the papers on PLF
considered aggregated electricity consumption, e.g., on municipality scale, whereas the majority of research
on PSPF has focused on single sites or power plants. Although forecasting individual load data is more
challenging, the studies that have done so managed to achieve good results. Therefore, this discrepancy
between spatial resolutions paves the way for interesting future studies into net demand forecasting where, e.g.,
the smoothing effect can be exploited to reduce the average forecast error for PSPF on city scale, whilst
taking advantage of the reduced variability of electricity consumption for PLF. Conversely, a bottom-up
approach can also be considered, in which a representative group of buildings is selected and used to
forecast production and consumption. Indeed, studies have shown that the bottom-up approach can increase
accuracy in terms of MAE by 3% [52], and therefore additional research is required to create a paradigm
on how to select a representative group of sites, both for PV plants and electricity consumers. With such a
paradigm, the amount of data required can be significantly reduced while retaining high accuracy.
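The smoothing effect invoked here can be illustrated with a small Monte Carlo sketch in Python (all numbers synthetic): when sites share only a weakly correlated component, the coefficient of variation of the aggregate output is markedly lower than that of a single site:

```python
import random
import statistics

random.seed(1)
n_sites, n_steps = 25, 2000
# each site's output = baseline + shared (correlated) weather component + local noise
common = [random.gauss(0.0, 1.0) for _ in range(n_steps)]
sites = [[5.0 + 0.3 * c + random.gauss(0.0, 1.0) for c in common]
         for _ in range(n_sites)]

cv = lambda x: statistics.stdev(x) / statistics.fmean(x)  # coefficient of variation
aggregate = [statistics.fmean(site[t] for site in sites) for t in range(n_steps)]
print(f"single site CV: {cv(sites[0]):.3f}, aggregate CV: {cv(aggregate):.3f}")
```

The site-specific noise averages out across the aggregate, while the shared component does not; the residual variability of the aggregate is therefore governed by the inter-site correlation, which is exactly what a city-scale forecaster can exploit.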
Table 1: Recent publications on probabilistic solar power and load forecasting.

Author and year | Forecast horizon | Forecast resolution | Method | Assumed probability distribution function | Variables | Results
Solar
Lorenz et al. (2007)[139] 0 - 72 h 1h NWP with post-processing for Beta Forecasts of GHI and T, and No performance metric as defined
GHI, after which the anisotropic- system characteristics in section 2.5.2 was applied to
all-sky model was used to convert assess the prediction intervals
GHI into P
Lorenz et al. (2009)[8] 0 - 72 h 1h NWP with post-processing for Normal Forecasts of GHI and T, and 91% of measured values lie within
GHI, after which the anisotropic- system characteristics the prediction intervals at
all-sky model was used to convert 95% nominal confidence level
GHI into P
Bacher et al. (2009)[91] 1 - 36 h 1h AR and ARX with NWP as Nonparametric, using Past values of P and No performance metric as defined
exogenous input QR forecasts of GHI in section 2.5.2 was applied to
assess the prediction intervals
Bracale et al. (2013)[101] 1-3h 1h AR with Bayesian parameter Gamma Clearness index, system
estimation characteristics and extra-
terrestrial solar radiation
Zamo et al. (2014)[52] 66 h Daily average QR fed with NWP outputs and Nonparametric, using Total downward solar ir- CRPS improvement of 25 - 50%
additional post-processing QR radiation flow for short wave- vs. benchmark model
lengths, total irradiation in (climatological model)
the infra-red wavelengths,
horizontal wind speed 10 m
above the ground level, air
temperature 2 m above ground
level, air relative humidity 2 m
above ground level, total cloud
cover at the vertical of a given
location, sea level pressure,
maximum solar elevation angle
for the given day and location
Bilionis et al. (2014)[102] 8h 30 min rGP Nonparametric, using Satellite images CRPS 0.18, unit unclear
rGP
Bessa et al. (2015)[106] 0-6h 1h VARX Nonparametric, using Past values of P of local and CRPS improvement of 1.4 - 5.9%
gradient boosting neighboring sites vs. benchmark model (AR)
Almeida et al. (2015)[116] Day-ahead 1h QRFs fed with NWP outputs Nonparametric, using Surface downwelling shortwave No performance metric as defined
QRFs flux, temperature at 2 m, cloud in section 2.5.2 was applied to
cover at high levels, cloud cover assess the prediction intervals
at low levels, cloud cover at mid
levels, longitude-wind at 10 m,
latitude-wind at 10 m, wind
speed module at 10 m, wind
direction at 10 m, relative
humidity at 2 m, mean sea
level pressure, visibility in air
and past AC power measure-
ments
Alessandrini et al. (2015)[9] Day-ahead 1h Analog ensemble fed with NWP Nonparametric, using GHI, cloud cover, air tempera- Normalized CRPS shows similar
outputs AnEn ture at 2 m above ground, azi- performance as benchmark
muth angle and elevation angle model (QR)
Chu et al. (2015)[88] 5 - 20 min 1 min Sky imagery with SVM and Normal Past values of DNI and sky CWC improvement of >31.25%
ANN sub-models image data vs. benchmark model
(persistence model)
Boland (2015)[96] 10 and 60 min 10 and CARDS with ARCH model Normal Past values of GHI and forecast No performance metric as defined
60 min for variance GHI of neighboring sites in section 2.5.2 was applied to
assess the prediction intervals
Le Cadre et al. (2015)[117] 0 - 24 h 30 min ELM Normal Past values of P, wind speed, PICP 0.916 and PINAW 0.098
wind direction, cloud cover,
temperature, atmospheric press-
ure, relative humidity and preci-
pitation
Wang and Jia (2015)[97] 1h 15 min LUBE using RBF Nonparametric, using Past values of P No performance metric as defined
LUBE in section 2.5.2 was applied to
assess the prediction intervals
Yamazaki et al. (2015)[119] Day-ahead 1h k -NN Nonparametric, using Surface pressure, east-west wind No performance metric as defined
KDE component, north-south wind com- in section 2.5.2 was applied to
ponent, air temperature, low cloud assess the prediction intervals
cover, middle cloud cover, high
cloud cover and hourly precipitation
Yamazaki et al. (2015)[118] Day-ahead 1h k -NN Nonparametric, using Sun altitude, day length, wind No performance metric as defined
KDE velocity, atmospheric pressure, in section 2.5.2 was applied to
temperature, humidity, cloudiness assess the prediction intervals
and rainfall
AlHakeem et al. (2015)[105] 1, 3 and 6 h 1h WT of past values of P, after Nonparametric, using Past values of P, GHI and T No performance metric as defined
which result is used as input bootstrap in section 2.5.2 was applied to
by GRNN that is optimized assess the prediction intervals
through PSO
Fonseca Jr. et al. (2015)[87] Day-ahead 1h NWP with extraterrestrial Laplacian and Gaussian Past values of P, air temperature, PICP 97.1 - 98.2%, but
insolation and SVR air relative humidity, low-level PINAW was not calculated
cloudiness, mid-level cloudiness,
high-level cloudiness and
extraterrestrial insolation
David et al. (2016)[70] 10 - 360 min 10 and Recursive ARMA and Normal Past values of GHI and clear CRPS improvement of 7.8 -
60 min GARCH sky GHI 25.1% vs. benchmark model
(persistence ensemble)
Huang and Perry (2016)[120] 24 h 1h Gradient boosting Nonparametric, using Normalized power, total column Quantile score 0.0121
k -NN liquid water, surface pressure,
total cloud cover, total column
cloud ice water content, relative
humidity, 10 m U wind com-
ponent, 10 m V wind component,
air temperature 2 m above ground
level, normalized net solar radiation
at the top of the atmosphere, nor-
malized PV simulation model, nor-
malized downward surface solar
radiation
Sperati et al. (2016)[89] 0 - 72 h 3h 1) EPS with NN and Normal Air temperature 2 m above ground CRPS/NP 8 - 9% for both
2) EMOS level, GHI, DNI, azimuth, elevation methods
angle, past values of P
Pierro et al. (2016)[121] Day-ahead 1h MME Normal Past values of P, GHI, DNI, global No performance metric as defined
plane-of-the-array (POA) irradiance, in section 2.5.2 was applied to
wind speed, air temperature and assess the prediction intervals
back of the module temperature,
relative humidity, elevation angle,
azimuth and total cloud cover
Grantham et al. (2016)[103] 1h 1h CARDS Nonparametric, using Past values of GHI CRPS improvement of 10%
bootstrap vs. benchmark model
(smart persistence)
Chai et al. (2016)[95] 10 min 1 min Granular RVFL with PSO Nonparametric, using Past values of GHI PICP 91.20% and PINAW 16.94%
granularity
Zhang et al. (2016)[111] 1h 1h GCRFs Nonparametric, using Past values of P, both endogenous PICP 78.74 - 92.75%, however,
GCRFs and exogenous width was not calculated
Tao et al. (2016)[104] 1h 1h DBN Unclear Past values of GHI, ambient No performance metric as defined
temperature and P in section 2.5.2 was applied to
assess the prediction intervals
Golestaneh et al. (2016)[115] 6 - 19 h 1h NWP Nonparametric, using Past values of P, total column ice CRPS/NP 1 - 11%, depending on
QR water, surface pressure, total cloud forecast horizon
cover, total precipitation, relative
humidity, 10 m U wind component,
10 m V wind component, air
temperature 2 m above ground level,
net solar radiation at the top of the
atmosphere, normalized PV
simulation model, downward surface
solar radiation
Golestaneh et al. (2016)[44] 10 min and 1 h 1 min ELM Nonparametric, using Past values of P, clear sky Quantile score improvement of
QR solar irradiance and past values 4 - 14% vs. benchmark model
of wind speed, humidity, (persistence)
T and solar insolation
Torregrossa et al. (2016)[90] 2-6s 250 and 750 ms Holt Winter point forecast Nonparametric, using Past values of GHI and P PICP 97.94 - 99.92%, however,
coupled with DIP correlation between width was calculated with an
forecast error of P equation that cannot be directly
and derivative of compared to PINAW
irradiance
Scolari et al. (2016)[92] 100 - 500 ms 100 ms Holt Winter point forecast Nonparametric, using Past values of P PICP 95.14 - 95.46% and
coupled with DIP correlation between PINAW 0.055 - 0.24%
forecast error of P
and derivative of P
Scolari et al. (2016)[93] 0.5 s - 5 min 0.5 s - 5 min K-means clustering Nonparametric, using Past values of GHI and variability PICP 96.1 - 98.2% and
clustering PINAW 0.047 - 17.9%
Chai et al. (2016)[126] Day-ahead 1h KDE with copula Nonparametric, using Past values of P Interval score (similar to Winkler)
KDE used, but no conclusive
quantification
Bracale et al. (2016)[123] Day-ahead 1h Linear regression with BI Normal Past values of GHI, total cloud cover, CRPS improvement of
relative humidity and clearness index -52.0 - 25.0% vs. benchmark
model (persistence ensemble)
Nagy et al. (2016)[109] Intra-day 1h Voted ensemble of QRF Nonparametric, using Total column ice water, Pinball loss 0.006 - 0.009
and stacked RF - GBDT GBDT surface pressure, total cloud
cover, total precipitation, relative
humidity, 10 m U wind component,
10 m V wind component, air
temperature 2 m above ground level,
net solar radiation at the top of the
atmosphere, normalized PV
simulation model, downward surface
solar radiation, calendar variables
Aryaputera et al. (2016)[112] Intra-day Accumulated 1) NWP with BMA Skew-Normal Average flux CRPS 292 Wh/m2
2) NWP with EMOS
Liu et al. (2016)[108] Intra-day 0.5 h NWP with LAF Nonparametric, using WRF physical model PICP 66.30%, however,
ensemble method ensembles width was not calculated
Juban et al. (2016)[110] Intra-day 1h MQR with RBF Nonparametric, using Total column ice water, Quantile score 0.0086
and ADMM model MQR surface pressure, total cloud
cover, total precipitation, relative
humidity, 10 m U wind component,
10 m V wind component, air
temperature 2 m above ground level,
net solar radiation at the top of the
atmosphere, normalized PV
simulation model, downward surface
solar radiation, calendar variables
Wan et al. (2016)[94] 5 min 5 min ELM with QR Nonparametric, using Past values of P Score -0.0222
QR
Davò et al. (2016)[124] Unclear Daily AnEn with PCA Nonparametric, using Total column ice water, CRPS/MED 0.03 - 0.06%
MQR surface pressure, total cloud
cover, total precipitation, relative
humidity, 10 m U wind component,
10 m V wind component, air
temperature 2 m above ground level,
net solar radiation at the top of the
atmosphere, normalized PV
simulation model, downward surface
solar radiation, calendar variables and
past values of GHI
Takeda (2017)[113] 1h 1h EnKF with SSMs Nonparametric, using Observations of irradiance, air temp. CRPS 24.06 GWh
EnKF and wind speed, as well as monthly
installed PV capacity
Saint-Drenan et al. (2017)[125] Day-ahead 1h PV power model with No PLF, see text Module orientation No PLF, RMSE < 5%
NWP inputs
Chu & Coimbra (2017)[98] 5 - 20 min 1 min k-NN Nonparametric, using Past values of DNI, DFI PICP 0.91 - 0.96 and
k-NN and sky imagery PINAW 0.22 - 0.70
Load
Guan et al. (2013)[100] 1h 5 min WNN trained by Hybrid Normal Past values of P No performance metric as defined
Kalman Filters in section 2.5.2 was applied to
assess the prediction intervals
Quan et al. (2014)[54] One week ahead 1h LUBE using NN and PSO Nonparametric, using Total of 16 inputs, including CWC 14.52 - 36.53%
LUBE calendar variables and past
values of P
Quan et al. (2014)[64] One week ahead 1h LUBE using NN and PSO Nonparametric, using Total of 16 inputs, including CWC 16.05 - 72.57%
LUBE calendar variables and past
values of P
Kou and Gao (2014)[140] Day-ahead 1h HGP Normal Past values of P No performance metric as defined
in section 2.5.2 was applied to
assess the prediction intervals
Almeida et al. (2015)[114] 0 - 24 h 1h MLP with DPC or CP Nonparametric, using Past values of P and PICP not clearly specified
perturbations day of week and PINAW 10 - 20%
Liu et al. (2015)[49] Day-ahead 1h QR Averaging Nonparametric, using Past values of P and weather Pinball 2.85 and Sci 55.85
QR data, and actual future
values of T
Xie et al. (2015)[68] Day-ahead 1h QR Nonparametric, using Past values of P and T, day of Quantile score 93.4 - 96.0%
QR week and month of year
Wijaya et al. (2015)[128] 24 - 48 h 1h GAM Normal Past values of P PICP 90% but formulation of
prediction interval mean width
is not identical to PINAW
Taieb et al. (2016)[72] Day-ahead 1h QR with gradient Nonparametric, using Past values of P and actual CRPS improvement of 21%
boosting QR forecast of T vs. benchmark model
(unconditional quantiles)
Arora & Taylor (2016)[53] 0.5 - 168 h 0.5 h CKD Nonparametric, using Past values of P CRPS 0.013 - 0.055
CKD
Barta et al. (2016)[127] 1 - 24 h 1h GBRT Nonparametric, using Past values of P and Pinball 38.144
gradient boosting calendar variables
Gaillard et al. (2016)[132] 0 - 168 h 1h quantGAM Nonparametric, using Past values of P, T, time of Pinball 3.98 - 10.73
QR year and day type
Bracale et al. (2016)[99] 15 min 15 min ARIMA with BI Normal and Weibull Past values of P CRPS improvement of 27 - 31%
vs. benchmark model
(probabilistic persistence)
Xie & Hong (2016)[133] One month 1h Vanilla Normal Past values of P and T, Quantile score 7.908
and calendar variables
Xie & Hong (2016)[131] One month 1h MLR Nonparametric, using Past values of P and T, Avg. quantile score 8.23
temperature scenarios and calendar variables
Mangalova et al. (2016)[134] One month 1h Sequence of Nadaraya- Nonparametric, using Past values of P Quantile score 3.92 - 12.74
Watson estimators Nadaraya-Watson
estimators
Dordonnat et al. (2016)[135] One month 1h GAM with AR Nonparametric, using Past values of P and T Quantile score 7.37
resampling
Ziel & Liu (2016)[136] One month 1h VAR Normal Past values of P and T Quantile score 7.44
Haben et al. (2016)[137] One month 1h KDE with QR Nonparametric, using Past values of P and T Quantile score approx. 8
QR
He et al. (2016)[129] 0 - 24 h 15 min QRNNT Nonparametric, using Past values of P PICP 98.81% and
QR and KDE PINAW 9.61 - 16.08%
Takeda et al. (2016)[138] One week ahead 1h EnKF with No PLF, see text Weather observations, calendar No PLF; MAPE 1.87%
SSMs variables and past values of P
He et al. (2017)[130] Day-ahead 30 min KSVQR with Nonparametric, using Past values of P and PICP 100% and
copula QR and KDE electricity price PINAW 11.62 - 11.75%
5. Discussion

The literature review shows that research is progressing rapidly toward establishing a solid scientific foundation
for probabilistic forecasting. Recent research has focused to a significant extent on day-ahead forecasting,
both in terms of PSPF and PLF. The reason for this is evident, since day-ahead planning of traditional gen-
erators is essential due to lengthy start-up and shut-down times, as well as substantial ramping constraints.
In addition, high resolution data regarding electricity demand is relatively recent. However, due to a shifting
electricity market, e.g., usage of smart meters, deployment of behind-the-meter (BTM) DGs and increasing
number of EVs, intra-day and intra-hour forecasting will likely become increasingly important in order to
balance stochastic production and consumption. Therefore we argue that more research is necessary on a
high temporal resolution, especially in the case of high PV penetration in the built environment with accompanying solar variability and increasingly complicated electricity consumption patterns, e.g., D2R, and the
subsequent complex net demand that the grid experiences. A final remark on this issue regards EVs, since
these pose a substantial challenge to the stability of the electricity grid due to their stochastic charging
pattern in case of e.g., city charging, and significant power demand. Consequently, it is important that
these uncertainties are taken into account and the behavior of EV drivers is accurately forecast in order to
prevent brown-outs and black-outs.
Furthermore, the widespread availability of PV power production data offers another opportunity, namely
uncovering spatio-temporal correlations on e.g., city-scale, due to the inertia of meteorological conditions.
Rather than employing expensive sky imaging cameras, one might attempt to use PV systems and their
geographic locations to uncover short-term trends in the aforementioned conditions and improve accuracy
of the forecasts. Another promising approach might be to model the space-time correlation of irradiance
and PV power using e.g., copula, to forecast in space and time. In addition, the differences in spatial
resolution between PSPF and PLF uncovered in this review offer opportunities to research the possibility
of net demand forecasting on different spatial scales, e.g., on city or neighborhood level.
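One possible realization of the copula idea is sketched below: a bivariate Gaussian copula with a hypothetical correlation coefficient couples two sites, and marginals (here exponential, purely for illustration; in practice they would be fitted to each site's power distribution) are imposed through their inverse CDFs:

```python
import math
import random
from statistics import NormalDist

random.seed(2)
nd = NormalDist()
rho = 0.7  # hypothetical spatial correlation between two nearby sites

def gaussian_copula_uniforms(rho):
    """Draw a pair of uniforms whose dependence follows a Gaussian copula."""
    z1 = random.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0)
    return nd.cdf(z1), nd.cdf(z2)

# arbitrary marginal (exponential, rate 1) applied through its inverse CDF
inv_cdf = lambda u: -math.log(1.0 - u)
pairs = [tuple(inv_cdf(u) for u in gaussian_copula_uniforms(rho))
         for _ in range(5000)]

# empirical Pearson correlation of the two simulated sites
x, y = zip(*pairs)
mx, my = sum(x) / len(x), sum(y) / len(y)
num = sum((a - mx) * (b - my) for a, b in pairs)
den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
print(f"induced correlation: {num / den:.2f}")
```

The appeal of the construction is that the dependence structure (the copula) and the site-level distributions (the marginals) can be modeled and validated separately.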
Throughout the literature study it has also become apparent that probabilistic metrics, such as those defined
in Section 2.5.2, are not yet consistently applied in the case of PSPF, whereas they are in the case of PLF.
This is an important issue since it can hinder progress. Furthermore, it was also
noted that normalization of results, e.g., in the case of CRPS, is sometimes done with nominal capacity of
the PV plant and other times with the maximum measured production. We suggest utilizing both since the
former method offers transparency, while the latter method offers the possibility to take seasonal variation
into account. Additional research, such as that performed by Hoff et al. [59] in which they investigated the
impact of normalization on the results, is required for the case of probabilistic forecasting. Another point
of interest was filtering data based on daytime or zenith angle, because forecasting during
morning and evening is inherently more challenging due to the increased distance photons travel through
the atmosphere. Evidently, the improvement in terms of accuracy depends on the amount of filtering that
is done and therefore it might be interesting to standardize this. Additionally, it is important to explicitly
mention what method has been used in case of multiple-step, e.g., k steps, ahead predictions, which can
either be done by training a model to directly forecast k steps ahead or via an iterative, one-step ahead
method. Girard et al. [77] offer an excellent overview when considering a Gaussian Process (GP) and
conclude that the iterative method requires less data and is computationally less demanding because a single
model can be utilized, rather than a unique one for each horizon.
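The two strategies can be contrasted with a toy AR(1) sketch (the coefficient 0.8 is hypothetical; a real model would be fitted to data):

```python
def iterative_forecast(one_step_model, history, k):
    """Recursive strategy: one model, its predictions fed back as inputs."""
    h = list(history)
    for _ in range(k):
        h.append(one_step_model(h))
    return h[-k:]

# toy one-step model: AR(1) with a hypothetical coefficient of 0.8
ar1 = lambda h: 0.8 * h[-1]

def direct_forecast(history, k):
    """Direct strategy: a dedicated model per horizon (here the exact k-step AR(1) map)."""
    return [history[-1] * 0.8 ** i for i in range(1, k + 1)]

history = [1.0, 0.9, 0.8]
print(iterative_forecast(ar1, history, 3))
print(direct_forecast(history, 3))
```

For a linear AR(1) model the two strategies produce identical trajectories; for nonlinear models they generally differ, and the iterative strategy then requires propagating the predictive uncertainty through the model, which is closely related to the setting considered by Girard et al. [77].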
It has repeatedly been shown that assuming a density during probabilistic forecasting is likely to give
an inaccurate estimate of the future. Researchers investigate the resemblance between the acquired density and
the assumed one, e.g., Gaussian, only to uncover that the two are seldom identical. This entails that one's forecast
is directly less accurate due to these dissimilarities. Furthermore, another disadvantage of assuming a fixed
density is that the eventual model is less general, i.e., cannot be applied at every location, since that might
produce different errors that cannot be described with the density that was assumed earlier. However, that
being said, it is important to note that this review has shown that there is no single best method that can be
applied at any location and under any circumstance. Therefore, it is important to focus on methodologies,
which can in fact be universally applied, rather than on individual models. In addition, techniques applied in PLF can
also be utilized for PSPF, as shown in figs. 2 and 3 and Table 1, which will be useful when net demand
forecasting is under consideration. Therefore, research into net demand forecasting is required to assess the
performance of different models in that particular setting as opposed to when both are separately being
forecast.
Several notable papers with important contributions have been reviewed in this paper. Here, we high-
light a few of these in chronological order. For example, Bilionis et al. [102] utilized satellite imagery in
combination with recursive GP, both of which are methods that do not seem to be used often but yielded
promising results, especially since resolution of satellite data keeps improving. GP can be considered as
a powerful method to learn nonlinear dynamical systems although inverting the covariance matrix is a
computationally demanding process. Quan et al. [54] applied the lower upper bound estimate (LUBE)
method [79] to train an artificial neural network (ANN) in forecasting prediction intervals rather than point
predictions. Proposals such as the LUBE method are valuable since they pave the way to continue using
ANNs for probabilistic forecasting, while recognizing that training ANNs with deterministic metrics such
as RMSE is suboptimal. Relatively few studies have exploited the inertia of meteorological processes,
see e.g., [96, 106, 115]. However, those that did found that adding information from neighboring facilities aids in
improving accuracy of their proposed models. These studies could be extended by for example including
additional explanatory variables such as wind speed and wind direction to incorporate the effect of moving
clouds. Creating an ensemble of several data-driven methods was done by Pierro et al. [121], and was shown
to have significant potential in terms of skill score when forecasts with similar accuracy were combined.
Although this led to a significant improvement, the authors failed to specify computation time, an aspect
that is definitely worth mentioning when utilizing several methods. Taieb et al. [72] utilized smart meter
data to forecast electricity consumption on a disaggregated level, noting that, at the time, only two
similar papers existed. An important finding is that although variability increases at this spatial resolution,
time of day remains a good explanatory variable. With further penetration of smart meters into the built
environment, more such studies are likely to appear in the future, which will require additional research into
relevant explanatory variables, especially in case net demand forecasting is considered. Finally, we mention
here the study performed by Torregrossa et al. [90], which was the first study to design a method to be
used with any forecast model that could create prediction intervals on the second to minute resolution. This
will become more relevant when the penetration of PV systems in the built environment increases and more
rigorous power control becomes necessary to protect those that are connected to it against significant power
variations.
In addition to focusing on methodologies, an interesting approach to quantify the accuracy of any forecast
is to quantify the incurred cost due to the error of that forecast. This need not be complicated since energy
already has a marginal cost that reflects generation, transmission and distribution costs. However, in case
of predicting higher PV power production than measured for example, one will have to take into account
additional costs such as start-up costs of fast response generators. Quantifying the error based on cost was
proposed by Bracale et al. [123], but more research is required to acquire more advanced metrics.
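As a first step in that direction, a cost-based score can be sketched as an asymmetric penalty on the forecast error; all unit costs below are hypothetical placeholders, not market data:

```python
def cost_score(actual, forecast, c_over=0.12, c_under=0.05):
    """Cost-weighted forecast error in currency units; c_over and c_under are
    hypothetical unit costs (per kWh) of over- and under-predicting production."""
    total = 0.0
    for a, f in zip(actual, forecast):
        err = f - a
        # over-prediction of PV production is penalized more heavily, reflecting
        # the start-up cost of fast-response reserve generators
        total += c_over * err if err > 0 else c_under * (-err)
    return total

actual = [10.0, 12.0, 8.0]     # measured PV production (kWh)
forecast = [11.0, 11.0, 9.0]   # day-ahead point forecast (kWh)
print(round(cost_score(actual, forecast), 2))   # 0.29
```

Unlike symmetric metrics such as RMSE, this score ranks two forecasts with equal absolute errors differently depending on the sign of the error, which is exactly the behavior a cost-aware metric should have.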
Finally, many proposed models tend to be difficult to generalize and perform best for the particular case
they were designed for, given the particular set of data they were trained and tested with. GEFCom2014 [13]
and similar initiatives usually offer a significant amount of data to work with, thus avoiding the necessity to
test the proposed model on the training data set and allowing researchers to test their models over a variety
of conditions. More importantly, such competitions offer an outstanding opportunity to compare models
and methodologies, since researchers have to work with identical data sets and under the same conditions,
which is an essential feature of research.

6. Conclusion
In this review we aimed at providing a broad overview of performance metrics, methods and recent ad-
vances in PSPF and PLF. In addition, we have tried to identify research gaps in this area, such as addressing
the effect of net demand forecasting on probabilistic performance metrics, the impact of normalization on
probabilistic forecasting, using benchmark data sets, similar to benchmark models, and analyzing which
explanatory variables are important when using high resolution data. Our third aim was to find common
ground between PSPF and PLF, which could enable net demand forecasting. We have shown that there is
overlap in terms of spatial and temporal resolution, but that the similarity in terms of variability can differ
substantially, which opens up the possibility for new methods to be proposed. Finally, we have addressed
the importance of standardization in case of e.g., performance metrics and data pre-processing.

Acknowledgments

This work was financially supported by Smart Grid ERA-NET Cofund in the project "Increase Self
Consumption of Photovoltaic Power for Electric Vehicle Charging in Virtual Networks", and by SamspEL
2016 - 2020 in the project "Development and evaluation of forecasting models for solar power and electricity
use over space and time", both financed primarily by the Swedish Energy Agency. In addition, the authors
would like to thank the anonymous reviewers that helped to improve the quality of this paper.

References
[1] NOAA National Centers for Environmental Information, State of the Climate: Global Analysis for June 2016 (jul 2016).
URL http://www.ncdc.noaa.gov/sotc/global/201606
[2] NASA/GISS, Global Land-Ocean Temperature Index (2016).
URL http://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt
[3] UNFCCC, Adoption of the Paris Agreement, Tech. Rep. FCCC/CP/2015/L.9/Rev.1 (December 2015).
URL http://unfccc.int/resource/docs/2015/cop21/eng/l09r01.pdf
[4] International Energy Agency, Snapshot of global photovoltaic markets, Tech. rep. (2016).
[5] International Renewable Energy Agency, Renewable Power Generation Costs in 2014, Tech. Rep. January (2015).
[6] C. F. Coimbra, J. Kleissl, R. Marquez, Chapter 8 - Overview of Solar-Forecasting Methods and a Metric for Accuracy
Evaluation. In: Kleissl J, editor. Solar energy forecasting and resource assessment. Boston: Acadamic Press; 2013. p. 171
- 194.
[7] Fraunhofer Institute for Solar Energy Systems ISE, Recent Facts about Photovoltaics in Germany, Tech. rep. (2016).
[8] E. Lorenz, J. Hurka, D. Heinemann, H. G. Beyer, Irradiance Forecasting for the Power Prediction of Grid-Connected
Photovoltaic Systems, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2 (1) (2009) 2–10.
[9] S. Alessandrini, L. Delle Monache, S. Sperati, G. Cervone, An analog ensemble for short-term probabilistic solar power
forecast, Appl. Energy 157 (2015) 95–110. doi:10.1016/j.apenergy.2015.08.011.
[10] X. W. X. Wang, J. G. J. Gao, W. H. W. Hu, Z. S. Z. Shi, B. T. B. Tang, Research of effect on distribution network with
penetration of photovoltaic system, in: Univ. Power Eng. Conf. (UPEC), 2010 45th Int., 2010, pp. 1–4.
[11] J. R. Aguero, S. J. Steffel, Integration challenges of photovoltaic distributed generation on power distribution systems,
in: 2011 IEEE Power Energy Soc. Gen. Meet., 2011, pp. 1–6. doi:10.1109/PES.2011.6039097.
[12] M. Ropp, J. Newmiller, C. Whitaker, B. Norris, Review of potential problems and utility concerns arising from high
penetration levels of photovoltaics in distribution systems, in: IEEE Photovolt. Spec. Conf., 2008. doi:10.1109/PVSC.2008.4922861.
[13] T. Hong, P. Pinson, S. Fan, H. Zareipour, A. Troccoli, R. J. Hyndman, Probabilistic energy forecasting: Global Energy
Forecasting Competition 2014 and beyond, Int. J. Forecast. 32 (3) (2016) 896–913. doi:10.1016/j.ijforecast.2016.
02.001.
[14] Muhammad Qamar Raza, Mithulananthan Nadarajah, Chandima Ekanayake, On recent advances in PV output power
forecast, Sol. Energy 136 (136) (2016) 125–144. doi:10.1016/j.solener.2016.06.073.
[15] R. H. Inman, H. T. Pedro, C. F. Coimbra, Solar forecasting methods for renewable energy integration, Prog. Energy
Combust. Sci. 39 (6) (2013) 535–576. doi:10.1016/j.pecs.2013.06.002.
[16] M. Diagne, M. David, P. Lauret, J. Boland, N. Schmutz, Review of solar irradiance forecasting methods and a proposition
for small-scale insular grids, Renew. Sustain. Energy Rev. 27 (2013) 65–76. doi:10.1016/j.rser.2013.06.042.
[17] International Energy Agency (IEA), Photovoltaic and solar forecasting: State of the Art.
[18] J. Widén, N. Carpman, V. Castellucci, D. Lingfors, J. Olauson, F. Remouit, M. Bergkvist, M. Grabbe, R. Waters,
Variability assessment and forecasting of renewables: A review for solar, wind, wave and tidal resources, Renew. Sustain.
Energy Rev. 44 (2015) 356–375. doi:10.1016/j.rser.2014.12.019.
[19] Y. Ren, P. Suganthan, N. Srikanth, Ensemble methods for wind and solar power forecasting—A state-of-the-art review,
Renew. Sustain. Energy Rev. 50 (2015) 82–91. doi:10.1016/j.rser.2015.04.081.
[20] C. Wan, J. Zhao, S. Member, Y. Song, Photovoltaic and Solar Power Forecasting for Smart Grid Energy Management,
J. Power Energy Syst. 1 (4) (2015) 38–46. doi:10.17775/CSEEJPES.2015.00046.
[21] J. Antonanzas, N. Osorio, R. Escobar, R. Urraca, F. J. Martinez-de-Pison, F. Antonanzas-Torres, Review of photovoltaic
power forecasting, Sol. Energy 136 (2016) 78–111. doi:10.1016/j.solener.2016.06.069.
[22] L. Wang, O. Kisi, M. Zounemat-Kermani, G. A. Salazar, Z. Zhu, W. Gong, Solar radiation prediction using different
techniques: Model evaluation and comparison, Renew. Sustain. Energy Rev. 61 (2016) 384–397. doi:10.1016/j.rser.
2016.04.024.
[23] C. Voyant, G. Notton, S. Kalogirou, M.-L. Nivet, C. Paoli, F. Motte, A. Fouilloy, Machine Learning methods for solar
radiation forecasting: a review, Renew. Energy 105 (2017) 569–582. doi:10.1016/j.renene.2016.12.095.
[24] J. Kleissl, Solar Energy Forecasting and Resource Assessment, Academic Press, Boston, 2013.
[25] T. Hong, Energy Forecasting: Past, Present and Future, Foresight Int. J. Forecast. (32) (2014) 43–49.
[26] T. Hong, S. Fan, Probabilistic Electric Load Forecasting: A Tutorial Review, Int. J. Forecast. 32 (3) (2016) 914–938.
doi:10.1016/j.ijforecast.2015.11.011.
[27] S. Aman, M. Frincu, C. Chelmis, M. Noor, Y. Simmhan, V. K. Prasanna, Prediction models for dynamic demand
response: Requirements, challenges, and insights, in: 2015 IEEE Int. Conf. Smart Grid Commun., 2015, pp. 338–343.
doi:10.1109/SmartGridComm.2015.7436323.
[28] T. Hong, Short Term Electric Load Forecasting, Ph.D. thesis, North Carolina State University (2010).
[29] A. K. Singh, Ibraheem, S. Khatoon, M. Muazzam, D. K. Chaturvedi, Load Forecasting Techniques and Methodologies:
A Review, in: Int. Conf. Power, Control Embed. Syst., 2012.
[30] S. Takiyar, V. Singh, Trend analysis and evolution of Short Term Load Forecasting Techniques, in: 2015 4th Int. Conf.
Reliab. Infocom Technol. Optim. (Trends Futur. Dir.), IEEE, 2015, pp. 1–6. doi:10.1109/ICRITO.2015.7359233.
[31] M. Q. Raza, A. Khosravi, A review on artificial intelligence based load demand forecasting techniques for smart grid and
buildings, Renew. Sustain. Energy Rev. 50 (2015) 1352–1372. doi:10.1016/j.rser.2015.04.065.
[32] EPBD, On the energy performance of buildings, Tech. rep., EPBD (2010).

[33] A. Foucquier, S. Robert, F. Suard, L. Stéphan, A. Jay, State of the art in building modelling and energy performances
prediction: A review, Renew. Sustain. Energy Rev. 23 (2013) 272–288. doi:10.1016/j.rser.2013.03.004.
[34] N. Fumo, A review on the basics of building energy estimation, Renew. Sustain. Energy Rev. 31 (2014) 53–60. doi:
10.1016/j.rser.2013.11.040.
[35] Z. Wang, R. S. Srinivasan, A review of artificial intelligence based building energy prediction with a focus on ensemble
prediction models, in: 2015 Winter Simul. Conf., IEEE, 2015, pp. 3438–3448. doi:10.1109/WSC.2015.7408504.
[36] B. Yildiz, J. Bilbao, A. Sproul, A review and analysis of regression and machine learning models on commercial building
electricity load forecasting, Renew. Sustain. Energy Rev. 73 (2017) 1104–1122. doi:10.1016/j.rser.2017.02.023.
[37] E. Lorenz, J. Hurka, G. Karampela, D. Heinemann, H. G. Beyer, M. Schneider, Qualified Forecast of Ensemble Power
Production by Spatially Dispersed Grid-Connected PV Systems, in: 23rd Eur. Photovolt. Sol. Energy Conf. Exhib., Valencia,
Spain, 1–5 Sept., 2008.
[38] J. Widén, M. Lundh, I. Vassileva, E. Dahlquist, K. Ellegård, E. Wäckelgård, Constructing load profiles for household
electricity and hot water from time-use data-Modelling approach and validation, Energy Build. 41 (7) (2009) 753–768.
doi:10.1016/j.enbuild.2009.02.013.
[39] J. Widén, E. Wäckelgård, A high-resolution stochastic model of domestic activity patterns and electricity demand, Appl.
Energy 87 (6) (2010) 1880–1892. doi:10.1016/j.apenergy.2009.11.006.
[40] J. Munkhammar, J. D. K. Bishop, J. J. Sarralde, W. Tian, R. Choudhary, Household electricity use, electric vehicle home-
charging and distributed photovoltaic power production in the city of Westminster, Energy Build. 86 (2015) 439–448.
doi:10.1016/j.enbuild.2014.10.006.
[41] D. Lazos, A. B. Sproul, M. Kay, Optimisation of energy management in commercial buildings with weather forecasting
inputs: A review, Renew. Sustain. Energy Rev. 39 (2014) 587–603. doi:10.1016/j.rser.2014.07.053.
[42] B. Dong, Z. Li, S. M. Rahman, R. Vega, A hybrid model approach for forecasting future residential electricity consump-
tion, Energy Build. 117 (2016) 341–351. doi:10.1016/j.enbuild.2015.09.033.
[43] B.-m. Hodge, M. Hummon, K. Orwig, Solar Ramping Distributions over Multiple Timescales and Weather Patterns, in:
10th Int. Work. Large-Scale Integr. Wind Power into Power Syst., 2011.
[44] F. Golestaneh, P. Pinson, H. B. Gooi, Very Short-Term Nonparametric Probabilistic Forecasting of Renewable Energy
Generation; With Application to Solar Energy, IEEE Trans. Power Syst. PP (99) (2016) 1–14. doi:10.1109/TPWRS.2015.
2502423.
[45] P. Ineichen, Comparison of eight clear sky broadband models against 16 independent data banks, Sol. Energy 80 (4)
(2006) 468–478. doi:10.1016/j.solener.2005.04.018.
[46] P. Pinson, R. Hagedorn, Verification of the ECMWF ensemble forecasts of wind speed against analyses and observations,
Meteorol. Appl. 19 (4) (2012) 484–500. doi:10.1002/met.283.
[47] R. Perez, A. Kankiewicz, J. Schlemmer, K. Hemker, S. Kivalov, A new operational solar resource forecast model service
for PV fleet simulation, in: 2014 IEEE 40th Photovolt. Spec. Conf., IEEE, 2014, pp. 0069–0074. doi:10.1109/PVSC.
2014.6925204.
[48] J. W. Taylor, Short-term electricity demand forecasting using double seasonal exponential smoothing, J. Oper. Res. Soc.
54 (8) (2003) 799–805. doi:10.1057/palgrave.jors.2601589.
[49] B. Liu, J. Nowotarski, T. Hong, R. Weron, Probabilistic Load Forecasting via Quantile Regression Averaging on Sister
Forecasts, IEEE Trans. Smart Grid.
[50] G. E. P. Box, G. M. Jenkins, G. C. Reinsel, Time Series Analysis: Forecasting and Control, Prentice Hall, Englewood
Cliffs, N.J., 1994.
[51] C. Chatfield, Time-Series Forecasting, Chapman & Hall / CRC, Bath, 2000.
[52] M. Zamo, O. Mestre, P. Arbogast, O. Pannekoucke, A benchmark of statistical regression methods for short-term fore-
casting of photovoltaic electricity production. Part II: Probabilistic forecast of daily production, Sol. Energy 105 (2014)
804–816. doi:10.1016/j.solener.2014.03.026.
[53] S. Arora, J. W. Taylor, Forecasting electricity smart meter data using conditional kernel density estimation, Omega 59
(2016) 1–13. doi:10.1016/j.omega.2014.08.008.
[54] H. Quan, D. Srinivasan, A. Khosravi, Uncertainty handling using neural network-based prediction intervals for electrical
load forecasting, Energy 73 (2014) 916–925. doi:10.1016/j.energy.2014.06.104.
[55] R. J. Hyndman, A. B. Koehler, Another look at measures of forecast accuracy, Int. J. Forecast. 22 (4) (2006) 679–688.
doi:10.1016/j.ijforecast.2006.03.001.
[56] J. Zhang, A. Florita, B.-M. Hodge, S. Lu, H. F. Hamann, V. Banunarayanan, A. M. Brockway, A suite of metrics for
assessing the performance of solar power forecasting, Sol. Energy 111 (2015) 157–175. doi:10.1016/j.solener.2014.10.
016.
[57] T. Gneiting, M. Katzfuss, Probabilistic Forecasting, Annu. Rev. Stat. Its Appl. 1 (2014) 125–151. doi:10.1146/
annurev-statistics-062713-085831.
[58] Y. Zhang, J. Wang, X. Wang, Review on probabilistic forecasting of wind power generation, Renew. Sustain. Energy
Rev. 32 (2014) 255–270. doi:10.1016/j.rser.2014.01.033.
[59] T. E. Hoff, R. Perez, J. Kleissl, D. Renne, J. Stein, Reporting of irradiance modeling relative prediction errors, in: Prog.
Photovoltaics Res. Appl., Vol. 21, 2012, pp. 1514–1519. doi:10.1002/pip.2225.
[60] R. Perez, T. E. Hoff, Chapter 6 - Solar Resource Variability. In: Kleissl J, editor. Solar energy forecasting and resource
assessment. Boston: Acadamic Press; 2013. p. 133 - 148.
[61] B. Espinar, L. Ramírez, A. Drews, H. G. Beyer, L. F. Zarzalejo, J. Polo, L. Martín, Analysis of different comparison
parameters applied to solar radiation data from satellite and German radiometric stations, Sol. Energy 83 (1) (2009)

118–125. doi:10.1016/j.solener.2008.07.009.
[62] D. P. Doane, L. E. Seward, Measuring Skewness: A Forgotten Statistic?, J. Stat. Educ. 19 (2) (2011) 1–18.
[63] P. Pinson, H. A. Nielsen, J. K. Møller, H. Madsen, G. N. Kariniotakis, Non-parametric probabilistic forecasts of wind
power: Required properties and evaluation, Wind Energy 10 (6) (2007) 497–516. doi:10.1002/we.230.
[64] H. Quan, D. Srinivasan, A. Khosravi, Short-term load and wind power forecasting using neural network-based prediction
intervals, IEEE Trans. Neural Networks Learn. Syst. 25 (2) (2014) 303–315. doi:10.1109/TNNLS.2013.2276053.
[65] A. Khosravi, S. Nahavandi, D. Creighton, Prediction intervals for short-term wind farm power generation forecasts, IEEE
Trans. Sustain. Energy 4 (3) (2013) 602–610. doi:10.1109/TSTE.2012.2232944.
[66] A. Khosravi, S. Nahavandi, D. Creighton, Construction of optimal prediction intervals for load forecasting problems,
IEEE Trans. Power Syst. 25 (3) (2010) 1496–1503. doi:10.1109/TPWRS.2010.2042309.
[67] N. Meinshausen, Quantile Regression Forests, J. Mach. Learn. Res. 7 (2006) 983–999.
[68] J. Xie, T. Hong, T. Laing, C. Kang, On Normality Assumption in Residual Simulation for Probabilistic Load Forecasting,
IEEE Trans. Smart Grid (2015) 1–8. doi:10.1109/TSG.2015.2447007.
[69] R. L. Winkler, A Decision-Theoretic Approach to Interval Estimation, J. Am. Stat. Assoc. 67 (337) (1972) 187–191.
doi:10.2307/2284720.
[70] M. David, F. Ramahatana, P. J. Trombe, P. Lauret, Probabilistic forecasting of the solar irradiance with recursive ARMA
and GARCH models, Sol. Energy 133 (2016) 55–72. doi:10.1016/j.solener.2016.03.064.
[71] R. Koenker, G. Bassett, Regression Quantiles, Econometrica 46 (1) (1978) 33–50.
[72] S. B. Taieb, R. Huser, R. J. Hyndman, M. G. Genton, Forecasting Uncertainty in Electricity Smart Meter Data by
Boosting Additive Quantile Regression, IEEE Trans. Smart Grid 7 (5) (2016) 2448–2455. doi:10.1109/TSG.2016.2527820.
[73] R. Koenker, K. Hallock, Quantile Regression, J. Econ. Perspect. 15 (4) (2001) 143–156.
[74] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32. doi:10.1023/A:1010933404324.
[75] C. E. Rasmussen, C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
[76] S. Roberts, M. Osborne, M. Ebden, S. Reece, N. Gibson, S. Aigrain, Gaussian processes for time-series modelling, Philos.
Trans. R. Soc. London A Math. Phys. Eng. Sci. 371 (1984) (2013).
[77] A. Girard, C. E. Rasmussen, J. Q. Candela, R. Murray-Smith, Gaussian process priors with uncertain inputs-application
to multiple-step ahead time series forecasting, Adv. Neural Inf. Process. Syst. (2003) 545–552.
[78] B. Efron, Bootstrap Methods: Another Look at the Jackknife, Ann. Stat. 7 (1) (1979) 1–26. doi:10.1214/aos/1176348654.
[79] A. Khosravi, S. Nahavandi, D. Creighton, A. F. Atiya, Lower Upper Bound Estimation Method for Construction of
Neural Network-Based Prediction Intervals, IEEE Trans. Neural Networks 22 (3) (2011) 337–346. doi:10.1109/TNN.
2010.2096824.
[80] J. H. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, Ann. Stat. 29 (5) (2001) 1189–1232.
doi:10.1214/aos/1013203451.
[81] P. Bühlmann, Boosting for high-dimensional linear models, Ann. Stat. 34 (2) (2006) 559–583. doi:10.1214/
009053606000000092.
[82] M. Rosenblatt, Remarks on Some Nonparametric Estimates of a Density Function, Ann. Math. Stat. 27 (3) (1956)
832–837. doi:10.1214/aoms/1177728190.
[83] E. Parzen, On Estimation of a Probability Density Function and Mode, Ann. Math. Stat. 33 (3) (1962) 1065–1076.
doi:10.1214/aoms/1177704472.
[84] S. J. Sheather, M. C. Jones, A reliable data-based bandwidth selection method for kernel density estimation, J. R. Stat.
Soc. Ser. B 53 (3) (1991) 683–690. doi:10.2307/2345597.
[85] N. S. Altman, An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression, Am. Stat. 46 (3) (1992)
175–185. doi:10.1080/00031305.1992.10475879.
[86] L. Delle Monache, F. A. Eckel, D. L. Rife, B. Nagarajan, K. Searight, Probabilistic Weather Prediction with an Analog
Ensemble., Mon. Weather Rev. 141 (2013) 3498–3516. doi:10.1175/MWR-D-12-00281.1.
[87] J. G. da Silva Fonseca, T. Oozeki, H. Ohtake, T. Takashima, K. Ogimoto, On the use of maximum likelihood and input
data similarity to obtain prediction intervals for forecasts of photovoltaic power generation, J. Electr. Eng. Technol. 10 (3)
(2015) 1342–1348. doi:10.5370/JEET.2015.10.3.1342.
[88] Y. Chu, M. Li, H. T. C. Pedro, C. F. M. Coimbra, Real-time prediction intervals for intra-hour DNI forecasts, Renew.
Energy 83 (2015) 234–244. doi:10.1016/j.renene.2015.04.022.
[89] S. Sperati, S. Alessandrini, L. Delle Monache, An application of the ECMWF Ensemble Prediction System for short-term
solar power forecasting, Sol. Energy 133 (2016) 437–450. doi:10.1016/j.solener.2016.04.016.
[90] D. Torregrossa, J.-Y. Le Boudec, M. Paolone, Model-free computation of ultra-short-term prediction intervals of solar
irradiance, Sol. Energy 124 (2016) 57–67. doi:10.1016/j.solener.2015.11.017.
[91] P. Bacher, H. Madsen, H. A. Nielsen, Online short-term solar power forecasting, Sol. Energy 83 (10) (2009) 1772–1783.
doi:10.1016/j.solener.2009.05.016.
[92] E. Scolari, D. Torregrossa, J.-Y. Le Boudec, M. Paolone, Ultra-Short-Term Prediction Intervals of Photovoltaic AC Active
Power, in: Int. Conf. Probabilistic Methods Appl. to Power Syst. PMAPS 2016, 2016. doi:10.1109/PMAPS.2016.7764064.
[93] E. Scolari, F. Sossan, M. Paolone, Irradiance prediction intervals for PV stochastic generation in microgrid applications,
Sol. Energy 139 (2016) 116–129. doi:10.1016/j.solener.2016.09.030.

[94] C. Wan, J. Lin, Y. Song, Z. Xu, G. Yang, Probabilistic Forecasting of Photovoltaic Generation: An Efficient Statistical
Approach, IEEE Trans. Power Syst. 8950 (8) (2016) 1–1. doi:10.1109/TPWRS.2016.2608740.
[95] S. Chai, Z. Xu, W. K. Wong, Optimal Granule-Based PIs Construction for Solar Irradiance Forecast, IEEE Trans. Power
Syst. 31 (4) (2016) 3332–3333. doi:10.1109/TPWRS.2015.2473097.
[96] J. Boland, Spatial-temporal forecasting of solar radiation, Renew. Energy 75 (2015) 607–616. doi:10.1016/j.renene.
2014.10.035.
[97] S. Wang, C. Jia, Prediction intervals for short-term photovoltaic generation forecasts, in: Proc. - 5th Int. Conf. Instrum.
Meas. Comput. Commun. Control. IMCCC 2015, 2016, pp. 459–463. doi:10.1109/IMCCC.2015.103.
[98] Y. Chu, C. F. M. Coimbra, Short-term probabilistic forecasts for Direct Normal Irradiance, Renew. Energy 101 (2017)
526–536. doi:10.1016/j.renene.2016.09.012.
[99] A. Bracale, G. Carpinelli, P. D. Falco, A Bayesian-based approach for the short-term forecasting of electrical loads in
smart grids. Part II: Numerical applications, in: Int. Symp. Power Electron. Electr. Drives, Autom. Motion, 2016, pp.
129–136.
[100] C. Guan, P. B. Luh, L. D. Michel, Z. Chi, Hybrid Kalman Filters for Very Short-Term Load Forecasting and Prediction
Interval Estimation, IEEE Trans. Power Syst. 28 (4) (2013) 3806–3817.
[101] A. Bracale, P. Caramia, G. Carpinelli, A. R. Di Fazio, P. Varilone, A bayesian-based approach for a short-term steady-
state forecast of a smart grid, IEEE Trans. Smart Grid 4 (4) (2013) 1760–1771. doi:10.1109/TSG.2012.2231441.
[102] I. Bilionis, E. M. Constantinescu, M. Anitescu, Data-driven model for solar irradiation based on satellite observations,
Sol. Energy 110 (2014) 22–38. doi:10.1016/j.solener.2014.09.009.
[103] A. Grantham, Y. R. Gel, J. Boland, Nonparametric short-term probabilistic forecasting for solar radiation, Sol. Energy
133 (2016) 465–475. doi:10.1016/j.solener.2016.04.011.
[104] L. Tao, J. He, Y. Wang, P. Zhang, H. Zhang, H. Wang, Y. Miao, J. Wang, Operational risk assessment of distribution
network with consideration of PV output uncertainties, in: China Int. Conf. Electr. Distrib., 2016, pp. 10–15.
[105] D. AlHakeem, P. Mandal, A. Ul Haque, A. Yona, T. Senjyu, T.-L. Tseng, A New Strategy to Quantify Uncertainties of
Wavelet-GRNN-PSO Based Solar PV Power Forecasts Using Bootstrap Confidence Intervals, in: IEEE Power Energy Soc.
Gen. Meet., 2015, pp. 1–5.
[106] R. Bessa, A. Trindade, C. S. Silva, V. Miranda, Probabilistic solar power forecasting in smart grids using distributed
information, Int. J. Electr. Power Energy Syst. 72 (2015) 16–23. doi:10.1016/j.ijepes.2015.02.006.
[107] J. Tastu, P. Pinson, P. J. Trombe, H. Madsen, Probabilistic forecasts of wind power generation accounting for geograph-
ically dispersed information, IEEE Trans. Smart Grid 5 (1) (2014) 480–489. doi:10.1109/TSG.2013.2277585.
[108] Y. Liu, S. Shimada, J. Yoshino, T. Kobayashi, Y. Miwa, K. Furuta, Ensemble forecasting of solar irradiance by applying
a mesoscale meteorological model, Sol. Energy 136 (2016) 597–605. doi:10.1016/j.solener.2016.07.043.
[109] G. I. Nagy, G. Barta, S. Kazi, G. Borbély, G. Simon, GEFCom2014: Probabilistic solar and wind power forecasting using
a generalized additive tree ensemble approach, Int. J. Forecast. 32 (3) (2016) 1087–1093. doi:10.1016/j.ijforecast.
2015.11.013.
[110] R. Juban, H. Ohlsson, M. Maasoumy, L. Poirier, J. Z. Kolter, A multiple quantile regression approach to the wind, solar,
and price tracks of GEFCom2014, Int. J. Forecast. 32 (3) (2016) 1094–1102. doi:10.1016/j.ijforecast.2015.12.002.
[111] B. Zhang, P. Dehghanian, M. Kezunovic, Spatial-Temporal Solar Power Forecast through Use of Gaussian Conditional
Random Fields, in: IEEE PES Gen. Meet., no. 2, 2016, pp. 16–20. doi:10.1109/PESGM.2016.7741503.
[112] A. Aryaputera, H. Verbois, W. Walsh, Probabilistic accumulated irradiance forecast for Singapore using ensemble tech-
niques, in: Conf. Rec. IEEE Photovolt. Spec. Conf., Vol. 2016-Novem, 2016, pp. 1113–1118. doi:10.1109/PVSC.2016.
7749786.
[113] H. Takeda, Short-term ensemble forecast for purchased photovoltaic generation, Sol. Energy 149 (2017) 176–187. doi:
10.1016/j.solener.2017.03.088.
[114] V. Almeida, J. Gama, Prediction Intervals for Electric Load Forecast: Evaluation for Different Profiles, in: Proc. 18th
Intell. Syst. Appl. to Power Syst., 2015, pp. 1–6. doi:10.1109/ISAP.2015.7325539.
[115] F. Golestaneh, P. Pinson, H. B. Gooi, Generation and Evaluation of Space-Time Trajectories of Photovoltaic Power,
Appl. Energy. arXiv:1603.06649.
[116] M. P. Almeida, O. Perpiñán, L. Narvarte, PV power forecast using a nonparametric PV model, Sol. Energy 115 (2015)
354–368. doi:10.1016/j.solener.2015.03.006.
[117] H. Le Cadre, I. Aravena, A. Papavasiliou, Solar PV Power Forecasting Using Extreme Learning Machine and Information
Fusion, HAL (April) (2015) 22–24.
[118] T. Yamazaki, S. Wakao, Y. Fujimoto, Y. Hayashi, Improvement of Prediction Interval Estimation Algorithm with Just-
In-Time Modeling for PV System Operation, in: Photovolt. Spec. Conf. (PVSC), 2015 IEEE 42nd, Vol. 1, 2015, pp.
4–9.
[119] T. Yamazaki, H. Homma, S. Wakao, Y. Fujimoto, Y. Hayashi, Estimation Prediction Interval of Solar Irradiance Based on
Just-in-Time Modeling for Photovoltaic Output Prediction, Electr. Eng. Japan (English Transl. Denki Gakkai Ronbunshi)
195 (3) (2016) 1–10. doi:10.1002/eej.22822.
[120] J. Huang, M. Perry, A semi-empirical approach using gradient boosting and k-nearest neighbors regression for GEF-
Com2014 probabilistic solar power forecasting, Int. J. Forecast. 32 (3) (2016) 1081–1086. doi:10.1016/j.ijforecast.
2015.11.002.
[121] M. Pierro, F. Bucci, M. De Felice, E. Maggioni, D. Moser, A. Perotto, F. Spada, C. Cornaro, Multi-Model Ensemble for
day ahead prediction of photovoltaic power generation, Sol. Energy 134 (2016) 132–146. doi:10.1016/j.solener.2016.
04.040.
[122] C. Cornaro, M. Pierro, F. Bucci, Master optimization process based on neural networks ensemble for 24-h solar irradiance

forecast, Sol. Energy 111 (2015) 297–312. doi:10.1016/j.solener.2014.10.036.
[123] A. Bracale, G. Carpinelli, P. D. Falco, R. Rizzo, A. Russo, New advanced method and cost-based indices applied to
probabilistic forecasting of photovoltaic generation, J. Renew. Sustain. Energy 8 (023505). doi:10.1063/1.4946798.
[124] F. Davò, S. Alessandrini, S. Sperati, L. Delle Monache, D. Airoldi, M. T. Vespucci, Post-processing techniques and
principal component analysis for regional wind power and solar irradiance forecasting, Sol. Energy 134 (2016) 327–338.
doi:10.1016/j.solener.2016.04.049.
[125] Y. Saint-Drenan, G. Good, M. Braun, A probabilistic approach to the estimation of regional photovoltaic power produc-
tion, Sol. Energy 147 (2017) 257–276. doi:10.1016/j.solener.2017.03.007.
[126] S. Chai, M. Niu, Z. Xu, L. L. Lai, K. P. Wong, Nonparametric conditional interval forecasts for PV power
generation considering the temporal dependence, in: 2016 IEEE Power Energy Soc. Gen. Meet., IEEE, 2016, pp. 1–5.
doi:10.1109/PESGM.2016.7741953.
[127] G. Barta, G. Nagy, G. Papp, G. Simon, Forecasting framework for open access time series in energy, in: 2016 IEEE Int.
Energy Conf. (ENERGYCON), 2016. arXiv:1606.00656, doi:10.1109/ENERGYCON.2016.7514015.
[128] T. K. Wijaya, M. Sinn, B. Chen, Forecasting Uncertainty in Electricity Demand, in: AAAI Work. Comput. Sustain.,
2015, pp. 120–126.
[129] Y. He, Q. Xu, J. Wan, S. Yang, Short-term power load probability density forecasting based on quantile regression neural
network and triangle kernel function, Energy 114 (2016) 498–512. doi:10.1016/j.energy.2016.08.023.
[130] Y. He, R. Liu, H. Li, S. Wang, X. Lu, Short-term power load probability density forecasting method using kernel-based
support vector quantile regression and Copula theory, Appl. Energy 185 (2017) 254–266. doi:10.1016/j.apenergy.2016.
10.079.
[131] J. Xie, T. Hong, Comparing two model selection frameworks for probabilistic load forecasting, in: 2016 Int. Conf.
Probabilistic Methods Appl. to Power Syst., Beijing, China, Oct 16–20, 2016. doi:10.1109/PMAPS.2016.7764081.
[132] P. Gaillard, Y. Goude, R. Nedellec, Additive models and robust aggregation for GEFCom2014 probabilistic electric load
and electricity price forecasting, Int. J. Forecast. 32 (3) (2016) 1038–1050. doi:10.1016/j.ijforecast.2015.12.001.
[133] J. Xie, T. Hong, GEFCom2014 probabilistic electric load forecasting: An integrated solution with forecast combination
and residual simulation, Int. J. Forecast. 32 (3) (2016) 1012–1016. doi:10.1016/j.ijforecast.2015.11.005.
[134] E. Mangalova, O. Shesterneva, Sequence of nonparametric models for GEFCom2014 probabilistic electric load forecasting,
Int. J. Forecast. 32 (3) (2016) 1023–1028. doi:10.1016/j.ijforecast.2015.11.001.
[135] V. Dordonnat, A. Pichavant, A. Pierrot, GEFCom2014 probabilistic electric load forecasting using time series and semi-
parametric regression models, Int. J. Forecast. 32 (3) (2016) 1005–1011. doi:10.1016/j.ijforecast.2015.11.010.
[136] F. Ziel, B. Liu, Lasso estimation for GEFCom2014 probabilistic electric load forecasting, Int. J. Forecast. 32 (3) (2016)
1029–1037. arXiv:1603.01376, doi:10.1016/j.ijforecast.2016.01.001.
[137] S. Haben, G. Giasemidis, A hybrid model of kernel density estimation and quantile regression for GEFCom2014 proba-
bilistic load forecasting, Int. J. Forecast. 32 (3) (2016) 1017–1022. doi:10.1016/j.ijforecast.2015.11.004.
[138] H. Takeda, Y. Tamura, S. Sato, Using the ensemble Kalman filter for electricity load forecasting and analysis, Energy
104 (2016) 184–198. doi:10.1016/j.energy.2016.03.070.
[139] E. Lorenz, D. Heinemann, H. Wickramarathne, H. G. Beyer, S. Bofinger, Forecast of Ensemble Power Production by
Grid-Connected PV Systems, in: 20th Eur. PV Conf., 2007.
[140] P. Kou, F. Gao, A sparse heteroscedastic model for the probabilistic load forecasting in energy-intensive enterprises, Int.
J. Electr. Power Energy Syst. 55 (2014) 144–154. doi:10.1016/j.ijepes.2013.09.002.

