
Renewable and Sustainable Energy Reviews 82 (2018) 2254–2269

Contents lists available at ScienceDirect

Renewable and Sustainable Energy Reviews


journal homepage: www.elsevier.com/locate/rser

Forecasting daily global solar irradiance generation using machine learning

Amandeep Sharma, Ajay Kakkar
Electronics and Communication Engineering Department, Thapar University, Patiala, India

ARTICLE INFO

Keywords:
Solar irradiance
Energy harvesting
Solar forecasting
Machine learning

ABSTRACT

Rechargeable wireless sensor networks mitigate the life span and cost constraints present in conventional battery operated networks. Reliable knowledge of solar radiation is essential for informed design, deployment planning and optimal management of self-powered nodes. The problem of solar irradiance forecasting can be addressed well by machine learning methodologies applied to historical datasets. In the proposed work, forecasts have been produced using the FoBa, leapForward, spikeslab, Cubist and bagEarthGCV models. To validate the effectiveness of these methodologies, a series of experimental evaluations is presented in terms of forecast accuracy, correlation coefficient and root mean square error (RMSE). The R interface has been used as the simulation platform for these evaluations, with datasets from the National Renewable Energy Laboratory (NREL). The experimental results show that, from a few hours to two days ahead, solar irradiance is precisely estimated by machine learning based models irrespective of seasonal variation in weather conditions.

1. Introduction

Recent developments in wireless sensor technology incorporate self-powered sensors that operate autonomously for real time parameter updates. Various energy harvesting technologies provide different kinds of widely distributed, endless supply, including solar light, piezoelectricity, RF, physical motion and electromagnetic fields. Solar energy with photovoltaic cell modules has been considered the best ambient source because of its high power density (15 mW/cm3), adequate conversion efficiency (17%) and compatibility with integrated circuit technology. Table A1 in Appendix A summarizes the power density and conversion efficiency of different sources [1,2].

1.1. Requirement of solar irradiance forecasting

Solar power based systems are constrained by meteorological conditions, seasonal variability, geographical constraints and intra-hour solar intensity. Fig. 1 exhibits monthly statistics of global solar radiation on a horizontal surface from January to December 2016. The dataset has been adapted from the Solar Radiation Research Laboratory (SRRL) under the National Renewable Energy Laboratory (NREL) [3], with a CMP-22 pyranometer as the solar radiation sensor [4]. Fig. 1(a) shows the seasonal variation of solar irradiance and Fig. 1(b) exhibits the maximum and minimum solar intensity for the different months of the year. Solar forecasting diminishes the effect of resource variability and uncertainty by targeting different forecast time horizons. With a concern for practical use, Fig. 2 shows the different forecasting horizons and related activities in solar based power systems.

Very short term forecasting is essential for real time monitoring of battery status. Short term forecasting is critical for decision making activities such as unit commitment. Medium term forecasting is effective for maintenance scheduling and spinning of power units. Long term forecasting is useful in planning network operations. Precise solar forecasting ensures reliable and stable rechargeable sensor operation with improved control algorithms for battery backup. Different forecasting methodologies have been developed for the solar irradiance forecasting task; they are summarized in Section 1.2.

1.2. Existing forecasting methodologies

Dependency on meteorological conditions causes renewable energy resources to be inconsistent. Under this constraint, a reliable solar irradiance forecast on different time horizons is essential for developing and utilizing solar energy based systems. As a consequence, research on solar irradiance forecasting has germinated alongside the areas of forecasting theory [5,6], solar physics [7], stochastic processes [8] and machine learning [9]. Although these methods do not all have the same accuracy with respect to the target forecasting horizon, the ability of machine learning models to trace the relation between input and output parameters has allowed this methodology to succeed in various domains including classification, data mining and solar fore-


Corresponding author.
E-mail addresses: amandeep.sharma@thapar.edu (A. Sharma), ajay.kakkar@thapar.edu (A. Kakkar).

http://dx.doi.org/10.1016/j.rser.2017.08.066
Received 4 May 2017; Received in revised form 17 July 2017; Accepted 18 August 2017
Available online 24 August 2017
1364-0321/ © 2017 Elsevier Ltd. All rights reserved.

Nomenclature

G	Global solar irradiance (mW/cm2)
r	Correlation coefficient
r2	Coefficient of determination
RMSE	Root mean square error
D	Historical days
d	Present day
t	Present time slot
T	Historical time slots
x_t^Predicted	Predicted solar energy
x_t^Actual	Measured solar energy

casting. Classification and data mining have been considered the initial steps for machine learning based models, as pre-processing of data is required with big datasets.

Neural networks (NN) [10,11], genetic algorithms (GA) [12], support vector machines (SVM) [13] and fuzzy based models [14] are extensively used machine learning methodologies in solar forecasting. A multilayer perceptron (MLP) model with daily solar irradiance and average air temperature as input parameters has been proposed by Mellit et al. [15] for 24 h ahead forecasting. Kemmoku et al. [16] proposed a multistage neural network that considers various meteorological parameters of past days together with the mean atmospheric pressure predicted by another neural network for the prediction of the next day. Hocaoglu et al. [17] integrated the multistage neural network concept with time delay neural network models for hourly solar irradiance forecasting. A comparison of neural network based models and clearness index based time series models has been given by Sfetsos et al. [18]. They consider daily ambient temperature, atmospheric pressure and wind speed as inputs to a neural network based model for hourly prediction. The main constraint of neural network based models is the design of a flawless network structure with optimal values of the different parameters. Quaiyum et al. [19] introduced endogenous and exogenous models that work on past solar irradiance and different weather parameters respectively. They also integrated a genetic algorithm with neural networks for parameter optimization. But, similarly to neural networks, it is hard to trace the dynamic behaviour of the atmosphere mathematically. Belaid et al. [20] proposed an SVM based approach for one step ahead solar forecasting with extraterrestrial solar irradiance, sunshine duration and ambient temperature as input parameters. Jiang et al. [21] presented an SVM approach with a hard penalty function to select an optimized number of radial basis functions. They also implemented a glowworm swarm optimization algorithm to choose optimal parameters for forecasting. Boata et al. [22] introduced an autoregressive fuzzy algorithm based model for solar prediction by estimating the daily clearness index.

1.3. Contribution

In the proposed work, multiple machine learning based models have been applied to track effective solar forecasting models and analyse the prediction accuracy of each model. Machine learning has been applied to historical solar intensity observations as the training dataset to calculate future solar irradiance for different forecasting horizons, irrespective of seasonal variation and input parameter availability.

In Section 2, modelling of machine learning based models for solar irradiance prediction is discussed. A description of the database is presented in Section 3. Equations for the performance indicators have

Fig. 1. (a). Solar irradiance (hourly) and (b) Minimum, maximum and average solar irradiance (monthly).
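The per-month minimum, average and maximum of Fig. 1(b) can be derived from the hourly series of Fig. 1(a) by a single aggregation pass. The sketch below is illustrative only; the `(month, value)` pair format is an assumption, not NREL's actual file layout.

```python
def monthly_stats(hourly):
    """Aggregate hourly irradiance samples into per-month min, average
    and max, as plotted in Fig. 1(b). `hourly` is an iterable of
    (month, value) pairs; this structure is a hypothetical stand-in
    for the NREL/SRRL file format."""
    acc = {}
    for month, value in hourly:
        lo, total, hi, n = acc.get(month, (float("inf"), 0.0, float("-inf"), 0))
        acc[month] = (min(lo, value), total + value, max(hi, value), n + 1)
    return {m: {"min": lo, "avg": total / n, "max": hi}
            for m, (lo, total, hi, n) in acc.items()}
```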

Forecasting horizon	Time steps	Application
Very short term	Few seconds to minutes ahead	Real time monitoring
Short term	48 to 72 hours ahead	Unit commitment
Medium term	One week ahead	Maintenance scheduling
Long term	Months or years ahead	Network operation planning

Fig. 2. Forecasting horizon and concerned applications.
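The horizon classes of Fig. 2 amount to a lookup on the forecast lead time. The sketch below encodes them; the exact boundaries between adjacent classes are assumptions where the figure leaves gaps (e.g. between "minutes" and "48 hours").

```python
def horizon_category(hours_ahead: float) -> str:
    """Map a forecast lead time in hours to the horizon classes of
    Fig. 2. Boundary placement between classes is assumed where the
    figure does not state it explicitly."""
    if hours_ahead <= 1:          # few seconds to minutes ahead
        return "very short term"  # real time battery monitoring
    elif hours_ahead <= 72:       # up to 48-72 hours ahead
        return "short term"       # unit commitment decisions
    elif hours_ahead <= 7 * 24:   # about one week ahead
        return "medium term"      # maintenance scheduling
    else:                         # months or years ahead
        return "long term"        # network operation planning
```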

been summarized in Section 4. Section 5 includes the simulation results and discussion. The proposed work is concluded in Section 6.

2. Solar irradiance forecasting platform

2.1. Machine learning methodology

All machine learning based algorithms work to trace a predictive model that estimates a particular type of data with high accuracy. A large dataset is essential for the learning algorithm to understand the behaviour of the system. Fig. 3 exhibits the machine learning methodology. The first step for a machine learning based system is data procurement. The collected data is divided from different perspectives and summarized into useful information; the steps included in this process are data cleansing and data segregation. Data is segregated into three disjoint sets: training, testing and blind. The training dataset is applied for model training and the testing dataset is used for model optimization and evaluation. The blind dataset is used for cross validation.

2.2. Machine learning models

In the proposed work, machine learning based time series models that use historically observed solar irradiance as the input parameter have been used for solar forecasting, called endogenous forecasting. Fig. 4 summarizes the five forecasting models used in the proposed work with their methodologies; they are explained in the following sections.

2.2.1. FoBa (adaptive forward-backward greedy algorithm)

FoBa is based on the forward greedy algorithm with adaptive backward steps [23,24]. The objective is to remove any error caused by earlier forward steps and to avoid a large number of basis functions. The adaptive backward steps ensure that no backward greedy step erases the gain made in forward steps. Consider n input vectors x_i ∈ R^d (i = 1, ..., n) and d feature vectors f_j ∈ R^n (j = 1, ..., d) with output variable y ∈ R^n. Each f_j is equivalent to the jth feature component of the x_i, so that f_{j,i} = x_{i,j}. With sparsity parameter k, the non convex L0 regularization can be written as:

ŵ = argmin_{w ∈ R^d} R(w)    (1)

where ‖w‖_0 ≤ k, w = [w_1, w_2, ..., w_d] ∈ R^d and, for least square regression, R(w) is a real valued cost function calculated as:

R(w) = n^{-1} ‖ ∑_j w_j f_j − y ‖_2^2    (2)

A backward step is taken when the increase of the cost function is no more than half of the decrease of the cost function in earlier forward steps, i.e. if l forward steps have been taken, the cost function should have decreased by at least an amount lε/2. This means that if R(w) ≥ 0 for all w ∈ R^d, the algorithm terminates in no more than 2R(0)/ε steps. The procedure for FoBa [23] is listed in Appendix B.

2.2.2. leapForward

It performs an exhaustive search using a match-select-action cycle to trace the best subset of predicting variables [25]. Subset selection refers to finding a small set of independent variables that offers low prediction error in predicting the dependent variables. Rules that can identify a finite set of ordered variables satisfying their predicates are selected. When such an n-set is selected, that particular rule is fired. This procedure continues until no more rules can be fired. The key point of the leaps algorithm is lazy subset selection, i.e. subsets emerge only when they are required. This perspective increases rule execution efficiency and reduces the space complexity of the leaps algorithm.

When a variable is selected or deleted, a timestamp for that element is placed on a stack that upholds the timestamp ordering of variables. The most recently added variable is placed on the top of the stack and is selected first during the rule execution cycle. This variable is called the dominant object and originates the selection predicates of all rules in an ordered way. When all the rules have been examined, the dominant object is popped from the stack. When an n-subset originated by the dominant object has been found, the corresponding rule is fired. A new dominant object is selected when a rule gets fired. These execution steps repeat until the stack is empty; the procedure is given in Appendix C.

2.2.3. Spikeslab

The spikeslab model [26,27] implements rescaled spike and slab algorithms using a continuous bimodal prior. The model is implemented in three stages, shown in Fig. 5 and listed below.

In step 1, a filtering process carries the top nF variables, where n is the sample size and F > 0 is a user defined fraction. The rest of the variables are filtered out to reduce the dimension. The posterior mean coefficients are calculated using Gibbs sampling for appropriate ordering of the variables. Step 2 refits the model using only those variables that were not filtered out in step 1. The Gibbs sampler is used for model fitting and returns posterior mean values referred to as the Bayesian model averaged (BMA) estimate. A generalized elastic net (gnet) in step 3 has been used

Fig. 3. Flow diagram of machine learning methodology.
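The data segregation step of Fig. 3 can be sketched as a three-way split into disjoint training, testing and blind sets. The 70/20/10 proportions below are illustrative assumptions; the paper only fixes a 70/30 training/testing ratio and mentions a blind set for cross validation.

```python
import random

def segregate(samples, train_frac=0.7, test_frac=0.2, seed=42):
    """Split a dataset into three disjoint sets -- training, testing
    and blind -- as in Fig. 3. The 70/20/10 proportions are assumed
    for illustration; only 70/30 train/test is stated in the text."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)               # randomize before splitting
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    blind = shuffled[n_train + n_test:]  # held out for cross validation
    return train, test, blind
```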


Machine learning → Supervised learning → Regression:
FoBa (adaptive forward-backward greedy algorithm), leapForward (regression subset selection), spikeslab (spike and slab algorithm), Cubist (rule based multivariate linear modelling) and bagEarthGCV (multivariate adaptive regression splines).
Fig. 4. Machine learning based models.

for variable selection. Variables obtained from the restricted BMA of step 2 are classified into groups. Grouping forces variables to share a common regularization parameter. There is no limit on the number of groups. A variable that does not appear in the list is assigned to a default group that has its own group specific regularization parameter.

2.2.4. Cubist model

Cubist was developed by Quinlan [28] for inducing trees of regression models. Cubist is a rule based predictive model where each rule carries a multivariate linear model. These models work on the predictions of previous splits [29,30]. When a case satisfies all the conditions of a rule, the associated model is used for prediction. Fig. 6 shows the flow diagram of the Cubist model. In the first stage, recursive partitioning (divide and conquer) of the training cases is exercised to generate a piecewise linear model in the form of a regression based model tree. Each training case has a set of attributes and an associated target value.

The basic approach is to generate a model that relates the target values of the training cases to the values of their other attributes. A splitting criterion is used to minimize the intra subset variation in the class values, instead of maximizing the information gain at each interior node. The splitting criterion is based on computing the standard deviation of the target values of the cases in T; the attribute that minimizes the standard deviation is chosen. If sd(T_i) is the standard deviation of the targets of the cases in T_i, the reduction in standard deviation is calculated as:

Error = sd(T) − ∑_i (|T_i| / |T|) × sd(T_i)    (3)

where T is the set of cases that reach the node and T_1, T_2, ... are the subsets obtained after splitting the node according to the chosen attribute. In the second stage, pruning is carried out by estimating the expected error at each node for test data. Each linear model is simplified by eliminating parameters in order to reduce its estimated error. Parameters are eliminated one by one as long as the error estimate decreases. Each internal node of the tree has both a simplified model and a model subtree; the one with the lowest estimated error is chosen.

Finally, a smoothing process is carried out to compensate for abrupt discontinuities between adjacent linear models of the pruned tree. A predicted value from a model tree is adjusted to take account of the models at the nodes along the path from the root to that leaf. The calculation is as follows:

Predicted'' = (n × Predicted' + k × Predicted) / (n + k)    (4)

where Predicted'' is the prediction passed on to the next higher node, Predicted' is the prediction passed to this node from below, Predicted is the predicted value at this node, n is the number of training cases that reach the node below and k is a constant.

2.2.5. BagEarthGCV

It is a non-parametric regression technique based on multivariate adaptive regression splines (MARS) [31,32]. Open source implementations of MARS are termed Earth, as the MARS term is licensed to Salford Systems. The MARSplines model is implemented by Eq. (5):

f(x) = ω_0 + ∑_{u=1}^{U} ω_u h_u(x)    (5)

f(x) is predicted as a function of the predictor variables x, an intercept parameter ω_0 and a weighted sum of one or more basis functions h_u(x), where each ω_u is a constant coefficient.

Bagging: a model averaging approach that computes multiple versions of a predictor and uses them to derive an aggregate predictor. The multiple versions are generated by making bootstrap replicates of the training set and using them as new training sets. Bagging improves stability and prediction accuracy and avoids overfitting of different machine learning approaches.

Pruning: after forward stepwise selection of basis functions, a backward procedure called pruning is applied to remove those basis functions that contribute least to the goodness-of-fit. The generalized cross validation error is a measure of goodness-of-fit that considers the residual error together with the model complexity and is formulated as:

GCV = [(1/N) ∑_{i=1}^{N} (y_i − f(x_i))^2] / (1 − C/N)^2, where C = 1 + cd    (6)

In Eq. (6), N signifies the number of cases in the dataset and d is the degrees of freedom, corresponding to the number of independent basis functions.
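Eq. (6) can be computed directly once the residuals and the basis-function count are known. The sketch below follows the formula as reconstructed above; the penalty value c = 3 is an assumed common MARS default, not a parameter stated by the paper.

```python
def gcv(y_true, y_pred, n_basis, c=3.0):
    """Generalized cross validation error of Eq. (6):
    GCV = [(1/N) * sum((y_i - f(x_i))^2)] / (1 - C/N)^2, with
    C = 1 + c*d, where d is the number of independent basis
    functions. c = 3 is an assumed penalty; C must stay below N."""
    N = len(y_true)
    rss = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    C = 1 + c * n_basis          # effective model complexity
    return (rss / N) / (1 - C / N) ** 2
```

Because the denominator shrinks as `n_basis` grows, two models with the same residual error get a worse GCV for the more complex one, which is what drives the pruning pass.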

Filtering (dimension reduction) → Model averaging (Bayesian model averaged estimate) → Variable selection (generalized elastic net)
Fig. 5. Flow diagram of spikeslab.
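Stage 1 of the spikeslab pipeline keeps only the top n·F variables. The real model orders variables by posterior mean coefficients from Gibbs sampling; the sketch below substitutes absolute correlation with the target as a crude, assumed stand-in for that ranking, purely to illustrate the filtering step.

```python
def filter_variables(X, y, F=0.5):
    """Stage 1 of Fig. 5: keep the top n*F variables, where n is the
    sample size and F a user defined fraction. Ranking by absolute
    correlation with y is an assumed stand-in for the posterior mean
    ordering produced by Gibbs sampling in the actual model."""
    n = len(y)
    keep = max(1, int(n * F))

    def abs_corr(col):
        mx, my = sum(col) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        vx = sum((a - mx) ** 2 for a in col) or 1e-12  # guard constants
        vy = sum((b - my) ** 2 for b in y) or 1e-12
        return abs(cov) / (vx * vy) ** 0.5

    cols = range(len(X[0]))
    ranked = sorted(cols,
                    key=lambda j: abs_corr([row[j] for row in X]),
                    reverse=True)
    return sorted(ranked[:keep])  # indices of retained variables
```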


Training cases, T → Decision tree induction algorithm for tree building → Tree pruning → Smoothing
Fig. 6. Flow diagram of cubist model.
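The smoothing step of Eq. (4) can be sketched as a blend applied repeatedly from the leaf up to the root. The constant k = 15 below is an assumed illustrative value; the paper only says k is a constant.

```python
def smooth_prediction(node_pred, child_pred, n_cases, k=15.0):
    """Cubist smoothing of Eq. (4):
    Predicted'' = (n * Predicted' + k * Predicted) / (n + k), where
    Predicted' comes up from the child, Predicted is this node's own
    model output, and n counts training cases reaching the child.
    k = 15 is an assumed value for illustration."""
    return (n_cases * child_pred + k * node_pred) / (n_cases + k)

def smooth_along_path(leaf_pred, path, k=15.0):
    """Apply Eq. (4) at every node from leaf to root. `path` lists
    (node_prediction, n_cases_below) pairs, bottom to top."""
    pred = leaf_pred
    for node_pred, n_cases in path:
        pred = smooth_prediction(node_pred, pred, n_cases, k)
    return pred
```

Note that small n (few cases below) pulls the result toward the current node's model, which damps discontinuities between adjacent linear models.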

3. Description of database

To determine the effectiveness of the above mentioned models, a historical solar dataset including daily solar irradiance over a period of six years (January 1, 2010 to December 31, 2015) has been collected from the National Renewable Energy Laboratory (NREL). NREL is the primary national laboratory for renewable energy in the US; its baseline measurement system (BMS) is located at latitude 39.742 north, longitude 105.18 west and elevation 1828.8 m, in time zone GMT-7. The dataset obtained from NREL is sampled on a per minute basis on the horizontal plane, with 1440 samples per day. Historical data for the same month from all six years has been taken for training. For instance, if March 11, 2016 is to be predicted, the March dataset is taken from the years 2010 to 2015 for training. This is done to circumvent the variable maximum and minimum solar irradiance and the different durations of sunshine in different months, as shown in Fig. 1. Best input selection is the primary goal when training models for different forecasting horizons. Different configurations of the historical dataset have been tested to complete the training process. Season based (spring, summer, monsoon, winter) training of the different models has been performed. The models have been trained for ±5 to ±25 past days (D_{d-1}, D_{d-2}, ..., D_{d-n}) and for past time slots from 6:00 a.m. onwards (T_{t-1}, T_{t-2}, ..., T_{t-n}). These features have been selected by an exhaustive simulation process with respect to the correlation coefficient (r), coefficient of determination (r2), RMSE and accuracy; Section 4 describes these performance metrics. The machine learning based models assume similar meteorological behaviour in the training and testing datasets, while keeping a ratio of 70% training and 30% testing.

4. Performance metrics

To precisely evaluate the prediction accuracy of the previously described models, four statistical quality measures have been adopted. These measures are computed by the following equations:

• Correlation is a statistic that interprets the degree of correspondence between two variables:

r = [N ∑ x_t^Actual x_t^Predicted − (∑ x_t^Actual)(∑ x_t^Predicted)] / √{[N ∑ (x_t^Actual)^2 − (∑ x_t^Actual)^2][N ∑ (x_t^Predicted)^2 − (∑ x_t^Predicted)^2]}, −1.0 ≤ r ≤ +1.0    (7)

• The coefficient of determination, r2 (0 ≤ r2 ≤ 1), exhibits the proportion of the variation of one variable that is predicted from the other variable, by measuring how well the predicted values fit the regression line. The square of the correlation coefficient r is taken as the coefficient of determination in linear least square regression.

• To calculate the difference between the real time measurements and the values predicted by a specific model, the root mean square error (RMSE), or root mean square deviation (RMSD), is used:

RMSE = √[(1/T) ∑_{t=1}^{T} (x_t^Actual − x_t^Predicted)^2]    (8)

• Accuracy measures the proximity of the analytical results to the actual value. In the proposed work an acceptance error of ±20 has been considered.

%Accuracy = (1/T) ∑_{t=1}^{T} abs(x_t^Actual − x_t^Predicted) × 100    (9)

The model performance has not been evaluated for night test samples, as solar irradiance is not available during night hours.

5. Simulation and result discussion

A series of simulation experiments has been performed to examine the accuracy of the five solar irradiance forecasting models. A common platform has been used to run all experiments; the present section narrates the different experimental results. Historical data of the last six years from NREL has been used for training. To evaluate the forecasting models under different weather conditions, four days from different seasons of the year 2016, 11th March (spring), 25th June (summer), 30th August (monsoon) and 31st December (winter), have been used for testing. As shown in Fig. 7, the data on 11th March and 30th August is smooth as these are sunny days, the data on 25th June varies throughout the day

Fig. 7. Seasonal variation in solar irradiance with days of a year.


because of cloudy weather conditions, and the data on 31st December possesses smooth behaviour with lower maximum and minimum solar intensity thresholds because of the winter season.

5.1. Experiments

Three experiments have been carried out to assess prediction effectiveness based on input parameter selection. In the first, the impact of the number of past days on prediction accuracy is evaluated. The second experiment evaluates the performance when varying the number of past time slots. In the third experiment, different forecasting horizons are considered to evaluate the forecasting models.

5.1.1. Prediction accuracy with respect to number of past days

In experiment 1, simulation has been performed with 10-50 past days for training and 5-25 past days for testing, for all five forecasting models. As shown in Table 1, for 11th March, leapForward shows a maximum accuracy of 96.08% with ±15 training days and 15 past days for testing. Cubist achieves a maximum accuracy of 78.95% for 25th June with ±5 training days and 5 historical days for testing. For 30th August, an accuracy of 89.47% is achieved by bagEarthGCV with ±5 training days and 5 historical days for testing. An accuracy of 98.21% is gained by FoBa with ±20 training days and 20 days for testing for 31st December. The performance ranking of the above mentioned models is complicated, because the relationship between past days' weather metrics and the present day's solar intensity is complicated. The choice of the most accurate model with an adequate number of past days depends upon the present day weather conditions and is iteration specific.

5.1.2. Prediction accuracy with respect to initial past time slots

In experiment 2, initial time slots from 6:00 a.m. to 10:00 a.m. have been considered for performance evaluation. The simulation results in Table 2 exhibit the effect of the initial past time slots on prediction accuracy. For the four different days (11th March, 25th June, 30th August and 31st December), FoBa offers maximum accuracy (70.27%, 64.86%, 66.07% and 98.21% respectively) with past samples taken from 7:00 a.m. onwards. The methodology of the leapForward model shows the least effect of initial past samples on prediction accuracy, as illustrated in Table 2; for 11th March, 25th June, 30th August and 31st December the maximum accuracies are 96.08%, 64.86%, 83.93% and 84.21% respectively for almost all simulation cases. The experimental results show that, similarly to the leapForward model, spikeslab offers less variation with respect to the initial time slot and achieves high prediction accuracy with fewer past samples. For 11th March, 25th June and 31st December, maximum accuracy (78.38%, 68.42% and 85.45% respectively) is gained with past time slots from 10:00 a.m. onwards; for 30th August the maximum accuracy (78.95%) is gained with past samples from 6:00 a.m. onwards. The Cubist model achieves high prediction accuracy with a large past sample consideration (6:00 a.m. onwards); for 11th March, 25th June, 30th August and 31st December the maximum accuracies are 90.2%, 78.95%, 84.21% and 85.71% respectively. The prediction accuracy of bagEarthGCV (94.12%, 63.16%, 94.74% and 89.29% for 11th March, 25th June, 30th August and 31st December respectively) is high for all the days, but the consideration of past samples is unpredictable and iteration based.

5.1.3. Prediction accuracy with respect to forecasting horizon and seasonal validation

To investigate the prediction accuracy of the five forecasting models with respect to different forecasting horizons, simulation experiments were run for 1 h ahead, 24 h ahead and 48 h ahead solar forecasting in experiment 3. Fig. 8 reports the performance indicators in terms of the correlation coefficient (r2), RMSE and accuracy (%).

Table 1
Comparison results of five forecasting models for selection of historical days for forecasting (Experiment 1).
Table 2
Comparison results of five forecasting models for selection initial past time slots for forecasting (Experiment 2).

Season 11 March (spring) 25 June (summer)

Initial past time slots 6:00 am 7:00 am 8:00 am 9:00 am 10:00 am 6:00 am 7:00 am 8:00 am 9:00 am 10:00 am
FoBa r .92 .98 .97 .98 .98 .96 .98 .95 .95 .96
r2 .85 .96 .94 .96 .96 .92 .96 .9 .9 .92


RMSE 63.03 32.25 34.49 29.89 34.37 76.3 47.22 75.02 75.02 64.81
accuracy 49.02 70.27 60.78 62.75 62.75 32.43 64.86 43.24 43.24 40.54
leapForward r −0.59 1 1 1 1 .98 .98 .98 .98 .98
r2 .35 1 1 1 1 .96 .96 .96 .96 .96
RMSE 654.5 3.07 3.07 13.55 3.07 47.14 47.14 47.14 47.14 47.14
accuracy 0 96.08 96.08 78.43 96.08 64.86 64.86 64.86 64.86 64.86
spikeslab r .98 .98 .98 .98 .98 .98 .99 .98 .98 .99
r2 .96 .96 .96 .96 .96 .96 .98 .96 .96 .98
RMSE 27.98 27.98 28.19 28.23 29.18 35.13 35.89 40.42 33.24 32.36
accuracy 70.27 67.57 70.27 67.57 78.38 63.16 56.76 63.16 63.16 68.42
Cubist r .99 .99 .97 .98 .97 .98 .99 .98 .98 .98
r2 .98 .98 .94 .96 .94 .96 .98 .96 .96 .96
RMSE 11.72 21.7 16.62 17.17 34.28 47.16 35.18 50.23 45.21 39.39
accuracy 90.2 75.68 90.2 86.27 64.29 57.89 78.95 47.37 52.63 57.89
bagEarthGCV r .99 .98 1 .99 1 .99 .98 .99 .96 .98
r2 .98 .96 1 .98 1 .98 .96 .98 .92 .96
RMSE 10.29 25.39 8.8 10.41 7.45 34.08 44.81 46.4 71.59 40.84
accuracy 90.2 75.68 90.2 94.12 86.27 63.16 43.24 42.11 42.11 52.63

Season 30 August (monsoon) 31 December (winter)

Initial past time slots 6:00 am 7:00 am 8:00 am 9:00 am 10:00 am 6:00 am 7:00 am 8:00 am 9:00 am 10:00 am

FoBa r .97 .97 .95 .96 .96 .99 1 .99 .99 .99
r2 .94 .94 .9 .92 .92 .98 1 .98 .98 .98
RMSE 23.33 23.33 33.23 28.8 29.08 14.77 2.92 14.49 11.18 12.46
accuracy 66.07 66.07 56.76 57.14 57.14 63.16 98.21 63.16 78.95 78.95
leapForward r .96 .96 .96 .96 .96 .99 .99 .99 .99 .99
r2 .92 .92 .92 .92 .92 .98 .98 .98 .98 .98
RMSE 21.67 21.67 21.67 21.67 21.67 12.9 12.9 12.9 12.9 12.9
accuracy 83.93 83.93 83.93 83.93 83.93 84.21 84.21 84.21 84.21 84.21
spikeslab r 1 .94 .95 .99 1 .98 .97 .97 .97 .97
r2 1 .88 .9 .98 1 .96 .94 .94 .94 .94
RMSE 13.9 30.64 28.8 20.84 15.63 14.18 17.06 13.48 13.62 13.18
accuracy 78.95 72.97 63.64 57.89 57.89 83.64 72.97 85.45 81.82 85.45
Cubist r .99 .92 .99 .99 .99 .98 .97 .98 .98 .98
r2 .98 .85 .98 .98 .98 .96 .92 .96 .96 .96
RMSE 10.98 37.1 12.78 12.78 14.87 11.22 17.55 10.84 12.47 12.41
accuracy 84.21 64.86 84.21 84.21 84.21 85.71 78.38 83.93 83.93 82.14
bagEarthGCV r .99 .99 1 .99 1 .98 .96 .97 .98 .98
r2 .98 .98 1 .98 1 .96 .92 .94 .96 .96
RMSE 13.54 11.27 9.84 16.02 11.64 11.22 19.82 13.32 11.16 10.59
accuracy 84.21 89.47 94.74 78.95 78.95 85.71 75.68 83.93 87.5 89.29
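The statistics reported in this table (r, r2, RMSE and accuracy) can be recomputed from paired measured/predicted series. The following is a minimal illustrative sketch in Python (the paper's own experiments use the R interface); note that the paper does not define "accuracy" in this excerpt, so the ±10% tolerance used below is an assumption, not the authors' definition:

```python
import numpy as np

def forecast_metrics(measured, predicted, tol=0.10):
    """Statistics of the kind reported in the table: correlation
    coefficient r, r2, RMSE, and an accuracy score.  'Accuracy' is
    assumed here to be the percentage of predictions within +/-10%
    of the measured value (a hypothetical definition)."""
    m = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    r = np.corrcoef(m, p)[0, 1]                       # Pearson correlation
    rmse = float(np.sqrt(np.mean((m - p) ** 2)))      # root mean square error
    accuracy = 100.0 * float(np.mean(np.abs(p - m) <= tol * np.abs(m)))
    return {"r": r, "r2": r * r, "RMSE": rmse, "accuracy": accuracy}

# Toy irradiance values in W/m^2 (not from the NREL dataset):
print(forecast_metrics([100, 300, 500, 700, 900], [110, 290, 520, 680, 910]))
```

The same function can be applied to each model/time-slot combination to fill one cell group of the table.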
[Fig. 8, panels (a) and (b): scatter plots of predicted vs. measured solar irradiance (W/m²) for FoBa, leapForward, spikeslab, Cubist and bagEarthGCV, each annotated with its r² value, together with bar charts of RMSE and accuracy (%); graphical content not reproduced in text.]

Fig. 8. Correlation between predicted and measured solar irradiance: (a) 11th March, 1 h ahead prediction; (b) 11th March, 24 h ahead prediction; (c) 11th March, 48 h ahead prediction; (d) 25th June, 1 h ahead prediction; (e) 25th June, 24 h ahead prediction; (f) 25th June, 48 h ahead prediction; (g) 30th August, 1 h ahead prediction; (h) 30th August, 24 h ahead prediction; (i) 30th August, 48 h ahead prediction; (j) 31st December, 1 h ahead prediction; (k) 31st December, 24 h ahead prediction; (l) 31st December, 48 h ahead prediction.

[Fig. 8 (continued): panels (c) and (d); scatter plots with r² annotations and RMSE/accuracy bar charts, graphical content not reproduced.]

[Fig. 8 (continued): panels (e) and (f); scatter plots with r² annotations and RMSE/accuracy bar charts, graphical content not reproduced.]

[Fig. 8 (continued): panels (g) and (h); scatter plots with r² annotations and RMSE/accuracy bar charts, graphical content not reproduced.]

[Fig. 8 (continued): panels (i) and (j); scatter plots with r² annotations and RMSE/accuracy bar charts, graphical content not reproduced.]

[Fig. 8 (continued): panels (k) and (l); scatter plots with r² annotations and RMSE/accuracy bar charts, graphical content not reproduced.]


• 11th March 2016, 1 h ahead forecasting

For 11th March 2016, in one hour ahead prediction the highest prediction accuracy (90.2%) has been achieved by bagEarthGCV, with an r2 value of 1 and 8.52 RMSE. The historical days from 25th February to 27th March (± 15 days) from the years 2010–2015 have been used for training, and the days from 25th February 2016 to 11th March 2016 have been used for testing. The initial past time slot has been taken from 7:00 a.m., and the last time slot available is one hour ahead of the prediction time.

• 11th March 2016, 24 h ahead forecasting

In 24 h ahead solar forecasting for 11th March 2016, the prediction accuracy has been reduced from 90.2% to 79.17%, offered by the Cubist model. The correlation coefficient and RMSE have been observed as .94 and 38.22 respectively. The historical days from 25th February to 27th March (± 15 days) from the years 2010–2015 have been used for training, and the days from 25th February 2016 to 10th March 2016 have been used for testing. The initial past time slot has been taken from 7:00 a.m., and the last time slot available is 24 h ahead of the prediction time.

• 11th March 2016, 48 h ahead forecasting

In 48 h ahead solar forecasting for 11th March 2016, the maximum prediction accuracy (69.09%) has been offered by the spikeslab model, with .88 correlation coefficient and 45.15 RMSE. The historical days from 25th February to 27th March (± 15 days) from the years 2010–2015 have been used for training, and the days from 25th February 2016 to 9th March 2016 have been used for testing. The initial past time slot has been taken from 7:00 a.m., and the last time slot available is 48 h ahead of the prediction time.

It has been observed that for 11th March 2016, in all three forecasting horizons the performance metrics are satisfactory, and the effectiveness of a particular model is weather specific.

• 25th June 2016, 1 h ahead forecasting

For 25th June 2016, in 1 h ahead prediction spikeslab gains the highest prediction accuracy (69.09%), with .96 correlation coefficient and 45.15 RMSE. The historical days from 10th June to 10th July (± 15 days) from the years 2010–2015 have been used for training, and the days from 10th June 2016 to 25th June 2016 have been used for testing. The initial past time slot has been taken from 7:00 a.m., and the last time slot available is one hour ahead of the prediction time.

• 25th June 2016, 24 h ahead forecasting

In 24 h ahead prediction for 25th June 2016, the Cubist model offers the highest accuracy (65.38%), with .94 correlation coefficient and 46.34 RMSE. The historical days from 10th June to 10th July (± 15 days) from the years 2010–2015 have been used for training, and the days from 10th June 2016 to 24th June 2016 have been used for testing. The initial past time slot has been taken from 7:00 a.m., and the last time slot available is 24 h ahead of the prediction time.

• 25th June 2016, 48 h ahead forecasting

In 48 h ahead prediction for 25th June 2016, the maximum prediction accuracy (62.5%) has been achieved by the spikeslab model, with .94 correlation coefficient and 53.34 RMSE. The historical days from 10th June to 10th July (± 15 days) from the years 2010–2015 have been used for training, and the days from 10th June 2016 to 23rd June 2016 have been used for testing. The initial past time slot has been taken from 7:00 a.m., and the last time slot available is 48 h ahead of the prediction time.

It has been observed that unstable weather conditions (shown in Fig. 7) and low correlation with past days are the reasons for the low prediction accuracy and high RMSE in all forecasting horizons for 25th June 2016.

• 30th August 2016, 1 h ahead forecasting

In one hour ahead prediction for 30th August 2016, the maximum prediction accuracy (83.78%) has been achieved by spikeslab, with .98 correlation coefficient and 22.18 RMSE. The historical days from 15th August to 14th September (± 15 days) from the years 2010–2015 have been used for training, and the days from 15th August 2016 to 30th August 2016 have been used for testing. The initial past time slot has been taken from 7:00 a.m., and the last time slot available is one hour ahead of the prediction time.

• 30th August 2016, 24 h ahead forecasting

In 24 h ahead solar forecasting for 30th August 2016, spikeslab offers the maximum prediction accuracy (88.46%), with .92 correlation coefficient and 23.37 RMSE. The historical days from 15th August to 14th September (± 15 days) from the years 2010–2015 have been used for training, and the days from 15th August 2016 to 29th August 2016 have been used for testing. The initial past time slot has been taken from 7:00 a.m., and the last time slot available is 24 h ahead of the prediction time.

• 30th August 2016, 48 h ahead forecasting

In 48 h ahead prediction for 30th August 2016, the maximum accuracy (91.67%) has been gained by the spikeslab model, with .92 correlation coefficient and 22.73 RMSE. The historical days from 15th August to 14th September (± 15 days) from the years 2010–2015 have been used for training, and the days from 15th August 2016 to 28th August 2016 have been used for testing. The initial past time slot has been taken from 7:00 a.m., and the last time slot available is 48 h ahead of the prediction time.

• 31st December 2016, 1 h ahead forecasting

In 1 h ahead solar forecasting for 31st December 2016, the maximum accuracy (93.86%) has been given by the Cubist model, with .92 correlation coefficient and 10.3 RMSE. The historical days from 16th December to 15th January (± 15 days) from the years 2010–2015 have been used for training, and the days from 16th December 2016 to 31st December 2016 have been used for testing. The initial past time slot has been taken from 7:00 a.m., and the last time slot available is one hour ahead of the prediction time.

• 31st December 2016, 24 h ahead forecasting

The spikeslab model offers the highest prediction accuracy (92.7%) for 31st December 2016 in the 24 h ahead prediction horizon, with .92 correlation coefficient and 10.73 RMSE. The historical days from 16th December to 15th January (± 15 days) from the years 2010–2015 have been used for training, and the days from 16th December 2016 to 30th December 2016 have been used for testing. The initial past time slot has been taken from 7:00 a.m., and the last time slot available is 24 h ahead of the prediction time.

• 31st December 2016, 48 h ahead forecasting

For 48 h ahead forecasting, spikeslab offers the highest prediction accuracy (91.67%), with .94 correlation coefficient and 22.73 RMSE. The historical days from 16th December to 15th January (± 15 days) from the years 2010–2015 have been used for training, and the days from 16th December 2016 to 29th December 2016 have been used for testing. The initial past time slot has been taken from 7:00 a.m., and the last time slot available is 48 h ahead of the prediction time.

It has been observed from the results obtained above that the spikeslab and Cubist models achieve higher prediction accuracy than FoBa, leapForward and bagEarthGCV with respect to the different forecasting horizons across all seasons of the year.

6. Conclusion

The applicability of five machine learning models (FoBa, leapForward, spikeslab, Cubist and bagEarthGCV) to solar irradiance prediction has been investigated and evaluated under seasonal effects using the same test platform and datasets. The main contribution is a performance comparison of the models over forecasting horizons ranging from 1 h ahead to 48 h ahead. The performance has been evaluated with the statistical indices correlation coefficient, RMSE and prediction accuracy (%) for each model.

Regarding the results obtained in experiment 1, the accuracy of a model depends upon the quality of the data selected for model training. For the different days of the year considered (11th March, 25th June, 30th August and 31st December), the best performance metrics (r2, RMSE and accuracy) have been obtained by leapForward (.99, 3.07, 96.08%), Cubist (.98, 35.18, 78.95%), bagEarthGCV (.98, 11.27, 89.47%) and FoBa (1, 2.92, 98.21%) respectively.

In experiment 2, it has been observed that FoBa, leapForward and Cubist perform well with a large set of past time slots according to solar irradiance availability (7:00 a.m. onwards), whereas spikeslab works with fewer past samples. The performance of bagEarthGCV is unpredictable with respect to the number of past samples and is iteration specific.

The results obtained in experiment 3 show that the spikeslab and Cubist models are very promising and stable with respect to the different forecasting horizons. The prediction accuracies for the different horizons (1 h, 24 h and 48 h ahead) gained by spikeslab for 11th March (86.27%, 66.67% and 69.09% respectively), 25th June (69.09%, 61.54% and 62.5% respectively), 30th August (83.78%, 88.46% and 91.67% respectively) and 31st December (92.86%, 92.8% and 91.67% respectively) are satisfactory and stable. Similarly, Cubist achieves (84.31%, 79.17% and 58.18%) for 11th March, (58.18%, 65.38% and 54.17%) for 25th June, (78.38%, 69.23% and 79.17%) for 30th August, and (93.86%, 92.6% and 79.17%) for 31st December.

The results are evidence that solar irradiance forecasting with such machine learning models is a recent and productive line of study that leads to more accurate forecasts than conventional methods.
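The ±15-day train/test windowing used throughout the experiments above (historical windows around the target date from 2010–2015 for training, the immediately preceding days of 2016 for testing) can be sketched as follows. This is an illustrative Python reconstruction, not the authors' R code; the exact boundary convention is an assumption, since a strict ±15 days around 11th March gives 24th February–26th March rather than the paper's quoted 25th February–27th March:

```python
from datetime import date, timedelta

def window_days(target, years, pad=15):
    """Calendar window of +/-`pad` days around `target`'s month/day for
    each historical year (the training window; assumes the month/day
    exists in every listed year)."""
    days = []
    for y in years:
        center = date(y, target.month, target.day)
        days.extend(center + timedelta(days=d) for d in range(-pad, pad + 1))
    return days

# 11th March 2016, as in the first bullet above:
target = date(2016, 3, 11)
train_days = window_days(target, range(2010, 2016))   # windows from 2010-2015
test_days = [target - timedelta(days=d) for d in range(15, 0, -1)]  # 2016 run-up
print(len(train_days), len(test_days))
```

A dataset loader would then select the NREL irradiance records whose dates fall in `train_days` and `test_days` respectively.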

Appendix A

See Table A1 here

Table A1
Comparison of different ambient energy sources.

Power source            Power density                                   Conversion efficiency
Solar                                                                   17%
  – Outdoor             15,000 µW/cm³; 150 µW/cm³
  – Indoor              6 µW/cm³
Vibration
  – Piezoelectric       335 µW/cm³                                      5%
  – Electrostatic       44 µW/cm³                                       9%
  – Electromagnetic     400 µW/cm³                                      1%
Acoustic noise          0.003 µW/cm³ at 75 dB; 0.96 µW/cm³ at 100 dB
Temperature gradient    15 µW/cm³ at 10 °C                              7% at 100 °C; 15% at 200 °C
Human power             330 µW/cm³                                      5–30%
Air flow                7600 µW/cm³ at 5 m/s
Pressure variation      17 µW/cm³

Appendix B. Algorithm FoBa: forward-backward greedy algorithm

Input: f1, ..., fd, y ∈ Rⁿ and ε > 0
Output: F(k) and w(k)

Let F(0) = ∅ and w(0) = 0
Let k = 0
While true
    Let k = k + 1
    // Forward step
    Let i(k) = argmin_i min_α R(w(k−1) + α·e_i)
    Let F(k) = {i(k)} ∪ F(k−1)
    Let w(k) = w(F(k))        // least-squares refit on the support F(k)
    Let δ(k) = R(w(k−1)) − R(w(k))
    If (δ(k) ≤ ε)
        k = k − 1
        break
    endif
    // Backward step
    While true
        Let j(k) = argmin_{j ∈ F(k)} R(w(k) − w_j(k)·e_j), where e_j ∈ R^d is the j-th unit vector
        Let δ′ = R(w(k) − w_{j(k)}(k)·e_{j(k)}) − R(w(k))
        If (δ′ > 0.5·δ(k))
            break
        endif
        Let k = k − 1
        Let F(k) = F(k+1) − {j(k+1)}
        Let w(k) = w(F(k))
    end
end
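The forward-backward loop above can be made concrete with squared-error risk R(·) and a least-squares refit on the current support. The following is a compact teaching sketch under stated assumptions (backward threshold 0.5, mean squared error as R), not the authors' implementation:

```python
import numpy as np

def foba(X, y, eps=1e-3, nu=0.5):
    """Forward-backward greedy selection in the spirit of Appendix B.
    X: (n, d) design matrix, y: (n,) target.  Returns the selected
    support F and the least-squares weights on that support."""
    n, d = X.shape

    def risk(support):
        # Least-squares refit restricted to `support`; returns (MSE, weights).
        w = np.zeros(d)
        if support:
            coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
            w[support] = coef
        return np.mean((y - X @ w) ** 2), w

    F = []
    r_cur, w = risk(F)
    while True:
        # Forward step: add the feature giving the largest risk reduction.
        cand = [i for i in range(d) if i not in F]
        if not cand:
            break
        r_new, w_new, i_new = min(
            (risk(F + [i]) + (i,) for i in cand), key=lambda t: t[0])
        delta = r_cur - r_new
        if delta <= eps:
            break
        F, w, r_cur = F + [i_new], w_new, r_new
        # Backward step: drop features whose removal costs <= nu * delta.
        while len(F) > 1:
            r_drop, w_drop, j = min(
                (risk([f for f in F if f != j]) + (j,) for j in F),
                key=lambda t: t[0])
            if r_drop - r_cur > nu * delta:
                break
            F = [f for f in F if f != j]
            w, r_cur = w_drop, r_drop
    return F, w

# Toy data: y depends only on columns 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 3 * X[:, 0] - 2 * X[:, 2]
print(foba(X, y, eps=1e-6)[0])
```

On clean data like this toy example, the selected support recovers the truly active features.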

Appendix C. Algorithm for subset selection

Forward selection:

1. Assumption: A0 is the null model containing no predictors.
2. For c = 0, ..., k − 1:

• Fit the k − c models that augment Ac with one additional predictor.

• Select the best of these k − c models, i.e. the one with the highest R², and denote it Ac+1.

• Continue the process until a target R² has been achieved or the maximum number of added terms has been reached.
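The steps above map directly to a greedy loop that, at each pass, adds the predictor yielding the highest R². An illustrative Python sketch (function and variable names are hypothetical; the paper's experiments use R):

```python
import numpy as np

def forward_select(X, y, target_r2=0.99, max_terms=None):
    """Greedy forward subset selection as in Appendix C: start from the
    null model and repeatedly add the predictor that yields the highest
    R^2, stopping once `target_r2` is reached or `max_terms` predictors
    have been added."""
    n, d = X.shape
    max_terms = d if max_terms is None else max_terms
    tss = float(np.sum((y - y.mean()) ** 2))  # total sum of squares

    def r2(support):
        # Least-squares fit with an intercept on the chosen predictors.
        A = np.column_stack([X[:, support], np.ones(n)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = float(np.sum((y - A @ coef) ** 2))
        return 1.0 - rss / tss

    selected, best_r2 = [], 0.0
    while len(selected) < max_terms and best_r2 < target_r2:
        scores = [(r2(selected + [i]), i)
                  for i in range(d) if i not in selected]
        best_r2, best_i = max(scores)
        selected.append(best_i)
    return selected, best_r2

# Toy data: y depends on predictors 1 and 3 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = 4 * X[:, 1] + 0.5 * X[:, 3] + 1.0
print(forward_select(X, y, target_r2=0.999))
```

The same greedy structure underlies the leapForward model referenced in the experiments, which performs forward stepwise subset selection for linear regression.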

