Energy and AI
journal homepage: www.elsevier.com/locate/egyai
Keywords: Solar energy; Location selection; Australia; Data envelopment analysis; Random forest

Solar photovoltaic (PV) energy has emerged as a potential alternative to carbon-based energies to meet the Paris agreement commitment. This study investigates the effect of environmental variables on the efficiency of solar PV panels. Data Envelopment Analysis (DEA) is used to estimate the efficiencies of 91 solar PV panels located in Australia during the time period 2010–2020. The effects of environmental variables on the estimated efficiencies are quantified using the truncated regression model. Random forest is then used to predict the efficiency of solar PV panels in every city of Australia. The results identify the most suitable locations and regions for solar PV energy production in Australia. This study provides an interesting and easily interpretable tool for policy decision makers.

https://doi.org/10.1016/j.egyai.2022.100222
Received 21 September 2022; Received in revised form 6 December 2022; Accepted 7 December 2022
Available online 13 December 2022
2666-5468/© 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
G. Cattani Energy and AI 11 (2023) 100222
Table 1
DEA application in solar PV energy.

| Author(s) and year | Scope | Methodology | Time period |
|---|---|---|---|
| [39] | 25 cities in Iran | DEA, PCA and NT | Unspecified |
| [40] | 150 solar plant units in Iran | ANN and FDEA | Unspecified |
| [20] | 8 solar cell industries in Taiwan | DEA | 2010–2011 |
| [22] | 32 solar firms in Taiwan | DEA and AHP | Unspecified |
| [21] | 12 solar cell companies in Taiwan | DEA | 2011 |
| [36] | 16 cities in Japan | DEA | Unspecified |
| [23] | 160 PV power stations in Germany and in US | DEA | 2012 |
| [37] | 30 cities in Turkey | DEA | 2010 |
| [41] | 15 solar plant sites in Taiwan | DEA and FDEA | Unspecified |
| [25] | 40 PV companies in China and in US | SBM | 2009–2013 |
| [24] | 160 PV power stations in Germany and in US | DEA | 2012 |
| [27] | 855 commercial rooftop PV systems in the USA | DEA | 2008–2012 |
| [28] | 855 commercial rooftop PV systems in US | DEA | 2008–2012 |
| [26] | Photovoltaic power generation in China | SE-DEA | 2005–2015 |
| [29] | 70 solar PV power plants in the USA | Three-stage DEA | 2010 |
| [30] | 84 solar panel sectors in South Africa | DEA | 2014 |
| [31] | Crystalline silicon and thin-film PV solar cell industries in Iran | DEA | Unspecified |
| [32] | 16 Solar PV panels in India | DEA and Shannon’s entropy | Unspecified |
| [42] | 46 potential sites in Vietnam | DEA, FAHP and TOPSIS | Unspecified |
| [33] | 42 photovoltaic poverty alleviation projects in China | Three-phase DEA | Unspecified |
| [34] | 118 PV plants in China | DEA | 2012–2016 |
| [38] | 44 sites in Iran | DEA | Unspecified |
| [43] | 20 cities in Taiwan | DEA and AHP | Unspecified |
| [35] | 21 solar mini-grids in Bangladesh | DEA and AHP | 2010–2019 |
| [44] | 27 locations in Vietnam | DEA, G-AHP and G-TOPSIS | Unspecified |
efficiency, there are no studies that focus on predicting the efficiencies estimated by DEA. Such predictions are nevertheless necessary to support the decisions of policy makers. For example, the location of new solar panels can be chosen on the basis of efficiency prediction (i.e. the technical efficiencies of potential solar PV panel sites are predicted and compared in order to select the optimal location). Several studies in other applied fields have already proposed to predict the efficiencies estimated by DEA. [45] use a gradient boosting approach to predict the DEA efficiency of 386 operational anaerobic digestion facilities. The DEA efficiencies of 450 paddy producers have been predicted by Random Forest [46]. [47] rely on a combination of machine learning methods and DEA to predict the efficiency of Chinese manufacturing companies. Several prediction algorithms have been used to predict the DEA efficiency of COVID-19 management [48,49]. Finally, [50] use a combination of DEA and ANN to predict the efficiencies of water companies. However, in these studies, inference from the machine learning estimates is restricted to variable importance. This limitation is due to the black-box nature of these methods [51]. This study proposes to predict the efficiency in each city in Australia and to derive a directly interpretable efficiency map.

[…] variables on the estimated efficiencies and predict the optimal location for solar PV installation based on efficiency.

3.1. Data envelopment analysis

DEA is a non-parametric method used to measure the relative efficiency of a decision making unit (DMU) transforming inputs into outputs [15]. By denoting the set of inputs, $x \in \mathbb{R}^d_+$, and the set of outputs, $y \in \mathbb{R}^p_+$, we can define the production possibility set containing all feasible combinations of inputs and outputs as

$$\Psi = \{(x, y) \mid x \text{ can produce } y\}. \tag{1}$$

The output technical efficiency of a DMU with set $(x_0, y_0) \in \mathbb{R}^{d+p}_+$ is then defined as a measure of the Euclidean distance from the point $(x_0, y_0)$ to the boundary of $\Psi$ [52]. Since $\Psi$ is unknown, it is generally estimated by the convex hull of the free disposal hull of observed data,

$$\hat{\Psi} = \Big\{(x, y) \;\Big|\; y \le \sum_{i=1}^{n} \gamma_i y_i,\; x \ge \sum_{i=1}^{n} \gamma_i x_i,\; \sum_{i=1}^{n} \gamma_i = 1,\; \gamma_i \ge 0 \;\; \forall i = 1, \dots, n\Big\}, \tag{2}$$

where the $\gamma_i$ are the intensity variables. The DEA technical efficiency estimator can be expressed in linear programming terms

[…] (3)

where $\delta_0$ is a measure of technical efficiency for the producing unit with the corresponding input $x_0$ and output $y_0$. It has been shown that […]
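To make the estimator over $\hat{\Psi}$ in Eq. (2) concrete, the following minimal sketch computes an output-oriented score for the simplest case of one input and one output. It is an illustration under stated assumptions, not the implementation used in the study: the function name `dea_output_efficiency`, the toy data, and the normalisation of the score to at most 1 (observed output divided by frontier output) are hypothetical choices. With a single input the linear program has only two constraints, so an optimal reference point mixes at most two observed units and the candidates can simply be enumerated; with several inputs or outputs a proper LP solver would be needed.

```python
from itertools import combinations


def dea_output_efficiency(x, y, x0, y0):
    """Output-oriented score of (x0, y0) against the hull of Eq. (2).

    Assumes ONE input and ONE output (illustrative simplification).
    Returns y0 / y_max, where y_max is the largest output attainable
    with input x0, so observed units score at most 1.
    """
    units = list(zip(x, y))
    best = 0.0
    # Free disposability: any observed unit using no more input than x0.
    for xi, yi in units:
        if xi <= x0:
            best = max(best, yi)
    # Convexity: a mix of two units whose inputs bracket x0.
    for a, b in combinations(units, 2):
        lo, hi = (a, b) if a[0] <= b[0] else (b, a)
        if lo[0] <= x0 <= hi[0] and hi[0] > lo[0]:
            w = (x0 - lo[0]) / (hi[0] - lo[0])
            best = max(best, (1 - w) * lo[1] + w * hi[1])
    if best == 0.0:
        raise ValueError("no feasible reference unit for this input level")
    return y0 / best


# Hypothetical data: installed capacity (input) and electricity produced (output).
capacity = [1.0, 2.0, 4.0, 3.0]
energy = [1.0, 3.0, 4.0, 2.0]
scores = [dea_output_efficiency(capacity, energy, c, e)
          for c, e in zip(capacity, energy)]
```

In this toy sample the first three units lie on the estimated frontier, while the fourth is dominated by a mix of the second and third units and therefore scores below 1.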
where $z_i$ is a $k$-dimensional set of environmental variables. This econometric model was originally estimated with Tobit and ordinary least squares regression until the seminal paper of [54]. The authors showed that these methods are not consistent due to the right-tail³ truncation of the error term $\epsilon_i$. Since $\hat{\delta}_i \le 1$, we can rewrite Eq. (4) as

$$\epsilon_i \le 1 - f(z_i), \quad i = 1, \dots, n. \tag{5}$$

The authors also highlighted the bias of $\hat{\delta}$ and the serial correlation in Eq. (4). To solve these issues, they propose a statistical model in which the two-stage approach is appropriate. The error term is assumed to be normally distributed with zero mean and unknown variance, truncation is determined by Eq. (5), and the unknown smooth function is assumed to be linear, $f(z_i) = z_i \beta$. Estimates of the $\beta$'s are obtained with a combination of truncated maximum likelihood estimation and the bootstrap method; see the second algorithm of [54,55] for details. The steps of the estimation method are summarized in Algorithm 1.

Algorithm 1 Second algorithm of [54]
Require: Input data $x \in \mathbb{R}^d_+$, output data $y \in \mathbb{R}^p_+$ and environmental variable data $z \in \mathbb{R}^k$.
1: compute $\hat{\delta}_i$, $\forall i = 1, \dots, n$, from the original sample using (3).
2: estimate $\beta$ and $\sigma_\epsilon$ (variance of $\epsilon_i$) with maximum likelihood estimation of truncated regression (4) assuming $f(z_i) = z_i \beta$.
3: for each of the $L_1$ bootstrap samples
4:     draw $\epsilon_i$, $\forall i = 1, \dots, n$, from $N(0, \hat{\sigma}_\epsilon)$ with truncation.
5:     compute $\delta_i^{*} = z_i \hat{\beta} + \epsilon_i$, $\forall i = 1, \dots, n$.
6:     set $x_i^{*} = x_i$ and $y_i^{*} = y_i \hat{\delta}_i / \delta_i^{*}$, $\forall i = 1, \dots, n$.
7:     compute $\hat{\delta}_i^{*}$, $\forall i = 1, \dots, n$, using (3) with $x_i^{*}$ and $y_i^{*}$.
8: compute the bias-corrected estimator as $\hat{\hat{\delta}}_i = \hat{\delta}_i - \widehat{Bias}(\hat{\delta}_i)$, $\forall i = 1, \dots, n$, where $\widehat{Bias}(\hat{\delta}_i)$ is estimated from the previous bootstrap procedure.
9: estimate $\hat{\hat{\beta}}$ and $\hat{\hat{\sigma}}_\epsilon$ with maximum likelihood estimation of truncated regression (4) with $\hat{\hat{\delta}}_i$.
10: for each of the $L_2$ bootstrap samples
11:     draw $\epsilon_i$, $\forall i = 1, \dots, n$, from $N(0, \hat{\hat{\sigma}}_\epsilon)$ with truncation.
12:     compute $\delta_i^{**} = z_i \hat{\hat{\beta}} + \epsilon_i$, $\forall i = 1, \dots, n$.
13:     estimate $\hat{\hat{\beta}}^{*}$ and $\hat{\hat{\sigma}}_\epsilon^{*}$ with maximum likelihood estimation of truncated regression (4) with $\delta_i^{**}$.
14: return the bias-corrected estimates $\hat{\hat{\beta}}$ and the estimated $\hat{V}[\hat{\hat{\beta}}]$ based on the second bootstrap procedure.

³ Often presented in the literature as a left-tail truncation when reciprocals of technical efficiency $1/\delta_i$ are used.

3.3. Random forest

The Random Forest introduced by [56] is one of the most popular machine learning tools. Its relatively high accuracy and ability to handle small sample sizes have prompted many practitioners to apply it in various research areas [57]. The simplest way to describe the Random Forest is to consider it as an ensemble of different regression trees.

A regression tree is constructed by successively splitting the data into a set of rectangles. Each variable split is chosen to minimize a criterion (e.g. the residual sum of squares). The feature space is divided into two regions stored as nodes. These nodes are further divided until a stopping criterion is met (e.g. the minimum number of observations in terminal nodes). The response variable is then predicted by averaging the values in each group.

An illustration is provided in Fig. 1. Consider 828 estimated efficiencies of solar PV plants. These efficiencies are separated into two groups corresponding respectively to solar panels affected by more (733) and less (95) than 0.000133 millimeters of snow per hour. A second split is chosen in the left group with respect to the variable Wind Speed at the value 1.1 meters per second, and a third split at the value 23 degrees. In this minimal example, the efficiencies of solar PV panels are predicted by the average of each group.

The idea of the Random Forest is to improve the performance of regression trees using bootstrap aggregating (bagging). Suppose we fit a model based on our observed sample and obtain a predicted response variable $\hat{y} = \hat{f}(z)$, where $z$ is the set of explanatory variables. We can obtain $B$ bootstrap samples by resampling the original data with replacement. A model is fitted for each bootstrap sample and the final prediction is obtained by averaging all predictions,

$$\hat{f}_{bagging}(z) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(z), \tag{6}$$

where $\hat{f}^{*b}(z)$ is the predicted response variable of bootstrap sample $b$. Bagging reduces the variance of the final prediction.

Regression trees benefit greatly from bagging. Indeed, if we allow them to grow in depth, they generally have relatively low bias and high variance. However, the regression trees constructed in each bootstrap sample are correlated because they are built on the same set of explanatory variables. Unfortunately, the higher the correlation between the trees, the higher the variance of the bagging estimator (6). Hence, the Random Forest algorithm randomly selects a fraction of the explanatory variables (e.g. one third of the available variables) for each bootstrap sample and uses them to construct the regression tree. This reduces the correlation between trees without increasing the variance too much.

The Random Forest algorithm is used to predict the technical efficiencies estimated by DEA using the set of environmental variables $z$. One may wonder why the Random Forest algorithm is used instead of other prediction algorithms. Therefore, predictions are also made using other machine learning techniques and compared based on a goodness-of-fit criterion. For the prediction problem presented in this paper, Random Forest is the most accurate algorithm (see Section 4.3). More generally, Random Forest is expected to perform well with non-linear data and to converge relatively fast [58]. The Random Forest algorithm is also robust to outliers [59], requires relatively few hyperparameters compared to other methods [60] and does not overfit the data [61]. The […]
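The bagging estimator of Eq. (6) can be sketched in a few lines. The example below is a deliberately small illustration and not the study's code: it uses one-split regression trees (stumps) on a single explanatory variable, and the names `fit_stump` and `bagging_predict` as well as the temperature and efficiency values are hypothetical. Each stump chooses the split that minimises the residual sum of squares and predicts the mean of each group; the bagged prediction averages the stumps fitted on bootstrap samples.

```python
import random


def fit_stump(z, y):
    """Fit a one-split regression tree on a single explanatory variable."""
    pairs = sorted(zip(z, y))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates identical values
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        rss = (sum((v - ml) ** 2 for v in left)
               + sum((v - mr) ** 2 for v in right))
        if best is None or rss < best[0]:
            best = (rss, (pairs[k - 1][0] + pairs[k][0]) / 2, ml, mr)
    if best is None:  # all z identical: predict the overall mean
        mean = sum(y) / len(y)
        return lambda q: mean
    _, threshold, ml, mr = best
    return lambda q: ml if q <= threshold else mr


def bagging_predict(z, y, query, n_boot=200, seed=0):
    """Average the stump predictions over bootstrap samples, as in Eq. (6)."""
    rng = random.Random(seed)
    n = len(z)
    total = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        stump = fit_stump([z[i] for i in idx], [y[i] for i in idx])
        total += stump(query)
    return total / n_boot


# Hypothetical data: temperature (degrees) and estimated efficiency.
temperature = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
efficiency = [0.95, 0.93, 0.94, 0.90, 0.80, 0.78, 0.75, 0.74]
stump = fit_stump(temperature, efficiency)
cool = bagging_predict(temperature, efficiency, 11.0)
warm = bagging_predict(temperature, efficiency, 23.0)
```

A single stump jumps between two group means at its threshold, whereas the bagged prediction averages many slightly different thresholds and group means, which is the variance reduction described above. A full Random Forest would additionally grow deeper trees and restrict the candidate variables.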
4. Empirical results
Table 2
Summary statistics.

| Variable | Unit | Number of observations | Mean | Standard deviation | Minimum | Maximum |
|---|---|---|---|---|---|---|
| Electricity produced | Megawatt hour | 828 | 7096 | 4371 | 1479 | 44964 |
| Installed capacity | Megawatt | 828 | 4717 | 2819 | 1480 | 29700 |
| Solar irradiance | Watt per m² | 828 | 0.3071 | 0.022 | 0.2311 | 0.3669 |
| Temperature | Degrees | 828 | 17.506 | 3.879 | 9.974 | 24.085 |
| Precipitation | Millimeters per hour | 828 | 0.073 | 0.033 | 0.003 | 0.294 |
| Wind speed | Meter per second | 828 | 4.063 | 1.487 | 0.260 | 6.595 |
| Snowfall | Millimeters per hour | 828 | 0.000 | 0.001 | 0.000 | 0.004 |
| Air density | Kilogram per m³ | 828 | 1.188 | 0.030 | 1.116 | 1.225 |
| Age of solar plant | Years | 828 | 5.059 | 2.606 | 1.000 | 10.000 |
Table 3
Estimates of the Bootstrap Truncated Regression.

| Truncated Bootstrap Regression | Estimate | 2.5% C.I. | 97.5% C.I. |
|---|---|---|---|
| Intercept | 1.866∗∗∗ (0.210) | 1.447 | 2.242 |
| Precipitation | 0.185 (0.120) | −0.035 | 0.419 |
| Wind Speed | 0.013∗∗∗ (0.003) | 0.008 | 0.018 |
| Snowfall | 61.933∗∗∗ (14.260) | 38.781 | 85.549 |
| Air Density | −0.842∗∗∗ (0.174) | −1.149 | −0.488 |
| Temperature | −0.096∗∗∗ (0.002) | −0.013 | −0.007 |
| Age | −0.005∗∗∗ (0.001) | −0.007 | −0.002 |

Note: 828 obs. ∗ p < 0.1; ∗∗ p < 0.05; ∗∗∗ p < 0.01.

[…] consistent with [68] assessing that wind has a cooling effect on the solar PV panel and with [10] observing that the positive effect of wind speed is relatively small compared to the negative effect of temperature. Indeed, the significant coefficient associated with the variable Temperature is −0.096; see [69] for a review of the negative effect of temperature on solar PV plant performance. The coefficient associated with the variable Snowfall is positive and significant. The magnitude, 61.933, is relatively large compared to the other coefficients. This is mainly because snowfall is not common in Australia (the maximum value of the variable Snowfall is 0.004 in the sample), so the coefficient $\beta_3$ absorbs the scale of the variable. The sign of the estimated coefficient seems surprising, since snow accumulation on solar glass is thought to reduce the efficiency of solar installations [70]. One possible explanation is that Australia’s warm climate does not allow snow to accumulate. Therefore, snowfall only cleans the solar PV glass. A significant negative effect is estimated for the variable Air Density. An increase of air mass reduces the sunlight reaching the solar cell and decreases the performance of the solar PV plant [71]. The effect of aging is estimated at −0.005. In other words, the efficiency of an installed photovoltaic solar panel decreases by 0.05 every 10 years. Finally, the variable Precipitation is not significant in the truncated regression model.

4.3. Predicted efficiencies

The truncated regression model presented in the previous section allows a direct interpretation of the estimated coefficients. However, this model requires relatively strong assumptions (e.g. linear effect of environmental variables, no interaction effect, ...). It is therefore particularly interesting to estimate the flexible Random Forest model. The estimation is made using the R package randomForest, setting the number of regression trees to 1000.

To improve the performance of the Random Forest, the hyperparameters are tuned following [72]. The set of hyperparameters contains mtry, the number of randomly selected variables; node size, the minimum number of observations in terminal nodes; and sample size, the size of the bootstrap samples. mtry is usually set to one third of the available variables. Lower values lead to less correlated trees but decrease the average performance of the trees (suboptimal variables can be used). Node size is set by default to 5. This hyperparameter controls the depth of the regression tree. Sample size is generally set to the same size as the original sample. As with the hyperparameter mtry, lower values lead to less correlated trees. The R package tuneRanger is used to tune all hyperparameters simultaneously. This package relies on sequential model-based optimization (SMBO) [73] to select the optimal set of hyperparameters based on the mean squared error criterion. In our optimization problem, the tuned hyperparameters are mtry = 6, node size = 8 and sample size = 59%. Note that setting mtry to 6 implies that we are using the full set of variables available in each bootstrap sample. The resulting regression trees are more accurate but potentially correlated. However, by setting the sample size to about 60% of the original sample, the trees are de-correlated. The regression trees of the first 4 bootstrap samples are displayed in Fig. 3. Note that even though the trees are constructed with the same set of explanatory variables, setting the sample size at 60% produces different trees.

The variable importance of each predictor is displayed in Fig. 4. The metric used is the mean decrease in accuracy, which is better than using node impurity [74]. The most important variable is Temperature. The age of the solar PV panel has less influence on the estimated efficiency than the environmental variables. It should be noted that the variable Precipitation is used for efficiency prediction although it is not significant in the truncated regression. This could indicate that precipitation has a non-linear effect on the estimated efficiencies. To provide an interpretation of the Random Forest results, the efficiency of newly installed solar PV panels is predicted in all Australian cities with a population of more than 1000 inhabitants. Note that the Random Forest estimation used only environmental variables. It is then sufficient to obtain the value of these variables at each city location using the MERRA dataset to predict the efficiency of a solar PV plant installed in this location. The efficiency of solar PV panels is predicted for 3828 cities.⁷ The ranking of cities based on the efficiency score is provided in Table 5. The most suitable city to install solar PV panels seems to be the small town of Evandale located on the island of Tasmania. The least suitable location is the city of Augusta in the south west of Australia. The cities of Melbourne and Sydney are ranked 445 and 2226 respectively. To visualize the location of the predicted efficiencies for solar PV panels, the average of predicted efficiencies is computed for each Australian region. The results are displayed in Fig. 5. The most favorable region for the installation of solar PV panels is the island of Tasmania. The southeastern regions also appear to be more favorable than the northern or western regions. The most unfavorable region according to these results is the Far North located at the northern tip of Australia.

⁷ The city’s location is defined by the Australian national cartography mapping agency as the location of the city hall.
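The mean decrease in accuracy behind the variable importance plot can be illustrated with a simplified permutation scheme: shuffle one explanatory variable at a time and record how much the mean squared error of the predictions increases. The sketch below is an assumption-laden illustration rather than the procedure of the randomForest package, which evaluates the permutation on the out-of-bag observations of each tree; the function names, the toy predictor and the data are hypothetical.

```python
import random


def mse(model, rows, y):
    """Mean squared error of a callable predictor on a list of rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)


def permutation_importance(model, rows, y, n_repeats=20, seed=0):
    """Mean increase in MSE when one column is shuffled, for every column."""
    rng = random.Random(seed)
    baseline = mse(model, rows, y)
    importances = []
    for j in range(len(rows[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between column j and y
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increase += mse(model, shuffled, y) - baseline
        importances.append(increase / n_repeats)
    return importances


# Hypothetical predictor that depends on temperature (column 0) only
# and ignores precipitation (column 1).
def toy_model(row):
    return 1.0 - 0.01 * row[0]


rows = [[10.0, 0.1], [14.0, 0.3], [18.0, 0.2],
        [22.0, 0.5], [12.0, 0.4], [20.0, 0.6]]
y = [toy_model(r) for r in rows]
importance = permutation_importance(toy_model, rows, y)
```

Shuffling a variable that the predictor relies on degrades the predictions, whereas shuffling an ignored variable leaves the error unchanged, so the ranking of the increases gives the importance ordering.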
Fig. 4. Variable importance of each predictor based on the mean decrease in accuracy.
Fig. 5. Average of predicted efficiencies of newly installed solar PV panels in 2020 by
Australian region.
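The comparison of prediction methods reported in Table 4 relies on RMSE under 10-fold cross-validation. The procedure can be sketched as follows, assuming a generic `fit` function that returns a predictor; the names `kfold_rmse` and `fit_mean` and the toy data are hypothetical, and the sketch pools the squared errors of all held-out observations, whereas some packages average per-fold RMSE values instead.

```python
import math


def kfold_rmse(fit, rows, y, k=10):
    """RMSE of out-of-fold predictions.

    `fit(train_rows, train_y)` must return a callable predictor.
    Folds are assigned by position (i % k); shuffle the data first
    if the observations are ordered.
    """
    squared_errors = []
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        if not test:
            continue
        model = fit([rows[i] for i in train], [y[i] for i in train])
        squared_errors += [(model(rows[i]) - y[i]) ** 2 for i in test]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def fit_mean(train_rows, train_y):
    """Baseline learner: always predict the mean of the training response."""
    m = sum(train_y) / len(train_y)
    return lambda row: m


# Hypothetical data: 20 observations with a single explanatory variable.
rows = [[float(i)] for i in range(20)]
values = [0.0, 1.0] * 10
baseline_rmse = kfold_rmse(fit_mean, rows, values, k=10)
```

Any learner can be plugged in through `fit`, so the same held-out observations are used to score every method and the resulting RMSE values are directly comparable.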
Table 4
RMSE computed with 10-fold cross-validation.

| Method | RMSE | Hyperparameters |
|---|---|---|
| Random forest | 0.076 | mtry = 6, nodesize = 8, sampsize = 0.59 |
| GAM | 0.079 | method = GCV.Cp and select = FALSE |
| SVM | 0.083 | Cost = 1 |
| ANN | 0.089 | Layer1 = 3, Layer2 = 0, Layer3 = 0 |
| Boosted tree | 0.095 | u = 0.1, mstop = 100, maxdepth = 3 |

One may argue that other machine learning techniques should be applied instead of the Random Forest. Therefore, the efficiencies of solar PV panels are also predicted using ANN, Generalized Additive Model (GAM), Support Vector Machines with Linear Kernel (SVM) and Boosted Tree. The hyperparameters of the methods listed above are also optimized using the R package caret; see Table 4 for the list of optimized tuning parameters. The prediction methods are then compared based on the root mean squared error (RMSE) computed for each method using 10-fold cross-validation. The RMSE values are reported in Table 4.

The Random Forest is the most accurate method to predict DEA technical efficiencies. Note that ANN does not perform well because there is not enough data for estimating this complex model. The best neural network (selected by the caret package) contains only one layer with 3 neurons. The Boosted tree is the worst method, probably because of its tendency to overfit. The correlation between the efficiencies predicted by the Random Forest and by the other methods is relatively high (i.e. between 0.6 and 0.7).

5. Conclusion

The transition from fossil fuel combustion to renewable energy is one of the biggest challenges of the next decades. Among the renewable energies, solar PV has received a lot of attention. The potential of this source of electricity is relatively large. Therefore, it is of particular interest to find the most suitable locations for installing solar PV panels. This study examines the effect of environmental variables on the technical efficiency of solar PV panels. First, the efficiencies of 91 solar PV panels installed in Australia are estimated during the time period 2010–2020 with DEA. Then the truncated regression model is used to estimate the
effects of environmental variables on the estimated efficiencies. Finally the highly flexible Random forest is used to predict the efficiency of newly installed solar PV panels in the 3828 Australian cities. The results highlight optimal locations and regions for the installation of solar PV panels. This study provides an interesting and easily interpretable tool for policy decision makers. Further research could use the same methodology to investigate the best location for the installation of other renewable energy plants.

Table 5
Predicted efficiency score of Australian cities for 2020.

| City | Score | Rank |
|---|---|---|
| Evandale | 0.973 | 1 |
| West Moonah | 0.972 | 2 |
| Glenorchy | 0.972 | 3 |
| Mount Stuart | 0.972 | 4 |
| Goodwood | 0.972 | 5 |
| Lutana | 0.972 | 6 |
| ... | ... | ... |
| Melbourne | 0.859 | 445 |
| ... | ... | ... |
| Sydney | 0.762 | 2226 |
| ... | ... | ... |
| Mount Sheridan | 0.600 | 3823 |
| Bentley Park | 0.600 | 3824 |
| Dunsborough | 0.559 | 3825 |
| Quindalup | 0.558 | 3826 |
| Yallingup | 0.558 | 3827 |
| Augusta | 0.555 | 3828 |

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Appendix A

See Fig. A.6, Fig. A.7, Fig. A.8 and Table A.6.

Fig. A.6. Estimated distribution of tilt angle measured in degrees of solar PV panels.
Fig. A.7. Generated electricity in MWh against Inverter size measured in Watts.
Fig. A.8. Generated electricity in MWh against number of panels of solar PV plants.

Table A.6
Orientation of Solar PV plants.

| Orientation | East | North East | North | North West | West |
|---|---|---|---|---|---|
| Number of solar PV panels | 5 | 21 | 44 | 19 | 2 |

References
[1] Van Ruijven BJ, De Cian E, Sue Wing I. Amplification of future energy demand growth due to climate change. Nature Commun 2019;10(1):1–12.
[2] Yasa UG, Erim M, Erim N, Girgin M, Kurt H. Design of anti-reflective graded height nanogratings for photovoltaic applications. In: 2017 international conference on numerical simulation of optoelectronic devices. IEEE; 2017, p. 25–6.
[3] Sarkın AS, Ekren N, Sağlam Ş. A review of anti-reflection and self-cleaning coatings on photovoltaic panels. Sol Energy 2020;199:63–73.
[4] Choudhary P, Srivastava RK. Sustainability perspectives-a review for solar photovoltaic trends and growth opportunities. J Clean Prod 2019;227:589–612.
[5] Sundareswaran K, Sankar P, Nayak PSR, Simon SP, Palani S. Enhanced energy output from a PV system under partial shaded conditions through artificial bee colony. IEEE Trans Sustain Energy 2014;6(1):198–209.
[6] Xiao M, Junne T, Haas J, Klein M. Plummeting costs of renewables-are energy scenarios lagging? Energy Strategy Rev 2021;35:100636.
[7] Liang Z, Zhou Z, Zhao L, Dong B, Wang S. Fabrication of transparent, durable and self-cleaning superhydrophobic coatings for solar cells. New J Chem 2020;44(34):14481–9.
[8] Murdock HE, Gibb D, André T, Sawin JL, Brown A, Ranalder L, et al. Renewables 2021-global status report. 2021.
[9] Commission IE, et al. Crystalline silicon terrestrial photovoltaic (PV) modules - design qualification and type approval. CEI/IEC 2005;61215.
[10] Gaglia AG, Lykoudis S, Argiriou AA, Balaras CA, Dialynas E. Energy efficiency of PV panels under real outdoor conditions–an experimental assessment in Athens, Greece. Renew Energy 2017;101:236–43.
[11] Hamou S, Zine S, Abdellah R. Efficiency of PV module under real working conditions. Energy Procedia 2014;50:553–8.
[12] Hachicha AA, Al-Sawafta I, Said Z. Impact of dust on the performance of solar photovoltaic (PV) systems under United Arab Emirates weather conditions. Renew Energy 2019;141:287–97.
[13] Adeh EH, Good SP, Calaf M, Higgins CW. Solar PV power potential is greatest over croplands. Sci Rep 2019;9(1):1–6.
[14] Santhakumari M, Sagar N. A review of the environmental factors degrading the performance of silicon wafer-based photovoltaic modules: Failure detection methods and essential mitigation techniques. Renew Sustain Energy Rev 2019;110:83–100.
[15] Charnes A, Cooper WW, Rhodes E. Measuring the efficiency of decision making units. European J Oper Res 1978;2(6):429–44.
[16] Aigner D, Lovell CK, Schmidt P. Formulation and estimation of stochastic frontier production function models. J Econometrics 1977;6(1):21–37.
[17] Meeusen W, van Den Broeck J. Efficiency estimation from cobb-douglas production functions with composed error. Internat Econom Rev 1977;435–44.
[18] Mardani A, Zavadskas EK, Streimikiene D, Jusoh A, Khoshnoudi M. A comprehensive review of data envelopment analysis (DEA) approach in energy efficiency. Renew Sustain Energy Rev 2017;70:1298–322.
[19] Mohd Chachuli FS, Ahmad Ludin N, Mat S, Sopian K. Renewable energy performance evaluation studies using the data envelopment analysis (DEA): A systematic review. J Renew Sustain Energy 2020;12(6):062701.
[20] Chueh H-E, Jheng J-Y. Applying data envelopment analysis to evaluation of Taiwanese solar cell industry operational performance. Int J Comput Sci Inform Technol 2012;4(4):1.
[21] Hsiao J-M. Measuring the operating efficiency of solar cell companies in Taiwan with data envelopment analysis. Am J Appl Sci 2012;9(12):1899.
[22] Lee AH, Lin CY, Kang H-Y, Lee WH. An integrated performance evaluation model for the photovoltaics industry. Energies 2012;5(4):1271–91.
[23] Sueyoshi T, Goto M. Photovoltaic power stations in Germany and the United States: A comparative study by data envelopment analysis. Energy Econ 2014;42:271–88.
[24] Sueyoshi T, Goto M. Measurement of returns to scale on large photovoltaic power stations in the United States and Germany. Energy Econ 2017;64:306–20.
[25] Li N, Liu C, Zha D. Performance evaluation of Chinese photovoltaic companies with the input-oriented dynamic SBM model. Renew Energy 2016;89:489–97.
[26] Liu J, Long Y, Song X. A study on the conduction mechanism and evaluation of the comprehensive efficiency of photovoltaic power generation in China. Energies 2017;10(5):723.
[27] Wang DD, Sueyoshi T. Assessment of large commercial rooftop photovoltaic system installations: Evidence from California. Appl Energy 2017;188:45–55.
[28] Sueyoshi T, Wang D. Measuring scale efficiency and returns to scale on large commercial rooftop photovoltaic systems in California. Energy Econ 2017;65:389–98.
[29] Wang Z, Li Y, Wang K, Huang Z. Environment-adjusted operational performance evaluation of solar photovoltaic power plants: A three stage efficiency analysis. Renew Sustain Energy Rev 2017;76:1153–62.
[30] de Villiers A, Vermeulen H. Sector performance monitoring in utility-scale solar farms using data envelopment analysis. In: 2017 IEEE PES powerafrica. IEEE; 2017, p. 192–7.
[31] Haeri A. Evaluation and comparison of crystalline silicon and thin-film photovoltaic solar cells technologies using data envelopment analysis. J Mater Sci, Mater Electron 2017;28(23):18183–92.
[32] Ghosh S, Yadav VK, Mukherjee V, Yadav P. Evaluation of relative impact of aerosols on photovoltaic cells through combined Shannon’s entropy and data envelopment analysis (DEA). Renew Energy 2017;105:344–53.
[33] Wu Y, Ke Y, Zhang T, Liu F, Wang J. Performance efficiency assessment of photovoltaic poverty alleviation projects in China: A three-phase data envelopment analysis model. Energy 2018;159:599–610.
[34] You H, Fang H, Wang X, Fang S. Environmental efficiency of photovoltaic power plants in China - A comparative study of different economic zones and plant types. Sustainability 2018;10(7):2551.
[35] Aziz S, Chowdhury SA. Performance evaluation of solar mini-grids in Bangladesh: A two-stage data envelopment analysis. Clean Environ Syst 2021;2:100003.
[36] Yokota S, Kumano T. Mega-solar optimal allocation using data envelopment analysis. Electr Eng Jpn 2013;183(4):24–32.
[37] Sozen A, Mirzapour A, Çakir MT. Selection of the best location for solar plants in Turkey. J Energy Southern Afr 2015;26(4):52–63.
[38] Dehghani E, Jabalameli MS, Pishvaee MS, Jabarzadeh A. Integrating information of the efficient and anti-efficient frontiers in DEA analysis to assess location of solar plants: A case study in Iran. J Ind Syst Eng 2018;11(1):163–79.
[39] Azadeh A, Ghaderi S, Maghsoudi A. Location optimization of solar plants by an integrated hierarchical DEA PCA approach. Energy Policy 2008;36(10):3993–4004.
[40] Azadeh A, Sheikhalishahi M, Asadzadeh S. A flexible neural network-fuzzy data envelopment analysis approach for location optimization of solar plants with uncertainty and complexity. Renew Energy 2011;36(12):3394–401.
[41] Lee AH, Kang H-Y, Lin C-Y, Shen K-C. An integrated decision-making model for the location of a PV solar plant. Sustainability 2015;7(10):13522–41.
[42] Wang C-N, Nguyen VT, Thai HTN, Duong DH. Multi-criteria decision making (MCDM) approaches for solar power plant location selection in Viet Nam. Energies 2018;11(6):1504.
[43] Wang C-N, Dang T-T, Bayer J, et al. A two-stage multiple criteria decision making for site selection of solar photovoltaic (PV) power plant: A case study in Taiwan. IEEE Access 2021;9:75509–25.
[44] Wang C-N, Dang T-T, Wang J-W, et al. A combined data envelopment analysis (DEA) and grey based multiple criteria decision making (g-MCDM) for solar PV power plants site selection: A case study in Vietnam. Energy Rep 2022;8:1124–42.
[45] De Clercq D, Wen Z, Fei F. Determinants of efficiency in anaerobic bio-waste co-digestion facilities: A data envelopment analysis and gradient boosting approach. Appl Energy 2019;253:113570.
[46] Nandy A, Singh PK. Farm efficiency estimation using a hybrid approach of machine-learning and data envelopment analysis: Evidence from rural eastern India. J Clean Prod 2020;267:122106.
[47] Zhu N, Zhu C, Emrouznejad A. A combined machine learning algorithms and DEA method for measuring and predicting the efficiency of Chinese manufacturing listed companies. J Manag Sci Eng 2020.
[48] Aydin N, Yurdakul G. Assessing countries’ performances against COVID-19 via WSIDEA and machine learning algorithms. Appl Soft Comput 2020;97:106792.
[49] Taherinezhad A, Alinezhad A. COVID-19 crisis management: Global appraisal using two-stage DEA and ensemble learning algorithms. Sci Iranica 2022.
[50] Molinos-Senante M, Maziotis A. Prediction of the efficiency in the water industry: An artificial neural network approach. Proc Saf Environ Prot 2022;160:41–8.
[51] Efron B. Prediction, estimation, and attribution. Internat Statist Rev 2020;88:S28–59.
[52] Farrell MJ. The measurement of productive efficiency. J R Stat Soc Ser A (General) 1957;120(3):253–81.
[53] Kneip A, Park BU, Simar L. A note on the convergence of nonparametric DEA estimators for production efficiency scores. Econom Theory 1998;14(6):783–93.
[54] Simar L, Wilson PW. Estimation and inference in two-stage, semi-parametric models of production processes. J Econometrics 2007;136(1):31–64.
[55] Simar L, Wilson PW. Two-stage DEA: Caveat emptor. J Prod Anal 2011;36(2):205–18.
[56] Breiman L. Random forests. Mach Learn 2001;45(1):5–32.
[57] Biau G, Scornet E. A random forest guided tour. Test 2016;25(2):197–227.
[58] Fan G-F, Yu M, Dong S-Q, Yeh Y-H, Hong W-C. Forecasting short-term electricity load using hybrid support vector regression with grey catastrophe and random forest modeling. Util Policy 2021;73:101294.
[59] Cutler A, Cutler DR, Stevens JR. Random forests. In: Ensemble machine learning. Springer; 2012, p. 157–75.
[60] Demir S, Sahin EK. Comparison of tree-based machine learning algorithms for predicting liquefaction potential using canonical correlation forest, rotation forest, and random forest based on CPT data. Soil Dyn Earthq Eng 2022;154:107130.
[61] Desai S, Ouarda TB. Regional hydrological frequency analysis at ungauged sites with random forest regression. J Hydrol 2021;594:125861.
[62] Pfenninger S, Staffell I. Long-term patterns of European PV output using 30 years of validated hourly reanalysis and satellite data. Energy 2016;114:1251–65.
[63] de Oliveira MCC, Cardoso ASAD, Viana MM, Lins VdFC. The causes and effects of degradation of encapsulant ethylene vinyl acetate copolymer (EVA) in crystalline silicon photovoltaic modules: A review. Renew Sustain Energy Rev 2018;81:2299–317.
[64] Simar L, Wilson PW. Non-parametric tests of returns to scale. European J Oper Res 2002;139(1):115–32.
[65] Simar L, Wilson PW. Inference by the m out of n bootstrap in nonparametric frontier models. J Prod Anal 2011;36(1):33–53.
[66] Boussofiane A, Dyson RG, Thanassoulis E. Applied data envelopment analysis. European J Oper Res 1991;52(1):1–15.
[67] Dyson RG, Allen R, Camanho AS, Podinovski VV, Sarrico CS, Shale EA. Pitfalls and protocols in DEA. European J Oper Res 2001;132(2):245–59.
[68] Gökmen N, Hu W, Hou P, Chen Z, Sera D, Spataru S. Investigation of wind speed cooling effect on PV panels in windy locations. Renew Energy 2016;90:283–90.
[69] Dubey S, Sarvaiya JN, Seshadri B. Temperature dependent photovoltaic (PV) efficiency and its effect on PV production in the world–a review. Energy Procedia 2013;33:311–21.
[70] Andrews RW, Pollard A, Pearce JM. The effects of snowfall on solar photovoltaic performance. Sol Energy 2013;92:84–97.
[71] Rida K, Al-Waeli A, Al-Asadi K. The impact of air mass on photovoltaic panel performance. Eng Sci Rep 2016;1(1):1–9.
[72] Probst P, Wright MN, Boulesteix A-L. Hyperparameters and tuning strategies for random forest. Wiley Interdisc Rev Data Min Knowl Discov 2019;9(3):e1301.
[73] Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: International conference on learning and intelligent optimization. Springer; 2011, p. 507–23.
[74] Genuer R, Poggi J-M, Tuleau-Malot C. Variable selection using random forests. Pattern Recognit Lett 2010;31(14):2225–36.