Energy and AI 11 (2023) 100222
Contents lists available at ScienceDirect
Energy and AI
journal homepage: www.elsevier.com/locate/egyai
Combining data envelopment analysis and Random Forest for selecting optimal locations of solar PV plants
Gilles Cattani
GSEM University of Geneva, Geneva, 1213, Switzerland
HIGHLIGHTS
• Combination of DEA and Random Forest is proposed.
• Effects of environmental variables on solar PV efficiency are estimated.
• Most efficient site for solar PV plant is selected.
ARTICLE INFO
Keywords: Solar energy; Location selection; Australia; Data envelopment analysis; Random forest

ABSTRACT
Solar photovoltaic (PV) energy has emerged as a potential alternative to carbon-based energies to meet the Paris Agreement commitments. This study investigates the effect of environmental variables on the efficiency of solar PV panels. Data Envelopment Analysis (DEA) is used to estimate the efficiencies of 91 solar PV panels located in Australia over the period 2010–2020. The effects of environmental variables on the estimated efficiencies are quantified using a truncated regression model. Random forest is then used to predict the efficiency of solar PV panels in every city of Australia. The results make it possible to determine the most suitable locations and regions for solar PV energy production in Australia. This study provides an interesting and easily interpretable tool for policy decision makers.

E-mail address: gilles.cattani@unige.ch.
https://doi.org/10.1016/j.egyai.2022.100222
Received 21 September 2022; Received in revised form 6 December 2022; Accepted 7 December 2022
Available online 13 December 2022
2666-5468/© 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
1. Introduction
The world we live in is based on massive energy consumption. On the one hand, it has enabled rapid economic growth. On the other hand, climate change due to carbon dioxide (CO2) emissions is leading to an energy crisis. Although conventions are being adopted to reduce fossil fuel combustion, energy demand is projected to increase by 2050 [1]. In this context, solar energy has attracted particular interest. This renewable energy is non-polluting and inexhaustible [2]. The amount of solar energy that reaches the Earth's surface each year is 10,000 times greater than the energy consumed by the world in one year [3]. Therefore, being able to efficiently convert a tiny fraction of this renewable source into electricity could ensure sustainability [4].
Due to their attractive features (e.g. low maintenance costs, as noted by Sundareswaran et al. [5], and rapidly falling investment costs [6]), solar photovoltaic (PV) panels are the most popular solar power generation technology [7]. In 2020, total installed solar PV capacity reached 760 gigawatts (GW) [8], with an average annual growth rate of 40.3%¹ between 2000 and 2020. This expansion of solar panels has been accompanied by a large body of research literature. Of particular interest has been the evaluation of solar panel performance under outdoor conditions. Numerous studies have highlighted the difference in the performance of PV systems between real working conditions and standard test conditions² (STC): [10] found that the performance of solar PV is 18% lower in the Greek outdoor environment than under STC; [11] estimate that the performance of solar PV installed in Algeria decreases linearly with temperature and air mass; [12] highlight the negative impact of dust on solar PV performance; [13] found that the performance of solar PV panels is influenced by insolation, air temperature, wind speed and relative humidity (for a comprehensive review of the effect of environmental variables on the performance of solar PV, see [14]).
The purpose of the present study is to investigate the effect of environmental variables on the efficiency of solar PV plants in order to predict the potential efficiencies of new sites. The proposed methodology is based on a combination of Data Envelopment Analysis (DEA) and Random forest. First, DEA is used to estimate the technical efficiencies of 91 Australian solar PV plants over the period 2010–2020. Then a truncated regression model is estimated to capture the effects of environmental variables on the estimated efficiencies. Finally, to predict the efficiency of new sites, the highly flexible Random forest is used. The results of this study highlight the optimal locations for solar PV panel installations in Australia. They could be useful to policy-makers when considering new solar energy projects.
This study contributes in several respects. Firstly, it is the first study to predict the efficiency of solar PV plants, whereas the current literature has focused only on estimating efficiencies. Secondly, to provide inference on the black-box estimation, the present study predicts the efficiencies of solar PV panels in every city of Australia. The resulting efficiency map is a directly and easily interpretable tool for policy makers. Finally, a truncated regression analysis is also provided to capture the effects of environmental variables on efficiencies, and the results are compared to the existing literature.
The rest of the paper is organized as follows. Section 2 presents the literature review. Section 3 describes the methodology. Empirical results are provided in Section 4. Section 5 presents the conclusions.

¹ This average annual growth rate is computed from solar PV installed capacity data available from https://www.irena.org/Statistics/View-Data-by-Topic/Capacity-and-Generation/Technologies.
² 1000 W/m2 of irradiance, 25 °C cell temperature and air mass of 1.5 [9].

2. Literature review
The present study uses an econometric/statistical approach to measure the efficiency of solar PV panels. Applied econometricians estimate the technical efficiency (TE) of producing units using either DEA or Stochastic Frontier Analysis (SFA). The former is a deterministic performance measurement, which assesses the relative efficiency of decision-making units [15]. The latter is a stochastic regression model, which separates TE from random noise [16,17]. The renewable energy literature has primarily used DEA to estimate the efficiency of producing units; see [18,19] for comprehensive reviews of DEA in renewable energy. In this section, a substantial review of DEA applications to solar PV plants is presented.
[20,21] use the DEA approach to evaluate, respectively, 8 solar cell industries and 12 solar cell companies in Taiwan. [22] rely on DEA and the Analytic Hierarchy Process (AHP) to evaluate the performance of 32 solar firms in Taiwan. DEA has been used to compare PV power stations installed in Germany and in the United States [23,24]. [25] use an input-oriented dynamic SBM model to measure the efficiency of 40 PV companies in China and in the United States. [26] evaluate the comprehensive efficiency of photovoltaic power generation in China based on a Super-Efficient DEA (SE-DEA). The DEA method has been used to study the performance of 855 commercial rooftop PV systems installed in California [27,28]. [29] use a three-stage DEA to investigate the effect of environmental variables on the efficiency of 70 solar PV power plants located in the United States. [30] use the DEA approach to evaluate the efficiency of 84 solar panel sectors installed in South Africa. [31] compares the performance of crystalline silicon and thin-film PV solar cell industries in Iran based on DEA. [32] combine DEA and Shannon's entropy to quantify the relative performance of 16 solar PV panels in India. [33] propose a three-phase DEA approach to evaluate 46 photovoltaic poverty alleviation projects. The environmental efficiency of 118 PV plants in China is estimated using the DEA method [34]. [35] use a combination of DEA and AHP to evaluate the performance of 21 solar mini-grids in Bangladesh.
The DEA approach has also been used to evaluate the best location for solar PV panels in Japan [36], in Turkey [37] and in Iran [38]. [39] compare 25 cities in Iran based on a combination of DEA, Principal Component Analysis (PCA) and Numerical Taxonomy (NT) to determine the optimal site for solar PV installation. [40] rely on an Artificial Neural Network (ANN) and fuzzy-DEA (FDEA) to investigate the best location of solar PV among 150 solar plant units in Iran. [41] use DEA and FDEA to compare 15 potential solar plant sites in Taiwan. [42] study the optimal location for solar PV among 46 potential sites in Vietnam based on a combination of DEA, FAHP and the technique for order of preference by similarity to ideal solution (TOPSIS). The location efficiency of 20 cities in Taiwan for solar PV panels is evaluated based on DEA and AHP [43]. Finally, [44] combine DEA, grey-AHP (G-AHP) and grey-TOPSIS (G-TOPSIS) to determine the optimal location for solar PV among 27 locations in Vietnam. The literature review of DEA applications in solar PV energy is summarized in Table 1.
The studies presented above focus entirely on either estimating the efficiencies of solar PV panels or investigating factors affecting the estimated efficiencies. While the former studies are important for comparing solar plants/finding the most productive unit and the latter are useful for examining the effect of external variables on technical
efficiency, there are no studies that focus on predicting the efficiencies estimated by DEA. Such predictions are, however, necessary to support the decisions of policy makers. For example, the location of new solar panels can be chosen based on efficiency prediction (i.e. the technical efficiencies of potential solar PV panel sites are predicted and compared in order to select the optimal location). Several studies in other applied fields have already proposed to predict the efficiencies estimated by DEA: [45] use a gradient boosting approach to predict the DEA-estimated efficiency of 386 operational anaerobic digestion facilities. The DEA-estimated efficiencies of 450 paddy producers have been predicted by Random Forest [46]. [47] rely on a combination of machine learning methods and DEA to predict the efficiency of Chinese manufacturing companies. Several prediction algorithms have been used to predict the DEA efficiency of COVID-19 management [48,49]. Finally, [50] use a combination of DEA and ANN to predict the efficiencies of water companies. However, in these studies, inference from the machine learning estimates is limited to variable importance. This limitation is due to the black-box nature of these methods [51]. This study proposes to predict the efficiency in each city in Australia and derive a directly interpretable efficiency map.

Table 1
DEA applications in solar PV energy.
Reference | Scope | Methodology | Time period
[39] | 25 cities in Iran | DEA, PCA and NT | Unspecified
[40] | 150 solar plant units in Iran | ANN and FDEA | Unspecified
[20] | 8 solar cell industries in Taiwan | DEA | 2010–2011
[22] | 32 solar firms in Taiwan | DEA and AHP | Unspecified
[21] | 12 solar cell companies in Taiwan | DEA | 2011
[36] | 16 cities in Japan | DEA | Unspecified
[23] | 160 PV power stations in Germany and in the US | DEA | 2012
[37] | 30 cities in Turkey | DEA | 2010
[41] | 15 solar plant sites in Taiwan | DEA and FDEA | Unspecified
[25] | 40 PV companies in China and in the US | SBM | 2009–2013
[24] | 160 PV power stations in Germany and in the US | DEA | 2012
[27] | 855 commercial rooftop PV systems in the US | DEA | 2008–2012
[28] | 855 commercial rooftop PV systems in the US | DEA | 2008–2012
[26] | Photovoltaic power generation in China | SE-DEA | 2005–2015
[29] | 70 solar PV power plants in the US | Three-stage DEA | 2010
[30] | 84 solar panel sectors in South Africa | DEA | 2014
[31] | Crystalline silicon and thin-film PV solar cell industries in Iran | DEA | Unspecified
[32] | 16 solar PV panels in India | DEA and Shannon's entropy | Unspecified
[42] | 46 potential sites in Vietnam | DEA, FAHP and TOPSIS | Unspecified
[33] | 42 photovoltaic poverty alleviation projects in China | Three-phase DEA | Unspecified
[34] | 118 PV plants in China | DEA | 2012–2016
[38] | 44 sites in Iran | DEA | Unspecified
[43] | 20 cities in Taiwan | DEA and AHP | Unspecified
[35] | 21 solar mini-grids in Bangladesh | DEA and AHP | 2010–2019
[44] | 27 locations in Vietnam | DEA, G-AHP and G-TOPSIS | Unspecified

3. Methodology
This section describes the methodology used to estimate the technical efficiencies of solar plants, measure the influence of environmental variables on the estimated efficiencies and predict the optimal location for solar PV installation based on efficiency.

3.1. Data envelopment analysis
DEA is a non-parametric method used to measure the relative efficiency of a decision making unit (DMU) transforming inputs into outputs [15]. Denoting the inputs by $x \in \mathbb{R}^d_+$ and the outputs by $y \in \mathbb{R}^p_+$, we can define the production possibility set, containing all feasible combinations of inputs and outputs, as
$$\Psi = \{(x, y) \mid x \text{ can produce } y\}. \tag{1}$$
The output technical efficiency of a DMU operating at $(x_0, y_0) \in \mathbb{R}^{d+p}_+$ is then defined as a measure of the radial distance from the point $(x_0, y_0)$ to the boundary of $\Psi$ (the production frontier) [52]. Since $\Psi$ is unknown, it is generally estimated by the convex hull of the free disposal hull of the observed data,
$$\hat{\Psi} = \Big\{(x, y) \;\Big|\; y \le \sum_{i=1}^{n} \gamma_i y_i,\; x \ge \sum_{i=1}^{n} \gamma_i x_i,\; \sum_{i=1}^{n} \gamma_i = 1,\; \gamma_i \ge 0 \;\forall i = 1, \dots, n\Big\}, \tag{2}$$
where the $\gamma_i$'s are the intensity variables. The DEA technical efficiency estimator can be expressed as a linear program,
$$\hat{\delta}_0 = \max\Big\{\theta > 0 \;\Big|\; \theta y_0 \le \sum_{i=1}^{n} \gamma_i y_i,\; x_0 \ge \sum_{i=1}^{n} \gamma_i x_i,\; \sum_{i=1}^{n} \gamma_i = 1,\; \gamma_i \ge 0 \;\forall i = 1, \dots, n\Big\}, \tag{3}$$
where $\hat{\delta}_0$ estimates the technical efficiency $\delta_0$ of the producing unit with input $x_0$ and output $y_0$. It has been shown that, under given assumptions, this estimator is consistent [53]. However, the convergence rate is relatively slow due to the curse of dimensionality associated with nonparametric estimation, and $\hat{\delta}_0$ is biased downward.
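Because Eq. (3) is an ordinary linear program, it can be solved with any general-purpose LP solver. The sketch below is a minimal Python illustration, assuming NumPy and SciPy's `linprog` with the HiGHS backend rather than the R package rDEA used in this study; the function name `dea_output_vrs` is hypothetical. The decision vector stacks θ and the intensity variables γ_1, …, γ_n, and the single equality row imposes the convexity constraint Σγ_i = 1 that gives the variable-returns-to-scale hull of Eq. (2).

```python
import numpy as np
from scipy.optimize import linprog


def dea_output_vrs(X, Y, x0, y0):
    """Output-oriented DEA score under variable returns to scale, Eq. (3).

    X: (n, d) inputs and Y: (n, p) outputs of the n reference units.
    x0, y0: input and output vectors of the unit being evaluated.
    Returns the largest theta such that theta * y0 lies in the estimated hull.
    """
    n, d = X.shape
    p = Y.shape[1]
    # Decision variables [theta, gamma_1, ..., gamma_n]; linprog minimises, so use -theta.
    c = np.zeros(n + 1)
    c[0] = -1.0
    # Output rows: theta * y0_j - sum_i gamma_i * y_ij <= 0
    A_out = np.hstack([np.asarray(y0, dtype=float).reshape(p, 1), -Y.T])
    # Input rows: sum_i gamma_i * x_ik <= x0_k
    A_in = np.hstack([np.zeros((d, 1)), X.T])
    A_ub = np.vstack([A_out, A_in])
    b_ub = np.concatenate([np.zeros(p), np.asarray(x0, dtype=float)])
    # Convexity (VRS) constraint: sum_i gamma_i = 1
    A_eq = np.concatenate([[0.0], np.ones(n)]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * (n + 1)  # theta >= 0 and gamma_i >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.x[0])
```

For example, with one input and one output, the units (1, 1), (2, 3) and (3, 4) span the frontier, so a unit at (2, 2) obtains θ̂ = 1.5: its output could be expanded by 50% with the same input, while frontier units score exactly 1. As written, Eq. (3) returns values of at least 1; taking the reciprocal 1/θ̂ would map the score onto a (0, 1] scale.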
3.2. Truncated regression
Once the technical efficiencies of the DMUs are estimated, it is of typical interest to investigate the influence of environmental variables on efficiency in a second-stage estimation. More specifically, one might want to learn about the unknown smooth function linking environmental variables to estimated efficiencies,
$$\hat{\delta}_i = f(z_i) + \epsilon_i, \quad i = 1, \dots, n, \tag{4}$$
where $z_i$ is a $k$-dimensional vector of environmental variables. Until the seminal paper of [54], this econometric model was estimated with Tobit and ordinary least squares regression. The authors showed that these methods are not consistent due to the right-tail³ truncation of the error term $\epsilon_i$. Since $\hat{\delta}_i \le 1$, we can rewrite Eq. (4) as
$$\epsilon_i \le 1 - f(z_i), \quad i = 1, \dots, n. \tag{5}$$
The authors also highlighted the bias of $\hat{\delta}$ and the serial correlation in Eq. (4). To solve these issues, they propose a statistical model in which the two-stage approach is appropriate. The error term is assumed to be normally distributed with zero mean and unknown variance, truncation is determined by Eq. (5) and the unknown smooth function is assumed to be linear, $f(z_i) = z_i\beta$. Estimates of $\beta$ are obtained with a combination of truncated maximum likelihood estimation and a bootstrap method; see the second algorithm of [54,55] for details. The steps of the estimation method are summarized in Algorithm 1.

Algorithm 1 Second algorithm of [54]
Require: input data $x \in \mathbb{R}^d_+$, output data $y \in \mathbb{R}^p_+$ and environmental variable data $z \in \mathbb{R}^k$.
1: Compute $\hat{\delta}_i$, $\forall i = 1, \dots, n$, from the original sample using (3).
2: Estimate $\beta$ and $\sigma_\epsilon$ (the standard deviation of $\epsilon_i$) by maximum likelihood estimation of the truncated regression (4), assuming $f(z_i) = z_i\beta$.
3: For each of $L_1$ bootstrap samples:
4:    draw $\epsilon_i$, $\forall i = 1, \dots, n$, from $N(0, \hat{\sigma}_\epsilon^2)$ truncated so that $\epsilon_i \le 1 - z_i\hat{\beta}$;
5:    compute $\delta_i^* = z_i\hat{\beta} + \epsilon_i$, $\forall i = 1, \dots, n$;
6:    set $x_i^* = x_i$ and $y_i^* = y_i\hat{\delta}_i/\delta_i^*$, $\forall i = 1, \dots, n$;
7:    compute $\hat{\delta}_i^*$, $\forall i = 1, \dots, n$, using (3) with $x_i^*$ and $y_i^*$.
8: Compute the bias-corrected estimator $\hat{\hat{\delta}}_i = \hat{\delta}_i - \widehat{\mathrm{Bias}}(\hat{\delta}_i)$, $\forall i = 1, \dots, n$, where $\widehat{\mathrm{Bias}}(\hat{\delta}_i)$ is estimated from the previous bootstrap procedure.
9: Estimate $\hat{\hat{\beta}}$ and $\hat{\hat{\sigma}}_\epsilon$ by maximum likelihood estimation of the truncated regression (4) with $\hat{\hat{\delta}}_i$.
10: For each of $L_2$ bootstrap samples:
11:    draw $\epsilon_i$, $\forall i = 1, \dots, n$, from $N(0, \hat{\hat{\sigma}}_\epsilon^2)$ truncated so that $\epsilon_i \le 1 - z_i\hat{\hat{\beta}}$;
12:    compute $\delta_i^{**} = z_i\hat{\hat{\beta}} + \epsilon_i$, $\forall i = 1, \dots, n$;
13:    estimate $\hat{\hat{\beta}}^*$ and $\hat{\hat{\sigma}}_\epsilon^*$ by maximum likelihood estimation of the truncated regression (4) with $\delta_i^{**}$.
14: Return the bias-corrected estimates $\hat{\hat{\beta}}$ and the estimated variance $\widehat{V}[\hat{\hat{\beta}}]$ based on the second bootstrap procedure.

³ Often presented in the literature as a left-tail truncation when the reciprocals of technical efficiency, $1/\delta_i$, are used.

3.3. Random forest
The Random Forest introduced by [56] is one of the most popular machine learning tools. Its relatively high accuracy and ability to handle small sample sizes have prompted many practitioners to apply it in various research areas [57]. The simplest way to describe the Random Forest is to consider it as an ensemble of different regression trees.
A regression tree is constructed by successively splitting the data into a set of rectangles. Each split is chosen to minimize a criterion (e.g. the residual sum of squares), dividing the feature space into two regions stored as nodes. These nodes are further divided until a stopping criterion is met (e.g. the minimum number of observations in terminal nodes). The response variable is then predicted by averaging the values in each group.
An illustration is provided in Fig. 1. Consider 828 estimated efficiencies of solar PV plants. These efficiencies are first separated into two groups corresponding respectively to solar panels affected by more (733) and less (95) than 0.000133 millimeters of snow per hour. A second split is chosen in the left group with respect to the variable Wind Speed at the value 1.1 meters per second, and the third split on Temperature at 23 degrees. In this minimal example, efficiencies of solar PV panels are predicted by the average of each group.
Fig. 1. Single regression tree with 3 splits applied to solar PV panel efficiency data.
The idea of the Random Forest is to increase the performance of the regression tree using bootstrap aggregating (bagging). Suppose we fit a model based on our observed sample and obtain a predicted response variable $\hat{y} = \hat{f}(z)$, where $z$ is the set of explanatory variables. We can obtain $B$ bootstrap samples by resampling the original data with replacement. A model is fitted for each bootstrap sample and the final prediction is obtained by averaging all predictions,
$$\hat{f}_{\text{bagging}}(z) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(z), \tag{6}$$
where $\hat{f}^{*b}(z)$ is the predicted response variable of bootstrap sample $b$. Bagging reduces the variance of the final prediction.
Regression trees benefit greatly from bagging. Indeed, if they are allowed to grow deep, they generally have relatively low bias and high variance. However, the regression trees constructed on the different bootstrap samples are correlated, because they are built from the same set of explanatory variables. Unfortunately, the higher the correlation between the trees, the higher the variance of the bagging estimator (6). Hence, the Random Forest algorithm randomly selects a fraction of the explanatory variables (e.g. one third of the available variables) as split candidates at each node of each regression tree. This reduces the correlation between trees without increasing the variance of the individual trees too much.
The Random Forest algorithm is used to predict the technical efficiencies estimated by DEA using the set of environmental variables $z$.
One may wonder why the Random Forest algorithm is used instead of other prediction algorithms. To address this, predictions are also made using other machine learning techniques and compared based on a goodness-of-fit criterion. For the prediction problem presented in this paper, Random Forest is the most accurate algorithm (see Section 4.3). More generally, Random Forest is expected to perform well with non-linear data and to converge relatively fast [58]. The Random Forest algorithm is also robust to outliers [59], requires relatively few hyperparameters compared to other methods [60] and does not overfit as the number of trees increases [61]. The main drawback of using Random Forest is the lack of interpretability that results from the black-box nature of the algorithm. To overcome this problem, Random Forest is used to predict the technical efficiencies of potential solar PV installations in each city in Australia. The resulting efficiency map can be directly interpreted by decision makers.
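The only difference between bagging, Eq. (6), and a Random Forest is how many variables each split may consider. The motivation for restricting them is the standard identity for an average of B identically distributed predictors with variance σ² and pairwise correlation ρ: its variance is ρσ² + (1 − ρ)σ²/B, so adding trees only shrinks the second term and de-correlating the trees is the way to reduce the first. The Python sketch below makes the mechanism explicit with scikit-learn's `DecisionTreeRegressor`, whose `max_features` argument draws a random subset of candidate variables at each split. It is an illustration under these assumptions, not the R randomForest implementation used in the paper; the helper names are hypothetical and `min_samples_leaf` is only a rough analogue of the node-size hyperparameter.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_bagged_trees(Z, delta, n_trees=200, max_features=None, min_samples_leaf=5, seed=0):
    """Fit n_trees regression trees, each on a bootstrap resample of (Z, delta).

    max_features=None lets every split consider all variables (plain bagging);
    an integer m < Z.shape[1] draws m candidate variables at each split,
    which is the Random Forest modification.
    """
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # n row indices drawn with replacement
        tree = DecisionTreeRegressor(
            max_features=max_features,
            min_samples_leaf=min_samples_leaf,
            random_state=int(rng.integers(0, 2**31 - 1)),
        )
        tree.fit(Z[idx], delta[idx])
        trees.append(tree)
    return trees


def predict_bagged(trees, Z_new):
    """Eq. (6): average the B bootstrap-tree predictions f^{*b}(z)."""
    return np.mean([tree.predict(Z_new) for tree in trees], axis=0)
```

Passing `max_features=None` gives plain bagging, whereas a value such as `max(1, Z.shape[1] // 3)` mirrors the one-third rule of thumb mentioned above. Because every leaf predicts an average of observed efficiencies, the ensemble prediction always stays within the range of the training scores.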
4. Empirical results
The methodology described in Section 3 is applied to an unbalanced panel dataset combining information on the inputs and output of 91 Australian solar panels from 2010 to 2020. The data are collected from the PVoutput.org website.⁴ This platform allows panel owners to automatically archive their solar PV energy production and contains information on panel system characteristics (e.g. capacity (MW), number of panels, orientation, tilt and inverter size (MW); see the table and figures in the Appendix for details). Numerous studies rely on these data (e.g., [62] use them to simulate solar PV energy output). However, the website does not provide information on the metallic structure, PV system wiring or battery, although the degradation of solar PV panels caused by environmental variables is likely to differ depending on these characteristics [63]. The second-stage truncated regression is estimated with and without fixed effects (capturing the unobserved characteristics of each solar PV installation), and the results are consistent.
This study focuses on the effect of temperature, solar irradiance, precipitation, snowfall, wind speed and air density at the solar PV location. In the comprehensive review of [14], the environmental factors affecting solar PV panels are dust accumulation, temperature, wind speed, humidity and snowfall. Although there is no information on dust accumulation on the solar PV panels, it depends on wind speed (which influences the settlement of soiling on the PV surface), precipitation (which cleans the panels) and temperature (which favors the adhesion of dust particles to the PV surface). The ambient temperature at the solar PV location also affects the temperature of the solar PV panel (an increase in ambient temperature decreases the output voltage of the module, which reduces the output power of the PV system). Wind speed can have a cooling effect on the photovoltaic system, reducing the operating temperature of the cell. Snowfall is expected to have a negative effect on the efficiency of photovoltaic systems by reducing the amount of sunlight reaching the photovoltaic cells. Finally, humidity can seep into the solar panel and cause the encapsulant to delaminate. Since humidity information is not available at the solar PV location, this study uses air density measurements (which vary with temperature and humidity at a given location).
The environmental variable data are obtained from the Modern-Era Retrospective analysis for Research and Applications (MERRA) climate dataset.⁵ The ambient temperature (measured in degrees), solar irradiance (watts per square meter), precipitation (millimeters per hour), snowfall (millimeters per hour), wind speed (meters per second) and air density (kilograms per cubic meter) are recorded and aggregated for each year at the location of each solar PV panel. Summary statistics of the data are provided in Table 2.

4.1. DEA efficiency estimation
The efficiencies of solar PV plants are estimated using the methodology presented in Section 3.1. Variable returns to scale are allowed in the estimated frontier, as constant and non-increasing returns to scale are rejected⁶ [64,65]. Following previous DEA studies of solar PV plants, the electricity produced by each solar plant is defined as the output, $y = \text{Electricity produced}$, and installed capacity, solar irradiance and temperature are defined as inputs, $x = (\text{Installed capacity}, \text{Solar irradiance}, \text{Temperature})$. Note that this study follows two common conventions, namely that the number of producing units is larger than the product of the numbers of outputs and inputs (91 > 3 × 1) [66], and that the number of observations is greater than three times the number of inputs plus outputs (828 > 3 × (1 + 3)) [67]. The efficiencies of solar PV panels are estimated by DEA using the R package rDEA. The distribution of the efficiencies estimated by DEA is displayed in Fig. 2. Most technical efficiencies are between 0.6 and 0.9, with an average of 0.74. Since the value 1 indicates perfect relative efficiency, solar panels have in general a relatively high efficiency.
Fig. 2. Estimated empirical probability density function of DEA efficiencies.

4.2. Truncated regression analysis
Following the methodology described in Section 3.2 to measure the influence of environmental variables on estimated efficiency, the following second-stage regression model is proposed:
$$\hat{\delta}_{it} = \beta_0 + \beta_1 \text{Precipitation}_{it} + \beta_2 \text{Wind speed}_{it} + \beta_3 \text{Snowfall}_{it} + \beta_4 \text{Air density}_{it} + \beta_5 \text{Temperature}_{it} + \beta_6 \text{Age}_{it} + \epsilon_{it}, \tag{7}$$
where $\hat{\delta}_{it}$ is the technical efficiency of solar PV plant $i$ in year $t$ estimated by DEA, $\text{Age}_{it}$ is the age of solar PV plant $i$ in year $t$ and $\epsilon_{it}$ is the truncated error term. This truncated regression model is estimated following [54] with the R package rDEA, setting the right truncation at the value 1.
Note that in this two-stage procedure, the panel data observations are pooled. Although more data are always preferable for nonparametric methods (linear programming estimation of (3)), pooling the data implicitly assumes that the production frontier is separable with respect to time. In other words, the production frontier ignores potential changes between time periods. While this assumption may not be appropriate for industries subject to continuous technological change, this is not the case for solar PV plants. Once a solar panel is installed, its production frontier remains the same throughout its lifetime; the plant is eventually removed when it becomes obsolete and replaced with a brand new panel. Since solar PV plants are observed throughout their life, this assumption seems appropriate. The estimates obtained from the truncated bootstrap regression are presented in Table 3 (standard errors are estimated with 200 bootstrap samples).
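The maximum likelihood step at the core of this procedure (step 2 of Algorithm 1) can be sketched in a few lines. The Python example below, assuming NumPy and SciPy, maximizes the likelihood of the linear model $f(z_i) = z_i\beta$ with normal errors truncated so that $\hat{\delta}_i \le 1$, as in Eq. (5); each observation contributes $\log\phi((\hat{\delta}_i - z_i\beta)/\sigma) - \log\sigma - \log\Phi((1 - z_i\beta)/\sigma)$. It deliberately omits the bias correction and the two bootstrap loops of Algorithm 1, so it is not a substitute for the rDEA routine used in the paper, and the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def truncated_regression_mle(Z, delta, upper=1.0):
    """ML estimates (beta, sigma) for delta_i = beta_0 + z_i beta + eps_i, with
    eps_i ~ N(0, sigma^2) truncated so that delta_i <= upper (Eq. (5)).

    Z: (n, k) regressors without intercept; delta: (n,) efficiency scores.
    Returns the coefficient vector (intercept first) and sigma.
    """
    Z1 = np.column_stack([np.ones(len(delta)), Z])  # prepend the intercept column

    def neg_loglik(params):
        beta, sigma = params[:-1], np.exp(params[-1])  # log-parametrise so sigma > 0
        mu = Z1 @ beta
        # log-density of a normal truncated from above at `upper`
        ll = (norm.logpdf((delta - mu) / sigma) - np.log(sigma)
              - norm.logcdf((upper - mu) / sigma))
        return -np.sum(ll)

    # Ordinary least squares provides the starting values
    beta_ols, *_ = np.linalg.lstsq(Z1, delta, rcond=None)
    sigma_ols = np.std(delta - Z1 @ beta_ols)
    start = np.append(beta_ols, np.log(sigma_ols))
    res = minimize(neg_loglik, start, method="BFGS")
    return res.x[:-1], float(np.exp(res.x[-1]))
```

Applied to the pooled panel, `Z` would hold the six regressors of Eq. (7) and `delta` the 828 DEA scores; in Algorithm 1 the standard errors then come from the second bootstrap loop rather than from the curvature of this likelihood.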
⁴ The raw data can be downloaded at https://pvoutput.org.
⁵ The raw data can be downloaded at https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2.
⁶ Both null hypotheses are rejected at $\alpha = 0.05$.

The variable Wind Speed is positively related to the efficiency of the solar PV panels, with a significant coefficient of 0.013 (i.e. increasing the wind speed at the location of the solar PV plant by one meter per second increases the efficiency of the solar plant by 0.013). This result is
Table 2
Summary statistics.
Variable Unit of Account Number of observation Mean Standard Deviation Minimum Maximum
Electricity produced Megawatt hour 828 7096 4371 1479 44964
Installed capacity Megawatt 828 4717 2819 1480 29700
Solar irradiance Watt per m2 828 0.3071 0.022 0.2311 0.3669
Temperature Degrees 828 17.506 3.879 9.974 24.085
Precipitation Millimeters per hour 828 0.073 0.033 0.003 0.294
Wind speed Meter per second 828 4.063 1.487 0.260 6.595
Snowfall Millimeters per hour 828 0.000 0.001 0.000 0.004
Air density Kilogram per m3 828 1.188 0.030 1.116 1.225
Age of solar plant Years 828 5.059 2.606 1.000 10.000
Table 3 mtry, the number of randomly selected variables, node size, the mini-
Estimates of the Boostrap Truncated Regression.
mum number of observations in terminal nodes and sample size, the size
Truncated Bootstrap Regression
of boostrap samples. mtry is usually set to one third of the available
Estimate 2.5% C.I. 97.5% C.I. variables. Lower values lead to less correlated trees but decrease the
Intercept 1.866∗∗∗ 1.447 2.242 average performance of the trees (suboptimal variables can be used).
(0.210) Node size is set by default to 5. This hyperparameter controls the depth
Precipitation 0.185 −0.035 0.419
(0.120)
of the regression tree. Sample size is generally set to the same size as the
Wind Speed 0.013∗∗∗ 0.008 0.018 original sample. As the hyperparameter mtry, lower values lead to less
(0.003) correlated trees. The R package tuneRanger is used to tune all hyperpa-
Snowfall 61.933∗∗∗ 38.781 85.549 rameters simultaneously. This package relies on sequential model-based
(14.260)
optimization (SMBO) [73] to select the optimal set of hyperparameters
Air Density −0.842∗∗∗ −1.149 −0.488
(0.174) based on the mean squared error criteria. In our optimization problem,
Temperature −0.096∗∗∗ −0.013 −0.007 tuned hyperparameters are 𝑚𝑡𝑟𝑦 = 6, 𝑛𝑜𝑑𝑒 𝑠𝑖𝑧𝑒 = 8 and 𝑠𝑎𝑚𝑝𝑙𝑒 𝑠𝑖𝑧𝑒 =
(0.002) 59%. Note that setting mtry to 6 implies that we are using full set of
Age −0.005∗∗∗ −0.007 −0.002
variables available in each boostrap sample. The resulting regression
(0.001)
trees are more accurate but potentially correlated. But by setting sample
Note: 828 obs. ∗ 𝑝 < 0.1; ∗∗
𝑝 < 0.05; ∗∗∗
𝑝 < 0.01.
size to about 60% of the original sample, the trees are de-correlated.
The regression trees of the first 4 boostrap samples are display in Fig. 3.
Note that even though the trees are constructed with the same set of
consistent with [68] assessing that wind has a cooling effect on the explanatory variables, setting the sample size at 60% produces different
solar PV panel and with [10] observing that the positive effect of wind trees.
speed is relatively small compared to the negative effect of temper- The variable importances of each predictor are displayed in Fig. 4.
ature. Indeed, the significant coefficient associated with the variable The metric used is the Mean decrease in accuracy which is better
Temperature is −0.096; see [69] for a review of the negative effect of temperature on solar PV plant performance. The coefficient associated with the variable Snowfall is positive and significant. Its magnitude, 61.933, is large relative to the other coefficients. This is mainly because snowfall is rare in Australia (the maximum value of Snowfall in the sample is 0.004), so the coefficient β₃ absorbs the scale of the variable. The sign of the estimated coefficient seems surprising, since snow accumulating on the glass of solar panels is thought to reduce the efficiency of solar installations [70]. One possible explanation is that Australia’s warm climate does not allow snow to accumulate, so that snowfall only cleans the solar PV glass. A significant negative effect is estimated for the variable AirDensity: an increase in air mass reduces the sunlight reaching the solar cell and decreases the performance of the solar PV plant [71]. The effect of aging is estimated at −0.005. In other words, the efficiency of an installed photovoltaic solar panel decreases by 0.05 every 10 years. Finally, the variable Precipitation is not significant in the truncated regression model.

4.3. Predicted efficiencies

The truncated regression model presented in the previous section allows the estimated coefficients to be interpreted directly. However, this model relies on relatively strong assumptions (e.g. linear effects of the environmental variables and no interaction effects). It is therefore of interest to also estimate the more flexible Random Forest model. The estimation is carried out with the R package randomForest, with the number of regression trees set to 1000.

To improve the performance of the Random Forest, the hyperparameters are tuned following [72]. The set of hyperparameters contains

than using node impurity [74]. The most important variable is Temperature. The age of the solar PV panel has less influence on the estimated efficiency than the environmental variables. Note that the variable Precipitation is used by the Random Forest for efficiency prediction, although it is not significant in the truncated regression. This could indicate that precipitation has a non-linear effect on the estimated efficiencies.

To provide an interpretation of the Random Forest results, the efficiency of newly installed solar PV panels is predicted for every Australian city with more than 1000 inhabitants. Note that the Random Forest is estimated using only environmental variables. It is therefore sufficient to obtain the values of these variables at each city location from the MERRA dataset in order to predict the efficiency of a solar PV plant installed there. Efficiencies are predicted for 3828 cities (the location of a city is defined by the Australian national cartography mapping agency as the location of its city hall). The ranking of cities based on the efficiency score is provided in Table 5. The most suitable city for installing solar PV panels appears to be the small town of Evandale on the island of Tasmania. The least suitable location is the city of Augusta in the southwest of Australia. Melbourne and Sydney are ranked 445th and 2226th, respectively.

To visualize the predicted efficiencies by location, the average of the predicted efficiencies is computed for each Australian region. The results are displayed in Fig. 5. The most favorable region for the installation of solar PV panels is the island of Tasmania. The southeastern regions also appear more favorable than the northern or western regions. According to these results, the least favorable region is the Far North, at the northern tip of Australia.
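The city-level procedure described above (predict each city's efficiency from its environmental covariates, rank the cities, then average the predictions by region) can be sketched as follows. This is a minimal sketch under stated assumptions: `predict_efficiency` is a hypothetical stand-in for the fitted Random Forest (a toy linear rule, not the model estimated in this study), and the towns, regions and covariate values are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def predict_efficiency(covariates: dict) -> float:
    """Hypothetical stand-in for the fitted Random Forest (toy rule, NOT the paper's model)."""
    return 1.0 - 0.01 * covariates["temperature"]


def rank_cities(cities: list) -> list:
    """Predict every city's efficiency and return (name, score, rank), best first."""
    scored = [(c["name"], predict_efficiency(c["covariates"])) for c in cities]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest predicted efficiency first
    return [(name, score, rank) for rank, (name, score) in enumerate(scored, start=1)]


def average_by_region(cities: list) -> dict:
    """Average the predicted efficiencies of all cities within each region."""
    predictions = defaultdict(list)
    for c in cities:
        predictions[c["region"]].append(predict_efficiency(c["covariates"]))
    return {region: mean(values) for region, values in predictions.items()}


# Invented example data (hypothetical towns, regions and covariates).
cities = [
    {"name": "Town A", "region": "Region 1", "covariates": {"temperature": 10.0}},
    {"name": "Town B", "region": "Region 1", "covariates": {"temperature": 14.0}},
    {"name": "Town C", "region": "Region 2", "covariates": {"temperature": 25.0}},
]

for name, score, rank in rank_cities(cities):
    print(rank, name, round(score, 3))
print(average_by_region(cities))
```

The sketch only illustrates the two aggregation steps (a ranking as in Table 5 and regional averages as in Fig. 5); in the study the predictions come from the tuned Random Forest applied to MERRA covariates.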
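Variable importance based on the mean decrease in accuracy (Fig. 4) measures how much the prediction error grows when the values of one predictor are randomly permuted, which breaks the link between that predictor and the response. The R package randomForest computes this measure on the out-of-bag observations of each tree; the sketch below is a simplified version that uses a single dataset and the mean squared error, with a hypothetical toy model (`toy_model`) in place of the fitted forest.

```python
import random
from statistics import mean


def mse(model, X, y):
    """Mean squared error of `model` (a callable on one row) over rows X and targets y."""
    return mean((model(row) - target) ** 2 for row, target in zip(X, y))


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each predictor column is shuffled.

    A larger increase means the model relies more on that predictor.
    """
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and y
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(mean(increases))
    return importances


# Hypothetical example: the toy model uses predictor 0 and ignores predictor 1.
def toy_model(row):
    return 1.0 - 0.5 * row[0]


data_rng = random.Random(1)
X = [[data_rng.uniform(0, 1), data_rng.uniform(0, 1)] for _ in range(200)]
y = [toy_model(row) for row in X]
print(permutation_importance(toy_model, X, y))
```

Because the toy model ignores predictor 1, shuffling it leaves the error unchanged and its importance is zero, while predictor 0 receives a positive score.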
Fig. 3. Regression trees associated with 4 bootstrap samples.
Fig. 4. Variable importance of each predictor based on the mean decrease in accuracy.
Fig. 5. Average of predicted efficiencies of newly installed solar PV panels in 2020 by Australian region.
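The comparison in Table 4 relies on RMSE computed with 10-fold cross-validation. The sketch below shows that computation in plain Python under simplifying assumptions: the two learners (`fit_mean`, `fit_line`) and the simulated data are hypothetical stand-ins for the caret models and the DEA efficiencies, and the score is taken as the average of the fold-level RMSEs, which is one common convention.

```python
import math
import random
from statistics import mean


def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and split into k disjoint folds of (almost) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_rmse(fit, X, y, k=10, seed=0):
    """Average of the fold-level RMSEs; `fit(X_train, y_train)` must return a predict function."""
    fold_rmses = []
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        errors = [(predict(X[i]) - y[i]) ** 2 for i in test_idx]
        fold_rmses.append(math.sqrt(mean(errors)))
    return mean(fold_rmses)


# Two hypothetical learners standing in for the methods of Table 4.
def fit_mean(X, y):
    m = mean(y)
    return lambda row: m  # always predicts the training mean


def fit_line(X, y):
    # Ordinary least squares on the first predictor only.
    xs = [row[0] for row in X]
    x_bar, y_bar = mean(xs), mean(y)
    sxy = sum((x - x_bar) * (t - y_bar) for x, t in zip(xs, y))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda row: intercept + slope * row[0]


# Simulated data (hypothetical): efficiency decreasing linearly in one covariate plus noise.
rng = random.Random(42)
X = [[rng.uniform(0, 1)] for _ in range(100)]
y = [0.9 - 0.3 * row[0] + rng.gauss(0, 0.02) for row in X]

results = {"mean": cv_rmse(fit_mean, X, y), "line": cv_rmse(fit_line, X, y)}
best = min(results, key=results.get)
print(results, "best:", best)
```

Every model is refitted on each training split and scored only on the held-out fold, so the method with the lowest cross-validated RMSE is the one that generalizes best on this criterion.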
Table 4
RMSE computed with 10-fold cross-validation.

Method          RMSE    Hyperparameters
Random forest   0.076   mtry = 6, nodesize = 8, sampsize = 0.59
GAM             0.079   method = GCV.Cp, select = FALSE
SVM             0.083   Cost = 1
ANN             0.089   Layer1 = 3, Layer2 = 0, Layer3 = 0
Boosted tree    0.095   nu = 0.1, mstop = 100, maxdepth = 3

One may argue that other machine learning techniques could be applied instead of the Random Forest. The efficiencies of solar PV panels are therefore also predicted using an ANN, a Generalized Additive Model (GAM), Support Vector Machines with a linear kernel (SVM) and Boosted Trees. The hyperparameters of these methods are also optimized using the R package caret; see Table 4 for the optimized tuning parameters. The prediction methods are then compared based on the root mean squared error (RMSE), computed for each method using 10-fold cross-validation. The RMSEs are reported in Table 4.

The Random Forest has the lowest RMSE and is therefore the most accurate method for predicting the DEA technical efficiencies. Note that the ANN does not perform well because there is not enough data to estimate this complex model: the best neural network (selected by the caret package) contains only one layer with 3 neurons. The Boosted tree is the worst method, probably because of its tendency to overfit. The correlation between the efficiencies predicted by the Random Forest and by the other methods is relatively high (between 0.6 and 0.7).

Table 5
Predicted efficiency score of Australian cities for 2020.

City             Score   Rank
Evandale         0.973   1
West Moonah      0.972   2
Glenorchy        0.972   3
Mount Stuart     0.972   4
Goodwood         0.972   5
Lutana           0.972   6
...              ...     ...
Melbourne        0.859   445
...              ...     ...
Sydney           0.762   2226
...              ...     ...
Mount Sheridan   0.600   3823
Bentley Park     0.600   3824
Dunsborough      0.559   3825
Quindalup        0.558   3826
Yallingup        0.558   3827
Augusta          0.555   3828

5. Conclusion

The transition from fossil fuel combustion to renewable energy is one of the biggest challenges of the coming decades. Among renewable energies, solar PV has received a lot of attention, and the potential of this source of electricity is relatively large. It is therefore of particular interest to find the most suitable locations for installing solar PV panels. This study examines the effect of environmental variables on the technical efficiency of solar PV panels. First, the efficiencies of 91 solar PV panels installed in Australia are estimated with DEA over the period 2010–2020. Then the truncated regression model is used to estimate the effects of environmental variables on the estimated efficiencies. Finally, the highly flexible Random Forest is used to predict the efficiency of newly installed solar PV panels in 3828 Australian cities. The results highlight optimal locations and regions for the installation of solar PV panels. This study provides an interesting and easily interpretable tool for policy decision makers. Further research could use the same methodology to investigate the best locations for the installation of other renewable energy plants.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Appendix A

See Fig. A.6, Fig. A.7, Fig. A.8 and Table A.6.

Fig. A.6. Estimated distribution of the tilt angle, measured in degrees, of solar PV panels.
Fig. A.7. Generated electricity in MWh against inverter size measured in watts.
Fig. A.8. Generated electricity in MWh against number of panels of solar PV plants.

Table A.6
Orientation of solar PV plants.

Orientation                 East   North East   North   North West   West
Number of solar PV panels   5      21           44      19           2

References

[1] Van Ruijven BJ, De Cian E, Sue Wing I. Amplification of future energy demand growth due to climate change. Nature Commun 2019;10(1):1–12.
[2] Yasa UG, Erim M, Erim N, Girgin M, Kurt H. Design of anti-reflective graded height nanogratings for photovoltaic applications. In: 2017 international conference on numerical simulation of optoelectronic devices. IEEE; 2017, p. 25–6.
[3] Sarkın AS, Ekren N, Sağlam Ş. A review of anti-reflection and self-cleaning coatings on photovoltaic panels. Sol Energy 2020;199:63–73.
[4] Choudhary P, Srivastava RK. Sustainability perspectives-a review for solar photovoltaic trends and growth opportunities. J Clean Prod 2019;227:589–612.
[5] Sundareswaran K, Sankar P, Nayak PSR, Simon SP, Palani S. Enhanced energy output from a PV system under partial shaded conditions through artificial bee colony. IEEE Trans Sustain Energy 2014;6(1):198–209.
[6] Xiao M, Junne T, Haas J, Klein M. Plummeting costs of renewables-are energy scenarios lagging? Energy Strategy Rev 2021;35:100636.
[7] Liang Z, Zhou Z, Zhao L, Dong B, Wang S. Fabrication of transparent, durable and self-cleaning superhydrophobic coatings for solar cells. New J Chem 2020;44(34):14481–9.
[8] Murdock HE, Gibb D, André T, Sawin JL, Brown A, Ranalder L, et al. Renewables 2021-global status report. 2021.
[9] Commission IE, et al. Crystalline silicon terrestrial photovoltaic (PV) modules—design qualification and type approval. CEI/IEC 2005;61215.
[10] Gaglia AG, Lykoudis S, Argiriou AA, Balaras CA, Dialynas E. Energy efficiency of PV panels under real outdoor conditions–an experimental assessment in Athens, Greece. Renew Energy 2017;101:236–43.
[11] Hamou S, Zine S, Abdellah R. Efficiency of PV module under real working conditions. Energy Procedia 2014;50:553–8.
[12] Hachicha AA, Al-Sawafta I, Said Z. Impact of dust on the performance of solar photovoltaic (PV) systems under United Arab Emirates weather conditions. Renew Energy 2019;141:287–97.
[13] Adeh EH, Good SP, Calaf M, Higgins CW. Solar PV power potential is greatest over croplands. Sci Rep 2019;9(1):1–6.
[14] Santhakumari M, Sagar N. A review of the environmental factors degrading the performance of silicon wafer-based photovoltaic modules: Failure detection methods and essential mitigation techniques. Renew Sustain Energy Rev 2019;110:83–100.
[15] Charnes A, Cooper WW, Rhodes E. Measuring the efficiency of decision making units. European J Oper Res 1978;2(6):429–44.
[16] Aigner D, Lovell CK, Schmidt P. Formulation and estimation of stochastic frontier production function models. J Econometrics 1977;6(1):21–37.
[17] Meeusen W, van Den Broeck J. Efficiency estimation from Cobb-Douglas production functions with composed error. Internat Econom Rev 1977;435–44.
[18] Mardani A, Zavadskas EK, Streimikiene D, Jusoh A, Khoshnoudi M. A comprehensive review of data envelopment analysis (DEA) approach in energy efficiency. Renew Sustain Energy Rev 2017;70:1298–322.
[19] Mohd Chachuli FS, Ahmad Ludin N, Mat S, Sopian K. Renewable energy performance evaluation studies using the data envelopment analysis (DEA): A systematic review. J Renew Sustain Energy 2020;12(6):062701.
[20] Chueh H-E, Jheng J-Y. Applying data envelopment analysis to evaluation of Taiwanese solar cell industry operational performance. Int J Comput Sci Inform Technol 2012;4(4):1.
[21] Hsiao J-M. Measuring the operating efficiency of solar cell companies in Taiwan with data envelopment analysis. Am J Appl Sci 2012;9(12):1899.
[22] Lee AH, Lin CY, Kang H-Y, Lee WH. An integrated performance evaluation model for the photovoltaics industry. Energies 2012;5(4):1271–91.
[23] Sueyoshi T, Goto M. Photovoltaic power stations in Germany and the United States: A comparative study by data envelopment analysis. Energy Econ 2014;42:271–88.
[24] Sueyoshi T, Goto M. Measurement of returns to scale on large photovoltaic power stations in the United States and Germany. Energy Econ 2017;64:306–20.
[25] Li N, Liu C, Zha D. Performance evaluation of Chinese photovoltaic companies with the input-oriented dynamic SBM model. Renew Energy 2016;89:489–97.
[26] Liu J, Long Y, Song X. A study on the conduction mechanism and evaluation of the comprehensive efficiency of photovoltaic power generation in China. Energies 2017;10(5):723.
[27] Wang DD, Sueyoshi T. Assessment of large commercial rooftop photovoltaic system installations: Evidence from California. Appl Energy 2017;188:45–55.
[28] Sueyoshi T, Wang D. Measuring scale efficiency and returns to scale on large commercial rooftop photovoltaic systems in California. Energy Econ 2017;65:389–98.
[29] Wang Z, Li Y, Wang K, Huang Z. Environment-adjusted operational performance evaluation of solar photovoltaic power plants: A three stage efficiency analysis. Renew Sustain Energy Rev 2017;76:1153–62.
[30] de Villiers A, Vermeulen H. Sector performance monitoring in utility-scale solar farms using data envelopment analysis. In: 2017 IEEE PES powerafrica. IEEE; 2017, p. 192–7.
[31] Haeri A. Evaluation and comparison of crystalline silicon and thin-film photovoltaic solar cells technologies using data envelopment analysis. J Mater Sci, Mater Electron 2017;28(23):18183–92.
[32] Ghosh S, Yadav VK, Mukherjee V, Yadav P. Evaluation of relative impact of aerosols on photovoltaic cells through combined Shannon’s entropy and data envelopment analysis (DEA). Renew Energy 2017;105:344–53.
[33] Wu Y, Ke Y, Zhang T, Liu F, Wang J. Performance efficiency assessment of photovoltaic poverty alleviation projects in China: A three-phase data envelopment analysis model. Energy 2018;159:599–610.
[34] You H, Fang H, Wang X, Fang S. Environmental efficiency of photovoltaic power plants in China—A comparative study of different economic zones and plant types. Sustainability 2018;10(7):2551.
[35] Aziz S, Chowdhury SA. Performance evaluation of solar mini-grids in Bangladesh: A two-stage data envelopment analysis. Clean Environ Syst 2021;2:100003.
[36] Yokota S, Kumano T. Mega-solar optimal allocation using data envelopment analysis. Electr Eng Jpn 2013;183(4):24–32.
[37] Sozen A, Mirzapour A, Çakir MT. Selection of the best location for solar plants in Turkey. J Energy Southern Afr 2015;26(4):52–63.
[38] Dehghani E, Jabalameli MS, Pishvaee MS, Jabarzadeh A. Integrating information of the efficient and anti-efficient frontiers in DEA analysis to assess location of solar plants: A case study in Iran. J Ind Syst Eng 2018;11(1):163–79.
[39] Azadeh A, Ghaderi S, Maghsoudi A. Location optimization of solar plants by an integrated hierarchical DEA PCA approach. Energy Policy 2008;36(10):3993–4004.
[40] Azadeh A, Sheikhalishahi M, Asadzadeh S. A flexible neural network-fuzzy data envelopment analysis approach for location optimization of solar plants with uncertainty and complexity. Renew Energy 2011;36(12):3394–401.
[41] Lee AH, Kang H-Y, Lin C-Y, Shen K-C. An integrated decision-making model for the location of a PV solar plant. Sustainability 2015;7(10):13522–41.
[42] Wang C-N, Nguyen VT, Thai HTN, Duong DH. Multi-criteria decision making (MCDM) approaches for solar power plant location selection in Viet Nam. Energies 2018;11(6):1504.
[43] Wang C-N, Dang T-T, Bayer J, et al. A two-stage multiple criteria decision making for site selection of solar photovoltaic (PV) power plant: A case study in Taiwan. IEEE Access 2021;9:75509–25.
[44] Wang C-N, Dang T-T, Wang J-W, et al. A combined data envelopment analysis (DEA) and grey based multiple criteria decision making (g-MCDM) for solar PV power plants site selection: A case study in Vietnam. Energy Rep 2022;8:1124–42.
[45] De Clercq D, Wen Z, Fei F. Determinants of efficiency in anaerobic bio-waste co-digestion facilities: A data envelopment analysis and gradient boosting approach. Appl Energy 2019;253:113570.
[46] Nandy A, Singh PK. Farm efficiency estimation using a hybrid approach of machine-learning and data envelopment analysis: Evidence from rural eastern India. J Clean Prod 2020;267:122106.
[47] Zhu N, Zhu C, Emrouznejad A. A combined machine learning algorithms and DEA method for measuring and predicting the efficiency of Chinese manufacturing listed companies. J Manag Sci Eng 2020.
[48] Aydin N, Yurdakul G. Assessing countries’ performances against COVID-19 via WSIDEA and machine learning algorithms. Appl Soft Comput 2020;97:106792.
[49] Taherinezhad A, Alinezhad A. COVID-19 crisis management: Global appraisal using two-stage DEA and ensemble learning algorithms. Sci Iranica 2022.
[50] Molinos-Senante M, Maziotis A. Prediction of the efficiency in the water industry: An artificial neural network approach. Proc Saf Environ Prot 2022;160:41–8.
[51] Efron B. Prediction, estimation, and attribution. Internat Statist Rev 2020;88:S28–59.
[52] Farrell MJ. The measurement of productive efficiency. J R Stat Soc Ser A (General) 1957;120(3):253–81.
[53] Kneip A, Park BU, Simar L. A note on the convergence of nonparametric DEA estimators for production efficiency scores. Econom Theory 1998;14(6):783–93.
[54] Simar L, Wilson PW. Estimation and inference in two-stage, semi-parametric models of production processes. J Econometrics 2007;136(1):31–64.
[55] Simar L, Wilson PW. Two-stage DEA: Caveat emptor. J Prod Anal 2011;36(2):205–18.
[56] Breiman L. Random forests. Mach Learn 2001;45(1):5–32.
[57] Biau G, Scornet E. A random forest guided tour. Test 2016;25(2):197–227.
[58] Fan G-F, Yu M, Dong S-Q, Yeh Y-H, Hong W-C. Forecasting short-term electricity load using hybrid support vector regression with grey catastrophe and random forest modeling. Util Policy 2021;73:101294.
[59] Cutler A, Cutler DR, Stevens JR. Random forests. In: Ensemble machine learning. Springer; 2012, p. 157–75.
[60] Demir S, Sahin EK. Comparison of tree-based machine learning algorithms for predicting liquefaction potential using canonical correlation forest, rotation forest, and random forest based on CPT data. Soil Dyn Earthq Eng 2022;154:107130.
[61] Desai S, Ouarda TB. Regional hydrological frequency analysis at ungauged sites with random forest regression. J Hydrol 2021;594:125861.
[62] Pfenninger S, Staffell I. Long-term patterns of European PV output using 30 years of validated hourly reanalysis and satellite data. Energy 2016;114:1251–65.
[63] de Oliveira MCC, Cardoso ASAD, Viana MM, Lins VdFC. The causes and effects of degradation of encapsulant ethylene vinyl acetate copolymer (EVA) in crystalline silicon photovoltaic modules: A review. Renew Sustain Energy Rev 2018;81:2299–317.
[64] Simar L, Wilson PW. Non-parametric tests of returns to scale. European J Oper Res 2002;139(1):115–32.
[65] Simar L, Wilson PW. Inference by the m out of n bootstrap in nonparametric frontier models. J Prod Anal 2011;36(1):33–53.
[66] Boussofiane A, Dyson RG, Thanassoulis E. Applied data envelopment analysis. European J Oper Res 1991;52(1):1–15.
[67] Dyson RG, Allen R, Camanho AS, Podinovski VV, Sarrico CS, Shale EA. Pitfalls and protocols in DEA. European J Oper Res 2001;132(2):245–59.
[68] Gökmen N, Hu W, Hou P, Chen Z, Sera D, Spataru S. Investigation of wind speed cooling effect on PV panels in windy locations. Renew Energy 2016;90:283–90.
[69] Dubey S, Sarvaiya JN, Seshadri B. Temperature dependent photovoltaic (PV) efficiency and its effect on PV production in the world–a review. Energy Procedia 2013;33:311–21.
[70] Andrews RW, Pollard A, Pearce JM. The effects of snowfall on solar photovoltaic performance. Sol Energy 2013;92:84–97.
[71] Rida K, Al-Waeli A, Al-Asadi K. The impact of air mass on photovoltaic panel performance. Eng Sci Rep 2016;1(1):1–9.
[72] Probst P, Wright MN, Boulesteix A-L. Hyperparameters and tuning strategies for random forest. Wiley Interdisc Rev Data Min Knowl Discov 2019;9(3):e1301.
[73] Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: International conference on learning and intelligent optimization. Springer; 2011, p. 507–23.
[74] Genuer R, Poggi J-M, Tuleau-Malot C. Variable selection using random forests. Pattern Recognit Lett 2010;31(14):2225–36.
