Modelling Reference Evapotranspiration Using Principal Component Analysis and Machine Learning Methods Under Different Climatic Environments

Article in Irrigation and Drainage · May 2023

DOI: 10.1002/ird.2838

Received: 29 July 2022 Revised: 21 April 2023 Accepted: 1 May 2023
DOI: 10.1002/ird.2838
RESEARCH ARTICLE
Modelling reference evapotranspiration using principal component analysis and machine learning methods under different climatic environments
Ali Raza 1 | Kouadri Saber 2 | Yongguang Hu 1 | Ram L. Ray 3 | Yunus Ziya Kaya 4 | Hossein Dehghanisanij 5 | Ozgur Kisi 6,7 | Ahmed Elbeltagi 8
1 School of Agricultural Engineering, Jiangsu University, Zhenjiang, China
2 Laboratory of Water and Environment Engineering in Saharan Environment, University of Kasdi Merbah-Ouargla, Ouargla, Algeria
3 Department of Agriculture, Nutrition and Human Ecology, College of Agriculture and Human Sciences, Prairie View A&M University, Prairie View, Texas, USA
4 Civil Engineering Department, Faculty of Engineering, Osmaniye Korkut Ata University, Osmaniye, Turkey
5 Agricultural Research, Education and Extension Organization, Agricultural Engineering Research Institute, Karaj, Alborz, Iran
6 Department of Civil Engineering, Technical University of Lübeck, Lübeck, Germany
7 Department of Civil Engineering, Ilia State University, Tbilisi, Georgia
8 Agricultural Engineering Department, Faculty of Agriculture, Mansoura University, Mansoura, Egypt

Correspondence
Yongguang Hu, School of Agricultural Engineering, Jiangsu University, Zhenjiang 212013, China.
Email: deerhu@ujs.edu.cn

Abstract
Reference evapotranspiration (ETo) is a complex process in the hydrologic cycle that influences several hydrologic parameters. Although several methods have been developed to model ETo, reliable methods that can work with limited climatic input parameters in data-limited regions are still lacking. This study evaluated four machine learning (ML) methods: M5 pruned (M5P) tree, sequential minimal optimization (SMO), radial basis function neural regression (RBFNreg) and multilinear regression (MLR). The major objective of this study was to identify the best approach to estimate ETo with minimum input data at five stations (Multan, Jacobabad, Faisalabad, Islamabad and Skardu) located in Pakistan. The datasets of these stations comprised maximum and minimum temperatures (Tmax, Tmin), average relative humidity (RHavg), average wind speed (Ux) and sunshine hours (n). Two scenarios were used for ETo modelling. In the first scenario, all five climatic variables were used as inputs to estimate ETo. Because obtaining the full set of climatic parameters is the biggest challenge in developing countries, principal component analysis (PCA) was used in the second scenario to reduce the number of climatic input parameters. The PCA results identified Tmax, Tmin and n as effective inputs for ETo estimation. Based on statistical indicators, the M5P tree outperformed the other applied ML methods in estimating ETo under various climatic environments. This study recommends focusing on areas with high ETo values and adequate irrigation scheduling of crops to achieve water sustainability.

KEYWORDS
diverse climatic environments, ETo modelling, machine learning methods, principal component analysis, reference evapotranspiration
Irrig. and Drain. 2023;1–26. wileyonlinelibrary.com/journal/ird © 2023 John Wiley & Sons, Ltd. 1
2 RAZA ET AL.
Funding information
Jiangsu Postdoctoral Science Foundations, Grant/Award Numbers: 2016M600376, 1601032C; Jiangsu Provincial Government, Grant/Award Number: BE2021340; Priority Academic Program Development of Jiangsu Higher Education Institutions, Grant/Award Number: PAPD-2018-87
1 | INTRODUCTION
Estimation of reference evapotranspiration (ETo) has become a major challenge in agriculture, meteorology and water-related studies (Dhillon et al., 2019). Irrigation scheduling of crops is difficult to prepare without appropriate knowledge of the soil water balance and of the crop water requirements associated with ETo (Nouri et al., 2013; Zhao et al., 2013). Several methods are available to estimate ETo, including direct methods, indirect methods and computer modelling based on various machine learning (ML) methods (Bahrami et al., 2019). The direct methods of ETo estimation include lysimeters, optical scintillation, the Bowen ratio and eddy covariance. However, the main hindrances to using direct methods are high installation costs, time-consuming data handling and the skilled labour needed to operate them correctly. Therefore, alternative methods are widely chosen and used to estimate ETo (Mcmahon et al., 2013). The Penman–Monteith equation of the Food and Agriculture Organization (FAO-PM56) developed by Allen et al. (1998) is recommended and globally accepted for calculating ETo, but it requires numerous climatic and aerodynamic variables. Several areas lack adequate weather data (Rahimikhoob, 2010), and climate data of good quality and sufficient quantity may not be accessible in developing countries (Trajkovic & Kolakovic, 2009). Data such as the latitude, longitude and altitude of a geographical area are also essential to adjust the varying weather measurements (Gavilan et al., 2007).
Due to the modernization and improvement of computational modelling, it is now convenient to address significant challenges in various fields. Soft computing models' high performance and low cost contribute to their
continually increasing effectiveness (Kumar et al., 2011). Because of its complex, dynamic and nonlinear nature, ETo estimation is a considerably challenging task. Consequently, computer models that require fewer climatic variables are the most effective alternatives for ETo estimation. Soft computing is widely recognized (Ibrahim, 2016) for its capacity to handle challenging problems efficiently and to resolve complex problems with a limited amount of data. ML models rely on algorithms that perform nonlinear mapping, as required for processes such as ETo, to learn the relationship between the input and output (target) variables.
Raza, Hu, et al. (2021) listed research articles on ETo estimation published from 2012 to 2020, organized by their premise, application, accuracy and structure. Soft computing models have been successfully used in many parts of the world to estimate ETo. They are chosen for their ability to provide explicit formulations and for their ease of application. However, those authors pointed out a limitation in how well soft computing models can predict ETo when only a few climate-related parameters are used. This limitation was more prominent under contrasting climate conditions, such as humid, semi-arid and arid regions, because of the varying influence that multiple climate variables exert on the ETo process. In light of these observations, reliable ETo estimation with soft computing models in specific regions requires a variety of climatic information. The main objective of the study was to develop different soft computing models that can be preferred as alternatives to FAO-PM56, which requires numerous climatic and adjusted input data that are not easily accessible or available in most regions.
Recent works on ETo modelling have highlighted this contentious issue, which has drawn considerable attention from researchers and climatologists. In the literature, various types of ML methods, for example, support vector machine (SVM) (Ferreira & da Cunha, 2020; Mehdizadeh et al., 2017), genetic programming (GP) (Mattar, 2018; Valipour et al., 2019), extreme learning machine (ELM) (Abdullah et al., 2015; Shamshirband & Kamsin, 2016), tree-based models (Raza et al., 2020), M5 model tree (Fan et al., 2018; Granata, 2019), random forest (RF) (Saggi & Jain, 2019), extreme gradient boosting (XGBoost) (Han et al., 2019) and artificial neural networks (ANN) (Walls et al., 2020), have been applied to estimate ETo using limited climatic data. Using daily historical meteorological data, Yin et al. (2017) evaluated the effectiveness of ANN, SVM and three empirical models for estimating daily ETo in a hilly interior watershed in northwest China, and they observed that SVM consistently performed best in the studied area. Wen et al. (2015) applied two ML models, ANN and SVM, against three empirical models (Hargreaves [Ha], Ritchie [Ri], and Priestley and Taylor [PT]) to estimate the ETo of a dry region in China using daily meteorological data. The authors used the maximum and minimum temperatures as model inputs, and the predicted daily ETo was found to be adequate when only a few meteorological variables were used. To estimate daily ETo using climatic data from four meteorological stations in the karst region of Guangxi province in China, Wang et al. (2016) studied the effectiveness of two ML models, namely, gene expression programming (GEP) and ANN. According to that study, GEP with fewer climatic inputs can generate straightforward explicit mathematical formulas that are simpler to use than the employed ANN models.
To estimate ETo, Mehdizadeh et al. (2017) examined the effectiveness of GEP, multivariate adaptive regression splines (MARS) and SVM. The ML models were created using monthly meteorological variables. MARS ranked first among the applied ML models according to the evaluation indices, and it was in good agreement with FAO-PM56. Kişi & Cimen (2009) used climate data from California stations and a least squares support vector machine (LSSVM) to estimate ETo, and the evaluation indices showed that LSSVM produced satisfactory and reliable ETo estimates. Saggi & Jain (2019) calculated ETo from monthly climate data using the GEP ML model and found that the results compared favourably with FAO-PM56. Mattar (2018) found that GEP performed best when applied to varied climatic conditions in Egypt. Elbeltagi et al. (2022) developed five variants of additive regression (AR) ML methods using monthly climatic data from Pakistan stations. The authors investigated the performance of each ML model using different evaluation indices, and the M5 pruned (M5P) variant of AR was found to be the closest to FAO-PM56 and provided accurate ETo estimates. Similarly, Wang et al. (2022) developed 10 ML methods using monthly climatic data, and the evaluation indices indicated that tree boost (TB) performed best among the ML models. Furthermore, ML methods can extract useful information from time series data without discretization, and their effective handling of time series data has led to their recommendation for various engineering problems, especially ETo estimation.
It can be seen from the above literature that applying ML algorithms to ETo modelling with limited climatic variables is a good choice that is accepted worldwide. In Pakistan, only a few weather stations have been installed, and climatic data for some areas were found to be insufficient to calculate ETo. Thus, when conventional
methods (FAO-PM56) cannot be implemented owing to their large input demands or a lack of climatic data, improving methods that depend on fewer climatic inputs becomes important. One promising option for developing an ETo model is to use ML methods. Building an ML model that maps a known set of input variables to the target variable is a challenging task, which this paper explicitly addresses.
According to the available literature, no research has compared the M5P tree, sequential minimal optimization (SMO), radial basis function neural regression (RBFNreg) and multilinear regression (MLR) for estimating ETo in different climates. Thus, the objectives of the current study are (i) to develop and evaluate the M5P tree, SMO, RBFNreg and MLR methods for ETo estimation using climatic data from five stations located in different climates (semi-arid, humid and hyperarid), (ii) to examine various meteorological input combinations using principal component analysis (PCA) and identify the climatic variables that influence ETo estimation and (iii) to create ETo variation maps of the studied region based on the output of the best ML method.

2 | MATERIALS AND METHODS

2.1 | Study area and datasets
The study area comprises five stations in Pakistan (Multan, Jacobabad, Faisalabad, Islamabad and Skardu) located in various climatic regions. The climatic parameters of maximum and minimum temperature (Tmax and Tmin, °C), average relative humidity (RHavg, %), average 24-h wind speed at 2-m height (Ux, km/day) and sunshine hours (n, h) were acquired from the Pakistan Meteorological Department (PMD), Lahore. A dataset covering the period from 1987 to 2016 was obtained for each station. Based on aridity and continentality indices, the Multan and Jacobabad stations are in hyperarid regions, the Faisalabad and Islamabad stations are in semi-arid regions, and Skardu is considered a humid climatic region, as noted by Raza et al. (2020) and Raza, Shoaib, et al. (2021). Figure 1 illustrates the location of the study region. Table 1 summarizes the data, including the geographic location and the average of each variable for all stations. As shown in Table 1, the highest temperatures were observed at Multan and Jacobabad (hyperarid climate), mild temperatures at Faisalabad and Islamabad (semi-arid climate) and the lowest temperatures at Skardu station (humid climate). Likewise, sunshine hours were highest at the hyperarid stations compared with the semi-arid and humid stations. However, the highest wind speed (149.92 km/day) was recorded at Faisalabad station because of its geographical location, and severe thunderstorms occur there every year owing to cold winds coming from the west. Moreover, Faisalabad has dry winters and humid summers, giving it a semi-arid climate. Table 2 presents the main statistical characteristics of the climatic data employed in the training and testing stages.
Additionally, skewness and kurtosis coefficients (Xskp and Xkrt) were determined; they indicate the direction of asymmetry and the degree of flatness/peakedness of the time series data, respectively. Table 2 shows that Tmax, Tmin and RHavg at Islamabad station were positively skewed, which typically indicates a mean (Xmean) larger than the median, whereas Ux, n and ETo were negatively skewed, indicating means smaller than the median. Xkrt for Tmax, Tmin, RHavg and ETo was negative, indicating lower peakedness (platykurtic distribution), whereas the positive Xkrt values for Ux and n indicate greater peakedness (leptokurtic distribution). The Xskp and Xkrt values of the climatic variables at the other selected stations are also given in Table 2.

FIGURE 1 Selected climatic stations in Pakistan.

2.2 | FAO-PM56 method
The FAO of the United Nations provides the CROPWAT software, version 8.0, which was used to calculate ETo from the meteorological input parameters. CROPWAT calculates ETo with the globally accepted standard FAO-PM56 equation defined by Allen et al. (1998):

$$ ET_o = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T_{mean} + 273}\,U_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,U_2)}, \qquad (1) $$

where ETo is the reference evapotranspiration (mm day−1), Rn/G is the net radiation/soil heat flux (MJ m−2 day−1), Tmean is the mean air temperature (°C), U2 is the wind speed at 2-m height (m s−1), es/ea is the saturation/actual vapour pressure (kPa), and Δ/γ is the slope of the vapour pressure curve/psychrometric constant (kPa °C−1).

2.3 | Data-driven models

2.3.1 | M5 pruned tree
The M5P method was initially introduced by Quinlan (1992). It is a decision tree (DT) algorithm in which the main (root) node splits into subnodes and each terminal (leaf) node holds a regression equation that is used to make the estimate. Considering 'T' as a node, the standard deviation of the target values in 'T' is treated as a measure of error, and at each level of splitting the attribute that maximizes the standard deviation reduction (SDR) is chosen. The SDR is calculated by Equation (2):

$$ SDR = SD(T) - \sum_{i} \frac{|T_i|}{|T|}\, SD(T_i), \qquad (2) $$

where T represents the set of samples reaching the node, Ti represents the ith subset of samples produced by the split, SD represents the standard deviation, and SDR represents the standard deviation reduction. In some cases, the tree created by M5P may grow considerably and become larger than expected, which is why the tree is pruned.
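The split criterion in Equation (2) can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not Weka's M5P implementation: it uses the population standard deviation for SD, considers only binary splits on one numeric attribute at midpoints between consecutive sorted values, and the function names `standard_deviation_reduction` and `best_split` are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def standard_deviation_reduction(parent: Sequence[float],
                                 subsets: Sequence[Sequence[float]]) -> float:
    """Equation (2): SDR = SD(T) - sum_i |T_i| / |T| * SD(T_i).

    Assumption: SD is the population standard deviation.
    """
    n = len(parent)
    weighted = sum(len(s) / n * statistics.pstdev(s) for s in subsets if s)
    return statistics.pstdev(parent) - weighted


def best_split(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: return (threshold, SDR) of the binary split on one
    numeric attribute x that maximizes the SDR of the target y."""
    pairs = sorted(zip(x, y))
    targets = [t for _, t in pairs]
    best_threshold, best_sdr = float("nan"), float("-inf")
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate equal attribute values
        sdr = standard_deviation_reduction(targets, [targets[:k], targets[k:]])
        if sdr > best_sdr:
            best_threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best_sdr = sdr
    return best_threshold, best_sdr
```

For example, with hypothetical Tmax values [10, 12, 30, 32] °C and ETo values [1.0, 1.2, 6.0, 6.4] mm/day, `best_split` selects the threshold 21.0 °C with an SDR of about 2.40, because that split separates the low- and high-ETo samples most cleanly.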
TABLE 1 Monthly average meteorological parameters for each selected climatic station.
Station     Lat (DD)  Lon (DD)  Alt (m)  Tmax (°C)  Tmin (°C)  RHavg (%)  U (km/day)  n (h)  ETo (mm/day)  Data duration
Multan      30.20     71.45     122      32.09      17.6       39.78      106.58      7.40   5.61          1987–2016
Faisalabad  31.41     73.11     184      30.66      16.94      42.43      149.92      6.50   4.80          1987–2016
Islamabad   33.72     73.06     540      28.34      13.37      49.98      77.91       5.41   4.28          1987–2016
Jacobabad   28.28     68.43     61       33.76      19.86      37.23      128.94      8.10   5.96          1987–2016
Skardu      35.30     75.68     2228     18.68      4.19       38.12      121.24      4.23   2.41          1987–2016
Abbreviations: ETo, reference evapotranspiration; n, sunshine hours; RHavg, average relative humidity; Tmax, maximum temperature; Tmin, minimum temperature; U, wind speed.
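In this study, reference ETo was computed with CROPWAT 8.0 using FAO-PM56 (Equation 1). The following Python sketch evaluates Equation (1) for a single day as an illustration, not as CROPWAT's implementation. It assumes that net radiation Rn is already known (FAO-56 derives it from sunshine hours, latitude and date, which is omitted here), sets G = 0 by default as is usual for daily time steps, estimates ea from mean relative humidity, and computes Δ and γ with the standard FAO-56 expressions. The function names are hypothetical. Wind speeds reported in km/day, as in Tables 1 and 2, must be converted to m/s before use.

```python
import math


def sat_vapour_pressure(t: float) -> float:
    """Saturation vapour pressure e°(T) in kPa for air temperature T in °C."""
    return 0.6108 * math.exp(17.27 * t / (t + 237.3))


def km_per_day_to_m_per_s(u: float) -> float:
    """Convert a wind speed from km/day (as in Tables 1 and 2) to m/s."""
    return u * 1000 / 86400


def fao56_eto(t_max: float, t_min: float, rh_mean: float, u2: float,
              rn: float, altitude: float, g: float = 0.0) -> float:
    """Daily reference evapotranspiration (mm/day) following Equation (1).

    Units: temperatures in °C, rh_mean in %, u2 in m/s at 2 m height,
    rn and g in MJ m-2 day-1, altitude in m.
    """
    t_mean = (t_max + t_min) / 2
    # Mean saturation vapour pressure from Tmax and Tmin.
    es = (sat_vapour_pressure(t_max) + sat_vapour_pressure(t_min)) / 2
    # Actual vapour pressure estimated from mean relative humidity.
    ea = rh_mean / 100 * es
    # Slope of the saturation vapour pressure curve at Tmean (kPa/°C).
    delta = 4098 * sat_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2
    # Atmospheric pressure from altitude (kPa) and psychrometric constant (kPa/°C).
    pressure = 101.3 * ((293 - 0.0065 * altitude) / 293) ** 5.26
    gamma = 0.000665 * pressure
    numerator = (0.408 * delta * (rn - g)
                 + gamma * 900 / (t_mean + 273) * u2 * (es - ea))
    return numerator / (delta + gamma * (1 + 0.34 * u2))
```

Because both terms of the numerator are non-negative when RH is below 100% and Rn exceeds G, the sketch returns a positive ETo that grows with net radiation and with the vapour pressure deficit (es − ea).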
TABLE 2 Statistical description of meteorological variables for selected climatic stations.
Dataset Climate variable Xmean Xstd CV (%) Xmin Xmax Xskp Xkrt
Islamabad station
Training Tmax (°C) 28.848 6.738 23.360 16.200 40.700 0.270 1.200
Tmin (°C) 13.633 7.770 57.000 2.900 24.900 0.090 1.440
RH (%) 49.484 13.244 26.760 19.000 74.000 0.300 0.950
Ux (km/day) 78.390 49.580 63.250 75.480 230.880 0.480 0.030
n (h/day) 7.450 1.548 20.790 5.500 11.300 1.020 0.770
ETo (mm/day) 3.626 1.776 48.970 0.800 8.040 0.360 0.730
Testing Tmax (°C) 28.765 6.563 22.820 15.200 40.300 0.290 1.160
Tmin (°C) 14.306 7.849 54.870 1.000 25.500 0.170 1.420
RH (%) 49.830 11.220 22.510 23.000 74.000 0.360 0.430
Ux (km/day) 78.320 64.990 82.980 4.440 337.440 1.560 3.210
n (h/day) 7.450 1.553 20.840 5.500 11.300 1.030 0.820
ETo (mm/day) 3.551 1.878 52.900 0.910 8.340 0.410 0.730
Faisalabad station
Training Tmax (°C) 31.173 7.242 23.230 15.800 42.500 0.360 1.200
Tmin (°C) 17.335 8.294 47.850 3.100 29.000 0.180 1.470
RH (%) 42.750 11.612 27.160 16.000 91.000 0.030 0.450
Ux (km/day) 138.660 73.300 52.860 8.880 386.280 0.300 0.140
n (h/day) 7.025 1.010 14.380 4.500 8.100 1.150 0.700
ETo (mm/day) 4.398 2.299 52.290 0.910 10.960 0.120 1.100
Testing Tmax (°C) 31.022 7.406 23.870 16.500 41.900 0.440 1.100
Tmin (°C) 17.692 8.357 47.240 3.400 28.600 0.300 1.420
RH (%) 44.460 12.140 27.290 19.000 70.000 0.250 0.540
Ux (km/day) 177.720 89.470 50.340 13.320 337.440 0.320 1.160
n (h/day) 7.025 1.013 14.420 4.500 8.100 1.160 0.750
ETo (mm/day) 5.014 2.588 51.620 1.230 10.090 0.090 1.230
Skardu station
Training Tmax (°C) 19.285 9.715 50.380 2.700 35.800 0.190 1.240
Tmin (°C) 4.784 8.150 170.350 17.900 19.000 0.160 1.110
RH (%) 39.623 14.980 37.810 14.000 81.000 0.890 0.070
Ux (km/day) 125.040 95.450 76.330 0.000 497.280 0.990 0.960
n (h/day) 6.125 1.952 31.860 2.700 9.100 0.480 0.990
ETo (mm/day) 3.438 2.119 61.650 0.510 8.250 0.160 1.310
Testing Tmax (°C) 18.692 9.394 50.260 1.600 33.300 0.170 1.270
Tmin (°C) 4.277 8.214 192.050 12.600 19.400 0.160 1.230
RH (%) 38.120 13.370 35.060 19.000 74.000 0.660 0.470
Ux (km/day) 116.020 97.160 83.750 0.000 386.280 0.730 0.260
n (h/day) 6.125 1.957 31.950 2.700 9.100 0.490 0.980
ETo (mm/day) 3.365 2.187 64.980 0.520 7.860 0.180 1.350
Jacobabad station
Training Tmax (°C) 34.338 7.152 20.830 21.200 47.300 0.200 1.110
Tmin (°C) 20.279 7.968 39.290 5.200 31.000 0.280 1.410
RH (%) 36.452 10.624 29.140 11.000 65.000 0.200 0.480
Ux (km/day) 1618.500 471.700 29.140 488.400 2886.000 0.200 0.480
n (h/day) 7.791 0.591 7.580 7.000 8.600 0.100 1.660
ETo (mm/day) 5.223 2.239 42.880 1.660 11.020 0.150 1.030
Testing Tmax (°C) 33.971 7.430 21.870 20.100 45.600 0.220 1.090
Tmin (°C) 20.444 7.975 39.010 6.500 30.900 0.340 1.360
RH (%) 41.950 13.700 32.650 13.000 73.000 0.100 0.790
Ux (km/day) 1862.700 608.200 32.650 577.200 3241.200 0.100 0.790
n (h/day) 7.791 0.592 7.600 7.000 8.600 0.100 1.670
ETo (mm/day) 4.595 2.082 45.320 1.370 9.130 0.160 1.030
Multan station
Training Tmax (°C) 32.520 7.341 22.570 18.100 44.100 0.280 1.250
Tmin (°C) 18.177 8.634 47.500 3.200 30.300 0.160 1.470
RH (%) 56.709 11.790 20.790 26.000 80.000 0.610 0.450
Ux (km/day) 115.190 76.610 66.500 0.000 451.170 0.610 0.280
n (h/day) 7.831 1.271 16.240 3.130 11.250 0.890 1.110
ETo (mm/day) 4.482 2.372 52.910 1.100 10.000 0.160 1.200
Testing Tmax (°C) 32.378 7.340 22.670 18.000 43.000 0.440 1.130
Tmin (°C) 19.265 8.541 44.330 3.800 30.600 0.330 1.340
RH (%) 56.410 11.730 20.790 30.500 78.000 0.620 0.500
Ux (km/day) 158.240 99.400 62.820 2.220 395.610 0.290 0.870
n (h/day) 7.425 1.460 19.660 3.130 11.250 0.580 0.420
ETo (mm/day) 4.911 2.662 54.220 1.100 10.300 0.110 1.260
Abbreviations: CV, coefficient of variation; ETo, reference evapotranspiration; n, sunshine hours; RH, relative humidity; Tmax, maximum temperature; Tmin, minimum temperature; Ux, average wind speed; Xmean, mean value; Xstd, standard deviation; Xmin, minimum value; Xmax, maximum value; Xskp, skewness coefficient; Xkrt, kurtosis coefficient.
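The summary statistics in Table 2 can be computed for any variable with a short Python sketch. The text does not state which estimators were used, so this sketch assumes population (moment-based) formulas, reports kurtosis as excess kurtosis (0 for a normal, mesokurtic distribution), and expresses CV as a percentage, which matches the tabulated values (e.g. 6.738/28.848 ≈ 23.36% for Tmax at Islamabad). The function name `describe` is hypothetical.

```python
import math
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Table 2-style statistics for one climate variable (hypothetical helper).

    Assumptions: population moments; Xkrt is excess kurtosis, so negative
    values are platykurtic and positive values leptokurtic. The series must
    have a non-zero mean and non-zero variance.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    std = math.sqrt(m2)
    return {
        "Xmean": mean,
        "Xstd": std,
        "CV": 100 * std / mean,  # coefficient of variation in %
        "Xmin": min(values),
        "Xmax": max(values),
        "Xskp": m3 / m2 ** 1.5,  # > 0: longer right tail
        "Xkrt": m4 / m2 ** 2 - 3,  # excess kurtosis
    }
```

For the symmetric series [1, 2, 3, 4, 5], `describe` gives Xskp = 0 and Xkrt ≈ −1.3, a flat (platykurtic) distribution.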
2.3.2 | Sequential minimal optimization
SMO is used to train a support vector classifier using polynomial or RBF kernels. It replaces all missing values and transforms nominal attributes into binary ones. A single-hidden-layer neural network uses a similar type of model to an SVM. Support vector regression (SVR) is the general name for regression analysis performed with SVM supervised learning models. The SVM method was initially developed for binary classification problems and was later extended to multiclass classification and regression problems. In general terms, an SVM tries to find the best line (hyperplane) that separates the data into classes as accurately as possible, whereas SVR tries to find the line that best fits the data by minimizing the errors of a cost function. The following constraint (Equation 3) is used to train the SVR:

$$ \left| y_i - \langle w, x_i \rangle - b \right| \le \varepsilon, \qquad (3) $$
where ε is a free threshold parameter, $\langle w, x_i \rangle + b$ is the estimate for the sample, and $x_i$ is a training sample with target value $y_i$.

2.3.3 | Radial basis function neural regression
RBFNreg is a nonlinear regression approach; that is, it does more than fit a straight line. In most hydrological cases the problem is very complex, and a linear approach is not sufficient to find the best model. Because the RBFNreg method is based on Gaussian basis functions, the model learned by the RBFNreg class has the form of Equation (4):

$$ f(x_1, x_2, \ldots, x_m) = g\left( w_0 + \sum_{i=1}^{b} w_i \exp\left( -\sum_{j=1}^{m} \frac{a_j^2\,(x_j - c_{i,j})^2}{2\sigma_{i,j}^2} \right) \right), \qquad (4) $$

where $x_1, x_2, \ldots, x_m$ is the vector of attribute values belonging to an instance, $b$ is the number of basis functions, $g()$ is the output (activation) function, $a_j^2$ is the jth attribute's weight, $w_i$ is the ith weight ($w_0$ being the bias), $c_{i,j}$ is the centre of the ith basis function for the jth attribute, and $\sigma_{i,j}^2$ is the corresponding variance.

In the case of regression, the best-fitting values of the parameters $a_j^2$, $c_{i,j}$ and $\sigma_{i,j}^2$ are found by minimizing Equation (5), an error function calculated on the training data; the aim is to identify a local minimum of the penalized squared error:

$$ L_{SSE} = \frac{1}{2} \sum_{i=1}^{n} \left( y_i - f(\vec{x}_i) \right)^2 + \lambda \sum_{i=1}^{b} w_i^2, \qquad (5) $$

where $\lambda$ is the size of the penalty and can be specified by the user, $y_i$ is the actual target value of the training sample $\vec{x}_i$, and $n$ is the number of training instances.

2.3.4 | Multilinear regression
MLR is one of the most popular and best-known regression approaches and has been applied to many types of problems. MLR is an extension of simple linear regression: whereas simple linear regression describes the relationship between one continuous dependent (response) variable and one independent (explanatory) variable, MLR relates one dependent variable to multiple independent variables. Equation (6) defines the MLR model:

$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \epsilon, \qquad (6) $$

where $y_i$ is the dependent variable, $\beta_0$ is the constant term, $\beta_p$ is the slope coefficient of the pth independent variable, $x_{ip}$ is the value of the pth independent variable, and $\epsilon$ is the error term of the developed model.

2.4 | Principal component analysis
PCA is widely used to analyse large datasets with a substantial number of dimensions (features) per observation, improving data interpretability while preserving as much information as possible and enabling the visualization of multidimensional data. PCA is a statistical technique used to reduce the number of dimensions in a dataset. This is accomplished by a linear transformation that maps the data to a new coordinate system in which the data's variance can be described with fewer dimensions. Because they allow data to be shown in two dimensions, the first two principal components are commonly used in research, which makes it easier to identify trends and outliers. PCA computes the principal components (a set of orthogonal directions in the original variable space) and uses them to change the basis of the data, often keeping only the first few principal components and discarding the rest. The first principal component of a set of x variables is the derived variable, formed as a linear combination of the original variables, that explains the largest share of the data variation. After the effect of the first principal component is removed, the second component best explains the remaining variance, and this procedure can be repeated up to x times until all variance is explained. When many variables are highly correlated, PCA is typically used to reduce them to a smaller, more manageable set of uncorrelated variables. PCA has applications in exploratory data analysis and model development. Dimensionality reduction is frequently performed by projecting each data point onto only the first few principal components, giving lower-dimensional data that retain as much of the data's variation as possible. One way to view the first principal component is as the direction along which the projected data have the largest variance, which is equivalent to minimizing the mean squared perpendicular distance of the data points from that direction.

2.5 | Development of ETo model
Weka is open-source software developed by the University of Waikato. It has many users because it provides a collection of ML methods implemented in Java. The software lets users apply different ML approaches to data mining tasks quickly and
effectively. It offers data preprocessing, clustering, classification, regression, visualization and feature selection capabilities. In Weka, each data variable (parameter) is described as an 'attribute'. The software supports different attribute types, such as nominal, numeric and string attributes. It has its own file format, ARFF, and it also supports other common file types such as CSV.
Weka was used in this study, and the following steps were implemented. The dataset was preprocessed in the first step (1). The data file was prepared in CSV format (2) and uploaded to Weka using the import feature (3). The necessary parameters were selected (4), the data were classified (5), and the nearest neighbour was chosen as the estimation function (6). In the cross-validation process, K-means clustering was utilized (7), association rules were applied (8), and in the last step, the model was evaluated (9). A scheme showing the methodology of the modelling process is given in Figure 2.

FIGURE 2 Adopted reference evapotranspiration (ETo) methodology for selected machine learning algorithms. FAO, Food and Agriculture Organization; M5P tree, M5 pruned tree; MLR, multilinear regression; RBFNreg, radial basis function neural regression; SMO, sequential minimal optimization.

2.6 | Performance evaluation of ML methods
The present study calculated the performance indices, namely, correlation coefficient (r), mean absolute error (MAE), root mean squared error (RMSE), relative absolute error (RAE) and root relative squared error (RRSE), to evaluate the ML methods. These are defined as follows:
$$
r = \frac{n\sum_{i=1}^{n} ET_{obs,i}\,ET_{est,i} - \sum_{i=1}^{n} ET_{obs,i}\sum_{i=1}^{n} ET_{est,i}}{\sqrt{\left[n\sum_{i=1}^{n} ET_{obs,i}^{2} - \left(\sum_{i=1}^{n} ET_{obs,i}\right)^{2}\right]\left[n\sum_{i=1}^{n} ET_{est,i}^{2} - \left(\sum_{i=1}^{n} ET_{est,i}\right)^{2}\right]}} \qquad (7)
$$
where ETobs,i and ETest,i are the ith observed and estimated ETo values, respectively, and n is the number of observations.

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| f_i - y_i \right| = \frac{1}{n}\sum_{i=1}^{n} \left| e_i \right| \qquad (8)
$$

where fi is the predicted value, yi is the true (observed) value, and ei = fi − yi is the prediction error.

$$
\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} \left(\hat{y}_t - y_t\right)^2}{n}} \qquad (9)
$$

where ŷt and yt are the predicted and observed values for record t.

$$
\mathrm{RAE} = U_1 = \frac{\left[\sum_{i=1}^{n} \left(P_i - A_i\right)^2\right]^{1/2}}{\left[\sum_{i=1}^{n} A_i^2\right]^{1/2}} \qquad (10)
$$

where the first version of Theil's U, called 'U1', is a measure of accuracy that compares the actual (observed) values (A) with the predicted values (P).

$$
\mathrm{RRSE} = E_i = \sqrt{\frac{\sum_{j=1}^{n} \left(P_{(ij)} - T_j\right)^2}{\sum_{j=1}^{n} \left(T_j - \bar{T}\right)^2}} \qquad (11)
$$

where P(ij) is the value predicted by the individual model i for record j (out of n records), Tj is the target value for record j, and $\bar{T} = \frac{1}{n}\sum_{j=1}^{n} T_j$.

3 | RESULTS AND DISCUSSION

This paper applied four ML techniques, M5P tree, SMO, RBFNreg and MLR, for ETo modelling. The modelling tasks were performed based on two scenarios. In the first scenario, the meteorological data for Tmax, Tmin, RH, Ux and n were used as inputs to predict ETo. In the second scenario, we used PCA to group the variables and reduce the number of inputs.

3.1 | PCA results

Strong correlation among the explanatory variables (multicollinearity) is a significant issue in multiple linear regression analysis because it inflates the variance of the regression parameter estimators. PCA was therefore used to address multicollinearity among the explanatory variables. Table 3 presents the correlation matrices, eigenvalues, eigenvectors and contributions of the variables for the studied meteorological stations.

Table 3 shows that Tmax, Tmin, Ux and n have positive correlation coefficients with ETo at the humid and semi-arid stations. At Faisalabad station (semi-arid climate), the correlation coefficients between ETo and Tmax, Tmin, Ux and n are 0.901, 0.911, 0.853 and 0.729, respectively. At the second semi-arid station, Islamabad, ETo is strongly correlated with Tmax, Tmin and n, with values of 0.868, 0.838 and 0.728, respectively. Skardu station has a humid climate, and its ETo has r values of 0.893, 0.911, 0.833 and 0.892 with Tmax, Tmin, Ux and n, respectively. For the hyperarid stations, Tmax and Tmin had the highest correlations with ETo, at 0.898 and 0.875, respectively, at Jacobabad station. At Multan station, the highest r values were found between ETo and Tmax (0.913), Tmin (0.914) and Ux (0.884).

The main objective of using PCA in this study was to determine the best predictors of ETo for the second scenario. For this purpose, the number of components retained in the rotated space had to be specified correctly. Figure 3 presents a bar diagram for each station in a separate panel: the x-axis shows the components, the left y-axis shows the eigenvalue of each component (orange bars), and the right y-axis shows the cumulative percentage of explained variance (purple line). Figure 3 shows that at all five stations, the first two components explained more than 80% of the total variance (Islamabad 84%, Faisalabad 88%, Skardu 93%, Jacobabad 86% and Multan 86%). Based on Figure 3, we selected the first two components to represent the meteorological parameters in a rotated space. Figure 4 illustrates the projection of the parameters onto the rotated space of the two components, together with the eigenvalues, for the selected stations. Based on Figure 4, a homogeneous group containing ETo was detected at each station and is identified by a red dashed circle. At the Islamabad, Jacobabad and Skardu stations, Tmax, Tmin and n formed a homogeneous group with ETo. At Faisalabad station, Ux was added to this group, whereas at Multan station, Ux was added and n was excluded from the homogeneous group of ETo. RH fell outside this group at all stations. The frequency with which each remaining parameter appeared in the same homogeneous group as ETo across the studied stations is as follows: ETo-Tmax 100%, ETo-Tmin 100%, ETo-n 80%, ETo-Ux 40% and ETo-RHavg < 10%. From these results, the parameters selected as good predictors of ETo in the second scenario are Tmax, Tmin and n.
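The component-selection step described above can be outlined in code. The sketch below is illustrative only and is not the software used in the study. It assumes that PCA is performed on a station's correlation matrix (consistent with the initial communalities of 1.000 in Table 3), uses a cyclic Jacobi eigenvalue routine so that only the Python standard library is needed, and returns unrotated loadings; the rotation used to build the rotated space is not reproduced. The function names and the 80% threshold parameter are hypothetical choices that mirror the text.

```python
import math


def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues/eigenvectors of a real symmetric matrix via cyclic Jacobi rotations.

    Returns (eigenvalues, v); column k of v is the eigenvector of eigenvalues[k].
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]                      # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:                                             # off-diagonal part negligible
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that element (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):                                # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                                # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                                # V <- V J (accumulate vectors)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pca_from_correlation(r):
    """Per component, largest first: (eigenvalue, % of variance, cumulative %, loadings)."""
    eigvals, v = jacobi_eigen(r)
    order = sorted(range(len(eigvals)), key=lambda k: eigvals[k], reverse=True)
    total = sum(eigvals)          # equals the number of variables for a correlation matrix
    components, cumulative = [], 0.0
    for k in order:
        pct = 100.0 * eigvals[k] / total
        cumulative += pct
        # Loading of each variable on component k: eigenvector entry * sqrt(eigenvalue).
        loadings = [v[i][k] * math.sqrt(max(eigvals[k], 0.0)) for i in range(len(r))]
        components.append((eigvals[k], pct, cumulative, loadings))
    return components


def n_components_for(components, threshold=80.0):
    """Smallest number of leading components whose cumulative % reaches the threshold."""
    for count, (_, _, cumulative, _) in enumerate(components, start=1):
        if cumulative >= threshold:
            return count
    return len(components)
```

For a 6 × 6 correlation matrix such as one station block of Table 3, the eigenvalues should sum to 6 (the number of variables), which offers a quick consistency check on the Total column.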
TABLE 3 Correlation matrix between climate variables.
Tmax Tmin RHavg Ux n ETo
Faisalabad station (semi-arid)
Tmax 1
Tmin 0.954 1
RHavg 0.444 0.215 1
Ux 0.664 0.737 0.172 1
n 0.844 0.77 0.486 0.528 1
ETo 0.901 0.911 0.394 0.853 0.729 1
Islamabad station (semi-arid)
Tmax 1
Tmin 0.924 1
RHavg 0.316 0.063 1
Ux 0.183 0.177 0.528 1
n 0.775 0.649 0.392 0.158 1
ETo 0.868 0.838 0.485 0.567 0.728 1
Skardu station (humid)
Tmax 1
Tmin 0.966 1
RHavg 0.782 0.747 1
Ux 0.559 0.6 0.72 1
n 0.922 0.896 0.834 0.641 1
ETo 0.893 0.911 0.807 0.833 0.892 1
Jacobabad station (hyperarid)
Tmax 1
Tmin 0.937 1
RHavg 0.244 0.060 1
Ux 0.244 0.060 1 1
n 0.564 0.509 0.099 0.099 1
ETo 0.898 0.875 0.237 0.237 0.373 1
Multan station (hyperarid)
Tmax 1
Tmin 0.955 1
RHavg 0.691 0.517 1
Ux 0.694 0.784 0.434 1
n 0.546 0.46 0.492 0.344 1
ETo 0.913 0.914 0.714 0.884 0.528 1
Communalities, eigenvalues and component matrix
Columns: variable; communality (initial, extraction); eigenvalue of components 1–6 in row order (total, % of variance, cumulative %); loadings on components 1 and 2.
Faisalabad station (semi-arid)
Tmax 1.000 0.932 4.352 72.527 72.527 0.965 0.033
Tmin 1.000 0.938 0.956 15.934 88.460 0.941 0.228
RHavg 1.000 0.935 0.445 7.410 95.871 0.468 0.846
Ux 1.000 0.774 0.195 3.246 99.116 0.803 0.359
n 1.000 0.790 0.041 0.688 99.804 0.863 0.214
ETo 1.000 0.939 0.012 0.196 100.000 0.963 0.113
Islamabad station (semi-arid)
Tmax 1.000 0.952 3.719 61.981 61.981 0.930 0.296
Tmin 1.000 0.913 1.344 22.405 84.386 0.857 0.424
RHavg 1.000 0.754 0.621 10.342 94.728 0.513 0.701
Ux 1.000 0.772 0.255 4.248 98.975 0.469 0.743
n 1.000 0.720 0.039 0.655 99.631 0.831 0.170
ETo 1.000 0.952 0.022 0.369 100.000 0.973 0.074
Skardu station (humid)
Tmax 1.000 0.980 5.023 83.718 83.718 0.941 0.307
Tmin 1.000 0.951 0.579 9.642 93.360 0.940 0.261
RHavg 1.000 0.810 0.260 4.338 97.698 0.890 0.137
Ux 1.000 0.978 0.090 1.505 99.203 0.783 0.604
n 1.000 0.928 0.027 0.455 99.658 0.950 0.157
ETo 1.000 0.955 0.021 0.342 100.000 0.973 0.091
Jacobabad station (hyperarid)
Tmax 1.000 0.976 3.248 54.134 54.134 0.974 0.128
Tmin 1.000 0.963 1.951 32.511 86.645 0.886 0.422
RHavg 1.000 0.712 0.698 11.629 98.274 0.372 0.928
Ux 1.000 0.879 0.090 1.494 99.768 0.372 0.928
n 1.000 0.828 0.014 0.232 100.000 0.630 0.150
ETo 1.000 0.919 0.019 0.251 100.000 0.917 0.106
Multan station (hyperarid)
Tmax 1.000 0.908 4.383 73.048 73.048 0.953 0.022
Tmin 1.000 0.915 0.788 13.129 86.177 0.927 0.237
RHavg 1.000 0.718 0.491 8.189 94.367 0.743 0.408
Ux 1.000 0.853 0.312 5.205 99.571 0.826 0.414
n 1.000 0.798 0.017 0.275 99.846 0.647 0.616
ETo 1.000 0.979 0.009 0.154 100.000 0.982 0.121
Abbreviations: ETo, reference evapotranspiration; n, sunshine hours; RHavg, average relative humidity; Tmax, maximum temperature; Tmin, minimum
temperature; Ux, average wind speed.
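Before turning to the scenario results, the indices in Equations (7)–(12) can be made concrete. The following sketch is an illustration, not the Weka workflow used in the study. It assumes that the RAE percentages reported in Tables 4 and 5 follow the common relative-absolute-error definition, Σ|P − A| / Σ|A − Ā| × 100 (that is, relative to always predicting the mean observed value); Equation (10) as printed, a Theil-type ratio, is included separately as the hypothetical helper `theil_u`. The sample values are invented for illustration and are not study data.

```python
import math


def pearson_r(obs, est):
    """Equation (7): Pearson correlation between observed and estimated ETo."""
    n = len(obs)
    sxy = sum(o * e for o, e in zip(obs, est))
    sx, sy = sum(obs), sum(est)
    sxx, syy = sum(o * o for o in obs), sum(e * e for e in est)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def mae(obs, est):
    """Equation (8): mean absolute error (same units as ETo, e.g. mm/day)."""
    return sum(abs(e - o) for o, e in zip(obs, est)) / len(obs)


def rmse(obs, est):
    """Equation (9): root mean squared error."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))


def rae_percent(obs, est):
    """Relative absolute error (%) against a mean-only predictor (assumed definition)."""
    mean = sum(obs) / len(obs)
    return 100.0 * sum(abs(e - o) for o, e in zip(obs, est)) / sum(abs(o - mean) for o in obs)


def rrse_percent(obs, est):
    """Equation (11) expressed as a percentage: root relative squared error."""
    mean = sum(obs) / len(obs)
    ratio = sum((e - o) ** 2 for o, e in zip(obs, est)) / sum((o - mean) ** 2 for o in obs)
    return 100.0 * math.sqrt(ratio)


def theil_u(obs, est):
    """Equation (10) as printed: sqrt(sum (P - A)^2) / sqrt(sum A^2)."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est))) / math.sqrt(sum(o * o for o in obs))


def margin_of_deviation(obs, est):
    """Equation (12): per-record margin of deviation (%), (Y - Yi) / Y * 100."""
    return [(o - e) / o * 100.0 for o, e in zip(obs, est)]


if __name__ == "__main__":
    observed = [2.0, 4.0, 6.0]      # illustrative ETo values (mm/day), not study data
    predicted = [2.5, 3.5, 6.5]
    print(round(pearson_r(observed, predicted), 3))                   # 0.961
    print(mae(observed, predicted), rmse(observed, predicted))        # 0.5 0.5
    print(rae_percent(observed, predicted))                           # 37.5
    print(round(rrse_percent(observed, predicted), 2))                # 30.62
    print([round(m, 2) for m in margin_of_deviation(observed, predicted)])  # [-25.0, 12.5, -8.33]
```

Keeping each index as a separate function makes it easy to compute the training and evaluation columns of Tables 4 and 5 from the same pair of series.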
3.2 | Suggested scenario results

Table 4 shows the statistical performance indicators for modelling ETo with the four developed ML models at the five stations in the first scenario. At Jacobabad station, all models produced satisfactory results, with correlation coefficients (r) ranging from 0.930 to 0.946 in the training phase and from 0.941 to 0.949 in the evaluation period. The RBFNreg model was the optimal choice for ETo estimation at this station. It generated low statistical errors in training (MAE = 0.573 mm/day, RMSE = 0.722 mm/day, RAE = 29.532% and RRSE = 32.312%), and the corresponding errors in evaluation were MAE = 0.608 mm/day, RMSE = 0.812 mm/day, RAE = 33.330% and RRSE = 37.525%. RBFNreg also generated satisfactory results at Multan station, where the correlation coefficients of the models ranged from 0.992 to 0.996 in the training phase and from 0.989 to 0.995 in the evaluation period. At this station, RBFNreg generated low statistical errors in training (MAE = 0.165 mm/day, RMSE = 0.211 mm/day, RAE = 7.898% and RRSE = 8.923%), and the corresponding errors in evaluation were MAE = 0.232 mm/day, RMSE = 0.303 mm/day, RAE = 9.856% and RRSE = 11.311%. RBFNreg was thus the best model for predicting ETo at both of these stations, which share a hyperarid climate.

At the other three stations, which lie in different climates (semi-arid and humid regions), different methods were found to be optimal for modelling ETo. At the Faisalabad and Islamabad stations, located in semi-arid regions, all designed models had nearly the same correlation coefficient (r ≈ 0.98), with small variations ranging from 0.004 to 0.008 at Faisalabad and from 0.0016 to 0.0071 at Islamabad during the evaluation process. The superior ETo models were SMO at Faisalabad and M5P at Islamabad, which achieved the lowest statistical errors among the models. At Skardu station (located in a humid region), RBFNreg was the best ML model, with the highest correlation (r = 0.998) and the lowest errors (MAE = 0.100 mm/day, RMSE = 0.135 mm/day, RAE = 5.137% and RRSE = 6.223%) in the evaluation phase.

Figure 5 presents scatter plots of the observed (x-axis) and predicted (y-axis) ETo values for the created models. The results indicate that the RBFNreg predictions are highly correlated with the observations at three stations (Jacobabad, Multan and Skardu). In contrast, the SMO and M5P predictive models had the highest correlations with the observed ETo at the other stations (Faisalabad and Islamabad) during the training and evaluation periods.

FIGURE 3 Eigenvalues and percentage of explained variance.

FIGURE 4 Projection of meteorological parameters on a two-dimensional rotated space plot. ETo, reference evapotranspiration; n, sunshine hours; RH, relative humidity; Tmax, maximum temperature; Tmin, minimum temperature; WS, wind speed.

Figure 5 clearly shows that all the ML models perform worse at Jacobabad station than at the other four stations. This can
TABLE 4 Performance metrics of developed models in ETo prediction for the first scenario (MAE and RMSE in mm/day).
Train Test
M5P MLR RBFNreg SMO M5P MLR RBFNreg SMO
Faisalabad station
CCP 0.964 0.964 0.970 0.964 0.982 0.982 0.984 0.986
MAE 0.443 0.443 0.391 0.433 0.406 0.406 0.390 0.345
RMSE 0.603 0.603 0.553 0.610 0.556 0.556 0.549 0.492
RAE (%) 21.786 21.786 19.224 21.277 17.686 17.686 16.977 15.008
RRSE (%) 26.310 26.310 24.127 26.607 21.002 21.002 20.750 18.574
Islamabad station
CCP 0.981 0.974 0.981 0.976 0.987 0.981 0.984 0.981
MAE 0.172 0.231 0.185 0.227 0.228 0.276 0.262 0.276
RMSE 0.335 0.382 0.337 0.386 0.345 0.394 0.378 0.398
RAE (%) 11.500 15.440 12.361 15.147 14.465 17.546 16.628 17.528
RRSE (%) 18.918 21.583 19.043 21.789 18.482 21.087 20.251 21.306
Jacobabad station
CCP 0.943 0.931 0.946 0.930 0.942 0.944 0.949 0.941
MAE 0.573 0.642 0.573 0.635 0.697 0.637 0.608 0.637
RMSE 0.741 0.812 0.722 0.819 0.889 0.817 0.812 0.814
RAE (%) 29.496 33.048 29.532 32.691 38.190 34.89 33.330 34.884
RRSE (%) 33.174 36.371 32.312 36.662 41.091 37.749 37.525 37.613
Multan station
CCP 0.996 0.992 0.996 0.992 0.993 0.989 0.995 0.989
MAE 0.153 0.229 0.165 0.228 0.264 0.346 0.232 0.336
RMSE 0.205 0.293 0.211 0.295 0.341 0.450 0.303 0.438
RAE (%) 7.306 10.900 7.898 10.87 11.230 14.672 9.856 14.288
RRSE (%) 8.687 12.415 8.923 12.462 12.710 16.757 11.311 16.310
Skardu station
CCP 0.997 0.983 0.996 0.983 0.997 0.985 0.998 0.985
MAE 0.112 0.297 0.116 0.29 0.111 0.316 0.100 0.323
RMSE 0.164 0.380 0.167 0.389 0.156 0.392 0.135 0.411
RAE (%) 5.934 15.626 6.122 15.239 5.689 16.160 5.137 16.499
RRSE (%) 7.769 18.004 7.910 18.394 7.190 18.019 6.223 18.906
Abbreviations: CCP, Pearson correlation coefficient; ETo, reference evapotranspiration; M5P, M5 pruned; MAE, mean absolute error; MLR, multilinear
regression; RAE, relative absolute error; RBFNreg, radial basis function neural regression; RMSE, root mean squared error; RRSE, root relative squared error;
SMO, sequential minimal optimization.
be explained by the lower correlations between the meteorological parameters (inputs) and ETo at this station compared with the other stations (see Table 3).

FIGURE 5 Scatter plots for developed models during the test phase for the first (S1, top) and second (S2, bottom) scenarios. ETo, reference evapotranspiration; MLR, multilinear regression; M5P, M5 pruned; RBF, radial basis function; SMO, sequential minimal optimization.

The second scenario consists of reducing the number of inputs required for modelling ETo. The input reduction is based on PCA, as shown in Section 3.1. The results of PCA show that strong correlations exist between ETo and Tmax, Tmin and the hours of sunshine (n). At Multan station, however, Tmax, Tmin and Ux appear to have a greater influence on ETo. In the second scenario, model performance decreased slightly compared with the first scenario, but the results were satisfactory and acceptable. At Jacobabad station, the best performance was obtained with the RBFNreg model, whose correlation coefficient (r) exceeded 0.90, with values of 0.940 in the training phase and 0.937 in the evaluation phase. As mentioned above, all models performed well, with the lowest evaluation-phase result being 0.923 (MLR and SMO models). At Multan station, the best model performance was obtained with the M5P tree, with a correlation coefficient of 0.962 in the training phase and 0.957 in the evaluation phase, followed by the RBFNreg model with 0.952 in the training phase and 0.954 in the
TABLE 5 Performance metrics of developed models in ETo prediction for the second scenario (MAE and RMSE in mm/day).
Train Test
M5P MLR RBFNreg SMO M5P MLR RBFNreg SMO
Faisalabad station
CCP 0.960 0.928 0.946 0.929 0.955 0.910 0.943 0.913
MAE 0.441 0.672 0.553 0.666 0.716 0.965 0.801 0.966
RMSE 0.635 0.849 0.742 0.857 1.002 1.217 1.057 1.232
RAE (%) 21.651 33.021 27.184 32.701 31.16 41.975 34.839 42.052
RRSE (%) 27.698 37.027 32.331 37.364 37.850 45.965 39.941 46.538
Islamabad station
CCP 0.951 0.886 0.921 0.885 0.924 0.873 0.912 0.872
MAE 0.417 0.681 0.544 0.673 0.554 0.738 0.623 0.744
RMSE 0.548 0.821 0.686 0.829 0.737 0.922 0.783 0.930
RAE (%) 27.878 45.460 36.312 44.915 35.195 46.855 39.517 47.235
RRSE (%) 30.97 46.358 38.74 46.813 39.390 49.302 41.862 49.73
Jacobabad station
CCP 0.939 0.923 0.940 0.923 0.932 0.923 0.937 0.923
MAE 0.590 0.665 0.597 0.662 0.812 0.821 0.767 0.795
RMSE 0.764 0.859 0.757 0.864 0.975 0.992 0.941 0.960
RAE (%) 30.367 34.266 30.738 34.089 44.484 44.980 42.026 43.566
RRSE (%) 34.184 38.450 33.872 38.673 45.068 45.847 43.481 44.359
Multan station
CCP 0.962 0.929 0.952 0.929 0.957 0.929 0.954 0.928
MAE 0.497 0.704 0.589 0.693 0.742 0.873 0.773 0.857
RMSE 0.645 0.871 0.720 0.884 0.960 1.073 0.960 1.078
RAE (%) 23.661 33.517 28.068 32.990 31.467 37.050 32.809 36.383
RRSE (%) 27.272 36.794 30.432 37.370 35.736 39.950 35.752 40.150
Skardu station
CCP 0.967 0.927 0.950 0.926 0.977 0.928 0.961 0.93
MAE 0.388 0.617 0.472 0.608 0.340 0.641 0.433 0.632
RMSE 0.534 0.793 0.656 0.804 0.474 0.809 0.606 0.812
RAE (%) 20.403 32.430 24.821 31.945 17.368 32.745 22.143 32.277
RRSE (%) 25.286 37.507 31.030 38.030 21.785 37.163 27.840 37.319
Abbreviations: CCP, Pearson correlation coefficient; ETo, reference evapotranspiration; M5P, M5 pruned; MAE, mean absolute error; MLR, multilinear
regression; RAE, relative absolute error; RBFNreg, radial basis function neural regression; RMSE, root mean squared error; RRSE, root relative squared error;
SMO, sequential minimal optimization.
evaluation phase. SMO gave the lowest performance at this station, with r values of 0.929 in the training phase and 0.928 in the evaluation phase. The M5P tree model had the best performance at the Faisalabad, Islamabad and Skardu stations, with r values of 0.960, 0.951 and 0.967, respectively, in the training phase and 0.955, 0.924 and 0.977, respectively, in the evaluation phase. The RBFNreg model had the second-best performance after the M5P tree model. It is worth mentioning that the lowest r value in the evaluation phase (0.872) was obtained with the SMO model at Islamabad station; this is approximately 11% lower than the best value (0.977), recorded with the M5P tree model at Skardu station. This result indicates that all models perform acceptably with the reduced set of inputs used in the second scenario. Table 5 shows the statistical performance indicators for modelling ETo with the reduced set of inputs used in the second scenario. Figure 5 presents the scatter plots of the observed (x-axis) and predicted (y-axis) ETo values for the created models, and shows that the RBFNreg and M5P tree points are distributed close to the 1:1 line.

It is worth noting that the apparent influence of the input parameters on ETo differs between the correlation analysis and PCA. Table 3 shows that Tmin, Tmax, Ux and n are effective parameters for ETo estimation, whereas according to PCA (see Figure 4), Tmin, Tmax and n are the effective parameters. This shows the value of PCA in determining the most effective input variables. The violin diagrams of both scenarios (S1 and S2) are presented in Figure 6; the M5P tree model performed well in comparison with the other models. Additionally, heatmaps of the relationships between the explanatory and response variables at the selected stations are presented in Figure 7, which shows that the climatic variables Tmin, Tmax and n were effective for ETo estimation.

FIGURE 6 Violin plots of the first (S1, left side) and second (S2, right side) scenarios. ETo, reference evapotranspiration; MLR, multilinear regression; M5P, M5 pruned; RBF, radial basis function; SMO, sequential minimal optimization.

FIGURE 7 Heat maps between explanatory and response variables for selected stations. ETo, reference evapotranspiration; n, sunshine hours; RHavg, average relative humidity; Tmax, maximum temperature; Tmin, minimum temperature; Ux, average wind speed.

3.3 | Uncertainty evaluation of results

Figure 8 depicts boxplots that analyse the uncertainty in ETo estimation. The boxplots show the first, second and third quartiles of the estimates from each model as well as of the observed ETo. In addition, according to Kouadri et al. (2021), the margin of deviation (MoD) is one of the methods used to evaluate the performance of an ML model in estimation, as shown in Figure 9. Calculating the MoD allows the predicted ETo values to be evaluated according to the degree of error, which supports the analysis of model performance. The deviation between the measured and predicted ETo values for the suggested models was determined using the equation below.

$$
\mathrm{MoD} = \frac{Y - Y_i}{Y} \times 100 \qquad (12)
$$

where MoD is the margin of deviation (%), Y is the measured ETo value, and Yi is the predicted ETo value.

It is clear from Figures 8 and 9 that the M5P tree is the best predictive model for ETo estimation, followed by the RBFNreg model. In addition, the fluctuations of the SMO and MLR predictive models were found to be far from the range of the observed ETo. Hence, it could be concluded that the M5P tree and
RBFNreg are more suitable for predicting ETo in different climate conditions.

FIGURE 8 Boxplots of the first (S1) and second (S2) scenarios. ETo, reference evapotranspiration; MLR, multilinear regression; M5P, M5 pruned; RBF, radial basis function; SMO, sequential minimal optimization.

FIGURE 9 Margins of deviation for the first and second scenarios. MLR, multilinear regression; M5P, M5 pruned; RBF, radial basis function; SMO, sequential minimal optimization.

3.4 | Comparison between ML and FAO-PM56 methods

The high probability of error in weather data monitoring and recording, mainly in developing countries (e.g. Pakistan) and at meteorological stations run by non-experts, is one of the most compelling arguments in favour of a simpler technique than FAO-PM56. In certain scenarios, data precision and quality metrics may be unreliable (Droogers & Allen, 2002). The studies reviewed below support the statistical index-based findings of our investigation.

Wu et al. (2019) studied the ability of various ML models to estimate ETo using climatic data from local and cross stations. ML-based models demonstrated superior estimation accuracy based on statistical indices (R² = 0.962 and RMSE = 0.263 mm/day), and SVM and tree-based ML models were considered the best approaches for ETo modelling. Similarly, Mohammadrezapour et al. (2019) used SVM, GEP and an adaptive neuro-fuzzy inference system (ANFIS) to estimate ETo with various combinations of climatic inputs. Among the chosen ML models, SVM performed best, with R² = 0.999 and RMSE = 0.434 mm/month. Saggi & Jain (2019) analysed four ML models, namely deep learning (DL), the generalized linear model, the gradient boosting machine and TensorFlow (TF), for modelling daily ETo at Indian stations. The DL model outperformed the other models, with the highest Nash–Sutcliffe efficiency (NSE = 0.980) and the lowest RMSE (0.190 mm/day). Shiri et al. (2019) compared the performance of GEP with locally and externally calibrated PT models for ETo estimation. GEP outperformed the PT models (RMSE = 0.462 mm/day; MAE = 0.216 mm/day) and provided the best alternative to the FAO-PM56 approach for ETo modelling with two meteorological inputs at humid and desert stations of Iran. The acquired ML findings were also compared with Valiantzas' empirical equation: ML combined with grey wolf optimization (ML GWO) outperformed the empirical equation, as determined by the indices NSE = 0.990 and RMSE = 0.050–0.040 mm/month at both sites.

According to Ferreira et al. (2019), most empirical equations are site-specific or were developed for a limited range of climatic conditions, which restricts their worldwide applicability. Consequently, the authors used various ML models for ETo modelling across the entire Brazilian region with limited climate data. In the absence of meteorological data, the authors suggested that ML is the best option. In addition, the study determined that removing RHavg data from the analysis reduced the RMSE by up to 24%. Granata (2019) found ETo ML models (SVM, DT, TB and TF) to be better than the conventional FAO-PM56 method by analysing limited climate data from Florida's humid region. Keshtegar et al. (2019) created a polynomial chaos expansion (PCE) ML model for ETo modelling with restricted meteorological data at two Turkish stations. The findings revealed that the PCE ML
model outperformed competing methods, delivering the highest NSE (0.999), the lowest RMSE (0.045 mm) and the highest agreement index (0.999). At a broader scale, Nourani et al. (2019) compared ML and empirical models for ETo modelling in several climatic locations (e.g. Turkey, Iraq, Cyprus, Iran and Libya), and the ML models outperformed the empirical models. To enhance the ETo estimates, two ensemble models based on ML and empirical equations were also built and compared with the single ML models and empirical equations. For cases with limited climate data, the study suggested using ML models for ETo modelling. Similarly, Shiri et al. (2019) compared the ML-GEP model against six empirical equations for estimating daily ETo using island climate data
from Iran. The findings indicated that the GEP model was superior to the selected empirical models at the test stations.

Kisi et al. (2015) examined the ability of four ML models to predict monthly ETo in Iran using limited climatic data. The ML-GEP model used in that work exhibited excellent performance with limited input data. Likewise, ML-based models excelled and produced more accurate estimates of ETo, consistent with our results. Owing to the limitations of meteorological data, it is advisable to employ ML applications in lieu of empirical and locally calibrated models. In addition, the overestimation and underestimation of ETo values by ML models depend on correct calibration during the training phase: more training data result in an underestimation of ETo, whereas less training data result in an overestimation. Implementing ML models with limited data requires good training, and ML-based ETo models can be applied under various climatic conditions for validation and verification. To validate the efficacy of the generated ML ETo models, this work examined them under various meteorological conditions.

Table 6 compares the data requirements of the FAO-PM56 and ML models for estimating ETo. It demonstrates that FAO-PM56 depends on various parameters that are difficult to obtain, particularly in developing regions. Unlike the FAO-PM56 method, the ML models use fewer parameters (Tmax, Tmin and n) and generate reliable ETo values. The symbol '■' in Table 6 denotes the parameters required for ETo estimation, whereas '□' indicates parameters not used in the corresponding method.

Significant efforts have been made to develop ML models that reliably predict ETo from a few input variables. Zhu et al. (2020) attempted to create an ML model using solely temperature data and compared the model's ETo predictions with several empirical equations. The study examined which ML model produced the least error during testing and had the highest correlation (R² = 89%) between the estimated and actual ETo. As demonstrated above, earlier studies have generally applied their methods to estimate ETo at several sites. Even though researchers created robust ML models with high accuracy, the approaches mentioned could not be used to create a generic model; instead, each case study was estimated separately. In contrast, using climatic data from different meteorological locations in our analysis, the recommended ML model successfully calculated ETo.

3.5 | ETo interpolation maps based on best ML model

ArcMap GIS 10.1 software was used to create ETo variation maps for the climatic stations under study. Inverse distance weighted (IDW) interpolation was used to develop a surface raster map based on the M5P tree output. Figure 10 presents the ETo variation at the Faisalabad, Islamabad, Jacobabad, Multan and Skardu climatic stations. The lowest ETo values were 0.950 mm at Faisalabad, 0.800 mm at Islamabad, 1.370 mm at Jacobabad, 1.100 mm at Multan and 0.200 mm at Skardu, while the highest ETo values were 10.960 mm at Faisalabad, 8.34 mm at Islamabad, 11.020 mm at Jacobabad, 10.290 mm at Multan and 4.620 mm at Skardu. In Figure 10, red indicates high ETo, yellow indicates moderate ETo, and green indicates low ETo. The climatic stations in the arid region (Jacobabad and Multan) show the highest ETo variation, whereas the climatic station in the humid region (Skardu) shows the lowest. The climatic stations in the semi-arid region (Faisalabad and Islamabad) show intermediate ETo variation. Temperature in the arid region rises in the peak months (June, July and August) owing to more sunshine hours, so ETo was highest in this region. In contrast, rainfall in the humid region cooled the air and reduced sunshine hours, so ETo was lowest there. These ETo maps are based on effective climatic variables and present realistic scenarios that can be used to estimate agricultural water needs and maximize yields while conserving water.
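The IDW step can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the ArcMap implementation: it assumes planar (projected) station coordinates and a power parameter of 2, which is ArcMap's default (the study does not report the value it used). The station coordinates and ETo values in the usage note are hypothetical.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (xs, ys, value) station tuples."""
    num = den = 0.0
    for xs, ys, value in stations:
        d = math.hypot(xs - x, ys - y)
        if d == 0.0:
            return value              # the grid cell coincides with a station
        w = d ** -power               # closer stations receive larger weights
        num += w * value
        den += w
    return num / den


def idw_grid(stations, xmin, xmax, ymin, ymax, nx, ny, power=2.0):
    """Raster of IDW estimates on an nx-by-ny grid (row j at ymin + j*dy); nx, ny >= 2."""
    dx = (xmax - xmin) / (nx - 1)
    dy = (ymax - ymin) / (ny - 1)
    return [[idw(stations, xmin + i * dx, ymin + j * dy, power) for i in range(nx)]
            for j in range(ny)]
```

With two hypothetical stations at (0, 0) and (2, 0) carrying ETo values of 2.0 and 4.0 mm/day, `idw_grid(stations, 0, 2, 0, 1, 3, 2)[0]` returns `[2.0, 3.0, 4.0]`: cells on a station take its value, and the midpoint, equidistant from both, takes their mean.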
TABLE 6 Data requirements for FAO-PM56 and ML models in ETo estimation.
Adopted methodology Tmin Tmax RH U n Rn Aerodynamic factors* Target result
FAO-PM56 ■ ■ ■ ■ ■ ■ ■ PM ETo
ML models ■ ■ □ □ ■ □ □ ML ETo
*Aerodynamic factors: Rn, es, ea, emin, emax, Δ, Z and γ. All input columns are climatic and aerodynamic parameters.
Abbreviations: Δ, slope of the saturation vapour pressure curve; γ, psychrometric constant; ea, actual vapour pressure; es, saturation vapour pressure; ETo, reference evapotranspiration; FAO-PM56, Penman–Monteith equation of the Food and Agriculture Organization; ML, machine learning; n, sunshine hours; RH, relative humidity; Rn, net radiation; Tmax, maximum temperature; Tmin, minimum temperature; emin, minimum vapour pressure; emax, maximum vapour pressure; Z, height of installed instrument; U, wind speed.
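To show concretely why the FAO-PM56 row of Table 6 needs more inputs than the ML row, the sketch below evaluates the daily FAO-56 Penman–Monteith equation. It is a minimal sketch under stated assumptions: net radiation Rn is taken as a known input (in practice it is usually derived from sunshine hours n, latitude and day of year), wind speed is assumed to be measured at 2 m, actual vapour pressure is derived from mean relative humidity, and soil heat flux G defaults to zero at the daily time step. The function names and example values are illustrative.

```python
import math


def sat_vapour_pressure(t_c):
    """Saturation vapour pressure e°(T) in kPa at air temperature t_c (°C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def fao56_pm_daily(tmax, tmin, rh_mean, u2, rn, z, g=0.0):
    """Daily FAO-56 Penman-Monteith reference evapotranspiration (mm/day).

    tmax, tmin : daily maximum/minimum air temperature (°C)
    rh_mean    : mean relative humidity (%)
    u2         : wind speed at 2 m height (m/s)
    rn         : net radiation (MJ m-2 day-1), assumed already known
    z          : station elevation above sea level (m)
    g          : soil heat flux (MJ m-2 day-1), usually ~0 at the daily step
    """
    t_mean = (tmax + tmin) / 2.0
    es = (sat_vapour_pressure(tmax) + sat_vapour_pressure(tmin)) / 2.0     # saturation vapour pressure
    ea = rh_mean / 100.0 * es                                             # actual vapour pressure
    delta = 4098.0 * sat_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2  # slope of e°(T) curve
    pressure = 101.3 * ((293.0 - 0.0065 * z) / 293.0) ** 5.26             # atmospheric pressure (kPa)
    gamma = 0.000665 * pressure                                           # psychrometric constant
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))
```

For example, `fao56_pm_daily(30, 20, 60, 2.0, 15.0, 100)` (hypothetical inputs) gives about 5.6 mm/day. Even with Rn supplied directly, the function still needs humidity, wind speed and elevation, which the ML models in Table 6 do without.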
FIGURE 10 Variations in ETo at studied stations (Faisalabad, Islamabad, Jacobabad, Multan and Skardu). ETo, reference evapotranspiration.
4 | CONCLUSION

This study applies four ML methods, namely the M5P tree, SMO, RBFNreg and MLR, to investigate their potential for modelling ETo. A 30-year input dataset (1987–2016) from five stations in hyperarid, semi-arid and humid climatic conditions was divided into training and evaluation sets. To examine the effectiveness of each ML method in estimating ETo, several statistical performance measures were computed. The findings showed that the M5P tree estimated ETo better than the other ML methods under the different climatic environments. Based on the performance statistics obtained with limited climatic inputs, the M5P tree ranked first, followed by the RBFNreg model.

Future work will involve establishing additional ETo models based on hybrid data intelligence (HDI) approaches and extreme learning machines (ELM) that take numerous ecological factors into account. Effective planning, administration and control of water resource systems require extensive and dependable ETo data, which in turn requires gap-filling approaches. Because regional models play a vital role in developing local models in areas with sufficient data, they should be given special consideration. ETo maps are useful for managing data related to groundwater and surface water resource planning and management, as well as other water-related topics such as regional water use analysis, water allocation, water consumption and water rights.

ACKNOWLEDGMENTS
We are thankful to the editor and anonymous reviewers for their time in improving the quality of this article. This research was supported by the Key R&D Program of Jiangsu Provincial Government (BE2021340), China, the Jiangsu Postdoctoral Science Foundations (2016M600376 and 1601032C), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD-2018-87).

CONFLICT OF INTEREST STATEMENT
The authors declare that no conflict of interest is associated with this research article.

DATA AVAILABILITY STATEMENT
Data are available on reasonable request.

REFERENCES
Abdullah, S.S., Malek, M.A., Abdullah, N.S., Kisi, O. & Yap, K.S. (2015) Extreme learning machines: a new approach for prediction of reference evapotranspiration. Journal of Hydrology, 527, 184–195. Available from: https://doi.org/10.1016/j.jhydrol.2015.04.073
Allen, R.G., Pereira, L.S., Raes, D. & Smith, M. (1998) Crop evapotranspiration: guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56. Rome: FAO, 300(9), 5109.
Bahrami, M., Zarei, A.R., Moghimi, M.M. & Mahmoudi, M.R. (2019) Trend analysis of evapotranspiration applying parametric and non-parametric techniques (case study: arid regions of southern Iran). Sustainable Water Resources Management, 5(4), 1981–1994. Available from: https://doi.org/10.1007/s40899-019-00352-z
Dhillon, R., Rojo, F., Upadhyaya, S.K., Roach, J., Coates, R. & Delwiche, M. (2019) Prediction of plant water status in almond and walnut trees using a continuous leaf monitoring system. Precision Agriculture, 20(4), 723–745. Available from: https://doi.org/10.1007/s11119-018-9607-0
Droogers, P. & Allen, R.G. (2002) Estimating reference evapotranspiration under inaccurate data conditions. Irrigation and Drainage Systems, 16, 33–45. Available from: https://doi.org/10.1023/a:1015508322413
Elbeltagi, A., Raza, A., Hu, Y., al-Ansari, N., Kushwaha, N.L., Srivastava, A., et al. (2022) Data intelligence and hybrid metaheuristic algorithms-based estimation of reference evapotranspiration. Applied Water Science, 12(7), 152. Available from: https://doi.org/10.1007/s13201-022-01667-7
Fan, J., Yue, W., Wu, L., Zhang, F., Cai, H., Wang, X., et al. (2018) Evaluation of SVM, ELM and four tree-based ensemble models for predicting daily reference evapotranspiration using limited meteorological data in different climates of China. Agricultural and Forest Meteorology, 263, 225–241. Available from: https://doi.org/10.1016/j.agrformet.2018.08.019
Ferreira, L.B. & da Cunha, F.F. (2020) New approach to estimate daily reference evapotranspiration based on hourly temperature and relative humidity using machine learning and deep learning. Agricultural Water Management, 234, 106113. Available from: https://doi.org/10.1016/j.agwat.2020.106113
Ferreira, L.B., da Cunha, F.F., de Oliveira, R.A. & Filho, E.I.F. (2019) Estimation of reference evapotranspiration in Brazil with limited meteorological data using ANN and SVM: a new approach. Journal of Hydrology, 572, 556–570. Available from: https://doi.org/10.1016/j.jhydrol.2019.03.028
Gavilan, P., Berengena, J. & Allen, R.G. (2007) Measuring versus estimating net radiation and soil heat flux: impact on Penman–Monteith reference ET estimates in semiarid regions. Agricultural Water Management, 89(3), 275–286. Available from: https://doi.org/10.1016/j.agwat.2007.01.014
Granata, F. (2019) Evapotranspiration evaluation models based on machine learning algorithms: a comparative study. Agricultural Water Management, 217, 303–315. Available from: https://doi.org/10.1016/j.agwat.2019.03.015
Han, Y., Wu, J., Zhai, B., Pan, Y., Huang, G., Wu, L., et al. (2019) Coupling a bat algorithm with xgboost to estimate reference evapotranspiration in the arid and semiarid regions of China. Advances in Meteorology, 2019, 1–16. Available from: https://doi.org/10.1155/2019/9575782
Ibrahim, D. (2016) An overview of soft computing. Procedia Computer Science, 102, 34–38. Available from: https://doi.org/10.1016/j.procs.2016.09.366
Keshtegar, B., Zounemat-Kermani, M. & Kisi, O. (2019) Polynomial chaos expansion and response surface method for nonlinear modelling of reference evapotranspiration. Hydrological Sciences Journal, 64, 720–730. Available from: https://doi.org/10.1080/02626667.2019.1601727
Kişi, O. & Cimen, M. (2009) Evapotranspiration modelling using support vector machines. Hydrological Sciences Journal, 54(5), 918–928. Available from: https://doi.org/10.1623/hysj.54.5.918
Kisi, O., Sanikhani, H., Zounemat-Kermani, M. & Niazi, F. (2015) Long-term monthly evapotranspiration modeling by several data-driven methods without climatic data. Computers and Electronics in Agriculture, 115, 66–77. Available from: https://doi.org/10.1016/j.compag.2015.04.015
Kouadri, S., Kateb, S. & Zegait, R. (2021) Spatial and temporal model for WQI prediction based on back-propagation neural network, application on EL MERK region (Algerian southeast). Journal of the Saudi Society of Agricultural Sciences, 20(5), 324–336. Available from: https://doi.org/10.1016/j.jssas.2021.03.004
Kumar, R., Shankar, V. & Kumar, M. (2011) Modelling of crop reference evapotranspiration: a review. Universal Journal of Environmental Research and Technology, 1(3), 239.
Mattar, M.A. (2018) Using gene expression programming in monthly reference evapotranspiration modeling: a case study in Egypt. Agricultural Water Management, 198, 28–38. Available from: https://doi.org/10.1016/j.agwat.2017.12.017
McMahon, T., Peel, M., Lowe, L., Srikanthan, R. & McVicar, T. (2013) Estimating actual, potential, reference crop and pan evaporation using standard meteorological data: a pragmatic synthesis. Hydrology and Earth System Sciences, 17(4), 1331–1363. Available from: https://doi.org/10.5194/hess-17-1331-2013
Mehdizadeh, S., Behmanesh, J. & Khalili, K. (2017) Using MARS, SVM, GEP and empirical equations for estimation of monthly mean reference evapotranspiration. Computers and Electronics in Agriculture, 139, 103–114. Available from: https://doi.org/10.1016/j.compag.2017.05.002
Mohammadrezapour, O., Piri, J. & Kisi, O. (2019) Comparison of SVM, ANFIS and GEP in modeling monthly potential evapotranspiration in an arid region (Case study: Sistan and Baluchestan Province, Iran). Water Supply, 19(2), 392–403. Available from: https://doi.org/10.2166/ws.2018.084
Nourani, V., Elkiran, G. & Abdullahi, J. (2019) Multi-station artificial intelligence based ensemble modeling of reference evapotranspiration using pan evaporation measurements. Journal of Hydrology, 577, 123958. Available from: https://doi.org/10.1016/j.jhydrol.2019.123958
Nouri, H., Beecham, S., Kazemi, F., Hassanli, A.M. & Anderson, S. (2013) Remote sensing techniques for predicting evapotranspiration from mixed vegetated surfaces. Hydrology and Earth System Sciences Discussions, 10(3), 3897–3925. Available from: https://doi.org/10.5194/hessd-10-3897-2013
Quinlan, J.R. (1992) Learning with continuous classes. 5th Australian Joint Conference on Artificial Intelligence, 92, 343–348.
Rahimikhoob, A. (2010) Estimation of evapotranspiration based on only air temperature data using artificial neural networks for a subtropical climate in Iran. Theoretical and Applied Climatology, 101(1-2), 83–91. Available from: https://doi.org/10.1007/s00704-009-0204-z
Raza, A., Hu, Y., Shoaib, M., Abd Elnabi, M.K., Zubair, M., Nauman, M., et al. (2021) A systematic review on estimation of reference evapotranspiration under PRISMA guidelines. Polish Journal of Environmental Studies, 30, 5413–5422. Available from: https://doi.org/10.15244/pjoes/136348
Raza, A., Shoaib, M., Baig, M.A.I., Ahmad, S., Khan, M.M., Ullah, M.K., et al. (2021) Comparative study of powerful predictive modeling techniques for modeling monthly reference evapotranspiration in various climatic regions. Fresenius Environmental Bulletin, 30(6b), 7490–7513.
Raza, A., Shoaib, M., Khan, A., Baig, F., Faiz, M.A. & Khan, M.M. (2020) Application of non-conventional soft computing approaches for estimation of reference evapotranspiration in various climatic regions. Theoretical and Applied Climatology, 139(3-4), 1459–1477. Available from: https://doi.org/10.1007/s00704-019-03007-3
Saggi, M.K. & Jain, S. (2019) Reference evapotranspiration estimation and modeling of the Punjab northern India using deep learning. Computers and Electronics in Agriculture, 156, 387–398. Available from: https://doi.org/10.1016/j.compag.2018.11.031
Shamshirband, S. & Kamsin, A. (2016) Comparative analysis of reference evapotranspiration equations modelling by extreme learning machine. Computers and Electronics in Agriculture, 127, 56–63. Available from: https://doi.org/10.1016/j.compag.2016.05.017
Shiri, J., Nazemi, A.H., Sadraddini, A.A., Marti, P., Fakheri Fard, A., Kisi, O., et al. (2019) Alternative heuristics equations to the Priestley–Taylor approach: assessing reference evapotranspiration estimation. Theoretical and Applied Climatology, 138(1-2), 831–848. Available from: https://doi.org/10.1007/s00704-019-02852-6
Trajkovic, S. & Kolakovic, S. (2009) Estimating reference evapotranspiration using limited weather data. Journal of Irrigation and Drainage Engineering, 135(4), 443–449. Available from: https://doi.org/10.1061/(ASCE)IR.1943-4774.0000094
Valipour, M., Sefidkouhi, M.A.G., Raeini-Sarjaz, M. & Guzman, S.M. (2019) A hybrid data-driven machine learning technique for evapotranspiration modeling in various climates. Atmosphere (Basel), 10(6), 311. Available from: https://doi.org/10.3390/atmos10060311
Walls, S., Binns, A.D. & Levison, J. (2020) Prediction of actual evapotranspiration by artificial neural network models using data from a Bowen ratio energy balance station. Neural Computing and Applications, 32(17), 14001–14018. Available from: https://doi.org/10.1007/s00521-020-04800-2
Wang, J., Raza, A., Hu, Y., Buttar, N.A., Shoaib, M., Saber, K., et al. (2022) Development of monthly reference evapotranspiration machine learning models and mapping of Pakistan: a comparative study. Water, 14(10), 1666. Available from: https://doi.org/10.3390/w14101666
Wang, S., Fu, Z.-Y., Chen, H., Nie, Y.-P. & Wang, K.-L. (2016) Modeling daily reference ET in the karst area of Northwest Guangxi (China) using gene expression programming (GEP) and artificial neural network (ANN). Theoretical and Applied Climatology, 126(3-4), 493–504. Available from: https://doi.org/10.1007/s00704-015-1602-z
Wen, X., Si, J., He, Z., Wu, J., Shao, H. & Yu, H. (2015) Support-vector-machine-based models for modeling daily reference evapotranspiration with limited climatic data in extreme arid regions. Water Resources Management, 29(9), 3195–3209. Available from: https://doi.org/10.1007/s11269-015-0990-2
Wu, L., Peng, Y., Fan, J. & Wang, Y. (2019) Machine learning models for the estimation of monthly mean daily reference evapotranspiration based on cross-station and synthetic data. Hydrology Research, 50(6), 1730–1750. Available from: https://doi.org/10.2166/nh.2019.060
Yin, Z., Feng, Q., Yang, L., Deo, R.C., Wen, X., Si, J., et al. (2017) Future projection with an extreme-learning machine and support vector regression of reference evapotranspiration in a mountainous inland watershed in north-west China. Water, 9(11), 880. Available from: https://doi.org/10.3390/w9110880
Zhao, L., Xia, J., Xu, C.Y., Wang, Z., Sobkowiak, L. & Long, C. (2013) Evapotranspiration estimation methods in hydrological models. Journal of Geographical Sciences, 23(2), 359–369. Available from: https://doi.org/10.1007/s11442-013-1015-9
Zhu, B., Feng, Y., Gong, D., Jiang, S., Zhao, L. & Cui, N. (2020) Hybrid particle swarm optimization with extreme learning machine for daily reference evapotranspiration prediction from limited climatic data. Computers and Electronics in Agriculture, 173, 105430. Available from: https://doi.org/10.1016/j.compag.2020.105430

How to cite this article: Raza, A., Saber, K., Hu, Y., L. Ray, R., Ziya Kaya, Y., Dehghanisanij, H. et al. (2023) Modelling reference evapotranspiration using principal component analysis and machine learning methods under different climatic environments. Irrigation and Drainage, 1–26. Available from: https://doi.org/10.1002/ird.2838