
Agricultural Water Management 221 (2019) 220–230

Contents lists available at ScienceDirect

Agricultural Water Management


journal homepage: www.elsevier.com/locate/agwat

Generalized reference evapotranspiration models with limited climatic data based on random forest and gene expression programming in Guangxi, China

Sheng Wang a,b, Jinjiao Lian a, Yuzhong Peng b, Baoqing Hu b, Hongsong Chen a,*

a Key Laboratory of Agro-ecological Processes in Subtropical Region, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha, Hunan 410125, China
b Key Laboratory of Environment Change and Resources Use in Beibu Gulf, Nanning Normal University, Ministry of Education, Nanning, Guangxi 530001, China

ARTICLE INFO

Keywords:
Water resources
Climate change impact
Variable importance
Karst region

ABSTRACT

Accurate estimation of reference evapotranspiration (ET0) is very important in hydrological cycle research and is essential in agricultural water management and allocation. The application of the standard model (FAO-56 Penman-Monteith) to estimate ET0 is restricted by the absence of the required meteorological data. Although many machine learning algorithms have been applied to model ET0 with fewer meteorological variables, most of these models are trained and tested using data from the same station, and their performance outside the training stations has not been evaluated. This study aims to investigate the generalization ability of the random forest (RF) algorithm in modeling ET0 with different input combinations (corresponding to different circumstances of missing data), and compares this algorithm with the gene expression programming (GEP) method using data from 24 weather stations in a karst region of southwest China. The ET0 estimated by the FAO-56 Penman-Monteith model was used as a reference to evaluate the derived RF-based and GEP-based models, and the coefficient of determination (R2), Nash-Sutcliffe coefficient of efficiency (NSCE), root mean squared error (RMSE), and percent bias (PBIAS) were used as evaluation criteria. The results revealed that the derived generalized RF-based ET0 models were successfully applied in modeling ET0 with complete and incomplete meteorological variables (R2, NSCE, RMSE and PBIAS ranged from 0.637 to 0.987, 0.626 to 0.986, 0.107 to 0.563 mm day−1, and −2.916% to 1.571%, respectively), and seven RF-based models corresponding to different incomplete data circumstances are proposed. Generalized GEP-based ET0 models are also proposed, and they produced promising results (R2, NSCE, RMSE and PBIAS ranged from 0.639 to 0.944, 0.636 to 0.942, 0.222 to 0.555 mm day−1, and −3.323% to 1.756%, respectively). Although the RF-based ET0 models performed slightly better than the GEP-based models, the GEP approach gives explicit expressions relating the dependent and independent variables, which is more convenient for irrigators with minimal computer skills. Therefore, we recommend applying the RF-based models in water balance research and the GEP-based models in agricultural irrigation practice. Moreover, model performance decreased over time owing to the impact of climate change on ET0. Finally, both methods can assess the importance of predictors; the order of importance of the meteorological variables for ET0 in Guangxi is: sunshine duration, air temperature, relative humidity, and wind speed.

* Corresponding author at: Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha, Hunan 410125, China.
E-mail address: hbchs@isa.ac.cn (H. Chen).

https://doi.org/10.1016/j.agwat.2019.03.027
Received 19 September 2016; Received in revised form 22 January 2019; Accepted 19 March 2019
Available online 10 May 2019
0378-3774/ © 2019 Elsevier B.V. All rights reserved.

1. Introduction

Evapotranspiration (ET) is an important component of the hydrologic cycle (Traore and Guven, 2013): more than 60% of total global precipitation is dissipated through it (Falamarzi et al., 2014). Accurate observation of actual ET is important for designing irrigation schedules, water resource management, and water allocation (Wang et al., 2015). Although ET can be monitored directly using a lysimeter (Allen et al., 2011) or by energy balance and water vapor mass transfer methods (Shiri et al., 2014b), these measurements are laborious, time consuming, and expensive (Shiri et al., 2014b). Moreover, such measurements are limited in time and space (Falamarzi et al., 2014). Alternatively, ET can be estimated as the reference evapotranspiration (ET0) multiplied by a crop coefficient, which is the most widely used approach and is recommended by the Food and Agriculture Organization (FAO) (Allen et al., 1998; Wang et al., 2015; Rahimikhoob,
2016). The empirical crop coefficient (Kc, defined as the ratio of actual crop ET to the reference crop ET) is determined predominantly by specific crop characteristics and only to a small extent by environmental conditions (Allen et al., 1998). For example, the Kc values of hops were 0.69, 1.02 and 0.85 at the initial, mid-season, and end-season periods, respectively (Fandiño et al., 2015). Kc values for approximately 80 crops are available on the FAO's website. Therefore, precisely calculating ET0 is essential for accurately estimating ET (Rahimikhoob, 2016). ET0 is the rate of ET from a hypothetical grass reference crop (with adequate water supply, albedo = 0.23, height = 0.12 m, and surface resistance = 70 s/m), and represents the maximum atmospheric evaporative power at a given time and location, regardless of crop type and soil characteristics (Allen et al., 1998; Shiri et al., 2012; Feng et al., 2016). Allen et al. (1998) stressed that ET0 is influenced only by meteorological factors. Therefore, many methods have been proposed to estimate ET0 from climatic data.

Among these methods, the FAO-56 Penman-Monteith (FAO-PM) equation is a physically based approach grounded in the theories of aerodynamics and energy balance. It has been recommended as the standard method by the FAO and is used to calibrate other ET0 methods (Allen et al., 1998; Shiri et al., 2012). The method has two important advantages: (1) because of its theoretical basis, it can be applied in different geographic and climatic zones without local calibration, and its results have been shown to agree better with observations than those of other methods; and (2) it has been validated using lysimeters worldwide (Kumar et al., 2008; Shiri et al., 2012; Wang et al., 2015). The main drawback of the FAO-PM is that it requires a full set of high-quality meteorological data, including air temperature, relative humidity, solar radiation, and wind speed (Kim and Kim, 2008a; Kumar et al., 2009). Furthermore, the computation procedure is complicated for irrigation technicians, who typically are not sophisticated computer users (Traore and Guven, 2013). However, weather stations that satisfy these observation requirements are limited, especially in developing countries (Wang et al., 2015; Shiri et al., 2014b, 2012). Air temperature sensors are available at most weather stations worldwide, whereas sensors for other meteorological factors are found at relatively few stations, and the quality of their data is not always reliable (Shiri et al., 2012; Droogers and Allen, 2002). Therefore, there is a need to develop simpler ET0 models that use fewer meteorological variables and have adequate precision. Empirical ET0 models that require less climatic data have also been widely used as substitutes for FAO-PM. The Hargreaves–Samani equation is superior to others because it requires only the maximum and minimum air temperatures (Hargreaves, 1982) and provides the most accurate global average performance (Almorox et al., 2015). It was therefore employed in this study.

ET0 can be regarded as a function of several meteorological variables. With advances in computational resources and the emergence of big data, several machine learning techniques have been successfully applied to estimate ET0, such as artificial neural networks (ANN) (Kumar et al., 2002; Kim and Kim, 2008b; Shiri et al., 2014a), genetic programming (GP) (Izadifar and Elshorbagy, 2010; Kisi and Guven, 2010), support vector machine (SVM) (Tabari et al., 2012), adaptive neuro-fuzzy inference system (ANFIS) (Shiri et al., 2012; Tabari et al., 2012), extreme learning machine (ELM) (Feng et al., 2016; Abdullah et al., 2015), and gene expression programming (GEP) (Shiri et al., 2012, 2014c; Wang et al., 2015). Unlike other machine learning approaches, which produce black-box models, GEP can provide explicit expressions relating the dependent and independent variables, a powerful advantage for practical application and transferability (Traore and Guven, 2013; Shiri et al., 2014c). Thus, it was applied in this study.

At present, the major deficiency in research on machine learning-based ET0 estimation models is that these models are trained and tested using climatic data from the same station, and their applicability is not validated beyond the training stations. Therefore, although the proposed models have adequate accuracy, as has been reported (Abdullah et al., 2015; Kim and Kim, 2008b; Kumar et al., 2008; Traore and Guven, 2011), they may be useful only at the training stations, and their effectiveness elsewhere is doubtful. Moreover, it is impractical to develop ET0 models for every location. One effective approach is to develop generalized ET0 models using fewer meteorological variables, but few studies have considered this (Shiri et al., 2014c; Kisi, 2016). Beyond this, no research has tested whether machine learning-based ET0 models are applicable in the context of climate change, which has an important impact on water resource management (Allen et al., 1998; Wang et al., 2015).

The random forest (RF) method, an ensemble learning method for classification and regression, has become popular in recent years because of its robust performance across a wide range of datasets, high prediction accuracy, small number of user-defined parameters, and resistance to overfitting (Jing et al., 2015). It can also estimate the relative importance of variables. Fernández-Delgado et al. (2014) conducted an exhaustive evaluation of 179 classifiers from 17 families (discriminant analysis, Bayesian, ANN, SVM, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests, generalized linear models, nearest neighbors, partial least squares, principal component regression, logistic and multinomial regression, and multiple adaptive regression splines) over 121 datasets, and concluded that RF delivers the best overall performance. RF has been successfully applied in many fields.

Guangxi, located in southwest China, has one of the largest continuous karst landforms in the world. Although this region has a subtropical, mountainous, monsoon climate with large annual precipitation (more than 1200 mm), the karst habitat is deficient in water resources for vegetation growth because of its numerous fissures, gaps, channels, and sinkholes. Thus, these karst systems have an ineffective water storage capacity (Chen et al., 2009, 2010). Furthermore, droughts occur more frequently. Liu et al. (2014) found that southwestern China has generally become drier in relation to global climate change, and that regional mean annual precipitation has decreased by 11.4 mm per decade. Accurate ET0 estimation is therefore important for agricultural water resource allocation and management in this region.

The objectives of the study are: (1) to demonstrate the applicability of RF and GEP in estimating ET0; (2) to develop and compare the performance of generalized RF-based and GEP-based ET0 estimation models in Guangxi with different meteorological variables as input, and to evaluate the applicability of the models in the context of climate change; and (3) to identify the contribution rank of each climatic factor in ET0 estimation.

2. Materials and methods

2.1. Study area and data collection

Guangxi is located in the Pearl River basin of southwest China between 20°54′–26°23′ N and 104°29′–112°04′ E, and covers approximately 236,700 km2, accounting for 2.47% of China's total territory. The carbonate area accounts for approximately 37.8% of the province. The terrain slopes from northwest to southeast and is hilly and mountainous. The region has a tropical and subtropical humid climate, with an average annual temperature of 17–23 °C and annual precipitation of 1080–2760 mm. Fig. 1 shows the geographical location of the study area and the distribution of the 24 meteorological stations within it. A summary of the stations is listed in Table 1.

Daily climatic data from the 24 meteorological stations recorded from 2010 to 2014, comprising maximum and minimum air temperature (°C), wind speed at a height of 2 m (m/s), relative humidity (%), and duration of sunshine (hours), were acquired from the National Climatic Centre of the China Meteorological Administration. Five years of daily meteorological data are sufficient to develop ET0 models
because ET0 exhibits smaller variation than other climate variables such as precipitation (Shiri et al., 2012).

Fig. 1. Experiment area and location of the weather stations.

Based on our previous study (Wu et al., 2017), the 24 stations were divided into two types according to landform: karst (10) and non-karst (14). Six stations were then selected to validate the models at the spatial scale: Hechi, Laibin, and Pingguo from the karst area, and Wuzhou, Mengshan, and Dongxing from the non-karst area. These six stations were chosen because they represent the typical ET0 trends in the region: no significant change (Hechi and Wuzhou), a significant increase (Pingguo and Dongxing), and a significant decrease (Laibin and Mengshan) (Wu et al., 2017). Moreover, because long-term data are recorded at these six stations, the validation can also be used to examine the impact of climate change on model performance. The daily climatic data from the remaining 18 stations for 2010–2013 were used to train the RF and GEP models, and the remaining data (2014) were used for testing.

Fifteen input combinations (Table 2), representing different scenarios of missing meteorological data (Traore and Guven, 2013), were used to establish the RF and GEP models. Air temperature (Ta) is generally available at most meteorological stations, whereas the other parameters are measured at fewer stations and their data quality is not always guaranteed. Therefore, air temperatures (Tmax and Tmin) were included in all input combinations. The 15 input combinations were divided into seven types. Type-1 (RF1) was a temperature-based univariate model. Type-2 (RF2, RF3, and RF4) comprised bivariate models, with Ta and RH in RF2, Ta and u2 in RF3, and Ta and n in RF4. Type-3 (RF5, RF6, and RF7) comprised trivariate models: RF5 used Ta, RH, and u2; RF6 used Ta, u2, and n; and RF7 used Ta, RH, and n. Type-4 (RF8), Type-5 (RF9, RF10, and RF11), and Type-6 (RF12, RF13, and RF14) were formed by adding Ra to Types 1, 2, and 3, respectively. Lastly, Type-7 (RF15) used all the meteorological variables. The input combinations of the GEP models were the same as those of the RF models, as shown in Table 2.
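The fifteen input sets follow a regular pattern: Tmin and Tmax are always present, zero, one, or two of RH, u2 and n are added, Ra is optionally included, and model 15 uses every variable. The short Python sketch below rebuilds that list in the order of Table 2 and assigns each set to its Type. The names `input_combinations`, `model_type`, `OPTIONAL` and `EXTRAS` are illustrative only and are not taken from the study's own code.

```python
# Hypothetical helper that regenerates the 15 input sets of Table 2
# (not the authors' code; the ordering simply mirrors the table).
OPTIONAL = ("RH", "u2", "n")
TEMPERATURE = ("Tmin", "Tmax")

# Extra variables for models 1-7: none, a single one, or a pair (Table 2 order).
EXTRAS = [(), ("RH",), ("u2",), ("n",), ("RH", "u2"), ("u2", "n"), ("RH", "n")]


def input_combinations():
    """Return the 15 input sets in Table 2 order (models 1-15)."""
    sets = [TEMPERATURE + extra for extra in EXTRAS]              # models 1-7
    sets += [TEMPERATURE + ("Ra",) + extra for extra in EXTRAS]   # models 8-14
    sets.append(TEMPERATURE + ("Ra",) + OPTIONAL)                 # model 15
    return sets


def model_type(inputs):
    """Type-1..Type-7: 1 + number of optional variables, plus 3 when Ra is used."""
    k = sum(v in inputs for v in OPTIONAL)
    return 1 + k + (3 if "Ra" in inputs else 0)


if __name__ == "__main__":
    for number, inputs in enumerate(input_combinations(), start=1):
        print(f"{number:2d}  Type-{model_type(inputs)}  {', '.join(inputs)}")
```

The Type rule works because Types 1 to 3 differ only in how many of RH, u2 and n they add, Types 4 to 6 repeat them with Ra, and the full set (three optional variables plus Ra) lands on Type-7.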

Table 1
Locations of weather stations and characterization of their climatic data averages (2010–2014).

Code  Station      Lat    Lon     Ele    P       T     u2   RH    n
1     Rongan       25.22  109.40  121.3  1942.5  20.2  0.9  76.8  1335.0
2     Guilin       25.32  110.30  164.4  1900.0  20.1  1.5  71.6  1372.2
3     Fengshan     24.55  107.03  484.6  1564.0  20.3  0.9  77.4  1203.2
4     Hechi        24.70  108.03  260.2  1342.7  20.4  1.6  73.8  1310.3
5     Duan         23.93  108.10  170.8  1784.8  21.6  2.6  75.8  1380.0
6     Liuzhou      24.35  109.40  96.8   1528.5  21.6  1.1  71.5  1315.5
7     Mengshan     24.20  110.52  145.7  1738.0  20.8  1.2  76.9  1373.1
8     Hezhou       24.42  111.53  108.8  1550.3  20.8  1.4  74.8  1417.3
9     Napo         23.42  105.83  794.1  1353.1  20.2  1.2  79.0  1232.7
10    Baise        23.90  106.60  173.5  1066.8  22.1  1.5  72.5  1543.1
11    Jingxi       23.13  106.42  739.9  1636.3  20.8  0.7  75.1  1371.1
12    Pingguo      23.32  107.58  108.8  1359.0  23.3  0.9  73.7  1439.3
13    Laibin       23.75  109.23  84.9   1360.0  22.0  1.0  73.7  1420.0
14    Guiping      23.40  110.08  42.5   1726.7  22.6  0.9  78.7  1410.4
15    Wuzhou       23.48  111.30  114.8  1503.6  22.0  1.5  79.1  1729.3
16    Longzhou     22.33  106.85  128.8  1260.4  23.2  0.8  78.5  1389.4
17    Nanning      22.63  108.22  121.6  1304.2  22.2  1.2  79.2  1667.1
18    Lingshan     22.42  109.30  66.6   1658.0  22.5  1.5  78.4  1613.9
19    Yulin        22.65  110.17  81.8   1650.0  23.0  1.3  77.1  1493.2
20    Dongxing     21.53  107.97  22.1   2738.0  23.5  1.4  77.6  1437.2
21    Fangcheng    21.78  108.35  32.4   2362.6  22.8  2.0  80.0  1512.5
22    Qinzhou      21.95  108.62  4.5    2150.0  23.6  1.5  77.1  1653.3
23    Beihai       21.45  109.13  12.8   1670.0  23.5  2.1  79.0  1750.0
24    Weizhoudao   21.03  109.10  55.2   1297.0  23.7  2.9  80.2  2133.2

Lat: latitude, °N; Lon: longitude, °E; Ele: elevation, m; P: precipitation, mm/year; T: mean temperature, °C; u2: wind speed at 2 m, m/s; RH: relative humidity, %; n: duration of sunshine, hours/year. Underlining indicates stations located in the karst area.
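The FAO-PM target values (Eq. (1), Section 2.2) can be computed from the daily versions of the variables summarized in Table 1 plus station latitude, elevation and day of year. The Python sketch below follows the FAO-56 daily procedure (Allen et al., 1998), including Ra from latitude and day of year and Rn from sunshine duration, with G = 0. Two choices are assumptions rather than details reported in this paper: actual vapor pressure is derived from mean daily relative humidity, and the FAO-56 default Ångström coefficients (as = 0.25, bs = 0.50) convert sunshine hours to solar radiation. All function names are hypothetical, and the sketch is not intended for polar latitudes.

```python
import math

SOLAR_CONSTANT = 0.0820       # Gsc, MJ m-2 min-1 (FAO-56)
STEFAN_BOLTZMANN = 4.903e-9   # sigma, MJ K-4 m-2 day-1


def sat_vapour_pressure(t):
    """Saturation vapour pressure e°(T) in kPa for air temperature t in °C."""
    return 0.6108 * math.exp(17.27 * t / (t + 237.3))


def extraterrestrial_radiation(lat_deg, doy):
    """Return Ra (MJ m-2 day-1) and daylight hours N from latitude and day of year."""
    phi = math.radians(lat_deg)
    dr = 1 + 0.033 * math.cos(2 * math.pi * doy / 365)        # inverse relative Earth-Sun distance
    decl = 0.409 * math.sin(2 * math.pi * doy / 365 - 1.39)   # solar declination, rad
    x = -math.tan(phi) * math.tan(decl)
    ws = math.acos(max(-1.0, min(1.0, x)))                    # sunset hour angle, rad
    ra = (24 * 60 / math.pi) * SOLAR_CONSTANT * dr * (
        ws * math.sin(phi) * math.sin(decl)
        + math.cos(phi) * math.cos(decl) * math.sin(ws))
    return ra, 24 * ws / math.pi


def fao56_pm_et0(tmax, tmin, rh_mean, u2, sunshine_h, lat_deg, elev_m, doy,
                 a_s=0.25, b_s=0.50):
    """Daily FAO-56 Penman-Monteith ET0 in mm day-1 (G taken as 0)."""
    tmean = (tmax + tmin) / 2
    pressure = 101.3 * ((293 - 0.0065 * elev_m) / 293) ** 5.26   # kPa
    gamma = 0.665e-3 * pressure                                   # psychrometric constant
    es = (sat_vapour_pressure(tmax) + sat_vapour_pressure(tmin)) / 2
    ea = rh_mean / 100 * es            # assumption: only mean daily RH is available
    delta = 4098 * sat_vapour_pressure(tmean) / (tmean + 237.3) ** 2

    ra, daylight = extraterrestrial_radiation(lat_deg, doy)
    rs = (a_s + b_s * sunshine_h / daylight) * ra   # Angstrom formula (default coefficients assumed)
    rso = (0.75 + 2e-5 * elev_m) * ra               # clear-sky solar radiation
    rns = (1 - 0.23) * rs                           # net shortwave, albedo 0.23
    rnl = (STEFAN_BOLTZMANN
           * ((tmax + 273.16) ** 4 + (tmin + 273.16) ** 4) / 2
           * (0.34 - 0.14 * math.sqrt(ea))
           * (1.35 * min(rs / rso, 1.0) - 0.35))    # net outgoing longwave
    rn = rns - rnl
    g = 0.0                                         # soil heat flux at the daily step

    numerator = 0.408 * delta * (rn - g) + gamma * 900 / (tmean + 273) * u2 * (es - ea)
    return numerator / (delta + gamma * (1 + 0.34 * u2))
```

As a rough check, `extraterrestrial_radiation(-20.0, 246)` gives about 32.2 MJ m−2 day−1, the Ra value of FAO-56 Example 8. With the Brussels inputs of FAO-56 Example 18 and mean RH of 73.5%, `fao56_pm_et0` returns roughly 3.8 mm day−1. This is close to the 3.9 mm day−1 in that example, which derives ea from RHmax and RHmin instead of mean RH.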


Table 2
Input set for RF and GEP models.
Model Input set

1 Tmin, Tmax
2 Tmin, Tmax, RH
3 Tmin, Tmax, u2
4 Tmin, Tmax, n
5 Tmin, Tmax, RH, u2
6 Tmin, Tmax, u2, n
7 Tmin, Tmax, RH, n
8 Tmin, Tmax, Ra
9 Tmin, Tmax, Ra, RH
10 Tmin, Tmax, Ra, u2
11 Tmin, Tmax, Ra, n
12 Tmin, Tmax, Ra, RH, u2
13 Tmin, Tmax, Ra, u2, n
14 Tmin, Tmax, Ra, RH, n
15 Tmin, Tmax, Ra, RH, u2, n
Hargreaves Tmin, Tmax, Ra
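The benchmark in the last row of Table 2 uses the same inputs as model 8 (Tmin, Tmax and Ra). Below is a minimal Python sketch of the Hargreaves–Samani equation in the form given in FAO-56, ET0 = 0.0023 (Tmean + 17.8) (Tmax − Tmin)^0.5 Ra, where Ra is expressed as equivalent evaporation (0.408 × Ra in MJ m−2 day−1). The function name is hypothetical. Ra is passed in as an argument; it can be obtained from the standard FAO-56 extraterrestrial radiation formula.

```python
import math


def hargreaves_samani_et0(tmax, tmin, ra_mj):
    """Hargreaves-Samani ET0 (mm day-1) from daily Tmax, Tmin (°C) and Ra (MJ m-2 day-1)."""
    if tmax < tmin:
        raise ValueError("tmax must not be lower than tmin")
    tmean = (tmax + tmin) / 2
    ra_mm = 0.408 * ra_mj  # radiation expressed as equivalent evaporation, mm day-1
    return 0.0023 * (tmean + 17.8) * math.sqrt(tmax - tmin) * ra_mm


if __name__ == "__main__":
    print(round(hargreaves_samani_et0(30.0, 20.0, 35.0), 2))  # 4.45
```

Because the temperature range enters through a square root, the method responds only weakly to humidity and wind. This is consistent with its weaker scores relative to the RF and GEP models that use RH, u2 or n.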
Fig. 2. Number of trees in training stage required to obtain the minimum error.

2.2. FAO-56 Penman-Monteith


ET0 values estimated by the FAO-PM equation were used as the reference target values for developing and evaluating the RF-based and GEP-based models (Kisi, 2016; Wang et al., 2015). The FAO-PM equation is given as:

$$ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T_{mean} + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)} \qquad (1)$$

where ET0 is reference evapotranspiration, mm day−1; Rn is net radiation at the crop surface, MJ/(m2 d); G is soil heat flux density, MJ/(m2 d); Δ is the slope of the saturation vapor pressure curve, kPa/°C; γ is the psychrometric constant, kPa/°C; es is saturation vapor pressure, kPa; ea is actual vapor pressure, kPa; u2 is wind speed at a height of 2 m, m/s; and Tmean is mean daily air temperature at 2 m height, °C (Allen et al., 1998).

In this work, owing to a lack of radiation data, Rn was estimated indirectly following the FAO-56 procedure from the duration of sunshine and the location of the weather stations, as the difference between the net incoming shortwave radiation and the net outgoing longwave radiation; G was taken as 0 at the daily time step. Moreover, Ra, the extraterrestrial radiation, was estimated from the latitude of the site and the day of the year (1 = 1 January, and 365 or 366 = 31 December); it represents the theoretical solar radiation used to estimate Rn (Allen et al., 1998). Tmean was estimated by averaging Tmax and Tmin.

2.3. Hargreaves and Samani equation

The Hargreaves–Samani equation (Hargreaves, 1982) is the simplest method to estimate ET0 and was recommended by Allen et al. (1998) for cases in which only air temperature data are available.

2.4. Theory of Random forest

Random forest (RF) is an ensemble learning method for classification, regression, and clustering. It builds a collection of randomized decision trees and predicts the mode of the classes (classification) or the mean prediction (regression) of the individual trees. The procedure for building an RF model is briefly described as follows:

(a) Draw ntree bootstrap samples from the original data. A bootstrap subset contains approximately 2/3 of the distinct elements of the original dataset.
(b) For each bootstrap subset, build an unpruned regression tree: at each node, rather than choosing the best split among all predictors, randomly sample mtry of the predictors and choose the best split from among those variables.
(c) Predict new data by aggregating the predictions of the ntree trees (majority vote for classification and average for regression).

In this study, the R package randomForest developed by Liaw and Wiener (Liaw and Wiener, 2002; James et al., 2013) was used for RF model training and testing. This package needs only a few tunable parameters: the number of trees in the forest (ntree) and the number of predictors in the random subset at each node (mtry). The default value of mtry (one-third of all predictor variables) was used, as suggested by the creator of the RF algorithm (Breiman, 2001); this default is adequate for generating predictions of desirable accuracy in most cases. ntree was determined iteratively, using the out-of-bag error (mean squared error for regression problems) during parameter optimization to find the minimum error. Fig. 2 shows that the mean squared error decreased with ntree and R2 increased correspondingly. In general, there was a threshold of ntree beyond which increasing the number of trees yielded no significant performance gain and only increased computational cost. Balancing accuracy against computational burden, ntree = 1000 was used in this study.

As a tree-based method, RF makes it convenient to evaluate the importance of variables (in this study, to distinguish the variables that contribute most to the predictive ability for ET0). The importance of a variable was calculated as the reduction in prediction accuracy resulting from randomly permuting its values. The greater the decrease in prediction accuracy, the more important the variable, and vice versa.

2.5. Gene expression programming

Gene expression programming (GEP) combines two popular genetic techniques: genetic programming (GP, using expression tree structures of different sizes and shapes) and genetic algorithms (GA, describing complex relations by simpler, fixed-length, linear structures called chromosomes) (Traore and Guven, 2013). It thus harnesses the advantages of both methods and overcomes their individual constraints. In the GEP algorithm, the solution to the problem being investigated is encoded in chromosomes consisting of one or more genes; when a chromosome contains two or more genes, the genes are combined through a linking function. The first stage of the GEP algorithm is the generation of an initial population. The chromosomes are then expressed as expression trees, and these programs are reproduced as new programs through replication and genetic modification (mutation, transposition, and
recombination of the chromosomes are carried out in this process). This process is repeated until a termination criterion is met (such as a minimum value of the fitness function).

The major steps in solving problems using GEP are as follows (Traore and Guven, 2011; Wang et al., 2015):

(a) Select the fitness function. According to Shiri et al. (2014c), the root mean squared error (RMSE) fitness function is applicable to modeling ET0.
(b) Determine the set of terminals (T) and the set of functions (F) used to create the chromosomes. T consists of the input combinations presented in Table 2; a total of 15 input combinations were used in the study. The function set used here contained 13 basic mathematical functions: +, −, ×, ÷, √x, x², x³, x, ∛x, ln(x), sin(x), cos(x), and tan(x).
(c) Choose the chromosome architecture. The head size was 8, the number of genes was 3, and the number of chromosomes was 30.
(d) Determine the linking function. It can be "addition" or "multiplication" for algebraic subtrees; "addition" was selected here.
(e) Determine the values of the genetic operators. The mutation rate was 0.044, the inversion rate was 0.1, the one-point and two-point recombination rates were 0.3, the gene recombination and gene transposition rates were 0.1, the insertion sequence transposition rate was 0.1, and the root insertion sequence transposition rate was 0.1. "Parsimony pressure" was selected as the penalizing tool.

For each input combination, the program ran until the accuracy of the model no longer improved significantly, yielding the final GEP model. This study used GeneXproTools 5.0, developed by Gepsoft Limited, for this purpose.

2.6. Statistical indicators and input combinations

To evaluate the accuracy of model predictions, four statistical indicators were used:

(a) The coefficient of determination (R2); a value close to 1.0 indicates that most of the total variance of the observed values is explained by the model (Fandiño et al., 2015):

$$R^2 = \frac{\left[\sum_{i=1}^{n} (O_i - \bar{O})(P_i - \bar{P})\right]^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2 \, \sum_{i=1}^{n} (P_i - \bar{P})^2} \qquad (2)$$

where Oi and Pi are pairs of observed and predicted values for a given variable, n is the number of pairs, and Ō and P̄ are the mean values of Oi and Pi, respectively.

(b) The Nash-Sutcliffe coefficient of efficiency (NSCE) proposed by Nash and Sutcliffe (1970), which determines the relative magnitude of the residual variance compared with the variance of the observed data. An NSCE close to 1.0 indicates that the residual variance is much smaller than the observed variance, and thus good model performance (Krause et al., 2005):

$$NSCE = 1.0 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2} \qquad (3)$$

(c) The root mean square error (RMSE):

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (O_i - P_i)^2} \qquad (4)$$

(d) The percent bias (PBIAS), which indicates whether the simulated data tend to be larger or smaller than the corresponding observations. Positive PBIAS values indicate that the model underestimates the target values, and vice versa (Fandiño et al., 2015):

$$PBIAS = 100 \times \frac{\sum_{i=1}^{n} (O_i - P_i)}{\sum_{i=1}^{n} O_i} \qquad (5)$$

3. Results and discussion

3.1. Performance of RF-based models during the testing period

The ET0 values estimated by FAO-PM were taken as the benchmark for evaluating the proposed RF and GEP models during the testing period. The statistical indicators R2, NSCE, RMSE and PBIAS are shown in Table 3. For the RF-based models, R2, NSCE, RMSE and PBIAS ranged from 0.637 to 0.987, 0.626 to 0.986, 0.107 to 0.563 mm day−1, and −2.916% to 1.571%, respectively.

Table 3
Performance of RF-based and GEP-based models during the testing period (RMSE in mm day−1, PBIAS in %).

Model       RF-based                        GEP-based
            R2     NSCE   RMSE   PBIAS      R2     NSCE   RMSE   PBIAS
1           0.637  0.626  0.563  −2.916     0.639  0.636  0.555  −1.798
2           0.787  0.786  0.426  1.152      0.753  0.753  0.458  0.757
3           0.764  0.762  0.449  −1.530     0.730  0.723  0.485  −1.479
4           0.899  0.894  0.300  −2.744     0.897  0.891  0.304  −2.839
5           0.874  0.870  0.332  1.571      0.822  0.812  0.399  1.756
6           0.945  0.942  0.221  −2.020     0.926  0.923  0.255  −1.963
7           0.938  0.938  0.230  −0.372     0.925  0.925  0.253  −0.390
8           0.705  0.698  0.506  −2.790     0.653  0.646  0.547  −3.122
9           0.820  0.818  0.393  1.073      0.757  0.754  0.457  1.360
10          0.803  0.801  0.411  −1.790     0.764  0.759  0.452  −2.692
11          0.906  0.901  0.290  −2.873     0.892  0.885  0.313  −3.323
12          0.886  0.881  0.318  1.419      0.828  0.826  0.384  1.368
13          0.950  0.946  0.213  −2.063     0.939  0.934  0.237  −2.186
14          0.941  0.941  0.224  −0.521     0.932  0.931  0.242  −0.274
15          0.987  0.986  0.107  −0.283     0.944  0.942  0.222  0.348
Hargreaves  0.565  0.528  0.633  −6.494

The presence or absence of critical meteorological factors in the input sets significantly affected the performance of the RF-based models. Type-1 (RF1) used only Tmin and Tmax as inputs and can be considered a temperature-based model. It yielded the worst estimates of all the models (R2 = 0.637, NSCE = 0.626, RMSE = 0.563 mm day−1 and PBIAS = −2.916%). Nevertheless, RF1 was superior to the conventional Hargreaves–Samani model (R2 = 0.565, NSCE = 0.528, RMSE = 0.633 mm day−1 and PBIAS = −6.494%), and could thus be used as an alternative when only Tmin and Tmax are available. RF2, RF3, and RF4 were formed by adding RH, u2, and n, respectively, to the RF1 inputs. Judged by the statistical indicators, RF2 (R2 = 0.787, NSCE = 0.786, RMSE = 0.426 mm day−1 and PBIAS = 1.152%), RF3 (R2 = 0.764, NSCE = 0.762, RMSE = 0.449 mm day−1 and PBIAS = −1.530%), and RF4 (R2 = 0.899, NSCE = 0.894, RMSE = 0.300 mm day−1 and PBIAS = −2.744%) all outperformed RF1. It can be inferred that the order of the factors influencing ET0 in Guangxi was n, RH, and u2. This result is similar to that of Thomas (2000), who found that the duration of sunshine is the major factor controlling evapotranspiration south of 35°N. Yin et al. (2010) also reported that n primarily determines changes in ET0 in the subtropical and tropical regions of China. Type-3 comprised the trivariate models (air temperature and two other meteorological variables), and the input combinations containing n (RF6 and RF7) produced similar accuracy in modeling ET0, with R2 values of 0.945 against 0.938, NSCE values of 0.942 against 0.938, RMSE values of 0.221 against 0.230 mm day−1, and PBIAS values of −2.020% against −0.372%. Moreover, the models including n (RF6 and RF7) outperformed the model without n (RF5), which confirms the importance of n for the estimation of ET0 in Guangxi.
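The four criteria of Eqs. (2)–(5), used above to compare every model against the FAO-PM benchmark, take only a few lines to compute. The following is a plain-Python sketch; the `evaluate` name is illustrative. With the Eq. (5) sign convention, a positive PBIAS means the model underestimates the FAO-PM values.

```python
import math


def evaluate(observed, predicted):
    """Return R2, NSCE, RMSE and PBIAS as defined in Eqs. (2)-(5)."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    o_bar = sum(observed) / n
    p_bar = sum(predicted) / n
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(observed, predicted))
    ss_o = sum((o - o_bar) ** 2 for o in observed)    # spread of observations
    ss_p = sum((p - p_bar) ** 2 for p in predicted)   # spread of predictions
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return {
        # Eq. (2); undefined if either series is constant
        "R2": cov ** 2 / (ss_o * ss_p),
        "NSCE": 1.0 - sse / ss_o,                     # Eq. (3)
        "RMSE": math.sqrt(sse / n),                   # Eq. (4), units of the inputs
        # Eq. (5): positive when the model underestimates on balance
        "PBIAS": 100 * sum(o - p for o, p in zip(observed, predicted)) / sum(observed),
    }
```

For example, `evaluate([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])` gives R2 ≈ 0.982, NSCE = 0.98, RMSE ≈ 0.158 and a PBIAS of essentially zero, because the over- and under-predictions cancel. This illustrates why PBIAS is reported alongside RMSE rather than instead of it.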
By comparing Types 1, 2, and 3 with Types 4, 5, and 6, it can be concluded that adding Ra to the input sets improved model performance (Table 3). Ra represents extraterrestrial radiation and should be considered when estimating ET0 if solar radiation data are unavailable, because it is a computed variable that can easily be obtained. The importance of Ra was also pointed out by Traore and Guven (2013).

RF8 was superior to the conventional Hargreaves–Samani model when using the same input variables, with R2 values of 0.705 against 0.565, NSCE values of 0.698 against 0.528, RMSE values of 0.506 against 0.633 mm day−1, and PBIAS values of −2.790% against −6.494%. Therefore, RF8 can substitute for the Hargreaves–Samani model in Guangxi when only Ta data are available.

For the other circumstances of incomplete data in Guangxi, RF9, 10, 11, 12, 13, and 14 were developed by adding Ra to RF2, 3, 4, 5, 6, and 7, respectively, and yielded more accurate estimates of ET0 (Table 3). Type-7 (RF15) used a full set of meteorological variables, similar to FAO-PM, and generated the best results (the highest R2 and NSCE, and the lowest RMSE and absolute PBIAS). However, the RF15 model cannot replace FAO-PM in a scenario with complete data.

Fig. 3 shows the distribution of the four statistical indicators for the seven RF models at each of the 18 training stations (ordered from high to low latitude). RF15 clearly outperformed the other models on all statistical criteria, as it considered all the variables that influence ET0. Fig. 3 also shows that, although the statistical indicators fluctuated, the models became more effective and robust as more climatic variables were added, which is consistent with Table 3. RF9, RF10, and RF12 had lower R2 and NSCE, and higher RMSE and PBIAS, than RF11, RF13, RF14, and RF15, owing to the absence of n from their inputs; n represents the amount of energy available for evapotranspiration (Droogers and Allen, 2002). Moreover, the models generally performed better at high latitudes (karst region) than at low latitudes (non-karst region) (Fig. 3). This may be because low-latitude areas (mean annual sunshine duration of 1580.2 hours) had a longer total sunshine duration than high-latitude areas (mean annual sunshine duration of 1337.9 hours), and the difference was significant (p = 0.0007). Thus, the deficiency in the duration of sunshine (RF9,

latitude areas, because the duration of sunshine (solar radiation) is positively related to ET0 (Yin et al., 2010). The mean value of n in the high-latitude area was lower than that of the whole area. Thus, the generalized models overestimated ET0, and vice versa. All these results indicate the significant effect of sunshine duration on ET0 modeling. The results in this section demonstrate the strong regression capacity of RF in estimating ET0. Moreover, unlike other studies that targeted individual weather stations, this study evaluates the generalization capability of the proposed RF approach.

3.2. Performance of GEP-based models during the testing period

The performance of the GEP models with different input combinations is presented in Table 3. The values of R2, NSCE, RMSE and PBIAS ranged from 0.639 to 0.944, 0.636 to 0.942, 0.222 to 0.555 mm day−1, and −3.323% to 1.756%, respectively. Therefore, the derived GEP models were able to capture the variation in ET0 in Guangxi, China. Type-1 (GEP1) contained only Tmin and Tmax and produced the poorest results. Type-2 (GEP2, 3, and 4) was formed by introducing RH, u2, and n, respectively, to GEP1, and these models performed better than GEP1. Among these variables, n was the most effective for estimating ET0: adding n to GEP1 (GEP4) significantly improved performance, yielding the largest increases in R2 (40%) and NSCE (40%) and the largest reduction in RMSE (45%), followed by RH and u2, although PBIAS shifted from −1.798% to −2.839%. This is consistent with the analysis in Section 3.1. Our previous study also found that n and RH were the main meteorological factors influencing changes in ET0 (Wu et al., 2017). Type-3 (GEP5, 6, and 7) added u2, n, and RH to the input combinations of GEP2, 3, and 4, respectively, and performed better than Type-2.

By comparing Types 1, 2, and 3 with Types 4, 5, and 6, we see that, as with the RF-based ET0 models, adding Ra (a calculated rather than observed variable, which is easy to obtain) to the input combinations systematically improves model performance (Table 3).

GEP8 contained the same meteorological variables (Tmax, Tmin, and
RF10, and RF12) or variation in it (RF11, RF13, and RF14) caused a Ra) as the conventional Hargreaves equation, and was superior to the
large variation in ET0 in low-latitude areas. Furthermore, from the Hargreaves model with R2 values of 0.653 against 0.565, NSCE values
spatial distribution of PBIAS, we can infer that the models over- of 0.646 against 0.528, RMSE values of 0.546 against 0.633 mm day−1,
estimated ET0 in high-latitude areas, and underestimated ET0 in low- and PBIAS values of −3.122% against −6.494%. Therefore, if only air
temperature is available in Guangxi, China, the GEP8 model can be a

Fig. 3. Statistical indicators of RF models at 18 training stations (from high to low latitude as shown in Table 1).

225
S. Wang, et al. Agricultural Water Management 221 (2019) 220–230

Fig. 4. Statistical indicators of GEP models at 18 training stations (from high to low latitude, shown in as Table 1).

suitable substitute for the conventional Hargreaves method. Its algebraic equation is:

T
ET0 = sin[sin3 (sin3 {sin[sin(Ra) Ra]})] + sin sin2sin3 tan max
Ra
54.485Tmax 13.537Tmin
+ 2
49.289Tmax Tmin (6)

When air temperature and RH are available, and n and u2 are unavailable or contain a great deal of missing data, the GEP9 model can be used (R2 = 0.757, NSCE = 0.754, RMSE = 0.457 mm day−1, and PBIAS = 1.360%). It is expressed as follows:

ET0 = exp3
Tmax 9
+
5.863Tmax 5.863 cos ( RH
7.546 )
33.210 RH
0.151 3 exp( 3 exp( 3 Ra + RH ) ) (7)

If only air temperature and wind speed are available in Guangxi, China, GEP10 can be applied (R2 = 0.764, NSCE = 0.759, RMSE = 0.452 mm day−1, and PBIAS = −2.692%). It can be written as:

ET0 = 4 sin[sin(0.114Tmax + sin( Tmax ))]
4.8Tmax
Tmax 3 u2 (Ra + Tmin )
+
Ra + Tmin (8)

If only air temperature and sunshine duration are available for Guangxi, GEP11 is the best method to estimate ET0 (R2 = 0.892, NSCE = 0.885, RMSE = 0.313 mm day−1, and PBIAS = −3.323%). GEP11 can be expressed as follows:

Ra
ET0 = 3 Tmin 3 2Tmin 3 Tmin + 2.776 + cos 3 n1.5 +
Tmax + Tmin
n3
+ 4
93.003 Tmin (9)

GEP12 can be employed in cases where sunshine duration is missing (R2 = 0.828, NSCE = 0.826, RMSE = 0.384 mm day−1, and PBIAS = 1.368%). The expression is:

ET0 = sin( 3 3 Ra + Tmin (Tmax RH 9.234) )
+ 3 ln(3.221Tmax ) sin( 3 u2 ) + 4 u (Tmax + 2.457) + 3.227 (10)

In circumstances where relative humidity is unavailable or the data quality is unreliable (Traore and Guven, 2011), GEP13 can be used, and it yields adequate accuracy (R2 = 0.939, NSCE = 0.934, RMSE = 0.237 mm day−1, and PBIAS = −2.186%). The derived formula is:

u2 Tmax Tmin
ET0 = 3 ln(Ra ) +
Ra 99.352 2Tmax 2u2
n
+
6.521 + sin( 3 Tmin + u2 ) (11)

If wind speed is unavailable, GEP14 can be used (R2 = 0.932, NSCE = 0.931, RMSE = 0.242 mm day−1, and PBIAS = −0.274%). The expression is:

nRH2
ET0 =
63.908(Tmin Tmax ) 2 + 4.941RH2
Tmin 0.22RH + 14.013 Tmax
+ sin
Ra
Tmin RH + Tmax RH + 12.107RH
+ exp
Tmax Tmin Ra + RH2 + 6.624RH (12)

Table 4
Performance of RF-based and GEP-based models during validation using data collected from 2010 to 2014 at the six weather stations (RMSE in mm day−1; PBIAS in %).
Model  RF-based                          GEP-based
       R2     NSCE   RMSE   PBIAS        R2     NSCE   RMSE   PBIAS
8      0.749  0.748  0.482  −0.787       0.727  0.724  0.504  −1.363
9      0.831  0.830  0.396  −0.400       0.791  0.790  0.440  −0.899
10     0.786  0.782  0.448   1.876       0.763  0.762  0.468   0.850
11     0.896  0.895  0.311   0.469       0.880  0.879  0.334  −0.424
12     0.876  0.866  0.351   2.321       0.844  0.843  0.380   1.149
13     0.910  0.905  0.295   2.111       0.900  0.897  0.308   2.128
14     0.932  0.931  0.253   0.382       0.925  0.925  0.264   0.078
15     0.953  0.948  0.219   2.483       0.921  0.916  0.278   2.331

Ultimately, according to the performance criteria, the GEP15 model (R2 = 0.944, NSCE = 0.942, RMSE = 0.222 mm day−1, and
PBIAS = 0.348%) outperformed the other GEP models because, like FAO-PM, it considered all meteorological variables. Moreover, although FAO-PM performed slightly better than GEP15, GEP15 is more convenient to use: FAO-PM contains a significant number of sophisticated intermediate parameters that need to be estimated (Feng et al., 2016), whereas GEP15 is an explicit expression whose derivatives are easy to obtain, which is convenient for analyzing parameter sensitivity (Yin et al., 2010). The expression of GEP15 is:

(n + Tmin + Ra ) u2 Tmin 3 u2
ET0 = + tan tan
RH RH
(n + 1.941) Tmax
+
RH u2 (13)

Fig. 5. Validation of RF models at six stations from 2010 to 2014.

Fig. 6. Validation of GEP models at six stations from 2010 to 2014.

Fig. 4 shows the distribution of the four statistical parameters of the seven GEP models at each of the 18 training stations (from high to low latitude). As with the RF models, GEP9, GEP10, and GEP12 performed worse than GEP11, GEP13, GEP14, and GEP15 due to the absence of
data on sunshine duration. It is also clear that the GEP models generally performed better in high-latitude areas (karst region) than in low-latitude areas (non-karst region) (Fig. 4), which is consistent with the previous analysis. From the spatial distribution of PBIAS, we can again infer that the GEP models overestimated ET0 in high-latitude areas and underestimated it in low-latitude areas.

3.3. Performance of RF-based and GEP-based models using data from outside the training stations

As mentioned in Section 2.1, the RF and GEP models were further evaluated spatially using climatic data collected in the same period (2010–2014) from six meteorological stations other than the 18 stations used to train the models. As shown in Sections 3.1 and 3.2, Ra improves model performance and is a calculated variable (derived from the location of a station and the day of the year) that does not need to be observed. It can thus be treated as a default variable in the estimation of ET0. We therefore evaluated models RF8–RF15 and GEP8–GEP15. Their performance is shown in Table 4 and Figs. 5 and 6. The results at the six stations followed the same pattern as those at the 18 stations used in training and testing. The best performance indicators at the six stations were obtained by input combinations containing sunshine duration (RF11, RF13, RF14, and RF15, and GEP11, GEP13, GEP14, and GEP15).
Figs. 5 and 6 also show that ET0 at Pingguo was underestimated, and that the RF and GEP models performed slightly worse at Pingguo than at the other stations. This can be explained by the mean value of ET0: ET0 at Pingguo (2.78 mm day−1) was significantly (p < 0.001) higher than at Hechi (2.09 mm day−1), Laibin (2.18 mm day−1), Mengshan (2.25 mm day−1), Wuzhou (2.45 mm day−1), and Dongxing (2.47 mm day−1). Given the spatial variation of climatic variables, more weather stations should be used to develop more robust models.
The RF models were also found to be superior to the GEP models. However, the differences were minimal, and the main purpose of developing the GEP-based ET0 models was to generate easy-to-use mathematical expressions with few meteorological variables for irrigators with minimal computer skills (Traore and Guven, 2011; Shiri et al., 2014c; Yassin et al., 2016).

Table 5
Performance of RF-based and GEP-based models using data collected from 1960 to 2009 at the six validation weather stations (RMSE in mm day−1; PBIAS in %).
Model  RF-based                          GEP-based
       R2     NSCE   RMSE   PBIAS        R2     NSCE   RMSE   PBIAS
8      0.568  0.528  0.619  −8.05        0.529  0.521  0.622  −3.392
9      0.742  0.731  0.466  −1.389       0.732  0.728  0.468  −2.098
10     0.696  0.683  0.506  −3.309       0.694  0.560  0.596  −10.097
11     0.830  0.815  0.386  −4.147       0.850  0.830  0.371  −5.482
12     0.834  0.811  0.391   1.063       0.770  0.739  0.459  −5.159
13     0.914  0.897  0.289  −1.497       0.900  0.866  0.329  −4.621
14     0.902  0.891  0.297  −0.510       0.897  0.892  0.295   2.501
15     0.980  0.975  0.141  −0.266       0.886  0.883  0.307   0.902

Fig. 7. Variation in RF models in Laibin from 1960 to 2009.

3.4. Testing RF-based and GEP-based models under climate change

To evaluate the RF-based and GEP-based ET0 models under climate change, we validated the proposed models using climatic data collected from 1960 to 2009 at the six weather stations, whose data represent typical trends of change in ET0. The performance is shown in Table 5 and Figs. 7 and 8. Table 5 shows that the best model was the one with all climatic variables (the 15th combination), whose results were very close to those of the FAO-56 PM model. For the RF models at Laibin (where ET0 varied significantly) and Hechi (where ET0 showed no significant variation) over 1960–2009, divided into five periods (1960–1969, 1970–1979, 1980–1989, 1990–1999, and 2000–2009), R2 and RMSE in Figs. 7 and 8 clearly show that model performance declined with time. Moreover, the coefficients of variation of R2 and RMSE were larger at Laibin (0.037 and 0.117) than at Hechi (0.018 and 0.077). The GEP models showed the same tendency as the RF models, which indicates that model performance was sensitive to climate change. Climate change refers to a statistically significant change in the mean climatic state over a long period (typically 10 years or longer). Thus, the proposed models can be applied to this period. Based on the results, similar research related to
data-driven models (Kim and Kim, 2008b; Kumar et al., 2008; Tabari et al., 2012; Shiri et al., 2012; Yassin et al., 2016) should be evaluated in the context of climate change.

Fig. 8. Variation in RF models in Hechi from 1960 to 2009.

3.5. Relative importance of climatic factors in estimation of ET0

The analysis of the sensitivity of ET0 to meteorological variables is important for a thorough understanding of the impact of global climate change on evapotranspiration, and for water resource management (Yin et al., 2010). The RF and GEP algorithms can both evaluate the importance of predictors, so the performance of the models with different input combinations can be explained by analyzing the importance of the meteorological factors for ET0. Fig. 9a shows the importance of the variables for ET0 obtained by the RF method (a higher mean decrease in Gini implies greater importance; Breiman, 2001). The order of importance of meteorological variables based on the RF method was: n (43.9%) > Tmax (23.0%) > RH (12.1%) > Tmin (11.5%) > u2 (6.2%) > Ra (3.3%), and the order based on the GEP method (Fig. 9b) was: n (30.4%) > Tmax (4.3%) > RH (35.2%) > u2 (15.9%) > Tmin (14.3%) > Ra (6.9%). It can thus be concluded that the order of importance of climatic factors in Guangxi was: n, T, RH, and u2. This result is consistent with the preceding analysis, in which n was the most important variable for estimating ET0, and RH, u2, and Ra ranked third, fifth, and sixth, respectively. This differs from the findings of other studies conducted in different climate zones (Yin et al., 2010; Yassin et al., 2016).

Fig. 9. Rank of importance of meteorological variables.

4. Conclusions

The RF algorithm has many merits, including the ability to model complicated nonlinear systems; however, it has rarely been applied in hydrological research. This study investigated the applicability and generalization ability of RF in modeling ET0 in Guangxi with different input combinations (corresponding to different circumstances of missing data), and compared it with the GEP method. The following conclusions can be drawn:

(1) The derived RF-based generalized ET0 models were successfully applied in modeling ET0 with complete and incomplete meteorological variables. In general, the more variables a model uses, the more likely it is to perform well. Moreover, we proposed seven RF-based models corresponding to different circumstances of incomplete data. Specifically, RF8 can be used as an alternative to the conventional Hargreaves–Samani model since it
performs better than the latter.
(2) GEP-based generalized ET0 models were also proposed, and they produced promising results. Although the RF-based ET0 models performed slightly better than the GEP-based models, they are black-box models. The GEP approach gives explicit expressions relating the dependent and independent variables, which makes it more convenient than the RF-based models for irrigators with minimal computer skills. Therefore, we recommend applying the RF-based models in water balance research and the GEP-based models in agricultural irrigation practice.
(3) The spatial evaluation showed that the RF and GEP models generally performed better in high-latitude areas (karst region) than in low-latitude areas (non-karst region), and that both overestimated ET0 in high-latitude areas and underestimated it in low-latitude areas, owing to the difference in sunshine duration between the two regions. Moreover, the performance of the developed models declined over time.
(4) Both the RF and GEP algorithms can evaluate the importance of predictors; the order of importance of meteorological variables for ET0 in Guangxi is: sunshine duration, air temperature, relative humidity, and wind speed.

Acknowledgements

This study was financially supported by the Guangxi Natural Science Foundation (2018GXNSFBA281136 and 2018GXNSFGA281003) and the National Natural Science Foundation of China (41807012). We would also like to thank the two anonymous reviewers for their thoughtful and constructive comments on the manuscript.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.agwat.2019.03.027.

References

Abdullah, S.S., Malek, M.A., Abdullah, N.S., Kisi, O., Yap, K.S., 2015. Extreme learning machines: a new approach for prediction of reference evapotranspiration. J. Hydrol. 527, 184–195.
Allen, R.G., Pereira, L.S., Raes, D., Smith, M., 1998. Crop evapotranspiration: guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper No. 56.
Allen, R.G., Pereira, L.S., Howell, T.A., Jensen, M.E., 2011. Evapotranspiration information reporting: I. Factors governing measurement accuracy. Agric. Water Manage. 98, 899–920. https://doi.org/10.1016/j.agwat.2010.12.015.
Almorox, J., Quej, V.H., Marti, P., 2015. Global performance ranking of temperature-based approaches for evapotranspiration estimation considering Köppen climate classes. J. Hydrol. 528, 514–522. https://doi.org/10.1016/j.jhydrol.2015.06.057.
Breiman, L., 2001. Random Forests. Machine Learning.
Chen, X., Zhang, Z., Chen, X., Shi, P., 2009. The impact of land use and land cover changes on soil moisture and hydraulic conductivity along the karst hillslopes of southwest China. Environ. Earth Sci. 59, 811–820.
Chen, H., Zhang, W., Wang, K., Fu, W., 2010. Soil moisture dynamics under different land uses on karst hillslope in northwest Guangxi, China. Environ. Earth Sci. 61, 1105–1111.
Droogers, P., Allen, R.G., 2002. Estimating reference evapotranspiration under inaccurate data conditions. Irrig. Drain. Syst. 16, 33–45.
Falamarzi, Y., Palizdan, N., Huang, Y.F., Lee, T.S., 2014. Estimating evapotranspiration from temperature and wind speed data using artificial and wavelet neural networks (WNNs). Agric. Water Manage. 140, 26–36. https://doi.org/10.1016/j.agwat.2014.03.014.
Fandiño, M., Olmedo, J.L., Martínez, E.M., Valladares, J., Paredes, P., Rey, B.J., Mota, M., Cancela, J.J., Pereira, L.S., 2015. Assessing and modelling water use and the partition of evapotranspiration of irrigated hop (Humulus lupulus), and relations of transpiration with hops yield and alpha-acids. Ind. Crops Prod. 77, 204–217.
Feng, Y., Cui, N., Zhao, L., Hu, X., Gong, D., 2016. Comparison of ELM, GANN, WNN and empirical models for estimating reference evapotranspiration in humid region of Southwest China. J. Hydrol. 536, 376–383. https://doi.org/10.1016/j.jhydrol.2016.02.053.
Fernandez-Delgado, M., Cernadas, E., Barro, S., Amorim, D., 2014. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15, 3133–3181.
Hargreaves, G.H., 1982. Estimating potential evapotranspiration. J. Irrig. Drain. Div. 108, 225–230.
Izadifar, Z., Elshorbagy, A., 2010. Prediction of hourly actual evapotranspiration using neural networks, genetic programming, and statistical models. Hydrol. Process. 24, 3413–3425.
James, G., Witten, D., Hastie, T., Tibshirani, R., 2013. An Introduction to Statistical Learning.
Jing, W., Yang, Y., Yue, X., Zhao, X., 2015. Mapping urban areas with integration of DMSP/OLS nighttime light and MODIS data using machine learning techniques. Remote Sens. 7, 12419–12439.
Kim, S., Kim, H.S., 2008a. Neural networks and genetic algorithm approach for nonlinear evaporation and evapotranspiration modeling. J. Hydrol. 351, 299–317. https://doi.org/10.1016/j.jhydrol.2007.12.014.
Kim, S., Kim, H.S., 2008b. Neural networks and genetic algorithm approach for nonlinear evaporation and evapotranspiration modeling. J. Hydrol. 351, 299–317.
Kisi, O., 2016. Modeling reference evapotranspiration using three different heuristic regression approaches. Agric. Water Manage. 169, 162–172. https://doi.org/10.1016/j.agwat.2016.02.026.
Kisi, O., Guven, A., 2010. Evapotranspiration modeling using linear genetic programming technique. J. Irrig. Drain. Eng. 136, 715–723.
Krause, P., Boyle, D.P., Bäse, F., 2005. Comparison of different efficiency criteria for hydrological model assessment. Adv. Geosci. 5, 89–97.
Kumar, M., Raghuwanshi, N.S., Singh, R., Wallender, W.W., Pruitt, W.O., 2002. Estimating evapotranspiration using artificial neural network. J. Irrig. Drain. Eng. 128, 224–233.
Kumar, M., Bandyopadhyay, A., Raghuwanshi, N.S., Singh, R., 2008. Comparative study of conventional and artificial neural network-based ETo estimation models. Irrig. Sci. 26, 531–545. https://doi.org/10.1007/s00271-008-0114-3.
Kumar, M., Raghuwanshi, N.S., Singh, R., 2009. Development and validation of GANN model for evapotranspiration estimation. J. Hydrol. Eng. 14, 131–140. https://doi.org/10.1061/(ASCE)1084-0699(2009)14:2(131.
Liaw, A., Wiener, M., 2002. Classification and regression by random forest. R News, pp. 23.
Liu, M., Xu, X., Sun, A.Y., Wang, K., Liu, W., Zhang, X., 2014. Is southwestern China experiencing more frequent precipitation extremes? Environ. Res. Lett. 9, 064002.
Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models part I: a discussion of principles. J. Hydrol. 10, 282–290.
Rahimikhoob, A., 2016. Comparison of M5 model tree and artificial neural network's methodologies in modelling daily reference evapotranspiration from NOAA satellite images. Water Resour. Manage. 30, 3063–3075. https://doi.org/10.1007/s11269-016-1331-9.
Shiri, J., Kisi, O., Landeras, G., Javier Lopez, J., Nazemi, A.H., Stuyt, L.C.P.M., 2012. Daily reference evapotranspiration modeling by using genetic programming approach in the Basque Country (Northern Spain). J. Hydrol. 414, 302–316. https://doi.org/10.1016/j.jhydrol.2011.11.004.
Shiri, J., Marti, P., Nazemi, A.H., Sadraddini, A.A., Kisi, O., Landeras, G., Fakherifard, A., 2014a. Local vs. external training of neuro-fuzzy and neural networks models for estimating reference evapotranspiration assessed through k-fold testing. Hydrol. Res.
Shiri, J., Nazemi, A.H., Sadraddini, A.A., Landeras, G., Kisi, O., Fard, A.F., Marti, P., 2014b. Comparison of heuristic and empirical approaches for estimating reference evapotranspiration from limited inputs in Iran. Comput. Electron. Agric. 108, 230–241. https://doi.org/10.1016/j.compag.2014.08.007.
Shiri, J., Sadraddini, A.A., Nazemi, A.H., Kisi, O., Landeras, G., Fard, A.F., Marti, P., 2014c. Generalizability of gene expression programming-based approaches for estimating daily reference evapotranspiration in coastal stations of Iran. J. Hydrol. 508, 1–11.
Tabari, H., Kisi, O., Ezani, A., Talaee, P.H., 2012. SVM, ANFIS, regression and climate based models for reference evapotranspiration modeling using limited climatic data in a semi-arid highland environment. J. Hydrol. 444–445, 78–89.
Thomas, A., 2000. Spatial and temporal characteristics of potential evapotranspiration trends over China. Int. J. Climatol. 20, 381–396.
Traore, S., Guven, A., 2011. New algebraic formulations of evapotranspiration extracted from gene-expression programming in the tropical seasonally dry regions of West Africa. Irrig. Sci. 31, 1–10.
Traore, S., Guven, A., 2013. New algebraic formulations of evapotranspiration extracted from gene-expression programming in the tropical seasonally dry regions of West Africa. Irrig. Sci. 31, 1–10. https://doi.org/10.1007/s00271-011-0288-y.
Wang, S., Fu, Z.Y., Chen, H.S., Nie, Y.P., Wang, K.L., 2015. Modeling daily reference ET in the karst area of northwest Guangxi (China) using gene expression programming (GEP) and artificial neural network (ANN). Theor. Appl. Climatol. 1–12. https://doi.org/10.1007/s00704-015-1602-z.
Wu, L., Chen, H., Lian, J., Fu, Z., Wang, S., 2017. Spatio-temporal variation in reference evapotranspiration in recent 50 years in karst and non-karst areas in Guangxi Zhuang Autonomous Region. Chin. J. Eco-Agric. 25, 1508–1517.
Yassin, M.A., Alazba, A.A., Mattar, M.A., 2016. Artificial neural networks versus gene expression programming for estimating reference evapotranspiration in arid climate. Agric. Water Manage. 163, 110–124.
Yin, Y., Wu, S., Dai, E., 2010. Determining factors in potential evapotranspiration changes over China in the period 1971–2008. Chin. Sci. Bull. 55, 3329–3337.
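The four criteria reported throughout Section 3 and in Tables 4 and 5 (R2, NSCE, RMSE, and PBIAS) can be computed from paired FAO-56 PM and model ET0 series as in the minimal Python sketch below. It assumes that R2 is the squared Pearson correlation and that PBIAS = 100 × Σ(P − O)/ΣO, so positive values indicate overestimation; the study's own definitions may use the opposite PBIAS sign, and the function name `performance` is hypothetical.

```python
import math


def performance(obs, pred):
    """Return R2, NSCE, RMSE and PBIAS for paired observed/predicted ET0 series."""
    if len(obs) != len(pred) or len(obs) < 2:
        raise ValueError("obs and pred must have the same length (at least 2)")
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n

    # Sums of squared deviations around each mean, and the cross-product sum.
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_o == 0 or var_p == 0:
        raise ValueError("series must not be constant")
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))

    # R2 taken as the squared Pearson correlation (assumption).
    r2 = cov ** 2 / (var_o * var_p)

    # Nash-Sutcliffe coefficient of efficiency: 1 - SSE / SS_obs.
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    nsce = 1.0 - sse / var_o

    # Root mean squared error, in the units of the data (mm/day for ET0).
    rmse = math.sqrt(sse / n)

    # Percent bias; positive means overestimation under this assumed sign convention.
    pbias = 100.0 * sum(p - o for o, p in zip(obs, pred)) / sum(obs)

    return {"R2": r2, "NSCE": nsce, "RMSE": rmse, "PBIAS": pbias}


if __name__ == "__main__":
    fao_pm = [2.0, 3.0, 4.0]  # hypothetical FAO-56 PM ET0 values (mm/day)
    model = [2.5, 3.0, 3.5]   # hypothetical model estimates (mm/day)
    print(performance(fao_pm, model))
```

Because R2 only measures linear association, a model can have R2 = 1 while NSCE and PBIAS still reveal a systematic offset, which is why the study reports all four criteria side by side.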

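Sections 3.1 to 3.3 treat Ra as a computed input that depends only on station latitude and day of year, and use the Hargreaves–Samani equation as the benchmark for RF8 and GEP8. The sketch below assumes the FAO-56 formulations (Allen et al., 1998) for both quantities; it is not necessarily the exact implementation used in this study, and the function names are hypothetical. The check value of about 32.2 MJ m−2 day−1 for 20°S on 3 September (day 246) corresponds to the FAO-56 worked example.

```python
import math

GSC = 0.0820  # solar constant, MJ m-2 min-1 (FAO-56)


def extraterrestrial_radiation(lat_deg, doy):
    """Daily extraterrestrial radiation Ra (MJ m-2 day-1) from latitude and day of year."""
    phi = math.radians(lat_deg)                               # latitude (rad), south negative
    dr = 1 + 0.033 * math.cos(2 * math.pi * doy / 365)        # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(2 * math.pi * doy / 365 - 1.39)  # solar declination (rad)
    # Sunset hour angle; the argument is clamped so acos stays defined near the poles.
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(x)
    return (24 * 60 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )


def hargreaves_samani(tmax, tmin, ra):
    """Hargreaves-Samani ET0 (mm day-1) from Tmax, Tmin (deg C) and Ra (MJ m-2 day-1)."""
    if tmax < tmin:
        raise ValueError("tmax must be >= tmin")
    tmean = (tmax + tmin) / 2
    # 0.408 converts Ra from MJ m-2 day-1 to its evaporation equivalent in mm day-1.
    return 0.0023 * (tmean + 17.8) * math.sqrt(tmax - tmin) * 0.408 * ra


if __name__ == "__main__":
    ra = extraterrestrial_radiation(-20.0, 246)  # FAO-56 worked example, about 32.2
    print(round(ra, 1), round(hargreaves_samani(30.0, 20.0, ra), 2))
```

Since both functions need only latitude, calendar date, Tmax, and Tmin, they also illustrate why the text regards Ra as a default input that never has to be observed.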

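Section 3.4 divides 1960–2009 into five ten-year periods and compares the stability of R2 and RMSE at Laibin and Hechi through their coefficients of variation. A minimal sketch of that bookkeeping follows. It assumes the CV is the sample standard deviation divided by the mean (the text does not say which standard deviation was used); the record layout, the example values, and the function names are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def decade_label(year):
    """Map a year to its ten-year period, e.g. 1967 -> '1960-1969'."""
    start = year - year % 10
    return f"{start}-{start + 9}"


def rmse(obs, pred):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def rmse_by_period(records):
    """records: iterable of (year, observed ET0, estimated ET0) tuples."""
    groups = defaultdict(lambda: ([], []))
    for year, o, p in records:
        obs, pred = groups[decade_label(year)]
        obs.append(o)
        pred.append(p)
    # Periods are returned in chronological order.
    return {period: rmse(obs, pred) for period, (obs, pred) in sorted(groups.items())}


def coefficient_of_variation(values):
    """Dimensionless CV: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


if __name__ == "__main__":
    # Hypothetical records; real inputs would be FAO-56 PM and model ET0 per day.
    records = [(1965, 2.0, 2.1), (1968, 3.0, 2.9), (1975, 2.0, 2.5)]
    per_period = rmse_by_period(records)
    print(per_period, coefficient_of_variation(list(per_period.values())))
```

Because the CV divides a standard deviation by a mean in the same units, it is unitless, which is why the values quoted for Laibin and Hechi carry no mm day−1 unit.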