
INTERNATIONAL JOURNAL OF CLIMATOLOGY
Int. J. Climatol. (2017)

Published online in Wiley Online Library
(wileyonlinelibrary.com) DOI: 10.1002/joc.5064
Evaporation modelling using different machine learning techniques

Lunche Wang,a,b* Ozgur Kisi,c Bo Hu,b Muhammad Bilal,d Mohammad Zounemat-Kermani,e and Hui Li,a

a Laboratory of Critical Zone Evolution, School of Earth Sciences, China University of Geosciences, Wuhan, China
b State Key Laboratory of Atmospheric Boundary Layer Physics and Atmospheric Chemistry, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China
c Center for Interdisciplinary Research, International Black Sea University, Tbilisi, Georgia
d Department of Land Surveying & Geo-Informatics, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
e Department of Water Engineering, Shahid Bahonar University of Kerman, Iran

ABSTRACT: Accurate prediction of pan evaporation (Ep) is critical for water resource management. This article investigates
the capabilities of three different soft computing methods at estimating monthly Ep at six stations in the Yangtze River Basin
using climatic factors, including the air temperature (Ta), solar radiation (Rg), air pressure (Pa) and wind speed (Ws) for the
period of 1961–2000. The first part of the study focused on testing and comparing model accuracy levels at each station using
local input combinations. The results indicate that the fuzzy genetic (FG) model generally produces better results than adaptive
neuro-fuzzy inference systems with grid partition (ANFIS-GP) and M5 model tree (M5Tree) specifications in terms of the root
mean square error, mean absolute error and coefficient of determination values. The performance of the above models was
also examined using cross-station applications (estimating Ep without local input or output data) in the second part of the
study. The third part focused on estimating Ep using generalized FG, ANFIS-GP and M5Tree models. Collectively, the results
demonstrate that the FG model can be successfully used to estimate Ep without any local inputs and outputs and that a single
generalized FG model can also be used at six different locations.
KEY WORDS pan evaporation; fuzzy genetic algorithm; ANFIS-GP; M5 model tree; cross-station application
Received 2 August 2016; Revised 15 February 2017; Accepted 18 February 2017
* Correspondence to: L. Wang, Laboratory of Critical Zone Evolution, School of Earth Sciences, China University of Geosciences, Luoyu Road, Wuhan 430074, China. E-mail: wang@cug.edu.cn
© 2017 Royal Meteorological Society

1. Introduction

Evaporation is the process through which liquid water becomes water vapour. This process is strongly dependent on energy supplies, air exchanges and differences in the vapour pressure between the Earth’s surface and atmosphere (Penman, 1948; Kim et al., 2012; Shiri et al., 2014a). Pan evaporation (Ep) is essential for estimating water budgets and modelling crop water responses to different climatic conditions. For example, Ep is widely used as an index for potential or reference crop evapotranspiration and irrigation scheduling (Guven and Kisi, 2011; Shiri et al., 2014b; Kisi, 2015). Thus, accurate observation and estimation of Ep are critical aspects of water resource engineering and irrigation practices (McCuen, 1998; Sanikhani et al., 2012; Majidi et al., 2015) and will contribute to many studies focused on hydrology, meteorology, agronomy, forestry, and terrestrial ecosystems.

In recent decades, both direct and indirect methods have been employed to estimate Ep values around the world (Priestley and Taylor, 1972; Shirsath and Singh, 2010; Shiri et al., 2014c; Wang et al., 2017). The U.S. Weather Bureau Class A pan, which is 121 cm in diameter and 25.5 cm deep and is mounted on a timber grid 15 cm above the soil surface, is one of the most widely used instruments for measuring evaporation. However, it is impractical to place evaporation pans at each site where they are needed (Rahimikhoob, 2009), especially in developing countries (Kim et al., 2015). Pans are also difficult to use because instrumental limits and practical issues influence the accuracy of Ep measurements (Rosenberry et al., 2007). Thus, a number of scholars have attempted to estimate Ep values using several indirect methods (Droogers and Allen, 2002; Valiantzas, 2006; Sanikhani et al., 2012; Kim et al., 2013). For example, many empirical methods fit a linear relationship between Ep and meteorological variables (Stephens and Stewart, 1963). However, the rate of evaporation strongly depends on local climatic conditions (e.g. solar radiation, air temperature, relative humidity, vapour pressure deficit, and wind speed), and these data cannot be easily obtained from weather stations (Rosenberry et al., 2007; Piri et al., 2009). It is also difficult to derive an accurate formula that represents all physical processes owing to the presence of nonlinear, stochastic and complex evaporation processes (Shilo et al., 2015). Therefore,

L. WANG et al.
many researchers have emphasized the need to accurately estimate evaporation using hydrologic modelling studies.

Artificial intelligence methods have been successfully applied in a number of fields, including water resources, solar radiation and energy applications. Regarding hydrological contexts, recent experiments have demonstrated that artificial neural networks (ANN) may serve as promising tools for Ep estimation (Tan et al., 2007; Cobaner et al., 2009; Sanikhani et al., 2012; Goyal et al., 2014; Xie et al., 2015). For example, Shirsath and Singh (2010) compared the ANN, Penman, Priestley-Taylor and Stephens and Stewart models in terms of estimating the daily Ep in New Delhi, India and found that the ANN model produces better estimates than multiple linear regression (MLR) models. Lin et al. (2013) compared the support vector machine (SVM) and multilayer perceptron (MLP) models for Ep estimation and found that the SVM-based model is superior owing to its accuracy, robustness and efficiency. Kim et al. (2015) compared three soft computing models [MLP, the Kohonen self-organizing feature maps-neural networks model (KSOFM-NNM) and gene expression programming (GEP)] in terms of predicting daily Ep values at two stations with limited climatic data. Kisi (2015) investigated the accuracy of the least square SVM (LSSVM), multivariate adaptive regression splines (MARS) and M5Tree in terms of modelling Ep at two stations in Mediterranean sites of Turkey and found that the MARS model performs better than the other models. These researchers found satisfactory results when using different models at different sites around the world. However, almost no studies have used a sufficiently large number of stations (>3) to reach general conclusions. In addition, no studies have compared different ANN methods in terms of estimating the Ep values at different sites of China. In this study, a large number of stations were used to obtain more general conclusions, and single generalized adaptive neuro-fuzzy inference system-grid partitioning (ANFIS-GP), M5Tree and FG models that can be used at six locations were developed. To our knowledge, no studies have yet compared these three methods in terms of Ep estimation, thus creating the impetus for the present investigation.

Some researchers have analysed spatiotemporal variations in and related causes of Ep in sites across China (Liu et al., 2004, 2011; Han et al., 2012; Yang and Yang, 2012; Wei et al., 2015; Zhang et al., 2015; Xu et al., 2016). For example, Zuo et al. (2005) studied correlations between Ep and environmental factors over the last 40 years in China. Wang et al. (2007) analysed Ep trends along the upper and mid-lower Yangtze River Basin for 1961–2000 and found that decreasing Ep values in the summer can be associated with net radiation and wind speed. Finally, Li et al. (2013) investigated Ep dynamics in the hyper-arid region of China for 1958–2010 using long-term meteorological data collected from 81 meteorological stations in north-western China. However, very few studies have compared Ep estimates in China using different soft computing techniques.

This study aims to investigate the capabilities of the ANFIS-GP, M5Tree and FG models to predict the monthly Ep at six stations in the Yangtze River Basin using different input combinations. The model performance levels are determined by (1) estimating Ep at each station using different local input combinations; (2) estimating Ep without local input or output data and (3) estimating Ep using generalized ANFIS-GP, M5Tree and FG models. This is the first study to compare the accuracy of different soft computing models for Ep estimation in the Yangtze River Basin of China.

2. Methods

2.1. Fuzzy genetic (FG) algorithm

The FG method combines the fuzzy system and genetic algorithm (GA) approaches. These two approaches are briefly described in the following subsections.

2.1.1. Fuzzy logic approach

Fuzzy logic can be used to establish a fuzzy system, which generally includes four components (Figure 1(a)) (Kisi and Tombul, 2013). To set up a model based on a fuzzy inference system, input data (variables) with corresponding output(s) should be introduced into the system. Unlike the black-box techniques involved in ANNs, the learning process of fuzzy systems is more transparent (Kisi, 2015).

Figure 1. (a) Typical view of a fuzzy inference system (input, fuzzification, fuzzy inference system with fuzzy rule base and fuzzy inference engine, defuzzification, output) and (b) general framework of the fuzzy genetic model (input, population initialization, fitness function, selection, crossover, mutation, end-of-iteration check, output).

When modelling Ep using fuzzy logic, the input and output data are divided into subsets with membership functions (MFs). In this study, Gaussian MFs are first applied in the fuzzification phase. Second, c^n fuzzy rules (c denotes the number of subsets, and n is the number of input parameters) are established during the fuzzy rule base and inference engine phases. When considering one input variable ‘x’ with ‘A1’, ‘A2’, … , ‘Ac’ fuzzy subsets (A can be a linguistic parameter such as ‘low’,

EVAPORATION MODELLING IN YANGTZE RIVER BASIN
Figure 2. (a) Schematic architecture of ANFIS (Kisi, 2015), with input, MF, T-norm, normalization, Sugeno and summation layers leading to the output; (b) grid partitioning identification method, which divides the two-variable (X, Y) input space into a grid of nine regions.
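
The grid partitioning in Figure 2(b) can be illustrated with a short Python sketch. It is not the authors' MATLAB implementation: the function names, the product T-norm and all parameter values are assumptions chosen for illustration. With c MFs per input and n inputs, the grid contains c^n cells, each carrying one first-order Sugeno rule, so three MFs on two inputs give the nine regions of the figure.

```python
import itertools
import numpy as np

def gaussian_mf(x, centre, sigma):
    """Gaussian membership degree of x in a fuzzy subset."""
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

def grid_partition_rules(n_inputs, n_mfs):
    """One rule per grid cell: every combination of MF indices."""
    return list(itertools.product(range(n_mfs), repeat=n_inputs))

def sugeno_predict(x, centres, sigmas, consequents):
    """First-order Sugeno output for one sample x of shape (n_inputs,).

    centres, sigmas: arrays of shape (n_inputs, n_mfs)
    consequents: array of shape (n_rules, n_inputs + 1) holding the
                 linear coefficients and constant term of each rule
    """
    n_inputs, n_mfs = centres.shape
    rules = grid_partition_rules(n_inputs, n_mfs)
    # Membership degree of each input in each of its MFs.
    mu = gaussian_mf(x[:, None], centres, sigmas)
    # Firing strength of each rule (product T-norm, an assumption).
    w = np.array([np.prod([mu[i, j] for i, j in enumerate(rule)])
                  for rule in rules])
    # Linear consequent of each rule, then the weighted average.
    z = consequents[:, :-1] @ x + consequents[:, -1]
    return float(np.sum(w * z) / np.sum(w))

# Hypothetical set-up: two inputs scaled to [0, 1], three MFs each.
centres = np.tile([0.0, 0.5, 1.0], (2, 1))
sigmas = np.full((2, 3), 0.25)
rules = grid_partition_rules(2, 3)
consequents = np.zeros((len(rules), 3))
consequents[:, -1] = 5.0  # every rule outputs the constant 5
prediction = sugeno_predict(np.array([0.3, 0.7]), centres, sigmas, consequents)
```

Because every rule here returns the same constant, the weighted average must return that constant as well, which gives a quick sanity check of the normalization step.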
‘high’, etc.), there are the following k rules (Kisi and Tombul, 2013):

Rule 1: IF (x has A1) THEN y1
Rule 2: IF (x has A2) THEN y2
...
Rule k: IF (x has Ak) THEN yk

The output values (y) can be calculated using a defuzzification method (such as the centre of gravity, the mean of the maxima, the middle of the maximum or the weighted fuzzy mean). In this study, the weighted fuzzy mean is used in the last fuzzy modelling system phase, and the weighted averages of outputs from k rules lead to a single weighted output (Kisi and Tombul, 2013),

    y = \frac{\sum_{n=1}^{k} w_n y_n}{\sum_{n=1}^{k} w_n}    (1)

where w_n denotes the membership degree and y_n is the output for each fuzzy rule. In this study, a GA is used to determine the optimal MFs.

2.1.2. GAs

GAs are search algorithms categorized as heuristic methods, which are efficient techniques in terms of searching for the optimal MF solution in a fuzzy system. In fuzzy models, the MF parameters of the input and output are optimized using a GA. A general schematic view of a GA model is shown in Figure 1(b), and further information can be found in a related study by Preis and Ostfeld (2008).

2.2. ANFIS with GP

An ANFIS addresses complex problems using a combination of rule-based fuzzy systems and ANNs. In the ANFIS architecture, input and output nodes denote the input vector and target value, and middle nodes (in hidden layers) are processors that function as rules and MFs. In this study, a Takagi-Sugeno (TS) fuzzy model is used to establish fuzzy if-then rules, such as the following expressions (Zounemat-Kermani and Scholz, 2013):

Rule 1: If (x is A1) and (y is B1), Then (z1 = p1 x + q1 y + r1)
Rule 2: If (x is A2) and (y is B2), Then (z2 = p2 x + q2 y + r2)    (2)

where A and B are the MFs of the input vector (x and y), z is the output, and p, q and r are the output function parameters.

In general, an ANFIS is shaped like a five-layer neural network-based fuzzy system (Figure 2(a)). Different types of TS fuzzy models can be developed using different identification methods (e.g. GP and subtractive clustering, hereafter SC). In this study, a combination of the ANFIS and GP methods (ANFIS-GP) is applied using the designated number of MFs (Kisi and Zounemat-Kermani, 2014). The gradient descent method can be used as an optimization method for adapting the sizes and locations of fuzzy grid regions and the degree of overlap among them (Figure 2(b)).

2.3. M5 model tree

M5Tree belongs to the class of binary decision-tree models. M5Tree splits data into subspaces, and then a linear regression model is built for each of them (Witten et al., 1999; Goyal, 2014). Connections between the input and output parameters are developed using linear regression functions at the terminal nodes (leaves) (Goyal and Ojha, 2011). The generation of M5Tree occurs over two main phases. In the first phase, the data are split into subsets according to the standard deviation reduction (SDR) function shown below (Rahimikhoob, 2014):

    SDR = sd(T) - \sum_{i} \frac{|T_i|}{|T|} sd(T_i)    (3)

where T represents a set of examples that reaches the node, T_i denotes the subset of examples with the ith outcome of the potential set, and sd denotes the standard deviation. Figure 3(a) shows the input space (two independent variables) split into five subspaces (leaves). The

Figure 3. Schematic diagram of an M5 model tree; (a) splitting the input space X1 × X2 into five subspaces (Models 1 to 5), (b) related tree-like-structure diagram with five linear regression models at the leaves.
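
The splitting criterion behind the tree in Figure 3 is the standard deviation reduction of Equation (3): the standard deviation of the parent set minus the size-weighted standard deviations of the subsets. The Python sketch below shows one way to evaluate it for a binary split on a single input. The function names and the toy data are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import numpy as np

def sdr(target, mask):
    """SDR = sd(T) - sum(|Ti| / |T| * sd(Ti)) for a binary split.

    target: output values of the examples reaching the node (T)
    mask:   boolean array, True for examples sent to the first subset
    """
    target = np.asarray(target, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    reduction = target.std()  # population standard deviation (assumption)
    for subset in (target[mask], target[~mask]):
        if subset.size:
            reduction -= subset.size / target.size * subset.std()
    return reduction

def best_split(x, target):
    """Scan midpoints between sorted unique x values; return (threshold, SDR)."""
    x = np.asarray(x, dtype=float)
    values = np.unique(x)
    candidates = (values[:-1] + values[1:]) / 2.0
    scores = [sdr(target, x <= t) for t in candidates]
    best = int(np.argmax(scores))
    return float(candidates[best]), float(scores[best])

# Toy data: the output jumps from 1 to 5 between x = 3 and x = 4.
x = [1, 2, 3, 4, 5, 6]
y = [1, 1, 1, 5, 5, 5]
threshold, score = best_split(x, y)
```

On these toy data the split at 3.5 leaves both subsets with zero spread, so the reduction equals the parent standard deviation; a full M5 implementation would apply this search recursively over all inputs and then prune.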
parent-node-splitting process generates purer categories (with lower sd values) of data in child nodes. This process continues until the best state with the maximum SDR is reached. However, the splitting process can produce an overgrown tree-like structure and can consequently result in poor generalization. Hence, in a second phase, the tree is pruned by replacing sub-trees with linear regression functions (Figure 3(b)).

2.4. Software availability

In developing the above soft computing models, the Neural Network Toolbox 5 and MATLAB 7.0 (The MathWorks, Inc., USA) software packages were used. A MATLAB script was written. The input and output data were normalized before they were input into the models.

2.5. Evaluation criteria

The first part of this study focused on estimating the monthly Ep values at each station in the Yangtze River Basin. The accuracies of the aforementioned ANFIS-GP, FG and M5Tree models were compared using different input combinations of Rg, Ta, Pa and Ws. In the second part of the study, a cross-validation method was applied for each model, and the above methods were tested using input data from the other station (Application I, cross-station application). Moreover, the ANFIS-GP, FG and M5Tree models were tested further in terms of Ep estimation using input and output data from other stations (Application II, without local input and output data). The evaluation criteria used in the following applications were the mean absolute error (MAE), root mean square error (RMSE) and coefficient of determination (R2), which can be expressed as follows:

    RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (Ep_{m,i} - Ep_{o,i})^2} \cdot \frac{100}{Ep_{o,mean}}    (4)

    MAE = \frac{1}{N} \sum_{i=1}^{N} |Ep_{m,i} - Ep_{o,i}| \cdot \frac{100}{Ep_{o,mean}}    (5)

    R^2 = \frac{\left[ \sum_{i=1}^{N} (Ep_{m,i} - \overline{Ep_m})(Ep_{o,i} - \overline{Ep_o}) \right]^2}{\sum_{i=1}^{N} (Ep_{m,i} - \overline{Ep_m})^2 \sum_{i=1}^{N} (Ep_{o,i} - \overline{Ep_o})^2}    (6)

where N and the overbar indicate the number of data points and the mean of the variable, respectively, and Ep_m and Ep_o are the modelled and observed Ep levels, respectively.

3. Case study and data

In this study, monthly data collected at six stations operated by the China Meteorological Administration (CMA) were used. Figure 4 shows the geographical locations of these stations, which are designated by IDs 57461 (latitude 30°42′ N, longitude 111°18′ E, 133.1 masl, i.e. m above sea level), 57494 (30°37′ N, 110°08′ E, 23.1 masl), 57516 (29°35′ N, 106°28′ E, 259.1 masl), 58238 (32°00′ N, 118°48′ E, 7.1 masl), 58321 (31°52′ N, 117°14′ E, 27.9 masl) and 58362 (31°24′ N, 121°29′ E, 6 masl). Figure 4 shows that all stations are located in the Yangtze River Basin; the Yangtze is the largest river in China and the third largest in the world, after the Nile River in Africa and the Amazon River in South America. The Yangtze River Basin (from the Qinghai-Tibet Plateau to the East China Sea) holds 40% of China’s freshwater resources; supports 70% of China’s rice production, 50% of China’s grain production and 70% of China’s fishery production; and accounts for 40% of China’s gross domestic product. The region is characterized by a subtropical monsoon climate that is hot and rainy in the summer and cold and dry in the winter due to the combined effects of the East Asian atmospheric circulation, the Qinghai-Tibet Plateau terrain and the proximity of the North Pacific Ocean. More detailed information about the geographic, climatic and hydrological conditions of the region is provided in Wang et al. (2007) and Liu et al. (2011).

Monthly air temperature (Ta), solar radiation (Rg), air pressure (Pa), wind speed (Ws) and Ep records for the period of 1961–2000 were used in this study. For each
Figure 4. The geographical locations of stations used in this study. [Colour figure can be viewed at wileyonlinelibrary.com].
station, 20-year data (50% of the data) were randomly selected to train the models, and the remaining data (50% of the data) were used to test the models. Annual variations in Ep and associated climatic factors are clearly evident in Figure 5, and it should be noted that these data have been quality controlled through a series of procedures (Zhang et al., 2015) and that annual mean values were eliminated when more than one third of the monthly data were missing. It is clear from Figure 5 that the Ep levels at station 57516 were generally lower than those at 58238, 58321 and other stations, and the Ep levels at most stations decreased from 1961 to 2000. However, the annual mean Ta levels increased at most stations, and the values at station 57516 were the highest. The annual Ws value at station 58362 was the highest among the six stations, and the values at stations 57516 and 57461 were the lowest. The annual variations in Pa were relatively mild at most stations, and the values at stations 57516 and 57461 were much lower than at the other stations. As previously reported in the literature, Rg decreased from 1961 to 1990 and then increased from the 1990s for most stations in China (Figure 6). A more detailed account of these change rates is presented in Wang et al. (2017).

Monthly statistical parameters regarding the climatic data are shown in Table 1 and Figure 6, where xmean, Sx, Cv, Cx, xmin and xmax denote the mean, standard deviation, variation coefficient, skewness, minimum and maximum values, respectively. The monthly mean Ep values for the whole study period reached 3.63, 3.87, 2.85, 4.27, 4.19 and 4.01 mm for stations 57461, 57494, 57516, 58238, 58321 and 58362, respectively. The monthly Ep, Ta and Rg values were generally higher in the summer and lower in the winter. In contrast, the Pa values were higher in the winter and lower in the summer (Figure 6). The mean Rg values recorded at stations 58362 and 58238 (12.64 and 12.33 MJ m−2) were slightly greater than those recorded at other stations, and a mean Ta of 17.99 °C was recorded at station 57516, representing the highest value of all stations. The Pa values varied little across the stations (see the Cv values in Table 1), and the monthly Pa recorded at station 57516 was generally less than values recorded at the other stations (less than 1000 hPa) (Figure 6). The Ws value recorded at station 58362 was greater than values recorded at other stations for each month (owing to the proximity of this station to the East China Sea) (Figure 6). For station ID 57461, the Ta values exhibited a less skewed distribution and a strong correlation with Ep (R = 0.90); the Rg and Ws data exhibited the strongest (R = 0.94) and weakest (R = 0.30) correlations with Ep. For station 57494, the Ta data exhibited a high variation coefficient (Cv = 0.53) and a strong correlation with Ep (R = 0.91). For station 57516, the Pa levels exhibited the lowest variation coefficient (Cv = 0.01) and a negative correlation with Ep (R = −0.90); the Rg data generated the highest variation coefficient among the climatic inputs (Cv = 0.53) and the strongest correlation with Ep (R = 0.96). Similarly, the Rg and Ta data exhibited higher Cv values and stronger correlations with Ep for stations 58238, 58321 and 58362. It is clear from the statistical indices shown in Table 1 that each data set covered different data ranges, which may have created extrapolation difficulties in estimating extreme Ep values. However, in terms of the correlation coefficients, the Rg and Ta variables seemed to be the most appropriate parameters for predicting Ep.

Figure 5. The annual variations of Ep and associated meteorological parameters during 1961–2000. [Colour figure can be viewed at wileyonlinelibrary.com].
4. Results and discussion

4.1. Estimating the Ep of each station using different local input combinations

In the present study, the capabilities of three soft computing methods (ANFIS-GP, FG and M5Tree) in terms of estimating monthly Ep using climatic inputs of Rg, Ta, Pa and Ws were investigated. Data drawn from six stations (57461, 57494, 57516, 58238, 58321 and 58362) in the Yangtze River Basin were used in the applications. The local input combinations used as model inputs included: (1) Rg; (2) Ta; (3) Rg and Ta; (4) Rg, Ta and Pa; and (5) Rg, Ta, Pa and Ws. The Sugeno fuzzy inference system was used for both the ANFIS-GP and FG models. For each model, two Gaussian MFs were used because increasing the number of MFs significantly increased the computational time. For the FG models, population sizes of 100 and 1000 and 10 000 generations were used, and the one that yielded the lowest RMSE in the testing phase was selected.

The testing results of the ANFIS-GP, FG and M5Tree models for different input combinations were compared for stations 57461, 57494, 57516, 58238, 58321 and 58362. As evident from Table 2, all models achieved different levels of accuracy for different data sets, and the models were the most accurate in the testing phase when applied to station 57461 using the Rg and Ta inputs. The ANFIS-GP3 model yielded lower RMSE (13.15%) and higher R2 (0.949) values than the FG3 and M5Tree3 models. Rg seemed to be the most appropriate parameter for modelling Ep at this station (57461), which is consistent with the highest correlation being found between Rg and Ep, as evident from Table 1. Adding Pa and Ws inputs may decrease the models’ accuracy in terms of estimating Ep. For example, the RMSE of M5Tree5 increased to 15.88%, and the R2 decreased to 0.916. It is clear from Table 3 that the optimal ANFIS-GP4 and FG5 models were almost the same in terms of accuracy and that both models performed better than the optimal M5Tree3 model in the testing phase. In fact, the accuracies of two-, three- and four-input models were similar when using the FG and M5Tree methods. According to results for these two methods, including the variables Pa and Ws in the input combinations did not decrease the models’ accuracy in predicting Ep at this station (57494). As is indicated in Table 4, the optimal FG5 model, whose input combinations were Rg, Ta, Pa and Ws, performed better than the ANFIS-GP5 and M5Tree1 models. At station 57516, the Rg variable also seemed to be more effective at modelling Ep than Ta. This is evident from Table 1, which reveals a correlation between Rg and Ep of 0.96. It is clear from Table 5 that optimal models were obtained from the inputs of Rg and Ta for the ANFIS-GP, FG and M5Tree models of station 58238 in the testing phase. Like station 57494,

Figure 6. The seasonal variations of Ep and associated meteorological parameters used in this study: monthly (January to December) Ep (mm), Ta (°C), Ws (m s−1), Pa (hPa) and Rg (MJ m−2) for stations 57461, 57494, 57516, 58238, 58321 and 58362. [Colour figure can be viewed at wileyonlinelibrary.com].
the ANFIS-GP3 and FG3 models were similar in terms of accuracy. It is clear from the statistics that adding the parameters Pa and Ws to the model inputs decreased the accuracy of the Ep estimations for station 58238. The optimal FG3 model performed better than the ANFIS-GP1 and M5Tree5 models for station 58321 (Table 6), and Rg was found to be the most important parameter for Ep estimation. According to Table 7, the ANFIS-GP and FG models performed better when using four inputs, whereas the best M5Tree model was obtained when using three inputs for station 58362. The lowest MAE (8.02%) and RMSE (10.42%) values and the highest R2 (0.953) value were observed using the FG5 model, whereas the ANFIS-GP2 model provided the worst estimates in the testing phase (MAE, RMSE and R2 of 18.13%, 22.85% and 0.746, respectively). According to the results of the ANFIS-GP and FG models, including the Ws parameter as an input increased the accuracy even though the correlation between Ws and Ep was only 0.04 (Table 1). This suggests a non-linear relationship between Ws and Ep that cannot be captured by the correlation coefficient.

Estimates of the optimal FG, ANFIS-GP and M5Tree models of the testing phase are compared in Figures 7, 8 for stations 57461, 57494, 57516, 58238, 58321 and 58362, respectively. It is clear from Figure 6 that the FG models generally yielded less scatter in their estimates compared with the other models. The M5Tree model provided the worst results in estimating the Ep values across six stations. The scatterplots show that the models generally underestimated high (peak) Ep values at stations 57494, 58238, 58321 and 58362. This may be attributed to the fact that there were higher Ep values in the testing data sets than in the training data sets for these stations. Comparisons of Figures 7 and 8 reveal that all models provided less scattered estimates for station 57516, potentially due to the higher correlations (between climatic factors and Ep) found at this station than at the other five stations.

4.2. Estimating Ep without local input and output data

Estimating station Ep values using input data from nearby stations is very important, especially in developing countries, for which some climatic data cannot be measured. The aim of this part of the study was to determine the performance of the ANFIS-GP, FG and M5Tree models in estimating Ep values at stations without local input and

Table 1. Monthly statistical parameters of each data set for each station.

Station ID  Dataset  xmean     Sx      Cv     Cx      xmin     xmax      R
57461       Rg       10.87     4.32    0.40   0.30    1.91     23.73     0.94
            Ta       16.8      8.1     0.48   −0.09   1.5      30.38     0.90
            Pa       1002.16   10.78   0.01   −2.71   891.89   1022.04   −0.69
            Ws       1.26      0.39    0.31   0.12    0.51     2.24      0.30
            Ep       3.63      1.82    0.50   0.61    0.89     10.62     1
57494       Rg       11.98     4.64    0.39   0.59    3.27     26.35     0.90
            Ta       16.51     8.83    0.53   −0.06   0.12     31.08     0.91
            Pa       1013.51   8.55    0.01   −0.16   993.18   1028.18   −0.88
            Ws       2.03      0.69    0.34   0.07    0.64     3.77      0.16
            Ep       3.87      2.05    0.53   0.67    0.84     10.65     1
57516       Rg       8.88      4.67    0.53   0.43    1.15     21.32     0.96
            Ta       17.99     7.46    0.41   −0.11   0.64     30.9      0.87
            Pa       983.49    7.39    0.01   −0.13   969.79   996.4     −0.90
            Ws       1.36      0.34    0.25   −0.11   0.64     2.13      0.58
            Ep       2.85      1.92    0.67   0.84    0.54     9.32      1
58238       Rg       12.33     4.03    0.33   0.38    4.28     25.78     0.89
            Ta       15.15     9       0.59   −0.03   −0.85    30.49     0.89
            Pa       1015.45   8.78    0.01   −0.34   980.5    1030.39   −0.83
            Ws       2.49      0.53    0.21   0.22    1.31     4.16      0.14
            Ep       4.27      2.04    0.48   0.35    1.15     9.85      1
58321       Rg       11.85     4.53    0.38   0.01    0        24.87     0.79
            Ta       15.96     8.96    0.56   −0.13   −0.98    31.56     0.90
            Pa       1010.81   11.3    0.01   −1.85   941.09   1027.32   −0.62
            Ws       2.7       0.52    0.19   −0.23   1.05     4.04      0.15
            Ep       4.19      2.06    0.49   0.38    0.89     9.81      1
58362       Rg       12.64     4.01    0.32   0.36    5.18     25.27     0.90
            Ta       15.95     8.46    0.53   −0.04   0.23     29.89     0.89
            Pa       1016      8.42    0.01   −0.8    957.73   1029.27   −0.79
            Ws       3.75      0.76    0.2    0.71    1.69     6.13      0.04
            Ep       4.01      1.70    0.42   0.46    1.17     9.04      1

xmean, mean; Sx, standard deviation; Cv, variation coefficient; Cx, skewness; xmin, minimum; xmax, maximum values; R, coefficient of correlation between Ep and each climatic parameter. The units of Rg, Ta, Pa, Ws and Ep are MJ m−2, °C, hPa, m s−1 and mm, respectively.
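
The statistics reported in Table 1 can be reproduced from a monthly series with a few lines of Python. The sketch below is illustrative only: the function name is hypothetical, and the sample standard deviation (ddof = 1) and the moment-based skewness are assumptions, because the text does not state which estimators were used. The variation coefficient is the ratio Sx / xmean, which can be checked directly against the table.

```python
import numpy as np

def describe(x, ep):
    """Table 1 style statistics for one climatic series x against Ep."""
    x = np.asarray(x, dtype=float)
    ep = np.asarray(ep, dtype=float)
    mean = x.mean()
    sx = x.std(ddof=1)                        # sample standard deviation (assumption)
    centred = x - mean
    m2 = np.mean(centred ** 2)
    cx = np.mean(centred ** 3) / m2 ** 1.5    # moment-based skewness (assumption)
    return {
        "xmean": mean,
        "Sx": sx,
        "Cv": sx / mean,                      # variation coefficient
        "Cx": cx,
        "xmin": x.min(),
        "xmax": x.max(),
        "R": np.corrcoef(x, ep)[0, 1],        # Pearson correlation with Ep
    }

# Consistency check using values printed in Table 1 (Rg at station 57461).
cv_rg_57461 = round(4.32 / 10.87, 2)

# Toy series, not station data: a symmetric input perfectly correlated with Ep.
stats = describe([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

The check gives 0.40, matching the Cv listed for Rg at station 57461.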
Table 2. Comparison of ANFIS-GP, FG and M5Tree models for predicting Ep at station ID 57461.

                                 Training                      Testing
57461       Inputs               MAE (%)  RMSE (%)  R2         MAE (%)  RMSE (%)  R2
ANFIS-GP1   Rg                   12.98    17.32     0.884      11.05    13.90     0.932
ANFIS-GP2   Ta                   17.70    23.73     0.781      16.00    20.51     0.836
ANFIS-GP3   Rg, Ta               10.28    14.40     0.919      10.99    13.15     0.949
ANFIS-GP4   Rg, Ta, Pa           9.25     13.47     0.93       12.74    15.42     0.939
ANFIS-GP5   Rg, Ta, Pa, Ws       8.92     12.96     0.935      12.74    15.77     0.937
FG1         Rg                   12.82    17.18     0.886      11.16    14.53     0.925
FG2         Ta                   15.14    20.07     0.844      15.48    19.10     0.861
FG3         Rg, Ta               10.04    13.83     0.926      10.73    13.52     0.94
FG4         Rg, Ta, Pa           9.16     13.09     0.934      12.28    14.73     0.937
FG5         Rg, Ta, Pa, Ws       9.08     12.55     0.939      12.23    14.76     0.93
M5Tree1     Rg                   11.78    16.58     0.893      10.79    13.98     0.932
M5Tree2     Ta                   12.74    17.37     0.883      15.45    19.39     0.836
M5Tree3     Rg, Ta               8.07     11.46     0.949      10.79    13.90     0.929
M5Tree4     Rg, Ta, Pa           7.34     11.16     0.952      12.66    16.20     0.917
M5Tree5     Rg, Ta, Pa, Ws       6.76     10.69     0.956      12.08    15.88     0.916

The units of Rg, Ta, Pa, and Ws are MJ m−2, °C, hPa, and m s−1, respectively.
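
The MAE and RMSE columns of Table 2 are expressed as percentages of the mean observed Ep, and R2 is the squared correlation between modelled and observed values, following the evaluation criteria of Section 2.5. A minimal Python sketch of those three criteria is given below; the function name and the toy numbers are hypothetical, and the code is not the authors' MATLAB script.

```python
import numpy as np

def evaluate(modelled, observed):
    """MAE (%), RMSE (%) and R2 of modelled against observed Ep."""
    m = np.asarray(modelled, dtype=float)
    o = np.asarray(observed, dtype=float)
    scale = 100.0 / o.mean()                  # express errors as % of mean observed Ep
    mae = scale * np.mean(np.abs(m - o))
    rmse = scale * np.sqrt(np.mean((m - o) ** 2))
    dm, do = m - m.mean(), o - o.mean()
    r2 = np.sum(dm * do) ** 2 / (np.sum(dm ** 2) * np.sum(do ** 2))
    return {"MAE": mae, "RMSE": rmse, "R2": r2}

# Toy example: every estimate is 1 mm too high.
scores = evaluate(modelled=[3.0, 5.0, 7.0], observed=[2.0, 4.0, 6.0])
```

In this toy case MAE and RMSE are both 25% while R2 equals 1, which shows why the error measures and R2 are reported together: R2 alone does not detect a constant bias.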
output data. In the applications, station 57494 was arbitrarily selected as a target station. Two different applications were used: (1) to estimate the Ep at station 57494 using input data from stations 57461 and 58362 and (2) to estimate Ep at station 57494 using input and output data from stations 57461 and 58362.

Tables 8, 9 present the test results of the ANFIS-GP, FG and M5Tree models in Application I and clearly indicate that all three models were most accurate when using the Rg and Ta input combinations. The lowest MAE (14.72%) and RMSE (19.75%) values and the highest R2 (0.895) value were observed when estimating the Ep at station 57494

                                 Training                      Testing
57494       Inputs               MAE (%)  RMSE (%)  R2         MAE (%)  RMSE (%)  R2
ANFIS-GP1   Rg                   17.50    23.11     0.807      17.02    22.25     0.835
ANFIS-GP2   Ta                   17.55    23.44     0.802      22.09    27.19     0.805
ANFIS-GP3   Rg, Ta               11.75    16.03     0.907      15.76    19.10     0.893
ANFIS-GP4   Rg, Ta, Pa           10.78    14.62     0.923      14.80    18.33     0.905
FG1         Rg                   17.30    23.01     0.809      16.66    21.98     0.837
FG2         Ta                   13.68    17.89     0.885      19.54    24.39     0.87
FG3         Rg, Ta               10.55    14.19     0.927      15.29    18.58     0.908
FG4         Rg, Ta, Pa           10.45    14.17     0.928      15.13    18.58     0.908
FG5         Rg, Ta, Pa, Ws       10.11    13.61     0.933      14.91    18.44     0.906
M5Tree1     Rg                   14.85    20.56     0.848      17.51    24.01     0.798
M5Tree2     Ta                   11.80    15.44     0.914      21.73    27.62     0.837
M5Tree3     Rg, Ta               7.92     10.98     0.956      15.29    18.83     0.909
M5Tree4     Rg, Ta, Pa           7.59     10.68     0.959      15.48    18.99     0.905
M5Tree5     Rg, Ta, Pa, Ws       7.29     10.34     0.962      15.62    19.29     0.903
Station 57516
Model | Inputs | Training: MAE (%), RMSE (%), R2 | Testing: MAE (%), RMSE (%), R2
ANFIS-GP1 Rg 13.05 19.23 0.919 9.49 13.94 0.956
ANFIS-GP2 Ta 27.00 37.07 0.699 25.36 34.55 0.744
ANFIS-GP3 Rg, Ta 11.73 17.11 0.936 10.36 13.90 0.958
ANFIS-GP4 Rg, Ta, Pa 10.62 15.83 0.945 9.45 12.95 0.964
FG1 Rg 12.46 17.70 0.931 8.39 11.78 0.969
FG2 Ta 19.30 24.54 0.868 20.47 25.17 0.873
FG3 Rg, Ta 9.96 14.27 0.955 8.83 12.22 0.968
FG4 Rg, Ta, Pa 9.93 14.16 0.956 9.27 12.37 0.969
FG5 Rg, Ta, Pa, Ws 8.30 11.70 0.97 8.46 11.67 0.978
M5Tree1 Rg 11.28 16.63 0.939 9.92 12.91 0.962
M5Tree2 Ta 15.48 20.93 0.904 20.14 26.67 0.853
M5Tree3 Rg, Ta 7.74 11.49 0.971 11.53 15.72 0.951
M5Tree4 Rg, Ta, Pa 7.05 10.62 0.975 10.69 15.21 0.957
M5Tree5 Rg, Ta, Pa, Ws 6.66 10.24 0.977 11.42 16.67 0.954
Station 58238
Model | Inputs | Training: MAE (%), RMSE (%), R2 | Testing: MAE (%), RMSE (%), R2
ANFIS-GP1 Rg 17.53 23.06 0.767 14.26 17.84 0.921
ANFIS-GP2 Ta 16.91 22.92 0.77 16.03 20.37 0.824
ANFIS-GP3 Rg, Ta 11.08 15.92 0.889 8.58 11.27 0.957
ANFIS-GP4 Rg, Ta, Pa 10.99 15.50 0.895 9.05 12.09 0.948
FG1 Rg 17.08 22.82 0.772 13.26 16.84 0.928
FG2 Ta 16.11 21.59 0.796 15.62 19.62 0.836
FG3 Rg, Ta 10.58 15.07 0.897 8.22 11.23 0.953
FG4 Rg, Ta, Pa 10.51 15.07 0.901 8.85 11.77 0.949
FG5 Rg, Ta, Pa, Ws 9.73 14.10 0.913 9.87 13.01 0.951
M5Tree1 Rg 16.65 22.37 0.781 13.81 18.31 0.904
M5Tree2 Ta 12.26 17.39 0.868 14.60 20.12 0.823
M5Tree3 Rg, Ta 8.10 12.21 0.935 9.85 13.08 0.927
M5Tree4 Rg, Ta, Pa 7.98 11.79 0.939 10.96 14.35 0.912
M5Tree5 Rg, Ta, Pa, Ws 7.23 10.84 0.949 10.62 14.31 0.927

Station 58321
Model | Inputs | Training: MAE (%), RMSE (%), R2 | Testing: MAE (%), RMSE (%), R2
ANFIS-GP1 Rg 21.14 29.69 0.644 12.21 15.44 0.923
ANFIS-GP2 Ta 17.13 22.68 0.792 17.29 22.58 0.768
ANFIS-GP3 Rg, Ta 14.13 19.57 0.845 12.30 16.17 0.881
ANFIS-GP4 Rg, Ta, Pa 14.03 19.37 0.848 12.02 15.92 0.885
FG1 Rg 19.23 25.31 0.741 11.61 14.96 0.9
FG2 Ta 15.92 21.04 0.821 16.49 21.76 0.785
FG3 Rg, Ta 13.12 17.39 0.878 11.04 14.43 0.905
FG4 Rg, Ta, Pa 12.95 17.08 0.882 11.25 14.68 0.902
FG5 Rg, Ta, Pa, Ws 10.41 13.45 0.927 13.99 17.48 0.906
M5Tree1 Rg 17.10 23.04 0.786 12.64 15.94 0.888
M5Tree2 Ta 12.95 18.00 0.869 17.77 23.75 0.749
M5Tree3 Rg, Ta 9.32 12.44 0.937 12.21 15.87 0.89
M5Tree4 Rg, Ta, Pa 8.67 12.08 0.941 12.02 15.60 0.894
M5Tree5 Rg, Ta, Pa, Ws 7.87 11.04 0.951 12.41 15.48 0.9
Station 58362
Model | Inputs | Training: MAE (%), RMSE (%), R2 | Testing: MAE (%), RMSE (%), R2
ANFIS-GP1 Rg 15.31 19.74 0.776 11.48 13.62 0.922
ANFIS-GP2 Ta 15.13 19.71 0.777 18.13 22.85 0.746
ANFIS-GP3 Rg, Ta 9.71 13.69 0.893 9.26 11.50 0.942
ANFIS-GP4 Rg, Ta, Pa 9.61 13.44 0.897 8.98 11.17 0.948
FG1 Rg 14.88 19.39 0.784 10.92 13.54 0.918
FG2 Ta 14.26 18.14 0.811 17.10 21.08 0.787
FG3 Rg, Ta 9.28 13.22 0.9 8.70 11.10 0.946
FG4 Rg, Ta, Pa 9.41 13.32 0.898 8.60 11.22 0.943
FG5 Rg, Ta, Pa, Ws 8.86 12.67 0.908 8.02 10.42 0.953
M5Tree1 Rg 13.56 18.39 0.806 12.18 15.74 0.882
M5Tree2 Ta 11.92 15.60 0.861 16.04 20.55 0.795
M5Tree3 Rg, Ta 7.32 10.83 0.933 9.89 12.89 0.924
M5Tree4 Rg, Ta, Pa 7.04 10.35 0.938 9.81 12.64 0.927
M5Tree5 Rg, Ta, Pa, Ws 6.62 10.08 0.942 9.89 12.84 0.923
using input data from station 57461 and the optimal FG3 model. However, the optimal ANFIS-GP1 model performed better than the optimal FG5 and M5Tree1 models (Table 9). Adding the Ws parameter to input combinations generally increased the model accuracy levels of Application I (for cases not involving local inputs). Test statistics of the ANFIS-GP, FG and M5Tree models are compared further in Tables 10 and 11 for Application II (for cases not involving local input and output data). The best models were obtained when Rg, Ta and Pa were used as inputs. The highest level of model accuracy (the optimal FG3 model) was obtained when estimating Ep at station 57494 using input and output data from station 57461. In other cases described in Table 11, models with only Rg input data also achieved relatively high levels of accuracy. The optimal FG1 model produced better estimates than the ANFIS-GP1 and M5Tree1 models (MAE, RMSE and R2 values of 9.77%, 12.78% and 0.929, respectively), and adding the Ws parameter to the input combinations decreased the model accuracy of cross-station Application II.
Estimates of the FG, ANFIS-GP and M5Tree models for cross-station Applications I and II are further shown in Figure 9; all of the models generally underestimated peak values. The FG models generally produced less scattered estimates than the ANFIS-GP and M5Tree models for both applications. For the second part of Application I (estimating the Ep of station 57494 using input data from station 58362), the results seem to be worse than those of the previous application (models based on input data of station 57461). This may be because station 57461 is closer to station 57494 than station 58362 is. It is clear from Figure 9 that the M5Tree estimates are more scattered than those of the FG and ANFIS-GP models. Comparing Figure 9 to Figure 7 also clearly reveals that the models obtained without local input and output data generally provided more scattered estimates than the models with local data.
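The difference between the two cross-station set-ups can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure. A one-variable least-squares line stands in for the FG, ANFIS-GP and M5Tree models; all numbers are made up. Application I is read here as training on neighbour inputs paired with target-station Ep, and Application II as training on neighbour inputs paired with neighbour Ep, with both evaluated against target-station Ep. All function and variable names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    slope, intercept = model
    return [slope * xi + intercept for xi in x]

def mean_abs_error(observed, estimated):
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)

# Made-up training data: neighbour-station Rg, neighbour Ep and target Ep.
neighbour_rg_train = [6.0, 9.0, 13.0, 17.0, 20.0, 15.0]
neighbour_ep_train = [0.3 * rg + 0.9 for rg in neighbour_rg_train]
target_ep_train = [0.3 * rg + 0.5 for rg in neighbour_rg_train]

# Made-up test data: neighbour inputs and the target-station Ep to reproduce.
neighbour_rg_test = [8.0, 12.0, 18.0]
target_ep_test = [0.3 * rg + 0.5 for rg in neighbour_rg_test]

# Application I: no local inputs, but local Ep is available for training.
app1_model = fit_line(neighbour_rg_train, target_ep_train)
# Application II: neither local inputs nor local Ep are used for training.
app2_model = fit_line(neighbour_rg_train, neighbour_ep_train)

app1_mae = mean_abs_error(target_ep_test, predict(app1_model, neighbour_rg_test))
app2_mae = mean_abs_error(target_ep_test, predict(app2_model, neighbour_rg_test))
```

In this toy data the neighbour Ep is offset from the target Ep by a constant, so the Application II stand-in inherits that offset as error. The reported results in Tables 8 to 11 do not follow such a simple pattern, and the sketch says nothing about which application performs better in practice.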

Figure 7. The observed and estimated Ep values using optimal ANFIS-GP, FG and M5Tree models for each station (Part 1). [Colour figure can be viewed at wileyonlinelibrary.com].
Scatterplot panels (x-axis: observed Ep (mm); y-axis: estimated Ep (mm)) with fitted lines:
57461: ANFIS-GP3, y = 1.055x + 0.01, R2 = 0.949; FG3, y = 1.056x + 0.039, R2 = 0.94; M5Tree3, y = 1.008x + 0.13, R2 = 0.929
57494: ANFIS-GP4, y = 0.923x + 0.572, R2 = 0.905; FG4, y = 1.942x + 0.537, R2 = 0.908; M5Tree4, y = 0.95x + 0.526, R2 = 0.909
57516: ANFIS-GP5, y = 1.011x + 0.536, R2 = 0.968; FG5, y = 1.048x + 0.01, R2 = 0.978; M5Tree4, y = 1.014x + 0.109, R2 = 0.957
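The fitted line and R2 printed in each scatterplot panel can be reproduced from paired observed and estimated values with an ordinary least-squares fit. The sketch below assumes that this is how the panel annotations were obtained. The helper names are hypothetical, and the sample points are synthetic values generated from the line printed for the 57494 ANFIS-GP4 panel.

```python
def fit_line(observed, estimated):
    """Least-squares line estimated = slope * observed + intercept, plus R2."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(estimated) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in estimated)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, estimated))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2

def crossover(slope, intercept):
    """Observed value at which a fitted line with slope < 1 meets the 1:1 line.

    Above this value the fitted line lies below the 1:1 line, i.e. high
    observed values are underestimated on average.
    """
    return intercept / (1.0 - slope)

# Synthetic points lying exactly on y = 0.923x + 0.572 (57494, ANFIS-GP4 panel).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
estimated = [0.923 * x + 0.572 for x in observed]
slope, intercept, r2 = fit_line(observed, estimated)
peak_threshold = crossover(slope, intercept)
```

A slope below 1 combined with a positive intercept means that low values are overestimated and high values are underestimated, which is one way to read the underestimation of peaks noted for these scatterplots.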
Figure 8. The observed and estimated Ep values using optimal ANFIS-GP, FG and M5Tree models for each station (Part 2).
Scatterplot panels (x-axis: observed Ep (mm); y-axis: estimated Ep (mm)) with fitted lines:
58238: ANFIS-GP3, y = 0.869x + 0.438, R2 = 0.957; FG3, y = 0.892x + 0.335, R2 = 0.953; M5Tree3, y = 0.908x + 0.321, R2 = 0.927
58321: ANFIS-GP4, y = 0.753x + 1.047, R2 = 0.923; FG4, y = 0.907x + 0.411, R2 = 0.905; M5Tree4, y = 0.947x + 0.245, R2 = 0.89
58362: ANFIS-GP5, y = 0.899x + 0.55, R2 = 0.95; FG5, y = 0.928x + 0.443, R2 = 0.953; M5Tree4, y = 0.86x + 0.658, R2 = 0.927

Table 8. Comparison of ANFIS-GP, FG and M5Tree models for predicting Ep at station ID 57494 using input data of 57461 – Application I (without local input data).
Model | Inputs | Training: MAE (%), RMSE (%), R2 | Testing: MAE (%), RMSE (%), R2
ANFIS-GP1 Rg 16.02 21.15 0.839 18.43 22.77 0.846
ANFIS-GP2 Ta 17.24 23.04 0.809 19.86 25.02 0.817
ANFIS-GP3 Rg, Ta 10.91 15.36 0.915 15.96 20.90 0.887
ANFIS-GP4 Rg, Ta, Pa 9.94 14.09 0.929 15.98 21.45 0.886
FG1 Rg 15.89 20.90 0.843 19.09 23.02 0.838
FG2 Ta 12.92 16.91 0.897 17.77 22.11 0.863
FG3 Rg, Ta 9.92 13.68 0.933 14.72 19.75 0.895
FG4 Rg, Ta, Pa 9.81 13.58 0.934 16.42 22.41 0.872
FG5 Rg, Ta, Pa, Ws 9.61 13.30 0.936 15.24 20.54 0.894
M5Tree1 Rg 13.32 18.54 0.877 19.86 24.97 0.809
M5Tree2 Ta 11.37 15.33 0.916 17.08 21.86 0.861
M5Tree3 Rg, Ta 7.98 11.37 0.954 15.66 21.56 0.871
M5Tree4 Rg, Ta, Pa 7.32 10.73 0.959 16.20 22.38 0.861
M5Tree5 Rg, Ta, Pa, Ws 6.69 9.56 0.967 16.40 21.81 0.864
Table 9. Comparison of ANFIS-GP, FG and M5Tree models for predicting Ep at station ID 57494 using input data of 58362 – Application I (without local input data).
Model | Inputs | Training: MAE (%), RMSE (%), R2 | Testing: MAE (%), RMSE (%), R2
ANFIS-GP1 Rg 13.03 17.35 0.884 10.97 13.80 0.932
ANFIS-GP2 Ta 21.07 27.81 0.701 20.41 26.58 0.721
ANFIS-GP3 Rg, Ta 15.74 21.99 0.813 14.63 20.86 0.831
ANFIS-GP4 Rg, Ta, Pa 15.08 20.79 0.833 12.66 17.63 0.879
FG1 Rg 19.48 26.78 0.723 15.06 22.55 0.779
FG2 Ta 19.89 26.15 0.736 20.69 25.92 0.745
FG3 Rg, Ta 15.46 21.50 0.821 13.92 19.95 0.854
FG4 Rg, Ta, Pa 15.16 20.96 0.831 13.00 18.49 0.869
FG5 Rg, Ta, Pa, Ws 15.05 20.63 0.836 12.80 17.61 0.874
M5Tree1 Rg 19.13 26.53 0.728 14.52 21.72 0.792
M5Tree2 Ta 17.90 24.10 0.776 22.35 29.01 0.694
M5Tree3 Rg, Ta 12.05 18.33 0.87 16.69 24.21 0.792
M5Tree4 Rg, Ta, Pa 10.98 16.48 0.895 17.20 23.89 0.808
M5Tree5 Rg, Ta, Pa, Ws 10.30 15.63 0.906 17.12 23.46 0.804
Table 10. Comparison of ANFIS-GP, FG and M5Tree models for predicting Ep at station ID 57494 using input and output data of 57461 – Application II (without local input and output data).
Model | Inputs | Training: MAE (%), RMSE (%), R2 | Testing: MAE (%), RMSE (%), R2
ANFIS-GP1 Rg 13.05 17.46 0.884 18.46 22.41 0.839
ANFIS-GP2 Ta 17.94 24.03 0.779 17.96 23.13 0.814
ANFIS-GP3 Rg, Ta 10.14 14.27 0.922 14.97 18.73 0.889
ANFIS-GP4 Rg, Ta, Pa 9.33 13.57 0.929 12.36 17.85 0.889
FG1 Rg 12.89 17.32 0.885 18.68 22.55 0.838
FG2 Ta 15.36 20.36 0.841 16.37 20.57 0.868
FG3 Rg, Ta 9.87 13.78 0.927 14.97 18.15 0.901
FG4 Rg, Ta, Pa 9.30 13.21 0.933 13.57 17.50 0.894
FG5 Rg, Ta, Pa, Ws 9.03 12.51 0.94 12.94 17.55 0.905
M5Tree1 Rg 11.15 15.31 0.91 18.92 22.93 0.833
M5Tree2 Ta 12.32 16.99 0.89 18.90 25.13 0.844
M5Tree3 Rg, Ta 7.64 10.52 0.958 18.54 24.69 0.857
M5Tree4 Rg, Ta, Pa 6.74 9.71 0.964 15.74 22.71 0.862
M5Tree5 Rg, Ta, Pa, Ws 6.53 9.52 0.965 16.26 23.84 0.871

Table 11. Comparison of ANFIS-GP, FG and M5Tree models for predicting Ep at station ID 57494 using input and output data of 58362 – Application II (without local input and output data).
Model | Inputs | Training: MAE (%), RMSE (%), R2 | Testing: MAE (%), RMSE (%), R2
ANFIS-GP1 Rg 15.30 19.74 0.777 11.35 13.92 0.927
ANFIS-GP2 Ta 15.15 19.71 0.778 24.21 28.55 0.825
ANFIS-GP3 Rg, Ta 9.72 13.68 0.893 12.55 14.92 0.945
ANFIS-GP4 Rg, Ta, Pa 9.39 13.31 0.899 12.78 16.35 0.919
FG1 Rg 14.93 19.41 0.785 9.77 12.78 0.929
FG2 Ta 14.28 18.17 0.811 23.04 27.04 0.862
FG3 Rg, Ta 14.10 19.61 0.821 13.92 19.95 0.854
FG4 Rg, Ta, Pa 9.62 13.43 0.897 15.03 18.98 0.903
FG5 Rg, Ta, Pa, Ws 8.92 12.88 0.905 10.63 15.66 0.905
M5Tree1 Rg 13.56 18.39 0.806 10.55 13.43 0.926
M5Tree2 Ta 11.91 15.60 0.861 22.46 27.44 0.826
M5Tree3 Rg, Ta 7.35 10.84 0.933 12.95 16.03 0.91
M5Tree4 Rg, Ta, Pa 7.05 10.37 0.939 14.00 17.12 0.879
M5Tree5 Rg, Ta, Pa, Ws 6.63 10.09 0.942 15.80 19.49 0.861
Figure 9. The observed and estimated Ep values using optimal ANFIS-GP, FG and M5Tree models for each station during cross validation stage. [Colour figure can be viewed at wileyonlinelibrary.com].
Scatterplot panels (x-axis: observed Ep (mm); y-axis: estimated Ep (mm)) with fitted lines:
Application I, 57461: ANFIS-GP3, y = 0.957x + 0.527, R2 = 0.887; FG3, y = 0.961x + 0.472, R2 = 0.895; M5Tree3, y = 0.949x + 0.523, R2 = 0.871
Application I, 58362: ANFIS-GP, y = 0.892x + 0.553, R2 = 0.887; FG, y = 0.916x + 0.47, R2 = 0.876; M5Tree, y = 0.968x + 0.371, R2 = 0.804
Application II, 57461: ANFIS-GP, y = 0.88x + 0.33, R2 = 0.889; FG, y = 0.941x + 0.01, R2 = 0.905; M5Tree, y = 1.033x + 0.1, R2 = 0.862
Application II, 58362: ANFIS-GP, y = 0.811x + 0.644, R2 = 0.927; FG, y = 0.883x + 0.328, R2 = 0.929; M5Tree, y = 0.868x + 0.344, R2 = 0.926
4.3. Estimating Ep at six stations using generalized ANFIS-GP, M5Tree and FG models
In this part of the study, optimal FG, ANFIS-GP and M5Tree models were obtained using training data from six stations and were tested at the above stations. Training and testing statistics of the applied models are compared in Table 12. The four-input FG model performed better than the ANFIS-GP and M5Tree models, and adding the Pa or Ws parameters to the input combinations slightly increased the accuracy of the FG and ANFIS-GP

Table 12. Comparison of ANFIS-GP, FG and M5Tree models for predicting Ep at all stations.
Model | Inputs | Training: MAE (%), RMSE (%), R2 | Testing: MAE (%), RMSE (%), R2
ANFIS-GP1 Rg 16.49 23.30 0.796 12.78 16.98 0.898
ANFIS-GP2 Ta 21.35 28.26 0.7 22.68 28.98 0.695
ANFIS-GP3 Rg, Ta 12.99 18.44 0.872 12.55 15.98 0.906
ANFIS-GP4 Rg, Ta, Pa 12.29 17.66 0.883 12.23 15.77 0.908
FG1 Rg 16.60 23.12 0.799 12.83 16.88 0.899
FG2 Ta 19.23 25.36 0.758 21.36 27.23 0.734
FG3 Rg, Ta 12.57 17.82 0.881 12.57 16.27 0.902
FG4 Rg, Ta, Pa 12.03 16.99 0.891 12.41 15.88 0.907
FG5 Rg, Ta, Pa, Ws 11.56 16.34 0.9 11.89 15.41 0.912
M5Tree1 Rg 15.64 21.77 0.822 12.94 17.32 0.893
M5Tree2 Ta 17.38 23.21 0.797 21.72 27.79 0.727
M5Tree3 Rg, Ta 9.77 14.05 0.926 12.86 17.30 0.891
M5Tree4 Rg, Ta, Pa 8.21 11.95 0.946 13.15 17.90 0.886
M5Tree5 Rg, Ta, Pa, Ws 7.79 11.45 0.951 13.23 17.74 0.887
Figure 10. The observed and estimated Ep values using optimal ANFIS-GP, FG and M5Tree models for data from all stations.
Scatterplot panels (x-axis: observed Ep (mm); y-axis: estimated Ep (mm)) with fitted lines:
ALL: ANFIS-GP5, y = 0.9x + 0.463, R2 = 0.908; FG5, y = 0.915x + 0.399, R2 = 0.912; M5Tree3, y = 0.929x + 0.339, R2 = 0.891
models. Ta inputs alone seemed to be insufficient for obtaining an accurate generalized Ep model at these stations. Ep estimates from the optimal generalized models are shown in Figure 10, and the FG model generated less scattered estimates than the other models. Similar to previous applications, underestimations of high Ep values are clearly evident in the scatterplots. The generalized models with Rg and Ta inputs provided reliable estimation results. The FG and ANFIS-GP models were composed of four inputs with two MFs each; thus, each model had 2^4 (16) rules and corresponding constant values (consequent parameters). The rule base and consequent parameters used in the optimal FG and ANFIS-GP models are provided in Tables 13 and 14, respectively. As an example, when Rg, Ta, Pa and Ws are in1mf1, in2mf1, in3mf1 and in4mf1, respectively (the first rule), then Ep is 0.3123 under the optimal generalized FG model. For the FG and ANFIS-GP models, final outputs were obtained by calculating weighted averages of the 16 consequent parameters (Table 14), and the consequent parameters of

Table 13. The rule base of the optimal generalized FG and ANFIS-GP models for predicting Ep at all stations.
Rules
1. If (Rg is in1mf1) and (Ta is in2mf1) and (Pa is in3mf1) and (Ws is in4mf1) then (Ep is out1mf1) (1)
in1mf1 denotes the first membership function of the first input, and so on. The units of Rg, Ta, Pa, Ws and Ep are MJ m−2, °C, hPa, m s−1 and mm, respectively.
Table 14. The consequent parameters of the optimal generalized FG and ANFIS-GP models for predicting Ep at all stations.
Rule Parameter Rule Parameter Rule Parameter Rule Parameter

Four-input FG
Rule 1 y1 = 0.3123 Rule 5 y5 = 0.8877 Rule 9 y9 = 0.1475 Rule 13 y13 = −0.3395
Rule 2 y2 = 0.9275 Rule 6 y6 = 0.9758 Rule 10 y10 = 0.7134 Rule 14 y14 = 0.4803
Rule 3 y3 = −0.0475 Rule 7 y7 = −1.548 Rule 11 y11 = −1.101 Rule 15 y15 = 0.8763
Rule 4 y4 = 0.3520 Rule 8 y8 = 0.5816 Rule 12 y12 = 5.362 Rule 16 y16 = −7.611
Four-input ANFIS-GP
Rule 1 y1 = −2.788 Rule 5 y5 = 0.6791 Rule 9 y9 = 3.768 Rule 13 y13 = 3.719
Rule 2 y2 = −0.8847 Rule 6 y6 = 3.548 Rule 10 y10 = 6.949 Rule 14 y14 = 9.595
Rule 3 y3 = 15.26 Rule 7 y7 = −0.489 Rule 11 y11 = −11.61 Rule 15 y15 = 6.754
Rule 4 y4 = 5.929 Rule 8 y8 = 3.834 Rule 12 y12 = 25.41 Rule 16 y16 = 3.320
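The way 16 constant consequents combine into one output can be sketched as a zero-order Sugeno-type weighted average. The consequent values below are the four-input FG parameters from Table 14. Everything else is an assumption made for illustration: the inputs are taken as normalized to [0, 1], the two membership functions per input are simple complementary ramps (the fitted MF shapes and parameters are not given in this section), rule firing strengths use the product of membership grades, and rules are assumed to be numbered in the order produced by `itertools.product`, which is consistent only with rule 1 as listed in Table 13.

```python
from itertools import product

# Consequent constants y1..y16 of the four-input FG model (Table 14).
FG_CONSEQUENTS = [
    0.3123, 0.9275, -0.0475, 0.3520,
    0.8877, 0.9758, -1.548, 0.5816,
    0.1475, 0.7134, -1.101, 5.362,
    -0.3395, 0.4803, 0.8763, -7.611,
]

def memberships(x):
    """Hypothetical complementary MFs for one normalized input in [0, 1]."""
    x = min(max(x, 0.0), 1.0)
    return (1.0 - x, x)  # (grade in mf1, grade in mf2)

def sugeno_output(inputs, consequents=FG_CONSEQUENTS):
    """Weighted average of rule consequents, weighted by firing strength."""
    grades = [memberships(x) for x in inputs]
    numerator = 0.0
    denominator = 0.0
    # One rule per combination of MFs: 2 MFs ** 4 inputs = 16 rules.
    for rule_idx, combo in enumerate(product(range(2), repeat=len(inputs))):
        strength = 1.0
        for input_idx, mf_idx in enumerate(combo):
            strength *= grades[input_idx][mf_idx]
        numerator += strength * consequents[rule_idx]
        denominator += strength
    return numerator / denominator

n_rules = len(list(product(range(2), repeat=4)))
first_rule_only = sugeno_output([0.0, 0.0, 0.0, 0.0])
```

With all four inputs fully in their first membership function, only rule 1 fires and the sketch returns its consequent, 0.3123, matching the example given in the text for the generalized FG model. For intermediate inputs several rules fire at once, and the output is a blend of their constants.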
the ANFIS-GP model seemed to be more effective than premise parameters of the FG model.
In general, the FG and ANFIS-GP models performed better than M5Tree at estimating the Ep values at stations in the Yangtze River Basin. The lower accuracy of the M5Tree may be attributed to its linear nature, whereby it assumes constant variance throughout the data range. FG was found to produce better estimation results than ANFIS-GP. Both the FG and ANFIS-GP models use fuzzy approaches that can accurately describe non-linear relationships. The ANFIS-GP model uses the gradient descent method to compute the MF parameters and weights (Jang, 1993). When using gradient-based methods, it is common to become trapped in a local minimum (Sudheer et al., 2003). The FG model uses a GA to calculate the MF parameters. A GA is a type of search method that balances surveying the search space against identifying the best solution. This approach has been found to be more robust than the gradient-based methods because it can find the global optimum without becoming trapped in local optima (Karterakis et al., 2007; Kisi, 2015). The Rg variable was found to be more effective at modelling Ep than Ta even though low correlations existed between Ta and Ep at some stations. In some cases, adding Pa to the inputs did not improve model performance even though there were high correlations between Pa and Ep. All these analysis results reveal a non-linear relationship between each climatic variable and Ep that cannot be explained from statistical indices. Additionally, we found a few extreme statistical values due to the presence of high or low meteorological values. For example, the RMSE for station 57494 reached 50% in July of 1998 as a result of extreme radiation values. This study focused on evaluating and comparing the general applicability of the above models, and the extreme events will be investigated in upcoming studies.

5. Conclusion

This study investigated and compared the capacities of three different soft computing methods (ANFIS-GP, FG and M5Tree) in modelling the monthly Ep values of stations along the Yangtze River Basin in China. Climatic data (Rg, Ta, Pa and Ws) obtained from six stations (57461, 57494, 57516, 58238, 58321 and 58362) were used as inputs to the models. In the first part of the study, the above models with different local input combinations were compared at each station, and the results indicate that the three models were most accurate when applied to station
57461 using input combinations of Rg and Ta. Rg was found to be the most effective parameter for modelling Ep at most stations, and on occasion, adding inputs of Pa or Ws variables could decrease the model accuracy. The FG model generally performed better than the ANFIS-GP and M5Tree models. For example, the lowest MAE (8.02%) and RMSE (10.42%) values and the highest R2 (0.953) value were obtained at station 58362 when using the FG5 model. In the second part of the study, the cross validation method was employed, and the Ep values were estimated using inputs or outputs from other stations. Under Application I, the monthly Ep at station 57494 was estimated using input data from stations 57461 and 58362. The test results indicate that the FG model performed better than the ANFIS-GP and M5Tree models in terms of the RMSE, MAE and R2 criteria. In Application II, the Ep at station 57494 was further estimated using input and output data from stations 57461 and 58362, respectively. Comparisons also reveal that the FG models with different input combinations performed better than the other methods. The third part of the study focused on estimating the Ep values of all stations using generalized FG, ANFIS-GP and M5Tree models, and the results further indicate that a single generalized FG model can be successfully used to predict Ep using input combinations of Rg and Ta.
In general, it was found that FG models can be successfully applied to estimate Ep both when using and when not using local input and output data. Such models can be used in water resources management practice to characterize other hydrological processes in the Yangtze River Basin (e.g. surface water–groundwater interactions, irrigation scheduling and terrestrial ecosystem carbon cycle modelling). Further studies will be performed to develop and test more soft computing techniques applied on hourly and daily scales.

Acknowledgements

This work was financially supported by National Natural Science Foundation of China (No.41601044), Natural Science Foundation for Distinguished Young Scholars of Hubei Province of China (No.2016CFA051) and the Special Fund for Basic Scientific Research of Central Colleges, China University of Geosciences, Wuhan (No.CUG150631), and the 111 Project (grant No. B08030). The authors would like to thank the China Meteorological Administration (CMA) for providing the meteorological and radiation data.

References

Cobaner M, Unal B, Kisi Ö. 2009. Suspended sediment concentration estimation by an adaptive neuro-fuzzy and neural network approaches by using hydrometeorological data. J. Hydrol. 367: 52–61.
Droogers P, Allen RG. 2002. Estimating reference evapotranspiration under inaccurate data conditions. Irrig. Drain. Syst. 16: 33–45.
Goyal MK. 2014. Modeling of sediment yield prediction using M5 model tree algorithm and wavelet regression. Water Resour. Manage. 28: 1991–2003.
Goyal MK, Ojha CSP. 2011. Estimation of scour downstream of a ski-jump bucket using support vector and M5 model tree. Water Resour. Manage. 25(9): 2177–2195.
Goyal MK, Bharti B, Quilty J, Adamowski J, Pandey A. 2014. Modeling of daily pan evaporation in subtropical climates using ANN, LS-SVR, Fuzzy Logic, and ANFIS. Expert Syst. Appl. 41(11): 5267–5276.
Guven A, Kisi Ö. 2011. Daily pan evaporation modeling using linear genetic programming technique. Irrig. Sci. 29(2): 135–145.
Han S, Xu D, Wang S. 2012. Decreasing potential evaporation trends in China from 1956 to 2005, accelerated in regions with significant agricultural influence? Agric. For. Meteorol. 154: 44–56.
Jang JSR. 1993. ANFIS, adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 23(3): 665–685.
Karterakis SM, Karatzas GP, Nikolos IK, Papadopoulou MP. 2007. Application of linear programming and differential evolutionary optimization methodologies for the solution of coastal subsurface water management problems subject to environmental criteria. J. Hydrol. 342(3–4): 270–282.
Kim S, Shiri J, Kisi O. 2012. Pan evaporation modeling using neural computing approach for different climatic zones. Water Resour. Manage. 26(11): 3231–3249.
Kim S, Shiri J, Kisi Ö, Singh VP. 2013. Estimating daily pan evaporation using different data-driven methods and lag-time patterns. Water Resour. Manage. 27(7): 2267–2286.
Kim S, Shiri J, Singh VP, Kisi Ö, Landeras G. 2015. Predicting daily pan evaporation by soft computing models with limited climatic data. Hydrol. Sci. J. 60(6): 1120–1136.
Kisi O. 2015. Pan evaporation modeling using least square support vector machine, multivariate adaptive regression splines and M5 model tree. J. Hydrol. 528: 312–320.
Kisi O, Tombul M. 2013. Modeling monthly pan evaporations using fuzzy genetic approach. J. Hydrol. 477: 203–212.
Kisi O, Zounemat-Kermani M. 2014. Comparison of two different adaptive neuro-fuzzy inference systems in modelling daily reference evapotranspiration. Water Resour. Manage. 28(9): 2655–2675.
Li Z, Chen Y, Shen Y, Liu Y, Zhang S. 2013. Analysis of changing pan evaporation in the arid region of Northwest China. Water Resour. Res. 49(4): 2205–2212.
Lin GF, Lin HY, Wu MC. 2013. Development of a support-vector-machine-based model for daily pan evaporation estimation. Hydrol. Process. 27(22): 3115–3127.
Liu B, Xu M, Henderson M, Gong W. 2004. A spatial analysis of pan evaporation trends in China, 1955–2000. J. Geophys. Res. 109(D15): D15102, doi: 10.1029/2004JD004511.
Liu X, Luo Y, Zhang D, Zhang M, Liu C. 2011. Recent changes in pan-evaporation dynamics in China. Geophys. Res. Lett. 38: L13404, doi: 10.1029/2011GL047929.
Majidi M, Alizadeh A, Farid A, Vazifedoust M. 2015. Estimating evaporation from lakes and reservoirs under limited data condition in a semi-arid region. Water Resour. Manage. 29(10): 3711–3733.
McCuen RH. 1998. Hydrologic Analysis and Design. Prentice Hall: Englewood Cliffs, NJ.
Penman HL. 1948. Natural evaporation from open water, bare soil and grass. Proc. R. Soc. A. Math. Phys. Eng. Sci. 193: 120–145.
Piri J, Amin S, Moghaddamnia A, Keshavarz A, Han D, Remesan R. 2009. Daily pan evaporation modeling in a hot and dry climate. J. Hydrol. Eng. 14(8): 803–811.
Preis A, Ostfeld A. 2008. A coupled model tree-genetic algorithm scheme for flow and water quality predictions in watersheds. J. Hydrol. 349: 364–375.
Priestley CHB, Taylor RJ. 1972. On the assessment of surface heat flux and evaporation using large scale parameters. Mon. Weather Rev. 100(2): 81–92.
Rahimikhoob A. 2009. Estimating daily pan evaporation using artificial neural network in a semi-arid environment. Theor. Appl. Climatol. 98(1–2): 101–105.
Rahimikhoob A. 2014. Comparison between M5 model tree and neural networks for estimating reference evapotranspiration in an arid environment. Water Resour. Manage. 28: 657–669.
Rosenberry DO, Winter TC, Buso DC, Likens GE. 2007. Comparison of 15 evaporation methods applied to a small mountain lake in the northeastern USA. J. Hydrol. 340: 149–166.
Sanikhani H, Kisi Ö, Nikpour MR, Dinpashoh Y. 2012. Estimation of daily pan evaporation using two different adaptive neuro-fuzzy computing techniques. Water Resour. Manage. 26(15): 4347–4365.
Shilo E, Ziv B, Shamir E, Rimmer A. 2015. Evaporation from Lake Kinneret, Israel, during hot summer days. J. Hydrol. 528: 264–275.
Shiri J, Nazemi AH, Sadraddini AA, Landeras G, Kisi O, Fakheri Fard A, Marti P. 2014a. Comparison of heuristic and empirical approaches for estimating reference evapotranspiration from limited inputs in Iran. Comput. Electron. Agric. 108: 230–241.
Shiri J, Sadraddini AA, Nazemi AH, Kisi O, Landeras G, Fakheri Fard A, Marti P. 2014b. Generalizability of gene expression programming-based approaches for estimating daily reference evapotranspiration in coastal stations of Iran. J. Hydrol. 508: 1–11.
Shiri J, Marti P, Singh VP. 2014c. Evaluation of gene expression programming approaches for estimating daily evaporation through spatial and temporal data scanning. Hydrol. Process. 28(3): 1215–1225.
Shirsath PB, Singh AK. 2010. A comparative study of daily pan evaporation estimation using ANN, regression and climate based models. Water Resour. Manage. 24(8): 1571–1581.
Stephens JC, Stewart EH. 1963. A comparison of procedures for computing evaporation and evapotranspiration. Publication 62, International Association of Scientific Hydrology. In International Union of Geodynamics and Geophysics, Berkeley, CA, 123–133.
Sudheer KP, Gosain AK, Ramasastri KS. 2003. Estimating actual evapotranspiration from limited climatic data, using neural computing technique. J. Irrig. Drain. Eng.-ASCE 129(3): 214–218.
Tan SBK, Shuy EB, Chua LHC. 2007. Modelling hourly and daily open-water evaporation rates in areas with an equatorial climate. Hydrol. Process. 21(4): 486–499.
Valiantzas JD. 2006. Simplified versions for the Penman evaporation equation using routine weather data. J. Hydrol. 331: 690–702.
Wang Y, Jiang T, Bothe O, Fraedrich K. 2007. Changes of pan evaporation and reference evapotranspiration in the Yangtze River basin. Theor. Appl. Climatol. 90: 13–23.
Wang LC, Kisi O, Zounemat-Kermani M, Li H. 2017. Pan evaporation modeling using six different heuristic computing methods in different climates of China. J. Hydrol. 544: 407–427.
Wei J, Knoche HR, Kunstmann H. 2015. Contribution of transpiration and evaporation to precipitation, an ET-Tagging study for the Poyang Lake region in Southeast China. J. Geophys. Res. 120(14): 6845–6864.
Witten IH, Frank E, Trigg L, Hall M, Holmes G, Cunningham SJ. 1999. Weka: practical machine learning tools and techniques with java implementations. In Proceedings of Emerging Knowledge Engineering and Connectionist-Based Information Systems, Dunedin, New Zealand, 192–196.
Xie H, Zhu X, Yuan DY. 2015. Pan evaporation modelling and changing attribution analysis on the Tibetan Plateau (1970–2012). Hydrol. Process. 29(9): 2164–2177.
Xu YP, Pan S, Gao C, Fu G, Chiang YM. 2016. Historical pan evaporation changes in the Qiantang River Basin, East China. Int. J. Climatol. 36: 1928–1942.
Yang H, Yang D. 2012. Climatic factors influencing changing pan evaporation across China from 1961 to 2001. J. Hydrol. 414: 184–193.
Zhang Q, Qi T, Li J, Singh VP, Wang Z. 2015. Spatiotemporal variations of pan evaporation in China during 1960–2005, changing patterns and causes. Int. J. Climatol. 35(6): 903–912.
Zounemat-Kermani M, Scholz M. 2013. Computing air demand using the Takagi–Sugeno model for dam outlets. Water 5(3): 1441–1456.
Zuo H, Li D, Hu Y, Bao Y, Lü S. 2005. Characteristics of climatic trends and correlation between pan-evaporation and environmental factors in the last 40 years over China. Chin. Sci. Bull. 50(12): 1235–1241.
