
Journal of Hydrology 403 (2011) 201–212

Contents lists available at ScienceDirect

Journal of Hydrology
journal homepage: www.elsevier.com/locate/jhydrol

Comparison of three artificial intelligence techniques for discharge routing


Rahman Khatibi a,⇑, Mohammad Ali Ghorbani b, Mahsa Hasanpour Kashani b, Ozgur Kisi c
a Consultant Mathematical Modeller, Swindon, UK
b Department of Water Engineering, Tabriz University, Tabriz, Iran
c Department of Civil Engineering, Erciyes University, Kayseri, Turkey

Article info

Article history:
Received 1 October 2010
Received in revised form 20 February 2011
Accepted 6 March 2011
Available online 25 March 2011

This manuscript was handled by A. Bardossy, Editor-in-Chief, with the assistance of Fi-John Chang, Associate Editor

Keywords:
Inter-comparison
Model pluralism
Discharge routing
Artificial intelligence modelling
GP, ANFIS, ANN
Kizilirmak

Summary

An inter-comparison of three artificial intelligence (AI) techniques is presented for river flow/stage timeseries that are otherwise handled by traditional discharge routing techniques. The models are an Artificial Neural Network (ANN), an Adaptive Neuro-Fuzzy Inference System (ANFIS) and Genetic Programming (GP), applied to discharge routing on the Kizilirmak River, Turkey. Daily mean river discharge data for the period 1999–2003 were used to train and test the models. The comparison includes both visual and parametric approaches, using statistics such as the Coefficient of Correlation (CC), Mean Absolute Error (MAE) and Mean Square Relative Error (MSRE), as well as a basic scoring system. Overall, the results indicate that ANN and ANFIS have mixed fortunes in discharge routing, and each differs in its ability to capture and reproduce the observed information. However, GP has an edge over the other two modelling approaches in most respects. Attention is given to the information content of the recorded timeseries in terms of peak values and their timing. One performance measure may capture some of this information but be ineffective for the rest. This makes a case for compiling a knowledge base for the various modelling techniques.

© 2011 Elsevier B.V. All rights reserved.

⇑ Corresponding author.
E-mail addresses: rahman_khatibi@yahoo.co.uk (R. Khatibi), cusp2004@yahoo.com (M.A. Ghorbani), mahsakashani2003@yahoo.com (M.H. Kashani), kisi@erciyes.edu.tr (O. Kisi).

0022-1694/$ - see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.jhydrol.2011.03.007

1. Introduction

The application of artificial intelligence (AI) techniques to modelling river-flow timeseries emerged over the last two decades and has reached the proof-of-concept stage as a supplement to traditional discharge routing techniques. These techniques address significant practical problems in flood risk management. Traditional routing techniques are top-down approaches and are limited by their inherent assumptions, whereas AI techniques are “bottom-up” data-driven approaches that exploit the information contained in the data. Both top-down and bottom-up practices should be used inclusively in a modelling culture, and there is an increasing consensus that each technique suits particular circumstances better than the others. Thus, just as it is important to test new modelling approaches, it is equally important to learn their strengths and weaknesses in a pluralistic modelling culture. The focus of this paper is on (i) an inter-comparison of three AI techniques, (ii) gaining an insight into their performance, (iii) avoiding the mindset of searching for a superior model, and (iv) advocating the selection of particular models for particular problems.

Flood routing equations have been transformed into sophisticated modelling software tools. Internationally, both freeware (e.g. HEC-RAS) and commercial applications are available, including DELFT-FEWS, ISIS and the MIKE series. The modelling engines of these applications are based on full or approximate solutions of the Saint–Venant equations, and since 2000 even 2D modelling has been feasible at professional organisations. However, they require extensive survey and hydrometric data, which are costly, and for various reasons the data requirements cannot always be met. Traditionally, simplified modelling techniques have been used to overcome the lack of data, but AI techniques can now also be employed. This paper uses such techniques to emulate the role of simplified discharge routing techniques.

The importance of flood studies throughout the world need not be overemphasised. Simplified techniques have restrictive assumptions, and therefore their applications may not give sensible results most of the time. Examples include the Muskingum (discharge routing) and Muskingum–Cunge methods and the variable parameter diffusion techniques. Timeseries analysis can also produce simulations similar to discharge routing, but without any significant restrictions. One problem is that a diversity of timeseries analysis models can be applied to discharge routing, but there is little

knowledge on their comparative performance. This paper applies three such techniques to predicting the outflow of a river reach from its given inflows: an Artificial Neural Network (ANN), an Adaptive Neuro-Fuzzy Inference System (ANFIS) and Genetic Programming (GP). The focus in using these techniques is their inter-comparison. The study uses daily river flow data for the Kizilirmak catchment in northern Turkey, one of the most important water resources in the country.

2. Literature review

AI techniques are data-driven modelling approaches. They are therefore bottom-up approaches, because they make no prior assumptions about the model structure. These models are intelligent in the sense that they use a proportion of the datapoints from the timeseries to identify its inherent structure and use the remaining datapoints to test their predictions. The strength of these approaches comes forth when they are compared with a host of routing models. Those models are top-down approaches, because they make assumptions about the model structure. Top-down models range from the Muskingum method of discharge routing to full hydrodynamic wave models based on the Saint–Venant equations. They also include blackbox unit hydrograph and conceptual models. All these top-down models have in common a set of equations derived from some assumptions. The more broad-brush (coarse-resolution) the assumptions, the less data-intensive the models become. However, their predictions then suffer from a greater degree of uncertainty, and they are less able to predict spatial distributions.

All these models are just prediction techniques, and each has its own strengths and weaknesses. This review focuses on three AI techniques, ANN, ANFIS and GP. It differentiates between work on the proof-of-concept and work on consolidating experience.

The proof-of-concept for the application of ANN is supported by a number of research works. These implement discharge routing models that perform comparatively well against conventional discharge routing models and account for the non-linearity inherent in the data, e.g. see Thirumalaiah and Deo (1995), Tawfik et al. (1997), Campolo et al. (1999), and Peters et al. (2006). Tawfik et al. (1997) modelled the hysteresis of the stage–discharge curve at a given stream section using ANN. Peters et al. (2006) approximated the HEC-RAS routing module with an ANN model for a 60 km stream length, and emphasised the advantage of using ANNs for forecasting.

Campolo et al. (1999) applied ANNs to flood forecasting in an upland catchment, but noted that almost all forecasted flood peaks were underestimated. Various research works have been reported to explain the symptomatic underestimation of extreme values by ANN models. These include:
(i) inadequate representation of high-flow datapoints in the training dataset, as suggested by Karunanithi et al. (1994);
(ii) the need to reduce the gap between high and low values in the model datapoints, e.g. Hsu et al. (1995) proposed log transformations of flow values to reduce the gap between high and low flow conditions; and
(iii) a lack of information in the network, as suggested by Dawson and Wilby (1998).

Although ANNs are quite powerful for modelling various real-world problems, they also have shortcomings. A major one is that performance is likely to be poor when they extrapolate outside the range of the data used in their training. Also, if the input data suffer from undue uncertainty, the performance of an ANN is expected to be poor, in which case a fuzzy system such as ANFIS may perform better. ANFIS networks require a twofold process: (i) both the firing strengths and the output coefficients have to be optimised; and (ii) the overall number of rules must be chosen carefully (Panella and Gallo, 2005).

Lee and Han (2005) compared the potential of different neurofuzzy models for real-time flood forecasting applications with three reference models:
(i) the traditional linear transfer function model (TF-ARX type);
(ii) naïve models, which set future values equal to current ones; and
(iii) a trend model, which predicts future values by linear extrapolation of the previous two flow values.

One of the neurofuzzy models uses a K-Nearest-Neighbour (KNN) validation scheme. KNN is one of the most fundamental and simple machine learning procedures for cases with little or no prior knowledge about the distribution of the data. They conclude that the KNN validation scheme has great potential and that the performance of a model improves when the back propagation neural network model is used. They also conclude that it performs better than the naïve and trend models, the linear transfer function model and the feed-forward neural network models. However, they devised a special testing dataset. Under this test condition, a linear transfer function model was more stable than all the neurofuzzy models for extreme flood events needing extrapolation beyond their training dataset. They conclude that AI models are generally incapable of extrapolating if the test event is very different from their training data.

The proof-of-concept for the application of GP to discharge routing problems is supported by a number of research works. These show that GP emulates flood forecasting problems and is a viable alternative to ANNs and traditional models. GP applications include Savic et al. (1999), Babovic and Keijzer (2002), Khu et al. (2001), Liong et al. (2002), and Aytek et al. (2008). Savic et al. (1999) highlight the advantages of GP over ANNs and conceptual models in flow prediction for the Kirkton catchment, UK. Khu et al. (2001) use GP under real-time conditions to forecast runoff from the Orgeval catchment in France. They measure its performance using observed and calculated values and compare it with that of other methods, such as the Kalman filter; their results indicate an acceptable accuracy for the GP model. Sheta and Mahmoud (2001) use GP to forecast the River Nile flow in Northern Sudan. Aytek and Kisi (2008) apply GP to modelling suspended sediment in streams, showing improved performance over conventional rating curves and multi-linear regression techniques.

Sivapragasam et al. (2007) investigate the performance of non-linear Muskingum models. They report that discharge routing techniques can produce erroneous results for events with complex storm profiles that result in multi-peaked hydrographs. They use GP to route both single-peaked and multi-peaked flood hydrographs quite accurately. The peaks are predicted very accurately and, unlike with the non-linear Muskingum model, there is no time lag in the occurrence of the peak values. The GP model is also found to be time invariant.

Overall, the reported academic research works focus on testing new techniques to provide evidence for the proof-of-concept of ANN, ANFIS and GP, but the consolidation of knowledge is left to practitioners. However, experience is not often transformed into a body of knowledge that everyone can tap into. The review shows that critical views have been presented:
(i) ANNs are liable to underestimation and are poor at extrapolation;
(ii) among neurofuzzy modelling approaches, an advanced training scheme can be more important than the model structure; and
(iii) observed events contain information on the shape of the whole hydrograph, the magnitude of the individual peaks and the timing of these peaks, and the relevant information on these features should be understood systematically.

Also, there is no consensus on the choice and use of performance measures, for which a large number of alternatives is available. This work
therefore compares these three techniques, focusing on the full information content of the predicted flows to qualify the performance of the three models.

3. Outline of the methods used

Ghorbani et al. (2010) argue that timeseries analysis is carried out in the following phases:

Phase 1: Review the data to be used in the timeseries analysis. If applicable, identify possible discontinuities in both the independent and dependent variables through chaos and catastrophe theory. Select an appropriate software application for the timeseries analysis. Divide the data into blocks of training, validation and application data, and prepare the datasets to suit the software applications.
Phase 2: Implement the timeseries analysis in the selected modelling application. Set the parameters to suit the selected method of timeseries analysis and software application, and produce results. This includes both training the model and using it in prediction mode.
Phase 3: Post-process the results for training, validation and application. If applicable, carry out appropriate sensitivity tests.

3.1. Artificial Neural Networks (ANNs)

ANNs are parallel information processing systems consisting of a set of neurons or nodes arranged in layers. Each node applies a suitable transfer (activation) function to its weighted inputs. Each layer consists of pre-designated neurons, and each neural network includes one or more of these interconnected layers. Fig. 1 represents a three-layered structure consisting of (i) an input layer, I, (ii) a hidden layer, H, and (iii) an output layer, O. Further information on ANNs can be found in e.g. Haykin (1999).

The type of ANN used in this study is a multi-layer feed-forward perceptron (MLP) trained with the back propagation learning algorithm. In these networks, the input layer accepts the data, the intermediate (hidden) layer processes them, and the output layer displays the resultant model outputs. During the modelling stage, the coefficients associated with the errors at the nodes are corrected by comparing the model outputs with the recorded (target) data.

The connection weights are first initialised by assigning small positive or negative random values. Training then proceeds as follows:

1. Input–output patterns are selected randomly from the training data presented to the ANN.
2. The actual network outputs are calculated for the current input after applying the activation function.
3. A performance measure is selected, e.g. the Mean Square Error (MSE), and its value is calculated.
4. The connection weights are adjusted to minimise the MSE.
5. Steps (2)–(4) are repeated for each input–output vector pair in the training dataset until no significant change in the MSE is detected.

At the completion of training, the final connection weights are kept fixed. New input patterns are then presented to the network to produce the corresponding outputs, consistent with the learnt internal representation of the input/output mapping.

3.2. Adaptive Neuro-Fuzzy Inference System (ANFIS)

An adaptive network is used to embed the Sugeno fuzzy model in its framework, which makes the Sugeno fuzzy model learnable and allows gradient vectors to be computed systematically. This network architecture is called the Adaptive-Network-based Fuzzy Inference System or Adaptive Neuro-Fuzzy Inference System (ANFIS). First introduced by Jang (1993), it is a universal approximator, capable of approximating any real continuous function on a compact set to any degree of accuracy. It identifies its parameters through a hybrid learning rule combining back propagation gradient descent with a least-squares method. It can serve as a basis for constructing a set of fuzzy “If–Then” rules with appropriate membership functions to generate the stipulated input–output pairs. Fig. 2 represents a typical ANFIS architecture, outlined as follows:

Layer 1: Every node in this layer is an adaptive node with a node function that may be a generalised bell membership function or a Gaussian membership function.
Layer 2: Every node in this layer is a fixed node labelled Π, representing the firing strength of each rule. It is calculated with the fuzzy AND connective, as the product of the incoming signals.
Layer 3: Every node in this layer is a fixed node labelled N, representing the normalised firing strength of each rule. The ith node calculates the ratio of the ith rule's firing strength to the sum of all the rules' firing strengths.
Layer 4: Every node in this layer is an adaptive node with a node function indicating the contribution of the ith rule to the overall output.
Layer 5: The single node in this layer is a fixed node labelled Σ, indicating the overall output as the summation of all incoming signals.
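For reference, the five layers can be written compactly for a first-order Sugeno model in the form usually associated with Jang (1993). The notation is ours rather than the paper's. Here $x_1,\dots,x_m$ are the inputs, and $A_{k,j}$ is the $j$th fuzzy set on input $k$, with premise parameters $(c,\sigma)$ for the Gaussian form or $(a,b,c)$ for the generalised bell. $j_k(i)$ is the fuzzy set of input $k$ used by rule $i$, and $p_{i,k}$, $r_i$ are the consequent parameters of rule $i$.

```latex
\begin{aligned}
\text{Layer 1:}\quad & \mu_{A_{k,j}}(x_k) = \exp\!\left[-\frac{(x_k - c_{k,j})^2}{2\sigma_{k,j}^2}\right]
  \quad\text{or}\quad
  \mu_{A_{k,j}}(x_k) = \frac{1}{1 + \left|\dfrac{x_k - c_{k,j}}{a_{k,j}}\right|^{2b_{k,j}}} \\
\text{Layer 2:}\quad & w_i = \prod_{k=1}^{m} \mu_{A_{k,\,j_k(i)}}(x_k) \\
\text{Layer 3:}\quad & \bar{w}_i = \frac{w_i}{\sum_{l} w_l} \\
\text{Layer 4:}\quad & \bar{w}_i f_i = \bar{w}_i \left( \sum_{k=1}^{m} p_{i,k}\, x_k + r_i \right) \\
\text{Layer 5:}\quad & y = \sum_{i} \bar{w}_i f_i
\end{aligned}
```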

The above comprise three different types of components, as follows (see Lughole, 2003):

1. Premise parameters: non-linear parameters that appear in the input membership functions.
2. Consequent parameters: linear parameters that appear in the rule consequents (output weights).
3. Rule structure: this needs to be optimised to achieve better linguistic interpretability.

Fig. 1. Multi layer perceptron neural network.
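The forward pass through the five ANFIS layers can be sketched in Python as follows. This is a minimal sketch that assumes Gaussian membership functions, a grid-partitioned rule base (one rule per combination of membership functions), the product as the fuzzy AND, and first-order (linear) Sugeno consequents. The names `gaussian_mf` and `anfis_forward` and the parameter layout are illustrative. They are not taken from the paper or from any toolbox, and training (the hybrid gradient-descent/least-squares rule) is omitted. The three component types map onto the sketch as follows: `premise` holds the non-linear premise parameters, `consequent` holds the linear consequent parameters, and the grid partition stands in for the rule structure.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# Premise parameters: one (centre, sigma) pair per membership function, per input.
Premise = List[List[Tuple[float, float]]]
# Consequent parameters: one row per rule, m linear coefficients followed by a constant.
Consequent = List[List[float]]


def gaussian_mf(x: float, centre: float, sigma: float) -> float:
    """Layer 1: Gaussian membership grade of x."""
    return math.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))


def anfis_forward(x: Sequence[float], premise: Premise, consequent: Consequent) -> float:
    """Forward pass of a first-order Sugeno ANFIS with a grid-partitioned rule base."""
    # Layer 1: membership grades for every input and each of its membership functions.
    grades = [[gaussian_mf(xk, c, s) for (c, s) in mfs] for xk, mfs in zip(x, premise)]

    # Rule structure (assumed): one rule per combination of membership functions.
    rules = list(itertools.product(*(range(len(mfs)) for mfs in premise)))
    if len(rules) != len(consequent):
        raise ValueError("need exactly one consequent parameter row per rule")

    # Layer 2: firing strength of each rule = product of its incoming grades.
    w = [math.prod(grades[k][j] for k, j in enumerate(rule)) for rule in rules]

    # Layer 3: normalised firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]

    # Layer 4: each rule contributes w_bar_i * (p_i . x + r_i).
    contributions = []
    for wb, params in zip(w_bar, consequent):
        *p, r = params
        f_i = sum(pk * xk for pk, xk in zip(p, x)) + r
        contributions.append(wb * f_i)

    # Layer 5: overall output is the sum of all incoming signals.
    return sum(contributions)
```

As a usage example, consider two inputs with two membership functions each, which gives four rules. If every rule has the consequent f = x1 + x2, the output for x = [0.3, 0.7] is approximately 1.0, because the normalised firing strengths sum to one. With three membership functions on each of the three inputs, as reported later for $I_t$, $I_{t-1}$ and $Q_{t-1}$, the same grid partition would give 27 rules.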
Fig. 2. A typical ANFIS architecture, see Moghaddamnia et al. (2009).

3.3. Genetic Programming (GP)

GP is truly a “bottom-up” process. It makes no assumption about the structure of the relationship between the independent and dependent variables, but identifies an appropriate relationship for any given timeseries. Two components make the construction of this relationship possible, and evolutionary processes can be emulated efficiently only when these components work hand-in-hand. These components are:
(i) the terminal set, i.e. the independent variables and the constants (parameters) of the functions, which emulates the role of proteins or chromosomes in biological systems; and
(ii) the function set of basic operators from which the parse tree is built. The function sets selected in this study are:

$$\{+,\ -,\ \times,\ x\} \qquad (1a)$$

$$\{+,\ -,\ \times,\ x^2\} \qquad (1b)$$

$$\{+,\ -,\ \times,\ x^3\} \qquad (1c)$$

The relationship between the independent and dependent variables is often referred to as the “model”, the “program” or the “solution”. Whatever the terminology, the relationship identified in a particular GP model is continually evolving and never fixed. The evolution starts from an initially selected random population of models, and the fitness of each model is evaluated using the values of the independent and dependent variables. As the population evolves from one generation to the next, new models replace the old ones by demonstrating better performance.

There are various selection methods. The method used in this study is Gene Expression Programming (GEP), which evolves computer programs of different sizes and shapes encoded in linear chromosomes of fixed length (Ferreira, 2001a,b). The chromosomes are composed of multiple genes, each gene encoding a smaller subprogram. Furthermore, the structural and functional organisation of the linear chromosomes allows important genetic operators, such as mutation, transposition and recombination, to operate unconstrained. GEP has been reported to be 100–10,000 times more efficient than GP systems (Ferreira, 2001a,b) for a number of reasons, including:
(i) the chromosomes are simple entities: linear, compact, relatively small and easy to manipulate genetically (replicate, mutate, recombine, etc.); and
(ii) the parse trees or expression trees are exclusively the expression of their respective chromosomes. They are the entities upon which selection acts, and they are selected according to fitness to reproduce with modification.

Applying operators such as crossover and mutation to the winners produces “children” or “offspring”. Crossover is responsible for maintaining identical features from one generation to the next, whereas mutation causes a random change in the parse tree; data mutation is also possible. This completes the operations for the initial generation, and the process is repeated until termination. Various software applications are now available for implementing GP models; GenXpro (Ferreira, 2001a,b) was used in this study. The parametric choices made in this study are given in Table A1, Appendix A.

4. Case study and data description

This study uses daily mean river discharge data from two stations (Stations 1501 and 1543) on the Kizilirmak River, Kizilirmak Basin, Turkey. The locations of the stations are shown in Fig. 3. Four years of observed data (1461 days) are available (1999–2003) for both stations. The daily timeseries at the upstream and downstream stations are given in Fig. 4. The daily discharge data for the period between 1999 and 2001 are used to train the models. The records for 2002 (Fig. 5) and 2003 (Fig. 6) are used for prediction and for assessing the performance of each AI modelling technique. Note that the 2001 records show a significant inflow from a possible tributary between the gauging stations. The models are trained without filtering out this period, which serves as a way to assess the flexibility of the predictions made by the AI techniques employed in this paper.

Fig. 3. Location of the stations (Stations: 1501 and 1543) on Kizilirmak River.
Fig. 4. Daily river timeseries at upstream and downstream stations (1999–2003).
Fig. 5. Multi-peaked hydrograph in Kizilirmak River (2002).

The performance of the models is evaluated by three goodness-of-fit measures: the Coefficient of Correlation (CC), expressed by (2); the Mean Absolute Error (MAE), expressed by (3); and the Mean Square Relative Error (MSRE), expressed by (4):

$$CC = \frac{\sum_{i=1}^{n}\left[(Q_o)_i - \overline{Q_o}\right]\left[(Q_e)_i - \overline{Q_e}\right]}{\sqrt{\sum_{i=1}^{n}\left[(Q_o)_i - \overline{Q_o}\right]^2 \, \sum_{i=1}^{n}\left[(Q_e)_i - \overline{Q_e}\right]^2}} \qquad (2)$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|(Q_e)_i - (Q_o)_i\right| \qquad (3)$$

$$MSRE = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{(Q_o)_i - (Q_e)_i}{(Q_o)_i}\right)^2 \qquad (4)$$

where Q is discharge and the subscripts ‘o’ and ‘e’ denote the observed and estimated values. A bar above a variable denotes its average value, i indexes the records, and n is the total number of records.

CC reflects the degree of linear association between the observed and estimated values, and so captures the information content in terms of the shape of the hydrograph. However, CC is insensitive to a systematic offset or scaling of the estimates, so a high CC does not by itself guarantee actual agreement between the values. MAE and MSRE can be used together to diagnose the variation in the errors in the results. Read alongside CC, MAE reveals whether the results suffer from a bias. MSRE measures the overall agreement between the observed and modelled datasets relative to the observed values, and is therefore dimensionless. MAE is expressed in the units of discharge. It is a non-negative number with no upper bound, and for a perfect model it would be zero. It provides no information about underestimation or overestimation and is not weighted towards higher- or lower-magnitude events. Instead, it evaluates all deviations from the observed values equally, regardless of their sign. MAE is proportional to the total sum of absolute residuals.
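Eqs. (2)–(4) translate directly into code. The following is a minimal pure-Python sketch, assuming the observed and estimated series are aligned, non-empty lists of equal length and that no observed discharge is zero, since MSRE divides by $Q_o$. The function names `cc`, `mae` and `msre` are illustrative, not from the paper.

```python
import math
from typing import Sequence


def _n_records(q_obs: Sequence[float], q_est: Sequence[float]) -> int:
    """Return n after checking that both series are non-empty and aligned."""
    if len(q_obs) == 0 or len(q_obs) != len(q_est):
        raise ValueError("observed and estimated series must be non-empty and equally long")
    return len(q_obs)


def cc(q_obs: Sequence[float], q_est: Sequence[float]) -> float:
    """Coefficient of Correlation, Eq. (2)."""
    n = _n_records(q_obs, q_est)
    mean_o = sum(q_obs) / n
    mean_e = sum(q_est) / n
    num = sum((o - mean_o) * (e - mean_e) for o, e in zip(q_obs, q_est))
    den = math.sqrt(
        sum((o - mean_o) ** 2 for o in q_obs) * sum((e - mean_e) ** 2 for e in q_est)
    )
    return num / den


def mae(q_obs: Sequence[float], q_est: Sequence[float]) -> float:
    """Mean Absolute Error, Eq. (3), in the units of discharge."""
    n = _n_records(q_obs, q_est)
    return sum(abs(e - o) for o, e in zip(q_obs, q_est)) / n


def msre(q_obs: Sequence[float], q_est: Sequence[float]) -> float:
    """Mean Square Relative Error, Eq. (4); dimensionless, assumes q_obs has no zeros."""
    n = _n_records(q_obs, q_est)
    return sum(((o - e) / o) ** 2 for o, e in zip(q_obs, q_est)) / n
```

As a usage example, take a hypothetical observed series [10, 20, 30, 40] and an estimate shifted upwards by 5 everywhere. This gives CC = 1.0 but MAE = 5.0, which illustrates why CC should be read together with MAE and MSRE when checking for bias.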
5. Model development and analysis

5.1. Setting up the model

The models were implemented after reviewing the recorded timeseries. This review highlighted the importance of the individual peaks and the multi-peak parts of the hydrographs for assessing the performance of each AI technique. The recorded data for 2002 in Fig. 5 show four events with distinctive peaks, reflecting significant hydrological activity possibly superimposed on groundwater-driven baseflow. Similarly, the record for 2003 in Fig. 4 shows a number of peaks reflecting hydrological activity. Only one of these has been selected to study model performance in simulating single-peak events. This exposes the models to severe tests.

Fig. 6. Single-peaked hydrograph in Kizilirmak River (2003).

5.2. Preliminary investigations

A linear model was selected to set the values of the various default parameters and to set the model structure, as follows:

$$Q_t = f(I_t,\ I_{t-1},\ Q_{t-1}) \qquad (5)$$

where $I_t$ = inflow at current time $t$, $I_{t-1}$ = inflow at time $t-1$ and $Q_{t-1}$ = outflow at time $t-1$.

Table 1
Comparing performance of basic operators of GP using CC, MAE and MSRE.

5.3. Preliminary implementation of ANN model

The ANN model was implemented in MATLAB. The data were processed by (i) scaling as per (6) and (ii) normalising (e.g. with the mapstd function) to give a mean of zero and a standard deviation of σ = 1.

$$Q_n = \frac{Q_i - \overline{Q}}{\sigma} \qquad (6)$$

where the subscripts ‘i’ and ‘n’ denote the ith discharge value and the normalised value, respectively. The best-fit model structure was determined from the performance criteria by trial and error. This led to a 3-9-1 structure: an input layer with three neurons, a hidden layer with nine neurons using the tan-sigmoid function, and an output layer with one neuron using a linear transfer function. The default training parameters are summarised in Table A1, Appendix A.

5.4. ANFIS model

The characteristics of the optimal ANFIS structure were obtained in this preliminary study. The preliminary modelling led to the identification of: input membership function type, output membership function; and number of input membership functions, 3, 3 and 3 for the inputs $I_t$, $I_{t-1}$ and $Q_{t-1}$. The default parameters are presented in Table A1, Appendix A.

Table 2
Input variables used for ANN (MLP) model.
Model      Input variables                      Output variable
Model 1    I(t)                                 Q(t)
Model 2    I(t), I(t − 1)                       Q(t)
Model 3    I(t), I(t − 1), Q(t − 1)             Q(t)
Model 4    I(t), I(t − 1), I(t − 2), Q(t − 1)   Q(t)

5.5. GP model

The data were reviewed, and the preliminary model structure given in (5) was constructed to select the GP functional operators expressed by (1a)–(1c). A preliminary investigation was carried
out to select the most appropriate expression, and the results are shown in Table 1. For all three basic function sets (1a)–(1c), model performance on the multi-peak data is significantly better than on the single-peak data. However, performance is not very sensitive to the choice between these function sets. Overall, (1c) performs better than the others and is used for the rest of the study. The default parameters are presented in Table A1, Appendix A.

5.6. Model structure

The model structure was first investigated using different combinations of the inflows and outflows of the river reach. Table 2 presents the model structures investigated. A preliminary set of results is shown in Table 3, in which GP uses (1c). According to these results, Models 1 and 2 perform poorly and are rejected, because some of their results contain negative flow values in the predicted timeseries. Model 3 performs better than Model 4, although not consistently, owing to one or two exceptions. The performance of Model 4 is satisfactory but not as good as that of Model 3. Generally, GP performs better than the ANN and ANFIS models, although minor inconsistencies are noted. Model 3 is therefore selected for its better overall performance and is used for the rest of the study.

Table 3
CC values for preliminary investigations.

6. Results

Figs. 7 and 8 provide a visual comparison of the outputs modelled by ANN, ANFIS and GP with the observed flows for the single- and multi-peaked hydrographs, respectively. It is informative to compare the information content captured by each model with the observed flows and, where possible, to relate the findings to the literature review. The findings are summarised as follows:
(i) Overall, the GP model performs better in these test runs than both the ANFIS and ANN models, which have mixed fortunes.
(ii) The performance of ANN is mixed. As shown in Fig. 7, it performs very poorly for the single-peaked record, severely overestimating the hydrograph (as explained later). In contrast, its performance for the multi-peaked record in Fig. 8 is reasonably good and comparable to that of GP, although the low values around Peaks 2 and 3 are slightly poorer.
(iii) The performance of the ANFIS model is also mixed. It performs comparably to GP for the single-peaked event (Fig. 7) but poorly for the multi-peaked event (Fig. 8).

Fig. 7. Model performances for single-peaked outflow hydrograph (2003) – GP: closest to the observed values; ANFIS: closer to GP; ANN: poor.
Fig. 8. Model performances for multi-peaked outflow hydrograph (2002) – GP: closest to observed values; ANN: some problems; ANFIS: poor.

Table 4
Comparison of ANN, ANFIS and GP models using CC, MAE and MSRE.
a Note that the ANN model scores the highest CC value for the single-peaked records, even though Fig. 7 shows visually that ANN performs poorly.

The above comparison, based on visual observation of the hydrographic features, is now taken further using the CC, MAE and MSRE values presented in Table 4. These results show the overall performance of the models and help identify the better-performing model. Evidently, none of the three models outperforms the others on all three performance measures across all of the evaluations: the single-peak event, the four individual peaks of the multi-peak event, and their overall performance. Nor does any model necessarily perform consistently across all the results. As discussed later in detail,
mances and these are given in Table 4, although for some of the results there may not be much to choose between the models. The results show that the GP model performs best overall, scoring 51 points. The ANN model is ranked second, scoring 44, and the ANFIS model last, scoring 27. The CC score of ANN (18) is better than even that of GP (16), which in turn is better than that of ANFIS (8). This reflects the earlier conclusion that ANN captures the shape but introduces bias. The fortunes of ANN are thus overturned by MAE (7 for ANN, 12 for ANFIS and 17 for GP) but not by MSRE (14 for ANN, 9 for ANFIS and 18 for GP). Notably, GP scores higher on both MAE and MSRE than ANFIS and ANN. It should be highlighted that these performance measures are rather blind to the anomalous result of the ANN model shown in Fig. 7; this is discussed in detail later.
Scatter diagrams are also a measure of performance and Figs.
Fig. 9. Scatter between modelled and observed flows for single-peaked timeseries.
9 and 10 confirm the best performance of GP for both single-
peaked and multi-peaked data. They show that ANFIS closely fol-
lows GP for the single-peaked data and comparatively better
than ANN but for the multi-peaked record, the situation changes.
This lack of pattern is highlighted for both sets of data in both
figures.
Sensitivity tests were carried out to investigate the effects of
data density on model performances. So both training and predic-
tion data were made sparser by removing: (i) every alternate value
(interval: 2 days), (ii) every 4 day data (interval: 5 days), and (iii)
every 9 day data (interval: 10 days). The investigation covered all
three AI techniques and the comparison of these results with ob-
served and modelled ones at a daily interval are given in Figs.
11a–11c. They show that: (i) all the models give the most accurate
results when the time interval is 1.0 day and their quality of pre-
dictions deteriorate by increasing the time interval; (ii) GP main-
tains some degree of robustness; (iii) ANN at the intervals of 5
and 10 days perform very poorly; and (iv) ANFIS produces unac-
ceptable results at the interval of 10 days. The general behaviour
Fig. 10. Scatter between modelled and observed flows for multi-peaked timeseries. of these results confirm the received view that the problem of res-
olution is the problem of ‘‘diminishing returns’’ in the sense that:
(i) the gain in short model running times with coarser time interval
this is the truth of modelling, which fits common sense better than resolutions is at the expense of the accuracy of modelling results;
the still received view of seeking for a super model outperforming (ii) significant improvements in accuracy are obtained by higher
rival ones all the time. resolutions up to a delineating point beyond which there is little
The performances of the individual models are counted with a gain in accuracy but the model run times can become excessively
score of 3 for good (best), 2 for satisfactory and 1 for poor perfor- long. This is shown in Fig. 12.
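The thinning used in the sensitivity tests can be sketched as follows. This is a minimal sketch under three assumptions: the daily record is held as a Python list, each coarser series keeps every k-th value, and counting starts from the first day. The section does not say which day in each block was retained, and `thin_series` and `thinned_variants` are hypothetical names.

```python
def thin_series(daily, interval_days):
    """Keep every `interval_days`-th value of a daily series, starting at index 0.

    interval_days=2 drops every alternate value; 5 drops four values in every
    five; 10 drops nine values in every ten. The retained offset (day 0 of each
    block) is an assumption of this sketch.
    """
    if interval_days < 1:
        raise ValueError("interval_days must be a positive integer")
    return daily[::interval_days]


def thinned_variants(daily, intervals=(1, 2, 5, 10)):
    """Return {interval: thinned series} for each time interval tested."""
    return {k: thin_series(daily, k) for k in intervals}


if __name__ == "__main__":
    # Hypothetical 20-day discharge record (m^3/s), for illustration only.
    daily = [float(100 + d) for d in range(20)]
    for k, series in thinned_variants(daily).items():
        print(k, len(series), series[:3])
```

The thinned series are indexed by step rather than by day. A lagged input such as "upstream flow one step earlier" therefore refers to k days earlier once a series is thinned. This is a property of the sketch, not a statement about how the models in this study were configured.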

Fig. 11a. Impacts of coarser time interval resolution on accuracy – the ANN model.

Fig. 11b. Impacts of coarser time interval resolution on accuracy – the ANFIS model.

Fig. 11c. Impacts of coarser time interval resolution on accuracy – the GP model.
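The comparison in Table 4 rests on CC, MAE and MSRE together with a 3/2/1 score per evaluation. The sketch below shows one way to compute these in Python. It assumes the common textbook definitions, which are not restated in this section:

- CC is the Pearson correlation coefficient.
- MAE is the mean of |Qobs − Qsim|.
- MSRE is the mean of ((Qobs − Qsim)/Qobs)².

The helper names, the example flows and the rule that tied values share points are all assumptions. The paper grades results as good, satisfactory or poor by judgement, so this strict ranking is not expected to reproduce the totals reported for Table 4.

```python
import math
from statistics import mean


def cc(obs, sim):
    """Pearson coefficient of correlation between observed and simulated flows."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    if var_o == 0 or var_s == 0:
        raise ValueError("CC is undefined for a constant series")
    return cov / math.sqrt(var_o * var_s)


def mae(obs, sim):
    """Mean absolute error, in the units of the flow (e.g. m^3/s)."""
    return mean(abs(o - s) for o, s in zip(obs, sim))


def msre(obs, sim):
    """Mean squared relative error; assumes every observed flow is non-zero."""
    return mean(((o - s) / o) ** 2 for o, s in zip(obs, sim))


def score_3_2_1(values, higher_is_better):
    """Award 3/2/1 points by rank of distinct values; tied models share points."""
    distinct = sorted(set(values.values()), reverse=higher_is_better)
    return {model: 3 - min(distinct.index(v), 2) for model, v in values.items()}


if __name__ == "__main__":
    # Hypothetical observed and simulated flows (m^3/s), not data from the paper.
    obs = [120.0, 180.0, 260.0, 210.0, 150.0]
    sims = {
        "ANN": [130.0, 200.0, 300.0, 240.0, 170.0],
        "ANFIS": [115.0, 170.0, 230.0, 220.0, 160.0],
        "GP": [118.0, 185.0, 250.0, 205.0, 148.0],
    }
    totals = {m: 0 for m in sims}
    # CC rewards high values; MAE and MSRE reward low values.
    for name, fn, higher in (("CC", cc, True), ("MAE", mae, False), ("MSRE", msre, False)):
        values = {m: fn(obs, s) for m, s in sims.items()}
        for m, pts in score_3_2_1(values, higher).items():
            totals[m] += pts
        print(name, {m: round(v, 3) for m, v in values.items()})
    print("points:", totals)
```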

Table 5
Summarising results.

Events        Visual comparison      Performance measures   Scatter diagram
Single-peak   ANN: Phantom feature   ANN: Medium            ANN: Poor
              ANFIS: Phase lag       ANFIS: Low             ANFIS: Good
              GP: Phase lag          GP: High               GP: Good
Multi-peak    ANN: Moderate          ANN: Medium            ANN: Poor
              ANFIS: Poor            ANFIS: Poor            ANFIS: Good
              GP: Good               GP: High               GP: Good
Overall       ANN: Mixed             ANN: Medium            ANN: Good
              ANFIS: Poor            ANFIS: Poor            ANFIS: Poor
              GP: Good               GP: High               GP: Good
Fig. 12. Behaviour of time interval resolution as a diminishing value problem.
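Table 5 flags a phase lag for ANFIS and GP on the single-peaked event. The section does not say exactly how the lag was measured. The sketch below shows two simple ways to estimate such a lag from aligned daily series:

- `peak_time_lag` takes the difference between the days of the predicted and observed maxima.
- `best_shift` searches for the shift of the whole predicted hydrograph that minimises the mean absolute difference.

Both function names, the `max_shift` window and the example hydrographs are hypothetical.

```python
def peak_time_lag(observed, predicted):
    """Days by which the predicted peak lags (+) or leads (-) the observed peak.

    Assumes both series are aligned daily values covering the same event window.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    t_obs = max(range(len(observed)), key=observed.__getitem__)
    t_pred = max(range(len(predicted)), key=predicted.__getitem__)
    return t_pred - t_obs


def best_shift(observed, predicted, max_shift=5):
    """Shift (days) of `predicted` that minimises the mean absolute difference.

    A positive result means the prediction is late by that many days.
    """
    n = len(observed)
    best = None
    for k in range(-max_shift, max_shift + 1):
        # Compare observed[t] with predicted[t + k] wherever both exist.
        pairs = [(observed[t], predicted[t + k]) for t in range(n) if 0 <= t + k < n]
        if not pairs:
            continue
        err = sum(abs(o - p) for o, p in pairs) / len(pairs)
        if best is None or err < best[1]:
            best = (k, err)
    return best[0]


if __name__ == "__main__":
    # Hypothetical hydrographs (m^3/s): the prediction is the observation delayed by 2 days.
    obs = [10, 20, 50, 30, 15, 10, 8]
    pred = [8, 10, 10, 20, 50, 30, 15]
    print(peak_time_lag(obs, pred), best_shift(obs, pred))
```

The peak-to-peak difference is the more direct reading of a phase lag between predicted and observed peaks. The shift search uses the whole hydrograph, so it is less affected by a flat or double-topped peak.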

7. Discussion

This study investigates three widely used AI models and employs three inter-comparison approaches: visual comparison, a scoring system based on performance measures, and scatter diagrams. The study focuses on a host of observed information: the actual values of the peaks and troughs, the timing of these stationary points, the shapes of the hydrographs and the inflows contributing to the hydrographs. The results presented in this paper and summarised further in Table 5 show that: (i) each modelling technique is good at emulating only some of these information contents of the data and does not generally reproduce them in full; (ii) the effectiveness of each performance measure is limited, as each captures only certain features.

The paper promotes a pluralistic modelling culture to communicate the collective strengths and weaknesses of different modelling techniques and performance measures. Arguably, a framework for integrating knowledge within such a culture is yet to emerge. For instance, this study identified a rather poor performance for ANFIS. Even so, ANFIS has a reported strength in coping with data of poor quality. It is therefore important not to overwrite the knowledge gained through the works reported by other researchers.

The performance of the ANN model for the single-peaked record seems anomalous for two reasons. (i) The CC value of ANN is the highest (0.88), even though visual comparison shows that ANFIS and GP perform better. This high CC score is surprising, and a further investigation confirmed that the value is correct. (ii) As shown in Fig. 7, the overestimation in the hydrograph predicted by ANN seems to be a phantom feature attributable to a significant amount of lateral inflow in the 2001 training data. Therefore, even though the predicted results are erroneous, the feature seems to have a cause. Tributary inflows may or may not occur during any given flow event, which introduces a degree of haphazardness. Different modelling techniques respond to this in different ways, and each AI technique has a different capability in capturing such features. It may be reported, without presenting the result, that the phantom feature was indeed removed when the record of Year 2000 was used to train the ANN model, in which case the lateral flow was physically eliminated from the modelling.

The GP model generally performs well in all cases except the single-peak prediction event, for which there is a significant phase difference between the times of the predicted and observed peaks. This phase difference is as much as 2 days, but it is still smaller than that of the ANFIS model, which is 3 days. The phase difference for the predicted multi-peak event is less significant, of the order of 1 day.

Top-down and bottom-up models differ in their approach to model structure. Top-down models have prescribed structures, whereas bottom-up models make decisions about their structure and reconstruct it, and in doing so they face the question of the parsimony of that structure. One implication of the principle of parsimony is to select an approach with a minimum number of assumptions, so this principle immediately turns the focus onto the structure of the model. The thrust of the principle is that modellers should be mindful when selecting the structure of their models and should avoid elaborate model structures. There is no recommended approach for this, so this study devised a set of preliminary investigations for the purpose (see the results presented in Tables 1–3). Practical experience shows that a model whose performance is improved by an elaborate structure, at the expense of parsimony, becomes susceptible to inadvertent failures in the long run.

It is noted that this study does not employ a diversity of performance measures. This is normally the case, as there is no formal recommendation on the inter-comparison of model performances. Studies normally employ deterministic frameworks and use absolute values with a focus on certain features of the predicted values, overlooking inherent uncertainties. Inherent uncertainties are normally dealt with by statistical approaches when there is a significant number of datapoints. The datapoints are processed by ranking and banding, some sort of distribution is fitted to the processed datapoints, and the goodness-of-fit is tested by higher order statistical tests. With an increasing emphasis on risk-based approaches, the focus nowadays is on consequences, in terms of the frequency of adverse outcomes and their impacts, rather than on deterministic values. Todini et al. (2006) argue that it is necessary to reduce uncertainty for effective decision-making. They suggest a procedure for considering the damage consequences caused by floods as the probability of overtopping increases. With consequences in mind, it becomes feasible to develop robust bands of values or risk zones, rather than using deterministic values. Khatibi (2006) discusses such a procedure for flood warning, and more details are given in Sayers et al. (2003) for flood risk management. Arguably, similar procedures are applicable to other water timeseries, and risk-based approaches are expected to make comparisons more realistic.

No knowledge base has yet been compiled on the strengths and weaknesses of individual models under a diversity of physical and land use conditions. Even so, the diversity of modelling techniques arguably points to the need for some sort of pluralistic framework that creates suitable conditions for combining them on their strengths. For instance, Lee and Han (2005) argue that neural networks are good at recognising patterns but not at explaining how they reach their decisions. This opaque nature has hampered their wider application in real-time operations. Fuzzy logic systems can reason with imprecise information and are good at explaining their decisions, but they cannot automatically acquire the rules they use to make those decisions.

8. Conclusions

In this study, the applicability and capability of three AI techniques, ANN, ANFIS and GP, are investigated for routing river discharge timeseries. The Kizilirmak River in north Turkey was used as a case study for the inter-comparison of their performances. Initial modelling decisions were made for all three modelling techniques, including the preliminary review and selection of data for training and prediction. The performance was then assessed using single-peaked and multi-peaked series.

The results show that the GP model performs better in predicting both single-peaked and multi-peaked series, although some phase lag is observed between the predicted and observed values. GP in this study seemed to be insensitive to lateral inflows. The ANN model captures the peaks of the multi-peaked event better than ANFIS. However, it suffers from a gross sensitivity to lateral inflows when modelling a single-peaked event. This may be attributed to the lateral flow present in the training data, which may resurface as anomalous features when the prediction data lack lateral inflows.

The paper argues that deterministic frameworks for inter-comparison are in themselves poorly suited to focusing on significant features of the timeseries and to assessing significant differences. When the difference in performance between two models is slight, a deterministic framework of inter-comparison amplifies the differences, leading to anomalous results. This study focused on such anomalies and avoided the pitfalls by considering different features and performance measures, as well as complementing quantitative
assessments with qualitative evaluations. The paper argues that risk-based approaches should overcome such anomalies.

The paper deliberately avoids superlative expressions such as "superior performance". Such a mindset defies common sense by seeking a superior performance from a supposedly perfect model at all times for all problems. Instead, inter-comparison of different modelling approaches should be directed towards compiling a knowledge base that facilitates the emergence of pluralistic modelling practices. In such practices, the strengths and weaknesses of the individual modelling approaches are captured with respect to the diversity of the context. If AI models can be employed to identify the various hydrographic features, arguably these techniques will truly become tools of artificial intelligence.

Acknowledgement

The authors are grateful to the Turkish Electrical Power Resources Survey and Development Administration for making available the daily river discharge data.

Appendix A

See Table A1.

Table A1
Default parameter values used by the model.

ANN training parameters                Values
Goal                                   0
Epochs                                 150
max_fail                               6
mem_reduc                              1
min_grad                               1e-10
Mu                                     0.001
mu_dec                                 0.1
mu_inc                                 10
mu_max                                 10,000,000,000
Training algorithm                     Levenberg–Marquardt (trainlm)

ANFIS training parameters              Values
Input membership function              Triangular (trimf)
Number of input membership functions   3 3 3
Output membership function             Constant
Epochs                                 70
Optimal method                         Hybrid
Number of fuzzy rules                  27

GP training parameters                 Values
Crossover rate                         0.3
Mutation rate                          0.044
Population size                        28
Number of generations                  5000
Arithmetic functions                   Plus, minus, multiply

References

Aytek, A., Asce, M., Alp, M., 2008. An application of artificial intelligence for rainfall–runoff modelling. J. Earth Syst. Sci. 117 (2), 145–155.
Aytek, A., Kisi, O., 2008. A genetic programming approach to suspended sediment modeling. J. Hydrol. 351, 288–298.
Babovic, V., Keijzer, M., 2002. Rainfall runoff modelling based on genetic programming. Nord. Hydrol. 33 (5), 331–346.
Campolo, M., Andreussi, P., Soldati, A., 1999. River flood forecasting with a neural network model. Water Resour. Res. 35, 1191–1197.
Dawson, C.W., Wilby, R., 1998. An artificial neural network approach to rainfall–runoff modelling. Hydrol. Sci. J. 43, 14–66.
Ferreira, C., 2001a. Gene expression programming in problem solving. In: 6th Online World Conference on Soft computing in Industrial Applications (Invited Tutorial).
Ferreira, C., 2001b. Gene expression programming: a new adaptive algorithm for solving problems. Complex Syst. 13 (2), 87–129.
Ghorbani, M., Khatibi, R., Aytek, A., Makarynskyy, O., 2010. Sea water level forecasting using genetic programming and artificial neural networks. J. Comput. Geosci. 36 (5), 620–627.
Haykin, S., 1999. Neural Networks: A Comprehensive Foundation, second ed. Prentice Hall, Upper Saddle River, New Jersey.
Hsu, K.L., Gupta, H.V., Sorooshian, S., 1995. Artificial neural network modeling of the rainfall–runoff process. Water Resour. Res. 31 (10), 2517–2530.
Jang, J.R., 1993. Anfis: adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybernet. 23, 665–685.
Karunanithi, N., Grenney, W.J., Whitley, D., Bovee, K., 1994. Neural networks for river flow prediction. J. Comput. Civil Eng. 8 (2), 201–220.
Khatibi, R., 2006. Barriers inherent in flood forecasting and their treatments. In: Knight, D.W., Shamseldin, A.Y. (Eds.), River Basin Management for Flood Risk Mitigation. Taylor and Francis Group Plc., London, pp. 569–585.
Khu, S.T., Liong, S.Y., Babovic, V., Madsen, H., Muttil, N., 2001. Genetic programming and its application in real-time runoff forming. J. Am. Water Resour. Assoc. 37 (2), 439–451.
Lee, H.X., Han, D., 2005. Exploration of neuro-fuzzy models in real time flood forecasting. In: Proceedings of the 2008 International Conference on Artificial Intelligence and Pattern Recognition, ISRST, Orlando, FL, USA, 7–10 July 2008, pp. 264–268. ISBN: 978-1-60651-000-1. <http://www.promoteresearch.org/2008/aipr/index.html>.
Liong, S.Y., Gautam, T.R., Khu, S.T., Babovic, V., Keijzer, M., Muttil, N., 2002. Genetic programming: a new paradigm in rainfall runoff modeling. J. Am. Water Resour. Assoc. 38 (3), 705–718.
Lughole, E., 2003. Online Adaptation of Takagi-Sugeno Fuzzy Inference Systems. Technical Report, Fuzzy Logic Laboratorium, Linz-Hagenberg.
Moghaddamnia, A., Remesan, R., Hassanpour Kashani, M., Mohammadi, M., Han, D., Piri, J., 2009. Comparison of LLR, MLP, Elman, NNARX and ANFIS models – with a case study in solar radiation estimation. J. Atmos. Sol.-Terr. Phys. 71, 975–982.
Panella, M., Gallo, A.S., 2005. An input/output clustering approach to the synthesis of ANFIS networks. IEEE Trans. Fuzzy Syst. 13 (1).
Peters, R., Schmitz, G., Cullmann, J., 2006. Flood routing modelling with artificial neural networks. Adv. Geosci. 9, 131–136.
Savic, D.A., Walters, G.A., Davidson, G.W., 1999. A genetic programming approach to rainfall–runoff modelling. Water Resour. Manage. 13, 219–231.
Sayers, P.B., Gouldby, B.P., Simm, J.D., Meadowcroft, I., Hall, J., 2003. Risk, Performance and Uncertainty in Flood and Coastal Defence. Technical Report FD2302/TR1. Department of Environment, Food and Rural Affairs (DEFRA) Research and Development (R&D), UK, p. 98. <http://www.safecoast.org/editor/databank/File/risk,%20performance%20and%20uncertainty%20in%20flood%20defence.pdf> (accessed 16.02.11).
Sheta, A.F., Mahmoud, A., 2001. Forecasting using genetic programming. In: Proceedings the 33rd Southeastern Symposium on System Theory, Athens, OH, USA, pp. 343–347.
Sivapragasam, C., Maheswaran, R., Venkatesh, V., 2007. Genetic programming approach for flood routing in natural channels. Hydrol. Process. doi:10.1002/hyp.6628.
Tawfik, M., Ibrahim, A., Fahmy, H., 1997. Hysteresis sensitive neural network for modeling rating curves. J. Comput. Civil Eng. 11, 206–211.
Thirumalaiah, K., Deo, M.C., 1995. Flood estimation using neural networks. In: International Conference on Disaster and Mitigation, Anna University, Madras, pp. B2-34–B2-39.
Todini, E., Alberoni, P., Butts, M., Collier, C., Khatibi, R., Samuels, P.G., Weerts, A., 2006. ACTIF best practice – understanding and reducing uncertainty in flood forecasting. In: Proceedings International Conference on Innovation Advances and Implementation of Flood Forecasting Technology, Tromso, Norway; <http://www.actif-ec.net/conference2005/proceedings/PDF%20docs/Session_09_ACTIF_best_practice/ACTIF_best_practice_paper_1_Uncertainty_in_flood_forecasting_V2.pdf> (accessed 16.02.11).