

Journal of Hydro-environment Research xx (2015) 1–15
www.elsevier.com/locate/jher

Research paper

Artificial Neural Network ensemble modeling with conjunctive data clustering for water quality prediction in rivers
Sung Eun Kim, Il Won Seo*
Department of Civil and Environmental Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 151-744, South Korea
Received 6 April 2014; revised 6 September 2014; accepted 26 September 2014

Abstract

The Artificial Neural Network (ANN) is a powerful data-driven model that can capture and represent both linear and non-linear relationships between input and output data. Hence, ANNs have been widely used for the prediction and forecasting of water quality variables, to treat the uncertainty of contaminant sources and the nonlinearity of water quality data. However, the initial weight parameter problem and imbalanced training data sets make it difficult to assess the optimality of the results obtained, and impede the performance of ANN modeling. This study employed the ensemble modeling technique to estimate the performance of the ANN without the influence of initial weight parameters on the model results, and applied several clustering methods to alleviate the imbalance of the training data set. An ANN ensemble model was developed and applied to forecast the water quality variables pH, DO, turbidity (Turb), TN, and TP at Sangdong station on the Nakdong River. The optimal ANN models for each water quality variable could be selected from the ensemble modeling. The optimal ANN models for pH, DO, TN, and TP, whose training target data sets were distributed evenly, showed good results, with R squared higher than 0.90. But the ANN model for Turb, whose training data set was imbalanced, showed a large RMSE (11.8 NTU) and a low R squared (0.58). The training data set of Turb was therefore partitioned into several classes by conjunctive clustering methods, according to the patterns of the data set for each number of clusters, and ANN ensemble models for Turb with the clustered training data sets (clustered ANN models) were then developed. All clustered ANN models for Turb showed better results than the model without clustering. In particular, the three-clustered ANN model showed an increase of R squared from 0.58 to 0.88, and a decrease of the total RMSE from 11.8 NTU to 6.3 NTU.
© 2015 International Association for Hydro-environment Engineering and Research, Asia Pacific Division. Published by Elsevier B.V. All rights reserved.

Keywords: Artificial Neural Network; Ensemble modeling; Clustering; Water quality forecasting; Nakdong River

1. Introduction

The Artificial Neural Network (ANN) has become a new tool and an efficient model for the prediction and forecasting of various water quality variables in river systems, due to the inherent uncertainties of contaminant sources and water quality data. However, despite the many studies that have used ANN models to predict water quality variables, the model building process has been poorly treated, which has made it difficult to assess the optimality of the results obtained. In addition to the major criticism that ANNs lack transparency, ANNs still suffer from limitations and problems, and a significant research effort is needed to address these deficiencies. It is well known that the deficiencies of ANNs come from the initial weight parameters and the imbalance of the training data set in the ANN development process (Maier et al., 2010; Kasiviswanathan et al., 2013).

The ANN model gives different results for the same original inputs, depending on the initial weight parameter set chosen before training the neural network. Problems with the initial weight parameters often force ANN modelers to select a single "good" result and accept it as the final result, omitting any explanation of the optimal initial weight parameters.

* Corresponding author. E-mail address: seoilwon@snu.ac.kr (I.W. Seo).



Using a Monte Carlo experiment, Kolen and Pollack (1990) showed that the training algorithm was very sensitive to the initial weight parameters. Yam and Chow (1995) presented an algorithm based on linear algebraic methods for determining the optimal initial weight parameters, and showed that with the optimal initial weight parameters, the initial network error can be greatly reduced. Other methods involving the genetic algorithm (GA) have been implemented to find the optimal initial weight parameters, and have enhanced the accuracy of the ANN model (Venkatesan et al., 2009; Chang et al., 2012; Mulia et al., 2013). These studies agree that the optimal initial weight parameters are very sensitive to the training algorithms and data structures, and that there are no fixed optimal initial weight parameters universally applicable to the variety of data structures and training algorithms. For this reason, ensemble techniques have been applied, based on the fact that the selection of weights is an optimization problem with many local minima (Hansen and Salamon, 1990). Laucelli et al. (2007) applied ensemble modeling and genetic programming to hydrological forecasts, and showed that the error due to the variance is effectively eliminated by using an average model (ensemble model) as the resultant model of many runs. Boucher et al. (2009) developed a one-day ahead ensemble ANN model for streamflow forecasting. That study showed that random initialization of the weight parameters mainly accounted for the uncertainty linked to the optimization of the model's parameters, and that ensemble modeling could reduce this uncertainty, given proper assessment tools for the performance of ensemble models. Zamani et al. (2009) developed an ensemble ANN model with a Kalman Filter that corrects the outputs of the ANNs to find the best estimate of the wave height, and showed that the prediction results improved as the number of ensemble members increased. Khalil et al. (2011) developed an ensemble ANN model for the estimation of the mean values of four selected water quality variables. The results showed that the ensemble ANN model provided better generalization ability than the single best ANN model. These studies indicate that the ANN model cannot guarantee an optimal result without appropriate methods for handling the initial weight parameters.

The imbalance of the training data set is one of the fundamental problems in ANN modeling, and has recently drawn much attention (Zhou and Liu, 2006; Alejo et al., 2007; Yoon and Kwek, 2007; Nguyen et al., 2008). The imbalance (uneven distribution) of water quality data sets is common, where the number of training instances of a minority class is much smaller compared to the majority classes (Nguyen et al., 2008). As a result, the neural network has difficulty learning from imbalanced data sets, since the network tends to ignore the minority class and treat it as noise, due to the overwhelming number of training instances of the majority class (e.g. Murphey et al., 2004; Nguyen et al., 2008). To alleviate the problem of the imbalanced training data set, Lu et al. (1998), Berardi and Zhang (1999), and Yoon and Kwek (2007) attempted to employ resampling methods, such as over-sampling and under-sampling, and modification of the training algorithms. However, their methods involved modifying the probability or distribution of the training data set, which led to loss of information and an increase of the training time.

The objective of this study is to reduce the modeling errors of ANNs in water quality prediction caused by the initial weight parameter problem and the imbalanced training data set, by employing an ensemble modeling technique and clustering methods. Ensemble modeling was applied to estimate the ANN performance by removing the effect of initial weight parameters on the variance of the ANN model results. In order to alleviate the imbalance of the training data set, several clustering methods were applied to separate the training data set according to the patterns in the training data set, without modifying the probability or distribution of the data set. In this study, a one-step ahead water quality forecasting ANN ensemble model was developed for each of pH, DO, turbidity (Turb), TN, and TP. ANN ensemble models with clustered training data sets (clustered ANN models) were developed for Turb, of which the training data set was highly imbalanced.

2. Models and methods

2.1. ANN ensemble modeling

The ANN consists of very simple and highly interconnected processors called neurons. A neuron is an information-processing unit that is fundamental to the operation of a neural network, and consists of weights and an activation function (Fig. 1). The weights are the most important parameters, acting as the memory of the ANN, and the activation function provides the network with nonlinear mapping potential. The manner in which the neurons of an ANN are structured determines the architecture of the ANN (Haykin, 1999). In general, there are three fundamentally different classes of network architecture. The first is a single-layer feedforward network, without hidden layers. The second is a multilayer feedforward network, with one or more hidden layers. The third is a recurrent neural network, with at least one feedback loop. In this study, the multilayer feedforward neural network (MFNN) with one hidden layer was used, because it is able to approximate most of the nonlinear functions demanded by practice (Mulia et al., 2013).

The weight parameters on the links between neurons are determined by the training algorithm. The most common and standard algorithm is the backpropagation training algorithm, the central idea of which is that the errors for the neurons of the hidden layer are determined by back-propagation of the error of the neurons of the output layer, as shown in Fig. 1. There are a number of variations on the basic backpropagation training algorithm that are based on other standard optimization techniques, such as the steepest descent algorithm, the conjugate gradient algorithm, and Newton's method.


Fig. 1. Schematic diagram of backpropagation training algorithm and typical neuron model.

Among the various backpropagation methods, the Levenberg–Marquardt (LM) algorithm has been very successfully applied to the training of ANNs to predict streamflow and water quality, providing significant speedup and faster convergence compared to the steepest descent-based and conjugate gradient-based algorithms (e.g. Zamani et al., 2009). In this study, the Levenberg–Marquardt (LM) algorithm was applied to train the network.

Ensemble techniques have been applied with considerable success in hydrology and environmental science as an approach to enhance the skill of forecasts (Krogh and Vedelsby, 1995; Araghinejad et al., 2011). The motivation for this procedure is the idea that one might improve the performance of a single generic predictor by combining the outputs of several individual predictors (Krogh and Vedelsby, 1995). The technique of combining multiple models into a single one is referred to as Ensemble Modeling (Laucelli et al., 2007). The application of an ensemble technique is divided into two steps. The first step is the creation of the individual ensemble members, and the second step is the combination of the outputs of the ensemble members to produce the most appropriate output (Araghinejad et al., 2011).

In this study, the ensemble modeling technique was applied to estimate the performance of the ANN without the influence of initial weight parameters on the model results, as shown in Fig. 2. For the networks with various numbers of hidden neurons (Network 1 to n in Fig. 2), the ensemble members, i.e. neural networks with a hundred randomly generated initial weight sets (Network 1-1 to 1-100 and Network n-1 to n-100 in Fig. 2), were created and applied in parallel to the same training data set. After training, each ensemble model for Network 1 to Network n was evaluated on the test data set, to check for which number of hidden neurons the ensemble models show good performance. The best ensemble model was selected among Network 1 to Network n. The validation data set was then used to verify whether the selected ensemble model gives good results for input data unpresented in the training and test data sets.
Fig. 2. ANN ensemble modeling with various initial weight parameters.
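To make the procedure of Fig. 2 concrete, the following is a minimal sketch of creating one such ensemble in Python, assuming scikit-learn's MLPRegressor as a stand-in for the LM-trained MFNN (scikit-learn does not provide a Levenberg–Marquardt solver, so the quasi-Newton "lbfgs" solver is substituted); the function names and the seed-per-member convention are illustrative, not from the original study.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_ensemble(X_train, y_train, n_hidden, n_members=100):
    """Train one ensemble: the same architecture fitted from
    n_members randomly generated initial weight sets."""
    members = []
    for seed in range(n_members):
        net = MLPRegressor(hidden_layer_sizes=(n_hidden,),
                           activation="tanh",   # tangent sigmoid hidden layer
                           solver="lbfgs",      # stand-in for Levenberg-Marquardt
                           max_iter=500,
                           random_state=seed)   # controls the initial weights
        members.append(net.fit(X_train, y_train))
    return members

def ensemble_predict(members, X):
    """Combine the member outputs; the ensemble mean is the model result."""
    preds = np.column_stack([m.predict(X) for m in members])
    return preds.mean(axis=1), preds
```

Each candidate architecture (Network 1 to n) would receive such an ensemble; the best architecture is then chosen from the test-set scores and checked on the validation set.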


R squared, which is given in Eq. (1), was used to estimate the performance of the ensemble models. R squared indicates how well a result of the ANN ensemble model fits a set of test or validation output data.

R^2 = 1 - SS_{rss} / SS_{tss}    (1)

SS_{tss} = \sum_i (x_i - \bar{x})^2    (2)

SS_{rss} = \sum_i (x_i - y_i)^2    (3)

where SS_{tss} is the total sum of squares and SS_{rss} is the sum of squares of residuals; x_i is the ith observed (target) value; \bar{x} is the mean of x_i over all the observed data; and y_i is the ensemble mean of the network for the ith data set. In addition to R squared, the root-mean-square error (RMSE) and the inter-quartile range (IQR, the distance between the 25th and 75th percentiles) were used to measure the overall bias error between the ensemble means and the observed values, and the variance error of the ANN ensemble model itself.

RMSE = \sqrt{SS_{rss} / n}    (4)

where n is the number of observed data.

IQR_i = Q_{75}(y_i) - Q_{25}(y_i)    (5)

where Q_{75}(y_i) and Q_{25}(y_i) are the 75th and 25th percentile values of the ANN ensemble model results for the ith data set, respectively, as shown in Fig. 3.

Fig. 3. Inter-Quartile Range (IQR) of ANN ensemble model results.
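As a minimal illustration of Eqs. (1)–(5), the three scores can be computed from the observed targets, the ensemble means, and the per-sample member outputs; the NumPy helpers below are illustrative, not from the original study.

```python
import numpy as np

def r_squared(x_obs, y_ens):
    """Eqs. (1)-(3): R^2 = 1 - SS_rss / SS_tss."""
    ss_tss = np.sum((x_obs - x_obs.mean()) ** 2)
    ss_rss = np.sum((x_obs - y_ens) ** 2)
    return 1.0 - ss_rss / ss_tss

def rmse(x_obs, y_ens):
    """Eq. (4): root-mean-square error of the ensemble means."""
    return np.sqrt(np.mean((x_obs - y_ens) ** 2))

def iqr(member_preds):
    """Eq. (5): per-sample spread of the member outputs,
    IQR_i = Q75(y_i) - Q25(y_i); member_preds has shape (n, members)."""
    q75 = np.percentile(member_preds, 75, axis=1)
    q25 = np.percentile(member_preds, 25, axis=1)
    return q75 - q25
```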
2.2. Clustered ANN modeling

2.2.1. Data clustering method

Data clustering is the process of partitioning a data set into several groups or classes. The methods that summarize a huge number of data into a small number of groups or categories, in order to further facilitate their analysis, are called "data clustering methods". The most representative and classical clustering methods are K-means (or hard C-means) clustering, fuzzy C-means clustering, and subtractive clustering (Hammouda and Karray, 2000).

The K-means clustering method (KCM) is an algorithm that finds data clusters in a data set such that a cost function (or objective function) of a dissimilarity measure is minimized. In most cases this dissimilarity measure is chosen as the Euclidean distance. A set of n vectors x_j, j = 1, ..., n, is to be partitioned into c groups G_i, i = 1, ..., c. The cost function, based on the Euclidean distance between a vector x_k in group G_i and the corresponding cluster center c_i, can be defined by

J = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \left( \sum_{k, x_k \in G_i} \| x_k - c_i \|^2 \right)    (6)

where J_i = \sum_{k, x_k \in G_i} \| x_k - c_i \|^2 is the cost function within group G_i. The partitioned groups are defined by a c × n binary membership matrix U, where the element u_{ij} is 1 if the data point x_j belongs to group i, and 0 otherwise. Once the cluster centers c_i are fixed, u_{ij} can be derived as Eq. (7):

u_{ij} = 1 if \| x_j - c_i \|^2 \le \| x_j - c_k \|^2 for each k \ne i; 0 otherwise    (7)

which means that x_j belongs to group i if c_i is the closest center among all centers. On the other hand, if the membership matrix is fixed, i.e. if u_{ij} is fixed, then the optimal center c_i that minimizes Eq. (6) is the mean of all vectors in group G_i:

c_i = \frac{1}{|G_i|} \sum_{k, x_k \in G_i} x_k    (8)

where |G_i| is the size of G_i, or |G_i| = \sum_{j=1}^{n} u_{ij}. The algorithm is presented with a data set x_j, j = 1, ..., n; it then iteratively determines the cluster centers c_i and the membership matrix U.
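A compact sketch of the KCM iteration of Eqs. (7) and (8), assuming NumPy and externally supplied initial centers (in the conjunctive scheme of Section 2.2.3, these come from the SCM):

```python
import numpy as np

def k_means(X, c, centers, n_iter=100):
    """Iterate Eqs. (7) and (8): assign each x_j to its closest
    center, then move each center to the mean of its group."""
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Eq. (7): binary membership by minimum Euclidean distance
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Eq. (8): the optimal center is the mean of the vectors in G_i
        new_centers = np.array([X[labels == i].mean(axis=0)
                                if np.any(labels == i) else centers[i]
                                for i in range(c)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```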
The fuzzy C-means clustering method (FCM) was proposed by Bezdek in 1973, as an improvement over the earlier KCM. The FCM employs fuzzy partitioning, such that a given data point can belong to several groups, with the degree of belongingness specified by membership grades between 0 and 1. The membership matrix U is allowed to have elements with values between 0 and 1. However, the summation of the degrees of belongingness of a data point to all clusters is always equal to unity:


\sum_{i=1}^{c} u_{ij} = 1, \forall j = 1, ..., n    (9)

The cost function for FCM is a generalization of Eq. (6):

J(U, c_1, ..., c_c) = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m d_{ij}^2    (10)

where u_{ij} is between 0 and 1; c_i is the cluster center of fuzzy group i; d_{ij} = \| c_i - x_j \| is the Euclidean distance between the ith cluster center and the jth data point; and m is a weighting exponent. The necessary conditions for Eq. (10) to reach its minimum are

c_i = \frac{\sum_{j=1}^{n} u_{ij}^m x_j}{\sum_{j=1}^{n} u_{ij}^m}    (11)

u_{ij} = \frac{1}{\sum_{k=1}^{c} (d_{ij} / d_{kj})^{2/(m-1)}}    (12)

The algorithm works iteratively through the preceding two conditions, until no more improvement is noticed.
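The corresponding FCM iteration, alternating Eqs. (12) and (11), can be sketched as follows (NumPy; the small constant added to the distances guards against division by zero and is an implementation detail, not part of the original formulation):

```python
import numpy as np

def fuzzy_c_means(X, centers, m=2.0, n_iter=100, eps=1e-6):
    """Alternate Eqs. (12) and (11) until the centers stop moving;
    in the conjunctive scheme the initial centers come from the SCM."""
    U = None
    for _ in range(n_iter):
        # Eq. (12): membership grades from ratios of distances
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)        # shape (n, c); rows sum to 1 (Eq. 9)
        # Eq. (11): centers as weighted means with weights u_ij^m
        Um = U ** m
        new_centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        if np.linalg.norm(new_centers - centers) < eps:
            break
        centers = new_centers
    return centers, U
```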
The subtractive clustering method (SCM) proposed by Chiu (1994) is a modified method of mountain clustering. Because the mountain function has to be evaluated at each grid point, its computation grows exponentially with the dimensionality of the data. Subtractive clustering solves this problem by using the data points themselves as the candidates for cluster centers, instead of the grid points used in mountain clustering. This means that the computation is proportional to the data size, instead of the data dimension. While the actual cluster centers are not necessarily located at one of the data points, in most cases such a point is a good approximation (Hammouda and Karray, 2000). Since each data point is a candidate for a cluster center, a density measure at data point x_i is defined as

D_i = \sum_{j=1}^{n} \exp\left( - \frac{\| x_i - x_j \|^2}{(r_a/2)^2} \right)    (13)

where x_i and x_j are the ith and jth data points (i \ne j), and r_a is a positive constant representing a neighborhood radius. Hence, if a data point has many neighboring data points, it will have a high density value. The first cluster center x_{c1} is chosen as the point having the largest density value D_{c1}. Next, the density measure of each data point x_i is revised as follows:

D_i = D_i - D_{c1} \exp\left( - \frac{\| x_i - x_{c1} \|^2}{(r_b/2)^2} \right)    (14)

where r_b is a positive constant that defines a neighborhood with measurable reductions in the density measure, and is usually set 25% or 50% greater than r_a. Therefore, the data points near the first cluster center x_{c1} will have a significantly reduced density measure. After revising the density function, the next cluster center is selected as the point having the greatest density value. This process continues until a sufficient number of clusters is attained.
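A sketch of the SCM selection loop of Eqs. (13) and (14) follows (NumPy; r_b = 1.5 r_a reflects the 50% rule mentioned above, and stopping after a requested number of clusters matches the conjunctive use in Section 2.2.3):

```python
import numpy as np

def subtractive_clustering(X, r_a, n_clusters):
    """Pick cluster centers from the data points themselves:
    Eq. (13) scores the density of every point, and Eq. (14)
    suppresses the density around each accepted center."""
    r_b = 1.5 * r_a
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Eq. (13); subtracting 1 removes the self-term exp(0) (i != j)
    D = np.exp(-d2 / (r_a / 2.0) ** 2).sum(axis=1) - 1.0
    centers = []
    for _ in range(n_clusters):
        c_idx = D.argmax()                 # point with the greatest density
        centers.append(X[c_idx])
        # Eq. (14): reduce the density near the newly chosen center
        D = D - D[c_idx] * np.exp(-d2[:, c_idx] / (r_b / 2.0) ** 2)
    return np.array(centers)
```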
2.2.2. Cluster validity index

Humans can perform clustering procedures in two or three dimensions, but most real problems involve clustering in higher dimensions. If there is no visual perception of the clusters, it is impossible to assess the validity of the clustering result. The clustering validity index CDbw (Composing Density Between and Within clusters), proposed by Halkidi et al. (2002a, 2002b), is an algorithm-independent clustering index for assessing the quality of clustering. The index is based on two accepted concepts: (1) the clusters' compactness; and (2) the clusters' separation.

The clusters' compactness is calculated by an intra-cluster density index, which is defined as the average number of points that belong to the neighborhood of each cluster center:

Intra_den(c) = \frac{1}{c} \sum_{i=1}^{c} \frac{1}{n_i} \sum_{j=1}^{n_i} density(v_{ij}), c > 1    (15)

where c is the number of clusters, v_{ij} is the jth point of the ith cluster, and n_i is the number of points in the ith cluster. The term density(v_{ij}) is given by density(v_{ij}) = \sum_{k=1}^{n_i} f(x_k, v_{ij}), where f(x_k, v_{ij}) is defined as

f(x_k, v_{ij}) = 1 if \| x_k - v_{ij} \| \le \| stdev(i) \|; 0 otherwise    (16)

where stdev(i) is the standard deviation vector of the ith cluster. f(x_k, v_{ij}) counts the points x_k inside a hyper-sphere of radius \| stdev(i) \| around v_{ij}. The intra-cluster density is significantly high for a well-separated cluster.

The clusters' separation is defined by the inter-cluster density and Sep. The inter-cluster density is the density in the areas between clusters; for well-separated clusters it is significantly low. It is defined as

Inter_den(c) = \sum_{i=1}^{c} \sum_{j=1, j \ne i}^{c} \frac{\| close\_rep(i) - close\_rep(j) \|}{\| stdev(i) \| + \| stdev(j) \|} \cdot density(u_{ij}), c > 1    (17)

where close_rep(i) and close_rep(j) are the closest pair of points of the ith and jth clusters, and u_{ij} is the middle point between the pair of points close_rep(i) and close_rep(j). The density(u_{ij}) is given by

density(u_{ij}) = \sum_{l=1}^{n_i + n_j} \frac{f(x_l, u_{ij})}{n_i + n_j}    (18)

where x_l is an input vector belonging to the ith or jth cluster, and f(x_l, u_{ij}) is defined by


f(x_l, u_{ij}) = 1 if \| x_l - u_{ij} \| \le (\| stdev(i) \| + \| stdev(j) \|)/2; 0 otherwise    (19)

The clusters' separation includes both the inter-cluster distances and the inter-cluster density. The goal of clustering is that the inter-cluster distance is significantly high, while the inter-cluster density is significantly low. The definition of the clusters' separation is

Sep(c) = \sum_{i=1}^{c} \sum_{j=1, j \ne i}^{c} \frac{\| close\_rep(i) - close\_rep(j) \|}{1 + Inter\_den(c)}, c > 1    (20)

The overall clustering validity index is defined as

CDbw(c) = Intra_den(c) \times Sep(c)    (21)

The CDbw is significantly high for a well-separated cluster.

Fig. 4. Subtractive-based KCM and FCM with CDbw.
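For orientation, a condensed sketch of Eqs. (15)–(21) follows (NumPy). It is a simplified reading of the index: one closest representative pair is used per cluster pair, clusters are assumed non-empty with more than one point, and no attempt is made to reproduce every refinement of Halkidi et al.'s original algorithm.

```python
import numpy as np

def cdbw(X, labels, centers):
    """Sketch of Eqs. (15)-(21): CDbw = Intra_den(c) * Sep(c)."""
    c = len(centers)
    clusters = [X[labels == i] for i in range(c)]
    stdev = [np.linalg.norm(Xi.std(axis=0)) for Xi in clusters]

    # Eqs. (15)-(16): average count of a cluster's points inside a
    # hyper-sphere of radius ||stdev(i)|| around each of its points
    intra = np.mean([
        (np.linalg.norm(Xi[:, None] - Xi[None, :], axis=2) <= s).sum(axis=1).mean()
        for Xi, s in zip(clusters, stdev)])

    inter, sep = 0.0, 0.0
    for i in range(c):
        for j in range(c):
            if i == j:
                continue
            # close_rep(i), close_rep(j): closest pair across the two clusters
            d = np.linalg.norm(clusters[i][:, None] - clusters[j][None, :], axis=2)
            a, b = np.unravel_index(d.argmin(), d.shape)
            rep_i, rep_j = clusters[i][a], clusters[j][b]
            u = (rep_i + rep_j) / 2.0                   # middle point u_ij
            both = np.vstack([clusters[i], clusters[j]])
            radius = (stdev[i] + stdev[j]) / 2.0
            # Eqs. (18)-(19): fraction of the two clusters' points near u_ij
            dens_u = (np.linalg.norm(both - u, axis=1) <= radius).mean()
            dist = np.linalg.norm(rep_i - rep_j)
            inter += dist / (stdev[i] + stdev[j]) * dens_u   # Eq. (17)
            sep += dist                                      # numerator of Eq. (20)
    sep /= (1.0 + inter)                                     # Eq. (20)
    return intra * sep                                       # Eq. (21)
```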
2.2.3. Conjunctive clustering methods with CDbw

Among the clustering methods, KCM and FCM rely on knowing the number of clusters and the initial cluster centers. In that case, the algorithm tries to partition the data into the given number of clusters. KCM is very simple and fast. However, two key features are regarded as its biggest drawbacks. One is that the number of clusters is an input parameter: an inappropriate choice of the number of clusters may yield poor results. The other is that there is no guarantee that it will converge to the global optimum, and the result may depend on the initial cluster centers. FCM has the same problems as KCM.

In other cases, it is not necessary to know the number of clusters from the beginning; the algorithm starts by finding the first prototype cluster, and then finds the second, and so on. The SCM is of this type. In both cases clustering can be applied; however, if the number of clusters and the initial cluster centers are not known, KCM and FCM cannot be used (Hammouda and Karray, 2000). SCM and KCM or FCM can therefore complement each other, with the SCM providing the initial cluster centers and their number for KCM or FCM. However, the number of clusters and the cluster centers produced by the SCM change depending on the width parameter r_a of the density function, while the conjunctive scheme needs the number of clusters to be fixed from the beginning. In this study, CDbw was applied to define the optimum width parameter of the SCM, in order to obtain the initial cluster centers for the given number of clusters; these conjunctive methods are named Subtractive-based KCM and Subtractive-based FCM with CDbw. Fig. 4 illustrates the Subtractive-based KCM and Subtractive-based FCM for a given number of clusters, using CDbw.

2.2.4. ANN modeling using the conjunctive clustering method with CDbw

Imbalanced training data can be considered from two points of view. The first is the distribution and range of the output (desired) data. The neural network is calibrated by a training algorithm on the given historical output data in the training data set, so it cannot be guaranteed that the network will produce optimal results for ranges of output on which the network was not trained. The training algorithm is a kind of optimization algorithm that minimizes the sum of squared errors between the given historical output data and the model results. Naturally, the network is optimized for the output data in the ranges where a large amount of data is distributed, or where the magnitude of the values is high, since these significantly affect the sum of squared errors. The second is the patterns of the input data. The network is trained by the input data for the output data.


Fig. 5. n-clustered ANN modeling.

If the network is trained with different input vectors for the same output vector, or with the same input vector for different output vectors, the modeling error can increase. In this study, the proposed conjunctive clustering methods with CDbw were applied to the patterns of the input vectors, in order to decrease the range of the output vectors.

Fig. 6. Study site and available data.


Once the training data set is partitioned by the proposed clustering methods into several classes, according to the patterns of the input vectors, the same number of neural networks as the number of clusters is developed with the data subset of each class, after dividing each subset into training, test, and validation data sets. These clustered ANN models can give a more specific result than a model trained on all available data, because each network is tuned by the similar properties of its training data set. Consequently, the ANN modeling accuracy for the input vectors can be maximized with a reduced size of the training data set. Fig. 5 illustrates the clustered ANN modeling with the proposed clustering methods; a minimal routing sketch is given below.
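The paper does not spell out how a new input vector is assigned to a class at prediction time, so nearest-cluster-center assignment is assumed in the sketch below; class_models is a hypothetical container holding the per-class ensembles of Fig. 5.

```python
import numpy as np

def route_to_cluster(x, centers):
    """Assign an input vector to the class whose cluster center is
    closest (assumed routing rule; each class has its own ensemble)."""
    return np.linalg.norm(centers - x, axis=1).argmin()

def clustered_predict(x, centers, class_models):
    """class_models[k] is the ensemble trained on the k-th data subset;
    the prediction uses only the model tuned to similar input patterns."""
    k = route_to_cluster(x, centers)
    members = class_models[k]
    return np.mean([m.predict(x.reshape(1, -1))[0] for m in members])
```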
3. Model application

3.1. Study site and available data

The Nakdong River, one of the four major rivers in South Korea, is located in the southeastern part of the Korean peninsula. The river is about 525 km long and has a drainage area of 23,817 km². Busan, one of the largest cities in South Korea, is located near the river mouth. About 7 million people reside within the basin, and more than 13 million people take drinking water from this river. Thus, the prediction of the water quality of this area has been a major national concern. However, it is a very hard task to forecast the water quality of the downstream reach of the river using a model based on physical and chemical processes, due to the uncertainty of the contaminant sources. Moreover, eight weirs are located in the river, so enormous calibration efforts are needed to develop such a model.

In this study, ANN models for water quality prediction were developed and applied to the Sangdong water quality station on the Nakdong River, as shown in Fig. 6. In the application, the observed daily data from 2009 to 2012 were used to predict the water quality for the following day. The observed data of Sangdong consist of 785 data sets with five water quality variables: pH, DO, turbidity (Turb), TN, and TP. These variables were discretized into input variables at times t (day) and t−1, for the desired output at t+1. Table 1 shows the correlation analysis results between the antecedents of each variable and the desired outputs. The antecedents of each variable are highly correlated with the desired output at t+1. Therefore, the water quality at time t+1 was predicted by means of the water quality variables at times t and t−1.

Table 1
Correlation analysis results between antecedents of each variable and the desired outputs.

Input variables | pH_{t+1} | DO_{t+1} | Turb_{t+1} | TN_{t+1} | TP_{t+1}
pH_t       | 0.97 | 0.48 | 0.34 | 0.14 | 0.19
pH_{t-1}   | 0.92 | 0.45 | 0.33 | 0.12 | 0.20
DO_t       | 0.51 | 0.96 | 0.21 | 0.42 | 0.66
DO_{t-1}   | 0.52 | 0.91 | 0.20 | 0.41 | 0.66
Turb_t     | 0.39 | 0.24 | 0.72 | 0.10 | 0.13
Turb_{t-1} | 0.38 | 0.25 | 0.52 | 0.08 | 0.14
TN_t       | 0.15 | 0.42 | 0.05 | 0.95 | 0.35
TN_{t-1}   | 0.14 | 0.41 | 0.03 | 0.90 | 0.35
TP_t       | 0.19 | 0.65 | 0.12 | 0.36 | 0.97
TP_{t-1}   | 0.20 | 0.64 | 0.11 | 0.37 | 0.93

Table 2
Stratified sampling result for building training, test and validation data subsets of DO.

Interval (mg/L) | # of data | Ratio (%)^a | Training | Test | Validation
1–2   | 3   | 0.38  | 18 | 3  | 3
2–3   | 8   | 1.02  |    |    |
3–4   | 5   | 0.64  |    |    |
4–5   | 8   | 1.02  |    |    |
5–6   | 32  | 4.08  | 24 | 4  | 4
6–7   | 68  | 8.66  | 50 | 9  | 9
7–8   | 92  | 11.72 | 68 | 12 | 12
8–9   | 122 | 15.54 | 90 | 16 | 16
9–10  | 110 | 14.01 | 82 | 14 | 14
10–11 | 94  | 11.97 | 70 | 12 | 12
11–12 | 57  | 7.26  | 43 | 7  | 7
12–13 | 62  | 7.90  | 46 | 8  | 8
13–14 | 45  | 5.73  | 35 | 5  | 5
14–15 | 48  | 6.11  | 36 | 6  | 6
15–16 | 16  | 2.04  | 12 | 2  | 2
16–17 | 7   | 0.89  | 11 | 2  | 2
17–18 | 6   | 0.76  |    |    |
18–19 | 1   | 0.13  |    |    |
19–20 | 1   | 0.13  |    |    |
Total | 785 | 100   | 585 | 100 | 100

^a Distribution ratio (%) = number of data in the interval / total number of data × 100.

Fig. 7. Variance of ANN results for various numbers of training data sets, hidden neurons and ensemble members for DO.


Table 3
AR(2) model parameters of each variable for the training data set.

Y_{t+1}     | a     | b      | ε_{t+1}
pH_{t+1}    | 1.168 | −0.330 | 1.217
DO_{t+1}    | 1.102 | −0.213 | 0.875
TN_{t+1}    | 1.005 | −0.099 | 0.228
TP_{t+1}    | 1.000 | −0.054 | 0.003
Turb_{t+1}  | 0.843 | −0.092 | 4.960

3.2. Determination of the number of ensemble members

In order to determine the number of ensemble members adequate to reduce the effect of the initial weight parameters on the variance of the ANN model results, ANN models with various numbers of training data sets, hidden neurons, and ensemble members were applied to the water quality variables pH, DO, Turb, TN, and TP. The numbers of training data sets were 100, 300, and 500; the numbers of hidden neurons were 1, 2, 4, 8, and 16; and the numbers of ensemble members were 30, 50, 80, 100, and 150. The Levenberg–Marquardt algorithm was used to train the ANN models, with a required training goal of 0.001 within 500 epochs. The tangent sigmoid activation function was used for the hidden layer, and a linear activation function for the output layer.

In the results, as the number of training data sets increased, the training time increased, and the averaged IQR over the various numbers of hidden neurons for each number of ensemble members decreased. The IQR of the models for each number of ensemble members increased as the number of hidden neurons increased. The RMSE of the models for each number of hidden neurons decreased as the number of ensemble members increased. Fig. 7 shows the variance of the ANN results for various numbers of training data sets, hidden neurons, and ensemble members for DO. When the number of training data sets is 500, the averaged IQR is the lowest, and the training time no longer increases (Fig. 7(a)). Fig. 7(b) shows the averaged RMSE and IQR for various numbers of hidden neurons when the number of training data sets is 500. As the number of ensemble members increases, the averaged RMSE and IQR decrease. When the number of ensemble members is more than 100, the averaged RMSE and IQR change little. Therefore, ensembles of 100 members with a training data set of more than 500 were applied in this study.

3.3. ANN ensemble modeling with the available data set

As mentioned above, the performance of the ANN model is significantly affected by the training data sets. If the training data set is sampled by a random sampling method, the ANN model can give very different results depending on the sampled training data set. In particular, if the training, test, and validation data sets are sampled in a specific range of the output data, the ANN model cannot be properly trained, tested, and validated. Therefore, in this study, the training data set was sampled by a stratified sampling method. According to the distribution ratio of the output (desired) variable in the original data set, data subsets with the same distribution ratio as the original data set were built, so that the ANN models could be trained, tested, and validated for all possible outputs in the available data set.
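A sketch of such a stratified split, in the spirit of Table 2, follows (NumPy); the default ratios approximate the paper's 585/100/100 split of 785 data sets, while the bin edges are supplied by the caller and the pooling of sparsely populated intervals shown in Table 2 is left out for brevity.

```python
import numpy as np

def stratified_split(y, bins, ratios=(0.745, 0.1275, 0.1275), seed=0):
    """Sample training/test/validation indices interval by interval,
    so each subset keeps the distribution ratio of the desired output."""
    rng = np.random.default_rng(seed)
    idx_tr, idx_te, idx_va = [], [], []
    for lo, hi in zip(bins[:-1], bins[1:]):
        idx = np.where((y >= lo) & (y < hi))[0]
        rng.shuffle(idx)
        n_tr = round(ratios[0] * len(idx))
        n_te = round(ratios[1] * len(idx))
        idx_tr += list(idx[:n_tr])
        idx_te += list(idx[n_tr:n_tr + n_te])
        idx_va += list(idx[n_tr + n_te:])
    return idx_tr, idx_te, idx_va
```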

Table 4
Comparisons of each ANN ensemble model result for various numbers of hidden neurons. Each cell gives RMSE / Mean IQR / R².

Output vector | # of hidden neurons | Training | Test | Validation
pH_{t+1}   | 1     | 0.123 / 0 / 0.957     | 0.162 / 0 / 0.927     | 0.161 / 0 / 0.926
           | 2     | 0.12 / 0.016 / 0.959  | 0.174 / 0.016 / 0.917 | 0.161 / 0.019 / 0.927
           | 4     | 0.117 / 0.016 / 0.961 | 0.163 / 0.016 / 0.927 | 0.166 / 0.018 / 0.922
           | 8     | 0.111 / 0.025 / 0.965 | 0.151 / 0.028 / 0.937 | 0.178 / 0.028 / 0.91
           | AR(2) | 0.150 / – / 0.935     | 0.188 / – / 0.903     | 0.173 / – / 0.915
DO_{t+1}   | 1     | 0.812 / 0 / 0.924     | 0.766 / 0 / 0.931     | 0.828 / 0 / 0.919
           | 2     | 0.789 / 0.118 / 0.928 | 0.764 / 0.088 / 0.931 | 0.861 / 0.121 / 0.912
           | 4     | 0.757 / 0.133 / 0.934 | 0.777 / 0.111 / 0.929 | 1.069 / 0.144 / 0.865
           | 8     | 0.688 / 0.198 / 0.945 | 0.787 / 0.179 / 0.927 | 2.276 / 0.403 / 0.388
           | AR(2) | 0.850 / – / 0.916     | 0.836 / – / 0.918     | 0.815 / – / 0.921
Turb_{t+1} | 1     | 12.502 / 0 / 0.609    | 13.189 / 0 / 0.559    | 11.799 / 0 / 0.583
           | 2     | 11.713 / 1.499 / 0.657 | 13.233 / 1.49 / 0.556 | 11.971 / 1.502 / 0.571
           | 4     | 10.507 / 2.06 / 0.724 | 14.098 / 2.534 / 0.496 | 12.592 / 2.121 / 0.525
           | 8     | 8.306 / 1.611 / 0.827 | 32.585 / 4.086 / −1.693 | 14.626 / 2.243 / 0.359
           | AR(2) | 14.007 / – / 0.509    | 11.835 / – / 0.645    | 12.943 / – / 0.498
TN_{t+1}   | 1     | 0.17 / 0 / 0.943      | 0.528 / 0 / 0.437     | 0.153 / 0 / 0.952
           | 2     | 0.167 / 0.018 / 0.945 | 0.283 / 0.018 / 0.838 | 0.217 / 0.017 / 0.904
           | 4     | 0.161 / 0.022 / 0.949 | 0.374 / 0.022 / 0.717 | 0.258 / 0.024 / 0.864
           | 8     | 0.155 / 0.03 / 0.953  | 0.29 / 0.03 / 0.83    | 0.214 / 0.041 / 0.907
           | AR(2) | 0.184 / – / 0.933     | 0.306 / – / 0.811     | 0.274 / – / 0.847
TP_{t+1}   | 1     | 0.011 / 0 / 0.906     | 0.011 / 0 / 0.917     | 0.011 / 0 / 0.911
           | 2     | 0.011 / 0.002 / 0.912 | 0.01 / 0.001 / 0.918  | 0.011 / 0.002 / 0.912
           | 4     | 0.01 / 0.002 / 0.919  | 0.01 / 0.002 / 0.919  | 0.011 / 0.002 / 0.912
           | 8     | 0.009 / 0.002 / 0.93  | 0.01 / 0.002 / 0.925  | 0.011 / 0.002 / 0.909
           | AR(2) | 0.011 / – / 0.902     | 0.011 / – / 0.914     | 0.011 / – / 0.910


Fig. 8. Interval RMSE according to the distribution ratio of training target data in validation results.


Table 5
Clustering results by SCM of the training data set for Turb. Cluster centers are given as (Turb_t, Turb_{t-1}, Turb_{t+1}) in NTU.

# of clusters | r_a | r_b | Cluster centers (NTU) | Intra_den | Inter_den | Sep | CDbw
2 | 0.6  | 0.9   | (14.5, 15.0, 14.9); (135.3, 155.1, 71.0) | 13.51 | 0.27 | 121.20 | 1637.35
3 | 0.15 | 0.225 | (11.1, 10.9, 11.2); (23.4, 23.8, 23.0); (41.7, 36.1, 40.3) | 18.57 | 0.57 | 38.41 | 713.11
4 | 0.5  | 0.75  | (13.5, 14.0, 14.3); (62.6, 32.8, 149.8); (100.9, 131.7, 72.3); (155.1, 43.4, 135.3) | 7.97 | 0.99 | 53.64 | 427.62

Among the available 785 data sets, 585 were used to train the network, and 100 each were used for testing and validation. Table 2 shows the stratified sampling results for building the training, test, and validation data sets of DO. The data subsets for pH, Turb, TN, and TP were built in the same way.

One-step ahead water quality prediction ANN ensemble models for pH, DO, Turb, TN, and TP were developed with all available data sets. Single-output neural networks with one hidden layer were used, and the number of hidden neurons was set to 1, 2, 4, and 8, i.e. less than, or up to several times, the number of input neurons (2 input neurons, at times t and t−1). In addition, autoregressive models with the antecedents of each variable, the AR(2) model in Eq. (22), were developed from the training data set. The AR(2) models for each variable are shown in Table 3.

AR(2): Y_{t+1} = a Y_t + b Y_{t-1} + \varepsilon_{t+1}    (22)

where Y_{t+1} is the desired output, Y_t and Y_{t-1} are the antecedents of the desired output, and \varepsilon_{t+1} is a constant or noise term.

The AR(2) models developed from the training data were applied to the test and validation data sets, to compare their results with the ANN ensemble models. Table 4 shows the results of each AR(2) model and each ANN ensemble model for various numbers of hidden neurons.
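For reference, the AR(2) benchmark of Eq. (22) reduces to an ordinary least-squares fit on the two antecedents; a sketch (NumPy):

```python
import numpy as np

def fit_ar2(y):
    """Least-squares fit of Eq. (22): Y_{t+1} = a*Y_t + b*Y_{t-1} + eps."""
    A = np.column_stack([y[1:-1], y[:-2], np.ones(len(y) - 2)])
    coef, *_ = np.linalg.lstsq(A, y[2:], rcond=None)
    a, b, eps = coef
    return a, b, eps

def ar2_forecast(y_t, y_tm1, a, b, eps):
    """One-step-ahead benchmark forecast from the two antecedents."""
    return a * y_t + b * y_tm1 + eps
```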
In the results, the ANN ensemble models show slightly lower RMSE and higher R squared than the AR(2) models for all variables. In the training results of the ANN ensemble models, the networks with a higher number of hidden neurons were better trained, showing higher R squared; but the IQR, indicating the variance error of the network, also increased. From the test and validation results, the optimum number of hidden neurons can be selected as 1 for pH_{t+1}, DO_{t+1}, TP_{t+1}, and Turb_{t+1}, and 2 for TN_{t+1}. The validation results of pH_{t+1}, DO_{t+1}, TP_{t+1}, and TN_{t+1} at the optimum number of hidden neurons are good, with R squared higher than 0.90. However, the networks for Turb_{t+1} were not well trained. Consequently, the test and validation results for Turb_{t+1} show a higher RMSE and a lower R squared compared to the results of the other water quality ANN models. These higher errors were considered to be due to the imbalanced training data set of Turb. Fig. 8 shows the distribution ratio of the training target data set (the output data in the training data set) and the interval RMSE of the validation results at the optimum number of hidden neurons. The RMSE in the intervals where a large amount of data is distributed is usually low. The ANN models for which the training target data set is evenly distributed, such as pH_{t+1}, DO_{t+1}, TP_{t+1}, and TN_{t+1}, show low interval RMSE. But the training target data set of Turb_{t+1} is significantly imbalanced; only 4.84% of the training target data set falls in the range between 50 NTU and 160 NTU. This led to ill-training of the network, and a very high RMSE in the intervals where only a small amount of data is distributed.

3.4. ANN ensemble modeling with clustered data sets for Turb

The proposed conjunctive clustering methods with CDbw were applied to the data division, to build balanced training data sets for Turb, since the higher error of the ANN modeling results for Turb came from the imbalanced training data set. Using the CDbw, the appropriate width parameters need to be selected. In this study, varying r_a, the width parameter giving the highest CDbw of the cluster results was selected as the optimal width parameter for a given number of clusters; a search sketch is given below. The r_b was set 50% larger than r_a, as is usually applied in the SCM. The optimal width parameters and cluster centers of the SCM for the given numbers of clusters are shown in Table 5. The cluster centers for each given number of clusters were calculated using the cluster centers from the SCM as the initial centers for the KCM and FCM clustering. When the number of clusters was two, the cluster result by the SCM was found to be optimal, since its CDbw was the highest among the results of the three clustering methods. When the number of clusters was three and four, the cluster results by the Subtractive-based KCM and the Subtractive-based FCM were selected as the optimal clusters, respectively. The optimal cluster results and the distribution of the Turb data set for each given number of clusters are shown in Table 6 and Fig. 9. In the distribution of the clustered training data set, the training target data Turb_{t+1} overlap in the range of low Turb_{t+1}. This means that the training target data are the same, but the patterns of the input data (Turb_t and Turb_{t-1}) are different.
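The width-parameter selection just described amounts to a one-dimensional search scored by CDbw; a sketch reusing the subtractive_clustering and cdbw helpers from the earlier sketches (the grid of candidate r_a values is illustrative):

```python
import numpy as np

def best_width_parameter(X, n_clusters, ra_grid):
    """For a given number of clusters, pick the width parameter r_a
    whose SCM centers score the highest CDbw (cf. Table 5)."""
    best = (None, -np.inf, None)
    for ra in ra_grid:
        centers = subtractive_clustering(X, ra, n_clusters)
        labels = np.linalg.norm(
            X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        score = cdbw(X, labels, centers)
        if score > best[1]:
            best = (ra, score, centers)
    return best  # (optimal r_a, its CDbw, the SCM centers)
```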


Table 6
Optimal clustering results of the Turb training data set for each given number of clusters. Cluster centers are given as (Turb_t, Turb_{t-1}, Turb_{t+1}) in NTU.

Method | # of clusters | Cluster centers (NTU) | Intra_den | Inter_den | Sep | CDbw
SCM           | 2 | (14.5, 15.0, 14.9); (135.3, 155.1, 71.0) | 13.51 | 0.27 | 121.20 | 1637.35
SC-based KCM  | 3 | (10.9, 11.0, 11.0); (27.0, 27.9, 28.5); (74.1, 67.2, 69.4) | 16.39 | 0.33 | 54.60 | 894.76
SC-based FCM  | 4 | (8.8, 9.0, 9.1); (42.3, 42.3, 42.2); (20.5, 20.7, 20.6); (95.1, 84.9, 79.5) | 16.05 | 1.01 | 44.78 | 718.74

Fig. 9. Distribution of training target data for each number of clusters.

The training data set of each cluster was thus partitioned by the patterns of the inputs.

To compare the results of the clustered ANN models with those of the ANN models whose training data set was not clustered (unclustered ANN models), the training, test, and validation data subsets for the clustered ANN models were sampled from the same data subsets used for the unclustered ANN models, according to each cluster. The data sets used for training, test, and validation in the unclustered ANN models were therefore also used to train, test, and validate the clustered ANN models.

Each clustered ANN model was first developed using the two-clustered data sets. According to the test results of the two-clustered ANN model, the optimum number of hidden neurons was selected as 4 for both class models. The validation results were also good at the optimum number of hidden neurons. Table 7 and Fig. 10 show the aggregated results of the two-clustered ANN models. Compared to the validation results of the unclustered ANN model, the aggregated results of the two-clustered ANN model are better in terms of RMSE and R squared. Fig. 10 shows that the ANN model with the Class 1 data subset produced good results, but the model with the Class 2 data subset still shows higher errors in the range of Turb over 40 NTU. Because the data subset used for the Class 2 model ranged widely, from 10 to 160 NTU, and was still imbalanced, the Class 2 model gave a very high RMSE of 27.25 NTU and a very low R squared of 0.20; while the Class 1 model, whose data set ranged between 0 and 60 NTU, gave a lower RMSE of 3.17 NTU and a higher R squared of 0.79 than the Class 2 model.

Table 7
Comparisons of the results between the clustered models and the established models for Turb. Each cell gives RMSE (NTU) / Mean IQR (NTU) / R²; the Total rows give the aggregated validation results.

Model | # of hidden neurons | Test result | Validation result
Without cluster            | 1 | 13.19 / 0.00 / 0.56   | 11.80 / 0.00 / 0.58
Two-clustered ANN model
  Class 1 | 4 | 3.45 / 0.82 / 0.74    | 3.17 / 0.88 / 0.79
  Class 2 | 4 | 26.05 / 13.35 / 0.35  | 27.25 / 10.08 / 0.20
  Total   |   |                        | 10.96 / 5.48 / 0.64
Three-clustered ANN model
  Class 1 | 2 | 3.09 / 0.43 / 0.62    | 2.80 / 0.25 / 0.71
  Class 2 | 2 | 12.06 / 1.77 / 0.46   | 9.24 / 1.82 / 0.38
  Class 3 | 2 | 33.98 / 18.74 / 0.17  | 14.70 / 23.66 / 0.84
  Total   |   |                        | 6.34 / 8.58 / 0.88
Four-clustered ANN model
  Class 1 | 1 | 2.46 / 0.00 / 0.56    | 1.78 / 0.00 / 0.75
  Class 2 | 2 | 16.87 / 1.68 / 0.25   | 25.78 / 2.21 / 0.35
  Class 3 | 2 | 3.67 / 0.99 / 0.49    | 4.04 / 1.44 / 0.27
  Class 4 | 1 | 4.27 / 0.00 / 0.30    | 12.90 / 0.00 / –
  Total   |   |                        | 9.77 / 0.912 / 0.714


Fig. 10. Validation results of two-clustered ANN models for Turb.

In the test results of the three-clustered ANN model, the optimum number of hidden neurons was selected as 2 for all class models. Table 7 and Fig. 11 show the aggregated results of the three-clustered ANN model. The performance of each class model for the high, middle, and low values of Turb was much improved compared to the two-clustered ANN model. The aggregated results of the three-clustered ANN model were also much better, with an RMSE of 6.34 NTU and an R squared of 0.88, compared to the unclustered model and the two-clustered ANN model.

In the test results of the four-clustered ANN model, the optimum number of hidden neurons was selected as 1 for the Class 1 and 4 models, and 2 for the Class 2 and 3 models. Table 7 and Fig. 12 show the aggregated results of the four-clustered ANN model. The Class 1, 3, and 4 models show better results than the three-clustered ANN model in terms of RMSE. But the Class 2 model, in which the higher values of Turb were modeled, shows a very high RMSE. The training target data set of the Class 2 model was distributed over a wide range and was imbalanced, like Class 2 of the two-clustered ANN model. Thus, the aggregated results of the four-clustered ANN model, with an RMSE of 9.77 NTU and an R squared of 0.71, were better than those of the two-clustered ANN model, but worse than those of the three-clustered ANN model.

The above analysis revealed that each clustered ANN model for Turb gave better results than the ANN model without a clustered training data set. Using the clustering methods, the range of the original training target data was separated and decreased according to the patterns of the input data set.

Fig. 11. Validation results of three-clustered ANN models for Turb.


Fig. 12. Validation results of four-clustered ANN models for Turb.

The clustered ANN models, trained on a decreased range of training target data, produced low RMSE and high R squared values. In particular, the three-clustered ANN model gave the best results among them. A comparison of the interval RMSE of the three-clustered ANN model with that of the ANN model without clustering is depicted in Fig. 13. The figure shows that the interval RMSE of the three-clustered ANN model was greatly reduced over the whole range of Turb, compared to the model results without clustering. This resulted in the increase of the total R squared of the model with the clustered data set from 0.58 to 0.88, as shown in Table 7. This indicates that the imbalance of the training data set was alleviated by clustering with the proposed conjunctive methods, which led to the improvement in the performance of the ANN modeling.

4. Summary and conclusions

This study attempted to employ clustering methods for building training data sets, and an ensemble of models, in order to improve the performance of ANN modeling. An ANN ensemble model with training data subsets built by stratified sampling was developed and applied to Sangdong station on the Nakdong River, in order to forecast the water quality (pH_{t+1}, DO_{t+1}, Turb_{t+1}, TN_{t+1}, and TP_{t+1}).

Fig. 13. Comparisons of interval RMSE for the three-clustered ANN model and the model without clustering.


The ANN ensemble models for pH_{t+1}, DO_{t+1}, TN_{t+1}, and TP_{t+1} showed good performance, with R squared higher than 0.9. The validation results for the water quality variables whose training target data sets were distributed evenly, such as pH, DO, TN, and TP, showed low interval RMSE and high R squared. But Turb, whose data set distribution was imbalanced, showed high interval RMSE and low R squared; the distribution ratio of Turb under 50 NTU in the total range between 0 and 160 NTU was 95.17%.

To alleviate the imbalance of the training data set of Turb, the proposed clustering methods were applied to separate the training data set according to the patterns of the training data set. Then, ANN ensemble models with the clustered training data sets for each number of clusters (clustered ANN models) were developed. All clustered ANN models for Turb showed better results than the model without clustering. In particular, the three-clustered ANN model shows that the interval RMSE decreases and the R squared increases in great measure: the R squared increases from 0.58 to 0.88, and the total RMSE decreases from 11.8 NTU to 6.3 NTU.

The main conclusions for each approach can be summarized as follows:

(1) Using the ensemble modeling technique, the global performance of the ANN model, considering the variance of the ANN results over various initial weight parameters, was estimated; the optimal ANN forecasting models for each water quality variable could then be selected.
(2) Using the proposed conjunctive clustering methods for building the training data set, the modeling errors due to the imbalanced data set could be reduced, and the performance of the ANN model was improved.

Acknowledgments

This research was supported by a grant (11-TI-C06) from the Advanced Water Management Research Program funded by the Ministry of Land, Infrastructure and Transport of the Korean government, and conducted at the Engineering Research Institute and the Integrated Research Institute of Construction and Environment in Seoul National University, Seoul, Korea.

References

Alejo, R., Garcia, V., Sotoca, J.M., Mollineda, R.A., Sanchez, J.S., 2007. Improving the performance of the RBF neural networks trained with imbalanced samples. Lect. Notes Comput. Sci. 4507, 162–169.
Araghinejad, S., Azmi, M., Kholghi, M., 2011. Application of artificial neural network ensembles in probabilistic hydrological forecasting. J. Hydrol. 407 (1–4), 94–104.
Berardi, V.L., Zhang, G.P., 1999. The effect of misclassification costs on neural network classifiers. Decis. Sci. 30 (3), 659–683.
Boucher, M.A., Perreault, L., Anctil, F., 2009. Tools for the assessment of hydrological ensemble forecasts obtained by neural networks. J. Hydroinform. 11 (3–4), 297–307.
Chang, Y.-T., Lin, J., Shieh, J.-S., Abbod, M.F., 2012. Optimization the initial weights of artificial neural networks via genetic algorithm applied to hip bone fracture prediction. Adv. Fuzzy Syst. 9.
Chiu, S., 1994. Fuzzy model identification based on cluster estimation. J. Intell. Fuzzy Syst. 2, 267–278.
Halkidi, M., Batistakis, Y., Vazirgiannis, M., 2002a. Cluster validity methods: part I. Sigmod Rec. 31 (2), 40–45.
Halkidi, M., Batistakis, Y., Vazirgiannis, M., 2002b. Clustering validity checking methods: part II. Sigmod Rec. 31 (3), 19–27.
Hammouda, K., Karray, F., 2000. A Comparative Study of Data Clustering Techniques. University of Waterloo, Ontario, Canada.
Hansen, L.K., Salamon, P., 1990. Neural network ensembles. IEEE Trans. Pattern Anal. 12 (10), 993–1001.
Haykin, S., 1999. Neural Networks: a Comprehensive Foundation, second ed. Prentice Hall, Upper Saddle River, N.J., USA, p. 842.
Kasiviswanathan, K.S., Cibin, R., Sudheer, K.P., Chaubey, I., 2013. Constructing prediction interval for artificial neural network rainfall runoff models based on ensemble simulations. J. Hydrol. 499, 275–288.
Khalil, B., Ouarda, T.B.M.J., St-Hilaire, A., 2011. Estimation of water quality characteristics at ungauged sites using artificial neural networks and canonical correlation analysis. J. Hydrol. 405, 277–287.
Kolen, J.F., Pollack, J.B., 1990. Back Propagation is Sensitive to Initial Conditions. Morgan Kaufmann Publishers Inc., pp. 860–867.
Krogh, A., Vedelsby, J., 1995. Neural network ensembles, cross validation and active learning. Adv. Neural Inf. Process. Syst. 7, 231–238.
Laucelli, D., Babovic, V., Keijzer, M., Giustolisi, O., 2007. Ensemble modeling approach for rainfall/groundwater balancing. J. Hydroinform. 9 (2), 95–106.
Lu, Y., Guo, H., Feldkamp, L.A., 1998. Robust neural learning from unbalanced data samples. IEEE World Congr. Comput. Intell. 3, 1816–1821.
Maier, H.R., Jain, A., Dandy, G.C., Sudheer, K.P., 2010. Methods used for the development of neural networks for the prediction of water resource variables in river systems: current status and future directions. Environ. Model. Softw. 25 (8), 891–909.
Mulia, I.E., Tay, H., Roopsekhar, K., Tkalich, P., 2013. Hybrid ANN-GA model for predicting turbidity and chlorophyll-a concentration. J. Hydro-environ. Res. 7, 279–299.
Murphey, Y.L., Guo, H., Feldkamp, L.A., 2004. Neural learning from unbalanced data. J. Appl. Intell. 21, 117–128.
Nguyen, G., Bouzerdoum, A., Phung, S.L., 2008. A supervised learning approach for imbalanced data sets. In: Proceedings of the International Conference on Pattern Recognition, Tampa, Florida, USA, December 8–11, 2008, pp. 1–4.
Venkatesan, D., Kannan, K., Saravanan, R., 2009. A genetic algorithm-based artificial neural network model for the optimization of machining processes. Neural Comput. Appl. 18 (2), 135–140.
Yam, Y.F., Chow, T.W.S., 1995. Determining initial weights of feedforward neural networks based on least-squares method. Neural Process. Lett. 2 (2), 13–17.
Yoon, K., Kwek, S., 2007. A data reduction approach for resolving the imbalanced data issue in functional genomics. Neural Comput. Appl. 16, 295–306.
Zamani, A., Azimian, A., Heemink, A., Solomatine, D., 2009. Wave height prediction at the Caspian Sea using a data-driven model and ensemble-based data assimilation methods. J. Hydroinform. 11 (2), 154–164.
Zhou, Z., Liu, X., 2006. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans. Knowl. Data Eng. 16 (1), 63–77.
