Journal of Hydro-environment Research xx (2015) 1–15
www.elsevier.com/locate/jher
Research paper
Abstract
The Artificial Neural Network (ANN) is a powerful data-driven model that can capture and represent both linear and non-linear relationships between input and output data. Hence, ANNs have been widely used for the prediction and forecasting of water quality variables, to treat the uncertainty of contaminant sources and the nonlinearity of water quality data. However, the initial weight parameter problem and an imbalanced training data set make it difficult to assess the optimality of the results obtained, and impede the performance of ANN modeling. This study employed the ensemble modeling technique to estimate the performance of the ANN without the influence of initial weight parameters on the model results, and applied several clustering methods to alleviate the imbalance of the training data set. An ANN ensemble model was developed and applied to forecast the water quality variables pH, DO, turbidity (Turb), TN, and TP at Sangdong station on the Nakdong River. The optimal ANN model for each water quality variable could be selected from the ensemble modeling. The optimal ANN models for pH, DO, TN, and TP, whose training target data sets were distributed evenly, showed good results, with R squared higher than 0.90. But the ANN model for Turb, whose training data set was imbalanced, showed a large RMSE (11.8 NTU) and a low R squared (0.58). The training data set of Turb was therefore partitioned into several classes by conjunctive clustering methods, according to the patterns of the data set for each number of clusters. ANN ensemble models for Turb with the clustered training data sets (clustered ANN models) were then developed. All clustered ANN models for Turb showed better results than the model without clustering. In particular, the three-clustered ANN model showed an increase of R squared from 0.58 to 0.88, and a decrease of total RMSE from 11.8 NTU to 6.3 NTU.
© 2015 International Association for Hydro-environment Engineering and Research, Asia Pacific Division. Published by Elsevier B.V. All rights reserved.
Keywords: Artificial Neural Network; Ensemble modeling; Clustering; Water quality forecasting; Nakdong River
http://dx.doi.org/10.1016/j.jher.2014.09.006
1570-6443/© 2015 International Association for Hydro-environment Engineering and Research, Asia Pacific Division. Published by Elsevier B.V. All rights reserved.
Please cite this article in press as: Kim, S.E., Seo, I.W., Artificial Neural Network ensemble modeling with conjunctive data clustering for water quality
prediction in rivers, Journal of Hydro-environment Research (2015), http://dx.doi.org/10.1016/j.jher.2014.09.006
Monte Carlo experiment, Kolen and Pollack (1990) showed that the training algorithm was very sensitive to the initial weight parameters. Yam and Chow (1995) presented an algorithm based on linear algebraic methods for determining the optimal initial weight parameters, and showed that with the optimal initial weight parameters, the initial network error can be greatly reduced. Other methods involving the genetic algorithm (GA) have been implemented to find the optimal initial weight parameters, and have enhanced the accuracy of the ANN model (Venkatesan et al., 2009; Chang et al., 2012; Mulia et al., 2013). These studies agree that the optimal initial weight parameters are very sensitive to the training algorithms and data structures, and that there are no fixed optimal initial weight parameters universally applicable to the varieties of data structures and training algorithms. For this reason, ensemble techniques have been applied, based on the fact that the selection of weights is an optimization problem with many local minima (Hansen and Salamon, 1990). Laucelli et al. (2007) applied ensemble modeling and genetic programming to hydrological forecasts, and showed that the error due to the variance is effectively eliminated by using an average model (ensemble model) as the resultant model of many runs. Boucher et al. (2009) developed a one-day-ahead ensemble ANN model for streamflow forecasting. This study showed that random initialization of the weight parameters mainly accounted for the uncertainty linked to the optimization of the model's parameters, and that ensemble modeling could reduce the uncertainty, using the proper assessment tools for the performance of ensemble models. Zamani et al. (2009) developed an ensemble ANN model with a Kalman Filter that corrects the outputs of the ANNs to find the best estimate of the wave height, and showed that the prediction results were improved as the number of ensemble members increased. Khalil et al. (2011) developed an ensemble ANN model for the estimation of the mean values of four selected water quality variables. The results showed that the ensemble ANN model provided better generalization ability than the single best ANN model. These studies indicate that the ANN model cannot be guaranteed to produce an optimal result without appropriate methods for handling the initial weight parameters.

The imbalance of the training data set is another fundamental problem in ANN modeling, and has recently drawn much attention (Zhou and Liu, 2006; Alejo et al., 2007; Yoon and Kwek, 2007; Nguyen et al., 2008). The imbalance (uneven distribution) of water quality data sets is common, where the number of training instances of a minority class is much smaller compared to the majority classes (Nguyen et al., 2008). As a result, the neural network has difficulty in learning from imbalanced data sets, since the network tends to ignore the minority class and treat it as noise, due to the overwhelming number of training instances of the majority class (e.g. Murphey et al., 2004; Nguyen et al., 2008). To alleviate the problem of the imbalanced training data set, Lu et al. (1998), Berardi and Zhang (1999), and Yoon and Kwek (2007) attempted to employ resampling methods, such as over-sampling and under-sampling, and modification of the training algorithms. However, their methods involved modifying the probability or distribution of the training data set, which led to loss of information in the data set and an increase of the training time.

The objective of this study is to reduce the modeling errors of ANN in water quality prediction caused by the initial weight parameter problem and the imbalanced training data set, by employing an ensemble modeling technique and clustering methods. Ensemble modeling was applied to estimate the ANN performance by removing the effect of initial weight parameters on the variance of ANN model results. In order to alleviate the imbalance of the training data set, several clustering methods were applied to separate the training data set according to the patterns in the training data set, without modifying the probability or distribution of the data set. In this study, a one-step-ahead water quality forecasting ANN ensemble model was developed for each of pH, DO, turbidity (Turb), TN, and TP. ANN ensemble models with clustered training data sets (clustered ANN models) were developed for Turb, whose training data set was highly imbalanced.

2. Models and methods

2.1. ANN ensemble modeling

The ANN consists of very simple and highly interconnected processors called neurons. A neuron is an information-processing unit that is fundamental to the operation of a neural network, and consists of weights and an activation function (Fig. 1). The weights are the most important parameters, acting as the memory of the ANN, and the activation function provides the network with nonlinear mapping potential. The manner in which the neurons of an ANN are structured determines its architecture (Haykin, 1999). In general, there are three fundamentally different classes of network architecture. The first is a single-layer feedforward network, without hidden layers. The second is a multilayer feedforward network, with one or more hidden layers. The third is a recurrent neural network, with at least one feedback loop. In this study, the multilayer feedforward neural network (MFNN) with one hidden layer was used, because it is able to approximate most of the nonlinear functions demanded by practice (Mulia et al., 2013).

The weight parameters on the links between neurons are determined by the training algorithm. The most common and standard algorithm is the backpropagation training algorithm, the central idea of which is that the errors for the neurons of the hidden layer are determined by back-propagation of the errors of the neurons of the output layer, as shown in Fig. 1. There are a number of variations of the basic backpropagation training algorithm that are based on other standard optimization techniques, such as the steepest descent algorithm, the conjugate gradient algorithm, and Newton's method. Among the various backpropagation methods, the Levenberg–Marquardt (LM) algorithm has been very successfully applied to the training of ANNs to predict streamflow and water
Fig. 1. Schematic diagram of backpropagation training algorithm and typical neuron model.
quality, providing significant speedup and faster convergence than the steepest descent-based and conjugate gradient-based algorithms (e.g. Zamani et al., 2009). In this study, the Levenberg–Marquardt (LM) algorithm was applied to train the network.

Ensemble techniques have been applied with considerable success in hydrology and environmental science, as an approach to enhance the skill of forecasts (Krogh and Vedelsby, 1995; Araghinejad et al., 2011). The motivation for this procedure is the idea that one might improve the performance of a single generic predictor by combining the outputs of several individual predictors (Krogh and Vedelsby, 1995). The technique of combining multiple models into a single one is referred to as Ensemble Modeling (Laucelli et al., 2007). The application of an ensemble technique is divided into two steps. The first step is the creation of individual ensemble members, and the second step is the combination of the outputs of the ensemble members to produce the most appropriate output (Araghinejad et al., 2011).

In this study, the ensemble modeling technique was applied to estimate the performance of the ANN without the influence of initial weight parameters on the model results, as shown in Fig. 2. For the networks with various numbers of hidden neurons (Network 1 to n in Fig. 2), the ensemble members are neural networks with a hundred randomly generated initial weight sets (Network 1-1 to 1-100 and Network n-1 to n-100
where Q75(yi) and Q25(yi) are the 75th and 25th percentile values of the ANN ensemble model results for the ith data set, respectively, as shown in Fig. 3.
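The Mean IQR spread used above, built from the gap between Q75(yi) and Q25(yi) across ensemble members, can be sketched as follows. Using the ensemble median as the combined prediction is an illustrative assumption, not a choice stated by the paper:

```python
import numpy as np

def ensemble_summary(member_preds):
    """Summarize an ANN ensemble.

    member_preds: array of shape (n_members, n_samples), the outputs of the
    ensemble members (networks trained from different random initial weights).
    Returns the combined prediction (the median across members, an
    illustrative choice) and the Mean IQR, i.e. the average of
    Q75(y_i) - Q25(y_i) over all data sets i.
    """
    q25 = np.percentile(member_preds, 25, axis=0)  # Q25(y_i) per data set
    q75 = np.percentile(member_preds, 75, axis=0)  # Q75(y_i) per data set
    combined = np.median(member_preds, axis=0)
    mean_iqr = float(np.mean(q75 - q25))  # spread due to initial weights
    return combined, mean_iqr
```

A Mean IQR of zero (as for the single-hidden-neuron networks in Table 4) means the quartiles of the member predictions coincide, i.e. the random initial weights had no visible effect on the trained outputs.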
of belongingness of a data point to all clusters is always equal to unity.

$$\sum_{i=1}^{c} u_{ij} = 1, \quad \forall\, j = 1, \ldots, n \qquad (9)$$

The cost function for FCM is a generalization of the KCM cost function:

$$J(U, c_1, \ldots, c_c) = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} \qquad (10)$$

where u_ij is between 0 and 1, c_i is the cluster center of fuzzy group i, $d_{ij} = \| c_i - x_j \|$ is the Euclidean distance between the ith cluster center and the jth data point, and m is a weighting exponent. The necessary conditions for Eq. (10) to reach its minimum are

$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}} \qquad (11)$$

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij}/d_{kj} \right)^{2/(m-1)}} \qquad (12)$$

The algorithm works iteratively through the preceding two conditions, until no more improvement is noticed.

The subtractive clustering method (SCM) proposed by Chiu (1994) is a modified method of mountain clustering. Because the mountain function has to be evaluated at each grid point, its computation grows exponentially with the increase in dimensionality of the data. Subtractive clustering solves this problem by using the data points as the candidates for cluster centers, instead of the grid points used in mountain clustering. This means that the computation is proportional to the data size, instead of the data dimension. While the actual cluster centers are not necessarily located at one of the data points, in most cases that point is a good approximation (Hammouda and Karray, 2000). Since each data point is a candidate for a cluster center, a density measure at data point x_i is defined as

$$D_i = \sum_{j=1}^{n} \exp\!\left( - \frac{\| x_i - x_j \|^2}{(r_a/2)^2} \right) \qquad (13)$$

reduced density measure. After revising the density function, the next cluster center is selected as the point having the greatest density value. This process continues until a sufficient number of clusters is attained.

2.2.2. Cluster validity index

Humans can perform clustering procedures in two or three dimensions; but most real problems involve clustering in higher dimensions. If there is no visual perception of the clusters, it is impossible to assess the validity of the clustering result. The clustering validity index CDbw (Composing Density between and within clusters), proposed by Halkidi et al. (2002a, 2002b), is an algorithm-independent clustering index for assessing the quality of clustering. The index is based on two accepted concepts: (1) clusters' compactness; and (2) clusters' separation.

The clusters' compactness is calculated by an intra-cluster density index, which is defined as the average number of points that belong to the neighborhood of each cluster center.

$$\mathrm{Intra\_den}(c) = \frac{1}{c} \sum_{i=1}^{c} \frac{1}{n_i} \sum_{j=1}^{n_i} \mathrm{density}(v_{ij}), \quad c > 1 \qquad (15)$$

where c is the number of clusters, v_ij is the jth point of the ith cluster, and n_i is the number of points in the ith cluster. The term density(v_ij) is given by $\mathrm{density}(v_{ij}) = \sum_{k=1}^{n_i} f(x_k, v_{ij})$, and f(x_k, v_ij) is defined as

$$f(x_k, v_{ij}) = \begin{cases} 1, & \| x_k - v_{ij} \| \le \| \mathrm{stdev}(i) \| \\ 0, & \text{otherwise} \end{cases} \qquad (16)$$

where stdev(i) is the standard deviation vector of the ith cluster. f(x_k, v_ij) counts the points x_k inside a hyper-sphere with radius $\| \mathrm{stdev}(i) \|$ around v_ij. The intra-cluster density is significantly high for a well-separated cluster.

The clusters' separation is defined by the inter-cluster density and Sep. The inter-cluster density is the density in the areas between clusters; the density in between-cluster regions for a well-separated cluster is significantly low. It is defined as

$$\mathrm{Inter\_den}(c) = \sum_{i=1}^{c} \ldots$$
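A minimal FCM loop implementing the update conditions of Eqs. (11) and (12) can be sketched as follows; the weighting exponent m = 2, the iteration cap, and the tolerance are illustrative choices:

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-Means per Eqs. (10)-(12): alternate the center update (11)
    and the membership update (12) until the memberships stop changing.
    X: (n, d) data array; c: number of clusters."""
    rng = np.random.default_rng(seed)
    n = len(X)
    U = rng.random((c, n))
    U /= U.sum(axis=0)  # enforce Eq. (9): memberships of each point sum to 1
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)  # Eq. (11)
        # d[i, j] = Euclidean distance between center i and point j
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against a point sitting on a center
        # Eq. (12): u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        U_new = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0)),
                             axis=1)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U
```

On well-separated data the loop converges in a few iterations; the returned membership matrix U still satisfies the unity constraint of Eq. (9) column by column.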
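The SCM density measure of Eq. (13) can be sketched directly; the value of the width parameter r_a is illustrative:

```python
import numpy as np

def scm_density(X, ra=0.5):
    """Density measure of subtractive clustering, Eq. (13):
    D_i = sum_j exp(-||x_i - x_j||^2 / (ra/2)^2).
    The point with the greatest D_i is taken as the first cluster center;
    the densities are then revised (reduced) around that center before the
    next center is selected."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    return np.exp(-sq_dist / (ra / 2.0) ** 2).sum(axis=1)
```

Points surrounded by many close neighbours score highest, so the densest region of the data yields the first center, and an isolated outlier scores near 1 (its own contribution only).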
$$f(x_l, u_{ij}) = \begin{cases} 1, & \| x_l - u_{ij} \| \le \left( \| \mathrm{stdev}(i) \| + \| \mathrm{stdev}(j) \| \right)/2 \\ 0, & \text{otherwise} \end{cases} \qquad (19)$$

The clusters' separation includes both the inter-cluster distances and the inter-cluster density. The goal of clustering is that the inter-cluster distance is significantly high, while the inter-cluster density is significantly low. The definition of the clusters' separation is

$$\mathrm{Sep}(c) = \sum_{i=1}^{c} \sum_{\substack{j=1 \\ j \ne i}}^{c} \frac{\| \mathrm{close\_rep}(i) - \mathrm{close\_rep}(j) \|}{1 + \mathrm{Inter\_den}(c)}, \quad c > 1 \qquad (20)$$

The overall clustering validity index is defined as

$$\mathrm{CDbw}(c) = \mathrm{Intra\_den}(c) \times \mathrm{Sep}(c) \qquad (21)$$

The CDbw is significantly high for a well-separated cluster.

2.2.3. Conjunctive clustering methods with CDbw

Among the clustering methods, KCM and FCM rely on knowing the number of clusters and the initial cluster centers. In that case, the algorithm tries to partition the data into the given number of clusters. KCM is very simple and fast. However, two key features are regarded as its biggest drawbacks. One is that the number of clusters is an input parameter: an inappropriate choice of the number of clusters may yield poor results. The other is that there is no guarantee that it will converge to the global optimum, and the result may depend on the initial cluster centers. FCM has the same problems as KCM.

In other cases, it is not necessary to know the number of clusters from the beginning; the algorithm starts by finding the first prototype cluster, then finds the second, and so on. The SCM is of this type. In both cases, this clustering method can be applied; however, if the number of clusters and the initial cluster centers are not known, KCM and FCM cannot be used (Hammouda and Karray, 2000). SCM and KCM or FCM can therefore complement each other, with SCM providing the initial cluster centers and the number of clusters for KCM or FCM. However, the number of clusters and the cluster centers of the SCM change depending on the width parameter r_a of the density function, while for this complementary use the SCM needs to yield the number of clusters given from the beginning. In this study, CDbw was applied to define the optimum width parameter of SCM, in order to get the initial cluster centers for the given number of clusters; these conjunctive methods are named Subtractive-based KCM and Subtractive-based FCM with CDbw. Fig. 4 illustrates the Subtractive-based KCM and Subtractive-based FCM for a given number of clusters, using CDbw.

2.2.4. ANN modeling using conjunctive clustering method with CDbw

Imbalanced training data can be considered from two points of view. The first is the distribution and range of the output (desired) data. The neural network is calibrated by a training algorithm for the given historical output data in the training data set. So it cannot be guaranteed that the network will produce optimal results for a range of output that the network was not trained for. The training algorithm is a kind of optimization algorithm for minimizing the sum of squared errors between the given historical output data and the model results. Naturally, the network is optimized by the output data in the range where a large amount of data is distributed, or where the magnitude of the values is high, since these significantly affect the sum of squared errors. The second is the patterns of the input data. The network is trained by the input data for the output data.
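The compactness side of CDbw, Eqs. (15)–(16), can be sketched with a single representative point per cluster (its center). This is a simplification of the multi-representative form used by Halkidi et al., adopted here only for illustration:

```python
import numpy as np

def intra_den(X, labels, centers):
    """Intra-cluster density, Eqs. (15)-(16), simplified to one
    representative (the center) per cluster: for each cluster, compute the
    fraction of its points falling inside the hyper-sphere of radius
    ||stdev(i)|| around the representative, then average over clusters."""
    total = 0.0
    for i, center in enumerate(centers):
        pts = X[labels == i]
        radius = np.linalg.norm(pts.std(axis=0))    # ||stdev(i)||
        dist = np.linalg.norm(pts - center, axis=1)
        total += np.sum(dist <= radius) / len(pts)  # Eq. (16) count, averaged
    return total / len(centers)                     # Eq. (15) outer average
```

A compact cluster keeps most of its points inside the stdev-radius sphere, driving Intra_den up; combined with Sep of Eq. (20), this is what CDbw of Eq. (21) rewards.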
In the case that the network is trained with a different input vector for the same output vector, or with the same input vector for a different output vector, the modeling error can be increased.

In this study, the proposed conjunctive clustering methods with CDbw were applied to the patterns of the input vectors, in order to decrease the range of the output vectors. Once the training data set is partitioned by the proposed clustering methods into
Table 2
Stratified sampling result for building the training, test and validation data subsets of DO.

Interval (mg/L)                               Number of sampled data
Lower  Upper   # of data   Ratio (%)a   Training   Test   Validation
1      2       3           0.38         18         3      3
2      3       8           1.02
3      4       5           0.64
4      5       8           1.02
5      6       32          4.08         24         4      4
6      7       68          8.66         50         9      9
7      8       92          11.72        68         12     12
8      9       122         15.54        90         16     16
9      10      110         14.01        82         14     14
10     11      94          11.97        70         12     12
11     12      57          7.26         43         7      7
12     13      62          7.90         46         8      8
13     14      45          5.73         35         5      5
14     15      48          6.11         36         6      6
15     16      16          2.04         12         2      2
16     17      7           0.89         11         2      2
17     18      6           0.76
18     19      1           0.13
19     20      1           0.13
Total          785         100          585        100    100

a Distribution ratio (%) = number of data in the interval / total number of data × 100.

Fig. 7. Variance of ANN results for various numbers of training data sets, hidden neurons and ensemble members for DO.
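The interval-wise split of Table 2 amounts to stratified sampling at roughly 585:100:100 (about 0.75 : 0.125 : 0.125). A minimal sketch, with the bin edges and the split ratios as adjustable assumptions:

```python
import numpy as np

def stratified_split(y, edges, ratios=(0.75, 0.125, 0.125), seed=0):
    """Stratified sampling as in Table 2: bin the target y into intervals,
    then draw the train/test/validation shares from every interval, so each
    subset mirrors the overall target distribution. Returns index lists."""
    rng = np.random.default_rng(seed)
    bins = np.digitize(y, edges)  # interval label for each sample
    train, test, val = [], [], []
    for b in np.unique(bins):
        idx = np.where(bins == b)[0]
        rng.shuffle(idx)
        n_tr = int(round(ratios[0] * len(idx)))
        n_te = int(round(ratios[1] * len(idx)))
        train += list(idx[:n_tr])
        test += list(idx[n_tr:n_tr + n_te])
        val += list(idx[n_tr + n_te:])
    return train, test, val
```

Because every interval contributes in the same proportion, a rare interval (e.g. the 16–17 mg/L bin of Table 2) is still represented in all three subsets instead of being absorbed entirely into the training set.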
Table 4
Comparisons of each ANN ensemble model result for various numbers of hidden neurons.

Output    # of hidden   Training                     Test                         Validation
vector    neurons       RMSE     Mean IQR   R2       RMSE     Mean IQR   R2       RMSE     Mean IQR   R2
pH_t+1    1             0.123    0          0.957    0.162    0          0.927    0.161    0          0.926
          2             0.120    0.016      0.959    0.174    0.016      0.917    0.161    0.019      0.927
          4             0.117    0.016      0.961    0.163    0.016      0.927    0.166    0.018      0.922
          8             0.111    0.025      0.965    0.151    0.028      0.937    0.178    0.028      0.910
          AR(2)         0.150    –          0.935    0.188    –          0.903    0.173    –          0.915
DO_t+1    1             0.812    0          0.924    0.766    0          0.931    0.828    0          0.919
          2             0.789    0.118      0.928    0.764    0.088      0.931    0.861    0.121      0.912
          4             0.757    0.133      0.934    0.777    0.111      0.929    1.069    0.144      0.865
          8             0.688    0.198      0.945    0.787    0.179      0.927    2.276    0.403      0.388
          AR(2)         0.850    –          0.916    0.836    –          0.918    0.815    –          0.921
Turb_t+1  1             12.502   0          0.609    13.189   0          0.559    11.799   0          0.583
          2             11.713   1.499      0.657    13.233   1.490      0.556    11.971   1.502      0.571
          4             10.507   2.060      0.724    14.098   2.534      0.496    12.592   2.121      0.525
          8             8.306    1.611      0.827    32.585   4.086      −1.693   14.626   2.243      0.359
          AR(2)         14.007   –          0.509    11.835   –          0.645    12.943   –          0.498
TN_t+1    1             0.170    0          0.943    0.528    0          0.437    0.153    0          0.952
          2             0.167    0.018      0.945    0.283    0.018      0.838    0.217    0.017      0.904
          4             0.161    0.022      0.949    0.374    0.022      0.717    0.258    0.024      0.864
          8             0.155    0.030      0.953    0.290    0.030      0.830    0.214    0.041      0.907
          AR(2)         0.184    –          0.933    0.306    –          0.811    0.274    –          0.847
TP_t+1    1             0.011    0          0.906    0.011    0          0.917    0.011    0          0.911
          2             0.011    0.002      0.912    0.010    0.001      0.918    0.011    0.002      0.912
          4             0.010    0.002      0.919    0.010    0.002      0.919    0.011    0.002      0.912
          8             0.009    0.002      0.930    0.010    0.002      0.925    0.011    0.002      0.909
          AR(2)         0.011    –          0.902    0.011    –          0.914    0.011    –          0.910
Fig. 8. Interval RMSE according to the distribution ratio of training target data in validation results.
Table 5
Clustering results by SCM of the training data set for Turb.

# of clusters   r_a    r_b     Cluster center (NTU)              Intra_den   Inter_den   Sep      CDbw
                               Turb_t   Turb_t−1   Turb_t+1
2               0.6    0.9     14.5     15.0       14.9          13.51       0.27        121.20   1637.35
                               135.3    155.1      71.0
3               0.15   0.225   11.1     10.9       11.2          18.57       0.57        38.41    713.11
                               23.4     23.8       23.0
                               41.7     36.1       40.3
4               0.5    0.75    13.5     14.0       14.3          7.97        0.99        53.64    427.62
                               62.6     32.8       149.8
                               100.9    131.7      72.3
                               155.1    43.4       135.3
trained, tested, and validated for all possible outputs in the available data set. Among the 785 available data sets, 585 data sets were used to train the network, and 100 data sets each were used for test and validation. Table 2 shows the stratified sampling results for building the training, test, and validation data sets of DO. The data subsets for pH, Turb, TN, and TP were built in the same way.

Each one-step-ahead water quality prediction ANN ensemble model for pH, DO, Turb, TN, and TP was developed with all available data sets. Single-output neural networks with one hidden layer were used, and the number of hidden neurons was set to 1, 2, 4, and 8, i.e. from less than to several times the number of input neurons (2 input neurons, with times t and t−1). In addition, autoregressive models with antecedents of each variable, the AR(2) model in Eq. (22), were developed from the training data set. The AR(2) models for each variable are shown in Table 3.

$$\mathrm{AR}(2): \; Y_{t+1} = a Y_t + b Y_{t-1} + \varepsilon_{t+1} \qquad (22)$$

where Y_t+1 is the desired output, Y_t and Y_t−1 are the antecedents of the desired output, and ε_t+1 is a constant or noise.

The AR(2) models developed using the training data were applied to the test and validation data sets, to compare their results with the ANN ensemble models. Table 4 shows the results of each AR(2) model and each ANN ensemble model for various numbers of hidden neurons.

In the results, the ANN ensemble models show slightly lower RMSE and higher R squared than the AR(2) models for all variables. In the training results of the ANN ensemble models, the networks with higher numbers of hidden neurons were well trained, showing higher R squared; but the IQR, indicating the variance error of the network, was also increasing. The optimum number of hidden neurons showing good performance in the test and validation results can be selected as 1 hidden neuron for pH_t+1, DO_t+1, TP_t+1, and Turb_t+1, and 2 for TN_t+1. The validation results of pH_t+1, DO_t+1, TP_t+1, and TN_t+1 with the optimum number of hidden neurons are good, with R squared higher than 0.90. However, the networks for Turb_t+1 were not well trained. Consequently, the test and validation results for Turb_t+1 show a higher RMSE and lower R squared, compared to the results of the other water quality ANN models. These higher errors were considered to be due to the imbalanced training data set of Turb. Fig. 8 shows the distribution ratio of the training target data set (output data in the training data set) and the interval RMSE of the validation results, with the optimum number of hidden neurons. The RMSE in the intervals where a large amount of data is distributed is usually low. The ANN models for which the training target data set is evenly distributed, such as pH_t+1, DO_t+1, TP_t+1, and TN_t+1, show low interval RMSE. But the training target data set of Turb_t+1 is significantly imbalanced; only 4.84% of the training target data set lies in the range between 50 NTU and 160 NTU. This led to ill-training of the network, and a very high RMSE in the intervals where only a small number of data is distributed.

3.4. ANN ensemble modeling with clustered data set for Turb

The proposed conjunctive clustering methods with CDbw were applied to the data division, for building balanced training data sets of Turb, since the higher error of the ANN modeling results for Turb came from the imbalanced training data set. Using the CDbw, the appropriate width parameters need to be selected. In this study, varying r_a, the width parameter showing the highest CDbw of the cluster results was selected as the optimal width parameter for a given number of clusters. The r_b was set to be 50% larger than r_a, as is usually applied in the SCM. The optimal width parameters and cluster centers of the SCM for the given numbers of clusters are shown in Table 5. The cluster centers obtained by the SCM for each given number of clusters were then used as the initial centers for the KCM and FCM clustering. When the number of clusters was two, the cluster result by SCM was found to be the optimal cluster, since its CDbw was the highest among the results of the three cluster methods. When the number of clusters was three and four, the cluster results by the Subtractive-based KCM and the Subtractive-based FCM were selected as the optimal clusters, respectively. The optimal cluster results and the distribution of the Turb data set for the given numbers of clusters are shown in Table 6 and Fig. 9. In the distribution of the clustered training data set, the training target data Turb_t+1 overlapped in the range of low training target data Turb_t+1. This means that the training target data is the same, but the patterns of the input data (Turb_t and Turb_t−1) are different.
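The AR(2) benchmark of Eq. (22) can be fitted by ordinary least squares; a minimal sketch (the noiseless test series used below is illustrative, not data from the paper):

```python
import numpy as np

def fit_ar2(y):
    """Least-squares fit of the AR(2) model of Eq. (22):
    Y_{t+1} = a*Y_t + b*Y_{t-1} + eps_{t+1}.
    y: 1-D array of the time series. Returns the coefficient array [a, b]."""
    A = np.column_stack([y[1:-1], y[:-2]])  # regressors [Y_t, Y_{t-1}]
    target = y[2:]                          # Y_{t+1}
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef
```

With the coefficients in hand, one-step-ahead forecasts for the test and validation periods are simply a*Y_t + b*Y_{t−1}, which is how the AR(2) rows of Table 4 can be compared against the ANN ensemble results.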
Table 6
Optimal clustering results of the Turb training data set for given numbers of clusters.

Method         # of clusters   Cluster center (NTU)             Intra_den   Inter_den   Sep      CDbw
                               Turb_t   Turb_t−1   Turb_t+1
SCM            2               14.5     15.0       14.9         13.51       0.27        121.20   1637.35
                               135.3    155.1      71.0
SC-based KCM   3               10.9     11.0       11.0         16.39       0.33        54.60    894.76
                               27.0     27.9       28.5
                               74.1     67.2       69.4
SC-based FCM   4               8.8      9.0        9.1          16.05       1.01        44.78    718.74
                               42.3     42.3       42.2
                               20.5     20.7       20.6
                               95.1     84.9       79.5

Table 7
Comparisons of the results between the clustered models and the established models for Turb.

Model                       Class     # of hidden   Test result                     Validation result
                                      neurons       RMSE     Mean IQR   R2          RMSE     Mean IQR   R2
                                                    (NTU)    (NTU)                  (NTU)    (NTU)
Without cluster             –         1             13.19    0.00       0.56        11.80    0.00       0.58
Two-clustered ANN model     Class 1   4             3.45     0.82       0.74        3.17     0.88       0.79
                            Class 2   4             26.05    13.35      0.35        27.25    10.08      0.20
                            Total                                                   10.96    5.48       0.64
Three-clustered ANN model   Class 1   2             3.09     0.43       0.62        2.80     0.25       0.71
                            Class 2   2             12.06    1.77       0.46        9.24     1.82       0.38
                            Class 3   2             33.98    18.74      0.17        14.70    23.66      0.84
                            Total                                                   6.34     8.58       0.88
Four-clustered ANN model    Class 1   1             2.46     0.00       0.56        1.78     0.00       0.75
                            Class 2   2             16.87    1.68       0.25        25.78    2.21       0.35
                            Class 3   2             3.67     0.99       0.49        4.04     1.44       0.27
                            Class 4   1             4.27     0.00       0.30        12.90    0.00       –
                            Total                                                   9.77     0.912      0.714
In the test results of the three-clustered ANN model, the optimum number of hidden neurons was selected as 2 hidden neurons for all class models. Table 7 and Fig. 11 show the aggregated results of the three-clustered ANN model. The performance of each class model for high, middle, and low values of Turb was much improved, compared to the two-clustered ANN model. The aggregated results of the three-clustered ANN model were also much better, with an RMSE of 6.34 NTU and an R squared of 0.88, compared to the unclustered model and the two-clustered ANN model.

In the test results of the four-clustered ANN model, the optimum number of hidden neurons was selected as 1 hidden neuron for the Class 1 and 4 models, and 2 hidden neurons for the Class 2 and 3 models. Table 7 and Fig. 12 show the aggregated results of the four-clustered ANN model. The Class 1, 3 and 4 models show better results than the three-clustered ANN model, in terms of RMSE. But the Class 2 model, in which the higher values of Turb were modeled, shows a very high RMSE. The training target data set of the Class 2 model was distributed over a wide range, and imbalanced, like Class 2 of the two-clustered ANN model. Thus, the aggregated results of the four-clustered ANN model, with an RMSE of 9.77 NTU and an R squared of 0.71, were better than those of the two-clustered ANN model, but worse than those of the three-clustered ANN model.

The above analysis revealed that each clustered ANN model for Turb gave better results than the ANN model without a clustered training data set. Using the clustering method, the range of the original training target data was separated and decreased according to the patterns of the input data set. The clustered ANN models trained on a decreased range of the training target data set produced low RMSE and high R squared values. In particular, the three-clustered ANN model gave the best results among them. A comparison of the interval RMSE of the three-clustered ANN model with the ANN model without clustering is depicted in Fig. 13. The figure shows that the interval RMSE of the three-clustered ANN model was greatly reduced over the whole range of Turb, compared to the model results without clustering. This resulted in the increase of the total R squared of the model with the clustered data set from 0.58 to 0.88, as shown in Table 7. This indicates that the imbalance of the training data set was alleviated by clustering using the proposed conjunctive methods, which led to the improvement in the performance of the ANN modeling.

4. Summary and conclusions

This study attempted to employ clustering methods for building training data sets, and an ensemble of models, in order to improve the performance of ANN modeling. In this study, an ANN ensemble model with training data subsets built by stratified sampling was developed, and was applied to Sangdong station on the Nakdong River, in order to forecast the water quality (pH_t+1, DO_t+1, Turb_t+1, TN_t+1, and TP_t+1).
Fig. 13. Comparisons of interval RMSE for the three-clustered ANN model and the model without clustering.
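The evaluation metrics used throughout this comparison — total RMSE, R squared, and interval RMSE (RMSE computed separately within sub-ranges of the observed values) — can be stated explicitly. A minimal numpy sketch follows; the interval edges in the example are hypothetical, since the paper's exact binning is not reproduced in this excerpt:

```python
import numpy as np

def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def interval_rmse(obs, pred, edges):
    """RMSE computed separately within each observed-value interval."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    out = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (obs >= lo) & (obs < hi)
        if mask.any():
            out[(lo, hi)] = rmse(obs[mask], pred[mask])
    return out

# example with hypothetical edges (not the paper's binning)
obs_demo = np.array([10.0, 40.0, 120.0])
pred_demo = np.array([12.0, 35.0, 90.0])
print(interval_rmse(obs_demo, pred_demo, [0, 50, 160]))
```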
The ANN ensemble models for pH(t+1), DO(t+1), TN(t+1), and TP(t+1) showed good performance, with R squared higher than 0.9. The validation results for the water quality variables whose training target data sets were distributed evenly, such as pH, DO, TN, and TP, showed low interval RMSE and high R squared. But Turb, whose data set was distributed unevenly, showed high interval RMSE and low R squared; the distribution ratio of Turb under 50 NTU, within the total range of 0 to 160 NTU, was 95.17%.
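The kind of skew behind that distribution ratio is straightforward to quantify. A small sketch with a hypothetical, exponentially distributed Turb-like sample (not the station's actual data):

```python
import numpy as np

# hypothetical Turb-like sample: heavily skewed toward low values
rng = np.random.default_rng(7)
turb = np.clip(rng.exponential(scale=15.0, size=2000), 0.0, 160.0)

# distribution ratio: share of observations below 50 NTU
ratio_under_50 = float(np.mean(turb < 50.0))
print(f"ratio under 50 NTU: {ratio_under_50:.2%}")
```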
To alleviate the imbalance of the training data set of Turb, the proposed clustering methods were applied to separate the training data set according to its patterns. The ANN ensemble models with the clustered training data set for each number of clusters (clustered ANN models) were then developed. All clustered ANN models for Turb showed better results than the model without clustering. In particular, for the three-clustered ANN model the interval RMSE decreased and R squared increased markedly: R squared increased from 0.58 to 0.88, and the total RMSE decreased from 11.8 NTU to 6.3 NTU.
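The train-per-class, route-at-prediction workflow just described can be sketched end to end. The snippet below is only illustrative: a plain 1-D k-means stands in for the paper's conjunctive clustering, and per-class linear fits stand in for the class ANNs:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical training data: inputs x, piecewise target y with a skewed upper tail
x_train = rng.uniform(0.0, 1.0, 300)
y_train = np.where(x_train < 0.8, 10.0 * x_train, 100.0 * x_train)

def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means: returns centroids and a cluster label per value."""
    centroids = np.quantile(values, np.linspace(0.1, 0.9, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = values[labels == j].mean()
    return centroids, labels

# 1. cluster the training inputs into k classes
k = 3
centroids, labels = kmeans_1d(x_train, k)

# 2. train one model per class (a linear fit stands in for a class ANN)
class_models = [np.polyfit(x_train[labels == j], y_train[labels == j], 1)
                for j in range(k)]

# 3. route each test input to its nearest centroid and aggregate predictions
x_test = rng.uniform(0.0, 1.0, 100)
test_labels = np.argmin(np.abs(x_test[:, None] - centroids[None, :]), axis=1)
y_pred = np.array([np.polyval(class_models[j], xi)
                   for j, xi in zip(test_labels, x_test)])
```

Aggregating the per-class predictions this way lets each class model specialize on a narrower target range, which is the mechanism the paper credits for the lower interval RMSE.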
The main conclusions for each approach can be summarized as follows:

(1) Using the ensemble modeling technique, the global performance of the ANN model, accounting for the variance of the ANN results over various initial weight parameters, was estimated; the optimal ANN forecasting model for each water quality variable could then be selected.
(2) Using the proposed conjunctive clustering methods for building the training data set, the modeling errors caused by the imbalanced data set were reduced, and the performance of the ANN model was improved.
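Conclusion (1) rests on retraining the same network from many random initializations and examining the spread of outcomes before selecting a member. A toy numpy sketch of that procedure (hypothetical data and a deliberately tiny MLP, not the authors' architecture or training setup):

```python
import numpy as np

# toy data: y depends nonlinearly on x
x = np.linspace(-1.0, 1.0, 60).reshape(-1, 1)
y = np.sin(2.0 * x)

def train_mlp(x, y, n_hidden, seed, epochs=400, lr=0.05):
    """Train a one-hidden-layer tanh MLP by full-batch gradient descent;
    return its final training RMSE."""
    r = np.random.default_rng(seed)
    w1 = r.normal(0.0, 0.5, (x.shape[1], n_hidden)); b1 = np.zeros(n_hidden)
    w2 = r.normal(0.0, 0.5, (n_hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)
        pred = h @ w2 + b2
        err = pred - y
        # backpropagation of the mean-squared-error gradient
        gw2 = h.T @ err / len(x); gb2 = err.mean(0)
        dh = (err @ w2.T) * (1.0 - h ** 2)
        gw1 = x.T @ dh / len(x); gb1 = dh.mean(0)
        w1 -= lr * gw1; b1 -= lr * gb1; w2 -= lr * gw2; b2 -= lr * gb2
    pred = np.tanh(x @ w1 + b1) @ w2 + b2
    return float(np.sqrt(np.mean((pred - y) ** 2)))

# ensemble: same architecture, different random initial weights
errors = [train_mlp(x, y, n_hidden=4, seed=s) for s in range(10)]
print(f"mean RMSE {np.mean(errors):.3f}, spread {np.std(errors):.3f}")
best = int(np.argmin(errors))   # member selected as the optimal model
```

The spread across seeds is the variance the conclusion refers to; picking the best member from the ensemble removes the dependence of the reported result on a single initialization.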
Acknowledgments

This research was supported by a grant (11-TI-C06) from the Advanced Water Management Research Program funded by the Ministry of Land, Infrastructure and Transport of the Korean government, and conducted at the Engineering Research Institute and the Integrated Research Institute of Construction and Environment at Seoul National University, Seoul, Korea.