
Available online at www.sciencedirect.com

ScienceDirect
Aquatic Procedia 4 (2015) 1099 – 1106

INTERNATIONAL CONFERENCE ON WATER RESOURCES, COASTAL AND OCEAN ENGINEERING (ICWRCOE 2015)

Inference of Water Quality Index using ANFIS and PCA


Mrunmayee M. Sahoo a,*, K. C. Patra b, K. K. Khatua c
a Department of Civil Engineering, NIT, Rourkela, India
b Department of Civil Engineering, NIT, Rourkela, India
c Department of Civil Engineering, NIT, Rourkela, India

Abstract

River Brahmani is reported to be polluted by effluents discharged from nearby industries, towns and villages located
along its banks. The presence of heavy metals and radioactive material makes the water unsuitable for human use.
Fertilizers used for agricultural purposes affect the pH and nitrate content of the water. Evaluation of the Water
Quality Index (WQI) is extremely important at the gauging stations located near the industries in order to prepare
remedial measures. To this end, the present study proposes an adaptive neuro-fuzzy inference system (ANFIS) for the
prediction of water quality in Brahmani River. The water quality parameters used for assessment are usually
intercorrelated, which makes an assessment unreliable. Therefore, the parameters are transformed into uncorrelated
components using principal component analysis with varimax rotation. The uncorrelated values are fuzzified to take
into account uncertainty and imprecision during data collection and application in ANFIS. An efficient rule base and
an optimal distribution of membership functions are constructed with the hybrid learning algorithm of ANFIS in
MATLAB. The model performed quite satisfactorily when actual and predicted water quality data were compared.
© 2015 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of organizing committee of ICWRCOE 2015.
Keywords: ANFIS ; Correlation; MATLAB; Membership function; Principal component; WQI

1. Introduction

Water availability means the ideal combination of quality groundwater and surface water resource taken together at
a locality. While rivers form the lifeblood of most of the cities, towns and villages across the country, groundwater
is also vital to India’s people.
_____________
Corresponding author. Tel.: 08596024674
Email Address: mrunmayee.nitrkl@outlook.com

2214-241X © 2015 The Authors. Published by Elsevier B.V.
doi:10.1016/j.aqpro.2015.02.139

There is tremendous variation in both the quantity and the quality of discharge from region to region in river basins.
With a few exceptions, all the medium and minor river basins originate in the mountains and thus share a common
feature: they are fast flowing and monsoon-fed in the hilly regions, and by the time they reach the plains they are
tidal. The treated or untreated discharges from such sources always find a way into the rivers, which oscillate
like a pendulum owing to their seasonal flow character. The surface water of River Brahmani is found to be
extremely variable in its chemical composition owing to seasonal variations in the relative contributions of
groundwater and surface water sources. The mineral content of river water usually bears an inverse relationship to
discharge. It tends to increase from source to mouth, although the increase may not be continuous or uniform.
Other factors such as the discharge of city wastewater, industrial waste and the mixing of waters can also affect
the nature and concentration of minerals in surface water. Assessment of water quality involves the analysis of
the physical, chemical and biological characteristics of water. Knowledge of water quality and evaluation of the
water quality index (WQI) play a significant role in water quality control and management. The index helps in
expressing the water quality as a single numerical value. WQI is strongly dependent on the various correlated
parameters taken for the study. Also, identification of the suitability of the parameters is critical for accurate
evaluation of WQI. Water quality is generally ascertained based on guidelines provided by agencies such as the [...].
Correlation between parameters can be avoided if a data reduction technique such as principal component analysis
(PCA) is used to obtain independent principal components. The uncorrelated parameters can then be used to assess the
quality of water through a standard prediction methodology such as neural network or fuzzy logic toolboxes.
However, the fuzzy logic toolbox is preferred because it can take into account uncertainty and imprecision in the
data. Neural networks, with their learning techniques, can be used to learn the fuzzy decision rules. This
combination merges the advantages of a fuzzy system and a neural network. Sahu et al. (2011) predicted the water
quality index using an adaptive neuro-fuzzy inference system.
ANFIS is an adaptive fuzzy inference system implemented in the framework of neural networks. In this study, the
WQI of River Brahmani at five gauging sites (Panposh downstream, Talcher upstream, Kamalanga downstream, Aul and
Pottamundai), located in urban areas adjacent to industries, is predicted using ANFIS, considering 11 water
quality parameters to improve the prediction capability. The correlation of these parameters has been studied, and
the parameters are converted into uncorrelated principal components using SPSS to serve as input to the ANFIS system.
The paper presents the prediction of the WQI of River Brahmani by adopting the architecture of an ANFIS network to
create a set of fuzzy IF-THEN rules and a fuzzy inference system with membership functions to obtain the result.
Neuro-fuzzy rules play an important role in the human ability to make decisions. Hence, fuzzy IF-THEN rules are used
to make decisions in uncertainty analysis.

2. Data Collection and Analysis

Fig. 1. Map showing the locations of the gauging stations on the Brahmani River.



In the present study, data from River Brahmani in Odisha are considered for water quality indexing. The Panposh
gauging station is located at Rourkela in Odisha. The gauging stations lie close to well-known industries: the
integrated Rourkela Steel Plant, the NALCO smelter plant and captive power plant, Mahanadi Coalfields Limited and
chromite mines. The river water adjacent to these industries is heavily contaminated, resulting in acidity, toxicity
and the presence of heavy metals and microbes. Five gauging stations, namely Panposh downstream, Talcher upstream,
Kamalanga downstream, Aul and Pottamundai, are selected on the basis of the mining and industrial activities
prevalent nearby. At these five gauging stations, data were sampled during the monsoon season from 2003 to 2011.
The Pearson correlation matrix of the studied parameters was prepared from the monsoon-season data of the nine
years and is shown in Table 1. It is observed that TC and NH4-N exhibit slight correlation with pH, with
coefficients of 0.147 and 0.175 respectively. COD and BOD show strong correlation, with a Pearson correlation
coefficient of 0.747. BOD and FC show a slight correlation of 0.361. TC, FC and COD show slight correlation with
conductivity, with correlation coefficients of 0.325, 0.380 and 0.331 respectively. When parameters exhibit strong
or moderate correlation with each other, WQI may not characterize the quality of water. Therefore, it is important
to convert correlated parameters into uncorrelated parameters for efficient forecasting of water quality. Principal
component analysis provides a suitable method to transform correlated parameters into uncorrelated components.

Table 1: Pearson correlation matrix

              pH      DO      BOD     EC      Nitrate-N  TC      FC      COD     NH4-N   TA as CaCO3  TH as CaCO3
pH            1.000
DO            0.030   1.000
BOD           0.120   0.224   1.000
EC            0.073   0.360   0.290   1.000
Nitrate-N     0.160   0.044   0.342   0.001   1.000
TC            0.147   0.362   0.243   0.325   -0.038     1.000
FC            0.126   0.308   0.361   0.380   -0.006     0.764   1.000
COD           0.120   0.220   0.747   0.331   0.227      0.224   0.312   1.000
NH4-N         0.175   0.017   0.048   0.239   -0.175     0.093   0.117   0.128   1.000
TA as CaCO3   0.072   0.142   0.087   0.206   0.062      0.052   0.054   0.122   0.019   1.000
TH as CaCO3   0.137   0.029   0.316   0.173   0.396      0.178   0.229   0.438   0.115   0.274        1.000
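
A Pearson correlation matrix of the kind shown in Table 1 can be computed as in the following minimal sketch. The sample values are synthetic stand-ins, not the Brahmani measurements, and the names `pearson_matrix` and `samples` are illustrative assumptions.

```python
import numpy as np

def pearson_matrix(data):
    """Return the Pearson correlation matrix of the columns of `data`."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # Sample covariance matrix of the columns.
    cov = centered.T @ centered / (data.shape[0] - 1)
    std = np.sqrt(np.diag(cov))
    # Dividing by the outer product of standard deviations gives correlations.
    return cov / np.outer(std, std)

# Synthetic stand-in data (hypothetical): 45 samples of three parameters.
rng = np.random.default_rng(0)
bod = rng.normal(3.0, 1.0, size=45)
cod = 4.0 * bod + rng.normal(0.0, 2.0, size=45)  # constructed to correlate with BOD
ph = rng.normal(7.5, 0.3, size=45)               # constructed independently
samples = np.column_stack([ph, bod, cod])

corr = pearson_matrix(samples)
print(np.round(corr, 3))
```

The diagonal is 1 by construction, and a large off-diagonal entry (here between the synthetic BOD and COD columns) signals the kind of redundancy that motivates the PCA step.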

3. Principal Component Analysis (PCA)

The central idea of principal component analysis (PCA) is to reduce the dimensionality of a data set consisting of a
large number of interrelated variables, while retaining as much as possible of the variation present in the data set.
All of the new variables are uncorrelated with each other (whereas the original, untransformed variables may have
been correlated to a lesser or greater extent). The new principal component (PC) axes (Y1, Y2, ..., Yp) are
uncorrelated (e.g. Y1 and Y2 are perpendicular as shown in Fig. 2).
In principle, each of the principal components is a linear combination of the original X values for the p variables,
given as:
PC1 = c11 X1 + c12 X2 + c13 X3 + ... + c1p Xp (axis Y1)
PC2 = c21 X1 + c22 X2 + c23 X3 + ... + c2p Xp (axis Y2)
...
PCp = cp1 X1 + cp2 X2 + cp3 X3 + ... + cpp Xp (axis Yp) (1)
where cab is the component score coefficient for variable b on PC axis Ya, and Xb is the X score for variable b.
PCA converts a multivariate set of variables (X1, X2, ..., Xp) to new variables (Y1, Y2, ..., Yp), which are
uncorrelated with each other. The first principal component consists of a principal component coefficient (αi) for
each of the p variables, chosen such that the variance of the calculated scores across the n cases is maximal; the
factor score for each case is calculated as α1X1 + α2X2 + ... + αiXi + ... + αpXp, where Xi is the centred value of
the ith variable (the observed value minus the mean of X for the ith variable).
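
The transformation in Eq. (1) can be illustrated with a short sketch that standardizes the variables, eigen-decomposes their correlation matrix and projects the data onto the eigenvectors. This is a generic textbook PCA under stated assumptions, not a reproduction of the SPSS procedure used in the study; the function name and the synthetic data are hypothetical.

```python
import numpy as np

def pca_from_correlation(data):
    """PCA on standardized columns; returns scores, coefficients, variance shares."""
    data = np.asarray(data, dtype=float)
    # Centre and scale each variable (X_i minus its mean, divided by its std).
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]         # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs                      # each column is one PC (one Y axis)
    explained = eigvals / eigvals.sum()
    return scores, eigvecs, explained

# Synthetic correlated data (hypothetical), three variables.
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 1))
data = np.hstack([base + 0.3 * rng.normal(size=(200, 1)),
                  base + 0.3 * rng.normal(size=(200, 1)),
                  rng.normal(size=(200, 1))])

scores, coeffs, explained = pca_from_correlation(data)
score_cov = np.cov(scores, rowvar=False)
print(np.round(score_cov, 6))   # off-diagonal entries are numerically zero
print(np.round(explained, 3))
```

The covariance matrix of the scores is diagonal, which is the sense in which the new variables are uncorrelated.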

3.1 Determination of Principal Components for the Assessment of Water Quality

In the previous section on data collection and analysis, the interrelation of water quality parameters such as pH, DO,
BOD, conductivity, Nitrate-N, NH4-N, COD, TC, FC, TA as CaCO3 and TH as CaCO3 was established. Because the
calculation of WQI follows an additive approach, the parameters considered in the study must be independent of
each other for efficient forecasting of WQI. The number of PCs justified in the study can be judged from the scree
plot shown in Fig. 2.

Fig. 2: Scree plot of the 11 water quality parameters

The scree plot indicates that four principal components, explaining 65.159% of the total variation, are sufficient
for the study. PC1 accounts for 28.485%, PC2 for 16.496%, PC3 for 10.458% and PC4 for 9.719% of the total variation,
as calculated from the loadings for the cumulative percentage of variance by SPSS. The component matrix extracted
from the data by SPSS for the calculation of the PCs is given in Table 3. The PCs in terms of the actual parameters
are given in equations 2 to 5.

PC1 = -0.017*pH - 0.487*DO + 0.734*BOD + 0.616*Conductivity + 0.287*Nitrate-N + 0.650*TC
      + 0.719*FC + 0.745*COD + 0.195*NH4-N + 0.274*TA as CaCO3 + 0.536*TH as CaCO3    (2)
PC2 = -0.563*pH + 0.273*DO + 0.328*BOD - 0.260*Conductivity + 0.647*Nitrate-N - 0.480*TC
      - 0.407*FC + 0.315*COD - 0.315*NH4-N + 0.170*TA as CaCO3 + 0.454*TH as CaCO3    (3)
PC3 = 0.237*pH + 0.458*DO + 0.018*BOD + 0.131*Conductivity - 0.142*Nitrate-N - 0.246*TC
      - 0.165*FC + 0.171*COD + 0.799*NH4-N + 0.026*TA as CaCO3 + 0.302*TH as CaCO3    (4)
PC4 = -0.113*pH - 0.230*DO - 0.281*BOD + 0.298*Conductivity - 0.144*Nitrate-N - 0.121*TC
      - 0.171*FC - 0.183*COD - 0.002*NH4-N + 0.853*TA as CaCO3 + 0.102*TH as CaCO3    (5)
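
Equations (2) to (5) amount to a single matrix-vector product, as the following sketch shows. The coefficients are copied from the equations; the input vector is a hypothetical, already normalized sample (the source does not state the normalization method), and the names `LOADINGS` and `pc_scores` are assumptions for illustration.

```python
import numpy as np

PARAMETERS = ["pH", "DO", "BOD", "Conductivity", "Nitrate-N", "TC",
              "FC", "COD", "NH4-N", "TA as CaCO3", "TH as CaCO3"]

# One row per principal component, columns in PARAMETERS order (Eqs. 2-5).
LOADINGS = np.array([
    [-0.017, -0.487, 0.734,  0.616,  0.287,  0.650,  0.719,  0.745,  0.195, 0.274, 0.536],
    [-0.563,  0.273, 0.328, -0.260,  0.647, -0.480, -0.407,  0.315, -0.315, 0.170, 0.454],
    [ 0.237,  0.458, 0.018,  0.131, -0.142, -0.246, -0.165,  0.171,  0.799, 0.026, 0.302],
    [-0.113, -0.230, -0.281, 0.298, -0.144, -0.121, -0.171, -0.183, -0.002, 0.853, 0.102],
])

def pc_scores(normalized_values):
    """Evaluate PC1..PC4 for one sample of 11 normalized parameter values."""
    x = np.asarray(normalized_values, dtype=float)
    if x.shape != (len(PARAMETERS),):
        raise ValueError("expected one value per parameter")
    return LOADINGS @ x

# Hypothetical sample: only BOD is non-zero, so the result is the BOD column.
sample = np.zeros(len(PARAMETERS))
sample[PARAMETERS.index("BOD")] = 1.0
bod_only = pc_scores(sample)
print(bod_only)
```

Feeding a unit value for one parameter returns that parameter's coefficients, which is a quick way to check that the matrix was transcribed in the right order.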
3.2. Calculation and Formulation of WQI

In the calculation of water quality for river water, the importance of the various water quality parameters depends
on the intended use of the water; here the parameters are studied from the point of view of suitability for domestic
purposes. The standards (permissible values of the various water quality parameters) for drinking water are those
recommended by the Indian Council of Medical Research (ICMR). When ICMR standards for water quality are not
available, the standards of the United States Public Health Services (USPHS), World Health Organization (WHO),
Indian Standard Institution (ISI) and European Economic Community (EEC) are considered.

The water quality rating qi for the ith water quality parameter is obtained from the relation:

qi = 100 (vi / si),    (6)

where vi = value of the ith water quality parameter at a given sampling station and si = standard permissible value
of the ith water quality parameter. This equation ensures that qi = 0 when a pollutant (the ith water quality
parameter) is absent from the water, while qi = 100 if the value of this parameter is just equal to its permissible
value for drinking water. Thus, the larger the value of qi, the more polluted the river water is with the ith
pollutant.
The sum of the unit weights Wi of the 11 water quality parameters is given as:

$$\sum_{i=1}^{11} W_i = 1 \tag{8}$$

Table 3: The component matrix by SPSS (signs as given in equations 2 to 5)

                            Component
                            1        2        3        4
pH                          -0.017   -0.563   0.237    -0.113
DO (mg/l)                   -0.487   0.273    0.458    -0.230
BOD (mg/l)                  0.734    0.328    0.018    -0.281
Cond. mmho/cm               0.616    -0.260   0.131    0.298
Nitrate-N (mg/l)            0.287    0.647    -0.142   -0.144
TC (MPN/100 ml)             0.650    -0.480   -0.246   -0.121
FC (MPN/100 ml)             0.719    -0.407   -0.165   -0.171
COD (mg/l)                  0.745    0.315    0.171    -0.183
NH4-N (mg/l)                0.195    -0.315   0.799    -0.002
T. Alka. as CaCO3 (mg/l)    0.274    0.170    0.026    0.853
TH as CaCO3 (mg/l)          0.536    0.454    0.302    0.102
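
The variance percentages quoted for the four components can be cross-checked from the component matrix: for a PCA on the correlation matrix, the sum of squared loadings of a component, divided by the number of variables, is the share of total variance it explains. The sketch below applies this to the tabulated magnitudes (signs drop out when squaring); the variable names are illustrative.

```python
import numpy as np

# Loading magnitudes from the component matrix, one row per parameter.
LOADINGS = np.array([
    [0.017, 0.563, 0.237, 0.113],   # pH
    [0.487, 0.273, 0.458, 0.230],   # DO
    [0.734, 0.328, 0.018, 0.281],   # BOD
    [0.616, 0.260, 0.131, 0.298],   # Conductivity
    [0.287, 0.647, 0.142, 0.144],   # Nitrate-N
    [0.650, 0.480, 0.246, 0.121],   # TC
    [0.719, 0.407, 0.165, 0.171],   # FC
    [0.745, 0.315, 0.171, 0.183],   # COD
    [0.195, 0.315, 0.799, 0.002],   # NH4-N
    [0.274, 0.170, 0.026, 0.853],   # TA as CaCO3
    [0.536, 0.454, 0.302, 0.102],   # TH as CaCO3
])

n_variables = LOADINGS.shape[0]
eigenvalues = (LOADINGS ** 2).sum(axis=0)        # sum of squared loadings per PC
explained_pct = 100.0 * eigenvalues / n_variables
cumulative_pct = explained_pct.cumsum()

print(np.round(explained_pct, 2))
print(np.round(cumulative_pct, 2))
```

Up to rounding of the three-decimal loadings, the result agrees with the reported 28.485%, 16.496%, 10.458% and 9.719%, and with the cumulative figures of about 45%, 55.4% and 65.2% for two, three and four components.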

The overall WQI of River Brahmani is then calculated by aggregating these sub-indices (SI) linearly. Thus, WQI can
be written as:

$$\mathrm{WQI} = \left[ \sum_{i=1}^{11} q_i W_i \Big/ \sum_{i=1}^{11} W_i \right] = \sum_{i=1}^{11} q_i W_i \tag{9}$$

where $\sum_{i=1}^{11} W_i = 1$, as explained above in (8). Water quality can be categorized into five classes
depending on the WQI value. Water quality is treated as excellent, good, poor, very poor, or unsuitable for
drinking and domestic purposes if WQI lies in the range 0–25, 26–50, 51–75, 76–100, or above 100, respectively.
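
The rating in Eq. (6), the weighted aggregation in Eq. (9) and the five-class scheme can be put together in a small sketch. The parameter values, standards and weights below are hypothetical placeholders rather than the study's data, and the treatment of non-integer WQI values at class boundaries (for example 25.4) is an assumption, since the source lists integer ranges only.

```python
def quality_rating(value, standard):
    """Eq. (6): q_i = 100 * v_i / s_i."""
    return 100.0 * value / standard

def water_quality_index(values, standards, weights):
    """Eq. (9): weighted mean of the ratings; equals sum(q_i * W_i) when weights sum to 1."""
    ratings = [quality_rating(v, s) for v, s in zip(values, standards)]
    weighted = sum(q * w for q, w in zip(ratings, weights))
    return weighted / sum(weights)

def classify(wqi):
    """Map a WQI value to the five classes (boundary handling is an assumption)."""
    if wqi <= 25:
        return "excellent"
    if wqi <= 50:
        return "good"
    if wqi <= 75:
        return "poor"
    if wqi <= 100:
        return "very poor"
    return "unsuitable"

# Hypothetical three-parameter example (not measured data).
values = [2.0, 10.0, 150.0]
standards = [5.0, 20.0, 300.0]
weights = [0.5, 0.3, 0.2]          # unit weights summing to 1, as in Eq. (8)

wqi = water_quality_index(values, standards, weights)
print(wqi, classify(wqi))
```

Dividing by the sum of the weights keeps the function correct even if the supplied weights are not already normalized.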

3.3. Adaptive Neuro-Fuzzy Inference System (ANFIS) By MATLAB

Adaptive neuro-fuzzy inference system (ANFIS) is the result of coupling artificial neural networks (ANN) with a
fuzzy inference system (FIS) in MATLAB. Neural networks and fuzzy logic are related and complementary technologies.
A neural network can learn from data and feedback; however, understanding the knowledge or trend it has learned can
be difficult. Fuzzy logic models and toolboxes, in contrast, are easy to interpret because of linguistic terms such
as IF-THEN rules. An adaptive neuro-fuzzy inference system consists of five functional building blocks of the fuzzy
logic toolbox: (i) rule base, (ii) data base, (iii) decision making unit, (iv) fuzzification interface and
(v) defuzzification interface.

3.4. Architecture and Basic Learning Rules of ANFIS system

In a typical adaptive neural network, the network structure consists of a number of nodes, each characterized by a
node function with fixed or adjustable parameters. These nodes are connected through directional links. The basic
learning rule for ANFIS is the back-propagation method, which minimizes the error, usually the sum of squared
differences between the network output and the desired output for the data. In general, the learning or training
phase of ANFIS is a process of determining the parameter values that best fit the given training data. The model
performance can then be checked on distinct data, and a good fit is expected in the testing phase. Considering a
first-order Takagi, Sugeno and Kang (TSK) fuzzy inference system, a neuro-fuzzy model consisting of two rules is
given by Sugeno and Kang (1988) as

Rule 1: If x is A1 and y is B1 then f1 = p1x +q1y +r1


Rule 2: If x is A2 and y is B2 then f2 = p2x +q2y +r2

If f1 and f2 are constants instead of linear equations, we have zero order TSK fuzzy models. The node function in
the same layer is of the same function family as described below. Here, denotes the output of the ith node in layer j.
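
The two-rule first-order TSK model above can be evaluated as in the following sketch. It assumes Gaussian membership functions for A1, A2, B1 and B2, the product as the AND operator, and a firing-strength-weighted average as the output; these are common ANFIS choices but are not spelled out in the text, and all parameter values are hypothetical.

```python
import math

def gaussian(x, center, sigma):
    """Gaussian membership grade of x."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def tsk_output(x, y, premises, consequents):
    """Evaluate a first-order TSK system.

    premises:    one (center_A, sigma_A, center_B, sigma_B) tuple per rule
    consequents: one (p, q, r) tuple per rule, with f = p*x + q*y + r
    """
    strengths, outputs = [], []
    for (ca, sa, cb, sb), (p, q, r) in zip(premises, consequents):
        # Firing strength: "x is A AND y is B" with the product as AND (assumed).
        strengths.append(gaussian(x, ca, sa) * gaussian(y, cb, sb))
        outputs.append(p * x + q * y + r)
    total = sum(strengths)
    # Weighted average of the rule outputs by normalized firing strength.
    return sum(w * f for w, f in zip(strengths, outputs)) / total

# Hypothetical parameters for Rule 1 and Rule 2.
premises = [(0.0, 1.0, 0.0, 1.0), (2.0, 1.0, 2.0, 1.0)]
consequents = [(1.0, 1.0, 0.0), (1.0, 1.0, 2.0)]

midpoint = tsk_output(1.0, 1.0, premises, consequents)
near_rule_1 = tsk_output(0.0, 0.0, premises, consequents)
print(midpoint, near_rule_1)
```

At (1, 1) both rules fire equally, so the output is the mean of f1 = 2 and f2 = 4; near (0, 0) Rule 1 dominates and the output stays close to f1 = 0.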

3.5. Training and Testing of data by ANFIS GUI Editor

The data are normalized and used as input to principal component analysis as described in the section on Principal
Component Analysis (PCA). The principal components extracted by SPSS Version 20 are then normalized and used as
input to ANFIS. The output for each data point is the WQI calculated as per the procedure given in the section on
calculation and formulation of WQI.
A five-layered ANFIS model is created during training. Starting with two nodes, the number of nodes in the
second layer is increased gradually during training. The error decreases as the number of nodes is increased up to
three. Hence, the number of nodes in the second layer is fixed at three and further analysis of the ANFIS model is
carried out. The five layers are defined as one input layer, three hidden layers and one output layer. The network
is run in MATLAB 2012b Version 8.
A membership function of Gaussian type (gaussmf) is chosen for the inputs and a membership function of constant
type for the output when generating the fuzzy inference system. The flow chart for the complete approach and the
ANFIS algorithm is shown in Fig. 3.
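
The configuration described above (Gaussian input membership functions, constant outputs, three membership functions per input) can be sketched outside MATLAB as follows. This is not the toolbox implementation: it fixes the membership functions and fits only the constant consequents by least squares, which corresponds to one half of ANFIS hybrid learning. The centres, width, two-input setup and synthetic target are all hypothetical.

```python
import itertools
import numpy as np

def gaussian_mf(x, sigma, center):
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def normalized_firing(inputs, centers, sigma):
    """Normalized rule firing strengths, shape (n_samples, n_rules)."""
    n_inputs = inputs.shape[1]
    # grades[k][:, m] is the grade of input k in membership function m.
    grades = [gaussian_mf(inputs[:, [k]], sigma, centers[None, :])
              for k in range(n_inputs)]
    # One rule per combination of membership functions (grid partition).
    rules = list(itertools.product(range(len(centers)), repeat=n_inputs))
    firing = np.column_stack([
        np.prod([grades[k][:, m] for k, m in enumerate(rule)], axis=0)
        for rule in rules
    ])
    return firing / firing.sum(axis=1, keepdims=True)

def fit_consequents(inputs, targets, centers, sigma):
    """Least-squares estimate of the constant output of each rule."""
    phi = normalized_firing(inputs, centers, sigma)
    consequents, *_ = np.linalg.lstsq(phi, targets, rcond=None)
    return consequents

def predict(inputs, consequents, centers, sigma):
    return normalized_firing(inputs, centers, sigma) @ consequents

# Synthetic, normalized inputs and a made-up WQI-like target (hypothetical).
rng = np.random.default_rng(2)
inputs = rng.uniform(0.0, 1.0, size=(200, 2))
targets = 30.0 + 20.0 * inputs[:, 0] + 10.0 * inputs[:, 1]
centers, sigma = np.array([0.0, 0.5, 1.0]), 0.25

consequents = fit_consequents(inputs, targets, centers, sigma)
fitted = predict(inputs, consequents, centers, sigma)
train_mape = 100.0 * np.mean(np.abs((targets - fitted) / targets))
print(len(consequents), round(train_mape, 3))
```

With three membership functions on each of two inputs the grid yields 9 rules; with the four principal components used in the study the same construction would give 3^4 = 81 rules.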

Fig. 3: Flow chart showing the steps of ANFIS model



4. Results and discussion

The patterns of variation and distribution of actual and predicted WQI for the training and testing data of River
Brahmani are shown in Fig. 4 (a) and (b) respectively. The plots of training and testing data along with the FIS
output show the coherent nature of the data in the distribution.

Fig. 4 (a): Distribution of Actual and Predicted WQI (Training data); (b) Distribution of Actual and Predicted WQI (Testing data)

Here, the blue dots indicate the actual output and the red dots the predicted WQI. The surface plot for these data
sets is shown in Fig. 5. It shows that the surface covers the total landscape and decision space. A complete set of
rules is generated by the Rule Editor in the ANFIS GUI Editor for prediction of entrance length, as shown in Fig. 6.
The mean absolute percentage error (MAPE) for the training data is found to be 0.37, and it is 1.09 for the testing
data. The MAPE values for the same sets of training and testing data, when used to forecast WQI without
transformation into principal components, are found to be 12.84 and 13.52 respectively.
The regression relationships between actual WQI and WQI predicted by the ANFIS model for the training and testing
data are plotted in Fig. 7. The coefficient of determination (R2) is 0.970 for training and 0.792 for testing. From
these high coefficients of determination, it can be concluded that the data are well fitted.
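
The two goodness-of-fit measures used here can be computed as in the sketch below. MAPE follows its usual definition; R2 is taken as the squared Pearson correlation between actual and predicted values, which is an assumption about how the regression plots were summarized. The numbers in the example are made up for illustration.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

def r_squared(actual, predicted):
    """Squared Pearson correlation between actual and predicted values."""
    r = np.corrcoef(actual, predicted)[0, 1]
    return r ** 2

# Made-up WQI values (hypothetical, not the study's results).
actual = [20.0, 40.0, 50.0]
predicted = [22.0, 38.0, 50.0]

example_mape = mape(actual, predicted)
example_r2 = r_squared(actual, predicted)
print(example_mape, example_r2)
```

Note that MAPE is undefined when an actual value is zero, so it suits indices such as WQI that stay well above zero in the data considered.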
Initially, two principal components explaining 44.98% (about 45%) of the total variation are used. Then three and
four principal components, explaining 55.43% and 65.15% of the total variation respectively, are used as inputs to
the ANFIS model. It is also observed that the mean absolute percentage error (MAPE) for the training data is 0.90,
0.86 and 0.37 when two, three and four principal components, respectively, are used as input parameters to the ANFIS
model. As the number of input components decreases, the mean absolute percentage error increases due to loss of
information. Therefore, the ANFIS model performs well when four principal components explaining 65.15% of the total
variation are used as input components to the ANFIS GUI Editor.

Fig. 5: The surface plot.
Fig. 6: Set of rules for prediction of entrance length

Fig. 7: Correlation between Predicted and actual WQI of Training and Testing data

5. Conclusions
The water quality of River Brahmani is predicted with four principal components as inputs and WQI as output. The
outputs for these inputs, classified as excellent, good, poor, very poor and unsuitable for drinking, are obtained
using Gaussian-type membership functions.
Principal component analysis makes it sensible to treat water quality data as an input fuzzy set; it provides the
statistical foundation for expressing the water quality index in linguistic terms, e.g., very poor, poor, decent,
good, very good and excellent.
The water quality values predicted by the ANFIS model lie between 21 and 52. From this range it can be said that
the water of the Brahmani River can be used for drinking as well as domestic purposes up to a certain extent. The
coefficient of determination (R2) is 0.970 and 0.792 for the regression plots between actual WQI and WQI predicted
by the ANFIS model for training and testing data respectively. The mean absolute percentage errors for training and
testing data are 0.37 and 1.09 respectively. It can be said that the ANFIS model predicts WQI with reasonable
accuracy.

References

BIS, 1991. Specification for drinking water, IS: 10050.
Rumelhart, D.E., Hinton, G.E., William, D.E., 1986. Learning internal representations by error propagation. In: Parallel distributed processing:
Explorations in the microstructure of cognition, MIT Press, Cambridge, 1–8, 318–362.
Sahu, M., Mohapatra, S.S., Sahu, H.B., Patel, R.K., 2011. Prediction of water quality index using neuro fuzzy inference system, Water quality
expo health, 3, 175–191.
Singkran, N., Yenpiem, A., Sasitorn, P., 2010. Determining water conditions in the north-eastern rivers of Thailand using time series and water
quality index models, Journal of Sustainable Energy & Environment, 1, 47–58.
Sugeno, M., Kang, G.T., 1988. Structure identification of fuzzy model. Fuzzy Sets Syst 28, 942–947.
WHO, 2006. Guidelines for drinking water quality, first addendum to 3rd edition. (I) Recommendations, Geneva, Switzerland.
Zhou, J., Su, G., Jiang, C., Deng, Y., Li, C., 2007. A face and finger print identity authentication system based on multi-route detection,
Neurocomputing, 70, 922–931.
