
“Prediction of Water Quality using Naive Bayesian Algorithm”: the paper with
this title describes how to determine the suitability of drinking water using a
statistical quality control technique and a Bayesian algorithm. Together, the
technique and the algorithm determine the accuracy of the water quality
prediction. The procedure assesses quality based on range charts and on the
results of analyzing the data with a Bayesian classifier. All the data are
collected from sensors with predetermined parameters and are organized by
geographical position.

The researchers were concerned with environmentally meaningful terms. They
wanted not only to analyze the water but also to classify it correctly. They
proposed a four-layer architectural model comprising many sensors that collect
data on parameters such as chlorides, nitrates, total dissolved solids, pH, and
hardness. To compute the control limits using the range chart, the data are
combined and given as input to the water quality predictor component.
Hundreds of samples were collected from six different places in Tamil Nadu,
India.
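
The control-limit step can be illustrated with a minimal sketch. It assumes the
standard range (R) chart with equally sized subgroups and the usual tabulated
constants D3 and D4; the paper's exact procedure is not given here, and the
function names and pH readings below are hypothetical.

```python
# Standard R-chart constants (D3, D4) for subgroup sizes 2 to 5.
R_CHART_CONSTANTS = {
    2: (0.0, 3.267),
    3: (0.0, 2.574),
    4: (0.0, 2.282),
    5: (0.0, 2.114),
}


def range_chart_limits(subgroups):
    """Return (lcl, center, ucl) of a range chart for equally sized subgroups."""
    sizes = {len(group) for group in subgroups}
    if len(sizes) != 1:
        raise ValueError("all subgroups must have the same size")
    n = sizes.pop()
    if n not in R_CHART_CONSTANTS:
        raise ValueError("this sketch only covers subgroup sizes 2 to 5")
    d3, d4 = R_CHART_CONSTANTS[n]
    ranges = [max(group) - min(group) for group in subgroups]
    r_bar = sum(ranges) / len(ranges)  # center line: the average range
    return d3 * r_bar, r_bar, d4 * r_bar


def out_of_control(subgroups):
    """Return the indices of subgroups whose range falls outside the limits."""
    lcl, _, ucl = range_chart_limits(subgroups)
    return [
        index
        for index, group in enumerate(subgroups)
        if not lcl <= max(group) - min(group) <= ucl
    ]


# Hypothetical pH readings: one subgroup of four samples per sampling round.
ph_subgroups = [
    [7.1, 7.3, 7.2, 7.0],
    [7.4, 7.2, 7.3, 7.1],
    [6.9, 7.0, 7.2, 7.1],
]
lcl, center, ucl = range_chart_limits(ph_subgroups)
print(f"LCL={lcl:.3f} center={center:.3f} UCL={ucl:.3f}")
```

A subgroup whose range falls outside the limits signals unusual variability and
would be worth inspecting before the data are passed to the classifier.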

Following data mining concepts, the data set has been maintained over recent
years for the same geographical locations. The data set is plotted in the range
chart before further processing by the naive Bayesian classifier. The
classifier takes the prior probability from the existing training database,
which identifies the nature of the water. The Naive Bayes algorithm is then
applied to the data set to predict the result, and a confusion matrix is
generated.
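
The classification step can be sketched as follows. This is an illustration
only: it assumes a Gaussian likelihood for each numeric parameter, which the
paper may or may not use, and the feature values (pH, total dissolved solids),
class labels, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_gaussian_nb(rows, labels):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        prior = len(members) / len(rows)  # prior probability from training data
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, var + 1e-9))  # small floor avoids division by zero
        model[label] = (prior, stats)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return score

    return max(model, key=log_posterior)


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


# Hypothetical training samples: [pH, total dissolved solids in mg/L].
train_rows = [[7.0, 300], [7.2, 350], [6.8, 320], [9.1, 1500], [9.4, 1700], [8.9, 1600]]
train_labels = ["acceptable"] * 3 + ["unfit"] * 3
test_rows = [[7.1, 330], [9.0, 1550]]
test_labels = ["acceptable", "unfit"]

model = train_gaussian_nb(train_rows, train_labels)
predictions = [predict(model, row) for row in test_rows]
matrix = confusion_matrix(test_labels, predictions)
accuracy = sum(n for (a, p), n in matrix.items() if a == p) / len(test_labels)
print(predictions, dict(matrix), accuracy)
```

The diagonal entries of the confusion matrix are the correctly classified
samples, so accuracy is their sum divided by the number of test samples.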

The whole calculation depends on the naive Bayesian classifier algorithm and on
statistical quality control, where the sample data lie within the normal range
of variation and the range chart tracks the variability. It helps to represent
the potential information generated by splitting the training dataset and,
finally, gives the gain ratio, which determines the splitting attribute.
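
For illustration, the gain ratio can be computed as the information gain of an
attribute divided by its split information. The sketch below assumes
categorical (already binned) attribute values; the attribute names, bands, and
labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a list of categorical values."""
    total = len(values)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(values).values()
    )


def gain_ratio(attribute_values, labels):
    """Information gain of the attribute divided by its split information."""
    total = len(labels)
    partitions = {}
    for value, label in zip(attribute_values, labels):
        partitions.setdefault(value, []).append(label)
    # Expected entropy of the class labels after splitting on the attribute.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical binned attributes for four samples.
labels = ["fit", "fit", "unfit", "unfit"]
attributes = {
    "ph_band": ["normal", "normal", "high", "high"],
    "hardness_band": ["soft", "hard", "soft", "hard"],
}
ratios = {name: gain_ratio(values, labels) for name, values in attributes.items()}
best_attribute = max(ratios, key=ratios.get)
print(ratios, best_attribute)
```

The attribute with the highest gain ratio is the one chosen as the splitting
attribute.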

In the implementation, the acceptable, permissible, and unfit limits are taken
from ISO standards followed by the World Health Organization (WHO). The authors
try to conform to the specifications of the International Glossary of
Hydrology. They used the NetBeans tool with Java. The correctness of the
classifier is measured using the WEKA tool's detailed accuracy by class. Graphs
are also plotted with the WEKA tool to visualize the change in the nature of
the water with respect to the number of key parameters involved. Steven’s Multi
Parameter Water Quality Sensors, Hydrolab DataSonde 5, and Hydrolab DataSonde
5x are proposed for use because they can measure multiple quantities, such as
bright dissolved oxygen, temperature, and dissolved oxygen.

Their results show that the water quality prediction obtained with the proposed
method is closer to the actual result than the prediction obtained with the
conventional method. They computed the gain ratio for each attribute and
identified the attributes with the highest gain ratio. Classification accuracy
increases with the number of attributes, and the percentage of wrongly
classified tuples decreases.

Water plays a dominant role in the growth of the world’s economy and is
essential for all activities. Based on my analysis, I expect that predicting
water quality with supervised machine learning will be more useful. Machine
learning methodologies can be employed to help find an optimized solution. In
addition, conventional lab analysis and statistical analysis can be used in
research to help determine water quality.
I propose three classification algorithms, C5.0, Naive Bayes, and Random
Forest, with the data analytics tool R used to generate the models. The Python
language will be used for the algorithmic operations. An artificial neural
network (ANN) model can be used here for accurate measurement of some
attributes, such as temperature, biochemical oxygen rate, and dissolved oxygen,
with the MATLAB NN Toolbox as the implementation platform.
In preprocessing, the data can be explored by boxplot analysis for outlier
detection. Most of the parameters can vary considerably and sit at the higher
end of their value ranges, and a boxplot provides an insightful visualization
for deciding outlier detection thresholds depending on the problem domain. The
water quality class (WQC) of each sample is assigned using the water quality
index (WQI) for use in the classification algorithms. The z-score is a
conventional standardization and normalization method that expresses each value
as the number of standard deviations from the mean. On the resulting data,
correlation analysis can be performed to extract possible relationships between
the parameters. In addition, data splitting with cross-validation, multiple
linear regression, polynomial regression, and a gradient boosting algorithm can
be applied to obtain the best outcome. To measure accuracy, standard error
metrics such as Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE)
should be computed after the algorithmic operations.
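
The preprocessing and evaluation steps above can be sketched with the Python
standard library. The example assumes the common boxplot rule of 1.5 times the
interquartile range for the outlier thresholds; as noted above, the right
threshold depends on the problem domain. All function names and sample values
are hypothetical.

```python
import math
import statistics


def iqr_fences(values, k=1.5):
    """Return the lower and upper boxplot fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def z_scores(values):
    """Express each value as its number of standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumes the values are not all identical
    return [(value - mean) / std for value in values]


def mae(actual, predicted):
    """Mean Absolute Error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical total dissolved solids readings in mg/L, with one extreme value.
tds = [310, 320, 330, 340, 350, 900]
low, high = iqr_fences(tds)
outliers = [value for value in tds if value < low or value > high]
standardized = z_scores(tds)

# Hypothetical actual and predicted water quality index values.
actual_wqi = [3.0, 5.0, 2.0]
predicted_wqi = [2.0, 5.0, 4.0]
print(outliers, mae(actual_wqi, predicted_wqi), rmse(actual_wqi, predicted_wqi))
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a
fuller picture of how the predictions deviate from the actual values.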
