Assessment of Groundwater Quality using Soft Computing Techniques
Bachelor of Technology
in
Civil Engineering
by
Ashish Malegaon - 19BCL0181
Pratham Kumar - 19BCL0085
Rishu Kumar Thakur - 19BCL0169
VIT, Vellore.
April 2023
DECLARATION
Place: Vellore
Date:
MALEGAON ASHISH
PRATHAM KUMAR
RISHU KUMAR THAKUR
CERTIFICATE
This is to certify that the thesis entitled “Assessment of Groundwater
Quality using Soft Computing Techniques” submitted by Ashish Malegaon,
Pratham Kumar, and Rishu Kumar Thakur, School of Civil Engineering, VIT,
for the award of the degree of Bachelor of Technology in Civil Engineering, is a
record of bonafide work carried out by them under my supervision during the period
01.12.2022 to 30.04.2023, per the VIT academic and research ethics code.
The contents of this report have not been submitted and will not be
submitted, either in part or in full, for the award of any other degree or diploma in this
institute or any other institute or university. The thesis fulfills the requirements and
regulations of the University and in my opinion, meets the necessary standards for
submission.
Place: Vellore
Date: Signature of the Guide
ACKNOWLEDGEMENT
I would like to express my deepest gratitude to my capstone project guide, Mr. Amit
Mahindrakar, for his unwavering support and guidance throughout the entire duration
of this project. His expertise, patience, and dedication were instrumental in shaping
and bringing the project to fruition. His insightful feedback and valuable suggestions
helped me overcome challenges and advance the project.
I would also like to extend my heartfelt thanks to Professor Uma Shankar for his
constant encouragement, motivation, and valuable input that enriched my
understanding of the project's scope and significance. His mentorship and wisdom
have been invaluable in shaping my project and my overall academic journey.
I would also like to acknowledge the lab assistants who provided me with timely and
relevant information whenever required. Their assistance in gathering and analyzing
data, conducting experiments, and managing the laboratory resources was invaluable
and greatly contributed to the success of this project.
Once again, I express my deepest gratitude to everyone who has contributed to the
successful completion of this capstone project. Your support and guidance have been
invaluable, and I am truly honored and privileged to have had the opportunity to work
with such amazing mentors and colleagues.
Student Name:
(i).MALEGAON ASHISH
(ii).RISHU KUMAR THAKUR
(iii).PRATHAM KUMAR
Executive Summary
The assessment of groundwater quality is crucial for ensuring the safety and
sustainability of our water resources. Soft computing techniques provide a valuable
tool for analyzing complex data sets and making predictions about water quality. In
this report, we explore the use of soft computing techniques, including Artificial Neural
Networks (ANN), Fuzzy Logic (FL), and Genetic Algorithms (GA), to assess
groundwater quality. The results of our analysis demonstrate the effectiveness of these
techniques in predicting water quality indicators, such as pH, turbidity, and Total
Dissolved Solids (TDS). These findings provide valuable insights for researchers and
policymakers seeking to improve water resource management and ensure the safety of
our drinking water supply.
Through an extensive review of existing research papers, it has been identified that
Geographic Information System (GIS) technology is making rapid advancements but is
unable to accurately process data that contains missing information, leading to
inconsistencies. This poses a challenge when determining water quality as it requires
the measurement of several parameters to calculate a Water Quality Index (WQI). To
overcome this challenge, soft computing techniques have been employed to predict
WQI, simplifying the process. Four different soft computing models were utilized to
predict WQI, and the effectiveness of each model was assessed using various plots.
The results revealed that the Artificial Neural Network (ANN) model exhibited a high
level of agreement between predicted and actual values, with over 80% accuracy.
This finding highlights the potential of soft computing techniques as a useful tool for
predicting water quality, which could greatly enhance our ability to manage and
protect our water resources. Furthermore, by measuring multiple hydro-chemical
parameters and utilizing soft computing techniques, WQI can be predicted with high
precision, making it a viable alternative to costly water quality measurement stations.
TABLE OF CONTENTS
i. Acknowledgment I
1. Introduction 1
1.2 Objectives 4
1.3 Motivation 5
1.4 Background 5
2.1 Methodology 7
3. Technical Specification 8
7. Conclusion 38
8. Project Demonstration 38
9. References 39
List of Figures
Fig 4.7 Training state plots of NARX network for Pre monsoon 22
Fig 4.8 Training state plots of NARX network for Post monsoon 23
Fig 4.18 Regression plots of Cascade network for post monsoon 27
List of Tables
Tab: 3.6 Calculated WQI for the pre monsoon period of 2019-2020 13
Tab: 3.7 Calculated WQI for the post monsoon period of 2019-2020 13
List of Abbreviations
Wi Relative weight
ML Machine Learning
1. INTRODUCTION
The findings of this study are expected to provide valuable insights for researchers and
policymakers in developing effective strategies for water resource management and
ensuring the safety of our drinking water supply. The use of Soft Computing techniques
in water quality assessment could revolutionize the way we monitor and manage our
water resources, making it more efficient, cost-effective, and accurate.
1.1.Literature Review
determined the type they fall under, then calculated salinity using the US
Salinity Diagram.
8. Ramakrishnaiah C.R. et al., “Assessment of Water Quality Index for the
Groundwater in Tumkur Taluk, Karnataka State, India”, E-Journal of Chemistry
(CODEN ECJHAO), 2009, 6(2), 523-530. Water Quality Index values were
estimated by considering 17 parameters and forming a regression analysis
equation.
Through these research papers, it was reflected that, due to inaccuracies and
omissions, there was a problem with the quality of the data even as GIS was
constantly being developed. It was therefore challenging to handle the inconsistent
data using traditional computing techniques, so a mechanism that can handle such
conflicting data is needed.
Soft computing is made up of approaches that complement one another and offer a
flexible information processing capability for dealing with ambiguous scenarios that
arise in everyday life. These models allow for inconsistent, error-filled, noisy, and
missing value data. Thus, soft computing may offer a potent tool for GIS to solve the
inconsistent data issue.
1.2.OBJECTIVE
1.3. Motivation
A key component in managing water resources and ensuring people have access to
clean drinking water is groundwater quality evaluation. But the traditional procedures
used for assessing groundwater purity are difficult to apply due to the presence of
noisy information. Additionally, those approaches take a long time to get right and
may not handle all the variables linked to water systems. Machine learning models
therefore present compelling options for dealing with the complexity and variability
of water bodies.
The choice to use soft computing strategies to evaluate the quality of groundwater for
my capstone project in my senior year was motivated by two major factors.
First, soft computing approaches prove helpful for determining water
quality because they are able to handle ambiguous and fuzzy data. They can
rapidly examine such data, simulate nonlinear relationships, and generate accurate
projections. I am curious to learn how they can be utilized for actual data.
1.4. Background
Tamil Nadu, a state in southern India, includes the coastal district of Nagapattinam.
Located at 10.7668° N latitude and 79.8447° E longitude, the district stands on the
Bay of Bengal. The
district is rich in farmland and grows products like rice, sugar cane, and coconuts over
an area of around 2715 square kilometers. The Nagapattinam district receives
moderate to substantial rainfall from October to December during the monsoonal
season. The district of Nagapattinam is primarily flat and low-lying, rising on average 5
meters above sea level. It has a tropical climate with hot, muggy weather all year.
Roughly 1248 mm of precipitation falls on average every year in the district, with
periodic variations due to factors such as El Niño and La Niña. Over the year, relative
humidity fluctuates between 70 and 90 percent, so it is generally humid, and
humidity regularly reaches about 90% from July to September owing to the monsoon
rains. Nagapattinam district's median temperature oscillates between 27° C and 32° C.
The two warmest periods are typically April and May, with midday peaks usually
reaching 35° C.
2.1. Methodology
3. Technical Specifications
Table: 3.2. Raw Data Collected for the post-monsoon period of 2019-2020
By leveraging these statistical methods, we were able to address the inconsistencies in
the data and obtain more reliable and consistent results.
Following the removal of outliers and correction of erroneous values, the next step
involved data normalization, which is a crucial technique used to bring the data values
to a common scale. This is achieved by calculating the mean and standard deviation of
each sample and scaling the data to a comparable, equivalent scale. The process of
normalization is essential when dealing with data that has different ranges and units,
as it facilitates effective comparison and analysis of the data. By utilizing the statistical
measures of mean and standard deviation, we were able to normalize the data and
obtain reliable and consistent results that could be used for further analysis and
evaluation.
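The normalization step described above can be sketched as a standard z-score transform. This is a minimal sketch under the assumption that mean-and-standard-deviation scaling means z-scoring; the report does not state the exact formula it used.

```python
import numpy as np

def normalize(samples):
    """Z-score normalization: scale each parameter (column) to zero mean
    and unit standard deviation so parameters with different units and
    ranges become directly comparable."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)
    return (samples - mean) / std
```

After this transform, every parameter contributes on the same scale, which is what makes the later comparison and weighting of parameters meaningful.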
Upon completion of the data cleaning process for all 44 samples comprising 10
parameters, the next step involved assigning weights to each parameter based on their
relative importance and determining the permissible and desirable limits for each. By
utilizing these assigned weights, the relative weight Wi was computed for each
parameter, which takes into consideration their individual significance in the overall
assessment of groundwater quality. This approach enables a more accurate and
comprehensive evaluation of the water quality parameters and can help in identifying
potential sources of contamination and taking appropriate measures to address them.
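The relative weight computation can be sketched as follows. The values in Tab: 3.5 imply that the assigned weights of all 10 parameters sum to 34 (e.g., Ca's relative weight 3/34 ≈ 0.0882), so that total is passed in explicitly here.

```python
def relative_weights(assigned, total=None):
    """Wi = wi / sum(wi): each parameter's assigned weight divided by
    the sum of assigned weights over all parameters."""
    total = sum(assigned.values()) if total is None else total
    return {p: w / total for p, w in assigned.items()}
```

With the subset of weights listed in Tab: 3.5 and the full-table total of 34, `relative_weights({"Ca": 3}, total=34)["Ca"]` reproduces the tabulated 0.088235294.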
Tab: 3.5 Assigned weight for input parameters

Parameter   Desirable limit (mg/L)   Permissible limit (mg/L)   Weight (wi)   Relative weight (Wi)
Ca          75                       200                        3             0.088235294
Mg          30                       100                        3             0.088235294
Na          0                        200                        5             0.147058824
K           0                        12                         2             0.058823529
NO3         0                        45                         4             0.117647059
Total weight over all 10 parameters: Σwi = 34
Using the relative weight and the highest permissible value for each parameter, each
observed concentration (in mg per liter) is converted into a per-parameter sub-index:

Sub-index = Ov*100*Wi/Sn

Where Ov = observed value for the ith parameter of the sample, Sn = standard
permissible value of the ith parameter (Refer Tab: 3.6)
Adding the resultant sub-indices of all 10 parameters will give the water quality index.
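The per-parameter sub-index and final summation can be sketched as below; the dictionary keys and example values are illustrative, not taken from the report's data tables.

```python
def wqi(observed, permissible, rel_weight):
    """WQI = sum over parameters of Ov * 100 * Wi / Sn, where Ov is the
    observed value, Wi the relative weight, and Sn the standard
    permissible value of each parameter."""
    return sum(observed[p] * 100.0 * rel_weight[p] / permissible[p]
               for p in observed)
```

Running this over all 10 measured parameters of a sample yields that sample's water quality index.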
Tab: 3.6 Calculated WQI for the pre-monsoon period of 2019-2020
Tab: 3.7 Water Quality Index for the post-monsoon period of 2019-2020
After verifying the results, we can determine the water quality based on the calculated
index and assess its suitability for drinking. To gain deeper insights, we can create a
correlation plot and analyze which parameters have a significant impact on the water
quality index. Based on this information, we can implement appropriate measures to
improve the quality of water and make it cleaner. It is important to take necessary
actions to ensure that the water is safe for consumption.
Tab:3.8 WQI range, classification, and number of samples

WQI range   Classification   No. of samples
0-25        Excellent        19
26-50       Good             18
51-75       Poor             6
76-100      -                NIL
>100        -                1
correctly forecast water quality measurements based on prior data. Regression
models, decision trees, and neural networks are a few examples of machine learning
algorithms that can analyze data patterns and predict outcomes with high levels of
accuracy. This can assist in identifying potential issues with water quality before they
become severe.
For instance, using information gathered from numerous sources, including satellite
imaging, water quality sensors, and weather data, machine learning models can
forecast the concentration of pollutants in water bodies. Machine learning algorithms
can use these datasets to analyze the possibility of water pollution and aid in stopping
the development of pollutants in the water.
Efficient Analysis: Monitoring water quality involves collecting an extensive volume
of information on multiple parameters, including pH, temperature, turbidity, dissolved
oxygen, and contaminants, from diverse sources, including rivers, lakes, and
groundwater. Data on water quality are often analyzed manually, which can be
laborious and prone to inaccuracy.
However, machine learning algorithms are an effective tool for monitoring water
quality because they can quickly and effectively analyze huge datasets. In large data
sets, machine learning algorithms can identify patterns and trends that could take
humans a long time or a lot of effort to notice. These algorithms can also determine
correlations between various aspects of water quality, which can be useful in locating
probable sources of contamination and forecasting problems with water quality.
Improved decision-making: Machine learning algorithms' insights can help decision-
makers find the best measures for maintaining or improving water quality. For
instance, information on the causes and sources of water contamination can be
provided by machine learning algorithms, which could help policymakers create
focused solutions to these problems. Machine learning may additionally provide
insight into the effects of various water quality management measures, such as the
efficiency of various treatment methods or the effects of changing land use on water
quality.
Cost-effective: Machine learning could help in improving resource utilization and
lowering the cost of managing and monitoring water quality. Machine learning, for
instance, can assist in lowering the need for expensive laboratory testing, which can be
time- and resource-intensive. Machine learning can also assist in lowering the need for
regular manual monitoring of water quality parameters, which can be expensive, by
offering accurate predictions and real-time monitoring.
The ability of ANNs to represent both linear and non-linear relationships is one of their
key advantages. This means that even when such relations are not evident or clear to
describe using conventional statistical approaches, ANNs may uncover complicated
patterns and relationships in data.
In order to find patterns and correlations between various groundwater quality
statistics, ANNs can be trained to analyze data from a variety of sources, such as water
quality monitoring stations, geological data, and land-use information. ANNs can offer
a simple technique to model groundwater quality and produce precise predictions
about the quality of groundwater at specific locations by recognizing these
relationships directly from data. An additional benefit of ANNs is their ability to
provide simulated values for locations of interest where the measured data needed
for water quality estimates are unavailable. This is especially helpful for
assessing the quality of groundwater because it can be difficult to get information from
all relevant areas. Models for the water quality in these areas can be made more
thorough and precise by simulating data using ANNs.
Additionally, ANNs develop knowledge by themselves and generate results without
being explicitly programmed. This means that even when these patterns are not obvious or
widely known, ANNs can nevertheless find hidden patterns and relationships in data.
As a result, ANNs are able to recognize complex relationships between groundwater
quality metrics and make predictions about that quality that are more precise.
Finally, since ANNs store input in their own networks rather than a database, they are
unaffected by data loss in terms of how they operate. As a result, ANNs are particularly
helpful in situations where data collection is challenging or costly, as they may
continue to produce precise predictions even in the presence of missing or insufficient
data. Being able to represent both linear and non-linear connections, generate
simulated values for desired areas, learn automatically, and perform well even in the
absence of complete data are just a few of the benefits that ANNs offer in terms of
evaluating groundwater quality. Utilizing these benefits, ANNs can assist water quality
professionals in making better decisions that better safeguard environmental and
human health.
(ANN), which has numerous models in it. The neural networks chosen were Cascade
forward backpropagation, feed-forward, Elman backpropagation, the NARX
(Nonlinear AutoRegressive with eXogenous inputs) neural network, and
Self-organizing maps, for the purpose of training and predicting the water quality
index of the Nagapattinam district located in Tamil Nadu.
NARX neural network: nonlinear system simulation is primarily accomplished using
this. It has an input layer that houses the network's input. In order to fully comprehend
the functioning of the framework, the input for this network consists of the prior
inputs and outputs. The input data is transformed into the output through the hidden
layer, and the outcome, which is what is projected for the current time, is indicated in
the output layer. In order to train the network, input-output pairs are fed into it.
Backpropagation weight adjustments are then made in order to optimize the network
by minimizing the discrepancy between estimated and actual output. After the
completion of the training process, a new dataset is introduced by feeding the relevant
historical input-output pairs, and the output is then predicted and reported. Problems
involving time series and signal processing are the principal applications for this.
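The input construction the NARX network relies on, prior inputs and prior outputs feeding each prediction, can be sketched by building lagged regressors. This is a simplified illustration of the data layout only, not the trained network; the lag depths are illustrative.

```python
import numpy as np

def narx_design(u, y, n_u=2, n_y=2):
    """Build the NARX regression problem: each target y[t] is paired
    with the n_u previous inputs and n_y previous outputs, which
    together form one row of the network's input."""
    u, y = np.asarray(u, float), np.asarray(y, float)
    start = max(n_u, n_y)
    rows = [np.concatenate([u[t - n_u:t], y[t - n_y:t]])
            for t in range(start, len(y))]
    return np.array(rows), y[start:]
```

Each returned row is an input-output pair fed to the network during training; backpropagation then adjusts the weights to minimize the discrepancy between estimated and actual output.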
Feed-forward neural network: Also known as the multilayer perceptron, this network
closely resembles the cascade forward backpropagation neural network and functions
similarly. It has an output layer, several hidden layers, and one input layer comprising
the input values. All of the above layers are linked by weights, and the neuron in the
following layer utilizes the output from the layer prior as its input before the activation
function is used to produce the output. First, the weights are initialized with random
values for the input data, which is done through the forward pass procedure. Here, the
inputs are provided, and the current weights are used to construct the output. The
generated and the actual outputs are compared, and the error is conveyed back by
backward pass, just as in the cascade forward approach. The weights are then changed
as necessary to reduce inaccuracies up until precise results are generated. The only
distinction between a feed-forward neural network and a cascade-forward neural
network is in the learning algorithm and network design. Feed forward has a set
number of layers that are chosen before the training phase, unlike cascade forward
which has an adaptive architecture. This method is faster and less expensive when
performing calculations than the cascade forward method since it uses fewer
computing resources.
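A minimal forward pass through such a multilayer perceptron can be sketched as below. The tanh hidden activation and linear output are assumptions for illustration; the report does not state which activation functions were used.

```python
import numpy as np

def forward(x, weights, biases):
    """Propagate an input through fully connected layers: each layer's
    output feeds the next, with tanh activations on hidden layers and a
    linear output layer for regression targets such as WQI."""
    a = np.asarray(x, float)
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(a @ W + b)          # hidden layer activation
    return a @ weights[-1] + biases[-1]  # linear output layer
```

Training then consists of comparing this output with the actual target, propagating the error backward, and updating the weights, exactly as described for the backward pass above.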
4.3. Results
The soft computing technique used here was ANN (Artificial Neural Network) due to
the advantages mentioned in section 3.2. A total of five neural networks, namely
Cascade forward backpropagation, Feed-forward, Elman backpropagation, the NARX
(Nonlinear AutoRegressive with eXogenous inputs) neural network, and Self-
organizing maps, were compared, and the most suitable network was selected based
on accuracy and performance. The training function used was TRAINLM, and the
adaptation learning function used was LEARNGDA. The mean squared error (MSE) was
used as the measure of performance as the performance function, and the training,
validation, and testing were done for 5, 10, and 30 layers in order to obtain the best
results, and the network was trained only once since the sample size is small. As
mentioned, 70% of the data was used as a training dataset, 15% was used as a
validation dataset with which the network was not familiar, and 15% was used for the
testing phase. Out of 44 values of output, which is the WQI, only 5 values were made
known to the network, so it could predict the remaining values, which could help in
determining the level of precision of the network. The three figures shown in the
results for each model represent the 5, 10, and 30 layer configurations respectively.
NARX neural network
Upon executing the network, 2 plots were generated to understand the accuracy of
the network for the given input and output. They were the performance plot and the
training state plot. A comparison of 5, 10, and 30 layers for both pre-monsoon and
post-monsoon is depicted below:
Fig:4.6 Performance plots of NARX network for post-monsoon
Here the comparison in the performance plots of different numbers of layers in the
post-monsoon is done. The least errors obtained while using 5, 10, and 30 layers are
102.8155, 8.6166, and 3660.5048 respectively. It is clear that 10 layers produced the
least amount of errors after only 24 epochs and hence it was the most effective and
precise arrangement.
the network was learning rapidly and generating quicker results compared to the
other 2 plots.
Fig:4.10 Performance plots of Elman network for post-monsoon
In the post-monsoon dataset, it is clear that the 5 layer network, with an MSE of only
10.6312, performed better than the 10 and 30 layer networks, whose MSE values were
178.0775 and 215.572 at 11 and 7 epochs respectively. From the performance plot of both pre-
monsoon and post-monsoon, it can be noted that the network with 5 layers was the
best with the least errors.
comparison for 5, 10, and 30 layers is depicted below for all types of plots for the pre-
monsoon and post-monsoon dataset:
Fig:4.15 Training state plots of Cascade network for pre-monsoon
The declining trend of the plots indicates that the speed of the training of the network
gradually decreased with time. This means that the weights of the network had to be
constantly updated to reduce the errors at the start of the training process; as the
network was trained, the errors reduced, and therefore only minute updating was
required by the end. From the generated plots, it can be observed that the training
process of the 10 layer network was the smoothest, while the training errors were the
least for the 5 layer network, with a gradient of only 5.1162 by the end of training.
From the Mu plot, which represents the learning rate, it can be noted that the
5 and 10 layer network had a similar trend as well as the same value of learning rate, 1,
towards the end implying that the weights were being updated and response was
generated much quicker towards the end of the process. The system with 30 layers
was learning slower compared to the other two.
Fig:4.17 Regression plots of Cascade network for pre-monsoon
Through the regression plot, the variation between the actual output and the predicted
output could be determined, as it has the ability to predict the outputs. For precision
purposes, regression values greater than 0.8 are considered suitable and accurate. In
the regression plot, the x axis is the number of samples, and the y axis represents the
WQI. The dotted line in the background in each plot represents the actual outputs and
therefore has a regression value of 1 indicating the values are perfect. There are 4 plots
in each regression plot: the blue line represents the performance and prediction for
the training dataset, the green line for the validation dataset whose values are not
familiar to the network, the red line is the performance on the testing dataset, and the
last plot with the black line is the overall performance. The regression values can range
from -1 to 1. The regression values should be closer to 1, indicating that the variation
between the predicted and actual outputs is minimum and hence the errors are also
minimum. The results of the pre-monsoon data for all 3 networks were highly
accurate, as all values were greater than 0.8 and the variations between the plots were
very minute.
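The regression value discussed above is the correlation coefficient between actual and predicted outputs; computing it can be sketched as:

```python
import numpy as np

def regression_value(actual, predicted):
    """R, the correlation coefficient between actual and predicted
    outputs; this report treats values above 0.8 as accurate."""
    return float(np.corrcoef(actual, predicted)[0, 1])
```

A value of exactly 1 means the predicted and actual outputs vary together perfectly, matching the dotted reference line in the regression plots.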
Feedforward Neural Network
This type of network also facilitates the prediction of the outputs and therefore the
precision of the model can be examined in a much more concrete manner. Networks
with 5, 10 and 30 layers were compared against each other. The plots generated here
were the performance plot, training state plot and the regression plot. Comparison for
each plot for the pre-monsoon and post-monsoon is depicted below:
Fig:4.22 Training state plots of feed-forward network for post-monsoon
For the post-monsoon data as well, the network with the least errors for the training
dataset was the one with 30 layers whereas the learning rate of the 5 layer network
was the least towards the end.
5. Variation of Ions in water samples
In places where access to clean water is a concern, the study of hydrogeochemical
variation of ions in water samples has grown in significance. The Hill Piper diagram,
which depicts the dominating ions in the water samples, can be used to analyze this
variation. Utilizing Geographic Information Systems (GIS) to spatially analyze and map
the change of ions in water samples is another method. In order to forecast the change
of ions in water samples, self-organizing models and artificial neural network (ANN)
machine models have also been used. These models can help with water resource
management and conservation efforts by helping to comprehend the complex
relationships between various ions and their sources.
The Hill-Piper diagram, also known as the Piper trilinear diagram or the Piper
plot, is a graphical representation of water chemistry data that illustrates the relative
proportions of various chemical components, such as cations and anions, in a water
sample. The diagram builds on work by R. A. Hill (1940) and A. M. Piper (1944) and is
frequently used in hydrogeology, environmental science, and water resource
management.
Two triangular fields, one for the relative proportions of the major cations and one
for the anions, together with a central diamond field that combines the two, make up
the diagram. The main cations and anions, such as sodium (Na+), potassium (K+),
calcium (Ca2+), magnesium (Mg2+), chloride (Cl-), sulfate (SO42-), and bicarbonate
(HCO3-), are represented at the triangles' vertices.
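Plotting the diagram itself needs a charting tool, but the quantities placed on its triangular axes, percentage milliequivalents of each ion, can be sketched as below. The equivalent weights are standard chemistry values (atomic weight divided by charge), not figures taken from the report.

```python
# Equivalent weights in mg/meq for the major cations (standard values).
EQ_WT = {"Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10}

def cation_percentages(mg_per_l):
    """Convert cation concentrations (mg/L) to percent milliequivalents,
    the coordinates used on the Piper diagram's cation triangle."""
    meq = {ion: c / EQ_WT[ion] for ion, c in mg_per_l.items()}
    total = sum(meq.values())
    return {ion: 100.0 * v / total for ion, v in meq.items()}
```

The same conversion, with the anions' equivalent weights, gives the coordinates for the anion triangle; the dominance of Na+-K+ reported below corresponds to those ions' percentages exceeding the others'.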
● Magnesium bicarbonate + mixed + calcium chloride = alkaline earths exceed alkalies
● Sodium chloride + mixed + sodium bicarbonate = alkalies exceed alkaline earths
● Magnesium bicarbonate + sodium bicarbonate + mixed = weak acids exceed strong acids
● Calcium chloride + sodium chloride + mixed = strong acids exceed weak acids
These combinations describe hydrochemical facies, that is, groupings of water
samples by the relative proportions of different chemical species. In the first facies,
alkaline earths exceed alkalies, while in the second, alkalies exceed alkaline earths.
The third facies has weak acids exceeding strong acids, while the fourth has strong
acids exceeding weak acids. These facies characterize the overall chemistry of the
water and can be used to classify water types based on their chemical composition.
Fig: 5.2 Hill-Piper diagram for pre-monsoon 2019-2020
Fig: 5.3 Hill-Piper diagram for post-monsoon 2019-2020
These diagrams demonstrate that sodium and potassium ions (Na+-K+) dominate over
other cations in the water samples analyzed. According to this, the concentration of
these two elements in the water is higher than that of any other cation.
Another graphical representation of water chemistry data that identifies the
predominant water type is the diamond diagram. According to the graphic, all of the
analyzed water samples were of the sodium chloride type, indicating that their
concentrations of sodium and chloride ions were higher than those of other cations
and anions.
These results collectively imply that the sodium and chloride ions in the water samples
analyzed are quite high and that the relative proportions of various cations and anions
can change based on the particular chemical components of the water. This data can be
helpful in spotting possible problems with water quality and developing effective
management plans. But it's crucial to remember that these results only apply to the
water samples that were examined, and they might not apply to other water sources.
To confirm these results and their wider implications for the management of water
resources, additional study and analysis may be required.
Fig: 5.5 Temporal Variation of WQI
The spatial variation map we have generated is specifically for the calcium ion, and it
allows us to identify areas where the values are exceeding the permissible limit, as
indicated by the legend in the figure. By observing the map, we can see regions where
the ion levels are denoted by yellow, pink, and red, indicating that they are beyond the
desirable limit and potentially nearing the permissible limit. This information could
serve as a starting point for conducting further studies to understand the reasons for
these observations. Similarly, one can set limits for other parameters and analyze their
spatial variation using different colors on the map.
An artificial neural network that learns patterns and relationships in data on its own is
called a "self-organizing network." Self-organizing neural networks can learn the
underlying structure of the data on their own, as opposed to supervised neural
networks, which need labeled input to train from. A self-organizing network's primary
principle is to modify the weights of the network's neurons in search of the best
possible representation of the input data. A layer of input neurons and a layer of
output neurons make up the network. While the output neurons represent various
groups of related data points, the input neurons take in the raw data. A set of input
samples is given to the network during training. The output layer's neurons' weights
are initialized at random and subsequently modified according to how close they are
to the input samples. Neurons near the input samples receive larger updates than
neurons farther away.
As a result of this process, the neurons eventually organize into clusters that stand in
for various collections of related input data samples. These clusters can then be used to
group incoming data points according to how closely they resemble the clusters that
already exist.
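The update rule described above, initialize the weights randomly, find the best-matching unit for each sample, and pull nearby neurons toward it more strongly than distant ones, can be sketched as below. The grid size, decay schedules, and epoch count are illustrative choices, not the settings used in this study.

```python
import numpy as np

def train_som(data, grid_w=3, grid_h=3, epochs=50, lr=0.5, seed=0):
    """Minimal Self-Organizing Map: for each sample, the best-matching
    unit (BMU) and its grid neighbors are moved toward the sample, with
    the neighborhood radius and learning rate decaying over epochs."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, float)
    weights = rng.random((grid_w * grid_h, data.shape[1]))
    coords = np.array([(i, j) for i in range(grid_w)
                       for j in range(grid_h)], float)
    for epoch in range(epochs):
        sigma = max(1.0 * (1 - epoch / epochs), 0.1)  # shrinking radius
        alpha = lr * (1 - epoch / epochs) + 0.01      # decaying rate
        for x in data:
            bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
            grid_dist = np.linalg.norm(coords - coords[bmu], axis=1)
            h = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))  # neighborhood
            weights += alpha * h[:, None] * (x - weights)
    return weights
```

After training, nearby units hold similar weight vectors, which is exactly what the Weight Planes and Neighbouring Weight Distances plots discussed below visualize.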
Clustering, dimensionality reduction, and feature extraction are examples of
unsupervised learning tasks that frequently make use of self-organizing networks.
They have been applied to a variety of tasks, such as voice and picture recognition and
anomaly detection.
The data used for this research is from 2019-2020, and it can be compared with future
datasets to identify any changes over time and variations of parameters across
locations.The following results shown are for Pre-monsoon and post-monsoon
respectively.
The Weight Planes plot displays a grid with two dimensions that shows the weights of
different input parameters, such as TDS, pH, HCO3, Cl, SO4, Ca, Mg, Na, K, and NO3.
Neurons with similar weights are depicted using a light yellow color, while those with
dissimilar weights are shown using darker shades of red and orange. When the
weights are similar, it indicates a high correlation between the neurons. In the pre-
monsoon Weight Planes plot, the pH input parameter has similar weights across the
grid, while in the post-monsoon plot, the weights vary significantly. This plot is useful
for comparing the weights of different input parameters to identify seasonal and
temporal variations.
Fig. 5.10: Distances between Neurons
A visualization tool called the Neighbouring Weight Distances plot shows the
separations between neighboring nodes in a SOM. It comprises nodes, which are
shown by blue circles, and the red lines that connect them. Different yellow hues can
be seen in the hexagons that the lines cross; lighter hues denote nodes that are closer
together and more likely to influence one another, while darker hues denote nodes
that are farther apart and less likely to impact one another. While there are
irregularities and greater distances between nodes in the pre-monsoon plot, the
weights are more uniformly distributed and closer together in the post-monsoon
figure. In general, the Self-Organizing Map methodology works well for assessing
groundwater quality.
The information offered by the presented data makes it easy to comprehend how the
input weights change over the pre-monsoon and post-monsoon periods, as well as the
spatial distribution and distances between samples. These findings are essential for
identifying and tracking water contamination, especially in light of the present global
water crisis. Monitoring changes in the distribution and composition of water samples
over time and place can help focus future research and lead to the development of
workable remedies to decrease the effects of water-related disasters. These results
significantly advance our knowledge of the intricate dynamics governing water quality
and availability and have substantial implications for global sustainability and
resilience in the face of environmental issues.
36
Based on the results of the analysis, it can be observed that TDS, chloride, and sodium
exhibit high correlation coefficients above 0.94, indicating a strong positive correlation
with water quality. This implies that changes in these variables are likely to have a
significant impact on the overall quality of water. Furthermore, the correlation plot can
be used to identify potential outliers and relationships between variables that may not
be immediately apparent through other means of data analysis.
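A correlation check of this kind can be reproduced with NumPy's `corrcoef`. The sample values below are made up for illustration; they merely mimic a strongly proportional relationship between the three parameters:

```python
import numpy as np

# Hypothetical samples: each row is one groundwater sample, columns are
# TDS, Cl (chloride) and Na concentrations in mg/L (illustrative values)
samples = np.array([
    [250.0,  40.0,  30.0],
    [480.0,  85.0,  62.0],
    [620.0, 110.0,  78.0],
    [900.0, 170.0, 120.0],
])

# Pearson correlation matrix; rowvar=False treats columns as variables
corr = np.corrcoef(samples, rowvar=False)
```

Because the toy columns rise roughly in proportion, every pairwise coefficient comes out close to 1, echoing the coefficients above 0.94 reported for TDS, chloride, and sodium.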
7. Conclusion
Computer-based models called Artificial Neural Networks imitate how biological
neurons in the human brain work. A set of input parameters can be used to train these
models to forecast an output or categorize an item. The Water Quality Index (WQI), a single
number that measures the overall quality of a water sample based on numerous
physicochemical factors, has been predicted using ANN models in the context of water
quality assessment. The capacity to conserve resources is one advantage of employing
ANN models for water quality assessment. Traditionally, measuring numerous
physicochemical properties on water samples requires performing multiple
experiments. This procedure can be costly and time-consuming. In contrast, ANN
models can be trained using a relatively small dataset, which can then be used to
predict WQI values for new water samples without conducting additional tests.
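For context on what the index summarizes, a minimal sketch of the widely used weighted arithmetic WQI follows. This formula is a common choice in the literature, not necessarily the exact one used here, and the parameter limits and sample values are assumed for illustration:

```python
def weighted_arithmetic_wqi(measured, standards):
    """WQI = sum(w_i * q_i) / sum(w_i), where w_i = 1 / S_i is the unit
    weight of parameter i and q_i = 100 * C_i / S_i is its quality rating
    (ideal concentration taken as 0 for simplicity)."""
    weights = {p: 1.0 / s for p, s in standards.items()}
    ratings = {p: 100.0 * measured[p] / standards[p] for p in standards}
    return sum(weights[p] * ratings[p] for p in standards) / sum(weights.values())

# Assumed WHO-style limits (mg/L) and one hypothetical sample
standards = {"TDS": 500.0, "Cl": 250.0, "NO3": 45.0}
measured  = {"TDS": 350.0, "Cl": 90.0,  "NO3": 20.0}
wqi = weighted_arithmetic_wqi(measured, standards)
```

A trained ANN replaces this hand computation: it maps the measured parameters directly to the index value.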
In this case, ANN models have been trained using a set of 10 parameters specifically
chosen to determine the WQI in the Indian subcontinent (Nagapattinam). These
parameters were selected based on their relevance to water quality and availability in
water quality datasets. The models demonstrated their ability to predict WQI values
when WHO-defined parameters were used, indicating the potential for global
applications. However, it is important to note that variations in network parameters
can affect the results: a larger training dataset may yield better regression values,
as may tuning of the learning rate and gradient. In addition, studies have shown
that a 10-layer network model achieves the highest regression when predicting
WQI values. Despite these considerations, the use of ANN models in water quality
assessment has the potential to simplify the complexity associated with interpreting
WQI. This can make the assessment process more efficient and cost-effective,
particularly in regions where access to water quality testing equipment is limited.
8. Project demonstration
This study makes intensive use of soft computing techniques: several ANN models,
namely cascade-forward backpropagation, feed-forward, Elman backpropagation,
NARX neural networks, and self-organizing maps, were used to train on the data and
predict the water quality index of the Nagapattinam district in Tamil Nadu. Other
tools included GIS for mapping spatial and temporal variations, a decision tree, a
correlation plot for determining the impact of each parameter on the WQI, cleaning
and normalization of the data, and Python code for classifying samples with the
machine model.
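The data cleaning and normalization step mentioned above is commonly a min-max rescaling of each parameter to [0, 1]; a minimal sketch with hypothetical values:

```python
import numpy as np

# Hypothetical raw samples; columns might be TDS, pH and Cl, for example
raw = np.array([
    [250.0, 7.1,  40.0],
    [480.0, 7.8,  85.0],
    [900.0, 8.4, 170.0],
])

# Min-max normalization: rescale each column (parameter) to [0, 1] so that
# parameters with large ranges do not dominate the network training
col_min = raw.min(axis=0)
col_max = raw.max(axis=0)
scaled = (raw - col_min) / (col_max - col_min)
```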
9. References