CHAPTER 7
STATISTICAL ANALYSIS
7.1 GENERAL
Statistical analysis yields numerical measures that describe the
characteristics of a sample. Simple statistical analysis is used here to
summarise the quality and variation of the data. The operations can be
applied to a set of points or to a single point. A simple descriptive
statistical analysis is carried out on the water quality parameters measured
at 53 locations in the study area.
7.2 DESCRIPTIVE ANALYSIS
Descriptive statistics summarise a data set by its maximum and minimum
values, mean and standard deviation. The standard deviation indicates how far
the values of the data set are spread about the mean (Levin and Rubin 1995).
The mean and standard deviation (SD) of the physico-chemical characteristics
of the groundwater samples are calculated as descriptive statistics and are
given in Tables 6.3 and 6.4 for the pre-monsoon and post-monsoon seasons
respectively.
7.3 STATISTICAL TOOLS USED IN THE STUDY
7.3.1 Mean
The mean of a data set is the arithmetic average of its values. It represents
the entire data set by one single value. The mean is obtained by adding all
the values and dividing the total by the number of values.
The mean of a data set is expressed as,
$$\bar{X} = \frac{\sum X}{N}$$
Here $\bar{X}$ = Arithmetic mean
$\sum X$ = Sum of all values of the variable X
N = Number of observations
7.3.2 Variance
It is the arithmetic average of the squared differences between the values
and their mean (Fisher 1918). It is expressed as,
$$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$$
7.3.3 Standard Deviation
It is used to study dispersion and is a measure of the absolute dispersion or
variability of a distribution. It is obtained as expressed by Karl Pearson
(1893),
$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$
Here $\bar{X}$ = Arithmetic mean of the values
N = Number of observations
X = Variable, and $\sigma$ is the Standard Deviation.
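The three measures defined in Sections 7.3.1 to 7.3.3 can be sketched in a few lines of Python. This is only an illustration: the function name `describe` and the sample values are hypothetical, and the divisor N (rather than N - 1) is assumed because the text defines the variance as an arithmetic average.

```python
import math

def describe(values):
    """Return (mean, variance, standard deviation) using N as the divisor."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, variance, math.sqrt(variance)

# Hypothetical readings for one parameter (illustrative, not study data)
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, variance, sd = describe(sample)
print(f"mean = {mean}, variance = {variance}, SD = {sd}")
```

Applied to each water quality parameter in turn, a function of this kind would give the mean and SD columns of a descriptive statistics table.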
7.3.4 Correlation Analysis
Statistical investigation offers attractive options in environmental science,
though the results may deviate from real situations (Nemade & Shrivastava
1997). Generally, the quality of groundwater depends on various parameters.
Correlation is an excellent tool for predicting parametric values with a
reasonable degree of accuracy (Venkatachalam & Jebanesan 1998). A strong
correlation exists among the different parameters, and the combined effect of
the inter-relationships among these parameters is reflected in the quality of
water (Arul Antony et al 2008). The study of the correlation coefficients of
the different parameters not only helps to assess the overall water quality
but also to quantify the relative concentration of various pollutants in
water (Dash et al 2006).
The correlation coefficient is also known as Pearson’s product moment
correlation coefficient “r”. If X and Y are two variables, the correlation
“r” between them is calculated using the following relation,
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$
If the magnitude of the correlation coefficient “r” between two variables X
and Y is fairly large, it implies that these variables are highly correlated.
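As a minimal sketch, Pearson's r can be computed directly from the sums of deviations about the two means. The function name `pearson_r` and the EC and TDS readings below are hypothetical and serve only to show the arithmetic.

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical readings in which TDS is an exact multiple of EC
ec = [500.0, 1000.0, 1500.0, 2000.0]
tds = [320.0, 640.0, 960.0, 1280.0]
print(round(pearson_r(ec, tds), 3))
```

Because the second list is an exact multiple of the first, the coefficient comes out at the upper limit of its range; reversing one of the lists flips the sign without changing the magnitude.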
7.3.4.1 Strength or magnitude of the correlation
Pearson’s correlation coefficient may take any value from -1 to +1. The
values of “r” and the corresponding strength of relationship are given in
Table 7.1. The strength of the correlation does not depend on its direction
or sign. Thus r = 0.70 and r = -0.70 indicate the same degree of association
between the measured variables. A positive correlation shows that the data
tend to lie along a straight line with a positive slope, whereas a negative
correlation shows that the data tend to lie along a straight line with a
negative slope. A zero correlation indicates that there is no linear
relationship between the variables.
Table 7.1 Strength relationship between the correlation “r” values

“r” values    Strength of Relationship
1.0           Perfect
0.9-0.7       Strong
0.6-0.4       Moderate
0.3-0.1       Weak
0.0           Zero
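The classification in Table 7.1 can be expressed as a small lookup. The table lists values to one decimal place and leaves gaps between the bands (for example between 0.6 and 0.7), so the cut-offs used here are an assumption: each band is taken to start at its lower printed value, and only a magnitude of exactly 1 is called perfect. The function name is hypothetical.

```python
def strength_of_relationship(r):
    """Map a correlation coefficient to the labels of Table 7.1.

    Assumed cut-offs: |r| >= 0.7 strong, >= 0.4 moderate, >= 0.1 weak.
    The sign is ignored because strength does not depend on direction.
    """
    magnitude = abs(r)
    if magnitude >= 1.0:
        return "Perfect"
    if magnitude >= 0.7:
        return "Strong"
    if magnitude >= 0.4:
        return "Moderate"
    if magnitude >= 0.1:
        return "Weak"
    return "Zero"

for value in (0.70, -0.70, 0.449, -0.148, 0.0):
    print(value, strength_of_relationship(value))
```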
7.3.4.2 Test of Significance of the observed correlation coefficients during premonsoon and post monsoon season
The correlation among the various physico-chemical parameters was studied,
and the extent of association or dissociation of any hydrochemical parameter
with the other parameters of the pre-monsoon and post-monsoon samples can be
read from this analysis. The correlation coefficients (r) among the various
water quality parameters for the pre-monsoon and post-monsoon seasons have
been calculated, and the numerical values are tabulated in Table 7.2 and
Table 7.3 respectively.
Table 7.2 gives the correlation matrix for the pre-monsoon season. It
indicates that EC has a good positive correlation with TDS (r = 0.995), TH
(r = 0.863), Mg2+ (r = 0.848), Na+ (r = 0.712), SO42- (r = 0.826) and Cl-
(r = 0.919). TDS is also positively correlated with TH (r = 0.852), Mg2+
(r = 0.833), Na+ (r = 0.745), SO42- (r = 0.834) and Cl- (r = 0.918). Total
Hardness is positively correlated with the cations Ca2+ and Mg2+ (r = 0.744
and r = 0.979) and with the anions SO42- and Cl- (r = 0.873 and r = 0.869).
In addition, SO42- and Ca2+ (r = 0.702), SO42- and Mg2+ (r = 0.838), Cl- and
Mg2+ (r = 0.876) and Cl- and SO42- (r = 0.811) are strongly correlated with
each other (0.7 < r < 0.9). The pH of the water is negatively correlated with
all the parameters except K+, F- and carbonate. Fluoride is negatively
correlated with TH, Ca2+ and carbonate. Similarly, carbonate is negatively
correlated with all the parameters except pH, K+ and SO42-.
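Picking out the strongly correlated pairs from a correlation matrix, as done in the paragraph above, can be sketched as a simple filter. The coefficients below are a few entries copied from Table 7.2; the function name `strong_pairs` and the 0.7 threshold (the lower limit of the "Strong" band in Table 7.1) are assumptions made for illustration.

```python
def strong_pairs(correlations, threshold=0.7):
    """Return the pairs whose |r| reaches the threshold, strongest first."""
    selected = [(pair, r) for pair, r in correlations.items() if abs(r) >= threshold]
    return sorted(selected, key=lambda item: abs(item[1]), reverse=True)

# A few coefficients copied from Table 7.2 (pre-monsoon season)
pre_monsoon = {
    ("TDS", "EC"): 0.995,
    ("TH", "EC"): 0.863,
    ("Mg2+", "TH"): 0.979,
    ("Na+", "TH"): 0.319,
    ("pH", "EC"): -0.150,
    ("CO3", "HCO3"): -0.353,
}
for (first, second), r in strong_pairs(pre_monsoon):
    print(f"{first} and {second}: r = {r:.3f}")
```

The absolute value is used so that a strong negative correlation would be reported as well.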
Table 7.2 Correlation matrix for the Pre-monsoon season

        pH      EC      TDS     TH      Ca2+    Mg2+    Na+     K+      SO42-   Cl-     NO3-    F-      HCO3    CO3
pH      1.000
EC      -0.150  1.000
TDS     -0.152  0.995   1.000
TH      -0.136  0.863   0.852   1.000
Ca2+    -0.131  0.631   0.641   0.744   1.000
Mg2+    -0.125  0.848   0.833   0.979   0.593   1.000
Na+     -0.147  0.712   0.745   0.319   0.266   0.304   1.000
K+      0.044   0.282   0.262   0.226   0.057   0.256   0.047   1.000
SO42-   -0.013  0.826   0.834   0.873   0.702   0.838   0.449   0.106   1.000
Cl-     -0.130  0.919   0.918   0.869   0.565   0.876   0.574   0.216   0.811   1.000
NO3-    -0.056  0.616   0.637   0.539   0.339   0.548   0.411   0.082   0.376   0.610   1.000
F-      0.101   0.325   0.305   -0.010  -0.148  0.033   0.497   0.244   0.068   0.163   0.081   1.000
HCO3    -0.237  0.569   0.543   0.225   0.196   0.211   0.669   0.370   0.160   0.281   0.186   0.614   1.000
CO3     0.622   -0.171  -0.164  -0.119  -0.123  -0.106  -0.185  0.015   0.001   -0.144  -0.055  -0.009  -0.353  1.000
Table 7.3 Correlation matrix for the Post-monsoon season

        pH      EC      TDS     TH      Ca2+    Mg2+    Na+     K+      SO42-   Cl-     NO3-    F-      HCO3    CO3
pH      1.000
EC      -0.364  1.000
TDS     -0.374  0.994   1.000
TH      -0.392  0.706   0.689   1.000
Ca2+    -0.399  0.565   0.549   0.891   1.000
Mg2+    -0.275  0.703   0.695   0.848   0.528   1.000
Na+     -0.145  0.743   0.760   0.065   -0.061  0.228   1.000
K+      -0.100  0.059   0.035   -0.012  -0.058  0.038   0.041   1.000
SO42-   -0.418  0.787   0.787   0.680   0.611   0.577   0.476   0.042   1.000
Cl-     -0.326  0.888   0.884   0.718   0.603   0.681   0.573   -0.020  0.551   1.000
NO3-    -0.387  0.623   0.668   0.614   0.566   0.516   0.336   -0.217  0.435   0.694   1.000
F-      0.305   -0.040  -0.059  -0.109  -0.152  -0.032  0.033   -0.065  -0.100  -0.110  -0.207  1.000
HCO3    -0.113  0.749   0.740   0.284   0.074   0.491   0.794   0.196   0.528   0.466   0.154   0.177   1.000
CO3     -       -       -       -       -       -       -       -       -       -       -       -       -       1.000
Table 7.3 gives the correlation matrix for the post-monsoon season. It
indicates that 15 pairs of parameters are strongly correlated with each
other: TDS and EC (0.994), TH and EC (0.706), Mg2+ and EC (0.703), Na+ and EC
(0.743), SO42- and EC (0.787), Cl- and EC (0.888), HCO3- and EC (0.749), Na+
and TDS (0.760), SO42- and TDS (0.787), Cl- and TDS (0.884), HCO3- and TDS
(0.740), Ca2+ and TH (0.891), Mg2+ and TH (0.848), Cl- and TH (0.718), and
Na+ and HCO3- (0.794). In the post-monsoon period, pH is negatively
correlated with all the parameters except F-. These correlations are weak to
moderate (|r| ≤ 0.418), which suggests that pH varies largely independently
of the other parameters. The carbonate ion concentration is zero in all the
wells, and hence its correlation is neglected. Other negative correlations
are found between TH and K+ (r = -0.012) and between TH and F- (r = -0.109).
The calcium content is negatively correlated with Na+ (r = -0.061), K+
(r = -0.058) and F- (r = -0.152). The fluoride content is negatively
correlated with K+, SO42-, Cl- and NO3-.
7.3.4.3 Inference from correlation matrices
From the correlation matrices (Tables 7.2 and 7.3) of the pre-monsoon and
post-monsoon seasons, it is observed that all the parameters are interrelated
with each other. The level of correlation has been classified as Good, Fair,
Poor or Inverse based on the correlation value expressed as a percentage, as
given in Table 7.4.
Table 7.4 Broad Classification of Correlation Values

Correlation Value    Level of Correlation
67 to 100 %          Good
34 to 66 %           Fair
0 to 33 %            Poor
-1 to -100 %         Inverse
7.3.5 Factor Analysis
Factor analysis is a multivariate analytical technique; in this study it was
carried out using SPSS software. It is a powerful technique for reducing the
dimensionality of a dataset consisting of a large number of interrelated
variables while retaining as much as possible of the variability present in
the dataset, with a minimum loss of information. This reduction is achieved
by transforming the dataset into a new set of variables, the factors, which
are orthogonal and are arranged in decreasing order of importance. Factor
analysis can also be used to generate hypotheses regarding causal mechanisms
or to screen variables for subsequent analysis.
7.3.5.1 Factor analysis of water quality parameters for premonsoon season
Factor analysis was used to reduce the number of variables involved in
determining the Water Quality Index in the pre-monsoon season. The study
included as many as 14 variables, so factor analysis was carried out to
reduce their number and to find the main underlying constructs of the Water
Quality Index. The results are presented below.
The initial analysis with all 14 variables resulted in a Measure of Sampling
Adequacy of 0.613, which indicates that factor analysis can be applied to the
data. On scrutinizing the anti-image correlation matrix, it was observed that
the variables pH, K+, CO3-, HCO3- and NO3- have loadings below 0.5, which
implies that these variables are candidates for exclusion from the analysis,
as they do not add much to the total variation in the sample. These five
variables were removed and the factor analysis was run again. The Measure of
Sampling Adequacy then rose to 0.728, and all the remaining variables have
loadings of more than 0.5 in the anti-image correlation matrix. The results
of the factor analysis for the pre-monsoon season are presented in Tables 7.5
to 7.11.
Table 7.5 KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy         0.728
Bartlett's Test of Sphericity    Approx. Chi-Square     1191.551
                                 Degrees of freedom     36
                                 Sig.                   0.000
The results of the KMO and Bartlett’s tests indicate that factor analysis can
be applied to the data, as the value of the KMO statistic is 0.728 and
Bartlett’s Test of Sphericity is significant (p < .001). A KMO value close to
1 indicates that the patterns of correlations are relatively compact, so
factor analysis should yield distinct and reliable factors. Kaiser (1974)
recommends accepting values greater than 0.5 as barely acceptable.
Furthermore, values between 0.5 and 0.7 are mediocre, values between 0.7 and
0.8 are good, values between 0.8 and 0.9 are great and values above 0.9 are
excellent.
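For orientation, Bartlett's test of sphericity can be sketched from its usual textbook form, in which the chi-square statistic depends on the determinant of the correlation matrix, the number of samples n and the number of variables p, with p(p - 1)/2 degrees of freedom. The function name and the determinant value below are hypothetical; the source reports only the SPSS output, not the determinant.

```python
import math

def bartlett_sphericity(det_r, n_samples, n_variables):
    """Return (chi_square, degrees_of_freedom) for Bartlett's test.

    det_r is the determinant of the correlation matrix of the variables.
    """
    multiplier = (n_samples - 1) - (2 * n_variables + 5) / 6
    chi_square = -multiplier * math.log(det_r)
    dof = n_variables * (n_variables - 1) // 2
    return chi_square, dof

# Hypothetical determinant; 53 samples and 9 retained variables as in the text
chi_square, dof = bartlett_sphericity(det_r=1e-6, n_samples=53, n_variables=9)
print(f"chi-square = {chi_square:.3f}, df = {dof}")
```

With 9 retained variables the degrees of freedom come to 36, which agrees with the value reported in Table 7.5.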
Table 7.6 shows the components extracted, with their eigenvalues and the
cumulative variance explained. Two factors result from the analysis, together
explaining about 86.52 per cent of the variation in the entire data set.
After varimax rotation, the two factors explain 63.240 and 23.275 per cent of
the variation respectively.
Table 7.6 Factor Analysis - Total Variance Explained (Extraction Method: Principal Component Analysis)

            Initial Eigenvalues                     Extraction Sums of Squared Loadings     Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %    Total   % of Variance   Cumulative %    Total   % of Variance   Cumulative %
1           6.245   69.394          69.394          6.245   69.394          69.394          5.692   63.240          63.240
2           1.541   17.121          86.516          1.541   17.121          86.516          2.095   23.275          86.516
3           0.529   5.881           92.397
4           0.395   4.387           96.784
5           0.178   1.978           98.762
6           0.084   0.936           99.699
7           0.025   0.272           99.971
8           0.003   0.029           100.000
9           0.000   0.000           100.000
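The percentage columns of Table 7.6 follow from the eigenvalues: each eigenvalue is divided by the total of all eigenvalues. The sketch below uses the initial eigenvalues printed in the table. The rule of retaining components with an eigenvalue above 1 is an assumption (it is a common default and is consistent with the two components retained here), and both function names are hypothetical.

```python
def variance_explained(eigenvalues):
    """Return (eigenvalue, % of variance, cumulative %) for each component."""
    total = sum(eigenvalues)
    rows, cumulative = [], 0.0
    for value in eigenvalues:
        percent = 100.0 * value / total
        cumulative += percent
        rows.append((value, percent, cumulative))
    return rows

def components_retained(eigenvalues, cutoff=1.0):
    """Count the components whose eigenvalue exceeds the cutoff (assumed rule)."""
    return sum(1 for value in eigenvalues if value > cutoff)

# Initial eigenvalues as printed in Table 7.6
initial = [6.245, 1.541, 0.529, 0.395, 0.178, 0.084, 0.025, 0.003, 0.000]
for value, percent, cumulative in variance_explained(initial):
    print(f"{value:6.3f} {percent:8.3f} {cumulative:8.3f}")
print("components retained:", components_retained(initial))
```

Because the printed eigenvalues are rounded to three decimals, the recomputed percentages may differ from the table in the last digits.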
Table 7.7 gives the rotated component matrix, which shows the factor loadings
used for naming the factors. Two factors were obtained. Factor 1 comprises
seven variables: TH, Mg2+, SO42-, Cl-, EC, TDS and Ca2+. This factor is named
‘Factor 1 Predominant’. Factor 2 comprises two variables, F- and Na+. This
factor is named ‘Factor 2 Predominant’.
Table 7.7 Rotated Component Matrix (a)

              Component
Variable      1        2
TH            0.979
Mg2+          0.934
SO42-         0.907
Cl-           0.877
EC            0.859
TDS           0.859
Ca2+          0.808
F-                     0.884
Na+                    0.814

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
Figure 7.1 shows the component plot in rotated space. The figure shows that
the variables group in accordance with the rotated component matrix.
Figure 7.1 Component plot in a rotated space
The factor scores for the new variables obtained from the factor analysis are
used to develop a model for the determination of the Water Quality Index
through regression analysis.
Regression Analysis
The regression equation is an algebraic representation of the regression line
and describes the relationship between the response and predictor variables.
The regression equation takes the form:
Response = constant + coefficient * predictor + ... + coefficient * predictor
(or)
Y = b0 + b1X1 + b2X2 + ... + bkXk
Where:
• Response (Y) is the value of the response.
• Constant (b0) is the value of the response variable when the predictor
  variable(s) is zero. The constant is also called the intercept because it
  determines where the regression line intercepts (meets) the Y-axis.
• Predictor(s) (X) is the value of the predictor variable(s). The predictor
  can be a polynomial term.
• Coefficients (b1, b2, ..., bk) represent the estimated change in mean
  response for each unit change in the predictor value. In other words, it is
  the change in Y that occurs when X increases by one unit.
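The coefficients of such an equation are usually estimated by ordinary least squares. The sketch below solves the normal equations with plain Gauss-Jordan elimination; it is an illustration of the technique, not the SPSS procedure used in the study, and the function name and data are hypothetical.

```python
def fit_linear_regression(rows, y):
    """Least-squares fit of Y = b0 + b1*X1 + ... + bk*Xk.

    rows holds one list of predictor values per observation.
    Returns the coefficients [b0, b1, ..., bk].
    """
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the constant b0
    k = len(design[0])
    # Augmented normal equations: (X'X | X'y)
    a = [[sum(row[i] * row[j] for row in design) for j in range(k)]
         + [sum(row[i] * target for row, target in zip(design, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical data generated exactly from Y = 2 + 3*X1 - 1*X2
predictors = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
response = [2.0, 5.0, 1.0, 4.0, 7.0]
print([round(b, 6) for b in fit_linear_regression(predictors, response)])
```

Since the example data contain no noise, the fit recovers the constant and the two coefficients used to generate them.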
Model Fitting through Multiple Regression Analysis (Pre-Monsoon)
Multiple regression analysis is used to fit a model for the dependent
variable Water Quality Index (WQI) on the two independent variables, the
factors identified through factor analysis in the previous section.
Table 7.8 gives the correlations among the variables taken for study. It can
be seen from the table that the independent variable Factor 1 Predominant has
a strong, significant positive correlation (0.962) with WQI, while Factor 2
Predominant has a very low correlation (0.101) with the dependent variable
WQI.
Table 7.8 Correlations between Variables

Pearson Correlation      WQI      Factor 1 predominant   Factor 2 predominant
WQI                      0.101    1.000                  0.000
Factor 1 predominant     0.962    0.000                  1.000
Factor 2 predominant     0.101    1.000                  0.000
Table 7.9 gives the summary of the model fitted using SPSS software. The
value of R-Square is 0.936 (adjusted R-square 0.934), which means that about
93.6 per cent of the variation in the dependent variable Water Quality Index
is explained by the two independent variables Factor 1 predominant and
Factor 2 predominant.
Table 7.9 Model Summary (b)

Model   R           R Square   Adjusted R Square   Std. Error of the Estimate
1       0.968 (a)   0.936      0.934               7.668

a. Predictors: (Constant), Factor 1 predominant, Factor 2 predominant
b. Dependent Variable: WQI
Table 7.10 gives the significance of the model fitted by SPSS. The high value
of F (2, 50) = 366.439 with a low p-value (< .001) confirms that the model is
statistically significant in explaining the variation in WQI at the 1 per
cent level of significance.
Table 7.10 ANOVA (a)

Model          Sum of Squares   df   Mean Square   F         Sig.
1 Regression   43095.360        2    21547.680     366.439   0.000 (b)
  Residual     2940.148         50   58.803
  Total        46035.508        52

a. Dependent Variable: WQI
b. Predictors: (Constant), Factor 1-predominant, Factor 2 -predominant
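The F ratio, R-square, adjusted R-square and standard error of the estimate are all derived from the sums of squares in the ANOVA table. The sketch below recomputes them from the regression and residual sums of squares printed in Table 7.10, using the standard definitions; the function name is hypothetical.

```python
import math

def anova_summary(ss_regression, ss_residual, n_predictors, n_samples):
    """Derive model-fit statistics from the regression sums of squares."""
    df_regression = n_predictors
    df_residual = n_samples - n_predictors - 1
    ss_total = ss_regression + ss_residual
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return {
        "F": ms_regression / ms_residual,
        "R2": ss_regression / ss_total,
        "adjusted_R2": 1.0 - ms_residual / (ss_total / (n_samples - 1)),
        "std_error": math.sqrt(ms_residual),
    }

# Sums of squares from Table 7.10; 2 predictors and 53 sampling locations
pre_monsoon = anova_summary(43095.360, 2940.148, n_predictors=2, n_samples=53)
for name, value in pre_monsoon.items():
    print(f"{name}: {value:.3f}")
```

The recomputed values agree, to rounding, with the F value in Table 7.10 and with the R-square, adjusted R-square and standard error reported in Table 7.9.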
Table 7.11 gives the coefficients of the independent variables included in
the model. The fitted model for the dependent variable Water Quality Index is
expressed by the equation:
WQI = 55.824 + 3.002 * Factor 1-predominant + 28.631 * Factor 2-predominant
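Applying the fitted equation is a matter of substituting the two factor scores. The sketch below wraps the pre-monsoon equation in a function whose name is hypothetical; it assumes the factor scores are on the same scale as those used to fit the model.

```python
def predict_wqi_pre_monsoon(factor1_score, factor2_score):
    """Evaluate the fitted pre-monsoon equation for a pair of factor scores."""
    return 55.824 + 3.002 * factor1_score + 28.631 * factor2_score

# Hypothetical factor scores for three locations
for scores in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]:
    print(scores, round(predict_wqi_pre_monsoon(*scores), 3))
```

When both scores are zero the prediction equals the constant, 55.824, and a unit change in either score shifts the prediction by that factor's coefficient.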
Table 7.11 Coefficients (a)

Model                    B        Std. Error   Beta    t        Sig.    Tolerance   VIF
1 (Constant)             55.824   1.053                52.998   0.000
  Factor 1-predominant   3.002    1.063        0.101   2.823    0.007   1.000       1.000
  Factor 2-predominant   28.631   1.063        0.962   26.924   0.000   1.000       1.000

B and Std. Error are the unstandardized coefficients, Beta is the standardized
coefficient, and Tolerance and VIF are the collinearity statistics.
a. Dependent Variable: WQI
The independent variable Factor 1-predominant, with a positive coefficient of
3.002, is significant (t = 2.823, p < .05) in explaining variations in WQI.
The other independent variable, Factor 2-predominant, with a coefficient of
28.631, is also significant in explaining WQI (t = 26.924, p < .01). This
means that an increase of one unit in Factor 2-predominant increases WQI by
28.631 units, and an increase of one unit in Factor 1-predominant increases
WQI by 3.002 units.
It can also be seen from the coefficients table that the variance inflation
factor (VIF) is very low, and less than 5, for both explanatory variables in
the model. This value of VIF indicates that collinearity does not exist
between the explanatory variables.
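Tolerance and VIF are linked by a simple relation: tolerance is one minus the R-square obtained when a predictor is regressed on the other predictors, and VIF is its reciprocal. The sketch below shows that relation; the function name is hypothetical.

```python
def tolerance_and_vif(r_squared_with_other_predictors):
    """Return (tolerance, VIF) for one predictor.

    The argument is the R-square from regressing this predictor on the
    remaining predictors in the model.
    """
    tolerance = 1.0 - r_squared_with_other_predictors
    return tolerance, 1.0 / tolerance

print(tolerance_and_vif(0.0))   # uncorrelated predictors
print(tolerance_and_vif(0.8))   # a predictor largely explained by the others
```

Orthogonal predictors, such as rotated factor scores, have an R-square of zero with one another, which gives a tolerance and a VIF of 1, as in the coefficients table.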
7.3.5.2 Factor analysis of water quality parameters for post monsoon season
Factor analysis was used to reduce the number of variables involved in
determining the Water Quality Index in the post-monsoon season. The study
included as many as 14 variables, so factor analysis was carried out to
reduce their number and to find the main underlying constructs of the Water
Quality Index. The results are presented below.
Factor Analysis
The initial analysis with all 14 variables resulted in a Measure of Sampling
Adequacy of 0.592, which indicates that factor analysis can be applied to the
data. On scrutinizing the anti-image correlation matrix, it was observed that
the variables K+ and NO3- have loadings below 0.5, which implies that these
variables are candidates for exclusion from the analysis, as they do not add
much to the total variation in the sample. These two variables were removed
and the factor analysis was run again. The Measure of Sampling Adequacy then
rose to 0.639, and all the remaining variables have loadings of more than 0.5
in the anti-image correlation matrix. The results of the factor analysis are
presented in Tables 7.12 to 7.18.
Table 7.12 KMO and Bartlett's Test for post monsoon season

Kaiser-Meyer-Olkin Measure of Sampling Adequacy         0.639
Bartlett's Test of Sphericity    Approx. Chi-Square     1082.091
                                 Degrees of freedom     66
                                 Sig.                   0.000
The results of the KMO and Bartlett’s tests indicate that factor analysis can
be applied to the data, as the value of the KMO statistic is 0.639 and
Bartlett’s Test of Sphericity is significant (p < .001). A KMO value close to
1 indicates that the patterns of correlations are relatively compact, so
factor analysis should yield distinct and reliable factors. Kaiser (1974)
recommends accepting values greater than 0.5 as barely acceptable.
Furthermore, values between 0.5 and 0.7 are mediocre, values between 0.7 and
0.8 are good, values between 0.8 and 0.9 are great and values above 0.9 are
excellent.
Table 7.13 shows the components extracted, with their eigenvalues and the
cumulative variance explained. Three factors result from the analysis,
together explaining about 85.050 per cent of the variation in the entire data
set. After varimax rotation, the three factors explain 37.486, 32.245 and
15.319 per cent of the variation respectively.
Table 7.13 Factor Analysis - Total Variance Explained for post monsoon season

            Initial Eigenvalues                     Extraction Sums of Squared Loadings     Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %    Total   % of Variance   Cumulative %    Total   % of Variance   Cumulative %
1           5.288   58.756          58.756          5.288   58.756          58.756          3.374   37.486          37.486
2           1.308   14.530          73.287          1.308   14.530          73.287          2.902   32.245          69.731
3           1.059   11.763          85.050          1.059   11.763          85.050          1.379   15.319          85.050
4           0.652   7.249           92.299
5           0.433   4.816           97.115
6           0.210   2.328           99.443
7           0.039   0.439           99.882
8           0.006   0.062           99.944
9           0.005   0.056           100.000
Table 7.14 gives the rotated component matrix, which shows the factor
loadings used for naming the factors. Three factors were obtained. Factor 1
comprises four variables: TH, Mg2+, Cl- and SO42-. This factor is named
‘Factor 1 Predominant’. Factor 2 comprises three variables: Na+, TDS and EC.
This factor is named ‘Factor 2 Predominant’. Factor 3 comprises two
variables, F- and pH. This factor is named ‘Factor 3 Predominant’.
Table 7.14 Rotated Component Matrix (a) for post monsoon season

              Component
Variables     1        2        3
TH            0.966
Mg2+          0.902
Cl-           0.642
SO42-         0.605
Na+                    0.989
TDS                    0.784
EC                     0.769
F-                              0.876
pH                              0.704

Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 4 iterations.
Figure 7.2 shows the component plot in rotated space. The figure shows that
the variables group in accordance with the rotated component matrix.
Figure 7.2 Component plot in a rotated space in post monsoon season
The factor scores for the new variables obtained from the factor analysis are
used to develop a model for the determination of the Water Quality Index
through regression analysis.
Regression Analysis – Post Monsoon
Multiple regression analysis is used to fit a model for the dependent
variable Water Quality Index (WQI) on the three independent variables, the
factors identified through factor analysis in the previous section.
Table 7.15 gives the correlations among the variables taken for study. It can
be seen from the table that the independent variables Factor 1-predominant
and Factor 2-predominant do not have any significant correlation with the
dependent variable WQI, whereas the independent variable Factor 3-predominant
has a strong, significant positive correlation (0.890) with WQI.
Table 7.15 Correlations between Variables for post monsoon season

Pearson Correlation     WQI     Factor 1-predominant   Factor 2-predominant   Factor 3-predominant
WQI                     1.000   0.083                  0.043                  0.890
Factor 1-predominant    0.083   1.000                  0.000                  0.000
Factor 2-predominant    0.043   0.000                  1.000                  0.000
Factor 3-predominant    0.890   0.000                  0.000                  1.000
Table 7.16 gives the summary of the model fitted using SPSS software. The
value of R-Square is 0.801 (adjusted R-square 0.788), which means that 80.1
per cent of the variation in the dependent variable WQI is explained by the
three independent variables Factor 1-predominant, Factor 2-predominant and
Factor 3-predominant.
Table 7.16 Model Summary (b) for post monsoon season

Model   R           R Square   Adjusted R Square   Std. Error of the Estimate
1       0.895 (a)   0.801      0.788               10.374

a. Predictors: (Constant), Factor 1-predominant, Factor 2-predominant, Factor 3-predominant
b. Dependent Variable: WQI
Table 7.17 gives the significance of the model fitted by SPSS. The high value
of F (3, 49) = 65.582 with a low p-value (< .001) confirms that the model is
statistically significant in explaining the variation in WQI at the 1 per
cent level of significance.
Table 7.17 ANOVA (a) for post monsoon season

Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   21173.265        3    7057.755      65.582   0.000 (b)
  Residual     5273.249         49   107.617
  Total        26446.514        52

a. Dependent Variable: WQI
b. Predictors: (Constant), Factor 1-predominant, Factor 2-predominant, Factor 3-predominant
Table 7.18 gives the coefficients of the independent variables included in
the model. The fitted model for the dependent variable Water Quality Index is
expressed by the equation:
WQI = 47.572 + 1.868 * Factor 1-predominant + 0.975 * Factor 2-predominant + 20.068 * Factor 3-predominant
Table 7.18 Coefficients (a) for post monsoon season

Model                    B        Std. Error   Beta    t        Sig.    Tolerance   VIF
1 (Constant)             47.572   1.425                33.385   0.000
  Factor 1-predominant   1.868    1.439        0.083   1.299    0.200   1.000       1.000
  Factor 2-predominant   0.975    1.439        0.043   0.677    0.501   1.000       1.000
  Factor 3-predominant   20.068   1.439        0.890   13.950   0.000   1.000       1.000

B and Std. Error are the unstandardized coefficients, Beta is the standardized
coefficient, and Tolerance and VIF are the collinearity statistics.
a. Dependent Variable: WQI
The independent variable Factor 3-predominant, with a coefficient of 20.068,
is significant (t = 13.950, p < .001) in explaining variations in WQI. The
other independent variables, Factor 1-predominant and Factor 2-predominant,
with respective coefficients of 1.868 and 0.975, are not significant in
explaining WQI (p > .05). This means that an increase of one unit in
Factor 3-predominant increases the value of WQI by 20.068 units.
It can also be seen from the coefficients table that the variance inflation
factor (VIF) is very low, and less than 5, for all three explanatory
variables in the model. This value of VIF indicates that collinearity does
not exist between the explanatory variables.
7.3.6 Cluster Analysis
Cluster analysis is a multivariate technique for classifying a ‘mountain’ of
information into manageable, meaningful piles. It is a data reduction tool
that creates subgroups that are more manageable than individual data points.
It is a class of techniques used to classify cases into groups that are
relatively homogeneous within themselves and heterogeneous between each
other, on the basis of a defined set of variables. These groups are called
clusters.
Hierarchical clustering is the most common method. It is the major
statistical method for finding relatively homogeneous clusters of cases based
on measured characteristics. It starts with each case as a separate cluster,
i.e. there are as many clusters as cases, and then combines the clusters
sequentially, reducing the number of clusters at each step until only one
cluster is left. The clustering method uses the dissimilarities or distances
between objects when forming the clusters. The SPSS programme calculates
‘distances’ between data points in terms of the specified variables. A
hierarchical tree diagram, called a dendrogram in SPSS, can be produced to
show the linkage points. The clusters are linked at increasing levels of
dissimilarity.
Cluster analysis comprises a series of multivariate methods which are used to
find true groups within the data. In clustering, the objects are grouped in
such a way that similar objects form one class or cluster. In the present
work, hierarchical clustering, the most common approach, is used; it starts
with each case in a separate cluster and joins the clusters together step by
step until only one cluster remains. The levels of similarity at which
observations are merged are used to construct a dendrogram. The Euclidean
distance was chosen as the distance (similarity) measure between sampling
sites. Drift corrections were not made, and the variables were not
transformed into log space. Ward’s method was more successful in forming
clusters that are more or less homogeneous and geochemically distinct from
other clusters than other methods such as the weighted pair-group average.
The results were interpreted using the resultant dendrogram. The
classification of the samples into clusters is based on visual observation of
the dendrograms obtained for the respective seasons.
The four basic steps in the cluster analysis are:
1. Data collection and selection of the variables for analysis.
2. Generation of a similarity matrix.
3. Decision about the number of clusters and interpretation.
4. Validation of the cluster solution.
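The agglomerative procedure described above, with Euclidean distance and Ward's criterion, can be sketched in plain Python. This is a simplified illustration and not the SPSS routine used in the study: it recomputes centroids at every step, stops at a requested number of clusters instead of drawing a dendrogram, and uses hypothetical data and a hypothetical function name.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative clustering that merges the pair with the smallest
    increase in within-cluster sum of squares (Ward's criterion).

    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]  # every case starts alone

    def centroid(members):
        dims = len(points[0])
        return [sum(points[i][d] for i in members) / len(members) for d in range(dims)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged
        ca, cb = centroid(a), centroid(b)
        squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * squared_distance

    while len(clusters) > n_clusters:
        candidates = [(merge_cost(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters))]
        _, i, j = min(candidates)
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters

# Hypothetical two-variable data with two obvious groups
sites = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(ward_clusters(sites, 2))
```

Recording the merge cost at each step, rather than discarding it, would give the linkage heights needed to draw a dendrogram.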

7.3.6.1 Cluster analysis for the groundwater samples of pre Monsoon season and Post Monsoon seasons
A hierarchical cluster analysis was conducted on the 14 water quality
parameters of the pre-monsoon and post-monsoon seasons. Two major clusters
are obtained in both seasons. Cluster 1 contains EC and TDS, and cluster 2
contains pH, TH, Ca2+, Mg2+, Na+, K+, CO3-, HCO3-, Cl-, NO3-, SO42- and F-.
The dendrograms showing the hierarchical cluster analysis of the pre-monsoon
and post-monsoon samples are given in Figures 7.3 and 7.4 respectively. The
parameters which cluster together during both seasons are summarized in
Table 7.19.
Figure 7.3 Dendrogram showing Hierarchical cluster analysis of Pre monsoon samples
Figure 7.4 Dendrogram Showing Hierarchical cluster analysis of Post monsoon samples
Table 7.19 Summary of the Cluster Analysis of both seasons

Parameters in Cluster 1   Parameters in Cluster 2
EC and TDS                pH, TH, Ca2+, Mg2+, Na+, K+, CO3-, HCO3-, Cl-, NO3-, SO42- and F-
7.4 WATER QUALITY INDEX (WQI)
The Water Quality Index is an efficient tool for disseminating information on
the overall quality of water; it is based on a number of water quality
parameters, each of which is given a weight (Tiwari & Mishra 1985). The water
quality index contributes to assessing the quality of any water system and is
important in quality management. The WQI is generally used as a tool to
convert a large data set into a much reduced and informative form. In the
present study, the Weighted Arithmetic Index (WQI) and Nemerow’s Pollution
Index (NPI) have been adopted to assess the status of the existing water
quality and to identify the pollution-causing parameters at particular
locations in the study area.
7.4.1 Weighted Arithmetic Index (Water Quality Index)
The pollution level of water is generally indicated by the Water Quality
Index. An index number is formed mathematically by combining all the water
quality parameters, and it provides a general description of the water. The
WQI is one of the most effective tools for communicating the overall quality
status of the water to the concerned user community.
7.4.1.1 Formulation of weightage factor and water quality rating in WQI
Tiwari & Mishra (1985) proposed a formula to calculate the WQI, which is
computed by adopting the weighted arithmetic index method. The following
steps are considered for calculating the WQI.
1. The weightage of each water quality parameter is assigned to be
   inversely proportional to the recommended standard for the
   corresponding parameter, i.e.
   $$W_i = \frac{K}{S_i}$$
   Where Wi is the Weightage Factor,
   Si is the standard value for the ith parameter prescribed by the
   standards,
   K is the constant of proportionality,
   $$K = \frac{1}{\sum (1/S_i)}$$
   The value of K in each case is taken as 0.8487.
2. The water quality rating is calculated based on the following
   equation
   $$q_i = 100 \times \frac{V_a - V_i}{S_i - V_i}$$
   Here, Va and Vi are the actual and the ideal values of the water
   quality parameter in the water sample, and Si is the standard value
   for the ith parameter. The ideal value is zero for all the parameters
   except pH (7.0).
   qi is the quality rating for the ith parameter.
3. The WQI is a compilation of a number of parameters that can be used
   to determine the overall quality of water (Tiwari & Mishra 1985). The
   parameters involved in the WQI are pH, Calcium, Magnesium, Chlorides,
   Total Hardness, Total Dissolved Solids, Sulphates and Fluorides. The
   numerical value of the quality rating is multiplied by the weightage
   factor, which reflects the relative significance of the test to water
   quality. The WQI is calculated as per the following formula
   $$WQI = \frac{\sum q_i W_i}{\sum W_i}$$
Table 7.20 gives the standard values of the water quality parameters and the
assigned weightages.
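The three steps above can be sketched as a short calculation. The standards, ideal values and unit weights are those listed in Table 7.20; the quality rating is assumed to take the usual weighted arithmetic form, 100 times (actual - ideal) divided by (standard - ideal), and the sample concentrations and function names are hypothetical.

```python
# (standard Si, ideal value Vi, unit weight Wi) as listed in Table 7.20
PARAMETERS = {
    "pH": (8.5, 7.0, 0.0998),
    "TDS": (500.0, 0.0, 0.001697),
    "TH": (300.0, 0.0, 0.002829),
    "Ca2+": (75.0, 0.0, 0.01131),
    "Mg2+": (30.0, 0.0, 0.02829),
    "Cl-": (250.0, 0.0, 0.003394),
    "SO42-": (200.0, 0.0, 0.004243),
    "F-": (1.0, 0.0, 0.8487),
}

def quality_rating(actual, ideal, standard):
    """Quality rating qi of one parameter (assumed form, see lead-in)."""
    return 100.0 * (actual - ideal) / (standard - ideal)

def water_quality_index(sample):
    """Weighted arithmetic WQI: sum of qi * Wi divided by the sum of Wi."""
    weighted_sum = sum(weight * quality_rating(sample[name], ideal, standard)
                       for name, (standard, ideal, weight) in PARAMETERS.items())
    total_weight = sum(weight for _, _, weight in PARAMETERS.values())
    return weighted_sum / total_weight

# Hypothetical groundwater sample (mg/l except pH), not taken from the study
sample = {"pH": 7.8, "TDS": 600.0, "TH": 350.0, "Ca2+": 80.0,
          "Mg2+": 35.0, "Cl-": 200.0, "SO42-": 150.0, "F-": 0.6}
print(round(water_quality_index(sample), 2))
```

A sample in which every parameter sits exactly at its standard gives an index of 100, and a sample at the ideal values gives 0, which matches the scale used to classify the results.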
Table 7.20 Unit weight of the parameters based on BIS for drinking
water (all values are in mg/l except pH)

Parameter   Standard Value (Si)   Ideal Value (Vi)   Unit Weight (Wi)
pH          8.5                   7.0                0.0998
TDS         500                   0                  0.001697
TH          300                   0                  0.002829
Ca²⁺        75                    0                  0.01131
Mg²⁺        30                    0                  0.02829
Cl⁻         250                   0                  0.003394
SO₄²⁻       200                   0                  0.004243
F⁻          1.0                   0                  0.8487
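The three steps can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used in the study: it assumes the usual weighted arithmetic index forms Wi = K / Si, qi = 100 (Va - Vi) / (Si - Vi) and WQI = Σ qi Wi / Σ Wi, and the function names and the two check samples are hypothetical. Recomputing K from the Si column gives about 0.848, close to the tabulated 0.8487.

```python
# Standard (Si) and ideal (Vi) values copied from Table 7.20 (mg/l except pH).
STANDARDS = {
    "pH": (8.5, 7.0),
    "TDS": (500.0, 0.0),
    "TH": (300.0, 0.0),
    "Ca": (75.0, 0.0),
    "Mg": (30.0, 0.0),
    "Cl": (250.0, 0.0),
    "SO4": (200.0, 0.0),
    "F": (1.0, 0.0),
}


def unit_weights(standards):
    """Return (K, {parameter: Wi}) with Wi = K / Si and K = 1 / sum(1 / Si)."""
    k = 1.0 / sum(1.0 / si for si, _ in standards.values())
    return k, {name: k / si for name, (si, _) in standards.items()}


def quality_rating(va, vi, si):
    """qi = 100 * (Va - Vi) / (Si - Vi)."""
    return 100.0 * (va - vi) / (si - vi)


def water_quality_index(sample, standards=STANDARDS):
    """WQI = sum(qi * Wi) / sum(Wi) over the parameters in `standards`."""
    _, weights = unit_weights(standards)
    weighted = sum(
        quality_rating(sample[name], vi, si) * weights[name]
        for name, (si, vi) in standards.items()
    )
    return weighted / sum(weights.values())


# Two sanity-check samples (not measured data): one sitting exactly at every
# standard value and one sitting exactly at every ideal value.
at_standard = {name: si for name, (si, _) in STANDARDS.items()}
at_ideal = {name: vi for name, (_, vi) in STANDARDS.items()}

k, weights = unit_weights(STANDARDS)
print(f"K = {k:.4f}")
print(f"WQI at standard values = {water_quality_index(at_standard):.1f}")
print(f"WQI at ideal values    = {water_quality_index(at_ideal):.1f}")
```

Because the weights are normalised by K, they sum to one, so a sample at every standard value scores 100 and a sample at every ideal value scores 0. Fluoride, with the smallest standard value, carries by far the largest weight.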
The overall WQI is calculated, and the WQI ranges with the corresponding
status of water quality on an increasing scale are given in Table 7.21.
Table 7.21 WQI values and their status in increasing scale indices

S.No   WQI Values (Range)   Status of Water Quality
1      0-25                 Excellent
2      26-50                Good
3      51-75                Poor
4      76-100               Very Poor
5      >100                 Unfit for drinking
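The classification in Table 7.21 can be expressed as a small lookup. The sketch below is illustrative only: the function name is hypothetical, and it assumes that each upper bound is inclusive, so that a fractional value such as 25.4 falls in the next class up (Good). The table itself lists only integer ranges and does not say how such values were handled in the study.

```python
def wqi_status(wqi):
    """Map a WQI value to the status classes of Table 7.21.

    Assumption: upper bounds (25, 50, 75, 100) are inclusive.
    """
    if wqi < 0:
        raise ValueError("WQI is expected to be non-negative")
    if wqi <= 25:
        return "Excellent"
    if wqi <= 50:
        return "Good"
    if wqi <= 75:
        return "Poor"
    if wqi <= 100:
        return "Very Poor"
    return "Unfit for drinking"


for value in (12, 50, 63.5, 100, 140):
    print(value, wqi_status(value))
```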
The WQI was calculated for all the locations of the study area. During the
pre monsoon season, 19 locations fall under the Good category and 7 stations
are found to be unfit for drinking, whereas in the post monsoon season 18
samples fall under the Good category and only 1 sample is unfit for drinking.
These counts indicate that the water quality of the study area during the post
monsoon season is comparatively better for drinking purposes than during the
pre monsoon season.
Table 7.22 shows the classification of the groundwater in the study area with
respect to WQI. The spatial distribution maps of the WQI for the entire study
area during the pre monsoon and post monsoon periods are shown in Figures 7.5
and 7.6 respectively. The comparison between the WQI values of the pre monsoon
season and the post monsoon season is given in Figure 7.7.
Table 7.22 Classification of groundwater samples during pre monsoon
and post monsoon season with respect to WQI

WQI Range   Quality of Water     No. of Samples   No. of Samples
                                 (Pre monsoon)    (Post monsoon)
0-25        Excellent            6                10
26-50       Good                 19               18
51-75       Poor                 18               19
76-100      Very Poor            3                5
>100        Unfit for drinking   7                1
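To compare the two seasons on a common footing, the counts in Table 7.22 can be converted to percentage shares of the samples. The following sketch uses only the counts tabulated above; the variable and function names are hypothetical.

```python
# Sample counts per WQI class, copied from Table 7.22.
CLASSES = ["Excellent", "Good", "Poor", "Very Poor", "Unfit for drinking"]
PRE_MONSOON = [6, 19, 18, 3, 7]
POST_MONSOON = [10, 18, 19, 5, 1]


def shares(counts):
    """Convert class counts to percentages of the total number of samples."""
    total = sum(counts)
    return [100.0 * count / total for count in counts]


print(f"{'Class':<20}{'Pre (%)':>10}{'Post (%)':>10}")
for name, pre, post in zip(CLASSES, shares(PRE_MONSOON), shares(POST_MONSOON)):
    print(f"{name:<20}{pre:>10.1f}{post:>10.1f}")
```

Both columns sum to 53 samples, which matches the number of sampling locations in the study area.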

Figure 7.5 WQI for the study area during Pre-monsoon season
Figure 7.6 WQI for the study area during Post-monsoon season
Figure 7.7 Comparative graphical representation of WQI of groundwater at
different locations in the study area during pre and post monsoon season
(Chart: x-axis, Name of the Location; y-axis, WQI Values, scale 0 to 160;
series, WQI Pre Monsoon and WQI Post Monsoon. Locations labelled on the
x-axis: Gettisamudram, Poolapalayam, Chettipalayam, Attani, Olagadam,
Vellithirupur, Kurichi, Chinnapuliyur, Punnam, Odathurai, Oddapalayam,
Chellampalayam, Sembulichampalayam, Uonjampalayam, Pudupalayam, Chennampatti,
Bhavani, Nerunchipettai, Vyramangalam, Sanyasipatti, Nagalur, Puruvachi,
Kannapalli, Kalingarayanpalayam, Ammapettai, Jambai, Singampettai.)
7.5 NEMEROW’S POLLUTION INDEX (NPI)
Groundwater pollution is an artificially induced degradation of natural
groundwater quality. Groundwater quality may be affected by domestic and
industrial wastes and by many other factors.
In the present study, the pollution index introduced by Nemerow, known as
the NPI, is determined as

$$\mathrm{NPI} = \frac{C_i}{L_i}$$

where Ci is the observed concentration of the ith parameter and
Li is the permissible limit of the ith parameter as per BIS.
The units of Ci and Li should be identical, and each value of NPI shows the
relative pollution contributed by a single parameter. The values of Li for
the different water quality parameters are given in Table 7.23. An NPI greater
than 1.0 indicates the presence of impurity in the water, and hence treatment
is required prior to use (Usha Madhuri et al 2004). The NPI is evaluated for
all the parameters for each sample, and the spatial distribution maps for all
the parameters with NPI values exceeding 1.0 are plotted with the help of
ArcGIS.
Table 7.23 Permissible limit value (Li) of water quality parameters by NPI

S.No   Parameter                Permissible value as per BIS (Li)
1      pH                       8.5
2      Calcium                  75
3      Chlorides                250
4      Total Hardness           300
5      Nitrates                 45
6      Magnesium                30
7      Total Dissolved Solids   500
8      Fluoride                 1.5
Note: All parameter values are in mg/l except pH
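The NPI screening can be illustrated with a short Python sketch. It assumes the single-parameter form NPI = Ci / Li described above and uses the limits of Table 7.23. The sample concentrations are hypothetical values chosen only to show the calculation, not measurements from the study, and the function names are likewise illustrative.

```python
# Permissible limits Li as per BIS, copied from Table 7.23 (mg/l except pH).
LIMITS = {
    "pH": 8.5,
    "Calcium": 75.0,
    "Chlorides": 250.0,
    "Total Hardness": 300.0,
    "Nitrates": 45.0,
    "Magnesium": 30.0,
    "Total Dissolved Solids": 500.0,
    "Fluoride": 1.5,
}


def npi(sample, limits=LIMITS):
    """Return {parameter: Ci / Li} for every parameter present in the sample."""
    return {name: value / limits[name] for name, value in sample.items()}


def exceeding(sample, limits=LIMITS):
    """List the parameters whose NPI is greater than 1.0."""
    return [name for name, ratio in npi(sample, limits).items() if ratio > 1.0]


# Hypothetical concentrations for one station (illustration only).
sample = {"pH": 7.8, "Calcium": 90.0, "Nitrates": 30.0, "Fluoride": 1.5}

for name, ratio in npi(sample).items():
    print(f"{name:<10} NPI = {ratio:.2f}")
print("Parameters needing treatment:", exceeding(sample))
```

A parameter sitting exactly at its limit gives an NPI of 1.0 and is not flagged, since the criterion in the text is NPI > 1.0.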
pH is a measure of the hydrogen ion concentration and an important parameter
in determining water quality. With respect to pH, all the stations have an NPI
value less than 1.0. The spatial distribution maps of the NPI values of pH for
the pre monsoon and post monsoon seasons are shown in Figures 7.8 and 7.9
respectively.
Calcium is essential for the healthy growth of bones and plays an important
role in biological processes. Only six stations exceed the NPI value of 1.0 in
the pre monsoon season, against 27 stations in the post monsoon season, which
identifies pollution with reference to this parameter. The spatial
distribution maps of the NPI values of calcium for the pre monsoon and post
monsoon seasons are shown in Figures 7.10 and 7.11 respectively.
The contamination of groundwater with human or animal waste is indicated by
the chloride content. A higher concentration of chloride is associated with
organic pollutants. Only 10 stations exceed the NPI value of 1.0 during the
pre monsoon season and 5 stations exceed it in the post monsoon season. The
spatial distribution maps of the NPI values of chloride for the pre monsoon
and post monsoon seasons are shown in Figures 7.12 and 7.13 respectively.
Hardness is an important parameter for domestic and industrial purposes. Out
of 53 stations, around 40 stations exceed the NPI value of 1.0 during the pre
monsoon season and 27 stations during the post monsoon season, indicating
pollution in the water. The spatial distribution maps of the NPI values of
hardness for the pre monsoon and post monsoon seasons are shown in
Figures 7.14 and 7.15 respectively.
Nitrate content is an indicator of seasonal variation with recharge and
leaching of fertilizers. The numbers of stations which exceed the NPI value of
1.0 during the pre monsoon and post monsoon seasons are 27 and 28
respectively. The spatial distribution maps of the NPI values of nitrate for
the pre monsoon and post monsoon seasons are shown in Figures 7.16 and 7.17
respectively.
Magnesium has a laxative effect on people when associated with sulphate. Only
two stations have an NPI value below 1.0 in the pre monsoon season, whereas 9
stations are below 1.0 during the post monsoon season. The spatial
distribution maps of the NPI values of magnesium for the pre monsoon and post
monsoon seasons are shown in Figures 7.18 and 7.19 respectively.
TDS is the indicator of overall water quality and mineralization. In the
present study, 19 stations in the pre monsoon season and 15 stations in the
post monsoon season have an NPI value less than 1.0, which indicates that the
water at these stations is fresh water. The spatial distribution maps of the
NPI values of TDS for the pre monsoon and post monsoon seasons are shown in
Figures 7.20 and 7.21 respectively.
Fluoride, the form in which fluorine occurs in water, is the most commonly
occurring natural contaminant. Groundwater usually contains fluoride
dissolved from geological formations. Higher concentrations of fluoride may
lead to diseases such as dental fluorosis and skeletal fluorosis. Only seven
stations exceed the NPI value of 1.0 in the pre monsoon season and 1 station
in the post monsoon season, which identifies pollution with reference to this
parameter. The spatial distribution maps of the NPI values of fluoride for the
pre monsoon and post monsoon seasons are shown in Figures 7.22 and 7.23
respectively.
The evaluation of the NPI shows that most of the samples exceed the value of
1.0. Pollution is therefore present in those samples, and the water should be
treated before it is used for domestic purposes.
Figure 7.8 pH - Spatial Distribution map - Pre monsoon
Figure 7.9 pH - Spatial Distribution map - Post monsoon
Figure 7.10 Calcium - Spatial Distribution map - Pre monsoon
Figure 7.11 Calcium - Spatial Distribution map - Post monsoon
Figure 7.12 Chloride - Spatial Distribution map - Pre monsoon
Figure 7.13 Chloride - Spatial Distribution map - Post monsoon
Figure 7.14 Hardness - Spatial Distribution map - Pre monsoon
Figure 7.15 Hardness - Spatial Distribution map - Post monsoon
Figure 7.16 Nitrate - Spatial Distribution map - Pre monsoon
Figure 7.17 Nitrate - Spatial Distribution map - Post monsoon
Figure 7.18 Magnesium - Spatial Distribution map - Pre monsoon
Figure 7.19 Magnesium - Spatial Distribution map - Post monsoon
Figure 7.20 TDS - Spatial Distribution map - Pre monsoon
Figure 7.21 TDS - Spatial Distribution map - Post monsoon
Figure 7.22 Fluoride - Spatial Distribution map - Pre monsoon
Figure 7.23 Fluoride - Spatial Distribution map - Post monsoon
