
Ain Shams Engineering Journal 9 (2018) 3197–3205


Data driven water quality modeling for drain/canal inflows to Lake Burullus

Mohamed Shaban a,⇑, Hanan Farag b
a Drainage Research Institute, National Water Research Center, PO 13621/5, El-Kanater El-Khairia, Qalubia, Egypt
b National Water Research Center, PO 13621/5, El-Kanater El-Khairia, Qalubia, Egypt

Article info
Article history: Received 15 November 2017; Revised 15 April 2018; Accepted 21 May 2018; Available online 12 November 2018
Keywords: Lake Burullus; Stepwise regression; Agricultural drainage; Water pollution

Abstract
Burullus Lake is located along the Egyptian coast of the Mediterranean. It receives discharges from nine estuaries. These discharges carry different wastes, which make the Lake highly vulnerable to rapid eutrophication and pollution. The water quality of the estuaries is therefore important for the Lake environment. Accordingly, the objective is to model and quantify the effluent impacts of the estuaries on the Lake water quality. The analysis started by classifying the 12 Lake monitoring locations using water quality data (WQD) collected monthly from Feb. 2010 to Nov. 2012. Correlation and regression analyses were then carried out to model and quantify the Lake BOD and TP levels in relation to the estuaries' effluents. The Lake can be classified into three clusters that are highly affected by the effluents of the estuaries. Some of these estuaries can be used to predict the Lake water quality with models that are significant at the 0.1% level.
© 2018 Ain Shams University. Production and hosting by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

⇑ Corresponding author. E-mail address: salamashaban@gmx.com (M. Shaban).
Peer review under responsibility of Ain Shams University.
https://doi.org/10.1016/j.asej.2018.05.002
2090-4479/© 2018 Ain Shams University. Production and hosting by Elsevier B.V.

1. Introduction

Burullus Lake is a RAMSAR site and has been a declared natural protectorate since 1998. It lies along the Egyptian coast of the Mediterranean in the north-central Nile Delta region [1,2]. The Lake is a shallow basin that lies between Long. 30°30′ and 31°10′ E and Lat. 31°35′ N [3]. It is connected to the Mediterranean Sea through Boughaz El-Burg on its northern side.

The Lake is economically important because it produces around 50% of Egyptian fish production. It is also a valuable wetland and a resting area for various migrating birds [4]. Recently, the Lake ecosystem has been degraded by excessive drainage discharges [5]. The Lake receives about 3.2 × 10⁹ m³ y⁻¹ of agricultural, domestic and industrial drainage water from the surrounding farmland, fish farms and urbanized areas. This water arrives through eight drains and one fresh-to-brackish water canal. In addition, the Mediterranean supplies the Lake with saline water [6]. The main water sources are Burullus West Drain, Drain 11, Nashart (Drain 9), Drain 8, Drain 7, Tira Drain, Khashaa Drain, Burullus Drain and Bermbal Canal (Fig. 1).

The Lake connects to the sea through a narrow, nearly straight outlet (Boughaz El-Burg). Consequently, the volume of wastewater released from the southern drains greatly exceeds the volume of seawater entering the lagoon [7].

The drain inflows are of serious concern because they are highly contaminated by anthropogenic materials [8]. These include heavy loads of fertilizers, oxidized organic matter and pesticides, which have caused serious water quality and eutrophication problems during the last decade [5,9,10].

Several studies have assessed the extent of pollution in the Lake, identified the potential pollution sources and described their implications. These studies pointed to the impact of the drainage water on the Lake ecosystem without quantifying it [11,12,4,13,7]. Other efforts developed or applied 1-D and 2-D models of water quality in Lake Burullus [14–16].

However, such models require large amounts of detailed input data, which are in many cases very expensive and difficult to collect [17,18]. These models cannot accommodate unlimited complexity and are typically simplified. In addition,
they have difficulty accounting for outlier data and excluded variables. Therefore, data-driven techniques such as regression-based and neural-network-based modeling have been developed as trade-offs to numerical models, especially when the available data are limited [17,19].

The main objective of this research is to model and quantify the effluent impacts of the drain/canal estuaries on the Lake water quality using regression-based formulas. The modeling and quantification will support setting proper management actions based on what-if scenarios, with the aim of improving the environmental conditions of the Lake while taking future plans into account. These scenarios should weigh the relative importance of each drain/canal as a pollution source, as described by the regression formulas developed in this paper.

2. Materials and methods

2.1. Monitoring program and data collection

Surface water samples were collected at twelve sites in Burullus Lake from February 2010 to November 2012. The sites were selected to cover the Lake area and to represent its environmental conditions (Fig. 1). Their selection was based on the following criteria:

• Three samples (sites 1–3) were taken from the Boughaz El-Burg zone on the east side of the basin. This zone is highly influenced by seawater, and Tira Drain, Khashaa Drain and Burullus Drain discharge into it.
• Five samples (sites 4–8) were collected to represent the conditions in the middle basin of the Lake. This basin receives enormous amounts of wastewater from Nashart (Drain 9), Drain 8 and Drain 7.
• Four samples (sites 9–12) represented the west side of the basin. This area receives its water from two drains, Burullus West Drain and Drain 11, as well as from Bermbal Canal.

The Lake receives about 3.9 billion cubic meters annually from eight drains and one irrigation canal [20]. For this reason, nine additional surface water samples were collected synchronously at the drain/canal estuaries on the Lake during the same period. Fig. 1 shows a map of the Lake Burullus water system indicating the sampling locations employed in this study.

Field water quality samples were collected following the methods described in the report "Manual on data collection, processing and presentation - short term routine measurements program" [21]. This report was developed to ensure the collection of representative and reproducible water quality samples. Laboratory analyses were conducted according to the Standard Methods for the Examination of Water and Wastewater, 19th edition [22].

Fig. 1. Water quality monitoring sites in Lake Burullus (Lake monitoring sites B1–B12 and drain monitoring sites).

2.2. Data analysis

Hierarchical agglomerative cluster analysis (HACA) was used to examine the Lake WQD for spatial differences. HACA is widely used to identify groups of samples that have similar water quality characteristics [23]. It has been extensively employed to study water-chemistry data for both surface water and groundwater [24,25]. In this study, HACA was first performed to classify the 12 Lake monitoring locations. The classification used the medians of the WQD collected from February 2010 to November 2012: DO (mg/l), BOD (mg/l), COD (mg/l), Chl-a (µg/l), N-NH4 (mg/l), N-NO2 (mg/l), N-NO3 (mg/l), TN (mg/l) and TP (mg/l).

Before the HACA was performed, the normality of the data was investigated using the Kolmogorov-Smirnov and Shapiro-Wilk tests. The clustering process then started with one cluster for each sample. Pairs of clusters were successively merged using the Euclidean distance as the dissimilarity measure.

Correlation analysis was employed to investigate the relationship between the independent variables (the drain/canal outfalls on the Lake) and the dependent variables (Burullus Lake). The analyses were carried out using measurements of two selected water quality parameters, BOD (mg/l) and TP (mg/l). These biological and agricultural indicators were selected based on the previous knowledge
concerning the main pollution sources to the lake, since industrial discharges do not exceed 0.12% of the total discharges to the lake [26].

Finally, stepwise multiple regression analyses were carried out to model and quantify the BOD (mg/l) and TP (mg/l) levels for each Lake cluster in relation to the effluent impact of the drain/canal estuaries.

For each cluster, the data of all drains that correlated significantly with the data from the monitoring locations in the examined Lake cluster were entered into the model in sequence. Each entry was then tested. If the entered data set contributed significantly (p < 0.01) to improving the information obtained by the model, it was kept; otherwise, it was excluded. As a result, a minimum number of predictors was kept in the final model [27].

In this way, the analysis attempts to formulate the relationships between the model parameters and thereby offer a reliable prediction/forecasting mechanism for the Lake water quality [28,29]. These statistical analyses (carried out using SPSS version 21) were used to interpret the complex datasets and to obtain an improved understanding of the lake water quality.

3. Results and discussions

3.1. Cluster analysis

HACA was used in this research to identify relatively homogeneous groups of monitoring sites in Lake Burullus based on the examined water quality parameters (WQPs). It produces these groups as a dendrogram, which then serves as a simple graphical presentation of the behavior of the sites [30,31].

HACA generated a dendrogram (Fig. 2) that groups the 12 monitoring sites into three clusters:
• Cluster I comprised sites in the internal zone in the south (B6, B7, B1 and B4).
• Cluster II included sites in the coastal zone near the sea (B2, B3, B5, B8, B9, B10 and B12).
• Cluster III included one location (B11) in the southwest zone.

In the dendrogram (Fig. 2), the monitoring sites are represented as parallel horizontal lines. Vertical lines merge the site sets into clusters that have similar water quality levels. The similarity/dissimilarity between these clusters is assessed using the Euclidean distance, which is represented on the x-axis.

The clustering process had a relatively strong structure. The agglomerative coefficient (AC) for the examined data was 0.903 (close to 1.0), which indicates a "very clear clustering structure" [31].

Fig. 2. Dendrogram from HACA of 12 water quality monitoring sites from Lake Burullus (Cluster I, Cluster II and Cluster III).
Fig. 3. Water quality variation along site clusters in Lake Burullus.

3.1.1. Cluster groups
In Fig. 2, three major horizontal lines identify three main water quality clusters, labeled Cluster I, Cluster II and Cluster III. Their linkage Euclidean distances are clearly large. A straightforward assessment of the three clusters indicated that the differences between the site groups are attributable to water quality constituents. Fig. 3 shows the variation of these constituents along the Lake sites. A thorough investigation of Fig. 3 indicates the following:

• The monitoring sites in Cluster I (B6, B7, B1 and B4) had relatively moderate water quality conditions. However, they were
more similar to Cluster III than to Cluster II. They recorded moderate concentrations of DO, BOD, COD, N-NO2, N-NH4 and N-NO3, and the highest Chl-a, TP and TN levels were also found in this cluster. This cluster zone is located in the south of the Lake near the inland. It is relatively far from the effect of seawater and more affected by the drainage water inflow. It also receives excess nutrients in the drainage from fish farms adjacent to the lake border.
• The monitoring sites in Cluster II (B2, B3, B5, B8, B9, B10 and B12) had relatively better water quality conditions. They recorded the lowest concentrations of TP, TN, BOD, COD, N-NO2, N-NH4 and N-NO3 and the highest DO levels, while Chl-a was at a moderate level. These conditions may be attributable to the location of this cluster zone in the north of the Lake, relatively near Boughaz El-Burg, where seawater has a relatively greater influence.
• The monitoring site in Cluster III (B11) appeared to have the worst water quality conditions. It recorded the highest concentrations of BOD, COD, N-NO2, N-NH4 and N-NO3 and the lowest Chl-a level, while TP and TN were at moderate levels. This cluster zone is the farthest from the influence of seawater. It is more affected by drainage water carrying untreated wastes from villages along Drain 11. These villages lack proper sanitation facilities and account for 86% of the total population of Motobas Markaz and 55% of that of Foa Markaz. In addition, Cluster III receives excessive nutrients from many fish farms adjacent to the southwestern part of the lake.

3.2. Correlation analysis

Correlation analyses were carried out using measurements of the selected water quality parameters, BOD (mg/l) and TP (mg/l). The analyses related the independent variables (monitoring sites at the drain/canal outfalls) to the dependent variables (monitoring sites in Burullus Lake). Correlation matrices were constructed for BOD and TP, and significance was assessed at the 0.01 and 0.05 levels (two-tailed). The results (Tables 1 and 2) revealed the following:

• For the monitoring sites in Cluster I (B6, B7, B1 and B4), the BOD levels were significantly correlated with the BOD levels of five drains: Nashart (Drain 9), Drain 7, Tira Drain, Khashaa Drain and Burullus Drain. The TP levels were significantly correlated with the TP levels of all drain/canal estuaries.
• For the monitoring sites in Cluster II (B2, B3, B5, B8, B9, B10 and B12), the BOD levels were significantly correlated with the BOD levels of four drains: Nashart (Drain 9), Drain 7, Tira Drain and Burullus Drain. The TP levels were significantly correlated with the TP levels of all drain/canal estuaries except Burullus Drain.
• For the monitoring site in Cluster III (B11), the BOD levels were significantly correlated with the BOD levels of four drains and one fresh-to-brackish water canal: Burullus West Drain, Drain 11, Tira Drain, Burullus Drain and Bermbal Canal. The TP levels were significantly correlated with the TP levels of three drains: Burullus West Drain, Drain 11 and Nashart (Drain 9).

Table 1
Correlation matrix for BOD levels measured at the independent (drain/canal estuaries) and dependent (Burullus Lake) sites.

Drain/canal        B1      B2      B3      B4      B5      B6      B7      B8      B9      B10     B11     B12
Burullus West      0.172   0.062   0.089   0.022   0.388   0.209   0.490   0.412   0.241   0.422   0.799** 0.370
Bermbal Canal      0.150   0.059   0.187   0.046   0.194   0.094   0.007   0.162   0.250   0.224   0.624*  0.211
Drain 11           0.420   0.210   0.002   0.150   0.453   0.197   0.363   0.435   0.238   0.256   0.810** 0.564
Nashart (Drain 9)  0.160   0.057   0.486   0.510   0.812** 0.577*  0.674*  0.708** 0.730** 0.611*  0.506   0.627*
Drain 8            0.007   0.269   0.225   0.358   0.454   0.258   0.491   0.497   0.654*  0.459   0.515   0.515
Drain 7            0.405   0.154   0.640*  0.641*  0.716** 0.516   0.512   0.632*  0.812** 0.588*  0.295   0.371
Tira Drain         0.268   0.146   0.571   0.575   0.759** 0.462   0.636*  0.585*  0.679*  0.610*  0.627*  0.645*
Khashaa Drain      0.376   0.231   0.473   0.332   0.275   0.634*  0.101   0.038   0.159   0.195   0.013   0.135
Burullus Drain     0.366   0.218   0.326   0.603*  0.452   0.014   0.846** 0.511   0.508   0.665** 0.704*  0.348
* Correlation is significant at the 0.05 level (2-tailed).
** Correlation is significant at the 0.01 level (2-tailed).

Table 2
Correlation matrix for TP levels measured at the independent (drain/canal estuaries) and dependent (Burullus Lake) sites.

Drain/canal        B1      B2      B3      B4      B5      B6      B7      B8      B9      B10     B11     B12
Burullus West      0.780** 0.111   0.598*  0.236   0.782** 0.739** 0.786** 0.780** 0.823** 0.806** 0.673*  0.710**
Bermbal Canal      0.643*  0.071   0.249   0.154   0.569   0.541   0.469   0.537   0.547   0.794** 0.484   0.686*
Drain 11           0.767** 0.268   0.569   0.154   0.548   0.505   0.760** 0.564   0.701*  0.772** 0.712** 0.710**
Nashart (Drain 9)  0.528   0.127   0.350   0.477   0.646*  0.807** 0.646*  0.665*  0.729** 0.633*  0.630*  0.586*
Drain 8            0.309   0.053   0.292   0.326   0.738** 0.853** 0.546   0.652*  0.618*  0.580*  0.389   0.391
Drain 7            0.651*  0.128   0.461   0.164   0.682*  0.751** 0.582*  0.667*  0.638*  0.613*  0.544   0.676*
Tira Drain         0.502   0.268   0.718** 0.528   0.484   0.547   0.730** 0.521   0.610*  0.292   0.544   0.352
Khashaa Drain      0.414   0.102   0.304   0.216   0.596*  0.605*  0.385   0.458   0.349   0.494   0.232   0.433
Burullus Drain     0.303   0.326   0.378   0.184   0.326   0.422   0.630*  0.377   0.452   0.372   0.435   0.257
* Correlation is significant at the 0.05 level (2-tailed).
** Correlation is significant at the 0.01 level (2-tailed).

3.3. Stepwise regression analysis

3.3.1. Modeling biological oxygen demand (BOD) levels
A series of stepwise multiple linear regression models was used to find the best predictors of the mean BOD level for each site cluster in Lake Burullus. The main objective was to investigate the relationship between the independent variables (BOD levels at the drain/canal estuaries) and a dependent variable (the mean BOD level for a given site cluster) [32].
Before the results were interpreted, the standard assumptions of linear regression were tested. An assessment of the regression normal P-P plots and scatter plots showed that the relationship between the
dependent variable (to be predicted) and the predictors is linear and that the residuals have the same variance (homoscedasticity) (Fig. 4). The following sections summarize the results obtained for each site cluster.

Fig. 4. Checking regression assumptions (BOD data example): normal P-P plot of regression standardized residuals, and scatter plot of regression standardized predicted values against observed values.

3.3.1.1. Site Cluster I. The BOD levels at the outfalls of Nashart (Drain 9), Drain 7, Tira Drain, Khashaa Drain and Burullus Drain were introduced into SPSS 21 as independent variables to model the mean BOD levels at Site Cluster I (dependent variable). Each drain BOD data set (independent variable) was entered into the model in sequence, and the model was then evaluated. If the entered data set improved the information obtained by the model, it was kept; otherwise, it was excluded. As a result, a minimum number of predictors was kept in the final model.

A thorough examination of the output showed that only the BOD levels at Drain 7 and Khashaa Drain contribute substantially to the model's ability to predict the outcome. These drains receive substantial discharges from fish farms. They also receive untreated or partially treated domestic wastewater from villages along the southern shore of the lake, and together these inflows deliver high loads of nutrient pollutants.

Table 3
Regression model summary for BOD levels at Site Cluster I.

Model  R       R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      0.687a  0.472     0.419              5.76981                     0.472            8.939     1    10   0.014
2      0.847b  0.717     0.684              4.45099                     0.245            7.804     1    9    0.021          1.719
R Square Change, F Change, df1, df2 and Sig. F Change are change statistics.
a Predictors: (Constant), Drain 7.
b Predictors: (Constant), Drain 7, Khashaa Drain.
c Dependent Variable: BOD levels at Site Cluster I.

Table 3 shows that there are two models. Model 1 is the first stage of the stepwise regression hierarchy, in which only the BOD level at Drain 7 was used as a predictor. Model 2 is the final stage, in which the BOD levels at Drain 7 and Khashaa Drain were used as predictors. The R² value for the two-variable model is 0.717, which means that the two independent variables collectively explain 71.7% of the variance in the dependent variable. Consequently, the relationship can be described as “strong”. Table 3 also indicates
that the increase in predictive power at each model step was a statistically significant addition. The "Sig. F Change" value of 0.021 is the p-value for testing whether the increase in R² from Model 1 to Model 2 is zero. In addition, the Durbin-Watson statistic was 1.719, which is near 2 and indicates insignificant serial correlation.

Table 4 shows the estimates of the model coefficients. The standardized (Beta) coefficients of the parameters calibrated by the stepwise regression analysis indicate that the BOD level at Drain 7 makes the strongest unique contribution to the variation in the mean BOD levels at Site Cluster I (Beta = 0.682). The BOD level at Khashaa Drain is the second contributor (Beta = 0.495). The p-values of the intercept and slope coefficients (0.033, 0.004 and 0.021, respectively) are all below 0.05, indicating that they are significant.

Table 4 includes two collinearity measures:
• The tolerance varies from 0 to 1.0. Values near zero are strong evidence of a collinearity problem. In that case, the number of predictors should be reduced by excluding one or more of them.
• The variance inflation factor (VIF) is simply the reciprocal of the tolerance.
For Model 2, the tolerance (0.9999) and VIF (1.0001) show that the two predictors are practically uncorrelated.

The ANOVA table (Table 5) shows that the F-statistic (F = 11.412) is large and that the corresponding p-value is highly significant (p = 0.003 < 0.01). This indicates that the model slopes are not all equal to zero, which confirms a significant linear relationship between the dependent variable and the predictors.

Table 4
Estimates of coefficients of the multiple linear model of BOD levels at Site Cluster I.

Model  Predictor      B      Std. Error  Beta   t      Sig.   Tolerance  VIF
1      (Constant)     6.076  3.294              1.845  0.095
       Drain 7        0.518  0.173       0.687  2.990  0.014  1.000      1.000
2      (Constant)     1.910  2.946              0.648  0.033
       Drain 7        0.514  0.134       0.682  3.848  0.004  0.9999     1.0001
       Khashaa Drain  0.233  0.083       0.495  2.794  0.021  0.9999     1.0001
B and Std. Error: unstandardized coefficients; Beta: standardized coefficients; Tolerance and VIF: collinearity statistics.

Table 5
ANOVA table for the multiple linear model of BOD levels at Site Cluster I.

Model             Sum of squares  df  Mean Square  F       Sig.
1  Regression     297.581         1   297.581      8.939   0.014b
   Residual       332.908         10  33.291
   Total          630.488         11
2  Regression     452.187         2   226.093      11.412  0.003c
   Residual       178.301         9   19.811
   Total          630.488         11
a Dependent Variable: BOD levels for Site Cluster I.
b Predictors: (Constant), BOD levels at Drain 7.
c Predictors: (Constant), BOD levels at Drain 7 and Khashaa Drain.

At this stage, Model 2 can be written as:

$$\text{BOD}_{\text{Site Cluster I}} = 0.514\,\text{BOD}_{\text{Drain 7}} + 0.233\,\text{BOD}_{\text{Khashaa Drain}} + 1.91 \tag{1}$$

Eq. (1) can be used to estimate the mean BOD level at Site Cluster I. The estimated mean deviates from the field measurements by amounts called residuals. Fig. 5a presents the BOD levels at Site Cluster I predicted from the original measurements of BOD levels at Drain 7 and Khashaa Drain.

The adjusted R² value is commonly used to check whether a model can be applied beyond the measured data. For Model 2, R² = 0.717 and the adjusted R² = 0.684. These values are very close, which is a good sign of possible generalizability. However, model estimates within the range of the original data set (interpolation) are more trustworthy than those outside it (extrapolation).

3.3.1.2. Site Cluster II. The BOD levels at the estuaries of Nashart (Drain 9), Drain 7, Tira Drain and Burullus Drain were considered as independent variables to model the mean BOD levels at Site Cluster II (dependent variable). The same analysis steps were carried out as for Site Cluster I. The results indicated that the tolerance values for the excluded variables varied between 0.407 and
0.670, which indicates that these independent variables are linearly related to each other. Therefore, a one-variable model was considered. Only the BOD levels at Tira Drain contribute substantially to the model's ability to predict the mean BOD levels at Site Cluster II. The R² value for the one-variable model is 0.514, which means that the independent variable explains 51.4% of the variance in the dependent variable. Consequently, the relationship can be described as “moderate”. In addition, the Durbin-Watson statistic was 2.28, which is near 2 and indicates no serial correlation.

The low p-values of the intercept and slope coefficients (0.027 and 0.009, respectively) indicate that they are significant. The F-statistic (F = 10.588) was large and the p-value is highly significant (p = 0.009 < 0.01). This indicates that the slope of the model is not equal to zero and that the relationship between the dependent variable and the predictor is linear.

The regression model for Site Cluster II is therefore:

$$\text{BOD}_{\text{Site Cluster II}} = 0.482\,\text{BOD}_{\text{Tira Drain}} + 3.132 \tag{2}$$

Eq. (2) can be used to estimate the mean BOD levels at Site Cluster II. Fig. 5b presents the mean BOD levels at Site Cluster II predicted from the original measurements of BOD levels at Tira Drain.

3.3.1.3. Site Cluster III. The BOD levels at the estuaries of Burullus West Drain, Drain 11, Tira Drain, Burullus Drain and Bermbal Canal were considered as independent variables in the BOD modeling process for Site Cluster III (dependent variable). The same analysis steps were carried out as for the previous two clusters. The results showed that only the BOD levels at Burullus Drain and Drain 11 contribute substantially to the model's ability to predict the BOD levels at Site Cluster III. The R² value for the two-variable model is 0.792, which means that the two independent variables explain 79.2% of the variance in the dependent variable. Consequently, the relationship can be described as “strong”. In addition, the Durbin-Watson statistic was 2.34, which is near 2 and indicates no serial correlation.

The low p-values of the intercept and slope coefficients (0.026, 0.006 and 0.039, respectively) indicate that they are significant. The F-statistic (F = 17.150) was large and the p-value is highly significant (p = 0.001 < 0.01). This indicates that the model slopes are not all equal to zero and that the relationship between the dependent variable and the predictors is linear.

The regression model for Site Cluster III is:

$$\text{BOD}_{\text{Site Cluster III}} = 0.803\,\text{BOD}_{\text{Burullus Drain}} + 0.506\,\text{BOD}_{\text{Drain 11}} - 1.633 \tag{3}$$

Eq. (3) can be used to estimate the BOD levels at Site Cluster III. Fig. 5c presents the mean BOD levels at Site Cluster III predicted from the original measurements of BOD levels at Burullus Drain and Drain 11.

Fig. 5. Measured and predicted values for the mean BOD levels at Site Clusters I, II and III.

3.3.2. Modeling total phosphorus (TP) levels
The same procedure was used to find the best predictor(s) of the mean TP levels for each site cluster in Lake Burullus. Table 6 summarizes the modeling results. Only the TP levels at Burullus West Drain contribute substantially to the models' ability to predict the mean TP levels at Site Clusters I and II. The R² values for the one-variable models were 0.700 and 0.899, respectively. The independent variable (TP levels at Burullus West Drain) therefore explains 70.0% and 89.9% of the variance in the dependent variables (mean TP levels at Site Clusters I and II, respectively). Consequently, the relationship can be described as “strong”. In addition, the Durbin-Watson statistics were 1.87 and 2.01 for Site Clusters I and II, respectively. Both values are very near 2, indicating no serial correlation.

Table 6
Linear regression models of TP levels at the Lake site clusters.

Site group        TP model                                                R²     Adjusted R²  Sig. F Change  Durbin-Watson statistic
Site Cluster I    Mean TP = 0.880 (TP for Burullus West Drain) + 232.229  0.700  0.670        0.001          1.871
Site Cluster II   Mean TP = 0.456 (TP for Burullus West Drain) + 72.230   0.899  0.889        0.000          2.013
Site Cluster III  Mean TP = 0.86748 (TP for Drain 11) + 5.8967            0.507  0.457        0.009          1.878

For Site Cluster III, only the TP levels at Drain 11 contribute substantially to the model's ability to predict the mean TP levels. The R² and Durbin-Watson statistics were 0.507 and 1.87, indicating
a “moderate” relationship and no serial correlation. The corresponding p-values for the three models were highly significant. This indicates that the slopes of the developed models are not equal to zero and confirms that there are statistically significant relationships.

4. Summary and conclusions

In this paper, Lake Burullus was classified into three cluster zones (I, II and III) based on its water quality characteristics.

Cluster I is located in the south of the Lake near the inland. It is relatively far from the effect of seawater and more affected by the drainage water inflow. It also receives significant amounts of nutrients drained from the excessive fish farming adjacent to the southern Lake border. The monitoring sites in Cluster I had relatively moderate concentrations of DO, BOD, COD, N-NO2, N-NH4 and N-NO3, and the highest Chl-a, TP and TN levels were also found there. The BOD levels were mainly attributed to inflows from five drains: Nashart (Drain 9), Drain 7, Tira Drain, Khashaa Drain and Burullus Drain. However, only Drain 7 and Khashaa Drain were retained, and together they explain 71.7% of the variance in the mean BOD levels at Site Cluster I. Similarly, the TP levels were attributed to inflows from all the drain/canal estuaries, but Burullus West Drain alone explains 70.0% of the variance in the mean TP levels at Site Cluster I.

Cluster II is located in the upper northern part of the Lake near the sea opening (El-Boghaz). It has relatively better water quality conditions due to the exchange of Lake water with the Mediterranean. The monitoring sites in Cluster II recorded the lowest concentrations of TP, TN, BOD, COD, N-NO2, N-NH4 and N-NO3 and the highest DO levels, while Chl-a was at a moderate level. In this cluster, the BOD levels were mainly attributed to inflows from four drains: Nashart (Drain 9), Drain 7, Tira Drain and Burullus Drain. However, Tira Drain alone explains 51.4% of the variance in the mean BOD levels at Site Cluster II. The TP levels were attributed to inflows from all the drain/canal estuaries except Burullus Drain, but Burullus West Drain alone explains 89.9% of the variance in the mean TP levels at Site Cluster II.
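The variance percentages quoted above are R² values, that is, the share of the total sum of squares captured by the regression. The snippet below is a minimal sketch in plain Python, assuming the standard ordinary-least-squares definitions; the helper names are hypothetical and are not SPSS functions. It recomputes R², the overall F statistic and the F change for the Site Cluster I BOD model directly from the sums of squares reported in Table 5.

```python
"""Recompute regression summary statistics from reported sums of squares.

Illustrative helpers only (not SPSS functions). Inputs are the values
reported for the Site Cluster I BOD models in Tables 3 and 5.
"""


def r_squared(ss_regression: float, ss_total: float) -> float:
    """Share of the total variance explained by the regression."""
    return ss_regression / ss_total


def overall_f(ss_regression: float, df_regression: int,
              ss_residual: float, df_residual: int) -> float:
    """ANOVA F statistic: regression mean square / residual mean square."""
    return (ss_regression / df_regression) / (ss_residual / df_residual)


def f_change(r2_reduced: float, r2_full: float,
             df_added: int, df_residual_full: int) -> float:
    """Partial F statistic for the R-squared gain when predictors are added."""
    gain = (r2_full - r2_reduced) / df_added
    unexplained = (1.0 - r2_full) / df_residual_full
    return gain / unexplained


SS_TOTAL = 630.488  # total sum of squares, identical for both models (Table 5)

# Model 1: Drain 7 only.  Model 2: Drain 7 + Khashaa Drain.
r2_model1 = r_squared(297.581, SS_TOTAL)
r2_model2 = r_squared(452.187, SS_TOTAL)
f_model2 = overall_f(452.187, 2, 178.301, 9)

# Step 2 adds one predictor (df1 = 1) and leaves 9 residual df (df2 = 9).
f_step2 = f_change(r2_model1, r2_model2, 1, 9)

print(f"R2 (Model 2)      = {r2_model2:.3f}")
print(f"F (Model 2)       = {f_model2:.3f}")
print(f"F change (step 2) = {f_step2:.3f}")
```

Run as-is, it prints R² = 0.717, F = 11.412 and F change = 7.804, which agrees with Tables 3 and 5. The same three functions apply to the other cluster models once their sums of squares are available.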
Cluster III is located in the southwest corner of the Lake, at the largest distance from the connection to the Mediterranean (El-Boghaz). It has the worst water quality conditions in the Lake and is mainly affected by drainage water carrying untreated wastes from villages with improper sanitation facilities along Drain no. 11. This zone also receives excessive nutrients from the many fish farms adjacent to the southwestern part of the Lake. It recorded the highest concentrations of BOD, COD, N-NO2, N-NH4 and N-NO3 and the lowest Chl-a levels, while TP and TN were at moderate levels. In this cluster, the BOD levels were mainly attributed to inflows from four drains and one fresh-to-brackish water canal, namely Burullus West Drain, Drain 11, Tira Drain, Burullus Drain and Bermbal Canal. However, Burullus Drain and Drain 11 together explain 79.2% of the variance in the mean BOD levels at Site Cluster III. TP levels were attributed to inflows from three drains, namely Burullus West Drain, Drain 11 and Nashart (Drain 9), but Drain 11 alone explains 50.7% of the variance in the mean TP levels at Site Cluster III.
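In each cluster, several drains are associated with the lake levels, yet only one or two enter the final model (for example, Burullus Drain and Drain 11 for BOD in Cluster III). The paper lists stepwise regression among its keywords, but this section does not state the entry and removal criteria. The pure-Python sketch below therefore uses a simplified forward selection (entry only, no removal step) driven by the partial F statistic, with an assumed F-to-enter threshold of 4.0; the function names, regressor names and values are hypothetical.

```python
def solve(A, c):
    """Solve the linear system A x = c by Gaussian elimination with partial pivoting."""
    k = len(c)
    M = [list(row) + [c[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for j in range(col, k + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * k
    for i in range(k - 1, -1, -1):  # back substitution
        x[i] = (M[i][k] - sum(M[i][j] * x[j] for j in range(i + 1, k))) / M[i][i]
    return x


def ols_sse(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) beta = X'y
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    c = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    beta = solve(A, c)
    return sum((y[r] - sum(b * v for b, v in zip(beta, X[r]))) ** 2 for r in range(n))


def forward_select(candidates, y, f_to_enter=4.0):
    """Greedy forward selection on the partial F statistic.

    candidates: dict mapping a regressor name to its list of values.
    At each step the regressor with the largest partial F is added, until
    no remaining regressor reaches `f_to_enter` (an assumed threshold).
    """
    n = len(y)
    selected = []
    sse_current = ols_sse([], y)  # intercept-only model: total sum of squares
    while True:
        df_resid = n - len(selected) - 2  # n minus (regressors after entry + intercept)
        if df_resid <= 0:
            return selected
        best = None
        for name, col in candidates.items():
            if name in selected:
                continue
            sse_new = ols_sse([candidates[s] for s in selected] + [col], y)
            f = (sse_current - sse_new) / (sse_new / df_resid) if sse_new > 0 else float("inf")
            if best is None or f > best[1]:
                best = (name, f, sse_new)
        if best is None or best[1] < f_to_enter:
            return selected
        selected.append(best[0])
        sse_current = best[2]


# Hypothetical drain concentrations and a cluster-mean lake series
drains = {
    "drain_A": [1, 2, 3, 4, 5, 6, 7, 8],
    "drain_B": [3, 1, 4, 1, 5, 9, 2, 6],
    "drain_C": [2, 7, 1, 8, 2, 8, 1, 8],
}
lake = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
print(forward_select(drains, lake))
```

A full stepwise procedure would additionally re-test already selected regressors for removal after each entry; that step is omitted here for brevity.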
The regression results for the three clusters showed good agreement with the measured data, with relatively acceptable error ranges (Figs. 5 and 6). Statistical measures such as R², adjusted R² and the F statistic supported the validity of all developed models; these measures are commonly used as indicators of generalizability. Table 7 presents the statistical validity limits of the BOD and TP models, i.e., the ranges of the measured regressors within which the models can be used for prediction (interpolation).
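The fit statistics named above can be computed from the residual and total sums of squares, and Table 7 implies a simple interpolation guard before a model is applied. The Python sketch below assumes the standard definitions R² = 1 - SSE/SST, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) and F = (R²/p)/((1 - R²)/(n - p - 1)) for n observations and p regressors. The limits dictionary copies Table 7; the function names and example inputs are hypothetical.

```python
# Validity ranges of the regressors, copied from Table 7
# (BOD in mg/l, TP in micrograms per litre).
VALIDITY_LIMITS = {
    "BOD": {
        "Drain 7": (3.60, 30.33),
        "Khashaa Drain": (3.00, 63.18),
        "Tira Drain": (2.80, 27.33),
        "Burullus Drain": (1.46, 25.50),
        "Drain 11": (1.94, 40.45),
    },
    "TP": {
        "Burullus West Drain": (205.94, 872.10),
        "Drain 11": (314.20, 1081.20),
    },
}


def fit_statistics(sse, sst, n, p):
    """Return (R2, adjusted R2, F) for an OLS model with an intercept.

    sse: residual sum of squares, sst: total sum of squares,
    n: number of observations, p: number of regressors (excluding intercept).
    """
    r2 = 1.0 - sse / sst
    df_resid = n - p - 1
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / df_resid
    f_stat = (r2 / p) / ((1.0 - r2) / df_resid)
    return r2, adj_r2, f_stat


def out_of_range(parameter, inputs):
    """List regressors whose input lies outside the Table 7 range (extrapolation).

    Raises KeyError for a regressor that is not part of that parameter's models.
    """
    limits = VALIDITY_LIMITS[parameter]
    return [name for name, value in inputs.items()
            if not limits[name][0] <= value <= limits[name][1]]


print(fit_statistics(sse=1.5, sst=2.0, n=3, p=1))
print(out_of_range("BOD", {"Burullus Drain": 12.0, "Drain 11": 50.0}))  # ['Drain 11']
```

A non-empty list signals that the requested prediction would extrapolate beyond the measured regressor ranges, which Table 7 advises against.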

Fig. 6. Measured and predicted values for the mean TP levels at Site Clusters I, II and III.

Table 7
Statistical validity limits for the BOD and TP models.

BOD (mg/l)
Regressor              Minimum   Maximum
Drain 7                   3.60     30.33
Khashaa Drain             3.00     63.18
Tira Drain                2.80     27.33
Burullus Drain            1.46     25.50
Drain 11                  1.94     40.45

TP (µg/l)
Regressor              Minimum   Maximum
Burullus West Drain     205.94    872.10
Drain 11                314.20   1081.20

5. Conclusions and recommendations

Modern irrigation, fish farming and large settlements around Lake Burullus have brought the area under intensive environmental
stress. Substantial discharges carrying high concentrations of fertilizers and pesticides enter the southern side of the Lake through a number of drains, leading to serious eutrophication and pollution. Therefore, sustainable utilization of the Lake requires integrating research and stakeholders into the decision-making process. This would ensure that decisions are based on scientific results and consider the interests of all stakeholders.
In this research, previous experience and measurements of Lake Burullus WQD were employed to derive empirical relationships (models) between canal/drain inflows (inputs) and lake water quality (output). Comparison of the predicted and measured data confirmed the accuracy of the developed models, supporting their use as tools to predict water quality in the Lake.
These models can be used in conjunction with water quality measurements to assess pollution control measures that are being, or will be, carried out to improve the Lake's environmental conditions. These measures may include improving agricultural practices, using environmentally friendly fish species and providing adequate sanitation services. Improved agricultural practices, with specific emphasis on reducing fertilizers and pesticides, can contribute greatly to improving lake water quality. The use of fish species with less negative environmental impact can minimize re-suspension and the consequent release of adsorbed nutrients to the Lake water. In addition, providing adequate sanitation services to residents of the surrounding urban/rural districts will significantly enhance the water quality of irrigation and drainage canals in the region. This can be achieved by setting a priority action plan that takes into account the relative importance of each drain/canal as a pollution source, as described by the regression-based formulas obtained in this paper. This will be directly reflected in improved water quality in the Lake and, ultimately, in the Mediterranean Sea, as well as in further sustaining fish farming in Lake Burullus.

References

[1] Shaltout Kamal H, Al-Sodany Yassin M. Vegetation analysis of Burullus Wetland: a RAMSAR site in Egypt. Wetlands Ecol Manage 2008;16(5):421–39.
[2] Chen Zhongyuan, Salem Alaa, Zhuang Xu, Zhang Weiguo. Ecological implications of heavy metal concentrations in the sediments of Burullus Lagoon of Nile Delta, Egypt. Estuar Coast Shelf Sci 2010;86:491–8.
[3] Okbah MA. Nitrogen and phosphorus species of Lake Burullus water (Egypt). Egypt J Aquat Res 2005;31(1).
[4] Ali EM. Impact of drain water on water quality and eutrophication status of Lake Burullus, Egypt, a southern Mediterranean lagoon. Afr J Aquat Sci 2011;36(3):267–77. https://doi.org/10.2989/16085914.2011.636897.
[5] Elshinnawy IA. Water budget estimate for environmental management, Al-Burullus Wetland, Egypt. In: The fourth international conference and exhibition on environmental technologies, Environment 2003, Ministry of State for Environmental Affairs, Cairo, Egypt, 30 September–2 October 2003.
[6] Yosef TA, Gomaa GM. Assessment of some heavy metal contents in fresh and salted (Feseakh) mullet fish collected from El-Burullus Lake, Egypt. J Am Sci 2011;7(10).
[7] Younis AM, Nafea EM. Impact of environmental conditions on the biodiversity of Mediterranean Sea Lagoon, Burullus Protected Area, Egypt. World Appl Sci J 2012;19(10):1423–30.
[8] El-Sammak A, El-Sabrouti MA. Organic carbon distribution and preservation in sediments of Lake Burullus, S.E. Mediterranean, Egypt. Fresenius Environ Bull 1995;4(8):457–62.
[9] Zaki HR, Tadros AB. Comparative studies on hydrographical parameters in Burullus Lake, Egypt. Global J Environ Resour 2009;3:223–33.
[10] Ahmed MH, Abdel-Moati MR, El-Bayomi G, Tawfiq M, El-Kafrawi S. Using geo-information and remote sensing data for environmental assessment of Burullus Lagoon, Egypt. Bull Nat Inst Oceanogr Fish 2001;27:133–55.
[11] Okbah MA, Hussein NR. Impact of environmental conditions on the phytoplankton structure in Mediterranean Sea lagoon, Lake Burullus, Egypt. Water Air Soil Pollut 2006;172:129–50.
[12] Hereher ME, Salem MI, Darwish DH. Mapping water quality of Burullus Lagoon using remote sensing and geographic information system. J Am Sci 2010;7(1).
[13] Farag H, El-Gamal A. Assessment of the eutrophic status of Lake Burullus (Egypt) using remote sensing. Int J Environ Sci Eng (IJESE) 2011;2:61–74.
[14] Assar W, Elshemy M, Zeidan BA. Water quality modeling for Lake Burullus, Egypt, Part I: model calibration. Mansoura Eng J 2015;40(2):2–9.
[15] El-Adawy A, Negm AM, Elzeir MA, Saavedra OC, El-Shinnawy IA, Nadaoka K. Coupled hydrodynamic-water quality model for pollution control scenarios in El-Burullus Lake (Nile Delta, Egypt). J Clean Energy Technol 2013;1(2):157–63.
[16] El-Adawy A, Negm AM, Saavedra OC, Nadaoka K, El-Shinnawy IA. Coupled hydrodynamic-water quality model for pollution control scenarios in El-Burullus Lake (Nile Delta, Egypt). Am J Environ Sci 2014;10(6):549–68.
[17] Adamowski J, Chan HF. A wavelet neural network conjunction model for groundwater level forecasting. J Hydrol 2011;407(1–4):28–40.
[18] Yoon H, Jun SC, Hyun Y, Bae GO, Lee KK. A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. J Hydrol 2011;396(1):128–38.
[19] Moosavi V, Vafakhah M, Shirmohammadi B, Behnia N. A wavelet-ANFIS hybrid model for groundwater level forecasting for different prediction periods. Water Resour Manage 2013;27(5):1301–21.
[20] Drainage Research Institute (DRI). Annual report. Drainage Research Institute, National Water Research Center (NWRC), Report No. 79; 2008.
[21] Roest CWJ. Manual on data collection, processing and presentation. Short Term Routine Measurement Programme, Reuse of Drainage Water Project. Wageningen, The Netherlands: Institute for Land and Water Management Research; 1983.
[22] APHA (American Public Health Association). Standard methods for the examination of water and wastewater. 19th ed. Washington, D.C.; 1995.
[23] Güler C, Thyne GD, McCray JE, Turner KA. Evaluation of graphical and multivariate statistical methods for classification of water chemistry data. Hydrogeol J 2002;10:455–74.
[24] Alther GA. A simplified statistical sequence applied to routine water quality analysis: a case history. Ground Water 1979;17(6):556–61.
[25] Farnham IM, Stetzenbach KJ, Singh AK, Johannesson KH. Deciphering groundwater flow systems in Oasis Valley, Nevada, using trace element chemistry, multivariate statistics, and geographical information system. Math Geol 2000;32(8):943–68.
[26] Egyptian Environmental Affairs Agency (EEAA). Annual report. Cairo: Ministry of State for Environmental Affairs; 2007.
[27] Brace N, Kemp R, Snelgar R. Cross-tabulation, chi-square, and nonparametric measures of association. Chapter 7 in: SPSS for psychologists. London: Psychology; 2003. p. 206–20.
[28] Weisberg S. Applied linear regression. New York: Wiley and Sons; 1980.
[29] Montgomery DC, Peck EA. Introduction to linear regression analysis. New York: Wiley and Sons; 1982.
[30] Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. New York: Springer Science+Business Media Inc; 2001. p. 533.
[31] Ryberg KR. Cluster analysis of water-quality data for Lake Sakakawea, Audubon Lake, and McClusky Canal, Central North Dakota, 1990–2003. Prepared in cooperation with the Bureau of Reclamation, U.S. Department of the Interior. Scientific Investigations Report 2006–5202. Reston, Virginia: U.S. Geological Survey; 2006. p. 47.
[32] Helsel DR, Hirsch RM. Statistical methods in water resources. New York, NY, USA: Elsevier Science Publishers Co.; 1992. p. 522.

Associate Prof. Mohamed Shaban M. Abusalama has over 20 years of working experience in environmental and water quality management. He gained this experience through his work at the National Water Research Center, Egypt, where he joined regional and international research projects. He also joined the environmental research team of the University of Lüneburg (Leuphana), Germany.

Associate Prof. Hanan Farag is an expert in environmental information systems design and analysis, with good experience in water quality and environmental GIS applications. She worked at ITC, University of Twente, for one year as a research staff member in the Water Resources Department, during which she gained good knowledge of remote sensing applications to ecological and lake water quality issues. Her overall experience focuses on environmental applications of geographic information systems and the water quality of water resources.
