Statistical analysis is a scientific method of compiling, analyzing and interpreting
collected experimental data in order to draw reasonable and valid conclusions. Data on
quantitative and qualitative characters are an essential component of experimental work
and require statistical analysis and interpretation before conclusions can be drawn.

5.1.1 Models

Models are powerful tools with which the designers of waste treatment systems can
investigate the performance of a number of potential options under a variety of
conditions. The aim of this part of the thesis work is to develop a mathematical model
for paper and pulp mill wastewater in combination with domestic wastewater, in
correlation with effective microorganisms, using regression analysis and artificial
neural network (ANN) analysis. The data obtained from the study for parameters such as
hydraulic retention time, organic loading rate, sludge loading rate, influent pH, VSS/TS
ratio, influent COD and effluent COD were considered as the explanatory variables, and
the percentage of COD removal and methane production were considered as the response
variables. Using these data, a multiple regression model and an ANN model were prepared.


5.2.1 Regression Analysis

Regression analysis is concerned with developing an approximating model. It is
a very useful and widely employed tool of data analysis. It leads to simple, yet often
powerful descriptions of the main features of the relationships among variables.

According to Blair (2010), regression is the measure of the average relationship
between two or more variables in terms of the original units of the data. Regression helps
to estimate one variable (the dependent variable) from the other variables (the
independent variables). In other words, the value of one variable can be estimated provided
the values of the other variables are given. The statistical method which helps to estimate
the unknown value of one variable from the known values of the related variables is called
regression.

5.2.2 Simple Linear Regression

According to Taro Yamane (1976), regression analysis is one of the most frequently
used techniques in economics and business research for finding a relation between two or
more variables that are related causally.

According to Wallis and Robert (2001), it is often more important to find out
what the relation actually is, in order to estimate or predict one variable (the dependent
variable), and the statistical technique appropriate in such a case is called regression.

According to Ya-Lun-Chow (1989), regression analysis is a statistical device with
the help of which the unknown values of one variable can be estimated or predicted from
the known values of another variable. In regression analysis, the independent variable is
also known as the regressor, predictor or explanator, and the dependent variable is known
as the regressed or explained variable.
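The estimation of one variable from another described above can be illustrated with a minimal sketch of simple linear regression. The data values and the function name `fit_simple_linear` are hypothetical and serve only for illustration; they are not taken from the study.

```python
def fit_simple_linear(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical observations: regressor x and explained variable y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_linear(x, y)

# Estimate the unknown value of y for a known value of x.
x_new = 6.0
y_estimate = intercept + slope * x_new
print(f"y = {intercept:.3f} + {slope:.3f} x, estimate at x = {x_new}: {y_estimate:.3f}")
```

The slope is the regression coefficient of y on x, and the fitted line passes through the point of the two means.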

Divya et al (2008) prepared a correlation analysis for various chemical parameters
such as pH, electrical conductivity, total hardness, calcium, magnesium, total alkalinity,
carbonate, bicarbonate, sodium, potassium, chloride, phosphate, fluoride and nitrate.
Significant linear relationships were obtained among some of the water quality parameters;
the correlation was highest between dissolved solids and electrical conductivity (0.9999)
and between total hardness and magnesium (1.000), which can be used for rapid monitoring
of water quality parameters. There was also a positive correlation of electrical
conductivity and total dissolved solids with chloride (0.909) and, to some extent, with
sodium.

Edokapayi and Clement Aghatise (2008) reported the physical and chemical
characteristics of five stations along a 5 km stretch of the Benin River, Southern Nigeria.
Multiple regression analysis was carried out for each station using conductivity,
dissolved oxygen, calcium, phosphate and nitrate-nitrogen as dependent variables.
Conductivity was significantly influenced by other environmental conditions at the study
stations (P < 0.0001, 0.0003). Dissolved oxygen was significantly influenced by other
environmental factors at stations I (P < 0.0007) and V (P < 0.002), and nitrate-nitrogen
was significantly influenced at stations N (P < 0.026) and V (P < 0.0007).

Venkatesh et al (2009) reported that the r value varies in the range of 0.0608 to
0.9969 depending on the set of parameters considered for analysis. Correlation values
above 0.94 were selected for analysis. The highest correlation was between EC and TDS.
High positive correlations between turbidity and TSS, BOD and COD, and EC and chlorides
were also observed.
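The kind of screening described in these studies, computing the correlation coefficient r for each pair of parameters and retaining the strongly correlated pairs, can be sketched as follows. The readings are hypothetical and are not the data of the cited authors; only the threshold of 0.94 is taken from the text.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical water quality readings (illustrative values only).
samples = {
    "EC": [410.0, 520.0, 610.0, 750.0, 880.0],
    "TDS": [262.0, 333.0, 390.0, 480.0, 563.0],
    "pH": [7.1, 6.8, 7.4, 6.9, 7.2],
}

threshold = 0.94  # selection level quoted in the text
strong = []
for name_a, name_b in combinations(samples, 2):
    r = pearson_r(samples[name_a], samples[name_b])
    if abs(r) > threshold:
        strong.append((name_a, name_b, r))

for name_a, name_b, r in strong:
    print(f"{name_a} vs {name_b}: r = {r:.4f}")
```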

The regression equation is an algebraic method. It is an algebraic expression of the
regression line. It can be classified into regression equation, regression coefficient,
individual observation and group distribution.

5.2.3 Multiple Linear Regression

In order to study the relationship among the selected parameters, a functional
relationship among them is established. ANOVA indicates the various parameters of the
regression and their relationship. The regression coefficients $\beta_0, \beta_1, \ldots,
\beta_p$ (parameters), as well as the variance of the errors, $\mathrm{Var}(\varepsilon_i)
= \sigma^2$, are usually unknown and have to be estimated from the observations
$(Y_i, X_{i1}, \ldots, X_{ip})$, $i = 1, 2, 3, \ldots, n$. As in the simple linear
regression model, the least squares criterion can be used to determine the estimates of
the regression coefficients.
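A minimal sketch of the least squares estimation is given below, assuming NumPy is available. The data are synthetic (generated from known coefficients plus noise) and the function name `fit_multiple_linear` is hypothetical; this is not the SPSS or EXCEL procedure used in the study.

```python
import numpy as np


def fit_multiple_linear(X, y):
    """Least-squares estimates of the coefficients and of the error variance.

    X has shape (n, p); the returned beta has p + 1 entries, intercept first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # column of ones for the intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    # Unbiased estimate of the error variance with n - p - 1 degrees of freedom.
    sigma2 = float(residuals @ residuals) / (n - p - 1)
    return beta, sigma2


# Synthetic example: y = 2 + 3*x1 - 1*x2 + noise (assumed values, for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(30, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=30)

beta, sigma2 = fit_multiple_linear(X, y)
print("estimated coefficients:", np.round(beta, 3))
print("estimated error variance:", round(sigma2, 4))
```

The estimated coefficients should lie close to the values used to generate the data, and the estimated error variance close to the square of the noise scale.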

5.2.4 Polynomial Regression

Polynomial regression is a form of linear regression in which the relationship
between the independent variable X and the dependent variable Y is modeled as an nth
order polynomial. Polynomial regression fits a non-linear relationship between the value
of X and the corresponding conditional mean of Y, denoted E(Y|X), and has been used to
describe non-linear phenomena.
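The sense in which polynomial regression remains a linear regression can be seen in the following sketch, assuming NumPy: the powers of X are treated as separate regressors and the coefficients are found by ordinary least squares. The third-order form and the data are assumptions chosen for illustration, and `fit_polynomial` is a hypothetical helper.

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Least-squares polynomial coefficients, lowest order first."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Columns are 1, x, x^2, ..., x^degree: the model is linear in the coefficients.
    design = np.vander(x, degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


# Synthetic data from a known cubic (illustrative values only).
x = np.linspace(0.0, 5.0, 12)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.1 * x**3

coeffs = fit_polynomial(x, y, degree=3)
print("b0, b1, b2, b3 =", np.round(coeffs, 4))
```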

5.2.5 Stepwise Regression

Stepwise regression first calibrates a prediction equation and then, using
statistical criteria, selects the variables which will be included in the final regression
equation. Stepwise regression is classified into backward stepwise regression and forward
stepwise regression.
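A rough sketch of forward stepwise selection is given below, assuming NumPy. The entry criterion used here, a partial F statistic compared with a fixed F-to-enter value of 4.0, is an assumption made for illustration; statistical packages apply their own entry and removal criteria. The data are synthetic and the function names are hypothetical.

```python
import numpy as np


def residual_ss(X, y, cols):
    """Residual sum of squares of the least-squares fit on the chosen columns."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def forward_stepwise(X, y, f_to_enter=4.0):
    """Add one variable at a time while the partial F statistic is large enough."""
    n, p = X.shape
    selected = []
    current = residual_ss(X, y, selected)  # intercept-only model
    while len(selected) < p:
        candidates = [c for c in range(p) if c not in selected]
        trials = {c: residual_ss(X, y, selected + [c]) for c in candidates}
        best = min(trials, key=trials.get)
        df_resid = n - len(selected) - 2  # n minus parameters after adding `best`
        f_stat = (current - trials[best]) / (trials[best] / df_resid)
        if f_stat < f_to_enter:
            break
        selected.append(best)
        current = trials[best]
    return selected


# Synthetic data: only columns 0 and 2 influence y (assumed, for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = 5.0 + 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.5, size=60)

selected = forward_stepwise(X, y)
print("variables entered, in order:", selected)
```

Backward stepwise regression proceeds in the opposite direction, starting from the full model and removing the least significant variable at each step.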


An Artificial Neural Network (ANN) is an information processing system that
has certain performance characteristics in common with biological neural networks. A
multilayer feed-forward ANN, trained using the error back propagation learning algorithm,
is employed for this purpose. The ANN is a flexible mathematical structure which is
capable of identifying complex nonlinear relationships between input and output data
sets. ANN models have been found useful and efficient, particularly in problems for
which the characteristics of the processes are difficult to describe using physical
equations (, 1993). Figure 5.1 shows the ANN model, comprising the various parameters
and the hidden layers.

ANNs have proved successful in solving many problems in civil and structural
engineering, wastewater treatment and rainfall-runoff modeling. The ANN structure is
designed to mimic the information processing functions of a network of neurons
similar to the brain (Guru Prasad, 2007).
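The principle of a multilayer feed-forward network trained by error back propagation can be sketched in a few lines, assuming NumPy. The network size, the tanh activation, the learning rate, the number of epochs and the synthetic data are all assumptions made for illustration and do not reproduce the architecture used in this study.

```python
import numpy as np


def train_mlp(X, y, hidden=6, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer network by gradient descent on the mean squared error."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: input -> hidden (tanh) -> linear output.
        h = np.tanh(X @ W1 + b1)
        out = h @ W2 + b2
        err = out - y
        losses.append(float(np.mean(err**2)))
        # Backward pass: propagate the error from the output to the input layer.
        grad_out = 2.0 * err / len(X)
        grad_W2 = h.T @ grad_out
        grad_b2 = grad_out.sum(axis=0)
        grad_pre = (grad_out @ W2.T) * (1.0 - h**2)  # derivative of tanh
        grad_W1 = X.T @ grad_pre
        grad_b1 = grad_pre.sum(axis=0)
        # Gradient descent update of weights and biases.
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2

    def predict(X_new):
        return np.tanh(X_new @ W1 + b1) @ W2 + b2

    return predict, losses


# Synthetic nonlinear relationship between two inputs and one output.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(80, 2))
y = (0.5 * np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2).reshape(-1, 1)

predict, losses = train_mlp(X, y)
print(f"mean squared error: first epoch {losses[0]:.4f}, last epoch {losses[-1]:.4f}")
```

In practice the data are usually divided into training, validation and test sets, and training is stopped when the validation error no longer improves.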

In this study, an ANN model was developed using important operational
parameters of the HUASB reactor, namely HRT (hydraulic retention time), OLR (organic
loading rate), influent pH, effluent pH, VFA/alkalinity ratio, influent COD and effluent
COD, as input parameters; the output parameters were the removal percentage of COD,
biogas production and methane content. Figures 5.2 to 5.5 show the steps of the analysis
for the HUB1 reactor: the architecture of layers and algorithms, the validation
performance, and the correlation (R2) between the estimated and the measured values.

Figure 5.1 ANN modelling diagram


Figure 5.2 ANN Architecture of Layers and Algorithms

Figure 5.3 Best validation performance curve for HUB1


Figure 5.4 Correlation between estimated value and measured value for HUB1

Figure 5.5 Predicted value and observed value for Regression Coefficient for HUB 1

Table 5.1 Consolidated statement for ANN models of all reactors

Reactor  Equation for % removal of TCOD,       Equation for methane production,       Remarks
         observed vs predicted, Model I (M1)   observed vs predicted, Model II (M2)
HUB1     Y = 1.00x - 0.014 (R2 = 0.996)        Y = 0.64x + 0.07 (R2 = 0.651)          Significant for M1
HUB2     Y = 0.98x + 1.904 (R2 = 0.994)        Y = 0.547x + 0.08 (R2 = 0.51)          Significant for M1
HUB3     Y = 0.99x + 0.374 (R2 = 0.990)        Y = 0.94x + 0.01 (R2 = 0.964)          Significant for M1
HUB4     Y = 1.09x - 0.565 (R2 = 0.987)        Y = 1.02x - 0.001 (R2 = 0.95)          Significant for M1

Table 5.1 shows that the models for % TCOD removal are efficient for all the
reactors, as the R2 values are above 0.9. HUB3 and HUB4, i.e. the PPW and DW
combinations with and without EM, are found to be efficient in both models of this ANN
study, with R2 values of about 0.99 for % TCOD removal and of 0.95 and above for
methane production.
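Equations of the form Y = ax + b with an R2 value, as listed in Table 5.1, can be obtained by regressing the predicted values on the observed values. A minimal sketch, assuming NumPy, is shown below; the observed and predicted values are hypothetical and the function name is an illustrative choice.

```python
import numpy as np


def compare_observed_predicted(observed, predicted):
    """Slope, intercept and R2 of the line predicted = slope * observed + intercept."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(observed, predicted, 1)
    # For a straight-line fit, R2 equals the squared correlation coefficient.
    r = np.corrcoef(observed, predicted)[0, 1]
    return float(slope), float(intercept), float(r**2)


# Hypothetical % TCOD removal: measured values and model estimates.
observed = [52.0, 61.5, 68.0, 74.2, 80.1, 84.6]
predicted = [52.8, 60.9, 68.5, 73.6, 80.9, 84.0]

slope, intercept, r2 = compare_observed_predicted(observed, predicted)
print(f"Y = {slope:.3f}x + {intercept:.3f} (R2 = {r2:.3f})")
```

A slope close to 1, an intercept close to 0 and an R2 close to 1 together indicate that the predicted values follow the observed values closely.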

5.4 Multiple Linear Regression Modeling Study

The HUASB reactor operations are divided into two stages:

(i) Start-up phase and after start-up conditions and optimization of HUASB
(ii) Efficiency of HUASB reactor under varying HRT.

The main aim of this study is to create a multiple linear regression model for the
dependent variables, COD removal (%) and methane production, on several independent
variables such as organic loading rate, influent pH, influent VFA and alkalinity. The
multiple regression equation for the dependent variable Y on the above independent
variables was fitted using the SPSS software through the forward stepwise regression
method.

A multiple regression equation for all the reactors, obtained using EXCEL software,
is furnished below. The forward stepwise regression method was followed to find out the
significance of the various factors (independent variables) for COD removal (dependent
variable).

Table 5.2 Regression Analysis HUB 1

**: Significant at 1% level.

Table 5.3 Regression Analysis HUB 2

**: Significant at 1% level.


Figure 5.6 Comparison of observed and predicted TCOD % removal (Y1) based on influent
TCOD (X1) HUB 1.

Figure 5.7 Comparison of observed and predicted TCOD removal (Y1) based on OLR
(X2) HUB 2

Table 5.4 Regression Analysis HUB 3

*: Significant at 5% level.

Table 5.5 Regression Analysis HUB 4

*: Significant at 5% level.

Figure 5.8 Comparison of observed and predicted methane production rate (Y2)
based on OLR (X2) HUB 3.

Figure 5.9 Comparison of observed and predicted methane production rate (Y2) based
on influent TCOD (X1) HUB 4.

The consolidated regression equations obtained from Tables 4.9 to 4.12 and from
Tables 5.2 to 5.5, with the R2 values of the models for each reactor, are tabulated in
Table 5.6 for ready reference.
Figures 5.6 to 5.9 depict the comparison of observed and predicted values
obtained from the regression model.

From Table 5.6, it could be assessed that the HUB 4 reactor, i.e. PPW and DWW
without EM, seems to be significant in both influent TCOD vs % TCOD removal and
OLR vs % TCOD removal, as the R2 values are above 0.92. Similarly, HUB 1 shows
consistent R2 values of about 0.89. For HUB 3, the models of % TCOD removal and methane
production with respect to influent TCOD have low R2 values (0.454 and 0.514), which
improve when OLR is used as the explanatory variable (0.883 and 0.897). HUB 2 fails to
maintain consistency, with a wide variation between the R2 values for TCOD removal
(0.963) and methane production rate (0.142).

By and large, the multiple regression analysis leads to the conclusion that the
treatment of pulp and paper mill wastewater is efficient when EM alone is added, as in
the HUB 1 reactor, and when it is treated with DWW without EM, as in HUB 4.

Table 5.6 Consolidated regression statement for all the reactors using regression method

Reactor  Equation                                                Remarks                                              R2
HUB 1    Y1 = 49.178 + 0.029 x1 - 6.3E-06 x1^2 + 3.34E-10 x1^3   Influent TCOD (X1) and TCOD removal efficiency (Y1)  0.875
HUB 1    Y2 = 0.155 + 5.05E-05 x1 - 1.3E-09 x1^2 - 1E-12 x1^3    Influent TCOD (X1) and methane production rate (Y2)  0.894
HUB 1    Y1 = 49.162 + 14.632 x2 - 1.588 x2^2 + 0.042 x2^3       OLR (X2) and TCOD removal efficiency (Y1)            0.875
HUB 1    Y2 = 0.155 + 0.025 x2                                   OLR (X2) and methane production rate (Y2)            0.894
HUB 2    Y1 = 12.921 + 0.028 x1 - 1.3E-06 x1^2 + 3.6E-10 x1^3    Influent TCOD (X1) and TCOD % removal (Y1)           0.963
HUB 2    Y2 = 0.538 - 0.000 x1 + 1.41E-07 x1^2 - 1.4E-11 x1^3    Influent TCOD (X1) and methane production rate (Y2)  0.142
HUB 2    Y1 = 12.872 + 14.324 x2 - 0.337 x2^2 - 0.044 x2^3       OLR (X2) and TCOD % removal (Y1)                     0.963
HUB 2    Y2 = 0.537 - 0.215 x2 + 0.035 x2^2 - 0.001 x2^3         OLR (X2) and methane production rate (Y2)            0.142
HUB 3    Y1 = 49.423 + 0.032 x1 - 1E-05 x1^2 + 9.43E-10 x1^3     Influent TCOD (X1) and TCOD removal efficiency (Y1)  0.454
HUB 3    Y2 = 0.033 - 6.2E-08 x1^2 + 5E-12 x1^3                  Influent TCOD (X1) and methane production rate (Y2)  0.514
HUB 3    Y1 = 42.049 + 15.400 x2 - 1.784 x2^2 + 0.056 x2^3       OLR (X2) and TCOD removal efficiency (Y1)            0.883
HUB 3    Y2 = 0.011 + 0.083 x2 - 0.008 x2^2                      OLR (X2) and methane production rate (Y2)            0.897
HUB 4    Y1 = 33.780 - 0.014 x1 + 1.26E-05 x1^2 - 1.7E-09 x1^3   Influent TCOD (X1) and TCOD % removal (Y1)           0.929
HUB 4    Y2 = 0.113 - 1.4E-05 x1 + 3.58E-08 x1^2 - 5.8E-12 x1^3  Influent TCOD (X1) and methane production rate (Y2)  0.943
HUB 4    Y1 = 33.776 - 6.031 x2 + 2.184 x2^2 - 0.124 x2^3        OLR (X2) and TCOD % removal (Y1)                     0.929
HUB 4    Y2 = 0.113 - 0.005 x2 + 0.006 x2^2                      OLR (X2) and methane production rate (Y2)            0.943
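
The equations in Table 5.6 can be used directly to estimate a response for a given value of the explanatory variable. The sketch below evaluates the HUB 1 equation for TCOD removal efficiency (Y1) on influent TCOD (X1), using the coefficients printed in the table. The influent TCOD value of 2000 is a hypothetical input chosen only for illustration, with units assumed to be those of the study data; the equation should not be applied outside the range of the observations.

```python
def evaluate_polynomial(coeffs, x):
    """Evaluate b0 + b1*x + b2*x^2 + ... using Horner's scheme (lowest order first)."""
    result = 0.0
    for b in reversed(coeffs):
        result = result * x + b
    return result


# Coefficients of the HUB 1 equation for Y1 on X1, as printed in Table 5.6.
hub1_y1_on_x1 = [49.178, 0.029, -6.3e-06, 3.34e-10]

# Hypothetical influent TCOD value (assumption, not a measurement from the study).
influent_tcod = 2000.0

tcod_removal = evaluate_polynomial(hub1_y1_on_x1, influent_tcod)
print(f"estimated TCOD removal efficiency: {tcod_removal:.2f} %")
```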