
SIMPLE LINEAR REGRESSION ANALYSIS

To be used in conjunction with the book:


Assessment of treatment plant performance and water quality data. A guide for students, researchers and practitioners
Marcos von Sperling, Matt Verbyla and Silvia Oliveira
IWA Publishing (2019)
Note: this spreadsheet is for didactic purposes only. The authors and IWA Publishing accept no responsibility for any problems with it, but would appreciate being informed about them.

Fill in the cells in yellow. The other cells are for calculations and results.

Data

Sample  Constituent X (mg/L)  Constituent Y (mg/L)
1 4.7 6.9
2 5.2 7.7
3 5.1 7.4
4 4.7 6.8
5 3.5 6.3
6 3.3 5.2
7 3.8 5.4
8 4 6
9 5.9 6.6
10 7.3 7.3
11 6.9 7.4
12 7.5 7.6
13 7.7 7.8
14 7.1 8.3
15 7.5 8.6
16 7.3 8.7
17 6.8 7.7
18 5.2 7
19 4.9 6.8
20 4.3 6.6
Computational table for the regression analysis (full calculations)

Uses data from tab 'Data'

Number of data (n): 20 (no missing data allowed)

Column sums and means (same column order as the table below):

        X (mg/L)  Y (mg/L)  xy      x²      y²       Ŷ      (x − x̄)²
Sum     112.7     142.1     823.65  677.79  1026.63  142.1  42.7255
Mean    5.635     7.105

Sample  X (mg/L)  Y (mg/L)  xy  x²  y²  Ŷ  (x − x̄)²
1 4.7 6.9 32.4 22.1 47.6 6.60 0.8742
2 5.2 7.7 40.0 27.0 59.3 6.87 0.1892
3 5.1 7.4 37.7 26.0 54.8 6.82 0.2862
4 4.7 6.8 32.0 22.1 46.2 6.60 0.8742
5 3.5 6.3 22.1 12.3 39.7 5.96 4.5582
6 3.3 5.2 17.2 10.9 27.0 5.85 5.4522
7 3.8 5.4 20.5 14.4 29.2 6.12 3.3672
8 4 6 24.0 16.0 36.0 6.23 2.6732
9 5.9 6.6 38.9 34.8 43.6 7.25 0.0702
10 7.3 7.3 53.3 53.3 53.3 8.00 2.7722
11 6.9 7.4 51.1 47.6 54.8 7.78 1.6002
12 7.5 7.6 57.0 56.3 57.8 8.11 3.4782
13 7.7 7.8 60.1 59.3 60.8 8.21 4.2642
14 7.1 8.3 58.9 50.4 68.9 7.89 2.1462
15 7.5 8.6 64.5 56.3 74.0 8.11 3.4782
16 7.3 8.7 63.5 53.3 75.7 8.00 2.7722
17 6.8 7.7 52.4 46.2 59.3 7.73 1.3572
18 5.2 7 36.4 27.0 49.0 6.87 0.1892
19 4.9 6.8 33.3 24.0 46.2 6.71 0.5402
20 4.3 6.6 28.4 18.5 43.6 6.39 1.7822
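
The column sums above are all that is needed to obtain the least-squares line. The following is a minimal Python sketch of that calculation, not the spreadsheet's own formulas; the variable names (`sq_x`, `s_xy`, `y_hat`) are illustrative choices, and the shortcut forms Σx² − (Σx)²/n and Σxy − ΣxΣy/n are assumed to be equivalent to what the spreadsheet computes.

```python
# Raw data from the 'Data' tab (mg/L).
x = [4.7, 5.2, 5.1, 4.7, 3.5, 3.3, 3.8, 4.0, 5.9, 7.3,
     6.9, 7.5, 7.7, 7.1, 7.5, 7.3, 6.8, 5.2, 4.9, 4.3]
y = [6.9, 7.7, 7.4, 6.8, 6.3, 5.2, 5.4, 6.0, 6.6, 7.3,
     7.4, 7.6, 7.8, 8.3, 8.6, 8.7, 7.7, 7.0, 6.8, 6.6]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
mean_x, mean_y = sum_x / n, sum_y / n

# Corrected sums: SQx = sum of (x - mean_x)^2, Sxy = sum of (x - mean_x)(y - mean_y).
sq_x = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares slope and intercept, then the fitted values (the Ŷ column).
slope = s_xy / sq_x
intercept = mean_y - slope * mean_x
y_hat = [intercept + slope * xi for xi in x]

print(f"b = {slope:.3f}, a = {intercept:.3f}, SQx = {sq_x:.4f}")
```

With the 20 samples listed, this should reproduce the slope (0.536), intercept (4.083) and SQx (42.7255) reported by the spreadsheet.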
Computational table (continued): deviations, residuals and standard error of the mean predicted Y

                    (y − ȳ)²  (Ŷ − ȳ)²      y − Ŷ         (y − Ŷ)²      e              e²
Sum                 17.0095   12.291628471  1.776357E-15  4.7178715287  1.7763568E-15  4.71787152871
St dev (residuals)                                                      0.50

(y − ȳ)²  (Ŷ − ȳ)²  y − Ŷ  (y − Ŷ)²  e  e²  SŶi
0.0420 0.2515 0.297 0.0879 0.297 0.0879 0.1359
0.3540 0.0544 0.828 0.6861 0.828 0.6861 0.1194
0.0870 0.0823 0.582 0.3387 0.582 0.3387 0.1219
0.0930 0.2515 0.197 0.0386 0.197 0.0386 0.1359
0.6480 1.3113 0.340 0.1157 0.340 0.1157 0.2027
3.6290 1.5685 -0.653 0.4259 -0.653 0.4259 0.2158
2.9070 0.9687 -0.721 0.5195 -0.721 0.5195 0.1837
1.2210 0.7691 -0.228 0.0520 -0.228 0.0520 0.1718
0.2550 0.0202 -0.647 0.4188 -0.647 0.4188 0.1163
0.0380 0.7975 -0.698 0.4873 -0.698 0.4873 0.1735
0.0870 0.4604 -0.384 0.1471 -0.384 0.1471 0.1514
0.2450 1.0006 -0.505 0.2554 -0.505 0.2554 0.1856
0.4830 1.2268 -0.413 0.1702 -0.413 0.1702 0.1982
1.4280 0.6174 0.409 0.1675 0.409 0.1675 0.1621
2.2350 1.0006 0.495 0.2447 0.495 0.2447 0.1856
2.5440 0.7975 0.702 0.4927 0.702 0.4927 0.1735
0.3540 0.3905 -0.030 0.0009 -0.030 0.0009 0.1464
0.0110 0.0544 0.128 0.0165 0.128 0.0165 0.1194
0.0930 0.1554 0.089 0.0080 0.089 0.0080 0.1281
0.2550 0.5127 0.211 0.0445 0.211 0.0445 0.1550
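
The sums of the squared-deviation columns split the total variation of Y into a part explained by the regression and a residual part. The sketch below illustrates that decomposition starting only from the column sums reported in the computational table; it is an illustrative check under that assumption, and the variable names are not taken from the spreadsheet.

```python
import math

# Column sums reported in the computational table (n = 20 samples).
n = 20
sum_x, sum_y = 112.7, 142.1
sum_xy, sum_x2, sum_y2 = 823.65, 677.79, 1026.63

sq_x = sum_x2 - sum_x ** 2 / n          # sum of (x - mean_x)^2
s_xy = sum_xy - sum_x * sum_y / n       # sum of (x - mean_x)(y - mean_y)

total_ss = sum_y2 - sum_y ** 2 / n      # sum of (y - mean_y)^2
regression_ss = s_xy ** 2 / sq_x        # sum of (y_hat - mean_y)^2
residual_ss = total_ss - regression_ss  # sum of (y - y_hat)^2

r2 = regression_ss / total_ss
s_yx = math.sqrt(residual_ss / (n - 2))  # standard error of the estimate

print(f"Total SS = {total_ss:.4f}, Regression SS = {regression_ss:.4f}, "
      f"Residual SS = {residual_ss:.4f}")
print(f"r2 = {r2:.3f}, Syx = {s_yx:.3f}")
```

The three sums of squares should agree with the column sums 17.0095, 12.2916 and 4.7179, and the sum of the residuals themselves (1.78E-15) is zero up to floating-point rounding.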
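Standard error of the mean predicted value and of a single predicted value:

$$S_{\hat{Y}_i} = S_{YX}\sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{SQ_X}} \qquad (S_{\hat{Y}_i})_1 = S_{YX}\sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SQ_X}}$$

Confidence interval for the mean predicted Y: $\hat{Y}_i \pm t_{\alpha/2,\,n-2}\, S_{\hat{Y}_i}$
Confidence interval for a single predicted Y (prediction interval): $\hat{Y}_i \pm t_{\alpha/2,\,n-2}\, (S_{\hat{Y}_i})_1$

Mean predicted Y     Single predicted Y
Lower CL  Upper CL   (SŶi)1  Lower PL  Upper PL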
6.32 6.89 0.5297 5.49 7.72
6.62 7.12 0.5257 5.77 7.98
6.56 7.07 0.5263 5.71 7.92
6.32 6.89 0.5297 5.49 7.72
5.53 6.39 0.5506 4.80 7.12
5.40 6.31 0.5556 4.69 7.02
5.73 6.51 0.5439 4.98 7.26
5.87 6.59 0.5400 5.09 7.36
7.00 7.49 0.5250 6.14 8.35
7.63 8.36 0.5406 6.86 9.13
7.47 8.10 0.5339 6.66 8.91
7.72 8.50 0.5446 6.96 9.25
7.80 8.63 0.5490 7.06 9.37
7.55 8.23 0.5370 6.76 9.02
7.72 8.50 0.5446 6.96 9.25
7.63 8.36 0.5406 6.86 9.13
7.42 8.04 0.5325 6.61 8.85
6.62 7.12 0.5257 5.77 7.98
6.44 6.98 0.5278 5.60 7.82
6.06 6.71 0.5349 5.27 7.51
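
As a rough illustration of how one row of the limits table can be reproduced, the sketch below applies the two standard-error expressions to a single X value. It assumes the rounded results shown in the spreadsheet (slope, intercept, Syx, SQx) and uses the tabulated critical value t = 2.101 as a constant instead of computing it; the function name `intervals` is hypothetical.

```python
import math

# Values taken from the spreadsheet results, rounded as displayed there.
n = 20
mean_x = 5.635
sq_x = 42.7255            # sum of (x - mean_x)^2
s_yx = 0.512              # standard error of the estimate
slope, intercept = 0.536, 4.083
t_crit = 2.101            # two-sided critical t, alpha = 0.05, n - 2 = 18 d.f.


def intervals(x_new):
    """Return (y_hat, confidence limits, prediction limits) at x_new."""
    y_hat = intercept + slope * x_new
    leverage = 1 / n + (x_new - mean_x) ** 2 / sq_x
    se_mean = s_yx * math.sqrt(leverage)        # standard error of the mean predicted Y
    se_single = s_yx * math.sqrt(1 + leverage)  # standard error of a single predicted Y
    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_single, y_hat + t_crit * se_single)
    return y_hat, ci, pi


y_hat, ci, pi = intervals(4.7)  # X value of sample 1
print(f"Y_hat = {y_hat:.2f}, CL = {ci[0]:.2f} to {ci[1]:.2f}, "
      f"PL = {pi[0]:.2f} to {pi[1]:.2f}")
```

Because the inputs are rounded, the limits can differ from the table in the last displayed digit. The prediction limits are always wider than the confidence limits because of the extra 1 under the square root.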
Calculations based on the computational table

Number of data (n): 20

Column      Sum           Mean
X (mg/L)    112.7         5.635
Y (mg/L)    142.1         7.105
xy          823.65
x²          677.79
y²          1026.63
Ŷ           142.1
(x − x̄)²    42.7255
(y − ȳ)²    17.0095
(Ŷ − ȳ)²    12.291628
y − Ŷ       1.776357E-15
(y − Ŷ)²    4.7178715
e           1.776357E-15
e²          4.7178715

Stand dev row: 0.498306159 (column e), 0 (column e²)

MAIN RESULTS

Number of data                      n    20
Slope                               b    0.536
Intercept                           a    4.083
Coefficient of determination (r²)   r²   0.723
Coefficient of correlation (r)      r    0.850

TESTING THE SIGNIFICANCE OF THE SLOPE

Significance level for the tests: Alpha = 0.05

Sum of squares SS
  Total SS         17.010
  Regression SS    12.292
  Residual SS      4.718

Mean squares MS
  Regression MS    12.292
  Residual MS      0.262

F test for slope b
  H0: b = 0
  Ha: b ≠ 0
  Fcalc        46.896
  Fcrit        4.413873419
  p-value      2.081152E-06
  Decision     Reject H0
  Conclusion   There is a significant relationship between the two variables

t test for slope b
  H0: b = 0
  Ha: b ≠ 0
  Syx (standard error of the estimate)   0.512
  SQx          42.726
  Sb           0.078
  tcalc        6.848
  tcrit        2.101
  p-value      2.081152E-06
  Decision     Reject H0
  Conclusion   There is a significant relationship between the two variables

Confidence intervals for slope
  Upper limit  0.701
  Lower limit  0.372
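
The test statistics for the slope follow from the sums of squares with a few lines of arithmetic. The sketch below is an illustrative recomputation, assuming the rounded values displayed in the results and taking the critical values (Fcrit, tcrit) as given constants; it does not compute p-values, which would need an F or t distribution function.

```python
import math

# Rounded values as displayed in the spreadsheet results.
n = 20
regression_ss, residual_ss = 12.292, 4.718
sq_x = 42.726
slope = 0.536
f_crit, t_crit = 4.414, 2.101   # alpha = 0.05; d.f. (1, 18) for F and 18 for t

# F test: ratio of mean squares (regression has 1 d.f., residual has n - 2).
regression_ms = regression_ss / 1
residual_ms = residual_ss / (n - 2)
f_calc = regression_ms / residual_ms

# t test: slope divided by its standard error Sb = Syx / sqrt(SQx).
s_yx = math.sqrt(residual_ms)
s_b = s_yx / math.sqrt(sq_x)
t_calc = slope / s_b

# Confidence interval for the slope.
lower, upper = slope - t_crit * s_b, slope + t_crit * s_b

print(f"Fcalc = {f_calc:.2f}, reject H0: {f_calc > f_crit}")
print(f"tcalc = {t_calc:.2f}, reject H0: {abs(t_calc) > t_crit}")
print(f"Slope limits: {lower:.3f} to {upper:.3f}")
```

In simple linear regression the two tests are equivalent: tcalc² equals Fcalc (6.848² ≈ 46.9), which is why both report the same p-value. Small differences from the spreadsheet in the last digit come from using rounded inputs.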
Main graphs for the regression analysis

The graph titles are in the yellow cells (which you can change).

[Figure: Scatter plot. X axis: Constituent X (mg/L); Y axis: Constituent Y (mg/L).]

[Figure: Scatter plot with line of best fit. X axis: Constituent X (mg/L); Y axis: Constituent Y (mg/L). Fitted line: f(x) = 0.536365870498882 x + 4.0825783197388; R² = 0.722633144494996.]

[Figure: Scatter plot, regression line and confidence and prediction limits. X axis: Constituent X (mg/L); Y axis: Constituent Y (mg/L). Legend: Data, Ŷ, Lower CL, Upper CL, Lower PL, Upper PL.]