LINEAR REGRESSION
K.R Musara
University of Zimbabwe
Department of Mathematics and Computational Science
May 3, 2021
K.R Musara (UZ) CHAPTER 3 GOODNESS OF FIT OF THE SIMPLE LINEAR REGRESSION
May 3, 2021 1 / 53
Introduction
If the model is fitted and no assumptions are violated, the next step is to
use the model to investigate the relationship between the independent and
the dependent variables as well as to make inference about the parameters.
Properties of the Estimators
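The expectations of $\hat{\beta}_0$ and $\hat{\beta}_1$ can be shown to be $\beta_0$ and $\beta_1$, respectively; that is, the least squares estimators of $\beta_0$ and $\beta_1$ are unbiased.
\[
\begin{aligned}
E[\hat{\beta}_1] &= E\left[\frac{S_{xy}}{S_{xx}}\right] \\
&= \frac{1}{S_{xx}} E[S_{xy}] \\
&= \frac{1}{S_{xx}} E\left[\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\right] \\
&= \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})(\beta_0 + \beta_1 x_i - \beta_0 - \beta_1 \bar{x}) \\
&= \frac{\beta_1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x}) \\
&= \frac{\beta_1 S_{xx}}{S_{xx}} = \beta_1
\end{aligned}
\]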
For the intercept (β0 ), we have,
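A standard argument for the intercept, sketched here on the assumptions that $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ (the form used in the worked examples of this chapter) and that $E[\varepsilon_i] = 0$, runs as follows:

```latex
\begin{aligned}
E[\hat{\beta}_0] &= E[\bar{y} - \hat{\beta}_1 \bar{x}] \\
&= E[\bar{y}] - \bar{x}\, E[\hat{\beta}_1] \\
&= (\beta_0 + \beta_1 \bar{x}) - \beta_1 \bar{x} \\
&= \beta_0
\end{aligned}
```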
The variance calculations that follow assume $Var(\varepsilon_i) = \sigma^2$ for every $i$. That is, the model error variance is constant for a fixed value of the regressor variable.
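The variance of $\hat{\beta}_1$ is given by:
\[
\begin{aligned}
Var(\hat{\beta}_1) &= Var\left(\frac{S_{xy}}{S_{xx}}\right) \\
&= Var\left[\frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\right] \\
&= \left(\frac{1}{S_{xx}}\right)^2 Var\left[\sum_{i=1}^{n} (x_i - \bar{x})\, y_i\right] \quad \text{since } \sum_{i=1}^{n} (x_i - \bar{x})\bar{y} = 0 \\
&= \left(\frac{1}{S_{xx}}\right)^2 \sum_{i=1}^{n} (x_i - \bar{x})^2\, Var(y_i) \quad \text{since the } y_i \text{ are independent} \\
&= \left(\frac{1}{S_{xx}}\right)^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 \sigma^2 \\
&= \frac{S_{xx}}{S_{xx}^2}\, \sigma^2 \\
&= \frac{\sigma^2}{S_{xx}}
\end{aligned}
\]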
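The two results for the slope (unbiasedness and variance $\sigma^2 / S_{xx}$) can be checked numerically. The sketch below is a small Monte Carlo experiment, not part of the original slides: the design points, the parameter values and the seed are all hypothetical choices made only for illustration.

```python
import random
import statistics


def slope_estimate(x, y):
    """Least squares slope, computed as S_xy / S_xx."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Hypothetical fixed design and true parameters (assumptions for the demo).
x = [float(i) for i in range(1, 11)]
beta0, beta1, sigma = 1.0, 2.0, 1.0
x_bar = statistics.fmean(x)
s_xx = sum((xi - x_bar) ** 2 for xi in x)

rng = random.Random(2021)  # fixed seed so the run is reproducible
slopes = []
for _ in range(5000):
    # Simulate y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ N(0, sigma^2).
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    slopes.append(slope_estimate(x, y))

mean_slope = statistics.fmean(slopes)
var_slope = statistics.variance(slopes)
theoretical_var = sigma ** 2 / s_xx

print(f"mean of slope estimates: {mean_slope:.4f} (true beta1 = {beta1})")
print(f"variance of slope estimates: {var_slope:.5f} "
      f"(sigma^2 / S_xx = {theoretical_var:.5f})")
```

With enough replications the average of the simulated slopes should sit close to $\beta_1$ and their sample variance close to $\sigma^2 / S_{xx}$.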
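The variance of $\hat{\beta}_0$ is given by:
\[
Var(\hat{\beta}_0) = \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)
\]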
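A short derivation, sketched under the assumptions that the $y_i$ are independent with common variance $\sigma^2$ and that $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, uses the fact that $Cov(\bar{y}, \hat{\beta}_1) = 0$:

```latex
\begin{aligned}
Var(\hat{\beta}_0) &= Var(\bar{y} - \hat{\beta}_1 \bar{x}) \\
&= Var(\bar{y}) + \bar{x}^2\, Var(\hat{\beta}_1) - 2\bar{x}\, Cov(\bar{y}, \hat{\beta}_1) \\
&= \frac{\sigma^2}{n} + \bar{x}^2\, \frac{\sigma^2}{S_{xx}} - 0 \\
&= \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)
\end{aligned}
```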
Exercise: Find the covariance between $\hat{y}$ and $\hat{\beta}_1$.
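The sampling distribution of $\hat{\beta}_0$ is given by
\[
\hat{\beta}_0 \sim N\left(\beta_0,\ \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\right),
\]
where $\sigma^2$ is the variance of the error term. (Because $\hat{\beta}_0$ is a linear combination of normal random variables, it must also be normal.)
The sampling distribution of $\hat{\beta}_1$ is given by
\[
\hat{\beta}_1 \sim N\left(\beta_1,\ \frac{\sigma^2}{S_{xx}}\right)
\]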
Now, using the properties of the sampling distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$,
inference about $\beta_0$ and $\beta_1$ can be made.
To estimate $\sigma^2$, we will use the residuals, $y_i - \hat{y}_i$, which are the observed
errors of fit.
Properties of $s^2$
1. $s^2 = MSE$
Making inference about parameters, β0 and β1
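$H_0: \beta_1 = 0$ (There is no linear relationship between Y and X)
$H_1: \beta_1 \neq 0$ (There is a linear relationship between Y and X)
\[
\begin{aligned}
t &= \frac{\hat{\beta}_1 - \beta_1}{\sqrt{s^2 / S_{xx}}} \\
&= \frac{\hat{\beta}_1}{\sqrt{s^2 / S_{xx}}} \\
&= \frac{\hat{\beta}_1 \sqrt{S_{xx}}}{s}
\end{aligned}
\]
Rejection Criteria
We reject $H_0$ if $|t| > t_{n-2}(\alpha/2)$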
Tests involving β0
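$H_0: \beta_0 = 0$
$H_1: \beta_0 \neq 0$
Test statistic:
\[
\begin{aligned}
t &= \frac{\hat{\beta}_0 - \beta_0}{\sqrt{s^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}} \\
&= \frac{\hat{\beta}_0}{\sqrt{s^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}}
\end{aligned}
\]
Rejection Criteria
We reject $H_0$ if $|t| > t_{n-2}(\alpha/2)$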
Confidence interval for β0 and β1
Example 3.1
Green Liquor Conc. ($y_i$)   Fitted conc. ($\hat{y}_i$)   Production ($x_i$)   Residual ($r_i$)
40                           40.7089                      825                  -0.7089
42                           41.0556                      830                   0.9444
49                           45.2170                      890                   3.7830
46                           45.5637                      895                   0.4363
44                           45.2170                      890                  -1.2170
48                           46.6041                      910                   1.3959
46                           46.9509                      915                  -0.9509
43                           50.0718                      960                  -7.0718
53                           52.1525                      990                   0.8475
52                           53.5396                      1010                 -1.5396
54                           53.6783                      1012                  0.3217
57                           54.9267                      1030                  2.0733
58                           56.3138                      1050                  1.6862
Solution to Example 3.1
\[
\begin{aligned}
\hat{\sigma}^2 = s^2 &= \frac{\sum (y_i - \hat{y}_i)^2}{n - 2} \\
&= \frac{\sum r_i^2}{n - 2} = \frac{80.5740}{13 - 2} = 7.3249
\end{aligned}
\]
\[
\begin{aligned}
S_{xx} &= \sum x_i^2 - n\bar{x}^2 \\
&= 11529419 - 13(939)^2 \\
&= 67046
\end{aligned}
\]
Now
\[
t = \frac{\hat{\beta}_1}{\sqrt{s^2 / S_{xx}}} = \frac{0.0694}{\sqrt{7.3249 / 67046}} = 6.6396
\]
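The arithmetic in this solution can be reproduced from the table of Example 3.1. The following is a minimal sketch that refits the line from the tabulated production and concentration values; the variable names are illustrative, and because it works with the unrounded slope the results should agree with the slide values only to about two decimal places.

```python
import math

# Data from the table in Example 3.1:
# x = production, y = green liquor concentration.
x = [825, 830, 890, 895, 890, 910, 915, 960, 990, 1010, 1012, 1030, 1050]
y = [40, 42, 49, 46, 44, 48, 46, 43, 53, 52, 54, 57, 58]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross products.
s_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar

# Least squares estimates.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residuals, error variance estimate s^2 = SSE / (n - 2), and t for H0: beta1 = 0.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(r ** 2 for r in residuals)
s2 = sse / (n - 2)
t_stat = b1 / math.sqrt(s2 / s_xx)

print(f"S_xx = {s_xx:.0f}, b1 = {b1:.4f}, b0 = {b0:.4f}")
print(f"SSE = {sse:.4f}, s^2 = {s2:.4f}, t = {t_stat:.4f}")
```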
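(b) The $(1 - \alpha)100\%$ confidence interval for $\beta_0$ is given by
\[
\left( \hat{\beta}_0 - t_{n-2}(\alpha/2) \sqrt{s^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)},\ \ \hat{\beta}_0 + t_{n-2}(\alpha/2) \sqrt{s^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)} \right)
\]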
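As an illustration of the interval formula, the sketch below applies it to the summary values of Example 3.1. Several inputs are assumptions rather than values stated on the slides: the critical value $t_{11}(0.025) = 2.201$ is read from a standard t table (a 95% interval), $\bar{y} = 632/13$ is computed from the data table, and $\hat{\beta}_0$ is obtained as $\bar{y} - \hat{\beta}_1 \bar{x}$ using the rounded slope 0.0694.

```python
import math


def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) = estimate -/+ t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


# Summary values quoted in the solution to Example 3.1.
n, x_bar, s_xx, s2, b1 = 13, 939.0, 67046.0, 7.3249, 0.0694

# Assumption: y_bar is computed from the data table (sum of y = 632).
y_bar = 632 / 13
# Assumption: intercept recovered from the rounded slope.
b0 = y_bar - b1 * x_bar
# Assumption: t_{11}(0.025) = 2.201, from a standard t table.
t_crit = 2.201

se_b0 = math.sqrt(s2 * (1 / n + x_bar ** 2 / s_xx))
se_b1 = math.sqrt(s2 / s_xx)

ci_b0 = confidence_interval(b0, se_b0, t_crit)
ci_b1 = confidence_interval(b1, se_b1, t_crit)

print(f"95% CI for beta0: ({ci_b0[0]:.4f}, {ci_b0[1]:.4f})")
print(f"95% CI for beta1: ({ci_b1[0]:.4f}, {ci_b1[1]:.4f})")
```

The same helper serves both parameters; only the standard error changes.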
ANALYSIS OF VARIANCE APPROACH TO SIMPLE
LINEAR REGRESSION
AIM
1. To estimate $\sigma^2$.
2. To measure the degree of linear relationship between X and Y in the
sample data.
Partitioning the total sum of squares
Error Sum of Squares
If all the yi values fall on the regression line, SSE will be zero. Thus, the
larger the SSE, the greater is the variation of yi observations around the
fitted regression line.
Regression Sum of Squares
Thus, for SLR, the decomposition of SST into two components is achieved
as follows:
\[
\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\]
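The decomposition can be verified numerically. The sketch below, an illustration rather than part of the original slides, fits the least squares line to the Example 3.1 data and computes the three sums of squares directly from their definitions.

```python
# Data from the table in Example 3.1:
# x = production, y = green liquor concentration.
x = [825, 830, 890, 895, 890, 910, 915, 960, 990, 1010, 1012, 1030, 1050]
y = [40, 42, 49, 46, 44, 48, 46, 43, 53, 52, 54, 57, 58]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# The three sums of squares, straight from their definitions.
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((fi - y_bar) ** 2 for fi in fitted)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

print(f"SST = {sst:.4f}")
print(f"SSR + SSE = {ssr:.4f} + {sse:.4f} = {ssr + sse:.4f}")
```

The identity holds exactly only for the least squares fit; for any other line the cross-product term does not vanish.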
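\[
\begin{aligned}
SSR &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
&= \frac{\left[\sum Y_i X_i - n\bar{Y}\bar{X}\right]^2}{\sum X_i^2 - n\bar{X}^2} \\
&= \frac{[S_{xy}]^2}{S_{xx}} \\
&= \hat{\beta}_1 S_{xy}
\end{aligned}
\]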
Partitioning degrees of freedom
SSR has one (1) degree of freedom: there are two parameters in the
regression function, but the deviations $\hat{y}_i - \bar{y}$ are subject to the constraint
$\sum_{i=1}^{n} (\hat{y}_i - \bar{y}) = 0$.
Mean Squares
If $\beta_1 = 0$, then $E[MSR] = \sigma^2$; in this case, both MSE and MSR have the
same expected value.
When $\beta_1 \neq 0$, the term $\beta_1^2 \sum (x_i - \bar{x})^2$ in $E[MSR] = \sigma^2 + \beta_1^2 \sum (x_i - \bar{x})^2$ is positive, so
$E[MSR] > E[MSE]$. Hence, if $\beta_1 \neq 0$, MSR will tend to be larger
than MSE.
NB: Under $H_0: \beta_1 = 0$, $SSE/\sigma^2$ and $SSR/\sigma^2$ are independent chi-square random variables, with
$n - 2$ and 1 degrees of freedom, respectively.
F-Ratio
\[
F = \frac{MSR}{MSE} \sim F(1, n - 2)
\]
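BASIC ANOVA TABLE
Source of Variation   SS    df      MS                  F
Regression            SSR   1       MSR = SSR/1         MSR/MSE
Error                 SSE   n - 2   MSE = SSE/(n - 2)
Total                 SST   n - 1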
From the ANOVA table, we can get the estimate of the variance, $s^2 = MSE$, and
test the hypothesis that there is a regression relationship. Under $H_0: \beta_1 = 0$, the ratio F in
the ANOVA table has Fisher's F distribution with 1 and $n - 2$ degrees of
freedom, provided the assumptions of the model hold.
Test statistic: $F = MSR / MSE$
Decision Rule
We reject $H_0$ if $F > F_{1,n-2}(1 - \alpha)$
Example 3.2
Solution
(a)
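\[
\begin{aligned}
S_{xy} &= \sum x_i y_i - n\bar{x}\bar{y} \\
&= 56940 - 5(30)(347.6) = 4800
\end{aligned}
\]
\[
\begin{aligned}
S_{xx} &= \sum x_i^2 - n\bar{x}^2 \\
&= 13300 - 5(30)^2 = 8800
\end{aligned}
\]
\[
\begin{aligned}
\hat{\beta}_1 &= \frac{S_{xy}}{S_{xx}} \\
&= \frac{4800}{8800} = 0.5455
\end{aligned}
\]
\[
\begin{aligned}
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} \\
&= 347.6 - 0.5455(30) = 331.2364
\end{aligned}
\]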
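\[
\begin{aligned}
SSR &= \sum (\hat{y}_i - \bar{y})^2 \\
&= \frac{S_{xy}^2}{S_{xx}} \\
&= \frac{4800^2}{8800} \\
&= 2618.1818
\end{aligned}
\]
\[
MSE = \frac{SSE}{n - 2} = \frac{63.0182}{3} = 21.0061,
\]
\[
MSR = \frac{SSR}{1} = 2618.1818
\]
\[
F = \frac{MSR}{MSE} = 124.6394
\]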
Table 2: ANOVA TABLE
Source of Variation   SS          df   MS          F
Regression            2618.1818   1    2618.1818   124.6394
Error                 63.0182     3    21.0061
Total                 2681.2      4
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
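The entries of Table 2 follow from three summary quantities. As a minimal sketch, the hypothetical helper `anova_slr` below (a name introduced here for illustration) rebuilds the table from $S_{xy}$, $S_{xx}$, SST and $n$ as quoted in the solution.

```python
def anova_slr(s_xy, s_xx, sst, n):
    """ANOVA quantities for simple linear regression from summary statistics."""
    ssr = s_xy ** 2 / s_xx      # regression sum of squares, 1 df
    sse = sst - ssr             # error sum of squares, n - 2 df
    msr = ssr / 1
    mse = sse / (n - 2)
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "MSR": msr, "MSE": mse, "F": msr / mse}


# Summary values from the solution to Example 3.2.
table = anova_slr(s_xy=4800, s_xx=8800, sst=2681.2, n=5)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

The computed F ratio is then compared with the tabulated $F_{1,n-2}$ critical value at the chosen significance level.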
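(c) Standard error of $\hat{\beta}_0 = \sqrt{Var(\hat{\beta}_0)}$
Standard error of $\hat{\beta}_1 = \sqrt{Var(\hat{\beta}_1)}$
\[
\begin{aligned}
Var(\hat{\beta}_0) &= \hat{\sigma}^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right) \\
&= 21.0061 \left(\frac{1}{5} + \frac{30^2}{8800}\right) = 6.349571136
\end{aligned}
\]
$\Rightarrow$ s.e.$(\hat{\beta}_0) = 2.5198$
\[
\begin{aligned}
Var(\hat{\beta}_1) &= \frac{\hat{\sigma}^2}{S_{xx}} \\
&= \frac{21.0061}{8800} = 0.002387056
\end{aligned}
\]
$\Rightarrow$ s.e.$(\hat{\beta}_1) = 0.0489$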
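The two standard errors can be reproduced in a few lines. This sketch uses only the summary values quoted in Example 3.2; the closing check that $t^2$ for the slope is close to the ANOVA F ratio is an added illustration of a standard identity for simple linear regression, and it holds here only up to the rounding of $\hat{\beta}_1$.

```python
import math

# Summary values from Example 3.2.
n, x_bar, s_xx = 5, 30.0, 8800.0
mse = 21.0061          # estimate of sigma^2 from the ANOVA table
b1 = 0.5455            # rounded slope estimate

var_b0 = mse * (1 / n + x_bar ** 2 / s_xx)
var_b1 = mse / s_xx
se_b0 = math.sqrt(var_b0)
se_b1 = math.sqrt(var_b1)

# t statistic for H0: beta1 = 0; its square should be close to F = 124.6394.
t_b1 = b1 / se_b1

print(f"Var(b0) = {var_b0:.6f}, s.e.(b0) = {se_b0:.4f}")
print(f"Var(b1) = {var_b1:.9f}, s.e.(b1) = {se_b1:.4f}")
print(f"t for slope = {t_b1:.4f}, t^2 = {t_b1 ** 2:.4f}")
```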
Coefficient of Determination ($R^2$)
Example 3.4
From the previous example,
\[
R^2 = \frac{SSR}{SST} = \frac{2618.1818}{2681.2} = 0.9765
\]
That is, the model accounts for 97.65% of the variability in the data.
Coefficient of Variation (CV)
The coefficient of variation (CV) measures the spread of noise (natural
dispersion) around the regression line. It is given by
\[
CV = \frac{s}{\bar{y}} \times 100\%
\]
The CV is scale-free, so it provides a better measure of spread
than $s = \sqrt{s^2}$. A small value of CV suggests a good fit, i.e. there is not
much noise around the line.
Example 3.3
Referring to the previous example:
\[
\begin{aligned}
CV &= \frac{\sqrt{21.0061}}{347.6} \times 100\% \\
&= 1.3185\%
\end{aligned}
\]
This is a small value, suggesting minimal variation about the regression line; that
is, our model is a good fit to the data.
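Both goodness-of-fit measures come straight from the ANOVA quantities. A minimal sketch, using the values quoted in Examples 3.2 to 3.4 (the variable names are illustrative):

```python
import math

# Values from the ANOVA table of Example 3.2 and the sample mean of y.
ssr, sst = 2618.1818, 2681.2
mse, y_bar = 21.0061, 347.6

# Coefficient of determination: share of total variation explained by the model.
r_squared = ssr / sst

# Coefficient of variation: residual standard deviation relative to the mean, in percent.
cv_percent = math.sqrt(mse) / y_bar * 100

print(f"R^2 = {r_squared:.4f}")
print(f"CV  = {cv_percent:.4f}%")
```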
The Lack of Fit test
The test involves partitioning the Error or Residual sum of squares into the
following components:
\[
SSE = SS_{PE} + SS_{LOF}
\]
where $SS_{PE}$ is the sum of squares attributable to pure error, and $SS_{LOF}$
is the sum of squares attributable to the lack of fit of the model.
Suppose we have n total observations such that
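To develop the partitioning of SSE, note that the (ij)th residual is
\[
y_{ij} - \hat{y}_i = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)
\]
where $\bar{y}_i$ is the average of the $n_i$ observations at $x_i$. Squaring both sides and summing over $i$ and $j$ (the cross-product term is zero) gives
\[
\sum_{i=1}^{m} \sum_{j=1}^{n_i} (y_{ij} - \hat{y}_i)^2 = \sum_{i=1}^{m} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{m} n_i (\bar{y}_i - \hat{y}_i)^2
\]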
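There are $n - m$ degrees of freedom associated with the pure error sum of
squares. The sum of squares for lack of fit is simply
$SS_{LOF} = SSE - SS_{PE}$, and it has $m - 2$ degrees of freedom. Under $H_0$, the test statistic is
\[
F^* = \frac{SS_{LOF} / (m - 2)}{SS_{PE} / (n - m)} = \frac{MS_{LOF}}{MS_{PE}} \sim F_{(m-2,\, n-m)}
\]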
This test procedure may be easily introduced into the analysis of variance
conducted for the significance of regression.
Example 3.5
The following data set gives the cost of maintenance of a tractor (Y) and
the age of that tractor (X)
(a) Fit a simple linear regression model to the data.
(b) Construct the ANOVA table and use the F test to test the
significance of the regression with α = 0.05.
(c) Test the significance of the regression constant (the intercept) using
α = 0.01.
(d) Test for lack of fit using α = 0.05.
Age X Cost Y
4.5 62
4.5 105
4.5 103
4.0 50
4.0 72
5.0 68
5.0 89
5.5 99
1.0 16
1.0 18
6.0 76
2.5 98
2.5 47
2.5 55
Solution
(a)
\[
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
\]
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
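(c) $H_0: \beta_0 = 0$
$H_1: \beta_0 \neq 0$
Test statistic: $t = \dfrac{\hat{\beta}_0 - \beta_0}{\text{s.e.}(\hat{\beta}_0)} \sim t(n - 2)$
Rejection Criteria
We reject $H_0$ if $|t| > t_{\alpha/2}(n - 2) = t_{0.005}(12) = 3.05$
Test statistic
\[
\begin{aligned}
t &= \frac{\hat{\beta}_0}{\text{s.e.}(\hat{\beta}_0)} \\
&= \frac{\hat{\beta}_0}{\sqrt{\hat{\sigma}^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}} \\
&= \frac{18.8885}{14.9878} = 1.2603
\end{aligned}
\]
Since $|t| < 3.05$, we fail to reject $H_0$ and conclude that the regression
constant is not significant.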
(d) $H_0$: The simple linear regression model is correct
($E(Y) = \beta_0 + \beta_1 X$).
$H_1$: The simple linear regression model is not correct
($E(Y) \neq \beta_0 + \beta_1 X$).
$x_j$   $y_{ij}$        $\bar{y}_{.j}$   $\sum_i (y_{ij} - \bar{y}_{.j})^2$   d.f.
4.5     62, 105, 103    90               1178                                 2
4.0     50, 72          61               242                                  1
5.0     68, 89          78.5             220.5                                1
5.5     99              99               0                                    0
1.0     16, 18          17               2                                    1
6.0     76              76               0                                    0
2.5     98, 47, 55      66.7             1504.6667                            2
\[
SS_{PE} = \sum_{j=1}^{7} \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_{.j})^2 = 3147.166667
\]
$m = 7$, therefore d.f. $= n - m = 14 - 7 = 7$
\[
MS_{PE} = \frac{SS_{PE}}{n - m} = 449.5952381
\]
\[
F = \frac{MS_{LOF}}{MS_{PE}} = 1.005971257
\]
Table 4: ANOVA TABLE
Source of Variation   Sum of Squares   df   Mean Square   F
Regression            5738.8625        1    5738.8625     12.7328
Error                 5408.5661        12   450.7138
  Lack of Fit         2261.3994        5    452.2799      1.0060
  Pure Error          3147.1667        7    449.5952
Total                 11147.4286       13
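We reject $H_0$ if $F > F_{0.05}(5, 7) = 3.97$.
Since $F = 1.0060 < 3.97$, we fail to reject $H_0$ and conclude that there is no significant lack of fit in the simple linear regression model.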
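The pure-error and lack-of-fit computations can be sketched as follows. The code groups the tractor data of Example 3.5 by age to obtain $SS_{PE}$ directly; the error sum of squares is taken as given from Table 4 rather than recomputed, and the variable names are illustrative.

```python
from collections import defaultdict

# Tractor data from Example 3.5: (age X, maintenance cost Y).
data = [(4.5, 62), (4.5, 105), (4.5, 103), (4.0, 50), (4.0, 72),
        (5.0, 68), (5.0, 89), (5.5, 99), (1.0, 16), (1.0, 18),
        (6.0, 76), (2.5, 98), (2.5, 47), (2.5, 55)]

# Assumption: SSE is taken from the Error row of Table 4, not refitted here.
sse = 5408.5661

# Group the responses by distinct age level.
groups = defaultdict(list)
for age, cost in data:
    groups[age].append(cost)

n = len(data)      # total number of observations
m = len(groups)    # number of distinct x levels

# Pure error: squared deviations of each response from its own group mean.
ss_pe = 0.0
for costs in groups.values():
    group_mean = sum(costs) / len(costs)
    ss_pe += sum((c - group_mean) ** 2 for c in costs)

ss_lof = sse - ss_pe
ms_pe = ss_pe / (n - m)
ms_lof = ss_lof / (m - 2)
f_star = ms_lof / ms_pe

print(f"SS_PE = {ss_pe:.4f} on {n - m} df, SS_LOF = {ss_lof:.4f} on {m - 2} df")
print(f"F* = {f_star:.4f}")
```

Levels with a single observation contribute nothing to $SS_{PE}$, which is why replicated x values are needed for this test.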
The End