
CHAPTER 3 GOODNESS OF FIT OF THE SIMPLE LINEAR REGRESSION

K.R Musara

University of Zimbabwe
Department of Mathematics and Computational Science

Regression Analysis and ANOVA


HASTS112

May 3, 2021

K.R Musara (UZ) CHAPTER 3 GOODNESS OF FIT OF THE SIMPLE LINEAR REGRESSION
May 3, 2021 1 / 53
Introduction

In the previous section, we discussed and derived the least squares estimates for the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.

If the model is fitted and no assumptions are violated, the next step is to
use the model to investigate the relationship between the independent and
the dependent variables as well as to make inference about the parameters.

Properties of the Estimators
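The expectations of $\hat{\beta}_0$ and $\hat{\beta}_1$ can be shown to be $\beta_0$ and $\beta_1$, respectively; that is, the least squares estimators of $\beta_0$ and $\beta_1$ are unbiased.

For the slope ($\beta_1$), treating the $x_i$ as fixed, we have

$$
\begin{aligned}
E[\hat{\beta}_1] &= E\left[\frac{S_{xy}}{S_{xx}}\right] \\
&= \frac{1}{S_{xx}} E[S_{xy}] \\
&= \frac{1}{S_{xx}} E\left[\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\right] \\
&= \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})(\beta_0 + \beta_1 x_i - \beta_0 - \beta_1 \bar{x}) \\
&= \frac{\beta_1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x}) \\
&= \frac{\beta_1 S_{xx}}{S_{xx}} = \beta_1
\end{aligned}
$$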
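For the intercept ($\beta_0$), we have

$$
\begin{aligned}
E[\hat{\beta}_0] &= E[\bar{y} - \hat{\beta}_1 \bar{x}] \\
&= E(\bar{y}) - \beta_1 \bar{x} \\
&= \frac{1}{n} \sum_{i=1}^{n} E(y_i) - \beta_1 \bar{x} \\
&= \frac{1}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 x_i) - \beta_1 \bar{x} \\
&= \beta_0 + \beta_1 \frac{1}{n} \sum_{i=1}^{n} x_i - \beta_1 \bar{x} \\
&= \beta_0
\end{aligned}
$$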
NB: $\mathrm{Var}(y_i) = \mathrm{Var}(\varepsilon_i) = \sigma^2$ (homogeneous variance assumption).

That is, the model error variance is the same at every value of the regressor variable.

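The variance of $\hat{\beta}_1$ is given by:

$$
\begin{aligned}
\mathrm{Var}(\hat{\beta}_1) &= \mathrm{Var}\left(\frac{S_{xy}}{S_{xx}}\right) \\
&= \mathrm{Var}\left[\frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\right] \\
&= \left(\frac{1}{S_{xx}}\right)^2 \mathrm{Var}\left[\sum_{i=1}^{n} (x_i - \bar{x})\, y_i\right] \quad \text{since } \sum_{i=1}^{n} (x_i - \bar{x})\,\bar{y} = 0 \\
&= \left(\frac{1}{S_{xx}}\right)^2 \sum_{i=1}^{n} (x_i - \bar{x})^2\, \mathrm{Var}(y_i) \quad \text{since the } y_i \text{ are independent} \\
&= \left(\frac{1}{S_{xx}}\right)^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 \sigma^2 \\
&= \frac{S_{xx}}{S_{xx}^2}\, \sigma^2 \\
&= \frac{\sigma^2}{S_{xx}}
\end{aligned}
$$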
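The variance of $\hat{\beta}_0$ is given by:

$$
\begin{aligned}
\mathrm{Var}(\hat{\beta}_0) &= \mathrm{Var}(\bar{y} - \hat{\beta}_1 \bar{x}) \\
&= \mathrm{Var}(\bar{y}) + \mathrm{Var}(\hat{\beta}_1 \bar{x}) - 2\,\mathrm{Cov}(\bar{y}, \hat{\beta}_1 \bar{x}) \\
&= \mathrm{Var}\left(\frac{1}{n} \sum_{i=1}^{n} y_i\right) + \bar{x}^2\, \mathrm{Var}(\hat{\beta}_1) \quad \text{since } \mathrm{Cov}(\bar{y}, \hat{\beta}_1) = 0 \\
&= \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(y_i) + \bar{x}^2\, \mathrm{Var}(\hat{\beta}_1) \\
&= \frac{1}{n^2} \sum_{i=1}^{n} \sigma^2 + \bar{x}^2 \frac{\sigma^2}{S_{xx}} \\
&= \frac{\sigma^2}{n} + \frac{\sigma^2 \bar{x}^2}{S_{xx}} \\
&= \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)
\end{aligned}
$$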

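Exercise: Find the covariance between $\hat{y}$ and $\hat{\beta}_1$.

The sampling distribution of $\hat{\beta}_0$ is given by $\hat{\beta}_0 \sim N\left(\beta_0,\ \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\right)$, where $\sigma^2$ is the variance of the error term. (Because $\hat{\beta}_0$ is a linear combination of normal random variables, it must also be normal.)

The sampling distribution of $\hat{\beta}_1$ is given by $\hat{\beta}_1 \sim N\left(\beta_1,\ \frac{\sigma^2}{S_{xx}}\right)$.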

Now, using the properties of the sampling distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$, inference about $\beta_0$ and $\beta_1$ can be made.

Before we infer on $\beta_0$ and $\beta_1$, we need to estimate $\sigma^2$.

To estimate $\sigma^2$, we will use the residuals, $r_i = y_i - \hat{y}_i$, which are the observed errors of fit.

It is reasonable to say that the sample variance of the residuals should provide an estimator of $\sigma^2$:

$$
s^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{1}{n-2} \sum_{i=1}^{n} r_i^2
$$

Properties of $s^2$

1. $s^2 = MSE$.
2. Under the regression model assumptions, $s^2$ is an unbiased estimator of $\sigma^2$.
3. The $n - 2$ in the denominator comes from the fact that we have $n$ pieces of information less the 2 estimates of $\beta_0$ and $\beta_1$.

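Making inference about parameters, $\beta_0$ and $\beta_1$

$H_0: \beta_1 = 0$ (There is no linear relationship between Y and X)
$H_1: \beta_1 \neq 0$ (There is a linear relationship between Y and X)

$\beta_1 = 0$ means that changes in X have no effect on Y.

Test statistic (the second equality holds under $H_0$):

$$
t = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{s^2 / S_{xx}}} = \frac{\hat{\beta}_1}{\sqrt{s^2 / S_{xx}}} = \frac{\hat{\beta}_1 \sqrt{S_{xx}}}{s}
$$

Rejection Criteria
We reject $H_0$ if $|t| > t_{n-2}(\alpha/2)$.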
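Tests involving $\beta_0$

$H_0: \beta_0 = 0$
$H_1: \beta_0 \neq 0$

Test statistic:

$$
t = \frac{\hat{\beta}_0 - \beta_0}{\sqrt{s^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}} = \frac{\hat{\beta}_0}{\sqrt{s^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}}
$$

Rejection Criteria
We reject $H_0$ if $|t| > t_{n-2}(\alpha/2)$.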

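Confidence interval for $\beta_0$ and $\beta_1$

1. The $(1 - \alpha)100\%$ confidence interval for $\beta_1$ is given by

$$
\left( \hat{\beta}_1 - t_{n-2}(\alpha/2) \sqrt{\frac{s^2}{S_{xx}}},\ \ \hat{\beta}_1 + t_{n-2}(\alpha/2) \sqrt{\frac{s^2}{S_{xx}}} \right)
$$

2. The $(1 - \alpha)100\%$ confidence interval for $\beta_0$ is given by

$$
\left( \hat{\beta}_0 - t_{n-2}(\alpha/2) \sqrt{s^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)},\ \ \hat{\beta}_0 + t_{n-2}(\alpha/2) \sqrt{s^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)} \right)
$$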

Example 3.1

The following table gives data on green liquor (Na$_2$S) concentration and paper machine production. The fitted linear regression model is $\hat{y}_i = -16.5093 + 0.0694 x_i$.

1. Test $H_0: \beta_1 = 0$, using the t-test at the 5% level of significance.
2. Find a 95% confidence interval for the intercept.

Green liquor conc. (y_i)   Fitted conc. (ŷ_i)   Production (x_i)   Residual (r_i)
40                         40.7089              825                -0.7089
42                         41.0556              830                 0.9444
49                         45.2170              890                 3.783
46                         45.5637              895                 0.4363
44                         45.2170              890                -1.217
48                         46.6041              910                 1.3959
46                         46.9509              915                -0.9509
43                         50.0718              960                -7.0718
53                         52.1525              990                 0.8475
52                         53.5396              1010               -1.5396
54                         53.6783              1012                0.3217
57                         54.9267              1030                2.0733
58                         56.3138              1050                1.6862

Solution to Example 3.1

(a) We need to estimate $\sigma^2$ and find $S_{xx}$ first.

$$
\hat{\sigma}^2 = s^2 = \frac{\sum (y_i - \hat{y}_i)^2}{n-2} = \frac{\sum r_i^2}{n-2} = \frac{80.5740}{13 - 2} = 7.3249
$$

$$
S_{xx} = \sum x_i^2 - n\bar{x}^2 = 11529419 - 13(939)^2 = 67046
$$

Now

$$
t = \frac{\hat{\beta}_1}{\sqrt{s^2 / S_{xx}}} = \frac{0.0694}{\sqrt{7.3249 / 67046}} = 6.6396
$$

Testing at the 5% level of significance, $t_{n-2}(\alpha/2) = t_{11}(0.025) = 2.20$.

Since $|t| > 2.20$, we reject $H_0$ and conclude that, at the 5% level of significance, the slope is significantly different from zero; that is, there is a linear relationship between X and Y.
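As a check on the arithmetic in part (a), here is a short Python sketch that recomputes $s^2$, $S_{xx}$ and $t$ from the residual and production columns of the table. The variable names are illustrative, and the critical value 2.20 is the tabulated figure quoted above rather than something computed by the code.

```python
import math

# Residual and production columns from the Example 3.1 table.
residuals = [-0.7089, 0.9444, 3.783, 0.4363, -1.217, 1.3959, -0.9509,
             -7.0718, 0.8475, -1.5396, 0.3217, 2.0733, 1.6862]
production = [825, 830, 890, 895, 890, 910, 915,
              960, 990, 1010, 1012, 1030, 1050]
slope_hat = 0.0694   # fitted slope quoted in the example
t_crit = 2.20        # tabulated t_11(0.025), taken from the text

n = len(production)
sse = sum(r ** 2 for r in residuals)              # sum of squared residuals
s2 = sse / (n - 2)                                # estimate of sigma^2
x_bar = sum(production) / n
s_xx = sum(x ** 2 for x in production) - n * x_bar ** 2
t_stat = slope_hat / math.sqrt(s2 / s_xx)         # t statistic under H0: beta1 = 0
reject_h0 = abs(t_stat) > t_crit

print(f"SSE = {sse:.4f}, s^2 = {s2:.4f}, Sxx = {s_xx:.0f}")
print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```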

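(b) The $(1 - \alpha)100\%$ confidence interval for $\beta_0$ is given by

$$
\left( \hat{\beta}_0 - t_{n-2}(\alpha/2) \sqrt{s^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)},\ \ \hat{\beta}_0 + t_{n-2}(\alpha/2) \sqrt{s^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)} \right)
$$

⇒ the 95% confidence interval for $\beta_0$ is

$$
\begin{aligned}
-16.5093 \pm 2.20 \sqrt{7.3249 \left(\frac{1}{13} + \frac{939^2}{67046}\right)} &= -16.5093 \pm 2.20 \times 9.8434 \\
&= -16.5093 \pm 21.6555 \\
&= (-38.1648,\ 5.1462)
\end{aligned}
$$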
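The interval can be reproduced with a few lines of Python. This is only a sketch using the summary values already quoted in the example ($s^2$, $n$, $\bar{x}$, $S_{xx}$ and the tabulated critical value 2.20); the variable names are illustrative.

```python
import math

beta0_hat = -16.5093   # fitted intercept from the example
s2 = 7.3249            # estimate of sigma^2 from part (a)
n = 13
x_bar = 939.0
s_xx = 67046.0
t_crit = 2.20          # tabulated t_11(0.025), taken from the text

# Standard error of the intercept and the resulting 95% interval.
se_beta0 = math.sqrt(s2 * (1 / n + x_bar ** 2 / s_xx))
half_width = t_crit * se_beta0
ci = (beta0_hat - half_width, beta0_hat + half_width)

print(f"s.e.(beta0_hat) = {se_beta0:.4f}")
print(f"95% CI for beta0: ({ci[0]:.4f}, {ci[1]:.4f})")
```

Because the interval contains zero, the same data would not reject $H_0: \beta_0 = 0$ at the 5% level.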

ANALYSIS OF VARIANCE APPROACH TO SIMPLE LINEAR REGRESSION

A method called the Analysis of Variance (ANOVA) can be used to test the significance of regression. ANOVA is a highly useful and flexible mode of analysis for regression models.

AIM
1. To estimate $\sigma^2$.
2. To measure the degree of linear relationship between X and Y in the sample data.

Partitioning the total sum of squares

The variability of the observations is measured by the sum of squared deviations of the observations about their mean. It is denoted by SST and given by

$$
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2
$$

SST = Total Sum of Squares

If there is a lot of variability in the $y_i$'s, then SST is large. If SST = 0, then all the $y_i$'s are the same.

Error Sum of Squares

The uncertainty associated with a prediction is related to the variability of the $y_i$ around the fitted regression line, as measured by the deviation $r_i = y_i - \hat{y}_i$ (variability not explained by the model).
If all the $y_i$ values fall on the regression line, all the deviations $r_i$ will be zero.
The conventional measure of variability around the fitted regression is the error sum of squares (SSE), also written $SS_{Residual}$, which is calculated as follows:

$$
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} r_i^2
$$

If all the $y_i$ values fall on the regression line, SSE will be zero. Thus, the larger the SSE, the greater is the variation of the $y_i$ observations around the fitted regression line.

Regression Sum of Squares

The reduction in variability achieved by using the knowledge of the independent variable $X_i$ is another sum of squares, known as the Regression Sum of Squares (SSR). It is defined as

$$
SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2
$$

(variability explained by the model)

SSR = SST − SSE

1. SSR can be viewed as a measure of the effect of the regression relation in reducing the variability of $y_i$.
2. If SSR = 0, then the regression calculation will not reduce variability at all.
3. SSR can be interpreted as the amount of variation in Y explained by the regression; as a proportion of the total, this is SSR/SST.

Thus, for SLR, the decomposition of SST into two components is achieved as follows:

$$
SST = SSR + SSE
$$

$$
\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

The computational formulas for the above are as follows:

$$
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2
$$

$$
SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = \frac{\left(\sum Y_i X_i - n\bar{Y}\bar{X}\right)^2}{\sum X_i^2 - n\bar{X}^2} = \frac{[S_{xy}]^2}{S_{xx}} = \hat{\beta}_1 S_{xy}
$$

and SSE = SST − SSR.
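To see the decomposition at work, the sketch below computes SST, SSR and SSE directly from their definitions and compares SSR with the shortcut $\hat{\beta}_1 S_{xy}$. The function name `anova_sums` and the small data set are hypothetical, made up purely for illustration.

```python
def anova_sums(x, y):
    """Return (SST, SSR, SSE, beta1_hat * Sxy) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    # Sums of squares computed straight from their definitions.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return sst, ssr, sse, b1 * s_xy

# Hypothetical data, not taken from the notes.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
sst, ssr, sse, ssr_shortcut = anova_sums(x, y)

print("SST      :", sst)
print("SSR + SSE:", ssr + sse)
print("SSR      :", ssr, " shortcut:", ssr_shortcut)
```

Up to floating point rounding, `sst` equals `ssr + sse`, and `ssr` equals the shortcut value.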

Partitioning degrees of freedom

SST has $n - 1$ degrees of freedom (d.f) associated with it. This is because SST has $n$ deviations, namely $y_i - \bar{y}$. However, there is one constraint on these deviations, namely $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$, so we lose one degree of freedom and remain with $n - 1$ degrees of freedom in the $n$ deviations.

SSE has $n - 2$ degrees of freedom, since two constraints are imposed on the $r_i$'s by the estimation of $\beta_0$ and $\beta_1$.

SSR has one (1) degree of freedom: there are two parameters in the regression function, but the deviations $\hat{y}_i - \bar{y}$ are subject to the constraint $\sum_{i=1}^{n} (\hat{y}_i - \bar{y}) = 0$.

Thus, the degrees of freedom are additive and given by

$$
(n - 1) = (1) + (n - 2)
$$

Mean Squares

A sum of squares divided by its degrees of freedom is called a mean square, e.g. $s^2 = MSE$.
The two important mean squares are the regression mean square, denoted MSR, and the error mean square, denoted MSE. Thus, $MSR = \frac{SSR}{1}$ and $MSE = \frac{SSE}{n-2} = s^2$.

Some properties of mean squares:

It can be shown that
1. $E[MSE] = \sigma^2$
2. $E[MSR] = \sigma^2 + \beta_1^2 \sum (x_i - \bar{x})^2$

If $\beta_1 = 0$, then $E[MSR] = \sigma^2$; in this case, both MSE and MSR have the same expected value.

When $\beta_1 \neq 0$, the term $\beta_1^2 \sum (x_i - \bar{x})^2$ will be positive and $E[MSR] > E[MSE]$. Hence, if $\beta_1 \neq 0$, MSR will tend to be larger than MSE.

NB: Under $H_0: \beta_1 = 0$, $\frac{SSE}{\sigma^2}$ and $\frac{SSR}{\sigma^2}$ are independent chi-square random variables, with $n - 2$ and 1 degrees of freedom, respectively.

F-Ratio

$$
F = \frac{MSR}{MSE} \sim F(1, n - 2) \quad \text{when } \beta_1 = 0
$$

BASIC ANOVA TABLE

It is useful to collect the sums of squares, degrees of freedom, mean squares and F-ratio in an ANOVA table for regression analysis. The table below gives the structure and the appearance of the basic ANOVA table.

Table 1: BASIC ANOVA TABLE

Source of Variation   Sum of Squares            df      Mean Square         F
Regression            SSR = Σ(ŷ_i − ȳ)²         1       MSR = SSR/1         F = MSR/MSE
Error                 SSE = Σ(y_i − ŷ_i)²       n − 2   MSE = SSE/(n − 2)
Total                 SST = Σ(y_i − ȳ)²         n − 1

From the ANOVA table, we can get the estimate of the variance, $s^2$, and test the hypothesis that there is a regression relationship. The ratio F in the ANOVA table has Fisher's F distribution with 1 and $n - 2$ degrees of freedom if the assumptions of the model hold and $\beta_1 = 0$.

If F is near 1, then MSR and MSE are approximately equal. A value of F much larger than 1 suggests that $\beta_1 \neq 0$.

Our hypotheses are as follows:

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Test statistic: F

Decision Rule
We reject $H_0$ if $F > F_{1,n-2}(1 - \alpha)$, the $(1 - \alpha)$ quantile of the $F(1, n-2)$ distribution.

Example 3.2

An investigator interested in the dependence of the speed of sound on temperature obtained the following measurements.

X, Temperature (°C)   Y, Speed (m/s)
-20                   323
0                     327
20                    340
50                    364
100                   384

The suggested model is $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, for $i = 1, 2, \ldots, 5$.

(a) Find the least squares estimates of $\beta_0$ and $\beta_1$.
(b) Construct the ANOVA table for this data set and hence test the hypothesis that the slope is zero. Use $\alpha = 0.01$.
(c) Find the standard errors of $\hat{\beta}_0$ and $\hat{\beta}_1$.

Solution
(a)

$$
S_{xy} = \sum x_i y_i - n\bar{x}\bar{y} = 56940 - 5(30)(347.6) = 4800
$$

$$
S_{xx} = \sum x_i^2 - n\bar{x}^2 = 13300 - 5(30)^2 = 8800
$$

$$
\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{4800}{8800} = 0.5455
$$

$$
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 347.6 - 0.5455(30) = 331.2364
$$

Therefore, $\hat{y}_i = 331.2364 + 0.5455 x_i$.


(b)

$$
SST = \sum (y_i - \bar{y})^2 = \sum y_i^2 - n\bar{y}^2 = 606810 - 5(347.6)^2 = 2681.2
$$

$$
SSR = \sum (\hat{y}_i - \bar{y})^2 = \frac{S_{xy}^2}{S_{xx}} = \frac{4800^2}{8800} = 2618.1818
$$

Therefore SSE = SST − SSR = 63.0182.

$MSE = \frac{SSE}{n-2} = 21.0061$,

$MSR = \frac{SSR}{1} = 2618.1818$,

$F = \frac{MSR}{MSE} = 124.6394$

Table 2: ANOVA TABLE

Source of Variation   SS          df   MS          F
Regression            2618.1818   1    2618.1818   124.6394
Error                 63.0182     3    21.0061
Total                 2681.2      4

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Test statistic: F = 124.6394

Rejection Criteria
We reject $H_0$ if $F > F_{1,3}(0.01) = 34.1$.

Since F > 34.1, we reject $H_0$ and conclude that, at $\alpha = 0.01$, we have sufficient evidence that the regression is significant.
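The ANOVA quantities for this example can be recomputed from the raw data with the computational formulas. The sketch below is illustrative only: the variable names are assumptions, and the critical value 34.1 is the tabulated figure from the solution, not something the code derives.

```python
# Speed-of-sound data from Example 3.2.
temperature = [-20, 0, 20, 50, 100]     # x, degrees Celsius
speed = [323, 327, 340, 364, 384]       # y, metres per second
f_crit = 34.1                           # tabulated F_{1,3}(0.01), from the text

n = len(temperature)
x_bar = sum(temperature) / n
y_bar = sum(speed) / n

# Computational (shortcut) formulas.
s_xy = sum(x * y for x, y in zip(temperature, speed)) - n * x_bar * y_bar
s_xx = sum(x ** 2 for x in temperature) - n * x_bar ** 2
sst = sum(y ** 2 for y in speed) - n * y_bar ** 2
ssr = s_xy ** 2 / s_xx
sse = sst - ssr

msr = ssr / 1            # regression mean square, 1 d.f.
mse = sse / (n - 2)      # error mean square, n - 2 d.f.
f_stat = msr / mse
reject_h0 = f_stat > f_crit

print(f"Sxy = {s_xy:.1f}, Sxx = {s_xx:.1f}")
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"MSE = {mse:.4f}, F = {f_stat:.4f}, reject H0: {reject_h0}")
```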

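(c) Standard error of $\hat{\beta}_0 = \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_0)}$

Standard error of $\hat{\beta}_1 = \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_1)}$

$$
\widehat{\mathrm{Var}}(\hat{\beta}_0) = \hat{\sigma}^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right) = 21.0061 \left(\frac{1}{5} + \frac{30^2}{8800}\right) = 6.349571136
$$

⇒ $s.e(\hat{\beta}_0) = 2.5198$

$$
\widehat{\mathrm{Var}}(\hat{\beta}_1) = \frac{\hat{\sigma}^2}{S_{xx}} = \frac{21.0061}{8800} = 0.002387056
$$

⇒ $s.e(\hat{\beta}_1) = 0.0489$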
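These two standard errors follow from plugging MSE into the variance formulas derived earlier. A minimal sketch, using the summary values quoted in the solution (variable names are illustrative):

```python
import math

mse = 21.0061     # estimate of sigma^2 from the ANOVA table
n = 5
x_bar = 30.0
s_xx = 8800.0

# Estimated variances of the intercept and slope estimators.
var_b0 = mse * (1 / n + x_bar ** 2 / s_xx)
var_b1 = mse / s_xx

se_b0 = math.sqrt(var_b0)
se_b1 = math.sqrt(var_b1)

print(f"Var(b0) = {var_b0:.6f}, s.e.(b0) = {se_b0:.4f}")
print(f"Var(b1) = {var_b1:.9f}, s.e.(b1) = {se_b1:.4f}")
```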

Coefficient of Determination (R2 )

The quantity $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$ is called the coefficient of determination and is often used to judge the adequacy of a regression line/model. In the case where X and Y are jointly distributed random variables, $R^2$ is the square of the correlation coefficient between X and Y.
$0 \leq R^2 \leq 1$
We often refer loosely to $R^2$ as the amount of variability in the data explained or accounted for by the regression model.

Example 3.4
From the previous example, $R^2 = \frac{SSR}{SST} = \frac{2618.1818}{2681.2} = 0.9765$.
That is, the model accounts for 97.65% of the variability in the data.

Coefficient of Variation (CV)
The coefficient of variation (CV) measures the spread of noise (natural dispersion) around the regression line. It is given by

$$
CV = \frac{s}{\bar{y}} \times 100\%
$$

The CV is scale free, so it provides a better measure of spread than $s = \sqrt{s^2}$. A small value of CV suggests a good fit, i.e. there is not much noise around the line.
Example 3.3
Referring to the previous example,

$$
CV = \frac{\sqrt{21.0061}}{347.6} \times 100\% = 1.3185\%
$$

This is a small value, suggesting minimal variation about the regression line; that is, our model is a good fit to the data.
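Both goodness-of-fit summaries for the speed-of-sound example can be obtained from the ANOVA table entries. The sketch below uses the values quoted in the solution; note that the CV uses $s = \sqrt{MSE}$, not MSE itself. Variable names are illustrative.

```python
import math

ssr = 2618.1818   # regression sum of squares from the ANOVA table
sst = 2681.2      # total sum of squares
mse = 21.0061     # error mean square, s^2
y_bar = 347.6     # mean speed

r_squared = ssr / sst                 # coefficient of determination
s = math.sqrt(mse)                    # residual standard deviation
cv_percent = s / y_bar * 100          # coefficient of variation, in percent

print(f"R^2 = {r_squared:.4f}")
print(f"s = {s:.4f}, CV = {cv_percent:.4f}%")
```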
The Lack of Fit test

Regression models are often fit to data as an approximating function when the true relationship between the variables Y and X is unknown.

Naturally, we would like to know whether the order of the model tentatively assumed is correct.

Specifically, the hypotheses we wish to test are:

$H_0$: The simple linear regression model is correct
$H_1$: The simple linear regression model is not correct

The test involves partitioning the error (residual) sum of squares into the following components:

$$
SSE = SS_{PE} + SS_{LOF}
$$

where $SS_{PE}$ is the sum of squares attributable to pure error, and $SS_{LOF}$ is the sum of squares attributable to the lack of fit of the model.

The test requires that there be replicates (replication) at one or more values of the predictor/explanatory variable (X).

Suppose we have $n$ total observations such that

$y_{11}, y_{12}, \ldots, y_{1n_1}$ are repeated observations at $X_1$,
$y_{21}, y_{22}, \ldots, y_{2n_2}$ are repeated observations at $X_2$,
$\vdots$
$y_{m1}, y_{m2}, \ldots, y_{mn_m}$ are repeated observations at $X_m$.

NB: There are $m$ distinct levels of X.

To develop the partitioning of SSE, note that the $(ij)$th residual is

$$
(y_{ij} - \hat{y}_i) = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)
$$

where $\bar{y}_i$ is the average of the $n_i$ observations at $X_i$. Squaring both sides and summing over $i$ and $j$ yields

$$
\sum_{i=1}^{m} \sum_{j=1}^{n_i} (y_{ij} - \hat{y}_i)^2 = \sum_{i=1}^{m} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{m} n_i (\bar{y}_i - \hat{y}_i)^2
$$

$$
SSE = SS_{PE} + SS_{LOF}
$$

since the cross-product term equals zero.

There are $n - m$ degrees of freedom associated with the pure error sum of squares. The sum of squares for lack of fit is simply $SS_{LOF} = SSE - SS_{PE}$, and it has $m - 2$ degrees of freedom.

The test statistic for lack of fit would then be

$$
F^* = \frac{SS_{LOF} / (m-2)}{SS_{PE} / (n-m)} = \frac{MS_{LOF}}{MS_{PE}}
$$

Under $H_0$, $F^* \sim F(m-2,\ n-m)$.

We would reject $H_0$ if $F^* > F_{\alpha}(m-2,\ n-m)$.

This test procedure may be easily introduced into the analysis of variance
conducted for the significance of regression.

If $H_0$ is rejected ⇒ the model must be abandoned and an attempt must be made to find a more appropriate model. If $H_0$ is not rejected ⇒ there is no apparent reason to doubt the adequacy of the model.

Example 3.5

The following data set gives the cost of maintenance of a tractor (Y) and
the age of that tractor (X)
(a) Fit a simple linear regression model to the data.
(b) Construct the ANOVA table and use the F test to test the
significance of the regression with α = 0.05.
(c) Test the significance of the regression constant (the intercept) using
α = 0.01.
(d) Test for lack of fit using α = 0.05.

Age X Cost Y
4.5 62
4.5 105
4.5 103
4.0 50
4.0 72
5.0 68
5.0 89
5.5 99
1.0 16
1.0 18
6.0 76
2.5 98
2.5 47
2.5 55

Solution
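(a)
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$

where $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ and $\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$.

$$
\begin{aligned}
\hat{\beta}_1 &= \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} \\
&= \frac{4003.9 - 14(3.7256)(68.4286)}{227.14 - 14(3.7256)^2} \\
&= \frac{431.9286}{32.5086} = 13.2866
\end{aligned}
$$

$$
\hat{\beta}_0 = 68.4286 - (13.2866)(3.7256) = 18.8885
$$

Therefore $\hat{y}_i = 18.8885 + 13.2866 x_i$.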
(b)

$$
SST = \sum y_i^2 - n\bar{y}^2 = 76702 - 14(68.4286)^2 = 11147.4286
$$

$$
SSR = \hat{\beta}_1 S_{xy} = 13.2866(431.9286) = 5738.8625
$$

$$
SSE = SST - SSR = 5408.5661
$$

d.f for SSR is 1
d.f for SSE are $n - 2 = 12$
d.f for SST are $n - 1 = 13$

$MSR = \frac{SSR}{1} = SSR = 5738.8625$
$MSE = \frac{SSE}{12} = 450.7138$
$F = \frac{MSR}{MSE} = 12.7328$
Table 3: ANOVA TABLE

Source of Variation   SS           DF   MS          F
Regression            5738.8625    1    5738.8625   12.7328
Error                 5408.5661    12   450.7138
Total                 11147.4286   13

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Test statistic: F = 12.7328

Testing at $\alpha = 0.05$, we reject $H_0$ if $F > F_{0.05}(1, 12) = 4.75$.

Since F > 4.75, we reject $H_0$ and conclude that the regression is significant.

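(c) $H_0: \beta_0 = 0$
$H_1: \beta_0 \neq 0$

Test statistic: $t = \frac{\hat{\beta}_0 - \beta_0}{s.e(\hat{\beta}_0)} \sim t(n - 2)$

Rejection Criteria
We reject $H_0$ if $|t| > t_{\alpha/2}(n - 2) = t_{0.005}(12) = 3.05$.

Test statistic

$$
t = \frac{\hat{\beta}_0}{s.e(\hat{\beta}_0)} = \frac{\hat{\beta}_0}{\sqrt{\hat{\sigma}^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}} = \frac{18.8885}{14.9878} = 1.2603
$$

Since $|t| < 3.05$, we fail to reject $H_0$ and conclude that the regression constant is not significant.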

(d) $H_0$: The simple linear regression model is correct ($E(Y) = \beta_0 + \beta_1 X$).
$H_1$: The simple linear regression model is not correct ($E(Y) \neq \beta_0 + \beta_1 X$).

x_i    y_ij            ȳ_.j    Σ(y_ij − ȳ_.j)²   d.f
4.5    62, 105, 103    90      1178              2
4.0    50, 72          61      242               1
5.0    68, 89          78.5    220.5             1
5.5    99              99      0                 0
1.0    16, 18          17      2                 1
6.0    76              76      0                 0
2.5    98, 47, 55      66.7    1504.6667         2

$$
SS_{PE} = \sum_{j=1}^{7} \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_{\cdot j})^2 = 3147.166667
$$

$m = 7$, therefore d.f $= n - m = 14 - 7 = 7$.

$SS_{LOF} = SSE - SS_{PE} = 2261.399433$, and its corresponding degrees of freedom are d.f $= m - 2 = 7 - 2 = 5$.

$MS_{PE} = \frac{SS_{PE}}{n - m} = 449.5952381$

$MS_{LOF} = \frac{SS_{LOF}}{m - 2} = 452.2798866$

$F = \frac{MS_{LOF}}{MS_{PE}} = 1.005971257$

Table 4: ANOVA TABLE

Source of Variation   Sum of Squares   df   Mean Square   F
Regression            5738.8625        1    5738.8625     12.7328
Error                 5408.5661        12   450.7138
  Lack of Fit         2261.3994        5    452.2799      1.0060
  Pure Error          3147.1667        7    449.5952
Total                 11147.4286       13

We reject $H_0$ if $F > F_{0.05}(5, 7) = 3.97$.

Since F < 3.97, we fail to reject $H_0$ and conclude that there is insufficient evidence to say that the simple linear regression model is not correct.
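The lack-of-fit calculation in part (d) can be sketched in Python as follows. The pure-error sum of squares is computed from the replicate groups in the table; SSE (5408.5661) and the critical value 3.97 are taken as given from the solution rather than recomputed, and the dictionary layout and variable names are illustrative assumptions.

```python
# Replicate groups from Example 3.5: age (x) -> maintenance costs (y) at that age.
groups = {
    4.5: [62, 105, 103],
    4.0: [50, 72],
    5.0: [68, 89],
    5.5: [99],
    1.0: [16, 18],
    6.0: [76],
    2.5: [98, 47, 55],
}
sse = 5408.5661   # error sum of squares quoted in the ANOVA table of part (b)
f_crit = 3.97     # tabulated F_0.05(5, 7), from the text

n = sum(len(ys) for ys in groups.values())   # total number of observations
m = len(groups)                              # number of distinct x levels

# Pure error: squared deviations of each observation from its own group mean.
sspe = 0.0
for ys in groups.values():
    group_mean = sum(ys) / len(ys)
    sspe += sum((y - group_mean) ** 2 for y in ys)

sslof = sse - sspe                # lack-of-fit sum of squares
mspe = sspe / (n - m)             # pure error mean square
mslof = sslof / (m - 2)           # lack-of-fit mean square
f_stat = mslof / mspe
reject_h0 = f_stat > f_crit

print(f"n = {n}, m = {m}")
print(f"SSPE = {sspe:.4f}, SSLOF = {sslof:.4f}")
print(f"MSPE = {mspe:.4f}, MSLOF = {mslof:.4f}, F = {f_stat:.4f}")
print("reject H0:", reject_h0)
```

Levels with a single observation (5.5 and 6.0 here) contribute nothing to the pure error, which is why the test needs replication at one or more values of X.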

The End
