Chapter 3_Presentation

CHAPTER 3 GOODNESS OF FIT OF THE SIMPLE
LINEAR REGRESSION
K.R Musara
University of Zimbabwe
Department of Mathematics and Computational Science
Regression Analysis and ANOVA

HASTS112
May 3, 2021
K.R Musara (UZ) CHAPTER 3 GOODNESS OF FIT OF THE SIMPLE LINEAR REGRESSION
May 3, 2021 1 / 53
Introduction
In the previous section, we discussed and derived the least squares estimates for the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.
If the model is fitted and no assumptions are violated, the next step is to use the model to investigate the relationship between the independent and dependent variables, and to make inferences about the parameters.
Properties of the Estimators
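The expectations of $\hat\beta_0$ and $\hat\beta_1$ can be shown to be $\beta_0$ and $\beta_1$, respectively; that is, the least squares estimators of $\beta_0$ and $\beta_1$ are unbiased.
For the slope ($\beta_1$), using $E(y_i - \bar{y}) = (\beta_0 + \beta_1 x_i) - (\beta_0 + \beta_1 \bar{x})$, we have
$$
\begin{aligned}
E[\hat\beta_1] &= E\left[\frac{S_{xy}}{S_{xx}}\right] = \frac{1}{S_{xx}} E[S_{xy}] \\
&= \frac{1}{S_{xx}} E\left[\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\right] \\
&= \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})(\beta_0 + \beta_1 x_i - \beta_0 - \beta_1 \bar{x}) \\
&= \frac{\beta_1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x}) \\
&= \frac{\beta_1 S_{xx}}{S_{xx}} = \beta_1
\end{aligned}
$$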
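For the intercept ($\beta_0$), we have
$$
\begin{aligned}
E[\hat\beta_0] &= E[\bar{y} - \hat\beta_1 \bar{x}] = E(\bar{y}) - \beta_1 \bar{x} \\
&= \frac{1}{n}\sum_{i=1}^{n} E(y_i) - \beta_1 \bar{x} \\
&= \frac{1}{n}\sum_{i=1}^{n} (\beta_0 + \beta_1 x_i) - \beta_1 \bar{x} \\
&= \beta_0 + \beta_1 \frac{1}{n}\sum_{i=1}^{n} x_i - \beta_1 \bar{x} \\
&= \beta_0
\end{aligned}
$$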
NB: $Var(y_i) = Var(\varepsilon_i) = \sigma^2$ (homogeneous variance assumption). That is, the model error variance is the same constant $\sigma^2$ at every value of the regressor variable.
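The variance of $\hat\beta_1$ is given by
$$
\begin{aligned}
Var(\hat\beta_1) &= Var\left(\frac{S_{xy}}{S_{xx}}\right) = \frac{1}{S_{xx}^2} Var\left[\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\right] \\
&= \frac{1}{S_{xx}^2} Var\left[\sum_{i=1}^{n} (x_i - \bar{x}) y_i\right] \\
&= \frac{1}{S_{xx}^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 Var(y_i) \\
&= \frac{1}{S_{xx}^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 \sigma^2 \\
&= \frac{S_{xx}}{S_{xx}^2}\sigma^2 = \frac{\sigma^2}{S_{xx}}
\end{aligned}
$$
The second line uses $\sum_{i=1}^{n} (x_i - \bar{x})\bar{y} = \bar{y}\sum_{i=1}^{n} (x_i - \bar{x}) = 0$, and the third uses the independence of the $y_i$.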
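The variance of $\hat\beta_0$ is given by
$$
\begin{aligned}
Var(\hat\beta_0) &= Var(\bar{y} - \hat\beta_1 \bar{x}) \\
&= Var(\bar{y}) + \bar{x}^2 Var(\hat\beta_1) - 2\bar{x}\,Cov(\bar{y}, \hat\beta_1) \\
&= Var\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) + \bar{x}^2 Var(\hat\beta_1) \\
&= \frac{1}{n^2}\sum_{i=1}^{n} Var(y_i) + \bar{x}^2 Var(\hat\beta_1) \\
&= \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 + \bar{x}^2 \frac{\sigma^2}{S_{xx}} \\
&= \frac{\sigma^2}{n} + \frac{\sigma^2 \bar{x}^2}{S_{xx}} = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)
\end{aligned}
$$
The covariance term drops out because $Cov(\bar{y}, \hat\beta_1) = 0$.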
Exercise: Find the covariance between $\hat{y}$ and $\hat\beta_1$.
The sampling distribution of $\hat\beta_0$ is $\hat\beta_0 \sim N\left(\beta_0, \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\right)$, where $\sigma^2$ is the variance of the error term. (Under normal errors, $\hat\beta_0$ is a linear combination of normal random variables, so it is also normal.)
The sampling distribution of $\hat\beta_1$ is $\hat\beta_1 \sim N\left(\beta_1, \frac{\sigma^2}{S_{xx}}\right)$.
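The unbiasedness and variance results above can be checked numerically. The sketch below is a Monte Carlo check, assuming normal errors; the true values $\beta_0 = 330$, $\beta_1 = 0.5$, $\sigma = 4$ are arbitrary illustrative choices, the x values are borrowed from Example 3.2, and `simulate_estimators` is a hypothetical helper rather than a library function.

```python
import numpy as np

def simulate_estimators(x, beta0, beta1, sigma, n_rep=20000, seed=0):
    """Simulate n_rep data sets y = beta0 + beta1*x + eps, eps ~ N(0, sigma^2),
    at fixed x, and return the least squares estimates for each data set."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)
    # Each row is one simulated sample of size n at the same fixed x values
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=(n_rep, x.size))
    # S_xy = sum (x_i - xbar) y_i, because sum (x_i - xbar) ybar = 0
    sxy = (y * (x - xbar)).sum(axis=1)
    b1 = sxy / sxx
    b0 = y.mean(axis=1) - b1 * xbar
    return b0, b1, sxx, xbar

x = [-20, 0, 20, 50, 100]
beta0, beta1, sigma = 330.0, 0.5, 4.0   # hypothetical "true" parameters
n = len(x)
b0, b1, sxx, xbar = simulate_estimators(x, beta0, beta1, sigma)

var_b1_theory = sigma**2 / sxx
var_b0_theory = sigma**2 * (1 / n + xbar**2 / sxx)
print(f"mean(b1) = {b1.mean():.4f}   beta1  = {beta1}")
print(f"var(b1)  = {b1.var():.6f} theory = {var_b1_theory:.6f}")
print(f"mean(b0) = {b0.mean():.4f} beta0  = {beta0}")
print(f"var(b0)  = {b0.var():.4f}   theory = {var_b0_theory:.4f}")
```

The simulated means should sit close to the true parameters and the simulated variances close to $\sigma^2/S_{xx}$ and $\sigma^2(1/n + \bar{x}^2/S_{xx})$, up to Monte Carlo error.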
Now, using the properties of the sampling distributions of $\hat\beta_0$ and $\hat\beta_1$, inferences about $\beta_0$ and $\beta_1$ can be made.
Before we make inferences about $\beta_0$ and $\beta_1$, we need to estimate $\sigma^2$.
To estimate $\sigma^2$, we use the residuals, $r_i = y_i - \hat{y}_i$, which are the observed errors of fit.
It is reasonable to expect the sample variance of the residuals to provide an estimator of $\sigma^2$:
$$
s^2 = \frac{1}{n-2}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{1}{n-2}\sum_{i=1}^{n} r_i^2
$$
Properties of $s^2$
1. $s^2 = MSE$.
2. Under the regression model assumptions, $s^2$ is an unbiased estimator of $\sigma^2$.
3. The $n - 2$ in the denominator comes from the fact that we have $n$ pieces of information (observations), less the 2 parameters $\beta_0$ and $\beta_1$ that must be estimated.
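A minimal sketch of the least squares fit together with $s^2 = SSE/(n-2)$, in plain Python; `fit_slr` is a hypothetical helper name, and the speed-of-sound data of Example 3.2 are used as a check.

```python
def fit_slr(x, y):
    """Least squares fit of y = b0 + b1*x; returns (b0, b1, s2) with s2 = SSE/(n-2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 observations")
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # r_i = y_i - yhat_i
    s2 = sum(r * r for r in residuals) / (n - 2)                 # divide by n - 2, not n
    return b0, b1, s2

b0, b1, s2 = fit_slr([-20, 0, 20, 50, 100], [323, 327, 340, 364, 384])
print(round(b0, 4), round(b1, 4), round(s2, 4))  # 331.2364 0.5455 21.0061
```

Dividing by $n - 2$ rather than $n$ is what makes $s^2$ unbiased for $\sigma^2$, as stated in property 2.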
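Making inferences about the parameters $\beta_0$ and $\beta_1$
$H_0: \beta_1 = 0$ (there is no linear relationship between Y and X)
$H_1: \beta_1 \neq 0$ (there is a linear relationship between Y and X)
$\beta_1 = 0$ means that changes in X have no linear effect on the mean of Y.
Test statistic (evaluated under $H_0$, where it follows a t distribution with $n - 2$ degrees of freedom):
$$
t = \frac{\hat\beta_1 - \beta_1}{\sqrt{s^2/S_{xx}}} = \frac{\hat\beta_1}{\sqrt{s^2/S_{xx}}} = \frac{\hat\beta_1\sqrt{S_{xx}}}{s}
$$
Rejection Criteria
We reject $H_0$ if $|t| > t_{n-2}(\alpha/2)$.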
Tests involving β0
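$H_0: \beta_0 = 0$
$H_1: \beta_0 \neq 0$
Test statistic (evaluated under $H_0$):
$$
t = \frac{\hat\beta_0 - \beta_0}{\sqrt{s^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}} = \frac{\hat\beta_0}{s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}}
$$
Rejection Criteria
We reject $H_0$ if $|t| > t_{n-2}(\alpha/2)$.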
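Both test statistics can be sketched as below, assuming SciPy is available for the critical value: `stats.t.ppf(1 - alpha/2, df=n - 2)` corresponds to $t_{n-2}(\alpha/2)$ in the notation above. `slr_t_tests` is a hypothetical helper, and the speed-of-sound data from Example 3.2 serve as a check.

```python
import math
from scipy import stats

def slr_t_tests(x, y, alpha=0.05):
    """t statistics for H0: beta1 = 0 and H0: beta0 = 0, plus the two-sided
    critical value t_{n-2}(alpha/2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    s = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    t_b1 = b1 * math.sqrt(sxx) / s                          # slope test
    t_b0 = b0 / (s * math.sqrt(1 / n + xbar ** 2 / sxx))    # intercept test
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)           # upper alpha/2 point
    return t_b1, t_b0, t_crit

temp = [-20, 0, 20, 50, 100]
speed = [323, 327, 340, 364, 384]
t_b1, t_b0, t_crit = slr_t_tests(temp, speed, alpha=0.05)
for name, t in (("beta1", t_b1), ("beta0", t_b0)):
    verdict = "reject H0" if abs(t) > t_crit else "do not reject H0"
    print(f"{name}: t = {t:.4f}, critical value = {t_crit:.4f} -> {verdict}")
```

In simple linear regression the square of the slope t statistic equals the ANOVA F ratio, so `t_b1 ** 2` reproduces the F value of Example 3.2.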
Confidence interval for β0 and β1
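1. The $(1-\alpha)100\%$ confidence interval for $\beta_1$ is given by
$$
\left(\hat\beta_1 - t_{n-2}(\alpha/2)\sqrt{\frac{s^2}{S_{xx}}},\ \hat\beta_1 + t_{n-2}(\alpha/2)\sqrt{\frac{s^2}{S_{xx}}}\right)
$$
2. The $(1-\alpha)100\%$ confidence interval for $\beta_0$ is given by
$$
\left(\hat\beta_0 - t_{n-2}(\alpha/2)\sqrt{s^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)},\ \hat\beta_0 + t_{n-2}(\alpha/2)\sqrt{s^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}\right)
$$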
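A sketch of both intervals, again assuming SciPy for the t quantile; `slr_conf_intervals` is a hypothetical name, and the Example 3.2 data are used for illustration.

```python
import math
from scipy import stats

def slr_conf_intervals(x, y, alpha=0.05):
    """(1 - alpha)100% confidence intervals for beta0 and beta1."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)            # t_{n-2}(alpha/2)
    half_b1 = t_crit * math.sqrt(s2 / sxx)
    half_b0 = t_crit * math.sqrt(s2 * (1 / n + xbar ** 2 / sxx))
    return (b0 - half_b0, b0 + half_b0), (b1 - half_b1, b1 + half_b1)

temp = [-20, 0, 20, 50, 100]
speed = [323, 327, 340, 364, 384]
ci_b0, ci_b1 = slr_conf_intervals(temp, speed)
print("95% CI for beta0:", ci_b0)
print("95% CI for beta1:", ci_b1)
```

Both intervals are symmetric about the point estimate, with half-width equal to the critical value times the estimated standard error.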
Example 3.1
The following table gives data on green liquor ($Na_2S$) concentration (y) and paper machine production (x). The fitted linear regression model is
$$\hat{y}_i = -16.5093 + 0.0694 x_i$$
(a) Test $H_0: \beta_1 = 0$ using the t-test at the 5% level of significance.
(b) Find a 95% confidence interval for the intercept.
Green liquor conc. ($y_i$) | Fitted conc. ($\hat{y}_i$) | Production ($x_i$) | Residual ($r_i$)
40 | 40.7089 | 825 | -0.7089
42 | 41.0556 | 830 | 0.9444
49 | 45.2170 | 890 | 3.7830
46 | 45.5637 | 895 | 0.4363
44 | 45.2170 | 890 | -1.2170
48 | 46.6041 | 910 | 1.3959
46 | 46.9509 | 915 | -0.9509
43 | 50.0718 | 960 | -7.0718
53 | 52.1525 | 990 | 0.8475
52 | 53.5396 | 1010 | -1.5396
54 | 53.6783 | 1012 | 0.3217
57 | 54.9267 | 1030 | 2.0733
58 | 56.3138 | 1050 | 1.6862
Solution to Example 3.1
(a) We first need to estimate $\sigma^2$ and find $S_{xx}$.
$$
\hat\sigma^2 = s^2 = \frac{\sum (y_i - \hat{y}_i)^2}{n-2} = \frac{\sum r_i^2}{n-2} = \frac{80.5740}{13-2} = 7.3249
$$
$$
S_{xx} = \sum x_i^2 - n\bar{x}^2 = 11529419 - 13(939)^2 = 67046
$$
Now
$$
t = \frac{\hat\beta_1}{\sqrt{s^2/S_{xx}}} = \frac{0.0694}{\sqrt{7.3249/67046}} = 6.6396
$$
Testing at the 5% level of significance, $t_{n-2}(\alpha/2) = t_{11}(0.025) = 2.20$.
Since $|t| > 2.20$, we reject $H_0$ and conclude that, at the 5% level of significance, the slope is significantly different from zero, i.e. there is a linear relationship between X and Y.
(b) The $(1-\alpha)100\%$ confidence interval for $\beta_0$ is given by
$$
\hat\beta_0 \pm t_{n-2}(\alpha/2)\sqrt{s^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}
$$
⇒ the 95% confidence interval for $\beta_0$ is
$$
\begin{aligned}
&-16.5093 \pm 2.20\sqrt{7.3249\left(\frac{1}{13} + \frac{939^2}{67046}\right)} \\
&= (-16.5093 \pm 2.20 \times 9.8434) \\
&= (-16.5093 \pm 21.6555) \\
&= (-38.1648,\ 5.1462)
\end{aligned}
$$
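The arithmetic of Example 3.1 can be reproduced from the table, taking production as x and concentration as y. This is a plain-Python sketch; the critical value 2.20 is the rounded table value used above. Because the code keeps the unrounded slope (about 0.06936) rather than 0.0694, its t statistic comes out near 6.64 rather than exactly 6.6396.

```python
import math

# Example 3.1 data: x = paper machine production, y = green liquor (Na2S) concentration
production = [825, 830, 890, 895, 890, 910, 915, 960, 990, 1010, 1012, 1030, 1050]
conc = [40, 42, 49, 46, 44, 48, 46, 43, 53, 52, 54, 57, 58]

n = len(production)
xbar = sum(production) / n
ybar = sum(conc) / n
sxx = sum(x * x for x in production) - n * xbar ** 2       # computational form
sxy = sum(x * y for x, y in zip(production, conc)) - n * xbar * ybar
b1 = sxy / sxx
b0 = ybar - b1 * xbar
sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(production, conc))
s2 = sse / (n - 2)

# (a) t statistic for H0: beta1 = 0
t_stat = b1 / math.sqrt(s2 / sxx)
# (b) 95% CI for beta0 using the tabulated t_11(0.025) = 2.20
t_crit = 2.20
half_width = t_crit * math.sqrt(s2 * (1 / n + xbar ** 2 / sxx))
print(f"Sxx = {sxx:.0f}, s^2 = {s2:.4f}, b0 = {b0:.4f}, b1 = {b1:.5f}")
print(f"t = {t_stat:.4f}")
print(f"95% CI for beta0: ({b0 - half_width:.4f}, {b0 + half_width:.4f})")
```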
ANALYSIS OF VARIANCE APPROACH TO SIMPLE
LINEAR REGRESSION
A method called the Analysis of Variance (ANOVA) can be used to test the significance of regression. ANOVA is a highly useful and flexible mode of analysis for regression models.
AIM
1. To estimate $\sigma^2$.
2. To measure the degree of linear relationship between X and Y in the sample data.
Partitioning the total sum of squares
The variability of the observations is measured by the sum of squared deviations of the observations about their mean, called the Total Sum of Squares and denoted SST:
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
If there is a lot of variability in the $y_i$'s, then SST is large. If SST = 0, all the $y_i$'s are equal.
Error Sum of Squares
The uncertainty associated with a prediction is related to the variability of the $y_i$ around the fitted regression line, as measured by the deviations $r_i = y_i - \hat{y}_i$ (variability not explained by the model).
If all the $y_i$ values fall on the regression line, all the deviations $r_i$ will be zero.
The conventional measure of variability around the fitted regression is the error sum of squares (SSE, or $SS_{Residual}$), which is calculated as follows:
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} r_i^2$$
If all the $y_i$ values fall on the regression line, SSE will be zero. Thus, the larger the SSE, the greater the variation of the $y_i$ observations around the fitted regression line.
Regression Sum of Squares
The reduction in variability obtained by using knowledge of the independent variable $X_i$ is measured by another sum of squares, the Regression Sum of Squares (SSR). It is defined as
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$
(variability explained by the model)
SSR = SST − SSE
1. SSR can be viewed as a measure of the effect of the regression relation in reducing the variability of $y_i$.
2. If SSR = 0, then the regression does not reduce the variability at all (the fitted line is horizontal, $\hat\beta_1 = 0$).
3. SSR is the amount of variation in Y explained by the regression; the ratio SSR/SST is the proportion of variation explained.
Thus, for SLR, the decomposition of SST into two components is achieved as follows:
$$
SST = SSR + SSE
$$
$$
\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$
The computational formulas for the above are as follows:
$$
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2
$$
$$
SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = \frac{\left(\sum X_i Y_i - n\bar{X}\bar{Y}\right)^2}{\sum X_i^2 - n\bar{X}^2} = \frac{S_{xy}^2}{S_{xx}} = \hat\beta_1 S_{xy}
$$
and SSE = SST − SSR.
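The decomposition and the computational shortcuts can be checked numerically. The sketch below computes SST, SSR and SSE from their definitions and compares them with the shortcut formulas; `sums_of_squares` is a hypothetical helper, and the speed-of-sound data of Example 3.2 are used as input.

```python
def sums_of_squares(x, y):
    """Return SST, SSR, SSE from their definitions, plus the computational shortcuts."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum(xi * xi for xi in x) - n * xbar ** 2
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * xi for xi in x]
    # Definitional forms
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sum((fi - ybar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    # Computational shortcuts from the formulas above
    shortcuts = {
        "SST": sum(yi * yi for yi in y) - n * ybar ** 2,
        "SSR (Sxy^2/Sxx)": sxy ** 2 / sxx,
        "SSR (b1*Sxy)": b1 * sxy,
    }
    return sst, ssr, sse, shortcuts

temp = [-20, 0, 20, 50, 100]
speed = [323, 327, 340, 364, 384]
sst, ssr, sse, shortcuts = sums_of_squares(temp, speed)
print(f"SST = {sst:.4f}, SSR + SSE = {ssr + sse:.4f}")
print(shortcuts)
```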
Partitioning degrees of freedom
SST has $n - 1$ degrees of freedom (d.f.) associated with it. This is because SST is built from $n$ deviations $y_i - \bar{y}$, but these deviations satisfy one constraint, $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$, so we lose one degree of freedom and are left with $n - 1$.
SSE has $n - 2$ degrees of freedom, since two constraints are imposed on the $r_i$'s when estimating $\beta_0$ and $\beta_1$ (namely $\sum r_i = 0$ and $\sum x_i r_i = 0$).
SSR has one (1) degree of freedom: there are two parameters in the regression function, but the deviations $\hat{y}_i - \bar{y}$ are subject to the constraint $\sum_{i=1}^{n} (\hat{y}_i - \bar{y}) = 0$.
Thus, the degrees of freedom are additive:
$$(n - 1) = (1) + (n - 2)$$
Mean Squares
A sum of squares divided by its degrees of freedom is called a mean square, e.g. $s^2 = MSE$.
The two important mean squares are the regression mean square, denoted MSR, and the error mean square, denoted MSE. Thus,
$$MSR = \frac{SSR}{1} \quad\text{and}\quad MSE = \frac{SSE}{n-2} = s^2$$
Some properties of mean squares:
It can be shown that
1. $E[MSE] = \sigma^2$
2. $E[MSR] = \sigma^2 + \beta_1^2 \sum (x_i - \bar{x})^2$
If $\beta_1 = 0$, then $E[MSR] = \sigma^2$; in this case both MSE and MSR have the same expected value.
When $\beta_1 \neq 0$, the term $\beta_1^2 \sum (x_i - \bar{x})^2$ is positive, so $E[MSR] > E[MSE]$. Hence, if $\beta_1 \neq 0$, MSR will tend to be larger than MSE.
NB: Under $H_0: \beta_1 = 0$, $SSE/\sigma^2$ and $SSR/\sigma^2$ are independent chi-square random variables with $n - 2$ and 1 degrees of freedom, respectively.
F-Ratio
$$F = \frac{MSR}{MSE} \sim F(1, n - 2) \quad \text{under } H_0$$
BASIC ANOVA TABLE
It is useful to collect the sums of squares, degrees of freedom, mean squares and F-ratio in an ANOVA table for regression analysis. The table below gives the structure of the basic ANOVA table.
Table 1: BASIC ANOVA TABLE
Source of Variation | Sum of Squares | df | Mean Square | F
Regression | $SSR = \sum (\hat{y}_i - \bar{y})^2$ | 1 | $MSR = SSR/1$ | $F = MSR/MSE$
Error | $SSE = \sum (y_i - \hat{y}_i)^2$ | $n - 2$ | $MSE = SSE/(n-2)$ |
Total | $SST = \sum (y_i - \bar{y})^2$ | $n - 1$ | |
From the ANOVA table, we can obtain the variance estimate $s^2 = MSE$ and test the hypothesis that there is a regression relationship. If the model assumptions hold and $\beta_1 = 0$, the ratio F in the ANOVA table has Fisher's F distribution with 1 and $n - 2$ degrees of freedom.
If F is near 1, then MSR and MSE are approximately equal. A value of F much larger than 1 suggests that $\beta_1 \neq 0$.
Our hypotheses are as follows:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic: F
Decision Rule
We reject $H_0$ if $F > F_{1,n-2}(1-\alpha)$.
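The ANOVA table and F test can be sketched as follows, assuming SciPy is available: `stats.f.ppf(1 - alpha, 1, n - 2)` gives the critical value used in the decision rule and `stats.f.sf` the upper-tail p-value. The helper `anova_f_test` is hypothetical, and the inputs are the sums of squares from Example 3.2.

```python
from scipy import stats

def anova_f_test(ssr, sse, n, alpha=0.05):
    """Regression ANOVA for simple linear regression: returns the table rows,
    the F ratio, the critical value and the p-value."""
    df_reg, df_err = 1, n - 2
    msr = ssr / df_reg
    mse = sse / df_err
    f_ratio = msr / mse
    f_crit = stats.f.ppf(1 - alpha, df_reg, df_err)   # upper-alpha point of F(1, n-2)
    p_value = stats.f.sf(f_ratio, df_reg, df_err)     # P(F(1, n-2) > f_ratio)
    rows = [
        ("Regression", ssr, df_reg, msr, f_ratio),
        ("Error", sse, df_err, mse, None),
        ("Total", ssr + sse, n - 1, None, None),
    ]
    return rows, f_ratio, f_crit, p_value

rows, f_ratio, f_crit, p_value = anova_f_test(2618.1818, 63.0182, n=5, alpha=0.01)
for source, ss, df, ms, f in rows:
    ms_txt = "" if ms is None else f"{ms:.4f}"
    f_txt = "" if f is None else f"{f:.4f}"
    print(f"{source:<11} {ss:>11.4f} {df:>3} {ms_txt:>10} {f_txt:>9}")
verdict = "reject H0" if f_ratio > f_crit else "do not reject H0"
print(f"{verdict} (F crit = {f_crit:.2f}, p = {p_value:.5f})")
```

Reporting the p-value alongside the critical value is optional; both lead to the same decision at level $\alpha$.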
Example 3.2
An investigator interested in the dependence of the speed of sound on temperature obtained the following measurements.
X, Temperature (°C) | Y, Speed (m/s)
-20 | 323
0 | 327
20 | 340
50 | 364
100 | 384
The suggested model is $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, for $i = 1, 2, \ldots, 5$.
(a) Find the least squares estimates of $\beta_0$ and $\beta_1$.
(b) Construct the ANOVA table for this data set and hence test the hypothesis that the slope is zero. Use $\alpha = 0.01$.
(c) Find the standard errors of $\hat\beta_0$ and $\hat\beta_1$.
Solution
(a)
$$
\begin{aligned}
S_{xy} &= \sum x_i y_i - n\bar{x}\bar{y} = 56940 - 5(30)(347.6) = 4800 \\
S_{xx} &= \sum x_i^2 - n\bar{x}^2 = 13300 - 5(30)^2 = 8800 \\
\hat\beta_1 &= \frac{S_{xy}}{S_{xx}} = \frac{4800}{8800} = 0.5455 \\
\hat\beta_0 &= \bar{y} - \hat\beta_1 \bar{x} = 347.6 - \frac{4800}{8800}(30) = 331.2364
\end{aligned}
$$
Therefore, $\hat{y}_i = 331.2364 + 0.5455 x_i$.
(b)
$$
SST = \sum (y_i - \bar{y})^2 = \sum y_i^2 - n\bar{y}^2 = 606810 - 5(347.6)^2 = 2681.2
$$
$$
SSR = \sum (\hat{y}_i - \bar{y})^2 = \frac{S_{xy}^2}{S_{xx}} = \frac{4800^2}{8800} = 2618.1818
$$
Therefore SSE = SST − SSR = 63.0182, and
$$
MSE = \frac{SSE}{n-2} = 21.0061, \qquad MSR = \frac{SSR}{1} = 2618.1818, \qquad F = \frac{MSR}{MSE} = 124.6394
$$
Table 2: ANOVA TABLE
Source of Variation SS df MS F
Regression 2618.1818 1 2618.1818 124.6394
Error 63.0182 3 21.0061
Total 2681.2 4
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic: F = 124.6394
Rejection Criteria
We reject $H_0$ if $F > F_{1,3}(0.01) = 34.1$.
Since F > 34.1, we reject $H_0$ and conclude that, at $\alpha = 0.01$, there is sufficient evidence that the slope is non-zero, i.e. the regression is significant.
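(c) Standard error of $\hat\beta_0 = \sqrt{Var(\hat\beta_0)}$; standard error of $\hat\beta_1 = \sqrt{Var(\hat\beta_1)}$.
$$
\widehat{Var}(\hat\beta_0) = \hat\sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right) = 21.0061\left(\frac{1}{5} + \frac{30^2}{8800}\right) = 6.349571136
$$
⇒ $s.e.(\hat\beta_0) = 2.5198$
$$
\widehat{Var}(\hat\beta_1) = \frac{\hat\sigma^2}{S_{xx}} = \frac{21.0061}{8800} = 0.002387056
$$
⇒ $s.e.(\hat\beta_1) = 0.0489$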
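For checking the hand calculations of Example 3.2, here is a plain-Python sketch (only `math` is needed) that follows the same computational formulas; the variable names are illustrative.

```python
import math

temp = [-20, 0, 20, 50, 100]        # X, temperature (deg C)
speed = [323, 327, 340, 364, 384]   # Y, speed of sound (m/s)
n = len(temp)
xbar, ybar = sum(temp) / n, sum(speed) / n

# (a) least squares estimates
sxy = sum(x * y for x, y in zip(temp, speed)) - n * xbar * ybar
sxx = sum(x * x for x in temp) - n * xbar ** 2
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# (b) ANOVA quantities
sst = sum(y * y for y in speed) - n * ybar ** 2
ssr = sxy ** 2 / sxx
sse = sst - ssr
mse = sse / (n - 2)
f = ssr / mse

# (c) standard errors, with MSE as the estimate of sigma^2
se_b0 = math.sqrt(mse * (1 / n + xbar ** 2 / sxx))
se_b1 = math.sqrt(mse / sxx)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}, MSE = {mse:.4f}, F = {f:.4f}")
print(f"se(b0) = {se_b0:.4f}, se(b1) = {se_b1:.4f}")
```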
Coefficient of Determination ($R^2$)
The quantity
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
is called the coefficient of determination and is often used to judge the adequacy of a regression line/model. In the case where X and Y are jointly distributed random variables, $R^2$ is the square of the sample correlation coefficient between X and Y.
$0 \le R^2 \le 1$
We often refer loosely to $R^2$ as the amount of variability in the data explained or accounted for by the regression model.
Example 3.4
From the previous example, $R^2 = \frac{SSR}{SST} = \frac{2618.1818}{2681.2} = 0.9765$.
That is, the model accounts for 97.65% of the variability in the data.
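A short sketch computing $R^2 = SSR/SST$ for the Example 3.2 data and, for comparison, the square of the sample correlation coefficient $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$; the helper name `r_squared` is hypothetical.

```python
import math

def r_squared(x, y):
    """R^2 = SSR/SST, returned together with the squared sample correlation."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)        # this is SST
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    ssr = sxy ** 2 / sxx
    r = sxy / math.sqrt(sxx * syy)                 # sample correlation coefficient
    return ssr / syy, r ** 2

r2, r_sq = r_squared([-20, 0, 20, 50, 100], [323, 327, 340, 364, 384])
print(f"R^2 = {r2:.4f}, r^2 = {r_sq:.4f}")
```

Algebraically, $r^2 = S_{xy}^2/(S_{xx}S_{yy}) = SSR/SST$, so the two numbers agree up to rounding.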
Coefficient of Variation (CV)
The coefficient of variation (CV) measures the spread of noise (natural dispersion) around the regression line. It is given by
$$CV = \frac{s}{\bar{y}} \times 100\%$$
The CV is scale free, so it provides a better measure of spread than $s = \sqrt{s^2}$. A small value of CV suggests a good fit, i.e. there is not much noise around the line.
Example 3.3
Referring to the previous example,
$$CV = \frac{\sqrt{21.0061}}{347.6} \times 100\% = 1.3185\%$$
This small value suggests minimal variation about the regression line; that is, the model fits the data well.
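A sketch of the CV computation for the Example 3.2 data, using $s^2 = SSE/(n-2)$ from the least squares fit; `coefficient_of_variation` is a hypothetical helper. The ratio is only meaningful when $\bar{y}$ is well away from zero, as it is here.

```python
import math

def coefficient_of_variation(x, y):
    """CV = s / ybar * 100, where s^2 = SSE / (n - 2) from the least squares fit."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return 100 * math.sqrt(s2) / ybar

cv = coefficient_of_variation([-20, 0, 20, 50, 100], [323, 327, 340, 364, 384])
print(f"CV = {cv:.4f}%")
```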
The Lack of Fit test
Regression models are often fitted to data as approximating functions when the true relationship between the variables Y and X is unknown.
Naturally, we would like to know whether the order of the model tentatively assumed is correct.
Specifically, the hypotheses we wish to test are:
$H_0$: The simple linear regression model is correct
$H_1$: The simple linear regression model is not correct
The test involves partitioning the error (residual) sum of squares into the following components:
$$SSE = SS_{PE} + SS_{LOF}$$
where $SS_{PE}$ is the sum of squares attributable to pure error, and $SS_{LOF}$ is the sum of squares attributable to the lack of fit of the model.
The test requires replicate observations at one or more values of the predictor/explanatory variable (X).
Suppose we have n total observations such that
$y_{11}, y_{12}, \ldots, y_{1n_1}$ are repeated observations at $X_1$,
$y_{21}, y_{22}, \ldots, y_{2n_2}$ are repeated observations at $X_2$,
⋮
$y_{m1}, y_{m2}, \ldots, y_{mn_m}$ are repeated observations at $X_m$.
NB: There are m distinct levels of X, and $n = \sum_{i=1}^{m} n_i$.
To develop the partitioning of SSE, note that the $(ij)$th residual is
$$y_{ij} - \hat{y}_i = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)$$
where $\bar{y}_i$ is the average of the $n_i$ observations at $X_i$. Squaring both sides and summing over i and j yields
$$
\sum_{i=1}^{m}\sum_{j=1}^{n_i} (y_{ij} - \hat{y}_i)^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{m} n_i(\bar{y}_i - \hat{y}_i)^2
$$
$$SSE = SS_{PE} + SS_{LOF}$$
since the cross-product term equals zero.
There are $n - m$ degrees of freedom associated with the pure error sum of squares. The sum of squares for lack of fit is simply $SS_{LOF} = SSE - SS_{PE}$, and it has $m - 2$ degrees of freedom.
The test statistic for lack of fit is then
$$
F^* = \frac{SS_{LOF}/(m-2)}{SS_{PE}/(n-m)} = \frac{MS_{LOF}}{MS_{PE}}, \qquad F^* \sim F(m-2, n-m) \text{ under } H_0
$$
We reject $H_0$ if $F^* > F_\alpha(m - 2, n - m)$.
This test procedure may easily be introduced into the analysis of variance conducted for the significance of regression.
If $H_0$ is rejected ⇒ the model must be abandoned and an attempt must be made to find a more appropriate model. If $H_0$ is not rejected ⇒ there is no apparent reason to doubt the adequacy of the model.
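The lack-of-fit partition can be sketched as a small function that groups the responses by distinct x level, computes $SS_{PE}$ from the within-group deviations, and obtains $SS_{LOF}$ by subtraction. This is a plain-Python sketch; `lack_of_fit` is a hypothetical helper, grouping assumes replicated x values are entered identically, and the six-point data set is made up purely to illustrate the arithmetic.

```python
from collections import defaultdict

def lack_of_fit(x, y):
    """Split SSE into pure error and lack of fit; returns (SSPE, SSLOF, m, F*).
    Requires replicated x values and more than two distinct x levels."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

    groups = defaultdict(list)          # responses grouped by distinct x level
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    m = len(groups)
    if m <= 2 or m >= n:
        raise ValueError("need more than 2 distinct x levels and at least one replicate")

    sspe = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values())
    sslof = sse - sspe
    f_star = (sslof / (m - 2)) / (sspe / (n - m))
    return sspe, sslof, m, f_star

# Small hypothetical data set with two replicates at each of three x levels
x = [1, 1, 2, 2, 3, 3]
y = [1, 3, 2, 4, 6, 8]
sspe, sslof, m, f_star = lack_of_fit(x, y)
print(sspe, sslof, m, f_star)  # 6.0 3.0 3 1.5
```

Here the fitted line is $\hat{y} = -1 + 2.5x$, so SSE = 9 splits into $SS_{PE} = 6$ on 3 d.f. and $SS_{LOF} = 3$ on 1 d.f.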
Example 3.5
The following data set gives the cost of maintenance of a tractor (Y) and
the age of that tractor (X)
(a) Fit a simple linear regression model to the data.
(b) Construct the ANOVA table and use the F test to test the
significance of the regression with α = 0.05.
(c) Test the significance of the regression constant (the intercept) using
α = 0.01.
(d) Test for lack of fit using α = 0.05.
Age X Cost Y
4.5 62
4.5 105
4.5 103
4.0 50
4.0 72
5.0 68
5.0 89
5.5 99
1.0 16
1.0 18
6.0 76
2.5 98
2.5 47
2.5 55
Solution
(a)
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$$
where $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$ and $\hat\beta_1 = S_{xy}/S_{xx}$. From the table, $n = 14$, $\sum x_i = 52.5$ (so $\bar{x} = 3.75$), $\sum y_i = 958$ (so $\bar{y} = 68.4286$), $\sum x_i^2 = 229.75$, $\sum x_i y_i = 4022.5$ and $\sum y_i^2 = 76702$. Hence
$$
\hat\beta_1 = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \frac{4022.5 - (52.5)(958)/14}{229.75 - 14(3.75)^2} = \frac{4022.5 - 3592.5}{229.75 - 196.875} = \frac{430}{32.875} = 13.0798
$$
$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = \frac{958}{14} - \frac{430}{32.875}(3.75) = 19.3791$$
Therefore $\hat{y}_i = 19.3791 + 13.0798 x_i$.
(b)
$$
\begin{aligned}
SST &= \sum y_i^2 - n\bar{y}^2 = 76702 - \frac{958^2}{14} = 11147.4286 \\
SSR &= \hat\beta_1 S_{xy} = \frac{430^2}{32.875} = 5624.3346 \\
SSE &= SST - SSR = 5523.0940
\end{aligned}
$$
d.f. for SSR is 1
d.f. for SSE is $n - 2 = 12$
d.f. for SST is $n - 1 = 13$
$$MSR = \frac{SSR}{1} = 5624.3346, \qquad MSE = \frac{SSE}{12} = 460.2578, \qquad F = \frac{MSR}{MSE} = 12.2200$$
ANOVA TABLE
Source of Variation | SS | DF | MS | F
Regression | 5624.3346 | 1 | 5624.3346 | 12.2200
Error | 5523.0940 | 12 | 460.2578 |
Total | 11147.4286 | 13 | |
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic: F = 12.2200
Testing at $\alpha = 0.05$, we reject $H_0$ if $F > F_{0.05}(1, 12) = 4.75$.
Since F > 4.75, we reject $H_0$ and conclude that the regression is significant.
(c) $H_0: \beta_0 = 0$
$H_1: \beta_0 \neq 0$
Test statistic: $t = \frac{\hat\beta_0 - \beta_0}{s.e.(\hat\beta_0)} \sim t(n-2)$ under $H_0$
Rejection Criteria
We reject $H_0$ if $|t| > t_{\alpha/2}(n-2) = t_{0.005}(12) = 3.05$.
Test statistic
$$
t = \frac{\hat\beta_0}{s.e.(\hat\beta_0)} = \frac{\hat\beta_0}{\sqrt{\hat\sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}} = \frac{19.3791}{\sqrt{460.2578\left(\frac{1}{14} + \frac{3.75^2}{32.875}\right)}} = \frac{19.3791}{15.1576} = 1.2785
$$
Since $|t| < 3.05$, we fail to reject $H_0$ and conclude that the regression constant is not significantly different from zero.
(d) $H_0$: The simple linear regression model is correct ($E(Y) = \beta_0 + \beta_1 X$).
$H_1$: The simple linear regression model is not correct ($E(Y) \neq \beta_0 + \beta_1 X$).
$x_j$ | $y_{ij}$ | $\bar{y}_{.j}$ | $\sum_i (y_{ij} - \bar{y}_{.j})^2$ | d.f.
4.5 | 62, 105, 103 | 90 | 1178 | 2
4.0 | 50, 72 | 61 | 242 | 1
5.0 | 68, 89 | 78.5 | 220.5 | 1
5.5 | 99 | 99 | 0 | 0
1.0 | 16, 18 | 17 | 2 | 1
6.0 | 76 | 76 | 0 | 0
2.5 | 98, 47, 55 | 66.6667 | 1504.6667 | 2
$$SS_{PE} = \sum_{j=1}^{7}\sum_{i=1}^{n_j} (y_{ij} - \bar{y}_{.j})^2 = 3147.1667$$
$m = 7$, therefore d.f. $= n - m = 14 - 7 = 7$.
$SS_{LOF} = SSE - SS_{PE} = 2375.9273$, and its corresponding degrees of freedom are $m - 2 = 7 - 2 = 5$.
$$MS_{PE} = \frac{SS_{PE}}{n-m} = 449.5952, \qquad MS_{LOF} = \frac{SS_{LOF}}{m-2} = 475.1855, \qquad F = \frac{MS_{LOF}}{MS_{PE}} = 1.0569$$
Source of Variation | Sum of Squares | df | Mean Square | F
Regression | 5624.3346 | 1 | 5624.3346 | 12.2200
Error | 5523.0940 | 12 | 460.2578 |
Lack of Fit | 2375.9273 | 5 | 475.1855 | 1.0569
Pure Error | 3147.1667 | 7 | 449.5952 |
Total | 11147.4286 | 13 | |
We reject $H_0$ if $F > F_{0.05}(5, 7) = 3.97$.
Since F < 3.97, we fail to reject $H_0$ and conclude that there is not sufficient evidence to say that the simple linear regression model is incorrect.
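The sketch below recomputes every quantity in Example 3.5 directly from the data table (age as x, cost as y), so its output can be compared with the hand calculations. It is plain Python; the grouping by age assumes replicated ages are entered identically, and the critical values 4.75, 3.05 and 3.97 are not computed here.

```python
import math
from collections import defaultdict

# Example 3.5 data: X = tractor age, Y = maintenance cost
age = [4.5, 4.5, 4.5, 4.0, 4.0, 5.0, 5.0, 5.5, 1.0, 1.0, 6.0, 2.5, 2.5, 2.5]
cost = [62, 105, 103, 50, 72, 68, 89, 99, 16, 18, 76, 98, 47, 55]

n = len(age)
xbar, ybar = sum(age) / n, sum(cost) / n
sxx = sum((x - xbar) ** 2 for x in age)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(age, cost))
b1 = sxy / sxx                                   # (a) slope
b0 = ybar - b1 * xbar                            # (a) intercept

# (b) ANOVA for the significance of regression
sst = sum((y - ybar) ** 2 for y in cost)
ssr = b1 * sxy
sse = sst - ssr
mse = sse / (n - 2)
f_reg = ssr / mse

# (c) t statistic for H0: beta0 = 0
t_b0 = b0 / math.sqrt(mse * (1 / n + xbar ** 2 / sxx))

# (d) lack of fit: pure error from the replicates at each distinct age
groups = defaultdict(list)
for x, y in zip(age, cost):
    groups[x].append(y)
m = len(groups)
sspe = sum(sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups.values())
sslof = sse - sspe
f_lof = (sslof / (m - 2)) / (sspe / (n - m))

print(f"(a) yhat = {b0:.4f} + {b1:.4f} x")
print(f"(b) SSR = {ssr:.4f}, SSE = {sse:.4f}, MSE = {mse:.4f}, F = {f_reg:.4f}")
print(f"(c) t = {t_b0:.4f}")
print(f"(d) m = {m}, SSPE = {sspe:.4f}, SSLOF = {sslof:.4f}, F* = {f_lof:.4f}")
```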
The End