
Lecture 3

Simple linear regression, cont.


BIOST 515

January 13, 2004



Breakdown of sums of squares

The simplest regression estimate for Yi is Ȳ (an intercept-only model). Yi − Ȳ is the total error and can be broken down further by

Yi − Ȳ = (Yi − Ŷi) + (Ŷi − Ȳ)

total error = residual error + error explained by regression


[Figure: for a single observation (xi, yi), the total deviation yi − ȳ is split into the residual yi − ŷi and the regression deviation ŷi − ȳ around the fitted value (xi, ŷi) and the mean point (x̄, ȳ).]

If we square the previous expression and sum over all observations, we get

$$\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \sum_{i=1}^{N}(\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{N}(Y_i - \hat{Y}_i)^2,$$

$$SSTO = SSR + SSE,$$

where SSTO is the corrected sum of squares of the observations, SSR is the sum of squares due to regression and SSE is the sum of squares error.
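As a quick numerical check (not from the original slides), the decomposition can be verified in R on simulated data; the variable names x, y and fit below are hypothetical:

# simulate data and fit a simple linear regression
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)
yhat <- fitted(fit)

SSTO <- sum((y - mean(y))^2)    # corrected total sum of squares
SSR  <- sum((yhat - mean(y))^2) # sum of squares due to regression
SSE  <- sum((y - yhat)^2)       # residual sum of squares
all.equal(SSTO, SSR + SSE)      # should return TRUE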

Intuitively, if SSR is 'large' compared to SSE, then β1 is significantly different from zero.

Recall that Z2 = SSE/σ² ∼ χ²_{N−2}. It can also be shown that, under H0, Z1 = SSR/σ² ∼ χ²_1 and that Z1 and Z2 are independent.

Under H0,

$$F = \frac{Z_1/1}{Z_2/(N-2)} = \frac{SSR}{SSE/(N-2)} \sim F_{1,N-2}.$$

If the observed statistic

$$F_{obs} > F_{1,N-2,1-\alpha},$$

then we reject H0 : β1 = 0.

The calculations for the F-test are usually presented in an analysis of variance (ANOVA) table.

Source of variation | Sums of squares     | Degrees of freedom | Mean square  | E[Mean square]
Regression          | SSR = Σ(Ŷi − Ȳ)²    | 1                  | SSR          | σ² + β1² Σ(Xi − X̄)²
Error               | SSE = Σ(Yi − Ŷi)²   | N − 2              | SSE/(N − 2)  | σ²
Total               | SSTO = Σ(Yi − Ȳ)²   | N − 1              |              |

lm1 <- lm(Mortality ~ Education, data = smsa)
anova(lm1)

Analysis of Variance Table

Response: Mortality
Df Sum Sq Mean Sq F value Pr(>F)
Education 1 59662 59662 20.508 3.008e-05 ***
Residuals 58 168737 2909

Fobs = 59662/(168737/58) = 20.51 > F_{1,58,0.95} = 4.01.

Therefore, we reject H0 : β1 = 0.
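As a quick check (an addition, assuming lm1 has been fit as above), the critical value and p-value can be computed directly in R:

qf(0.95, df1 = 1, df2 = 58)        # critical value F_{1,58,0.95}, approximately 4.01
1 - pf(20.508, df1 = 1, df2 = 58)  # p-value, matching Pr(>F) in the ANOVA table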
To get SSTO:

alm1 <- anova(lm1)
SSTO <- sum(alm1$"Sum Sq")
print(SSTO)

[1] 228398.3

Where do the degrees of freedom come from?



In class, we will show that the t-test and F-test are equivalent for H0 : β1 = 0. However, the t-test is somewhat more adaptable, as it can be used for one-sided alternatives. We can also easily calculate it for different hypothesized values in H0.

One-sided t-test for the SMSA example:

H0 : β1 = 0 vs. HA : β1 < 0.

$$t_{obs} = \frac{\hat{\beta}_1}{\widehat{se}(\hat{\beta}_1)} = -4.529$$

Since tobs = −4.529 < t_{N−2,α} = −1.627, we reject H0 in favor of HA.
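A sketch of the corresponding calculation in R (not from the original slides; it assumes the fit lm1 from above and a one-sided α of 0.05):

tobs <- coef(summary(lm1))["Education", "t value"]  # observed t statistic
qt(0.05, df = 58)                                    # lower-tail critical value
pt(tobs, df = 58)                                    # one-sided p-value for HA: beta1 < 0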

Coefficient of Determination

$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$$

• Often referred to as the proportion of variation explained by the predictor

• Because 0 ≤ SSE ≤ SSTO, 0 ≤ R² ≤ 1

• As predictors are added to the model, R² will not decrease

• Large R² does not necessarily imply a “good” model



• R² does not
  – measure the magnitude of the slope
  – measure the appropriateness of the model

From the SMSA example with education as a predictor of mortality:

R2 <- alm1$"Sum Sq"[1] / SSTO
print(R2)

0.261217

R² = 0.26
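Equivalently (a small addition, assuming the same fitted model lm1), R² is reported directly by summary():

summary(lm1)$r.squared  # should also be about 0.26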

Prediction

Sometimes, we would like to be able to predict the outcome for a new value of the predictor. The new outcome is defined as

$$y_{new} = \beta_0 + \beta_1 x_{new} + \epsilon$$

with an estimated value of

$$\hat{y}_{new} = \hat{\beta}_0 + \hat{\beta}_1 x_{new} + \hat{\epsilon}.$$

The expected value is

$$E[\hat{y}_{new}] = \beta_0 + \beta_1 x_{new},$$

and the variance is

$$\mathrm{var}(\hat{y}_{new}) = \sigma^2\left(1 + \frac{1}{N} + \frac{(x_{new} - \bar{x})^2}{\sum_{i=1}^{N}(x_i - \bar{x})^2}\right).$$

The 100(1 − α)% prediction interval is given by

$$\hat{\beta}_0 + \hat{\beta}_1 x_{new} \pm t_{N-2,1-\alpha/2} \times \hat{\sigma} \times \left\{1 + \frac{1}{N} + \frac{(x_{new} - \bar{x})^2}{\sum_{i=1}^{N}(x_i - \bar{x})^2}\right\}^{1/2}.$$

Note: We have assumed ε ∼ N(0, σ²) to construct the prediction interval. If the error terms are not close to normal, then the prediction interval could be misleading. This is not the case for the interval for the fitted response, which only requires approximate normality for β̂0 and β̂1.
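A sketch of how these intervals can be obtained in R with predict() (an addition; the new education value of 10.5 is an arbitrary illustrative choice):

new <- data.frame(Education = 10.5)
predict(lm1, newdata = new, interval = "prediction")  # interval for a new outcome
predict(lm1, newdata = new, interval = "confidence")  # narrower interval for the fitted mean response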


[Figure: scatterplot of Mortality (roughly 800–1100) versus Education (roughly 9.0–12.0) for the SMSA data.]

Maximum Likelihood Estimation

Assumptions about the distribution of εi are not necessary for least squares estimation. If we assume that εi ∼iid N(0, σ²), then Yi ∼ N(β0 + β1xi, σ²) independently, and

$$p(Y_i \mid \beta_0, \beta_1, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{1}{2\sigma^2}\big(Y_i - (\beta_0 + \beta_1 x_i)\big)^2\right\}.$$

The likelihood is then equal to

$$L(\beta_0, \beta_1, \sigma^2) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{N}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{N}\big(Y_i - (\beta_0 + \beta_1 x_i)\big)^2\right\}.$$
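As an illustration (not in the original slides), the log-likelihood can be maximized numerically in R and compared with the least squares fit; the simulated data and starting values below are arbitrary:

set.seed(2)
x <- runif(40, 0, 10)
y <- 1 + 2 * x + rnorm(40, sd = 2)

# negative log-likelihood in (beta0, beta1, log sigma^2)
negll <- function(par) {
  -sum(dnorm(y, mean = par[1] + par[2] * x, sd = sqrt(exp(par[3])), log = TRUE))
}

mle <- optim(c(0, 0, 0), negll)
mle$par[1:2]     # should be close to coef(lm(y ~ x))
exp(mle$par[3])  # MLE of sigma^2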

The maximum likelihood estimators (MLEs) are those values of β0, β1 and σ² that maximize L or, equivalently, l = log(L):

$$l \propto -\frac{N}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\big(Y_i - (\beta_0 + \beta_1 x_i)\big)^2.$$

The MLEs for the simple linear regression model are given by

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{x}, \qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{N} Y_i(x_i - \bar{x})}{\sum_{i=1}^{N}(x_i - \bar{x})^2}$$

and

$$\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2.$$

The MLEs for β0 and β1 are the same as the least squares estimators. However, the MLE for σ² is not. Recall that the least squares estimate of σ² is unbiased. The MLE of σ² is biased (although it is consistent).
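A brief sketch of this distinction in R (an addition, assuming the SMSA fit lm1 with N = 60 observations):

N <- 60
sum(resid(lm1)^2) / N        # MLE of sigma^2 (biased, divides by N)
sum(resid(lm1)^2) / (N - 2)  # unbiased least squares estimate, equal to summary(lm1)$sigma^2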

Considerations in the use of regression

1. Regression models are only interpretable over the range of the observed data.

2. The disposition of x plays an important role in the model fit.

3. Outliers or erroneous data can disturb the model fit.

4. A regression result indicating that two variables are related is not, by itself, evidence of causality.

Multiple Linear Regression

Example:

y = β0 + β1x1 + β2x2 + ε,

E(y) = 2 + 8x1 + 10x2

β1 indicates the change in the expected response per unit change in x1 when x2 is held constant. Likewise, β2 represents the change in the expected response per unit change in x2 when x1 is held constant.
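A minimal sketch (not from the slides) that simulates data from this mean model and recovers the coefficients with lm(); the error standard deviation of 1 is an arbitrary choice:

set.seed(3)
x1 <- runif(100, 0, 10)
x2 <- runif(100, 0, 10)
y <- 2 + 8 * x1 + 10 * x2 + rnorm(100, sd = 1)
coef(lm(y ~ x1 + x2))  # estimates should be near 2, 8 and 10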

[Figure: the regression surface E(y) = 2 + 8x1 + 10x2 plotted against x1 and x2.]

We now consider the model

yi = β0 + β1xi1 + · · · + βpxip + εi,   (1)

i = 1, . . . , n, E[εi] = 0, var(εi) = σ² and cov(εi, εj) = 0. The parameter βj, j = 1, . . . , p, represents the expected change in yi per unit change in xj, holding the remaining predictors xk (k ≠ j) constant.

We can use the model defined in (1) to describe more complicated models. For example, we might be interested in a cubic polynomial model,

y = β0 + β1x + β2x² + β3x³ + ε.

If we let x1 = x, x2 = x² and x3 = x³, then we can rewrite the regression model as

y = β0 + β1x1 + β2x2 + β3x3 + ε,

which is a multiple linear regression model with 3 predictors.

How do we interpret this model?
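For concreteness (an addition, using simulated data), a cubic model of this form can be fit in R by constructing the powers of x explicitly:

set.seed(4)
x <- runif(80, -2, 2)
y <- 1 - x + 0.5 * x^2 + 2 * x^3 + rnorm(80, sd = 0.5)
coef(lm(y ~ x + I(x^2) + I(x^3)))  # x1 = x, x2 = x^2, x3 = x^3 as predictors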

Interactions

We may also want to include interaction effects:

y = β0 + β1x1 + β2x2 + β3x1x2.

If we let x3 = x1x2, this model is equivalent to

y = β0 + β1x1 + β2x2 + β3x3.
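A minimal sketch (an addition, with simulated data) of fitting the interaction model in R; x1:x2 adds the product term, and y ~ x1 * x2 would expand to the same model:

set.seed(5)
x1 <- runif(100)
x2 <- runif(100)
y <- 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2 + rnorm(100, sd = 0.5)
coef(lm(y ~ x1 + x2 + x1:x2))  # equivalent to lm(y ~ x1 * x2)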
