You are on page 1of 75

Chapter 2: Simple Linear Regression

2.1 The Model


2.2-2.3 Parameter Estimation
2.4 Properties of Estimators
2.5 Inference
2.6 Prediction
2.7 Analysis of Variance
2.8 Regression Through the Origin
2.9 Related Models
1
2.1 The Model
Measurement of y (response) changes in a linear fashion with a
setting of the variable x (predictor):
y =

0
+
1
x
linear relation
+

noise
The linear relation is deterministic (non-random).
The noise or error is random.
Noise accounts for the variability of the observations about the
straight line.
No noise relation is deterministic.
Increased noise increased variability.
2
Experiment with this simulation program:
simple.sim <-
function(intercept=0,slope=1,x=seq(1,10),sigma=1){
noise <- rnorm(length(x),sd = sigma)
y <- intercept + slope
*
x + noise
title1 <- paste("sigma = ", sigma)
plot(x,y,pch=16,main=title1)
abline(intercept,slope,col=4,lwd=2)
}
> source("simple.sim.R")
> simple.sim(sigma=.01)
> simple.sim(sigma=.1)
> simple.sim(sigma=1)
> simple.sim(sigma=10)
3
Simulation Examples
2 4 6 8 10
2
4
6
8
1
0
sigma = 0.01
x
y
2 4 6 8 10
2
4
6
8
1
0
sigma = 0.1
x
y
2 4 6 8 10
2
4
6
8
1
0
sigma = 1
x
y
2 4 6 8 10
0
1
0
2
0
3
0
sigma = 10
x
y
4
The Setup
Assumptions:
1. E[y|x] =
0
+
1
x.
2. Var(y|x) = Var(
0
+
1
x +|x) =
2
.
Data: Suppose data y
1
, y
2
, . . . , y
n
are obtained at settings
x
1
, x
2
, . . . , x
n
, respectively. Then the model on the data is
y
i
=
0
+
1
x
i
+
i
(
i
i.i.d. N(0,
2
) and E[y
i
|x
i
] =
0
+
1
x
i
.)
Either
1. the xs are xed values and measured without error
(controlled experiment)
OR
2. the analysis is conditional on the observed values of x
(observational study).
5
2.2-2.3 Parameter Estimation, Fitted Values and Residuals
1. Maximum Likelihood Estimation
Distributional assumptions are required
2. Least Squares Estimation
Distributional assumptions are not required
6
2.2.1 Maximum Likelihood Estimation
Normal assumption is required:
f(y
i
|x
i
) =
1

2
e
1
2
2
(y
i

1
x
i
)
2
Likelihood: L(
0
,
1
, ) =

f(y
i
|x
i
)

n
e

1
2
2

n
i=1
(y
i

1
x
i
)
2
Maximize with respect to
0
,
1
, and
2
.
(
0
,
1
): Equivalent to minimizing
n

i=1
(y
i

1
x
i
)
2
(i.e. Least-squares)
2
: SSE/n, biased.
7
2.2.2 Least Squares Estimation
Assumptions:
1. E[
i
] = 0
2. Var(
i
) =
2
3.
i
s are independent.
Note that normality is not required.
8
Method
Minimize
S(
0
,
1
) =
n

i=1
(y
i

1
x
i
)
2
with respect to the parameters or regression coefcients
0
and

1
:

0
and

1
.
Justication: We want the tted line to pass as close to all of the
points as possible.
Aim: small Residuals (observed - tted response values):
e
i
= y
i

1
x
i
9
Look at the following plots:
> source("roller2.plot")
> roller2.plot(a=14,b=0)
> roller2.plot(a=2,b=2)
> roller2.plot(a=12, b=1)
> roller2.plot(a=-2,b=2.67)
10
0 2 4 6 8 10 12
0
2
0
4
0
6
0
Roller weight (t)
D
e
p
r
e
s
s
i
o
n

i
n

l
a
w
n

(
m
m
)
Fitted values
Data values
+ve residual
ve residual
a=14, b=0
0 2 4 6 8 10 12
0
2
0
4
0
6
0
Roller weight (t)
D
e
p
r
e
s
s
i
o
n

i
n

l
a
w
n

(
m
m
)
Fitted values
Data values
+ve residual
ve residual
a=2, b=2
0 2 4 6 8 10 12
0
2
0
4
0
6
0
Roller weight (t)
D
e
p
r
e
s
s
i
o
n

i
n

l
a
w
n

(
m
m
)
Fitted values
Data values
+ve residual
ve residual
a=12, b=1
0 2 4 6 8 10 12
0
2
0
4
0
6
0
Roller weight (t)
D
e
p
r
e
s
s
i
o
n

i
n

l
a
w
n

(
m
m
)
Fitted values
Data values
+ve residual
ve residual
a=2, b=2.67
The rst three lines above do not pass as close to the plotted
points as the fourth, even though the sum of the residuals is
about the same in all four cases.
Negative residuals cancel out positive residuals.
The Key: minimize squared residuals source("roller3.plot.R")
> roller3.plot(14,0); roller3.plot(2,2); roller3.plot(12,1);
> roller3.plot(a=-2,b=2.67) # small SS
0 2 4 6 8 10 12
0
4
0
8
0
1
2
0
Roller weight (t)
D
e
p
r
e
s
s
i
o
n

i
n

l
a
w
n

(
m
m
)
Fitted values
Data values
+ve residual
ve residual
a=14, b=0
0 2 4 6 8 10 12
0
4
0
8
0
1
2
0
Roller weight (t)
D
e
p
r
e
s
s
i
o
n

i
n

l
a
w
n

(
m
m
)
Fitted values
Data values
+ve residual
ve residual
a=2, b=2
0 2 4 6 8 10 12
0
4
0
8
0
1
2
0
Roller weight (t)
D
e
p
r
e
s
s
i
o
n

i
n

l
a
w
n

(
m
m
)
Fitted values
Data values
+ve residual
ve residual
a=12, b=1
0 2 4 6 8 10 12
0
4
0
8
0
1
2
0
Roller weight (t)
D
e
p
r
e
s
s
i
o
n

i
n

l
a
w
n

(
m
m
)
Fitted values
Data values
+ve residual ve residual
a=2, b=2.67
11
Unbiased estimate of
2
1
n # parameters estimated
n

i=1
e
2
i
=
1
n 2
n

i=1
e
2
i
* n observations n degrees of freedom
* 2 degrees of freedom are required to estimate the parameters
* the residuals retain n 2 degrees of freedom
12
Alternative Viewpoint
* y = (y
1
, y
2
, . . . , y
n
) is a vector in n-dimensional space.
(n degrees of freedom)
* The tted values y
i
=

0
+

1
x
i
also form a vector in
n-dimensional space:

y = ( y
1
, y
2
, . . . , y
n
)
(2 degrees of freedom)
* Least-squares seeks to minimize the distance between y and

y.
* The distance between n-dimensional vectors u and v is the
square root of
n

i=1
(u
i
v
i
)
2
* Thus, the squared distance between y and

y is
n

i=1
(y
i
y
i
)
2
=
n

i=1
e
2
i
13
Regression Coefcient Estimators
The minimizers of
S(
0
,
1
) =
n

i=1
(y
i

1
x
i
)
2
are

0
= y

1
x
and

1
=
S
xy
S
xx
where
S
xy
=
n

i=1
(x
i
x)(y
i
y)
and
S
xx
=
n

i=1
(x
i
x)
2
14
HomeMade R Estimators
> ls.est <-
function (data)
{
x <- data[,1]
y <- data[,2]
xbar <- mean(x)
ybar <- mean(y)
Sxy <- S(x,y,xbar,ybar)
Sxx <- S(x,x,xbar,xbar)
b <- Sxy/Sxx
a <- ybar - xbar
*
b
list(a = a, b = b, data=data)
}
> S <-
function (x,y,xbar,ybar)
{
sum((x-xbar)
*
(y-ybar))
}
15
Calculator Formulas and Other Properties

S
xy
=
n

i=1
x
i
y
i

n
i=1
x
i
)(

n
i=1
y
i
)
n
S
xx
=
n

i=1
x
2
i

(

n
i=1
x
i
)
2
n
Linearity Property:
S
xy
=
n

i=1
y
i
(x
i
x)
so

1
is linear in the y
i
s.
Gauss-Markov Theorem: Among linear unbiased estimators for

1
and
0
,

1
and

0
are best (i.e. they have smallest variance).
Exercise: Find the expected value and variance of
y
1
y
x
1
x
. Is it
unbiased for
1
? Is there a linear unbiased estimator with
smaller variance?
16
Residuals

i
= e
i
= y
i

1
x
i
= y
i
y
i
= y
i
y

1
(x
i
x)
In R:
> res
function (ls.object)
{
a <- ls.object$a
b <- ls.object$b
x <- ls.object$data[,1]
y <- ls.object$data[,2]
resids <- y - a - b
*
x
resids
}
17
2.3.1 Consequences of Least-Squares
1. e
i
= y
i
y

1
(x
i
x)
(follows from intercept formula)
2.

n
i=1
e
i
= 0 (follows from 1.)
3.

n
i=1
y
i
=

n
i=1
y
i
(follows from 2.)
4. The regression line passes through the centroid ( x, y) (follows
from intercept formula)
5.

x
i
e
i
= 0
(set partial derivative of S(
0
,
1
) wrt
1
to 0)
6.

y
i
e
i
= 0
(follows from 2. and 5.)
18
2.3.2 Estimation of
2
The residual sum of squares or error sum of squares is given by
SSE =
n

i=1
e
2
i
= S
yy

1
S
xy
Note: SSE = SS
Res
and S
yy
= SS
T
An unbiased estimator for the error variance is

2
= MSE =
SSE
n 2
=
_
S
yy

1
S
xy
_
n 2
Note: MSE = MS
Res
= Residual Standard Error =

MSE
19
Example
roller data
weight depression
1 1.9 2
2 3.1 1
3 3.3 5
4 4.8 5
5 5.3 20
6 6.1 20
7 6.4 23
8 7.6 10
9 9.8 30
10 12.4 25
20
Hand Calculation
(y = depression, x = weight)


10
i=1
x
i
= 1.9 +3.1 + 12.4 = 60.7
x =
60.7
10
= 6.07


10
i=1
y
i
= 2 +1 + 25 = 141
y =
141
10
= 14.1


10
i=1
x
2
i
= 1.9
2
+3.1
2
+ +12.4
2
= 461


10
i=1
y
2
i
= 4 +1 +25 + +625 = 3009


10
i=1
x
i
y
i
= (1.9)(2) + (12.4)(25) = 1103
S
xx
= 461
(60.7)
2
10
= 92.6
21
S
xy
= 1103
(60.7)(141)
10
= 247
S
yy
= 3009
(141)
2
10
= 1021

1
=
S
xy
S
xx
=
247
92.6
= 2.67

0
= y
1
x = 14.1 2.67(6.07) = 2.11

2
=
1
n2
(S
yy

1
S
xy
)
=
1
8
(1021 2.67(247)) = 45.2 = MSE
Example Summary: the tted regression line relating depression
(y) to weight (x) is
y = 2.11 +2.67x
The error variance is estimated as MSE = 45.2.
R commands (home-made version)
> roller.obj <- ls.est(roller)
> roller.obj[-3] # intercept and slope estimate
$a
[1] -2.09
$b
[1] 2.67
> res(roller.obj) # residuals
[1] -0.98 -5.18 -1.71 -5.71 7.95
[6] 5.82 8.02 -8.18 5.95 -5.98
> sum(res(roller.obj)2)/8 # error variance (MSE)
[1] 45.4
22
R commands (built-in version)
> attach(roller)
> roller.lm <- lm(depression weight)
> summary(roller.lm)
Call:
lm(formula = depression weight)
Residuals:
Min 1Q Median 3Q Max
-8.180 -5.580 -1.346 5.920 8.020
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.0871 4.7543 -0.439 0.67227
weight 2.6667 0.7002 3.808 0.00518
23
Residual standard error: 6.735 on 8 degrees of freedom
Multiple R-squared: 0.6445, Adjusted R-squared: 0.6001
F-statistic: 14.5 on 1 and 8 DF, p-value: 0.005175
> detach(roller)
or
> roller.lm <- lm(depression weight, data = roller)
> summary(roller.lm)
Using Extractor Functions
Partial Output:
> coef(roller.lm)
(Intercept) weight
-2.09 2.67
> summary(roller.lm)$sigma
[1] 6.74
From the output,
the slope estimate is

1
= 2.67.
the intercept estimate is

0
= 2.09.
the Residual standard error is the square root of the MSE:
6.74
2
= 45.4.
24
Other R commands
tted values: predict(roller.lm)
residuals: resid(roller.lm)
diagnostic plots: plot(roller.lm)
(these include a plot of the residuals against the tted values
and a normal probability plot of the residuals)
Also plot(roller); abline(roller.lm)
(this gives a plot of the data with the tted line overlaid)
25
2.4: Properties of Least Squares Estimates
E[

1
] =
1
E[

0
] =
0
Var(

1
) =

2
S
xx
Var(

0
) =
2
_
1
n
+
x
2
S
xx
_
26
Standard Error Estimators


Var(

1
) =
MSE
S
xx
so the standard error (s.e.) of

1
is estimated by

MSE
S
xx
roller e.g.: MSE = 45.2, S
xx
= 92.6
s.e. of

1
is
_
45.2/92.6 = .699


Var(

0
) = MSE
_
1
n
+
x
2
S
xx
_
so the standard error (s.e.) of

0
is estimated by

_
MSE
_
1
n
+
x
2
S
xx
_
roller e.g.:
MSE = 45.2, S
xx
= 92.6, x = 6.07, n = 10
s.e. of

0
is

45.2
_
1
10
+
6.07
2
92.6
2
_
= 4.74
27
Distributions of

1
and

0
y
i
is N(
0
+
1
x
i
,
2
), and

1
=

n
i=1
a
i
y
i
(a
i
=
x
i
x
S
xx
)

1
is N(
1
,

2
S
xx
).
Also,
SSE

2
is
2
n2
(independent of

1
) so

1
_
MSE/S
xx
t
n2

0
=

n
i=1
c
i
y
i
(c
i
=
1
n

x(x
i
x)
S
xx
)

0
is N(
0
,
2
_
1
n
+
x
2
S
xx
_
) and

MSE
_
1
n
+
x
2
S
xx
_
t
n2
28
2.5 Inferences about the Regression Parameters
Tests
Condence Intervals
for
1
, and
0
+
1
x
0
29
2.5.1 Inference for
1
H
0
:
1
=
10
vs. H
1
:
1
=
10
Under H
0
,
t
0
=

10
_
MSE/S
xx
has a t-distribution on n 2 degrees of freedom.
p-value = P(|t
n2
| > t
0
)
e.g. testing signicance of regression for roller:
H
0
:
1
= 0 vs. H
1
:
1
= 0
t
0
=
2.67 0
.699
= 3.82
p-value = P(|t
8
| > 3.82) = 2(1 P(t
8
< 3.82)) = .00509
R command: 2
*
(1-pt(3.82,8))
30
(1 ) Condence Intervals
slope:

1
t
n2,/2
s.e.
or

1
t
n2,/2
_
MSE/S
xx
roller e.g. (95% condence interval)
2.67 t
8,.025
(.699)
or
2.67 2.31(.699) = 2.67 1.61
R command: qt(.975, 8)
intercept:

0
t
n2,/2
s.e.
e.g. 2.11 2.31(4.74) = 2.11 10.9
31
2.5.2 Condence Interval for Mean Response
E[y|x
0
] =
0
+
1
x
0
where x
0
is a possible value that the predictor could take.


E[y|x
0
] =

0
+

1
x
0
is a point estimate of the mean response at
that value.
To nd a 1 condence interval for E[y|x
0
], we need the
variance of y
0
=

E[y|x
0
]:
Var( y
0
) = Var(

0
+

1
x
0
)

0
=
n

i=1
c
i
y
i

1
x
0
=
n

i=1
b
i
x
0
y
i
32
Therefore,
y
0
=
n

i=1
(c
i
+b
i
x
0
)y
i
so
Var( y
0
) =
n

i=1
(c
i
+b
i
x
0
)
2
Var(y
i
)
=
2
n

i=1
(c
i
+b
i
x
0
)
2
=
2
n

i=1
(
1
n
+
(x
0
x)(x
i
x)
S
xx
)
2
=
2
(
1
n
+
(x
0
x)
2
S
xx
)


Var( y
0
) = MSE(
1
n
+
(x
0
x)
2
S
xx
)
The condence interval is then
y
0
t
n2,/2

_
MSE(
1
n
+
(x
0
x)
2
S
xx
)
e.g. Compute a 95% condence interval for the expected
depression when the weight is 5 tonnes.
x
0
= 5, y
0
= 2.11 +2.67(5) = 11.2
n = 10, x = 6.07, S
xx
= 92.6, MSE = 45.2
Interval:
11.2 t
8,.025

45.2(
1
10
+
(5 6.07)
2
92.6
)
= 11.2 2.31

5.08 = 11.2 5.21


= (5.99, 16.41)
R code:
> predict(roller.lm,
newdata=data.frame(weight<-5),
interval="confidence")
fit lwr upr
[1,] 11.2 6.04 16.5
Ex. Write your own R function to compute this interval.
2.6: Predicted Responses
If a new observation were to be taken at x
0
, we would predict it
to be y
0
=
0
+
1
x
0
+ where is independent noise.
y
0
=

0
+

1
x
0
is a point prediction of the response at that
value.
To nd a 1 prediction interval for y
0
, we need the variance of
y
0
y
0
:
Var(y
0
y
0
) = Var(
0
+
1
x
0
+

1
x
0
)
= Var(

1
x
0
) +Var()
=
2
(
1
n
+
(x
0
x)
2
S
xx
) +
2
33
=
2
(1 +
1
n
+
(x
0
x)
2
S
xx
)


Var(y
0
y
0
) = MSE(1 +
1
n
+
(x
0
x)
2
S
xx
)
The prediction interval is then
y
0
t
n2,/2

_
MSE(1 +
1
n
+
(x
0
x)
2
S
xx
)
e.g. Compute a 95% prediction interval for the depression for a
single new observation where the weight is 5 tonnes.
x
0
= 5, y
0
= 2.11 +2.67(5) = 11.2
n = 10, x = 6.07, S
xx
= 92.6, MSE = 45.2
Interval:
11.2 t
8,.025

45.2(1 +
1
10
+
(5 6.07)
2
92.6
)
= 11.2 2.31

50.3 = 11.2 16.4 = (5.2, 27.6)


R code for Prediction Intervals
> predict(roller.lm,
newdata=data.frame(weight<-5),
interval="prediction")
fit lwr upr
[1,] 11.2 -5.13 27.6
Write your own R function to produce this interval.
34
Degrees of Freedom
A random sample of size n coming from a normal population
with mean and variance
2
has n degrees of freedom:
Y
1
, . . . , Y
n
.
Each linearly independent restriction reduces the number of
degrees of freedom by 1.


n
i=1
(Y
i
)
2

2
has a
2
(n)
distribution.


n
i=1
(Y
i

Y )
2

2
has a
2
(n1)
distribution. (Calculating

Y imposes
one linear restriction.)


n
i=1
(Y
i

Y
i
)
2

2
has a
2
(n2)
distribution. (Calculating

0
and

1
imposes two linearly independent restrictions.)
Given the x
i
s, there are 2 degrees of freedom in the quantity

0
+

1
x
i
.
Given x
i
and

Y , there is one degree of freedom:
y
i


Y =

1
(x
i
x).
35
2.7: Analysis of Variance: Breaking down (analyzing) Variation
The variation in the data (responses) is summarized by
S
yy
=
n

i=1
(y
i
y)
2
= TSS
(Total sum of squares)
2 sources of variation in the responses:
1. variation due to the straight line relationship with the
predictor
2. deviation from the line (noise)

y
i
y
deviation from data center
=
y
i
y
i
residual
+
y
i
y
difference: line and center
y
i
y = e
i
+ y
i
y
36
so
S
yy
=

(y
i
y)
2
=

(e
i
+ y
i
y)
2
=

e
2
i
+

( y
i
y)
2
since

e
i
y
i
= 0 and y

e
i
= 0
S
yy
= SSE +

( y
i
y)
2
= SSE +SS
R
The last term is the regression sum of squares.
Relation between SS
R
and
1
We saw earlier that
SSE = S
yy

1
S
xy
Therefore,
SS
R
=

1
S
xy
=

1
2
S
xx
Note that, for a given set of xs SS
R
depends only on

1
.
MS
R
= SS
R
/d.f. = SS
R
/1
(1 degree of freedom for slope parameter)
37
Expected Sums of Squares

E[SS
R
] = E[S
xx

1
2
]
= S
xx
_
Var(

1
) +(E[

1
])
2
_
= S
xx
_

2
S
xx
+
2
1
_
=
2
+
2
1
S
xx
= E[MS
R
]
Therefore, if
1
= 0, then SS
R
is an unbiased estimator for
2
.
Development of E[MSE]:
E[S
yy
] = E[

(y
i
y)
2
]
= E[

y
2
i
n y
2
] =

E[y
2
i
] nE[ y
2
]
38
Development of E[MSE] (contd)
Consider the 2 terms on RHS, separately:
1. term:
E[y
2
i
] = Var(y
i
) +(E[y
i
])
2
=
2
+(
0
+
1
x
i
)
2

E[y
2
i
] = n
2
+n
2
0
+2n
0

1
x +

2
1
x
2
i
2. term:
E[ y
2
] = Var( y) +(E[ y])
2
=
2
/n +(
0
+
1
x)
2
nE[ y
2
] =
2
+n
2
0
+2n
0

1
x +n
2
1
x
2
39
Development of E[MSE] (contd)

E[S
yy
] = (n 1)
2
+
2
1

(x
i
x)
2
= E[SS
T
]

E[SSE] = E[S
yy
] E[SS
R
]
= (n 1)
2
+
2
1

(x
i
x)
2
(
2
+
2
1
S
xx
)
= (n 2)
2
E[MSE] = E[SSE/(n 2)] =
2
40
Another approach to testing H
0
:
1
= 0
Under the null hypothesis, both MSE and MS
R
estimate
2
.
Under the alternative, only MSE estimates
2
.
E[MS
R
] =
2
+
2
1
S
xx
>
2
A reasonable test is
F
0
=
MS
R
MSE
F
1,n2
Large F
0
evidence against H
0
.
Note t
2

= F
1,
so this is really the same test as
t
2
0
=
_

1
_
MSE/S
x
x
_

_
2
=

1
2
S
xx
MSE
=
MS
R
MSE
41
The ANOVA table
Source df SS MS F
Reg. 1

1
2
S
xx

1
2
S
xx
MS
R
MSE
Error n 2 S
yy

1
2
S
xx
SSE/(n 2)
Total n-1 S
yy
roller data example:
> anova(roller.lm) # R code
Analysis of Variance Table
Response: depression
Df Sum Sq Mean Sq F value Pr(>F)
weight 1 658 658 14.5 0.0052
Residuals 8 363 45
(recall that the t-statistic for testing
1
= 0 had been 3.81 =

14.5)
Ex. Write an R function to compute these ANOVA quantities.
42
Condence Interval for
2

SSE

2

2
n2
so
P(
2
n2,1/2

SSE

2

2
n2,/2
) = 1
so
P(
SSE

2
n2,/2

2

SSE

2
n2,1/2
) = 1
e.g. roller data:
SSE = 363

2
8,.975
= 2.18
(R code: 1-qchisq(8, .025))

2
8,.025
= 17.5
(363/17.5, 363/2.18) = (20.7, 166.5)
43
2.7.1 R
2
- Coefcient of Determination
R
2
= is the fraction of the response variability explained by the
regression:
R
2
=
SS
R
S
yy
0 R
2
1. Values near 1 imply that most of the variability is
explained by the regression.
roller data:
SS
R
= 658 and S
yy
= 1021
so
R
2
=
658
1021
= .644
44
R output
> summary(roller.lm)
...
Multiple R-Squared: 0.644, ...
Ex. Write an R function which computes R
2
.
Another interpretation:
E[R
2
]
.
=
E[SS
R
]
E[S
yy
]
=

2
1
S
xx
+
2
(n 1)
2
+
2
1
S
xx
=

2
1
S
xx
n1
+

2
n1

2
+
2
1
S
xx
n1
.
=

2
1
S
xx
(n1)

2
+
2
1
S
xx
(n1)
for large n. (Note: this differs from the textbook.)
45
Properties of R
2
Thus, R
2
increases as
1. S
xx
increases (xs become more spread out)
2.
2
decreases
Cautions
1. R
2
does not measure the magnitude of the regression slope.
2. R
2
does not measure the appropriateness of the linear model.
3. A large value of R
2
does not imply that the regression model
will be an accurate predictor.
46
Hazards of Regression
Extrapolation: predicting y values outside the range of observed
x values. There is no guarantee that a future response would
behave in the same linear manner outside the observed range.
e.g. Consider an experiment with a spring. The spring is
stretched to several different lengths x (in cm) and the restoring
force F (in Newtons) is measured:
x F
3 5.1
4 6.2
5 7.9
6 9.5
> spring.lm <- lm(Fx - 1,data=spring)
> summary(spring.lm)
Coefficients:
47
Estimate Std. Error t value Pr(>|t|)
x 1.5884 0.0232 68.6 6.8e-06
The tted model relating F to x is

F = 1.58x
Can we predict the restoring force for the spring, if it has been
extended to a length of 15 cm?
High leverage observations: x values at the extremes of the
range have more inuence on the slope of the regression than
observations near the middle of the range.
Outliers can distort the regression line. Outliers may be
incorrectly recorded OR may be an indication that the linear
relation or constant variance assumption is incorrect.
A regression relationship does not mean that there is a
cause-and-effect relationship. e.g. The following data give the
number of lawyers and number of homicides in a given year for a
number of towns:
no. lawyers no. homicides
1 0
2 0
7 2
10 5
12 6
14 6
15 7
18 8
Note that the number of homicides increases with the number of
lawyers. Does this mean that in order to reduce the number of
homicides, one should reduce the number of lawyers?
Beware of nonsense relationships. e.g. It is possible to show
that the area of some lakes in Manitoba is related to elevation.
Do you think there is a real reason for this? Or is the apparent
relation just a result of chance?
2.9 - Regression through the Origin
intercept = 0
y
i
=
1
x
i
+
Max. Likelihood and L-S
minimize
n

i=1
(y
i

1
x
i
)
2

1
=

x
i
y
i

x
2
i
e
i
= y
i

1
x
i
SSE =

e
2
i
Max. Likelihood

2
=
SSE
n
48
Unbiased Estimator:

2
= MSE =
SSE
n 1
Properties of

1
:
E[

1
] =
1
Var(

1
) =

2

x
2
i
1 C.I. for
1
:

1
t
n1,/2

_
MSE

x
2
i
1 C.I. for E[y|x
0
]:
y
0
t
n1,/2

_
MSEx
2
0

x
2
i
since
Var(

1
x
0
) =

2

x
2
i
x
2
0
1 P.I. for y, given x
0
:
y
0
t
n1,/2

_
MSE(1 +
x
2
0

x
2
i
)
R code:
> roller.lm <- lm(depressionweight - 1,
data=roller)
> summary(roller.lm)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
weight 2.392 0.299 7.99 2.2e-05
Residual standard error: 6.43
on 9 degrees of freedom
Multiple R-Squared: 0.876,
Adjusted R-squared: 0.863
F-statistic: 63.9 on 1 and 9 DF,
p-value: 2.23e-005
> predict(roller.lm,
newdata=data.frame(weight<-5),
interval="prediction")
fit lwr upr
[1,] 12.0 -2.97 26.9
Ch. 2.9.2 Correlation
Bivariate Normal Distribution:
f(x, y) =
1
2
1

2
_
1
2
e

1
2(1
2
)
(Y
2
2XY +X
2
)
where
X =
x
1

1
and
Y =
y
2

2
correlation coefcient:
=

12

2

1
and
2
1
are the mean and variance of x

2
and
2
2
are the mean and variance of y

12
= E[(x
1
)(y
2
)], the covariance
49
Conditional distribution of y given x
f(y|x) =
1

2
1.2
e

1
2
_
y
0

1
x

1.2
_
2
where

0
=
2

1
=

2

2
1.2
=
2
2
(1
2
)
50
An Explanation
:
Suppose
y =
0
+
1
x +
where and x are independent N(0,
2
) and N(
1
,
2
1
) random
variables.
Dene
2
= E[y] and
2
2
= Var(y):

2
=
0
+
1

2
2
=
2
1

2
1
+
2
Dene
12
= Cov(x
1
, y
2
):

12
= E[(x
1
)(y
2
)]
= E[(x
1
)(
1
x +
1

1
)]
=
1
E[(x
1
)
2
] +E[(x
1
)]
=
1

2
1
51
Dene =

12

2
:
=

1

2
1

2
=
1

2
Therefore,

1
=

2

and

0
=
2

1
=
2

What is the conditional distribution of y, given x = x?


y|
x=x
=
0
+
1
x +
must be normal with mean
E[y|x = x] =
0
+
1
x
=
2
+

1
(x
1
)
and variance
Var(y|x = x) =
2
1.2
=
2
=
2
2

2
1

2
1
=
2
2

2
2

2
1

2
1
=
2
2
(1
2
)
Estimation
Maximum Likelihood Estimation (using bivariate normal) gives

0
= y

1
x

1
=
S
xy
S
xx
r = =
S
xy
_
S
xx
S
yy
Note that
r =

S
xx
S
yy
so
r
2
=

1
2
S
xx
S
yy
= R
2
52
i.e. the coefcient of determination = square of correlation
coefcient
Testing H
0
: = 0 vs. H
1
: = 0
Condl approach (equiv. to testing
1
= 0):
t
0
=

1
_
MSE/S
xx
=

1
_
S
yy

1
2
S
xx
(n2)S
xx
=

1
2
(n 2)
S
yy
S
xx

1
2
=

_
(n 2)r
2
1 r
2
where we have used

1
2
= r
2
S
yy
/S
xx
.
53
The above statistic is a t statistic on n 2 degrees of freedom
conclude = 0 if pvalue
P(|t
n2
| > |t
0
|) is small.
Testing H
0
: =
0
Z =
1
2
log
1 +r
1 r
has an approximate normal distribution with mean

Z
=
1
2
log
1 +
1
and variance

2
Z
=
1
n 3
for large n.
Thus,
Z
0
=
log
1+r
1r
log
1+
0
1
0
2
_
1/(n 3)
has an approximate standard normal distribution when the null
hypothesis is true.
54
Condence Interval for
Condence interval for
1
2
log
1+
1
:
Z z
/2
_
1/(n 3)
Find endpoints (l, u) of this condence interval and solve for :
(
e
2l
1
1 +e
2l
,
e
2u
1
1 +e
2u
)
55
R code for fossum example
Find 95% condence interval for the correlation between total
length and head length:
> source("fossum.R")
> attach(fossum)
> n <- length(totlngth) # sample size
> r <- cor(totlngth,hdlngth)# correlation est.
> zci <- .5
*
log((1+r)/(1-r)) +
qnorm(c(.025,.975))
*
sqrt(1/(n-3))
> ci <- (exp(2
*
zci)-1)/(1+exp(2
*
zci))
> ci # transformed conf. interval
> detach(fossum)
[1] 0.62 0.83 # conf. interval for true
# correlation
56