2.8 Matrix approach to simple linear regression

In this section we will briefly discuss a matrix approach to fitting simple linear
regression models. A random sample of size n gives n equations. For the full
SLRM we have
\[
\begin{aligned}
Y_1 &= \beta_0 + \beta_1 x_1 + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 x_2 + \varepsilon_2 \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 x_n + \varepsilon_n
\end{aligned}
\]
We can write this in matrix formulation as
\[
Y = X\beta + \varepsilon, \tag{2.22}
\]
where Y is an (n × 1) vector of response variables (random sample), X is an (n × 2) matrix called the design matrix, β is a (2 × 1) vector of unknown parameters and ε is an (n × 1) vector of random errors. That is,
\[
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \qquad
X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \qquad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.
\]
The assumptions about the random errors let us write
\[
\varepsilon \sim N_n\!\left(0, \sigma^2 I\right),
\]
that is, the vector ε has an n-dimensional normal distribution with mean
\[
E(\varepsilon) = E\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
= \begin{pmatrix} E(\varepsilon_1) \\ E(\varepsilon_2) \\ \vdots \\ E(\varepsilon_n) \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = 0
\]
and the variance-covariance matrix
\[
\operatorname{Var}(\varepsilon) =
\begin{pmatrix}
\operatorname{var}(\varepsilon_1) & \operatorname{cov}(\varepsilon_1, \varepsilon_2) & \cdots & \operatorname{cov}(\varepsilon_1, \varepsilon_n) \\
\operatorname{cov}(\varepsilon_2, \varepsilon_1) & \operatorname{var}(\varepsilon_2) & \cdots & \operatorname{cov}(\varepsilon_2, \varepsilon_n) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{cov}(\varepsilon_n, \varepsilon_1) & \operatorname{cov}(\varepsilon_n, \varepsilon_2) & \cdots & \operatorname{var}(\varepsilon_n)
\end{pmatrix}
=
\begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix}
= \sigma^2 I.
\]

This formulation is usually called the Linear Model (in β). All the models we
have considered so far can be written in this general form. The dimensions of the matrix X and of the vector β depend on the number p of parameters in the model; they are n × p and p × 1, respectively. In the full SLRM we have p = 2.
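As a quick illustration, here is a minimal simulation of the full SLRM in this matrix form; NumPy is assumed, and the covariate values, coefficients and error standard deviation are invented purely for the example.

```python
# A minimal sketch of Y = X beta + epsilon with illustrative values (not from the text).
import numpy as np

rng = np.random.default_rng(0)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up covariate values
n = len(x)
beta = np.array([2.0, 0.5])               # (beta0, beta1), chosen for illustration
sigma = 0.3

X = np.column_stack([np.ones(n), x])      # (n x 2) design matrix: a column of 1's and the x's
eps = rng.normal(0.0, sigma, size=n)      # errors ~ N_n(0, sigma^2 I): independent, equal variance
Y = X @ beta + eps                        # (n x 1) response vector

print(X)
print(Y)
```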

The null model (p = 1)

Yi = β0 + εi for i = 1, . . . , n

is equivalent to
Y = 1β0 + ε
where 1 is an (n × 1) vector of 1’s.

The no-intercept model (p = 1)

Yi = β1 xi + εi for i = 1, . . . , n

can be written in matrix notation with
\[
X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad
\beta = \left(\beta_1\right).
\]

Quadratic regression (p = 3)

Yi = β0 + β1 xi + β2 xi² + εi for i = 1, . . . , n

can be written in matrix notation with


 
\[
X = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}.
\]
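For illustration, the sketch below (NumPy again, with the same made-up x values) builds the design matrices for the null, no-intercept and quadratic models; only the columns of X change between models.

```python
# Design matrices for the reduced and quadratic models (illustrative sketch).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(x)

X_null = np.ones((n, 1))                          # null model: X = 1, p = 1
X_noint = x.reshape(-1, 1)                        # no-intercept model: X = (x_i), p = 1
X_quad = np.column_stack([np.ones(n), x, x**2])   # quadratic regression: columns 1, x, x^2, p = 3

print(X_null.shape, X_noint.shape, X_quad.shape)  # (n, p) in each case
```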


The normal equations obtained in the least squares method are given by
\[
X^T Y = X^T X \hat{\beta}.
\]

It follows that so long as $X^T X$ is invertible, i.e., its determinant is non-zero, the unique solution to the normal equations is given by
\[
\hat{\beta} = (X^T X)^{-1} X^T Y.
\]
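A minimal numerical sketch of this formula is given below; it reuses the invented toy data from the earlier sketch and compares the normal-equations solution with NumPy's least squares routine.

```python
# Sketch: solving the normal equations X^T Y = X^T X beta_hat numerically (toy data).
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
Y = X @ np.array([2.0, 0.5]) + rng.normal(0.0, 0.3, size=len(x))

# beta_hat = (X^T X)^{-1} X^T Y; solve() avoids forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# np.linalg.lstsq solves the same least squares problem and should agree
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat, beta_lstsq)
```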
This is a common formula for all linear models where $X^T X$ is invertible. For the full simple linear regression model we have
 
\[
X^T Y = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix}
\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
= \begin{pmatrix} \sum Y_i \\ \sum x_i Y_i \end{pmatrix}
= \begin{pmatrix} n\bar{Y} \\ \sum x_i Y_i \end{pmatrix}
\]
and
\[
X^T X = \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}
= \begin{pmatrix} n & n\bar{x} \\ n\bar{x} & \sum x_i^2 \end{pmatrix}.
\]
The determinant of $X^T X$ is given by
\[
|X^T X| = n\sum x_i^2 - (n\bar{x})^2 = n\left(\sum x_i^2 - n\bar{x}^2\right) = nS_{xx}.
\]

Hence, the inverse of $X^T X$ is
\[
(X^T X)^{-1} = \frac{1}{nS_{xx}}\begin{pmatrix} \sum x_i^2 & -n\bar{x} \\ -n\bar{x} & n \end{pmatrix}
= \frac{1}{S_{xx}}\begin{pmatrix} \frac{1}{n}\sum x_i^2 & -\bar{x} \\ -\bar{x} & 1 \end{pmatrix}.
\]
So the solution to the normal equations is given by
\[
\begin{aligned}
\hat{\beta} &= (X^T X)^{-1} X^T Y \\
&= \frac{1}{S_{xx}}\begin{pmatrix} \frac{1}{n}\sum x_i^2 & -\bar{x} \\ -\bar{x} & 1 \end{pmatrix}
\begin{pmatrix} n\bar{Y} \\ \sum x_i Y_i \end{pmatrix} \\
&= \frac{1}{S_{xx}}\begin{pmatrix} \bar{Y}\sum x_i^2 - \bar{x}\sum x_i Y_i \\ \sum x_i Y_i - n\bar{x}\bar{Y} \end{pmatrix} \\
&= \frac{1}{S_{xx}}\begin{pmatrix} \bar{Y}\sum x_i^2 - n\bar{x}^2\bar{Y} + n\bar{x}^2\bar{Y} - \bar{x}\sum x_i Y_i \\ S_{xY} \end{pmatrix} \\
&= \frac{1}{S_{xx}}\begin{pmatrix} \bar{Y}\left(\sum x_i^2 - n\bar{x}^2\right) - \bar{x}\left(\sum x_i Y_i - n\bar{x}\bar{Y}\right) \\ S_{xY} \end{pmatrix} \\
&= \frac{1}{S_{xx}}\begin{pmatrix} \bar{Y}S_{xx} - \bar{x}S_{xY} \\ S_{xY} \end{pmatrix} \\
&= \begin{pmatrix} \bar{Y} - \hat{\beta}_1\bar{x} \\ \hat{\beta}_1 \end{pmatrix},
\end{aligned}
\]

which is the same result as we obtained before.
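As a quick check, the following sketch (same toy data as above) confirms numerically that the matrix solution coincides with the scalar formulas $\hat{\beta}_1 = S_{xY}/S_{xx}$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{x}$.

```python
# Sketch: matrix solution vs. the scalar S_xx / S_xY formulas (toy data only).
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
Y = X @ np.array([2.0, 0.5]) + rng.normal(0.0, 0.3, size=len(x))

Sxx = np.sum((x - x.mean())**2)
SxY = np.sum((x - x.mean()) * (Y - Y.mean()))
b1 = SxY / Sxx
b0 = Y.mean() - b1 * x.mean()

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.allclose(beta_hat, [b0, b1]))   # expect True
```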

Note:
Let A and B be a vector and a matrix of real constants and let Z be a vector of
random variables, all of appropriate dimensions so that the addition and multipli-
cation are possible. Then

\[
E(A + BZ) = A + B\,E(Z), \qquad
\operatorname{Var}(A + BZ) = \operatorname{Var}(BZ) = B\operatorname{Var}(Z)B^T.
\]

In particular,
\[
E(Y) = E(X\beta + \varepsilon) = X\beta, \qquad
\operatorname{Var}(Y) = \operatorname{Var}(X\beta + \varepsilon) = \operatorname{Var}(\varepsilon) = \sigma^2 I.
\]
These equalities let us prove the following theorem.
Theorem 2.7. The least squares estimator $\hat{\beta}$ of $\beta$ is unbiased and its variance-covariance matrix is
\[
\operatorname{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}.
\]

Proof. First we will show that $\hat{\beta}$ is unbiased. Here we have
\[
E(\hat{\beta}) = E\{(X^T X)^{-1} X^T Y\} = (X^T X)^{-1} X^T E(Y)
= (X^T X)^{-1} X^T X\beta = I\beta = \beta.
\]

Now, we will show the result for the variance-covariance matrix.

\[
\begin{aligned}
\operatorname{Var}(\hat{\beta}) &= \operatorname{Var}\{(X^T X)^{-1} X^T Y\} \\
&= (X^T X)^{-1} X^T \operatorname{Var}(Y)\, X (X^T X)^{-1} \\
&= \sigma^2 (X^T X)^{-1} X^T I X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}.
\end{aligned}
\]
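A small Monte Carlo sketch of Theorem 2.7 is shown below; the design, parameters and number of replications are invented for illustration only.

```python
# Sketch: empirical check that E(beta_hat) = beta and Var(beta_hat) = sigma^2 (X^T X)^{-1}.
import numpy as np

rng = np.random.default_rng(1)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
beta, sigma, reps = np.array([2.0, 0.5]), 0.3, 20000

XtX_inv = np.linalg.inv(X.T @ X)
estimates = np.empty((reps, 2))
for r in range(reps):
    Y = X @ beta + rng.normal(0.0, sigma, size=len(x))
    estimates[r] = XtX_inv @ X.T @ Y

print(estimates.mean(axis=0))   # close to beta (unbiasedness)
print(np.cov(estimates.T))      # close to sigma^2 * (X^T X)^{-1}
print(sigma**2 * XtX_inv)
```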


We denote the vector of residuals as
\[
e = Y - \hat{Y},
\]
where $\hat{Y} = \widehat{E(Y)} = X\hat{\beta}$ is the vector of fitted responses $\hat{\mu}_i$. It can be shown that the following theorem holds.

Theorem 2.8. The $n \times 1$ vector of residuals $e$ has mean
\[
E(e) = 0
\]
and variance-covariance matrix
\[
\operatorname{Var}(e) = \sigma^2\left(I - X(X^T X)^{-1} X^T\right).
\]

Hence, the variance of the residual $e_i$ is
\[
\operatorname{var}[e_i] = \sigma^2(1 - h_{ii}),
\]
where the leverage $h_{ii}$ is the $i$th diagonal element of the hat matrix $H = X(X^T X)^{-1} X^T$, i.e.,
\[
h_{ii} = x_i^T (X^T X)^{-1} x_i,
\]
where $x_i^T = (1, x_i)$ is the $i$th row of the matrix $X$.
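A brief numerical illustration of Theorem 2.8 is sketched below (NumPy assumed, toy data as before): it forms the hat matrix, checks that the residual vector equals $(I - H)Y$, and evaluates $\sigma^2(I - H)$, whose diagonal gives $\operatorname{var}[e_i] = \sigma^2(1 - h_{ii})$.

```python
# Sketch illustrating Theorem 2.8 with the same invented toy data.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(x)
sigma = 0.3
X = np.column_stack([np.ones(n), x])
Y = X @ np.array([2.0, 0.5]) + rng.normal(0.0, sigma, size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T             # hat matrix
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat                             # residual vector
print(np.allclose(e, (np.eye(n) - H) @ Y))       # residuals are (I - H) Y

var_e = sigma**2 * (np.eye(n) - H)               # Var(e) = sigma^2 (I - H)
print(np.allclose(np.diag(var_e), sigma**2 * (1 - np.diag(H))))   # var[e_i] = sigma^2 (1 - h_ii)
```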

The $i$th mean response can be written as
\[
E(Y_i) = \mu_i = x_i^T\beta = (1, x_i)\begin{pmatrix}\beta_0 \\ \beta_1\end{pmatrix} = \beta_0 + \beta_1 x_i
\]
and its estimator as
\[
\hat{\mu}_i = x_i^T\hat{\beta}.
\]
Then, the variance of the estimator is
\[
\operatorname{var}(\hat{\mu}_i) = \operatorname{var}(x_i^T\hat{\beta}) = \sigma^2 x_i^T (X^T X)^{-1} x_i = \sigma^2 h_{ii}
\]
and the estimator of this variance is
\[
\widehat{\operatorname{var}}(\hat{\mu}_i) = S^2 h_{ii},
\]
where $S^2$ is a suitable unbiased estimator of $\sigma^2$.
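The sketch below estimates these quantities from the toy data, assuming (as is usual for the SLRM, though not restated here) that $S^2$ is the residual mean square $RSS/(n-2)$.

```python
# Sketch: estimated variance of the fitted mean responses, var_hat(mu_hat_i) = S^2 h_ii.
# Assumes S^2 = RSS / (n - 2), the usual unbiased estimator of sigma^2 in the SLRM.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(x)
X = np.column_stack([np.ones(n), x])
Y = X @ np.array([2.0, 0.5]) + rng.normal(0.0, 0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
mu_hat = X @ beta_hat                            # fitted responses mu_hat_i = x_i^T beta_hat
S2 = np.sum((Y - mu_hat)**2) / (n - 2)           # residual mean square (assumed estimator of sigma^2)

h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)    # leverages h_ii
print(S2 * h)                                    # estimated variance of each fitted mean
```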

We can easily obtain the other results we have seen for the SLRM in non-matrix notation, now using matrix notation, both for the full model and for a reduced SLRM (no intercept or zero slope).

We have seen earlier that
\[
(X^T X)^{-1} = \frac{1}{nS_{xx}}\begin{pmatrix}\sum x_i^2 & -n\bar{x} \\ -n\bar{x} & n\end{pmatrix}.
\]

Now, by Theorem 2.7, $\operatorname{Var}[\hat{\beta}] = \sigma^2(X^T X)^{-1}$. Thus
\[
\operatorname{var}[\hat{\beta}_0] = \sigma^2\,\frac{\sum x_i^2}{nS_{xx}},
\]
which, by writing $\sum x_i^2 = \sum x_i^2 - n\bar{x}^2 + n\bar{x}^2$, can be written as $\sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)$.
Also,
\[
\operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1) = \sigma^2\,\frac{-n\bar{x}}{nS_{xx}} = \frac{-\sigma^2\bar{x}}{S_{xx}},
\]
and
\[
\operatorname{var}[\hat{\beta}_1] = \frac{\sigma^2}{S_{xx}}.
\]
The quantity $h_{ii}$ is given by
\[
h_{ii} = x_i^T (X^T X)^{-1} x_i
= \begin{pmatrix}1 & x_i\end{pmatrix}\frac{1}{nS_{xx}}\begin{pmatrix}\sum x_j^2 & -n\bar{x} \\ -n\bar{x} & n\end{pmatrix}\begin{pmatrix}1 \\ x_i\end{pmatrix}.
\]

We shall leave it as an exercise to show that this simplifies to

\[
h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}.
\]
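A numerical check of this identity (not a proof) is sketched below, comparing the closed form with the diagonal of the hat matrix for the same made-up x values.

```python
# Sketch: checking h_ii = 1/n + (x_i - xbar)^2 / S_xx against diag(H).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(x)
X = np.column_stack([np.ones(n), x])

H = X @ np.linalg.inv(X.T @ X) @ X.T
h_diag = np.diag(H)

Sxx = np.sum((x - x.mean())**2)
h_formula = 1.0 / n + (x - x.mean())**2 / Sxx

print(np.allclose(h_diag, h_formula))   # expect True
```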

2.8.1 Some specific examples


1. The Null model
As we have seen, this can be written as

Y = Xβ0 + ε
where X = 1 is an (n × 1) vector of 1's. So $X^T X = n$ and $X^T Y = \sum Y_i$, which gives
\[
\hat{\beta} = (X^T X)^{-1} X^T Y = \frac{1}{n}\sum Y_i = \bar{Y} = \hat{\beta}_0,
\]
\[
\operatorname{var}[\hat{\beta}] = (X^T X)^{-1}\sigma^2 = \frac{\sigma^2}{n}.
\]

2. No-intercept model
We saw that this example fits the General Linear Model with
\[
X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad \beta = \beta_1.
\]
So $X^T X = \sum x_i^2$ and $X^T Y = \sum x_i Y_i$, and we can calculate
\[
\hat{\beta} = (X^T X)^{-1} X^T Y = \frac{\sum x_i Y_i}{\sum x_i^2} = \hat{\beta}_1,
\]
\[
\operatorname{Var}[\hat{\beta}] = \sigma^2 (X^T X)^{-1} = \frac{\sigma^2}{\sum x_i^2}.
\]
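As a final check, the sketch below applies the general formula to both of these reduced models with the same invented data; the estimates agree with $\bar{Y}$ and $\sum x_i Y_i / \sum x_i^2$, respectively.

```python
# Sketch: the general formula beta_hat = (X^T X)^{-1} X^T Y applied to both reduced models.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(x)
Y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, size=n)   # toy responses from the full model

# Null model: X = 1, so beta_hat = Ybar
X_null = np.ones((n, 1))
b_null = np.linalg.solve(X_null.T @ X_null, X_null.T @ Y)
print(b_null, Y.mean())                            # should agree

# No-intercept model: beta1_hat = sum(x_i Y_i) / sum(x_i^2)
X_noint = x.reshape(-1, 1)
b_noint = np.linalg.solve(X_noint.T @ X_noint, X_noint.T @ Y)
print(b_noint, np.sum(x * Y) / np.sum(x**2))       # should agree
```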
