
Regression Analysis

Lecture 6: Regression Analysis

MIT 18.S096

Dr. Kempthorne

Fall 2013

MIT 18.S096 Regression Analysis 1

Linear Regression: Overview
Ordinary Least Squares (OLS)
Gauss-Markov Theorem
Regression Analysis Generalized Least Squares (GLS)
Distribution Theory: Normal Regression Models
Maximum Likelihood Estimation
Generalized M Estimation


1 Regression Analysis
Linear Regression: Overview
Ordinary Least Squares (OLS)
Gauss-Markov Theorem
Generalized Least Squares (GLS)
Distribution Theory: Normal Regression Models
Maximum Likelihood Estimation
Generalized M Estimation


Multiple Linear Regression: Setup

Data Set
n cases, $i = 1, 2, \ldots, n$
1 Response (dependent) variable
$y_i$, $i = 1, 2, \ldots, n$
p Explanatory (independent) variables
$x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,p})^T$, $i = 1, 2, \ldots, n$
Goal of Regression Analysis:
Extract/exploit relationship between $y_i$ and $x_i$.
Causal Inference
Functional Relationships

General Linear Model: For each case $i$, the conditional distribution $[y_i \mid x_i]$ is given by
$$y_i = \hat{y}_i + \epsilon_i$$
where
$$\hat{y}_i = \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_p x_{i,p}$$
$\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ are p regression parameters
(constant over all cases)
$\epsilon_i$ Residual (error) variable
(varies over all cases)
Extensive breadth of possible models
Polynomial approximation ($x_{i,j} = (x_i)^j$, explanatory variables are different
powers of the same variable $x = x_i$)
Fourier Series: ($x_{i,j} = \sin(j x_i)$ or $\cos(j x_i)$, explanatory variables are different
sin/cos terms of a Fourier series expansion)
Time series regressions: time indexed by $i$, and explanatory variables include
lagged response values.
Note: Linearity of $\hat{y}_i$ (in the regression parameters) is maintained even when the explanatory variables are non-linear functions of $x$.
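The note on linearity can be illustrated with a small sketch. The data, the cubic signal, and the added constant column are assumptions made for illustration only; they do not come from the lecture. The columns of X are non-linear in x, yet the fitted values X @ beta remain linear in beta.

```python
import numpy as np

# Hypothetical data: scalar predictor x, cubic signal plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
beta_true = np.array([1.0, -2.0, 0.0, 0.5])   # illustrative coefficients

# Polynomial design matrix with columns x^0, x^1, x^2, x^3.
# (The constant column x^0 is an added assumption.)
X = np.vander(x, 4, increasing=True)
y = X @ beta_true + 0.1 * rng.standard_normal(x.size)

# Least-squares fit; the model is linear in beta although X is non-linear in x.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Linearity in the parameters: X(a*b1 + b2) = a*X b1 + X b2.
b1, b2, a = rng.standard_normal(4), rng.standard_normal(4), 2.5
linear_ok = np.allclose(X @ (a * b1 + b2), a * (X @ b1) + X @ b2)
```

A Fourier design could be built the same way by stacking sin(j x) and cos(j x) columns.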

Steps for Fitting a Model

(1) Propose a model in terms of

Response variable Y (specify the scale)
Explanatory variables X1 , X2 , . . . Xp (include different
functions of explanatory variables if appropriate)
Assumptions about the distribution of $\epsilon$ over the cases
(2) Specify/define a criterion for judging different estimators.
(3) Characterize the best estimator and apply it to the given data.
(4) Check the assumptions in (1).
(5) If necessary modify model and/or assumptions and go to (1).


Specifying Assumptions in (1) for Residual Distribution

Gauss-Markov: zero mean, constant variance, uncorrelated
Normal-linear models: $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$ r.v.s
Generalized Gauss-Markov: zero mean, and general covariance
matrix (possibly correlated, possibly heteroscedastic)
Non-normal/non-Gaussian distributions (e.g., Laplace, Pareto,
Contaminated normal: some fraction $(1 - \delta)$ of the $\epsilon_i$ are i.i.d.
$N(0, \sigma^2)$ r.v.s; the remaining fraction $(\delta)$ follows some
contamination distribution).
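A minimal sketch of drawing contaminated-normal errors. The slide leaves the contamination distribution unspecified; the choice $N(0, (10\sigma)^2)$ below, the function name, and all parameter values are assumptions for illustration.

```python
import numpy as np

def contaminated_normal(n, sigma=1.0, delta=0.1, scale=10.0, seed=0):
    """Draw n errors: N(0, sigma^2) with prob. 1 - delta,
    otherwise N(0, (scale * sigma)^2) (an assumed contamination law)."""
    rng = np.random.default_rng(seed)
    is_contaminated = rng.random(n) < delta
    eps = rng.normal(0.0, sigma, size=n)
    n_bad = int(is_contaminated.sum())
    eps[is_contaminated] = rng.normal(0.0, scale * sigma, size=n_bad)
    return eps, is_contaminated

eps, flag = contaminated_normal(10_000)
```

The flagged cases have a much larger spread than the rest, which is what makes least squares sensitive to them.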


Specifying Estimator Criterion in (2)

Least Squares
Maximum Likelihood
Robust (Contamination-resistant)
Bayes (assume βj are r.v.’s with known prior distribution)
Accommodating incomplete/missing data
Case Analyses for (4) Checking Assumptions
Residual analysis
Model errors $\epsilon_i$ are unobservable
Model residuals for fitted regression parameters $\tilde{\beta}_j$ are:
$e_i = y_i - [\tilde{\beta}_1 x_{i,1} + \tilde{\beta}_2 x_{i,2} + \cdots + \tilde{\beta}_p x_{i,p}]$
Influence diagnostics (identify cases which are highly
‘influential’?)
Outlier detection

Ordinary Least Squares Estimates

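Least Squares Criterion: For $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$, define

$$Q(\beta) = \sum_{i=1}^{n} [y_i - \hat{y}_i]^2 = \sum_{i=1}^{n} [y_i - (\beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_p x_{i,p})]^2$$

Ordinary Least-Squares (OLS) estimate $\hat{\beta}$: minimizes $Q(\beta)$.

Matrix Notation

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}$$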


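Solving for OLS Estimate $\hat{\beta}$

$$\hat{y} = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix} = X\beta$$
and
$$Q(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = (y - \hat{y})^T (y - \hat{y}) = (y - X\beta)^T (y - X\beta)$$
OLS $\hat{\beta}$ solves $\partial Q(\beta) / \partial \beta_j = 0$, $j = 1, 2, \ldots, p$
$$\begin{aligned}
\frac{\partial Q(\beta)}{\partial \beta_j} &= \frac{\partial}{\partial \beta_j} \sum_{i=1}^{n} [y_i - (x_{i,1}\beta_1 + x_{i,2}\beta_2 + \cdots + x_{i,p}\beta_p)]^2 \\
&= \sum_{i=1}^{n} 2(-x_{i,j}) [y_i - (x_{i,1}\beta_1 + x_{i,2}\beta_2 + \cdots + x_{i,p}\beta_p)] \\
&= -2 (X_{[j]})^T (y - X\beta)
\end{aligned}$$
where $X_{[j]}$ is the jth column of $X$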


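Solving for OLS Estimate $\hat{\beta}$

$$\frac{\partial Q}{\partial \beta} = \begin{bmatrix} \partial Q / \partial \beta_1 \\ \partial Q / \partial \beta_2 \\ \vdots \\ \partial Q / \partial \beta_p \end{bmatrix}
= -2 \begin{bmatrix} X_{[1]}^T (y - X\beta) \\ X_{[2]}^T (y - X\beta) \\ \vdots \\ X_{[p]}^T (y - X\beta) \end{bmatrix}
= -2 X^T (y - X\beta)$$
So the OLS Estimate $\hat{\beta}$ solves the “Normal Equations”
$$X^T (y - X\hat{\beta}) = 0 \iff X^T X \hat{\beta} = X^T y \implies \hat{\beta} = (X^T X)^{-1} X^T y$$

N.B. For $\hat{\beta}$ to exist (uniquely)

$(X^T X)$ must be invertible
$\iff$ $X$ must have Full Column Rank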
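A minimal numerical sketch of the normal equations, assuming NumPy and a simulated full-column-rank design. The data, dimensions, and coefficient values are hypothetical. It solves $X^T X \hat{\beta} = X^T y$ directly and checks that the gradient of $Q$ vanishes at the solution.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.standard_normal((n, p))          # hypothetical design, full column rank
beta_true = np.array([2.0, -1.0, 0.5])   # hypothetical parameters
y = X @ beta_true + 0.3 * rng.standard_normal(n)

def Q(beta):
    """Least-squares criterion Q(beta) = (y - X beta)^T (y - X beta)."""
    r = y - X @ beta
    return r @ r

# Solve X^T X beta = X^T y without forming an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient of Q at beta_hat: -2 X^T (y - X beta_hat), zero up to rounding.
grad = -2.0 * X.T @ (y - X @ beta_hat)

# Cross-check against NumPy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Forming $X^T X$ squares the condition number of the problem, which is one reason the Q-R route later in the lecture is usually preferred in practice.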

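(Ordinary) Least Squares Fit

OLS Estimate:
$$\hat{\beta} = \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_p \end{bmatrix} = (X^T X)^{-1} X^T y$$
Fitted Values:
$$\hat{y} = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix}
= \begin{bmatrix} x_{1,1}\hat{\beta}_1 + \cdots + x_{1,p}\hat{\beta}_p \\ x_{2,1}\hat{\beta}_1 + \cdots + x_{2,p}\hat{\beta}_p \\ \vdots \\ x_{n,1}\hat{\beta}_1 + \cdots + x_{n,p}\hat{\beta}_p \end{bmatrix}
= X\hat{\beta} = X (X^T X)^{-1} X^T y = Hy$$
where $H = X (X^T X)^{-1} X^T$ is the n × n “Hat Matrix”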

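(Ordinary) Least Squares Fit

The Hat Matrix $H$ projects $R^n$ onto the column-space of $X$

Residuals: $\hat{\epsilon}_i = y_i - \hat{y}_i$, $i = 1, 2, \ldots, n$

$$\hat{\epsilon} = \begin{bmatrix} \hat{\epsilon}_1 \\ \hat{\epsilon}_2 \\ \vdots \\ \hat{\epsilon}_n \end{bmatrix} = y - \hat{y} = (I_n - H) y$$

Normal Equations:
$$X^T (y - X\hat{\beta}) = X^T \hat{\epsilon} = 0_p = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}$$

N.B. The Least-Squares Residuals vector $\hat{\epsilon}$ is orthogonal to the
column space of $X$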
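The projection properties of the hat matrix can be checked numerically. The sketch below uses simulated data (hypothetical X and y) and forms H explicitly, which is fine for small n but not something one would do for large problems.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 4
X = rng.standard_normal((n, p))   # hypothetical full-column-rank design
y = rng.standard_normal(n)        # hypothetical response

# H = X (X^T X)^{-1} X^T, computed with a linear solve instead of an inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)

y_hat = H @ y                     # fitted values
resid = y - y_hat                 # residuals (I_n - H) y
leverage = np.diag(H)             # diagonal of H, used in influence diagnostics

# Projection checks: symmetric, idempotent, trace equals p.
is_symmetric = np.allclose(H, H.T)
is_idempotent = np.allclose(H @ H, H)
trace_H = np.trace(H)
```

The diagonal entries of H (the leverages) lie in [0, 1] and sum to p.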

Gauss-Markov Theorem: Assumptions

Data
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \quad \text{and} \quad
X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{bmatrix}$$
follow a linear model satisfying the Gauss-Markov Assumptions
if $y$ is an observation of random vector $Y = (Y_1, Y_2, \ldots, Y_n)^T$ and
$E(Y \mid X, \beta) = X\beta$, where $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is the
p-vector of regression parameters.
$Cov(Y \mid X, \beta) = \sigma^2 I_n$, for some $\sigma^2 > 0$.
I.e., the random variables generating the observations are
uncorrelated and have constant variance $\sigma^2$ (conditional on $X$
and $\beta$).

Gauss-Markov Theorem
For known constants $c_1, c_2, \ldots, c_p, c_{p+1}$, consider the problem of estimating
$$\theta = c_1 \beta_1 + c_2 \beta_2 + \cdots + c_p \beta_p + c_{p+1}.$$
Under the Gauss-Markov assumptions, the estimator
$$\hat{\theta} = c_1 \hat{\beta}_1 + c_2 \hat{\beta}_2 + \cdots + c_p \hat{\beta}_p + c_{p+1},$$
where $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$ are the least squares estimates, is
1) An Unbiased Estimator of $\theta$
2) A Linear Estimator of $\theta$, that is
$\hat{\theta} = \sum_{i=1}^{n} b_i y_i$, for some known (given $X$) constants $b_i$.

Theorem: Under the Gauss-Markov Assumptions, the estimator
$\hat{\theta}$ has the smallest (Best) variance among all Linear Unbiased
Estimators of $\theta$, i.e., $\hat{\theta}$ is BLUE.

Gauss-Markov Theorem: Proof

Proof: Without loss of generality, assume $c_{p+1} = 0$ and
define $c = (c_1, c_2, \ldots, c_p)^T$.
The Least Squares Estimate of $\theta = c^T \beta$ is:
$$\hat{\theta} = c^T \hat{\beta} = c^T (X^T X)^{-1} X^T y \equiv d^T y$$
a linear estimate in $y$ given by coefficients $d = (d_1, d_2, \ldots, d_n)^T$.
Consider an alternative linear estimate of $\theta$:
$$\tilde{\theta} = b^T y$$
with fixed coefficients given by $b = (b_1, \ldots, b_n)^T$.
Define $f = b - d$ and note that
$$\tilde{\theta} = b^T y = (d + f)^T y = \hat{\theta} + f^T y$$
If $\tilde{\theta}$ is unbiased then, because $\hat{\theta}$ is unbiased,
$$0 = E(f^T y) = f^T E(y) = f^T (X\beta) \quad \text{for all } \beta \in R^p$$
$\implies$ $f$ is orthogonal to the column space of $X$
$\implies$ $f$ is orthogonal to $d = X (X^T X)^{-1} c$

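If $\tilde{\theta}$ is unbiased, then the orthogonality of $f$ to $d$ implies
$$\begin{aligned}
Var(\tilde{\theta}) &= Var(b^T y) = Var(d^T y + f^T y) \\
&= Var(d^T y) + Var(f^T y) + 2 Cov(d^T y, f^T y) \\
&= Var(\hat{\theta}) + Var(f^T y) + 2 d^T Cov(y) f \\
&= Var(\hat{\theta}) + Var(f^T y) + 2 d^T (\sigma^2 I_n) f \\
&= Var(\hat{\theta}) + Var(f^T y) + 2 \sigma^2 d^T f \\
&= Var(\hat{\theta}) + Var(f^T y) + 2 \sigma^2 \times 0 \\
&\geq Var(\hat{\theta})
\end{aligned}$$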
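The variance comparison in the proof can be reproduced numerically. In this sketch the design X, the vector c, and the value of $\sigma^2$ are hypothetical. An alternative unbiased linear estimator is built by adding to $d$ a vector $f$ orthogonal to the column space of $X$, and variances are computed exactly from $Var(b^T y) = \sigma^2 b^T b$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma2 = 20, 3, 1.5
X = rng.standard_normal((n, p))        # hypothetical design
c = np.array([1.0, -2.0, 0.5])         # hypothetical target theta = c^T beta

# OLS coefficients: d = X (X^T X)^{-1} c, so that d^T y = c^T beta_hat.
d = X @ np.linalg.solve(X.T @ X, c)

# Any f orthogonal to col(X) gives another unbiased linear estimator b^T y.
H = X @ np.linalg.solve(X.T @ X, X.T)
f = (np.eye(n) - H) @ rng.standard_normal(n)
b = d + f

# Under Cov(y) = sigma^2 I_n, Var(b^T y) = sigma^2 * b^T b.
var_ols = sigma2 * (d @ d)
var_alt = sigma2 * (b @ b)
```

Unbiasedness for all $\beta$ is equivalent to $X^T b = c$, which holds here because $X^T d = c$ and $X^T f = 0$.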


Generalized Least Squares (GLS) Estimates

Consider generalizing the Gauss-Markov assumptions for the linear
regression model to
$$Y = X\beta + \epsilon$$
where the random n-vector $\epsilon$ satisfies $E[\epsilon] = 0_n$ and $E[\epsilon \epsilon^T] = \sigma^2 \Sigma$.
$\sigma^2$ is an unknown scale parameter
$\Sigma$ is a known (n × n) positive definite matrix specifying the
relative variances and correlations of the components of $\epsilon$
Transform the data $(Y, X)$ to $Y^* = \Sigma^{-1/2} Y$ and $X^* = \Sigma^{-1/2} X$ and
the model becomes
$$Y^* = X^* \beta + \epsilon^*, \quad \text{where } E[\epsilon^*] = 0_n \text{ and } E[\epsilon^* (\epsilon^*)^T] = \sigma^2 I_n$$
By the Gauss-Markov Theorem, the BLUE (‘GLS’) of $\beta$ is
$$\hat{\beta} = [(X^*)^T (X^*)]^{-1} (X^*)^T (Y^*) = [X^T \Sigma^{-1} X]^{-1} (X^T \Sigma^{-1} Y)$$
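A sketch of GLS by whitening, under stated assumptions: the AR(1)-style $\Sigma$ and the data are hypothetical, and the Cholesky factor $L$ (with $\Sigma = L L^T$) is used in place of the symmetric square root $\Sigma^{-1/2}$. Any factor with $L^{-1} \Sigma L^{-T} = I_n$ yields the same estimate.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])

# Hypothetical known Sigma: correlation 0.6^|i-j| between cases i and j.
idx = np.arange(n)
Sigma = 0.6 ** np.abs(idx[:, None] - idx[None, :])

L = np.linalg.cholesky(Sigma)                       # Sigma = L L^T
y = X @ np.array([1.0, 2.0]) + L @ rng.standard_normal(n)

# Whitened data: L^{-1} plays the role of Sigma^{-1/2}.
X_star = np.linalg.solve(L, X)
y_star = np.linalg.solve(L, y)
beta_gls, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

# Direct formula [X^T Sigma^{-1} X]^{-1} X^T Sigma^{-1} y.
Si_X = np.linalg.solve(Sigma, X)                    # Sigma^{-1} X
beta_direct = np.linalg.solve(X.T @ Si_X, Si_X.T @ y)
```

Both routes give the same estimate up to rounding; the whitening route reduces GLS to an ordinary least-squares problem.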

Normal Linear Regression Models

Distribution Theory
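$$Y_i = x_{i,1}\beta_1 + x_{i,2}\beta_2 + \cdots + x_{i,p}\beta_p + \epsilon_i = \mu_i + \epsilon_i$$
Assume $\{\epsilon_1, \epsilon_2, \ldots, \epsilon_n\}$ are i.i.d. $N(0, \sigma^2)$.
$\implies [Y_i \mid x_{i,1}, x_{i,2}, \ldots, x_{i,p}, \beta, \sigma^2] \sim N(\mu_i, \sigma^2)$,
independent over $i = 1, 2, \ldots, n$.

Conditioning on $X$, $\beta$, and $\sigma^2$:
$$Y = X\beta + \epsilon, \quad \text{where } \epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} \sim N_n(0_n, \sigma^2 I_n)$$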


Distribution Theory

$$\mu = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = E(Y \mid X, \beta, \sigma^2) = X\beta$$


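$$\Sigma = Cov(Y \mid X, \beta, \sigma^2) = \begin{bmatrix} \sigma^2 & 0 & 0 & \cdots & 0 \\ 0 & \sigma^2 & 0 & \cdots & 0 \\ 0 & 0 & \sigma^2 & & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & & \sigma^2 \end{bmatrix} = \sigma^2 I_n$$
That is, $\Sigma_{i,j} = Cov(Y_i, Y_j \mid X, \beta, \sigma^2) = \sigma^2 \times \delta_{i,j}$.

Apply Moment-Generating Functions (MGFs) to derive

Joint distribution of $Y = (Y_1, Y_2, \ldots, Y_n)^T$
Joint distribution of $\hat{\beta} = (\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p)^T$.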


MGF of Y
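For the n-variate r.v. $Y$, and constant n-vector $t = (t_1, \ldots, t_n)^T$,
$$\begin{aligned}
M_Y(t) = E(e^{t^T Y}) &= E(e^{t_1 Y_1 + t_2 Y_2 + \cdots + t_n Y_n}) \\
&= E(e^{t_1 Y_1}) \cdot E(e^{t_2 Y_2}) \cdots E(e^{t_n Y_n}) \\
&= M_{Y_1}(t_1) \cdot M_{Y_2}(t_2) \cdots M_{Y_n}(t_n) \\
&= \prod_{i=1}^{n} e^{t_i \mu_i + \frac{1}{2} t_i^2 \sigma^2} \\
&= e^{\sum_{i=1}^{n} t_i \mu_i + \frac{1}{2} \sum_{i,k=1}^{n} t_i \Sigma_{i,k} t_k} = e^{t^T \mu + \frac{1}{2} t^T \Sigma t}
\end{aligned}$$
$\implies Y \sim N_n(\mu, \Sigma)$
Multivariate Normal with mean $\mu$ and covariance $\Sigma$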


MGF of β̂
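For the p-variate r.v. $\hat{\beta}$, and constant p-vector $\tau = (\tau_1, \ldots, \tau_p)^T$,
$$M_{\hat{\beta}}(\tau) = E(e^{\tau^T \hat{\beta}}) = E(e^{\tau_1 \hat{\beta}_1 + \tau_2 \hat{\beta}_2 + \cdots + \tau_p \hat{\beta}_p})$$

Defining $A = (X^T X)^{-1} X^T$ we can express

$$\hat{\beta} = (X^T X)^{-1} X^T y = AY$$
and
$$\begin{aligned}
M_{\hat{\beta}}(\tau) &= E(e^{\tau^T \hat{\beta}}) \\
&= E(e^{\tau^T A Y}) \\
&= E(e^{t^T Y}), \quad \text{with } t = A^T \tau \\
&= M_Y(t) \\
&= e^{t^T \mu + \frac{1}{2} t^T \Sigma t}
\end{aligned}$$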

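MGF of $\hat{\beta}$
$$M_{\hat{\beta}}(\tau) = E(e^{\tau^T \hat{\beta}}) = e^{t^T \mu + \frac{1}{2} t^T \Sigma t}$$
Plug in:
$t = A^T \tau = X (X^T X)^{-1} \tau$
$\mu = X\beta$
$\Sigma = \sigma^2 I_n$
so that
$t^T \mu = \tau^T \beta$
$t^T \Sigma t = \tau^T (X^T X)^{-1} X^T [\sigma^2 I_n] X (X^T X)^{-1} \tau = \tau^T [\sigma^2 (X^T X)^{-1}] \tau$
So the MGF of $\hat{\beta}$ is
$$M_{\hat{\beta}}(\tau) = e^{\tau^T \beta + \frac{1}{2} \tau^T [\sigma^2 (X^T X)^{-1}] \tau}$$
$$\iff \hat{\beta} \sim N_p(\beta, \sigma^2 (X^T X)^{-1})$$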

Marginal Distributions of Least Squares Estimates


Because $\hat{\beta} \sim N_p(\beta, \sigma^2 (X^T X)^{-1})$,

the marginal distribution of each $\hat{\beta}_j$ is:

$\hat{\beta}_j \sim N(\beta_j, \sigma^2 C_{j,j})$

where $C_{j,j}$ = jth diagonal element of $(X^T X)^{-1}$
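The sampling distribution of $\hat{\beta}$ can be checked by simulation. This is a Monte Carlo sketch with a hypothetical fixed design, hypothetical true parameters, and 20,000 replications; it compares the empirical mean and covariance of $\hat{\beta}$ with $\beta$ and $\sigma^2 (X^T X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma = 40, 0.5
X = np.column_stack([np.ones(n), np.linspace(-1.0, 1.0, n)])  # hypothetical fixed design
beta = np.array([1.0, 3.0])                                    # hypothetical true parameters

A = np.linalg.solve(X.T @ X, X.T)        # A = (X^T X)^{-1} X^T, shape (p, n)

reps = 20_000
# Each row of Y is one simulated response vector y = X beta + eps.
Y = X @ beta + sigma * rng.standard_normal((reps, n))
B = Y @ A.T                              # row r holds beta_hat for replication r

emp_mean = B.mean(axis=0)
emp_cov = np.cov(B, rowvar=False)
theory_cov = sigma**2 * np.linalg.inv(X.T @ X)
```

The diagonal of `theory_cov` holds the marginal variances $\sigma^2 C_{j,j}$.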


The Q-R Decomposition of X

Consider expressing the (n × p) matrix $X$ of explanatory variables as
$$X = QR$$
where
$Q$ is an (n × p) matrix with orthonormal columns, i.e., $Q^T Q = I_p$.
$R$ is a (p × p) upper-triangular matrix.

The columns of $Q = [Q_{[1]}, Q_{[2]}, \ldots, Q_{[p]}]$ can be constructed by
performing the Gram-Schmidt Orthonormalization procedure on
the columns of $X = [X_{[1]}, X_{[2]}, \ldots, X_{[p]}]$

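If
$$R = \begin{bmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,p-1} & r_{1,p} \\ 0 & r_{2,2} & \cdots & r_{2,p-1} & r_{2,p} \\ 0 & 0 & \ddots & \vdots & \vdots \\ 0 & 0 & & r_{p-1,p-1} & r_{p-1,p} \\ 0 & 0 & \cdots & 0 & r_{p,p} \end{bmatrix},$$
then
$X_{[1]} = Q_{[1]} r_{1,1}$
$\implies r_{1,1}^2 = X_{[1]}^T X_{[1]}$
$Q_{[1]} = X_{[1]} / r_{1,1}$

$X_{[2]} = Q_{[1]} r_{1,2} + Q_{[2]} r_{2,2}$
$$\begin{aligned}
\implies Q_{[1]}^T X_{[2]} &= Q_{[1]}^T Q_{[1]} r_{1,2} + Q_{[1]}^T Q_{[2]} r_{2,2} \\
&= 1 \cdot r_{1,2} + 0 \cdot r_{2,2} \\
&= r_{1,2} \quad \text{(known since } Q_{[1]} \text{ is specified)}
\end{aligned}$$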

With $r_{1,2}$ and $Q_{[1]}$ specified we can solve for $r_{2,2}$:

$$Q_{[2]} r_{2,2} = X_{[2]} - Q_{[1]} r_{1,2}$$

Take squared norm of both sides:

$$r_{2,2}^2 = X_{[2]}^T X_{[2]} - 2 r_{1,2} Q_{[1]}^T X_{[2]} + r_{1,2}^2$$

(all terms on RHS are known)

With $r_{2,2}$ specified

$$Q_{[2]} = \frac{1}{r_{2,2}} \left( X_{[2]} - r_{1,2} Q_{[1]} \right)$$
Etc. (solve for elements of $R$, and columns of $Q$)
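The column-by-column construction above can be sketched as classical Gram-Schmidt. The function name and test matrix are hypothetical, and the sketch assumes $X$ has full column rank. Classical Gram-Schmidt is shown because it mirrors the slides; it is less numerically stable than the Householder-based routines used by linear algebra libraries.

```python
import numpy as np

def gram_schmidt_qr(X):
    """Classical Gram-Schmidt: X (n x p, full column rank) -> Q (n x p), R (p x p)."""
    n, p = X.shape
    Q = np.zeros((n, p))
    R = np.zeros((p, p))
    for j in range(p):
        v = X[:, j].astype(float)          # astype returns a copy
        for k in range(j):
            R[k, j] = Q[:, k] @ X[:, j]    # r_{k,j} = Q_[k]^T X_[j]
            v -= R[k, j] * Q[:, k]         # remove the component along Q_[k]
        R[j, j] = np.linalg.norm(v)        # r_{j,j} > 0 under full column rank
        Q[:, j] = v / R[j, j]
    return Q, R

rng = np.random.default_rng(6)
X = rng.standard_normal((8, 3))            # hypothetical test matrix
Q, R = gram_schmidt_qr(X)

# The squared-norm identity used above to solve for r_{2,2}.
r22_sq = X[:, 1] @ X[:, 1] - 2 * R[0, 1] * (Q[:, 0] @ X[:, 1]) + R[0, 1] ** 2
```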


With the Q-R Decomposition

$X = QR$
($Q^T Q = I_p$, and $R$ is p × p upper-triangular)

$\hat{\beta} = (X^T X)^{-1} X^T y = R^{-1} Q^T y$

(plug in $X = QR$ and simplify)

$Cov(\hat{\beta}) = \sigma^2 (X^T X)^{-1} = \sigma^2 R^{-1} (R^{-1})^T$

$H = X (X^T X)^{-1} X^T = Q Q^T$

(giving $\hat{y} = Hy$ and $\hat{\epsilon} = (I_n - H) y$)
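These identities can be checked with NumPy's built-in reduced Q-R factorization. The data are hypothetical. NumPy's factor $R$ may have negative diagonal entries, unlike the Gram-Schmidt construction, but the identities hold regardless.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 60, 3
X = rng.standard_normal((n, p))                       # hypothetical design
y = X @ np.array([1.0, 0.0, -1.0]) + 0.2 * rng.standard_normal(n)

Q, R = np.linalg.qr(X)                  # reduced Q-R: Q is n x p, R is p x p

# beta_hat = R^{-1} Q^T y, obtained by solving R beta = Q^T y.
beta_qr = np.linalg.solve(R, Q.T @ y)

# (X^T X)^{-1} = R^{-1} (R^{-1})^T and H = Q Q^T.
R_inv = np.linalg.inv(R)
XtX_inv = R_inv @ R_inv.T
H = Q @ Q.T

beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
```

A triangular solver would exploit the structure of $R$; the general `np.linalg.solve` is used here only to keep the sketch dependent on NumPy alone.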


More Distribution Theory

Assume $y = X\beta + \epsilon$, where $\{\epsilon_i\}$ are i.i.d. $N(0, \sigma^2)$, i.e.,

$\epsilon \sim N_n(0_n, \sigma^2 I_n)$
or $y \sim N_n(X\beta, \sigma^2 I_n)$
Theorem* For any (m × n) matrix $A$ of rank $m \leq n$, the random
normal vector $y$ transformed by $A$,
$z = Ay$
is also a random normal vector:
$z \sim N_m(\mu_z, \Sigma_z)$
where $\mu_z = A E(y) = A X \beta$,
and $\Sigma_z = A \, Cov(y) \, A^T = \sigma^2 A A^T$.
Earlier, $A = (X^T X)^{-1} X^T$ yields the distribution of $\hat{\beta} = Ay$
With a different definition of $A$ (and $z$) we give an easy proof of:

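Theorem For the normal linear regression model

$y = X\beta + \epsilon$,
where $X$ (n × p) has rank p and
$\epsilon \sim N_n(0_n, \sigma^2 I_n)$:
(a) $\hat{\beta} = (X^T X)^{-1} X^T y$ and $\hat{\epsilon} = y - X\hat{\beta}$ are independent r.v.s
(b) $\hat{\beta} \sim N_p(\beta, \sigma^2 (X^T X)^{-1})$
(c) $\sum_{i=1}^{n} \hat{\epsilon}_i^2 = \hat{\epsilon}^T \hat{\epsilon} \sim \sigma^2 \chi^2_{n-p}$ (Chi-squared r.v.)
(d) For each $j = 1, 2, \ldots, p$
$$\hat{t}_j = \frac{\hat{\beta}_j - \beta_j}{\hat{\sigma} \sqrt{C_{j,j}}} \sim t_{n-p} \quad \text{(t-distribution)}$$
where $\hat{\sigma}^2 = \frac{1}{n-p} \sum_{i=1}^{n} \hat{\epsilon}_i^2$
$C_{j,j} = [(X^T X)^{-1}]_{j,j}$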
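Part (d) translates into the usual standard errors and t-statistics. The sketch below uses simulated data with hypothetical parameter values and computes $\hat{\sigma}^2$, the standard errors $\hat{\sigma} \sqrt{C_{j,j}}$, and t-statistics both for the null hypothesis $\beta_j = 0$ and centered at the (here known) true $\beta_j$.

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 80, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])  # hypothetical design
beta_true = np.array([1.0, 0.5, 0.0])                            # hypothetical parameters
y = X @ beta_true + rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)               # C = (X^T X)^{-1}
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

sigma2_hat = resid @ resid / (n - p)           # unbiased estimate of sigma^2
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))    # sigma_hat * sqrt(C_jj)

t_null = beta_hat / se                         # statistics for H0: beta_j = 0
t_true = (beta_hat - beta_true) / se           # distributed t_{n-p} under the model
```

Comparing `t_null` with quantiles of the $t_{n-p}$ distribution gives the standard coefficient tests.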

Proof: Note that (d) follows immediately from (a), (b), (c)

Define $A = \begin{bmatrix} Q^T \\ W^T \end{bmatrix}$, where
$A$ is an (n × n) orthogonal matrix (i.e. $A^T = A^{-1}$)
$Q$ is the column-orthonormal matrix in a Q-R decomposition
of $X$
Note: $W$ can be constructed by continuing the Gram-Schmidt
Orthonormalization process (which was used to construct $Q$ from
$X$) with $X^* = [\, X \;\; I_n \,]$.
Then, consider
$$z = Ay = \begin{bmatrix} Q^T y \\ W^T y \end{bmatrix} = \begin{bmatrix} z_Q \\ z_W \end{bmatrix}, \quad z_Q: (p \times 1), \quad z_W: ((n - p) \times 1)$$

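The distribution of $z = Ay$ is $N_n(\mu_z, \Sigma_z)$

where
$$\mu_z = [A][X\beta] = \begin{bmatrix} Q^T \\ W^T \end{bmatrix} [Q \cdot R \cdot \beta]
= \begin{bmatrix} Q^T Q \\ W^T Q \end{bmatrix} [R \cdot \beta]
= \begin{bmatrix} I_p \\ 0_{(n-p) \times p} \end{bmatrix} [R \cdot \beta]
= \begin{bmatrix} R\beta \\ 0_{n-p} \end{bmatrix}$$
$$\Sigma_z = A \cdot [\sigma^2 I_n] \cdot A^T = \sigma^2 [A A^T] = \sigma^2 I_n$$
since $A^T = A^{-1}$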

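Thus $z = \begin{bmatrix} z_Q \\ z_W \end{bmatrix} \sim N_n\left( \begin{bmatrix} R\beta \\ 0_{n-p} \end{bmatrix}, \sigma^2 I_n \right)$
$\implies z_Q \sim N_p[(R\beta), \sigma^2 I_p]$
$z_W \sim N_{(n-p)}[0_{(n-p)}, \sigma^2 I_{(n-p)}]$
and $z_Q$ and $z_W$ are independent.
The Theorem follows by showing
(a*) $\hat{\beta} = R^{-1} z_Q$ and $\hat{\epsilon} = W z_W$,
(i.e. $\hat{\beta}$ and $\hat{\epsilon}$ are functions of different independent vectors).
(b*) Deducing the distribution of $\hat{\beta} = R^{-1} z_Q$,
applying Theorem* with $A = R^{-1}$ and “$y$” $= z_Q$
(c*) $\hat{\epsilon}^T \hat{\epsilon} = z_W^T z_W$
= sum of (n − p) squared r.v.’s which are i.i.d. $N(0, \sigma^2)$.
$\sim \sigma^2 \chi^2_{(n-p)}$, a scaled Chi-Squared r.v.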


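Proof of (a*)
$\hat{\beta} = R^{-1} z_Q$ follows from
$\hat{\beta} = (X^T X)^{-1} X^T y$ and
$X = QR$ with $Q$: $Q^T Q = I_p$

$$\begin{aligned}
\hat{\epsilon} = y - \hat{y} = y - X\hat{\beta} &= y - (QR) \cdot (R^{-1} z_Q) \\
&= y - Q z_Q \\
&= y - Q Q^T y = (I_n - Q Q^T) y \\
&= W W^T y \quad \text{(since } I_n = A^T A = Q Q^T + W W^T \text{)} \\
&= W z_W
\end{aligned}$$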
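The decomposition used in the proof can be sketched numerically. Instead of continuing Gram-Schmidt on $[\, X \;\; I_n \,]$, this sketch takes $Q$ and $W$ from NumPy's complete (Householder-based) Q-R factorization, which is a substitution made for convenience. The data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 12, 3
X = rng.standard_normal((n, p))     # hypothetical design
y = rng.standard_normal(n)          # hypothetical response

# Complete Q-R: Q_full is n x n orthogonal, R_full is n x p with zeros below row p.
Q_full, R_full = np.linalg.qr(X, mode="complete")
Q, W = Q_full[:, :p], Q_full[:, p:]
R = R_full[:p, :]

A = np.vstack([Q.T, W.T])           # A = [Q^T; W^T], an orthogonal n x n matrix
z = A @ y
z_Q, z_W = z[:p], z[p:]

beta_hat = np.linalg.solve(R, z_Q)  # (a*): beta_hat = R^{-1} z_Q
resid = y - X @ beta_hat            # (a*): equals W z_W
rss = resid @ resid                 # (c*): equals z_W^T z_W
```

Because $\hat{\beta}$ depends only on $z_Q$ and $\hat{\epsilon}$ only on $z_W$, their independence follows from the independence of the components of $z$.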


Maximum-Likelihood Estimation
Consider the normal linear regression model:
$y = X\beta + \epsilon$, where $\{\epsilon_i\}$ are i.i.d. $N(0, \sigma^2)$, i.e.,
$\epsilon \sim N_n(0_n, \sigma^2 I_n)$
or $y \sim N_n(X\beta, \sigma^2 I_n)$
The likelihood function is
$L(\beta, \sigma^2) = p(y \mid X, \beta, \sigma^2)$
where $p(y \mid X, \beta, \sigma^2)$ is the joint probability density function
(pdf) of the conditional distribution of $y$ given data $X$
(known) and parameters $(\beta, \sigma^2)$ (unknown).
The maximum likelihood estimates of $(\beta, \sigma^2)$ are the values
maximizing $L(\beta, \sigma^2)$, i.e., those which make the observed
data $y$ most likely in terms of its pdf.

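Because the $y_i$ are independent r.v.’s with $y_i \sim N(\mu_i, \sigma^2)$ where
$\mu_i = \sum_{j=1}^{p} \beta_j x_{i,j}$,
$$\begin{aligned}
L(\beta, \sigma^2) &= \prod_{i=1}^{n} p(y_i \mid \beta, \sigma^2) \\
&= \prod_{i=1}^{n} \left[ \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2\sigma^2} \left( y_i - \sum_{j=1}^{p} \beta_j x_{i,j} \right)^2} \right] \\
&= \frac{1}{(2\pi\sigma^2)^{n/2}} \, e^{-\frac{1}{2} (y - X\beta)^T (\sigma^2 I_n)^{-1} (y - X\beta)}
\end{aligned}$$
The maximum likelihood estimates $(\hat{\beta}, \hat{\sigma}^2)$ maximize the
log-likelihood function (dropping constant terms)
$$\begin{aligned}
\log L(\beta, \sigma^2) &= -\frac{n}{2} \log(\sigma^2) - \frac{1}{2} (y - X\beta)^T (\sigma^2 I_n)^{-1} (y - X\beta) \\
&= -\frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} Q(\beta)
\end{aligned}$$
where $Q(\beta) = (y - X\beta)^T (y - X\beta)$ (“Least-Squares Criterion”!)
The OLS estimate $\hat{\beta}$ is also the ML-estimate.
The ML estimate of $\sigma^2$ solves
$$\frac{\partial \log L(\hat{\beta}, \sigma^2)}{\partial (\sigma^2)} = 0, \quad \text{i.e.,} \quad -\frac{n}{2} \frac{1}{\sigma^2} - \frac{1}{2} (-1) (\sigma^2)^{-2} Q(\hat{\beta}) = 0$$
$\implies \hat{\sigma}^2_{ML} = Q(\hat{\beta}) / n = \left( \sum_{i=1}^{n} \hat{\epsilon}_i^2 \right) / n$ (biased!)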
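A small numerical check of the ML results, assuming simulated data with hypothetical parameter values. The log-likelihood (up to its additive constant) is evaluated at $(\hat{\beta}, \hat{\sigma}^2_{ML})$ and at nearby values, and the biased ML variance estimate is compared with the unbiased one that divides by $n - p$.

```python
import numpy as np

rng = np.random.default_rng(10)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.standard_normal(n)])   # hypothetical design
y = X @ np.array([0.5, 2.0]) + 0.7 * rng.standard_normal(n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimate = ML estimate
rss = np.sum((y - X @ beta_hat) ** 2)              # Q(beta_hat)

def log_lik(beta, sigma2):
    """Log-likelihood up to the additive constant -(n/2) log(2 pi)."""
    q = np.sum((y - X @ beta) ** 2)
    return -0.5 * n * np.log(sigma2) - q / (2.0 * sigma2)

sigma2_ml = rss / n                  # ML estimate (biased downward)
sigma2_unbiased = rss / (n - p)      # divides by the residual degrees of freedom

ll_max = log_lik(beta_hat, sigma2_ml)
ll_nearby = [log_lik(beta_hat, s * sigma2_ml) for s in (0.5, 0.9, 1.1, 2.0)]
```

The bias arises because $E[Q(\hat{\beta})] = (n - p)\sigma^2$, which follows from part (c) of the theorem above.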


Generalized M Estimation
For data $y$, $X$ fit the linear regression model
$y_i = x_i^T \beta + \epsilon_i$, $i = 1, 2, \ldots, n$.
by specifying $\beta = \hat{\beta}$ to minimize
$Q(\beta) = \sum_{i=1}^{n} h(y_i, x_i, \beta, \sigma^2)$
The choice of the function $h(\,)$ distinguishes different estimators.
(1) Least Squares: $h(y_i, x_i, \beta, \sigma^2) = (y_i - x_i^T \beta)^2$

(2) Mean Absolute Deviation (MAD): $h(y_i, x_i, \beta, \sigma^2) = |y_i - x_i^T \beta|$

(3) Maximum Likelihood (ML): Assume the $y_i$ are independent
with pdf’s $p(y_i \mid \beta, x_i, \sigma^2)$,
$h(y_i, x_i, \beta, \sigma^2) = -\log p(y_i \mid \beta, x_i, \sigma^2)$
(4) Robust M-Estimator: $h(y_i, x_i, \beta, \sigma^2) = \chi(y_i - x_i^T \beta)$
$\chi(\,)$ is even, monotone increasing on $(0, \infty)$.

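(5) Quantile Estimator: For $\tau$: $0 < \tau < 1$, a fixed quantile

$$h(y_i, x_i, \beta, \sigma^2) = \begin{cases} \tau \, |y_i - x_i^T \beta|, & \text{if } y_i \geq x_i^T \beta \\ (1 - \tau) \, |y_i - x_i^T \beta|, & \text{if } y_i < x_i^T \beta \end{cases}$$

E.g., $\tau = 0.90$ corresponds to the 90th percentile

$\tau = 0.50$ corresponds to the MAD Estimator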
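The different choices of $h$ can be compared on the simplest possible model, a constant fit $y_i = b + \epsilon_i$. The sketch below is illustrative only: the helper names, the grid search, and the Huber function (one common choice of $\chi$, not specified in the lecture) are assumptions.

```python
import numpy as np

def h_least_squares(r):
    return r ** 2

def h_mad(r):
    return np.abs(r)

def h_huber(r, k=1.345):
    """One common chi: quadratic near 0, linear in the tails (assumed choice)."""
    a = np.abs(r)
    return np.where(a <= k, 0.5 * r ** 2, k * a - 0.5 * k ** 2)

def h_quantile(r, tau):
    """Quantile loss: tau*|r| if r >= 0, (1 - tau)*|r| otherwise."""
    return np.where(r >= 0, tau * np.abs(r), (1.0 - tau) * np.abs(r))

def fit_constant(y, h, grid):
    """Minimize Q(b) = sum_i h(y_i - b) over candidate constants b in grid."""
    Q = np.array([np.sum(h(y - b)) for b in grid])
    return grid[np.argmin(Q)]

rng = np.random.default_rng(11)
y = rng.standard_normal(501)          # hypothetical sample
grid = np.sort(y)                     # piecewise-linear losses attain a minimum at a data point

b_mad = fit_constant(y, h_mad, grid)
b_q50 = fit_constant(y, lambda r: h_quantile(r, 0.5), grid)
b_q90 = fit_constant(y, lambda r: h_quantile(r, 0.9), grid)
```

For the constant model, the MAD fit recovers the sample median and the $\tau = 0.90$ fit lands near the sample 90th percentile; with regressors, the same losses are minimized over $\beta$ instead of a single constant.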


MIT OpenCourseWare

18.S096 Topics in Mathematics with Applications in Finance

Fall 2013

For information about citing these materials or our Terms of Use, visit:
