
Ordinary Least Squares

I. What is Ordinary Least Squares (OLS)?

There are a number of techniques for constructing the SRF (i.e. estimating the
regression model, or estimating $\beta_0, \beta_1$). But the most common and extensively
used procedure is OLS, the least squares criterion.

First, recall that the residual from any SRF can be written:

$$e_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i$$

Suppose we use the following criterion in choosing the SRF.

$$\min \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2$$

where n is the sample size and the summation takes place over the sample. We want to make the
Residual Sum of Squares (RSS) as small as possible. This is our objective function, the least
squares criterion, and it defines the goal of the estimation.

Show this in the following diagram

II. OLS Estimators



Solving this minimization problem, we get a general formula for the estimated
slope coefficient:

n X i Y i -  X i  Y i ( X i - X ) ( Y i - Y ) xi yi
̂ 1 =  
n X i2 - (  X i )2 ( X i - X )2  xi2

where the ‘bars’ over X and Y indicate sample means of X and Y respectively.
From now on, lower-case letters for these variables indicate deviations from the
means (that is, $x_i \equiv X_i - \bar{X}$, $y_i \equiv Y_i - \bar{Y}$). And

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

Note that both solutions are 'unique'.
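
To make the formulas concrete, here is a minimal Python sketch (not part of the original notes) that applies them to simulated data; the data-generating process and variable names are purely illustrative assumptions.

```python
# A minimal sketch: OLS estimates for the two-variable model using the
# closed-form formulas above. The simulated data (true beta0 = 1, beta1 = 2)
# are a hypothetical example, not from the notes.
import numpy as np

rng = np.random.default_rng(123)
n = 100
X = rng.normal(size=n)
Y = 1.0 + 2.0 * X + rng.normal(size=n)        # assumed data-generating process

x = X - X.mean()                              # deviations from sample means
y = Y - Y.mean()

beta1_hat = np.sum(x * y) / np.sum(x**2)      # slope: sum(x_i*y_i) / sum(x_i^2)
beta0_hat = Y.mean() - beta1_hat * X.mean()   # intercept: Ybar - beta1_hat*Xbar

print(beta1_hat, beta0_hat)                   # should be close to 2 and 1
```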

III. Why Ordinary Least Squares (OLS)? Any Alternative?

Some algebraic properties of OLS:

1) the estimated regression line passes through the sample means of Y and X
(see the formula for $\hat{\beta}_0$).

2) the mean of the fitted values is equal to the mean of the dependent variable
(that is, $\bar{\hat{Y}} = \bar{Y}$).

3) we can re-write this regression in its 'deviation form', that is

$$y_i = \hat{\beta}_1 x_i + e_i$$

4) the residuals are uncorrelated with the fitted values of y: $\sum \hat{y}_i e_i = 0$

Statistical properties of OLS will be discussed in Chapter 4.
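
As a quick numerical check (not part of the original notes), the sketch below verifies properties 2) and 4) on simulated data; the data are an illustrative assumption.

```python
# Verify two algebraic properties of OLS on simulated data:
#   2) the mean of the fitted values equals the mean of Y
#   4) the residuals are uncorrelated with the fitted values (sum of yhat_i*e_i = 0)
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=50)
Y = 3.0 - 1.5 * X + rng.normal(size=50)       # hypothetical data

x, y = X - X.mean(), Y - Y.mean()
b1 = np.sum(x * y) / np.sum(x**2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X
e = Y - Y_hat
y_hat = Y_hat - Y_hat.mean()                  # fitted values in deviation form

print(np.isclose(Y_hat.mean(), Y.mean()))     # property 2): True
print(np.isclose(np.sum(y_hat * e), 0.0))     # property 4): True
```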


An alternative to OLS is Least Absolute Deviations (LAD): $\min \sum_{i=1}^{n} |e_i|$
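
The sketch below (not from the notes) shows one way to compute LAD estimates numerically; unlike OLS, LAD has no simple closed-form solution, so a numerical optimizer is used. The data, starting values, and the use of scipy's Nelder-Mead routine are illustrative assumptions.

```python
# Least Absolute Deviations (LAD): minimize the sum of |e_i| numerically.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.normal(size=60)
Y = 1.0 + 2.0 * X + rng.standard_t(df=3, size=60)   # heavy-tailed errors (illustrative)

def lad_objective(b):
    # Sum of absolute residuals |Y_i - b0 - b1*X_i|
    return np.sum(np.abs(Y - b[0] - b[1] * X))

fit = minimize(lad_objective, x0=np.zeros(2), method="Nelder-Mead")
print("LAD estimates (b0, b1):", fit.x)
```

LAD is less sensitive to outliers than OLS, which is the usual motivation for considering it.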

IX. Total, Explained, and Residual Sums of Squares

OLS only tells us how to obtain the estimated regression function (i.e. the SRF) from
data. But how well does the line fit the data? To address this issue, we need three
concepts.

Let’s begin with the following expression,

$$Y_i = \hat{Y}_i + e_i$$

which we can put in the 'deviation' form:

$$y_i = \hat{y}_i + e_i$$
Squaring both sides, and summing over the sample we get:

$$\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2 + 2\sum \hat{y}_i e_i = \sum \hat{y}_i^2 + \sum e_i^2$$

where we make use of the fact that the fitted values of y and the residuals are
uncorrelated, and the ‘deviation form’ of the regression function.

This has the following interpretation. The Total Sum of Squares (TSS) is equal
to the sum of the Explained Sum of Squares (ESS) and the Residual Sum of
Squares (RSS). This is known as the “decomposition of variance”:

TSS = ESS + RSS



The total variation in the dependent variable can be attributed to the regression
line (the explained forces) and the residuals (the unexplained forces). The
smaller the RSS, the better the fit of the line to the data.

X. A Measure of 'Goodness of Fit' (R2)

Thus far we've concentrated on the estimation of the coefficients in our
regression. We now consider how well the regression function 'fits' the data.

The Coefficient of Determination (R2) is the summary measure in a two-variable
regression model that indicates the magnitude of this 'goodness of fit'.

R2 is defined as:
$$R^2 = \frac{ESS}{TSS} = \frac{\sum \hat{y}_i^2}{\sum y_i^2}$$

or

$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum e_i^2}{\sum y_i^2}$$

This Coefficient of Determination measures “... the percentage of the total
variation in $Y_i$ explained by the regression model”. It is always between 0 and 1.
A larger R2 means higher explanatory power for the explanatory variable.
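
The following sketch (not part of the original notes) computes TSS, ESS and RSS for a fitted two-variable regression, checks the decomposition TSS = ESS + RSS from the previous section, and computes R2 using both of the equivalent definitions above; the simulated data are an illustrative assumption.

```python
# TSS, ESS, RSS and R^2 for a simple OLS fit on simulated data.
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=80)
Y = 0.5 + 1.2 * X + rng.normal(size=80)       # hypothetical data

x, y = X - X.mean(), Y - Y.mean()
b1 = np.sum(x * y) / np.sum(x**2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X
e = Y - Y_hat

TSS = np.sum((Y - Y.mean())**2)
ESS = np.sum((Y_hat - Y.mean())**2)
RSS = np.sum(e**2)

print(np.isclose(TSS, ESS + RSS))             # decomposition of variance: True
print(ESS / TSS, 1 - RSS / TSS)               # the two definitions of R^2 agree
```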

XI. Multiple Linear Regression (MLR) Model

Often Y is affected by many variables instead of just one. In this case we have
to use a MLR:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_K X_{Ki} + \varepsilon_i$$

The slope coefficients are often called partial regression coefficients, as they
indicate the change in the dependent variable associated with a one-unit increase
in the independent variable in question, holding constant the other independent
variables in the equation.

OLS applies in the same way to MLR. That is, we minimize the RSS:
$$\min \sum_{i=1}^{n} \left(Y_i - \beta_0 - \beta_1 X_{1i} - \cdots - \beta_K X_{Ki}\right)^2$$

The values of $\beta_0, \beta_1, \ldots, \beta_K$ that minimize the RSS are called the OLS estimates and
are denoted by $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_K$.

Formulae for the OLS estimates? The OLS estimates still have a closed-form expression,
and hence the solutions are unique. However, the expressions quickly become very
complicated as K grows. It is much easier to use matrix algebra to obtain a simple
expression for the OLS estimates; this will be covered in a more advanced
econometrics course.
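
For completeness, here is a sketch (not from the notes) of the matrix-algebra solution referred to above: stacking the regressors in a matrix X with a leading column of ones for the intercept, the OLS estimates solve the system (X'X)b = X'Y. The simulated data and true coefficients are illustrative assumptions.

```python
# OLS for the multiple regression model via matrix algebra.
import numpy as np

rng = np.random.default_rng(7)
n, K = 200, 3
X_vars = rng.normal(size=(n, K))              # K explanatory variables (illustrative)
beta_true = np.array([1.0, 0.5, -2.0, 3.0])   # [beta0, beta1, ..., betaK] (assumed)
Y = beta_true[0] + X_vars @ beta_true[1:] + rng.normal(size=n)

X = np.column_stack([np.ones(n), X_vars])     # design matrix with intercept column
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)  # unique solution of the normal equations

print(beta_hat)                               # should be close to beta_true
```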

XII. Coefficient of Determination

The Coefficient of Determination (R2) has the same interpretation as in the SLR
model and can be defined in the same way, that is,

TSS = ESS + RSS

where
$$ESS = \sum \hat{y}_i^2 = \sum \left(\hat{Y}_i - \bar{Y}\right)^2 = \sum \left(\hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \cdots + \hat{\beta}_K X_{Ki} - \bar{Y}\right)^2$$

And R2 is defined as:


$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum e_i^2}{\sum y_i^2} = \frac{ESS}{TSS} = \frac{\sum \hat{y}_i^2}{\sum y_i^2}$$

However, there is a potential problem here. Suppose you have an MLR with K
variables, and you now decide to add one more explanatory variable. Suppose your
goal is to get a larger value for R2. In reality, no matter what that variable is, the
R2 surely cannot decrease. That is, R2 is a non-decreasing function of the number
of regressors, since it does not penalize the loss of degrees of freedom.

For this reason, you cannot simply choose a regression model with a larger
value of R2. We need to develop an alternative measure of the “goodness of fit”.


We now introduce such an alternative measure, the adjusted coefficient of
determination, $\bar{R}^2$. Define the adjusted R2 ($\bar{R}^2$) as:

$$\bar{R}^2 = 1 - \frac{\sum e_i^2 / (n - K - 1)}{\sum y_i^2 / (n - 1)} = 1 - \frac{RSS/(n - K - 1)}{TSS/(n - 1)}$$

where K is the number of slope coefficients in the model and n - K - 1 is the degrees
of freedom, that is, the excess of the number of observations over the number of
estimated coefficients (including the intercept).

Why is $\bar{R}^2$ better? When K increases, RSS surely drops, but the denominator (n -
K - 1) also drops. Now if the drop in RSS is not large enough, RSS/(n - K - 1) will
actually increase, so that $\bar{R}^2$ will decrease. In other words, $\bar{R}^2$ penalizes the
measure of fit for adding an explanatory variable if that variable does not
contribute much toward explaining the variation in Y.
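
The sketch below (not from the notes) computes both R2 and the adjusted R2 for an MLR, and illustrates the point above: adding a pure-noise regressor never lowers R2 but can lower the adjusted R2. The data, function name, and variable names are illustrative assumptions.

```python
# R^2 and adjusted R^2 for an OLS fit; compare a model with and without an
# irrelevant (pure noise) regressor.
import numpy as np

def r2_and_adj_r2(X_vars, Y):
    """Return (R^2, adjusted R^2) for an OLS fit of Y on X_vars plus an intercept."""
    n, K = X_vars.shape
    X = np.column_stack([np.ones(n), X_vars])
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ beta_hat
    RSS = np.sum(e**2)
    TSS = np.sum((Y - Y.mean())**2)
    r2 = 1 - RSS / TSS
    adj_r2 = 1 - (RSS / (n - K - 1)) / (TSS / (n - 1))
    return r2, adj_r2

rng = np.random.default_rng(2024)
n = 100
X1 = rng.normal(size=(n, 1))
Y = 2.0 + 1.0 * X1[:, 0] + rng.normal(size=n)
noise = rng.normal(size=(n, 1))               # an irrelevant regressor

print(r2_and_adj_r2(X1, Y))                               # base model
print(r2_and_adj_r2(np.column_stack([X1, noise]), Y))     # R^2 rises, adj R^2 may fall
```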

However, do not misuse $\bar{R}^2$: 1) you must also care about economic theory and intuition;
2) do not compare $\bar{R}^2$'s when the dependent variables are different.

XIII. Example 2.2.3

IV. Continue the height regression Q2.11

V. Appendix (Optional):

1. Deriving OLS estimates in the SLR.

Consider the first order conditions, that is, differentiate the objective function
with respect to the choice variables (i.e. $\hat{\beta}_0, \hat{\beta}_1$) and set them equal to zero:

$$\frac{\partial \left(\sum e_i^2\right)}{\partial \hat{\beta}_0} = -2\sum \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right) = -2\sum e_i = 0$$

$$\frac{\partial \left(\sum e_i^2\right)}{\partial \hat{\beta}_1} = -2\sum \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right) X_i = -2\sum e_i X_i = 0$$

Rearranging terms we get:

$$\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_i$$
$$\sum Y_i X_i = \hat{\beta}_0 \sum X_i + \hat{\beta}_1 \sum X_i^2$$
where 'n' is the sample size. These simultaneous equations are known as the
Normal Equations. Solving them, we get a general formula for the estimated
slope coefficient:

n X i Y i -  X i  Y i
ˆ1 =
n X i2 - (  X i )2
( X i - X ) ( Y i - Y )
=
( X i - X )2
x i y i
=
 xi2

where the ‘bars’ over X and Y indicate sample means. Substituting this back
into the first normal equation we have
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$
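
As an optional check (not part of the original notes), the following sketch uses sympy to solve the two first-order conditions symbolically for a tiny sample and confirms that the slope solution matches the closed-form formula above; the use of sympy and the sample size n = 3 are assumptions for illustration.

```python
# Symbolically derive the OLS slope from the first-order conditions (n = 3).
import sympy as sp

n = 3
X = sp.symbols('X1:4')                        # X1, X2, X3
Y = sp.symbols('Y1:4')                        # Y1, Y2, Y3
b0, b1 = sp.symbols('b0 b1')

rss = sum((Y[i] - b0 - b1 * X[i])**2 for i in range(n))
foc = [sp.diff(rss, b0), sp.diff(rss, b1)]    # first-order conditions
sol = sp.solve(foc, [b0, b1], dict=True)[0]

# Compare with (n*sum(X_i*Y_i) - sum(X_i)*sum(Y_i)) / (n*sum(X_i^2) - (sum(X_i))^2)
sxy = sum(X[i] * Y[i] for i in range(n))
sx, sy, sx2 = sum(X), sum(Y), sum(xi**2 for xi in X)
closed_form = (n * sxy - sx * sy) / (n * sx2 - sx**2)
print(sp.simplify(sol[b1] - closed_form))     # prints 0
```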

2. Proof of Algebraic Properties of OLS Estimates.

1) Recall that a line $y = a + bx$ goes through a point $(x_0, y_0)$ if and only if
$y_0 = a + b x_0$. Now we need to prove that the estimated regression line

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$

goes through $(\bar{X}, \bar{Y})$. This is obvious from the expression for $\hat{\beta}_0$.

2) This is also easy to prove because

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i = \left(\bar{Y} - \hat{\beta}_1 \bar{X}\right) + \hat{\beta}_1 X_i = \bar{Y} + \hat{\beta}_1 (X_i - \bar{X})$$

If we sum both sides of the last equality over the sample and divide by the sample
size n, the second term on the right-hand side will disappear (since $\sum (X_i - \bar{X}) = 0$).
Hence

$$\bar{\hat{Y}} = \bar{Y}$$

3) First, the basic SRF can be written:

$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + e_i$$

Substituting in the expression for $\hat{\beta}_0$ and rearranging terms we have

$$Y_i - \bar{Y} = \hat{\beta}_1 (X_i - \bar{X}) + e_i$$

or

$$y_i = \hat{\beta}_1 x_i + e_i$$

4) To prove that the residuals are uncorrelated with the fitted values, we have

$$\sum \hat{y}_i e_i = \hat{\beta}_1 \sum x_i e_i = \hat{\beta}_1 \sum x_i \left(y_i - \hat{\beta}_1 x_i\right) = \hat{\beta}_1 \sum x_i y_i - \hat{\beta}_1^2 \sum x_i^2 = \hat{\beta}_1^2 \sum x_i^2 - \hat{\beta}_1^2 \sum x_i^2 = 0$$

where the last step uses $\sum x_i y_i = \hat{\beta}_1 \sum x_i^2$, which follows from the formula for $\hat{\beta}_1$.
