There are a number of techniques for constructing the SRF (i.e. estimating the
regression model, or estimating \beta_0 and \beta_1). But the most common and most
extensively used procedure is OLS, the (ordinary) least squares criterion.
First, recall that the residual from any SRF can be written:
e_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i

\min \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
where n is the sample size and the summation runs over the sample. We want to make
the Residual Sum of Squares (RSS) as small as possible. This is our objective
function, the least squares criterion, and the goal of the estimation.
Solving this minimization problem, we get a general formula for the estimated
slope coefficient:
\hat{\beta}_1 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - (\sum X_i)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum x_i y_i}{\sum x_i^2}
where the ‘bars’ over X and Y indicate sample means of X and Y respectively.
From now on, lower case letters for these variables indicate deviations from the
means (that is, x_i \equiv X_i - \bar{X} and y_i \equiv Y_i - \bar{Y}). And

\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
The OLS estimates have some useful properties:
1) the estimated regression line passes through the sample means of Y and X
(see the formula for \hat{\beta}_0).
2) the mean of the fitted values is equal to the mean of the dependent variable
(that is, \bar{\hat{Y}} = \bar{Y}).
3) in deviation form, the estimated regression can be written as

y_i = \hat{\beta}_1 x_i + e_i

4) the residuals are uncorrelated with the fitted values (\sum \hat{y}_i e_i = 0).
These properties are proved in the Appendix.
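As a quick numerical illustration (the small data set and variable names below are made up, and NumPy is simply one convenient way to form the sums), the slope and intercept formulas, and property 2), can be checked directly:

```python
import numpy as np

# Hypothetical sample data, for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x = X - X.mean()   # deviations from the mean, x_i = X_i - Xbar
y = Y - Y.mean()   # deviations from the mean, y_i = Y_i - Ybar

beta1_hat = np.sum(x * y) / np.sum(x ** 2)   # slope: sum(x_i*y_i) / sum(x_i^2)
beta0_hat = Y.mean() - beta1_hat * X.mean()  # intercept: Ybar - beta1_hat * Xbar

Y_hat = beta0_hat + beta1_hat * X            # fitted values
e = Y - Y_hat                                # residuals

print(beta1_hat, beta0_hat)
print(Y_hat.mean(), Y.mean())                # property 2): the two means coincide
```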
OLS only tells us how to obtain the estimated regression function (i.e. the SRF) from
the data. But how well does the line fit the data? To address this question, we need
three concepts.
Y_i = \hat{Y}_i + e_i

y_i = \hat{y}_i + e_i

Squaring both sides, and summing over the sample, we get:

\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2 + 2 \sum \hat{y}_i e_i = \sum \hat{y}_i^2 + \sum e_i^2
where we make use of the fact that the fitted values of y and the residuals are
uncorrelated, and the ‘deviation form’ of the regression function.
This has the following interpretation. The Total Sum of Squares (TSS) is equal
to the sum of the Explained Sum of Squares (ESS) and the Residual Sum of
Squares (RSS). It is known as the "decomposition of variance".
The total variation in the dependent variable can be attributed to the regression
line (the explained forces) and the residuals (the unexplained forces). The
smaller the RSS, the better the fit of the line to the data.
R^2 is defined as:

R^2 = \frac{ESS}{TSS} = \frac{\sum \hat{y}_i^2}{\sum y_i^2}

or

R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum e_i^2}{\sum y_i^2}
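As a minimal sketch of the decomposition of variance and the two equivalent definitions of R^2 (reusing the same made-up data as above so the block runs on its own):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# OLS fit as before
x, y = X - X.mean(), Y - Y.mean()
beta1_hat = np.sum(x * y) / np.sum(x ** 2)
beta0_hat = Y.mean() - beta1_hat * X.mean()
Y_hat = beta0_hat + beta1_hat * X
e = Y - Y_hat

TSS = np.sum((Y - Y.mean()) ** 2)       # total sum of squares
ESS = np.sum((Y_hat - Y.mean()) ** 2)   # explained sum of squares
RSS = np.sum(e ** 2)                    # residual sum of squares

print(np.isclose(TSS, ESS + RSS))       # TSS = ESS + RSS
print(ESS / TSS, 1 - RSS / TSS)         # the two definitions of R^2 agree
```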
Often Y is affected by many variables instead of just one. In this case we have
to use an MLR:

Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + \varepsilon_i
The slope coefficients are often called partial regression coefficients, as they
indicate the change in the dependent variable associated with a one-unit increase
in the corresponding regressor, holding the other regressors constant.
OLS applies in the same way to MLR. That is, we minimize the RSS:
\min \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_{1i} - \dots - \beta_K X_{Ki})^2
The values of \beta_0, \beta_1, \dots, \beta_K that minimize the RSS are called the OLS estimates and
are denoted by \hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_K.
Formulae for the OLS estimates? The OLS estimates still have closed-form expressions,
and hence the solutions are unique. However, the expressions quickly become very
complicated as K grows. It is much easier to use matrix algebra to obtain a simple
expression for the OLS estimates. This will be covered in a more advanced
econometrics course.
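Although the derivation is left to that course, a hedged sketch of the matrix approach (the data below are made up, with K = 2 regressors) is to stack a column of ones with the regressors and solve the least squares problem, which is numerically equivalent to the closed-form expression \hat{\beta} = (X'X)^{-1} X'Y:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 50, 2

X_vars = rng.normal(size=(n, K))                  # two made-up regressors
eps = rng.normal(scale=0.5, size=n)               # disturbance term
Y = 1.0 + 2.0 * X_vars[:, 0] - 1.5 * X_vars[:, 1] + eps

X = np.column_stack([np.ones(n), X_vars])         # prepend a column of ones for the intercept

# OLS estimates; lstsq solves the least squares problem without forming an explicit inverse
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(beta_hat)   # roughly [1.0, 2.0, -1.5]
```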
The Coefficient of Determination (R^2) has the same interpretation as in the SLR
model and can be defined in the same way, that is, R^2 = ESS/TSS = 1 - RSS/TSS, where

ESS = \sum \hat{y}_i^2 = \sum (\hat{Y}_i - \bar{Y})^2 = \sum (\hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \dots + \hat{\beta}_K X_{Ki} - \bar{Y})^2
However, there is a potential problem here. Suppose you have an MLR with K
variables, and you now decide to add one more explanatory variable. Suppose your
goal is to get a larger value of R^2. In reality, no matter what that variable is, R^2
surely cannot decrease. That is, R^2 is a non-decreasing function of the number
of regressors, since it does not penalize the loss of degrees of freedom.
For this reason, you cannot simply choose the regression model with the largest R^2.
A measure of fit that does adjust for the degrees of freedom is the adjusted R^2,
denoted \bar{R}^2:

\bar{R}^2 = 1 - \frac{\sum e_i^2 / (n - K - 1)}{\sum y_i^2 / (n - 1)} = 1 - \frac{RSS/(n - K - 1)}{TSS/(n - 1)}
where K is the number of slope coefficients in the model and n - K - 1 is the degrees
of freedom, that is, the excess of the number of observations over the number of
estimated coefficients (including the intercept).
Why is \bar{R}^2 better? When K increases, the RSS surely drops, but the denominator
(n - K - 1) also drops. Now, if the drop in RSS is not large enough, RSS/(n - K - 1) will
actually increase, so that \bar{R}^2 will decrease. In other words, \bar{R}^2 penalizes the
measure of fit for adding an explanatory variable if that variable does not
contribute much toward explaining the variation in Y.
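To see the penalty at work, the sketch below (made-up data; the helper r2_and_adj_r2 is mine, not from the notes) adds a pure-noise regressor to a model: R^2 never falls, while \bar{R}^2 typically does.

```python
import numpy as np

def r2_and_adj_r2(X_vars, Y):
    """Return (R^2, adjusted R^2) for an OLS fit of Y on X_vars plus an intercept."""
    n, K = X_vars.shape
    X = np.column_stack([np.ones(n), X_vars])
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    RSS = np.sum((Y - X @ beta_hat) ** 2)
    TSS = np.sum((Y - Y.mean()) ** 2)
    return 1 - RSS / TSS, 1 - (RSS / (n - K - 1)) / (TSS / (n - 1))

rng = np.random.default_rng(1)
n = 40
X1 = rng.normal(size=(n, 1))
Y = 1.0 + 2.0 * X1[:, 0] + rng.normal(size=n)
noise = rng.normal(size=(n, 1))                   # an irrelevant regressor

print(r2_and_adj_r2(X1, Y))                       # baseline model
print(r2_and_adj_r2(np.hstack([X1, noise]), Y))   # R^2 rises; adjusted R^2 may fall
```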
V. Appendix (Optional):
Consider the first order conditions, that is, differentiate the objective function
with respect to the choice variables (i.e. \hat{\beta}_0 and \hat{\beta}_1) and set them equal to zero:
\frac{\partial \left( \sum e_i^2 \right)}{\partial \hat{\beta}_0} = -2 \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = -2 \sum e_i = 0

\frac{\partial \left( \sum e_i^2 \right)}{\partial \hat{\beta}_1} = -2 \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) X_i = -2 \sum e_i X_i = 0

Rearranging terms we get:

\sum Y_i = n \hat{\beta}_0 + \hat{\beta}_1 \sum X_i

\sum X_i Y_i = \hat{\beta}_0 \sum X_i + \hat{\beta}_1 \sum X_i^2
where 'n' is the sample size. These simultaneous equations are known as the
Normal Equations. Solving them, we get a general formula for the estimated
slope coefficient:
\hat{\beta}_1 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - (\sum X_i)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum x_i y_i}{\sum x_i^2}
where the ‘bars’ over X and Y indicate sample means. Substituting this back
into the first normal equation we have
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
1) Recall that a line y = a + bx goes through a point (x_0, y_0) if and only if
y_0 = a + b x_0. Now we need to prove that the estimated regression line passes
through (\bar{X}, \bar{Y}). Substituting the formula for \hat{\beta}_0 into the fitted line gives

\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i = (\bar{Y} - \hat{\beta}_1 \bar{X}) + \hat{\beta}_1 X_i = \bar{Y} + \hat{\beta}_1 (X_i - \bar{X})

so that \hat{Y}_i = \bar{Y} when X_i = \bar{X}.

2) If we sum both sides of the last equality over the sample and divide by
the sample size n, the second term on the right-hand side disappears
(since \sum (X_i - \bar{X}) = 0). Hence

\bar{\hat{Y}} = \bar{Y}
3) Combining Y_i = \hat{Y}_i + e_i with the expression for \hat{Y}_i above, we have

Y_i - \bar{Y} = \hat{\beta}_1 (X_i - \bar{X}) + e_i

or

y_i = \hat{\beta}_1 x_i + e_i
4) To prove that the residuals are uncorrelated with the fitted values, we have

\sum \hat{y}_i e_i = \hat{\beta}_1 \sum x_i e_i = \hat{\beta}_1 \sum x_i (y_i - \hat{\beta}_1 x_i) = \hat{\beta}_1 \sum x_i y_i - \hat{\beta}_1^2 \sum x_i^2 = \hat{\beta}_1^2 \sum x_i^2 - \hat{\beta}_1^2 \sum x_i^2 = 0

where the last step uses \hat{\beta}_1 = \sum x_i y_i / \sum x_i^2.
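As an optional numerical check of these results (using the same made-up data as in the earlier sketches), the first order conditions imply \sum e_i = 0 and \sum x_i e_i = 0, and hence \sum \hat{y}_i e_i = 0:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x, y = X - X.mean(), Y - Y.mean()
beta1_hat = np.sum(x * y) / np.sum(x ** 2)
beta0_hat = Y.mean() - beta1_hat * X.mean()
Y_hat = beta0_hat + beta1_hat * X
e = Y - Y_hat

print(np.isclose(np.sum(e), 0))          # first normal equation: residuals sum to zero
print(np.isclose(np.sum(x * e), 0))      # second normal equation (in deviation form)
print(np.isclose(np.sum(Y_hat * e), 0))  # fitted values uncorrelated with residuals
```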