
Multiple Regression Analysis

y = 0 + 1x1 + 2x2 + . . . kxk + u

1. Estimation



Recap of Simple Regression
Assumptions:
The population model is linear in parameters:
y = 0 + 1x + u [SLR.1]
There is a random sample of size n, {(xi, yi): i=1,
2, …, n}, from the population model. [SLR.2]
Assume E(u|x) = 0 and thus E(ui|xi) = 0 [SLR.3]
Assume there is variation in the xi [SLR.4]
Var(u|x) = σ² [SLR.5]



Recap of Simple Regression
Applying the least squares method yields estimators of $\beta_0$ and $\beta_1$.
R² = 1 − SSR/SST is a measure of goodness of fit.
Under SLR.1-SLR.4 the OLS estimators
are unbiased…
…and, with SLR.5, have variances which
can be estimated from the sample…



Recap of Simple Regression
… if we estimate σ² with SSR/(n − 2).
Adding SLR.6, Normality, gives normally
distributed errors and allows us to state that …
$$\frac{\hat\beta_1 - \beta_1}{\hat\sigma / s_x} = \frac{\hat\beta_1 - \beta_1}{se(\hat\beta_1)} \sim t_{n-2}, \qquad s_x = \Big(\sum (x_i - \bar x)^2\Big)^{1/2}$$
…which is the basis of statistical inference.
Alternative, useful, functional forms are possible.
Limitations of Simple Regression
In simple regression we explicitly control for only
a single explanatory variable.
We deal with this by assuming SLR.3
e.g. wage = β0 + β1educ + β2exper + u
Simple regression of wage on educ puts exper in u
and assumes educ and u independent.
Simple regression puts a lot of weight on
conditional mean independence.



Multiple Regression Model
In the population we assume
y = 0 + 1x1 + 2x2 + . . . kxk + u

We are still explaining y.


There are k explanatory variables.
There are k+1 parameters.
k = 1 gets us back to simple regression.
Parallels with Simple Regression
$\beta_0$ is still the intercept
$\beta_1$ to $\beta_k$ are all called slope parameters
u is still the error term (or disturbance)
Still need to make a zero conditional mean
assumption, so now assume that
E(u|x1,x2, …,xk) = 0 or E(u|x) = 0
Still minimizing the sum of squared
residuals, however...
Applying Least Squares
A residual is
$$\hat u_i = y_i - \hat y_i$$
and a fitted value is
$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \hat\beta_2 x_{i2} + \dots + \hat\beta_k x_{ik}$$
so
$$\sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} (y_i - \hat y_i)^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \dots - \hat\beta_k x_{ik})^2$$
Some Important Notation
• xij is the i’th observation on the j’th
explanatory variable
• e.g. x32 is the 3rd observation on explanatory
variable 2
• The double subscript is less of a problem when we use variable names, e.g. educ3 (the 3rd observation on educ)



The First Order Conditions
$$\frac{\partial \sum_{i=1}^{n} \hat u_i^2}{\partial \hat\beta_0} = -2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \dots - \hat\beta_k x_{ik}) = 0$$
$$\frac{\partial \sum_{i=1}^{n} \hat u_i^2}{\partial \hat\beta_1} = -2 \sum_{i=1}^{n} x_{i1} (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \dots - \hat\beta_k x_{ik}) = 0$$
$$\vdots$$
$$\frac{\partial \sum_{i=1}^{n} \hat u_i^2}{\partial \hat\beta_k} = -2 \sum_{i=1}^{n} x_{ik} (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \dots - \hat\beta_k x_{ik}) = 0$$



The First Order Conditions
• There are k + 1 first order conditions, which are tedious to solve by hand.
• A matrix approach is “easier” but beyond the scope of our course. See Wooldridge, Appendix E.
• In general, each $\hat\beta_j$ is a function of all of the x's and all of the y's, as the sketch below illustrates.
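A minimal numpy sketch of that matrix solution: the k + 1 first order conditions above are equivalent to the normal equations X′Xβ̂ = X′y, which a numerical routine solves in one step. The data are simulated and the true parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Simulated data for illustration: true parameters chosen arbitrarily
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

# Design matrix: a column of ones (intercept) plus the k regressors
X = np.column_stack([np.ones(n), x1, x2])

# The k+1 first order conditions in matrix form: X'X b = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # roughly [1.0, 2.0, -0.5]
```

In practice a least-squares routine or a regression package is preferred for numerical stability, but the solution is the same $\hat\beta$ that sets all k + 1 derivatives above to zero.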



Interpreting Multiple Regression

$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \dots + \hat\beta_k x_k,$$
so
$$\Delta \hat y = \hat\beta_1 \Delta x_1 + \hat\beta_2 \Delta x_2 + \dots + \hat\beta_k \Delta x_k.$$
Holding $x_2, \dots, x_k$ fixed implies that
$$\Delta \hat y = \hat\beta_1 \Delta x_1,$$
that is, each $\hat\beta_j$ has a ceteris paribus interpretation.
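Numerically, this "holding the other x's fixed" slope can be recovered by partialling out: regress $x_1$ on the other regressors, then regress y on the residual. A minimal numpy sketch with simulated data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1 + 2 * x1 - 1 * x2 + rng.normal(size=n)

# Full multiple regression of y on a constant, x1, and x2
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)

# Partial out the constant and x2 from x1, then regress y on the residual
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
beta1_partial = (r1 @ y) / (r1 @ r1)

print(beta_full[1], beta1_partial)   # identical (up to rounding)
```

The two numbers agree because $\hat\beta_1$ uses only the variation in $x_1$ that is not explained by the other regressors.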



Example
Consider the multiple regression model:
wage = β0 + β1educ + β2exper + u

$\hat\beta_1$ is the estimated increase in the wage for a one-unit increase in educ, holding exper constant.

Using the data in wage.wf1…



Example (Eviews output)
[Eviews regression output: wage regressed on educ and exper]


Example - interpretation
The fitted equation is:
$$\widehat{wage} = -3.40 + 0.64\,educ + 0.07\,exper$$
• Holding experience fixed, a one-year increase in education increases the hourly wage by 64 cents.
• Holding education fixed, a one-year increase in experience increases the hourly wage by 7 cents.
• When education and experience are both zero, the hourly wage is predicted to be −$3.40!
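The same regression can be run outside Eviews. A sketch using Python's statsmodels, assuming the wage data have been exported to a CSV file (the file name and column names here are assumptions):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumes a CSV export of the wage data with columns wage, educ, exper
df = pd.read_csv("wage1.csv")   # hypothetical file name

model = smf.ols("wage ~ educ + exper", data=df).fit()
print(model.params)     # expect roughly: Intercept -3.40, educ 0.64, exper 0.07
print(model.summary())  # full table, comparable to the Eviews output
```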
Simple vs Multiple Estimates
Compare the simple regression $\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1$
with the multiple regression $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$.
Generally $\tilde\beta_1 \neq \hat\beta_1$ unless:
$\hat\beta_2 = 0$ (i.e. no partial effect of $x_2$), OR
$x_1$ and $x_2$ are uncorrelated in the sample.



Assumptions for Unbiasedness
Population model is linear in parameters:
y = 0 + 1x1 + 2x2 +…+ kxk + u [MLR.1]
{(xi1, xi2,…, xik, yi): i=1, 2, …, n} is a random sample
from the population model, so that
yi = 0 + 1xi1 + 2xi2 +…+ kxik + ui [MLR.2]
E(u|x1, x2,… xk) = 0, implying that all of the explanatory
variables are uncorrelated with the error [MLR.3]
None of the x’s is constant, and there are no exact linear
relationships among them [MLR.4]



Unbiasedness of OLS
Under these assumptions, $E(\hat\beta_j) = \beta_j$, $j = 0, 1, \dots, k$.
All of the OLS estimators of the parameters of
the multiple regression model are unbiased
estimators.
This is not generally true if any one of MLR.1-MLR.4 is violated.
Note that MLR.4 is more involved than SLR.4: it rules out perfect multicollinearity.
Is MLR.3 more plausible than SLR.3?
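As an aside on MLR.4, a small numpy sketch of what perfect multicollinearity does to the normal equations (the data are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = 3 * x1 + 2           # exact linear function of x1: violates MLR.4

X = np.column_stack([np.ones(n), x1, x2])
XtX = X.T @ X

print(np.linalg.matrix_rank(XtX))   # 2, not 3: X'X is rank deficient
print(np.linalg.cond(XtX))          # enormous condition number
```

With X′X rank deficient, the normal equations have no unique solution, so no unique OLS estimates exist; this is exactly what MLR.4 rules out.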
Too Many or Too Few Variables
What happens if we include variables in
our specification that don’t belong?
• OLS estimators remain unbiased
• There will, however, be an impact on the variance of the estimators
What if we exclude a variable from our specification that does belong?
• OLS will usually be biased



Omitted Variable Bias
True model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$$
We estimate:
$$y = \beta_0 + \beta_1 x_1 + u$$
Then
$$\tilde\beta_1 = \frac{\sum (x_{i1} - \bar x_1)\, y_i}{\sum (x_{i1} - \bar x_1)^2}$$


Omitted Variable Bias (cont)
Recall that the true model is
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i,$$
so the numerator becomes
$$\sum (x_{i1} - \bar x_1)(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i) = \beta_1 \sum (x_{i1} - \bar x_1)^2 + \beta_2 \sum (x_{i1} - \bar x_1)\, x_{i2} + \sum (x_{i1} - \bar x_1)\, u_i$$


Omitted Variable Bias (cont)
$$\tilde\beta_1 = \beta_1 + \beta_2 \frac{\sum (x_{i1} - \bar x_1)\, x_{i2}}{\sum (x_{i1} - \bar x_1)^2} + \frac{\sum (x_{i1} - \bar x_1)\, u_i}{\sum (x_{i1} - \bar x_1)^2}$$
Since $E(u_i) = 0$, taking expectations we have
$$E(\tilde\beta_1) = \beta_1 + \beta_2 \frac{\sum (x_{i1} - \bar x_1)\, x_{i2}}{\sum (x_{i1} - \bar x_1)^2}$$


Omitted Variable Bias (cont)

Consider the regression of $x_2$ on $x_1$:
$$\tilde x_2 = \tilde\delta_0 + \tilde\delta_1 x_1, \quad \text{where} \quad \tilde\delta_1 = \frac{\sum (x_{i1} - \bar x_1)\, x_{i2}}{\sum (x_{i1} - \bar x_1)^2},$$
so $E(\tilde\beta_1) = \beta_1 + \beta_2 \tilde\delta_1$.



Summary of Direction of Bias

          Corr(x1, x2) > 0    Corr(x1, x2) < 0
β2 > 0    Positive bias       Negative bias
β2 < 0    Negative bias       Positive bias



Omitted Variable Bias Summary
Two cases where the bias is zero:
• $\beta_2 = 0$, that is, $x_2$ doesn't really belong in the model
• $x_1$ and $x_2$ are uncorrelated in the sample

If Corr(x2, x1) and Corr(x2, y) have the same sign, the bias will be positive.
If Corr(x2, x1) and Corr(x2, y) have opposite signs, the bias will be negative.



Omitted Variable Bias: Example
Suppose the model
$$\log(wage) = \beta_0 + \beta_1 educ + \beta_2 abil + u$$
satisfies MLR.1-MLR.4.
abil is typically hard to observe, so we estimate
$$\log(wage) = \beta_0 + \beta_1 educ + u$$
On average we expect the estimate of $\beta_1$ to be too high, $E(\tilde\beta_1) > \beta_1$, since we expect $\beta_2 > 0$ and Corr(abil, educ) > 0.
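A short simulation makes the result $E(\tilde\beta_1) = \beta_1 + \beta_2 \tilde\delta_1$ concrete. Here x2 is built to be positively correlated with x1 and β2 > 0, so the short regression's slope should center above β1 (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 2000
beta1, beta2, delta1 = 1.0, 0.5, 0.8   # true slopes; x2 = delta1*x1 + noise

short_slopes = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = delta1 * x1 + rng.normal(size=n)         # Corr(x1, x2) > 0
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    x1c = x1 - x1.mean()
    short_slopes.append((x1c @ y) / (x1c @ x1c))  # regress y on x1 alone

print(np.mean(short_slopes))   # near beta1 + beta2*delta1 = 1.4, not 1.0
```

The direction matches the table above: β2 > 0 together with Corr(x1, x2) > 0 gives positive bias.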
The More General Case
Technically, we can only sign the bias in the more general case if all of the included x's are uncorrelated.

Typically, then, we work through the bias assuming the x's are uncorrelated, as a useful guide even if this assumption is not strictly true.



Variance of the OLS Estimators
Assume Var(u|x1, x2, …, xk) = σ² (Homoscedasticity).
Let x stand for (x1, x2, …, xk).
Assuming that Var(u|x) = σ² also implies that Var(y|x) = σ².
The 4 assumptions for unbiasedness, plus
this homoscedasticity assumption are
known as the Gauss-Markov assumptions
Variance of OLS (cont)

Given the Gauss-Markov assumptions,
$$Var(\hat\beta_j) = \frac{\sigma^2}{SST_j (1 - R_j^2)},$$
where $SST_j = \sum (x_{ij} - \bar x_j)^2$ and $R_j^2$ is the $R^2$ from regressing $x_j$ on all of the other x's.
Components of OLS Variances
1. The error variance: a larger σ² implies a larger variance for the OLS estimators
2. The total sample variation: a larger SSTj implies a smaller variance for the estimators
3. Linear relationships among the independent variables: a larger Rj² implies a larger variance for the estimators (multicollinearity)
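Rj² can be computed directly by regressing xj on the other regressors; 1/(1 − Rj²) is the variance inflation factor (VIF). A numpy sketch on simulated data (the degree of correlation below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.2 * rng.normal(size=n)   # strongly correlated with x1

# R_1^2: regress x1 on a constant and the other regressor x2
Z = np.column_stack([np.ones(n), x2])
resid = x1 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
R2_1 = 1 - resid @ resid / np.sum((x1 - x1.mean())**2)

print(R2_1)             # high: substantial multicollinearity
print(1 / (1 - R2_1))   # VIF: factor by which Var(beta1_hat) is inflated
```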



Misspecified Models

Consider again the misspecified model
$$\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1, \quad \text{so that} \quad Var(\tilde\beta_1) = \frac{\sigma^2}{SST_1}.$$
Thus $Var(\tilde\beta_1) < Var(\hat\beta_1)$ unless $x_1$ and $x_2$ are uncorrelated, in which case the two variances are the same.


Misspecified Models (cont)

Assuming that x1 and x2 are not uncorrelated, we can draw the following conclusions:
1. When $\beta_2 \neq 0$, $\tilde\beta_1$ is biased, $\hat\beta_1$ is unbiased, and $Var(\tilde\beta_1) < Var(\hat\beta_1)$.
2. When $\beta_2 = 0$, $\tilde\beta_1$ and $\hat\beta_1$ are both unbiased, and $Var(\tilde\beta_1) < Var(\hat\beta_1)$.
From the second conclusion, it is clear that $\tilde\beta_1$ is preferred if $\beta_2 = 0$.


Misspecified Models (cont)
While the variance of the estimator is smaller for the misspecified model, unless $\beta_2 = 0$ the misspecified model is biased.
Corollary: including an extraneous or irrelevant variable cannot decrease the variance of the estimator.
As the sample size grows, the variance of each
estimator shrinks to zero, making the variance
difference less important
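A sketch comparing the two variance formulas on one simulated sample (σ² is treated as known here purely for illustration; all values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma2 = 100, 1.0                    # assume a known error variance
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)      # correlated with x1

sst1 = np.sum((x1 - x1.mean())**2)

# R_1^2 from regressing x1 on a constant and x2
Z = np.column_stack([np.ones(n), x2])
resid = x1 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
R2_1 = 1 - resid @ resid / sst1

var_short = sigma2 / sst1                  # Var(beta1_tilde): x2 omitted
var_long = sigma2 / (sst1 * (1 - R2_1))    # Var(beta1_hat): x2 included
print(var_short, var_long)                 # short < long when Corr(x1,x2) != 0
```

Both variances are proportional to 1/SST1, so both shrink toward zero as n grows, which is why the variance difference matters less in large samples.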



Gauss-Markov Assumptions
Linear in parameters: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$ [MLR.1]
{(xi1, xi2, …, xik, yi): i = 1, 2, …, n} is a random sample from the population model, so that
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + u_i$ [MLR.2]
E(u|x1, x2, …, xk) = E(u) = 0. Conditional mean independence. [MLR.3]
No exact multicollinearity. [MLR.4]
Var(u|x) = Var(u) = σ². Homoscedasticity. [MLR.5]



The Gauss-Markov Theorem
Given our 5 Gauss-Markov Assumptions it
can be shown that OLS is “BLUE”
Best
Linear
Unbiased
Estimator
Thus, if the assumptions hold, use OLS



Estimating the Error Variance
$$\hat\sigma^2 = \frac{\sum \hat u_i^2}{n - k - 1} = \frac{SSR}{df}$$
thus
$$se(\hat\beta_j) = \frac{\hat\sigma}{[SST_j (1 - R_j^2)]^{1/2}}$$
df = n − (k + 1), or df = n − k − 1
df (i.e. degrees of freedom) is the (number of observations) − (number of estimated parameters)
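A minimal numpy sketch of these formulas, using the equivalent matrix form in which the Var(β̂j) are the diagonal entries of σ̂²(X′X)⁻¹; the data are simulated:

```python
import numpy as np

def ols_with_se(X, y):
    """OLS estimates and standard errors. X must include a constant column."""
    n, kp1 = X.shape                                  # kp1 = k + 1 parameters
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - kp1)            # SSR / (n - k - 1)
    var_beta = sigma2_hat * np.linalg.inv(X.T @ X)    # sigma2_hat * (X'X)^{-1}
    return beta_hat, np.sqrt(np.diag(var_beta))

rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)
print(ols_with_se(X, y))
```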
Goodness-of-Fit
R² can also be used in the multiple regression context.
R² = SSE/SST = 1 − SSR/SST
0 ≤ R² ≤ 1
R² has the same interpretation: the proportion of the variation in y explained by the independent (x) variables
More about R-squared
R2 can never decrease when another
independent variable is added to a
regression, and usually will increase
This is because SSR is non-increasing in k
Because R2 will usually increase with the
number of independent variables, it is not a
good way to compare models



Adjusted R-Squared (Section 6.3)
An alternative measure of goodness of fit is
sometimes used
The adjusted R2 takes into account the number of
variables in a model, and may decrease

$$\bar R^2 = 1 - \frac{SSR/(n - k - 1)}{SST/(n - 1)} = 1 - \frac{\hat\sigma^2}{SST/(n - 1)}$$
Adjusted R-Squared (cont)
It’s easy to see that the adjusted R² is just 1 − (1 − R²)(n − 1)/(n − k − 1), but Eviews will give you both R² and adj-R²
You can compare the fit of 2 models (with
the same y) by comparing the adj-R2
You cannot use the adj-R2 to compare
models with different y’s (e.g. y vs. ln(y))
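A quick numerical check of both claims: adding a pure-noise regressor cannot lower R², but the adjusted R² can fall. A sketch with simulated data (names and values are illustrative):

```python
import numpy as np

def r2_and_adj(X, y):
    """R^2 and adjusted R^2; X must include a constant column."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    ssr = resid @ resid
    sst = np.sum((y - y.mean())**2)
    n, kp1 = X.shape
    r2 = 1 - ssr / sst
    adj = 1 - (1 - r2) * (n - 1) / (n - kp1)   # 1 - (1-R^2)(n-1)/(n-k-1)
    return r2, adj

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)                      # irrelevant regressor

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, noise])
print(r2_and_adj(X_small, y))   # adding noise: R^2 up slightly,
print(r2_and_adj(X_big, y))     # adj-R^2 often down
```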



Eviews output again

[Eviews output again, with the R², σ̂, and adjusted R² entries indicated]
Summary: Multiple Regression
Many of the principles the same as simple
regression
• Functional form results the same
Need to be aware of the role of the
assumptions
Have only focused on estimation; consider
inference in the next lecture



Next Time
Next topic is inference in (multiple)
regression (Chapter 4)

