
Multiple Regression Analysis

y = 0 + 1x1 + 2x2 + . . . kxk + u

1. Estimation



Recap of Simple Regression
Assumptions:
The population model is linear in parameters:
y = 0 + 1x + u [SLR.1]
There is a random sample of size n, {(xi, yi): i=1,
2, …, n}, from the population model. [SLR.2]
Assume E(u|x) = 0 and thus E(ui|xi) = 0 [SLR.3]
Assume there is variation in the xi [SLR.4]
Var(u|x) = σ² [SLR.5]



Recap of Simple Regression
Applying the least squares method yields estimators of $\beta_0$ and $\beta_1$.
R² = 1 − SSR/SST is a measure of goodness of fit.
Under SLR.1-SLR.4 the OLS estimators
are unbiased…
…and, with SLR.5, have variances which
can be estimated from the sample…



Recap of Simple Regression
… if we estimate σ² with SSR/(n − 2).
Adding SLR.6, Normality, gives normally
distributed errors and allows us to state that …
$$\frac{\hat\beta_1 - \beta_1}{\hat\sigma / s_x} = \frac{\hat\beta_1 - \beta_1}{se(\hat\beta_1)} \sim t_{n-2}, \qquad s_x = \Big(\sum (x_i - \bar x)^2\Big)^{1/2}$$
…which is the basis of statistical inference.
Alternative, useful, functional forms are possible.
Limitations of Simple Regression
In simple regression we explicitly control for only
a single explanatory variable.
We deal with this by assuming SLR.3
e.g. wage = β0 + β1educ + β2exper + u
Simple regression of wage on educ puts exper in u
and assumes educ and u independent.
Simple regression puts a lot of weight on
conditional mean independence.



Multiple Regression Model
In the population we assume
y = 0 + 1x1 + 2x2 + . . . kxk + u

We are still explaining y.


There are k explanatory variables.
There are k+1 parameters.
k = 1 gets us back to simple regression.
Parallels with Simple Regression
$\beta_0$ is still the intercept
$\beta_1$ to $\beta_k$ are all called slope parameters
u is still the error term (or disturbance)
Still need to make a zero conditional mean
assumption, so now assume that
E(u|x1,x2, …,xk) = 0 or E(u|x) = 0
Still minimizing the sum of squared
residuals, however...
Applying Least Squares
A residual is
$$\hat u_i = y_i - \hat y_i$$
and a fitted value is
$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \hat\beta_2 x_{i2} + \dots + \hat\beta_k x_{ik}$$
so
$$\sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} (y_i - \hat y_i)^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \dots - \hat\beta_k x_{ik})^2$$
Some Important Notation
• xij is the i’th observation on the j’th
explanatory variable
• e.g. x32 is the 3rd observation on explanatory
variable 2
• The double subscript is less of a problem when we use variable names, e.g. educ3 (the 3rd observation on educ)



The First Order Conditions
$$\frac{\partial \sum_{i=1}^{n} \hat u_i^2}{\partial \hat\beta_0} = -2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \dots - \hat\beta_k x_{ik}) = 0$$
$$\frac{\partial \sum_{i=1}^{n} \hat u_i^2}{\partial \hat\beta_1} = -2 \sum_{i=1}^{n} x_{i1} (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \dots - \hat\beta_k x_{ik}) = 0$$
$$\vdots$$
$$\frac{\partial \sum_{i=1}^{n} \hat u_i^2}{\partial \hat\beta_k} = -2 \sum_{i=1}^{n} x_{ik} (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \dots - \hat\beta_k x_{ik}) = 0$$



The First Order Conditions
• There are k + 1 first order conditions, which are tedious to solve by hand.
• A matrix approach is “easier” but beyond the scope of our course. See Wooldridge, Appendix E.
• In general, each $\hat\beta_j$ is a function of all of the x's and all of the y's, as the sketch below illustrates.
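A minimal numpy sketch of that matrix solution: the k + 1 first order conditions above are equivalent to the normal equations X′Xβ̂ = X′y, which a numerical routine solves in one step. The data are simulated and the true parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Simulated data for illustration: true parameters chosen arbitrarily
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

# Design matrix: a column of ones (intercept) plus the k regressors
X = np.column_stack([np.ones(n), x1, x2])

# The k+1 first order conditions in matrix form: X'X b = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # roughly [1.0, 2.0, -0.5]
```

In practice a least-squares routine or a regression package is preferred for numerical stability, but the solution is the same $\hat\beta$ that sets all k + 1 derivatives above to zero.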



Interpreting Multiple Regression

$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \dots + \hat\beta_k x_k,$$
so
$$\Delta \hat y = \hat\beta_1 \Delta x_1 + \hat\beta_2 \Delta x_2 + \dots + \hat\beta_k \Delta x_k.$$
Holding $x_2, \dots, x_k$ fixed implies that
$$\Delta \hat y = \hat\beta_1 \Delta x_1,$$
that is, each $\hat\beta_j$ has a ceteris paribus interpretation.
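Numerically, this "holding the other x's fixed" slope can be recovered by partialling out: regress $x_1$ on the other regressors, then regress y on the residual. A minimal numpy sketch with simulated data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1 + 2 * x1 - 1 * x2 + rng.normal(size=n)

# Full multiple regression of y on a constant, x1, and x2
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)

# Partial out the constant and x2 from x1, then regress y on the residual
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
beta1_partial = (r1 @ y) / (r1 @ r1)

print(beta_full[1], beta1_partial)   # identical (up to rounding)
```

The two numbers agree because $\hat\beta_1$ uses only the variation in $x_1$ that is not explained by the other regressors.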



Example
Consider the multiple regression model:
wage = β0 + β1educ + β2exper + u

$\hat\beta_1$ is the estimated increase in the wage for a one-unit increase in educ, holding exper constant.

Using the data in wage.wf1…



Example (Eviews output)
[Eviews regression output: wage regressed on educ and exper]


Example - interpretation
The fitted equation is:
$$\widehat{wage} = -3.40 + 0.64\,educ + 0.07\,exper$$
• Holding experience fixed, a one-year increase in education increases the hourly wage by 64 cents.
• Holding education fixed, a one-year increase in experience increases the hourly wage by 7 cents.
• When education and experience are both zero, the hourly wage is predicted to be −$3.40!
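The same regression can be run outside Eviews. A sketch using Python's statsmodels, assuming the wage data have been exported to a CSV file (the file name and column names here are assumptions):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumes a CSV export of the wage data with columns wage, educ, exper
df = pd.read_csv("wage1.csv")   # hypothetical file name

model = smf.ols("wage ~ educ + exper", data=df).fit()
print(model.params)     # expect roughly: Intercept -3.40, educ 0.64, exper 0.07
print(model.summary())  # full table, comparable to the Eviews output
```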
Simple vs Multiple Estimates
Compare the simple regression $\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1$
with the multiple regression $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$.
Generally $\tilde\beta_1 \neq \hat\beta_1$ unless:
$\hat\beta_2 = 0$ (i.e. no partial effect of $x_2$), OR
$x_1$ and $x_2$ are uncorrelated in the sample.



Assumptions for Unbiasedness
Population model is linear in parameters:
y = 0 + 1x1 + 2x2 +…+ kxk + u [MLR.1]
{(xi1, xi2,…, xik, yi): i=1, 2, …, n} is a random sample
from the population model, so that
yi = 0 + 1xi1 + 2xi2 +…+ kxik + ui [MLR.2]
E(u|x1, x2,… xk) = 0, implying that all of the explanatory
variables are uncorrelated with the error [MLR.3]
None of the x’s is constant, and there are no exact linear
relationships among them [MLR.4]



Unbiasedness of OLS
Under these assumptions, $E(\hat\beta_j) = \beta_j$, $j = 0, 1, \dots, k$.
All of the OLS estimators of the parameters of
the multiple regression model are unbiased
estimators.
This is not generally true if any one of MLR.1-MLR.4 is violated.
Note that MLR.4 is more involved than SLR.4: it rules out perfect multicollinearity.
Is MLR.3 more plausible than SLR.3?
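As an aside on MLR.4, a small numpy sketch of what perfect multicollinearity does to the normal equations (the data are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = 3 * x1 + 2           # exact linear function of x1: violates MLR.4

X = np.column_stack([np.ones(n), x1, x2])
XtX = X.T @ X

print(np.linalg.matrix_rank(XtX))   # 2, not 3: X'X is rank deficient
print(np.linalg.cond(XtX))          # enormous condition number
```

With X′X rank deficient, the normal equations have no unique solution, so no unique OLS estimates exist; this is exactly what MLR.4 rules out.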
Too Many or Too Few Variables
What happens if we include variables in
our specification that don’t belong?
• OLS estimators remain unbiased
• There will, however, be an impact on the variance of the estimators
What if we exclude a variable from our specification that does belong?
• OLS will usually be biased



Omitted Variable Bias
True model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$$
We estimate:
$$y = \beta_0 + \beta_1 x_1 + u$$
Then
$$\tilde\beta_1 = \frac{\sum (x_{i1} - \bar x_1)\, y_i}{\sum (x_{i1} - \bar x_1)^2}$$


Omitted Variable Bias (cont)
Recall that the true model is
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i,$$
so the numerator becomes
$$\sum (x_{i1} - \bar x_1)(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i) = \beta_1 \sum (x_{i1} - \bar x_1)^2 + \beta_2 \sum (x_{i1} - \bar x_1)\, x_{i2} + \sum (x_{i1} - \bar x_1)\, u_i$$


Omitted Variable Bias (cont)
$$\tilde\beta_1 = \beta_1 + \beta_2 \frac{\sum (x_{i1} - \bar x_1)\, x_{i2}}{\sum (x_{i1} - \bar x_1)^2} + \frac{\sum (x_{i1} - \bar x_1)\, u_i}{\sum (x_{i1} - \bar x_1)^2}$$
Since $E(u_i) = 0$, taking expectations we have
$$E(\tilde\beta_1) = \beta_1 + \beta_2 \frac{\sum (x_{i1} - \bar x_1)\, x_{i2}}{\sum (x_{i1} - \bar x_1)^2}$$


Omitted Variable Bias (cont)

Consider the regression of $x_2$ on $x_1$:
$$\tilde x_2 = \tilde\delta_0 + \tilde\delta_1 x_1, \quad \text{where} \quad \tilde\delta_1 = \frac{\sum (x_{i1} - \bar x_1)\, x_{i2}}{\sum (x_{i1} - \bar x_1)^2},$$
so $E(\tilde\beta_1) = \beta_1 + \beta_2 \tilde\delta_1$.



Summary of Direction of Bias

          Corr(x1, x2) > 0    Corr(x1, x2) < 0
β2 > 0    Positive bias       Negative bias
β2 < 0    Negative bias       Positive bias



Omitted Variable Bias Summary
Two cases where the bias is zero:
• $\beta_2 = 0$, that is, $x_2$ doesn't really belong in the model
• $x_1$ and $x_2$ are uncorrelated in the sample

If Corr(x2, x1) and Corr(x2, y) have the same sign, the bias will be positive.
If Corr(x2, x1) and Corr(x2, y) have opposite signs, the bias will be negative.



Omitted Variable Bias: Example
Suppose the model
$$\log(wage) = \beta_0 + \beta_1 educ + \beta_2 abil + u$$
satisfies MLR.1-MLR.4.
abil is typically hard to observe, so we estimate
$$\log(wage) = \beta_0 + \beta_1 educ + u$$
On average we expect the estimate of $\beta_1$ to be too high, $E(\tilde\beta_1) > \beta_1$, since we expect $\beta_2 > 0$ and Corr(abil, educ) > 0.
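A short simulation makes the result $E(\tilde\beta_1) = \beta_1 + \beta_2 \tilde\delta_1$ concrete. Here x2 is built to be positively correlated with x1 and β2 > 0, so the short regression's slope should center above β1 (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 2000
beta1, beta2, delta1 = 1.0, 0.5, 0.8   # true slopes; x2 = delta1*x1 + noise

short_slopes = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = delta1 * x1 + rng.normal(size=n)         # Corr(x1, x2) > 0
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    x1c = x1 - x1.mean()
    short_slopes.append((x1c @ y) / (x1c @ x1c))  # regress y on x1 alone

print(np.mean(short_slopes))   # near beta1 + beta2*delta1 = 1.4, not 1.0
```

The direction matches the table above: β2 > 0 together with Corr(x1, x2) > 0 gives positive bias.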
The More General Case
Technically, we can only sign the bias in the more general case if all of the included x's are uncorrelated.

Typically, then, we work through the bias assuming the x's are uncorrelated, as a useful guide even if this assumption is not strictly true.



Variance of the OLS Estimators
Assume Var(u|x1, x2, …, xk) = σ² (Homoscedasticity).
Let x stand for (x1, x2, …, xk).
Assuming that Var(u|x) = σ² also implies that Var(y|x) = σ².
The 4 assumptions for unbiasedness, plus
this homoscedasticity assumption are
known as the Gauss-Markov assumptions
Variance of OLS (cont)

Given the Gauss-Markov assumptions,
$$Var(\hat\beta_j) = \frac{\sigma^2}{SST_j (1 - R_j^2)},$$
where $SST_j = \sum (x_{ij} - \bar x_j)^2$ and $R_j^2$ is the $R^2$ from regressing $x_j$ on all of the other x's.
Components of OLS Variances
1. The error variance: a larger σ² implies a larger variance for the OLS estimators
2. The total sample variation: a larger SSTj implies a smaller variance for the estimators
3. Linear relationships among the independent variables: a larger Rj² implies a larger variance for the estimators (multicollinearity)
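Rj² can be computed directly by regressing xj on the other regressors; 1/(1 − Rj²) is the variance inflation factor (VIF). A numpy sketch on simulated data (the degree of correlation below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.2 * rng.normal(size=n)   # strongly correlated with x1

# R_1^2: regress x1 on a constant and the other regressor x2
Z = np.column_stack([np.ones(n), x2])
resid = x1 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
R2_1 = 1 - resid @ resid / np.sum((x1 - x1.mean())**2)

print(R2_1)             # high: substantial multicollinearity
print(1 / (1 - R2_1))   # VIF: factor by which Var(beta1_hat) is inflated
```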



Misspecified Models

Consider again the misspecified model
$$\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1, \quad \text{so that} \quad Var(\tilde\beta_1) = \frac{\sigma^2}{SST_1}.$$
Thus $Var(\tilde\beta_1) < Var(\hat\beta_1)$ unless $x_1$ and $x_2$ are uncorrelated, in which case the two variances are the same.


Misspecified Models (cont)

Assuming that x1 and x2 are not uncorrelated, we can draw the following conclusions:
1. When $\beta_2 \neq 0$, $\tilde\beta_1$ is biased, $\hat\beta_1$ is unbiased, and $Var(\tilde\beta_1) < Var(\hat\beta_1)$.
2. When $\beta_2 = 0$, $\tilde\beta_1$ and $\hat\beta_1$ are both unbiased, and $Var(\tilde\beta_1) < Var(\hat\beta_1)$.
From the second conclusion, it is clear that $\tilde\beta_1$ is preferred if $\beta_2 = 0$.


Misspecified Models (cont)
While the variance of the estimator is smaller for the misspecified model, unless $\beta_2 = 0$ the misspecified model is biased.
Corollary: including an extraneous or irrelevant variable cannot decrease the variance of the estimator.
As the sample size grows, the variance of each
estimator shrinks to zero, making the variance
difference less important
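A sketch comparing the two variance formulas on one simulated sample (σ² is treated as known here purely for illustration; all values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma2 = 100, 1.0                    # assume a known error variance
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)      # correlated with x1

sst1 = np.sum((x1 - x1.mean())**2)

# R_1^2 from regressing x1 on a constant and x2
Z = np.column_stack([np.ones(n), x2])
resid = x1 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
R2_1 = 1 - resid @ resid / sst1

var_short = sigma2 / sst1                  # Var(beta1_tilde): x2 omitted
var_long = sigma2 / (sst1 * (1 - R2_1))    # Var(beta1_hat): x2 included
print(var_short, var_long)                 # short < long when Corr(x1,x2) != 0
```

Both variances are proportional to 1/SST1, so both shrink toward zero as n grows, which is why the variance difference matters less in large samples.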



Gauss-Markov Assumptions
Linear in parameters: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$ [MLR.1]
{(xi1, xi2, …, xik, yi): i = 1, 2, …, n} is a random sample from the population model, so that
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + u_i$ [MLR.2]
E(u|x1, x2, …, xk) = E(u) = 0. Conditional mean independence. [MLR.3]
No exact multicollinearity. [MLR.4]
Var(u|x) = Var(u) = σ². Homoscedasticity. [MLR.5]



The Gauss-Markov Theorem
Given our 5 Gauss-Markov Assumptions it
can be shown that OLS is “BLUE”
Best
Linear
Unbiased
Estimator
Thus, if the assumptions hold, use OLS



Estimating the Error Variance
$$\hat\sigma^2 = \frac{\sum \hat u_i^2}{n - k - 1} = \frac{SSR}{df}$$
thus
$$se(\hat\beta_j) = \frac{\hat\sigma}{[SST_j (1 - R_j^2)]^{1/2}}$$
df = n − (k + 1), or df = n − k − 1
df (i.e. degrees of freedom) is the (number of observations) − (number of estimated parameters)
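A minimal numpy sketch of these formulas, using the equivalent matrix form in which the Var(β̂j) are the diagonal entries of σ̂²(X′X)⁻¹; the data are simulated:

```python
import numpy as np

def ols_with_se(X, y):
    """OLS estimates and standard errors. X must include a constant column."""
    n, kp1 = X.shape                                  # kp1 = k + 1 parameters
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - kp1)            # SSR / (n - k - 1)
    var_beta = sigma2_hat * np.linalg.inv(X.T @ X)    # sigma2_hat * (X'X)^{-1}
    return beta_hat, np.sqrt(np.diag(var_beta))

rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)
print(ols_with_se(X, y))
```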
Goodness-of-Fit
R² can also be used in the multiple regression context.
R² = SSE/SST = 1 − SSR/SST
0 ≤ R² ≤ 1
R² has the same interpretation: the proportion of the variation in y explained by the independent (x) variables
More about R-squared
R2 can never decrease when another
independent variable is added to a
regression, and usually will increase
This is because SSR is non-increasing in k
Because R2 will usually increase with the
number of independent variables, it is not a
good way to compare models



Adjusted R-Squared (Section 6.3)
An alternative measure of goodness of fit is
sometimes used
The adjusted R2 takes into account the number of
variables in a model, and may decrease

$$\bar R^2 = 1 - \frac{SSR/(n - k - 1)}{SST/(n - 1)} = 1 - \frac{\hat\sigma^2}{SST/(n - 1)}$$
Adjusted R-Squared (cont)
It’s easy to see that the adjusted R² is just 1 − (1 − R²)(n − 1)/(n − k − 1), but Eviews will give you both R² and adj-R²
You can compare the fit of 2 models (with
the same y) by comparing the adj-R2
You cannot use the adj-R2 to compare
models with different y’s (e.g. y vs. ln(y))
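A quick numerical check of both claims: adding a pure-noise regressor cannot lower R², but the adjusted R² can fall. A sketch with simulated data (names and values are illustrative):

```python
import numpy as np

def r2_and_adj(X, y):
    """R^2 and adjusted R^2; X must include a constant column."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    ssr = resid @ resid
    sst = np.sum((y - y.mean())**2)
    n, kp1 = X.shape
    r2 = 1 - ssr / sst
    adj = 1 - (1 - r2) * (n - 1) / (n - kp1)   # 1 - (1-R^2)(n-1)/(n-k-1)
    return r2, adj

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)                      # irrelevant regressor

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, noise])
print(r2_and_adj(X_small, y))   # adding noise: R^2 up slightly,
print(r2_and_adj(X_big, y))     # adj-R^2 often down
```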



Eviews output again

[Eviews output again, with the R², σ̂, and adjusted R² entries indicated]
Summary: Multiple Regression
Many of the principles the same as simple
regression
• Functional form results the same
Need to be aware of the role of the
assumptions
Have only focused on estimation; consider
inference in the next lecture



Next Time
Next topic is inference in (multiple)
regression (Chapter 4)

