
Dougherty

Introduction to Econometrics,
5th edition
Chapter 6: Specification of
Regression Variables

© Christopher Dougherty, 2016. All rights reserved.


VARIABLE MISSPECIFICATION II: INCLUSION OF AN IRRELEVANT VARIABLE


In this sequence we will investigate the consequences of including an irrelevant variable in
a regression model.


Consequences of variable misspecification

| Fitted model | True model: $Y = \beta_1 + \beta_2 X_2 + u$ | True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$ |
|---|---|---|
| $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2$ | Correct specification, no problems. | Coefficients are biased (in general). Standard errors are invalid (in general). |
| $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$ | Coefficients are unbiased (in general), but inefficient. Standard errors are valid (in general). | Correct specification, no problems. |

The effects are different from those of omitted variable misspecification. In this case the
coefficients in general remain unbiased, but they are inefficient. The standard errors
remain valid, but are needlessly large.

True model: $Y = \beta_1 + \beta_2 X_2 + u$

Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

These results can be demonstrated quickly.



$Y = \beta_1 + \beta_2 X_2 + 0 X_3 + u$

Rewrite the true model adding $X_3$ as an explanatory variable, with a coefficient of 0. Now
the true model and the fitted model coincide. Hence $\hat{\beta}_2$ will be an unbiased
estimator of $\beta_2$ and $\hat{\beta}_3$ will be an unbiased estimator of 0.
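
The unbiasedness claim can also be checked directly from the two-regressor OLS formulas. The following is a sketch rather than part of the original slides. It assumes nonstochastic regressors and $E(u_i) = 0$, and it introduces its own shorthand: lowercase $x_{2i}$, $x_{3i}$, $y_i$ for deviations from sample means, and $D$ for the common denominator.

```latex
% Notation assumed for this sketch (not from the slides):
%   x_{ji} = X_{ji} - \bar{X}_j, \quad y_i = Y_i - \bar{Y}, \quad
%   D = \sum x_{2i}^2 \sum x_{3i}^2 - \left( \sum x_{2i} x_{3i} \right)^2
\begin{align*}
\hat{\beta}_2 &= \frac{\sum x_{2i} y_i \sum x_{3i}^2 - \sum x_{3i} y_i \sum x_{2i} x_{3i}}{D},
&
\hat{\beta}_3 &= \frac{\sum x_{3i} y_i \sum x_{2i}^2 - \sum x_{2i} y_i \sum x_{2i} x_{3i}}{D}.
\end{align*}
% Under the true model, y_i = \beta_2 x_{2i} + (u_i - \bar{u}), so
%   \sum x_{2i} y_i = \beta_2 \sum x_{2i}^2 + \sum x_{2i} u_i, \qquad
%   \sum x_{3i} y_i = \beta_2 \sum x_{2i} x_{3i} + \sum x_{3i} u_i.
\begin{align*}
\hat{\beta}_2 &= \beta_2 + \frac{\sum x_{3i}^2 \sum x_{2i} u_i - \sum x_{2i} x_{3i} \sum x_{3i} u_i}{D},
&
\hat{\beta}_3 &= 0 + \frac{\sum x_{2i}^2 \sum x_{3i} u_i - \sum x_{2i} x_{3i} \sum x_{2i} u_i}{D}.
\end{align*}
% Taking expectations with the X values fixed and E(u_i) = 0:
\begin{equation*}
E(\hat{\beta}_2) = \beta_2, \qquad E(\hat{\beta}_3) = 0.
\end{equation*}
```

The terms in $\beta_2$ cancel exactly in the numerator of $\hat{\beta}_3$, which is why its expected value is 0 whatever the correlation between $X_2$ and $X_3$.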

$$\sigma^2_{\hat{\beta}_2} = \frac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \frac{1}{1 - r^2_{X_2, X_3}}$$

However, the variance of $\hat{\beta}_2$ will be larger than it would have been if the correct
simple regression had been run because it includes the factor $1 / (1 - r^2)$, where $r$ is the
correlation between $X_2$ and $X_3$.
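
To get a feel for the size of this factor, the short sketch below evaluates 1 / (1 - r^2) and its square root (the proportional effect on the standard deviation of the estimator) for a few correlations. It is written in Python rather than Stata so that it is self-contained. The function names are hypothetical, chosen here for illustration only.

```python
import math


def variance_inflation(r: float) -> float:
    """Factor 1 / (1 - r^2) multiplying the variance of the slope estimator."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


def std_dev_inflation(r: float) -> float:
    """Square root of the variance factor: the effect on the standard deviation."""
    return math.sqrt(variance_inflation(r))


if __name__ == "__main__":
    # Illustrative correlations between X2 and X3 (not taken from the slides).
    for r in (0.0, 0.3, 0.5, 0.9):
        print(
            f"r = {r:.1f}: variance x {variance_inflation(r):.3f}, "
            f"std. dev. x {std_dev_inflation(r):.3f}"
        )
```

With r = 0 the factor is exactly 1. The penalty is small for modest correlations and grows sharply as the absolute value of r approaches 1.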

The estimator $\hat{\beta}_2$ using the multiple regression model will therefore be less
efficient than the alternative using the simple regression model.


The intuitive reason for this is that the simple regression model exploits the information
that X3 should not be in the regression, while with the multiple regression model you find
this out from the regression results.

The standard errors remain valid, because the model is formally correctly specified, but
they will tend to be larger than those obtained in a simple regression, reflecting the loss of
efficiency.

These are the results in general. Note that if X2 and X3 happen to be uncorrelated, there will
be no loss of efficiency after all.
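
These results can be illustrated with a small Monte Carlo experiment. The sketch below is not from the slides: the sample size, parameter values, seed, and the function name `simulate` are all assumptions made for illustration. It holds X2 and X3 fixed, repeatedly draws the disturbance term, and compares the simple regression estimator of the slope of X2 with the multiple regression estimator that wrongly includes X3.

```python
import math
import random
import statistics


def simulate(r, n=50, reps=3000, beta1=1.0, beta2=2.0, sigma_u=1.0, seed=12345):
    """Compare estimators of beta2 when the true model is Y = beta1 + beta2*X2 + u.

    X3 is irrelevant, with population correlation r with X2.
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Regressors are drawn once and then held fixed across replications.
    x2_raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x3_raw = [r * a + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0) for a in x2_raw]
    mean2 = statistics.fmean(x2_raw)
    mean3 = statistics.fmean(x3_raw)
    x2 = [a - mean2 for a in x2_raw]  # deviations from sample means
    x3 = [a - mean3 for a in x3_raw]
    s22 = sum(a * a for a in x2)
    s33 = sum(a * a for a in x3)
    s23 = sum(a * b for a, b in zip(x2, x3))
    d = s22 * s33 - s23 * s23
    r_sample = s23 / math.sqrt(s22 * s33)  # sample correlation of X2 and X3

    b2_simple, b2_multiple, b3_multiple = [], [], []
    for _ in range(reps):
        y = [beta1 + beta2 * a + rng.gauss(0.0, sigma_u) for a in x2_raw]
        # Because x2 and x3 sum to zero, Y need not be demeaned here.
        s2y = sum(a * b for a, b in zip(x2, y))
        s3y = sum(a * b for a, b in zip(x3, y))
        b2_simple.append(s2y / s22)
        b2_multiple.append((s2y * s33 - s3y * s23) / d)
        b3_multiple.append((s3y * s22 - s2y * s23) / d)

    return {
        "r_sample": r_sample,
        "mean_b2_simple": statistics.fmean(b2_simple),
        "mean_b2_multiple": statistics.fmean(b2_multiple),
        "mean_b3_multiple": statistics.fmean(b3_multiple),
        "variance_ratio": statistics.variance(b2_multiple)
        / statistics.variance(b2_simple),
        "theoretical_ratio": 1.0 / (1.0 - r_sample ** 2),
    }


if __name__ == "__main__":
    for r in (0.0, 0.8):
        print(r, simulate(r))
```

Both estimators of the slope of X2 should average close to the assumed true value of 2, and the estimator of the slope of X3 should average close to 0. The ratio of the two simulated variances should be close to 1 / (1 - r^2) evaluated at the sample correlation, which is near 1 when X2 and X3 are nearly uncorrelated.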


. reg LGEARN S EXP MALE

----------------------------------------------------------------------------
    Source |         SS    df          MS      Number of obs =     500
-----------+------------------------------     F( 3, 496)    =   33.26
     Model | 25.5575266     3  8.51917554      Prob > F      =  0.0000
  Residual | 127.041693   496  .256132446      R-squared     =  0.1675
-----------+------------------------------     Adj R-squared =  0.1624
     Total |  152.59922   499   .30581006      Root MSE      =   .5061
----------------------------------------------------------------------------
    LGEARN |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
-----------+----------------------------------------------------------------
         S |   .097249   .0102607   9.48   0.000    .0770893   .1174088
       EXP |  .0414485   .0095424   4.34   0.000    .0227001    .060197
      MALE |  .1885338   .0457636   4.12   0.000    .0986193   .2784483
     _cons |  1.017176   .1999318   5.09   0.000    .6243587   1.409994
----------------------------------------------------------------------------

The table shows the output from a logarithmic regression of hourly earnings on years of
schooling, years of work experience, and a male dummy variable, using EAWE Data Set 21.
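
As a reading aid, the summary statistics in the upper right of the output can be recomputed from the sums of squares and degrees of freedom in the upper left. The sketch below (Python, with variable names chosen here for illustration) copies the figures from the table and applies the standard definitions. It does not use the EAWE data itself.

```python
import math

# Figures copied from the Stata output for: reg LGEARN S EXP MALE
ss_model, df_model = 25.5575266, 3
ss_resid, df_resid = 127.041693, 496
ss_total, df_total = 152.59922, 499

ms_model = ss_model / df_model  # mean square, model
ms_resid = ss_resid / df_resid  # mean square, residual

r_squared = ss_model / ss_total
adj_r_squared = 1.0 - ms_resid / (ss_total / df_total)
f_stat = ms_model / ms_resid
root_mse = math.sqrt(ms_resid)

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"F({df_model}, {df_resid})     = {f_stat:.2f}")
print(f"Root MSE      = {root_mse:.4f}")
```

The recomputed values should agree with those reported by Stata up to rounding.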


. reg LGEARN S EXP MALE AGE

----------------------------------------------------------------------------
    Source |         SS    df          MS      Number of obs =     500
-----------+------------------------------     F( 4, 495)    =   24.94
     Model | 25.5961696     4  6.39904241      Prob > F      =  0.0000
  Residual |  127.00305   495  .256571818      R-squared     =  0.1677
-----------+------------------------------     Adj R-squared =  0.1610
     Total |  152.59922   499   .30581006      Root MSE      =  .50653
----------------------------------------------------------------------------
    LGEARN |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
-----------+----------------------------------------------------------------
         S |  .0985747   .0108227   9.11   0.000    .0773106   .1198389
       EXP |  .0437575   .0112521   3.89   0.000    .0216497   .0658653
      MALE |  .1895216   .0458735   4.13   0.000    .0993907   .2796525
       AGE | -.0074013   .0190712  -0.39   0.698   -.0448718   .0300691
     _cons |  1.196229   .5028946   2.38   0.018    .2081574     2.1843
----------------------------------------------------------------------------

Now age has been added as an explanatory variable. There is no particular reason to
suppose that age is a relevant explanatory variable, and its coefficient is small and
statistically insignificant.
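
The insignificance of the AGE coefficient can be read off the output as follows. This is a worked check using only the figures in the table. The critical value of about 1.96 is the usual large-sample approximation for a two-sided test at the 5 percent level, added here as an assumption rather than taken from the slides.

```latex
\begin{equation*}
t = \frac{\hat{\beta}_{AGE}}{\text{s.e.}(\hat{\beta}_{AGE})}
  = \frac{-0.0074013}{0.0190712} \approx -0.39,
\qquad |t| \approx 0.39 < 1.96 .
\end{equation*}
```

Consistent with this, the reported 95 percent confidence interval, from -0.0449 to 0.0301, includes zero.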

. cor S EXP MALE AGE
(obs=500)

       |        S      EXP     MALE      AGE
-------+------------------------------------
     S |   1.0000
   EXP |  -0.5836   1.0000
  MALE |  -0.1453   0.0664   1.0000
   AGE |  -0.0362   0.4492   0.0400   1.0000

The correlations of AGE with S, EXP, and MALE are –0.04, 0.45, and 0.04, respectively.


. reg LGEARN S EXP MALE

----------------------------------------------------------------------------
    LGEARN |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
-----------+----------------------------------------------------------------
         S |   .097249   .0102607   9.48   0.000    .0770893   .1174088
       EXP |  .0414485   .0095424   4.34   0.000    .0227001    .060197
      MALE |  .1885338   .0457636   4.12   0.000    .0986193   .2784483
     _cons |  1.017176   .1999318   5.09   0.000    .6243587   1.409994
----------------------------------------------------------------------------

. reg LGEARN S EXP MALE AGE

----------------------------------------------------------------------------
    LGEARN |     Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
-----------+----------------------------------------------------------------
         S |  .0985747   .0108227   9.11   0.000    .0773106   .1198389
       EXP |  .0437575   .0112521   3.89   0.000    .0216497   .0658653
      MALE |  .1895216   .0458735   4.13   0.000    .0993907   .2796525
       AGE | -.0074013   .0190712  -0.39   0.698   -.0448718   .0300691
     _cons |  1.196229   .5028946   2.38   0.018    .2081574     2.1843
----------------------------------------------------------------------------

The inclusion of AGE does not cause the coefficients of S, EXP, and MALE to be biased, and
they are little changed.


The effects on the standard errors of the coefficients of S and MALE are likewise negligible,
as would be expected, given their very low correlations with AGE.


However, the correlation of EXP with AGE is large enough to cause a substantial increase in
the standard error of its coefficient, from 0.0095 to 0.0113, reflecting a loss of efficiency.
Both estimators of the coefficient of EXP are unbiased, but the estimate from the first
regression will tend to be closer to the true value.
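
The changes in the standard errors can be quantified from the two coefficient tables. The sketch below (Python, with hypothetical names) computes the ratio of each standard error with AGE included to that without it. For comparison it also prints 1 / sqrt(1 - r^2) using each variable's simple correlation with AGE. That benchmark comes from the two-regressor formula and is only a rough guide here, since these regressions contain several explanatory variables and the estimate of the disturbance variance also changes slightly.

```python
import math

# Standard errors copied from the two Stata coefficient tables.
SE_WITHOUT_AGE = {"S": 0.0102607, "EXP": 0.0095424, "MALE": 0.0457636}
SE_WITH_AGE = {"S": 0.0108227, "EXP": 0.0112521, "MALE": 0.0458735}
# Simple correlations with AGE copied from the `cor` output.
CORR_WITH_AGE = {"S": -0.0362, "EXP": 0.4492, "MALE": 0.0400}


def se_comparison():
    """Return {variable: (observed s.e. ratio, two-regressor benchmark)}."""
    result = {}
    for name, se_before in SE_WITHOUT_AGE.items():
        observed = SE_WITH_AGE[name] / se_before
        # Rough benchmark only: ignores the other regressors in the model.
        benchmark = 1.0 / math.sqrt(1.0 - CORR_WITH_AGE[name] ** 2)
        result[name] = (observed, benchmark)
    return result


if __name__ == "__main__":
    for name, (observed, benchmark) in se_comparison().items():
        print(f"{name:>4}: s.e. ratio = {observed:.3f}, benchmark = {benchmark:.3f}")
```

The standard error of the EXP coefficient rises by roughly 18 percent, against a rough benchmark of about 12 percent, while that of MALE is almost unchanged. The standard error of S rises by about 5 percent even though its simple correlation with AGE is close to zero, which may reflect its correlation with EXP. The simple benchmark does not capture such indirect effects.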
Copyright Christopher Dougherty 2016.

These slideshows may be downloaded by anyone, anywhere for personal use.


Subject to respect for copyright and, where appropriate, attribution, they may be
used as a resource for teaching an econometrics course. There is no need to
refer to the author.

The content of this slideshow comes from Section 6.3 of C. Dougherty,
Introduction to Econometrics, fifth edition 2016, Oxford University Press.
Additional (free) resources for both students and instructors may be
downloaded from the OUP Online Resource Centre
www.oxfordtextbooks.co.uk/orc/dougherty5e/.

Individuals studying econometrics on their own who feel that they might benefit
from participation in a formal course should consider the London School of
Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
EC2020 Elements of Econometrics
www.londoninternational.ac.uk/lse.

2016.05.04
