
Christopher Dougherty, Introduction to Econometrics, 5th edition

Chapter 6: Specification of Regression Variables

© Christopher Dougherty, 2016. All rights reserved.


VARIABLE MISSPECIFICATION I: OMISSION OF A RELEVANT VARIABLE

Consequences of variable misspecification

True model:    $Y = \beta_1 + \beta_2 X_2 + u$    or    $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$

Fitted model:  $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2$    or    $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$

In this sequence and the next we will investigate the consequences of misspecifying the
regression model in terms of explanatory variables.

1

To keep the analysis simple, we will assume that there are only two possibilities. Either Y
depends only on X2, or it depends on both X2 and X3.

2

If Y depends only on X2, and we fit a simple regression model, we will not encounter any
problems, assuming of course that the regression model assumptions are valid.

3

Likewise we will not encounter any problems if Y depends on both X2 and X3 and we fit the
multiple regression.

4

In this sequence we will examine the consequences of fitting a simple regression when the
true model is multiple.

5

In the next one we will do the opposite and examine the consequences of fitting a multiple
regression when the true model is simple.

6
The cases can be summarized as follows.

True model $Y = \beta_1 + \beta_2 X_2 + u$:
  Fitted $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2$: correct specification, no problems.
  Fitted $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$: examined in the next sequence.

True model $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$:
  Fitted $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2$: coefficients are biased (in general); standard errors are invalid (in general).
  Fitted $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$: correct specification, no problems.

In general, the omission of a relevant explanatory variable causes the regression coefficients to be biased and the standard errors to be invalid.

7
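As a quick illustration, the Monte Carlo sketch below (the coefficient values and the dependence of X3 on X2 are invented for the example, not taken from the text) generates data from a true two-regressor model and compares the simple regression, which omits X3, with the correctly specified multiple regression.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 200, 2000
beta1, beta2, beta3 = 1.0, 2.0, 1.5   # assumed true parameter values

# X3 is correlated with X2, so omitting X3 lets X2 act as a proxy for it.
X2 = rng.normal(size=n)
X3 = 0.5 * X2 + rng.normal(size=n)

b2_simple, b2_multiple = [], []
for _ in range(reps):
    u = rng.normal(size=n)
    Y = beta1 + beta2 * X2 + beta3 * X3 + u
    # Misspecified fit: regress Y on X2 only (slope is the first polyfit coefficient)
    b2_simple.append(np.polyfit(X2, Y, 1)[0])
    # Correct fit: regress Y on a constant, X2 and X3
    X = np.column_stack([np.ones(n), X2, X3])
    b2_multiple.append(np.linalg.lstsq(X, Y, rcond=None)[0][1])

print(np.mean(b2_simple))    # well above beta2 = 2: upward omitted-variable bias
print(np.mean(b2_multiple))  # close to beta2 = 2: unbiased
```

With these invented numbers the slope from regressing X3 on X2 is about 0.5, so the simple-regression estimate of the X2 coefficient is centered near $\beta_2 + 0.5\beta_3 \approx 2.75$ rather than 2.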
True model $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$, fitted as $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2$:

$$E(\hat{\beta}_2) = \beta_2 + \beta_3\,\frac{\sum\,(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}{\sum\,(X_{2i}-\bar{X}_2)^2}$$

In the present case, the omission of X3 causes $\hat{\beta}_2$ to be biased by the second term on the right, the bias term. We will explain this first intuitively and then demonstrate it mathematically.

8
[Diagram: $X_2$ has a direct effect $b_2$ on $Y$, holding $X_3$ constant, and an apparent indirect effect through its association with $X_3$, whose effect on $Y$ is $b_3$.]

The intuitive reason is that, in addition to its direct effect b2, X2 has an apparent indirect
effect as a consequence of acting as a proxy for the missing X3.

9

The strength of the proxy effect depends on two factors: the strength of the effect of X3 on
Y, which is given by b3, and the ability of X2 to mimic X3.

10

The ability of X2 to mimic X3 is determined by the slope coefficient obtained when X3 is regressed on X2, the ratio term in the bias expression.

11
$$\hat{\beta}_2 = \frac{\sum\,(X_{2i}-\bar{X}_2)(Y_i-\bar{Y})}{\sum\,(X_{2i}-\bar{X}_2)^2}$$

$$= \frac{\sum\,(X_{2i}-\bar{X}_2)\bigl[(\beta_1+\beta_2 X_{2i}+\beta_3 X_{3i}+u_i)-(\beta_1+\beta_2\bar{X}_2+\beta_3\bar{X}_3+\bar{u})\bigr]}{\sum\,(X_{2i}-\bar{X}_2)^2}$$

$$= \frac{\beta_2\sum\,(X_{2i}-\bar{X}_2)^2+\beta_3\sum\,(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)+\sum\,(X_{2i}-\bar{X}_2)(u_i-\bar{u})}{\sum\,(X_{2i}-\bar{X}_2)^2}$$

$$= \beta_2+\beta_3\,\frac{\sum\,(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}{\sum\,(X_{2i}-\bar{X}_2)^2}+\frac{\sum\,(X_{2i}-\bar{X}_2)(u_i-\bar{u})}{\sum\,(X_{2i}-\bar{X}_2)^2}$$

We will now derive the expression for the bias mathematically.

12
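The three-component decomposition is an algebraic identity, so it can be checked exactly on a single simulated sample (the parameter values below are invented for the check):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
beta1, beta2, beta3 = 1.0, 2.0, 1.5   # assumed true parameters
X2 = rng.normal(size=n)
X3 = 0.5 * X2 + rng.normal(size=n)
u = rng.normal(size=n)
Y = beta1 + beta2 * X2 + beta3 * X3 + u

d2, d3, du = X2 - X2.mean(), X3 - X3.mean(), u - u.mean()
# OLS slope from regressing Y on X2 alone
b2_hat = np.sum(d2 * (Y - Y.mean())) / np.sum(d2 ** 2)
# The same quantity built from the three components of the decomposition
components = (beta2
              + beta3 * np.sum(d2 * d3) / np.sum(d2 ** 2)
              + np.sum(d2 * du) / np.sum(d2 ** 2))
print(abs(b2_hat - components))   # zero up to floating-point rounding
```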

Although Y really depends on X3 as well as X2, we make a mistake and regress Y on X2 only.
The slope coefficient is therefore as shown.

13

We substitute for Y.

14

We simplify and demonstrate that $\hat{\beta}_2$ has three components.

15
$$\hat{\beta}_2 = \beta_2+\beta_3\,\frac{\sum\,(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}{\sum\,(X_{2i}-\bar{X}_2)^2}+\frac{\sum\,(X_{2i}-\bar{X}_2)(u_i-\bar{u})}{\sum\,(X_{2i}-\bar{X}_2)^2}$$

$$E(\hat{\beta}_2) = \beta_2+\beta_3\,\frac{\sum\,(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}{\sum\,(X_{2i}-\bar{X}_2)^2}+E\!\left[\frac{\sum\,(X_{2i}-\bar{X}_2)(u_i-\bar{u})}{\sum\,(X_{2i}-\bar{X}_2)^2}\right]$$

To investigate biasedness or unbiasedness, we take the expected value of $\hat{\beta}_2$. The first two terms are unaffected because they contain no random components. Thus we focus on the expectation of the error term.
16
$$E\!\left[\frac{\sum\,(X_{2i}-\bar{X}_2)(u_i-\bar{u})}{\sum\,(X_{2i}-\bar{X}_2)^2}\right]=\frac{1}{\sum\,(X_{2i}-\bar{X}_2)^2}\,E\!\left[\sum\,(X_{2i}-\bar{X}_2)(u_i-\bar{u})\right]$$

$$=\frac{1}{\sum\,(X_{2i}-\bar{X}_2)^2}\,\sum E\bigl[(X_{2i}-\bar{X}_2)(u_i-\bar{u})\bigr]$$

$$=\frac{1}{\sum\,(X_{2i}-\bar{X}_2)^2}\,\sum\,(X_{2i}-\bar{X}_2)\,E(u_i-\bar{u})=0$$

X2 is nonstochastic, so the denominator of the error term is nonstochastic and may be taken outside the expression for the expectation.

17

In the numerator the expectation of a sum is equal to the sum of the expectations (first
expected value rule).

18

In each product, the factor involving X2 may be taken out of the expectation because X2 is
nonstochastic.

19

By Assumption A.3, the expected value of u is 0. It follows that the expected value of the
sample mean of u is also 0. Hence the expected value of the error term is 0.

20
$$E(\hat{\beta}_2)=\beta_2+\beta_3\,\frac{\sum\,(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}{\sum\,(X_{2i}-\bar{X}_2)^2}$$

Thus we have shown that the expected value of $\hat{\beta}_2$ is equal to the true value plus a bias term. Note: the bias of an estimator is defined as the difference between its expected value and the true value of the parameter being estimated.
21
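Because the bias formula involves only the fixed X values, it can be verified numerically by holding X2 and X3 fixed and averaging $\hat{\beta}_2$ over many drawings of the disturbance term (a sketch with invented parameter values):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 5000
beta1, beta2, beta3 = 1.0, 2.0, 1.5   # assumed true parameters
X2 = rng.normal(size=n)               # nonstochastic: held fixed across replications
X3 = 0.5 * X2 + rng.normal(size=n)    # also held fixed

d2 = X2 - X2.mean()
# The bias term: beta3 times the slope from regressing X3 on X2
bias = beta3 * np.sum(d2 * (X3 - X3.mean())) / np.sum(d2 ** 2)

slopes = []
for _ in range(reps):
    Y = beta1 + beta2 * X2 + beta3 * X3 + rng.normal(size=n)
    slopes.append(np.sum(d2 * (Y - Y.mean())) / np.sum(d2 ** 2))

print(np.mean(slopes))   # close to beta2 + bias
print(beta2 + bias)
```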

As a consequence of the misspecification, the standard errors, t tests and F test are invalid.

22
$S = \beta_1 + \beta_2\,ASVABC + \beta_3\,SM + u$
. reg S ASVABC SM
----------------------------------------------------------------------------
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 2, 497) = 105.58
Model | 1119.3742 2 559.687101 Prob > F = 0.0000
Residual | 2634.6478 497 5.30110221 R-squared = 0.2982
-----------+------------------------------ Adj R-squared = 0.2954
Total | 3754.022 499 7.52309018 Root MSE = 2.3024
----------------------------------------------------------------------------
S | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
ASVABC | 1.377521 .1229142 11.21 0.000 1.136026 1.619017
SM | .1919575 .0416913 4.60 0.000 .1100445 .2738705
_cons | 11.8925 .5629644 21.12 0.000 10.78642 12.99859
----------------------------------------------------------------------------

We will illustrate the bias using an educational attainment model. To keep the analysis
simple, we will assume that in the true model S depends only on ASVABC and SM. The
output above shows the corresponding regression using EAWE Data Set 21.
23
. cor SM ASVABC
(obs=500)

        |      SM  ASVABC
--------+------------------
     SM |  1.0000
 ASVABC |  0.3594  1.0000

$$E(\hat{\beta}_2)=\beta_2+\beta_3\,\frac{\sum\,(ASVABC_i-\overline{ASVABC})(SM_i-\overline{SM})}{\sum\,(ASVABC_i-\overline{ASVABC})^2}$$

We will run the regression a second time, omitting SM. Before we do this, we will try to
predict the direction of the bias in the coefficient of ASVABC.

24

It is reasonable to suppose, as a matter of common sense, that b3 is positive. This assumption is strongly supported by the fact that its estimate in the multiple regression is positive and highly significant.
25

The correlation between ASVABC and SM is positive, so the numerator of the bias term
must be positive. The denominator is automatically positive since it is a sum of squares
and there is some variation in ASVABC. Hence the bias should be positive.
26
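The sign prediction only needs the sign of b3 and the sign of the slope obtained when the omitted variable is regressed on the included one, as this small sketch (with invented stand-ins for ASVABC and SM) shows:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
x2 = rng.normal(size=n)                 # stands in for ASVABC
x3 = 0.7 * x2 + rng.normal(size=n)      # stands in for SM: positively correlated
beta3 = 0.2                             # assumed positive, like the SM coefficient

d2 = x2 - x2.mean()
# Slope from regressing the omitted x3 on the included x2
mimic_slope = np.sum(d2 * (x3 - x3.mean())) / np.sum(d2 ** 2)
bias = beta3 * mimic_slope
print(mimic_slope > 0, bias > 0)   # both True: positive correlation gives upward bias
```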
. reg S ASVABC

      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  1,   498) =  182.56
       Model |  1006.99534     1  1006.99534           Prob > F      =  0.0000
    Residual |  2747.02666   498   5.5161178           R-squared     =  0.2682
-------------+------------------------------           Adj R-squared =  0.2668
       Total |    3754.022   499  7.52309018           Root MSE      =  2.3486

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   1.580901   .1170059    13.51   0.000     1.351015    1.810787
       _cons |   14.43677   .1097335   131.56   0.000     14.22117    14.65237
------------------------------------------------------------------------------

Here is the regression omitting SM.

27

Comparing the two regressions, the coefficient of ASVABC is indeed higher when SM is omitted. Part of the difference may be due to pure chance, but part is attributable to the bias.

28
. reg S SM
----------------------------------------------------------------------------
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 1, 498) = 68.44
Model | 453.551645 1 453.551645 Prob > F = 0.0000
Residual | 3300.47036 498 6.62745051 R-squared = 0.1208
-----------+------------------------------ Adj R-squared = 0.1191
Total | 3754.022 499 7.52309018 Root MSE = 2.5744
----------------------------------------------------------------------------
S | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
SM | .3598719 .0435019 8.27 0.000 .2744021 .4453417
_cons | 9.992614 .6002469 16.65 0.000 8.813286 11.17194
----------------------------------------------------------------------------

$$E(\hat{\beta}_3)=\beta_3+\beta_2\,\frac{\sum\,(ASVABC_i-\overline{ASVABC})(SM_i-\overline{SM})}{\sum\,(SM_i-\overline{SM})^2}$$

Here is the regression omitting ASVABC instead of SM. We would expect $\hat{\beta}_3$ to be upward biased. We anticipate that b2 is positive, and we know that both the numerator and the denominator of the other factor in the bias expression are positive.
29

In this case the bias is quite dramatic. The coefficient of SM has nearly doubled. The
reason for the bigger effect is that the variation in SM is much smaller than that in ASVABC,
while b2 and b3 are similar in size, judging by their estimates.
30

Finally, we will investigate how R2 behaves when a variable is omitted. In the simple
regression of S on ASVABC, R2 is 0.27, and in the simple regression of S on SM it is 0.12.

31

Does this imply that ASVABC explains 27% of the variance in S and SM 12%? No, because
the multiple regression reveals that their joint explanatory power is 0.30, not 0.39.

32

In the second regression, ASVABC is partly acting as a proxy for SM, and this inflates its
apparent explanatory power. Similarly, in the third regression, SM is partly acting as a
proxy for ASVABC, again inflating its apparent explanatory power.
33
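This non-additivity of R2 with positively correlated regressors can be reproduced in a simulation (variable names and coefficients are invented; only the positive correlation between the regressors matters):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
ability = rng.normal(size=n)
sm = 0.4 * ability + rng.normal(size=n)        # positively correlated with ability
s = 1.0 + 1.0 * ability + 0.5 * sm + rng.normal(size=n)

def r_squared(cols, y):
    # OLS R-squared from regressing y on a constant plus the given columns
    X = np.column_stack([np.ones(len(y))] + cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

r2_ability = r_squared([ability], s)
r2_sm = r_squared([sm], s)
r2_both = r_squared([ability, sm], s)
print(r2_ability + r2_sm, r2_both)   # the sum exceeds the joint R-squared
```

Each simple regression credits its regressor with part of the other's explanatory power, so the two simple R2 values overlap and their sum overstates the joint R2.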
$LGEARN = \beta_1 + \beta_2\,S + \beta_3\,EXP + u$
. reg LGEARN S EXP
----------------------------------------------------------------------------
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 2, 497) = 40.12
Model | 21.2104059 2 10.6052029 Prob > F = 0.0000
Residual | 131.388814 497 .264363811 R-squared = 0.1390
-----------+------------------------------ Adj R-squared = 0.1355
Total | 152.59922 499 .30581006 Root MSE = .51416
----------------------------------------------------------------------------
LGEARN | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | .0916942 .0103338 8.87 0.000 .0713908 .1119976
EXP | .0405521 .009692 4.18 0.000 .0215098 .0595944
_cons | 1.199799 .1980634 6.06 0.000 .8106537 1.588943
----------------------------------------------------------------------------

However, it is also possible for omitted variable bias to lead to a reduction in the apparent
explanatory power of a variable. This will be demonstrated using a simple earnings
function model, supposing the logarithm of hourly earnings to depend on S and EXP.
34
. cor S EXP
(obs=500)

        |       S     EXP
--------+------------------
      S |  1.0000
    EXP | -0.5836  1.0000

$$E(\hat{\beta}_2)=\beta_2+\beta_3\,\frac{\sum\,(S_i-\bar{S})(EXP_i-\overline{EXP})}{\sum\,(S_i-\bar{S})^2}$$

If we omit EXP from the regression, the coefficient of S should be subject to a downward
bias. b3 is likely to be positive. The numerator of the other factor in the bias term is
negative since S and EXP are negatively correlated. The denominator is positive.
35
$$E(\hat{\beta}_3)=\beta_3+\beta_2\,\frac{\sum\,(EXP_i-\overline{EXP})(S_i-\bar{S})}{\sum\,(EXP_i-\overline{EXP})^2}$$

For the same reasons, the coefficient of EXP in a simple regression of LGEARN on EXP
should be downwards biased.

36
. reg LGEARN S

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .0664621   .0085297     7.79   0.000     .0497034    .0832207
       _cons |    1.83624   .1289384    14.24   0.000      1.58291    2.089571
------------------------------------------------------------------------------

. reg LGEARN EXP

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         EXP |  -.0096339   .0084625    -1.14   0.255    -.0262605    .0069927
       _cons |   2.886352   .0598796    48.20   0.000     2.768704    3.003999
------------------------------------------------------------------------------

As can be seen, the coefficients of S and EXP are indeed lower in the simple regressions.
In the third regression, the negative bias is sufficient to wipe out the positive effect of
EXP.
37

Indeed, the regression suggests that EXP has a very small, statistically insignificant negative effect on earnings: very small in terms of the numerical coefficient, and hence also very small in terms of R2.
38

If one plotted the data for LGEARN and EXP in a simple scatter diagram, not
controlling for S, one would see no relationship at all.
39
VARIABLE MISSPECIFICATION I: OMISSION OF A RELEVANT VARIABLE

. reg LGEARN S EXP


Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 2, 497) = 40.12
Model | 21.2104059 2 10.6052029 Prob > F = 0.0000
Residual | 131.388814 497 .264363811 R-squared = 0.1390
-----------+------------------------------ Adj R-squared = 0.1355
Total | 152.59922 499 .30581006 Root MSE = .51416

. reg LGEARN S
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 1, 498) = 60.71
Model | 16.5822819 1 16.5822819 Prob > F = 0.0000
Residual | 136.016938 498 .273126381 R-squared = 0.1087
-----------+------------------------------ Adj R-squared = 0.1069
Total | 152.59922 499 .30581006 Root MSE = .52261

. reg LGEARN EXP


Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 1, 498) = 1.30
Model | .396095486 1 .396095486 Prob > F = 0.2555
Residual | 152.203124 498 .305628763 R-squared = 0.0026
-----------+------------------------------ Adj R-squared = 0.0006
Total | 152.59922 499 .30581006 Root MSE = .55284

A comparison of R2 for the three regressions shows that the sum of R2 in the two simple
regressions (0.1087 + 0.0026 = 0.1113) is actually less than R2 in the multiple
regression (0.1390).

40
VARIABLE MISSPECIFICATION I: OMISSION OF A RELEVANT VARIABLE

. reg LGEARN S EXP


Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 2, 497) = 40.12
Model | 21.2104059 2 10.6052029 Prob > F = 0.0000
Residual | 131.388814 497 .264363811 R-squared = 0.1390
-----------+------------------------------ Adj R-squared = 0.1355
Total | 152.59922 499 .30581006 Root MSE = .51416

. reg LGEARN S
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 1, 498) = 60.71
Model | 16.5822819 1 16.5822819 Prob > F = 0.0000
Residual | 136.016938 498 .273126381 R-squared = 0.1087
-----------+------------------------------ Adj R-squared = 0.1069
Total | 152.59922 499 .30581006 Root MSE = .52261

. reg LGEARN EXP


Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 1, 498) = 1.30
Model | .396095486 1 .396095486 Prob > F = 0.2555
Residual | 152.203124 498 .305628763 R-squared = 0.0026
-----------+------------------------------ Adj R-squared = 0.0006
Total | 152.59922 499 .30581006 Root MSE = .55284

This is because the apparent explanatory power of S in the second regression has been
undermined by the downward bias in its coefficient. As we have seen, the effect is even
more pronounced for the apparent explanatory power of EXP in the third regression.
41
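The direction of both biases follows from the standard omitted-variable-bias expression for the two-regressor model, restated here in the chapter's notation (with $X_3$ the omitted variable):

$$\mathrm{E}\bigl(\hat{\beta}_2\bigr) \;=\; \beta_2 \;+\; \beta_3\,\frac{\sum_i \bigl(X_{2i}-\bar{X}_2\bigr)\bigl(X_{3i}-\bar{X}_3\bigr)}{\sum_i \bigl(X_{2i}-\bar{X}_2\bigr)^2}$$

In this sample S and EXP are negatively correlated (those with more schooling have had less time to accumulate work experience), while both true coefficients are positive. The bias term is therefore negative in each simple regression, dragging the estimated coefficients of S and EXP below their multiple-regression counterparts.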
Copyright Christopher Dougherty 2016.

These slideshows may be downloaded by anyone, anywhere for personal use.


Subject to respect for copyright and, where appropriate, attribution, they may be
used as a resource for teaching an econometrics course. There is no need to
refer to the author.

The content of this slideshow comes from Section 6.2 of C. Dougherty,


Introduction to Econometrics, fifth edition 2016, Oxford University Press.
Additional (free) resources for both students and instructors may be
downloaded from the OUP Online Resource Centre
www.oxfordtextbooks.co.uk/orc/dougherty5e/.

Individuals studying econometrics on their own who feel that they might benefit
from participation in a formal course should consider the London School of
Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
EC2020 Elements of Econometrics
www.londoninternational.ac.uk/lse.

2016.05.04
