
ECONOMETRICS II

TUTORIAL II
This second tutorial shows how to estimate a model with a method
more general than OLS, namely the Generalized Method of Moments
(GMM), which embeds the OLS, IV and GIVE (or Two-stage Least
Squares) estimators as special cases.

To this end we will use the data contained in the schooling2.wf1 workfile.
These data are used in Example 5.4 and in Exercise 5.2 of Verbeek and
were originally used in an article by David Card (1995). The data refer to
3,010 US men observed in 1976. The variables are wage, experience and
other individual characteristics (race, residency in a metropolitan area
and residency in Southern regions). With these data we want to estimate
a human capital earning function (i.e. a function which explains wages
as a function of the human capital accumulated by the individual):
$$w_i = \beta_1 + \beta_2 S_i + \beta_3 E_i + \beta_4 E_i^2 + \varepsilon_i \qquad (1)$$
where $w$ is the log wage (lwage76), $S$ is the education level (ed76, in years),
$E$ is work experience (exp76, in years) and $E^2$ is its square (exp762).
Both $S$ and $E$ are endogenous ($S$ is an individual choice and $E$ is
computed as age $- S - 6$), so appropriate instruments must be found.
The choice of instruments for $E$ and $E^2$ is quite easy: age and age
squared. Finding appropriate instruments for $S$ is more difficult. The
labour economics literature has proposed two groups of variables as
possible instruments: 1) those related to the family environment of the
individual (e.g. parents' education) and 2) those related to the
institutional setting of the schooling system (e.g. whether or not the
individual lives close to a college). We will use both types of variables,
instrumenting $S$ first with college proximity (dummy nearc4) and then
also with the education of both parents (daded and momed, in years).
In the first case the model is exactly identified and in the second it is
overidentified.

1 Some IV language (from Steve Pischke, LSE)


Imagine the model you are interested in is the following:

$$w_i = \beta_1 + \beta_2 S_i + \gamma A_i + e_i \qquad (2)$$
where $A_i$ is unobserved worker ability.
Call the instrumental variable $Z_i$. A valid instrument needs to satisfy
three conditions:

I. $Z_i$ is as good as randomly assigned.

II. $Z_i$ satisfies the exclusion restriction, i.e. it does not appear as a
separate regressor in the model we are interested in.

III. $Z_i$ affects the endogenous regressor $S_i$.

Of these, only condition III can be tested (see below). Conditions I
and II have to be argued based on knowledge from outside the data we
have.
The instrumental variables language comes from simultaneous equations
models. Within an instrumental variable model, we usually identify
the:

• First stage: The regression of schooling on the instrument(s) is
called the first stage:

$$S_i = \pi_{10} + \pi_{11} Z_i + \eta_{1i}$$

• Reduced form: The regression of earnings on the instrument(s) is
called the reduced form:

$$w_i = \pi_{20} + \pi_{21} Z_i + \eta_{2i}$$

• Structural equation: The regression of earnings on schooling is
called the structural equation:

$$w_i = \beta_1 + \beta_2 S_i + \varepsilon_i$$

where $\varepsilon_i = \gamma A_i + e_i$ ($A_i$ is unobserved ability, correlated
with schooling), i.e. it is a structural error term, not a regression residual.
The structural equation is the one we are ultimately interested in:
we want to know the return to schooling $\beta_2$.¹

¹ The coefficients in the three equations are linked. Substituting the first
stage into the structural equation:

$$w_i = (\beta_1 + \beta_2 \pi_{10}) + \beta_2 \pi_{11} Z_i + (\beta_2 \eta_{1i} + \varepsilon_i) = \pi_{20} + \pi_{21} Z_i + \eta_{2i}$$

so that $\pi_{21} = \beta_2 \pi_{11}$ and $\beta_2 = \pi_{21} / \pi_{11}$. This
relationship only pins down $\beta_2$ when there is one endogenous regressor
and one instrument (the model is just identified). If there are multiple
instruments for a single endogenous regressor the model is over-identified
and there is no unique way to recover $\beta_2$ from the first-stage and
reduced-form coefficients.
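
The link between the three equations can be checked numerically. The following is a minimal sketch on simulated data (not the schooling2 workfile); the data-generating values and all variable names are assumptions made for illustration. In the just-identified case the ratio of the reduced-form slope to the first-stage slope coincides with the IV estimate, while OLS is pushed upward by the omitted ability term.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data (hypothetical values, chosen only for illustration)
z = rng.binomial(1, 0.5, size=n).astype(float)    # instrument, e.g. a proximity dummy
ability = rng.normal(size=n)                       # unobserved ability A_i
s = 12.0 + 1.0 * z + 1.0 * ability + rng.normal(size=n)              # schooling
w = 1.0 + 0.10 * s + 0.5 * ability + rng.normal(scale=0.3, size=n)   # log wage

def ols(y, X):
    """OLS coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)
Z = np.column_stack([const, z])
X = np.column_stack([const, s])

pi1 = ols(s, Z)            # first stage:  (pi_10, pi_11)
pi2 = ols(w, Z)            # reduced form: (pi_20, pi_21)
b_ils = pi2[1] / pi1[1]    # beta_2 recovered as pi_21 / pi_11

b_iv = np.linalg.solve(Z.T @ X, Z.T @ w)   # IV estimator (Z'X)^{-1} Z'y
b_ols = ols(w, X)                           # OLS ignores the ability term

print("ratio:", b_ils, "IV:", b_iv[1], "OLS:", b_ols[1])
```

With one instrument and one endogenous regressor the first two printed numbers agree up to rounding error; with several instruments the ratio is no longer uniquely defined.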

If the instruments exhibit only weak correlation with the endogenous
regressor(s) (condition III above), the properties of the IV estimator
can be very poor (the IV estimator is biased, its standard errors are
misleading, hypothesis tests are unreliable).
To test whether there is a weak instruments problem, it is useful to:
• Report the first stage and think about whether it makes sense. Are
the magnitude and sign as you would expect?
• Report the F-statistic on the excluded instruments. The bigger
this is, the better. F-statistics above 10 to 20 are considered
relatively safe; lower F-statistics put you in the danger zone.
• Pick your “best” single instrument and report just-identified
estimates using this one only. Just-identified IV is approximately
median-unbiased, so it is less exposed to weak-instrument bias than
over-identified IV.
• Look at the coefficients, t-statistics, and F-statistics for excluded
instruments in the reduced-form regression of the dependent variable
on the instruments. The reduced-form estimates are just OLS, so they
are unbiased. If the relationship you expect is not in the reduced
form, it is probably not there.
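
The F-statistic on the excluded instruments can be computed by comparing the first stage with and without those instruments. The sketch below assumes homoskedastic errors and uses simulated data; the function name `excluded_instruments_F` and the data-generating values are hypothetical.

```python
import numpy as np

def excluded_instruments_F(endog, exog_included, instruments_excluded):
    """F-statistic for H0: all excluded-instrument coefficients are zero in
    the first-stage regression of `endog` on the included exogenous
    regressors plus the excluded instruments (homoskedastic formula)."""
    n = endog.shape[0]
    X_r = exog_included                                            # restricted
    X_u = np.column_stack([exog_included, instruments_excluded])   # unrestricted

    def rss(y, X):
        resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        return resid @ resid

    q = X_u.shape[1] - X_r.shape[1]    # number of excluded instruments
    k = X_u.shape[1]                   # parameters in the unrestricted model
    rss_r, rss_u = rss(endog, X_r), rss(endog, X_u)
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))

# Simulated first stage (hypothetical values)
rng = np.random.default_rng(1)
n = 2000
const = np.ones((n, 1))
z_strong = rng.normal(size=(n, 1))     # shifts schooling by construction
z_noise = rng.normal(size=(n, 1))      # unrelated to schooling by construction
s = 12.0 + 0.5 * z_strong[:, 0] + rng.normal(size=n)

F_strong = excluded_instruments_F(s, const, z_strong)
F_weak = excluded_instruments_F(s, const, z_noise)
print("relevant instrument:", F_strong, "irrelevant instrument:", F_weak)
```

The relevant instrument should produce an F-statistic far above the 10 to 20 range, while the irrelevant one typically stays in the danger zone.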

2 GMM ESTIMATOR AND SPECIAL CASES


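The GMM estimator is the vector minimizing the following expression:
$$\begin{aligned}
\min_{\beta} Q_n(\beta, W_n) &= g_n(\beta)' W_n \, g_n(\beta) \qquad (3) \\
&= \Big[\frac{1}{n}\sum_i z_i (y_i - x_i'\beta)\Big]' W_n \Big[\frac{1}{n}\sum_i z_i (y_i - x_i'\beta)\Big] \\
&= \Big[\frac{1}{n} Z'(y - X\beta)\Big]' W_n \Big[\frac{1}{n} Z'(y - X\beta)\Big]
\end{aligned}$$
It is given by:
$$\begin{aligned}
b_{GMM} &= (X'Z W_n Z'X)^{-1} X'Z W_n Z'y \qquad (4) \\
&= (S_{XZ} W_n S_{ZX})^{-1} S_{XZ} W_n s_{ZY}
\end{aligned}$$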
The optimal GMM estimator ($b^{opt}_{GMM}$) is the one which uses an estimator
of the optimal weighting matrix, i.e. the inverse of the moment variance matrix
$$\Sigma = E[\varepsilon_i^2 z_i z_i'] \qquad (5)$$
which is estimated by
$$S = \frac{1}{n}\sum_i \hat{\varepsilon}_i^2 z_i z_i' \qquad (6)$$
where usually $\hat{\varepsilon}_i$ is the residual obtained in a first step in
which the $W_n$ matrix is equal to the identity matrix. Therefore
$$b^{opt}_{GMM} = (X'Z S^{-1} Z'X)^{-1} X'Z S^{-1} Z'y = (S_{XZ} S^{-1} S_{ZX})^{-1} S_{XZ} S^{-1} s_{ZY}$$
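
The two-step procedure can be sketched in a few lines of NumPy. This is an illustrative implementation under the formulas above, not EViews' own routine; the function names and the simulated heteroskedastic data are assumptions.

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    """Linear GMM estimator (X'Z W Z'X)^{-1} X'Z W Z'y for a given W."""
    XZ = X.T @ Z                                    # K x R matrix X'Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def two_step_gmm(y, X, Z):
    """Two-step GMM: W = I in the first step, W = S^{-1} in the second."""
    n = len(y)
    b_first = linear_gmm(y, X, Z, np.eye(Z.shape[1]))
    resid = y - X @ b_first                         # first-step residuals
    S = (Z * resid[:, None] ** 2).T @ Z / n         # (1/n) sum e_i^2 z_i z_i'
    b_opt = linear_gmm(y, X, Z, np.linalg.inv(S))
    return b_first, b_opt, S

# Simulated overidentified model with heteroskedastic errors (hypothetical)
rng = np.random.default_rng(2)
n = 5000
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])     # R = 4
v = rng.normal(size=n)
x = Z[:, 1] + Z[:, 2] + Z[:, 3] + v                            # endogenous regressor
X = np.column_stack([np.ones(n), x])                           # K = 2
eps = 0.5 * v + (1.0 + np.abs(Z[:, 1])) * rng.normal(size=n)   # variance depends on z
y = X @ np.array([1.0, 2.0]) + eps

b_first, b_opt, S = two_step_gmm(y, X, Z)
print("first step:", b_first, "optimal:", b_opt)
```

Both steps are consistent; the second step only changes the weighting of the moment conditions, which matters for efficiency when the model is overidentified.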

2.1 Special cases


2.1.1 IV estimator
Given the usual $R$ moment conditions
$$E[z_i (y_i - x_i'\beta)] = 0 \qquad (7)$$

if $R = K$ then the $X'Z$ matrix is square and invertible, so that the
weighting matrix is irrelevant:

$$\begin{aligned}
b_{GMM} &= (X'Z W_n Z'X)^{-1} X'Z W_n Z'y \qquad (8) \\
&= (Z'X)^{-1} W_n^{-1} (X'Z)^{-1} X'Z W_n Z'y \\
&= (Z'X)^{-1} Z'y = S_{ZX}^{-1} s_{ZY} = b_{IV}
\end{aligned}$$
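
The irrelevance of the weighting matrix when $R = K$ is easy to verify numerically. The sketch below uses simulated data and an arbitrary positive definite matrix; all names and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 500, 3
Z = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # R = K instruments
X = Z + 0.5 * rng.normal(size=(n, K))        # regressors correlated with Z
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

def linear_gmm(y, X, Z, W):
    """Linear GMM estimator (X'Z W Z'X)^{-1} X'Z W Z'y."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)     # (Z'X)^{-1} Z'y

A = rng.normal(size=(K, K))
W_random = A @ A.T + np.eye(K)               # arbitrary positive definite weights
b_identity = linear_gmm(y, X, Z, np.eye(K))
b_random = linear_gmm(y, X, Z, W_random)

# Both comparisons should hold up to floating-point error
print(np.allclose(b_identity, b_iv), np.allclose(b_random, b_iv))
```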

2.1.2 OLS estimator


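Consistency of the OLS estimator relies on the condition

$$E[x_i \varepsilon_i] = 0 \qquad (9)$$

which can be rewritten as

$$E[x_i (y_i - x_i'\beta)] = 0 \qquad (10)$$

These are the moment conditions (7) with $z_i = x_i$, i.e. the regressors
act as their own instruments, so that

$$\begin{aligned}
b_{GMM} &= (X'X)^{-1} W_n^{-1} (X'X)^{-1} X'X W_n X'y \qquad (11) \\
&= (X'X)^{-1} X'y = S_{XX}^{-1} s_{XY} = b_{OLS}
\end{aligned}$$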

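2.1.3 Generalized IV Estimator (GIVE estimator)

If the errors are conditionally homoskedastic, $E[\varepsilon_i^2 \mid z_i] = \sigma^2$,
the moment variance matrix is $\Sigma = \sigma^2 E[z_i z_i']$ and the optimal
weighting matrix can be estimated by
$$W_n^{opt} = \Big(s^2 \frac{1}{n} Z'Z\Big)^{-1} \qquad (12)$$
so that, since the scalar $s^2$ cancels,

$$\begin{aligned}
b_{GMM} &= (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'y \qquad (13) \\
&= (S_{XZ} S_{ZZ}^{-1} S_{ZX})^{-1} S_{XZ} S_{ZZ}^{-1} s_{ZY} = b_{GIVE}
\end{aligned}$$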

2.1.4 Two-stage Least Squares estimator
The GIVE estimator can also be seen as the outcome of a two-stage
estimation procedure.

First stage: each variable of the $x$ vector is regressed on the instrument
set $z$ by OLS. The matrix of fitted values is
$$\hat{X} = Z(Z'Z)^{-1} Z'X = P_Z X \qquad (14)$$

Second stage: the variable $y$ is regressed on the matrix of fitted values
$\hat{X}$. The OLS estimator of the second stage is called the Two-stage
Least Squares estimator and is given by
$$\begin{aligned}
b_{TSLS} &= (\hat{X}'\hat{X})^{-1} \hat{X}'y \qquad (15) \\
&= (X'Z(Z'Z)^{-1}Z'X)^{-1} X'Z(Z'Z)^{-1}Z'y \\
&= (S_{XZ} S_{ZZ}^{-1} S_{ZX})^{-1} S_{XZ} S_{ZZ}^{-1} s_{ZY} = b_{GIVE}
\end{aligned}$$

Because $P_Z$ is symmetric and idempotent, $\hat{X}'\hat{X} = X'P_Z X = \hat{X}'X$,
so the $b_{TSLS}$ estimator can also be written as

$$\begin{aligned}
b_{TSLS} &= (\hat{X}'X)^{-1} \hat{X}'y \qquad (16) \\
&= (X'Z(Z'Z)^{-1}Z'X)^{-1} X'Z(Z'Z)^{-1}Z'y \\
&= (S_{XZ} S_{ZZ}^{-1} S_{ZX})^{-1} S_{XZ} S_{ZZ}^{-1} s_{ZY} = b_{GIVE}
\end{aligned}$$
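
The equivalence of the two-stage procedure, the form in (16) and the GIVE closed form in (13) can be illustrated as follows. This is a sketch on simulated data; the coefficient values and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])   # R = 4 instruments
v = rng.normal(size=n)
x = Z[:, 1:] @ np.array([1.0, 0.5, 0.5]) + v                  # endogenous regressor
X = np.column_stack([np.ones(n), x])                          # K = 2 regressors
y = X @ np.array([1.0, 2.0]) + 0.8 * v + rng.normal(size=n)

# First stage: regress every column of X on Z and keep the fitted values
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]

# Second stage: OLS of y on the fitted values, as in equation (15)
b_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Equation (16): the fitted values used as instruments for X
b_iv_form = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

# GIVE closed form, as in equation (13)
ZZ_inv = np.linalg.inv(Z.T @ Z)
XZ = X.T @ Z
b_give = np.linalg.solve(XZ @ ZZ_inv @ XZ.T, XZ @ ZZ_inv @ (Z.T @ y))

print(b_2sls, b_iv_form, b_give)
```

Note that only the coefficients coincide: the residuals of a manually run second stage are $y - \hat{X}b$ rather than $y - Xb$, so standard errors taken directly from that regression are not the correct 2SLS standard errors.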

2.1.5 Sargan-Hansen test


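If $R = K$ the statistic
$$\begin{aligned}
\xi &= n \Big[\frac{1}{n}\sum_i z_i (y_i - x_i' b^{opt}_{GMM})\Big]' \Big[\frac{1}{n}\sum_i \hat{\varepsilon}_i^2 z_i z_i'\Big]^{-1} \Big[\frac{1}{n}\sum_i z_i (y_i - x_i' b^{opt}_{GMM})\Big] \\
&= n Q_n(b^{opt}_{GMM}, W^{opt}_n) = \Big[\sum_i z_i \hat{\varepsilon}_i\Big]' \Big[\sum_i \hat{\varepsilon}_i^2 z_i z_i'\Big]^{-1} \Big[\sum_i z_i \hat{\varepsilon}_i\Big] \\
&= 0 \qquad (17)
\end{aligned}$$
because the $K$ parameters can be chosen to set all $R = K$ sample moments
exactly to zero.

If $R > K$, the statistic takes strictly positive values, but these are close
to zero if the assumptions on the moment conditions are satisfied
($\text{plim}\, \frac{1}{n}\sum_i z_i \hat{\varepsilon}_i = 0$), so that we can
reasonably expect $\frac{1}{n}\sum_i z_i \hat{\varepsilon}_i$ to be small.
Under the null that all moment conditions are valid,
$$\xi \xrightarrow{d} \chi^2(R - K) \qquad (18)$$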
3 GMM AND SPECIAL CASES ESTIMATION IN EViews
EViews allows one to estimate a regression equation with the methods
seen above by specifying first the estimation method (in the Estimation
settings window) and then the regressors and instruments.
The TSLS (Two-Stage Least Squares) category covers the GIVE and IV
methods; GMM is a separate category. In both cases we have to specify
regressors and instruments in the corresponding windows. Notice that:

1. non-endogenous regressors (exogenous or predetermined) have to
be included in the instrument list as well;
2. as the constant is always a valid instrument, EViews adds it to
the instruments by default.

When performing GMM estimation it is necessary to choose the type of
weighting matrix, i.e. White (for cross-sectional data) or Newey-West
(for time series data).

When performing GMM estimation, the EViews output also reports the
“J-statistic”, given by the objective function $Q_n(b^{opt}_{GMM}, W^{opt}_n)$.
It is then possible to perform a Sargan-Hansen test by “manually”
multiplying the J-statistic by the number of observations in the regression.
The syntax for an equation called eq06 with 2 overidentifying restrictions is:

scalar overid = eq06.@regobs*eq06.@jstat

scalar overidp=1-@cchisq(overid, 2)
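
For readers without EViews at hand, the same arithmetic can be reproduced as follows. This is a sketch using only the Python standard library; the numbers standing in for eq06.@regobs and eq06.@jstat are hypothetical, and the helper `chi2_sf_even_df` is an illustrative name that only covers an even number of degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square with an even number of degrees
    of freedom: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    terms = (half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * sum(terms)

nobs = 2000       # hypothetical stand-in for eq06.@regobs
jstat = 0.0015    # hypothetical stand-in for eq06.@jstat

overid = nobs * jstat                    # Sargan-Hansen statistic
overidp = chi2_sf_even_df(overid, 2)     # plays the role of 1 - @cchisq(overid, 2)
print(overid, overidp)
```

A large p-value means the overidentifying restrictions are not rejected.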

4 EQUATIONS TO ESTIMATE
1. Estimate the earnings function (1) with OLS, also including among
the regressors the three dummies for race (black), for residency in a
metropolitan area (smsa76) and for residency in Southern regions
(south76).

2. Estimate the first stage for the education level by regressing ed76
with OLS on all exogenous variables of the model (i.e. the three
dummies black, smsa76 and south76, and the three instruments age76,
age762 and nearc4).
Perform an F test on the excluded instruments (age76, age762 and
nearc4), to test whether there is a weak instruments problem.

3. Estimate the equation by 2SLS, instrumenting experience, experience
squared and the education level with age, age squared and the college
proximity dummy.

4. Estimate by GMM the same equation as in point 3.

5. Estimate the first stage in point 2, adding father's and mother's
education (daded and momed) to the regressors.
Perform an F test on the excluded instruments (momed, daded,
age76, age762 and nearc4), to test whether there is a weak
instruments problem.

6. Estimate by GMM the same equation as in point 3, adding momed and
daded as additional instruments, and test the validity of the two
overidentifying restrictions.

7. Estimate by 2SLS the same equation with the same instruments as in
point 6, i.e. with mother's and father's education added to the
instruments.
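
Before estimating, it can help to write down the regressor and instrument lists and check the order condition $R \geq K$. The sketch below uses the variable names quoted in the text, with "c" standing for the constant; the list layout and the helper function are illustrative assumptions.

```python
# Regressors of the earnings function with the three dummies (K = 7)
regressors = ["c", "ed76", "exp76", "exp762", "black", "smsa76", "south76"]
endogenous = {"ed76", "exp76", "exp762"}
exogenous = [name for name in regressors if name not in endogenous]

# Exogenous regressors must appear in the instrument list as well
instrument_sets = {
    "points 3-4": exogenous + ["age76", "age762", "nearc4"],
    "points 6-7": exogenous + ["age76", "age762", "nearc4", "daded", "momed"],
}

def overidentifying_restrictions(regressors, instruments):
    """R - K; a negative value means the order condition fails."""
    return len(instruments) - len(regressors)

for label, instruments in instrument_sets.items():
    r_minus_k = overidentifying_restrictions(regressors, instruments)
    if r_minus_k < 0:
        status = "under-identified"
    elif r_minus_k == 0:
        status = "exactly identified"
    else:
        status = f"{r_minus_k} overidentifying restrictions"
    print(label, "->", status)
```

The count for points 6 and 7 matches the degrees of freedom used in the Sargan-Hansen test of point 6.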

The outputs from each of the 7 estimated equations above are:

1. OLS estimation
2. First stage for ed76
3. Two-stage Least Squares, exactly identified
4. GMM, exactly identified
5. First stage for ed76 with additional instruments
6. GMM, overidentified
7. Two-stage Least Squares, overidentified

[EViews output tables not reproduced]
