
General Advice Introduction Theory Estimation and inference Implementation (Matlab) Conclusion

Generalized Method of Moments
GMM in Applied Settings

Ashvin Gandhi (agandhi@fas.harvard.edu)
Harvard University
September 16, 2015

Based on previous notes by Daniel Pollmann, Tom Wollmann, and Michael
Sinkinson.
Harvard University Generalized Method of Moments September 16, 2015 1 / 31

Lectures

I Go to the lectures.

I Actively follow along in the notes.

I Read the notes beforehand.

I Review the notes afterwards.


Readings

I Do the reading.


Problem Sets

I Start early.

I Read the papers.

I Work together, but do not copy code or content.

I Show your work.

I Comment your code.

I Package your code.

I Include your code in your writeup (LaTeX package mcode).


Sections

I Not weekly.

I Not required.

I Potentially long.

I Hopefully helpful.

I Email me questions or topics beforehand.


Office Hours

I By appointment.

I Email: agandhi@fas.harvard.edu


Seminars

I Monday and Wednesday at 2:30 PM.

I Go to as many as possible.

I Familiarize yourself with the tools and how they are used.

I Learn what current IO research looks like.


Overview

I Objective:

I What is GMM? How is it different from other estimation
techniques?
I Some technical details, and translating models into moments.
I Implementing GMM in Matlab.

I For derivations and details, I highly recommend Gary
Chamberlain's ECON 2120 Lecture Note 16 and Alberto
Abadie's ECON 2140 Extremum Estimators Handout. There
are also a number of great texts out there.


What is GMM?

I GMM is a framework for identifying parameters by leveraging
relations the econometrician would like to hold in expectation:

E[ψ(wi; θ0)] = 0.

I Need at least as many identifying moments as parameters.

I It may not be possible to impose all of these identifying
moments simultaneously.

I Solution: a weighted penalty for deviation from the moments:

θ0 = argmin_θ E[ψ(wi; θ)]′ W E[ψ(wi; θ)],

where W is positive semidefinite.

I Heuristic: what parameters allow the data to best fit the
identifying moments the econometrician wants to hold?

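The weighted-penalty idea is easy to see numerically. Below is a minimal Python sketch (my own toy example, not course code) that estimates a mean and variance from the two moment conditions E[w − μ] = 0 and E[(w − μ)² − σ²] = 0 by minimizing the quadratic-form objective:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
w = rng.normal(loc=2.0, scale=1.5, size=5000)  # toy data, true mean 2, variance 2.25

def psi(theta, w):
    """Moment function: E[w - mu] = 0 and E[(w - mu)^2 - sigma2] = 0."""
    mu, sigma2 = theta
    return np.column_stack([w - mu, (w - mu) ** 2 - sigma2])

def Q(theta, w, W):
    """Weighted GMM objective: gbar' W gbar, gbar = sample mean of psi."""
    gbar = psi(theta, w).mean(axis=0)
    return gbar @ W @ gbar

W = np.eye(2)  # identity weighting; any positive semidefinite matrix works
res = minimize(Q, x0=np.array([0.0, 1.0]), args=(w, W), method="Nelder-Mead",
               options={"maxiter": 2000, "xatol": 1e-9, "fatol": 1e-13})
mu_hat, sigma2_hat = res.x
```

Because the system is just-identified (two moments, two parameters), the minimizer drives the sample moments to zero, so the estimates coincide with the sample mean and the (biased) sample variance.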

GMM and Other Methods

I This framework can encompass many techniques we are
familiar with:

I Regression:

E[xi εi] = 0.

I Instrumental Variables:

E[zi εi] = 0.

I Maximum likelihood:

E[ ∂ log f(Yi | Zi, θ) / ∂θ ] = 0.

I Others, less so:

I Non-parametric estimation. (Though GMM is used in
semi-parametric analysis.)


GMM and Other Methods

I Heuristically, one can think of GMM as imposing structure
that is somewhere between highly parametric techniques (like
MLE) and highly non-parametric ones (like kernel density
estimators). Sometimes people will even refer to GMM as
semiparametric. (This is a less-common use of the term.)

I MLE makes strong assumptions about the distributions (i.e.,
known up to a parameter).

I GMM makes assumptions about the moments of the
distributions (e.g., means, variances, covariances).

I Non-parametrics make almost no assumptions about the
distributions.

I Tradeoff between strength of assumptions and efficiency.


How Does GMM Relate to Structural Modeling?

I Structural models in IO aim to leverage theory and data to
estimate policy-invariant parameters that can be used to
assess counterfactuals.

I Often the theory may dictate equations we want to hold true
in expectation (identifying moments):

I Consumer optimality conditions.
I Producer optimality conditions.


Moment restriction

I We frame the problem as:

E[ψ(wi; θ)] = 0,

where we call the M-vector ψ(wi; θ) the moment function, wi
is an observation in the data, and θ is the parameter vector we
want to estimate (dim(θ) = K).

I Our moment function evaluates a vector of M moments in the
data, and our parameter vector θ contains K parameters. If
M = K, then we say we are just-identified. If M > K, then
we are over-identified. Finally, if M < K, we are
under-identified (and cannot recover point estimates of the
parameters).


A Simple Example

I Suppose we have the following model:

yi = xi′β + εi,

where E(εi | xi) = 0.

I Then E(yi − xi′β | xi) = 0 ⇒ E[(yi − xi′β) h(xi)] = 0 for any
function h(·), in particular h(x) = x.

I Hence,

E[ψ(wi; θ)] = 0,

where ψ(wi; θ) = (yi − xi′β) xi.

I In a more general problem, using optimal instruments means an
optimal choice of h(·). (See Chamberlain 1987.)

I h∗(zt) = E[ ∂ρjt(θ)/∂θ′ | zt ]′ Ω⁻¹, where E(ρjt | zt) = 0 and
Ω = E[ρjt ρjt′ | zt].

I See, e.g., Berry, Levinsohn, and Pakes (EMA, 1995) (BLP) and
Reynaert and Verboven (JoE, 2014).


Identification

I Recall that in Maximum Likelihood, a model is identified when

argmax_θ L(θ; w) = θ′

holds iff θ′ = θ0, the true value. That is, the likelihood is
uniquely maximized at the true value.

I The analog for the semi-parametric GMM case is that

E[ψ(wi; θ)] = 0

only holds at the true value θ = θ0, and at all other values of
the parameter vector it does not hold.

I If we set our moment restrictions as the gradients of the
parametric likelihood, we see that GMM nests Maximum Likelihood.


Consistency

I When asking whether an estimator is consistent, we want to
know whether it converges to the true value in probability:

θ̂ →p θ0.

I Formally, this is the same as saying

lim_{N→∞} Pr[ |θ̂ − θ0| > ε ] = 0, ∀ε > 0.

I Under appropriate assumptions, GMM and ML are consistent.


Efficiency

I We want to know whether our estimates are as precise as
possible. The ML estimator achieves the smallest variance
among all unbiased estimators in the parametric setting:

Var(θ̂(X)) ≥ I(θ0)⁻¹, where I(θ) = −E[ ∂² ln f(X|θ) / ∂θ∂θ′ ].

I We call I(θ) the Fisher information matrix, and we call I(θ0)⁻¹
the Cramer-Rao lower bound on variance.

I The GMM estimator attains the semi-parametric efficiency
bound (Chamberlain, JoE, 1987), which is the lower bound on
variance for an estimator using only the information contained
in the moment restrictions. In practice, the over-identified case
will require a two-step estimator (which we will discuss shortly)
for efficiency.


Estimation

I We compute the empirical mean of the moment function and
select θ̂ = argmin_θ Q_{C,n}(θ), where

Q_{C,n}(θ) = [ (1/n) Σ_{i=1}^n ψ(wi, θ) ]′ C [ (1/n) Σ_{i=1}^n ψ(wi, θ) ]

for some positive definite M × M matrix C.

I The weighting matrix C assigns importance to satisfying the
different moment conditions.


Asymptotic variance

I Under appropriate assumptions,

√n (θ̂ − θ0) →d N(0, V),

where

V = (Γ′CΓ)⁻¹ Γ′C∆CΓ (Γ′CΓ)⁻¹.

I Γ = E[ ∂ψ/∂θ (x, θ0) ] (M × K): gradient of the moment function
with respect to the parameters.

I ∆ = E[ ψ(x, θ0) ψ(x, θ0)′ ] (M × M): outer product of the
moments.

I For an extremely clear and concise derivation, see Gary
Chamberlain's ECON 2120 Lecture Note 16.

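The sandwich formula is straightforward to compute once Γ, C, and ∆ are in hand. A Python sketch (the matrices below are made up purely for illustration; M = 3 moments, K = 2 parameters):

```python
import numpy as np

def gmm_avar(Gamma, C, Delta):
    """Sandwich asymptotic variance: (G'CG)^-1 G'C Delta C G (G'CG)^-1."""
    bread = np.linalg.inv(Gamma.T @ C @ Gamma)
    meat = Gamma.T @ C @ Delta @ C @ Gamma
    return bread @ meat @ bread

# Illustrative gradient (M x K) and moment outer product (M x M, positive definite)
Gamma = np.array([[1.0, 0.5], [0.2, 1.0], [0.3, 0.1]])
Delta = np.array([[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.0]])

V_identity = gmm_avar(Gamma, np.eye(3), Delta)             # arbitrary C = I
V_optimal = gmm_avar(Gamma, np.linalg.inv(Delta), Delta)   # optimal C = Delta^-1
# With the optimal weight, the sandwich collapses to (Gamma' Delta^-1 Gamma)^-1,
# and V_identity - V_optimal is positive semidefinite.
```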

Just-identified case

V = (Γ′CΓ)⁻¹ Γ′C∆CΓ (Γ′CΓ)⁻¹
  = Γ⁻¹C⁻¹(Γ′)⁻¹ Γ′C∆CΓ Γ⁻¹C⁻¹(Γ′)⁻¹
  = Γ⁻¹∆(Γ′)⁻¹
  = (Γ′∆⁻¹Γ)⁻¹,

since Γ is invertible in the just-identified case. Equivalently, C
drops out because we can set the moments to zero (at least in the
limit), so we do not need to trade off different elements of the
moment function vector.


Over-identified case

For C = ∆⁻¹,

V = (Γ′CΓ)⁻¹ Γ′C∆CΓ (Γ′CΓ)⁻¹
  = (Γ′∆⁻¹Γ)⁻¹ Γ′∆⁻¹∆∆⁻¹Γ (Γ′∆⁻¹Γ)⁻¹
  = (Γ′∆⁻¹Γ)⁻¹.

The proof that (Γ′CΓ)⁻¹ Γ′C∆CΓ (Γ′CΓ)⁻¹ − (Γ′∆⁻¹Γ)⁻¹ ≥ 0
(positive semi-definite) can be found in virtually every econometrics
text or lecture notes. This proves that C = ∆⁻¹ is indeed optimal.


Choice of weighting matrix

I We would like C ∝ ∆⁻¹. Recall that ∆ is the expectation of
the outer product of the moments at θ0.

I Problem: we don't know θ0.

I Solution: Form a consistent estimate ∆̂ using a consistent
though inefficient estimate of θ0.


Two-step GMM

I Step 1: Estimate θ̂_GMM1 by minimizing Q_C(θ) with any
arbitrary choice of (positive semi-definite) C, such as the
identity matrix.

I Step 2: Estimate the optimal weighting matrix as

∆̂⁻¹ = ( E_n[ ψ(wi, θ̂_GMM1) ψ(wi, θ̂_GMM1)′ ] )⁻¹

and use this to solve for θ̂_GMM2 = argmin_θ Q_{∆̂⁻¹}(θ).

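The two steps can be sketched for an over-identified linear IV model (a Python illustration with simulated data; the data-generating process and all names are my own, not course code):

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta_true = 20000, 1.0
z = rng.normal(size=(n, 2))                 # two instruments, one parameter
u = rng.normal(size=n)                      # confounder
x = z @ np.array([1.0, 0.5]) + 0.8 * u      # endogenous regressor
y = beta_true * x + u + rng.normal(size=n)

def gbar(beta):
    """Per-observation moments (y - x*beta) z, stacked n x M."""
    return (y - x * beta)[:, None] * z

def beta_gmm(C):
    """Closed-form linear GMM: minimize gbar' C gbar over scalar beta."""
    Zx = (z * x[:, None]).mean(axis=0)      # E_n[z x]
    Zy = (z * y[:, None]).mean(axis=0)      # E_n[z y]
    return (Zx @ C @ Zy) / (Zx @ C @ Zx)

# Step 1: arbitrary positive semi-definite weight (identity)
b1 = beta_gmm(np.eye(2))
# Step 2: estimate the optimal weight Delta^-1 from step-1 moments
g1 = gbar(b1)
Delta_hat = (g1.T @ g1) / n
b2 = beta_gmm(np.linalg.inv(Delta_hat))
```

The closed form exploits linearity: with gbar(β) = E_n[zy] − β E_n[zx], the quadratic objective is minimized at β = (Zx′ C Zy)/(Zx′ C Zx), so no numerical optimizer is needed here.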

Issues and alternatives

I GMM, just like IV, is generally biased in finite samples.

I CUE (the Continuously Updating Estimator) has less bias, but
large dispersion.

I For more details, see Newey's 14.385 notes (GMM II).


Return to simple example

I For the linear regression model, we have
ψ(wi; θ) = (yi − xi′β) xi.

I Just-identified GMM sets (1/n) Σ_{i=1}^n ψ(wi; θ̂) = 0.

I Solving this, we get θ̂ = [ (1/n) Σ xi xi′ ]⁻¹ [ (1/n) Σ xi yi ],
which is the same as OLS.

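This equivalence can be verified directly (a Python sketch with simulated data, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(size=n)

# Just-identified GMM: solve (1/n sum x_i x_i')^-1 (1/n sum x_i y_i)
beta_gmm = np.linalg.solve(X.T @ X / n, X.T @ y / n)

# OLS via least squares for comparison
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
# The two coincide: setting the sample moments to zero IS the normal equations.
```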

Linear IV example

I Now suppose E(εi | xi) ≠ 0, but we have a (relevant)
instrument z such that E(εi | zi) = 0 (exclusion restriction).

I The standard tool is TSLS.

I In the GMM framework, we can use the moment function
ψ(wi, θ) = (yi − xi′β) zi.

I If only some elements of the K-vector xi are endogenous, zi
will also include the remaining exogenous subset. If
dim(zi) = dim(xi), the model is just-identified; for
dim(zi) > dim(xi), it is over-identified.

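In the just-identified scalar case, setting the sample moment E_n[(yi − xi β) zi] = 0 and solving gives the familiar IV estimator. A Python sketch with a simulated endogenous regressor (the data-generating process is my own illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta_true = 50000, 2.0
z = rng.normal(size=n)              # instrument: independent of the error
u = rng.normal(size=n)              # confounder
x = z + u                           # regressor is endogenous through u
eps = u + rng.normal(size=n)        # error correlated with x via u
y = beta_true * x + eps

# Setting the sample moment E_n[(y - x*beta) z] = 0 and solving:
beta_iv = (z @ y) / (z @ x)
# OLS for comparison -- inconsistent here because cov(x, eps) = 1 != 0
beta_ols = (x @ y) / (x @ x)
```

With this design OLS converges to roughly beta_true + cov(x, ε)/var(x) = 2.5, while the IV moment condition recovers the true 2.0.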

Typical moments in IO applications

I Look at the models we estimate for zero-correlation


conditions. One obvious example is the unobserved
heterogeneity term, ξ, in BLP. Look at what it is uncorrelated
with and form a moment from that.

I Micro moments/aggregate information: Suppose you know the


average of some function of the data and parameters.

I Nash conditions: This comes up in BLP's pricing equation.

I Optimality: Even in non-competitive environments, the


producer is usually optimizing some objective function. Even
without continuous controls, inequalities can be used.


Implementing GMM in Matlab

I The primary Matlab functions you should be familiar with are
fminsearch and fminunc.

I Basic syntax example (just-identified case):

beta = fminsearch(@(b) (X'*(Y-X*b))'*(X'*(Y-X*b)), betastart, myopts)

I In our simple example:

I Y is a column vector and X is a matrix where each row is an
observation.
I The answer will be stored in a variable beta.
I @(b) means the routine will attempt to minimize the
expression (X'*(Y-X*b))'*(X'*(Y-X*b)) with respect to b.
I The starting guess for b will be the value held in the vector
betastart.
I The routine will follow the specifications in the options set
myopts, which is set before this using a command like

myopts = optimset('TolFun',10^-12, 'MaxFunEvals',1000000, 'MaxIter',1000)


Matlab: More complicated minimization

I The fminsearch command can also evaluate a named
function. This is useful if your moments are hard to evaluate.
In that case, you would create a separate .m file for the
function. Here's an example moment_function.m file:

function [val] = moment_function(beta, X, S, alpha, P)
% Do manipulations with the input arguments beta, X, S, alpha, P.
% Suppose you evaluate a moment condition for each observation
% into a vector called "moment"
...
val = mean(moment);

I I could then call this function from fminsearch using:

beta = fminsearch(@(b) moment_function(b, X, S, alpha, P), betastart, myopts)

I Question: How would you implement 2-step GMM?


Evaluating gradients

I Necessary for Γ in the asymptotic variance.

I Exact differentiation (analytic derivatives) is always preferred
to numerical differentiation due to approximation error. It
also runs much faster.

I If not practical, use finite differences with h on the order of
10⁻⁶:

I Forward difference formula:

f′(x) ≈ (f(x + h) − f(x)) / h

I Symmetric difference formula (more accurate):

f′(x) ≈ (f(x + h) − f(x − h)) / (2h)

I See Judd (1998, Ch. 7) for details.

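A quick Python check of the two formulas on a toy function (my own example; it uses the slide's h ≈ 10⁻⁶ guidance):

```python
import numpy as np

def forward_diff(f, x, h=1e-6):
    """O(h) truncation error."""
    return (f(x + h) - f(x)) / h

def symmetric_diff(f, x, h=1e-6):
    """O(h^2) truncation error."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Compare on f(x) = exp(x), whose exact derivative at x = 1 is e
exact = np.exp(1.0)
err_fwd = abs(forward_diff(np.exp, 1.0) - exact)
err_sym = abs(symmetric_diff(np.exp, 1.0) - exact)
# The symmetric (central) difference is several orders of magnitude more accurate.
```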

Conclusion

I If you want article references or notes for anything in this


presentation or beyond, feel free to ask me.

I Questions?
