Outline
(1) Introduction and motivation
Introduction
Generalized method of moments (GMM) is a general estimation principle.
Estimators are derived from so-called moment conditions.
(2) Maximum likelihood estimators have the smallest variance in the class of consistent
and asymptotically normal estimators.
But: We need a full description of the DGP and correct specification.
GMM is an alternative based on minimal assumptions.
(3) GMM estimation is often possible where a likelihood analysis is extremely difficult.
We only need a partial specification of the model.
Models for rational expectations.
• If we knew the expectation, then we could solve the equations in (∗) to find θ_0.
Instrumental Variables Estimation
• In many applications, the moment condition has the specific form

g(\theta_0) = E[f(w_t, z_t, \theta_0)] = E[z_t \cdot u(w_t, \theta_0)] = 0,

where the R instruments in z_t are multiplied by the disturbance term, u(w_t, θ).
• Under rational expectations, the expectation error, v_t, should be orthogonal to the
information set, I_t, and for z_t ∈ I_t we have the moment condition

E[z_t \cdot v_t] = 0.
• For a sample, y_1, y_2, ..., y_T, we state the corresponding sample moment conditions:

g_T(\hat{\mu}) = \frac{1}{T} \sum_{t=1}^{T} (y_t - \hat{\mu}) = 0.

The MM estimator of the mean μ_0 is the solution, i.e.

\hat{\mu}_{MM} = \frac{1}{T} \sum_{t=1}^{T} y_t,

which is the sample average.
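As a quick numerical check (not from the original slides; the simulated data below are hypothetical), the sample moment condition can be verified directly in code:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=1000)  # hypothetical sample y_1, ..., y_T

# Sample moment condition: g_T(mu) = (1/T) * sum_t (y_t - mu)
def g_T(mu):
    return np.mean(y - mu)

mu_mm = y.mean()          # MM estimator: the sample average
print(mu_mm, g_T(mu_mm))  # the moment condition holds (up to rounding) at mu_mm
```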
Example: OLS as a MM Estimator
• Consider the linear regression model of y_t on x_t (K × 1):

y_t = x_t' \beta_0 + \epsilon_t. \quad (∗∗)

Assume that (∗∗) represents the conditional expectation, E[y_t | x_t] = x_t' \beta_0, so that the error has conditional mean zero and we obtain the K moment conditions

E[x_t \epsilon_t] = E[x_t (y_t - x_t' \beta_0)] = 0.
• The corresponding sample moment conditions are

g_T(\hat{\beta}) = \frac{1}{T} \sum_{t=1}^{T} x_t \left( y_t - x_t' \hat{\beta} \right) = \frac{1}{T} \sum_{t=1}^{T} x_t y_t - \frac{1}{T} \sum_{t=1}^{T} x_t x_t' \hat{\beta} = 0.

And the MM estimator is derived as the unique solution:

\hat{\beta}_{MM} = \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1} \sum_{t=1}^{T} x_t y_t,

provided that \sum_{t=1}^{T} x_t x_t' is non-singular.
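A minimal simulation sketch of this result (all data and parameter values are hypothetical): solving the K sample moment conditions reproduces the OLS estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
T, K = 500, 3
X = rng.normal(size=(T, K))            # rows are x_t' (K regressors)
beta0 = np.array([1.0, -0.5, 2.0])     # hypothetical true beta_0
y = X @ beta0 + rng.normal(size=T)     # y_t = x_t' beta_0 + eps_t

# Solve (1/T) sum_t x_t (y_t - x_t' b) = 0  <=>  (X'X) b = X'y
beta_mm = np.linalg.solve(X.T @ X, X.T @ y)  # requires X'X non-singular
print(beta_mm)  # numerically identical to OLS
```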
Example: Under-Identification
• Consider again a regression model where a subset of the regressors, x_2t, is endogenous; the corresponding moment conditions fail, and β_0 is no longer identified from the remaining conditions alone.
• Assume that there exist K_2 new instruments, z_2t, satisfying

E[z_{2t} \epsilon_t] = 0. \quad (†††)

The K_2 moment conditions in (†††) can replace (††). To simplify notation, we define

x_t = \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix} \; (K \times 1) \quad \text{and} \quad z_t = \begin{pmatrix} x_{1t} \\ z_{2t} \end{pmatrix} \; (K \times 1).

Here x_t contains the model variables, z_2t the new instruments, and z_t the combined instrument vector.
We say that x1t are instruments for themselves.
• The corresponding sample moment conditions are given by
g_T(\hat{\beta}) = \frac{1}{T} \sum_{t=1}^{T} z_t \left( y_t - x_t' \hat{\beta} \right) = 0.
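In the just-identified case (R = K), the sample moment conditions can be solved exactly. A sketch with a hypothetical endogenous regressor and one new instrument:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000
z2 = rng.normal(size=T)                         # new instrument z_2t
eps = rng.normal(size=T)
x2 = 0.8 * z2 + 0.5 * eps + rng.normal(size=T)  # endogenous: corr(x_2t, eps_t) != 0
ones = np.ones(T)                               # x_1t = 1 is an instrument for itself
X = np.column_stack([ones, x2])
Z = np.column_stack([ones, z2])
y = X @ np.array([1.0, 2.0]) + eps

# R = K: solve (1/T) sum_t z_t (y_t - x_t' b) = 0  <=>  (Z'X) b = Z'y
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
print(beta_iv)   # consistent, whereas OLS of y on X is biased here
```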
Distances and Weight Matrices
• Consider a simple example with two moment conditions,

g_T(\theta) = \begin{pmatrix} g_a \\ g_b \end{pmatrix},

where the dependence on T and θ is suppressed.
• The GMM estimator minimizes the distance Q_T(\theta) = g_T(\theta)' W_T g_T(\theta). The weight matrix, W_T, has to be positive definite, so that a strictly positive weight is attached to all moment conditions.
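A small numerical illustration (the numbers are made up) of how the choice of W_T changes the distance being minimized:

```python
import numpy as np

g = np.array([0.10, 0.02])    # hypothetical stacked moments (g_a, g_b)'
W1 = np.eye(2)                # equal weight on both moments
W2 = np.diag([1.0, 25.0])     # much more weight on the second moment

print(g @ W1 @ g)  # 0.0104
print(g @ W2 @ g)  # 0.02   -> the choice of W_T changes the distance minimized
```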
Asymptotic Distribution
• Assume a central limit theorem for f(w_t, z_t, θ), i.e.

\sqrt{T} \cdot g_T(\theta_0) = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} f(w_t, z_t, \theta_0) \rightarrow N(0, S),

where S is the asymptotic variance.
• Then it holds that, for any positive definite weight matrix, W, the asymptotic distribution of the GMM estimator is given by

\sqrt{T} \left( \hat{\theta}_{GMM} - \theta_0 \right) \rightarrow N(0, V).

The asymptotic variance is given by

V = \left( D' W D \right)^{-1} D' W S W D \left( D' W D \right)^{-1},

where

D = E\left[ \frac{\partial f(w_t, z_t, \theta)}{\partial \theta'} \right]

is the expected value of the R × K matrix of first derivatives of the moments.
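The sandwich formula can be evaluated directly; a sketch with hypothetical D, S, and W (R = 3, K = 2), also illustrating the optimal choice W = S^{-1} discussed next:

```python
import numpy as np

def gmm_avar(D, W, S):
    """V = (D'WD)^{-1} D'W S W D (D'WD)^{-1}."""
    A = np.linalg.inv(D.T @ W @ D)
    return A @ (D.T @ W @ S @ W @ D) @ A

D = np.array([[1.0, 0.2],
              [0.3, 1.0],
              [0.5, 0.5]])        # hypothetical E[df/dtheta'], R x K
S = np.diag([1.0, 2.0, 4.0])      # hypothetical asymptotic variance of the moments

print(gmm_avar(D, np.eye(3), S))            # an arbitrary weight matrix
print(gmm_avar(D, np.linalg.inv(S), S))     # W = S^{-1} gives (D'S^{-1}D)^{-1}
```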
• Intuition: a moment with small variance is informative and should have a large weight.
It can be shown that the optimal weight matrix, W_T^{opt}, has the property that plim W_T^{opt} = S^{-1}, in which case the asymptotic variance simplifies to V = (D'S^{-1}D)^{-1}.
• Hypothesis testing can be based on the asymptotic distribution:

\hat{\theta}_{GMM} \overset{a}{\sim} N\left( \theta_0, \, T^{-1} \hat{V} \right).

• The J-test of the R − K overidentifying restrictions uses the statistic

\xi_J = T \cdot g_T(\hat{\theta}_{GMM})' W_T^{opt} \, g_T(\hat{\theta}_{GMM}) = T \cdot Q_T(\hat{\theta}_{GMM}) \rightarrow \chi^2(R - K).
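A sketch of the J-test computation with hypothetical inputs (SciPy is used only for the χ²(R − K) p-value):

```python
import numpy as np
from scipy.stats import chi2

def j_test(g_hat, W_opt, T, R, K):
    """xi_J = T * g_T' W_T^opt g_T, asymptotically chi2(R - K)."""
    xi = T * g_hat @ W_opt @ g_hat
    return xi, chi2.sf(xi, df=R - K)   # statistic and p-value

g_hat = np.array([0.01, -0.02, 0.015])          # moments at theta_hat (hypothetical)
W_opt = np.linalg.inv(np.diag([1.0, 1.5, 2.0])) # hypothetical S_T^{-1}
print(j_test(g_hat, W_opt, T=500, R=3, K=2))
```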
• In practice, a two-step procedure is used: a first-step estimator based on an initial weight matrix, W_{[1]}, is used to estimate the optimal weights, W_{[2]}^{opt} = S_T^{-1}, and the two-step GMM estimator is

\hat{\theta}_{[2]} = \arg\min_{\theta} \; g_T(\theta)' W_{[2]}^{opt} g_T(\theta).

The estimator is not unique, as it depends on the initial weight matrix W_{[1]}.
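A generic two-step sketch, assuming a user-supplied function that returns the T × R matrix of moments f(w_t, z_t, θ); the optimizer choice and the usage example with the mean are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def two_step_gmm(moments, theta_init):
    """moments(theta) -> T x R matrix with rows f(w_t, z_t, theta)."""
    def Q(theta, W):                      # Q_T(theta) = g_T' W g_T
        g = moments(theta).mean(axis=0)
        return g @ W @ g

    R = moments(theta_init).shape[1]
    W1 = np.eye(R)                                      # step 1: initial weights
    th1 = minimize(Q, theta_init, args=(W1,)).x
    f = moments(th1)
    S = f.T @ f / len(f)                                # S_T at the first step
    th2 = minimize(Q, theta_init, args=(np.linalg.inv(S),)).x  # step 2: W = S_T^{-1}
    return th2

# Usage sketch: MM for the mean, f_t = y_t - mu (R = 1, just identified)
y = np.random.default_rng(3).normal(2.0, 1.0, size=400)
print(two_step_gmm(lambda th: (y - th[0]).reshape(-1, 1), np.zeros(1)))
```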
Example: 2SLS
• Consider again the regression model y_t = x_t'β_0 + ε_t with R ≥ K instruments in z_t, and stack the observations into Y (T × 1), X (T × K), and Z (T × R).
• Instead, we want to derive the GMM estimator by minimizing the criterion function

Q_T(\beta) = g_T(\beta)' W_T g_T(\beta) = T^{-2} (Y - X\beta)' Z W_T Z' (Y - X\beta).

• We take the first derivative, and the GMM estimator is the solution to

\frac{\partial Q_T(\beta)}{\partial \beta} = -2 T^{-2} X'Z W_T Z'Y + 2 T^{-2} X'Z W_T Z'X \beta = 0.

We find \hat{\beta}_{GMM}(W_T) = \left( X'Z W_T Z'X \right)^{-1} X'Z W_T Z'Y, which depends on W_T.
• To estimate the optimal weight matrix, W_T^{opt} = S_T^{-1}, we use the estimator

S_T = \frac{1}{T} \sum_{t=1}^{T} f(w_t, z_t, \theta) f(w_t, z_t, \theta)' = \frac{1}{T} \sum_{t=1}^{T} \hat{\epsilon}_t^2 z_t z_t',

which allows for general heteroskedasticity of the disturbance term.
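In the linear IV case, the closed-form β̂_GMM(W_T) and the robust S_T translate directly into code; a sketch assuming stacked data matrices X, Z, and y as above:

```python
import numpy as np

def linear_gmm(X, Z, y, W):
    """beta(W) = (X'Z W Z'X)^{-1} X'Z W Z'y."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ Z.T @ X, XZ @ W @ Z.T @ y)

def S_hc(X, Z, y, beta):
    """S_T = (1/T) sum_t e_t^2 z_t z_t' (allows general heteroskedasticity)."""
    e = y - X @ beta
    return (Z * e[:, None] ** 2).T @ Z / len(y)

# Efficient two-step GMM: initial W, then W^opt = S_T^{-1}
# b1 = linear_gmm(X, Z, y, np.eye(Z.shape[1]))
# b2 = linear_gmm(X, Z, y, np.linalg.inv(S_hc(X, Z, y, b1)))
```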
• For the asymptotic distribution, we recall that

\hat{\beta}_{GMM} \overset{a}{\sim} N\left( \beta_0, \, T^{-1} \left( D' S^{-1} D \right)^{-1} \right).

The R × K derivative matrix is

D_T = \frac{\partial g_T(\beta)}{\partial \beta'} = \frac{\partial \left( T^{-1} \sum_{t=1}^{T} z_t (y_t - x_t' \beta) \right)}{\partial \beta'} = -T^{-1} \sum_{t=1}^{T} z_t x_t',

so the variance of the estimator becomes

V\left[ \hat{\beta}_{GMM} \right] = T^{-1} \left( D_T' W_T^{opt} D_T \right)^{-1}
= T^{-1} \left( \left( -T^{-1} \sum_{t=1}^{T} x_t z_t' \right) \left( T^{-1} \sum_{t=1}^{T} \hat{\epsilon}_t^2 z_t z_t' \right)^{-1} \left( -T^{-1} \sum_{t=1}^{T} z_t x_t' \right) \right)^{-1}
= \left( \sum_{t=1}^{T} x_t z_t' \left( \sum_{t=1}^{T} \hat{\epsilon}_t^2 z_t z_t' \right)^{-1} \sum_{t=1}^{T} z_t x_t' \right)^{-1}.
• Note that this is the heteroskedasticity consistent (HC) variance estimator of White.
GMM with allowance for heteroskedastic errors automatically produces heteroskedas-
ticity consistent standard errors!
• If we assume that the error terms are IID, the optimal weight matrix simplifies to

S_T = \frac{\hat{\sigma}^2}{T} \sum_{t=1}^{T} z_t z_t' = T^{-1} \hat{\sigma}^2 Z'Z,

and the resulting GMM estimator coincides with the two-stage least squares (2SLS) estimator.
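Since σ̂² cancels from the criterion, any weight matrix proportional to (Z'Z)^{-1} yields the same estimator; a minimal 2SLS sketch:

```python
import numpy as np

def two_sls(X, Z, y):
    """beta_2SLS = (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'y."""
    Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)   # first stage: project X on Z
    return np.linalg.solve(Xhat.T @ X, Xhat.T @ y) # second stage
```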
Pseudo-ML Estimation
• \hat{\theta}_{ML} is consistent under assumptions weaker than those maintained by ML.
The FOC for a normal regression model corresponds to

E[x_t (y_t - x_t' \beta)] = 0,

which is weaker than the assumption that the entire distribution is correctly specified: OLS is consistent even if ε_t is not normal.
• An ML estimation that maximizes a likelihood function different from the true model's likelihood is referred to as a pseudo-ML (PML) or a quasi-ML estimator.
Note that the variance matrix is no longer the inverse of the information matrix.
• PML is a GMM interpretation of ML.
• Robustness: the first-order conditions, i.e. the moment conditions, should still hold. Use the larger PML variance; as in GMM, the weights and variances can be made robust.
Example: The C-CAPM Model
• Consider the consumption-based capital asset pricing model (C-CAPM) of Hansen and Singleton (1982).
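The derivation is not reproduced in this extract, but in Hansen and Singleton's setup the moments come from the consumption Euler equation; a sketch of the implied moment function, with illustrative variable names and CRRA utility:

```python
import numpy as np

def ccapm_moments(theta, c_growth, r, Z):
    """Rows f_t = z_t * (delta * (c_{t+1}/c_t)^{-gamma} * (1 + r_{t+1}) - 1).
    c_growth, r: length-T arrays; Z: T x R matrix of instruments from I_t."""
    delta, gamma = theta
    v = delta * c_growth ** (-gamma) * (1.0 + r) - 1.0  # expectation error v_t
    return Z * v[:, None]  # orthogonality: E[z_t v_t] = 0 under rational expectations

# These moments can be passed to a two-step GMM routine like the sketch above.
```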