
Questions and Answers on Generalized Method of Moments

L. Magee Winter, 2010

———————————————————–

1. Explain how to construct an estimator from a set of population moment conditions using the
generalized method of moments.

2. Observations in sample A are assumed to have been drawn from a distribution with a mean equal
to µ. Observations in sample B are assumed to have been drawn from a distribution with a mean
equal to µ + 5. All observations are uncorrelated. The sample means are ȳA = 7 and ȳB = 9,
and their variances are estimated to be Var(ȳA ) = 1 and Var(ȳB ) = 0.2.

(a) Write a quadratic form that is a function of µ and that is minimized at the asymptotically
efficient GMM estimate of µ.
(b) Compute the numerical value of this GMM estimate.
(c) Compute the value of the GMM test statistic for testing the assumption about the relation-
ship between the means of samples A and B. State the asymptotic null distribution of this
test statistic, including degrees of freedom.

3. The observations y1, y2, . . . , yn are assumed to be independently distributed and to have been drawn from a distribution with the property

E(y) = µ (1)

(a) Interpreting equation (1) as a population moment condition, write down the corresponding
sample moment condition that could be used to obtain a method of moments estimator µ̂a .
(b) The assumed distribution also has the property

E(y²) = 2µ²    (2)

Use (2) to write another sample moment condition that could be used to obtain another
method of moments estimator, µ̂b .
(c) Describe how equations (1) and (2) can be used together to produce a GMM estimator, µ̂c .

4. Consider the model

yj = Zj δj + εj = Yj γj + Xj βj + εj

where Eεj = 0, E(εj εj′) = σ²I, E(Xj′εj) = 0, E(Yj′εj) ≠ 0, and E(Yj′εj) is unknown. Some instrumental variables XIV are available, where E(XIV′εj) = 0. XIV has more columns than Yj has.

(a) Applying the GMM method, derive the two-stage least squares estimator of δj .
(b) Show that the GMM test of overidentifying restrictions is equal to n times the uncentred
R2 from regressing the 2SLS residuals on X.

5. The observations y1 , y2 , . . . , yn are assumed to have been drawn independently from a distribution
with the property

E(y) = 3µ (3)

(a) (i) Interpreting equation (3) as a population moment condition, write the corresponding
sample moment condition.
(ii) Solve the sample moment condition for µ to obtain a method of moments estimator µ̂1 .
(b) Suppose that the assumed distribution also has the property
E(1/y) = 1/(2µ)    (4)
Use (4) to write another sample moment condition, and use it to obtain another method of
moments estimator, µ̂2 .
(c) Suppose you have estimated the variances of y and of 1/y, V̂(y) and V̂(1/y), and also have a covariance estimate Ĉov(y, 1/y). Write the function "X²(µ)" that could be minimized to produce a GMM estimator of µ that uses information from both conditions (3) and (4). (You do not have to carry out or describe the minimization step itself.)

6. (16 marks: 5 for a, 5 for b, 6 for c)


The method of moments estimator β̂ of the k-element vector β is defined as the solution to the sample moment condition n⁻¹ Σᵢ m(xi, yi, β̂) = 0 corresponding to the assumed population moment condition Em(x, y, β) = 0, where x and y are random variables, xi and yi, i = 1, . . . , n, are the observations, and m is a vector of moment functions.

(a) (i) How many elements are in the vector m(x, y, β)?
(ii) What are the dimensions of the matrix ∂m(x, y, β)/∂β′?
(iii) Describe the (a, b)th element of the matrix ∂m(x, y, β)/∂β′.
(iv) Write a formula for estimating Var(β̂).
(b) Let x and y be scalar random variables. For each of the following two population moment conditions, write the sample moment condition, the method of moments estimator (β̂(i) and β̂(ii), respectively), and the formula for estimating Var(β̂(i)) and Var(β̂(ii)).
(i) Ex(y − xβ) = 0
(ii) Ex²(y − xβ) = 0

(c) Based on 10 observed scalars xi and yi , the following sums were calculated.
Σxi = 18        Σxi² = 38        Σxi³ = 90        Σxi⁴ = 230
Σxi⁵ = 618      Σxi⁶ = 1718      Σxi yi = 38      Σxi² yi = 88
Σxi³ yi = 218   Σxi⁵ yi = 1538   Σxi² yi² = 224   Σxi⁴ yi² = 1472
Using the formulas from your answer to part (b), compute β̂(i) , β̂(ii) , and compute estimates
of their variances.

7. Observations in sample A are assumed to have been drawn from a distribution DA with a mean
equal to µ. Observations in sample B are assumed to have been drawn from a distribution DB
with a mean equal to 3µ. All observations are uncorrelated. The sample means are ȳA = 5 and
ȳB = 12. The variances of these sample means are estimated as V̂ar(ȳA) = 2 and V̂ar(ȳB) = 4.

(a) Write a quadratic form that is a function of µ and that is minimized at the asymptotically
efficient GMM estimate of µ.
(b) Compute the numerical value of this GMM estimate.
(c) (i) Compute the value of the GMM test statistic for testing the assumption about the
relationship between the means of the distributions DA and DB .
(ii) State the asymptotic null distribution of this test statistic, including degrees of freedom.

Answers

1. (About notation: following Greene and others, yi is used in two different ways. In an expression like E(yi), yi is a random variable. But in an expression like n⁻¹ Σᵢ₌₁ⁿ yi, yi is an observation. The population moment conditions are assumptions about the random variable yi. The sample moment conditions use the observed yi's.)
Let m(xi , yi , θ) be a vector of functions of xi , yi and the unknown parameter vector θ. The
population moment conditions are restrictions on the expectations of m(xi , yi , θ),

Em(xi , yi , θ) = 0

For example, in the usual regression model yi = xi′θ + εi where E(xi εi) = 0, we have m(xi, yi, θ) = xi(yi − xi′θ). Next, bring in the data by defining, as a sample counterpart to Em(xi, yi, θ), the sample or empirical moments:

n⁻¹ Σᵢ₌₁ⁿ m(xi, yi, θ) ≡ m̄(θ)

If there are as many elements in θ as there are in m̄(θ) then usually a value of θ can be found
that solves m̄(θ) = 0, and θ is just-identified. This estimation method is known as the method

of moments. An example is OLS for the usual regression model mentioned above, where m̄(θ) = n⁻¹ Σᵢ₌₁ⁿ xi(yi − xi′θ). Setting this equal to zero and solving for θ gives the OLS estimator.

If there are more elements in m̄(θ) than there are in θ, then θ is over-identified. No θ can be found that solves m̄(θ) = 0. The GMM estimator is the value of θ that minimizes m̄(θ)′W m̄(θ) given some positive definite weight matrix W. A W that results in an asymptotically efficient GMM estimator is W = V̂(m̄(θ))⁻¹, where V̂(m̄(θ)) is an estimator of the variance-covariance matrix of m̄(θ). (The phrase "asymptotically efficient estimator" means that as n → ∞, the estimator's variance is as small as or smaller than that of any other consistent estimator.)
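As an illustration of the just-identified case, here is a minimal Python sketch (not part of the original answer; it assumes numpy and uses simulated data) showing that solving m̄(θ) = 0 for the regression moment functions reproduces OLS:

    # Method of moments for the regression example: m(x_i, y_i, theta) = x_i(y_i - x_i' theta).
    # Setting the sample moment m_bar(theta) to zero gives the OLS estimator.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept plus one regressor
    theta_true = np.array([1.0, 2.0])
    y = X @ theta_true + rng.normal(size=n)

    # m_bar(theta) = n^{-1} X'(y - X theta) = 0  =>  theta_hat = (X'X)^{-1} X'y
    theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(theta_hat)  # close to theta_true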

2. (a)

(ȳA − µ, ȳB − (µ + 5)) diag(1, 0.2)⁻¹ (ȳA − µ, ȳB − (µ + 5))′ = (7 − µ, 4 − µ) diag(1, 5) (7 − µ, 4 − µ)′

(b) Minimize g(µ) with respect to µ, where g(µ) is the formula from (a), g(µ) = (7 − µ)² + 5(4 − µ)². Then

∂g/∂µ = −2(7 − µ) − 2(5)(4 − µ)
= −14 + 2µ − 40 + 10µ
= 0 at 12µ̂ = 54 ⇒ µ̂ = 4.5

(c) This statistic is obtained by substituting the GMM estimate µ̂ into the quadratic form g(µ) that it minimizes,

g(µ̂) = (2.5)² + 5(−0.5)² = 7.5

Under H0: "E(y|group B) = E(y|group A) + 5", g(µ̂) is asymptotically distributed as χ² with one degree of freedom.
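A quick numerical check of parts (b) and (c) (not part of the original answer; it assumes numpy and scipy):

    # Minimize the quadratic form from (a) and evaluate it at the minimum.
    import numpy as np
    from scipy.optimize import minimize_scalar

    W = np.diag([1.0, 5.0])              # inverse of diag(1, 0.2)
    def g(mu):
        m = np.array([7 - mu, 4 - mu])   # (ybar_A - mu, ybar_B - (mu + 5))
        return m @ W @ m

    res = minimize_scalar(g)
    print(res.x, g(res.x))               # approximately 4.5 and 7.5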

3. (a) The population moment condition is E(yi − µ) = 0, or Em(µ) = 0 where m(µ) = yi − µ.
The sample moment condition is m̄(µ) = n⁻¹ Σᵢ₌₁ⁿ (yi − µ) = 0 at µ = µ̂a.
Therefore µ̂a = n⁻¹ Σᵢ₌₁ⁿ yi = ȳ.
(b) The population moment condition is E(yi² − 2µ²) = 0, or Em(µ) = 0 where m(µ) = yi² − 2µ².
The sample moment condition is m̄(µ) = n⁻¹ Σᵢ₌₁ⁿ (yi² − 2µ²) = 0 at µ = µ̂b.
Therefore µ̂b = (n⁻¹ Σᵢ₌₁ⁿ yi²/2)^(1/2).
(c) Let

m̄(µ) = ( n⁻¹ Σᵢ₌₁ⁿ (yi − µ),  n⁻¹ Σᵢ₌₁ⁿ (yi² − 2µ²) )′

µ is overidentified. Choose the µ = µ̂c that minimizes m̄(µ)′ V̂⁻¹ m̄(µ), where V̂ is an estimate

of Var(m̄(µ)). One possibility (this goes beyond what would be required for full marks) is

V̂ = [ n⁻² Σᵢ₌₁ⁿ (yi − ȳ)²            n⁻² Σᵢ₌₁ⁿ (yi − ȳ)(yi² − A) ]
    [ n⁻² Σᵢ₌₁ⁿ (yi − ȳ)(yi² − A)    n⁻² Σᵢ₌₁ⁿ (yi² − A)²         ]

where A = n⁻¹ Σᵢ₌₁ⁿ yi².
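A sketch of µ̂c on simulated data (an illustration, not part of the original answer, assuming numpy and scipy): an Exponential distribution with mean µ satisfies both E(y) = µ and E(y²) = 2µ², and the sample covariance of (yi, yi²) divided by n plays the role of the V̂ above.

    # Two-moment GMM for mu with an estimated weight matrix.
    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(1)
    mu_true, n = 2.0, 2000
    y = rng.exponential(mu_true, size=n)  # E(y) = mu, E(y^2) = 2 mu^2

    def m_bar(mu):
        return np.array([np.mean(y - mu), np.mean(y**2 - 2 * mu**2)])

    # V_hat: sample covariance of (y_i, y_i^2) divided by n, close to the V_hat above
    V_hat = np.cov(np.column_stack([y, y**2]), rowvar=False) / n
    W = np.linalg.inv(V_hat)

    res = minimize_scalar(lambda mu: m_bar(mu) @ W @ m_bar(mu),
                          bounds=(0.1, 10.0), method="bounded")
    print(res.x)  # close to mu_true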

4. (a) The orthogonality conditions are E(Xj′εj) = 0 and E(XIV′εj) = 0. Together these can be written as E(X′εj) = 0, where X = [Xj  XIV]. Substituting out εj and putting in the observed yj vector gives the sample moment function m̄(δj) = X′(yj − Zj δj).
The GMM estimator δ̂j minimizes m̄(δj)′W m̄(δj). For W, note that when there is no autocorrelation or heteroskedasticity, Var(X′ε) = X′Var(ε)X = σ²(X′X), so we can use W = σ⁻²(X′X)⁻¹. Although σ⁻² is unknown, we do not need to know it to do the minimization. δ̂j is the value of δj that minimizes

m̄(δj)′W m̄(δj) = (X′(yj − Zj δj))′ (σ²(X′X))⁻¹ X′(yj − Zj δj)
= σ⁻² (yj − Zj δj)′ X(X′X)⁻¹ X′(yj − Zj δj)

This is minimized at the 2SLS estimator

δ̂j = (Zj′X(X′X)⁻¹X′Zj)⁻¹ Zj′X(X′X)⁻¹X′yj

(b) The test statistic is the minimized value of the GMM criterion. From (a), this is

σ⁻² (yj − Zj δ̂j)′ X(X′X)⁻¹ X′(yj − Zj δ̂j) = σ⁻² ej′X(X′X)⁻¹X′ej

where ej = yj − Zj δ̂j is the 2SLS residual. We need to estimate σ². A commonly used estimator is σ̂² = ej′ej/n. Substituting this above gives

n (ej′X(X′X)⁻¹X′ej / ej′ej)

This is n × (uncentred R²) = n(ESS/TSS), where TSS is the total uncentred sum of squares of the elements of ej and ESS is the explained sum of squares êj′êj, where êj = X(X′X)⁻¹X′ej.
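The sketch below (illustrative, simulated data, assuming numpy; variable names are made up for the example) computes 2SLS through the projection matrix and forms the overidentification statistic as n times the uncentred R² of ej on X:

    # 2SLS as GMM, plus the n * (uncentred R^2) test of overidentifying restrictions.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 1000
    x_iv = rng.normal(size=(n, 2))                 # two instruments
    u = rng.normal(size=n)                         # source of endogeneity
    y_endog = x_iv @ np.array([1.0, 0.5]) + 0.8 * u + rng.normal(size=n)
    x_exog = np.ones((n, 1))                       # included exogenous column
    Z = np.column_stack([y_endog, x_exog])         # Z_j = [Y_j, X_j]
    y = Z @ np.array([2.0, 1.0]) + u + 0.5 * rng.normal(size=n)

    X = np.column_stack([x_exog, x_iv])            # X = [X_j, X_IV]: 3 columns, Z has 2
    P = X @ np.linalg.solve(X.T @ X, X.T)          # projection onto the instrument space
    delta_hat = np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y)
    e = y - Z @ delta_hat                          # 2SLS residuals

    J = n * (e @ P @ e) / (e @ e)                  # n * uncentred R^2
    print(delta_hat, J)                            # J ~ chi^2(1) under the null here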
5. (a) (i) n⁻¹ Σᵢ₌₁ⁿ (yi − 3µ) = 0 at µ = µ̂₁
(ii) ȳ − 3µ̂₁ = 0 ⇒ µ̂₁ = ȳ/3.
(b) n⁻¹ Σᵢ₌₁ⁿ (1/yi − 1/(2µ)) = 0 at µ = µ̂₂.
n⁻¹ Σᵢ₌₁ⁿ (1/yi) = 1/(2µ̂₂) ⇒ µ̂₂ = ((2/n) Σᵢ₌₁ⁿ (1/yi))⁻¹

(c) Define

m̄(µ) = ( ȳ − 3µ,  n⁻¹ Σᵢ₌₁ⁿ (1/yi) − 1/(2µ) )′

and

V̂(m) = [ V̂(y)           Ĉov(y, 1/y) ]
       [ Ĉov(y, 1/y)    V̂(1/y)      ]

then the function is

X²(µ) = m̄(µ)′ (V̂(m))⁻¹ m̄(µ)
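As an illustration (not in the original answer), the criterion can be coded directly once the two sample means and the variance estimates are in hand; the numbers below are placeholders, not values implied by the question:

    # X^2(mu) built from (placeholder) sample moments and variance estimates.
    import numpy as np

    ybar, inv_ybar = 6.1, 0.17          # sample means of y and 1/y (placeholders)
    V = np.array([[2.0, -0.05],         # [[Vhat(y),        Covhat(y, 1/y)],
                  [-0.05, 0.01]])       #  [Covhat(y, 1/y), Vhat(1/y)]]
    V_inv = np.linalg.inv(V)

    def X2(mu):
        m = np.array([ybar - 3 * mu, inv_ybar - 1 / (2 * mu)])
        return m @ V_inv @ m

    print(X2(2.0))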

6. (a) (i) m(x, y, β) is a k × 1 vector


(ii) ∂m(x, y, β)/∂β′ is a k × k matrix.
(iii) The (a, b)th element of the matrix ∂m(x, y, β)/∂β′ is ∂ma(x, y, β)/∂βb, which is the partial derivative of the ath element of the vector of functions m(x, y, β) with respect to the bth element of the β vector.
(iv)

V̂ar(β̂) = (Σᵢ ∂m̂i/∂β′)⁻¹ (Σᵢ m̂i m̂i′) ((Σᵢ ∂m̂i/∂β′)⁻¹)′

where ∂m̂i/∂β′ is the derivative from parts (ii) and (iii) evaluated at x = xi, y = yi, and β = β̂, and m̂i = m(xi, yi, β̂).
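A generic coding of the part (iv) sandwich formula might look like this (an illustrative helper, not from the handout, assuming numpy; argument shapes are stated in the docstring):

    # Sandwich variance estimator for a method of moments estimator.
    import numpy as np

    def gmm_var(m_hat, dm_hat):
        """m_hat: (n, k) array, row i is m(x_i, y_i, beta_hat);
        dm_hat: (n, k, k) array, slice i is dm/dbeta' at observation i."""
        G = dm_hat.sum(axis=0)          # sum over i of dm_hat_i / dbeta'
        S = m_hat.T @ m_hat             # sum over i of m_hat_i m_hat_i'
        G_inv = np.linalg.inv(G)
        return G_inv @ S @ G_inv.T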
(b) (i) Let ei = yi − xi β̂(i). Then m̂i = xi ei and ∂m̂i/∂β′ = −xi².
Sample moment condition: Σᵢ xi(yi − xi β) = 0 at β̂(i)

β̂(i) = Σᵢ xi yi / Σᵢ xi²   and   V̂ar(β̂(i)) = Σᵢ xi² ei² / (Σᵢ xi²)²

(ii) Let ei* = yi − xi β̂(ii). Then m̂i = xi² ei* and ∂m̂i/∂β′ = −xi³.
Sample moment condition: Σᵢ xi²(yi − xi β) = 0 at β̂(ii)

β̂(ii) = Σᵢ xi² yi / Σᵢ xi³   and   V̂ar(β̂(ii)) = Σᵢ xi⁴ ei*² / (Σᵢ xi³)²
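These formulas translate directly into code; a sketch (illustrative, not part of the original answer, assuming numpy arrays x and y of equal length):

    # Method of moments estimators and variance estimates from part (b).
    import numpy as np

    def mom_i(x, y):
        b = np.sum(x * y) / np.sum(x**2)
        e = y - x * b
        return b, np.sum(x**2 * e**2) / np.sum(x**2) ** 2

    def mom_ii(x, y):
        b = np.sum(x**2 * y) / np.sum(x**3)
        e = y - x * b
        return b, np.sum(x**4 * e**2) / np.sum(x**3) ** 2

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.1, 3.9, 6.2])
    print(mom_i(x, y), mom_ii(x, y))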

(c) (i) β̂(i) = 38/38 = 1

V̂ar(β̂(i)) requires

Σᵢ xi² ei² = Σᵢ xi²(yi − xi β̂(i))²
= Σᵢ xi² yi² − 2β̂(i) Σᵢ xi³ yi + β̂(i)² Σᵢ xi⁴
= 224 − 2(1)(218) + (1)²(230) = 18, so that

V̂ar(β̂(i)) = 18/(38)² = .01247

(ii) β̂(ii) = 88/90 = 0.9778

V̂ar(β̂(ii)) requires

Σᵢ xi⁴ ei*² = Σᵢ xi⁴(yi² − 2yi xi β̂(ii) + xi² β̂(ii)²)
= Σᵢ xi⁴ yi² − 2β̂(ii) Σᵢ xi⁵ yi + β̂(ii)² Σᵢ xi⁶
= 1472 − 2(.9778)(1538) + (.9778)²(1718) = 106.85, so that

V̂ar(β̂(ii)) = 106.85/(90)² = .01319
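The part (c) arithmetic can be checked directly from the given sums (a verification sketch, not part of the original answer):

    # Verify beta_hat_(i), beta_hat_(ii), and the variance estimates from the sums.
    Sx2, Sx3, Sx4, Sx6 = 38, 90, 230, 1718
    Sxy, Sx2y, Sx3y, Sx5y = 38, 88, 218, 1538
    Sx2y2, Sx4y2 = 224, 1472

    b1 = Sxy / Sx2                                         # = 1
    v1 = (Sx2y2 - 2 * b1 * Sx3y + b1**2 * Sx4) / Sx2**2    # = 18/38^2 = .01247

    b2 = Sx2y / Sx3                                        # = 0.9778
    v2 = (Sx4y2 - 2 * b2 * Sx5y + b2**2 * Sx6) / Sx3**2    # about 106.85/90^2 = .01319
    print(b1, v1, b2, v2)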
7. (a)

g(µ) = (5 − µ, 12 − 3µ) diag(2, 4)⁻¹ (5 − µ, 12 − 3µ)′ = (1/2)(5 − µ)² + (1/4)(12 − 3µ)²

(b) Minimize g(µ) with respect to µ:

∂g/∂µ = −(5 − µ) − (2×3/4)(12 − 3µ) = −5 + µ − 18 + (18/4)µ

This derivative equals zero at (22/4)µ̂ = 23, so µ̂ = 23/(22/4) = 4.18.
(c) (i) The statistic is g(µ̂) = (1/2)(5 − 4.18)² + (1/4)(12 − (3 × 4.18))² = 0.4091
(ii) The asymptotic null distribution of g(µ̂) is χ² with one degree of freedom.
(1 = 2 moment conditions minus 1 parameter = 1 overidentifying restriction.)
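And the same kind of numerical check for question 7 (not part of the original answer; it assumes scipy):

    # Minimize g(mu) and evaluate the GMM test statistic at the minimum.
    from scipy.optimize import minimize_scalar

    def g(mu):
        return 0.5 * (5 - mu) ** 2 + 0.25 * (12 - 3 * mu) ** 2

    res = minimize_scalar(g)
    print(res.x, g(res.x))  # approximately 4.18 and 0.409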
