SSRN Id276592

Massachusetts Institute of Technology
Department of Economics
Working Paper Series
BIAS CORRECTED INSTRUMENTAL

VARIABLES ESTIMATION FOR DYNAMIC PANEL
MODELS WITH FIXED EFFECTS
Jinyong Hahn
Jerry Hausman
Guido Kuersteiner
Working Paper 01-24

June 2001
Room E52-251
50 Memorial Drive
Cambridge, MA 02142
This paper can be downloaded without charge from the

Social Science Research Network Paper Collection at
http://papers.ssrn.com/abstract=276592
Bias Corrected Instrumental Variables Estimation for Dynamic
Panel Models with Fixed Effects
Jinyong Hahn Jerry Hausman Guido Kuersteiner

Brown University MIT MIT
June, 2001
Acknowledgment: We are grateful to Yu Hu for exceptional research assistance. Helpful

comments by Guido Imbens, Whitney Newey, James Powell, and participants of workshops at
Arizona State University, Berkeley, Harvard/MIT, Maryland, Triangle, UCL, UCLA, UCSD are
appreciated.
Abstract
This paper analyzes the second order bias of instrumental variables estimators for a dynamic
panel model with fixed effects. Three different methods of second order bias correction are
considered. Simulation experiments show that these methods perform well if the model does
not have a root near unity but break down near the unit circle. To remedy the problem near
the unit root a weak instrument approximation is used. We show that an estimator based on
long differencing the model is approximately achieving the minimal bias in a certain class of
instrumental variables (IV) estimators. Simulation experiments document the performance of
the proposed procedure in finite samples.
Keywords: dynamic panel, bias correction, second order, unit root, weak instrument
JEL: C13, C23, C51

1 Introduction
We are concerned with estimation of the dynamic panel model with fixed effects. Under large
n, fixed T asymptotics it is well known from Nickell (1981) that the standard maximum likeli-
hood estimator suffers from an incidental parameter problem leading to inconsistency. In order
to avoid this problem the literature has focused on instrumental variables estimation (GMM)
applied to first differences. Examples include Anderson and Hsiao (1982), Holtz-Eakin, Newey,
and Rosen (1988), and Arellano and Bond (1991). Ahn and Schmidt (1995), Hahn (1997), and
Blundell and Bond (1998) considered further moment restrictions. Comparisons of information
contents of varieties of moment restrictions made by Ahn and Schmidt (1995) and Hahn (1999)
suggest that, unless stationarity of the initial level yi0 is somehow exploited as in Blundell and
Bond (1998), the orthogonality of lagged levels with first differences provide the largest source
of information.
Unfortunately, the standard GMM estimator obtained after first differencing has been found
to suffer from substantial finite sample biases. See Alonso-Borrego and Arellano (1996). Mo-
tivated by this problem, modifications of likelihood based estimators emerged in the literature.
See Kiviet (1995), Lancaster (1997), Hahn and Kuersteiner (2000). The likelihood based esti-
mators do reduce finite sample bias compared to the standard maximum likelihood estimator,
but the remaining bias is still substantial for T relatively small.
In this paper, we attempt to eliminate the finite sample bias of the standard GMM estimator
obtained after first differencing. We view the standard GMM estimator as a minimum distance
estimator that combines T −1 instrumental variable estimators (2SLS) applied to first differences.
This view has been adopted by Chamberlain (1984) and Griliches and Hausman (1986). It has
been noted for quite a while that IV estimators can be quite biased in finite sample. See Nagar
(1959), Mariano and Sawa (1972), Rothenberg (1983), Bekker (1994), Donald and Newey (1998)
and Kuersteiner (2000). If the ingredients of the minimum distance estimator are all biased,
it is natural to expect such bias in the resultant minimum distance estimator, or equivalently,
GMM. We propose to eliminate the bias of the GMM estimator by replacing all the ingredients
with Nagar type bias corrected instrumental variable estimators. To our knowledge, the idea of
applying a minimum distance estimator to bias corrected instrumental variables estimators is
new in the literature.
We consider a second order approach to the bias of the GMM estimator using the formula
contained in Hahn and Hausman (2000). We find that the standard GMM estimator suffers
from significant bias. The bias arises from two primary sources: the correlation of the structural
equation error with the reduced form error and the low explanatory power of the instruments.
We attempt to solve these problems by using the “long difference technique” of Griliches and
Hausman (1986). Griliches and Hausman noted that bias is reduced when long differences are
used in the errors in variable problem, and a similar result works here with the second order
bias. Long differences also increases the explanatory power of the instruments which further
1
reduces the finite sample bias and also decreases the MSE of the estimator. To increase further
the explanatory power of the instruments, we use the technique of using estimated residuals as
additional instruments a technique introduced in the simultaneous equations model by Hausman,
Newey, and Taylor (1987) and used in the dynamic panel data context by Ahn and Schmidt
(1995). Monte Carlo results demonstrate that the long difference estimator performs quite well,
even for high positive values of the lagged variable coefficient where previous estimators are
badly biased.
However, the second order bias calculations do not predict well the performance of the estima-
tor for these high values of the coefficient. Simulation evidence shows that our approximations
do not work well near the unit circle where the model suffers from a near non-identification
problem. In order to analyze the bias of standard GMM procedures under these circumstances
we consider a local to non-identification asymptotic approximation.
The alternative asymptotic approximation of Staiger and Stock (1997) and Stock and Wright
(2000) is based on letting the correlation between instruments and regressors decrease at a
prescribed rate of the sample size. In their work, it is assumed that the number of instruments
is held fixed as the sample size increases. Their limit distribution is nonstandard and in special
cases corresponds to exact small sample distributions such as the one obtained by Richardson
(1968) for the bivariate simultaneous equations model. This approach is related to the work
by Phillips (1989) and Choi and Phillips (1992) on the asymptotics of 2SLS in the partially
identified case. Dufour (1997), Wang and Zivot (1998) and Nelson, Startz and Zivot (1998)
analyze valid inference and tests in the presence of weak instruments. The associated bias and
mean squared error of 2SLS under weak instrument assumptions was obtained by Chao and
Swanson (2000).
In this paper we use the weak instrument asymptotic approximations to analyze 2SLS for the
dynamic panel model. We analyze the impact of stationarity assumptions on the nonstandard
limit distribution. Here we let the autoregressive parameter tend to unity in a similar way as in
the near unit root literature. Nevertheless we are not considering time series cases since in our
approximation the number of time periods T is held constant while the number of cross-sectional
observations n tends to infinity.
Our limiting distribution for the GMM estimator shows that only moment conditions in-
volving initial conditions are asymptotically relevant. We define a class of estimators based on
linear combinations of asymptotically relevant moment conditions and show that a bias minimal
estimator within this class can approximately be based on taking long differences of the dynamic
panel model. In general, it turns out that under near non-identification asymptotics the optimal
procedures of Alvarez and Arellano (1998), Arellano and Bond (1991) , Ahn and Schmidt (1995,
1997) are suboptimal from a bias point of view and inference optimally should be based on a
smaller than the full set of moment conditions. We show that a bias minimal estimator can be
obtained by using a particular linear combination of the original moment conditions. We are
using the weak instrument asymptotic approximation to the distribution of the IV estimator to
2
derive the form of the optimal linear combination.
2 Review of the Bias of GMM Estimator

Consider the usual dynamic panel model with fixed effects:
yit = αi + βyi,t−1 + εit , i = 1, . . . , n; t = 1, . . . , T (1)
It has been common in the literature to consider the case where n is large and T is small. The
usual GMM estimator is based on the first difference form of the model
yit − yi,t−1 = β (yi,t−1 − yi,t−2 ) + (εit − εi,t−1 )
where the instruments are based on the orthogonality
E [yi,s (εit − εi,t−1 )] = 0 s = 0, . . . , t − 2.
Instead, we consider a version of the GMM estimator developed by Arellano and Bover (1995),
which simplifies the characterization of the “weight matrix” in GMM estimation. We define
the innovation uit ≡ αi + εit . Arellano and Bover (1995) eliminate the fixed effect αi in (1) by
applying Helmert’s transformation
r · ¸
∗ T −t 1
uit ≡ uit − (ui,t+1 + · · · + uiT ) , t = 1, . . . , T − 1
T −t+1 T −t
instead of first differencing.1 The transformation produces
∗
yit = βx∗it + ε∗it , t = 1, . . . , T − 1
where x∗t ≡ yi,t−1

∗ . Let zit ≡ (yi0 , . . . , yit−1 )0 . Our moment restriction is summarized by
E [zit ε∗it ] = 0 t = 1, . . . , T − 1
It can be shown that, with the homoscedasticity assumption on εit , the optimal “weight ma-
trix” is proportional to a block-diagonal matrix, with typical diagonal block equal to E [zit zit0 ].
Therefore, the optimal GMM estimator is equal to
PT −1 ∗0
b xt Pt yt∗
bGMM ≡ PTt=1 −1 ∗0 ∗
(2)
t=1 xt Pt xt
∗ , · · · , y ∗ )0 , Z ≡ (z , · · · , z )0 , and P ≡ Z (Z 0 Z )−1 Z 0 .
where x∗t ≡ (x∗1t , · · · , x∗nt )0 , yt∗ ≡ (y1t nt t 1t nt t t t t t
b
Now, let b2SLS,t denote the 2SLS of yt on xt : ∗ ∗
b x∗0 ∗
t Pt yt
b2SLS,t ≡ ∗0 , t = 1, . . . , T − 1
xt Pt x∗t
1
Arellano and Bover (1995) notes that the efficiency of the resultant GMM estimator is not affected whether
or not Helmert’s transformation is used instead of first differencing.
3
If εit are i.i.d. across t, then under the standard (first order) asymptotics where T is fixed and
n grows to infinity, it can be shown that
√ ³b ´0
n b2SLS,1 − β, . . . , bb2SLS,T −1 − β → N (0, Ψ) ,
¡ ¢
where Ψ is a diagonal matrix with the t-th diagonal elements equal to Var (εit )/ plim n−1 x∗0 ∗
t Pt xt .
Therefore, we may consider a minimum distance estimator, which solves
 0  −1  
bb2SLS,1 − b (x∗0 ∗ −1 bb2SLS,1 − b
1 P1 x1 ) 0
 ..   ..   .. 
min 
 .  
  .  
  . 

b ¡ ∗0 ¢−1
bb2SLS,T −1 − b 0 xT −1 PT −1 x∗T −1 bb2SLS,T −1 − b
The resultant minimum distance estimator is numerically identical to the GMM estimator in
(2):
PT −1 ∗ b
b t=1x∗0
t Pt xtt · b2SLS,t
bGMM = PT −1 ∗0 ∗
(3)
t=1 xt Pt xt
Therefore, the GMM estimator bbGMM may be understood as a linear combination of the 2SLS
estimators bb2SLS,1 , . . . , bb2SLS,T −1 . It has long been known that the 2SLS may be subject to
substantial finite sample bias. See Nagar (1959), Rothenberg (1983), Bekker (1994), and Donald
and Newey (1998) for related discussion. It is therefore natural to conjecture that a linear
combination of the 2SLS may be subject to quite substantial finite sample bias.
3 Bias Correction using Alternative Asymptotics

In this section, we consider the usual dynamic panel model with fixed effects (1) using the
alternative asymptotics where n and T grow to infinity at the same rate. Such approximation
was originally developed by Bekker (1994), and was adopted by Alvarez and Arellano (1998)
and Hahn and Kuersteiner (2000) in the dynamic panel context. We assume
i.i.d. ¡ ¢
Condition 1 εit ∼ N 0, σ2 over i and t.
We also assume stationarity on yi,0 and normality on αi 2 :

³ ´ ¡ ¢
αi σ2
Condition 2 yi0 | αi ∼ N 1−β , 1−β 2 and αi ∼ N 0, σ2α .
In order to guarantee that Zt0 Zt is nonsingular, we will assume that

2
This condition allows us to use lots of intermediate results in Alvarez and Arellano (1998). Our results are
expected to be robust to violation of this condition.
4
T
Condition 3 n → ρ, where 0 < ρ < 1.3
Alvarez and Arellano (1998) show that, under Conditions 1 - 3,

µ µ ¶¶
√ 1 ¡ ¢
b
nT bGMM − β − (1 + β) → N 0, 1 − β 2 , (4)
n
where bbGMM is defined in (2) and (3). By examining the asymptotic distribution (4) under such
alternative asymptotic approximation where n and T grow to infinity at the same rate, we can
develop a bias-corrected estimator. This bias-corrected estimator is given by
e n b 1
bGMM ≡ bGMM + . (5)
n−1 n−1
Combining (4) and (5), we can easily obtain:
√ ³ ´ ¡ ¢
Theorem 1 Suppose that Conditions 1 - 3 are satisfied. Then, nT ebGMM − β → N 0, 1 − β 2 .
¡ ¢
Hahn and Kuersteiner (2000) establish by a Hajék-type convolution theorem that N 0, 1 − β 2
is the minimal asymptotic distribution. As such, the bias corrected GMM is efficient. Although
the bias corrected GMM estimator e bGMM does have a desirable property under the alternative
asymptotics, it would not be easy to generalize the development leading to (5) to the model
involving other strictly exogenous variables. Such a generalization would require the charac-
terization of the asymptotic distribution of the standard GMM estimator under the alternative
asymptotics, which may not be trivial. We therefore consider eliminating biases in bb2SLS,t in-
stead. An estimator that removes the higher order bias of bb2SLS,t is the Nagar type estimator.
Let
∗0 ∗ 0∗ ∗
bbNagar,t = x∗t Pt yt − λt x∗t Mt yt ,
xt Pt xt − λt xt Mt x∗t
0 ∗ 0
Kt
where Mt ≡ I − Pt , λt ≈ n−K t
, and Kt denotes the number of instruments for the t-th equation.
Kt −2
For example, we may use λt = n−K t +2
as in Donald and Newey (1998). We may also use LIML
for the t-th equation, in which case λt would be estimated by the usual minimum eigenvalue
search.
We now examine properties of the corresponding minimum distance estimator. One possible
weight matrix for this problem is given by
 ¡ ¢ −1
x∗0 ∗ ∗0 ∗ −1
1 P1 x1 − λ1 x1 M1 x1 0
 .. 
 . 
 
¡ ∗0 ∗ ¢−1
0 xT −1 PT −1 x∗T −1 − λT −1 xT0−1 MT −1 x∗T −1
3
Alvarez and Arellano (1998) only require 0 ≤ ρ < ∞. We require ρ < 1 to guarantee that Zt0 Zt is singular for
every t.
5
With this weight matrix, it can be shown that the minimum distance estimator is given by
P ¡ ∗0 ∗ − λ x∗ 0 M y ∗
¢ P ¡ ∗0 ∗ ∗0 ∗
¢
ebNagar ≡ P ¡ ∗
t xt Pt yt t t t t t xt Pt εt − λt xt Mt ε t
0 ∗ ∗0 ∗
¢ = β + P ¡ ∗0 ∗ ∗0 ∗
¢. (6)
t xt Pt xt − λt xt Mt xt t xt Pt xt − λt xt Mt xt
One possible way to examine the finite sample property of the new estimator is to use the
alternative asymptotics:
Theorem 2 Suppose that Conditions 1-3 are satisfied. Also suppose that n and T grow to
√ ¡ ¢
infinity at the same rate. Then, nT (bN agar − β) → N 0, 1 − β 2 .
Proof. Lemmas 10, and 11 in Appendix A along with Lemma 2 of Alvarez and Arellano
(1998) establish that
µ ¶ µ ¶
1 X ∗0 ∗ Kt ∗0 ∗ σ4
√ xt Pt εt − x Mt εt → N 0,
nT t n − Kt t 1 − β2
and
µ ¶
1 X ∗0 ∗ Kt ∗0 ∗ σ2
xt Pt xt − xt Mt xt → ,
nT t n − Kt 1 − β2
from which the conclusion follows.

In Table 1, we summarized finite sample properties of ebNagar and b
bLIML approximated by
10000 Monte Carlo runs. Here, bbLIML is the estimator where λs in (6) are replaced by the
4
corresponding “eigenvalues”. In general, we find that ebNagar and bbLIML successfully remove
bias unless β is close to one and the sample size is small.
4 Bias Correction using Higher Order Expansions

In the previous section, we explained the bias of the GMM estimator as a result of the biases
of the 2SLS estimators. In this section, we consider elimination of the bias by adopting the
second order Taylor type approximation. This perspective has been adopted by Nagar (1959),
and Rothenberg (1983).
For this purpose we first analyze the second order bias of a general minimization estimator
b of a single parameter β ∈ R defined by
b = argmin Qn (c) (7)

c∈C
for some C ⊂ R. The score for the minimization problem is denoted by Sn (c) ≡ ∂Qn (c) /∂c.
The criterion function is assumed to be of the form Qn (c) = g (c)0 G (c)−1 g (c) where g (c) and
G (c) are defined in Condition 7. The criterion function depends on primitive functions δ (wi , c)
and ψ (wi , c) mapping Rk × R into Rd for d ≥ 1, where wi are i.i.d. observations. We assume
that E [δ (wi , β)] = 0. We impose the following additional conditions on wi , δ, and ψ.
³ ´
4 αi 1
In our Monte Carlo experiment, we let εit ∼ N (0, 1), αi ∼ N (0, 1), and yi0 ∼ N 1−β , 1−β 2 .
6
Condition 4 The random variables wi are i.i.d.
Condition 5 The functions δ (w, c) and ψ (w, c) are three times differentiable in c for c ∈ C
where C ⊂ R is a compact set such that β ∈ int C. Assume that δ (wi , c) and ψ (wi , c) satisfy a
Lipschitz condition kδ (wi , c1 ) − δ (wi , c2 )k ≤ Mδ (wi ) |c1 − c2 | for some function Mδ (.) : Rk → R
and c1 , c2 ∈ C with the hsame statement i holding for ψ. The functions Mδ (.) and Mψ (.) satisfy
2
E [Mδ (wi )] < ∞ and E |Mψ (wi )| < ∞.
Condition 6 Let δ j (wi , c) ≡ ∂ j δ (wi , c) /∂cj , Ψ (wi , c) ≡ ψ (wi , c) ψ (wi , c)0 and Ψj (wi , c) ≡
∂ j Ψ (wi , c) /∂cj . Then, λj (c) ≡ E [δ j (wi , c)], and Λj (c) ≡ E [Ψj (wi , c)] all exist and are finite
for j = 0, ..., 3. For simplicity, we use the notation λj ≡ λj (β), Λj ≡ Λj (β), λ (c) ≡ λ0 (c) and
Λ (c) ≡ Λ0 (c).
P P P
Condition 7 Let g (c) ≡ n1 ni=1 δ (wi , c), gj (c) ≡ n1 ni=1 δ j (wi , c), G (c) ≡ n1 ni=1 ψ (wi , c) ψ (wi , c)0
P p p
and Gj (c) ≡ n1 ni=1 Ψj (wi , c) . Then g (c) → E [δ (wi , c)], gj (c) → E [δ j (wi , c)], G (c) →
p
E [Ψ (wi , c)], and Gj (c) → E [Ψj (wi , c)] for all c ∈ C.
Our asymptotic approximation of the second order bias of b is based on an approximate

h i
¡ ¢
estimator b̃ such that b − b̃ = op n−1 . The approximate bias of b is then defined as E b̃ − β
while the original estimator b need not necessarily possess moments of any order. In order to
√
justify our approximation we need to establish that b is n-consistent and that Sn (b) = 0 with
probability tending to one. For this purpose we introduce the following additional conditions.
Condition 8 (i) There exists some finite 0 < M < ∞ such that the eigenvalues of E [Ψ (wi , c)]
£ ¤
are contained in the compact interval M−1 , M for all c ∈ C; (ii) the vector E [δ (wi , c)] = 0 if
and only if c = β; (iii) λ1 6= 0.
h i h i
Condition 9 There exists some η > 0 such that E Mδ (wi )2+η < ∞, E Mψ (wi )2+η < ∞,
h i h i
E supc∈C kδ (wi , c)k2+η < ∞, and E supc∈C kψ (wi , c)k4+η < ∞.
Condition 8 is an identification condition that guarantees the existence of a unique inte-

rior minimum of the limiting criterion function. Condition 9 corresponds to Assumption B of
Andrews (1994) and is used to establish a stochastic equicontinuity property of the criterion
function.
√
Lemma 1 Under conditions 4 - 9, b defined in (7) satisfies n(b − β) = Op (1) and Sn (b) = 0
with probability tending to 1.
Proof. See Appendix B.

Based on Lemma 1 the first order condition for (7) can be characterized by
0 = 2g1 (b)0 G (b)−1 g (b) − g (b)0 G (b)−1 G1 (b) G (b)−1 g (b) wp → 1. (8)
7
A second order Taylor expansion of (8) around β leads to a representation of b − β up to terms
of order op (n−1 ). In Appendix B, it is shown that
µ ¶ µ ¶
√ 1 1 1 1 Ψ 2 1
n (b − β) = − Φ + √ − Γ + 2 ΦΞ − 3 Φ + op √ . (9)
Υ n Υ Υ Υ n
³ ´
(See Definition 2 in Appendix B for the definition of Ψ, Υ, Φ, Ξ, and Γ.) Ignoring the op √1n
√
term in (9), and taking expectations, we obtain the “approximate mean” of n (b − β). We
present the second order bias of b in the next Theorem.
Theorem 3 Under Conditions 4-9, the second order bias of b is equal to

1 1 11 1 1 Ψ £ 2¤
−√ E [Φ] − E [Γ] + E [ΦΞ] − E Φ . (10)
nΥ nΥ n Υ2 nΥ3
where
E [Φ] = 0,
µ · ¸¶
−1 ∂δ 0i £ ¤ ¡ £ ¤¢
E [Γ] = 2 trace Λ E δ i − 2λ01 Λ−1 E ψi ψ 0i Λ−1 δi − trace Λ−1 Λ1 Λ−1 E δ i δ 0i ,
∂β
and
· ¸
∂δ 0 £ ¤
E [ΦΞ] = 8λ01 Λ−1 E δ i i Λ−1 λ1 − 4λ01 Λ−1 E δ i λ01 Λ−1 ψi ψ0i Λ−1 λ1
∂β
£ 0 ¤ −1 £ ¤
−8λ1 Λ E δ i δ i Λ Λ1 Λ−1 λ1 + 4λ01 Λ−1 E δ i δ 0i Λ−1 λ2 ,
0 −1
and
£ ¤ £ ¤
E Φ2 = 4λ01 Λ−1 E δ i δ 0i Λ−1 λ1 ,
Proof. See Appendix B.
Remark 1 For the particular case where ψ i = δ i , i.e. when b is a CUE, the bias formula (10)
exactly coincides with Newey and Smith’s (2000).
We now apply these general results to the GMM estimator of the dynamic panel model. The
GMM estimator bbGMM can be understood to be a solution to the minimization problem
Ã n !0 Ã n !
1X −1 1X
min mi (c) Vn mi (c)
c n n
i=1 i=1
for c ∈ C where C is some closed interval on the real line containing the true parameter value
and
 ∗ − c · x∗ )
  
zi1 (yi1 z z 0 0
i1 n i1 i1
 ..  1 X .. 
mi (c) = 
 . ,
 V n = 
 . .

³ ´ n
∗ ∗ i=1 0
zi,T −1 yi,T −1 − c · xi,T −1 0 zi,T −1 zi,T −1
We now characterize the finite sample bias of the GMM estimator bbGMM of the dynamic
panel model using Theorem 4. It can be shown that:
8
Theorem 4 Under Conditions 1-3 the second order bias of b
bGMM is equal to
µ ¶
B1 + B2 + B3 1
+o , (11)
n n
where
PT −1 ³ ´
zz −1 εxzz
B1 ≡ Υ−1
1 t=1 trace (Γ t ) Γt,t
−2 PT −1 PT −1 zx0 zz −1 εxzz −1 zx
B2 ≡ −2Υ1 t=1 s=1 Γt (Γt ) Γt,s (Γzz s ) Γs
−2 PT −1 PT −1 zx0 zz −1 zz −1 zx
B3 ≡ Υ1 t=1 s=1 Γt (Γt ) B3,1 (t, s) (Γs ) Γs .
h i
where Γzz 0 zx ∗ εxzz ∗ ∗ 0 ∗ zx0 zz −1 z z 0
t ≡ E [zit zit ] , Γt ≡ E [zit xit ] , Γt,s ≡ E [εit xis zit zis ] , B3,1 (t, s) ≡ E εit zit Γs (Γs ) is is
P −1 zx0 zz −1 zx
and Υ1 ≡ Tt=1 Γt (Γt ) Γt .
Proof. See Appendix C.

In Table 2, we compare the actual performance of b bGMM and the prediction of its bias based
on Theorem 4. Table 2 tabulates the actual bias of the estimator approximated by 10000 Monte
Carlo runs, and compares it with the second order bias based on the formula (11). It is clear
that the second order theory does a reasonably good job except when β is close to the unit circle
and n is small.
Theorem 4 suggests a natural way of eliminating the bias. Suppose that B b2 , B
b1 , B b3 are
√
n-consistent estimators of B1 , B2 , B3 . Then it is easy to see that
³ ´
bbBC1 ≡ bbGMM − 1 B b1 + B
b2 + B
b3 (12)
n
is first order equivalent to b bGMM , and has second order bias equal to zero. Define Γ̂zz
t =
−1
P n 0 zx −1
Pn ∗ εxzz −1
Pn ∗ ∗ 0
n i=1 zit zit , Γ̂t = n i=1 zit xit , Γ̂t,s = n i=1 eit xis zit zis and
n
X ³ ´−1
B̂3,1 (t, s) = n−1 e∗it zit0 Γ̂zx
t Γ̂zz
t
0
zis zis ,
i=1
where e∗it ≡ yit ∗ − x∗ b zz zx εxzz and

it bGMM . Let B̂1 , B̂2 and B̂3 be defined by replacing Γt , Γt ,Γt,s
B3,1 (t, s) by Γ̂zz zx εxzz
t , Γ̂t ,Γ̂t,s and B̂3,1 (t, s) in B1 , B2 and B3 .
Then the Bs b will satisfy the √n-consistency requirement, and hence, the estimator (12) will
be first order equivalent to bbGMM and will have zero second order bias. Because the summand
h £ ¤−1 ∗ £ ¤ i
0 −1
E E [zit x∗it ]0 E zit zit0 0 −1 0
εit zit λs Λs zis zis E zis zis ∗
E [zis xis ]
in the numerator of B3 is equal to zero for s < t, we may instead consider

µ ¶
bbBC2 ≡ bbGMM − 1 B b1 + B bb
b2 + B 3 (13)
n
9
where
1 bb PT −1 PT −1 zx0 ³ zz ´−1 ³ ´−1
B 3 ≡ Υ̂−2
1 s=t t=1 Γ̂t Γ̂t B̂3,1 (t, s) Γ̂zz
s Γ̂zx
s .
n
Second order asymptotic theory predicts approximately that bbBC2 would be relatively free
of bias. We examined whether such prediction is reasonably accurate in finite sample by 5000
Monte Carlo runs.5 Table 3 summarizes the properties of bbBC2 . We have seen in Table 2 that
the second order theory is reasonably accurate unless β is close to one. It is therefore sensible
to conjecture that bbBC2 would have a reasonable finite sample bias property as long as β is not
too close to one. Such a conjecture is verified in Table 3.
5 Long Difference Specification: Finite Iteration

In previous sections, we noted that even the second order asymptotics “fails” to be a good
approximation around β ≈ 1. This phenomenon can be explained by the “weak instrument”
problem. See Staiger and Stock (1997). Blundell and Bond (1998) argued that the weak instru-
ment problem can be alleviated by assuming stationarity on the initial observation yi0 . Such
stationarity condition may or may not be appropriate for particular applications. Further, sta-
tionarity assumption turns out to be a predominant source of information around β ≈ 1 as noted
by Hahn (1999). We therefore turn to some other method to overcome the weak instrument
problem around the unit circle avoiding the stationarity assumption. We argue that some of the
difficulties of inference around the unit circle would be alleviated by taking a long difference.
To be specific, we focus on a single equation based on the long difference
yiT − yi1 = β (yiT −1 − yi0 ) + (εiT − εi1 ) (14)
It is easy to see that the initial observation yi0 would serve as a valid instrument. Using
similar intuition as in Hausman and Taylor (1983) or Ahn and Schmidt (1995), we can see that
yiT −1 − βyiT −2 , . . . , yi2 − βyi1 would be valid instruments as well.
5.1 Intuition
In Hahn-Hausman (HH) (1999) we found that the bias of 2SLS (GMM) depends on 4 fac-
tors: “Explained” variance of the first stage reduced form equation, covariance between the
stochastic disturbance of the structural equation and the reduced form equation, the number of
instruments, and sample size:
1 (number of instruments) × (“covariance”)

E(β̂ 2SLS − β) ≈
n “Explained” variance of the first stage reduced form equation
5
The difference of Monte Carlo runs here induced some minor numerical difference (in properties of bGMM )
across Tables 1 - 3.
10
Similarly, the Donald-Newey (DN) (1999) MSE formula depends on the same 4 factors. We now
consider first differences (FD) and long differences (LD) to see why LD does so much better in
our Monte-Carlo experiments.
Assume that T = 4. The first difference set up is:
y4 − y3 = β (y3 − y2 ) + ε4 − ε3 (15)
For the RHS variables it uses the instrument equation:
y3 − y2 = (β − 1) y2 + α + ε3
Now calculate the R2 for equation (15) using the Ahn-Schmidt (AS) moments under “ideal con-
ditions” where you know β in the sense that the nonlinear restrictions become linear restrictions:
We would then use (y2 , y1 , y0 , α + ε1 , α + ε2 ) as instruments. Assuming stationarity for symbols,
but not using it as additional moment information, we can write
α
y0 = + ξ0,
1−β
³ ´
σ 2ε
where ξ 0 ∼ 0, 1−β 2 . It can be shown that the covariance between the structure error and
1−β
the first stage error is −σ2ε , and the “explained variance” in the first stage is equal to σ2ε β+1 .
Therefore,the ratio that determines the bias of 2SLS is equal to
−σ2ε 1+β
=− ,
σ2ε −β+1
β+1
1−β
which is equal to −19 for β = .9. For n = 100, this implies the percentage bias of
Number of Instruments −19 5 −19
× 100 = × 100 = −105. 56
Sample Size β 100 0.9
We now turn to the LD setup:
y4 − y1 = β (y3 − y0 ) + ε4 − ε1
It can be shown that the covariance between the first stage and second stage errors is −β 2 σ2ε ,
and the “explained variance” in the first stage is given by
¡ 6 4 5
¢
2 2β − 4β − 2β + 4β 2 + 4β − 2β 3 + 6 σ2 + β 6 − β 4 + 2 − 2β 3
−σ ε ¡ ¢ ,
−2β − 3 + β 2 σ2 − 1 + β 2
σ2α
where σ2 = σ 2ε
. Therefore, the ratio that determines the bias is equal to
¡ ¢
2 −2β − 3 + β 2 σ2 − 1 + β 2
β ¡ 6 ¢
2β − 4β 4 − 2β 5 + 4β 2 + 4β − 2β 3 + 6 σ2 + β 6 − β 4 + 2 − 2β 3
11
which is equal to
2. 570 3 × 10−4
−. 374 08 +
σ2 + 4. 830 6 × 10−2
for β = .9. Note that the maximum value that this ratio can take in absolute terms is
−. 374 08
which is much smaller than −19. We therefore conclude that the long difference increases R2 but
decreases the covariance. Further, the number of instruments is smaller in the long difference
specification so we should expect even smaller bias. Thus, all of the factors in the HH equation,
except sample size, cause the LD estimator to have smaller bias.
5.2 Monte Carlo

For the long difference specification, we can use yi0 as well as the “residuals” yiT −1 −βyiT −2 , . . . ,
yi2 −βyi1 as valid instruments.6 We may estimate β³by applying 2SLS to the long difference equa- ´
tion (14) using yi0 as instrument. We may then use yi0 , yiT −1 − b b2SLS yiT −2 , . . . , yi2 − bb2SLS yi1 as
instrument to the long difference equation (14) to estimate β. Call the estimator b b2SLS,1 . By
b b
iterating this procedure, we can define b2SLS,2³, b2SLS,3 , . . . Similarly, we may first estimate ´ β
b b
by the Arellano and Bover approach, and use yi0 , yiT −1 − bGMM yiT −2 , . . . , yi2 − bGMM yi1 as
instrument to the long difference equation (14) to estimate β. Call the estimator b b2SLS,1 . By
iterating this procedure, b b
³ we can define bGMM,2 , bGMM,3 , . . . Likewise, ´ we may first estimate
b b b
β by bLIML , and use yi0 , yiT −1 − bLIML yiT −2 , . . . , yi2 − bLIML yi1 as instrument to the long
difference equation (14) to estimate β. Call the estimator b bLIML,1 . By iterating this procedure,
b b
we can define bLIML,2 , bLIML,3 , . . . We found that such iteration of the long difference esti-
mator works quite well. We implemented these procedures for T = 5, n = 100, β = 0.9 and
σ2α = σ2ε = 1. Our finding with 5000 monte carlo runs is summarized in Table 4. In general, we
found that the iteration of the long difference estimator works quite well.7
We compared the performance of our estimator with Blundell and Bond’s (1998) estimator,
which uses additional information, i.e., stationarity. We compared four versions of their estima-
bBB1 , , bbBB4 with the long difference estimators bbLIML,1 , bbLIML,2 , bbLIML,3 . For the exact
tors b
definition of bbBB1 , , b
bBB4 , see Appendix E. Of the four versions, bbBB3 and bbBB4 are the ones
reported in their Monte Carlo section. In our Monte Carlo exercise, we set β = 0.9, σ2ε = 1,
αi ∼ N (0, 1). Our finding based on 5000 Monte Carlo runs is contained in Table 5. In terms of
bias, we find that Blundell and Bond’s estimators bbBB3 and bbBB4 have similar properties as the
long difference estimator(s), although the former dominates the latter in terms of variability.
6
We acknowledge that the residual instruments are irrelevant under the near unity asymptotics.
7
Second order theory does not seem to explain the behavior of the long difference estimator. In Table 7, we
compare the actual performance of the long difference based estimators with the second order theory developed
in Appendix F.
12
(We note, however, that b bBB1 and bbBB2 are seriously biased. This indicates that the choice
of weight matrix matters in implementing Blundell and Bond’s procedure.) This result is not
surprising because the long difference estimator does not use the information contained in the
initial condition. See Hahn (1999) for a related discussion. We also wanted to examine the
sensitivity of Blundell and Bond’s estimator to misspecification, i.e., nonstationary distribution
of yi0 . For this situation the estimator will be inconsistent.
³ In
´ order to assess the finite sample
αi σ 2ε
sensitivity, we considered the cases where yi0 ∼ 1−β , 1−β 2 . Our Monte Carlo results based
F F
on 5000 runs are contained in Table 6, which contains results for β F = .5 and β F = 0. We
find that the long difference estimator is quite robust, whereas bbBB3 and bbBB4 become quite
biased as predicted by the first order theory. (We note that bbBB1 and bbBB2 are less sensitive
to misspecification. Such robustness consideration suggests that choice of weight matrix is not
straightforward in implementing Blundell and Bond’s procedure.) We conclude that the long
difference estimator works quite well even compared to Blundell and Bond’s (1998) estimator.8
6 Near Unit Root Approximation

Our Monte Carlo simulation results summarized in Tables 1, 2, and 3 indicate that the previously
discussed approximations and the bias corrections that are based on them do not work well near
the unit circle. This is because the identification of the model becomes “weak” near the unit
circle. See Blundell and Bond (1998), who related the problem to the analysis by Staiger and
Stock (1997). In this Section, we formally adopt approximations local to the points in the
parameter space that are not identified. To be specific, we consider model (1) for T fixed and
n → ∞ when also β n tends to unity. We analyze the bias of the associated weak instrument
limit distribution. We analyze the class of GMM estimators that exploit Ahn and Schmidt’s
(1997) moment conditions and show that a strict subset of the full set of moment restrictions
should be used in estimation in order to minimize bias. We argue that this subset of moment
restrictions leads to the inference based on the “long difference” specification.
Following Ahn and Schmidt we exploit the moment conditions
£ ¤ ¡ ¢
E ui u0i = σ 2ε + σ2α I + σ2α 110
E [ui yi0 ] = σαy0 1
with 1 = [1, ..., 1]0 a vector of dimension T and ui = [ui1 , ..., uiT ]0 . The moment conditions can
be written more compactly as
" # " # " # " #
vech E [ui u0i ] 2 vech I 2 vech(I + 110 ) 0
b= = σε + σα + σαy0 (16)
E [ui yi0 ] 0 0 1
8
We did try to compare the sensitivity of the two moment restrictions by implementing CUE, but we experi-
enced some numerical problem. Numerical problem seems to be an issue with CUE in general. Windmeijer (2000)
reports similar problems with CUE.
13
where the redundant moment conditions have been eliminated by use of the vech operator which
extracts the upper diagonal elements from a symmetric matrix. Representation (16) makes it
clear that the vector b ∈ RT (T +1)/2+T is contained in a 3 dimensional subspace which is another
way of stating that there are G = T (T + 1)/2 + T − 3 restrictions imposed on b. This statement
is equivalent to Ahn and Schmidt’s (1997) analysis of the number of moment conditions.
GMM estimators are obtained from the moment conditions by eliminating the unknown
parameters σ2ε , σ2α and σαy0 . The set of all GMM estimators leading to consistent estimates of β
can therefore be described by a (T (T + 1)/2 + T ) × G matrix A which contains all the vectors
spanning the orthogonal complement of b. This matrix A satisfies
b0 A = 0.
For our purposes it will be convenient to choose A such that

£ ¤
b0 A = Euit ∆uis , E (uiT ∆uij ) , Eui ∆uik , E∆u0i yi0 ,
s = 2, ..., T ; t = 1, ...s − 2; j = 2, ..., T − 1; k = 2, ..., T
where ∆ui = [ui2 − ui1 , ..., uiT − uiT −1 ]0 . It becomes transparent that any other representation of
the moment conditions can be obtained by applying a corresponding nonsingular linear operator
C to the matrix A. It can be checked that there exists a nonsingular matrix C such that b0 AC = 0
is identical to the moment conditions (4a)-(4c) in Ahn and Schmidt (1997).
We investigate the properties of (infeasible) GMM estimators based on
E [uit ∆uis (β)] = 0, E [uiT ∆uij (β)] = 0, E [ui ∆uik (β)] = 0, E [yi0 ∆uit (β)] = 0
obtained by setting ∆uit (β) ≡ ∆yit − β∆yit−1 . Here, we assume that the instruments uit are
observable. Let gi1 (β) denote a column vector consisting of uit ∆uis (β) , uiT ∆uij (β) , ui ∆uik (β).
P £ ¤0
Also let gi2 (β) ≡ [yi0 ∆ui (β)]. Finally, let gn (β) ≡ n−3/2 ni=1 gi1 (β)0 , gi2 (β)0 with the
£ ¤
optimal weight matrix Ωn ≡ E gi (β n ) gi (β n )0 . The infeasible GMM estimator of a possibly
transformed set of moment conditions C 0 gn (β) then solves
¡ ¢+
β 2SLS = argmin gn (β)0 C C 0 Ωn C C 0 gn (β) (17)
β
³ ´
where C is a G × r matrix for 1 ≤ r ≤ G such that C 0 C = Ir and rank C (C 0 ΩC)+ C 0 ≥ 1.
We use (C 0 ΩC)+ to denote the Moore-Penrose inverse. We thus allow the use of a singular
weight matrix. Choosing r less than G allows to exclude certain hmomenti conditions. Let
P 0
fi,1 ≡ − ∂gi1 (β)/ ∂β, fi,2 ≡ − ∂gi2 (β)/ ∂β, and fn ≡ n−3/2 ni=1 fi1 0 , f0
i,2 . The infeasible
2SLS estimator can be written as
³ ¡ ¢+ ´−1 ¡ ¢+
β 2SLS − β n = fn0 C C 0 Ωn C C 0 fn fn0 C C 0 Ωn C C 0 gn (β n ) . (18)
We are now analyzing the behavior of β 2SLS − β n under local to unity asymptotics. We make
the following additional assumptions.9
9
Kruiniger (2000) considers similar local-to-unity asymptotics.
14
¡ ¢ ¡ ¢
Condition
³ 10´Let yit = αi + β n yit−1 + εit with εit ∼ N 0, σ2ε , αi ∼ N 0, σ 2α and yi0 ∼
αi 1
N 1−β , 1−β 2 , where β n = exp(−c/n) for some c > 0.
n n
Pt−1 ¡ ¢
Also note that ∆yit = β t−1
n η i0 + εit + √c
n
s−1
s=1 β n εit−s + op n−1 where
³ .¡ ¢´
ηi0 ∼ N 0, (β n − 1)2 1 − β 2n .
Under the generating mechanism described in the previous definition the following Lemma can
be established.
Lemma 2 Assume β n = exp(−c/n) for some c > 0. For T fixed and as n → ∞

n
X n
X
p p
n−3/2 fi,1 → 0, n−3/2 gi,1 (β 0 ) → 0
i=1 i=1
and
n
X
−3/2
£ 0 0
¤0 d £ ¤0
n fi,2 , gi,2 (β 0 ) → ξ 0x , ξ 0y
i=1
" #
£ ¤ Σ11 Σ12
where ξ 0x , ξ 0y ∼ N(0, Σ) with Σ = and Σ11 = δI, Σ12 = δM1 Σ22 = δM2 , where
Σ21 Σ22
σ2α σ 2ε
δ= ,
c2
   
−1 1 0 −1 2 0
 .. ..   
 . .   −1 . . . . . . 
   
M1 =  .. , M2 =  .. .. 
 0 . 1   . . −1 
   
−1 −1 2
and Σ12 = Σ021 . We also have

" #
1 0 0
Ωn = + o (1) .
n2 0 Σ22
Proof. See Appendix D.

Using Lemma (2) the limiting distribution of β 2SLS − β n is stated in the next corollary.
£ ¤
0 0
£ ¤
For this purpose we define the augmented vectors ξ # #
x = 0, ..., 0, ξ x and ξ y = 0, ..., 0, ξ y and
partition C = [C00 , C10 ]0 such that C 0 ξ # 0
x = C1 ξ x . Let r1 denote the rank of C1 .
Corollary 1 Let β 2SLS − β n be given by (18). If Condition 10 is satisfied then
d ξ 0x C1 (C10 Σ22 C1 )+ C10 ξ y

β 2SLS − 1 → = X (C, Σ22 ) (19)
ξ 0x C1 (C10 Σ22 C1 )+ C10 ξx
15
Unlike the limiting distribution for the standard weak instrument problem, X (C, Σ22 ), as
defined in (19), is based on normal vectors that have zero mean. This degeneracy is generated
by the presence of the fixed effect in the initial condition, scaled up appropriately to satisfy
the stationarity requirement10 for the process yit . Inspection of the proof shows that the usual
concentration parameter appearing in the limit distribution is dominated by a stochastic com-
ponent related to the fixed effect. This situation seems to be similar to time series models where
deterministic trends can dominate the asymptotic distribution.
Based on Corollary 1, we define the following class of 2SLS estimators for the dynamic panel
model. The class contains estimators that only use a nonredundant set of moment conditions
involving the initial conditions yi0 .
³ ´−1
Definition 1 Let β ∗2SLS be defined as β ∗2SLS ≡ argminβ g2,n (β)0 C1 C10 Ω̃C1 C10 g2,n (β), where
P
g2,n (β) ≡ n−3/2 ni=1 gi2 (β), Ω̃ is a symmetric positive definite (T − 1) × (T − 1) matrix of con-
stants and C1 is a (T − 1) × r1 matrix of full column rank r1 ≤ T − 1 such that C10 C1 = I.
Next we turn to the analysis of the asymptotic bias for the estimator β ∗2SLS of the dynamic
panel model. Since the limit only depends on zero mean normal random vectors we can apply
the results of Smith (1993).
³ ´
Theorem 5 Let X ∗ C1 , Ω̃ be the limiting distribution of β ∗2SLS − 1 in Definition 1 under
³ ´−1
Condition 10. Let D̄ ≡ (D + D0 ) /2, where D = C10 Ω̃C1 C10 M10 C1 . Then
¡ ¢ µ
h ³ í X∞
(1)k 12 1+k 1,k ³ ´−1 ¶
−1 −1 0
E X C1 , Ω̃ = λ̄ ¡ r1 ¢ C1+k D̄, Ir1 − λ̄ C1 Ω̃C1
k=0 2 1+k k!
10
An alternative way to parametrize theµstationarity
¶ condition is to set yit = (1 − β n ) αi + β n yit−1 + εit ,
¡ 2
¢ ¡ 2
¢ σ2
y
εit ∼ N 0, σ ε , αi ∼ N 0, σα and yi0 ∼ N αi , 1−β2 with β n = exp (−c/n) . It can be shown that an estimator
n
soley based on the moment condition E [ui ∆uik (β 0 )] = 0 is consistent. Restricting attention to estimators that
are based on all moment conditions except the condition E [ui ∆uik (β 0 )] = 0, one can show that
d (µ + ξx )0 C1 (C10 Σ22 C1 )−1 C10 ξ y

β 2SLS − 1 → =X
(µ + ξ x )0 C1 (C10 Σ22 C1 )−1 C10 (µ + ξx )
where µ = 1T −1 σ 2y /2. Estimators in the class β ∗2SLS defined in Definition 1 can be analyzed by the same methods
³ as
´
under the stationarity assumptions discussed in the paper. Their limiting distributions are denoted by X ∗ C1 , Ω̃ .
Moments of X ∗ are intractably complicated unless Ω̃ = IT −1 . Then the mean of X ∗ (C1 , IT −1 ) is given by
µ
0 tr D̄ r r µ̄0 µ̄ µ̄0 D̄µ̄ r r µ̄0 µ̄
E [X ∗ ] = B(D̄, µ̄) = e−µ̄ µ̄/2 1 F1 ( , + 1, )+ 1 F1 ( + 1, + 2, )
r 2 2 2 r+2 2 2 2
¶
µ̄0 D̄µ̄ r−1 r−1 µ̄0 µ̄ µ̄0 D̄µ̄ 0 r−1 r−1 µ̄0 µ̄
− 1 F1 ( , + 1, )− µ̄ µ̄ 1 F1 ( , + 1, )
r+1 2 2 2 r+1 2 2 2
with D̄ = (D + D 0 ) /2, D = C10 M1 C1 and µ̄ = C10 µ. While finding an exact analytical minimum of the above bias
expression as a function of the weight matrix is probably infeasible due to the complexity of the formula, one
sees relatively easily that for large T the minimum must approximately be achieved for C = 1T −1 , thus leading
approximately to the long difference estimator.
16
1,k
where r1 = rank (C1 ), (a)b is the Pochammer symbol Γ (a + b) /Γ (b), Cp+k (., .) is a top order
³ ´−1
invariant polynomial defined by Davis (1980) and λ̄ is the largest eigenvalue of C10 Ω̃C1 .
h ³ í
The mean E X C1 , Ω̃ exists for r1 ≥ 1.
Proof. See Appendix D.

The Theorem shows that the bias of β ∗2SLS both depends on the choice of C1 and the weight
matrix Ω̃. Note fore example that E [X(C1 , IT −1 )] = tr D̄/r1 .
The problem of minimizing the bias by choosing optimal matrices C1 and Ω̃ does not seem
to lead to an analytical solution but could in principle be carried out numerically for a given
number of time periods T . For our purpose we are not interested in such an exact minimum. We
show however that for two particular choices of Ω̃ where Ω̃ = IT −1 or Ω̃ = Σ22 and subsequent
minimization over C1 an analytical solution for the bias minimal estimator can be found. It
turns out that the optimum is the same for both weight matrices. The theorem also shows
that the optimum can be reasonably well approximated by a procedure that is very easy to
implement.
³ ´
Theorem 6 Let X C1 , Ω̃ be as defined in Definition 1. Let D̄ = (D + D0 ) /2 where D =
³ ´−1
C10 Ω̃C1 C10 M10 C1 . Then
min |E [X(C1 , IT −1 )]| = min |E [X(C1 , Σ22 )]| .

C1 s.t. C10 C1 =Ir1 C1 s.t. C10 C1 =Ir1
r1 =1,..,T −1 r1 =1,..,T −1
Moreover,
E [X (C1 , IT −1 )] = tr D̄/r1 .
Let C1∗ = argminC1 |E [X(C1 , IT −1 )]| subject to C10 C1 = Ir1 , r1 = 1, .., T − 1. Then C1∗ = ρi where
ρi is the eigenvector corresponding to the smallest eigenvalue li of D̄. Thus, minC1 tr D̄/r1 =
min li /2. As T → ∞ the smallest eigenvalue of D̄, min li → 0. Let 1 = [1, ..., 1]0 be a T −1 vector.
Then for C1 = 1/(10 1)1/2 it follows that tr D̄ → 0 as T → ∞.
Theorem 6 shows that the estimator that minimizes the bias is based only on a single moment
condition which is a linear combination of the moment conditions involving yi0 as instrument
where the weights are the elements of the eigenvector ρi corresponding to the smallest eigenvalue
of (M1 + M10 ) /2. This eigenvalue can be easily computed for any given T. The Theorem also
shows that at least for large T a heuristic method which puts equal weight on all moment
conditions leads to essentially the same bias reduction as the optimal procedure. The heuristic
procedure turns out be equal to the moment condition E [(uiT − ui1 ) yi0 ] which can be motivated
by taking ”long differences” of the model equation yit = αi + β n yit−1 + εit i.e. by considering
yiT − yi1 = β n (yiT −1 − yi0 ) + εiT − εi1 .
It can also be shown that a 2SLS estimator that uses all moment conditions involving yi0 remains
biased even as T → ∞.
17
7 Long Difference Specification: Infinite Iteration
We found that the iteration of the long difference estimator works quite well. In the (` + 1)-th
iteration, our iterated estimator estimates the model
yiT − yi1 = β(yiT −1 − yi0 ) + εiT − εi1

³ ´ ³ ´
based on 2SLS using instruments zi β b(`) ≡ yi0 , yi2 − β b yi1 , . . . , yiT −1 − β
(`)
b yiT −2 , where
(`)
b(`) is the estimator obtained in the previous iteration. We might want to examine properties
β
of an estimator based on an infinite iteration, and see if it improves the bias property. If we
continue the iteration and it converges11 , the estimator is a fixed point to the minimization
problem
ÃN !0 Ã N !−1 Ã N !
X X X
min ξ i (b) zi (b) zi (b)0 ξ i (b)
b
i=1 i=1 i=1
where ξ i (b) ≡ zi (b) ((yiT − yi1 ) − b (yiT −1 − yi0 )). Call the minimizer the infinitely iterated
2SLS and denote it β b b
I2SLS . Another estimator which resembles β I2SLS is CUE, which solves
ÃN !0 Ã N !−1 Ã N !
X X X
b
β ≡ argmin L (b) = argmin ξ (b) ξ (b) ξ (b)0
ξ (b) .
CUE i i i i
b b i=1 i=1 i=1
Their actual performance approximated by 5000 Monte Carlo runs along with the biases pre-
dicted by second order theory in Theorem 4 are summarized in Tables 8 and 9. We find that
the long difference based estimators have quite reasonable finite sample properties even when β
is close to 1. Similar to the finite iteration in the previous section, the second order theory seem
to be next to irrelevant for β close to 1.
We compared performances of our estimators with Ahn and Schmidt’s (1995) estimator as
well as Blundell and Bond’s (1998) estimator. Both estimators are defined in two-step GMM
procedures. In order to make a accurate comparison with our long difference strategy, for which
there is no ambiguity of weight matrix, we decided to apply the continuous updating estimator to
their moment restrictions. We had difficulty of finding global minimum for Ahn and Schmidt’s
(1995) moment restrictions. We therefore used a Rothenberg type two step iteration, which
would have the same second order property as the CUE itself. (See Appendix I.) Again, in order
to make a accurate comparison, we applied the two step iteration idea to our long difference and
Blundell and Bond (1998) as well. We call these estimators β b
bCU E2,AS , β b
CU E2,LD , and β CU E2,BB .
We set n = 100 and T = 5. Again the number of monte carlo runs was set equal to 5000. Our
results are reported in Tables 10 and 11. We can see that the long difference estimator has a
comparable property to Ahn and Schmidt’s estimator. We do not know why β bCU E2,LD has such
a large median bias at β = .95 whereas β b
CU E,LD does not have such problem.
11
There is no a priori reason to believe that the iterations converge to the fixed point. To show that, one would
have to prove that the iterations are a contraction mapping.
18
8 Conclusion
We have investigated the bias of the dynamic panel effects estimators using second order approx-
imations and Monte Carlo simulations. The second order approximations confirm the presence
of significant bias as the parameter becomes large, as has previously been found in Monte Carlo
investigations. Use of the second order asymptotics to define a second order unbiased estimator
using the Nagar approach improve matters, but unfortunately does not solve the problem. Thus,
we propose and investigate a new estimator, the long difference estimator of Griliches and Haus-
man (1986). We find in Monte Carlo experiments that this estimator works quite well, removing
most of the bias even for quite high values of the parameter. Indeed, the long differences esti-
mator does considerably better than “standard” second order asymptotics would predict. Thus,
we consider alternative asymptotics with a near unit circle approximation. These asymptotics
indicate that the previously proposed estimators for the dynamic fixed effects problem suffer
from larger biases. The calculations also demonstrate that the long difference estimator should
work in eliminating the finite sample bias previously found. Thus, the alternative asymptotics
explain our Monte Carlo finding of the excellent performance of the long differences estimator.
19
Technical Appendix
A Technical Details for Section 3

Lemma 3
" ¶#
Xµ Kt ∗
E x∗0 ∗
t Pt εt − xt 0 Mt ε∗t = 0.
t
n − Kt
Proof. We have
· ¸
Kt ∗ 0 Kt
E x∗0t P t ε∗
t − xt Mt ε∗ ∗ ∗0
t = E [trace (Pt Et [εt xt ])] − E [trace (Mt Et [ε∗t x∗0
t ])] ,
n n − Kt
where Et [·] denotes the conditional expectation given Zt . Because Et [ε∗t ] = 0, Et [ε∗t x∗0
t ] is the conditional
∗ ∗0
covariance between εt and yt−1 , which does not depend on Zt due to joint normality. Moreover, by cross-
sectional independence, we have
£ ∗ ∗ ¤
Et [ε∗t x∗0
t ] = Et εi,t xi,t In .
Hence, using the fact that trace (Pt ) = Kt and trace (Mt ) = n − Kt , we have
· ¸ µ ¶
∗0 ∗ Kt ∗ 0 ∗
£ ∗ ∗ ¤ Kt
E xt Pt εt − x Mt εt = Et εi,t xi,t · Kt − (n − Kt ) = 0,
n t n − Kt
Lemma 4
£ ∗2 ¤ ∗ ∗ 2
Var (x∗0 ∗ 2
t Mt εt ) = (n − t) σ E vit + (n − t) (E [vit εit ]) ,
Cov (x∗0 ∗ ∗0 ∗ ∗ ∗ ∗ ∗
t Mt εt , xs Ms εs ) = (n − s) E [vis εit ] E [vit εis ] , s<t
∗
where vit ≡ x∗it − E [ x∗it | zit ].
Proof. Follows by modifying the developments from (A23) to (A30) and from (A31) to A(34) in
Alvarez and Arellano (1998).
Lemma 5 Suppose that s < t. We have

Ã !2
£ ∗2 ¤ T −t 1 β − β T −t+1 σ2α
E vit = −
T − t + 1 1 − β (T − t) (1 − β)2 1+ σ 2α 2
+ σ2α
(t − 2)
σ2 1−β σ2
T −t 1
− σ2
T − t + 1 (T − t)2 (1 − β)2
Ã !
β 2 − 2β T −t+2 + β 2(T −t)+2 − 2β T −t+1 + 2β
× (T − t) + ,
β2 − 1
³ ´
r
T −t 1 − β T −t
∗ ∗
E [vit εit ] = −σ2
T − t + 1 (T − t) (1 − β)
r Ã !
2 T −t 1 β − β T −t
+σ (T − t) − ,
T − t + 1 (T − t)2 (1 − β) 1−β
20
³ ´
r 1 − β T −t
∗ ∗ T −s
E [vis εit ] = −σ 2
T − s + 1 (T − s) (1 − β)
r Ã !
T −s 1 1 − β T −t
+ σ2 (T − t) − ,
T − s + 1 (T − s) (T − t) (1 − β) 1−β
r Ã !
∗ ∗ 2 T −t 1 β − β T −t+1
E [vit εis ] =σ T −t− .
T − t + 1 (T − s) (T − t) (1 − β) 1−β
∗
Proof. We first characterize vit . We have
xi,t = yi,t−1
xi,t+1 = yi,t = αi + βyi,t−1 + εi,t
..
.
1 − β T −t ³ ´
xi,T = yi,T −1 = αi + β T −t yi,t−1 + εi,T −1 + βεi,T −2 + · · · + β T −t−1 εi,t
1−β
and hence
r
T −t+1 ∗ 1
xit = xit − (xit+1 + · · · + xiT )
T −t T −t
1
= yi,t−1 − (xit+1 + · · · + xiT )
T −t
Ã ! Ã !
β − β T −t+1 1 β − β T −t+1
= 1− yi,t−1 − − αi
(T − t) (1 − β) 1 − β (T − t) (1 − β)2
¡ ¢ ³ ´
(1 − β) εi,T −1 + 1 − β 2 εi,T −2 + · · · + 1 − β T −t εi,t
− .
(T − t) (1 − β)
It follows that
r Ã ! Ã !
T −t+1 ∗ β − β T −t+1 1 β − β T −t+1
E [ xit | zit ] = 1 − yi,t−1 − − E [ αi | zit ] ,
T −t (T − t) (1 − β) 1 − β (T − t) (1 − β)2
from which we obtain
r Ã !
∗ T −t 1 β − β T −t+1
vit =− − (αi − E [ αi | zit ])
T −t+1 1 − β (T − t) (1 − β)2
¡ ¢ ³ ´
r
T −t (1 − β) εi,T −1 + 1 − β 2 εi,T −2 + · · · + 1 − β T −t εi,t
− . (20)
T −t+1 (T − t) (1 − β)
h i
We now compute E (αi − E [ αi | zit ])2 = Var [ αi | zit ]. It can be shown that
¡ ¢ σ 2α ¡ ¢ σ2α
Cov αi , (yi0 , . . . , yit−1 )0 = `, and Var (yi0 , . . . , yit−1 )0 = ``0 + Q
1−β (1 − β)2
where ` is a t-dimensioanl column vector of ones, and
 
1 β β t−1
 
σ 2  β 1 β t−2 
Q=  
2  .. 
1−β  . 
β t−1 1
21
Therefore, the conditional variance is given by
" #−1
(1 − β)2
σ2α − σ2α `0 0
`` + Q `
σ2α
Because
" #−1 Ã !−1
0(1 − β)2 (1 − β)2
`` + Q = Q
σ 2α σ 2α
Ã !−1 Ã !−1
1 (1 − β)2 0 (1 − β)2
− ³ ´−1 Q `` Q
(1−β)2 σ2α σ2α
1 + `0 σ2α Q `
Ã !2
σ 2α −1 1 σ 2α
= 2Q − σ 2α 2 Q−1 ``0 Q−1 ,
(1 − β) 1+ (1−β)2
`0 Q−1 ` (1 − β)
we obtain
" #−1 ³ 2 ´2
σα 0 −1 σ 2α
(1 − β)2
σ 2
(1−β) 2` Q ` (1−β)2
`0 Q−1 `
α
`0 ``0 + Q ` = `0 −1
Q ` − =
(1 − β)2
2 2
σ 2α σα
1 + (1−β) 0 −1 `
2` Q
σα
1 + (1−β) 0 −1 `
2` Q
and hence,
" #−1
(1 − β)2 σ2α
σ 2α − σ2α `0 `` + 0
Q `= σ 2α
σ 2α 1+ `0 Q−1 `
(1−β)2
Now, it can be shown that12

1 ³ 2
´
`0 Q−1 ` = 2 (1 − β) + (t − 2) (1 − β)
σ2
h i σ2α
E (αi − E [ αi | zit ])2 = σ2α σ 2α
. (21)
2
1+ σ2 1−β + σ2 (t − 2)
£ ∗2 ¤
We now characterize E vit . Using (20), and the independence of the first and second term there,
we can see that
Ã !2
£ ∗2 ¤ T −t 1 β − β T −t+1 h i
2
E vit = − E (α i − E [ α | z
i it ])
T − t + 1 1 − β (T − t) (1 − β)2
Ã !
2 T −t+2 2(T −t)+2 T −t+1
T − t 1 β − 2β + β − 2β + 2β
− σ2 (T − t) + .
T − t + 1 (T − t)2 (1 − β)2 β2 − 1
With (21), we obtain the first conclusion.

∗ ∗
As for E [vit εit ], we note that
1
ε∗it = εit − (εiT + · · · + εit+1 ) .
T −t
12
See Amemiya (1985, p. 164), for example.
22
Combining with (20), we obtain
³ ´ ³ ´
r 1 − β T −t r (1 − β) + · · · + 1 − β T −t−1
∗ ∗ T − t T − t
E [vit εit ] = − σ2 + σ2,
T − t + 1 (T − t) (1 − β) T −t+1 (T − t)2 (1 − β)
from which follows the second conclusion.
∗ ∗ ∗ ∗
As for E [vis εit ] and E [vit εis ] s < t, we note that
³ ´
r T −t
T −s 1 − β
∗ ∗
E [vis εit ] = − σ2
T − s + 1 (T − s) (1 − β)
¡ ¢ ³ ´
r 2 T −t−1
T − s (1 − β) + 1 − β + · · · + 1 − β
+ σ2
T −s+1 (T − s) (T − t) (1 − β)
and
¡ ¢ ³ ´
r
T −t (1 − β) + 1 − β 2 + · · · + 1 − β T −t
∗ ∗
E [vit εis ] = σ2.
T −t+1 (T − s) (T − t) (1 − β)
Lemma 6
1 X t2 £ ∗2 ¤
E vit = o (1)
nT t n − t
Proof. Write
Ã !2
£ ∗2 ¤ T −t 1 β − β T −t+1 σ2α
E vit = − 2 σ 2 2
T −t+1 1 − β (T − t) (1 − β) 2
1 + σα2 1−β + σσα2 (t − 2)
µ ¶
2 T −t 1 β 2 + 2β
−σ (T − t) + 2
T − t + 1 (T − t)2 (1 − β)2 β −1
Ã !
T −t+2 2(T −t)+2 T −t+1
T − t 1 −2β + β − 2β
− σ2
T − t + 1 (T − t)2 (1 − β)2 β2 − 1
Sum of the first two terms on the right can be bounded above by
σ 2α
C σ2α σ 2α
,
2
1+ σ2 1−β + σ2 (t − 2)
and the third term can be bounded above in absolute value by

1
C
(T − t)2
where C is a generic constant. Therefore, we have
¯ ¯
¯ 1 X t2 £ ∗2 ¤¯¯ C X t2 σ2αC X t2 1
¯
¯ E vit ¯ ≤ +
¯ nT
t
n−t ¯ nT
t
n − t 1 + σ2α2 2
σ 1−β + (t − 2) nT t
σ 2α
n
σ2
− t (T − t)2
C X T2 σ2α C X T2 1
≤ +
nT t n − T 1 + σα2 2 + σα2 (t − 2) nT t n − T (T − t)2
2 2
σ 1−β σ
23
It can be shown that
X σ2α X 1
σ 2α σ2α
= O (log T ) , 2 = O (1)
2 (T − t)
t 1+ σ2 1−β + σ2 (t − 2) t
Using the assumption that T /n = O (1), we obtain the desired conclusion.
Lemma 7
1 X t2 ∗ ∗ 2
(E [vit εit ]) = o (1)
nT t n − t
∗ ∗ C 2
Proof. We can bound (E [vit εit ]) by (T −t)2 , where C is a generic constant. Conclusion easily follows
by adopting the same proof as in Lemma 6.
Lemma 8
1 X st ∗ ∗ ∗ ∗
E [vis εit ] E [vit εis ] = o (1)
nT s<t n − t
∗ ∗ ∗ ∗ C
Proof. We can bound |E [vis εit ] E [vit εis ]| by (T −s)2
. Therefore, we have
¯ ¯ Ã t−1 !
¯ 1 X st ¯ T −1 t−1
C X X st 1
T −1
C X t X s
¯ ∗ ∗ ∗ ∗ ¯
¯ E [vis εit ] E [vit εis ]¯ ≤ ≤
¯ nT
s<t
n−t ¯ nT
t=1 s=1
n − t (T − s)2 nT t=1 n − t s=1 (T − s)2
But because
s T 1 T
2 = 2 − ≤
(T − s) (T − s) T −s (T − s)2
¯ P ¯
¯ 1 st ∗ ∗ ∗ ∗ ¯
we can bound ¯ nT s<t n−t E [v ε
is it ] E [v it is ¯ further by
ε ]
T −1 t−1
C X t X 1
n t=1 n − t s=1 (T − s)2
Because
t−1
ÃZ ! µ ¶ µ ¶
X 1 T
1 T −t 1
=O ds = O = O
s=1 (T − s)2 t s2 Tt t
we have
T −1 µ ¶ T −1 T −1
C X t 1 1 C X 1 C X t
− = − .
n t=1 n − t t T n t=1 n − t nT t=1 n − t
Conclusion follows from

¯ ¯ Ã T −1 ! µ ¶
¯ 1 X st ¯ C X 1 log n − log T
¯ ∗ ∗ ∗ ∗ ¯
¯ E [vis εit ] E [vit εis ]¯ = O =O = o (1) .
¯ nT s<t n − t ¯ n t=1 n − t n
24
Lemma 9
Ã !
1 X t ∗
Var √ xt 0 Mt ε∗t = o (1)
nT t n − t
Proof. Note that

Ã ! µ ¶2
1 X t ∗ 1 X t ³ ∗ ´
0 ∗
Var √ xt Mt εt = Var xt 0 Mt ε∗t
nT t n − t nT t n−t
X µ ¶µ ¶
2 t s
+ Cov (x∗0 ∗ ∗0 ∗
t Mt εt , xs Ms εs )
nT s<t n − t n−s
σ 2 X t2 £ ∗2 ¤
= E vit
nT t n − t
1 X t2 ∗ ∗ 2 2 X st ∗ ∗ ∗ ∗
+ (E [vit εit ]) + E [vis εit ] E [vit εis ]
nT t n − t nT s<t n − t
Here, the second equality is based on Lemma 4. Lemmas 6, 7, and 8 establish that variances of the three
terms on the far right are all of order o (1).
Lemma 10
µ ¶ µ ¶
1 X ∗0 ∗ Kt ∗
0 ∗ σ4
√ xt Pt εt − x Mt εt → N 0,
nT t n − Kt t 1 − β2
Proof. Follows easily by combining Lemma 9 and the proof of Theorem 2 in Alvarez and Arellano
(1998).
Lemma 11
1 X Kt ∗
x 0 Mt x∗t = op (1)
nT t n − Kt t
∗ ∗
Proof. First, note that xt 0 Mt x∗t = vt 0 Mt vt∗ by normality. We therefore have
Ã !
1 X Kt ∗ 1 X t h ∗ i
0 ∗
E xt Mt xt = E vt 0 Mt vt∗
nT t n − Kt nT t n − t
By conditioning, it can be shown that

h ∗ i £ ∗2 ¤
E vt 0 Mt vt∗ = (n − t) E vit
Therefore,
Ã !
1 X Kt ∗ 1 X £ ∗2 ¤
E xt 0 Mt x∗t = tE vit
nT t n − Kt nT t
Modifying the proof of Lemma 6, we can establish that the right is o (1).
We now show that
Ã !
1 X Kt ∗
0 ∗
Var x Mt xt = o (1) .
nT t n − Kt t
25
We have
Ã ! µ ¶2
1 X Kt ∗ 1 X t ³ ∗ ´
Var x 0 Mt x∗t = Var vt 0 Mt vt∗
nT t n − Kt t 2 2
n T t n−t
2 X t s ³ ∗ ∗
´
+ 2 2 Cov vs 0 Ms vs∗ , vt 0 Mt vt∗
n T s<t n − t n − s
Modifying the development from (A53) to (A58) in Alvarez and Arellano (1998) and using normality, we
can show that
³ ∗ ´ £ ∗4 ¤ ¡ £ ∗2 ¤¢2
Var vt 0 Mt vt∗ = 2 (n − t) E vit = 6 (n − t) E vit ,
³ ∗ ∗
´
∗ ∗ 2
Cov vs 0 Ms vs∗ , vt 0 Mt vt∗ = 2 (n − t) (E [vit vis ]) .
Using (20), we can show that

r r Ã !
∗ ∗ T −t T −s 1 β − β T −t+1
E [vit vis ] = −
T −t+1 T −s+1 1 − β (T − t) (1 − β)2
Ã !
1 β − β T −s+1 σ2α
× −
1 − β (T − s) (1 − β)2 1 + σ2α2 2 + σ2α2 (t − 2)
σ 1−β σ
r r
T −t T −s 1
+ σ2
T − t + 1 T − s + 1 (T − t) (T − s) (1 − β)2
Ã !
β 2 − 2β T −t+2 + β 2(T −t)+2 − 2β T −t+1 + 2β
× (T − t) + .
β2 − 1
Adopting the same argument as in the proofs for Lemmas 6 - 8, we can show that the variance is o (1).
B Technical Details for Section 4

Definition 2
Ψ ≡ 3λ01 Λ−1 λ2 − 3λ01 Λ−1 Λ1 Λ−1 λ1 ,

1
Υ ≡ 2λ1 Λ−1 λ1 , √ Φ ≡ 2λ01 Λ−1 g,
n
1
√ Ξ ≡ 4 (g1 − λ1 )0 Λ−1 λ1 − 2λ01 Λ−1 (G − Λ) Λ−1 λ1 − 4λ01 Λ−1 Λ1 Λ−1 g + 2λ02 Λ−1 g,
n
1
Γ ≡ 2 (g1 − λ1 )0 Λ−1 g − 2λ01 Λ−1 (G − Λ) Λ−1 g − g0 Λ−1 Λ1 Λ−1 g.
n
Proof of Lemma 1. By Lemma 2 (a) and Theorem 1(a) of Andrews (1992) and Conditions
4-7 it follows that supc∈C |Qn (c) − Q (c)| = op (1), where Q (c) ≡ λ (c)0 Λ (c)−1 λ (c). Let B (β, ²) be an
open interval of length ² centered at β. By Condition 8 it follows that inf c∈B(β,²)
/ Q (c) > Q (β) = 0
for all ² > 0. It then follows from standard arguments that b − β = op (1). It therefore follows that
Pr (Sn (b) 6= 0) ≤ Pr (b ∈ ∂C) = 1 − Pr (b ∈ int C) ≤ 1 − Pr (b ∈ B(β, ²)) → 0 for any ² > 0 where ∂C
denotes the boundary of C.
26
Using similar arguments as in Pakes and Pollard (1989) we write
¯ ¯
¯ ¯
|Q (b)| = ¯λ (b)0 Λ (b)−1 λ (b)¯
¯ ¯
¯ ¯
≤ ¯g (b)0 G (b)−1 g (b) − λ (b)0 Λ (b)−1 λ (b) − g (β)0 G (b)−1 g (β)¯
¯ ¯ ¯ ¯
¯ ¯ ¯ ¯
+ ¯g (b)0 G (b)−1 g (b)¯ + ¯g (β)0 G (b)−1 g (β)¯
¯ ¯
¯ ¯
≤ ¯g (b)0 G (b)−1 g (b) − λ (b)0 G (b)−1 λ (b) − g (β)0 G (b)−1 g (β)¯
¯ ¯
¯ ¯
+ ¯λ (b)0 G (b)−1 λ (b) − λ (b)0 Λ (b)−1 λ (b)¯
¯ ¯ ¯ ¯
¯ 0 −1 ¯ ¯ 0 −1 ¯
+ ¯g (b) G (b) g (b)¯ + ¯g (β) G (b) g (β)¯ ,
¯ ¯ ¯ ¯ ¡ ¢
¯ ¯ ¯ ¯
where ¯g (b)0 G (b)−1 g (b)¯ ≤ ¯g (β)0 G (β)−1 g (β)¯ = Op n−1 by the definition of b and Condition 9. We
−1
¯have G (b) = Op ¯(1) by ¡consistency
¢
of b and the uniform law of large numbers, from which we obtain
¯ 0 −1 ¯
¯g (β) G (b) g (β)¯ = Op n−1 . We also have
g (b)0 G (b)−1 g (b) − λ (b)0 G (b)−1 λ (b) − g (β)0 G (b)−1 g (β)

= g (b)0 G (b)−1 (g (b) − g (β) − λ (b)) + (g (b) − g (β) − λ (b))0 G (b)−1 λ (b)
0 −1 0 −1
+2g (β) G (b) λ (b) + (g (b) − g (β) − λ (b)) G (b) g (β) .
Therefore, we obtain
¯ ¯
¯ ¯
|Q (b)| ≤ ¯λ (b)0 G (b)−1 λ (b) − λ (b)0 Λ (b)−1 λ (b)¯
¯ ¯
¯ ¯
+ ¯g (b)0 G (b)−1 (g (b) − g (β) − λ (b))¯
¯ ¯
¯ ¯
+ ¯(g (b) − g (β) − λ (b))0 G (b)−1 λ (b)¯
¯ ¯ ¯ ¯
¯ ¯ ¯ ¯
+2 ¯g (β)0 G (b)−1 λ (b)¯ + ¯(g (b) − g (β) − λ (b))0 G (b)−1 g (β)¯
¡ ¢
+Op n−1 .
The terms G (β)−1 , and Λ (b)−1 are Op (1) by consistency of b and the uniform law of large numbers.
Also, the terms g (b) and λ (b) are op (1) by consistency of b and the uniform law of large numbers. From
¡ ¢
Theorems 1 and 2 in Andrews (1994) and Conditions 4-9 it follows that g (b) − λ (b) − g (β) = op n−1/2 .
¡ ¢
From a standard CLT and consistency of b it follows that (g (b) − g (β) − λ (b))0 G (b)−1 g (β) = op n−1 ,
¡ ¢
and g (β)0 G (b)−1 = Op n−1/2 . These results show that
° ° ³ ´ ¡ ¢
° °
|Q (b)| ≤ °G (b)−1 − Λ (b)−1 ° kλ (b)k2 + Op n−1/2 kλ (b)k + op n−1
³ ´ ¡ ¢
2
= op (1) kλ (b)k + Op n−1/2 kλ (b)k + op n−1 .
¯ ¯
¯ ¯
Because |Q (b)| = ¯λ (b)0 Λ (b)−1 λ (b)¯ ≥ 1
M
2
kλ (b)k , we conclude that
µ ¶ ³ ´
1 ¡ ¢
− op (1) kλ (b)k2 − Op n−1/2 kλ (b)k ≤ op n−1
M
or
³ ´
kλ (b)k = Op n−1/2 ,
27
¡ ¢
which implies that b − β = Op n−1/2 .
Proof of Theorem 3. Note that we have
µ ¶
1 √ 1 ¡√ ¢2 1
g1 (b) = g1 + √ g2 · n (b − β) + g3 · n (b − β) + op ,
n 2n n
µ ¶
1 √ 1 ¡√ ¢2 1
g (b) = g + √ g1 · n (b − β) + g2 · n (b − β) + op ,
n 2n n
and
−1 1 √
G (b) = G−1 − √ G−1 G1 G−1 · n (b − β)
n
µ ¶
1 1 ¡ −1 ¢ ¡√ ¢2 1
+ 2G G1 G−1 G1 G−1 − G−1 G2 G−1 n (b − β) + op ,
2n n
µ ¶
1 √ 11 ¡√ ¢2 1
G1 (b) = G1 + √ G2 · n (b − β) + G3 · n (b − β) + op
n 2n n
√
where g, gj , G, Gj and n(b − β) are Op (1) by Conditions 6 and 7 and Lemma 1. Therefore, we have
µ ¶
0 −1 0 −1 1 √ 1 ¡√ ¢2 1
g1 (b) G (b) g (b) = g1 G g + √ h1 · n (b − β) + h2 · n (b − β) + op ,
n n n
and
0 −1 −1 1 √
g (b) G (b) G1 (b) G (b) g (b) = g0 G−1 G1 G−1 g + √ h3 · n (b − β)
n
µ ¶
1 ¡√ ¢2 1
+ h4 · n (b − β) + op ,
n n
where
h1 = g20 G−1 g − g10 G−1 G1 G−1 g + g10 G−1 g1 ,

1 1 ¡ ¢ 1
h2 = g30 G−1 g + g10 2G−1 G1 G−1 G1 G−1 − G−1 G2 G−1 g + g10 G−1 g2
2 2 2
0 −1 −1 0 −1 −1 0 −1
− g2 G G1 G g − g1 G G1 G g1 + g2 G g1
1 1 3
= g30 G−1 g + g10 G−1 G1 G−1 G1 G−1 g − g10 G−1 G2 G−1 g + g10 G−1 g2
2 2 2
− g20 G−1 G1 G−1 g − g10 G−1 G1 G−1 g1 ,
and
h3 = g10 G−1 G1 G−1 g − g 0 G−1 G1 G−1 G1 G−1 g + g 0 G−1 G2 G−1 g

− g0 G−1 G1 G−1 G1 G−1 g + g0 G−1 G1 G−1 g1
= 2g10 G−1 G1 G−1 g − 2g 0 G−1 G1 G−1 G1 G−1 g + g 0 G−1 G2 G−1 g,
28
1 1 ¡ ¢
h4 = g20 G−1 G1 G−1 g + g 0 2G−1 G1 G−1 G1 G−1 − G−1 G2 G−1 G1 G−1 g
2 2
1 0 −1 1 0 −1 ¡ −1 ¢
+ g G G3 G g + g G G1 2G G1 G−1 G1 G−1 − G−1 G2 G−1 g
−1
2 2
1
+ g0 G−1 G1 G−1 g2
2
− g10 G−1 G1 G−1 G1 G−1 g + g10 G−1 G2 G−1 g − g10 G−1 G1 G−1 G1 G−1 g
+ g10 G−1 G1 G−1 g1 − g0 G−1 G1 G−1 G2 G−1 g
+ g 0 G−1 G1 G−1 G1 G−1 G1 G−1 g − g 0 G−1 G1 G−1 G1 G−1 g1 − g0 G−1 G2 G−1 G1 G−1 g
+ g 0 G−1 G2 G−1 g1 − g0 G−1 G1 G−1 G1 G−1 g1
We may therefore rewrite the score Sn (b) as

¡ ¢ 1 √
Sn (b) = 2g10 G−1 g − g 0 G−1 G1 G−1 g + √ (2h1 − h3 ) n (b − β)
n
µ ¶
1 ¡√ ¢2 1
+ (2h2 − h4 ) n (b − β) + op . (22)
n n
Next note that P (|Sn (b)| > ²) = P (|Sn (b)| > ²/n−1 ) for any ² > 0 because of Lemma 1. Thus
Sn (b) = op (n−1 ) and we can subsume this error into the op (n−1 ) term of 22. Using these arguments and
Lemmas 12, 13, and 14 below, we may rewrite the first order condition (22) as
µ ¶ µ ¶
1 1 1 1 √ 1 ¡√ ¢2 1
0 = √ Φ+ Γ+ √ Υ+ √ Ξ n (b − β) + Ψ n (b − β) + op
n n n n n n
or
µ ¶ µ ¶
1 1 √ 1 ¡√ ¢2 1
0 =Φ+ √ Γ+ Υ+ √ Ξ n (b − β) + √ Ψ n (b − β) + op √ ,
n n n n
based on which we obtain (9). Noting that

√
E [Φ] = 2 nλ01 Λ−1 E [g] = 0, (23)
£ 0 ¤ £ ¤ £ ¤
E [Γ] = 2nE (g1 − λ1 ) Λ−1 g − 2nλ01 Λ−1 E (G − Λ) Λ−1 g − nE g 0 Λ−1 Λ1 Λ−1 g
µ · ¸¶
∂δ 0i £ ¤ ¡ £ ¤¢
−1
= 2 trace Λ E δ i − 2λ01 Λ−1 E ψi ψ0i Λ−1 δ i − trace Λ−1 Λ1 Λ−1 E δ i δ 0i , (24)
∂β
£ 0¤ £ ¤
E [ΦΞ] = 8nλ01 Λ−1 E g (g1 − λ1 ) Λ−1 λ1 − 4nE λ01 Λ−1 gλ01 Λ−1 (G − Λ) Λ−1 λ1
−8nλ01 Λ−1 E [gg 0 ] Λ−1 Λ1 Λ−1 λ1 + 4nλ01 Λ−1 E [gg0 ] Λ−1 λ2
· ¸
∂δ 0 £ ¤
= 8λ01 Λ−1 E δ i i Λ−1 λ1 − 4λ01 Λ−1 E δ i λ01 Λ−1 ψi ψ0i Λ−1 λ1
∂β
£ 0 ¤ −1 £ ¤
−8λ1 Λ E δ i δ i Λ Λ1 Λ−1 λ1 + 4λ01 Λ−1 E δ i δ 0i Λ−1 λ2 ,
0 −1
(25)
and
£ ¤ £ ¤
E Φ2 = 4λ01 Λ−1 E δ i δ 0i Λ−1 λ1 , (26)
we obtain the desired conclusion.
29
Lemma 12 Under Conditions 6 and 7
3 0 −1
h2 = λ Λ λ2 − λ01 Λ−1 Λ1 Λ−1 λ1 + op (1) ,
2 1
h4 = λ01 Λ−1 Λ1 Λ−1 λ1 + op (1) .
Proof. Follows from plim g = 0.

µ ¶
1 1
2h1 − h3 = Υ + √ Ξ + op √ .
n n
Proof. Because
µ ¶
1
g20 G−1 g = λ02 Λ−1 g + op √ ,
n
µ ¶
1
g10 G−1 G1 G−1 g = λ01 Λ−1 Λ1 Λ−1 g + op √ ,
n
and
µ µ ¶¶
0 1
g10 G−1 g1 = (λ1 + (g1 − λ1 )) Λ−1 − Λ−1 (G − Λ) Λ−1 + op √ (λ1 + (g1 − λ1 ))
n
µ ¶
1
= λ1 Λ−1 λ1 + 2 (g1 − λ1 )0 Λ−1 λ1 − λ1 Λ−1 (G − Λ) Λ−1 λ1 + op √ , (27)
n
we obtain
h1 = λ02 Λ−1 g − λ01 Λ−1 Λ1 Λ−1 g

µ ¶
−1 0 −1 −1 −1 1
+ λ1 Λ λ1 + 2 (g1 − λ1 ) Λ λ1 − λ1 Λ (G − Λ) Λ λ1 + op √ .
n
Similarly, we obtain
µ ¶
1
h3 = 2λ01 Λ−1 Λ1 Λ−1 g + op √ .
n
The conclusion follows.

µ ¶
1 1 1
2g10 G−1 g −g G 0 −1
G1 G −1
g = √ Φ + Γ + op .
n n n
Proof. We have
µ µ ¶¶
0 1
g10 G−1 g −1 −1
= (λ1 + (g1 − λ1 )) Λ − Λ (G − Λ) Λ + op √ −1
g
n
µ ¶
0 1
= λ01 Λ−1 g + (g1 − λ1 ) Λ−1 g − λ01 Λ−1 (G − Λ) Λ−1 g + op
n
and
µ ¶
0 −1 −1 0 −1 −1 1
gG G1 G g=gΛ Λ1 Λ g + op .
n
30
C Proof of Theorem 4
The second order bias is computed using Theorem 4. Because the “weight matrix” here does not involve
the parameter of interest, we have Λ1 = 0, which renders the third, sixth, and last terms in Theorem
4 equal to zero. Also, because the moment restriction is linear in the parameter of interest, we have
λ2h= 0, which renders the i seventh and eight terms in Theorem 4 equal to zero. Furthermore, because
0 0 −1
E zit zit E [zit zit ] zit ε∗it = 0 under conditional symmetry of ε∗it , the numerator in the second term
£ ¤ PT −1 h i
0 0 −1 0 −1
λ01 Λ−1 E ψi ψ0i Λ−1 δ i = − t=1 E [zit x∗it ] E [zit zit 0
] E zit zit E [zit zit ] zit ε∗it should be equal to zero,
and therefore, the second term should be equal to zero. We obtain the desired conclusion by noting that
T
X −1
−1
λ1 Λ−1 λ1 = E [zit x∗it ]0 E [zit zit
0
] E [zit x∗it ] ,
t=1
µ · ¸¶ T
X −1 ³ ´
∂δ 0 0 −1
trace Λ−1 E δ i i =− trace E [zit zit ] E [ε∗it x∗it zit zit
0
] ,
∂β t=1
· 0 ¸ T
X −1 T
X −1
∂δ 0 −1 0 −1
λ01 Λ−1 E δ i i Λ−1 λ1 = − E [zit x∗it ]0 E [zit zit ] E [ε∗it x∗is zit zis
0
] E [zis zis ] E [zis x∗is ] ,
∂β t=1 s=1
and
£ ¤
λ01 Λ−1 E δ i λ01 Λ−1 ψi ψ0i Λ−1 λ1
T
X −1 T
X −1 h i
−1 0 −1 0 −1
=− E [zit x∗it ]0 E [zit zit
0
] E ε∗it zit E [zis x∗is ]0 E [zis zis 0
] zis zis E [zis zis ] E [zis x∗is ] .
t=1 s=1
D Proofs for Section 6

Proof of Lemma 2. Note that
q r h i
E [|uit ∆yis−1 |] ≤ E [u2it ] E (∆yis−1 )2
v 
u
u Ã s−2
!2 
p u X
= σ2ε + σ 2α tE  β s−2
n (β n − 1) ξ i0 + εis−1 + (β n − 1) β r−1
n εis−1−r

r=1
v
u s−2
p u 2 2 X
t 2(s−2) σ ε (β n − 1) 2 2
= 2 2
σε + σ α β n 2
+ σε + (β n − 1) σε β 2(r−1)
n = O (1)
1 − β 2n r=1
Pn
By independence of uit ∆yis−1 across i, it therefore follows that n−3/2 i=1 uit ∆yis−1 = op (1). By
Pn Pn
the same reasoning, we obtain n−3/2 i=1 uiT ∆yij−1 = op (1), and n−3/2 i=1 ui ∆yik−1 = op (1). We
Pn P n
therefore obtain n−3/2 i=1 fi,1 = op (1). We can similarly obtain n−3/2 i=1 gi,1 = op (1).
P n P n
Next we consider n−3/2 i=1 fi,2 andn−3/2 i=1 gi,2 . Note that
· µ Xs−2 ¶¸
t−2 r−1
E [∆yit yi0 ] = E yi0 β n (β n − 1) ξ i0 + εis−1 + (β n − 1) β n εis−1−r
r=1
βn − 1
= β t−2
n σε
2
= O (1) ,
1 − β 2n
31
and
" µ ¶2 #
h i Xs−2
2 2 t−2 r−1
E (∆yit yi0 ) = E yi0 β n (β n − 1) ξ i0 + εis−1 + (β n − 1) β n εis−1−r
r=1
2
(β n − 1) σ 2ε σ 2α 4
2(t−2) σ ε (β n − 1)
2
= β 2(t−2)
n 2 2 + 3β n ¡ ¢2
1 − β n (1 − β n ) 1 − β 2n
µ Ã !
2 2
Xs−2 2(r−1) ¶ σ2α σ2ε
2
+ σε + (β n − 1) σ ε β +¡ ¢
r=1 n (1 − β n )2 1 − β 2n
σ 2ε σ2α 2
= n + O(n).
c2
¡ P ¢ P
such that Var n−3/2 ni=1 ∆yit yi0 = O(1). For n−3/2 ni=1 gi,2 (β 0 ) we have from the moment conditions
that E [gi,2 (β 0 )] = 0 and
2σ2ε σ2α 2
Var (∆ui (β 0 )yi0 ) = 2σ2ε σ2α (1 − β n )−2 + O(n) = n + O(n).
c2
P £ 0 ¤0
The joint limiting distribution of n−3/2 ni=1 fi,2 0
− Efi,2 , gi,2 (β 0 )0 can now be obtained from a trian-
gular array CLT. By previous arguments
£ 0 ¤ h i
E fi,2 , gi,2 (β 0 )0 = µ0 0 · · · 0
with µ = σ2y /2ι + O(n−1 ) where ι is the T − 1 dimensional vector with elements 1. Then
h¡ £ 0 ¤ ¢0 ¡ 0 £ 0 ¤ ¢i
0
E fi,2 − E fi,2 , gi,2 (β 0 )0 fi,2 − E fi,2 , gi,2 (β 0 )0 = Σn
where
" #
Σ11,n Σ12,n
Σn =
Σ21,n Σ22,n
σ 2ε σ 2α 2
By previous calculations we have found the diagonal elements of Σ11,n and Σ22,n to be c2 n and
2σ 2ε σ 2α 2
c2 n . The off-diagonal elements of Σ11,n are found to be
· µ Xs−2 r−1 ¶
£ ¤
2
E ∆yit ∆yis yi0 = E yi0 2
β s−2
n (β n − 1) ξ i0 + εis−1 + (β n − 1) β εis−1−r
r=1 n
µ Xt−2 ¶¸
× β t−2n (β n − 1) ξ i0 + ε it−1 + (β n − 1) β r−1
n ε it−1−r
r=1
Ã !
2
s−2 (β n − 1) σ 2α σ4ε σ2α
= β t−2
n β n ¡ ¢ + 3 ¡ ¢ + O(1) = n + O(1)
1 − β 2n (1 − β n )2 1 − β 2n 2c
2 2 2 2
which is of lower order of magnitude while n−1 (E [∆yit yi0 ])2 = O(1). Thus n−1 Σ11,n → diag( σεcσα , ..., σεcσα ).
The off-diagonal elements of Σ22,n are obtained from
(
£ 2
¤ −σ 2ε σ2α (1 − β n )−2 + O(n) t = s + 1 or t = s − 1
E ∆uit ∆uis yi0 =
0 otherwise
For Σ12,n , we consider
 σ2ε σ2α 2

 c2 n + O(n) if t = s
£ 2
¤ σ 2ε σ 2α 2
E ∆yit ∆uis yi0 = − c2 n + O(n) if t = s − 1


0 otherwise
32
P −1/2 £ 0 ¤0 d
It then follows that for ` ∈ RT (T +1)/+2T −6 such that `0 ` = 1 n−3/2 ni=1 `0 Σn fi,2 − Efi,2 , gi,2 (β 0 )0 →
N (0, 1) by the Lindeberg-Feller CLT for triangular arrays. It then follows from a straightforward applica-
Pn £ 0 ¤0 d
tion of the Cramer-Wold theorem and the continuous mapping theorem that n−3/2 i=1 fi,2 , gi,2 (β 0 )0 →
£ 0 0 ¤0 £ 0 0 ¤0 P
ξ x , ξ y where ξ x , ξ y ∼ N (0, Σ) . Note that n−3/2 ni=1 `0 E [fi,2 ] = O(n−1/2 ) and thus does not affect
the limit distribution.
£ 0
¤
Finally note that E gi,1 gi,1 = O (1). Also note that
−2 2σ2ε σ2α 2
Var (∆ui (β 0 )yi0 ) = 2σ2ε σ2α (1 − β n ) + O(n) = n + O(n).
c2
£ 0
¤
The off-diagonal elements of E gi,2 gi,2 are obtained from
(
£ 2
¤ −σ 2ε σ2α (1 − β n )−2 + O(n) t = s + 1 or t = s − 1
E ∆uit ∆uis yi0 =
0 otherwise
It therefore follows that

" #
1 0 0
E [gi gi0 ] = + o (1) .
n2 0 Σ22
Proof of Theorem 5. Use the transformation z = L−1 C10 ξ x with C10 Σ11 C1 = LL0 . Note that we
√ ³ ´−1 ³ ´ ±
can take L = δIr1 . Define W = C10 Ω̃C1 . Then X C1 , Ω̃ = z 0 L0 W C10 ξ y z 0 L0 W Lz. Next use
£ ¯ ¤
the fact that E C10 ξ y ¯ C1 ξ x = F C10 ξ x = F Lz with F = C10 Σ21 C1 (C10 Σ11 C1 )−1 = C10 M10 C1 , where the
second equality is based on (C10 Σ11 C1 )−1 = δ −1 Ir1 . Using a conditioning argument it then follows that
h ³ í · 0 0 ¸ · 0 ¸
z L W F Lz z Dz
E X C1 , Ω̃ = E =E 0 .
z 0 L0 W Lz z Wz
Note that z 0 Dz = z 0 D0 z = z 0 D̄z where D̄ = 12 (D + D0 ) is symmetric. Also, W is symmetric positive

definite. The result then follows from Smith (1993, Eq. 2.4, p. 273).
Proof of Theorem 6. We first analyze E [X (C1 , Σ22 )]. Note that in this case W = (C10 Σ22 C1 )−1 =
−1 −1
δ (C10 M2 C1 ) such that
" #
z 0 δ D̄z
E [X (C1 , Σ22 )] = E
z 0 (C10 M2 C1 )−1 z
and δ tr D̄ = δ/2 tr W C10 (M10 + M1 ) C1 = −r1 /2 since M10 + M1 = −M2 . Let Γ1 be an orthogonal
−1
matrix of eigenvectors of (C10 M2 C1 ) with corresponding diagonal matrix of eigenvalues Λ1 such that
−1
(C10 M2 C1 ) = Γ1 Λ1 Γ01 and z1 = Γ01 z. Let λ̄ be the largest element of Λ1 . Then it follows from Smith
(1993, p. 273 and Appendix A)
· 0 ¸ · 0 0 ¸ ¡ ¢
z Dz z1 Γ1 D̄Γ1 z1 −1
X∞
(1)k 12 1+k 1,k ³ 0 −1
´
E 0 = δE 0 = δ λ̄ ¡ r1 ¢ C1+k Γ1 D̄Γ1 , Ir1 − λ̄ Λ1 ,
z Wz z1 Λ1 z1 2 1+k k!
k=0
where for any two real symmetric matrices Y1 , Y2

k
¡1¢
1,k k! X 2 k−i ¡ ¢
C1+k (Y1 , Y2 ) = ¡1¢ tr Y1 Y2i Ck−i (Y2 )
2 2 k+1 i=0
(k − i)!
33
and
k!
Ck (Y2 ) = ¡ 1 ¢ dk (Y2 )
2 k
1 ³ k−j ´
k−1
X
−1
dk (Y2 ) = k tr Y2 dj (Y2 )
j=0
2
d0 (Y2 ) = 1.
−1 −1 −1 −1
Since all elements ³λ̄ λi in λ̄ Λ
´1 satisfy 0 < λ̄ λi ≤ 1 it follows that Ir1 − λ̄ Λ1 is positive semidefinite
−1
implying that Ck Ir1 − λ̄ Λ1 ≥ 0, which holds with equality only if all eigenvalues λi are the same.
Also note that if Y2 , Y3 are diagonal matrices and Y1 symmetric then tr Y3 Y1 Y2i = tr Y1 Y3 Y2i such that
³ í
−1 −1
tr Γ01 D̄Γ1 (Ir1 − λ̄ Λ1 )i = tr Γ01 (W C10 M10 C1 + C10 M1 C1 W ) Γ1 Ir1 − λ̄ Λ1
³ í
−1
= δ −1 tr (Λ1 Γ01 C10 M10 C1 Γ1 + Γ01 C10 M1 C1 Γ1 Λ1 ) Ir1 − λ̄ Λ1
³ í
−1
= δ −1 tr Γ01 C10 (M10 + M1 ) C1 Γ1 Λ1 Ir1 − λ̄ Λ1
³ í
−1
= −δ −1 tr Γ01 C10 M2 C1 Γ1 Λ1 Ir1 − λ̄ Λ1
³ í
−1
= −δ −1 tr Γ1 Λ1 Γ01 C10 M2 C1 Ir1 − λ̄ Λ1
−1
= −δ −1 tr(Ir1 − λ̄ Λ1 )i .
µ ³ í ¶ ³ ´
0 −1 −1
This shows that all the terms tr Γ1 D̄Γ1 Ir1 − λ̄ Λ1 Ck−i Ir1 − λ̄ Λ1 have the same sign and
³ ´
1,k −1
therefore all the terms C1+k Γ01 D̄Γ1 , Ir1 − λ̄ Λ1 have the same sign. Therefore, we have
¯ ¡1¢ ¯
¯ 0 ³ ´¯
¯ −1 X (1) k 2 1+k 1,k
¡ r1 ¢ 0 −1 ¯
|E [X (C1 , Σ22 )]| ≥ ¯δ λ̄ C1+k Γ1 D̄Γ1 , Ir1 − λ̄ Λ1 ¯
¯ 2 1+k k! ¯
k=0
For k = 0, we have
³ ´
−1
C11,0 Γ01 D̄Γ1 , Ir1 − λ̄ Λ1 = tr Γ01 D̄Γ1 = −δ −1 r1 /2
¡ ¢ ¡ ¢ −1
and (1)0 12 1 / r21 1 = 1/r1 . This shows that |E [X (C1 , Σ22 )]| ≥ λ̄ /2 for all C1 such that C10 C1 = Ir1
and all r1 ∈ {1, 2, .., T − 1} . Then
−1
λ̄ min lj
min |E [X (C1 , Σ22 )]| ≥ min ≥ , (28)
0
C1 s.t. C1 C1 =Ir1 0
C1 s.t. C1 C1 =Ir1 2 2
r1 ∈{1,2,..,T −1} r1 ∈{1,2,..,T −1}
where min lj is the smallest eigenvalue of M2 and the last inequality follows from Magnus and Neudecker
(1988, Theorem 10, p. 209). Now let r1 = 1 and C1 = ρi , where ρi is the eigenvector corresponding to
−1 −1 ¯ ¯
min lj . Then Ir1 − λ̄ Λ1 = 1 − 1 = 0 such that |E [X (C1 , Σ22 )]| = λ̄ ¯tr Γ01 D̄Γ1 ¯ = (1/ min lj ) /2 =
−1
min lj /2. Inequality (28) therefore holds with equality.

Next consider E [X (C1 , IT −1 )] = tr C10 M̄1 C1 /r1 with M̄1 = (M10 + M1 ) /2. We analyze
¯ ± ¯
min ¯ tr C10 M̄1 C1 r1 ¯ .
0
C1 s.t. C1 C1 =Ir1
r1 ∈{1,2,..,T −1}
34
¡ ¢
It can be checked easily that M̄1 is negative definite symmetric. We can therefore minimize − tr C10 M̄1 C1 .
It is now useful to choose an orthogonal matrix R with j-th row ρj such that R0 R = RR0 = I and
Pn 0
−M̄1 = RLR0 where L is the diagonal matrix of eigenvalues of −M̄1 = j=1 lj ρj ρj . Then it follows that
¡ 0 ¢ PT −1
− tr C1 M̄1 C1 = j=1 lj ρ0j C1 C10 ρj . Next note that all the eigenvalues of C1 C10 are either zero or one
¡ ¢
such that 0 ≤ ρ0j C1 C10 ρj ≤ 1. The minimum of − tr C10 M̄1 C1 is then found by choosing r1 = 1 and
¡ ¢
C1 such that C10 ρj = 0 except for the eigenvector ρi corresponding to min lj . To show that tr D̄/r1 is
¡ ¢
also minimized for r1 = 1 and C1 = ρi , where tr D̄/r1 = min lj , consider augmenting C1 by a column
P ¡ 0 ¢2
vector x such that x0 x = 1 and ρ0i x = 0. Then C10 C1 = I2 , r2 = 2 and tr C10 M̄1 C1 = li + Tj6=−1
i lj ρj x .
PT −1 ¡ 0 ¢2 ¡ 0 ¢
By Parseval’s equality j6=i ρj x = 1. Since lj ≥ li we can bound tr C1 M̄1 C1 ≥ 2li but then
¡ ¢
tr D̄/2 ≥ li . This argument can be repeated to more than one orthogonal additions x. It now fol-
¡ ¢
lows that E [X (C1 , IT −1 )] = tr D̄/r1 is minimized for r1 = 1 and C = ρi , where ρi is the eigenvector
corresponding to the smallest eigenvalue.
Next note that from x0 x = 1 such that min li ≤ −x0 M̄1 x ≤ max li it follows that
0 −1
min li ≤ −10 M̄1 1/(1 1) = (T − 1)
for 1 = [1, ..., 1]0 which shows that the smallest eigenvalue is bounded by a monotonically decreasing
function of the number of moment conditions.
0
The last part of the result follows from 10 M̄1 1/(1 1) →0.
E Blundell and Bond’s (1998) Estimator and Weight Matrix

Bludell and Bond (1998) suggest a new set of moment restrictions. If T = 5, they can be written as
E [qi (β)] = 0
where
 
yi0 · ((yi2 − yi1 ) − b (yi1 − yi0 ))
 yi0 · ((yi3 − yi2 ) − b (yi2 − yi1 )) 
 
 
 yi1 · ((yi3 − yi2 ) − b (yi2 − yi1 )) 
 
 yi0 · ((yi4 − yi3 ) − b (yi3 − yi2 )) 
 
 
 yi1 · ((yi4 − yi3 ) − b (yi3 − yi2 )) 
 
 yi2 · ((yi4 − yi3 ) − b (yi3 − yi2 )) 
 
 yi0 · ((yi5 − yi4 ) − b (yi4 − yi3 )) 
 
qi (b) ≡  
 yi1 · ((yi5 − yi4 ) − b (yi4 − yi3 )) 
 
 yi2 · ((yi5 − yi4 ) − b (yi4 − yi3 )) 
 
 
 yi3 · ((yi5 − yi4 ) − b (yi4 − yi3 )) 
 
 (yi1 − yi0 ) · (yi2 − byi1 ) 
 
 (yi2 − yi1 ) · (yi3 − byi2 ) 
 
 
 (yi3 − yi2 ) · (yi4 − byi3 ) 
(yi4 − yi3 ) · (yi5 − byi4 )
They suggest a GMM estimation:

Ã n !0 Ã n !
X X
−1
min qi (b) A qi (b)
b
i=1 i=1
35
We examine properties of Blundell and Bond’s moment restriction for β near unity. We consider four
methods of computing A, which in principle is a consistent estimator of E [qi (β) qi (β)]:
1. We can use bbLIML as our consistent estimator and use

1 X ³b ´ ³ ´0
n
A1 = qi bLIML qi bbLIML
n i=1
P 0 Pn b
This gives us a GMM estimator that minimizes ( ni=1 qi (b)) A−1
1 ( i=1 qi (b)). We call it bBB1 .
2. We can compute
1 X ³b ´ ³ ´0
n
A2 = qi bBB1 qi bbBB1
n i=1
Pn 0 Pn b
and obtain a GMM estimator that minimizes ( i=1 qi (b)) A−1
2 ( i=1 qi (b)). We call it bBB2 .
3. We can compute
n
1X 0
A1 = Z Zi
n i=1 i
where
 
yi,0 0 0 ··· 0
 0 yi,0 yi,1 
 
 .. 
 . 
 
 
 yi,0 yi,1 ··· yi,T −2 
 
Zi =  . 
 .. ∆yi1 
 
 
 ∆yi,2 
 
 .. 
 . 
0 ··· 0 ∆yi,T −1
0 PPnn b
and obtain a GMM estimator that minimizes ( i=1 qi (b)) A−1 3 ( i=1 qi (b)). We call it bBB3 .
This is one of the estimators considered by Blundell and Bond (1998) in thier Monte Carlo.
4. We can compute
1 X ³b ´ ³ ´0
n
A4 = qi bBB3 qi bbBB3
n i=1
Pn 0 Pn b
and obtain a GMM estimator that minimizes ( i=1 qi (b)) A−1 4 ( i=1 qi (b)). We call it bBB4 .
Again, this is one of the estimators considered by Blundell and Bond (1998) in thier Monte Carlo.
F Second Order Theory for Finitely Iterated Long Difference

Estimator
We examine second order bias of finitely iterated 2SLS. For this purpose, we consider 2SLS
Ã !Ã n !−1 Ã n !−1 Ã n !Ã n !−1 Ã n !
Xn X X X X X
b= 0
xi zbi 0
zbi zbi zbi xi  0
xi zbi 0
zbi zbi zbi yi (29)
i=1 i=1 i=1 i=1 i=1 i=1
36
applied to the single equation
yi = βxi + εi (30)
√ ³b ´
using instrument zbi = zi − √1n b
θwi , where b
θ = n β − β . Here, zi is the “proper” instrument. We
assume that
n µ ¶
b 1 X 1 1
θ=√ fi + √ Qn + op √ (31)
n i=1 n n
where fi is i.i.d. and has mean zero, and Qn = Op (1). It can be seen that E[Qn
n]
is equal to the second
b
order bias of β under our assumption (31). If εi in (30) is symmetrically distributed given zi , then the
second order bias of b is equal to n1 times
(K − 2) σuε λ0 Λ−1 ϕ λ0 Λ−1 E [fi wi εi ]

0 −1 − 0 −1 E [Qn ] −
λΛ λ λΛ λ λ0 Λ−1 λ
0 −1
E [fi zi xi ] Λ ϕ φ0 Λ−1 E [fi zi εi ]
− −
λ0 Λ−1 λ λ0 Λ−1 λ
λ Λ E [fi zi zi ] Λ ϕ λ0 Λ−1 ∆Λ−1 E [fi zi εi ]
0 −1 0 −1 £ 2 ¤ λ0 Λ−1 ∆Λ−1 ϕ
+ + − E fi
λ0 Λ−1 λ λ0 Λ−1 λ λ0 Λ−1 λ
0 −1
λΛ ϕ 0 −1
+ 2¡ 0 ¢2 E [fi zi xi ] Λ λ
λΛ λ −1
λ0 Λ−1 E [fi zi εi ] 0 −1 £ 2 ¤ λ0 Λ−1 ϕ 0 −1

+2 ¡ 0 ¢2 φ Λ λ − 2E fi ¡ 0 ¢2 φ Λ λ
λ Λ−1 λ λ Λ−1 λ
λ0 Λ−1 ϕ 0 −1 0 −1
−¡ 0 ¢2 λ Λ E [fi zi zi ] Λ λ
λΛ λ−1
λ0 Λ−1 E [fi zi εi ] 0 −1 −1
£ 2 ¤ λ0 Λ−1 ϕ 0 −1 −1
− ¡ 0 ¢2 λ Λ ∆Λ λ + E fi ¡ 0 ¢2 λ Λ ∆Λ λ. (32)
λΛ λ−1 λΛ λ−1
where λ = E [zi xi ], Λ = E [zi zi0 ], φ = E [wi xi ], ∆ = E [wi zi0 + zi wi0 ], and ϕ = E [wi εi ].
Using (32), we can characterize the second order bias of iterated 2SLS applied to the long difference
equation using a LIML like estimator as the initial estimator. For this purpose, we need to have the
second order bias of the LIML like estimator. In Appendix G, we present a second order bias of the
LIML like estimator. In fact, based on 5000 runs, we found in our Monte Carlo experiments that the
biases of bbLIML,1 and bbLIML,2 are smaller than predicted by the second order theory. In Table 7, we
compare the actual performance of the long difference based estimators with the second order theory.
It is sometimes of interest to construct a consistent estimator for the asymptotic variance. Although
such exercise may appear to be related only to first order asymptotics, a consistent estimator of the
asymptotic variance could be useful in practice for refinement of confidence interval as well: Pivoted
bootstrap as considered by Hall and Horowitz (1996) require such consistent estimator for second order
refinement. In Appendix H, we present a first order asymptotic result as well as a consistent estimator
for the asymptotic variance.
Proof of (32). We first present an expansion for 2SLS using instrument zbi = zi − √1n b θwi . We have
¡ 1 Pn ¢0 ¡ 1 Pn ¢ ³
0 −1 √1
Pn ´
√ n b
z x
i=1 i i n b
z
i=1 i i b
z n
b
z ε
i=1 i i
n (b − β) = ¡ P ¢0 ¡ 1 Pn ¢ ³ P ´ (33)
1 n −1 1 n
n bi xi
i=1 z n bi zbi0
i=1 z
√
n
bi xi
i=1 z
37
Write
n
Ã n
!
1X 1 1 X
zi xi = λ + √ √ (zi xi − λ) ,
n i=1 n n i=1
n
Ã n
!
1X 0 1 1 X 0
zi z = Λ + √ √ (zi zi − Λ) .
n i=1 i n n i=1
Recalling that
n µ ¶
b 1 X 1 1
θ= √ fi + √ Qn + op √ ,
n i=1 n n
we can derive that

n µ ¶ Ã n
! Ã n
! µ ¶
1X 1 b 1 1 X 1 1 X 1
zi − √ θwi xi = λ + √ √ (zi xi − λ) − √ √ fi φ + op √ ,
n i=1 n n n i=1 n n i=1 n
n µ ¶µ ¶0 Ã n
!
1X 1 b 1 b 1 1 X 0
zi − √ θwi zi − √ θwi = Λ + √ √ (zi zi − Λ)
n i=1 n n n n i=1
Ã n
! µ ¶
1 1 X 1
−√ √ fi ∆ + op √ ,
n n i=1 n
and
n µ ¶ n
Ã n
!
1 X 1 b 1 X 1 X 1
√ zi − √ θwi εi = √ zi εi − √ fi ϕ − √ Qn ϕ
n i=1 n n i=1 n i=1 n
Ã n
! Ã n
! µ ¶
1 1 X 1 X 1
−√ √ fi √ (wi εi − ϕ) + op √ .
n n i=1 n i=1 n
38
Here, φ and ∆ are defined in Theorem 32. Using arguments similar to the derivation of (27), we obtain
Ã n
!0 Ã
n
!−1 Ã n
!
1X 1X 0 1 X
zbi xi zbi zb √ zbi εi
n i=1 n i=1 i n i=1
Ã n
! Ã n
!
0 −1 1 X 1 X
=λΛ √ zi εi − √ fi λ0 Λ−1 ϕ
n i=1 n i=1
Ã n
! Ã n
!
1 0 −1 1 1 X 1 X
− √ λ Λ ϕQn − √ √ fi λ0 Λ−1 √ (wi εi − ϕ)
n n n i=1 n i=1
Ã n
!0 Ã n
!
1 1 X 1 X
+√ √ (zi xi − λ) Λ−1 √ zi εi
n n i=1 n i=1
Ã n
!Ã n
!0
1 1 X 1 X
−√ √ fi √ (zi xi − λ) Λ−1 ϕ
n n i=1 n i=1
Ã n
! Ã n
! Ã n
!
1 1 X 0 −1 1 X 1 1 X
−√ √ fi φ Λ √ zi εi + √ √ fi φ0 Λ−1 ϕ
n n i=1 n i=1 n n i=1
Ã n
! Ã n
!
1 0 −1 1 X 1 X
−√ λΛ √ (zi zi0 − Λ) Λ−1 √ zi εi
n n i=1 n i=1
Ã n
! Ã n
!
1 1 X 0 −1 1 X
+√ √ fi λ Λ √ (zi zi − Λ) Λ−1 ϕ
0
n n i=1 n i=1
Ã n
! Ã n
! Ã n
!2
1 1 X 0 −1 1 X 1 1 X
+√ √ fi λ Λ ∆Λ −1
√ zi εi − √ √ fi λ0 Λ−1 ∆Λ−1 ϕ
n n i=1 n i=1 n n i=1
µ ¶
1
+ op √ ,
n
and
Ã n n
!0 Ã !−1 Ã n
!
1X 1X 0 1 X
zbi xi zbi zb √ zbi xi
n i=1 n i=1 i n i=1
Ã n
!0 Ã n
!
0 −1 2 1 X 2 1 X
=λΛ λ+ √ √ −1
(zi xi − λ) Λ λ − √ √ fi φ0 Λ−1 λ
n n i=1 n n i=1
Ã n
! Ã n
! µ ¶
1 0 −1 1 X 0 −1 1 1 X 0 −1 −1 1
−√ λΛ √ (zi zi − Λ) Λ λ + √ √ fi λ Λ ∆Λ λ + op √
n n i=1 n n i=1 n
Therefore, we may conclude that

³ P ´ Ã !
n
√ λ0 Λ−1 √1
n i=1 zi εi
λ0 Λ−1 ϕ 1 X
n
n (b − β) = 0 −1 − 0 −1 √ fi
λΛ λ λΛ λ n i=1
µ ¶
1 1 1
+ √ B1 + √ B2 + op √ , (34)
n n n
39
where
³ Pn ´0 ³ Pn ´
√1 (z x − λ) Λ−1 √1
z ε
n i=1 i i n i=1 i i
B1 =
³ λ0 Λ−1 λ ´ ³ ´
Pn Pn
λ0 Λ−1 √1
n
0
i=1 (zi zi − Λ) Λ
−1 √1
n i=1 zi εi
− 0 −1
³ λ´ Λ λ
0 −1 √1 Pn Ã !0
λΛ n
n i=1 zi εi 1 X
−2 ¡ 0 ¢2 √ (zi xi − λ) Λ−1 λ
−1
λΛ λ n i=1
³ ´
0 −1 √1 Pn Ã !
λΛ n i=1 zi εi 1 X
n
0 −1
+ ¡ 0 ¢2 λΛ √ (zi zi − Λ) Λ−1 λ,
0
λ Λ−1 λ n i=1
and
Ã n
! 0 −1 ³ 1 Pn ´
λ0 Λ−1 ϕ 1 X λΛ √
n i=1 (wi εi − ϕ)
B2 = − 0 −1 Qn − √ fi
λΛ λ n i=1 λ0 Λ−1 λ
Ã ³
! 1 Pn ´0
n −1
1 X i=1 (zi xi − λ) Λ ϕ
√
n
− √ fi
n i=1 λ0 Λ−1 λ
Ã ! 0 −1 1 Pn ³ ´ Ã !
1
n
X φ Λ √
n i=1 z i ε i 1 X
n
φ0 Λ−1 ϕ
− √ fi 0 −1 + √ fi
n i=1 λΛ λ n i=1 λ0 Λ−1 λ
Ã ! 0 −1 ³ 1 Pn 0
´
−1
1 X
n λΛ √
n i=1 (zi zi − Λ) Λ ϕ
+ √ fi 0 −1
n i=1 λΛ λ
Ã ! 0 −1 ³ Pn ´ Ã !2
n −1 √1 n
1 X λ Λ ∆Λ n i=1 i iz ε 1 X λ0 Λ−1 ∆Λ−1 ϕ
+ √ fi 0 −1 − √ fi
n i=1 λΛ λ n i=1 λ0 Λ−1 λ
Ã n
! Ã n
!0
1 X λ0 Λ−1 ϕ 1 X
+2 √ fi ¡ 0 ¢2 √ (zi xi − λ) Λ−1 λ
n i=1 λ Λ−1 λ n i=1
Ã n
! 0 −1 ³ 1 Pn ´ Ã n
!2
1 X λ Λ √
n i=1 z i ε i 1 X λ0 Λ−1 ϕ 0 −1
0 −1
+2 √ fi ¡ 0 ¢2 φΛ λ−2 √ fi ¡ 0 ¢2 φ Λ λ
n i=1 λ Λ−1 λ n i=1 λ Λ−1 λ
Ã n
! Ã n
!
1 X λ0 Λ−1 ϕ 0 −1 1 X
− √ fi ¡ 0 ¢2 λ Λ √ (zi zi − Λ) Λ−1 λ
0
n i=1 λ Λ−1 λ n i=1
Ã ! 0 −1 ³ 1 Pn ´
1
n
X λ Λ √
n i=1 z i ε i
− √ fi ¡ 0 ¢2 λ0 Λ−1 ∆Λ−1 λ
n i=1 λΛ λ −1
Ã n
! 2
1 X λ0 Λ−1 ϕ 0 −1 −1
+ √ fi ¡ 0 ¢2 λ Λ ∆Λ λ.
n i=1 λ Λ−1 λ
The first two terms on the right side of (34) capture the standard first order asymptotics of the plug in
estimator, which establishes Lemma 15. Obviously, they have mean equal to zero. The third term √1n B1
is the standard second order expansion term when b θ = 0, i.e., when the proper instrument is known
40
exactly. Therefore, under conditional symmetry of εi , it can be shown that
(K − 2) σ uε
E [B1 ] = . (35)
λ0 Λ−1 λ
The third term √1n B2 is the correction to the second order expansion to accommodate the plug-in nature
of the estimation. It is not difficult to see that
λ0 Λ−1 ϕ λ0 Λ−1 E [fi wi εi ]
E [B2 ] = − E [Q n ] −
λ0 Λ−1 λ λ0 Λ−1 λ
0 −1 0 −1
E [fi zi xi ] Λ ϕ φ Λ E [fi zi εi ]
− −
λ0 Λ−1 λ λ0 Λ−1 λ
λ Λ E [fi zi zi0 ] Λ−1 ϕ λ0 Λ−1 ∆Λ−1 E [fi zi εi ]
0 −1 £ 2 ¤ λ0 Λ−1 ∆Λ−1 ϕ
+ + − E fi
λ0 Λ−1 λ λ0 Λ−1 λ λ0 Λ−1 λ
0 −1
λΛ ϕ 0 −1
+ 2¡ ¢2 E [fi zi xi ] Λ λ
0 −1
λΛ λ
λ0 Λ−1 E [fi zi εi ] 0 −1 £ 2 ¤ λ0 Λ−1 ϕ 0 −1
+2 ¡ 0 ¢2 φ Λ λ − 2E fi ¡ 0 ¢2 φ Λ λ
λ0 Λ−1 ϕ 0 −1 0 −1
−¡ 0 ¢2 λ Λ E [fi zi zi ] Λ λ
λ Λ−1 λ
λ0 Λ−1 E [fi zi εi ] 0 −1 −1
£ 2 ¤ λ0 Λ−1 ϕ 0 −1 −1
− ¡ 0 ¢2 λ Λ ∆Λ λ + E fi ¡ 0 ¢2 λ Λ ∆Λ λ. (36)
Using (34), (35), and (36), we can obtain the desired conclusion.
G Second Order Bias of bbLIML

Our bbLIML modifies Arellano and Bover’s estimator. It is given by
PT −1 ∗0 ∗
√1 ∗0 ∗
√ n t=1 (xt Pt εt − κt xt εt )
n (b − β) = 1 PT −1 , (37)
∗0 ∗ ∗0 ∗
n t=1 (xt Pt xt − κt xt xt )
where
¡ 1 Pn ∗
¢0 ¡ 1 Pn 0 −1 1
¢ ¡ Pn ¢
n i=1 zit (yit − x∗it c) n i=1 zit zit n
∗
i=1 zit (yit − x∗it c)
κt = min 1 Pn ∗ − x∗ c)2
c (y
n i=1 it it
√
We make the second order expansion of n (b − β). We make a digression to the discussion of single
equation model.13
G.1 Characterization of Second Order Bias of LIML

Consider a simple simultaneous equations model
yi = βxi + εi , xi = zi0 π + ui
13
The digression mostly confirms the usual higher order analysis of LIML readily available in the literature.
The only reason we consider such analysis is because all the analysis we found in the literature are conditional
analysis given instruments: They all assume that the instruments are nonstochastic. Our purpose is to make a
marginal second order analysis, which is more natural in the dynamic panel model context.
41
and examine LIML b that solves
¡ 1 Pn ¢0 ¡ P ¢−1 ¡ 1 Pn ¢
0
e (c) P e (c) n i=1 zi (yi − xi c) n1 ni=1 zi zi0 n i=1 zi (yi − xi c)
min 0 = min 1 Pn 2
c e (c) e (c) c
n i=1 (yi − xi c)
where
e (c) = y − xc
Here, the first order condition is given by
−2x0 P e (b) e (b)0 P e (b) 0

− ¡ ¢2 (−2x e (b)) = 0
e (b)0 e (b) 0
e (b) e (b)
or
Gn (b) = 0,
where
µ ¶µ ¶ µ ¶µ ¶
1 0 1 0 1 0 1 0
Gn (b) = x P e (b) e (b) e (b) − x e (b) e (b) P e (b) .
n n n n
Note that
µ ¶µ ¶ µ ¶µ ¶
∂Gn (b) 1 0 1 0 1 0 1 0
= − x Px e (b) e (b) + x P e (b) −2 x e (b)
∂b n n n n
µ ¶µ ¶ µ ¶µ ¶
1 0 1 0 1 0 1 0
− − xx e (b) P e (b) − x e (b) −2 x P e (b)
n n n n
and
µ ¶µ ¶ µ ¶µ ¶ µ ¶µ ¶
∂ 2 Gn (b) 1 0 1 0 1 0 1 0 1 0 1 0
= − x P x −2 x e (b) + − x P x −2 x e (b) + x P e (b) 2 x x
∂b2 n n n n n n
µ ¶µ ¶ µ ¶µ ¶ µ ¶µ ¶
1 1 1 1 1 0 1
− − x0 x −2 x0 P e (b) − − x0 x −2 x0 P e (b) − x e (b) 2 x0 P x .
n n n n n n
∂Gn (β) ∂ 2 Gn (β) √
We now expand Gn (β), ∂b , and ∂b2 using n-consistency of b:
µ ¶
1 ∂Gn (β) ¡√ ¢ 1 ∂ 2 Gn (β) ¡√ ¢2 1
0 = Gn (β) + √ n (b − β) + 2
n (b − β) + op . (38)
n ∂b n ∂b n
First, note that

µ ¶µ ¶ µ ¶µ ¶
1 0 1 0 1 0 1 0
Gn (β) = x Pε εε − xε ε Pε
n n n n
Ã n !Ã n !0 Ã n !−1 Ã n !
1X 2 1X 1X 0 1X
= ε zi xi zi z zi εi
n i=1 i n i=1 n i=1 i n i=1
Ã n !Ã n !0 Ã n !−1 Ã n !
1X 1X 1X 0 1X
− xi εi zi εi zi z zi εi .
n i=1 n i=1 n i=1 i n i=1
42
Because
n
Ã n
!
1X 2 2 1 1 X¡ 2 2
¢
ε = σε + √ √ ε − σε ,
n i=1 i n n i=1 i
n
Ã n
!
1X 1 1 X
zi xi = λ + √ √ (zi xi − λ) ,
n i=1 n n i=1
n
Ã n
!
1X 1 1 X
xi εi = σ xε + √ √ (xi εi − σxε ) ,
n i=1 n n i=1
Ã n !−1 Ã n
! µ ¶
1X 0 −1 1 −1 1 X 1
zi zi =Λ −√ Λ √ (zi zi − Λ) Λ−1 + op √
0
,
n i=1 n n i=1 n
where λ = E [zi xi ], and Λ = E [zi zi0 ]. Therefore, we have

µ ¶
1 1 1
Gn (β) = √ Φ + Γ + op (39)
n n n
where
Ã n
!
1 X
Φ= σ 2ε λ0 Λ−1 √ zi εi
n i=1
and
Ã n
!0 Ã n
!
1 X 1 X
Γ = σ2ε √ (zi xi − λ) Λ−1 √ zi εi
n i=1 n i=1
Ã n
! Ã n
!
2 0 1 X 0 −1 1 X
− σε λ √ (zi zi − Λ) Λ √ zi εi
n i=1 n i=1
Ã n
! Ã n
!
1 X¡ 2 2
¢ 0 −1 1 X
+ √ ε − σε λ Λ √ zi εi
n i=1 i n i=1
Ã n
!0 Ã n
!
1 X 1 X
− σxε √ zi εi Λ−1 √ zi εi .
n i=1 n i=1
Now, note that

µ ¶µ ¶ µ ¶µ ¶
∂Gn (β) 1 0 1 0 1 0 1 0
=− x Px εε + xx ε Pε
∂b n n n n
Ã n !Ã n !0 Ã n !−1 Ã n !
1X 2 1X 1X 0 1X
=− ε zi xi zi z zi xi
n i=1 i n i=1 n i=1 i n i=1
Ã n !Ã n !0 Ã n !−1 Ã n !
1X 2 1X 1X 0 1X
+ x zi εi zi z zi εi
n i=1 i n i=1 n i=1 i n i=1
µ ¶
1 1
= Υ + √ Ξ + op √ , (40)
n n
where
Υ = −σ2ε λ0 Λ−1 λ
43
and
Ã n
! Ã n
!0
1 X¡ 2 2
¢ 0 −1 2 1 X
Ξ=− √ ε − σε λ Λ λ − 2σε √ (zi xi − λ) Λ−1 λ
n i=1 i n i=1
Ã n
!
2 0 −1 1 X
+ σε λ Λ √ (zi zi − Λ) Λ−1 λ.
0
n i=1
Finally, note that

µ ¶µ ¶ µ ¶µ ¶
∂ 2 Gn (β) 1 0 1 0 1 0 1 0
=2 x Px x ε −2 x Pε xx
∂b2 n n n n
= 2Ψ + op (1) . (41)
where
Ψ = σxε λ0 Λ−1 λ.
Combining (38), (39), (40), and (41), we obtain

µ ¶ µ ¶
1 1 1 1 √ 1 ¡√ ¢2 1
0= √ Φ+ Γ+ √ Υ+ √ Ξ n (b − β) + Ψ n (b − β) + op ,
n n n n n n
µ ¶ µ ¶
√ 1 1 1 1 Ψ 1
n (b − β) = − Φ + √ − Γ + 2 ΦΞ − 3 Φ2 + op √ .
Υ n Υ Υ Υ n
Note that Φ has a mean equal to zero. Therefore, under symmetry, the second order bias of b is given by
· µ ¶¸
1 1 1 Ψ −σεx
E − Γ + 2 ΦΞ − 3 Φ2 = 0 −1 ,
n Υ Υ Υ λΛ λ
which is qualitatively of the same form as Rothenberg’s mean.
G.2 Higher Order Analysis of the “Eigenvalue”

Let
e (b)0 P e (b)
κ= 0
e (b) e (b)
Getting back to the first order condition
e (b)0 P e (b) 0
0 = x0 P e (b) − 0 x e (b) = x0 P y − κx0 y − (x0 P x − κx0 x) b,
e (b) e (b)
we can write
x0 P y − κx0 y
b= ,
x0 P x − κx0 x
the usual expression.
Note that
1 0 1 0 1 2 0
n ε P ε − 2 n (b − β) ε P x + n (b − β) x P x
κ= 1 0 1 1 2 0
0
n ε ε − 2 n (b − β) ε x + n (b − β) x x
44
The numerator and the denominator may be rewritten as
Ã n
!0 Ã n
! Ã n
!
1 1 X −1 1 X 1 ¡√ ¢ 0 −1 1 X
√ zi εi Λ √ zi εi − 2 n (b − β) λ Λ √ zi εi
n n i=1 n i=1 n n i=1
µ ¶
1 ¡√ ¢2 0 −1 1
+ n (b − β) λ Λ λ + op
n n
Ã !0 Ã ! ³ ³ P ´´2
n n λ0 Λ−1 √1n ni=1 zi εi µ ¶
1 1 X −1 1 X 1 1
= √ zi εi Λ √ zi εi − 0 −1 + op ,
n n i=1 n i=1 n λΛ λ n
and
1 0 1 1
ε ε − 2 (b − β) ε0 x + (b − β)2 x0 x = σ 2ε + op (1) .
n n n
We may therefore write
³ ³ ´´2
Ã n
!0 Ã n
! 0 −1 √1 Pn µ ¶
1 1 X 1 X 1 λ Λ n i=1 zi εi 1
κ= √ zi εi Λ−1 √ zi εi − + op .
nσ2ε n i=1 n i=1 nσ2ε λ0 Λ−1 λ n
G.3 Application to Dynamic Panel Model

We now adopt obvious notations, and make a second order analysis of the right side of (37). First, note
that
1
√ (x∗0 Pt ε∗t − κt x∗0 ∗
t εt )
n t
Ã n !0 Ã n !−1 Ã n
! Ã n
!
1X ∗ 1X 0 1 X ∗ 1 X ∗ ∗
= zit xit zit zit √ zit εit − κt √ x ε
n i=1 n i=1 n i=1 n i=1 it it
Ã n
! Ã n
!0 Ã n
!
0 −1 1 X ∗ 1 1 X ∗ −1 1 X ∗
= λt Λt √ zit εit + √ √ (zit xit − λt ) Λt √ zit εit
n i=1 n n i=1 n i=1
Ã n
! Ã n
!
1 0 −1 1 X 1 X
− √ λt Λt √ 0
(zit zit − Λt ) Λ−1 t √ zit ε∗it
n n i=1 n i=1
Ã n
!0 Ã n
!
1 1 X 1 X
−√ 2 √ zit ε∗it Λ−1 t √ zit ε∗it σ uε,t
nσε,t n i=1 n i=1
³ ³ ´´2
0 −1 √1 Pn ∗ µ ¶
1 λ Λ
t t n
z
i=1 it itε 1
+√ 2 σ uε,t + op √
nσε,t λ0t Λ−1
t λt n
and
1 ∗0
(x Pt x∗t − κt x∗0 ∗
t xt )
n t
Ã n !0 Ã n !−1 Ã n ! Ã n !
1X ∗ 1X 0 1X ∗ 1X ∗ 2
= zit xit zit zit zit xit − κt (x )
n i=1 n i=1 n i=1 n i=1 it
Ã n
!0
0 −1 2 1 X
= λt Λt λt + √ √ (zit xit − λt ) Λ−1
∗
t λt
n n i=1
Ã n
! µ ¶
1 0 −1 1 X 0 −1 1
− √ λt Λt √ (zit zit − Λt ) Λt λt + op √ .
n n i=1 n
45
It therefore follows that
PT −1 0 −1 ³ 1 Pn ∗
´
t=1 λt Λt z ε
√
√ n i=1 it it
n (b − β) = PT −1 0 −1
t=1 λt Λt λt
PT −1 ³ 1 Pn ∗
´0 ³
−1 √1 Pn ∗
´
i=1 (zit xit − λt ) Λt i=1 zit εit
√
1 t=1 n n
+√ PT −1 0 −1
n t=1 λt Λt λt
PT −1 0 −1 ³ 1 Pn 0
´ ³
−1 √1 Pn ∗
´
1 t=1 λ t Λ t
√
n i=1 (z it zit − Λ t ) Λt n i=1 zit εit
−√ PT −1 0 −1
n t=1 λt Λt λt
PT −1 σuε,t ³ 1 Pn ∗
´0 ³
−1 √1 Pn ∗
´
1 t=1 2
σε,t
√
n i=1 z it εit Λ t n i=1 zit εit
−√ PT −1 0 −1
n t=1 λ Λ
t t λ t
³ ³ Pn ´´ 2
PT −1 σ uε,t λ0t Λ−1
t
√1
n i=1 zit ε∗
it
1 t=1 σ2ε,t λ0t Λ−1
t λt
+√ PT −1 0 −1
n t=1 λt Λt λt
PT −1 0 −1 ³ 1 Pn ∗
´Ã Ã !0 !
2 t=1 λt Λ t
√
n i=1 zit εit
TX−1
1 X
n
∗ −1
−√ ³P ´2 √ (zit xit − λt ) Λt λt
n T −1 0 −1 n i=1
t=1 λ Λ
t t λ t t=1
PT −1 0 −1 ³ 1 Pn ∗
´Ã Ã ! !
1 t=1 λt Λ t
√
n i=1 zit εit
TX−1
1 X
n
0 −1 0 −1
+√ ³P ´2 λt Λt √ (zit zit − Λt ) Λt λt
n T −1 0 −1 n i=1
t=1 λt Λt λt
t=1
µ ¶
1
+ op √ .
n
Therefore, under symmetry, the second order bias of the LIML like estimator is given by
PT −1
1 t=1 σ uε,t
PT −1 0 −1
n t=1 λt Λt λt
PT −1 PT −1 0 −1 £ ∗ ∗ 0 ¤ −1
2 t=1 s=1 λt Λt E (zit εit ) (zis xis − λs ) Λs λs
− ³P ´2
n T −1 0 −1
t=1 λ Λ
t t λ t
PT −1 PT −1 0 −1 £ ∗ 0 −1 0
¤ −1
1 t=1 s=1 λt Λt E (zit εit ) λs Λs (zis zis − Λs ) Λs λs
+ ³P ´2
n T −1 0 −1
t=1 λ Λ
t t λt
µ ¶
1
+ op √ .
n
H First Order Asymptotic Theory for Finitely Iterated Long

Difference Estimator
Lemma 15 Let b denote the 2SLS in (29). We have
n µ ¶ Ã !
√ 1 X λ0 Λ−1 λ0 Λ−1 ϕ λ0 Λ−1 ΣΛ−1 λ
n (b − β) = √ zi εi − 0 −1 fi + op (1) → N 0, ¡ 0 ¢2 ,
n i=1 λ0 Λ−1 λ λΛ λ λ Λ−1 λ
where
£ ¤
Σ ≡ E (zi εi − fi ϕ) (zi εi − fi ϕ)0 .
46
Proof. See Appendix F.
Lemma 15 can be used to establish the influence function of iterated 2SLS estimators bbLIML,1 , . . . , bbLIML,4
applied to the long difference. We first note that the influence function of bbLIML is given by
PT −1 0 −1 ∗
t=1 λt Λt zit εit
fLIML,i = P T −1 0 −1
t=1 λt Λt λt
where λt = E [zit x∗it ], and Λt = E [zit zit 0

]. We also note that yi = yiT³ − yi1 , xi = yiT −1 − yi0 , and wi =´
0 b iT −2 , . . . , yi2 − βy
(0, yiT −2 , . . . , yi1 ) . This is because we use the instrument of the form yi0 , yiT −1 − βy b i1
at each iteration, where β b is some preliminary estimator of β. By Lemma 15, the influence function of
bbLIML,1 is equal to
λ0 Λ−1 λ0 Λ−1 ϕ
zi εi − fLIML,i (42)
λ0 Λ−1 λ λ0 Λ−1 λ
Using Lemma 15 again, we can see that the influence function of b2 is equal to
µ ¶
λ0 Λ−1 λ0 Λ−1 ϕ λ0 Λ−1 λ0 Λ−1 ϕ
zi εi − zi εi − fLIML,i . (43)
λ0 Λ−1 λ λ0 Λ−1 λ λ0 Λ−1 λ λ0 Λ−1 λ
Likewise, we can see that the influence functions of b3 and b4 are equal to
· µ ¶¸
λ0 Λ−1 λ0 Λ−1 ϕ λ0 Λ−1 λ0 Λ−1 ϕ λ0 Λ−1 λ0 Λ−1 ϕ
zi εi − 0 −1 zi εi − 0 −1 zi εi − 0 −1 fLIML,i (44)
λ0 Λ−1 λ λ Λ λ λ0 Λ−1 λ λ Λ λ λ0 Λ−1 λ λΛ λ
and
½ · µ ¶¸¾
λ0 Λ−1 λ0 Λ−1 ϕ λ0 Λ−1 λ0 Λ−1 ϕ λ0 Λ−1 λ0 Λ−1 ϕ λ0 Λ−1 λ0 Λ−1 ϕ
z ε
i i − z ε
i i − z ε
i i − z ε
i i − fLIML,i
λ0 Λ−1 λ λ0 Λ−1 λ λ0 Λ−1 λ λ0 Λ−1 λ λ0 Λ−1 λ λ0 Λ−1 λ λ0 Λ−1 λ λ0 Λ−1 λ
(45)
√ ³ ´
Using (42) - (45), we can easily construct consistent estimators of asymptotic variances of n bbLIML,1 − β ,
√ ³b ´ √ ³ ´ √ ³ ´
n bLIML,2 − β , n bbLIML,3 − β , and n bbLIML,4 − β . Suppose that Λ, b and ϕ
b λ, b are consistent
b b
estimators of Λ, λ, and ϕ. Likewise, let Λt , and λt denote some consistent estimators of Λt , and λt . For
example,
Xn Xn n
b= 1 b= 1 1X
Λ zbi zbi0 , λ zbi xi , b=
ϕ wi (yi − bxi )
n i=1 n i=1 n i=1
³ ´
b iT −2 , . . . , yi2 − βy
zbi = yi0 , yiT −1 − βy b i1 ,
X n X n
bt = 1
Λ 0
zit zit , bt = 1
λ zit x∗it .
n i=1 n i=1
b is any √n-consistent estimator of β. Also, let

where β
PT −1 b0 b −1 ³ ∗ b ∗ ´
t=1 λt Λt zit yit − βxit
b b i,
εi = yi − βx fbLIML,i = PT −1 b0 b −1 b
t=1 λt Λt λt
47
From (42) - (45), it then follows that
Ã !2
1X
n
λb0 Λ
b −1 b0 Λ
λ b −1 ϕ
bb
zbib
εi − 0 fLIML,i ,
n i=1 b0 Λ
λ b
b −1 λ bΛ
λ b
b −1 λ
Ã 0 Ã 0 !!2
n
1X λ bΛ b −1 b0 Λ
λ b −1 ϕ
b bΛ
λ b −1 b0 Λ
λ b −1 ϕ
bb
zbib
εi − 0 zbib
εi − 0 fLIML,i ,
b0 Λ
n i=1 λ b
b −1 λ bΛ
λ b λ
b −1 λ b0 Λ b
b −1 λ bΛ
λ b
b −1 λ
Ã 0 " 0 Ã 0 !#!2
n
λ b −1 ϕ
b λ bΛ b −1 b0 Λ
λ b −1 ϕ
b bΛ
λ b −1 b0 Λ
λ b −1 ϕ
bb
0 zbib
εi − 0 0 zbib
εi − 0 0 zbib
εi − fLIML,i ,
n i=1 λ
bΛ b
b −1 λ bΛ
λ b λ
b −1 λ bΛ b
b −1 λ bΛ
λ b λ
b −1 λ bΛ b
b −1 λ b0 Λ
λ b
b −1 λ
Ã 0 ( 0 " 0 Ã 0 !#)!2
n
λ b −1 ϕ
b bΛ
λ b −1 b0 Λ
λ b −1 ϕ
b λ bΛ b −1 b0 Λ
λ b −1 ϕ
b bΛ
λ b −1 b0 Λ
λ b −1 ϕ
bb
0 zbib
εi − 0 0 zbib
εi − 0 0 zbib
εi − 0 0 zbib
εi − 0 fLIML,i
bΛ
n i=1 λ b
b −1 λ bΛ
λ b λ
b −1 λ bΛ b
b −1 λ bΛ
λ b λ
b −1 λ bΛ b
b −1 λ bΛ
λ b λ
b −1 λ bΛ b
b −1 λ b
λΛ λb −1 b
√ ³b ´ √ ³ ´ √ ³ ´
are consistent estimators of asymptotic variances of n bLIML,1 − β , n bbLIML,2 − β , n bbLIML,3 − β ,
√ ³ ´
and n bbLIML,4 − β .
I Approximation of CUE
We examine an easier method of calculating an estimator that is equivalent to CUE up to the second
order adapting Rothenberg’s (1984) argument, who was concerned about properties of linearized version
of MLE. We basically argue that two Newton iterations suffice for second order bias removal. The CUE
bCU E solves
0 −1
min L (c) = min g (c) G (c) g (c) ,
c c
where
n n
1X 1X
g (c) ≡ δ i (c) , G (c) ≡ δ i (c) δ i (c)0
n i=1 n i=1
j
Let b denote the minimizer, and let Lj (c) ≡ ∂ ∂c
L(c)
j . We consider an iterated version of CUE. Suppose that
√
we have a n-consistent estimator b0 . Such estimator ³ can´ be easily found by the usual GMM estimation
1
method. Note that we would have b0 − bCU E = Op √
n
. Assume that
³ ´ ³ ´
L2 bb = Op (1) , L3 bb = Op (1)
√
for any n−consistent estimator bb. (This condition is expected to be satisfied for most estimators.) Let
L1 (br )
br+1 ≡ br − .
L2 (br )
48
Expanding around bCU E , and noting that L1 (bCU E ) = 0, we can obtain
L1 (b0 )
b1 − bCUE = b0 − bCUE −
L2 (b0 )
= b0 − bCUE
³ ´
2 2
L2 (bCUE ) · (b0 − bCUE ) + 12 L3 (bCU E ) · (b0 − bCUE ) + op (b0 − bCU E )
−
L2 (bCU E ) + L3 (bCUE ) · (b0 − bCU E ) + op (b0 − bCUE )
= b0 − bCUE
µ ¶
1
− L2 (bCUE ) · (b0 − bCUE ) + L3 (bCUE ) · (b0 − bCU E )2
2
Ã !
1 L3 (bCUE ) · (b0 − bCUE )
× − 2
L2 (bCUE ) L2 (bCU E )
³ ´
2
+ op (b0 − bCUE )
L3 (bCUE ) ³ ´
= · (b0 − bCU E )2 + op (b0 − bCUE )2
2L2 (bCUE )
µ ¶
L3 (bCUE ) 1
= · (b0 − bCU E )2 + op .
2L2 (bCUE ) n
It follows that
µ ¶
1
b1 − bCUE = Op .
n
We can similarly show that

³ ´ µ ¶
L3 (bCUE ) 1
b2 − bCUE = · (b1 − bCU E )2 + op (b1 − bCUE )2 = Op .
2L2 (bCUE ) n2
or
√ ³ ´
n (b2 − bCU E ) = Op n−3/2 .
This implies that b2 has very similar properties as bCU E : Its (approximate) mean and variance up to
¡ ¢
O n−1 coincide with those of bCU E .
49
References
[1] Ahn, S., and P. Schmidt, 1995, “Efficient Estimation of Models for Dynamic Panel Data”, Journal
of Econometrics 68, 5 - 28.
[2] Ahn, S., and P. Schmidt, 1997, “Efficient Estimation of Dynamic Panel Data Models: Alternative
Assumptions and Simplified Estimation”, Journal of Econometrics 76, 309 - 321.
[3] Alonso-Borrego, C. and M. Arellano, 1999, “Symmetrically Normalized Instrumental-

Variable Estimation Using Panel Data”, Journal of Business and Economic Statistics 17, pp. 36-49.
[4] Andrews, D.W.K., 1992, ”Generic Uniform Convergence”, Econometric Theory, 8, pp. 241-257.
[5] Andrews, D.W.K., 1994, ”Empirical Process Methods in Econometrics.” In Handbook of Econo-
metrics, Vol IV (R.F. Engle and D.L. McFadden, eds.). North Holland, Amsterdam.
[6] Alvarez, J. and M. Arellano, 1998, “The Time Series and Cross-Section Asymptotics of Dy-
namic Panel Data Estimators”, unpublished manuscript.
[7] Anderson, T.W. and C. Hsiao, 1982, “Formulation and Estimation of Dynamic Models Using
Panel Data”, Journal of Econometrics 18, pp. 47 - 82.
[8] Arellano, M. and S.R. Bond, 1991, “Some Tests of Specification for Panel Data: Monte Carlo
Evidence and an Application to Employment Equations”, Review of Economic Studies 58, pp. 277
- 297.
[9] Arellano, M. and O. Bover, 1995, “Another Look at the Instrumental Variable Estimation of
Error-Component Models”, Journal of Econometrics 68, pp. 29 - 51.
[10] Bekker, P., 1994, “Alternative Approximations to the Distributions of Instrumental Variable Es-
timators”, Econometrica 62, pp. 657 - 681.
[11] Blundell, R. and S. Bond, 1998, “Initial Conditions and Moment Restrictions in Dynamic Panel
Data Models”, Journal of Econometrics 87, pp. 115 - 143.
[12] Chamberlain, G., 1984, “Panel Data”, in Griliches, Z. and M. D. Intriligator, eds., Handbook of
Econometrics Vol. 2. Amsterdam: North-Holland.
[13] Chao, J. and N. Swanson, 2000, “An Analysis of the Bias and MSE of the IV Estimator under
Weak Identification with Application to Bias Correction”, unpublished manuscript.
[14] Choi, I. and P.C.B. Phillips, 1992, “Asymptotic and Finite Sample Distribution Theory for IV
Estimators and Tests in Partially Identified Structural Equations”, Journal of Econometrics 51, 113
- 150.
[15] Davis, A.W, 1980, “Invariant polynomials with two matrix arguments, extending the zonal poly-
nomials”, in P.R. Krishnaiah, ed, Mulitivariate Analysis V, 287-299. Amsterdam: North Holland.
[16] Donald, S. and W. Newey, 1998, “Choosing the Number of Instruments”, unpublished
manuscript.
[17] Dufour, J-M, 1997, “Some Impossibility Theorems in Econometrics with Applications to Structural
and Dynamic Models”, Econometrica 65, 1365 - 1388.
50
[18] Griliches, Z. and J. Hausman, 1986, “Errors in Variables in Panel Data,” Journal of Economet-
rics 31, pp. 93 - 118.
[19] Hahn, J., 1997, “Efficient Estimation of Panel Data Models with Sequential Moment Restrictions”,
Journal of Econometrics 79, pp. 1 - 21.
[20] Hahn, J., 1999, “How Informative Is the Initial Condition in the Dynamic Panel Model with Fixed
Effects?”, Journal of Econometrics 93, pp. 309 - 326.
[21] Hahn, J. and J. Hausman, 2000, “A New Specification Test for the Validity of Instrumental
Variables”, forthcoming Econometrica.
[22] Hahn, J. and G. Kuersteiner, 2000, “Asymptotically Unbiased Inference for a Dynamic Panel
Model with Fixed Effects When Both n and T are Large”, unpublished manuscript.
[23] Hall, P. and J. Horowitz, 1996, “Bootstrap Critical Values for Tests Based on Generalized-
Method-of-Moments Estimators”, Econometrica 64, 891-916.
[24] Hausman, J.A., and W.E. Taylor, 1983, “Identification in Linear Simultaneous Equations Mod-
els with Covariance Restrictions: An Instrumental Variables Interpretation”, Econometrica 51, pp.
1527-1550.
[25] Hausman, J.A., W.K. Newey, and W.E. Taylor, 1987, “Efficient Estimation and Identification
of Simultaneous Equation Models with Covariance Restrictions”, Econometrica 55, pp. 849-874.
[26] Holtz-Eakin, D., W.K. Newey, and H. Rosen, “Estimating Vector Autoregressions with Panel
Data”, Econometrica 56, pp. 1371 - 1395.
[27] Kiviet, J.F., 1995, “On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel
Data Models”, Journal of Econometrics 68, pp. 1 - 268.
[28] Kruiniger, H., “GMM Estimation of Dynamic Panel Data Models with Persistent Data,” unpub-
lished manuscript.
[29] Kuersteiner, G.M., 2000, “Mean Squared Error Reduction for GMM Estimators of Linear Time
Series Models” unpulished manuscript.
[30] Lancaster, T., 1997, “Orthogonal Parameters and Panel Data”, unpublished manuscript.
[31] Magnus J., and H. Neudecker (1988), Matrix Differential Calculus with Applications in Statistics
and Econometrics, John Wiley & Sons Ltd.
[32] Mariano, R.S. and T. Sawa, 1972, “The Exact Finite-Sample Distribution of the Limited-
Information Maximum Likelihood Estimator in the Case of Two Included Endogenous Variables”,
Journal of the American Statistical Association 67, 159 - 163.
[33] Moon, H.R. and P.C.B. Phillips, 2000, “GMM Estimation of Autoregressive Roots Near Unity
with Panel Data”, unpublished manuscript.
[34] Nagar, A.L., 1959, “The Bias and Moment Matrix of the General k-Class Estimators of the
Parameters in Simultaneous Equations”, Econometrica 27, pp. 575 - 595.
[35] Nelson, C., R. Startz, and E. Zivot, 1998, “Valid Confidence Intervals and Inference in the
Presence of Weak Instruments”, International Economic Review 39, 1119 - 1146.
51
[36] Newey, W.K. and R.J. Smith, 2000, “Asymptotic Equivalence of Empirical Likelihood and the
Continuous Updating Estimator”, unpublished manuscript.
[37] Nickell, S., 1981, “Biases in Dynamic Models with Fixed Effects”, Econometrica 49, pp, 1417 -
1426.
[38] Pakes, A and D. Pollard, 1989, ”Simulation and the Asymptotics of Optimazation Estimators,”
Econometrica 57, pp. 1027-1057.
[39] Phillips, P.C.B., 1989, “Partially Identified Econometric Models”, Econometric Theory 5, 181 -
240.
[40] Richardson, D.J., 1968, “The Exact Distribution of a Structural Coefficient Estimator”, Journal
of the American Statistical Association 63, 1214 - 1226.
[41] Rothenberg, T.J., 1983, “Asymptotic Properties of Some Estimators In Structural Models”, in
Studies in Econometrics, Time Series, and Multivariate Statistics.
[42] Rothenberg, T. J., 1984, “Approximating the Distributions of Econometric Estimators and Test
Statistics”, in Griliches, Z. and M. D. Intriligator, eds., Handbook of Econometrics, Vol. 2. Amster-
dam: North-Holland.
[43] Smith, M.D., 1993, “Expectations of Ratios of Quadratic Forms in Normal Variables: Evaluating
Some Top-Order Invariant Polynomials”, Australian Journal of Statistics 35, 271-282.
[44] Wang, J. and E. Zivot, 1998, “Inference on Structural Parameters in Instrumental Variables
Regression with Weak Instruments”, Econometrica 66,1389 — 1404.
[45] Windmeijer, F., 2000, “GMM Estimation of Linear Dynamic Panel Data Models using Nonlinear
Moment Conditions”, unpublished manuscript.
52
Table 1: Performance of bbNagar and bbLIML
%bias RMSE
T n β bbGMM bbNagar bbLIML bbGMM bbNagar bbLIML
5 100 0.1 -16 3 -3 0.081 0.084 0.082
10 100 0.1 -14 1 -1 0.046 0.046 0.045
5 500 0.1 -4 0 -1 0.036 0.036 0.036
10 500 0.1 -3 0 -1 0.020 0.020 0.020
5 100 0.3 -9 1 -3 0.099 0.103 0.099
10 100 0.3 -7 0 -1 0.053 0.051 0.050
5 500 0.3 -2 0 -1 0.044 0.044 0.044
10 500 0.3 -2 0 0 0.023 0.023 0.023
5 100 0.5 -10 1 -3 0.132 0.140 0.130
10 100 0.5 -7 0 -1 0.064 0.059 0.058
5 500 0.5 -2 0 -1 0.057 0.057 0.057
10 500 0.5 -2 0 0 0.027 0.026 0.026
5 100 0.8 -28 -129 -15 0.321 102.156 0.327
10 100 0.8 -14 0 -5 0.136 0.128 0.109
5 500 0.8 -7 1 -3 0.130 0.141 0.127
10 500 0.8 -3 0 -1 0.050 0.044 0.044
5 100 0.9 -51 -70 -41 0.555 26.984 0.604
10 100 0.9 -24 -4 -15 0.250 4.712 0.229
5 500 0.9 -20 -41 -10 0.278 46.933 0.277
10 500 0.9 -9 0 -2 0.102 0.087 0.080
53
b
Table 2: Performance of Second Order Theory in Predicting Properties of β GMM
T n β Actual Bias Actual %Bias Second Order Bias Second Order %Bias
5 100 0.1 -0.016 -16.00 -0.018 -17.71
10 100 0.1 -0.014 -14.26 -0.016 -15.78
5 500 0.1 -0.004 -3.72 -0.004 -3.54
10 500 0.1 -0.003 -3.20 -0.003 -3.16
5 100 0.3 -0.028 -9.23 -0.032 -10.60
10 100 0.3 -0.021 -7.11 -0.024 -8.13
5 500 0.3 -0.006 -2.08 -0.006 -2.12
10 500 0.3 -0.005 -1.58 -0.005 -1.63
5 100 0.5 -0.052 -10.32 -0.060 -12.09
10 100 0.5 -0.034 -6.78 -0.040 -8.00
5 500 0.5 -0.011 -2.29 -0.012 -2.42
10 500 0.5 -0.008 -1.51 -0.008 -1.60
5 100 0.8 -0.224 -28.06 -0.302 -37.81
10 100 0.8 -0.108 -13.53 -0.152 -18.98
5 500 0.8 -0.056 -7.02 -0.060 -7.56
10 500 0.8 -0.027 -3.44 -0.030 -3.80
5 100 0.9 -0.455 -50.56 -1.068 -118.64
10 100 0.9 -0.220 -24.47 -0.474 -52.66
5 500 0.9 -0.184 -20.48 -0.214 -23.73
10 500 0.9 -0.078 -8.64 -0.095 -10.53
54
Table 3: Performance of bbBC2
³ ´ ³ ´ ³ ´ ³ ´
T n β %bias bbGMM %bias bbBC2 RMSE bbGMM RMSE bbBC2
5 100 0.1 -14.96 0.25 0.08 0.08
10 100 0.1 -14.06 -0.77 0.05 0.05
5 500 0.1 -3.68 -0.38 0.04 0.04
10 500 0.1 -3.15 -0.16 0.02 0.02
5 100 0.3 -8.86 -0.47 0.10 0.10
10 100 0.3 -7.06 -0.66 0.05 0.05
5 500 0.3 -2.03 -0.16 0.04 0.04
10 500 0.3 -1.58 -0.10 0.02 0.02
5 100 0.5 -10.05 -1.14 0.13 0.13
10 100 0.5 -6.76 -0.93 0.06 0.06
5 500 0.5 -2.25 -0.15 0.06 0.06
10 500 0.5 -1.53 -0.11 0.03 0.03
5 100 0.8 -27.65 -11.33 0.32 0.34
10 100 0.8 -13.45 -4.55 0.14 0.11
5 500 0.8 -6.98 -0.72 0.13 0.13
10 500 0.8 -3.48 -0.37 0.05 0.04
5 100 0.9 -50.22 -42.10 0.55 0.78
10 100 0.9 -24.27 -15.82 0.25 0.23
5 500 0.9 -20.50 -6.23 0.28 0.30
10 500 0.9 -8.74 -2.02 0.10 0.08
Table 4: Performance of Iterated Long Difference Estimator
bb2SLS,1 bb2SLS,2 bb2SLS,3 bb2SLS,4

Bias -0.0813 -0.0471 -0.0235 -0.0033
%Bias -9.0316 -5.2316 -2.6072 -0.3644
RMSE 0.3802 0.2863 0.2479 0.2536
bbGMM,1 bbGMM,2 bbGMM,3 bbGMM,4
Bias -0.0770 -0.0374 0.0006 0.0104
%Bias -8.5505 -4.1599 0.0622 1.1570
RMSE 0.1699 0.1954 0.2545 0.2851
bbLIML,1 bbLIML,2 bbLIML,3 bbLIML,4
Bias -0.0878 -0.0475 -0.0186 0.0074
%Bias -9.7571 -5.2756 -2.0698 0.8251
RMSE 0.2458 0.2391 0.2292 0.2638
55
Table 5: Comparison with Blundell and Bond’s (1998) Estimator: β = .9, T = 5, N = 100
bbBB1 bbBB2 bbBB3 bbBB4 bbLIML,1 bbLIML,2 bbLIML,3

Mean % Bias -33.8148 -29.4131 4.7432 4.2551 -9.7571 -5.2755 -2.0697
Median % Bias -31.1881 -25.9085 5.9111 5.6280 -15.3878 -9.0639 -6.9573
RMSE 0.4796 0.4257 0.0823 0.0882 0.2458 0.2391 0.2292
Table 6: Sensitivity of Blundell and Bond’s (1998) Estimator: β = .9, T = 5, N = 100
β F = .5 bbBB1 bbBB2 bbBB3 bbBB4 bbLIML,1 bbLIML,2 bbLIML,3

Mean % Bias 8.9525 14.4790 20.9971 21.5154 0.0252 0.1691 0.2334
Median % Bias 9.5207 15.4609 21.1202 21.6144 -0.2163 -0.2214 -0.2469
RMSE 0.0994 0.1400 0.1899 0.1944 0.0570 0.0611 0.0630
βF = 0 bbBB1 bbBB2 bbBB3 bbBB4 bbLIML,1 bbLIML,2 bbLIML,3
Mean % Bias 10.8819 17.3840 24.8534 25.4517 0.0429 0.1455 0.1860
Median % Bias 11.4178 18.2542 24.9990 25.5079 -0.1621 -0.1890 -0.2168
RMSE 0.1156 0.1654 0.2246 0.2299 0.0521 0.0543 0.0555
56
Table 7: Performance of Iterated Long Difference Estimator for T = 5
N = 100 bbLIML,1 bbLIML,2 bbLIML,3 bbLIML,4

β = .75 Actual Mean % bias 1.2977 4.6584 5.3703 7.6702
Actual Median % bias -3.0867 -0.4467 -0.0800 -0.4800
2nd order Mean % bias -.1358 2.8043 4.6720 6.5872
RMSE 0.1806 0.2278 0.2465 0.2857
β = .80 Actual Mean %bias -0.1119 2.5878 4.2732 6.3443
RMSE 0.2128 0.2452 0.2523 0.3032
β = .85 Actual Mean %bias -3.8994 -0.5921 1.6201 3.6981
RMSE 0.2333 0.2494 0.2532 0.2848
β = .90 Actual Mean %bias -9.7571 -5.2756 -2.0698 0.8251
2nd order Mean % bias -1.7413 25.3502 40.2274 49.3254
RMSE 0.2458 0.2391 0.2292 0.2638
β = .95 Actual Mean %bias -15.2028 -9.5738 -6.0855 -2.8321
RMSE 0.2518 0.2191 0.2124 0.2397
N = 200 bbLIML,1 bbLIML,2 bbLIML,3 bbLIML,4
β = .75 Actual Mean %bias 1.2054 3.0110 3.8420 5.1421
RMSE 0.1336 0.1630 0.1906 0.2189
RMSE 0.1740 0.2071 0.2245 0.2435
RMSE 0.2239 0.2363 0.2288 0.2513
β = .90 Actual Mean %bias -5.8274 -2.7803 -1.3073 0.1639
RMSE 0.2406 0.2257 0.2252 0.2396
β = .95 Actual Mean %bias -13.3638 -8.7034 -6.2646 -4.1416
RMSE 0.2515 0.2156 0.1991 0.2020
57
b
Table 8: Performance of β b
I2SLS and β CU E for T = 5
N = 100 b
β b
β
I2SLS,LD CU E,LD
β = 0.75 Actual Mean % Bias 5.5331 11.5527

Second Order Mean % Bias 5.4224 7.6105
Actual Median %Bias 1.3811 7.4700
RMSE 0.1761 0.2132
InterQuartile Range 0.2434 0.3067
Actual Median % Bias 1.4569 8.6510
RMSE 0.1727 0.2048
RMSE 0.1604 0.1947
β = 0.9 Actual Mean % Bias -0.7710 6.1387
Actual Median % Bias -2.3467 6.1147
RMSE 0.1534 0.1803
RMSE 0.1494 0.1655
58
b
Table 9: Performance of β b
I2SLS and β CU E for T = 5
N = 200 b
β b
β
I2SLS,LD CU E,LD

Actual Median %Bias 1.5982 4.2172
RMSE 0.1519 0.1704
RMSE 0.1447 0.1674
RMSE 0.1373 0.1585
RMSE 0.1271 0.1448
RMSE 0.1216 0.1331
59
b
Table 10: Performance of β b b b
CUE,LD , β CU E2,AS , β CU E2,LD , and β CU E2,BB for T = 5
N = 100 b
β b
β b
β b
β b
β
CUE,LD CU E,BB CU E2,AS CUE2,LD CUE2,BB
β = .75 Median % Bias 7.4700 1.2705 6.6814 4.2643 2.0471

Interquartile Range 0.3067 0.1480 0.2864 0.2911 0.2456
Mean % Bias 11.5527 0.4852 -296.6631 1250.1149 -136.4730
RMSE 0.2132 0.1050 152.2249 676.8912 117.9397
β = .8 Median % Bias 8.6510 1.2629 4.7391 1.6364 0.6595
Mean % Bias 10.4126 -0.0913 33.6498 -15.0393 -125.6554
RMSE 0.2048 0.1092 29.2436 12.6828 74.6934
β = .85 Median % Bias 7.5588 1.9808 0.9468 -2.2980 -1.0482
Mean % Bias 7.9833 0.3824 -100.7981 -161.9267 6.4686
RMSE 0.1947 0.1225 28.4546 25.6932 23.8489
β = .9 Median % Bias 6.1147 3.0423 -4.2248 -16.4693 -6.9530
Mean % Bias 6.1387 1.2087 -30.2898 -177.0842 495.0171
RMSE 0.1803 0.1344 24.6733 131.8341 193.9465
β = .95 Median % Bias 3.1365 3.4897 -17.7102 -129.4765 -21.5058
Mean % Bias 3.1244 1.0877 -290.6542 -42.6293 -32.0973
RMSE 0.1655 0.1347 166.1415 67.0635 98.0361
60
b
Table 11: Performance of β b b b
CUE,LD , β CU E2,AS , β CU E2,LD , and β CU E2,BB for T = 5
N = 200 b
β b
β b
β b
β b
β
CUE,LD CU E,BB CU E2,AS CUE2,LD CUE2,BB
β = .75 Median % Bias 4.2172 0.4242 3.4952 4.0943 1.2644

Mean % Bias 8.8638 0.1604 116.0861 29.5421 4.4327
RMSE 0.1704 0.0719 85.9117 10.6641 4.5595
β = .8 Median % Bias 5.4765 0.5388 5.8182 5.3421 0.8893
Mean % Bias 8.3701 -0.1898 16.9181 -233.6177 -21.1440
RMSE 0.1674 0.0736 13.7393 127.6513 9.5825
β = .85 Median % Bias 5.8672 0.6441 5.1660 3.7226 1.0708
Mean % Bias 7.3021 -0.7076 688.7913 -50.0828 59.6610
RMSE 0.1585 0.0779 455.6137 19.8972 66.8390
β = .9 Median % Bias 5.3657 0.9204 0.9958 -2.8551 -1.0774
Mean % Bias 5.4221 -0.3893 479.7193 31.6093 -29.9555
RMSE 0.1448 0.0913 381.8271 23.2206 42.2422
β = .95 Median % Bias 2.8984 2.5208 -11.2026 -125.6884 -12.6677
Mean % Bias 2.8482 0.9733 -82.2370 -6464.7709 -177.6883
RMSE 0.1331 0.1044 39.5396 4315.6181 116.8096
61

SSRN Id276592

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

SSRN Id276592

Uploaded by

Copyright:

Available Formats

Massachusetts Institute of Technology

BIAS CORRECTED INSTRUMENTAL

Working Paper 01-24

This paper can be downloaded without charge from the

Jinyong Hahn Jerry Hausman Guido Kuersteiner

Acknowledgment: We are grateful to Yu Hu for exceptional research assistance. Helpful

JEL: C13, C23, C51

2 Review of the Bias of GMM Estimator

yit = αi + βyi,t−1 + εit , i = 1, . . . , n; t = 1, . . . , T (1)

yit − yi,t−1 = β (yi,t−1 − yi,t−2 ) + (εit − εi,t−1 )

where the instruments are based on the orthogonality

E [yi,s (εit − εi,t−1 )] = 0 s = 0, . . . , t − 2.

where x∗t ≡ yi,t−1

3 Bias Correction using Alternative Asymptotics

We also assume stationarity on yi,0 and normality on αi 2 :

In order to guarantee that Zt0 Zt is nonsingular, we will assume that

Alvarez and Arellano (1998) show that, under Conditions 1 - 3,

from which the conclusion follows.

4 Bias Correction using Higher Order Expansions

b = argmin Qn (c) (7)

Our asymptotic approximation of the second order bias of b is based on an approximate

Condition 8 is an identification condition that guarantees the existence of a unique inte-

Proof. See Appendix B.

Theorem 3 Under Conditions 4-9, the second order bias of b is equal to

Proof. See Appendix B.

Proof. See Appendix C.

where e∗it ≡ yit ∗ − x∗ b zz zx εxzz and

in the numerator of B3 is equal to zero for s < t, we may instead consider

5 Long Diﬀerence Specification: Finite Iteration

yiT − yi1 = β (yiT −1 − yi0 ) + (εiT − εi1 ) (14)

1 (number of instruments) × (“covariance”)

For the RHS variables it uses the instrument equation:

5.2 Monte Carlo

6 Near Unit Root Approximation

For our purposes it will be convenient to choose A such that

Lemma 2 Assume β n = exp(−c/n) for some c > 0. For T fixed and as n → ∞

and Σ12 = Σ021 . We also have

Proof. See Appendix D.

Corollary 1 Let β 2SLS − β n be given by (18). If Condition 10 is satisfied then

d ξ 0x C1 (C10 Σ22 C1 )+ C10 ξ y

d (µ + ξx )0 C1 (C10 Σ22 C1 )−1 C10 ξ y

Proof. See Appendix D.

min |E [X(C1 , IT −1 )]| = min |E [X(C1 , Σ22 )]| .

yiT − yi1 = β(yiT −1 − yi0 ) + εiT − εi1

A Technical Details for Section 3

Lemma 5 Suppose that s < t. We have

Now, it can be shown that12

With (21), we obtain the first conclusion.

and the third term can be bounded above in absolute value by

Using the assumption that T /n = O (1), we obtain the desired conclusion.

by adopting the same proof as in Lemma 6.

Conclusion follows from

Proof. Note that

By conditioning, it can be shown that

Using (20), we can show that

B Technical Details for Section 4

Ψ ≡ 3λ01 Λ−1 λ2 − 3λ01 Λ−1 Λ1 Λ−1 λ1 ,

g (b)0 G (b)−1 g (b) − λ (b)0 G (b)−1 λ (b) − g (β)0 G (b)−1 g (β)

h1 = g20 G−1 g − g10 G−1 G1 G−1 g + g10 G−1 g1 ,

h3 = g10 G−1 G1 G−1 g − g 0 G−1 G1 G−1 G1 G−1 g + g 0 G−1 G2 G−1 g

We may therefore rewrite the score Sn (b) as

based on which we obtain (9). Noting that

we obtain the desired conclusion.