
Journal of Econometrics 157 (2010) 53–67


Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances

Harry H. Kelejian, Ingmar R. Prucha*
Department of Economics, University of Maryland, College Park, MD 20742, United States

Article history: Available online 30 October 2009

JEL classification: C21; C31

Keywords: Spatial dependence; Heteroskedasticity; Cliff–Ord model; Two-stage least squares; Generalized moments estimation; Asymptotics

Abstract: This study develops a methodology of inference for a widely used Cliff–Ord type spatial model containing spatial lags in the dependent variable, exogenous variables, and the disturbance terms, while allowing for unknown heteroskedasticity in the innovations. We first generalize the GMM estimator suggested in Kelejian and Prucha (1998, 1999) for the spatial autoregressive parameter in the disturbance process. We also define IV estimators for the regression parameters of the model and give results concerning the joint asymptotic distribution of those estimators and the GMM estimator. Much of the theory is kept general to cover a wide range of settings.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

In recent years the economics literature has seen an increasing number of theoretical and applied econometric studies involving spatial issues.^1 While this increase in interest in spatial models in economics is relatively recent, spatial models have a long history in the regional science and geography literature.^2 One of the most widely referenced models of spatial interactions is the one put forth by Cliff and Ord (1973, 1981). This model is a variant of the model considered by Whittle (1954). In its simplest (and original) form the model only considers spatial spillovers in the dependent variable, and specifies the endogenous variable corresponding to a cross sectional unit in terms of a weighted average of endogenous variables corresponding to other cross sectional units, plus a disturbance term. This model is typically referred to as a spatial autoregressive model, the weighted average is typically referred to as a spatial lag, the corresponding parameter as the autoregressive parameter, and the matrix containing the weights as the spatial weights matrix. A generalized version of this model also allows for the dependent variable to depend on a set of exogenous variables and spatial lags thereof. A further generalization allows for the disturbances to be generated by a spatial autoregressive process. Consistent with the terminology developed by Anselin and Florax (1995) we refer to the combined model as a spatial autoregressive model with autoregressive disturbances of order (1, 1), for short SARAR(1, 1). We note that this model is fairly general in that it allows for spatial spillovers in the endogenous variables, exogenous variables and disturbances.

Somewhat surprisingly, even though the SARAR(1, 1) model has been a modeling tool for many years, until recently there has been a lack of formal results concerning estimation methods for this model. One method that has been employed to estimate this model is the (quasi) maximum likelihood (ML) procedure, where the likelihood function corresponds to the normal distribution. Formal results concerning the asymptotic properties of the ML estimator have been established only recently in an important contribution by Lee (2004). Given that the likelihood function involves the determinant of a matrix whose dimensions depend on the sample size and an unknown parameter, there can be significant difficulties in the practical computation of this estimator, especially if the sample size is large, as it might be if the spatial units relate to counties, single family houses, etc.

* Corresponding author. Tel.: +1 301 405 3499. E-mail address: prucha@econ.umd.edu (I.R. Prucha).

^1 Some recent applications of spatial models are, e.g., Audretsch and Feldmann (1996), Baltagi et al. (2007a), Bell and Bockstael (2000), Besley and Case (1995), Bertrand et al. (2000), Case (1991), Case et al. (1993), Cohen and Morrison Paul (2004), Hanushek et al. (2003), Holtz-Eakin (1994), Sacerdote (2001), Shroder (1995), and Topa (2001). Contributions to the theoretical econometric literature include, e.g., Baltagi and Li (2004, 2001a,b), Baltagi et al. (2007b, 2003), Bao and Ullah (2007), Conley (1999), Das et al. (2003), Driscoll and Kraay (1998), Kapoor et al. (2007), Kelejian and Prucha (2007a, 2004, 2002, 2001, 1999, 1998), Korniotis (2005), Lee (2007a,b, 2004, 2003, 2002), Pinkse and Slade (1998), Pinkse et al. (2002), Robinson (2007a, forthcoming), and Su and Yang (2007).

^2 See, e.g., Anselin (1988), Bennett and Hordijk (1986), Cliff and Ord (1973, 1981), and Cressie (1993) and the references cited therein.

In part because of this, Kelejian and Prucha (1999) introduced a generalized moments (GM) estimator for the autoregressive parameter of the disturbance process that is simple to compute and remains computationally feasible even for large sample sizes. In Kelejian and Prucha (1998) we used that GM estimator to introduce a generalized spatial two stage least squares estimator (GS2SLS) for the regression parameters of the spatial SARAR(1, 1) model that is again simple to compute, and demonstrated its consistency and asymptotic normality.^3

^3 The formulation of the GS2SLS estimator is based on an approximation of the ideal instruments. Recently Lee (2003) and Kelejian et al. (2004) extended the analysis to include the use of ideal instruments. Das et al. (2003) analyzed the small sample properties of the GS2SLS (as well as those of other estimators). They find that in many situations the loss of efficiency due to the approximation of the ideal instruments is minor.

All of the above estimators for the SARAR(1, 1) model were introduced and their asymptotic properties were derived under the assumption that the innovations in the disturbance process are homoskedastic. The lack of an estimation theory that allows for heteroskedasticity, and the lack of corresponding joint hypothesis tests for the presence of spatial dependencies in the endogenous variables, exogenous variables and/or disturbances, is a serious shortcoming. Spatial units are often heterogeneous in important characteristics, e.g., size, and hence the homoskedasticity assumption may not hold in many situations (conditionally and unconditionally). It is readily seen that if the innovations are heteroskedastic, the ML estimator considered in Lee (2004) is inconsistent, and the asymptotic distribution given in Kelejian and Prucha (1998) for the GS2SLS estimator is not appropriate. One important goal of this study is therefore to develop a methodology of inference for the SARAR(1, 1) model that allows for heteroskedastic innovations. In developing this theory we adopt a modular approach such that much of the theory not only applies to the SARAR(1, 1) model, but can also be utilized in different settings in future research.

In more detail, in this paper we introduce a new class of GM estimators for the autoregressive parameter of a spatially autoregressive disturbance process that allows for heteroskedastic innovations. Our GM estimators are again computationally simple even in large samples. We establish their consistency; unlike in our earlier paper we also derive, under reasonably general conditions, their asymptotic distribution. Loosely speaking, in deriving those results we essentially only maintain that the disturbances are n^{1/2}-consistently estimated (where n is the sample size) and that the estimator of the model parameters employed in estimating the disturbances is asymptotically linear in the innovations. As a result the methodology developed in this paper covers a wide range of (linear and nonlinear) models and estimators, in addition to the SARAR(1, 1) model and estimators specific to that model. We furthermore derive results concerning the joint distribution of the GM estimators and estimators of the regression parameters to facilitate joint tests. While the results are presented for the case of two step estimation procedures where the spatial autoregressive parameter and the regression parameters are estimated in separate steps, the analysis can be readily adapted to one step procedures where all parameters are estimated in a single (but numerically more involved) step.

The general theory is then applied to develop inference methodology for the SARAR(1, 1) model. In particular, we use the GM estimator in constructing a GS2SLS estimator for the regression parameters of the SARAR(1, 1) model and demonstrate the consistency and asymptotic normality of this estimator. We also provide results concerning the joint distribution of the GM estimator and the GS2SLS estimator, which permits, among other things, testing the joint hypothesis of the absence of spatial spillovers stemming from the endogenous variables, exogenous variables or disturbances.

Another concern with the existing literature on Cliff–Ord type models, including the above cited literature on SARAR(1, 1) models, is the specification of the parameter space for the spatial autoregressive parameters. In virtually all of the literature it is assumed that the parameter space for autoregressive parameters is the interval (−1, 1), or a subset thereof. One may conjecture that this traditional specification of the parameter space received its motivation from the time series literature. However, as discussed in detail below, choosing the interval (−1, 1) as the parameter space for the autoregressive parameter of a spatial model is not natural in the sense that the spatial autoregressive parameter always appears in those models in product form with the spatial weights matrix. Hence equivalent model formulations are obtained by applying an (arbitrary) scale factor to the autoregressive parameter and its inverse to the weights matrix. Of course, applying a scale factor to the autoregressive parameter leads to a corresponding re-scaling of its parameter space. In this paper we therefore allow for a more general specification of the parameter space. Even if a scale factor is used that results in the parameter space being the interval (−1, 1), this scale factor and correspondingly the autoregressive parameter will then typically depend on the sample size. In contrast to the existing literature we thus allow for the parameters to depend on the sample size. Our discussion of the parameter space and possible normalizations of the spatial weights matrix also points out potential pitfalls with the frequently used approach of row-normalizing the spatial weights matrix.

The paper is organized as follows: The generalized SARAR(1, 1) model is specified and interpreted in Section 2. This section also contains a discussion of the parameter space of the autoregressive parameter. In Section 3 we define and establish the large sample properties of our suggested GM estimators for the autoregressive parameter of a spatially autoregressive disturbance process. In this section we also provide results concerning the joint large sample distribution of the GM estimators and a wide class of estimators of the regression parameters. We also develop HAC type estimators for the large sample variance-covariance matrix of the suggested estimators. Section 4 contains results relating to the suggested instrumental variable estimators of the regression parameters of the SARAR(1, 1) model and their joint large sample distribution with the GM estimators. Concluding remarks are given in Section 5. Technical details are relegated to the Appendices.

It proves helpful to introduce the following notation: Let A_n with n ∈ N be some matrix; we then denote the (i, j)-th element of A_n as a_{ij,n}. Similarly, if v_n is a vector, then v_{i,n} denotes the i-th element of v_n. An analogous convention is adopted for matrices and vectors that do not depend on the index n, in which case the index n is suppressed on the elements. If A_n is a square matrix, then A_n^{-1} denotes the inverse of A_n. If A_n is singular, then A_n^{-1} should be interpreted as the generalized inverse of A_n. At times it will also be helpful to denote the generalized inverse more explicitly as A_n^{+}. With a_{i.,n} and a_{.i,n} we denote the i-th row and column of A_n, respectively, and analogously for the rows and columns of A_n^{-1}. If A_n is a square symmetric nonnegative definite matrix, then A_n^{1/2} denotes the unique symmetric and nonnegative definite square root of A_n. If A_n is nonsingular, then A_n^{-1/2} denotes (A_n^{-1})^{1/2}. Further, we say the row and column sums of the (sequence of) matrices A_n are bounded uniformly in absolute value if there exists a constant c_A < ∞ (which does not depend on n) such that

max_{1≤i≤n} Σ_{j=1}^{n} |a_{ij,n}| ≤ c_A  and  max_{1≤j≤n} Σ_{i=1}^{n} |a_{ij,n}| ≤ c_A  for all n ∈ N

holds. As a point of interest, we note that the above condition is identical to the condition that the sequences of the maximum column sum matrix norms and maximum row sum matrix norms of A_n are bounded; cp. Horn and Johnson (1985, pp. 294–5). For definiteness, let A be some vector or matrix; then ‖A‖ = [Tr(A'A)]^{1/2}. We note that this norm is submultiplicative, i.e., ‖AB‖ ≤ ‖A‖ ‖B‖.
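These two quantities are easy to evaluate numerically for any given matrix. The following small sketch (our own helper functions, assuming numpy; the example matrix is arbitrary) computes the maximum absolute row and column sums appearing in the boundedness condition and the norm ‖A‖ = [Tr(A'A)]^{1/2}:

```python
import numpy as np

def max_abs_row_col_sums(A):
    """Return (max absolute row sum, max absolute column sum) of A,
    i.e. the two quantities bounded by c_A in the text."""
    abs_A = np.abs(A)
    return abs_A.sum(axis=1).max(), abs_A.sum(axis=0).max()

def frobenius_norm(A):
    """The norm ||A|| = [Tr(A'A)]^{1/2} used in the paper."""
    return np.sqrt(np.trace(A.T @ A))

# purely illustrative 5 x 5 example
A = np.arange(25, dtype=float).reshape(5, 5) / 25.0
print(max_abs_row_col_sums(A), frobenius_norm(A))
```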

2. Model

In this section we specify the generalized SARAR(1, 1) model and discuss the underlying assumptions.

2.1. Specification

Suppose a cross section of n spatial units is observed, and the interactions between those spatial units can be described by the following model:

y_n = X_n β_n + λ_n W_n y_n + u_n = Z_n δ_n + u_n    (1)

and

u_n = ρ_n M_n u_n + ε_n,    (2)

with Z_n = [X_n, W_n y_n] and δ_n = [β_n', λ_n]'. Here y_n denotes the n × 1 vector of observations of the dependent variable, X_n denotes the n × k matrix of non-stochastic (exogenous) regressors, W_n and M_n are n × n non-stochastic matrices, u_n denotes the n × 1 vector of regression disturbances, ε_n is an n × 1 vector of innovations, λ_n and ρ_n are unknown scalar parameters, and β_n is a k × 1 vector of unknown parameters. The matrices W_n and M_n are typically referred to as spatial weight matrices, and λ_n and ρ_n are typically called spatial autoregressive parameters. The analysis allows for W_n = M_n, which will frequently be the case in applications. All quantities are allowed to depend on the sample size.

The vectors W_n y_n and M_n u_n are typically referred to as spatial lags of y_n and u_n, respectively. We note that all quantities are allowed to depend on the sample size and so some of the exogenous regressors may be spatial lags of exogenous variables. Thus the model is fairly general in that it allows for spatial spillovers in the endogenous variables, exogenous variables and disturbances.

The spatial weight matrices and the autoregressive parameters are assumed to satisfy the following assumption.

Assumption 1. (a) All diagonal elements of W_n and M_n are zero. (b) λ_n ∈ (−a_n^λ, a_n^λ), ρ_n ∈ (−a_n^ρ, a_n^ρ) with 0 < a_n^λ ≤ a^λ < ∞ and 0 < a_n^ρ ≤ a^ρ < ∞. (c) The matrices I_n − λW_n and I_n − ρM_n are nonsingular for all λ ∈ (−a_n^λ, a_n^λ) and ρ ∈ (−a_n^ρ, a_n^ρ).

Assumption 1(a) is clearly a normalization rule. Assumption 1(b) concerning the parameter space of λ_n and ρ_n will be discussed in the next subsection. Assumption 1(c) ensures that y_n and u_n are uniquely defined by (1) and (2) as

y_n = (I_n − λ_n W_n)^{-1} X_n β_n + (I_n − λ_n W_n)^{-1} u_n,    (3)
u_n = (I_n − ρ_n M_n)^{-1} ε_n.

As remarked in Section 1, spatial units are often heterogeneous in important characteristics, e.g., size. For that reason it is important to develop an estimation theory that allows for the innovations to be heteroskedastic. Therefore, we maintain the following set of assumptions with respect to the innovations.

Assumption 2. The innovations {ε_{i,n} : 1 ≤ i ≤ n, n ≥ 1} satisfy E ε_{i,n} = 0, E(ε_{i,n}^2) = σ_{i,n}^2 with 0 < a_σ ≤ σ_{i,n}^2 ≤ a^σ < ∞, and sup_{1≤i≤n, n≥1} E|ε_{i,n}|^{4+η} < ∞ for some η > 0. Furthermore, for each n ≥ 1 the random variables ε_{1,n}, . . . , ε_{n,n} are totally independent.

The above assumption also allows for the innovations to depend on the sample size n, i.e., to form triangular arrays. We note that even if the innovations do not depend on n, the elements of y_n and u_n would still depend on n in light of (3) since the elements of the inverse matrices involved would generally depend on n. We maintain the following assumption concerning the spatial weight matrices.

Assumption 3. The row and column sums of the matrices W_n, M_n, (I_n − λ_n W_n)^{-1} and (I_n − ρ_n M_n)^{-1} are bounded uniformly in absolute value.

Given (3), Assumption 2 implies that E u_n = 0, and that the VC matrix of u_n is given by

E u_n u_n' = (I_n − ρ_n M_n)^{-1} Σ_n (I_n − ρ_n M_n')^{-1}

where Σ_n = diag_{i=1}^{n}(σ_{i,n}^2). This specification allows for fairly general patterns of autocorrelation and heteroskedasticity of the disturbances. It is readily seen that the row and column sums of products of matrices, whose row and column sums are bounded uniformly in absolute value, are again uniformly bounded in absolute value; see, e.g., Kelejian and Prucha (2004, Remark A.1). Because of this, Assumptions 2 and 3 imply that the row and column sums of the variance-covariance (VC) matrix of u_n (and similarly those of y_n) are uniformly bounded in absolute value, thus limiting the degree of correlation between, respectively, the elements of u_n (and of y_n). That is, making an analogy to the time series literature, these assumptions ensure that the disturbance process and the process for the dependent variable exhibit a ''fading'' memory.^4

^4 Of course, the extent of correlation is limited in virtually all large sample analysis; see, e.g., Amemiya (1985, ch. 3, 4) and Pötscher and Prucha (1997, ch. 5, 6).

2.2. Parameter space for the autoregressive parameter

Assumption 1(b) defines the parameter space for the autoregressive parameters. In discussing this assumption we focus on W_n and λ_n. (An analogous discussion applies to M_n and ρ_n.) In the existing literature relating to Cliff–Ord models the parameter space for the autoregressive parameter is typically taken to be the interval (−1, 1) and the autoregressive parameter is assumed not to depend on the sample size. However, in applications it is typically found that for un-normalized spatial weight matrices, I_n − λW_n is singular for some values of λ ∈ (−1, 1). To avoid this situation, many applied researchers normalize each row of their spatial weight matrices in such a way that I_n − λW_n is non-singular for all λ ∈ (−1, 1). We now discuss the implications of various normalizations of the spatial weight matrix.

Suppose c_n denotes a scalar normalization factor. Clearly, this normalization factor may depend on the sample size. For example, some of our results below relate to the case in which c_n corresponds to the maximal row or column sum of the absolute values of the elements of W_n. Given such a normalizing factor, an equivalent specification of model (1) for y_n is obtained if λ_n W_n is replaced by λ_n* W_n* where λ_n* = c_n λ_n and W_n* = W_n / c_n. It is important to observe that even if λ_n and its corresponding parameter space do not depend on n, λ_n* and its implied parameter space will depend on the sample size as a result of the normalization of the spatial weight matrix.^5 It is for this reason that we allow in Assumption 1 for the elements of the spatial weight matrices, and the autoregressive parameters and the corresponding parameter spaces, to depend on n. Of course, Assumption 1 also covers the case where the true data generating process corresponds to a model where the autoregressive parameters do not depend on n.

^5 The parameter space for λ_n* is given by (−c_n a_n^λ, c_n a_n^λ).
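To fix ideas, the sketch below simulates data from the SARAR(1, 1) process (1)-(3) with independent heteroskedastic innovations as in Assumption 2, using a single weights matrix W_n = M_n scaled by its maximal absolute row sum (one possible normalization factor c_n as discussed above). The weights design, the variance pattern and the parameter values are purely illustrative assumptions and not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_weights(n):
    """Illustrative 'two-ahead/two-behind' weights with zero diagonal,
    scaled by the maximal absolute row sum (a normalization factor c_n)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1, i - 2, i + 2):
            if 0 <= j < n:
                W[i, j] = 1.0
    c_n = np.abs(W).sum(axis=1).max()
    return W / c_n                               # W*_n = W_n / c_n

def simulate_sarar(n, beta, lam, rho):
    """Draw (y_n, u_n) from (1)-(3) with independent heteroskedastic innovations."""
    W = M = make_weights(n)
    X = np.column_stack([np.ones(n), rng.uniform(size=n)])
    sigma2 = 0.5 + 1.5 * X[:, 1]                 # illustrative variance pattern
    eps = rng.normal(scale=np.sqrt(sigma2))
    I = np.eye(n)
    u = np.linalg.solve(I - rho * M, eps)        # u_n = (I - rho_n M_n)^{-1} eps_n
    y = np.linalg.solve(I - lam * W, X @ beta + u)   # reduced form (3)
    return y, X, u, W, M

y, X, u, W, M = simulate_sarar(n=400, beta=np.array([1.0, 1.0]), lam=0.4, rho=0.5)
```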

Assumption 1 defines the parameter space for λ_n as an interval around zero such that I_n − λW_n is nonsingular for values of λ in that interval. The following trivial lemma gives bounds for that interval.

Lemma 1. Let τ_n denote the spectral radius of W_n, i.e.,

τ_n = max{|ν_{1,n}|, . . . , |ν_{n,n}|}

where ν_{1,n}, . . . , ν_{n,n} denote the eigenvalues of W_n. Then I_n − λW_n is nonsingular for all values of λ in the interval (−1/τ_n, 1/τ_n).^6

^6 In some of the spatial literature the following closely related claim can be found: I_n − λW_n is nonsingular for all values of λ in the interval (1/ν_{n,min}, 1/ν_{n,max}), where ν_{n,min} and ν_{n,max} denote the smallest and largest eigenvalue of W_n, respectively. This claim is correct for the case in which all eigenvalues of W_n are real and ν_{n,min} < 0 and ν_{n,max} > 0. Since, e.g., the eigenvalues of W_n need not be real, this claim does not hold in general.

Clearly, if we select (−1/τ_n, 1/τ_n) as the parameter space for λ_n, then all eigenvalues of λ_n W_n are less than one in absolute value. Thus if we interpret (1) as an equilibrium relationship, then this choice of the parameter space rules out unstable Nash equilibria. Of course, we obtain an equivalent specification of the model if instead of working with W_n we work with the normalized weights matrix W_n* = W_n/τ_n and select the interval (−1, 1) as the parameter space for λ_n* = λ_n τ_n. Assumption 1 is sufficiently general to cover both cases.

For large sample sizes the computation of the eigenvalues of W_n is difficult. The following lemma gives boundaries, which are simple to compute, for a (sub)set of values of λ for which I_n − λW_n is nonsingular.

Lemma 2. Let

τ_n* = min{ max_{1≤i≤n} Σ_{j=1}^{n} |w_{ij,n}|, max_{1≤j≤n} Σ_{i=1}^{n} |w_{ij,n}| }.

Then τ_n ≤ τ_n* and consequently I_n − λW_n is nonsingular for all values of λ in the interval (−1/τ_n*, 1/τ_n*).

The above lemma suggests (−1/τ_n*, 1/τ_n*) as an alternative (although somewhat more restrictive) specification of the parameter space. Of course, we obtain an equivalent model specification if we normalize the spatial weight matrix by τ_n* and if we choose (−1, 1) as the parameter space for the autoregressive parameter. Since the spectral radius is bounded by any matrix norm, other norms in place of the maximum absolute row and column sum norms could be used, but τ_n* is especially easy to compute.

Rather than normalizing W_n by τ_n or τ_n*, in much of the empirical literature the spatial weight matrices are normalized such that each row sums to unity. The motivation for this normalization is that if W_n is row-normalized then I_n − λW_n is nonsingular for all values of λ in the interval (−1, 1); this can be readily confirmed via Lemma 2. However, this normalization is quite different from those described above in that in row-normalizing a matrix one does not use a single normalization factor, but rather a different factor for the elements of each row. Therefore, in general, there exists no corresponding re-scaling factor for the autoregressive parameter that would lead to a specification that is equivalent to that corresponding to the un-normalized weight matrix. Consequently, unless theoretical issues suggest a row-normalized weight matrix, this approach will in general lead to a misspecified model.

The above discussion provides the motivation for our specification that the autoregressive parameters may depend on n. Furthermore, since some of the regressors may be spatial lags, we allow all of the model parameters to depend on the sample size.

3. GM estimator for the autoregressive parameter ρ_n

In the following we introduce a class of GM estimators for ρ_n that can be easily computed, and prove their consistency and asymptotic normality under a set of general assumptions. We note that the discussion in this section only maintains model (2) for the disturbances u_n, but not necessarily (1) for y_n. Thus the results will also be useful in other settings such as cases where y_n is determined by a nonlinear model; see, e.g., Kelejian and Prucha (2001, p. 228). The estimators put forth below generalize the GM estimator for ρ_n introduced in Kelejian and Prucha (1999). In contrast to that earlier paper we now allow for heteroskedastic innovations ε_{i,n}, and optimal weighting of the moment conditions. We also do not confine the parameter space for ρ_n to be the interval (−1, 1), and allow ρ_n to depend on n. In our earlier paper we only demonstrated the consistency of the estimator. In the following we also derive the asymptotic distribution of the considered estimators.

3.1. Definition of the GM estimator for ρ_n

In the following let ũ_n denote some predictor of u_n. Furthermore, for notational convenience let ū_n = M_n u_n and ū̄_n = M_n ū_n = M_n^2 u_n, and similarly let ε̄_n = M_n ε_n. It is readily seen that under Assumptions 1 and 2 we have the following moment conditions:

n^{-1} E(ε̄_n' ε̄_n) = n^{-1} Tr[ M_n diag_{i=1}^{n}(E ε_{i,n}^2) M_n' ],    (4)
n^{-1} E(ε̄_n' ε_n) = 0.

It proves convenient to rewrite these conditions as

n^{-1} E [ ε_n' A_{1,n} ε_n , ε_n' A_{2,n} ε_n ]' = 0    (5)

with

A_{1,n} = M_n' M_n − diag_{i=1}^{n}(m_{.i,n}' m_{.i,n}),   A_{2,n} = M_n.

Under Assumptions 1 and 3 it is readily seen that the diagonal elements of A_{1,n} and A_{2,n} are zero and that the row and column sums of A_{1,n} and A_{2,n} are bounded uniformly in absolute value; see, e.g., Remark A.1 in Kelejian and Prucha (2004).

Our GM estimators for ρ_n are based on these moments. Specifically, note that in light of (2), ε_n = (I_n − ρ_n M_n) u_n = u_n − ρ_n ū_n and so ε̄_n = ū_n − ρ_n ū̄_n. Substituting these expressions into (4) or (5) yields the following two equation system:

γ_n − Γ_n α_n = 0    (6)

where α_n = [ρ_n, ρ_n^2]' and the elements of Γ_n = (γ_{rs,n})_{r,s=1,2} and γ_n = [γ_{1,n}, γ_{2,n}]' are given by

γ_{11,n} = 2n^{-1} E{ ū_n' ū̄_n − Tr[ M_n diag_{i=1}^{n}(u_{i,n} ū_{i,n}) M_n' ] } = 2n^{-1} E u_n' M_n' A_{1,n} u_n,    (7)
γ_{12,n} = −n^{-1} E{ ū̄_n' ū̄_n − Tr[ M_n diag_{i=1}^{n}(ū_{i,n}^2) M_n' ] } = −n^{-1} E u_n' M_n' A_{1,n} M_n u_n,
γ_{21,n} = n^{-1} E( ū_n' ū_n + ū̄_n' u_n ) = n^{-1} E u_n' M_n' (A_{2,n} + A_{2,n}') u_n,
γ_{22,n} = −n^{-1} E( ū̄_n' ū_n ) = −n^{-1} E u_n' M_n' A_{2,n} M_n u_n,
γ_{1,n} = n^{-1} E{ ū_n' ū_n − Tr[ M_n diag_{i=1}^{n}(u_{i,n}^2) M_n' ] } = n^{-1} E u_n' A_{1,n} u_n,
γ_{2,n} = n^{-1} E( ū_n' u_n ) = n^{-1} E u_n' A_{2,n} u_n.

Now let Γ̃_n = (γ̃_{rs,n})_{r,s=1,2} and γ̃_n = [γ̃_{1,n}, γ̃_{2,n}]' denote corresponding estimators for the elements of Γ_n and γ_n, which are obtained from the above expressions for the elements of Γ_n and γ_n by suppressing the expectations operator, and replacing the disturbances u_n, ū_n, and ū̄_n by the quantities based on the predictor, i.e., ũ_n, M_n ũ_n, and M_n^2 ũ_n, respectively.
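As a computational illustration, the following sketch (our own helper, assuming numpy) forms the sample analogues Γ̃_n and γ̃_n by dropping the expectations in (7) and replacing u_n, ū_n, ū̄_n with quantities based on a given predictor ũ_n; the spatial lags are computed recursively so that M_n^2 is never formed explicitly:

```python
import numpy as np

def gm_sample_moments(u_tilde, M):
    """Sample analogues Gamma_tilde (2x2) and gamma_tilde (2,) of the system (6),
    obtained from (7) by dropping expectations and replacing the disturbances and
    their spatial lags with quantities based on the predictor u_tilde."""
    n = u_tilde.shape[0]
    ub = M @ u_tilde                    # first spatial lag of the predictor
    ubb = M @ ub                        # second spatial lag (recursive, avoids M @ M)
    d = (M ** 2).sum(axis=0)            # diagonal of M'M, i.e. m'_{.i} m_{.i}
    G = np.array([
        [2.0 * (ub @ ubb - d @ (u_tilde * ub)), -(ubb @ ubb - d @ (ub * ub))],
        [ub @ ub + ubb @ u_tilde,               -(ubb @ ub)],
    ]) / n
    g = np.array([ub @ ub - d @ (u_tilde ** 2), ub @ u_tilde]) / n
    return G, g
```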

Then, the empirical analog of the relationship in (6) is

γ̃_n − Γ̃_n α_n = υ_n,    (8)

where υ_n can be viewed as a vector of regression residuals. Our GM estimators of ρ_n, say ρ̃_n, are now defined as weighted nonlinear least squares estimators based on (8). That is, let Υ̃_n be a 2 × 2 symmetric positive semidefinite (moments) weighting matrix; then ρ̃_n is defined as

ρ̃_n = ρ̃_n(Υ̃_n) = argmin_{ρ ∈ [−a^ρ, a^ρ]} { [γ̃_n − Γ̃_n (ρ, ρ^2)']' Υ̃_n [γ̃_n − Γ̃_n (ρ, ρ^2)'] }.    (9)

We note that the objective function for ρ̃_n remains well defined even for values of ρ_n for which I_n − ρ_n M_n is singular, which allows us to take the optimization space for ρ̃_n to be any compact interval that contains the true parameter space. For computational efficiency it is best to use the formulae for the elements of Γ̃_n and γ̃_n corresponding to the first expression on the r.h.s. of (7) and to compute the spatial lags of ũ_n recursively as M_n ũ_n and M_n(M_n ũ_n). In this fashion one can avoid the computation of M_n^2, i.e., the computation of the product of two n × n matrices.

We now relate the above estimator for the autoregressive parameter to the GM estimator introduced in Kelejian and Prucha (1999). Under homoskedasticity, σ^2 = σ_{i,n}^2 and so E[n^{-1} ε_n' ε_n] = σ^2, E[n^{-1} ε̄_n' ε̄_n] = σ^2 n^{-1} Tr(M_n M_n'), and E[n^{-1} ε̄_n' ε_n] = 0. These three moment conditions underlie the GM estimator suggested in Kelejian and Prucha (1999). Substituting the first of these moment conditions into the second yields

E[n^{-1} ε̄_n' ε̄_n] = E[n^{-1} ε_n' ε_n] n^{-1} Tr(M_n M_n'),    (10)
E[n^{-1} ε̄_n' ε_n] = 0,

which is clearly a special case of (4) under homoskedasticity. It is not difficult to see that the GM estimator suggested in our previous paper can be viewed as being based on the two moment conditions in (10) with Υ_n = diag(υ_n, 1) and υ_n = 1/[1 + [n^{-1} Tr(M_n M_n')]^2].^7

^7 If we rewrite the moment conditions in (10) in the form corresponding to (5), then A_{1,n} = M_n' M_n − n^{-1} Tr(M_n M_n') I_n and A_{2,n} = M_n.

3.2. Consistency of the GM estimator for ρ_n

To establish consistency of ρ̃_n we postulate the following additional assumptions.

Assumption 4. Let ũ_{i,n} denote the i-th element of ũ_n. We then assume that

ũ_{i,n} − u_{i,n} = d_{i.,n} Δ_n

where d_{i.,n} and Δ_n are 1 × p and p × 1 dimensional random vectors. Let d_{ij,n} be the j-th element of d_{i.,n}. Then, we assume that for some δ > 0, E|d_{ij,n}|^{2+δ} ≤ c_d < ∞ where c_d does not depend on n, and that n^{1/2} ‖Δ_n‖ = O_p(1).

Assumption 5. (a) The smallest eigenvalue of Γ_n'Γ_n is uniformly bounded away from zero.^8 (b) Υ̃_n − Υ_n = o_p(1), where Υ_n are 2 × 2 non-stochastic symmetric positive definite matrices. (c) The largest eigenvalues of Υ_n are bounded uniformly from above, and the smallest eigenvalues of Υ_n are uniformly bounded away from zero.

^8 That is, λ_min(Γ_n'Γ_n) ≥ λ_γ > 0 where λ_γ does not depend on n. More specifically, in general Γ_n depends on M_n, ρ_n, σ_{1,n}^2, . . . , σ_{n,n}^2. Denoting this dependence as Γ_n = Γ_n(M_n, ρ_n, σ_{1,n}^2, . . . , σ_{n,n}^2), the assumption should be understood as postulating that inf_n λ_min( Γ_n(M_n, ρ_n, σ_{1,n}^2, . . . , σ_{n,n}^2)' Γ_n(M_n, ρ_n, σ_{1,n}^2, . . . , σ_{n,n}^2) ) > 0. In this sense the assumption allows for λ_γ to depend on the sequence of spatial weights matrices M_n, and on the true values of the autoregressive parameters ρ_n and variances.

Assumption 4 implies n^{-1} Σ_{i=1}^{n} ‖d_{i.,n}‖^{2+δ} = O_p(1), which was maintained in Kelejian and Prucha (1999), and so is slightly stronger than their assumption. Assumption 4 should be satisfied for typical linear spatial models where ũ_{i,n} is based on n^{1/2}-consistent estimators of the regression coefficients, d_{i.,n} denotes the i-th row of the regressor matrix, and Δ_n denotes the difference between the parameter estimator and the true parameter values. In the next section we will actually demonstrate that Assumption 4 holds for the estimated residuals of model (1) based on an instrumental variable procedure. Assumption 4 should also be satisfied for typical non-linear models provided the response function is differentiable in the parameters, and the derivatives are (uniformly over the parameter space) bounded by some random variable with bounded 2 + δ moments; compare Kelejian and Prucha (1999).

Assumption 5 ensures that the smallest eigenvalue of Γ_n' Υ_n Γ_n is uniformly bounded away from zero and will be sufficient to permit us to demonstrate that ρ_n is identifiably unique w.r.t. the nonstochastic analog of the objective function of the GM estimator. This analog is given by the function in curly brackets on the r.h.s. of (9) with γ̃_n, Γ̃_n and Υ̃_n replaced by γ_n, Γ_n and Υ_n. Under homoskedasticity and with Υ̃_n = Υ_n specified as at the end of the previous subsection this assumption is in essence equivalent to Assumption 5 in Kelejian and Prucha (1999).

Clearly Assumption 5 requires Γ_n to be nonsingular, or equivalently that [tr(M_n' A_{1,n} M_n S_{u,n}), tr(M_n' A_{2,n} M_n S_{u,n})]' is linearly independent of [tr(M_n'(A_{1,n} + A_{1,n}') S_{u,n}), tr(M_n'(A_{2,n} + A_{2,n}') S_{u,n})]', which is readily seen by observing that E u_n' M_n' A_{i,n} M_n u_n = tr(M_n' A_{i,n} M_n S_{u,n}) and E u_n' M_n'(A_{i,n} + A_{i,n}') u_n = tr(M_n'(A_{i,n} + A_{i,n}') S_{u,n}), where S_{u,n} = (I_n − ρ_n M_n)^{-1} Σ_n (I_n − ρ_n M_n')^{-1}. It is not difficult to see that this linear independence condition is an analog to the identification conditions postulated in Lee (2007b), Assumption 5(b), relating to quadratic forms under homoskedasticity. We note that while Assumption 5 should be satisfied in many settings, it does not cover situations where all elements of the spatial weights matrix converge to zero uniformly as n → ∞ (see Lee (2004)), since in this case the elements of Γ_n would tend to zero. On the other hand, Assumption 5 does not generally rule out settings where M_n is row normalized, there is an increasing number of nonzero elements in each row, and the row sums of the absolute values of the non-normalized elements are uniformly bounded.

The vector of derivatives (multiplied by minus one) of the moment conditions (6) w.r.t. ρ_n is given by J_n = Γ_n [1, 2ρ_n]'. As expected, the limiting distribution of the GM estimator ρ̃_n will be seen to depend on the inverse of J_n' Υ_n J_n. Assumption 5 also ensures that J_n' Υ_n J_n is nonsingular. Because of the equivalence of matrix norms it follows from Assumption 5 that the elements of Υ_n and Υ_n^{-1} are O(1).

We can now give our basic consistency result for ρ̃_n.

Theorem 1. Let ρ̃_n = ρ̃_n(Υ̃_n) denote the GM estimator defined by (9). Then, provided the optimization space contains the parameter space, and given Assumptions 1–5,

ρ̃_n − ρ_n →_p 0 as n → ∞.

Clearly the conditions of the theorem regarding Υ̃_n and Υ_n are satisfied for Υ̃_n = Υ_n = I_2. In this case the estimator reduces to the nonlinear least squares estimator based on (8). This estimator can, e.g., be used to obtain initial consistent estimates of the autoregressive parameter. Choices of Υ̃_n that lead to an efficient GM estimator for ρ_n (but require some consistent initial estimate of ρ_n) will be discussed below in conjunction with the asymptotic normality result.

3.3. Asymptotic distribution of the GM estimator for ρ_n

Let D_n = [d_{1.,n}', . . . , d_{n.,n}']' where d_{i.,n} is defined in Assumption 4, so that ũ_n − u_n = D_n Δ_n. To establish the asymptotic normality of ρ̃_n we need some additional assumptions.

Assumption 6. For any n × n real matrix A_n whose row and column sums are bounded uniformly in absolute value,

n^{-1} D_n' A_n u_n − n^{-1} E D_n' A_n u_n = o_p(1).

A sufficient condition for Assumption 6 is, e.g., that the columns of D_n are of the form π_n + Π_n ε_n, where the elements of π_n are bounded in absolute value and the row and column sums of Π_n are uniformly bounded in absolute value; see Lemma C.2. This will indeed be the case in many applications. In the next section we will verify that this assumption holds for the model given by (1) and (2), and where D_n equals the (negative of the) design matrix Z_n.

Assumption 7. Let Δ_n be as defined in Assumption 4. Then

n^{1/2} Δ_n = n^{-1/2} T_n' ε_n + o_p(1),

where T_n is an n × p dimensional real nonstochastic matrix whose elements are uniformly bounded in absolute value.

As remarked above, typically Δ_n denotes the difference between the parameter estimator and the true parameter values. Assumption 7 will be satisfied by many estimators. In the next section we verify that this assumption indeed holds for the considered instrumental variable estimators for the parameters of model (1).

It may be helpful to provide some insight concerning the variance of the limiting distribution of the GM estimator n^{1/2}(ρ̃_n − ρ_n) given below. To that effect we note that an inspection of the derivation of this limiting distribution in Appendix C shows that it depends on the limiting distribution of the (properly normalized) vector of linear quadratic forms

v_n = n^{-1/2} [ (1/2) ε_n'(A_{1,n} + A_{1,n}') ε_n + a_{1,n}' ε_n ,  (1/2) ε_n'(A_{2,n} + A_{2,n}') ε_n + a_{2,n}' ε_n ]'    (11)

where for r = 1, 2 the n × n matrices A_{r,n} are defined in (5), and where the n × 1 vectors a_{r,n} are defined as

a_{r,n} = T_n α_{r,n}    (12)

with

α_{r,n} = n^{-1} E[ D_n' (I_n − ρ_n M_n')(A_{r,n} + A_{r,n}')(I_n − ρ_n M_n) u_n ].

From (11) and (12) we see that, in general, the limiting distribution of n^{1/2}(ρ̃_n − ρ_n) will depend on the limiting distribution of n^{1/2} Δ_n via the matrix T_n, unless α_{r,n} = 0. Clearly, if D_n is not stochastic, then α_{r,n} = 0. Within the context of model (1) and with D_n equal to the (negative of the) design matrix Z_n this would be the case if the model does not contain a spatial lag of the endogenous variable.

Observing further that the diagonal elements of the matrices A_{r,n} are zero, it follows from Lemma A.1 that the VC matrix of the vector of linear quadratic forms in (11) is given by Ψ_n = (ψ_{rs,n}) where for r, s = 1, 2

ψ_{rs,n} = (2n)^{-1} tr[ (A_{r,n} + A_{r,n}') Σ_n (A_{s,n} + A_{s,n}') Σ_n ] + n^{-1} a_{r,n}' Σ_n a_{s,n}.    (13)

We now have the following result concerning the asymptotic distribution of ρ̃_n. We note that the theorem does not assume convergence of the matrices involved.

Theorem 2 (Asymptotic Normality). Let ρ̃_n be the weighted nonlinear least squares estimator defined by (9). Then, provided the optimization space contains the parameter space, given Assumptions 1–7, and given that λ_min(Ψ_n) ≥ c_Ψ* > 0, we have

n^{1/2}(ρ̃_n − ρ_n) = (J_n' Υ_n J_n)^{-1} J_n' Υ_n Ψ_n^{1/2} ξ_n + o_p(1)    (14)

where

J_n = Γ_n [1, 2ρ_n]',    (15)
ξ_n = Ψ_n^{-1/2} v_n →_d N(0, I_2).

Furthermore n^{1/2}(ρ̃_n − ρ_n) = O_p(1) and

Ω_{ρ̃,n}(Υ_n) = (J_n' Υ_n J_n)^{-1} J_n' Υ_n Ψ_n Υ_n J_n (J_n' Υ_n J_n)^{-1} ≥ const > 0.    (16)

The above theorem implies that the difference between the cumulative distribution function of n^{1/2}(ρ̃_n − ρ_n) and that of N(0, Ω_{ρ̃,n}) converges pointwise to zero, which justifies the use of the latter distribution as an approximation of the former.^9

^9 This follows from Corollary F4 in Pötscher and Prucha (1997). Compare also the discussion on pp. 86–87 in that reference.

Remark 1. Clearly Ω_{ρ̃,n}(Ψ_n^{-1}) = (J_n' Ψ_n^{-1} J_n)^{-1} and Ω_{ρ̃,n}(Υ_n) − Ω_{ρ̃,n}(Ψ_n^{-1}) is positive semi-definite. Thus choosing Υ̃_n as a consistent estimator for Ψ_n^{-1} leads to an efficient GM estimator. Such a consistent estimator will be developed in the next subsection. As discussed in the proof of the above theorem, the elements of Ψ_n are uniformly bounded in absolute value and hence λ_max(Ψ_n) ≤ c_Ψ** for some c_Ψ** < ∞. Since by assumption also 0 < c_Ψ* ≤ λ_min(Ψ_n), it follows that the conditions on the eigenvalues of Υ_n postulated in Assumption 5 are automatically satisfied by Ψ_n^{-1}. We note that Ψ_n is, in general, only identical to the VC matrix of the moment vector in (5) if a_{r,n} = 0. The terms involving a_{r,n} reflect the fact that the GM estimator is based on estimators of the disturbances u_n and not on the true disturbances. As noted above, J_n equals the vector of derivatives (multiplied by minus one) of the moment conditions (6) w.r.t. ρ_n, and thus Ω_{ρ̃,n}(Υ_n) has the usual structure, except that here Ψ_n is not identical to the VC matrix of the moment vector.

Remark 2. From (11), (12), (14) and (15) we see that n^{1/2}(ρ̃_n − ρ_n) depends linearly on a vector of linear quadratic forms in the innovations ε_n plus a term of order o_p(1). This result is helpful in establishing the joint distribution of ρ̃_n with that of estimators of some of the other model parameters of interest. In particular, it may be of interest to derive the joint limiting distribution of n^{1/2} Δ_n and n^{1/2}(ρ̃_n − ρ_n). By Assumption 7, n^{1/2} Δ_n is asymptotically linear in ε_n and hence the joint limiting distribution can be readily derived using the CLT for linear quadratic forms given in Appendix A. We will illustrate this below within the context of IV estimators for model (1).
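For reference, the matrix Ψ_n in (13) can be computed directly once the matrices A_{r,n}, the vectors a_{r,n} and the variances σ_{i,n}^2 are available; the following is a small sketch (our own helper, with Σ_n passed as the vector of diagonal variances):

```python
import numpy as np

def psi_matrix(A_list, a_list, sigma2):
    """Psi_n = (psi_rs) from (13):
       psi_rs = (2n)^{-1} tr[(A_r + A_r') S (A_s + A_s') S] + n^{-1} a_r' S a_s,
    where S = diag(sigma2), A_list = [A_1, A_2, ...], a_list = [a_1, a_2, ...]."""
    n = len(sigma2)
    S = np.diag(sigma2)
    m = len(A_list)
    Psi = np.empty((m, m))
    for r in range(m):
        Ar = A_list[r] + A_list[r].T
        for s in range(m):
            As = A_list[s] + A_list[s].T
            Psi[r, s] = (np.trace(Ar @ S @ As @ S) / (2.0 * n)
                         + a_list[r] @ S @ a_list[s] / n)
    return Psi
```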

We next introduce a consistent estimator for Ω_{ρ̃,n}. For this purpose let

J̃_n = Γ̃_n [1, 2ρ̃_n]'    (17)

where Γ̃_n is defined above by (7) and the discussion after that equation. We next define a HAC type estimator for Ψ_n, whose elements are defined by (13). For this purpose let

Σ̃_n = diag_{i=1,...,n}(ε̃_{i,n}^2)

with ε̃_n = (I_n − ρ̃_n M_n) ũ_n. We furthermore need to specify an estimator for a_{r,n} = T_n α_{r,n}. The matrix T_n introduced in Assumption 7 will in many applications be of the form

T_n = F_n P_n with F_n = H_n or F_n = (I_n − ρ_n M_n')^{-1} H_n,    (18)

where H_n is a real nonstochastic n × p* matrix of instruments, and P_n is a real nonstochastic p* × p matrix, with p as in Assumption 7. In Section 4 we will consider instrumental variable estimators for the parameters of model (1) and (2). In that section we will see that if Δ_n corresponds to these instrumental variable estimators, then the matrix T_n will indeed have the above structure, and that P_n can be estimated consistently by some estimator P̃_n. We now define our estimator for T_n as^10

T̃_n = F̃_n P̃_n with F̃_n = H_n or F̃_n = (I_n − ρ̃_n M_n')^{+} H_n.    (19)

^10 The reason for using the generalized inverse is that ρ̃_n defined by (9) is not forced to lie in the parameter space, and thus I_n − ρ̃_n M_n may be singular (where the probability of this event goes to zero as the sample size increases).

In light of (12) it now seems natural to estimate a_{r,n} by

ã_{r,n} = T̃_n α̃_{r,n}    (20)

with

α̃_{r,n} = n^{-1} D_n' (I_n − ρ̃_n M_n')(A_{r,n} + A_{r,n}')(I_n − ρ̃_n M_n) ũ_n.

Given the above we now introduce the following HAC type estimator Ψ̃_n = (ψ̃_{rs,n}), where for r, s = 1, 2

ψ̃_{rs,n} = (2n)^{-1} tr[ (A_{r,n} + A_{r,n}') Σ̃_n (A_{s,n} + A_{s,n}') Σ̃_n ] + n^{-1} ã_{r,n}' Σ̃_n ã_{s,n}.    (21)

Furthermore, based on Ψ̃_n we define the following estimator for Ω_{ρ̃,n}:

Ω̃_{ρ̃,n} = (J̃_n' Υ̃_n J̃_n)^{+} J̃_n' Υ̃_n Ψ̃_n Υ̃_n J̃_n (J̃_n' Υ̃_n J̃_n)^{+}.    (22)

The next theorem establishes the consistency of Ψ̃_n and Ω̃_{ρ̃,n}.

Theorem 3 (VC Matrix Estimation). Suppose all of the assumptions of Theorem 2, apart from Assumption 5, hold and that additionally all of the fourth moments of the elements of D_n are uniformly bounded. Suppose furthermore that (a) the elements of the nonstochastic matrices H_n are uniformly bounded in absolute value, (b) sup_n |ρ_n| < 1 and the row and column sums of M_n are bounded uniformly in absolute value by, respectively, one and some finite constant (possibly after a renormalization of the weights matrix and parameter space as discussed in Section 2.2), and (c) P̃_n − P_n = o_p(1) with P_n = O(1). Then

Ψ̃_n − Ψ_n = o_p(1),   Ψ̃_n^{-1} − Ψ_n^{-1} = o_p(1).

If furthermore Assumption 5 holds, then also

Ω̃_{ρ̃,n} − Ω_{ρ̃,n} = o_p(1).

The hypothesis of zero spatial correlation in the disturbances, i.e., H_0: ρ_n = 0, can now be tested in terms of N(0, Ω̃_{ρ̃,n}).

Remark 3. We note that the above theorem also holds if ρ̃_n is replaced by any other n^{1/2}-consistent estimator of ρ_n. In case F_n = H_n condition (b) can be dropped. The consistency result for Ψ̃_n^{-1} verifies that this estimator for Ψ_n^{-1} can indeed be used in the formulation of an efficient GM estimator, as discussed after Theorem 2.
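As an illustration of how (17), (21) and (22) are put to use, the sketch below assembles Ω̃_{ρ̃,n} from previously computed pieces and forms the standardized statistic for H_0: ρ_n = 0; for simplicity of the sketch the generalized inverses are replaced by ordinary inverses (here scalars), and the efficient GM estimator of Remark 1 is obtained by re-running the minimization of (9) with Υ̃_n set to a consistent estimate of Ψ_n^{-1}:

```python
import numpy as np

def omega_rho_and_test(G_tilde, rho_tilde, Upsilon_tilde, Psi_tilde, n):
    """Omega_tilde for rho_tilde as in (22) and the standardized statistic for
    H0: rho_n = 0 based on the N(0, Omega_tilde) approximation of Theorem 2."""
    J = G_tilde @ np.array([1.0, 2.0 * rho_tilde])          # J_tilde from (17), length 2
    juj_inv = 1.0 / (J @ Upsilon_tilde @ J)                  # scalar, since rho is scalar
    omega = juj_inv * (J @ Upsilon_tilde @ Psi_tilde @ Upsilon_tilde @ J) * juj_inv
    z_stat = np.sqrt(n) * rho_tilde / np.sqrt(omega)
    return omega, z_stat

# Efficient second step (Remark 1): re-minimize (9) with Upsilon = Psi_tilde^{-1}, e.g.
# rho_eff = gm_estimate_rho(u_tilde, M, Upsilon=np.linalg.inv(Psi_tilde))
```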

Remark 4. The modules underlying the derivation of Theorems 2 and 3 can be readily extended to cover a wider class of estimators. A crucial underlying ingredient is the CLT for vectors of linear quadratic forms given in Appendix A, which was used to establish the limiting distribution of the vector of linear quadratic forms (11); that CLT is based on Kelejian and Prucha (2001). We emphasize that while in (5) we consider two moment conditions, all of the above results generalize trivially to the case where the GM estimator for ρ_n corresponds to m moment conditions

n^{-1} E[ ε_n' A_{1,n} ε_n, . . . , ε_n' A_{m,n} ε_n ]' = 0,    (23)

where the diagonal elements of A_{r,n} are zero and the row and column sums of A_{r,n} are bounded uniformly in absolute value. The focus of this paper is on two-step estimation procedures; this focus is motivated by their computational simplicity, by the generality of the first step (where residuals may come from nonlinear models), and by the fact that, at least under homoskedasticity, Monte Carlo experiments suggest that very little efficiency is lost; see, e.g., Das et al. (2003). Given instruments H_n, one-step GMM estimators for all parameters of the SARAR(1, 1) model, i.e., ρ_n, λ_n, β_n, could be defined by augmenting the moment conditions (23) by the conditions

n^{-1} E H_n' ε_n = n^{-1} E[ h_{.1,n}' ε_n, . . . , h_{.p*,n}' ε_n ]' = 0    (24)

with

ε_n = (I_n − ρ_n M_n)(y_n − X_n β_n − λ_n W_n y_n).

The limiting distribution of the stacked moment vector follows immediately from the CLT for vectors of linear quadratic forms. Theorem 3 establishes consistent estimation of the VC matrix of the vector of (normalized) linear quadratic forms (11). Estimation of the VC matrix of the vector of (normalized) linear quadratic forms corresponding to the stacked moment conditions (23) and (24) is analogous. In fact, in this case estimation simplifies in that the components of the vector are either quadratic or linear, and the elements of the linear terms h_{r,n} are observed.^11

^11 Several months after completing this paper we became aware of a paper by Lin and Lee (forthcoming) that considers an SAR(1) model with unknown heteroskedasticity. That paper is complementary to this one, with a focus on and extensive discussion of one-step estimation procedures for that model, among other things. That paper does not discuss specification issues regarding the parameter space of the autoregressive parameters, which are considered in this paper.

3.4. Joint asymptotic distribution of the GM estimator for ρ_n and estimators of other model parameters

In the following we discuss how the above results can be extended to obtain the joint asymptotic distribution of the GM estimator for ρ_n and of other estimators that are asymptotically linear in the innovations ε_n, i.e., that are of the form considered in Assumption 7. As remarked above, the IV estimators for the regression parameters of model (1) and (2) considered in the next section will be of this form. Based on the joint distribution it will then be possible to test a joint hypothesis concerning ρ_n and other model parameters.

In the following we will give results concerning the joint asymptotic distribution of ρ̃_n − ρ_n and Δ_n as considered in Assumptions 4 and 7 in conjunction with the estimation of the disturbances u_n. Clearly, in general it will be of interest to have results available not only concerning the joint asymptotic distribution of ρ̃_n − ρ_n and Δ_n, but also concerning other estimators, say Δ_n*, that are of the general form considered in Assumption 7. To avoid having to introduce further notation we give our result in terms of ρ̃_n − ρ_n and Δ_n, but then comment on what changes would be needed in the formulae to accommodate other estimators Δ_n* in place of Δ_n. The discussion assumes that T_n = F_n P_n and T̃_n = F̃_n P̃_n as defined in the previous subsection.

In light of the above discussion we expect that the joint limiting distribution of n^{1/2}(ρ̃_n − ρ_n) and n^{1/2} Δ_n will depend on the limiting distribution of [(n^{-1/2} F_n' ε_n)', v_n']'. Observing again that the diagonal elements of the matrices A_{r,n} are zero, it follows from Lemma A.1 that the VC matrix of this vector of linear and linear quadratic forms is given by

Ψ_{◦,n} = [ Ψ_{ΔΔ,n}   Ψ_{Δρ,n} ; Ψ_{Δρ,n}'  Ψ_n ]    (25)

with Ψ_{ΔΔ,n} = n^{-1} F_n' Σ_n F_n, Ψ_{Δρ,n} = n^{-1} F_n' Σ_n [a_{1,n}, a_{2,n}] and where Ψ_n is defined by (13). We shall also employ the following estimator for Ψ_{◦,n}:

Ψ̃_{◦,n} = [ Ψ̃_{ΔΔ,n}   Ψ̃_{Δρ,n} ; Ψ̃_{Δρ,n}'  Ψ̃_n ]    (26)

with Ψ̃_{ΔΔ,n} = n^{-1} F̃_n' Σ̃_n F̃_n, Ψ̃_{Δρ,n} = n^{-1} F̃_n' Σ̃_n [ã_{1,n}, ã_{2,n}] and where Ψ̃_n is defined by (21).

We now have the following result concerning the joint limiting distribution of n^{1/2}(ρ̃_n − ρ_n) and n^{1/2} Δ_n.

Theorem 4. Suppose all of the assumptions of Theorem 3 hold, and λ_min(Ψ_{◦,n}) ≥ c_{Ψ◦}* > 0. Then

[ n^{1/2} Δ_n' , n^{1/2}(ρ̃_n − ρ_n) ]' = blockdiag( P_n' , (J_n' Υ_n J_n)^{-1} J_n' Υ_n ) Ψ_{◦,n}^{1/2} ξ_{◦,n} + o_p(1),    (27)
ξ_{◦,n} = Ψ_{◦,n}^{-1/2} [ (n^{-1/2} F_n' ε_n)' , v_n' ]' →_d N(0, I_{p*+2}).

Furthermore let

Ω_{◦,n} = blockdiag( P_n' , (J_n' Υ_n J_n)^{-1} J_n' Υ_n ) Ψ_{◦,n} blockdiag( P_n , Υ_n J_n (J_n' Υ_n J_n)^{-1} ),    (28)
Ω̃_{◦,n} = blockdiag( P̃_n' , (J̃_n' Υ̃_n J̃_n)^{+} J̃_n' Υ̃_n ) Ψ̃_{◦,n} blockdiag( P̃_n , Υ̃_n J̃_n (J̃_n' Υ̃_n J̃_n)^{+} );    (29)

then Ψ̃_{◦,n} − Ψ_{◦,n} = o_p(1), Ω̃_{◦,n} − Ω_{◦,n} = o_p(1), and Ψ_{◦,n} = O(1), Ω_{◦,n} = O(1).

The above theorem implies that the difference between the joint cumulative distribution function of n^{1/2}[Δ_n', (ρ̃_n − ρ_n)]' and that of N(0, Ω_{◦,n}) converges pointwise to zero, which justifies the use of the latter distribution as an approximation of the former. The theorem also states that Ω̃_{◦,n} is a consistent estimator for Ω_{◦,n}.

Remark 5. The above result generalizes readily to cases where we are interested in the joint distribution between ρ̃_n − ρ_n and some other estimator, say Δ_n*, where n^{1/2} Δ_n* = n^{-1/2} T_n*' ε_n + o_p(1), T_n* = F_n* P_n* and T̃_n* = F̃_n* P̃_n*, assuming that analogous assumptions are maintained for this estimator. In particular, the results remain valid, but with Ψ_{ΔΔ,n} = n^{-1} F_n*' Σ_n F_n*, Ψ_{Δρ,n} = n^{-1} F_n*' Σ_n [a_{1,n}, a_{2,n}], Ψ̃_{ΔΔ,n} = n^{-1} F̃_n*' Σ̃_n F̃_n*, Ψ̃_{Δρ,n} = n^{-1} F̃_n*' Σ̃_n [ã_{1,n}, ã_{2,n}], and P_n, P̃_n replaced by P_n*, P̃_n*.

4. Instrumental variable estimator for δ_n

As remarked, the consistency and asymptotic normality results developed in an important paper by Lee (2004) for the quasi-ML estimator for the SARAR(1, 1) model defined by (1) and (2) under the assumption of homoskedastic innovations do not carry over to the case where the innovations are heteroskedastic. In fact, under heteroskedasticity the limiting objective function of the quasi-ML estimator would generally not be maximized at the true parameter values, and therefore the quasi-ML estimator would be inconsistent. Also, the asymptotic normality results developed by Kelejian and Prucha (1998), Kelejian et al. (2004) and Lee (2003) for instrumental variable (IV) estimators of the SARAR(1, 1) model do not carry over to the case where the innovations are heteroskedastic. In the following we provide results concerning the asymptotic distribution of IV estimators allowing the innovations to be heteroskedastic. More specifically, we will show that the considered IV estimators satisfy certain conditions such that their asymptotic distribution can be readily obtained via Theorem 4. We also allow for a more general definition of the parameter space of the spatial autoregressive parameters to avoid certain pitfalls discussed in Section 2.

4.1. Instruments

It is evident from (3) that in general W_n y_n will be correlated with the disturbances u_n, which motivates the use of IV estimation procedures. We maintain the following assumptions w.r.t. the n × k regressor matrices X_n, and the n × p* instrument matrices H_n.

Assumption 8. The regressor matrices X_n have full column rank (for n large enough). Furthermore, the elements of the matrices X_n are uniformly bounded in absolute value.

Assumption 9. The instrument matrices H_n have full column rank p* ≥ k + 1 (for all n large enough). Furthermore, the elements of the matrices H_n are uniformly bounded in absolute value. Additionally, H_n is assumed to, at least, contain the linearly independent columns of (X_n, M_n X_n).

Assumption 10. The instruments H_n furthermore satisfy:
(a) Q_HH = lim_{n→∞} n^{-1} H_n' H_n is finite and nonsingular.
(b) Q_HZ = plim_{n→∞} n^{-1} H_n' Z_n and Q_HMZ = plim_{n→∞} n^{-1} H_n' M_n Z_n are finite and have full column rank. Furthermore, let Q_{HZ*}(ρ_n) = Q_HZ − ρ_n Q_HMZ; then the smallest eigenvalue of Q_{HZ*}(ρ_n)' Q_HH^{-1} Q_{HZ*}(ρ_n) is bounded away from zero uniformly in n.
(c) Q_{HΣH} = lim_{n→∞} n^{-1} H_n' Σ_n H_n is finite and nonsingular.

The above assumptions are similar to those maintained in Kelejian and Prucha (1998, 2004) and Lee (2003), and so a discussion quite similar to those given in these papers also applies here. Regarding the specification of the instruments H_n observe first that

E(W_n y_n) = W_n (I_n − λ_n W_n)^{-1} X_n β_n = Σ_{i=0}^{∞} λ_n^i W_n^{i+1} X_n β_n    (30)

provided that the characteristic roots of λ_n W_n are less than one in absolute value; compare Lemmas 1 and 2 concerning the choice of the parameter space for λ_n. The instrument matrices H_n will be used to instrument Z_n = (X_n, W_n y_n) and M_n Z_n = (M_n X_n, M_n W_n y_n) in terms of their predicted values from a least squares regression on H_n, i.e., Ẑ_n = P_{H_n} Z_n and P_{H_n} M_n Z_n with P_{H_n} = H_n (H_n' H_n)^{-1} H_n'. Towards approximating the ideal instruments E(Z_n) = (X_n, W_n E(y_n)) and E(M_n Z_n) = (M_n X_n, M_n W_n E(y_n)) it seems reasonable, in light of (30), to take H_n to be a subset of the linearly independent columns of

(X_n, W_n X_n, W_n^2 X_n, . . . , W_n^q X_n, M_n X_n, M_n W_n X_n, . . . , M_n W_n^q X_n)    (31)

where q is a pre-selected finite constant.^12 We note that if H_n is selected as in (31) it follows from Assumptions 3 and 8 that its elements will be bounded in absolute value as postulated in Assumption 9. Assumption 9 ensures that X_n and M_n X_n are instrumented by themselves. Finally we note that the assumption that H_n has full column rank could be relaxed at the expense of working with generalized inverses.

^12 In Kelejian et al. (2004), who considered the case of homoskedastic innovations, the instruments were determined more generally by taking q as a function of the sample size n, i.e., q_n, such that q_n → ∞ as n → ∞. Their Monte Carlo results suggest that q = 2 may be sufficient for many applications.

4.2. Definition, consistency and asymptotic normality

Towards estimating the model (1) and (2) we propose a three step procedure. In the first step the model is estimated by two stage least squares (2SLS) using the instruments H_n. In the second step the autoregressive parameter, ρ_n, is estimated using the generalized moments estimation approach from Section 3 based on the 2SLS residuals obtained via the first step. In the third step, the regression model in (1) is re-estimated by 2SLS after transforming the model via a Cochrane–Orcutt-type transformation to account for the spatial correlation.

More specifically, the first step 2SLS estimator is defined as

δ̃_n = (Ẑ_n' Z_n)^{-1} Ẑ_n' y_n,    (32)

where Ẑ_n = P_{H_n} Z_n = (X_n, P_{H_n} W_n y_n). In the second step we estimate ρ_n by the GM procedure defined by (9) based on the 2SLS residuals ũ_n = y_n − Z_n δ̃_n. We denote the GM estimator again as ρ̃_n.

The next lemma shows that various assumptions maintained in Section 3 w.r.t. the estimator of the regression parameters and estimated residuals are automatically satisfied by the 2SLS estimator δ̃_n and the corresponding residuals.

Lemma 3. Suppose Assumptions 1–3 and 8–10 hold, and sup_n ‖β_n‖ < ∞. Let D_n = −Z_n; then the fourth moments of the elements of D_n are uniformly bounded, Assumption 6 holds, and:
(a) n^{1/2}(δ̃_n − δ_n) = n^{-1/2} T_n' ε_n + o_p(1) with T_n = F_n P_n and where
P_n = Q_HH^{-1} Q_HZ [Q_HZ' Q_HH^{-1} Q_HZ]^{-1},
F_n = (I_n − ρ_n M_n')^{-1} H_n.
(b) n^{-1/2} T_n' ε_n = O_p(1).
(c) P_n = O_p(1) and P̃_n − P_n = o_p(1) for
P̃_n = (n^{-1} H_n' H_n)^{-1} (n^{-1} H_n' Z_n) [(n^{-1} Z_n' H_n)(n^{-1} H_n' H_n)^{-1} (n^{-1} H_n' Z_n)]^{-1}.

The condition sup_n ‖β_n‖ < ∞ is trivially satisfied if β_n = β. Of course, parts (a) and (b) together imply that δ̃_n is n^{1/2}-consistent for δ_n.

Clearly ũ_n = u_n + D_n Δ_n with D_n = −Z_n and Δ_n = δ̃_n − δ_n. Lemma 3 shows that in essence under Assumptions 1–3 and 8–10 the 2SLS residuals automatically satisfy the conditions postulated in Assumptions 4, 6 and 7 with D_n = −Z_n, Δ_n = δ̃_n − δ_n and T_n as specified in the lemma. Consequently the results concerning consistency and asymptotic normality of the GM estimator for ρ_n in Theorems 1 and 2 apply in particular to the GM estimator ρ̃_n based on 2SLS residuals. Lemma 3 also establishes that the fourth moments of the elements of D_n = −Z_n are uniformly bounded. The lemma also gives explicit expressions for P_n and P̃_n and verifies the conditions postulated w.r.t. those matrices in Theorems 3 and 4. Hence the results of those two theorems also cover the GM estimator ρ̃_n and the 2SLS estimator δ̃_n. In particular, Theorem 4 gives the joint limiting distribution of n^{1/2}(ρ̃_n − ρ_n) and n^{1/2} Δ_n = n^{1/2}(δ̃_n − δ_n), where D_n = −Z_n, the matrices P_n, F_n, P̃_n are as in Lemma 3, and F̃_n = (I_n − ρ̃_n M_n')^{+} H_n.^13

^13 An alternative estimation approach is to use a HAC procedure to estimate the variance covariance matrix of the 2SLS estimator. Such an approach was considered in Pinkse et al. (2002) and Kelejian and Prucha (2007a). While such an approach is more robust, it does not yield a simple testing strategy for a joint test of spatial dependencies in the endogenous variables, exogenous variables and disturbances.

We now turn to the third step. Applying a Cochrane–Orcutt-type transformation to (1) yields

y_n*(ρ_n) = Z_n*(ρ_n) δ_n + ε_n,    (33)

where y_n*(ρ_n) = y_n − ρ_n M_n y_n and Z_n*(ρ_n) = Z_n − ρ_n M_n Z_n. Our estimator δ̂_n for δ_n in this third step is now defined as the 2SLS procedure applied to the transformed model (33) after replacing ρ_n by ρ̃_n. That is,

δ̂_n = [Ẑ_n*(ρ̃_n)' Z_n*(ρ̃_n)]^{-1} Ẑ_n*(ρ̃_n)' y_n*(ρ̃_n)    (34)

where Ẑ_n*(ρ̃_n) = P_{H_n} Z_n*(ρ̃_n). To express the dependence of δ̂_n on ρ̃_n we also write δ̂_n = δ̂_n(ρ̃_n).

The next lemma shows again that various assumptions maintained in Section 3 w.r.t. the estimator of the regression parameters and estimated residuals are automatically satisfied by the generalized spatial 2SLS estimator δ̂_n and corresponding residuals.

Lemma 4. Suppose the assumptions of Lemma 3 hold, and let δ̂_n(ρ̂_n) be as defined by (34), where ρ̂_n is any n^{1/2}-consistent estimator for ρ_n (such as the GM estimator ρ̃_n based on 2SLS residuals). Then
(a) n^{1/2}[δ̂_n(ρ̂_n) − δ_n] = n^{-1/2} T_n*' ε_n + o_p(1) with T_n* = F_n* P_n* and where
P_n* = Q_HH^{-1} Q_{HZ*}(ρ_n) [Q_{HZ*}(ρ_n)' Q_HH^{-1} Q_{HZ*}(ρ_n)]^{-1},
F_n* = H_n.
(b) n^{-1/2} T_n*' ε_n = O_p(1).
(c) P_n* = O_p(1) and P̃_n* − P_n* = o_p(1) for
P̃_n* = (n^{-1} H_n' H_n)^{-1} (n^{-1} H_n' Z_n*(ρ̂_n)) [(n^{-1} Z_n*(ρ̂_n)' H_n)(n^{-1} H_n' H_n)^{-1} (n^{-1} H_n' Z_n*(ρ̂_n))]^{-1}.

Frequently we will be interested in the joint distribution of the generalized spatial 2SLS estimator δ̂_n(ρ̂_n) and the GM estimator ρ̃_n. In light of Lemmas 3 and 4 the limiting distribution of n^{1/2}[(δ̂_n − δ_n)', (ρ̃_n − ρ_n)]' now follows immediately from Theorem 4 and Remark 5 after that theorem, with Δ_n* = δ̂_n − δ_n. The asymptotic variance covariance matrix and its corresponding estimator are given by (28) and (29) with modifications as described in Remark 5 and D_n = −Z_n. The expressions for the matrices P_n, F_n, P̃_n are as in Lemma 3, and F̃_n = (I_n − ρ̃_n M_n')^{+} H_n. The expressions for the matrices P_n*, F_n*, P̃_n* are as in Lemma 4, and F̃_n* = H_n. The joint distribution can then be used to test, in particular, the joint hypothesis H_0: λ_n = ρ_n = 0 in the usual manner.

Finally, consider the estimated residuals corresponding to δ̂_n, i.e., ũ_n* = y_n − Z_n δ̂_n = u_n + D_n Δ_n*. Clearly, in light of Lemma 4 we could use those residuals to define a corresponding GM estimator, and a discussion analogous to that after Lemma 3 also applies here. Of course, further iterative procedures are possible, and their asymptotic properties would again be covered by the asymptotic theory developed above.
62 H.H. Kelejian, I.R. Prucha / Journal of Econometrics 157 (2010) 53–67

Zn = PHn Zn = Xn , PHn Yn , PHn Wn yn and


 
hold but with b also consider comparisons with the quasi-ML estimator (specified
under homoskedasticity).
Zn∗ (e
b ρn ) = PHn Zn∗ (e
ρn )
= (In − e ρn Mn ) Xn , PHn (In − e
ρn Mn ) Yn ,

Acknowledgements
PHn (In − e ρn Mn ) Wn yn .

Our thanks for very helpful discussions and suggestions are
In specifying a full system of equations analogous to Kelejian
owed to Irani Arraiz, Badi Baltagi, Peter Egger, David Drukker,
and Prucha (2004) one could furthermore develop more primitive
conditions that ensure the moment condition for the elements of Benedikt Pötscher, and Paulo Rodrigues, and to seminar partic-
Dn as well as Assumption 6. ipants at Pennsylvania State University, Singapore Management
University, ADRES Conference on Networks of Innovations and
5. Some Monte Carlo results Spatial Analysis of Knowledge Diffusion in Saint-Etienne, Texas
A&M University, SUNY Albany, University of Innsbruck, Syracuse
In order to obtain some insights relating to the small University and Kansas University. Also, we gratefully acknowledge
sample properties of our suggested estimators, as well as the financial support from the National Science Foundation through
quasi-maximum likelihood estimator, we undertook a limited grant SES-0001780 and the National Institute of Health through the
Monte Carlo study. Because of space constraints we do not SBIR grant R43 AG027622 and R44 AG027622.
report the details of that study here; however, those details
can be found in Kelejian and Prucha (2007b). In essence the
Appendix A. CLT for vectors of linear quadratic forms
results of that Monte Carlo study were quite consistent with
the theoretical results presented above. Specifically, in our
considered cases with heteroskedastic innovations the generalized In the following we state, for the convenience of the reader, a
spatial 2SLS/GM estimators had very little bias even in small central limit theorem (CLT) for vectors of linear quadratic forms
samples. In contrast, the ML estimators were significantly biased. with heteroskedastic innovations. This CLT is based on a CLT given
Furthermore, the ML biases did not decline with the sample in Kelejian and Prucha (2001) for the scalar case. We first state
size. We also note that the rejection rates of tests corresponding a lemma that collects useful results on the mean and VC matrix
to the generalized spatial 2SLS/GM estimators were close to between (and as a special case the variance of) linear quadratic
the nominal value of 0.05, suggesting that the derived large forms.
sample distribution and the estimators of the asymptotic variance
covariance matrix determined above provide indeed useful small Lemma A.1. Let ζ = (ζ1 , . . . , ζn )0 be a random vector with zero
sample appoximations. On the other hand, rejection rates for the mean and positive definite variance covariance matrix 4, let A = (aij )
ML estimators exibited serious biases ranging from 0.176 to 1.000. and B = (bij ) be n×n nonstochastic symmetric matrices, and let a and
In the cases we considered with homoskedastic innovations b be n × 1 nonstochastic vectors. Consider the decomposition 4 = SS0 ,
both the generalized spatial 2SLS/GM estimators and ML estima-
let A∗ = (aij,∗ ) = S0 AS and B∗ = (bij,∗ ) = S0 BS, and let a∗ = S0 a
tors showed very little bias. In addition, the loss of efficiency of the
and let b∗ = S0 b. Furthermore let η = (η1 , . . . , ηn )0 = S−1 ζ.
generalized spatial 2SLS/GM estimators relative to the ML estima-
tors was generally modest. Then assuming that the elements of η are independently distributed
with zero mean, unit variance, and finite third and fourth moments
6. Summary and suggestions for further research E (ηi3 ) = µη(3i ) and E (ηi4 ) = µ(η4i ) we have

In this paper we introduce a new class of GM estimators E (ζ0 Aζ + a0 ζ) = tr(A∗ ) = tr(A4),


for the autoregressive parameter of a spatially autoregressive cov(ζ0 Aζ + a0 ζ, ζ0 Bζ + b0 ζ) = 2tr(A4B4) + a0 4b
disturbance process allowing for innovations with unknown n n
aii,∗ bii,∗ µ(η4i ) − 3 + (ai,∗ bii,∗ + bi,∗ aii,∗ )µ(η3i ) .
X   X
heteroskedasticity. The estimation theory for the GM estimators +
is developed in a modular fashion under a fairly general set i =1 i =1
of assumptions, and should cover many (linear and nonlinear)
models. The general theory is then utilized to establish the
consistency and asymptotic normality of IV estimators for the Remark A.1. The above expression for the covariance exploits the
regression parameters for an important class of spatial models, independence of the elements of η. A convenient way to obtain
frequently referred to as SARAR(1, 1) models. The paper provides these expressions is to re-write the linear quadratic form as a sum
results concerning the joint asymptotic distribution of the GM of martingale differences as, e.g., in Kelejian and Prucha (2001,
estimator for the autoregressive parameter of the disturbance Appendix A). If the diagonal elements of A∗ and B∗ are zero the last
process and IV estimators for the model regression parameters. two terms drop out from the expression for the covariance, and so
Among other things, the results allow for a joint test that the one need only assume the existence of second moments. The last
autoregressive parameters corresponding to the spatial lags of the two terms also drop out from the expression for the covariance in
dependent variable and disturbance term are both zero We also the case where ζ – or equivalently η – is normally distributed, since
provide a discussion of the specification of the parameter space in this case µ(η3i ) = 0 and µ(η4i ) = 3. Of course, the variance of a
for SARAR(1, 1) models. We demonstrate that a computationally
linear quadratic form is obtained as the special case where A = B
simple re-scaling of the weights matrix leads to an equivalent
and a = b. Obviously, in case A and B are not symmetric the above
model containing a (re-scaled) autoregressive parameter which
formulae apply with A and B replaced by (A + A0 )/2 and (B + B0 )/2.
has a user-friendly parameter space. Unfortunately, previous
studies in the literature have often re-scaled their weights matrix Consider the linear quadratic forms (r = 1, . . . , m)
in such a way that the ‘‘before and after’’ scaled models are not
equivalent. Qr ,n = ξ0n Ar ,n ξn + a0r ,n ξn
One suggestion for further research is to extend the results
to panel data. A further suggestion for future research would be where ξn = (ξ1,n , . . . , ξn,n )0 is an n × 1 random vector,
a comprehensive Monte Carlo study which focuses on the small Ar ,n = (aij,r ,n )i,j=1,...,n is an n × n nonstochastic real matrix, and
sample properties of the considered estimators. Such a study may ar ,n = (a1,r ,n , . . . , an,r ,n )0 is an n × 1 nonstochastic real vector.
H.H. Kelejian, I.R. Prucha / Journal of Econometrics 157 (2010) 53–67 63

πr ,n Ar ,n and dn = πr ,n ar ,n . Observe that


Pm Pm
We maintain the following assumptions: where Cn = r =1 r =1
m
Assumption A.1. The real valued random variables of the array −1/2
µQn = EQn = n1/2 α 0 Σ Vn µVn =
X
{ξi,n : 1 ≤ i ≤ n, n ≥ 1} satisfy E ξi,n = 0. Furthermore, for each πr ,n µQr ,n ,
n ≥ 1 the random variables ξ1,n , . . . , ξn,n are totally independent. r =1
−1/2 −1/20
σQ2n = var(Qn ) = nα 0 6Vn 6Vn 6Vn α = n.
Assumption A.2. For r = 1, . . . , m the elements of the array of
real numbers {aij,r ,n : 1 ≤ i, j ≤ n, n ≥ 1} satisfy aij,r ,n = To prove the theorem, using the Cramer Wold device, it suffices
aji,r ,n and sup1≤j≤n,n≥1 i=1 aij,r ,n < ∞. The elements of the
Pn
to show that
array of real numbers {ai,r ,n : 1 ≤ i ≤ n, n ≥ 1} satisfy Q n − µQ n
−1/2 d
supn n−1
Pn
ai,r ,n
2+η1
< ∞ for some η1 > 0. α 0 6Vn (Vn − µV ,n ) = → N (0, 1). (A.1)
i=1 σQn
Assumption A.3. For r = 1, . . . , m one of the following two To show that (A.1) holds we verify that ξn and Cn and dn satisfy
conditions holds: the conditions of Theorem 1 in Kelejian and Prucha (2001).
2+η2
(a) sup1≤i≤n,n≥1 E ξi,n < ∞ for some η2 > 0 and aii,r ,n = 0. Assumptions A.1 and A.3 are identical to Assumptions 1 and 3 in
4+η2 Kelejian and Prucha (2001). Furthermore, n−1 σQ2n = 1. Thus it
(b) sup1≤i≤n,n≥1 E ξi,n < ∞ for some η2 > 0 (but possibly
suffices to verify that Assumption 2 in Kelejian and Prucha (2001)
aii,r ,n 6= 0).
holds for the elements of Cn and dn .
Let µQr ,n and σQrs,n denote the mean of Qr ,n and the covariance Clearly λmax (n6− Vn ) = 1/λmin (n 6Vn ) ≤ 1/c. By Proposition
1 −1
between Qr ,n and Qs,n , respectively, for r , s = 1, . . . , m. Then it −1/20 −1/2
follows immediately from Lemma A.1 that under Assumptions A.1 43 in Dhrymes (1978, p. 470) the matrices n6−
Vn = n6Vn
1
6V n
−1/2 −1/20
and A.3 and n6Vn 6Vn have the same characteristic roots. Thus
n 1/2 −1/20 −1/2 −1/20
X kπn k = α 0 (n6−
2
6Vn )α ≤ λmax (n6Vn 6Vn ) kαk2 =
µQr ,n = aii,r ,n σi2,n , Vn

i=1 Vn ) ≤ 1/c < ∞, and hence πr ,n


λmax (n6− 1
≤ cπ where cπ =

n X
X n n
X 1/ c.
σQrs,n = 2 aij,r ,n aij,s,n σi2,n σj2,n + ai,r ,n ai,s,n σi2,n Observe that
i =1 j =1 i =1 m
X m
X
X n
(4)
h i cij,n = πr ,n aij,r ,n ≤ cπ aij,r ,n
+ aii,r ,n aii,s,n µi,n − 3σi4,n r =1 r =1
i =1
n and hence
(ai,r ,n aii,s,n + ai,s,n aii,r ,n )µ(i,3n)
X
+ n
X m
X n
X
i =1 sup cij,n ≤ cπ sup aij,r ,n < ∞
1≤j≤n,n≥1 i=1 r =1 1≤j≤n,n≥1 i=1
where σ = Eξ
2
i,n = E ξ and
2
i,n , µ(i,3n)= E ξ In case 3
i,n µ(i,4n) 4
i ,n .
Assumption A.3(a) holds, the mean of Qr ,n is zero and the last two in light of Assumption A.2. This shows that the elements of Cn
terms drop out from the expression for the variance. satisfy Assumption 2 in Kelejian and Prucha (2001).
In the following we give a CLT for the m × 1 vector of linear Next observe that
quadratic forms " #2+η1 " #2+η1
0 m m
Vn = Q1,n , . . . , Qm,n .
 2+η1
X X
di,n ≤ πr ,n ai,r ,n ≤ cπ 2+η1
ai,r ,n
Let µVn = EVn = [µQ1,n , . . . , µQm,n ]0 and Σ Vn = [σQrs,n ]r ,s=1,..,m r =1 r =1
denote the mean and VC matrix of Vn , respectively. We then have m
2+η1
X
the following theorem. ≤ cπ2+η1 m2+η1 ai,r ,n
r =1
Theorem A.1. Suppose Assumptions A.1–A.3 hold and n−1 λmin (Σ Vn )

1/2

1/2
0 and hence
≥ c for some c > 0. Let Σ Vn = 6Vn 6Vn , then n
2+η1
X
−1/2 d sup n−1 di,n
6Vn (Vn − µVn ) → N (0, Im ). n
i=1
m n
2+η1
X X
Remark A.2. Since the diagonal elements of a positive definite ≤ cπ2+η1 m2+η1 sup n−1 ai , r , n <∞
n
matrix are greater than or equal to the smallest eigenvalue of that r =1 i=1
matrix it follows that n−1 λmin (6Vn ) ≥ c implies that n−1 σQn,ii ≥ c. in light of Assumption A.2. Therefore the elements of dn satisfy
Therefore this assumption automatically implies the assumption Assumption 2 in Kelejian and Prucha (2001). 
maintained in Theorem 1 of Kelejian and Prucha (2001) w.r.t. the
variance of a linear quadratic form for each of the linear quadratic
Appendix B. Proofs for Section 2
forms Qr ,n . Of course, the theorem remains valid, if all assumptions
are assumed to hold for n > n0 where n0 is finite.
Proof of Lemma 1. Clearly In − λWn is non-singular for λ = 0.
Proof of Theorem A.1. Let α be some arbitrary m × 1 vector with For λ 6= 0 we have det(In − λWn ) = (−λ)n det(Wn − λ−1 In ).
kαk = 1, let Consequently In − λWn is non-singular for values of λ−1 6∈
πn = (π1,n , . . . , πm,n ) = n1/2 α 0 6Vn ,
−1/2 {ν1,n , . . . , νn,n }. In particular, In − λWn is non-singular for λ−1 >
τn . Rewriting this last inequality as |λ| < τn−1 completes the
and define proof. 
−1/2
Qn = n1/2 α 0 6Vn Vn Proof of Lemma 2. As an immediate consequence of Geršgorin’s
X m m
X Theorem – see, e.g., Horn and Johnson (1985, pp. 344–346) – we
= πr ,n ξ0n Ar ,n ξn + πr ,n a0r ,n ξn = ξ0n Cn ξn + d0n ξn , have τn = max{ ν1,n , . . . , νn,n } ≤ τn∗ . The claim now follows
r =1 r =1 from Lemma 1. 
64 H.H. Kelejian, I.R. Prucha / Journal of Econometrics 157 (2010) 53–67

Appendix C. Proofs for Section 3 Rn (ω, ρ) − Rn (ρ)


h 0 i
= ρ, ρ 2 , 1 8en Υ en − 80n Υn 8n ρ, ρ 2 , 1 0
en 8
 
Remark C.1. Suppose the row and column sums of the np ×
np matrices An = (aij,n ) are bounded uniformly in absolute e0n Υ
≤ 8 en − 80n Υn 8n
en 8 ρ, ρ 2 , 1
2
Pnp q
value by some finite constants cA , then i=1 aij,n ≤ cAq for
q > 1. This result is trivially seen to hold since
Pnp
aij,n
q
= e0n Υ
≤ 8 en − 80n Υn 8n [1 + (aρ )2 + (aρ )4 ].
en 8
i=1
q−1 Pnp q −1 q−1 Pnp q
cA i =1 aij,n aij,n /cA ≤ cA i =1 aij,n ≤ cA . As is readily seen from the respective second expressions on
the r.h.s. of (7), the elements of 8n and 8 en are all of the form
In this appendix we shall make use of several lemmata.
n−1 Eu0n An un and n−1e u0n Ane
un where the row and column sums
Because of space constraints we do not provide explicit proofs
of An are bounded uniformly in absolute value. It now follows
for these lemmata. However, explicit proofs are given in the p
underlying working paper, Kelejian and Prucha (2007b), which can immediately from Lemma C.1 that 8 en − 8n → 0, and that the
elements of 8 en and 8n are, respectively, Op (1) and O(1). The
be obtained from the authors.
analogous properties are seen to hold for the elements of Υ en and
Lemma C.1. Suppose the row and column sums of the real non- Υn in light of Assumption 5. Given this it follows from the above
stochastic n × n matrices An are uniformly bounded in absolute value. inequality that Rn (ω, ρ) − Rn (ρ) converges to zero uniformly over
Let un be defined by (2) and let e
un denote a predictor for un . Suppose the optimization space [−aρ , aρ ], i.e.,
Assumptions 1–4 hold then: sup Rn (ω, ρ) − Rn (ρ)
ρ∈[−aρ ,aρ ]
(a) n −1
E un An un = O(1), v ar (n
0 −1 0
un An un ) = o(1) and p
e0n Υ
≤ 8 en − 80n Υn 8n [1 + (aρ )2 + (aρ )4 ] →
en 8 0
−1 0
un Ane
n e un − n −1
Eun An un = op (1).
0

as n → ∞. The consistency of eρn now follows directly from Lemma


(b) n−1 E d0.s,n An un = O(1), s = 1, . . . , p, where d.s,n denotes the 3.1 in Pötscher and Prucha (1997). 
s-th column of Dn , and
We make use of the following lemma.
un − n−1 ED0n An un = op (1).
n−1 D0n Ane
Lemma C.2. Let un be defined by (2) and let Dn = [d01.,n , . . . , d0n.,n ]0 ,
(c) If furthermore Assumption 6 holds, then
where di.,n is defined in Assumption 4. Suppose Assumptions 1–3 hold,
n−1/2e un = n−1/2 u0n An un + α0n n1/2 1n + op (1)
u0n Ane and suppose furthermore that the columns of Dn are of the form πn +
5n εn , where the elements of πn are bounded in absolute value and
with αn = n−1 ED0n (An + A0n )un . (Of course, in light of (b) we have the row and column sums of 5n are uniformly bounded in absolute
αn = O(1) and n−1 D0n (An + A0n )e un − αn = op (1).) value. Then (a) Ed4ij,n ≤ const < ∞, and thus the moment condition
in Assumption 4 is automatically implied, and (b) Assumption 6 is
Proof. A proof of the lemma is given in Kelejian and Prucha
automatically implied.
(2007b). 
Proof. A proof of the lemma is given in Kelejian and Prucha
Proof of Theorem 1. The existence and measurability of e ρn is (2007b). 
assured by, e.g., Lemma 3.4 in Pötscher and Prucha (1997).
Proof of Theorem 2. Define
The objective function of the weighted nonlinear least squares
estimator and its corresponding non-stochastic counterpart are ρn
   −1 0 
n e un C1,neun
qn (ρn , 1n ) = e
γn − e
0n = (C.1)
given by, respectively, ρn2 n−1eu0n C2,ne
un
0 
Rn (ω, ρ) = e0n (ρ, ρ 2 )0 − e
γn Υ 0n (ρ, ρ 2 )0 − e
γn
 
with Cr ,n = (1/2) In − ρn M0n (Ar ,n + A0r ,n ) (In − ρn Mn ), and
en e 
0  where the matrices Ar ,n are defined by (5), r = 1, 2. The
Rn (ρ) = 0n (ρ, ρ 2 )0 − γn Υn 0n (ρ, ρ 2 )0 − γn .
 
second equality in (C.1) is seen to hold in light of the discussion
To prove the consistency of eρn we show that the conditions of, surrounding Eqs. (4)–(9). For later use we observe that the rows
e.g., Lemma 3.1 in Pötscher and Prucha (1997) are satisfied for and column sums of Cr ,n are uniformly bounded in absolute value;
the problem at hand. We first show that ρn is an identifiably see, e.g., Remark A.1 in Kelejian and Prucha (2004).
unique sequence of minimizers of Rn . Observe that Rn (ρ) ≥ 0 and We have shown in Theorem 1 that the GM estimator e ρn defined
that Rn (ρn ) = 0. From (6) we have γn = 0n [ρn , ρn2 ]0 . Utilizing in (9) is consistent. Apart on a set whose probability tends to zero
Assumption 5 we then get the estimator satisfies the following first order condition
∂ qn (e
ρn , 1n )
Rn (ρ) − Rn (ρn ) = Rn (ρ) qn (e
ρn , 1n )0 Υ
en = 0.
0 ∂ρ
= ρ − ρn , ρ 2 − ρn2 00n Υn 0n ρ − ρn , ρ 2 − ρn2
  
0 Substituting the mean value theorem expression
≥ λmin (Υn )λmin (00n 0n ) ρ − ρn , ρ 2 − ρn2 ρ − ρn , ρ 2 − ρn2
 
∂ qn (ρ n , 1n )
≥ λ∗ [ρ − ρn ]2 qn (e
ρn , 1n ) = qn (ρn , 1n ) + (e
ρn − ρn )
∂ρ
for some λ∗ > 0. Hence for every ε > 0 and n we have: into the first order condition yields

inf [Rn (ρ) − Rn (ρn )] ∂ qn (e


ρn , 1n ) ∂ qn (ρ n , 1n ) 1/2
{ρ∈[−aρ ,aρ ]:kρ−ρn k≥ε} Υ
en n (e ρn − ρn )
∂ρ 0 ∂ρ
≥ inf λ2∗ [ρ − ρn ]2 = λ∗ ε 2 > 0,
{ρ∈[−aρ ,aρ ]:kρ−ρn k≥ε} ∂ qn (e
ρn , 1n )
=− en n1/2 qn (ρn , 1n )
Υ (C.2)
which proves that ρn is identifiably unique. Next let 8n = ∂ρ 0
[0n , −γn ] and 8 0n , −e
en = [e γn ], then where ρ n is some in-between value. Observe that
H.H. Kelejian, I.R. Prucha / Journal of Econometrics 157 (2010) 53–67 65

∂ qn (ρ, 1n )
 
1 Theorem 2 in the text, the VC matrix of the vector of quadratic
0n
= −e (C.3)
forms on the r.h.s. of (C.9) is given by 9n where the elements of
∂ρ 2ρ
that matrix are given in (13). These elements can be written more
and consider the nonnegative scalars explicitly as
ρn , 1n ) ∂ qn (ρ n , 1n )
∂ qn (e n n
4
en = Υ 1 XX
ψrs,n = (aij,r ,n + aji,r ,n )(aij,s,n + aji,s,n )σi2,n σj2,n
en
∂ρ 0 ∂ρ
2n i=1 j=1
 0  
1 0 1
= 0n Υ 0n , (C.4) n
ρn 2ρ n 1X
e ene
2e
+ ai,r ,n ai,s,n σi2,n . (C.10)
 0   n i=1
1 1
4n = 00n Υn 0n .
2ρn 2ρn By assumption λmin (9n ) ≥ const > 0. Since the matrices Ar ,n ,
p
the vectors ar ,n , and the innovations εn satisfy all of the remaining
In proving Theorem 1 we have demonstrated that e 0n − 0n → 0 and assumptions of the central limit theorem for vectors of linear
0n and 0n are Op (1) and O(1), respectively. By
that the elements of e quadratic forms given above as Theorem A.1 it follows that
Assumption 5 we have Υ en − Υn = op (1) and also that the elements 1 
en and Υn are Op (1) and O(1), respectively. Since e
of Υ ρn and ρ n are ε0n (A1,n + A01,n )εn + a01,n εn
ξn = −9n−1/2 n−1/2  2
 
consistent and bounded in absolute value, clearly
1 0

p
εn (A2,n + A02,n )εn + a02,n εn
4
en − 4n → 0 (C.5) 2
d
as n → ∞, and furthermore 4 en = Op (1) and 4n = O(1). In → N (0, I2 ). (C.11)
particular 4n ≤ λ∗∗
Ξ where λΞ is some finite constant. In light of
∗∗
Since the row and column sums of the matrices Ar ,n are uniformly
Assumption 5 we have 4n ≥ λmin (Υn )λmin (00n 0n )(1 + 4ρn2 ) ≥ λ∗Ξ bounded in absolute value, and since the elements of ar ,n and the
for some λ∗Ξ > 0. Hence 0 < 4−n ≤ 1/λΞ < ∞, and thus we also
1 ∗
variances are uniformly bounded by finite constants it is readily
+ 1/2
have 4− n = O(1). Let 4n denote the generalized inverse of 4n . It
1 e e seen from (C.10) that the elements of 9n , and hence those of 9n
then follows as a special case of Lemma F1 in Pötscher and Prucha are uniformly bounded. It now follows from (C.7), (C.8) and (C.11)
(1997) that 4en is nonsingular eventually with probability tending that
e+
to one, that 4 n = Op (1), and that
 0
1
n1/2 (e
ρn − ρn ) = 4− 1
00n Υn 9n1/2 ξn + op (1). (C.12)
e+ −1 p n 2ρn
4n − 4n → 0 (C.6)
as n → ∞. Observing that 4n = J0n Υn Jn , where Jn = 0n [1, 2ρn ]0 , this
e+
Premultiplying (C.2) by 4 n and rearranging terms yields
establishes (14). Since all of the nonstochastic terms on the r.h.s.
h i of (C.12) are O(1) it follows that n1/2 (e
ρn − ρn ) = Op (1). Next recall
n1/2 (e en n1/2 (e +
ρn − ρn ) = 1 − 4
en 4 ρn − ρn ) that 0 < λ∗Ξ ≤ 4n ≤ λ∗∗ Ξ < ∞. Hence

+ ∂ qn (e
ρn , 1n ) n Jn Υn 9n Υn Jn 4n
4− ≥ λmin (9n ) [λmin (Υn )]2 λmin (00n 0n )
1 0 −1
−4
en en n1/2 qn (ρn , 1n ).
Υ
∂ρ 0 × (1 + 4ρn2 )/(λ∗∗
Ξ ) ≥ const > 0.
2

In light of the above discussion the the first term on the r.h.s. is zero This establishes the last claim of the theorem. 
on ω-sets of probability tending to one. This yields
A part of proving Theorem 3 will be to show that Ψ
en − Ψn =
1/2 ∂ qn (e
ρn , 1n ) op (1). Observe that the elements can be written as
n (e e+
ρn − ρn ) = −4 en n1/2 qn (ρn , 1n ) + op (1).
Υ
n
∂ρ 0
ψ
ers,n = ψ
ers∗ ,n + ψ
ers∗∗,n , and ψrs,n = ψrs∗ ,n + ψrs∗∗,n (C.13)
(C.7)
with
Observe that n X
X n

∂ qn (e
ρn , 1n )
 0 ψ
ers∗ ,n = (2n)−1 εi2,ne
aij,ne εj2,n ,
e+ 1 1
4 Υ
en − 4− 0n Υn = op (1).
0
(C.8) i=1 j=1
n
∂ρ 0 n 2ρn
n X
X n

In light of (C.1) and Lemma C.1 the elements of n1/2 qn (ρn , 1n ) ψrs∗ ,n = (2n)−1 aij,n σi2,n σj2,n ,
can be expressed as (r = 1, 2) i=1 j=1
−1/2 0
n un = n−1/2 u0n Cr ,n un + α0r ,n n1/2 1n + op (1)
un Cr ,ne
e ψ α0r ,ne
ers∗∗,n = n−1e P0ne 6ne
F0n e Fne αs,n ,
Pne
where ψrs∗∗,n = n−1 α0r ,n P0n F0n 6n Fn Pn αs,n , (C.14)
αr ,n = 2n EDn Cr ,n un .
−1 0
where aij,n = (aij,r ,n + aji,r ,n )(aij,s,n + aji,s,n ).
Furthermore, the lemma implies that the elements of αr ,n ers∗ ,n − ψrs∗ ,n =
The next two lemmata will be used to show that ψ
are uniformly bounded in absolute value. Utilizing un = op (1).
(In − ρn Mn )−1 εn and Assumption 7 we have
1  Lemma C.3. Suppose Assumptions 1–3 hold. Let Λn = n−1 (σ 2n )0 An σ 2n
ε0 (A + A01,n )εn + a01,n εn and Λn = n−1 (ε2n )0 An ε2n with σ 2n = (σ12,n , . . . , σn2,n )0 and ε2n =
 n 1 ,n
n1/2 qn (ρn , 1n ) = n−1/2  2  + op (1) (ε12,n , . . . , εn2,n )0 , and where the n × n matrices An are real non-

1 0
εn (A2,n + A2,n )εn + a2,n εn
0 0 stochastic and symmetric. Suppose further that the diagonal elements
2 of the matrices An are zero and that their row and column sums are
(C.9) uniformly bounded in absolute value. Then E Λn = Λn = O(1)
p
where ar ,n = Tn αr ,n , r = 1, 2. Observe that the elements of and v ar (Λn ) = o(1), and hence Λn − Λn → 0 as n → ∞, and
ar ,n are uniformly bounded in absolute value. As discussed before Λn = Op (1).
66 H.H. Kelejian, I.R. Prucha / Journal of Econometrics 157 (2010) 53–67

Observing that the row and column sums of In − ρn M0n (Ar ,n +



Proof. A proof of the lemma is given in Kelejian and Prucha
(2007b).  Ar ,n ) (In − ρn Mn ) are bounded uniformly in absolute value it
0

follows from Lemma C.1 that e αr ,n − αr ,n = op (1), αr ,n = O(1)


Lemma C.4. Suppose Assumptions 1–4 hold. Let εn = (In − ρn Mn ) and hence e αr ,n = Op (1). By assumption e Pn − Pn = op (1), Pn =
εn = (In − e
un , and let e ρn Mn ) e un = un + Dn 1n and
un with e
O(1), and hence e Pn = Op (1). Observe that by Lemma C.6 we have
Dn = [d01.,n , . . . , d0n.,n ]0 , and where e ρn can be any estimator that
n−1e 6ne
F0n e Fn − n−1 F0n 6n Fn = op (1), n−1 F0n 6n Fn = O(1), and hence
satisfies n1/2 (eρn − ρn ) = Op (1). Define Λ en = n−1 (e ε2n )0 An (eε2n ), Λn =
n Fn 6n Fn = Op (1). Thus ψ
−1e0 e e ers∗∗,n − ψrs∗∗,n = op (1), ψrs∗∗,n = O(1), and
n (εn ) An εn with e
−1 2 0 2
εn = (e
2
ε1,n , . . . ,e
2
εn,n ) , εn = (ε1,n , . . . , εn2,n )0 , and
2 0 2 2

where the n × n matrices An are real nonstochastic and symmetric. ψ


ers∗∗,n = Op (1). Hence e
9n −9n = op (1), 9n = O(1) and e 9n = Op (1).
Suppose further that the diagonal elements of the matrices An are By Assumption 5 we have Υ en − Υn = op (1), Υn = O(1) and
zero and that their row and column sums are uniformly bounded in thus Υ en = Op (1). Let 4n = J0n Υn Jn and 4 en = eJ0n Υ Jn , then, as
ene
p
absolute value, and that Ed4ij,n ≤ Kd < ∞. Then, Λ
en − Λn → 0 as demonstrated in the proof of Theorem 2, Jn = Op (1), Jn = O(1),
e
n → ∞, and Λn = Op (1). e+ p
4 n = Op (1) and 4n = Op (1), and furthermoree
−1
e Jn − Jn → 0 and
e+ −1 p
Proof. A proof of the lemma is given in Kelejian and Prucha 4 n − 4n → 0 as n → ∞. (There is a slight difference in the
(2007b).  definition of 4 en here and in the proof of Theorem 2, which does
The next two lemmata will be used to show that ψ
ers∗∗,n − ψrs∗∗,n = not affect the claim.) Given the above results it is now obvious that
op (1), where ψ
ers∗∗,n and ψrs∗∗,n are defined in (C.13) and (C.14). eρn − eρn = op (1).
e 
Proof of Theorem 4. We concentrate again on the case Fn =
Lemma C.5. Suppose Assumptions 1–4 hold. Let εn = (In − ρn Mn ) 0 −1
In − ρ n M n

Hn . The first line in (27) is seen to hold in
εn = (In − e
un , and let e ρn Mn ) e un = un + Dn 1n and
un with e
light of Assumption 7 and Theorem 2. We next verify that
Dn = [d01.,n , . . . , d0n.,n ]0 , and where e ρn can be any estimator that d
ρn − ρn = op (1). Let an and bn be n × 1 vectors whose
satisfies e ξ◦,n → N (0, Ip∗ +2 ) utilizing the central limit theorem for vectors of
elements are uniformly bounded in absolute value by c < ∞ and let linear quadratic forms given above as Theorem A.1. By assumption
6n = diagi=1,...,n (σi2 ) and e 6n = diagi=1,...,n (e
εi2,n ). Then: λmin (9◦,n ) ≥ cΨ∗ ◦ > 0. In proving Theorem 2 we verified that
the innovations εn and the elements of ar ,n and Ar ,n appearing
6n bn − n−1 a0n 6n bn = op (1) and n−1 a0n 6n bn = O(1).
(a) n−1 a0n e in vn all satisfy the conditions of the central limit theorem. Since
(b) There exist random variables ςn that do not depend on an and bn −1
such that the row and column sums of In − ρn M0n and the elements of
Hn are uniformly bounded in absolute value it follows that the
6n bn − n−1 a0n 6n bn ≤ K (c )(1 + ςn )
n−1 a0n e elements of Fn are also uniformly bounded in absolute value. Thus
with ςn = op (1) and where K (c ) < ∞ is a constant that depends all conditions of Theorem A.1 are satisfied, which verifies the claim.
monotonically on c (as well as on some other bounds maintained In proving Theorems 2 and 3 we have shown that e 9n −
in the assumptions). 9n = op (1), 9n = O(1) and thus e 9n = Op (1). By analogous
Proof. A proof of the lemma is given in Kelejian and Prucha argumentation it is readily seen that the other sub-matrices of e 9◦,n
(2007b).  and 9◦,n have the same properties, and thus e 9◦,n − 9◦,n = op (1)
and 9◦,n = O(1). By assumption e Pn − Pn = op (1), Pn = O(1) and
Lemma C.6. Suppose Assumptions 1–4 hold. Furthermore assume thus ePn = Op (1), and furthermore Υ en − Υn = op (1), Υn = O(1)
that supn |ρn | < 1, and the row sums and column sums of Mn
and thus Υn = Op (1). In proving Theorem 2 we have verified
e
are bounded uniformly in absolute value by, respectively, one and p
some finite constant. Let εn = (In − ρn Mn ) un , and let e εn = that e Jn − Jn → 0, Jn = O(1) and e Jn = Op (1), and furthermore
ρn Mn ) e
(In − e un = un + Dn 1n and Dn = [d01.,n , . . . , d0n.,n ]0 ,
un with e that (e J0n Υ Jn )+ − (J0n Υn Jn )−1 = op (1), (e
ene J0n Υ Jn )+ = Op (1) and
ene
and where eρn can be any estimator that satisfies e
ρn − ρn = op (1). Let (J0n Υn Jn )−1 = O(1). The claim that e ◦,n − ◦,n (Υn ) = op (1) and
◦,n = O(1) is now obvious. 
Fn = e
Fn = Hn
or Appendix D. Proofs for Section 4
Fn = (In − ρn Mn )−1 Hn ρn Mn )+ Hn ,
Fn = (In − e
and e
Proof of Lemma 3. Utilizing (3) it follows from Assumptions 3 and
where Hn is an n×p∗ matrix whose elements are uniformly bounded in 8 and supn βn < ∞ that all columns of Zn = [Xn , Wn yn ] are of
absolute value, let 6n = diagi=1,...,n (σi2 ) and e
6n = diagi=1,...,n (e
εi2,n ), the form ϑn = πn + 5n εn , where the elements of the n × 1 vector
then n−1e Fn − n−1 F0n 6n Fn = op (1) and n−1 F0n 6n Fn = O(1).14
6ne
F0n e πn are bounded in absolute value and the row and column sums
of the n × n matrix 5n are uniformly bounded in absolute value
Proof. A proof of the lemma is given in Kelejian and Prucha
by some finite constant; compare, e.g., Remark A.1 in Kelejian and
(2007b). 
Prucha (2004). The claims that the fourth moments of Dn = −Zn
Proof of Theorem 3. We first demonstrate that Ψ en − Ψn = op (1), are uniformly bounded by a finite constant, and that Assumption 6
utilizing the expressions for the elements of Ψ en and Ψn in (C.13) and holds, now follows directly from Lemma C.2.
(C.14). Observe that under the maintained assumption the row and Clearly
column sums of the absolute elements of the matrices Ar ,n and As,n
are uniformly bounded, and thus clearly are those of the matrices n1/2 (e P0n n−1/2 F0n εn
δn − δn ) = e
An = (aij,n ) with aij,n = (aij,r ,n + aji,r ,n )(aij,s,n + aji,s,n ). It then −1
Pn is defined in the lemma and Fn =
where e In − ρn M0n Hn .
follows directly from Lemmas C.3 and C.4 that ψ ers∗ ,n −ψrs∗ ,n = op (1),
Given Assumption 10 clearly e Pn = Pn + op (1) and Pn = O(1),
ψrs∗ ,n = O(1) and ψ
ers∗ ,n = Op (1).
with Pn defined in the lemma. Since by Assumption 3 the row and
column sums of (In − ρn Mn )−1 are uniformly bounded in absolute
value, and since by Assumption 9 the elements of Hn are uniformly
14 We would like to thank Benedikt Pötscher for very helpful comments on parts bounded in absolute value, it follows that the elements of Fn are
of the proof of this lemma. uniformly bounded in absolute value. By Assumption 2, E (εn ) = 0
H.H. Kelejian, I.R. Prucha / Journal of Econometrics 157 (2010) 53–67 67

and its diagonal VC matrix, 6n , has uniformly bounded elements. Case, A., Hines Jr., J., Rosen, H., 1993. Budget spillovers and fiscal policy
Therefore, En−1/2 F0n εn = 0 and the elements of VC (n−1/2 F0n εn ) = independence: Evidence from the states. Journal of Public Economics 52,
285–307.
n−1 F0n 6n Fn are also uniformly bounded in absolute value. Thus, Cliff, A., Ord, J., 1973. Spatial Autocorrelation. Pion, London.
by Chebyshev’s inequality n−1/2 F0n εn = Op (1), and consequently Cliff, A., Ord, J., 1981. Spatial Processes, Models and Applications. Pion, London.
Cohen, J.P, Morrison Paul, C.J., 2004. Public infrastructure investment, interstate
n1/2 (e
δn − δn ) = P0n n−1/2 F0n εn + op (1) and P0n n−1/2 F0n εn = Op (1). spatial spillovers, and manufacturing cost. Review of Economic and Statistics
This completes the proof recalling that Tn = Fn Pn .  86, 551–560.
Conley, T., 1999. GMM estimation with cross sectional dependence. Journal of
Proof of Lemma 4. Note from (1) and (2) that Econometrics 92, 1–45.
Cressie, N.A.C., 1993. Statistics of Spatial Data. Wiley, New York.
yn∗ (ρ̂n ) = Zn∗ (ρ̂n )δn + εn − (ρ̂n − ρn )Mn un Das, D., Kelejian, H.H., Prucha, I.R., 2003. Small sample properties of estimators
of spatial autoregressive models with autoregressive disturbances. Papers in
and hence Regional Science 82, 1–26.
Dhrymes, P.J., 1978. Introductory Econometrics. Springer Verlag, New York.
n1/2 [b
δn (ρ̂n ) − δn ] Driscoll, J., Kraay, A., 1998. Consistent covariance matrix estimation with spatially
−1 −1/2 0 dependent panel data. The Review of Economics and Statistics LXXX, 549–560.
= n−1bZ0n∗ (ρ̂n )Zn∗ (ρ̂n ) Zn∗ (ρ̂n ) εn − (ρ̂n − ρn )Mn un
   Hanushek, E.A., Kain, J.F., Markman, J.M., Rivkin, S.G., 2003. Does peer ability affect
n b
student achievement? Journal of Applied Econometrics 18, 527–544.
P∗0 n−1/2 F∗0 εn − (ρ̂n − ρn )n−1/2 F∗∗0 εn ,
 
=e Holtz-Eakin, D., 1994. Public sector capital and the productivity puzzle. Review of
n n n Economics and Statistics 76, 12–21.
Horn, R.A., Johnson, C.R., 1985. Matrix Analysis. Cambridge University Press,
where eP∗n is defined in the lemma, and with F∗n = Hn and F∗∗n = Cambridge.
−1 0
In − ρn M0n Mn Hn . In light of Assumption 10, and since ρ̂n is Kapoor, M., Kelejian, H.H., Prucha, I.R., 2007. Panel data models with spatially cor-
related error components, Department of Economics. Journal of Econometrics
n1/2 -consistent, it follows that 140, 97–130.
Kelejian, H.H., Prucha, I.R., 1998. A generalized spatial two-stage least squares
Z0n∗ (ρ̂n )Zn∗ (ρ̂n ) − Q0HZ∗ (ρn )Q−
n−1b HH QHZ∗ (ρn ) = op (1).
1
procedure for estimating a spatial autoregressive model with autoregressive
disturbances. Journal of Real Estate Finance and Economics 17, 99–121.
Since by Assumption 10 we have Q0HZ∗ (ρn )Q− HH QHZ∗ (ρn ) = O(1)
1 Kelejian, H.H., Prucha, I.R., 1999. A generalized moments estimator for the
autoregressive parameter in a spatial model. International Economic Review 40,
and [QHZ∗ (ρn )QHH QHZ∗ (ρn )] = O(1) it follows that
0 −1 −1
509–533.
Kelejian, H.H., Prucha, I.R., 2001. On the asymptotic distribution of the Moran I test
Z0n∗ (ρ̂n )Zn∗ (ρ̂n )]−1 − [Q0HZ∗ (ρn )Q−
[n−1b HH QHZ∗ (ρn )]
1 −1
= op (1); statistic with applications. Journal of Econometrics 104, 219–257.
Kelejian, H.H., Prucha, I.R., 2002. 2SLS and OLS in a spatial autoregressive model with
compare, e.g., Pötscher and Prucha (1997), Lemma F1. In light of equal spatial weights. Regional Science and Urban Economics 32, 691–707.
Kelejian, H.H., Prucha, I.R., 2004. Estimation of simultaneous systems of spatially
P∗n − P∗n = op (1) and P∗n = O(1), with
this it follows further that e interrelated cross sectional equations. Journal of Econometrics 118, 27–50.
P∗n defined in the lemma. By argumentation analogous to that in Kelejian, H.H., Prucha, I.R., 2007a. HAC estimation in a spatial framework.
the proof of Lemma 3 it is readily seen that n−1/2 F∗0 n εn = Op (1)
Department of Economics. Journal of Econometrics 140, 131–154.
Kelejian, H.H., Prucha, I.R., 2007b. Specification and estimation of spatial
and n−1/2 F∗∗0 n εn = Op (1). Consequently n
1/2 b
[δn (ρ̂n ) − δn ] = autoregressive models with autoregressive and heteroskedastic disturbances.
∗0 −1/2 ∗0 ∗0 −1/2 ∗0
Pn n Fn εn + op (1) and Pn n Fn εn = Op (1), observing again Department of Economics, University of Maryland, July.
Kelejian, H.H., Prucha, I.R., Yuzefovich, E., 2004. Instrumental variable estimation
that ρ̂n − ρn = op (1). This completes the proof recalling that of a spatial autoregressive model with autoregressive disturbances: Large and
T∗n = F∗n P∗n .  small sample results. In: LeSage, J., Pace, K. (Eds.), Advances in Econometrics:
Spatial and Spatiotemporal Econometrics. Elsevier, New York, pp. 163–198.
Korniotis, G.M., 2005. A dynamic panel estimator with both fixed and spatial effects.
References Department of Finance, University of Notre Dame.
Lee, L.F., 2002. Consistency and efficiency of least squares estimation for mixed
Amemiya, T., 1985. Advanced Econometrics. Harvard University Press, Cambridge. regressive, spatial autoregressive models. Econometric Theory 18, 252–277.
Anselin, L., 1988. Spatial Econometrics: Methods and Models. Kluwer Academic Lee, L.F., 2003. Best spatial two-stage least squares estimators for a spatial
Publishers, Boston. autoregressive model with autoregressive disturbances. Econometric Reviews
Anselin, L., Florax, R., 1995. New Directions in Spatial Econometrics. Springer, 22, 307–335.
London. Lee, L.F., 2004. Asymptotic distributions of maximum likelihood estimators for
Audretsch, D.B., Feldmann, M.P., 1996. R&D spillovers and the geography of spatial autoregressive models. Econometrica 72, 1899–1925.
innovation and production. American Economic Review 86, 630–640. Lee, L.F., 2007a. The method of elimination and substitution in the GMM estimation
Baltagi, B.H., Egger, P., Pfaffermayr, M., 2007a. Estimating models of complex FDI: of mixed regressive, spatial autoregressive models. Journal of Econometrics 140,
Are there third-country effects? Journal of Econometrics 140, 260–281. 155–189.
Baltagi, B.H., Li, D., 2004. Prediction in panel data models with spatial correlation. Lee, L.F., 2007b. GMM and 2SLS estimation of mixed regressive, spatial autoregres-
In: Anselin, L., Florax, R.J.G.M., Rey, S.J. (Eds.), New Advances in Spatial sive models. Journal of Econometrics 137, 489–514.
Econometrics. Springer Verlag, New York. Lin, X., Lee, L.F., 2005. GMM estimation of spatial autoregressive models with
Baltagi, B.H., Li, D., 2001a. Double length artificial regressions for testing spatial unknown heteroskedasticity. Department of Economics, Ohio State University,
dependence. Econometric Reviews 20, 31–40. Journal of Econometrics, forthcoming (doi:10.1016/j.jeconom.2009.10.035).
Baltagi, B.H., Li, D., 2001b. LM test for functional form and spatial error correlation. Pinkse, J., Slade, M.E., 1998. Contracting in space: An application of spatial statistics
International Regional Science Review 24, 194–225. to discrete-choice models. Journal of Econometrics 85, 125–154.
Baltagi, B.H., Song, S.H., Jung, B.C., Koh, W., 2007b. Testing for serial correlation, Pinkse, J., Slade, M.E., Brett, C., 2002. Spatial price competition: A semiparametric
spatial autocorrelation and random effects using panel data. Journal of approach. Econometrica 70, 1111–1153.
Econometrics 140, 5–51. Pötscher, B.M., Prucha, I.R., 1997. Dynamic Nonlinear Econometric Models,
Baltagi, B.H., Song, S.H., Koh, W., 2003. Testing panel data regression models with Asymptotic Theory. Springer Verlag, New York.
spatial error correlation. Journal of Econometrics 117, 123–150. Robinson, P.M., 2007a. Nonparametric spectrum estimation for spatial data. Journal
Bao, Y., Ullah, A., 2007. Finite sample properties of maximum likelihood estimators of Statistical Planning and Inference 137, 1024–1034.
in spatial models. Journal of Econometrics 137, 396–413. Robinson, P.M., 2007b. Efficient estimation of the semiparametric spatial autore-
Bell, K.P., Bockstael, N.E., 2000. Applying the generalized-moments estimation gressive model. Department of Economics, London School of Economics, Journal
approach to spatial problems involving microlevel data. Review of Economics of Econometrics, forthcoming (doi:10.1016/j.jeconom.2009.10.031).
and Statistics 82, 72–82. Sacredote, B., 2001. Peer effects with random assignment: Results of Dartmouth
Bennett, R., Hordijk, L., 1986. Regional econometric and dynamic models. roommates. Quarterly Journal of Economics 116, 681–704.
In: Nijkamp, P. (Ed.), Handbook of Regional and Urban Economics, vol. 1. North Shroder, M., 1995. Games the states don’t play: Welfare benefits and the theory of
Holland, Amsterdam, pp. 407–441. fiscal federalism. Review of Economics and Statistics 77, 183–191.
Betrand, M., Luttmer, E.F.P., Mullainathan, S., 2000. Network effects and welfare Su, L., Yang, Z., 2007. QML estimation of dynamic panel data models with
cultures. Quarterly Journal of Economics 115, 1019–1055. spatial errors. School of Economics and Social Sciences, Singapore Management
Besley, T., Case, A., 1995. Incumbent behavior: Vote-seeking, tax-setting, and University.
yardstick competition. American Economic Review 85, 25–45. Topa, G., 2001. Social interactions, local spillovers and unemployment. Review of
Case, A., 1991. Spatial patterns in household demand. Econometrica 59, Economic Studies 68, 261–295.
953–966. Whittle, P., 1954. On stationary processes in the plane. Biometrica 41, 434–449.

You might also like