Wang 2013

Journal of Econometrics 172 (2013) 77–89
Partial maximum likelihood estimation of spatial probit models✩

Honglin Wang (a), Emma M. Iglesias (b,∗), Jeffrey M. Wooldridge (c)
(a) Hong Kong Institute for Monetary Research, 55/F, Two International Finance Centre, 8 Finance Street, Central, Hong Kong
(b) Department of Applied Economics II, Facultad de Economía y Empresa, University of A Coruña, Campus de Elviña, 15071 A Coruña, Spain
(c) Department of Economics, Michigan State University, 101 Marshall-Adams Hall, East Lansing, MI 48824-1038, USA
Article info

Article history:
Received 31 October 2009
Received in revised form 17 February 2012
Accepted 13 August 2012
Available online 21 August 2012

JEL classification: C12, C13, C21, C24, C25

Keywords: Spatial statistics; Maximum likelihood; Probit model

Abstract

This paper analyzes spatial Probit models for cross sectional dependent data in a binary choice context. Observations are divided by pairwise groups and bivariate normal distributions are specified within each group. Partial maximum likelihood estimators are introduced and they are shown to be consistent and asymptotically normal under some regularity conditions. Consistent covariance matrix estimators are also provided. Estimates of average partial effects can also be obtained once we characterize the conditional distribution of the latent error. Finally, a simulation study shows the advantages of our new estimation procedure in this setting. Our proposed partial maximum likelihood estimators are shown to be more efficient than the generalized method of moments counterparts.
© 2012 Elsevier B.V. All rights reserved.

✩ We are grateful to two referees, the Associate Editor, the Co-Editor and participants at the 2009 Econometric Society European Meeting, the 2009 Simposio de Análisis Económico, the 2010 "Brunel Macroeconomic Research Centre"-QASS Conference on Macro and Financial Economics and at seminars at London City University, London School of Economics, Tinbergen Institute, UCL, University Carlos III, University of Essex and University of Exeter for very useful comments. Any remaining errors are our own.
∗ Corresponding author.
E-mail addresses: hwang@hkma.gov.hk (H. Wang), emma.iglesias@udc.es, iglesia5@msu.edu (E.M. Iglesias), wooldri1@msu.edu (J.M. Wooldridge).
0304-4076/$ – see front matter © 2012 Elsevier B.V. All rights reserved.
doi:10.1016/j.jeconom.2012.08.005

1. Introduction

Most econometric techniques using cross-sectional data are based on the assumption of independence of the observations. When the data are outcomes measured at different geographical locations, the assumption of independence is tenuous, especially as economic activities have become more and more correlated over space with the advent of modern communication and transportation improvements. Technological advances in geographic information systems (GIS) make collecting spatial data easier than ever before. Consequently, the possibility of spatial correlation among observations has received more and more attention in a wide range of fields, including regional, real estate, agricultural, environmental, and industrial organization economics (Lee, 2004).

Econometricians have begun to pay more attention to spatial dependence problems in the last two decades, and there have been important advances, both theoretical and empirical (see Anselin et al. (2004) for a comprehensive review of econometrics for spatial models). The analysis of spatial data starts with an underlying spatial structure generating observed spatial correlations (Anselin and Florax, 1995). There are two popular ways of capturing spatial dependence. The first is in the domain of geostatistics, where the spatial index is continuous (Conley, 1999). The second is to assume that spatial sites form a countable lattice (Lee, 2004). Among lattice models, there are also two types of spatial dependence models that have received the bulk of the attention: the spatial autoregressive dependent variable model (SAR) and the spatial autoregressive error model (SAE). In most applications of spatial models, the dependent variables are continuous, work that has been aided by important theoretical results in Conley (1999), Lee (2004), and Kelejian and Prucha (1999, 2001). Nevertheless, there are a handful of applications that address spatial dependence with discrete choice dependent variables
(Case, 1991; McMillen, 1995; Pinkse and Slade, 1998; Lesage, 2000; Beron and Vijverberg, 2003; Pinkse et al., 2006). The purpose of this paper is to advance the available estimation methods for spatially correlated binary outcomes.

While the analysis can be made more general, we focus on the probit model with spatially correlated data. As is now well known, if we ignore the spatial correlation and construct a pseudo-likelihood function as if we had independent draws, the resulting pooled maximum likelihood estimator (MLE) is, under fairly general conditions, consistent and asymptotically normal, provided the marginal model is correctly specified. Poirier and Ruud (1988) established this result for time series data, and it is fairly clear that it also holds for spatial data under assumptions that restrict the amount of dependence. The main drawback to applying the pooled MLE when the observations are dependent is a loss of efficiency. Some authors, for example Robinson (1982), explicitly consider joint maximum likelihood estimation of a nonlinear model with time series data. Unfortunately, in the context of spatially correlated data, obtaining maximum likelihood estimators that account for the joint dependence in the data is computationally very demanding.

Rather than taking either extreme (ignoring the dependence in the data or trying to model full joint dependence), middle-ground approaches are possible. For example, Poirier and Ruud (1988) show how to estimate the probit model with dependence in time-series data using generalized conditional moment (GCM) estimators. These estimators are computationally attractive and relatively more efficient than ignoring serial dependence. Generally, nonlinear models with a time series dimension can be estimated by generalized method of moments (GMM). The GMM approach is (asymptotically) more efficient than just using a pooled MLE procedure. However, because time series dependence is ignored in forming moment conditions, GMM estimators still can be considerably less efficient than the joint MLE.

Similar considerations hold for spatially correlated data. Methods that only use information on the marginal distributions, such as Pinkse and Slade's (1998) GMM estimator of the SAE probit model based on the pooled MLE first order conditions, potentially give up much in terms of efficiency compared with a full MLE approach. The motivation for the current paper is that joint MLE is often prohibitively difficult, while methods based only on marginal distributions will often be too imprecise. Therefore, we propose a middle ground between a pooled probit approach and full maximum likelihood. In particular, we choose to capture spatial dependence by assuming that sites form a countable lattice. Then, we divide the lattice into many small groups (clusters), where the clusters are formed from adjacent observations. The resulting structure is a large number of small clusters. If we can obtain the joint density of the responses within a cluster, we can improve upon methods that completely ignore the spatial dependence while arriving at estimation methods much less computationally demanding than joint MLE. We refer to our proposed method as "partial MLE" because we are only using partial joint distributions, not the entire joint distribution.

Because we model spatial correlation only within a cluster, we still need to account for spatial correlation across clusters. This feature is what distinguishes the current setting from a standard panel data setting, where independence across clusters is assumed. To obtain valid inference, we appeal to Conley (1999), who extends Newey and West (1987) to allow for data generated by a countable lattice. Conley (1999) uses metrics of economic distance to characterize dependence among agents, and shows that the GMM estimator is consistent and asymptotically normal under assumptions similar to those used for time-series data.

The rest of the paper is organized as follows. Section 2 provides a brief overview of popular spatial models with a binary response. Section 3 presents the bivariate spatial probit model. In Section 4, we prove consistency and asymptotic normality of the PML estimator (PMLE) under regularity assumptions, and discuss how to get consistent covariance matrix estimators. Section 5 presents a simulation study showing the advantages of our new estimation procedure in this setting. Finally, Section 6 concludes. The proofs are collected in Appendix A, while the results for the simulation study are provided in Appendix B.

2. Discrete choice models with spatial dependence

It is useful to begin with a brief discussion of general binary response models with spatial dependence. For a draw $i$, let $Y_i$ be a binary outcome and $X_i$ a $1 \times K$ vector of covariates. Assume that $Y_i$ is generated as

$$Y_i = 1[X_i\beta + \varepsilon_i > 0], \tag{1}$$

where $\varepsilon_i$ is an unobserved error and $\beta$ is a $K \times 1$ parameter vector to be estimated. Regardless of any dependence in the data across $i$, if $\varepsilon_i$ is independent of $X_i$, then the response probability $P(Y_i = 1|X_i)$ can be obtained if the distribution of $\varepsilon_i$ is known. In the case where $\varepsilon_i \sim \mathrm{Normal}(0, 1)$, it is well known that $P(Y_i = 1|X_i) = \Phi(X_i\beta)$, where $\Phi$ denotes the standard normal cumulative distribution function (cdf). The "marginal probability" can be used, under general assumptions, to consistently estimate $\beta$ using a pooled MLE procedure, even though the data may not be independent. This is effectively the insight of the Poirier and Ruud (1988) results for time series data.

Allowing explicitly for spatial correlation of the kind that is popular for linear models raises a couple of important issues, as recognized in Pinkse and Slade (1998). First, the variance of the error in such models typically depends on the distances among pairs of observations in the lattice, via the matrix that is used in a weighted least squares analysis. Let $W$ denote an $N \times N$ matrix of weights that are exogenous in the sense that

$$\varepsilon_i \,|\, X, W \sim \mathrm{Normal}(0, h_i(W, \lambda)), \tag{2}$$

where $h_i(W, \lambda) > 0$ is a variance function that depends on $\lambda$. The form of $h_i(\cdot)$ differs across spatial models and is not yet important. The exogeneity assumption is embodied in the requirement $E(\varepsilon_i|X, W) = 0$, which also imposes a strict exogeneity assumption on the covariates $X$.

If we maintain (2) along with (1), then $D(Y_i|X, W)$ follows a so-called heteroskedastic probit model with

$$P(Y_i = 1|X, W) = \Phi\left(X_i\beta \big/ \sqrt{h_i(W, \lambda)}\right). \tag{3}$$

Under sufficient regularity conditions, mainly restricting the amount of spatial dependence, $\beta$ and $\lambda$ can be consistently and $\sqrt{n}$-asymptotically normally estimated by using a pooled heteroskedastic probit approach. These moment conditions are used in the Pinkse and Slade (1998) GMM estimator.

Before we proceed further, the presence of $W$ in (3) raises a question about how we should summarize the partial effects of the elements of $X_i$ on the response probability. The notion of the average structural function (ASF), proposed by Blundell and Powell (2004) in a different context, seems useful. In the present application, the ASF is defined as

$$\mathrm{ASF}(x) = E_W\left[\Phi\left(x\beta \big/ \sqrt{h_i(W, \lambda)}\right)\right]. \tag{4}$$

The average partial effects are obtained by taking changes or partial derivatives of $\mathrm{ASF}(x)$. Given consistent estimators $\hat\beta$ and $\hat\lambda$, $\mathrm{ASF}(x)$ can be (under regularity conditions) consistently estimated by

$$n^{-1}\sum_{i=1}^{n} \Phi\left(x\hat\beta \big/ \sqrt{h_i(W, \hat\lambda)}\right). \tag{5}$$
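As a rough illustration of (4) and (5), the sample-average estimator of the ASF and its derivative with respect to one continuous regressor can be sketched as follows. This is a minimal sketch, not the authors' code: the function names `asf_hat` and `ape_hat` are hypothetical, and the fitted variances $h_i(W, \hat\lambda)$ are assumed to have been computed already and passed in as a plain list.

```python
import math


def norm_cdf(z):
    """Standard normal cdf, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def asf_hat(x, beta_hat, h_hat):
    """Sample analogue of the ASF at covariate vector x, as in Eq. (5).

    h_hat holds the fitted variances h_i(W, lambda_hat), one per observation
    (assumed precomputed; their form depends on the spatial model).
    """
    xb = sum(xk * bk for xk, bk in zip(x, beta_hat))
    return sum(norm_cdf(xb / math.sqrt(h)) for h in h_hat) / len(h_hat)


def ape_hat(x, beta_hat, h_hat, k):
    """Derivative of asf_hat with respect to x[k], for a continuous regressor."""
    xb = sum(xk * bk for xk, bk in zip(x, beta_hat))
    return sum(
        norm_pdf(xb / math.sqrt(h)) * beta_hat[k] / math.sqrt(h) for h in h_hat
    ) / len(h_hat)
```

For a discrete regressor, the corresponding partial effect would instead be the difference of `asf_hat` evaluated at the two values of that regressor.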
See Wooldridge (2005) for further discussion of average partial effects in the context of heteroskedastic probit models.

We now turn to specific spatial models that have been proposed for both linear and binary responses. Written with a latent variable $Y_i^*$, with $Y_i = 1[Y_i^* > 0]$, the probit model with spatial error correlation (SAE) can be written as

$$Y_i^* = X_i\beta + \varepsilon_i, \tag{6}$$

$$\varepsilon_i = \lambda \sum_{j=1}^{n} W_{ij}\varepsilon_j + u_i. \tag{7}$$

Here the $W_{ij}$ are elements of the spatial weights matrix $W$ introduced above. The specification of $W_{ij}$ usually relies on some measure of spatial distance between observations $i$ and $j$, such as the Euclidean distance. (By convention, $W_{ii} = 0$ for all $i$.) The parameter $\lambda$ is the spatial autoregressive error coefficient, and the $u_i$ are assumed to be i.i.d. $\mathrm{Normal}(0, 1)$ random variables. We can write (6) and (7) in matrix form as

$$Y^* = X\beta + \varepsilon, \tag{8}$$

$$\varepsilon = (I - \lambda W)^{-1} u, \tag{9}$$

so that the variance–covariance matrix for the $N \times 1$ vector $\varepsilon$ is

$$\Omega \equiv \mathrm{Var}(\varepsilon|X, W) = [(I - \lambda W)'(I - \lambda W)]^{-1}. \tag{10}$$

If $Y^*$ is observable, Eqs. (8) and (9) define the linear SAE model, and its estimation and asymptotic properties using full MLE have been extensively studied by Lee (2004). Here, we only observe the binary responses. Estimating $\beta$ and $\lambda$ when $Y_i = 1[Y_i^* > 0]$ is considerably more complicated than the linear case. In fact, constructing the likelihood function requires $N$-dimensional integration of a multivariate normal distribution. We refer the reader to Lee (2004) for details.

While the formulation in (10) is common, it is not the only possibility. We may prefer more of a moving average structure, such as

$$\varepsilon_i = u_i + \lambda \sum_{h \neq i} W_{ih} u_h, \tag{11}$$

where the $u_i$ are i.i.d. $\mathrm{Normal}(0, 1)$ random variables. This formulation is attractive because it is relatively easy to find variances and pairwise covariances (which we will use in the pseudo MLEs introduced in the next section). In particular,

$$\mathrm{Var}(\varepsilon_i|W) = 1 + \lambda^2 \sum_{h \neq i} W_{ih}^2 \tag{12}$$

and

$$\mathrm{Cov}(\varepsilon_i, \varepsilon_j|W) = \lambda W_{ij} + \lambda W_{ji} + \lambda^2 \sum_{h \neq i,\, h \neq j} W_{ih} W_{jh}. \tag{13}$$

Notice that if we use only $\mathrm{Var}(\varepsilon_i|W)$ in a pooled analysis we would have to take $\lambda > 0$, because (12) depends on $\lambda$ only through $\lambda^2$.

3. Using partial MLEs to estimate general spatial probit models

As mentioned earlier, estimating a probit spatial autocorrelation model by full MLE is a prodigious task. The EM algorithm (McMillen, 1992), the RIS simulator (Beron and Vijverberg, 2003), and the Bayesian Gibbs sampler (Lesage, 2000) can be used. But each of these approaches is computationally burdensome, making it very difficult to conduct simulation studies or to quickly estimate a range of models.

Below we discuss univariate and bivariate probit approaches to estimation. Both of these are computationally much simpler than joint MLE.

Fig. 1. $2n$ observations $\Longrightarrow$ $n$ groups.

3.1. Univariate probit partial MLE

If we use only the information in the marginal distributions $P(Y_i = 1|X, W)$, the approach taken by Pinkse and Slade (1998), then we are led to a partial (or pooled) log likelihood function of the form

$$L = \sum_{i=1}^{n} \left\{ Y_i \log[\Phi(X_i\beta/\sigma_i(\lambda))] + (1 - Y_i)\log[1 - \Phi(X_i\beta/\sigma_i(\lambda))] \right\}, \tag{14}$$

where $\sigma_i(\lambda)$ is shorthand for $\sqrt{\mathrm{Var}(\varepsilon_i|W)}$. Assuming that $\beta$ and $\lambda$ are identified, and that the conditions below in Section 4 hold, the pooled heteroskedastic probit is generally consistent and $\sqrt{n}$-asymptotically normal. But, for reasons we discussed above, it is likely to be very inefficient relative to the full MLE. While Pinkse and Slade's GMM estimator can help a little, estimators that use some information on the spatial correlation across observations seem more promising in terms of increasing precision.

3.2. Bivariate probit partial MLE

We now turn to using information on pairs of "nearby" observations to identify and estimate $\beta$ and $\lambda$. There is nothing special about using pairs; we could use, say, triplets, or even larger groups. But the bivariate case is easy to illustrate and is computationally quite feasible.

For illustration, assume a sample includes $2n$ observations, and we divide the $2n$ observations into $n$ pairwise groups according, for example, to the spatial Euclidean distance between them (see Fig. 1). In other words, each group includes two observations, with the idea being that the internal correlation between the two observations is more important than external correlations with observations in other groups. Of course, the way that we group the observations will affect the asymptotic variance of our procedure. In practice, we recommend specifying different types of grouping and checking whether the variance estimates are reduced significantly in the different cases. One could, after obtaining estimators from several groupings, apply an efficient minimum distance procedure (for example Wooldridge, 2010, Chapter 14) to obtain a single estimator.

Let $Y_g^* = (Y_{g1}^*, Y_{g2}^*)$ be the bivariate vectors of latent outcomes for group $g$, and assume for notational simplicity that we have $2n$
observations. Write the linear equations for group $g$ as

$$Y_{g1}^* = X_{g1}\beta + \varepsilon_{g1}, \tag{15}$$

$$Y_{g2}^* = X_{g2}\beta + \varepsilon_{g2}, \tag{16}$$

where $X_{g1}$ and $X_{g2}$ are $1 \times K$ vectors of regressors and $\beta$ is a $K \times 1$ vector; $\varepsilon_{g1}$ and $\varepsilon_{g2}$ are scalars. These two equations look like a two-period panel data model, but we must recognize that the variances and covariance depend on the weighting matrix in the underlying spatial model. Not only are $\varepsilon_{g1}$ and $\varepsilon_{g2}$ correlated with each other, but they are also correlated with the errors in other groups. Therefore, the variances and covariance of $\varepsilon_{g1}$ and $\varepsilon_{g2}$ depend not only on the weights within the group, but also on the weights with observations outside the group.

By assumption, $E(\varepsilon_{g1}|X_{g1}, W) = E(\varepsilon_{g2}|X_{g2}, W) = 0$. Write the $2 \times 2$ variance matrix as

$$\mathrm{Var}(\varepsilon_g|X_g, W) \equiv \Omega_g(W, \lambda) = \begin{pmatrix} \Omega_{g11} & \Omega_{g12} \\ \Omega_{g21} & \Omega_{g22} \end{pmatrix}, \tag{17}$$

where we often suppress the dependence of $\Omega_g(W, \lambda)$ on $W$ and $\lambda$ in what follows. The variance terms are the same as in the Pinkse and Slade (1998) approach. To implement our procedure, the covariance must also be computed; exploiting this correlation is the source of improving the precision of the estimates of $\beta$ and $\lambda$.

Let $Y_{g1}$ and $Y_{g2}$ be the binary outcomes associated with group $g$. The joint response probability implied by the bivariate normal distribution of the errors, given $X_g$ (and $W$), is

$$P(Y_{g1} = 1, Y_{g2} = 1|X_g) = P(X_{g1}\beta + \varepsilon_{g1} > 0,\; X_{g2}\beta + \varepsilon_{g2} > 0|X_g) \tag{18}$$

$$= P(\varepsilon_{g1} \le X_{g1}\beta,\; \varepsilon_{g2} \le X_{g2}\beta|X_g) = \Phi_2\left(\frac{X_{g1}\beta}{\sqrt{\Omega_{g11}}}, \frac{X_{g2}\beta}{\sqrt{\Omega_{g22}}}, \rho_g\right), \tag{19}$$

$$\rho_g = \frac{\mathrm{Cov}(\varepsilon_{g1}, \varepsilon_{g2})}{\sqrt{\mathrm{Var}(\varepsilon_{g1})\,\mathrm{Var}(\varepsilon_{g2})}} = \frac{\Omega_{g12}}{\sqrt{\Omega_{g11}\Omega_{g22}}}, \tag{20}$$

where $\Phi_2$ is the standard bivariate normal distribution function, $\phi_2$ is the standard density function of the bivariate normal distribution and $\rho_g$ is the standardized covariance between the two error terms. The second equality in (19) uses the symmetry of the mean-zero bivariate normal distribution. Estimation in this context is similar to "random effects" probit with two "time periods" and $n$ observations. The difference is that we have system heteroskedasticity in $\mathrm{Var}(\varepsilon_g|X_g, W)$ and spatial correlation across $g$.

Obtaining the joint probabilities within the group is not difficult, and is most easily done by finding marginal and conditional probabilities. Given that $(\varepsilon_{g1}, \varepsilon_{g2})$ has a joint normal distribution, we can write

$$\varepsilon_{g1} = \delta_{g1}\varepsilon_{g2} + e_{g1}, \tag{21}$$

where

$$\delta_{g1} = \frac{\mathrm{Cov}(\varepsilon_{g1}, \varepsilon_{g2})}{\mathrm{Var}(\varepsilon_{g2})}, \tag{22}$$

and $e_{g1}$ is independent of $X_g$ and $\varepsilon_{g2}$. Because of the joint normality of $(\varepsilon_{g1}, \varepsilon_{g2})$, $e_{g1}$ is also normally distributed with $E(e_{g1}) = 0$, and

$$\mathrm{Var}(e_{g1}) = \mathrm{Var}(\varepsilon_{g1}) - \delta_{g1}^2\,\mathrm{Var}(\varepsilon_{g2}). \tag{23}$$

Thus, we can write

$$P(Y_{g1} = 1|X_g, \varepsilon_{g2}) = \Phi\left(\frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}}\right). \tag{24}$$

Once we obtain (24), we can retrieve explicit expressions for $P(Y_{g1} = 1, Y_{g2} = 1|X_g)$, $P(Y_{g1} = 0, Y_{g2} = 1|X_g)$, $P(Y_{g1} = 1, Y_{g2} = 0|X_g)$ and $P(Y_{g1} = 0, Y_{g2} = 0|X_g)$ as given in Appendix A. These properties form the basis of a partial MLE with two responses per group. We now turn to the asymptotic properties of this estimator.

4. Asymptotic properties of the partial MLEs

In the context of panel data and also cluster samples, Wooldridge (2010, Chapters 13 and 20) discusses partial MLE methods. These PMLEs apply to pooled log likelihoods where some dependence (across time or within cluster) is ignored in estimation. The asymptotic theory in Wooldridge (2010) is straightforward because observations are assumed to be independent across groups (and the group sizes are fixed, as we assume here). In the present setting, we still have correlation across all clusters due to the spatial nature of the data. But the arguments for how partial MLE identifies parameters and generally has desirable asymptotic properties are essentially unchanged from the standard case. Nevertheless, the details of showing that the groups are sufficiently weakly dependent are not simple, and estimating the asymptotic variance matrix requires some care.

If we let $Y_g$ be the $2 \times 1$ vector of observed responses for group $g$, the partial log likelihood has the form

$$L = \sum_{g=1}^{n} \log f_g(Y_g|X_g, W, \theta), \tag{25}$$

where $f_g(y_g|X_g, W, \theta)$ is the density of $Y_g$ given $X_g$ (and we again assume there are $2n$ total observations). Because this conditional density is correctly specified, the partial log likelihood function generally identifies $\theta_0$ because of the Kullback–Leibler inequality applied for each $g$ (Wooldridge, 2010, Chapter 13). Of course, we would need to assume or otherwise show the uniqueness of $\theta_0$.

The general PMLE results apply to the spatial probit model if we have correctly specified the bivariate normal densities $\phi_{2g}(Y_{g1}, Y_{g2}|X_g, W, \theta)$. To ensure correct specification, we must properly obtain the $2 \times 2$ conditional variance–covariance matrix of $(\varepsilon_{g1}, \varepsilon_{g2})'$. This is where the underlying spatial model comes in. We must also take care in restricting the spatial dependence in the data so that the standardized sum in (25) satisfies the usual limit laws. To ensure weak dependence, we assume that the spatial process is strong mixing, which means the grouped observations form a strong mixing sequence, too. The asymptotic approximations we use are based on the thought experiment that the geographic area is increasing in size.

To facilitate asymptotic analysis, write the partial log likelihood function as

$$L = \sum_{g=1}^{n} \{ Y_{g1}Y_{g2}\log P(Y_{g1} = 1, Y_{g2} = 1|X_g) + Y_{g1}(1 - Y_{g2})\log P(Y_{g1} = 1, Y_{g2} = 0|X_g) + (1 - Y_{g1})Y_{g2}\log P(Y_{g1} = 0, Y_{g2} = 1|X_g) + (1 - Y_{g1})(1 - Y_{g2})\log P(Y_{g1} = 0, Y_{g2} = 0|X_g) \} \tag{26}$$

and for the sake of brevity define

$$P_g(1, 1) \equiv \log P(Y_{g1} = 1, Y_{g2} = 1|X_g); \quad P_g(1, 0) \equiv \log P(Y_{g1} = 1, Y_{g2} = 0|X_g); \tag{27}$$

$$P_g(0, 1) \equiv \log P(Y_{g1} = 0, Y_{g2} = 1|X_g) \quad \text{and} \quad P_g(0, 0) \equiv \log P(Y_{g1} = 0, Y_{g2} = 0|X_g). \tag{28}$$
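The within-pair quantities above can be sketched in code under the moving average structure (11): the variances and covariance from (12) and (13), the correlation (20), the joint probability (19), and the resulting partial log likelihood (26). This is only an illustrative sketch, not the authors' implementation. All function names are hypothetical, the bivariate normal cdf is approximated by Simpson's rule on the marginal-times-conditional factorization (assuming $|\rho_g| < 1$), and the other three cell probabilities are obtained from standard identities that need not match the form given in Appendix A.

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def bvn_cdf(a, b, rho, steps=2000, lower=-8.0):
    """P(Z1 <= a, Z2 <= b) for standard normals with correlation rho (|rho| < 1).

    Integrates phi(z) * Phi((b - rho z) / sqrt(1 - rho^2)) over z <= a with
    Simpson's rule; the tail below `lower` is ignored (illustrative accuracy).
    """
    a = min(a, -lower)
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    f = lambda z: norm_pdf(z) * norm_cdf((b - rho * z) / s)
    h = (a - lower) / steps
    total = f(lower) + f(a)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(lower + i * h)
    return total * h / 3.0


def ma_moments(W, lam, i, j):
    """Var(eps_i), Var(eps_j), Cov(eps_i, eps_j) from Eqs. (12) and (13)."""
    n = len(W)
    var_i = 1.0 + lam ** 2 * sum(W[i][h] ** 2 for h in range(n) if h != i)
    var_j = 1.0 + lam ** 2 * sum(W[j][h] ** 2 for h in range(n) if h != j)
    cov = lam * W[i][j] + lam * W[j][i] + lam ** 2 * sum(
        W[i][h] * W[j][h] for h in range(n) if h not in (i, j)
    )
    return var_i, var_j, cov


def cell_probs(xb1, xb2, var1, var2, cov):
    """The four joint response probabilities for one pair, keyed by (y1, y2)."""
    a, b = xb1 / math.sqrt(var1), xb2 / math.sqrt(var2)
    rho = cov / math.sqrt(var1 * var2)          # Eq. (20)
    p11 = bvn_cdf(a, b, rho)                    # Eq. (19)
    return {
        (1, 1): p11,
        (1, 0): norm_cdf(a) - p11,
        (0, 1): norm_cdf(b) - p11,
        (0, 0): 1.0 - norm_cdf(a) - norm_cdf(b) + p11,
    }


def partial_loglik(pairs, y, X, beta, W, lam):
    """Partial log likelihood, Eq. (26): sum over pairs of the log cell probability."""
    ll = 0.0
    for i, j in pairs:
        xb_i = sum(x * b for x, b in zip(X[i], beta))
        xb_j = sum(x * b for x, b in zip(X[j], beta))
        probs = cell_probs(xb_i, xb_j, *ma_moments(W, lam, i, j))
        ll += math.log(probs[(y[i], y[j])])
    return ll
```

In practice `partial_loglik` would be handed to a numerical optimizer over $(\beta, \lambda)$; a production version would also use a more accurate bivariate normal routine than the quadrature used here.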
Therefore, we can rewrite the partial log likelihood (PLL) as

$$L = \sum_{g=1}^{n} \{ Y_{g1}Y_{g2}P_g(1, 1) + Y_{g1}(1 - Y_{g2})P_g(1, 0) + (1 - Y_{g1})Y_{g2}P_g(0, 1) + (1 - Y_{g1})(1 - Y_{g2})P_g(0, 0) \}. \tag{29}$$

The PLL in (29) is the simplest way one might exploit spatial correlation in pairs of observations. One possibility is to expand the group sizes and shrink the number of groups, although expanding the group size makes the computational problem harder (because the dimension of $Y_g$ grows). Previously we mentioned the possibility of using several different pairings and using minimum distance estimation. A related possibility is to estimate $(\beta, \lambda)$ by pooling pseudo log likelihoods across multiple partitions of the data. Suppose we settle on $J$ partitions of the data into $n$ groups of two observations. Let $i_{jg1}$ and $i_{jg2}$ denote the observation index of the first and second member of group $g$ in partition $j = 1, \ldots, J$. Then we can form a PLL as

$$\sum_{j=1}^{J} \sum_{g=1}^{n} \log f_{i_{jg1}, i_{jg2}}(\beta, \lambda), \tag{30}$$

where $\log f_{i_{jg1}, i_{jg2}}(\beta, \lambda)$ has the same form as the contribution to the log likelihood given in (29). This estimator will be a bit more computationally demanding than the one that we propose explicitly in this paper, but it will be more efficient. In this paper our asymptotic analysis is restricted to the case of a single partition of the data.

4.1. Consistency of bivariate probit estimation

In this section, to make the asymptotic arguments formal, we distinguish between the true value, $\theta_0$, and a generic parameter value $\theta$. We establish conditions under which the PMLE introduced above is weakly consistent, that is, $\hat\theta \xrightarrow{p} \theta_0$ as $n \to \infty$.

The objective function for the bivariate probit PMLE, standardized by $n^{-1}$, is

$$Q_n(\theta) \equiv n^{-1}\sum_{g=1}^{n} \{ Y_{g1}Y_{g2}P_g(1, 1) + Y_{g1}(1 - Y_{g2})P_g(1, 0) + (1 - Y_{g1})Y_{g2}P_g(0, 1) + (1 - Y_{g1})(1 - Y_{g2})P_g(0, 0) \}, \tag{31}$$

and $\hat\theta$ maximizes $Q_n(\theta)$ over the parameter space $\Theta$. Remember that this objective function represents a partial log likelihood: we are only using information on the conditional distributions $D(Y_{g1}, Y_{g2}|X, W)$ and not $D(Y_1, Y_2, \ldots, Y_n|X, W)$, as in a full maximum likelihood setting.

The identification condition essentially requires that the limit of $E[Q_n(\theta)]$ is uniquely maximized at the true value $\theta_0$. From the argument described earlier, the only issue is whether $\theta_0$ is unique. Define the limiting function as

$$Q(\theta) \equiv \lim_{n \to \infty} E[Q_n(\theta)].$$

Then $\theta_0$ will uniquely maximize $Q(\theta)$ in well-specified models when there is not perfect collinearity among the regressors or some other degeneracy. It can require some care in parameterizing the spatial autocorrelation, but standard models of spatial autocorrelation cause no problems. As in Pinkse and Slade (1998) we assume uniqueness in all our analysis.

The following Theorem 1 states the main consistency result for a broad class of spatial probit models.

Theorem 1. If (i) $\theta_0$ is in the interior of a compact set $\Theta$, which is the closure of a concave set, (ii) $Q$ attains a unique maximum over the compact set $\Theta$ at $\theta_0$, (iii) $Q$ is continuous on $\Theta$, (iv) the density of observations in any region whose area exceeds a fixed minimum is bounded, (v) as $n \to \infty$,

$$\sup_{1 \le g \le n}\left[ \frac{1}{P(Y_{g1} = 1, Y_{g2} = 1|X_g)} + \frac{1}{P(Y_{g1} = 1, Y_{g2} = 0|X_g)} + \frac{1}{P(Y_{g1} = 0, Y_{g2} = 1|X_g)} + \frac{1}{P(Y_{g1} = 0, Y_{g2} = 0|X_g)} \right] < \infty,$$

(vi) as $n \to \infty$, $\sup_g(\|X_g\| + \|Y_g\|) = O(1)$, (vii) $\sup_{n,g,j}|\mathrm{Cov}(Y_{gi}, Y_{ji})| \le \alpha(d_{gj})$, $i = 1, 2$, where $d_{gj}$ denotes the distance between groups $g$ and $j$, and $\alpha(d) \to 0$ as $d \to \infty$, (viii) $\lim_{n \to \infty} E[Q_n(\theta)]$ exists, (ix) $\sup_g \|W_g\| < \infty$, then $\hat\theta - \theta_0 = o_p(1)$.

Proof. Given in Appendix A.

Condition (i) is a standard assumption for optimization estimators. Condition (ii) is the identification condition for MLE. Condition (iii) assumes that the function $Q$ is continuous in the metric space, which is a reasonable assumption and necessary for the proof that $Q_n(\theta)$ is stochastically equicontinuous. Condition (iv) simply rules out an infinite number of observations crowding into one bounded area. The minimum area restriction is imposed because an infinitesimal area around a single observation has infinite density. Condition (v) makes sure any one of these four situations will be present in a sufficiently large sample in our bivariate probit structure. Condition (vi) makes sure the regressors are deterministic and uniformly bounded, which is not a strong assumption in this literature. Condition (vii) is the key assumption for this theorem, and it requires that the dependence among groups decays sufficiently quickly as the distance between groups increases. This assumption employs the concept of $\alpha$-mixing to define the rate at which dependence decreases as distance increases. Condition (viii) assumes the limit of $E[Q_n(\theta)]$ exists as $n \to \infty$, which is not a strong assumption. Condition (ix) is actually implied by the rule for dividing groups, which just rules out two groups being in exactly the same location. An important remark is that the assumptions in Theorem 1 allow for general types of spatial dependence such as the ones given in (7) and (11), and higher order spatial error lags. Moreover, for simplicity, we focus on the setting of a bivariate probit with a likelihood function as given in (31). However, the results of our Theorems can be easily generalized, at the expense of more complex notation, to go beyond bivariate dependence, provided that we extend assumptions such as (v) to allow for a finite number of observations inside each group.

4.2. Asymptotic normality

Our proof of asymptotic normality must recognize the spatial dependence in the scores of the partial log likelihood. To deal with general dependence problems, a common approach in the literature is to use the so-called "Bernstein sums", which break up $S_n$ into blocks (partial sums). This is the approach we take here. Each block must be so large, relative to the rate at which the memory of the sequence decays, that the degree to which the next block can be predicted from current information is negligible. At the same time, the number of blocks must increase with $n$ so that the CLT argument can be applied to this derived sequence (Davidson, 1994).

In this section, we show under what assumptions we are able to apply McLeish's central limit theorem (1974) to spatial dependence cases to get asymptotic normality for the spatial Probit
estimator. This is presented in the following Theorem. $A^T$ denotes the transpose of matrix $A$. Define the score of the objective function as

$$S_n(\theta) \equiv \frac{\partial Q_n}{\partial \theta}(\theta). \tag{32}$$

Theorem 2. If the assumptions of Theorem 1 hold, and in addition: (i) as $d \to \infty$, $d^2\,\alpha(dd^*)/\alpha(d^*) = o(1)$ for all fixed $d^* > 0$, (ii) the sampling area grows uniformly at a rate of $\sqrt{n}$ in two non-opposing directions, (iii) $B(\theta_0) \equiv \lim_{n \to \infty} E[nS_n(\theta_0)S_n^T(\theta_0)]$ and $A(\theta_0) \equiv \lim_{n \to \infty} -E[H_n(\theta_0)]$ are positive definite matrices. Then

$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N[0, A(\theta_0)^{-1}B(\theta_0)A(\theta_0)^{-1}],$$

where $S_n(\theta_0) \equiv \frac{\partial Q_n}{\partial \theta}(\theta_0)$ and $H_n(\theta_0) = \frac{\partial^2 Q_n}{\partial \theta\,\partial \theta^T}(\theta_0)$.

Proof. Given in Appendix A.

Condition (i) is stronger than condition (vii) in Theorem 1, and it is also stronger than the usual condition for time series data because spatially dependent data are correlated in more dimensions than time series data. It describes how dependence decays as the distance between groups increases, and requires that the dependence decays at a fast enough rate. As stated in Pinkse and Slade (1998, p. 134), "an Euclidean-weighting scheme does not satisfy condition (i). (i) also implies that $\alpha(d)$ is positive. However, since $\alpha(d)$ is an upper bound, the possibility that covariances do not decline monotonically is not excluded". Condition (ii) just repeats the assumption in Bernstein's blocking method; requiring two non-opposing directions just rules out a sampling area that grows in two parallel directions, which does not make much sense in the spatially dependent case. The conditions in (iii) are natural conditions about matrices, which are implied by the previous assumptions. The matrices are only semidefinite if some extreme situations happen, such as $P(Y_{g1} = 1, Y_{g2} = 1|X_g) = 0$, which are assumed to be excluded in the previous assumptions.

4.3. Estimation of variance–covariance matrices

Consistent estimation of the asymptotic covariance matrix is important for the construction of asymptotic confidence intervals and hypothesis tests. Estimation of $A(\theta_0)$ is relatively easy, as we can use the sample average of the negative Hessian and replace $\theta_0$ with $\hat\theta$. Or, we can use a version based on a conditional expectation; see Wooldridge (2010, Chapter 13). Estimation of $B(\theta_0)$ is substantially more difficult when there is dependence in the data, especially spatial dependence. Newey and West (1987) is the most commonly used approach for pure time series problems; Andrews (1991) established the consistency of kernel HAC (heteroskedasticity and autocorrelation consistent) estimators under fairly general conditions. But we need an approach that allows for two-dimensional correlation.

Pinkse and Slade (1998) showed that, under conditions similar to those that imply asymptotic normality, $B_n(\hat\theta) \longrightarrow B(\theta_0)$, where $B_n(\theta_0) \equiv nE[S_n(\theta_0)S_n^T(\theta_0)]$ (see Lemma 8 in Appendix A). Unfortunately, Pinkse and Slade's estimator is feasible only if we can get closed form expressions for $E[S_n(\theta_0)S_n^T(\theta_0)]$, something that is

index random field $W_s^*$ that is equal to one if location $s \in Z^2$ is sampled and zero otherwise. $W_s^*$ is assumed to be independent of the underlying random field and to have a finite expectation and to be stationary. The strong mixing coefficients are defined as

$$\alpha_{k,l}(n) \equiv \sup\{|P(A \cap B) - P(A)P(B)|\}, \quad A \in \Xi_{\Lambda_1},\; B \in \Xi_{\Lambda_2} \text{ and } |\Lambda_1| \le k,\; |\Lambda_2| \le l,\; \Upsilon(\Lambda_1, \Lambda_2) \ge n.$$

We also define a new process $R_s(\theta)$ such that

$$R_s(\theta) = \begin{cases} S(\theta) & \text{if } W_s^* = 1, \\ 0 & \text{if } W_s^* = 0. \end{cases}$$

We have the following theorem.

Theorem 3. If (i) $\Lambda_\tau$ grows uniformly in two non-opposing directions as $\tau \to \infty$, (ii) $B(\theta_0) \equiv \lim_{n \to \infty} E[S_n(\theta_0)S_n^T(\theta_0)]$ and $A(\theta_0) \equiv \lim_{n \to \infty} -E[H(\theta_0)]$ are uniformly positive definite matrices, (iii) $Y_{gi}, Y_{ji}$ as defined in Theorem 1, $i = 1, 2$, and $W_s^*$ are strong mixing where $\alpha_{k,l}(n)$ converges to zero as $n \to \infty$; $S(\theta)$ is Borel measurable for all $\theta \in \Theta$, and continuous on $\Theta$ and first moment continuous on $\Theta$, (iv) $\sum_{m=1}^{\infty} m\,\alpha_{k,l}(m) < \infty$ for $k + l \le 4$, (v) $\alpha_{1,\infty}(m) = o(m^{-2})$, (vi) for some $\delta > 0$, $E(\|S(\theta_0)\|)^{2+\delta} < \infty$ and $\sum_{m=1}^{\infty} m\,\alpha_{1,1}(m)^{\delta/(2+\delta)} < \infty$, (vii) $H(\theta)$ is Borel measurable for all $\theta \in \Theta$, continuous on $\Theta$ and second moment continuous, $A(\theta_0)$ exists and is full rank, (viii) $\sum_{s \in Z^2} \mathrm{cov}(R_0(\theta_0), R_s(\theta_0))$ is a non-singular matrix, (ix) the $K_{MP}(j, k)$ are uniformly bounded and $K_{MP}(j, k) \to 1$, $n_\tau \to \infty$ as $\tau \to \infty$ ($M, P \to \infty$), $L_M = o(M^{1/3})$ and $L_P = o(P^{1/3})$, (x) for some $\delta > 0$, $E(\|S(\theta_0)\|)^{4+\delta} < \infty$ and $Y_{gi}, Y_{ji}$ as defined in Theorem 1, $i = 1, 2$, and $W_s^*$ are strong mixing where $\alpha_{\infty,\infty}(m)^{\delta/(2+\delta)} = o(m^{-4})$, (xi) $E\sup_\Theta \|R_{m,p}(\theta)\|^2 < \infty$ and $E\sup_\Theta \|(\partial/\partial\theta)R_{m,p}(\theta)\|^2 < \infty$, and if

$$\hat{B}_\tau = n_\tau^{-1}\sum_{j=0}^{L_M}\sum_{k=0}^{L_P}\sum_{m=j+1}^{M}\sum_{p=k+1}^{P} K_{MP}(j, k) \times \left[ R_{m,p}(\hat\theta)R_{m-j,p-k}(\hat\theta)^T + R_{m-j,p-k}(\hat\theta)R_{m,p}(\hat\theta)^T \right] - n_\tau^{-1}\sum_{m=1}^{M}\sum_{p=1}^{P} R_{m,p}(\hat\theta)R_{m,p}(\hat\theta)^T,$$

then

$$\hat{B}_\tau - B(\theta_0) = o_p(1) \text{ as } \tau \to \infty,$$

where we split $s = [m, p]$, and $\Lambda_\tau$ is a rectangle so that $m \in \{1, 2, \ldots, M\}$ and $p \in \{1, 2, \ldots, P\}$.

To ensure positive semi-definite covariance matrix estimates, we need to choose an appropriate two-dimensional weights function that is a Bartlett window in each dimension:

$$K_{MP}(j, k) = \begin{cases} \left(1 - \dfrac{|j|}{L_M}\right)\left(1 - \dfrac{|k|}{L_P}\right) & \text{for } |j| < L_M,\; |k| < L_P, \\ 0 & \text{else.} \end{cases}$$
very difficult. Instead, we follow an approach proposed by Conley
(1999). Proof. The result follows from Conley (1999, Proposition 3).
A feasible way to obtain a consistent estimate of a vari-
ance–covariance matrix that allows for a wider range of depen- 5. Simulation study
dence is to apply the approach of Conley (1999). To this end, let
ΞΛ be the σ -algebra generated by a given random field ψsm , sm ∈ In the previous section we demonstrated consistency and
Λ with Λ compact, and let |Λ| be the number of sm ∈ Λ. Let asymptotic normality of the PMLE based on the bivariate normal
Υ (Λ1 , Λ2 ) denote the minimum Euclidean distance from an el- distribution. Unfortunately, it is difficult to show theoretically that
ement of Λ1 to an element of Λ2 . There exists also a regular lattice the PMLE that uses groups of two is more efficient than univariate
pooled estimators: both estimation approaches have neglected dynamics, making analytical comparisons of the asymptotic variances very difficult, if not impossible. Intuitively, it seems reasonable that using more information about the spatial structure should produce more precise estimators. In this section we use a small simulation study to verify this intuition.

5.1. Simulation design and results

Instead of comparing our PMLE to the GMM estimator of Pinkse and Slade (1998) directly, we choose to compare the bivariate PMLE to the univariate PMLE, which we refer to as the heteroskedastic probit estimator (HPE). We have two justifications for using the HPE rather than the GMM version. First, the HPE uses the same moment conditions as the GMM estimator because both use the first-order condition from the HPE. Thus, efficiency gains from using an optimal weighting matrix are unlikely to be important. Second, the STATA² source code for bivariate probit estimation and heteroskedastic probit estimation is available online, and we can easily adapt the code for the kind of heteroskedasticity in the variances and covariances implied by common spatial dependence structures. We consider two simulation settings allowing for different types of spatial dependence.

² See http://www.stata.com/.

5.1.1. Case 1

According to the theoretical framework given in previous sections, we could generate a dataset which allows a general correlation structure across groups as in Eqs. (6) and (7). We require knowing the $2 \times 2$ matrices $\Omega_g$ as functions of $\lambda$ and $W$. Generally, it is quite difficult to derive the pairwise covariances for the bivariate probit because the exact formula for $\Omega_{g12}$ (and for $\Omega_{g11}$, $\Omega_{g22}$) is very complicated; they must be obtained from the inverse of the full $2n \times 2n$ variance–covariance matrix. For the SAE model, this matrix is
\[
[(I - \lambda W)'(I - \lambda W)]^{-1} =
\begin{pmatrix}
\Omega_{111} & \cdots & \cdots & \cdots & \cdots \\
\cdots & \cdots & \cdots & \cdots & \cdots \\
\cdots & \cdots & \Omega_{g11} & \Omega_{g12} & \cdots \\
\cdots & \cdots & \Omega_{g21} & \Omega_{g22} & \cdots \\
\cdots & \cdots & \cdots & \cdots & \Omega_{n22}
\end{pmatrix}, \tag{33}
\]
and it is difficult to obtain the $\Omega_{ghi}$ in closed form. Instead, it seems reasonable to do the following. Let $R$ be a weighting matrix (which can be generated in STATA³) according to the distance between observations. Then define
\[
Y_i^* = X_{i1}\beta_1 + X_{i2}\beta_2 + X_{i3}\beta_3 + \varepsilon_i \tag{34}
\]
\[
\varepsilon = \lambda R u, \tag{35}
\]
where $u \sim \mathrm{Normal}(0, I_{2n})$. The weighting matrix $R$ is standardized so that the diagonal elements are ones, and the remaining elements of $R$ shrink as the distance between observations increases. Using this approach, it is relatively easy to determine $\mathrm{Var}(\varepsilon_i)$ and $\mathrm{Cov}(\varepsilon_i, \varepsilon_j)$, which facilitates the HP bivariate probit estimation. We still allow general correlation across groups, and we are able to compare the efficiency gains from only using the marginal information (the HP approach) to using both diagonal and off-diagonal information (bivariate probit).

³ The STATA command is "Spatwmat". Since inverting a matrix becomes much slower as the size of the matrix increases, and moreover the maximum matrix size in Stata is 800, we allow here each observation to be spatially correlated to the nearby 99 observations.

In generating the data according to Eqs. (34) and (35) we set the true parameter values for $\beta_1$, $\beta_2$ and $\beta_3$ all equal to unity. We are particularly interested in estimation of the spatial parameter $\lambda$, and so we vary its value as follows: $\lambda = 0.2$; $0.4$; $0.6$; and $0.8$. These values for $\lambda$ are in the range of the estimated value in the empirical application of Pinkse and Slade (1998). We consider total sample sizes of $N = 500$ (so $n = 250$ groups), $N = 1000$, and $N = 1500$. We use 1000 replications in the simulations. The results are reported in Table 1 (for the spatial parameter $\lambda$) and Table 2 (for $\beta_1$, $\beta_2$ and $\beta_3$) in Appendix B.

We start with estimation of $\beta$. Table 2 shows that the PMLE has little bias with $N = 1500$ (except when $\lambda = 0.2$), whereas the HPE still has substantial bias. The poor behavior of the HPE for estimating $\beta$ may be due to its inability to estimate $\lambda$. The PMLE does much better in terms of precision, too. Generally, as we expect, the Monte Carlo standard deviations shrink as the sample size increases.

Table 1 shows that the HPE struggles when trying to estimate $\lambda$. The PMLE is always much closer to the true parameter value (although it has a systematic upward bias for each $N$), with smaller standard deviations across all sample sizes and parameter values. The bias of the PMLE decreases when $N$ increases but there is room for improvement. Possible bias adjustments to the estimator of $\lambda$ are a good topic for future research.

In summary, from the simulation results of Tables 1 and 2, we see how the PMLE clearly outperforms the HPE, especially when estimating the spatial parameter $\lambda$. While simulation findings are necessarily special, the ones here provide strong support for the idea that using even a little information on the spatial correlation structure can go a long way in obtaining less biased, more precise estimators.

5.1.2. Case 2

We consider a second data generating process given as (34), where again we set the true parameter values for $\beta_1$, $\beta_2$ and $\beta_3$ all equal to unity. In this case we assume (11) where the $u_i$ are i.i.d. Normal(0, 1) and $W_{ih}$ is the reciprocal of the Euclidean distance between $i$ and $h$. We obtain the closed form expressions for the variance and covariance given in (12) and (13). Results for 1000 replications are provided in Tables 3 and 4 in Appendix B.

Again, the PMLE provides substantial improvements over the HPE, especially when estimating $\lambda$. It makes sense that using information in the pairwise data helps to substantially improve the precision in estimating $\lambda$, which is fundamentally a spatial correlation parameter. Efficiency gains of the bivariate procedure in estimating $\beta$ are smaller in Table 4, possibly because we chose to group nearby observations, but still nontrivial. For this data generating process (DGP), both HPE and PMLE show little bias in estimating $\beta$, especially for the largest sample size.

6. Conclusions

The idea of this paper is simple and intuitive: rather than just using information contained in the marginal distributions, we divide observations into pairwise groups and use a partial MLE approach. Using the spatial correlation for pairs of outcomes, we prove that the bivariate PMLE is consistent and asymptotically normal under reasonable regularity conditions (although these could be relaxed in future research). We also discuss how to get consistent covariance matrix estimators under general spatial dependence by following the approach of Conley (1999), which is much more practical than the proposal of Pinkse and Slade (1998).

The simulation study in Section 5 demonstrates that using bivariate rather than univariate distributions not only improves efficiency, but can substantially decrease finite-sample bias, especially for estimating the spatial correlation parameters.
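The fact that we can undertake a substantial simulation study demonstrates that our approach is computationally much more feasible than the full, joint MLE. Our conjecture is that an estimator that uses, say, trivariate distributions would perform even better. Of course that comes at the expense of computation. Nevertheless, computation for a single data set should not be difficult for even larger group sizes. We think the findings for group sizes of two make a strong case for the general PMLE approach.

A fixed and known spatial error structure is a limitation of our results. Ideally, one could accommodate endogenous location decisions. Unfortunately, endogenous location raises both conceptual and technical difficulties that need to be studied in future research. Extensions that are more immediate are models with spatial distributed lags in the covariates and other kinds of nonlinear models that can be estimated by PMLE, including Tobit, count and switching models.

Appendix A

A.1. Expressions of conditional bivariate distributions

Since
\[
P(Y_{g1}=1, Y_{g2}=1 \mid X_g) = P(Y_{g1}=1 \mid Y_{g2}=1, X_g) \cdot P(Y_{g2}=1 \mid X_g) \tag{36}
\]
it is easy to see that $P(Y_{g2}=1 \mid X_g) = \Phi\left(X_{g2}\beta / \sqrt{\mathrm{Var}(\varepsilon_{g2})}\right)$, and thus it remains to get $P(Y_{g1}=1 \mid Y_{g2}=1, X_g)$.

First, since $Y_{g2} = 1$ if and only if $\varepsilon_{g2} > -X_{g2}\beta$, and $\varepsilon_{g2}$ follows a normal distribution and is independent of $X_g$, the density of $\varepsilon_{g2}$ given $\varepsilon_{g2} > -X_{g2}\beta$ is
\[
\frac{\phi\left(\frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\right)}{P(\varepsilon_{g2} > -X_{g2}\beta)}
= \frac{\phi\left(\frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\right)}{\Phi\left(\frac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\right)}. \tag{37}
\]
Therefore,
\[
P(Y_{g1}=1 \mid Y_{g2}=1, X_g) = E\left[P(Y_{g1}=1 \mid X_g, \varepsilon_{g2}) \mid Y_{g2}=1, X_g\right] \tag{38}
\]
\[
= E\left[\Phi\left(\frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}}\right) \,\Big|\, Y_{g2}=1, X_g\right] \tag{39}
\]
\[
= \frac{1}{\Phi\left(\frac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\right)} \int_{-X_{g2}\beta}^{\infty} \Phi\left(\frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}}\right) \phi\left(\frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\right) d\varepsilon_{g2} \tag{40}
\]
and it is easy to see that $P(Y_{g1}=0 \mid Y_{g2}=1, X_g) = 1 - P(Y_{g1}=1 \mid Y_{g2}=1, X_g)$ because $Y_{g1}$ is a binary variable.

Similarly, since $Y_{g2} = 0$ if and only if $\varepsilon_{g2} \le -X_{g2}\beta$, we can get
\[
P(Y_{g1}=1 \mid Y_{g2}=0, X_g) = \frac{1}{1 - \Phi\left(\frac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\right)} \int_{-\infty}^{-X_{g2}\beta} \Phi\left(\frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}}\right) \phi\left(\frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\right) d\varepsilon_{g2} \tag{41}
\]
and $P(Y_{g1}=0 \mid Y_{g2}=0, X_g) = 1 - P(Y_{g1}=1 \mid Y_{g2}=0, X_g)$.

Now we are ready to get $P(Y_{g1}=1, Y_{g2}=1 \mid X_g)$ as follows:
\[
P(Y_{g1}=1, Y_{g2}=1 \mid X_g) = \frac{1}{\Phi\left(\frac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\right)} \int_{-X_{g2}\beta}^{\infty} \Phi\left(\frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}}\right) \phi\left(\frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\right) d\varepsilon_{g2} \times \Phi\left(\frac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\right) \tag{42}
\]
\[
= \int_{-X_{g2}\beta}^{\infty} \Phi\left(\frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}}\right) \phi\left(\frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\right) d\varepsilon_{g2}, \tag{43}
\]
and similarly we can finally obtain
\[
P(Y_{g1}=0, Y_{g2}=1 \mid X_g) = \Phi\left(\frac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\right) - \int_{-X_{g2}\beta}^{\infty} \Phi\left(\frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}}\right) \phi\left(\frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\right) d\varepsilon_{g2} \tag{44}
\]
\[
P(Y_{g1}=1, Y_{g2}=0 \mid X_g) = \int_{-\infty}^{-X_{g2}\beta} \Phi\left(\frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}}\right) \phi\left(\frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\right) d\varepsilon_{g2} \tag{45}
\]
\[
P(Y_{g1}=0, Y_{g2}=0 \mid X_g) = 1 - \Phi\left(\frac{X_{g2}\beta}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\right) - \int_{-\infty}^{-X_{g2}\beta} \Phi\left(\frac{X_{g1}\beta + \delta_{g1}\varepsilon_{g2}}{\sqrt{\mathrm{Var}(e_{g1})}}\right) \phi\left(\frac{\varepsilon_{g2}}{\sqrt{\mathrm{Var}(\varepsilon_{g2})}}\right) d\varepsilon_{g2}. \tag{46}
\]

A.2. Proofs of theorems

Proof of Theorem 1. By Newey and McFadden (1994), for consistency it is sufficient to verify the following conditions:
(i) $Q$ has a unique maximum at $\theta_0$.
(ii) $Q_n(\theta) - Q(\theta) = o_p(1)$ at all $\theta \in \Theta$.
(iii) $\{Q_n(\theta)\}$ is stochastically equicontinuous and $Q$ is continuous on $\Theta$.
We have already assumed condition (i). The proof of condition (ii) is provided in Lemma 1, and the proof that $\{Q_n(\theta)\}$ is stochastically equicontinuous can be found in Lemma 2.

Proof of Theorem 2. To establish the asymptotic normality of the partial MLE for the spatial bivariate probit model, we start from the mean value theorem. Since $\frac{\partial Q_n}{\partial\theta}(\hat\theta) = 0$, the mean value theorem gives
\[
\frac{\partial Q_n}{\partial\theta}(\hat\theta) = 0 = \frac{\partial Q_n}{\partial\theta}(\theta_0) + \frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta^*)(\hat\theta - \theta_0) \tag{47}
\]
\[
\Rightarrow\ (\hat\theta - \theta_0) = -\left[\frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta^*)\right]^{-1} \frac{\partial Q_n}{\partial\theta}(\theta_0), \tag{48}
\]
where $\theta^*$ lies between $\hat\theta$ and $\theta_0$.

First, let us discuss the asymptotic properties of the term $\frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta^*)$. Recall that
\[
Q_n(\theta) = \frac{1}{n}\sum_{g=1}^{n} \{ Y_{g1}Y_{g2}P_g(1,1) + Y_{g1}(1 - Y_{g2})P_g(1,0)
\]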
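\[
\qquad + (1 - Y_{g1})Y_{g2}P_g(0,1) + (1 - Y_{g1})(1 - Y_{g2})P_g(0,0) \}, \tag{49}
\]
where $P_g(1,1) \equiv \log P(Y_{g1}=1, Y_{g2}=1 \mid X_g)$ and so on. Also
\[
\frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta) = \frac{1}{n}\sum_{g=1}^{n} \left\{ Y_{g1}Y_{g2}\frac{\partial^2 P_g(1,1)}{\partial\theta\,\partial\theta^T} + Y_{g1}(1 - Y_{g2})\frac{\partial^2 P_g(1,0)}{\partial\theta\,\partial\theta^T} + (1 - Y_{g1})(Y_{g2})\frac{\partial^2 P_g(0,1)}{\partial\theta\,\partial\theta^T} + (1 - Y_{g1})(1 - Y_{g2})\frac{\partial^2 P_g(0,0)}{\partial\theta\,\partial\theta^T} \right\}, \tag{50}
\]
where
\[
\frac{\partial^2 P_g(1,1)}{\partial\theta\,\partial\theta^T} = \frac{-1}{[P(Y_{g1}=1, Y_{g2}=1 \mid X_g)]^2} \times \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}\right)^2 + \frac{1}{P(Y_{g1}=1, Y_{g2}=1 \mid X_g)} \times \frac{\partial^2 [P(Y_{g1}=1, Y_{g2}=1 \mid X_g)]}{\partial\theta\,\partial\theta^T}, \tag{51}
\]
and all other terms behave similarly.

As before, we only discuss one of these terms, and the same logic applies to the other terms. We know that
\[
\frac{1}{n}\sum_{g=1}^{n} Y_{g1}Y_{g2}\frac{\partial^2 P_g(1,1)}{\partial\theta\,\partial\theta^T}(\theta^*)
= \frac{1}{n}\sum_{g=1}^{n} Y_{g1}Y_{g2} \left\{ \frac{-1}{[P(Y_{g1}=1, Y_{g2}=1 \mid X_g)]^2} \times \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\theta^*)\right)^2 + \frac{1}{P(Y_{g1}=1, Y_{g2}=1 \mid X_g)} \times \frac{\partial^2 [P(Y_{g1}=1, Y_{g2}=1 \mid X_g)]}{\partial\theta\,\partial\theta^T}(\theta^*) \right\}. \tag{52}
\]
Look at the first term of the above equation, given by
\[
\frac{1}{n}\sum_{g=1}^{n} Y_{g1}Y_{g2} \frac{-1}{[P(Y_{g1}=1, Y_{g2}=1 \mid X_g)]^2} \times \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\theta^*)\right)^2. \tag{53}
\]
Since $\left| \frac{1}{[P(Y_{g1}=1, Y_{g2}=1 \mid X_g)]^2} \right| < \infty$, we can write this term as
\[
\frac{1}{n}\sum_{g=1}^{n} K_{g11} \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\theta^*)\right)^2, \tag{54}
\]
where $K_{g11} \equiv Y_{g1}Y_{g2}\,\frac{-1}{[P(Y_{g1}=1, Y_{g2}=1 \mid X_g)]^2}$.

In order to prove
\[
\frac{1}{n}\sum_{g=1}^{n} K_{g11} \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\theta^*)\right)^2 \xrightarrow{p} \frac{1}{n}\sum_{g=1}^{n} K_{g11} \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\theta_0)\right)^2, \tag{55}
\]
we need to show that it holds for all $\|\varpi\| = 1$. Set $K_{g11} = \varpi^T K_g$ and then
\[
\varpi^T \left\{ \frac{1}{n}\sum_{g=1}^{n} K_{g11} \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\hat\theta)\right)^2 - \frac{1}{n}\sum_{g=1}^{n} K_{g11} \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\theta_0)\right)^2 \right\} \tag{56}
\]
\[
= \frac{1}{n}\sum_{g=1}^{n} K_{g11} \left\{ \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\hat\theta)\right)^2 - \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\theta_0)\right)^2 \right\} \tag{57}
\]
\[
= (\hat\theta - \theta_0)\,\frac{2}{n}\sum_{g=1}^{n} K_{g11} \frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\theta^*) \times \frac{\partial^2 P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta\,\partial\theta^T}(\theta^*). \tag{58}
\]
From the proof of Theorem 1, we know that $\sup_g \left\| \frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta} \right\| < \infty$. From Lemma 3, $\sup_g \left\| \frac{\partial^2 P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta\,\partial\theta^T} \right\| < \infty$. From Theorem 1, we also know that $\|\hat\theta - \theta_0\| = o_p(1)$ and hence
\[
(\hat\theta - \theta_0)\,\frac{2}{n}\sum_{g=1}^{n} K_{g11} \frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\theta^*) \times \frac{\partial^2 P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta\,\partial\theta^T}(\theta^*) = o_p(1) \tag{59}
\]
\[
\Longrightarrow\ \varpi^T \left\{ \frac{1}{n}\sum_{g=1}^{n} K_{g11} \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\hat\theta)\right)^2 - \frac{1}{n}\sum_{g=1}^{n} K_{g11} \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\theta_0)\right)^2 \right\} = o_p(1) \tag{60}
\]
\[
\Longrightarrow\ \frac{1}{n}\sum_{g=1}^{n} K_{g11} \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\theta^*)\right)^2 \xrightarrow{p} \frac{1}{n}\sum_{g=1}^{n} K_{g11} \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\theta_0)\right)^2. \tag{61}
\]
By definition,
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{g=1}^{n} K_{g11} \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\theta_0)\right)^2 = E\left[ K_{g11} \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\theta_0)\right)^2 \right], \tag{62}
\]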
and therefore,
\[
\frac{1}{n}\sum_{g=1}^{n} Y_{g1}Y_{g2} \frac{-1}{[P(Y_{g1}=1, Y_{g2}=1 \mid X_g)]^2} \times \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\theta^*)\right)^2 \xrightarrow{p} \tag{63}
\]
\[
E\left[ Y_{g1}Y_{g2} \frac{-1}{[P(Y_{g1}=1, Y_{g2}=1 \mid X_g)]^2} \times \left(\frac{\partial P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta}(\theta_0)\right)^2 \right]. \tag{64}
\]
Similarly, we can prove in relation to the second term in (52) that
\[
\frac{1}{n}\sum_{g=1}^{n} Y_{g1}Y_{g2} \frac{1}{P(Y_{g1}=1, Y_{g2}=1 \mid X_g)} \times \frac{\partial^2 [P(Y_{g1}=1, Y_{g2}=1 \mid X_g)]}{\partial\theta\,\partial\theta^T}(\theta^*) \tag{65}
\]
\[
\xrightarrow{p} E\left[ Y_{g1}Y_{g2} \frac{1}{P(Y_{g1}=1, Y_{g2}=1 \mid X_g)} \times \frac{\partial^2 [P(Y_{g1}=1, Y_{g2}=1 \mid X_g)]}{\partial\theta\,\partial\theta^T}(\theta_0) \right]. \tag{66}
\]
As usual, we apply the above arguments repeatedly to the other terms. Finally, we can get that
\[
\lim_{n\to\infty} \frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta^*) \xrightarrow{p} E\left[\frac{\partial^2 Q_n}{\partial\theta\,\partial\theta^T}(\theta_0)\right]. \tag{67}
\]
If we define
\[
H \equiv Y_{g1}Y_{g2}\frac{\partial^2 P_g(1,1)}{\partial\theta\,\partial\theta^T} + Y_{g1}(1 - Y_{g2})\frac{\partial^2 P_g(1,0)}{\partial\theta\,\partial\theta^T} + (1 - Y_{g1})(Y_{g2})\frac{\partial^2 P_g(0,1)}{\partial\theta\,\partial\theta^T} + (1 - Y_{g1})(1 - Y_{g2})\frac{\partial^2 P_g(0,0)}{\partial\theta\,\partial\theta^T} \tag{68}
\]
where $H$ denotes the Hessian, Eq. (67) can be rewritten as
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{g=1}^{n} H(\theta^*) \xrightarrow{p} \lim_{n\to\infty} E[H(\theta_0)]. \tag{69}
\]
Therefore, it remains to show the asymptotic normality of the score term, $S_n(\theta_0)$. Now
\[
S_n(\theta_0) = \frac{1}{n}\sum_{g=1}^{n} \left\{ Y_{g1}Y_{g2}\frac{\partial P_g(1,1)}{\partial\theta}(\theta_0) + Y_{g1}(1 - Y_{g2})\frac{\partial P_g(1,0)}{\partial\theta}(\theta_0) + (1 - Y_{g1})Y_{g2}\frac{\partial P_g(0,1)}{\partial\theta}(\theta_0) + (1 - Y_{g1})(1 - Y_{g2})\frac{\partial P_g(0,0)}{\partial\theta}(\theta_0) \right\}. \tag{70}
\]
We need to show that $B^{-1/2}(\theta_0)S_n(\theta_0) \to N(0, I_K)$, where $B(\theta) \equiv \lim_{n\to\infty} nE[S_n(\theta)S_n^T(\theta)]$. Note that the information matrix equality does not hold here, i.e. $-E[H_n(\theta_0)] \ne E[S_n(\theta)S_n^T(\theta)]$, because the score terms are correlated with each other over space. In this part, we follow Pinkse and Slade (1998) and use Bernstein's blocking methods and McLeish's (1974) central limit theorem for dependent processes. First, define $T_{na_n} \equiv \prod_{j=1}^{a_n}(1 + i\gamma D_{n,j})$, where $i^2 = -1$, $D_{n,j}$ ($j = 1, 2, \ldots, a_n$) is an array of random variables on the probability triple $(\Omega, \mathcal{F}, P)$, and $\gamma$ is a real number. McLeish's (1974) central limit theorem for dependent processes requires the following four conditions:
(i) $\{T_{na_n}\}$ is uniformly integrable,
(ii) $E T_{na_n} \to 1$,
(iii) $\sum_{j=1}^{a_n} D_{n,j}^2 \xrightarrow{p} 1$,
(iv) $\max_{j \le a_n} |D_{n,j}| \xrightarrow{p} 0$.

Now we need to define $D_{n,j}$ in our case. Let $Y_{0n} \equiv \varpi^T \frac{\sqrt{n}\,S_g(\theta_0)}{\sqrt{B(\theta_0)}} = n^{-1/2}\sum_{g=1}^{n} A_{ng}$, which implicitly defines $A_{ng}$. In order to prove $Y_{0n} \xrightarrow{d} N(0, 1)$, we need to establish that the property holds for all $\|\varpi\| = 1$ using the Cramér–Wold device. As in the proof of Lemma 1 in Theorem 1, we split the region in which observations are located up into $a_n$ areas of size $\sqrt{b_n} \times \sqrt{b_n}$. We also know that $a_n$ increases faster than $\sqrt{n}$ and $b_n$ slower, where $a_n$ and $b_n$ are integers such that $a_n b_n = n$. Let $a_n$ and $b_n$ be constructed such that $\alpha(\sqrt{b_n})\,a_n \to 0$. Let $n^{\tau - 1/2} \times b_n < 1$, uniformly in $n$, for some fixed $0 < \tau < \frac{1}{2}$. Let $\Lambda_{nj}$ denote the set of indices corresponding to the observations in area $j$. By assumption a number $C > 0$ exists such that $\max_j(\#\Lambda_{nj}) < C b_n$. Define $D_{n,j} \equiv n^{-1/2}\sum_{g \in \Lambda_{nj}} A_{ng}$, and hence we can write $Y_{0n} = \sum_{j=1}^{a_n} D_{n,j}$.

Now we are ready to discuss the four conditions for McLeish's (1974) central limit theorem. First, look at condition (iv), which requires that $\max_{j \le a_n} |D_{n,j}| = o_p(1)$:
\[
\max_{j \le a_n} |D_{n,j}| = \max_{j \le a_n} \left| n^{-1/2} \sum_{g \in \Lambda_{nj}} A_{ng} \right|. \tag{71}
\]
Since by assumption
\[
\max_j(\#\Lambda_{nj}) < C b_n \ \Rightarrow\ \max_{j \le a_n} \left| n^{-1/2} \sum_{g \in \Lambda_{nj}} A_{ng} \right| \le C b_n \times n^{-1/2} \sup \|A_{ng}\|, \tag{72}
\]
where $\#$ denotes the number of objects, by definition we have that
\[
\varpi^T \frac{\sqrt{n}\,S_g(\theta_0)}{\sqrt{B(\theta_0)}} = n^{-1/2}\sum_{g=1}^{n} A_{ng},
\]
\[
A_{ng} = \varpi^T \frac{1}{\sqrt{B_0}} \left[ Y_{g1}Y_{g2}\frac{\partial P_g(1,1)}{\partial\theta}(\theta_0) + Y_{g1}(1 - Y_{g2})\frac{\partial P_g(1,0)}{\partial\theta}(\theta_0) + (1 - Y_{g1})Y_{g2}\frac{\partial P_g(0,1)}{\partial\theta}(\theta_0) + (1 - Y_{g1})(1 - Y_{g2})\frac{\partial P_g(0,0)}{\partial\theta}(\theta_0) \right]. \tag{73}
\]
Since $B(\theta_0)$ is positive definite, $B(\theta_0)^{-1/2}$ is bounded as $n \to \infty$, and we have that $\sup_g \|Y_g\| < \infty$ by assumption (vi) in Theorem 1. We have also proved that $\sup_g \left\| \frac{\partial P_g(1,1)}{\partial\theta} \right\| < \infty$ in Lemma 2.
Therefore, we are able to prove that $\sup \|A_{ng}\| < \infty$. Then $C b_n \times n^{-1/2} \sup \|A_{ng}\| = O_p\left(C b_n \times n^{-1/2}\right) = o_p(1)$ by construction of $b_n$. Hence we can get that $\max_{j \le a_n} |D_{n,j}| = o_p(1)$.

Second, let us discuss condition (i): $\{T_{na_n}\}$ is uniformly integrable. Following Davidson (1994), if a random variable is integrable, the contribution to the integral of extreme random variable values must be negligible. In other words, if $E|T_{na_n}| < \infty$, $E(|T_{na_n}| 1_{|T_{na_n}| > K}) \to 0$ as $K \to \infty$, which is equivalent to saying $P[\sup_{n>N} |T_{na_n}| > K] = 0$ for some $K > 0$ as $n \to \infty$. Here we follow the proof of Lemma 10 in Pinkse and Slade (1998). We have that
\[
P\left[\sup_{n>N} |T_{na_n}| > K\right] = P\left[\sup_{n>N} \left|\prod_{j=1}^{a_n}(1 + i\gamma D_{n,j})\right| > K\right] \tag{74}
\]
\[
\le P\left[\sup_{n>N} \left|\prod_{j=1}^{a_n}\left(1 + \gamma^2 D_{n,j}^2\right)\right| > K\right] \tag{75}
\]
\[
= P\left[\sup_{n>N} \left|\prod_{j=1}^{a_n}\left(1 + \gamma^2 D_{n,j}^2\right)\right| > K \,\Big|\, \sup_{n>N,\,j} n^\tau |D_{n,j}| \le C\right] \times P\left[\sup_{n>N,\,j} n^\tau |D_{n,j}| \le C\right]
+ P\left[\sup_{n>N} \left|\prod_{j=1}^{a_n}\left(1 + \gamma^2 D_{n,j}^2\right)\right| > K \,\Big|\, \sup_{n>N,\,j} n^\tau |D_{n,j}| > C\right] \times P\left[\sup_{n>N,\,j} n^\tau |D_{n,j}| > C\right] \tag{76}
\]
\[
\le P\left[\sup_{n>N} \left|\prod_{j=1}^{a_n}\left(1 + \gamma^2 D_{n,j}^2\right)\right| > K \,\Big|\, \sup_{n>N,\,j} n^\tau |D_{n,j}| \le C\right] + P\left[\sup_{n>N,\,j} n^\tau |D_{n,j}| > C\right] \tag{77}
\]
where $C$ is a uniform upper bound to $\sum_{g \in \Lambda_{nj}} A_{ng}$. Therefore,
\[
P\left[\sup n^\tau |D_{n,j}| > C\right] = P\left[\sup n^\tau \left| n^{-1/2} \sum_{g \in \Lambda_{nj}} A_{ng} \right| > C\right] \tag{78}
\]
\[
= P\left[\sup n^{\tau - 1/2} \left| \sum_{g \in \Lambda_{nj}} A_{ng} \right| > C\right]
\le P\left[\sup n^{\tau - 1/2}\, b_n |A_{ng}| > C\right] = 0 \tag{79}
\]
since $n^{\tau - 1/2} b_n < 1$ and by construction of $b_n$. Then,
\[
P\left[\sup_{n>N} \left|\prod_{j=1}^{a_n}\left(1 + \gamma^2 D_{n,j}^2\right)\right| > K \,\Big|\, \sup_{n>N,\,j} n^\tau |D_{n,j}| \le C\right]
\le P\left[\sup_{n>N} \left|\left(1 + \gamma^2 n^{-2\tau} C^2\right)^{a_n/2}\right| > K\right] = 0 \tag{80}
\]
provided we set $K$ sufficiently large. Therefore, we proved that $P[\sup_{n>N} |T_{na_n}| > K] = 0 \Rightarrow \{T_{na_n}\}$ is uniformly integrable.

Third, condition (ii) requires that $E T_{na_n} \to 1$, which is equivalent to saying that $E T_{na_n} - 1 = o(1)$; see the proof in Lemma 4.

Fourth, in order to prove (iii): $\sum_{j=1}^{a_n} D_{n,j}^2 \xrightarrow{p} 1$, by Lemma 7, $\sum_{j=1}^{a_n} D_{n,j}^2 - 1 = \sum_{j=1}^{a_n} E(D_{n,j}^2) - 1 + o_p(1)$ and
\[
E\left(\sum_{j=1}^{a_n} D_{n,j}^2\right) - 1 + o_p(1) = E(Y_{0n}^2) - 1 - \sum_{i \ne j}^{a_n} E(D_{n,i}D_{n,j}) + o_p(1) = o_p(1), \tag{81}
\]
by construction of $Y_{0n}$, since $E(Y_{0n}^2) = 1$. It remains to show that $\sum_{i \ne j}^{a_n} E(D_{n,i}D_{n,j}) = o(1)$. This condition is proved in Lemmas 5–7.⁴

A.3. Technical lemmas

The proofs of Theorems 1 and 2 require the use of the following Lemmas 1–8. The proofs are in the technical appendix that is available upon request from the authors.

Lemma 1. Under the assumptions in Theorem 1, $Q_n(\theta) - Q(\theta) = o_p(1)$ for all $\theta \in \Theta$.

Lemma 2. Under the assumptions in Theorem 1, $Q_n(\theta) - Q(\theta)$ is stochastically equicontinuous.

Lemma 3. Under the assumptions in Theorem 2, $\sup_g \left\| \frac{\partial^2 P(Y_{g1}=1, Y_{g2}=1 \mid X_g)}{\partial\theta\,\partial\theta^T} \right\| < \infty$.

Lemma 4. Under the assumptions in Theorem 2, $E T_{na_n} - 1 = o(1)$, where $T_{na_n} \equiv \prod_{j=1}^{a_n}(1 + i\gamma D_{n,j})$.

Lemma 5. Under the assumptions in Theorem 2, $\max_{i \ne j} |E(D_{n,i}D_{n,j})| = o(n^{-1} b_n) = o(a_n^{-1})$.

Lemma 6. Under the assumptions in Theorem 2, $\max \sum_{l=2}^{\sqrt{a_n}} \sum_{j \in \Xi_{nil}} |E(D_{n,i}D_{n,j})| = o(a_n^{-1})$.

Lemma 7. Under the assumptions in Theorem 2, $\sum_{j=1}^{a_n} D_{n,j}^2 = \sum_{j=1}^{a_n} E(D_{n,j}^2) + o_p(1)$.

Finally, the following Lemma 8 generalizes the results of Pinkse and Slade (1998) as a way to obtain consistent estimates of the variance–covariance matrix. The proof is also contained in the technical appendix that is available upon request from the authors.

Lemma 8. If the assumptions in Theorem 2 hold, and $\sup_g \left\| \frac{\partial\Phi_4}{\partial\theta} + \frac{\partial\Phi_3}{\partial\theta} \right\| < \infty$, then $A_n(\hat\theta) - A(\theta_0) = o_p(1)$ and $B_n(\hat\theta) - B(\theta_0) = o_p(1)$, where $B_n(\theta) \equiv nE[S_n(\theta)S_n^T(\theta)]$ and $A(\theta) \equiv -E[H(\theta)]$.

Appendix B

See Tables 1–4.

Appendix C. Supplementary data

Supplementary material related to this article can be found online at http://dx.doi.org/10.1016/j.jeconom.2012.08.005.

⁴ Lemmas 5–8 are along the lines of those in Pinkse and Slade (1998), which are a simplified version of the proofs in Davidson (1994).
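
The covariance estimator of Theorem 3 (Section 4.3), whose consistency is the kind of result Lemma 8 addresses, can be illustrated with a minimal sketch. It assumes scalar scores $R_{m,p}$ stored on a full $M \times P$ rectangular lattice, so that $R R^T + R R^T$ collapses to twice a product; the function names `bartlett_2d` and `hac_variance` are hypothetical and are not taken from the paper or from Conley (1999).

```python
def bartlett_2d(j, k, L_M, L_P):
    """Product of two Bartlett windows, K_MP(j, k), zero outside the window."""
    if abs(j) < L_M and abs(k) < L_P:
        return (1.0 - abs(j) / L_M) * (1.0 - abs(k) / L_P)
    return 0.0


def hac_variance(R, L_M, L_P):
    """Kernel-weighted variance estimate for scalar scores on an M x P lattice.

    R[m][p] is the scalar score at lattice cell (m, p), evaluated at the
    estimate. Indices are 0-based here and 1-based in the text.
    """
    M, P = len(R), len(R[0])
    n = M * P
    total = 0.0
    for j in range(L_M + 1):
        for k in range(L_P + 1):
            w = bartlett_2d(j, k, L_M, L_P)
            if w == 0.0:
                continue
            for m in range(j, M):
                for p in range(k, P):
                    # Scalar case: the two cross products are identical.
                    total += w * 2.0 * R[m][p] * R[m - j][p - k]
    # The lag-(0, 0) term was counted twice above; subtract it once.
    total -= sum(R[m][p] ** 2 for m in range(M) for p in range(P))
    return total / n


scores = [[1.0, -1.0], [2.0, 0.0]]
print(hac_variance(scores, 1, 1))  # only the lag-(0, 0) term receives weight
print(hac_variance(scores, 2, 2))  # adjacent cells enter with weight 1/2
```

With $L_M = L_P = 1$ the window admits only the zero lag, so the estimate reduces to the average squared score; wider windows add down-weighted cross products between neighbouring cells. A vector-valued score would replace the scalar products by outer products.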

Table 1
Case 1: Simulation results of different estimators of λ in the context of the bivariate spatial probit model.*

                      λ = 0.2              λ = 0.4              λ = 0.6              λ = 0.8
                      HPE       PMLE       HPE       PMLE       HPE       PMLE       HPE       PMLE
N = 500    Mean       3.938     0.514      6.177     0.519      7.698     0.571      7.735     0.634
           Bias       3.738     0.314      5.777     0.319      7.098    −0.029      6.935    −0.166
           (s.d.)   (12.158)   (0.120)   (15.776)   (0.205)   (16.929)   (0.151)   (16.202)   (0.289)
N = 1000   Mean       3.174     0.512      4.668     0.518      5.456     0.581      5.914     0.672
           Bias       2.974     0.312      4.268     0.118      4.856    −0.019      5.114    −0.128
           (s.d.)    (8.844)   (0.107)    (9.100)   (0.133)    (9.631)   (0.149)   (10.173)   (0.276)
N = 1500   Mean       2.746     0.511      4.050     0.507      4.872     0.609      5.426     0.708
           Bias       2.546     0.311      3.650     0.107      4.272     0.009      4.626    −0.092
           (s.d.)    (6.423)   (0.099)    (7.414)   (0.124)    (8.598)   (0.149)    (8.514)   (0.253)

* Results are presented for the bivariate Partial Maximum Likelihood Estimator (PMLE) and the Heteroskedastic Probit Estimator (HPE) of λ. Numbers in brackets are standard deviations (s.d.).
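
The Case 1 design of Section 5.1.1 (Eqs. (34) and (35)), which underlies Tables 1 and 2, can be sketched as follows. This is only an illustration under stated assumptions: the locations are drawn uniformly on the unit square, the weights are taken to be $1/(1+d)$ in the Euclidean distance $d$ (so the diagonal equals one and weights shrink with distance), and the covariates are standard normal. None of these choices is given in the text, which builds $R$ in Stata and limits each observation to 99 neighbours. The helper `error_cov` uses the fact that $\varepsilon = \lambda R u$ with $u \sim \mathrm{Normal}(0, I)$ implies $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = \lambda^2 \sum_h R_{ih} R_{jh}$.

```python
import math
import random


def weight_matrix(coords):
    """Hypothetical distance-decay weights with ones on the diagonal."""
    n = len(coords)
    return [[1.0 / (1.0 + math.dist(coords[i], coords[h])) for h in range(n)]
            for i in range(n)]


def error_cov(R, lam, i, j):
    """Cov(eps_i, eps_j) implied by eps = lam * R u with u ~ N(0, I)."""
    return lam ** 2 * sum(a * b for a, b in zip(R[i], R[j]))


def simulate_case1(n_obs, lam, beta=(1.0, 1.0, 1.0), seed=0):
    """Draw binary outcomes from Y* = X beta + eps, eps = lam * R u."""
    rng = random.Random(seed)
    # Assumed design: random locations and standard normal covariates.
    coords = [(rng.random(), rng.random()) for _ in range(n_obs)]
    R = weight_matrix(coords)
    u = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    eps = [lam * sum(R[i][h] * u[h] for h in range(n_obs))
           for i in range(n_obs)]
    X = [[rng.gauss(0.0, 1.0) for _ in beta] for _ in range(n_obs)]
    y_star = [sum(x * b for x, b in zip(X[i], beta)) + eps[i]
              for i in range(n_obs)]
    y = [1 if v > 0 else 0 for v in y_star]
    return y, X, R


y, X, R = simulate_case1(n_obs=20, lam=0.4)
print(sum(y), "of", len(y), "outcomes equal one")
print("Var(eps_0) =", error_cov(R, 0.4, 0, 0))
```

Pairs of nearby observations would then be formed into groups, and `error_cov` supplies the variances and covariances that enter the univariate and bivariate probit probabilities.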
Table 2
Case 1: Simulation results of different estimators of β1, β2 and β3 in the context of the bivariate spatial probit model.*

                                 β1 = 1               β2 = 1               β3 = 1
                                 HPE       PMLE       HPE       PMLE       HPE       PMLE
λ = 0.2   N = 500    Mean        5.322     2.618      5.333     2.619      5.329     2.623
                     (s.d.)     (8.844)   (0.839)    (8.872)   (0.855)    (8.863)   (0.870)
          N = 1000   Mean        5.308     2.616      5.296     2.616      5.289     2.618
                     (s.d.)     (7.612)   (0.560)    (7.570)   (0.560)    (7.568)   (0.564)
          N = 1500   Mean        5.247     2.604      5.239     2.602      5.235     2.604
                     (s.d.)     (6.624)   (0.540)    (6.606)   (0.536)    (6.613)   (0.543)
λ = 0.4   N = 500    Mean        3.610     1.329      3.614     1.329      3.608     1.328
                     (s.d.)     (5.305)   (0.362)    (5.311)   (0.365)    (5.290)   (0.366)
          N = 1000   Mean        3.600     1.318      3.593     1.316      3.588     1.315
                     (s.d.)     (4.192)   (0.355)    (4.177)   (0.355)    (4.178)   (0.353)
          N = 1500   Mean        3.456     1.281      3.441     1.281      3.438     1.278
                     (s.d.)     (3.818)   (0.342)    (3.793)   (0.343)    (3.798)   (0.339)
λ = 0.6   N = 500    Mean        2.898     0.972      2.876     0.966      2.885     0.969
                     (s.d.)     (3.761)   (0.271)    (3.723)   (0.268)    (3.735)   (0.271)
          N = 1000   Mean        2.669     0.981      2.669     0.979      2.657     0.978
                     (s.d.)     (2.951)   (0.261)    (2.953)   (0.261)    (2.916)   (0.259)
          N = 1500   Mean        2.508     1.016      2.499     1.015      2.501     1.016
                     (s.d.)     (2.726)   (0.250)    (2.706)   (0.250)    (2.708)   (0.253)
λ = 0.8   N = 500    Mean        2.246     0.805      2.237     0.801      2.249     0.802
                     (s.d.)     (2.810)   (0.373)    (2.803)   (0.373)    (2.841)   (0.392)
          N = 1000   Mean        2.098     0.843      2.096     0.843      2.082     0.843
                     (s.d.)     (2.281)   (0.349)    (2.279)   (0.349)    (2.246)   (0.340)
          N = 1500   Mean        2.086     0.884      2.096     0.886      2.094     0.886
                     (s.d.)     (2.059)   (0.316)    (2.071)   (0.314)    (2.073)   (0.318)

* Results are presented for our new Partial Maximum Likelihood Estimator (PMLE) and the Heteroskedastic Probit Estimator (HPE) of β1, β2 and β3. Numbers in brackets show standard deviations (s.d.).
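
The PMLE columns rest on the pairwise probabilities of Appendix A.1 (Eqs. (42)–(46)) and the objective in Eq. (49). A minimal sketch of one group's contribution is given below. Several ingredients are assumptions rather than statements from this section: $\delta_{g1}$ is taken to be $\mathrm{Cov}(\varepsilon_{g1}, \varepsilon_{g2}) / \mathrm{Var}(\varepsilon_{g2})$ and $\mathrm{Var}(e_{g1})$ to be $\mathrm{Var}(\varepsilon_{g1}) - \mathrm{Cov}^2 / \mathrm{Var}(\varepsilon_{g2})$ (the usual conditional-normal forms), the normal density is rescaled so that it integrates to one, and the integral is approximated by Simpson's rule truncated at eight standard deviations. The names `pair_probs` and `pair_loglik` are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def pair_probs(xb1, xb2, var1, var2, cov12, steps=2000):
    """Return {(y1, y2): probability} for one pair of binary outcomes."""
    s2 = math.sqrt(var2)
    delta = cov12 / var2                         # assumed form of delta_g1
    s_e = math.sqrt(var1 - cov12 ** 2 / var2)    # assumed sqrt(Var(e_g1))
    lo = -xb2 / s2                               # lower limit, standardized
    hi = max(8.0, lo)                            # truncation of the upper limit

    def integrand(z):
        return norm_cdf((xb1 + delta * s2 * z) / s_e) * norm_pdf(z)

    # Composite Simpson rule; `steps` must be even.
    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(lo + i * h)
    p11 = total * h / 3.0

    p1 = norm_cdf(xb1 / math.sqrt(var1))         # P(Y1 = 1 | X)
    p2 = norm_cdf(xb2 / s2)                      # P(Y2 = 1 | X)
    return {(1, 1): p11, (1, 0): p1 - p11,
            (0, 1): p2 - p11, (0, 0): 1.0 - p1 - p2 + p11}


def pair_loglik(y1, y2, probs):
    """Contribution of one group to the sum inside Q_n."""
    return math.log(probs[(y1, y2)])


probs = pair_probs(xb1=0.3, xb2=-0.2, var1=1.0, var2=1.0, cov12=0.5)
print(probs)
print(pair_loglik(1, 0, probs))
```

Averaging `pair_loglik` over the groups gives the objective that a numerical optimizer would maximize over $\beta$ and $\lambda$, with the variances and covariances recomputed from the spatial structure at each trial value.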
Table 3
Case 2: Simulation results of different estimators of λ in the context of the bivariate spatial probit model.*

                      λ = 0.2              λ = 0.4              λ = 0.6              λ = 0.8
                      HPE       PMLE       HPE       PMLE       HPE       PMLE       HPE       PMLE
N = 500    Mean       2.151     0.381      2.575     0.667      2.491     0.970      2.876     1.202
           Bias       1.951     0.181      1.175     0.267      1.891     0.370      2.076     0.402
           (s.d.)    (4.630)   (0.844)    (5.073)   (0.923)    (4.996)   (0.913)    (6.213)   (0.966)
N = 1000   Mean       1.013     0.356      1.089     0.606      1.307     0.863      1.660     1.160
           Bias       0.813     0.156      0.689     0.206      0.707     0.263      0.860     0.360
           (s.d.)    (2.131)   (0.606)    (2.241)   (0.622)    (2.424)   (0.671)    (2.675)   (0.813)
N = 1500   Mean       0.684     0.324      0.792     0.592      0.906     0.860      1.305     1.156
           Bias       0.484     0.124      0.392     0.192      0.306     0.260      0.505     0.356
           (s.d.)    (1.508)   (0.484)    (1.566)   (0.515)    (1.611)   (0.601)    (1.910)   (0.706)

* Results are presented for the bivariate Partial Maximum Likelihood Estimator (PMLE) and the Heteroskedastic Probit Estimator (HPE) of λ. Numbers in brackets are standard deviations (s.d.).
Table 4
Case 2: Simulation results of different estimators of β1, β2 and β3 in the context of the bivariate spatial probit model.*

                                 β1 = 1               β2 = 1               β3 = 1
                                 HPE       PMLE       HPE       PMLE       HPE       PMLE
λ = 0.2   N = 500    Mean        1.043     1.021      1.042     1.020      1.042     1.020
                     (s.d.)     (0.120)   (0.109)    (0.114)   (0.104)    (0.122)   (0.108)
          N = 1000   Mean        1.017     1.006      1.022     1.011      1.016     1.005
                     (s.d.)     (0.079)   (0.071)    (0.078)   (0.071)    (0.079)   (0.070)
          N = 1500   Mean        1.012     1.004      1.013     1.005      1.010     1.002
                     (s.d.)     (0.065)   (0.059)    (0.063)   (0.058)    (0.064)   (0.057)
λ = 0.4   N = 500    Mean        1.043     1.017      1.042     1.018      1.043     1.019
                     (s.d.)     (0.125)   (0.112)    (0.112)   (0.105)    (0.119)   (0.110)
          N = 1000   Mean        1.017     1.005      1.020     1.008      1.014     1.002
                     (s.d.)     (0.079)   (0.072)    (0.080)   (0.072)    (0.077)   (0.069)
          N = 1500   Mean        1.014     1.005      1.013     1.004      1.009     1.000
                     (s.d.)     (0.065)   (0.058)    (0.061)   (0.056)    (0.062)   (0.056)
λ = 0.6   N = 500    Mean        1.046     1.022      1.045     1.022      1.043     1.020
                     (s.d.)     (0.123)   (0.107)    (0.126)   (0.111)    (0.126)   (0.108)
          N = 1000   Mean        1.019     1.006      1.020     1.007      1.015     1.003
                     (s.d.)     (0.083)   (0.075)    (0.084)   (0.075)    (0.079)   (0.071)
          N = 1500   Mean        1.010     1.002      1.012     1.004      1.010     1.002
                     (s.d.)     (0.066)   (0.060)    (0.065)   (0.060)    (0.063)   (0.059)
λ = 0.8   N = 500    Mean        1.035     1.013      1.036     1.014      1.037     1.015
                     (s.d.)     (0.125)   (0.115)    (0.125)   (0.111)    (0.125)   (0.115)
          N = 1000   Mean        1.016     1.003      1.017     1.005      1.016     1.004
                     (s.d.)     (0.083)   (0.086)    (0.082)   (0.090)    (0.084)   (0.088)
          N = 1500   Mean        1.011     1.000      1.012     1.001      1.012     1.001
                     (s.d.)     (0.065)   (0.058)    (0.064)   (0.057)    (0.064)   (0.056)

* Results are presented for the bivariate Partial Maximum Likelihood Estimator (PMLE) and the Heteroskedastic Probit Estimator (HPE) of β1, β2 and β3. Numbers in brackets are standard deviations (s.d.).
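
The Mean, Bias and (s.d.) rows in Tables 1–4 are Monte Carlo summaries over the replications; for example, in Table 1 with N = 500 and λ = 0.2 the HPE bias of 3.738 is the mean 3.938 minus the true value 0.2. A minimal sketch of how such entries could be formed is shown below. The function name `mc_summary` is hypothetical, and the use of the sample standard deviation (divisor $r - 1$ for $r$ replications) is an assumption, since the text does not state which divisor was used.

```python
import statistics


def mc_summary(estimates, true_value):
    """Mean, bias and standard deviation of estimates across replications."""
    mean = statistics.fmean(estimates)
    return {
        "mean": mean,
        "bias": mean - true_value,
        # Assumed: sample standard deviation across replications.
        "sd": statistics.stdev(estimates),
    }


# Illustrative values only, not taken from the simulations in the paper.
draws = [0.35, 0.52, 0.41, 0.48]
print(mc_summary(draws, true_value=0.4))
```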
References

Andrews, D.W.K., 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59 (3), 817–858.
Anselin, L., Florax, R.J.G.M., 1995. New Direction in Spatial Econometrics. Springer-Verlag, Berlin, Germany.
Anselin, L., Florax, R.J.G.M., Rey, J.S., 2004. Econometrics for Spatial Models: Recent Advances. In: Advances in Spatial Econometrics, Springer-Verlag, Berlin, Germany, pp. 1–28.
Beron, K.J., Vijverberg, W.P., 2003. Probit in a Spatial Context: A Monte Carlo Approach. In: Advances in Spatial Econometrics, Springer-Verlag, Berlin, Germany, pp. 169–196.
Blundell, R., Powell, J.L., 2004. Endogeneity in semiparametric binary response models. Review of Economic Studies 71, 655–679.
Case, A.C., 1991. Spatial patterns in household demand. Econometrica 59, 953–965.
Conley, T.G., 1999. GMM estimation with cross sectional dependence. Journal of Econometrics 92, 1–45.
Davidson, J., 1994. Stochastic Limit Theory. Oxford University Press, Oxford.
Kelejian, H.H., Prucha, I.R., 1999. A generalized moments estimator for the autoregressive parameter in a spatial model. International Economic Review 40, 509–533.
Kelejian, H.H., Prucha, I.R., 2001. On the asymptotic distribution of the Moran I test statistic with applications. Journal of Econometrics 104, 219–257.
Lee, L.-F., 2004. Asymptotic distribution of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72 (6), 1899–1925.
Lesage, J.P., 2000. Bayesian estimation of limit dependent variable spatial autoregressive models. Geographical Analysis 32, 19–35.
McMillen, D.P., 1992. Probit with spatial autocorrelation. Journal of Regional Science 32, 335–348.
McMillen, D.P., 1995. Spatial Effects in Probit Models: A Monte Carlo Investigation. In: New Directions in Spatial Econometrics, Springer-Verlag, Berlin, Germany, pp. 189–228.
Newey, W.K., McFadden, D., 1994. Large sample estimation and hypothesis testing. In: Handbook of Econometrics, Vol 4. North-Holland, New York, Ch. 36.
Newey, W.K., West, K.D., 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.
Pinkse, J., Slade, M.E., 1998. Contracting in space: an application of spatial statistics to discrete-choice models. Journal of Econometrics 85, 125–154.
Pinkse, J., Slade, M.E., Shen, L., 2006. Dynamic spatial discrete choice using one-step GMM: an application to mine operating decisions. Spatial Economic Analysis 1 (1), 53–99.
Poirier, D., Ruud, P.A., 1988. Probit with dependent observations. Review of Economic Studies 55, 593–614.
Robinson, P.M., 1982. On the asymptotic properties of estimators of models containing limited dependent variables. Econometrica 50, 27–41.
Wooldridge, J.M., 2005. Unobserved heterogeneity and estimation of average partial effects. In: Andrews, D.W.K., Stock, J.H. (Eds.), Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg. Cambridge University Press, Cambridge, pp. 27–55.
Wooldridge, J.M., 2010. Econometric Analysis of Cross Section and Panel Data, second ed. MIT Press, Cambridge, Massachusetts.
