A Note On Small Sample Properties of Estimators in A First-Order Spatial Autoregressive Model
L Anselin
Department of City and Regional Planning, The Ohio State University, Columbus, OH 43210, USA
Received 2 December 1980, in revised form 28 September 1981
Abstract. This note considers a Bayesian estimator and an ad hoc procedure for the parameters of a
first-order spatial autoregressive model. The approaches are derived, and their small sample properties
compared by means of a Monte Carlo simulation experiment.
Introduction
In many instances where phenomena are studied by means of cross-sectional data, the
interest focuses on whether or not, or to what extent, the value of the variable under
consideration in a particular spatial unit is determined by the values present in the
rest of the spatial system.
The problem can be approached from two viewpoints. In the first, tests for the
presence of spatial autocorrelation are carried out, in which a null hypothesis of no
spatial dependence in the system is tested against the hypothesis of dependence as
reflected in a particular spatial structure. Several test statistics have been developed,
discussed, and applied to different types of spatial data sets by, amongst others,
Krishna Iyer (1949), Moran (1950), Geary (1954), Dacey (1968), Granger (1969),
Cliff and Ord (1973), Cliff et al (1975), Royaltey et al (1975), Kooijman (1976), Sen
(1976), Sen and Soot (1977), Hubert (1978), and Griffith (1980). Applications to
the disturbances of a linear regression model are described in, for example, Cliff and
Ord (1973), Bartels and Hordijk (1977), and Brandsma and Ketellapper (1979b).
In the second approach, the problem is viewed as that of estimating the parameter(s)
of a linear regression which expresses the spatial dependence explicitly. The more
general problem of estimation from dependent observations has been extensively
treated in a time series context. It was pointed out very early by Student (1907;
1914) and Yule (1926). More recently, Silvey (1961), Bar-Shalom (1971), and Bhat
(1974), amongst others, have considered the asymptotic properties of maximum
likelihood estimation from dependent observations. They are formulated by use of
conditional probabilities in the likelihood function, conditional upon the observations
in a previous time period.
An extension of this conditional probability framework to the spatial case is
presented in, for example, Besag (1974) and Besag and Moran (1975). However, its
applicability seems limited when few observations are present, because of a loss of
information which results when applying the method. An alternative approach is to
consider a joint probability framework. The differences between the joint and
conditional probability approaches are outlined in Brook (1964). In general terms,
the joint probability approach consists of considering transformations of the original
dependent observations into independent variables. The crucial points in this approach
are the structure of the transformation (which incorporates the assumed spatial
dependence), the extent to which traditional methods may be applied, or whether
alternative procedures need to be constructed.
In a typical cross-sectional situation, the observed sample does not contain enough
information to estimate a full spatial structure or test against all possible forms of
spatial dependence. Hence, a structure is imposed on the spatial system, which
where the $\lambda_i$ are the roots of the contiguity matrix W. We assume the following
diffuse prior for the parameter $\sigma$,
$P(\sigma) \propto \sigma^{-1}$ , $0 < \sigma < +\infty$ , (4)
and for $\rho$,
$P(\rho) \propto \text{constant}$ , $-1 < \rho < +1$ , (5)
and assume $\sigma$ and $\rho$ to be independent, hence
$P(\rho, \sigma) \propto \sigma^{-1}$ . (6)
Notice that the assumption of nondiffuse priors is equally valid, and may even be
more appropriate in specific empirical applications of an exploratory nature. The
diffuse priors were chosen to compare the performance of the resulting estimate in
small samples more directly with the ML estimate. The constraints for $\rho$ in
expression (5) were chosen by analogy with the usual procedure in finding the $\rho$
estimator by means of the maximum likelihood approach. It can be shown that the
ML estimate has to satisfy the condition
$1 - \rho\lambda_i > 0$ , $\forall i$ . (7)
Since the dominant root of the standardized contiguity matrix is +1, and all other
roots are less than one in absolute value, the usual condition
$-1 < \rho < +1$ (8)
is overly restrictive for the left-hand side inequality. In most practical situations
mentioned in the literature this is not taken into account. In the experiments
described below, the results were not affected by use of inequality (8) instead of the
more appropriate inequality
$1/\lambda^{-}_{\max} < \rho < +1$ , (9)
where $\lambda^{-}_{\max}$ is the negative root of W with the largest absolute value (but smaller
than 1). Also notice that this condition is not necessarily satisfied by the OLS
estimate, since there is no assurance that $y^T W y < y^T W^T W y$. This will depend on
each particular y vector and contiguity matrix.
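The admissible range for $\rho$ implied by condition (7) can be checked numerically. What follows is a minimal sketch, assuming a regular square lattice on which cells sharing a side or a vertex are neighbours, and a row-standardized W. The function names `queen_lattice_weights` and `rho_bounds` are illustrative and do not come from the paper.

```python
import numpy as np

def queen_lattice_weights(n):
    """Row-standardized contiguity matrix for an n x n lattice.

    Two cells are neighbours when they share a side or a vertex.
    """
    size = n * n
    W = np.zeros((size, size))
    for r in range(n):
        for c in range(n):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < n and 0 <= cc < n:
                        W[r * n + c, rr * n + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def rho_bounds(W):
    """Interval for rho on which 1 - rho * lambda_i > 0 holds for every root."""
    # The roots are real because a row-standardized symmetric binary matrix
    # is similar to a symmetric matrix; .real only drops rounding noise.
    lam = np.linalg.eigvals(W).real
    return 1.0 / lam.min(), 1.0 / lam.max()

for n in (4, 5, 6):
    lower, upper = rho_bounds(queen_lattice_weights(n))
    print(f"{n} x {n} lattice: {lower:.3f} < rho < {upper:.3f}")
```

Because every negative root of such a matrix is smaller than 1 in absolute value, the lower bound printed for each lattice lies below -1, which is the sense in which inequality (8) is too restrictive on the left-hand side.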
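To obtain the joint posterior distribution for ($\rho$, $\sigma$) in the Bayesian approach, the
joint prior is multiplied with the likelihood function (3) to obtain
$P(\rho, \sigma \mid y) \propto \prod_{i=1}^{N} (1 - \rho\lambda_i)\, \sigma^{-(N+1)} \exp[-(2\sigma^2)^{-1} y^T (I - \rho W)^T (I - \rho W) y]$ . (10)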
Since in most cases we are not particularly interested in the parameter $\sigma$, we obtain
the marginal posterior distribution of $\rho$ by integrating out $\sigma$, or
$P(\rho \mid y) \propto \int_{0}^{+\infty} \prod_{i=1}^{N} (1 - \rho\lambda_i)\, \sigma^{-(N+1)} \exp[-(2\sigma^2)^{-1} y^T (I - \rho W)^T (I - \rho W) y]\, d\sigma$ . (11)
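Since $\int_{0}^{+\infty} \sigma^{-(N+1)} \exp[-A/(2\sigma^2)]\, d\sigma \propto A^{-N/2}$ for $A > 0$,
it follows that
$P(\rho \mid y) = K \prod_{i=1}^{N} (1 - \rho\lambda_i)\, [(y - \rho y_L)^T (y - \rho y_L)]^{-N/2}$ ,
where $y_L = Wy$. The normalizing constant (K), the mean, and other moments of the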
posterior distribution may be obtained for every sample using numerical integration
techniques. For example, the mean is found as
$\bar{\rho} = \int_{-1}^{+1} \rho\, P(\rho \mid y)\, d\rho$ .
Similarly, once the normalizing constant is obtained, the probability of a value for $\rho$
in any interval may be obtained by integrating over that interval.
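The numerical integration described above can be sketched as follows. This is an illustration under stated assumptions, not the routine used in the paper: the model is taken to be y = $\rho$Wy + e with standard normal disturbances, the posterior kernel is the product of the terms $(1 - \rho\lambda_i)$ and the residual sum of squares raised to the power -N/2, the integration uses a trapezoidal rule on a grid over the interval in (5), and the grid is stopped just short of ±1. All function names, the grid size, and the `edge` value are hypothetical choices.

```python
import numpy as np

def queen_lattice_weights(n):
    """Row-standardized contiguity matrix (side or vertex in common)."""
    size = n * n
    W = np.zeros((size, size))
    for r in range(n):
        for c in range(n):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < n and 0 <= cc < n:
                        W[r * n + c, rr * n + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def posterior_mean_rho(y, W, grid_size=2001, edge=0.999):
    """Approximate posterior mean of rho under the diffuse priors (4) and (5)."""
    lam = np.linalg.eigvals(W).real
    y_lag = W @ y
    rho = np.linspace(-edge, edge, grid_size)
    # log of prod_i (1 - rho * lambda_i) at every grid value of rho
    log_jacobian = np.log(1.0 - np.outer(rho, lam)).sum(axis=1)
    # residual sum of squares (y - rho * y_L)'(y - rho * y_L)
    resid = y[None, :] - rho[:, None] * y_lag[None, :]
    sse = (resid ** 2).sum(axis=1)
    log_kernel = log_jacobian - 0.5 * y.size * np.log(sse)
    # Rescale before exponentiating; the constant cancels in the ratio below.
    kernel = np.exp(log_kernel - log_kernel.max())

    def trapezoid(f):
        return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(rho))

    # Ratio of integrals: the denominator plays the role of 1/K.
    return trapezoid(rho * kernel) / trapezoid(kernel)

rng = np.random.default_rng(0)  # arbitrary seed
W = queen_lattice_weights(5)
eps = rng.standard_normal(W.shape[0])
y = np.linalg.solve(np.eye(W.shape[0]) - 0.5 * W, eps)  # data with rho = 0.5
print(posterior_mean_rho(y, W))
```

Working with the logarithm of the kernel avoids underflow when N is large, and the same grid can be reused to approximate the posterior probability of any interval.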
For a quadratic loss function, the estimator $\hat{\rho} = \bar{\rho}$, or the mean of the posterior
distribution, is the optimal point estimator or minimum expected loss estimator
(MELO). For large samples, the likelihood factor will dominate the prior factor in
the expression of Bayes's law. Since in large samples the mean and the mode of the
likelihood function coincide, and the large sample MELO estimate will tend to
coincide with the mean of the likelihood function, MELO and ML will be equivalent.
However, in small samples they are different. Since in spatial cross-sectional situations,
especially when dealing with larger spatial units (such as states, census regions, or
provinces), the number of observations may be quite low, it is important to gain
additional insight into the small sample properties of the MELO estimate relative to
the ML and OLS. In addition, a fourth estimate is considered, namely
contiguity structure. Also, these estimates are compared with the approaches
suggested in the previous sections, namely MELO and RQ1. Similar to the approach
used in Haining (1978a), three abstract spatial situations were considered as reflected
in regular square lattices of sizes 4 × 4, 5 × 5, and 6 × 6. This results in binary
contiguity matrices of dimensions 16, 25, and 36, where connectivity is defined as
having a side or vertex in common. The abstract lattice structures are chosen to
separate the properties of the estimators as such from peculiarities introduced by
particular spatial configurations. To the extent that often spatial data may be
represented on a grid, it is not considered to be too much of a departure from reality.
Ideally, however, each particular spatial configuration should be studied. The original
symmetric binary contiguity matrix is standardized by making the rows sum to 1.
Three values for the autoregressive parameter $\rho$ are considered, which reflect increasing
degrees of spatial connectivity in the system: 0.2, 0.5, and 0.8. The starting point
for the simulated data sets is a set of generated independent standard normal variates,
the disturbances in the model. For each parameter combination and lattice size the
estimation is repeated for 250 generated data sets in order to approximate repeated
sampling. For each estimation method, four summary characteristics are considered,
namely:
(a) the mean of the estimated values for $\rho$:
$m_{est} = M^{-1} \sum_{i=1}^{M} \hat{\rho}_i$ ; (19)
(b) the bias of the estimate, defined as the deviation of the mean from the true
parameter value,
$b_{est} = M^{-1} \sum_{i=1}^{M} \hat{\rho}_i - \rho$ ; (20)
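The simulation design and the first two summary characteristics can be illustrated with a short sketch. It assumes the model y = $\rho$Wy + e, so that each data set is obtained by solving (I - $\rho$W)y = e for standard normal e, and it uses only the OLS estimate, taken here to be the no-intercept regression of y on Wy. The function names and the random seed are hypothetical, and the sketch is not a reconstruction of the paper's own program.

```python
import numpy as np

def queen_lattice_weights(n):
    """Row-standardized contiguity matrix (side or vertex in common)."""
    size = n * n
    W = np.zeros((size, size))
    for r in range(n):
        for c in range(n):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < n and 0 <= cc < n:
                        W[r * n + c, rr * n + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def ols_rho(y, W):
    """Least-squares regression of y on its spatial lag Wy (no intercept)."""
    y_lag = W @ y
    return (y_lag @ y) / (y_lag @ y_lag)

def simulate_estimates(n, rho, M=250, seed=0):
    """OLS estimates of rho for M simulated data sets on an n x n lattice."""
    rng = np.random.default_rng(seed)
    W = queen_lattice_weights(n)
    A = np.eye(n * n) - rho * W
    estimates = np.empty(M)
    for i in range(M):
        eps = rng.standard_normal(n * n)   # independent standard normal disturbances
        y = np.linalg.solve(A, eps)        # y = (I - rho W)^{-1} eps
        estimates[i] = ols_rho(y, W)
    return estimates

rho_true = 0.5
estimates = simulate_estimates(5, rho_true)
m_est = estimates.mean()        # mean of the estimates, as in (19)
b_est = m_est - rho_true        # bias, as in (20)
print(f"mean = {m_est:.3f}, bias = {b_est:.3f}")
```

Replacing `ols_rho` with another estimator of $\rho$ leaves the rest of the loop unchanged, which keeps the comparison across methods on identical simulated data.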
procedures tend to underestimate. For OLS, ML, and MELO, the pattern is fairly
similar, with a long tail to the left (especially for $\rho = 0.2$ and $0.5$) and with
increasing sharpness of the distribution for larger $\rho$. Of these three estimators, the
distribution for OLS is most to the right, showing estimates larger than 1, even when
$\rho = 0.2$, and considerably so for $\rho = 0.8$, on all three lattices. The distribution for
MELO is most to the left, following very closely the pattern for ML. The distribution
of the estimates obtained by RQ1 is different from the others, which may be expected
since it estimates a parameter in a different model. There are a considerable number
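[Figure 1 (graph not reproduced): three panels, for the 4 × 4 lattice with $\rho = 0.2$, the 5 × 5 lattice with $\rho = 0.5$, and the 6 × 6 lattice with $\rho = 0.8$; each plots the frequency of the OLS, ML, MELO, and RQ1 estimates against the estimated value, on a horizontal axis from -1 to +1.]
Figure 1. Frequency of estimates for different lattices and values of $\rho$.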
Acknowledgements. This paper summarizes and extends some ideas presented in a chapter of the
author's PhD dissertation in the field of regional science at Cornell University. Laura Krause helped
in drawing the graphs. The comments of an anonymous referee are gratefully acknowledged. The
usual disclaimer holds.
References
Anselin L, 1980 Estimation Methods for Spatial Autoregressive Structures: A Study in Spatial
Econometrics Regional Science Dissertation and Monograph Series 8, Program in Urban and
Regional Studies, Cornell University, Ithaca, NY
Bar-Shalom Y, 1971 "On the asymptotic properties of the maximum likelihood estimate obtained
from dependent observations" Journal of the Royal Statistical Society, Series B 33 72-77
Bartels C P A, Hordijk L, 1977 "On the power of the generalized Moran contiguity coefficient in
testing for spatial autocorrelation among regression disturbances" Regional Science and Urban
Economics 7(1-2) 83-101
Bennett R, 1975a "Dynamic systems modelling of the North-west region: 1. Spatio-temporal
representation and identification" Environment and Planning A 7 525-538
Bennett R, 1975b "The representation and identification of spatio-temporal systems: an example of
population diffusion in North West England" Transactions of the Institute of British Geographers
66 73-94
Besag J, 1974 "Spatial interaction and the statistical analysis of lattice systems" Journal of the
Royal Statistical Society, Series B 36 192-225
Besag J, Moran P A, 1975 "On the estimation and testing of spatial interaction in Gaussian lattice
processes" Biometrika 62 555-562
Bhat B, 1974 "On the method of maximum likelihood for dependent observations" Journal of the
Royal Statistical Society, Series B 36 48-53
Bodson P, Peeters D, 1975 "Estimation of the coefficients of a linear regression in the presence of
spatial autocorrelation. An application to a Belgian labour-demand function" Environment and
Planning A 7 455-472
Brandsma A S, Ketellapper R, 1979a "Biparametric approach to spatial autocorrelation"
Environment and Planning A 11 51-58
Brandsma A S, Ketellapper R, 1979b "Further evidence on alternative procedures for testing of
spatial autocorrelation among regression disturbances" in Exploratory and Explanatory Statistical
Analysis of Spatial Data Eds C P A Bartels, R Ketellapper (Martinus Nijhoff, Boston, MA)
pp 113-136
Brook D, 1964 "On the distinction between the conditional probability and the joint probability
approaches in the specification of nearest neighbour systems" Biometrika 51 481-483
Cliff A D, Martin R, Ord J K, 1975 "A test for spatial autocorrelation in choropleth maps based
upon a modified χ² statistic" Transactions of the Institute of British Geographers 65 109-129
Cliff A D, Ord J K, 1973 Spatial Autocorrelation (Pion, London)
Cliff A D, Ord J K, 1975 "Space-time modelling with an application to regional forecasting"
Transactions of the Institute of British Geographers 64 119-128
Dacey M, 1968 "A review on measures of contiguity for two and k-color maps" in Spatial Analysis
Eds B Berry, D Marble (Prentice-Hall, Englewood Cliffs, NJ) pp 479-495
Geary R C, 1954 "The contiguity ratio and statistical mapping" The Incorporated Statistician 5
115-145
Granger C, 1969 "Spatial data and time series analysis" in London Papers in Regional Science.
Studies in Regional Science Ed. A Scott (Pion, London) pp 1-24
Griffith D A, 1976 "Spatial autocorrelation problems: some preliminary sketches of a structural
taxonomy" The East Lakes Geographer 11 59-68
Griffith D A, 1980 "Towards a theory of spatial statistics" Geographical Analysis 12(4) 325-339
Haining R, 1978a "Estimating spatial-interaction models" Environment and Planning A 10 305-320
Haining R, 1978b "The moving average model for spatial interactions" Transactions of the Institute
of British Geographers 3(2) 202-225
Hooper D, Hewings G J D, 1981 "Some properties of space-time processes" Geographical Analysis
13(3) 203-223
Hordijk L, 1974 "Spatial correlation in the disturbances of a linear interregional model" Regional
Science and Urban Economics 4(3) 117-140
Hubert L, 1978 "Nonparametric tests for patterns in geographic variation—possible generalization"
Geographical Analysis 10(1) 86-88
Kooijman S A, 1976 "Some remarks on the statistical analysis of grids especially with respect to
ecology" Annals of Systems Research 5 113-132
Krishna Iyer P, 1949 "The first and second moments of some probability distributions arising from
points on a lattice and their applications" Biometrika 36 135-141
Martin R, Oeppen J, 1975 "The identification of regional forecasting models using space-time
correlation functions" Transactions of the Institute of British Geographers 66 95-118
Mead R, 1967 "A mathematical model for the estimation of interplant competition" Biometrics 23
189-205
Moran P A P, 1950 "A test for the serial independence of residuals" Biometrika 37 178-181
Ord J K, 1975 "Estimation methods for models of spatial interaction" Journal of the American
Statistical Association 70(349) 120-126
Royaltey H, Astrachan E, Sokal R, 1975 "Tests for patterns in geographic variation" Geographical
Analysis 7(4) 369-395
Sen A, 1976 "Large sample-size distribution of statistics used in testing for spatial correlation"
Geographical Analysis 8(2) 175-184
Sen A, Soot S, 1977 "Rank tests for spatial correlation" Environment and Planning A 9 897-903
Silvey S, 1961 "A note on maximum likelihood in the case of dependent random variables" Journal
of the Royal Statistical Society, Series B 23 444-452
Student, 1907 "On the error of counting with a haemacytometer" Biometrika 5 351-360
Student, 1914 "The elimination of spurious correlation due to position in time or space"
Biometrika 10 179-180
Yule G, 1926 "Why do we sometimes get nonsense-correlations between time series? A study in
sampling and the nature of time series" Journal of the Royal Statistical Society 89 1-69
Zellner A, 1971 An Introduction to Bayesian Inference in Econometrics (John Wiley, New York)