
Environment and Planning A, 1982, volume 14, pages 1023-1030

A note on small sample properties of estimators in a first-order spatial autoregressive model

L Anselin
Department of City and Regional Planning, The Ohio State University, Columbus, OH 43210, USA
Received 2 December 1980, in revised form 28 September 1981

Abstract. This note considers a Bayesian estimator and an ad hoc procedure for the parameters of a
first-order spatial autoregressive model. The approaches are derived, and their small sample properties
compared by means of a Monte Carlo simulation experiment.

Introduction
In many instances where phenomena are studied by means of cross-sectional data, the
interest focuses on whether or not, or to what extent, the value of the variable under
consideration in a particular spatial unit is determined by the values present in the
rest of the spatial system.
The problem can be approached from two viewpoints. In the first, tests for the
presence of spatial autocorrelation are carried out, in which a null hypothesis of no
spatial dependence in the system is tested against the hypothesis of dependence as
reflected in a particular spatial structure. Several test statistics have been developed,
discussed, and applied to different types of spatial data sets, by amongst others,
Krishna Iyer (1949), Moran (1950), Geary (1954), Dacey (1968), Granger (1969),
Cliff and Ord (1973), Cliff et al (1975), Royaltey et al (1975), Kooijman (1976), Sen
(1976), Sen and Soot (1977), Hubert (1978), and Griffith (1980). Applications to
the disturbances of a linear regression model are described in, for example, Cliff and
Ord (1973), Bartels and Hordijk (1977), and Brandsma and Ketellapper (1979b).
In the second approach, the problem is viewed as that of estimating the parameter(s)
of a linear regression which expresses the spatial dependence explicitly. The more
general problem of estimation from dependent observations has been extensively
treated in a time series context. It was pointed out very early by Student (1907;
1914) and Yule (1926). More recently, Silvey (1961), Bar-Shalom (1971), and Bhat
(1974), amongst others, have considered the asymptotic properties of maximum
likelihood estimation from dependent observations. They are formulated by use of
conditional probabilities in the likelihood function, conditional upon the observations
in a previous time period.
An extension of this conditional probability framework to the spatial case is
presented in, for example, Besag (1974) and Besag and Moran (1975). However, its
applicability seems limited when few observations are present, because of a loss of
information which results when applying the method. An alternative approach is to
consider a joint probability framework. The differences between the joint and
conditional probability approaches are outlined in Brook (1964). In general terms,
the joint probability approach consists of considering transformations of the original
dependent observations into independent variables. The crucial points in this approach
are the structure of the transformation (which incorporates the assumed spatial
dependence), the extent to which traditional methods may be applied, or whether
alternative procedures need to be constructed.
In a typical cross-sectional situation, the observed sample does not contain enough
information to estimate a full spatial structure or test against all possible forms of
spatial dependence. Hence, a structure is imposed on the spatial system, which
restricts the number of parameters to be estimated. This structure is reflected in a
contiguity, connectivity, or generalized weight matrix, the elements of which are
assumed to be known, fixed, and nonstochastic. Several forms of the weight matrix
have been proposed in the literature, such as a binary matrix (in the Moran and
Geary test statistics), a generalized weight matrix, the rows or columns of which may
sum to one (as in the Cliff-Ord statistic), a combination of orders of contiguity
[biparametric as in Brandsma and Ketellapper (1979a), or multiparametric as, for
example, in Hordijk (1974)], or as a matrix with generalized weights that reflect
particular functional forms, where the values of the parameters are determined a
priori [as in Bodson and Peeters (1975)]. An approach is suggested in Anselin (1980)
for the joint estimation of spatial autoregressive parameters and some parameters of
the contiguity matrix.
The model considered in this note is an autoregressive model in space, where the
postulated spatial dependence of the system is expressed in an a priori known
generalized weight matrix W, the row elements of which sum to 1. With y as the
variable under study, $\rho$ as the autoregressive parameter, and $\epsilon$ as a vector of normal
independent disturbance terms, the model can be formulated as:
$$y = \rho W y + \epsilon \qquad (1)$$
with
$$\epsilon \sim N(0, \sigma^2 I) , \qquad (2)$$
where I is the unit matrix. It is the most straightforward and simple structure in a
more complete taxonomy of spatial autoregressive structures, as outlined in Griffith
(1976) and Anselin (1980). It is a special case in a general class of spatial and space-
time autoregressive and moving average models constructed in analogy to time series
approaches. These models have been treated in more detail in, for example, Bennett
(1975a; 1975b), Cliff and Ord (1975), Martin and Oeppen (1975), Haining (1978b)
and Hooper and Hewings (1981).
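As an illustration of equations (1) and (2), the model can be simulated by solving $(I - \rho W) y = \epsilon$ for y. The sketch below is not taken from the note: it assumes NumPy, uses the regular lattice with side-or-vertex contiguity that the simulation section describes, and the function names `lattice_weights` and `simulate_sar` are hypothetical.

```python
import numpy as np

def lattice_weights(n):
    """Row-standardized contiguity matrix for an n x n lattice, where cells
    sharing a side or a vertex are neighbours."""
    band = np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)   # 1 where |i - j| <= 1
    A = np.kron(band, band) - np.eye(n * n)               # binary matrix, no self-links
    return A / A.sum(axis=1, keepdims=True)               # make each row sum to 1

def simulate_sar(W, rho, rng):
    """Draw y from y = rho W y + eps with eps ~ N(0, I) (sigma = 1 assumed)."""
    N = W.shape[0]
    eps = rng.standard_normal(N)
    return np.linalg.solve(np.eye(N) - rho * W, eps)

W = lattice_weights(4)                                    # 16 spatial units
y = simulate_sar(W, 0.5, np.random.default_rng(0))
```

Solving the linear system avoids forming the inverse of $I - \rho W$ explicitly; for the small lattices considered here either route would do.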
Though the structure in equation (1) is very simple and in itself not very meaningful
in empirical analysis, it may play a useful role in an initial and exploratory data
analysis. In that respect, it remains important that the estimates of the autoregressive
parameter $\rho$ provide the correct statistical information.
In this note, a Bayesian estimation procedure is suggested and outlined, and its
performance in small sample situations is compared to the more traditional approaches
such as the ordinary least squares (OLS) and maximum likelihood estimators (ML).

Derivation of a Bayesian approach (MELO)


As with the autoregressive model in time series analysis, it is possible to apply a
Bayesian approach to the spatial autoregressive model. In this section we extend the
procedures that are outlined in Zellner (1971, pages 186-191) to the spatial case.
As outlined in Mead (1967), Cliff and Ord (1973), Ord (1975), and Haining
(1978a), amongst others, the likelihood function for the spatial autoregressive model
based on N observations, with a standardized contiguity matrix is

$$L(y \mid \rho, \sigma) \propto \left[ \prod_{i=1}^{N} (1 - \rho\lambda_i) \right] \sigma^{-N} \exp\left[ -(2\sigma^2)^{-1} y^T (I - \rho W)^T (I - \rho W) y \right] , \qquad (3)$$

where the $\lambda_i$ are the roots of the contiguity matrix W. We assume the following
diffuse prior for the parameter $\sigma$,
$$P(\sigma) \propto \sigma^{-1} , \quad 0 < \sigma < +\infty , \qquad (4)$$
and for $\rho$,
$$P(\rho) \propto \text{constant} , \quad -1 < \rho < +1 , \qquad (5)$$
and assume $\sigma$ and $\rho$ to be independent, hence
$$P(\rho, \sigma) \propto \sigma^{-1} . \qquad (6)$$
Notice that the assumption of nondiffuse priors is equally valid, and may even be
more appropriate in specific empirical applications of an exploratory nature. The
diffuse priors were chosen to compare the performance of the resulting estimate in
small samples more directly with the ML estimate. The constraints for $\rho$ in
expression (5) were chosen by analogy with the usual procedure in finding the $\rho$
estimator by means of the maximum likelihood approach. It can be shown that the
ML estimate has to satisfy the condition
$$1 - \rho\lambda_i > 0 , \quad \forall i . \qquad (7)$$
Since the dominant root of the standardized contiguity matrix is +1, and all other
roots are less than one in absolute value, the usual condition
$$-1 < \rho < +1 \qquad (8)$$
is overly restrictive for the left-hand side inequality. In most practical situations
mentioned in the literature this is not taken into account. In the experiments
described below, the results were not affected by use of inequality (8) instead of the
more appropriate inequality

$$1/\lambda^{-}_{\max} < \rho < +1 , \qquad (9)$$

where $\lambda^{-}_{\max}$ is the negative root of W with the largest absolute value (but smaller
than 1). Also notice that this condition is not necessarily satisfied by the OLS
estimate, since there is no assurance that $y^T W y < y^T W^T W y$. This will depend on
each particular y vector and contiguity matrix.
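The admissible range in inequality (9) can be computed directly from the roots of W. The following is a minimal sketch, assuming NumPy and a row-standardized side-or-vertex lattice; `lattice_weights` and `rho_bounds` are hypothetical helper names, not part of the note.

```python
import numpy as np

def lattice_weights(n):
    """Row-standardized side-or-vertex contiguity matrix for an n x n lattice."""
    band = np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    A = np.kron(band, band) - np.eye(n * n)
    return A / A.sum(axis=1, keepdims=True)

def rho_bounds(W):
    """Return (1 / most negative root, 1 / largest root) of W.

    A row-standardized symmetric binary matrix is similar to a symmetric
    matrix, so its roots are real; .real only discards rounding noise.
    """
    lam = np.sort(np.linalg.eigvals(W).real)
    return 1.0 / lam[0], 1.0 / lam[-1]

for n in (4, 5, 6):
    lower, upper = rho_bounds(lattice_weights(n))
    print(f"{n} x {n} lattice: {lower:.3f} < rho < {upper:.3f}")
```

Because these lattices contain triangles of mutually contiguous cells, the most negative root is strictly greater than -1, so the lower bound falls below -1 and inequality (8) is indeed the narrower of the two on the left-hand side.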
To obtain the joint posterior distribution for $(\rho, \sigma)$ in the Bayesian approach, the
joint prior is multiplied with the likelihood function (3) to obtain

$$P(\rho, \sigma \mid y) \propto \left[ \prod_{i=1}^{N} (1 - \rho\lambda_i) \right] \sigma^{-(N+1)} \exp\left[ -(2\sigma^2)^{-1} y^T (I - \rho W)^T (I - \rho W) y \right] . \qquad (10)$$

Since in most cases we are not particularly interested in the parameter $\sigma$, we obtain
the marginal posterior distribution of $\rho$ by integrating out $\sigma$, or
$$P(\rho \mid y) \propto \int_{0}^{+\infty} \left[ \prod_{i=1}^{N} (1 - \rho\lambda_i) \right] \sigma^{-(N+1)} \exp\left[ -(2\sigma^2)^{-1} y^T (I - \rho W)^T (I - \rho W) y \right] d\sigma . \qquad (11)$$

From the properties of the inverted gamma-2 density, namely that

$$\int_{0}^{+\infty} \sigma^{-(N+1)} \exp\left[ -(2\sigma^2)^{-1} \theta \right] d\sigma = \text{constant} \times \theta^{-N/2} , \qquad (12)$$

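it follows that

$$P(\rho \mid y) \propto \left[ \prod_{i=1}^{N} (1 - \rho\lambda_i) \right] \left[ y^T (I - \rho W)^T (I - \rho W) y \right]^{-N/2} \qquad (13)$$

or

$$P(\rho \mid y) \propto \left[ \prod_{i=1}^{N} (1 - \rho\lambda_i) \right] \left( y^T y - 2\rho\, y^T y_L + \rho^2 y_L^T y_L \right)^{-N/2} , \qquad (14)$$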

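where $y_L = Wy$. The normalizing constant (K), the mean, and other moments of the
posterior distribution may be obtained for every sample using numerical integration
techniques. For example, the mean is found as
$$\bar{\rho} = \int_{-1}^{+1} \rho \, P(\rho \mid y) \, d\rho , \qquad (15)$$
where $P(\rho \mid y)$ is the normalized posterior density.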
Similarly, once the normalizing constant is obtained, the probability of a value for p
in any interval may be obtained by integrating over that interval.
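The numerical integration described above can be sketched as follows. This is one possible implementation under stated assumptions, not the procedure used in the note: it evaluates the kernel of expression (14) on an evenly spaced grid over [-1, +1], applies the trapezoid rule, and takes K to be the integral of the kernel. NumPy and the names `lattice_weights`, `trapezoid`, and `melo` are assumptions.

```python
import numpy as np

def lattice_weights(n):
    """Row-standardized side-or-vertex contiguity matrix for an n x n lattice."""
    band = np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    A = np.kron(band, band) - np.eye(n * n)
    return A / A.sum(axis=1, keepdims=True)

def trapezoid(f, x):
    """Trapezoid rule for samples f on grid x."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def melo(y, W, points=4001):
    """Posterior mean of rho under the diffuse priors (4)-(6)."""
    N = len(y)
    lam = np.linalg.eigvals(W).real                 # roots of W (real here)
    yL = W @ y
    rho = np.linspace(-1.0, 1.0, points)
    # Jacobian term prod(1 - rho * lam_i); clip removes rounding noise at rho = 1
    jac = np.clip(np.prod(1.0 - np.outer(rho, lam), axis=1), 0.0, None)
    quad = y @ y - 2.0 * rho * (y @ yL) + rho**2 * (yL @ yL)
    # rescale by quad.min() so the power stays within floating-point range
    kernel = jac * (quad / quad.min()) ** (-0.5 * N)
    K = trapezoid(kernel, rho)                      # normalizing constant
    density = kernel / K
    return trapezoid(rho * density, rho), rho, density

rng = np.random.default_rng(1)
W = lattice_weights(5)
y = np.linalg.solve(np.eye(25) - 0.5 * W, rng.standard_normal(25))
rho_bar, grid, density = melo(y, W)

# Posterior probability that rho lies in [0, 1]
inside = (grid >= 0.0) & (grid <= 1.0)
prob_positive = trapezoid(density[inside], grid[inside])
```

The same `density` array gives the probability of any other interval by restricting the grid accordingly.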
For a quadratic loss function, the estimator $\hat{\rho}$, $\hat{\rho} = \bar{\rho}$, or the mean of the posterior
distribution, is the optimal point estimator or minimum expected loss estimator
(MELO). For large samples, the likelihood factor will dominate the prior factor in
the expression of Bayes's law. Since in large samples the mean and the mode of the
likelihood function coincide, and the large sample MELO estimate will tend to
coincide with the mean of the likelihood function, MELO and ML will be equivalent.
However, in small samples they are different. Since in spatial cross-sectional situations,
especially when dealing with larger spatial units (such as states, census regions, or
provinces), the number of observations may be quite low, it is important to gain
additional insight into the small sample properties of the MELO estimate relative to
the ML and OLS. In addition, a fourth estimate is considered, namely
$$R_{01} = (y^T y)^{-1} (y^T y_L) , \qquad (16)$$
which is the OLS estimate for $\lambda$ in
$$Wy = \lambda y + \epsilon , \qquad (17)$$
and is always less than 1 in absolute value, since $y^T W y < y^T y$ because of the
averaging effect of the standardized contiguity matrix. Although it may be intuitively
tempting to interpret $\lambda$ as a coefficient in an inverse contiguity structure,
$$y = \lambda W^{-1} y + W^{-1} \epsilon , \qquad (18)$$
it is difficult to do so since the row elements in $W^{-1}$ do not necessarily sum to 1.
Moreover, estimation of $\lambda$ by OLS suffers the same problems of bias and inconsistency
as in the direct autoregressive form of equation (1). As a result, $R_{01}$ should not be
considered to be more than a purely ad hoc coefficient within the estimation procedure.
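The two regression-type estimates differ only in which variable is treated as the regressor, which a small numerical example makes concrete. The sketch below assumes NumPy; the function names and the three-unit data are purely illustrative and do not come from the note.

```python
import numpy as np

def ols_rho(y, W):
    """OLS of y on its spatial lag: (yL' yL)^{-1} yL' y."""
    yL = W @ y
    return float(yL @ y) / float(yL @ yL)

def r01(y, W):
    """Ad hoc estimate (16): OLS of the spatial lag on y, (y' y)^{-1} y' yL."""
    yL = W @ y
    return float(y @ yL) / float(y @ y)

# Hypothetical example: three units on a line, rows standardized to sum to 1.
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
y = np.array([1.0, 2.0, 4.0])

# Here yL = (2, 2.5, 2), so y'yL = 15, yL'yL = 14.25, and y'y = 21.
print(ols_rho(y, W))   # 15 / 14.25, about 1.053
print(r01(y, W))       # 15 / 21, about 0.714
```

In this example the OLS estimate falls outside the admissible range while $R_{01}$ does not. The product of the two estimates equals the squared cosine of the angle between y and Wy, so it always lies between 0 and 1.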

Small sample properties


In this section the small sample properties of OLS, ML, MELO, and $R_{01}$ will be
compared. It can be shown that OLS is biased, because of the presence of a stochastic
element (Wy) in the right-hand side of the regression equation. In contrast to the
time series case, OLS will be inconsistent as well, independent of whether or not
there is serial dependence in the disturbance term, because of the nature of the
Jacobian $|I - \rho W|$. ML and MELO are asymptotically equivalent, consistent, and most
efficient.
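The note does not spell out how the ML estimate is computed. One common route, assumed here, is to concentrate $\sigma$ out of likelihood (3): for given $\rho$ the maximizing value is $\sigma^2 = N^{-1} y^T (I - \rho W)^T (I - \rho W) y$, which leaves a function of $\rho$ alone with the same kernel as the marginal posterior (13). ML then corresponds to the mode of that kernel and MELO to its mean. The sketch below uses a plain grid search rather than a nonlinear optimizer; NumPy and all function names are assumptions.

```python
import numpy as np

def lattice_weights(n):
    """Row-standardized side-or-vertex contiguity matrix for an n x n lattice."""
    band = np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    A = np.kron(band, band) - np.eye(n * n)
    return A / A.sum(axis=1, keepdims=True)

def concentrated_loglik(rho, y, W, lam):
    """sum_i log(1 - rho lam_i) - (N / 2) log(SSE(rho)), up to a constant."""
    rho = np.atleast_1d(rho)
    yL = W @ y
    sse = y @ y - 2.0 * rho * (y @ yL) + rho**2 * (yL @ yL)
    log_jac = np.log(1.0 - np.outer(rho, lam)).sum(axis=1)
    return log_jac - 0.5 * len(y) * np.log(sse)

def ml_rho(y, W, points=1999):
    """Grid-search ML estimate of rho on the open interval (-1, +1)."""
    lam = np.linalg.eigvals(W).real
    grid = np.linspace(-0.999, 0.999, points)
    return float(grid[np.argmax(concentrated_loglik(grid, y, W, lam))])

rng = np.random.default_rng(2)
W = lattice_weights(6)
y = np.linalg.solve(np.eye(36) - 0.8 * W, rng.standard_normal(36))
rho_ml = ml_rho(y, W)
```

The grid is restricted to inequality (8), as in the experiments reported below; widening it to the range in (9) only requires changing the end points.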
Since the small sample properties of these estimators cannot be derived analytically,
we have to resort to Monte Carlo simulation experiments. Some autoregressive
models that incorporate a smaller number of contiguous spatial units than the one
used in the previous sections have been reported in Haining (1978a). For a contiguity
structure which is not standardized and with values of $\rho$ between 0 and 0.25 on a
5 x 5 and a 7 x 7 lattice, he reported a smaller bias and variance for ML compared
with OLS. Also, the performance of ML was found to improve with increasing
lattice size. Although OLS tended to overestimate the true value of the parameter,
ML tended to underestimate.
In this section, additional evidence is presented on the small sample properties of
OLS and ML for the pure autoregressive model with a more general (and standardized)
contiguity structure. Also, these estimates are compared with the approaches
suggested in the previous sections, namely MELO and $R_{01}$. Similar to the approach
used in Haining (1978a), three abstract spatial situations were considered as reflected
in regular square lattices of sizes 4 x 4, 5 x 5, and 6 x 6. This results in binary
contiguity matrices of dimensions 16, 25, and 36, where connectivity is defined as
having a side or vertex in common. The abstract lattice structures are chosen to
separate the properties of the estimators as such from peculiarities introduced by
particular spatial configurations. To the extent that often spatial data may be
represented on a grid, it is not considered to be too much of a departure from reality.
Ideally, however, each particular spatial configuration should be studied. The original
symmetric binary contiguity matrix is standardized by making the rows sum to 1.
Three values for the autoregressive parameter $\rho$ are considered, which reflect increasing
degrees of spatial connectivity in the system: 0.2, 0.5, and 0.8. The starting point
for the simulated data sets is a set of generated independent standard normal variates,
the disturbances in the model. For each parameter combination and lattice size the
estimation is repeated for 250 generated data sets in order to approximate repeated
sampling. For each estimation method, four summary characteristics are considered,
namely:
(a) the mean of the estimated values for $\rho$:
$$m_{est} = M^{-1} \sum_{i=1}^{M} \hat{\rho}_i ; \qquad (19)$$
(b) the bias of the estimate, defined as the deviation of the mean from the true
parameter value,
$$b_{est} = M^{-1} \sum_{i=1}^{M} \hat{\rho}_i - \rho ; \qquad (20)$$
(c) the mean squared error, defined as
$$e_{ms} = M^{-1} \sum_{i=1}^{M} (\hat{\rho}_i - \rho)^2 ; \qquad (21)$$
(d) the mean absolute percentage error, defined as
$$e_{map} = M^{-1} \sum_{i=1}^{M} \rho^{-1} |\hat{\rho}_i - \rho| \, 100 ; \qquad (22)$$
where M is the number of generated data sets (M = 250).
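The four summary characteristics (19) to (22) translate directly into array operations. The following is a minimal sketch, assuming NumPy; the function name and the two-value input are illustrative only.

```python
import numpy as np

def summary_characteristics(estimates, rho):
    """Mean, bias, mean squared error, and mean absolute percentage error
    of a set of estimates around the true value rho (rho must be nonzero)."""
    est = np.asarray(estimates, dtype=float)
    return {
        "m_est": est.mean(),                                # (19)
        "b_est": est.mean() - rho,                          # (20)
        "e_ms": np.mean((est - rho) ** 2),                  # (21)
        "e_map": 100.0 * np.mean(np.abs(est - rho) / rho),  # (22)
    }

# Hypothetical check: two estimates, 0.4 and 0.6, around a true value of 0.5
sc = summary_characteristics([0.4, 0.6], rho=0.5)
# mean 0.5, bias 0, mean squared error 0.01, percentage error 20
```

In the experiment itself the input would be the M = 250 estimates produced by one method for one combination of lattice size and $\rho$. The toy input also shows that a zero bias can coexist with a sizeable $e_{ms}$ and $e_{map}$.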


A summary overview of the results is given in table 1, where for each combination
of sample size and parameter value the best method is given according to $b_{est}$, $e_{ms}$,
and $e_{map}$, together with the estimated value and the value of the characteristic.
With respect to $b_{est}$, ML is found as best on the 6 x 6 lattice for all values of $\rho$,
on the 5 x 5 lattice for $\rho = 0.5$ and 0.8, and on the 4 x 4 lattice for $\rho = 0.8$. In
every one of these situations, ML underestimates the true value of the parameter.
For $\rho = 0.2$ on the 5 x 5 lattice and for $\rho = 0.5$ on the 4 x 4 lattice OLS has the
smallest value of $b_{est}$. It underestimates slightly in the first case, and overestimates
on the smallest lattice. For $\rho = 0.2$ on the smallest lattice $R_{01}$ is best, and
underestimates.
With respect to $e_{ms}$ and $e_{map}$, OLS does not achieve very satisfactory results. For
$\rho = 0.2$, $R_{01}$ is best, and this on all three lattices. For $\rho = 0.5$ and $\rho = 0.8$, ML
and MELO are very close, with a slight edge for ML for $\rho = 0.8$ and on the 6 x 6
lattice.
An examination of the frequencies of the estimated values shows that there is a
tendency for OLS to overestimate the true value of the parameter, whereas the other
procedures tend to underestimate. For OLS, ML, and MELO, the pattern is fairly
similar, with a long tail to the left (especially for $\rho = 0.2$ and 0.5) and with
increasing sharpness of the distribution for larger $\rho$. Of these three estimators, the
distribution for OLS is most to the right, showing estimates larger than 1, even when
$\rho = 0.2$, and considerably so for $\rho = 0.8$, on all three lattices. The distribution for
MELO is most to the left, following very closely the pattern for ML. The distribution
of the estimates obtained by $R_{01}$ is different from the others, which may be expected
since it estimates a parameter in a different model. There are a considerable number

Table 1. Summary results of the Monte Carlo experiment, best method (a).

Sample   $\rho = 0.2$                  $\rho = 0.5$                  $\rho = 0.8$
size     mean    SC (b)   method      mean    SC (b)   method      mean    SC (b)   method

Bias of the estimate, $b_{est}$
4 x 4    0.079   -0.121   $R_{01}$    0.562    0.062   OLS         0.691   -0.109   ML
5 x 5    0.198   -0.002   OLS         0.416   -0.084   ML          0.723   -0.077   ML
6 x 6    0.158   -0.042   ML          0.454   -0.046   ML          0.761   -0.039   ML

Mean squared error, $e_{ms}$
4 x 4    0.079    0.048   $R_{01}$    0.254    0.112   $R_{01}$    0.691    0.060   ML
5 x 5    0.083    0.035   $R_{01}$    0.374    0.085   MELO        0.723    0.045   ML
6 x 6    0.082    0.027   $R_{01}$    0.454    0.044   ML          0.761    0.017   ML

Mean absolute percentage error, $e_{map}$
4 x 4    0.079    92.9    $R_{01}$    0.312    52.7    MELO        0.691    20.2    ML
5 x 5    0.083    78.7    $R_{01}$    0.374    42.1    MELO        0.723    16.2    ML
                                      0.416    42.1    ML
6 x 6    0.082    70.0    $R_{01}$    0.454    32.7    ML          0.761    11.8    ML

(a) The best method is given according to the summary characteristic used for each combination of
sample size and value of $\rho$. Also given are the mean value of $\rho$ and the value of the summary
characteristic.
(b) SC: summary characteristic.

[Figure 1: three panels (4 x 4 lattice, $\rho = 0.2$; 5 x 5 lattice, $\rho = 0.5$; 6 x 6 lattice, $\rho = 0.8$), each showing the frequency of the OLS, ML, MELO, and $R_{01}$ estimates against $\rho$ over the range -1 to +1.]
Figure 1. Frequency of estimates for different lattices and values of $\rho$.

of small values of the estimate, even when $\rho = 0.8$. Figure 1 illustrates some of
these results for $\rho = 0.2$ on the 4 x 4 lattice, $\rho = 0.5$ on the 5 x 5, and $\rho = 0.8$ on
the 6 x 6 lattice.

Summary and conclusions


In this note a number of estimation procedures for the most simple of the spatial
autoregressive structures are more closely examined. A Bayesian approach is introduced
and derived and its properties compared with the more traditional ordinary least
squares and maximum likelihood procedures and with an ad hoc estimate.
The results of the simulation with respect to OLS and ML are in general accordance
with the earlier findings reported in the literature for slightly different situations:
overestimation for OLS, underestimation and more precise estimates for ML.
As far as the limited simulation experiments allow for general conclusions, we
found quite satisfactory results for the two approaches suggested: the Bayesian MELO
and $R_{01}$.
$R_{01}$, though it is a purely ad hoc procedure, outperforms the other methods for
$\rho = 0.2$ on all samples, and is quite good for $\rho = 0.5$ on the 4 x 4 lattice. When
considered not so much as an estimate in itself, but as a starting value in a nonlinear
optimization routine, it becomes quite useful and efficient, and is definitely superior
to OLS.
MELO approaches the properties of the ML estimator for larger values of $\rho$ and
larger samples. On the 4 x 4 and 5 x 5 lattices for $\rho = 0.2$ and 0.5, it often is the
more efficient of the two. In addition to its value as an estimate in itself, it allows
for the calculation of the exact probability of finding a value within a predetermined
interval. As a result, it provides a viable alternative to the use of the ML asymptotic
variance matrix, a likelihood ratio test or other test statistics for spatial autocorrelation.
Overall, ML and MELO seem to be the most appropriate approaches. $R_{01}$ is
attractive in particular situations, but does not show that quality overall. Finally,
OLS, though showing a small bias in some instances, is in general very imprecise.

Acknowledgements. This paper summarizes and extends some ideas presented in a chapter of the
author's PhD dissertation in the field of regional science at Cornell University. Laura Krause helped
in drawing the graphs. The comments of an anonymous referee are gratefully acknowledged. The
usual disclaimer holds.
References
Anselin L, 1980 Estimation Methods for Spatial Autoregressive Structures: A Study in Spatial
Econometrics Regional Science Dissertation and Monograph Series 8, Program in Urban and
Regional Studies, Cornell University, Ithaca, NY
Bar-Shalom Y, 1971 "On the asymptotic properties of the maximum likelihood estimate obtained
from dependent observations" Journal of the Royal Statistical Society, Series B 33 72-77
Bartels C P A, Hordijk L, 1977 "On the power of the generalized Moran contiguity coefficient in
testing for spatial autocorrelation among regression disturbances" Regional Science and Urban
Economics 7(1-2) 83-101
Bennett R, 1975a "Dynamic systems modelling of the North-west region: 1. Spatio-temporal
representation and identification" Environment and Planning A 7 525-538
Bennett R, 1975b "The representation and identification of spatio-temporal systems: an example of
population diffusion in North West England" Transactions of the Institute of British Geographers
66 73-94
Besag J, 1974 "Spatial interaction and the statistical analysis of lattice systems" Journal of the
Royal Statistical Society, Series B 36 192-225
Besag J, Moran P A, 1975 "On the estimation and testing of spatial interaction in Gaussian lattice
processes" Biometrika 62 555-562
Bhat B, 1974 "On the method of maximum likelihood for dependent observations" Journal of the
Royal Statistical Society, Series B 36 48-53
Bodson P, Peeters D, 1975 "Estimation of the coefficients of a linear regression in the presence of
spatial autocorrelation. An application to a Belgian labour-demand function" Environment and
Planning A 7 455-472
Brandsma A S, Ketellapper R, 1979a "Biparametric approach to spatial autocorrelation"
Environment and Planning A 11 51-58
Brandsma A S, Ketellapper R, 1979b "Further evidence on alternative procedures for testing of
spatial autocorrelation among regression disturbances" in Exploratory and Explanatory Statistical
Analysis of Spatial Data Eds C P A Bartels, R Ketellapper (Martinus Nijhoff, Boston, MA)
pp 113-136
Brook D, 1964 "On the distinction between the conditional probability and the joint probability
approaches in the specification of nearest neighbour systems" Biometrika 51 481-483
Cliff A D, Martin R, Ord J K, 1975 "A test for spatial autocorrelation in choropleth maps based
upon a modified $\chi^2$ statistic" Transactions of the Institute of British Geographers 65 109-129
Cliff A D, Ord J K, 1973 Spatial Autocorrelation (Pion, London)
Cliff A D, Ord J K, 1975 "Space-time modelling with an application to regional forecasting"
Transactions of the Institute of British Geographers 64 119-128
Dacey M, 1968 "A review on measures of contiguity for two and k-color maps" in Spatial Analysis
Eds B Berry, D Marble (Prentice-Hall, Englewood Cliffs, NJ) pp 479-495
Geary R C, 1954 "The contiguity ratio and statistical mapping" The Incorporated Statistician 5
115-145
Granger C, 1969 "Spatial data and time series analysis" in London Papers in Regional Science.
Studies in Regional Science Ed. A Scott (Pion, London) pp 1-24
Griffith D A, 1976 "Spatial autocorrelation problems: some preliminary sketches of a structural
taxonomy" The East Lakes Geographer 11 59-68
Griffith D A, 1980 "Towards a theory of spatial statistics" Geographical Analysis 12(4) 325-339
Haining R, 1978a "Estimating spatial-interaction models" Environment and Planning A 10 305-320
Haining R, 1978b "The moving average model for spatial interactions" Transactions of the Institute
of British Geographers 3(2) 202-225
Hooper D, Hewings G J D, 1981 "Some properties of space-time processes" Geographical Analysis
13(3) 203-223
Hordijk L, 1974 "Spatial correlation in the disturbances of a linear interregional model" Regional
Science and Urban Economics 4(3) 117-140
Hubert L, 1978 "Nonparametric tests for patterns in geographic variation—possible generalization"
Geographical Analysis 10(1) 86-88
Kooijman S A, 1976 "Some remarks on the statistical analysis of grids especially with respect to
ecology" Annals of Systems Research 5 113-132
Krishna Iyer P, 1949 "The first and second moments of some probability distributions arising from
points on a lattice and their applications" Biometrika 36 135-141
Martin R, Oeppen J, 1975 "The identification of regional forecasting models using space-time
correlation functions" Transactions of the Institute of British Geographers 66 95-118
Mead R, 1967 "A mathematical model for the estimation of interplant competition" Biometrics 23
189-205
Moran P A P, 1950 "A test for the serial independence of residuals" Biometrika 37 178-181
Ord J K, 1975 "Estimation methods for models of spatial interaction" Journal of the American
Statistical Association 70(349) 120-126
Royaltey H, Astrachan E, Sokal R, 1975 "Tests for patterns in geographic variation" Geographical
Analysis 7(4) 369-395
Sen A, 1976 "Large sample-size distribution of statistics used in testing for spatial correlation"
Geographical Analysis 8(2) 175-184
Sen A, Soot S, 1977 "Rank tests for spatial correlation" Environment and Planning A 9 897-903
Silvey S, 1961 "A note on maximum likelihood in the case of dependent random variables" Journal
of the Royal Statistical Society, Series B 23 444-452
Student, 1907 "On the error of counting with a haemacytometer" Biometrika 5 351-360
Student, 1914 "The elimination of spurious correlation due to position in time or space"
Biometrika 10 179-180
Yule G, 1926 "Why do we sometimes get nonsense-correlations between time series? A study in
sampling and the nature of time series" Journal of the Royal Statistical Society 89 1-69
Zellner A, 1971 An Introduction to Bayesian Inference in Econometrics (John Wiley, New York)

© 1982 a Pion publication printed in Great Britain
