Spatial econometrics in about 40 minutes
Bernard Fingleton
University of Strathclyde

Why spatial econometrics?
Spatial economics is now widely recognised in the economics mainstream
Krugman's Nobel prize for work on economic geography
Importance of network economics (e.g. Royal Economic Society Easter 2009 School, on Auctions and Networks)
LSE ESRC Centre for Spatial Economics
Increasing policy relevance: World Bank (2008), World Development Report 2009, World Bank, Washington.
The standard time series econometric tools are not relevant for spatial series

What is spatial econometrics?

The theory and methodology appropriate to the analysis of spatial series relating to the economy.
"Spatial series" means that each variable is distributed not in time, as in conventional mainstream econometrics, but in space.

DGP for time series

$$y(t) = \rho\, y(t-1) + \varepsilon(t), \qquad t = 2, \ldots, T$$
$$y(1) = 0$$
$$\varepsilon \sim \mathrm{iid}(0, \sigma^2)$$

DGP for time series

[Figure: plot of a simulated time series; horizontal axis from 50 to 250, vertical axis from -1.5 to 1.5.]

DGP for time series

$$y = \rho W y + \varepsilon$$
$y$ is a T x 1 vector
$\rho$ is a scalar parameter that is estimated
$\varepsilon$ is a T x 1 vector of disturbances

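DGP for time series

$$y = \rho W y + \varepsilon$$
W is a T x T matrix with 1s immediately below the main diagonal; thus for T = 10

$$
W = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
$$

The 1s indicate location pairs that are close to each other in time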
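To make the role of W concrete, here is a minimal Python sketch that builds the T x T matrix with 1s just below the main diagonal and checks that Wy is simply y lagged by one period. The helper names `lag_matrix` and `matvec` and the data y(t) = t are illustrative assumptions, not part of the slides.

```python
def lag_matrix(T):
    """T x T matrix with 1s immediately below the main diagonal."""
    return [[1 if j == i - 1 else 0 for j in range(T)] for i in range(T)]


def matvec(W, y):
    """Matrix-vector product W y for a list-of-lists matrix."""
    return [sum(w_ij * y_j for w_ij, y_j in zip(row, y)) for row in W]


T = 10
W = lag_matrix(T)
y = [float(t) for t in range(1, T + 1)]   # illustrative data: y(t) = t
Wy = matvec(W, y)

# The first element has no predecessor; every other element is y(t-1).
print(Wy)
```

Because row t of W picks out only y(t-1), the product Wy reproduces the one-period lag used in the time series DGP.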

DGP for time series

$$y = \rho W y + \varepsilon$$
Provided $Wy$ and $\varepsilon$ are contemporaneously independent, we can estimate $\rho$ by OLS and get consistent estimates, although there is small sample bias.
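The consistency claim can be checked with a small simulation. This is a sketch under stated assumptions: the true value 0.5, the sample size 5000, the seed, and the function names are all chosen here for illustration and do not come from the slides.

```python
import random


def simulate_ar1(rho, T, sigma=1.0, seed=0):
    """Simulate y(t) = rho * y(t-1) + eps(t) with y(1) = 0."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(T - 1):
        y.append(rho * y[-1] + rng.gauss(0.0, sigma))
    return y


def ols_no_intercept(x, y):
    """OLS slope for y = b * x + error (no constant term)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


rho_true = 0.5            # assumed value, chosen for illustration
y = simulate_ar1(rho_true, T=5000)
rho_hat = ols_no_intercept(y[:-1], y[1:])   # regress y(t) on y(t-1)
print(round(rho_hat, 3))
```

With a long series the OLS estimate should sit close to the assumed true value; shortening T makes the small sample bias easier to see.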

DGP for spatial series

In spatial econometrics, we have an N x N W matrix, where N is the number of places.

$$
W = \begin{pmatrix}
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 1 & 1 & 0
\end{pmatrix}
$$

N = 353
A portion of the W matrix for Luton (1), Mid Bedfordshire (2), Bedford (3), South Bedfordshire (4), Bracknell Forest (5), Reading (6), Slough (7), West Berkshire (8), Windsor and Maidenhead (9), Wokingham (10)

The 1s indicate location pairs that are close to each other in space

DGP for spatial series

[Figure: map of residential property prices in England, 2001, by district (N = 353); legend classes 40703 - 89013, 89013 - 129966, 129966 - 176349, 176349 - 274395, 274395 - 639049.]

DGP for spatial series

$$y = \rho W y + \varepsilon$$
$y$ is an N x 1 vector
$\rho$ is a scalar parameter that is estimated
$\varepsilon$ is an N x 1 vector of disturbances

DGP for spatial series

$$y = \rho W y + \varepsilon$$

This is an almost identical set-up to the time series case, and one might think that it can also be consistently estimated by OLS.
But now there is one big difference: we cannot estimate the spatial autoregression by OLS and obtain consistent estimates of $\rho$.
Reason: $Wy$ and $\varepsilon$ are not independent.
$Wy$ determines $y$ but is also determined by $y$.

DGP for spatial series

$$y = \rho W y + \varepsilon$$

$y_i$ depends on all the other $y$s, including $y_k$, because they are within $Wy$.
But $y_k$ also depends on all the other $y$s, including $y_i$, because they are within $Wy$.

$$y_i = f\Big(\sum_{j=1}^{N} W_{ij}\, y_j\Big), \qquad y_k = f\Big(\sum_{j=1}^{N} W_{kj}\, y_j\Big)$$

So $Wy$ determines $y_i$ and is determined by it.
So we have to use the appropriate likelihood function or 2SLS to obtain consistent estimates.
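A small Monte Carlo experiment illustrates the inconsistency of OLS in the spatial case. This is a sketch, not the author's code: it assumes a hypothetical row-standardised "ring" of 200 places in which each place has two neighbours, a true value of 0.5 for the autoregressive parameter, and standard normal disturbances. The series y is obtained by iterating y = rho * W y + eps to its fixed point, which converges here because the parameter is below 1 in absolute value.

```python
import random


def ring_lag(y):
    """W y for a row-standardised ring: each place has two neighbours."""
    n = len(y)
    return [0.5 * (y[i - 1] + y[(i + 1) % n]) for i in range(n)]


def simulate_sar(rho, n, rng, iterations=80):
    """Solve y = rho * W y + eps by fixed-point iteration (|rho| < 1)."""
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = eps[:]
    for _ in range(iterations):
        wy = ring_lag(y)
        y = [rho * w + e for w, e in zip(wy, eps)]
    return y


def ols_rho(y):
    """OLS slope from regressing y on W y (no constant)."""
    wy = ring_lag(y)
    return sum(a * b for a, b in zip(wy, y)) / sum(a * a for a in wy)


rng = random.Random(1)
rho_true = 0.5            # assumed value, chosen for illustration
estimates = [ols_rho(simulate_sar(rho_true, 200, rng)) for _ in range(100)]
mean_rho_hat = sum(estimates) / len(estimates)
print(round(mean_rho_hat, 2))   # well above 0.5 in this setup
```

In this setup the average OLS estimate lies well above the true value, and enlarging N does not remove the gap, which is what distinguishes it from the small sample bias of the time series case.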

Spatial versus time series

A problem concerning W
Close together in time is easily defined
There are many different metrics for measuring closeness in space
The simplest is contiguity (1, 0), where 1 denotes locations sharing a boundary
Some alternatives for W
straight line distance
great circle distance
economic distance: trade costs, market access
travel time
distance in market product space, if the n objects are, say, companies and not places
social network distance between people
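As one example of an alternative to contiguity, the sketch below builds inverse-distance weights from great circle distances using the haversine formula. The three cities, their approximate coordinates, the Earth radius of 6371 km, and the choice of inverse distance as the weighting function are all assumptions made for illustration.

```python
import math


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Illustrative coordinates (approximate city centres), not data from the talk.
places = {
    "London": (51.5074, -0.1278),
    "Paris": (48.8566, 2.3522),
    "Glasgow": (55.8642, -4.2518),
}
names = list(places)

# Inverse-distance weights with a zero diagonal.
W = [[0.0 if a == b else 1.0 / great_circle_km(*places[a], *places[b])
      for b in names] for a in names]

london_paris = great_circle_km(*places["London"], *places["Paris"])
print(round(london_paris))
```

Any of the other metrics in the list could be substituted for `great_circle_km` without changing the rest of the construction.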

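Some typical models

A panel with spatial dependence
$$Y_t = X_t \beta + u_t, \qquad t = 1, \ldots, T$$
$$u_t = \rho W u_t + \varepsilon_t$$
$Y_t$ is an N x 1 vector
$X_t$ is an N x k matrix
$\beta$ is a k x 1 parameter vector
$u_t$ is an N x 1 vector of residuals
$\varepsilon_t$ is an N x 1 vector of iid innovations
OR
$$Y = X\beta + u$$
$$u = \rho\,(I_T \otimes W)\,u + \varepsilon = (I_{TN} - \rho\, I_T \otimes W)^{-1} \varepsilon$$
$Y$ is a TN x 1 vector, etc.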
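The stacked form relies on the Kronecker product of a T x T identity matrix with W, which is a TN x TN block-diagonal matrix with one copy of W per time period. A minimal sketch, using a hypothetical 3-place contiguity matrix and T = 2 (both assumptions for illustration):

```python
def kron(A, B):
    """Kronecker product of two list-of-lists matrices."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def identity(n):
    """n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


# Illustrative 3-place contiguity matrix (place 2 borders places 1 and 3).
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
T = 2

big_W = kron(identity(T), W)   # (TN x TN), block diagonal
for row in big_W:
    print(row)
```

The off-diagonal blocks are zero, so with the data stacked period by period the spatial dependence operates within each period and not across periods.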

Estimation
ML is typically used, but there are problems:
No large sample theory
Fails to handle other endogenous variables on the right-hand side, i.e. other than the spatial lag (Wy)
Assumes normality, or some other specific distribution, whereas 2SLS/GM is robust
ML requires the eigenvalues of the W matrix; if N is large this may be problematic

Some typical models

Estimation
Typically these models are estimated by ML.
However, another very useful method for estimating these models is FGS2SLS, or feasible generalised spatial two-stage least squares.
The method involves 3 stages.
Stage 1: the model is estimated by 2SLS to obtain initial estimates of the regression coefficients.
Stage 2: the resulting 2SLS residuals give estimates of $\rho$ and $\sigma^2$ using a GM procedure.
Stage 3: the estimate of $\rho$ is used to perform a Cochrane-Orcutt-type transformation to account for the spatial dependence in the residuals, and the transformed model is re-estimated to obtain the final coefficient estimates.
This is the most sensible single equation estimation method when endogenous right-hand-side regressors (in addition to WY) are included within X.
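The first stage can be illustrated in a deliberately stripped-down setting. The sketch below is an assumption-laden toy, not the FGS2SLS procedure itself: it uses a spatial lag model with a single exogenous regressor x and no constant, a hypothetical row-standardised ring of places for W, and the single instrument Wx for the endogenous Wy. With exactly as many instruments as regressors, 2SLS reduces to the simple IV estimator solved here; the GM step and the Cochrane-Orcutt-type transformation are not shown.

```python
import random


def ring_lag(v):
    """W v for a row-standardised ring of places (two neighbours each)."""
    n = len(v)
    return [0.5 * (v[i - 1] + v[(i + 1) % n]) for i in range(n)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def simulate(rho, beta, n, rng, iterations=80):
    """Draw x and eps, then solve y = rho*W y + beta*x + eps by iteration."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    shock = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    y = shock[:]
    for _ in range(iterations):
        y = [rho * w + s for w, s in zip(ring_lag(y), shock)]
    return x, y


def iv_spatial_lag(x, y):
    """Just-identified IV: instruments (W x, x) for regressors (W y, x)."""
    wy, wx = ring_lag(y), ring_lag(x)
    # Normal equations H'Z d = H'y with Z = [Wy, x] and H = [Wx, x].
    a11, a12 = dot(wx, wy), dot(wx, x)
    a21, a22 = dot(x, wy), dot(x, x)
    b1, b2 = dot(wx, y), dot(x, y)
    det = a11 * a22 - a12 * a21
    rho_hat = (b1 * a22 - a12 * b2) / det    # Cramer's rule, first unknown
    beta_hat = (a11 * b2 - a21 * b1) / det   # Cramer's rule, second unknown
    return rho_hat, beta_hat


rng = random.Random(2)
x, y = simulate(rho=0.5, beta=1.0, n=2000, rng=rng)   # assumed true values
rho_hat, beta_hat = iv_spatial_lag(x, y)
print(round(rho_hat, 2), round(beta_hat, 2))
```

Using spatial lags of the exogenous regressors as instruments for Wy is a common choice; further lags such as W applied twice to x would make the problem over-identified and require the full 2SLS projection.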

Some applications
Fingleton B (2006b) The new economic geography versus urban economics: an evaluation using local wage rates in Great Britain, Oxford Economic Papers 58, 501-530

The NEG wage equation

[Figure: two district-level maps, "Relative market potential" and "Relative wage rates".]

The NEG wage equation

$$\ln w^o = \rho W \ln w^o + a_1 (\ln P - \rho W \ln P) + b_0 + b_1 S + b_2 T + \varepsilon$$
$$\varepsilon \sim N(0, \sigma^2)$$

Some applications
Robin Dubin, R. Kelley Pace and Thomas G. Thibodeau (1999) Spatial Autoregression Techniques for Real Estate Data, Journal of Real Estate Literature, Volume 7, Number 1, January 1999
Abstract
This paper describes how spatial techniques can be used to improve the accuracy of market
value estimates obtained using multiple regression analysis. Rather than eliminating the problem
of spatial residual dependencies through the inclusion of many independent variables, spatial
statistical methods typically keep fewer independent variables and augment these with a simple
model of the spatial error dependence. We discuss alternative spatial autoregression model
specifications, estimation methods, and prediction procedures. An empirical example is provided
in the appendix.

Some applications
Beron K et al (2004) Hedonic price functions and spatial dependence: implications for the demand for urban air quality, Chapter 12 in Advances in Spatial Econometrics (Eds Anselin L, Florax R and Rey S) Springer.
In 1967 Ronald Ridker and John Henning conducted the first study that linked air pollution to
property prices. ... there seems to be a preponderance of evidence that air pollution is negatively
related to house prices. This is important because it reveals information about the willingness to
pay for air quality, a non-market commodity. ... much of the analysis focuses on the hedonic
regressions, wherein some measure of house price is the dependent variable and measures of
the characteristics of housing (e.g. living area, existence of a pool, neighbourhood quality, school
district) as well as measures of pollution are the independent variables. ... we are worried that the
potential for misspecifying the role of neighbourhood quality as a determinant of house prices is
high. For us this is relevant to the extent that it may significantly alter the estimate of the air
pollution effect. ... to analyse these issues we use the tools of spatial econometrics.

Some applications
Kim C W, Phipps, T, Anselin, L (2003) Measuring the benefits of air quality
improvement: a spatial hedonic approach, Journal of Environmental Economics and
Management, 45 24-39
Abstract:
The primary objective of this paper is to improve the methodology for estimating hedonic price
functions when the data are inherently spatial. A spatial-econometric hedonic housing price
model is developed and estimated for the Seoul metropolitan area to measure the marginal value
of improvements in SO2 and NOX concentrations. Diagnostic testing favored the spatial lag
model over the spatial error model. Results showed that SO2 pollution levels had a significant
impact on housing prices while NOX pollution did not. The authors attribute this differential impact
to the relatively higher levels of SO2 pollution when compared with pollution standards and the
relative recency of NOX pollution. Marginal WTP for a 4% improvement in mean SO2
concentrations is about $2,333 or 1.4% of mean housing price.

Some applications
Bernard Fingleton and Julie Le Gallo
"Estimating spatial models with endogenous variables, a spatial lag and spatially dependent disturbances: Finite sample properties"
Papers in Regional Science, Volume 87, Number 3, August 2008.

Results can be replicated in Stata

Some applications
http://www.sml.hw.ac.uk/ecomes/peebles/
Contains Stata data file
Stata do file
Stata log file

English house prices in 2000


[Figure: map of residential property prices in England, 2001, by district (N = 353); legend classes 40703 - 89013, 89013 - 129966, 129966 - 176349, 176349 - 274395, 274395 - 639049.]

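English house prices in 2000

$$y = \rho W y + X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

$$W_{ij} = \frac{E_i E_j \exp(-\delta d_{ij})}{\sum_j E_i E_j \exp(-\delta d_{ij})}$$

Spatial lag model
Log likelihood = -1764.8763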

Number of obs  =    353
Variance ratio =  0.682
Squared corr.  =  0.686
Sigma          =  35.82

------------------------------------------------------------------------------
     p_all_1 |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
p_all_1      |
      wdem_0 |   .7162038   .0744813     9.62   0.000     .5702231    .8621846
     cwdem_0 |   .0307249   .0042079     7.30   0.000     .0224775    .0389723
          oo |  -.0005196   .0000951    -5.46   0.000    -.0007061   -.0003331
  mkeystage2 |   185.0892   19.32967     9.58   0.000     147.2037    222.9746
       _cons |  -704.0562   76.03211    -9.26   0.000    -853.0764    -555.036
-------------+----------------------------------------------------------------
         rho |   .7233336   .0632212    11.44   0.000     .5994224    .8472448
------------------------------------------------------------------------------
Wald test of rho=0:                 chi2(1) = 130.904 (0.000)
Likelihood ratio test of rho=0:     chi2(1) = 107.679 (0.000)
Lagrange multiplier test of rho=0:  chi2(1) = 184.967 (0.000)
Acceptable range for rho: -15.394 < rho < 1.000

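DGP for spatial series

$$
W^* = \begin{pmatrix}
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 1 & 1 & 0
\end{pmatrix}
$$

N = 353
A portion of the W matrix for Luton (1), Mid Bedfordshire (2), Bedford (3), South Bedfordshire (4), Bracknell Forest (5), Reading (6), Slough (7), West Berkshire (8), Windsor and Maidenhead (9), Wokingham (10)

The 1s indicate location pairs that are close to each other in space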

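English house prices in 2000

$$y = \rho W y + X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

Weights matrix
Name: w_contig
Type: Imported (binary)
Row-standardized: Yes

$$W^*_{ij} = \begin{cases} 1 & \text{if } j \in n(i) \\ 0 & \text{otherwise} \end{cases}$$
$n(i)$ is the set of neighbours of $i$

$$W_{ij} = \frac{W^*_{ij}}{\sum_j W^*_{ij}}$$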
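Row-standardisation divides each row of the binary contiguity matrix by its row sum, so that the weights of each place's neighbours sum to 1. The sketch below applies this to the 10 x 10 portion of the contiguity matrix shown in the slides, treating that portion as if it were the whole matrix. That is an assumption made for illustration: in the full 353 x 353 matrix these districts may have further neighbours. The function name `row_standardise` is hypothetical.

```python
# The 10 x 10 portion of the binary contiguity matrix shown on the slides.
W_star = [
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 1, 0, 1, 1, 0],
]


def row_standardise(W_star):
    """Divide each row by its sum so that every non-empty row sums to 1."""
    W = []
    for row in W_star:
        total = sum(row)
        W.append([w / total if total else 0.0 for w in row])
    return W


W = row_standardise(W_star)
print(W[1])   # row for Mid Bedfordshire (2): two neighbours, each weighted 0.5
```

After row-standardisation, Wy is the average of y over each place's neighbours, and the matrix is in general no longer symmetric.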

Spatial lag model

Log likelihood = -1742.3288

Number of obs  =    353
Variance ratio =  0.691
Squared corr.  =  0.748
Sigma          =  32.15

------------------------------------------------------------------------------
     p_all_1 |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
p_all_1      |
      wdem_0 |   .3954057    .072845     5.43   0.000     .2526322    .5381792
     cwdem_0 |   .0275843   .0037148     7.43   0.000     .0203034    .0348651
          oo |  -.0003552   .0000873    -4.07   0.000    -.0005263    -.000184
  mkeystage2 |   149.9569   17.41933     8.61   0.000     115.8156    184.0981
       _cons |  -541.2713    67.4818    -8.02   0.000    -673.5332   -409.0094
-------------+----------------------------------------------------------------
         rho |   .6062885   .0403313    15.03   0.000     .5272407    .6853364
------------------------------------------------------------------------------
Wald test of rho=0:                 chi2(1) = 225.982 (0.000)
Likelihood ratio test of rho=0:     chi2(1) = 152.774 (0.000)
Lagrange multiplier test of rho=0:  chi2(1) = 164.297 (0.000)
Acceptable range for rho: -1.095 < rho < 1.000
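The reported Wald statistics can be reproduced from the rho rows of the two spatial lag tables: for a single restriction, the Wald chi-squared statistic is the square of the coefficient divided by its standard error. The short sketch below does this arithmetic; the function and variable names are illustrative.

```python
def wald_stat(coef, std_err):
    """Wald chi-squared statistic (1 d.f.) for H0: coefficient = 0."""
    return (coef / std_err) ** 2


# rho and its standard error, copied from the two spatial lag tables.
wald_first = wald_stat(0.7233336, 0.0632212)    # first weights matrix
wald_second = wald_stat(0.6062885, 0.0403313)   # row-standardised contiguity
print(round(wald_first, 2), round(wald_second, 2))
```

Both values agree with the "Wald test of rho=0" lines to within rounding of the printed coefficients.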

English house prices in 2000

$$Y = X\beta + \varepsilon$$

. ***************************** REFERENCE: OLS *******************************
.
. * As above, but estimating using OLS.
.
. regress p_all_1 wdem_0 cwdem_0 oo mkeystage2

      Source |         SS     df          MS
-------------+--------------------------------
       Model | 824812.168      4  206203.042
    Residual | 617188.856    348  1773.53119
-------------+--------------------------------
       Total | 1442001.02    352  4096.59382

Number of obs = 353
F(4, 348)     = 116.27
Prob > F      = 0.0000
R-squared     = 0.5720
Adj R-squared = 0.5671
Root MSE      = 42.113

------------------------------------------------------------------------------
     p_all_1 |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      wdem_0 |   .8640059   .0862467    10.02   0.000     .6943755    1.033636
     cwdem_0 |   .0577055   .0040977    14.08   0.000     .0496461    .0657649
          oo |  -.0007112   .0001101    -6.46   0.000    -.0009278   -.0004947
  mkeystage2 |   175.8029   22.70749     7.74   0.000     131.1417    220.4641
       _cons |  -571.8745   88.35934    -6.47   0.000    -745.6601    -398.089
------------------------------------------------------------------------------

International shock transmission

$$y = \rho W y + X\beta + u$$
$$u = \lambda W u + \varepsilon$$
$\varepsilon$ is an N x 1 vector of iid innovations
$$u = (I - \lambda W)^{-1} \varepsilon$$
hence
$$y = \rho W y + X\beta + (I - \lambda W)^{-1} \varepsilon$$

And for $|\lambda| < 1$
$$(I - \lambda W)^{-1} = \sum_{i=0}^{\infty} \lambda^i W^i = I + \lambda W + \lambda^2 W^2 + \lambda^3 W^3 + \ldots$$
where $W^0 = I$, $W^2$ is the matrix product of W and W, and $W^i$ is the matrix product of $W^{i-1}$ and W.
This means that a shock at j goes to all other locations.

International shock transmission

Alternatively, we might invoke a moving average error process
$$u = (I - \lambda W)\,\varepsilon$$
MA errors imply shocks that are only transmitted locally. This difference is highlighted by this diagram.

[Figure: two panels, "Shock effects with AR errors" and "Shock effects with MA errors".]
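The contrast between the two error processes can be sketched numerically. The example below assumes a hypothetical chain of 7 places with a row-standardised W, a dependence parameter of 0.5, and a unit shock to the middle place. It also assumes the sign convention in which the AR process applies the inverse of (I - 0.5 W) to the shock and the MA process applies (I - 0.5 W) itself. The AR effect is computed from the truncated power series in W.

```python
def line_lag(v):
    """W v for a row-standardised chain of places 0-1-2-...-(n-1)."""
    n = len(v)
    out = []
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        out.append(sum(v[j] for j in nbrs) / len(nbrs))
    return out


def ar_effect(shock, lam, terms=200):
    """(I - lam*W)^(-1) shock, via the truncated sum of lam^i W^i shock."""
    total = shock[:]
    term = shock[:]
    for _ in range(terms):
        term = [lam * w for w in line_lag(term)]
        total = [t + s for t, s in zip(total, term)]
    return total


def ma_effect(shock, lam):
    """(I - lam*W) shock: only the shocked place and its neighbours move."""
    return [s - lam * w for s, w in zip(shock, line_lag(shock))]


n, lam = 7, 0.5                       # illustrative values
shock = [0.0] * n
shock[3] = 1.0                        # unit shock to the middle place
ar = ar_effect(shock, lam)
ma = ma_effect(shock, lam)
print([round(v, 3) for v in ar])
print([round(v, 3) for v in ma])
```

Under the AR process every place in the chain responds, with the response decaying with distance from the shocked place; under the MA process only the shocked place and its two immediate neighbours respond.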

International shock transmission

The effects of a shock are therefore felt directly within each country receiving the shock, and
there is an indirect effect due to W which affects only those countries linked via the W
matrix (i.e. those for which there is a non-zero element in the W matrix). If W were a contiguity
matrix we might think of these as local effects. However, the effect of a shock is global,
since it is also transmitted to third-party countries that are neighbours of neighbours.
The effect via the higher powers of W is also felt in countries that are not linked via
non-zero elements of the W matrix. Note that the effect is not one-way. A shock to a country
affects its neighbours and its non-neighbours, but these in turn affect the country from
which the shock emanates. In other words, the full effect of a shock for country k is not
simply the shock itself, but the initial shock plus the feedback from the other countries.

International shock transmission

[Figure: world map (Country.shp) showing the impact on real GDP pw of a positive shock (3 s.d.) to the UK; legend classes 189 - 542, 543 - 900, 901 - 1502, 1503 - 2902, 2903 - 23281.]

Spatial econometrics software

MATLAB
Many routines are available on James LeSage's webpages
Advanced spatial econometrics using MATLAB, available from
http://www.spatial-econometrics.com/
Stata
Some routines are available and more are forthcoming in David Drukker's package. The package is described here:
http://repec.org/snasug08/drukker_spatial.pdf

Course
The second "Spatial Econometrics Advanced Institute" will be held in Rome in May-June 2009.
The deadline for applications is the end of January.
All the details may be found at the website
Www.Spatialeconometricsadvancedinstitute.org

Books
Anselin L (1988) Spatial Econometrics: Methods and Models. Dordrecht: Kluwer.
Anselin L (2001) Spatial Econometrics, in B. Baltagi (ed) A Companion to Theoretical Econometrics, Oxford: Blackwell, 310-330
Anselin L (2006) Spatial econometrics, in TC Mills and K Patterson (eds) Palgrave Handbook of Econometrics: vol 1 Econometric theory, Palgrave Macmillan, 901-969