
Introduction to Spatial Econometrics
Rusli Abdulah
Researcher, INDEF
Jakarta, 21-22 November 2019
What Is Spatial Econometrics?
“A collection of techniques that deal with the peculiarities caused by
space in the statistical analysis of regional science models”
Luc Anselin (1988)

Spatial econometrics rests on Waldo Tobler’s first law of geography:

“Everything is related to everything else, but near things are more
related than distant things.”
Why Do We Need Spatial Econometrics?
• Important aspect when studying spatial units (cities, regions,
countries).
• Potential relationships and interactions between them.
• Example: Modeling pollution:
1. Analyze regions as independent units?
2. No, regions are spatially interrelated by ecological and economic
interactions.
3. Existence of environmental externalities ➔ an increase in region i’s pollution
will affect pollution in neighboring regions, but the impact will be smaller
for more distant regions.
Why Do We Need Spatial Econometrics? (continued)
• Increasing attention towards Spatial Econometrics in Economics
• Growing interest in agglomeration economies/spillovers –
(Geographical Economics)
• Diffusion of GIS technology and increased availability of geo-coded
data
The nature of spatial data
• Aggregate spatial data are characterized by dependence (spatial
autocorrelation) and heterogeneity (spatial structure)
• Data representation: time series (“time line”) vs. spatial data (map)
• Spatial econometrics deals with two spatial effects:
• spatial heterogeneity
• spatial dependence
Spatial heterogeneity
• Spatial heterogeneity relates to a differentiation of the effects of
space over the sample units. Formally, for spatial unit i:

• Lack of stability over the geographical space.


• Structural instability in the forms of:
• Non-constant error variances (spatial heteroscedasticity)
• Non-constant coefficients (variable coefficients, spatial regimes)
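
The formal statement for spatial unit i can be sketched as follows. The notation (f_i, x_i, β_i, ε_i, σ_i²) is an assumption following common textbook usage, not necessarily the formula on the original slide.

```latex
% Spatial heterogeneity: the functional form, the coefficients and the
% error variance may all differ across spatial units i (assumed notation).
\[
  y_i = f_i(x_i, \beta_i, \varepsilon_i), \qquad
  \operatorname{Var}(\varepsilon_i) = \sigma_i^2 .
\]
% The standard, spatially stable model is the special case
\[
  f_i = f, \qquad \beta_i = \beta, \qquad \sigma_i^2 = \sigma^2
  \quad \text{for all } i .
\]
```

Non-constant σ_i² corresponds to spatial heteroscedasticity, and non-constant β_i corresponds to variable coefficients or spatial regimes.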
Spatial dependence (spatial autocorrelation/spatial association)
• In spatial datasets, “dependence is present in all directions and
becomes weaker as data locations become more and more dispersed”
(Cressie, 1993)
• Tobler’s “First Law of Geography”: “Everything is related to everything
else, but near things are more related than distant things.” (Tobler,
1970)
• What happens in i depends on what happens in j. Formally,
Environmental Externalities

where β_ji is the effect of pollution in region j on pollution in region i.


What is the problem with this modeling strategy?
Under standard econometric modeling, it is impossible to model spatial dependency
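
A hedged sketch of the kind of equation this slide discusses, and of why it cannot be estimated directly. The exact form is an assumption. Only the coefficient β_ji comes from the text above, while ρ and w_ij are standard spatial-econometrics symbols introduced here for illustration.

```latex
% Unrestricted dependence: pollution in region i depends on pollution
% in every other region j (assumed form).
\[
  y_i = \sum_{j \neq i} \beta_{ji}\, y_j + \varepsilon_i ,
  \qquad i = 1, \dots, n .
\]
% There are n observations but n(n-1) unknown coefficients \beta_{ji},
% so the model has more parameters than data points.
% A common remedy is to impose structure through known weights w_{ij}
% (the weight of region j in the equation for region i):
\[
  \beta_{ji} = \rho\, w_{ij}, \qquad w_{ii} = 0 ,
\]
% which leaves a single dependence parameter \rho to estimate.
```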
Spatial Autocorrelation
• Autocorrelation ➔ the correlation of a variable with itself
• Time series: the value of a variable at time t depends on the value of
the same variable at time t - 1.
• Space: the correlation between the values of a variable at two
different locations
• Definition (Spatial Autocorrelation) ➔ Correlation between the same
attribute at two (or more) different locations
• Coincidence of value similarity with location similarity. Under spatial
dependence, the values of a variable cannot be reassigned to different
locations without changing the information in the sample.
• It can be positive or negative
Positive Autocorrelation

• Observations with high (or low) values of a variable tend to be clustered in space
Negative Autocorrelation
• Locations tend to be surrounded by neighbors having very dissimilar values
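
Positive and negative autocorrelation can be illustrated with Moran's I, a widely used summary statistic that the slides do not name. The sketch below is a minimal pure-Python version that assumes binary contiguity weights on eight hypothetical regions arranged along a line. The function names and the data are invented for illustration.

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights (w_ij = 1 if j is a neighbor of i).

    values    -- list of numbers, one per region
    neighbors -- dict mapping region index -> list of neighbor indices
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of all weights: every listed neighbor link contributes 1.
    s0 = sum(len(neighbors[i]) for i in range(n))
    # Cross-products of deviations over all neighbor pairs.
    cross = sum(dev[i] * dev[j] for i in range(n) for j in neighbors[i])
    total = sum(d * d for d in dev)
    return (n / s0) * (cross / total)


def line_neighbors(n):
    """Hypothetical map: n regions in a row, each touching the next."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


nb = line_neighbors(8)
clustered = [1, 1, 1, 1, 9, 9, 9, 9]    # similar values sit together
alternating = [1, 9, 1, 9, 1, 9, 1, 9]  # neighbors are dissimilar

print(morans_i(clustered, nb))    # positive (about 0.71)
print(morans_i(alternating, nb))  # negative (about -1.0)
```

The clustered pattern gives a positive statistic and the alternating pattern a negative one, matching the two definitions above.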
• Two main sources of spatial autocorrelation (Anselin, 1988):
• Measurement errors.
• Importance of Space.
A Brief Introduction to SAR Models
• The Sp commands in Stata fit models that can include:
1. spatial lags of dependent variables,
2. spatial lags of independent variables, and
3. spatially autoregressive errors.

Linear Regression
A Brief Introduction to SAR Models (continued)
• SAR models extend linear regression by allowing outcomes in one
area to be affected by
1. outcomes in nearby areas,
2. covariates from nearby areas, and
3. errors from nearby areas.
• Said in the spatial jargon, models can contain
1. spatial lags of the outcome variable,
2. spatial lags of covariates, and
3. spatially autoregressive errors
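
The three extensions listed above can be written compactly in matrix form. This is a sketch under assumed notation: the symbols ρ, θ, λ and the use of one weighting matrix W for every term are illustrative choices, and software manuals often use different letters and allow different matrices.

```latex
% Linear regression: outcomes depend only on own covariates and errors.
\[
  y = X\beta + \varepsilon
\]
% SAR model with all three spatial terms (assumed notation):
\[
  y = \rho\, W y + X\beta + W X \theta + u ,
  \qquad
  u = \lambda\, W u + \varepsilon
\]
% \rho W y     : spatial lag of the outcome   (outcomes in nearby areas)
% W X \theta   : spatial lags of covariates   (covariates from nearby areas)
% \lambda W u  : spatially autoregressive errors (errors from nearby areas)
% W is an n x n spatial weighting matrix with zeros on the diagonal.
```

Setting ρ = 0, θ = 0 and λ = 0 recovers the linear regression.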
Preparing Data
Step 1: Translate the shapefile to Stata format
Unzip the file ➔ type “unzipfile file_name.zip”

Unzip the file ➔ type “unzipfile tl_2016_us_county.zip”
Translate the shapefile ➔ type “spshape2dta tl_2016_us_county”


Step 2: Open the data
Command ➔ “use file_name, clear”
Command ➔ “use tl_2016_us_county, clear”

Data view
Step 2.1: View the data in Sp format
Command: “spset”
Step 3: Create an ID variable to use with other data
Type ➔ “generate long fips = real(STATEFP + COUNTYFP)”

1. The variable we created did not have to be numeric, but fips is numeric in
project_cs.dta, and numeric is better for reasons to be explained in step 4.
2. In any case, we were pleased when we listed the value of variable NAME for
fips = 1001 and it was Autauga.
3. We also verify that the new variable fips really does uniquely identify the
observations in tl_2016_us_county.dta by typing

bysort fips: assert _N==1
assert fips != .
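
The logic of step 3 can be sketched outside Stata. The Python fragment below is an illustration only, not part of the Stata workflow. It shows why concatenating the string codes before converting gives 1001 for Autauga, and how the uniqueness check works. The helper name `make_fips` and the three sample rows are assumptions chosen for the example.

```python
def make_fips(statefp, countyfp):
    """Concatenate the two string codes, then convert to an integer.

    Mirrors real(STATEFP + COUNTYFP): "01" + "001" -> "01001" -> 1001.
    """
    return int(statefp + countyfp)


# Hypothetical extract of the county attribute data.
rows = [
    {"STATEFP": "01", "COUNTYFP": "001", "NAME": "Autauga"},
    {"STATEFP": "48", "COUNTYFP": "001", "NAME": "Anderson"},
    {"STATEFP": "48", "COUNTYFP": "201", "NAME": "Harris"},
]

for row in rows:
    row["fips"] = make_fips(row["STATEFP"], row["COUNTYFP"])

fips_values = [row["fips"] for row in rows]

# Same idea as "bysort fips: assert _N==1": no code appears twice.
assert len(set(fips_values)) == len(fips_values)

# Look up the county name for fips = 1001.
name_by_fips = {row["fips"]: row["NAME"] for row in rows}
print(name_by_fips[1001])  # Autauga
```

The county code alone ("001") would not be unique, since both Alabama and Texas have a county 001, which is why the state code is prepended.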
Step 4: Optionally, tell Sp to use the common ID variable

• Command: “spset fips, modify replace”

Note:

• The above resets ID. spset verifies that fips is numeric and would make an
appropriate ID code.
• If it does, spset copies fips to Sp’s ID variable, the variable that
officially identifies the observations.
• Sp then reindexes both tl_2016_us_county.dta and
tl_2016_us_county_shp.dta on the new ID values.
• You should do this step because, if ID is a common code, the spatial
weighting matrices you create will be sharable with other projects and
researchers. The rows and columns of the matrices will be identified by the
common code rather than the arbitrary code ID previously contained.
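Step 5: Set the units of the coordinates, if necessary

For latitude and longitude coordinates this can be done with, for example,
“spset, modify coordsys(latlong, miles)”.

Don’t forget to save (“save, replace”).

We now have two datasets:
tl_2016_us_county.dta
tl_2016_us_county_shp.dta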
Step 6.a: Merge your cross-sectional data with
the Stata-format shapefiles
Data file name ➔ project_cs
Type:
use project_cs, clear
merge 1:1 fips using tl_2016_us_county
keep if _merge==3
drop _merge
save, replace
Turn regular Stata datasets into Sp datasets
Command: spset
Step 6.b: Merge your panel data with the
Stata-format shapefiles
Data file name: project_panel
Commands:
• use project_panel, clear
• xtset fips time
• spbalance
• merge m:1 fips using tl_2016_us_county
• keep if _merge==3
• drop _merge
• save, replace
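
What the many-to-one merge followed by “keep if _merge==3” does can be sketched in plain Python. This is an illustration of the matching logic only, not of Stata's implementation. The function name `merge_m_to_1` and the sample rows, including the unmatched code 99999, are hypothetical.

```python
def merge_m_to_1(panel_rows, county_rows, key="fips"):
    """Attach county attributes to each panel row and keep matches only.

    Many panel rows (one per period) may share one county row. Rows whose
    key is missing from the county data are dropped, which is the
    analogue of keeping only _merge==3.
    """
    lookup = {county[key]: county for county in county_rows}
    matched = []
    for row in panel_rows:
        county = lookup.get(row[key])
        if county is not None:
            # Panel values win if both sides define the same field.
            matched.append({**county, **row})
    return matched


# Hypothetical panel: two periods for one county, plus an unmatched code.
panel = [
    {"fips": 1001, "time": 1, "unemployment": 5.1},
    {"fips": 1001, "time": 2, "unemployment": 4.8},
    {"fips": 99999, "time": 1, "unemployment": 6.0},
]
# Hypothetical county attributes (one row per county).
counties = [
    {"fips": 1001, "NAME": "Autauga"},
    {"fips": 1003, "NAME": "Baldwin"},
]

result = merge_m_to_1(panel, counties)
print(len(result))  # 2
```

Counties that appear only in the shapefile data (here 1003) and panel rows without a county (here 99999) are both absent from the result, so every remaining observation has both data and a location.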
Turn regular Stata datasets into Sp datasets and
xtset datasets
Command 1: spset

Command 2: xtset
Step 7: Merge the analysis data with the shapefile data
Data file name: texas_ue
Open the data: command ➔ “use texas_ue, clear”
Merge: command ➔ “merge 1:1 fips using tl_2016_us_county”
At this point, we type describe again and discover that texas_ue.dta has many
unnecessary, leftover variables from tl_2016_us_county.dta, so we drop them. There is
one variable that we rather like, the names of the counties, so we rename it.

Command :
. rename NAME countyname
. drop STATEFP COUNTYFP COUNTYNS GEOID
. drop NAMELSAD LSAD CLASSFP MTFCC CSAFP
. drop CBSAFP METDIVFP FUNCSTAT
. drop ALAND AWATER INTPTLAT INTPTLON
. save, replace
Merge result
Step 8: Analyze the data
Command: “describe”
Command: “summarize unemployment”
Draw a map:
Command: “grmap unemployment”
