Statistics
Budhi Setiawan, PhD
Types of Spatial Data
Continuous Random Field
Lattice Data
Point Pattern Data
Note: Each type of data is analyzed
differently
Geostatistics
Geostatistical analysis is distinct from
other spatial models in the statistics
literature in that it assumes the region
of study is continuous
•Observations could
be taken at any point
within the study area
•Interpolation at
points in between
observed locations
makes sense
[Figure: 3-D plot of Z against X and Y; X and Y axes run from 0 to 20, Z axis from 0.0 to 0.5]
Spatial Autocorrelation
Spatial modeling is based on the
assumption that observations close in
space tend to covary more strongly
than those far from each other
◦ Positively covary: values tend to be
similar
E.g. elevation (or depth) tends to be similar for
locations close together
◦ Negatively covary: values tend to be
opposite in value
E.g. density of an organism that is highly
spatially clustered, where observations in
between clusters are low and values within
clusters are high
Covariance
Definition: two variables are said to covary
if their correlation coefficient is not zero
$$\operatorname{cov}(X, Y) = \sigma_{X,Y} = E[(X - \mu_X)(Y - \mu_Y)] = \rho\,\sigma_X \sigma_Y$$
where ρ is the correlation coefficient
between X and Y and $\sigma_X$ ($\sigma_Y$) is the
standard deviation of X (Y)
Consider this in the context of a single
variable
◦ E.g. do nearest neighbors have nonzero
covariance?
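As a small illustration of the definition, the sketch below computes the covariance and the correlation coefficient for two short sequences. The function names and the data are hypothetical, and the population form (dividing by n) is an assumption made for simplicity.

```python
import math

def covariance(x, y):
    """Population covariance E[(X - mu_X)(Y - mu_Y)] of two equal-length sequences."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    return sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

def correlation(x, y):
    """Correlation coefficient rho = cov(X, Y) / (sigma_X * sigma_Y)."""
    sigma_x = math.sqrt(covariance(x, x))
    sigma_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sigma_x * sigma_y)

# Hypothetical data: y rises with x, so the two covary positively.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
cov_xy = covariance(x, y)
rho = correlation(x, y)
```

A nonzero `rho` means the two variables covary in the sense of the definition.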
Continuous Data – Geostatistics
Notation
Z(s) is the random process at location s=(x,
y)
z(s) is the observed value of the process at
location s=(x, y)
D is the study region
The sample is the set {z(s) : s ∈ D}. We say
that it is a partial realization of the random
spatial process {Z(s) : s ∈ D}
Conceptual Model
$$Z(s) = \mu(s) + W(s) + \eta(s) + \varepsilon(s)$$
where
µ(s) is the mean structure; called large-scale non-spatial trend
W(s) is a zero-mean, stationary process whose autocorrelation
range is larger than $\min\{\|s_i - s_j\| : i \neq j;\ i, j = 1, 2, \ldots, n\}$; called smooth
small-scale variation
η(s) is a zero-mean, stationary process whose autocorrelation
range is smaller than $\min\{\|s_i - s_j\| : i \neq j;\ i, j = 1, 2, \ldots, n\}$ and which is
independent of W(s); called microscale variation or measurement
error
ε(s) is the random noise term with zero mean and constant
variance and which is independent of W(s) and η(s)
Simpler Conceptual Model
$$Z(s) = \mu(s) + \delta(s) + \varepsilon(s)$$
where
µ(s) is the mean structure; called large-scale non-spatial trend
δ(s) = W(s) + η(s) is a zero-mean, stationary
process with autocorrelation which combines the
smooth small-scale and microscale variation
ε(s) is the random noise term with zero mean and
constant variance which is independent of W(s) and
η(s)
Graphical Concept with Trend
[Figure: Bivariate Fit of Z By X, with legend entries Linear Fit and Fit Each Value]
Red line indicates large-scale
trend
Green line shows how the
data are arranged around the
trend
Note that there is a pattern
to the points around the red
line. The pattern implies
possible positive
autocorrelation in Z(x).
Finally, there is white noise.
Graphical Concept without Trend
Red line indicates a
constant mean, i.e. no large-scale
trend
Green line shows how the
data are arranged around the
trend
Again, the pattern of the
green line implies possible
positive autocorrelation in
RZ(x)
[Figure: Bivariate Fit of RZ By X, with legend entries Linear Fit and Fit Each Value]
Important Point
The model indicates that Z can be
decomposed into large-scale
variation, small + microscale
variation, and noise
The reality is that any estimated
decomposition is not unique
◦ E.g. in the graph just shown, we could
have instead added a sinusoidal aspect to
the large-scale trend and hence captured
much of the apparent autocorrelation
Example
Red line indicates large-scale
trend captured by a
sinusoidal + linear trend
Green line shows how the
data are arranged around the
trend
Note that now there is no
obvious pattern and so the
remaining unexplained
variation is likely white noise
in Z(x).
[Figure: Bivariate Fit of Z By X, with legend entries Smoothing Spline Fit (lambda=1) and Fit Each Value]
Modeling
Ultimately we want to do modeling of
Z using the geostatistical model
$$Z(s) = \mu(s) + \delta(s) + \varepsilon(s)$$
Requires estimates of the model
components
◦ the mean
◦ the small-scale variation and the
covariances among Z values at different
locations
◦ Any “leftovers”, i.e. the unexplained or
residual variability
Important Point
The choice of approach (detailed fit of a
trend vs. large-scale trend + autocorrelation)
to estimating/predicting Z depends strongly
on the reason for and uses of the model
◦ E.g. if you are interested in predicting Z at
unsampled locations within the study area, then
any model that uses covariates to estimate large-scale
trend must also have the covariates known
for the unsampled locations
◦ E.g. if you are interested in understanding the
reasons for the spatial distribution of Z then you
may or may not want to incorporate a spatial
correlation component
Correlation Structure
(Semivariogram)
Now, to assess spatial autocorrelation we look at
the behavior of the following:
$$\gamma_{ij} = \frac{[Z(s_i) - Z(s_j)]^2}{2}$$
for every possible pair of locations in the dataset (N
locations yields N(N−1)/2 pairs).
Correlated: we would expect $Z(s_i)$ to be similar in
value to $Z(s_j)$ and hence the squared difference to
be small.
Independent: we would expect the squared
difference to be relatively large since the two
numbers would vary according to the population
variability.
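The pairwise quantity above can be computed directly. The following is a minimal sketch, assuming 2-D coordinates stored as tuples; the function name `variogram_cloud` and the four-point sample are hypothetical.

```python
import math
from itertools import combinations

def variogram_cloud(coords, values):
    """Return (distance, gamma_ij) for every pair of locations.

    gamma_ij = [Z(s_i) - Z(s_j)]^2 / 2, and N locations give N(N-1)/2 pairs.
    """
    cloud = []
    for i, j in combinations(range(len(coords)), 2):
        dist = math.dist(coords[i], coords[j])      # Euclidean separation
        gamma = (values[i] - values[j]) ** 2 / 2    # half squared difference
        cloud.append((dist, gamma))
    return cloud

# Hypothetical sample: four locations on the corners of a unit square.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 2.0, 4.0]
cloud = variogram_cloud(coords, values)
```

Plotting the second element of each tuple against the first gives the variogram cloud.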
Plot (Variogram Cloud)
[Figure: variogram cloud, γ (gamma) plotted against distance]
Looking for pattern, i.e. is there a trend in γ with respect to
distance between two locations?
Variogram cloud for a dataset of 400 observations
Empirical Variogram
The variogram cloud is usually very
uninformative
◦ Difficult to discern trend or pattern
More pertinent is to calculate the average
values of γ for different distances
◦ Problem is we don't usually have discrete
distances between locations (happens only
when data are on a perfect grid).
◦ A common method for averaging γ at specific
distances is to bin the distances into intervals
(called lag distances), i.e. use all points within
some bin width around a given distance value
Continuous Data – Geostatistics
Because we do not usually have lots of values at
discrete distances, a common method for averaging
the values at discrete distances is to use all points
within some bin width around a given distance value.
So we choose several levels of h (distances) and
calculate the empirical variogram:
$$2\hat{\gamma}(h) = \frac{1}{|N(h)|} \sum_{N(h)} [Z(s) - Z(t)]^2$$
where N(h) is the set of all pairs of locations that are a distance
of h apart within a tolerance region around h, i.e.
$$N(h) = \{(s, t) : \|s - t\| = h \ \text{or}\ \|s - t\| \in \mathrm{tol}(h)\}$$
and |N(h)| is the number of pairs in N(h).
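The binned estimator can be sketched as follows. This is an illustration only: the function name, the symmetric tolerance `tol` around each lag, and the five-point transect are assumptions, not part of the original slides.

```python
import math
from itertools import combinations

def empirical_semivariogram(coords, values, lags, tol):
    """Classical estimator: gamma_hat(h) = sum of squared differences / (2 |N(h)|).

    A pair (s, t) belongs to N(h) when | ||s - t|| - h | <= tol.
    Returns a list of (h, gamma_hat, n_pairs); gamma_hat is None for empty bins.
    """
    pairs = [
        (math.dist(coords[i], coords[j]), (values[i] - values[j]) ** 2)
        for i, j in combinations(range(len(coords)), 2)
    ]
    result = []
    for h in lags:
        sq_diffs = [sq for dist, sq in pairs if abs(dist - h) <= tol]
        n_pairs = len(sq_diffs)
        gamma_hat = sum(sq_diffs) / (2 * n_pairs) if n_pairs else None
        result.append((h, gamma_hat, n_pairs))
    return result

# Hypothetical transect of five equally spaced observations.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
values = [1.0, 2.0, 4.0, 3.0, 5.0]
semivariogram = empirical_semivariogram(coords, values, lags=[1.0, 2.0, 3.0], tol=0.5)
```

Reporting the number of pairs per bin alongside each estimate is useful because bins with few pairs give unstable averages.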
Empirical Semivariogram
[Figure: empirical semivariogram, γ (gamma) versus distance; distance axis from 0 to 12]
This plot is called an
omnidirectional classical
empirical semivariogram
•Omnidirectional because the
direction between the pairs of
locations was ignored,
•Classical because of the
equation used to estimate the
mean (alternatives exist that
are robust to outliers or to
failure of assumptions of the
model)
•Semi because of the division
by 2 in the equation used
Graph based on a set of 20 distance lags
Important Points
The constantly increasing semivariogram
indicates that there is a problem with this
dataset
◦ Ideally, it should level off at some distance at the
variance of the process, implying that beyond that
distance the relationship between two locations is
the same regardless of the distance between
them (i.e. observations are independent at large
distances)
◦ This graph indicates that either
The data imply correlation exists at all distances (and
therefore the study region is small relative to the range
of autocorrelation) or
The data have a large-scale trend which may account
for most of the seeming autocorrelation (small-scale
trend)
Semivariogram
[Figure: empirical semivariogram, γ (gamma) versus distance; distance axis from 0 to 12]
Note the rise and
then leveling off
of the γ(h) values
as distance
increases
We’ll cover shapes
for variograms in
more detail later
Empirical semivariogram for a different dataset in which
there was no large-scale trend but definite autocorrelation
Semivariogram
Note that the γ(h)
values are more
or less the same
regardless of
distance
[Figure: empirical semivariogram, γ (gamma) versus distance; distance axis from 0 to 15]
Empirical semivariogram for a different dataset in which
there was no large-scale trend and no autocorrelation
Important Points
If the empirical semivariogram increases with
distance between locations, then the
correlation between points is decreasing as
distance increases
The point at which it flattens to a constant
value is the distance beyond which any two points
are independent.
The value of γ at that plateau is the variance of the spatial
process
At this point in our analyses, the number of lag
distances you use is not that critical but when
we try to fit a curve to the empirical
semivariogram later the number of lags
becomes very important
Important Point About Directionality
Another point to consider is whether the
pattern of autocorrelation, i.e. the shape of
the curve describing the semivariogram, is
the same in every direction.
◦ Can't tell from the omnidirectional plot.
Need to check if there is a directional effect
Directional Semivariograms
To check directionality in the
covariance, plot γ for each h for
different directions
◦ Modify the sets of locations over
which the averaging occurs
◦ Typically done using a set of binned
directions (wedges of the compass)
Requires that you modify the definition
of neighborhood
$$N(h, \angle) = \{(s, t) : \|s - t\| \in \mathrm{tol}(h),\ \angle \in \mathrm{tol}(\text{angle})\}$$
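One way to build the directional neighborhood is sketched below. The function name is hypothetical, and two conventions are assumptions: directions are measured counterclockwise from the x-axis, and they are folded into [0, 180) because a pair and its reverse lie along the same axis.

```python
import math
from itertools import combinations

def directional_pairs(coords, h, h_tol, angle_deg, angle_tol_deg):
    """Index pairs (i, j) whose separation distance is within h_tol of h and
    whose direction is within angle_tol_deg of angle_deg."""
    selected = []
    for i, j in combinations(range(len(coords)), 2):
        dx = coords[j][0] - coords[i][0]
        dy = coords[j][1] - coords[i][1]
        dist = math.hypot(dx, dy)
        # Direction of the pair, folded into [0, 180) degrees.
        direction = math.degrees(math.atan2(dy, dx)) % 180.0
        # Smallest angular difference on a 180-degree circle.
        diff = abs(direction - angle_deg) % 180.0
        diff = min(diff, 180.0 - diff)
        if abs(dist - h) <= h_tol and diff <= angle_tol_deg:
            selected.append((i, j))
    return selected

# Hypothetical sample: four locations on the corners of a unit square.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
east_west = directional_pairs(coords, h=1.0, h_tol=0.25, angle_deg=0.0, angle_tol_deg=11.25)
north_south = directional_pairs(coords, h=1.0, h_tol=0.25, angle_deg=90.0, angle_tol_deg=11.25)
```

Averaging the half squared differences over each selected set, lag by lag, gives one directional semivariogram per wedge.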
Directional Semivariograms
EXAMPLE:
calculate mean
variability for
the angles 0,
22.5, 45, 67.5,
90, and 112.5°
with a tolerance
of 11.25° on
each side.
[Figure: directional empirical semivariograms, γ (gamma) versus distance, one panel per direction: 0, 22.5, 45, 67.5, 90, and 112.5 degrees]
Need for Assumptions in Order to
Proceed Beyond This Point
The data that are collected are a
partial observation of the spatial
surface (e.g. map) that we are
interested in
In addition, it is usually assumed that
there is some “super process” that
created the particular surface for
which we have this partial view
◦ To estimate the spatial autocorrelation we
need to make some assumptions.
◦ Otherwise, we don't have sufficient
information to make any inferences.
Two Assumptions
Stationarity, specifically second-order
stationarity
Isotropy
Stationarity
The mean of the process is constant, i.e. no trend
µ(s) = µ for all s ∈ D (1)
The covariance between any pair of points
depends only on the distance (and possibly
direction) of the points NOT the location of the
points in space:
$$\operatorname{cov}(Z(s_i), Z(s_j)) = C(s_i - s_j) \quad \forall\, s_i, s_j \in D$$
where C(.) is the covariance function
◦ This implies that the variance of Z is constant everywhere
If both conditions are met then the spatial process we
are studying is said to be second-order
stationary.
Relationship between Semivariogram and Correlation
Assuming intrinsic stationarity, we have
$$E[Z(s + h) - Z(s)] = 0$$
$$\operatorname{Var}[Z(s + h) - Z(s)] = 2\gamma(h)$$
Now, assuming that $\operatorname{Var}[Z(s_1)] = \operatorname{Var}[Z(s_2)] = \sigma^2$,
we have
$$\operatorname{Var}[Z(s_1) - Z(s_2)] = \operatorname{Var}[Z(s_1)] + \operatorname{Var}[Z(s_2)] - 2\operatorname{Cov}[Z(s_1), Z(s_2)] = 2\sigma^2 - 2C(h)$$
where $h = s_1 - s_2$. Thus,
$$\gamma(h) = \sigma^2 - C(h)$$
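The identity γ(h) = σ² − C(h) can be checked numerically. The sketch below is only an illustration: the variance, the covariance at the chosen separation, the Gaussian distribution, and the sample size are all assumed values.

```python
import random

random.seed(0)

sigma2 = 2.0   # assumed common variance of Z(s1) and Z(s2)
cov_h = 1.2    # assumed covariance C(h) at the chosen separation h
n = 200_000    # number of simulated pairs

# Build correlated pairs from a shared component plus independent parts,
# so that Var = cov_h + (sigma2 - cov_h) = sigma2 and Cov = cov_h.
shared_sd = cov_h ** 0.5
own_sd = (sigma2 - cov_h) ** 0.5
total = 0.0
for _ in range(n):
    shared = random.gauss(0.0, shared_sd)
    z1 = shared + random.gauss(0.0, own_sd)
    z2 = shared + random.gauss(0.0, own_sd)
    total += (z1 - z2) ** 2 / 2   # half squared difference for this pair

gamma_estimate = total / n        # Monte Carlo estimate of gamma(h)
gamma_theory = sigma2 - cov_h     # value implied by the identity
```

With these assumed values the two numbers should agree up to simulation error.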
Isotropy
The covariance between any pair of
points does not depend on direction
but only distance
$$\operatorname{cov}(Z(s_i), Z(s_j)) = C(\|s_i - s_j\|) = C(h)$$
If this holds
then the spatial
process is said
to be isotropic
Non-Constant Mean
Two ways to handle a trend when it does
exist:
◦ Detrend the data using regression (or similar) with
covariates and then use the residuals from the
trend analysis for the spatial autocorrelation
analysis
E.g. disease rates as a function of population density
◦ Universal kriging (UK) which allows for estimating
the trend as a global polynomial in s = (x, y) and
estimating the spatial autocorrelation
simultaneously
UK ignores other explanatory covariates which can be
advantageous or not depending on the purpose of your
study
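The first option, detrending by regression, might look like the sketch below for a single covariate. The function name and data are hypothetical, and a straight-line trend fitted by ordinary least squares is an assumption; a real analysis would use whatever covariates and trend form suit the study.

```python
def detrend_linear(x, z):
    """Fit z = a + b*x by ordinary least squares and return (a, b, residuals)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    b = sxz / sxx                 # slope of the large-scale trend
    a = mean_z - b * mean_x       # intercept
    residuals = [zi - (a + b * xi) for xi, zi in zip(x, z)]
    return a, b, residuals

# Hypothetical observations of Z along a covariate x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
z = [1.0, 3.5, 4.5, 7.5, 9.0]
a, b, residuals = detrend_linear(x, z)
```

The residuals, not the raw values, would then be passed to the spatial autocorrelation analysis.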
Non-Constant Variance
To account for heterogeneity (non-constant
variance),
◦ estimate variability in smaller subregions of
the study area
Need to make decisions about the size and extent of
the subregions
Need sufficient numbers of observations within each
subregion
◦ Transform or standardize your data so that the
variability of the transformed values is constant
over the region
Anisotropy
Two types of anisotropy
◦ Geometric
the range over which correlation is nonzero depends
on direction
The variance is constant over all directions
This type can be adjusted for in geostatistical analyses
◦ Zonal
Anything not geometric anisotropy
Anisotropy implies that the spatial process
evolves differentially throughout the study
region
Variography
Fitting a valid semivariogram function
to the empirical semivariogram
Now we are interested in describing
the variogram as an equation in which
variance is a function of the distance.
We shall assume that the spatial
process is second-order stationary
and isotropic in the following.
Semivariogram
We have already seen how to obtain the empirical
variogram of
$$2\gamma(h) = \operatorname{var}(Z(s) - Z(t))$$
γ(h) is the semivariogram and is the primary
quantity of interest because
$$\gamma(h) = C(0) - C(h)$$
Now we are interested in describing the
semivariogram as a function of the distance.
We shall assume that the spatial process is second-order
stationary and isotropic in the following.
Semivariogram
Semivariogram Models have the following
properties:
1) Many are not linear in their parameters
2) Must be “conditionally negative-definite”, i.e. the
function must satisfy
$$\sum_s \sum_t a_s a_t\, 2\gamma(s - t) \le 0$$
for any real numbers $\{a_i\}$ satisfying
$$\sum_i a_i = 0$$
3) If $\gamma(h) \to c_0 > 0$ as $h \to 0$, there is microscale
variation which is assumed to be due to
measurement error (ME) or a process occurring at
the microscale. ME is measurable only if we have
replicate values at each location in the sample.
Semivariogram
Semivariogram Models have the following
properties:
If γ(h) is constant for every h except h = 0 where
γ(0) = 0, then Z(s) and Z(t) are uncorrelated for
any pair of locations s and t
$2\gamma(h)/\|h\|^2 \to 0$ as $\|h\| \to \infty$, i.e. $\|h\|^2$ is
increasing faster than γ(h) as $\|h\|$ increases
[Figure: A Typical Semivariogram, γ versus distance, annotated with the nugget, sill, and range]
Characteristics of the Semivariogram
It is 0 when the separation distance is 0 (γ(0) = 0, since Var[Z(s) − Z(s)] = 0).
Nugget effect: variation in two points very close
together.
May be measurement error
May be indicative of erratic process (gold ore).
The sill corresponds to the overall variance of the
data.
Data separated by distances less than the range
are spatially autocorrelated (less variation
between close observations than between far
observations).
Estimating the Semivariogram
Take all pairwise differences in the data:
$(Z(s_i) - Z(s_j))$, s = (x, y), a point in the 2D plane.
Compute the Euclidean distance between the
spatial locations:
$$\|s_i - s_j\| = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$
Average the squared differences of pairs that have the same distance
class;
“Binning”: like a 2D histogram.
End Result: Empirical Semivariogram
Modeling the Semivariogram
• The semivariogram measures variation among
observations h units apart.
• Note: We do not want negative standard errors.
• So, we model the semivariogram with selected
parametric functions ensuring all standard errors
are nonnegative.
• We estimate the nugget, sill, and range
parameters of the model that best fit the empirical
semivariogram (nonlinear least squares problem).
Selected
semivariogram
models
Covariogram Models
Power Model is simply a
reparameterization of the
exponential model.
Spherical
Model
Exponential Model
Gaussian Model
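The formulas for these models did not survive in the slide text, so the sketch below uses one common textbook parameterization (nugget, partial sill, range parameter). This is an assumption: other sources scale the range differently, for example by a factor of 3 in the exponential model, and the function and parameter names are hypothetical.

```python
import math

def spherical(h, nugget, partial_sill, range_):
    """Spherical model: reaches the sill exactly at h = range_."""
    if h == 0:
        return 0.0
    if h >= range_:
        return nugget + partial_sill
    r = h / range_
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def exponential(h, nugget, partial_sill, range_):
    """Exponential model: approaches the sill asymptotically."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-h / range_))

def gaussian(h, nugget, partial_sill, range_):
    """Gaussian model: parabolic near the origin, asymptotic sill."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-(h / range_) ** 2))

# Hypothetical parameters, evaluated at a few separation distances.
params = dict(nugget=0.1, partial_sill=0.9, range_=5.0)
table = [(h, spherical(h, **params), exponential(h, **params), gaussian(h, **params))
         for h in (0.0, 2.5, 5.0, 10.0)]
```

In this parameterization the sill is the nugget plus the partial sill, and each function returns 0 at h = 0 so that γ(0) = 0 holds even when the nugget is nonzero.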
Covariogram vs. Semivariogram
The covariogram and semivariogram are related:
$$\gamma(h) = C(0) - C(h)$$
The fitted semivariogram model
Estimates: nugget=0.084, sill=0.269, range=110.3 miles
Common methods for fitting these functions to a set of empirical
semivariogram means:
1) Choose the most likely candidate model
2) Estimate the parameters of the model using one of:
◦ nonlinear least squares estimation – allows for the estimation of parameters
that enter the equation nonlinearly but ignores any dependences among the
empirical variogram values
◦ nonlinear weighted least-squares – generalized least squares in which the
variance-covariance of the variogram data points is accounted for in the
estimation procedure
◦ maximum likelihood – assumes the data are Normally distributed, but the
estimators are likely to be highly biased, especially in small samples (the
usual remedy is jackknifing)
◦ restricted maximum likelihood – maximize a slightly altered likelihood function
which reduces the bias of the MLEs
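As a rough illustration of the unweighted least squares criterion, the sketch below picks the parameter triple that minimizes the sum of squared deviations over a small grid. A grid search stands in here for a proper nonlinear optimizer, and the exponential parameterization, the grid, and the synthetic "empirical" values are all assumptions.

```python
import math
from itertools import product

def exponential_model(h, nugget, partial_sill, range_):
    """Assumed exponential semivariogram for h > 0."""
    return nugget + partial_sill * (1.0 - math.exp(-h / range_))

def sum_of_squares(params, lags, gammas):
    """Unweighted least squares criterion for one parameter triple."""
    return sum((g - exponential_model(h, *params)) ** 2 for h, g in zip(lags, gammas))

def fit_by_grid_search(lags, gammas, nuggets, partial_sills, ranges):
    """Return the (nugget, partial_sill, range) triple on the grid with the
    smallest sum of squared deviations from the empirical values."""
    return min(product(nuggets, partial_sills, ranges),
               key=lambda p: sum_of_squares(p, lags, gammas))

# Hypothetical empirical semivariogram generated from known parameters.
true_params = (0.1, 0.5, 2.0)
lags = [0.5, 1.0, 2.0, 4.0, 8.0]
gammas = [exponential_model(h, *true_params) for h in lags]

best = fit_by_grid_search(lags, gammas,
                          nuggets=[0.0, 0.1, 0.2],
                          partial_sills=[0.25, 0.5, 0.75],
                          ranges=[1.0, 2.0, 4.0])
```

Because every lag gets equal weight, this criterion ignores both the number of pairs per bin and the dependence among the empirical values, which is the limitation the weighted and likelihood-based methods address.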
Properties of Variogram
Models
$$\sum_s \sum_t a_s a_t\, 2\gamma(s - t) \le 0 \quad \text{whenever} \quad \sum_i a_i = 0$$
if $\gamma(h) \to c_0 > 0$ as $h \to 0$ then there is microscale
variation
◦ Usually assumed to be due to measurement
error (ME)
◦ ME is measurable only if we have replicate
values at each location in the sample
◦ When fitting a variogram function, may estimate
a nonzero value for $c_0$ even when you do not
have replicate observations at sites. This is
called the nugget.
if γ(h) is constant for every h except h=0
where γ(0) = 0, then $Z(s_i)$ and $Z(s_j)$ are
uncorrelated for any pair of locations $s_i$ and $s_j$
Choosing a “Best” Model
Need to choose the variogram model that
best fits the data
◦ “Best” – minimum unexplained variation after
fitting
Look at a measure of deviance
$$\sum_i [\gamma(h_i) - \hat{\gamma}(h_i)]^2$$
where $\gamma(h_i)$ is the empirical semivariogram for the i-th
lag and $\hat{\gamma}(h_i)$ is the value predicted by the fitted
semivariogram model
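The deviance comparison can be sketched as follows. The empirical values and both candidate models are hypothetical, and the candidates are written with fixed parameters purely for illustration; in practice each would first be fitted to the data.

```python
import math

def deviance(lags, empirical, model):
    """Sum over lags of the squared gap between empirical and model values."""
    return sum((g - model(h)) ** 2 for h, g in zip(lags, empirical))

# Hypothetical empirical semivariogram values at four lags.
lags = [1.0, 2.0, 3.0, 4.0]
empirical = [0.40, 0.65, 0.80, 0.88]

# Two hypothetical candidates with already-chosen parameters.
candidates = {
    "exponential": lambda h: 1.0 - math.exp(-h / 2.0),
    "linear": lambda h: 0.25 * h,
}

deviances = {name: deviance(lags, empirical, model) for name, model in candidates.items()}
best_name = min(deviances, key=deviances.get)
```

The candidate with the smallest deviance leaves the least unexplained variation in the empirical semivariogram.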
Choosing a “Best” Model
In the absence of comparing deviance
(or similar) measures to determine if
the model seems appropriate
◦ Compare fits visually
◦ Use prior knowledge from other studies to
determine
Next Steps
Using the results of the variography to
do statistical modeling of the spatial
process
◦ “kriging”