
GNR402: Introduction to Geographic Information
Systems: Spatial Interpolation - Part 4

Surya S. Durbha, Ph.D.


Professor
CSRE, IIT(B)

Outline

❖ Background
❖ Kriging

"Everything is related to everything else, but near things


are more related than distant things."
- Waldo Tobler (1970)

Kriging
• Assumes the distance or direction between sample points
reflects a spatial correlation that helps describe the
surface
• Fits a function to
• Specified number of points OR
• All points within a window of specified radius
• Based on an analysis of the data, then an application
of the results of this analysis to interpolation
• Most appropriate when you already know about
spatially correlated distance or directional bias in data
• Involves several steps
• Exploratory statistical analysis of data
• Variogram modeling
• Creating the surface based on variogram
Kriging

Breaks up topography into 3 elements: drift (general trend), small
deviations from the drift, and random noise.

To be stepped over
Kriging

• Kriging is a geostatistical and probabilistic method,
unlike the others, which are deterministic.
• That is, there is a probability associated with each prediction.
Kriging has both a deterministic and a probabilistic component:
• Z(s) = µ(s) + ε(s), where µ(s) is the deterministic trend and
ε(s) is the spatially autocorrelated random error at location s
• Assumes spatial variation in the variable is too irregular to be
modeled by a simple smooth function and is better described by a
stochastic surface


Kriging

❖ Interpolation parameters (e.g. weights) are chosen to
optimize the function.
❖ Assumes that variability in space can be modeled as a sum of
three components:
• structure/deterministic part,
• random but spatially correlated part and
• spatially uncorrelated random part

Kriging
• The foundation of Kriging is the notion of spatial autocorrelation
• the tendency of values of entities closer in space to be related.
• Autocorrelation can be assessed using a semivariogram, which
plots half the average squared difference between paired values
(the semivariance) against the distance separating them.

❑ Given a set of features and an associated attribute, Global Moran's I
evaluates whether the pattern expressed is clustered, dispersed, or
random.
❑ When the Z score indicates statistical significance, a Moran's I value near
+1.0 indicates clustering while a value near –1.0 indicates dispersion.
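
As a rough illustration of the statistic described above, the sketch below computes Global Moran's I with NumPy. The function name `morans_i`, the four-location layout, and the binary neighbour weights are assumptions made for this example only; the significance test (Z score) is not included.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for attribute values x_i and spatial weights w_ij."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                          # deviations from the mean
    numerator = (w * np.outer(z, z)).sum()    # sum_ij w_ij * z_i * z_j
    denominator = (z ** 2).sum()              # sum_i z_i^2
    return len(x) / w.sum() * numerator / denominator

# Hypothetical layout: four locations in a row, neighbours share an edge.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

clustered = morans_i([1, 1, 5, 5], w)   # similar values sit side by side
dispersed = morans_i([1, 5, 1, 5], w)   # high and low values alternate
print(clustered, dispersed)
```

The clustered arrangement gives a positive value and the alternating arrangement gives a negative one, which matches the interpretation on the slide.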

Moran’s I

http://gis.esri.com/library/userconf/proc02/pap1064/p106413.gif
Kriging

Kriging is similar to IDW in that it weights the surrounding measured
values to derive a prediction for an unmeasured location. The general
formula for both interpolators is formed as a weighted sum of the
data:

$$\hat{Z}(s_0) = \sum_{i=1}^{N} \lambda_i \, Z(s_i)$$

where:
Z(si) = the measured value at the ith location.
λi = an unknown weight for the measured value at the ith location.
s0 = the prediction location.
N = the number of measured values.
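
To make the weighted sum concrete, here is a minimal sketch that uses inverse-distance weights, since IDW shares the same general form. The function name `idw_predict` and the two sample points are hypothetical; kriging would replace the weights with ones derived from the semivariogram.

```python
import numpy as np

def idw_predict(coords, values, s0, power=2.0):
    """Weighted-sum prediction at s0 using inverse-distance weights."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    d = np.linalg.norm(coords - np.asarray(s0, dtype=float), axis=1)
    if np.any(d == 0):                  # s0 coincides with a measured point
        return float(values[np.argmin(d)])
    lam = 1.0 / d ** power              # raw inverse-distance weights
    lam /= lam.sum()                    # normalise so the weights sum to 1
    return float(np.dot(lam, values))   # sum_i lambda_i * Z(s_i)

coords = [(0.0, 0.0), (2.0, 0.0)]
values = [10.0, 20.0]
midpoint = idw_predict(coords, values, (1.0, 0.0))
print(midpoint)
```

At the midpoint both samples are equally distant, so each receives a weight of 0.5.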

Kriging Weights
❑In ordinary kriging, the weight, λi, depends
on
❑a fitted model to the measured points,
❑ the distance to the prediction location, and
❑ the spatial relationships among the
measured values around the prediction
location.
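
One common way to obtain such weights is to solve the ordinary kriging system written in terms of the semivariogram, with a Lagrange multiplier forcing the weights to sum to 1. The sketch below assumes an exponential semivariogram with no nugget; the function names, sample coordinates, and parameter values are all hypothetical and only illustrate the idea.

```python
import numpy as np

def exponential_semivariogram(h, nugget=0.0, partial_sill=1.0, a=1.0):
    """Assumed model: gamma(0) = 0, otherwise nugget + partial_sill * (1 - exp(-h/a))."""
    h = np.asarray(h, dtype=float)
    g = nugget + partial_sill * (1.0 - np.exp(-h / a))
    return np.where(h == 0, 0.0, g)

def ordinary_kriging(coords, values, s0, gamma):
    """Return (prediction, kriging variance, weights) at location s0."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Semivariances between every pair of measured points.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0                       # corner of the Lagrange row/column
    # Semivariances between each measured point and the prediction location.
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(coords - np.asarray(s0, dtype=float), axis=1))
    sol = np.linalg.solve(A, b)
    lam, mu = sol[:n], sol[n]           # weights and Lagrange multiplier
    return float(lam @ values), float(lam @ b[:n] + mu), lam

coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
values = [1.0, 2.0, 3.0]
pred, var, lam = ordinary_kriging(coords, values, (0.5, 0.5), exponential_semivariogram)
at_sample, var_at_sample, _ = ordinary_kriging(coords, values, (0.0, 0.0), exponential_semivariogram)
print(pred, var, lam)
```

With a zero nugget the predictor reproduces the measured value at a sampled location, and the kriging variance there drops to zero.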

Variance
❑ Variance is a measure of how far a set of numbers is
spread out.

❑ The variance of a random variable or distribution is the
expectation, or mean, of the squared deviation of that
variable from its expected value or mean.

❑ Variance is a measure of the amount of variation of the
values of that variable, taking account of all possible
values and their probabilities or weightings.
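
The definition above can be written out directly. This is a small sketch with a hypothetical helper `variance`; when no probabilities are given it assumes every value is equally likely.

```python
import numpy as np

def variance(values, probabilities=None):
    """Mean squared deviation from the (optionally weighted) mean."""
    x = np.asarray(values, dtype=float)
    if probabilities is None:
        p = np.full(len(x), 1.0 / len(x))   # equal weight for each value
    else:
        p = np.asarray(probabilities, dtype=float)
    mean = np.sum(p * x)                    # expected value
    return float(np.sum(p * (x - mean) ** 2))

equal = variance([2, 4, 4, 4, 5, 5, 7, 9])              # mean 5, variance 4
weighted = variance([0, 10], probabilities=[0.9, 0.1])  # mean 1, variance about 9
print(equal, weighted)
```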

Kriging
▪ Fitting a model, or spatial modeling, is also known as structural
analysis, or variography.
▪ In spatial modeling of the structure of the measured points, you
begin with a graph of the empirical semivariogram, computed
as:

▪ Semivariogram (distance h) =
0.5 * average[ (value at location i - value at location j)^2 ]
for all pairs of locations separated by distance h.


▪ The formula involves calculating the difference squared
between the values of the paired locations.
▪ The image (Next slide) shows the pairing of one point (the red
point) with all other measured locations. This process continues
for each measured point.
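
The pairing step can be sketched as follows: every measured point is paired once with every other point, and each pair contributes its separation distance and half its squared difference. The function name `semivariogram_cloud` and the three sample points are assumptions for illustration.

```python
import numpy as np

def semivariogram_cloud(coords, values):
    """Distance and half squared difference for every unordered pair of points."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)      # each pair (i, j) once, i < j
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    semivar = 0.5 * (values[i] - values[j]) ** 2
    return dist, semivar

coords = [(0, 0), (3, 4), (6, 8)]
values = [1.0, 3.0, 7.0]
dist, semivar = semivariogram_cloud(coords, values)
for d, g in zip(dist, semivar):
    print(f"distance {d:.1f}: half squared difference {g:.1f}")
```

Averaging the second array over pairs with similar distances gives the empirical semivariogram value for that distance.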

Kriging
Semivariogram
Measuring spatial variation

For each pair Z(x) and Z(x+h), separated by a distance h, we
measure the square of the difference between them.

[Figure: sample points paired at vector distance h, direction α]
Kriging
❑ Often each pair of locations has a unique distance, and
there are often many pairs of points. To plot all pairs
quickly becomes unmanageable.
❑ Instead of plotting each pair, the pairs are grouped into
lag bins.
For example, compute the average semivariance for all
pairs of points that are greater than 40 meters apart but
less than 50 meters.
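
A minimal sketch of lag binning, assuming half-open bins (lower edge included, upper edge excluded) and hypothetical sample data along a line. The function name `empirical_semivariogram` and the bin edges are choices made for this example.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Average semivariance per lag bin; bin k covers edges[k] <= d < edges[k+1]."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)      # each pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    edges = np.asarray(bin_edges, dtype=float)
    which = np.digitize(dist, edges) - 1          # bin index for each pair
    n_bins = len(edges) - 1
    gamma = np.full(n_bins, np.nan)               # NaN marks an empty bin
    counts = np.zeros(n_bins, dtype=int)
    for k in range(n_bins):
        in_bin = which == k
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            gamma[k] = 0.5 * sq_diff[in_bin].mean()
    return gamma, counts

# Hypothetical samples 45 m apart along a line.
coords = [(0, 0), (45, 0), (90, 0)]
values = [1.0, 3.0, 7.0]
gamma, counts = empirical_semivariogram(coords, values, bin_edges=[40, 50, 100])
print(gamma, counts)
```

Here the 40 to 50 m bin holds two pairs and the 50 to 100 m bin holds one; pairs falling outside the edges are ignored.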

Kriging
❑ The empirical semivariogram is a graph of the averaged
semivariogram values on the y-axis and the distance
(or lag) on the x-axis (see diagram below).

❑ Thus, pairs of locations that are closer (far left on the x-axis of the
semivariogram cloud) should have more similar values (low on the y-
axis of the semivariogram cloud).
❑ As pairs of locations become farther apart (moving to the right on the
x-axis of the semivariogram cloud), they should become more
dissimilar and have a higher squared difference (moving up on the y-
axis of the semivariogram cloud).

Semi Variance

$$\gamma(h) = \frac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left[ z(x_i + h) - z(x_i) \right]^2$$

h is the distance between ordered data, and n(h) is the
number of paired data at a distance of h.
The semivariance is half the variance of the
increments z(xi + h) − z(xi), but the whole variance of
z-values at given separation distance h (Bachmaier
and Backes, 2008).

SemiVariogram in Kriging
How avg. difference between values at points changes with distance between points

[Figure labels: range ("no more surprises"), sill, nugget]

A semivariogram. Each cross represents a pair of points. The solid circles are obtained by averaging within
the ranges or bins of the distance axis. The solid line represents the best fit to these five points, using one of
a small number of standard mathematical functions.
Variogram
❖ Plots semi-variance against
distance between points
❖ Is binned to simplify
❖ Can be binned based on just
distance (top) or distance
and direction (bottom)
❖ Where autocorrelation exists,
the semivariance should
have slope
❖ Look at variogram to find
where slope levels

[Figure, top: Binning based on distance only]
[Figure, bottom: Binning based on distance and direction]
SemiVariogram

❖ SV value where it flattens out
is called a “sill.”
❖ The distance range for which
there is a slope is called the
“neighborhood”; this is where
there is positive spatial
structure
❖ The intercept is called the
“nugget” and represents
random noise that is spatially
independent

[Figure labels: sill, nugget, range]

Plotting the variogram


Analysing the variogram
❖ Even without a model we can notice some
features, which we define here only
qualitatively:
❖ Sill:
❖ maximum semi-variance;
❖ represents variability in the absence of spatial
dependence
❖ Range:
❖ separation between point-pairs at which the sill is
reached;
❖ distance at which there is no evidence of spatial
dependence
Analysing the variogram

❖ Nugget:
❖ semi-variance as the separation approaches
zero;
❖ represents variability at a point that can’t be
explained by spatial structure.
❖ In the previous slide, we can estimate the sill ≅
1.9, the range ≅ 1200 m, and the nugget ≅ 0.5
i.e. 25% of the sill.
Fitting Theoretical Functions

❖ The next step is to fit a model to the points forming the
empirical semivariogram.
❖ Semivariogram modeling is a key step between
spatial description and spatial prediction.
❖ The main application of kriging is the prediction of
attribute values at unsampled locations.
❖ The empirical semivariogram provides information on
the spatial autocorrelation of datasets. However, it does
not provide information for all possible directions and
distances.
Fitting Theoretical Functions

❑ For this reason, and to ensure that kriging predictions
have positive kriging variances, it is necessary to fit
a model—that is, a continuous function or curve—to
the empirical semivariogram.
❑ Abstractly, this is similar to regression analysis, in
which a continuous line or curve is fitted to the data
points
Fitting Theoretical Functions

❖ After building an experimental variogram, we need
to fit a theoretical function in order to model the
spatial variation.

❖ The adjustment procedure is interactive, where the
user selects the theoretical model that best fits the
data.

❖ Some useful models:
❖ Gaussian, Exponential, Spherical models
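
The fitting step is usually done interactively or with a weighted least-squares routine. As a crude stand-in, the sketch below searches a grid of candidate nugget, partial sill, and range values for a spherical model and keeps the combination with the smallest sum of squared errors. The function names, lag values, and candidate grids are all assumptions; the synthetic data reuse the sill of about 1.9, range of about 1200 m, and nugget of about 0.5 read off the earlier variogram.

```python
import numpy as np

def spherical(h, nugget, partial_sill, a):
    """Spherical semivariogram: 0 at h = 0, flat at nugget + partial_sill beyond range a."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)          # beyond the range the curve stays at the sill
    g = nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)
    return np.where(h == 0, 0.0, g)

def fit_spherical(lags, gamma_hat, nuggets, partial_sills, ranges):
    """Brute-force least squares over candidate parameter values."""
    best, best_sse = None, np.inf
    for c0 in nuggets:
        for c1 in partial_sills:
            for a in ranges:
                sse = np.sum((spherical(lags, c0, c1, a) - gamma_hat) ** 2)
                if sse < best_sse:
                    best, best_sse = (c0, c1, a), sse
    return best, best_sse

# Synthetic "empirical" semivariogram generated from known parameters.
lags = np.array([100.0, 300.0, 500.0, 700.0, 900.0, 1100.0, 1300.0, 1500.0])
gamma_hat = spherical(lags, 0.5, 1.4, 1200.0)

best, sse = fit_spherical(
    lags, gamma_hat,
    nuggets=np.arange(0.0, 1.01, 0.1),
    partial_sills=np.arange(0.5, 2.01, 0.1),
    ranges=np.arange(600.0, 1801.0, 100.0),
)
print(best, sse)
```

Because the data were generated without noise, the search recovers the generating parameters; with real data the fit is only as good as the chosen model form and the candidate grid.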
Fitting the Semivariogram

[Figure: experimental points and fitted theoretical curve of γ(h) against h, with the sill, nugget effect and range marked]

Functional Forms

From Fortin and Dale Spatial Analysis


Spherical Model

$$
\gamma(h) =
\begin{cases}
0, & |h| = 0 \\
C_0 + C_1\left[\dfrac{3}{2}\left(\dfrac{|h|}{a}\right) - \dfrac{1}{2}\left(\dfrac{|h|}{a}\right)^{3}\right] = C_0 + C_1\,\mathrm{Sph}(|h|), & 0 < |h| \le a \\
C_0 + C_1, & |h| > a
\end{cases}
$$

This model shows a progressive decrease of
spatial autocorrelation (equivalently, an
increase of semivariance) until some distance,
beyond which autocorrelation is zero. The
spherical model is one of the most commonly
used models.

[Figure: γ against h, rising from the nugget Co to the sill C = Co + C1 at range a]
Exponential Model

$$
\gamma(h) =
\begin{cases}
0, & |h| = 0 \\
C_0 + C_1\left[1 - \exp\left(-\dfrac{|h|}{a}\right)\right] = C_0 + C_1\,\mathrm{Exp}(|h|), & |h| \ne 0
\end{cases}
$$

[Figure: γ against h, with Co, C1 and a marked]
Gaussian Model

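$$
\gamma(h) =
\begin{cases}
0, & |h| = 0 \\
C_0 + C_1\left[1 - \exp\left(-\left(\dfrac{|h|}{a}\right)^{2}\right)\right] = C_0 + C_1\,\mathrm{Gau}(|h|), & |h| \ne 0
\end{cases}
$$

[Figure: γ against h, with Co, C1 and a marked]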
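
The three functional forms can be compared side by side with a short sketch. The function names and the parameter values (Co = 0.5, C1 = 1.4, a = 1200) are assumptions chosen to echo the earlier worked variogram, and the exponential and Gaussian forms follow the slide's parameterisation without a scaling factor on a.

```python
import numpy as np

def spherical(h, c0, c1, a):
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)                    # flat at c0 + c1 once h > a
    return np.where(h == 0, 0.0, c0 + c1 * (1.5 * r - 0.5 * r ** 3))

def exponential(h, c0, c1, a):
    h = np.asarray(h, dtype=float)
    return np.where(h == 0, 0.0, c0 + c1 * (1.0 - np.exp(-h / a)))

def gaussian(h, c0, c1, a):
    h = np.asarray(h, dtype=float)
    return np.where(h == 0, 0.0, c0 + c1 * (1.0 - np.exp(-(h / a) ** 2)))

c0, c1, a = 0.5, 1.4, 1200.0
h = np.array([0.0, 600.0, 1200.0, 2400.0])
for name, model in [("spherical", spherical),
                    ("exponential", exponential),
                    ("gaussian", gaussian)]:
    print(name, np.round(model(h, c0, c1, a), 3))
```

In this parameterisation the spherical model reaches the sill exactly at h = a, while the exponential and Gaussian models only approach it asymptotically and sit at about 63% of C1 above the nugget when h = a.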
Kriging Method

•We can then use a scatter plot of predicted versus actual values to
see the extent to which our model actually predicts the values
•If the blue line and the points lie along the 1:1 line this indicates
that the kriging model predicts the data well
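
The visual check can be backed by a few numbers. The sketch below assumes the blue line is a least-squares line through the predicted-versus-actual points and reports its slope and intercept along with the root-mean-square error; the function name `one_to_one_summary` and the sample values are hypothetical.

```python
import numpy as np

def one_to_one_summary(actual, predicted):
    """Slope and intercept of the fitted line, plus RMSE of the predictions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(actual, predicted, 1)   # degree-1 fit
    rmse = float(np.sqrt(np.mean((predicted - actual) ** 2)))
    return float(slope), float(intercept), rmse

actual = [1.0, 2.0, 3.0, 4.0]
perfect = one_to_one_summary(actual, [1.0, 2.0, 3.0, 4.0])
smoothed = one_to_one_summary(actual, [1.5, 2.0, 2.5, 3.0])
print(perfect, smoothed)
```

A slope near 1, an intercept near 0, and a small RMSE correspond to points lying along the 1:1 line; a slope well below 1, as in the second case, suggests the predictions are smoothing out high and low values.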

Kriging: Summary

• Semivariograms measure the strength of statistical correlation as
a function of distance; they quantify spatial autocorrelation
•Because Kriging is based on the semivariogram, it is
probabilistic, while IDW and Spline are deterministic
•Kriging associates some probability with each prediction, hence it
provides not just a surface, but some measure of the accuracy of
that surface
•Kriging equations are determined by fitting line through points so
as to minimize weighted sum of squares between points and line
•These equations are weighted based on spatial autocorrelation,
which is determined from the semivariograms

Which Method to Use?

Trend - rarely goes through your original points
Spline - best for surfaces that are already smooth
Elevations, water table heights, etc.
IDW - assumes variable decreases in influence
w/distance from sampled location
Interpolating a surface of consumer purchasing
power for a retail store
Kriging - if you already know correlated distances
or directional bias in data
Geology, soil science
Which to Use? cont.

Kriging - Allows user greater flexibility in defining
the model to be used in the interpolation
Tracks changes in spatial dependence across
study area (may not be linear)
Produces
• a smooth, interpolated surface
• variogram (how well pixel value fits overall
model)
• Want to get variances close as possible to zero
Example

•Here are some sample elevation points from which surfaces were
derived using the three methods
Example: IDW

•Done with P =2.


Example: Kriging
Kriging output: prediction
‘Much of the life of the mind consists in applying concepts to things’
(Fodor 1998:24)
