
5. Modeling the spatial variability
Trend surface modeling
• This approach models the overall distribution of a property
throughout space, sketching the global trend of the distribution as
a simplified surface. Because the method was conceived to model
broad-scale patterns, it cannot detect finer spatial structure.
• Trend surface modelling can be applied in the following geographic
information context:
i. Information can be either in image (raster format) or object
(vector format) form
ii. Properties are estimated at sampled locations, described as
a set of geographical locations
iii. Only spatially continuous distributions can be modelled
iv. Properties must be quantitative
• The principle of a trend surface model is a regression function that
estimates the property value Pi at any location, based on the Xi, Yi
coordinates of this location.
Trend surface modeling
• The general function is $P_i = f(X_i, Y_i)$, where f is the
regression function of the coordinates.
• A trend surface model is a particular case of a bivariate
regression model with two independent variables, the
coordinates X and Y, and a dependent variable, the thematic
variable P to be modelled.
• One can select a linear regression function (first order) or,
if the spatial distribution is more complex, a polynomial
function of 2nd, 3rd, …, or nth order (terms in x, x^2, x^3, …, x^n).
Trend surface modeling
• Simplifies the surface
representation to allow visualization
of general trends.
• Polynomials of increasing order, e.g.,
$z = a + bx + cy$ (1st degree)
$z = a + bx + cy + dx^2 + exy + fy^2$ (2nd degree)
Trend surface modeling in R
• To run trend surface analysis in R, install the following
packages:
install.packages("ape")
install.packages("spdep")
install.packages("ade4")
install.packages("vegan")
install.packages("packfor", repos="http://R-Forge.R-project.org")
install.packages("AEM", repos="http://R-Forge.R-project.org")
install.packages("PCNM", repos="http://R-Forge.R-project.org")
Trend surface modeling in R
• You can visualize the different spatial patterns modelled by
trend-surface analysis using actual and theoretical data.
Theoretical data example:
xygrid <- expand.grid(1:10, 1:10)        # construct a 10 x 10 grid
plot(xygrid)
xygrid.c <- scale(xygrid, scale=FALSE)   # centre the coordinates
plot(xygrid.c)
X <- xygrid.c[,1]   # x coordinate of each sample point
Y <- xygrid.c[,2]   # y coordinate of each sample point
• You can play around with the different functions and see what
spatial pattern they model
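One way to experiment with such functions is sketched below, using only base R. It builds a synthetic property on the centred grid and fits first- and second-order trend surfaces with lm(). The coefficients of the synthetic surface, the noise level and the object names (fit1, fit2, r2) are assumptions chosen for illustration only.

```r
# Centred 10 x 10 grid, rebuilt here so the block is self-contained
xygrid <- expand.grid(x = 1:10, y = 1:10)
xygrid.c <- scale(xygrid, scale = FALSE)
X <- xygrid.c[, 1]
Y <- xygrid.c[, 2]

# Hypothetical property: a second-order surface plus random noise
set.seed(1)
z <- 2 + 0.5 * X - 0.3 * Y + 0.1 * X^2 + 0.05 * X * Y +
  rnorm(length(X), sd = 0.2)

# First-order (plane) and second-order (quadratic) trend surfaces
fit1 <- lm(z ~ X + Y)
fit2 <- lm(z ~ X + Y + I(X^2) + I(X * Y) + I(Y^2))

# Share of variance explained by each surface
r2 <- c(first = summary(fit1)$r.squared,
        second = summary(fit2)$r.squared)
print(round(r2, 3))
```

Calling image(1:10, 1:10, matrix(fitted(fit2), nrow = 10)) afterwards displays the fitted second-order surface on the grid.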
Trend surface modeling
• One can identify 3 stages that are common to most trend
surface modelling methods:
1. In the first step we select the statistically significant
polynomial regression function that best explains the
distribution of sample values. Because this approach aims to
sketch the spatial distribution of properties as a continuous
surface, it is recommended to limit the regression function
to the fifth order at most.
2. Once the regression model has been “calibrated”, the
regression function should be applied to an
independent set of sample points for validation.
3. Finally, the selected regression function describes the
considered trend surface that models the spatial distribution
of properties. This function can then be used to estimate the
property value at any location Xi, Yi within the study area.
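The three stages can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not a prescribed workflow: the data are synthetic, the candidate orders stop at three to keep the block short, AIC is used here as one possible selection criterion, and all object names are hypothetical.

```r
set.seed(42)
# Hypothetical sample: 200 points with a second-order trend plus noise
n <- 200
dat <- data.frame(X = runif(n, -5, 5), Y = runif(n, -5, 5))
dat$P <- 10 + 1.5 * dat$X - 0.8 * dat$Y + 0.3 * dat$X^2 + rnorm(n, sd = 1)

# Calibration set and independent validation set
idx <- sample(n, 150)
calib <- dat[idx, ]
valid <- dat[-idx, ]

# Stage 1: fit candidate trend surfaces (orders 1 to 3) and select one
forms <- list(
  P ~ X + Y,
  P ~ X + Y + I(X^2) + I(X * Y) + I(Y^2),
  P ~ X + Y + I(X^2) + I(X * Y) + I(Y^2) +
    I(X^3) + I(X^2 * Y) + I(X * Y^2) + I(Y^3)
)
fits <- lapply(forms, function(f) lm(f, data = calib))
aic <- sapply(fits, AIC)
best <- fits[[which.min(aic)]]

# Stage 2: validate on the independent points (root mean squared error)
rmse <- sqrt(mean((valid$P - predict(best, newdata = valid))^2))

# Stage 3: estimate the property at any location (Xi, Yi) in the study area
p0 <- predict(best, newdata = data.frame(X = 1, Y = 2))
print(c(order = which.min(aic), rmse = rmse, p0 = unname(p0)))
```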
[Figures: example trend surfaces of 1st, 2nd, 3rd and 4th order; later panels add 5th- and 9th-order surfaces]
Random Fields vs Random variables
• A random field is a random function over an arbitrary domain:
a function f(x) that takes on a random value at each point.
• It is a theoretical model characterized by a mean value and an
autocovariance or variogram model.
• It is sometimes treated as a synonym for a stochastic
process with some restriction on its index set; equivalently, it
can be seen as a generalization of a stochastic process.
• A random variable (RV) is a single random quantity, i.e., a
quantity that varies at a single location.
Random Fields vs Random variables
• Random fields are quantities that vary temporally or spatially.
• They are modelled as a “continuum” of random variables.
• Stochastic processes are very important because they can mimic
numerous natural phenomena.
• To create random fields in R, install and load the RandomFields
package:
install.packages("RandomFields")
library(RandomFields)
RFoptions(seed=0)  ## *ANY* simulation will have the random seed 0;
                   ## set RFoptions(seed=NA) to make them all random again
• First simulate some data (Gaussian random field with exponential
covariance; 6 realisations):
model <- RMexp()
x <- seq(0, 10, 0.1)
z <- RFsimulate(model, x, x, n=6)  # call assumed: the slides use z without defining it
Random Fields vs Random variables
• select some data from the simulated data
xy <- coordinates(z)
pts <- sample(nrow(xy), min(100, nrow(xy) / 2))
dta <- matrix(nrow=nrow(xy), as.vector(z))[pts, ]
dta <- cbind(xy[pts, ], dta)
plot(z, dta)
• re-estimate the parameter (true values are 1)
estmodel <- RMexp(var=NA, scale=NA)
(fit <- RFfit(estmodel, data=dta))
• show a kriged field based on the estimated parameters
kriged <- RFinterpolate(fit, x, x, data=dta)
plot(kriged, dta)
Random Fields vs Random variables
Deterministic Vs probabilistic modelling
• Most statistical models comprise a deterministic model and a
stochastic model.
• The deterministic part is the average, or expected pattern in
the absence of any kind of randomness or measurement
error (i.e., stochasticity).
• The deterministic model can be phenomenological (i.e.,
relationship based on the observed patterns in the data),
mechanistic (i.e., relationship based on underlying theory), or
even a complex individual-based simulation model.
• In a deterministic model, given the input data and parameter
values, the model determines exactly the output, such that
we always get the same result.
• If the deterministic model perfectly described the
environmental system under consideration, we would be able
to predict the value of the dependent variable (y) exactly.
Deterministic Vs probabilistic modelling
• In contrast, a probabilistic method or model is based on the
theory of probability, i.e., on the fact that randomness plays a
role in predicting future events.
• In a stochastic model, given the input data and parameter
values, the model gives variable output, so that repeated runs
generally give different results due to randomness.
• The stochastic model describes the error in our ability to
predict the dependent variable for a particular input.
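The contrast can be illustrated with a small base-R sketch. The linear form y = a + b x, the parameter values and the normally distributed error are assumptions made for the example, not something the slides prescribe.

```r
# Hypothetical deterministic model: y = a + b * x
deterministic <- function(x, a = 2, b = 0.5) a + b * x

# Stochastic version: the deterministic part plus a random error term
stochastic <- function(x, a = 2, b = 0.5, sd = 1) {
  deterministic(x, a, b) + rnorm(length(x), sd = sd)
}

x <- 1:5
d1 <- deterministic(x)
d2 <- deterministic(x)
s1 <- stochastic(x)
s2 <- stochastic(x)

# Same inputs and parameters: the deterministic output never changes,
# while two stochastic runs differ because of the random error
print(identical(d1, d2))
print(identical(s1, s2))
```

Averaging many stochastic runs would recover the deterministic part, which is why it is described as the expected pattern in the absence of randomness.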
Deterministic Vs probabilistic modelling
Model choice
• Ideally the model choice is made a priori, before you
have looked at the data, but there will be many times
where an initial examination of the data will provide
important insights on the adequacy of a particular model
and suggest a different model or perhaps competing
models.
• Time spent carefully considering the right model for the
question, given the data, is time well worth spending, as
any inferences made are going to be contingent on the
model, as are any insights gained from the study.
Deterministic Vs probabilistic modelling
Stationarity
• Unlike deterministic approaches, geostatistics
assumes that all values in your study area are the result
of a random process.
• But a random process does not mean that all events are
independent, as with each flip of a coin.
• Geostatistics is based on random processes with
dependence.
Stationarity
• Statistics relies on some notion of replication: estimates
can be derived, and the variation and uncertainty of the
estimate can be understood, from repeated observations.
• In a spatial setting, the idea of stationarity is used to obtain
the necessary replication.
• There are two types of stationarity:
1. Mean stationarity: the mean is constant between samples and is
independent of location.
2. Second-order stationarity for covariance and intrinsic
stationarity for semivariograms: the assumption that the
covariance/semivariogram is the same between any two points
that are the same distance and direction apart, no matter which
two points you choose. It depends on the distance between any
two values but not on their location.
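In symbols, these assumptions are commonly written as below. The notation is an assumption introduced here for illustration: Z(s) is the random field at location s, h is the separation vector between two locations, C is the covariance function and γ is the semivariogram.

```latex
\begin{aligned}
\text{Mean stationarity:} \quad & E[Z(s)] = \mu \quad \text{for all locations } s \\
\text{Second-order stationarity:} \quad & \operatorname{Cov}[Z(s), Z(s+h)] = C(h) \\
\text{Intrinsic stationarity:} \quad & \tfrac{1}{2}\operatorname{Var}[Z(s+h) - Z(s)] = \gamma(h)
\end{aligned}
```

When second-order stationarity holds, the two functions are linked by $\gamma(h) = C(0) - C(h)$, so either one can be used to describe the spatial dependence.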
Stationarity
• Second-order and intrinsic stationarity are the assumptions
needed to obtain the replication required to estimate the
dependence rules, which allow us to make predictions and
assess the uncertainty in those predictions.
• Therefore, the semivariogram and covariance are computed in
order to make predictions and assess uncertainty.
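As a rough sketch of how the stationarity assumption provides replication, the base-R code below pools all pairs of points by separation distance and averages half their squared differences, giving an empirical (omnidirectional) semivariogram. The synthetic data, the 10 lags of width 5 and the object names are assumptions for illustration; dedicated packages offer more complete implementations.

```r
set.seed(1)
# Hypothetical data: 100 random locations with a smooth spatial signal plus noise
n <- 100
xy <- cbind(x = runif(n, 0, 100), y = runif(n, 0, 100))
z <- sin(xy[, "x"] / 20) + cos(xy[, "y"] / 20) + rnorm(n, sd = 0.1)

# All pairwise separation distances and half squared differences
d <- as.vector(dist(xy))
g <- as.vector(dist(z))^2 / 2

# Pool pairs into distance classes (lags) and average within each class;
# pairs further apart than 50 units are left out
breaks <- seq(0, 50, by = 5)
bin <- cut(d, breaks)
emp <- data.frame(
  lag   = as.vector(tapply(d, bin, mean)),
  gamma = as.vector(tapply(g, bin, mean)),
  pairs = as.vector(table(bin))
)
print(emp)
```

Each row of emp treats every pair of points in the same distance class as a repeated observation of the same dependence, which is only justified if the semivariogram does not depend on location.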
6. Spatial prediction
Taxonomy of prediction methods
• The goal of geostatistics is to predict the possible spatial
distribution of a (quantitative) property at unvisited sites
within the area covered by existing observations by
applying a prediction algorithm.
• The output often takes the form of a map (estimation) or
a series of maps (simulation).
• Spatial prediction models (algorithms) can be classified
based on several aspects.
Taxonomy of prediction methods
• Most importantly, they can be classified according to the
amount of statistical analysis included:
1. Mechanical/empirical models:
– These are models where arbitrary or empirical model
parameters are used.
– No estimate of the model error is available and usually no
strict assumptions about the variability of a feature exist.
– The best-known techniques that belong to this group are:
a) Thiessen polygons;
b) Inverse distance interpolation;
c) Regression on coordinates;
d) Splines and others;
Taxonomy of prediction methods
2. Statistical (probability) models:
– The model parameters are commonly estimated in an
objective way, following the probability theory.
– The predictions are accompanied with the estimate of
the prediction error.
• There are at least four groups of statistical
models:
a) kriging (plain geostatistics);
b) environmental correlation (e.g. regression-based);
c) Bayesian-based models (e.g. Predicting from polygon
maps);
d) mixed models (regression-kriging);
1. Mechanical spatial prediction models
• Mechanical spatial prediction models can be very flexible
and easy to use.
• They can be considered subjective or empirical
techniques because the user selects the
parameters of the model, often without any deeper
statistical analysis. Most commonly, a user
accepts the default parameters suggested by some
software, hence the name mechanical models.
• In general, mechanical prediction models are more
primitive than statistical models and often
sub-optimal; however, there are situations where they can
perform as well as the statistical models (or better).
1. Mechanical spatial prediction models
a) Inverse distance interpolation:
– Each input point has local influence that diminishes with
distance
– Estimates are weighted averages of the values at the n known
points within the window:
$\hat{z}(x) = \dfrac{\sum_i w_i z_i}{\sum_i w_i}, \qquad w_i = \dfrac{1}{d_i^2}$
– where $w_i$ is some function of the distance $d_i$ and
declines with distance
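A minimal base-R sketch of this estimator follows. The function name idw, the sample coordinates and values, and the choice to use all points rather than a search window are assumptions made for the example.

```r
# Inverse distance weighted estimate at (x0, y0) from samples (x, y, z)
idw <- function(x0, y0, x, y, z, power = 2) {
  d <- sqrt((x - x0)^2 + (y - y0)^2)
  # At a sampled location return the observed value (avoids division by zero)
  if (any(d == 0)) return(z[which(d == 0)[1]])
  w <- 1 / d^power          # weights decline with distance
  sum(w * z) / sum(w)       # weighted average
}

# Hypothetical sample points and values
x <- c(1, 3, 5, 7, 9, 4)
y <- c(2, 8, 5, 1, 6, 4)
z <- c(10, 14, 20, 12, 16, 18)

z0 <- idw(5, 4, x, y, z)
print(z0)
```

Because the weights are positive and normalised by their sum, the estimate is always a convex combination of the observed values and so stays within their range.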
1. Mechanical spatial prediction models
• IDW is popular and easy to use, but it is not a panacea.
• Interpolated values are limited by the range of the data:
no interpolated value will fall outside the observed range of
z values.
• How many points should be included in the averaging?
• What to do about irregularly spaced points?

This set of six data points clearly suggests a hill
profile. But in areas where there is little or no data the
interpolator will move towards the overall mean.
The blue line shows the profile interpolated by IDW.
1. Mechanical spatial prediction models
b) Regression on coordinates
• Assume that the values of the target variable at some
location are a function of the coordinates.
• We can then determine its values by finding a function which
passes through (or close to) the given set of discrete
points.
• Regression on coordinates is based on the model
$Z(s) = f(x, y) + \varepsilon$, and the predictions are made by:
$\hat{z}(s_0) = \sum_{r,s \in n} a_{rs} \cdot x^r y^s$
where $r + s \le p$ is the number of transformations of
coordinates and p is the order of the surface.
• Regression on coordinates can be criticized for not relying
on empirical knowledge about the variation of a variable.
1. Mechanical spatial prediction models
c) Splines
• A spline is a special type of piecewise polynomial. Splines
are preferable to simple polynomial interpolation because
more parameters can be defined, including the amount of
smoothing.
• The smoothing spline function also assumes that there is a
(measurement) error in the data that needs to be
smoothed locally. The predictions are made by:
$\hat{z}(s_0) = a_1 + \sum_{i=1}^{n} w_i \cdot R(\upsilon_i)$
where $a_1$ is a constant and $R(\upsilon_i)$ is the radial basis function
determined using (Mitášová and Mitas, 1993).
• Splines have been shown to be highly suitable for interpolation
of densely sampled heights and climatic variables.
2. Statistical spatial prediction models
• In the case of statistical models, coefficients/rules used to
derive outputs are derived in an objective way following
the theory of probability.
• Unlike mechanical models, in the case of statistical
models, we need to follow several statistical data analysis
steps before we can generate maps.
• This makes the whole mapping process more complicated
but it eventually helps us:
(a) produce more reliable/objective maps,
(b) understand the sources of errors in the data and
(c) depict problematic areas/points that need to be
revisited.
2. Statistical spatial prediction models
a) kriging
• Kriging is named after the South African engineer, D. G.
Krige, who first developed the method.
• Here the predictions are based on the model:
$Z(s) = \mu + \varepsilon'(s)$
where µ is the constant stationary function (global mean)
and ε'(s) is the spatially correlated stochastic part of
variation. The predictions are made as:
$\hat{z}_{OK}(s_0) = \sum_{i=1}^{n} w_i(s_0) \cdot z(s_i) = \lambda_0^{T} \cdot \mathbf{z}$
• where λ0 is the vector of kriging weights (wi) and z is the
vector of n observations at primary locations.
• Kriging can be seen as a sophistication of inverse
distance interpolation.
2. Statistical spatial prediction models
• The key problem of inverse distance interpolation is to
determine how much importance should be given to each
neighbour. Intuitively thinking, there should be a way to
estimate the weights in an objective way, so the weights
reflect the true spatial autocorrelation structure.
• Kriging uses the semivariogram to calculate estimates of
the surface at the grid nodes.
• Based on the semivariogram used, optimal weights are
assigned to known values in order to calculate unknown
ones. Since the variogram changes with distance, the
weights depend on the known sample distribution.
2. Statistical spatial prediction models
b) Environmental correlation
• If some exhaustively-sampled auxiliary variables or
covariates are available in the area of interest and if they are
significantly correlated with our target variable (spatial cross
correlation), and assuming that the point-values are not
spatially auto-correlated, predictions can be obtained by
focusing only on the deterministic part of variation:
Z(s) = f {qk(s)} + ε
where qk are the auxiliary predictors that can be used to
explain the deterministic part of spatial variation.
• Predictors which are available over entire areas of interest
can be used to predict the value of an environmental variable
at unvisited locations — first by modelling the relationship
between the target and auxiliary predictors at sample
locations, and then by applying it to unvisited locations.
2. Statistical spatial prediction models
• There are (at least) three groups of statistical models that have
been used to make spatial predictions with the help of
environmental factors.
i. Classification-based models — Classification models are
primarily developed and used when we are dealing with discrete
target variables (e.g. land cover or soil types).
ii. Tree-based models —They are fitted by successively splitting a
dataset into increasingly homogeneous groupings. Output from
the model fitting process is a decision tree, which can then be
applied to make predictions.
iii. Regression models — Regression analysis employs a family of
functions called Generalized Linear or non-linear Models, which
all assume a linear/non-linear relationship between the inputs and
outputs.
• Each of the models listed above can be equally applicable for
mapping of environmental variables and can exhibit advantages
and disadvantages.
2. Statistical spatial prediction models
c) Predicting from polygon maps
• A special case of environmental correlation is prediction from
polygon maps, i.e. stratified areas (different land use/cover
types, geological units, etc.).
• Assuming that the residuals show no spatial autocorrelation,
a value at a new location can be predicted by:
$\hat{z}(s_0) = \sum_{i=1}^{n} w_i \cdot z(s_i), \qquad w_i = 1/n_k \text{ for } s_i \in k, \; 0 \text{ otherwise}$
where k is the unit identifier and $n_k$ is the number of sample
points in unit k. This means that the weights
within some unit will be equal, so that the predictions are
made by simple averaging per unit.
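Averaging per unit can be sketched in base R with tapply(). The unit names and sample values below are hypothetical, and the unit of each new location is assumed to be read from the polygon map.

```r
# Hypothetical samples: mapping unit of each point and the measured value
samples <- data.frame(
  unit = c("forest", "forest", "crop", "crop", "crop", "urban"),
  z    = c(4.1, 3.9, 2.0, 2.4, 2.2, 6.5)
)

# Equal weights within a unit amount to the mean per unit
unit_mean <- tapply(samples$z, samples$unit, mean)

# Predict at new locations whose unit is known from the polygon map
new_units <- c("crop", "forest", "urban")
pred <- as.vector(unit_mean[new_units])
print(data.frame(unit = new_units, prediction = pred))
```

Every location inside the same polygon receives the same prediction, so the resulting map changes only at unit boundaries.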
2. Statistical spatial prediction models
d) Mixed or hybrid models
• Mixed or hybrid spatial prediction models comprise a
combination of the techniques listed previously.
• For example, a mixed geostatistical model employs both
correlation with auxiliary predictors and spatial
autocorrelation simultaneously. There are two main
sub-groups of mixed geostatistical models: (a) co-kriging-based
and (b) regression kriging-based techniques.
• Mixed models are more generic and can be used to
represent both discrete and continuous changes in the
space, both deterministic and stochastic processes.
Taxonomy of prediction methods
• Spatial Interpolation techniques can be also classified as
i. Non-geostatistical Methods
ii. Geostatistical Methods (kriging)
iii. Combined Method
• Geostatistical methods consider the spatial
correlation of distance and direction between
sample points
Taxonomy of prediction methods
• Strata: divide the area to be mapped into ‘homogeneous’
strata; predict within each stratum from all samples in that
stratum
• Global predictors: use all samples to predict at all points;
also called regional predictors
• Local predictors: use only ‘nearby’ samples to predict at
each point
• Mixed predictors: some of the structure is explained by
strata or globally, some locally
Introduction to ordinary kriging
• Ordinary kriging is the simplest form of kriging.
• It uses dimensionless points to estimate other
dimensionless points, e.g. elevation contour plots.
• In ordinary kriging, the regionalized variable is assumed to
be stationary.
• Prediction is made as a linear combination of known data
values (a weighted average).
• Prediction is unbiased and exact at known points.
• Points closer to the point to be predicted have larger weights.
• Clusters of points “reduce to” single equivalent points, i.e.,
over-sampling in a small area can’t bias the result.
• Closer sample points “mask” further ones in the same
direction.
Introduction to ordinary kriging (OK)
• We model the value of variable z at location $s_i$ as the sum
of a regional mean m and a spatially correlated random
component $e(s_i)$:
$z(s_i) = m + e(s_i)$
• The regional mean m is estimated from the sample, but not
as the simple average, because there is spatial
dependence. It is implicit in the OK system.
• Predict at points, with unknown mean (which must also be
estimated) and no trend.
• Each point $x_0$ is predicted as the weighted average of the
values at all sample points:
$\hat{z}(x_0) = \sum_{i=1}^{N} \lambda_i z(x_i)$
• The weights $\lambda_i$ assigned to each sample point sum to 1:
$\sum_{i=1}^{N} \lambda_i = 1$
• Therefore, the prediction is unbiased:
$E[\hat{z}(x_0) - z(x_0)] = 0$
• “Ordinary”: no trend or strata; regional mean must be
estimated from sample
Introduction to ordinary kriging (OK)
• The kriging system is solved using the modeled semi-
variances
• Different models will give different kriging weights to the
sample points and these will give different predictions
• Conclusion: bad model leads to bad predictions
Derivation of the kriging equations
• Kriging tries to choose the optimal weights that produce the
minimum estimation error.
• Optimal weights, those that produce unbiased estimates
and have a minimum estimation variance, are obtained by
solving a set of simultaneous equations (shown here for three
sample points and a prediction point p):
$$
\begin{aligned}
w_1\gamma(h_{11}) + w_2\gamma(h_{12}) + w_3\gamma(h_{13}) &= \gamma(h_{1p}) \\
w_1\gamma(h_{21}) + w_2\gamma(h_{22}) + w_3\gamma(h_{23}) &= \gamma(h_{2p}) \\
w_1\gamma(h_{31}) + w_2\gamma(h_{32}) + w_3\gamma(h_{33}) &= \gamma(h_{3p})
\end{aligned}
$$
Derivation of the kriging equations
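• A fourth variable is introduced, called the Lagrange
multiplier $\lambda$, together with the constraint that the weights sum to 1:
$$
\begin{aligned}
w_1\gamma(h_{11}) + w_2\gamma(h_{12}) + w_3\gamma(h_{13}) + \lambda &= \gamma(h_{1p}) \\
w_1\gamma(h_{21}) + w_2\gamma(h_{22}) + w_3\gamma(h_{23}) + \lambda &= \gamma(h_{2p}) \\
w_1\gamma(h_{31}) + w_2\gamma(h_{32}) + w_3\gamma(h_{33}) + \lambda &= \gamma(h_{3p}) \\
w_1 + w_2 + w_3 &= 1
\end{aligned}
$$
In matrix form:
$$
\begin{bmatrix}
\gamma(h_{11}) & \gamma(h_{12}) & \gamma(h_{13}) & 1 \\
\gamma(h_{21}) & \gamma(h_{22}) & \gamma(h_{23}) & 1 \\
\gamma(h_{31}) & \gamma(h_{32}) & \gamma(h_{33}) & 1 \\
1 & 1 & 1 & 0
\end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ \lambda \end{bmatrix}
=
\begin{bmatrix} \gamma(h_{1p}) \\ \gamma(h_{2p}) \\ \gamma(h_{3p}) \\ 1 \end{bmatrix}
$$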
Derivation of the kriging equations
• Once the individual weights are known, an estimate using
OK can be made by
$z_e(p) = w_1 z_1 + w_2 z_2 + w_3 z_3$
• And an estimation variance can be calculated by
$\sigma_z^2 = w_1\gamma(h_{1p}) + w_2\gamma(h_{2p}) + w_3\gamma(h_{3p}) + \lambda$
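The three-point system can be solved numerically with base R's solve(). The sketch below is an illustration under assumptions: the spherical semivariogram model and its parameters (nugget 0, sill 1, range 10), the sample coordinates and the values are all hypothetical.

```r
# Hypothetical spherical semivariogram model (sill is the total sill)
gamma_sph <- function(h, nugget = 0, sill = 1, range = 10) {
  ifelse(h == 0, 0,
    ifelse(h < range,
      nugget + (sill - nugget) * (1.5 * h / range - 0.5 * (h / range)^3),
      sill))
}

# Three hypothetical sample points, their values, and the prediction point p
pts <- cbind(x = c(1, 4, 6), y = c(1, 5, 2))
z <- c(10, 14, 12)
p <- c(x = 3, y = 3)

# Distances between samples (h_ij) and from each sample to p (h_ip)
H <- as.matrix(dist(pts))
h_p <- sqrt((pts[, "x"] - p["x"])^2 + (pts[, "y"] - p["y"])^2)

# Augmented system: semivariances bordered by ones, with 0 in the corner
A <- rbind(cbind(gamma_sph(H), 1), c(1, 1, 1, 0))
b <- c(gamma_sph(h_p), 1)
sol <- solve(A, b)
w <- sol[1:3]        # kriging weights w1, w2, w3
lambda <- sol[4]     # Lagrange multiplier

z_hat <- sum(w * z)                          # OK estimate at p
var_hat <- sum(w * gamma_sph(h_p)) + lambda  # estimation variance at p
print(c(w, lambda = lambda, z_hat = z_hat, var_hat = var_hat))
```

Changing the semivariogram parameters changes A and b, and therefore the weights, which is why the choice of model matters for the predictions.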
Block kriging (BK)
• Often we want to predict in blocks of some defined size,
not at points.
• Block kriging (BK) is quite similar in form to OK, but the
estimation variances are lower.
• Estimate at blocks of a defined size, with unknown mean
(which must also be estimated) and no trend
• Each block B is estimated as the weighted average of the
values at all sample points $x_i$:
$\hat{z}(B) = \sum_{i=1}^{N} \lambda_i z(x_i)$
• As with OK, the weights $\lambda_i$ sum to 1, so that the estimator is
unbiased.
Universal kriging
• Also known as Kriging with drift.
• It recognizes both non-stationary deterministic and
random components in a variable, estimates the trend in the
former and the variogram of the latter, and recombines the
two for prediction.
• This introduces residual maximum likelihood into the
kriging procedure.
• OK and BK are for realizations of stationary processes,
meaning that they assume a constant mean µ.
• Universal kriging, in contrast, is for spatial processes that
include a trend, or ‘drift’, and are therefore not stationary
in the mean.
7. Feature space modeling accounting for secondary information
Exhaustive secondary information
• There is much that can be gained by including process
knowledge in spatial interpolation
• The advantage of exploiting process knowledge is not only
that we (potentially) get more accurate maps, but also that
we get a better understanding of how the real world works.
• Secondary information can be included in prediction by using:
a) Kriging within strata
• ordinary kriging is entirely based on the observations and
does not make use of any additional information (which is
often available)
• perhaps we can do better by incorporating the additional
information (explanatory data as well as knowledge about
physical processes that caused the spatial variation)
Exhaustive secondary information
[Figure: comparison of maps from OK and from kriging within strata]
Exhaustive secondary information
b) Simple kriging
• In OK we must estimate the regional mean along with the
predicted values, in one OK system.
• In UK or kriging with external drift we must estimate the
coefficients, along with the predicted values, in one UK
system.
• However, there may be situations where the regional mean
is known. Then we can use so-called Simple Kriging (SK)
• Similarly, if the trend is known, we can use “Simple”
variants of UK and KED.
Exhaustive secondary information
c) Kriging with External Drift (KED): includes feature-
space predictors that are not geographic coordinates.
• Also known as regression kriging.
• If the deterministic part of variation (drift) is defined
externally as a linear function of some auxiliary variables,
rather than the coordinates, the term Kriging with External
Drift (KED) is preferred.
• The predictions are made as with kriging, with the
difference that the covariance matrix of residuals is
extended with the auxiliary predictors qk(si)’s.
• However, the drift and residuals can also be estimated
separately and then summed.
Better sampled secondary information
1. Cokriging
• Cokriging uses information on several variable types.
• The main variable of interest is Zi, and all other variable
types are used to make better predictions.
• Cokriging requires much more estimation, which includes
estimating the autocorrelation for each variable as well as
cross-correlations.
• Therefore, cokriging uses either semivariograms or
covariances (correlograms) and cross-covariances
(cross-correlations) to make better predictions.
• If there is no cross-correlation, you can fall back on just
autocorrelation for Zi.
Better sampled secondary information
2. Linear model of coregionalization (LMC)
• A model for semivariograms/covariances and
cross-covariances formed by taking a linear combination of
component semivariogram/covariance models.
• The LMC combines a linear model for different scales of the
spatial variation with a linear model for components
of the multivariate variation.
• This model is used for cokriging methods.
Better sampled secondary information
3. Markov model
• A Markov model is a finite state machine with N distinct
states that begins (at time t = 1) in an initial state.
• It moves from the current state to the next state according to the
transition probabilities associated with the current state.
• This kind of system is called a finite or discrete Markov
model.
• Markov property: the current state of the system
depends only on the previous state of the system.
• That is, the state of the system at time t + 1 depends only on the
state of the system at time t.
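A finite Markov model of this kind can be simulated in a few lines of base R. The three states, the transition probabilities and the function name simulate_chain are hypothetical values chosen for the sketch.

```r
set.seed(123)
states <- c("A", "B", "C")   # N = 3 hypothetical states

# Transition matrix: row = current state, column = next state
P <- matrix(c(0.7, 0.2, 0.1,
              0.3, 0.5, 0.2,
              0.2, 0.3, 0.5),
            nrow = 3, byrow = TRUE, dimnames = list(states, states))

# Simulate a chain: each step is drawn using only the current state's row
simulate_chain <- function(P, start, steps) {
  chain <- character(steps)
  chain[1] <- start
  for (t in seq_len(steps - 1)) {
    chain[t + 1] <- sample(colnames(P), 1, prob = P[chain[t], ])
  }
  chain
}

chain <- simulate_chain(P, start = "A", steps = 10)
print(chain)
```

The loop never looks further back than chain[t], which is the Markov property expressed in code.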

[Figure: sequence of states Xt=1, Xt=2, Xt=3, Xt=4, Xt=5]

Advanced topics
1. Nested semivariogram models
• A model that is the sum of two or more component models,
such as nugget, spherical, etc.
• Adding a nugget component to one of the other models is
the most common nested model, but more complex
combinations are occasionally used.
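The most common nested model, a nugget plus a spherical component, can be written as a sum of two functions in base R. The parameter values (nugget 0.2, partial sill 0.8, range 500) and the function names are assumptions for illustration.

```r
# Nugget component: zero at h = 0, constant c0 for any h > 0
nugget_model <- function(h, c0) ifelse(h > 0, c0, 0)

# Spherical component with partial sill c1 and range a
spherical_model <- function(h, c1, a) {
  ifelse(h < a, c1 * (1.5 * h / a - 0.5 * (h / a)^3), c1)
}

# Nested model: the sum of the component models
nested_model <- function(h, c0 = 0.2, c1 = 0.8, a = 500) {
  nugget_model(h, c0) + spherical_model(h, c1, a)
}

h <- c(0, 1, 250, 500, 1000)
g <- nested_model(h)
print(data.frame(h = h, gamma = g))
```

Beyond the range the nested model levels off at the total sill, here c0 + c1 = 1. Further components can be nested by adding more terms to the sum.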
Advanced topics
2. Geographically weighted regression (GWR)
• The basic idea behind GWR is to explore how the
relationship between a dependent variable (Y) and one or
more independent variables (the Xs) might vary
geographically.
• Instead of assuming that a single model can be fitted to the
entire study region, it looks for geographical differences.
• GWR works by moving a search window from one point
in a data set to the next, working through them all in
sequence.
• At each position, a regression model is fitted to the subset of
the data inside the window, giving most weight to the points that
are closest to the one at the centre.
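The idea can be sketched in base R by fitting a weighted lm() at every data point. This is a simplified illustration, not a full GWR implementation: it assumes a Gaussian distance kernel with a fixed bandwidth in place of a hard-edged search window, and the data, bandwidth and names (gwr_at, local_coef) are hypothetical.

```r
set.seed(7)
# Hypothetical data: coordinates (u, v), one predictor x, response y
n <- 200
dat <- data.frame(u = runif(n, 0, 10), v = runif(n, 0, 10), x = rnorm(n))
# The slope of x is made to increase from west to east (with u)
dat$y <- 1 + (0.5 + 0.2 * dat$u) * dat$x + rnorm(n, sd = 0.3)

# Local regression centred on point i: nearby points get most weight
gwr_at <- function(i, dat, bandwidth = 2) {
  d <- sqrt((dat$u - dat$u[i])^2 + (dat$v - dat$v[i])^2)
  w <- exp(-0.5 * (d / bandwidth)^2)   # Gaussian kernel weights
  coef(lm(y ~ x, data = dat, weights = w))
}

# Work through all points in sequence; one row of coefficients per point
local_coef <- t(sapply(seq_len(n), gwr_at, dat = dat))
print(head(local_coef))
print(cor(local_coef[, "x"], dat$u))
```

Mapping the local slopes in local_coef[, "x"] against the coordinates shows where the relationship between y and x is stronger or weaker, which a single global regression would hide.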
Exercise 5
Import the “meuse” point data to ArcGIS and answer the
following questions using the lead and copper data.
1. Examine and present the distribution of the meuse data
using histogram, normal QQplot and trend analysis.
Which of the data show normal or close to normal
distribution?
2. Produce a map of lead distribution using
a) Inverse distance
b) Ordinary kriging: use 15 bins and a lag distance of
100 m with the anisotropy option. What are the
nugget, sill and range values of the semivariogram?
c) Cokriging: use distance, soil type and flood frequency
as secondary data
3. Compare the maps produced by the three methods.
Which method gave the better accuracy?
Exercise 5
You can use the sample codes provided in the following
site to implement the different geostatistical prediction
methods including IDW and Kriging in R.
https://rspatial.org/raster/analysis/4-interpolation.html
