Multivariate Geostatistics
An Introduction with Applications
Springer
Dr. HANS WACKERNAGEL
Centre de Geostatistique
Ecole des Mines de Paris
35, rue Saint Honore
77305 Fontainebleau
France
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.
Product liability: The publishers cannot guarantee the accuracy of any information about dosage and
application contained in this book. In every individual case the user must check such information by
L'analyse des données est « un outil pour dégager de la gangue des
données le pur diamant de la véridique nature ».
Multivariate analysis is "a tool to extract from the gangue of the data
the pure diamond of truthful nature".
Preface
Introducing geostatistics from a multivariate perspective is the main aim of this
book. The idea took root while teaching geostatistics at the Centre de Geostatis-
tique (Ecole des Mines de Paris) over the past ten years in the two postgraduate
programs DEA and CFSG. A first script of lecture notes in French originated
from this activity.
A specialized course on Multivariate and Exploratory Geostatistics held in
September 1993 in Paris (organized in collaboration with the Department of
Statistics of Trinity College Dublin) was the occasion to test some of the mate-
rial on a pluridisciplinary audience. Another important opportunity arose last
year when giving a lecture on Spatial Statistics during the summer term at the
Department of Statistics of the University of Washington at Seattle, where part
of this manuscript was distributed in an early version. Short accounts were also
given during COMETT and TEMPUS courses on geostatistics for environmental
studies in Fontainebleau, Freiberg, Rome and Prague, which were sponsored by
the European Community.
I wish to thank the participants of these various courses for their stimulating
questions and comments. Among the organizers of these courses, I particularly
want to acknowledge the support received from Georges Matheron, Pierre Chau-
vet, Margaret Armstrong, John Haslett and Paul Sampson. Michel Grzebyk has
made valuable comments on Chapters 26 and 27, which partly summarize some
of his contributions to the field.
1 Introduction 1

A Preliminaries 5

2 From Statistics to Geostatistics 7
The mean: center of mass 7
Covariance 10
Linear regression 12
Variance-covariance matrix 15
Multiple linear regression 16
Simple kriging 18

B Geostatistics 23

3 Regionalized Variable and Random Function 25
Multivariate time/space data 25
Regionalized variable 26
Random variable and regionalized value 27
Random function 27
Probability distributions 28
Strict stationarity 29

4 Variogram Cloud 30
Dissimilarity versus separation 30
Experimental variogram 32
Replacing the experimental by a theoretical variogram 34

7 Anisotropy 46
Geometric anisotropy 46
Rotating and dilating an ellipsoid 46
Exploring 3D space for anisotropy 48
Zonal anisotropy 48
Nonlinear deformations of space 49

11 Ordinary Kriging 74
Ordinary kriging problem 74
Block kriging 76
Simple kriging with an estimated mean 77
Kriging the residual 79
Cross validation 80

12 Kriging Weights 82
Geometry 82
Geometric anisotropy 84
Relative position of samples 84
Screen effect 85
Factorizable covariance functions 86
Negative kriging weights 88

13 Mapping with Kriging 89
Kriging for spatial interpolation 89
Neighborhood 90

14 Linear Model of Regionalization 94
Spatial anomalies 94
Nested variogram model 95
Decomposition of the random function 96
Second-order stationary regionalization 97
Intrinsic regionalization 98
Intrinsic regionalization with mostly stationary components 98
Locally stationary regionalization 99

23 Cokriging 144
Isotopy and heterotopy 144
Ordinary cokriging 145
Simple cokriging 147
Cokriging with isotopic data 148
Autokrigeability 149
Collocated cokriging 151

APPENDIX 201
Matrix Algebra 203
Bibliography 237
Index 251
1 Introduction
Data description. The data need to be explored for spatial, temporal and mul-
tivariate structure and checked for outlying values which mask structure.
Modern computer technology with its high-power graphic screens display-
ing multiple, linkable windows allows for dynamic simultaneous views on
the data. A map of the position of samples in space or representations
along time can be linked with histograms, correlation diagrams, variogram
clouds and experimental variograms. First ideas about the spatial, time
and multivariate structure emerge from a variety of such simple displays.
Model building. The spatial or time structure, the associations and the causal
relations between variables are built into a model which is fitted to the data. This
model not only describes the phenomenon at sample locations, but it is
usually also valid for the spatial or time continuum in the sampled region
and it thus represents a step beyond the information contained in the
numerical data.
Estimation. Armed with a model of the variation in the spatial or temporal
continuum, the next objective can be to estimate values of the phenomenon
under study at various scales and at locations different from the sample
points. The methods to perform this estimation are based on least squares
and need to be adapted to a wide variety of model formulations in different
situations and for different problems encountered in practice.
We have decided to deal only with these three issues, leaving aside questions
of simulation and control which would have easily doubled the length of the book
and changed its scope. To get an idea of what portion of geostatistics is actually
covered it is convenient to introduce the following common sub division into
1. Linear stationary geostatistics,
2. Non-stationary linear geostatistics,
3. Non-linear geostatistics.
We shall mainly cover the first topic, examining single- and multi-variate
methods based on linear combinations of the sample values, and we shall assume
that the data stem from the realization of a set of random functions which are
stationary or, at least, whose spatial or time increments are stationary.
A short review of the second topic is given in the last three chapters of the
book with the aim of providing a better understanding of the status of drift
functions which are not translation invariant. We had no intention of giving an
extensive treatment of non-stationary geostatistics which would justify a mono-
graph.
The third topic has recently been covered at an introductory level in an
excellent monograph by RIVOIRARD [155], so it was not necessary to present
that material again here. Actually RIVOIRARD's book starts off with a brief description
of multivariate geostatistical concepts, which will be found in full detail in the
present work and which are important for a deeper understanding of non-linear
geostatistics.
Multivariate Geostatistics consists of thirty short chapters which on average
represent the contents of a two hour lecture. The material is subdivided into five
parts.
Part A reviews the basic concepts of mean, variance, covariance, variance-
covariance matrix, mathematical expectation, linear regression, multiple
linear regression. It ends with the transposition of multiple linear regres-
sion into a spatial context, where regression receives the name of kriging.
The Appendix contains two additional chapters on matrix algebra and lin-
ear regression theory in a notation consistent with the rest of the material. It
also contains a list of common covariance functions and variograms, additional
exercises and solutions to the exercises. References classified according to topics
of theory and fields of applications are found at the end of the book, together
with a list of sources of geostatistical computer software, the bibliography and a
subject index.
Part A

Preliminaries
2 From Statistics to Geostatistics
w(z) = 3, 4, 6, 3, 4, 4, 2
The location $\bar z$ where the bar, when suspended, stays in equilibrium is evidently calculated using a weighted average

$$\bar z = \sum_{k=1}^{7} z_k\, p(z_k), \qquad \text{where} \quad p(z_k) = \frac{w(z_k)}{\sum_{k=1}^{7} w(z_k)}$$

[Figure: a bar carrying the elementary weights $w(z_k)$ at the positions $z_k = 2, \ldots, 8$; the suspension point at which the bar balances is the center of mass, i.e. the mean $m$.]
$$\mathrm{E}[Z] = \int_{z \in \mathbb{R}} z\, p(z)\, dz = m$$
The expectation is a linear operator. Let a and b be deterministic constants.
It is easy to see from the definition that we have
$$\mathrm{E}[a] = a, \qquad \mathrm{E}[bZ] = b\,\mathrm{E}[Z] = b\,m$$
and
$$\mathrm{E}[a + bZ] = a + b\,m$$
The second moment of the random variable is the expectation of its squared
value
$$\mathrm{E}[Z^2] = \int_{z \in \mathbb{R}} z^2\, p(z)\, dz$$
and the $n$-th moment is defined as the expected value of the $n$-th power of $Z$
$$\mathrm{E}[Z^n] = \int_{z \in \mathbb{R}} z^n\, p(z)\, dz$$
When Z has a discrete distribution the integral in the definition of the math-
ematical expectation is replaced by a sum
$$\mathrm{E}[Z] = \sum_k z_k\, p_k = m$$
where $p_k = P(Z = z_k)$ is the probability that $Z$ takes the value $z_k$.

[Figure: cumulative frequency of the discrete distribution, plotted against the values $z = 2, \ldots, 8$.]
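The weighted mean of the bar example can be checked directly; a minimal sketch, reading the positions $z_k = 2, \ldots, 8$ off the figure:

```python
# Center of mass of the weighted bar: weights w(z_k) at positions z_k = 2..8.
z = [2, 3, 4, 5, 6, 7, 8]
w = [3, 4, 6, 3, 4, 4, 2]

total = sum(w)
p = [wk / total for wk in w]                 # p(z_k) = w(z_k) / sum_k w(z_k)
mean = sum(zk * pk for zk, pk in zip(z, p))  # E[Z] = sum_k z_k p(z_k)

print(mean)
```

The probabilities sum to one by construction, so the center of mass is the expectation of the discrete distribution.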
The theoretical variance $\sigma^2$ is
$$\sigma^2 = \mathrm{E}\big[(Z - m)^2\big] = \mathrm{E}\big[Z^2 + m^2 - 2mZ\big] = \mathrm{E}[Z^2] - m^2$$
so the variance can be expressed as the difference between the second moment and
the squared first moment.
Covariance
In the case of two variables, Zl and Z2 say, the data values can be represented
on a scatter diagram like on Figure 2.4 which shows the cloud of data points in
the plane spanned by two perpendicular axes, one for each variable. The center
of mass of the data cloud is the point defined by the two means $(m_1, m_2)$. An
obvious way to measure the dispersion of the data cloud around its center of mass
is to multiply the difference between a value of one variable and its mean, called
[Figure 2.4: Scatter diagram showing the cloud of sample values and the center of mass $(m_2, m_1)$.]
a residual, with the residual of the other variable. The average of the products
of residuals is the covariance
$$\mathrm{cov}(z_1, z_2) = \frac{1}{n} \sum_{\alpha=1}^{n} (z_1^\alpha - m_1)(z_2^\alpha - m_2)$$
When the residual of Zl tends to have the same sign as the residual of Z2 on
average, the covariance is positive, while when the two residuals are of opposite
sign on average, the covariance is negative. When a large value of one residual is
on average associated with a large value of the residual of the other variable, the
covariance has a large positive or negative value. Thus the covariance measures
on one hand the sense of the association between the two variables, through its sign, and on the
other hand the strength of this relationship, through its absolute value.
We see that when $z_1$ is identical with $z_2$, the covariance is equal to the variance.
It is often desirable to compare the covariances of pairs of variables. When
the units of the variables are not comparable, especially when they are of a
different type, e.g. cm, kg, %, ..., it is preferable to standardize each variable
$z$, centering first its values around the center of mass by subtracting the mean,
and subsequently norming the distances of the values to the center of mass by
dividing them with the standard deviation $\sigma$, which is the square root of the
variance. The standardized variable $\tilde z$ is
$$\tilde z = \frac{z - m}{\sigma}$$
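Covariance, correlation and standardization can be sketched in a few lines; the paired values below are illustrative, not taken from the book:

```python
# Covariance, correlation coefficient and standardization (illustrative data).
z1 = [2.0, 4.0, 6.0, 8.0]
z2 = [1.0, 3.0, 2.0, 6.0]
n = len(z1)

m1, m2 = sum(z1) / n, sum(z2) / n
cov12 = sum((u - m1) * (v - m2) for u, v in zip(z1, z2)) / n
var1 = sum((u - m1) ** 2 for u in z1) / n
var2 = sum((v - m2) ** 2 for v in z2) / n
s1, s2 = var1 ** 0.5, var2 ** 0.5

r12 = cov12 / (s1 * s2)                  # correlation: covariance of standardized variables
z1_std = [(u - m1) / s1 for u in z1]     # standardized variable: mean 0, variance 1

print(cov12, r12)
```

Standardizing both variables makes their covariance equal to the correlation coefficient, which is why variables of different units become comparable.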
Linear regression
Two variables that have a correlation coefficient different from zero are said to
be correlated. It is often reasonable to suppose that some of the information
conveyed by the measured values is common to two correlated variables. Con-
sequently it seems interesting to look for a function which, knowing a value of
one variable, yields the best approximation to the unknown value of a second
variable.
We shall call "best" function $z^*$ a function of a given type which minimizes
the mean squared distance $\mathrm{dist}^2(\cdot)$ to the samples
$$\mathrm{dist}^2(\cdot) = \frac{1}{n} \sum_{\alpha=1}^{n} (z_\alpha - z_\alpha^*)^2$$
This is intuitively appealing: using this criterion, the best function $z^*$ is the
one which passes closest to the data values.
Let us take two variables $z_1$, $z_2$ and denote by $z_1^*$ the function which best
approximates unknown values of $z_1$.
The simplest type of approximation of $z_1$ is by a constant $c$, so let
$$z_1^* = c$$
and this does not involve $z_2$. The average distance between the data and the
constant is
$$\mathrm{dist}^2(c) = \frac{1}{n} \sum_{\alpha=1}^{n} (z_1^\alpha - c)^2$$
The minimum is achieved for a value of $c$ for which the first derivative of the
distance function $\mathrm{dist}^2(c)$ is zero
$$\frac{\partial\, \mathrm{dist}^2(c)}{\partial c} = 0$$
$$\Longleftrightarrow \quad \Big( \frac{1}{n} \sum_{\alpha=1}^{n} (z_1^\alpha - c)^2 \Big)' = 0$$
$$\Longleftrightarrow \quad \frac{1}{n} \sum_{\alpha=1}^{n} \big( (z_1^\alpha)^2 + c^2 - 2c\, z_1^\alpha \big)' = 0$$
$$\Longleftrightarrow \quad \frac{1}{n} \sum_{\alpha=1}^{n} (2c - 2 z_1^\alpha) = 0$$
$$\Longleftrightarrow \quad c = \frac{1}{n} \sum_{\alpha=1}^{n} z_1^\alpha = m_1$$
The constant $c$ which minimizes the average squared distance to the data of
$z_1$ is the mean $m_1$. Replacing $c$ by $m_1$ in the expression of $\mathrm{dist}^2(c)$ we see that
the minimal distance for the optimal estimator $z_1^* = m_1$ is the variance $\sigma_1^2$.
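This result is easy to verify numerically; a small sketch with illustrative data values:

```python
# Numerical check: the constant minimizing the mean squared distance to
# the data is the arithmetic mean, and the minimal distance is the variance.
data = [3.0, 4.0, 6.0, 3.0, 4.0]
m = sum(data) / len(data)

def dist2(c):
    return sum((z - c) ** 2 for z in data) / len(data)

# the distance at the mean never exceeds the distance at nearby constants
assert all(dist2(m) <= dist2(m + eps) for eps in (-1.0, -0.1, 0.1, 1.0))
print(m, dist2(m))   # dist2(m) is the variance of the data
```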
A slightly more sophisticated function that can be chosen to approximate $z_1$
is a linear function of $z_2$
$$z_1^* = a\, z_2 + b$$
which defines a straight line through the data cloud with a slope $a$ and an intercept $b$.
The distance between the data of $z_1$ and the straight line depends on the two
parameters $a$ and $b$
$$\mathrm{dist}^2(a, b) = \frac{1}{n} \sum_{\alpha=1}^{n} (z_1^\alpha - a\, z_2^\alpha - b)^2$$
$$= \frac{1}{n} \sum_{\alpha=1}^{n} \big( (z_1^\alpha)^2 + a^2 (z_2^\alpha)^2 + b^2 - 2a\, z_1^\alpha z_2^\alpha - 2b\, z_1^\alpha + 2ab\, z_2^\alpha \big)$$
$$= -2b\, m_1 + 2ab\, m_2 + \frac{1}{n} \sum_{\alpha=1}^{n} \big( (z_1^\alpha)^2 + a^2 (z_2^\alpha)^2 + b^2 - 2a\, z_1^\alpha z_2^\alpha \big)$$
If we shift the data cloud and the straight line to the center of mass, this
translation does not change the slope of the straight line. We can thus consider,
without loss of generality, the special case $m_1 = m_2 = 0$ to determine the slope
of the straight line. Minimizing the distance in this case yields the slope, and the
intercept follows from the condition that the line passes through the center of mass:
$$a = \frac{\mathrm{cov}(z_1, z_2)}{\mathrm{var}(z_2)}, \qquad b = m_1 - a\, m_2$$
The best straight line approximating the variable $z_1$ knowing $z_2$, called the
linear regression of $z_1$ on $z_2$, is
$$z_1^* = \frac{\mathrm{cov}(z_1, z_2)}{\mathrm{var}(z_2)}\, (z_2 - m_2) + m_1 = m_1 + r_{12}\, \frac{\sigma_1}{\sigma_2}\, (z_2 - m_2) = m_1 + \sigma_1\, r_{12}\, \tilde z_2$$
where $r_{12}$ is the correlation coefficient and $\tilde z_2$ the standardized variable; it is
represented on Figure 2.5.
The linear regression is an improvement over the approximation by a constant:
the proportion $r_{12}$ of the normed residual $\tilde z_2$, rescaled with the standard deviation
$\sigma_1$, is added to the mean $m_1$. The linear regression reduces to the constant $m_1$ when the
correlation coefficient is zero.
The improvement can be evaluated by computing the minimal squared distance
$$\mathrm{dist}^2_{\min} = \frac{1}{n} \sum_{\alpha=1}^{n} \big( \sigma_1 \tilde z_1^\alpha - \sigma_1\, r_{12}\, \tilde z_2^\alpha \big)^2 = \sigma_1^2\, (1 - r_{12}^2)$$
The stronger the correlation between two variables, the smaller the distance
to the optimal straight line, the higher the explanatory power of this straight
line and the lower the expected average error. The improvement of the linear
regression from $z_2$ over the approximation by the constant $m_1$ alone is equal to
the proportion $r_{12}^2$ of the variance $\sigma_1^2$.
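The slope, intercept and minimal distance formulas can be checked numerically; the data below are illustrative:

```python
# Checking a = cov(z1, z2) / var(z2), b = m1 - a*m2 and
# dist2_min = var(z1) * (1 - r12^2) on illustrative data.
z1 = [2.0, 4.0, 6.0, 8.0]
z2 = [1.0, 3.0, 2.0, 6.0]
n = len(z1)
m1, m2 = sum(z1) / n, sum(z2) / n

cov12 = sum((u - m1) * (v - m2) for u, v in zip(z1, z2)) / n
var1 = sum((u - m1) ** 2 for u in z1) / n
var2 = sum((v - m2) ** 2 for v in z2) / n
r12 = cov12 / (var1 * var2) ** 0.5

a = cov12 / var2            # slope of the linear regression
b = m1 - a * m2             # intercept: the line passes through (m2, m1)

mse = sum((u - (a * v + b)) ** 2 for u, v in zip(z1, z2)) / n
print(a, b, mse, var1 * (1 - r12 ** 2))   # the last two values coincide
```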
[Figure 2.5: Scatter diagram with the linear regression line $z_1^* = a\, z_2 + b$ passing through the center of mass $(m_2, m_1)$.]
Multiple linear regression
For $N$ variables and $n$ samples the data can be assembled into an $n \times N$ matrix $Z$ with entries $z_\alpha^i$ ($\alpha = 1, \ldots, n$ for the samples; $i = 1, \ldots, N$ for the variables).
Define a matrix $M$ with the same dimension $n \times N$ as $Z$ containing the mean
values, i.e. each of its $n$ rows is the vector of means $(m_1, \ldots, m_N)$.
The estimator of the multiple linear regression is
$$z_0^* = m_0 + (Z - M)\, \mathbf{a}$$
The distance between $z_0$ and the regression hyperplane is minimal when
$$2\, V \mathbf{a} - 2\, \mathbf{v}_0 = 0 \quad \Longleftrightarrow \quad V \mathbf{a} = \mathbf{v}_0$$
where $V$ is the variance-covariance matrix of the $N$ explanatory variables and $\mathbf{v}_0$ the vector of their covariances with $z_0$, so that
$$\mathrm{dist}^2_{\min}(\mathbf{a}) = \mathrm{var}(z_0) - \mathbf{a}^\top \mathbf{v}_0$$
To make sure that this optimal distance is not negative, it has to be checked
beforehand that the augmented matrix $V_0$ of variances and covariances for all
$N+1$ variables
$$V_0 = \begin{pmatrix} \mathrm{var}(z_0) & \cdots & \mathrm{cov}(z_0, z_N) \\ \vdots & \ddots & \vdots \\ \mathrm{cov}(z_N, z_0) & \cdots & \mathrm{var}(z_N) \end{pmatrix}$$
is nonnegative definite.
Sample 1:  z0 missing   z1 = 3   z2 = 5
Sample 2:  z0 = 3       z1 = 2   z2 = 2
Sample 3:  z0 = 1       z1 = 4   z2 = 5
The means are $m_0 = 2$, $m_1 = 3$, $m_2 = 4$. The matrix $V_0$ of variances and
covariances of the three variables, each term computed on the samples available
for the corresponding pair, is
$$V_0 = \begin{pmatrix} 1 & -1 & -\tfrac{3}{2} \\[2pt] -1 & \tfrac{2}{3} & 1 \\[2pt] -\tfrac{3}{2} & 1 & 2 \end{pmatrix}$$
is obviously not nonnegative definite, as two out of the three minors of order 2
are negative:
$$\det \begin{pmatrix} 1 & -1 \\ -1 & \tfrac{2}{3} \end{pmatrix} = -\frac{1}{3}, \qquad \det \begin{pmatrix} 1 & -\tfrac{3}{2} \\ -\tfrac{3}{2} & 2 \end{pmatrix} = -\frac{1}{4}, \qquad \det \begin{pmatrix} \tfrac{2}{3} & 1 \\ 1 & 2 \end{pmatrix} = \frac{1}{3}$$
Solving $V\mathbf{a} = \mathbf{v}_0$ nevertheless gives $z_0^* = m_0 - \tfrac{3}{2}\,(z_1 - m_1) + 0\,(z_2 - m_2)$, but the minimal distance achieved by
this "optimal" regression line is negative:
$$\mathrm{dist}^2_{\min} = \mathrm{var}(z_0) - \mathbf{a}^\top \mathbf{v}_0 = -\frac{1}{2}$$
and the whole operation is meaningless!
The lesson from this is that variances and covariances of a set of variables
have to be calculated on the same subset of samples: the subset of samples
with no values missing for any variable.
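The pitfall can be reproduced in a few lines; the sketch below recomputes the matrix $V_0$ of the example, with each term taken only on the samples available for the corresponding pair of variables:

```python
# Pairwise-deletion (co)variances on the example data: the resulting matrix
# V0 is not nonnegative definite (two order-2 principal minors are negative).
data = {
    "z0": [None, 3.0, 1.0],   # z0 is missing for sample 1
    "z1": [3.0, 2.0, 4.0],
    "z2": [5.0, 2.0, 5.0],
}

def pairwise_cov(u, v):
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    mu = sum(a for a, _ in pairs) / len(pairs)
    mv = sum(b for _, b in pairs) / len(pairs)
    return sum((a - mu) * (b - mv) for a, b in pairs) / len(pairs)

names = ["z0", "z1", "z2"]
V0 = [[pairwise_cov(data[r], data[c]) for c in names] for r in names]

m01 = V0[0][0] * V0[1][1] - V0[0][1] * V0[1][0]   # minor for (z0, z1)
m02 = V0[0][0] * V0[2][2] - V0[0][2] * V0[2][0]   # minor for (z0, z2)
print(V0, m01, m02)
```

The minors come out as $-1/3$ and $-1/4$, matching the text: the matrix cannot be a valid variance-covariance matrix.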
Similarly, in a spatial context we wish to estimate values at locations where
no measurement has been made, using the data from neighboring locations. We
see immediately that there is a need for a covariance function defined in such
a way that the corresponding matrix $V_0$ is nonnegative definite. The spatial
multiple regression based on a covariance function model is called simple kriging.
Simple kriging
Take the data locations $x_\alpha$ and construct at each of the $n$ locations a random
variable $Z(x_\alpha)$. Take an additional location $x_0$ and let $Z(x_0)$ be a random
variable at $x_0$. Further assume that these random variables are a subset of a
second-order stationary random function $Z(x)$ defined over a domain $\mathcal{D}$. By
second-order stationarity we mean that the expectation and the covariance are
both translation invariant over the domain, i.e. for a vector $h$ linking any two
points $x$ and $x+h$ in the domain:
$$\mathrm{E}[Z(x+h)] = \mathrm{E}[Z(x)]$$
$$\mathrm{cov}[Z(x+h), Z(x)] = C(h)$$
The expected value $\mathrm{E}[Z(x)] = m$ is the same at any point $x$ of the domain.
The covariance between any pair of locations depends only on the vector $h$ which
separates them.
[Figure 2.6: Data points $x_\alpha$ and the estimation point $x_0$ in a spatial domain $\mathcal{D}$.]
The simple kriging estimator is a linear combination of the data with the mean:
$$Z^*(x_0) = m + \sum_{\alpha=1}^{n} w_\alpha \big( Z(x_\alpha) - m \big)$$
where the $w_\alpha$ are weights attached to the residuals $Z(x_\alpha) - m$. Note that, unlike in
multiple regression, the mean $m$ is the same for all locations due to the stationarity
assumption.
The estimation error is the difference between the estimated and the true
value at $x_0$:
$$Z^*(x_0) - Z(x_0)$$
Minimizing the estimation variance with respect to the weights yields, for
$\alpha = 1, \ldots, n$:
$$2 \sum_{\beta=1}^{n} w_\beta\, C(x_\alpha - x_\beta) - 2\, C(x_\alpha - x_0) = 0$$
and the equation system for simple kriging is written
$$\sum_{\beta=1}^{n} w_\beta\, C(x_\alpha - x_\beta) = C(x_\alpha - x_0) \qquad \text{for } \alpha = 1, \ldots, n$$
The left hand side of the equations describes the covariances between the data
locations. The right hand side describes the covariance between each data location
and the location where an estimate is sought. The resolution of the system yields
the optimal kriging weights $w_\alpha$.
The operation of simple kriging can be repeated at regular intervals displacing
each time the location Xo. A regular grid of kriging estimates is obtained which
can be contoured for representation as a map.
A second quantity of interest is the minimal estimation variance at each location $x_0$. It
is obtained by substituting the left hand side of the kriging system by its right hand
side in the first term of the expression of the estimation variance $\sigma_E^2$. This is the
variance of simple kriging:
$$\sigma_{SK}^2 = C(0) - \sum_{\alpha=1}^{n} w_\alpha\, C(x_\alpha - x_0)$$
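The whole procedure fits in a short program. This is a minimal sketch, assuming an exponential covariance model $C(h) = b\,e^{-|h|/a}$ and illustrative 1D data; the function names and parameter values are ours, not from the book:

```python
import math

def C(h, b=1.0, a=3.0):
    """Exponential covariance model (an assumption chosen for illustration)."""
    return b * math.exp(-abs(h) / a)

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def simple_kriging(xs, zs, x0, m):
    """Simple kriging with known mean m at a 1D location x0."""
    n = len(xs)
    A = [[C(xs[i] - xs[j]) for j in range(n)] for i in range(n)]  # C(x_a - x_b)
    c0 = [C(xi - x0) for xi in xs]                                # C(x_a - x_0)
    w = solve(A, c0)                                              # kriging weights
    est = m + sum(wi * (zi - m) for wi, zi in zip(w, zs))
    var = C(0.0) - sum(wi * ci for wi, ci in zip(w, c0))          # kriging variance
    return est, var

est, var = simple_kriging([0.0, 1.0, 4.0], [5.0, 6.0, 3.0], 2.0, m=4.0)
print(est, var)
```

At a data location the weights reduce to an indicator, so the estimate reproduces the datum exactly and the kriging variance is zero; between data points the variance is positive but stays below $C(0)$.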
Part B

Geostatistics
3 Regionalized Variable and Random Function
The data provide information about regionalized variables, which are simply
functions z(x) (of a spatial or time continuum) whose behavior we would like to
characterize. In applications the regionalized variables are usually not identical
with simple deterministic functions and it is therefore advantageous to place them
into a probabilistic framework.
In the probabilistic model, a regionalized variable z(x) is considered to be a
realization of a random function $Z(x)$ (the infinite family of random variables
constructed at all points $x$ of a spatial domain $\mathcal{D}$). The advantage of this ap-
proach is that we shall only try to characterize the random function Z(x) and
not a particular regionalized variable.
In such a setting the data values are samples from one particular realiza-
tion z(x) of Z(x). The epistemological implications of such an investigation
of a unique realization by probabilistic methods have been debated by MATH-
ERON [126] and shall not be reported here. We provide a description of the
probabilistic formalization, but we shall not attempt to justify it rigorously. The
main reason for the success of the probabilistic approach is that it offers a conve-
nient way of formulating methods which could also be viewed in a deterministic
setting, albeit with less elegance.
For each sample $\alpha$ the data set records a time coordinate $t_\alpha$, up to three spatial coordinates $x_\alpha^1, x_\alpha^2, x_\alpha^3$, and the values of $N$ variables:

            Coordinates                      Variables
            t_1   x_1^1  x_1^2  x_1^3        z_1^1  ...  z_1^N
 Samples    t_a   x_a^1  x_a^2  x_a^3        z_a^1  ...  z_a^N
            t_n   x_n^1  x_n^2  x_n^3        z_n^1  ...  z_n^N
In this array the samples are numbered using an index a and the total number
of samples is symbolized by the letter n, thus a = 1,2,3, ... , n.
The different variables are labeled by an index i and the number of variables
is N, so i = 1,2,3, ... ,N.
The samples could be pieces of rock of same volume taken by a geologist or
they could be plots containing different plants a biologist is interested in. The
variables would then be chemical or physical measurements performed on the
rocks, or counts of the abundance of different species of plants within the plots.
The samples may have been taken at different moments $t_\alpha$ and at different
places $x_\alpha$, where $x_\alpha$ is the vector of up to three spatial coordinates $(x_\alpha^1, x_\alpha^2, x_\alpha^3)$.
So far the numbers contained in the data set have been arranged into a
rectangular table and symbols have been used to describe this table. There is a
need for viewing the data in a more general setting.
Regionalized variable
Let us consider that only one property has been measured on different spatial
objects (N = 1) and that the time at which the measurements have been made
has not been recorded. In this case the index i and the time coordinate t a are
of no use and can be omitted. There are $n$ observations symbolized by
$$z(x) \quad \text{with} \quad x \in \mathcal{D}$$
The data set $\{z(x_\alpha),\ \alpha = 1, \ldots, n\}$ is a collection of a few values of the regionalized variable.
[Figure 3.1: The random function model: the samples are viewed both as regionalized (regionalized variable) and as random (random variable); joining the two aspects yields the random function.]
Random function
Considering the regionalized values at all points in a region, the associated function $z(x)$ of $x \in \mathcal{D}$ is a regionalized variable. This function as a whole can be
viewed as one draw from an infinite set of random variables (one random variable
at each point of the domain) which is called the random function (RAF) $Z(x)$.
Figure 3.1 shows how the model of the random function has been set up by
viewing the data under two different angles. One aspect is that the data values
stem from a physical environment (time, 1-2-3D space) and are in some way
dependent on their location in the region: they are regionalized. The second
aspect is that the regionalized values $z(x_\alpha)$ cannot, generally, be modeled with a
simple deterministic function $z(x)$. Looking at the values that were sampled, the
behavior of $z(x)$ appears to be very complex. As in many situations where the
parameters of a data generating mechanism cannot be determined, a probabilistic
approach is chosen, i.e. the mechanism is considered as random. The data values
are viewed as outcomes of the random mechanism.
Joining together these two aspects of regionalization and randomness yields
the random function model, of which the regionalized variable is one realization.
Probability distributions
In this model the random mechanism $Z(x_0)$ acting at a point $x_0$ of the region
generates realizations following a probability distribution $F$
$$F(z) = P\big( Z(x_0) < z \big)$$
where $P$ is the probability that an outcome of $Z$ at the point $x_0$ is lower than a
fixed value $z$.
A bivariate distribution function for two random variables $Z(x_1)$ and $Z(x_2)$
at two different locations is
$$F(z_1, z_2) = P\big( Z(x_1) < z_1,\ Z(x_2) < z_2 \big)$$
In practice we possess only few data from one or several realizations of the RAF and it will
be impossible to infer all the mono- and multivariate distribution functions for
any set of points. Simplification is needed and it is provided by the idea of
stationarity.
Strict stationarity
Stationarity means that characteristics of a RAF stay the same when shifting
a given set of $n$ points from one part of the region to another. This is called
translation invariance.
To be more specific, a RAF $Z(x)$ is said to be strictly stationary if, for any set
of $n$ points $x_1, \ldots, x_n$ and any vector $h$, the multivariate distribution is translation
invariant:
$$F(z_1, \ldots, z_n;\ x_1, \ldots, x_n) = F(z_1, \ldots, z_n;\ x_1{+}h, \ldots, x_n{+}h)$$
1 "C'est partout et toujours comme ici, c'est partout et toujours comme chez nous, aux
degrés de grandeur et de perfection près." ("It is everywhere and always as here, everywhere
and always as at home, up to degrees of magnitude and perfection.") [M. SERRES (1968)
Le Système de Leibniz et ses Modèles Mathématiques. Presses Universitaires de France, Paris.]
4 Variogram Cloud
Pairs of sample values are evaluated by computing the squared difference between
the values. The resulting dissimilarities are plotted against the separation of
sample pairs in geographical space and form the variogram cloud. The cloud is
sliced into classes according to separation in space and the average dissimilarities
in each class form the sequence of values of the experimental variogram.
The dissimilarity of a pair of values is half their squared difference,
$$\gamma^* = \frac{\big( z(x+h) - z(x) \big)^2}{2}$$
[Figure 4.2: Plot of the dissimilarities $\gamma^*$ against the spatial separation $h$ of sample pairs: a variogram cloud.]
As the dissimilarity is a squared quantity, the sign of the vector $h$, i.e. the
order in which the points $x_\alpha$ and $x_\beta$ are considered, does not enter into play. The
dissimilarity is symmetric with respect to $h$:
$$\gamma^*(-h_{\alpha\beta}) = \gamma^*(+h_{\alpha\beta})$$
Thus on graphical representations the dissimilarities will be plotted against absolute values $|h_{\alpha\beta}|$ of the separation vectors.
[Figure 4.3: The experimental variogram is obtained by averaging the dissimilarities $\gamma^*$ for given classes $\mathfrak{H}_k$ of separations $h_1, \ldots, h_9$.]
The variogram cloud usually is dominated by many pairs with low dissimilarity
at all scales $h$ (DIGGLE et al. [57] discuss this question).
Experimental variogram
An average of dissimilarities $\gamma^*(h_{\alpha\beta})$ can be formed for a given class of vectors
$\mathfrak{H}$ by grouping all $n_c$ point pairs that can be linked by a vector $h$ belonging to
$\mathfrak{H}$. Such a class $\mathfrak{H}$ groups vectors whose lengths are within a specified interval
of lengths and whose orientation is the same up to a given tolerance on angle.
Generally non-overlapping vector classes $\mathfrak{H}$ are chosen. The average dissimilarity
with respect to a vector class $\mathfrak{H}_k$ is a value of what is termed the experimental
variogram
$$\gamma^*(\mathfrak{H}_k) = \frac{1}{2\, n_c} \sum_{h_{\alpha\beta} \in \mathfrak{H}_k} \big( z(x_\alpha + h_{\alpha\beta}) - z(x_\alpha) \big)^2$$
[Figure 4.4: The sequence of average dissimilarities is fitted with a theoretical variogram function $\gamma(h)$.]
The experimental variogram thus condenses the variogram cloud by subdividing it into classes and computing an average for each class.
Usually we can observe that the average dissimilarity between values increases
when the spacing between the pairs of sample points is increased. For large
spacings the experimental variogram sometimes reaches a sill, which can be
equal to the variance of the data.
If the slope of the dissimilarity function changes abruptly at a specific scale,
this suggests that an intermediate level of the variation has been reached at this
scale.
The behavior at very small scales, near the origin of the variogram, is of
importance, as it indicates the type of continuity of the regionalized variable:
differentiable, continuous but not differentiable, or discontinuous. In this last
case, when the variogram is not differentiable at the origin, this is a symptom
of a nugget-effect, which means that the values of the variable change abruptly
at a very small scale, like gold grades when a few gold nuggets are contained in
some samples.
When the average dissimilarity of the values is constant for all spacings $h$,
there is no spatial structure in the data. Conversely, a nonzero slope of the
variogram near the origin indicates structure. An abrupt change in slope indicates
the passage to a different organization of the values in space. We shall
learn how to model such transitions with nested theoretical variograms and how
to visualize the different types of spatial associations of the values separately as
maps by kriging spatial components.
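The computation described above, from variogram cloud to experimental variogram, can be sketched for 1D data; the coordinates and values below are illustrative:

```python
# Variogram cloud (pairwise dissimilarities gamma* = (z_a - z_b)^2 / 2)
# and experimental variogram (averages within distance classes).
xs = [0.0, 1.0, 2.0, 3.0, 5.0, 6.5]
zs = [4.2, 4.0, 5.1, 5.3, 7.0, 6.4]

cloud = []                           # (separation, dissimilarity) pairs
for i in range(len(xs)):
    for j in range(i + 1, len(xs)):
        h = abs(xs[i] - xs[j])
        cloud.append((h, 0.5 * (zs[i] - zs[j]) ** 2))

def experimental_variogram(cloud, width=1.0):
    classes = {}
    for h, g in cloud:
        k = int(h // width)          # class H_k: width*k <= h < width*(k+1)
        classes.setdefault(k, []).append(g)
    return {k: sum(v) / len(v) for k, v in sorted(classes.items())}

print(experimental_variogram(cloud))
```

Averaging the half squared differences within each class is exactly the $\frac{1}{2 n_c}\sum(\cdot)^2$ formula of the text, applied class by class.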
5 Variogram and Covariance Function
The experimental variogram is a convenient tool for the analysis of spatial data
as it is based on a simple measure of dissimilarity. Its theoretical counterpart
reveals that a broad class of phenomena are adequately described by it, including
phenomena of unbounded variation. When the variation is bounded, the
variogram is equivalent to a covariance function.
Regional variogram
The experimental variogram of samples $z(x_\alpha)$ is the sequence of averages of
variogram values for different distance classes $\mathfrak{H}_k$. If we had samples for the
whole domain $V$ we could compute the variogram for every possible pair in the
domain. The set $V(h)$, defined as the intersection of the domain $V$ with a
translation $V_{-h}$ of itself, describes all the points $x$ having a counterpart $x+h$ in
$V$. The regional variogram $g_R(h)$ is the integral over the squared differences of
a regionalized variable $z(x)$ for a given lag $h$
$$g_R(h) = \frac{1}{2\,|V(h)|} \int_{V(h)} \big( z(x+h) - z(x) \big)^2\, dx$$
where $|V(h)|$ is the area (or volume) of the intersection $V(h)$.
We know the regionalized variable $z(x)$ only at a few locations and it is
generally not possible to approximate $z(x)$ by a simple deterministic function.
Thus it is convenient to consider $z(x)$ as a realization of a random function $Z(x)$.
The associated regional variogram
$$G_R(h) = \frac{1}{2\,|V(h)|} \int_{V(h)} \big( Z(x+h) - Z(x) \big)^2\, dx$$
is then itself random. Taking expectations of the increments leads to the theoretical
variogram, defined under the intrinsic hypothesis that:
• the mean $m(h)$ of the increments $Z(x+h) - Z(x)$, called the drift, is invariant for any
translation of a given vector $h$ within the domain; moreover, the drift is
supposed to be zero whatever the position of $h$ in the domain;
• the variance of the increments is finite and invariant under translation within
the domain; it defines the variogram through $\mathrm{var}[Z(x+h) - Z(x)] = 2\gamma(h)$.
The existence of expectation and variance of the increments does not imply
the existence of the first two moments of the random function itself: an intrinsic
random function can have an infinite variance although the variance of its increments
is finite for any vector $h$. An intrinsically stationary random function
does not need to have a constant mean or a constant variance.
The value of the variogram at the origin is zero by definition
$$\gamma(0) = 0$$
and the variogram values are nonnegative and even:
$$\gamma(h) \ge 0, \qquad \gamma(-h) = \gamma(h)$$
The variogram grows slower than $|h|^2$ for $|h| \to \infty$ (otherwise the drift $m(h)$
could not be assumed zero).
Covariance function
The covariance function $C(h)$ is defined on the basis of a hypothesis of stationarity
of the first two moments (mean and covariance) of the random function.
The covariance function is bounded and its absolute value does not exceed
the variance
$$|C(h)| \le C(0) = \mathrm{var}(Z(x))$$
Like the variogram it is an even function: $C(-h) = C(+h)$. But unlike the
variogram it can take negative values.
The covariance function divided by the variance is called the correlation function
$$\rho(h) = \frac{C(h)}{C(0)}, \qquad -1 \le \rho(h) \le 1$$
EXAMPLE 5.1 The power variogram shown on Figure 5.1,
$$\gamma(h) = b\,|h|^p \qquad \text{with } 0 < p < 2 \text{ and } b > 0,$$
is unbounded, so it has no covariance function counterpart.
[Figure 5.1: Power variograms for different values of the power $p$ (with $b = 1$).]
The variance of any linear combination of $n{+}1$ random variables $Z(x_\alpha)$ (any subset sampled from a second-order stationary random function) must be positive. It is necessarily linked to a
positive semi-definite matrix $C$ of covariances
$$\mathrm{var}\Big( \sum_{\alpha=0}^{n} w_\alpha\, Z(x_\alpha) \Big) = \mathbf{w}^\top C\, \mathbf{w} \ \ge\ 0$$
for any set of points $x_\alpha$ and any set of weights $w_\alpha$ (assembled into a vector $\mathbf{w}$).
The continuous covariance functions are characterized by Bochner's theorem
as the Fourier transforms of positive measures. This topic is treated in more
detail in the chapter on cross-covariance functions.
var (
n )
.?;WaZ(Xa) = - .?;EwaWß,(Xa-xß) ~ 0
n n
if LW a = 0
a=O
var[ Σ_{α=0}^n w_α (Z(x_α) − Z(0)) ] = Σ_{α=0}^n Σ_{β=0}^n w_α w_β E[ (Z(x_α) − Z(0)) · (Z(x_β) − Z(0)) ]
= Σ_{α=0}^n Σ_{β=0}^n w_α w_β C_I(x_α, x_β),
where C_I(x_α, x_β) is the covariance of increments formed using the additional variable Z(0).
We also introduce the additional variable Z(0) into the expression of the variogram:
var( Σ_{α=0}^n w_α Z(x_α) ) = Σ_{α=0}^n Σ_{β=0}^n w_α w_β C_I(x_α, x_β)
= Σ_{β=0}^n w_β Σ_{α=0}^n w_α γ(x_α) + Σ_{α=0}^n w_α Σ_{β=0}^n w_β γ(x_β) − Σ_{α=0}^n Σ_{β=0}^n w_α w_β γ(x_α − x_β),
where the first two terms vanish because the weights sum up to zero.
40 Geostatistics
EXAMPLE: the covariance function
C(h) = b e^{−|h|^p / a}   with 0 < p ≤ 2 and a > 0
corresponds to the bounded variogram
γ(h) = b − C(h).
We present a few models of covariance functions. They are defined for isotropic (i.e. rotation invariant) random functions. On the graphical representations the covariance functions are plotted as variograms using the relation γ(h) = C(0) − C(h).
Nugget-effect model
The covariance function C(h) that models a discontinuity at the origin is the nugget-effect model
C_nug(h) = { b   for |h| = 0
           { 0   for |h| > 0
where b is a positive value.
Its variogram counterpart is zero at the origin and has the value b for h ≠ 0. It is shown on Figure 6.1.
The nugget-effect is used to model a discontinuity at the origin of the variogram, i.e. when
lim_{|h|→0} γ(h) = b.
The nugget-effect is equivalent to the concept of white noise in signal processing.
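The white-noise analogy can be checked on simulated data (an illustrative sketch, not material from the book): for uncorrelated values the experimental variogram is flat at the level of the variance b, whatever the lag.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(0.0, 1.0, size=5000)   # white noise with variance b = 1

def experimental_variogram(z, lag):
    """Half the mean squared increment at the given lag."""
    d = z[lag:] - z[:-lag]
    return 0.5 * float(np.mean(d ** 2))

gammas = [experimental_variogram(z, k) for k in range(1, 6)]
```

Every entry of `gammas` is close to the variance b = 1, reproducing the flat nugget-effect variogram of Figure 6.1.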
Figure 6.1: A nugget-effect variogram: its value is zero at the origin and b = 1 elsewhere.
Spherical model
C_sph(h) = { b ( 1 − (3|h|)/(2a) + |h|³/(2a³) )   for 0 ≤ |h| ≤ a
           { 0                                     for |h| > a
The parameter a indicates the range of the spherical covariance: the co-
variance vanishes when the range is reached. The parameter b represents the
maximal value of the covariance: the spherical covariance steadily decreases,
starting from the maximum b at the origin, until it vanishes when the range is
reached.
The nugget-effect model can be considered as a particular case of a spherical covariance function with an infinitely small range. Nevertheless there is an
important difference between the two models: Cnug(h) describes a discontinuous
phenomenon, whose values change abruptly from one location to the other, while
Csph(h) represents a phenomenon which is continuous, but not differentiable: it
would feel rough, could one touch it.
A corresponding spherical variogram is shown on Figure 6.3. It reaches the
sill (b= 1) at a range of a= 3.
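In code, the spherical model and its variogram counterpart read as follows (a sketch with sill b and range a as keyword parameters):

```python
import numpy as np

def c_sph(h, b=1.0, a=1.0):
    """Spherical covariance: b(1 - 3|h|/(2a) + |h|^3/(2a^3)) up to the range a, 0 beyond."""
    h = np.abs(np.asarray(h, dtype=float))
    inside = b * (1.0 - 1.5 * h / a + 0.5 * (h / a) ** 3)
    return np.where(h <= a, inside, 0.0)

def gamma_sph(h, b=1.0, a=1.0):
    """Variogram counterpart gamma(h) = C(0) - C(h): reaches the sill b at the range a."""
    return b - c_sph(h, b, a)
```

With b = 1 and a = 3 this reproduces the variogram of Figure 6.3: the sill is reached exactly at the range.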
Examples of Covariance Functions 43
Exponential model
The exponential covariance model falls off exponentially with distance,
C_exp(h) = b e^{−|h|/a}   with a, b > 0,
and its variogram counterpart approaches the sill b only asymptotically.
The spherical covariance can be obtained as the self-intersection (geometric covariogram) of a ball B of diameter d,
K(h) = ∫∫∫ 1_B(x) 1_B(x+h) dx = |B ∩ B_h|,
the volume of the intersection of the ball with a copy of itself translated by h.
Figure 6.3: A spherical variogram with sill b = 1 reaching its sill at the range a = 3.
Conversely, it is worth noting that the intersection B ∩ B_{−h} of the ball with a copy of itself translated by −h represents the set of points x' ∈ B which have a neighbor x' + h within the ball, as shown on Figure 6.4:
K(h) = ∫_{x'∈B∩B_{−h}} dx' = |B ∩ B_{−h}|
C(h) = { |B| ( 1 − (3|h|)/(2d) + |h|³/(2d³) )   for 0 ≤ |h| ≤ d,
       { 0                                       for |h| > d,
Figure 6.4: The intersection B ∩ B_{−h} describes the set of points x' ∈ B which have a neighbor x'+h inside B.
where |B| = π d³/6 = C(0) represents the variance of Z(x) and |B| is the volume of the spheres.
The diameter d of the spheres is equal to the range of the covariance function as it indicates the distance at which the covariance vanishes. The range of the spherical covariance function is the maximal distance at which the volumes of influence of two random variables Z(x) and Z(x+h) can overlap and share information.
In applications large objects (as compared to the scale of the investigation) can condition the spatial structure of the data. The maximal size of these morphological objects in a given direction can often be read from the experimental variogram and interpreted as the range of a spherical model.
The shape of objects conditioning the morphology of a regionalized variable may not be spherical in many applications. This will result in anisotropic behavior of the variogram.
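The geometric interpretation can be verified numerically: a Monte-Carlo estimate of the intersection volume |B ∩ B_{−h}| reproduces the spherical covariogram. This sketch (the diameter d = 1 and the sample size are assumptions of the example) is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(42)
d = 1.0          # diameter of the ball B = range of the spherical covariance
r = d / 2.0

def intersection_volume(h, n=200_000):
    """Monte-Carlo estimate of |B ∩ B_{-h}|: points x' in B whose neighbor x' + h is also in B."""
    pts = rng.uniform(-r, r, size=(n, 3))
    in_b = np.sum(pts ** 2, axis=1) <= r ** 2
    shifted = pts + np.array([h, 0.0, 0.0])
    in_both = in_b & (np.sum(shifted ** 2, axis=1) <= r ** 2)
    return (2.0 * r) ** 3 * float(np.mean(in_both))   # cube volume times hit fraction

def c_geom(h):
    """Spherical covariogram |B|(1 - 3h/(2d) + h^3/(2d^3)) for 0 <= h <= d."""
    vol_b = np.pi * d ** 3 / 6.0
    return vol_b * (1.0 - 1.5 * h / d + 0.5 * (h / d) ** 3) if h <= d else 0.0
```

At h = 0 the estimate recovers the ball volume |B| = πd³/6, and for 0 < h < d it follows the spherical formula.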
7 Anisotropy
Geometric Anisotropy
In 2D-space a representation of the behavior of the experimental variogram can be made by drawing a map of iso-variogram lines as a function of a vector h. Ideally if the iso-variogram lines are circular around the origin, the variogram obviously only depends on the length of the vector h and the phenomenon is isotropic.
If not, the iso-variogram lines can in many applications be approximated by concentric ellipses defined along a set of perpendicular main axes of anisotropy. This type of anisotropy, called the geometric anisotropy, can be obtained by a linear transformation of the spatial coordinates of a corresponding isotropic model. It allows one to relate the class of ellipsoidally anisotropic random functions to a corresponding isotropic random function. This is essential because variogram models are defined for the isotropic case. The linear transformation extends in a simple way a given isotropic variogram to a whole class of ellipsoidally anisotropic variograms.
Figure 7.1: The coordinate system for h = (h₁, h₂) is rotated into the system h' parallel to the main axes of the concentric ellipses.
Q = (  cos θ   sin θ )
    ( −sin θ   cos θ )
In 3D the angle θ₁ defines a rotation of the plane h₁h₂ around h₃ such that h₁ is brought into the plane h′₁h′₂. With θ₂ a rotation is performed around the intersection of the planes h₁h₂ and h′₁h′₂, bringing h₃ into the position of h′₃. The third rotation with an angle θ₃ rotates everything around h′₃ into its final position.
The second step in the transformation is to operate a shrinking or dilation of the principal axes of the ellipsoid using a diagonal matrix
√Λ = ( √λ₁   0  )
     (  0   √λ₂ )
which transforms the system h' into a new system h̃ in which the ellipsoids become spheres:
h̃ = √Λ h'
r = |h̃| = √(h̃ᵀ h̃)
This yields the equation of an ellipsoid in the h' coordinate system:
(h')ᵀ Λ h' = r²
The diameters d_p (principal axes) of the ellipsoid along the principal directions are thus
d_p = 2r / √λ_p
and the principal directions are the vectors q_p of the rotation matrix.
Finally, once the ellipsoid is determined, the anisotropic variogram is specified on the basis of an isotropic variogram by
γ(h) = γ( √(hᵀ B h) )
where B = Qᵀ Λ Q.
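The transformation translates directly into code. In this sketch the scaling λ_p = 1/a_p², which maps each principal range a_p to 1, is an assumption of the example:

```python
import numpy as np

def gamma_iso(r):
    """Isotropic spherical variogram with sill 1 and range 1."""
    r = np.minimum(np.abs(r), 1.0)
    return 1.5 * r - 0.5 * r ** 3

def gamma_aniso(h, theta, a1, a2):
    """Geometrically anisotropic variogram: rotate by theta, rescale the axes,
    then evaluate the isotropic model at the distance sqrt(h^T B h)."""
    c, s = np.cos(theta), np.sin(theta)
    Q = np.array([[c, s], [-s, c]])
    Lam = np.diag([1.0 / a1 ** 2, 1.0 / a2 ** 2])  # shrink each principal range a_p to 1
    B = Q.T @ Lam @ Q
    h = np.asarray(h, dtype=float)
    return float(gamma_iso(np.sqrt(h @ B @ h)))
```

With θ = 0 and ranges a₁ = 2, a₂ = 1, the sill is reached at distance 2 along the first axis and at distance 1 along the second, as expected for concentric elliptical iso-variogram lines.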
Q = (R)lc = (~)lc
2
(1
g+ 1
-(g+
9
1) 9 )lc
-1 withk=1, ... ,4
9 1 g+1
Zonal anisotropy
It can happen that experimental variograms calculated in different directions
suggest a different value for the sill. This is termed a zonal anisotropy.
Anisotropy 49
For example, in 2D the sill along the x₂ coordinate might be much larger than along x₁. In such a situation a common strategy is first to fit an isotropic model γ₁(h) to the experimental variogram along the x₁ direction, and second to add a geometrically anisotropic variogram γ₂(h), which is designed to be without effect along the x₁ coordinate by providing it with a very large range in that direction through an anisotropy coefficient. The final variogram model is then the sum
γ(h) = γ₁(h) + γ₂(h).
Support
In the investigation of regionalized variables the variances are a function of the size of the domain. Table 8.1 shows the results of computations of means and variances in nested 2D domains D_n.
Table 8.1: Nested 2D domains D_n for which the variance increases with the size of the domain (from a simulation of an intrinsic random function by C. LAJAUNIE).
Figure 8.1: Supports in 1, 2, 3D in different applications.
On short time intervals Δt the excess over a limit value defined for a work day T should be estimated.
Extension variance
With regionalized variables it is necessary to take account of the spatial disposition of the points, surfaces or volumes for which the variance of a quantity is to be computed.
The extension variance of a point x with respect to another point x' is defined as twice the variogram,
σ²_E(x, x') = 2 γ(x − x').
More generally, the extension variance of a domain v to a domain V is
σ²_E(v, V) = 2 · (1/(|v| |V|)) ∫_{x∈v} ∫_{x'∈V} γ(x−x') dx dx'
           − (1/|v|²) ∫_{x∈v} ∫_{x'∈v} γ(x−x') dx dx'
           − (1/|V|²) ∫_{x∈V} ∫_{x'∈V} γ(x−x') dx dx'.
Denoting by γ̄(v, V) the average of the variogram between a point x of v and a point x' of V,
γ̄(v, V) = (1/(|v| |V|)) ∫_{x∈v} ∫_{x'∈V} γ(x−x') dx dx',
we have
σ²_E(v, V) = 2 γ̄(v, V) − γ̄(v, v) − γ̄(V, V).
The extension variance depends on variogram integrals γ̄(v, V), whose values can either be read in charts (see JOURNEL & HUIJBREGTS [93], chap. II) or integrated numerically on a computer.
Dispersion variance
Suppose a large volume V is partitioned into n smaller units v of equal size. The experimental dispersion variance of the values z_α of the small volumes v_α building up V is given by the formula
s²(v|V) = (1/n) Σ_{α=1}^n ( z_α − z̄_V )²,
where
z̄_V = (1/n) Σ_{α=1}^n z_α.
Extension and Dispersion Variance 53
The theoretical formula for the dispersion variance is obtained by taking the expectation
σ²(v|V) = E[ S²(v|V) ] = (1/n) Σ_{α=1}^n E[ ( Z_α − Z̄_V )² ],
in which we recognize the extension variances of the units v_α with respect to V:
σ²(v|V) = (1/n) Σ_{α=1}^n σ²_E(v_α, V)
        = (1/n) Σ_{α=1}^n ( 2 γ̄(v_α, V) − γ̄(v_α, v_α) − γ̄(V, V) ).
Since the units v_α partition V, the average of the terms γ̄(v_α, V) equals γ̄(V, V), so that
σ²(v|V) = γ̄(V, V) − γ̄(v, v).
The theoretical determination of the dispersion variance thus reduces to the computation of the variogram integrals γ̄(v, v) and γ̄(V, V) associated with the two supports v and V.
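The variogram integrals γ̄ are easy to approximate on a computer by discretizing the supports. The following 1D sketch (exponential variogram and support lengths are illustrative assumptions) checks that σ²(v|V) = γ̄(V, V) − γ̄(v, v) is positive:

```python
import numpy as np

def gamma(h):
    """Exponential variogram with sill 1 and scale parameter 1."""
    return 1.0 - np.exp(-np.abs(h))

def gamma_bar(length, n=200):
    """Numerical average of gamma(x - x') over a 1D support [0, length]."""
    x = (np.arange(n) + 0.5) * length / n
    return float(np.mean(gamma(x[:, None] - x[None, :])))

v, V = 1.0, 10.0
dispersion = gamma_bar(V) - gamma_bar(v)   # sigma^2(v|V)
```

Because γ̄ grows with the size of the support (while staying below the sill), the dispersion variance increases with the ratio of the two supports, in agreement with Table 8.1.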
Krige's relation
Starting from the formula of the dispersion variance, first we see that for the case of point values (denoted by a dot) the dispersion formula reduces to one term:
σ²(·|V) = γ̄(V, V).
Figure 8.3: A domain D partitioned into volumes V which are themselves partitioned into smaller volumes v.
Second, we notice that σ²(v|V) is the difference between the dispersion variances of the point values in V and in v:
σ²(v|V) = σ²(·|V) − σ²(·|v).
Figure 8.4: The distribution of block values is narrower than the distribution of values at the sample points.
In the polygon method each sample value is extended to its area of influence, neglecting the fact that the samples are obtained from pointwise measurements while the polygons represent a much larger support.
In the case of a square grid the polygons are square blocks v which partition the exploration area. The value at each grid node is extended to each area of influence v. The method implies that the distribution of average values of the blocks is the same as the distribution of the values at the sample points. From Krige's relation we know that this cannot be true: the distribution of the values for a support v is narrower than the distribution of point values (as represented on Figure 8.4) because the variance σ²(·|v) of the points in v generally is not negligible.
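A quick simulation illustrates the support effect (a sketch, not from the book; the moving-average model and block size are assumptions): averaging a correlated series over blocks narrows the distribution, in agreement with Krige's relation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlated "point" values: moving average of white noise on a fine 1D grid.
noise = rng.normal(size=10_009)
points = np.convolve(noise, np.ones(10) / 10.0, mode="valid")[:10_000]

# Group the points into blocks of 25 and average: the support v grows.
blocks = points.reshape(-1, 25).mean(axis=1)

var_points = float(np.var(points))   # dispersion of point values, s^2(.|V)
var_blocks = float(np.var(blocks))   # dispersion of block values, s^2(v|V) smaller
```

The block histogram is visibly narrower than the point histogram, which is exactly why the polygon method overstates the spread of block grades.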
In mining, the cut-off value defines a grade above which a mining block should be sent to production. Mining engineers are interested in the proportion of values above the cut-off, which represents the part of a geological body that is of economic interest. If the cut-off grade is a value substantially above the mean, the polygon method will lead to a systematic overestimation of the ore reserves as shown on Figure 8.5. To avoid systematic over- or underestimation the support effect needs to be taken into account.
Figure 8.5: The proportion of sample values above the cut-off value is greater than the proportion of block values: the polygon method leads to a systematic overestimation in this case.
Figure 8.6: Measurements of maximal noise level L_max (dots) and average noise level L_eq (solid line) on time intervals of 20 seconds during 3 hours and 40 minutes. They represent the exposure of a worker to the noise of a circular saw.
For the exponential model the variogram regularized over time intervals of length Δt is
γ_Δt(h) = (b a²/(Δt)²) ( 2h/a − 2 + 2 e^{−Δt/a} + e^{−h/a} (2 − e^{−Δt/a}) − e^{(h−Δt)/a} )
for 0 ≤ h ≤ Δt,
γ_Δt(h) = (b a²/(Δt)²) ( 2Δt/a − 2 + 2 e^{−Δt/a} − (e^{Δt/a} + e^{−Δt/a} − 2) e^{−h/a} )
for h > Δt.
Figure 8.7 shows the experimental variogram together with the exponential model regularized over time lags of 20 seconds, 1 and 5 minutes, illustrating
Figure 8.7: Experimental variogram of the acoustic power L_eq and a regularized exponential variogram model for time intervals of Δt = 20 s, 1 mn and 5 mn.
the effect of a modification of the support on the shape of the theoretical vari-
ogram.
Finally a curve of the dispersion variance of the acoustic power as a function of the integration time Δt is represented on Figure 8.8. The dispersion variance for an exponential model is calculated with the formula
σ²(·|Δt) = b ( 1 − 2a/Δt + (2a²/(Δt)²) (1 − e^{−Δt/a}) ).
Figure 8.8: Dispersion variance of the acoustic power as a function of the integration time Δt.
The concepts of estimation and dispersion variance can be used to compare three sampling designs with n samples:
A − regular grid: the n samples are placed at the centers of n cubic cells partitioning the domain V;
B − random grid: the n samples are taken at random within the domain V;
C − random stratified grid: the domain V is partitioned into n cubic cells v_α and one sample is taken at random within each cell.
For design A, with a regular grid the global estimation variance σ²_EG is computed as
σ²_EG = (1/n²) Σ_{α=1}^n E[ ( Z(x_α) − Z(v_α) )² ] = (1/n²) Σ_{α=1}^n σ²_E(x_α, v_α).
As the points x_α are at the centers x_c of cubes of the same size we have for design A
σ²_EG = (1/n) σ²_E(x_c, v).
For design B, the samples are supposed to be located at random in the domain (Poisson points). We shall consider one realization z with random coordinates X₁, X₂, X₃. The expectation will be taken over the coordinates. The global estimation variance is
s²_EG = E_X[ ( z*_V − z_V )² ] = E_X[ ( (1/n) Σ_{α=1}^n z(X₁^α, X₂^α, X₃^α) − z(V) )² ]
and, as the cross terms vanish for independent random locations,
s²_EG = (1/n²) Σ_{α=1}^n E_X[ ( z(X₁^α, X₂^α, X₃^α) − z(V) )² ].
We now write explicitly the expectation over the random locations distributed with probabilities 1/|V| over the domain:
s²_EG = (1/n²) Σ_{α=1}^n ∫∫∫ p(x₁^α, x₂^α, x₃^α) · ( z(x₁^α, x₂^α, x₃^α) − z(V) )² dx₁ dx₂ dx₃
      = (1/n²) Σ_{α=1}^n (1/|V|) ∫∫∫ ( z(x₁^α, x₂^α, x₃^α) − z(V) )² dx₁ dx₂ dx₃
      = (1/n²) Σ_{α=1}^n s²(·|V)
      = (1/n) s²(·|V).
Generalizing the formula from one realization z to the random function Z and taking the expectation (over Z), we have for design B
σ²_EG = (1/n) σ²(·|V).
For design C, each sample point is located at random within a cube v_α, and the global estimation variance for one realization z is
s²_EG = E_X[ ( z*_V − z_V )² ] = E_X[ ( (1/n) Σ_{α=1}^n ( z(X₁^α, X₂^α, X₃^α) − z(v_α) ) )² ] = (1/n) s²(·|v).
Randomizing z to Z and taking the expectation, we have for design C
σ²_EG = (1/n) σ²(·|v).
Comparing the random grid B with the random stratified grid C we know from Krige's relation that
σ²(·|v) ≤ σ²(·|V),
so the stratified design C has a lower global estimation variance than B. For design A it remains to compare σ²_E(x_c, v) with the dispersion variance σ²(·|v) = γ̄(v, v).
It turns out (see [50]) that the former is usually lower than the latter. The regular grid is superior to the random stratified grid from the point of view of the global estimation variance, as the samples cover the region evenly.
However for computing the experimental variogram, an advantage can be seen in using unequally spaced data: they will provide more information about small-scale variability than evenly placed samples. This helps in modeling the variogram near the origin.
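The ranking of the three designs can be reproduced by brute-force simulation. This is an illustrative 1D sketch; the grid sizes and the moving-average model are assumptions of the example:

```python
import numpy as np

rng = np.random.default_rng(7)

def realization(n_cells=1000):
    """A correlated 1D process on a fine grid (moving average of white noise)."""
    w = rng.normal(size=n_cells + 49)
    return np.convolve(w, np.ones(50) / 50.0, mode="valid")

n = 10   # samples; the domain of 1000 cells is split into n strata of 100 cells
err = {"regular": [], "random": [], "stratified": []}
for _ in range(300):
    z = realization()
    true_mean = z.mean()
    err["regular"].append((z[50::100].mean() - true_mean) ** 2)                 # design A
    err["random"].append((z[rng.integers(0, 1000, n)].mean() - true_mean) ** 2)  # design B
    strata = np.arange(n) * 100 + rng.integers(0, 100, n)
    err["stratified"].append((z[strata].mean() - true_mean) ** 2)               # design C
mse = {k: float(np.mean(v)) for k, v in err.items()}
```

The mean squared errors come out in the order regular ≲ stratified < random, matching the theoretical comparison of the global estimation variances.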
9 Measures and Plots of Dispersion
The tonnage T(z) recovered above a cut-off grade z is given by the integral of the distribution function F of the grades,
T(z) = ∫_z^{+∞} dF(u).
The quantity Q(z) of metal recovered above a cut-off grade z is given by the integral
Q(z) = ∫_z^{+∞} u dF(u).
The function Q(z) is decreasing. The value of Q(z) for z = 0 is the mean, because the grades are positive:
Q(0) = ∫_0^{+∞} u dF(u) = E[Z] = m.
Measures and Plots of Dispersion 63
Integrating by parts,
Q(z) = [ −u T(u) ]_z^{+∞} + ∫_z^{+∞} T(u) du
     = z T(z) + ∫_z^{+∞} T(u) du
     = C(z) + B(z).
The recovered quantity is thus split into two terms, which can be interpreted in the following way in mining economics (assuming that the cut-off grade is well adjusted to the economic context):
- the share C(z) of metal which reflects the investment, i.e. the part of the recovered metal that will serve to refund the investment necessary to mine and process the ore;
- the amount B(z) of metal, which represents conventional profit, i.e. the leftover of metal once the quantity necessary for refunding the investment has been subtracted.
These different quantities are all present on the plot of the function T(z) as shown on Figure 9.1. The conventional investment (assuming that the cut-off adequately reflects the economic situation) is a rectangular surface
C(z) = z T(z).
The conventional profit integrates to
∫_0^{+∞} B(u) du = ½ (m² + σ²),
where m is the mean and σ² is the variance of Z. If we subtract from the surface under B(z) the area corresponding to half the square m² we obtain a surface element whose value is half of the variance, as shown on Figure 9.2.
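These relations are straightforward to check on a sample of positive values; the sketch below (lognormal grades are an assumption of the example) verifies that the area under B(z) equals (m² + σ²)/2 up to discretization error.

```python
import numpy as np

rng = np.random.default_rng(3)
grades = rng.lognormal(mean=0.0, sigma=0.5, size=50_000)   # positive grades
m, var = float(grades.mean()), float(grades.var())

def T(z):
    """Tonnage above cut-off: proportion of values exceeding z."""
    return float(np.mean(grades > z))

def Q(z):
    """Quantity of metal above cut-off."""
    return float(np.mean(np.where(grades > z, grades, 0.0)))

def B(z):
    """Conventional profit: B(z) = Q(z) - z T(z)."""
    return Q(z) - z * T(z)

zs = np.linspace(0.0, float(grades.max()), 800)
dz = zs[1] - zs[0]
area_B = float(sum(B(z) for z in zs) * dz)   # Riemann sum of B over all cut-offs
```

Here Q(0) returns the mean m and the area under B(z) matches ½(m² + σ²), the geometric fact behind Figure 9.2.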
Note that the ratio Q(z)/T(z) yields the average of the values above cut-off.
Figure 9.1: The function T(z) of the extracted tonnage as a function of cut-off grade. The shaded surface under T(z) represents the quantity of metal Q(z) which is composed of the conventional investment C(z) and the conventional profit B(z).
Selectivity
The selectivity S of a distribution F is defined as
S = ½ E[ |Z − Z'| ]
where Z and Z' are two independent random variables with the same distribution
F. There is an analogy with the following formula for computing the variance
under the same conditions
σ² = ½ E[ (Z − Z')² ]
The selectivity and the variance are both measures of dispersion, but the
selectivity is more robust than the variance.
The selectivity divided by the mean is known in econometrics as the Gini
coefficient (see e.g. KENDALL & STUART [94])
Gini = S/m
and represents an alternative to the coefficient of variation σ/m. As S ≤ m the Gini coefficient is lower than one:
0 ≤ Gini ≤ 1.
Figure 9.2: Function B(z) of the conventional profit. The shaded surface under B(z) represents half of the variance σ².
For a uniform distribution the standard deviation and the selectivity are related by
σ = √3 · S.
For a Gaussian distribution with a standard deviation σ the selectivity is
S_Gauss = σ / √π,
which yields a value near to the limit case of the uniform distribution.
For a lognormally distributed variable Y = m exp(σZ − σ²/2) we have the selectivity
S_lognorm = m ( 2 G(σ/√2) − 1 ),
where G(z) is the standardized Gaussian distribution function.
The quantity of metal can also be written with the tonnage,
Q(z) = ∫_0^{+∞} min( T(z), T(u) ) du,
and, viewed as a function of the tonnage T,
Q(T) = ∫_0^{+∞} min( T, T(u) ) du.
From this it can be seen that the selectivity is zero when the distribution F is concentrated at one point (a Dirac measure). Finally we have
∫_0^1 ( Q(T) − m T ) dT = ½ ∫_0^{+∞} ∫_0^{+∞} |u − z| dF(u) dF(z) = ½ S
by definition.
Figure 9.3: The function Q(T) of the quantity of metal with respect to tonnage. The shaded surface between Q(T) and the diagonal line corresponds to half the value of the selectivity S.
Figure 9.4: Environmental time series Z(t) with a cut-off value z: the shaded part corresponds to the quantity of nuisance Q(z) = B(z) + C(z) during the time T(z) when the tolerance level is exceeded.
It is not clear what meaning the curve B(z) can have in environmental problems, except for the fact that its graph contains a geometric representation of (half of) the variance σ² when plotted together with the line joining the value of the mean m on both the abscissa and the ordinate, as suggested on Figure 9.2.
10 Kriging the Mean
The mean value of samples from a geographical space can be computed either using the arithmetic mean, or with a weighted average integrating the knowledge of the spatial correlation of the samples. The two approaches are compared.
No systematic bias
We need to assume that the mean exists at all points of the region:
E[ Z(x) ] = m   for all x ∈ D.
We want the estimation error, the difference m* − m between the estimated value m* and the true value m, to be zero on average:
E[ m* − m ] = 0,
i.e. we do not wish to have a systematic bias in our estimation procedure.
This can be achieved by constraining the weights w_α of the estimator m* = Σ_{α=1}^n w_α Z(x_α) to sum up to one,
Σ_{α=1}^n w_α = 1,
because then
E[ m* − m ] = E[ Σ_{α=1}^n w_α Z(x_α) − m ] = Σ_{α=1}^n w_α E[ Z(x_α) ] − m = m Σ_{α=1}^n w_α − m = 0.
The variance of the estimation error can be expressed in terms of the covariance function:
var(m* − m) = Σ_{α=1}^n Σ_{β=1}^n w_α w_β C(x_α − x_β).
The estimation variance is the sum of the cross-products of the weights assigned to the sample points. Each cross-product is weighted by the covariance between the corresponding sample points.
Kriging equations
To solve the optimization problem the partial derivatives of the objective function φ(w_α, μ) are set to zero:
∂φ(w_α, μ) / ∂w_α = 0   for α = 1, …, n
∂φ(w_α, μ) / ∂μ = 0.
σ²_KM = var(m* − m)
      = Σ_{α=1}^n Σ_{β=1}^n w_α^KM w_β^KM C(x_α − x_β)
      = Σ_{α=1}^n w_α^KM μ_KM = μ_KM Σ_{α=1}^n w_α^KM
      = μ_KM.
The variance of the kriging of the mean is thus given by the Lagrange multiplier μ_KM.
In the absence of spatial correlation (a pure nugget-effect model) the system yields equal weights,
w_α^KM = 1/n,
and the estimator of the kriging of the mean is equivalent to the arithmetic mean:
m* = Σ_{α=1}^n w_α^KM Z(x_α) = (1/n) Σ_{α=1}^n Z(x_α).
Ordinary kriging is the most widely used kriging method. It serves to estimate a value at a point of a region for which a variogram is known, using data in the neighborhood of the estimation location. Ordinary kriging can also be used to estimate a block value. With local second-order stationarity, ordinary kriging implicitly evaluates the mean in a moving neighborhood. To see this, first a kriging estimate of the local mean is set up, then a simple kriging estimator using this kriged mean is examined.
Figure 11.1: A domain with irregularly spaced sample points (black dots) and a location of interest x₀.
The estimation is performed by taking the values from the n neighboring sample points x_α and combining them linearly with weights w_α:
Z*(x₀) = Σ_{α=1}^n w_α Z(x_α).
Obviously we have to constrain the weights to sum up to one because in the extreme case when all data values are equal to a constant, the estimated value should also be equal to this constant.
We assume that the data are part of a realization of an intrinsic random function with a variogram γ(h).
Ordinary Kriging 75
For the estimation error to be zero on average we attach a weight w₀ = −1 to the estimation point x₀, so that
E[ Z*(x₀) − Z(x₀) ] = E[ Σ_{α=1}^n w_α Z(x_α) − Z(x₀) · Σ_{α=1}^n w_α ] = 0,
and the full set of n+1 weights sums up to zero:
Σ_{α=0}^n w_α = 0.
Thus the condition that the weights numbered from 1 to n sum up to one also implies that the use of the variogram is authorized in the computation of the variance of the estimation error.
COMMENT 11.1 The variogram is authorized for ordinary kriging, but not for
simple kriging, because the latter does not include a constraint on the weights.
Minimizing the estimation variance under the unit-sum constraint on the weights leads to the ordinary kriging system
( γ(x₁−x₁) … γ(x₁−xₙ) 1 ) ( w₁^OK )   ( γ(x₁−x₀) )
(     ⋮          ⋮     ⋮ ) (   ⋮   ) = (    ⋮     )
( γ(xₙ−x₁) … γ(xₙ−xₙ) 1 ) ( wₙ^OK )   ( γ(xₙ−x₀) )
(     1    …     1    0 ) ( μ_OK  )   (    1     )
where the w_α^OK are weights to be assigned to the data values and where μ_OK is the Lagrange parameter. The left hand side of the system describes the dissimilarities between the data points, while the right hand side shows the dissimilarities between each data point and the estimation point x₀.
Figure 11.2: A domain with irregularly spaced sample points (black dots) and a block v₀ for which a value should be estimated.
The ordinary kriging system can be rewritten in the form
Σ_{β=1}^n w_β^OK γ(x_α−x_β) + μ_OK = γ(x_α−x₀)   for α = 1, …, n
Σ_{β=1}^n w_β^OK = 1.
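The system in this form translates directly into code. The sketch below (the spherical variogram and the data values are assumptions of the example) also illustrates that ordinary kriging is an exact interpolator: at a data location the estimate returns the datum with zero variance.

```python
import numpy as np

def gamma_sph(h, b=1.0, a=3.0):
    """Spherical variogram with sill b and range a."""
    h = np.abs(h)
    g = b * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h <= a, g, b)

def ordinary_kriging(X, z, x0, gamma):
    """Solve the OK system in variogram form; return estimate and kriging variance."""
    n = len(X)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(X[:, None] - X[None, :])
    A[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = gamma(X - x0)
    sol = np.linalg.solve(A, rhs)
    w, mu = sol[:n], sol[n]
    est = float(w @ z)
    var = float(w @ rhs[:n] + mu)        # sigma^2_OK = sum_a w_a gamma(x_a - x0) + mu_OK
    return est, var, w

X = np.array([0.0, 1.0, 2.0, 4.0])
z = np.array([0.5, 0.9, 0.6, 1.2])
est, var, w = ordinary_kriging(X, z, 1.0, gamma_sph)   # estimate at a sample point
```

Moving x₀ off the data locations gives a weighted average with positive kriging variance, while the weights always sum to one.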
Block kriging
Ordinary kriging can be used to estimate a block value instead of a point value
as suggested by the drawing on Figure 11.2. When estimating a block value from
point values, the estimator is the linear combination Z*(v₀) = Σ_{α=1}^n w_α^BK Z(x_α) and the system becomes
( γ(x₁−x₁) … γ(x₁−xₙ) 1 ) ( w₁^BK )   ( γ̄(x₁, v₀) )
(     ⋮          ⋮     ⋮ ) (   ⋮   ) = (     ⋮      )
( γ(xₙ−x₁) … γ(xₙ−xₙ) 1 ) ( wₙ^BK )   ( γ̄(xₙ, v₀) )
(     1    …     1    0 ) ( μ_BK  )   (     1      )
where the right hand side now contains the average variogram γ̄(x_α, v₀) of each sample point with the block of interest.
The corresponding block kriging variance is
σ²_BK = μ_BK − γ̄(v₀, v₀) + Σ_{α=1}^n w_α^BK γ̄(x_α, v₀).
For a second-order stationary random function the block kriging variance can be written in terms of covariances:
σ²_BK = μ_BK + C̄(v₀, v₀) − Σ_{α=1}^n w_α^BK C̄(x_α, v₀).
Assuming the covariance function falls off to zero with distance (like the spherical or exponential functions), if we let the volume v₀ grow very large, the terms C̄(v₀, v₀) and C̄(x_α, v₀) vanish, i.e.
σ²_BK → σ²_KM   for |v₀| → ∞.
The same is true for the kriging systems and the estimates. Block kriging tends to be equivalent to the kriging of the mean for large blocks v₀.
Why not insert this estimated mean value into the simple kriging estimator
Z*_SKM(x₀) = m* + Σ_{α=1}^n w_α^SK ( Z(x_α) − m* ),
where the w_α^SK are the simple kriging weights? Replacing m* by the corresponding linear combination gives
Z*_SKM(x₀) = Σ_{α=1}^n w_α^KM Z(x_α) + Σ_{α=1}^n w_α^SK Z(x_α) − Σ_{α=1}^n w_α^SK Σ_{β=1}^n w_β^KM Z(x_β)
           = Σ_{α=1}^n w_α^SK Z(x_α) + ( 1 − Σ_{β=1}^n w_β^SK ) Σ_{α=1}^n w_α^KM Z(x_α).
Introducing a weight
w_m = 1 − Σ_{α=1}^n w_α^SK,
called the weight of the mean, we have
Z*_SKM(x₀) = Σ_{α=1}^n w_α^SK Z(x_α) + w_m m*.
Choosing to call μ' the product of the weight of the mean with the variance of the kriging of the mean, and putting the equations for the combined weights together, we indeed have an ordinary kriging system:
Σ_{β=1}^n w_β C(x_α−x_β) = C(x_α−x₀) + μ'   for α = 1, …, n
Σ_{β=1}^n w_β = 1,
which shows that ordinary kriging is identical with simple kriging based on the estimated mean: Z*_SKM(x₀) = Z*_OK(x₀).
The ordinary kriging variance has a corresponding decomposition.
Turning to the residual Y(x) = Z(x) − m, note that
E[ m · Y(x) ] = m E[ Y(x) ] = 0,
because m is a deterministic quantity.
We know how to estimate Z(x) by ordinary kriging and the mean m by a kriging of the mean. Now we would like to krige Y(x) at some location x₀ with the same type of weighted average:
Y*(x₀) = Σ_{α=1}^n w_α Z(x_α) = Σ_{α=1}^n w_α Y(x_α) + m Σ_{α=1}^n w_α   with Σ_{α=1}^n w_α = 0.
The condition on the weights has the effect of removing the mean.
The estimation variance var(Y*(x₀) − Y(x₀)) is identical to var(Z*(x₀) − Z(x₀)) as Y(x) has the same covariance function C(h) as Z(x). The system of the kriging of the residual (KR) is the same as for ordinary kriging, except for the condition on the sum of weights, which is zero instead of one.
It is of interest to note that the ordinary kriging system can be decomposed
into the kriging of the mean and the kriging of the residual:
( C(x₁−x₁) … C(x₁−xₙ) 1 ) ( w₁ )   ( 0 )   ( C(x₁−x₀) )
(     ⋮          ⋮     ⋮ ) (  ⋮ ) = ( ⋮ ) + (    ⋮     )
( C(xₙ−x₁) … C(xₙ−xₙ) 1 ) ( wₙ )   ( 0 )   ( C(xₙ−x₀) )
(     1    …     1    0 ) (  μ )   ( 1 )   (    0     )
where the first right hand side is that of the kriging of the mean and the second that of the kriging of the residual.
The weights of ordinary kriging are composed of the weights of the krigings of the mean and the residual, as are the Lagrange multipliers.
Cross validation
Cross validation is a simple way to compare various assumptions either about
the model (e.g. the type of variogram and its parameters, the size of the kriging
neighborhood) or about the data (e.g. values that do not fit their neighborhood
like outliers or pointwise anomalies).
In the cross validation procedure each sample value Z(x_α) is removed in turn from the data set and a value Z*(x_[α]) at that location is estimated using the n−1 other samples. The square brackets around the index α symbolize the fact that the estimation is performed at location x_α excluding the sampled value Z_α.
The difference between a data value and the estimated value,
Z(x_α) − Z*(x_[α]),
gives an indication of how well the data value fits into the neighborhood of the surrounding data values.
If the average of the cross-validation errors is not far from zero,
(1/n) Σ_{α=1}^n ( Z(x_α) − Z*(x_[α]) ) ≅ 0,
we can say that there is no apparent bias, while a significant negative (or positive) average error can represent systematic overestimation (respectively underestimation).
The kriging standard deviation σ_[α] represents the error predicted by the model when kriging at the location x_α (omitting the sample at that location). Dividing the cross-validation error by σ_[α] allows one to compare the magnitudes of both the actual and the predicted error:
( Z(x_α) − Z*(x_[α]) ) / σ_[α].
Geometry
The very first aspect to discuss about kriging weights is the geometry of the sample points and the estimation point. Let us take a lozenge at the corners of which 4 samples are located. We wish to estimate a value at the center of the lozenge.
Figure 12.1: Ordinary kriging weights when using a nugget-effect model with samples located at the corners of a lozenge and the estimation point at its center: each sample receives a weight of 25%. The kriging variance is σ²_OK = 1.25 σ².
With a nugget-effect model, when the estimation point does not coincide with a data point, the ordinary kriging variance is equal to
σ²_OK = μ_OK + σ² = σ²/n + σ².
It is larger than the variance σ².
Kriging Weights 83
Figure 12.2: With a spherical model of range a/L = 2 the remote samples get lower weights (9.4% against 40.6% for the near samples). The ordinary kriging variance is σ²_OK = .84 σ².
Using a Gaussian covariance model with range parameter a/L = 1.5, the remote samples have almost no influence on the estimated value at the center of the lozenge, as shown on Figure 12.3.
Figure 12.3: Using a Gaussian model with a range parameter a/L = 1.5 the remote samples have very little influence on the estimated value (weights of 0.2% against 49.8%). The ordinary kriging variance is σ²_OK = .30 σ².
Figure 12.4: Four samples at the corners of a square with a central estimation point: with an isotropic model all weights are equal to 25% (left); with an anisotropic model the weights are 32.4% in the direction of the longer range and 17.6% in the direction of the shorter range (right).
Geometric anisotropy
On Figure 12.4 a configuration of four samples at the corners of a square with a central estimation point is used, taking alternately an isotropic (on the left) and an anisotropic model (on the right).
On the left, the isotropic model generates equal weights with this configuration where all sample points are symmetrically located with respect to the estimation point.
On the right, a spherical model with a range of a/L = 1.5 in the horizontal direction and a range of a/L = .75 in the vertical direction is used as the anisotropic model. The weights are weaker in the direction with the shorter range and this matches intuition.
Figure 12.5: Three samples on a circle around the estimation location (on the left σ²_OK = .45 σ² and on the right σ²_OK = .48 σ²).
Figure 12.6: On the left, two samples have been set near to each other at the top (σ²_OK = .526 σ²). On the right, the two upper samples have been merged into one single top sample (σ²_OK = .537 σ²).
Screen effect
A remarkable feature of kriging, which makes it different from other interpolators, is that a sample can screen off other samples located behind it with respect to the estimation location. An example of this phenomenon is given on Figure 12.7, showing two 1D sampling configurations. A spherical model of range parameter a/L = 2 was used, where L is the distance from the point A to the estimation location.
At the top of Figure 12.7, two samples A and B are located at different distances from the estimation point and get different ordinary kriging weights of
Figure 12.7: At the top, we have two samples at different distances from the estimation point (σ²_OK = 1.14 σ²). At the bottom, a third sample has been added to the configuration (σ²_OK = .87 σ²).
65.6% and 34.4% according to their proximity to that point. The kriging variance is σ²_OK = 1.14 σ².
At the bottom of Figure 12.7, a third sample C has been added to the configuration: the weight of B drops down to 2.7% and almost all weight is distributed fairly equally between A and C, which are at the same distance from the estimation point. The ordinary kriging variance is brought down to σ²_OK = .87 σ².
An interpolator based on weights built on the inverse of the distance of a sample to the estimation point does not show screen effects. The weights are shaped without taking any account of the varying number of samples in different directions around the estimation point.
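For comparison, inverse-distance weights for the bottom configuration of Figure 12.7 can be computed in a few lines (the exact coordinates — A at −1, C at +1, B at +2, estimation point at 0 — are assumptions for illustration):

```python
import numpy as np

def idw_weights(X, x0, power=1.0):
    """Inverse-distance weights: proportional to 1/|x - x0|^power, normalized to sum to 1."""
    w = 1.0 / np.abs(X - x0) ** power
    return w / w.sum()

X = np.array([-1.0, 1.0, 2.0])    # A, C, B relative to the estimation point at 0
w = idw_weights(X, 0.0)           # B keeps a substantial weight: no screening
```

B retains a weight of 20% regardless of C standing between it and the estimation point, whereas ordinary kriging reduced B's weight to 2.7%.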
The Gaussian covariance function
C(h) = e^{−|h|²}
can be factorized with respect to the two spatial coordinates:
C(h) = e^{−h₁²} · e^{−h₂²}.
Figure 12.8: Simple kriging with a Gaussian covariance model: samples A, B, C
screen off D, E, F (in simple kriging the weights are unconstrained and do not add up
to 100%).
Figure 12.9: Simple kriging with a Gaussian covariance model: adding the point G radically changes the distribution of weights (the weight of F becomes negative: −2.4%).
Figure 13.1: To make a map, a regular grid is built for the region of interest and each
node becomes the node Xo in turn.
evaluate the precision of the estimation in any part of the region. It is a useful
compendium of the map of the kriged values. It should be understood that the
kriging variance is primarily a measure of the density of information around the
estimation point.
The white squares on Figure 13.3 correspond to nodes which coincide with a sample location: the kriging variances are zero at these points where exact interpolation takes place.
For a better visual understanding of the spatial structure of the kriged values, the grid of regularly spaced values is generally submitted to a second interpolation step. One possibility is to refine the initial grid and to interpolate the kriged values linearly on the finer grid, as on the raster representation on Figure 13.4. The result is a smoother version of the raster initially shown on Figure 13.2.
Another way to represent the results of kriging in a smooth way is to contour them with different sets of isolines, as shown on Figure 13.5 (the arsenic data has been standardized before kriging: this is why part of the values are negative).
Neighborhood
It may be asked whether it is necessary to involve all samples in the estimation procedure. More precisely, with a spherical covariance model in mind, as the covariance is zero for distances greater than the range, is it necessary to involve samples which are far away from a location of interest x0?
It seems appealing to draw a circle (sphere) or an ellipse (ellipsoid) around x0 and to consider only samples which lie within this area (volume) as neighbors
of Z(x0). Locations whose covariance with x0 is zero are uncorrelated and the corresponding samples have no direct influence on Z(x0). Thus it makes sense to try to define for each location x0 a local neighborhood of the samples nearest to x0.
At first sight, with the spherical covariance model in mind, the radius of the neighborhood centered on x0 could be set equal to the range of the covariance function and samples further away could be excluded, as represented on Figure 13.6. However, the spherical covariance function expresses only the direct covariances between points, but not the partial covariances of the estimation point with a data point conditionally on the other sample points. The partial correlations are reflected in the kriging weights, which are in general not zero for points beyond the range from the estimation location.
In practice, because of the generally highly irregular spatial arrangement and density of the data, the definition of the size of a local neighborhood is not straightforward. A criterion in common use for data on a regular mesh is to check, for a given estimation point, whether adding more samples leads to a significant decrease of the kriging variance, i.e. an increase in the precision of the estimation. Cross validation criteria can also be used to compare different neighborhoods.
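A minimal sketch of such a circular neighborhood search (the function name, the radius and the random sample layout are illustrative assumptions):

```python
import numpy as np

def local_neighborhood(data_xy, x0, radius, max_samples=None):
    """Select the samples lying within a circle of given radius around x0,
    optionally keeping only the max_samples nearest ones."""
    d = np.linalg.norm(data_xy - x0, axis=1)
    idx = np.where(d <= radius)[0]
    idx = idx[np.argsort(d[idx])]            # nearest samples first
    if max_samples is not None:
        idx = idx[:max_samples]
    return idx

rng = np.random.default_rng(0)
data_xy = rng.uniform(0, 10, size=(50, 2))   # irregular sample locations
x0 = np.array([5.0, 5.0])
neighbors = local_neighborhood(data_xy, x0, radius=2.0)
print(len(neighbors), "samples selected")
```

In a real implementation one would rerun the kriging with growing `radius` or `max_samples` and stop when the kriging variance no longer decreases significantly.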
Figure 13.4: Smoothed version of the raster representation of arsenic: a finer grid is used and the kriged values are interpolated linearly on it.
Figure 13.5: Isoline representation of kriged values of arsenic: the kriged values at the grid nodes are contoured with a set of thin and thick isolines (arsenic data was reduced to zero mean and unit variance before kriging).
Figure 13.6: Around a location of interest x0 of the grid a circular neighborhood is defined and the samples within the circle are selected for kriging.
14 Linear Model of Regionalization
Spatial anomalies
In geochemical prospecting or in environmental monitoring of soil the aim is to detect zones with a high average grade of mineral or pollutant. A spatial anomaly with a significant grade and a relevant spatial extension needs to be detected.
We now discuss the case of geochemical exploration.
When prospecting for mineral deposits a distinction can be made between
geological, superficial and geochemical anomalies as illustrated on Table 14.2.
A deposit is a geological anomaly with a significant average content of a
given raw material and enough spatial extension to have economic value. Buried
at a specific depth, the geological body may be detectable at the surface by
indices, the superficial anomalies, which can be isolated points, linear elements
or dispersion haloes.
Geological and superficial anomalies correspond to a vision of the geological phenomenon in its full continuity. Yet in prospecting only a discrete perception of the phenomenon is possible, through samples spread over the region. A superficial anomaly can be apprehended by one or several samples, or it can escape the grip of the prospector when it is located between sample points.
A geochemical anomaly, in the strict sense, only exists at the sample points and we can distinguish between:
GEOLOGICAL (3D) → SUPERFICIAL (2D) → GEOCHEMICAL (samples)
Figure 14.2: Variogram of the arsenic measurements fitted with a nested model consisting of a nugget-effect plus two spherical models of 3.5 km and 6.5 km range.
Intrinsic regionalization

In the same way an intrinsic regionalization is defined as a sum of S + 1 components

Z(x) = Σ_{u=0}^{S} Z_u(x)
can be termed locally stationary (i.e. locally second-order stationary), the expectation of Z(x) being approximately equal to a constant inside any rather small neighborhood of the domain.
The locally stationary regionalization model is an adequate framework for the typical application with the following three steps:
1. the experimental variogram is computed within a local neighborhood;
2. only covariance functions (i.e. bounded variogram functions) are fitted to the experimental variogram;
3. ordinary kriging is performed in a moving neighborhood.
In the first step the experimental variogram filters out m₁(x) at short distances, because it is approximately equal to a constant in a local neighborhood. For distances up to the radius of the neighborhood the variogram thus estimates well the underlying covariance functions, which are fitted in the second step. In the third step the ordinary kriging assumes implicitly the existence of a local mean within a moving neighborhood. This local mean corresponds to the constant to which m₁(x) is approximately equal within not too large a neighborhood. The local mean can be estimated explicitly by a kriging of the mean.
15 Kriging Spatial Components
= Σ_{α=1}^{n} w_α^S Σ_{u≠S} E[Z_u(x_α)] + E[ Σ_{α=1}^{n} w_α^S (Z_S(x_α) − Z_S(x_0)) ] = 0

both expectations being zero.
Remember that the constraint of unit sum weights is also needed for the
existence of a variogram as a conditionally negative definite function as explained
in the presentation of ordinary kriging.
The estimation variance σ²_E is

σ²_E = var( Z_S*(x_0) − Z_S(x_0) )
     = E[ ( Σ_{u=0}^{S−1} Σ_{α=1}^{n} w_α^S Z_u(x_α) + Σ_{α=1}^{n} w_α^S ( Z_S(x_α) − Z_S(x_0) ) )² ]
σ²_E = Σ_{u=0}^{S−1} Σ_{α=1}^{n} Σ_{β=1}^{n} w_α^S w_β^S C_u(x_α−x_β) + ...

subject to the constraint on the weights

Σ_{β=1}^{n} w_β^S = 1
The difference with ordinary kriging is that the terms γ_S(x_α−x_0), which are specific to the component Z_S(x), appear in the right hand side.
Figure 15.1: Map of the component associated with the short range (3.5 km).
The unbiasedness is checked in the same way as before:

E[ Z_{u0}*(x_0) − Z_{u0}(x_0) ] = Σ_{u≠u0} Σ_{α=1}^{n} w_α^{u0} E[Z_u(x_α)] + E[ Σ_{α=1}^{n} w_α^{u0} Z_{u0}(x_α) − Z_{u0}(x_0) ] = 0

where both expectations are zero. The estimation variance is

σ²_E = C_{u0}(x_0−x_0) − Σ_{α=1}^{n} Σ_{β=1}^{n} w_α^{u0} w_β^{u0} γ(x_α−x_β) + 2 Σ_{α=1}^{n} w_α^{u0} γ_{u0}(x_α−x_0)
The kriging system for the component Z_{u0}(x_0) is then

Σ_{β=1}^{n} w_β^{u0} γ(x_α−x_β) + μ_{u0} = γ_{u0}(x_α−x_0)   for α = 1, ..., n

Σ_{β=1}^{n} w_β^{u0} = 0
We are thus able to extract from the data different components of the spatial
variation which were identified on the experimental variogram with a nested
variogram model.
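The system above can be sketched numerically. The one-dimensional layout, the nested model (two sphericals, with ranges echoing the 3.5 km and 6.5 km of Figure 14.2) and the function names are illustrative assumptions; the point is the sum-zero constraint that distinguishes component kriging from ordinary kriging:

```python
import numpy as np

def spherical(h, c, a):
    """Spherical variogram with sill c and range a."""
    h = np.minimum(h, a)
    return c * (1.5 * h / a - 0.5 * (h / a) ** 3)

def krige_component(x, x0, sills, ranges, u0):
    """Weights for kriging the component u0 of a nested model
    gamma(h) = sum_u spherical(h; sills[u], ranges[u]),
    with the sum-zero constraint on the weights."""
    n = len(x)
    d = np.abs(x[:, None] - x[None, :])        # 1-D distances for simplicity
    gamma = sum(spherical(d, c, a) for c, a in zip(sills, ranges))
    g0 = spherical(np.abs(x - x0), sills[u0], ranges[u0])
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = gamma                          # gamma(x_alpha - x_beta)
    A[:n, n] = A[n, :n] = 1.0                  # Lagrange multiplier row/column
    b = np.append(g0, 0.0)                     # right hand side; weights sum to 0
    sol = np.linalg.solve(A, b)
    return sol[:n]

x = np.array([0.0, 1.0, 2.5, 4.0, 7.0])
w = krige_component(x, x0=2.0, sills=[1.0, 0.5], ranges=[3.5, 6.5], u0=0)
print(w, w.sum())                              # weights sum to zero
```

Only the variogram of the selected component appears in the right hand side `g0`, exactly as in the system above.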
These component kriging techniques bear some analogy to the spectral analysis methods, which are much in use in geophysics. CHILES & GUILLEN (33) have compared the two approaches on gravimetric data using covariance functions derived from a model which represents geological bodies as random prisms (see (177), (176)). The results obtained from decomposing either the variogram or the spectrum matched well, even though the geological bodies were located at different average depths in each method. An advantage of the geostatistical approach is that the data need not be interpolated on a regular grid for computing the experimental variogram, while this is necessary for determining the spectrum.
Filtering
and obtain a filtered version Z_f(x) of Z(x). To achieve this by kriging, we use weights w_α^f in the linear combination

Z_f*(x_0) = Σ_{α=1}^{n} w_α^f Z(x_α)
Figure 15.2: Map of the component associated with the long range (6.5 km) together with the estimated mean.
with the constraint

Σ_{β=1}^{n} w_β^f = 1

where C₁(h) is absent from the right hand side. We obtain thereby an estimation in which the component Z₁(x) is removed.
The filtered estimate can also be computed by subtracting the kriged component from the ordinary kriging estimate

Z_f*(x) = Z_OK*(x) − Z₁*(x)
According to the configuration of data around the estimation point, ordinary
kriging implicitly filters out certain components of the spatial variation. This
topic will be taken up again in the next chapter when examining why kriging
gives a smoothed image of reality.
Loire river has already been shown on Figure 14.1 (page 96). The values of
arsenic were standardized before analysis. The mean experimental variogram of
this variable was fitted on Figure 14.2 (page 97) with a nested variogram model
How smooth are estimated values from kriging with irregularly spaced data in a
moving neighborhood? By looking at a few typical configurations of data around
nodes of the estimation grid and by building on our knowledge of how spatial
components are kriged, we can understand the way the estimated values are
designed in ordinary kriging.
The sensitivity of kriging to the choice of the variogram model is discussed in connection with an application on topographic data.
Figure 16.1: No data point is within the range of the two spherical models for this grid node.
as all distances involved are greater than a₂. The information transferred to x0 will be an estimate of the local mean in the neighborhood.
This shows that kriging is very conservative: it will only give an estimate of the mean function when data is far away from the estimation point.
2. x0 is less than a₂ and more than a₁ away from the nearest data point x_α

With this spatial arrangement, see Figure 16.2, the ordinary kriging is equivalent to a filtering of the components Z₀(x) and Z₁(x), with the equation system

Σ_{β=1}^{n} λ_β C(x_α−x_β) − μ = C₂(x_α − x_0)   for α = 1, ..., n

Σ_{β=1}^{n} λ_β = 1

The covariances C₀(h) and C₁(h) are not present in the right hand side for kriging at such a grid node.
3. x0 is less than a₁ away from, but does not coincide with, the nearest x_α

In this situation, shown on Figure 16.3, ordinary kriging will transfer information not only about the long range component Z₂(x), but also about the short range component Z₁(x), which varies more quickly in space. This creates a more detailed description of the regionalized variable in areas of the map where data
Figure 16.2: One data point is within the range of the second spherical model for this grid node.
The corresponding equation system is

Σ_{β=1}^{n} λ_β C(x_α−x_β) − μ = C₁(x_α − x_0) + C₂(x_α − x_0)   for α = 1, ..., n

Σ_{β=1}^{n} λ_β = 1

Only the nugget-effect covariance C₀(h) is absent from the right hand side when kriging at such a location.
In this last case, when x0 = x_α for a particular value of α, ordinary kriging does not filter anything and restitutes the data value measured at the location x_α = x0: it is an exact interpolator!
This accuracy may nevertheless be regarded as a nuisance when there are only a few locations of coincidence between the grid and the data locations, because the nugget-effect component will generate spikes in the estimated values at these locations. To avoid having a map with discontinuities at fortuitous locations, it is often advisable to filter out the nugget-effect component Z₀(x) systematically in cartographical applications.
Figure 16.3: One data point is within the range of the two spherical models for this node.
2. continuous, but not differentiable. For example a linear behavior near the
origin implies that the variogram is not differentiable at the origin. It has
the effect that the kriging estimator is not differentiable at the data loca-
tions and that the kriged surface is linear in the immediate neighborhood
of the data points.
model belongs to the family of stable variogram models (that bear this name because of their analogy with the characteristic function of the stable distributions)

γ(h) = b (1 − exp(−|h|^p / a))   with 0 < p ≤ 2 and a, b > 0

For p = 2 we obtain the Gaussian variogram model

γ(h) = b (1 − exp(−|h|² / a))
which is infinitely differentiable at the origin. The implication is that the asso-
ciated random function also has this property. This means that the function is
analytic and, if known over a small area, it could be predicted just anywhere as
all derivatives are known. Clearly this model is unrealistic in most applications
(besides the fact that it is purely deterministic). Furthermore the use of such
an unrealistic model has consequences on kriging which are illustrated in the
following example.
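A sketch of the stable family (the parameter values are illustrative):

```python
import numpy as np

def stable_variogram(h, b=1.0, a=1.0, p=1.5):
    """Stable variogram gamma(h) = b * (1 - exp(-|h|**p / a)), 0 < p <= 2.
    p = 2 gives the (infinitely differentiable) Gaussian model,
    p = 1 the exponential model with linear behavior at the origin."""
    assert 0 < p <= 2
    return b * (1.0 - np.exp(-np.abs(h) ** p / a))

h = np.linspace(0, 3, 7)
print(stable_variogram(h, p=2.0))   # Gaussian: parabolic near the origin
print(stable_variogram(h, p=1.0))   # exponential: linear near the origin
```

Plotting the two curves near h = 0 shows why the choice of p controls the smoothness of the kriged surface.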
Figure 16.4: On the left: map of elevations obtained by ordinary kriging with unique neighborhood using a stable variogram (power p = 1.5) (data points are displayed as stars of size proportional to the elevation). On the right: map of the corresponding standard deviations.
Multivariate Analysis
17 Principal Component Analysis
(1/n) YᵀY = (1/n) Yᵀ Z A
∂φ₁/∂a₁ = 0  ⟺  2 V a₁ − 2 λ₁ a₁ = 0

we see that λ₁ is an eigenvalue of the variance-covariance matrix and that a₁ is equal to the eigenvector q₁ associated with this eigenvalue

V q₁ = λ₁ q₁
The eigenvalues indicate the amount of the total variance associated with each factor and the ratio

variance of the factor / total variance = λ_p / tr(V)

gives a numerical indication, usually expressed in %, of the importance of the factor.
Generally it is preferable to standardize the variables (subtracting the means
and dividing by the standard deviations), so that the principal component anal-
ysis is performed on the correlation matrix R. In this framework, when an
eigenvalue is lower than 1, we may consider that the associated factor has less
explanatory value than any single variable, as its variance is inferior to the unit
variance of each variable.
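The whole procedure (standardize, form R, extract eigenvalues and their variance shares) can be sketched as follows; the random data are only a stand-in:

```python
import numpy as np

def pca_correlation(Z):
    """PCA on the correlation matrix: standardize the data, form
    R = (1/n) Z'Z, and return the eigenvalues (factor variances),
    the eigenvectors, and the share of the total variance tr(R) = N
    extracted by each factor."""
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    n = Z.shape[0]
    R = Zs.T @ Zs / n
    lam, Q = np.linalg.eigh(R)
    order = np.argsort(lam)[::-1]          # largest eigenvalue first
    lam, Q = lam[order], Q[:, order]
    return lam, Q, lam / np.trace(R)

rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 4))
lam, Q, share = pca_correlation(Z)
print(share)                               # proportions of total variance, sum 1
```

A factor whose eigenvalue is below 1 then extracts less than the 1/N share contributed by any single standardized variable.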
          Humerus   Ulna    Tibia   Femur
Humerus   1
Ulna      .940      1
Tibia     .875      .877    1
Femur     .878      .886    .924    1

Table 17.3: Correlation matrix of the lengths of fowl bones.
where m_i and s_i are the mean and the standard deviation of the variable z_i. The variance-covariance matrix associated with standardized data is the correlation matrix

R = (1/n) ZᵀZ

which can be decomposed using its eigensystem as

R = Q Λ Qᵀ
The vectors a_i, columns of Aᵀ, are remarkable in the sense that they indicate the correlations between a variable z_i and the factors y_p. The vectors a_i are of unit length and their cross product is equal to the correlation coefficient

a_iᵀ a_j = ρ_ij

Owing to their geometry the vectors a_i can be used to represent the position of the variables on the surface of the unit hypersphere centered at the origin. The correlation coefficients ρ_ij are the cosines of the angles between the vectors referring to two different variables.
The projection of the position of the variables on the surface of the hypersphere towards a plane defined by a pair of axes of factors yields a graphical representation called the circle of correlations. The circle of correlations shows the proximity of the variables inside a unit circle and is useful to evaluate the affinities and the antagonisms between the variables. Statements can easily be made about variables which are located near the circumference of the unit circle.
Figure 17.1: The position of the length variables inside the correlation circle. Each axis is labeled with the number of the factor and the proportion of the total variance (in %) extracted by the factor.
EXAMPLE 17.1 The correlation matrix of the lengths of fowl bones (see MORRISON, 1978, p. 282) on Table 17.3 seems trivial to interpret at first look: all variables are well correlated. Big birds have large bones and small birds have small bones.
Inspection of the plane of the first two factors on Figure 17.1, showing the position of the length variables inside the correlation circle, reveals two things. The correlations of all variables with the first factor (92% of the total variance) are all very strong and of the same sign. This factor represents the difference in size of the fowls, which explains most of the variation and has an obvious impact on all bone lengths. The second factor (4% of the variance) splits the group of four variables into two distinct subgroups. The second factor is due to the difference in shape of the birds: the leg bones (tibia and femur) do not grow in the same manner as the wing bones (humerus and ulna). Some birds have shorter legs or longer wings than others (and vice-versa).
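The figures quoted above can be checked directly by an eigendecomposition of the correlation matrix of Table 17.3 (a sketch; the sign of an eigenvector is arbitrary, so the factor axes may come out flipped):

```python
import numpy as np

# Correlation matrix of the fowl bone lengths (Table 17.3):
# order Humerus, Ulna, Tibia, Femur
R = np.array([[1.000, 0.940, 0.875, 0.878],
              [0.940, 1.000, 0.877, 0.886],
              [0.875, 0.877, 1.000, 0.924],
              [0.878, 0.886, 0.924, 1.000]])

lam, Q = np.linalg.eigh(R)
lam, Q = lam[::-1], Q[:, ::-1]            # largest eigenvalue first
share = lam / lam.sum()
print(np.round(100 * share, 1))           # the first (size) factor extracts ~92%

# correlations of the variables with the factors: a_ip = sqrt(lambda_p) * q_ip
A = Q * np.sqrt(lam)
print(np.round(A[:, :2], 2))              # coordinates in the correlation circle
```

The second column of `A` separates the leg bones from the wing bones, reproducing the shape factor of Figure 17.1.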
Figure 17.2: Unit circle showing the correlation of seven soil pollution variables with the first two principal components.
Having a second look at the correlation matrix on Table 17.3, it is readily seen that the correlations among the wing variables (.94) and among the leg variables (.92) are stronger than the rest. Principal component analysis does not extract anything new or hidden from the data. It is merely a help to read a correlation matrix. This feature becomes especially useful for large correlation matrices.
EXAMPLE 17.2 In a study on soil pollution data, the seven variables Pb, Cd, Cr, Cu, Ni, Zn, Mo were logarithmically transformed and standardized. The principal components calculated on the correlation matrix extracted 38%, 27%, 15%, 9%, 6%, 3% and 2% of the total variance. So the first two factors represent about two thirds (65%) of the total variance. The third factor (15%) hardly explains more variance than any of the original variables taken alone (1/7 = 14.3%).
Figure 17.2 shows the correlations with the first two factors. Clearly the first factor exhibits, like in the previous biological example, mainly a size effect. This is more appropriately termed a dilution factor, because the measured elements constitute a very small proportion of the soil and their proportion in a soil sample is directly proportional to the overall pollution of the soil. The second factor looks interesting: it shows the antagonism between two pairs of elements: Pb, Cd and Ni, Cr.
      Pb     Cd     Zn     Cu     Mo     Ni     Cr
Pb    1
Cd    .40    1
Zn    .59    .54    1
Cu    .42    .27    .62    1
Mo    .20    .19    .30    -.07   1
Ni    -.04   -.08   .40    .29    .07    1
Cr    -.14   -.11   .29    .11    .13    .86    1

Table 17.4: Correlation matrix between the seven soil pollution elements.
The correlation coefficients between the seven soil pollution elements are listed on Table 17.4. The ordering of the variables was chosen to be the same as on the second factor: Pb, Cd, Zn, Cu, Mo, Ni and Cr (from left to right on Figure 17.2). This exemplifies the utility of a PCA as a device to help read a correlation matrix.
In this data set of 100 samples there are three multivariate outliers that could be identified with a 3D representation of the sample cloud in the coordinate system of the first three factors (which concentrate 80% of the variance). On a modern computer screen the cloud can easily be rotated into a position that best shows one of its features, as on Figure 17.3.
This need for an additional rotation illustrates the fact that PCA only provides an optimal projection plane for the samples if the cloud is of ellipsoidal shape.
In the general case of non-standardized data it is possible to build a graph showing the correlations between the set of variables and a pair of factors. The variance-covariance matrix V is multiplied from the left and the right by the matrix D_{σ⁻¹} of the inverses of the standard deviations (see Example 1.4). In the standardized case we have

(1/n) ZᵀY = R Q = Q Λ   with QᵀQ = I

so that the correlation coefficient between a variable z_i and a factor y_p is

corr(z_i, y_p) = (√λ_p / √σ_ii) q_ip
Figure 17.3: The cloud of 100 samples in the coordinate system of the first three factors after an appropriate rotation on a computer screen ("Pitch", "Roll" and "Yaw" are the zones of the computer screen used by the pointer to activate the 3D rotation of the cloud). Three outliers (with the corresponding sample numbers) can be seen.
EXERCISE 17.4 Let Z be a matrix of standardized data and Y the matrix of corresponding factors obtained by a principal component analysis Y = ZQ, where Q is the matrix of orthogonal eigenvectors of the correlation matrix R.
Show that

R = corr(Z, Y) [corr(Z, Y)]ᵀ
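A quick numerical check of this identity (the random data are a stand-in, and `np.corrcoef` stands in for corr):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))   # correlated data
Z = (X - X.mean(0)) / X.std(0)                            # standardized data
n = Z.shape[0]
R = Z.T @ Z / n

lam, Q = np.linalg.eigh(R)
Y = Z @ Q                                                  # the factors

# corr(Z, Y): correlation of each variable with each factor
C = np.array([[np.corrcoef(Z[:, i], Y[:, p])[0, 1]
               for p in range(3)] for i in range(3)])
print(np.allclose(R, C @ C.T))   # True: R = corr(Z,Y) corr(Z,Y)^T
```

The identity follows from corr(z_i, y_p) = √λ_p q_ip, so that C Cᵀ = Q Λ Qᵀ = R.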
18 Canonical Analysis
In the previous chapter we have seen how principal component analysis is used to determine linear orthogonal factors underlying a set of multivariate measurements. Now we turn to the more ambitious problem of comparing two groups of variables by looking for pairs of orthogonal factors which successively assess the strongest possible links between the two groups.
The data matrix is split into two groups of variables

Z = [Z₁ | Z₂]

and we look for pairs of factors

u_p = Z₁ a_p   and   v_p = Z₂ b_p

which are uncorrelated within their respective groups and for which the correlation corr(u_p, v_p) is maximal. This correlation, called the canonical correlation between two factors u and v, is

corr(u, v) = aᵀ Z₁ᵀ Z₂ b / ( √(aᵀ Z₁ᵀ Z₁ a) · √(bᵀ Z₂ᵀ Z₂ b) )

With the normation

aᵀ V₁₁ a = bᵀ V₂₂ b = 1

this becomes

corr(u, v) = aᵀ C₁₂ b
For the complete system of vectors X and Y we have

Xᵀ G Y = Σ

that is to say the singular value decomposition

G = X Σ Yᵀ

where the canonical correlations μ, which are the only non-zero elements of Σ, turn out to be the singular values of the rectangular matrix G.
φ = aᵀ C₁₂ b − ½ μ (aᵀ V₁₁ a − 1) − ½ μ′ (bᵀ V₂₂ b − 1)
What is the relation between μ and μ′? Premultiplying the first equation by aᵀ and the second by bᵀ

aᵀ C₁₂ b − μ aᵀ V₁₁ a = 0  ⟺  μ = aᵀ C₁₂ b
bᵀ C₂₁ a − μ′ bᵀ V₂₂ b = 0  ⟺  μ′ = bᵀ C₂₁ a

we see that

μ = μ′ = corr(u, v)
Having found earlier that

C₁₂ b = μ V₁₁ a   and   C₂₁ a = μ V₂₂ b

we premultiply the first equation by C₂₁ V₁₁⁻¹

C₂₁ V₁₁⁻¹ C₁₂ b = μ C₂₁ a

and taking into account the second equation we have

C₂₁ V₁₁⁻¹ C₁₂ b = μ² V₂₂ b

Finally this results in two eigenvalue problems, defining λ = μ²

V₂₂⁻¹ C₂₁ V₁₁⁻¹ C₁₂ b = λ b
V₁₁⁻¹ C₁₂ V₂₂⁻¹ C₂₁ a = λ a
Only the smaller one of the two systems needs to be solved. The solution of the other one can then easily be obtained on the basis of the following transition formulae between the transformation vectors

a = (1/√λ) V₁₁⁻¹ C₁₂ b   and   b = (1/√λ) V₂₂⁻¹ C₂₁ a
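A sketch of this computation on synthetic data (the two groups share two common factors; all sizes and noise levels are illustrative assumptions). The eigenvalues of the smaller system give λ = μ²:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
common = rng.normal(size=(n, 2))                 # shared structure
Z1 = np.hstack([common + 0.5 * rng.normal(size=(n, 2)),
                rng.normal(size=(n, 1))])        # group 1: 3 variables
Z2 = common + 0.8 * rng.normal(size=(n, 2))      # group 2: 2 variables
Z1 -= Z1.mean(0); Z2 -= Z2.mean(0)

V11 = Z1.T @ Z1 / n
V22 = Z2.T @ Z2 / n
C12 = Z1.T @ Z2 / n
C21 = C12.T

# smaller of the two eigenvalue problems: V22^-1 C21 V11^-1 C12 b = lambda b
M = np.linalg.solve(V22, C21) @ np.linalg.solve(V11, C12)
lam = np.clip(np.sort(np.linalg.eigvals(M).real)[::-1], 0.0, None)
mu = np.sqrt(lam)                                # canonical correlations
print(mu)
```

The same μ are obtained as the singular values of the whitened cross-covariance matrix G = V₁₁^{−1/2} C₁₂ V₂₂^{−1/2}, which is the SVD formulation mentioned above.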
EXERCISE 18.1 Show that two factors in the same group are orthogonal:

λ V₁₁ a = C₁₂ V₂₂⁻¹ C₂₁ a
19 Correspondence Analysis

Disjunctive table

A qualitative variable z is a system of categories (classes) which are mutually exclusive: every sample belongs to exactly one category. The membership of a sample z_α in a category C_i can be represented numerically by an indicator function

1_{C_i}(z_α) = 1 if z_α ∈ C_i, and 0 otherwise
The matrix recording the memberships of the n samples in the N categories of the qualitative variable is the disjunctive table H of order n × N

H = [1_{C_i}(z_α)]

Each row of the disjunctive table contains only one element equal to 1 and has zeroes elsewhere, as the possibility that a sample belongs simultaneously to more than one category is excluded.
The product of Hᵀ with H results in a diagonal matrix whose diagonal elements n_ii indicate the number of samples in the category number i. The division of the n_ii by n yields the proportion F_ii of samples contained in a category and we have

V = (1/n) HᵀH = diag(F₁₁, ..., F_NN)
Contingency table

Two qualitative variables measured on the same set of samples are represented, respectively, by a table H₁ of order n × M and a table H₂ of order n × L.
The product of these two disjunctive tables has as elements the number of samples n_ij belonging simultaneously to a category i of the first qualitative variable and a category j of the second qualitative variable. The elements of this
matrix, divided by n, form the contingency table

C₁₂ = (1/n) H₁ᵀ H₂ = [F_ij]

together with

C₂₁ = C₁₂ᵀ,   V₁₁ = (1/n) H₁ᵀH₁,   V₂₂ = (1/n) H₂ᵀH₂
The diagonal matrix V = (1/n) HᵀH associated with the partition of a quantitative variable contains the values of the histogram of the samples z_α, because its diagonal elements show the frequencies in the classes of values of z.
The contingency table C₁₂ = (1/n) H₁ᵀH₂ of two quantitative variables z₁ and z₂ holds the information about a bivariate histogram, for we find in it the frequencies of samples at the crossing of classes of z₁ and z₂.
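The two constructions can be sketched together (the category labels and sample values below are illustrative):

```python
import numpy as np

def disjunctive_table(z, categories):
    """Disjunctive table H (n x N): exactly one 1 per row,
    marking the category of each sample."""
    return (np.asarray(z)[:, None] == np.asarray(categories)[None, :]).astype(float)

z1 = ['a', 'b', 'a', 'c', 'b', 'a']          # first qualitative variable
z2 = ['x', 'x', 'y', 'y', 'x', 'y']          # second qualitative variable
H1 = disjunctive_table(z1, ['a', 'b', 'c'])
H2 = disjunctive_table(z2, ['x', 'y'])
n = len(z1)

V = H1.T @ H1 / n            # diagonal matrix of category proportions F_ii
C12 = H1.T @ H2 / n          # contingency table of joint frequencies F_ij
print(np.diag(V))            # histogram of z1: [0.5, 0.333..., 0.166...]
print(C12)
```

The off-diagonal elements of `V` are exactly zero because the categories are mutually exclusive, and the elements of `C12` sum to one.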
EXAMPLE 19.1 When the bivariate law is Gaussian, the decomposition into eigenvalues yields coefficients μ_p equal to a coefficient ρ to the power p, and the factors are the normalized Hermite polynomials

η_p = H_p / √(p!)

The bivariate Gaussian distribution G(dz₁, dz₂) can then be decomposed into factors η_p of decreasing importance ρ^p

G(dz₁, dz₂) = Σ_{p=0}^{∞} ρ^p η_p(z₁) η_p(z₂) G(dz₁) G(dz₂)
Multivariate Geostatistics
20 Direct and Cross Covariances
The cross covariance between two random functions can be computed not only at
locations x but also for pairs of locations separated by a vector h. On the basis
of an assumption of joint second-order stationarity a cross covariance function
between two random functions is defined which only depends on the separation
vector h.
The interesting feature of the cross covariance function is that it is generally
not an even function, i.e. its values for +h and -h may be different. This occurs
in time series when the effect of one variable on another variable is delayed.
The cross variogram is an even function, defined in the framework of an intrinsic hypothesis. When delay effects are an important aspect of the coregionalization, the cross variogram is not an appropriate tool to describe the data.
The direct and cross covariance functions C_ij(h) of a set of N random functions Z_i(x) are defined in the framework of a joint second-order stationarity hypothesis.
The mean of each variable Z;(x) at any point of the domain is equal to a
constant mi. The covariance of a variable pair depends only on the vector h
linking a point pair and is invariant for any translation of the point pair in the
domain.
A set of cross covariance functions is a positive definite function, i.e. the variance of any linear combination of N variables at n+1 points with a set of weights w_α^i needs to be positive. For any set of points x_α ∈ D and any set of weights w_α^i ∈ ℝ

var( Σ_{i=1}^{N} Σ_{α=0}^{n} w_α^i Z_i(x_α) ) = Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{α=0}^{n} Σ_{β=0}^{n} w_α^i w_β^j C_ij(x_α−x_β) ≥ 0
Delay effect
The cross covariance funetion is not apriori an even or an odd funetion. Gen-
erally for i-# j, a change in the order of the variables or a change in the sign of
the separation veetor h changes the value of the cross covariance funetion
EXAMPLE 20.1 Let Z₁(x) be a random function obtained by shifting the random function Z₂(x) by a vector r, multiplying it by a constant a and adding ε(x).

EXERCISE 20.2 Compute the cross covariance function between Z₁(x) and Z₂(x) for

Z₁(x) = a₁ Z₂(x + r₁) + a₂ Z₂(x + r₂)

which incorporates two shifts r₁ and r₂.
Cross variogram

The direct and cross variograms γ_ij(h) are defined in the context of a joint intrinsic hypothesis for N random functions, when for any x, x+h ∈ D and all pairs i, j = 1, ..., N

E[Z_i(x+h) − Z_i(x)] = 0
cov[ (Z_i(x+h) − Z_i(x)), (Z_j(x+h) − Z_j(x)) ] = 2 γ_ij(h)

The cross variogram is thus defined as half the expectation of the product of the increments of the two variables

γ_ij(h) = ½ E[ (Z_i(x+h) − Z_i(x)) (Z_j(x+h) − Z_j(x)) ]

The cross variogram is obviously an even function and it satisfies the following inequality

γ_ii(h) γ_jj(h) ≥ |γ_ij(h)|²
because the square of the covariance of increments from two variables is bounded by the product of the corresponding increment variances. Actually a matrix Γ(h₀) of direct and cross variogram values is a positive semi-definite matrix for any fixed h₀, because it is a variance-covariance matrix of increments.
It is appealing to investigate its relation to the cross covariance function in
the framework of joint second-order stationarity. The following formula is easily
obtained
γ_ij(h) = C_ij(0) − ½ ( C_ij(−h) + C_ij(+h) )
which shows that the cross variogram takes the average of the values for -h and
for +h of the corresponding cross covariance function. Decomposing the cross covariance function into an even and an odd function

C_ij(h) = ½ ( C_ij(+h) + C_ij(−h) ) + ½ ( C_ij(+h) − C_ij(−h) )

we see that the cross variogram only recovers the even part of the cross covariance function.
Figure 20.1: Experimental cross covariance c*_ij(h) and experimental cross variogram γ*_ij(h).
EXAMPLE 20.3 (GAS FURNACE DATA FROM [16]) Two time series, one corresponding to the fluctuation of a gas input into a furnace and the other being the output of CO₂ from the furnace, are measured at the same time. The chemical reaction between the input variable and the output variable takes several tens of seconds and we can expect a delay effect on measurements taken every 9 seconds.
The experimental cross covariance function for different distance classes H_k, gathering n_c pairs of locations x_α, x_β according to their separation vectors x_α − x_β = h_αβ ∈ H_k, is computed as

c*_ij(H_k) = (1/n_c) Σ_{α=1}^{n_c} ( z_i(x_α) − m_i ) ( z_j(x_α + h_αβ) − m_j )
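For regularly spaced time series the estimator simplifies to a loop over lags. A sketch on two synthetic series with a built-in delay (the series, the delay of 5 steps and the smoothing are illustrative assumptions, not the gas furnace data):

```python
import numpy as np

def cross_covariance(zi, zj, lags):
    """Experimental cross covariance c_ij(h) for regularly spaced series:
    average of (z_i(t) - m_i)(z_j(t+h) - m_j) over all available pairs."""
    mi, mj = zi.mean(), zj.mean()
    out = []
    for h in lags:
        if h >= 0:
            pairs = (zi[:len(zi) - h] - mi) * (zj[h:] - mj)
        else:
            pairs = (zi[-h:] - mi) * (zj[:len(zj) + h] - mj)
        out.append(pairs.mean())
    return np.array(out)

# z2 lags z1 by 5 steps: the cross covariance is not an even function
rng = np.random.default_rng(4)
z1 = rng.normal(size=2000)
z1 = np.convolve(z1, np.ones(5) / 5, mode='same')   # smooth for some structure
z2 = np.roll(z1, 5) + 0.1 * rng.normal(size=2000)
lags = np.arange(-10, 11)
c = cross_covariance(z1, z2, lags)
print(lags[np.argmax(c)])    # the peak sits at a positive lag (near +5), not at 0
```

The asymmetry of `c` around lag 0 is exactly the delay effect that the (even) cross variogram cannot capture.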
Figure 20.2: Even and odd terms of the experimental cross covariance.
γ*_ij(H_k) = (1/(2 n_c)) Σ_{α=1}^{n_c} ( z_i(x_α + h_αβ) − z_i(x_α) ) ( z_j(x_α + h_αβ) − z_j(x_α) )

following the definition of the cross variogram γ_ij(h). Assuming for the expectation of the cross increments
The pseudo cross variogram π_ij(h) has the advantage of not being even. The assumption of
stationary cross increments is however unrealistic: it usually does not make sense
to take the difference between two variables measured in different physical units
(even if they are rescaled). PAPRITZ et al. [139], PAPRITZ & FLÜHLER [138]
have experienced limitations in the usefulness of the pseudo cross variogram and
they argue that it applies only to second-order stationary functions. Another
drawback of the pseudo cross variogram function, which takes only positive val-
ues, is that it is not adequate for modeling negatively correlated variables. For
these reasons we shall not consider this approach further and deal only with the
more classical cross covariance function.
In general we do not have inequalities of the type

C_ij(0) ≥ |C_ij(h)|
C_ii(h) C_jj(h) ≥ |C_ij(h)|²

as the quantities on the left hand side of the expressions can be negative (or zero). In particular we do not usually have the inequality C_ij(0) ≥ |C_ij(h)|, because the maximal absolute value of C_ij(h) may not be located at the origin.
The matrix C(h) of direct and cross covariances is in general neither positive nor negative definite at a specific lag h. It is hard to describe the properties of a covariance function matrix, while it is easy to characterize the corresponding matrix of spectral densities or distribution functions.
21 Covariance Function Matrices
Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{α=0}^{n} Σ_{β=0}^{n} w_α^i w_β^j C_ij(x_α−x_β) ≥ 0
A Hermitian matrix is the generalization to complex numbers of a real symmetric matrix. The diagonal elements of a Hermitian N × N matrix A are real and the off-diagonal elements are equal to the complex conjugates of the corresponding elements with transposed indices: a_ij = ā_ji.
For the matrix of direct and cross covariance functions of a set of complex
variables this means that the direct covariances are real, while the cross covari-
ance functions are generally complex.
Cramer's theorem
Following a generalization of Bochner's theorem due to CRAMER [37] (see also [68], [208]), each element C_ij(h) of a matrix C(h) of continuous direct and cross covariance functions has the spectral representation

C_ij(h) = ∫_{−∞}^{+∞} ··· ∫_{−∞}^{+∞} e^{i ω^T h} dF_ij(ω)
where the F_ij(ω) are spectral distribution functions and ω is a vector of the same dimension as h. The diagonal terms F_ii(ω) are real, non-decreasing and bounded. The off-diagonal terms F_ij(ω) (i ≠ j) are in general complex valued and of finite variation. Conversely, any matrix of continuous functions C(h) is a matrix of covariance functions, if the matrices of increments ΔF_ij(ω) are Hermitian positive semi-definite for any block Δω (using the terminology of [173]).
Spectral densities
If attention is restricted to absolutely integrable covariance functions, we have the representation

C_ij(h) = ∫_{−∞}^{+∞} ··· ∫_{−∞}^{+∞} e^{i ω^T h} f_ij(ω) dω

This Fourier transform can be inverted and the spectral density functions f_ij(ω) can be computed from the cross covariance functions

f_ij(ω) = 1/(2π)^n ∫_{−∞}^{+∞} ··· ∫_{−∞}^{+∞} e^{−i ω^T h} C_ij(h) dh
EXERCISE 21.2 Show (in one dimensional space) that the function
( a_i + a_j )
can only be the cross covariance function of two random functions with exponential direct covariance functions C_i(h) = e^{−a_i |h|}, a_i > 0, if a_i = a_j.
Phase shift
In one dimensional space, for example along the time axis, phase shifts can
easily be interpreted. Considering the inverse Fourier transforms (admitting
their existence) of the even and the odd term of the cross covariance function,
which are traditionally called the cospectrum c_ij(ω) and the quadrature spectrum q_ij(ω), we have the following decomposition of the spectral density

f_ij(ω) = c_ij(ω) − i q_ij(ω)

The phase spectrum φ_ij(ω) is the average phase shift of the component of Z_i(x) at the frequency ω with respect to the component of Z_j(x) at the same frequency, and

tan φ_ij(ω) = Im( f_ij(ω) ) / Re( f_ij(ω) ) = −q_ij(ω) / c_ij(ω)
Absence of phase shift at a frequency ω implies a zero imaginary part Im(f_ij(ω)) at this frequency. When there is no phase shift at any frequency, the cross spectral density f_ij(ω) is real and its Fourier transform is an even function.
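In one dimension these quantities can be estimated with a discrete Fourier transform. A sketch using numpy.fft on a synthetic pair in which Z_j is a pure delay of Z_i (the series, the 4-sample delay and the use of a raw, unsmoothed cross periodogram are illustrative assumptions):

```python
import numpy as np

# zj lags zi by `delay` samples: a pure phase shift between the two variables.
rng = np.random.default_rng(1)
n, delay = 512, 4
zj = rng.normal(size=n)
zi = np.roll(zj, -delay)            # zi(t) = zj(t + delay), circular shift

fi, fj = np.fft.fft(zi), np.fft.fft(zj)
f_ij = fi * np.conj(fj) / n         # raw cross-spectral density estimate

c_ij = f_ij.real                    # cospectrum c_ij(w)
q_ij = -f_ij.imag                   # quadrature spectrum, f_ij = c_ij - i q_ij
phase = np.arctan2(-q_ij, c_ij)     # phase spectrum: tan(phi_ij) = -q_ij / c_ij
```

For a pure delay the phase spectrum is linear in frequency, with a slope proportional to the delay; a nonzero quadrature spectrum signals the phase shift.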
22 Intrinsic Multivariate Correlation
C(h) = V ρ(h)

This model is called the intrinsic correlation model because the correlation between two variables,

σ_ij ρ(h) / √( σ_ii ρ(h) · σ_jj ρ(h) ) = σ_ij / √( σ_ii σ_jj ) = r_ij

does not depend on the spatial scale h.
EXERCISE 22.1 Show that the intrinsic correlation model is a valid model for
covariance function matrices.
The intrinsic correlation model implies even cross covariance functions be-
cause p(h) is an even function as it is a normalized direct covariance function.
An intrinsic correlation model can be formulated for intrinsically stationary
random functions (please note the two different uses of the adjective intrinsic).
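Exercise 22.1 can also be checked numerically: under intrinsic correlation the covariance matrix of all (variable, location) pairs is the Kronecker product V ⊗ R, whose eigenvalues are the products of the eigenvalues of V and R, hence non-negative whenever both factors are valid. A sketch with assumed sample locations, an exponential correlation function and a hypothetical positive definite V:

```python
import numpy as np

# Illustrative assumptions: five 1-D locations, exponential rho, 2x2 matrix V.
x = np.array([0.0, 0.7, 1.1, 2.5, 4.0])
rho = lambda h: np.exp(-np.abs(h))
R = rho(x[:, None] - x[None, :])              # valid correlation matrix
V = np.array([[2.0, 1.2], [1.2, 1.5]])        # positive definite V

C = np.kron(V, R)                             # covariance of all (i, alpha) pairs
eigmin = np.linalg.eigvalsh(C).min()          # > 0 confirms validity here
```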
Linear model
The linear random function model that can be associated with the intrinsically correlated multivariate variogram model consists of a linear combination of coefficients a_p^i with factors Y_p(x)

Z_i(x) = Σ_{p=1}^N a_p^i Y_p(x)
The factors Y_p(x) have pairwise uncorrelated increments

E[ ( Y_p(x+h) − Y_p(x) ) · ( Y_q(x+h) − Y_q(x) ) ] = 0    for p ≠ q
and all have the same variogram γ(h). The cross variogram of two variables is then

γ_ij(h) = (1/2) E[ Σ_{p=1}^N Σ_{q=1}^N a_p^i a_q^j ( Y_p(x+h) − Y_p(x) ) ( Y_q(x+h) − Y_q(x) ) ] = Σ_{p=1}^N a_p^i a_p^j γ(h) = b_ij γ(h)
Figure 22.1: Cross variogram γ_ij(h) between two principal components, which are correlated at short distances (up to about 3.5 km).
Codispersion coefficients
A sensitive question to explore for a given data set is whether the intrinsic correlation model is adequate. One possibility to approach the problem is to examine plots of the ratios of the cross versus the direct variograms, which are called codispersion coefficients (MATHERON, 1965)

cc_ij(h) = γ_ij(h) / √( γ_ii(h) γ_jj(h) )
If the codispersion coefficients are constant, the correlation of the variable pair does not depend on spatial scale. This is an obvious implication of the intrinsic correlation model

cc_ij(h) = b_ij γ(h) / √( b_ii γ(h) · b_jj γ(h) ) = b_ij / √( b_ii b_jj ) = r_ij

where r_ij is the correlation coefficient between two variables, computed from elements of the coregionalization matrix B.
Another possible test for intrinsic correlation is based on the principal components of the set of variables. If the cross variograms between the principal components are not zero at all lags h, the principal components are not uncorrelated at all spatial scales of the study and the intrinsic correlation model is not appropriate.
Figure 22.1 shows the cross variogram between the third and the fourth principal component computed on the basis of the variance-covariance matrix of this geochemical data, for which a hypothesis of intrinsic correlation is obviously false. Indeed, near the origin of the cross variogram the two components are significantly correlated, although the statistical correlation coefficient of the principal components is equal to zero. This is a symptom that the correlation structure of the data is different at small scale, an aspect that a non-regionalized principal component analysis cannot incorporate.
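Experimental codispersion coefficients are easy to compute from direct and cross variograms. A sketch for data on a regular one-dimensional grid with integer lags (the estimators follow the definitions above; the grid assumption is illustrative):

```python
import numpy as np

def experimental_variograms(zi, zj, lags):
    """Direct and cross variograms for data on a regular 1-D grid."""
    g_ii, g_jj, g_ij = [], [], []
    for l in lags:
        di, dj = zi[l:] - zi[:-l], zj[l:] - zj[:-l]
        g_ii.append(0.5 * np.mean(di * di))
        g_jj.append(0.5 * np.mean(dj * dj))
        g_ij.append(0.5 * np.mean(di * dj))
    return np.array(g_ii), np.array(g_jj), np.array(g_ij)

def codispersion(zi, zj, lags):
    """Codispersion coefficients cc_ij(h) = g_ij(h) / sqrt(g_ii(h) * g_jj(h))."""
    g_ii, g_jj, g_ij = experimental_variograms(zi, zj, lags)
    return g_ij / np.sqrt(g_ii * g_jj)
```

By the Cauchy-Schwarz inequality the experimental coefficients always lie between −1 and 1; for an intrinsically correlated pair they fluctuate around the constant r_ij.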
23 Cokriging
The question whether cokriging is still interesting in the case of isotopy, when the auxiliary variables are available at the same locations as the variable of interest, will be examined at the end of this chapter.
Ordinary cokriging
The ordinary cokriging estimator is a linear combination of weights w_α^i with data from different variables located at sample points in the neighborhood of a point x_0. Each variable is defined on a set of samples of (possibly different) size n_i and the estimator is defined as

Z*_{i0}(x_0) = Σ_{i=1}^N Σ_{α=1}^{n_i} w_α^i Z_i(x_α)

where the index i_0 refers to a particular variable of the set of N variables. The number of samples n_i depends upon the index i of the variables, so as to include into the notation the possibility of heterotopic data.
In the framework of a joint intrinsic hypothesis we wish to estimate a particular variable of a set of N variables on the basis of an estimation error which should be nil on average. This condition is satisfied by choosing weights which sum up to one for the variable of interest and which have a zero sum for the auxiliary variables

Σ_{α=1}^{n_i} w_α^i = δ_{i i0} = { 1 if i = i_0 ; 0 otherwise }
Expanding the expression for the average estimation error, we get

E[ Z*_{i0}(x_0) − Z_{i0}(x_0) ] = Σ_{i=1}^N Σ_{α=1}^{n_i} w_α^i E[ Z_i(x_α) − Z_i(x_0) ] = 0

as the expectations of the increments are zero.
Including the estimation point x_0 into the sums with weights w_0^i = −δ_{i i0}, we can shorten the expression of the estimation variance to

σ_E² = E[ ( Σ_{i=1}^N Σ_{α=0}^{n_i} w_α^i Z_i(x_α) )² ]

As the weights of each variable now sum to zero, a term Z_i(0) can be subtracted without changing the value:

σ_E² = E[ ( Σ_{i=1}^N ( Σ_{α=0}^{n_i} w_α^i Z_i(x_α) − Z_i(0) Σ_{α=0}^{n_i} w_α^i ) )² ] = E[ ( Σ_{i=1}^N Σ_{α=0}^{n_i} w_α^i ( Z_i(x_α) − Z_i(0) ) )² ]
Defining cross covariances of increments C^I_ij(x_α, x_β) (which are not translation invariant) we have

σ_E² = Σ_{i=1}^N Σ_{j=1}^N Σ_{α=0}^{n_i} Σ_{β=0}^{n_j} w_α^i w_β^j C^I_ij(x_α, x_β) = − Σ_{i=1}^N Σ_{j=1}^N Σ_{α=0}^{n_i} Σ_{β=0}^{n_j} w_α^i w_β^j γ_ij(x_α − x_β)
and the cokriging variance is

σ²_CK = Σ_{i=1}^N Σ_{α=1}^{n_i} w_α^i γ_{i i0}(x_α − x_0) + μ_{i0} − γ_{i0 i0}(x_0 − x_0)
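The ordinary cokriging system can be assembled and solved numerically. The sketch below uses an assumed intrinsically correlated covariance model C_ij(h) = V[i,j] e^{−|h|} and three shared sample locations; all numbers are illustrative assumptions:

```python
import numpy as np

# Ordinary cokriging of variable i0 = 0 from N = 2 variables (isotopic data).
x = np.array([0.0, 1.0, 2.5])                 # shared sample locations
x0 = 1.4                                      # estimation location
V = np.array([[1.0, 0.6], [0.6, 0.8]])        # hypothetical coregionalization matrix
cov = lambda h, i, j: V[i, j] * np.exp(-np.abs(h))

n, N, i0 = len(x), 2, 0
H = x[:, None] - x[None, :]
A = np.zeros((N * n + N, N * n + N))          # covariances + N unbiasedness rows
b = np.zeros(N * n + N)
for i in range(N):
    for j in range(N):
        A[i*n:(i+1)*n, j*n:(j+1)*n] = cov(H, i, j)
    A[i*n:(i+1)*n, N*n + i] = 1.0             # Lagrange multiplier column
    A[N*n + i, i*n:(i+1)*n] = 1.0             # sum of weights of variable i ...
    b[i*n:(i+1)*n] = cov(x - x0, i, i0)
b[N*n + i0] = 1.0                             # ... equals 1 for i = i0, 0 otherwise

w = np.linalg.solve(A, b)[:N*n].reshape(N, n) # cokriging weights per variable
```

Because the assumed model is intrinsically correlated and the data isotopic, the auxiliary weights w[1] come out numerically zero, anticipating the autokrigeability property discussed further on.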
Simple cokriging
Ordinary cokriging has no meaning when no data is available for the variable of interest in a given neighborhood. On the other hand, simple kriging leans on the knowledge of the means of the variables, so that the estimation of a variable can be calibrated without having any data value for this variable in the cokriging neighborhood.

The simple cokriging estimator is made up of the mean of the variable of interest plus a linear combination of weights w_α^i with the residuals of the variables

Z*_{i0}(x_0) = m_{i0} + Σ_{i=1}^N Σ_{α=1}^{n_i} w_α^i ( Z_i(x_α) − m_i )
The simple cokriging system is written in matrix form

( C_11 … C_1j … C_1N ) ( w_1 )   ( C_{1 i0} )
( C_i1 … C_ij … C_iN ) ( w_i ) = ( C_{i i0} )
( C_N1 … C_Nj … C_NN ) ( w_N )   ( C_{N i0} )

where the left hand side matrix is built up with square symmetric n_i × n_i blocks C_ii on the diagonal and with rectangular n_i × n_j blocks C_ij off the diagonal, with C_ij = C_ji^T.
The blocks C_ij contain the covariances between sample points for a fixed pair of variables. The vectors C_{i i0} list the covariances with the variable of interest, for a specific variable of the set, between the sample points and the estimation location. The vectors w_i represent the weights attached to the samples of a fixed variable.
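The block structure carries over directly to heterotopic data, where each variable has its own sample set. A sketch assembling the n_i × n_j blocks with NumPy (the covariance model C_ij(h) = V[i,j] e^{−|h|} and all sample locations are illustrative assumptions):

```python
import numpy as np

V = np.array([[1.0, 0.5], [0.5, 2.0]])        # hypothetical coregionalization matrix
cov = lambda h, i, j: V[i, j] * np.exp(-np.abs(h))

samples = [np.array([0.0, 1.0, 3.0]),         # n_1 = 3 locations of Z_1
           np.array([0.5, 2.0])]              # n_2 = 2 locations of Z_2 (heterotopy)
x0, i0 = 1.2, 0                               # estimate Z_1 at x0

blocks = [[cov(xi[:, None] - xj[None, :], i, j)
           for j, xj in enumerate(samples)]
          for i, xi in enumerate(samples)]
A = np.block(blocks)                          # n_i x n_j covariance blocks
b = np.concatenate([cov(xi - x0, i, i0) for i, xi in enumerate(samples)])

w = np.linalg.solve(A, b)                     # simple cokriging weights
```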
Figure 23.1: Three data values at two locations x_1, x_2.
ii) Set up a simple cokriging system for the case of isotopy. How does it reduce due to the fact that no data value is available for Z_1(x) at the point x_2?

iii) Solve the simple cokriging for an estimation of Z_1(x) at the three points x_0, x_0' and x_2 shown on Figure 23.1.

iv) Discuss the three cokriged values z_1*(x_0), z_1*(x_0') and z_1*(x_2).
The cokriging of S(x) is equal to the sum of the cokrigings of the variables Z_i(x) (using for each cokriging N out of the N+1 variables)

S^CK(x) = Σ_{i=1}^N Z_i^CK(x)

However, if we krige each term of the sum and add up the krigings, we generally do not get the same result as when we krige directly the added up data: the two estimations are not coherent.
EXAMPLE 23.2 The thickness T(x) of a geologic layer is defined as the difference between its upper limit Z_U(x) and its lower limit Z_L(x). The cokriging estimators of each term, using the information of two out of the three variables (the third being redundant), are coherent

T^CK(x) = Z_U^CK(x) − Z_L^CK(x)

But if kriging is used, the left hand result is in general different from the right hand difference

T^K(x) ≠ Z_U^K(x) − Z_L^K(x)

and there is no criterion to help decide as to which estimate of thickness is the better.
Autokrigeability

A variable is said to be autokrigeable with respect to a set of variables if the direct kriging of this variable is equivalent to the cokriging. The trivial case is when all variables are uncorrelated (at any scale!), and it is easy to see that this implies that all kriging weights are zero except for the variable of interest.
In a situation of isotopy any variable is autokrigeable if it belongs to a set of intrinsically correlated variables. Assuming that the matrix V of the intrinsic correlation model is positive definite, we can rewrite the simple cokriging equations using Kronecker products ⊗

(V ⊗ R) w = v_{i0} ⊗ r_0

where

R = [ ρ(x_α − x_β) ]    and    r_0 = [ ρ(x_α − x_0) ]
The nN × nN left hand matrix of the isotopic simple cokriging with intrinsic correlation is expressed as the Kronecker product between the N × N variance-covariance matrix V and the n × n left hand side matrix R of simple kriging

         ( σ_11 R  …  σ_1N R )
V ⊗ R =  (   …   σ_ij R   …  )
         ( σ_N1 R  …  σ_NN R )
The nN right hand vector is the Kronecker product of the vector v_{i0} of covariances σ_{i i0} with the right hand vector r_0 of simple kriging

               ( σ_{1 i0} r_0 )
v_{i0} ⊗ r_0 = ( σ_{i i0} r_0 )
               ( σ_{N i0} r_0 )
Now, as v_{i0} is also present in the left hand side of the cokriging system, a solution exists for which the weights of the vector w_{i0} are those of the simple kriging of the variable of interest and all other weights are zero. This is the only solution of the simple cokriging system, as V and R are positive definite.

It should be noted that in this demonstration the autokrigeability of the variable of interest only depends on the block structure v_{i0} ⊗ R and not on the structure V ⊗ R of the whole left hand matrix. Thus for a variable to be autokrigeable we only need that the cross variograms or covariances with this variable are proportional to the direct variogram or covariance.
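This demonstration can be reproduced numerically in a few lines: solving the Kronecker-structured system recovers the simple kriging weights for the variable of interest and zeros elsewhere. The locations, the exponential correlation and the positive definite V below are illustrative assumptions:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.5, 4.0])            # hypothetical sample locations
R = np.exp(-np.abs(x[:, None] - x[None, :]))  # simple kriging matrix
r0 = np.exp(-np.abs(x - 2.2))                 # correlations with x0 = 2.2

V = np.array([[1.5, 0.9, 0.3],
              [0.9, 1.2, 0.4],
              [0.3, 0.4, 0.9]])               # positive definite V
i0 = 1                                        # variable of interest

# Solve (V kron R) w = v_i0 kron r0 and compare with plain simple kriging.
w = np.linalg.solve(np.kron(V, R), np.kron(V[:, i0], r0)).reshape(len(V), len(x))
w_sk = np.linalg.solve(R, r0)                 # simple kriging weights of Z_i0
```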
To test on an isotopic data set whether a variable Z_{i0} is autokrigeable, we can compute autokrigeability coefficients, defined as the ratios of the cross variograms against the direct variogram

ac_{i0 j}(h) = γ_{i0 j}(h) / γ_{i0 i0}(h)
If the autokrigeability coefficients are constant at any working scale for each variable j = 1, …, N of a variable set, then the variable of interest is autokrigeable with respect to this set of variables.

A set of N variables for which each variable is autokrigeable with respect to the N−1 other variables is intrinsically correlated.
Collocated cokriging
A particular heterotopic situation encountered in practice is when we have a variable of interest Z(x) known at a few points and an auxiliary variable S(x) known everywhere in the domain (or at least at all nodes of a given estimation grid).
XU ET AL. [206] called collocated cokriging a neighborhood definition strategy in which the neighborhood of the auxiliary variable is arbitrarily reduced to only one point: the estimation location x_0. The simple cokriging system for such a neighborhood (using n data values for the primary variable) is written

( σ_ZZ R       σ_ZS r_0 ) ( w_Z )   ( σ_ZZ r_0 )
( σ_ZS r_0^T   σ_SS     ) ( w_S ) = ( σ_ZS     )
To perform collocated cokriging with intrinsic correlation of the variable pair, we only need to know the covariance function

C_Z(h) = σ_ZZ ρ(h)

as well as the variance σ_SS of S(x) and the covariance σ_ZS between Z(x) and S(x).
EXERCISE 23.4 Suppose that Z(x) and S(x) are intrinsically correlated. We choose to use as a cokriging neighborhood the data Z(x_α) at n sample locations x_α and the corresponding values S(x_α) at the same locations. We also include the value S(x_0) available at the grid node x_0 into the cokriging neighborhood definition. Does this cokriging boil down to collocated cokriging?

EXERCISE 23.5 Can this neighborhood search strategy (n values for Z, 1 value for S at x_0) be applied with ordinary cokriging? Evaluate the ordinary cokriging weights.
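The collocated system above is small enough to assemble and solve directly. A sketch under the intrinsic correlation assumption (locations, variances and the cross covariance are illustrative assumptions):

```python
import numpy as np

# n values of Z plus the single collocated value S(x0).
x = np.array([0.0, 1.0, 3.0])
x0 = 1.6
s_zz, s_ss, s_zs = 1.0, 1.4, 0.7              # hypothetical (co)variances
rho = lambda h: np.exp(-np.abs(h))
R, r0 = rho(x[:, None] - x[None, :]), rho(x - x0)

n = len(x)
A = np.zeros((n + 1, n + 1))
A[:n, :n] = s_zz * R                          # Cov(Z(x_a), Z(x_b))
A[:n, n] = A[n, :n] = s_zs * r0               # Cov(Z(x_a), S(x0))
A[n, n] = s_ss                                # Var(S(x0))
b = np.concatenate([s_zz * r0, [s_zs]])       # covariances with Z(x0)

w = np.linalg.solve(A, b)
w_z, w_s = w[:n], w[n]                        # weights of the Z data and of S(x0)
```

The collocated weight w_s is nonzero as soon as σ_ZS ≠ 0 and x_0 is not a data point, so the auxiliary value always contributes to the estimate.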
24 Multivariate Nested Variogram
E[ Z_i(x) ] = m_i    and    E[ Z_i^u(x) ] = 0

The cross covariance functions C_ij(h) associated with the spatial components are composed of real coefficients b_ij^u and are proportional to real correlation functions ρ_u(h)

C_ij(h) = Σ_{u=0}^S C_ij^u(h) = Σ_{u=0}^S b_ij^u ρ_u(h)
which implies that the cross covariance functions are even in this model.
Coregionalization matrices B_u of order N × N can be set up and we have a multivariate nested covariance function model

C(h) = Σ_{u=0}^S B_u ρ_u(h)
EXERCISE 24.1 When is the above covariance function model equivalent to the
intrinsic correlation model?
EXERCISE 24.2 Show that a correlation function ρ_u(h) having a non-zero sill b_ij^u on a given cross covariance function necessarily has non-zero sills b_ii^u and b_jj^u on the corresponding direct covariance functions.

Conversely, if a sill b_ii^u is zero for a given structure of a variable, all sills of that structure on all cross covariance functions with this variable are zero.
Each spatial component Z_i^u(x) can itself be represented as a set of uncorrelated factors Y_p^u(x) with transformation coefficients a_p^{iu}

Z_i^u(x) = Σ_{p=1}^N a_p^{iu} Y_p^u(x)

with

E[ Y_p^u(x) ] = 0

Combining the spatial with the multivariate decomposition, we obtain the linear model of coregionalization

Z_i(x) = Σ_{u=0}^S Σ_{p=1}^N a_p^{iu} Y_p^u(x)
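The linear model of coregionalization can be simulated directly from its definition: independent unit-variance factors per structure, combined with the coefficient matrices a_p^{iu}. A sketch on a one-dimensional grid (coefficients, grid and smoothing kernel are illustrative assumptions; the smoothed-noise factor only approximates a stationary structure):

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(2000)

def smooth_factor(m, scale, rng):
    """A stationary factor built by smoothing white noise with a Gaussian kernel."""
    w = rng.normal(size=m + 100)
    k = np.exp(-0.5 * (np.arange(-50, 51) / scale) ** 2)
    y = np.convolve(w, k, mode="valid")[:m]
    return (y - y.mean()) / y.std()          # normalize to unit variance

A = [np.array([[0.5, 0.2], [0.0, 0.4]]),     # a_p^{iu} for u = 0 (nugget)
     np.array([[1.0, 0.0], [0.8, 0.6]])]     # a_p^{iu} for u = 1 (smooth scale)

Y = [np.array([rng.normal(size=len(t)) for _ in range(2)]),   # nugget factors
     np.array([smooth_factor(len(t), 3.0, rng) for _ in range(2)])]

Z = sum(A_u @ Y_u for A_u, Y_u in zip(A, Y)) # Z_i = sum_u sum_p a_p^{iu} Y_p^u
B = [A_u @ A_u.T for A_u in A]               # coregionalization matrices B_u
```

The matrices B_u = A_u A_u^T are positive semi-definite by construction, and the sample variance-covariance matrix of Z approximates B_0 + B_1.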
EXAMPLE 24.3 The inequality relation between sills can be used to provide a graphical criterion to judge the goodness of fit of a variogram model to an experimental variogram. A hull of perfect correlation is defined by replacing the sills b_ij^u of the cross variogram by the square root of the product of the sills of the corresponding direct variograms (which have to be fitted first) and setting the sign of the total to + or −

hull( γ_ij(h) ) = ± Σ_{u=0}^S √( b_ii^u b_jj^u ) g_u(h)
Figure 24.1 shows the cross variogram between nickel and arsenic for the Loire geochemical data set. The fit of a nugget-effect plus two spherical models with ranges of 3.5 km and 6.5 km seems inappropriate on this graph.

The same fit viewed on Figure 24.2, within the hull of perfect (positive or negative) correlation, now looks satisfactory. This shows that a cross variogram fit should be judged in the context of the spatial correlation between the two variables: in this case the statistical correlation between the two variables is very poor (the correlation coefficient is equal to r = .12). The b_ij^u coefficients of the two spherical structures are very small in absolute value and the model is actually close to a pure nugget-effect, indicating that correlation between arsenic and nickel only exists at the microscale.
Figure 24.1: Experimental cross variogram of arsenic and nickel with a seemingly
badly fitted model.
Multivariate fit
A multivariate fit is sought for a model Γ(h) to the matrices Γ*(H_k) of the experimental variograms calculated for n_c classes H_k of separation vectors h. An iterative algorithm due to GOULARD [76] is based on the least squares criterion

Σ_{k=1}^{n_c} tr[ ( Γ*(H_k) − Γ(H_k) )² ]
which is minimized under the constraint that the coregionalization matrices B_u of the model Γ(h) have to be positive semi-definite.

Starting with S arbitrarily chosen positive semi-definite matrices B_u, we look for an (S+1)-th matrix B_v which best fills up the gap ΔΓ_k^v between a model with S matrices and the experimental matrices Γ*(H_k)

ΔΓ_k^v = Γ*(H_k) − Σ_{u=0, u≠v}^S B_u g_u(H_k)
Figure 24.2: The same cross variogram fit is displayed together with the hull of perfect
positive or negative correlation provided by the previous fitting of the corresponding
direct variograms. The cross variogram model is defined within the bounds of the hull.
Computing the eigenvalues and eigenvectors of the matrix

ΔΓ^v = Σ_{k=1}^{n_c} ΔΓ_k^v g_v(H_k) = Q_v Λ_v Q_v^T    with Q_v^T Q_v = I

we can build with the positive eigenvalues a new matrix ΔΓ_+^v which is the positive semi-definite matrix nearest to ΔΓ^v in the least squares sense

ΔΓ_+^v = Q_v Λ_v^+ Q_v^T

where Λ_v^+ is equal to the diagonal eigenvalue matrix Λ_v except for the negative eigenvalues, which have been set to zero.

Dividing by the sum of the squared variogram values g_v(H_k) of the (S+1)-th structure, we obtain a positive semi-definite matrix B_v* which minimizes the fitting criterion

B_v* = ΔΓ_+^v / Σ_{k=1}^{n_c} ( g_v(H_k) )²
This procedure is applied in turn to each matrix B_u and iterated. The algorithm converges well in practice, although convergence is theoretically not ensured.
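The steps above translate into a compact implementation. The sketch below follows the described alternation (least-squares update of one B_v, projection onto the positive semi-definite cone, repeat); the data layout and function names are my own:

```python
import numpy as np

def nearest_psd(M):
    """Projection onto the PSD cone: negative eigenvalues are set to zero."""
    lam, Q = np.linalg.eigh((M + M.T) / 2)
    return (Q * np.maximum(lam, 0)) @ Q.T

def goulard(gamma_exp, g, n_iter=100):
    """Fit PSD matrices B_u minimizing sum_k || G*(h_k) - sum_u B_u g_u(h_k) ||^2.

    gamma_exp: (n_c, N, N) experimental variogram matrices per distance class;
    g: (S+1, n_c) values of the basis variograms g_u at the class centers.
    """
    S1 = len(g)
    N = gamma_exp.shape[1]
    B = [np.zeros((N, N)) for _ in range(S1)]
    for _ in range(n_iter):
        for v in range(S1):
            # residual matrices once all other structures are subtracted
            dG = gamma_exp - sum(np.multiply.outer(g[u], B[u])
                                 for u in range(S1) if u != v)
            Bv = np.tensordot(g[v], dG, axes=1) / np.sum(g[v] ** 2)
            B[v] = nearest_psd(Bv)           # enforce the PSD constraint
    return B
```

On noiseless synthetic data built from known PSD matrices, the iteration recovers them to high accuracy.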
In practice weights w(H_k) are included into the criterion, to allow the user to put low weight on distance classes which the model should ignore

Σ_{k=1}^{n_c} w(H_k) tr[ ( Γ*(H_k) − Γ(H_k) )² ]

The steps of this weighted least squares algorithm are analogous to the algorithm without weights.
COMMENT 24.4 The heart of Goulard's algorithm is built on the following result about eigenvalues (see RAO [145], p. 63).

Let A be a symmetric N × N matrix. The Euclidean norm of A is

‖A‖ = √( Σ_{i=1}^N Σ_{j=1}^N |a_ij|² )

and its eigenvalue decomposition is

A = Σ_{p=1}^N λ_p q_p q_p^T    with Q Q^T = I
We shall assume that the eigenvalues λ_p are in decreasing order.

We look for a symmetric N × N matrix B of rank k < N which best approximates the given matrix A, i.e.

inf_B ‖A − B‖

Multiplying a matrix by an orthogonal matrix does not modify its length (measured with the Euclidean norm), so with C = Q^T B Q we have

‖A − B‖² = ‖ Q^T A Q − Q^T B Q ‖² = ‖ Λ − C ‖² = Σ_{p=1}^N Σ_{q=1}^N ( λ_p δ_pq − c_pq )²
where δ_pq is one for p = q and zero otherwise. The best choice for C is obviously a diagonal matrix, as

Σ_{p=1}^N ( λ_p − c_pp )² + Σ_{p=1}^N Σ_{q≠p} ( c_pq )² ≥ Σ_{p=1}^N ( λ_p − c_pp )²
The smallest diagonal matrix C of rank k is the one for which the diagonal elements c_pp are equal to the eigenvalues λ_p for p ≤ k and zero for p > k

Σ_{p=1}^N ( λ_p − c_pp )² ≥ Σ_{p=k+1}^N ( λ_p )²
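This result is easy to verify numerically: truncating the spectral expansion after the k largest eigenvalues leaves a squared error equal to the sum of the squared remaining eigenvalues. A small sketch (the test matrix is an arbitrary, here positive semi-definite, symmetric matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(4, 4))
A = M @ M.T                                   # symmetric test matrix

lam, Q = np.linalg.eigh(A)                    # eigenvalues in ascending order
lam, Q = lam[::-1], Q[:, ::-1]                # reorder: decreasing

k = 2
B = (Q[:, :k] * lam[:k]) @ Q[:, :k].T         # truncated spectral expansion, rank k
err = np.linalg.norm(A - B)                   # Euclidean (Frobenius) norm
```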
EXAMPLE 24.5 A multivariate nested variogram has been fitted to the direct and cross variograms of the three elements copper, lead and zinc sampled in a forest near the town of Brilon, Germany (as described in [197]).

Regionalized correlation coefficients

r_ij^u = b_ij^u / √( b_ii^u b_jj^u )

have been computed for each variable pair at three characteristic spatial scales and are listed on Table 24.5 together with the classical correlation coefficient

r_ij = σ_ij / √( σ_ii σ_jj )
The three characteristic scales were defined on the basis of the nested variogram model in the following way

− large scale: the variation above 130 m is captured by a spherical model with a range of 2.3 km.
At the micro-scale, the best correlated pair is copper and zinc, with a regionalized correlation of .57: the geochemist to whom these figures were presented thought this was normal, because when copper has a high value at a sample point, zinc also tends to have a high value at the same location.

At the small scale, lead and zinc have a coefficient of .46, which is explained by the fact that when there are high values of lead at some sample points, there are high values of zinc at points nearby within the same zones.

Finally, at the large scale, we notice the negative coefficient of −.36 between copper and lead, while at the two other characteristic scales the correlation is (nearly) zero. This is due to the fact that when copper is present in a zone, this excludes lead and vice-versa.
Naturally this tentative interpretation does not answer all questions. Interpretation of correlation coefficients is difficult because they do not obey an equivalence relation: if A is correlated with B and B with C, this does not imply that A is correlated with C.

At least we can draw one conclusion from the analysis of this table of coefficients: the set of three variables is clearly not intrinsically correlated, and correlation strongly depends on spatial scale. The variance-covariance matrix does not describe well the relations between the variables. Multiple linear regression would be a bad idea, and cokriging based on the multivariate nested variogram is the alternative to go for in a regression problem.
25 Coregionalization Analysis
COMMENT 25.1 This type of metric has also been used frequently to decompose the matrix Γ*(H_1) of experimental direct and cross variograms for the first distance class H_1 by numerous authors (see [202], [157], [185], [14] among others), looking for the eigenvectors of Γ*(H_1) in the metric V.
This noise removal procedure has been compared to coregionalization analysis in [200] and [198]. The basic principle is illustrated with a simple model which shows why the approach can be successful in removing micro-scale variation. Suppose we have a multivariate nested covariance function model with

V = B_0 + B_1

Let H_1 be the distance class grouping vectors h greater than zero and shorter than the range of the spherical correlation function.
GOOVAERTS [73] first applied a regionalized redundancy analysis to examine the links between soil properties and chemical variables from banana leaves. See also [103] for an application to remote sensing and geochemical ground data, which turned out to be intrinsically correlated, so that redundancy analysis was based on V instead of B_u.
The factor is estimated by the linear combination

Y*_{p0 u0}(x_0) = Σ_{i=1}^N Σ_{α=1}^n w_α^i Z_i(x_α)

for the factor (of zero mean, by construction), by using weights summing up to zero for each variable
E[ Y*_{p0 u0}(x_0) − Y_{p0}^{u0}(x_0) ] = Σ_{i=1}^N m_i Σ_{α=1}^n w_α^i = 0

The effect of the constraints on the weights is to filter out the local means of the variables Z_i(x).
The estimation variance σ_E² is

σ_E² = E[ ( Y*_{p0 u0}(x_0) − Y_{p0}^{u0}(x_0) )² ]
     = 1 + Σ_{i=1}^N Σ_{j=1}^N Σ_{α=1}^n Σ_{β=1}^n w_α^i w_β^j C_ij(x_α − x_β) − 2 Σ_{i=1}^N Σ_{α=1}^n w_α^i a_{p0}^{i u0} ρ_{u0}(x_α − x_0)
We find in the right hand side of this system the transformation coefficients a_{p0}^{i u0} of the factor of interest. These coefficients are multiplied by values of the spatial correlation function ρ_{u0}(h), which describes the correlation at the scale of interest.

The factor cokriging is used to estimate a regionalized factor at the nodes of a regular grid which serves to draw a map.
Variables Z_i(x)
  → Intrinsic Correlation?
      Yes:  MVA on V      →  Transform into Y
      No:   Fit Γ(h) = Σ B_u g_u(h)  →  MVA on B_u  →  Cokrige Y*_{p0 u0} from Z_i(x_α)
Figure 25.1: Classical multivariate analysis followed by a kriging of the factors (on
the left) versus regionalized multivariate analysis (on the right).
The question of how to model and krige a complex variable has been analyzed by LAJAUNIE & BEJAOUI [99] and GRZEBYK [80]. The covariance structure can be approached in two ways: either by modeling the real and imaginary parts of the complex covariance, or by modeling the coregionalization of the real and imaginary parts of the random function. This opens several possibilities for (co-)kriging the complex random function.
Complex kriging
The linear combination for estimating, on the basis of a complex covariance (CC), the centered variable Z(x) at a location x_0 from data at locations x_α with weights w_α ∈ ℂ is

Z*_CC(x_0) = Σ_{α=1}^n w_α Z(x_α)

which, separated into real and imaginary parts with Z(x) = U(x) + iV(x) and w_α = w_α^Re + i w_α^Im, reads

Z*_CC(x_0) = Σ_{α=1}^n [ ( w_α^Re U(x_α) − w_α^Im V(x_α) ) + i ( w_α^Re V(x_α) + w_α^Im U(x_α) ) ]
EXERCISE 26.2 On the basis of the linear combination (1+i) Z(0) + (1−i) Z(h), show that

|C^Im(h)| ≤ σ²
Complex kriging consists in computing the complex weights which are the solution of the simple kriging system

Σ_{β=1}^n w_β C(x_α − x_β) = C(x_0 − x_α)    for α = 1, …, n

This is equivalent to the following system, in which the real and imaginary parts of the weights have been separated

Σ_{β=1}^n ( w_β^Re C^Re(x_α−x_β) − w_β^Im C^Im(x_α−x_β) ) = C^Re(x_0−x_α)    for all α
Σ_{β=1}^n ( w_β^Re C^Im(x_α−x_β) + w_β^Im C^Re(x_α−x_β) ) = C^Im(x_0−x_α)    for all α
EXERCISE 26.3 Show that the estimation variance of Z*_CC(x_0) is equal to the sum of the estimation variances of the real and imaginary parts.
As the estimators for U(x) and V(x) both have the same structure, we shall
only treat the cokriging of the former.
Let us examine the case of intrinsic correlation of U(x) and V(x). The imaginary part of the complex covariance is zero and the complex kriging system reduces to a real system

( C^Re   0    ) ( w^Re )   ( c_0^Re )
( 0      C^Re ) ( w^Im ) = ( 0      )

and more specifically, for the cokriging of U(x),

( σ_UU R   σ_UV R ) ( w^1 )   ( σ_UU r_0 )
( σ_VU R   σ_VV R ) ( w^2 ) = ( σ_UV r_0 )
For both estimators the solution is equivalent to the separate simple kriging of the real and imaginary parts and we find

w^Re = w^1 = w_K    and    w^Im = w^2 = 0

where w_K denotes the simple kriging weights.
EXERCISE 26.5 Show that with an even cross covariance function the difference of the kriging variances is equal to

σ²_CC − σ²_KS = ( w_K^1 − w^Re )^T C_UU ( w_K^1 − w^Re ) + ( w_K^2 − w^Re )^T C_VV ( w_K^2 − w^Re )
<I> = -(v
1
2
- iI)
∫ C^C(h) dh ≤ ∫ C^Re(h) dh

− C^C(h) is more regular at the origin than C^Re(h);
− if C^C(h) and C^Re(h) have spectral densities, then

f^C(ω) ≤ f^Re(ω)    for all ω
with weights p_k ≥ 0 and Σ p_k ≤ 1. Instead of only one function C^C(h), K functions C_k(h) can be introduced which represent a family of covariance functions compatible with a given C^Re(h). The translations τ_k and the models C_k(h) are chosen after inspection of the graphs of the imaginary part of the experimental covariance function. The coefficients p_k are fitted by least squares.
LAJAUNIE & BEJAOUI [99] provide a few fitting and complex kriging examples using directional data from simulated ocean waves.
27 Bilinear Coregionalization Model
The linear model of coregionalization of real variables implies even cross covariance functions, and it was thus formulated with variograms in the framework of intrinsic stationarity. The use of cross variograms however excludes deferred correlations (due to delay effects or phase shifts). A more general model was set up by GRZEBYK [80] [81] which allows for non-even cross covariance functions.
B = E + iF

The complex linear model

Z_i(x) = Σ_{p=1}^N a_p^i Y_p(x)

with complex coefficients a_p^i = c_p^i + i d_p^i and complex factors Y_p(x) = U_p(x) + i V_p(x) can be written

Z_i(x) = Σ_{p=1}^N ( c_p^i U_p(x) − d_p^i V_p(x) ) + i Σ_{p=1}^N ( c_p^i V_p(x) + d_p^i U_p(x) )
where the coregionalization of U_p(x) and V_p(x) is identical for all values of the index p. The elements of B have the form

b_ij = Σ_{p=1}^N a_p^i ā_p^j

whereas the elements of E and F are linked to the linear model by

e_ij = Σ_{p=1}^N ( c_p^i c_p^j + d_p^i d_p^j )    and    f_ij = Σ_{p=1}^N ( c_p^j d_p^i − c_p^i d_p^j )
Naturally we can consider a nested complex multivariate covariance function model of the type

C(h) = Σ_{u=0}^S B_u ρ_u(h)

with the underlying complex linear model of coregionalization

Z_i(x) = Σ_{u=0}^S Σ_{p=1}^N a_p^{iu} Y_p^u(x)
where u is the index of the different characteristic spatial or time scales.

This model, as it is the sum of two linear models, has received the name of bilinear model of coregionalization. We get a handy model by imposing the following relations between the covariance function terms

C_UU(h) = C_VV(h) = (1/2) χ(h)
C_UV(−h) = −C_UV(h) = (1/2) κ(h)
This implies that the cross covariance function C_UV(h) is odd. The cross covariance function between two real variables is then

C_ij(h) = Σ_{p=1}^N ( c_p^i c_p^j + d_p^i d_p^j ) χ(h)/2 + Σ_{p=1}^N ( d_p^i c_p^j − c_p^i d_p^j ) κ(h)/2
Non-Stationary Geostatistics
28 Universal Kriging
where the coefficients a_l are not zero. The function f_0(x) is defined as constant

f_0(x) = 1

which yields

m(x_0) − Σ_{α=1}^n w_α m(x_α) = 0

and

Σ_{l=0}^L a_l ( f_l(x_0) − Σ_{α=1}^n w_α f_l(x_α) ) = 0

As the a_l are non-zero, the following set of constraints on the weights w_α emerges

Σ_{α=1}^n w_α f_l(x_α) = f_l(x_0)    for l = 0, …, L
Developing the expression for the estimation variance, introducing the constraints into the objective function together with Lagrange parameters μ_l and minimizing, we obtain the universal kriging (UK) system

Σ_{β=1}^n w_β C(x_α−x_β) − Σ_{l=0}^L μ_l f_l(x_α) = C(x_α−x_0)    for α = 1, …, n
Σ_{α=1}^n w_α f_l(x_α) = f_l(x_0)    for l = 0, …, L
The system has a unique solution if the matrix of drift function values

F = ( f_0, …, f_L )

is of full column rank, i.e. the column vectors f_l have to be linearly independent. This means in particular that there can be no other constant vector besides the vector f_0. Thus the functions f_l(x) have to be selected with care.

In matrix form the estimator is written z*(x) = z^T w_x.
The universal kriging system can be written in matrix form

( C    F ) ( w_x )   ( c_x )
( F^T  0 ) ( μ_x ) = ( f_x )

in which all terms that depend on the estimation location have been subscripted with an x. In this formulation we need to solve the system anew for each interpolation location. As the left hand matrix does not depend on x, we can define its inverse (assuming its existence) and apply it once to the data, which yields the dual system of universal kriging.

If the data vector z is equal to the vector f_l of values of one of the drift functions, the constraints imply

w_x^T f_l = f_l(x)

and we see that

z*(x) = f_l(x) = z(x)

Thus the interpolator is exact for f_l(x).
This clarifies the meaning of the constraints in universal kriging: the resulting weights are able to interpolate exactly each one of the deterministic functions.
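A small worked example makes the system concrete. The sketch below builds and solves a one-dimensional universal kriging system with drift functions f_0(x) = 1 and f_1(x) = x; the data, locations and the exponential covariance are illustrative assumptions, and the Lagrange sign convention differs from the text (which leaves the weights unchanged):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.5, 5.0])       # hypothetical sample locations
z = np.array([0.2, 1.1, 1.8, 3.2, 4.6])       # data with an apparent trend
x0 = 2.6                                      # estimation location
C = lambda h: np.exp(-np.abs(h) / 2.0)        # assumed underlying covariance

n, L1 = len(x), 2
F = np.column_stack([np.ones(n), x])          # drift functions at the data points
A = np.zeros((n + L1, n + L1))
A[:n, :n] = C(x[:, None] - x[None, :])
A[:n, n:] = F
A[n:, :n] = F.T                               # unbiasedness constraints
b = np.concatenate([C(x - x0), [1.0, x0]])    # right hand side with f_l(x0)

w = np.linalg.solve(A, b)[:n]                 # universal kriging weights
z_star = float(w @ z)                         # UK estimate at x0
```

The constraints force the weights to reproduce both drift functions exactly, and the interpolator honors the data at the sample points.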
A specific coefficient A_{l0} can be estimated from the data using the linear combination

A*_{l0} = Σ_{α=1}^n w_α^{l0} Z(x_α)

with the unbiasedness constraints

Σ_{α=1}^n w_α^{l0} f_l(x_α) = δ_{l l0} = { 1 if l = l_0 ; 0 if l ≠ l_0 }
Introducing the constraints with Lagrange parameters μ_{l l0}, we obtain the system

Σ_{β=1}^n w_β^{l0} C(x_α−x_β) − Σ_{l=0}^L μ_{l l0} f_l(x_α) = 0    for α = 1, …, n
Σ_{α=1}^n w_α^{l0} f_l(x_α) = δ_{l l0}    for l = 0, …, L

and we see that

cov( A*_l, A*_{l0} ) = μ_{l l0}

The Lagrange parameters μ_{l l0} represent the covariances between the estimated drift coefficients.
The corresponding system for the estimation of the drift is

Σ_{β=1}^n w_β^M C(x_α−x_β) − Σ_{l=0}^L μ_l^M f_l(x_α) = 0    for α = 1, …, n
So far we have not discussed how the underlying variogram γ(h) can be inferred, knowing that both quantities, drift and residual, need to be estimated. The variogram of the estimated residuals is

γ*(x_α, x_β) = (1/2) E[ ( R*(x_α) − R*(x_β) )² ] = γ(x_α, x_β) + (1/2) var( M*(x_α) − M*(x_β) )
The expected value of the experimental variogram G*(h) of the residuals is of the form

E[ G*(h) ] = T_1 − T_3 ≠ γ(h)

where T_1 and T_3 are terms integrating over the domain D the increments f_s(x+h) − f_s(x) of the basis functions together with the covariances cov(A*_t, A*_s) of the estimated drift coefficients.
Two methods exist for attempting to infer the underlying variogram, but they are well defined only for regularly gridded data (see [115], [31]).

In exceptional situations the underlying variogram is directly accessible:

− when the drift occurs only in one part of a domain which is assumed homogeneous, the variogram model is inferred in the other, stationary part of the domain and can be transferred to the non-stationary part;
Figure 28.1: Scatter diagrams of temperature with longitude and latitude: there is a
systematic decrease from west to east, while there is no such trend in the north-south
direction.
− when the drift is not active in a particular direction of space, the variogram inferred in that direction can be extended to the other directions under an assumption of isotropic behavior of the underlying variogram.

To these important exceptions we can add the case of a weak drift, when the variogram can reasonably well be inferred at short distances, where the bias is not so strong. This situation is very similar to the case when ordinary kriging is applied in a moving neighborhood with a locally stationary model. The necessity of including drift terms into the kriging system may be subject to discussion in that case.
EXAMPLE 28.1 On Figure 28.1 we see the scatter diagrams of average January temperature in Scotland plotted against longitude and latitude (details are given in [87]). Clearly there is a systematic decrease of mean temperature from west to east due to the effect of the Gulf stream which warms up the western coast of Scotland in the winter. Between north and south there is no tendency to be seen.
The variograms along longitude and latitude are shown on Figure 28.2. The east-west experimental variogram is influenced by drift and grows without bounds while the shape of the north-south variogram suggests a sill: the latter has been modeled using three spherical functions with ranges of 20, 70 and 150 km. By assuming that the underlying variogram is isotropic, we can use the north-south variogram model also in the east-west direction. The interpretation of the east-west experimental variogram is that drift masks off the underlying variogram in that direction.
184 Non-Stationary Geostatistics
Figure 28.2: Variograms of average January temperature data in the east-west and
north-south directions: a model is fitted in the direction with no drift.
With the second step we understand that the quest for the underlying variogram can be in vain: the residuals can have a structure which is simply not compatible with a variogram and which can only be captured by a generalized covariance. This leads us to a powerful theory of which only an elementary sketch is provided in the next chapter. In the last chapter we discuss deterministic functions which are not translation invariant, when dealing with external drift.
29 Translation Invariant Drift
The L+1 basis functions f_l(x) need to generate a translation invariant vector space. It can be shown that only functions belonging to the class of exponential-polynomials fulfill this condition.
COMMENT 29.1 In two spatial dimensions with coordinate vectors x = (x₁, x₂)ᵀ the following monomials are often used

f₀(x) = 1,  f₁(x) = x₁,  f₂(x) = x₂,  f₃(x) = (x₁)²,  f₄(x) = x₁·x₂,  f₅(x) = (x₂)²

which generate a translation invariant vector space. The actual number L+1 of basis functions of the drift depends on the degree k of the drift in the following way:
Obviously the coefficients a_l are redundant and can be dropped for the interpolation of the basis functions f_l(x).
With weights constrained in this way, the estimator is

Z*(x₀) = ∑_{α=1}^{n} w_α Z(x_α)
From this definition we see that the negative of the variogram, −γ(h), is the generalized covariance of an intrinsic random function of order zero. The theory of intrinsically stationary random functions of order k provides a generalization of the random functions with second order stationary increments, which themselves were set in a broader framework than the second order stationary functions.
The inference of the highest order of the drift polynomial to be filtered is usually performed using automatic algorithms, a procedure which we shall not describe.
EXAMPLE 29.4 From the preceding model a generalized covariance can be derived which has been (unfortunately) called the polynomial model

K_pol(h) = ∑_{u=0}^{k} b_u (−1)^{u+1} |h|^{2u+1}   with b_u ≥ 0
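As a small numerical illustration, the polynomial generalized covariance can be evaluated directly from its coefficients. The following sketch assumes the formula above; the function name and coefficient values are our own, not from the text.

```python
import numpy as np

def k_pol(h, b):
    """Polynomial generalized covariance of order k = len(b) - 1:
    K_pol(h) = sum_u b_u * (-1)**(u + 1) * |h|**(2u + 1), with b_u >= 0."""
    h = np.abs(np.asarray(h, dtype=float))
    return sum(bu * (-1.0) ** (u + 1) * h ** (2 * u + 1)
               for u, bu in enumerate(b))

# For order k = 0 the model reduces to -b0 |h|, i.e. the negative of a
# linear variogram: the generalized covariance of an IRF-0.
print(k_pol(2.0, [1.0]))        # -2.0
print(k_pol(1.0, [1.0, 0.5]))   # -1.0 + 0.5 = -0.5
```

Note how the alternating sign (−1)^{u+1} makes the odd powers of |h| enter with the signs required for conditional positive definiteness.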
To motivate the problem of space-time drift let us take the example of earth magnetism measured from a traveling ship (see SEGURET & HUCHON [171]). When studying the magnetism of the earth near the equator, a strong variation in time is observed which is due to the 24 hour rotation of the earth. Measurements taken from a ship which zigzags during several days in an equatorial zone need to be corrected for the effect of this periodic variation.
The magnetism is modeled with a space-time random function Z(x, t) using two uncorrelated intrinsic random functions of order k, Z_{k,ω}(x) and Z_{k,ω}(t), as well as a spatial polynomial drift m_k(x) and a temporal periodic drift m_ω(t)

m_ω(t) = a sin(ω t) + b cos(ω t)

where a = cos φ and b = sin φ are multiplicative constants which will be expressed implicitly in the weights, so that the phase parameter φ does not need to be provided explicitly.
For mapping magnetism without the bias of the periodic variation in time, the generalized covariance is inferred taking into account the space-time drift. Subsequently the temporal drift is filtered by setting the corresponding equations
to zero on the right hand side of the following universal kriging system

∑_{β=1}^{n} w_β K(x_α−x_β, t_α−t_β) − ∑_{l=0}^{L} μ_l f_l(x_α) − ν₁ sin(ω t_α) − ν₂ cos(ω t_α) = K(x_α−x₀, t_α−t₀)   for α = 1, …, n

∑_{β=1}^{n} w_β f_l(x_β) = f_l(x₀)   for l = 0, …, L

∑_{β=1}^{n} w_β sin(ω t_β) = 0

∑_{β=1}^{n} w_β cos(ω t_β) = 0

where the terms K(x_α−x_β, t_α−t_β) are modeled as the sum of the generalized covariances K(x_α−x_β) and K(t_α−t_β).
30 External Drift
- the precise measurements of depth stemming from drillholes. These are few in number because of the excessively high cost of drilling. We choose to model the drillhole data with a random function Z(x).
As Z(x) and s(x) are two ways of expressing the phenomenon "depth of the layer" we assume that Z(x) is on average equal to s(x) up to a constant a₀ and a coefficient b₁

E[ Z(x) ] = a₀ + b₁ s(x)
The deterministic function s(x) describes the overall shape of the layer in imprecise depth units while the data about Z(x) provide at a few locations information about the exact depth. The method merging both sources of information uses s(x) as an external drift function for the estimation of Z(x).
E[ Z*(x₀) ] = E[ Z(x₀) ]

which can be developed into

E[ Z*(x₀) ] = a₀ + b₁ ∑_{α=1}^{n} w_α s(x_α) = a₀ + b₁ s(x₀)

This last equation implies that the weights should be consistent with an exact interpolation of s(x)

s(x₀) = ∑_{α=1}^{n} w_α s(x_α)
The objective function to minimize in this problem consists of the estimation variance σ_E² and of two constraints

∑_{β=1}^{n} w_β = 1,   ∑_{β=1}^{n} w_β s(x_β) = s(x₀)
The mixing of a second-order stationary random function with a (non stationary) mean function may look surprising in this example. Stationarity is a concept that depends on scale. Data can suggest stationarity at a large scale (for widely spaced data points on Z(x)) while at a smaller scale things look non stationary (when inspecting the fine detail provided by a function s(x)).
The external drift method consists in integrating into the kriging system supplementary universality conditions about one or several external drift variables s_i(x), i = 1, …, N measured exhaustively in the spatial domain. Actually the functions s_i(x) need to be known at all locations x_α of the samples as well as at the nodes of the estimation grid.
The conditions

∑_{α=1}^{n} w_α s_i(x_α) = s_i(x₀)   for i = 1, …, N

are added to the kriging system independently of the class of covariances K(h), hence the qualifier "external". A class of generalized covariances can only be defined with respect to translation-invariant basis functions f_l(x).
With this definition of external drift we can have a new look at the inference problems in the original theory of Universal Kriging, which was limited to intrinsic random functions with a variogram γ(h). These problems stem from the fact that the drift functions f_l(x) (l > 0) stay "external" to the process of inferring the variogram. Attempts to integrate the drift functions into the inference procedure are bound to fail as the class of generalized covariances for an IRF-0

K(h) = −γ(h)

only filters (local) constants and is a priori too narrow to model adequately the residuals.
A kriging system with translation-invariant and multiple external drift can be written as

∑_{β=1}^{n} w_β K(x_α−x_β) − ∑_{l=0}^{L} μ_l f_l(x_α) − ∑_{i=1}^{N} μ_i s_i(x_α) = K(x_α−x₀)   for α = 1, …, n
Figure 30.2: Estimated values of kriging with linear and external drift plotted against
elevation: the estimated values above 400m extrapolate out of the range of the data.
for α = 1, …, n

∑_{β=1}^{n} w_β f_l(x_β) = 0   for l = 0, …, L

∑_{β=1}^{n} w_β s_i(x_β) = δ_{i i₀}   for i = 1, …, N

where δ_{i i₀} is a Kronecker symbol which is one for i = i₀ and zero otherwise.
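To illustrate how such a system is assembled and solved in practice, here is a minimal numerical sketch with the constant basis function f₀(x) = 1 and a single external drift variable s(x). The covariance model, data values and function names below are hypothetical, not from the text.

```python
import numpy as np

def ked_weights(X, s, x0, s0, cov):
    """Kriging with one external drift s(x) and the constant f0(x) = 1.

    Solves  [ C   1   s ] [  w   ]   [ c0 ]
            [ 1^T 0   0 ] [ -mu0 ] = [ 1  ]
            [ s^T 0   0 ] [ -mu1 ]   [ s0 ]
    """
    n = len(X)
    C = cov(np.abs(X[:, None] - X[None, :]))   # data-data covariances
    K = np.zeros((n + 2, n + 2))
    K[:n, :n] = C
    K[:n, n] = K[n, :n] = 1.0                  # universality condition
    K[:n, n + 1] = K[n + 1, :n] = s            # external drift condition
    rhs = np.concatenate([cov(np.abs(X - x0)), [1.0, s0]])
    sol = np.linalg.solve(K, rhs)
    return sol[:n]                             # kriging weights

# Toy 1-D example with an exponential covariance (all values hypothetical).
cov = lambda h: np.exp(-h / 10.0)
X = np.array([0.0, 5.0, 12.0])                 # data locations
s = np.array([100.0, 120.0, 160.0])            # external drift at the data
w = ked_weights(X, s, x0=6.0, s0=125.0, cov=cov)
print(w.sum())        # sums to 1 (unbiasedness)
print(w @ s)          # reproduces s0 (exact interpolation of s)
```

The two printed quantities are exactly the constraints above: the weights sum to one and interpolate the external drift exactly.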
EXAMPLE 30.1 (see [87]) We have data on mean January temperature at 146 stations in Scotland which are all located below 400 m altitude, whereas the Scottish mountain ranges rise up to 1344 m (Ben Nevis). The variogram of this data was shown with a model on Figure 28.2 (on page 184) which is taken as the underlying isotropic variogram model.
Figure 30.1 displays a map obtained by kriging with a linear drift. The estimated temperatures stay within the range of the data (0 to 5 degrees Celsius). Temperatures below the ones observed can however be expected at altitudes above 400 m.
It could be verified that temperature depended on elevation in a reasonably
linear way. The altitude of the stations was known and elevation was available
Figure 30.4: Map of the estimated external drift coefficient b₁* when using elevation as the external drift for kriging temperature with a moving neighborhood.
Ordinary kriging (OK)

( w_OK ; −μ_OK ) = K⁻¹ k₀   with   K⁻¹ = ( A  v ; vᵀ u )

where the blocks of K⁻¹ satisfy

C A + 1 vᵀ = I,   C v + 1 u = 0,   1ᵀ A = 0ᵀ,   1ᵀ v = 1

and the estimated mean is

m* = zᵀ R 1 / (1ᵀ R 1)   with R = C⁻¹

Kriging with external drift (KE)

( C  1  s ; 1ᵀ 0 0 ; sᵀ 0 0 ) ( w_KE ; −μ_KE⁰ ; −μ_KE ) = ( c₀ ; 1 ; s₀ ),   i.e.   F ( w_KE ; −μ_KE ) = ( c₀ ; 1 ; s₀ )

c) We define Castelier's scalar product

⟨x, y⟩ = xᵀ R y / (1ᵀ R 1)

and a corresponding mean and covariance

E_n[x] = ⟨x, 1⟩,   cov_n(x, y) = ⟨x, y⟩ − E_n[x] E_n[y]

and as the coefficient b and the constant a of the external drift are estimated by b* and a*, show that b* and a* are the coefficients of a linear regression in the sense of Castelier's scalar product

b* = cov_n(z, s) / cov_n(s, s)   and   a* = E_n[z] − b* E_n[s]
EXERCISE 30.3 Assume, in a first step, that we have an additional data value z₀ at the point x₀ and let us rewrite the left hand matrix of OK with n+1 informations

C₀₀ u₀₀ + k₀ᵀ v₀ = 1
k₀ u₀₀ + K v₀ = 0
C₀₀ v₀ᵀ + k₀ᵀ Λ₀ = 0ᵀ
k₀ v₀ᵀ + K Λ₀ = I

a) Show that

u₀₀ = 1 / σ²_OK

Show that

Δ(z₀|z) = z₀ − z₀* = z₀ − (zᵀ, 0) K⁻¹ k₀

Show that

K_{n+1}(z₀, s₀) = z₀ Δ(s₀|s)/σ²_OK − (zᵀ, 0) K⁻¹ k₀ Δ(s₀|s)/σ²_OK + K_n(z, s)

and that

Δ(s₀|s)/σ²_OK = (u₀₀, v₀ᵀ) · ( s₀ ; s ; 0 )
EXERCISE 30.4 In a second step, we are interested in an OK with n−1 points. We write Δ(z_α|z_[α]) for the error at a location x_α with a kriging using only the n−1 data contained in the vector z_[α], and Δ(s_α|s_[α]), σ²_[α] for the corresponding error and kriging variance of the external drift variable. With

b* = K_n(z, s) / K_n(s, s) = ∑_{α=1}^{n} w_α^b z_α

show that

w_α^b = Δ(s_α|s_[α]) / ( σ²_[α] K_n(s, s) )
The computation of the errors Δ(s_α|s_[α]) can be interesting in applications. They can serve to explore the influence of the values s_α on the KE estimator in a given neighborhood. The ratio

Δ(s_α|s_[α])² / σ²_[α]

could in an ideal situation be constant for all x_α. In this particular case s(x) could be considered as a realization (up to a constant factor) of the random function Z(x) and the covariance function of the external drift would be proportional to the covariance function of Z(x). This ratio is worth examining in practice as it explains the structure of the weights used for estimating the external drift coefficient b* (see Exercise 30.4).
Appendix
Matrix Algebra
This review consists of a brief exposition of the principal results of matrix algebra, interspersed with a few exercises and examples. For a more detailed account we recommend the book by STRANG [183].
Data table

In multivariate data analysis we handle tables of numbers (matrices) and the one to start with is generally the data table Z, whose rows correspond to the samples and whose columns correspond to the variables:

Z = [z_{αi}]   with rows α = 1, …, n (samples) and columns i = 1, …, N (variables)

The element z_{αi} denotes a numerical value placed at the crossing of the row number α (index of the sample) and the column number i (index of the variable).
A = [a_{ij}]   with indices i = 1, …, n and j = 1, …, m
Sum

Two matrices of the same order can be added

A + B = [a_{ij}] + [b_{ij}] = [a_{ij} + b_{ij}]

The sum of the matrices is performed by adding together the elements of the two matrices having the same indices i and j.
Multiplication by a scalar

A matrix can be multiplied equally from the right or from the left by a scalar

λ A = A λ = [λ a_{ij}]

which amounts to multiplying all elements a_{ij} by the scalar λ.
Multiplication of matrices

A · B = [ ∑_{j=1}^{m} a_{ij} b_{jk} ] = [c_{ik}] = C   where A is n×m, B is m×l and C is n×l
Transposition

The transpose Aᵀ of a matrix A of order n × m is obtained by inverting the sequence of the indices, such that the rows of A become the columns of Aᵀ

A = [a_{ij}] of order n × m,   Aᵀ = [a_{ji}] of order m × n

The transpose of the product of two matrices is equal to the product of the transposed matrices in reverse order

(A B)ᵀ = Bᵀ Aᵀ
EXERCISE 1.1 Let 1 be the vector of dimension n whose elements are all equal to 1. Carry out the products 1ᵀ1 and 11ᵀ.
EXERCISE 1.2 Calculate the matrices resulting from the products
Square matrix

A square matrix has as many rows as columns.

Diagonal matrix

A diagonal matrix D is a square matrix whose only non zero elements are on the diagonal

D = ( d₁₁ … 0 ; ⋮ ⋱ ⋮ ; 0 … d_nn )

In particular we mention the identity matrix

I = ( 1 … 0 ; ⋮ ⋱ ⋮ ; 0 … 1 )

which does not modify a matrix of the same order when multiplying it from the right or from the left

A I = I A = A
Orthogonal matrix

A square matrix A is orthogonal if it verifies

Aᵀ A = A Aᵀ = I

Symmetric matrix

A square matrix A is symmetric if it is equal to its transpose

A = Aᵀ
Linear independence

A set of vectors {a₁, …, a_m} is said to be linearly independent, if there exists no non-trivial set of scalars {x₁, …, x_m} such that

∑_{j=1}^{m} a_j x_j = 0

In other words, the linear independence of the columns of a matrix A is established, if only the null vector x = 0 satisfies the equation A x = 0.
Rank of a matrix

A rectangular matrix can be subdivided into the set of column vectors which make it up. Similarly, we can also consider a set of "row vectors" of this matrix, which are defined as the column vectors of its transpose.
The rank of the columns of a matrix is the maximum number of linearly independent column vectors of this matrix. The rank of the rows is defined in an analogous manner. It can be shown that the rank of the rows is equal to the rank of the columns.
The rank of a rectangular n × m matrix A is thus lower than or equal to the smaller of its two dimensions

rank(A) ≤ min(n, m)

The rank of the matrix A indicates the dimension of the vector spaces M(A) and N(A) spanned by the columns and the rows of A.
Inverse matrix

A square n × n matrix A is singular, if rank(A) < n, and non singular, if rank(A) = n.
If A is non singular, an inverse matrix A⁻¹ exists, such that

A A⁻¹ = A⁻¹ A = I
Determinant of a matrix

The determinant |A| of a square n × n matrix A is

|A| = det(A) = ∑ (−1)^{N(k₁,…,k_n)} ∏_{i=1}^{n} a_{i k_i}

where the sum is taken over all the permutations (k₁, …, k_n) of the integers (1, …, n) and where N(k₁, …, k_n) is the number of transpositions of two integers necessary to pass from the starting set (1, …, n) to a given permutation (k₁, …, k_n) of this set.
In the case of a 2 × 2 matrix there is the well-known formula

A = ( a b ; c d ),   det(A) = a d − b c
Trace

The trace of a square n × n matrix is the sum of its diagonal elements

tr(A) = ∑_{i=1}^{n} a_{ii}

Eigenvalues

Let A be square n × n. The characteristic equation

|λ I − A| = 0

has n in general complex solutions λ called the eigenvalues of A.
The sum of the eigenvalues is equal to the trace of the matrix

tr(A) = ∑_{p=1}^{n} λ_p
Eigenvectors

Let A be a square matrix and λ an eigenvalue of A. Then vectors x and y exist, which are not equal to the zero vector 0, satisfying

A x = λ x,   yᵀ A = λ yᵀ

The vectors x are the eigenvectors of the columns of A and the y are the eigenvectors of the rows of A.
When A is symmetric, it is not necessary to distinguish between the eigenvectors of the columns and of the rows.
Positive definite matrix

A symmetric n × n matrix A is positive definite, iff for any nonzero vector x

xᵀ A x > 0

Similarly A is said to be positive semi-definite (non negative definite), iff xᵀ A x ≥ 0 for any vector x. Furthermore, A is said to be indefinite, iff xᵀ A x > 0 for some x and xᵀ A x < 0 for other x.
We notice that this definition is similar to the definition of a positive definite function, e.g. the covariance function C(h).
We now list three very useful criteria for positive semi-definite matrices.

FIRST CRITERION:
A is positive semi-definite, iff a matrix W exists such that A = WᵀW. In this situation W is sometimes written √A.

SECOND CRITERION:
A is positive semi-definite, iff all eigenvalues λ_p ≥ 0.
THIRD CRITERION:
A is positive semi-definite, iff all its principal minors are non negative.
A principal minor is the determinant of a principal submatrix of A. A principal submatrix is obtained by leaving out k columns (k = 0, 1, …, n−1) and the corresponding rows which cross them at the diagonal elements. The combinatorial number of principal minors to be checked makes this criterion less interesting for applications with n > 3.
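The second and third criteria can be checked numerically. The sketch below (function names and test matrices are our own, not from the text) illustrates both on small matrices.

```python
import numpy as np
from itertools import combinations

def psd_by_eigenvalues(A, tol=1e-10):
    # Second criterion: all eigenvalues >= 0.
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

def psd_by_principal_minors(A, tol=1e-10):
    # Third criterion: every principal minor is non-negative.
    n = A.shape[0]
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            if np.linalg.det(A[np.ix_(idx, idx)]) < -tol:
                return False
    return True

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # positive definite
B = np.array([[0.0, 1.0], [1.0, 0.0]])     # indefinite (det = -1)
print(psd_by_eigenvalues(A), psd_by_principal_minors(A))   # True True
print(psd_by_eigenvalues(B), psd_by_principal_minors(B))   # False False
```

As the text notes, the minor-based test enumerates all principal submatrices and so becomes impractical for larger n, whereas the eigenvalue test stays cheap.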
EXAMPLE 1.6 It is easy to check with the third criterion that the left hand (n+1) × (n+1) matrix of the ordinary kriging system, expressed with covariances, is not positive semi-definite.
The principal submatrix S obtained by leaving out all columns and rows crossing diagonal elements, except for the last two

S = ( C(x_n−x_n) 1 ; 1 0 )

has a determinant equal to −1.
The computation of the eigenvalues of the left hand matrix of OK yields one negative eigenvalue (resulting from the universality condition) and n positive (or zero) eigenvalues, on the basis of covariances. Thus built up using a covariance function, this matrix is indefinite, while it is negative (semi-)definite with variogram values.
For a symmetric matrix A the eigendecomposition can be written with an orthogonal matrix Q of eigenvectors

A Q = Q Λ   with QᵀQ = I,   so that   A = Q Λ Qᵀ

Singular value decomposition

A rectangular matrix A can be decomposed as A = Q₁ Σ Q₂ᵀ with

Σ = diag(μ₁, …, μ_r, 0, …, 0),   μ_p = √λ_p

In this decomposition Q₁ is the matrix of eigenvectors of A Aᵀ, while Q₂ is the matrix of eigenvectors of Aᵀ A.

Generalized inverse

A generalized inverse X of A satisfies the four conditions

A X A = A
X A X = X
(A X)ᵀ = A X
(X A)ᵀ = X A
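The four conditions can be verified numerically for the Moore–Penrose pseudoinverse computed by `numpy.linalg.pinv`; the singular matrix below is a hypothetical example of our own.

```python
import numpy as np

# A generalized (Moore-Penrose) inverse X of A satisfies
# AXA = A, XAX = X, (AX)^T = AX, (XA)^T = XA, even for a singular A.
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])        # rank 1, hence singular
X = np.linalg.pinv(A)

print(np.allclose(A @ X @ A, A))        # True
print(np.allclose(X @ A @ X, X))        # True
print(np.allclose((A @ X).T, A @ X))    # True
print(np.allclose((X @ A).T, X @ A))    # True
```

Since A here is singular, no ordinary inverse exists, yet the generalized inverse still satisfies all four conditions.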
In the example of a singular kriging system with two sample values z₁ and z₂ measured at the same location x₁ and an estimation point x₀, the system is solved with the generalized inverse

A⁺ = Q Σ⁺ Qᵀ   with Σ⁺ = diag(μ₁⁻¹, …, μ_r⁻¹, 0, …, 0)

We have

det(C) = 0 ⇒ λ₂ = 0
tr(C) = 2c ⇒ λ₁ = 2c

and a matrix of eigenvectors (normed to one)

Q = (1/√2) ( 1 1 ; 1 −1 )
The solution of the system, relying on the generalized inverse, does make sense: the estimator takes the average of the two sample values z₁ and z₂ measured at the location x₁ and multiplies this average by the simple kriging weight obtained when only one information is available in the neighborhood of x₀.
Linear Regression Theory
These are a few standard results of regression theory in the notation used in the
rest of the book.
E[ (Z₀ − cᵀZ)² ] = E[ (Z₀ − aᵀZ + aᵀZ − cᵀZ)² ]

and the cross term

E[ (Z₀ − aᵀZ) · (aᵀZ − cᵀZ) ] = E[ (Z₀ − aᵀZ) · (Zᵀa − Zᵀc) ]
= E[ Z₀Zᵀa − Z₀Zᵀc − aᵀZZᵀa + aᵀZZᵀc ]
= v₀ᵀ(a − c) − aᵀV(a − c)
= 0   because V a = v₀.

The mean squared error is clearly lowest for c = a. Thus the multiple regression estimator

Z₀* = m₀ + ∑_{i=1}^{N} a_i (Z_i − m_i)

provides the best linear estimation in the least squares sense.
Projection

Let Z be the matrix of N centered auxiliary variables and z_i the vector of n sample values of each variable Z_i. The sample values of the centered variable of interest z₀ are not part of Z. The variable of interest is estimated by multiple linear regression with the linear combination Z a = z₀*.
The variance-covariance matrix of the auxiliary variables is

V = (1/n) Zᵀ Z

The vector of covariances between the variable of interest and the auxiliary variables is

v₀ = (1/n) Zᵀ z₀

Thus the multiple linear regression is

V a = v₀  ⟺  ZᵀZ a = Zᵀ z₀
a = (ZᵀZ)⁻¹ Zᵀ z₀
z₀* = Z (ZᵀZ)⁻¹ Zᵀ z₀
z₀* = Z a = P z₀

where P = Z(ZᵀZ)⁻¹Zᵀ is a projection matrix.
The vector of estimated values z₀* is the result of this projection. The vector of estimation errors

e = z₀ − z₀*

connects z₀ with its projection and is by construction orthogonal to z₀*. We have

e = z₀ − Z a = (I − P) z₀

and the square of the length of the error vector is given by its Euclidean norm

‖e‖² = z₀ᵀ z₀ − 2 z₀ᵀ P z₀ + z₀ᵀ P P z₀ = z₀ᵀ (I − P) z₀

Using the identity

(A + b cᵀ)⁻¹ = A⁻¹ − A⁻¹ b cᵀ A⁻¹ / (1 + cᵀ A⁻¹ b)

we have

(ZᵀZ − z_α z_αᵀ)⁻¹ = (ZᵀZ)⁻¹ + (ZᵀZ)⁻¹ z_α z_αᵀ (ZᵀZ)⁻¹ / (1 − p_αα)

where p_αα is the element number α of the diagonal of the projection matrix P.
Cross validation

The error vector e = z₀ − z₀* is not a suitable basis for analyzing the goodness of fit of the multiple linear regression vector z₀*. Indeed each estimated value z₀^{α*} is obtained using the same weight vector a. This vector is actually computed from covariances set up with the sample values z₀^α and z_i^α, i = 1, …, N. Removing the sample α from the computation yields a weight vector

a_[α] = a − (ZᵀZ)⁻¹ z_α e_α / (1 − p_αα)

where e_α = z₀^α − z₀^{α*}.
From this a recursive formula for the cross-validation error e_[α] is derived

e_[α] = z₀^α − z_αᵀ a_[α]
      = z₀^α − z_αᵀ ( a − (ZᵀZ)⁻¹ z_α e_α / (1 − p_αα) )
      = e_α + p_αα e_α / (1 − p_αα)
      = e_α / (1 − p_αα)

An estimate of the mean squared cross validation error is now easily computed from the elements of the vector e and the diagonal elements p_αα of the projection matrix

(1/n) ∑_{α=1}^{n} ( e_[α] )² = (1/n) ∑_{α=1}^{n} ( e_α / (1 − p_αα) )²

which can serve to evaluate the goodness of fit of the multiple linear regression.
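The shortcut e_[α] = e_α/(1 − p_αα) avoids refitting the regression n times. The sketch below checks it against brute-force leave-one-out refits on synthetic data; all variable names and values are our own illustration.

```python
import numpy as np

# Verify the cross-validation shortcut e_[a] = e_a / (1 - p_aa)
# against brute-force leave-one-out regression (synthetic data).
rng = np.random.default_rng(0)
n, N = 20, 3
Z = rng.normal(size=(n, N))             # auxiliary variables
z0 = rng.normal(size=n)                 # variable of interest

P = Z @ np.linalg.inv(Z.T @ Z) @ Z.T    # projection matrix
e = z0 - P @ z0                         # ordinary residuals
e_loo = e / (1.0 - np.diag(P))          # cross-validation errors, no refit

for a in range(n):                      # brute force: refit without sample a
    keep = np.arange(n) != a
    coef, *_ = np.linalg.lstsq(Z[keep], z0[keep], rcond=None)
    assert np.isclose(z0[a] - Z[a] @ coef, e_loo[a])

print(np.mean(e_loo ** 2))              # mean squared cross-validation error
```

Every leave-one-out error computed from scratch matches the value obtained from the full-data residual and the diagonal of P, which is exactly the content of the recursive formula.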
Covariance and Variogram Models
Notation

• h is the spatial separation vector,
• b > 0 is the linear parameter,
• a > 0 and α are non-linear parameters,
• J_ν are Bessel functions of order ν,
• K_ν are Basset functions (modified Bessel functions of the third kind).
Covariance Models
Nugget-effect

C_nug(h) = b   when |h| = 0,
C_nug(h) = 0   when |h| > 0.
Spherical
Reference: [114], p57; [116], p86.
C_sph(h) = b ( 1 − 3|h|/(2a) + |h|³/(2a³) )   for 0 ≤ |h| ≤ a,
C_sph(h) = 0   for |h| > a.
Cubic
Reference: [31]. Only for up to 2D.
Stable
Reference: [208], vol. 1, p364.

C_exp−α(h) = b e^{−(|h|/a)^α}   with 0 < α ≤ 2
Exponential

C_exp(h) = b e^{−|h|/a}

Gaussian

C_gaus(h) = b e^{−|h|²/a²}
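The spherical, exponential and Gaussian models can be coded directly from the formulas above; the following sketch (function names and parameter values are our own) assumes the formulas as written, with b the sill and a the range parameter.

```python
import numpy as np

def c_sph(h, b, a):
    """Spherical: b*(1 - 3|h|/(2a) + |h|^3/(2a^3)) for |h| <= a, else 0."""
    h = np.abs(np.asarray(h, dtype=float))
    c = b * (1.0 - 1.5 * h / a + 0.5 * (h / a) ** 3)
    return np.where(h <= a, c, 0.0)

def c_exp(h, b, a):
    """Exponential: b * exp(-|h|/a)."""
    return b * np.exp(-np.abs(np.asarray(h, dtype=float)) / a)

def c_gaus(h, b, a):
    """Gaussian: b * exp(-|h|^2/a^2)."""
    return b * np.exp(-(np.asarray(h, dtype=float) / a) ** 2)

print(c_sph(0.0, b=1.0, a=100.0))     # sill value b at the origin
print(c_sph(150.0, b=1.0, a=100.0))   # exactly zero beyond the range a
```

Note that the spherical model reaches zero exactly at |h| = a, while the exponential and Gaussian models only approach zero asymptotically.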
Hole-effect
Reference: [208], vol. 1, p366.

C_hol(h) = b e^{−|h|/a} cos(ω|h|)   with ω a ≤ 1/√3
Basset
Reference: [114], p43; [208], vol. 1, p363.

Bessel
Reference: [114], p42; [208], vol. 1, p366.

C_bes(h) = b (|h|/a)^{−(n−2)/2} J_{(n−2)/2}(|h|/a)
Cauchy
Reference: [208], vol. 1, p365.

C_cau(h) = b [ 1 + (|h|/a)² ]^{−α}   with α > 0
Variogram models
Power
Reference: [114], p128; [208], vol. 1, p406.
De Wijsian-a
Reference: [113], vol. 1, p75.
De Wijsian
Reference: [113], vol. 1, p75.
C(h) = b e^{−|h|/a}

cov(ε_n, ε_m) = δ_nm σ² = σ² if n = m, 0 otherwise

Z*(n+1) = ρ Z(n),

with the simple kriging solution obtained in ii).
Data is located on a regular grid and simple kriging (i.e. without condition
on the weights) is performed at the nodes of this grid to estimate short and long
range components, using a neighborhood incorporating all data.
What relation can be established between the weights λ_α^S and λ_α^L of the two components at each point in the neighborhood?
EXERCISE 17.3 We have (1/n) ZᵀY = (1/n) ZᵀZ Q because Y = Z Q, and R Q = Q Λ is the eigendecomposition of R. Therefore cov(Z_i, Y_p) = λ_p q_ip and dividing by the standard deviation of Y_p, which is √λ_p, we obtain the correlation coefficient between the variable and the factor.
If the standardized variable is uncorrelated (orthogonal) with all others, an eigenvector q_p exists with all elements zero except for the element q_ip corresponding to that variable. As this element q_ip is 1 because of normalization and the variable is identical with the factor, the eigenvalue also has to be 1.
EXERCISE 17.4

R = corr(Z, Z) = Q √Λ √Λ Qᵀ = corr(Z, Y) [corr(Z, Y)]ᵀ
EXERCISE 18.1 The orthogonality constraints are not active like in PCA.
EXERCISE 18.2 Non active orthogonality constraints.
EXERCISE 18.3 Multiply the equation by the inverse of A.
f(ω) = (1/2π) ∫_{−∞}^{+∞} b e^{−a|h| − iωh} dh
     = (b/2π) [ ∫_{−∞}^{0} e^{(a−iω)h} dh + ∫_{0}^{+∞} e^{−(a+iω)h} dh ]
     = (b/2π) [ 1/(a−iω) + 1/(a+iω) ]
     = (b/π) · a/(a² + ω²) ≥ 0   for any ω.
The condition

a_i/(a_i² + ω²) · a_j/(a_j² + ω²) ≥ [ ((a_i+a_j)/2) / (((a_i+a_j)/2)² + ω²) ]²

is rewritten as

a_i a_j / ((a_i+a_j)/2)² ≥ (a_i² + ω²)(a_j² + ω²) / ( ((a_i+a_j)/2)² + ω² )²

This inequality is false for a_i ≠ a_j when ω → ∞ because the left hand side is a constant < 1 while the right hand side is dominated for large values of ω by a term ω⁴ appearing both in the numerator and in the denominator. Thus the set of direct and cross covariance functions is not an authorized covariance function matrix.
In this exercise the sills were implicitly set to one (which implies a linear
correlation coefficient equal to one). YAGLOM [208], vol. I, p. 315, gives the
example of a bivariate exponential covariance function model in which the range
parameters a are linked to the sill parameters b in such a way that to a given
degree of uncorrelatedness corresponds an interval of permissible ranges.
EXERCISE 22.1

∑_{i=1}^{N} ∑_{j=1}^{N} w_i w_j b_ij ≥ 0   and   ∑_{α=1}^{n} ∑_{β=1}^{n} w_α w_β ρ(x_α−x_β) ≥ 0

( ∑_{i=1}^{N} ∑_{j=1}^{N} w_i w_j b_ij ) · ( ∑_{α=1}^{n} ∑_{β=1}^{n} w_α w_β ρ(x_α−x_β) )
= ∑_{i=1}^{N} ∑_{j=1}^{N} ∑_{α=1}^{n} ∑_{β=1}^{n} w_i^α w_j^β b_ij ρ(x_α−x_β) ≥ 0

with w_i^α = w_i w_α.
EXERCISE 22.2 γ(h) is a conditionally negative definite function and −γ(h) is conditionally positive definite. −B γ(h) is a conditionally positive definite function matrix. The demonstration is in the same spirit as for a covariance function matrix.
EXERCISE 23.1
i) The diagonal elements of the coregionalization matrix are positive and the determinant is

| 2 −1 ; −1 1 | = 2 − 1 = 1

The principal minors are positive and the matrix is positive definite.
The correlation coefficient is r = −1/√2 = −.71.
Solutions to Exercises 225
ii) In the case of isotopy the system would be

( C₁₁(0)       C₁₁(x₁−x₂)  C₁₂(0)       C₁₂(x₁−x₂) )   ( w₁¹ )   ( C₁₁(x₁−x₀) )
( C₁₁(x₂−x₁)  C₁₁(0)       C₁₂(x₂−x₁)  C₁₂(0)      ) · ( w₂¹ ) = ( C₁₁(x₂−x₀) )
( C₂₁(0)       C₂₁(x₁−x₂)  C₂₂(0)       C₂₂(x₁−x₂) )   ( w₁² )   ( C₂₁(x₁−x₀) )
( C₂₁(x₂−x₁)  C₂₁(0)       C₂₂(x₂−x₁)  C₂₂(0)      )   ( w₂² )   ( C₂₁(x₂−x₀) )
As there is no information about Z₁(x) at the point x₂, the second row of the system and the second column of the left hand matrix vanish and we have

( 2 −1 0 ; −1 1 0 ; 0 0 1 ) ( w₁ ; w₂ ; w₃ ) = ( 0 ; 0 ; 0 ),  ( 0 ; 0 ; −5/16 ),  ( 0 ; 0 ; −1 )

iii) w₃ = 0, −5/16, −1 and w₁ = w₂ = 0 for the three estimation points.
iv)
z₁*(x₀') = m₁ because the point x₀' is out of range of the two data points.
z₁*(x₀'') = m₁ + C₁₂(x₀''−x₂) (z₂(x₂) − m₂): it is estimated from the residual of the auxiliary variable at the point x₂.
z₁*(x₂) = m₁ + cov(z₁, z₂) (z₂(x₂) − m₂) is equivalent to the linear regression at the point x₂.
As a conclusion of this exercise we see that in the heterotopic case cokriging
with an intrinsic correlation model does not boil down to kriging. For the two
points within the range of X2 the only non zero weight is for a data value of the
auxiliary variable, while the primary variable solely contributes to the simple
cokriging estimator through its mean ml.
EXERCISE 23.4 No.
EXERCISE 23.5 This strategy gives a trivial result. The sum of the weights for the auxiliary variable S(x) is constrained to be zero in ordinary kriging. Thus the weight for the only data value on S included in the cokriging neighborhood is zero.
EXERCISE 24.1 When all coregionalization matrices B_u are proportional to one matrix B we have:

C(h) = ∑_{u=0}^{S} a_u B ρ_u(h) = B ∑_{u=0}^{S} a_u ρ_u(h)

where the a_u are the coefficients of proportionality.
EXERCISE 24.2 As a matrix B_u is positive semi-definite by definition, the positivity of its second order minors implies an inequality between direct and cross sills

|b_ij| ≤ √( b_ii b_jj )
with any set of weights w_α. Thus C_Re(h) is a positive definite function.
EXERCISE 26.2

+ 2 ∑_{α=1}^{n} ∑_{β=1}^{n} μ_α ν_β C_uv(x_α−x_β)

− 2 ∑_{α=1}^{n} μ_α C_uu(x_α−x₀) − 2 ∑_{α=1}^{n} ν_α C_uv(x_α−x₀)
EXERCISE 26.5 With an even cross covariance function C_uv(h) = C_vu(h) the kriging variance of complex kriging is expressed with the weights w_K¹ and w_K² that are solution of the two simple kriging systems for the real and imaginary parts; the corresponding quadratic forms w_K¹ᵀ C_uu w_K¹ and w_K²ᵀ C_vv w_K² are nonnegative because C_uu and C_vv are positive definite.
The difference between the two kriging variances reduces to a quantity Q² ≥ 0. The variance of complex kriging is thus larger than the variance of the separate kriging of the real and imaginary parts when the cross covariance function is even. Equality is achieved with intrinsic correlation.
EXERCISE 30.2 a)

σ²_OK = C₀₀ − k₀ᵀ K⁻¹ k₀ = C₀₀ − k₀ᵀ ( w_OK ; −μ_OK )
      = C(x₀−x₀) − ∑_{α=1}^{n} w_α C(x_α−x₀) + μ_OK

b)

u = −1/(1ᵀR1),   v = R1/(1ᵀR1),   A = R − (R1)(R1)ᵀ/(1ᵀR1)
m* = (zᵀ, 0) K⁻¹ ( 0 ; 1 ) = (zᵀ, 0) ( v ; u ) = zᵀ v = zᵀR1/(1ᵀR1)
c) Writing s' = (sᵀ, 0)ᵀ, with

v_F = s'ᵀ K⁻¹ s'   and   A_F = K⁻¹ − K⁻¹ s' (K⁻¹ s')ᵀ / ( s'ᵀ K⁻¹ s' )

and

s'ᵀ K⁻¹ s' = K_n(s, s) = cov_n(s, s)

the coefficients of the external drift are

b* = (zᵀ, 0) K⁻¹ s' / ( s'ᵀ K⁻¹ s' ) = K_n(z, s) / K_n(s, s)

a* = (zᵀ, 0, 0) F⁻¹ ( 0 ; 1 ; 0 ) = (zᵀ, 0) A_F ( 0 ; 1 )
   = (zᵀ, 0) K⁻¹ ( 0 ; 1 ) − (zᵀ, 0) K⁻¹ s' · s'ᵀ K⁻¹ ( 0 ; 1 ) / ( s'ᵀ K⁻¹ s' )
   = E_n[z] − b* E_n[s]
C₀₀ u₀₀ − k₀ᵀ K⁻¹ k₀ u₀₀ = 1   ⇒   u₀₀ = 1/σ²_OK

b)
K_{n+1}(z₀, s₀) = ( z₀ − (zᵀ, 0) K⁻¹ k₀ ) · Δ(s₀|s)/σ²_OK + K_n(z, s)

and

(u₀₀, v₀ᵀ) · ( s₀ ; s ; 0 ) = u₀₀ s₀ + v₀ᵀ ( s ; 0 ) = u₀₀ s₀ − u₀₀ k₀ᵀ K⁻¹ ( s ; 0 )
                            = u₀₀ Δ(s₀|s) = Δ(s₀|s)/σ²_OK
and for α = n.

b)

b* = K_n(z, s) / K_n(s, s) = (zᵀ, 0) K⁻¹ s' / K_n(s, s) = ∑_{α=1}^{n} w_α^b z_α

with

w_α^b = Δ(s_α|s_[α]) / ( σ²_[α] K_n(s, s) )
EXERCISE 1.1

1ᵀ1 = n

11ᵀ is the n × n matrix whose elements are all equal to 1.
EXERCISE IV.1
i)

∑_{β=1}^{n} w_β e^{−|α−β|/a} = e^{−|α−(n+1)|/a}   for α = 1, …, n

ii)

w_n e^{−|α−n|/a} = e^{−|α−(n+1)|/a}   for α = 1, …, n

⇒  w_n = e^{−|α−(n+1)|/a} / e^{−|α−n|/a} = e^{−1/a} = ρ

iii)

Z(n) = ρ Z(n−1) + ε_n = ρ [ ρ Z(n−2) + ε_{n−1} ] + ε_n
     = ρ² Z(n−2) + ρ ε_{n−1} + ε_n = ρ^m Z(n−m) + ∑_{α=0}^{m−1} ρ^α ε_{n−α}

Z(n) = ∑_{α=0}^{∞} ρ^α ε_{n−α} = ∑_{α=−∞}^{n} ρ^{n−α} ε_α
iv)

cov(Z(n), Z(m)) = ∑_{α=0}^{∞} ∑_{β=0}^{∞} ρ^α ρ^β cov(ε_{n−α}, ε_{m−β})
               = σ² ρ^{|n−m|} ∑_{i=0}^{∞} ρ^{2i}
               = σ² ρ^{|n−m|} / (1 − ρ²)

Thus cov(Z(n), Z(m)) = σ² ρ^{|n−m|} / (1 − ρ²) and r_nm = ρ^{|n−m|}.
v) ρ = w_n, thus the range is

a = −1/log ρ

and the sill

b = var(Z(n)) = σ² / (1 − ρ²)
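The result can be checked by simulation. The sketch below (parameter values hypothetical) generates a long AR(1) series and compares empirical covariances with σ² ρ^{|n−m|} / (1 − ρ²).

```python
import numpy as np

# Simulate Z(n) = rho * Z(n-1) + eps_n and compare empirical covariances
# with the theoretical value sigma^2 * rho**|n-m| / (1 - rho**2).
rng = np.random.default_rng(1)
rho, sigma, n = 0.8, 1.0, 200_000
sill = sigma ** 2 / (1.0 - rho ** 2)     # = var(Z(n))

z = np.empty(n)
z[0] = rng.normal(scale=np.sqrt(sill))   # start in the stationary regime
for t in range(1, n):
    z[t] = rho * z[t - 1] + rng.normal(scale=sigma)

emp = {lag: float(np.mean(z[:-lag] * z[lag:])) for lag in (1, 5)}
theo = {lag: sill * rho ** lag for lag in (1, 5)}
print(emp, theo)   # empirical values close to the theoretical ones
```

The empirical lag covariances decay geometrically with the lag, as the formula ρ^{|n−m|} predicts, and the sample variance approaches the sill σ²/(1−ρ²).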
EXERCISE IV.2 As we have exact interpolation, the kriging weights for Z*(x₀) are linked by

w_α = δ_{x_α, x₀}
( C₁,₁ … C₁,N 1 ; ⋮ ⋱ ⋮ ⋮ ; C_N,₁ … C_N,N 1 ; 1 … 1 0 ) · [ ( w^p ; μ_p ) + ( w^a ; μ_a ) + ( w^{m₀} ; μ_{m₀} ) ] = b_p + b_a + b_{m₀}

where b_p, b_a and b_{m₀} denote the right hand sides of the kriging systems of the individual components.
Concepts
Classification: SOUSA [175]; OLIVER & WEBSTER [135]; RASPA ET AL. [146].
Cokriging of variables and their derivatives: CHAUVET ET AL. [27]; THIEBAUX &
PEDDER [187]; RENARD & RUFFO [148].
Cross validation: COOK & WEISBERG [36]; DUBRULE [61]; CASTELIER [21][22].
Noise filtering: SWITZER & GREEN [185]; BERMAN [14]; MA & ROYER [105]; DALY [44]; DALY ET AL. [45][46].
Space-time modeling: STEIN [180]; HASLETT [85]; GOODALL & MARDIA [70].
Applications
Design of computer experiments: SACKS ET AL. [161).
Geophysical exploration: GALLI ET AL. [65]; CHILES & GUILLEN [33]; SCHULZ-OHLBERG [168]; RENARD & NAI-HSIEN [147]; SEGURET & HUCHON [171]; SEGURET [170].
Hydrogeology: DELHOMME [53]; CREUTIN & OBLED [41]; BRAS & RODRIGUEZ-ITURBE [17]; DE MARSILY [111]; STEIN [180]; AHMED & DE MARSILY [2]; DAGAN [43]; DONG [59]; ROUHANI & WACKERNAGEL [156]; CHILES [32]; MATHERON ET AL. [127].
Image analysis: SWITZER & GREEN [185]; BERMAN [14]; MA & ROYER [105]; DALY
ET AL. [45][46][47].
Industrial hygienics: PREAT [142]; SCHNEIDER ET AL. [166].
Material science: DALY ET AL. [45][46][47].
Meteorology: CHAUVET ET AL. [27]; THIEBAUX & PEDDER [187]; HASLETT [85];
THIEBAUX ET AL. [186]; HUDSON & WACKERNAGEL [87].
Mining: JOURNEL & HUIJBREGTS [93]; PARKER [140]; SOUSA [175].
Petroleum and gas exploration: DELHOMME ET AL. [55]; DELFINER ET AL. [51]; GALLI ET AL. [65]; MARECHAL [110]; GALLI & MEUNIER [66]; JAQUET [92]; RENARD & RUFFO [148].
Pollution: ORFEUIL [137]; LAJAUNIE [96]; BROWN ET AL. [18].
Soil science: WEBSTER [202]; WACKERNAGEL ET AL. [200]; GOULARD [77]; OLIVER & WEBSTER [135]; WEBSTER & OLIVER [204]; GOULARD & VOLTZ [78]; STEIN, STARISKY & BOUMA [179]; GOOVAERTS [72][73]; GOOVAERTS ET AL. [74]; PAPRITZ & FLÜHLER [138]; GOOVAERTS & WEBSTER [75]; WEBSTER ET AL. [203].
Books
Basic geostatistical texts: MATHERON [113], [114], [116], [126].
Introductory texts: DAVID [48]; JOURNEL & HUIJBREGTS [93]; CLARK [34]; DELFINER [50]; ARMSTRONG & MATHERON [10]; AKIN & SIEMES [3]; ISAAKS & SRIVASTAVA [91]; CHAUVET [24], [25]; CRESSIE [40]; ARMSTRONG ET AL. [11]; RIVOIRARD [155].
Books on multivariate analysis: RAO [145]; MORRISON [131]; MARDIA, KENT &
BIBBY [108]; COOK & WEISBERG [36]; ANDERSON [5]; SEBER [169]; GREENACRE [79];
VOLLE [193]; GITTINS [69]; GIFI [67]; SAPORTA [165]; WHITTAKER [205].
Software
Several public domain products exist, of varying quality and, up to now, of limited lifetime. The most commonly used collection of FORTRAN routines is available in the book by DEUTSCH & JOURNEL [56].
We list three sources of commercial software:
• S-Plus. A general purpose statistical system for workstations and PC. Functions for spatial statistics are described in the book by VENABLES & RIPLEY [191]. Developed by: Statistical Sciences Inc., 1700 Westlake Ave. N., Suite 500, Seattle, WA 98109, USA.
Bibliography
[1] ADLER R (1981) The Geometry of Random Fields. Wiley, New York.
[2] AHMED S & DE MARSILY G (1987) Comparison of geostatistical methods for estimating transmissivity using data on transmissivity and specific capacity. Water Resources Research, 23, 1717-1737.
[4] ALMEIDA AS & JOURNEL AG (1994) Joint simulation of multiple variables with a Markov-type coregionalization model. Mathematical Geology, 26, 565-588.
[6] ANSELIN L (1988) Spatial Econometrics: Methods and Models. Kluwer Academic Publishers, Dordrecht, 284p.
[9] ARMSTRONG M & DOWD PA, editors (1994) Geostatistical Simulation. Kluwer
Academic Publisher, Dordrecht, 255p.
[14] BERMAN M (1985) The statistical properties of three noise removal procedures for multichannel remotely sensed data. Division of Mathematics and Statistics, Consulting Report NSW/85/31/MB9, CSIRO, Lindfield, Australia.
[15] BOURGAULT G & MARCOTTE D (1991) Multivariable variogram and its application to the linear model of coregionalization. Mathematical Geology, 23, 899-928.
[16] BOX GEP & JENKINS GM (1970) Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.
[21] CASTELIER E (1992) La derive externe vue comme une regression lineaire. Publication N-34/92/G, Centre de Geostatistique, Ecole des Mines de Paris, Fontainebleau, 22p.
[23] CHAUVET P (1982) The variogram cloud. In: Johnson TB & Barnes RJ (eds) 17th APCOM, 757-764, Society of Mining Engineers, New York.
[24] CHAUVET P (1991) Aide Memoire de Geostatistique Lineaire. Cahiers de Geostatistique, Fascicule 2, Ecole des Mines de Paris, Fontainebleau, 210p.
[25] CHAUVET P (1993) Processing Data with a Spatial Support: Geostatistics and its Methods. Cahiers de Geostatistique, Fascicule 4, Ecole des Mines de Paris, Fontainebleau, 57p.
[26] CHAUVET P & GALLI A (1982) Universal Kriging. Publication C-96, Centre de Geostatistique, Ecole des Mines de Paris, Fontainebleau.
[27] CHAUVET P, PAILLEUX J & CHILES JP (1976) Analyse objective des champs meteorologiques par cokrigeage. La Meteorologie, 6e serie, No 4, 37-54.
[28] CHOQUET G (1967) Lectures on Analysis. Vol. 3, Benjamin, New York, 394p.
[30] CHRISTENSEN R (1991) Linear Models for Multivariate, Time Series and Spatial Data. Springer-Verlag, New York.
[34] CLARK I (1979) Practical Geostatistics. Elsevier Applied Science, London, 129p.
[35] CLIFF AD & ORD JK (1981) Spatial Processes: Models & Applications. Pion, London, 266p.
[36] COOK RD & WEISBERG S (1982) Residuals and Influence in Regression. Chap-
man and Hall, New York, 230p.
[37] CRAMER H (1940) On the theory of stationary random processes. Annals of Mathematics, 41, 215-230.
[38] CRAMER H & LEADBETTER MR (1967) Stationary and Related Stochastic Pro-
cesses. Wiley, New York.
[39] CRESSIE N (1990) The origins of kriging. Mathematical Geology, 22, 239-252.
[40] CRESSIE N (1991) Statistics for Spatial Data. Wiley, New York, 900p.
[41] CREUTIN JD & OBLED C (1982) Objective analyses and mapping techniques for rainfall fields: an objective comparison. Water Resources Research, 18, 413-431.
[44] DALY C (1991) Application de la Geostatistique a quelques Problemes de Filtrage. Doctoral Thesis, Ecole des Mines de Paris, Fontainebleau.
[47] DALY C, JEULIN D & BENOIT D (1992) Nonlinear statistical filtering and applications to segregation in steels from microprobe images. Scanning Microscopy Supplement 6, 137-145.
[49] DAVIS JC (1973) Statistics and Data Analysis in Geology. Wiley, New York.
[54] DELHOMME JP (1979) Reflexions sur la prise en compte simultanee des donnees de forages et des donnees sismiques. Publication LHM/RC/79/41, Centre d'Informatique Geologique, Ecole des Mines de Paris, Fontainebleau.
[56] DEUTSCH C & JOURNEL A (1992) GSLIB: Geostatistical Software Library and
User's Guide. Oxford University Press, New York.
[57] DIGGLE PJ, LIANG KY & ZEGER SL (1994) Analysis of Longitudinal Data.
Clarendon Press, Oxford, 253p.
[58] DIMITRAKOPOULOS R, editor (1994) Geostatistics for the Next Century. Kluwer
Academic Publishers, Dordrecht, 497p.
[59] DONG A (1990) Estimation Geostatistique des Phenomenes regis par des Equations aux Derivees Partielles. Doctoral thesis, Ecole des Mines de Paris, Fontainebleau, 262p.
[63] FOUQUET C DE & MANDALLAZ D (1993) Using geostatistics for forest inventory with air cover: an example. In: Soares A (Ed.) Geostatistics Troia '92, 875-886, Kluwer Academic Publisher, Amsterdam.
[66] GALLI A & MEUNIER G (1987) Study of a gas reservoir using the external drift method. In: G Matheron and M Armstrong (eds) Geostatistical Case Studies, 105-119, Reidel, Dordrecht.
[72] GOOVAERTS P (1992) Factorial kriging analysis: a useful tool for exploring the structure of multivariate spatial information. Journal of Soil Science, 43, 597-619.
[73] GOOVAERTS P (1994) Study of spatial relationships between two sets of variables using multivariate geostatistics. Geoderma, 62, 93-107.
[78] GOULARD M & VOLTZ M (1992) Linear coregionalization model: tools for estimation and choice of multivariate variograms. Mathematical Geology, 24, 269-286.
[83] GUTTORP P & SAMPSON PD (1994) Methods for estimating heterogeneous spatial covariance functions with environmental applications. In: Patil GP & Rao CR (eds) Environmental Statistics, Handbook of Statistics, 12, 661-689, North-Holland, Amsterdam.
[84] HAINING R (1990) Spatial Data Analysis in the Social and Environmental Sciences. Cambridge University Press, Cambridge, 409p.
[86] HASLETT J, BRADLEY R, CRAIG PS, WILLS G & UNWIN AR (1991) Dynamic graphics for exploring spatial data, with application to locating global and local anomalies. The American Statistician, 45, 234-242.
[87] HUDSON G & WACKERNAGEL H (1994) Mapping temperature using kriging with external drift: theory and an example from Scotland. International Journal of Climatology, 14, 77-91.
[92] JAQUET O (1989) Factorial kriging analysis applied to geological data from petroleum exploration. Mathematical Geology, 21, 683-691.
[94] KENDALL M & STUART A (1977) The Advanced Theory of Statistics. Volume 1,
4th edition, Griffin, London, 472p.
[96] LAJAUNIE C (1984) A geostatistical approach to air pollution modeling. In: Verly
G et al (eds) Geostatistics for Natural Resources Characterization, 877-891,
NATO ASI Series C-122, Reidel, Dordrecht.
[99] LAJAUNIE C & BEJAOUI R (1991) Sur le krigeage des fonctions complexes. Note
N-23/91/G, Centre de Geostatistique, Ecole des Mines de Paris, Fontainebleau,
24p.
[104] LUMLEY JL (1970) Stochastic Tools in Turbulence. Academic Press, London.
[105] MA YZ & ROYER JJ (1988) Local geostatistical filtering: application to remote
sensing. Sciences de la Terre, Serie Informatique, 27, 17-36.
[106] MAGNUS JR & NEUDECKER H (1988) Matrix Differential Calculus with Appli-
cations in Statistics and Econometrics. Wiley, New York.
[108] MARDIA KV, KENT JT & BIBBY JM (1979) Multivariate Analysis. Academic
Press, London, 521p.
[110] MARECHAL A (1984) Kriging seismic data in presence of faults. In: Verly G et al. (eds) Geostatistics for Natural Resources Characterization, 271-294, NATO ASI Series C-122, Reidel, Dordrecht.
[118] MATHERON G (1972) Leçons sur les Fonctions Aleatoires d'Ordre 2. Ecole des Mines, Paris.
[119] MATHERON G (1973) The intrinsic random functions and their applications. Advances in Applied Probability, 5, 439-468.
[120] MATHERON G (1976) A simple substitute for conditional expectation: the disjunctive kriging. In: Guarascio M et al. (eds) Advanced Geostatistics in the Mining Industry, 221-236, NATO ASI Series C 24, Reidel, Dordrecht.
[121] MATHERON G (1979) Recherche de simplification dans un probleme de
cokrigeage. Publication N-628, Centre de Geostatistique, Ecole des Mines de Paris,
Fontainebleau.
[122] MATHERON G (1982) Pour une Analyse Krigeante des Donnees Regionalisees.
Publication N-732, Centre de Geostatistique, Fontainebleau, France, 22p.
[124] MATHERON G (1984) The selectivity of distributions and "the second principle of geostatistics". In: Verly G et al. (eds) Geostatistics for Natural Resources Characterization, 421-433, NATO ASI Series C-122, Reidel, Dordrecht.
[125] MATHERON G (1989) Two classes of isofactorial models. In: Armstrong M (ed)
Geostatistics, 309-322, Kluwer Academic Publisher, Amsterdam, Holland.
[128] MEIER S & KELLER W (1990) Geostatistik: eine Einführung in die Theorie
der Zufallsprozesse. Springer-Verlag, Wien, 206p.
[135] OLIVER MA & WEBSTER R (1989) A geostatistical basis for spatial weighting in multivariate classification. Mathematical Geology, 21, 15-35.
[136] OLIVER MA, LAJAUNIE C, WEBSTER R, MUIR KR & MANN JR (1993) Es-
timating the risk of childhood cancer. In: Soares A (Ed.) Geostatistics Troia '92,
899-910, Kluwer Academic Publisher, Amsterdam.
[137] ORFEUIL JP (1977) Etude, mise en oeuvre et test d'un modele de prediction a court terme de pollution atmospherique. Rapport CECA, Publication N-498, Centre de Geostatistique, Ecole des Mines de Paris, Fontainebleau.
[140] PARKER H (1979) The volume-variance relationship: a useful tool for mine plan-
ning. Engineering and Mining Journal, 180, 10, 106-123.
[141] PETITGAS P (1993) Use of disjunctive kriging to model areas of high pelagic fish density in acoustic fisheries surveys. Aquatic Living Resources, 6, 201-209.
[143] PRIESTLEY MB (1981) Spectral Analysis and Time Series. Academic Press, Lon-
don, 890p.
[144] RAO CR (1964) The use and interpretation of principal component analysis in applied research. Sankhya, Series A, 329-358.
[145] RAO CR (1973) Linear Statistical Inference and its Applications. 2nd edition, Wiley, New York, 625p.
[148] RENARD D & RUFFO P (1993) Depth, dip and gradient. In: Soares A (Ed.)
Geostatistics Troia '92, 167-178, Kluwer Academic Publisher, Amsterdam.
[153] RIVOIRARD J (1989) Models with orthogonal indicator residuals. In: Armstrong M (ed) Geostatistics, 91-107, Kluwer Academic Publisher, Dordrecht.
[160] SABOURIN R (1976) Application of two methods for the interpretation of the underlying variogram. In: Guarascio M et al. (eds) Advanced Geostatistics in the Mining Industry, 101-109, NATO ASI Series C 24, Reidel, Dordrecht.
[161] SACKS J, WELCH WJ, MITCHELL TJ & WYNN HP (1989) Design and analysis
of computer experiments. Statistical Science, 4, 409-435.
[164] SANDJIVY L (1984) The factorial kriging analysis of regionalized data. In: Verly G et al. (eds) Geostatistics for Natural Resources Characterization, 559-571, NATO ASI Series C-122, Reidel, Dordrecht.
[167] SCHOENBERG IJ (1938) Metric spaces and completely monotone functions. Annals of Mathematics, 39, 811-841.
[169] SEBER GAF (1984) Multivariate Observations. Wiley, New York, 686p.
[171] SEGURET S & HUCHON P (1990) Trigonometric kriging: a new method for removing the diurnal variation from geomagnetic data. Journal of Geophysical Research, Vol. 32, No. B13, 21383-21397.
[174] SOARES A, editor (1993) Geostatistics Troia '92. Two volumes, Kluwer Academic Publisher, Amsterdam.
[176] SPECTOR A & BHATTACHARYYA BK (1966) Energy density spectrum and au-
tocorrelation functions due to simple magnetic models. Geophysical Prospecting,
14, 242-272.
[177] SPECTOR A & GRANT FS (1970) Statistical models for interpreting aeromag-
netic data. Geophysics, 35, 293-302.
[179] STEIN A, STARISKY IG & BOUMA J (1991) Simulation of moisture deficits and
areal interpolation by universal cokriging. Water Resources Research, 27, 1963-
1973.
[180] STEIN M (1986) A simple model for spatial-temporal processes. Water Resources Research, 22, 2107-2110.
[182] STOYAN D & STOYAN H (1994) Fractals, Random Shapes and Point Fields.
Wiley, New York, 406p.
[183] STRANG G (1984) Linear Algebra and its Applications. Second edition, Academic
Press, London.
[185] SWITZER P & GREEN AA (1984) Minimax autocorrelation factors for multi-
variate spatial imagery. Department of Statistics, Stanford University, Technical
Report No 6, 14p.
[186] THIEBAUX HJ, MORONE LL & WOBUS RL (1990) Global forecast error cor-
relation. Part 1: isobaric wind and geopotential. Monthly Weather Review, 118,
2117-2137.
[187] THIEBAUX HJ & PEDDER MA (1987) Spatial Objective Analysis: with Appli-
cations in Atmospheric Science. Academic Press, London, 299p.
[189] UPTON G & FINGLETON B (1985) Spatial Data Analysis by Example: Point
Pattern and Quantitative Data. Wiley, Chichester, 410p.
[190] VANMARCKE E (1983) Random Fields: Application and Synthesis. MIT Press,
Cambridge, 382p.
[191] VENABLES WN & RIPLEY BD (1994) Modern Applied Statistics with S-Plus.
Springer-Verlag, New York, 462p.
[192] VERLY G, DAVID M & JOURNEL AG, editors (1984) Geostatistics for Natu-
ral Resources Characterization. Two volumes, NATO ASI Series C-122, Reidel,
Dordrecht.
[204] WEBSTER R & OLIVER MA (1990) Statistical Methods in Soil and Land Re-
source Survey. Oxford University Press, Oxford.