
H. Wackernagel, Multivariate Geostatistics


Springer-Verlag Berlin Heidelberg GmbH
Hans Wackernagel

Multivariate
Geostatistics
An Introduction
with Applications

With 75 Figures and 5 Tables

Springer
Dr. HANS WACKERNAGEL

Centre de Geostatistique
Ecole des Mines de Paris
35, rue Saint Honore
77305 Fontainebleau
France

Cataloging-in-Publication Data applied for

Die Deutsche Bibliothek - CIP-Einheitsaufnahme


Wackernagel, Hans:
Multivariate Geostatistics : an introduction with applications ; with 5 tables / Hans
Wackernagel. - Berlin ; Heidelberg ; New York ; Barcelona ; Budapest ; Hong Kong ;
London ; Milan ; Paris ; Santa Clara ; Singapore ; Tokyo : Springer, 1995

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is con-
cerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, re-
production on microfilm or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer-Verlag. Violations
are liable for prosecution under the German Copyright Law.

ISBN 978-3-662-03100-1 ISBN 978-3-662-03098-1 (eBook)


DOI 10.1007/978-3-662-03098-1
© Springer-Verlag Berlin Heidelberg 1995
Originally published by Springer-Verlag Berlin Heidelberg New York in 1995,
Softcover reprint of the hardcover 1st edition 1995
The use of general descriptive names, registered names, trademarks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.

Product liability: The publishers cannot guarantee the accuracy of any information about dosage and
application contained in this book. In every individual case the user must check such information by
consulting the relevant literature.

L'analyse des données est "un outil pour dégager de la gangue des
données le pur diamant de la véridique nature".

JP BENZECRI (according to [193])

Multivariate analysis is "a tool to extract from the gangue of the data
the pure diamond of truthful nature".
Preface
Introducing geostatistics from a multivariate perspective is the main aim of this
book. The idea took root while teaching geostatistics at the Centre de Geostatis-
tique (Ecole des Mines de Paris) over the past ten years in the two postgraduate
programs DEA and CFSG. A first script of lecture notes in French originated
from this activity.
A specialized course on Multivariate and Exploratory Geostatistics held in
September 1993 in Paris (organized in collaboration with the Department of
Statistics of Trinity College Dublin) was the occasion to test some of the mate-
rial on a pluridisciplinary audience. Another important opportunity arose last
year when giving a lecture on Spatial Statistics during the summer term at the
Department of Statistics of the University of Washington at Seattle, where part
of this manuscript was distributed in an early version. Short accounts were also
given during COMETT and TEMPUS courses on geostatistics for environmental
studies in Fontainebleau, Freiberg, Rome and Prague, which were sponsored by
the European Community.
I wish to thank the participants of these various courses for their stimulating
questions and comments. Among the organizers of these courses, I particularly
want to acknowledge the support received from Georges Matheron, Pierre Chau-
vet, Margaret Armstrong, John Haslett and Paul Sampson. Michel Grzebyk has
made valuable comments on Chapters 26 and 27, which partly summarize some
of his contributions to the field.

Fontainebleau, May 1995 Hans Wackernagel


Contents

1 Introduction 1

A Preliminaries 5
2 From Statistics to Geostatistics 7
The mean: center of mass 7
Covariance. . . . . . . . . . 10
Linear regression . . . . . . 12
Variance-covariance matrix. 15
Multiple linear regression. 16
Simple kriging. . . . . . . . 18

B Geostatistics 23
3 Regionalized Variable and Random Function 25
Multivariate time/space data . . . . . . 25
Regionalized variable . . . . . . . . . . . 26
Random variable and regionalized value 27
Random function . . . . 27
Probability distributions 28
Strict stationarity . . 29

4 Variogram Cloud 30
Dissimilarity versus separation. . . . . . . . . . . . . . 30
Experimental variogram . . . . . . . . . . . . . . . . . 32
Replacing the experimental by a theoretical variogram 34

5 Variogram and Covariance Function 35


Regional variogram . . 35
Theoretical variogram . . 35
Covariance function. . . . 37
Positive definite function . 37
Conditionally negative definite function . 38

Fitting the variogram with a covariance function . . . . . . . . . . . . 40

6 Examples of Covariance Functions 41


Nugget-effect model . . . . . . 41
Exponential covariance function . . . . 41
Spherical model . . . . . . . . . . . . . 42
Derivation of the spherical covariance . 43

7 Anisotropy 46
Geometric Anisotropy . . . . . . . 46
Rotating and dilating an ellipsoid . 46
Exploring 3D space for anisotropy . 48
Zonal anisotropy . . . . . . . . 48
Nonlinear deformations of space .. 49

8 Extension and Dispersion Variance 50


Support . . . . . . 50
Extension variance 51
Dispersion variance 52
Krige's relation .. 53
Change of support effect 54
Application: acoustic data 56
Comparison of sampling designs 59
9 Measures and Plots of Dispersion 62
Tonnage, recovered quantity, investment and profit 62
Selectivity . . . . . . . . . . . . . . . . . . . 64
Recovered quantity as a function of tonnage 66
Time series in environmental monitoring 67

10 Kriging the Mean 69


Mean value over a region . 69
No systematic bias . . . . 70
Variance of the estimation error 70
Minimal estimation variance . . 71
Kriging equations . . . . . . . . 71
Case of the nugget-effect model 72
Local estimation of the mean 73

11 Ordinary Kriging 74
Ordinary kriging problem . . . . . . . . 74
Block kriging . . . . . . . . . . . . . . . 76
Simple kriging with an estimated mean . 77
Kriging the residual . 79
Cross validation . . . . . . . . . . . . . . 80

12 Kriging Weights 82
Geometry . . . . . . 82
Geometric anisotropy . 84
Relative position of samples 84
Screen effect . . . . . . . . . 85
Factorizable covariance functions 86
Negative kriging weights .. 88
13 Mapping with Kriging 89
Kriging for spatial interpolation 89
Neighborhood . . . . . . . . . . 90
14 Linear Model of Regionalization 94
Spatial anomalies . . . . . . . . . . . . 94
Nested variogram model . . . . . . . . 95
Decomposition of the random function 96
Second-order stationary regionalization 97
Intrinsic regionalization . . . . . . . . 98
Intrinsic regionalization with mostly stationary components 98
Locally stationary regionalization . . . . . . . . . . . . . . . 99

15 Kriging Spatial Components 100


Kriging of the intrinsic component . . . . . . . 100
Kriging of a second-order stationary component 101
Filtering . . . . . . . . . . . . . . . . . . . . . . 103
Application: kriging spatial components of arsenic data . 104

16 The Smoothness of Kriging 106


Kriging with irregularly spaced data 106
Sensitivity to choice of variogram model 109
Application: kriging topographic data . 110

C Multivariate Analysis 113


17 Principal Component Analysis 115
Transformation into factors . . . . . 115
Maximization of the variance of a factor 116
Interpretation of the factor variances . . 117
Correlation of the variables with the factors 118

18 Canonical Analysis 123


Factors in two groups of variables . . . . . 123
Intermezzo: singular value decomposition . 124
Maximization of the correlation . . . . . . 124

19 Correspondence Analysis 126


Disjunctive table . . . . . . . . 126
Contingency table. . . . . . . . 126
Canonical analysis of disjunctive tables 127
Coding of a quantitative variable . . . 127
Contingencies between two quantitative variables 127
Continuous correspondence analysis . . . . . . . . 128

D Multivariate Geostatistics 129

20 Direct and Cross Covariances 131


Cross covariance function 131
Delay effect . . . . . . . 132
Cross variogram. . . . . . 133
Pseudo cross variogram. . 135
Difficult characterization of the cross covariance function 136

21 Covariance Function Matrices 137


Covariance function matrix 137
Cramer's theorem. 137
Spectral densities 138
Phase shift 139

22 Intrinsic Multivariate Correlation 140


Intrinsic correlation model 140
Linear model . . . . . . 141
Codispersion coefficients 142

23 Cokriging 144
Isotopy and heterotopy . 144
Ordinary cokriging . . . 145
Simple cokriging . . . . 147
Cokriging with isotopic data 148
Autokrigeability . . . 149
Collocated cokriging . . . . 151

24 Multivariate Nested Variogram 152


Linear model of coregionalization . . . 152
Bivariate fit of the experimental variograms 154
Multivariate fit . . . . . . . . . . . . . . . . 155
The need for an analysis of the coregionalization . 158

25 Coregionalization Analysis 160


Regionalized principal component analysis 160
Generalizing the analysis . . . . . . . . . . 161
Regionalized canonical and redundancy analysis 162
Cokriging regionalized factors . . . 162
Regionalized multivariate analysis . 163

26 Kriging a Complex Variable 166


Coding directional data as a complex variable 166
Complex covariance function . . . . . . . . 166
Complex kriging . . . . . . . . . . . . . . . . 167
Cokriging of the real and imaginary parts 168
Complex kriging and cokriging versus a separate kriging 169
Complex covariance function modeling . . . . . . . . . . 170

27 Bilinear Coregionalization Model 172


Complex linear model of coregionalization 172
Bilinear model of coregionalization . . . . . 173

E Non-Stationary Geostatistics 175


28 Universal Kriging 177
Second-order stationary residual . 177
Dual Kriging System . . . . . . . 178
Estimation of the drift . . . . . . 180
Variogram of the estimated residuals 181
A critical look at the universal kriging model . 184

29 Translation Invariant Drift 185


Exponential/polynomial basis functions . 185
Generalized covariance function 186
Spatial and temporal drift . 188
Filtering temporal variation 188

30 External Drift 190


Depth measured with drillholes and seismic 190
Estimating with a shape function . 191
Kriging with external drift . . . . . . . . 193
Cross validation with external drift . . . 197
Regularity of the external drift function 200

APPENDIX 201
Matrix Algebra 203

Linear Regression Theory 213

Covariance and Variogram Models 218

Additional Exercises 221

Solutions to Exercises 223

References and Software 233

Bibliography 237

Index 251
1 Introduction

Geostatistics is a rapidly evolving branch of applied mathematics which origi-


nated in the mining industry in the early fifties to help improve ore reserve
calculation. The first steps were taken in South Africa, with the work of the min-
ing engineer DG KRIGE and the statistician HS SICHEL (see reference number
[95] in the bibliography).
In the late fifties the techniques attracted the attention of French engineers at
the Commissariat de l'Energie Atomique and in particular of the young Georges
MATHERON, who developed KRIGE's innovative concepts and set them in a single
framework with his Theory of Regionalized Variables [113], [114], [116], [39].
Originally developed for solving ore reserve estimation problems the tech-
niques spread in the seventies into other areas of the earth sciences with the
advent of high-speed computers. They are nowadays popular in many fields of
science and industry where there is a need for evaluating spatially or temporally
correlated data. A first international meeting on the subject was organized in
Rome, Italy in 1975 [82]. Further congresses were held at Lake Tahoe, U.S.A. in
1983 [192], in Avignon, France in 1988 [8] and in Troia, Portugal in 1992 [174].
As geostatistics is now incorporating an increasing number of methods, the-
ories and techniques, it is an impossible task to give a full account of all de-
velopments in a single volume which was not intended to be encyclopedic. So
a selection of topics had to be made for the sake of convenience and we start
by presenting the contents of the book from the perspective of a few general
categories.
The analysis of spatial and temporal phenomena will be discussed with three
issues in mind:

Data description. The data need to be explored for spatial, temporal and mul-
tivariate structure and checked for outlying values which mask structure.
Modern computer technology with its high-power graphic screens display-
ing multiple, linkable windows allows for dynamic simultaneous views on
the data. A map of the position of samples in space or representations
along time can be linked with histograms, correlation diagrams, variogram
clouds and experimental variograms. First ideas about the spatial, time
and multivariate structure emerge from a variety of such simple displays.

Interpretation. The graphical displays gained from the numerical information


are evaluated by taking into account past experience on similar data and
scientific facts related to the variables under study. The interpretation of
the spatial or time structure, the associations and the causal relations be-
tween variables are built into a model which is fitted to the data. This
model not only describes the phenomenon at sample locations, but it is
usually also valid for the spatial or time continuum in the sampled re-
gion and it thus represents a step beyond the information contained in the
numerical data.
Estimation. Armed with a model of the variation in the spatial or temporal
continuum, the next objective can be to estimate values of the phenomenon
under study at various scales and at locations different from the sample
points. The methods to perform this estimation are based on least squares
and need to be adapted to a wide variety of model formulations in different
situations and for different problems encountered in practice.
We have decided to deal only with these three issues, leaving aside questions
of simulation and control which would have easily doubled the length of the book
and changed its scope. To get an idea of what portion of geostatistics is actually
covered it is convenient to introduce the following common subdivision into
1. Linear stationary geostatistics,
2. Non-stationary linear geostatistics,
3. Non-linear geostatistics.
We shall mainly cover the first topic, examining single- and multi-variate
methods based on linear combinations of the sample values and we shall assume
that the data stem from the realization of a set of random functions which are
stationary or, at least, whose spatial or time increments are stationary.
A short review of the second topic is given in the last three chapters of the
book with the aim of providing a better understanding of the status of drift
functions which are not translation invariant. We had no intention of giving an
extensive treatment of non-stationary geostatistics which would justify a mono-
graph.
The third topic has recently been covered at an introductory level in an
excellent monograph by RIVOIRARD [155], so it was not necessary to reexpose
that material here. Actually RIVOIRARD's book starts off with a brief description
of multivariate geostatistical concepts, which will be found in full detail in the
present work and which are important for a deeper understanding of non-linear
geostatistics.
Multivariate Geostatistics consists of thirty short chapters which on average
represent the contents of a two hour lecture. The material is subdivided into five
parts.
Part A reviews the basic concepts of mean, variance, covariance, variance-
covariance matrix, mathematical expectation, linear regression, multiple
linear regression. It ends with the transposition of multiple linear regres-
sion into a spatial context, where regression receives the name of kriging.

Part B offers a detailed introduction to linear geostatistics for a single variable.

After presenting the random function model and the concept of stationarity, the display of spatial variation with a variogram cloud is discussed. The necessity of replacing the experimental variogram, obtained from the variogram cloud, by a theoretical variogram is explained. The theoretical variogram and the covariance function are introduced together with the assumptions of stationarity they imply. As variogram models are frequently derived from covariance functions, a few basic isotropic covariance models are presented. Stationarity means translation-invariance of the moments of the random function, while isotropy is a corresponding rotation-invariance. In the cases of geometric or zonal anisotropy linear transformations of space are defined to adapt the basically isotropic variogram models to these situations.

An important feature of spatial or temporal data is that a measurement refers to a given volume of space or an interval of time, which is called the support of the measurement. Extension and dispersion variances take account of the support of the regionalized variable and furthermore incorporate the description of spatial correlation provided by the variogram model.

Spatial regression techniques known as kriging draw on the variogram or the covariance function for estimating either the mean in a region or values at particular locations of the region. The weights computed by kriging to estimate these quantities are distributed around the estimation location in a way that can be understood by looking at simple sample configurations.

The linear model of regionalization characterizes distinct spatial or time scales of a phenomenon. Kriging techniques are available to extract the variation pertaining to a specific scale and to map a corresponding component. As a byproduct the theory around the analysis and filtering of characteristic scales gives a better understanding of how and why ordinary kriging provides a smoothed image of a regionalized variable which has been sampled with irregularly spaced data.

Part C presents three well-known methods of multivariate analysis. Principal component analysis is the simplest and most widely used method to define factors explaining the multivariate correlation structure. Canonical analysis generalizes the method to the case of two groups of variables. Correspondence analysis is an application of canonical analysis to two qualitative variables coded into disjunctive tables. The transposition of the latter, by coding a quantitative variable into disjunctive tables, has yielded models used in disjunctive kriging, a technique of non-linear geostatistics.

Part D extends linear geostatistics to the multivariate case. The properties of


the cross variogram and the cross covariance function are discussed and
compared. The characterization of matrices of covariance functions is a
central problem of multivariate geostatistics. Two models, the intrinsic


correlation model and the nested multivariate model, are examined in the
light of two multivariate random function models, the linear and the bi-
linear coregionalization models. Cokriging is analyzed for the situations
when it boils down to kriging, which is important to consider when trying
to evaluate the gain of introducing auxiliary variables. The cokriging of a
complex variable is based on a bivariate coregionalization model between
the real and the imaginary part and its comparison with complex kriging
provides a rich reservoir of problems for teasing students. The modeling of
the complex covariance function in complex kriging opens the gate to the
bilinear coregionalization model which allows for non-even cross covariance
functions between real random functions.

Part E discusses phenomena involving a non-stationary component called the


drift. When the drift functions are translation-invariant, generalized co-
variance functions can be defined in the framework of the rich theory of
intrinsic random functions of order k. In multivariate problems auxiliary
variables can be incorporated into universal kriging as external drift func-
tions which however are not translation-invariant.

The Appendix contains two additional chapters on matrix algebra and lin-
ear regression theory in a notation consistent with the rest of the material. It
also contains a list of common covariance functions and variograms, additional
exercises and solutions to the exercises. References classified according to topics
of theory and fields of applications are found at the end of the book, together
with a list of sources of geostatistical computer software, the bibliography and a
subject index.
Part A

Preliminaries
2 From Statistics to Geostatistics

In this introductory chapter we review a few basic concepts of statistics such as


mean, variance, covariance, variance-covariance matrix, as well as the methods
of linear regression and multiple linear regression. Then we make a first step
into geostatistics by presenting the method of simple kriging, a transposition of
multiple regression into a spatial context.

The mean: center of mass


To introduce the notion of mean value let us take an example from physics.
Seven weights are hanging on a bar whose own weight is negligible. The
locations Z on the bar at which the weights are suspended are denoted by

Z = 5, 5.5, 6, 6.5, 7, 7.5, 8


as shown on Figure 2.1. The mass w(z) of the weights is

w(z) = 3, 4, 6, 3, 4, 4, 2

The location \(\bar{z}\) where the bar, when suspended, stays in equilibrium is evidently calculated using a weighted average

\[ \bar{z} \;=\; \sum_{k=1}^{7} z_k \, p(z_k) \;=\; \frac{\sum_{k=1}^{7} z_k \, w(z_k)}{\sum_{k=1}^{7} w(z_k)} \]

where

\[ p(z_k) \;=\; \frac{w(z_k)}{\sum_{k=1}^{7} w(z_k)} \]

are normed weights with

\[ \sum_{k} p(z_k) \;=\; 1 \]
In this example the weights \(w(z_k)\) can be disassembled into \(n\) elementary
weights \(v(z_\alpha)\) of unit mass.
The normed weights \(p(z_\alpha)\) corresponding to the elementary weights are equal
to \(1/n\) and the location of equilibrium of the bar, its center of mass, is computed
as

\[ \bar{z} \;=\; \sum_{\alpha=1}^{n} z_\alpha \, p(z_\alpha) \;=\; \frac{1}{n} \sum_{\alpha=1}^{n} z_\alpha \;=\; 6.4 \]

Figure 2.1: Bar with suspended weights.

Transposing the physical problem at hand into a probabilistic context, we
realize that \(\bar{z}\) is the mean value \(m\) of \(Z\) and that the normed weights \(p(z_k)\),
\(p(z_\alpha)\) can be interpreted as probabilities, i.e. the frequency of appearance of the
values \(z_k\) or \(z_\alpha\). The \(z_k\) represent a grouping of the \(z_\alpha\) and have the meaning of
classes of values \(z_\alpha\). The weights \(p\) can be called probabilities as they fulfill the
requirements \(0 \le p \le 1\) and \(\sum p = 1\).
Another characteristic value which can be calculated is the average squared
distance to the center of mass

\[ \mathrm{dist}^2 \;=\; \sum_{\alpha=1}^{n} (z_\alpha - m)^2 \, p(z_\alpha) \;=\; \frac{1}{n} \sum_{\alpha=1}^{n} (z_\alpha - m)^2 \;=\; \mathrm{var}(z) \;=\; 0.83 \]
This is the formula for the calculation of the experimental variance, which
gives an indication about the dispersion of the data around the center of mass
m of the data.
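The bar example can be checked numerically. The following sketch in plain Python takes the locations and masses given in the text, forms the normed weights and computes the center of mass and the average squared distance:

```python
# Locations z_k and masses w(z_k) of the seven weights from the text
z = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0]
w = [3, 4, 6, 3, 4, 4, 2]

p = [wk / sum(w) for wk in w]                    # normed weights, sum to 1
mean = sum(zk * pk for zk, pk in zip(z, p))      # center of mass
var = sum((zk - mean) ** 2 * pk for zk, pk in zip(z, p))  # dispersion around it

print(round(mean, 1))   # 6.4
print(round(var, 2))    # 0.83
```

Both rounded values agree with the 6.4 and 0.83 quoted in the text.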
In fact, what has been introduced here under the cover of a weightless bar with
weights attached to it, is an upside down histogram as represented on Figure 2.2.
An alternate way to represent the frequencies of the values Z is by cumulating
the frequencies from left to right as on Figure 2.3 where a cumulative histogram
is shown.
The mathematical idealization of the cumulative histogram, when the random
variable \(Z\) takes values in \(\mathbb{R}\), is the probability distribution function \(F(z)\) defined
as

\[ F(z) \;=\; P(Z \le z), \qquad -\infty < z < \infty \]

which indicates the probability \(P\) that a value of the random variable \(Z\) is below
a fixed value \(z\).
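An empirical counterpart of \(F(z)\) can be sketched on the 26 elementary weights of the bar example (the list of values below is reconstructed from the \(z\) and \(w\) given earlier in the text):

```python
# 26 elementary unit weights: each location repeated according to its mass
z = [5.0]*3 + [5.5]*4 + [6.0]*6 + [6.5]*3 + [7.0]*4 + [7.5]*4 + [8.0]*2

def F(t):
    """Empirical distribution function: fraction of values at or below t."""
    return sum(1 for v in z if v <= t) / len(z)

print(F(4.9))  # 0.0 -- no mass below the smallest value
print(F(6.0))  # 0.5 -- 13 of the 26 values are <= 6
print(F(8.0))  # 1.0 -- all mass at or below the largest value
```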
If we partition Z into intervals of infinitesimal length dz, the probability that
a realization of Z belongs to such an interval is F(dz). We shall only consider

Figure 2.2: Histogram.

differentiable distribution functions. The derivative of the frequency distribution
is the density function \(p(z)\)

\[ F(dz) \;=\; p(z)\, dz \]
The idealization of the concept of mean value is the mathematical expectation
\(\mathrm{E}[Z]\) or expected value. The expected value of \(Z\) is also called the first moment
of the random variable and it is defined as the integral over the realizations \(z\)
of \(Z\) weighted by the density function

\[ \mathrm{E}[\,Z\,] \;=\; \int_{z \in \mathbb{R}} z \, p(z)\, dz \;=\; m \]
The expectation is a linear operator. Let a and b be deterministic constants.
It is easy to see from the definition that we have
\[ \mathrm{E}[\,a\,] = a, \qquad \mathrm{E}[\,bZ\,] = b\,\mathrm{E}[\,Z\,] = b\,m \]

and

\[ \mathrm{E}[\,a + bZ\,] \;=\; a + b\,m \]
The second moment of the random variable is the expectation of its squared
value

\[ \mathrm{E}[\,Z^2\,] \;=\; \int_{z \in \mathbb{R}} z^2 \, p(z)\, dz \]
and the n-th moment is defined as the expected value of the n-th power of \(Z\)

\[ \mathrm{E}[\,Z^n\,] \;=\; \int_{z \in \mathbb{R}} z^n \, p(z)\, dz \]
When \(Z\) has a discrete distribution the integral in the definition of the math-
ematical expectation is replaced by a sum

\[ \mathrm{E}[\,Z\,] \;=\; \sum_{k} z_k \, p_k \;=\; m \]

Figure 2.3: Cumulative histogram.

where \(p_k = P(Z = z_k)\) is the probability that \(Z\) takes the value \(z_k\).
The theoretical variance \(\sigma^2\) is

\[ \sigma^2 \;=\; \mathrm{E}[\,(Z - m)^2\,] \;=\; \mathrm{E}[\,Z^2 + m^2 - 2mZ\,] \]

and as the expectation is a linear operator

\[ \sigma^2 \;=\; \mathrm{E}[\,Z^2\,] - m^2 \]

the variance can be expressed as the difference between the second moment and
the squared first moment.
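The identity \(\sigma^2 = \mathrm{E}[Z^2] - m^2\) can be verified numerically on the discrete distribution of the bar example, comparing the direct definition of the variance with the moment formula:

```python
# Discrete distribution of the bar example: values z_k with probabilities p_k
z = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0]
w = [3, 4, 6, 3, 4, 4, 2]
p = [wk / sum(w) for wk in w]

m  = sum(zk * pk for zk, pk in zip(z, p))        # first moment E[Z]
m2 = sum(zk ** 2 * pk for zk, pk in zip(z, p))   # second moment E[Z^2]

var_direct = sum((zk - m) ** 2 * pk for zk, pk in zip(z, p))  # E[(Z - m)^2]
var_moment = m2 - m ** 2                                      # E[Z^2] - m^2
print(abs(var_direct - var_moment) < 1e-9)   # True
```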

Covariance

In the case of two variables, \(z_1\) and \(z_2\) say, the data values can be represented
on a scatter diagram like on Figure 2.4 which shows the cloud of data points in
the plane spanned by two perpendicular axes, one for each variable. The center
of mass of the data cloud is the point defined by the two means \((m_1, m_2)\). An
obvious way to measure the dispersion of the data cloud around its center of mass
is to multiply the difference between a value of one variable and its mean, called

Figure 2.4: Scatter diagram showing the cloud of sample values and the center of
mass \((m_2, m_1)\).

a residual, with the residual of the other variable. The average of the products
of residuals is the covariance

\[ \mathrm{cov}(z_1, z_2) \;=\; \frac{1}{n} \sum_{\alpha=1}^{n} (z_1^\alpha - m_1)(z_2^\alpha - m_2) \]

When the residual of \(z_1\) tends to have the same sign as the residual of \(z_2\) on
average, the covariance is positive, while when the two residuals are of opposite
sign on average, the covariance is negative. When a large value of one residual is
on average associated with a large value of the residual of the other variable, the
covariance has a large positive or negative value. Thus the covariance measures
on one hand the liking or disliking of two variables through its sign and on the
other hand the strength of this relationship by its absolute value.
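These sign and magnitude properties can be illustrated with a short sketch (the sample values below are invented for the example; the divisor is \(n\), as in the text):

```python
def covariance(x, y):
    """Average product of residuals, with divisor n as in the text."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xa - mx) * (ya - my) for xa, ya in zip(x, y)) / n

# invented sample values
z1 = [1.0, 2.0, 3.0, 4.0, 5.0]
z2 = [2.1, 3.9, 6.2, 8.0, 9.8]   # grows with z1: residuals share signs
z3 = [9.7, 8.1, 5.9, 4.2, 2.0]   # shrinks as z1 grows: opposite signs

print(covariance(z1, z2) > 0)    # True
print(covariance(z1, z3) < 0)    # True
print(covariance(z1, z1))        # 2.0, the variance of z1
```

The last line anticipates the remark that follows: the covariance of a variable with itself is its variance.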
We see that when \(z_1\) is identical with \(z_2\), the covariance is equal to the vari-
ance.
It is often desirable to compare the covariances of pairs of variables. When
the units of the variables are not comparable, especially when they are of a
different type, e.g. cm, kg, %, ..., it is preferable to standardize each variable

\(z\), centering first its values around the center of mass by subtracting the mean,
and subsequently norming the distances of the values to the center of mass by
dividing them with the standard deviation \(\sigma\), which is the square root of the
variance. The standardized variable

\[ \tilde{z} \;=\; \frac{z - m}{\sigma} \]

has a variance equal to 1. The covariance of two standardized variables \(\tilde{z}_1\) and
\(\tilde{z}_2\) is a normed quantity \(r_{12}\), called the correlation coefficient, with bounds

\[ -1 \;\le\; r_{ij} \;\le\; 1 \]

The correlation coefficient \(r_{ij}\) can also be calculated directly from \(z_i\) and \(z_j\),
dividing their covariance by the product of their standard deviations

\[ r_{ij} \;=\; \frac{\mathrm{cov}(z_i, z_j)}{\sigma_i \, \sigma_j} \]
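The standardization and the two equivalent routes to the correlation coefficient can be sketched as follows (invented sample values; population formulas with divisor \(n\), as in the text):

```python
def mean(x):
    return sum(x) / len(x)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

# invented sample values
z1 = [1.0, 2.0, 3.0, 4.0, 5.0]
z2 = [2.1, 3.9, 6.2, 8.0, 9.8]

s1, s2 = cov(z1, z1) ** 0.5, cov(z2, z2) ** 0.5   # standard deviations
z1t = [(a - mean(z1)) / s1 for a in z1]           # standardized z1
z2t = [(b - mean(z2)) / s2 for b in z2]           # standardized z2

r12 = cov(z1, z2) / (s1 * s2)            # direct formula
print(abs(cov(z1t, z1t) - 1.0) < 1e-9)   # True: standardized variance is 1
print(abs(cov(z1t, z2t) - r12) < 1e-9)   # True: covariance of standardized variables
print(-1.0 <= r12 <= 1.0)                # True: bounds hold
```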

Linear regression
Two variables that have a correlation coefficient different from zero are said to
be correlated. It is often reasonable to suppose that some of the information
conveyed by the measured values is common to two correlated variables. Con-
sequently it seems interesting to look for a function which, knowing a value of
one variable, yields the best approximation to the unknown value of a second
variable.
We shall call "best" function \(z^*\) a function of a given type which minimizes
the mean squared distance \(\mathrm{dist}^2(\cdot)\) to the samples

\[ \mathrm{dist}^2(\cdot) \;=\; \frac{1}{n} \sum_{\alpha=1}^{n} (z_\alpha - z_\alpha^*)^2 \]

This is intuitively appealing as using this criterion the best function \(z^*\) is the
one which passes closest to the data values.
Let us take two variables \(z_1\), \(z_2\) and denote by \(z_1^*\) the function which best
approximates unknown values of \(z_1\).
The simplest type of approximation of \(z_1\) is by a constant \(c\), so let

\[ z_1^* \;=\; c \]
and this does not involve \(z_2\). The average distance between the data and the
constant is

\[ \mathrm{dist}^2(c) \;=\; \frac{1}{n} \sum_{\alpha=1}^{n} (z_1^\alpha - c)^2 \]

The minimum is achieved for a value of \(c\) for which the first derivative of the
distance function \(\mathrm{dist}^2(c)\) is zero

\[ \frac{\partial\, \mathrm{dist}^2(c)}{\partial c} \;=\; 0 \]

\[ \Longleftrightarrow\quad \Big( \frac{1}{n} \sum_{\alpha=1}^{n} (z_1^\alpha - c)^2 \Big)' \;=\; 0 \]
\[ \Longleftrightarrow\quad \frac{1}{n} \sum_{\alpha=1}^{n} \big( (z_1^\alpha)^2 + c^2 - 2c\,z_1^\alpha \big)' \;=\; 0 \]
\[ \Longleftrightarrow\quad \frac{1}{n} \sum_{\alpha=1}^{n} (2c - 2z_1^\alpha) \;=\; 0 \]
\[ \Longleftrightarrow\quad c \;=\; \frac{1}{n} \sum_{\alpha=1}^{n} z_1^\alpha \;=\; m_1 \]

The constant \(c\) which minimizes the average squared distance to the data of
\(z_1\) is the mean \(m_1\). Replacing \(c\) by \(m_1\) in the expression of \(\mathrm{dist}^2(c)\) we see that
the minimal distance for the optimal estimator \(z_1^* = m_1\) is the variance \(\sigma_1^2\).
A slightly more sophisticated function that can be chosen to approximate \(z_1\)
is a linear function of \(z_2\)

\[ z_1^* \;=\; a\,z_2 + b \]

which defines a straight line through the data cloud with a slope \(a\) and an inter-
cept \(b\).
The distance between the data of \(z_1\) and the straight line depends on the two
parameters \(a\) and \(b\)

\[ \mathrm{dist}^2(a, b) \;=\; \frac{1}{n} \sum_{\alpha=1}^{n} (z_1^\alpha - a\,z_2^\alpha - b)^2 \]
\[ \;=\; \frac{1}{n} \sum_{\alpha=1}^{n} \big( (z_1^\alpha)^2 + a^2 (z_2^\alpha)^2 + b^2 - 2a\,z_1^\alpha z_2^\alpha - 2b\,z_1^\alpha + 2ab\,z_2^\alpha \big) \]
\[ \;=\; -2b\,m_1 + 2ab\,m_2 + \frac{1}{n} \sum_{\alpha=1}^{n} \big( (z_1^\alpha)^2 + a^2 (z_2^\alpha)^2 + b^2 - 2a\,z_1^\alpha z_2^\alpha \big) \]

If we shift the data cloud and the straight line to the center of mass, this
translation does not change the slope of the straight line. We can thus consider,
without loss of generality, the special case \(m_1 = m_2 = 0\) to determine the slope
of the straight line. In this case the distance is

\[ \mathrm{dist}^2(a, b') \;=\; \mathrm{var}(z_1) + a^2\,\mathrm{var}(z_2) + (b')^2 - 2a\,\mathrm{cov}(z_1, z_2) \]

where \(b'\) is the intercept of the shifted straight line.


At the minimum, the first partial derivative of the distance function with
respect to a is zero

8 dist 2(a, b')


=0
8a
<===> 2avar(z2) - 2COV(Z1,Z2) = 0
14 Preliminaries

cov(Zl, Z2)
a
~
= var(z2)

As the minimum with respect to \(b'\) is reached for

\[ \frac{\partial\, \mathrm{dist}^2(a, b')}{\partial b'} \;=\; 2b' \;=\; 0 \quad\Longleftrightarrow\quad b' \;=\; 0 \]
we conclude that the optimal straight line passes through the center of mass.
Coming back to the more general case \(m_1 \neq 0\), \(m_2 \neq 0\), the optimal value of
\(b\) (knowing \(a\)) is

\[ \frac{\partial\, \mathrm{dist}^2(a, b)}{\partial b} \;=\; 0 \]
\[ \Longleftrightarrow\quad \frac{1}{n} \sum_{\alpha=1}^{n} (2b - 2z_1^\alpha + 2a\,z_2^\alpha) \;=\; 0 \]
\[ \Longleftrightarrow\quad b \;=\; m_1 - a\,m_2 \]

The best straight line approximating the variable \(z_1\) knowing \(z_2\), called the
linear regression of \(z_1\) on \(z_2\), is

\[ z_1^* \;=\; \frac{\mathrm{cov}(z_1, z_2)}{\mathrm{var}(z_2)}\,(z_2 - m_2) + m_1 \;=\; m_1 + \frac{\sigma_1}{\sigma_2}\, r_{12}\,(z_2 - m_2) \;=\; m_1 + \sigma_1\, r_{12}\, \tilde{z}_2 \]

and is represented on Figure 2.5.
The linear regression is an improvement over the approximation by a constant,
the proportion \(r_{12}\) of the normed residual \(\tilde{z}_2\), rescaled with the standard deviation
\(\sigma_1\), being added to the mean \(m_1\). The linear regression is constant when the
correlation coefficient is zero.
The improvement can be evaluated by computing the minimal squared dis-
tance

\[ \mathrm{dist}^2_{\min}(a, b) \;=\; \frac{1}{n} \sum_{\alpha=1}^{n} \big( z_1^\alpha - \sigma_1 r_{12}\, \tilde{z}_2^\alpha - m_1 \big)^2 \;=\; \frac{1}{n} \sum_{\alpha=1}^{n} \big( \sigma_1 \tilde{z}_1^\alpha - \sigma_1 r_{12}\, \tilde{z}_2^\alpha \big)^2 \;=\; \sigma_1^2\, (1 - r_{12}^2) \]

The stronger the correlation between two variables, the smaller the distance
to the optimal straight line, the higher the explanative power of this straight
line and the lower the expected average error. The improvement of the linear
regression from \(z_2\) over the approximation by the constant \(m_1\) alone is equal to
the proportion \(r_{12}^2\) of the variance \(\sigma_1^2\).
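A sketch of these regression computations (invented sample values): the slope and intercept follow the formulas above, the line passes through the center of mass, and the minimal squared distance is checked against \(\sigma_1^2(1 - r_{12}^2)\):

```python
def mean(x):
    return sum(x) / len(x)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

# invented sample values
z1 = [2.3, 3.1, 4.2, 4.8, 6.1, 6.9]
z2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

a = cov(z1, z2) / cov(z2, z2)    # slope
b = mean(z1) - a * mean(z2)      # intercept

r12 = cov(z1, z2) / (cov(z1, z1) * cov(z2, z2)) ** 0.5
dist2 = mean([(x - (a * y + b)) ** 2 for x, y in zip(z1, z2)])

print(abs(dist2 - cov(z1, z1) * (1 - r12 ** 2)) < 1e-9)   # True: residual variance
print(abs(mean(z1) - (a * mean(z2) + b)) < 1e-9)          # True: passes through center of mass
```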

Figure 2.5: Data cloud with regression line \(z_1^* = a\,z_2 + b\).


Variance-covariance matrix

Take N variables z_i, sampled n times, and arrange them into a rectangular table,
the data matrix Z':

                      Variables

               z_1^1   ...   z_i^1   ...   z_N^1
    Samples    z_1^α   ...   z_i^α   ...   z_N^α
               z_1^n   ...   z_i^n   ...   z_N^n

Define a matrix M with the same dimension n × N as Z', containing in each row
the mean values:

        ( m_1   ...   m_i   ...   m_N )
    M = ( m_1   ...   m_i   ...   m_N )
        ( m_1   ...   m_i   ...   m_N )

A matrix Z of centered variables is obtained by subtracting M from the
raw data matrix:

    Z = Z' − M

The variance-covariance matrix V is calculated by premultiplying Z with its
matrix transpose Z^T (the T indicates matrix transposition) and dividing by the
number of samples:

                     ( var(z1)       ...   cov(z1, zj)   ...   cov(z1, zN) )
    V = (1/n) Z^T Z = ( cov(zj, z1)   ...   var(zj)       ...   cov(zj, zN) )
                     ( cov(zN, z1)   ...   cov(zN, zj)   ...   var(zN)     )

The matrix V is symmetric. As it is the result of a product A^T A, where
A = Z/√n, it is nonnegative definite. The notion of nonnegative definiteness of
the variance-covariance matrix plays an important role in theory and also has
practical implications.
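This construction is easy to verify numerically; a minimal sketch with a synthetic data matrix (any data would do):

```python
import numpy as np

rng = np.random.default_rng(0)
Zraw = rng.normal(size=(50, 4))       # data matrix Z': n = 50 samples, N = 4 variables
M = Zraw.mean(axis=0)                 # means m_i (broadcast over the n rows)
Z = Zraw - M                          # centered variables: Z = Z' - M
V = Z.T @ Z / Zraw.shape[0]           # variance-covariance matrix V = (1/n) Z^T Z

# V is symmetric and, being of the form A^T A with A = Z / sqrt(n),
# nonnegative definite: all eigenvalues are >= 0.
eigenvalues = np.linalg.eigvalsh(V)
```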

Multiple linear regression

Having a set of N auxiliary variables z_i, i = 1, ..., N, it is often desirable to
estimate values of a main variable z0 using a linear function of the N other
variables.
A multiple regression plane z0* in N+1 dimensional space is given by a linear
combination of the residuals of the N variables with a set of coefficients a_i:

    z0* = m0 + Σ_{i=1}^N a_i (z_i − m_i)

where the m_i are the respective means of the different variables.
For n samples we have the matrix equation

    z0* = m0 + Z a

where z0* and m0 are vectors of n elements (all elements of m0 being equal to
the mean of z0) and Z is the matrix of centered variables.
The distance between z0 and the regression plane is

    dist²(a) = (1/n) (z0 − z0*)^T (z0 − z0*)
             = var(z0) + a^T V a − 2 a^T V0

where V0 is the vector of covariances between z0 and the z_i, i = 1, ..., N.


The minimum is found for

    ∂dist²(a)/∂a = 0
    ⟺  2 V a − 2 V0 = 0
    ⟺  V a = V0

This system of linear equations

    ( var(z1)       ...   cov(z1, zN) ) ( a1 )     ( cov(z0, z1) )
    (    ...               ...        ) ( .. )  =  (     ...     )
    ( cov(zN, z1)   ...   var(zN)     ) ( aN )     ( cov(z0, zN) )

has exactly one solution if the determinant of the matrix V is different from zero.
The minimum distance resulting from the optimal coefficients is

    dist²min(a) = var(z0) − a^T V0

To make sure that this optimal distance is not negative, it has to be checked
beforehand that the augmented matrix V0 of variances and covariances for all
N+1 variables

         ( var(z0)       ...   cov(z0, zN) )
    V0 = (    ...               ...        )
         ( cov(zN, z0)   ...   var(zN)     )

is nonnegative definite.
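On a complete data set the normal equations V a = V0 behave as expected; a sketch with synthetic data (the generating coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 200, 3
Zaux = rng.normal(size=(n, N))                        # auxiliary variables z_i
z0 = 1.0 + Zaux @ np.array([0.5, -1.0, 2.0]) + 0.3 * rng.normal(size=n)

Zc = Zaux - Zaux.mean(axis=0)                         # centered auxiliaries
z0c = z0 - z0.mean()
V = Zc.T @ Zc / n                                     # N x N matrix of (co)variances
V0 = Zc.T @ z0c / n                                   # vector of cov(z0, z_i)

a = np.linalg.solve(V, V0)                            # optimal coefficients
z0_star = z0.mean() + Zc @ a                          # multiple regression estimate
dist2_min = z0c @ z0c / n - a @ V0                    # var(z0) - a^T V0
```

Because every entry of V and V0 is computed from the same complete sample set, the augmented matrix is nonnegative definite and dist2_min cannot be negative.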

EXAMPLE 2.1 (MISSING VALUES) In the following situation a major problem
can arise. Suppose that for some variables data values are missing for some
samples. If the variances and covariances are naively calculated on the entire
set, then the variances and covariances will have been obtained from subsets
of samples with a varying size. The matrix V0 assembled with these variance-
covariance values stemming from different sample subsets can be indefinite, i.e.
not a variance-covariance matrix.
This is illustrated with a simple numerical example involving three samples
and three variables. A data value of z0 is missing for the first sample. The eight
data values are

                z0    z1    z2
    Sample 1     -    3.    5.
    Sample 2    3.    2.    2.
    Sample 3    1.    4.    5.
The means are m0 = 2, m1 = 3, m2 = 4. The matrix V0 of variances and
covariances of the three variables

         (   1    −1   −3/2 )
    V0 = (  −1    2/3    1  )
         ( −3/2    1     2  )

is obviously not nonnegative definite, as two out of the three minors of order 2
are negative:

    det(  1    −1  ) = −1/3,    det( 2/3   1 ) = 1/3,    det(   1   −3/2 ) = −1/4
       ( −1    2/3 )               (  1    2 )              ( −3/2    2  )
The linear regression equations for z0* can be solved as the matrix V is not
singular:

    ( 2/3   1 ) ( a1 )     (  −1  )
    (  1    2 ) ( a2 )  =  ( −3/2 )

The solution is z0* = −3/2 z1 + 0 z2, but the minimal distance achieved by
this "optimal" regression line is negative:

    dist²min = −1/2

and the whole operation is meaningless!
The lesson from this is that variances and covariances of a set of variables
have to be calculated on the same subset of samples. This is the subset of samples
with no values missing for any variable.
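The failure can be reproduced numerically. The sketch below computes each variance and covariance on the subset of samples where both variables are present (a naive "pairwise deletion", which for this data set reproduces the matrix of Example 2.1 exactly):

```python
import numpy as np

# Data of Example 2.1; the missing value of z0 is coded as NaN.
data = np.array([
    [np.nan, 3., 5.],
    [3.,     2., 2.],
    [1.,     4., 5.],
])

def pairwise_cov(X):
    """Each entry uses only the samples where both variables are present
    (divisor n, means taken over the same subset)."""
    N = X.shape[1]
    V = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            ok = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            xi, xj = X[ok, i], X[ok, j]
            V[i, j] = np.mean((xi - xi.mean()) * (xj - xj.mean()))
    return V

V0 = pairwise_cov(data)                       # the indefinite matrix of Example 2.1
a = np.linalg.solve(V0[1:, 1:], V0[1:, 0])    # regression coefficients a1, a2
dist2_min = V0[0, 0] - a @ V0[1:, 0]          # "minimal" distance: negative!
```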
Similarly, in a spatial context we wish to estimate values at locations where
no measurement has been made, using the data from neighboring locations. We
see immediately that there is a need for a covariance function defined in such
a way that the corresponding matrix V0 is nonnegative definite. The spatial
multiple regression based on a covariance function model is called simple kriging.

Simple kriging

Take the data locations x_α and construct at each of the n locations a random
variable Z(x_α). Take an additional location x0 and let Z(x0) be a random
variable at x0. Further assume that these random variables are a subset of a
second-order stationary random function Z(x) defined over a domain D. By
second-order stationarity we mean that the expectation and the covariance are
both translation invariant over the domain, i.e. for a vector h linking any two
points x and x+h in the domain:

    E[ Z(x+h) ] = E[ Z(x) ]
    cov[ Z(x+h), Z(x) ] = C(h)

The expected value E[ Z(x) ] = m is the same at any point x of the domain.
The covariance between any pair of locations depends only on the vector h which
separates them.
Figure 2.6: Data points x_α and the estimation point x0 in a spatial domain D.

The problem of interest is to build a weighted average to make an estimation
of a value at a point x0 using information at points x_α, α = 1, ..., n, see
Figure 2.6. This estimation procedure should be based on the knowledge of the
covariances between the random variables at the points involved. The answer
closely resembles multiple regression transposed into a spatial context, where the
Z(x_α) play the role of regressors on a regressand Z(x0). This spatial regression
bears the name of simple kriging (after the mining engineer D.G. KRIGE). With
a mean m supposed constant over the whole domain and calculated as the average
of the data, simple kriging is used to estimate residuals from this reference
value m given a priori. Therefore simple kriging is sometimes called kriging with
known mean.
In analogy with multiple regression, the weighted average for this estimation
is defined as

    Z*(x0) = m + Σ_{α=1}^n w_α ( Z(x_α) − m )

where the w_α are weights attached to the residuals Z(x_α) − m. Note that, unlike
in multiple regression, the mean m is the same for all locations due to the
stationarity assumption.
The estimation error is the difference between the estimated and the true
value at x0:

    Z*(x0) − Z(x0)

The estimator is said to be unbiased if the estimation error is nil on average:

    E[ Z*(x0) − Z(x0) ] = m + Σ_{α=1}^n w_α ( E[ Z(x_α) ] − m ) − E[ Z(x0) ]
                        = m + Σ_{α=1}^n w_α ( m − m ) − m
                        = 0

The variance of the estimation error, the estimation variance σE², then is
simply expressed as

    σE² = var( Z*(x0) − Z(x0) ) = E[ ( Z*(x0) − Z(x0) )² ]

Expanding this expression:

    σE² = E[ ( Z*(x0) )² + ( Z(x0) )² − 2 Z*(x0) Z(x0) ]
        = Σ_{α=1}^n Σ_{β=1}^n w_α w_β C(x_α − x_β) + C(x0 − x0) − 2 Σ_{α=1}^n w_α C(x_α − x0)

We write cov[ Z(x_α), Z(x_β) ] = C(x_α − x_β), because the spatial covariances
depend only on the difference vector between the points in the domain.
The estimation variance is minimal where its first derivative is zero:

    ∂σE²/∂w_α = 0    for α = 1, ..., n

Explicitly we have

    2 Σ_{β=1}^n w_β C(x_α − x_β) − 2 C(x_α − x0) = 0    for α = 1, ..., n

and the equation system for simple kriging is written

    Σ_{β=1}^n w_β C(x_α − x_β) = C(x_α − x0)    for α = 1, ..., n

The left hand side of the equations describes the covariances between the data
locations. The right hand side describes the covariance between each data location
and the location where an estimate is sought. The resolution of the system yields
the optimal kriging weights w_α.
The operation of simple kriging can be repeated at regular intervals, displacing
each time the location x0. A regular grid of kriging estimates is obtained which
can be contoured for representation as a map.
A second quantity of interest is the optimal variance for each location x0. It
is obtained by substituting the left hand side of the kriging system by its right
hand side in the first term of the expression of the estimation variance σE². This
is the variance of simple kriging:

    σSK² = Σ_{α=1}^n w_α C(x_α − x0) + C(x0 − x0) − 2 Σ_{α=1}^n w_α C(x_α − x0)
         = C(0) − Σ_{α=1}^n w_α C(x_α − x0)

When sample locations are scattered irregularly in space, it is worthwhile
to produce a map of the kriging variance as a complement to the map of kriged
estimates. It gives an appreciation of the varying precision of the kriged estimates
due to the irregular disposition of informative points.
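The whole procedure fits in a few lines. The exponential covariance model below is an assumed illustrative choice (covariance models are only introduced later), and the data locations and values are made up:

```python
import numpy as np

def cov(h, sill=1.0, range_=10.0):
    """Assumed exponential covariance model C(h) = sill * exp(-|h| / range)."""
    return sill * np.exp(-np.abs(h) / range_)

def simple_kriging(X, z, x0, m):
    """Estimate Z at x0 from data (X, z) with known mean m."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    C = cov(D)                                   # left-hand side: C(x_a - x_b)
    c0 = cov(np.linalg.norm(X - x0, axis=-1))    # right-hand side: C(x_a - x0)
    w = np.linalg.solve(C, c0)                   # kriging weights w_a
    z_star = m + w @ (z - m)                     # estimate: mean + weighted residuals
    var_sk = cov(0.0) - w @ c0                   # simple kriging variance
    return z_star, var_sk

X = np.array([[0., 0.], [1., 0.], [0., 1.]])     # data locations (made up)
z = np.array([1.0, 2.0, 1.5])                    # data values (made up)
z_star, var_sk = simple_kriging(X, z, np.array([0.5, 0.5]), m=1.5)
```

At a data location the estimate reproduces the datum exactly and the kriging variance drops to zero; calling the function on a grid of x0 values yields both the map of estimates and the map of kriging variances.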
Part B

Geostatistics
3 Regionalized Variable and
Random Function

"It could be said,


as in The Emperor 0/ the Moon,
that all is everywhere and always like here,
up to a degree of magnitude and perfection."
GW LEIBNIZ

The data provide information about regionalized variables, which are simply
functions z(x) (of a spatial or time continuum) whose behavior we would like to
characterize. In applications the regionalized variables are usually not identical
with simple deterministic functions and it is therefore of advantage to place them
into a probabilistic framework.
In the probabilistic model, a regionalized variable z(x) is considered to be a
realization of a random function Z(x) (the infinite family of random variables
constructed at all points x of a spatial domain D). The advantage of this approach
is that we shall only try to characterize the random function Z(x) and
not a particular regionalized variable.
In such a setting the data values are samples from one particular realization
z(x) of Z(x). The epistemological implications of such an investigation
of a unique realization by probabilistic methods have been debated by
MATHERON [126] and shall not be reported here. We provide a description of the
probabilistic formalization, but we shall not attempt to justify it rigorously. The
main reason for the success of the probabilistic approach is that it offers a
convenient way of formulating methods which could also be viewed in a deterministic
setting, albeit with less elegance.

Multivariate time/space data


In many fields of science data arises which is either time or space dependent-or
both. Such data is often multivariate, i.e. several quantities have been measured
and need to be examined. The data array, in its most general form, may have
the following shape
                   Coordinates                     Variables

               t_1   x_1^1  x_1^2  x_1^3    z_1^1  ...  z_1^i  ...  z_1^N
    Samples    t_α   x_α^1  x_α^2  x_α^3    z_α^1  ...  z_α^i  ...  z_α^N
               t_n   x_n^1  x_n^2  x_n^3    z_n^1  ...  z_n^i  ...  z_n^N

In this array the samples are numbered using an index α and the total number
of samples is symbolized by the letter n, thus α = 1, 2, 3, ..., n.
The different variables are labeled by an index i and the number of variables
is N, so i = 1, 2, 3, ..., N.
The samples could be pieces of rock of same volume taken by a geologist, or
they could be plots containing different plants a biologist is interested in. The
variables would then be chemical or physical measurements performed on the
rocks, or counts of the abundance of different species of plants within the plots.
The samples may have been taken at different moments t_α and at different
places x_α, where x_α is the vector of up to three spatial coordinates (x_α^1, x_α^2, x_α^3).
So far the numbers contained in the data set have been arranged into a
rectangular table and symbols have been used to describe this table. There is a
need for viewing the data in a more general setting.

Regionalized variable

Let us consider that only one property has been measured on different spatial
objects (N = 1) and that the time at which the measurements have been made
has not been recorded. In this case the index i and the time coordinate t_α are
of no use and can be omitted. There are n observations symbolized by

    z(x_α)    with α = 1, ..., n

The sampled objects in a region D can be considered as a fragment of a larger
collection of objects. Many more observations than the few collected could be
made, but because of the cost and effort involved this has not been done. If the
objects are points, even infinitely many observations are possible in the region.
This possibility of many more observations of the same kind is introduced by
dropping the index α and defining the regionalized variable (for short: REV) as

    z(x)    with x ∈ D

The data set { z(x_α), α = 1, ..., n } is a collection of a few values of the REV.

Figure 3.1: The random function model (diagram relating the samples, the
regionalized variable, the random variable and the random function through the
two aspects of regionalization and randomness).

Random variable and regionalized value

Each measured value in the data set is a regionalized value. A new viewpoint is
introduced by considering a regionalized value as the outcome of some random
mechanism. Formally this mechanism will be called a random variable and a
sampled value z(x_α) represents one draw from the random variable Z(x_α). At
each point x_α the mechanism nature uses to produce a value z(x_α) may be
different, thus Z(x_α) could a priori have different properties at each point of a
region.

Random function

Considering the regionalized values at all points in a region, the associated
function z(x) of x ∈ D is a regionalized variable. This function as a whole can be
viewed as one draw from an infinite set of random variables (one random variable
at each point of the domain) which is called the random function (RAF) Z(x).
Figure 3.1 shows how the model of the random function has been set up by
viewing the data under two different angles. One aspect is that the data values
stem from a physical environment (time, 1-2-3D space) and are in some way
dependent on their location in the region: they are regionalized. The second
aspect is that the regionalized values z(x_α) cannot, generally, be modeled with a
simple deterministic function z(x). Looking at the values that were sampled, the
behavior of z(x) appears to be very complex. Like in many situations where the
parameters of a data generating mechanism cannot be determined, a probabilistic
approach is chosen, i.e. the mechanism is considered as random. The data values
are viewed as outcomes of the random mechanism.
Joining together these two aspects of regionalization and randomness yields
    RAF   Z(x)  →  Z(x0)        the Random Variable's
            ↓         ↓           realization is a
    REV   z(x)  →  z(x0)        Regionalized Value

Figure 3.2: The regionalized variable as a realization of a random function.

the concept of a RAF.


In probabilistic jargon, see Figure 3.2, the REV z(x) is one realization of the
RAF Z(x). A regionalized value z(x0) at a specific location x0 is a realization of
a random variable Z(x0) which is itself a member of an infinite family of random
variables, the RAF Z(x). Capital Z is used to denote random variables while
small z is used for their realizations. The point x0 is an arbitrary point of the
region which may or may not have been sampled.

Probability distributions

In this model the random mechanism Z(x0) acting at a point x0 of the region
generates realizations following a probability distribution F:

    P( Z(x0) < z ) = F_{x0}(z)

where P is the probability that an outcome of Z at the point x0 is lower than a
fixed value z.
A bivariate distribution function for two random variables Z(x1) and Z(x2)
at two different locations is

    P( Z(x1) < z1, Z(x2) < z2 ) = F_{x1,x2}(z1, z2)

where P is the probability that simultaneously an outcome of Z(x1) is lower than
z1 and an outcome of Z(x2) is lower than z2.
In the same way a multiple distribution function for n random variables
located at n different points can be defined:

    F_{x1,...,xn}(z1, ..., zn) = P( Z(x1) < z1, ..., Z(xn) < zn )


Built up in this manner we have an extraordinarily general model which is
able to describe any process in nature or technology. In practice, however, we

possess only few data from one or several realizations of the RAF and it will
be impossible to infer all the mono- and multivariate distribution functions for
any set of points. Simplification is needed and it is provided by the idea of
stationarity.

Strict stationarity

Stationarity means that the characteristics of a RAF stay the same when shifting
a given set of n points from one part of the region to another. This is called
translation invariance.
To be more specific, a RAF Z(x) is said to be strictly stationary if for any set
of n points x1, ..., xn and any vector h

    F_{x1,...,xn}(z1, ..., zn) = F_{x1+h,...,xn+h}(z1, ..., zn)

i.e. a translation of a point configuration in a given direction does not change
the multiple distribution.
The new situation created by restraining Z(x) to be strictly stationary, with
all its n-variate distribution functions being shift invariant, can be summarized by
a statement of Arlequino (coming back from a trip to the moon): "Everything is
everywhere and always the same as here ... up to a certain degree of magnitude
and perfection"¹. This position of Arlequino best explains the idea of strict
stationarity: things do not change fundamentally when one moves from one part
of the universe to another part of it! Naturally, we cannot fully agree with
Arlequino's blasé indifference to everything. But there is a restriction "up to
a certain degree" in Arlequino's statement and we shall, in an analogous way,
loosen the concept of strict stationarity and define several types and degrees of
stationarity. These lie in the wide range between the concept of a non-stationary
(and non-homogeneous) RAF, whose characteristics change at any time and at
any location, and the concept of a strictly stationary random function whose
distribution functions are everywhere and always the same.

1 "C'est partout et toujours comme ici, c'est partout et toujours comme chez nous, aux
degres de grandeur et de perfection pres." [M SERRES (1968) Le Systeme de Leibniz et ses
Modeles Mathematiques. Presses Universitaires de France, Paris.}
4 Variogram Cloud

Pairs of sample values are evaluated by computing the squared difference between
the values. The resulting dissimilarities are plotted against the separation of
sample pairs in geographical space and form the variogram cloud. The cloud is
sliced into classes according to separation in space and the average dissimilarities
in each class form the sequence of values of the experimental variogram.

Dissimilarity versus separation

We measure the variability of a regionalized variable z(x) at different scales by
computing the dissimilarity between pairs of data values, z_α and z_β say, located
at points x_α and x_β in a spatial domain D. The measure for the dissimilarity of
two values, labeled γ*, is

    γ*_αβ = (z_α − z_β)² / 2

i.e. half of the square of the difference between the two values.
The two points x_α, x_β in geographical space can be linked by a vector
h_αβ = x_β − x_α, as shown on Figure 4.1.

Figure 4.1: A vector h linking x_α to x_β = x_α + h.


We let the dissimilarity "(* depend on the spacing and on the orientation of
the point pair described by the vector haß

"(*(h aß ) = ~(z(Xa+h) - z(xa)r


Figure 4.2: Plot of the dissimilarities γ* against the spatial separation h of sample
pairs: a variogram cloud.

As the dissimilarity is a squared quantity, the sign of the vector h, i.e. the
order in which the points x_α and x_β are considered, does not enter into play. The
dissimilarity is symmetric with respect to h:

    γ*(−h_αβ) = γ*(+h_αβ)

Thus on graphical representations the dissimilarities will be plotted against
absolute values of the vectors h. Using all sample pairs in a data set (up to a
distance of half the diameter of the region), a plot of the dissimilarities γ* against
the spatial separation h is produced, which is called the variogram cloud. A
schematic example is given on Figure 4.2.
The dissimilarity often increases with distance, as near samples tend to be
alike.
The variogram cloud by itself is a powerful tool for exploring features of
spatial data. On a graphical computer screen the values of the variogram cloud
can be linked to the position of sample pairs on a map representation. The analysis
of subsets of the variogram cloud can help in understanding the distribution of
the sample values in geographical space. Anomalies and inhomogeneities can be
detected by looking for high dissimilarities at short distances. In some cases the
variogram cloud consists of two distinct clouds due to the presence of outliers.
HASLETT et al. [86] first developed the use of the variogram cloud in combination
with other views on the data, using linked windows on a graphic computer
screen, and they provide many examples showing the power of this exploratory
tool.
Actually a variogram cloud seldom looks like what is suggested on Figure 4.2:
the variogram cloud usually is dominated by many pairs with low dissimilarity
at all scales h (DIGGLE et al. [57] discuss this question).

Figure 4.3: The experimental variogram is obtained by averaging the dissimilarities
γ* for given classes H_k.

Experimental variogram

An average of dissimilarities γ*(h_αβ) can be formed for a given class of vectors
H by grouping all n_c point pairs that can be linked by a vector h belonging to
H. Such a class H groups vectors whose lengths are within a specified interval
of lengths and whose orientation is the same up to a given tolerance on angle.
Generally non-overlapping vector classes H are chosen. The average dissimilarity
with respect to a vector class H_k is a value of what is termed the experimental
variogram:

    γ*(H_k) = (1 / 2n_c) Σ_{α,β} ( z(x_α + h_αβ) − z(x_α) )²    with h_αβ ∈ H_k

In practice the experimental variogram is usually computed using vectors
h_αβ of a length inferior to half the diameter of the region. For pairs of samples
with vectors h_αβ of a length almost equal to the diameter of the region, the
corresponding samples are located near the border. Vector classes H formed with
such pairs will have no contribution from samples at the center of the region and
are thus not representative of the whole data set.
An example of an experimental variogram obtained for a sequence of classes
H_k is sketched on Figure 4.3. The experimental variogram is obtained from the
variogram cloud by subdividing it into classes and computing an average for each
class.

Figure 4.4: The sequence of average dissimilarities is fitted with a theoretical
variogram function.
Usually we can observe that the average dissimilarity between values increases
when the spacing between the pairs of sample points is increased. For large
spacings the experimental variogram sometimes reaches a sill which can be
equal to the variance of the data.
If the slope of the dissimilarity function changes abruptly at a specific scale,
this suggests that an intermediate level of the variation has been reached at this
scale.
The behavior at very small scales, near the origin of the variogram, is of
importance, as it indicates the type of continuity of the regionalized variable:
differentiable, continuous but not differentiable, or discontinuous. In this last
case, when the variogram is not differentiable at the origin, this is a symptom
of a nugget-effect, which means that the values of the variable change abruptly
at a very small scale, like gold grades when a few gold nuggets are contained in
some samples.
When the average dissimilarity of the values is constant for all spacings h,
there is no spatial structure in the data. Conversely, a non zero slope of the
variogram near the origin indicates structure. An abrupt change in slope indi-
cates the passage to a different structuration of the values in space. We shall
learn how to model such transitions with nested theoretical variograms and how
to visualize the different types of spatial associations of the values separately as
maps by kriging spatial components.

Replacing the experimental by a theoretical variogram

The experimental variogram is replaced by a set of theoretical variogram functions
essentially for the reason that the variogram model should have a physical
meaning (a random function with the given type of variogram does exist). The
use of a theoretical variogram guarantees (using weights subject to a certain
constraint) that the variance of any linear combination of sample values is positive.
If the values of the experimental variogram were taken to set up a kriging
system, this could lead to negative kriging variances, in a way similar to
Example 2.1 (on page 17) where a problem of multiple regression with missing
values was discussed.
A set of theoretical variogram functions is fitted to the sequence of average
dissimilarities, as suggested on Figure 4.4. It is important to understand that
this fit implies an interpretation of both the behavior at the origin and the
behavior at large distances, beyond the range of the experimental variogram. The
fit is done by eye (which may look like a strange tradition, at first glance!), because
it is generally not so relevant how well the variogram function fits the sequence of
points. What counts is the type of continuity assumed for the regionalized
variable and the stationarity hypothesis associated to the random function. These
assumptions will guide the selection of an appropriate variogram function, and
this has many more implications than the way the theoretical function is fitted to
the experimental variogram. A thorough discussion is found in MATHERON [126].
5 Variogram and Covariance Function

The experimental variogram is a convenient tool for the analysis of spatial data
as it is based on a simple measure of dissimilarity. Its theoretical counterpart
reveals that a broad class of phenomena is adequately described by it, including
phenomena of unbounded variation. When the variation is bounded, the
variogram is equivalent to a covariance function.

Regional variogram

The experimental variogram of samples z(x_α) is the sequence of averages of
variogram values for different distance classes H_k. If we had samples for the
whole domain D, we could compute the variogram for every possible pair in the
domain. The set D(h), defined as the intersection of the domain D with a
translation D_{−h} of itself, describes all the points x having a counterpart x+h in
D. The regional variogram g_R(h) is the integral over the squared differences of
a regionalized variable z(x) for a given lag h:

    g_R(h) = (1 / 2|D(h)|) ∫_{D(h)} ( z(x+h) − z(x) )² dx

where |D(h)| is the area (or volume) of the intersection D(h).
We know the regionalized variable z(x) only at a few locations and it is
generally not possible to approximate z(x) by a simple deterministic function.
Thus it is convenient to consider z(x) as a realization of a random function Z(x).
The associated regional variogram

    G_R(h) = (1 / 2|D(h)|) ∫_{D(h)} ( Z(x+h) − Z(x) )² dx

is a randomized version of g_R(h). Its expectation defines the theoretical
variogram γ(h) of the random function model Z(x) over the domain D:

    γ(h) = E[ G_R(h) ]
Theoretical variogram
The variation in space of a random function Z(x) can be described by taking
the differences between values at pairs of points x and x+h:

    Z(x+h) − Z(x)

called the increments.


The theoretical variogram -y(h) is defined by the intrinsic hypothesis, which
is a short form for "a hypothesis of intrinsic stationarity of order two". This
hypothesis, which is merely a statement about the type of stationarity character-
izing the random function, is formed by two assumptions about the increments:

• the mean m(h) of the increments, called the drift, is invariant for any
translation of a given vector h within the domain. Moreover, the drift is
supposed to be zero whatsoever the position of h in the domain.

• the variance of the increments


has a finite value 2-y(h) depending on the length and the orientation of a
given vector h, but not on the position of h in the domain.

That is to say, for any pair of points x, x+h E 1) we have

E[ Z(x+h) - Z(x)] = m(h) = 0


var[Z(x+h) - Z(x)] 2-y(h)

These two properties of an intrinsically stationary random function yield the
definition of the theoretical variogram:

    γ(h) = ½ E[ ( Z(x+h) − Z(x) )² ]

The existence of expectation and variance of the increments does not imply
the existence of the first two moments of the random function itself: an intrinsic
random function can have an infinite variance although the variance of its
increments is finite for any vector h. An intrinsically stationary random function
does not need to have a constant mean or a constant variance.
The value of the variogram at the origin is zero by definition:

    γ(0) = 0

The values of the variogram are positive:

    γ(h) ≥ 0

and the variogram is an even function:

    γ(−h) = γ(h)

The variogram grows slower than |h|² for |h| → ∞ (otherwise the drift m(h)
could not be assumed zero).

Covariance function

The covariance function C(h) is defined on the basis of a hypothesis of
stationarity of the first two moments (mean and covariance) of the random function:

    E[ Z(x) ] = m                         for all x ∈ D
    E[ Z(x) · Z(x+h) ] − m² = C(h)        for all x, x+h ∈ D

The covariance function is bounded and its absolute value does not exceed
the variance:

    |C(h)| ≤ C(0) = var( Z(x) )

Like the variogram it is an even function: C(−h) = C(+h). But unlike the
variogram it can take negative values.
The covariance function divided by the variance is called the correlation
function:

    ρ(h) = C(h) / C(0)

which is obviously bounded by

    −1 ≤ ρ(h) ≤ 1

A variogram function can be deduced from a covariance function with the
formula

    γ(h) = C(0) − C(h)

but in general the reverse is not true, because the variogram is not necessarily
bounded. Thus the hypothesis of second-order stationarity is less general than
the intrinsic hypothesis (in the monovariate case) and some variogram models
do not have a covariance function counterpart.

EXAMPLE 5.1 For example, the power variogram shown on Figure 5.1,

    γ(h) = b |h|^p    with 0 < p < 2 and b > 0

cannot be obtained from a covariance function as it grows without bounds.
Clearly it outgrows the framework fixed by second-order stationarity.
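The contrast between bounded and unbounded models is easy to see numerically; a sketch (the model parameters are arbitrary illustrative choices):

```python
import numpy as np

def gamma_power(h, b=1.0, p=1.5):
    """Power variogram b |h|^p with 0 < p < 2: grows without bound."""
    return b * np.abs(h) ** p

def cov_exponential(h, sill=1.0, a=10.0):
    """An exponential covariance model (an assumed illustrative choice)."""
    return sill * np.exp(-np.abs(h) / a)

def gamma_from_cov(h):
    """Variogram deduced from a covariance: gamma(h) = C(0) - C(h)."""
    return cov_exponential(0.0) - cov_exponential(h)

h = np.linspace(0.0, 200.0, 401)
# gamma_from_cov(h) levels off at the sill C(0) = 1,
# while gamma_power(h) exceeds any bound for large |h|.
```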

Figure 5.1: Power variograms for different values of the power p (with b = 1).

Positive definite function

A covariance function is a positive definite function (for functions, in opposition
to matrices, no distinction is usually made between "definite" and "semi-definite").
This means that the use of a covariance function C(h) for the computation of the
variance of a linear

combination of n+1 random variables Z(x_α) (any subset sampled from a
second-order stationary random function) must be positive. It is necessarily linked
to a positive semi-definite matrix C of covariances:

    var( Σ_{α=0}^n w_α Z(x_α) ) = Σ_{α=0}^n Σ_{β=0}^n w_α w_β C(x_α − x_β) = w^T C w ≥ 0

for any set of points x_α and any set of weights w_α (assembled into a vector w).
The continuous covariance functions are characterized by Bochner's theorem
as the Fourier transforms of positive measures. This topic is treated in more
detail in the chapter on cross-covariance functions.
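The defining property can be checked numerically: build the covariance matrix of an arbitrary point set and verify that every quadratic form wᵀCw is nonnegative (the exponential model is an assumed valid covariance function):

```python
import numpy as np

def cov_exponential(h, sill=1.0, a=10.0):
    return sill * np.exp(-np.abs(h) / a)     # a valid covariance model

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=30)                       # arbitrary 1-D locations
C = cov_exponential(np.abs(x[:, None] - x[None, :]))   # matrix C(x_a - x_b)

# var(sum_a w_a Z(x_a)) = w^T C w >= 0 for ANY weights w.
quads = []
for _ in range(100):
    w = rng.normal(size=30)
    quads.append(w @ C @ w)
quads = np.array(quads)
```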

Conditionally negative definite function


The variogram is a conditionally negative definite function. The condition to
guarantee the positivity of the variance of any linear combination of n+ 1 random
variables, subset of an intrinsic random function, is that the n+1 weights w'"
sum up to zero. The variance of a linear combination of intrinsically stationary
random variables is defined as
n

var (
n )
.?;WaZ(Xa) = - .?;EwaWß,(Xa-xß) ~ 0
n n
if LW a = 0
a=O

i.e. any matrix Γ of variogram values is conditionally negative definite

[w_α]ᵀ [γ(x_α−x_β)] [w_α]  =  wᵀ Γ w  ≤  0    for  Σ_{α=0}^{n} w_α = 0
Variogram and Covariance Function 39

To understand the meaning (sufficiency) of the condition on the sum of the


weights, it is necessary to compute explicitly the variance of the linear combina-
tion.
As the random function Z(x) is intrinsically stationary, only the expectation
of its increments has a meaning. A trick enabling the construction of increments is
to insert an additional random variable Z(0), placed arbitrarily at the origin 0
(assuming 0 ∈ D), multiplied by the zero-sum weights

var( Σ_{α=0}^{n} w_α Z(x_α) )  =  var( −Z(0) · Σ_{α=0}^{n} w_α + Σ_{α=0}^{n} w_α Z(x_α) )
                                  (the first sum being zero)

    =  var[ Σ_{α=0}^{n} w_α ( Z(x_α) − Z(0) ) ]

    =  Σ_{α=0}^{n} Σ_{β=0}^{n} w_α w_β E[ ( Z(x_α)−Z(0) ) · ( Z(x_β)−Z(0) ) ]

    =  Σ_{α=0}^{n} Σ_{β=0}^{n} w_α w_β C_I(x_α, x_β)

where C_I(x_α, x_β) is the covariance of increments formed using the additional
variable Z(0).
We also introduce the additional variable Z(0) into the expression of the
variogram

γ(x_α−x_β)  =  (1/2) E[ ( (Z(x_α)−Z(0)) − (Z(x_β)−Z(0)) )² ]

            =  (1/2) ( 2γ(x_α) + 2γ(x_β) − 2 C_I(x_α, x_β) )

so that

C_I(x_α, x_β)  =  γ(x_α) + γ(x_β) − γ(x_α−x_β)

which incorporates two non-stationary terms γ(x_α) and γ(x_β).


Coming back to the computation of the variance of the linear combination,
we see that the two non-stationary terms are cancelled by the condition on the
weights

var( Σ_{α=0}^{n} w_α Z(x_α) )  =  Σ_{α=0}^{n} Σ_{β=0}^{n} w_α w_β C_I(x_α, x_β)

    =  Σ_{β=0}^{n} w_β · Σ_{α=0}^{n} w_α γ(x_α)  +  Σ_{α=0}^{n} w_α · Σ_{β=0}^{n} w_β γ(x_β)
       −  Σ_{α=0}^{n} Σ_{β=0}^{n} w_α w_β γ(x_α−x_β)

    =  − Σ_{α=0}^{n} Σ_{β=0}^{n} w_α w_β γ(x_α−x_β)    for  Σ_{α=0}^{n} w_α = Σ_{β=0}^{n} w_β = 0

When handling linear combinations of random variables, the variogram can


only be used together with a condition on the weights guaranteeing its existence.
In particular, the variogram cannot in general be used in a simple kriging. Other
forms of kriging, with constraints on the weights, are required to use this tool
that covers a wider range of phenomena than the covariance function.
Variograms can be characterized on the basis of continuous covariance func-
tions: a function γ(h) is a variogram, if for any positive constant c the exponential
of −cγ(h) is a covariance function

C(h) = e^{−c γ(h)}    with c > 0

This remarkable relation, based on a theorem by SCHOENBERG [167], links
(through the defining kernel of the Laplace transform) the conditionally negative
definite functions with the positive definite functions (see also CHOQUET [28]).
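Schoenberg's relation can also be illustrated numerically. The sketch below (arbitrary parameters, not from the book) exponentiates −cγ(h) for a power variogram and checks that the resulting covariance matrix is positive semi-definite:

```python
# Sketch of Schoenberg's relation: if gamma is a variogram, then
# C(h) = exp(-c*gamma(h)) should be a covariance function for any c > 0.
# We check positive semi-definiteness of the resulting matrix numerically.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=10)   # 10 arbitrary sample locations
H = np.abs(x[:, None] - x[None, :])   # pairwise distances

b, p, c = 1.0, 1.2, 0.5               # hypothetical parameters
C = np.exp(-c * b * H ** p)           # a member of the stable family

eigvals = np.linalg.eigvalsh(C)
print(eigvals.min() >= -1e-10)        # all eigenvalues (numerically) >= 0
```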

COMMENT 5.2 The covariance function

C(h) = e^{−|h|^p / a}    with 0 < p ≤ 2 and a > 0

which is related by Schoenberg's theorem to the power variogram model, defines
the family of stable covariance functions. The case p = 2 (Gaussian covariance
function) is pathological: it corresponds to a deterministic random function
(MATHERON [118]), which contradicts its supposed randomness. The case p = 1
defines the exponential covariance function.

Fitting the variogram with a covariance function


If the variogram is bounded by a finite value γ(∞), a covariance function can be
found such that

C(h) = γ(∞) − γ(h)

The experimental variogram can have a shape which suggests using a
bounded variogram function to fit it. The least upper bound of the variogram
function is called the sill.
When the experimental variogram exhibits a sill, it is possible to fit it with
a theoretical variogram that is actually a covariance function C(h) on the basis
of the formula for bounded variograms

γ(h) = b − C(h)

where b = C(0) is the value at the origin of the covariance function.


6 Examples of Covariance Functions

We present a few models of covariance functions. They are defined for isotropic
(i.e. rotation invariant) random functions. On the graphical representations
the covariance functions are plotted as variograms using the relation γ(h) =
C(0) − C(h).

Nugget-effect model
The covariance function C(h) that models a discontinuity at the origin is the
nugget-effect model

C_nug(h) = { b   for |h| = 0
           { 0   for |h| > 0

where b is a positive value.
Its variogram counterpart is zero at the origin and has the value b for h ≠ 0.
It is shown on Figure 6.1.
The nugget-effect is used to model a discontinuity at the origin of the vari-
ogram, i.e. when

lim_{|h|→0} γ(h) = b
The nugget-effect is equivalent to the concept of white noise in signal pro-
cessing.

Exponential covariance function


The exponential covariance function model falls off exponentially with increasing
distance

C_exp(h) = b e^{−|h|/a}    with a, b > 0

The parameter a determines how quickly the covariance falls off. For a value
of h = 3a the covariance function has decreased by 95% of its value at the origin,
so that this distance has been termed the practical range of the exponential
model.
The exponential model is continuous but not differentiable at the origin. It
drops asymptotically towards zero for |h| → ∞.
The variogram equivalent of the exponential covariance function is shown on
Figure 6.2.


Figure 6.1: A nugget-effect variogram: its value is zero at the origin and b = 1
elsewhere.

Spherical model

A commonly used covariance function is the spherical model

C_sph(h) = { b ( 1 − (3/2)(|h|/a) + (1/2)(|h|³/a³) )   for 0 ≤ |h| ≤ a
           { 0                                          for |h| > a

The parameter a indicates the range of the spherical covariance: the co-
variance vanishes when the range is reached. The parameter b represents the
maximal value of the covariance: the spherical covariance steadily decreases,
starting from the maximum b at the origin, until it vanishes when the range is
reached.
The nugget-effect model can be considered as a particular case of a spheri-
cal covariance function with an infinitely small range. Nevertheless there is an
important difference between the two models: Cnug(h) describes a discontinuous
phenomenon, whose values change abruptly from one location to the other, while
Csph(h) represents a phenomenon which is continuous, but not differentiable: it
would feel rough, could one touch it.
A corresponding spherical variogram is shown on Figure 6.3. It reaches the
sill (b= 1) at a range of a= 3.
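The three models of this chapter can be sketched in a few lines of code (an illustration with the parameter values used in the figures, not the book's own software):

```python
# Minimal implementations of the nugget-effect, exponential and spherical
# covariance models; the variogram counterpart is gamma(h) = C(0) - C(h).
import numpy as np

def c_nugget(h, b=1.0):
    return np.where(np.abs(h) == 0.0, b, 0.0)

def c_exponential(h, a=1.0, b=1.0):
    return b * np.exp(-np.abs(h) / a)

def c_spherical(h, a=3.0, b=1.0):
    r = np.abs(h)
    inside = b * (1.0 - 1.5 * r / a + 0.5 * (r / a) ** 3)
    return np.where(r <= a, inside, 0.0)

# Practical range of the exponential model: at |h| = 3a the covariance
# has dropped to about 5% of its value at the origin.
ratio = c_exponential(3.0, a=1.0) / c_exponential(0.0, a=1.0)
print(round(float(ratio), 3))        # ~ exp(-3), about 0.05
# The spherical model vanishes exactly at its range a:
print(float(c_spherical(3.0, a=3.0)))   # 0.0
```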


Figure 6.2: An exponential variogram: it rises asymptotically towards a sill b = 1.
The range parameter is set to a = 1. At a practical range of |h| = 3a the exponential
model has approached the sill to 95%.

Derivation of the spherical covariance


Imagine a universe with Poisson points, i.e. a 3D-space with points x_p scattered
randomly following a uniform distribution along each coordinate and summing
up to θ points per volume unit on average. A counting function N(V) is defined
which counts the number of Poisson points contained in a volume V.
Consider the random function Z(x) = N(B_x) which is the count of the num-
ber of Poisson points contained in a ball B_x centered on a point x. Clearly B_x
represents the volume of influence of diameter d around a point x which deter-
mines the value of Z(x). The problem at hand is to calculate the covariance
function of the random function Z(x).
An indicator function 1_B(x') is constructed indicating whether a location x'
is inside a ball centered at x

1_B(x') = { 1, if x' ∈ B_x
          { 0, if x' ∉ B_x

A function K(h), the geometric covariogram, measures the volume of the
intersection of a ball B with a copy B_h of it translated by a vector h

K(h)  =  ∫∫∫ 1_B(x') 1_B(x'+h) dx'  =  ∫∫∫ 1_B(x') 1_{B_h}(x') dx'

      =  |B ∩ B_h|

where the triple integrals extend over the whole space.


Figure 6.3: A spherical variogram with a sill b= 1 and a range a= 3.

Conversely, it is worth noting that the intersection B ∩ B_{−h} of the ball with
a copy of itself translated by −h represents the set of points x' ∈ B which have
a neighbor x'+h within the ball, as shown on Figure 6.4

K(h)  =  ∫_{x' ∈ B ∩ B_{−h}} dx'  =  |B ∩ B_{−h}|

The covariance of Z(x) can now be expressed as

C(h) = E[ N(B) N(B_h) ] − E[ N(B) ] E[ N(B_h) ]

and as the counts N(V) over disjoint subvolumes are independent

C(h)  =  E[ N(B ∩ B_h)² ] − E²[ N(B ∩ B_h) ]
      =  θ |B ∩ B_h|
      =  θ K(h)

Calculating explicitly the volume of the intersection of two spheres of equal
size whose centers are separated by a vector h yields the formula for the spherical
covariance

C(h) = { θ |B| ( 1 − (3|h|)/(2d) + |h|³/(2d³) )   for 0 ≤ |h| ≤ d,
       { 0                                         for |h| > d




Figure 6.4: The intersection B ∩ B_{−h} describes the set of points x' ∈ B which have
a neighbor x'+h inside B.

where θ|B| = θπd³/6 = C(0) represents the variance of Z(x) and |B| is the
volume of the spheres.
The diameter d of the spheres is equal to the range of the covariance function
as it indicates the distance at which the covariance vanishes. The range of
the spherical covariance function is the maximal distance at which the volumes
of influence of two random variables Z(x) and Z(x+h) can overlap and share
information.
In applications large objects (as compared to the scale of the investigation)
can condition the spatial structure of the data. The maximal size of these mor-
phological objects in a given direction can often be read from the experimental
variogram and interpreted as the range of a spherical model.
The shape of objects conditioning the morphology of a regionalized variable
may not be spherical in many applications. This will result in anisotropic
behavior of the variogram.
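The geometric covariogram underlying this derivation can be checked by Monte Carlo. The sketch below (assumed diameter and sample sizes) estimates |B ∩ B_h| for a ball and compares it with the closed-form spherical expression:

```python
# Monte Carlo estimate of the geometric covariogram K(h) = |B ∩ B_h| of a
# ball of diameter d, compared with |B|(1 - 3|h|/(2d) + |h|^3/(2d^3)).
import numpy as np

rng = np.random.default_rng(2)
d = 2.0                               # ball diameter, hence radius 1
R = d / 2.0
vol_B = 4.0 / 3.0 * np.pi * R ** 3

def K_mc(h, n=200_000):
    # Sample points uniformly in the ball B centered at the origin and
    # keep those whose translate by h stays inside B.
    pts = rng.uniform(-R, R, size=(n, 3))
    pts = pts[np.sum(pts ** 2, axis=1) <= R ** 2]   # points inside B
    shifted = pts.copy()
    shifted[:, 0] += h                 # translate along the first axis
    frac = np.mean(np.sum(shifted ** 2, axis=1) <= R ** 2)
    return vol_B * frac

def K_exact(h):
    if abs(h) > d:
        return 0.0
    return vol_B * (1.0 - 1.5 * abs(h) / d + 0.5 * (abs(h) / d) ** 3)

h = 0.8
print(abs(K_mc(h) - K_exact(h)) < 0.05)   # close up to Monte Carlo error
```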
7 Anisotropy

Experimental calculations can reveal a very different behavior of the experimen-
tal variogram in different directions. This is called an anisotropic behavior. As
variogram models are defined for the isotropic case, we need to examine transfor-
mations of the coordinates which allow us to obtain anisotropic random functions
from the isotropic models. In practice anisotropies are detected by inspecting
experimental variograms in different directions and are included into the model
by tuning predefined anisotropy parameters.

Geometric Anisotropy

In 2D-space a representation of the behavior of the experimental variogram can
be made by drawing a map of iso-variogram lines as a function of a vector h.
Ideally if the iso-variogram lines are circular around the origin, the variogram
obviously only depends on the length of the vector h and the phenomenon is
isotropic.
If not, the iso-variogram lines can in many applications be approximated by
concentric ellipses defined along a set of perpendicular main axes of anisotropy.
This type of anisotropy, called the geometric anisotropy, can be obtained by
a linear transformation of the spatial coordinates of a corresponding isotropic
model. It allows us to relate the class of ellipsoidally anisotropic random functions
to a corresponding isotropic random function. This is essential because variogram
models are defined for the isotropic case. The linear transformation extends in a
simple way a given isotropic variogram to a whole class of ellipsoidally anisotropic
variograms.

Rotating and dilating an ellipsoid

We have a coordinate system for h = (h_1, ..., h_n) with n coordinates. In this
coordinate system the surfaces of constant variogram describe an ellipsoid and
we search a new coordinate system for h in which the iso-variogram lines are
spherical.
As a first step a rotation matrix Q is sought which rotates the coordinate
system h into a coordinate system h' = Qh that is parallel to the principal axes
of the ellipsoid, as shown on Figure 7.1 in the 2D case. The directions of the
principal axes should be known from experimental variogram calculations.


Figure 7.1: The coordinate system for h = (h_1, h_2) is rotated into the system h'
parallel to the main axes of the concentric ellipses.

In 2D the rotation is given by the matrix

Q = (  cos θ   sin θ )
    ( −sin θ   cos θ )

where θ is the rotation angle.


In 3D the rotation is obtained by a composition of elementary rotations. The
convention is to use Euler's angles and the corresponding rotation matrix is

Q = (  cos θ_3   sin θ_3   0 )   ( 1      0          0      )   (  cos θ_1   sin θ_1   0 )
    ( −sin θ_3   cos θ_3   0 ) · ( 0    cos θ_2    sin θ_2  ) · ( −sin θ_1   cos θ_1   0 )
    (  0         0         1 )   ( 0   −sin θ_2    cos θ_2  )   (  0         0         1 )

The angle θ_1 defines a rotation of the plane h_1h_2 around h_3 such that h_1
is brought into the plane h'_1h'_2. With θ_2 a rotation is performed around the
intersection of the planes h_1h_2 and h'_1h'_2, bringing h_3 into the position of h'_3.
The third rotation with an angle θ_3 rotates everything around h'_3 into its final
position.
The second step in the transformation is to operate a shrinking or dilation of
the principal axes of the ellipsoid using a diagonal matrix

√Λ = ( √λ_1           )
     (       ...      )
     (           √λ_n )

which transforms the system h' into a new system h̃ in which the ellipsoids
become spheres

h̃ = √Λ h'

Conversely, if r is the radius of a sphere around the origin in the coordinate
system of the isotropic variogram, it is obtained by calculating the length of any
vector h̃ pointing on the surface of the sphere

r = |h̃| = √(h̃ᵀ h̃)

This yields the equation of an ellipsoid in the h' coordinate system

(h')ᵀ Λ h' = r²

The diameters d_p (principal axes) of the ellipsoid along the principal directions
are thus

d_p = 2r / √λ_p

and the principal directions are given by the row vectors of the rotation matrix Q.
Finally once the ellipsoid is determined the anisotropic variogram is specified
on the basis of an isotropic variogram by

γ(r) = γ( √(hᵀ B h) )

where B = Qᵀ Λ Q.
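The transformation can be sketched in 2D (arbitrary angle, ranges and model, not values from the book): rotating the lag vector onto the principal axes and rescaling makes one isotropic model serve a whole class of anisotropic variograms.

```python
# Geometric anisotropy in 2D: gamma(h) = gamma_iso(sqrt(h^T B h)) with
# B = Q^T A Q, for an assumed rotation angle and dilation coefficients.
import numpy as np

def gamma_iso(r, a=1.0, b=1.0):
    # isotropic exponential variogram (hypothetical choice)
    return b * (1.0 - np.exp(-r / a))

theta = np.deg2rad(30.0)               # rotation angle of the main axes
Q = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
A = np.diag([1.0, 1.0 / 4.0 ** 2])     # range 4x larger along the 2nd axis
B = Q.T @ A @ Q

def gamma_aniso(h):
    h = np.asarray(h, dtype=float)
    return gamma_iso(np.sqrt(h @ B @ h))

# Along the long principal axis the variogram rises more slowly:
h_long = 2.0 * np.array([-np.sin(theta), np.cos(theta)])   # second axis
h_short = 2.0 * np.array([np.cos(theta), np.sin(theta)])   # first axis
print(gamma_aniso(h_long) < gamma_aniso(h_short))          # True
```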

Exploring 3D space for anisotropy

In 3D applications the anisotropy of the experimental variogram can be explored
taking advantage of the geometry of a regular icosahedron (20 faces, 30 edges)
centered at the origin. The 15 lines joining opposite edges through the origin are
used as leading directions for the experimental calculations. The lines are evenly
distributed in space and can be grouped into 5 systems of Cartesian coordinates
forming the basis of trirectangular trihedra.
The range of a geometrically anisotropic variogram describes an ellipsoid
whose principal directions are given by a set of Cartesian coordinates. Five
possible ellipsoids for describing the range can now be tested by composing up
to four times a rotation R, yielding the rotation matrix

Q = (R)^k = ( (1/2) (  1      −(g+1)     g  ) )^k
            (       ( g+1       g       −1  ) )       with k = 1, ..., 4
            (       (  g        1      g+1  ) )

where g = (√5 − 1)/2 ≈ 0.618 is the golden mean.
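The golden-mean matrix R (its entries as reconstructed here, an assumption where the scan is hard to read) can be verified to be a rotation by 72 degrees, so that R⁵ = I and k = 1, ..., 4 indeed yields the five coordinate systems:

```python
# Check that the golden-mean matrix R is a proper rotation with R^5 = I.
import numpy as np

g = (np.sqrt(5.0) - 1.0) / 2.0         # the golden mean
R = 0.5 * np.array([[1.0, -(g + 1.0), g],
                    [g + 1.0, g, -1.0],
                    [g, 1.0, g + 1.0]])

print(np.allclose(R @ R.T, np.eye(3)))                        # orthogonal
print(np.isclose(np.linalg.det(R), 1.0))                      # proper rotation
print(np.allclose(np.linalg.matrix_power(R, 5), np.eye(3)))   # R^5 = I
```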

Zonal anisotropy
It can happen that experimental variograms calculated in different directions
suggest a different value for the sill. This is termed a zonal anisotropy.

For example, in 2D the sill along the x_2 coordinate might be much larger than
along x_1. In such a situation a common strategy is first to fit an isotropic model
γ_1(h) to the experimental variogram along the x_1 direction. Second, to add a
geometrically anisotropic variogram γ_2(h), which is designed to be without effect
along the x_1 coordinate by providing it with a very large range in that direction
through an anisotropy coefficient. The final variogram model is then

γ(h) = γ_1(h) + γ_2(h)

in which the main axis of the anisotropy ellipse for γ_2(h) is very large in the
direction x_1.
The underlying random function model overlays two uncorrelated processes
Z_1(x) and Z_2(x)

Z(x) = Z_1(x) + Z_2(x)

From the point of view of the regionalized variable, the anisotropy of γ_2(h) can
be due to morphological objects which are extremely elongated in the direction
of x_1, crossing the borders of the domain. These units slice up the domain along
x_1 thus creating a zonation along x_2, which explains the additional variability to
be read on the variogram in that direction.

Nonlinear deformations of space


In air pollution and climatological studies it is frequent that data is available for
several replications N_t in time at stations in 2D space. For every pair of loca-
tions (x_α, x_β) in geographical space a variogram value γ*(h_αβ) can be computed
by averaging the dissimilarities γ*_αβ between the two stations for the N_t replica-
tions in time. It is often the case for pairs of stations at locations (x_α, x_β) and
(x_α', x_β') with separation vectors h_αβ ≈ h_α'β' approximately of the same length
and orientation that the values γ*(h_αβ) are nevertheless very different!
To cope with this problem spatial correlation mapping has been developed,
inspired by techniques used in morphometrics. SAMPSON & GUTTORP [162] and
MONESTIEZ & SWITZER [130] have proposed smooth nonlinear deformations of
space f(x) for which the variogram γ(r) = γ(|h̃|), with h̃ = f(x) − f(x'), is
isotropic. The deformation of the geographical space for which the γ*(h_αβ) values
best fit a given theoretical model is obtained by multidimensional scaling. The
resulting somewhat grotesque looking maps showing the deformed geographical
space turn out to be a valuable exploratory tool for understanding the covariance
structure of the stations, especially when this can be done for different time
periods.
8 Extension and Dispersion Variance

Measurements can represent averages over volumes, surfaces or intervals, called


their support. The computation of variances depends intimately on the supports
that are involved as well as on a theoretical variogram associated to a point-
wise support. This is illustrated with an application from industrial hygienics.
Furthermore, three simple sampling designs are examined from a geostatistical
perspective.

Support
In the investigation of regionalized variables the variances are a function of the
size of the domain. On Table 8.1 the results of computations of means and
variances in nested 2D domains D_n are shown.

           Size        Mean m(D_n)    Variance σ²(·|D_n)

   D_1     32×32          20.5              7.4
   D_2     64×64          20.1             13.8
   D_3     128×128        20.1             23.6
   D_4     256×256        20.8             34.6
   D_5     512×512        18.8             45.0

Table 8.1: Nested 2D domains D_n for which the variance increases with the size of
the domain (from a simulation of an intrinsic random function by C. LAJAUNIE)

In this example the variance σ²(·|D_n) of point samples in a domain D_n in-
creases steadily with the size of the domain whereas the mean does not vary
following a distinctive pattern. This illustrates the influence that a change in the
size of a support (here the domain D_n) can have on a statistic like the variance.
In applications generally two or more supports are involved as illustrated by
the Figure 8.1. In mining the samples are collected on a support that can be
considered pointwise (only a few cm³); subsequently small blocks v (m³) or larger
panels V (100 m³) have to be estimated within deposits D. In soil pollution small
surface units s are distinguished from larger portions S. In industrial hygiene
the problem may be set in terms of time supports: with average measurements

Figure 8.1: Supports in 1, 2, 3D in different applications.

on short time intervals Δt the excess over a limit value defined for a work day T
should be estimated.

Extension variance
With regionalized variables it is necessary to take account of the spatial disposition
of the points, surfaces or volumes for which the variance of a quantity should be
computed.
The extension variance of a point x with respect to another point x' is
defined as twice the variogram

σ²_E(x, x') = var( Z(x) − Z(x') ) = 2γ(x−x')

It represents the theoretical error committed when a value at a point x is "ex-
tended" to a point x'.
The extension variance of a small volume v to a larger volume V at a different
location (see Figure 8.2) is obtained by averaging the differences between all
positions of a point x in the volume v and a point x' in V

Figure 8.2: Points x ∈ v and x' ∈ V.

σ²_E(v, V)  =  var( Z(v) − Z(V) )

            =  (2/(|v||V|)) ∫_{x∈v} ∫_{x'∈V} γ(x−x') dx dx'
               − (1/|v|²) ∫_{x∈v} ∫_{x'∈v} γ(x−x') dx dx'
               − (1/|V|²) ∫_{x∈V} ∫_{x'∈V} γ(x−x') dx dx'

Denoting

γ̄(v, V) = (1/(|v||V|)) ∫_{x∈v} ∫_{x'∈V} γ(x−x') dx dx'

we have

σ²_E(v, V) = 2 γ̄(v, V) − γ̄(v, v) − γ̄(V, V)

The extension variance depends on variogram integrals γ̄(v, V), whose values
can either be read in charts (see JOURNEL & HUIJBREGTS [93], chap. II) or
integrated numerically on a computer.
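The numerical route can be sketched by Monte Carlo integration in 1D (an illustration with a hypothetical exponential model and arbitrary supports, not a chart from the literature):

```python
# Approximate the variogram integrals gamma_bar(v, V) by Monte Carlo and
# assemble the extension variance 2*gbar(v,V) - gbar(v,v) - gbar(V,V).
import numpy as np

rng = np.random.default_rng(3)

def gamma(h, a=1.0, b=1.0):
    # pointwise exponential variogram (hypothetical model)
    return b * (1.0 - np.exp(-np.abs(h) / a))

def gamma_bar(seg1, seg2, n=100_000):
    # average of gamma(x - x') over x in seg1 = (lo, hi), x' in seg2
    x = rng.uniform(seg1[0], seg1[1], size=n)
    xp = rng.uniform(seg2[0], seg2[1], size=n)
    return gamma(x - xp).mean()

v = (0.0, 1.0)        # small support
V = (3.0, 7.0)        # larger support at a different location
sigma2_E = 2 * gamma_bar(v, V) - gamma_bar(v, v) - gamma_bar(V, V)
print(sigma2_E > 0)   # an extension variance is non-negative
```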

Dispersion variance
Suppose a large volume V is partitioned into n smaller units v_α of equal size.
The experimental dispersion variance of the values z*_α of the small volumes v_α
building up V is given by the formula

s²(v|V) = (1/n) Σ_{α=1}^{n} ( z*_α − z*_V )²

where

z*_V = (1/n) Σ_{α=1}^{n} z*_α

Considering all possible realizations of a random function we write

S²(v|V) = (1/n) Σ_{α=1}^{n} ( Z(v_α) − Z(V) )²

The theoretical formula for the dispersion variance is obtained by taking the
expectation

σ²(v|V)  =  E[ S²(v|V) ]  =  (1/n) Σ_{α=1}^{n} E[ ( Z(v_α) − Z(V) )² ]

in which we recognize the extension variances

σ²(v|V) = (1/n) Σ_{α=1}^{n} σ²_E(v_α, V)

Expressing the extension variances in terms of variogram integrals

σ²(v|V)  =  (1/n) Σ_{α=1}^{n} ( 2 γ̄(v_α, V) − γ̄(v, v) − γ̄(V, V) )

         =  −γ̄(v, v) − γ̄(V, V)
            + (2/n) Σ_{α=1}^{n} (1/(|v_α||V|)) ∫_{x∈v_α} ∫_{x'∈V} γ(x−x') dx dx'

         =  −γ̄(v, v) − γ̄(V, V) + 2 γ̄(V, V)

because the sum over the units v_α rebuilds the integral over x ∈ V, so that we
end up with the simple formula

σ²(v|V) = γ̄(V, V) − γ̄(v, v)
The theoretical determination of the dispersion variance reduces to the com-
putation of the variogram integrals γ̄(v, v) and γ̄(V, V) associated to the two
supports v and V.

Krige's relation
Starting from the formula of the dispersion variance, first we see that for the
case of the point values (denoted by a dot) the dispersion formula reduces to one
term

σ²(·|V) = γ̄(V, V)

Figure 8.3: A domain D partitioned into volumes V which are themselves partitioned
into smaller volumes v.

Second, we notice that σ²(v|V) is the difference between the dispersion vari-
ances of point values in V and in v

σ²(v|V) = σ²(·|V) − σ²(·|v)

Third, it becomes apparent that the dispersion variance of point values in V
can be decomposed into

σ²(·|V) = σ²(·|v) + σ²(v|V)

This decomposition can be generalized to non point supports. Let D be a
domain partitioned into large volumes V which are themselves partitioned into
small units v as represented on Figure 8.3. Then the relation between the three
supports v, V and D can be expressed theoretically by what is called Krige's
relation

σ²(v|D) = σ²(v|V) + σ²(V|D)

As the dispersion variances are basically differences of variogram averages over
given supports, the sole knowledge of the pointwise theoretical variogram model
makes dispersion variance computations possible for any supports of interest.
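With the formula σ²(v|V) = γ̄(V,V) − γ̄(v,v), Krige's relation becomes a telescoping identity. The sketch below (hypothetical exponential model, nested 1D supports of assumed lengths) estimates the γ̄ terms by Monte Carlo and checks the relation:

```python
# Krige's relation sigma^2(v|D) = sigma^2(v|V) + sigma^2(V|D), using
# Monte Carlo estimates of gamma_bar(L, L) for nested interval supports.
import numpy as np

rng = np.random.default_rng(4)

def gamma(h, a=1.0, b=1.0):
    return b * (1.0 - np.exp(-np.abs(h) / a))

def gamma_bar(length, n=100_000):
    # gamma_bar(L, L) for an interval of the given length
    x = rng.uniform(0.0, length, size=n)
    xp = rng.uniform(0.0, length, size=n)
    return gamma(x - xp).mean()

gb_v, gb_V, gb_D = gamma_bar(1.0), gamma_bar(4.0), gamma_bar(16.0)
disp_v_V = gb_V - gb_v      # sigma^2(v|V)
disp_V_D = gb_D - gb_V      # sigma^2(V|D)
disp_v_D = gb_D - gb_v      # sigma^2(v|D)
print(abs(disp_v_D - (disp_v_V + disp_V_D)) < 1e-12)   # Krige's relation
```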

Change of support effect


In the early days of ore reserve estimation, mining engineers used a method
called the polygon method. It consists in defining a polygon around each sample,
representing the area of influence of the sample value, in such a way that the
ore deposit is partitioned by the polygons. The reserves are estimated as a linear
combination of the grades with the corresponding areas of influence. In the


Figure 8.4: The distribution of block values is narrower than the distribution of values
at the sample points.

polygon method each sample value is extended to its area of influence, neglecting
the fact that the samples are obtained from pointwise measurements while the
polygons represent a much larger support.

In the case of a square grid the polygons are square blocks v which partition
the exploration area. The value at each grid node is extended to each area of
influence v. The method implies that the distribution of average values of the
blocks is the same as the distribution of the values at the sample points. From
Krige's relation we know that this cannot be true: the distribution of the values
for a support v is narrower than the distribution of point values (as represented
on Figure 8.4) because the variance σ²(·|v) of the points in v generally is not
negligible.

In mining, the cut-off value defines a grade above which a mining block should
be sent to production. Mining engineers are interested in the proportion of the
values above the cut-off value which represent the part of a geological body which
is of economical interest. If the cut-off grade is a value substantially above the
mean, the polygon method will lead to a systematic overestimation of the ore
reserves as shown on Figure 8.5. To avoid systematic over- or underestimation
the support effect needs to be taken into account.

Figure 8.5: The proportion of sample values above the cut-off value is greater than the
proportion of block values: the polygon method leads to a systematic overestimation
in this case.

Application: acoustic data

A series of 659 measurements of equivalent noise pressure levels Leq (expressed
in dBA) averaged over 20 seconds were performed on a worker operating with
a circular saw. The problem is to evaluate whether a shorter or larger time
integration interval would be of interest.
The Leq(t) are not an additive variable and need to be transformed back to
the acoustic power Veq(t). The average acoustic power Veq(t) is defined as the
integral over the time interval Δt of the instant acoustic pressures p(x) divided
by the reference acoustic pressure p_0 squared

Veq(t)  =  (10⁻⁹/Δt) ∫_{t−Δt/2}^{t+Δt/2} ( p(x)/p_0 )² dx

        =  exp( α Leq(t) − β )

where α = (ln 10)/10 and β = ln 10⁹.
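The back-and-forth transform can be sketched directly from these definitions (the numerical example below is illustrative, not a measurement from the study):

```python
# Convert between the non-additive Leq (in dBA) and the additive acoustic
# power Veq = exp(alpha*Leq - beta), on which averaging is legitimate.
import math

alpha = math.log(10.0) / 10.0
beta = math.log(1e9)

def leq_to_veq(leq_db):
    return math.exp(alpha * leq_db - beta)

def veq_to_leq(veq):
    return (math.log(veq) + beta) / alpha

# On this scale 90 dBA corresponds to an acoustic power of 1:
print(round(leq_to_veq(90.0), 6))           # 1.0
# Averaging must be done on the power scale, not on the dB scale:
mean_power = (leq_to_veq(80.0) + leq_to_veq(100.0)) / 2.0
print(round(veq_to_leq(mean_power), 2))     # 97.03, not the naive 90
```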


The measurements were taken continuously during a period of 3 hours and
40 minutes. The Figure 8.6 shows with a continuous line the time series (in
dBA) of the equivalent acoustic pressure levels Leq integrated over intervals of 20
seconds. The maximal noise levels Lmax within these time intervals are plotted
with a dotted line (when they are above 107 dB). We observe in passing that the
averaging over 20 seconds has enormously reduced the variation.
The theoretical variogram of the acoustic power was modeled with a pointwise


Figure 8.6: Measurements of maximal noise level Lmax (dots) and average noise Leq
(plain) on time intervals of 20 seconds during 3 hours and 40 minutes. They represent
the exposure of a worker to the noise of a circular saw.

exponential model

γ(h) = b ( 1 − e^{−|h|/a} )    with a, b > 0

The sill is b = 0.42 and the range parameter is a = 2.4. It corresponds to a


practical range of 3a = 7.2 time units, i.e. 2.4 minutes, which is the time of a
typical repetitive working operation.
The support of the acoustic power, i.e. the integration time, has an impact
on the shape of the theoretical variogram: it alters the behavior at the origin,
reduces the value of the sill and increases the range. The exponential variogram
regularized over time intervals Δt is defined by the formula ([93], p. 84)

γ_Δt(h) = (b a²/Δt²) ( 2e^{−Δt/a} − 2 + 2h/a + e^{−h/a} (2 − e^{−Δt/a}) − e^{(h−Δt)/a} )
          for 0 ≤ h ≤ Δt,

γ_Δt(h) = (b a²/Δt²) ( 2Δt/a + e^{−Δt/a} − e^{Δt/a} + (e^{−Δt/a} + e^{Δt/a} − 2) (1 − e^{−h/a}) )
          for h > Δt.
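The regularized model can be sketched in code. Note that the second branch is a reconstruction from this scan (chosen so that the curve is continuous at h = Δt, vanishes at the origin and tends to a sill below b), so treat the exact form as an assumption:

```python
# Exponential variogram regularized over time intervals of length dt,
# with the parameters of the acoustic example (b = 0.42, a = 2.4).
import math

def gamma_reg(h, a=2.4, b=0.42, dt=1.0):
    c = b * a * a / (dt * dt)
    if h <= dt:
        return c * (2 * math.exp(-dt / a) - 2 + 2 * h / a
                    + math.exp(-h / a) * (2 - math.exp(-dt / a))
                    - math.exp((h - dt) / a))
    return c * (2 * dt / a + math.exp(-dt / a) - math.exp(dt / a)
                + (math.exp(-dt / a) + math.exp(dt / a) - 2)
                * (1 - math.exp(-h / a)))

# Sanity checks: zero at the origin, continuous at h = dt, and a sill
# lower than the pointwise sill b = 0.42 (the support effect):
print(abs(gamma_reg(0.0)) < 1e-12)
print(abs(gamma_reg(1.0) - gamma_reg(1.0 + 1e-9)) < 1e-6)
sill_reg = gamma_reg(1e6)
print(0.0 < sill_reg < 0.42)
```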

The Figure 8.7 shows the experimental variogram together with the exponen-
tial model regularized over time lags of 20 seconds, 1 and 5 minutes illustrating

Figure 8.7: Experimental variogram of the acoustic power Veq and a regularized ex-
ponential variogram model for time intervals of Δt = 20 s, 1 mn and 5 mn.

the effect of a modification of the support on the shape of the theoretical vari-
ogram.
Finally a curve of the dispersion variance of the acoustic power as a function
of the integration time Δt is represented on Figure 8.8. The dispersion variance
for an exponential model is calculated with the formula

σ²(Δt|T)  =  γ̄(T, T) − γ̄(Δt, Δt)

          =  F(T) − F(Δt)

where for L = Δt, T

F(L) = b ( 1 − 2a/L + (2a²/L²) (1 − e^{−L/a}) )
As the practical range of the variogram is relatively short (2.4 minutes), it
can be learned from Figure 8.8 that for a time integration support of less than
1/2 hour (90 time units) a small increase of the support leads to a large drop
of the dispersion variance. Conversely it does not seem to make much difference
if the integration is changed from 1 hour to 2 hours. With a short practical range
the essential part of the variability can only be recovered using an integration
time much shorter than 1/2 hour.
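The dispersion-variance curve can be reproduced from F(L) alone (a sketch using the example's parameters; the total duration T is taken as 660 intervals of 20 s):

```python
# Dispersion variance sigma^2(dt|T) = F(T) - F(dt) for the exponential
# model of the acoustic example (b = 0.42, a = 2.4 time units of 20 s).
import math

a, b = 2.4, 0.42

def F(L):
    # average variogram gamma_bar(L, L) of an exponential model
    return b * (1.0 - 2.0 * a / L
                + 2.0 * (a / L) ** 2 * (1.0 - math.exp(-L / a)))

T = 660.0                      # full series: 660 intervals of 20 seconds
for dt in (1.0, 90.0, 360.0):  # 20 s, 1/2 hour, 2 hours
    print(round(F(T) - F(dt), 3))   # 0.364, 0.019, 0.003
```

The numbers illustrate the remark above: most of the variability is already lost once the integration support reaches half an hour.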

Figure 8.8: Curve of dispersion variances σ²(Δt|T) as a function of the integration
support Δt with fixed T.

Comparison of sampling designs

The concepts of estimation and dispersion variance can be used to compare three
sampling designs with n samples:

A - regular grid: the domain V is partitioned into n cubic cells v at the
center of which a sample z(x_α) has been taken;

B - random grid: the n samples are taken at random within the domain V;

C - random stratified grid: the domain V is partitioned into n cubic cells
v inside each of which one sample is taken at random.

For design A, with a regular grid the global estimation variance σ²_EG is com-
puted as

σ²_EG  =  var( Z*_V − Z_V )

       =  E[ ( (1/n) Σ_{α=1}^{n} Z(x_α)  −  (1/n) Σ_{α=1}^{n} Z(v_α) )² ]

       =  E[ ( (1/n) Σ_{α=1}^{n} ( Z(x_α) − Z(v_α) ) )² ]

If we consider that the elementary errors Z(x_α) − Z(v_α) are independent from
one cell to the other

σ²_EG  =  (1/n²) Σ_{α=1}^{n} E[ ( Z(x_α) − Z(v_α) )² ]

       =  (1/n²) Σ_{α=1}^{n} σ²_E(x_α, v_α)

As the points x_α are at the centers x_c of cubes of the same size we have for
design A

σ²_EG = (1/n) σ²_E(x_c, v)
For design B, the samples are supposed to be located at random in the domain
(Poisson points). We shall consider one realization z with random coordinates
X_1, X_2, X_3. The expectation will be taken on the coordinates. The global esti-
mation variance is

s²_EG  =  E_X[ ( z*_V − z_V )² ]

       =  E_X[ ( (1/n) Σ_{α=1}^{n} z(X_1^α, X_2^α, X_3^α) − z(V) )² ]

Assuming elementary errors to be independent (for the random function Z)
we are left with

s²_EG = (1/n²) Σ_{α=1}^{n} E_X[ ( z(X_1^α, X_2^α, X_3^α) − z(V) )² ]

We now write explicitly the expectation over the random locations distributed
with probabilities 1/|V| over the domain

s²_EG  =  (1/n²) Σ_{α=1}^{n} ∫∫∫ p(x_1, x_2, x_3) ( z(x_1, x_2, x_3) − z(V) )² dx_1 dx_2 dx_3

       =  (1/n²) Σ_{α=1}^{n} (1/|V|) ∫∫∫ ( z(x_1, x_2, x_3) − z(V) )² dx_1 dx_2 dx_3

       =  (1/n²) Σ_{α=1}^{n} s²(·|V)

       =  (1/n) s²(·|V)

Generalizing the formula from one realization z to the random function Z
and taking the expectation (over Z), we have for design B

σ²_EG = (1/n) σ²(·|V)

For design C, each sample point is located at random within a cube v_α and
the global estimation variance for one realization z is

s²_EG  =  E_X[ ( z*_V − z_V )² ]

       =  E_X[ ( (1/n) Σ_{α=1}^{n} ( z(X_1^α, X_2^α, X_3^α) − z(v_α) ) )² ]

       =  (1/n) s²(·|v)

Randomizing z to Z and taking the expectation, we have for design C

σ²_EG = (1/n) σ²(·|v)
Comparing the random grid B with the random stratified grid C we know
from Krige's relation that

σ²(·|v) ≤ σ²(·|V)

and thus design C is a better strategy than design B.


To compare the random stratified grid C with the regular grid A we have to
compare the extension variance of the central point in the cube

/7~(Xe, v) = 2 ;:Y(xe, v) - ;:y( v, v)


with the dispersion variance of a point in the cube

/72('lv) = ;:Y(v, v)

It turns out (see [50]) that the former is usually lower than the latter. The
regular grid is superior to the random stratified grid from the point of view of
the global estimation variance, as the samples cover the region evenly.
However, for computing the experimental variogram, an advantage can be
seen in using unequally spaced data: they provide more information about
small-scale variability than evenly placed samples. This helps in modeling the
variogram near the origin.
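Krige's relation can be illustrated numerically. The sketch below (Python; the spherical variogram model, the segment lengths and the grid resolution are assumptions of this illustration, not taken from the text) approximates the dispersion variances σ²(·|v) = γ̄(v, v) and σ²(·|V) = γ̄(V, V) on 1D segments and checks the inequality:

```python
import numpy as np

def gamma_sph(h, a=5.0, sill=1.0):
    """Spherical variogram (assumed model for this sketch)."""
    h = np.minimum(np.abs(h), a)
    return sill * (1.5 * h / a - 0.5 * (h / a) ** 3)

def dispersion_variance(length, n=200):
    """sigma^2(.|V) = gamma_bar(V, V), approximated by averaging the
    variogram over a regular grid of n points on a 1D segment."""
    x = np.linspace(0.0, length, n)
    return gamma_sph(x[:, None] - x[None, :]).mean()

s2_point_in_v = dispersion_variance(1.0)   # small cell v
s2_point_in_V = dispersion_variance(10.0)  # large domain V containing v
```

With these settings the small cell has the lower dispersion variance, as Krige's relation requires.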
9 Measures and Plots of Dispersion

We include an account of the geometrical meaning of two measures of dis-
persion: the variance and the selectivity. This is based on MATHERON [123][124]
and LANTUEJOUL [100][101]. The context is mining economics, but a parallel to
analogous concerns in the evaluation of time series from environmental monitoring
is drawn at the end of the chapter.

Tonnage, recovered quantity, investment and profit


For a mining deposit the concentration of a precious metal is denoted Z, which
is a positive variable. We call tonnage T(z) the proportion of the deposit (con-
sidered of unit volume) for which the grade is above a given value z. This value
z represents in this chapter a fixed value, the cut-off value, i.e. the grade above
which it is economically interesting to remove a volume from the deposit and to
let it undergo chemical and mechanical processing for extracting the metal.
Let F(z) be the distribution function of the grades; then the tonnage T(z) is
expressed by the integral

T(z) = ∫_z^{+∞} dF(u)

As F(z) is an increasing function, T(z) is a decreasing function. The tonnage
is actually complementary to the distribution function

T(z) = P(Z ≥ z) = 1 − F(z)

The quantity Q(z) of metal recovered above a cut-off grade z is given by the
integral

Q(z) = ∫_z^{+∞} u dF(u)

The function Q(z) is decreasing. The value of Q(z) for z = 0 is the mean,
because the grade is a positive variable

Q(0) = ∫_0^{+∞} u dF(u) = E[Z] = m

The recovered quantity of metal can be written, integrating by parts,

Q(z) = [ −u T(u) ]_z^{+∞} + ∫_z^{+∞} T(u) du

     = z T(z) + ∫_z^{+∞} T(u) du

     = C(z) + B(z)

The recovered quantity is thus split into two terms, which can be interpreted
in the following way in mining economics (assuming that the cut-off grade is well
adjusted to the economic context):

- the share C(z) of metal which reflects the investment, i.e. the part of the
recovered metal that will serve to refund the investment necessary to mine
and process the ore;

- the amount B(z) of metal, which represents conventional profit, i.e. the
leftover of metal once the quantity necessary for refunding the investment
has been subtracted.

These different quantities are all present on the plot of the function T(z) as
shown on Figure 9.1. The conventional investment (assuming that the cut-off
adequately reflects the economic situation) is a rectangular surface

C(z) = z T(z)

while the conventional profit

B(z) = ∫_z^{+∞} T(u) du

is the surface under T(z) on the right of the cut-off z.


The function B(z) is convex and decreasing. The integral of B(z) from z = 0 is

∫_0^{+∞} B(u) du = (1/2) (m² + σ²)

where m is the mean and σ² is the variance of Z. If we subtract from the surface
under B(z) the area corresponding to half the square m² we obtain a surface
element whose value is half of the variance, as shown on Figure 9.2.
Note that Q(z) divided by T(z) yields the average of the values above cut-off

m(z) = E[ Z | Z ≥ z ] = Q(z) / T(z)
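These quantities can be computed directly from a sample of grades. A minimal sketch (Python; the grades and the cut-off value are hypothetical, chosen only for illustration):

```python
import numpy as np

grades = np.array([0.3, 0.5, 0.8, 1.2, 1.5, 2.1, 3.4])  # hypothetical grades
z = 1.0                                                  # cut-off grade

T = np.mean(grades >= z)                         # tonnage T(z)
Q = np.mean(np.where(grades >= z, grades, 0.0))  # recovered metal Q(z)
C = z * T                                        # conventional investment C(z)
B = Q - C                                        # conventional profit B(z)
m_z = Q / T                                      # mean grade above cut-off
```

At z = 0 the recovered quantity Q(0) reduces to the mean of the grades, as in the text.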



Figure 9.1: The function T(z) of the extracted tonnage as a function of cut-off grade.
The shaded surface under T(z) represents the quantity of metal Q(z), which is com-
posed of the conventional investment C(z) and the conventional profit B(z).

Selectivity

The selectivity S of a distribution F is defined as

S = (1/2) E[ |Z − Z′| ]

where Z and Z′ are two independent random variables with the same distribution
F. There is an analogy with the following formula for computing the variance
under the same conditions

σ² = (1/2) E[ (Z − Z′)² ]

The selectivity and the variance are both measures of dispersion, but the
selectivity is more robust than the variance.
The selectivity divided by the mean is known in econometrics as the Gini
coefficient (see e.g. KENDALL & STUART [94])

Gini = S / m

and represents an alternative to the coefficient of variation σ/m. As S ≤ m, the
Gini coefficient is lower than one

0 ≤ Gini ≤ 1
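The selectivity of a finite sample can be computed by averaging |Z − Z′| over all pairs. A sketch (Python; the sample values are hypothetical) that also checks the bound S ≤ σ/√3 stated below:

```python
import numpy as np

def selectivity(values):
    """S = (1/2) E|Z - Z'| with Z, Z' independent draws from the same
    distribution: here the average absolute difference over all ordered
    pairs of the sample (self-pairs included, as the definition allows)."""
    v = np.asarray(values, dtype=float)
    return 0.5 * np.mean(np.abs(v[:, None] - v[None, :]))

values = np.array([0.3, 0.5, 0.8, 1.2, 1.5, 2.1, 3.4])  # hypothetical grades
S = selectivity(values)
gini = S / values.mean()     # Gini coefficient S/m
sigma = values.std()         # standard deviation of the sample
```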

Figure 9.2: Function B(z) of the conventional profit. The shaded surface under B(z)
represents half of the variance σ².

In geostatistics the Gini coefficient is usually called the selectivity index.


The following inequality between the selectivity and the standard deviation
can be shown

S ≤ σ / √3
Equality is obtained for the uniform distribution. This provides us with a
uniformity coefficient

U = √3 S / σ

where 0 ≤ U ≤ 1.
For a Gaussian distribution with a standard deviation σ the selectivity is

S_Gauss = σ / √π

which yields a value near to the limit case of the uniform distribution.
For a lognormally distributed variable Y = m exp(σZ − σ²/2) we have the
selectivity

S_lognorm = m ( 2 G(σ/√2) − 1 )

where G(z) is the standardized Gaussian distribution function.

Recovered quantity as a function of tonnage


The function Q(z) of metal with respect to cut-off can be written

Q(z) = z T(z) + ∫_z^{+∞} T(u) du = ∫_0^z T(z) du + ∫_z^{+∞} T(u) du

As Q(z) is decreasing, it can be expressed as the integral of the minimum of
the two values T(z) and T(u) evaluated over all grades u

Q(z) = ∫_0^{+∞} min( T(z), T(u) ) du

The function Q(T) of metal depending on the tonnage T is defined as

Q(T) = ∫_0^{+∞} min( T, T(u) ) du

for 0 ≤ T ≤ 1. This function is equivalent, up to a convention in its definition,
to the Lorenz curve, which is used in economics to represent the concentration
of wealth as a function of the proportion of the population of a given country.
A typical Q(T) curve is shown on Figure 9.3. The area between the curve Q(T)
and the diagonal mT (between the origin and the point Q(1) = m) is worth half
of the selectivity S.
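The curve Q(T) and the area identity can be checked numerically. The sketch below (Python; the grades, the grid resolutions and the hand-rolled trapezoidal rule are assumptions of this illustration) builds the tonnage curve T(u), evaluates Q(T) and compares the area between Q(T) and the diagonal with S/2:

```python
import numpy as np

def trapz(y, x):
    """Simple trapezoidal rule, kept explicit to stay self-contained."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

values = np.array([0.3, 0.5, 0.8, 1.2, 1.5, 2.1, 3.4])  # hypothetical grades
m = values.mean()

u = np.linspace(0.0, values.max(), 4001)             # grade axis
T_u = (values[None, :] >= u[:, None]).mean(axis=1)   # tonnage curve T(u)

def Q_of_T(T):
    """Q(T) = integral over u of min(T, T(u))."""
    return trapz(np.minimum(T, T_u), u)

T_grid = np.linspace(0.0, 1.0, 501)
Q_grid = np.array([Q_of_T(t) for t in T_grid])
area = trapz(Q_grid - m * T_grid, T_grid)            # should be close to S/2

S = 0.5 * np.mean(np.abs(values[:, None] - values[None, :]))
```

Up to discretization error, Q(1) equals the mean m and the area equals half the selectivity.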
It can indeed be shown that

∫_0^1 ( Q(T) − mT ) dT = (1/2) ∫_0^{+∞} B(z) dF(z)

                       = (1/2) ∫_0^{+∞} F(z) (1 − F(z)) dz

From this it can be seen that the selectivity is zero when the distribution F
is concentrated at one point (a Dirac measure). Finally we have

∫_0^1 ( Q(T) − mT ) dT = (1/2) ∫_0^{+∞} ∫_0^{+∞} |u − z| dF(u) dF(z)

                       = (1/2) S

by definition.

Figure 9.3: The function Q(T) of the quantity of metal with respect to tonnage. The
shaded surface between Q(T) and the diagonal line corresponds to half the value of
the selectivity S.

Time series in environmental monitoring

When studying time series in environmental monitoring, interest is generally
focused on the periods during which a threshold is trespassed, above which a
chemical, a dust concentration or a noise level is thought to be dangerous to
health. Such thresholds, fixed by environmental regulations, are the analogue of
cut-off grades in mining.
In economics the level of profitability is subject to permanent change and it
is appropriate to generate graphs with the values of the economic parameters for
a whole range of values. Similarly, in the environmental sciences the tolerance
levels also evolve as the awareness of dangers to health rises, and it is suitable
not to restrict computations to a particular set of thresholds.
When time series are evaluated we are interested in knowing how long in
total a tolerance level has been trespassed, as shown on Figure 9.4. The
tonnage T(z) of mining becomes the fraction of time during which the level was
trespassed. On a corresponding T(z) curve, like on Figure 9.1, the fraction of the
total time during which Z(t) is above the cut-off can be read for any tolerance
level z of interest.
A curve of Q(z) (not shown) represents the total quantity of nuisance for the
fraction of time when Z(t) is above the tolerance level z. The curve Q(T), as
on Figure 9.3, represents the quantity of nuisance as a function of the fraction of
time T, which itself depends on the tolerance level.

Figure 9.4: Environmental time series Z(t) with a cut-off value z: the shaded part
corresponds to the quantity of nuisance Q(z) = B(z) + C(z) during the time T(z)
when the tolerance level is trespassed.

The selectivity S is an important measure of dispersion to compare distribu-
tions with the same mean:

- either as an indicator of change in the distribution of a variable for differ-
ent supports on the same spatial or temporal domain (the mean does not
change when changing support);

- or as a coefficient to rank the distributions of different sets of measurements
with a similar mean.

It is not clear what meaning the curve B(z) can have in environmental prob-
lems, except for the fact that its graph contains a geometric representation of
(half of) the variance σ² when plotted together with the line joining the value of
the mean m on both the abscissa and the ordinate, as suggested on Figure 9.2.
10 Kriging the Mean

The mean value of samples from a geographical space can be computed either
using the arithmetic mean, or with a weighted average integrating the knowledge
of the spatial correlation of the samples. The two approaches are compared.

Mean value over a region

When samples have been taken at irregular spacing, like on Figure 10.1, a
quantity of interest is the value of the mean m.

Figure 10.1: A domain with irregularly spaced sample points.

A first approach for estimating m by m* is to use the arithmetic mean

m* = (1/n) Σ_{α=1}^n Z(x_α)

However the samples in a spatial problem cannot be considered independent.


The covariance function associated to the random function used to model the
data will in general not be a nugget-effect model. This means that we shall
assume that the random variables at two different locations in space are corre-
lated.
A second approach to estimate m* is to use a weighted average
n
m* = E WO/ Z(XO/)
0/=1

with weights w_α.

How best to choose the weights w_α? We need to specify the problem.

No systematic bias

We need to assume that the mean exists at all points of the region

E[ Z(x) ] = m    for all x ∈ D

We want the estimation error, the difference between the estimated value m*
and the true value m, to be zero on average

E[ m* − m ] = 0

i.e. we do not wish to have a systematic bias in our estimation procedure.
This can be achieved by constraining the weights w_α to sum up to one

Σ_{α=1}^n w_α = 1

Indeed, if we replace m* by the weighted average, the unbiasedness condition
is fulfilled

E[ m* − m ] = E[ Σ_{α=1}^n w_α Z(x_α) − m ]

            = Σ_{α=1}^n w_α E[ Z(x_α) ] − m

            = m Σ_{α=1}^n w_α − m

            = 0

Variance of the estimation error

We need to assume that Z(x) is second-order stationary and thus has a covariance
function C(h) which describes the correlation between any pair of points x and
x+h

C(h) = E[ Z(x) · Z(x+h) ] − m²

The variance of the estimation error in our unbiased procedure is simply the
average of the squared error

var(m* − m) = E[ (m* − m)² ] − ( E[ m* − m ] )²

where the second term is zero by the unbiasedness condition.

The variance of the estimation error can be expressed in terms of the covari-
ance function

var(m* − m) = E[ m*² − 2 m m* + m² ]

            = Σ_{α=1}^n Σ_{β=1}^n w_α w_β E[ Z(x_α) Z(x_β) ] − 2m Σ_{α=1}^n w_α E[ Z(x_α) ] + m²

            = Σ_{α=1}^n Σ_{β=1}^n w_α w_β C(x_α − x_β)

The estimation variance is the sum of the cross-products of the weights as-
signed to the sample points. Each cross-product is weighted by the covariance
between the corresponding sample points.

Minimal estimation variance

The criterion defining the "best" weights is that the procedure should reduce
as much as possible the variance of the estimation error, while respecting the
unbiasedness condition. We are thus looking for the

minimum of var(m* − m)    subject to Σ_{α=1}^n w_α = 1

The minimum of a positive quadratic function is found by setting the first-
order partial derivatives to zero. The condition on the weights is introduced into
the problem by the method of Lagrange (see, for example, STRANG [184], p. 96).
An objective function φ is defined, consisting of the quadratic function plus
a term containing a Lagrange multiplier μ. The new unknown μ is built into
the function in such a way that the constraint on the weights is recovered when
setting the partial derivative with respect to μ to zero

φ(w_α, μ) = var(m* − m) − 2μ ( Σ_{α=1}^n w_α − 1 )

Kriging equations

To solve the optimization problem the partial derivatives of the objective function
φ(w_α, μ) are set to zero

∂φ(w_α, μ)/∂w_α = 0    for α = 1, …, n

∂φ(w_α, μ)/∂μ = 0

This yields a linear system of n+1 equations. The solution of this system
provides the optimal weights w_α^KM for the estimation of the mean by a weighted
average

Σ_{β=1}^n w_β^KM C(x_α − x_β) − μ_KM = 0    for α = 1, …, n

Σ_{β=1}^n w_β^KM = 1

This is the system for the kriging of the mean (KM).
The minimal estimation variance σ²_KM is computed by using the equations for
the optimal weights

Σ_{β=1}^n w_β^KM C(x_α − x_β) = μ_KM    for α = 1, …, n

in the expression of the estimation variance

σ²_KM = var(m* − m)

      = Σ_{α=1}^n Σ_{β=1}^n w_α^KM w_β^KM C(x_α − x_β)

      = Σ_{α=1}^n w_α^KM μ_KM = μ_KM Σ_{α=1}^n w_α^KM

      = μ_KM

The variance of the kriging of the mean is thus given by the Lagrange multi-
plier μ_KM.
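The kriging-of-the-mean system is a small linear system that can be solved directly. A sketch (Python; the exponential covariance model and the 1D sample locations are assumptions of this illustration) solves it and checks that the weights sum to one, that the minimized variance equals the Lagrange multiplier, and that the isolated sample receives the largest weight:

```python
import numpy as np

def cov(h, a=3.0):
    """Exponential covariance with sill 1 (assumed model for this sketch)."""
    return np.exp(-np.abs(h) / a)

x = np.array([0.0, 1.0, 2.5, 7.0])   # hypothetical 1D sample locations
n = len(x)
C = cov(x[:, None] - x[None, :])

# System: sum_b w_b C(x_a - x_b) - mu_KM = 0 for each a, and sum_b w_b = 1
A = np.zeros((n + 1, n + 1))
A[:n, :n] = C
A[:n, n] = -1.0
A[n, :n] = 1.0
b = np.zeros(n + 1)
b[n] = 1.0

sol = np.linalg.solve(A, b)
w_KM, mu_KM = sol[:n], sol[n]
sigma2_KM = mu_KM    # the kriging variance equals the Lagrange multiplier
```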

Case of the nugget-effect model

Exceptional situations can occur in which the nugget-effect model, which as-
sumes no correlation between samples at different spatial locations, is appropriate

C_nug(x_α − x_β) = σ²  if x_α = x_β
                 = 0   if x_α ≠ x_β

With this model the system for the kriging of the mean simplifies to

w_α^KM σ² = μ_KM    for α = 1, …, n

Σ_{β=1}^n w_β^KM = 1

As all weights are equal and sum up to one we have

w_α^KM = 1/n

and the estimator of the kriging of the mean is equivalent to the arithmetic mean

m* = Σ_{α=1}^n w_α^KM Z(x_α) = (1/n) Σ_{α=1}^n Z(x_α)

The estimation variance associated with m* is

σ²_KM = μ_KM = σ²/n

Local estimation of the mean

The difference between the kriging of the mean and a simple computation of the
arithmetic mean of the data is that the former takes into account the spatial
arrangement of the samples. Samples at the border are weighted in a different
manner than samples in the middle of the domain; clustered samples do not have
the same influence as more isolated ones.
The possibility of estimating the mean with reference to the spatial disposi-
tion of the samples opens the gate to a more flexible class of models with respect
to stationarity. We may allow (more on a physical than on a mathematical basis)
for the concept of a locally second-order stationary random function whose first
two moments are approximately stationary at any given position of a neighbor-
hood (a disk or sphere of fixed size, for example) moved around in the domain.
Such random functions, which are not necessarily second-order stationary at the
scale of the whole domain, have a smooth mean function m(x), which can be
evaluated locally as a constant by kriging with data within a moving neighbor-
hood. In other words the drift m(x) is considered approximately constant for
distances |h| smaller than the diameter of the moving neighborhood.
The concept of a moving neighborhood makes a global reference value like
the mean m seem undesirable. The framework needed to handle the larger class
of locally second-order stationary random functions has in fact already been
presented: it is the framework of intrinsic random functions with a bounded
variogram. The variogram, as a first difference operator, filters constants and
thus local means. Furthermore, if the experimental variogram has a sill γ(∞)
(which in practice is generally not equal to the variance), a covariance function
fitted to it through a theoretical variogram by the formula

γ(h) = γ(∞) − C(h)    with γ(∞) = C(0)

can be associated with a locally second-order stationary random function, but not
with a random function with constant mean and variance over the whole domain.
11 Ordinary Kriging

Ordinary kriging is the most widely used kriging method. It serves to estimate
a value at a point of a region for which a variogram is known, using data in the
neighborhood of the estimation location. Ordinary kriging can also be used to
estimate a block value. With local second-order stationarity, ordinary kriging
implicitly evaluates the mean in a moving neighborhood. To see this, first a
kriging estimate of the local mean is set up, then a simple kriging estimator
using this kriged mean is examined.

Ordinary kriging problem

We wish to estimate a value at x₀, as represented on Figure 11.1, using the data
values from the n neighboring sample points x_α and combining them linearly
with weights w_α

Z*(x₀) = Σ_{α=1}^n w_α Z(x_α)

Figure 11.1: A domain with irregularly spaced sample points (black dots) and a
location of interest x₀.

Obviously we have to constrain the weights to sum up to one, because in the
extreme case when all data values are equal to a constant, the estimated value
should also be equal to this constant.
We assume that the data are part of a realization of an intrinsic random
function with a variogram γ(h).

The unbiasedness is warranted with unit sum weights

E[ Z*(x₀) − Z(x₀) ] = E[ Σ_{α=1}^n w_α Z(x_α) − Z(x₀) · Σ_{α=1}^n w_α ]

                    = Σ_{α=1}^n w_α E[ Z(x_α) − Z(x₀) ]

                    = 0

because the expectations of the increments are zero.


The estimation variance O"fu = var( Z*(xo) - Z(xo)) is the variance of the linear
combination
n
Z*(xo) - Z(xo) = LW", Z(x",) - 1 . Z(xo) = LW", Z(x",)
",=1 ",=0

with a weight Wo equal to -1 and

LW",=O
",=0

Thus the condition that the weights numbered from 1 to n sum up to one also
implies that the use of the variogram is authorized in the computation of the
variance of the estimation error.

COMMENT 11.1 The variogram is authorized for ordinary kriging, but not for
simple kriging, because the latter does not include a constraint on the weights.

The estimation variance is

σ²_E = E[ ( Z*(x₀) − Z(x₀) )² ]

     = −γ(x₀ − x₀) − Σ_{α=1}^n Σ_{β=1}^n w_α w_β γ(x_α − x_β) + 2 Σ_{α=1}^n w_α γ(x_α − x₀)

By minimizing the estimation variance with the constraint on the weights,
we obtain the ordinary kriging system (OK)

⎡ γ(x_1−x_1) … γ(x_1−x_n)  1 ⎤ ⎡ w_1^OK ⎤   ⎡ γ(x_1−x_0) ⎤
⎢     ⋮            ⋮        ⋮ ⎥ ⎢   ⋮    ⎥ = ⎢     ⋮      ⎥
⎢ γ(x_n−x_1) … γ(x_n−x_n)  1 ⎥ ⎢ w_n^OK ⎥   ⎢ γ(x_n−x_0) ⎥
⎣     1      …     1        0 ⎦ ⎣  μ_OK  ⎦   ⎣     1      ⎦

where the w_α^OK are the weights to be assigned to the data values and μ_OK is the
Lagrange parameter. The left hand side of the system describes the dissimilarities
between the data points, while the right hand side shows the dissimilarities between
each data point and the estimation point x₀.

Figure 11.2: A domain with irregularly spaced sample points (black dots) and a block
v₀ for which a value should be estimated.

Performing the matrix multiplication, the ordinary kriging system can be
rewritten in the form

Σ_{β=1}^n w_β^OK γ(x_α − x_β) + μ_OK = γ(x_α − x₀)    for α = 1, …, n

Σ_{β=1}^n w_β^OK = 1

The estimation variance of ordinary kriging is

σ²_OK = μ_OK − γ(x₀ − x₀) + Σ_{α=1}^n w_α^OK γ(x_α − x₀)
Ordinary kriging is an exact interpolator in the sense that if x₀ is identical
with a data location then the estimated value is identical with the data value at
that point

Z*(x₀) = Z(x_α)    if x₀ = x_α

This can easily be seen. When x₀ is one of the sample points, the right hand
side of the kriging system is equal to one column of the left hand side matrix. A
weight vector w with a weight of one for that column and all other weights
(including μ_OK) equal to zero is a solution of the system. As the left hand matrix
is not singular, this is the only solution.
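The ordinary kriging system can likewise be solved as a small linear system. The sketch below (Python; the spherical variogram and the 1D sample locations are assumptions of this illustration) computes the weights and the kriging variance, and checks the exact-interpolation property:

```python
import numpy as np

def gamma_sph(h, a=2.0, sill=1.0):
    """Spherical variogram (assumed model for this sketch)."""
    h = np.minimum(np.abs(h), a)
    return sill * (1.5 * h / a - 0.5 * (h / a) ** 3)

def ordinary_kriging(x, x0):
    """Solve the OK system; return (weights, mu, kriging variance)."""
    n = len(x)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = gamma_sph(x[:, None] - x[None, :])
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    g0 = gamma_sph(x - x0)
    sol = np.linalg.solve(A, np.append(g0, 1.0))
    w, mu = sol[:n], sol[n]
    var = mu + w @ g0    # sigma^2_OK = mu_OK + sum_a w_a gamma(x_a - x0)
    return w, mu, var

x = np.array([-1.0, 0.5, 2.0, 3.0])   # hypothetical sample locations
w, mu, var = ordinary_kriging(x, 0.0)

# Exact interpolation: estimating at a data location returns that datum
w_exact, _, var_exact = ordinary_kriging(x, 0.5)
```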

Block kriging

Ordinary kriging can be used to estimate a block value instead of a point value,
as suggested by the drawing on Figure 11.2. When estimating a block value from

point values by

Z*_{v₀} = Σ_{α=1}^n w_α Z(x_α)

the ordinary kriging system is modified in the following way to krige a block

⎡ γ(x_1−x_1) … γ(x_1−x_n)  1 ⎤ ⎡ w_1^BK ⎤   ⎡ γ̄(x_1, v_0) ⎤
⎢     ⋮            ⋮        ⋮ ⎥ ⎢   ⋮    ⎥ = ⎢     ⋮       ⎥
⎢ γ(x_n−x_1) … γ(x_n−x_n)  1 ⎥ ⎢ w_n^BK ⎥   ⎢ γ̄(x_n, v_0) ⎥
⎣     1      …     1        0 ⎦ ⎣  μ_BK  ⎦   ⎣     1       ⎦

where the right hand side now contains the average variogram γ̄(x_α, v₀) of each
sample point with the block of interest.
The corresponding block kriging variance is

σ²_BK = μ_BK − γ̄(v₀, v₀) + Σ_{α=1}^n w_α^BK γ̄(x_α, v₀)

For a second-order stationary random function the block kriging variance can
be written in terms of covariances

σ²_BK = μ_BK + C̄(v₀, v₀) − Σ_{α=1}^n w_α^BK C̄(x_α, v₀)

Assuming the covariance function falls off to zero (like the spherical or expo-
nential functions), if we let the volume v₀ grow very large, the terms C̄(v₀, v₀)
and C̄(x_α, v₀) vanish, i.e.

σ²_BK → σ²_KM    for |v₀| → ∞

The same is true for the kriging systems and the estimates. Block kriging
tends to be equivalent to the kriging of the mean for large blocks v₀.

Simple kriging with an estimated mean

We already know how to krige the mean using the linear combination

m* = Σ_{α=1}^n w_α^KM Z(x_α)    with Σ_{α=1}^n w_α^KM = 1

Why not insert this estimated mean value into the simple kriging estimator

Z*_SKM(x₀) = m* + Σ_{α=1}^n w_α^SK ( Z(x_α) − m* )

where the w_α^SK are the simple kriging weights? Replacing m* by the corresponding
linear combination gives

Z*_SKM(x₀) = Σ_{α=1}^n w_α^KM Z(x_α) + Σ_{α=1}^n w_α^SK Z(x_α) − Σ_{α=1}^n w_α^SK Σ_{β=1}^n w_β^KM Z(x_β)

           = Σ_{α=1}^n w_α^SK Z(x_α) + Σ_{α=1}^n w_α^KM Z(x_α) − Σ_{α=1}^n w_α^KM Z(x_α) Σ_{β=1}^n w_β^SK

           = Σ_{α=1}^n [ w_α^SK + w_α^KM ( 1 − Σ_{β=1}^n w_β^SK ) ] Z(x_α)

Introducing a weight

w = 1 − Σ_{α=1}^n w_α^SK

called the weight of the mean, we have

Z*_SKM(x₀) = Σ_{α=1}^n [ w_α^SK + w w_α^KM ] Z(x_α) = Σ_{α=1}^n w_α* Z(x_α)

This looks like the estimator used for ordinary kriging.

We check whether the weights w_α* in this linear combination sum up to one:

Σ_{α=1}^n w_α* = Σ_{α=1}^n w_α^SK + w Σ_{α=1}^n w_α^KM = Σ_{α=1}^n w_α^SK + 1 − Σ_{α=1}^n w_α^SK = 1

We examine the possibility that the weights w_α* might be obtained from an
ordinary kriging system

Σ_{β=1}^n w_β* C(x_α − x_β) = Σ_{β=1}^n w_β^SK C(x_α − x_β) + w Σ_{β=1}^n w_β^KM C(x_α − x_β)

                            = C(x_α − x₀) + w μ_KM

Choosing to call μ′ the product of the weight of the mean with the variance
of the kriging of the mean, μ′ = w μ_KM, and putting the equations for the w_α*
together, we indeed have an ordinary kriging system

Σ_{β=1}^n w_β* C(x_α − x_β) = C(x_α − x₀) + μ′    for α = 1, …, n

Σ_{β=1}^n w_β* = 1

which shows that ordinary kriging is identical with simple kriging based on the
estimated mean: Z*_SKM(x₀) = Z*_OK(x₀).
The ordinary kriging variance has the following decomposition

σ²_OK = σ²_SK + w² σ²_KM

The ordinary kriging variance is the sum of the simple kriging variance (as-
suming a known mean) plus the variance due to the uncertainty about the true
value of the mean. When the weight of the mean is small, the sum of the weights
of simple kriging is close to one and ordinary kriging is close to the simple kriging
solution, provided the variance of the kriged mean is also small.
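This decomposition can be verified numerically. The sketch below (Python; the exponential covariance, the sample locations and the estimation point are assumptions of this illustration) solves the simple kriging, kriging-of-the-mean and ordinary kriging systems and checks σ²_OK = σ²_SK + w² σ²_KM:

```python
import numpy as np

def cov(h, a=3.0):
    return np.exp(-np.abs(h) / a)    # exponential covariance, sill C(0) = 1

x = np.array([0.0, 1.0, 2.5, 7.0])   # hypothetical sample locations
x0 = 1.8                             # estimation point
n = len(x)
C = cov(x[:, None] - x[None, :])
c0 = cov(x - x0)

# Simple kriging (mean assumed known): C w_SK = c0
w_SK = np.linalg.solve(C, c0)
var_SK = 1.0 - w_SK @ c0

# Kriging of the mean: C w_KM - mu_KM = 0 with sum(w_KM) = 1
A = np.zeros((n + 1, n + 1))
A[:n, :n] = C
A[:n, n] = -1.0
A[n, :n] = 1.0
b_KM = np.zeros(n + 1)
b_KM[n] = 1.0
sol = np.linalg.solve(A, b_KM)
w_KM, var_KM = sol[:n], sol[n]      # sigma^2_KM = mu_KM

# Ordinary kriging: C w_OK - mu' = c0 with sum(w_OK) = 1
sol = np.linalg.solve(A, np.append(c0, 1.0))
w_OK, mu_p = sol[:n], sol[n]
var_OK = 1.0 - w_OK @ c0 + mu_p

w_mean = 1.0 - w_SK.sum()           # weight of the mean
```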

Kriging the residual

If we restrict Z(x) to be an (at least locally) second-order stationary random
function, we can establish the following model

Z(x) = m + Y(x)    with E[ Y(x) ] = 0

where m is the mean and Y(x) the residual. The mean is apparently uncorrelated
with the residual

E[ m · Y(x) ] = m E[ Y(x) ] = 0

because it is a deterministic quantity.
because it is a deterministic quantity.
We know how to estimate Z(x) by ordinary kriging and the mean m by a
kriging of the mean. Now we would like to krige Y(x) at some location x₀ with
the same type of weighted average

Y*(x₀) = Σ_{α=1}^n w_α Z(x_α)

The unbiasedness is obtained by using weights summing up to zero (instead
of one)

E[ Y(x₀) − Y*(x₀) ] = E[ Y(x₀) ] − Σ_{α=1}^n w_α E[ Z(x_α) ]

                    = 0 − m Σ_{α=1}^n w_α = 0
The condition on the weights has the effect of removing the mean.
The estimation variance var( Y*(x₀) − Y(x₀) ) is identical to var( Z*(x₀) −
Z(x₀) ), as Y(x) has the same covariance function C(h) as Z(x). The system of
the kriging of the residual (KR) is the same as for ordinary kriging, except for
the condition on the sum of the weights, which is zero instead of one.
It is of interest to note that the ordinary kriging system can be decomposed
into the kriging of the mean and the kriging of the residual:

⎡ C(x_1−x_1) … C(x_1−x_n)  1 ⎤ ⎛ ⎡ w_1^KM ⎤   ⎡ w_1^KR ⎤ ⎞   ⎡ 0 ⎤   ⎡ C(x_1−x_0) ⎤
⎢     ⋮            ⋮        ⋮ ⎥ ⎜ ⎢   ⋮    ⎥ + ⎢   ⋮    ⎥ ⎟ = ⎢ ⋮ ⎥ + ⎢     ⋮      ⎥
⎢ C(x_n−x_1) … C(x_n−x_n)  1 ⎥ ⎜ ⎢ w_n^KM ⎥   ⎢ w_n^KR ⎥ ⎟   ⎢ 0 ⎥   ⎢ C(x_n−x_0) ⎥
⎣     1      …     1        0 ⎦ ⎝ ⎣ −μ_KM  ⎦   ⎣ −μ_KR  ⎦ ⎠   ⎣ 1 ⎦   ⎣     0      ⎦

The weights of ordinary kriging are composed of the weights of the krigings
of the mean and of the residual, and so are the Lagrange multipliers

w_α^OK = w_α^KM + w_α^KR    and    μ_OK = μ_KM + μ_KR

The estimators are thus compatible at any location of the domain

Z*(x₀) = m* + Y*(x₀)    for all x₀ ∈ D

The kriging variances

σ²_KM = μ_KM    and    σ²_KR = C(0) − Σ_{α=1}^n w_α^KR C(x_α − x₀)

however do not add up in an elementary way.
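The weight decomposition can be checked directly: the three systems share the same left hand matrix and differ only in their right hand sides. A sketch (Python; the covariance model and locations are assumptions, and the Lagrange multiplier is carried as −μ in the unknown vector as a sign convention):

```python
import numpy as np

def cov(h, a=3.0):
    return np.exp(-np.abs(h) / a)    # exponential covariance, sill 1

x = np.array([0.0, 1.0, 2.5, 7.0])   # hypothetical sample locations
x0 = 1.8
n = len(x)
C = cov(x[:, None] - x[None, :])
c0 = cov(x - x0)

def constrained_kriging(rhs, total):
    """Solve  C w - mu = rhs  subject to  sum(w) = total."""
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = C
    A[:n, n] = -1.0
    A[n, :n] = 1.0
    sol = np.linalg.solve(A, np.append(rhs, total))
    return sol[:n], sol[n]

w_KM, mu_KM = constrained_kriging(np.zeros(n), 1.0)  # kriging of the mean
w_KR, mu_KR = constrained_kriging(c0, 0.0)           # kriging of the residual
w_OK, mu_OK = constrained_kriging(c0, 1.0)           # ordinary kriging
```

By uniqueness of the solutions, the OK weights equal the sum of the KM and KR weights, and likewise for the multipliers.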

Cross validation

Cross validation is a simple way to compare various assumptions either about
the model (e.g. the type of variogram and its parameters, the size of the kriging
neighborhood) or about the data (e.g. values that do not fit their neighborhood,
like outliers or pointwise anomalies).
In the cross validation procedure each sample value Z(x_α) is removed in turn
from the data set and a value Z*(x_[α]) at that location is estimated using the
n−1 other samples. The square brackets around the index α symbolize the fact
that the estimation is performed at the location x_α excluding the sampled value
Z(x_α). The difference between a data value and the estimated value

Z(x_α) − Z*(x_[α])

gives an indication of how well the data value fits into the neighborhood of the
surrounding data values.
If the average of the cross-validation errors is not far from zero

(1/n) Σ_{α=1}^n ( Z(x_α) − Z*(x_[α]) ) ≅ 0

we can say that there is no apparent bias, while a significant negative (or positive)
average error can represent systematic overestimation (respectively underestima-
tion).
The kriging standard deviation σ_[α] represents the error predicted by the
model when kriging at the location x_α (omitting the sample at that location).
Dividing the cross-validation error by σ_[α] allows comparison of the magnitudes
of the actual and the predicted error

( Z(x_α) − Z*(x_[α]) ) / σ_[α]

If the average of the squared standardized cross-validation errors is about one

(1/n) Σ_{α=1}^n ( ( Z(x_α) − Z*(x_[α]) ) / σ_[α] )² ≅ 1

the actual estimation error is on average equal to the error predicted by the
model. This last quantity gives an idea about the adequacy of the model and of
its parameters.
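Leave-one-out cross validation is straightforward to implement on top of an ordinary kriging solver. A sketch (Python; the covariance model, the locations and the data values are all hypothetical):

```python
import numpy as np

def cov(h, a=3.0):
    return np.exp(-np.abs(h) / a)    # exponential covariance, sill 1

x = np.array([0.0, 1.0, 2.5, 4.0, 7.0])   # hypothetical sample locations
z = np.array([1.2, 1.4, 0.9, 1.1, 1.6])   # hypothetical data values

def ok_estimate(xs, zs, x0):
    """Ordinary kriging estimate and variance at x0 (covariance form)."""
    n = len(xs)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = cov(xs[:, None] - xs[None, :])
    A[:n, n] = -1.0
    A[n, :n] = 1.0
    c0 = cov(xs - x0)
    sol = np.linalg.solve(A, np.append(c0, 1.0))
    w, mu = sol[:n], sol[n]
    return w @ zs, 1.0 - w @ c0 + mu

errors, stds = [], []
for i in range(len(x)):
    mask = np.arange(len(x)) != i          # leave sample i out
    est, var = ok_estimate(x[mask], z[mask], x[i])
    errors.append(z[i] - est)
    stds.append(np.sqrt(var))
errors, stds = np.array(errors), np.array(stds)

mean_error = errors.mean()                   # should be near 0
mean_std_sq = np.mean((errors / stds) ** 2)  # should be near 1
```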
12 Kriging Weights

The behavior of kriging weights in 2D space is discussed with respect to the
geometry of the sample/estimation locations, with isotropy or anisotropy, and in
view of the choice of the type of covariance function. The examples are borrowed
from the thesis of RIVOIRARD [151], which contains many more.

Geometry

The very first aspect to discuss about kriging weights is the geometry of the
sample points and the estimation point. Let us take a lozenge at the corners of
which 4 samples are located. We wish to estimate a value at the center of the
lozenge.

Nugget-effect covariance model

In the case of a nugget-effect model the ordinary kriging weights are shown
on Figure 12.1. All weights are equal to 1/n and do not depend on the geometry.

Figure 12.1: Ordinary kriging weights (25% each) when using a nugget-effect model
with samples located at the corners of a lozenge and the estimation point at its center.
The kriging variance is σ²_OK = 1.25 σ².

With a nugget-effect model, when the estimation point does not coincide with
a data point, the ordinary kriging variance is equal to

σ²_OK = μ_OK + σ² = σ²/n + σ²

It is larger than the variance σ².
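The configuration of Figure 12.1 can be reproduced exactly: with four samples at the corners of a lozenge and a pure nugget effect, the OK weights are 1/4 each and the variance is 1.25 σ². A sketch (Python; the lozenge coordinates are assumed, but any four distinct points give the same result with a nugget-effect model):

```python
import numpy as np

sigma2 = 1.0
# lozenge corners (hypothetical coordinates) and central estimation point
pts = np.array([[0.0, 1.0], [-2.0, 0.0], [2.0, 0.0], [0.0, -1.0]])
x0 = np.array([0.0, 0.0])
n = len(pts)

def gamma_nug(d):
    """Nugget-effect variogram: 0 at distance zero, sigma2 elsewhere."""
    return np.where(d > 0.0, sigma2, 0.0)

d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
A = np.zeros((n + 1, n + 1))
A[:n, :n] = gamma_nug(d)
A[:n, n] = 1.0
A[n, :n] = 1.0
g0 = gamma_nug(np.linalg.norm(pts - x0, axis=1))
sol = np.linalg.solve(A, np.append(g0, 1.0))
w, mu = sol[:n], sol[n]
var_OK = mu + w @ g0    # = sigma2/n + sigma2 = 1.25 sigma2
```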

Figure 12.2: With a spherical model of range a/L = 2 the remote samples get lower
weights (9.4% each, against 40.6% for the two near samples). The ordinary kriging
variance is σ²_OK = 0.84 σ².

Spherical covariance model

Taking a spherical model of range a/L = 2, the samples at the remote corners of
the lozenge get lower weights, as can be seen on Figure 12.2. The associated
ordinary kriging variance is σ²_OK = 0.84 σ².

Gaussian covariance model

Using a Gaussian covariance model with range parameter a/L = 1.5, the remote
samples have almost no influence on the estimated value at the center of the
lozenge, as shown on Figure 12.3.

Figure 12.3: Using a Gaussian model with a range parameter a/L = 1.5 the remote
samples (0.2% each) have very little influence on the estimated value; the two near
samples receive 49.8% each. The ordinary kriging variance is σ²_OK = 0.30 σ².

The random function associated with the Gaussian model is an analytic func-
tion, which implies that the data are assumed to stem from an infinitely differ-
entiable regionalized variable. Such a model is generally not realistic.

Figure 12.4: Comparing a configuration of four samples at the corners of a square
using an isotropic model (on the left: 25% each) and an anisotropic model (on the
right: 32.4% for the samples in the longer-range horizontal direction, 17.6% for the
samples in the shorter-range vertical direction).

Geometric anisotropy

On Figure 12.4 a configuration of four samples at the corners of a square with a
central estimation point is used, taking alternately an isotropic (on the left) and
an anisotropic model (on the right).
On the left, the isotropic model generates equal weights with this configu-
ration, where all sample points are symmetrically located with respect to the
estimation point.
On the right, a spherical model with a range of a/L = 1.5 in the horizon-
tal direction and a range of a/L = .75 in the vertical direction is used as the
anisotropic model. The weights are weaker in the direction with the shorter
range, and this matches intuition.

Relative position of samples

This experiment compares the relative position of samples on a circle of radius
L around the estimation point, using an isotropic spherical model with a range
a/L = 3.
On the left of Figure 12.5 three samples are arranged in a symmetric way
around the estimation point. The weights are identical (33.3% each) and the
ordinary kriging variance is σ²_OK = 0.45 σ². On the right, the two top samples
have been moved down around the circle. The weights of these two samples
increase (to 37.1% each, against 25.9% for the bottom sample) because of the
gap created at the top of the circle. The ordinary kriging variance is higher
than in the symmetric configuration, with σ²_OK = 0.48 σ².

Figure 12.5: Three samples on a circle around the estimation location (on the left
σ²_OK = 0.45 σ² and on the right σ²_OK = 0.48 σ²).

On the left of Figure 12.6 the two top samples have been set very near to
each other (25.7% each) and their total weight is slightly higher than the weight
of the bottom sample (48.7%). The ordinary kriging variance is σ²_OK = 0.526 σ².
On the right of Figure 12.6 the two upper points have actually been merged into
a single sample. Symmetry is reestablished: the top and bottom sample locations
each share half of the weight. Merging the two upper samples into one top sample
has only slightly increased the ordinary kriging variance, to σ²_OK = 0.537 σ².

Figure 12.6: On the left, two samples have been set near to each other at the top
(σ²_OK = 0.526 σ²). On the right, the two upper samples have been merged into one
single top sample (σ²_OK = 0.537 σ²).

Screen effect

A remarkable feature of kriging, which makes it different from other interpolators,
is that a sample can screen off other samples located behind it with respect to
the estimation location. An example of this phenomenon is given on Figure 12.7,
showing two 1D sampling configurations. A spherical model of range parameter
a/L = 2 was used, where L is the distance from the point A to the estimation
location.
At the top of Figure 12.7, two samples A and B are located at different
distances from the estimation point and get different ordinary kriging weights of
65.6% and 34.4% according to their proximity to that point. The kriging variance
is σ²_OK = 1.14 σ².
At the bottom of Figure 12.7, a third sample C has been added to the config-
uration: the weight of B drops down to 2.7% and almost all the weight is distributed
fairly equally between A and C, which are at the same distance from the estima-
tion point. The ordinary kriging variance is brought down to σ²_OK = 0.87 σ².

Figure 12.7: At the top, we have two samples at different distances from the esti-
mation point (σ²_OK = 1.14 σ²). At the bottom, a third sample has been added to the
configuration (σ²_OK = 0.87 σ²).

An interpolator based on weights built on the inverse of the distance of a
sample to the estimation point does not show screen effects. The weights are
shaped without taking any account of the varying number of samples in different
directions around the estimation point.

Factorizable covariance functions


With the 1D configuration at the bottom of Figure 12.7 a total screen-off of B
by C could be obtained by using an exponential covariance function (instead of
the spherical model) and simple kriging (instead of ordinary kriging). This is
due to the factorization of the exponential function C(h) = exp(−|h|):

C(x0, xB) = C(x0, xC) · C(xC, xB)

which implies the conditional independence of Z(x0) and Z(xB) with respect to
Z(xC) for a stationary Gaussian random function Z(x).
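This total screen-off can be checked numerically. The following is a minimal sketch (using NumPy; the helper `sk_weights` is ours, not from any geostatistics library) for the configuration x0 = 0, xC = 1, xB = 2 with C(h) = exp(−|h|): the simple kriging weight of B comes out exactly zero.

```python
import numpy as np

def sk_weights(x_data, x0, cov):
    """Simple kriging weights: solve C w = c0 (no constraint on the weights)."""
    C = cov(np.abs(x_data[:, None] - x_data[None, :]))  # data-data covariances
    c0 = cov(np.abs(x_data - x0))                       # data-target covariances
    return np.linalg.solve(C, c0)

cov = lambda h: np.exp(-h)                       # exponential covariance
w = sk_weights(np.array([1.0, 2.0]), 0.0, cov)   # sample C at 1, sample B at 2
# w[0] = exp(-1) for C and w[1] = 0 for B: B is totally screened off by C
```

With the spherical model of Figure 12.7 the same computation gives B a small but nonzero weight, which is why only the exponential model produces a total screen-off.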
In 2D, with h = (h1, h2)^T, the so-called "Gaussian" covariance function (be-
cause of the analogy with the distribution of the same name)

C(h) = exp(−|h|²)

can be factorized with respect to the two spatial coordinates:

C(h) = exp(−|h|²) = exp(−(h1)² − (h2)²) = exp(−(h1)²) · exp(−(h2)²) = C(h1) · C(h2)



Figure 12.8: Simple kriging with a Gaussian covariance model: samples A, B, C
screen off D, E, F (in simple kriging the weights are unconstrained and do not add up
to 100%).

Figure 12.9: Simple kriging with a Gaussian covariance model: adding the point G
radically changes the distribution of weights.

An example of simple kriging weights with a Gaussian covariance function is
shown on Figure 12.8. The samples D, E, F are completely screened off by the
samples A, B, C. The latter are the orthogonal projections of the former on the
abscissa of a coordinate system centered on the estimation point. It could be
thought that total screen-off will act on samples whose projection on either one
of the coordinate axes (centered on the estimation location) is a sample location.
There is no such rule: adding a point G above the estimation location as on
Figure 12.9 radically changes the overall distribution of weights.

Negative kriging weights


It is interesting to note that many points on Figure 12.9 have received a neg-
ative weight. Negative kriging weights are an interesting feature as they make
extrapolation possible outside the range of the data values. The counterpart is
that when a variable is only defined for positive values (like grades of ore), it
may happen that kriged values are negative and need to be readjusted.
13 Mapping with Kriging

Kriging can be used as an interpolation method to estimate values on a regular
grid using irregularly spaced data. Spatial data may be treated locally by defining
a neighborhood of samples around each location of interest in the domain.

Kriging for spatial interpolation


Kriging is certainly not the quickest method for spatial interpolation.
A simpler method, for example, is inverse distance interpolation, which
consists in weighting each data value by the inverse of its distance to the estimation
location (scaling the weights to unit sum). One could establish an alternate
method by using the inverse squared distance or, more generally, a power p of
the inverse distance. This raises the question of how to choose the parameter p
and there is no simple answer. Geostatistics avoids such a problem by starting
right away with an analysis and an interpretation of the data to determine the
parameters entering the interpolation algorithm, i.e. kriging.
The advantages of the geostatistical approach to interpolation are thus:

- kriging is preceded by an analysis of the spatial structure of the data.
The representation of the average spatial variability is integrated into the
estimation procedure in the form of a variogram model.

- ordinary kriging interpolates exactly: when a sample value is available at
the location of interest, the kriging solution is equal to that value.

- kriging, as a statistical method, provides an indication of the estimation
error: the kriging standard deviation, which is the square root of the kriging
variance.

How is kriging actually used for generating a map?


A regular grid is defined on the computer as shown on Figure 13.1. Each node
of the grid becomes the point x0 in turn and a value is kriged at that location.
The result is a map like on Figure 13.2. Here a raster representation of the
kriged grid was chosen. Each shaded square is centered on a node of the grid
and is shaded with a different grey tone according to the value estimated at that
location.
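The kriging step at one grid node can be sketched as follows: a toy implementation of ordinary kriging in variogram form, where the function names and the four-point configuration are ours and the spherical variogram is assumed to have unit sill.

```python
import numpy as np

def sph(h, a):
    """Spherical variogram, unit sill, range a."""
    r = np.minimum(h / a, 1.0)
    return 1.5 * r - 0.5 * r**3

def ok_point(x_data, z_data, x0, gamma):
    """Ordinary kriging of one point: returns estimate and kriging variance."""
    n = len(x_data)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = gamma(np.linalg.norm(x_data[:, None] - x_data[None, :], axis=2))
    A[:n, n] = A[n, :n] = 1.0                        # unit-sum constraint
    b = np.append(gamma(np.linalg.norm(x_data - x0, axis=1)), 1.0)
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    return w @ z_data, w @ b[:n] + mu                # sigma2_OK = sum w*gamma + mu

x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([2.0, 4.0, 3.0, 5.0])
gamma = lambda h: sph(h, 2.0)
est, var = ok_point(x, z, np.array([0.0, 0.0]), gamma)
# at a data location kriging is exact: est = 2.0 and var = 0
```

Repeating `ok_point` over all nodes of the grid produces both the map of kriged values and the map of kriging standard deviations discussed below.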
Figure 13.1: To make a map, a regular grid is built for the region of interest and each
node becomes the node x0 in turn.

Figure 13.3 shows a raster representation of the corresponding kriging stan-
dard deviations. This map of the theoretical kriging estimation errors allows one to

evaluate the precision of the estimation in any part of the region. It is a useful
complement to the map of the kriged values. It should be understood that the
kriging variance is primarily a measure of the density of information around the
estimation point.
The white squares on Figure 13.3 correspond to nodes which coincide with
a sample location: the kriging variances are zero at these points, where exact
interpolation takes place.
For a better visual understanding of the spatial structure of the kriged values,
the grid of regularly spaced values is generally submitted to a second interpolation
step. One possibility is to refine the initial grid and to interpolate linearly the
kriged values on the finer grid, as on the raster representation on Figure 13.4.
The result is a smoother version of the raster initially shown on Figure 13.2.
Another way to represent the results of kriging in a smooth way is to contour
them with different sets of isolines as shown on Figure 13.5 (the arsenic data has
been standardized before kriging: this is why part of the values are negative).

Neighborhood
It may be asked whether it is necessary to involve all samples in the estimation
procedure. More precisely, with a spherical covariance model in mind, as the
covariance is zero for distances greater than the range, is it necessary to involve
samples which are far away from a location of interest x0?
It seems appealing to draw a circle (sphere) or an ellipse (ellipsoid) around
x0 and to consider only samples which lie within this area (volume) as neighbors

Figure 13.2: Raster representation of kriged values of arsenic: a square is centered on
each node of a .5 km × .5 km grid and shaded proportionally to the kriging result.

of Z(x0). Locations whose covariance with x0 is zero are uncorrelated with it and the
corresponding samples have no direct influence on Z(x0). Thus it makes sense
to try to define for each location x0 a local neighborhood of the samples nearest to
x0.
At first sight, with the spherical covariance model in mind, the radius of the
neighborhood centered on x0 could be set equal to the range of the covariance
function, and samples further away could be excluded, as represented on Fig-
ure 13.6. However, the spherical covariance function expresses only the direct
covariances between points, but not the partial covariances of the estimation
point with a data point conditionally on the other sample points. The partial
correlations are reflected in the kriging weights, which are in general not zero for
points beyond the range from the estimation location.
In practice, because of the generally highly irregular spatial arrangement
and density of the data, the definition of the size of a local neighborhood is not
straightforward. A criterion in common use for data on a regular mesh is to check,
for a given estimation point, whether adding more samples leads to a significant decrease
of the kriging variance, i.e. an increase in the precision of the estimation. Cross
validation criteria can also be used to compare different neighborhoods.

Figure 13.3: Raster representation of the kriging standard deviations.

Figure 13.4: Smoothed version of the raster representation of arsenic: a finer grid is
used and the kriged values are interpolated linearly on it.

Figure 13.5: Isoline representation of kriged values of arsenic: the kriged values at
the grid nodes are contoured with a set of thin and thick isolines (arsenic data was
reduced to zero mean and unit variance before kriging).

"\. Xo
0 0 0 0
• 0 0 0

o
0·00;1.
0 0 0 0·0 0 0 0 0 0 0

0 0 0 0 0 0

0 0 0 0 0 0

0 0 0 0 0 o( 0 0 0 0 0 0 0



• - .

• •
Figure 13.6: Around a location of interest Xo of the grid a circular neighborhood is
defined and the sampIes within the circle are selected for kriging.
14 Linear Model of Regionalization

A regionalized phenomenon can be thought of as being the sum of several indepen-
dent subphenomena acting at different characteristic scales. A linear model is set
up which splits the random function representing the phenomenon into several
uncorrelated random functions, each with a different variogram or covariance
function. In subsequent chapters we shall see that it is possible to estimate these
spatial components by kriging, or to filter them out. The linear model of regional-
ization brings additional insight about kriging when the purpose is to create a
map using data irregularly scattered in space.

Spatial anomalies
In geochemical prospecting or in environmental monitoring of soil the aim is to
detect zones with a high average grade of mineral or pollutant. A spatial anomaly
with a significant grade and a relevant spatial extension needs to be detected.
We now discuss the case of geochemical exploration.
When prospecting for mineral deposits a distinction can be made between
geological, superficial and geochemical anomalies, as illustrated on Table 14.2.
A deposit is a geological anomaly with a significant average content of a
given raw material and enough spatial extension to have economic value. Buried
at a specific depth, the geological body may be detectable at the surface by
indices, the superficial anomalies, which can be isolated points, linear elements
or dispersion haloes.
Geological and superficial anomalies correspond to a vision of the geological
phenomenon in its full continuity. Yet in prospecting only a discrete perception of
the phenomenon is possible, through samples spread over the region. A superficial
anomaly can be apprehended by one or several samples or it can escape the grip
of the prospector when it is located between sample points.
A geochemical anomaly, in the strict sense, only exists at the sample points
and we can distinguish between:

- pointwise anomalies, defined on single samples, and

- groupwise anomalies, defined on several neighboring samples.

If we think of the pointwise and the groupwise anomalies as having been
generated by two different uncorrelated spatial processes, we may attribute a
nugget effect to the presence of pointwise anomalies, a short range structure to

    3D                       2D                        Samples

    GEOLOGICAL ANOMALY  -->  SUPERFICIAL ANOMALY  -->  GEOCHEMICAL ANOMALY

                             isolated spot        -->  point
                           /
    deposit             -->  fault
                           \
                             aureola              -->  group of points

Table 14.2: Spatial anomalies in geochemical prospecting.

groupwise anomalies, while a further large-scale structure may be explained by
the geochemical background variation.
Such associations between components of a variogram model and potential
spatial anomalies should be handled in a loose manner, merely as a device to
explore the spatial arrangement of high and less high values.

Nested variogram model


Often several sills can be distinguished on the experimental variogram and related
to the morphology of the regionalized variable. In a pioneering paper SERRA [172]
has investigated spatial variation in the Lorraine iron deposit and found up
to seven sills, each with a geological interpretation, in the multiple transitions
between the micrometric and the kilometric scales.
Let us first define the correlation function ρ(h), obtained by normalizing the
covariance function by its value b = C(0) at the origin:

ρ(h) = C(h)/b    so that    C(h) = b ρ(h)

The different sills b_u observed on the experimental variogram are numbered with
an index u = 0, ..., S. A nested variogram is set up by adding S+1 elementary
variograms with different coefficients b_u:

γ(h) = Σ_{u=0}^{S} γ_u(h) = Σ_{u=0}^{S} b_u g_u(h)

where the g_u(h) are normalized variograms.
If the variograms γ_u(h) can be deduced from covariance functions C_u(h), the
nested variogram becomes

γ(h) = b − Σ_{u=0}^{S} C_u(h) = b − Σ_{u=0}^{S} b_u ρ_u(h)    where    b = Σ_{u=0}^{S} b_u


Figure 14.1: Display of the values of arsenic measured on soil samples in a 35 km ×
25 km region near the Loire river in France.

with, in a typical case, a nugget-effect covariance function C_0(h) and several
spherical covariance functions C_u(h) having different ranges a_u.
On Figure 14.1 a map of arsenic samples taken in soil is shown, covering a
region near the Loire river. The arsenic sample locations are represented by
circles proportional to the measured values. The mean experimental variogram of
this variable is seen on Figure 14.2. The variogram has been fitted with a nested
model consisting of a nugget effect plus two spherical functions with ranges of
3.5 km and 6.5 km.

Decomposition of the random function


The random function Z(x) associated with a nested variogram model is a sum of
spatial components characterizing different spatial scales, i.e. reaching different
sills of variation b_u at different scales, except maybe for the last coefficient b_S,
which could represent the slope of an unbounded variogram model.
It is clear that a small-scale component can only be identified if the sampling
grid is sufficiently fine. Equally, a large-scale component will only be visible on
the variogram if the diameter of the sampled domain is large enough.


Figure 14.2: Variogram of the arsenic measurements fitted with a nested model con-
sisting of a nugget effect plus two spherical models of 3.5 km and 6.5 km range.

A deterministic component (mean m or drift m(x)) can also be part of the
decomposition.
In the following we present several linear models of regionalization, obtained
by assembling together spatial components of different types.

Second-order stationary regionalization


A second-order stationary random function Z(x) can be built by adding uncor-
related zero-mean second-order stationary random functions Z_u(x) to a constant
m representing the expectation of Z(x):

Z(x) = Z_0(x) + ... + Z_u(x) + ... + Z_S(x) + m

where cov(Z_u(x), Z_v(x+h)) = 0 for u ≠ v.
A simple computation shows that the corresponding covariance model is

C(h) = C_0(h) + ... + C_u(h) + ... + C_S(h)

Take a random function model with two uncorrelated components

Z(x) = Z_1(x) + Z_2(x) + m



with E[Z_1(x) Z_2(x+h)] = 0 and E[Z_1(x)] = E[Z_2(x)] = 0.
Then

C(h) = E[Z(x+h) Z(x)] − m²

     = E[Z_1(x+h) Z_1(x)] + E[Z_2(x+h) Z_2(x)]
       + E[Z_1(x+h) Z_2(x)] + E[Z_2(x+h) Z_1(x)]
       + E[Z_1(x+h)] m + E[Z_2(x+h)] m
       + m E[Z_1(x)] + m E[Z_2(x)] + m² − m²

     = C_1(h) + C_2(h)

Intrinsic regionalization
In the same way an intrinsic regionalization is defined as a sum of S+1 compo-
nents

Z(x) = Z_0(x) + ... + Z_u(x) + ... + Z_S(x)

for which the increments are zero on average and uncorrelated.
For a two-component model

Z(x) = Z_1(x) + Z_2(x)

we have the variogram

γ(h) = E[ (Z(x+h) − Z(x))² ]

     = E[ ( (Z_1(x+h) − Z_1(x)) + (Z_2(x+h) − Z_2(x)) )² ]

     = E[ (Z_1(x+h) − Z_1(x))² ] + E[ (Z_2(x+h) − Z_2(x))² ]

     = γ_1(h) + γ_2(h)

because E[ (Z_1(x+h) − Z_1(x)) · (Z_2(x+h) − Z_2(x)) ] = 0.

Intrinsic regionalization with mostly stationary components


A particular intrinsic random function Z(x) can be constructed by putting to-
gether one intrinsic random function Z_S(x) and S second-order stationary random func-
tions Z_u(x), u = 0, ..., S−1, having means equal to zero and being uncorrelated
amongst themselves as well as with the increments of the intrinsic component:

Z(x) = Z_0(x) + ... + Z_u(x) + ... + Z_S(x)


The associated variogram model is composed of S elementary structures de-
duced from covariance functions plus a purely intrinsic structure.
For this mixed linear regionalization model with second-order stationary and
intrinsically stationary components, we shall develop kriging systems to estimate
both types of components in the next chapter.

Locally stationary regionalization


A non-stationary random function Z(x) can be obtained by superposing S+1
uncorrelated second-order stationary zero-mean random functions Z_u(x) and a drift m(x)
(the expectation of Z(x) at the location x):

Z(x) = Z_0(x) + ... + Z_u(x) + ... + Z_S(x) + m(x)

By assuming a very smooth function m_l(x), which varies slowly from one
end of the domain to the other, the drift is locally almost constant and the
random function

Z(x) = Z_0(x) + ... + Z_u(x) + ... + Z_S(x) + m_l(x)

can be termed locally stationary (i.e. locally second-order stationary), the ex-
pectation of Z(x) being approximately equal to a constant inside any rather
small neighborhood of the domain.
The locally stationary regionalization model is the adequate framework for
the typical application with the following three steps:

1. the experimental variogram is used to describe spatial variation;

2. only covariance functions (i.e. bounded variogram functions) are fitted to
the experimental variogram;

3. a moving neighborhood is used for ordinary kriging.

In the first step the experimental variogram filters out m_l(x) at short dis-
tances, because it is approximately equal to a constant in a local neighbor-
hood. For distances up to the radius of the neighborhood the variogram thus
estimates well the underlying covariance functions, which are fitted in the sec-
ond step. In the third step the ordinary kriging assumes implicitly the existence
of a local mean within a moving neighborhood. This local mean corresponds
to the constant to which m_l(x) is approximately equal within not too large a
neighborhood. The local mean can be estimated explicitly by a kriging of the
mean.
15 Kriging Spatial Components

The components of regionalization models can be extracted by kriging. The
extraction of a component of the spatial variation is a complementary operation
to the filtering out (rejection) of the other components. This is illustrated by an
application on geochemical data.

Kriging of the intrinsic component


For Z(x) defined in the framework of the intrinsic regionalization model with
mostly stationary components (presented on page 98), we may want to estimate
the intrinsic component Z_S(x) from data about Z(x) in a neighborhood:

Z*_S(x0) = Σ_{α=1}^{n} w_α^S Z(x_α)

The estimation error is nil on average using weights that sum up to one:

E[ Z*_S(x0) − Z_S(x0) ] = E[ Σ_{α=1}^{n} w_α^S Z(x_α) − Z_S(x0) · Σ_{α=1}^{n} w_α^S ]    (with Σ_{α=1}^{n} w_α^S = 1)

= Σ_{α=1}^{n} w_α^S E[ Z(x_α) − Z_S(x0) ]

= Σ_{α=1}^{n} w_α^S ( Σ_{u=0}^{S−1} E[ Z_u(x_α) ] + E[ Z_S(x_α) − Z_S(x0) ] )

= 0

since all the expectations in the last line are zero.

Remember that the constraint of unit-sum weights is also needed for the
existence of the variogram as a conditionally negative definite function, as explained
in the presentation of ordinary kriging.
The estimation variance σ²_E is

σ²_E = var( Z*_S(x0) − Z_S(x0) )

     = E[ ( Σ_{u=0}^{S−1} Σ_{α=1}^{n} w_α^S Z_u(x_α) + Σ_{α=1}^{n} w_α^S (Z_S(x_α) − Z_S(x0)) )² ]

Taking into account the non-correlation between components,

σ²_E = Σ_{u=0}^{S−1} Σ_{α=1}^{n} Σ_{β=1}^{n} w_α^S w_β^S C_u(x_α−x_β)
       − Σ_{α=1}^{n} Σ_{β=1}^{n} w_α^S w_β^S γ_S(x_α−x_β) − γ_S(x0−x0) + 2 Σ_{α=1}^{n} w_α^S γ_S(x_α−x0)

We can replace the covariances by C_u(x0−x0) − γ_u(x_α−x_β), because it is
always possible to construct a variogram from a covariance function, and we get

σ²_E = Σ_{u=0}^{S−1} C_u(x0−x0)
       − γ_S(x0−x0) − Σ_{α=1}^{n} Σ_{β=1}^{n} w_α^S w_β^S γ(x_α−x_β) + 2 Σ_{α=1}^{n} w_α^S γ_S(x_α−x0)

Minimizing σ²_E with the constraint on the weights, we obtain the system

Σ_{β=1}^{n} w_β^S γ(x_α−x_β) + μ_S = γ_S(x_α−x0)    for α = 1, ..., n

Σ_{β=1}^{n} w_β^S = 1
ß=1

The difference with ordinary kriging is that the terms γ_S(x_α−x0), which are
specific to the component Z_S(x), appear in the right hand side.

COMMENT 15.1 If the variogram of the intrinsic component γ_S(h) is bounded,
we can replace it with a covariance function C_S(h) using the relation

γ_S(h) = C_S(0) − C_S(h)

The kriging system for extracting the intrinsic component Z_S(x) can then
be viewed in the framework of the locally stationary regionalization model: the
system estimates a second-order stationary component together with the local
mean.

Kriging of a second-order stationary component


For the purpose of kriging a particular second-order stationary component Z_u0(x)
in the framework of the intrinsic regionalization model with mostly stationary
components (see page 98), we start with the linear combination

Z*_u0(x0) = Σ_{α=1}^{n} w_α^{u0} Z(x_α)

Figure 15.1: Map of the component associated with the short range (3.5 km).

Unbiasedness is achieved on the basis of zero-sum weights (introducing a
fictitious value Z_S(0) at the origin, to form increments):

E[ Z*_u0(x0) − Z_u0(x0) ] = E[ Σ_{u=0}^{S} Σ_{α=1}^{n} w_α^{u0} Z_u(x_α) − Z_u0(x0) ]

= E[ Σ_{u=0}^{S−1} Σ_{α=1}^{n} w_α^{u0} Z_u(x_α) − Z_u0(x0) ]
  + E[ Σ_{α=1}^{n} w_α^{u0} Z_S(x_α) − Z_S(0) · Σ_{α=1}^{n} w_α^{u0} ]    (with Σ_{α=1}^{n} w_α^{u0} = 0)

= Σ_{u=0}^{S−1} Σ_{α=1}^{n} w_α^{u0} E[ Z_u(x_α) ] − E[ Z_u0(x0) ] + Σ_{α=1}^{n} w_α^{u0} E[ Z_S(x_α) − Z_S(0) ]

= 0

since all the expectations in the last line are zero.
The estimation variance is

σ²_E = var( Z*_u0(x0) − Z_u0(x0) )

     = C_u0(x0−x0) − Σ_{α=1}^{n} Σ_{β=1}^{n} w_α^{u0} w_β^{u0} γ(x_α−x_β) + 2 Σ_{α=1}^{n} w_α^{u0} γ_u0(x_α−x0)

The kriging system for the component Z_u0(x0) is then

Σ_{β=1}^{n} w_β^{u0} γ(x_α−x_β) + μ_u0 = γ_u0(x_α−x0)    for α = 1, ..., n

Σ_{β=1}^{n} w_β^{u0} = 0

We are thus able to extract from the data the different components of the spatial
variation which were identified on the experimental variogram with a nested
variogram model.
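As a concrete sketch, this component system can be solved with the same matrix as ordinary kriging, changing only the right hand side and the constraint. The model and sample configuration below are invented for illustration.

```python
import numpy as np

def sph(h, a):
    """Spherical variogram, unit sill, range a."""
    r = np.minimum(h / a, 1.0)
    return 1.5 * r - 0.5 * r**3

def component_weights(x_data, x0, gamma_total, gamma_u0):
    """Weights for kriging the component Z_u0: total variogram on the left,
    component variogram on the right, zero-sum constraint."""
    n = len(x_data)
    A = np.zeros((n + 1, n + 1))
    H = np.linalg.norm(x_data[:, None] - x_data[None, :], axis=2)
    A[:n, :n] = gamma_total(H)
    A[:n, n] = A[n, :n] = 1.0
    b = np.append(gamma_u0(np.linalg.norm(x_data - x0, axis=1)), 0.0)  # sum = 0
    return np.linalg.solve(A, b)[:n]

gamma_total = lambda h: sph(h, 1.0) + sph(h, 4.0)   # short + long structure
gamma_short = lambda h: sph(h, 1.0)                 # component to extract
x = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
w = component_weights(x, np.array([0.5, 0.5]), gamma_total, gamma_short)
# the component weights sum to zero, as required by the unbiasedness condition
```

Applying these weights to the data Z(x_α) yields the estimate of the short range component at x0.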
These component kriging techniques bear some analogy to the spectral anal-
ysis methods, which are much in use in geophysics. CHILES & GUILLEN [33]
have compared the two approaches on gravimetric data using covariance func-
tions derived from a model which represents geological bodies as random prisms
(see [177], [176]). The results obtained from decomposing either the variogram
or the spectrum matched well, even though the geological bodies were located
at different average depths in each method. An advantage of the geostatistical
approach is that the data need not be interpolated on a regular grid for com-
puting the experimental variogram, while this is necessary for determining the
spectrum.

Filtering

Instead of extracting a component of the spatial variation we may wish to reject
it. We can filter out a spatial component Z_u0(x) by removing the corresponding
covariances C_u0(x_α−x0) from the right hand side of the ordinary kriging system.
For example, we might wish to eliminate the component Z_1(x) in the second-
order stationary model

Z(x) = Z_1(x) + Z_2(x) + m    with Z_f(x) = Z_2(x) + m

and obtain a filtered version Z*_f(x) of Z(x). To achieve this by kriging, we use
weights w_α^f in the linear combination

Z*_f(x0) = Σ_{α=1}^{n} w_α^f Z(x_α)

Figure 15.2: Map of the component associated with the long range (6.5 km) together
with the estimated mean.

derived from the system

Σ_{β=1}^{n} w_β^f C(x_α−x_β) − μ_f = C_2(x_α−x0)    for α = 1, ..., n

Σ_{β=1}^{n} w_β^f = 1

where C_1(h) is absent from the right hand side. We obtain thereby an estimation
in which the component Z_1(x) is removed.
The filtered estimate can also be computed by subtracting the component
estimate from the ordinary kriging estimate:

Z*_f(x) = Z*_OK(x) − Z*_1(x)

According to the configuration of the data around the estimation point, ordinary
kriging implicitly filters out certain components of the spatial variation. This
topic will be taken up again in the next chapter when examining why kriging
gives a smoothed image of reality.
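This subtraction property can be checked numerically: the three kriging systems share the same left hand side matrix, so their solutions subtract term by term. Below is a sketch in covariance form; the two covariance models and the random configuration are invented for illustration.

```python
import numpy as np

def sph_cov(h, a):
    """Spherical covariance with unit sill and range a."""
    r = np.minimum(h / a, 1.0)
    return 1.0 - (1.5 * r - 0.5 * r**3)

def krige_weights(C, c0, unit_sum):
    """Solve: sum_b w_b C(x_a - x_b) - mu = c0_a, with sum of w equal to 1 or 0."""
    n = C.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = C
    A[:n, n] = -1.0                      # the -mu column
    A[n, :n] = 1.0                       # the weight-sum row
    b = np.append(c0, 1.0 if unit_sum else 0.0)
    return np.linalg.solve(A, b)[:n]

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, size=(8, 2))
x0 = np.array([2.5, 2.5])
H = np.linalg.norm(x[:, None] - x[None, :], axis=2)
h0 = np.linalg.norm(x - x0, axis=1)
C1, c1 = np.exp(-H), np.exp(-h0)             # exponential component Z1
C2, c2 = sph_cov(H, 3.0), sph_cov(h0, 3.0)   # spherical component Z2

w_ok = krige_weights(C1 + C2, c1 + c2, unit_sum=True)   # ordinary kriging
w_1 = krige_weights(C1 + C2, c1, unit_sum=False)        # component Z1, zero sum
w_f = krige_weights(C1 + C2, c2, unit_sum=True)         # filtered kriging
# w_f equals w_ok - w_1 up to rounding: subtracting the component estimate
# from the ordinary kriging estimate gives the filtered estimate
```

Since the weights subtract exactly, it does not matter in practice whether the filtered map is kriged directly or obtained by subtraction.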

Application: kriging spatial components of arsenic data


The data was collected during the national geochemical inventory of France and
the map of the locations of arsenic values in a 35 km × 25 km region near the
Loire river has already been shown on Figure 14.1 (page 96). The values of
arsenic were standardized before analysis. The mean experimental variogram of
this variable was fitted on Figure 14.2 (page 97) with a nested variogram model

γ(h) = b_0 nug(h) + b_1 sph(h, 3.5 km) + b_2 sph(h, 6.5 km)

consisting of a nugget-effect variogram nug(h) and two spherical variograms
sph(h, range), multiplied by constants b_0, b_1 and b_2.
The corresponding linear regionalization model is made up of three second-
order stationary components and a local mean:

Z(x) = Z_0(x) + Z_1(x) + Z_2(x) + m_l(x)

Kriging was performed using a moving neighborhood consisting of the 50
nearest samples to the estimation point. Several displays of the map of arsenic
by ordinary kriging were already presented on Figure 13.2 (page 91), Figure 13.4
(page 92) and Figure 13.5 (page 93). The map of the standard deviations of
ordinary kriging was displayed on Figure 13.3 (page 92).
The kriged map of the short range component of the arsenic values is seen on
Figure 15.1. It is worthwhile to examine how agglomerates of high values (large
circles) on the display of the data about Z(x) on Figure 14.1 (page 96) compare
with the darkly shaded areas on the map of the short range component Z_1(x).
It is also interesting to inspect how extrapolation acts in areas with no data.
A map of the component associated with the long range variation, together
with the estimated mean, is depicted on Figure 15.2. This simultaneous extrac-
tion of the long range component Z_2(x) and the local mean m_l(x) is equivalent to
filtering out (rejecting) the nugget-effect and short range components Z_0(x)
and Z_1(x). Comparison with Figure 13.4 (page 92) shows that this map is just
a smoother version of the ordinary kriging map.
It should be noted that for irregularly spaced data a map of the nugget-effect
component Z_0(x) cannot be established. This component can only be estimated
at grid nodes which coincide with a data location and is zero elsewhere. With
regularly spaced data a raster representation of the nugget component can be
made using a grid whose nodes are identical with the sample locations.
16 The Smoothness of Kriging

How smooth are estimated values from kriging with irregularly spaced data in a
moving neighborhood? By looking at a few typical configurations of data around
nodes of the estimation grid, and by building on our knowledge of how spatial
components are kriged, we can understand the way the estimated values are
designed in ordinary kriging.
The sensitivity of kriging to the choice of the variogram model is discussed
in connection with an application on topographic data.

Kriging with irregularly spaced data


Let us take a regionalized variable for which three components of spatial variation
have been identified. For the following linear model of regionalization we assume
local stationarity of order two:

Z(x) = Z_0(x) + Z_1(x) + Z_2(x) + m_l(x)

The covariance model of the three spatial components is

C(h) = C_0(h) + C_1(h) + C_2(h)

where we let C_0(h) be a nugget-effect covariance function and C_1(h), C_2(h) be
spherical models with ranges a1 and a2, numbered in such a way that a1 < a2.
Suppose that the sampling grid is highly irregular, entailing a very unequal
distribution of data in space. A grid of estimation points is set up, with nodes
as close as is required for the construction of a map at a given resolution. At
each point the operation of ordinary kriging is repeated, picking up the data in
the moving neighborhood.
Assuming a highly irregular arrangement of the data points, very different
configurations of sample points around estimation points will arise, four of which
we shall examine in the following.

1. x0 is more than a2 away from the data

This configuration of data points around the estimation point can arise when x0
is located amid a zone without data within the range of the two spherical models,
as shown on Figure 16.1. Ordinary kriging is then equivalent to the kriging of
the mean. The right hand side covariances of the ordinary kriging system are nil

Figure 16.1: No data point is within the range of the two spherical models for this
grid node.

as all distances involved are greater than a2. The information transferred to x0
will be an estimation of the local mean in the neighborhood.
This shows that kriging is very conservative: it will only give an estimate of
the mean function when data is far away from the estimation point.
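This fallback to the mean can be seen in a small numerical sketch (the configuration is invented so that all distances, between samples and to the node, exceed the range): the right hand side covariances vanish and the ordinary kriging weights become equal, which are exactly the weights of a kriging of the mean for uncorrelated data.

```python
import numpy as np

def sph_cov(h, a):
    """Spherical covariance with unit sill and range a."""
    r = np.minimum(h / a, 1.0)
    return 1.0 - (1.5 * r - 0.5 * r**3)

a2 = 1.0                                   # longest range of the model
x = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
x0 = np.array([10.0, 10.0])                # all samples beyond a2 from x0

n = len(x)
C = sph_cov(np.linalg.norm(x[:, None] - x[None, :], axis=2), a2)  # = identity
c0 = sph_cov(np.linalg.norm(x - x0, axis=1), a2)                  # = zeros
A = np.zeros((n + 1, n + 1))
A[:n, :n] = C
A[:n, n] = -1.0                            # the -mu column
A[n, :n] = 1.0                             # unit-sum constraint row
w = np.linalg.solve(A, np.append(c0, 1.0))[:n]
# all four weights equal 1/4: the node simply receives the local mean
```

The estimate at x0 is then the arithmetic mean of the neighborhood data, whatever their values.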

2. x0 is less than a2 and more than a1 away from the nearest data point x_α

With this spatial arrangement, see Figure 16.2, ordinary kriging is equivalent
to a filtering of the components Z_0(x) and Z_1(x), with the equation system

Σ_{β=1}^{n} λ_β C(x_α−x_β) − μ = C_2(x_α−x0)    for α = 1, ..., n

Σ_{β=1}^{n} λ_β = 1

The covariances C_0(h) and C_1(h) are not present in the right hand side for
kriging at such a grid node.

3. x0 is less than a1 away from, but does not coincide with, the nearest x_α

In this situation, shown on Figure 16.3, ordinary kriging will transfer information
not only about the long range component Z_2(x), but also about the short range
component Z_1(x), which varies more quickly in space. This creates a more
detailed description of the regionalized variable in areas of the map where data

Figure 16.2: One data point is within the range of the second spherical model for this
grid node.

are plentiful. Ordinary kriging is now equivalent to a system filtering the
nugget-effect component

$$\begin{cases} \displaystyle\sum_{\beta=1}^{n} \lambda_\beta\, C(x_\alpha - x_\beta) - \mu = C_1(x_\alpha - x_0) + C_2(x_\alpha - x_0) & \text{for } \alpha = 1, \ldots, n \\[2mm] \displaystyle\sum_{\beta=1}^{n} \lambda_\beta = 1 & \end{cases}$$

Only the nugget-effect covariance $C_0(h)$ is absent from the right hand side
when kriging at such a location.

4. $x_0$ coincides with a data location

In this last case, when $x_0 = x_\alpha$ for a particular value of $\alpha$, ordinary kriging
does not filter anything and restores the data value measured at the location
$x_\alpha = x_0$: it is an exact interpolator!
This exactness may nevertheless be regarded as a nuisance when there are only
a few locations of coincidence between the grid and the data locations, because
the nugget-effect component will generate spikes in the estimated values at these
locations. To avoid a map with discontinuities at fortuitous locations, it
is often advisable to filter the nugget-effect component $Z_0(x)$ systematically in
cartographic applications.
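The four cases above can be checked numerically. The following sketch is our own 1-D illustration, not from the book; the model parameters and variable names are invented. It builds an ordinary kriging system for a nested model of nugget plus two spherical structures and verifies two extremes: a node beyond the range $a_2$ of all data (case 1), and a node coinciding with a datum (case 4).

```python
import numpy as np

# 1-D ordinary kriging sketch with an invented nested model:
# nugget (c0) plus two spherical structures with ranges a1 < a2.
def spherical(h, a, c):
    h = np.abs(h)
    return np.where(h < a, c * (1 - 1.5 * h / a + 0.5 * (h / a) ** 3), 0.0)

def cov(h, c0=0.2, c1=0.5, a1=10.0, c2=0.3, a2=50.0):
    nugget = np.where(np.abs(h) < 1e-12, c0, 0.0)
    return nugget + spherical(h, a1, c1) + spherical(h, a2, c2)

def ok_weights(x_data, x0):
    n = len(x_data)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = cov(x_data[:, None] - x_data[None, :])
    A[:n, n] = 1.0; A[n, :n] = 1.0          # unbiasedness constraint
    b = np.append(cov(x_data - x0), 1.0)
    return np.linalg.solve(A, b)[:n]        # last unknown is the Lagrange multiplier

x = np.array([0.0, 5.0, 12.0, 20.0])
w_far = ok_weights(x, 200.0)    # case 1: right-hand side covariances are all nil
w_exact = ok_weights(x, 12.0)   # case 4: exact interpolation at a data location
print(np.round(w_far, 3), np.round(w_exact, 3))
```

In the first case the right hand side reduces to the unbiasedness constraint, so the weights reproduce the kriging of the local mean (they still sum to 1); in the second, the full weight goes to the coinciding datum.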

Figure 16.3: One data point is within the range of the two spherical models for this
node.

Sensitivity to choice of variogram model


The most important characteristic for the choice of the variogram model is the
interpretation of the behavior at the origin. The type of continuity assumed for
the regionalized variable under study has immediate implications for ordinary
kriging.
The behavior at the origin can be of three types:
1. discontinuous, i.e. with a nugget-effect component. The presence of a
nugget-effect has the consequence that the kriging estimates will also be
discontinuous: each data location will be a point of discontinuity of the
estimated surface.

2. continuous, but not differentiable. For example a linear behavior near the
origin implies that the variogram is not differentiable at the origin. This has
the effect that the kriging estimator is not differentiable at the data loca-
tions and that the kriged surface is linear in the immediate neighborhood
of the data points.

3. continuous and differentiable. A quadratic behavior at the origin will generate
a quadratic behavior of the kriged surface at the data locations. Models
that are differentiable at the origin have the power to extrapolate outside
the range of the data in kriging. This implies negative ordinary kriging
weights.
Most variogram models are fairly robust with respect to kriging. There is,
however, one pathological model: the so-called "Gaussian" variogram model. This
model belongs to the family of stable variogram models (which bear this name
because of their analogy with the characteristic function of the stable distribu-
tions)

$$\gamma(h) = b \left(1 - e^{-|h|^p / a^p}\right) \qquad \text{with } 0 < p \le 2 \text{ and } a, b > 0$$

For a power $p$ equal to 2 we have the Gaussian variogram model

$$\gamma(h) = b \left(1 - e^{-|h|^2 / a^2}\right)$$

which is infinitely differentiable at the origin. The implication is that the asso-
ciated random function also has this property. This means that the function is
analytic and, if known over a small area, could be predicted anywhere, as
all derivatives are known. Clearly this model is unrealistic in most applications
(besides the fact that it is purely deterministic). Furthermore the use of such
an unrealistic model has consequences on kriging which are illustrated in the
following example.
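As a numerical aside (our own illustration, not from the book; the grid and parameters are invented), one can compare the stable model for $p = 1.5$ with the Gaussian case $p = 2$: near the origin the Gaussian variogram is far flatter, and the covariance matrices it produces are nearly singular, which is one practical symptom of its pathological smoothness.

```python
import numpy as np

# gamma(h) = b * (1 - exp(-|h|^p / a^p)), the stable variogram model
def stable_vario(h, p, a=1.0, b=1.0):
    return b * (1.0 - np.exp(-np.abs(h) ** p / a ** p))

h_small = 0.01
print(stable_vario(h_small, 1.5), stable_vario(h_small, 2.0))  # p=2 is far flatter

# Covariance matrices C(h) = b - gamma(h) on 8 equispaced points in [0, 1]:
x = np.linspace(0.0, 1.0, 8)
H = x[:, None] - x[None, :]
cond_15 = np.linalg.cond(1.0 - stable_vario(H, 1.5))
cond_20 = np.linalg.cond(1.0 - stable_vario(H, 2.0))
print(cond_15, cond_20)   # the Gaussian case is much worse conditioned
```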

Application: kriging topographic data


We use a data set which has become a classic of spatial data analysis since its
publication by DAVIS [49]. It consists of 52 measurements of elevation (in feet)
in a 300 ft x 300 ft region.
First we fit a stable variogram model with a power of 1.5 and obtain by
ordinary kriging with unique neighborhood (all samples used for kriging) the
isoline map on the left of Figure 16.4. This is the map most authors get using
different methods, models or parameters (see [49], [149], [201], [12], [191]). The
map of the kriging standard deviations is seen on the right of Figure 16.4. The
standard deviations are comparable to those obtained with a spherical model of
range 100 ft and a sill equal to the variance of the data (3770 ft²). They are of
the same order of magnitude as the standard deviation of the data (61.4 ft).
Now, if we increase the power of the stable model from 1.5 to 2 we have a
dramatic change in the behavior at the origin and in the extrapolating power of
the variogram model. The map of contoured estimates of ordinary kriging using
the Gaussian model in unique neighborhood is shown on the left of Figure 16.5.
We can observe artificial hills and valleys between data locations. At the borders
of the map the estimates extrapolate far outside the range of the data. The
kriging standard deviations represented on the map on the right of Figure 16.5
are generally near zero: this complies with the deterministic type of variogram
model, which is supposed to describe exhaustively all features of the spatial
variation in that region.


Figure 16.4: On the left: map of elevations obtained by ordinary kriging with unique
neighborhood using a stable variogram (power p = 1.5) (data points are displayed as
stars of size proportional to the elevation). On the right: map of the corresponding
standard deviations.

Switching from a unique to a moving neighborhood (32 points) using the
Gaussian model we get the map seen on the left of Figure 16.6. The
extrapolation effects are weaker than in unique neighborhood. The isolines seem
as if drawn by a trembling hand: the neighborhoods are apparently in conflict
because the data do not comply with the extreme assumption of continuity built
into the model. The standard deviations in moving neighborhood are higher
than in unique neighborhood, as shown on the right of Figure 16.6, especially on
the borders of the map as extrapolation is performed with less data.
The conclusion from this experiment is that the Gaussian model should not
be used in practice. A stable variogram model with a power less than 2 will do
the job better.
Another instructive exercise is to add a slight nugget-effect (e.g. 1/1000,
1/100, 1/10 of the variance) to a Gaussian variogram and to observe how the
map of the estimated values behaves and how the standard deviations increase. The
discontinuity added at the origin actually destroys the extreme extrapolative
properties of the Gaussian model.


Figure 16.5: Maps kriged using a Gaussian variogram in unique neighborhood.


Figure 16.6: Maps kriged using a Gaussian variogram in moving neighborhood.


Part C

Multivariate Analysis
17 Principal Component Analysis

Principal component analysis is the most widely used method of multivariate
data analysis owing to the simplicity of its algebra and to its straightforward
interpretation.
A linear transformation is defined which transforms a set of correlated vari-
ables into uncorrelated factors. These orthogonal factors can be shown to extract
successively a maximal part of the total variance of the variables. A graphical
display can be produced which shows the position of the variables in the plane
spanned by two factors.

Transformation into factors


The basic problem solved by principal component analysis is to transform a set
of correlated variables into uncorrelated quantities, which could be interpreted
in an ideal (multi-Gaussian) context as independent factors underlying the phe-
nomenon. This is why the uncorrelated quantities are called factors, although
such an interpretation is not always perfectly adequate.
$Z$ is the $n \times N$ matrix of data from which the means of the variables have
already been subtracted. The corresponding $N \times N$ variance-covariance matrix
$V$ then is

$$V = [\sigma_{ij}] = \frac{1}{n} Z^T Z$$
Let $Y$ be an $n \times N$ matrix containing in its rows the $n$ samples of factors $y_p$
($p = 1, \ldots, N$), which are uncorrelated and of zero mean.
The variance-covariance matrix of the factors is diagonal, owing to the fact
that the covariances between factors are nil by definition

$$D = \frac{1}{n} Y^T Y = \begin{pmatrix} d_{11} & & 0 \\ & \ddots & \\ 0 & & d_{NN} \end{pmatrix}$$

and the diagonal elements $d_{pp}$ are the variances of the factors.
A matrix $A$ is sought, $N \times N$ orthogonal, which linearly transforms the
measured variables into synthetic factors

$$Y = Z A \qquad \text{with } A^T A = I$$
Multiplying this equation from the left by $\frac{1}{n} Y^T$, we have

$$\frac{1}{n} Y^T Y = \frac{1}{n} Y^T Z A$$

and replacing $Y$ by $ZA$ on the right hand side, it follows that

$$\frac{1}{n} (ZA)^T (ZA) = \frac{1}{n} A^T Z^T Z A = A^T \left( \frac{1}{n} Z^T Z \right) A$$

Finally

$$D = A^T V A$$

that is

$$V A = A D$$

It can immediately be seen that the matrix $Q$ of orthonormal eigenvectors of
$V$ offers a solution to the problem and that the eigenvalues $\lambda_p$ are then simply
the variances of the factors $y_p$. Principal component analysis is nothing else than
a statistical interpretation of the eigenvalue problem

$$V Q = Q \Lambda \qquad \text{with } Q^T Q = I$$

defining the factors as

$$Y = Z Q$$
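The eigenvalue formulation can be checked numerically; the sketch below (synthetic data of our own, not from the book) computes the factors $Y = ZQ$ from the eigenvectors of $V$ and verifies that their variance-covariance matrix is diagonal, with the eigenvalues as variances. Note that `numpy.linalg.eigh` returns the eigenvalues in ascending order, whereas the text numbers them from largest to smallest.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # correlated variables
Z = Z - Z.mean(axis=0)                                   # center, as assumed above

V = Z.T @ Z / len(Z)                # variance-covariance matrix V = (1/n) Z'Z
lam, Q = np.linalg.eigh(V)          # V Q = Q Lambda, Q orthogonal
Y = Z @ Q                           # factors Y = Z Q

D = Y.T @ Y / len(Y)                # factor covariance matrix: diagonal
print(np.allclose(D, np.diag(lam), atol=1e-8))
```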

Maximization of the variance of a factor


Another important aspect of principal component analysis is that it makes it
possible to define a sequence of orthogonal factors which successively absorb a
maximal amount of the variance of the data.
Take a vector $y_1$ corresponding to the first factor, obtained by transforming
the centered data matrix $Z$ with a vector $a_1$ calibrated to unit length

$$y_1 = Z a_1 \qquad \text{with } a_1^T a_1 = 1$$

The variance of $y_1$ is

$$\mathrm{var}(y_1) = \frac{1}{n}\, y_1^T y_1 = \frac{1}{n}\, a_1^T Z^T Z\, a_1 = a_1^T V a_1$$

To attribute a maximal part of the variance of the data to $y_1$, we define
an objective function $\phi_1$ with a Lagrange parameter $\lambda_1$, which multiplies the
constraint that the transformation vector $a_1$ should be of unit norm

$$\phi_1 = a_1^T V a_1 - \lambda_1 \left( a_1^T a_1 - 1 \right)$$


Setting the derivative with respect to $a_1$ to zero

$$\frac{\partial \phi_1}{\partial a_1} = 0 \quad \Longleftrightarrow \quad 2 V a_1 - 2 \lambda_1 a_1 = 0$$

we see that $\lambda_1$ is an eigenvalue of the variance-covariance matrix and that $a_1$ is
equal to the eigenvector $q_1$ associated with this eigenvalue

$$V q_1 = \lambda_1 q_1$$

We are interested in a second vector $y_2$ orthogonal to the first

$$\mathrm{cov}(y_2, y_1) = \mathrm{cov}(Z a_2, Z a_1) = a_2^T V a_1 = a_2^T \lambda_1 a_1 = 0$$

The function $\phi_2$ to maximize incorporates two constraints: the fact that $a_2$
should be of unit norm and the orthogonality between $a_2$ and $a_1$. These constraints
bring up two new Lagrange multipliers $\lambda_2$ and $\mu$

$$\phi_2 = a_2^T V a_2 - \lambda_2 \left( a_2^T a_2 - 1 \right) + \mu\, a_2^T a_1$$

Setting the derivative with respect to $a_2$ to zero

$$\frac{\partial \phi_2}{\partial a_2} = 0 \quad \Longleftrightarrow \quad 2 V a_2 - 2 \lambda_2 a_2 + \mu\, a_1 = 0$$

What is the value of $\mu$? Multiplying the equation by $a_1^T$ from the left

$$2 \underbrace{a_1^T V a_2}_{0} - 2 \lambda_2 \underbrace{a_1^T a_2}_{0} + \mu \underbrace{a_1^T a_1}_{1} = 0$$

we see that $\mu$ is nil (the constraint is not active) and thus

$$V a_2 = \lambda_2 a_2$$

Again $\lambda_2$ turns out to be an eigenvalue of the variance-covariance matrix and
$a_2$ is the corresponding eigenvector $q_2$. Continuing in the same way we find the
rest of the $N$ eigenvalues and eigenvectors of $V$ as an answer to our maximization
problem.

Interpretation of the factor variances


Numbering the eigenvalues of $V$ from the largest to the lowest, we obtain a
sequence of $N$ uncorrelated factors which provide an optimal decomposition (in
the least squares sense) of the total variance as

$$\mathrm{tr}(V) = \sum_{i=1}^{N} \sigma_{ii} = \sum_{p=1}^{N} \lambda_p$$

The eigenvalues indicate the amount of the total variance associated with
each factor and the ratio

$$\frac{\text{variance of the factor}}{\text{total variance}} = \frac{\lambda_p}{\mathrm{tr}(V)}$$

gives a numerical indication, usually expressed in %, of the importance of the
factor.
Generally it is preferable to standardize the variables (subtracting the means
and dividing by the standard deviations), so that the principal component anal-
ysis is performed on the correlation matrix $R$. In this framework, when an
eigenvalue is lower than 1, we may consider that the associated factor has less
explanatory value than any single variable, as its variance is inferior to the unit
variance of each variable.
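A small numerical illustration of this variance decomposition (synthetic data of our own): for standardized variables $\mathrm{tr}(R) = N$, so the eigenvalue shares sum to 100%, and an eigenvalue below 1 flags a factor explaining less variance than any single standardized variable.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
X[:, 1] += X[:, 0]                   # introduce some correlation
Zs = (X - X.mean(0)) / X.std(0)      # standardize -> PCA on the correlation matrix
R = Zs.T @ Zs / len(Zs)

lam = np.linalg.eigvalsh(R)[::-1]    # eigenvalues, largest first
share = lam / lam.sum()              # proportion of total variance per factor
print(np.round(100 * share, 1))      # tr(R) = N, so the shares sum to 100 %
print(int((lam < 1).sum()))          # factors weaker than any single variable
```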

Humerus    1
Ulna      .940     1
Tibia     .875   .877     1
Femur     .878   .886   .924     1
          Humerus  Ulna  Tibia  Femur

Table 17.3: Correlation matrix between the lengths of bones of fowl.

Correlation of the variables with the factors


In general it is preferable to work with standardized variables $\tilde{z}_{\alpha i}$ to set them on
a common scale and make them comparable

$$\tilde{z}_{\alpha i} = \frac{z_{\alpha i} - m_i}{s_i}$$

where $m_i$ and $s_i$ are the mean and the standard deviation of the variable $z_i$.
The variance-covariance matrix associated with standardized data is the corre-
lation matrix

$$R = \frac{1}{n} \tilde{Z}^T \tilde{Z}$$

which can be decomposed using its eigensystem as

$$R = Q \Lambda Q^T = Q \sqrt{\Lambda} \left( Q \sqrt{\Lambda} \right)^T = A A^T$$

The vectors $a_i$, columns of $A^T$, are remarkable in the sense that they indicate
the correlations between a variable $z_i$ and the factors $y_p$, because

$$\mathrm{corr}(z_i, y_p) = \sqrt{\lambda_p}\, q_{ip} = a_{ip}$$

The vectors $a_i$ are of unit length and their cross product is equal to the
correlation coefficient

$$a_i^T a_j = \rho_{ij}$$

Owing to their geometry the vectors $a_i$ can be used to represent the position
of the variables on the surface of the unit hypersphere centered at the origin.
The correlation coefficients $\rho_{ij}$ are the cosines of the angles between the vectors
referring to two different variables.
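Both identities, $\mathrm{corr}(z_i, y_p) = \sqrt{\lambda_p}\, q_{ip}$ and $a_i^T a_j = \rho_{ij}$, can be checked numerically; here is a sketch on synthetic standardized data (our own construction, not from the book):

```python
import numpy as np

rng = np.random.default_rng(2)
Zs = rng.normal(size=(300, 3))
Zs[:, 2] += 0.8 * Zs[:, 0]                 # correlate variables 0 and 2
Zs = (Zs - Zs.mean(0)) / Zs.std(0)         # standardized data

R = Zs.T @ Zs / len(Zs)
lam, Q = np.linalg.eigh(R)
A = Q * np.sqrt(lam)                       # a_ip = sqrt(lambda_p) * q_ip
Y = Zs @ Q                                 # factors

# corr(z_i, y_p): covariances (1/n) Z'Y divided by the factor standard deviations
corr = (Zs.T @ Y / len(Zs)) / np.sqrt(lam)
print(np.allclose(corr, A))                # corr(z_i, y_p) = a_ip
print(np.allclose(A @ A.T, R))             # a_i . a_j = rho_ij
```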
The projection of the position of the variables on the surface of the hyper-
sphere onto a plane defined by a pair of factor axes yields a graphical
representation called the circle of correlations. The circle of correlations shows
the proximity of the variables inside a unit circle and is useful to evaluate the
affinities and the antagonisms between the variables. Statements can easily be
made about variables which are located near the circumference of the unit circle

Figure 17.1: The position of the length variables inside the correlation circle. Each
axis is labeled with the number of the factor and the proportion of the total variance
(in %) extracted by the factor.

because the proximities in 2-dimensional space then correspond to proximities in
$N$-dimensional space. For the variables located near the center of the unit circle it
is necessary to check whether the proximities really correspond to proximities on
the hypersphere by looking at the correlation circles generated by other factor
pairs.

EXAMPLE 17.1 The correlation matrix of the lengths of fowl bones (see MOR-
RISON, 1978, p282) on Table 17.3 seems trivial to interpret at first look: all
variables are well correlated. Big birds have large bones and small birds have
small bones.
Inspection of the plane of the first two factors on Figure 17.1, showing the
position of the length variables inside the correlation circle, reveals two things.
The correlations of all variables with the first factor (92% of the total variance)
are all very strong and of the same sign. This factor represents the difference
in size of the fowls, which explains most of the variation, and has an obvious
impact on all bone lengths. The second factor (4% of the variance) splits up the
group of the four variables into two distinct subgroups. The second factor is due
to the difference in shape of the birds: the leg bones (tibia and femur) do not
grow in the same manner as the wing bones (humerus and ulna). Some birds
have shorter legs or longer wings than others (and vice-versa).

Figure 17.2: Unit cirele showing the correlation of seven soil pollution variables with
the first two principal components.

Having a second look at the correlation matrix on Table 17.3, it is readily seen
that the correlations among the wing variables (.94) and among the leg variables
(.92) are stronger than the rest. Principal component analysis does not extract
anything new or hidden from the data. It is merely a help to read a correlation
matrix. This feature becomes especially useful for large correlation matrices.

EXAMPLE 17.2 In a study on soil pollution data, the seven variables Pb, Cd,
Cr, Cu, Ni, Zn, Mo were logarithmically transformed and standardized. The
principal components calculated on the correlation matrix extracted 38%, 27%,
15%, 9%, 6%, 3% and 2% of the total variance. So the first two factors represent
about two thirds (65%) of the total variance. The third factor (15%) hardly
explains more variance than any of the original variables taken alone (1/7 =
14.3%).
Figure 17.2 shows the correlations with the first two factors. Clearly
the first factor exhibits, like in the previous biological example, mainly a size
effect. This is more appropriately termed a dilution factor, because the measured
elements constitute a very small proportion of the soil and their proportion in
a soil sample is directly proportional to the overall pollution of the soil. The
second factor looks interesting: it shows the antagonism between two pairs of
elements: Pb, Cd and Ni, Cr.

Pb     1
Cd    .40     1
Zn    .59   .54     1
Cu    .42   .27   .62     1
Mo    .20   .19   .30  -.07     1
Ni   -.04  -.08   .40   .29   .07     1
Cr   -.14  -.11   .29   .11   .13   .86     1
       Pb    Cd    Zn    Cu    Mo    Ni    Cr

Table 17.4: Correlation matrix between the seven soil pollution elements.

The correlation coefficients between the seven soil pollution elements are
listed on Table 17.4. The ordering of the variables was chosen to be the same
as on the second factor: Pb, Cd, Zn, Cu, Mo, Ni and Cr (from left to right on
Figure 17.2). This exemplifies the utility of a PCA as a device to help reading a
correlation matrix.
In this data set of 100 samples there are three multivariate outliers that could
be identified with a 3D representation of the sample cloud in the coordinate
system of the first three factors (which concentrate 80% of the variance). On
a modern computer screen the cloud can easily be rotated into a position that
best shows one of its features, as on Figure 17.3.
This need for an additional rotation illustrates the fact that PCA only pro-
vides an optimal projection plane for the samples if the cloud is of ellipsoidal
shape.
In the general case of non-standardized data it is possible to build a graph
showing the correlations between the set of variables and a pair of factors. The
variance-covariance matrix $V$ is multiplied from the left and the right by the
matrix $D_{\sigma^{-1}}$ of the inverses of the standard deviations (see Example 1.4)

$$D_{\sigma^{-1}}\, V\, D_{\sigma^{-1}} = D_{\sigma^{-1}}\, Q \Lambda Q^T\, D_{\sigma^{-1}} = D_{\sigma^{-1}} Q \sqrt{\Lambda} \left( D_{\sigma^{-1}} Q \sqrt{\Lambda} \right)^T$$

from which the general formula to calculate the correlation between a variable
and a factor can be deduced

$$\mathrm{corr}(z_i, y_p) = \frac{\sqrt{\lambda_p}\, q_{ip}}{\sqrt{\sigma_{ii}}}$$

EXERCISE 17.3 Deduce from

$$\frac{1}{n} Z^T Y = R\, Q = Q \Lambda \qquad \text{with } Q^T Q = I$$

that the correlation coefficient between a vector $z_i$ and a factor $y_p$ is

$$\rho_{ip} = \sqrt{\lambda_p}\, q_{ip}$$


Figure 17.3: The cloud of 100 samples in the coordinate system of the first three factors
after an appropriate rotation on a computer screen ("Pitch", "Roll" and "Yaw" are
the zones of the computer screen used by the pointer to activate the 3D rotation of
the cloud). Three outliers (with the corresponding sample numbers) can be seen.

From this it can be seen that when a standardized variable is orthogonal
to all others, it will be identical with a factor of unit variance in the principal
component analysis.

EXERCISE 17.4 Let $Z$ be a matrix of standardized data and $Y$ the matrix of cor-
responding factors obtained by a principal component analysis $Y = ZQ$, where
$Q$ is the matrix of orthogonal eigenvectors of the correlation matrix $R$.
Show that

$$R = \mathrm{corr}(Z, Y) \left[ \mathrm{corr}(Z, Y) \right]^T$$
18 Canonical Analysis

In the previous chapter we have seen how principal component analysis is used
to determine linear orthogonal factors underlying a set of multivariate measure-
ments. Now we turn to the more ambitious problem of comparing two groups of
variables by looking for pairs of orthogonal factors which successively assess the
strongest possible links between the two groups.

Factors in two groups of variables


Canonical analysis proposes to study simultaneously two groups of variables mea-
suring various properties on the same set of samples. Pairs of factors
are established successively which best link the two groups of variables. The aim,
when interpreting the factors, is to see for each factor underlying one group of
properties whether it can be connected to a factor underlying the other set of
properties gained from the same set of samples.
The $n \times N$ centered data table $Z$ is split into two groups of variables

$$Z = [\, Z_1 \,|\, Z_2 \,]$$

that is, the matrix $Z$ is formed by putting together the columns of an $n \times M$
matrix $Z_1$ and an $n \times L$ matrix $Z_2$, with $N = M + L$.
The variance-covariance matrix $V$ associated with $Z$ can be subdivided into
four blocks

$$V = \frac{1}{n} [\, Z_1 \,|\, Z_2 \,]^T [\, Z_1 \,|\, Z_2 \,] = \begin{pmatrix} V_{11} & C_{12} \\ C_{21} & V_{22} \end{pmatrix}$$

where the matrices $V_{11}$ of order $M \times M$ and $V_{22}$ of order $L \times L$ are the variance-
covariance matrices associated with $Z_1$ and $Z_2$. The covariance matrices $C_{12}$ (of
order $M \times L$) and $C_{21}$ (of order $L \times M$) contain the covariances between variables
of different groups

$$C_{12} = C_{21}^T$$

We are seeking pairs of factors $\{u_p, v_p\}$

$$u_p = Z_1 a_p \qquad \text{and} \qquad v_p = Z_2 b_p$$

which are uncorrelated within their respective groups

$$u_p \perp u_k \quad \text{and} \quad v_p \perp v_k \qquad \text{for } p \neq k$$


and for which the correlation $\mathrm{corr}(u_p, v_p)$ is maximal. This correlation, called
the canonical correlation between two factors $u$ and $v$, is

$$\mathrm{corr}(u, v) = \frac{a^T Z_1^T Z_2\, b}{\sqrt{a^T Z_1^T Z_1\, a}\; \cdot\; \sqrt{b^T Z_2^T Z_2\, b}}$$

Deciding to norm $a$ and $b$ following the metric given by the variance-
covariance matrices (supposed to be positive definite) of the two groups

$$a^T V_{11}\, a = b^T V_{22}\, b = 1$$

the correlation can be simply written as

$$\mathrm{corr}(u, v) = a^T C_{12}\, b$$

Intermezzo: singular value decomposition


It is instructive to follow a little algebraic detour. We pose

$$x = \sqrt{V_{11}}\, a \qquad \text{and} \qquad y = \sqrt{V_{22}}\, b \qquad \text{with } x^T x = y^T y = 1$$

in such a way that

$$\mathrm{corr}(u, v) = x^T \underbrace{\left[ \sqrt{V_{11}}^{-1}\, C_{12}\, \sqrt{V_{22}}^{-1} \right]}_{G}\; y$$

For the complete system of vectors $x$ and $y$ we have

$$X^T G\, Y = \Sigma$$

that is to say the singular value decomposition

$$G = X \Sigma Y^T$$

where the canonical correlations $\mu$, which are the only non-zero elements of $\Sigma$,
turn out to be the singular values of the rectangular matrix $G$.
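This singular value decomposition gives a direct way to compute canonical correlations numerically. The sketch below is our own construction with synthetic data (the helper `inv_sqrt` and the latent `common` variable are assumptions of the example, not from the book): one shared component links the two groups, and the leading singular value of $G$ recovers that link.

```python
import numpy as np

def inv_sqrt(M):
    # inverse square root of a symmetric positive definite matrix
    w, U = np.linalg.eigh(M)
    return U @ np.diag(1.0 / np.sqrt(w)) @ U.T

rng = np.random.default_rng(3)
n = 500
common = rng.normal(size=(n, 1))                  # latent link between the groups
Z1 = np.hstack([common + 0.5 * rng.normal(size=(n, 1)), rng.normal(size=(n, 1))])
Z2 = np.hstack([common + 0.5 * rng.normal(size=(n, 1)), rng.normal(size=(n, 2))])
Z1 -= Z1.mean(0); Z2 -= Z2.mean(0)

V11 = Z1.T @ Z1 / n; V22 = Z2.T @ Z2 / n; C12 = Z1.T @ Z2 / n
G = inv_sqrt(V11) @ C12 @ inv_sqrt(V22)           # G = V11^(-1/2) C12 V22^(-1/2)
mu = np.linalg.svd(G, compute_uv=False)           # canonical correlations
print(np.round(mu, 3))
```

The first canonical correlation comes out near the theoretical correlation induced by the common component (about 0.8 here), while the remaining one stays near zero.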

Maximization of the correlation


We are looking for a pair of transformation vectors $\{a, b\}$ making the correlation
between the factors $\{u, v\}$ maximal

$$\mathrm{corr}(u, v) = a^T C_{12}\, b \qquad \text{with } a^T V_{11}\, a = b^T V_{22}\, b = 1$$

The objective function $\phi$ is

$$\phi = a^T C_{12}\, b - \frac{1}{2} \mu \left( a^T V_{11}\, a - 1 \right) - \frac{1}{2} \mu' \left( b^T V_{22}\, b - 1 \right)$$

where $\mu$ and $\mu'$ are Lagrange parameters.


Setting the partial derivatives with respect to $a$ and $b$ to zero

$$\frac{\partial \phi}{\partial a} = 0 \quad \Longleftrightarrow \quad C_{12}\, b - \mu\, V_{11}\, a = 0$$
$$\frac{\partial \phi}{\partial b} = 0 \quad \Longleftrightarrow \quad C_{21}\, a - \mu'\, V_{22}\, b = 0$$

What is the relation between $\mu$ and $\mu'$? Premultiplying the first equation
by $a^T$ and the second by $b^T$

$$a^T C_{12}\, b - \mu\, a^T V_{11}\, a = 0 \quad \Longleftrightarrow \quad \mu = a^T C_{12}\, b$$
$$b^T C_{21}\, a - \mu'\, b^T V_{22}\, b = 0 \quad \Longleftrightarrow \quad \mu' = b^T C_{21}\, a$$

we see that

$$\mu = \mu' = \mathrm{corr}(u, v)$$

Having found earlier that

$$C_{12}\, b = \mu\, V_{11}\, a \qquad \text{and} \qquad C_{21}\, a = \mu\, V_{22}\, b$$

we premultiply the first equation by $C_{21} V_{11}^{-1}$

$$C_{21} V_{11}^{-1} C_{12}\, b = \mu\, C_{21}\, a$$

and taking into account the second equation we have

$$C_{21} V_{11}^{-1} C_{12}\, b = \mu^2\, V_{22}\, b$$

Finally this results in two eigenvalue problems, defining $\lambda = \mu^2$

$$V_{22}^{-1} C_{21} V_{11}^{-1} C_{12}\, b = \lambda\, b$$
$$V_{11}^{-1} C_{12} V_{22}^{-1} C_{21}\, a = \lambda\, a$$

Only the smaller one of the two systems needs to be solved. The solution of
the other one can then easily be obtained on the basis of the following transition
formulae between the transformation vectors

$$a = \frac{1}{\sqrt{\lambda}}\, V_{11}^{-1} C_{12}\, b \qquad \text{and} \qquad b = \frac{1}{\sqrt{\lambda}}\, V_{22}^{-1} C_{21}\, a$$

EXERCISE 18.1 Show that two factors in the same group are orthogonal:

$$\mathrm{cov}(u_p, u_k) = \delta_{pk}$$

EXERCISE 18.2 Show that two factors in different groups are orthogonal:

$$\mathrm{cov}(u_p, v_k) = \mu\, \delta_{pk}$$

EXERCISE 18.3 Show that the vector $a$ in the following equation

$$V_{11}\, a = \lambda\, C_{12} V_{22}^{-1} C_{21}\, a$$

solves a canonical analysis problem.


19 Correspondence Analysis

Canonical analysis investigates, so to speak, the correspondence between the
factors of two groups of quantitative variables. The same approach applied to
two qualitative variables, each of which represents a group of mutually exclusive
categories, is known under the evocative name of "correspondence analysis".

Disjunctive table
A qualitative variable $z$ is a system of categories (classes) which are mutually
exclusive: every sample belongs to exactly one category. The membership of
a sample $z_\alpha$ in a category $C_i$ can be represented numerically by an indicator
function

$$\mathbb{1}_{C_i}(z_\alpha) = \begin{cases} 1 & \text{if } z_\alpha \in C_i \\ 0 & \text{otherwise} \end{cases}$$

The matrix recording the memberships of the $n$ samples in the $N$ categories
of the qualitative variable is the disjunctive table $H$ of order $n \times N$

$$H = \left[ \mathbb{1}_{C_i}(z_\alpha) \right]$$

Each row of the disjunctive table contains only one element equal to 1 and
has zeroes elsewhere, as the possibility that a sample belongs simultaneously to
more than one category is excluded.
The product of $H^T$ with $H$ results in a diagonal matrix whose diagonal ele-
ments $n_{ii}$ indicate the number of samples in category $i$. The division
of the $n_{ii}$ by $n$ yields the proportion $F_{ii}$ of samples contained in a category and
we have

$$V = \frac{1}{n} H^T H = \begin{pmatrix} F_{11} & & 0 \\ & \ddots & \\ 0 & & F_{NN} \end{pmatrix}$$
Contingency table
Two qualitative variables measured on the same set of samples are repre-
sented, respectively, by a table $H_1$ of order $n \times M$ and a table $H_2$ of order
$n \times L$.
The product of these two disjunctive tables has as elements the number of
samples $n_{ij}$ belonging simultaneously to a category $i$ of the first qualitative vari-
able and a category $j$ of the second qualitative variable. The elements of this
table, which is a contingency table of order $M \times L$, can be divided by the sam-
ple total $n$, thus measuring the proportion of samples at the crossing of the
categories of the two qualitative variables

$$C_{12} = \frac{1}{n} H_1^T H_2 = \left[ F_{ij} \right]$$

Canonical analysis of disjunctive tables

Correspondence analysis consists in applying the formalism of canonical analysis
to the disjunctive tables $H_1$ and $H_2$ by forming the matrices

$$C_{12} = \frac{1}{n} H_1^T H_2, \qquad C_{21} = C_{12}^T, \qquad V_{11} = \frac{1}{n} H_1^T H_1, \qquad V_{22} = \frac{1}{n} H_2^T H_2$$

Because of the diagonal structure of the matrices $V_{11}$ and $V_{22}$, which only
intervene in the calibration of the transformation vectors, we can say that corre-
spondence analysis boils down to the investigation of the links between the rows
and the columns of the contingency table $C_{12}$.
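A minimal numerical sketch of these constructions (the categorical data are invented for illustration): building the disjunctive tables, the diagonal matrix of category proportions and the contingency table of joint proportions.

```python
import numpy as np

# Invented categorical samples: two qualitative variables on n = 8 samples,
# with M = 3 and L = 2 categories respectively.
z1 = np.array([0, 0, 1, 1, 2, 2, 2, 0])
z2 = np.array([0, 1, 1, 1, 0, 0, 1, 0])
n = len(z1)

H1 = np.eye(3)[z1]               # disjunctive tables: a single 1 per row
H2 = np.eye(2)[z2]

V11 = H1.T @ H1 / n              # diagonal matrix of category proportions
C12 = H1.T @ H2 / n              # contingency table of joint proportions
print(V11.diagonal())
print(C12)
```

The entries of $C_{12}$ sum to 1, and its row and column sums reproduce the marginal category proportions of the two variables.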

Coding of a quantitative variable

A quantitative variable $z$ can be transformed into a qualitative variable by parti-
tioning it with a set of non-overlapping (disjoint) intervals $C_i$. Indicator functions
of these intervals make it possible to set up a disjunctive table associated with $z$

$$\mathbb{1}_{C_i}(z_\alpha) = \begin{cases} 1 & \text{if } z_\alpha \text{ belongs to the interval } C_i \\ 0 & \text{otherwise} \end{cases}$$

A subdivision is sometimes given a priori, as for example with granulometric
fractions of a soil sample.
The coding of a quantitative variable enables a correspondence analysis which
explores the links between the intervals of this quantitative variable and the
categories of a qualitative variable.

Contingencies between two quantitative variables

The diagonal matrix $V = \frac{1}{n} H^T H$ associated with the partition of a quanti-
tative variable contains the values of the histogram of the samples $z_\alpha$, because
its diagonal elements show the frequencies in the classes of values of $z$.
The contingency table $C_{12} = \frac{1}{n} H_1^T H_2$ of two quantitative variables $z_1$
and $z_2$ holds the information about a bivariate histogram, for we find in it the
frequencies of samples at the crossing of classes of $z_1$ and $z_2$.

Continuous correspondence analysis


The bivariate histogram, that is, the contingency table of two quantitative vari-
ables, can be modeled with a bivariate distribution function $F(dz_1, dz_2)$. If the
corresponding bivariate density function $\varphi(z_1, z_2)$ exists,

$$F(dz_1, dz_2) = \varphi(z_1, z_2)\, F_1(dz_1)\, F_2(dz_2)$$

where $F_1(dz_1)$, $F_2(dz_2)$ are the marginal distribution functions of $z_1$ and $z_2$,
used to model their respective histograms. Supposing, moreover, that the density
function $\varphi$ is square integrable, it can be decomposed into a system of eigenvalues
$\lambda_p = \mu_p^2$ and eigenfunctions $f_p(z_1)$, $g_p(z_2)$

$$\varphi(z_1, z_2) = \sum_{p=0}^{\infty} \mu_p\, f_p(z_1)\, g_p(z_2)$$

This decomposition is the continuous analogue of the singular value decom-
position.

EXAMPLE 19.1 When the bivariate law is Gaussian, the decomposition into
eigenvalues yields coefficients $\mu_p$ equal to a coefficient $\rho$ raised to the power $p$

$$\mu_p = \rho^p \qquad \text{with } |\rho| < 1$$

as well as eigenfunctions equal to normalized Hermite polynomials $\eta_p$

$$\eta_p = \frac{H_p}{\sqrt{p!}}$$

The bivariate Gaussian distribution $G(dz_1, dz_2)$ can then be decomposed into
factors $\eta_p$ of decreasing importance $\rho^p$

$$G(dz_1, dz_2) = \sum_{p=0}^{\infty} \rho^p\, \eta_p(z_1)\, \eta_p(z_2)\, G(dz_1)\, G(dz_2)$$

EXAMPLE 19.2 In nonlinear geostatistics, interest is focused on the bivariate
distribution of two random variables $Z(x_\alpha)$ and $Z(x_\beta)$ located at two points $x_\alpha$
and $x_\beta$ of a spatial domain. The decomposition of this bivariate distribution
results, under specific assumptions, in a so-called isofactorial model. Isofactorial
models are used in disjunctive kriging for estimating an indicator function in a
spatial domain.
In particular, in the case of the Gaussian isofactorial model, the coefficient
$\rho$ will be equal to the value of the correlation function $\rho(x_\alpha - x_\beta)$ between the two
points $x_\alpha$ and $x_\beta$.
Part D

Multivariate Geostatistics
20 Direct and Cross Covariances

The cross covariance between two random functions can be computed not only at
locations $x$ but also for pairs of locations separated by a vector $h$. On the basis
of an assumption of joint second-order stationarity, a cross covariance function
between two random functions is defined which only depends on the separation
vector $h$.
The interesting feature of the cross covariance function is that it is generally
not an even function, i.e. its values for $+h$ and $-h$ may be different. This occurs
in time series when the effect of one variable on another variable is delayed.
The cross variogram is an even function, defined in the framework of an
intrinsic hypothesis. When delay effects are an important aspect of the core-
gionalization, the cross variogram is not an appropriate tool to describe the
data.

Cross covariance function

The direct and cross covariance functions $C_{ij}(h)$ of a set of $N$ random functions
$Z_i(x)$ are defined in the framework of a joint second-order stationarity hypothesis

$$\begin{cases} \mathrm{E}\big[\, Z_i(x) \,\big] = m_i & \text{for all } x \in \mathcal{D};\ i = 1, \ldots, N \\[1mm] \mathrm{E}\big[\, (Z_i(x) - m_i) \cdot (Z_j(x+h) - m_j) \,\big] = C_{ij}(h) & \text{for all } x, x+h \in \mathcal{D};\ i, j = 1, \ldots, N \end{cases}$$

The mean of each variable $Z_i(x)$ at any point of the domain is equal to a
constant $m_i$. The covariance of a variable pair depends only on the vector $h$
linking a point pair and is invariant under any translation of the point pair in the
domain.
A set of cross covariance functions is a positive definite function, i.e. the
variance of any linear combination of the $N$ variables at $n+1$ points with a set of
weights $w_\alpha^i$ needs to be positive. For any set of points $x_\alpha \in \mathcal{D}$ and any set of
weights $w_\alpha^i \in \mathbb{R}$

$$\mathrm{var}\left( \sum_{i=1}^{N} \sum_{\alpha=0}^{n} w_\alpha^i\, Z_i(x_\alpha) \right) = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{\alpha=0}^{n} \sum_{\beta=0}^{n} w_\alpha^i\, w_\beta^j\, C_{ij}(x_\alpha - x_\beta) \ \geq\ 0$$

Delay effect

The cross covariance function is not a priori an even or an odd function. Gen-
erally, for $i \neq j$, a change in the order of the variables or a change in the sign of
the separation vector $h$ changes the value of the cross covariance function

$$C_{ij}(h) \neq C_{ji}(h) \qquad \text{and} \qquad C_{ij}(-h) \neq C_{ij}(h)$$

If both the sequence and the sign are changed, we are back to the same value

$$C_{ij}(h) = C_{ji}(-h)$$

In particular, the maximum of the cross covariance function (assuming pos-
itive correlation between a given variable pair) may be shifted away from the
origin of the abscissa in a certain direction by a vector $r$. This shift of the maxi-
mal correlation is frequent with time series, where one variable can have an effect
on another variable which is not instantaneous. The time for the second vari-
able to react to fluctuations of the first variable causes a delay in the correlation
between the time series.

EXAMPLE 20.1 Let Z1(x) be a random function obtained by shifting the random
function Z2(x) with a vector r, multiplying it with a constant a and adding ε(x)

Z1(x) = a Z2(x+r) + ε(x)


where ε(x) is an independent measurement error without spatial correlation, i.e.
a quantity with a zero mean and a nugget-effect covariance function C_nug(h).
The direct covariance function of Z1(x) is proportional to that of Z2(x)

C_11(h) = a² C_22(h) + C_nug(h)

and the cross covariance function is obtained from the even function C_22(h)
translated by r

C_12(h) = a C_22(h + r)

It is worth noting that the nugget-effect term is absent from the cross covariance
function.
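The delayed-covariance relations of Example 20.1 can be checked numerically on simulated series (a minimal sketch; the AR(1) process, the shift r = 5 samples and all parameter values are illustrative assumptions, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for Z2(x): a stationary AR(1) series.
n = 20000
z2 = np.zeros(n)
for t in range(1, n):
    z2[t] = 0.9 * z2[t - 1] + rng.normal()

a, r = 2.0, 5                      # scaling constant and shift (in samples)
eps = rng.normal(size=n)           # nugget-effect measurement error
z1 = np.empty(n)
z1[:-r] = a * z2[r:]               # Z1(x) = a * Z2(x + r) + eps(x)
z1[-r:] = a * z2[-r:]              # crude edge padding
z1 += eps

def cross_cov(x, y, h):
    """Empirical C_xy(h) = E[(x(t)-mx)(y(t+h)-my)] for h >= 0."""
    x, y = x - x.mean(), y - y.mean()
    return np.mean(x[: len(x) - h] * y[h:])

# The maximum of C12 is shifted away from the origin by the delay.
lags = range(0, 15)
c12 = [cross_cov(z1, z2, h) for h in lags]
print("argmax of C12 over h >= 0:", int(np.argmax(c12)))  # close to r = 5
```

The maximum of the experimental cross covariance sits near the shift r, while the direct covariance of either series still peaks at the origin.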

EXERCISE 20.2 Compute the cross covariance function between Z1(x) and Z2(x)
for

Z1(x) = a1 Z2(x + r1) + a2 Z2(x + r2)

which incorporates two shifts r1 and r2.

Cross variogram
The direct and cross variograms γ_ij(h) are defined in the context of a joint intrinsic
hypothesis for N random functions, when for any x, x+h ∈ D and all pairs
i,j = 1, ..., N

E[Z_i(x+h) − Z_i(x)] = 0

cov[(Z_i(x+h) − Z_i(x)), (Z_j(x+h) − Z_j(x))] = 2 γ_ij(h)

The cross variogram is thus defined as half the expectation of the product
of the increments of two variables

γ_ij(h) = ½ E[(Z_i(x+h) − Z_i(x)) · (Z_j(x+h) − Z_j(x))]

The cross variogram is obviously an even function and it satisfies the following
inequality

γ_ii(h) γ_jj(h) ≥ |γ_ij(h)|²

because the square of the covariance of increments from two variables is bounded
by the product of the corresponding increment variances. Actually a matrix
Γ(h0) of direct and cross variogram values is a positive semi-definite matrix for
any fixed h0, because it is a variance-covariance matrix of increments.
It is appealing to investigate its relation to the cross covariance function in
the framework of joint second-order stationarity. The following formula is easily
obtained

γ_ij(h) = C_ij(0) − ½ (C_ij(−h) + C_ij(+h))

which shows that the cross variogram takes the average of the values for −h and
for +h of the corresponding cross covariance function. Decomposing the cross
covariance function into an even and an odd function

C_ij(h) = ½ (C_ij(+h) + C_ij(−h)) + ½ (C_ij(+h) − C_ij(−h))
               even term                  odd term

we see that the cross variogram only recovers the even term of the cross covariance
function. It is not adequate for modeling data in which the odd term of the cross
covariance function plays a significant role.
The cross covariance function between a random function and its derivative is
an odd function. Such antisymmetric behavior of the cross covariance function is
found in hydrogeology, when computing the theoretical cross covariance function
between water head and transmissivity, where the latter variable is seen as the
derivative of the former (see for example [42], p. 68). Several cross covariance
function models between a random function and its derivative are given in [59],
[186].

Figure 20.1: Experimental cross covariance C*_ij(h) and experimental cross
variogram γ*_ij(h), plotted for h from −400 to +400 seconds; the maximum of the
cross covariance is shifted from the origin by a 45 second delay.

EXAMPLE 20.3 (GAS FURNACE DATA FROM [16]) Two time series, one corresponding
to the fluctuation of a gas input into a furnace and the other being the
output of CO2 from the furnace, are measured at the same time. The chemical
reaction between the input variable and the output variable takes several tens of
seconds and we can expect a delay effect on measurements taken every 9 seconds.
The experimental cross covariance function for different distance classes H,
gathering n_c pairs of locations x_α, x_β according to their separation vectors
x_α − x_β = h_αβ ∈ H, is computed as

C*_ij(H) = (1/n_c) Σ_{α=1}^{n_c} (z_i(x_α) − m_i) · (z_j(x_α + h_αβ) − m_j)

The experimental cross covariance function shown on Figure 20.1 reveals a
delay of 45 seconds between fluctuations in the gas input and a subsequent effect
on the rate of carbon dioxide measured at the output of the system.
Figure 20.1 also exhibits the corresponding experimental cross variogram for

Figure 20.2: Even and odd terms of the experimental cross covariance, plotted
for h from −400 to +400 seconds.

different classes H with h_αβ ∈ H

γ*_ij(H) = (1/(2 n_c)) Σ_{α=1}^{n_c} (z_i(x_α + h_αβ) − z_i(x_α)) · (z_j(x_α + h_αβ) − z_j(x_α))

The experimental cross variogram is symmetric with respect to the ordinate
and is not suitable for detecting a delay.
Figure 20.2 displays a plot of the decomposition of the experimental cross
covariance into an even and an odd term. The even term has the same shape
as the cross variogram when viewed upside down. The odd term measures the
degree of asymmetry of the cross covariance: in case of symmetry the odd term
would be identically zero.
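The even/odd decomposition of an experimental cross covariance, as plotted in Figure 20.2, can be reproduced on any pair of series (a sketch; the synthetic delayed series and the lag grid are illustrative assumptions):

```python
import numpy as np

def cross_cov(x, y, h):
    """Empirical cross covariance C_xy(h), supporting negative lags."""
    if h < 0:
        return cross_cov(y, x, -h)        # C_xy(-h) = C_yx(h)
    x, y = x - x.mean(), y - y.mean()
    return np.mean(x[: len(x) - h] * y[h:])

rng = np.random.default_rng(1)
# A smooth series and a delayed copy of it (delay of 5 samples).
z2 = np.convolve(rng.normal(size=5000), np.ones(10) / 10, mode="same")
z1 = np.roll(z2, -5) + 0.1 * rng.normal(size=5000)

lags = np.arange(-30, 31)
c = np.array([cross_cov(z1, z2, int(h)) for h in lags])
c_rev = c[::-1]                           # C_12(-h) on the same lag grid
even = 0.5 * (c + c_rev)                  # the part the cross variogram sees
odd = 0.5 * (c - c_rev)                   # invisible to the cross variogram
print("max |odd| / max |even|:", np.max(np.abs(odd)) / np.max(np.abs(even)))
```

For a delayed pair the odd term is far from zero, which is exactly the information the cross variogram discards.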

Pseudo cross variogram

An alternate generalization of the variogram, the pseudo cross variogram π_ij(h),
has been proposed by MYERS [134] and CRESSIE [40] by considering the variance
of the cross increments instead of the covariance of the direct increments as in

the definition of the cross variogram γ_ij(h). Assuming for the expectation of the
cross increments

∀x ∈ D, ∀i,j:  E[Z_i(x+h) − Z_j(x)] = 0

the pseudo cross variogram comes as

π_ij(h) = ½ E[(Z_i(x+h) − Z_j(x))²]

The function π_ij(h) has the advantage of not being even. The assumption of
stationary cross increments is however unrealistic: it usually does not make sense
to take the difference between two variables measured in different physical units
(even if they are rescaled). PAPRITZ et al. [139], PAPRITZ & FLÜHLER [138]
have experienced limitations in the usefulness of the pseudo cross variogram and
they argue that it applies only to second-order stationary functions. Another
drawback of the pseudo cross variogram, which takes only positive values,
is that it is not adequate for modeling negatively correlated variables. For
these reasons we shall not consider this approach further and deal only with the
more classical cross covariance function.

Difficult characterization of the cross covariance function


For a pair of second-order stationary random functions the following inequality
is valid

C_ii(0) C_jj(0) ≥ |C_ij(h)|²

This inequality tells us that a cross covariance function is bounded by the product
of the values of the direct covariance functions at the origin.
There are no inequalities of the following type

C_ij(0) ≱ |C_ij(h)|    and    C_ii(h) C_jj(h) ≱ |C_ij(h)|²

as the quantities on the left hand side of the expressions can be negative (or
zero). In particular we do not usually have the inequality

|C_ij(0)| ≥ |C_ij(h)|

because the maximal absolute value of C_ij(h) may not be located at the origin.
The matrix C(h) of direct and cross covariances is in general neither positive
nor negative definite at a specific lag h. It is hard to describe the properties
of a covariance function matrix directly, while it is easy to characterize the
corresponding matrix of spectral densities or distribution functions.
21 Covariance Function Matrices

It is actually difficult to characterize directly a covariance function matrix. This


becomes easy in the spectral domain on the basis of Cramer's generalization of
the Bochner theorem, which is presented in this chapter. We consider complex
covariance functions.

Covariance function matrix


The matrix C(h) of direct and cross covariance functions of a vector of complex
random functions (with zero means, without loss of generality)

z(x) = (Z_1, ..., Z_i, ..., Z_N)^T    with    E[z(x)] = 0

is defined as

C(h) = E[z(x) z(x+h)^*]

where ^* denotes the conjugate transpose. The covariance function matrix is a
Hermitian positive semi-definite function, that is to say, for any set of points
x_α ∈ D and any set of complex weights w_α^i

Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{α=0}^{n} Σ_{β=0}^{n} w_α^i w̄_β^j C_ij(x_α − x_β) ≥ 0

A Hermitian matrix is the generalization to complex numbers of a real symmetric
matrix. The diagonal elements of a Hermitian N × N matrix A are
real and the off-diagonal elements are equal to the complex conjugates of the
corresponding elements with transposed indices: a_ij = ā_ji.
For the matrix of direct and cross covariance functions of a set of complex
variables this means that the direct covariances are real, while the cross covariance
functions are generally complex.

Cramer's theorem
Following a generalization of Bochner's theorem due to CRAMER [37] (see also
[68], [208]), each element C_ij(h) of a matrix C(h) of continuous direct and cross
covariance functions has the spectral representation

C_ij(h) = ∫_{−∞}^{+∞} ··· ∫_{−∞}^{+∞} e^{iω^T h} dF_ij(ω)

where the F_ij(ω) are spectral distribution functions and ω is a vector of the same
dimension as h. The diagonal terms F_ii(ω) are real, non-decreasing and bounded.
The off-diagonal terms F_ij(ω) (i ≠ j) are in general complex valued and of finite
variation. Conversely, any matrix of continuous functions C(h) is a matrix of
covariance functions, if the matrices of increments ΔF_ij(ω) are Hermitian positive
semi-definite for any block Δω (using the terminology of [173]).

Spectral densities
If attention is restricted to absolutely integrable covariance functions, we have
the following representation

C_ij(h) = ∫_{−∞}^{+∞} ··· ∫_{−∞}^{+∞} e^{iω^T h} f_ij(ω) dω

This Fourier transform can be inverted and the spectral density functions
f_ij(ω) can be computed from the cross covariance functions

f_ij(ω) = (1/(2π)^n) ∫_{−∞}^{+∞} ··· ∫_{−∞}^{+∞} e^{−iω^T h} C_ij(h) dh

The matrix of spectral densities of a set of covariance functions is positive
semi-definite for any value of ω. For any pair of random functions this implies
the inequality

|f_ij(ω)|² ≤ f_ii(ω) f_jj(ω)

With functions that have spectral densities it is thus simple to check whether
a given matrix of functions can be considered as a covariance function matrix.

EXERCISE 21.1 Compute the spectral density of the exponential covariance
function (in one spatial dimension) C(h) = b e^{−a|h|}, b > 0, a > 0, using the
formula

f(ω) = (1/(2π)) ∫_{−∞}^{+∞} e^{−iωh} C(h) dh
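For reference, the integral in Exercise 21.1 can be cross-checked numerically against the closed form f(ω) = (ab/π)/(a² + ω²), a Cauchy-type density (a sketch; the parameter values are illustrative assumptions):

```python
import numpy as np

a, b = 1.5, 2.0                      # illustrative parameters, a > 0, b > 0
omega = 0.7

# Closed form: f(w) = (1/2pi) * Int b*exp(-a|h|)*exp(-i w h) dh
#                  = (a*b/pi) / (a^2 + w^2)
closed = a * b / np.pi / (a**2 + omega**2)

# Riemann-sum quadrature of the same Fourier integral on a wide, fine grid
# (the integrand decays like exp(-a|h|), so the truncation error is tiny).
h = np.linspace(-60, 60, 400001)
integrand = b * np.exp(-a * np.abs(h)) * np.exp(-1j * omega * h)
numeric = (np.sum(integrand) * (h[1] - h[0])).real / (2 * np.pi)

print(closed, numeric)               # the two values should agree closely
```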

EXERCISE 21.2 Show (in one-dimensional space) that the function

C_ij(h) = e^{−((a_i + a_j)/2) |h|}

can only be the cross covariance function of two random functions with exponential
direct covariance functions C_i(h) = e^{−a_i|h|}, a_i > 0, if a_i = a_j.

Phase shift
In one-dimensional space, for example along the time axis, phase shifts can
easily be interpreted. Considering the inverse Fourier transforms (admitting
their existence) of the even and the odd term of the cross covariance function,
which are traditionally called the cospectrum c_ij(ω) and the quadrature spectrum
q_ij(ω), we have the following decomposition of the spectral density

f_ij(ω) = c_ij(ω) − i q_ij(ω)

The cospectrum represents the covariance between the frequency components
of the two processes which are in phase, while the quadrature spectrum
characterizes the covariance of the out-of-phase components. Further details are found
in [143], [208].
In polar notation the complex function f_ij(ω) can be expressed as

f_ij(ω) = |f_ij(ω)| e^{iφ_ij(ω)}

The phase spectrum φ_ij(ω) is the average phase shift of the proportion of
Z_i(x) with the frequency ω on the proportion of Z_j(x) at the same frequency,
and

tan φ_ij(ω) = Im(f_ij(ω)) / Re(f_ij(ω)) = −q_ij(ω) / c_ij(ω)

Absence of phase shift at a frequency ω implies a zero imaginary part
Im(f_ij(ω)) for this frequency. When there is no phase shift at any frequency,
the cross spectral density f_ij(ω) is real and its Fourier transform is an even
function.
22 Intrinsic Multivariate Correlation

Is the multivariate correlation structure of a set of variables independent of the
spatial correlation? When the answer is positive, the multivariate correlation is
said to be intrinsic. In applications it is interesting to know whether or not a
set of variables can be considered intrinsically correlated, because a positive
answer may imply considerable simplifications in the subsequent handling of the
data.

Intrinsic correlation model


The simplest multivariate covariance model that can be adopted for a covariance
function matrix consists in describing the relations between variables by the
variance-covariance matrix V and the relations between points in space by a
spatial correlation function ρ(h) which is the same for all variables

C(h) = V ρ(h)

This model is called the intrinsic correlation model because the correlation
between two variables

σ_ij ρ(h) / √(σ_ii ρ(h) σ_jj ρ(h)) = σ_ij / √(σ_ii σ_jj) = r_ij

does not depend upon spatial scale.


In practice the intrinsic correlation model is obtained when direct and cross
covariance functions are chosen which are all proportional to the same basic spatial
correlation function

C_ij(h) = b_ij ρ(h)

and the coefficients b_ij are subsequently interpreted as variances σ_ii or
covariances σ_ij, depending on whether i is equal to j or not.

EXERCISE 22.1 Show that the intrinsic correlation model is a valid model for
covariance function matrices.
The intrinsic correlation model implies even cross covariance functions,
because ρ(h) is an even function, as it is a normalized direct covariance function.
An intrinsic correlation model can be formulated for intrinsically stationary
random functions (please note the two different uses of the adjective intrinsic).

For variograms the intrinsic correlation model is defined as the product of a
positive semi-definite matrix B of coefficients b_ij, called the coregionalization
matrix, with a direct variogram γ(h)

Γ(h) = B γ(h)
EXERCISE 22.2 Show that the intrinsic correlation model is a valid model for
variogram matrices.
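The scale invariance of the correlation under the intrinsic model can be illustrated numerically (a sketch; the spherical correlation function and the matrix V are illustrative choices):

```python
import numpy as np

def rho_sph(h, a=1.0):
    """Spherical correlation function with range a."""
    h = np.abs(h)
    return np.where(h < a, 1 - 1.5 * h / a + 0.5 * (h / a) ** 3, 0.0)

# Illustrative positive definite variance-covariance matrix V.
V = np.array([[4.0, 1.2], [1.2, 1.0]])

r_vals = []
for h in [0.1, 0.3, 0.6]:
    C = V * rho_sph(h)                       # intrinsic model: C(h) = V rho(h)
    r_vals.append(C[0, 1] / np.sqrt(C[0, 0] * C[1, 1]))
print([round(r, 6) for r in r_vals])         # the same correlation at every lag
```

The correlation computed from C(h) equals σ_12/√(σ_11 σ_22) = 0.6 at every lag inside the range: the ρ(h) factors cancel.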

Linear model
The linear random function model that can be associated with the intrinsically
correlated multivariate variogram model consists of a linear combination of
coefficients a_p^i with factors Y_p(x)

Z_i(x) = Σ_{p=1}^{N} a_p^i Y_p(x)

The factors Y_p(x) have pairwise uncorrelated increments

E[(Y_p(x+h) − Y_p(x)) · (Y_q(x+h) − Y_q(x))] = 0    for p ≠ q

and all have the same variogram

E[(Y_p(x+h) − Y_p(x))²] = 2 γ(h)    for p = 1, ..., N

The variogram of a pair of random functions in this model is proportional to
one basic model

γ_ij(h) = ½ E[(Z_i(x+h) − Z_i(x)) · (Z_j(x+h) − Z_j(x))]
        = ½ E[ Σ_{p=1}^{N} Σ_{q=1}^{N} a_p^i a_q^j (Y_p(x+h) − Y_p(x)) · (Y_q(x+h) − Y_q(x)) ]
        = ½ Σ_{p=1}^{N} a_p^i a_p^j E[(Y_p(x+h) − Y_p(x))²]
        = b_ij γ(h)

where each coefficient b_ij, element of the coregionalization matrix B, is the result
of the summation over the index p of products of a_p^i with a_p^j

b_ij = Σ_{p=1}^{N} a_p^i a_p^j

One possibility to decompose a given coregionalization matrix B for specifying
the linear model is to perform a principal component analysis with B playing
the role of the variance-covariance matrix

a_p^i = √λ_p q_p^i

where λ_p is an eigenvalue and q_p^i is an element of the corresponding eigenvector of B.

Figure 22.1: Cross variogram of a pair of principal components; the components
are correlated at short distances (up to about 3.5 km).

Codispersion coefficients

A sensitive question to explore for a given data set is whether the
intrinsic correlation model is adequate. One possibility to approach the problem
is to examine plots of the ratios of the cross versus the direct variograms, which
are called codispersion coefficients (MATHERON, 1965)

cc_ij(h) = γ_ij(h) / √(γ_ii(h) γ_jj(h))

If the codispersion coefficients are constant, the correlation of the variable
pair does not depend on spatial scale. This is an obvious implication of the
intrinsic correlation model

cc_ij(h) = b_ij γ(h) / √(b_ii γ(h) b_jj γ(h)) = b_ij / √(b_ii b_jj) = r_ij

where r_ij is the correlation coefficient between two variables, computed from
elements of the coregionalization matrix B.
Another possible test for intrinsic correlation is based on the principal
components of the set of variables. If the cross variograms between the principal
components are not zero at all lags h, the principal components are not uncorrelated
at all spatial scales of the study and the intrinsic correlation model is not
appropriate.
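The codispersion test is straightforward to apply to data (a sketch; the two intrinsically correlated random-walk series are illustrative assumptions — for such data the ratio should stay roughly flat across lags):

```python
import numpy as np

def variogram(x, y, h):
    """Experimental (cross) variogram: half the mean of increment products."""
    dx, dy = x[h:] - x[:-h], y[h:] - y[:-h]
    return 0.5 * np.mean(dx * dy)

rng = np.random.default_rng(2)
# Two intrinsically correlated series: both linear in the same two factors.
y1 = np.cumsum(rng.normal(size=4000)) * 0.05
y2 = np.cumsum(rng.normal(size=4000)) * 0.05
z1 = 2.0 * y1 + 1.0 * y2
z2 = 1.0 * y1 - 0.5 * y2

ccs = []
for h in [5, 20, 50]:
    cc = variogram(z1, z2, h) / np.sqrt(variogram(z1, z1, h) * variogram(z2, z2, h))
    ccs.append(cc)
    print(h, round(cc, 3))       # roughly constant across lags (about 0.6 here)
```

The theoretical value for these coefficients is b_12/√(b_11 b_22) = 1.5/√(5 · 1.25) = 0.6; the empirical ratios fluctuate around it at every lag.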

EXAMPLE 22.3 (CROSS VARIOGRAM OF A PAIR OF PRINCIPAL COMPONENTS)
Ten chemical elements were measured in a region for gold prospection (see [199]).

Figure 22.1 shows the cross variogram between the third and the fourth principal
components, computed on the basis of the variance-covariance matrix of this
geochemical data, for which a hypothesis of intrinsic correlation is obviously false.
Indeed, near the origin of the cross variogram the two components are
significantly correlated, although the statistical correlation coefficient of the principal
components is equal to zero. This is a symptom that the correlation structure
of the data is different at small scale, an aspect that a non-regionalized principal
component analysis cannot incorporate.
23 Cokriging

The cokriging procedure is a natural extension of kriging when a multivariate
variogram or covariance model and multivariate data are available. A variable
of interest is cokriged at a specific location from data about itself and about
auxiliary variables in the neighborhood. The data set may not cover all variables
at all sample locations. Ordinary cokriging requires at least one data value about
the variable of interest, while simple cokriging, relying on knowledge of the
means, can be performed with data solely about the auxiliary variables. At the
other extreme, if all variables have been measured at all sample locations and if the
variables are intrinsically correlated, then cokriging is equivalent to kriging.

Isotopy and heterotopy


The measurements available for different variables Z_i(x) in a given domain may
be located either at the same sample points or at different points for each variable.
The following situations can be distinguished

- complete heterotopy: the variables have been measured on different sets of
sample points and have no sample locations in common;

- partial heterotopy: some variables share some sample locations;

- isotopy: data is available for each variable at all sampling points.

Complete heterotopy poses a problem for inferring the cross variogram or
covariance model. Experimental cross variograms cannot be computed for
completely heterotopic data. Experimental cross covariances, though they can be
computed, are still problematic as they do not refer to the same points (and
sometimes subregions) as the corresponding direct covariance values.
With partial heterotopy it is advisable, whenever possible, to infer the cross
variogram or covariance function model on the basis of the isotopic subset of the
data.
A particular case of partial heterotopy important for cokriging is when the
set of sample points of the variable of interest is included in the sets of sample
points of the other variables, which serve as auxiliary variables in the estimation
procedure. In this case, when the auxiliary variables are available at more points
than the main variable, cokriging is generally of advantage.

The question whether cokriging is still interesting in the case of
isotopy, when the auxiliary variables are available at the same locations as the
variable of interest, will be examined at the end of this chapter.

Ordinary cokriging
The ordinary cokriging estimator is a linear combination of weights w_α^i with data
from different variables located at sample points in the neighborhood of a point
x0. Each variable is defined on a set of samples of (possibly different) size n_i, and
the estimator is defined as

Z*_{i0}(x0) = Σ_{i=1}^{N} Σ_{α=1}^{n_i} w_α^i Z_i(x_α)

where the index i0 refers to a particular variable of the set of N variables. The
number of samples n_i depends upon the index i of the variables, so as to include
into the notation the possibility of heterotopic data.
In the framework of a joint intrinsic hypothesis we wish to estimate a
particular variable of a set of N variables on the basis of an estimation error which
should be nil on average. This condition is satisfied by choosing weights which
sum up to one for the variable of interest and which have a zero sum for the
auxiliary variables

Σ_{α=1}^{n_i} w_α^i = δ_{i i0} = { 1 if i = i0
                                 { 0 otherwise
Expanding the expression for the average estimation error, we get

E[Z*_{i0}(x0) − Z_{i0}(x0)]
  = E[ Σ_{i=1}^{N} Σ_{α=1}^{n_i} w_α^i Z_i(x_α) − Σ_{α=1}^{n_{i0}} w_α^{i0} Z_{i0}(x0) − Σ_{i≠i0} Σ_{α=1}^{n_i} w_α^i Z_i(x0) ]

where the weights in the second term sum to 1 and those in the third term to 0,
so that

  = Σ_{i=1}^{N} Σ_{α=1}^{n_i} w_α^i E[Z_i(x_α) − Z_i(x0)] = 0
For the variance of the estimation error we thus have

σ_E² = E[( Σ_{i=1}^{N} Σ_{α=1}^{n_i} w_α^i Z_i(x_α) − Z_{i0}(x0) )²]

Introducing weights w_0^i defined as

w_0^i = −δ_{i i0} = { −1 if i = i0
                    {  0 if i ≠ i0

which are included into the sums, we can shorten the expression of the estimation
variance to

σ_E² = E[( Σ_{i=1}^{N} Σ_{α=0}^{n_i} w_α^i Z_i(x_α) )²]

Then, inserting fictitious random variables Z_i(0) positioned arbitrarily at the
origin, increments can be formed

σ_E² = E[( Σ_{i=1}^{N} ( Σ_{α=0}^{n_i} w_α^i Z_i(x_α) − Z_i(0) Σ_{α=0}^{n_i} w_α^i ) )²]

where the second sum is zero, so that

σ_E² = E[( Σ_{i=1}^{N} Σ_{α=0}^{n_i} w_α^i (Z_i(x_α) − Z_i(0)) )²]

in which the terms Z_i(x_α) − Z_i(0) are increments.

Defining cross covariances of increments C^I_ij(x_α, x_β) (which are not translation
invariant) we have

σ_E² = Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{α=0}^{n_i} Σ_{β=0}^{n_j} w_α^i w_β^j C^I_ij(x_α, x_β)

In order to convert the increment covariances to variograms, the additional
assumption has to be made that the cross covariances of increments are
symmetric. With this hypothesis we obtain the translation invariant quantity

σ_E² = 2 Σ_{i=1}^{N} Σ_{α=1}^{n_i} w_α^i γ_{i i0}(x_α − x0) − γ_{i0 i0}(x0 − x0)
       − Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{α=1}^{n_i} Σ_{β=1}^{n_j} w_α^i w_β^j γ_ij(x_α − x_β)

After a minimization in which the constraints on the weights generate N
Lagrange parameters μ_i, we have the ordinary cokriging system

Σ_{j=1}^{N} Σ_{β=1}^{n_j} w_β^j γ_ij(x_α − x_β) + μ_i = γ_{i i0}(x_α − x0)    for i = 1, ..., N; α = 1, ..., n_i

Σ_{β=1}^{n_i} w_β^i = δ_{i i0}    for i = 1, ..., N

and the cokriging variance

σ²_CK = Σ_{i=1}^{N} Σ_{α=1}^{n_i} w_α^i γ_{i i0}(x_α − x0) + μ_{i0} − γ_{i0 i0}(x0 − x0)

Simple cokriging
Ordinary cokriging has no meaning when no data is available for the variable of
interest in a given neighborhood. On the other hand, simple kriging leans on the
knowledge of the means of the variables, so that the estimation of a variable can
be calibrated without having any data value for this variable in the cokriging
neighborhood.
The simple cokriging estimator is made up of the mean of the variable of
interest plus a linear combination of weights w_α^i with the residuals of the variables

Z*_{i0}(x0) = m_{i0} + Σ_{i=1}^{N} Σ_{α=1}^{n_i} w_α^i (Z_i(x_α) − m_i)

To this estimator we can associate a simple cokriging system, written in
matrix form

( C_11  ···  C_1j  ···  C_1N ) ( w_1 )   ( c_{1 i0} )
(  ⋮           ⋮           ⋮  ) (  ⋮  )   (    ⋮     )
( C_i1  ···  C_ij  ···  C_iN ) ( w_i ) = ( c_{i i0} )
(  ⋮           ⋮           ⋮  ) (  ⋮  )   (    ⋮     )
( C_N1  ···  C_Nj  ···  C_NN ) ( w_N )   ( c_{N i0} )

where the left hand side matrix is built up with square symmetric n_i × n_i blocks
C_ii on the diagonal and with rectangular n_i × n_j blocks C_ij off the diagonal, with

C_ij = C_ji^T

The blocks C_ij contain the covariances between sample points for a fixed pair
of variables. The vectors c_{i i0} list the covariances with the variable of interest, for
a specific variable of the set, between sample points and the estimation location.
The vectors w_i represent the weights attached to the samples of a fixed variable.

EXERCISE 23.1 We are interested in cokriging estimates on the basis of 3 data
values at two points x1 and x2, as shown on Figure 23.1. In this 1D neighborhood
with two sample points we have one data value for Z1(x) and two data values for
Z2(x). The direct covariance functions and the cross covariance function (which
is assumed to be an even function) are modeled with a spherical structure with a range
parameter a = 1

C_11(h) = 2 ρ_sph(h)    C_22(h) = ρ_sph(h)    C_12(h) = −1 ρ_sph(h)

i) Is the coregionalization matrix positive definite? Compute the correlation
coefficient between Z1(x) and Z2(x).
We want to estimate Z1(x) using simple cokriging, assuming known means m1
and m2

Z1*(x0) = m1 + w1 (Z1(x1) − m1) + w2 (Z2(x1) − m2) + w3 (Z2(x2) − m2)

Figure 23.1: Three data values at two locations x1, x2: Z1 and Z2 are known at
x1, Z2 at x2; x0' and x0'' denote estimation locations between them.

ii) Set up a simple cokriging system for the case of isotopy. How does it reduce
due to the fact that no data value is available for Z1(x) at the point x2?

iii) Solve the simple cokriging for an estimation of Z1(x) at the three points x0',
x0'' and x2 shown on Figure 23.1.

iv) Discuss the three cokriged values z1*(x0'), z1*(x0'') and z1*(x2).

Cokriging with isotopic data


Let us consider the case of isotopic data. With isotopy the cokriging of a set
of variables has the important advantage over a separate kriging of each variable
that it preserves the coherence of the estimators. This can be seen when
estimating a sum S(x) of variables

S(x) = Σ_{i=1}^{N} Z_i(x)

The cokriging of S(x) is equal to the sum of the cokrigings of the variables
Z_i(x) (using for each cokriging N out of the N+1 variables)

S_CK(x) = Σ_{i=1}^{N} Z_i^CK(x)

However, if we krige each term of the sum and add up the krigings, we generally
do not get the same result as when we krige directly the added-up data: the two
estimations are not coherent.

EXAMPLE 23.2 The thickness T(x) of a geologic layer is defined as the difference
between its upper and lower limits

T(x) = Z_U(x) − Z_L(x)

where T(x) is the thickness, Z_U(x) the upper limit and Z_L(x) the lower limit.

The cokriging estimators of each term, using the information of two out of the
three variables (the third being redundant), are coherent

T_CK(x) = Z_U^CK(x) − Z_L^CK(x)

But if kriging is used, the left hand result is in general different from the right
hand difference

T_K(x) ≠ Z_U^K(x) − Z_L^K(x)

and there is no criterion to help decide as to which estimate of thickness is the
better.

In some situations isotopic cokriging is equivalent to kriging. The trivial
case is when all cross variograms or cross covariance functions are zero. A more
subtle case is when cross variograms or covariances are proportional to a direct
variogram or covariance function.

Autokrigeability

A variable is said to be autokrigeable with respect to a set of variables if the
direct kriging of this variable is equivalent to the cokriging. The trivial case is
when all variables are uncorrelated (at any scale!), and it is easy to see that this
implies that all kriging weights are zero except for the variable of interest.
In a situation of isotopy any variable is autokrigeable if it belongs to a set
of intrinsically correlated variables. Assuming that the matrix V of the intrinsic
correlation model is positive definite, we can rewrite the simple cokriging
equations using Kronecker products ⊗

(V ⊗ R) w = V_{i0} ⊗ r0

where

R = [ρ(x_α − x_β)]    and    r0 = (ρ(x_α − x0))

The nN × nN left hand matrix of the isotopic simple cokriging with intrinsic
correlation is expressed as the Kronecker product of the N × N variance-covariance
matrix V with the n × n left hand side matrix R of simple kriging

          ( σ_11 R  ···  σ_1N R )
V ⊗ R  =  (   ⋮     σ_ij R   ⋮  )
          ( σ_N1 R  ···  σ_NN R )

The nN right hand vector is the Kronecker product of the vector V_{i0} of
covariances σ_{i i0} with the right hand vector r0 of simple kriging

              ( σ_{1 i0} r0 )
V_{i0} ⊗ r0 = (     ⋮      )
              ( σ_{N i0} r0 )

Now, as V_{i0} is also present in the left hand side of the cokriging system, a
solution exists for which the weights of the vector w_{i0} are those of the simple
kriging of the variable of interest and all other weights are zero

(V ⊗ R) (0, ..., w_{i0}, ..., 0)^T = V_{i0} ⊗ r0    with    R w_{i0} = r0

This is the only solution of the simple cokriging system, as V and R are
positive definite.
It should be noted that in this demonstration the autokrigeability of the
variable of interest only depends on the block structure V_{i0} ⊗ R and not on
the structure V ⊗ R of the whole left hand matrix. Thus for a variable to be
autokrigeable we only need the cross variograms or covariances with this
variable to be proportional to the direct variogram or covariance.
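This Kronecker-product argument is easy to verify numerically (a sketch; the exponential correlation function, the matrix V and the point layout are illustrative assumptions):

```python
import numpy as np

def rho(h):
    """Illustrative correlation function (exponential)."""
    return np.exp(-np.abs(h))

x = np.array([0.0, 0.3, 0.8])                # isotopic sample locations
x0 = 0.45
R = rho(x[:, None] - x[None, :])             # n x n simple kriging matrix
r0 = rho(x - x0)

V = np.array([[3.0, 1.1], [1.1, 2.0]])       # intrinsic correlation model
i0 = 0                                       # variable of interest

# Simple cokriging under intrinsic correlation: (V kron R) w = V[:, i0] kron r0
w = np.linalg.solve(np.kron(V, R), np.kron(V[:, i0], r0))

w_sk = np.linalg.solve(R, r0)                # simple kriging of Z_{i0} alone
print(w.round(6))                            # first block = w_sk, second = 0
```

The cokriging weights of the variable of interest coincide with its simple kriging weights and the auxiliary-variable weights vanish, as the block argument predicts.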
To test on an isotopic data set whether a variable Z_{i0} is autokrigeable, we can
compute autokrigeability coefficients, defined as the ratios of the cross variograms
against the direct variogram

ac_{i0 j}(h) = γ_{i0 j}(h) / γ_{i0 i0}(h)

If the autokrigeability coefficients are constant at any working scale for each
variable j = 1, ..., N of a variable set, then the variable of interest is autokrigeable
with respect to this set of variables.
A set of N variables for which each variable is autokrigeable with respect to
the N−1 other variables is intrinsically correlated.

COMMENT 23.3 The property of autokrigeability is used explicitly in nonlinear


geostatistics in the formulation of models with orthogonal indicator residuals.

Collocated cokriging
A particular heterotopic situation encountered in practice is when we have a
variable of interest Z(x) known at a few points and an auxiliary variable S(x)
known everywhere in the domain (or at least at all nodes of a given estimation
grid).
XU et al. [206] called collocated cokriging a neighborhood definition strategy
in which the neighborhood of the auxiliary variable is arbitrarily reduced to only
one point: the estimation location x0. The simple cokriging system for such a
neighborhood (using n data values for the primary variable) is written

( C_ZZ    c_ZS ) ( w_Z )   ( c_ZZ )
( c_ZS^T  σ_SS ) ( w_S ) = ( σ_ZS )

where C_ZZ is the left hand matrix of the simple kriging system of Z(x) and c_ZZ is
the corresponding right hand side. The vector c_ZS contains the cross covariances
between the n data points on Z(x) and the point x0 where the data value of
S(x) that is included into the estimator is located.
In the case of intrinsic correlation the collocated cokriging system can be expressed
as

( σ_ZZ R      σ_ZS r0 ) ( w_Z )   ( σ_ZZ r0 )
( σ_ZS r0^T   σ_SS    ) ( w_S ) = ( σ_ZS    )

To perform collocated cokriging with intrinsic correlation of the variable pair
we only need to know the covariance function

C_ZZ(h) = σ_ZZ ρ(h)

as well as the variance σ_SS of S(x) and the covariance σ_ZS between Z(x) and
S(x).

EXERCISE 23.4 Suppose that Z(x) and S(x) are intrinsically correlated. We
choose to use as a cokriging neighborhood the data Z(x_α) at n sample locations
x_α and the corresponding values S(x_α) at the same locations. We also include
the value S(x0) available at the grid node x0 into the cokriging neighborhood
definition. Does this cokriging boil down to collocated cokriging?

EXERCISE 23.5 Can this neighborhood search strategy (n values for Z, 1 value
for S at x0) be applied with ordinary cokriging? Evaluate the ordinary cokriging
weights.
24 Multivariate Nested Variogram

The multivariate regionalization of a set of random functions can be represented
with a spatial multivariate linear model. The associated multivariate nested
variogram model is easily fitted to the multivariate data. Several coregionalization
matrices describing the multivariate correlation structure at different scales of a
phenomenon result from the variogram fit. The relation between the
coregionalization matrices and the classical variance-covariance matrix is examined.

Linear model of coregionalization


A set of real second-order stationary random functions {Zi(X); i = 1, ... , N}
can be decomposed into sets {Z~(x); u = 0, ... , S} of spatially uncorrelated
components
S
Zi(X) = L Z~(x) + mi
,,=0
where for all values of the indices $i$, $j$, $u$ and $v$

$$\mathrm{E}[\,Z_i(\mathbf{x})\,] = m_i \qquad \mathrm{E}[\,Z_i^u(\mathbf{x})\,] = 0$$

and

$$\mathrm{cov}\big(Z_i^u(\mathbf{x}),\, Z_j^u(\mathbf{x}+\mathbf{h})\big) = \mathrm{E}\big[\, Z_i^u(\mathbf{x})\; Z_j^u(\mathbf{x}+\mathbf{h}) \,\big] = C_{ij}^u(\mathbf{h})$$

$$\mathrm{cov}\big(Z_i^u(\mathbf{x}),\, Z_j^v(\mathbf{x}+\mathbf{h})\big) = 0 \quad \text{when } u \neq v$$

The cross covariance functions $C_{ij}^u(\mathbf{h})$ associated with the spatial components are composed of real coefficients $b_{ij}^u$ and are proportional to real correlation functions $\rho_u(\mathbf{h})$

$$C_{ij}(\mathbf{h}) = \sum_{u=0}^{S} C_{ij}^u(\mathbf{h}) = \sum_{u=0}^{S} b_{ij}^u\; \rho_u(\mathbf{h})$$
which implies that the cross covariance functions are even in this model.
Coregionalization matrices $\mathbf{B}_u$ of order $N \times N$ can be set up and we have a multivariate nested covariance function model

$$\mathbf{C}(\mathbf{h}) = \sum_{u=0}^{S} \mathbf{B}_u\; \rho_u(\mathbf{h})$$
Multivariate Nested Variogram 153

with positive semi-definite coregionalization matrices $\mathbf{B}_u$.

EXERCISE 24.1 When is the above covariance function model equivalent to the
intrinsic correlation model?

EXERCISE 24.2 Show that a correlation function $\rho_u(\mathbf{h})$ having a non zero sill $b_{ij}^u$ on a given cross covariance function necessarily has non zero sills $b_{ii}^u$ and $b_{jj}^u$ on the corresponding direct covariance functions.
Conversely, if a sill $b_{ii}^u$ is zero for a given structure of a variable, all sills of that structure on all cross covariance functions with this variable are zero.
Each spatial component $Z_i^u(\mathbf{x})$ can itself be represented as a set of uncorrelated factors $Y_p^u(\mathbf{x})$ with transformation coefficients $a_{ip}^u$

$$Z_i^u(\mathbf{x}) = \sum_{p=1}^{N} a_{ip}^u\; Y_p^u(\mathbf{x})$$

where for all values of the indices $i$, $j$, $u$, $v$, $p$ and $q$

$$\mathrm{E}[\,Y_p^u(\mathbf{x})\,] = 0$$

and

$$\mathrm{cov}\big(Y_p^u(\mathbf{x}),\, Y_p^u(\mathbf{x}+\mathbf{h})\big) = \rho_u(\mathbf{h})$$

$$\mathrm{cov}\big(Y_p^u(\mathbf{x}),\, Y_q^v(\mathbf{x}+\mathbf{h})\big) = 0 \quad \text{when } u \neq v \text{ or } p \neq q$$

Combining the spatial with the multivariate decomposition, we obtain the linear model of coregionalization

$$Z_i(\mathbf{x}) = \sum_{u=0}^{S} \sum_{p=1}^{N} a_{ip}^u\; Y_p^u(\mathbf{x}) + m_i$$

In practice a set of correlation functions $\rho_u(\mathbf{h})$ (i.e. normalized variograms $g_u(\mathbf{h})$) is first selected, taking care to keep $S$ reasonably small. Then the coregionalization matrices are fitted using a weighted least squares algorithm (described below). The weighting coefficients are chosen by the practitioner so as to provide a graphically satisfactory fit, downweighting distance classes which do not comply with the shape suggested by the experimental variograms. Finally the coregionalization matrices are decomposed, yielding the transformation coefficients $a_{ip}^u$ which specify the linear coregionalization model

$$\mathbf{B}_u = \mathbf{A}_u \mathbf{A}_u^{T} \quad \text{where} \quad \mathbf{A}_u = [\,a_{ip}^u\,]$$

The decomposition of the $\mathbf{B}_u$ into the product of $\mathbf{A}_u$ with its transpose is usually based on the eigenvalue decomposition of each coregionalization matrix. Several decompositions for the purpose of a regionalized multivariate data analysis are discussed in the next chapter.
154 Multivariate Geostatistics

Bivariate fit of the experimental variograms


The multivariate nested variogram model associated with a linear model of intrinsically stationary random functions is

$$\mathbf{\Gamma}(\mathbf{h}) = \sum_{u=0}^{S} \mathbf{B}_u\; g_u(\mathbf{h})$$

where the $g_u(\mathbf{h})$ are normalized variograms and the $\mathbf{B}_u$ are positive semi-definite matrices.
In the case of two variables it is simple to design a procedure for fitting the variogram model to the experimental variograms. We start by fitting the two direct variograms using a nested model. At least one structure $g_u(\mathbf{h})$ should be common to both variograms to obtain a non trivial coregionalization model. Then we are able to fit the sills $b_{ij}^u$ of the cross variogram, using the sills of the direct variograms to set bounds within which the coregionalization model is authorized

$$|\,b_{ij}^u\,| \;\leq\; \sqrt{b_{ii}^u\; b_{jj}^u}$$

because the second order principal minors of $\mathbf{B}_u$ are positive.


Constrained weighted least squares routines exist which allow integrating these constraints into an automated fit for a set of predefined structures.
The extension of this bivariate procedure to more than two variables does not guarantee a priori an authorized model, because higher order principal minors of the coregionalization matrices are not constrained to be positive.
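A practical authorization check for a fitted $\mathbf{B}_u$ with any number of variables is to test positive semi-definiteness directly through its eigenvalues, which covers all the principal-minor conditions at once. A small sketch (the function name and tolerance are ours, not from the text):

```python
import numpy as np

def is_authorized(b, tol=1e-10):
    """Check that a coregionalization matrix B_u is positive
    semi-definite, i.e. all eigenvalues >= 0 up to round-off."""
    b = np.asarray(b, dtype=float)
    assert np.allclose(b, b.T), "B_u must be symmetric"
    return bool(np.all(np.linalg.eigvalsh(b) >= -tol))
```

For two variables this reduces to the sill inequality above; for three or more variables it also enforces the higher order minors that the bivariate procedure leaves unchecked.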

EXAMPLE 24.3 The inequality relation between sills can be used to provide a graphical criterion to judge the goodness of fit of a variogram model to an experimental variogram. A hull of perfect correlation is defined by replacing the sills $b_{ij}^u$ of the cross variogram by the square root of the product of the sills of the corresponding direct variograms (which have to be fitted first) and setting the sign of the total to $+$ or $-$

$$\mathrm{hull}\big(\gamma_{ij}(\mathbf{h})\big) = \pm \sum_{u=0}^{S} \sqrt{b_{ii}^u\; b_{jj}^u}\;\; g_u(\mathbf{h})$$

Figure 24.1 shows the cross variogram between nickel and arsenic for the Loire geochemical data set. The fit of a nugget-effect plus two spherical models with ranges of 3.5 km and 6.5 km seems inappropriate on this graph.
The same fit viewed on Figure 24.2 within the hull of perfect (positive or negative) correlation now looks satisfactory. This shows that a cross variogram fit should be judged in the context of the spatial correlation between two variables: in this case the statistical correlation between the two variables is very poor (the correlation coefficient is equal to $r = .12$). The $b_{ij}^u$ coefficients of the two spherical structures are very small in absolute value and the model is actually close to a pure nugget-effect, indicating that correlation between arsenic and nickel only exists at the microscale.
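Once the direct sills are fitted, the hull can be evaluated at the experimental lags. The following sketch uses hypothetical sill values, not the Loire ones (which are not tabulated here):

```python
import numpy as np

def correlation_hull(b_ii, b_jj, g_vals):
    """Hull of perfect correlation: +/- sum_u sqrt(b_ii^u b_jj^u) g_u(h).

    b_ii, b_jj : direct-variogram sills per structure, shape (S,)
    g_vals     : normalized variogram values g_u(h), shape (S, nh)
    Returns the (lower, upper) envelopes over the nh lags."""
    amp = np.sqrt(np.asarray(b_ii) * np.asarray(b_jj))
    upper = amp @ np.asarray(g_vals)
    return -upper, upper
```

A fitted cross variogram model lying inside these envelopes satisfies the sill inequality for every structure, which is exactly the graphical criterion of the example.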

Figure 24.1: Experimental cross variogram of arsenic and nickel with a seemingly badly fitted model.

Multivariate fit

A multivariate fit is sought for a model $\mathbf{\Gamma}(\mathbf{h})$ to the matrices $\mathbf{\Gamma}^*(\mathfrak{H}_k)$ of the experimental variograms calculated for $n_c$ classes $\mathfrak{H}_k$ of separation vectors $\mathbf{h}$. An iterative algorithm due to GOULARD [76] is based on the least squares criterion

$$\sum_{k=1}^{n_c} \mathrm{tr}\Big[ \big( \mathbf{\Gamma}^*(\mathfrak{H}_k) - \mathbf{\Gamma}(\mathfrak{H}_k) \big)^2 \Big]$$

which will be minimized under the constraint that the coregionalization matrices $\mathbf{B}_u$ of the model $\mathbf{\Gamma}(\mathbf{h})$ have to be positive semi-definite.
Starting with $S$ arbitrarily chosen positive semi-definite matrices $\mathbf{B}_u$, we look for an $(S{+}1)$-th matrix $\mathbf{B}_v$ which best fills up the gap $\Delta\mathbf{\Gamma}_k^*$ between a model with $S$ matrices and the experimental matrices $\mathbf{\Gamma}^*(\mathfrak{H}_k)$

$$\Delta\mathbf{\Gamma}_k^* = \mathbf{\Gamma}^*(\mathfrak{H}_k) - \sum_{\substack{u=0 \\ u \neq v}}^{S} \mathbf{B}_u\; g_u(\mathfrak{H}_k)$$

Figure 24.2: The same cross variogram fit is displayed together with the hull of perfect positive or negative correlation provided by the previous fitting of the corresponding direct variograms. The cross variogram model is defined within the bounds of the hull.

The sum of the differences weighted by $g_v(\mathfrak{H}_k)$ is a symmetric matrix $\Delta\mathbf{\Gamma}_v^*$ which is in general not positive semi-definite. Decomposing it into eigenvalues and eigenvectors

$$\sum_{k=1}^{n_c} \Delta\mathbf{\Gamma}_k^*\;\; g_v(\mathfrak{H}_k) = \Delta\mathbf{\Gamma}_v^* = \mathbf{Q}_v\, \mathbf{\Lambda}_v\, \mathbf{Q}_v^{T} \quad \text{with} \quad \mathbf{Q}_v^{T}\mathbf{Q}_v = \mathbf{I}$$

we can build with the positive eigenvalues a new matrix $\Delta\mathbf{\Gamma}_v^{+}$ which is the positive semi-definite matrix nearest to $\Delta\mathbf{\Gamma}_v^*$ in the least squares sense

$$\Delta\mathbf{\Gamma}_v^{+} = \mathbf{Q}_v\, \mathbf{\Lambda}_v^{+}\, \mathbf{Q}_v^{T}$$

where $\mathbf{\Lambda}_v^{+}$ is equal to the diagonal eigenvalue matrix $\mathbf{\Lambda}_v$ except for the negative eigenvalues, which have been set to zero.
Dividing by the sum of the squared values of the variogram $g_v(\mathbf{h})$ of the $(S{+}1)$-th structure, we obtain a positive semi-definite matrix $\mathbf{B}_v^*$ which minimizes the fitting criterion

$$\mathbf{B}_v^* = \frac{\Delta\mathbf{\Gamma}_v^{+}}{\displaystyle\sum_{k=1}^{n_c} \big( g_v(\mathfrak{H}_k) \big)^2}$$

This procedure is applied in turn to each matrix $\mathbf{B}_u$ and iterated. The algorithm converges well in practice, although convergence is not ensured theoretically.

In practice weights $w(\mathfrak{H}_k)$ are included into the criterion to allow the user to put low weight on distance classes which the model should ignore

$$\sum_{k=1}^{n_c} w(\mathfrak{H}_k)\; \mathrm{tr}\Big[ \big( \mathbf{\Gamma}^*(\mathfrak{H}_k) - \mathbf{\Gamma}(\mathfrak{H}_k) \big)^2 \Big]$$

The steps of this weighted least squares algorithm are analogous to the algorithm without weights.
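Goulard's alternating update described above — fill the gap for one structure, project onto the positive semi-definite cone, divide by the sum of squared variogram values, then iterate — can be sketched as follows (array shapes and function names are ours):

```python
import numpy as np

def nearest_psd(a):
    """Nearest positive semi-definite matrix in the least squares
    sense: zero out the negative eigenvalues, as in the text."""
    lam, q = np.linalg.eigh(a)
    return (q * np.clip(lam, 0.0, None)) @ q.T

def goulard_fit(gamma_exp, g, w=None, n_iter=100):
    """Iteratively fit PSD coregionalization matrices B_u so that
    sum_u B_u g_u(H_k) approximates the experimental matrices.

    gamma_exp : (nc, N, N) experimental variogram matrices per lag class
    g         : (S, nc) values of the normalized variograms g_u per class
    w         : (nc,) optional weights for the lag classes
    """
    S, nc = g.shape
    N = gamma_exp.shape[1]
    w = np.ones(nc) if w is None else np.asarray(w, float)
    B = np.zeros((S, N, N))
    for _ in range(n_iter):
        for v in range(S):
            # gap left by all structures except v, per lag class
            model_wo_v = np.einsum('uij,uk->kij', B, g) \
                         - g[v][:, None, None] * B[v]
            dgamma = gamma_exp - model_wo_v
            # unconstrained least squares update, then PSD projection
            num = np.einsum('k,kij->ij', w * g[v], dgamma)
            B[v] = nearest_psd(num / np.sum(w * g[v] ** 2))
    return B
```

On noise-free synthetic data built from known PSD matrices the iteration recovers them; on real experimental variograms it converges to the constrained least squares fit described in the text.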

COMMENT 24.4 The heart of Goulard's algorithm is built on the following result about eigenvalues (see RAO [145], p. 63).
Let $\mathbf{A}$ be a symmetric $N \times N$ matrix. The Euclidean norm of $\mathbf{A}$ is

$$\|\mathbf{A}\| = \sqrt{\sum_{i=1}^{N} \sum_{j=1}^{N} |a_{ij}|^2}$$

and its eigenvalue decomposition

$$\mathbf{A} = \sum_{p=1}^{N} \lambda_p\; \mathbf{q}_p \mathbf{q}_p^{T} \quad \text{with} \quad \mathbf{Q}\mathbf{Q}^{T} = \mathbf{I}$$

We shall assume that the eigenvalues $\lambda_p$ are in decreasing order.
We look for a symmetric $N \times N$ matrix $\mathbf{B}$ of rank $k < N$ which best approximates a given matrix $\mathbf{A}$, i.e.

$$\inf_{\mathbf{B}} \|\mathbf{A} - \mathbf{B}\|$$

Multiplying a matrix by an orthogonal matrix does not modify its length (measured with the Euclidean norm), so we have

$$\|\mathbf{A} - \mathbf{B}\|^2 = \big\| \mathbf{Q}^{T}\mathbf{A}\mathbf{Q} - \underbrace{\mathbf{Q}^{T}\mathbf{B}\mathbf{Q}}_{\mathbf{C}} \big\|^2 = \|\mathbf{\Lambda} - \mathbf{C}\|^2 = \sum_{p=1}^{N} \sum_{q=1}^{N} \big( \lambda_p\, \delta_{pq} - c_{pq} \big)^2$$
where $\delta_{pq}$ is one for $p = q$ and zero otherwise. The best choice for $\mathbf{C}$ is obviously a diagonal matrix, as

$$\sum_{p=1}^{N} (\lambda_p - c_{pp})^2 + \sum_{p=1}^{N} \sum_{\substack{q=1 \\ q \neq p}}^{N} (c_{pq})^2 \;\geq\; \sum_{p=1}^{N} (\lambda_p - c_{pp})^2$$

The diagonal matrix $\mathbf{C}$ of rank $k$ minimizing this sum is the one for which the diagonal elements $c_{pp}$ are equal to the eigenvalues $\lambda_p$ for $p \leq k$ and zero for $p > k$

$$\sum_{p=1}^{N} (\lambda_p - c_{pp})^2 \;\geq\; \sum_{p=k+1}^{N} (\lambda_p)^2$$

This means that

$$\mathbf{C} = \mathbf{Q}^{T}\mathbf{B}\mathbf{Q} = \mathbf{\Lambda}^{0}$$

where $\mathbf{\Lambda}^{0}$ is the diagonal matrix of eigenvalues of $\mathbf{A}$ in which the last $N - k$ (lowest) eigenvalues have been set to zero.
The best rank-$k$ approximation of $\mathbf{A}$ is $\mathbf{B} = \mathbf{Q}\, \mathbf{\Lambda}^{0}\, \mathbf{Q}^{T}$.
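The result can be sketched numerically. Eigenvalues are taken in decreasing order as in the text, which for positive semi-definite matrices yields the Euclidean-norm-optimal rank-$k$ approximation:

```python
import numpy as np

def best_rank_k(a, k):
    """Best rank-k approximation of a symmetric matrix A:
    B = Q Lambda0 Q^T, keeping the k leading eigenvalues and
    setting the N - k lowest to zero."""
    lam, q = np.linalg.eigh(a)        # ascending order
    order = np.argsort(lam)[::-1]     # decreasing order, as in the text
    lam, q = lam[order], q[:, order]
    lam[k:] = 0.0                     # zero the N - k lowest eigenvalues
    return (q * lam) @ q.T
```

The `nearest_psd` projection used in Goulard's algorithm is the same construction with "negative eigenvalues" playing the role of the discarded tail.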

The need for an analysis of the coregionalization


We can use classical principal component analysis to define the values of factors at sample locations and then krige the factors over the whole region to make maps. What would be the benefit of a spatial multivariate analysis based on the linear model of coregionalization and the corresponding multivariate nested variogram?
To answer this question we restrict the discussion to a second order stationary context, to make sure that the variance-covariance matrix $\mathbf{V}$, on which classical principal component analysis is based, exists from the point of view of the model.
If we let the lag $\mathbf{h}$ go to infinity in a second order stationary context with structures $g_u(\mathbf{h})$ having unit sills, we notice that the multivariate variogram model is equal to the variance-covariance matrix for large $\mathbf{h}$

$$\mathbf{\Gamma}(\mathbf{h}) \to \mathbf{V} \quad \text{for} \quad |\mathbf{h}| \to \infty$$

In this setting the variance-covariance matrix is simply a mixture of coregionalization matrices

$$\mathbf{V} = \sum_{u=0}^{S} \mathbf{B}_u$$
We realize that when the variables are not intrinsically correlated, it is necessary to analyze each coregionalization matrix $\mathbf{B}_u$ separately. The variance-covariance matrix $\mathbf{V}$ is a blend of different correlation structures stemming from all scales covered by the sampling grid and is likely to be meaningless. Furthermore, coregionalization matrices $\mathbf{B}_u$ can be obtained under any type of stationarity hypothesis, while the variance-covariance matrix $\mathbf{V}$ is only meaningful with data fitting into a framework of second-order stationarity.

EXAMPLE 24.5 A multivariate nested variogram has been fitted to the direct and cross variograms of the three elements copper, lead and zinc sampled in a forest near the town of Brilon, Germany (as described in [197]).
Regionalized correlation coefficients

$$r_{ij}^u = \frac{b_{ij}^u}{\sqrt{b_{ii}^u\; b_{jj}^u}}$$

have been computed for each variable pair at three characteristic spatial scales and are listed on Table 24.5 together with the classical correlation coefficient

$$r_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\; \sigma_{jj}}}$$

          Statistical    Micro-scale    Small scale    Large scale
          correlation    correlation    correlation    correlation
                         (0-10 m)       (10-130 m)     (130-2300 m)
  Cu-Pb      -.08           0.0           -.04           -.36
  Cu-Zn       .42           .57            .31            .42
  Pb-Zn       .35           .23            .46            .11

Table 24.5: Correlations of Brilon geochemical variables.

The three characteristic scales were defined on the basis of the nested variogram model in the following way

- micro-scale: the variation below the minimal sample spacing of 10 m is summarized by a nugget-effect model;

- small scale: the variation at a scale of 10 m to 130 m is represented by a spherical model of 130 m range;

- large scale: the variation above 130 m is captured by a spherical model with a range of 2.3 km.

At the micro-scale, the best correlated pair is copper and zinc with a regionalized correlation of .57: the geochemist to whom these figures were presented thought this was normal, because when copper has a high value at a sample point, zinc also tends to have a high value at the same location.
At the small scale, lead and zinc have a coefficient of .46, which is explained by the fact that when there are high values of lead at some sample points, there are high values of zinc at points nearby within the same zones.
Finally, at the large scale, we notice the negative coefficient of -.36 between copper and lead, while at the two other characteristic scales the correlation is (nearly) zero. This is due to the fact that when copper is present in a zone, this excludes lead and vice-versa.
Naturally this tentative interpretation does not answer all questions. Interpretation of correlation coefficients is difficult because they do not obey an equivalence relation: if A is correlated with B and B with C, this does not imply that A is correlated with C.
At least we can draw one conclusion from the analysis of this table of coefficients: the set of three variables is clearly not intrinsically correlated and correlation strongly depends on spatial scale. The variance-covariance matrix does not describe well the relations between the variables. Multiple linear regression would be a bad idea, and cokriging based on the multivariate nested variogram is the alternative to go for in a regression problem.
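Regionalized correlation coefficients like those of Table 24.5 can be computed in one step from fitted coregionalization matrices. A sketch, assuming all direct sills are non-zero (the function name and array layout are ours):

```python
import numpy as np

def regionalized_correlations(B):
    """r_ij^u = b_ij^u / sqrt(b_ii^u b_jj^u) for coregionalization
    matrices stacked as B with shape (S, N, N)."""
    d = np.sqrt(np.einsum('uii->ui', B))     # sqrt of the direct sills
    return B / (d[:, :, None] * d[:, None, :])
```

Comparing the off-diagonal entries across the structures $u$ is exactly the scale-by-scale reading performed in the example.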
25 Coregionalization Analysis

The geostatistical analysis of multivariate spatial data can be subdivided into two steps
- the analysis of the coregionalization of a set of variables, leading to the definition of a linear model of coregionalization;
- the cokriging of specific factors at characteristic scales.
These techniques have originally been called factorial kriging analysis (from the French analyse krigeante [122]). They make it possible to isolate and to display sources of variation acting at different spatial scales with a different correlation structure.

Regionalized principal component analysis


Principal component analysis can be applied to coregionalization matrices, which
are the variance-covariance matrices describing the correlation structure of a set
of variables at characteristic spatial scales.
Regionalized principal component analysis consists in decomposing each matrix $\mathbf{B}_u$ into eigenvalues and eigenvectors

$$\mathbf{B}_u = \mathbf{A}_u \mathbf{A}_u^{T} \quad \text{with} \quad \mathbf{A}_u = \mathbf{Q}_u \sqrt{\mathbf{\Lambda}_u} \quad \text{and} \quad \mathbf{Q}_u \mathbf{Q}_u^{T} = \mathbf{I}$$


The matrices $\mathbf{A}_u$ specify the coefficients of the linear model of coregionalization. The transformation coefficients

$$a_{ip}^u = \sqrt{\lambda_p^u}\;\; q_{ip}^u$$

are the covariances between the original variables $Z_i(\mathbf{x})$ and the factors $Y_p^u(\mathbf{x})$. They can be used to plot the position of the variables on correlation circles for each characteristic spatial scale of interest. These plots are helpful to compare the correlation structure of the variables at the different spatial scales.
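A sketch of computing these loadings for every spatial scale at once; clipping the eigenvalues at zero is our guard against round-off, not part of the text:

```python
import numpy as np

def coregionalization_loadings(B):
    """For each coregionalization matrix decompose B_u = A_u A_u^T
    with A_u = Q_u sqrt(Lambda_u); column p of A_u holds the
    covariances a_ip^u between the variables and factor p."""
    A = []
    for Bu in B:
        lam, q = np.linalg.eigh(Bu)
        lam = np.clip(lam, 0.0, None)    # guard against tiny negatives
        order = np.argsort(lam)[::-1]    # leading factor first
        A.append(q[:, order] * np.sqrt(lam[order]))
    return np.array(A)
```

Plotting the rows of each $\mathbf{A}_u$ against the first two factor columns gives the correlation circles discussed above.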
The correlation circle plots can be used to identify intrinsic correlation: if the plots show the same patterns of correlation, this means that the eigenvectors of the coregionalization matrices are similar and the matrices only differ by their eigenvalues. Thus the matrices $\mathbf{B}_u$ are all proportional to a matrix $\mathbf{B}$, the coregionalization matrix of the intrinsic correlation model.
Applications of regionalized principal component analysis (or factor analysis) have been performed in the fields of geochemical exploration ([195], [175], [15], [199]), soil science ([200], [77], [72], [78], [146]) and hydrogeology ([156], [74]), to mention just a few.
Coregionalization Analysis 161

Generalizing the analysis


The analysis can be generalized by choosing eigenvectors which are orthogonal with respect to a symmetric matrix $\mathbf{M}_u$ representing a metric

$$\mathbf{A}_u = \mathbf{Q}_u\, \mathbf{M}_u \sqrt{\mathbf{\Lambda}_u} \quad \text{with} \quad \mathbf{Q}_u\, \mathbf{M}_u\, \mathbf{Q}_u^{T} = \mathbf{I}$$

A possibility is to use the metric

$$\mathbf{M}_u = \sum_{v=0}^{S} \mathbf{B}_v$$

which is equivalent to the variance-covariance matrix $\mathbf{V}$ in a second order stationary model. This metric generates a contrast between the global variation and the variation at a specific scale described by a coregionalization matrix $\mathbf{B}_u$.

COMMENT 25.1 This type of metric has also been used frequently by numerous authors (see [202], [157], [185], [14] among others) to decompose the matrix $\mathbf{\Gamma}^*(\mathfrak{H}_1)$ of experimental direct and cross variograms for the first distance class $\mathfrak{H}_1$, looking for the eigenvectors of $\mathbf{\Gamma}^*(\mathfrak{H}_1)$ with respect to $\mathbf{V}$.
This noise removal procedure has been compared to coregionalization analysis in [200] and [198]. The basic principle is illustrated with a simple model which shows why the approach can be successful in removing micro-scale variation.
Suppose we have a multivariate nested covariance function model

$$\mathbf{C}(\mathbf{h}) = \mathbf{B}_0\; \rho_{\mathrm{nug}}(\mathbf{h}) + \mathbf{B}_1\; \rho_{\mathrm{sph}}(\mathbf{h})$$

where $\rho_{\mathrm{nug}}(\mathbf{h})$ is a nugget-effect and $\rho_{\mathrm{sph}}(\mathbf{h})$ a spherical correlation function. The variance-covariance matrix in this model is

$$\mathbf{V} = \mathbf{B}_0 + \mathbf{B}_1$$

Let $\mathfrak{H}_1$ be the distance class grouping vectors $\mathbf{h}$ greater than zero and shorter than the range of the spherical correlation function. Then

$$\mathbf{C}(\mathbf{h}) = \mathbf{B}_1\; \rho_{\mathrm{sph}}(\mathbf{h}) \quad \text{for} \quad \mathbf{h} \in \mathfrak{H}_1$$

and the eigenvectors of $\mathbf{C}(\mathfrak{H}_1)$ with respect to $\mathbf{V}$ are equivalent to those of $\mathbf{B}_1$ with respect to $\mathbf{B}_0 + \mathbf{B}_1$, which recalls the setting of discriminant analysis. Corresponding factors reduce to a minimum the influence of noise (micro-scale variation), as reflected in the matrix $\mathbf{B}_0$.

Regionalized canonical and redundancy analysis


When the set of variables is split into two groups, any coregionalization matrix can be partitioned into

$$\mathbf{B}_u = \begin{pmatrix} \mathbf{B}_u^{11} & \mathbf{B}_u^{12} \\ \mathbf{B}_u^{21} & \mathbf{B}_u^{22} \end{pmatrix}$$

where $\mathbf{B}_u^{11}$, $\mathbf{B}_u^{22}$ are the within group coregionalization matrices while $\mathbf{B}_u^{12}$ represents the between groups coregionalization.
The formalism of canonical analysis can be applied to that specific spatial scale of index $u$

$$\mathbf{B}_u^{11} = \mathbf{A}_u^{11} \big(\mathbf{A}_u^{11}\big)^{T}$$

where

$$\mathbf{A}_u^{11} = \mathbf{Q}_u^{11}\, \mathbf{M}_u^{11} \sqrt{\mathbf{\Lambda}_u^{11}} \quad \text{with} \quad \mathbf{Q}_u^{11}\, \mathbf{M}_u^{11}\, \big(\mathbf{Q}_u^{11}\big)^{T} = \mathbf{I}$$

and

$$\mathbf{M}_u^{11} = \mathbf{B}_u^{12} \big(\mathbf{B}_u^{22}\big)^{-1} \mathbf{B}_u^{21}$$

with non-singular matrices $\mathbf{B}_u^{22}$.
The results of canonical analysis are often deceptive. Principal component analysis with instrumental variables (RAO [144]), which has been reinvented in psychometrics under the name of redundancy analysis (an account is given in [188]), can be a more appropriate alternative. It is based on the metric

$$\mathbf{M}_u^{11} = \mathbf{B}_u^{12}\; \mathbf{B}_u^{21}$$

GOOVAERTS [73] has first applied a regionalized redundancy analysis to examine the links between soil properties and chemical variables from banana leaves. See also [103] for an application to remote sensing and geochemical ground data, which turned out to be intrinsically correlated, so that redundancy analysis was based on $\mathbf{V}$ instead of $\mathbf{B}_u$.

Cokriging regionalized factors


The linear model of coregionalization defines factors at particular spatial scales. We wish to estimate a regionalized factor from data in a local neighborhood around each estimation location $\mathbf{x}_0$.
The estimator of a specific factor $Y_{p_0}^{u_0}(\mathbf{x})$ at a location $\mathbf{x}_0$ is a weighted average of data from variables in the neighborhood with unknown weights $w_\alpha^i$

$$Y_{p_0}^{u_0\,*}(\mathbf{x}_0) = \sum_{i=1}^{N} \sum_{\alpha=1}^{n_i} w_\alpha^i\;\; Z_i(\mathbf{x}_\alpha)$$

In the framework of local second-order stationarity, in which local means $m_i$ for the neighborhood around $\mathbf{x}_0$ are meaningful, an unbiased estimator is built for the factor (of zero mean, by construction) by using weights summing up to zero for each variable

$$\mathrm{E}\big[\, Y_{p_0}^{u_0\,*}(\mathbf{x}_0) - Y_{p_0}^{u_0}(\mathbf{x}_0) \,\big] = \sum_{i=1}^{N} m_i \underbrace{\sum_{\alpha=1}^{n_i} w_\alpha^i}_{0} = 0$$

The effect of the constraints on the weights is to filter out the local means of the variables $Z_i(\mathbf{x})$.
The estimation variance $\sigma_E^2$ is

$$\sigma_E^2 = \mathrm{E}\Big[ \big( Y_{p_0}^{u_0\,*}(\mathbf{x}_0) - Y_{p_0}^{u_0}(\mathbf{x}_0) \big)^2 \Big] = 1 + \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{\alpha=1}^{n_i} \sum_{\beta=1}^{n_j} w_\alpha^i\, w_\beta^j\; C_{ij}(\mathbf{x}_\alpha - \mathbf{x}_\beta) \;-\; 2 \sum_{i=1}^{N} \sum_{\alpha=1}^{n_i} w_\alpha^i\; a_{i p_0}^{u_0}\; \rho_{u_0}(\mathbf{x}_\alpha - \mathbf{x}_0)$$

The minimal estimation variance is realized by the cokriging system

$$\sum_{j=1}^{N} \sum_{\beta=1}^{n_j} w_\beta^j\; C_{ij}(\mathbf{x}_\alpha - \mathbf{x}_\beta) - \mu_i = a_{i p_0}^{u_0}\; \rho_{u_0}(\mathbf{x}_\alpha - \mathbf{x}_0) \qquad \text{for } i = 1, \ldots, N;\ \alpha = 1, \ldots, n_i$$

$$\sum_{\beta=1}^{n_i} w_\beta^i = 0 \qquad \text{for } i = 1, \ldots, N$$

We find in the right hand side of this system the transformation coefficients $a_{i p_0}^{u_0}$ of the factor of interest. These coefficients are multiplied by values of the spatial correlation function $\rho_{u_0}(\mathbf{h})$, which describes the correlation at the scale of interest.
The factor cokriging is used to estimate a regionalized factor at the nodes of a regular grid which serves to draw a map.
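A sketch of assembling this cokriging system for the simple case where all $N$ variables are sampled at the same $n$ points; the isotropic covariance assumption and every name here are illustrative, not the text's notation:

```python
import numpy as np

def factor_cokriging_weights(coords, x0, cov_fns, a0, rho0):
    """Cokriging system for a regionalized factor.

    coords  : (n, dim) shared sample locations for all N variables
    cov_fns : N x N nested list, cov_fns[i][j](d) = C_ij at distance d
    a0      : (N,) transformation coefficients a_{i p0}^{u0}
    rho0    : correlation function of the scale of interest
    Returns the (N, n) matrix of weights w_alpha^i."""
    n, N = len(coords), len(a0)
    size = N * n + N                    # weights + one multiplier per variable
    A = np.zeros((size, size))
    b = np.zeros(size)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - x0, axis=1)
    for i in range(N):
        for j in range(N):
            A[i*n:(i+1)*n, j*n:(j+1)*n] = cov_fns[i][j](d)
        A[i*n:(i+1)*n, N*n + i] = -1.0  # Lagrange multiplier mu_i
        A[N*n + i, i*n:(i+1)*n] = 1.0   # zero-sum constraint per variable
        b[i*n:(i+1)*n] = a0[i] * rho0(d0)
    w = np.linalg.solve(A, b)
    return w[:N*n].reshape(N, n)
```

The zero-sum constraint rows reproduce the mean-filtering property of the system: the returned weights sum to zero for each variable.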

Regionalized multivariate analysis


Cokriging a factor is more cumbersome and computationally more intensive than kriging it. Coregionalization analysis is more lengthy than a traditional analysis which ignores spatial scale. When is all this effort necessary and worthwhile? When can it be avoided? The answer is based on the notion of intrinsic correlation.
The steps of both a classical and a regionalized multivariate analysis (MVA) for spatial data are summarized on Figure 25.1. The key question to investigate is whether the correlation between variables is dependent on spatial scale. Three ways to test for scale-dependent correlation have been described

1. codispersion coefficients $cc_{ij}(\mathbf{h})$ can be computed and plotted: if they are not constant for each variable pair, the correlation structure of the variable set is affected by spatial scale;

Variables $Z_i(\mathbf{x})$
→ Intrinsic correlation?
  Yes: MVA on $\mathbf{V}$ → transform into $Y$ → krige $Y_p$ from $Y_p(\mathbf{x}_\alpha)$ → map of factor
  No: fit $\mathbf{\Gamma}(\mathbf{h}) = \sum \mathbf{B}_u g_u(\mathbf{h})$ → MVA on $\mathbf{B}_u$ → cokrige $Y_{p_0}^{u_0}$ from $Z_i(\mathbf{x}_\alpha)$ → map of factor

Figure 25.1: Classical multivariate analysis followed by a kriging of the factors (on the left) versus regionalized multivariate analysis (on the right).

2. cross variograms between principal components of the variables can be computed: if they are not zero for each principal component pair at any lag $\mathbf{h}$ (like on Figure 22.1 on page 142), the classical principal components are meaningless, because the variance-covariance matrix of the variable set is merely a mixture of different variance-covariance structures at various spatial scales;

3. plots of correlation circles in a regionalized principal component analysis can be examined: if the patterns of association between the variables are not identical for the coregionalization matrices, the intrinsic correlation model is not appropriate for this data set. With only few variables it is possible to look directly at a table of regionalized correlation coefficients instead of the regionalized principal components (like in Example 24.5 on page 158).

If the data appear to be intrinsically correlated, we can apply any classical method of multivariate analysis, calculate the direct variograms of the factors, krige them on a grid and represent them as maps. But if correlation is affected by spatial scale, we need to fit a linear model of coregionalization and to cokrige the factors, as suggested by the right hand path of Figure 25.1.
26 Kriging a Complex Variable

The question of how to model and krige a complex variable has been analyzed by LAJAUNIE & BEJAOUI [99] and GRZEBYK [80]. The covariance structure can be approached in two ways: either by modeling the real and imaginary parts of the complex covariance or by modeling the coregionalization of the real and imaginary parts of the random function. This opens several possibilities for (co-)kriging the complex random function.

Coding directional data as a complex variable


Complex random functions can be useful to model directional data in two spatial dimensions. For analyzing and mapping a wind field we code the direction $\Theta(\mathbf{x})$ and the intensity $R(\mathbf{x})$ of wind at each location of the field as a complex variable

$$Z(\mathbf{x}) = R(\mathbf{x})\; e^{i\,\Theta(\mathbf{x})}$$

An alternate representation of $Z(\mathbf{x})$ can be set up using the coordinates of the complex plane

$$Z(\mathbf{x}) = U(\mathbf{x}) + i\, V(\mathbf{x})$$

as shown on Figure 26.1.
We shall use this second representation of the random function.

Complex covariance function


We assume that $Z(\mathbf{x})$, $U(\mathbf{x})$ and $V(\mathbf{x})$ are second order stationary and centered. The covariance function of $Z(\mathbf{x})$ is defined as

$$C(\mathbf{h}) = \mathrm{E}\big[\, Z(\mathbf{x}+\mathbf{h})\;\; \overline{Z(\mathbf{x})} \,\big] = C^{\mathrm{Re}}(\mathbf{h}) + i\, C^{\mathrm{Im}}(\mathbf{h})$$

where $\overline{Z(\mathbf{x})} = U(\mathbf{x}) - i\, V(\mathbf{x})$ is the complex conjugate.
Let $C_{UU}(\mathbf{h})$, $C_{VV}(\mathbf{h})$ and $C_{UV}(\mathbf{h})$ be the (real) direct and cross covariance functions of $U(\mathbf{x})$ and $V(\mathbf{x})$. Then the complex covariance $C(\mathbf{h})$ can be expressed as

$$C(\mathbf{h}) = C_{UU}(\mathbf{h}) + C_{VV}(\mathbf{h}) - i\, C_{UV}(\mathbf{h}) + i\, C_{VU}(\mathbf{h}) = C_{UU}(\mathbf{h}) + C_{VV}(\mathbf{h}) + i\, \big( C_{UV}(-\mathbf{h}) - C_{UV}(\mathbf{h}) \big)$$
Kriging a Complex Variable 167

Figure 26.1: Coding a wind vector in the complex plane.

The real part of the complex covariance

$$C^{\mathrm{Re}}(\mathbf{h}) = C_{UU}(\mathbf{h}) + C_{VV}(\mathbf{h})$$

is even and a covariance function, while the imaginary part

$$C^{\mathrm{Im}}(\mathbf{h}) = C_{UV}(-\mathbf{h}) - C_{UV}(\mathbf{h})$$

is odd and not a covariance function.

Complex kriging

The linear combination for estimating, on the basis of a complex covariance (CC), the centered variable $Z(\mathbf{x})$ at a location $\mathbf{x}_0$ from data at locations $\mathbf{x}_\alpha$ with weights $w_\alpha \in \mathbb{C}$ is

$$Z_{CC}^*(\mathbf{x}_0) = \sum_{\alpha=1}^{n} w_\alpha\; Z(\mathbf{x}_\alpha)$$

In terms of the real and imaginary parts of $Z(\mathbf{x})$ it can be written as

$$Z_{CC}^*(\mathbf{x}_0) = \sum_{\alpha=1}^{n} \Big[ \big( w_\alpha^{\mathrm{Re}}\, U(\mathbf{x}_\alpha) - w_\alpha^{\mathrm{Im}}\, V(\mathbf{x}_\alpha) \big) + i\, \big( w_\alpha^{\mathrm{Re}}\, V(\mathbf{x}_\alpha) + w_\alpha^{\mathrm{Im}}\, U(\mathbf{x}_\alpha) \big) \Big]$$


EXERCISE 26.1 From the variance of the linear combination

$$\mathrm{var}\Big( \sum_{\alpha=1}^{n} w_\alpha\, Z(\mathbf{x}_\alpha) \Big) = \sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} w_\alpha\; \overline{w_\beta}\;\; C(\mathbf{x}_\alpha - \mathbf{x}_\beta)$$

the following inequality can be deduced

$$\sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} \big( w_\alpha^{\mathrm{Re}} w_\beta^{\mathrm{Re}} + w_\alpha^{\mathrm{Im}} w_\beta^{\mathrm{Im}} \big)\; C^{\mathrm{Re}}(\mathbf{x}_\alpha - \mathbf{x}_\beta) + \sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} \big( w_\alpha^{\mathrm{Im}} w_\beta^{\mathrm{Re}} - w_\alpha^{\mathrm{Re}} w_\beta^{\mathrm{Im}} \big)\; C^{\mathrm{Im}}(\mathbf{x}_\alpha - \mathbf{x}_\beta) \;\geq\; 0$$

Show that this inequality implies that $C^{\mathrm{Re}}(\mathbf{h})$ is a positive definite function.

EXERCISE 26.2 On the basis of the linear combination $(1+i)\, Z(0) + (1-i)\, Z(\mathbf{h})$, show that

$$|\,C^{\mathrm{Im}}(\mathbf{h})\,| \leq \sigma^2$$

Complex kriging consists in computing the complex weights which are solution of the simple kriging system

$$\sum_{\beta=1}^{n} w_\beta\; C(\mathbf{x}_\alpha - \mathbf{x}_\beta) = C(\mathbf{x}_0 - \mathbf{x}_\alpha) \qquad \text{for } \alpha = 1, \ldots, n$$

This is equivalent to the following system, in which the real and imaginary parts of the weights have been separated

$$\sum_{\beta=1}^{n} \big( w_\beta^{\mathrm{Re}}\; C^{\mathrm{Re}}(\mathbf{x}_\alpha - \mathbf{x}_\beta) - w_\beta^{\mathrm{Im}}\; C^{\mathrm{Im}}(\mathbf{x}_\alpha - \mathbf{x}_\beta) \big) = C^{\mathrm{Re}}(\mathbf{x}_0 - \mathbf{x}_\alpha) \qquad \forall \alpha$$

$$\sum_{\beta=1}^{n} \big( w_\beta^{\mathrm{Re}}\; C^{\mathrm{Im}}(\mathbf{x}_\alpha - \mathbf{x}_\beta) + w_\beta^{\mathrm{Im}}\; C^{\mathrm{Re}}(\mathbf{x}_\alpha - \mathbf{x}_\beta) \big) = C^{\mathrm{Im}}(\mathbf{x}_0 - \mathbf{x}_\alpha) \qquad \forall \alpha$$

In matrix notation we have

$$\begin{pmatrix} \mathbf{C}^{\mathrm{Re}} & -\mathbf{C}^{\mathrm{Im}} \\ \mathbf{C}^{\mathrm{Im}} & \mathbf{C}^{\mathrm{Re}} \end{pmatrix} \begin{pmatrix} \mathbf{w}^{\mathrm{Re}} \\ \mathbf{w}^{\mathrm{Im}} \end{pmatrix} = \begin{pmatrix} \mathbf{c}^{\mathrm{Re}} \\ \mathbf{c}^{\mathrm{Im}} \end{pmatrix}$$

To implement complex kriging it is necessary to model $C^{\mathrm{Re}}(\mathbf{h})$ and $C^{\mathrm{Im}}(\mathbf{h})$ in a consistent way such that $C(\mathbf{h})$ is a complex covariance function. This question is postponed to the end of this chapter.
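The complex kriging system can also be solved directly in complex arithmetic instead of through the doubled real system. A sketch assuming centered data and a user-supplied complex covariance (tested here with a real, even one, for which $C^{\mathrm{Im}} = 0$):

```python
import numpy as np

def complex_simple_kriging(coords, z, x0, cov):
    """Solve sum_b w_b C(x_a - x_b) = C(x0 - x_a) for the complex
    weights and return the estimate sum_a w_a Z(x_a); z must be
    centered and cov a valid complex covariance of the lag vector."""
    lags = coords[:, None, :] - coords[None, :, :]
    A = cov(lags)                    # data-to-data covariance matrix
    b = cov(x0[None, :] - coords)    # data-to-target covariances
    w = np.linalg.solve(A, b)
    return w @ z
```

With a real covariance the weights come out real and the estimator coincides with a separate simple kriging of the real and imaginary parts, as discussed further below.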

Cokriging of the real and imaginary parts


An alternative approach consists in relying on the coregionalization of the real and imaginary parts to make the simple cokriging of $Z(\mathbf{x})$ with the estimator

$$Z_{CK}^*(\mathbf{x}_0) = U_{CK}^*(\mathbf{x}_0) + i\, V_{CK}^*(\mathbf{x}_0)$$

where

$$U_{CK}^*(\mathbf{x}_0) = \sum_{\alpha=1}^{n} \big( \mu_\alpha^1\, U(\mathbf{x}_\alpha) + \nu_\alpha^1\, V(\mathbf{x}_\alpha) \big) \qquad V_{CK}^*(\mathbf{x}_0) = \sum_{\alpha=1}^{n} \big( \mu_\alpha^2\, U(\mathbf{x}_\alpha) + \nu_\alpha^2\, V(\mathbf{x}_\alpha) \big)$$

EXERCISE 26.3 Show that the estimation variance of $Z_{CK}^*(\mathbf{x}_0)$ is equal to the sum of the estimation variances of the real and imaginary parts.
As the estimators for $U(\mathbf{x})$ and $V(\mathbf{x})$ both have the same structure, we shall only treat the cokriging of the former.

EXERCISE 26.4 Compute the estimation variance $\mathrm{var}\big( U(\mathbf{x}_0) - U_{CK}^*(\mathbf{x}_0) \big)$.


For cokriging $U(\mathbf{x})$ from the real and imaginary parts of the data $Z(\mathbf{x}_\alpha)$ we have the system

$$\begin{pmatrix} \mathbf{C}_{UU} & \mathbf{C}_{UV} \\ \mathbf{C}_{VU} & \mathbf{C}_{VV} \end{pmatrix} \begin{pmatrix} \boldsymbol{\mu}^1 \\ \boldsymbol{\nu}^1 \end{pmatrix} = \begin{pmatrix} \mathbf{c}_{UU} \\ \mathbf{c}_{UV} \end{pmatrix}$$

where the blocks $\mathbf{C}_{UU}$ and $\mathbf{C}_{VV}$ are the left hand sides of the simple kriging systems of $U(\mathbf{x})$ and $V(\mathbf{x})$, respectively. The block $\mathbf{C}_{UV} = \mathbf{C}_{VU}^{T}$ contains the cross covariance values between all data locations. The vector $\mathbf{c}_{UU}$ is the right hand side of the simple kriging of $U(\mathbf{x})$ while the vector $\mathbf{c}_{UV}$ groups the cross covariances between all data points and the estimation location.

Complex kriging and cokriging versus a separate kriging

Considering either the data on the complex variable or the corresponding real and imaginary parts, we have three ways of building an estimate of the complex quantity
- the estimator of complex kriging

$$Z_{CC}^*(\mathbf{x}_0) = \mathbf{w}^{T} \mathbf{z} = \mathbf{u}^{T} \mathbf{w}^{\mathrm{Re}} - \mathbf{v}^{T} \mathbf{w}^{\mathrm{Im}} + i\, \big( \mathbf{u}^{T} \mathbf{w}^{\mathrm{Im}} + \mathbf{v}^{T} \mathbf{w}^{\mathrm{Re}} \big)$$

- the estimator based on the coregionalization of $U(\mathbf{x})$ and $V(\mathbf{x})$

$$Z_{CK}^*(\mathbf{x}_0) = \mathbf{u}^{T} \boldsymbol{\mu}^1 + \mathbf{v}^{T} \boldsymbol{\nu}^1 + i\, \big( \mathbf{u}^{T} \boldsymbol{\mu}^2 + \mathbf{v}^{T} \boldsymbol{\nu}^2 \big)$$

- an estimator obtained by kriging separately $U(\mathbf{x})$ and $V(\mathbf{x})$

$$Z_{KS}^*(\mathbf{x}_0) = \mathbf{u}^{T} \mathbf{w}_K^1 + i\; \mathbf{v}^{T} \mathbf{w}_K^2$$

Let us examine the case of intrinsic correlation of $U(\mathbf{x})$ and $V(\mathbf{x})$. The imaginary part of the complex covariance is zero and we have the complex kriging system

$$\begin{pmatrix} \mathbf{C}^{\mathrm{Re}} & \mathbf{0} \\ \mathbf{0} & \mathbf{C}^{\mathrm{Re}} \end{pmatrix} \begin{pmatrix} \mathbf{w}^{\mathrm{Re}} \\ \mathbf{w}^{\mathrm{Im}} \end{pmatrix} = \begin{pmatrix} \mathbf{c}^{\mathrm{Re}} \\ \mathbf{0} \end{pmatrix}$$

and more specifically

$$\begin{pmatrix} (\sigma_{UU} + \sigma_{VV})\, \mathrm{R} & \mathbf{0} \\ \mathbf{0} & (\sigma_{UU} + \sigma_{VV})\, \mathrm{R} \end{pmatrix} \begin{pmatrix} \mathbf{w}^{\mathrm{Re}} \\ \mathbf{w}^{\mathrm{Im}} \end{pmatrix} = \begin{pmatrix} (\sigma_{UU} + \sigma_{VV})\, \mathbf{r}_0 \\ \mathbf{0} \end{pmatrix}$$

For the cokriging system of $U(\mathbf{x})$ with intrinsic correlation we write

$$\begin{pmatrix} \sigma_{UU}\, \mathrm{R} & \sigma_{UV}\, \mathrm{R} \\ \sigma_{VU}\, \mathrm{R} & \sigma_{VV}\, \mathrm{R} \end{pmatrix} \begin{pmatrix} \boldsymbol{\mu}^1 \\ \boldsymbol{\nu}^1 \end{pmatrix} = \begin{pmatrix} \sigma_{UU}\, \mathbf{r}_0 \\ \sigma_{UV}\, \mathbf{r}_0 \end{pmatrix}$$

For both estimators the solution is equivalent to the separate simple kriging of the real and imaginary parts and we find

$$\mathbf{w}^{\mathrm{Re}} = \boldsymbol{\mu}^1 = \boldsymbol{\nu}^2 = \mathbf{w}_K^1 = \mathbf{w}_K^2$$

while all other weights are zero.


The case of an even cross covariance function is remarkable. With an even cross covariance $C_{UV}(\mathbf{h}) = C_{VU}(\mathbf{h})$ the imaginary part of the complex covariance is zero and the imaginary weights of complex kriging vanish. The estimator reduces to

$$Z_{CC}^*(\mathbf{x}_0) = \mathbf{z}^{T} \mathbf{w}^{\mathrm{Re}}$$

The difference between the kriging variance $\sigma_{CC}^2$ of complex kriging and the kriging variance $\sigma_{KS}^2$ of the separate simple kriging is non negative, so that complex kriging provides a poorer solution than the one obtained by a separate kriging of $U(\mathbf{x})$ and $V(\mathbf{x})$ when the cross covariance function is even.

EXERCISE 26.5 Show that with an even cross covariance function the difference of the kriging variances is equal to

$$\sigma_{CC}^2 - \sigma_{KS}^2 = \big( \mathbf{w}_K^1 - \mathbf{w}^{\mathrm{Re}} \big)^{T}\, \mathbf{C}_{UU}\, \big( \mathbf{w}_K^1 - \mathbf{w}^{\mathrm{Re}} \big) + \big( \mathbf{w}_K^2 - \mathbf{w}^{\mathrm{Re}} \big)^{T}\, \mathbf{C}_{VV}\, \big( \mathbf{w}_K^2 - \mathbf{w}^{\mathrm{Re}} \big)$$

and thus non negative.


When the cross covariance function Cuv(h) is not even, complex kriging
represents an intermediate solution, from the point of view of precision, between
the separate kriging and the cokriging of the real and imaginary parts.

Complex covariance function modeling


Knowing the real part of a continuous complex covariance function, its imaginary part can be defined (as a consequence of the Radon-Nikodym theorem) by the relation

$$i\; C^{\mathrm{Im}}(\mathbf{h}) = (\Phi * C^{\mathrm{Re}})(\mathbf{h})$$

where $\Phi$ is a complex distribution whose Fourier transform is a real odd function $\varphi$ with values in the interval $[-1, 1]$. The functions $C^{\mathrm{Im}}(\mathbf{h})$ corresponding to a given real continuous covariance function $C^{\mathrm{Re}}(\mathbf{h})$, such that

$$C(\mathbf{h}) = C^{\mathrm{Re}}(\mathbf{h}) + i\, C^{\mathrm{Im}}(\mathbf{h}) = C^{\mathrm{Re}}(\mathbf{h}) + (\Phi * C^{\mathrm{Re}})(\mathbf{h})$$

is a complex covariance function, are called compatible imaginary parts.
A simple class of compatible imaginary parts can be obtained using

$$\Phi = \tfrac{i}{2}\, \big( \nu - \check{\nu} \big)$$

where $\check{\nu}$ denotes the reflection of the measure $\nu$, i.e. $\check{\nu}(d\boldsymbol{\tau}) = \nu(-d\boldsymbol{\tau})$,

and where $\nu$ is a real bounded measure such that $|\varphi(\mathbf{u})| < 1$ with

$$\varphi(\mathbf{u}) = - \int \sin(\mathbf{u}^{T} \boldsymbol{\tau})\;\; \nu(d\boldsymbol{\tau})$$

The compatible imaginary parts are given by

$$C^{\mathrm{Im}}(\mathbf{h}) = \frac{1}{2} \int \big[ C^{\mathrm{Re}}(\mathbf{h} - \boldsymbol{\tau}) - C^{\mathrm{Re}}(\mathbf{h} + \boldsymbol{\tau}) \big]\;\; \nu(d\boldsymbol{\tau})$$


COMMENT 26.6 The imaginary parts from this class can be viewed as a randomization

$$C^{\mathrm{Im}}(\mathbf{h}) = \tfrac{1}{2}\; \mathrm{E}\big[\, C^{\mathrm{Re}}(\mathbf{h} - \mathbf{T}) - C^{\mathrm{Re}}(\mathbf{h} + \mathbf{T}) \,\big]$$

where the expectation is taken over the random variable $\mathbf{T}$.
The class can be extended (see [80]) by considering real covariance functions $C^{c}(\mathbf{h})$ compatible with a given $C^{\mathrm{Re}}(\mathbf{h})$ in the sense that the positive measures satisfy $\mu^{c} \leq \mu^{\mathrm{Re}}$. Then

$$C^{\mathrm{Im}}(\mathbf{h}) = \frac{1}{2} \int \big[ C^{c}(\mathbf{h} - \boldsymbol{\tau}) - C^{c}(\mathbf{h} + \boldsymbol{\tau}) \big]\;\; \nu(d\boldsymbol{\tau})$$

is a compatible imaginary part, provided that $\big| \int \sin(\mathbf{u}^{T} \boldsymbol{\tau})\, \nu(d\boldsymbol{\tau}) \big| < 1$.
Some properties of the functions $C^{c}(\mathbf{h})$ are
- the difference $C^{\mathrm{Re}}(\mathbf{h}) - C^{c}(\mathbf{h})$ is a positive definite function;
- $C^{c}(0) \leq C^{\mathrm{Re}}(0)$;
- when the integral range (see [126], [102]) of $C^{\mathrm{Re}}(\mathbf{h})$ exists

$$\int C^{c}(\mathbf{h})\, d\mathbf{h} \;\leq\; \int C^{\mathrm{Re}}(\mathbf{h})\, d\mathbf{h}$$

- $C^{c}(\mathbf{h})$ is more regular at the origin than $C^{\mathrm{Re}}(\mathbf{h})$;
- if $C^{c}(\mathbf{h})$ and $C^{\mathrm{Re}}(\mathbf{h})$ have spectral densities, then

$$f^{c}(\omega) \leq f^{\mathrm{Re}}(\omega) \quad \text{for all } \omega$$

In practice, a finite sum based on translations $\boldsymbol{\tau}_k$ is used

$$C^{\mathrm{Im}}(\mathbf{h}) = \frac{1}{2} \sum_{k=1}^{K} p_k\; \big( C^{c}(\mathbf{h} - \boldsymbol{\tau}_k) - C^{c}(\mathbf{h} + \boldsymbol{\tau}_k) \big)$$

with weights $p_k \geq 0$ and $\sum p_k \leq 1$. Instead of only one function $C^{c}(\mathbf{h})$, $K$ functions $C_k^{c}(\mathbf{h})$ can be introduced which represent a family of covariance functions compatible with a given $C^{\mathrm{Re}}(\mathbf{h})$. The translations $\boldsymbol{\tau}_k$ and the models $C_k^{c}(\mathbf{h})$ are chosen after inspection of the graphs of the imaginary part of the experimental covariance function. The coefficients $p_k$ are fitted by least squares.
LAJAUNIE & BEJAOUI [99] provide a few fitting and complex kriging examples using directional data from simulated ocean waves.
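The finite-sum construction above can be sketched as a function factory; the resulting imaginary part is odd by construction (one-dimensional lags and illustrative names, for simplicity):

```python
import numpy as np

def compatible_imaginary_part(c, taus, ps):
    """C_Im(h) = 1/2 sum_k p_k (C(h - tau_k) - C(h + tau_k)) for a
    real covariance c, translations taus and weights p_k >= 0 with
    sum p_k <= 1."""
    ps = np.asarray(ps, dtype=float)
    assert np.all(ps >= 0) and ps.sum() <= 1 + 1e-12
    def c_im(h):
        h = np.asarray(h, dtype=float)
        return 0.5 * sum(p * (c(h - t) - c(h + t))
                         for p, t in zip(ps, taus))
    return c_im
```

In a fitting workflow the translations would be read off the experimental imaginary part and the weights adjusted by least squares, as the text describes.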
27 Bilinear Coregionalization Model

The linear model of coregionalization of real variables implies even cross covariance functions and it was thus formulated with variograms in the framework of intrinsic stationarity. The use of cross variograms however excludes deferred correlations (due to delay effects or phase shifts). A more general model was set up by GRZEBYK [80] [81] which allows for non even cross covariance functions.

Complex linear model of coregionalization


The complex analogue to the intrinsic correlation model is

$$\mathbf{C}(\mathbf{h}) = \mathbf{B}\; \rho(\mathbf{h}) = \mathbf{E}\, \chi(\mathbf{h}) - \mathbf{F}\, \kappa(\mathbf{h}) + i\, \big( \mathbf{E}\, \kappa(\mathbf{h}) + \mathbf{F}\, \chi(\mathbf{h}) \big)$$

where $\rho(\mathbf{h}) = \chi(\mathbf{h}) + i\, \kappa(\mathbf{h})$ is a scalar complex covariance function and $\mathbf{B}$ is a Hermitian positive semi-definite matrix with

$$\mathbf{B} = \mathbf{E} + i\, \mathbf{F}$$

The matrix $\mathbf{E}$ is a symmetric positive semi-definite matrix while $\mathbf{F}$ is antisymmetric.
An underlying linear model with complex coefficients $a_p^i$ and second-order stationary uncorrelated complex random functions $Y_p(\mathbf{x})$ can be written as

$$Z_i(\mathbf{x}) = \sum_{p=1}^{N} a_p^i\;\; Y_p(\mathbf{x})$$

where

$$\mathrm{cov}\big( Y_p(\mathbf{x}+\mathbf{h}),\, Y_p(\mathbf{x}) \big) = \mathrm{E}\big[\, Y_p(\mathbf{x}+\mathbf{h})\;\; \overline{Y_p(\mathbf{x})} \,\big] = \rho(\mathbf{h})$$

$$\mathrm{cov}\big( Y_p(\mathbf{x}+\mathbf{h}),\, Y_q(\mathbf{x}) \big) = 0 \quad \text{for } p \neq q$$

The alternate representation is based on jointly stationary eomponents Up(x)


and Vp(x)

N N
Z;(x) = L (c~ Up(x) - ~ v" (x) ) + i L (c~ v,,(x) + ~ Up(x))
p=l p=l

where the coregionalization of U_p(x) and V_p(x) is identical for all values of
the index p and has the form

    cov( U_p(x+h), U_p(x) ) = C_UU(h) = C_VV(h)

    cov( U_p(x+h), V_p(x) ) = C_UV(h)

while all other covariances are zero.
The coefficients of the factor decomposition and the elements of the complex
coregionalization matrix B are related by

    b_ij = Σ_{p=1}^{N} a_p^i conj(a_p^j)

whereas the elements of E and F are linked to the linear model by

    e_ij = Σ_{p=1}^{N} ( c_p^i c_p^j + d_p^i d_p^j )     f_ij = Σ_{p=1}^{N} ( c_p^j d_p^i − c_p^i d_p^j )

Naturally we can consider a nested complex multivariate covariance function
model of the type

    C(h) = Σ_{u=0}^{S} B_u ρ_u(h)

with the underlying complex linear model of coregionalization

    Z_i(x) = Σ_{u=0}^{S} Σ_{p=1}^{N} a_p^{iu} Y_p^u(x)

and the orthogonality relations

    cov( Y_p^u(x+h), Y_q^v(x) ) = 0   for p ≠ q or u ≠ v

where u and v are the indices of different characteristic spatial or time
scales.

Bilinear model of coregionalization


The real linear model of coregionalization is in particular not adequate for
multivariate time series analysis, where delay effects or phase shifts are
common and cannot be included in a model with even cross covariances. A model
for real random functions with non-even real cross covariance functions can be
derived from the complex linear model of coregionalization (as defined in the
previous section) by taking its real part. In the case of only one spatial
scale (the nested case is analogous) we can drop the index u and have

    Z_i(x) = Σ_{p=1}^{N} c_p^i U_p(x) − Σ_{p=1}^{N} d_p^i V_p(x)

This model, as it is the sum of two linear models, has received the name of
bilinear model of coregionalization. We get a handy model by imposing the
following relations between covariance function terms

    C_UU(h) = C_VV(h) = (1/2) χ(h)

    C_UV(−h) = −C_UV(h) = (1/2) κ(h)

This implies that the cross covariance function C_UV(h) is odd. The cross
covariance function between two real variables is

    C_ij(h) = cov( Z_i(x+h), Z_j(x) )

            = Σ_{p=1}^{N} c_p^i c_p^j C_UU(h) + Σ_{p=1}^{N} d_p^i d_p^j C_VV(h)
              − Σ_{p=1}^{N} c_p^j d_p^i C_VU(h) − Σ_{p=1}^{N} c_p^i d_p^j C_UV(h)

            = Σ_{p=1}^{N} ( c_p^i c_p^j + d_p^i d_p^j ) χ(h)/2
              − Σ_{p=1}^{N} ( c_p^j d_p^i − c_p^i d_p^j ) κ(h)/2

The multivariate covariance function model is real,

    C(h) = (1/2) ( E χ(h) − F κ(h) )

with matrices E and F stemming from a Hermitian positive semi-definite matrix
B,

    B = E + i F

and ρ(h) = χ(h) + i κ(h) being a complex covariance function.
GRZEBYK [80] has studied different algorithms which combine the approach
used for fitting the multivariate nested variogram with a fit of κ(h) when
χ(h) is given. He provides an example of the fit of a covariance model with
non-even cross covariance functions to the coregionalization of data from
three remote sensing channels of a Landsat satellite.
The bilinear coregionalization model is especially well adapted for analyzing
multiple or multivariate time series, where delay effects at various time
scales are common and often are easily interpretable when causal relations
between the variables are known.
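The effect of taking the real part can be checked numerically. In the following sketch all numbers are invented for illustration (and the validity of ρ(h) as a complex covariance is not verified here): a Hermitian positive semi-definite B yields direct covariances that remain even, while the cross covariance is non-even.

```python
import numpy as np

# Numerical sketch of C(h) = (E*chi(h) - F*kappa(h)) / 2 with B = E + iF
# hermitian positive semi-definite. All values are illustrative only.

B = np.array([[2.0, 1.0 + 0.5j],
              [1.0 - 0.5j, 1.5]])          # hermitian
assert np.all(np.linalg.eigvalsh(B) >= 0)  # positive semi-definite
E, F = B.real, B.imag                      # E symmetric, F antisymmetric

def chi(h):    # even real part of rho(h)
    return np.exp(-abs(h))

def kappa(h):  # odd imaginary part of rho(h)
    return 0.3 * h * np.exp(-abs(h))

def C(h):
    return 0.5 * (E * chi(h) - F * kappa(h))

assert np.allclose(np.diag(C(1.0)), np.diag(C(-1.0)))   # direct: even
assert not np.isclose(C(1.0)[0, 1], C(-1.0)[0, 1])      # cross: non-even
```

The diagonal of F is zero, so the direct covariances reduce to e_ii χ(h)/2 and stay even; only the cross terms pick up the odd function κ(h).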
Part E

Non-Stationary Geostatistics
28 Universal Kriging

Universal kriging is a spatial multiple regression implementing a model which


splits the random function into a linear combination of deterministic functions,
known at any point of the region, and a random component, the residual random
function. It turns out that in general this model needs additional specification
because the variogram of the residual random function can only be inferred in
exceptional situations.

Second-order stationary residual


On a domain D we have functions f_l of the coordinates x, which are considered
deterministic, because they are known at any location of the domain.
The random function Z(x) is composed of a deterministic component m(x), the
drift, and a residual random function Y(x) (which is the remainder after
subtracting the deterministic part)

    Z(x) = m(x) + Y(x)

Assuming in this section that Y(x) is second-order stationary with mean zero,
we have

    E[ Z(x) ] = m(x)

We may have some reason to believe that the drift m(x) depends on the
deterministic functions f_l in a linear way

    m(x) = Σ_{l=0}^{L} a_l f_l(x)

where the coefficients a_l are not zero. The function f₀(x) is defined as
constant:

    f₀(x) = 1

For kriging we use the linear combination

    Z*(x₀) = Σ_{α=1}^{n} w_α Z(x_α)

We want no bias:

    E[ Z(x₀) − Z*(x₀) ] = 0

which yields

    m(x₀) − Σ_{α=1}^{n} w_α m(x_α) = 0

and

    Σ_{l=0}^{L} a_l ( f_l(x₀) − Σ_{α=1}^{n} w_α f_l(x_α) ) = 0

As the a_l are non-zero, the following set of constraints on the weights w_α
emerges:

    Σ_{α=1}^{n} w_α f_l(x_α) = f_l(x₀)   for l = 0, ..., L

For the constant function f₀(x) this is the usual condition

    Σ_{α=1}^{n} w_α = 1

Developing the expression for the estimation variance, introducing the
constraints into the objective function together with Lagrange parameters µ_l
and minimizing, we obtain the universal kriging (UK) system

    Σ_{β=1}^{n} w_β C(x_α−x_β) − Σ_{l=0}^{L} µ_l f_l(x_α) = C(x_α−x₀)   for α = 1, ..., n

    Σ_{β=1}^{n} w_β f_l(x_β) = f_l(x₀)   for l = 0, ..., L

and in matrix notation

    [ C   F ] [  w ]   [ c₀ ]
    [ Fᵀ  0 ] [ −µ ] = [ f₀ ]
For this system to have a solution, it is necessary that the matrix

    F = ( f₀, ..., f_L )

is of full column rank, i.e. the column vectors f_l have to be linearly
independent. This means in particular that there can be no other constant
vector besides the vector f₀. Thus the functions f_l(x) have to be selected
with care.
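To make the mechanics concrete, here is a minimal numerical sketch of assembling and solving the UK system (toy 1-D data with a linear drift f₀ = 1, f₁ = x; the spherical model and all values are illustrative, not from the book):

```python
import numpy as np

# Minimal UK sketch: solve [[C, F], [F^T, 0]] [w; -mu] = [c0; f(x0)].
# All data and model parameters below are invented for illustration.

def spherical_cov(h, rng=3.0):
    h = np.abs(h)
    return np.where(h < rng, 1.0 - 1.5 * h / rng + 0.5 * (h / rng) ** 3, 0.0)

x = np.array([0.0, 1.0, 2.5, 4.0])       # sample locations
z = np.array([1.2, 0.8, 1.9, 2.4])       # sample values
x0 = 1.7                                 # estimation location

F = np.column_stack([np.ones_like(x), x])  # drift basis f0 = 1, f1 = x
C = spherical_cov(x[:, None] - x[None, :])
n, p = F.shape

A = np.block([[C, F], [F.T, np.zeros((p, p))]])
b = np.concatenate([spherical_cov(x - x0), [1.0, x0]])

w = np.linalg.solve(A, b)[:n]            # kriging weights
z_star = w @ z
print("UK estimate at x0:", z_star)

# The unbiasedness constraints hold: the weights reproduce f0 and f1 at x0
assert np.isclose(w.sum(), 1.0)
assert np.isclose(w @ x, x0)
```

The two assertions are exactly the universality conditions: the weights interpolate the constant function and the coordinate function at the estimation location.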

Dual Kriging System


It is suitable at this point to examine the question of defining an
interpolation function based on universal kriging with a unique neighborhood.
The interpolator is the product of the vector of samples z with a vector of
weights w_x which depends on the location in the domain

    z*(x) = zᵀ w_x

The weight vector is the solution of the universal kriging system

    [ C   F ] [  w_x ]   [ c_x ]
    [ Fᵀ  0 ] [ −µ_x ] = [ f_x ]

in which all terms dependent on the estimation location have been subscripted
with an x. In this formulation we need to solve the system for each new
interpolation location. As the left-hand matrix does not depend on x, let us
define its inverse (assuming its existence) as

    [ T   U ]
    [ Uᵀ  W ]

The universal kriging system is then

    [  w_x ]   [ T   U ] [ c_x ]
    [ −µ_x ] = [ Uᵀ  W ] [ f_x ]

The interpolator can thus be written

    z*(x) = zᵀ T c_x + zᵀ U f_x

Defining bᵀ = zᵀ T and dᵀ = zᵀ U, the interpolator is a function of the
right-hand side of the universal kriging system

    z*(x) = bᵀ c_x + dᵀ f_x

Combining the data vector z with a vector of zeroes, we can set up the system

    [ T   U ] [ z ]   [ b ]
    [ Uᵀ  W ] [ 0 ] = [ d ]

which, once inverted, yields the dual system of universal kriging

    [ C   F ] [ b ]   [ z ]
    [ Fᵀ  0 ] [ d ] = [ 0 ]

It should be noted that when the variable under investigation is equal to one
of the deterministic functions,

    z(x) = f_l(x)

the interpolator is

    z*(x) = wᵀ f_l

As the weights are constrained to satisfy

    wᵀ f_l = f_l(x)

we see that

    z*(x) = f_l(x) = z(x)

Thus the interpolator is exact for f_l(x).
This clarifies the meaning of the constraints in universal kriging: the
resulting weights are able to interpolate exactly each one of the
deterministic functions.
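The dual formulation can be sketched numerically as follows (the same kind of toy 1-D setting as before; all values are illustrative). The dual weights b and d are computed once, and any location is then interpolated without solving a new system:

```python
import numpy as np

# Dual universal kriging sketch: solve [[C, F], [F^T, 0]] [b; d] = [z; 0]
# once, then interpolate z*(x) = b^T c_x + d^T f_x at any location.
# All data and model parameters are invented for illustration.

def cov(h, rng=3.0):
    h = np.abs(h)
    return np.where(h < rng, 1.0 - 1.5 * h / rng + 0.5 * (h / rng) ** 3, 0.0)

x = np.array([0.0, 1.0, 2.5, 4.0])
z = np.array([1.2, 0.8, 1.9, 2.4])
F = np.column_stack([np.ones_like(x), x])    # drift basis f0 = 1, f1 = x
n, p = F.shape

A = np.block([[cov(x[:, None] - x[None, :]), F],
              [F.T, np.zeros((p, p))]])

bd = np.linalg.solve(A, np.concatenate([z, np.zeros(p)]))
b, d = bd[:n], bd[n:]

def interpolate(xn):
    """z*(x) = b^T c_x + d^T f_x for a new location xn."""
    return b @ cov(x - xn) + d @ np.array([1.0, xn])

# The dual interpolator is exact at the data points
assert np.allclose([interpolate(xi) for xi in x], z)
```

Exactness at the data points follows directly from the first block row of the dual system, C b + F d = z.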

Estimation of the drift


The drift can be considered random if the coefficients multiplying the
deterministic functions are assumed random

    M(x) = Σ_{l=0}^{L} A_l f_l(x)   with E[ A_l ] = a_l

A specific coefficient A_{l₀} can be estimated from the data using the linear
combination

    A*_{l₀} = Σ_{α=1}^{n} w_α^{l₀} Z(x_α)

In order to have no bias we need the L+1 constraints

    Σ_{α=1}^{n} w_α^{l₀} f_l(x_α) = δ_{l l₀} = { 1 if l = l₀
                                               { 0 if l ≠ l₀

Computing the estimation variance of the random coefficient and minimizing
with the constraints, we obtain the system

    Σ_{β=1}^{n} w_β^{l₀} C(x_α−x_β) − Σ_{l=0}^{L} µ_{l l₀} f_l(x_α) = 0   for α = 1, ..., n

    Σ_{β=1}^{n} w_β^{l₀} f_l(x_β) = δ_{l l₀}   for l = 0, ..., L

What is the meaning of the Lagrange multipliers µ_{l l₀}? Starting from

    cov( A*_l, A*_{l₀} ) = Σ_{α=1}^{n} Σ_{β=1}^{n} w_α^l w_β^{l₀} C(x_α−x_β)

taking into account the kriging equations

    Σ_{α=1}^{n} w_α^l Σ_{β=1}^{n} w_β^{l₀} C(x_α−x_β) = Σ_{α=1}^{n} w_α^l Σ_{l'=0}^{L} µ_{l' l₀} f_{l'}(x_α)

and the constraints on the weights

    Σ_{l'=0}^{L} µ_{l' l₀} Σ_{α=1}^{n} w_α^l f_{l'}(x_α) = Σ_{l'=0}^{L} µ_{l' l₀} δ_{l l'}

we see that

    cov( A*_l, A*_{l₀} ) = µ_{l l₀}

The Lagrange parameters µ_{l l₀} represent the covariances between the drift
coefficients.

The estimated drift at a specific location is obtained by combining the
estimated coefficients with the values of the deterministic functions at that
location

    m*(x₀) = Σ_{l=0}^{L} a*_l f_l(x₀)

It can be shown that the same solution is obtained without explicitly
estimating the drift coefficients, simply by following the same steps as for
the kriging of the mean:

    m*(x₀) = Σ_{α=1}^{n} w_α^{KM} Z(x_α)

The corresponding system is

    Σ_{β=1}^{n} w_β^{KM} C(x_α−x_β) − Σ_{l=0}^{L} µ_l^{KM} f_l(x_α) = 0   for α = 1, ..., n

    Σ_{β=1}^{n} w_β^{KM} f_l(x_β) = f_l(x₀)   for l = 0, ..., L

Variogram of the estimated residuals


In the decomposition of Z(x) into drift and a residual random function

    Y(x) = Z(x) − m(x)

we have not discussed how the underlying variogram γ(h) can be inferred,
knowing that both quantities, drift and residual, need to be estimated. The
underlying variogram is defined as

    γ(h) = (1/2) var( Z(x+h) − Z(x) ) = (1/2) E[ ( Y(x+h) − Y(x) )² ]

Experimentally we have access to an estimated residual R*(x) by forming the
difference between an estimated drift M*(x) and Z(x) at data locations x_α

    R*(x_α) = Z(x_α) − M*(x_α) = Z(x_α) − Σ_{l=0}^{L} A*_l f_l(x_α)

The variogram of the estimated residuals between two data locations is

    γ*(x_α, x_β) = (1/2) E[ ( R*(x_α) − R*(x_β) )² ]
                 = γ(x_α, x_β) + (1/2) var( M*(x_α) − M*(x_β) )
                   − cov( ( Z(x_α) − Z(x_β) ), ( M*(x_α) − M*(x_β) ) )

The variogram of the estimated residuals is composed of the underlying
variogram and two terms representing bias.
If S is the set of sample points, we can define the geometric covariogram of
the sample points as

    K_S(h) = ∫_D 1_S(x) 1_S(x+h) dx

where D is the domain of interest and 1_S(x) is the indicator function of the
sample set.
The experimental regional variogram of the estimated residuals is then
computed as

    G*(h) = 1/(2 K_S(h)) ∫_D 1_S(x) 1_S(x+h) ( R*(x+h) − R*(x) )² dx

Taking the expectation of G*(h) gives a three-term expression which is,
however, not equal to the underlying variogram:

    E[ G*(h) ] = T₁ − 2 T₂ + T₃ ≠ γ(h)

where

    T₁ = 1/(2 K_S(h)) ∫_D 1_S(x) 1_S(x+h) E[ ( Z(x+h) − Z(x) )² ] dx

    T₂ = 1/(2 K_S(h)) Σ_{l=0}^{L} ∫_D 1_S(x) 1_S(x+h) E[ A*_l ( Z(x+h) − Z(x) ) ]
         × ( f_l(x+h) − f_l(x) ) dx

    T₃ = 1/(2 K_S(h)) Σ_{l=0}^{L} Σ_{s=0}^{L} ∫_D 1_S(x) 1_S(x+h) cov( A*_l, A*_s )
         × ( f_l(x+h) − f_l(x) ) ( f_s(x+h) − f_s(x) ) dx

The first term can be computed from data while the two other terms represent
the bias, which is generally considerable. Even if we assume the drift to be
known, replacing A*_l by a_l, we still do not obtain the underlying variogram
because

    E[ G*(h) ] = T₁ − T₃ ≠ γ(h)
Two methods exist for attempting to infer the underlying variogram, but they
are well defined only for regularly gridded data (see [115], [31]).
In exceptional situations the underlying variogram is directly accessible:

- when the drift occurs only in one part of a domain which is assumed
  homogeneous, then the variogram model is inferred in the other, stationary
  part of the domain and can be transferred to the non-stationary part;


Figure 28.1: Scatter diagrams of temperature with longitude and latitude: there is a
systematic decrease from west to east, while there is no such trend in the north-south
direction.

- when the drift is not active in a particular direction of space, the
  variogram inferred in that direction can be extended to the other directions
  under an assumption of isotropic behavior of the underlying variogram.

To these important exceptions we can add the case of a weak drift, when the
variogram can reasonably well be inferred at short distances, where the bias
is not so strong. This situation is very similar to the case when ordinary
kriging is applied in a moving neighborhood with a locally stationary model.
The necessity for including drift terms into the kriging system may be subject
to discussion in that case.

EXAMPLE 28.1 On Figure 28.1 we see the scatter diagrams of average January
temperature in Scotland plotted against longitude and latitude (details are
given in [87]). Clearly there is a systematic decrease of mean temperature
from west to east due to the effect of the Gulf Stream which warms up the
western coast of Scotland in the winter. Between north and south there is no
tendency to be seen.
The variograms along longitude and latitude are shown on Figure 28.2. The
east-west experimental variogram is influenced by drift and grows without
bounds while the shape of the north-south variogram suggests a sill: the
latter has been modeled using three spherical functions with ranges of 20, 70
and 150 km. By assuming that the underlying variogram is isotropic, we can use
the north-south variogram model also in the east-west direction. The
interpretation of the east-west experimental variogram is that drift masks off
the underlying variogram in that direction.


Figure 28.2: Variograms of average January temperature data in the east-west and
north-south directions: a model is fitted in the direction with no drift.

A critical look at the universal kriging model

In general it is difficult and cumbersome to infer the underlying variogram
(assuming its existence). Two steps lead to a more sound and workable model:

1. The class of deterministic functions is restricted to functions which are
   translation-invariant and pairwise orthogonal.

2. With translation-invariant drift functions a specific structural analysis
   tool can be defined: the generalized covariance function K(h), of which the
   variogram is only a particular case.

With the second step we understand that the quest for the underlying
variogram can be vain: the residuals can have a structure which is simply not
compatible with a variogram and which can only be captured by a generalized
covariance. This leads us to a powerful theory of which only an elementary
sketch is provided in the next chapter. In the last chapter we discuss
deterministic functions which are not translation-invariant, when dealing with
external drift.
29 Translation Invariant Drift

The characterization of the drift by a linear combination of translation-invariant


deterministic functions opens the gate to the powerful theory of the intrinsic
random functions of order k. A more general tool of structural analysis than the
variogram can be defined in this framework: the generalized covariance function.
A simplified account is given in this chapter.

Exponential/polynomial basis functions


We consider a non-stationary random function Z(x), called an intrinsic random
function of order k (IRF-k),

    Z(x) = Z_k(x) + m_k(x)

consisting of a deterministic part m_k(x), representing the drift as a k-th
order polynomial, and a random part Z_k(x) with an associated generalized
covariance K(h). The polynomial of order k is a linear combination of
functions f_l(x) of the coordinates with coefficients a_l

    m_k(x) = Σ_{l=0}^{L} a_l f_l(x)

The L+1 basis functions f_l(x) need to generate a translation-invariant
vector space. It can be shown that only functions belonging to the class of
exponentials-polynomials fulfill this condition.

COMMENT 29.1 In two spatial dimensions with coordinate vectors x = (x₁, x₂)ᵀ
the following monomials are often used

    f₀(x) = 1,  f₁(x) = x₁,  f₂(x) = x₂,  f₃(x) = (x₁)²,  f₄(x) = x₁·x₂,  f₅(x) = (x₂)²

which generate a translation-invariant vector space. The actual number L+1 of
basis functions of the drift depends on the degree k of the drift in the
following way:

    for k = 0 we have one basis function and L = 0,
    for k = 1 we have three basis functions and L = 2,
    for k = 2 we have six basis functions and L = 5.
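The count above follows from the number of monomials x₁^i · x₂^j of total degree i + j ≤ k, which is (k+1)(k+2)/2; a one-line check:

```python
from math import comb

# In two dimensions the drift of degree k uses all monomials x1**i * x2**j
# with i + j <= k, so the number of basis functions is L+1 = (k+1)(k+2)/2,
# i.e. "k+2 choose 2".
for k in (0, 1, 2):
    print(k, comb(k + 2, 2))
```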

With estimation problems in mind we consider weights w_α which interpolate
exactly the terms a_l f_l(x) of the drift polynomial

    Σ_{α=1}^{n} w_α a_l f_l(x_α) = a_l f_l(x₀)

Obviously the coefficients a_l are redundant and can be dropped for the
interpolation of the basis functions f_l(x).
With weights constrained as

    Σ_{α=1}^{n} w_α f_l(x_α) = f_l(x₀)   for l = 0, ..., L

for any linear estimator

    Z*(x₀) = Σ_{α=1}^{n} w_α Z(x_α)

the estimation error vanishes on average

    E[ Z*(x₀) − Z(x₀) ] = E[ Z*_k(x₀) − Z_k(x₀) ] = 0

COMMENT 29.2 By introducing a weight w₀ = −1 the constraints become

    Σ_{α=0}^{n} w_α f_l(x_α) = 0   for l = 0, ..., L

and we can write

    E[ Z*(x₀) − Z(x₀) ] = E[ Σ_{α=0}^{n} w_α Z_k(x_α) ] = 0

where the weights of zero sum filter the translation-invariant drift.

Generalized covariance function


A function K(h) is the generalized covariance function of an intrinsically
stationary random function of order k if the variance of any linear
combination of values Z_k(x_α) with weights w_α equals

    var( Σ_{α=0}^{n} w_α Z_k(x_α) ) = E[ ( Σ_{α=0}^{n} w_α Z_k(x_α) )² ]
                                    = Σ_{α=0}^{n} Σ_{β=0}^{n} w_α w_β K(x_α−x_β)

for any set of weights constrained to

    Σ_{α=0}^{n} w_α f_l(x_α) = 0   for l = 0, ..., L

A generalized covariance is a k-th order conditionally positive definite
function:

    Σ_{α=0}^{n} Σ_{β=0}^{n} w_α w_β K(x_α−x_β) ≥ 0
        for all weights with Σ_{α=0}^{n} w_α f_l(x_α) = 0, l = 0, ..., L

From this definition we see that the negative of the variogram, −γ(h), is the
generalized covariance of an intrinsic random function of order zero. The
theory of intrinsically stationary random functions of order k provides a
generalization of the random functions with second-order stationary
increments, which themselves were set in a broader framework than the
second-order stationary functions.
The inference of the highest order of the drift polynomial to be filtered is
usually performed using automatic algorithms, a procedure which we shall not
describe.

EXAMPLE 29.3 A possible model for a k-th order generalized covariance is

    K_θ(h) = Γ(−θ/2) |h|^θ   with 0 < θ < 2k+2

where Γ(·) is the gamma function.

EXAMPLE 29.4 From the preceding model a generalized covariance can be derived
which has been (unfortunately) called the polynomial model

    K_pol(h) = Σ_{u=0}^{k} b_u (−1)^{u+1} |h|^{2u+1}   with b_u ≥ 0

The conditions on the coefficients b_u are sufficient. Looser bounds on these
coefficients are given in MATHERON [119].
The polynomial generalized covariance model is a nested model which has the
striking property that it is built up with several structures having a
different behavior at the origin. For k = 1 (linear drift) the term for u = 0
is linear at the origin and is adequate for the description of a regionalized
variable which is continuous but not differentiable. The term of the
polynomial generalized covariance for u = 1 is cubic and thus appropriate for
differentiable regionalized variables.
If a nugget-effect component is added to the polynomial generalized
covariance model, discontinuous phenomena are also covered. This extended
polynomial generalized covariance model is flexible with respect to the
behavior at the origin and well suited for automatic fitting: inessential
structures get zero weights b_u, provided the fitting is correct.
The polynomial generalized covariance model can be used with a moving kriging
neighborhood.
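A small sketch evaluating this polynomial generalized covariance (the coefficients below are arbitrary illustrative values for k = 1):

```python
import numpy as np

# Evaluate K_pol(h) = sum_{u=0}^{k} b_u * (-1)**(u+1) * |h|**(2u+1), b_u >= 0.
# The coefficients are illustrative only.

def k_pol(h, b):
    h = np.abs(np.asarray(h, dtype=float))
    return sum(bu * (-1.0) ** (u + 1) * h ** (2 * u + 1)
               for u, bu in enumerate(b))

b = [0.7, 0.05]   # u = 0: linear at the origin; u = 1: cubic
assert k_pol(0.0, b) == 0.0
assert np.isclose(k_pol(1.0, b), -0.7 + 0.05)
```

The u = 0 term dominates near the origin (linear, non-differentiable behavior), while the cubic u = 1 term contributes the differentiable part.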

Spatial and temporal drift

To motivate the problem of spaee-time drift let us take the example of earth
magnetism measured from a traveling ship (see SEGURET & HUCHON [171]).
When studying the magnetism of the earth near the equator, a strong variation in
time is observed whieh is due to the 24 hour rotation of the earth. Measurements
taken from a ship which zigzags during several days in an equatorial zone need
to be corrected from the effect of this periodic variation.
The magnetism is modeled with a space-time random function Z(x, t) using
two uneorrelated intrinsie random functions of order k, Zk,w(X) and Zk,w(t), as
weH as spatial polynomial drift mk(x) and temporal periodie drift mw(t)

Z(x, t) = Zk,w(X) + mk(x) + Zk,w(t) + mw(t)

where x is a point of the spatial domain V and t is a time coordinate.


The variation in time of earth magnetism is known from stations on the land.
Near the equator a trigonometrie drift is dominant

mw,<p(t) = sin(wt + <p)


with aperiod of w=24 hours and aphase <po Using a weH-known relation this
can be rewritten as
mw(t) = a sin(wt) + b cos(wt)

where a = eos<p and b = sin<p are multiplieative eonstants whieh will be ex-
pressed implicitly in the weights, so that the phase parameter <p does not need
to be provided explieitly.
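The rewriting rests on the sine addition formula; a quick numerical check (with assumed illustrative values for the frequency w and the phase φ):

```python
import math

# Check the identity behind the temporal drift:
#   sin(w t + phi) = cos(phi) * sin(w t) + sin(phi) * cos(w t)
# so a = cos(phi), b = sin(phi) can be absorbed into the kriging weights.

w = 2 * math.pi / 24.0    # frequency for a 24-hour period (illustrative)
phi = 0.8                 # arbitrary phase
for t in (0.0, 3.5, 12.0, 20.25):
    lhs = math.sin(w * t + phi)
    rhs = math.cos(phi) * math.sin(w * t) + math.sin(phi) * math.cos(w * t)
    assert abs(lhs - rhs) < 1e-12
```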

Filtering temporal variation

In an equatorial region the time component Z_{k,w}(t) of magnetism is usually
present as weak noise while the temporal drift dominates the time dimension.
Neglecting the random variation in time, the model can be expressed as the
approximation

    Z(x, t) ≅ Z_{k,w}(x) + m_k(x) + m_w(t)

For mapping magnetism without the bias of the periodic variation in time, the
generalized covariance is inferred taking into account the space-time drift.
Subsequently the temporal drift is filtered by setting the corresponding
equations to zero on the right-hand side of the following universal kriging
system:

    Σ_{β=1}^{n} w_β K(x_α−x_β) − Σ_{l=0}^{L} µ_l f_l(x_α) − ν₁ sin(w t_α) − ν₂ cos(w t_α)
        = K(x_α−x₀)   for α = 1, ..., n

    Σ_{β=1}^{n} w_β f_l(x_β) = f_l(x₀)   for l = 0, ..., L

    Σ_{β=1}^{n} w_β sin(w t_β) = 0

    Σ_{β=1}^{n} w_β cos(w t_β) = 0

If however the random component Z_{k,w}(t) is significant, we need to infer
its generalized covariance K(τ) and to filter subsequently
Z(t) = Z_{k,w}(t) + m_w(t) using the system

    Σ_{β=1}^{n} w_β K(x_α−x_β, t_α−t_β) − Σ_{l=0}^{L} µ_l f_l(x_α) − ν₁ sin(w t_α) − ν₂ cos(w t_α)
        = K(x_α−x₀)   for α = 1, ..., n

    Σ_{β=1}^{n} w_β f_l(x_β) = f_l(x₀)   for l = 0, ..., L

    Σ_{β=1}^{n} w_β sin(w t_β) = 0

    Σ_{β=1}^{n} w_β cos(w t_β) = 0

where the terms K(x_α−x_β, t_α−t_β) are modeled as the sum of the generalized
covariances K(x_α−x_β) and K(t_α−t_β).
30 External Drift

In some situations it may be of interest to use arbitrary drift functions
which are not translation-invariant, as in the external drift method exposed
in this chapter, which results in a particular formulation of universal
kriging. The theory based on translation-invariant drift, sketched in the last
chapter, does not apply anymore.

Depth measured with drillholes and seismic


It may happen that two variables measured in different ways reflect the same
phenomenon and that the primary variable is precise, but known only at a few
locations, while the secondary variable cannot be accurately measured, but is
available everywhere in the spatial domain.
The classical example is found in petroleum exploration, where the top of a
reservoir has to be mapped, which is delimited by a continuous geologic layer
(except for faults) because petroleum is usually found in sedimentary
formations. Data is available from two sources:

- the precise measurements of depth stemming from drillholes. They are not a
  great many because of the excessively high cost of drilling. We choose to
  model the drillholes with a random function Z(x).

- imprecise measurements of depth, deduced from seismic travel times and
  covering the whole domain at a small scale. The seismic depth data provides
  an image of the shape of the layer with some inaccuracy due to the
  difficulty of converting seismic reflection times into depths. This second
  variable is represented as a regionalized variable s(x) and is considered
  as deterministic.

As Z(x) and s(x) are two ways of expressing the phenomenon "depth of the
layer", we assume that Z(x) is on average equal to s(x) up to a constant a₀
and a coefficient b₁

    E[ Z(x) ] = a₀ + b₁ s(x)

The deterministic function s(x) describes the overall shape of the layer in
imprecise depth units while the data about Z(x) give at a few locations an
information about the exact depth. The method merging both sources of
information uses s(x) as an external drift function for the estimation of
Z(x).

Estimating with a shape function


We consider the problem of improving the estimation of a second-order
stationary random function Z(x) by taking into account a function s(x)
describing its average shape. The estimator is the linear combination

    Z*(x₀) = Σ_{α=1}^{n} w_α Z(x_α)

with weights constrained to unit sum, so that

    E[ Z*(x₀) ] = E[ Z(x₀) ]

which can be developed into

    E[ Z*(x₀) ] = Σ_{α=1}^{n} w_α E[ Z(x_α) ]
                = a₀ + b₁ Σ_{α=1}^{n} w_α s(x_α)
                = a₀ + b₁ s(x₀)

This last equation implies that the weights should be consistent with an
exact interpolation of s(x):

    s(x₀) = Σ_{α=1}^{n} w_α s(x_α)

The objective function to minimize in this problem consists of the estimation
variance σ²_E and of two constraints

    σ²_E − µ₁ ( Σ_{α=1}^{n} w_α − 1 ) − µ₂ ( Σ_{α=1}^{n} w_α s(x_α) − s(x₀) )

The result is a universal kriging system

    Σ_{β=1}^{n} w_β C(x_α−x_β) − µ₁ − µ₂ s(x_α) = C(x_α−x₀)   for α = 1, ..., n

    Σ_{β=1}^{n} w_β = 1

    Σ_{β=1}^{n} w_β s(x_β) = s(x₀)

The mixing of a second-order stationary random function with a
(non-stationary) mean function may look surprising in this example.
Stationarity is a concept that depends on scale. Data can suggest stationarity
at a large scale (for widely spaced data points on Z(x)) while at a smaller
scale things look non-stationary (when inspecting the fine detail provided by
a function s(x)).
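A minimal numerical sketch of this system (toy 1-D data; the shape function s(x) and all values are invented for illustration) shows that the solved weights honor both constraints:

```python
import numpy as np

# Kriging with one external drift s(x): solve the system with the two
# universality conditions (unit sum, exact interpolation of s at x0).
# All data and model parameters below are illustrative.

def cov(h, rng=3.0):
    h = np.abs(h)
    return np.where(h < rng, 1.0 - 1.5 * h / rng + 0.5 * (h / rng) ** 3, 0.0)

def s(u):
    """Exhaustively known shape function (hypothetical)."""
    return 0.5 * u ** 2

x = np.array([0.0, 1.0, 2.5, 4.0])
z = np.array([10.2, 10.8, 12.1, 13.0])
x0 = 1.7

n = len(x)
F = np.column_stack([np.ones(n), s(x)])   # constant + external drift
A = np.block([[cov(x[:, None] - x[None, :]), F],
              [F.T, np.zeros((2, 2))]])
b = np.concatenate([cov(x - x0), [1.0, s(x0)]])
w = np.linalg.solve(A, b)[:n]
print("KE estimate at x0:", w @ z)

# Both universality conditions hold: unit sum and exact interpolation of s
assert np.isclose(w.sum(), 1.0)
assert np.isclose(w @ s(x), s(x0))
```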

Figure 30.1: Kriging mean January temperature in Scotland with a linear drift.

Kriging with external drift

The external drift method consists in integrating into the kriging system
supplementary universality conditions about one or several external drift
variables s_i(x), i = 1, ..., N, measured exhaustively in the spatial domain.
Actually the functions s_i(x) need to be known at all locations x_α of the
samples as well as at the nodes of the estimation grid.
The conditions

    Σ_{α=1}^{n} w_α s_i(x_α) = s_i(x₀)   for i = 1, ..., N

are added to the kriging system independently of the class of covariances
K(h), hence the qualifier external. A class of generalized covariances can
only be defined with respect to translation-invariant basis functions f_l(x).
With this definition of external drift we can have a new look at the
inference problems in the original theory of universal kriging, which was
limited to intrinsic random functions with a variogram γ(h). These problems
stem from the fact that the drift functions f_l(x) (l > 0) stay "external" to
the process of inferring the variogram. Attempts to integrate the drift
functions into the inference procedure are due to fail as the class of
generalized covariances for an IRF-0,

    K(h) = −γ(h),

only filters (local) constants and is a priori too narrow to model adequately
the residuals.
A kriging system with translation-invariant and multiple external drift can
be written as

    Σ_{β=1}^{n} w_β K(x_α−x_β) − Σ_{l=0}^{L} µ_l f_l(x_α) − Σ_{i=1}^{N} µ_i s_i(x_α)
        = K(x_α−x₀)   for α = 1, ..., n

    Σ_{β=1}^{n} w_β f_l(x_β) = f_l(x₀)   for l = 0, ..., L

    Σ_{β=1}^{n} w_β s_i(x_β) = s_i(x₀)   for i = 1, ..., N

When applying the method with a moving neighborhood it is useful to map the
coefficients b_i of each external drift to measure the influence of each
external variable on the estimate in different areas of the region. An
estimate b*_{i₀} is


Figure 30.2: Estimated values of kriging with linear and external drift plotted against
elevation: the estimated values above 400m extrapolate out of the range of the data.

obtained by modifying the right-hand side of the system

    Σ_{β=1}^{n} w_β K(x_α−x_β) − Σ_{l=0}^{L} µ_l f_l(x_α) − Σ_{i=1}^{N} µ_i s_i(x_α) = 0
        for α = 1, ..., n

    Σ_{β=1}^{n} w_β f_l(x_β) = 0   for l = 0, ..., L

    Σ_{β=1}^{n} w_β s_i(x_β) = δ_{i i₀}   for i = 1, ..., N

where δ_{i i₀} is a Kronecker symbol which is one for i = i₀ and zero
otherwise.

EXAMPLE 30.1 (see [87]) We have data on mean January temperature at 146
stations in Scotland which are all located below 400 m altitude, whereas the
Scottish mountain ranges rise up to 1344 m (Ben Nevis). The variogram of this
data was shown with a model on Figure 28.2, which is taken as the underlying
isotropic variogram model.
Figure 30.1 displays a map obtained by kriging with a linear drift. The
estimated temperatures stay within the range of the data (0 to 5 degrees
Celsius). Temperatures below the ones observed can however be expected at
altitudes above 400 m.
It could be verified that temperature depends on elevation in a reasonably
linear way. The altitude of the stations was known and elevation was available

Figure 30.3: Kriging mean January temperature in Scotland using elevation
data from a digitized topographic map as external drift.


Figure 30.4: Map of the estimated external drift coefficient b₁* when using
elevation as the external drift for kriging temperature with a moving
neighborhood.

at the nodes of the estimation grid by digitizing a topographic map. The map
of mean January temperature on Figure 30.3 was obtained by adding the
elevation data to the universal kriging system as an external drift s(x). The
kriging was performed with a neighborhood of 20 stations and the estimated
values now extrapolate outside the range of the data at higher altitudes, as
seen on the diagram of Figure 30.2.
The map of the estimated coefficient b₁* of the external drift is displayed
on Figure 30.4. The coefficient is larger in absolute value in the west, in
areas where stations are scarcer: the kriging estimator relies more heavily on
the secondary variable when there are fewer data about the primary variable
nearby.

Cross validation with external drift


The subject of cross validation with the external drift method was examined
in detail by CASTELIER [21], [22] and is presented in the form of three
exercises. The random function Z(x) is second-order stationary and only one
drift function s(x) is taken into account.

EXERCISE 30.2 We are interested in pairs of values (z_α, s_α) of a variable
of interest Z(x) with a covariance function C(h) and an external drift
variable s(x) at locations x_α, α = 1, ..., n in a domain D. The values are
arranged into vectors z and s of dimension n.
To clarify notation we write down the systems of simple kriging, ordinary
kriging, kriging with external drift and kriging of the mean.

Simple kriging (SK):

    C w_SK = c₀   and   w_SK = R c₀   where R = C⁻¹

Ordinary kriging (OK):

    [ C   1 ] [  w_OK ]   [ c₀ ]
    [ 1ᵀ  0 ] [ −µ_OK ] = [ 1  ]     i.e.   K ( w_OK; −µ_OK ) = k₀

and

    ( w_OK; −µ_OK ) = K⁻¹ k₀   with   K⁻¹ = [ A   y ]
                                            [ yᵀ  u ]

Kriging with external drift (KE):

    [ C   1   s ] [  w_KE  ]   [ c₀ ]
    [ 1ᵀ  0   0 ] [ −µ_KE  ] = [ 1  ]
    [ sᵀ  0   0 ] [ −µ'_KE ]   [ s₀ ]

Kriging of the mean (KM):

    K ( w_KM; −µ_KM ) = ( 0; 1 )   and   m* = Σ_{α=1}^{n} w_α^{KM} z_α = (zᵀ, 0) K⁻¹ ( 0; 1 )

a) Show that the variance oE ordinary kriging is


2
O"OK = Coo - k T0 K- 1 k 0

b) Using the Eollowing relations between matrices and vectors oE GI(

CA + 1 yT = I
Cy + Iu =0
ITA = OT
ITy =1

show that the kriging oE the mean can be expressed as

* zTRI
m = FRI
c) We define Castelier's scalar product

\[
\langle x, y \rangle = \frac{x^T R\, y}{1^T R\, 1}
\]

and a corresponding mean and covariance

\[
E_n[z] = \langle z, 1 \rangle, \qquad
E_n[z, s] = \langle z, s \rangle, \qquad
\mathrm{cov}_n(z, s) = E_n[z, s] - E_n[z]\, E_n[s]
\]

Defining Castelier's pseudo-scalar product using the inverse K^{-1} of OK (instead
of the inverse R of SK)

\[
K_n(z, s) = (z^T, 0)\, K^{-1} \binom{s}{0}
\]

show that

\[
K_n(z, s) = z^T A\, s = 1^T R\, 1 \cdot \mathrm{cov}_n(z, s)
\]
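This identity is easy to check numerically. The following sketch (not part of the original exercise; the covariance matrix and data vectors are arbitrary) builds a small OK matrix K, extracts its block A, and compares K_n(z, s) with 1ᵀR1 · cov_n(z, s):

```python
import numpy as np

# Numerical check of K_n(z, s) = z^T A s = (1^T R 1) cov_n(z, s)
# on an arbitrary positive definite covariance matrix C.
rng = np.random.default_rng(0)
n = 5
X = rng.normal(size=(n, n))
C = X @ X.T + n * np.eye(n)          # positive definite covariance matrix
R = np.linalg.inv(C)
one = np.ones(n)

# ordinary kriging matrix K and the block A of its inverse
K = np.zeros((n + 1, n + 1))
K[:n, :n] = C
K[:n, n] = K[n, :n] = 1.0
Kinv = np.linalg.inv(K)
A = Kinv[:n, :n]

z = rng.normal(size=n)
s = rng.normal(size=n)

def scal(x, y):
    # Castelier's scalar product <x, y> = x^T R y / 1^T R 1
    return x @ R @ y / (one @ R @ one)

cov_n = scal(z, s) - scal(z, one) * scal(s, one)
K_n = np.concatenate([z, [0.0]]) @ Kinv @ np.concatenate([s, [0.0]])

assert np.allclose(K_n, z @ A @ s)
assert np.allclose(K_n, (one @ R @ one) * cov_n)
```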
d) As the external drift is linked to the expectation of the variable of interest
by a linear relation

\[
E[Z(x)] = a + b\, s(x)
\]

and as the coefficient b and the constant a of the external drift are estimated by

\[
b^* = (z^T, 0, 0)\, F^{-1} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
\qquad\text{and}\qquad
a^* = (z^T, 0, 0)\, F^{-1} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
\]

show that b^* and a^* are the coefficients of a linear regression in the sense of
Castelier's scalar product

\[
b^* = \frac{\mathrm{cov}_n(z, s)}{\mathrm{cov}_n(s, s)}
\qquad\text{and}\qquad
a^* = E_n[z] - b^*\, E_n[s]
\]

EXERCISE 30.3 Assume, in a first step, that we have an additional data value z_0
at the point x_0; let us rewrite the left hand matrix of OK with n+1 informations

\[
K_0 = \begin{pmatrix} C_{00} & k_0^T \\ k_0 & K \end{pmatrix}
\qquad\text{and}\qquad
K_0^{-1} = \begin{pmatrix} u_{00} & v_0^T \\ v_0 & A_0 \end{pmatrix}
\]

where C_{00} = C(x_0 - x_0) and K, k_0 are defined as in the previous exercise.

As K_0 K_0^{-1} = I, we have the relations

\[
C_{00}\, u_{00} + k_0^T v_0 = 1, \qquad
k_0\, u_{00} + K v_0 = 0, \qquad
C_{00}\, v_0^T + k_0^T A_0 = 0^T, \qquad
k_0\, v_0^T + K A_0 = I
\]

a) Show that

\[
u_{00} = \frac{1}{\sigma^2_{OK}}
\]

b) Let z_0^T = (z_0, z^T) and s_0^T = (s_0, s^T) be vectors of dimension n+1, and

\[
K_{n+1}(z_0, s_0) = (z_0^T, 0)\, K_0^{-1} \binom{s_0}{0}
\]

Show that

\[
K_{n+1}(z_0, s_0) = K_n(z, s)
+ u_{00}\, \bigl( z_0 - (z^T, 0)\, K^{-1} k_0 \bigr) \cdot \bigl( s_0 - (s^T, 0)\, K^{-1} k_0 \bigr)
\]

c) Let Δ(z_0|z) be the OK error

\[
\Delta(z_0 \mid z) = z_0 - z_0^* = z_0 - (z^T, 0)\, K^{-1} k_0
\]

Show that

\[
K_{n+1}(z_0, s_0)
= z_0\, \frac{\Delta(s_0 \mid s)}{\sigma^2_{OK}}
- (z^T, 0)\, K^{-1} k_0\, \frac{\Delta(s_0 \mid s)}{\sigma^2_{OK}}
+ K_n(z, s)
\]

and that

\[
\frac{\Delta(s_0 \mid s)}{\sigma^2_{OK}}
= (u_{00},\, v_0^T) \begin{pmatrix} s_0 \\ s \\ 0 \end{pmatrix}
\]

EXERCISE 30.4 In a second step, we are interested in an OK with n-1 points.
We write Δ(z_α|z_[α]) for the error at a location x_α with a kriging using only the
n-1 data contained in the vector

\[
z_{[\alpha]} = (z_1, \ldots, z_{\alpha-1}, z_{\alpha+1}, \ldots, z_n)^T
\]

and we denote σ²_[α] the corresponding ordinary kriging variance.

a) Establish the following elegant relation

\[
K^{-1} \binom{s}{0} =
\begin{pmatrix}
\vdots \\[1mm]
\dfrac{\Delta(s_\alpha \mid s_{[\alpha]})}{\sigma^2_{[\alpha]}} \\[2mm]
\vdots \\[1mm]
m_s^*
\end{pmatrix}
\]

where the α-th component (α = 1, ..., n) is the standardized cross-validation
error of s and the last component m_s^* is the kriging of the mean of s.

b) As the coefficient of the drift is estimated by

\[
b^* = \sum_{\alpha=1}^{n} w_b^\alpha\, z_\alpha = \frac{K_n(z, s)}{K_n(s, s)}
\]

show that

\[
w_b^\alpha = \frac{\Delta(s_\alpha \mid s_{[\alpha]})}{\sigma^2_{[\alpha]}\; K_n(s, s)}
\]

The computation of the errors Δ(s_α|s_[α]) can be interesting in applications.
They can serve to explore the influence of the values s_α on the KE estimator in
a given neighborhood.

Regularity of the external drift function


CASTELIER [21], [22] has used the description of the influence of the external drift
data on the kriging weights to show that the external drift should preferably
be a smoother function than realizations of the primary variable Z(x). This
question can be checked on data by examining the behavior of the experimental
variogram of s(x) and comparing it with the variogram model adopted for Z(x).
The behavior at the origin of the experimental variogram of s(x) should be more
regular than that of the variogram model of Z(x).

For example, using the same setting as in the previous section, the ratio of
the squared cross-validation error of the external drift kriging to
the variance σ²_[α] of ordinary kriging (omitting the sample at the location x_α)

\[
\frac{\Delta(s_\alpha \mid s_{[\alpha]})^2}{\sigma^2_{[\alpha]}}
\]

could in an ideal situation be constant for all x_α. In this particular case s(x) could
be considered as a realization (up to a constant factor) of the random function
Z(x) and the covariance function of the external drift would be proportional to
the covariance function of Z(x). This ratio is worth examining in practice as
it explains the structure of the weights used for estimating the external drift
coefficient b* (see Exercise 30.4).
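The relation of Exercise 30.4a behind this ratio can be verified numerically. The sketch below (data locations, drift values and the exponential covariance are all arbitrary assumptions) compares the components of K⁻¹(s, 0)ᵀ with explicitly recomputed leave-one-out errors and with the kriged mean of s:

```python
import numpy as np

# Check: the first n components of K^-1 (s, 0)^T equal the standardized
# leave-one-out errors Delta(s_a | s_[a]) / sigma^2_[a], and the last
# component is the kriging of the mean of s.
x = np.array([0.0, 0.7, 1.1, 2.4, 3.0])       # data locations (assumed)
n = len(x)
C = np.exp(-np.abs(x[:, None] - x[None, :]))  # C(h) = exp(-|h|) (assumed)
s = np.array([1.2, 0.4, -0.3, 0.8, 1.5])      # drift values (assumed)

K = np.zeros((n + 1, n + 1))
K[:n, :n] = C
K[:n, n] = K[n, :n] = 1.0
r = np.linalg.solve(K, np.append(s, 0.0))     # r = K^-1 (s, 0)^T

for a in range(n):
    keep = [i for i in range(n) if i != a]
    # ordinary kriging of s_a from the n-1 remaining points
    Ka = np.zeros((n, n))
    Ka[:n-1, :n-1] = C[np.ix_(keep, keep)]
    Ka[:n-1, n-1] = Ka[n-1, :n-1] = 1.0
    rhs = np.append(C[keep, a], 1.0)
    w = np.linalg.solve(Ka, rhs)              # weights and -mu
    s_star = w[:n-1] @ s[keep]
    var_a = C[a, a] - rhs @ w                 # sigma^2_[a]
    assert np.isclose(r[a], (s[a] - s_star) / var_a)

# last component: kriging of the mean of s
R1 = np.linalg.solve(C, np.ones(n))
assert np.isclose(r[n], (s @ R1) / (np.ones(n) @ R1))
```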
Appendix
Matrix Algebra

This review consists of a brief exposition of the principal results of matrix algebra,
interspersed with a few exercises and examples. For a more detailed account we
recommend the book by STRANG [183].

Data table
In multivariate data analysis we handle tables of numbers (matrices) and the one
to start with is generally the data table Z

\[
Z = [z_{\alpha i}] =
\begin{pmatrix}
z_{1,1} & \cdots & z_{1,i} & \cdots & z_{1,N} \\
\vdots  &        & \vdots  &        & \vdots  \\
z_{\alpha,1} & \cdots & z_{\alpha,i} & \cdots & z_{\alpha,N} \\
\vdots  &        & \vdots  &        & \vdots  \\
z_{n,1} & \cdots & z_{n,i} & \cdots & z_{n,N}
\end{pmatrix}
\]

where the rows correspond to the samples and the columns to the variables.
The element z_{αi} denotes a numerical value placed at the crossing of the row
number α (index of the sample) and the column number i (index of the variable).

Matrix, vector, scalar

A matrix is a rectangular table of numbers

\[
A = [a_{ij}]
\]

with indices

\[
i = 1, \ldots, n \qquad j = 1, \ldots, m
\]

We speak of a matrix of order n × m to designate a matrix of n rows and m
columns.

A vector (by convention: a column vector) of dimension n is a matrix of order
n × 1, i.e. a matrix having only one column.

A scalar is a number (one could say: a matrix of order 1 × 1).

Concerning notation, matrices are symbolized by bold capital letters and
vectors by bold small letters.

Sum

Two matrices of the same order can be added

\[
A + B = [a_{ij}] + [b_{ij}] = [a_{ij} + b_{ij}]
\]

The sum of the matrices is performed by adding together the elements of the
two matrices having the same indices i and j.

Multiplication by a scalar

A matrix can be multiplied equally from the right or from the left by a scalar

\[
\lambda A = A \lambda = [\lambda\, a_{ij}]
\]

which amounts to multiplying all elements a_{ij} by the scalar λ.

Multiplication of two matrices

The product AB of two matrices A and B can only be performed if B has as
many rows as A has columns. Let A be of order n × m and B of order m × l.
The multiplication of A by B produces a matrix C of order n × l

\[
\underset{(n \times m)}{A} \cdot \underset{(m \times l)}{B}
= \Bigl[\, \sum_{j=1}^{m} a_{ij}\, b_{jk} \,\Bigr] = [c_{ik}]
= \underset{(n \times l)}{C}
\]

where i = 1, ..., n and k = 1, ..., l.

The product BA of these matrices is only possible if n is equal to l, and the
result is then a matrix of order m × m.

Transposition

The transpose Aᵀ of a matrix A of order n × m is obtained by inverting the
sequence of the indices, such that the rows of A become the columns of Aᵀ

\[
\underset{(n \times m)}{A} = [a_{ij}], \qquad
\underset{(m \times n)}{A^T} = [a_{ji}]
\]

The transpose of the product of two matrices is equal to the product of the
transposed matrices in reverse order

\[
(AB)^T = B^T A^T
\]

EXERCISE I.1 Let 1 be the vector of dimension n whose elements are all equal
to 1. Carry out the products 1ᵀ1 and 11ᵀ.

EXERCISE I.2 Calculate the matrices resulting from the products

\[
\frac{1}{n}\, Z^T 1
\qquad\text{and}\qquad
\frac{1}{n}\, 1\, 1^T Z
\]

where Z is the n × N matrix of the data.
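As a quick numerical illustration of Exercise I.2 (the data values are made up), the first product gives the vector of the variable means and the second the matrix repeating those means on every row:

```python
import numpy as np

# The two products of Exercise I.2 on a small data table Z (values assumed):
# (1/n) Z^T 1 is the vector of variable means, and (1/n) 1 1^T Z is the
# n x N matrix whose rows all repeat these means.
Z = np.array([[1.0, 4.0],
              [3.0, 0.0],
              [5.0, 2.0]])
n, N = Z.shape
one = np.ones(n)

means = Z.T @ one / n            # (1/n) Z^T 1
M = np.outer(one, one) @ Z / n   # (1/n) 1 1^T Z

print(means)   # → [3. 2.]
print(M[0])    # → [3. 2.]
```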

Square matrix

A square matrix has as many rows as columns.

Diagonal matrix

A diagonal matrix D is a square matrix whose only non zero elements are on the
diagonal

\[
D = \begin{pmatrix}
d_{11} & 0 & \cdots & 0 \\
0 & d_{22} & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & d_{nn}
\end{pmatrix}
\]

In particular we mention the identity matrix

\[
I = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & 1
\end{pmatrix}
\]

which does not modify a matrix of the same order when multiplying it from the
right or from the left

\[
A I = I A = A
\]

Orthogonal matrix

A square matrix A is orthogonal if it verifies

\[
A^T A = A A^T = I
\]

Symmetric matrix

A square matrix A is symmetric if it is equal to its transpose

\[
A = A^T
\]

EXAMPLE I.3 An example of a symmetric matrix is the variance-covariance matrix
V, containing the variances on the diagonal and the covariances off the
diagonal

\[
V = [\sigma_{ij}] = \frac{1}{n}\, (Z - M)^T (Z - M)
\]

where M is the rectangular n × N matrix of the means (solution of Exercise
I.2), whose column elements are equal to the mean of the variable corresponding
to a given column.

EXAMPLE I.4 Another example of a symmetric matrix is the matrix R of
correlations

\[
R = [\rho_{ij}] = D_{\sigma}^{-1}\, V\, D_{\sigma}^{-1}
\]

where D_{σ}^{-1} is a diagonal matrix containing the inverses of the standard
deviations of the variables

\[
D_{\sigma}^{-1} = \begin{pmatrix}
\dfrac{1}{\sqrt{\sigma_{11}}} & 0 & \cdots & 0 \\
0 & \ddots & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & \dfrac{1}{\sqrt{\sigma_{NN}}}
\end{pmatrix}
\]
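The two matrices of Examples I.3 and I.4 can be computed in a few lines (a sketch with made-up data values):

```python
import numpy as np

# Variance-covariance matrix V of Example I.3 and correlation matrix R of
# Example I.4 on a small data table (values assumed).
Z = np.array([[1.0, 4.0],
              [3.0, 0.0],
              [5.0, 2.0]])
n = Z.shape[0]
M = np.ones((n, 1)) @ Z.mean(axis=0, keepdims=True)  # matrix of the means

V = (Z - M).T @ (Z - M) / n                 # V = (1/n)(Z-M)^T (Z-M)
D_inv = np.diag(1.0 / np.sqrt(np.diag(V)))
R = D_inv @ V @ D_inv                       # R = D^-1 V D^-1

assert np.allclose(R, R.T)                  # R is symmetric
assert np.allclose(np.diag(R), 1.0)         # unit diagonal
```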

Linear independence

A set of vectors {a_1, ..., a_m} is said to be linearly independent if there exists no
non-trivial set of scalars {x_1, ..., x_m} such that

\[
\sum_{j=1}^{m} a_j\, x_j = 0
\]

In other words, the linear independence of the columns of a matrix A is
established if only the null vector x = 0 satisfies the equation Ax = 0.

Rank of a matrix

A rectangular matrix can be subdivided into the set of column vectors which
make it up. Similarly, we can also consider the set of "row vectors" of this matrix,
which are defined as the column vectors of its transpose.

The rank of the columns of a matrix is the maximum number of linearly
independent column vectors of this matrix. The rank of the rows is defined in an
analogous manner. It can be shown that the rank of the rows is equal to the rank
of the columns.

The rank of a rectangular n × m matrix A is thus lower than or equal to the smaller
of its two dimensions

\[
\mathrm{rank}(A) \leq \min(n, m)
\]

The rank of the matrix A indicates the dimension of the vector spaces¹ M(A)
and N(A) spanned by the columns and the rows of A

\[
M(A) = \{\, y : y = A x \,\}
\qquad\text{and}\qquad
N(A) = \{\, x : x = y^T A \,\}
\]

where x is a vector of dimension m and y is a vector of dimension n.

¹Do not confuse the dimension of a vector with the dimension of a vector space.

Inverse matrix

A square n × n matrix A is singular if rank(A) < n, and non singular if
rank(A) = n. If A is non singular, an inverse matrix A⁻¹ exists such that

\[
A A^{-1} = A^{-1} A = I
\]

and A is said to be invertible.

The inverse Q⁻¹ of an orthogonal matrix Q is its transpose Qᵀ.

Determinant of a matrix

The determinant |A| of a square n × n matrix A is

\[
|A| = \det(A) = \sum (-1)^{N(k_1, \ldots, k_n)} \prod_{i=1}^{n} a_{i k_i}
\]

where the sum is taken over all the permutations (k_1, ..., k_n) of the integers
(1, ..., n) and where N(k_1, ..., k_n) is the number of transpositions of two
integers necessary to pass from the starting set (1, ..., n) to a given permutation
(k_1, ..., k_n) of this set.

In the case of a 2 × 2 matrix there is the well-known formula

\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad \det(A) = ad - bc
\]

A non-zero determinant indicates that the corresponding matrix is invertible.

Trace

The trace of a square n × n matrix is the sum of its diagonal elements

\[
\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}
\]

Eigenvalues

Let A be square n × n. The characteristic equation

\[
|\lambda I - A| = 0
\]

has n in general complex solutions λ, called the eigenvalues of A.

The sum of the eigenvalues is equal to the trace of the matrix

\[
\mathrm{tr}(A) = \sum_{p=1}^{n} \lambda_p
\]
p=1

The product of the eigenvalues is equal to the determinant of the matrix

\[
\det(A) = \prod_{p=1}^{n} \lambda_p
\]

If A is symmetric, all its eigenvalues are real.
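These three facts are easy to confirm numerically (the matrix below is arbitrary):

```python
import numpy as np

# Checking that the trace is the sum of the eigenvalues, the determinant
# their product, and that a symmetric matrix has real eigenvalues.
rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
A = B + B.T                      # symmetric matrix

lam = np.linalg.eigvalsh(A)      # real eigenvalues of a symmetric matrix
assert np.isclose(np.trace(A), lam.sum())
assert np.isclose(np.linalg.det(A), lam.prod())
```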

Eigenvectors

Let A be a square matrix and λ an eigenvalue of A. Then vectors x and y exist,
which are not equal to the zero vector 0, satisfying

\[
(\lambda I - A)\, x = 0, \qquad y^T (\lambda I - A) = 0
\]

i.e.

\[
A x = \lambda x, \qquad y^T A = \lambda y^T
\]

The vectors x are the eigenvectors of the columns of A and the y are the
eigenvectors of the rows of A. When A is symmetric, it is not necessary to
distinguish between the eigenvectors of the columns and of the rows.

EXERCISE I.5 Show that the eigenvalues of a squared symmetric matrix A² = AA
are equal to the squares of the eigenvalues of A, and that any eigenvector of A is
an eigenvector of A².

Positive definite matrix

DEFINITION: A symmetric n × n matrix A is positive definite, iff (if and only if) for any
non zero vector x the quadratic form

\[
x^T A\, x > 0
\]

Similarly, A is said to be positive semi-definite (non negative definite), iff
xᵀAx ≥ 0 for any vector x. Furthermore, A is said to be indefinite, iff
xᵀAx > 0 for some x and xᵀAx < 0 for other x.

We notice that this definition is similar to the definition of a positive definite
function, e.g. the covariance function C(h).

We now list three very useful criteria for positive semi-definite matrices.

FIRST CRITERION:
A is positive semi-definite iff a matrix W exists such that A = WᵀW.
In this situation W is sometimes written √A.

SECOND CRITERION:
A is positive semi-definite iff all eigenvalues λ_p ≥ 0.

THIRD CRITERION:
A is positive semi-definite iff all its principal minors are non negative.

A principal minor is the determinant of a principal submatrix of A. A principal
submatrix is obtained by leaving out k columns (k = 0, 1, ..., n-1) and the
corresponding rows which cross them at the diagonal elements. The combinatorics
of the principal minors to be checked makes this criterion less interesting for
applications with n > 3.
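The first two criteria can be illustrated in a few lines (W is an arbitrary matrix):

```python
import numpy as np

# First criterion: A = W^T W is positive semi-definite for any W.
# Second criterion: its eigenvalues are all non negative.
rng = np.random.default_rng(2)
W = rng.normal(size=(3, 5))
A = W.T @ W                      # A = W^T W, of rank at most 3

lam = np.linalg.eigvalsh(A)
assert np.all(lam >= -1e-12)     # eigenvalues >= 0 (up to round-off)

# any quadratic form x^T A x = ||W x||^2 is non negative
x = rng.normal(size=5)
assert x @ A @ x >= 0.0
```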

EXAMPLE I.6 It is easy to check with the third criterion that the left hand
(n+1) × (n+1) matrix of the ordinary kriging system, expressed with covariances,
is not positive semi-definite.

The principal submatrix S obtained by leaving out all columns and rows
crossing diagonal elements, except for the last two,

\[
S = \begin{pmatrix} C_{nn} & 1 \\ 1 & 0 \end{pmatrix}
\]

has a determinant equal to -1.

The computation of the eigenvalues of the left hand matrix of OK yields one
negative eigenvalue (resulting from the universality condition) and n positive
(or zero) eigenvalues, on the basis of covariances. Thus, built up using a
covariance function, this matrix is indefinite, while it is negative (semi-)definite
with variogram values.

Decomposition into eigenvalues and eigenvectors

With a symmetric matrix A the eigenvalues, together with eigenvectors normed
to unity, form the system

\[
A Q = Q \Lambda \qquad\text{with}\qquad Q^T Q = I
\]

where Λ is the diagonal matrix of eigenvalues and Q is the orthogonal matrix of
eigenvectors.

As Qᵀ = Q⁻¹, this results in a decomposition of the symmetric matrix A

\[
A = Q \Lambda Q^T
\]

Singular value decomposition

Multivariate analysis being the art of decomposing tables of numbers, a
decomposition which can be applied to any rectangular matrix (in the same spirit as
the decomposition of a symmetric matrix into eigenvalues and eigenvectors) is
bound to play a central role in data analysis.

The decomposition into singular values μ_p of a rectangular n × m matrix A
of rank r can be written as

\[
\underset{(n \times m)}{A}
= \underset{(n \times n)}{Q_1}\;
\underset{(n \times m)}{\Sigma}\;
\underset{(m \times m)}{Q_2^T}
\]

where Q_1 and Q_2 are orthogonal matrices and where Σ is a rectangular matrix
with r positive values μ_p on the diagonal (the set of elements with equal indices)
and zeroes elsewhere. For example, in the case n > m and r = m, the matrix Σ
will have the following structure

\[
\Sigma = \begin{pmatrix}
\mu_1 & & 0 \\
& \ddots & \\
0 & & \mu_r \\
0 & \cdots & 0 \\
\vdots & & \vdots \\
0 & \cdots & 0
\end{pmatrix}
\]

Such a decomposition always exists and can be obtained by computing the
eigenvalues λ_p of AAᵀ and of AᵀA, which are identical, and positive or zero.
The singular values are defined as the square roots of the non zero eigenvalues

\[
\mu_p = \sqrt{\lambda_p}
\]

In this decomposition Q_1 is the matrix of eigenvectors of AAᵀ, while Q_2 is
the matrix of eigenvectors of AᵀA.
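In numpy the decomposition is available directly; the sketch below (arbitrary matrix) checks the factorization and the link between singular values and the eigenvalues of AᵀA:

```python
import numpy as np

# A = Q1 Sigma Q2^T with numpy, and mu_p = sqrt(lambda_p(A^T A)).
rng = np.random.default_rng(3)
A = rng.normal(size=(5, 3))      # n > m, generically of full rank r = m

Q1, mu, Q2T = np.linalg.svd(A)   # numpy returns Q1, the mu_p, and Q2^T
Sigma = np.zeros(A.shape)
Sigma[:3, :3] = np.diag(mu)

assert np.allclose(A, Q1 @ Sigma @ Q2T)
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(mu, np.sqrt(lam))
```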

EXERCISE I.7 What is the singular value decomposition of a symmetric matrix?

Moore-Penrose generalized inverse

An inverse matrix exists for any square non singular matrix. It is interesting to
generalize the concept of an inverse to singular matrices as well as to rectangular
matrices.

An m × n matrix X is a Moore-Penrose generalized inverse of a rectangular
n × m matrix A if it verifies the following four conditions

\[
A X A = A, \qquad
X A X = X, \qquad
(A X)^T = A X, \qquad
(X A)^T = X A
\]

Such a matrix X is denoted A⁺.


Figure I.1: Estimation point x_0 and two samples at location x_1.

EXERCISE I.8 Is the inverse A⁻¹ of a square non singular matrix A a Moore-Penrose
generalized inverse?

The Moore-Penrose inverse is obtained from a singular value decomposition,
reversing the order of the two orthogonal matrices, transposing Σ and inverting
each singular value:

\[
A^+ = Q_2\, \Sigma^+\, Q_1^T
\]

The matrix Σ⁺ is of order m × n and has, taking as an example n > m and
r = m, the structure

\[
\Sigma^+ = \begin{pmatrix}
\mu_1^{-1} & & 0 & 0 & \cdots & 0 \\
& \ddots & & \vdots & & \vdots \\
0 & & \mu_r^{-1} & 0 & \cdots & 0
\end{pmatrix}
\]

EXAMPLE I.9 (SIMPLE KRIGING WITH A DUPLICATED SAMPLE) As a geostatistical
application one might wonder if the Moore-Penrose inverse can be used
to solve a kriging system whose left hand matrix is singular. We shall not treat
the problem in a general manner and only solve a very simple exercise.

A value is to be estimated at x_0 using two values located at the same point
x_1 of the domain, as represented on Figure I.1. The simple kriging system for
this situation is

\[
\begin{pmatrix} a & a \\ a & a \end{pmatrix}
\begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
= \begin{pmatrix} b \\ b \end{pmatrix}
\]

where a is the variance of the data and where b is the covariance between the
points x_1 and x_0.

The matrix A being singular, we resort to the Moore-Penrose generalized
inverse. Let

\[
A A^T = A^T A
= \begin{pmatrix} 2a^2 & 2a^2 \\ 2a^2 & 2a^2 \end{pmatrix}
= \begin{pmatrix} c & c \\ c & c \end{pmatrix} = C
\]

We have

\[
\det(C) = 0 \;\Rightarrow\; \lambda_2 = 0
\qquad\text{and}\qquad
\mathrm{tr}(C) = 2c \;\Rightarrow\; \lambda_1 = 2c
\]

and a matrix of eigenvectors (normed to one)

\[
Q = \begin{pmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\[1mm]
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}
\end{pmatrix}
\]

The solution of the system, relying on the generalized inverse, is

\[
w = A^+ b = Q\, \Sigma^+ Q^T b
= \begin{pmatrix} \dfrac{b}{2a} \\[2mm] \dfrac{b}{2a} \end{pmatrix}
\]

With centered data we have in the end the following estimation at x_0

\[
Z^*(x_0) = \frac{b}{a} \left( \frac{Z_1(x_1) + Z_2(x_1)}{2} \right)
\]

This solution does make sense, as the estimator takes the average of the two
sample values Z_1 and Z_2 measured at the location x_1 and multiplies this average
with the simple kriging weight obtained when only one data value is available
in the neighborhood of x_0.
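Numpy's `pinv` reproduces this result directly (the values of a and b below are assumptions chosen for illustration):

```python
import numpy as np

# Example I.9 numerically: the Moore-Penrose inverse of the singular simple
# kriging matrix splits the weight of a single sample evenly between the two
# duplicates. Values of the variance a and covariance b are assumed.
a, b = 2.0, 1.5
A = np.array([[a, a],
              [a, a]])               # singular left hand matrix
rhs = np.array([b, b])

w = np.linalg.pinv(A) @ rhs
assert np.allclose(w, [b / (2 * a), b / (2 * a)])
```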
Linear Regression Theory

These are a few standard results of regression theory in the notation used in the
rest of the book.

Best least squares estimator


The best estimation, in the least squares sense, of a random variable of interest
Z_0 on the basis of a function of N random variables Z_i, i = 1, ..., N, is given by
the conditional expectation of Z_0 knowing the variables Z_i

\[
E[\, Z_0 \mid Z_1, \ldots, Z_N \,] = m_0(z)
\]

where z is the vector of the explanatory variables

\[
z = (Z_1, \ldots, Z_N)^T
\]

In order to prove that no other function f(z) is better than the conditional
mean m_0(z), we need to show that the mean square estimation error is larger for
f(z)

\[
E[\, (Z_0 - f(z))^2 \,] \;\geq\; E[\, (Z_0 - m_0(z))^2 \,]
\]

The mean square error using the arbitrary function f can be expanded

\[
E[\, (Z_0 - f(z))^2 \,]
= E[\, (Z_0 - m_0(z) + m_0(z) - f(z))^2 \,]
= E[\, (Z_0 - m_0(z))^2 \,] + E[\, (m_0(z) - f(z))^2 \,]
+ 2\, E[\, (Z_0 - m_0(z)) \cdot (m_0(z) - f(z)) \,]
\]

The last term is in fact zero:

\[
E[\, (Z_0 - m_0(z)) \cdot (m_0(z) - f(z)) \,]
= E\bigl[\, E[\, (Z_0 - m_0(z)) \cdot (m_0(z) - f(z)) \mid z \,] \,\bigr]
= E\bigl[\, \underbrace{E[\, Z_0 - m_0(z) \mid z \,]}_{0} \cdot (m_0(z) - f(z)) \,\bigr]
= 0
\]

Thus the mean square error related to an arbitrary function f is equal to
the mean square error based on the conditional mean m_0 plus the mean square
difference between m_0 and f.

Best linear estimator


The computation of the conditional expectation requires in general the knowledge
of the joint distribution F(z_0, z_1, ..., z_N) which can be difficult to infer. An
alternative, which we know from the previous section to be less effective, is to
use a linear function b + cᵀz, where b is a constant and c = (c_1, ..., c_N)ᵀ is a
vector of coefficients for the variables Z_i. The question is now to determine the
linear function which is best in the least squares sense.

We first expand the mean squared estimation error around the mean m_0 of
Z_0:

\[
E[\, (Z_0 - b - c^T z)^2 \,]
= E[\, (Z_0 - m_0 + m_0 - b - c^T z)^2 \,]
= E[\, (Z_0 - m_0 - c^T z)^2 \,] + E[\, (m_0 - b)^2 \,]
+ 2\, E[\, (Z_0 - m_0 - c^T z) \cdot (m_0 - b) \,]
\]

As the third term is zero

\[
E[\, (Z_0 - m_0 - c^T z) \cdot (m_0 - b) \,]
= \underbrace{E[\, Z_0 - m_0 - c^T z \,]}_{0} \cdot (m_0 - b) = 0
\]

the mean squared error is equal to

\[
E[\, (Z_0 - b - c^T z)^2 \,]
= E[\, (Z_0 - m_0 - c^T z)^2 \,] + (m_0 - b)^2
\]

and reduces to the first term for the optimal choice of b = m_0.
Previously we have derived V a = v_0 as the equations yielding the solution
a of the multiple linear regression problem. We can now show that a is the vector of
coefficients for the best linear estimator of Z_0. Suppose without loss of generality
that the means are zero for all variables: m_i = 0 for i = 0, 1, ..., N. The mean
squared estimation error is expanded again

\[
E[\, (Z_0 - c^T z)^2 \,]
= E[\, (Z_0 - a^T z + a^T z - c^T z)^2 \,]
= E[\, (Z_0 - a^T z)^2 \,] + E[\, (a^T z - c^T z)^2 \,]
+ 2\, E[\, (Z_0 - a^T z) \cdot (a^T z - c^T z) \,]
\]

and the third term is zero

\[
E[\, (Z_0 - a^T z) \cdot (a^T z - c^T z) \,]
= E[\, (Z_0 - a^T z) \cdot (z^T a - z^T c) \,]
= E[\, Z_0 z^T a - Z_0 z^T c - a^T z z^T a + a^T z z^T c \,]
\]
\[
= v_0^T a - v_0^T c - a^T V a + a^T V c
= v_0^T (a - c) - a^T V (a - c)
= 0 \quad\text{because } V a = v_0.
\]

The mean squared estimation error

\[
E[\, (Z_0 - c^T z)^2 \,]
= E[\, (Z_0 - a^T z)^2 \,] + E[\, (a^T z - c^T z)^2 \,]
\]

is clearly lowest for c = a. Thus the multiple regression estimator provides the
best linear estimation in the least squares sense

\[
Z_0^* = m_0 + \sum_{i=1}^{N} a_i\, (Z_i - m_i)
\]

The estimator Z_0^* is identical with the conditional expectation m_0(z) in the
case of a multigaussian distribution function.

Projection

Let Z be the matrix of N centered auxiliary variables and z_i be the vector of the
n sample values of variable number i. The sample values of the centered variable
of interest Z_0 are gathered into the vector z_0, which is not part of Z. The variable
of interest is estimated by multiple linear regression as the linear combination
z_0^* = Z a.

The variance-covariance matrix of the auxiliary variables is

\[
V = \frac{1}{n}\, Z^T Z
\]

The vector of covariances between the variable of interest and the auxiliary
variables is

\[
v_0 = \frac{1}{n}\, Z^T z_0
\]

Thus the multiple linear regression is

\[
V a = v_0 \;\Longleftrightarrow\; Z^T Z\, a = Z^T z_0
\]

and, assuming that V is invertible,

\[
a = (Z^T Z)^{-1} Z^T z_0, \qquad
z_0^* = Z (Z^T Z)^{-1} Z^T z_0
\]

The matrix P = Z(ZᵀZ)⁻¹Zᵀ is symmetric and idempotent: P = P². It
is a projection matrix, which projects orthogonally a vector of order n onto the
space spanned by the columns of Z

\[
z_0^* = Z a = P z_0
\]

The vector of estimated values z_0^* is the result of this projection. The vector
of estimation errors

\[
e = z_0 - z_0^*
\]

connects z_0 with its projection and is by construction orthogonal to z_0^*. We have

\[
e = z_0 - Z a = (I - P)\, z_0
\]

and the square of the length of the error vector is given by its Euclidean norm

\[
\|e\|^2 = z_0^T z_0 - 2\, z_0^T P z_0 + z_0^T P P z_0
= z_0^T (I - P)\, z_0
\]
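The projection properties are quickly verified on arbitrary data:

```python
import numpy as np

# P is symmetric and idempotent, and the error vector e is orthogonal to
# the fitted values P z0. Data values are arbitrary.
rng = np.random.default_rng(4)
Z = rng.normal(size=(10, 3))         # n = 10 samples, N = 3 variables
Z -= Z.mean(axis=0)                  # centered auxiliary variables
z0 = rng.normal(size=10)
z0 -= z0.mean()                      # centered variable of interest

P = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
e = (np.eye(10) - P) @ z0

assert np.allclose(P, P.T)           # symmetric
assert np.allclose(P, P @ P)         # idempotent
assert np.isclose(e @ (P @ z0), 0.0) # e orthogonal to z0* = P z0
assert np.isclose(e @ e, z0 @ (np.eye(10) - P) @ z0)
```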

Updating the variance-covariance matrix

The effect on the variance-covariance structure of adding or removing a sample
from the data set is interesting to analyze when comparing the multiple regression
with the data values.

An updating formula for the inverse given in RAO (1973, p33) is useful in this
context. For a non-singular N × N matrix A and vectors b, c of order N, the inverse
of A + bcᵀ can be computed from the inverse of A by

\[
(A + b c^T)^{-1}
= A^{-1} - \frac{A^{-1} b\, c^T A^{-1}}{1 + c^T A^{-1} b}
\]

Applying this formula to A = nV = ZᵀZ, the effect of removing an N-variate
sample z_α from a row of the centered data matrix Z is computed by

\[
(Z_{[\alpha]}^T Z_{[\alpha]})^{-1}
= (Z^T Z - z_\alpha z_\alpha^T)^{-1}
= (Z^T Z)^{-1}
+ \frac{(Z^T Z)^{-1} z_\alpha z_\alpha^T (Z^T Z)^{-1}}{1 - z_\alpha^T (Z^T Z)^{-1} z_\alpha}
= (Z^T Z)^{-1}
+ \frac{(Z^T Z)^{-1} z_\alpha z_\alpha^T (Z^T Z)^{-1}}{1 - P_{\alpha\alpha}}
\]

where P_{αα} is the element number α of the diagonal of the projection matrix P.
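The updating formula is easy to verify numerically (matrix and vectors below are arbitrary; the diagonal shift only makes A safely non-singular):

```python
import numpy as np

# Verifying the Sherman-Morrison updating formula
# (A + b c^T)^-1 = A^-1 - A^-1 b c^T A^-1 / (1 + c^T A^-1 b).
rng = np.random.default_rng(5)
N = 4
A = rng.normal(size=(N, N)) + N * np.eye(N)   # non-singular matrix
b = rng.normal(size=N)
c = rng.normal(size=N)

Ainv = np.linalg.inv(A)
update = Ainv - (Ainv @ np.outer(b, c) @ Ainv) / (1.0 + c @ Ainv @ b)

assert np.allclose(update, np.linalg.inv(A + np.outer(b, c)))
```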

Cross validation

The error vector e = z_0 - z_0^* is not a suitable basis for analyzing the goodness of
fit of the multiple linear regression vector z_0^*. Indeed each estimated value z_0^{α*} is
obtained using the same weight vector a. This vector is actually computed from
covariances set up with the sample values z_0^α and z_i^α, i = 1, ..., N.

Cross-validation consists in computing an estimated value z_{0[α]}^{α*}, leaving the
values related to the sample number α out when determining the weights a_{[α]}

\[
z_{0[\alpha]}^{\alpha*} = z_\alpha^T\, a_{[\alpha]}
\]

The notation [α] indicates that the sample values of index α have not been
included into the calculations for establishing the weights a_{[α]}.

A recursive relation between a_{[α]} and a exists

\[
a_{[\alpha]}
= (Z_{[\alpha]}^T Z_{[\alpha]})^{-1} Z_{[\alpha]}^T z_0^{[\alpha]}
= (Z_{[\alpha]}^T Z_{[\alpha]})^{-1} (Z^T z_0 - z_\alpha z_0^\alpha)
\]
\[
= a - (Z^T Z)^{-1} z_\alpha z_0^\alpha
+ \frac{(Z^T Z)^{-1} z_\alpha z_\alpha^T\, a
- (Z^T Z)^{-1} z_\alpha\, P_{\alpha\alpha}\, z_0^\alpha}{1 - P_{\alpha\alpha}}
\]
\[
= a - \frac{(Z^T Z)^{-1} z_\alpha\, e_\alpha}{1 - P_{\alpha\alpha}}
\]

where e_α = z_0^α - z_0^{α*}.

From this a recursive formula for the cross-validation error e_{[α]} is derived

\[
e_{[\alpha]}
= z_0^\alpha - z_\alpha^T a_{[\alpha]}
= z_0^\alpha - z_\alpha^T \Bigl( a - \frac{(Z^T Z)^{-1} z_\alpha\, e_\alpha}{1 - P_{\alpha\alpha}} \Bigr)
= e_\alpha + \frac{P_{\alpha\alpha}\, e_\alpha}{1 - P_{\alpha\alpha}}
= \frac{e_\alpha}{1 - P_{\alpha\alpha}}
\]

An estimate of the mean squared cross validation error is now easily computed
from the elements of the vector e and the diagonal elements P_{αα} of the projection
matrix

\[
\frac{1}{n} \sum_{\alpha=1}^{n} \bigl( e_{[\alpha]} \bigr)^2
= \frac{1}{n} \sum_{\alpha=1}^{n} \Bigl( \frac{e_\alpha}{1 - P_{\alpha\alpha}} \Bigr)^2
\]

which can serve to evaluate the goodness of fit of the multiple linear regression.
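The shortcut e_[α] = e_α / (1 - P_αα) can be checked against an explicit refit without sample α (arbitrary centered data below):

```python
import numpy as np

# Leave-one-out shortcut versus explicit refits on arbitrary centered data.
rng = np.random.default_rng(6)
n, N = 12, 3
Z = rng.normal(size=(n, N)); Z -= Z.mean(axis=0)
z0 = rng.normal(size=n); z0 -= z0.mean()

P = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
e = z0 - P @ z0                                  # ordinary residuals e_a

for a in range(n):
    keep = [i for i in range(n) if i != a]
    Za, z0a = Z[keep], z0[keep]
    w = np.linalg.solve(Za.T @ Za, Za.T @ z0a)   # weights a_[a]
    e_loo = z0[a] - Z[a] @ w                     # explicit LOO error
    assert np.isclose(e_loo, e[a] / (1 - P[a, a]))
```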
Covariance and Variogram Models

This is a list of a few models commonly found in the literature.

Notation
• h is the spatial separation vector,
• b > 0 is the linear parameter,
• a > 0, α are non-linear parameters,
• J_ν are Bessel functions of order ν,
• K_ν are Basset functions (modified Bessel of third kind).

Covariance Models

Nugget-effect

\[
C_{nug}(h) = \begin{cases} b & \text{when } |h| = 0, \\ 0 & \text{when } |h| > 0. \end{cases}
\]

Spherical

Reference: [114], p57; [116], p86.

\[
C_{sph}(h) = \begin{cases}
b \left( 1 - \dfrac{3|h|}{2a} + \dfrac{|h|^3}{2a^3} \right) & \text{for } 0 \leq |h| \leq a, \\[2mm]
0 & \text{for } |h| > a.
\end{cases}
\]

Cubic

Reference: [31]. Only for up to 2D.

\[
C_{cub}(h) = \begin{cases}
b \left( 1 - \left(\dfrac{|h|}{a}\right)^{2} \left[ 7 - \dfrac{|h|}{a} \left[ \dfrac{35}{4} - \left(\dfrac{|h|}{a}\right)^{2} \left[ \dfrac{7}{2} - \dfrac{3}{4} \left(\dfrac{|h|}{a}\right)^{2} \right] \right] \right] \right)
& \text{for } 0 \leq |h| \leq a, \\[2mm]
0 & \text{for } |h| > a.
\end{cases}
\]

Stable

Reference: [208], vol. 1, p364.

\[
C_{exp\text{-}\alpha}(h) = b\, e^{-\left(\frac{|h|}{a}\right)^{\alpha}}
\qquad\text{with } 0 < \alpha \leq 2
\]

Exponential

\[
C_{exp}(h) = b\, e^{-\frac{|h|}{a}}
\]

Gaussian

\[
C_{gaus}(h) = b\, e^{-\frac{|h|^2}{a^2}}
\]

Hole-effect

Reference: [208], vol. 1, p366.

\[
C_{hol}(h) = b\, e^{-\frac{|h|}{a}} \cos(\omega |h|)
\qquad\text{with } \frac{1}{a} \geq \sqrt{3}\, \omega
\]

Basset

Reference: [114], p43; [208], vol. 1, p363.

\[
C_{bas}(h) = b \left( \frac{|h|}{a} \right)^{\nu} K_\nu\!\left( \frac{|h|}{a} \right)
\qquad\text{with } \nu \geq 0
\]

Bessel

Reference: [114], p42; [208], vol. 1, p366.

\[
C_{bes}(h) = b \left( \frac{|h|}{a} \right)^{-(n-2)/2} J_{(n-2)/2}\!\left( \frac{|h|}{a} \right)
\]

with n equal to the number of spatial dimensions.

Cauchy

Reference: [208], vol. 1, p365.

\[
C_{cau}(h) = b \left[ 1 + \left( \frac{|h|}{a} \right)^{2} \right]^{-\alpha}
\qquad\text{with } \alpha > 0
\]

Variogram models

Power

Reference: [114], p128; [208], vol. 1, p406.

\[
\gamma_{pow\text{-}\alpha}(h) = b\, |h|^{\alpha} \qquad\text{with } 0 < \alpha < 2
\]

De Wijsian-a²

Reference: [113], vol. 1, p75.

\[
\gamma_{wijs\text{-}a^2}(h) = \frac{b}{2} \log(|h|^2 + a^2) \qquad\text{with } a \neq 0
\]

De Wijsian

Reference: [113], vol. 1, p75.

\[
\gamma_{wijs}(h) = b \log(|h|) \qquad\text{for } h \neq 0
\]


Additional Exercises

EXERCISE IV.1 (by C. DALY) Simple kriging with an exponential covariance
model in one dimension. Let Z(α) be a second-order stationary random function
defined at points α with α = 0, 1, ..., n located at regular intervals on a line. We
wish to estimate a value for the next point n+1 on the line.

i) Set up the simple kriging equations for an exponential covariance function

\[
C(h) = b\, e^{-\frac{|h|}{a}}
\]

where a is a range parameter.

ii) The solution of the system is

\[
w_\alpha = 0 \;\text{ for } \alpha = 0, \ldots, n-1
\qquad\text{and}\qquad
w_n = c
\]

Compute the value of c.

In time series analysis the random function

\[
Z(n) = \rho\, Z(n-1) + \varepsilon_n
\]

is called an autoregressive process of order one, AR(1), where ρ is a weight and
ε_n is a nugget-effect model

\[
\mathrm{cov}(\varepsilon_n, \varepsilon_m) = \delta_{nm}\, \sigma^2
= \begin{cases} \sigma^2 & \text{if } n = m, \\ 0 & \text{otherwise.} \end{cases}
\]

We restrict the exercise to stationary AR(1) processes with 0 < ρ < 1.

iii) Show that Z(n) can be expressed as a sum of values ε_α, where α = -∞, ..., n.

iv) Compute the covariance cov(Z(n), Z(m)) and the correlation coefficient for
n, m ∈ ℤ.

v) Compare the Box & Jenkins AR(1) estimator

\[
Z^*(n+1) = \rho\, Z(n),
\]

with the simple kriging solution obtained in ii).
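Point v) can be illustrated numerically: solving the simple kriging system with an exponential covariance on a regular 1-D grid (grid size and range below are arbitrary assumptions) gives a zero weight to every point except the last one, whose weight is ρ = e^{-1/a}, exactly the AR(1) estimator:

```python
import numpy as np

# Simple kriging of the next point of a regular 1-D grid with an
# exponential covariance: only the last data point receives weight.
a, b, n = 2.0, 1.0, 6
pts = np.arange(n + 1)                            # data points 0, ..., n
C = b * np.exp(-np.abs(pts[:, None] - pts[None, :]) / a)
c0 = b * np.exp(-np.abs(pts - (n + 1)) / a)       # covariances to point n+1

w = np.linalg.solve(C, c0)
rho = np.exp(-1.0 / a)
expected = np.zeros(n + 1); expected[n] = rho
assert np.allclose(w, expected)
```

This screening effect is a consequence of the Markov property of the exponential covariance in one dimension.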

EXERCISE IV.2 Z(x) is a second-order stationary random function with a zero
mean, split into two uncorrelated zero mean components, which yields the
regionalization model

\[
Z(x) = Y^S(x) + Y^L(x)
\]

where Y^S(x) is a component with a short range covariance C_S(h) and where
Y^L(x) is a component with a long range covariance C_L(h)

\[
C(h) = C_S(h) + C_L(h)
\]

Data is located on a regular grid and simple kriging (i.e. without condition
on the weights) is performed at the nodes of this grid to estimate short and long
range components, using a neighborhood incorporating all data.

What relation can be established between the weights λ_α^S and λ_α^L of the two
components at each point in the neighborhood?

EXERCISE IV.3 Z(x) is a locally second-order stationary random function
composed of uncorrelated zero mean components Y^S(x) and Y^L(x) as well as a drift
which is approximately constant in any local neighborhood of the domain.

Show that the sum of the krigings of the components and of the kriging of
the mean is equal to the ordinary kriging of Z(x)

\[
Y_S^*(x_0) + Y_L^*(x_0) + m^*(x_0) = Z^*(x_0)
\]


Solutions to Exercises

EXERCISE 17.3 We have \(\frac{1}{n} Z^T Y = \frac{1}{n} Z^T Z Q\) because Y = ZQ, and
R Q = Q Λ is the eigendecomposition of R. Therefore cov(Z_i, Y_p) = λ_p q_{ip} and,
dividing by the standard deviation of Y_p, which is √λ_p, we obtain the correlation
coefficient between the variable and the factor.

If the standardized variable is uncorrelated (orthogonal) with all others, an
eigenvector q_p exists with all elements zero except for the element q_{ip}
corresponding to that variable. As this element q_{ip} is 1 because of normalization and
the variable is identical with the factor, the eigenvalue has also to be 1.
EXERCISE 17.4

\[
R = \mathrm{corr}(Z, Z) = Q \Lambda Q^T
= Q \sqrt{\Lambda}\, \sqrt{\Lambda}\, Q^T
= \mathrm{corr}(Z, Y)\, \bigl[ \mathrm{corr}(Z, Y) \bigr]^T
\]
EXERCISE 18.1 The orthogonality constraints are not active as in PCA.

EXERCISE 18.2 Non active orthogonality constraints.

EXERCISE 18.3 Multiply the equation by the inverse of A.

EXERCISE 20.2

\[
C_{12}(h) = a_1\, C_{22}(h + r_1) + a_2\, C_{22}(h + r_2)
\]


EXERCISE 21.1

\[
f(\omega) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} b\, e^{-a|h| - i\omega h}\, dh
= \frac{b}{2\pi} \left[ \int_{-\infty}^{0} e^{(a - i\omega)h}\, dh
+ \int_{0}^{+\infty} e^{-(a + i\omega)h}\, dh \right]
\]
\[
= \frac{b}{2\pi} \left[ \frac{1}{a - i\omega} + \frac{1}{a + i\omega} \right]
= \frac{b}{\pi}\, \frac{a}{a^2 + \omega^2} \;\geq\; 0 \quad\text{for any } \omega.
\]

EXERCISE 21.2 The inequality between the spectral densities

\[
\frac{a_i\, a_j}{(a_i^2 + \omega^2)(a_j^2 + \omega^2)}
\;\geq\;
\left( \frac{\frac{a_i + a_j}{2}}{\left(\frac{a_i + a_j}{2}\right)^2 + \omega^2} \right)^{2}
\]

is rewritten as

\[
\frac{a_i\, a_j}{\left( \frac{a_i + a_j}{2} \right)^{2}}
\;\geq\;
\frac{(a_i^2 + \omega^2)(a_j^2 + \omega^2)}{\left( \left( \frac{a_i + a_j}{2} \right)^{2} + \omega^2 \right)^{2}}
\]

This inequality is false for a_i ≠ a_j when ω → ∞, because the left hand side is a
constant < 1 while the right hand side tends to 1, being dominated for large values
of ω by a term ω⁴ appearing both in the numerator and in the denominator. Thus
the set of direct and cross covariance functions is not an authorized covariance
function matrix.

In this exercise the sills were implicitly set to one (which implies a linear
correlation coefficient equal to one). YAGLOM [208], vol. 1, p. 315, gives the
example of a bivariate exponential covariance function model in which the range
parameters a are linked to the sill parameters b in such a way that to a given
degree of uncorrelatedness corresponds an interval of permissible ranges.
EXERCISE 22.1

\[
\sum_{i=1}^{N} \sum_{j=1}^{N} w_i\, w_j\, b_{ij} \geq 0
\qquad\text{and}\qquad
\sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} w_\alpha\, w_\beta\, \rho(x_\alpha - x_\beta) \geq 0
\]

because B is positive semi-definite and ρ(h) is a normalized covariance function.
Thus

\[
\Bigl( \sum_{i=1}^{N} \sum_{j=1}^{N} w_i\, w_j\, b_{ij} \Bigr)
\cdot
\Bigl( \sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} w_\alpha\, w_\beta\, \rho(x_\alpha - x_\beta) \Bigr)
= \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{\alpha=1}^{n} \sum_{\beta=1}^{n}
w_i^\alpha\, w_j^\beta\, b_{ij}\, \rho(x_\alpha - x_\beta) \;\geq\; 0
\]

with w_i^α = w_i w_α.

EXERCISE 22.2 γ(h) is a conditionally negative definite function and -γ(h)
is conditionally positive definite. Hence -B γ(h) is a conditionally positive definite
function matrix. The demonstration is in the same spirit as for a covariance
function matrix.
EXERCISE 23.1

i) The diagonal elements of the coregionalization matrix are positive and the
determinant is

\[
\begin{vmatrix} 2 & -1 \\ -1 & 1 \end{vmatrix} = 2 - 1 = 1
\]

The principal minors are positive and the matrix is positive definite.
The correlation coefficient is r = -1/√2 = -0.71.

ii) In the case of isotopy the system would be

\[
\begin{pmatrix}
C_{11}(x_0 - x_0) & C_{11}(x_1 - x_2) & C_{12}(x_0 - x_0) & C_{12}(x_1 - x_2) \\
C_{11}(x_2 - x_1) & C_{11}(x_0 - x_0) & C_{12}(x_2 - x_1) & C_{12}(x_0 - x_0) \\
C_{21}(x_0 - x_0) & C_{21}(x_1 - x_2) & C_{22}(x_0 - x_0) & C_{22}(x_1 - x_2) \\
C_{21}(x_2 - x_1) & C_{21}(x_0 - x_0) & C_{22}(x_2 - x_1) & C_{22}(x_0 - x_0)
\end{pmatrix}
\begin{pmatrix} w_1^1 \\ w_2^1 \\ w_1^2 \\ w_2^2 \end{pmatrix}
=
\begin{pmatrix}
C_{11}(x_1 - x_0) \\ C_{11}(x_2 - x_0) \\ C_{21}(x_1 - x_0) \\ C_{21}(x_2 - x_0)
\end{pmatrix}
\]

As there is no information about Z_1(x) at the point x_2, the second row of the
system and the second column of the left hand matrix vanish and we have

\[
\begin{pmatrix} 2 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 \\ 0 \\ -5/16 \end{pmatrix}, \quad
\begin{pmatrix} 0 \\ 0 \\ -1 \end{pmatrix}
\]

for the three estimation points.

iii) w_3 = 0, -5/16, -1; w_1 = w_2 = 0 for the three estimation points.

iv) z_1^*(x_0') = m_1 because the point x_0' is out of range of the two data points.

z_1^*(x_0'') = m_1 - (5/16)(z_2(x_2) - m_2). It is estimated from the residual of the
auxiliary variable at the point x_2.

z_1^*(x_2) = m_1 - (z_2(x_2) - m_2) is equivalent to the linear regression
at the point x_2.

As a conclusion of this exercise we see that in the heterotopic case cokriging
with an intrinsic correlation model does not boil down to kriging. For the two
points within the range of x_2 the only non zero weight is for a data value of the
auxiliary variable, while the primary variable solely contributes to the simple
cokriging estimator through its mean m_1.

EXERCISE 23.4 No.

EXERCISE 23.5 This strategy gives a trivial result. The sum of the weights for
the auxiliary variable S(x) is constrained to be zero in ordinary kriging. Thus the
weight for the only data value on S included in the cokriging neighborhood is
zero.
EXERCISE 24.1 When all coregionalization matrices B_u are proportional to one
matrix B we have:

\[
C(h) = \sum_{u=0}^{S} a_u\, B\, \rho_u(h) = B \sum_{u=0}^{S} a_u\, \rho_u(h)
\]

where the a_u are the coefficients of proportionality.

EXERCISE 24.2 As a matrix B_u is positive semi-definite by definition, the
positivity of its second order minors implies an inequality between direct and cross
sills

\[
|b_{ij}| \leq \sqrt{b_{ii}\, b_{jj}}
\]

from which the assertions are easily deduced.


EXERCISE 26.1 For w_α^U = w_α^V we have

\[
2 \sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} w_\alpha^U\, w_\beta^U\, C^{Re}(x_\alpha - x_\beta) \;\geq\; 0
\]

with any set of weights w_α^U. Thus C^{Re}(h) is a positive definite function.

EXERCISE 26.2

\[
\mathrm{var}\bigl( (1 + i)\, Z(0) + (1 - i)\, Z(h) \bigr) = 4\, C^{Re}(0) + 4\, C^{Im}(h) \geq 0
\]
\[
\mathrm{var}\bigl( (1 - i)\, Z(0) + (1 + i)\, Z(h) \bigr) = 4\, C^{Re}(0) - 4\, C^{Im}(h) \geq 0
\]

and we have C^{Re}(0) ≥ |C^{Im}(h)|.


EXERCISE 26.3 The estimation variance is:

\[
\mathrm{var}\bigl( Z(x_0) - Z^*_{CK}(x_0) \bigr)
= E\Bigl[ \bigl( Z(x_0) - Z^*_{CK}(x_0) \bigr) \cdot \overline{\bigl( Z(x_0) - Z^*_{CK}(x_0) \bigr)} \Bigr]
\]
\[
= E\Bigl[ \bigl( (U(x_0) - U^*_{CK}(x_0)) + i\, (V(x_0) - V^*_{CK}(x_0)) \bigr)
\cdot \bigl( (U(x_0) - U^*_{CK}(x_0)) - i\, (V(x_0) - V^*_{CK}(x_0)) \bigr) \Bigr]
\]
\[
= \mathrm{var}\bigl( U(x_0) - U^*_{CK}(x_0) \bigr)
+ \mathrm{var}\bigl( V(x_0) - V^*_{CK}(x_0) \bigr)
\]

EXERCISE 26.4 The estimation variance is:

\[
\mathrm{var}\bigl( U(x_0) - U^*_{CK}(x_0) \bigr)
= C_{UU}(x_0 - x_0)
+ \sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} \mu_\alpha\, \mu_\beta\, C_{UU}(x_\alpha - x_\beta)
+ \sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} \nu_\alpha\, \nu_\beta\, C_{VV}(x_\alpha - x_\beta)
\]
\[
+ 2 \sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} \mu_\alpha\, \nu_\beta\, C_{UV}(x_\alpha - x_\beta)
- 2 \sum_{\alpha=1}^{n} \mu_\alpha\, C_{UU}(x_\alpha - x_0)
- 2 \sum_{\alpha=1}^{n} \nu_\alpha\, C_{UV}(x_\alpha - x_0)
\]

EXERCISE 26.5 With an even cross covariance function $C_{UV}(h) = C_{VU}(h)$ the kriging variance of complex kriging is equal to
$$\sigma^2_{CC} = \sigma^2 - \mathbf{c}^{Re\,T}\, \mathbf{w}^{Re} = \sigma^2 - (\mathbf{c}_{UU} + \mathbf{c}_{VV})^T\, \mathbf{w}^{Re}$$
because the weights $\mathbf{w}^{Re}$ satisfy the equation system
$$C^{Re}\, \mathbf{w}^{Re} = \mathbf{c}^{Re} \iff (C_{UU} + C_{VV})\, \mathbf{w}^{Re} = \mathbf{c}_{UU} + \mathbf{c}_{VV}$$
The kriging variance of the separate kriging is
$$\sigma^2_{CS} = \sigma^2 - \mathbf{c}_{UU}^T\, \mathbf{w}_K^1 - \mathbf{c}_{VV}^T\, \mathbf{w}_K^2$$
where the weights are solutions of the two simple kriging systems
$$C_{UU}\, \mathbf{w}_K^1 = \mathbf{c}_{UU} \qquad\text{and}\qquad C_{VV}\, \mathbf{w}_K^2 = \mathbf{c}_{VV}$$
The difference between the two kriging variances is
$$\sigma^2_{CC} - \sigma^2_{CS} = -\mathbf{w}^{Re\,T}(\mathbf{c}_{UU} + \mathbf{c}_{VV}) + \mathbf{c}_{UU}^T\, \mathbf{w}_K^1 + \mathbf{c}_{VV}^T\, \mathbf{w}_K^2$$
$$= -\mathbf{w}^{Re\,T}(C_{UU}\, \mathbf{w}_K^1 + C_{VV}\, \mathbf{w}_K^2) + \mathbf{c}_{UU}^T\, \mathbf{w}_K^1 + \mathbf{c}_{VV}^T\, \mathbf{w}_K^2$$
$$= (\mathbf{w}_K^1 - \mathbf{w}^{Re})^T\, C_{UU}\, \mathbf{w}_K^1 + (\mathbf{w}_K^2 - \mathbf{w}^{Re})^T\, C_{VV}\, \mathbf{w}_K^2$$
$$= Q^2 + (\mathbf{w}_K^1 - \mathbf{w}^{Re})^T\, C_{UU}\, \mathbf{w}^{Re} + (\mathbf{w}_K^2 - \mathbf{w}^{Re})^T\, C_{VV}\, \mathbf{w}^{Re}$$
where $Q^2$ is the sum of two quadratic forms
$$Q^2 = (\mathbf{w}_K^1 - \mathbf{w}^{Re})^T\, C_{UU}\, (\mathbf{w}_K^1 - \mathbf{w}^{Re}) + (\mathbf{w}_K^2 - \mathbf{w}^{Re})^T\, C_{VV}\, (\mathbf{w}_K^2 - \mathbf{w}^{Re})$$
which are nonnegative because $C_{UU}$ and $C_{VV}$ are positive definite.
The difference between the two kriging variances reduces to $Q^2$:
$$\sigma^2_{CC} - \sigma^2_{CS} = Q^2 + (\mathbf{c}_{UU} + \mathbf{c}_{VV})^T\, \mathbf{w}^{Re} - \mathbf{w}^{Re\,T}\, C_{UU}\, \mathbf{w}^{Re} - \mathbf{w}^{Re\,T}\, C_{VV}\, \mathbf{w}^{Re} = Q^2$$
The variance of complex kriging is larger than the variance of the separate kriging of the real and imaginary parts when the cross covariance function is even. Equality is achieved with intrinsic correlation.
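The inequality and the identity $\sigma^2_{CC} - \sigma^2_{CS} = Q^2$ can be checked numerically; the sketch below uses invented exponential models with $C_{UV} = 0$ (trivially even) and assumed locations:

```python
import numpy as np

def expcov(xa, xb, sill, rng_):
    """Assumed exponential covariance between two sets of 1-D locations."""
    return sill * np.exp(-np.abs(xa[:, None] - xb[None, :]) / rng_)

x = np.array([0.0, 2.0, 7.0])              # data locations (assumed)
x0 = np.array([3.0])                       # estimation location (assumed)
Cuu, Cvv = expcov(x, x, 1.0, 5.0), expcov(x, x, 0.5, 2.0)
cuu, cvv = expcov(x, x0, 1.0, 5.0)[:, 0], expcov(x, x0, 0.5, 2.0)[:, 0]
sigma2 = Cuu[0, 0] + Cvv[0, 0]             # sigma^2 = C_UU(0) + C_VV(0)

w_re = np.linalg.solve(Cuu + Cvv, cuu + cvv)   # complex kriging weights
w1 = np.linalg.solve(Cuu, cuu)                 # separate simple kriging of U
w2 = np.linalg.solve(Cvv, cvv)                 # separate simple kriging of V

sigma2_cc = sigma2 - (cuu + cvv) @ w_re
sigma2_cs = sigma2 - cuu @ w1 - cvv @ w2
Q2 = (w1 - w_re) @ Cuu @ (w1 - w_re) + (w2 - w_re) @ Cvv @ (w2 - w_re)
assert sigma2_cc >= sigma2_cs - 1e-12      # complex kriging never beats separate
assert np.isclose(sigma2_cc - sigma2_cs, Q2)
```

Choosing $C_{VV}$ proportional to $C_{UU}$ (intrinsic correlation) makes $\mathbf{w}_K^1 = \mathbf{w}_K^2 = \mathbf{w}^{Re}$ and hence $Q^2 = 0$.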
EXERCISE 30.2 a)
$$\sigma^2_{OK} = C_{00} - \mathbf{k}_0^T\, \mathrm{K}^{-1}\, \mathbf{k}_0 = C_{00} - \mathbf{k}_0^T \begin{pmatrix} \mathbf{w}_{OK} \\ -\mu_{OK} \end{pmatrix} = C(x_0 - x_0) - \sum_{\alpha=1}^{n} w_\alpha\, C(x_\alpha - x_0) + \mu_{OK}$$
b) With $R = C^{-1}$, the blocks of $\mathrm{K}^{-1}$ are
$$u = \frac{-1}{\mathbf{1}^T R\, \mathbf{1}}, \qquad \mathbf{v} = \frac{R\, \mathbf{1}}{\mathbf{1}^T R\, \mathbf{1}}, \qquad A = R - \frac{R\, \mathbf{1}\,(R\, \mathbf{1})^T}{\mathbf{1}^T R\, \mathbf{1}}$$
so that
$$m^\star = (\mathbf{z}^T, 0)\, \mathrm{K}^{-1} \begin{pmatrix} \mathbf{0} \\ 1 \end{pmatrix} = (\mathbf{z}^T, 0) \begin{pmatrix} \mathbf{v} \\ u \end{pmatrix} = \mathbf{z}^T \mathbf{v} = \frac{\mathbf{z}^T R\, \mathbf{1}}{\mathbf{1}^T R\, \mathbf{1}}$$
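A numerical check of this weighted-mean formula (the exponential covariance and random data below are stand-ins for illustration): solving the bordered system and reading off $m^\star$ reproduces $\mathbf{z}^T R \mathbf{1} / \mathbf{1}^T R \mathbf{1}$.

```python
import numpy as np

# Bordered ordinary kriging matrix K = [[C, 1], [1^T, 0]]; the estimate of
# the mean is m* = (z^T, 0) K^{-1} (0, ..., 0, 1)^T = z^T R 1 / 1^T R 1.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 5))
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 3.0)   # assumed model
z = rng.normal(size=5)
ones = np.ones(5)

K = np.block([[C, ones[:, None]], [ones[None, :], np.zeros((1, 1))]])
m_star = np.concatenate([z, [0.0]]) @ np.linalg.solve(K, np.r_[np.zeros(5), 1.0])

R = np.linalg.inv(C)
assert np.isclose(m_star, z @ R @ ones / (ones @ R @ ones))
```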

c)
$$K_n(\mathbf{z}, \mathbf{s}) = \mathbf{z}^T A\, \mathbf{s} = \mathbf{1}^T R\, \mathbf{1} \left( \frac{\mathbf{z}^T R\, \mathbf{s}}{\mathbf{1}^T R\, \mathbf{1}} - \frac{\mathbf{z}^T R\, \mathbf{1}}{\mathbf{1}^T R\, \mathbf{1}} \cdot \frac{\mathbf{1}^T R\, \mathbf{s}}{\mathbf{1}^T R\, \mathbf{1}} \right)$$
$$= \mathbf{1}^T R\, \mathbf{1}\, \bigl( \mathrm{E}_n[\mathbf{z}\,\mathbf{s}] - \mathrm{E}_n[\mathbf{z}]\, \mathrm{E}_n[\mathbf{s}] \bigr) = \mathbf{1}^T R\, \mathbf{1}\; \mathrm{cov}_n(\mathbf{z}, \mathbf{s})$$
d)

Let z'=(~), s'=(~) and F- 1= ( s,TK S'0 ) = (AF


vI VF)
UF

K-l s'
with VF = s,T K-l s' and
AF = K-1 _ K-l s' (K-l s'?
s,TK-l s'

/f' ~ (zT,O,O)F- t m~ (.T,O,O) (::) ~z'T VF

z,T K- 1 s' Kn(z,s) COVn(Z,s)

m~
= s,TK-1s' = Kn(s,s) = COvn(s,s)
a* ~ (.T,O,O)F- t (zT,O)AF m
= z' (K-1 _ K- 1 s' (K-1 s'?)
S,T K-I S' 1
(0)
= Z' K- 1 (0) _
1
Z' K- 1 S' . S' K-I
s'K-Is'
(0)1
= En[z] - b* En[s)

EXERCISE 30.3 a) $\mathbf{v}_0 = -\mathrm{K}^{-1} \mathbf{k}_0\, u_{00}$ and
$$C_{00}\, u_{00} - \mathbf{k}_0^T \mathrm{K}^{-1} \mathbf{k}_0\, u_{00} = 1$$
$$u_{00}\, \bigl( C_{00} - \mathbf{k}_0^T \mathrm{K}^{-1} \mathbf{k}_0 \bigr) = 1$$
$$u_{00} = \frac{1}{\sigma^2_{OK}}$$

b)
$$K_{n+1}(\mathbf{z}_0, \mathbf{s}_0) = (z_0, \mathbf{z}^T, 0) \begin{pmatrix} u_{00} & \mathbf{v}_0^T \\ \mathbf{v}_0 & A_0 \end{pmatrix} \begin{pmatrix} s_0 \\ \mathbf{s} \\ 0 \end{pmatrix}$$
with $A_0 = \mathrm{K}^{-1} + u_{00}\, \mathrm{K}^{-1} \mathbf{k}_0\, (\mathrm{K}^{-1} \mathbf{k}_0)^T$:
$$K_{n+1}(\mathbf{z}_0, \mathbf{s}_0) = z_0\, u_{00}\, s_0 + (\mathbf{z}^T, 0)\, \mathbf{v}_0\, s_0 + z_0\, \mathbf{v}_0^T \begin{pmatrix} \mathbf{s} \\ 0 \end{pmatrix} + (\mathbf{z}^T, 0)\, A_0 \begin{pmatrix} \mathbf{s} \\ 0 \end{pmatrix}$$
$$= u_{00}\, z_0\, s_0 - u_{00}\, (\mathbf{z}^T, 0)\, \mathrm{K}^{-1} \mathbf{k}_0\, s_0 - u_{00}\, z_0\, (\mathrm{K}^{-1} \mathbf{k}_0)^T \begin{pmatrix} \mathbf{s} \\ 0 \end{pmatrix} + K_n(\mathbf{z}, \mathbf{s}) + u_{00}\, (\mathbf{z}^T, 0)\, \mathrm{K}^{-1} \mathbf{k}_0\, (\mathrm{K}^{-1} \mathbf{k}_0)^T \begin{pmatrix} \mathbf{s} \\ 0 \end{pmatrix}$$
$$= u_{00}\, \Bigl( z_0 - (\mathbf{z}^T, 0)\, \mathrm{K}^{-1} \mathbf{k}_0 \Bigr) \cdot \Bigl( s_0 - (\mathbf{s}^T, 0)\, \mathrm{K}^{-1} \mathbf{k}_0 \Bigr) + K_n(\mathbf{z}, \mathbf{s})$$

c) Writing $\delta(s_0 \mid \mathbf{s}) = s_0 - (\mathbf{s}^T, 0)\, \mathrm{K}^{-1} \mathbf{k}_0$ for the kriging residual of $s_0$:
$$K_{n+1}(\mathbf{z}_0, \mathbf{s}_0) = \bigl( z_0 - (\mathbf{z}^T, 0)\, \mathrm{K}^{-1} \mathbf{k}_0 \bigr) \cdot \frac{\delta(s_0 \mid \mathbf{s})}{\sigma^2_{OK}} + K_n(\mathbf{z}, \mathbf{s})$$
and
$$(u_{00}, \mathbf{v}_0^T) \begin{pmatrix} s_0 \\ \mathbf{s} \\ 0 \end{pmatrix} = u_{00}\, s_0 + \mathbf{v}_0^T \begin{pmatrix} \mathbf{s} \\ 0 \end{pmatrix} = u_{00}\, s_0 - u_{00}\, (\mathrm{K}^{-1} \mathbf{k}_0)^T \begin{pmatrix} \mathbf{s} \\ 0 \end{pmatrix} = u_{00}\; \delta(s_0 \mid \mathbf{s}) = \frac{\delta(s_0 \mid \mathbf{s})}{\sigma^2_{OK}}$$

EXERCISE 30.4 a) For $\alpha = 1$ we have
$$(u_{11}, \mathbf{v}_1^T) \begin{pmatrix} \mathbf{s} \\ 0 \end{pmatrix} = \frac{\delta(s_1 \mid \mathbf{s}_{[1]})}{\sigma^2_{[1]}}$$
and for $\alpha = n$
$$(v_{1,n}, \ldots, v_{n-1,n}, u_{nn}, v_{n+1,n}) \begin{pmatrix} \mathbf{s} \\ 0 \end{pmatrix} = \frac{\delta(s_n \mid \mathbf{s}_{[n]})}{\sigma^2_{[n]}}$$
as well as for $\alpha \neq 1$ and $\alpha \neq n$:
$$(v_{1,\alpha}, \ldots, v_{\alpha-1,\alpha}, u_{\alpha\alpha}, v_{\alpha+1,\alpha}, \ldots, v_{n+1,\alpha}) \begin{pmatrix} \mathbf{s} \\ 0 \end{pmatrix} = \frac{\delta(s_\alpha \mid \mathbf{s}_{[\alpha]})}{\sigma^2_{[\alpha]}}$$

b)
$$b^\star = \frac{K_n(\mathbf{z}, \mathbf{s})}{K_n(\mathbf{s}, \mathbf{s})} = \frac{(\mathbf{z}^T, 0)\, \mathrm{K}^{-1} \begin{pmatrix} \mathbf{s} \\ 0 \end{pmatrix}}{K_n(\mathbf{s}, \mathbf{s})} = \sum_{\alpha=1}^{n} z_\alpha\, \frac{\delta(s_\alpha \mid \mathbf{s}_{[\alpha]})}{\sigma^2_{[\alpha]}\, K_n(\mathbf{s}, \mathbf{s})}$$

EXERCISE 1.1
$$\mathbf{1}^T \mathbf{1} = n, \qquad \mathbf{1}\,\mathbf{1}^T = \underbrace{\begin{pmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{pmatrix}}_{(n \times n)}$$

EXERCISE 1.2
$\frac{1}{n}\, Z^T \mathbf{1} = \mathbf{m}$ is the vector of the means $m_i$ of the variables.
$\frac{1}{n}\, \mathbf{1}\,\mathbf{1}^T Z = M$ is an $n \times N$ matrix containing in each row the transpose of the vector of means; each column $\mathbf{m}_i$ of $M$ has $n$ elements equal to the mean $m_i$ of the variable number $i$.
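These identities can be sketched directly, assuming the book's convention of an $n \times N$ data matrix $Z$ with $n$ samples and $N$ variables (the random data are for illustration only):

```python
import numpy as np

n, N = 5, 3                           # n samples, N variables (assumed)
rng = np.random.default_rng(4)
Z = rng.normal(size=(n, N))
ones = np.ones(n)

assert ones @ ones == n               # 1^T 1 = n
m = Z.T @ ones / n                    # vector of the N variable means
M = np.outer(ones, ones) @ Z / n      # n x N matrix repeating m^T in each row
assert np.allclose(M, np.tile(m, (n, 1)))
```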

EXERCISE 1.7 The classical eigenvalue decomposition.

EXERCISE IV.1
i)
$$\sum_{\beta=1}^{n} w_\beta\, e^{-\frac{|\alpha-\beta|}{a}} = e^{-\frac{|\alpha-(n+1)|}{a}} \qquad \alpha = 1, \ldots, n$$

ii)
$$w_n\, e^{-\frac{|\alpha-n|}{a}} = e^{-\frac{|\alpha-(n+1)|}{a}} \qquad \alpha = 1, \ldots, n$$
$$\Rightarrow \quad w_n = \frac{e^{-\frac{|\alpha-(n+1)|}{a}}}{e^{-\frac{|\alpha-n|}{a}}} = \frac{e^{-\frac{n+1}{a}}}{e^{-\frac{n}{a}}} = e^{-\frac{1}{a}} \qquad \alpha = 1, \ldots, n$$

iii)
$$Z(n) = \rho\, Z(n-1) + \varepsilon_n = \rho\, [\rho\, Z(n-2) + \varepsilon_{n-1}] + \varepsilon_n$$
$$= \rho^2 Z(n-2) + \rho\, \varepsilon_{n-1} + \varepsilon_n = \rho^m Z(n-m) + \sum_{\alpha=0}^{m-1} \rho^\alpha\, \varepsilon_{n-\alpha}$$
As $\lim_{m \to \infty} \rho^m Z(n-m) = 0$ because $|\rho| < 1$,
$$Z(n) = \sum_{\alpha=0}^{\infty} \rho^\alpha\, \varepsilon_{n-\alpha} = \sum_{\alpha=-\infty}^{n} \rho^{n-\alpha}\, \varepsilon_\alpha$$
",=0 0=-00

iv) It is easy to show that $\mathrm{E}[Z(n)] = 0$. Then
$$\mathrm{cov}\bigl(Z(n), Z(m)\bigr) = \mathrm{E}[Z(n)\, Z(m)] = \mathrm{E}\Bigl[ \sum_{\alpha=0}^{\infty} \sum_{\beta=0}^{\infty} \rho^\alpha\, \rho^\beta\, \varepsilon_{n-\alpha}\, \varepsilon_{m-\beta} \Bigr]$$
$$= \sum_{\alpha=0}^{\infty} \sum_{\beta=0}^{\infty} \rho^\alpha\, \rho^\beta\, \mathrm{E}[\varepsilon_{n-\alpha}\, \varepsilon_{m-\beta}] = \sum_{\alpha=0}^{\infty} \rho^\alpha \Bigl( \sum_{\beta=0}^{\infty} \rho^\beta\, \sigma^2\, \delta_{n-\alpha,\, m-\beta} \Bigr)$$
$\mathrm{E}[\varepsilon_{n-\alpha}\, \varepsilon_{m-\beta}] = \sigma^2$ if $n - \alpha = m - \beta$, that is to say, if $\beta = m - n + \alpha$.

If $n \leq m$
$$\mathrm{cov}\bigl(Z(n), Z(m)\bigr) = \sum_{\alpha=0}^{\infty} \rho^\alpha\, \rho^{(m-n)+\alpha}\, \sigma^2 = \sigma^2\, \rho^{m-n} \sum_{\alpha=0}^{\infty} \rho^{2\alpha} = \frac{\sigma^2\, \rho^{m-n}}{1 - \rho^2}$$
and if $n > m$, we have $\alpha \geq n - m$, so
$$\mathrm{cov}\bigl(Z(n), Z(m)\bigr) = \sum_{\alpha=n-m}^{\infty} \rho^\alpha\, \rho^{(m-n)+\alpha}\, \sigma^2 = \sum_{i=0}^{\infty} \rho^{(n-m)+i}\, \rho^{(m-n)+(n-m)+i}\, \sigma^2 = \sigma^2\, \rho^{n-m} \sum_{i=0}^{\infty} \rho^{2i} = \frac{\sigma^2\, \rho^{n-m}}{1 - \rho^2}$$
Thus $\mathrm{cov}\bigl(Z(n), Z(m)\bigr) = \dfrac{\sigma^2\, \rho^{|n-m|}}{1 - \rho^2}$ and $r_{nm} = \rho^{|n-m|}$.

v) $\rho = w_n$, thus the range
$$a = \frac{1}{-\log \rho}$$
and the sill
$$b = \mathrm{var}\bigl(Z(n)\bigr) = \frac{\sigma^2}{1 - \rho^2}$$
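A simulation sketch of this result (parameter values chosen arbitrarily for illustration): realizations of the AR(1) recursion reproduce the covariance $\sigma^2 \rho^{|n-m|}/(1-\rho^2)$.

```python
import numpy as np

# Z(n) = rho * Z(n-1) + eps_n with white noise eps of variance sigma^2;
# the theoretical covariance at lag k is sigma^2 * rho^k / (1 - rho^2).
rho, sigma = 0.7, 1.0
rng = np.random.default_rng(2)

nburn, nlen, nrep = 200, 1000, 200     # burn-in, length, replications
zs = np.zeros((nrep, nlen))
for r in range(nrep):
    z = 0.0
    for t in range(nburn + nlen):
        z = rho * z + sigma * rng.normal()
        if t >= nburn:
            zs[r, t - nburn] = z

k = 3
emp = np.mean(zs[:, :-k] * zs[:, k:])              # empirical covariance
theo = sigma**2 * rho**k / (1 - rho**2)
assert abs(emp - theo) < 0.1
```

The fitted exponential model has sill $\sigma^2/(1-\rho^2) \approx 1.96$ and range parameter $a = -1/\log\rho$ here.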

EXERCISE IV.2 As we have exact interpolation, the kriging weights for $Z^\star(x_0)$ are linked by
$$w_\alpha = \delta_{x_\alpha, x_0}$$
As the estimators are linear we have
$$w_\alpha^{p} = \delta_{x_\alpha, x_0} - w_\alpha^{a}$$

EXERCISE IV.3 Ordinary kriging can be written as
$$\underbrace{\begin{pmatrix} C_{1,1} & \cdots & C_{1,n} & 1 \\ \vdots & & \vdots & \vdots \\ C_{n,1} & \cdots & C_{n,n} & 1 \\ 1 & \cdots & 1 & 0 \end{pmatrix}}_{\mathrm{A}} \left[ \underbrace{\begin{pmatrix} w_1^{p} \\ \vdots \\ w_n^{p} \\ \mu^{p} \end{pmatrix}}_{\mathbf{x}_p} + \underbrace{\begin{pmatrix} w_1^{a} \\ \vdots \\ w_n^{a} \\ \mu^{a} \end{pmatrix}}_{\mathbf{x}_a} + \underbrace{\begin{pmatrix} w_1^{m_0} \\ \vdots \\ w_n^{m_0} \\ \mu^{m_0} \end{pmatrix}}_{\mathbf{x}_{m_0}} \right] = \underbrace{\begin{pmatrix} C_{0,1}^{p} \\ \vdots \\ C_{0,n}^{p} \\ 0 \end{pmatrix}}_{\mathbf{b}_p} + \underbrace{\begin{pmatrix} C_{0,1}^{a} \\ \vdots \\ C_{0,n}^{a} \\ 0 \end{pmatrix}}_{\mathbf{b}_a} + \underbrace{\begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}}_{\mathbf{b}_{m_0}}$$
with $\mathrm{A}\, \mathbf{x}_p = \mathbf{b}_p$, $\mathrm{A}\, \mathbf{x}_a = \mathbf{b}_a$, $\mathrm{A}\, \mathbf{x}_{m_0} = \mathbf{b}_{m_0}$.

We then have
$$z^\star(x_0) = \mathbf{z}^T \mathbf{w}_p + \mathbf{z}^T \mathbf{w}_a + \mathbf{z}^T \mathbf{w}_{m_0} = \mathbf{z}^T (\mathbf{w}_p + \mathbf{w}_a + \mathbf{w}_{m_0}) = \mathbf{z}^T \mathbf{w}$$
where $\mathbf{z}$ is the vector of the $n$ data and $\mathbf{w}_p$, $\mathbf{w}_a$, $\mathbf{w}_{m_0}$, $\mathbf{w}$ are the vectors of the $n$ corresponding weights.
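The additivity above can be sketched numerically (the two nested covariance structures and locations below are assumptions for illustration): solving the ordinary kriging system once per right-hand side and summing the solutions reproduces the full OK weights.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 20, 6))
h = np.abs(x[:, None] - x[None, :])
Cp = np.exp(-h / 2.0)                # short-range component (assumed)
Ca = 0.5 * np.exp(-h / 10.0)         # long-range component (assumed)
C = Cp + Ca
x0 = 8.5
cp0 = np.exp(-np.abs(x - x0) / 2.0)
ca0 = 0.5 * np.exp(-np.abs(x - x0) / 10.0)

n = len(x)
A = np.block([[C, np.ones((n, 1))], [np.ones((1, n)), np.zeros((1, 1))]])
bp = np.r_[cp0, 0.0]                 # right-hand side for component p
ba = np.r_[ca0, 0.0]                 # right-hand side for component a
bm = np.r_[np.zeros(n), 1.0]         # right-hand side for the mean m0

wp, wa, wm = (np.linalg.solve(A, b) for b in (bp, ba, bm))
w_ok = np.linalg.solve(A, bp + ba + bm)
assert np.allclose(wp + wa + wm, w_ok)   # linearity of the system
```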
References and Software

This classification of selected references is aimed at the reader who wants to get a start into the subject for a specific topic or application. References are grouped under three headings: concepts, applications and books. Sources of computer software are listed at the end.

Concepts
Classification: SOUSA [175]; OLIVER & WEBSTER [135]; RASPA ET AL. [146].

Cokriging: MATHERON [116][121]; MARECHAL [109]; FRANÇOIS-BONGARÇON [64]; MYERS [132][133]; STEIN, EIJNSBERGEN & BARENDREGT [178]; WACKERNAGEL [196].

Cokriging of variables and their derivatives: CHAUVET ET AL. [27]; THIEBAUX &
PEDDER [187]; RENARD & RUFFO [148].

Coregionalization analysis, factor cokriging: MATHERON [122]; SANDJIVY [164]; WACKERNAGEL [194][195]; WACKERNAGEL ET AL. [198]; GOULARD [76][77]; ROYER [158]; DALY ET AL. [45]; GOOVAERTS [71][73]; GRZEBYK & WACKERNAGEL [81].

Cross covariance function, cross variogram: YAGLOM [207], [208]; MATHERON [114].

Cross validation: COOK & WEISBERG [36]; DUBRULE [61]; CASTELIER [21][22].

External drift: DELHOMME [54]; DELFINER ET AL. [51]; GALLI ET AL. [65]; CHILES & GUILLEN [33]; MARECHAL [110]; GALLI & MEUNIER [66]; RENARD & NAI-HSIEN [147]; CASTELIER [21], [22]; HUDSON & WACKERNAGEL [87].

Fractals and geostatistics: BRUNO & RASPA [19].

Generalized covariance, IRF-k: MATHERON [119]; DELFINER & MATHERON [52]; CHILES [31]; DOWD [60]; CHAUVET [24].

Kriging of spatial components: MATHERON [116], [122]; SANDJIVY [163]; GALLI ET AL. [65]; CHILES & GUILLEN [33].

Kriging weights: MATHERON [116]; RIVOIRARD [151].



Multivariate fitting of variograms/cross covariances: GOULARD [76][77]; LAJAUNIE [97]; BOURGAULT & MARCOTTE [15]; GOULARD & VOLTZ [78]; GRZEBYK [80].

Noise filtering: SWITZER & GREEN [185]; BERMAN [14]; MA & ROYER [105]; DALY [44]; DALY ET AL. [45][46].

Nonlinear geostatistics, disjunctive kriging, isofactorial models: MATHERON [120][123][124][125]; ORFEUIL [137]; LANTUÉJOUL [100][101]; RIVOIRARD [152][153][154][155]; PETITGAS [141]; LAJAUNIE [98].

Sensitivity of kriging: WARNES [201]; ARMSTRONG & WACKERNAGEL [12].

Simulation: ARMSTRONG & DOWD [9].


Space-time drift, trigonometric kriging: SEGURET & HUCHON [171]; SEGURET [170].

Space-time modeling: STEIN [180]; HASLETT [85]; GOODALL & MARDIA [70].

Spatial correlation mapping: SAMPSON & GUTTORP [162]; MONESTIEZ & SWITZER [130]; MONESTIEZ ET AL. [129]; GUTTORP & SAMPSON [83]; BROWN ET AL. [18].

Universal kriging: MATHERON [115]; HUIJBREGTS [88]; HUIJBREGTS [90]; SABOURIN [160]; CHILES [31]; CHAUVET & GALLI [26]; ARMSTRONG [7]; CHAUVET [24].

Variables linked by partial differential equations: MATHERON [117]; DONG [59];


MATHERON ET AL. [127].

Variogram cloud: CHAUVET [23]; HASLETT ET AL. [86].

Applications
Design of computer experiments: SACKS ET AL. [161].

Geography: HAINING [84].

Epidemiology: OLIVER ET AL. [136].

Fisheries: PETITGAS [141].

Forestry: MARBEAU [107]; FOUQUET & MANDALLAZ [63].

Geochemical exploration: SANDJIVY [163]; SANDJIVY [164]; WACKERNAGEL & BUTENUTH [197]; ROYER [158]; LINDNER & WACKERNAGEL [103]; WACKERNAGEL & SANGUINETTI [199].

Geodesy: MEIER & KELLER [128].



Geophysical exploration: GALLI ET AL. [65]; CHILES & GUILLEN [33]; SCHULZ-OHLBERG [168]; RENARD & NAI-HSIEN [147]; SEGURET & HUCHON [171]; SEGURET [170].
Hydrogeology: DELHOMME [53]; CREUTIN & OBLED [41]; BRAS & RODRÍGUEZ-ITURBE [17]; DE MARSILY [111]; STEIN [180]; AHMED & DE MARSILY [2]; DAGAN [43]; DONG [59]; ROUHANI & WACKERNAGEL [156]; CHILES [32]; MATHERON ET AL. [127].

Image analysis: SWITZER & GREEN [185]; BERMAN [14]; MA & ROYER [105]; DALY
ET AL. [45][46][47].
Industrial hygienics: PREAT [142]; SCHNEIDER ET AL. [166].
Material science: DALY ET AL. [45][46][47].
Meteorology: CHAUVET ET AL. [27]; THIEBAUX & PEDDER [187]; HASLETT [85];
THIEBAUX ET AL. [186]; HUDSON & WACKERNAGEL [87].
Mining: JOURNEL & HUIJBREGTS [93]; PARKER [140]; SOUSA [175].
Petroleum and gas exploration: DELHOMME ET AL. [55]; DELFINER ET AL. [51]; GALLI ET AL. [65]; MARECHAL [110]; GALLI & MEUNIER [66]; JAQUET [92]; RENARD & RUFFO [148].
Pollution: ORFEUIL [137]; LAJAUNIE [96]; BROWN ET AL. [18].
Soil science: WEBSTER [202]; WACKERNAGEL ET AL. [200]; GOULARD [77]; OLIVER & WEBSTER [135]; WEBSTER & OLIVER [204]; GOULARD & VOLTZ [78]; STEIN, STARISKY & BOUMA [179]; GOOVAERTS [72][73]; GOOVAERTS ET AL. [74]; PAPRITZ & FLÜHLER [138]; GOOVAERTS & WEBSTER [75]; WEBSTER ET AL. [203].

Books
Basic geostatistical texts: MATHERON [113], [114], [116], [126].
Introductory texts: DAVID [48]; JOURNEL & HUIJBREGTS [93]; CLARK [34]; DELFINER [50]; ARMSTRONG & MATHERON [10]; AKIN & SIEMES [3]; ISAAKS & SHRIVASTAVA [91]; CHAUVET [24], [25]; CRESSIE [40]; ARMSTRONG ET AL. [11]; RIVOIRARD [155].

Proceedings: GUARASCIO ET AL. [82]; VERLY ET AL. [192]; ARMSTRONG [8];


SOARES [174]; ARMSTRONG & DOWD [9]; DIMITRAKOPOULOS [58].
Books of related interest: MATERN [112]; YAGLOM [207], [208]; BOX & JENKINS [16]; LUMLEY [104]; BENNETT [13]; CLIFF & ORD [35]; ADLER [1]; RIPLEY [149]; VANMARCKE [190]; UPTON & FINGLETON [189]; BRAS & RODRÍGUEZ-ITURBE [17]; MARSILY [111]; RIPLEY [150]; RYTOV ET AL. [159]; THIEBAUX & PEDDER [187]; ANSELIN [6]; WEBSTER & OLIVER [204]; HAINING [84]; CHRISTENSEN [30]; STOYAN & STOYAN [182]; DIGGLE ET AL. [57].

Introduction to probability and statistics: FELLER [62]; MORRISON [131]; CHRISTENSEN [29]; SAPORTA [165]; STOYAN [181].

Books on multivariate analysis: RAO [145]; MORRISON [131]; MARDIA, KENT &
BIBBY [108]; COOK & WEISBERG [36]; ANDERSON [5]; SEBER [169]; GREENACRE [79];
VOLLE [193]; GITTINS [69]; GIFI [67]; SAPORTA [165]; WHITTAKER [205].

Software
Several public domain products exist, of varying quality and, up to now, of limited lifetime. The most commonly used collection of FORTRAN routines is available in the book by DEUTSCH & JOURNEL [56].
We list three sources of commercial software:

• Isatis. A general purpose 3D geostatistical package for workstations (a PC version is planned). Developed by: Centre de Géostatistique, Ecole des Mines de Paris, 35 rue Saint Honoré, F-77305 Fontainebleau, France.

• GDM. A geostatistical package for workstations and PC, oriented towards mining and geological applications. Developed by: Bureau de Recherches Géologiques et Minières, Boîte Postale 6009, F-45060 Orléans Cedex 2, France.

• S-Plus. A general purpose statistical system for workstations and PC. Functions for spatial statistics are described in the book by VENABLES & RIPLEY [191]. Developed by: Statistical Sciences Inc., 1700 Westlake Ave. N., Suite 500, Seattle, WA 98109, USA.
Bibliography

[1] ADLER R (1981) The Geometry of Random Fields. Wiley, New York.

[2] AHMED S & DE MARSILY G (1987) Comparison of geostatistical methods for estimating transmissivity using data on transmissivity and specific capacity. Water Resources Research, 23, 1717-1737.

[3] AKIN H & SIEMES H (1988) Praktische Geostatistik. Springer-Verlag, Heidelberg.

[4] ALMEIDA AS & JOURNEL AG (1994) Joint simulation of multiple variables with a Markov-type coregionalization model. Mathematical Geology, 26, 565-588.

[5] ANDERSON TW (1984) An Introduction to Multivariate Statistical Analysis. Wiley, New York, 675p.

[6] ANSELIN L (1988) Spatial Econometrics: Methods and Models. Kluwer Academic Publishers, Dordrecht, 284p.

[7] ARMSTRONG M (1984) Problems with universal kriging. Mathematical Geology, 16, 101-108.

[8] ARMSTRONG M, editor (1988) Geostatistics. Two volumes, Kluwer Academic Publisher, Amsterdam.

[9] ARMSTRONG M & DOWD PA, editors (1994) Geostatistical Simulation. Kluwer
Academic Publisher, Dordrecht, 255p.

[10] ARMSTRONG M & MATHERON G (1987) Geostatistical Case Studies. Reidel, Dordrecht.

[11] ARMSTRONG M, RENARD D, RIVOIRARD J & PETITGAS P (1992) Geostatistics


for Fish Survey. Publication C-148, Centre de Geostatistique, Ecole des Mines de
Paris, Fontainebleau, 89p.

[12] ARMSTRONG M & WACKERNAGEL H (1988) The influence of the covariance function on the kriged estimator. Sciences de la Terre, Série Informatique, 27, 245-262.

[13] BENNETT RJ (1979) Spatial Time Series. Pion, London, 674p.

[14] BERMAN M (1985) The statistical properties of three noise removal procedures for multichannel remotely sensed data. Division of Mathematics and Statistics, Consulting Report NSW/85/31/MB9, CSIRO, Lindfield, Australia.

[15] BOURGAULT G & MARCOTTE D (1991) Multivariable variogram and its application to the linear model of coregionalization. Mathematical Geology, 23, 899-928.

[16] BOX GEP & JENKINS GM (1970) Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.

[17] BRAS RL & RODRÍGUEZ-ITURBE I (1985) Random Functions and Hydrology. Addison-Wesley, Reading, 559p.

[18] BROWN PJ, LE ND & ZIDEK JV (1994) Multivariate spatial interpolation and exposure to air pollutants. The Canadian Journal of Statistics, 22, 489-509.

[19] BRUNO R & RASPA G (1989) Geostatistical characterization of fractal models of surfaces. In: Armstrong M (ed) Geostatistics, 77-89, Kluwer Academic Publisher, Dordrecht.

[20] CASTELIER E (1992) Le krigeage propagatif. Mémoire de DEA, Publication S-291, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau, 42p.

[21] CASTELIER E (1992) La dérive externe vue comme une régression linéaire. Publication N-34/92/G, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau, 22p.

[22] CASTELIER E (1993) Dérive externe et régression linéaire. Cahiers de Géostatistique, 3, 47-59, Ecole des Mines, Paris.

[23] CHAUVET P (1982) The variogram cloud. In: Johnson TB & Barnes RJ (eds) 17th APCOM, 757-764, Society of Mining Engineers, New York.

[24] CHAUVET P (1991) Aide-Mémoire de Géostatistique Linéaire. Cahiers de Géostatistique, Fascicule 2, Ecole des Mines de Paris, Fontainebleau, 210p.

[25] CHAUVET P (1993) Processing Data with a Spatial Support: Geostatistics and its Methods. Cahiers de Géostatistique, Fascicule 4, Ecole des Mines de Paris, Fontainebleau, 57p.

[26] CHAUVET P & GALLI A (1982) Universal Kriging. Publication C-96, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau.

[27] CHAUVET P, PAILLEUX J & CHILES JP (1976) Analyse objective des champs météorologiques par cokrigeage. La Météorologie, 6e série, No 4, 37-54.

[28] CHOQUET G (1967) Lectures on Analysis. Vol. 3, Benjamin, New York, 394p.

[29] CHRISTENSEN R (1987) Plane Answers to Complex Questions: the Theory of Linear Models. Springer-Verlag, New York.

[30] CHRISTENSEN R (1991) Linear Models for Multivariate, Time Series and Spatial Data. Springer-Verlag, New York.

[31] CHILES JP (1977) Géostatistique des Phénomènes Non Stationnaires (dans le plan). Doctoral thesis, Université de Nancy I, Nancy.

[32] CHILES JP (1991) Application du krigeage avec dérive externe à l'implantation d'un réseau de mesures piézométriques. Sciences de la Terre, Série Informatique, 30, 131-147.

[33] CHILES JP & GUILLEN A (1984) Variogrammes et krigeages pour la gravimétrie et le magnétisme. Sciences de la Terre, Série Informatique, 20, 455-468.

[34] CLARK I (1979) Practical Geostatistics. Elsevier Applied Science, London, 129p.

[35] CLIFF AD & ORD JK (1981) Spatial Processes: Models & Applications. Pion, London, 266p.

[36] COOK RD & WEISBERG S (1982) Residuals and Influence in Regression. Chapman and Hall, New York, 230p.

[37] CRAMER H (1940) On the theory of stationary random processes. Annals of Mathematics, 41, 215-230.

[38] CRAMER H & LEADBETTER MR (1967) Stationary and Related Stochastic Processes. Wiley, New York.

[39] CRESSIE N (1991) The origins of kriging. Mathematical Geology, 22, 239-252.

[40] CRESSIE N (1991) Statistics for Spatial Data. Wiley, New York, 900p.

[41] CREUTIN JD & OBLED C (1982) Objective analysis and mapping techniques for rainfall fields: an objective analysis. Water Resources Research, 18, 413-431.

[42] DAGAN G (1985) Stochastic modeling of groundwater flow by unconditional and conditional probabilities: the inverse problem. Water Resources Research, 21, 65-72.

[43] DAGAN G (1989) Flow and Transport in Porous Formations. Springer-Verlag, Berlin.

[44] DALY C (1991) Application de la Géostatistique à quelques Problèmes de Filtrage. Doctoral thesis, Ecole des Mines de Paris, Fontainebleau.

[45] DALY C, LAJAUNIE C & JEULIN D (1989) Application of multivariate kriging to the processing of noisy images. In: Armstrong M (ed) Geostatistics, 749-760, Kluwer Academic Publisher, Amsterdam, Holland.

[46] DALY C, JEULIN D, BENOIT D & AUCLAIR G (1990) Application of Multivariate Geostatistics to Macroprobe Mappings in Steels. ISIJ International, 30, 529-534.

[47] DALY C, JEULIN D & BENOIT D (1992) Nonlinear statistical filtering and applications to segregation in steels from microprobe images. Scanning Microscopy Supplement 6, 137-145.

[48] DAVID M (1977) Geostatistical Ore Reserve Estimation. Elsevier, Amsterdam.

[49] DAVIS JC (1973) Statistics and Data Analysis in Geology. Wiley, New York.

[50] DELFINER P (1979) Basic Introduction to Geostatistics. Publication C-78, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau, 125p.

[51] DELFINER P, DELHOMME JP & PELISSIER-COMBESCURE J (1983) Application of geostatistical analysis to the evaluation of petroleum reservoirs with well logs. 24th Annual Logging Symposium of the SPWLA, Calgary, June 27-30 1983.

[52] DELFINER P & MATHERON G (1980) Les fonctions aléatoires intrinsèques d'ordre k. Publication C-84, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau.

[53] DELHOMME JP (1978) Kriging in the hydrosciences. Advances in Water Resources, 1, 251-266.

[54] DELHOMME JP (1979) Réflexions sur la prise en compte simultanée des données de forages et des données sismiques. Publication LHM/RC/79/41, Centre d'Informatique Géologique, Ecole des Mines de Paris, Fontainebleau.

[55] DELHOMME JP, BOUCHER M, MEUNIER G & JENSEN F (1981) Apport de la géostatistique à la description des stockages de gaz en aquifère. Revue de l'Institut Français du Pétrole, 36, 309-327.

[56] DEUTSCH C & JOURNEL A (1992) GSLIB: Geostatistical Software Library and User's Guide. Oxford University Press, New York.

[57] DIGGLE PJ, LIANG KY & ZEGER SL (1994) Analysis of Longitudinal Data. Clarendon Press, Oxford, 253p.

[58] DIMITRAKOPOULOS R, editor (1994) Geostatistics for the Next Century. Kluwer
Academic Publishers, Dordrecht, 497p.

[59] DONG A (1990) Estimation Géostatistique des Phénomènes régis par des Equations aux Dérivées Partielles. Doctoral thesis, Ecole des Mines de Paris, Fontainebleau, 262p.

[60] DOWD PA (1989) Generalized cross-covariances. In: Armstrong M (ed) Geostatistics, 151-162, Kluwer Academic Publisher, Amsterdam, Holland.

[61] DUBRULE O (1983) Cross-validation of kriging in a unique neighborhood. Mathematical Geology, 15, 687-699.

[62] FELLER W (1968) An Introduction to Probability Theory and its Applications. Volume I, third edition, Wiley, New York, 509p.

[63] FOUQUET C DE & MANDALLAZ D (1993) Using geostatistics for forest inventory with air cover: an example. In: Soares A (ed) Geostatistics Tróia '92, 875-886, Kluwer Academic Publisher, Amsterdam.

[64] FRANÇOIS-BONGARÇON D (1981) Les corégionalisations, le cokrigeage. Publication C-86, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau.

[65] GALLI A, GERDIL-NEUILLET F & DADOU C (1984) Factorial kriging analysis: a substitute to spectral analysis of magnetic data. In: Verly G et al (eds) Geostatistics for Natural Resources Characterization, 543-557, NATO ASI Series C-122, Reidel, Dordrecht.

[66] GALLI A & MEUNIER G (1987) Study of a gas reservoir using the external drift method. In: Matheron G & Armstrong M (eds) Geostatistical Case Studies, 105-119, Reidel, Dordrecht.

[67] GIFI A (1990) Nonlinear Multivariate Analysis. Wiley, New York.

[68] GIHMAN II & SKOROHOD AV (1974) The Theory of Stochastic Processes I. Grundlehren der mathematischen Wissenschaften 210, Springer-Verlag, Berlin.

[69] GITTINS R (1985) Canonical Analysis: a Review with Applications in Ecology.


Biomathematics, Vol. 12, Springer-Verlag, Berlin, 351p.

[70] GOODALL C & MARDIA KV (1994) Challenges in multivariate spatio-temporal


modeling. Proceedings of XVIIth International Biometrics Conference, Volume 1,
1-17, Hamilton, Ontario.

[71] GOOVAERTS P (1992) Multivariate Geostatistical Tools for Studying Scale-Dependent Correlation Structures and Describing Space-Time Variations. Doctoral Thesis, Université Catholique de Louvain, Louvain-la-Neuve, 233p.

[72] GOOVAERTS P (1992) Factorial kriging analysis: a useful tool for exploring the structure of multivariate spatial information. Journal of Soil Science, 43, 597-619.

[73] GOOVAERTS P (1994) Study of spatial relationships between two sets of variables using multivariate geostatistics. Geoderma, 62, 93-107.

[74] GOOVAERTS P, SONNET P & NAVARRE A (1993) Factorial kriging analysis of springwater contents in the Dyle river basin, Belgium. Water Resources Research, 29, 2115-2125.

[75] GOOVAERTS P & WEBSTER R (1994) Scale-dependent correlation between topsoil copper and cobalt concentrations in Scotland. European Journal of Soil Science, 45, 79-95.

[76] GOULARD M (1988) Champs Spatiaux et Statistique Multidimensionnelle. Doctoral thesis, Université des Sciences et Techniques du Languedoc, Montpellier.

[77] GOULARD M (1989) Inference in a coregionalization model. In: Armstrong M (ed) Geostatistics, 397-408, Kluwer Academic Publisher, Amsterdam, Holland.

[78] GOULARD M & VOLTZ M (1992) Linear coregionalization model: tools for estimation and choice of multivariate variograms. Mathematical Geology, 24, 269-286.

[79] GREENACRE MJ (1984) Theory and Applications of Correspondence Analysis. Academic Press, London.

[80] GRZEBYK M (1993) Ajustement d'une Corégionalisation Stationnaire. Doctoral thesis, Ecole des Mines, Paris, 154p.

[81] GRZEBYK M & WACKERNAGEL H (1994) Multivariate analysis and spatial/temporal scales: real and complex models. Proceedings of XVIIth International Biometrics Conference, Volume 1, 19-33, Hamilton, Ontario.

[82] GUARASCIO M, DAVID M & HUIJBREGTS C, editors (1975) Advanced Geostatistics in the Mining Industry. NATO ASI Series C24, Reidel, Dordrecht, 461p.

[83] GUTTORP P & SAMPSON PD (1994) Methods for estimating heterogeneous spatial covariance functions with environmental applications. In: Patil GP & Rao CR (eds) Environmental Statistics, Handbook of Statistics, 12, 661-689, North-Holland, Amsterdam.

[84] HAINING R (1990) Spatial Data Analysis in the Social and Environmental Sciences. Cambridge University Press, Cambridge, 409p.

[85] HASLETT J (1989) Space time modeling in meteorology: a review. Proceedings of 47th Session, International Statistical Institute, Paris, 229-246.

[86] HASLETT J, BRADLEY R, CRAIG PS, WILLS G & UNWIN AR (1991) Dynamic graphics for exploring spatial data, with application to locating global and local anomalies. The American Statistician, 45, 234-242.

[87] HUDSON G & WACKERNAGEL H (1994) Mapping temperature using kriging with external drift: theory and an example from Scotland. International Journal of Climatology, 14, 77-91.

[88] HUIJBREGTS CJ (1970) Le variogramme des résidus. Publication N-200, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau.

[89] HUIJBREGTS CJ (1971) Reconstitution du variogramme ponctuel à partir d'un variogramme expérimental régularisé. Publication N-244, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau, 25p.

[90] HUIJBREGTS CJ (1972) Ajustement d'un variogramme expérimental à un modèle théorique. Publication N-287, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau, 22p.

[91] ISAAKS EH & SHRIVASTAVA RM (1989) Applied Geostatistics. Oxford University Press, Oxford, 561p.

[92] JAQUET O (1989) Factorial Kriging Analysis Applied to Geological Data from Petroleum Exploration. Mathematical Geology, 21, 683-691.

[93] JOURNEL AG & HUIJBREGTS CJ (1978) Mining Geostatistics. Academic Press, London, 600p.

[94] KENDALL M & STUART A (1977) The Advanced Theory of Statistics. Volume 1, 4th edition, Griffin, London, 472p.

[95] KRIGE DG, GUARASCIO M & CAMISANI-CALZOLARI FA (1989) Early South African geostatistical techniques in today's perspective. In: Armstrong M (ed) Geostatistics, 1-19, Kluwer Academic Publisher, Amsterdam, Holland.

[96] LAJAUNIE C (1984) A geostatistical approach to air pollution modeling. In: Verly G et al (eds) Geostatistics for Natural Resources Characterization, 877-891, NATO ASI Series C-122, Reidel, Dordrecht.

[97] LAJAUNIE C (1988) Ajustement automatique de corégionalisation. Unpublished manuscript.

[98] LAJAUNIE C (1993) L'Estimation Géostatistique Non Linéaire. Publication C-152, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau, 93p.

[99] LAJAUNIE C & BEJAOUI R (1991) Sur le krigeage des fonctions complexes. Note N-23/91/G, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau, 24p.

[100] LANTUÉJOUL C (1988) Some stereological and statistical consequences derived from Cartier's formula. Journal of Microscopy, 151, 265-276.

[101] LANTUÉJOUL C (1990) Cours de Sélectivité. Publication C-140, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau, 72p.

[102] LANTUÉJOUL C (1991) Ergodicity and integral range. Journal of Microscopy, 161, 387-403.

[103] LINDNER S & WACKERNAGEL H (1993) Statistische Definition eines Lateritpanzer-Index für SPOT/Landsat-Bilder durch Redundanzanalyse mit bodengeochemischen Daten. In: Peschel G (ed) Beiträge zur Mathematischen Geologie und Geoinformatik, Bd 5, 69-73, Sven-von-Loga Verlag, Köln.

[104] LUMLEY JL (1970) Stochastic Tools in Turbulence. Academic Press, London.

[105] MA YZ & ROYER JJ (1988) Local geostatistical filtering: application to remote sensing. Sciences de la Terre, Série Informatique, 27, 17-36.

[106] MAGNUS JR & NEUDECKER H (1988) Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, New York.

[107] MARBEAU JP (1976) Géostatistique Forestière. Doctoral thesis, Université de Nancy, Nancy.

[108] MARDIA KV, KENT JT & BIBBY JM (1979) Multivariate Analysis. Academic Press, London, 521p.

[109] MARECHAL A (1970) Cokrigeage et régression en corrélation intrinsèque. Publication N-205, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau.

[110] MARECHAL A (1984) Kriging seismic data in presence of faults. In: Verly G et al (eds) Geostatistics for Natural Resources Characterization, 271-294, NATO ASI Series C-122, Reidel, Dordrecht.

[111] MARSILY G DE (1986) Quantitative Hydrogeology. Academic Press, London, 440p.

[112] MATERN B (1960) Spatial Variation. Reprint, Springer-Verlag, Berlin, 144p.

[113] MATHERON G (1962) Traité de Géostatistique Appliquée. Technip, Paris.

[114] MATHERON G (1965) Les Variables Régionalisées et leur Estimation. Masson, Paris, 305p.

[115] MATHERON G (1969) Le Krigeage Universel. Fascicule 1, Les Cahiers du Centre de Morphologie Mathématique, Ecole des Mines de Paris, Fontainebleau, 83p.

[116] MATHERON G (1970) The Theory of Regionalized Variables and its Applications. Fascicule 5, Les Cahiers du Centre de Morphologie Mathématique, Ecole des Mines de Paris, Fontainebleau, 211p.

[117] MATHERON G (1971) La Théorie des Fonctions Aléatoires Intrinsèques Généralisées. Publication N-252, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau.

[118] MATHERON G (1972) Leçons sur les Fonctions Aléatoires d'Ordre 2. Ecole des Mines, Paris.

[119] MATHERON G (1973) The intrinsic random functions and their applications. Advances in Applied Probability, 5, 439-468.

[120] MATHERON G (1976) A simple substitute for conditional expectation: the disjunctive kriging. In: Guarascio M et al (eds) Advanced Geostatistics in the Mining Industry, 221-236, NATO ASI Series C 24, Reidel, Dordrecht.

[121] MATHERON G (1979) Recherche de simplification dans un problème de cokrigeage. Publication N-628, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau.

[122] MATHERON G (1982) Pour une Analyse Krigeante des Données Régionalisées. Publication N-732, Centre de Géostatistique, Fontainebleau, France, 22p.

[123] MATHERON G (1982) La sélectivité des distributions. Publication N-686, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau, 40p.

[124] MATHERON G (1984) The selectivity of distributions and "the second principle of geostatistics". In: Verly G et al (eds) Geostatistics for Natural Resources Characterization, 421-433, NATO ASI Series C-122, Reidel, Dordrecht.

[125] MATHERON G (1989) Two classes of isofactorial models. In: Armstrong M (ed) Geostatistics, 309-322, Kluwer Academic Publisher, Amsterdam, Holland.

[126] MATHERON G (1989) Estimating and Choosing. Springer-Verlag, Berlin, 141p.

[127] MATHERON G, ROTH C & FOUQUET C DE (1993) Modélisation et cokrigeage de la charge et de la transmissivité avec conditions aux limites à distance finie. Cahiers de Géostatistique, 3, 61-76, Ecole des Mines, Paris.

[128] MEIER S & KELLER W (1990) Geostatistik: eine Einführung in die Theorie
der Zufallsprozesse. Springer-Verlag, Wien, 206p.

[129] MONESTIEZ P, SAMPSON P & GUTTORP P (1993) Modeling of heterogeneous


spatial correlation structure by spatial deformation. Cahiers de Geostatistique, 3,
35-46, Ecole des Mines, Paris.

[130] MONESTIEZ P & SWITZER P (1991) Semiparametric estimation ofnonstationary


spatial covariance models by multidimensional scaling. Technical report No. 165,
Stanford University, Stanford.

[131] MORRISON DF (1978) MuItivariate Statistical Methods. Second Edition,


McGraw-Hill International, Auckland, 415p.

[132] MYERS DE (1982) Matrix formulation of cokriging. Mathematical Geology, 14,


249-258.

[133] MYERS DE (1983) Estimation oflinear combinations and cokriging. Mathemat-


ical Geology, 15,633-637.

[134] MYERS DE (1991) Pseudo-cross variograms, positive-definiteness, and cokriging.


Mathematical Geology, 23, 805-816.

[135] OLIVER MA & WEBSTER R (1989) A geostatistical basis for spatial weighting
in multivariate classificatioI}-. Mathematical Geology, 21, 15-35.

[136] OLIVER MA, LAJAUNIE C, WEBSTER R, MUIR KR & MANN JR (1993) Es-
timating the risk of childhood cancer. In: Soares A (Ed.) Geostatistics Troia '92,
899-910, Kluwer Academic Publisher, Amsterdam.

[137] ORFEUIL JP (1977) Etude, mise en oeuvre et test d'un modele de prediction a court terme de pollution atmospherique. Rapport CECA, Publication N-498, Centre de Geostatistique, Ecole des Mines de Paris, Fontainebleau.

[138] PAPRITZ A & FLÜHLER H (1994) Temporal change of spatially autocorrelated soil properties: optimal estimation by cokriging. Geoderma, 62, 29-43.

[139] PAPRITZ A, KÜNSCH HR & WEBSTER R (1993) On the pseudo cross-variogram. Mathematical Geology, 25, 1015-1026.

[140] PARKER H (1979) The volume-variance relationship: a useful tool for mine plan-
ning. Engineering and Mining Journal, 180, 10, 106-123.

[141] PETITGAS P (1993) Use of disjunctive kriging to model areas of high pelagic fish density in acoustic fisheries surveys. Aquatic Living Resources, 6, 201-209.

[142] PREAT B (1987) Application of geostatistical methods for estimation of the dispersion variance of occupational exposures. American Industrial Hygiene Association Journal, 48, 877-884.

[143] PRIESTLEY MB (1981) Spectral Analysis and Time Series. Academic Press, Lon-
don, 890p.

[144] RAO CR (1964) The use and interpretation of principal component analysis in applied research. Sankhya, Series A, 329-358.

[145] RAO CR (1973) Linear Statistical Inference and its Applications. 2nd edition, Wiley, New York, 625p.

[146] RASPA G, BRUNO R, DOSI P, PHILIPPI N & PATRIZI G (1993) Multivariate geostatistics for soil classification. In: Soares A (Ed.) Geostatistics Troia '92, 793-804, Kluwer Academic Publisher, Amsterdam.

[147] RENARD D & NAI-HSIEN M (1988) Utilisation de derives externes multiples. Sciences de la Terre, Serie Informatique, 28, 281-301.

[148] RENARD D & RUFFO P (1993) Depth, dip and gradient. In: Soares A (Ed.)
Geostatistics Troia '92, 167-178, Kluwer Academic Publisher, Amsterdam.

[149] RIPLEY BD (1981) Spatial Statistics. Wiley, New York, 264p.

[150] RIPLEY BD (1987) Stochastic Simulation. Wiley, New York, 237p.

[151] RIVOIRARD J (1984) Le Comportement des Poids de Krigeage. Doctoral thesis, Ecole des Mines de Paris, Fontainebleau.

[152] RIVOIRARD J (1988) Modeles a residus d'indicatrices autokrigeables. Sciences de la Terre, Serie Informatique, 28, 303-326.

[153] RIVOIRARD J (1989) Models with orthogonal indicator residuals. In: Armstrong
M (ed) Geostatistics, 91-107, Kluwer Academic Publisher, Dordrecht.

[154] RIVOIRARD J (1993) Relations between the indicators related to a regionalized variable. In: Soares A (Ed.) Geostatistics Troia '92, 273-284, Kluwer Academic Publisher, Amsterdam.

[155] RIVOIRARD J (1994) Introduction to Disjunctive Kriging and Non-Linear Geostatistics. Oxford University Press, 181p.

[156] ROUHANI S & WACKERNAGEL H (1990) Multivariate geostatistical approach to space-time data analysis. Water Resources Research, 26, 585-591.

[157] ROYER JJ (1984) Proximity analysis: a method for multivariate geodata processing. Sciences de la Terre, Serie Informatique, 20, 223-243.

[158] ROYER JJ (1988) Analyse Multivariable et Filtrage des Donnees Regionalisees. Doctoral thesis, Institut National Polytechnique de Lorraine, Nancy.

[159] RYTOV SM, KRAVTSOV YA & TATARSKII VI (1989) Principles of Statistical Radiophysics 3: Elements of Random Fields. Springer-Verlag, Berlin, 238p.

[160] SABOURIN R (1976) Application of two methods for the interpretation of the
underlying variogram. In: Guarascio M et al. (eds) Advanced Geostatistics in the
Mining Industry, 101-109, NATO ASI Series C 24, Reidel, Dordrecht.

[161] SACKS J, WELCH WJ, MITCHELL TJ & WYNN HP (1989) Design and analysis
of computer experiments. Statistical Science, 4, 409-435.

[162] SAMPSON P & GUTTORP P (1992) Nonparametric estimation of nonstationary spatial structure. Journal of the American Statistical Association, 87, 108-119.

[163] SANDJIVY L (1983) Analyse krigeante de donnees geochimiques. Sciences de la Terre, Serie Informatique, 18, 141-172.

[164] SANDJIVY L (1984) The factorial kriging analysis of regionalized data. In: Verly G et al. (eds) Geostatistics for Natural Resources Characterization, 559-571, NATO ASI Series C-122, Reidel, Dordrecht.

[165] SAPORTA G (1990) Probabilites, Analyse des Donnees et Statistique. Technip, Paris.

[166] SCHNEIDER T, HOLM PETERSEN O, AASBERG NIELSEN A & WINDFELD K (1990) A geostatistical approach to indoor surface sampling strategies. Journal of Aerosol Science, 21, 555-567.

[167] SCHOENBERG IJ (1938) Metric spaces and completely monotone functions. An-
nals of Mathematics, 39, 811-841.

[168] SCHULZ-OHLBERG (1989) Die Anwendung geostatistischer Verfahren zur Interpretation von gravimetrischen und magnetischen Felddaten. Wissenschaftlich-Technische Berichte, 1989-6, Deutsches Hydrographisches Institut, Hamburg.

[169] SEBER GAF (1984) Multivariate Observations. Wiley, New York, 686p.

[170] SEGURET S (1993) Analyse krigeante spatio-temporelle appliquee a des donnees aeromagnetiques. Cahiers de Geostatistique, 3, 115-138, Ecole des Mines, Paris.

[171] SEGURET S & HUCHON P (1990) Trigonometric kriging: a new method for removing the diurnal variation from geomagnetic data. Journal of Geophysical Research, Vol. 95, No. B13, 21.383-21.397.

[172] SERRA J (1968) Les structures gigognes: morphologie mathematique et interpretation metallogenique. Mineralium Deposita, 3, 135-154.

[173] SHILOV GE & GUREVICH BL (1977) Integral, Measure and Derivative: a Unified Approach. Dover, New York, 233p.

[174] SOARES A, editor (1993) Geostatistics Troia '92. Two volumes, Kluwer Academic Publisher, Amsterdam.

[175] SOUSA AJ (1989) Geostatistical data analysis: an application to ore typology. In: Armstrong M (ed) Geostatistics, 851-860, Kluwer Academic Publisher, Amsterdam, Holland.

[176] SPECTOR A & BHATTACHARYYA BK (1966) Energy density spectrum and au-
tocorrelation functions due to simple magnetic models. Geophysical Prospecting,
14, 242-272.

[177] SPECTOR A & GRANT FS (1970) Statistical models for interpreting aeromag-
netic data. Geophysics, 35, 293-302.

[178] STEIN A, EIJNSBERGEN AC VAN & BARENDREGT LG (1991) Cokriging nonstationary data. Mathematical Geology, 23, 703-719.

[179] STEIN A, STARITSKY IG & BOUMA J (1991) Simulation of moisture deficits and
areal interpolation by universal cokriging. Water Resources Research, 27, 1963-
1973.

[180] STEIN M (1986) A simple model for spatial-temporal processes. Water Resources
Research, 22, 2107-2110.

[181] STOYAN D (1993) Stochastik für Ingenieure und Naturwissenschaftler. Akademie-Verlag, Berlin, 307p.

[182] STOYAN D & STOYAN H (1994) Fractals, Random Shapes and Point Fields.
Wiley, New York, 406p.

[183] STRANG G (1984) Linear Algebra and its Applications. Second edition, Academic
Press, London.

[184] STRANG G (1986) Introduction to Applied Mathematics. Wellesley-Cambridge


Press, Wellesley.

[185] SWITZER P & GREEN AA (1984) Minimax autocorrelation factors for multi-
variate spatial imagery. Department of Statistics, Stanford University, Technical
Report No 6, 14p.

[186] THIEBAUX HJ, MORONE LL & WOBUS RL (1990) Global forecast error cor-
relation. Part 1: isobaric wind and geopotential. Monthly Weather Review, 118,
2117-2137.

[187] THIEBAUX HJ & PEDDER MA (1987) Spatial Objective Analysis: with Appli-
cations in Atmospheric Science. Academic Press, London, 299p.

[188] TYLER DE (1982) On the optimality of the simultaneous redundancy transformations. Psychometrika, 47, 77-86.

[189] UPTON G & FINGLETON B (1985) Spatial Data Analysis by Example: Point
Pattern and Quantitative Data. Wiley, Chichester, 410p.

[190] VANMARCKE E (1983) Random Fields: Application and Synthesis. MIT Press,
Cambridge, 382p.

[191] VENABLES WN & RIPLEY BD (1994) Modern Applied Statistics with S-Plus.
Springer-Verlag, New York, 462p.

[192] VERLY G, DAVID M & JOURNEL AG, editors (1984) Geostatistics for Natu-
ral Resources Characterization. Two volumes, NATO ASI Series C-122, Reidel,
Dordrecht.

[193] VOLLE M (1985) Analyse des Donnees. 3e edition, Economica, Paris.

[194] WACKERNAGEL H (1985) L'inference d'un Modele Lineaire en Geostatistique Multivariable. Doctoral thesis, Ecole des Mines de Paris, Fontainebleau, 100p.

[195] WACKERNAGEL H (1988) Geostatistical techniques for interpreting multivariate
spatial information. In: Chung CF et al. (eds) Quantitative Analysis of Mineral
and Energy Resources, 393-409, NATO ASI Series C 223, Reidel, Dordrecht.

[196] WACKERNAGEL H (1994) Cokriging versus kriging in regionalized multivariate data analysis. Geoderma, 62, 83-92.

[197] WACKERNAGEL H & BUTENUTH C (1989) Caracterisation d'anomalies geochimiques par la geostatistique multivariable. Journal of Geochemical Exploration, 32, 437-444.

[198] WACKERNAGEL H, PETITGAS P & TOUFFAIT Y (1989) Overview of methods for coregionalization analysis. In: Armstrong M (ed) Geostatistics, 409-420, Kluwer Academic Publisher, Amsterdam, Holland.

[199] WACKERNAGEL H & SANGUINETTI H (1992) Gold prospecting with factorial cokriging in the Limousin, France. In: Davis JC & Herzfeld UC (eds) Computers in Geology: 25 years of progress. Studies in Mathematical Geology, Vol. 5, 33-43, Oxford University Press.

[200] WACKERNAGEL H, WEBSTER R & OLIVER MA (1988) A geostatistical method for segmenting multivariate sequences of soil data. In: Bock HH (ed) Classification and Related Methods of Data Analysis, 641-650, Elsevier (North-Holland), Amsterdam.

[201] WARNES JJ (1986) A sensitivity analysis of universal kriging. Mathematical Geology, 18, 653-676.

[202] WEBSTER R (1978) Optimally partitioning soil transects. Journal of Soil Science, 29, 388-402.

[203] WEBSTER R, ATTEIA O & DUBOIS JP (1994) Coregionalization of trace metals in the soil in the Swiss Jura. European Journal of Soil Science, 45, 205-218.

[204] WEBSTER R & OLIVER MA (1990) Statistical Methods in Soil and Land Re-
source Survey. Oxford University Press, Oxford.

[205] WHITTAKER J (1990) Graphical Models in Applied Multivariate Analysis. Wiley, New York.

[206] XU W, TRAN TT, SRIVASTAVA RM & JOURNEL AG (1992) Integrating seismic data in reservoir modeling: the collocated cokriging alternative. Proceedings of 67th Annual Technical Conference of the Society of Petroleum Engineers, paper SPE 24742, 833-842, Washington.

[207] YAGLOM AM (1962) An Introduction to the Theory of Stationary Random Functions. Dover, New York, 235p.

[208] YAGLOM AM (1986) Correlation Theory of Stationary and Related Random Functions. Springer-Verlag, Berlin.
Index

analysis
    canonical 123, 127, 162
    coregionalization 158, 160, 163
    correspondence 127
    discriminant 161
    factor 160
    factorial kriging 160
    multiple time series 174
    multivariate time series 174
    principal component 115, 141, 158, 160, 164
    redundancy 162
    regionalized multivariate 153, 163
    time series 221
    with instrumental variables 162
analytic function 110
anisotropy
    geometric 46, 84
    zonal 48
anomalies
    groupwise 94
    pointwise 94
anomaly 31, 80, 94
autokrigeability 149
autokrigeability coefficients 150
autoregressive process 221
behavior at the origin (variogram) 109
bilinear coregionalization model 172, 174
Bochner's theorem 38, 137
canonical correlation 124
causal relations 174
centered variable 12, 16
center of mass 7
characteristic scales 94, 96, 158, 160, 173
circle of correlations 118, 160, 164
codispersion coefficient 142, 163
coefficient of variation 64
coherence of estimators 148
cokriging
    autokrigeability 149
    coherence 148
    collocated 151
    heterotopy 151
    intrinsic correlation 150, 151, 169
    ordinary 145, 146
    real and imaginary parts 168
    regionalized factors 162
    simple 147, 168
    variance 146
    with isotopy 148
collocated cokriging 151
compatible imaginary part 170
complex covariance function
    compatible imaginary part 170
    definition 137, 166
    fitting 171
    modeling 170
    nested multivariate 173
    real and imaginary part 166
complex kriging
    definition 167, 168
    even cross covariance 170
complex variable
    kriging 166
    model 166
component
    long range 105
    nugget-effect 105
    principal 115
    short range 105
    spatial 97, 100, 152
conditional independence 86
conditionally negative definite 38
contingency table 126
252 Index

coregionalization
    analysis 158
    between groups 162
    bilinear model 172, 173, 174
    complex linear model 172, 173
    linear model 153, 160
    matrix 141, 173
    mixture of matrices 158, 164
    real and imaginary parts 166, 168
    within group 162
correlation
    intrinsic 169
correlation (regionalized) 158, 163, 164
correlation circle 118, 160, 164
correlation coefficient 12, 158
correlation function 37, 95, 128, 140, 152
correlation with factors 118, 121
cospectrum 139
covariance 11
    between drift coefficients 180
    generalized 184, 186
    of increments 39
    partial 91
covariance function
    absolutely integrable 138
    bounded 37
    complex 137, 166, 172
    continuous 38, 40, 137
    cross 131
    definition 37
    direct 131
    experimental 134
    factorizable 86
    generalized 184, 186
    matrix 137, 140
    multivariate nested 152, 161
    nested complex 173
    spectral density function 138
    spectral representation 137
covariance function model
    Basset 219
    Bessel 219
    Cauchy 219
    cubic 218
    exponential 40, 41, 58, 86, 138, 219
    Gaussian 40, 83, 86, 109, 110, 111, 219
    hole-effect 219
    intrinsic correlation 140, 172
    nugget-effect 41, 42, 69, 72, 82, 218
    spherical 42, 83, 218
    stable 40, 110, 111, 219
covariance model
    exponential 221
covariogram (geometric) 43, 182
Cramer's theorem 137
cross covariance function
    antisymmetric behavior 133
    characterization 136
    complex 137, 172
    continuous 137
    cross 137
    definition 131
    direct 137
    even 170, 172
    even term 133, 135
    experimental 134
    intrinsic correlation 140, 172
    nested complex 173
    non even 174
    odd term 133, 135
    relation with variogram 133
    spectral density function 138, 139
    spectral representation 137
    with derivative 133
cross validation
    error 80
    kriging 80, 91
    multiple linear regression 216
    with external drift 197
cross variogram 133, 143, 154
cumulative histogram 8
cut-off value 55, 62
delay effect 132, 134, 173, 174
density function 9, 128
Dirac measure 66
disjunctive
    kriging 128
    table 126
dispersion variance 52, 58

dissimilarity 30, 75
distribution
    bivariate 28, 128
    function 8, 28, 62
    Gaussian 65, 86
    joint 214
    lognormal 65
    moments of 9
    multiple 28
    multivariate 28
    uniform 65
drift 36, 97, 177, 185, 190
drift estimation 180
dual kriging system 178
eigenvalue
    decomposition 209
    definition 207
    interpretation 116
    problem 116
ellipsoid 46, 47
epistemology 25
estimation
    error 19, 89
    variance 20, 70
Euler's angles 47
exact interpolation 76, 89, 90, 108, 179, 186, 191
expectation (conditional) 213
expectation (expected value) 9
extension variance 51
external drift 184, 190, 193
external drift (regularity) 200
extrapolation 88, 109, 111
factor
    dilution 120
    in multi-Gaussian context 115
    interpretation 115
    in two groups 123
    pairs of 119
    shape 119
    size 119
    variance 116
filtering 100, 103, 107, 108, 188
filtering drift 186
filtering nugget-effect 108
fitting by eye 34
Gaussian isofactorial model 128
Gaussian variogram/covariance 40, 83, 86, 109, 110, 111, 219
generalized covariance function 184, 186
geometric anisotropy 46, 84
geometric covariogram 43, 182
Gini coefficient 64
golden mean 48
Goulard's algorithm 155
groupwise anomalies 94
Hermite polynomials 128
Hermitian matrix 137
Hermitian positive semi-definite 137
heterotopy 144
hole-effect 219
hull of perfect correlation 154
icosahedron 48
increments 36, 146
indicator function 43, 127, 182
indicator function (kriging) 128
indicator residuals 150
infinitely differentiable at origin 110
inhomogeneity 31
integral range 171
intrinsic
    correlation 140, 150, 159, 160, 163, 165, 172
    random function of order-k 185, 187
    regionalization 98
    stationarity 36
invariance
    rotation 41
    translation 29, 36
inverse distance interpolator 86, 89
investment 63
iso-variogram lines 46
isofactorial model 128
isolines 90, 110, 111
isotopy 144, 148
isotropic 41, 183
Krige's relation 54, 55, 61

kriging
    block (BK) 77
    complex (CC) 167, 168
    conservative 107
    disjunctive 128
    dual system 178
    exact interpolation 76
    filtering 100, 103
    indicator 128
    of drift coefficients 180, 194
    of the drift 180
    of the mean (KM) 72, 77, 106, 181
    of the residual (KR) 79
    ordinary (OK) 75, 78, 99
    simple 19, 77, 86, 211, 221
    singular kriging matrix 211
    spatial components 100
    standard deviation 89
    universal (UK) 177, 178, 179, 184, 191, 193
    variance 21, 72, 89, 91
    weights 20
    with duplicated sample 211
    with external drift 193
    with known mean 19
Kronecker product 149
Kronecker symbol 194
Lagrange
    method 71
    multiplier 71, 72, 116, 125, 146, 178, 180
linear interpolation 90
linear model
    bilinear 172, 173
    intrinsic correlation 141, 172
    IRF-k 185
    of coregionalization 153, 160, 172
    of regionalization 97, 106
    spatial multivariate 152
    universal kriging 177, 184
linear regression 14
linked windows 31
locally stationary 99, 183
local mean 99, 101, 105, 107, 162
local neighborhood 162
Lorenz curve 66
map
    isoline 90
    raster 89
matrix
    Euclidean norm 157
    of correlations 206
    variance-covariance 205
mean 7, 62, 68, 69
measures of dispersion 62
missing values 17
moment 9
morphological objects 45, 49
morphology 95
morphometrics 49
multidimensional scaling 49
multiple linear regression 16, 159, 214, 216
multivariate outliers 121
negative definite
    function 38
    matrix 38
negative kriging weights 88, 109
neighborhood
    collocated cokriging 151
    local 91, 99
    moving 73, 90, 99, 106, 111, 183, 187, 193
    radius 91, 99
    size 80, 91
    unique 110, 111, 178
nested covariance function 173
nested variogram 95
noise removal 161
nonlinear geostatistics 128, 150
nugget-effect 33, 41, 42, 82, 105, 109, 132, 187, 218
nugget-effect filtering 108
objective function 71, 116, 124
outliers 31, 80, 121
perfect correlation hull 154
phase shift 139, 173
phase spectrum 139

pointwise anomalies 94
Poisson points 43, 60
polygon method 54
positive definite
    k-th order conditionally 187
    criteria 208
    function 37, 131
    Hermitian 137
    matrix 37, 208
positive semi-definite matrix 38, 154, 208
practical range 41, 57
principal axes 46
probability 8
profit 63, 68
projection 215
pseudo cross variogram 135
quadrature spectrum 139
quantity vs tonnage 66
random function 27
random function (intrinsic of order-k) 185, 187
randomization 171
randomness 27
range 42, 45, 91
range (integral) 171
raster map 89
recovered quantity 62
regionalization
    intrinsic 98
    locally stationary 99
    mixed model 98, 100
    multivariate 152
    second-order stationary 97
regionalized
    correlation coefficient 158, 164
    value 27
    variable 26
residual 11
resolution of a map 106
rotation invariant 41
sampling design 59
scale-dependent correlation 163
scatter diagram 10
Schoenberg's theorem 40
screen effect 85, 87
selectivity
    definition 64
    Gaussian 65
    geometric meaning 66
    index 65
    lognormal 65
sensitivity of kriging 109
shape function 191
sill 33, 40, 42, 48, 73
simple cokriging 147
simple kriging 19
singular value decomposition 124, 128, 209
size effect 120
space-time drift 188
spatial
    anomaly 94
    characteristic scales 158
    component 97, 100
spatial correlation mapping 49
spectral analysis 103
spherical model 43
spherical model (interpretation) 45, 91
spikes on the map 108
standard deviation 12
standardized variable 11, 12
stationarity
    dependence on scale 191
    intrinsic 36, 37, 98
    joint intrinsic 133
    joint second-order 131, 172
    local second-order 73, 99, 106, 162, 183
    second-order 18, 37, 97
    strict 29
support 50, 68
support effect 55
theorem
    Bochner 38, 137
    Cramer 137
    Schoenberg 40
time series analysis 174, 221
tonnage 62

translation invariance 29, 36, 131, 184, 185
trembling hand 111
unbiased 20, 70, 75
underlying variogram 181, 184
uniformity coefficient 65
unique neighborhood 110, 111, 178
unique realization 25
variance
    cokriging 146
    decomposition 117
    dispersion 52, 58
    estimation 20, 70
    experimental 8
    extension 51
    kriging 21, 72, 91
    of factor 117
    of increments 36
    of the data 33
    theoretical 10, 63, 82
variance-covariance matrix 16, 115, 140, 149, 158, 159, 161, 164, 205
variogram
    anisotropy 45
    as generalized covariance 187, 193
    authorized 75, 100, 154
    bivariate fit 154
    bounded 40, 73
    cloud 31
    cross 133
    direct 133
    even function 36
    experimental 32, 45, 61, 134
    fitting 34, 154, 155
    integrals 52
    multivariate fit 155
    multivariate nested 154, 158
    near the origin 33, 41, 42, 61, 109
    nested 95, 103, 159
    normalized 95, 154
    range 42, 45
    regional 35, 182
    regularized 57
    relation with covariance 37
    sill 33, 40, 42
    slope 33
    theoretical 34, 35, 36
    unbounded 96
    underlying 181, 184
    with drift 193
variogram model
    Basset 219
    Bessel 219
    Cauchy 219
    cubic 218
    De Wijsian 220
    De Wijsian-a 220
    exponential 40, 41, 58, 86, 138, 219
    Gaussian 40, 83, 86, 109, 110, 111, 219
    hole-effect 219
    intrinsic correlation 141
    nugget-effect 41, 42, 69, 72, 82, 218
    power 37, 220
    spherical 42, 83, 218
    stable 40, 110, 111, 219
volume of influence 43
weighted average 7
weighted least squares 154, 157
weight of the mean 78
white noise 41
zonal anisotropy 48
zonation 49
Springer-Verlag and the Environment

We at Springer-Verlag firmly believe that an international science publisher has a special obligation to the environment, and our corporate policies consistently reflect this conviction.

We also expect our business partners - paper mills, printers, packaging manufacturers, etc. - to commit themselves to using environmentally friendly materials and production processes.

The paper in this book is made from low- or no-chlorine pulp and is acid free, in conformance with international standards for paper permanency.
