
Isatis.neo Version 2020.02
Technical References
Published, sold and distributed by GEOVARIANCES
49 bis Av. Franklin Roosevelt, 77210 Avon, France
http://www.geovariances.com

Isatis.neo Version 2020.02, February 2020

All Rights Reserved


© 1993-2020 GEOVARIANCES
No part of the material protected by this copyright notice may be reproduced or utilized in any form
or by any means including photocopying, recording or by any information storage and retrieval
system, without written permission from the copyright owner.

Table of Contents
1. Structure Identification in the Intrinsic Case
   1.1 The Experimental Variability Functions
   1.2 Variogram Model
   1.3 The Automatic Sill Fitting Procedure
2. Automatic Variogram Fitting
   2.1 General Optimization
   2.2 Quadratic optimization under linear constraints
   2.3 Minimization of a sum of squares
3. Non-stationary Modeling
   3.1 Unique Neighborhood
   3.2 Moving Neighborhood
   3.3 Case of External Drift(s)
   3.4 Case of Kriging With Bayesian Drift
4. Quick Interpolations
   4.1 Inverse Distances
   4.2 Least Square Polynomial Fit
   4.3 Moving Projected Slope
   4.4 Discrete Splines
   4.5 Bilinear Grid Interpolation
5. Linear Estimation
   5.1 Ordinary Kriging (Intrinsic Case)
   5.2 Simple Kriging (Stationary Case with Known Mean)
   5.3 Drift Estimation
   5.4 Estimation of a Drift Coefficient
   5.5 Kriging with External Drift
   5.6 Unique Neighborhood Case
   5.7 Filtering Model Components
   5.8 Block Kriging
   5.9 Sampling Density Variance
   5.10 Kriging with Measurement Error
   5.11 Cokriging
   5.12 Extended Collocated Cokriging
6. Gaussian Transformation: the Anamorphosis
   6.1 Modeling and Variable Transformation
   6.2 Histogram Modeling and Block Support Correction
   6.3 Variogram on Raw and Gaussian Variables
7. Non Linear Estimation
   7.1 Probability from Conditional Expectation
   7.2 Confidence Intervals
8. Kriging With Bayesian Drift
9. Advanced Estimation Methods and Simulations
   9.1 Turning Bands Simulations
   9.2 Spill Point Calculation

1. Structure Identification in the Intrinsic Case
This page constitutes an add-on to the User’s Guide for Statistics / Exploratory Data Analysis.

This technical reference reviews the main tools available in Isatis to describe the spatial variability
(regularity, continuity, ...) of the variable(s) of interest, commonly referred to as the "Structure", in
the Intrinsic Case.

1.1 The Experimental Variability Functions


Though the variogram is the classical tool to measure the variability of a variable as a function of the distance, several other two-point statistics exist. Let us review them through their equation and their graph on a given data set. $n$ designates the number of pairs of data separated by the considered distance, and $Z_1$ and $Z_2$ stand for the values of the variable at the two data points constituting a pair (point 2 is considered to be at a distance of $+h$ from point 1). Moreover:

m $m_Z$ is the mean over the whole data set

m $\sigma_Z^2$ is the variance over the whole data set

m $m_Z^+$ is the mean calculated over the first points of the pairs (heads)

m $m_Z^-$ is the mean calculated over the second points of the pairs (tails)

m $\sigma_Z^+$ is the standard deviation calculated over the head points

m $\sigma_Z^-$ is the standard deviation calculated over the tail points

1.1.1 Univariate case


The Transitive Covariogram

n Z Z

(fig. 1.1-1)

The Variogram
$\dfrac{1}{2n}\displaystyle\sum_{n}\left(Z_1 - Z_2\right)^2$
(fig. 1.1-2)

The Covariance (centered)


$\dfrac{1}{n}\displaystyle\sum_{n}\left(Z_1 - m_Z\right)\left(Z_2 - m_Z\right)$

(fig. 1.1-3)

The Non-Centered Covariance


$\dfrac{1}{n}\displaystyle\sum_{n} Z_1\,Z_2$

(fig. 1.1-4)

The Non-Ergodic Covariance

$\dfrac{1}{n}\displaystyle\sum_{n}\left(Z_1 - m_Z^+\right)\left(Z_2 - m_Z^-\right)$

(fig. 1.1-5)

The Correlogram
$\dfrac{1}{n}\displaystyle\sum_{n}\dfrac{\left(Z_1 - m_Z\right)\left(Z_2 - m_Z\right)}{\sigma_Z^2}$

(fig. 1.1-6)

The Non-Ergodic Correlogram


Z  – mZ+   Z  – m Z- 
1--- ----------------------------------------------------
n
-
n  + -
Z Z

(fig. 1.1-7)

The Madogram (First Order Variogram)


$\dfrac{1}{2n}\displaystyle\sum_{n}\left|Z_1 - Z_2\right|$

(fig. 1.1-8)

The Rodogram (1/2 Order Variogram)


$\dfrac{1}{2n}\displaystyle\sum_{n}\sqrt{\left|Z_1 - Z_2\right|}$

(fig. 1.1-9)

The Relative Variogram


1-  Z – Z  2
2n  -------------------------
-----
m2 Z
n

(fig. 1.1-10)

The Non-Ergodic Relative Variogram


1-  Z – Z  2
2n  --------------------------------
----- - 2
-
+
n m Z + mZ 
-------------------------
 2 

(fig. 1.1-11)

The Pairwise Relative Variogram


1-  Z – Z  2
2n  --------------------------
----- -
Z + Z 2
n  ------------------
 
 2 

(fig. 1.1-12)

Although the interest of the madogram and the rodogram, as compared to the variogram, is quite obvious (at least graphically), as they tend to smooth out the function, the user must always keep in mind that the only tool that corresponds to the statement of kriging (namely minimizing a variance) is the variogram. This is particularly obvious when looking at the variability values (measured along the vertical axis) in the different figures, remembering that the experimental variance of the data is represented as a dashed line on the variogram picture.
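As an illustration, here is a minimal sketch (Python with numpy; the function names are ours, not Isatis commands) of how such two-point statistics can be computed: collect the pairs for a given lag, then evaluate, e.g., the variogram and the madogram on them. It assumes 1-D coordinates for brevity:

import numpy as np

def lag_pairs(x, z, lag, tol):
    # Collect the pairs (Z1, Z2) whose separation falls within [lag-tol, lag+tol].
    d = np.abs(x[:, None] - x[None, :])
    i, j = np.where((d >= lag - tol) & (d <= lag + tol))
    keep = i < j                         # count each pair once
    return z[i[keep]], z[j[keep]]        # heads Z1, tails Z2

def variogram(z1, z2):
    # gamma(h) = (1 / 2n) * sum (Z1 - Z2)^2
    return 0.5 * np.mean((z1 - z2) ** 2)

def madogram(z1, z2):
    # first-order variogram: (1 / 2n) * sum |Z1 - Z2|
    return 0.5 * np.mean(np.abs(z1 - z2))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 100.0, 200)
z = np.sin(x / 10.0) + 0.3 * rng.standard_normal(200)
for lag in (5.0, 10.0, 20.0):
    z1, z2 = lag_pairs(x, z, lag, tol=2.5)
    print(lag, variogram(z1, z2), madogram(z1, z2))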

1.1.2 Weighted Variability Functions


It can be of interest to take into account weights during the computation of variability functions.
These weights can for instance be derived from declustering; in this case, their integration is
expected to compensate potential bias in the estimation of the experimental function from clustered
data. For further information about these weighted variograms, see for instance Rivoirard J. (2000),
Weighted Variograms, In Geostats 2000, W. Kleingeld and D. Krige (eds), Vol. 1, pp. 145-155.

For instance, the weights $\omega_i$, $i = 1, \ldots, N$ are integrated in the weighted experimental variogram equation in the following way:

$\gamma_w(h) = \dfrac{\displaystyle\sum_{n}\omega_1\,\omega_2\left(Z_1 - Z_2\right)^2}{2\displaystyle\sum_{n}\omega_1\,\omega_2}$ (eq. 1.1-1)

where $\omega_1$ and $\omega_2$ are the weights attached to the head and tail points of each pair.
The other experimental functions are obtained in a similar way.
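As a sketch (assuming, as in the product form of (eq. 1.1-1) above, that the weight of a pair is the product of the declustering weights of its two points):

import numpy as np

def weighted_variogram(x, z, w, lag, tol):
    # Weighted experimental variogram (eq. 1.1-1):
    # gamma_w(h) = sum w1*w2*(Z1 - Z2)^2 / (2 * sum w1*w2)
    d = np.abs(x[:, None] - x[None, :])
    i, j = np.where((d >= lag - tol) & (d <= lag + tol))
    keep = i < j
    i, j = i[keep], j[keep]
    ww = w[i] * w[j]                 # pair weight = product of point weights
    return np.sum(ww * (z[i] - z[j]) ** 2) / (2.0 * np.sum(ww))

With all weights equal to 1, this reduces to the ordinary experimental variogram.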

1.1.3 Multivariate case


In the multivariate case, kriging requires a multivariate model. The variograms of each variable are usually designated as "simple" variograms, whereas the variograms between two variables are called cross-variograms.

We will now describe, through their equations, the extension given to the statistical tools listed in the previous section, for the multivariate case. We will designate the first variable by Z and the second by Y; $m_Z$ and $m_Y$ refer to their respective means over the whole field, $m_Z^+$ and $m_Y^+$ to their means for the head points, and $m_Z^-$ and $m_Y^-$ to their means for the tail points.

The Transitive Cross-Covariogram

n Z Y

(fig. 1.1-13)

The Cross-Variogram
$\dfrac{1}{2n}\displaystyle\sum_{n}\left(Z_1 - Z_2\right)\left(Y_1 - Y_2\right)$

(fig. 1.1-14)

The Cross-Covariance (centered)


$\dfrac{1}{n}\displaystyle\sum_{n}\left(Z_1 - m_Z\right)\left(Y_2 - m_Y\right)$

(fig. 1.1-15)

The Non-Centered Cross-Covariance


$\dfrac{1}{n}\displaystyle\sum_{n} Z_1\,Y_2$

(fig. 1.1-16)

The Non-Ergodic Cross-Covariance


$\dfrac{1}{n}\displaystyle\sum_{n}\left(Z_1 - m_Z^+\right)\left(Y_2 - m_Y^-\right)$

(fig. 1.1-17)

The Cross-Correlogram
1---  Z  – m Z   Y  – m Y 
n  -----------------------------------------------
 
-
n Z Y

(fig. 1.1-18)

The Non-Ergodic Cross-Correlogram


$\dfrac{1}{n}\displaystyle\sum_{n}\dfrac{\left(Z_1 - m_Z^+\right)\left(Y_2 - m_Y^-\right)}{\sigma_Z^+\,\sigma_Y^-}$

(fig. 1.1-19)

The Cross-Madogram
$\dfrac{1}{2n}\displaystyle\sum_{n}\sqrt{\left|Z_1 - Z_2\right|\left|Y_1 - Y_2\right|}$

(fig. 1.1-20)

The Cross-Rodogram
$\dfrac{1}{2n}\displaystyle\sum_{n}\sqrt[4]{\left|Z_1 - Z_2\right|\left|Y_1 - Y_2\right|}$

(fig. 1.1-21)

The Relative Cross-Variogram


1-  Z  – Z    Y  – Y  
2n 
----- ----------------------------------------------
n
mZ mY

(fig. 1.1-22)

The Non-Ergodic Relative Cross-Variogram


1-  Z – Z   Y – Y 
-----  ----------------------------------------------------------
2n n mZ+ + mZ- m + + m-
-
 -----------------------  -----------------------
Y Y 
-
 2  2 

(fig. 1.1-23)

The Pairwise Relative Cross-Variogram


1-  Z  – Z    Y  – Y  
2n 
----- ----------------------------------------------
n Z + Z   Y + Y 
------------------ ------------------
 2  2 

(fig. 1.1-24)

This time most of the curves are no longer symmetrical. In the case of the covariance, it is even

convenient to split it into its odd and even parts as represented below. If h designates the distance
(vector) between the two data points constituting a pair, we then consider:

The Even Part of the Covariance


1---
 C  h  + C ZY  – h  
2 ZY

(fig. 1.1-25)

The Odd Part of the Covariance


1---
 C  h  – C ZY  – h  
2 ZY

(fig. 1.1-26)

Note - The cross-covariance function is a more powerful tool than the cross-variogram in terms of structural analysis, as it allows the identification of delay effects. However, as it necessitates stronger hypotheses (stationarity, estimation of the means), it is not really used in the estimation steps.
In fact, the cross-variogram can be derived from the covariance as follows:

$\gamma_{ZY}(h) = C_{ZY}(0) - \dfrac{1}{2}\left[C_{ZY}(h) + C_{ZY}(-h)\right]$

and is therefore similar to the even part of the covariance. All the information carried by the odd part of the covariance is simply ignored.
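The following sketch (synthetic numbers, hypothetical cross-covariance) illustrates the remark: a delay shifts the cross-covariance, shows up entirely in its odd part, and leaves no trace in the cross-variogram derived from the even part:

import numpy as np

# Hypothetical cross-covariance with a delay d: C_ZY(h) = exp(-|h - d| / a)
a, d = 10.0, 3.0
C = lambda h: np.exp(-np.abs(h - d) / a)

h = np.linspace(0.0, 30.0, 7)
even = 0.5 * (C(h) + C(-h))    # even part: all the cross-variogram can see
odd = 0.5 * (C(h) - C(-h))     # odd part: carries the delay, ignored by it
gamma = C(0.0) - even          # gamma_ZY(h) = C_ZY(0) - 1/2 [C_ZY(h) + C_ZY(-h)]
print(np.round(odd, 3))        # non-zero values reveal the delay effect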

A last remark concerns the presence of information on all variables at the same data points: this
property is known as isotopy. The opposite case is heterotopy: one variable (at least) is not defined
at all the data points.

The kriging procedure in the multivariate case can cope nicely with the heterotopic case. Nevertheless, one first has to calculate the cross-variograms, which can obviously be established from the common information only. This is damaging in a strongly heterotopic case, where the structure, inferred from only a small part of the information, is used for a procedure which possibly operates on the whole data set.

1.1.4 Variogram Transformations


Several transformations based on variogram calculations (in the generic sense) are also provided:

The ratio between the cross-variogram and one of the simple variograms.

(fig. 1.1-27)

When this ratio is constant, the variable corresponding to the simple variogram is "self-krigeable".
This means that in the isotopic case (both variables measured at the same locations) the kriging of
this variable is equal to its cokriging. This property can be extended to more than 2 variables: the
ratio should be considered for any pair of variables which includes the self-krigeable variable.

The ratio between the square root of the variogram and the madogram:

(fig. 1.1-28)

This ratio is constant and equal to $\sqrt{\pi}$ for a standard normal variable, when its pairs satisfy the hypothesis of binormality. A similar result is obtained in the case of a bigamma hypothesis.

The ratio between the variogram and the madogram:



(fig. 1.1-29)

If the data obeys a mosaic model with tiles identically and independently valuated, this ratio is con-
stant.

The ratio between the cross-variogram and the square root of the product of the two simple var-
iograms:

When two variables are in intrinsic correlation, the two simple variograms and the cross-variogram are proportional to the same basic variogram. This means that this ratio, in the case of intrinsic correlation, must be constant. When two variables are in intrinsic correlation, cokriging and kriging are equivalent in the isotopic case.

1.2 Variogram Model


1.2.1 Basic Structures
The following pages illustrate all the basic structures available in Isatis to fit a variogram model on
an experimental variogram. Each basic structure is described by:

l its name.

l its mathematical expression, which involves:


m A coefficient which gives the order of magnitude of the variability along the vertical axis (homogeneous to a variance). In the case of bounded functions (covariances), this value is simply the level of the plateau reached and is called the sill. The same concept has been kept even for the non-bounded functions and we continue to call it sill for convenience. The interest of this value is that it always comes as a multiplicative coefficient and therefore can be calculated using automatic procedures, as explained further. The sill is equal to "C" in the following models.
m A parameter which affects the horizontal axis by normalizing the distances: hence the name
of scale factor. This term avoids having to normalize the space where the variable is defined
beforehand (for example when data are given in microns whereas the field extends on sev-
eral kilometers). This scale factor is also linked to the physical parameter of the selected
basic function.
When the function is bounded, it reaches a constant level (sill) or even changes its expres-
sion after a given distance: this distance value is the range (or correlation distance in statis-
tical language) and is equal to the scale factor. For the bounded functions where the sill is
reached asymptotically, the scale factor corresponds to the distance where the function
reaches 95% of the sill (also called practical range). For functions where the sill is reached
asymptotically in a sinusoidal way (hole-effect variogram), the scale factor is the distance
from which the variation of the function does not exceed 5% around the sill value.
This is why, in the variogram formulae, we systematically introduce the coefficient $\omega$ (norm) which gives the relationship between the Scale Factor (SF) and the parameter a: $SF = \omega\,a$.
For homogeneity of the notations, the $\omega$ norm and a are kept even for the functions which depend on a single parameter (linear variogram for example): the only interest is to manipulate distances "standardized" by the scaling factor and therefore to reduce the risk of numerical instabilities. A short code sketch of this convention is given after the list below.
Finally, the scale factor is used in case of anisotropy. For bounded functions, it is easy to say that the variable is anisotropic if the range varies with the direction. This concept is generalized to any basic function using a scale factor which depends on the direction, in the calculation of the distance.
m A third parameter $\alpha$, required by some particular basic structures.
22 Structure Identification in the

l a chart representing the shape of the function for various values of the parameters.

l a non-conditional simulation performed on a 100 x 100 grid. As this technique systematically leads to a normal outcome (hence symmetrical), we have painted positive values in black and negative ones in white, except for the linear model where the median is used as a threshold.
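Here is the minimal sketch of the sill / scale factor convention announced above (Python; the function names are ours, not Isatis commands). Each model receives the sill C and the scale factor SF, and recovers the parameter a through its norm $\omega$ ($SF = \omega\,a$):

import numpy as np

def spherical(h, C=1.0, SF=1.0):
    u = np.minimum(np.abs(h) / SF, 1.0)          # omega = 1, hence a = SF
    return C * (1.5 * u - 0.5 * u ** 3)

def exponential(h, C=1.0, SF=1.0):
    a = SF / 2.996                               # omega = 2.996
    return C * (1.0 - np.exp(-np.abs(h) / a))

def gaussian(h, C=1.0, SF=1.0):
    a = SF / 1.731                               # omega = 1.731
    return C * (1.0 - np.exp(-((np.abs(h) / a) ** 2)))

h = np.array([0.0, 0.5, 1.0, 2.0])
print(spherical(h), exponential(h), gaussian(h))

At h = SF, the exponential and Gaussian models indeed reach 95% of the sill, which is the definition of the practical range given above.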

Spherical Variogram

3 h 1 h 3
  h  = C ---  ----- – ---  ----- h  a
2 a  2 a 
(eq. 1.2-1)
h = C h  a
 = 1

(fig. 1.2-1)

Variograms (SF=1.,2.,3.,4.,5.,6.,7.,8.,9.,10.) & Simulation (SF=10.)

Exponential Variogram

$\gamma(h) = C\left(1 - \exp\left(-\dfrac{h}{a}\right)\right)$ (eq. 1.2-2)
$\omega = 2.996$

(fig. 1.2-2)

Variograms (SF=1.,2.,3.,4.,5.,6.,7.,8.,9.,10.) & Simulation (SF=10.)

Gaussian Variogram

$\gamma(h) = C\left(1 - \exp\left(-\left(\dfrac{h}{a}\right)^2\right)\right)$ (eq. 1.2-3)
$\omega = 1.731$

(fig. 1.2-3)

Variograms (SF=1.,2.,3.,4.,5.,6.,7.,8.,9.,10.) & Simulation (SF=10.)

Cubic Variogram
$\gamma(h) = C\left(7\,\dfrac{h^2}{a^2} - \dfrac{35}{4}\,\dfrac{h^3}{a^3} + \dfrac{7}{2}\,\dfrac{h^5}{a^5} - \dfrac{3}{4}\,\dfrac{h^7}{a^7}\right) \quad h \le a$ (eq. 1.2-4)
$\gamma(h) = C \quad h \ge a$
$\omega = 1$

(fig. 1.2-4)

Variograms (SF=1.,2.,3.,4.,5.,6.,7.,8.,9.,10.,) & Simulation (SF=10.)

Cardinal Sine Variogram

$\gamma(h) = C\left(1 - \dfrac{\sin(h/a)}{h/a}\right)$ (eq. 1.2-5)
$\omega = 20.371$

(fig. 1.2-5)

Variograms (SF=1.,5.,10.,15.,20.,25.) & Simulation (SF=25.)

Stable Variogram

$\gamma(h) = C\left(1 - \exp\left(-\left(\dfrac{h}{a}\right)^{\alpha}\right)\right)$ (eq. 1.2-6)
$\omega = \sqrt[\alpha]{3}$

(fig. 1.2-6)

Variograms (SF=8. & α=.25, .50, .75, 1., 1.25, 1.5, 1.75, 2.)

Note - The technique for simulating stable variograms is not implemented in the Turning Bands
method.

Gamma Variogram

$\gamma(h) = C\left(1 - \dfrac{1}{\left(1 + \dfrac{h}{a}\right)^{\alpha}}\right) \quad \alpha > 0$ (eq. 1.2-7)
$\omega = \sqrt[\alpha]{20} - 1$

(fig. 1.2-7)

Variograms (SF=8. & α=.5, 1., 2., 5., 10., 20.) & Simulation (SF=10. & α=2.)

Note - For α = 1, this model is called the hyperbolic model.


J-Bessel Variogram

$\gamma(h) = C\left(1 - 2^{\alpha}\,\Gamma(\alpha + 1)\,\dfrac{J_{\alpha}(h/a)}{(h/a)^{\alpha}}\right) \quad \alpha \ge \dfrac{d}{2} - 1$ (eq. 1.2-8)
$\omega = 1$
where (from Chilès J.P. & Delfiner P., 1999, Geostatistics: Modeling Spatial Uncertainty, Wiley
series in Probability and Statistics, New-York):

- the Gamma function is defined for $\alpha > 0$ by (Euler's integral)

$\Gamma(\alpha) = \displaystyle\int_0^{\infty} e^{-u}\,u^{\alpha - 1}\,du$ (eq. 1.2-9)

- the Bessel function of the first kind with index $\alpha$ is defined by the development

$J_{\alpha}(x) = \left(\dfrac{x}{2}\right)^{\alpha}\displaystyle\sum_{k=0}^{\infty}\dfrac{(-1)^k}{k!\,\Gamma(\alpha + k + 1)}\left(\dfrac{x}{2}\right)^{2k}$ (eq. 1.2-10)

- the modified Bessel function of the first kind, used below, is defined by

$I_{\alpha}(x) = \left(\dfrac{x}{2}\right)^{\alpha}\displaystyle\sum_{k=0}^{\infty}\dfrac{1}{k!\,\Gamma(\alpha + k + 1)}\left(\dfrac{x}{2}\right)^{2k}$ (eq. 1.2-11)

- the modified Bessel function of the second kind, used in the K-Bessel variogram hereafter, is defined by

$K_{\alpha}(x) = \dfrac{\pi}{2}\,\dfrac{I_{-\alpha}(x) - I_{\alpha}(x)}{\sin(\alpha\pi)}$ (eq. 1.2-12)

(fig. 1.2-8)

Variograms (SF=1. & α=.5, 1., 2., 3., 4., 5.) & Simulation (SF=1. & α=1.)

K-Bessel Variogram
$\gamma(h) = C\left(1 - \dfrac{\left(\dfrac{h}{a}\right)^{\alpha}}{2^{\alpha - 1}\,\Gamma(\alpha)}\,K_{\alpha}\left(\dfrac{h}{a}\right)\right) \quad \alpha > 0$ (eq. 1.2-13)
$\omega = 3.6527\,\alpha^{0.4874}$

(fig. 1.2-9)
Variograms (SF=1. & α=.1, .5, 1., 2., 5., 10.) & Simulation (SF=1. & α=1.)

Exponential Cosine (Hole Effect Model)

$C(h) = \exp\left(-\dfrac{h_{xy}}{a_1}\right)\cos\left(2\pi\,\dfrac{h_z}{a_2}\right)\exp\left(-\dfrac{h_z}{a_1}\right) \quad h \in \mathbb{R}^n, \; a_1 > 0, \; a_2 > 0$

Note that C(h) is a covariance in $\mathbb{R}^2$ if and only if $a_2 \ge \pi\,a_1$, and in $\mathbb{R}^3$ if and only if $a_2 \ge \pi\,a_1\sqrt{3}$ (in Chilès, Delfiner, Geostatistics, 1999).

Note - This model cannot be used in the Turning Bands simulations.


Generalized Cauchy Variogram

$\gamma(h) = C\left(1 - \left(1 + \dfrac{h^2}{a^2}\right)^{-\alpha}\right) \quad \alpha > 0$ (eq. 1.2-14)
$\omega = \sqrt{\sqrt[\alpha]{20} - 1}$

(fig. 1.2-10)

Variograms (SF=10. & α=.1, .5, 1., 2., 5., 10.) & Simulation (SF=10. & α=1.)

Linear Variogram

$\gamma(h) = C\,\dfrac{h}{a}$ (eq. 1.2-15)
$\omega = 1$

(fig. 1.2-11)

Variogram (SF= 5.) & Simulation (SF = 5.)

Power Variogram
$\gamma(h) = C\left(\dfrac{h}{a}\right)^{\alpha} \quad 0 < \alpha < 2$ (eq. 1.2-16)
$\omega = 1$

(fig. 1.2-12)
Variograms (SF=5. & α=0.25, 0.5, 0.75, 1., 1.25, 1.5, 1.75)

Note - The technique for simulating Power variograms is not implemented in the Turning Bands
method.

1.2.2 The Anisotropy


By anisotropy we mean the difference in the variability of the phenomenon in the different direc-
tions of the space. For the practical description (in 2D) of this concept, we focus on two orthogonal
directions and distinguish between the two following behaviors, illustrated through basic structures
with sill and range:

l same sill, different ranges: geometric anisotropy.

Its name comes from the fact that by stretching the space in one direction by a convenient factor
we also stretch the corresponding directional range until it reaches the range on the orthogonal
direction. In this new space, the phenomenon is then isotropic: the correction is of a geometric
nature.

(fig. 1.2-13)

Nugget Effect + Spherical (10km, 4km)

l same range, different sills: zonal anisotropy

This describes a phenomenon whose variability is larger in one direction than in the orthogonal one. This is typically the case for the vertical orientation through a "layer cake" deposit, as opposed to any horizontal orientation. No geometric correction will reduce this dissimilarity.

(fig. 1.2-14)

Nugget Effect + Spherical (N/A, 4km)

l Practical calculations

The anisotropy consists of a rotation and the ranges along the different axes of the rotated system.
The rotation can be defined either globally or for each basic structure.

In the 2D case, for one basic structure, and if "u" and "v" designate the two components of the dis-
tance vector in the rotated system, we first calculate the equivalent distance:

u- 2  ----
v 2
d 2 =  ----
a + a - (eq. 1.2-17)
 u  v

where au and av are the ranges of the model along the two rotated axes.

Then this distance is used directly in the isotropic variogram expression where the range is normal-
ized to 1.

In the case of geometric anisotropy, the value au/av corresponds to the ratio between the two main
axes of the anisotropy ellipse.

For zonal anisotropy, we can consider that the contribution of the distance component along one of
the rotated axes is discarded: this is obtained by setting the corresponding range to "infinity".
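A small sketch of this computation (2-D, Python; the helper name is ours). The rotation angle and the two ranges are the anisotropy parameters; an infinite range reproduces the zonal case:

import numpy as np

def equivalent_distance(dx, dy, angle_deg, a_u, a_v):
    # Rotate the separation vector into the anisotropy system and apply
    # eq. 1.2-17: d^2 = (u / a_u)^2 + (v / a_v)^2.
    t = np.radians(angle_deg)
    u = dx * np.cos(t) + dy * np.sin(t)
    v = -dx * np.sin(t) + dy * np.cos(t)
    return np.hypot(u / a_u, v / a_v)

print(equivalent_distance(10.0, 4.0, 30.0, a_u=10.0, a_v=4.0))     # geometric
print(equivalent_distance(10.0, 4.0, 30.0, a_u=10.0, a_v=np.inf)) # zonal: v discarded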

Obviously, in nature, both anisotropies can be present, and, moreover, simultaneously.

Finally the setup of any anisotropy requires the definition of a system: this is the system carrying
the anisotropy ellipsoid in case of geometric (or elliptic) anisotropy, or the system carrying the
direction or plane of zonal anisotropy.

This new system is defined by one rotation angle in 2D, or by 3 angles (dip, azimuth and plunge) in
3D. It is possible to attach the anisotropy rotation system globally or individually to each one of the
nested basic structures. This possibility leads to an enormous variety of different textures.

1.2.3 Integral Ranges


The integral range is the value of the following integral (only defined for bounded covariances):

$A = \displaystyle\int_{\mathbb{R}^d} C(h)\,dh$

A is a function of the dimension d of the space. The following table gives the integral ranges of the main basic structures when the sill C is set to 1, with $b = SF = \omega\,a$ and the third parameter $\alpha$:

l Nugget Effect: 1-D: 0 ; 2-D: 0 ; 3-D: 0

l Exponential: 1-D: $2b$ ; 2-D: $2\pi b^2$ ; 3-D: $8\pi b^3$

l Spherical: 1-D: $\frac{3b}{4}$ ; 2-D: $\frac{\pi}{5}b^2$ ; 3-D: $\frac{\pi}{6}b^3$

l Gaussian: 1-D: $\sqrt{\pi}\,b$ ; 2-D: $\pi b^2$ ; 3-D: $\pi\sqrt{\pi}\,b^3$

l Cardinal Sine: 1-D: $\pi b$ ; 2-D: $+\infty$ ; 3-D: $+\infty$

l Stable: 1-D: $2b\,\Gamma\left(\frac{\alpha+1}{\alpha}\right)$ ; 2-D: $\pi b^2\,\Gamma\left(\frac{\alpha+2}{\alpha}\right)$ ; 3-D: $\frac{4}{3}\pi b^3\,\Gamma\left(\frac{\alpha+3}{\alpha}\right)$

l Gamma: 1-D: $\frac{2b}{\alpha-1}$ if $\alpha > 1$, $+\infty$ else ; 2-D: $\frac{2\pi b^2}{(\alpha-1)(\alpha-2)}$ if $\alpha > 2$, $+\infty$ else ; 3-D: $\frac{8\pi b^3}{(\alpha-1)(\alpha-2)(\alpha-3)}$ if $\alpha > 3$, $+\infty$ else

l J-Bessel: 1-D: $2b\sqrt{\pi}\,\frac{\Gamma(\alpha+1)}{\Gamma\left(\alpha+\frac{1}{2}\right)}$ if $\alpha > \frac{1}{2}$, $+\infty$ else ; 2-D: $4\pi b^2\,\alpha$ if $\alpha > \frac{3}{2}$, $+\infty$ else ; 3-D: $8\pi\sqrt{\pi}\,b^3\,\frac{\Gamma(\alpha+1)}{\Gamma\left(\alpha-\frac{1}{2}\right)}$ if $\alpha > \frac{5}{2}$, $+\infty$ else

l K-Bessel: 1-D: $2b\sqrt{\pi}\,\frac{\Gamma\left(\alpha+\frac{1}{2}\right)}{\Gamma(\alpha)}$ ; 2-D: $4\pi b^2\,\alpha$ ; 3-D: $8\pi\sqrt{\pi}\,b^3\,\frac{\Gamma\left(\alpha+\frac{3}{2}\right)}{\Gamma(\alpha)}$

l Gen. Cauchy: 1-D: $\frac{b}{\alpha-1}$ if $\alpha > 1$, $+\infty$ else ; 2-D: $\frac{\pi b^2}{(\alpha-1)(\alpha-2)}$ if $\alpha > 2$, $+\infty$ else ; 3-D: $\frac{\pi\sqrt{\pi}\,b^3}{(\alpha-1)(\alpha-2)(\alpha-3)}$ if $\alpha > 3$, $+\infty$ else

Convolution

If we know that the measured variable Z is the result of a convolution p applied on the underlying
variable

Z = Y*p (eq. 1.2-18)

We can demonstrate that the variogram of Z can be deduced from the variogram of Y as follows:

$\gamma_Z = \gamma_Y * P$ with $P = p * \check{p}$ (eq. 1.2-19)

where $\check{p}(x) = p(-x)$. Therefore, if the convolution function is fully determined (its type and the corresponding parameters), specifying a model for Y will lead to the corresponding model for Z.

1.2.4 Incrementation
In order to introduce the concept of incrementation, we must recall the link between the variogram
and the covariance:

h = C0 – Ch (eq. 1.2-20)

where   h  is calculated as the variance of the smallest possible increment:

1
  h  = ---  Z  x + h  – Z  x   (eq. 1.2-21)
2
We can then introduce the generalized variogram $\Gamma(h)$ as the variance of the increment of order (k+1):

$\Gamma(h) = \dfrac{1}{M_k}\,\mathrm{Var}\left[\displaystyle\sum_{q=0}^{k+1}(-1)^q\,C_{k+1}^q\,Z\big(x + (k + 1 - q)h\big)\right]$ (eq. 1.2-22)

where $M_k = C_{2k+2}^{k+1}$, which requires the data to be located along a regular grid.

The scaling factor $M_k$ is there to ensure that, in the case of a pure nugget effect:

$\Gamma(h) = \begin{cases} 0 & h = 0 \\ C(0) & h \neq 0 \end{cases}$ (eq. 1.2-23)

The benefit of the incrementation is that the generalized variogram can be derived using the generalized covariance:

$\Gamma(h) = \dfrac{1}{M_k}\displaystyle\sum_{p=-(k+1)}^{k+1}(-1)^p\,C_{2k+2}^{k+1+p}\,K(ph)$ (eq. 1.2-24)

Then, we make explicit the relationships between $\Gamma(h)$ and $K(h)$ for several orders k:

l k = 0: $\Gamma(h) = K(0) - K(h)$

l k = 1: $\Gamma(h) = K(0) - \frac{4}{3}K(h) + \frac{1}{3}K(2h)$

l k = 2: $\Gamma(h) = K(0) - \frac{3}{2}K(h) + \frac{3}{5}K(2h) - \frac{1}{10}K(3h)$

Generally speaking, we can say that the shape of $\Gamma(h)$ is not modified when considering K(h):

l if K(h) is a standard covariance (range a and sill C), $\Gamma(h)$ reaches the same sill C for the same range: its shape is only slightly different.

l if K(h) is a generalized covariance of the $|h|^{\lambda}$ type, then $\Gamma(h)$ is of the same type: the only difference comes from its coefficient, which is multiplied by:

$\dfrac{1}{M_k}\displaystyle\sum_{p=-(k+1)}^{k+1}(-1)^p\,C_{2k+2}^{k+1+p}\,|p|^{\lambda}$ (eq. 1.2-25)

1.2.5 Multivariate Case


When several variables are considered simultaneously, we work in the scope of the Linear Model
of Coregionalization which corresponds to a rather crude hypothesis, although it has been used
satisfactorily in a very large number of cases.

In this model, every variable is expressed as a linear combination of the same elementary compo-
nents or factors. Therefore all simple and cross-variograms can be expressed as linear combinations
of the same basic structures (i.e. the variograms of the factors).

The covariance model is then defined by the list of the nested normalized basic structures (sill = 1) and the matrix of the sills (square, symmetrical, and whose dimension is equal to the number of variables): each element $b_{ij}^p$ is the sill of the cross-variogram between variables "i" and "j" (or the sill of the variogram of variable "i" for $b_{ii}^p$) for the basic structure "p".

Note - The cross-covariance value at the origin may be badly defined in the heterotopic case, or
even undefined in the fully heterotopic case. It is possible to specify the values of the simple and
cross-covariances at the origin, using for instance the knowledge about the variance-covariance
coming from another dataset.

1.3 The Automatic Sill Fitting Procedure


Isatis uses an original algorithm to fit a univariate or a multivariate model of coregionalization to the experimental variograms. The algorithm, called Multi Scale P.C.A., has been developed by C. Lajaunie (see Lajaunie C., Béhaxétéguy J.P., Elaboration d'un programme d'ajustement semi-automatique d'un modèle de corégionalisation - Théorie, Technical report N21/89/G, Paris: ENSMP, 1989, 6p).

This technique can be used, when the set of basic structures has been defined, in order to establish
the matrix of sills.

It obviously also works for a single variable. Nevertheless, we must note that it can only be used to infer the sill coefficients of the model but does not help for all the other types of parameters, such as:

l the number and types of basic structures,

l for each one of them, the range or third coefficient (if any),

l the anisotropy.

This is why the term automatic fitting is somewhat abusive.

Considering a set of N second order stationary regionalized random functions Zi(x) we wish to
establish the multivariate model taking into account all the simple and cross covariances Cij(h).

If the variables Zi(x) are intrinsic, the covariances no longer exist and the model must then be derived from the simple and cross variograms $\gamma_{ij}(h)$. Nevertheless, this chapter will be developed in the stationary case.

A well known result is that the matrix $b_{ij}^p$ for each basic structure p must be positive semi-definite in order to ensure the positiveness of the variance of any linear combination of the random variables Zi(x).

In order to build this linear model of coregionalization, we assume that the variables Zi are decom-
posed on a basis of random variables generically denoted Y, stationary and orthogonal. These vari-
ables are regrouped in P groups of Yp random functions characterized by the same covariance Cp(h)
called the basic structure. The count of variables within each group is equal to the number of vari-
ables N. We will then write:

$Z_i(x) = \displaystyle\sum_{p=1}^{P}\sum_{k=1}^{N} a_{ik}^p\,Y_k^p(x)$ (eq. 1.3-1)

The coefficients $a_{ik}^p$ are the coefficients of the linear model. The covariance between two variables $Z_i$ and $Z_j$ can then be written:

$C_{ij}(h) = \displaystyle\sum_{p=1}^{P}\sum_{k=1}^{N} a_{ik}^p\,a_{jk}^p\,C_p(h)$ (eq. 1.3-2)

which can also be considered as:

$C_{ij}(h) = \displaystyle\sum_{p=1}^{P} b_{ij}^p\,C_p(h)$ (eq. 1.3-3)

Obviously the terms $b_{ij}^p = \sum_{k=1}^{N} a_{ik}^p\,a_{jk}^p$, homogeneous to sills, are symmetric, and the matrices $B_p$ whose generic terms are $b_{ij}^p$ are symmetric and positive semi-definite: they correspond to the variance-covariance matrix for each basic structure.

1.3.1 Procedure
Assuming that the number of basic structures P, as well as all the characteristics of each basic model Cp(h), are defined, the procedure determines all the coefficients $a_{ik}^p$ and derives the variance-covariance matrices.

Starting from the experimental simple and cross-covariances $C_{ij}^*(h_u)$ on a set of U lags $h_u$, the procedure tries to minimize the quantity:

$\Phi = \displaystyle\sum_{i,j}\sum_{u=1}^{U}\left[C_{ij}^*(h_u) - C_{ij}(h_u)\right]^2\,\omega(h_u)$ (eq. 1.3-4)

where $\omega(h_u)$ is a weighting function chosen in order to reduce the importance of the lags with few pairs, and to increase the weight of the first lags corresponding to short distances. For more information on the choice of these weights, the user should refer to the next paragraph.

Each matrix Bp is decomposed as:

$B_p = X_p\,\Lambda_p\,X_p^T$ (eq. 1.3-5)

where $X_p$ is the matrix composed of the normalized eigen vectors and $\Lambda_p$ is the diagonal matrix of the eigen values. Instead of minimizing (eq. 1.3-4) under the constraints that Bp is positive definite, we prefer writing that:

$b_{ij}^p = \displaystyle\sum_{k=1}^{N} a_{ik}^p\,a_{jk}^p$ (eq. 1.3-6)

imposing that each coefficient

$a_{ik}^p = \sqrt{\lambda_k^p}\,x_{ik}^p$ (eq. 1.3-7)

where $\lambda_k^p$ is the k-th term of the diagonal of $\Lambda_p$ and $x_{ik}^p$ is the i-th component of the k-th vector of the matrix $X_p$. This hypothesis ensures that the matrix Bp is positive definite.

Equation (eq. 1.3-4) can now be reformulated:

$\Phi = \displaystyle\sum_{i,j}\sum_{u=1}^{U}\left[C_{ij}^*(h_u) - \sum_{p=1}^{P}\sum_{k=1}^{N} a_{ik}^p\,a_{jk}^p\,C_p(h_u)\right]^2\,\omega(h_u)$ (eq. 1.3-8)

Without losing generality, we can impose the orthogonality constraints:

$\displaystyle\sum_{k} a_{ik}^p\,a_{jk}^p = 0 \quad (i \neq j)$ (eq. 1.3-9)
If we introduce the terms:

$K_{ij} = \displaystyle\sum_{u=1}^{U}\left[C_{ij}^*(h_u)\right]^2\,\omega(h_u)$

$T^{pq} = \displaystyle\sum_{u=1}^{U} C_p(h_u)\,C_q(h_u)\,\omega(h_u)$ (eq. 1.3-10)

$A_{ij}^p = \displaystyle\sum_{u=1}^{U} C_p(h_u)\,C_{ij}^*(h_u)\,\omega(h_u)$
The criterion (eq. 1.3-8) becomes:

$\Phi = \displaystyle\sum_{i,j} K_{ij} + \sum_{i,j}\sum_{p,q}\sum_{k,l} a_{ik}^p\,a_{jk}^p\,a_{il}^q\,a_{jl}^q\,T^{pq} - 2\sum_{i,j}\sum_{p}\sum_{k} a_{ik}^p\,a_{jk}^p\,A_{ij}^p$ (eq. 1.3-11)

By differentiating against each $a_{ik}^p$, we obtain, for each i, k and p:

$\displaystyle\sum_{j}\sum_{l}\sum_{q} a_{jk}^p\,a_{il}^q\,a_{jl}^q\,T^{pq} = \sum_{j} a_{jk}^p\,A_{ij}^p$ (eq. 1.3-12)
We shall describe the case of a single structure first before reviewing the more general case of sev-
eral nested basic structures.

1.3.1.1 Case of a Single Basic Structure


As the number of basic structures is reduced to 1, the indices p and q are omitted in the set of equa-
tions (eq. 1.3-12)

$\displaystyle\sum_{j} a_{jk}\left[\sum_{l} a_{il}\,a_{jl}\right] T = \sum_{j} a_{jk}\,A_{ij} \quad \forall i, k$ (eq. 1.3-1)
Using the orthogonality constraints, the only non-zero term in the left-hand side of the equality is obtained when j = i:

$a_{ik}\left[\displaystyle\sum_{l}\left(a_{il}\right)^2\right] T = \displaystyle\sum_{j} a_{jk}\,A_{ij} \quad \forall i, k$ (eq. 1.3-2)
If we introduce:

$P_i = \displaystyle\sum_{l}\left(a_{il}\right)^2$ (eq. 1.3-3)

then:

$a_{ik}\,P_i\,T = \displaystyle\sum_{j} a_{jk}\,A_{ij} \quad \forall i, k$ (eq. 1.3-4)
This leads to an eigen vector problem. If we denote respectively by $\mu_k$ and $x_{ik}$ the eigen values and the corresponding normalized eigen vectors, then:

$a_{ik} = \sqrt{\dfrac{\mu_k}{T}}\,x_{ik} \quad \mu_k \ge 0$ (eq. 1.3-5)

$a_{ik} = 0 \quad \mu_k < 0$
The minimum of $\Phi$ is then equal to:

$\Phi = \displaystyle\sum_{i,j} K_{ij} - \sum_{k\in\mathcal{K}}\dfrac{\mu_k^2}{T}$ (eq. 1.3-6)

where $\mathcal{K}$ designates the set of indices corresponding to positive eigen values.

This result will now be generalized to the case of several nested basic structures.
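A compact sketch of this single-structure fit (Python; the helper name is ours). It computes the unconstrained optimum A/T and then drops the negative eigen values, which is what retaining only the $a_{ik} = \sqrt{\mu_k/T}\,x_{ik}$ with $\mu_k \ge 0$ amounts to:

import numpy as np

def fit_single_structure(Cexp, Cp, w):
    # Cexp : array (U, N, N) of experimental (cross-)covariances per lag
    # Cp   : array (U,) values of the normalized basic structure at each lag
    # w    : array (U,) lag weights; returns a positive semi-definite sill matrix
    T = np.sum(Cp ** 2 * w)                      # T = sum_u Cp(hu)^2 w(hu)
    A = np.einsum('u,uij,u->ij', Cp, Cexp, w)    # A_ij = sum_u Cp Cij* w
    B = A / T                                    # unconstrained optimum
    vals, vecs = np.linalg.eigh(B)
    vals = np.clip(vals, 0.0, None)              # keep positive eigen values only
    return (vecs * vals) @ vecs.T

# toy check: two variables, exponential structure, true sills [[1, .5], [.5, 2]]
Btrue = np.array([[1.0, 0.5], [0.5, 2.0]])
hu = np.linspace(1.0, 10.0, 10)
Cp = np.exp(-hu / 3.0)
Cexp = Btrue[None, :, :] * Cp[:, None, None]
print(fit_single_structure(Cexp, Cp, np.ones(10)))   # ~Btrue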

1.3.1.2 Case of Several Basic Structures


The procedure is iterative and consists in optimizing each basic structure in turn, taking into
account the structures already optimized. The following flow chart describes one iteration:

1. Loop on each basic structure p=1, ..., P

If we define:

$K_{ij}^p(h) = C_{ij}^*(h) - \displaystyle\sum_{q\neq p} b_{ij}^q\,C_q(h)$ (eq. 1.3-1)

we optimize the $a_{ik}^p$ in the equation:

$\displaystyle\sum_{i,j}\sum_{u}\left[K_{ij}^p(h_u) - \sum_{k} a_{ik}^p\,a_{jk}^p\,C_p(h_u)\right]^2\,\omega(h_u)$ (eq. 1.3-2)

we then set, due to the orthogonality constraints:

$b_{ij}^p = \displaystyle\sum_{k} a_{ik}^p\,a_{jk}^p$ (eq. 1.3-3)
2. Improvement of the solution by selecting the coefficients $m_p$ which minimize:

$\Phi = \displaystyle\sum_{i,j}\sum_{u}\left[C_{ij}^*(h_u) - \sum_{p} m_p\,b_{ij}^p\,C_p(h_u)\right]^2\,\omega(h_u)$ (eq. 1.3-4)

If $m_p$ is positive, we update the results of step (1):

$b_{ij}^p \leftarrow b_{ij}^p\,m_p$ and $a_{ik}^p \leftarrow \sqrt{m_p}\,a_{ik}^p$ (eq. 1.3-5)

Return to step (1).

Step (2) is used to equalize the weight of each basic structure as the first structure processed in
step (1) has more influence than the next ones.

The coefficients $m_q$ are the solution of the linear system:

$\displaystyle\sum_{q} m_q\left[\sum_{i,j} b_{ij}^p\,b_{ij}^q\,T^{pq}\right] = \sum_{i,j} b_{ij}^p\,A_{ij}^p \quad \forall p$ (eq. 1.3-6)

Note - This procedure ensures that $\Phi$ converges but does not imply that the $b_p$ converge.

1.3.2 Choice of the Weights


The principle of the Automatic Sill Fitting procedure is to minimize the distance between the experimental value of a variogram lag and the corresponding value of the model. This minimization is performed giving different weights to the different lags. The determination of these weights depends on one of the four following rules.

l Each lag of each direction has the same weight.

l The weight for each lag of each direction is proportional to the total number of pairs for all the
lags of this direction.

l The weight for each lag of each direction is proportional to the number of pairs and inversely
proportional to the average distance of the lag.

l The weight for each lag of each direction is inversely proportional to the number of lags in this
direction.

1.3.3 Printout of the Linear Model of Coregionalization


This paragraph illustrates a typical printout for a model established for two variables called "Pb"
and "Zn":
Model : Covariance part
=======================
Number of variables = 2
- Variable 1 : Pb
- Variable 2 : Zn

and fitted using a linear combination of two basic structures:

l an exponential variogram with a scale factor of 2.5km (practical range)

l a linear variogram (Order-1 G.C.) with a scale factor of 1km.


Number of basic structures = 2
S1 : Exponential - Scale = 2.50km

Variance-Covariance matrix :
Variable 1 Variable 2
Variable 1 1.1347 0.5334
Variable 2 0.5334 1.8167

Decomposition into factors (normalized eigen vectors) :


Variable 1 Variable 2
Factor 1 0.6975 1.2737
Factor 2 0.8051 -0.4409
Decomposition into eigen vectors (whose variance is eigen values) :
Variable 1 Variable 2 Eigen Val. Var. Perc.
E.Vect 1 0.4803 0.8771 2.1087 71.45
E.Vect 2 0.8771 -0.4803 0.8426 28.55

S2 : Order-1 G.C. - Scale = 1km

Variance-Covariance matrix :
Variable 1 Variable 2
Variable 1 0.2562 0.0927
Variable 2 0.0927 0.1224

Decomposition into factors (normalized eigen vectors) :


Variable 1 Variable 2
Factor 1 0.4906 0.2508
Factor 2 -0.1246 0.2438

Decomposition into eigen vectors (whose variance is eigen values) :


Variable 1 Variable 2 Eigen Val. Var. Perc.


E.Vect 1 0.8904 0.4552 0.3036 80.20
E.Vect 2 -0.4552 0.8904 0.0750 19.80

For each basic structure, the printout contains the following information:

In the Variance-Covariance matrix, the sill of the simple variogram for the first variable "Pb" and
for the exponential basic structure is equal to 1.1347. This sill is equal to 1.8167 for the second vari-
able "Zn" and the same exponential basic structure. The cross-variogram has a sill of 0.5334. These
values correspond to the $b_{ij}^p$ matrix for the first basic structure.

This Variance-Covariance matrix is decomposed into the orthogonal normalized vectors Y1 and Y2.
In this example and for the first basic structure, we can read that:

$Pb = 0.6975\,Y_1 + 0.8051\,Y_2$
$Zn = 1.2737\,Y_1 - 0.4409\,Y_2$ (eq. 1.3-7)

These coefficients are the $a_{ik}^p$ coefficients of the procedure described beforehand, and one can check, for example, that for the first basic structure (p = 1):
$b_{11}^1 = \left(a_{11}^1\right)^2 + \left(a_{12}^1\right)^2$: $1.1347 = (0.6975)^2 + (0.8051)^2$

$b_{22}^1 = \left(a_{21}^1\right)^2 + \left(a_{22}^1\right)^2$: $1.8167 = (1.2737)^2 + (-0.4409)^2$ (eq. 1.3-8)

$b_{12}^1 = a_{11}^1\,a_{21}^1 + a_{12}^1\,a_{22}^1$: $0.5334 = (0.6975)(1.2737) + (0.8051)(-0.4409)$
The last array corresponds to the decomposition into eigen values and eigen vectors. For example:

$a_{11}^1 = \sqrt{\lambda_1}\,x_{11}$: $0.6975 = \sqrt{2.1087}\,(0.4803)$
$a_{12}^1 = \sqrt{\lambda_2}\,x_{12}$: $0.8051 = \sqrt{0.8426}\,(0.8771)$
$a_{21}^1 = \sqrt{\lambda_1}\,x_{21}$: $1.2737 = \sqrt{2.1087}\,(0.8771)$ (eq. 1.3-9)
$a_{22}^1 = \sqrt{\lambda_2}\,x_{22}$: $-0.4409 = \sqrt{0.8426}\,(-0.4803)$

We can easily check that the eigen vectors $x_{\cdot 1}$ and $x_{\cdot 2}$ are orthogonal and normalized.

Each eigen vector corresponds to a line of the array and is attached to an eigen value. They are displayed by decreasing order of the eigen values. As the variance-covariance matrix is positive semi-definite, the eigen values are positive or null. Their sum is equal to the trace of the matrix and it makes sense to express them as a percentage of the total trace. This value is called "Var. Perc.".
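All the numerical checks above can be reproduced in a few lines (sketch, using the printed values for the exponential structure S1):

import numpy as np

B = np.array([[1.1347, 0.5334],
              [0.5334, 1.8167]])
factors = np.array([[0.6975, 0.8051],     # Pb = 0.6975*Y1 + 0.8051*Y2
                    [1.2737, -0.4409]])   # Zn = 1.2737*Y1 - 0.4409*Y2
print(factors @ factors.T)                # ~B (eq. 1.3-8)

vals, vecs = np.linalg.eigh(B)
print(vals[::-1])                         # ~[2.1087, 0.8426]
print(100 * vals[::-1] / vals.sum())      # ~[71.45, 28.55] ("Var. Perc.")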


2 Automatic Variogram Fitting

The Automatic Variogram Fitting is based on an optimization algorithm. This fitting algorithm is based on Gauss-Newton and optimizes, in the mono- or multivariate cases, the parameters of the basic structures (sills, ranges, rotations).

2.1 General Optimization


Consider a function F over a d-dimensional space D and taking its values in $\mathbb{R}$. The problem is to find:

$x^* = \operatorname{argmin}_{x \in D} F(x)$ (eq. 2.1-1)

l Newton type methods: general principle

We use an iterative algorithm to approximate x*. More precisely, starting from an initial value $x_0$, we construct a sequence $x_1, \ldots, x_n, \ldots$ which converges to a local optimizer of F. The principle of the algorithm is to approximate F by a quadratic form at each iteration and then to solve analytically a quadratic optimization problem. The Taylor expansion of F around any x gives:

$F(x + h) \approx F(x) + \nabla F(x)^T h + \dfrac{1}{2}\,h^T H(x)\,h$ (eq. 2.1-2)

where $\nabla F(x)$ and H(x) are respectively the gradient vector and the Hessian matrix of F computed at x. Let $q_x(h)$ denote this quadratic approximation:

$q_x(h) = F(x) + \nabla F(x)^T h + \dfrac{1}{2}\,h^T H(x)\,h$ (snap. 2.1-1)

At the (k+1)-th iteration, we obtain $x_{k+1}$ by optimizing $q_{x_k}$ with respect to h. Differentiating $q_{x_k}$ with respect to h and equating to zero leads to:

$\nabla F(x_k) + H(x_k)\,h = 0$ (snap. 2.1-2)

In other words, the updating equations are:

$h_k = -H(x_k)^{-1}\,\nabla F(x_k)$ (eq. 2.1-3)

$x_{k+1} = x_k + h_k$ (eq. 2.1-4)

The Newton type methods are known to converge toward a local optimum with a very good rate when the current value is not far from this optimum. Indeed, in that case, the quadratic approximation $q_x(h)$ of F(x + h) is a very accurate approximation of F. But for a general starting value $x_0$, this method can be quite inefficient. For this reason, we use a trust region based method.

l Trust region based method

Starting from the consideration that a Taylor approximation is all the more accurate as h is small, we solve at each iteration a quadratic optimization problem under the constraint that the size of h is small. More precisely, at the (k+1)-th iteration, we compute a candidate $x_c$ for the next iteration by solving:

$\min_{h} q_{x_k}(h)$ (eq. 2.1-5)

under the constraint that:

$\|h\| \le \delta$ (eq. 2.1-6)

where $\delta$ is a positive constant. Then we compare the gain in the objective function with the predicted gain by computing the ratio:

$r = \dfrac{F(x_k) - F(x_c)}{q_{x_k}(0) - q_{x_k}(x_c - x_k)}$ (snap. 2.1-3)

If r < 0, we set $x_{k+1} = x_k$: we reject the candidate value $x_c$ because $F(x_c) > F(x_k)$ (since the denominator of r is always positive). If $r \ge 0$, we set $x_{k+1} = x_c$.

For the value of $\delta$: if r > 0.75, we set $\delta = 2\delta$, since $q_{x_k}$ gives a good approximation of the gain.

Note - In order to simplify the constrained optimization problem, we work with $\|h\| = \max_i |h_i|$. With this choice, the inequality constraints become linear.

Note - Trust region based methods are intermediate between the gradient method (robust but with a slow convergence) for small $\delta$ and the Newton based method for larger $\delta$.
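A minimal sketch of one such iteration (Python). The Newton step is crudely clipped to the box $\|h\|_\infty \le \delta$ instead of solving the constrained problem exactly, and the shrinking rule used when a candidate is rejected is our assumption: the manual only states the r > 0.75 rule.

import numpy as np

def trust_region_step(F, grad, hess, x, delta):
    # One iteration: Newton candidate clipped to ||h||_inf <= delta,
    # accepted when the gain ratio r is >= 0 (section 2.1).
    g, H = grad(x), hess(x)
    h = np.clip(-np.linalg.solve(H, g), -delta, delta)
    q_gain = -(g @ h + 0.5 * h @ H @ h)      # predicted gain q(0) - q(h)
    if q_gain <= 0.0:                        # stationary point reached
        return x, delta
    r = (F(x) - F(x + h)) / q_gain
    if r < 0.0:                              # reject: F increased
        return x, delta / 2.0                # shrinking rule: our assumption
    if r > 0.75:                             # good model: enlarge the region
        return x + h, 2.0 * delta
    return x + h, delta

F = lambda x: x @ x                          # quadratic bowl
grad = lambda x: 2.0 * x
hess = lambda x: 2.0 * np.eye(2)
x, delta = np.array([3.0, -2.0]), 1.0
for _ in range(8):
    x, delta = trust_region_step(F, grad, hess, x, delta)
print(x)                                     # ~[0, 0]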

2.2 Quadratic optimization under linear constraints


Each iteration of the general algorithm requires optimizing a quadratic form under linear inequality constraints. For this purpose, we must first know how to solve a quadratic optimization problem under a linear equality constraint.

2.2.1 Optimization under equality constraints


The general formulation of a quadratic optimization problem under a linear equality constraint is as follows:

Optimize

$q(x) = \dfrac{1}{2}\,x^T H\,x + g^T x$ (eq. 2.2-1)

under the constraints Ax = b.

This problem can be solved by using Lagrange multipliers. The constrained optimizer is the solution of the linear system:

$\begin{pmatrix} H & A^T \\ A & 0 \end{pmatrix}\begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{pmatrix} -g \\ b \end{pmatrix}$ (eq. 2.2-2)
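A sketch of this constrained solve (Python, numpy only), assembling the Lagrangian system of (eq. 2.2-2) for a small quadratic form:

import numpy as np

# Minimize (1/2) x^T H x + g^T x  subject to  A x = b
H = np.array([[2.0, 0.0],
              [0.0, 4.0]])
g = np.array([-2.0, -4.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

n, m = H.shape[0], A.shape[0]
K = np.block([[H, A.T],
              [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-g, b]))
x, lam = sol[:n], sol[n:]                 # optimizer and Lagrange multiplier
print(x)                                  # [1/3, 2/3]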

2.2.1.1 Under inequality constraints

The general formulation of a quadratic optimization problem under linear inequality constraints is as follows:

Optimize

$q(x) = \dfrac{1}{2}\,x^T H\,x + g^T x$ (eq. 2.2-3)

under the constraints $Ax \ge b$, where $\ge$ means that all the components of the left-hand side vector are greater than the ones of the right-hand side.

The principle of the algorithm is to produce a finite sequence $x_1, \ldots, x_v$ of points which satisfy the inequality constraints and such that $F(x_1) \ge \ldots \ge F(x_v)$. Such points are called feasible points.

2.3 Minimization of a sum of squares


In the specific case of variogram fitting, the function F takes the following form:

$F(x) = \displaystyle\sum_{i=1}^{n} w_i\left[\gamma_i - f(h_i, x)\right]^2$ (snap. 2.3-1)

where $w_1, \ldots, w_n$ is a set of weights, $\gamma_1, \ldots, \gamma_n$ is the set of experimental variogram values for the distances $h_1, \ldots, h_n$, f is the model and x is the set of parameters. For such a sum of squares, the gradient vector can be written:

$\nabla F(x) = -2\,J(x)^T\,W\left[\gamma - f(x)\right]$ (eq. 2.3-1)

and the Hessian matrix is well approximated by:

$H(x) \approx 2\,J(x)^T\,W\,J(x)$ (eq. 2.3-2)

where the i-th component of f is given by $f(h_i, x)$, W is a matrix of 0 outside the diagonal which contains the weights $w_i$ on the diagonal, and the (i, j)-th term of J(x) is equal to:

$J_{ij}(x) = \dfrac{\partial f(h_i, x)}{\partial x_j}$ (snap. 2.3-2)

The use of this approximation is known as the Gauss-Newton algorithm.
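The following sketch assembles these Gauss-Newton ingredients for an exponential variogram model with two parameters (sill and scale); it is a bare iteration without the trust region safeguard of section 2.1:

import numpy as np

def f(h, x):
    # exponential variogram model: f(h; x) = x0 * (1 - exp(-h / x1))
    return x[0] * (1.0 - np.exp(-h / x[1]))

def jacobian(h, x):
    e = np.exp(-h / x[1])
    return np.column_stack([1.0 - e,                      # d f / d sill
                            -x[0] * h * e / x[1] ** 2])   # d f / d scale

h = np.linspace(1.0, 20.0, 10)
gamma = f(h, [2.0, 5.0]) + 0.01 * np.random.default_rng(1).standard_normal(10)
W = np.eye(10)                               # diagonal matrix of lag weights w_i

x = np.array([1.0, 2.0])                     # starting point
for _ in range(20):
    J = jacobian(h, x)
    r = gamma - f(h, x)
    grad = -2.0 * J.T @ W @ r                # eq. 2.3-1
    H = 2.0 * J.T @ W @ J                    # eq. 2.3-2 (Gauss-Newton Hessian)
    x = x + np.linalg.solve(H, -grad)
print(x)                                     # ~[2, 5]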

2.3.1 Goodness of Fit (GOF), Akaike and Bayesian Information Criterion

This section is a short note which describes the definitions of the Akaike and Bayesian information criteria and their implementation in the Automatic Fitting procedure, in order to evaluate the performance of a model.

2.3.2 Akaike Criterion

The next paragraphs give a definition of the Akaike criterion found in Wikipedia.
Automatic Variogram Fitting 51

2.3.2.1 Introduction
The Akaike information criterion is a measure of the relative goodness of fit of a statistical model.
It was developed by Hirotugu Akaike, under the name of "an information criterion" (AIC), and was first published by Akaike in 1974. It is grounded in the concept of information entropy, in
effect offering a relative measure of the information lost when a given model is used to describe
reality.

It can be said to describe the trade-off between bias and variance in model construction, or loosely
speaking between accuracy and complexity of the model. Given a data set, several candidate
models may be ranked according to their AIC values. From the AIC values one may also infer that
e.g. the top two models are roughly in a tie and the rest are far worse. Thus, AIC provides a means
for comparison among models - a tool for model selection. AIC does not provide a test of a model in
the usual sense of testing a null hypothesis; i.e. AIC can tell nothing about how well a model fits the
data in an absolute sense. Ergo, if all the candidate models fit poorly, AIC will not give any warning
of that.

2.3.3 Definition
In the general case, the AIC is:

$AIC = 2k - 2\ln(L)$

where k is the number of parameters in the statistical model, and L is the maximized value of the likelihood function for the estimated model. Given a set of candidate models for the data, the preferred model is the one with the minimum AIC value. Hence AIC not only rewards goodness of fit, but also includes a penalty that is an increasing function of the number of estimated parameters. This penalty discourages over-fitting (increasing the number of free parameters in the model improves the goodness of the fit, regardless of the number of free parameters in the data-generating process).

The estimate, though, is only valid asymptotically: if the number of data points is small, then some
correction is often necessary.

2.3.4 How to calculate the likelihood term

2.3.4.1 The general presentation of the chi-squared test

The chi-squared test (denoted $\chi^2$) is expressed as a weighted sum of the squared errors:

$\chi^2 = \displaystyle\sum \dfrac{(O - E)^2}{\sigma^2}$

where O stands for the observation, E for the theoretical data and $\sigma^2$ is the known variance of the observation. The sum holds over all the observations. This definition is only useful when one has estimates for the error on the measurements, but it leads to a situation where a chi-squared distribution can be used to test the goodness of fit, provided that the errors can be assumed to have a normal distribution.

2.3.5 Application

In the usual case, the problem is stated as follows: the theoretical model that we are looking for is a family of functions $f(x; \theta)$ depending on the variables (x) and a set of unknown parameters $\theta$. Then we wish to find the optimal function (or equivalently the optimal set of parameters) which minimizes the sum of quadratic errors between the data and the predictions performed with this function (called residuals):

$D^2 = \displaystyle\sum_{i}\left[y_i - f(x_i; \theta)\right]^2$ (eq. 2.3-3)

The value $D^2$ can be considered as the distance between the data and the theoretical model used to predict the data. Optimally, this distance must be as small as possible.

If we know the standard deviation $\sigma_i$ of the noise attached to each datum $y_i$, we can use it to weight the contribution of each datum to the global distance: a sample will have a large influence if its uncertainty is small. This weighted distance is referred to as the chi-squared test:

$\chi^2 = \displaystyle\sum_{i}\dfrac{\left[y_i - f(x_i; \theta)\right]^2}{\sigma_i^2}$ (eq. 2.3-4)

2.3.6 Definition of the Log-likelihood

We consider a random function X with a given distribution depending upon a parameter $\theta$, and its density $f(x|\theta)$. Given the set of observations $(x_1, x_2, \ldots, x_N)$ following the law of the random variable X, we define the likelihood:

$L(x_1, \ldots, x_N|\theta) = \displaystyle\prod_{i=1}^{N} f(x_i|\theta)$ (eq. 2.3-5)

We look for the maximum of the likelihood function, which leads to an optimization problem. The solution requires the derivation of the previous function, which is not always formally tractable. Instead, we usually consider the logarithm of the likelihood function (which transforms products into sums), as the logarithm is a monotonously increasing function. Then it is equivalent to look for the optimum by setting the partial derivative of the log-likelihood to 0:

$\dfrac{\partial}{\partial\theta}\,\ln L(x_1, \ldots, x_N|\theta) = 0$ (eq. 2.3-6)

2.3.7 Normal assumption for errors

When the random variable X follows a normal distribution, the density depends upon two parameters, the mean $\mu$ and the variance $\sigma^2$:

$f(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\,\exp\left(-\dfrac{(x - \mu)^2}{2\sigma^2}\right)$ (eq. 2.3-7)

If we consider the error between the real data $y_i$ and the predicted value $f(x_i; \theta)$ (for each observation) as a random variable, then the likelihood function can be written:

$L = \displaystyle\prod_{i=1}^{N}\dfrac{1}{\sqrt{2\pi\sigma_i^2}}\,\exp\left(-\dfrac{\left[y_i - f(x_i; \theta)\right]^2}{2\sigma_i^2}\right)$ (eq. 2.3-8)

Note that, for the sake of generality, each observation carries its own error variance $\sigma_i^2$.

The log-likelihood can be written:

$\ln L = -\dfrac{1}{2}\displaystyle\sum_{i}\left[\ln\left(2\pi\sigma_i^2\right) + \dfrac{\left[y_i - f(x_i; \theta)\right]^2}{\sigma_i^2}\right]$ (eq. 2.3-9)

Note that this formula introduces a term similar to the famous $\chi^2$ test, which would lead to the following AIC expression:

$AIC = 2k + \chi^2 + \displaystyle\sum_{i}\ln\left(2\pi\sigma_i^2\right)$ (eq. 2.3-10)

and as only differences in AIC are meaningful (when comparing several parametric models), we can write:

$AIC = 2k + \chi^2$ (eq. 2.3-11)

Let us now consider that all observation errors share the same variance. According to Burnham and Anderson, in the special case of least squares (LS) estimation with normally distributed errors, AIC is expressed as:

$AIC = 2k + n\,\ln\left(\hat{\sigma}^2\right)$ (eq. 2.3-12)

where $\hat{\sigma}^2$ stands for the error variance estimate:

$\hat{\sigma}^2 = \dfrac{1}{n}\displaystyle\sum_{i}\hat{\varepsilon}_i^2$ (eq. 2.3-13)

Hence the final formula:

$AIC = 2k + n\,\ln\left(\dfrac{1}{n}\displaystyle\sum_{i}\hat{\varepsilon}_i^2\right)$ (eq. 2.3-14)

2.3.8 Extended criterion

When k is large relative to the sample size n, there is a small-sample (second-order bias correction) version called AICc:

$AICc = AIC + \dfrac{2k(k + 1)}{n - k - 1}$ (eq. 2.3-15)

which should be used unless n/k > 40 for the model with the largest value of k. Thus, AICc is AIC with a greater penalty for extra parameters. Burnham & Anderson (2002) strongly recommend using AICc, rather than AIC, if n is small or k is large. Since AICc converges to AIC as n gets large, AICc generally should be employed regardless. Using AIC (instead of AICc) when n is not many times larger than k² increases the probability of selecting models that have too many parameters, i.e. of overfitting. The probability of AIC overfitting can be substantial in some cases.

AICc was first proposed by Hurvich & Tsai (1989). Different derivations of it are given by Brockwell & Davis (2009), Burnham & Anderson (2002), and Cavanaugh (1997). All the derivations assume a univariate linear model with normally-distributed errors (conditional upon regressors); if that assumption does not hold, then the formula for AICc will usually change. Further discussion of this, with examples of other assumptions, is given by Burnham & Anderson (2002, ch. 7). In particular, bootstrap estimation is usually feasible.

2.3.9 BIC criterion

Schwarz (1978) derived another criterion for model selection among a finite set of models. It is based, in part, on the likelihood function, and it is closely related to the Akaike information criterion (AIC). The Bayesian information criterion is an asymptotic result derived under the assumption that the data distribution is in the exponential family:

$BIC = -2\,\ln(L) + k\,\ln(n)$ (eq. 2.3-16)

Under the assumption that the model errors are independent and identically distributed according to a normal distribution, this becomes:

$BIC = n\,\ln\left(\hat{\sigma}^2\right) + k\,\ln(n)$ (eq. 2.3-17)

Given any two estimated models, the model with the lower value of BIC is the one to be preferred.

The BIC is an increasing function of $\hat{\sigma}^2$ and an increasing function of k. That is, unexplained variation in the dependent variable and the number of explanatory variables increase the value of BIC. Hence, lower BIC implies either fewer explanatory variables, better fit, or both. The BIC generally penalizes free parameters more strongly than does the Akaike information criterion, though it depends on the size of n and the relative magnitude of n and k.
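The three criteria of this section, under the least-squares / normal-error assumptions, fit in one helper (sketch):

import numpy as np

def information_criteria(residuals, k):
    # AIC (eq. 2.3-14), AICc (eq. 2.3-15) and BIC (eq. 2.3-17) under the
    # least-squares / normal-error assumptions of this section.
    e = np.asarray(residuals)
    n = e.size
    s2 = np.mean(e ** 2)                 # error variance estimate
    aic = 2 * k + n * np.log(s2)
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = n * np.log(s2) + k * np.log(n)
    return aic, aicc, bic

print(information_criteria([0.1, -0.2, 0.05, 0.12, -0.07, 0.03], k=2))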

l References
- [1] Akaike, Hirotugu (1974). "A new look at the statistical model identification". IEEE Transactions on Automatic Control 19 (6): 716-723. doi:10.1109/TAC.1974.1100705. MR0423716.
- [2] Brockwell, P.J., and Davis, R.A. (2009). Time Series: Theory and Methods, 2nd ed. Springer.
- [3] Burnham, K.P., and Anderson, D.R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd ed. Springer-Verlag. ISBN 0-387-95364-7.
- [4] Burnham, K.P., and Anderson, D.R. (2004). "Multimodel inference: understanding AIC and BIC in Model Selection". Sociological Methods and Research, 33: 261-304.
- [5] Cavanaugh, J.E. (1997). "Unifying the derivations of the Akaike and corrected Akaike information criteria". Statistics and Probability Letters, 31: 201-208.
- [6] Hurvich, C.M., and Tsai, C.-L. (1989). "Regression and time series model selection in small samples". Biometrika, 76: 297-307.
- [7] Schwarz, Gideon (1978). "Estimating the Dimension of a Model". Annals of Statistics, 6: 461-464.

3. Non-stationary Modeling
This technical reference describes the non-stationary variogram modeling approach, where both the
Drift and the Covariance part of the Structure are directly derived in a calculation procedure.

In the non-stationary case (the variable shows either a global trend or local drifts), the correct tool
cannot be the variogram any more as we must deal with variables presenting much larger
fluctuations. Generalized covariances are used instead. As they can be specified only when the
drift hypotheses are given, a Non-stationary Model is constituted of both the drift and the
generalized covariance parameters.

The general framework used for the non-stationary case is known as the Intrinsic Random
Functions of order k (IRF-k for short). In this scope, the structural analysis is split into two steps:
- determination of the degree of the polynomial drift,
- inference of the optimal generalized covariance compatible with the degree of the drift.

The procedure described hereafter only concerns the univariate case. However, it is developed
to enable the use of the external drift feature.

3.1 Unique Neighborhood


3.1.1 Determination of the Degree of the Drift
The principle is to consider that the random variable Z(x) is only constituted of the drift, which
corresponds to a large scale function with regard to the size of the neighborhood. This function is
usually modeled as a low order polynomial.

Z(x) = m(x) = \sum_{l=0}^{K} a_l f^l(x)   (eq. 3.1-1)

where:

f^l(x) denotes the basic monomials,

a_l are the unknown coefficients,

K represents the number of monomials and is related to the degree of the polynomial through the
dimension of the space.

The procedure consists in a cross-validation criterion assuming that the best (order of the) drift is
the one which results in the smallest average error. Cross-validation is a generic name for the
process which in turn considers each data point (called the target), removes it and estimates it from
the remaining neighboring information. The cross-validation error is the difference between the
known and the estimated values. When the theoretical variance of estimation is available, this
error can be divided by the estimation standard deviation.

The estimation m*(x) is obtained through a least squares procedure, the main lines of which are
recalled here. If Z_\alpha designates the neighboring information, we wish to minimize:

\Phi = \sum_\alpha \left[ Z_\alpha - m(x_\alpha) \right]^2   (eq. 3.1-2)

Replacing m(x_\alpha) by its expansion:

 2 
 =  
 
Z  – 2 l l   l
a f l Z +
,m
a a f l fm
l m  

(eq. 3.1-3)

which must be minimized with respect to each unknown a_l:

\sum_m a_m \sum_\alpha f^l_\alpha f^m_\alpha = \sum_\alpha f^l_\alpha Z_\alpha \quad \forall l   (eq. 3.1-4)

In matrix notation:

 F T F A =  F T Z  (eq. 3.1-5)

The principle of this drift identification phase consists in selecting data points as targets, fitting the
polynomials for several order assumptions based on their neighboring information, and deriving the
mean squared errors for each assumption. The optimal drift assumption is the one which
produces, on average, the smallest error variance.

The drawback to this method is its lack of robustness against possible outliers. As a matter of fact,
an outlier will produce large variances whatever the degree of the polynomial and will reduce the
discrepancy between results.

A more efficient criterion, for each target point, is to rank the least squares errors for the various
polynomial orders. The first rank is assigned to the order producing the smallest error, the second
rank to the second smallest one and so on. These ranks are finally averaged over the different target
points and the smallest average rank corresponds to the optimal degree of the drift.
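
A minimal sketch of this rank-based cross-validation (Python; the 2D monomial basis, the leave-one-out loop over all samples and the candidate degrees are simplifying assumptions, not the exact Isatis procedure):

import numpy as np

def monomials(points, degree):
    """Design matrix of 2D monomials x^i * y^j with i + j <= degree."""
    cols = [np.prod(points ** np.array(e), axis=1)
            for d in range(degree + 1)
            for e in [(i, d - i) for i in range(d + 1)]]
    return np.column_stack(cols)

def optimal_drift_degree(points, z, degrees=(0, 1, 2)):
    """Rank the leave-one-out squared errors of each candidate drift degree."""
    n = len(z)
    ranks = np.zeros(len(degrees))
    for target in range(n):
        keep = np.arange(n) != target
        errors = []
        for deg in degrees:
            F = monomials(points[keep], deg)
            a, *_ = np.linalg.lstsq(F, z[keep], rcond=None)
            z_star = monomials(points[[target]], deg) @ a
            errors.append((z[target] - z_star[0]) ** 2)
        ranks += np.argsort(np.argsort(errors))   # rank 0 = smallest error
    return degrees[int(np.argmin(ranks))]         # smallest average rank wins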

3.1.2 Inference of the Covariance


Here again, we consider the generic form of the generalized covariance:

Kh = p bp Kp  h  (eq. 3.1-6)

where Kp(h) corresponds to predefined basic structures.

The idea consists in finding the coefficients bp but, this time, among a class of quadratic estimators.

\hat{b}_p = \sum_{\alpha\beta} A_{\alpha\beta} Z_\alpha Z_\beta   (eq. 3.1-7)

using systematically all the information available.

The principle of the method is based on the MINQUE theory (Rao) which has been rewritten in
terms of generalized covariances.

Let Z be a vector random variable following the usual decomposition

Z = X + U (eq. 3.1-8)

Let us first review the MINQUE approach. The covariance matrix of Z can be expanded on a basis
of authorized basic models:

Cov  Z, Z  =  2 1 V 1 +  +  2 r V r
(eq. 3.1-9)

introducing the variance components  2 p . We can estimate them using a quadratic form
60 Non-stationary Modeling

̂ 2 p = Z T A p Z
(eq. 3.1-10)

where the following conditions are satisfied on the matrix Ap:


1. Invariance: A_p X = 0 (X is the drift matrix, composed of columns of coordinates)

2. Unbiasedness: \mathrm{Tr}(A_p V_q) = \delta_{pq}

3. Optimality: \|A_p\|_V^2 = \mathrm{Tr}(A_p V A_p V) minimum

where V is a covariance matrix used as a norm.

Rao suggested defining V as a linear combination of the Vp:

V = \sum_p \alpha_p^2 V_p   (eq. 3.1-11)

The MINQUE is reached when the coefficients \alpha_p^2 coincide with the variance components \sigma_p^2, but
this is precisely what we are after.

Using a vector \nu = T Z, which constitutes an increment of the data Z, we can express A_p as:

A_p = T^T S_p T

where:

T X = 0

and check that the norm V is only involved through:

W = T V T^T
If A and B designate real symmetric n \times n matrices, we define the scalar product:

\langle A, B \rangle_n = \mathrm{Tr}(A V B V)   (eq. 3.1-12)

If A and B satisfy invariance conditions, then we can find respectively S and T, such that:

(eq. 3.1-13)

Then:

(eq. 3.1-14)

which defines a scalar product on the (n-k) \times (n-k) matrices, if k designates the number of drift terms.

With these notations, we can reformulate the MINQUE theory:

(eq. 3.1-15)

(eq. 3.1-16)

- The unbiasedness condition leads to:

(eq. 3.1-17)

We introduce the following notations:

(eq. 3.1-18)

then

(eq. 3.1-19)

- The optimality condition:

(eq. 3.1-20)

(eq. 3.1-21)

(eq. 3.1-22)

(eq. 3.1-23)

If \mathcal{H} designates the subspace spanned by the H_i, the optimality condition implies that S_p belongs
to this subspace and can be written:

(eq. 3.1-24)

The unbiasedness conditions can be written:

(eq. 3.1-25)

This system has solutions as soon as the matrix H (with H(i,j) = \langle H_i, H_j \rangle) is non-singular.

When the coefficients \lambda_{pi} have been calculated, the matrices S_p and A_p are determined and finally
the value of \hat{b}_p is obtained.

These coefficients must then be replaced in the formulation of the norm V and therefore in W. This
leads to new matrices H_i and to new estimates of the coefficients \lambda_{pi}. The procedure is iterated
until the estimates \hat{b}_p have stabilized.

Still, there is no guarantee that the estimates \hat{b}_p satisfy the consistency conditions for K to be a
valid generalized covariance.

It can be demonstrated, however, that the coefficient linked to a single basic structure always comes
out positive, which produces an authorized generalized covariance.

The procedure resembles the one used in the Moving Neighborhood case. All the possible
combinations are tested and those which lead to non-authorized generalized covariances are
dropped.

In order to select the optimal generalized covariance, a cross-validation test is performed and the
model which leads to the standardized error closest to 1 is finally retained.

3.2 Moving Neighborhood


This time, the procedure differs depending on whether we consider a Moving or a Unique
Neighborhood technique. It consists in finding the optimal generalized covariance, knowing the
degree of the drift.

3.2.1 Determination of the Degree of the Drift


The procedure consists in finding the optimal drift, considered as the large scale drift with regard to
the (half) size of the neighborhood. Each sample is considered in turn as the seed for the
neighborhood search. This neighborhood is then split into two rings: the samples closest to the
seed belong to ring number 1, the other samples to ring number 2.

As for the Unique Neighborhood case, the determination is based on a cross-validation procedure.
All the data from ring 1 are used to fit the functions corresponding to the different drift hypotheses.
Each datum of ring 2 is used to check the quality of the fit. Then the roles of both rings are inverted.
The best fit corresponds to the minimal average variance of the cross-validation errors or, for a
more robust solution, to the minimal re-estimation rank. The final drift identification only considers
the results obtained when testing data of ring 2 against drift trials fitted on samples from ring 1.

3.2.2 Constitution of ALC-k


We can then consider that the resulting model is constituted of the drift that we have just inferred,
completed by a covariance function reduced to a pure nugget effect, the value of which is equal to
the variance of the cross-validation errors.

The value of the polynomial at the test data (denoted by the index "0") is:

(eq. 3.2-1)

This establishes that this estimate is a linear combination of the neighboring data. The set of
weights is given by:

(eq. 3.2-2)

As the residual from the least squares polynomial of order k coincides with a kriging estimation
using a pure nugget effect in the scope of the intrinsic random functions of order k, and as the
nugget effect is an authorized model for any degree k of the drift, then:

(eq. 3.2-3)

is an authorized linear combination of the points (Z_\alpha, Z_0), with the corresponding weights (\lambda_\alpha, -1).
We have found a convenient way to generate one set of weights which, given a set of points,
constitutes an authorized linear combination of order k (ALC-k).

3.2.3 Inference of the Covariance


The procedure is a cross-validation technique performed using the two rings of samples as defined
when determining the optimal degree of the drift. Each datum of ring 1 is considered together with
all the data in ring 2: they constitute a measure. Similarly, each datum of ring 2 is considered
together with all the data of ring 1. Finally, one neighborhood, centered on a seed data point, which
contains 2N data points, leads to (a maximum of) 2N measures.

The first task is to calculate the weights that must be attached to each point of the measure in order
to constitute an authorized linear combination of order k.

Now the order k of the random function is known since it comes from the inference performed in
the previous step. The obvious constraint is that the number of points contained in a measure is
larger than the number of terms of the drift to be filtered.

A simple way to calculate these weights is obtained through the least squares fitting of polynomials
of order k.

We will now apply the famous "Existence and Uniqueness Theorem" to complete the inference of
the generalized covariance. It says that for any ALC-k, we can write:

(eq. 3.2-4)

introducing the generalized covariance K(h), where K_{\alpha\beta} designates the value of this function K for
the distance between points \alpha and \beta.

We assume that the generalized covariance K(h) that we are looking for is a linear combination of a
given set of generic basic structures Kp(h), the coefficients bp (equivalent to sills) of which still
need to be determined:

(eq. 3.2-5)

We use the theorem for each one of the measures previously established, that we denote by using
the index "m":

\mathrm{Var}\left[ \sum_\alpha \lambda_\alpha^m Z_\alpha \right] = \sum_{\alpha\beta} \lambda_\alpha^m \lambda_\beta^m K_{\alpha\beta} = \sum_p b_p \left[ \sum_{\alpha\beta} \lambda_\alpha^m \lambda_\beta^m K_p(\alpha, \beta) \right]   (eq. 3.2-6)

If we assume that each generic basic structure Kp(h) is entirely determined with a sill equal to 1,
each quantity:

(eq. 3.2-7)

as well as the quantity

(eq. 3.2-8)

are known.

Then the problem is to find the coefficients such that

(eq. 3.2-9)

for all the measures generated around each test data. This is a multivariate linear regression
problem that we can solve by minimizing:

(eq. 3.2-10)

The term \omega_m^2 is a normalization weight introduced to reduce the influence of ALC-k with a large
variance. Unfortunately this variance is equal to:

(eq. 3.2-11)

which depends on the precise coefficients that we are looking for. This calls for an iterative
procedure.

Moreover, we wish to obtain a generalized covariance as a linear combination of the basic
structures. As each one of the basic structures individually is authorized, we are in fact looking for
a set of weights which are positive or null. We can demonstrate that, in certain circumstances, some
coefficients may be slightly negative, but in order to give more flexibility to this automatic
procedure, we simply ignore this possibility. Strictly speaking, we should perform the regression
under positivity constraints. Instead, we prefer to calculate all the possible regressions with one non-
zero coefficient only, then with two non-zero coefficients, and so on. Each one of these
regressions is called a subproblem.

As mentioned before, each subproblem is treated using an iterative procedure in order to reach a
correct normalization weight.

The principle is to initialize all the non-zero coefficients of the subproblem to 1. We can then derive
an initial value (\omega_m^2)_0 for the normalization weights. Using these initial weights, we can solve the
regression subproblem and derive new coefficients, from which new values of the normalization
weights are obtained. This iteration is stopped when the coefficients b_p remain unchanged
between two consecutive iterations.

We must still check that the solution is authorized as the resulting coefficients, although stable, may
still be negative. The non-authorized solutions are discarded.

Anyhow, it can easily be seen that the monovariate regressions always lead to authorized solutions.

Let us assume that the generalized covariance is reduced to one basic structure

K(h) = bK0(h) (eq. 3.2-12)

The single unknown is the coefficient b, which is obtained by minimizing:

(eq. 3.2-13)

The solution is obviously:


Technical References 67

(eq. 3.2-14)

As \lambda^m is an ALC-k, the term \sum_{\alpha\beta} \lambda_\alpha^m \lambda_\beta^m K_0(\alpha, \beta) corresponds to the variance of the ALC-k and is therefore
positive. We can check that b^* \geq 0.

We have obtained several authorized sets of coefficients, each set being the optimal solution of the
corresponding subproblem. We must now compare these results. The objective criterion is to
compare the ratio between the experimental and the theoretical variance:

(eq. 3.2-15)

The closer this ratio is to 1, the better the result.
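
A minimal sketch of the monovariate subproblem (Python; the arrays D and T, holding the experimental ALC-k variances and the corresponding unit-sill theoretical variances, and the exact form of the normalization weights are our assumptions about the quantities discussed above, not the Isatis code):

import numpy as np

def fit_single_structure(D, T, n_iter=20, tol=1e-8):
    """Weighted regression of D_m ~ b * T_m with iterated normalization weights.

    D : experimental variances of the ALC-k measures
    T : variances of the same ALC-k under the unit-sill basic structure K0
    """
    b = 1.0                                        # all coefficients start at 1
    for _ in range(n_iter):
        w = 1.0 / np.maximum(b * T, 1e-12) ** 2    # normalization weights
        b_new = np.sum(w * D * T) / np.sum(w * T ** 2)
        if abs(b_new - b) < tol:                   # stop when b stabilizes
            break
        b = b_new
    return b                                       # >= 0 since D and T are >= 0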



3.3 Case of External Drift(s)


The principle of the external drift technique is to replace the large scale drift function, previously
modelled as a low order polynomial, by a combination of a few deterministic functions f^l known
over the whole field. However, in practical terms, the universality condition on the first, constant
monomial is always kept; some of the other traditional monomials can also be used, so that the drift
can now be expanded as follows:

E  Z  x   = m  x  = a0 +  al f l  x  (eq. 3.3-1)
l

when the fl denotes both standard monomials and external deterministic functions.

When this new decomposition has been stated, the determination of the number of terms in the drift
expansion as well as the corresponding generalized covariance is similar to the procedure explained
in the previous paragraph.

Nevertheless some additional remarks need to be mentioned.

The inference (as well as the kriging procedure) will not work properly as soon as the values of
some of the basic drift functions at the data locations are linearly dependent.

In the case of a standard polynomial drift these cases are directly linked to the geometry of the data
points: a first order IRF will fail if all the neighboring data points are located on a line; a second
order IRF will fail if they belong to any quadric such as a circle, an ellipse or a set of two lines.

In the case of external drift(s), this condition involves the value of these deterministic functions at
the data points and is not always easy to check. In particular, we can imagine the case where only
the external drift is used and where the function is constant for all the samples of a (moving)
neighborhood: this property with the universality condition will produce an instability in the
inference of the model or in its use via the kriging procedure.

Another concern is the degree that we can attribute to the IRF when the drift is represented by one
or several external functions. As an illustration, we could imagine using two external functions
corresponding respectively to the first and second coordinates of the data. This would transform the
target variable into an IRF-1 and would therefore authorize the fitting of generalized covariances
such as K(h) = |h|^3. As a general rule, we consider that the presence of an external drift function
does not modify the degree of the IRF, which can only be determined using the standard monomials:
this is a conservative position, as we recall that a generalized covariance that can be used for an
IRF-k can always be used for an IRF-(k+1).

3.4 Case of Kriging With Bayesian Drift


The principle of kriging with Bayesian drift is to replace the drift coefficients of universal kriging
by random Gaussian variables. The kriging dichotomy is now expressed as:

(eq. 3.4-1)

where the first term is the drift, involving a set of random coefficients whose first two moments are
known a priori, and the second term is the residual.

(eq. 3.4-2)

The unbiasedness condition, aiming at filtering out the drift, leads to the following additional equations:

(eq. 3.4-3)

The random function Z and the set of random variables are related by:

(eq. 3.4-4)

Also the spatial covariances of stationary residuals can be expressed as:

(eq. 3.4-5)

Using the optimality condition and minimizing the prediction variance, we get the following
Bayesian kriging system:

(eq. 3.4-6)

In matrix notation we can write it as:

(eq. 3.4-7)

The final prediction system is:

(eq. 3.4-8)

With:

4. Quick Interpolations
The term Quick Interpolation is used to characterize estimation techniques that do not require
any explicit model of spatial structure. They usually correspond to very basic estimation algorithms
widely spread in the literature. For simplicity, only the univariate estimation techniques are
proposed.

4.1 Inverse Distances


The estimation is a linear combination of the neighboring information:

Z^* = \sum_\alpha \lambda_\alpha Z_\alpha   (eq. 4.1-1)

The weight attached to each information is inversely proportional to the distance from the data to
the target, raised to a given power p:

\lambda_\alpha = \frac{1 / d_\alpha^p}{\sum_\beta 1 / d_\beta^p}   (eq. 4.1-2)

If the smallest distance is smaller than a given threshold, the value of the corresponding sample is
simply copied at the target point.
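
A minimal sketch of this interpolator (Python; the function name and the eps threshold are illustrative choices, not Isatis parameters):

import numpy as np

def inverse_distance(points, values, target, power=2.0, eps=1e-10):
    """Inverse distance weighting of neighboring samples at one target point.

    points : (n, d) data coordinates; values : (n,) data values
    target : (d,) target coordinates; power : exponent p of eq. 4.1-2
    """
    d = np.linalg.norm(points - target, axis=1)
    if d.min() < eps:                     # target coincides with a sample:
        return values[np.argmin(d)]       # copy its value (exact interpolator)
    w = 1.0 / d ** power
    return np.sum(w * values) / np.sum(w)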

4.2 Least Square Polynomial Fit


The neighboring data are used in order to fit a polynomial expression of a degree specified by the
user.

If f^l_\alpha designates each monomial at the point x_\alpha, the least squares system is written:

\sum_\alpha \left[ Z_\alpha - \sum_l a_l f^l_\alpha \right]^2 \ \text{minimum}   (eq. 4.2-1)

which leads to the following linear system:

\sum_l a_l \sum_\alpha f^l_\alpha f^{l'}_\alpha = \sum_\alpha Z_\alpha f^{l'}_\alpha \quad \forall l'   (eq. 4.2-2)

When the coefficients a_l of the polynomial expansion are obtained, the estimation is:

Z^* = \sum_l a_l f^l_0   (eq. 4.2-3)

where f^l_0 designates the value of each monomial at the target location.
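
A minimal sketch of this fit (Python; the 2D monomial basis and the least-squares solver are our illustrative choices):

import numpy as np

def ls_polynomial_estimate(points, values, target, degree=1):
    """Fit a 2D polynomial of the given degree to the neighbors, evaluate at target."""
    def design(p):
        # monomials x^i * y^j with i + j <= degree
        return np.column_stack([p[:, 0] ** i * p[:, 1] ** (d - i)
                                for d in range(degree + 1)
                                for i in range(d + 1)])
    F = design(points)                              # normal equations (eq. 4.2-2),
    a, *_ = np.linalg.lstsq(F, values, rcond=None)  # solved in the LS sense
    return (design(target[None, :]) @ a).item()     # evaluation (eq. 4.2-3)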



4.3 Moving Projected Slope


The idea is to consider the data samples 3 by 3. Each triplet of samples defines a plane whose value
at the target location gives the plane-estimation related to that triplet. The estimated value is
obtained by averaging all the estimations given by all the possible triplets of the neighborhood. This
can also be expressed as a linear combination of the data but the weights are more difficult to
establish.

4.4 Discrete Splines


The interested reader can find references on this technique in Mallet J.L., "Automatic Contouring in
Presence of Discontinuities" (in Verly et al., eds., Geostatistics for Natural Resources
Characterization, Part 2, Reidel, 1984). The method has only been implemented on regular grids.

The global roughness is obtained as a combination of the following constraints, defined in 2D:

- if we interpolate the top Z = \varphi(x, y) of a geological stratigraphic layer, as such layers are
generally nearly horizontal, it is wise to assume that the interpolator \varphi is such that:

R_1(\varphi) = \left( \frac{\partial \varphi}{\partial x} \right)^2 \ \text{and} \ R_2(\varphi) = \left( \frac{\partial \varphi}{\partial y} \right)^2 \ \text{are minimum}   (eq. 4.4-1)

- if we consider the layer as an elastic beam that has been deformed under the action of
geological stresses, it is known that shearing stresses in the layer are proportional to second
order derivatives. At any point where the shearing stresses exceed a given threshold, rupture
will occur. For this reason, it is wise to assume the following condition at any point where no
discontinuity exists:

R_3(\varphi) = \left( \frac{\partial^2 \varphi}{\partial x^2} \right)^2, \ R_4(\varphi) = \left( \frac{\partial^2 \varphi}{\partial y^2} \right)^2 \ \text{and} \ R_5(\varphi) = \left( \frac{\partial^2 \varphi}{\partial x \partial y} \right)^2 \ \text{are minimum}   (eq. 4.4-2)
The global roughness can be established as follows:

R    =   R1    + R2     +  1 –    R3    + R4    + R5     (eq. 4.4-3)

where  is a real number belonging to the interval [0, 1].

Practice has shown that the term R_5(\varphi) has little influence on the result. For this reason, it is often
dropped from the global criterion.

Finally, as we are dealing with values located on a regular grid, we replace the partial derivatives by
their digital approximations:

 -----
-
=   i + 1 j  –   i – 1 j 
 x   i  j 

 -----
-
=   i j + 1  –   i j – 1 
 y   i  j 
2
  
 2 =   i + 1 j  – 2  i j  +   i – 1 j 
  x   i j 
2
  
 2 =   i j + 1  – 2   i j  +   i j – 1 
  y   i j 
 -
2
 ---------- =   i + 1 j + 1  –   i – 1 j + 1  –   i + 1 j – 1  +   i – 1 j – 1 
 x y  i j 
(eq. 4.4-4)

Due to this limited neighborhood for the constraints, we can minimize the global roughness in an
iterative process, using the Gauss-Seidel Method.
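
A minimal sketch of such an iterative minimization (Python; since the roughness is quadratic in each node value, every Gauss-Seidel step below finds the exact one-dimensional minimizer from probe evaluations — an illustrative scheme, not the optimized Isatis stencil, and the R5 term is dropped as discussed above):

import numpy as np

def roughness(phi, theta):
    """Discrete global roughness (eq. 4.4-3) with the R5 cross term dropped."""
    dx  = phi[2:, :] - phi[:-2, :]                 # centered first differences
    dy  = phi[:, 2:] - phi[:, :-2]
    dxx = phi[2:, :] - 2 * phi[1:-1, :] + phi[:-2, :]
    dyy = phi[:, 2:] - 2 * phi[:, 1:-1] + phi[:, :-2]
    return (theta * (np.sum(dx ** 2) + np.sum(dy ** 2))
            + (1 - theta) * (np.sum(dxx ** 2) + np.sum(dyy ** 2)))

def gauss_seidel_splines(grid, known, theta=0.5, n_iter=50, h=1.0):
    """Coordinate-wise (Gauss-Seidel) minimization of the global roughness."""
    phi = grid.astype(float).copy()
    for _ in range(n_iter):
        for i in range(phi.shape[0]):
            for j in range(phi.shape[1]):
                if known[i, j]:                    # data nodes stay fixed
                    continue
                e0 = roughness(phi, theta)
                phi[i, j] += h
                e1 = roughness(phi, theta)
                phi[i, j] -= 2 * h
                e2 = roughness(phi, theta)
                phi[i, j] += h                     # restore current value
                g = (e1 - e2) / (2 * h)            # dR/dphi_ij
                c = (e1 - 2 * e0 + e2) / h ** 2    # d2R/dphi_ij^2 (> 0)
                phi[i, j] -= g / c                 # exact 1D minimizer
    return phi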

4.5 Bilinear Grid Interpolation


When the data are defined on a regular grid, we can derive a value of a sample using the bilinear
interpolation method as soon as the sample is surrounded by four grid nodes:

(fig. 4.5-1)

Z^* = \frac{\delta y}{\Delta y} \left[ \frac{\delta x}{\Delta x} Z(i+1, j+1) + \left( 1 - \frac{\delta x}{\Delta x} \right) Z(i, j+1) \right] + \left( 1 - \frac{\delta y}{\Delta y} \right) \left[ \frac{\delta x}{\Delta x} Z(i+1, j) + \left( 1 - \frac{\delta x}{\Delta x} \right) Z(i, j) \right]   (eq. 4.5-1)

where (\delta x, \delta y) are the offsets of the target from node (i, j) and (\Delta x, \Delta y) the grid meshes.

We can check that the bilinear technique is an exact interpolator: when \delta x = \delta y = 0,

Z^* = Z(i, j)   (eq. 4.5-2)

5. Linear Estimation
This technical reference presents the outline of the main kriging applications. In fact, by the generic
term "kriging", we designate all the procedures based on the Minimum Variance Unbiased Linear
Estimator, for one or several variables. The following cases are presented:
- ordinary kriging,
- simple kriging,
- drift estimation,
- estimation of a drift coefficient,
- kriging with external drift,
- unique neighborhood case,
- filtering model components,
- block kriging,
- sampling density variance,
- kriging with measurement error,
- cokriging,
- extended collocated cokriging.

5.1 Ordinary Kriging (Intrinsic Case)


We designate by Z the random variable. We define the kriging estimate, denoted Z^*, as a linear
combination of the neighboring information Z_\alpha, introducing the corresponding weights \lambda_\alpha:

Z^* = \sum_\alpha \lambda_\alpha Z_\alpha   (eq. 5.1-1)


For a better legibility, we will omit the summation symbol when possible using the Einstein
notation. We consider the estimation error, i.e. the difference between the estimation and the true
value Z* - Z0.

We impose the estimator at the target (denoted "0") to be:

- unbiased:

E[Z^* - Z_0] = E\left[ \sum_\alpha \lambda_\alpha Z_\alpha - Z_0 \right] = 0   (eq. 5.1-2)

(which assumes that the expectation of the linear combination exists).

- minimum variance (optimal):

\mathrm{Var}[Z^* - Z_0] = \mathrm{Var}\left[ \sum_\alpha \lambda_\alpha Z_\alpha - Z_0 \right] \ \text{minimum}   (eq. 5.1-3)

(which assumes that the variance of the linear combination exists).

We will develop the equations assuming that the random variable Z has a constant unknown mean
value:

EZ = m (eq. 5.1-4)

Then equation (eq. 5.1-2) can be expanded:

E[Z^* - Z_0] = m \left( \sum_\alpha \lambda_\alpha - 1 \right) = 0 \quad \forall m

\sum_\alpha \lambda_\alpha = 1   (eq. 5.1-5)

This is usually called "the Universality Condition".

Introducing C_{\alpha\beta} = \mathrm{Cov}(Z_\alpha, Z_\beta), equation (eq. 5.1-3) is expanded using the covariance C:

\sigma^2 = \mathrm{Var}[Z^* - Z_0] = \sum_{\alpha\beta} \lambda_\alpha \lambda_\beta C_{\alpha\beta} - 2 \sum_\alpha \lambda_\alpha C_{\alpha 0} + C_{00} \ \text{minimum}   (eq. 5.1-6)

which should be minimum under the constraint given in (eq. 5.1-5).

Introducing the Lagrange multiplier \mu, we must then minimize the quantity:

\Phi = \sum_{\alpha\beta} \lambda_\alpha \lambda_\beta C_{\alpha\beta} - 2 \sum_\alpha \lambda_\alpha C_{\alpha 0} + C_{00} + 2\mu \left( \sum_\alpha \lambda_\alpha - 1 \right)   (eq. 5.1-7)

against the unknowns \lambda_\alpha and \mu.


 -------- 
 - = 0   C  +  = C a0 
 
 (eq. 5.1-8)
-
 ----- = 0    = 1
 
 

We finally obtain the (Ordinary) kriging system:

\sum_\beta \lambda_\beta C_{\alpha\beta} + \mu = C_{\alpha 0} \quad \forall \alpha

\sum_\alpha \lambda_\alpha = 1   (eq. 5.1-9)

\sigma^2 = C_{00} - \sum_\alpha \lambda_\alpha C_{\alpha 0} - \mu
Using matrix notation:

C 1    C0   C


= and 2 = C 00 –     0 (eq. 5.1-10)
   
1 0   1   1

In the intrinsic case, we know that we can use the variogram \gamma instead of the covariance C and
that:

\gamma(h) = C(0) - C(h)   (eq. 5.1-11)

We can then rewrite the kriging system:



 –     +  = –   0 


  = 1

(eq. 5.1-12)
 
 2
  = –  00 –     0 – 
In the intrinsic case, there are two ways of expressing the kriging equations: either in covariance
terms or in variogram terms. In view of the numerical solution of these equations, the formulation
in covariance terms should be preferred, because it endows the kriging matrix with the virtue of
positive definiteness and allows an easier practical inversion.
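
A minimal sketch of solving the system in covariance form (eq. 5.1-10) (Python; the matrix assembly and naming are ours, not the Isatis implementation):

import numpy as np

def ordinary_kriging(C, c0, C00, z):
    """Solve the ordinary kriging system (eq. 5.1-9 / 5.1-10).

    C   : (n, n) covariances between data points
    c0  : (n,) covariances between data points and the target
    C00 : target-to-target covariance
    z   : (n,) data values
    Returns (estimate, kriging variance).
    """
    n = len(z)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = C
    A[n, n] = 0.0                       # universality condition row/column
    b = np.append(c0, 1.0)
    x = np.linalg.solve(A, b)           # weights and Lagrange multiplier
    lam, mu = x[:n], x[n]
    return lam @ z, C00 - lam @ c0 - mu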

5.2 Simple Kriging (Stationary Case with Known Mean)

We assume that the expectation of the random variable is constant and equal to "m". There is no
further need for a Universality Condition and the (Simple) kriging system is:

\sum_\beta \lambda_\beta C_{\alpha\beta} = C_{\alpha 0} \quad \forall \alpha   (eq. 5.2-1)

\sigma^2 = C_{00} - \sum_\alpha \lambda_\alpha C_{\alpha 0}
In matrix notation:

[C_{\alpha\beta}][\lambda_\beta] = [C_{\alpha 0}] \quad \text{and} \quad \sigma^2 = C_{00} - [\lambda_\alpha]^T [C_{\alpha 0}]   (eq. 5.2-2)

In this particular case of stationarity, the estimator is given by:

Z^*_0 = \sum_\alpha \lambda_\alpha Z_\alpha + \left( 1 - \sum_\alpha \lambda_\alpha \right) m   (eq. 5.2-3)

5.3 Drift Estimation


Let us rewrite the usual universal kriging dichotomy:

Zx = Yx + mx (eq. 5.3-1)

where m  x  = a l f l  x  is the drift

We wish to estimate the value of the drift at the target point by kriging:

m  x 0  =   Z  (eq. 5.3-2)

The unbiasedness condition implies that:

E[m^* - m_0] = \sum_\alpha \lambda_\alpha a_l f^l_\alpha - a_l f^l_0 = 0 \quad \forall a_l   (eq. 5.3-3)

therefore \sum_\alpha \lambda_\alpha f^l_\alpha = f^l_0.

The optimality condition leads to:

\mathrm{Var}[m^* - m_0] = \sum_{\alpha\beta} \lambda_\alpha \lambda_\beta C_{\alpha\beta} \ \text{minimum}   (eq. 5.3-4)

Finally the kriging system is derived:

\sum_\beta \lambda_\beta C_{\alpha\beta} + \sum_l \mu_l f^l_\alpha = 0 \quad \forall \alpha

\sum_\alpha \lambda_\alpha f^l_\alpha = f^l_0 \quad \forall l   (eq. 5.3-5)

\sigma^2 = -\sum_l \mu_l f^l_0

In matrix notation:

   0
C  f l     
  =  l (eq. 5.3-6)
f l 0   l   f0 
and

\sigma^2 = -\begin{bmatrix} \lambda_\alpha \\ \mu_l \end{bmatrix}^T \begin{bmatrix} 0 \\ f^l_0 \end{bmatrix}   (eq. 5.3-7)

5.4 Estimation of a Drift Coefficient


Let us rewrite the usual universal kriging dichotomy:

Zx = Yx + mx (eq. 5.4-1)

where m(x) = alfl(x) is the drift

We wish to estimate the value of one of the drift components (say the one corresponding to the
basic drift function number l_0) at the target point by kriging:

a^*_{l_0} = a^*_{l_0}(x_0) = \sum_\alpha \lambda_\alpha Z_\alpha   (eq. 5.4-2)

The unbiasedness condition implies that:

E[a^*_{l_0} - a_{l_0}] = \sum_\alpha \lambda_\alpha a_l f^l_\alpha - a_{l_0} = 0 \quad \forall a_l   (eq. 5.4-3)

This leads to the following conditions on the weights:

\sum_\alpha \lambda_\alpha f^l_\alpha = 0 \quad \text{for } l \neq l_0   (eq. 5.4-4)

\sum_\alpha \lambda_\alpha f^{l_0}_\alpha = 1 \quad \text{for } l = l_0
The optimality condition leads to:

\mathrm{Var}[a^*_{l_0} - a_{l_0}] = \sum_{\alpha\beta} \lambda_\alpha \lambda_\beta C_{\alpha\beta} \ \text{minimum}   (eq. 5.4-5)

Finally the kriging system is derived:

\sum_\beta \lambda_\beta C_{\alpha\beta} + \sum_l \mu_l f^l_\alpha = 0 \quad \forall \alpha

\sum_\alpha \lambda_\alpha f^l_\alpha = 0 \quad \text{for } l \neq l_0   (eq. 5.4-6)

\sum_\alpha \lambda_\alpha f^{l_0}_\alpha = 1

5.5 Kriging with External Drift


We recall that when kriging the variable in the scope of the IRF-k, the expectation of Z(x) is
expanded using a basis of polynomials: E[Z(x)] = \sum_l a_l f^l(x), with unknown coefficients a_l.

Here, the basic hypothesis is that the expectation of the variable can be written:

E[Z(x)] = a_0 + a_1 S(x)   (eq. 5.5-1)

where S(x) is a known variable (background) and where a_0 and a_1 are unknown.

Once again, before applying the kriging conditions, we must make sure that the mean and the
variance of the kriging error exist. We need this error to be a linear combination authorized for the
drift to be filtered. This leads to the equations:


 = 1

 (eq. 5.5-2)
    S = S0

These existence equations ensure the unbiasedness of the system.

The optimality constraint leads to the traditional equations:

\mathrm{Var}[Z^* - Z_0] = \sum_{\alpha\beta} \lambda_\alpha \lambda_\beta K_{\alpha\beta} - 2 \sum_\alpha \lambda_\alpha K_{\alpha 0} + K_{00} \ \text{minimum}   (eq. 5.5-3)

where K(h) is then a generalized covariance.

Introducing the Lagrange parameters \mu_0 and \mu_1, we must now minimize:

\Phi = \sum_{\alpha\beta} \lambda_\alpha \lambda_\beta K_{\alpha\beta} - 2 \sum_\alpha \lambda_\alpha K_{\alpha 0} + K_{00} + 2\mu_0 \left( \sum_\alpha \lambda_\alpha - 1 \right) + 2\mu_1 \left( \sum_\alpha \lambda_\alpha S_\alpha - S_0 \right)   (eq. 5.5-4)

against the unknowns \lambda_\alpha, \mu_0 and \mu_1:




  --------  
   - = 0    K  +  0 +  S  = K  



  --------      = 1
   = 0
 
 
 (eq. 5.5-5)
 0

  
  -------- = 0     S  = S 0 
  
 1

We finally obtain the kriging system with external drift:

\sum_\beta \lambda_\beta K_{\alpha\beta} + \mu_0 + \mu_1 S_\alpha = K_{\alpha 0} \quad \forall \alpha

\sum_\alpha \lambda_\alpha = 1   (eq. 5.5-6)

\sum_\alpha \lambda_\alpha S_\alpha = S_0

In matrix notation:

\begin{bmatrix} K_{\alpha\beta} & 1 & S_\alpha \\ 1 & 0 & 0 \\ S_\beta & 0 & 0 \end{bmatrix} \begin{bmatrix} \lambda_\beta \\ \mu_0 \\ \mu_1 \end{bmatrix} = \begin{bmatrix} K_{\alpha 0} \\ 1 \\ S_0 \end{bmatrix}   (eq. 5.5-7)

and

\sigma^2 = K_{00} - \begin{bmatrix} \lambda_\alpha \\ \mu_0 \\ \mu_1 \end{bmatrix}^T \begin{bmatrix} K_{\alpha 0} \\ 1 \\ S_0 \end{bmatrix}   (eq. 5.5-8)
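
A minimal sketch of this system (Python; assembly and naming are ours, not the Isatis implementation):

import numpy as np

def kriging_external_drift(K, k0, K00, S, S0, z):
    """Solve the kriging-with-external-drift system (eq. 5.5-7 / 5.5-8).

    K  : (n, n) generalized covariances between data points
    k0 : (n,) generalized covariances data-to-target; K00 at the target
    S  : (n,) external drift values at the data points; S0 at the target
    z  : (n,) data values
    """
    n = len(z)
    A = np.zeros((n + 2, n + 2))
    A[:n, :n] = K
    A[:n, n] = A[n, :n] = 1.0            # universality condition
    A[:n, n + 1] = A[n + 1, :n] = S      # external drift condition
    b = np.concatenate([k0, [1.0, S0]])
    x = np.linalg.solve(A, b)
    return x[:n] @ z, K00 - x @ b        # estimate and kriging variance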

5.6 Unique Neighborhood Case


We recall the principle of kriging or cokriging, although only the kriging case will be addressed
here for simplicity. We wish to estimate the variable Z at any target point (Z^*) using the neighboring
information Z_\alpha, as the linear combination:

Z^* = \sum_\alpha \lambda_\alpha Z_\alpha   (eq. 5.6-1)

where the kriging weights \lambda_\alpha are the unknowns.


The kriging conditions of unbiasedness and optimality lead to the following linear kriging
system:

\begin{bmatrix} C_{\alpha\beta} & f^l_\alpha \\ f^l_\beta & 0 \end{bmatrix} \begin{bmatrix} \lambda_\beta \\ \mu_l \end{bmatrix} = \begin{bmatrix} \tilde{C}_{\alpha 0} \\ f^l_0 \end{bmatrix}   (eq. 5.6-2)

and the variance of the kriging estimation error is given by:

\sigma^2 = \tilde{\tilde{C}}_{00} - \begin{bmatrix} \lambda_\alpha \\ \mu_l \end{bmatrix}^T \begin{bmatrix} \tilde{C}_{\alpha 0} \\ f^l_0 \end{bmatrix}   (eq. 5.6-3)
with the following notations:

\alpha, \beta : indices relative to data points belonging to the neighborhood of the target point

0 : index which refers to the target point

C_{\alpha\beta} : value of the covariance part of the structural model expressed for the distance between the data points

f^l_\alpha : value of the drift function ranked "l" applied to the data point

\tilde{C}_{\alpha 0} : value of the modified covariance part of the structural model expressed for the distance between the data point and the target point

f^l_0 : value of the drift function ranked "l" applied to the target point

\tilde{\tilde{C}}_{00} : value of the modified covariance part of the structural model (iterated twice) expressed between the target point and itself

The terms \tilde{C}_{\alpha 0} and \tilde{\tilde{C}}_{00} depend on the type of quantity to be estimated:

- punctual: \tilde{C} = C and \tilde{\tilde{C}} = C
- drift: \tilde{C} = 0 and \tilde{\tilde{C}} = 0
- block average: \tilde{C} = \frac{1}{v} \int_v C \, dv and \tilde{\tilde{C}} = \frac{1}{v^2} \int_v \int_v C \, dv \, dv'
- first order partial derivative: \tilde{C} = \frac{\partial C}{\partial x} and \tilde{\tilde{C}} = \frac{\partial^2 C}{\partial x^2}

A second look at this kriging system allows us to write it as follows:

A X = B   (eq. 5.6-4)

where:

A is the left-hand side kriging matrix,
X is the vector of kriging weights (including the possible Lagrange multipliers),
B is the right-hand side kriging vector,
AB stands for the matrix product,
* designates the scalar product.


It is essential to remark that, given the structural model:

- the left-hand side matrix depends on the mutual location of the data points present in the
neighborhood of the target point;
- the right-hand side depends on the location of the data points of the neighborhood with regard
to the location of the target point;
- the choice of the calculation option only influences the right-hand side and leaves the left-hand
side matrix unchanged.

In the Moving Neighborhood case, the data points belonging to the neighborhood vary with the
location of the target point. Then the left-hand side matrix A, as well as the right-hand side vector B,
must be established each time, and the vector of kriging weights X is obtained by solving the linear
kriging system. The estimation is derived by calculating the product of the first part of the vector
X (excluding the Lagrange multipliers) by the vector of the variable values measured at the
neighboring data samples Z, which we can write in matrix notation as:

Z^* = X^t * \tilde{Z}   (eq. 5.6-5)

where \tilde{Z} is the vector of the variable values complemented by as many zero values as there are drift
equations (and therefore Lagrange multipliers), and * designates the scalar product.

Finally the variance of the estimation error is derived by calculating another scalar product:

\sigma^2 = \tilde{\tilde{C}}_{00} - X^t * B   (eq. 5.6-6)

In the Unique Neighborhood case, the neighboring data points remain the same whatever the target
point. Therefore the left-hand side matrix is unchanged and it seems reasonable to invert it once
for all, obtaining A^{-1}. For each target point, the right-hand side vector must still be established, but
this time the vector of kriging weights X is obtained by a simple product:

X = A^{-1} \cdot B   (eq. 5.6-7)

Then, the rest of the procedure is similar to the Moving Neighborhood case:

Z = X t *Z̃ (eq. 5.6-8)

\sigma^2 = \tilde{\tilde{C}}_{00} - X^t * B   (eq. 5.6-9)

If the variance of the estimation error is not required, the vector of kriging weights does not even
have to be established. As a matter of fact, we can solve the following system once:

A \cdot C = \tilde{Z}   (eq. 5.6-10)

The estimation is immediately obtained by calculating the scalar product (usually referred to as the
dual kriging system):

Z^* = C^t * B   (eq. 5.6-11)

5.7 Filtering Model Components


Let us imagine that the target variable Z can be considered as a linear combination of two random
variables Y1 and Y2, called scale components, in addition to the mean:

Z = m + Y_1 + Y_2   (eq. 5.7-1)

where Y_1 is centered (its mean is zero) and characterized by the variogram \gamma_1, and Y_2 by \gamma_2. If the
two variables are independent, it is easy to see that the variogram of the variable Z is given by:

\gamma = \gamma_1 + \gamma_2   (eq. 5.7-2)

Instead of estimating Z, we may be interested in estimating one of the two components (the
estimation of the mean has been covered in the previous paragraph). We are going to describe the
estimation of one scale component (say the first one):

Y^*_1 = \sum_\alpha \lambda_\alpha Z_\alpha   (eq. 5.7-3)

Here again, we will have to distinguish whether the mean is a known quantity or not. If the mean is
a known constant, then it is obvious to see that the unbiasedness of the estimator is fulfilled
automatically without implying additional constraints on the kriging weights. If the mean is
constant but unknown, the unbiasedness condition leads to the equation:

  = 0 (eq. 5.7-4)


Note that the formalism can be extended to the scope of IRF-k (i.e. defining the set of monomials
fl(x) which compose the drift) and impose that:

\sum_\alpha \lambda_\alpha f^l(x_\alpha) = 0 \quad \forall l   (eq. 5.7-5)


Nevertheless the rest of this paragraph will be developed in the intrinsic case of order 0 and we can
establish the optimality condition:

\mathrm{Var}[Y^*_1 - Y_{1,0}] = -\sum_{\alpha\beta} \lambda_\alpha \lambda_\beta \gamma_{\alpha\beta} + 2 \sum_\alpha \lambda_\alpha \gamma^1_{\alpha 0} - \gamma^1_{00} \ \text{minimum}   (eq. 5.7-6)
This leads to the system:

 –     +  = –  1  0 


 
 = 0
(eq. 5.7-7)

 

The estimation of the second scale component, Y^*_2, is obtained by simply changing \gamma^1_{\alpha 0} into \gamma^2_{\alpha 0}
in the right-hand side of the kriging system, keeping the left-hand side unchanged.

Similarly, rather than extracting a scale component, we can also be interested in filtering a scale
component out. Usually this happens when the available data measure the variable together with an
acquisition noise. This noise is considered as independent from the variable and characterized by its
own scale component, the nugget effect. The technique is applied to produce an estimate of the
variable, filtering out the effect of this noise, hence the name. In Isatis, instead of selecting one scale
component to be estimated, the user has to select the components to filter out.

Because of the linearity of the kriging system, we can easily check that:

Z^* = m^* + Y^*_1 + Y^*_2   (eq. 5.7-8)

This technique is obviously not limited to two components per variable, nor to one single variable.
We can even perform components filtering using the cokriging technique.

5.8 Block Kriging


The kriging principle can be used for the estimation of any linear combination of the data. In
particular, instead of the estimation of Z at a target point, we might be interested in computing the
average value of Z over a volume v, called block. The block kriging performs this calculation; it is
obtained by modifying the right-hand side of the point kriging system (see the paragraph Kriging of
One Variable in the IRF-k Case):

- K_{\alpha 0} is replaced by K_{\alpha v}, which corresponds to the integral of the covariance function between
the data point and a point describing the volume v:

K_{\alpha v} = \frac{1}{v} \int_v K_{\alpha x} \, dx   (eq. 5.8-1)

The integral must be expanded over the number of dimensions of the space in which v is defined.

- f^l_0 is replaced by f^l_v, which corresponds to the mean value of the drift functions over the
volume:

f^l_v = \frac{1}{v} \int_v f^l_x \, dx   (eq. 5.8-2)
We obtain the following block kriging system:

\sum_\beta \lambda_\beta K_{\alpha\beta} + \sum_l \mu_l f^l_\alpha = K_{\alpha v} \quad \forall \alpha

\sum_\alpha \lambda_\alpha f^l_\alpha = f^l_v \quad \forall l   (eq. 5.8-3)

The block kriging variance is given by:

\sigma^2 = K_{vv} - \sum_\alpha \lambda_\alpha K_{\alpha v} - \sum_l \mu_l f^l_v   (eq. 5.8-4)
It requires the calculation of the term K_{vv} instead of the term K_{00}:

K_{vv} = \frac{1}{v^2} \int_v \int_v K_{xy} \, dx \, dy   (eq. 5.8-5)

For each block v, the K_{vv} integral needs to be calculated only once, whereas K_{\alpha v} needs to be
calculated as many times as there are points in the block neighborhood. Therefore these integral
calculations have to be optimized.

Formal expressions of these integrals exist for a few basic structures. Unfortunately, this is not true
for most of them, and moreover these formal expressions sometimes lead to time consuming
calculations. Furthermore, the same type of numerical integration MUST be used for the K_{vv} and
the K_{\alpha v} terms, otherwise we may end up with negative variances.

Numerical integration methods relying on the discretization of the target block are therefore
preferred in Isatis. Two types of discretization are combined:

- the regular discretization,
- the random discretization.

In the regular discretization case, the block is partitioned into equal cells and the target is replaced
by the union of the cell centers c_i. This allows the calculation of the K_{\alpha v} terms:

K_{\alpha v} = \frac{1}{N} \sum_{i=1}^{N} K_{\alpha c_i}   (eq. 5.8-6)

where N is the number of cells in the block.

The double integral of the K_{vv} calculation is replaced by a double summation:

K_{vv} = \frac{1}{N^2} \sum_i \sum_j K_{c_i c_j}   (eq. 5.8-7)
Applying only the regular discretization sometimes leads to over-estimating the nugget effect. A
random discretization is therefore substituted, where the first point of the discretization describes
the centers of the previous regular cells whereas the second point is randomly located within its
cell. In this case, there is almost no chance that a point c_i coincides with a point c_j and the function
K(h) is never called for a zero distance. The nugget effect of the structure therefore vanishes as
soon as the covariance is integrated. This correction is recommended as soon as the dimension of
the block is much larger than the dimension of the sample, which is usually the case.

Note - The drawback of this method is linked to its random aspect. For each calculation of a K_{vv}
term, a set of random values has to be drawn, which will vary from one trial to another. This is why
it is recommended that the user exercises this calculation to determine the optimum as a trade-off
between accuracy and stability of the result on the one hand, and computation time on the other:
this possibility is provided in the Neighborhood procedure.
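
A minimal sketch of these discretized integrals (Python; the function names, the isotropic covariance argument and the uniform draw inside each cell are our assumptions, not the Isatis code):

import numpy as np

def k_vv(cov, centers, cell_size, rng=None):
    """Discretized K_vv (eq. 5.8-7) with the random second point.

    cov       : isotropic covariance, called with an array of distances
    centers   : (N, d) regular cell centers discretizing the block
    cell_size : (d,) dimensions of one cell
    """
    rng = rng or np.random.default_rng(0)
    # second set of points: one random location inside each cell, so that
    # cov is never evaluated at zero distance (the nugget is integrated out)
    second = centers + (rng.random(centers.shape) - 0.5) * cell_size
    d = np.linalg.norm(centers[:, None, :] - second[None, :, :], axis=2)
    return cov(d).mean()                # (1/N^2) * double summation

def k_alpha_v(cov, data_point, centers):
    """Discretized K_alpha_v (eq. 5.8-6) over the regular cell centers."""
    d = np.linalg.norm(centers - data_point, axis=1)
    return cov(d).mean()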

5.9 Sampling Density Variance


This methodology enables you to measure the quality of an estimation regardless of the block size:
we use the spatial density variance, which is independent of the block size:

(eq. 5.9-1)

where \sigma^2_K is the kriging variance of the volume V.

The unit of the spatial density variance being %².m³, it is hard to manipulate. For that reason, the
spatial density variance is normalized by the squared average of the domain. This quantity is
homogeneous with a volume; it is called the Specific Volume:

(eq. 5.9-2)

This quantity is linked neither to the block size nor to the average value. This can help comparing
estimations of different domains or deposits by solely focusing on the variogram and the sampling
layout.

In order to help defining the quality of the estimation, it is possible to calculate the coefficient of
variation of the estimation on a given production volume:

(eq. 5.9-3)

Thresholds can be applied on this quantity to classify blocks. For instance, C. Dohm defined in 2004
("A logical approach") thresholds of 2.5% and 5%:

Measured: CV < 2.5%

Indicated: 2.5% < CV < 5%

Inferred: 5% < CV
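
A minimal sketch of this classification rule (Python; the thresholds follow the Dohm (2004) values quoted above, with the CV expressed as a fraction):

def classify_block(cv):
    """Resource category from the coefficient of variation of the estimate."""
    if cv < 0.025:
        return "Measured"
    return "Indicated" if cv < 0.05 else "Inferred"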

To obtain the spatial density variance, we need the kriging variance, which is obtained by a
Super Kriging: a super block V is defined and centered around each block v, and the data inside the
super block are used to obtain the kriging variance of the super block. The spatial density variance
is then the same for the super block and the block:

(eq. 5.9-4)

5.10 Kriging with Measurement Error


The user will find this kriging option in the "Interpolate / Estimation / (Co-)Kriging..." window,
"Special Kriging Options..." button.

A slight modification of the theory makes it possible to take into account variable measurement
errors at data points, provided the variances of these errors are known.

Suppose that, instead of Z_\alpha, we are given Z_\alpha + e_\alpha, where e_\alpha is a random error satisfying the
following conditions:

E[e_\alpha] = 0

\mathrm{Cov}(e_\alpha, e_\beta) = 0 \ \text{if} \ \alpha \neq \beta   (eq. 5.10-1)

\mathrm{Var}(e_\alpha) = V_\alpha

\mathrm{Cov}(e_\alpha, Z_\beta) = 0 \quad \forall \alpha, \beta

Then the kriging estimator of Z can be written Z^*_0 = \sum_\alpha \lambda_\alpha (Z_\alpha + e_\alpha) and the variance
becomes:

\mathrm{Var}[Z^*_0 - Z_0] = \sum_{\alpha\beta} \lambda_\alpha \lambda_\beta K_{\alpha\beta} + \sum_\alpha \lambda_\alpha^2 V_\alpha - 2 \sum_\alpha \lambda_\alpha K_{\alpha 0} + K_{00}   (eq. 5.10-2)

Then the kriging system of Z_0 remains the same, except that V_\alpha is now added to the diagonal terms
K_{\alpha\alpha}; no change occurs in the right-hand side of the kriging system.

These data error variances V_\alpha are related, though not identical, to the nugget effect.

Let us first recall the definition of the nugget effect.

By definition, the nugget effect refers to a discontinuity of the variogram or the covariance at zero
distance. Mathematically, it means that the field Z(x) is not continuous in the mean square sense.
The origin of the terminology "nugget effect" is as follows.

Gold ore is often discovered in the form of nuggets, i.e. pebbles of pure gold disseminated in a
sterile matrix. Consequently, the ore grade varies discontinuously from inside to outside the nugget.
It has been found convenient to retain the term "nugget effect" even if this is due to causes other
than actual nuggets.

Generally, discontinuity of the variogram is only apparent. If we could investigate structures at a


smaller scale, we would see that Z(x) is in fact continuous but with a range much smaller than the

nearest distance between data points. This is the reason why one could conveniently replace this
nugget effect by a transition scheme (say a spherical variogram) with a very short range.

But the "nugget effect" (as used in the modeling phase) can also be due to another factor: the
measurement error. In this case, the discontinuity is real and is due to errors of the type e  . This
time, the discontinuity remains whatever the size of the structure investigation. If the same type of
measurement error is attributed to all data, the estimate is the same whether:

l you do not use any nugget effect in your model and you provide the same V for each data, or

l you define a nugget effect component in your model whose sill C is precisely equal to V .

Unlike the estimate itself, the kriging variance differs depending on which option is chosen. Indeed,
the measurement error V_\alpha is considered as an artefact and is not a part of the phenomenon of
interest. Therefore, a kriging with a variance of measurement error equal for each datum and no
nugget effect in the model will lead to smaller kriging variances than the estimation with a nugget
component equal to V_\alpha.

The use of data error variances V_\alpha really makes sense when the data are of different qualities. Many
situations may occur. For example, the data may come from several surveys: old ones and new
ones. Or the measurement techniques may be different: depths measured at wells or by seismic,
porosities from cores or from log interpretation, etc.

In such cases error variances may be computed separately for each sub-population and, if we are
lucky, the better quality data will allow identification of the underlying structure (possibly
including a nugget effect component), while the variogram attached to the poorer quality data will
show the same structure incremented by a nugget effect corresponding to the specific
measurement error variance V_\alpha.

In other cases, it could be possible to evaluate directly the precision of each measurement and
derive V_\alpha: if we are told that the absolute error on Z is \Delta Z, by reference to Gaussian errors we
may consider that \Delta Z = 2\sigma and take V_\alpha = (\Delta Z / 2)^2.
Another use of this technique is in the post-processing of the macro kriging, where we calculate
"equivalent samples" with measurement error variances. These variances are in fact calculated from
a fitted model depending on the number of initial samples inside pseudo-blocks.
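
A minimal sketch of the modified ordinary kriging system (Python; as stated above, only the diagonal of the left-hand side changes — the helper name and arguments are illustrative):

import numpy as np

def kriging_with_measurement_error(C, c0, C00, z, V):
    """Ordinary kriging where V[alpha] is added to the diagonal (eq. 5.10-2).

    V : (n,) known measurement error variances, one per datum
    """
    n = len(z)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = C + np.diag(V)      # only the LHS diagonal changes
    A[n, n] = 0.0
    b = np.append(c0, 1.0)          # the RHS is unchanged
    x = np.linalg.solve(A, b)
    return x[:n] @ z, C00 - x @ b   # estimate and kriging variance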

5.11 Cokriging
This time, we consider two random variables Z1 and Z2 characterized by:

- the simple covariance/variogram of Z1, denoted C_{11} / \gamma_{11},
- the simple covariance/variogram of Z2, denoted C_{22} / \gamma_{22},
- the symmetrical cross-covariance of Z1 and Z2, denoted C_{12} (where C_{12} = C_{21}),
- the cross-variogram of Z1 and Z2, denoted \gamma_{12}.

Note - It is because the cross-covariance is supposed to be symmetrical, which is a particular case,
that the cokriging system can be easily translated from covariances to variograms.

We assume that the variables have unknown and unrelated means:

E[Z_1] = m_1 \quad \text{and} \quad E[Z_2] = m_2

Let us now estimate the first variable at a target point denoted "0", as a linear combination of the
neighboring information concerning both variables, using respectively the weights \lambda^1_\alpha and \lambda^2_\alpha:

Z^*_1 = \sum_\alpha \lambda^1_\alpha Z_{1\alpha} + \sum_\alpha \lambda^2_\alpha Z_{2\alpha}   (eq. 5.11-1)

The first variable is also called the main variable. We still apply the unbiasedness condition (eq. 5.1-2):

E[Z^*_1 - Z_{1,0}] = 0   (eq. 5.11-2)

which leads to:

\sum_\alpha \lambda^1_\alpha m_1 + \sum_\alpha \lambda^2_\alpha m_2 - m_1 = 0 \quad \forall m_1, m_2   (eq. 5.11-3)

\sum_\alpha \lambda^1_\alpha = 1   (eq. 5.11-4)

\sum_\alpha \lambda^2_\alpha = 0

Let us consider the optimality condition (eq. 5.1-3) and minimize the variance of the estimation
error:

(eq. 5.11-5)

under the unbiasedness conditions.

This leads to the cokriging system:

\sum_\beta \lambda^1_\beta C^{11}_{\alpha\beta} + \sum_\beta \lambda^2_\beta C^{12}_{\alpha\beta} + \mu_1 = C^{11}_{\alpha 0} \quad \forall \alpha

\sum_\beta \lambda^1_\beta C^{12}_{\alpha\beta} + \sum_\beta \lambda^2_\beta C^{22}_{\alpha\beta} + \mu_2 = C^{12}_{\alpha 0} \quad \forall \alpha

\sum_\alpha \lambda^1_\alpha = 1   (eq. 5.11-6)

\sum_\alpha \lambda^2_\alpha = 0
In matrix notation:

\begin{bmatrix} C^{11} & C^{12} & 1 & 0 \\ C^{12} & C^{22} & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} \lambda^1 \\ \lambda^2 \\ \mu_1 \\ \mu_2 \end{bmatrix} = \begin{bmatrix} C^{11}_{\alpha 0} \\ C^{12}_{\alpha 0} \\ 1 \\ 0 \end{bmatrix}   (eq. 5.11-7)

with the estimation variance:

(\sigma^1)^2 = C^{11}_{00} - \sum_\alpha \lambda^1_\alpha C^{11}_{\alpha 0} - \sum_\alpha \lambda^2_\alpha C^{12}_{\alpha 0} - \mu_1   (eq. 5.11-8)

- In the intrinsic case with symmetrical cross-covariances, the cokriging system may be written
using variograms:

\sum_\beta \lambda^1_\beta \gamma^{11}_{\alpha\beta} + \sum_\beta \lambda^2_\beta \gamma^{12}_{\alpha\beta} - \mu_1 = \gamma^{11}_{\alpha 0} \quad \forall \alpha

\sum_\beta \lambda^1_\beta \gamma^{12}_{\alpha\beta} + \sum_\beta \lambda^2_\beta \gamma^{22}_{\alpha\beta} - \mu_2 = \gamma^{12}_{\alpha 0} \quad \forall \alpha

\sum_\alpha \lambda^1_\alpha = 1   (eq. 5.11-9)

\sum_\alpha \lambda^2_\alpha = 0

with the estimation variance:

(\sigma^1)^2 = -\gamma^{11}_{00} + \sum_\alpha \lambda^1_\alpha \gamma^{11}_{\alpha 0} + \sum_\alpha \lambda^2_\alpha \gamma^{12}_{\alpha 0} - \mu_1   (eq. 5.11-10)

Note - If instead of Z^*_1 we want to estimate Z^*_2, the matrix is unchanged and only the right-hand
side is modified:

\begin{bmatrix} C^{12}_{\alpha 0} \\ C^{22}_{\alpha 0} \\ 0 \\ 1 \end{bmatrix}   (eq. 5.11-11)

and the corresponding estimation variance:

(\sigma^2)^2 = C^{22}_{00} - \sum_\alpha \lambda^1_\alpha C^{12}_{\alpha 0} - \sum_\alpha \lambda^2_\alpha C^{22}_{\alpha 0} - \mu_2   (eq. 5.11-12)

Let us first remark that both variables Z1 and Z2 do not have to be systematically defined at all the
data points. The only constraint is that when estimating Z1, the number of data where Z2 is defined
is strictly positive.

This system can easily be generalized to more than two variables. The only constraint lies in the
"multivariate structure" which ensures that the system is regular if it comes from a linear
coregionalization model.
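
A minimal sketch of the system assembly (eq. 5.11-7) for the symmetrical two-variable case (Python; names and argument layout are ours, and the fully heterotopic bookkeeping is omitted):

import numpy as np

def cokriging_two_vars(C11, C22, C12, c11_0, c12_0, z1, z2):
    """Assemble and solve the two-variable cokriging system (eq. 5.11-7).

    C11, C22     : covariance matrices of each variable at the data points
    C12          : cross-covariance matrix (symmetrical case, C12 = C21)
    c11_0, c12_0 : covariances / cross-covariances data-to-target
    """
    n1, n2 = len(z1), len(z2)
    A = np.zeros((n1 + n2 + 2, n1 + n2 + 2))
    A[:n1, :n1], A[n1:n1 + n2, n1:n1 + n2] = C11, C22
    A[:n1, n1:n1 + n2], A[n1:n1 + n2, :n1] = C12, C12.T
    A[:n1, -2] = A[-2, :n1] = 1.0              # sum of lambda1 = 1
    A[n1:n1 + n2, -1] = A[-1, n1:n1 + n2] = 1.0  # sum of lambda2 = 0
    b = np.concatenate([c11_0, c12_0, [1.0, 0.0]])
    x = np.linalg.solve(A, b)
    return x[:n1] @ z1 + x[n1:n1 + n2] @ z2    # cokriging estimate of Z1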

5.12 Extended Collocated Cokriging


Isatis window: Interpolate / Estimation / Bundled Collocated Cokriging.

This technique is used when trying to estimate a target variable Z, known on a sparse sampling, on
a regular grid while a correlated variable Y is available at each node of this grid.

The original technique, strictly "Collocated Cokriging", has been extended in Isatis and is also
referred to as "Multi Collocated Cokriging" in the literature.

The first task that must be performed by the user consists in writing the value of the variable Y at
the points of the sparse sampling. Then he must perform the bivariate structural analysis using the
variables Y and Z. This may lead to a severe problem due to the large heterotopy between these two
variables: as a matter of fact, if the inference is carried out in terms of variograms, the two variables
need to be defined at the same points. If the secondary variable Y is dense with regard to the
primary variable Z, we can always interpolate Y at the points where Z is defined, and therefore the
inference (at least as far as the simple variogram \gamma_Z(h) and the cross-variogram \gamma_{YZ}(h) are
concerned) only considers those samples: all the remaining locations where only Y is defined are
simply neglected.

In the literature, we also find another inference method. The variogram \gamma_Y(h) is constructed on the
whole dense data set, whereas the simple variogram \gamma_Z(h) and the cross-variogram \gamma_{YZ}(h) are
set as being similar to \gamma_Y(h), up to the scaling of their sills and to the use of the nugget effect: the
whole system must satisfy the positive definiteness conditions. By definition, we are in the
framework of the linear model of coregionalization. This corresponds to the procedure programmed
in "Interpolate / Estimation / Collocated Cokriging (Bundled)".

The cokriging step is almost similar to the one described in the paragraph "Kriging Two Variables in
the Intrinsic Case"; the only difference is the neighborhood search. Within the neighborhood
(centered on the target grid node), any information concerning the Z variable must be used (because
Z is the primary variable and because it is sparse). Regarding the Y variable (which is
assumed to be dense with regard to Z), several possibilities are offered:

- not using any Y information: obviously this does not offer any interest,
- using all the Y information contained within the neighborhood: this may lead to an untractable
solution because of too much information,

- the initial solution (as mentioned in Xu, W., Tran, T.T., Srivastava, R.M., and Journel, A.G.
(1992), "Integrating seismic data in reservoir modeling: the collocated cokriging alternative", SPE
paper 24742, 67th Annual Technical Conference and Exhibition, p. 833-842) consists in using
the single value located at the target grid node: hence the term collocated. Its
contribution to the kriging estimate relies on the cross-correlation between the two variables at
zero distance. But, in the intrinsic case, the weights attached to the secondary variable must add
up to zero and therefore, if only one data value is used, its single weight (or influence) will be
zero.

- the solution used in Isatis is to use the Y variable at the target location and at all the locations
where the Z variable is defined (Multi Collocated Cokriging). This neighborhood search has
given the most reliable and stable results so far.

In general, collocated cokriging is less precise than a full cokriging, which makes use of the
auxiliary variable at all target points when estimating each of these.

An exception are models where the cross-variogram (or covariance) between the two variables is
proportional to the variogram (or covariance) of the auxiliary variable.

In this case, collocated cokriging coincides with full cokriging, but is also strictly equivalent to the
simpler method consisting in kriging the residual of the linear regression of the target variable on
the auxiliary variable.

The user interested in the different approaches to collocated cokriging can refer to Rivoirard J.,
"Which Models for Collocated Cokriging?", Math. Geology, Vol. 33, No. 2, 2001, pp. 117-131.

6. Gaussian Transformation: the Anamorphosis
In Isatis.neo the gaussian anamorphosis is used in three different ways:
- for variable transformation into the gaussian space, useful in the simulation processes
(normal score transformation),
- for histogram modeling and a further use in non-linear techniques (D.K., U.C., Global
Support Correction, grade-tonnage curves, ...),
- for variogram transformation.

For information on the theory of Non Linear Geostatistics see Rivoirard J., Introduction to
Disjunctive Kriging and Non-linear Geostatistics (Oxford: Clarendon, 1994, 181p).

6.1 Modeling and Variable Transformation


Note - Isatis window:
- Statistics / Gaussian Anamorphosis Modeling
- Statistics / Normal Score Transformation
- Statistics / Raw <-> Gaussian Transformation

6.1.1 Gaussian Anamorphosis Modeling


The gaussian anamorphosis is a mathematical function which transforms a variable Y with a
gaussian distribution into a new variable Z with any distribution: Z = \Phi(Y). For mathematical
reasons this function can be conveniently written as a polynomial expansion:

\Phi(Y) = \sum_{i=0}^{\infty} \psi_i H_i(Y)   (eq. 6.1-1)

where the H_i(Y) are called the Hermite polynomials. In practice, this polynomial expansion is
stopped at a given order. Instead of being strictly increasing, the function \Phi consequently shows
maxima and minima outside an interval of interest, that is for very low probabilities of Y, for instance
outside [-2.5, 3.] in (fig. 6.1-1) (horizontal axis for the gaussian variable, vertical axis for the
raw variable).
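
A minimal sketch of the expansion (eq. 6.1-1) (Python; it uses the recurrence of the normalized Hermite polynomials commonly adopted in gaussian anamorphosis modeling, with the coefficients psi assumed already fitted):

import numpy as np

def hermite_polynomials(y, order):
    """Normalized Hermite polynomials H_0..H_order evaluated at y.

    Recurrence: H_{n+1}(y) = -(y H_n(y) + sqrt(n) H_{n-1}(y)) / sqrt(n + 1),
    with H_0 = 1 and H_1(y) = -y.
    """
    H = np.empty((order + 1,) + np.shape(y))
    H[0] = 1.0
    if order >= 1:
        H[1] = -y
    for n in range(1, order):
        H[n + 1] = -(y * H[n] + np.sqrt(n) * H[n - 1]) / np.sqrt(n + 1)
    return H

def anamorphosis(y, psi):
    """Z = sum_i psi_i H_i(y) (eq. 6.1-1), truncated at NH = len(psi) terms."""
    psi = np.asarray(psi)
    return psi @ hermite_polynomials(y, len(psi) - 1)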
Technical References 107

(fig. 6.1-1)

The modeling of the anamorphosis starts with the discrete version of the curve on the true data set
(fig. 6.1-2). The only available parameters are 2 control points (A and B in (fig. 6.1-2)) which
possibly allow the user to modify the behaviour of the model (fig. 6.1-2) on the edges. But this
opportunity is in practice important only when the number of samples is small. The other
parameters available are the Authorized Interval on the Raw Variable (defined between a minimum
value Zamin and a maximum one Zamax) and the order of the Hermite Polynomial Expansion
(number of polynomials). The default values for the authorized interval are the minimum and the
maximum of the data set. In this configuration, the 2 control points do not modify the experimental
anamorphosis previously calculated.

(fig. 6.1-2)

After the definition of this discretized anamorphosis, the program calculates the \psi_i coefficients of
the expansion in Hermite polynomials. It draws the curve and calculates the Practical Interval of
Definition and the Absolute Interval of Definition:

- the bounds of the Practical Interval of Definition are delimited by the two points [Ypmin,
Zpmin] and [Ypmax, Zpmax] (fig. 6.1-3). These two calculated points are the points where the
curve crosses the upper and lower authorized limits on raw data (Zamin and Zamax), or the
points where the curve is no longer increasing with Y.

- the bounds of the Absolute Interval of Definition are delimited by the two points [Yamin,
Zamin] and [Yamax, Zamax] (fig. 6.1-3). These two points are the intersections of the curve
with the horizontal lines defined by the Authorized Interval on the Raw Variable. The values
generated using the anamorphosis function will never be outside this Absolute Interval of
Definition.

Figure (fig. 6.1-3) explains how the anamorphosis will be truncated later during use.

(fig. 6.1-3)

6.1.2 Gaussian Variable into Raw Variable


The back-transformation from the gaussian variable to the raw variable is easy to perform, as the
anamorphosis has been built for that purpose. Nevertheless, the anamorphosis is not strictly
increasing for all the values of Y and the transformation is divided into 5 cases according to (fig. 6.1-3):

Condition on Y            Result on Z
Y <= Yamin                Z = Zamin
Yamin < Y <= Ypmin        Z = linear(Zamin, Zpmin)
Ypmin < Y <= Ypmax        Z = \sum_{i=0}^{NH-1} \psi_i H_i(Y)
Ypmax < Y <= Yamax        Z = linear(Zpmax, Zamax)
Y > Yamax                 Z = Zamax

An optional bias correction formula exists for this back-transformation:

Z = \int \Phi\left(Y + \sqrt{1 - \sigma_{dv}^{2}}\, u\right) g(u)\, du    (eq. 6.1-2)

with σ_dv² the kriging dispersion variance. In the simple kriging case, σ_dv² = C0 - σ_sk² and,
since the gaussian variable has unit variance (C0 = 1), as a consequence 1 - σ_dv² = σ_sk².

6.1.3 Raw Variable into Gaussian Variable


The anamorphosis function is defined as a function of Y: Z = Φ(Y). To transform the raw
variable into a gaussian one, we have to invert this function: Y = Φ^{-1}(Z). This inversion can be
performed in Isatis in 3 different ways:

- Linear Interpolator Inversion

The inversion is simply performed using a linear interpolation of the anamorphosis after
discretization. This interpolation also takes into account the intervals of definition of
the anamorphosis function defined above:

Condition on Z            Result on Y
Z <= Zamin                Y = Yamin
Zamin < Z <= Zpmin        Y = linear(Yamin, Ypmin)
Zpmin < Z <= Zpmax        Y such that Z = \sum_{i=0}^{NH-1} \psi_i H_i(Y)
Zpmax < Z <= Zamax        Y = linear(Ypmax, Yamax)
Z > Zamax                 Y = Yamax

In the middle case, the equation is solved by local linear interpolation.

- Frequency Inversion

The program first sorts the raw values. A cumulative frequency FC_i is then calculated for each
sample, starting from the smallest value and adding the frequency of each sample:

FC_i = FC_{i-1} + W_i    (eq. 6.1-3)

The frequency W_i is given by the user (the Weight Variable) or calculated as W_i = 1/N. Note
that two samples with the same value will get different cumulative frequencies. The program
finally calculates the gaussian value:

Y_i = \frac{G^{-1}(FC_i) + G^{-1}(FC_{i-1})}{2}    (eq. 6.1-4)

In this way, two equal raw data get different gaussian values, and the resulting variable is "more"
gaussian. This inversion method is generally recommended in Isatis.
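A minimal sketch of the frequency inversion (eq. 6.1-3 and eq. 6.1-4); the clipping of the extreme cumulative frequencies, needed to avoid infinite quantiles for the first and last samples, is a choice of this sketch, not necessarily what Isatis does:

```python
import numpy as np
from scipy.stats import norm

def frequency_inversion(z, weights=None):
    """Normal-score transform by frequency inversion."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights) / np.sum(weights)
    order = np.argsort(z)                               # sort the raw values
    fc = np.concatenate([[0.0], np.cumsum(w[order])])   # eq. 6.1-3
    eps = 0.5 / n ** 2                                  # guard against G^-1(0), G^-1(1)
    q = norm.ppf(np.clip(fc, eps, 1.0 - eps))
    y = np.empty(n)
    y[order] = 0.5 * (q[1:] + q[:-1])                   # eq. 6.1-4
    return y
```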

- Empirical Inversion

The empirical inversion calculates, for each raw value, the attached empirical frequency and then
the corresponding gaussian value. This time, two equal raw values get the same gaussian
transformed value.

Note - Even if the gaussian transformed values can be calculated without any anamorphosis model,
a user who performs this operation for a simulation (for example) will have to back-transform the
gaussian simulated values, and this time the anamorphosis model will be necessary. It is therefore
very important to check at this step that the back-transformation Y -> Z will not be a problem,
particularly regarding the intervals of definition. Indeed, one has to keep in mind that a
simulation generates gaussian values on an interval often larger than the interval of definition of
the initial data: the Practical Interval should be carefully checked if the model is to be used
later, after a simulation process.

6.2 Histogram Modeling and Block Support Correction


The advantage of the Hermite polynomial expansion for the anamorphosis modeling is that, in the
context of the Gaussian Discrete Model, a simple correction of the ψ_i coefficients yields an
anamorphosis on a block support.

This Block Support Correction is available in the Statistics / Gaussian Anamorphosis Modeling
window, "Calculate" button and "Block Correction" option.

For the points, we have:


Z = Y =  i Hi  Y  (eq. 6.2-1)
i=0

The block support anamorphosis can be written:


Zv = r  Yv  =  i r i Hi  Yv  (eq. 6.2-2)
i=0

A simple support correction coefficient allows the user to get this new anamorphosis and, at the
same time, a model of the histogram of the blocks. This coefficient "r" is determined from the
variance of the blocks:

\mathrm{Var}\, Z_v = \sum_{i=1}^{\infty} \psi_i^2\, r^{2i}    (eq. 6.2-3)

We also have for the points:


\mathrm{Var}\, Z = \sum_{i=1}^{\infty} \psi_i^2    (eq. 6.2-4)

The only difficulty in the calculation of the coefficient "r" is that we need both the anamorphosis
model (the ψ_i) and a variogram model, while the variance of the points can either be calculated
from the anamorphosis (see above) or be taken as the sill of the variogram (in a strictly
stationary case). In Isatis the block variance is calculated in the following way:


\mathrm{Var}\, Z = \sum_{i=1}^{\infty} \psi_i^2    (eq. 6.2-5)

\mathrm{Var}\, Z_v = \mathrm{Var}\, Z - \bar{\gamma}(v, v)    (eq. 6.2-6)

where   v v  is calculated from the variogram model using a discretization of the block v.

When the sill of the punctual variogram is different from var Z, the value of   v v  can be
normalized by the ratio: (var Z / variogram sill).
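A sketch of the determination of the support correction coefficient from equations (eq. 6.2-3) to (eq. 6.2-6), by a simple root search on r in [0, 1]; the function names are hypothetical:

```python
import numpy as np
from scipy.optimize import brentq

def support_correction(psi, gamma_vv, sill=None):
    """Solve sum_{i>=1} psi_i^2 r^(2i) = Var(Zv) for r (eq. 6.2-3), with
    Var(Zv) = Var(Z) - gamma_bar(v,v) (eq. 6.2-6); gamma_vv is gamma_bar(v,v)
    computed from the variogram model over a discretization of the block v."""
    psi2 = np.asarray(psi)[1:] ** 2
    var_z = psi2.sum()                          # point variance (eq. 6.2-5)
    if sill is not None:
        gamma_vv *= var_z / sill                # normalization by Var Z / sill
    var_zv = var_z - gamma_vv
    i = np.arange(1, len(psi2) + 1)
    f = lambda r: np.sum(psi2 * r ** (2 * i)) - var_zv
    return brentq(f, 0.0, 1.0)                  # block variance increases with r
```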

The anamorphosis of the blocks can be stored; the size of the block support is kept for further use
(Uniform Conditioning, Disjunctive Kriging, etc.).

As in the case of "punctual" anamorphosis, this block anamorphosis can be used to transform block
values into gaussian ones and conversely. But the user can also get the grade-tonnage curves of the
histogram attached to this block anamorphosis.

6.2.1 Grade-Tonnage Curves


The metal quantity is only available with strictly positive distributions.

When an anamorphosis   Y  has been modelled, the different quantities available for a given
cutoff "zc" are:

The tonnage above the cutoff T  zc  = 1 – G  yc  with


yc =  –1  zc 

 y  y g  y  dy
The metal quantity above the cutoff
Q  zc  =
c

Q  zc 
The mean grade above the cutoff m  z c  = --------------
T  zc 

Obviously these quantities can be calculated for the punctual anamorphosis but also for a given
block support. In this way, the user has access to global recoverable reserves.
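These quantities can be computed numerically from the (block) anamorphosis; a sketch, assuming Φ is increasing on the interval searched and reusing the helpers from the sketches above:

```python
def grade_tonnage(zc, psi, r=1.0, n_disc=2000):
    """T(zc), Q(zc) and m(zc) for the anamorphosis Phi_r (r = 1: points)."""
    psi_r = np.asarray(psi) * r ** np.arange(len(psi))
    # gaussian cutoff yc = Phi_r^{-1}(zc), assuming Phi_r increasing here
    yc = brentq(lambda y: anamorphosis(np.array([y]), psi_r)[0] - zc, -10.0, 10.0)
    t = 1.0 - norm.cdf(yc)                                   # tonnage T(zc)
    y = np.linspace(yc, 8.0, n_disc)                         # discretized integral
    q = np.sum(anamorphosis(y, psi_r) * norm.pdf(y)) * (y[1] - y[0])  # metal Q(zc)
    return t, q, q / t                                       # plus mean grade m(zc)
```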

6.2.2 Grade-Tonnage Curves with information effect


When calculating global recoverable reserves, it can be interesting to take into account the fact
that the selection of mining units will be performed on future kriging estimates, when more samples
are available.

This option can be activated in the Statistics / Gaussian Anamorphosis Modeling window,
"Calculate" button and "Block Correction" option.

This time, the program needs two other parameters: the variance of the kriged blocks (Var Z*_v)
and the covariance between the real block grades and the kriged grades (Cov(Z*_v, Z_v)).

Note - These two parameters can be calculated in "Interpolate / Estimation / (Co-)kriging", "Test
Window" option, "Print Complete Information", when kriging a block with the future configuration
of the samples. These two values are called, in the kriging output, Variance of Z* (Estimated Z)
and Covariance between Z and Z*.

In this case, the formulae used are:

\mathrm{Var}\, Z_v^{*} = \sum_{i=1}^{\infty} \psi_i^2\, s^{2i}    (eq. 6.2-7)

\mathrm{Cov}(Z_v, Z_v^{*}) = \sum_{i=1}^{\infty} \psi_i^2\, r^i s^i \rho^i    (eq. 6.2-8)

This time, the different quantities for a given cutoff z_c are:

- the tonnage above the cutoff: T(z_c) = 1 - G(y_c), with y_c = \Phi_s^{-1}(z_c)

- the metal quantity above the cutoff: Q(z_c) = \int_{y_c}^{\infty} \Phi_{r\rho}(y)\, g(y)\, dy

- the mean grade above the cutoff: m(z_c) = \frac{Q(z_c)}{T(z_c)}

Isatis gives the values of the two gaussian correlation coefficients, "s" and "ρ", in the
"Calculate" window for information.

Note - In the case where the future block estimates have no conditional bias, then s = rρ, and
the estimated recoverable reserves are the same as in the case of larger virtual blocks that would
be perfectly known ("equivalent blocks", having a variance equal to the variance of the future
estimates).

6.3 Variogram on Raw and Gaussian Variables


Isatis window: Statistics / Gaussian to Raw Variogram.

In the same way that the Hermite polynomial expansion can be used to easily calculate the variance
of the raw variable from the polynomial coefficients, there is a simple relationship between the
covariance of the gaussian transformed variable and the covariance of the raw variable:

C(h) = \sum_{i=1}^{\infty} \psi_i^2\, \rho^i(h)    (eq. 6.3-1)

where:

- ρ(h) is the covariance of the gaussian variable,

- C(h) is the covariance of the raw variable.

This relationship is valid if the pair of variables (Y(x), Y(x+h)) can be considered as bivariate
normal. From the relationship on covariances, we can derive the relationship on variograms.

The use of this relationship is threefold:

- One can calculate the covariances (or variograms) of the gaussian transformed values and of the
raw values, and check whether the relationship holds, in order to confirm the binormality of the
(Y(x), Y(x+h)) pairs.

- One can calculate the gaussian variogram on the gaussian transformed values and deduce the
raw variogram. This is interesting because the variogram of the gaussian variable is often more
clearly structured and easier to fit than the variogram derived from the raw values.

- One can calculate the gaussian variogram from the raw variogram. This transformation is not as
immediate as the previous one, as the relationship needs to be inverted at each lag (the secant
method can be used for instance). A sketch of both conversions is given below.
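Both directions of the conversion can be sketched as follows (bisection is used here instead of the secant method mentioned above; validity requires ρ in [0, 1] and bivariate normality, and the names are illustrative):

```python
import numpy as np
from scipy.optimize import brentq

def raw_cov_from_gaussian(rho, psi):
    """C(h) = sum_{i>=1} psi_i^2 rho(h)^i  (eq. 6.3-1)."""
    psi2 = np.asarray(psi)[1:] ** 2
    i = np.arange(1, len(psi2) + 1)
    return np.sum(psi2 * np.asarray(rho, dtype=float)[..., None] ** i, axis=-1)

def gaussian_cov_from_raw(c, psi):
    """Invert eq. 6.3-1 lag by lag (c must lie between 0 and Var Z)."""
    return np.array([brentq(lambda rho: raw_cov_from_gaussian(rho, psi) - ci,
                            0.0, 1.0) for ci in np.atleast_1d(c)])
```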

This use of the relationship between gaussian and raw covariances is compulsory to achieve
disjunctive kriging on gaussian transformed values after a change of support. In that case the
gaussian covariance for the block support v is calculated from the analogous relationship:

C_v(h) = \sum_{i=1}^{\infty} \psi_i^2\, r^{2i} \rho_v^i(h)    (eq. 6.3-2)

where r is the change of support coefficient in the block anamorphosis.



In each case, the relationship has to be applied using a discretization of the space (namely a set of h values).

7. Non Linear Estimation


Background information about the following non linear estimation techniques is presented
hereafter:
1. Probability from Conditional Expectation,
2. Confidence Intervals.

For a general presentation of non linear geostatistics, the reader should refer to Rivoirard J.,
Introduction to Disjunctive Kriging and Non-linear Geostatistics (Oxford: Clarendon, 1994, 181p).

7.1 Probability from Conditional Expectation


We designate by Z the random variable, and wish to estimate the probability for this variable to
exceed a given threshold s.

We also consider Y, the gaussian transform of Z by the anamorphosis function Φ: Z = Φ(Y).


The reader should first have a look at the chapter about the Gaussian Anamorphosis for further
explanation.

Z can be expressed as follows:

Z(x) = \Phi\left(Y^{*}(x) + \sigma(x)\, W(x)\right)    (eq. 7.1-1)

where:

- Y* and σ respectively stand for the simple kriging of Y based on the available data Y_α and its
associated kriging standard deviation,

- W(x) is a normalized gaussian random function, spatially independent from Y*.

The probability for Z to exceed a given threshold s is directly derived from the preceding equation:

P[Z(x) \geq s] = P\left[\Phi\left(Y^{*}(x) + \sigma(x)\, W(x)\right) \geq s\right]
             = P\left[W(x) \geq \frac{\Phi^{-1}(s) - Y^{*}(x)}{\sigma(x)}\right]
             = 1 - G\left(\frac{\Phi^{-1}(s) - Y^{*}(x)}{\sigma(x)}\right)    (eq. 7.1-2)

where G is the c.d.f. of the gaussian distribution.

Note - At a conditioning point, the probability is equal to 0 or 1 depending upon whether Y is
smaller or larger than Φ^{-1}(s). Conversely, far from any conditioning data, the probability
converges towards the a priori probability 1 - G(Φ^{-1}(s)).
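A direct sketch of (eq. 7.1-2), reusing the anamorphosis helper from section 6.1.1 and assuming Φ is increasing on the interval searched:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def prob_above(s, y_sk, sigma_sk, psi):
    """P[Z(x) >= s] = 1 - G((Phi^-1(s) - Y*(x)) / sigma(x))  (eq. 7.1-2)."""
    y_s = brentq(lambda y: anamorphosis(np.array([y]), psi)[0] - s, -10.0, 10.0)
    return 1.0 - norm.cdf((y_s - np.asarray(y_sk)) / np.asarray(sigma_sk))
```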

7.2 Confidence Intervals


The idea is to derive the confidence interval from a block kriging using the discrete gaussian
model. In the gaussian space, any characteristic can easily be calculated once the mean and the
variance are known.

We start from the gaussian transforms Y_α, centered in the blocks to be estimated, and from the
previously modelled block gaussian variogram γ_V, with ρ_V the corresponding correlation.

The kriging system to estimate Y_V^K can be written as:

\lambda_\alpha + \sum_{\beta \neq \alpha} \lambda_\beta\, r^2 \rho_V(V_\alpha, V_\beta) = r\, \rho_V(V_\alpha, V) \quad \text{for all } \alpha    (eq. 7.2-1)

with:

Y_V^K = \sum_\alpha \lambda_\alpha Y_\alpha    (eq. 7.2-2)

and:

\sigma_K^2 = 1 - \sum_\alpha \lambda_\alpha\, r\, \rho_V(V_\alpha, V)    (eq. 7.2-3)

Knowing these two values, we can derive any confidence interval on the kriged gaussian values
from the gaussian density function. For instance, for a 95% confidence level, we have:

\Pr\left[Y_V^K - 2\sigma_K \leq Y_V \leq Y_V^K + 2\sigma_K\right] = 95\%    (eq. 7.2-4)

which is equivalent, for the raw values, by using the anamorphosis, to:

\Pr\left[\Phi_r(Y_V^K - 2\sigma_K) \leq Z_V \leq \Phi_r(Y_V^K + 2\sigma_K)\right] = 95\%    (eq. 7.2-5)

This gives the bounds of the confidence interval:

Z_{min} = \Phi_r(Y_V^K - 2\sigma_K)    (eq. 7.2-6)

Z_{max} = \Phi_r(Y_V^K + 2\sigma_K)    (eq. 7.2-7)
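A sketch of the bounds (eq. 7.2-6) and (eq. 7.2-7), given the kriged gaussian values, the kriging standard deviation and the block anamorphosis coefficients, reusing the helpers from section 6.1.1:

```python
import numpy as np

def confidence_bounds(y_vk, sigma_k, psi, r):
    """95% bounds on the raw block values: Z = Phi_r(Y_V^K +/- 2 sigma_K)."""
    psi_r = np.asarray(psi) * r ** np.arange(len(psi))   # block anamorphosis
    y_vk = np.asarray(y_vk, dtype=float)
    return (anamorphosis(y_vk - 2.0 * sigma_k, psi_r),   # Z_min (eq. 7.2-6)
            anamorphosis(y_vk + 2.0 * sigma_k, psi_r))   # Z_max (eq. 7.2-7)
```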



8. Kriging With Bayesian Drift

The principle of kriging with bayesian drift is to replace the drift coefficients of universal
kriging by random Gaussian variables. The kriging dichotomy is now expressed as:

Z(x) = \sum_l \beta_l f^l(x) + R(x)    (eq. 8.0-1)

where \sum_l \beta_l f^l(x) is the drift, the \beta_l are a set of random variables with the first
two moments known a priori, and R(x) is the residual. The prior moments of the \beta_l are:
(eq. 8.0-2)

The unbiasedness condition, which aims at filtering out the drift, leads to the following additional equations:

(eq. 8.0-3)

The random function Z and the set of random variables β are related by:

(eq. 8.0-4)

The spatial covariances of the stationary residuals can also be expressed as:

(eq. 8.0-5)

Using the optimality condition and minimizing the prediction variance, we get the following
Bayesian kriging system:

(eq. 8.0-6)

In matrix notation, this can be written as:

(eq. 8.0-7)

The final prediction system is:

(eq. 8.0-8)

The priors are used in bayesian kriging and in the bayesian simulations. This section presents the
prior initialization options. The bayesian technique offers the possibility of adding some prior
information on the coefficients of the basic drift functions (monomials or external drift
functions). Let us call N their number; P denotes the number of samples.

The principle of the priors is to consider them as a set of Gaussian random variables, which must
therefore be defined by specifying their individual means and variances, as well as their pairwise
correlation coefficients.

The mean values are simply obtained by solving the regression of the data on the set of basic drift
functions. This requires solving an N x N system, which always has a valid solution unless the
basic drift functions are linearly dependent.

For the variances and correlations, the work is slightly more difficult. The principle is to obtain
them through a leave-one-out algorithm: we remove one point at a time from the data set and apply
the regression (as above) to the remaining P - 1 samples. This leads to an estimate of the N
coefficients of the basic drift functions.

When the P trials have been performed, we have P sets of N coefficients, and it is then easy to
calculate the variances and correlations between these series. A sketch follows.
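A sketch of this prior initialization, assuming F is the P x N matrix of the basic drift functions evaluated at the samples and z the data vector (all names are hypothetical):

```python
import numpy as np

def bayesian_priors(F, z):
    """Prior means by regression of the data on the drift functions; prior
    covariance (variances and correlations) from leave-one-out regressions."""
    P, N = F.shape
    mean = np.linalg.lstsq(F, z, rcond=None)[0]           # global regression
    coeffs = np.empty((P, N))
    for k in range(P):                                    # leave one point out
        mask = np.arange(P) != k
        coeffs[k] = np.linalg.lstsq(F[mask], z[mask], rcond=None)[0]
    return mean, np.cov(coeffs, rowvar=False)             # N x N prior covariance
```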
9. Advanced Estimation Methods and Simulations

9.1 Turning Bands Simulations


This page constitutes an add-on to the User’s Guide for Simulations.

For the theoretical background, the user should refer to Matheron G., The intrinsic random
functions and their application (In Adv. App. Prob. Vol.5, pp. 439-468, 1973).

9.1.1 Principle
The Turning Bands method is a stereological device designed to reduce a multidimensional
simulation to unidimensional ones: if C_3 stands for the (polar) covariance to be produced in R^3,
it is sufficient to simulate a stationary unidimensional random function X with covariance:

C_1(r) = \frac{\partial}{\partial r}\left[ r\, C_3(r) \right]    (eq. 9.1-1)

X is then spread throughout the space:

Y(x) = X(\langle \theta, x \rangle)    (eq. 9.1-2)

where θ is a unit vector with a uniform direction.

9.1.2 Non Conditional Simulation


A random function is said to be multigaussian if any linear combination of its variables follows a
gaussian distribution. In the stationary case, a Multigaussian Random Function has its spatial
distribution totally characterized by its mean value and its covariance.

The easiest way to build a Multigaussian Random Function is to use a parallel procedure. Let Y_1,
..., Y_n stand for a sequence of standard, independent and identically distributed random functions
with covariance C. The spatial distribution of the random function:

Y^{(n)} = \frac{Y_1 + \cdots + Y_n}{\sqrt{n}}    (eq. 9.1-3)

tends to become multigaussian with covariance C as n becomes very large, according to the
Central Limit Theorem.

Several algorithms are available to simulate the elementary random functions Yi with a given
covariance C. The user will find much more information in Lantuéjoul C., Geostatistical
Simulation (Springer Berlin, 2002. 256p).

The choice of the method to generate the random function X is theoretically free. However, in
Isatis a particular method is preferred for each specific covariance model, in order to optimize
the generation; the selection of the method is automatic.

- Spectral Method

The Spectral Method generates a distribution whose covariance is expressed as the Fourier
transform of a positive measure. This method is rather general and is used in Isatis when the
covariance is regular at the origin. This is the case for the Gaussian, Cardinal Sine, J-Bessel
or Cauchy covariance models.

Any covariance is a positive definite function, which can be written as the Fourier transform of a
positive spectral measure:

C(h) = \int e^{i \langle \omega, h \rangle}\, d\chi(\omega)    (eq. 9.1-4)

where χ, after normalization, is a probability distribution.

The random function is obtained as:

Y(x) = \sqrt{2} \cos\left(\langle \Omega, x \rangle + \varphi\right)    (eq. 9.1-5)

where:
  - Ω is a random vector with distribution χ,
  - φ is a uniform variable between 0 and 2π.

(A sketch of this method is given after this list.)
- Dilution Method

The Dilution Method generates a numerical function F and partitions the line into intervals of
constant length; each interval is randomly valuated with F or -F. This method is suitable to
simulate covariances with bounded ranges. In Isatis, it is used to generate the Spherical or Cubic
covariance models.

When the covariance corresponds to a geometrical covariogram, i.e.:

C(h) = \int g(u)\, g(u + h)\, du    (eq. 9.1-6)

the random function is obtained as the dilution of primary functions:

Y(x) = \sum_{p \in P} \varepsilon_p\, g(x - p)    (eq. 9.1-7)

where:
  - P is a Poisson process of intensity θ,
  - ε is a family of standard random variables,
  - g is a numerical function.

- Migration Method

The Migration Method generates a Poisson process that partitions the line into independent
exponential intervals, which are valuated according to the covariance model to be simulated. In
Isatis it is used for:
  - the Exponential model: each interval is split into two halves which are alternately valuated
with +1 and -1;
  - the Stable and Gamma models: the intervals are valuated according to an exponential law;
  - the generalized covariance models: the intervals are valuated with the sum of gaussian
processes.

The multidimensional simulation is then obtained by summing the projections of the unidimensional
simulations on a given number of lines. Each line is called a "turning band", and the problem of
the optimal number of turning bands remains, although Ch. Lantuéjoul provides some hints in
Lantuéjoul C., Non Conditional Simulation of Stationary Isotropic Multigaussian Random
Functions (In M. Armstrong & P.A. Dowd eds., Geostatistical Simulations, Kluwer Dordrecht,
1994, pp.147-167).
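As an illustration of the spectral method combined with the Central Limit Theorem averaging (eq. 9.1-3), here is a sketch for a gaussian covariance C(h) = exp(-|h|^2/a^2) in 3-D; it is not the Isatis implementation, and the parameter values are illustrative:

```python
import numpy as np

def spectral_gaussian_simulation(x, scale, n_waves=1000, seed=0):
    """Non-conditional simulation at 3-D points x (n x 3 array) for the
    covariance exp(-|h|^2 / scale^2), by averaging random cosine waves
    (eq. 9.1-5); each wave direction plays the role of a turning band."""
    rng = np.random.default_rng(seed)
    sim = np.zeros(len(x))
    for _ in range(n_waves):
        # the spectral measure of exp(-|h|^2/a^2) is gaussian with std sqrt(2)/a
        omega = rng.normal(scale=np.sqrt(2.0) / scale, size=3)
        phi = rng.uniform(0.0, 2.0 * np.pi)
        sim += np.sqrt(2.0) * np.cos(x @ omega + phi)
    return sim / np.sqrt(n_waves)          # CLT normalization (eq. 9.1-3)
```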

9.1.2.1 Conditioning

If we consider the kriging estimate of Z(x) from the values of the variable at the data points
z(x_α), we can write, at each point, the following decomposition:

Z(x) = Z^K(x) + [Z(x) - Z^K(x)]    (eq. 9.1-1)

In the Gaussian framework, the residual [Z(x) - Z^K(x)] is not correlated with any data value. It
is therefore independent from any linear combination of these data values, such as the kriging
estimate. Finally, the estimate and the residual are two independent random functions, not
necessarily stationary: for example, at a data point the residual is zero.

If we consider a non-conditional simulation Z_S(x) of the same random function, known over the
whole domain of interest, and its kriging estimate based on the values of this simulation at the
data points, we can write similarly:

Z_S(x) = Z_S^K(x) + [Z_S(x) - Z_S^K(x)]    (eq. 9.1-2)

where estimate and residual are independent, with the same structure.

By combining the simulated residual with the initial kriging estimate, we obtain:

Z_{SC}(x) = Z^K(x) + [Z_S(x) - Z_S^K(x)]    (eq. 9.1-3)

which is another random function, conditional this time, as it honors the data values at the data
points.

Note - This conditioning method is not concerned with how the non-conditional simulation Z_S(x)
has been obtained.
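Given the three ingredients above, computed on the same grid, the conditioning step itself reduces to one line (a sketch):

```python
def condition_simulation(z_k, z_s, z_s_k):
    """eq. 9.1-3: kriging of the data plus the simulated residual.
    z_k:   kriging estimate from the real data values,
    z_s:   non-conditional simulation on the grid,
    z_s_k: kriging estimate from the simulated values at the data points."""
    return z_k + (z_s - z_s_k)
```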

As non-correlation is equivalent to independence in the gaussian context, a simulation of a
gaussian random function with nested structures can be obtained by adding independent simulations
of the elementary structures.

For the same reason, linearly combining independent gaussian random functions with elementary
structures gives, under a linear model of coregionalization, a multivariate simulation of different
variables.

9.2 Spill Point Calculation


9.2.1 Introduction
In Oil & Gas applications, the spill point calculation enables the user to delineate a potential
reservoir, knowing that some control points are inside or outside the reservoir.

To illustrate this feature, we consider a map (which may be one of the outcomes of a simulation
process) where the variable is the topography of the top of a reservoir. The depth is counted
positively downwards: the top of the structure corresponds to the lowest value in the field.

Moreover, we assume that we have a collection of control points whose locations are known and
which belong to one of the following two categories:

- the control point belongs to the reservoir: inside,

- the control point does not belong to the reservoir: outside.

Note - All the points located outside the frame where the image is defined are considered as
outside.

9.2.2 Basic Principle


The principle is to find the elevation of the deepest horizontal plane which splits the field into
inside and outside sub-areas while remaining compatible with the control point information (the
Spill). We also look for the crucial point where, if the Spill were slightly deepened, the violation
of the constraints would first take place (the Spill Point). The following figure illustrates these
definitions:

(fig. 9.2-1)

The Spill Point corresponds to the location of the saddle below volumes A and B. As a matter of
fact, if we considered a deeper spill, these two volumes would connect, and the constraints induced
by the control points would not be fulfilled any more, as the same location cannot be simultaneously
inside and outside the reservoir.

The volume A is considered as outside whereas B is inside. An interesting feature comes from the
volumes C1 and C2:

- they are first connected (as the elevation of the separating saddle is located above the spill
point) and therefore constitute a single volume C,

- the contents of this volume C are unknown.

Hence, after the spill point elevation has been calculated, each point in the frame corresponds to
one of the following four statuses:

- below the spill point,

- above the spill point and inside the reservoir,

- above the spill point and outside the reservoir,

- above the spill point in an unknown volume.

A minimal sketch of this flooding principle is given below.
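The sketch floods the cells by increasing depth and merges them with a union-find; the spill is reached when an inside component is about to connect with an outside one. It ignores the maximum thickness constraint, the unknown-volume bookkeeping and the outside-the-frame convention of the full algorithm, and all names are illustrative:

```python
import numpy as np

def spill_point(depth, inside, outside):
    """depth: 2-D array (positive downwards); inside/outside: lists of (i, j)
    control cells. Returns (spill elevation, spill point) or (None, None)."""
    parent, label = {}, {}
    marks = {tuple(c): 'in' for c in inside}
    marks.update({tuple(c): 'out' for c in outside})

    def find(c):                                  # union-find with compression
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    ni, nj = depth.shape
    cells = sorted(((depth[i, j], (i, j)) for i in range(ni) for j in range(nj)))
    for d, c in cells:                            # flood by increasing depth
        parent[c], label[c] = c, marks.get(c)
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (c[0] + di, c[1] + dj)
            if nb in parent:                      # neighbor already flooded
                ra, rb = find(c), find(nb)
                if ra == rb:
                    continue
                if {label[ra], label[rb]} == {'in', 'out'}:
                    return d, c                   # deepest compatible plane
                parent[ra] = rb                   # merge; keep any known label
                label[rb] = label[rb] or label[ra]
    return None, None                             # constraints never conflict
```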

9.2.3 Maximum Reservoir Thickness Constraint


This constraint corresponds to an actual limitation that must be taken into account in the oil
industry. Due to the quality of the rock and the depth of the reservoir, the pressure of the
captured fluid implies that the thickness of the reservoir cannot exceed a maximum value. This
constraint is referred to as the maximum reservoir thickness. Let us add this constraint to the
previous figure:

(fig. 9.2-2)

The new spill point is shifted upwards, as otherwise the maximum reservoir thickness constraint
would be violated. Note that the Spill elevation is clearly defined, whereas the location of the
Spill Point is rather arbitrary this time: it is the last point that may be included in the
reservoir; if the next one (sorted by increasing depth) were included, the thickness of the
reservoir would exceed the maximum admissible value.

9.2.4 The "Forbidden types" of control points


Up to now, all the control points have been used directly to derive the Spill characteristics.
There exists a second type of control point: the forbidden ones. This information is not used for
the calculation of the Spill characteristics; it is simply double-checked a posteriori.

A forbidden outside point is a location which must result as either inside or unknown. Conversely,
a forbidden inside point is a location which must result as either outside, below or unknown.

If, in a map (usually a simulation outcome), one of these constraints is not fulfilled, the whole
map is considered as not acceptable and is discarded from the final statistics.

9.2.5 Limits of the algorithm


It is essential to understand the behavior of the algorithm in the following simplified scenarios.

Let us consider the case of a synclinal structure where the top is constrained to belong to the
reservoir, whereas a point located on its flank is supposed to be outside. The Spill (considered as
the deepest horizontal plane where both constraints are fulfilled) is obviously located at the
elevation of the outside control point.

(fig. 9.2-3)

Let us now consider the opposite case, where the outside control point is at the top of the
structure and the inside control point is on the flank. In principle the situation should be
symmetric, with the same result for the elevation of the Spill. But consider now the volume of the
reservoir: the volume controlled by the inside control point and located above the spill is reduced
to zero. That is the reason why such a map is considered as not acceptable.

(fig. 9.2-4)

9.2.6 Converting Unknown volumes into Inside ones


As explained previously, the spill point calculation may delineate volumes located above the Spill
for which the algorithm cannot decide whether they belong to the reservoir or not: these volumes
are called unknown.

If these volumes are discarded from the global statistics, the results are biased: the unknown
volumes are always considered as outside. Another possibility is to convert them all, empirically,
to inside, in order to get another biased estimate (by excess this time).

In addition to the bias, this latter operation can lead to contradictions if the maximum reservoir
thickness criterion has been taken into account, as explained in the next figure:

(fig. 9.2-5)

The volume A is considered outside and B inside the reservoir. The volumes C and D are initially
unknown. If we convert them into inside, the maximum reservoir thickness constraint is not
fulfilled any more for the volume C. One could imagine moving the spill upwards until the
constraint is satisfied, but then the Spill Point should also be moved to a new location. Instead,
we have considered that such a map should rather be considered as not acceptable.
