
Practical Mining

GEOSTATISTICS

Short Course Notes Prepared for


MineSight®/MEDSYSTEM® Users

ABDULLAH ARIK
Mintec, inc.
Tucson, AZ
USA
1997 All Rights Reserved

Revised: November 1999


Table of Contents

Page

SECTION 1 Introduction 1

SECTION 2 Basic Statistics 3

2.1 Definitions 3
2.2 Descriptive Measures 5
Measures of Locations 5
Measures of Spread 7
Measures of Shape 8
2.3 Theoretical Model of Distributions 9
Normal Distribution 9
Lognormal Distribution 13
2.4 Random Variables 15
2.5 Central Limit Theorem 17
2.6 Bivariate Distribution 18
2.7 Desirable Properties of Estimators 19

SECTION 3 Data Analysis and Display 21

3.1 Data Preparation and Verification 21


3.2 Graphical Display and Summary 22
Univariate Description 22
Bivariate Description 28
Spatial Description 31

SECTION 4 Analysis of Spatial Continuity 37

4.1 Variogram 37
4.2 Theoretical Variogram Models 46
4.3 Anisotropy 50
4.4 Variogram Types 55
4.5 Fitting a Theoretical Variogram Model 60
4.6 Cross Validation 61


SECTION 5 Random Processes and Variances 65

5.1 Theory of Regionalized Variables 65


5.2 Random Function Models 69
5.3 Block Variance 73

SECTION 6 Declustering 75

6.1 Polygonal Declustering 75


6.2 Cell Declustering 77
6.3 Declustered Global Mean 80

SECTION 7 Ordinary Kriging 81

7.1 Kriging Estimator 81


7.2 Kriging System 82
7.3 Properties of Kriging 84
7.4 Effect of Variogram Model Parameters 86
7.5 Effect of Search Strategy 88
7.6 Relevance of Stationarity Models 89
7.7 Simple Kriging 90
7.8 Cokriging 90
7.9 Non-stationary Geostatistics 93
7.10 Non-linear Kriging Methods 94

SECTION 8 Multiple Indicator Kriging 97

8.1 The Indicator Function 97


8.2 The φ(A;zc) Function 99
8.3 Local Recovery Functions 101
8.4 Estimation of φ(A;zc) 102
8.5 Indicator Variography 102

SECTION 8 Multiple Indicator Kriging (cont'd)

8.6 Order Relations 104


8.7 Construction of Grade-tonnage Relationship 106
8.8 Performing Affine Correction 107
8.9 Advantages and Disadvantages of M.I.K. 113

SECTION 9 Change of Support

9.1 The Support Effect 115


9.2 Smoothing and Its Impact on Reserves 117
9.3 Volume-Variance Relationship 119
9.4 How to Deal with Smoothing 119
9.5 How Much Smoothing Is Reasonable? 120
9.6 Global Correction for the Support Effect 120

SECTION 10 Conditional Simulation 123

10.1 The Objective of Simulation 123


10.2 What is Conditional Simulation? 124
10.3 Simulation or Estimation 126
10.4 Simulation Algorithms 128
10.5 Making a Simulation Conditional 132
10.6 Conditional Simulation Functions 134
10.7 Typical Uses of Simulated Deposits 134

SECTION 11 References 137

Tables

Table 2.3.1 Cumulative Normal Distribution Function

Table 3.2.1 Cumulative Frequency Statistics of Sample Data

Table 7.8.1 Cokriging System of Equations for Two Variables; Primary Variable is Drillhole
Data, Secondary Variable is Blasthole Data

Table 7.9.1 Universal Kriging System of Equations in the Case of a Linear Drift

Figures

Figure 2.3.1 Normal Distribution Curve

Figure 2.3.2 Lognormal Distribution Curve Examples

Figure 3.2.1 Frequency Table and Histogram of Sample Data

Figure 3.2.2 Histogram Plot of Sample Data

Figure 3.2.3 Lognormal Probability Plot of Sample Data.

Figure 3.2.4 Scatter Plot of Two Variables.

Figure 3.2.5 Sample Data Location Map

Figure 3.2.6 Sample Data Contour Map

Figure 3.2.7 Sample Plot of Proportional Effect

Figure 4.1.1 Sample Variogram

Figure 4.1.2 Sample Data For Variogram Computation

Figure 4.1.3 Variogram Direction, Window and Band Width Definitions

Figure 4.2.1 Theoretical Variogram Models

Figure 4.2.2 Variogram Plot and Theoretical Model Fit

Figure 4.3.1 Variogram Contours

Figure 4.3.2 Geometrical and Zonal Anisotropy

Figure 4.3.3 Nested Structures

Figure 4.6.1 Sample Output of Cross Validation


Figure 5.2.1 Sample Points on a Profile to be Estimated

Figure 5.2.2 A Deterministic Model Curve

Figure 5.3.1 Relation of Variance to Sample Support

Figure 6.1.1 Sample Map of Polygons of Influence

Figure 6.2.1 An Example of Cell Declustering

Figure 6.2.2 Sample Output From Cell Declustering Program

Figure 8.1.1 Indicator Function at Point x

Figure 8.2.1 Proportion of Values z(x) ≤ zc within Area A

Figure 8.5.1 Indicator Variograms at Different Cutoff Grades

Figure 8.6.1 Solving the Order Relation Problems

Figure 8.8.1 Illustration of Affine Reduction of Variance

Figure 8.8.2 Affine Correction for Ore Grade and Tonnage Estimation

Figure 9.1.1 Histograms of Data Values from 1x1, 2x2, 5x5 and 20x20 Block Sizes

Figure 9.2.1 The Effect of Smoothing on the Grade-Tonnage Curves

Figure 10.2.1 Schematic Relationship between the Actual Sample Values in a Deposit and a
Conditional Simulation for that Deposit

Figure 10.3.1 Illustration of Conditional Simulation

Figure 10.5.1 Summary of the Algorithm Generating Conditionally Simulated Grade as the Sum
of Three Random Variables, All Derived from the Original Set of Samples

Figure 10.7.1 Use of Conditional Simulation to Forecast Departures from Planning in the Mining
of a Deposit

1
Introduction

Geostatistics refers to the study of the spatial distribution of values that are useful to geologists and
mining engineers, such as the grade or thickness of an orebody, although its application is by no means
limited to problems in geology and mining. Historically, geostatistics can be considered as old as
mining itself. As soon as miners started to pick and analyze samples, and to compute average
grades weighted by the corresponding thickness or area of influence, one may consider that geostatistics
was born. However, it was not until the publication of the Theory of Regionalized Variables by G.
Matheron of France in the early 1960s that the term geostatistics became popular.

The classical statistical methods are based on the assumption that the sample values are
realizations of a random variable. The samples are considered independent. Their relative positions
are ignored, and it is assumed that all sample values have an equal probability of being selected.
Thus, one does not make use of the spatial correlation of samples although this information is very
relevant and essential in certain data sets such as the ones obtained from an ore deposit.

In contrast, geostatistics considers the sample values to be realizations of random functions.
Under this hypothesis, the value of a sample is a function of its position in the mineralization of the
deposit, and the relative positions of the samples are taken into consideration. Geostatistics
offers many tools for describing the spatial continuity that is an essential feature of many natural
phenomena, and it provides adaptations of classical statistical techniques to take advantage of this
continuity.

The objective of this book is to familiarize the reader with the basic concepts of statistics
and with the geostatistical tools available to solve problems in the geology and mining of an ore deposit.
The emphasis of the book is on the use of these tools through MEDSYSTEM®/MineSight® and on their
practical application to reserve estimation. The majority of the material in this book is introductory
and free of mathematical formalism. It is based on a small, but real, data set. The solutions
proposed in this book are therefore specific to the particular data set used, and should not be taken
as general recipes. It is hoped that this book will give the reader a practical grasp of the various
statistical and geostatistical tools, and help prepare them to tackle the problems at hand.

Organization of Sections

This book is organized to take the reader from elementary statistics to more advanced
geostatistical topics in a natural progression. Following this introductory section, Section 2 reviews
basic statistical concepts and definitions. This section also covers theoretical models of
distributions.

Organization and presentation of geostatistical results are vital steps in communicating the
essential features of a large data set. Therefore, Section 3 covers univariate and bivariate
descriptions.

The first section that gets into spatial statistics is Section 4. In this section, the reader will
look at various ways of describing the spatial features of a data set, including variogram
analysis.

Section 5 introduces random processes and variances. It covers the theory of regionalized
variables, random function models, and the necessity of modeling. The question of why probabilistic
models are necessary to describe earth science processes is discussed in this section.

Section 6 reviews different declustering methods which deal with estimating an unbiased
average value over a large area.

Section 7 gets into the most frequently used geostatistical estimation method, ordinary
kriging. The theory and assumptions of the algorithm, and the effects of the variogram parameters on
the kriged estimates, are discussed in this section.

One of the most popular non-linear geostatistical estimation methods is multiple indicator
kriging. Section 8 discusses both the theory and the adaptation of this method to handle the
estimation problems with highly skewed data. This section also discusses the topics of recoverable
reserves and volume-variance relationships.

Section 9 covers the change of support: the support effect, smoothing and its impact on reserves,
and the volume-variance relationship. Section 10 provides a review of conditional simulation, discusses
the merits and limitations of both estimation and simulation, and describes the available simulation
algorithms and their applications.

2
Basic Statistics

This chapter is designed to provide a review of basic statistics for those readers who have
little or no background in statistics.

2.1 DEFINITIONS

Statistics

Statistics is the body of principles and methods for dealing with numerical data. It
encompasses all operations from collection and analysis of the data to the interpretation of the
results.

Geostatistics

Throughout this workbook, geostatistics will refer only to the statistical methods and tools
used in ore reserve analysis.

Universe

The universe is the source of all possible data. For our purposes, an ore deposit can be
defined as the universe. But often problems arise when a universe does not have well defined
boundaries. In this case the universe is not clearly located in space until other concepts are defined.

Sampling Unit

The sampling unit is the part of the universe on which a measurement is made. This can be
a core sample, a channel sample, a grab sample, etc. When one makes statements about characteristics
of a universe, he or she must specify what the sampling unit is.

Support

Support refers to the characteristics of the sampling unit: the size, shape, and orientation of the
sample. For example, channel samples gathered from anywhere in a drift will
not have the same support as channel samples cut across the ore vein in the same drift.

Population

The word population is synonymous with universe in that it refers to the total category under
consideration. However, it is possible to have different populations within the same universe, based
on the support of the samples: for example, the population of blasthole grades versus the population of
exploration drillhole grades. Therefore, the sampling unit and its support must be specified in reference
to any population.

Random Variable

A random variable is a variable whose values are randomly generated according to a
probabilistic mechanism. For example, the outcome of a coin toss, or the grade of a core sample in
a diamond drillhole, are random variables.

FREQUENCY DISTRIBUTION

A frequency distribution shows how the units of a population are distributed over the range
of their possible values. There are two types of frequency distribution: the probability density function
(pdf) and the cumulative distribution function (cdf).

a) Probability Density Function

The possible outcome of a random selection of one sample is described by the probability
distribution of its grade. This distribution, usually referred to as the probability density function or
pdf, may or may not be known. For example, the possible outcomes from throwing a die are 1, 2, 3,
4, 5, or 6. Each outcome has an equal probability of 1/6. On the other hand, in a mineral deposit, the
probability distribution of the grade will never be known. In that case, an experimental probability
distribution is computed to infer which theoretical distribution may have produced such sample
values.

The probability distributions can be either discrete or continuous functions. In the case of
discrete functions, the distributions assign a probability f(x) to each event x. For example, the
distribution for a toss of a coin will assign a probability of 0.5 that the coin lands heads up, and a
probability of 0.5 that it lands tails up. The summation of all the possible f(x), in this case 0.5 + 0.5,
is equal to one.

Thus, the following must hold true if f(x) is discrete:

1. f(xi) ≥ 0 for xi ∈ R (R is the domain)
2. Σ f(xi) = 1

In the case of a continuous distribution, a density of probability f(x) will be assigned to each
x so that the probability of one value falling between x and x+dx will be f(x) dx, where dx is
infinitesimal. Thus, the following must hold true if f(x) is continuous:

1. f(x) ≥ 0
2. ∫ f(x) dx = 1

b) Cumulative Distribution Function

The cumulative probability distribution F(x), usually referred to as the cumulative distribution
function or cdf, describes the proportion of the population below a certain value. If X is a random
variable, then the cumulative distribution function F(x) is:

F(x) = P(X ≤ x)          (2.1.1)

The following holds true for F(x):

1. 0 ≤ F(x) ≤ 1 for all x
2. F(x) is nondecreasing
3. F(-∞) = 0, and F(+∞) = 1

2.2 DESCRIPTIVE MEASURES

We use several statistics to mathematically describe a distribution. These statistics fall into
three categories: measures of location, measures of spread, and measures of shape.

MEASURES OF LOCATION

Mean

The mean, m, is the most important measure of central tendency. It is the arithmetic average
of the data values, and is calculated by the formula:

m = (1/n) Σ xi     i = 1,...,n          (2.2.1)

where n is the number of data, and x1,...,xn are the data values. The summation sign, Σ, is used as a
shorthand notational substitute for the instruction "take the sum" or "add."

Median

The median, M, is the midpoint of the observed values if they are arranged in increasing (or
decreasing) order. Therefore, half of the values of the distribution are below the median and half of
the values are above the median. The median can be calculated easily once the data is ordered so that
x1 ≤ x2 ≤ ... ≤ xn. The calculation is slightly different depending on whether the number of data
values, n, is odd or even:

M = x(n+1)/2                       if n is odd
M = [x(n/2) + x(n/2+1)] / 2        if n is even          (2.2.2)

The median can easily be read from a probability plot as we will see in the next chapter.
Since the y-axis records the cumulative frequency, the median is the value on the x-axis that
corresponds to 50% on the y-axis.
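
As a small illustration of Equations 2.2.1 and 2.2.2 (this sketch is not part of the original notes, and the grade list is made up):

```python
# Minimal sketch of Equations 2.2.1 and 2.2.2; the sample values are invented.
def mean(values):
    """Arithmetic average, m = (1/n) * sum(xi)."""
    return sum(values) / len(values)

def median(values):
    """Midpoint of the ordered values; handles odd and even n."""
    x = sorted(values)
    n = len(x)
    if n % 2 == 1:
        return x[n // 2]                      # x[(n+1)/2] in 1-based notation
    return (x[n // 2 - 1] + x[n // 2]) / 2.0  # average of the two middle values

grades = [0.32, 0.45, 0.51, 0.60, 0.74, 0.95, 1.20, 2.10]
print(mean(grades), median(grades))
```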

Mode

The mode is the value that occurs most frequently. The mode is easily located on a graph of
a frequency distribution. It is at the peak of the curve, the point of maximum frequency. On a
histogram, the class with the tallest bar can give a quick idea where the mode is. However, when the
histogram is especially irregular, or when there are two or more classes with equal frequencies, the
location of the mode becomes more difficult.

One of the drawbacks of the mode is that it may change with the precision of the data values.
For this reason, the mode is not particularly useful for data sets in which the measurements have
several significant digits. In such cases, some approximate value is chosen by finding the tallest bar
on a histogram.

Minimum

The minimum is the smallest value in the data set. In many practical situations, the smallest
values are recorded simply as being below some detection limit. In such situations, it does not make
much difference whether the minimum value assigned is 0 or some arbitrary small value as long as
it is done consistently. However, it is extremely important, especially in drillhole sampling, not to
assign zero values to the missing data. They should be separated from the actual data by either

assigning negative values, or by alphanumeric indicators. Once it is decided how to handle the
missing assays, then they can be used accordingly in the compositing stage.

Maximum

The maximum is the largest value in the data set. It is especially important in ore reserve
analysis to double check the maximum value as well as any suspiciously high values, for accuracy.
This should be done to make sure that these values are real, not typographical errors.

Quartiles

The quartiles split the data into quarters in the same way the median splits the data into
halves. Quartiles are usually denoted by the letter Q. For example, Q1 is the lower or first quartile,
Q3 is the upper or third quartile, etc.

As with the median, quartiles can be read from a probability plot. The value on the x-axis,
which corresponds to 25% on the y-axis, is the lower quartile and the value that corresponds to 75%
is the upper quartile.

Deciles, Percentiles, and Quantiles

The idea of splitting the data into halves or into quarters can be extended to any fraction.
Deciles split the data into tenths. One tenth of the data fall below the first or lowest decile. The fifth
decile corresponds to the median. In a similar way percentiles split the data into hundredths. The
25th percentile is the same as the first quartile, the 50th percentile is the same as the median, and the
75th percentile is the same as the third quartile.

Quantiles, on the other hand, can split the data into any fraction. They are usually denoted
by q, such as q.25 and q.75, which correspond to the lower and upper quartiles, Q1 and Q3, respectively.

MEASURES OF SPREAD

Variance

The sample variance, s2, describes the variability of the data values. It is the average squared
difference of the data values from their mean. Mathematically it is the second moment about the
mean. It is calculated by the following formula:

s² = (1/n) Σ (xi - m)²     i = 1,...,n          (2.2.3)

The following formula can also be used and is more suitable for programming:

s² = (1/n) { Σ xi² - [ Σ xi ]² / n }     i = 1,...,n          (2.2.4)

Since the variance involves the squared differences, it is sensitive to outlier high values.
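
As an illustration (not part of the original notes; the function names and grade list are made up), the two variance formulas give the same result, and the second one accumulates only the sum and the sum of squares, which is why it is convenient for programming:

```python
import math

def variance_two_pass(values):
    """Equation 2.2.3: average squared difference from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def variance_one_pass(values):
    """Equation 2.2.4: uses only sum(x) and sum(x^2)."""
    n = len(values)
    s1 = sum(values)
    s2 = sum(x * x for x in values)
    return (s2 - s1 * s1 / n) / n

grades = [0.32, 0.45, 0.51, 0.60, 0.74, 0.95, 1.20, 2.10]
s2 = variance_one_pass(grades)
print(s2, variance_two_pass(grades), math.sqrt(s2))  # variance (both ways) and standard deviation
```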

Standard Deviation

The standard deviation, s, is simply the square root of the variance (√s²). It is often used
instead of the variance since its units are the same as the units of the variable described.

Interquartile Range

The interquartile range, IQR, is the difference between the upper and the lower quartiles and
is given by:

IQR = Q3 - Q1 (2.2.5)

Unlike the variance and the standard deviation, the interquartile range does not use the mean
as the center of the distribution. Therefore, it is sometimes preferred if a few erratically high values
strongly influence the mean.

MEASURES OF SHAPE

Skewness

The skewness is often calculated to determine whether a distribution is symmetric. The direction
of skewness is the direction of the longer tail of the distribution: if the distribution tails
to the left, it is called negatively skewed; if it tails to the right, it is called positively skewed. The
skewness is calculated using the following formula:

Skewness = [(1/n) Σ (xi - m)³] / s³     i = 1,...,n          (2.2.6)

Skewness is the third moment about the mean, therefore it is even more sensitive to erratic
high values than the mean and the variance. A single large value can heavily influence the skewness
since the difference between each data value and the mean is cubed.

Coefficient of Variation

The coefficient of variation, CV, is simply the standard deviation (s) divided by the mean
(m):

CV = s / m (2.2.7)

The coefficient of variation does not have a unit. Therefore, it can be used to compare the
relative dispersion of values around the mean, among different distributions.

If estimation is the final goal of a study, the coefficient of variation can provide some
warning of upcoming problems. A coefficient of variation greater than one indicates the presence
of some erratic high sample values that may have a significant impact on the final estimates.
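
As a short sketch (not part of the original notes, with a made-up grade list), Equations 2.2.6 and 2.2.7 can be computed as follows:

```python
import math

def skewness_and_cv(values):
    """Skewness (Equation 2.2.6) and coefficient of variation (Equation 2.2.7)."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / n)
    skew = sum((x - m) ** 3 for x in values) / n / s ** 3
    return skew, s / m

grades = [0.32, 0.45, 0.51, 0.60, 0.74, 0.95, 1.20, 2.10]
print(skewness_and_cv(grades))   # positive skewness and CV for this right-tailed list
```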

2.3 THEORETICAL MODEL OF DISTRIBUTIONS

The normal and lognormal distributions are commonly used to represent the frequency
distributions of sample data values.

NORMAL DISTRIBUTION

The normal distribution is the most common theoretical probability distribution used in
statistics. It is also referred to as the Gaussian distribution. The normal distribution curve is bell-shaped.
Its equation is a function of two parameters, the mean (m) and the standard deviation (s), as follows:

f(x) = 1 / (s√(2π)) exp[-½ ((x - m) / s)²]          (2.3.1)

In a population defined by a normal distribution, 68% of the values fall within one standard
deviation of the mean, and 95% of the values fall within two standard deviations of the mean. Figure
2.3.1 shows an example of the normal distribution curve.

Standard Normal Distribution

A normal distribution with mean 0 and standard deviation 1 is called the standard normal
distribution. Any normal variable can be converted to a standard normal deviate, z, or standardized
by subtracting its arithmetic mean and dividing by its standard deviation as follows:

z = (x - m) / s (2.3.2)

Figure 2.3.1 Normal Distribution Curve: (a) pdf, (b) cdf

The cumulative distribution function, denoted F(x), is not easily computed for the normal
distribution. Therefore, extensive tables have been prepared to simplify calculations with the normal
distribution. Most statistics texts include tables for the standard normal distribution with mean 0 and
variance 1. Table 2.3.1 is an example of such a table; it gives the cumulative normal distribution
function. An example of the use of this table is given below:

Example of using the cumulative normal distribution table:

Find the proportion of sample values above 0.5 cutoff in a normal population that has m =
0.3, and s = 0.2.

Solution:

First, transform the cutoff, x0, to unit normal.

z = (x0 - m) / s = (0.5 - 0.3) / 0.2 = 1

Next, find the value of F(z) for z = 1. From Table 2.3.1, F(1) = 0.8413. Then,
calculate the proportion of sample values above the 0.5 cutoff, P(x > 0.5), as follows:

P(x > 0.5) = 1 - P(x ≤ 0.5) = 1 - F(1) = 1 - 0.8413 = 0.1587 ≈ 0.16

Therefore, 16% of the samples in the population are greater than 0.5.
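
The same result can be reproduced without the printed table by evaluating the standard normal cdf numerically through the error-function identity F(z) = ½[1 + erf(z/√2)]; the short sketch below is not part of the original notes:

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function F(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

m, s, cutoff = 0.3, 0.2, 0.5
z = (cutoff - m) / s                   # standardize the cutoff: z = 1.0
p_above = 1.0 - normal_cdf(z)          # proportion above the cutoff
print(round(z, 2), round(p_above, 4))  # 1.0  0.1587 (about 16%)
```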

Table 2.3.1 Cumulative Normal Distribution Function

F(z) = 1/√(2π) ∫−∞^z e^u dt,   where u = -½ t²

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879

0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389

1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319

1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767

2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936

2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986

3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998

LOGNORMAL DISTRIBUTION

The lognormal distribution occurs when the logarithm of a random variable has a normal
distribution. This distribution is positively skewed. Its probability density is given by

f(x) = 1 / (xβ√(2π)) e^(-u)     for x > 0, β > 0          (2.3.3)

where u = (ln x - α)² / (2β²)

α = mean of the logarithms, and
β = standard deviation of the logarithms.

Many variables encountered in ore reserve analysis have positively skewed distributions.
However, it should be noted that not all positively skewed distributions are lognormal. In fact, one
may obtain erroneous estimates if an assumption of lognormality is used for data that are not from
a lognormal distribution. Figure 2.3.2 shows an example of a lognormal density curve. The
corresponding formulas relating the mean and variance of the lognormal distribution to those of the
logarithms are given below:
µ = exp(α + β²/2)          (2.3.4)
σ² = µ² [exp(β²) - 1]          (2.3.5)
α = ln µ - β²/2          (2.3.6)
β² = ln [1 + (σ²/µ²)]          (2.3.7)

where µ and σ² are the mean and variance of the lognormal variable, and α and β² are the mean and
variance of its logarithms.
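
As a small sketch that is not part of the original notes, the conversions of Equations 2.3.4 through 2.3.7 can be coded directly; the function names and the numerical values are illustrative only:

```python
import math

def lognormal_to_arithmetic(alpha, beta2):
    """Equations 2.3.4 and 2.3.5: log-scale mean and variance -> arithmetic mean and variance."""
    mu = math.exp(alpha + beta2 / 2.0)
    sigma2 = mu * mu * (math.exp(beta2) - 1.0)
    return mu, sigma2

def arithmetic_to_lognormal(mu, sigma2):
    """Equations 2.3.6 and 2.3.7: arithmetic mean and variance -> log-scale parameters."""
    beta2 = math.log(1.0 + sigma2 / (mu * mu))
    alpha = math.log(mu) - beta2 / 2.0
    return alpha, beta2

mu, sigma2 = lognormal_to_arithmetic(-0.66, 1.06)   # illustrative log mean and log variance
print(mu, sigma2)
print(arithmetic_to_lognormal(mu, sigma2))          # recovers (-0.66, 1.06)
```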

Three-Parameter Lognormal Distribution

There are instances where the logarithm of a random variable plus a constant, ln (x+c), is
normally distributed. This type of distribution is called three-parameter lognormal distribution. If a
variable is three-parameter lognormal, the cumulative curve will show an excess of low values. The
additive constant, c, can be estimated by using

c = (M² - q1 q2) / (q1 + q2 - 2M)          (2.3.8)

where M is the median, q1 and q2 are the quantiles corresponding to p and 1-p cumulative
frequencies respectively. In theory, any value of p can be used but a value between 10% and 25%
will give the best results.
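
A brief sketch of Equation 2.3.8 is given below; it is not part of the original notes, and the quantile values shown are invented so that the recovered constant is easy to verify:

```python
def additive_constant(median, q_low, q_high):
    """Equation 2.3.8: estimate the additive constant c of a three-parameter
    lognormal distribution from the median and two symmetric quantiles
    (e.g., the 10th and 90th, or 25th and 75th, percentiles)."""
    return (median ** 2 - q_low * q_high) / (q_low + q_high - 2.0 * median)

# Illustrative values only: median 0.90, lower quantile 0.40, upper quantile 1.90
print(additive_constant(0.90, 0.40, 1.90))   # -> 0.1 for these made-up values
```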

Some researchers have investigated the possible problems in using the three-parameter lognormal
distribution. They found that bias could result if the additive constant is too small, and that it is
better to make the constant too large than too small.

Figure 2.3.2 Lognormal Distribution Curve Examples

2.4 RANDOM VARIABLES

A random variable is a variable whose values are randomly generated according to a probabilistic
mechanism. In other words, a random variable denotes a numerical quantity defined in terms of the
outcome of an experiment. For example, the outcome of a coin toss is a random variable. The grade
of ore from a drillhole sample is a random variable.

Properties of a Random Variable

The parameters of a random variable cannot be calculated exactly by observing a few
outcomes of the random variable; rather, they are parameters of a conceptual or theoretical model.
From a sequence of observed outcomes, such as drillhole assay values, all one can do is calculate
sample statistics based on that particular data set. However, a different set of assay values
may produce a different set of statistics. It is true that as the number of samples increases, the sample
statistics tend to approach their corresponding model parameters. This leads practitioners to
assume that the parameters of the random variable are the same as the sample statistics they can
calculate.

The two model parameters most commonly used in probabilistic approaches to estimation
are the mean or expected value of the random variable and its variance.

Expected Value — The expected value of a random variable is its mean or average outcome.
It is often denoted by µ, and is defined as

µ = E(x) (2.4.1)

E(x) refers to expectation which is defined by the following formula:

E(x) = ∫−∞^+∞ x f(x) dx          (2.4.2)

where f(x) is the probability density function of the random variable x.

Variance — The variance of a random variable is the expected squared difference from the
mean of the random variable. It is often denoted by σ², and is defined by

σ² = E[(x - µ)²] = ∫−∞^+∞ (x - µ)² f(x) dx          (2.4.3)

The standard deviation of a random variable is simply the square root of its variance. It is
denoted by σ.

Independence

Random variables are considered to be independent if the joint probability density function
of n random variables satisfies the following relationship:

p(x1,x2,...,xn) = p(x1) p(x2) ... p(xn) (2.4.4)

In this equation, p(xi) is the marginal distribution¹ of xi. The equation simply means that the
probability of two events happening together is the product of the individual probabilities of each event
happening. If the probability of one of the events happening influences the probability of the other
event happening, then the random variables cannot be considered independent. The dependence
between two random variables is described by the covariance, which is defined as follows:

Cov(x1,x2) = E {[x1 - E(x1)] [x2 - E(x2)]}
           = E(x1x2) - E(x1) E(x2)          (2.4.5)

If x1 and x2 are independent, then they have no covariance, or Cov(x1,x2) = 0.

Properties of Expectation and Variance

The following are some of the properties associated with expectation and variance:

1. If C is a constant, then E(Cx) = C E(x)

2. If x1, x2, ..., xn have finite expectation, then

E(x1+x2...+xn) = E(x1) + E(x2) + ... + E(xn)

3. If C is a constant, then Var(Cx) = C2 Var(x)

4. If x1, x2, ..., xn are independent, then

Var(x1+x2...+xn) = Var(x1) + Var(x2) + ... + Var(xn)

5. Var(x+y) = Var(x) + Var(y) + 2 Cov(x,y)

¹ A marginal distribution is a distribution that deals with only one variable at a time.

2.5 CENTRAL LIMIT THEOREM

The central limit theorem (CLT) states that the means of samples of independent random variables
tend toward a normal distribution regardless of the distribution from which the samples are drawn. Thus,
if samples of size n are drawn from a population that is not normally distributed, the successive sample
means will nevertheless form a distribution that is approximately normal. This distribution approaches
closer and closer to normal as the sample size, n, increases. How close the approximation is to a
normal distribution is hard to determine; in most cases, for n > 40, the approximation is quite good.

Standard Error of the Mean

The dispersion of the distribution of sample means depends on two factors:

1. The dispersion of the parent population. The more variable the parent population, the
more variable will be the sample means.

2. The size of the sample. The variability of the sample means decreases with increasing size
of the sample.

Mathematically speaking, the standard deviation of the distribution of sample means, which
is also called the standard error of the mean, varies directly with the standard deviation of the parent
population (σ), and inversely with the square root of the number of samples (n). This is expressed
in the following equation:

Standard Error of the Mean = σ / √n          (2.5.1)

Of course, in order for the above formula to be valid, the samples have to be independent of
each other.

Confidence Limits

The confidence in an estimate can be expressed by applying error bounds around the
estimate. These error bounds are called confidence intervals, or confidence limits. Using the central
limit theorem, confidence limits can be calculated for the sample mean (m). The most common
limits are at 95% confidence level, and are formed in the following manner:

Lower limit at 95% confidence level = m - 2 (s / √n)
Upper limit at 95% confidence level = m + 2 (s / √n)          (2.5.2)

where m is sample mean, s is sample standard deviation, and n is the number of samples. The
above confidence limits tell us that 95 out of 100 times this procedure is applied, the true population
mean will be within the limits specified. It is clear that, as the number of samples increases, the
standard error of the mean decreases, and the confidence interval becomes smaller.
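
As an illustrative sketch (not part of the original notes, with a made-up sample list), Equations 2.5.1 and 2.5.2 translate directly into code:

```python
import math

def confidence_limits_95(values):
    """Approximate 95% confidence limits for the mean (Equation 2.5.2),
    assuming the samples are independent."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / n)
    se = s / math.sqrt(n)          # standard error of the mean (Equation 2.5.1)
    return m - 2.0 * se, m + 2.0 * se

grades = [0.32, 0.45, 0.51, 0.60, 0.74, 0.95, 1.20, 2.10, 0.66, 0.81]
print(confidence_limits_95(grades))
```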

2.6 BIVARIATE DISTRIBUTION

In the analysis of earth science data, it is often desirable to know the pattern of dependence
of one variable (X) on another (Y). This pattern of dependence is particularly critical if one wishes
to estimate the outcome of an unknown X from the outcome of a known Y. For example, in a gold
deposit, if the drillhole cores were sampled for gold but not always for silver, one may want
to estimate the missing silver grades from the gold grades, provided the two grades are correlated.

Just as the distribution of a single random variable X is characterized by a cdf F(x) = Prob
{X ≤ x}, the joint distribution of outcomes from two random variables X and Y is characterized by
a joint cdf:

F(x,y) = Prob {X ≤ x and Y ≤ y}          (2.6.1)

In practice, the joint cdf F(x,y) is estimated by the proportion of pairs of data values jointly
below the respective threshold values x, y.
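
In code, this proportion can be estimated by a simple count over the data pairs; the sketch below is not from the original notes, and its variable names and sample pairs are purely illustrative:

```python
def joint_cdf_estimate(pairs, x_threshold, y_threshold):
    """Proportion of (x, y) data pairs jointly below the two thresholds (Equation 2.6.1)."""
    below = sum(1 for x, y in pairs if x <= x_threshold and y <= y_threshold)
    return below / len(pairs)

# Illustrative gold/silver assay pairs
pairs = [(0.2, 1.1), (0.5, 2.3), (0.9, 3.0), (1.4, 4.2), (0.3, 0.9)]
print(joint_cdf_estimate(pairs, 0.5, 2.5))   # 0.6
```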

2.7 DESIRABLE PROPERTIES OF ESTIMATORS

When comparing different estimators of a random variable, one should check certain
properties of these estimators. An estimator without bias is the most desirable. This condition,
which is commonly referred to as unbiasedness, can be expressed as follows:

E (Z - Z*) = 0 (2.7.1)

where Z* is the estimate and Z is the true value of the random variable being estimated. This
condition means that the expected value of the error (Z - Z*) is zero. In other words, on average the
estimator predicts the correct value.

The second desirable property of an estimator is that the variance of the errors should be
small. This condition is expressed as follows:

Var (Z - Z*) = E[(Z - Z*)²] = small          (2.7.2)

Another desirable property of an estimator is its robustness. An estimator that works well
with one data set should also work well with many different data sets. If that is the case, then the
estimator is called robust. An example of this is the kriging estimator, which is considered to
be robust.

3
Data Analysis and Display

It is essential to organize statistical data in order to understand its characteristics. Therefore,
much of statistics deals with the organization, presentation, and summary of data. The organization
of data includes its preparation and verification as well.

3.1 DATA PREPARATION AND VERIFICATION

Error Checking

One of the most tedious and time-consuming tasks in a geostatistical study is error checking.
Although one would like to weed out all the errors at the outset, a few errors often remain hidden
until the analysis has already started.

In drillhole assay data preparation, the initial drillhole logs must be coded carefully and
legibly to prevent future errors. One should not use zero or a blank to indicate missing data.
It is preferable to use a specific negative value, such as -1 or -999, for such data. Otherwise, the
missing data may end up being used in estimation as part of the actual data.

After the data have been entered into the computer, they should be listed to check for
typographical errors. However, that process alone does not guarantee the accuracy of the data. Some
helpful suggestions for verifying the accuracy of the data are given below:

• Sort the data and examine the extreme values. Try to establish their authenticity by referring
to the original sampling logs.

• Plot sections and plan maps for visual verification and spotting the coordinate errors. Are
they plotting within the expected limits?

• Locate the extreme values on a map. Are they located along trends of similar data values or
are they isolated? Be suspicious of isolated extremes. If necessary and possible, get duplicate
samples.

When trying to sort out inconsistencies in the data, it usually helps to familiarize oneself with
the data. Time spent verifying the data is often rewarded by quicker recognition of errors when
they occur.

3.2 GRAPHICAL DISPLAY & SUMMARY

UNIVARIATE DESCRIPTION

Frequency Distributions and Histograms

One of the most common and useful presentations of data sets is the frequency table and the
corresponding graph, the histogram. A frequency table records how often observed values fall within
certain intervals or classes. The histogram is the graphical representation of the same information.
Figure 3.2.1 gives an example of both, combined in the same printer output for the
sample data used.

Summary statistics were also included in this figure to complete the preliminary information
needed to study the sample data. The histogram of the data can be generated on a plotter for more
accurate display and visual presentation. Figure 3.2.2 shows an example of such display using the
same data.

Frequency tables and histograms are very useful in ore reserve analysis for many reasons:

1. They give a visual picture of the data and how they are distributed.
2. Bimodal distributions show up easily, which usually indicates mixing of two separate
populations.
3. Outlier high grades can be easily spotted.

Cumulative Frequency Tables

In ore reserve analysis, the cumulative frequency above a specified cutoff value is of great
interest. Therefore, the programs that generate frequency tables and histograms are also designed to
generate the cumulative frequency tables. These tables give the mean, standard deviation and the
percentage of data above the cutoff grades that correspond to the histogram intervals. Table 3.2.1 is
an example of such a table using the sample data.
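
The kind of summary given in Table 3.2.1 can also be reproduced with a few lines of code. The sketch below is not part of the original notes; the function name and the short grade list are illustrative, and values equal to the cutoff are counted as above it:

```python
import math

def cutoff_statistics(values, cutoffs):
    """For each cutoff, return (cutoff, count above, percent above, mean above, CV above)."""
    rows = []
    n_total = len(values)
    for c in cutoffs:
        above = [v for v in values if v >= c]
        if not above:
            continue
        m = sum(above) / len(above)
        s = math.sqrt(sum((v - m) ** 2 for v in above) / len(above))
        rows.append((c, len(above), 100.0 * len(above) / n_total, m, s / m if m else 0.0))
    return rows

grades = [0.05, 0.22, 0.35, 0.48, 0.61, 0.74, 0.90, 1.15, 1.60, 2.40]   # made-up composites
for row in cutoff_statistics(grades, [0.0, 0.5, 1.0, 1.5]):
    print("%.2f  %3d  %6.1f%%  %6.3f  %5.3f" % row)
```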

** CU COMPOSITE DATA STATISTICS FROM FILE MSOP09.DAT **

HISTOGRAM AND FREQUENCY DISTRIBUTION


"15-m bench composites - Rock type 1

AVE. DATA VALUE = .735


C.V. (STD/MEAN) = .676
MIN. DATA VALUE = .000
MAX. DATA VALUE = 3.700

# CUM. UPPER
FREQ. FREQ LIMIT 0 20 40 60 80 100
----- ----- ----- +.........+.........+.........+.........+.........+
86 .093 .100 +*****. +
34 .130 .200 +** . +
48 .182 .300 +*** . +
73 .261 .400 +**** . +
86 .354 .500 +***** . +
80 .440 .600 +**** . +
84 .531 .700 +***** . +
74 .611 .800 +**** . +
70 .686 .900 +**** . +
60 .751 1.000 +*** . +
43 .798 1.100 +** . +
28 .828 1.200 +** . +
29 .859 1.300 +** . +
31 .893 1.400 +** . +
25 .920 1.500 +* .+
19 .941 1.600 +* .
16 .958 1.700 +* .
8 .966 1.800 + .
9 .976 1.900 + .
3 .979 2.000 + .
6 .986 2.100 + .
4 .990 2.200 + .
1 .991 2.300 + .
3 .995 2.400 + .
3 .998 2.500 + .
1 .999 2.600 + .
0 .999 2.700 + .
0 .999 2.800 + .
0 .999 2.900 + .
0 .999 3.000 + .
0 .999 3.100 + .
0 .999 3.200 + .
0 .999 3.300 + .
0 .999 3.400 + .
0 .999 3.500 + .
0 .999 3.600 + .
0 .999 3.700 + .
1 1.000 3.800 + .
---- ----- ----- +.........+.........+.........+.........+.........+
925 1.000 0 20 40 60 80 100

Figure 3.2.1 Frequency Table and Histogram of Sample Data

Figure 3.2.2 Histogram Plot of Sample Data

** CU COMPOSITE DATA STATISTICS FROM FILE MSOP09.DAT **

Rock type 1

CUTOFF   SAMPLES   PERCENT    MEAN     C.V.
  CU      ABOVE     ABOVE     ABOVE

.000 925.00 100.00 .7347 .6762


.100 839.00 90.70 .8049 .5812
.200 805.00 87.03 .8325 .5496
.300 757.00 81.84 .8698 .5131
.400 684.00 73.95 .9263 .4672
.500 598.00 64.65 .9953 .4219
.600 518.00 56.00 1.0644 .3846
.700 434.00 46.92 1.1459 .3479
.800 360.00 38.92 1.2278 .3177
.900 290.00 31.35 1.3193 .2891
1.000 230.00 24.86 1.4167 .2616
1.100 187.00 20.22 1.5047 .2371
1.200 159.00 17.19 1.5681 .2234
1.300 130.00 14.05 1.6388 .2135
1.400 99.00 10.70 1.7326 .2029
1.500 74.00 8.00 1.8335 .1926
1.600 55.00 5.95 1.9373 .1829
1.700 39.00 4.22 2.0541 .1755
1.800 31.00 3.35 2.1365 .1689
1.900 22.00 2.38 2.2505 .1655
2.000 19.00 2.05 2.2984 .1650
2.100 13.00 1.41 2.4162 .1696
2.200 9.00 .97 2.5344 .1767
2.300 8.00 .86 2.5763 .1784
2.400 5.00 .54 2.7180 .2022
2.500 2.00 .22 3.1100 .2683
2.600 1.00 .11 3.7000 .0000
2.700 1.00 .11 3.7000 .0000
2.800 1.00 .11 3.7000 .0000
2.900 1.00 .11 3.7000 .0000
3.000 1.00 .11 3.7000 .0000
3.100 1.00 .11 3.7000 .0000
3.200 1.00 .11 3.7000 .0000
3.300 1.00 .11 3.7000 .0000
3.400 1.00 .11 3.7000 .0000
3.500 1.00 .11 3.7000 .0000
3.600 1.00 .11 3.7000 .0000
3.700 1.00 .11 3.7000 .0000

Min. data value = .0000


Max. data value = 3.7000

C.V. = Coeff. of variation = Standard deviation / mean

925 Intervals used out of 2411

Table 3.2.1 Cumulative Frequency Statistics of Sample Data

Probability Plots

Probability plots are useful in determining how close the distribution of sample data is to
being normal or lognormal. On a normal probability plot, the y-axis is scaled so that the cumulative
frequencies will plot as a straight line if the distribution is normal.

On a lognormal probability plot, the x-axis is in logarithmic scale. Therefore, the cumulative
frequencies will plot as a straight line if the data values are lognormally distributed. Figure 3.2.3
shows a lognormal probability plot of the sample data values. It is very tempting to consider the
shape of the plot in this figure to be a straight line, and therefore to assume that the distribution
plotted is lognormal. However, a somewhat convex shape of the plot at its upper center indicates that
this may not be the case.

When the use of an ore reserve estimation method depends on assumptions about the
distribution, one must be aware of the consequences of disregarding deviations of a probability plot
at the extremes. This is because such assumptions often have their greatest impact when one is
estimating extreme values. Departures of a probability plot from approximate linearity at the extreme
values are often deceptively small and easy to overlook. However, the estimates derived using such
an assumption may be different from reality.

Probability plots are very useful for detecting the presence of multiple populations. Although
the deviations from the straight line on the plots do not necessarily indicate multiple populations,
they represent changes in the characteristics of the cumulative frequencies over different intervals.
Therefore, it is always a good idea to find out the reasons for such deviations.

Unless the estimation method is dependent on a particular distribution, selecting a theoretical
model for the distribution of data values is not a necessary step prior to estimation. Therefore, one
should not read too much into a probability plot. The straightness of a line on a probability plot is
no guarantee of a good estimate and the crookedness of a line should not condemn distribution-based
approaches to estimation. Certain methods lean more heavily toward assumptions about the
distribution than others do. Some estimation tools built on the assumption of normality may still be
useful even when the data are not normally distributed.

Figure 3.2.3 Lognormal Probability Plot of Sample Data

BIVARIATE DESCRIPTION

Scatter Plots

One of the most common and useful presentations of bivariate data sets is the scatter plot.
A scatter plot is simply an x-y graph of the data on which the x-coordinate corresponds to the value
of one variable, and the y-coordinate to the value of the other variable.

A scatter plot is useful for seeing how well two variables are related. It is also
useful for drawing attention to unusual data pairs. In the early stages of the study of a spatially
continuous data set, it is necessary to check and clean the data. Even after the data have been
cleaned, a few erratic values may have a major impact on estimation. The scatter plot can be used
to help both in the validation of the initial data and in the understanding of later results. Figure 3.2.4
shows a scatter plot of two different assay grades.

Quantile-quantile Plots

Two marginal distributions can be compared by plotting their quantiles against one another.
The resulting plot is called a quantile-quantile, or simply q-q, plot. If the q-q plot appears as a straight
line, the two marginal distributions have the same shape. A 45° line further indicates that their means
and variances are also the same.

Correlation

In the very broadest sense, there are three possible scenarios between two variables: the
variables are either positively correlated, negatively correlated, or uncorrelated.

Two variables are positively correlated if the smaller values of one variable are associated
with the smaller values of the other variable, and similarly the larger values are associated with the
larger values of the other. For example, in a gold-silver deposit, higher values of silver can be
observed with higher values of gold. Similarly, amount of brecciation in a rock formation can be
positively correlated with the amount of gold deposited in the rock.

Two variables are negatively correlated if the smaller values of one variable are associated
with the larger values of the other variable, or vice versa. In geologic data sets, the concentrations
of two major elements are often negatively correlated. For example, in dolomitic limestone, an increase
in the amount of calcium usually results in a decrease in the amount of magnesium.

The final possibility is that the two variables are not related. An increase or decrease in one
variable has no apparent effect on the other. In this case, the variables are said to be uncorrelated.

Figure 3.2.4 Scatter Plot of Two Variables

Correlation Coefficient

The correlation coefficient, r, is the statistic that is most commonly used to summarize the
relationship between two variables. It can be calculated using the following equation:

r = CovXY / (σx σy)          (3.2.1)

The numerator, CovXY, is called the covariance and can be calculated using

CovXY = (1/n) Σ (xi - mx) (yi - my)     i = 1,...,n          (3.2.2)

where n is the number of data; the xi's are the data values for the first variable, mx is their mean,
and σx their standard deviation. Similarly, the yi's are the data values for the second variable, my is their
mean, and σy their standard deviation.

The correlation coefficient is actually a measure of how close to a straight line two variables
plot. If r = 1, then the scatter plot will be a straight line with a positive slope. This is the case of a
perfect positive correlation. If r = -1, then the scatter plot will be a straight line with a negative slope.
This is the case of a perfect negative correlation. If r = 0, then there is no correlation between the two
variables.

It is important to note that r provides a measure of the linear relationship between two
variables. If the relationship between two variables is not linear, the correlation coefficient may
be a very poor summary statistic.

The correlation coefficient and the covariance may be affected by a few outlier pairs.
Exclusion of these pairs from the statistics can dramatically improve an otherwise poor correlation
coefficient.

Linear Regression

If there is a strong relationship between two variables that can be expressed by an
equation, then one variable can be used to predict the other when one of them is unknown. The
simplest case of this type of prediction is linear regression, in which it is assumed that the
dependence of one variable on the other can be described by the equation of a straight line

y = ax + b (3.2.3)

where "a" is the slope, and "b" is the constant of the line. They are given by:

a=r( y / x) b = my - amx (3.2.4)

The slope, a, is the correlation coefficient multiplied by the ratio of the standard deviations,
where σy is the standard deviation of the variable we are trying to predict, and σx is the standard
deviation of the variable we know. Once the slope is known, then the constant, b, can be calculated
using the means of the two variables, mx and my. Figure 3.2.4 gives the equation of the line that
describes the relationship between two different grade items.
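
A compact sketch of Equations 3.2.1, 3.2.2, and 3.2.4 is given below; it is not part of the original notes, and the paired gold and silver assays are invented for illustration:

```python
import math

def correlation_and_regression(xs, ys):
    """Return the correlation coefficient r and the regression line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n     # Equation 3.2.2
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    r = cov / (sx * sy)                                            # Equation 3.2.1
    a = r * sy / sx                                                # slope (Equation 3.2.4)
    b = my - a * mx                                                # constant (intercept)
    return r, a, b

gold   = [0.21, 0.35, 0.48, 0.60, 0.85, 1.10]   # hypothetical paired assays
silver = [1.10, 1.60, 2.30, 2.80, 4.10, 5.00]
print(correlation_and_regression(gold, silver))
```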

SPATIAL DESCRIPTION

One of the characteristics of earth sciences data sets is that the data belong to some location
in space. Spatial features of the data set, such as the degree of continuity, the overall trend, or the
presence of high or low grade zones, are often of considerable interest. None of the univariate and
bivariate descriptive tools presented in the previous sections capture these spatial features.

Data Location Maps

One of the most common and simplest displays of spatial data is to generate a location map.
This is a map on which each data location is plotted along with its corresponding data value. Figure
3.2.5 is an example of such a map on which the pierce points of the sample drillhole data set have
been plotted at a specified level showing the composite data values on this bench.

The data location maps are an important initial step in analyzing spatial data sets. With
irregularly gridded data, a location map often gives a clue as to how the data were collected. For
example, the areas with higher-grade mineralization may be drilled more closely, indicating some
initial interest.

The data location maps help reveal obvious errors in data locations. They also help draw
attention to unusual data values that may be erroneous. Lone high values surrounded by low values
and vice versa are worth rechecking.

Contour Maps

The overall trends in the data values can be revealed by a contour map. Some contouring
algorithms cannot contour the data unless they are on a regular grid. In that case, the data values need
to be interpolated to a regular grid. Interpolated values are usually less variable than the original data
values and can make the contoured surface appear smoother. Figure 3.2.6 shows the contoured data
values that are plotted on Figure 3.2.5 using the original data locations without gridding.

Figure 3.2.5 Sample Data Location Map

Figure 3.2.6 Sample Data Contour Map

Symbol Maps

For many very large regularly gridded data sets, plotting of all the data values may not be
feasible, and a contour map may mask many of the interesting details. An alternative that is often
used in such situations is a symbol map. This is similar to data plotting with each location replaced
by a symbol that denotes the class to which the data value belongs. These symbols are usually chosen
so that they convey the relative ordering of the classes by their visual density. This type of display
is often designed to be printed on a line printer. Therefore, it is convenient to use if one does not
have access to a plotting device. Unfortunately, the scale on symbol maps is usually distorted since
most line printers do not print the same number of characters per inch horizontally as they do
vertically.

Moving Window Statistics

For a given data set, it is quite common to find that the data values in some regions are more
variable than in others. Such anomalies may have serious practical implications. For example, in
most mines, erratic ore grades cause problems at the mill because most metallurgical processes
benefit from low variability in the ore grade.

The calculation of summary statistics within moving windows is frequently used to
investigate anomalies both in the average value and variability. The area is divided into several local
neighborhoods of equal size and within each local neighborhood, or window, summary statistics are
calculated.

The size of the window depends on the average spacing between data locations and on the
overall dimensions of the area being studied. The window size should be large enough to include
a reliable amount of data within each window for summary statistics. Needing large windows for
reliable statistics while wanting small windows for local detail may necessitate the use of overlapping
windows, with two adjacent neighborhoods having some data in common. If there are still too
few data within a particular window, it is often better to ignore that window in subsequent analysis
than to incorporate unreliable statistics.
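
A simple sketch of moving-window statistics is given below; it is not part of the original notes, the window size and sample coordinates are made up, and overlapping windows could be obtained by stepping the window origin by less than its side length:

```python
import math

def window_statistics(points, window):
    """Group (x, y, grade) data into square windows of side `window` and
    return {(ix, iy): (count, mean, std)} for each non-empty window."""
    bins = {}
    for x, y, g in points:
        key = (int(x // window), int(y // window))
        bins.setdefault(key, []).append(g)
    stats = {}
    for key, grades in bins.items():
        n = len(grades)
        m = sum(grades) / n
        s = math.sqrt(sum((g - m) ** 2 for g in grades) / n)
        stats[key] = (n, m, s)
    return stats

# A few made-up samples: (easting, northing, grade)
data = [(12, 8, 0.4), (18, 14, 0.6), (35, 9, 1.2), (41, 16, 0.9), (14, 37, 0.3), (39, 42, 1.5)]
for key, (n, m, s) in sorted(window_statistics(data, 25).items()):
    print(key, n, round(m, 3), round(s, 3))
```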

Proportional Effect

Knowing how the local variability changes across the deposit is important for estimation.
Contouring of data values or moving window statistics are very useful for displaying fluctuations in
data values in different directions. If there are such fluctuations, it is important to know how the
changes in local variability are related to the local mean.

A proportional effect is a relationship between the mean and the variance. It may be a strictly
functional relationship, or it may simply be empirical. The most common form of proportional effect
occurs when the variance is directly proportional to some function of the mean.

In a broad sense, there are four relationships one can observe between the local mean and the local
variability:

1. The mean and variability are both constant.

2. The mean is constant, variability changes.

3. The mean changes, variability is constant.

4. Both the mean and variability change.

The first two cases are the most favorable ones for estimation. If the local variability is
roughly constant, then the estimates in a particular area will be as good as the estimates elsewhere.
In other words, no area will suffer more than others from highly variable data values. If the
variability changes noticeably, then it is better to have a situation where the variability is related to the
mean. In initial data analysis, it is useful to establish whether a predictable relationship exists between
the local mean and variability. If it exists, such a relationship is generally referred to as a proportional
effect.

Two very common forms of the proportional effect are:

variance ∝ mean²     and     variance ∝ (mean + constant)²

One of the characteristics of normally distributed values is that there is usually no
proportional effect. In fact, the local standard deviations are roughly constant. For lognormally
distributed values, a scatter plot of local means versus local standard deviations will show a linear
relationship between the two. Figure 3.2.7 gives an example of such a relationship at a metal mine.

Figure 3.2.7 Sample Plot of Proportional Effect

4
Analysis of Spatial Continuity

The analysis of spatial continuity in an ore deposit is essential for ore reserve estimation.
There are several geostatistical tools for describing spatial continuity, such as the correlation function,
the covariance function, and the variogram. All of these tools use summary statistics to describe how
spatial continuity changes as a function of distance and direction. This chapter will cover only the
variogram, since it is more traditional than the covariance and correlation functions, although all three
are equally useful.

4.1 VARIOGRAM

Definition

In simplest terms, the variogram measures the spatial correlation between samples. One
possible way to measure this correlation between two samples at points xi and xi+h, taken a distance h
apart, is the function

f1(h) = (1/n) Σ [Z(xi) - Z(xi+h)]          (4.1.1)

where Z(xi) refers to the assay value of the sample at point xi, and h is the distance between
samples. Thus, the function measures the average difference between samples h distance apart.
Although this function is useful, in many cases it may be equal to zero or close to it, because the
differences cancel out. A more useful function is obtained by squaring the differences:

f2(h) = (1/n) Σ [Z(xi) - Z(xi+h)]²          (4.1.2)

In this case the differences do not cancel out, and the result of the above function will always
be positive. This function is called the variogram and is denoted by 2γ(h).

Although the variogram was originally defined as 2γ(h), popular usage refers to the semi-
variogram γ(h) as being the variogram. Therefore, throughout this chapter, variogram will refer to
the following function:

γ(h) = 1/(2n) Σ [Z(xi) - Z(xi+h)]²     i = 1,...,n          (4.1.3)

Note that γ(h) is a vector function in three-dimensional space and it varies with both distance
and direction. The number of samples, n, is dependent on the distance and direction selected to
accept the data. The formal definition of the variogram is given by the following equation:

γ(h) = 1/(2v) ∫v [Z(x) - Z(x+h)]² dx          (4.1.4)

where v refers to the volume of the deposit. The function Z(x) simply defines the value of interest,
such as the grade at point x. Figure 4.1.1 shows a sample variogram.

The terminology used to describe the features of the variogram is given below:

Range — The samples that are close to each other have generally similar values. As the
separation distance between samples increases, the difference between the sample values, and hence
the corresponding variogram value will also generally increase. Eventually, however, an increase in
the separation distance no longer causes a corresponding increase in the variogram value. Thus, the
variogram reaches a plateau, or levels off. The distance at which the variogram levels off is called
the range.

The range is simply the traditional geologic notion of range of influence. It means that
beyond the range, the samples are no longer correlated. In other words, they are independent of each
other.

Sill — The value of the variogram where it levels off is called the sill.

Nugget Effect — Although one would expect to obtain the same value when the samples are
taken from the same location, it is not very unusual, especially in highly variable deposits, to obtain
different values. Several factors, such as sampling error and short scale variability, may cause sample
values separated by extremely small distances to be quite dissimilar. This causes a discontinuity at
the origin of the variogram. The vertical jump from the value of zero at the origin to the value of the
variogram is called the nugget effect. The ratio of the nugget effect to the sill is often referred to as
the relative nugget effect and is usually quoted in percentages.

Computation

In practice, a variogram is almost always computed using a discrete number of points such
as drillhole assays. Therefore, Equation 4.1.3 can be used to calculate the variogram. In this
equation, it is assumed that there are n pairs of samples, each pair separated by a distance h. In
addition, all these samples are assumed to lie on a straight line, along which the variogram
computation is being performed. To illustrate the computation of a variogram, the following example
can be given.

* HORIZONTAL VARIOGRAMS -- ROCK TYPE 1 *

VARIOGRAM TYPE: NORMAL


3-d omni-directional variogram

Easting : 2409.7--> 3123.3 Horizontal angle : .00


Northing : 4791.4--> 5548.4 Horizontal window: 90.00
Elevation: 2045.0--> 2660.0 Vertical angle : .00
Grade : .000--> 3.700 Vertical window: 90.00

Mean : .7347 Std. dev. : .4968 #Samples: 925


Log mean : -.6578 Log std dev.: 1.0567

TYPE: NORMAL TRANSFORMATION: NONE VARIABLE: CU

FROM TO PAIRS DISTANCE DRIFT V(H) MEAN

1 0- 50 2666 30.9 .2536E-01 .1030E+00 .7488E+00


2 50- 100 9734 79.2 -.8209E-03 .2186E+00 .8056E+00
3 100- 150 23036 126.4 -.2306E-01 .2490E+00 .7981E+00
4 150- 200 32117 175.6 -.1505E-01 .2560E+00 .7652E+00
5 200- 250 45989 225.4 .2757E-02 .2601E+00 .7419E+00
6 250- 300 47351 275.1 -.2589E-01 .2508E+00 .7286E+00
7 300- 350 51794 324.7 -.2560E-01 .2505E+00 .7417E+00
8 350- 400 46522 373.8 -.3154E-01 .2434E+00 .7270E+00
9 400- 450 40313 424.5 -.4489E-01 .2448E+00 .7113E+00
10 450- 500 32113 473.7 -.4401E-01 .2270E+00 .6915E+00

.2790E+00 +
.2647E+00 + X
.2504E+00 + X X X X X X
.2361E+00 +
.2218E+00 + X X
.2075E+00 +
.1932E+00 +
.1789E+00 +
.1645E+00 +
.1502E+00 +
.1359E+00 +
.1216E+00 +
.1073E+00 + X
.9300E-01 +
.7869E-01 +
.6439E-01 +
.5008E-01 +
.3577E-01 +
.2146E-01 +
.7154E-02 +
---------+---------+
250. 500.

Figure 4.1.1 Sample Variogram

Example for Variogram Calculation:

Let us calculate an E-W variogram using the data in Figure 4.1.2. In this figure, there are five
samples collected in E-W direction, each separated by a distance h where h is equal to 15 units (feet
or meters). Since the spacing of data values is 15 units, we compute the variogram at 15-unit steps.
For the first step (h=15), there are 4 pairs:

1. x1 and x2, or .14 and .28


2. x2 and x3, or .28 and .19
3. x3 and x4, or .19 and .10
4. x4 and x5, or .10 and .09

Therefore, for h=15, we get

γ(15) = 1/(2*4) [(x1-x2)² + (x2-x3)² + (x3-x4)² + (x4-x5)²]

      = 1/8 [(.14-.28)² + (.28-.19)² + (.19-.10)² + (.10-.09)²]

      = 0.125 [(-.14)² + (.09)² + (.09)² + (.01)²]

      = 0.125 ( .0196 + .0081 + .0081 + .0001 )

      = 0.125 ( .0359 )

γ(15) = 0.00449

For the second step (h=30), there are 3 pairs:

1. x1 and x3, or .14 and .19


2. x2 and x4, or .28 and .10
3. x3 and x5, or .19 and .09

Therefore, for h=30, we get

γ(30) = 1/(2*3) [(x1-x3)² + (x2-x4)² + (x3-x5)²]

      = 1/6 [(.14-.19)² + (.28-.10)² + (.19-.09)²]

      = 0.16667 [(-.05)² + (.18)² + (.10)²]

Note: N is sample number
+ is sample location
h is equal to 15.

Figure 4.1.2 Sample Data For Variogram Computation

      = 0.16667 ( .0025 + .0324 + .0100 )

      = 0.16667 ( .0449 )

γ(30) = 0.00748

For the third step (h=45), there are 2 pairs:

1. x1 and x4, or .14 and .10


2. x2 and x5, or .28 and .09

Therefore, for h=45, we get

γ(45) = 1/(2*2) [(x1-x4)² + (x2-x5)²]

      = 1/4 [(.14-.10)² + (.28-.09)²]

      = 0.25 [(.04)² + (.19)²]

      = 0.25 ( .0016 + .0361 )

      = 0.25 ( .0377 )

γ(45) = 0.00942

For the fourth step (h=60), there is only one pair: x1 and x5. The values for this pair are .14
and .09, respectively. Therefore, for h=60, we get

γ(60) = 1/(2*1) (x1 - x5)²

      = 1/2 (.14-.09)²

      = 0.5 (.05)²

      = 0.5 ( .0025 )

γ(60) = 0.00125

If we take another step (h=75), we see that there are no more pairs. Therefore, the variogram
calculation stops at h=60.
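
The hand calculation above is easy to reproduce with a short program. The following sketch (in
Python) assumes only that the five sample values of Figure 4.1.2 are stored in a list; it applies
Equation 4.1.3 at each lag step and prints results that agree with the values computed above
(approximately 0.0045, 0.0075, 0.0094 and 0.0013).

    # Experimental variogram for the five equally spaced samples of
    # Figure 4.1.2 (spacing h = 15 units), following Equation 4.1.3.
    values = [0.14, 0.28, 0.19, 0.10, 0.09]
    spacing = 15

    for step in range(1, len(values)):              # step 1 -> h = 15, etc.
        pairs = [(values[i], values[i + step])      # all pairs one lag apart
                 for i in range(len(values) - step)]
        gamma = sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))
        print("gamma(%d) = %.5f   (%d pairs)" % (step * spacing, gamma, len(pairs)))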

Class Size or Lag Distance

In the above example, the sample pairs were on a straight line. Furthermore, the samples were
at uniform intervals with distance between them being 15 feet. Often the sample values do not fall
on a straight line. The spacing between samples may not be uniform either. Consequently, one uses
an interval rather than a point to pair the samples. The basic unit used for interval (h) is called the
class size or lag distance. For example, if the class size is 50 meters, any pair of data points whose
separation distance is between 0 and 50 meters can be included in the computation of the first lag. Any
pair of data points whose separation distance is between 50 and 100 meters can be included in the
computation of the second lag, and so on.

Another way to pair the data is to apply a tolerance distance (dh) at each class size. Thus, if
the specified class size is h, the actual lag distances used become nh ± dh, where n is the lag (class)
number. The only exception is the first lag, which goes from 0 to h + dh. For example, if the
lag distance is 50 meters with a tolerance of ±25, then the first lag (h=50) may actually go from 0
to 75 meters, the second lag (h=100) goes from 75 to 125 meters, the third lag (h=150) goes from 125 to 175
meters, and so on.

In some cases, a strict tolerance distance may need to be applied around each lag distance,
including the lag at h=0. For example, if the lag distance is 50 meters with a tolerance of ±10,
then the first lag (h=0) may actually go from 0 to 10 meters, the second lag (h=50) goes from 40 to 60
meters, the third lag (h=100) goes from 90 to 110 meters, and so on. When the distance tolerance is less
than half the lag distance, some data pairs may not get used in the variogram calculation.

Horizontal and Vertical Windows

In mining especially, drilling on an irregular grid is not unusual. Even holes that are drilled
on a regular grid seldom lie on a straight line. Therefore, when a variogram is computed along a
specified direction, one has to tolerate this inevitable randomness and accept any pair whose
separation is close to the direction of the variogram. This tolerance is specified in terms of an angle
from the direction of the variogram. For example, if the direction of the variogram is E-W, and we
use a 15° tolerance, then any pair along the E-W direction, as well as those within ±15° of the E-W line,
is accepted for variogram computation. The tolerance angle is called the window. If this
tolerance is in a horizontal direction, it is referred to as the horizontal window. If the tolerance is in
a vertical direction, it is referred to as the vertical window.

Horizontal and Vertical Band Widths

In some situations, the pairs accepted within a tolerance window can also be tested to see whether
they are within a specified distance from the line of direction of the variogram. This distance is referred to
as the band width. Again, the band width can be applied to the horizontal as well as the vertical
directions. Figure 4.1.3 illustrates the window and band width definitions.

Note: The above figure can be used for both horizontal and vertical directions;
plan view applies to the horizontal direction, section view applies to
the vertical direction.

Figure 4.1.3 Variogram Direction, Window and Band Width Definitions

Analysis

One typically begins the analysis of spatial continuity with an omni-directional variogram
for which the directional tolerance is large enough to include all pairs. An omni-directional
variogram can be thought of loosely as an average of the various directional variograms, or “the
variogram in all directions.” It is not a strict average since the sample locations may cause certain
directions to be over-represented. For example, if there are more east-west pairs than north-south
pairs, then the omni-directional variogram will be influenced more by east-west pairs.

The calculation of the omni-directional variogram does not imply a belief that the spatial
continuity is the same in all directions. It merely serves as a useful starting point for establishing
some of the parameters required for sample variogram calculation. Since direction does not play a
role in omni-directional variogram calculations, one can concentrate on finding the distance
parameters that produce the clearest structure. An appropriate class size or lag can usually be chosen
after a few trials.

Another reason for beginning with omni-directional calculations is that they can serve as an
early warning for erratic directional variograms. Since the omni-directional variogram contains more
sample pairs than any directional variogram, it is more likely to show a clearly interpretable
structure. If the omni-directional variogram does not produce a clear structure, it is very unlikely that the
directional variograms will show a clear structure.

Once the omni-directional variograms are well behaved, one can proceed to explore the
pattern of anisotropy with various directional variograms. In many practical studies, there is some
prior information about the axes of the anisotropy. For example, in a mineral deposit, there may be
geologic information about the ore mineralization that suggests directions of maximum and
minimum continuity. Without such prior information, a contour map of sample values may offer
some clues to such directions. One should be careful, however, in relying solely on a contour map
because the appearance of elongated anomalies on a contour map may sometimes be due to the sampling
grid rather than to an underlying anisotropy.

For computing directional variograms, one needs to choose a directional tolerance (window
and/or band width) that is large enough to allow sufficient pairs for a clear variogram, yet small
enough that the character of variograms for separate directions is not blurred beyond recognition. For
most cases, it is reasonable to use a window of about ±15°. As a rule of thumb, one can initially use
half of the incremental angle used for computing directional variograms. For example, if the
variograms were to be computed at 45° increments, then a ±22.5° window would be appropriate.
Both the window and class size selected for any given direction can be adjusted after the initial trial.
The best approach is to try several tolerances and use the smallest one that still yields good results.

In cases where a three-dimensional anisotropy is present, one can apply a coordinate
transformation to the data before computing the sample variogram. The axes of the new or
transformed coordinate system are made to align with the suspected directions of the anisotropy. This
enables a straightforward computation of the variograms along the axes of the anisotropy.
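
As a rough illustration of such a transformation, the sketch below rotates the data coordinates so
that the new axes line up with the suspected anisotropy directions, and then divides each rotated
coordinate by the range along that axis so that separation distances become directly comparable in
all directions. The rotation angle, the ranges and the sample points are hypothetical values chosen
only for the example.

    import numpy as np

    def transform_coordinates(xyz, azimuth_deg, ranges):
        """Rotate x-y about the vertical axis by the azimuth of the major
        anisotropy axis, then scale each axis by its range so that one unit
        of transformed distance means the same thing in every direction."""
        a = np.radians(azimuth_deg)
        rot = np.array([[ np.cos(a), np.sin(a), 0.0],
                        [-np.sin(a), np.cos(a), 0.0],
                        [ 0.0,       0.0,       1.0]])
        return (np.asarray(xyz) @ rot.T) / np.asarray(ranges)

    # Hypothetical example: major axis at 45 degrees, ranges 300/150/50 units.
    pts = [[100.0, 200.0, 10.0],
           [160.0, 260.0, 25.0]]
    print(transform_coordinates(pts, 45.0, [300.0, 150.0, 50.0]))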

4.2 THEORETICAL VARIOGRAM MODELS

In order to make practical use of the experimental variogram, it is necessary to describe it by
a mathematical function or a model. There are many models that can be used to describe
experimental variograms; however, some models are more commonly used than others. These
models are explained below.

Spherical Model

This model is the most commonly used model to describe a variogram. The definition of this
model is given by

γ(h) = c0 + c [1.5 (h/a) - 0.5 (h³/a³)]     if h < a
                                                              (4.2.1)
γ(h) = c0 + c                               if h ≥ a

In this equation, c0 refers to the nugget effect, "a" refers to the range of the variogram, h is
the distance, and c0 + c is the sill of the variogram.

The spherical model has a linear behavior at small separation distances near the origin but
flattens out at larger distances, and reaches the sill at a, the range. It should be noted that the tangent
at the origin reaches the sill at about two thirds of the range.

Linear Model

This is the simplest of the models. The equation of this model is as follows:

γ(h) = c0 + A·h                                               (4.2.2)

where c0 is the nugget effect, and A is the slope of the variogram.

Exponential Model

This model is defined by a parameter a (effective range 3a). The equation of the exponential
model is

γ(h) = c0 + c [1 - exp(-h/a)]     h > 0                       (4.2.3)

This model reaches the sill asymptotically. Like the spherical model, the exponential model
is linear at very short distances near the origin; however, it rises more steeply and then flattens out
more gradually. It should be noted that the tangent at the origin reaches the sill at about one third
of the effective range.

Gaussian Model

This model is defined by a parameter a (effective range √3·a). The equation of the Gaussian
model is given by

γ(h) = c0 + c [1 - exp(-h²/a²)]     h > 0                     (4.2.4)

Like the exponential model, this model reaches the sill asymptotically. The distinguishing
feature of the Gaussian model is its parabolic behavior near the origin.
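
For reference, a minimal sketch (in Python) of the spherical, exponential and Gaussian models,
written exactly as in Equations 4.2.1, 4.2.3 and 4.2.4, is given below; the parameter names c0, c and
a follow the notation of the text, and the trial parameter values are made up.

    import numpy as np

    def spherical(h, c0, c, a):
        h = np.asarray(h, dtype=float)
        g = c0 + c * (1.5 * h / a - 0.5 * (h / a) ** 3)
        return np.where(h < a, g, c0 + c)              # sill reached at h = a

    def exponential(h, c0, c, a):
        h = np.asarray(h, dtype=float)
        return c0 + c * (1.0 - np.exp(-h / a))         # effective range about 3a

    def gaussian(h, c0, c, a):
        h = np.asarray(h, dtype=float)
        return c0 + c * (1.0 - np.exp(-(h / a) ** 2))  # parabolic near the origin

    lags = np.array([10.0, 50.0, 100.0, 200.0, 400.0])
    print(spherical(lags, c0=0.10, c=0.15, a=150.0))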

Hole-Effect Model

This model is used to represent fairly continuous processes that show a periodic behavior,
such as a succession of rich and poor zones. Its equation is given by

γ(h) = c [1 - sin(wh)/(wh)]                                   (4.2.5)

In this equation, c is synonymous to the sill value, w is a constant related to the wavelength,
and h is distance. Greater flexibility in fitting this model is achieved by using the following modified
equation:

γ(h) = c {1 - [sin(wh+p)/(wh+p)]}                             (4.2.6)

The addition of constant p allows the model to be shifted left or right along the x-axis. This
model could also be modified by adding a nugget effect term to the equation.

Other theoretical models that are used to describe a variogram are the DeWijsian model,
power model, and cubic model. However, these models are not as frequently used in practice as the
models described above. Figure 4.2.1 shows how various theoretical variogram models look. Figure
4.2.2 shows a plot of sample variogram and a spherical theoretical model fit to this variogram.

Figure 4.2.1 Theoretical Variogram Models

Figure 4.2.2 Variogram Plot and Theoretical Model Fit

4.3 ANISOTROPY

Anisotropy exists if the structural character of the mineralization of an ore deposit differs for
various directions. For example, the grade may be more continuous along the strike direction than
it is down the dip direction. This can be determined by comparing the variograms calculated for
different directions within the deposit.

Geometric anisotropy

There are two types of anisotropy: Geometric anisotropy and zonal anisotropy. Geometric
anisotropy is present if the nugget and sill of the variograms are generally the same, but their ranges
are different in various directions. Because the nugget and sill of the variograms are the same, a
simple linear rescaling of the coordinates is sufficient to transform one variogram to another, or simply to
make them isotropic. The ratio of the major axis over the minor axis of the ellipse that is used to
perform the necessary transformation is called the anisotropy factor.

The variograms can be anisotropic in three dimensions, in which case two anisotropy factors
would be defined, corresponding to the ratios of the lengths of the three axes of an ellipsoid; namely, the
major axis, the minor axis, and the vertical axis.

One way to check for geometric anisotropy is to make a contour plot of the variogram values,
γ(h), for different directions on the plane where one thinks, for example, the major and minor axes
of anisotropy are located. If the resulting contours display an elliptical shape, this would indicate the
presence of anisotropic mineralization on that plane. More or less circular contours would indicate
isotropic mineralization. Figure 4.3.1 gives a sample plot of variogram contours generated to detect
geometric anisotropy. Before generating this type of contour map, a set of directional variograms must
be generated on the plane of interest from which variogram values at varying distances are retrieved
and contoured.

Zonal anisotropy

Zonal anisotropy is present if the nugget and range of the variograms are generally the same,
but their sills are different in various directions. This situation is encountered in deposits in which
the mineralization is layered or stratified. The variation in grade for a particular direction is due not
only to distance, but also to the number of layers crossed.

Figure 4.3.1 Variogram Contours

Zonal anisotropy is much more difficult to handle during estimation than geometric
anisotropy. Quite often, combinations of geometric and zonal anisotropy are encountered and can
be very difficult to interpret. One way to deal with zonal anisotropy is to partition the data into zones,
and analyze each zone separately. Another way to handle zonal anisotropy is to use nested variogram
structures which will be discussed next. The difference in the sill values is expressed as one nested
structure applicable only along that specific direction. Figure 4.3.2 gives examples of geometrical
and zonal anisotropies.

Nested Structure

A variogram function can often be modeled by combining several variogram functions:

γ(h) = γ1(h) + γ2(h) + ... + γn(h)

or

γ(h) = Σ γi(h)     i = 1,...,n                                (4.3.1)

For example, there might be two structures displayed by a variogram. The first structure may
describe the correlation on a short scale. The second structure may describe the correlation on a
much larger scale. These two structures can be defined by a nested variogram model.

In using nested models, one is not limited to combining models of the same shape. Often the
sample variogram will require a combination of different basic models. For example, one may
combine spherical and exponential models to handle a slow rising sample variogram that reaches the
sill asymptotically.

To illustrate the nested model concept, the three simple structures shown in Figure 4.3.3 are
combined to give the resulting nested variogram.
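
In code, a nested model is simply the sum of its component structures (Equation 4.3.1). The short
sketch below, which reuses the spherical function from the previous example, combines a nugget and
two spherical structures with made-up parameters:

    import numpy as np

    def spherical(h, c, a):
        h = np.asarray(h, dtype=float)
        return np.where(h < a, c * (1.5 * h / a - 0.5 * (h / a) ** 3), c)

    def nested(h):
        # gamma(h) = nugget + short-scale structure + large-scale structure
        return 0.05 + spherical(h, c=0.10, a=60.0) + spherical(h, c=0.12, a=250.0)

    print(nested(np.array([10.0, 60.0, 150.0, 300.0])))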

Figure 4.3.2 Geometrical and Zonal Anisotropy

Figure 4.3.3 Nested Structures

4.4 VARIOGRAM TYPES

The development of variograms can be frustrating on data with a highly skewed
distribution. Because a variogram is computed by taking the squared differences between the data
pairs, a few high-grade outliers may contribute so significantly that some of the points in the
variogram are very erratic.

There are several types of variograms that are often used to produce clearer descriptions of the
spatial continuity.

Relative Variograms

A relative variogram, γR(h), is obtained from the ordinary variogram by simply dividing each
point on the variogram by the square of the mean of all the data used to calculate the variogram value
at that lag distance:

γR(h) = γ(h) / [m(h) + c]²                                    (4.4.1)

where c is a constant parameter used in the case of a three-parameter lognormal distribution.

Pairwise Relative Variogram

Another type of relative variogram that often helps to produce a clearer display of the spatial
continuity is the pairwise relative variogram. This particular relative variogram also adjusts the
variogram calculation by a squared mean. However, the adjustment is done separately for each pair
of sample values, using the average of the two values as the local mean. This serves to reduce the
influence of very large values on the calculation of the moment of inertia. The equation of pairwise
relative variogram is given as

γPR(h) = 1/(2n) Σ [(vi - vj)² / ((vi + vj)/2)²]               (4.4.2)

where vi and vj are the values of a pair of samples at locations i and j, respectively.

The reason behind the computation of a relative variogram is an implicit assumption that the
assay values display a proportional effect. In this situation, the relative variogram tends to be
stationary, even though the ordinary variogram (or covariance) is not stationary. If the relationship
between the local mean and the standard deviation is something other than linear, one should
consider scaling the variograms by some function other than the squared local mean, mi².
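
A minimal sketch of the pairwise relative variogram of Equation 4.4.2 is shown below for samples
along a single line; equal spacing is assumed so that a lag class reduces to an index offset, and the
sample values are the ones used in the example of Section 4.1.

    def pairwise_relative_variogram(values, step):
        """Equation 4.4.2 for equally spaced samples and a lag of `step` positions."""
        pairs = [(values[i], values[i + step]) for i in range(len(values) - step)]
        terms = [((vi - vj) ** 2) / (((vi + vj) / 2.0) ** 2) for vi, vj in pairs]
        return sum(terms) / (2.0 * len(pairs))

    print(pairwise_relative_variogram([0.14, 0.28, 0.19, 0.10, 0.09], step=1))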

Logarithmic (Log) Variograms

A log variogram, γL(h), is obtained by calculating the ordinary variogram using the
logarithms of the data, instead of the raw (untransformed) data.

The reason for transforming the raw data into logarithms is to reduce or eliminate the impact
of extreme data values on the variogram structure. The data transformation accomplishes this
objective by simply reducing the range of variability of the raw data.

After computing a logarithmic variogram, one may want to transform its parameters back to
normal values. This can be done using the following steps:

1. Use the range given by logarithmic variograms as the range of the normal variogram.

2. Estimate the logarithmic mean (α) and variance (β²). Use the sill of the logarithmic
variogram as the estimate of β², particularly if the sill is not equal to the variance computed from
the log data.

3. Calculate the mean (µ) and the variance (σ²) of the normal data using

µ = exp (α + β²/2)                                            (4.4.3)

σ² = µ² [exp (β²) - 1]                                        (4.4.4)

4. Set the sill of the normal variogram equal to the variance (σ²) computed above.

5. Compute the c (sill - nugget) and c0 (nugget) values of the normal variogram using

c = µ² [exp (clog) - 1]                                       (4.4.5)

c0 = sill - c                                                 (4.4.6)

The final acceptance of these parameters should come only after an extensive cross validation of
the selected variogram model.
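
The back-transformation steps above can be condensed into a short function. In the sketch below,
alpha and beta2 are the logarithmic mean and variance of Step 2, and c_log is the fitted c (sill minus
nugget) of the logarithmic variogram; the log mean and standard deviation are taken from the
variogram listing of Figure 4.1.1, while the value of c_log is hypothetical.

    import math

    def backtransform_log_variogram(alpha, beta2, c_log):
        """Convert logarithmic variogram parameters to raw-data ('normal')
        parameters using Equations 4.4.3 - 4.4.6."""
        mu = math.exp(alpha + beta2 / 2.0)            # Eq. 4.4.3
        sigma2 = mu ** 2 * (math.exp(beta2) - 1.0)    # Eq. 4.4.4, used as the sill
        c = mu ** 2 * (math.exp(c_log) - 1.0)         # Eq. 4.4.5
        c0 = sigma2 - c                               # Eq. 4.4.6
        return {"mean": mu, "sill": sigma2, "c": c, "nugget": c0}

    # Log mean and std. dev. from Figure 4.1.1; c_log is a hypothetical fit.
    print(backtransform_log_variogram(alpha=-0.6578, beta2=1.0567 ** 2, c_log=0.9))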

Covariance Function Variograms

This is a relatively new framework to obtain a variogram through a direct estimation of
covariances. The method was proposed by Isaaks and Srivastava of Stanford University in 1987. It
is based on the observation that the sample covariance function reflects the character of the spatial
continuity better than the sample variogram, particularly under the conditions of heteroscedasticity
(proportional effect) and preferential clustering of data.

This deterministic framework, also known as the non-ergodic framework, is more appropriate
when sample information is interpolated (as opposed to extrapolated) within the same domain. This
is most often the situation during ore reserve estimation. A non-ergodic process, therefore, implies
a situation whereby local means are different from the population mean, as is usually the case under
heteroscedasticity.

The covariance function, C(h), can be calculated from the following:

C(h) = 1/N Σ (vi · vj) - (m-h · m+h)                          (4.4.7)

The data values are v1,...,vn; the summation is over only the N pairs of data whose locations
are separated by h. m-h is the mean of all the data values whose locations are -h away from some
other data location. Similarly, m+h is the mean of all the data values whose locations are +h away
from some other data location.

The above equation is sometimes referred to as the covariogram. The covariogram and the
variogram are related by the formula:

γ(h) = C(0) - C(h)                                            (4.4.8)

Since the value of the covariogram at h=0 is simply the sample variance, the value obtained at each lag
for the covariogram is subtracted from the sample variance to give the covariance function variogram.

Because of the way it is computed, the covariance function variogram is usually better
behaved than a normal variogram for data with a skewed distribution. However, having obtained a
well-behaved variogram does not eliminate the shortcomings of linear geostatistics, where one still
has to face the typical problem of how to handle the outlier data during ordinary kriging in order to
minimize the common occurrence of overestimation in grades as well as in tonnages.

Correlograms

This is another relatively new technique to measure the spatial continuity of data through the
correlation function. By definition, the correlation function ρ(h) is the covariance function (Equation
4.4.7) standardized by the appropriate standard deviations:

ρ(h) = C(h) / (σ-h · σ+h)                                     (4.4.9)

where σ-h is the standard deviation of all the data values whose locations are -h away from some
other data location:

σ²-h = 1/N Σ (vi² - m²-h)                                     (4.4.10)

Similarly, σ+h is the standard deviation of all the data values whose locations are +h away
from some other data location:

σ²+h = 1/N Σ (vj² - m²+h)                                     (4.4.11)

Like the means, the standard deviations σ-h and σ+h are usually not equal in practice. The
shape of the correlation function is similar to that of the covariance function. Therefore, it needs to be
inverted to give a variogram type of curve, which we call the correlogram. Since the correlation function
is equal to 1 when h=0, the value obtained at each lag for the correlation function is subtracted from 1
to give the correlogram.
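
For equally spaced samples along a line, the non-ergodic covariance and the correlogram of
Equations 4.4.7 through 4.4.11 can be sketched as below. The head (+h) and tail (-h) means and
standard deviations are computed separately, as the text requires, and the sample values are again
the ones from the earlier example.

    import numpy as np

    def covariance_and_correlogram(values, step):
        """Equations 4.4.7 and 4.4.9 for equally spaced data and a lag of
        `step` positions; returns (C(h), correlogram value 1 - rho(h))."""
        v = np.asarray(values, dtype=float)
        tail, head = v[:-step], v[step:]          # the -h and +h ends of each pair
        m_minus, m_plus = tail.mean(), head.mean()
        s_minus, s_plus = tail.std(), head.std()  # 1/N standard deviations
        c_h = np.mean(tail * head) - m_minus * m_plus         # Eq. 4.4.7
        rho_h = c_h / (s_minus * s_plus)                      # Eq. 4.4.9
        return c_h, 1.0 - rho_h

    print(covariance_and_correlogram([0.14, 0.28, 0.19, 0.10, 0.09], step=1))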

Indicator Variograms

Indicator variograms are computed using indicators of 0 or 1 based on a specified cutoff.
Therefore, to compute an indicator variogram, one must transform the raw data into indicator
variables. These variables are obtained through the indicator function, which is defined as:

i(x;zc) = 1,  if z(x) ≤ zc
        = 0,  otherwise                                       (4.4.12)

where:
x is location,

zc is a specified cutoff value,

z(x) is the value at location x.

Transformation of the raw data z(x) into the indicator variable i(x;zc) is a non-linear transform. All
spatial distributions of indicator variables at sampled points will have the same basic form of 0 or
1. If the observed grade is less than or equal to the cutoff grade, the indicator will be 1. Otherwise, it will
be zero. It is obvious that the indicator variable, i(x;zc), will change as the cutoff grade changes.

The best defined experimental indicator variogram for a given set of data is usually that
variogram corresponding to cutoff grades zc close to the median grade. This is because about 50%
of the indicator data are equal to 1 and the rest are equal to 0. Therefore, the expected sill value of
the median variogram is 0.25. It is also the maximum sill value of indicator variograms.

If one is calculating indicator variograms at many cutoff grades, the sill values of indicator
variograms increase until the median variogram is computed. As cutoff grades continue to increase,
more and more indicator data become equal to 1, thus decreasing the sill values but not necessarily
the ranges of the variograms.
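
The indicator transform of Equation 4.4.12 is a one-line operation; once the data are coded as 0 or 1,
the same routine used for ordinary variograms can be applied to the indicators. A minimal sketch,
using the five sample grades from the earlier example and an arbitrary cutoff:

    def indicator_transform(values, cutoff):
        """Equation 4.4.12: 1 where the value is at or below the cutoff, else 0."""
        return [1 if z <= cutoff else 0 for z in values]

    grades = [0.14, 0.28, 0.19, 0.10, 0.09]
    print(indicator_transform(grades, cutoff=0.14))   # -> [1, 0, 0, 1, 1]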

Cross Variograms

Like the variogram for spatial continuity of a single variable, the cross variogram is used to
describe the cross-continuity between two variables. Cross variogram between variable u and
variable v can be calculated using the following discrete method:

γCR(h) = 1/(2n) Σ [u(xi) - u(xi+h)] · [v(xi) - v(xi+h)]       (4.4.13)

There are several unique properties associated with cross variograms. One such property is
that the ordinary variogram is always positive, whereas cross variograms can take negative values. A
negative value of a cross variogram simply indicates that an increase in one variable corresponds to a
decrease in the other variable.

Calculation of cross variograms is a necessary step in cokriging and probability kriging.

4.5 FITTING A THEORETICAL VARIOGRAM MODEL

One needs to fit a theoretical model or mathematical function to the experimental or sample
variogram points in order to put it into a practical use. Usually, this fitting is done interactively using
a computer program. However, if one has to resort to a manual fitting of the model to the sample
variogram, the following steps can be useful:

1. Draw the variance as the sill (c + c0)

2. Project the first few points to the y-axis. This is an estimate of the nugget (c0).

3. Project the same line until it intercepts the sill. This distance is two thirds of the range for
the spherical model.

4. Now, using the estimates of range, sill, nugget and the equation of the mathematical
model under consideration (e.g., spherical model), calculate a few points and see if the curve fits the
sample variogram.

5. If necessary, modify the parameters and repeat Step 4 to obtain a better fit.

It should be noted that the nugget value (c0) should be estimated more accurately than the sill
and the range of the variogram. This is because the values of the variogram near the origin are
extremely important in estimation problems.

Verification of Model Parameters

Fitting a theoretical variogram model is often quite simple as long as the sample variogram
is well behaved. Usually visual fits are satisfactory under these conditions. Unfortunately, some
sample variograms, mostly those calculated using data with skewed distributions, are not well
behaved. These variograms may not resemble closely any of the theoretical models studied. Under
these circumstances, model fitting becomes a challenging task. It must be noted that in all cases both
choosing the model and estimating its parameters are quite subjective.

4.6 CROSS VALIDATION

There are so many interdependent subjective decisions in a geostatistical study that it is a
good practice to validate the entire geostatistical model and kriging plan prior to any production run.
Thus one may want to check the results of a kriging plan using different variogram parameters as
well as different approaches in search strategy. This can be done by a technique that allows us to
compare the estimated and the true values using the information available in our sample data set.
This technique is called cross validation or point validation. It is also sometimes called jack-knifing,
although, strictly speaking, the term jack-knifing applies to resampling without replacement, i.e., when
one set of data values is re-estimated from another, non-overlapping data set. In cross
validation, actual data are dropped one at a time and re-estimated from some of the neighboring
data. Each datum is returned to the data set once it has been re-estimated.

Thus with cross validation, using a “leave-one-out” approach, one estimates (predicts) a
known data point using a candidate variogram model and point kriging (or any other interpolation
method), pretending that this data point is not known. In other words, only the surrounding data
points are used to krige this data point, while leaving the data point out.

Once the estimated grade is calculated, one can determine the error between the estimated
value and the true known value for this data point. The procedure is repeated for all known data
points in the test area, to compute the error statistics such as the mean error, variance of errors and
the average kriging variance for specified model parameters. For comparison, the overall process is
repeated using different variogram parameters or models. The “correct” parameters to be chosen are
the ones which produce:

•  The least amount of average estimation error.

•  A variance of errors, or a weighted square error, that is closest to the
   average kriging variance.

The weighted square error (WSE) is given by the following equation:

WSE = Σ [(1/σi²) (ei)²] / Σ (1/σi²)                           (4.6.1)

where ei is the difference between the predicted value at point i and the known value, and σi² is the
kriging variance for point i. The weighting by the inverse of the kriging variance gives more weight
to those points that should be closely estimated and vice versa.
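
Given vectors of true values, cross validation estimates and kriging variances, however they were
produced, the error statistics discussed above, including the weighted square error of Equation 4.6.1,
can be computed as in the sketch below; the short arrays shown are hypothetical.

    import numpy as np

    def cross_validation_statistics(true_vals, estimates, kriging_vars):
        errors = np.asarray(estimates, dtype=float) - np.asarray(true_vals, dtype=float)
        kv = np.asarray(kriging_vars, dtype=float)
        wse = np.sum(errors ** 2 / kv) / np.sum(1.0 / kv)     # Eq. 4.6.1
        return {"mean error": errors.mean(),
                "error variance": errors.var(),
                "avg kriging variance": kv.mean(),
                "weighted square error": wse}

    print(cross_validation_statistics(true_vals=[0.74, 0.51, 1.20, 0.33],
                                      estimates=[0.70, 0.58, 1.05, 0.40],
                                      kriging_vars=[0.09, 0.12, 0.10, 0.11]))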

A cross validation study may help us to choose between different weighting procedures,
between different search strategies, or between different variogram models and parameters.
Unfortunately, cross validation results are most commonly used simply to compare the distribution
of the estimation errors from different estimation methods. Such a comparison, especially if similar
techniques are being compared, may fall short of clearly indicating which alternative is best.

However, cross validation results have important spatial information, and a careful study of the
spatial distribution of errors, with a specific focus on the final goals of the estimation exercise can
provide insights into where an estimation procedure may run into trouble. Since such insights may
lead to case-specific improvements in the estimation procedure, cross validation is a useful
preliminary step before final estimates are calculated. The exercise of cross validation is analogous
to a dress rehearsal: it is intended to detect what could go wrong, but it does not ensure that the show
will be successful.

There are limitations of cross validation that should be kept in mind when analyzing the
results of a cross validation study. For example, it can generate pairs of true and estimated values
only at sample locations. Its results usually do not accurately reflect the actual performance of an
estimation method because estimation at sample locations is typically not representative of
estimation at all of the unsampled locations.

In other practical situations, particularly three-dimensional data sets where the samples are
located very close to one another vertically but not horizontally, cross validation may produce very
optimistic results. Discarding a single sample from a drillhole and estimating the value using other
samples from the same drillhole will produce results that make any estimation procedure appear to
perform much better than it will in actual use. The idea of cross validation is to produce sample data
configurations that mimic the conditions under which the estimation procedure will actually be used.
If very close nearby samples will not be available in the actual estimation, it makes little sense to
include them in cross validation. In such situations, it is common to discard more than the single
sample at the location we are trying to estimate. It may, for example, be wiser to discard all of the
samples from the same drillhole. This, however, puts us back in the situation of producing cross
validated results which are probably a bit too pessimistic. Another alternative, in this case, will be
to discard only the data within a specified distance to the point being estimated.

If the data have outlier high grades, it may also be a good idea not to include them in the cross
validation study. If removing a data point is going to make it impossible for the points around it
to give a reasonable estimate for that point, then there is no reason to include that data point.
Including extreme data points which have little or no corresponding data elsewhere
in the deposit only makes the cross validation results look worse. However, the outlier data and their
impact on the estimation should be dealt with separately. Figure 4.6.1 shows a sample output from a
cross validation program.

Variable : CU

ACTUAL KRIGING DIFF


------- ------- -------

Mean = .7347 .7417 -.0070


Std. Dev = .4968 .3791 .2920
Minimum = .0000 .0200 -.9000
Maximum = 3.7000 2.1300 2.0100
Skewness = .9851 .5129 1.0620
Peakedness= 1.6682 .0576 5.0352

Ave. kriging variance = .1006


Weighted square error = .0836

Number of samples used = 925

Correlation coefficient= .8104

Least square fit line (Y = A + B * X):

Intercept (A) = -.0531


Slope (B) = 1.0621

Standard error of estimate = .2912

95% confidence interval on A = .0413

95% confidence interval on B = .0495

Figure 4.6.1 Sample Output of Cross Validation

5
Random Processes and Variances

The theory of regionalized variables is developed on a mathematical model in which the
grade of ore in a deposit is considered the result of a random process in three-dimensional space. It
is obvious that estimation requires a model of how the phenomenon behaves at unsampled locations.
Without a model, one has only the sample data, and no inferences can be made about the unknown
values at locations that were not sampled. One of the important contributions of geostatistics is the
emphasis it places on a statement of the underlying model.

5.1 THEORY OF REGIONALIZED VARIABLES

The theory of regionalized variables developed by Matheron forms the mathematical basis
of geostatistics. The key point of this theory is that the geological or mineralogical process active in
the formation of an orebody is interpreted as a random process. Thus, the grade at any point in a
deposit is considered as a particular outcome of this random process. This probabilistic interpretation
of a natural process as a random process is necessary to solve the practical problems of estimation
of the grade of an ore deposit. Such an interpretation is simply a conceptualization of reality and is
valid only so far as it creates a better picture of reality and permits practical problems to be solved.

Regionalized Variable

Regionalized variables are random variables that are correlated in space (or in time). The
grade of ore, the thickness of a coal seam, and the elevation of the surface of a formation are examples of
regionalized variables. In the geostatistical framework, such variables could be called random
variables, but the term regionalized is used to indicate that such variables are spatially correlated to
some degree.

A function Z(x) that assigns a value to a point x in three dimensional space is called a
regionalized variable. This function Z(x) displays a random aspect consisting of highly irregular and
unpredictable variations, and a structured aspect reflecting the structural characteristics of the
regionalized phenomenon.

The main purpose of the theory of regionalized variables is to express the structural
properties of the regionalized variables in adequate form, and to solve the problem of estimating
these variables from the sample data.

Random Processes

If we view the grade of ore in a deposit to be the result of a random process, then we can
represent this process by a model using a random function Z(x), where x corresponds to any point
in three-dimensional space. Since there are an infinite number of points in a deposit, there are an
infinite number of grades, which can theoretically be described using an infinite number of probability
distributions.

A random function Z(x) in its generality consists of an infinite collection of density functions
in a probability space which consists of a collection of possible events. Such a generalized model
of an ore forming process and also its result, the grade of the deposit, would be impossible to
manipulate. Consequently, certain assumptions are usually made in defining the particular random
process of interest.

One such assumption is the stationarity of the process. Stationarity occurs when the
regionalization is repeated throughout the orebody. This means that the infinite collection of density
functions do not vary from one location to another in their statistical characteristics—they belong
to the same family of density functions. In other words, similar ore distributions are as likely to be
found in one part of the orebody as in another. For example, if a grade at a given point can be
described using a normal density function, then another grade at a different location must also be
described using a normal density function. However, the parameter values of the density functions
need not be constant from one point to another point in a deposit, unless one makes additional
assumptions about these parameter values.

Some of the frequently invoked assumptions about the random process in ore reserve
estimation are:

1. Strong stationarity

2. Second order stationarity

3. Intrinsic hypothesis (weaker second order stationarity)

Strong Stationarity

In order for a random function Z(x) to meet the strong stationarity requirement, the following
properties must be satisfied.

E[Z(x)] = m,        m finite and independent of x             (5.1.1)

Var[Z(x)] = σ²,     σ² finite and independent of x            (5.1.2)

The first condition implies that there is no gradual increase or decrease in grade for some
specified direction. That means there is no drift. The second condition implies a constant parameter
value of the underlying density functions.

Second Order Stationarity

A random function Z(x) is called second order stationary if, in place of the variance requirement
of Equation 5.1.2, the following two conditions are satisfied:

E[Z(x)] = m,        m finite and independent of x             (5.1.3)

E[Z(x+h) · Z(x)] - m² = C(h),   finite and independent of x   (5.1.4)

Equation 5.1.4 defines the covariogram function C(h) introduced earlier. This condition implies that
for each pair of random variables Z(x+h) and Z(x), the covariance exists and depends only on the
separation distance h. The covariance does not depend on the particular location x within the deposit.

The stationarity of covariance implies the stationarity of the variance as well as the
variogram. Under this assumption, the relationship between the variogram and the covariogram is
given below.

γ(h) = C(0) - C(h) = Var[Z(x)] - C(h)                         (5.1.5)

Note that covariogram C(h) gives the covariances as a function of the distance h.

Intrinsic Hypothesis

The intrinsic hypothesis represents a weaker form of second order stationarity. Furthermore,
this hypothesis can be of two types: The intrinsic hypothesis of order zero, and intrinsic hypothesis
of order one.

a) For the intrinsic hypothesis of order zero, the following conditions must be satisfied.

E[Z(x)] = m,        m finite and independent of x             (5.1.6)

E[Z(x+h) - Z(x)]² = 2γ(h),   finite and independent of x      (5.1.7)

Note that this last equation is the definition of the variogram function. Under this hypothesis, we
assume no drift, and require only the existence and the stationarity of the variogram.

Often the condition of no drift in a deposit cannot be satisfied in practice. Under such
circumstance, the intrinsic hypothesis of order one is invoked.

b) For the intrinsic hypothesis of order one, the following conditions must be satisfied.

E[Z(x+h) - Z(x)] = m(h),    finite and independent of x       (5.1.8)

E[Z(x+h) - Z(x)]² = 2γ(h),  finite and independent of x       (5.1.9)

Under this hypothesis, instead of a finite and constant mean m, Equation 5.1.8 simply requires
that the difference in the means be finite, independent of the support point x, and dependent only
on the separation distance h.

In performing local estimation using ordinary kriging, the intrinsic hypothesis of order zero
is invoked, whereas the technique called universal kriging must be employed under the first order
hypothesis.

5.2 RANDOM FUNCTION MODELS

Estimation requires a model of how the phenomenon behaves at locations where it has not
been sampled. This is because, without a model, one has only the sample data and no inferences can
be made about the unknown values at locations that were not sampled. There are two types of models,
deterministic and probabilistic, which can be applied to the random process under study,
depending on how much one knows about that process.

Deterministic Models

The most desirable information that can be brought to bear on the problem is a description
of how the phenomenon was generated. In certain situations, the physical and chemical processes
that generated the data set might be known in sufficient detail so that an accurate description of the
entire profile can be made from only a few sample values. In such situations, a deterministic model
is appropriate.

For example, the sample data set in Figure 5.2.1 consists of seven locations and seven v
values. By itself, this sample data set provides virtually no information about the entire profile of v.
All one knows from the samples is the value of v at seven particular locations. Estimation of the
values at unknown locations demands that one must bring in additional information or make some
assumptions.

Imagine that the seven sample data were measurements of the height of a bouncing ball.
Knowledge of the physics of the problem and the horizontal velocity of the ball would allow one to
calculate the trajectory of this ball shown in Figure 5.2.2. While this trajectory depends on certain
simplifying assumptions, and is therefore somewhat idealized, it still captures the overall
characteristics of a bouncing ball and serves as a very good estimate of the height at unsampled
locations. In this particular example, one relies very heavily on the deterministic model used. In fact,
the same estimated profile could have been calculated with a smaller data set. The deterministic
model allows reasonable extrapolation beyond the available sampling.

From this example, it is clear that deterministic modeling is possible only if the context of
the data values is well understood. The data values, by themselves, do not reveal what the
appropriate model should be.

Figure 5.2.1 Sample Points on a Profile to be Estimated

Figure 5.2.2 A Deterministic Model Curve

Probabilistic Models

Very few earth science processes are understood in sufficient detail to permit the application
of deterministic models. Though one does not know the physics and chemistry of many fundamental
processes, the variables of interest in earth science data sets are typically the end result of a vast
number of processes whose complex interactions cannot be described quantitatively. For the vast
majority of earth science data sets, one is forced to admit that there is some uncertainty about how
the phenomenon behaves between the sample locations. Probabilistic random function models
recognize this fundamental uncertainty and provide tools for estimating values at unknown locations
once some assumptions about the statistical characteristics of the phenomenon are made.

In a probabilistic model, the available sample data are viewed as the result of some random
process. From the outset, it seems like this model conflicts with reality. The processes that create an
ore deposit are certainly extremely complicated, and our understanding of them may be so poor that
their complexity appears as random behavior. However, this does not mean that they are random.
It simply means that one does not know enough about that particular process. Although the word
random often connotes unpredictable, one should view the sample data as the outcome of some
random process. This will help the ore reserve practitioners with the problem of predicting unknown
values.

It is possible in practice to define a random process that might have conceivably generated
any sample data set. The application of the most commonly used geostatistical estimation
procedures, however, does not require a complete definition of the random process. It is sufficient
to specify only certain parameters of the random process.

With any estimation procedure, whether deterministic or probabilistic, one inevitably wants
to know how good the estimates are. Without an exhaustive data set against which one can check
the estimates, the judgment of their goodness is largely qualitative and depends to a large extent on
the appropriateness of the underlying model. As the conceptualization of the phenomenon that allows
one to predict what is happening at locations where there are no samples, models are neither right nor
wrong. Without additional data, no proof of their validity is possible. They can, however, be judged
as appropriate or inappropriate. Such a judgment, which must take into account the goals of the study
and whatever qualitative information is available, will benefit considerably from a clear statement
of the model.

5.3 BLOCK VARIANCE

The block variance, expressed as σ²(v/D), refers to the variability among grades of a given
support (size, shape and orientation). If one treats the grade of each fixed-size block as one sample,
then the block variance is simply the variance of these samples. For a given block size, there is only
one block variance in the deposit. The block variance relationship, which is also known as the
“volume-variance” relationship, therefore describes the variability associated with different supports of
blocks or samples. Samples of large support volume have a smaller variance than those of smaller
volume. In other words, as the block size gets larger, its block variance gets smaller.

Figure 5.3.1 shows the relationship of variance to sample support. In this figure, the data with
a small support, such as the composite grades, will have the curve A. On the other hand, the data
with a larger support, such as the block grades, will have the curve B.

In Figure 5.3.1, the global means of the two curves are the same. However, if a sub-interval
such as the area to the right of the cutoff grade is considered, the conditional mean grade of curve
A is higher than the conditional mean grade of curve B. In other words, the conditional mean grade
of ore material having a large support is always lower than that of a smaller support, whenever an
economic cutoff grade is applied to the distribution.

Krige’s Relationship

Krige’s relationship relates the variance of block grades within a deposit to point grades
within the deposit and within the block. It is expressed in the following form:
σ²(v/D) = σ²(o/D) - σ²(o/v)                                   (5.3.1)

where

σ²(v/D) = the variance of block grades within the deposit,

σ²(o/D) = the variance of point grades within the deposit,

σ²(o/v) = the variance of point grades within the block.

This relationship can be used to calculate a variance reduction factor to address the problem
with mining dilution in an ore deposit.
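
Krige's relationship can be applied directly once the two point variances are available. The sketch
below also prints one common form of the variance reduction factor, taken here as the ratio of the
block variance to the point variance; both this definition and the numbers are assumptions for
illustration only.

    def block_variance(point_var_in_deposit, point_var_in_block):
        """Krige's relationship, Equation 5.3.1."""
        return point_var_in_deposit - point_var_in_block

    s2_oD = 0.247   # variance of point (composite) grades within the deposit
    s2_ov = 0.090   # variance of point grades within a block (from the variogram)
    s2_vD = block_variance(s2_oD, s2_ov)
    print("block variance:", s2_vD)
    print("variance reduction factor:", s2_vD / s2_oD)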

A = Assay or Composite Grades
B = Block Grades

Figure 5.3.1 Relation of Variance to Sample Support

6
Declustering

There are two declustering methods that are generally applicable to any sample data set.
These methods are the polygonal method and the cell declustering method. In both methods, a
weighted linear combination of all available sample values is used to estimate the global mean. By
assigning different weights to the available samples, one can effectively decluster the data set.

6.1 POLYGONAL DECLUSTERING

In this method, each sample in the data set receives a weight equal to the area of its polygon of influence.
Figure 6.1.1 shows the polygons of influence of a sample data set. The perpendicular bisectors
between a sample and its neighbors form the boundaries of the polygon of influence. However, the
edges of the global area require special treatment. A sample located near the edge of the area of
interest may not be completely surrounded by other samples and the perpendicular bisectors with its
neighbors may not form a closed polygon. One solution is to choose a natural limit, such as a
property boundary or a geologic contact. This can then be used to close the border polygons. An
alternative, in situations where a natural boundary is not easy to define, is to limit the distance from
a sample to any edge of its polygon of influence. This has the effect of closing the polygon with the
arc of a circle.

By using the areas of these polygons of influence as weights in the weighted linear
combination, one can accomplish the necessary declustering. Clustered samples receive small
weights corresponding to their small polygons of influence. On the other hand, samples with large
polygons can be thought of as being representative of a larger area and therefore entitled to a larger
weight.

Figure 6.1.1 Sample Map of Polygons of Influence

6.2 CELL DECLUSTERING

In this method, the entire area is divided into rectangular regions called cells. Each sample
receives a weight inversely proportional to the number of samples that fall within the same cell.
Clustered samples will tend to receive lower weights with this method because the cells in which
they are located will also contain several other samples.

Figure 6.2.1 shows a grid of such cells superimposed on a number of clustered samples. The
dashed lines show the boundaries of the cells. Each of the two uppermost cells contains only one
sample, so both of these samples receive a weight of 1. The lower left cell contains two samples,
both of which receive a weight of 1/2. The lower right cell contains eight samples, each of which
receives a weight of 1/8.

Since all samples within a particular cell receive equal weights and all cells receive a total
weight of 1, the cell declustering method can be viewed as a two-step procedure. First, the samples
are used to calculate the mean value within moving windows; then these moving-window means are
used to calculate the mean of the global area.

Guidelines For Choosing Cell Size

The estimate one gets from the cell declustering method will depend on the size of the cells
specified. If the cells are very small, then most samples will fall into cells of their own and will
therefore receive equal weights of 1. If the cells are too large, many samples will fall into the same
cell, thereby causing artificial declustering of the samples.

If there is an underlying pseudo-regular grid, then the spacing of this grid usually provides
a good cell size. If the sampling pattern does not suggest a natural cell size, a common practice is
to try several cell sizes and pick the one that gives the lowest estimate of the global mean. This is
only appropriate if clustered sampling is exclusively in the areas with high grade values. In such
cases, which are common in practice, the clustering of the samples will tend to increase the estimate
of the mean. Therefore, choosing the cell size that produces the lowest estimate can be a proper
approach. However, if the data are known to be clustered in low valued areas, then one should
choose a cell size that yields the highest declustered mean.

A sample output from a cell declustering program is given in Figure 6.2.2.

Figure 6.2.1 An Example of Cell Declustering

Item: CU

Min. Easting = 1000.0


Max. Easting = 4000.0
Min. Northing = 4000.0
Max. Northing = 6500.0
Min. Elevation= 2000.0
Max. Elevation= 2960.0

Cell Size in X= 60.0


Cell Size in Y= 60.0
Cell Size in Z= 15.0

#of Cells in X= 50
#of Cells in Y= 42
#of Cells in Z= 64

Original Samples Declustered Samples


-------------------- --------------------
No. = 925 No. = 847
Mean = .7347 Mean = .7310
Std.Dev.= .4968 Std.Dev.= .4858
C.V. = .676 C.V. = .665

Skewness= .985 Skewness= .860


Kurtosis= 1.668 Kurtosis= .696

No. of cells with 1 sample = 778


No. of cells with 2 samples= 60
No. of cells with 3 samples= 9
No. of cells with 4 samples= 0
No. of cells with 5 samples= 0
No. of cells with >5 samples= 0

Figure 6.2.2 Sample Output From Cell Declustering Program

6.3 DECLUSTERED GLOBAL MEAN

The estimated global mean from declustering the sample data, DGM, is given by the
following equation:

DGM = Σ (wi · vi) / Σ wi          i = 1,...,n                 (6.3.1)

where n is the number of samples, wi are the declustering weights assigned to each sample, and vi
are the sample values. The denominator acts as a factor to standardize the weights so that they add
up to 1.

For the polygonal approach, Σwi is equal to the total area of influence of all the polygons.
For the cell declustering approach, Σwi is equal to the total number of occupied cells, since the weights
of the samples in each cell add up to 1.
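
A minimal sketch of cell declustering and the declustered global mean of Equation 6.3.1, for
two-dimensional data and a single trial cell size, is given below; the coordinates and grades are
hypothetical, with a cluster of high-grade samples in one corner.

    from collections import Counter

    def cell_declustered_mean(x, y, v, cell_size):
        """Weight each sample by 1/(number of samples in its cell); Eq. 6.3.1."""
        cells = [(int(xi // cell_size), int(yi // cell_size)) for xi, yi in zip(x, y)]
        counts = Counter(cells)
        weights = [1.0 / counts[c] for c in cells]
        # The sum of weights equals the number of occupied cells, as noted above.
        return sum(w * vi for w, vi in zip(weights, v)) / sum(weights)

    x = [10, 12, 14, 80, 95, 150]
    y = [10, 11, 13, 60, 65, 140]
    v = [1.9, 2.1, 2.0, 0.6, 0.5, 0.3]
    print(cell_declustered_mean(x, y, v, cell_size=50.0))   # lower than the naive mean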

The polygonal method has the advantage over the cell declustering method of producing a
unique estimate. In situations where the sampling does not justify choosing an appropriate cell size,
the cell declustering method will not be very useful.

7
Ordinary Kriging

Ordinary kriging is an estimator designed primarily for the local estimation of block grades
as a linear combination of the available data in or near the block, such that the estimate is unbiased
and has minimum variance. It is a method that is often associated with the acronym B.L.U.E. for best
linear unbiased estimator. Ordinary kriging is linear because its estimates are weighted linear
combinations of the available data; it is unbiased since the sum of the weights adds up to 1; it is best
because it aims at minimizing the variance of errors.

The conventional estimation methods, such as the inverse distance weighting method, are
also linear and theoretically unbiased. Therefore, the distinguishing feature of ordinary kriging from
the conventional linear estimation methods is its aim of minimizing the error variance.

7.1 KRIGING ESTIMATOR

The kriging estimator is a linear estimator of the following form:

Z* = Σ λi Z(xi)          i = 1,...,n                          (7.1.1)

where Z* is the estimate of the grade of a block or a point, Z(xi) refers to the sample grade, λi is the
corresponding weight assigned to Z(xi), and n is the number of samples.

The desired attribute of minimum estimation variance is achieved by minimizing the error
variance subject to the constraint that the sum of the weights must be equal to 1. In other words, the
weighting process of kriging is equivalent to solving a constrained optimization problem where the
objective function is the error variance, expressed as a function of the weights of Equation 7.1.1, and
the one constraint is as given below:

Minimize       σ²E = F(λ1, λ2, λ3, ..., λn)

Subject to     Σ λi = 1                                       (7.1.2)

This constrained optimization problem can be readily solved by the method of Lagrange
multipliers.

7.2 KRIGING SYSTEM

Ordinary kriging can be performed for estimation of a point or a block. The linear system of
equations for both cases is very similar.

Point Kriging

The point kriging system of equations in matrix form can be written in the following form:

    C · λ = D

    | C11 . . . C1n  1 |   | λ1 |   | C10 |
    |  .         .   . |   |  . |   |  .  |
    |  .         .   . | · |  . | = |  .  |                          (7.2.1)
    |  .         .   . |   |  . |   |  .  |
    | Cn1 . . . Cnn  1 |   | λn |   | Cn0 |
    |  1  . . .  1   0 |   |  µ |   |  1  |

     (n+1) x (n+1)        (n+1)x1    (n+1)x1

The matrix C consists of the covariance values Cij between the random variables Vi and Vj
at the sample locations. The vector D consists of the covariance values Ci0 between the random
variables Vi at the sample locations and the random variable V0 at the location where an estimate is
needed. The vector λ consists of the kriging weights and the Lagrange multiplier µ. It should be noted
that the random variables Vi, Vj, and V0 are the models of the phenomenon under study, and these
are parameters of a random function.
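
The following minimal Python/NumPy sketch assembles and solves the system of Equation 7.2.1 for
a single point. The spherical covariance model (nugget 0.1, sill 1.0, range 300) echoes the example
used later in Section 7.4, but the sample coordinates, grades, and target location are hypothetical,
and isotropy is assumed.

    import numpy as np

    def spherical_cov(h, nugget=0.1, sill=1.0, a=300.0):
        """Covariance from a spherical variogram model: C(h) = sill - gamma(h), C(0) = sill."""
        h = np.asarray(h, dtype=float)
        gamma = np.where(h < a, nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3), sill)
        gamma = np.where(h == 0.0, 0.0, gamma)
        return sill - gamma

    # hypothetical sample locations, grades, and the point x0 to be estimated
    xy = np.array([[10.0, 20.0], [80.0, 30.0], [45.0, 90.0], [120.0, 110.0]])
    grades = np.array([0.62, 0.88, 0.41, 0.95])
    x0 = np.array([60.0, 60.0])

    n = len(xy)
    C = np.ones((n + 1, n + 1))
    C[n, n] = 0.0
    C[:n, :n] = spherical_cov(np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2))  # Cij block
    D = np.ones(n + 1)
    D[:n] = spherical_cov(np.linalg.norm(xy - x0, axis=1))                              # Ci0 vector

    sol = np.linalg.solve(C, D)             # kriging weights followed by the Lagrange multiplier
    lam, mu = sol[:n], sol[n]
    z_star = lam @ grades                   # ordinary kriging estimate; weights sum to 1
    sigma2_ok = 1.0 - (lam @ D[:n] + mu)    # point kriging variance: sill - [sum(lam*Ci0) + mu]
    print(z_star, lam.sum(), sigma2_ok)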

Block Kriging

The block kriging system is similar to the point kriging system given in Equation 7.2.1 above.
In point kriging, the covariance vector D consists of point-to-point covariances. In block kriging,
it consists of block-to-point covariances. The block kriging system can therefore be written as
follows:

    C · λ = D

    | C11 . . . C1n  1 |   | λ1 |   | C1A |
    |  .         .   . |   |  . |   |  .  |
    |  .         .   . | · |  . | = |  .  |                          (7.2.2)
    |  .         .   . |   |  . |   |  .  |
    | Cn1 . . . Cnn  1 |   | λn |   | CnA |
    |  1  . . .  1   0 |   |  µ |   |  1  |

     (n+1) x (n+1)        (n+1)x1    (n+1)x1

The covariance value CiA is no longer a point-to-point covariance like Ci0, but the average
covariance between a particular sample and all of the points within A:

    CiA = 1/N Σ Cij ,    j = 1,...,N                                 (7.2.3)

where the sum is taken over the N points used to discretize A.

In practice, the area A is discretized using a number of points in the x, y, and z directions to
approximate CiA.

Kriging Variance

For each block or point kriged, a kriging variance is calculated. The block kriging variance
is given by:
    σ²OK = CAA - [ Σ(λi · CiA) + µ ]                                 (7.2.4)

The value CAA is the average covariance between pairs of locations within A. In practice, this
average block-to-block covariance is also approximated by discretizing the area A into several
points. It is important to use the same discretization for the calculation of point-to-block covariances
in D in Equation 7.2.2. If one uses different discretizations for the two calculations, there is a risk
of getting negative error variances from Equation 7.2.4.

For the point kriging variance, CAA in Equation 7.2.4 is replaced by the variance of the point
samples, or simply by the sill value of the variogram, and CiA is replaced by Ci0.
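
The block-related covariance averages can be sketched as follows (Python; the exponential
covariance parameters, the block geometry, and the sample locations are hypothetical). The same
discretization grid is used for both CiA and CAA, as recommended above.

    import numpy as np

    def cov(h, c=1.0, a=100.0):
        """Hypothetical exponential covariance model C(h) = c * exp(-3h/a)."""
        return c * np.exp(-3.0 * np.asarray(h, dtype=float) / a)

    def discretize_block(center, size, nd=(4, 4, 1)):
        """Regular grid of nd discretizing points inside a block (2-D if nd[2] == 1)."""
        axes = [center[k] - size[k] / 2 + (np.arange(nd[k]) + 0.5) * size[k] / nd[k] for k in range(3)]
        gx, gy, gz = np.meshgrid(*axes, indexing="ij")
        return np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])

    block_pts = discretize_block(center=(50.0, 50.0, 0.0), size=(20.0, 20.0, 0.0), nd=(4, 4, 1))
    samples = np.array([[10.0, 20.0, 0.0], [85.0, 40.0, 0.0], [55.0, 95.0, 0.0]])

    # CiA: average covariance between each sample and the block discretization points (Eq. 7.2.3)
    d_iA = np.linalg.norm(samples[:, None, :] - block_pts[None, :, :], axis=2)
    C_iA = cov(d_iA).mean(axis=1)

    # CAA: average covariance between all pairs of discretization points (used in Eq. 7.2.4)
    d_AA = np.linalg.norm(block_pts[:, None, :] - block_pts[None, :, :], axis=2)
    C_AA = cov(d_AA).mean()

    print(C_iA, C_AA)

These averages take the place of Ci0 in the right-hand-side vector D of Equation 7.2.2 and of CAA
in Equation 7.2.4.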

Since it is data value independent, the kriging variance only represents the average reliability
of the data configuration throughout the deposit. It does not provide the confidence interval for the
mean unless one makes an assumption that the estimation errors are normally distributed with mean
zero. However, if the data distribution is highly skewed, the errors are definitely not normal because
one makes larger errors in estimating a higher grade block than a lower grade block. Therefore, the
reliability should be data value dependent, rather than data value independent.

83
Block Discretization

When using the block kriging approach, one has to decide how to discretize the local area or block
being estimated. The grid of discretizing points should always be regular. However, spacing between
points can be made larger in one direction than the other if the spatial continuity is anisotropic.

If one chooses to use fewer discretizing points, computation time for kriging will be faster.
This computational efficiency must be weighed against the desire for accuracy, which calls for as
many points as possible.

It has been shown in practice that, in general, significant differences may occur in the
estimates using grids containing fewer than 16 discretizing points. With more than 16 points, however,
the estimates are found to be very similar.

The following points should be considered when deciding on how many points to use to
discretize a block:

1. Range of influence of the variogram used in kriging.

2. Size of the blocks with respect to this range.

3. Horizontal and vertical anisotropy ratios.

7.3 PROPERTIES OF KRIGING

There are many properties associated with ordinary kriging estimation. Some are listed
below:

- Kriging has a built-in declustering capability of data during estimation. This is very useful
  especially when the data used to estimate are clustered and irregularly spaced.

- Kriging is conditionally unbiased.

- Kriging is an exact estimator. In other words, kriging will estimate all the known points
  exactly. There is no error.

- Kriging calculates the kriging variance for each block.

It should be noted that the kriging variance is only a ranking index of data configurations.
Since the kriging variance does not depend on the data values, it should not be used to select a
variogram model or a kriging implementation; nor should it be used as the sole criterion to determine
the location of additional samples.

- Kriging tends to screen out the influence of some samples if they are directly behind the
  nearby samples.

The practical consequence of this property is that some samples may have negative kriging
weights. On the contrary, conventional linear estimation methods, such as inverse distance
weighting, will never produce negative weights. The disadvantage of negative weights is that they
also create the possibility of negative estimates if a particularly high sample value is associated with
a negative weight. When ordinary kriging produces estimates that are negative, one can be justified
in setting such estimates to zero if the variable being estimated must be positive.

It is worth noting that the use of a variogram with a parabolic behavior near the origin will
cause the screen effect to be much more pronounced, often producing negative weights larger in
magnitude than those generated by the variogram models that are linear near the origin.

- The average grade of the small blocks from kriging is the same as the kriged grade of the
  combined block (assuming the same data is used for both cases). However, the kriging
  variances of small blocks are not readily amenable to addition.

- Kriging tends to give estimated block grades that are less variable than the actual grades.
  This smoothing effect is also true for most estimators.

The variance of block grades is approximately related to the variance of estimated values and
the kriging variance by the following relationship:

    σ²z = σ²z* + σ²k - σ²m                                           (7.3.1)

The last term, σ²m, is the estimation variance of the average grade of the entire deposit and is
usually negligible. This relationship can be used to gain insight into the effects of both block size
and drillhole spacing on the quality of the ore deposit model. It can also be used in another form to
assess the quality of estimation. For example, one can define the ratio σ²z* / σ²z to be a smoothing
factor in order to show to what degree kriging reproduces reality in terms of block variability.
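
A quick check of this relationship on a block model might look as follows (a minimal Python
sketch; the kriged grades and kriging variances are hypothetical placeholders, and σ²m is neglected):

    import numpy as np

    def smoothing_factor(kriged_grades, kriging_variances):
        """Approximate sigma_z^2 by var(z*) + mean kriging variance (Equation 7.3.1,
        neglecting sigma_m^2) and return the ratio sigma_z*^2 / sigma_z^2."""
        var_est = np.var(np.asarray(kriged_grades, dtype=float))
        var_true = var_est + np.mean(kriging_variances)
        return var_est / var_true

    # hypothetical kriged block grades and their kriging variances
    z_star = np.array([0.51, 0.63, 0.47, 0.72, 0.58, 0.66, 0.44])
    k_var  = np.array([0.020, 0.031, 0.018, 0.027, 0.022, 0.025, 0.019])
    print(smoothing_factor(z_star, k_var))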

Assumptions Made in Kriging


The following assumptions are made in ordinary kriging.
1. No drift is present in the data.
2. Both variance and covariance exist and are finite.
3. The mean grade of the deposit is unknown.

7.4 EFFECT OF VARIOGRAM MODEL PARAMETERS

The exercise of fitting a model function to the sample variogram involves important choices
and subjectivity on the part of the practitioner. The sample variogram does not provide any
information for distances shorter than the minimum spacing between the sample data. Unless the
sampling includes duplicates at the same location, the nugget effect and the behavior of the
variogram near the origin cannot be determined easily from the sample variogram. In other instances,
the shape of the variogram may not be defined clearly enough to determine the range accurately. Yet
these parameters may have considerable effect on the ordinary kriging weights and on the resulting
estimate.

Effect of Scale

Two variogram models that differ only in their scale will produce exactly the same kriging weights.
Therefore the ordinary kriging estimate will not be affected. For example, a spherical model with
nugget=0.001 and sill=0.01 will give the same estimate for a block as another spherical model with
nugget=0.1 and sill=1, as long as the range of the variogram and data set used remain the same. In
both cases, the nugget to sill ratios are kept constant.

Rescaling a variogram, however, will affect the ordinary kriging variance which will increase
by the same factor that was used to scale the variogram.

The fact that the variogram can be rescaled by any constant without changing the estimate
enables one to use the relative variogram without fear of altering the estimate. In the case where the
local relative variograms differ one from another by only a rescaling factor, only the kriging variance
will be affected. Each one of the local relative variograms will provide an identical kriging estimate.

Effect of Shape

Different variogram models will produce different kriging estimates for a block even if they
have identical parameters of nugget, sill and range. This is because the shape of the model has an
effect on the weights assigned to the data points used for the block. For example, a parabolic
behavior near the origin is indicative of very continuous phenomena, so the estimation procedure
gives more weight to the closest samples.

Nugget Effect

For the variograms that differ only in their nugget effect, the lower the nugget effect, the
higher will be the ordinary kriging weights assigned to the samples closer to the block being
estimated. Increasing the nugget effect makes the estimation procedure become more like a simple
averaging of the available data. The other noticeable result of using a higher nugget effect is that the
ordinary kriging variance is higher.

A pure nugget effect variogram model indicates the lack of spatial correlation; the data value
at any particular location bears no similarity even to very nearby data values. In terms of statistical
distance, none of the samples is any closer to the point being estimated than any other. The result is
that for ordinary kriging with a pure nugget effect model of spatial continuity, all weights are equal
to 1/n.

Effect of Range

The change of the range has a relatively minor effect on the ordinary kriging weights. Even
so, these relatively minor adjustments in the weights can cause a noticeable change in the estimate.
If the range is increased without changing any other parameter, the ordinary kriging variance will be
lower since this will make the samples appear to be closer to the block, in terms of statistical
distance, than they originally were.

If the range becomes very small, all the samples appear to be equally far away from the block
being estimated and from each other, with the result similar to that of a pure nugget effect: the
weights all become 1/n and the estimation procedure becomes a simple average of the available
sample data.

Effect of Anisotropy

With the use of the anisotropic variogram model in ordinary kriging, more weight will be
given to the samples in the major direction of continuity. For example, if the horizontal anisotropy
ratio (major axis to minor axis range) in a deposit is 2 to 1, then the samples along the major axis
will appear to be twice as close, in terms of statistical distance, as those samples along the minor
axis. This will have an effect of increasing the weights of samples along the major axis direction of
continuity even though other samples in the opposite direction may be closer to the block in terms
of true or geometric distance.

In many data sets, the direction of maximum continuity may not be the same throughout the
area of interest. There may be considerable local fluctuations in the direction and the degree of
anisotropy. In such situations, the sample variogram may appear isotropic only because it may be
unable to reveal the undulating character of the anisotropy. If the qualitative information offers a way

to identify the direction and degree of the anisotropy, then the estimation procedure will benefit
greatly from a decision to base the choice of the spatial continuity model on qualitative evidence
rather than the quantitative evidence of the sample variogram.

The success of ordinary kriging is due to its use of a customized statistical distance rather
than a geometric distance and to its attempt to decluster the available sample data. Its use of a spatial
continuity model that describes the statistical distance between points gives it considerable flexibility
and an important ability to customize the estimation procedure to qualitative information.

7.5 EFFECT OF SEARCH STRATEGY

Deciding what criteria to use in order to identify which data points should contribute to the
estimation of a particular point or block is a very critical step in grade interpolation. This is the
selection of a search strategy that is appropriate for the method used. Here, considerable divergence
exists in practice, involving the use of fixed numbers, observations within a specified radius,
quadrant and octant searches, elliptical or ellipsoidal searches with anisotropic data, and so on.
Since varying these parameters may affect the outcome of the estimation considerably, the
definition of the search strategy is therefore one of the most consequential steps of any estimation
procedure.

For estimation methods that can handle any number of nearby samples, the most common
approach to choosing the samples that contribute to the estimation is to define a search neighborhood
within which a specified number of samples is used. If there is anisotropy, the search neighborhood
is usually an ellipse which is centered on the point or the center of the block being estimated. The
orientation of this ellipse is dictated by the pattern of spatial continuity of the data based on the
variogram analysis. Obviously, the ellipse is oriented with its major axis parallel to the direction of
maximum continuity. If there is no evident anisotropy, the search ellipse becomes a circle and the
question of orientation is no longer relevant.

A good search strategy should include at least a ring of drillholes with enough samples
around the blocks to be estimated. However, it should also have provisions for not extending the
grades of the peripheral holes to the undrilled areas.

Since most drilling is vertically oriented, increasing the vertical search distance has more
impact on the number of samples available for a given block, than increasing the horizontal search
distance. If the vertical search is considerable for a given block, then there might be a problem of
having a large portion of the samples for the block coming from the nearest hole, thus carrying the
most weight. This may cause excessive smoothing in reserve estimates. If the circumstances warrant
a large vertical search, then one solution could be to limit the number of samples used from each
individual drillhole.

Octant or Quadrant Search

It is common, especially in precious metal deposits, to have denser drilling in highly
mineralized areas of the deposit. When such clustering of the holes is present, it might be necessary
to have a balanced representation of the samples in all directions in space, rather than taking the
nearest n samples for the blocks to be estimated. This can be achieved by either declustering the
samples before the estimation or by a simple octant or quadrant search in which the number of
samples in each octant or quadrant is limited to a specified number during the interpolation.

The use of an octant or quadrant search usually improves the results of the inverse distance
weighting method more than it does the results of ordinary kriging, whose well-known “screening”
of data performs an internal declustering. Therefore, an octant or quadrant search accomplishes some
declustering, and the effect of this is more noticeable on the method that does not decluster by itself.

7.6 RELEVANCE OF STATIONARITY MODELS

A random function model is said to be first order stationary if the mean of the probability
distribution of each random variable is the same, as explained in Section 5. Because we use this
assumption to develop the unbiasedness condition, the guarantee of unbiasedness when the weights
sum to one is limited to first order stationary models.

Therefore, an easily overlooked assumption in every estimate is the fact that the sample values
used in the weighted linear combination are somehow relevant, and that they belong to the same
group or population, as the point being estimated. Deciding which samples are relevant for the
estimation of a particular point or a block may be more important than the choice of an estimation
method.

The decision to view a particular sample data configuration as an outcome of a stationary
random function model is strongly linked to the decision that these samples can be grouped together.
The cost of using an inappropriate model is that the statistical properties of the actual estimates may be
very different from their model counterparts. The use of weighted linear combinations whose weights
sum to one does not guarantee that the actual bias is zero. The actual bias will depend on
several factors, such as the appropriateness of viewing the sample data configuration as an outcome
of a stationary random function.

All linear estimation methods implicitly assume a first order stationary model through their
use of the unbiasedness condition. Therefore, it is not only the ordinary kriging which requires first
order stationarity. If estimation is performed blindly, with no thought given to the relevance of the
samples within the search strategy, the methods that make use of more samples may produce worse
results than the methods that make use of few nearby samples.

7.7 SIMPLE KRIGING

The simple kriging estimator is a linear estimator of the following form

    Z*sk = Σ λi [Z(xi) - m] + m ,    i = 1,...,n                     (7.7.1)

where Z*sk is the estimate of the grade of a block or a point, Z(xi) refers to sample grade, λi are the
corresponding simple kriging weights assigned to Z(xi), n is the number of samples, and m = E{Z(x)}
is the expected value (mean) of Z(x), assumed known.

Thus the simple kriging algorithm requires prior knowledge of the mean m. Stationary simple
kriging does not adapt to local trends in the data since it relies on the mean value m, assumed known
and constant throughout the area. Consequently, simple kriging is rarely used for mapping z-values.
Instead, the more robust ordinary kriging algorithm is used.

Another difference of simple kriging from the ordinary kriging is that there is no constraint
on simple kriging weights. In other words, they do not have to add up to 1 as in the case of ordinary
kriging.
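
A minimal sketch of the simple kriging estimate of Equation 7.7.1 is given below (Python/NumPy;
the exponential covariance model, the assumed known mean m, the data, and the target location are
all hypothetical):

    import numpy as np

    def cov(h, c=1.0, a=200.0):
        """Hypothetical exponential covariance model."""
        return c * np.exp(-3.0 * np.asarray(h, dtype=float) / a)

    # hypothetical data and known stationary mean
    xy = np.array([[10.0, 20.0], [80.0, 30.0], [45.0, 90.0]])
    z = np.array([0.62, 0.88, 0.41])
    m = 0.70                       # mean assumed known and constant (stationarity)
    x0 = np.array([60.0, 60.0])

    C = cov(np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2))   # data-to-data covariances
    c0 = cov(np.linalg.norm(xy - x0, axis=1))                          # data-to-target covariances
    lam = np.linalg.solve(C, c0)   # simple kriging weights: no constraint, no Lagrange multiplier

    z_sk = lam @ (z - m) + m       # Equation 7.7.1; the weights need not sum to 1
    print(z_sk, lam.sum())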

7.8 COKRIGING

Cokriging is an extension of ordinary kriging. It is basically the kriging of several variables
simultaneously. It can be used when there are one or more secondary variables that are spatially cross-
correlated with the primary variable. The commonly used ordinary kriging utilizes only the spatial
correlation between samples of a single variable to obtain the best linear unbiased estimate of this
variable. In addition to this feature, cokriging also utilizes the cross-correlations between several
variables to further improve the estimate. Therefore, cokriging can be defined as a method for
estimation that minimizes the variance of the estimation error by exploiting the cross-correlation
between two or more variables. The estimates are derived using secondary variables as well as the
primary variable.

Reasons for Cokriging

Cokriging is particularly suitable when the primary variable has not been sampled
sufficiently. The precision of the estimation may then be improved by considering the spatial
correlations between the primary variable and the other better-sampled variables. Therefore, having
extensive data from blastholes as the secondary variable with the widely spaced exploration data as
the primary variable is an ideal case for cokriging.

Cokriging Equation

Other than tedious inference and matrix notations, cokriging is the same as kriging. It also
branches out to several flavors such as ordinary cokriging, simple cokriging, and indicator cokriging.
The traditional ordinary cokriging system of equations for two variables, exploration drillhole data
being the primary variable and the blasthole data being the secondary variable, is given in Table
7.8.1.

Steps Required for Cokriging

Since cokriging uses multiple variables, the amount of work involved prior to cokriging itself
is a function of the number of variables used. For cokriging of drill and blasthole data of the same
item, the following steps are required.

- The regularization of blasthole data into a specified block size. This block size could be the
  same as the size of the model blocks to be valued, or a discrete sub-division of such blocks.
  One thus establishes a new data base of average blasthole block values.

- Variogram analysis of drillhole data.

- Variogram analysis of blasthole data.

- Cross-variogram analysis between drill and blasthole data. This is done by pairing each
  drillhole value with all blasthole values.

- Selection of search and interpolation parameters.

- Cokriging.

Unless the primary variable, that which is being estimated, is under-sampled with respect to
the secondary variable, the weights given to the secondary data tend to be small, and the reduction
in estimation variance brought by cokriging is not worth the additional inference and modeling
effort.

Table 7.8.1. Cokriging system of equations for two variables; primary variable is drillhole data,
secondary variable is blasthole data.

    | [Cov{didi}]  [Cov{dibj}]  [1]  [0] |   | [λi] |   | [Cov{x0di}] |
    | [Cov{dibj}]  [Cov{bjbj}]  [0]  [1] |   | [δj] |   | [Cov{x0bj}] |
    | [    1    ]  [    0    ]   0    0  | x |  µd  | = |      1      |
    | [    0    ]  [    1    ]   0    0  |   |  µb  |   |      0      |

    [Cov{didi}]  = drillhole data (dhs) covariance matrix, i=1,n
    [Cov{bjbj}]  = blasthole data (bhs) covariance matrix, j=1,m
    [Cov{dibj}]  = cross-covariance matrix for dhs and bhs
    [Cov{x0di}]  = drillhole data to block covariances
    [Cov{x0bj}]  = blasthole data to block covariances
    [λi]         = weights for drillhole data
    [δj]         = weights for blasthole data
    µd and µb    = Lagrange multipliers

7.9 NON-STATIONARY GEOSTATISTICS

Ordinary kriging is used under the stationarity assumption. As we may recall, a variable is
stationary if it behaves in the same way throughout the whole area of consideration. This assumption
is rarely respected in practice. We always look for something that could be stationary in order to use
classical methods, such as reducing the area of investigation (neighborhood.)

An ever-increasing variance is a characteristic of non-stationarity. It implies there is a trend
or drift in the data. In such cases, a modified ordinary kriging method called “universal kriging” may
become a useful tool. Universal kriging is in fact a kriging with a prior trend model. Table 7.9.1
gives the universal kriging system of equations in the case of a linear drift.

Table 7.9.1. Universal kriging system of equations in the case of a linear drift.

    | C11 ...... C1n  1  x1  y1 |   | λ1 |   | C1x |
    |  .           .  .   .   . |   |  . |   |  .  |
    |  .           .  .   .   . |   |  . |   |  .  |
    | Cn1 ...... Cnn  1  xn  yn |   | λn |   | Cnx |
    |  1  ......  1   0  0   0  | · | µ0 | = |  1  |
    |  x1 ...... xn   0  0   0  |   | µ1 |   |  x  |
    |  y1 ...... yn   0  0   0  |   | µ2 |   |  y  |

7.10 NON-LINEAR KRIGING METHODS

The ordinary kriging and conventional estimation methods, such as inverse distance
weighting method, are all linear estimators. They are appropriate for the estimation of a mean value
for the blocks. However, if the distribution of the samples is highly skewed, then estimating a mean
value for the blocks may not give realistic grade-tonnage curves to calculate recoverable reserves
because of the smoothing that results from such estimates.

Non-linear kriging methods are designed to estimate the distribution of grades within blocks,
in order to better estimate recoverable tonnages within each block. Some of these methods are listed
below with brief explanations:

- Indicator kriging: Kriging of indicator transforms of the data.

- Probability kriging: Advanced form of indicator kriging which uses both indicator and
  uniform variables, as well as the cross-covariances between the indicator and uniform
  variables.

- Lognormal kriging: Kriging applied to logarithms of the data.

- Multi-Gaussian kriging: Kriging applied to the normal score transforms of the data. It is
  actually a generalization of lognormal kriging.

- Lognormal short-cut: A method which assumes a lognormal distribution of grades in the
  block with a mean equal to the ordinary kriging estimate, and the variance equal to the
  estimation variance obtained, plus the variance of the points in the block.

- Disjunctive kriging: Kriging of specific polynomial transforms of the data.

Non-linear methods of estimation may be parametric or non-parametric. Parametric
methods involve assumptions about distributions (defined by parameters). Non-parametric methods
do not involve assumptions about distributions and are sometimes also called distribution-free
methods. All non-linear methods involve a transformation of the data. Almost all parametric
methods (or at least those in common use) involve a transformation to normality, i.e., a parametric
transformation. Non-parametric methods involve a non-parametric transformation usually to
indicator values. However, the change of support problem generally involves some parametric
assumption.

Both indicator and probability kriging are known as non-parametric methods because the
desired estimator of the distribution is entirely based on sample data, and not on any assumptions
about the underlying distribution (or model) of the data. On the other hand, lognormal short-cut,
lognormal, multi-gaussian and disjunctive kriging are parametric methods. They are also classified
as non-linear geostatistics because:

- the data transformations used for estimation purposes are non-linear transforms, for example
  y = log x,

- the estimates (in this case the distribution) are obtained using non-linear combinations of data.

In theory, all linear methods are non-parametric methods because we do not make any
assumptions about the underlying distribution of the data. However, these methods do not work well
if the data is highly skewed.

Why Do We Need Non-linear Geostatistics?

When the sample data are highly skewed, the calculated variograms are often unrecognizable,
thereby making it quite difficult to apply ordinary kriging. Furthermore, if we need local recoverable
reserves (grade distribution within a block), linear geostatistics is not meant to handle that. Also,
some of the shortcomings of linear geostatistics, such as smoothing, can be overcome by non-linear
geostatistical methods, particularly when the underlying data are highly skewed.

In summary, non-linear methods are used to:

- overcome problems encountered with outliers

- provide “better” estimates than those provided by linear methods

- take advantage of the properties of non-normal distributions of data and thereby provide
  more optimal estimates

- provide answers to non-linear problems

- provide estimates of distributions on a scale different from that of the data (the “change of
  support” problem)

A particular measure to use for deciding whether to use linear or non-linear methods for
reserve estimations can be the coefficient of variation. As a rule of thumb, if the coefficient of
variation of data is less than 0.5, one can say that the linear methods work fine. If it is greater than
1.5 or 2, these methods will not be suitable. If the coefficient of variation is in between 0.5 and 1.5,
definite caution is needed in using the linear estimation methods.

One of the most popular non-linear geostatistical methods is multiple indicator kriging.
This technique will be covered in the next section.

8
Multiple Indicator Kriging

The indicator kriging technique was introduced in 1983 by Andre Journel. It can handle
highly variable grade distributions, and does not require any assumptions concerning the distribution
of grades. It estimates the local spatial distribution of grades within a block or panel. Local
recoverable reserves can then be obtained by applying the economic cutoff to this distribution.

The basic concept of indicator kriging is very simple. Suppose that equal weighting of N
given samples is used to estimate the probability that the grade of ore at a specified location is below
a cutoff grade. Then, the proportion of N samples that are below this cutoff grade can be taken as
the probability that grade estimated is below this cutoff grade. If a series of cutoff grades is applied,
then a series of probabilities can be obtained. Indicator kriging obtains a cumulative probability
distribution at a given location in a similar manner, except that it assigns different weights to
surrounding samples using the ordinary kriging technique to minimize the estimation variance. The
basis of indicator kriging technique is the indicator function.

8.1 THE INDICATOR FUNCTION

At each point x in the deposit, consider the following indicator function of zc defined as:

    i(x;zc) = 1 ,  if z(x) ≤ zc                                      (8.1.1)
            = 0 ,  otherwise

where:
x is location,
zc is a specified cutoff value,
z(x) is the value at location x.

The indicator function at a sampled point, i(x;zc), takes a simple form which is shown in
Figure 8.1.1. This function follows a binomial (Bernoulli) probability law and takes only two possible values,
0 and 1. Given an observed point grade z(x) at location x, there is either a 0 or 100 percent chance
that this value will be less than or equal to the cutoff zc. All sample values are flagged by an indicator
function; 1 for values less than or equal to zc and 0 otherwise.
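
The indicator transform of Equation 8.1.1 for a series of cutoffs can be sketched as follows
(Python; the composite grades and the cutoff series are hypothetical):

    import numpy as np

    def indicator_transform(z, cutoffs):
        """Return an (n_samples x n_cutoffs) matrix of indicators:
        i(x; zc) = 1 if z(x) <= zc, 0 otherwise (Equation 8.1.1)."""
        z = np.asarray(z, dtype=float)[:, None]
        zc = np.asarray(cutoffs, dtype=float)[None, :]
        return (z <= zc).astype(int)

    grades = np.array([0.12, 0.45, 0.78, 0.33, 1.20, 0.05])
    cutoffs = np.array([0.1, 0.3, 0.5, 1.0])
    I = indicator_transform(grades, cutoffs)
    print(I)
    print(I.mean(axis=0))   # proportion below each cutoff; it increases with the cutoff grade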

Figure 8.1.1 Indicator Function at Point x

Essentially, the indicator function transforms the grade at each sampled location into a [0,1]
random variable. Indicators are assigned to each sampled location in the deposit for a series of cutoff
grades. As the cutoff grade increases, the percentage of points below the cutoff grade zc increases.

8.2 THE Φ(A;zc) FUNCTION

Φ(A;zc) is a cumulative distribution function (cdf) which is built using the information
from the indicator functions. This function is defined as the exact proportion of grades z(x) below
the cutoff zc within any area A in the deposit:

    Φ(A;zc) = 1/A ∫A i(x;zc) dx  ∈  [0,1]                            (8.2.1)

For each cutoff grade zc, one point of the cumulative probability function Φ(A;zc) is obtained as
shown in Figure 8.2.1.

In mining practice, local recoverable reserves must be assessed for large panels within the
deposit. Local recoverable reserves of ore grade and tonnage can be estimated by developing the
Φ(A;zc) function for each panel A. The indicator data, i(x;zc), provide the source of information
which can be used to estimate point local recoverable reserves in the same way that point grade data
are used to estimate block grades. The similarity between these two estimation procedures is
demonstrated in the following relations:

    Estimate of block V:          zV = 1/V ∫x∈V z(x) dx

    Indicator cdf for block V:    Φ(V;zc) = 1/V ∫x∈V i(x;zc) dx

Thus any estimator used to determine the block grades from point grade data can also be used
to determine the spatial distribution from point indicator data.

Figure 8.2.1 Proportion of Values z(x) ≤ zc within area A

8.3 LOCAL RECOVERY FUNCTIONS

The cumulative distribution function Φ(A;zc) gives the proportion of grades below the cutoff
grade zc. Depending on the size of the estimated panel, tonnage and quantity of metal values can be
calculated for panel A using the recovery factors as follows:

Tonnage point recovery factor in A:

    t*(A;zc) = 1 - Φ(A;zc)                                           (8.3.1)

Quantity of metal recovery factor in A:

    q*(A;zc) = ∫zc u dΦ(A;u)                                         (8.3.2)

A discrete approximation of this integral is given by

    q*(A;zc) = Σ 1/2 (zj + zj-1) [ Φ*(A;zj) - Φ*(A;zj-1) ] ,   j = 2,...,n        (8.3.3)

This approximation sums, for each cutoff grade increment, the product of the class midpoint grade
and the corresponding increment of the Φ*(A;zc) proportion. The mean ore grade at cutoff zc gives the
mean block grade above the specified cutoff value.

Mean ore grade at cutoff zc:

m*(A;zc) = q*(A;zc) / t*(A;zc) (8.3.4)
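
Given an estimated cdf at a set of discrete cutoffs, the recovery factors of Equations 8.3.1, 8.3.3 and
8.3.4 can be computed as in the sketch below (Python; the cutoff series and the cdf values are
hypothetical, and the cutoff of interest is assumed to coincide with one of the tabulated cutoffs):

    import numpy as np

    def recovery(cutoffs, phi, zc):
        """Tonnage factor, metal factor and mean ore grade above cutoff zc from a
        discrete cdf phi(A;z) defined at the given cutoffs."""
        cutoffs = np.asarray(cutoffs, dtype=float)
        phi = np.asarray(phi, dtype=float)
        t = 1.0 - np.interp(zc, cutoffs, phi)                 # Eq. 8.3.1
        mids = 0.5 * (cutoffs[1:] + cutoffs[:-1])             # class midpoint grades
        dphi = np.diff(phi)                                   # incremental class probabilities
        above = cutoffs[1:] > zc
        q = np.sum(mids[above] * dphi[above])                 # Eq. 8.3.3
        m = q / t if t > 0 else 0.0                           # Eq. 8.3.4
        return t, q, m

    cutoffs = [0.0, 0.2, 0.4, 0.6, 0.8, 1.2]
    phi     = [0.0, 0.30, 0.55, 0.75, 0.90, 1.00]             # hypothetical estimated cdf, non-decreasing
    print(recovery(cutoffs, phi, zc=0.4))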

8.4 ESTIMATION OF Φ(A;zc)

Let Φ(A;zc) be the proportion of grades z(x) below cutoff zc within panel A. In general this
proportion would be unknown since i(x;zc) is known at only a finite number of points. From a
deterministic point of view, Φ(A;zc) can be approximated numerically:

    Φ(A;zc) = 1/n Σ i(xj;zc) ,   j = 1,...,n                         (8.4.1)

One disadvantage of this approach is that each sample is given the same weight regardless
of its location. Also, samples outside A are not utilized, and no estimation error is provided.
Therefore, the following estimate of Φ(A;zc) should be considered:

    Φ(A;zc) = Σ λj i(xj;zc) ,   xj ∈ D ,   j = 1,...,N               (8.4.2)

where n is the number of samples in the panel A, N is the number of samples in search
volume D, and λj are the weights assigned to the samples. For the unbiasedness condition, Σλj = 1,
and usually N >> n.

Because of the similarity of Equation 8.4.2 to the linear estimators, the ordinary kriging
approach is used to estimate the cumulative distribution function Φ(A;zc) from the indicator data
i(xj;zc). By analogy with the ordinary kriging estimator, we use a random function model for i(xj;zc),
which will be designated by I(xj;zc).

8.5 INDICATOR VARIOGRAPHY

Variograms are used to describe the spatial correlation between grade values or any other
variable in the deposit. Indicator variograms are estimated for each cutoff grade by using the
indicator data i(xj;zc) found for that cutoff grade. The variogram for the random function I(xj;zc) is
estimated by a sample variogram in the same way as the variogram of the grade Z(x):

    γI(h;zc) = 1/2 E [ I(x+h;zc) - I(x;zc) ]²                        (8.5.1)

The sample indicator variograms are more robust than the grade variograms since their
estimation does not call for the data values themselves but rather their indicator values with regard
to a given cutoff zc. Figure 8.5.1 illustrates the indicator variograms for a series of cutoff grades.
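
For regularly spaced data along a line, an experimental indicator variogram can be sketched as
follows (Python; the spacing, the simulated grades, and the choice of the median cutoff are
hypothetical, and the spatially uncorrelated test data produce values near the maximum sill of 0.25
discussed below):

    import numpy as np

    def indicator_variogram_1d(z, zc, max_lag):
        """Experimental indicator variogram for regularly spaced 1-D data:
        gamma_I(h; zc) = 1/(2 N(h)) * sum [ i(x+h; zc) - i(x; zc) ]^2."""
        i = (np.asarray(z, dtype=float) <= zc).astype(float)
        gammas = []
        for lag in range(1, max_lag + 1):
            diffs = i[lag:] - i[:-lag]
            gammas.append(0.5 * np.mean(diffs ** 2))
        return np.array(gammas)

    rng = np.random.default_rng(1)
    grades = rng.lognormal(mean=-0.7, sigma=0.8, size=500)
    zm = np.median(grades)                      # the median cutoff gives the best-defined variogram
    print(indicator_variogram_1d(grades, zm, max_lag=5))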

Figure 8.5.1 Indicator Variograms at Different Cutoff Grades

Median Indicator Variogram

The best defined experimental indicator variograms correspond to cutoffs zc close to the
median zm since roughly 50% of the indicator data will be equal to 0 and the rest will be 1’s.

The median indicator variogram in discrete form is defined as

    γm(h;zm) = 1/(2n) Σ [ I(xj+h;zm) - I(xj;zm) ]² ,   j = 1,...,n   (8.5.2)

The maximum sill value that an indicator variogram can have is 0.25 when half of the
samples are 0’s, the other half are 1’s. Thus the sill values of indicator variograms increase until the
median indicator variogram is reached.

8.6 ORDER RELATIONS

The indicator kriging estimator provides an unbiased estimate of the recovered tonnage at
any cutoff of interest. One disadvantage of this method is the possibility of order relation problems.
These problems occur when the distribution function estimated by indicator kriging is decreasing
( Φ*(A;zk+1) < Φ*(A;zk) ), has negative values, or has values greater than 1. In short, an estimated
distribution has order relation problems if it is not a valid distribution function.

In general, these order relation problems can occur simply because the indicator kriging
system at each cutoff is solved independently. Each indicator kriging system provides an optimal
solution by minimizing the estimation variance at each cutoff. However, since these solutions are
arrived at independently, there is no guarantee that they will yield a valid distribution function.

There are at least two feasible methods for resolving order relation problems. One method
involves combining the simple kriging systems for all cutoffs into one giant system and minimizing
the estimation variances. This system would contain constraints which would force the order
relations to hold. The fundamental drawback of this method is that the system of equations would
be too large to be solved easily.

The second method, which closely approximates the results of the first method, takes the
results given by the indicator kriging algorithm and fits a distribution function which minimizes the
weighted sum of squared deviations from the optimal solution, as shown in Figure 8.6.1. Since this
method is a bit complex, one frequently employed solution is to set Φ*(A;zk+1) = Φ*(A;zk).
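
One practical correction, sketched below in the spirit of the second method, clips the kriged values
to [0,1] and averages an upward (running maximum) and a downward (running minimum) pass; it is
an averaging shortcut rather than the full weighted least-squares fit (Python; the cdf values are
hypothetical):

    import numpy as np

    def correct_order_relations(phi):
        """Force an indicator-kriged cdf to be a valid, non-decreasing function in [0, 1]
        by averaging an upward (running maximum) and a downward (running minimum) pass."""
        phi = np.clip(np.asarray(phi, dtype=float), 0.0, 1.0)
        up = np.maximum.accumulate(phi)                    # non-decreasing from the left
        down = np.minimum.accumulate(phi[::-1])[::-1]      # non-decreasing from the right
        return 0.5 * (up + down)

    phi_kriged = [0.10, 0.32, 0.28, 0.55, 0.61, 0.58, 1.02]   # violates the order relations
    print(correct_order_relations(phi_kriged))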

Figure 8.6.1 Solving the Order Relation Problems

Median Indicator Approach

The median indicator variogram provides an equal split of the corresponding indicator data.
Adopting only this variogram in indicator kriging at all cutoff grades can be a practical approach
since the median indicator approximation has the advantage of reducing the number of kriging
systems from K to 1. Furthermore, the order relation problems are reduced considerably. Use of the
median indicator variogram simplifies the estimation of the Φ(A;zc) function. It reduces the number
of indicator variograms to be computed and modeled. In a multiple indicator kriging program, only
one set of kriging weights needs to be calculated. This is because at each cutoff grade increment, the
kriging weights remain the same even though the indicators are different.

8.7 CONSTRUCTION OF GRADE-TONNAGE RELATIONSHIP

The estimated cumulative probability function Φ*(A;zc) is defined at K discrete points
corresponding to the indicator cutoffs used. If the original data distribution is highly skewed, the
estimated function Φ*(A;zc) should also be highly skewed. Keeping this fact in mind, one must now
complete (or interpolate) the estimated function Φ*(A;zc) over all possible ranges of the data value z.

Any interpolation within an interval amounts to assuming a particular cumulative probability
distribution model within that class. As such, this model must satisfy the monotonically increasing
property of a distribution function. Also, to each such intra-class distribution there corresponds a class
mean, which may not necessarily be equal to the midpoint of the class.

The correct estimation of this class mean is probably the most important task in obtaining
the desired grade-tonnage relationship from Φ*(A;zk). Since multiple indicator kriging is
mostly used for highly skewed data, the estimation of the last class mean is particularly critical
simply because this last class mean will usually dominate the overall estimated mean grade of the
block or panel. It is therefore suggested that one use either the lognormal or the power model of
cumulative distribution, at least for the last or last few classes to ensure positive skewness at the high
end of the data values. Regardless of which model is used, one may always consider a very high last
cutoff value so that only a minute portion of the data (1% or less) is above this last cutoff. This
precaution will limit the influence of this last class mean, simply because the last probability would
be estimated as zero or a very small value for most blocks within the deposit. The only practical
problem of following this suggestion is that the indicator variogram of this last cutoff will be a
purely random variogram.

8.8 PERFORMING AFFINE CORRECTION

The estimated cumulative probability function Φ*(A;zc), as well as the grade-tonnage
relationship for each block, is based entirely on the distribution of the point samples (composite values).
Since the selective mining unit (SMU) volume is generally much larger than the sample volume, one
must perform a volume-variance correction to the initial grade-tonnage curve of each block. This
volume-variance correction is called the affine correction. The assumptions required to perform
affine correction are:

1. The distribution of block or SMU grades has the same shape as the distribution of point
or composite samples.

2. The ratio of the variances, i.e., the variance of block grades (or the SMU grades) over that
of point grades is non-conditional to the surrounding data used for estimation.

Krige’s Relation

An important relationship involving the dispersion variances of samples with different
support is given by Krige's relation:

    σ²p = σ²b + σ²p∈b

or                                                                   (8.8.1)

    D²(./D) = D²(smu/D) + D²(./smu)

where:

    σ²p   = D²(./D)   = dispersion variance of composites in the deposit
    σ²b   = D²(smu/D) = dispersion variance of blocks (SMUs) in the deposit
    σ²p∈b = D²(./smu) = dispersion variance of points in blocks (SMUs).

This is the spatial complement to the partitioning of variances which simply says that the
variance of point values is equal to the variance of block values plus the variance of points within
blocks. The dispersion variance of composites in the deposit, σ²p, is calculated directly from the composite
or blasthole data. It can also be estimated by γ̄(D,D), which is the sill of the variogram of the data.

The average variance of points within the block, σ²p∈b, is equivalent to the average value of the
variogram within the SMU, or γ̄(smu,smu).

The average variogram value within a block is estimated by

    σ²p∈b = γ̄(smu,smu)
          = 1/n² ΣΣ γ(hi,j) ,    i = 1,...,n and j = 1,...,n.        (8.8.2)

Calculation of Affine Correction

Most kriging programs automatically calculate the values for σ²b or σ²p∈b, and sometimes
both. If σ²p and σ²p∈b are known, then σ²b can be obtained easily from Krige's relation:

    σ²b = σ²p - σ²p∈b                                                (8.8.3)

Once the dispersion variances are known, the affine correction factor, K, can be calculated
as follows:

    K² = σ²b / σ²p  ≤ 1                                              (8.8.4)

Using the estimated values from the variogram averaging

    K² = [ γ̄(D,D) - γ̄(smu,smu) ] / γ̄(D,D)

       = 1 - [ γ̄(smu,smu) / γ̄(D,D) ]  ≤ 1                            (8.8.5)

and

    Affine correction factor,  K = √K²  ≤ 1                          (8.8.6)

Then, the necessary equation for affine correction of any panel or block is given by

    Φ*v(A;z) = Φ*(A;zadj)                                            (8.8.7)

where

    zadj = adjusted cutoff grade = K · (z - ma) + ma                 (8.8.8)
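
The chain from Krige's relation to the adjusted cutoff of Equation 8.8.8 can be sketched as follows
(Python; the dispersion variances, the mean, and the cutoff are hypothetical numbers):

    import numpy as np

    def affine_adjusted_cutoff(z, mean, var_points, var_points_in_smu):
        """Affine correction: K from Krige's relation (Eqs. 8.8.3-8.8.6),
        then the adjusted cutoff zadj = K*(z - m) + m (Eq. 8.8.8)."""
        var_blocks = var_points - var_points_in_smu        # Eq. 8.8.3
        K = np.sqrt(var_blocks / var_points)               # Eqs. 8.8.4 and 8.8.6, K <= 1
        return K * (z - mean) + mean, K

    zadj, K = affine_adjusted_cutoff(z=0.5, mean=0.35, var_points=0.09, var_points_in_smu=0.03)
    print(K, zadj)

The corrected block distribution is then read as Φ*v(A;z) = Φ*(A;zadj), per Equation 8.8.7.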

The larger the SMU, the more spatial averaging occurs within each SMU, and thus the
dispersion variance of the SMU’s will decrease.

The permanence of shape applied during affine correction is quite reasonable if the block size
v (i.e., SMU) is small such that the data available does not allow further resolution within v. As a
rule of thumb, v should be smaller than 1/3 the average data spacing, and the relative change in
dispersion variance between composite grade and SMU grade is approximately 30% or less. In other
words,
    ( σ²p - σ²b ) / σ²p < 30%                                        (8.8.9)

Figure 8.8.1 illustrates the affine reduction of variance using Equation 8.8.7 above. In this
figure, the block (or SMU) pdf model fv(A;z) is obtained by shrinking the shape of f(A;z) around
the common mean, ma. The correction preserves the shape of f(A;z) and reduces the variance and
spread of fv(A;z). Note that f(A;z) is the probability density function (pdf) whose cumulative
probability function (cdf) is given by Φ*(A;z).

In practice, we are generally interested in the average ore grade and the proportion of ore
within each block. In other words, we are interested in performing the affine correction to the right
of the economic cutoff grade. Figure 8.8.2 illustrates how one can perform this affine correction,
without actually shrinking the obtained Φ*(A;z) or f(A;z).

During actual mining, we know that we will apply the economic cutoff zc on the fv(A;z) of
SMU’s rather than on f(A;z) of point samples in Figure 8.8.2. Therefore, we need to compute the
area to the right of zc and also the average grade of this ore using fv(A;z). The average ore grade is
obtained by a simple weighted averaging of probability times the associated grade from fv(A;z).
However, we only have f(A;z) to use. For this reason, we compute the equivalent economic cutoff
grade zc’ which will be applied to f(A;z) curve instead, in order to obtain the correct proportion of
area (or tonnage) to the right of the economic cutoff.

After an integration using zc’ and class means of f(A;z), we now have the correct tonnage
(i.e., proportion) but higher average grade. Consequently, the estimated metal recovery will be larger
than actual. Hence, we must perform another affine correction for the recovered metal, by
proportionately reducing the estimated average grade obtained earlier.

Figure 8.8.1 Illustration of Affine Reduction of Variance

Figure 8.8.2 Affine Correction for Ore Grade and Tonnage Estimation

Equivalent Cutoff Calculation

To apply affine correction to recovery calculations, one simply transforms the specified cutoff
grade to an equivalent cutoff grade. When this equivalent cutoff is applied to the point sample
distribution, it provides SMU recoveries. The basic equation is given by:

    (zp - m) / σp = (zsmu - m) / σsmu                                (8.8.10)

where

    zp   = the equivalent cutoff grade to be applied to the point (or composite) distribution
    m    = mean of the composite and SMU distributions
    σp   = square root of the composite dispersion variance
    zsmu = the cutoff grade applied to the SMU
    σsmu = square root of the SMU dispersion variance

Equation 8.8.10 is rearranged to get:

    zp = ( σp / σsmu ) zsmu + m [ 1 - ( σp / σsmu ) ]                (8.8.11)

The ratio σp / σsmu is basically the inverse of the affine correction factor K given in Equation
8.8.6. This ratio is ≥ 1.

Numeric Example:

Let the mean of composites = 0.0445, and the specified cutoff grade zsmu = 0.055. If the ratio
σp / σsmu = 1.23, what is the equivalent cutoff grade?

zp = 1.23 (0.055) + 0.0445 (1 - 1.23) = 0.0574

Therefore, the equivalent cutoff grade to be applied to the composite distribution is 0.0574.
Note that if the specified cutoff grade is less than the mean, the equivalent cutoff grade becomes less
than the cutoff, and if the specified cutoff grade is greater than the mean, the equivalent cutoff grade
becomes greater than the cutoff.
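
The numeric example can be reproduced with a one-line function (Python; the values are those of
the example above):

    def equivalent_cutoff(z_smu, m, ratio):
        """Equation 8.8.11: zp = (sigma_p/sigma_smu)*z_smu + m*(1 - sigma_p/sigma_smu)."""
        return ratio * z_smu + m * (1.0 - ratio)

    print(equivalent_cutoff(z_smu=0.055, m=0.0445, ratio=1.23))   # about 0.0574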

8.9 ADVANTAGES & DISADVANTAGES OF M.I.K.

Like any method of estimation, multiple indicator kriging also has some advantages and
disadvantages of its own. The advantages of indicator kriging are the following:

1. It estimates the local recoverable reserves within each panel or block.

2. It provides an unbiased estimate of the recovered tonnage at any cutoff of interest.

3. It is non-parametric, i.e., no assumption is required concerning the distribution of grades.

4. It can handle highly variable data.

5. It takes into account the influence of neighboring data and continuity of mineralization.

The disadvantages of indicator kriging are the following:

1. It may be necessary to compute and fit a variogram for each cutoff.

2. Estimators for various cutoff values may not show the expected order relations.

3. Mine planning and pit design using MIK results can be more complicated than
conventional methods.

4. Correlations between indicator functions of various cutoff values are not utilized. More
information could become available through the indicator cross variograms and subsequent
cokriging. These form the basis of the Probability Kriging technique.

9
Change of Support

One of the important aspects of geostatistical ore reserve estimation is to accurately predict
the grade and tonnage of material above a specified cutoff grade. In mine planning, the change of
support from the initial stage of sample collection to 3-D deposit modeling is the key for
understanding many of the problems encountered in reserve reconciliations.

9.1 THE SUPPORT EFFECT

The term support at the sampling stage refers to the characteristics of the sampling unit, such
as the size, shape and orientation of the sample. For example, channel samples and diamond drillcore
samples have different supports. At the modeling and mine planning stage, the term support refers
to the volume of the blocks used for estimation and production.

It is important to account for the effect of the support in our estimation procedures, since
increasing the support has the effect of reducing the spread of data values. As the support increases,
the distribution of data gradually becomes more symmetrical. The only parameter that is not affected
by the support of the data is the mean. The mean of the data should stay the same even if we change
the support. Figure 9.1.1 shows the histograms of data from different block sizes. The original block
size of 1x1 was combined into 2x2, 5x5 and 20x20 size blocks. The histograms in this figure
illustrate the effects of changing block size on the variance and distribution of the data values.
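
The support effect can be demonstrated directly by averaging a fine grid of values into larger blocks,
as in the sketch below (Python; the simulated "point" grid is hypothetical and spatially uncorrelated,
which exaggerates the variance reduction, but the mean is preserved while the variance drops with
block size):

    import numpy as np

    def block_average(grid, b):
        """Average a 2-D grid of point values into non-overlapping b x b blocks."""
        n = (grid.shape[0] // b) * b
        g = grid[:n, :n]
        return g.reshape(n // b, b, n // b, b).mean(axis=(1, 3))

    rng = np.random.default_rng(2)
    points = rng.lognormal(mean=-0.5, sigma=0.7, size=(100, 100))   # 1x1 "samples"
    for b in (1, 2, 5, 20):
        blocks = block_average(points, b)
        print(b, round(blocks.mean(), 4), round(blocks.var(), 4))   # mean stays put, variance drops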

Figure 9.1.1 Histograms of Data Values from 1x1, 2x2, 5x5 and 20x20 Block Sizes.

3-D mine model blocks have a much larger volume than those of the data points. Therefore,
certain smoothing of block grades is expected after an interpolation procedure. However, the
selection of the method and the parameters used in the interpolation may contribute to additional
smoothing of the block grades.

9.2 SMOOTHING AND ITS IMPACT ON RESERVES

The search strategy, the parameters and the procedure used to select data, can play a
significant role in the smoothness of the estimates. As a matter of fact, it is one of the most
consequential steps of any estimation procedure (Arik, 1990; Journel, 1989). The degree of
smoothing depends on several factors, such as the size and orientation of the local search
neighborhood, and the minimum and maximum number of samples used for a given interpolation.
Of all the methods, the nearest neighbor method does not introduce any smoothing to the estimates
since it assigns all the weight to the nearest sample value. For the inverse distance weighting method,
increasing the inverse power used decreases the smoothing because, as the distance power is
increased, the estimate approaches that of the nearest neighbor method. For ordinary kriging, the
variogram parameters used, especially the increase in nugget effect, contribute to the degree of
smoothing.

The immediate effect of smoothing caused by any interpolation method is that the estimated
grade and tonnage of ore above a given cutoff are biased with respect to reality. As the degree of
smoothing increases, the average grade above cutoff usually decreases. Also with increased
smoothing, the ore tonnage usually increases for cutoffs below the mean and decreases for cutoffs
above the mean. Figure 9.2.1 illustrates the effect of smoothing on the grade-tonnage curves by
comparing 1x1 size blocks to 5x5 and 20x20 size blocks.

Figure 9.2.1 The Effect of Smoothing on the Grade-Tonnage Curves

9.3 VOLUME-VARIANCE RELATIONSHIP

Mining takes place on a much larger volume than those of the data points. Therefore, certain
smoothing of grades is expected. This is in accordance with the volume-variance relationship which
implies that as the volume of blocks increases, their variance decreases. However, mining also takes
place on a smaller volume than those of the 3-D model blocks that are based on exploration data
spacing. Therefore, the variance of these 3-D model blocks is lower than what would normally be
observed during the mining of selective mining unit (SMU) blocks.

Realistic recoverable reserve figures can be obtained if we determine the grade-tonnage
curves corresponding to the SMU distribution. Since the actual distribution will not be known until after
mining, a theoretical or hypothetical one must be developed and used. The application of this
procedure can minimize the bias on the estimated proportion and grade of ore above cutoff.

9.4 HOW TO DEAL WITH SMOOTHING

There are a few ways to achieve this objective. One possible solution that will obtain better
recoveries is correcting for smoothness of the estimated grades. This can be done by support
correction. There are methods available for doing this, such as the affine correction or the indirect
lognormal correction (Isaaks and Srivastava, 1989).

Similar or better results for recoverable reserves can be obtained through conditional
simulation. A fine grid of simulated values at the sample level is blocked according to the required
SMU size. This procedure is very simple, but also assumes perfect selection (Dagdelen et al, 1997).

The use of higher distance powers in the traditional inverse distance weighting method is an
attempt to reduce the smoothing of block grades during the interpolation of deposits with skewed
grade distributions. On the geostatistical side, there are methods, such as lognormal kriging,
lognormal short cut, outlier restricted kriging and several others, which have been developed to get
around the problems associated with the smoothing of ordinary kriging (David, 1977; Dowd, 1982;
Arik, 1992). There are also advanced geostatistical methods such as indicator or probability kriging,
which take into account the SMU size in calculating the recoverable reserves (Verly and Sullivan,
1985; Journel and Arik, 1988; Deutsch and Journel, 1992). Each of these methods provides the
practitioner a variety of tools from which to select and apply where they deem appropriate, since
each method has advantages as well as shortcomings.

9.5 HOW MUCH SMOOTHING IS REASONABLE?

If we are using a linear estimation technique to interpolate the block grades, what would be
the appropriate degree of smoothing which would result in "correct" grade and tonnage above a given
cutoff when applied to our estimates? For one thing, if we know the SMU size that will be applied
during mining, we can determine the theoretical or hypothetical distribution of SMU grades for our
deposit. Once we know this distribution or the grade-tonnage curves of SMUs, we can vary our
search strategy and interpolation parameters until we get close to these curves. The disadvantage of
this procedure is that one may end up using a small number of samples per neighborhood of
interpolation. This lack of information may cause local biases. However, when we are trying to
determine the global and mineable resources at the exploration stage, we are not usually interested
in the local neighborhood. Rather, we are after annual production schedules and mine plans (Parker,
1980).

Refining the search strategy and the kriging plan to control smoothing of the kriged estimates
works reasonably well, depending on our goal. This can be accomplished by comparing the
grade-tonnage curves from the estimated block grades to those from the SMUs. Since the SMU
grades are not known at the time of exploration, we can determine the theoretical or hypothetical
distribution of SMU grades for our deposit based on a specified SMU size.

9.6 GLOBAL CORRECTION FOR THE SUPPORT EFFECT

There are some methods available for adjusting an estimated distribution to account for the
support effect. The most popular ones are affine correction and indirect lognormal correction. All
of these methods have two features in common:

1. They leave the mean of the distribution unchanged.

2. They change the variance of the distribution by some "adjustment" factor.

AFFINE CORRECTION

The affine correction is a very simple correction method. Basically, it changes the variance
of the distribution without changing its mean by simply squeezing values together or by stretching
them around the mean. The underlying assumption for this method is that the shape of the
distribution does not change with increasing or decreasing support.

The affine correction transforms the z value of one distribution to z’ of another distribution
using the following linear formula:

    z’ = √f · (z - m) + m                                            (9.6.1)

where m is the mean of both distributions. If the variance of the original distribution is σ², the
variance of the transformed distribution will be f·σ².

INDIRECT LOGNORMAL CORRECTION

The indirect lognormal correction is a method that borrows the transformation that would
have been used if the original distribution and the transformed distribution were both
lognormal.

The idea behind this method is that while skewed distributions may differ in important
respects from the lognormal distribution, change of support may affect them in a manner similar to
that described by two lognormal distributions with the same mean but different variances.

The indirect lognormal correction transforms the z value of one distribution to z’ of another
distribution using the following exponential formula:

z' = a * z^b                    (9.6.2)

where a and b are given by the following formulas:

a = [m / sqrt(f*cv^2 + 1)] * [sqrt(cv^2 + 1) / m]^b                    (9.6.3)

b = sqrt( ln(f*cv^2 + 1) / ln(cv^2 + 1) )                              (9.6.4)

In these formulas, sqrt denotes the square root and cv is the coefficient of variation. As before, m
is the mean and f is the variance adjustment factor.

One of the problems with the indirect lognormal correction method is that it does not
necessarily preserve the mean if it is applied to values that are not exactly lognormally distributed.
In that case, the transformed values may have to be rescaled, using the following equation:

z'' = (m / m') * z'                    (9.6.5)

where m' is the mean of the distribution after it has been transformed by Equation 9.6.2, and z'' is the
rescaled value.
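
The following minimal Python sketch chains Equations 9.6.2 through 9.6.5, computing a and b from the mean and coefficient of variation of the input values and rescaling the result so that the mean is preserved. The function name and the choice to take m and cv directly from the input array are illustrative assumptions.

    import numpy as np

    def indirect_lognormal_correction(z, f):
        """Indirect lognormal support correction (Eqs. 9.6.2 - 9.6.5)."""
        z = np.asarray(z, dtype=float)
        m = z.mean()
        cv2 = (z.std() / m) ** 2                                            # squared coefficient of variation
        b = np.sqrt(np.log(f * cv2 + 1.0) / np.log(cv2 + 1.0))              # Eq. 9.6.4
        a = (m / np.sqrt(f * cv2 + 1.0)) * (np.sqrt(cv2 + 1.0) / m) ** b    # Eq. 9.6.3
        z_prime = a * z ** b                                                # Eq. 9.6.2
        return (m / z_prime.mean()) * z_prime                               # Eq. 9.6.5: rescale to preserve the mean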

10
Conditional Simulation

Simulated deposits are computer models that represent a deposit or a system. These models
are used in place of the real system for some purpose. Simulation models are built to have the same
distribution, dispersion characteristics, and spatial relationships as the grade values in the deposit.
Conditionally simulated models additionally honor the known values at the sample data locations.
The difference between estimation models and conditional simulations lies in their objectives.

10.1 THE OBJECTIVES OF SIMULATION


Local and global estimations of recoverable reserves are often insufficient at the planning
stage of a new mine or a new section of an operating mine. For the mining engineer, as well as the
metallurgist and chemist, it is often essential to be able to predict the variations of the characteristics
of the recoverable reserves at various stages in the operation.

For instance, in the processing of low grade iron ore deposits, keeping the final product within
strict quality standards may be a complex task whenever impurities such as phosphorus are involved.
The blending process and the flexibility of the plant will depend on the dispersion variance of the
grades received at all scales (daily, monthly, yearly). In this case, the actual grade at any given moment
is not really relevant; what matters is the variability of the mill feed, or the variance of the grades and
tonnages fed over a time period. If kriged estimates are used to forecast that production variance, it
will certainly be underestimated, since kriging smooths reality.

Therefore, a detailed definition of an adequate mining control method is essential. For a
preliminary design, it is admissible to use average values to perform an evaluation. When it comes
to detailed definitions, however, these averages are not sufficient due to local fluctuations.

If the in situ reality were known, the required dispersions, and thus the most suitable working
methods, could be determined by applying various simulated processes to this reality. Unfortunately,
the perfect knowledge of this in situ reality is not available at the planning stages of the operation.
The information available at this stage is usually incomplete, and limited to the grades of a few
samples. The estimations deduced from this information are far too imprecise or smooth for the exact
calculations of dispersions or fluctuations that are required. One solution is conditional simulation.

10.2 WHAT IS CONDITIONAL SIMULATION?
In any simulation, a model is employed in place of a real system to represent that system for
some purpose. To simulate an ore deposit, one has to build a model of the deposit that will reflect
not only the correct grade distribution, but also the correct spatial relationships of the grade values
in the deposit.

In a conditional simulation, there is an additional step where the model values are
conditioned to the experimental data. The conditioning essentially forces the simulation to pass
through the available data points so that certain local characteristics of the deposit can be imparted
to the model. Therefore, the simulation is said to be “conditional” if the resulting realizations honor
the hard data values at their locations. This conditioning gives a certain robustness to the simulation
with respect to characteristics of the real data. If, for example, a sufficient number of data show a
local drift, then the conditional simulations, even though based on a stationary model, will reflect the
local drift in that zone. These conditional simulations can be further improved by adding other
qualitative information available from the real deposit, such as geologic boundaries, fault zones, etc.

For most practical problems, each conditional simulation or realization can be seen as an
exhaustive sampling of a similar field generated by similar “physical random processes.” Each
conditional simulation typically:

•  reproduces a histogram, usually the histogram that is deemed representative of the total
   sample domain.

•  reproduces the covariance function Cz(h) that is deemed representative of the same sample
   domain.

•  honors the sample data values at their locations, i.e., zsim(xα) = z(xα), for all realizations and
   at all data locations xα.

•  contains a vastly larger number of simulated attribute values than the number of sample
   values or conditioning data; the ratio may be somewhere between 100:1 and 1000:1.

Figure 10.2.1 illustrates the relationship between the actual sample values in a deposit and a
conditional simulation of that deposit.

Figure 10.2.1 Schematic relationship between the actual sample values in a deposit and a
conditional simulation of that deposit.

As far as the dispersion of the simulated variable is concerned, there is no difference between
the simulated deposit and the real deposit. The simulated deposit has the advantage of being known
at all points x, and not only at the experimental data points xα. This simulated deposit is also called
a “numerical model” of the real deposit.

10.3 SIMULATION OR ESTIMATION

Estimation and simulation are complementary tools. Estimation is appropriate for assessing
mineral reserves, particularly global in situ reserves. Simulation aims at correctly representing spatial
variability, and is more appropriate than estimation for decisions in which spatial variability is a
critical concern and for risk analysis.

Any estimation technique such as kriging gives only a single “best” estimate; best in some
local sense, such as unbiasedness and minimum error variance, without regard to the global features
of the resulting estimates. For each unknown location in the deposit being estimated, the technique
generates a value which is close, on average, to the unknown true value. This process is repeated for
every unknown point in the deposit without any consideration of the spatial dependence that exists
between the true grades of the deposit. In other words, the estimated values cannot reproduce the
covariance or the variogram computed from the data. Neither can an estimation technique reproduce
the histogram of the data. The only thing it can reproduce is the data values at the known data
locations.

The criteria for measuring the quality of an estimation are unbiasedness and minimal
quadratic error, or estimation variance. There is no reason, however, for such estimators to
reproduce the spatial variability of the true grades. In the case of kriging, for instance, the
minimization of the estimation variance involves a smoothing of the true dispersions. Similarly, the
polygonal method of estimation considers the grade to be constant over the entire polygon of
influence of a sample, so it also underestimates the local variability of the true grades. The estimated
deposit is, thus, a biased basis on which to study the dispersions of the true grades.

By being close on average, estimation techniques such as kriging try to avoid colossal errors
and, if well done, should be globally unbiased. Hence, kriging is a good basis for assessing global ore
reserve estimates. Because it is only close on average, however, the technique produces a global
picture that is smoother than reality and underestimates the extreme values. The smooth estimated
surface gives a false sense that the true reality is equally smooth.

Conditional simulation, on the other hand, provides the same mean, histogram, and
variogram as the real grades (assuming that the samples are representative of the reality). Therefore,
it identifies the main dispersion characteristics of these true grades.

In general, the objectives of simulation and estimation are not compatible. It can be seen from
Figure 10.3.1 that, even though the estimation curve is, on average, closer to the real curve, the
simulation curve is a better reproduction of the fluctuations of the real curve. The estimation curve
is preferred for locating and estimating reserves, while the simulation curve is preferred for studying
the dispersion characteristics of these reserves, remembering that the real curve is known only at the
experimental data points.

Figure 10.3.1 Illustration of conditional simulation: the true but unknown reality (solid line), the
variations approximated by conditional simulation (dotted line), and the reality smoothed by kriging
(dashed line).

10.4 SIMULATION ALGORITHMS

There are many simulation methods available for the conditional simulation of deposits, including
the turning bands method, gaussian sequential simulation, indicator sequential simulation, L-U
decomposition, probability field simulation, and annealing techniques. The turning bands method
was the first, introduced in the early 1970s. Since then many new techniques have been
introduced, particularly since the mid 1980s. No single algorithm is flexible enough to allow the
reproduction of the wide variety of features and statistics encountered in practice.

Gaussian Sequential Simulation

One of the most straightforward algorithms for the simulation of continuous variables is the
gaussian sequential algorithm, which is based on classical multivariate theory. It assumes that all
conditional distributions are normal and are determined exactly by the simple kriging mean and variance.
Since most original data are not multivariate normal, they need to be transformed into normal scores,
then transformed back to the original distribution after the simulation. However, the algorithm is
computationally fast and easy to apply, and it has an established record of successful applications.

Because of the unique properties of the multi-variate gaussian model, gaussian sequential
simulation is perhaps the most convenient method. It is accomplished following the basic
idea below:

The multi-variate pdf f(x1, x2,...,xn; z1, z2,...,zn), where the xi’s denote locations in the domain
A and the zi’s denote particular attribute values at these locations, can be expressed as a product of
univariate conditional distributions as follows:

f(x1, x2,...,xn; z1, z2,...,zn) = f(x1; z1)
                                 × f(x2; z2 | Z(x1) = z1)
                                 × f(x3; z3 | Z(x1) = z1, Z(x2) = z2)
                                 × ...
                                 × f(xn; zn | Z(xα) = zα, α = 1,...,n-1)          (10.4.1)

If all univariate conditional distributions in Equation 10.4.1 are known, then a realization z(x)
of the RF Z(x) can be constructed by a sequence of random drawings from each of the n univariate
conditional distributions:

1. A realization z1 of the random variable Z(x1) is obtained by randomly drawing a value
from the marginal distribution f(x1; z1).

2. The realization z1 is used to condition the distribution of Z(x2).

3. A realization z2 of the random variable Z(x2) is obtained by randomly drawing a value
from the conditional distribution f(x2; z2 | Z(x1) = z1).

4. The realizations z1 and z2 are used to condition the distribution of Z(x3).

5. A realization z3 of the random variable Z(x3) is obtained by randomly drawing a value
from the conditional distribution f(x3; z3 | Z(x1) = z1, Z(x2) = z2).

6. The sequence of random draws and subsequent conditionings is continued until the last
distribution f(xn; zn | Z(xα) = zα, α = 1,...,n-1) is fully conditioned. A realization zn of the last
random variable Z(xn) is then drawn from this distribution.

Note that n, the dimension of the multi-variate joint pdf above, corresponds to the total
number of grid points simulated during the conditional simulation.

For this algorithm to be of any practical use, the complete sequence of conditional
distributions for the given multi-variate pdf must be known. It can be shown that the univariate
conditional distribution of a stationary, multi-variate gaussian RF model Y(x) with covariance
CY(h) or variogram γ(h) is gaussian, with a conditional mean and variance exactly equal to
the simple kriging estimate (or mean) and the simple kriging variance.

Since the simulated values ys(x) are sequentially drawn from exact univariate conditional
distributions as shown in Equation 10.4.1, the covariance CY(h) is reproduced by Ys(x). Unfortunately,
however, most earth science data are not univariate normal, much less multi-variate normal. Thus,
to take advantage of the multi-variate normal model, a normal scores transformation is typically
applied to the initial sample data z(xα).

y(xα) = φ( z(xα) ),    α = 1,...,n                    (10.4.2)

where φ(.) is a one-to-one (invertible) transformation function and the y(xα)’s are normal with mean = 0
and variance = 1. The RF model Y(x) is then assumed to be multi-variate normal, which enables the
drawing of realizations ys(x), s = 1,...,S, from the multi-variate normal cdf fully characterized by the
covariance function CY(h) inferred from the normal scores data y(xα).

These realizations are then back transformed into realizations of Z(x) using the inverse
function φ^-1(.):

zs(x) = φ^-1( ys(x) ),    s = 1,...,S                    (10.4.3)

The resulting RF model Z(x) is said to be multi-φ-normal. In other words, its transform
Y(x) is multi-variate normal.
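
A minimal sketch of the φ and φ^-1 transforms (Equations 10.4.2 and 10.4.3), based on an empirical quantile table, is given below. It assumes scipy is available for the standard normal quantiles; ties are not broken and the tails are simply clamped, whereas production implementations (for example, the nscore and backtr programs of GSLIB) handle ties and tail extrapolation more carefully.

    import numpy as np
    from scipy.stats import norm

    def normal_scores(z):
        """Normal scores transform: map the data to a standard normal (Eq. 10.4.2)."""
        z = np.asarray(z, dtype=float)
        n = z.size
        ranks = np.empty(n)
        ranks[np.argsort(z)] = np.arange(1, n + 1)
        y = norm.ppf((ranks - 0.5) / n)                    # standard normal quantiles
        # quantile table kept for the back transform
        table = (np.sort(z), norm.ppf((np.arange(1, n + 1) - 0.5) / n))
        return y, table

    def back_transform(y_sim, table):
        """Back transform simulated normal scores to original units (Eq. 10.4.3)."""
        z_table, y_table = table
        return np.interp(y_sim, y_table, z_table)          # linear interpolation; tails clamped

    y_data, table = normal_scores(np.array([0.3, 1.2, 0.7, 2.5, 0.9]))
    z_back = back_transform(y_data, table)                 # recovers the original values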

Another problem in implementing the sequential method in Equation 10.4.1 comes from the
increasing number of conditioning data at each sequential step. The number of conditioning data
keeps growing from m to a maximum of m + n - 1 values, where m is the number of conditioning data
and n the number of simulation nodes. The number of simulation nodes n may be as large as 10^6,
which would require the solution of unreasonably large kriging systems. Therefore, in practice, only
those data closest to the node currently being simulated are retained. The rationale for doing this is
that the information contained in data farther from the simulation node is “screened” by the closer
data. The impact of such screened information is deemed small enough that it can be ignored without
consequence. For example, if only the closest four data are retained in each sector of a quadrant
search, then a simple kriging system of maximum dimension 16 would have to be solved at each
simulation node.
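
To make the mechanics concrete, the sketch below is a minimal one-dimensional sequential gaussian simulation in Python. It assumes the data are already in normal-score units (mean 0, variance 1), assumes an exponential covariance with an arbitrary range, and retains only the nmax closest previously informed points as a simple stand-in for the quadrant search of a production implementation. All function and parameter names are illustrative assumptions.

    import numpy as np

    def cov(h, a=30.0):
        """Exponential covariance for unit-variance normal scores (practical range a)."""
        return np.exp(-3.0 * np.abs(h) / a)

    def sgs_1d(grid_x, data_x, data_y, nmax=8, seed=0):
        """Minimal 1-D sequential gaussian simulation on normal scores (mean 0, variance 1)."""
        rng = np.random.default_rng(seed)
        grid_x = np.asarray(grid_x, dtype=float)
        known_x = list(np.asarray(data_x, dtype=float))
        known_y = list(np.asarray(data_y, dtype=float))
        sim = np.empty(grid_x.size)
        for i in rng.permutation(grid_x.size):             # random path over the grid nodes
            x0 = grid_x[i]
            kx, ky = np.array(known_x), np.array(known_y)
            near = np.argsort(np.abs(kx - x0))[:nmax]      # keep only the closest informed points
            kx, ky = kx[near], ky[near]
            C = cov(kx[:, None] - kx[None, :])             # data-to-data covariance matrix
            c = cov(kx - x0)                               # data-to-node covariances
            lam = np.linalg.solve(C + 1e-10 * np.eye(kx.size), c)   # simple kriging weights
            sk_mean = lam @ ky                             # SK mean (stationary mean is 0)
            sk_var = max(1.0 - lam @ c, 0.0)               # SK variance
            sim[i] = rng.normal(sk_mean, np.sqrt(sk_var))  # draw from the conditional distribution
            known_x.append(x0)                             # the simulated value conditions
            known_y.append(sim[i])                         # all subsequent nodes
        return sim

    # Usage: simulate 101 nodes conditioned on three normal-score data values.
    grid = np.linspace(0.0, 100.0, 101)
    realization = sgs_1d(grid, data_x=[10.0, 42.0, 77.0], data_y=[-0.8, 1.3, 0.2])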

Indicator Sequential Simulation

Indicator sequential simulation allows different patterns of spatial continuity for different
cutoffs. It also allows the incorporation of secondary information and constraint intervals. It is the
preferred method if one is concerned with proportions, categorical variables, or the continuity
properties of the extreme values. However, the algorithm is computationally intensive and slower.

The basic idea of indicator sequential simulation is exactly analogous to that of gaussian
sequential simulation. Again, the multi-variate pdf f(x1, x2,...,xn; z1, z2,...,zn) is expressed as a product
of univariate conditional distributions, and these distributions are approximated using a series of
indicator transforms of Z(x) and simple or ordinary kriging. It has been shown that any kriging
technique that produces a distribution of possible outcomes (e.g., indicator kriging, probability
kriging, disjunctive kriging) can be used to obtain the local conditional distributions.

The main advantages of indicator sequential simulation over gaussian sequential simulation
are the following:

•  It is possible to control N spatial covariances (or indicator variograms), one per cutoff,
   instead of a single one.

•  Not only the hard data values z(xα), but also any number of soft local data, such as constraint
   intervals z(xα) ∈ [aα, bα] and prior probability distributions for the datum value z(xα), can be
   utilized. Such additional or soft information may improve the accuracy of the resulting
   conditional simulations relative to simulations generated without it.

•  It is particularly adept at displaying the spatial connectivity of extreme values. Gaussian
   sequential simulation, on the other hand, fails to show such connectivity because the two
   indicators I(z;x) and I(z;x+h) become independent as the cutoff value z moves away from the
   median.

One should keep in mind that the indicator sequential algorithm requires the calculation of
indicator variograms as well as the construction and solution of one kriging system for each cutoff
value. Therefore, the computational requirements of this simulation are much more demanding than
those of gaussian sequential simulation.

Indicator sequential simulation is presently the only conditional simulation method that is
not directly or indirectly based on the gaussian RF model. The method is non-parametric in the sense
that it need not call for prior estimation of any distributional parameters; all results can be derived
from the data. In cases where the data are plentiful and the gaussian RF model is shown to be
inappropriate, the indicator RF model provides an alternative.

The gaussian RF model is usually preferred if the problem at hand has more to do with spatial
averages, whereas the indicator RF model is preferred if one is concerned with proportions, categorical
variables, or the continuity properties of the extreme values.

Turning Bands Method

The very first geostatistical simulation algorithm took a different approach to obtaining a
conditional simulation. The steps for this method are given below:

1. Krige the grid to obtain Z*.

2. Produce an unconditional simulation (Zucs) that has the correct histogram and variogram
but happens not to honor the available samples.

3. Sample the unconditional simulation at the locations where actual sample values exist.

4. Krige the unconditional simulation to obtain Z*ucs.

5. Compute the simulated value:

Zsim(x) = Z*(x) + [Zucs(x) - Z*ucs(x)]

Z*(x) is the kriged estimate at location x, and [Zucs(x) - Z*ucs(x)] is the simulated error. The
unconditional simulation itself is produced by spatially averaging uncorrelated values to create one-
dimensional “bands” that radiate from a common origin, and then averaging again by projecting each
point of the simulation grid onto these bands.
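
The five steps can be summarized in the schematic Python sketch below, where krige, unconditional_sim, and nearest_node are hypothetical stand-ins for the actual kriging and simulation routines rather than real library calls.

    import numpy as np

    def condition_simulation(grid, data_x, data_z, krige, unconditional_sim, nearest_node):
        """Condition a simulation to the data: Zsim = Z* + (Zucs - Z*ucs)."""
        z_star = krige(grid, data_x, data_z)                  # 1. krige the grid from the real data
        z_ucs = unconditional_sim(grid)                       # 2. unconditional simulation
        idx = [nearest_node(grid, x) for x in data_x]         # 3. sample it at the data locations
        z_ucs_star = krige(grid, data_x, z_ucs[idx])          # 4. krige those sampled values
        return z_star + (z_ucs - z_ucs_star)                  # 5. add the simulated error

At a data location the kriged estimates are exact, so the simulated error vanishes there and the conditioning requirement discussed in the next section is satisfied.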

10.5 MAKING A SIMULATION CONDITIONAL

Suppose that we have a set of unconditionally simulated values Z(x) for every point of the deposit,
generated so as to have the same histogram and variogram as the real grades Y(x), which are known
only at the sample points xα.

Using the known values Y(xα) at the points xα, we can compute a kriged estimate Y*(x) for
any point x, remembering that if x = xα then Y*(xα) = Y(xα) (the exact interpolation property of
kriging).

Now, from the values of Z(xα) at the sampling points xα, we can compute a set of kriged
estimates Z*(x) for all x.

We now have three sets of values for each point:

Z(x), Z*(x), Y*(x)

and remembering that

Z*(xα) = Z(xα) and Y*(xα) = Y(xα)

we assign to each point x the value of the function

Zs(x) = Y*(x) + (Z(x) - Z*(x))

The properties of this function are:

Zs(xα) = Y(xα), since Y*(xα) = Y(xα) and Z*(xα) = Z(xα),

and thus the conditionality requirement is satisfied.

Further,

E(Zs(x)) = m

since E(Y*(x)) = m

and E(Z(x) - Z*(x)) = 0.

In addition, because Z is simulated independently of Y, the error Z(x) - Z*(x) is uncorrelated with
Y*(x), so Zs(x) also reproduces the spatial variability of the real grades.

This takes care of generating a conditional set of values. The procedure is summed up in
Figure 10.5.1.

Figure 10.5.1 Summary of the algorithm generating conditionally simulated grade as the sum of
three random variables all derived from the original set of samples.

10.6 CONDITIONAL SIMULATION FUNCTIONS

The procedures to perform various functions related to conditional simulation are provided in a
new group in the MEDSYSTEM®/MineSight® Procedure Manager. This new group is called
CONDITIONAL SIMULATION. The functions currently available in this group are listed below.

•  Normal scores transformation
•  Histograms and statistics of normal scores
•  Generate grid nodes and assign conditioning data
•  Sequential gaussian simulation
•  Sequential indicator simulation
•  Back transformation
•  Histogram and statistics of simulated data
•  Variograms using simulated data
•  Variogram modeling of simulated data
•  Display and mapping of simulated values

Once the simulated values are stored in the model file, the standard
MEDSYSTEM®/MineSight® programs can be used to verify, display, plot, and summarize the results
from the simulation model.

10.7 TYPICAL USES OF SIMULATED DEPOSITS

A conditionally simulated deposit represents a known numerical model on a very dense grid.
Because the simulation can only reproduce known, modeled structures, the spacing of the simulation
grid is limited by the dimensions of the smallest modeled structure. Various methods of sampling,
selection, mining, haulage, blending, ore control, and so on can be applied to this numerical model
to test their efficiency before applying them to the real deposit. Figure 10.7.1 illustrates the use of
conditional simulation to forecast departures from planning in the mining of a deposit.

Figure 10.7.1 Use of conditional simulation to forecast departures from planning in the mining of
a deposit.

Since conditional simulation provides equally probable, alternative sets of simulated
values of the deposit, all consistent with the same available information, it gives a more
complete picture of the uncertainty in the estimated values. This uncertainty can be used in conducting
a serious financial risk analysis to weigh the economic consequences of overestimation and
underestimation.

Some of the examples of typical uses of simulated deposits are as follows:

•  Application in grade control to determine dig-lines that are most likely to maximize the profit
   or minimize the dollar loss.
•  Comparative studies of various estimation methods and approaches to mine planning
   problems.
•  Studies of the sampling level and drillhole spacing necessary for any given objective.
•  Studies to investigate the influence of mining machines on the ore-waste ratios and on the
   variability in tonnage and grade of ore produced.
•  Application for generating models of porosity and permeability.
•  Application in petroleum reservoir production.
•  Application to determine the change of support correction factors.
•  Application to blending of stockpiles to stabilize tonnage and quality of production (grades,
   hardness, mineralogy).
•  Studies to determine the probability of exceeding a regulatory limit and application in the
   development of emission control strategies.
•  Studies to quantify the variability of impurities or contaminants in metal or coal delivered
   to a customer at different scales and time frames.
•  Prediction of recoverable reserves.

It must be understood that the results obtained from simulated deposits will apply to reality
only to the extent to which the simulated deposit reproduces the essential characteristics of the real
deposit. Therefore, the more the real deposit is known, the better its model will be, and the closer the
conditional simulation will be to reality. As the quality of the conditional simulation improves, not
only will the reproduced structures of variability become closer to those of reality, but so will the
qualitative characteristics (geology, alteration, and so on) that can be introduced into the numerical
model. It must be stressed, however, that simulation cannot replace a good sampling campaign of
the real deposit.

11
References
Arik, A., 1990, “Effects of Search Parameters on Kriged Reserve Estimates,” International Journal
of Mining and Geological Engineering, Vol. 8, No. 12, pp. 305-318.

Arik, A., 1992, “Outlier Restricted Kriging: A New Kriging Algorithm for Handling of Outlier High
Grade Data in Ore Reserve Estimation,” Proceedings, APCOM, Tucson, Arizona, pp. 181-188.

Arik, A., 1998, “Nearest Neighbor Kriging: A Solution to Control the Smoothing of Kriged
Estimates,” SME Annual Meeting, Orlando, Florida, Preprint 98-73.

Dagdelen, K., Verly, G., and Coskun, B., 1997, “Conditional Simulation for Recoverable Reserve
Estimation,” SME Annual Meeting, Denver, Colorado, Preprint 97-201.

David, M., 1977, Geostatistical Ore Reserve Estimation, Elsevier, Amsterdam.

Deutsch, C.V., and Journel, A.G., 1992, GSLIB: Geostatistical Software Library and User’s Guide,
Oxford University Press, New York.

Dowd, P.A., 1982, “Lognormal Kriging: The General Case,” Mathematical Geology, Vol. 14, No. 5.

Isaaks, E.H., 1996, Geostatistics Training Course Notes.

Isaaks, E.H., and Srivastava, R.M., 1989, Applied Geostatistics, Oxford University Press, New York.

Journel, A.G., and Arik, A., 1988, “Dealing with Outlier High Grade Data in Precious Metals Deposits,”
Proceedings, Computer Applications in the Mineral Industry, Balkema, Rotterdam, pp. 161-171.

Journel, A.G., and Huijbregts, Ch.J., 1978, Mining Geostatistics, Academic Press, London.

Kim, Y.C., Knudsen, H.P., and Baafi, E.Y., 1980, Application of Conditional Simulation to Emission
Control Strategy Development, University of Arizona, Tucson, Arizona.

Parker, H.M., 1980, “The Volume-Variance Relationship: A Useful Tool for Mine Planning,”
Geostatistics (Mousset-Jones, P., ed.), McGraw Hill, New York.

Rossi, M.E., Parker, H.M., and Roditis, Y.S., 1994, “Evaluation of Existing Geostatistical Models and
New Approaches in Estimating Recoverable Reserves,” SME Annual Meeting, Preprint 94-322.

Verly, G.W., and Sullivan, J.A., 1985, “Multigaussian and Probability Kriging, Application to Jerritt
Canyon Deposit,” Mining Engineering, Vol. 37, pp. 568-574.