Topic: BASIC GEOSTATISTICS

Subtopic: Status

Introduction

Classical Statistical Concepts

Data Posting and Validation

Regionalized Variables

Kriging

Data Integration

Conditional Simulation

Public Domain Geostatistics Programs

Case Studies

Selected Readings

Geostatistics Glossary



COMPILED BY: JIMBA OLUWAFEMI SOLOMON









THERE IS LOVE IN SHARING!!
THIS IS FOR THE BENEFIT OF MY COLLEAGUES IN
PETROLEUM GEOSCIENCE IMPERIAL COLLEGE LONDON
WHO WILL AND MUST GRADUATE IN SEPTEMBER 2008.

WISH YOU ALL SUCCESS!!!
















INTRODUCTION
Before undertaking any study of Geostatistics, it is necessary to become familiar
with certain key concepts drawn from Classical Statistics, which form the basic
building blocks of Geostatistics. Because the study of Statistics generally deals
with quantities of data, rather than a single datum, we need some means to deal
with that data in a manageable form. Much of Statistics deals with the
organization, presentation, and summary of data. Isaaks and Srivastava (1989)
remind us that “Data speaks most clearly when organized”.
This section reviews a number of classic statistical concepts that are frequently
used during the course of geostatistical analysis. By understanding these
concepts, we will gain the tools needed to analyze and describe data, and to
understand the relationships between different variables.
STATISTICAL NOTATION
Statistical notation uses Roman or Greek letters in equations to represent similar
concepts, with the distinction being that:
- Greek notation describes Populations: measures of a population are
called parameters
- Roman notation describes Samples: measures of a sample are called
statistics
The following table lists the Greek letters and, where relevant, their significance in statistics.
Letter name   Upper & lower case   Statistical notation
alpha         Α α
beta          Β β
gamma         Γ γ
delta         Δ δ
epsilon       Ε ε
zeta          Ζ ζ
eta           Η η
theta         Θ θ
iota          Ι ι
kappa         Κ κ
lambda        Λ λ
mu            Μ μ                  μ: mean of a population
nu            Ν ν
xi            Ξ ξ
omicron       Ο ο
pi            Π π
rho           Ρ ρ                  ρ: correlation coefficient
sigma         Σ σ                  Σ: summation; σ: standard deviation of a population
tau           Τ τ
upsilon       Υ υ
phi           Φ φ
chi           Χ χ                  x̄ (Roman x with a bar): mean of a sample
psi           Ψ ψ
omega         Ω ω

It is important to note that in some cases, a letter may take on a different
meaning, depending on whether the letter is upper case or lower case. Certain
Roman letters take on additional importance as part of the standard notation of
Statistics or Geostatistics.






Letter   Statistical notation
E        Event
F        Distribution
f        Frequency
f        Probability function for a random variable
h        Lag distance (distance between two sample points)
m        Sample mean
N        Population size
n        Sample size (or number of observations in a data set)
O        Observed frequencies
o        Outcomes
P        Probability
p        Proportion
s        Standard deviation of a sample
V        Variance
X        Random variable
x        A single value of a random variable
MEASUREMENT SYSTEMS
Because the conclusions of a quantitative study are based in part on inferences
drawn from measurements, it is important to consider the nature of the
measurement systems from which data are collected. Measurements are
numerical values that reflect the amount or magnitude of some property. The
manner in which numerical values are assigned determines the measurement
scale, and thereby determines the type of data analysis (Davis, 1986).
There are four measurement scales, each more rigorously defined than its
predecessor and thus containing more information. The first two are the nominal
and ordinal scales, in which we classify observations into exclusive categories.
The other two scales, interval and ratio, are the ones we normally think of as
“measurements,” because they involve determinations of the magnitude of an
observation (Davis, 1986).
Nominal Scale
This measurement classifies observations into mutually exclusive categories of
equal rank, such as “red,” “green,” or “blue.” Symbols like “A,” “B,” “C,” or
numbers are also often used. In geostatistics, we may wish to predict facies
occurrence, and may therefore code the facies as 1, 2 and 3, for sand, siltstone,
and shale, respectively. Using this scale, there is no connotation that 2 is “twice
as much” as 1, or that 3 is “greater than” 2.

Ordinal Scale
Observations are sometimes ranked hierarchically. A classic example taken from
geology is Mohs' scale of hardness, in which mineral rankings extend from one to
ten, with higher ranks signifying increased hardness. The step between
successive states is not equal in this scale. In the petroleum industry, kerogen
types are based on an ordinal scale, indicative of stages of organic diagenesis.
Interval Scale
This scale is so named because the width of successive intervals is constant.
The most commonly cited example of an interval scale is temperature. A change
from 10 to 20 degrees C is the same as the change from 110 to 120 degrees C.
This scale is commonly used for many measurements. An interval scale does not
have a natural zero, or a point where the magnitude is nonexistent. Thus, it is
possible to have negative values. Within the petroleum industry, reservoir
properties are measured along a continuum, but there are practical limits for the
measurements. (It would be hard to conceive of negative porosity, permeability,
or thickness, or of porosity greater than 100%.)
Ratio Scale
Ratio scales not only have equal increments between steps, but also have a true
zero point. Ratio scales represent the highest form of measurement, and all types
of mathematical and statistical operations can be performed with them. Many
geological measurements are based on a ratio scale, because they have units of
length, volume, mass, and so forth.
For most of our geostatistical studies, we will be primarily concerned with the
analysis of interval and ratio data. Typically, no distinction is made between the
two, and they may occur intermixed in the same problem. For example, in trend
surface analysis, the dependent (mapped) variable may be measured on a ratio
scale, whereas the geographical coordinates are on an interval scale.
POPULATIONS AND SAMPLES
INTRODUCTION
Statistical analysis is built around the concepts of “populations” and “samples.”
A population consists of a well-defined set of elements (either finite or infinite).
More specifically, a population is the entire collection of those elements.
Commonly, such elements are measurements or observations made on items of
a specific type (porosity or permeability, for example). A finite population might
consist of all the wells drilled in the Gulf of Mexico in 1999, whereas an infinite
population might be all wells drilled in the Gulf of Mexico, past, present, and
future.
A sample is a subset of elements drawn from the population (Davis, 1986).
Samples are studied in order to make inferences about the population itself.
Parameters, Data, And Statistics
Populations possess certain numerical characteristics (such as the population
mean) which are known as parameters. Data are measured or observed values
obtained by sampling the population. A statistic is similar to a parameter, but it
applies to numerical characteristics of the sample data.
Within the population, a parameter consists of a fixed value, which does not
change. Statistics are used to estimate parameters or test hypotheses about the
parent population (Davis, 1986). Unlike the parameter, the value of a statistic is
not fixed; it may change from one sample to another drawn from the same
population.
Remember that values from Populations (parameters) are often assigned Greek
letters, while the values from Samples (statistics) are assigned Roman letters.
Random Sampling
Samples should be acquired from the population in a random manner. Random
sampling is defined by two properties.
- First, a random sample must be unbiased, so that each item in the population
has the same chance of being chosen as any other item.
- Second, the random sample must be independent, so that selecting one
item from the population has no influence on the selection of other
items.
Random sampling produces an unbiased and independent result, so that, as the
sample size increases, we have a better chance of understanding the true nature
(distribution) of the population.
One way to determine whether random samples are being drawn is to analyze
sampling combinations. The number of different samples of n measurements that
can be drawn from a population of N elements is given by the equation:
$$C^N_n = \frac{N!}{n!\,(N-n)!}$$
Where:
$C^N_n$ = the number of combinations of samples
N = the number of elements in the population
n = the number of elements in the sample
If the sampling is conducted in a manner such that each of the $C^N_n$ samples has
an equal chance of being selected, the sampling program is said to be random
and the result is a random sample (Mendenhall, 1971).
Sampling Methods
The method of sampling affects our ability to draw inferences about our data
(such as estimation of values at unsampled locations) because we must know the
probability of an observation in order to arrive at a statistical inference.
Replacement
The issue of replacement plays an important role in our sampling strategy. For
example, if we were to draw samples of cards from a population consisting of a
deck, we could either:
- Draw a card from the deck, add its value to our hand, then draw
another card
Or
- Draw a card from the deck, note its value, and put it back in the deck, then
draw a card from the deck again.
In the first case, we sample without replacement; in the second case we sample
with replacement. Sampling without replacement prevents us from sampling that
value again, while sampling with replacement allows us the chance to pick that
same value again in our sample.
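The combination count and the two card-drawing schemes can be sketched in Python with only the standard library. This is an illustrative sketch rather than part of the original notes: the deck is represented by the integers 0 to 51, the hand size of five is arbitrary, and the seed is fixed only so the draws are reproducible.

```python
import math
import random

def n_combinations(N: int, n: int) -> int:
    """Number of distinct samples of size n drawn from N elements: N! / (n!(N - n)!)."""
    return math.factorial(N) // (math.factorial(n) * math.factorial(N - n))

# Cross-check against the standard-library helper (Python 3.8+).
assert n_combinations(52, 5) == math.comb(52, 5)

deck = list(range(52))        # a 52-card deck, cards labelled 0..51
rng = random.Random(42)       # fixed seed for a reproducible illustration

# Without replacement: once a card is drawn it cannot be drawn again.
hand = rng.sample(deck, k=5)

# With replacement: every draw is from the full deck, so repeats are possible.
draws = rng.choices(deck, k=5)

print(n_combinations(52, 5))  # 2598960
print(hand, draws)
```

`random.sample` never returns the same card twice, whereas `random.choices` may.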
Oilfield Applications to Sampling
When observations having certain characteristics are systematically excluded
from the sample, whether deliberately or inadvertently, the sampling is
considered biased. In the oil industry, we face this situation quite frequently.
Suppose, for example, we are interested in the pore volume of a particular
reservoir unit for pay estimation. Typically, we use a threshold or porosity cutoff
when making the calculation, thus deliberately biasing the true pore volume to a
larger value.
Similarly, the process of drilling wells in a reservoir necessarily involves sampling
without replacement.
Furthermore, any sample data set will provide only a sparse and incomplete
picture of the entire reservoir. The sampling routine (also known as the drilling
program) is highly biased and dependent, and rightly so: any drilling program will
be biased toward high porosity, high permeability, high structural position, and
ultimately, high production. In addition, the success or failure of nearby wells will
influence further drilling. Because the sample data set represents a minuscule
subset of the population, we will never really know the actual population
distribution function of the reservoir. (We will discuss bias in more detail in our
discussion of summary statistics.)
However, despite these limitations, our task is to infer properties about the entire
reservoir from our sample data set. To accomplish this, we need to use various
statistical tools to understand and summarize the properties of the samples to
make inferences about the population (reservoir).
TRIALS, EVENTS, AND PROBABILITY
INTRODUCTION
In statistical parlance, a trial is an experiment that produces an outcome which
consists of either a success or a failure. An event is a collection of possible
outcomes of a trial. Probability is a measure of the likelihood that an event will
occur, or a measure of that event's relative frequency. The following discussion
introduces events and their relation to one another, then provides an overview of
probability.
EVENTS
An event is a collection of possible outcomes, and this collection may contain
zero or more outcomes, depending on how many trials are conducted. Events
can be classified by their relationship to one another:
Independent Events
Events are classified as Independent if the occurrence of event A has no bearing
on the occurrence of event B, and vice versa.
Dependent Events
Events are classified as Dependent if the occurrence of event A influences the
occurrence of event B.
Mutually Exclusive Events
Events are Mutually Exclusive if the occurrence of either event precludes the
occurrence of the other. Two independent events, each with a nonzero
probability, cannot be mutually exclusive.
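These definitions can be checked on the two-coin experiment discussed later in this section. In this minimal sketch (an illustration, not from the original notes) the helper names are hypothetical: `first_heads` is "first coin shows heads", `second_heads` is "second coin shows heads", and `first_tails` is "first coin shows tails". Exact fractions avoid rounding in the comparisons.

```python
from fractions import Fraction
from itertools import product

# Equally likely outcomes of tossing two coins: ('H','H'), ('H','T'), ('T','H'), ('T','T').
space = list(product("HT", repeat=2))

def prob(event) -> Fraction:
    """Probability of an event, given as a predicate over outcomes."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))

def first_heads(o):
    return o[0] == "H"

def second_heads(o):
    return o[1] == "H"

def first_tails(o):
    return o[0] == "T"

# Independent: P(AB) equals P(A)P(B); both can occur together, so not mutually exclusive.
p_ab = prob(lambda o: first_heads(o) and second_heads(o))
independent = p_ab == prob(first_heads) * prob(second_heads)  # 1/4 == 1/2 * 1/2

# Mutually exclusive: P(AC) = 0, which differs from P(A)P(C) = 1/4, so not independent.
p_ac = prob(lambda o: first_heads(o) and first_tails(o))
exclusive_and_independent = p_ac == 0 and p_ac == prob(first_heads) * prob(first_tails)
```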
PROBABILITY
Probability is a measure of the likelihood that an event will occur, or a measure of
that event's relative frequency. The measure of probability is scaled from 0 to 1,
where:
- 0 represents no chance of occurrence, and
- 1 represents certainty that the event will occur.
Probability is just one tool that enables the statistician to use information from
samples to make inferences or describe the population from which the samples
were obtained (Mendenhall, 1971). In this discussion, we will review discrete and
conditional probabilities.
Discrete Probability
All of us have an intuitive concept of probability. For example, if asked to guess
whether it will rain tomorrow, most of us would reply with some confidence that
rain is either likely or unlikely. Another way of expressing the estimate is to use a
numerical scale, such as a percentage scale. Thus, you might say that there is a
30% chance of rain tomorrow, and imply that there is a 70% chance it will not
rain.
The chance of rain is an example of discrete probability; it either will or it will not
rain. The probability distribution for a discrete random variable is a formula, table,
or graph providing the probability associated with each value of the random
variable (Mendenhall, 1971; Davis, 1986). For a discrete distribution, probability
can be defined by the following:
$$P(E) = \frac{\text{number of outcomes corresponding to event } E}{\text{total number of possible outcomes}}$$
Where:
P = the probability of a particular outcome, and
E = the event
Consider the following classic example of discrete probability, used almost
universally in statistics texts.
Coin Toss Experiment
Coin tossing is a clear-cut example of discrete probability. The event has two
states and must occupy one or the other; except for the vanishingly small
possibility that the coin will land precisely on edge, it must come up either heads
or tails (Davis, 1986; Mendenhall, 1971).
The experiment is conducted by tossing two unbiased coins. When a single coin
is tossed, it has two possible outcomes: heads or tails. Because each outcome is
equally likely, the probability of obtaining a head is ½. This does not imply that
every other toss results in a head, but given enough tosses, heads will appear
approximately one-half of the time.
Now let us look at the two-coin example. The sample points for this experiment
with their respective probabilities are given below (taken from Mendenhall, 1971).


Sample point   Coin 1   Coin 2   P(Ei)   y
E1             H        H        ¼       2
E2             H        T        ¼       1
E3             T        H        ¼       1
E4             T        T        ¼       0

Let y equal the number of heads observed. We assign the value y = 2 to sample
point E1, y = 1 to sample point E2, etc. The probability of each value of y may be
calculated by adding the probabilities of the sample points in the numerical event.
The numerical event y = 0 contains one sample point, E4; y = 1 contains two
sample points, E2 and E3; while y = 2 contains one sample point, E1.

The Probability Distribution Function for y, where y = Number of Heads
y   Sample points in y   p(y)
0   E4                   ¼
1   E2, E3               ½
2   E1                   ¼
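The distribution in the table can be reproduced by enumerating the sample space and adding the probabilities of the sample points that make up each numerical event, exactly as described above. This is a small illustrative sketch, not part of the original notes; exact fractions are used so the results match ¼, ½ and ¼ without rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The four sample points (H,H), (H,T), (T,H), (T,T), each with probability 1/4.
sample_points = list(product("HT", repeat=2))
p_point = Fraction(1, len(sample_points))

# y = number of heads; p(y) is the sum of the probabilities of the points in each event.
p_y = Counter()
for point in sample_points:
    y = point.count("H")
    p_y[y] += p_point

print(dict(sorted(p_y.items())))
```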

Thus, for this experiment there is a 25% chance of observing two heads from a
single toss of the two coins. The histogram contains three classes for the random
variable y, corresponding to y = 0, y = 1, and y = 2. Because p(0) = ¼, the
theoretical relative frequency for y = 0 is ¼; p(1) = ½, hence the theoretical
relative frequency for y = 1 is ½, etc. The histogram is shown in Figure 1
(Probability Histogram for p(y) (modified from Davis, 1986)).



Figure 1


If you were to draw a sample from this population by throwing two balanced
coins, say, 100 times, recording the number of heads observed each time, and
constructing a histogram of the 100 measurements, your histogram would appear
very similar to that of Figure 1. If you repeated the experiment with 1000 throws,
the similarity would be even more pronounced.
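The convergence described above can be checked with a short simulation. This sketch assumes that Python's pseudo-random generator is an adequate stand-in for balanced coins; the function name, seed and throw counts are arbitrary choices for illustration.

```python
import random
from collections import Counter

def simulate_heads(n_throws: int, seed: int = 0) -> dict[int, float]:
    """Throw two fair coins n_throws times; return the relative frequency of y = 0, 1, 2 heads."""
    rng = random.Random(seed)
    # Each randint(0, 1) is one coin: 1 = heads, 0 = tails.
    counts = Counter(rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_throws))
    return {y: counts[y] / n_throws for y in (0, 1, 2)}

for n in (100, 1000, 100_000):
    print(n, simulate_heads(n))
```

As the number of throws grows, the relative frequencies settle toward the theoretical values p(0) = ¼, p(1) = ½ and p(2) = ¼.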
Conditional Probability
The concept of conditional probability is key to oil and gas exploration, because
once a well is drilled, it makes more information available, and allows us to revise
our estimates of the probability of further outcomes or events. Two events are
often related in such a way that the probability of occurrence of one event
depends upon whether the other event has or has not occurred. Such a
dependence on a prior event describes the concept of Conditional Probability: the
chance that a particular event will occur depends on whether another event
occurred previously.
For example, suppose an experiment consists of observing weather on a specific
day. Let event A = 'snow' and B = 'temperature below freezing'. Obviously,
events A and B are related, but the probability of snow, P(A), is not the same as
the probability of snow given the prior information that the temperature is below
freezing. The probability of snow, P(A), is the fraction of the entire population of
observations which result in snow. Now examine the sub-population of
observations resulting in B, temperature below freezing, and the fraction of these
resulting in snow, A. This fraction, called the conditional probability of A given B,
may equal P(A), but we would expect the chance of snow, given freezing
temperatures, to be larger.
In statistical notation, the conditional probability that event A will occur given that
event B has occurred already is written as:
P(A|B)
where the vertical bar in the parentheses means “given” and events appearing to
the right of the bar have occurred (Mendenhall, 1971).
Thus, we define the conditional probability of A given B as:
$$P(A|B) = \frac{P(AB)}{P(B)}$$
and we define the conditional probability of B given A as follows:
$$P(B|A) = \frac{P(AB)}{P(A)}$$
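As a worked illustration of these definitions, the snow example can be expressed as counts from a weather record. The counts below are purely hypothetical and were chosen only so that the arithmetic is easy to follow. The conditional probabilities come straight from the two formulas: the joint relative frequency divided by the relative frequency of the conditioning event.

```python
from fractions import Fraction

# Hypothetical record of daily weather observations (illustrative numbers only).
n_days = 1000
n_freezing = 300            # days with B: temperature below freezing
n_snow = 100                # days with A: snow
n_snow_and_freezing = 90    # days with both A and B

p_a = Fraction(n_snow, n_days)
p_b = Fraction(n_freezing, n_days)
p_ab = Fraction(n_snow_and_freezing, n_days)

p_a_given_b = p_ab / p_b    # P(A|B) = P(AB) / P(B)
p_b_given_a = p_ab / p_a    # P(B|A) = P(AB) / P(A)

print(p_a, p_a_given_b, p_b_given_a)  # 1/10 3/10 9/10
```

With these made-up numbers, the chance of snow triples, from 1/10 to 3/10, once we know the temperature is below freezing.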

Bayes' Theorem on Conditional Probability
Bayes' Theorem allows the conditional probability of an event to be updated as
newer information becomes available. Quite often, we wish to find the conditional
probability of an event, A, given that event B occurred at some time in the past.
Bayes' Theorem for the probability of causes follows easily from the definition of
conditional probability:
$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B|A)\,P(A) + P(B|A')\,P(A')}$$
Where:
P(A | B) = the probability that event A will occur, given that event B has
already occurred
P(B | A) = the probability that event B will occur, given that event A has
already occurred
P(A) = the probability that event A will occur
P(B | A') = the probability that event B will occur, given that event A has not
already occurred
P(A') = the probability that event A will not occur
A practical geostatistical application using Bayes' Theorem is described in an
article by Doyen et al. (1994) entitled Bayesian Sequential Indicator Simulation
of Channel Sands in the Oseberg Field, Norwegian North Sea.
Additive Law of Probability
Another approach to probability problems is based upon the classification of
compound events, event relations, and two probability laws. The first is the
Additive Law of Probability, which applies to unions.
The probability of the union (A ∪ B) is equal to:
$$P(A \cup B) = P(A) + P(B) - P(AB)$$
If A and B are mutually exclusive, P(AB) = 0 and
$$P(A \cup B) = P(A) + P(B)$$
Multiplicative Law of Probability
The second law of probability is called the Multiplicative Law of Probability, which
applies to intersections.
Given two events, A and B, the probability of the intersection, AB, is equal to
$$P(AB) = P(A)\,P(B|A) = P(B)\,P(A|B)$$
If A and B are independent, then P(AB) = P(A)P(B).
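A worked example may make Bayes' Theorem more concrete. The numbers below are purely hypothetical and are not taken from Doyen et al. (1994): A stands for "channel sand present at a location", and B stands for "a seismic attribute exceeds some cutoff". Exact fractions are used so that the final check of the Multiplicative Law holds without rounding error.

```python
from fractions import Fraction

def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """Posterior P(A|B) from the prior P(A) and the likelihoods P(B|A) and P(B|A')."""
    p_not_a = 1 - p_a
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a  # denominator: total probability of B
    return p_b_given_a * p_a / p_b

# Hypothetical inputs: prior P(A) and the two likelihoods.
p_a = Fraction(3, 10)
p_b_given_a = Fraction(4, 5)
p_b_given_not_a = Fraction(1, 5)

posterior = bayes(p_a, p_b_given_a, p_b_given_not_a)  # 12/19, roughly 0.63

# Multiplicative Law check: P(A)P(B|A) must equal P(B)P(A|B).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
assert p_a * p_b_given_a == p_b * posterior
```

Observing B raises the probability of A from the prior 0.30 to about 0.63.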

RANDOM VARIABLES AND THEIR PROBABILITY DISTRIBUTIONS
INTRODUCTION
Geoscientists are often tasked with estimating the value of a reservoir property at
a location where that property has not been previously measured. The estimation
procedure must rely upon a model describing how the phenomenon behaves at
unsampled locations. Without a model, there is only sample data, and no
inference can be made about the values at locations that were not sampled. The
underlying model and its behavior form one of the essential elements of the
geostatistical framework.
Random variables and their probability distributions form the foundation of the
geostatistical method. Unlike many other estimation methods (such as linear
regression, inverse distance, or least squares) that do not state the nature of their
model, geostatistical estimation methods clearly identify the basis of the models
used (Isaaks and Srivastava, 1989). In this section, we define the random
variable and briefly review the essential concepts of important probability
distributions. The random variable is further explained later, in Spatial Correlation
Analysis and Modeling.
THE PROBABILISTIC APPROACH
Deterministic models are applicable only when the process that generated the
data is known in sufficient detail to enable an accurate description of the entire
population to be made from only a few sample values. Unfortunately, few
reservoir processes are understood well enough to permit application of
deterministic models. Although we know the physics or chemistry of the
fundamental processes, the variables we study in reservoir data sets are often
the product of complex interactions that are not fully quantifiable. These
processes include, for example, depositional mechanisms, tectonic processes,
and diagenetic alterations.
For most reservoir data sets, we must accept that there is an unavoidable degree
of uncertainty about how the attribute behaves between sample locations (Isaaks
and Srivastava, 1989). Thus, a probabilistic approach is required, and the
random function models introduced here recognize this fundamental
uncertainty, providing us with tools to estimate values at unsampled locations.
The following discussion describes the two kinds of random variables. Next, we'll
discuss the probability distributions or functions associated with each type of
random variable.


RANDOM VARIABLE DEFINED
A random variable can be defined as a numerical outcome of an experiment
whose values are generated randomly according to some probabilistic
mechanism. A random variable associates a unique numerical value with every
outcome, so the value of the random variable will vary with each trial as the
experiment is repeated.
The throwing of a die, for example, produces values randomly from the set
{1, 2, 3, 4, 5, 6}. The coin toss is another experiment that produces numbers
randomly. (In the case of a coin toss, however, we first need to assign numerical
values to the outcomes, for example "heads" = 0 and "tails" = 1; then we can draw
randomly from the set {0, 1}.)
TWO CLASSES OF RANDOM VARIABLES
There are two different classes of random variables, with the distinction based on
the sample interval associated with the measurement. The two classes are the
discrete and the continuous random variable. We will discuss each in turn.
Discrete Random Variables
A discrete random variable may be identified by the number and nature of the
values it assumes: it may assume only a finite (or countably infinite) number of
distinct values. "Distinct values" is the operative phrase here, for example
0, 1, 2, 3, 4, 5, as opposed to each and every number between 0 and 1, which
would produce an infinite continuum of values.
In most practical problems, discrete random variables represent count (or
enumerated) data, such as point counts of minerals in a thin section. The die and
coin toss experiments also generate discrete random variables.
Discrete random variables are characterized by a probability distribution, which
may be described by a formula, table or graph that provides the probability
associated with each value of the discrete random variable. The probability
distribution function of discrete random variables may be plotted as a histogram.
Refer to Figure 1 (Probability histogram) as an example histogram for a discrete
random variable.




Frequency Tables and Histograms
Discrete random variables are often recorded in a frequency table, and displayed
as a histogram. A frequency table records how often data values fall within
certain intervals or classes. A histogram is a graphical representation of the
frequency table.
It is common to use a constant class width for a histogram, so that the height of
each bar is proportional to the number of values within that class. Data are
conventionally ranked in ascending order, and thus can be represented as a
cumulative frequency histogram, where the total number of values below certain
cutoffs is shown, rather than the number of values in each class.












Table 1 Frequency and Cumulative Frequency tables of 100 values, X, with a
class width of one (modified from Isaaks and Srivastava, 1989).
Class      Frequency       Frequency      Cumulative   Cumulative
interval   (occurrences)   (percentage)   number       percentage
0-1        1               1              1            1
1-2        1               1              2            2
2-3        0               0              2            2
3-4        0               0              2            2
4-5        3               3              5            5
5-6        2               2              7            7
6-7        2               2              9            9
7-8        13              13             22           22
8-9        16              16             38           38
9-10       11              11             49           49
10-11      13              13             62           62
11-12      17              17             79           79
12-13      13              13             92           92
13-14      4               4              96           96
>14        4               4              100          100
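The cumulative columns of a frequency table are running sums of the frequency column. The short sketch below recomputes them from the class frequencies in Table 1. It is an illustration, not part of the original notes. Because the data set contains exactly 100 values, counts and percentages coincide.

```python
from itertools import accumulate

# Class frequencies transcribed from Table 1 (class width of one; the last class is >14).
classes = ["0-1", "1-2", "2-3", "3-4", "4-5", "5-6", "6-7", "7-8",
           "8-9", "9-10", "10-11", "11-12", "12-13", "13-14", ">14"]
frequency = [1, 1, 0, 0, 3, 2, 2, 13, 16, 11, 13, 17, 13, 4, 4]

total = sum(frequency)
cumulative = list(accumulate(frequency))  # running sum = cumulative number

for label, f, c in zip(classes, frequency, cumulative):
    # class, frequency, frequency %, cumulative number, cumulative %
    print(f"{label:>6} {f:4d} {100 * f / total:6.1f} {c:4d} {100 * c / total:6.1f}")
```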

Figures 2a and 2b display frequency and cumulative frequency histograms of the
data in Table 1 (modified from Isaaks and Srivastava, 1989).

Figure 2a

Figure 2b

(Sometimes, the histograms are converted to continuous curves by running a line
from the midpoint of each bar in the histogram. This process may be convenient
for comparing continuous and discrete random variables, but may tend to
confuse the presentation.)
Continuous Random Variables
These variables are defined by an infinitely large number of possible values
(much like a segment of a number-line, which can be repeatedly subdivided into
smaller and smaller intervals to create an infinite number of increments).
In most practical problems, continuous random variables represent measurement
data, such as the length of a line, or the thickness of a pay zone.
The probability density function of the continuous random variable may be plotted
as a continuous curve. Although such curves may assume a variety of shapes, it
is interesting to note that a very large number of random variables observed in
nature approximate a bell-shaped curve. A statistician would say that such a
curve approximates a normal distribution (Mendenhall, 1971).
Probability Distributions Of The Discrete Random Variable
The probability distribution of a discrete random variable consists of the relative
frequencies with which a random variable takes each of its possible values. Four
common probability distributions for discrete random variables are: Binomial,
Negative Binomial, Poisson, and Hypergeometric. Each of these distributions is
discussed using practical geological examples taken from Davis (1986).
Binomial Probability Distribution
Binomial distributions only apply to a special type of discrete random variable,
called a binary variable. Binary variables can only have two values: such as ON
or OFF, SUCCESS or FAILURE, 0 or 1. (Oftentimes, values such as ON or OFF,
and SUCCESS or FAILURE will be assigned the numerical values of 1 or 0
respectively.) Similarly, binomial distributions are only valid for trials in which
there are only two possible outcomes for each trial. Furthermore, the total
number of trials must be fixed beforehand, all of the trials must have the same
probability of success, and the outcomes of all the trials must not be influenced
by the outcomes of previous trials. The number of heads in a fixed series of coin
tosses, or the number of times a chosen face appears in a fixed series of die
throws, follows a binomial distribution.
We'll consider how the binomial distribution can be applied to the following oilfield
example.
Problem: Forecast the probability of success of a drilling program.
Assumptions: Each wildcat is classified as either:
0 = Failure (dry hole)
1 = Success (discovery)
The binomial distribution is appropriate when a fixed number of wells will be
drilled during an exploratory program or during a single period (budget cycle) for
which the forecast is made.
In this case, each well that is drilled in turn is presumed to be independent; this
means that the success or failure of one hole does not influence the outcome of
the next. Thus, the probability of discovery remains unchanged as successive
wildcats are drilled. This may hold initially, but as Davis (1986) points out, the
assumption is difficult to justify in most cases, because a discovery or failure
influences the selection of subsequent drilling locations.
The probability p that a wildcat well will discover gas or oil is estimated using an
industry-wide success ratio for drilling in similar areas, or based on the
company's own success ratio. Sometimes the success ratio is a subjective
“guess.” From p, the binomial model can be developed for exploratory drilling as
follows:
$p$: the probability that a hole will be successful.
$1 - p$: the probability of failure.
$P = (1 - p)^n$: the probability that n successive wells will be dry.
$P = (1 - p)^{n-1}\,p$: the probability that the nth hole will be a discovery, but the
preceding (n - 1) holes will be dry.
$P = n\,(1 - p)^{n-1}\,p$: the probability of drilling one discovery well in a series of
n wildcat holes, where the discovery can occur in any of the n wildcats.
$P = (1 - p)^{n-r}\,p^r$: the probability that (n - r) dry holes will be drilled, followed
by r discoveries.
However, the (n - r) dry holes and the r discoveries may be arranged in
combinations, or equivalently, in n!/((n - r)! r!) different ways, resulting in the
equation:
$$P = \frac{n!}{(n-r)!\,r!}\,(1 - p)^{n-r}\,p^r$$
This gives the probability that r discoveries will be made in a drilling program of
n wildcats. It is an expression of the binomial distribution: the probability that r
successes will occur in n trials, when the probability of success in a single trial is
p.
For example, suppose we want to find the probability of success associated with
a 5-well exploration program in a virgin basin where the success ratio is
anticipated to be about 10%. What is the probability that the entire exploration
program will be a total failure, with no discoveries?
The terms of the equation are:
n = 5
r = 0
p = 0.10
$$P = \frac{5!}{5!\,0!}\,(0.9)^{5}\,(0.1)^{0} = (1)(0.59049)(1) \approx 0.59$$
Where:
P = the probability of exactly r discoveries
r = the number of discovery wells
$\binom{n}{r} = \frac{n!}{(n-r)!\,r!}$ = the number of ways of arranging r discoveries among n holes
p = anticipated success ratio
n = the number of holes drilled in the exploration program
The probability that the exploratory effort results in no discoveries is almost 60%.
Using either the binomial equation or a table for the binomial distribution, Figure 3
(Discrete distribution giving the probability of making r discoveries in a five-well
drilling program when the success ratio (probability of discovery) is 10%
(modified from Davis, 1986)) shows the probabilities associated with all possible
outcomes of the five-well drilling program.

Figure 3
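The probabilities plotted in Figure 3 can be computed directly from the binomial equation above. This is a minimal sketch, not the original author's code, and it assumes Python 3.8+ for `math.comb`. `comb(n, r)` is the same combinatorial term n!/((n - r)! r!).

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r discoveries in n independent wildcats with success ratio p."""
    # comb(n, r) == n! / ((n - r)! r!): ways to arrange r discoveries among n holes.
    return comb(n, r) * (1 - p) ** (n - r) * p**r

n_wells, success_ratio = 5, 0.10
distribution = {r: binomial_pmf(r, n_wells, success_ratio) for r in range(n_wells + 1)}

for r, prob in distribution.items():
    print(f"r = {r}: P = {prob:.5f}")
```

The r = 0 entry reproduces the 0.59 from the worked example. The six probabilities sum to 1.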
Negative Binomial Probability Distribution
Other discrete distributions can be developed for experimental situations with
different basic assumptions. We can develop a Negative Binomial Probability
Distribution to find the probability that x dry holes will be drilled before r
discoveries are made.
Problem: Drill as many holes as needed to discover two new fields in a virgin
basin.
Assumption: The same conditions that govern the binomial distribution are
assumed, except that the number of “trials” is not fixed.
The probability distribution governing such an experiment is the negative
binomial. Thus we can investigate the probability that it will require 2, 3, 4, …, up
to n exploratory wells before two discoveries are made.
The expanded form of the negative binomial equation is
$$P = \frac{(r + x - 1)!}{(r - 1)!\,x!}\,(1 - p)^{x}\,p^{r}$$
Where:
P = the probability that exactly x dry holes are drilled before the rth discovery
r = the number of discovery wells
x = the number of dry holes
p = regional success ratio
If the regional success ratio is 10%, the probability that a two-hole exploration
program will meet the company's goal of two discoveries can be calculated:
r = 2
x = 0
p = 0.10
$$P = \left[\frac{1!}{1!\,0!}\right](0.9)^{0}(0.1)^{2} = 0.01$$
The calculated probabilities are low because they relate to the likelihood of
obtaining two successes and exactly x dry holes (in this case: x = zero). It may be
more appropriate to consider the probability distribution that more than x dry
holes must be drilled before the goal of r discoveries is achieved. We do this by
first calculating the cumulative form of the negative binomial. This gives the
probability that the goal of two successes will be achieved in (x + r) or fewer
holes, as shown in Figure 4 (Discrete distribution giving the cumulative
probability that two discoveries will be made by or before a specified hole is
drilled, when the success ratio is 10% (modified from Davis, 1986)).


Figure 4


Each of these probabilities is then subtracted from 1.0 to yield the desired
probability distribution illustrated in Figure 5 (Discrete distribution giving the
probability that more than a specified number of holes must be drilled to make
two discoveries, when the success ratio is 10% (modified from Davis, 1986)).



Figure 5
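The three negative binomial quantities discussed above (the probability of exactly x dry holes, its cumulative form, and the probability that more holes are needed) can be tabulated with a small Python sketch. The function names are hypothetical; the code simply applies the expanded equation, noting that (r + x - 1)!/((r - 1)! x!) is the binomial coefficient C(r + x - 1, x).

```python
from math import comb


def neg_binomial_pmf(x: int, r: int, p: float) -> float:
    """Probability that exactly x dry holes are drilled before the r-th discovery."""
    return comb(r + x - 1, x) * (1 - p) ** x * p**r


def prob_goal_within(holes: int, r: int, p: float) -> float:
    """Cumulative form: probability that r discoveries are made in `holes` or fewer wells."""
    max_dry = holes - r  # at most this many dry holes fit into the program
    return sum(neg_binomial_pmf(x, r, p) for x in range(max_dry + 1))


r, p = 2, 0.10
for holes in range(2, 11):
    within = prob_goal_within(holes, r, p)
    # Subtracting from 1.0 gives the probability that more holes must be drilled.
    print(f"{holes:2d} holes: P(goal met) = {within:.4f}, "
          f"P(more holes needed) = {1 - within:.4f}")
```

With r = 2, x = 0 and p = 0.10 the first function returns 0.01. As a cross-check, the cumulative value for five holes (0.08146) equals the binomial probability of two or more discoveries in five wells.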


Poisson Probability Distribution
A Poisson random variable is typically a count of the number of events that occur
within a certain time interval or spatial area. The Poisson probability distribution
seems to be a reasonable approach to apply to a series of geological events. For
example, the historical record of earthquakes in California, the record of volcanic
eruptions in the Mediterranean, or the incidence of landslides related to El Niño
along the California coast can be characterized by Poisson distributions.
The Poisson probability model assumes that:
- events occur independently,
- the probability that an event will occur does not change with time,
- the length of the observation period is fixed in advance,
- the probability that an event will occur in an interval is proportional to the
length of the interval, and
- the probability of more than one event occurring at the same time is
vanishingly small.
When the probability of success becomes very small, the Poisson Distribution
can be used to approximate the binomial distribution with parameters n and p.
This is a discrete probability distribution regarded as the limiting case of the
binomial when:
- n, the number of trials becomes very large, and
- p, the probability of success on any one trial becomes very small.
The equation in this case is
$$p(X) = \frac{e^{-\lambda}\,\lambda^{X}}{X!}$$
Where
p(X) = probability of occurrence of the discrete random variable X
λ = rate of occurrence
Note that the rate of occurrence, λ, is the only parameter of the distribution.
The Poisson distribution does not require either n or p directly, because we use
the product np = λ instead, which is given by the rate of occurrence of events.
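To see the limiting-case behavior, the Python sketch below compares binomial and Poisson probabilities for a large n and a small p with the same product np = λ. The values n = 1000 and p = 0.002 are arbitrary illustrative choices, not taken from the notes, and the function names are hypothetical.

```python
from math import comb, exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x events when the rate of occurrence is lam."""
    return exp(-lam) * lam**x / factorial(x)


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Exact binomial probability, used here only for comparison."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Many trials, small chance of success: n = 1000, p = 0.002, so lam = np = 2.
n, p = 1000, 0.002
lam = n * p
for x in range(6):
    print(f"x={x}: binomial={binomial_pmf(x, n, p):.5f}  "
          f"poisson={poisson_pmf(x, lam):.5f}")
```

The two columns agree to about three decimal places. Only λ is needed for the Poisson column.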
Hypergeometric Probability Distributions
The binomial distribution would not be appropriate for calculating the probability
of discovery because the chance of success changes with each wildcat well. For
example, we can use Statistics to argue two distinctly contradictory cases:
- Discovery of one reservoir increases the odds against finding another
(fewer fields remaining).
- Drilling a dry hole increases the probability that the remaining untested
features will prove productive.
What we need is to find all possible combinations of producing and dry features
within the population, then enumerate those combinations that yield the desired
number of discoveries.
The probability distribution generated by sampling without replacement is called
a hypergeometric distribution. Consider the following:
Problem: An offshore concession contains 10 seismic anomalies, with a
historical success ratio of 40%. Our limited budget will permit only three
anomalies to be drilled. Assume that four of the structures are productive,
so that the discovery of one reservoir increases the odds against finding
another. What are the probabilities of the possible numbers of discoveries?
The probability of making x discoveries in a drilling program consisting of n holes,
when sampling from a population of N prospects of which S are believed to
contain commercial reservoirs, is
$$P = \frac{\binom{S}{x}\binom{N-S}{n-x}}{\binom{N}{n}}$$
Where:
x = the number of discoveries
N = the number of prospects in the population
n = the number of holes drilled
S = the number of prospects containing commercial reservoirs
This expression represents the number of combinations of reservoirs, taken by
the number of discoveries, times the number of combinations of barren
anomalies, taken by the number of dry holes, all divided by the number of
combinations of all prospects taken by the total number of holes in the drilling
program (Davis, 1989).
Applying this to our offshore concession example containing ten seismic
anomalies, from which four are likely to be reservoirs, what are the probabilities
associated with a three-well drilling program?
- The probability of total failure, with no discoveries among the three
structures is about 17%.
- The probability of one discovery is about 50%.
A histogram of all possible outcomes of this exploration strategy is shown in
Figure 6 (Discrete distribution giving the probability of n discoveries in three holes
drilled on ten prospects, when four of the ten contain reservoirs (modified from
Davis, 1986)). Note that the probability of at least one discovery is (1.00 - 0.17), or 83%.



Figure 6
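The hypergeometric probabilities behind Figure 6 can be reproduced with a short Python sketch. It assumes the same inputs as the example (N = 10 anomalies, S = 4 reservoirs, n = 3 wells), and `hypergeom_pmf` is a hypothetical helper name.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, S: int, n: int) -> float:
    """Probability of x discoveries when n of N prospects are drilled
    and S of the N prospects contain commercial reservoirs."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)


N, S, n = 10, 4, 3  # ten anomalies, four reservoirs, three-well program
probs = [hypergeom_pmf(x, N, S, n) for x in range(n + 1)]
for x, prob in enumerate(probs):
    print(f"{x} discoveries: {prob:.3f}")
print(f"At least one discovery: {1 - probs[0]:.3f}")
```

The output gives 0.167 for no discoveries and 0.500 for one, matching the "about 17%" and "about 50%" above, and 0.833 for at least one discovery.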







Frequency Distributions Of Continuous Random Variables
Frequency distributions of continuous random variables follow a theoretical
probability distribution or probability density function that can be represented by a
continuous curve. These functions can take on a variety of shapes. Rather than
displaying the functions as curves, the distributions may be displayed as
histograms, as shown in Figures 7a, 7b, 7c, and 7d (Examples of some continuous
variable probability distributions).

Figures 7a, 7b, 7c, 7d

In this section, we will discuss the following common distribution functions:
- Normal Probability Distribution
- Lognormal Distribution

Normal Probability Distribution
It is often assumed that random variables follow a normal probability density
function, and many statistical (and geostatistical) methods are based on this
supposition. The Central Limit Theorem provides the main justification for using the
normal probability distribution.
Central Limit Theorem
The Central Limit Theorem (CLT) states that under rather general conditions, as
the sample size increases, the sums and means of samples drawn from a
population of any distribution will approximate a normal distribution (Sokol and
Rohlf, 1969; Mendenhall, 1971). The Central Limit Theorem is defined below:
Central Limit Theorem:
If random samples of n observations are drawn from a population with finite
mean, µ, and standard deviation, σ, then, as n grows larger, the sample
mean, ȳ, will be approximately normally distributed with mean equal to µ
and standard deviation σ/√n. The approximation will become more and
more accurate as n becomes large (Mendenhall, 1971).
The Central Limit Theorem consists of three statements:
1. The mean of the sampling distribution of means is equal to the mean of
the population from which the samples were drawn.
2. The variance of the sampling distribution of means is equal to the variance
of the population from which the samples were drawn, divided by the size
of the samples.
3. If the original population is distributed normally (i.e. it is bell shaped), the
sampling distribution of means will also be normal. If the original
population is not normally distributed, the sampling distribution of means
will increasingly approximate a normal distribution as sample size
increases (i.e. when increasingly large samples are drawn).
The significance of the Central Limit Theorem is twofold:
1. It explains why some measurements tend to possess (approximately) a
normal distribution.
2. The most important contribution of the CLT is in statistical inference. Many
algorithms that are used to make estimations or simulations require
knowledge about the population density function. If we can accurately
predict its behavior using only a few parameters, then our predictions
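should be more reliable. If the CLT applies, the sampling distribution of the
mean is approximately normal and is fully described by two parameters (its
mean and its standard deviation, σ/√n), both of which can be estimated from
the sample mean and sample standard deviation.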
However, the disturbing feature of the CLT, and most approximation procedures,
is that we must have some idea as to how large the sample size, n, must be in
order for the approximation to yield useful results. Unfortunately, there is no
clear-cut answer to this question, because the appropriate value of n depends
upon the population probability distribution as well as the use we make of the
approximation. Fortunately, the CLT tends to work very well, even for small
samples, but this is not always true.
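One practical way to judge whether n is "large enough" for a given population is to simulate. The Python sketch below does this under stated assumptions: the skewed exponential parent population (mean 1, standard deviation 1), the sample size n = 30 and the 5000 replicates are arbitrary choices for illustration, and `sample_means` is a hypothetical helper.

```python
import random
import statistics


def sample_means(n: int, replicates: int, seed: int = 42) -> list[float]:
    """Means of `replicates` random samples of size n drawn from a skewed
    exponential population with mean 1 and standard deviation 1."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(replicates)]


n = 30
means = sample_means(n, replicates=5000)
# Statement 1 of the CLT: the mean of the sample means approaches µ (= 1 here).
print("mean of sample means:", round(statistics.fmean(means), 3))
# Statement 2: their standard deviation approaches σ/√n.
print("sd of sample means:  ", round(statistics.stdev(means), 3))
print("σ/√n:                ", round(1 / n**0.5, 3))
```

Plotting a histogram of `means` would show a nearly bell-shaped curve, although the parent population is strongly skewed. Re-running with smaller n shows the approximation degrading.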

Properties of the Normal Distribution
Formally, the Normal Probability Density Function is represented by the following
expression:
$$Z = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{Y-\mu}{\sigma}\right)^{2}}$$
Where
Z is the height of the ordinate (y-axis) of the curve and represents the
density of the function. It is the dependent variable in the expression, being
a function of the variable Y.
There are two constants in the equation: π, well known to be approximately
3.14159, making 1/√(2π) equal to 0.39894, and e, the base of the Napierian or
natural logarithms, whose value is approximately 2.71828.
There are two parameters in the normal probability density function. These
are the parametric mean, µ, and the standard deviation, σ, which determine
the location and shape of the distribution (these parameters are discussed
under Summary Statistics). Thus, there is not just one normal distribution;
rather, there is an infinity of such curves, because the parameters can
assume an infinity of values (Sokol and Rohlf, 1969).

Figure 8a

Figure 8a (Illustration of how changes in the two parameters of the normal
distribution affect the shape and position of histograms. Left (µ = 4, σ = 1).
Right (µ = 8, σ = 0.5)) illustrates the impact of the parameters on the shape of
a probability distribution histogram.
The histogram (or curve) is symmetrical about the mean. Therefore the mean,
median and mode (described later under Summary Statistics) of the normal
distribution occur at the same point. Figure 8b (Bell curve) shows that the curve
of a Gaussian normal distribution can be described by the position of its
maximum, which corresponds to its mean (µ), and by its points of inflection. The
distance between µ and either point of inflection is the standard deviation, σ.
The square of the standard deviation is the variance.

Figure 8b
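The density formula can be evaluated directly. The Python sketch below, in which `normal_pdf` is a hypothetical helper, reproduces the constant 1/√(2π) and the peak heights of the two curves described for Figure 8a. Halving σ doubles the height of the peak.

```python
from math import exp, pi, sqrt


def normal_pdf(y: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Height Z of the normal density at y for mean mu and standard deviation sigma."""
    return exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


print(round(1 / sqrt(2 * pi), 5))                  # the constant 0.39894
print(round(normal_pdf(4.0, mu=4, sigma=1), 5))    # peak of the left curve
print(round(normal_pdf(8.0, mu=8, sigma=0.5), 5))  # peak of the right curve
```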
In a normal frequency distribution, the standard deviation may be used to
characterize how the values are distributed under the bell curve. According to
Sokol and Rohlf (1969), 68.3% of all values fall within -1σ to +1σ of the mean,
95.4% fall within -2σ to +2σ of the mean, and 99.7% are contained within -3σ
to +3σ of the mean. This bears repeating, in a different format this time:
- µ ± σ (1 standard deviation) contains 68.27% of the data
- µ ± 2σ (2 standard deviations) contains 95.45% of the data
- µ ± 3σ (3 standard deviations) contains 99.73% of the data
How are the percentages calculated? The direct calculation of any portion of the
area under the normal curve requires an integration of the function shown as the
above expression. Fortunately, for those who have forgotten their calculus, the
integration has been recorded in tabular form (Sokol and Rohlf, 1969). These tables
can be found in most standard statistical books, for example, see Statistical
Tables and Formulas, Table 1 (Hald, 1952).
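Printed tables are not the only option. The fraction of a normal distribution lying within k standard deviations of the mean equals erf(k/√2), and Python's standard library provides the error function as `math.erf`. A minimal sketch (`fraction_within` is a hypothetical name):

```python
from math import erf, sqrt


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within mu ± k*sigma."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"mu ± {k} sigma: {100 * fraction_within(k):.2f}%")
```

This prints 68.27%, 95.45% and 99.73%, the percentages quoted above.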
Application of the Normal Distribution
The normal frequency distribution is the most widely used distribution in statistics.
There are three important applications of the density function (Sokol and Rohlf,
1969).
1. Sometimes we need to know whether a given sample is normally
distributed before we can apply certain tests. To test whether a sample
comes from a normal distribution we must calculate the expected
frequencies for a normal curve of the same mean and standard deviation,
then compare the two curves.
2. Knowing whether a sample comes from a normal distribution may confirm or
reject underlying hypotheses about the nature of the phenomenon studied.
3. Finally, if we assume a normal distribution, we may make predictions
based upon this assumption. For the geosciences, this means a better and
unbiased estimation of reservoir parameters between the well data.
Normal Approximation to the Binomial Distribution
Recall that approximately 95% of the measurements associated with a normal
distribution lie within two standard deviations of the mean and almost all lie within
three standard deviations. A binomial probability distribution will be nearly
symmetrical, and therefore well approximated by a normal curve, if it has room to
spread out a distance equal to two standard deviations on either side of its mean
without being cut off by its bounds, 0 and n.
Therefore, to determine the normal approximation we calculate the following,
when each of the n trials results in either a failure (0) or a success (1), with
probabilities q = 1 - p and p, respectively:
- µ = np
- σ = √(npq)
If the interval µ ± 2σ lies within the binomial bounds, 0 and n, the approximation
will be reasonably good (Mendenhall, 1971).
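The µ ± 2σ rule of thumb can be checked in code. In the Python sketch below, the function names and the case n = 100, p = 0.40 are illustrative assumptions. The five-well, 10% program from the binomial example fails the rule because µ - 2σ is negative.

```python
from math import comb, sqrt


def normal_approx_ok(n: int, p: float) -> bool:
    """Rule of thumb (Mendenhall, 1971): mu ± 2 sigma must lie within 0 and n."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return 0 <= mu - 2 * sigma and mu + 2 * sigma <= n


def binomial_within_two_sigma(n: int, p: float) -> float:
    """Exact binomial probability of an outcome inside mu ± 2 sigma."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r)
               for r in range(n + 1)
               if mu - 2 * sigma <= r <= mu + 2 * sigma)


print(normal_approx_ok(5, 0.10))    # five-well program: mu - 2 sigma < 0
print(normal_approx_ok(100, 0.40))  # interval 30.2 to 49.8 lies inside 0..100
print(round(binomial_within_two_sigma(100, 0.40), 3))  # close to the normal 0.95
```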
Lognormal Distribution
Many variables in the geosciences do not follow a normal distribution, but are
highly skewed, such as the distribution in Figure 7b, and as shown below.
Figure 9 Schematic histogram of sizes and numbers of oil field discoveries of
hundred thousand-barrel equivalent.


Figure 9


The histogram illustrates that most fields are small, with decreasing numbers of
larger fields, and a few rare giants that exceed all others in volume. If the
histograms of Figure 7b and Figure 9 are converted to logarithmic forms (that is,
we use $Y_i = \log X_i$ instead of $Y_i = X_i$ for each observation), the distribution
becomes nearly normal. Such variables are said to be lognormal.
Transformation of Lognormal data to Normal
The data can be converted into logarithmic form by a process known as
transformation. Transforming the data to a standardized normal distribution (i.e.,
zero mean and unit variance) simplifies data handling and eases comparison to
different data sets.
Data which display a lognormal distribution, for example, can be transformed to
resemble a normal distribution by applying the formula ln(z) to each z variate in
the data set prior to conducting statistical analysis. The success of the
transformation can be judged by observing its frequency distribution before and
after transformation. The distribution of the transformed data should be markedly
less skewed than the lognormal data. The transformed values may be back-
transformed prior to reporting results.
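The before-and-after check described above can be sketched in Python. The data here are synthetic lognormal values drawn with a fixed seed (a stand-in for field sizes, not data from the notes), and `skewness` is a hypothetical helper that uses the coefficient of skewness defined later under Measures of Shape.

```python
import math
import random
import statistics


def skewness(values: list[float]) -> float:
    """Average cubed deviation from the mean divided by the cube of the
    (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return statistics.fmean((v - mu) ** 3 for v in values) / sigma**3


rng = random.Random(7)
# Hypothetical lognormal data standing in for field sizes.
sizes = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
log_sizes = [math.log(z) for z in sizes]  # apply ln(z) to each variate

print(f"skewness before transform: {skewness(sizes):.2f}")      # strongly positive
print(f"skewness after transform:  {skewness(log_sizes):.2f}")  # close to zero
```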
Because of its frequent use in geology, the lognormal distribution is extremely
important. If we look at the transformed variable $Y_i$ rather than $X_i$ itself, the
properties of the lognormal distribution can be explained simply by reference to
the normal distribution.
In terms of the original, untransformed variable $X_i$, the antilogarithm of the
mean of Y corresponds to the nth root of the product of the $X_i$:
$$\operatorname{antilog}\,\bar{Y} = \mathrm{GM} = \sqrt[n]{\prod_{i=1}^{n} X_i}$$
Where:
GM is the geometric mean
Π is analogous to Σ, except that all the elements in the series are multiplied
rather than added together (Davis, 1986).
In practice, it is simpler to convert the measurements into logarithms and
compute the mean and variance. If the geometric mean and the corresponding
variance are wanted, compute the antilogs of $\bar{Y}$ and $s_y^2$. If you work with
the data in the transformed state, all of the statistical procedures that are
appropriate for ordinary variables are applicable to the log-transformed
variables (Davis, 1986).
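The two routes to the geometric mean described above (the nth root of the product, or the antilog of the mean of the logarithms) can be compared in a few lines of Python. The helper names are hypothetical. The log route is preferable in practice because the running product can overflow for long series.

```python
import math
import statistics


def geometric_mean_direct(values: list[float]) -> float:
    """nth root of the product of the values (may overflow for long series)."""
    return math.prod(values) ** (1 / len(values))


def geometric_mean_via_logs(values: list[float]) -> float:
    """Antilog of the mean of the natural logarithms."""
    return math.exp(statistics.fmean(math.log(v) for v in values))


data = [1.0, 10.0, 100.0]
print(geometric_mean_direct(data), geometric_mean_via_logs(data))  # both about 10
```

Python 3.8+ also provides `statistics.geometric_mean`, which uses the logarithmic route.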
The characteristics of the lognormal distribution are discussed in a monograph by
Aitchison and Brown (1969) and in the geological context by Kock and Link
(1981).
Random Error
Random errors for normal distributions are additive, which means that errors of
opposite sign tend to cancel one another, and the final measurement is near the
true value. Random errors in lognormally distributed variables are multiplicative
rather than additive, and thus produce an intermediate product near the geometric mean.
UNIVARIATE DATA ANALYSIS
INTRODUCTION
There are several ways in which to summarize a univariate (single attribute)
distribution. Quite often we will simply compute the mean and the variance, or
plot its histogram. However, these statistics are very sensitive to extreme values
(outliers) and do not provide any spatial information, which is the heart of a
geostatistical study. In this section, we will describe a number of different
methods that can be used to analyse data for a single variable.
SUMMARY STATISTICS
The summary statistics represented by a histogram can be grouped into three
categories:
- measures of location,
- measures of spread, and
- measures of shape.
Measures of Location
Measures of location provide information about where the various parts of the
data distribution lie, and are represented by the following:
- Minimum: Smallest value.
- Maximum: Largest value.
- Median: Midpoint of all observed data values, when arranged in
ascending order. Half the values are above the median, and half are
below. This statistic represents the 50th percentile of the cumulative
frequency histogram and is not generally affected by an occasional
erratic data point.
- Mode: The most frequently occurring value in the data set. This value falls
within the tallest bar on the histogram.
- Quartiles: In the same way that the median splits the data into halves, the
quartiles split the data in quarters. Quartiles represent the 25th, 50th
and 75th percentiles on the cumulative frequency histogram.
- Mean: The arithmetic average of all data values. (This statistic is quite
sensitive to extreme high or low values. A single erratic value or outlier
can significantly bias the mean.) We use the following formula to
determine the mean of a Population:
$$\text{Mean} = \mu = \frac{\sum_{i=1}^{N} Z_i}{N}$$
where:
µ = population mean
N = number of observations (population size)
$\sum Z_i$ = sum of individual observations
We can determine the mean of a Sample in a similar manner. The
formula below for the sample mean is comparable to the formula above,
except that the population notation has been replaced with that for
samples.
$$\text{Mean} = \bar{x} = \frac{\sum_{i=1}^{n} Z_i}{n}$$
where:
$\bar{x}$ = sample mean
n = number of observations (sample size)
$\sum Z_i$ = sum of individual observations
Measures of Spread
Measures of spread describe the variability of the data values, and are
represented by the following:
- Variance: Average squared difference of the observed values from the
mean. Because the variance involves squared differences, this statistic
is very sensitive to abnormally high/low values.
$$\text{Variance} = \sigma^2 = \frac{\sum_{i=1}^{N} (Z_i - \mu)^2}{N}$$
Kachigan (1986) notes that the above formula is only appropriate for
defining the variance of a population of observations. If the same formula
were applied to a sample for the purpose of estimating the variance of the
parent population from which the sample was drawn, it would tend to
underestimate the population variance. The underestimation arises
because each sample variance is calculated about the sample mean
($\bar{x}$), rather than the population mean (µ). If repeated samples were
drawn from the population and the variance calculated from each, the
average of these variances would be lower than the true value of the
population variance (assuming we were able to measure every single
member of the population).
We can avoid this bias by taking the sum of squared deviations and
dividing that sum by the number of observations less one. Thus, the
sample estimate of population variance is obtained using the following
formula:
$$\text{Variance} = s^2 = \frac{\sum_{i=1}^{n} (Z_i - \bar{x})^2}{n - 1}$$
- Standard Deviation: Square root of the variance.
$$\text{Standard Deviation} = \sigma = \sqrt{\sigma^2} \qquad (\text{for a sample, } s = \sqrt{s^2})$$
This measure is used to show the extent to which the data are spread
around the mean, such that a small standard deviation indicates that the
data are clustered near the mean. For example, if we had a mean equal
to 10 and a standard deviation of 1.3, and the data were approximately
normally distributed, we could expect about two-thirds of our data to fall
between (10 - 1.3) and (10 + 1.3), that is, between 8.7 and 11.3. The
standard deviation is often used instead of the variance, because its units
are the same as the units of the attribute being described.
- Interquartile Range: Difference between the upper (75th percentile) and
the lower (25th percentile) quartile. Because this measure does not use
the mean as the center of distribution, it is less sensitive to abnormally
high/low values.
Figures 1a and 1b illustrate histograms of porosity with a mean of about 15%, but
different variances.

Figures 1a, 1b
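The measures of location and spread listed above can all be computed with Python's standard `statistics` module. In the sketch below, the porosity values are hypothetical and `summarize` is an illustrative helper. Note that quartile conventions differ between packages; `statistics.quantiles` uses its "exclusive" method by default.

```python
import statistics


def summarize(values: list[float]) -> dict[str, float]:
    """Measures of location and spread for a univariate data set."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # 25th, 50th, 75th percentiles
    return {
        "minimum": min(values),
        "maximum": max(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "q1": q1,
        "q3": q3,
        "mean": statistics.fmean(values),
        "population variance": statistics.pvariance(values),  # divides by N
        "sample variance": statistics.variance(values),       # divides by n - 1
        "sample std dev": statistics.stdev(values),
        "interquartile range": q3 - q1,
    }


porosity = [12.0, 14.5, 15.0, 15.0, 16.5, 17.0, 19.0]  # hypothetical porosity (%)
for name, value in summarize(porosity).items():
    print(f"{name:>20}: {value:.3f}")
```

The sample variance is always larger than the population formula applied to the same data, by the factor n/(n - 1).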


Outliers or “Spurious” Data
Another statistic to consider is the Z-score, which expresses each data value in
terms of standard deviations from the mean. Data whose Z-scores have absolute
values greater than a specified cutoff appear anomalous and are termed
outliers. The typical cutoff is 2.5 standard deviations from the mean. The
Z-score is the ratio of the data value minus the mean to the standard deviation
(for a sample, the sample mean and sample standard deviation are used):
$$Z_{\text{score}} = \frac{Z_i - \mu}{\sigma}$$
This statistic serves as a caution, signifying either bad data or a true local
anomaly, which must be taken into account in the final analysis.
Note: The Z-score transform does not change the shape of the histogram. The
transform rescales the histogram to a mean of 0 and a variance of 1. If
the histogram is skewed before being transformed, it retains the same shape
after the transform. The X-axis is now in terms of ± standard deviation units about
the mean of zero.
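The Python sketch below illustrates Z-score screening with the 2.5 cutoff. It uses the sample mean and sample standard deviation, and both the data and the function names are hypothetical. It also confirms that the transformed values have a mean of 0 and a standard deviation of 1.

```python
import statistics


def z_scores(values: list[float]) -> list[float]:
    """Z-score of each value: (value - sample mean) / sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def flag_outliers(values: list[float], cutoff: float = 2.5) -> list[int]:
    """Indices of values whose absolute Z-score exceeds the cutoff."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > cutoff]


porosity = [14, 15, 16, 15, 14, 16, 15, 15, 14, 16, 15, 40]  # hypothetical; last value suspect
z = z_scores(porosity)
print(flag_outliers(porosity))                # index of the suspect value
print(statistics.fmean(z), statistics.stdev(z))  # approximately 0 and 1
```

Whether the flagged value is bad data or a real local anomaly still has to be decided by the analyst.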
Measures of Shape
Measures of shape describe the appearance of the histogram and are
represented by the following:
- Coefficient of Skewness: Average cubed difference between the data
values and the mean, divided by the cube of the standard
deviation. This measure is very sensitive to abnormally high/low
values:
$$CS = \frac{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \mu)^3}{\sigma^3}$$
where:
µ is the mean
σ is the standard deviation
n is the number of data values
The coefficient of skewness allows us to quantify the symmetry of the data
distribution, and tells us when a few exceptional values (possibly outliers?)
exert a disproportionate effect upon the mean.
- positive: long tail of high values (median < mean)
- negative: long tail of low values (median > mean)
- zero: a symmetrical distribution
Figures 2a, 2b, and 2c illustrate histograms with negative, symmetrical and
positive skewness.

Figures 2a, 2b, 2c

- Coefficient of Variation: Often used as an alternative to skewness as a
measure of asymmetry for positively skewed distributions with a
minimum at zero. It is defined as the ratio of the standard deviation to
the mean. A value of CV > 1 probably indicates the presence of some
high erratic values (outliers).
CV = σ/µ
where:
σ is the standard deviation
µ is the mean
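Both shape measures can be sketched in Python following the formulas above, with population (1/n) moments for the skewness. The two small data sets are hypothetical, chosen so that one is symmetrical and the other has a long tail of high values.

```python
import statistics


def coefficient_of_skewness(values: list[float]) -> float:
    """Average cubed deviation from the mean divided by the cube of the standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return statistics.fmean((v - mu) ** 3 for v in values) / sigma**3


def coefficient_of_variation(values: list[float]) -> float:
    """Standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]  # long tail of high values
print(coefficient_of_skewness(symmetric))      # zero for a symmetrical distribution
print(coefficient_of_skewness(right_tailed) > 0,
      statistics.median(right_tailed) < statistics.fmean(right_tailed))  # True True
print(round(coefficient_of_variation(right_tailed), 2))
```

The single high value pushes the coefficient of variation close to 1, the level at which the text suggests looking for erratic values.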
SUMMARY OF UNIVARIATE STATISTICAL MEASURES AND DISPLAYS
Advantages
- Easy to calculate.
- Provides information in a very condensed form.
- Can be used as parameters of a distribution model (e.g., normal
distribution defined by sample mean and variance).
Limitations
- Summary statistics are too condensed, and do not carry enough
information about the shape of the distribution.
- Certain statistics are sensitive to abnormally high/low values that properly
belong to the data set (e.g., µ, σ, σ², CS).
- Offers only a limited description, especially if our real interest is in a
multivariate data set (attributes are correlated).
BIVARIATE STATISTICAL MEASURES AND DISPLAYS
INTRODUCTION
Methods for bivariate description not only provide a means to describe the
relationship between two variables, but are also the basis for tools used to
analyze the spatial content of a random function (to be described in the Spatial
Correlation and Modeling Analysis section). The bivariate summary methods
described in this section only measure the linear relationship between the
variables - not their spatial features.
THE RELATIONSHIP BETWEEN VARIABLES
Bivariate analysis seeks to determine the extent to which one variable is related
to another variable. We can reason that if one variable is indeed related to
another, then information about the first variable might help us to predict the
behavior of the second. If, on the other hand, our analysis of these two variables
shows absolutely no relationship between the two, then we might need to discard
one from the pair in favor of a different variable which will be more predictive of
the other variable's behavior.
The relationship between two variables can be described as complementary,
parallel, or reciprocal. Thus, we might observe a simultaneous increase in value
between two variables, or a simultaneous decrease. We might even see a
simultaneous decrease in the value of one variable while the other increases. An
alternative way of characterizing the relationship between two variables would be
to describe their behaviors in terms of variance. In this case, we observe how the
value of one variable may change (or vary) in a manner that leaves the
relationship with the second variable unchanged. (For instance, if the relationship
was defined by a 1:10 ratio, then as the value of one variable changed, the other
would vary by 10 times that amount - thus preserving the relationship.)
Dependent and Independent Variables
Where a relationship between variables does exist, we can characterize each
variable as being either dependent or independent. We use the behavior of the
independent (or predictor) variable to determine how the dependent (or criterion)
variable will react. For instance, we might expect that an increase in the value of
the independent variable would result in a corresponding increase in the value of
the dependent variable.
COMMON BIVARIATE METHODS
The most commonly used bivariate statistical methods include:
- Scatterplots
- Covariance
- Product Moment Correlation Coefficient
- Linear Regression
We will discuss each of these methods in turn, below.
SCATTERPLOTS
The most common bivariate plot is the Scatterplot, Figure 1 (Scatterplot of
Porosity (dependent variable) versus Acoustic Impedance (independent
variable)).


Figure 1


This plot follows a common convention, in which the dependent variable (e.g.,
porosity) is plotted on the Y-axis (ordinate) and the independent variable (e.g.,
acoustic impedance) is plotted on the X-axis (abscissa). This type of plot serves
several purposes:
- detects a linear relationship,
- detects a positive or inverse relationship,
- identifies potential outliers,
- provides an overall data quality control check.
This plot displays an inverse relationship between porosity and acoustic
impedance, that is, as porosity increases, acoustic impedance decreases. This
display should be generated before calculating bivariate summary statistics, like
the covariance or correlation coefficient, because many factors affect these
statistical measures. Thus, a high or low value has no real meaning until verified
visually.
A common geostatistical application of the scatterplot is the h-scatterplot. (In
geostatistics, h commonly refers to the lag distance between sample points.)
These plots are used to show how continuous the data values are over a certain
distance in a particular direction. If the data values at locations separated by h
are identical, they will fall on a line x = y, a 45-degree line of perfect correlation.
As the data becomes less and less similar, the cloud of points on the h-
Scatterplot becomes fatter and more diffuse. A later section will present more
detail on the h-scatterplot.
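To make the h-scatterplot concrete, the Python sketch below builds the pairs for samples at regular spacing along a line, with the lag h counted in sample intervals. It then measures how far the cloud lies from the 45-degree line. The profile values and function names are hypothetical, and the distance measure is only an illustration, not a standard geostatistical statistic.

```python
from math import sqrt


def h_scatter_pairs(values: list[float], h: int) -> list[tuple[float, float]]:
    """Pairs (z(x), z(x + h)) for regularly spaced samples along a line."""
    return [(values[i], values[i + h]) for i in range(len(values) - h)]


def mean_distance_from_diagonal(pairs: list[tuple[float, float]]) -> float:
    """Average perpendicular distance of the points from the line x = y;
    zero when the values separated by h are identical."""
    return sum(abs(x - y) for x, y in pairs) / (sqrt(2) * len(pairs))


profile = [10.0, 10.5, 11.2, 11.0, 12.1, 12.8, 12.5, 13.4, 14.0, 13.7]  # hypothetical
for h in (1, 2, 4):
    pairs = h_scatter_pairs(profile, h)
    print(h, len(pairs), round(mean_distance_from_diagonal(pairs), 3))
```

As h increases, the cloud moves further from the diagonal, which corresponds to the "fatter and more diffuse" scatter described above.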
COVARIANCE
Covariance is a statistic that measures how two variables (e.g., porosity and
acoustic impedance) vary together, using all of their paired data points. This
statistic is a very important tool used in Geostatistics to measure spatial
correlation or dissimilarity between variables, and forms the basis for the
correlogram and variogram (detailed later).
The magnitude of the covariance statistic is dependent upon the magnitude of the
two variables. For example, if the $X_i$ values are multiplied by a scalar factor k,
then the covariance increases by a factor of k. If both variables are multiplied by
k, then the covariance increases by k². This is illustrated in the table below.

Variables          Covariance
X and Y            3035.63
X*10 and Y         30356.3
X*10 and Y*10      303563

The covariance formula is:
$$\mathrm{COV}_{x,y} = \frac{\sum_{i=1}^{n}(X_i - \mu_x)(Y_i - \mu_y)}{n}$$
where:
$X_i$ is the X variable
$Y_i$ is the Y variable
$\mu_x$ is the mean of X
$\mu_y$ is the mean of Y
n is the number of X and Y data pairs
It should be emphasized that the covariance is strongly affected by extreme pairs
(outliers).
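The scaling behavior shown in the table can be reproduced with a direct implementation of the formula above, which divides by n. The paired values below are hypothetical, not the data behind the table. Python 3.10+ also offers `statistics.covariance`, but that function divides by n - 1.

```python
import statistics


def covariance(x: list[float], y: list[float]) -> float:
    """Mean product of the deviations of x and y from their means (divides by n)."""
    if len(x) != len(y):
        raise ValueError("x and y must hold the same number of data pairs")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)


impedance = [6.1, 6.8, 7.2, 7.9, 8.4]  # hypothetical values
porosity = [24.0, 21.5, 19.0, 16.5, 13.0]
k = 10
c = covariance(impedance, porosity)
print(c)                                                     # negative: inverse relation
print(covariance([k * v for v in impedance], porosity) / c)  # about k
print(covariance([k * v for v in impedance], [k * v for v in porosity]) / c)  # about k squared
```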
Product Moment Correlation Coefficient
The product moment correlation coefficient (ρ) is more commonly called simply
the correlation coefficient, and is a statistic that measures the linear relation
between all points of two variables (e.g., porosity and velocity). This linear
relationship is assigned a value that ranges from -1 to +1, depending on the
degree of correlation:
+1 = perfect, positive correlation
0 = no linear correlation
-1 = perfect inverse correlation.
Figure 2 illustrates scatterplots showing positive correlation, no correlation, and
inverse correlation between two variables.


Figure 2


The numerator for the correlation coefficient is the covariance. This value is
divided by the product of the standard deviations for variables X and Y. This
normalizes the covariance, thus removing the impact of the magnitude of the data
values. Like the covariance, outliers adversely affect the correlation coefficient.
The Correlation Coefficient formula (for a population) is:
$$\rho_{x,y} = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_x)(Y_i - \mu_y)}{\sigma_x\,\sigma_y}$$
where:
$X_i$ is the X variable
$Y_i$ is the Y variable
$\mu_x$ is the mean of X
$\mu_y$ is the mean of Y
$\sigma_x$ is the standard deviation of X
$\sigma_y$ is the standard deviation of Y
n is the number of X and Y data pairs
As with other statistical formulas, Greek notation (ρ) is used to signify the
measure of a population, while Roman notation (r) is used for samples.
Rho Squared
The square of the correlation coefficient, ρ² (also referred to as r²), is a measure of
the variance accounted for in a linear relation. This measure tells us about the
extent to which two variables covary. That is, it tells us how much of the variance
seen in one variable can be predicted by the variance found in the other variable.
Thus, a value of ρ = -0.83 between porosity and acoustic impedance tells us that
as porosity increases in value, acoustic impedance decreases, which has a real
physical meaning. However, only about 70% (actually, (-0.83)² = 0.6889, or 68.89%)
of the variability in porosity is explained by its relationship with acoustic impedance.
In keeping with statistical notation, the Greek symbol ρ² is used to denote the
squared correlation coefficient of a population, while the Roman equivalent, r²,
refers to the squared correlation coefficient of a sample.
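The Python sketch below follows the population formula: covariance divided by the product of the standard deviations. The data values are hypothetical. Because the 1/n factors cancel, the same r results if the sample (n - 1) versions are used throughout. Unlike the covariance, r does not change when one variable is rescaled.

```python
import statistics


def correlation(x: list[float], y: list[float]) -> float:
    """Product moment correlation coefficient: covariance divided by the
    product of the two (population) standard deviations."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))


impedance = [6.1, 6.8, 7.2, 7.9, 8.4]  # hypothetical values
porosity = [24.0, 21.5, 19.0, 16.5, 13.0]
r = correlation(impedance, porosity)
print(round(r, 3), round(r**2, 3))  # negative r; r**2 = fraction of variance explained
print(round(correlation([10 * v for v in impedance], porosity), 3))  # unchanged by rescaling
```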
Linear Regression
Linear regression is another method we use to indicate whether a linear
relationship exists between two variables. This is a useful tool, because once we
establish a linear relationship, we may later be able to interpolate values between
points, extrapolate values beyond the data points, detect trends, and detect
points that deviate away from the trend.
Figure 3 (Scatterplot of inverse linear relationship between porosity and acoustic
impedance, with a correlation coefficient of -0.83), shows a simple display of
regression.


Figure 3


When two variables are strongly correlated (ρ close to +1 or -1), we can predict a
linear relationship between the two. A regression line drawn through the points
of the scatterplot helps us to recognize the relationship between the variables. A
positive slope (from lower left to upper right) indicates a positive or direct
relationship between variables. A negative slope (from upper left to lower right)
indicates a negative or inverse relationship. In the example illustrated in the
above figure, the porosity clearly tends to decrease as acoustic impedance
increases.
The regression equation has the following general form:
$$Y = a + bX_i$$
where:
Y is the dependent variable, or the variable to be estimated (e.g., porosity)
$X_i$ is the independent variable, or the estimator (e.g., velocity)
b is the slope, defined as $b = \rho\,(\sigma_y / \sigma_x)$, where
$\rho$ is the correlation coefficient between X and Y
$\sigma_x$ is the standard deviation of X
$\sigma_y$ is the standard deviation of Y
a is a constant, which defines the ordinate (Y-axis) intercept:
$$a = \mu_y - b\,\mu_x$$
where
$\mu_x$ is the mean of X
$\mu_y$ is the mean of Y
With this equation, we can plot a regression line that crosses the Y-axis at the
point a and has a slope equal to b.
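A minimal sketch of these formulas: the slope comes from $b = \rho(\sigma_y/\sigma_x)$, the intercept from the standard least-squares result $a = \mu_y - b\mu_x$, and the line is then used to estimate Y at a new value of X. The function names and the four data pairs are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Regression line Y = a + bX with b = rho * sigma_y / sigma_x and a = mu_y - b * mu_x."""
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    sigma_x = math.sqrt(sum((x - mu_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mu_y) ** 2 for y in ys) / n)
    rho = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / (n * sigma_x * sigma_y)
    b = rho * sigma_y / sigma_x      # slope
    a = mu_y - b * mu_x              # Y-axis intercept; the line passes through (mu_x, mu_y)
    return a, b

def predict(a, b, x):
    """Estimate Y at a new value of X."""
    return a + b * x

# Hypothetical: porosity (Y, %) against scaled acoustic impedance (X)
impedance = [0.2, 0.4, 0.6, 0.8]
porosity = [12.0, 10.0, 8.0, 6.0]
a, b = fit_line(impedance, porosity)
print(a, b, predict(a, b, 0.5))
```

These points lie exactly on a line, so the fit returns a of about 14 and b of about -10 (up to floating-point rounding), giving an estimate of about 9% porosity at X = 0.5.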
Linear regression equations are linear in their coefficients, not necessarily in their
variables: they can include polynomial terms of any degree, as well as logarithmic,
exponential, or other non-linear functions of the variables.
The terms in the equation for which coefficients are computed are independent
terms, and can be simple (a single variable) or compound (several variables
multiplied together). It is also common to use cross terms (the interaction
between X and Y) or power terms:
$Z = a + bX + cY$: uses X and Y as predictors, plus a constant
$Z = a + bX + cY + dXY$: adds the cross term
$Z = a + bX + cY + dXY + eX^2 + fY^2$: adds the power terms
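The three models above differ only in which terms enter the fit; each is still linear in its coefficients, so one least-squares solve handles all of them. The dependency-free sketch below assumes ordinary least squares via the normal equations, which is adequate for small, well-conditioned problems (a QR- or SVD-based solver is preferable in practice). The names `solve`, `MODELS` and `fit` and the data are hypothetical.

```python
def solve(A, rhs):
    """Solve the square linear system A w = rhs (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [list(row) + [v] for row, v in zip(A, rhs)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

# Each model lists the terms whose coefficients (a, b, c, ...) are fitted.
MODELS = {
    "linear": [lambda x, y: 1.0, lambda x, y: x, lambda x, y: y],
    "cross": [lambda x, y: 1.0, lambda x, y: x, lambda x, y: y, lambda x, y: x * y],
    "power": [lambda x, y: 1.0, lambda x, y: x, lambda x, y: y,
              lambda x, y: x * y, lambda x, y: x * x, lambda x, y: y * y],
}

def fit(terms, xs, ys, zs):
    """Least-squares coefficients from the normal equations (G^T G) w = G^T z."""
    G = [[t(x, y) for t in terms] for x, y in zip(xs, ys)]
    k = len(terms)
    GtG = [[sum(row[i] * row[j] for row in G) for j in range(k)] for i in range(k)]
    Gtz = [sum(row[i] * z for row, z in zip(G, zs)) for i in range(k)]
    return solve(GtG, Gtz)

# Hypothetical data generated from Z = 1 + 2X - Y + 0.5XY on a 4 x 4 set of points
pts = [(float(x), float(y)) for x in range(4) for y in range(4)]
xs, ys = [p[0] for p in pts], [p[1] for p in pts]
zs = [1 + 2 * x - y + 0.5 * x * y for x, y in pts]
print([round(c, 6) for c in fit(MODELS["cross"], xs, ys, zs)])
```

Fitting the "cross" model recovers the coefficients 1, 2, -1 and 0.5; the "power" model recovers the same values with e and f close to zero.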
SUMMARY OF BIVARIATE STATISTICAL MEASURES AND DISPLAYS
Advantages
- Easy to calculate.
- Provides information in a very condensed form.
- Can be used to estimate one variable from another variable or from
multiple variables.
Limitations
- Summary statistics sometimes can be too condensed,
and do not carry enough information about the shape of the
distribution.
- Certain statistics (e.g., covariance, correlation coefficient) are sensitive to
abnormally high or low values, even when those values properly belong to the data set.
Outliers can strongly bias a regression prediction equation.
- No spatial information.
EXPLORATORY DATA ANALYSIS
The early phase of a geostatistics project often employs classical statistical tools
in a general analysis and description of the data set.
This process is commonly referred to as Exploratory Data Analysis, or simply
EDA. It is conducted as a way of validating the data itself: you need to be sure
each value that you plug into the geostatistical model is valid (remember:
garbage in, garbage out!). By analyzing the data itself, you can determine which
points represent anomalous values of an attribute (outliers) that should either be
disregarded or scrutinized more closely.
EDA is an important precursor to the final goal of a geostatistical study, which
may be interpolation, or simulation and assessment of uncertainty. Unfortunately,
in many studies, including “routine” mapping of attributes, EDA is often
overlooked. However, it is absolutely necessary to have a good understanding of
your data, so taking the time in EDA to check the quality of the data, as well as
exploring and describing the data set, will reward you with improved results.
The classical statistical tools described in previous sections, along with the tools
that we will introduce in the sections under the Data Validation heading, will help
you to conduct a thorough analysis of your data.
EDA PROCESS
Note that there is no one set of prescribed steps in EDA. Often, the process will
include a number of the following tasks, depending on the amount and type of
data involved:
- data preprocessing
- univariate and multivariate statistical analysis
- identification and probable removal of outliers
- identification of sub-populations
- data posting
- quick maps
- sampling of seismic attributes at well locations
At the very least, you should plot the distribution of attribute values within your
data set. Look for anomalies in your data, and then look for possible explanations
for those anomalies. By employing classical statistical methods to analyze your
data, you will not only gain a clearer understanding of your data, but will also
discover possible sources of errors and outliers.
Geoscientists tasked with making predictions about the reservoir will always face
these limitations:
- Most prospects provide only a very few direct “hard” observations (well
data)
- “Soft” data (seismic) is only indirectly related to the “hard” well data
- A scarcity of observations can often lead to a higher degree of uncertainty
These problems can be compounded when errors in the data are overlooked.
This is especially troublesome with large data sets, and when computers are
involved; we simply become detached from our data. A thorough EDA will foster
an intimate knowledge of the data to help you flag bogus results. Always take the
time to explore your data.
SEARCH NEIGHBORHOOD CRITERIA
INTRODUCTION
All interpolation algorithms require a standard for selecting data, referred to as
the search neighborhood. The parameters that define a search neighborhood
include:
- Search radius
- Neighborhood shape
- Number of sectors (4 or 8 are common)
- Number of data points per sector
- Azimuth of major axis of anisotropy
When designing a search neighborhood, we should remember the following
points:
- Each sector should have enough points (~ 4) to avoid directional sampling
bias.
- CPU time and memory requirements grow rapidly as a function of the
number of data points in a neighborhood.
We will see a further example of the search neighborhood in our later discussion
on kriging.
SEARCH STRATEGIES
Two common search procedures are the Nearest Neighbor and the Radial
Search methods. These strategies calculate the value of a grid node based on
data points in the vicinity of the node.
Nearest Neighbor
One simple search strategy looks for data points that are closest to the grid node,
regardless of their angular distribution around the node. The nearest neighbor
search routine is quick, and works well as long as samples are spread about
evenly. However, it provides poor estimates when sample points are
concentrated too closely along widely spaced traverses.
Another drawback to the nearest neighbor method occurs when all nearby points
are concentrated in a narrow strip along one side of the grid node (such as might
be seen when wells are drilled along the edge of a fault or pinchout). When this
occurs, the selection of points produces an estimate of the node that is
essentially unconstrained, except in one direction. This problem may be avoided
by specifying search parameters which select control points that are evenly
distributed around the grid node.
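A minimal sketch of the nearest neighbor search: sort the control points by distance to the grid node and keep the closest n, ignoring direction. The hypothetical wells reproduce the one-sided case described above, where every selected point lies on the same side of the node. The function name, n and the coordinates are illustrative assumptions.

```python
import math

def nearest_neighbors(node, points, n):
    """Return the n control points closest to the grid node, regardless of direction."""
    nx, ny = node
    return sorted(points, key=lambda p: math.hypot(p[0] - nx, p[1] - ny))[:n]

# Hypothetical wells (x, y, value) in feet: the three closest all lie east of the node,
# as when wells are drilled along one side of a fault
wells = [(100, 0, 9.1), (110, 50, 8.7), (105, -40, 8.9), (-400, 0, 6.2), (0, 450, 7.0)]
picked = nearest_neighbors((0, 0), wells, 3)
print(picked)
```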
Radial Searches
Two common radial search procedures are the quadrant search, and its close
relative, the octant search. Each is based on a circular or elliptical area, sliced
into four or eight equal sections. These methods require a minimum number of
control points for each of the four or eight sections surrounding the grid node.
These constrained search procedures test more neighboring control points than
the nearest neighbor search, which increases the time required. Such constraints
on searching for nearest control points will expand the size of the search
neighborhood surrounding the grid node because a number of nearby control
points will be passed over in favor of more distant points that satisfy the
requirement for a specific number of points being selected from a single sector.
In choosing between the simple nearest neighbor approach and the constrained
quadrant or octant searches, remember that the autocorrelation of a surface
decreases with increasing distance, so the remote data points sometimes used
by the constrained searches are less closely related to the location being
estimated. This may result in a grid node estimate that is less realistic than that
produced by the simpler nearest neighbor search.
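A sketch of a quadrant search under stated assumptions: an isotropic circular neighborhood of a given radius, four 90-degree sectors measured counter-clockwise from east, and up to `per_sector` nearest points kept per sector. It does not enforce a minimum count per sector, and real mapping packages differ in sector orientation, ellipse handling and tie-breaking; the names and wells are hypothetical.

```python
import math

def quadrant_search(node, points, radius, per_sector):
    """Keep up to per_sector nearest points from each of four 90-degree sectors within radius."""
    nx, ny = node
    sectors = {0: [], 1: [], 2: [], 3: []}
    for p in points:
        dx, dy = p[0] - nx, p[1] - ny
        d = math.hypot(dx, dy)
        if d > radius:
            continue                                     # outside the search neighborhood
        angle = math.atan2(dy, dx) % (2 * math.pi)       # counter-clockwise from east
        sectors[int(angle // (math.pi / 2)) % 4].append((d, p))
    selected = []
    for s in range(4):
        sectors[s].sort(key=lambda item: item[0])        # nearest first within the sector
        selected.extend(p for _, p in sectors[s][:per_sector])
    return selected

wells = [(100, 0, 9.1), (110, 50, 8.7), (105, -40, 8.9), (-400, 0, 6.2), (0, 450, 7.0)]
chosen = quadrant_search((0, 0), wells, radius=500, per_sector=1)
print(chosen)
```

With the same wells as a pure nearest neighbor search, the quadrant constraint passes over the nearby well at (110, 50) in favor of the distant wells at (-400, 0) and (0, 450), which is exactly the trade-off discussed above.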

SPATIAL DESCRIPTION
One of the distinguishing characteristics of earth science data is that each data
value is assigned to some particular location in space. Spatial features of the
data sets, such as the degree of continuity, directional trends and location of
extreme values, are of considerable interest in developing a reservoir description.
The statistical descriptive tools presented earlier are not able to capture these
spatial features. In this section, we will use a data set from West Texas to
demonstrate tools that describe spatial aspects of the data.
DATA POSTING
Data posting is an important initial step in any study (Figure 1: Posted porosity
data for 55 wells from North Cowden Field in West Texas).


Figure 1


Not only do these displays reveal obvious errors in data location, but they often
also highlight data values that may be suspect. Lone high values surrounded by
low values (or vice versa) are worth investigating. Data posting may provide clues
as to how the data were acquired. Blank areas may indicate inaccessibility
(another company's acreage, perhaps); heavily sampled areas indicate some
initial interest. Locating the highest and lowest values may reveal trends in the
data.
In this example, the lower values are generally found on the west side of the
area, with the larger values in the upper right quadrant. The data are sampled on
a nearly regular grid, with only a few holes in the data locations. The empty spots
in the lower right corner are on acreage belonging to another oil company. Other
missing points are the result of poor data, and thus are not included in the final
data set. More information is available about this data set in an article by
Chambers, et al. (1994). This data set and acoustic impedance data from a high-
resolution 3D seismic survey will be used to illustrate many of the geostatistical
concepts throughout the remainder of this presentation.

DATA DISTRIBUTION
A reservoir property must be mapped on the basis of a relatively small number of
discrete sample points (most often consisting of well data). When constructing
maps, either by hand or by computer, attention must be paid to the distribution of
those discrete sample points. The distribution of points on maps (Figure 2,
Typical distribution of data points within the map area) may be classified into
three categories: regular, random, or clustered (Davis, 1986).


Figure 2


- Regular: The pattern is regular (Figure 2, part a) if the points are located
on some sort of grid pattern, for example, a 5-spot well pattern. A pattern of
points is considered uniform in density if the density of points in any
sub-area equals the density of points in any other sub-area.
- Random: When points are distributed at random (Figure 2, part b) across
the map area, the coverage may be uniform; however, we do not
expect to see the same number of points within each sub-area.
- Clustered: Many of the data sets we work with show a natural clustering
(Figure 2, part c) of points (wells). This is especially true when working
on a more regional scale.
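The density definition of a regular pattern can be checked by counting points in equal sub-areas (quadrats). In the hypothetical sketch below, the map is split into m x m cells and the points in each are counted. The variance-to-mean ratio used as a summary is a common quadrat-analysis index (about 1 for a random pattern, below 1 for a regular one, above 1 for a clustered one); it is added here as an illustration and is not taken from the source.

```python
def quadrat_counts(points, xmin, xmax, ymin, ymax, m):
    """Count the points in each of m x m equal sub-areas (assumes all points lie inside the map)."""
    dx, dy = (xmax - xmin) / m, (ymax - ymin) / m
    counts = [[0] * m for _ in range(m)]
    for x, y in points:
        i = min(int((x - xmin) / dx), m - 1)   # column; points on the right edge go in the last cell
        j = min(int((y - ymin) / dy), m - 1)   # row; points on the top edge go in the last cell
        counts[j][i] += 1
    return counts

def variance_to_mean(counts):
    """Variance-to-mean ratio of the cell counts: about 1 random, < 1 regular, > 1 clustered."""
    flat = [c for row in counts for c in row]
    mean = sum(flat) / len(flat)
    return sum((c - mean) ** 2 for c in flat) / len(flat) / mean

# Hypothetical 4 x 4 map area split into 2 x 2 sub-areas
regular = [(x + 0.5, y + 0.5) for x in range(4) for y in range(4)]   # grid-like well pattern
clustered = [(0.1 * k, 0.1 * k) for k in range(16)]                 # all wells near one corner
for name, pts in (("regular", regular), ("clustered", clustered)):
    counts = quadrat_counts(pts, 0, 4, 0, 4, 2)
    print(name, counts, variance_to_mean(counts))
```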







GRIDS AND GRIDDING
INTRODUCTION
One of our many tasks as geoscientists is to create contour maps. Although
contouring is still performed by hand, the computer is used more and more to
map data, especially for large data sets. Unfortunately, data are often fed into the
computer without any special treatment or exploratory data analysis. Quite often,
defaults are used exclusively in the mapping program, and the resulting maps are
accepted without question, even though the maps might violate sound geological
principles.
Before using a computer to create a contour map, it is necessary to design a grid
and then use the gridding process to estimate values at its nodes, from which the
contours are drawn. This discussion will
introduce the basic concepts of grids, gridding and interpolation for making
contour maps.
WHAT IS A GRID?
Taken to extremes, every map contains an infinite number of points within its
map area. Because it is impractical to sample or estimate the value of any
variable at an infinite number of points within the map area, we define a grid to
describe locations where estimates will be calculated for use in the contouring
process.
A grid is formed by arranging a set of values into a regularly spaced array,
commonly a square or rectangle, although other grid forms may also be used. The
locations of the values represent the geographic locations in the area to be
mapped and contoured (Jones, et al., 1986). For example, well spacing and
known geology might influence your decision to calculate porosity every 450 feet
in the north-south direction, and every 300 feet in the east-west direction. By
specifying a regular interval of columns (every 450 feet in the north-south
direction) and rows (every 300 feet in the east-west direction), you have, in effect,
created a grid.
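A minimal sketch of defining grid nodes with the spacings from this example, taken here as 300 feet between nodes east-west and 450 feet between nodes north-south. The 3000 ft by 2700 ft extent, the (0, 0) origin and the function name are hypothetical.

```python
def make_grid(x0, y0, nx, ny, dx, dy):
    """Grid-node coordinates: ny rows of nx nodes, dx apart east-west and dy apart north-south."""
    return [[(x0 + i * dx, y0 + j * dy) for i in range(nx)] for j in range(ny)]

# Spacing from the example: 300 ft east-west, 450 ft north-south.
# The 3000 ft x 2700 ft extent and the (0, 0) origin are hypothetical.
dx, dy = 300.0, 450.0
nx = int(3000 / dx) + 1          # nodes per row
ny = int(2700 / dy) + 1          # rows of nodes
grid = make_grid(0.0, 0.0, nx, ny, dx, dy)
n_cells = (nx - 1) * (ny - 1)    # each rectangular cell is bounded by four adjacent nodes
print(nx, ny, n_cells)
```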
Grid nodes are formed by the intersection of each column with a row. The area
enclosed by adjacent grid nodes is called a grid cell (three nodes for a triangular
arrangement, or more commonly, four nodes for a square arrangement).
Because the sample data represent discrete points, a grid should be designed to
reflect the average spacing between the wells, and designed such that the
individual data points lie as closely as possible to a grid node.
GRID SPACING
The grid interval controls the detail that can be seen in the map. No features
smaller than the interval are retained. To accurately define a feature, it must
cover two to three grid intervals; thus the cell should be small enough to show the
required detail of the feature. However, there is a trade-off involving grid size.
Large grid cells produce quick maps with low resolution, and a coarse
appearance. While small grid cells may produce a finer appearance with better
resolution, they also tend to increase the size of the data set, thus leading to
longer computer processing time; furthermore a fine grid often imparts gridding
artifacts that show up in the resulting map (Jones, et al., 1986).
A rule of thumb says that the grid interval should be specified so that a given grid
cell contains no more than one sample point. A useful approach is to estimate, by
eye, the average well spacing, and use it as the grid interval, rounded to an even
increment (e.g., 200 rather than 196.7).
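One way to apply this rule of thumb in code: estimate the average well spacing as the mean nearest-neighbor distance, then round it to a convenient increment. Both the nearest-neighbor definition of "average spacing" and the 50 ft rounding step are assumptions made for illustration, not prescribed by the source; the function names and wells are hypothetical.

```python
import math

def average_spacing(wells):
    """Mean distance from each well to its nearest neighbouring well (needs at least two wells)."""
    total = 0.0
    for i, (xi, yi) in enumerate(wells):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(wells) if j != i)
    return total / len(wells)

def grid_interval(wells, step=50.0):
    """Round the average spacing to the nearest multiple of step (e.g., 196.7 -> 200)."""
    return step * round(average_spacing(wells) / step)

# Hypothetical wells on a square pattern with 196.7 ft spacing
wells = [(i * 196.7, j * 196.7) for i in range(3) for j in range(3)]
print(round(average_spacing(wells), 1), grid_interval(wells))
```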
GRIDS AND GRIDDING
Within the realm of geostatistics, you will often discover that seemingly similar
words have quite different meanings. In this case, the word “gridding” should not
be considered as just a grammatical variation on the word “grid.”
Gridding is the process of estimating the value of an attribute from isolated points
onto a regularly spaced mesh, called a grid (as described above). The attribute's
values are estimated at each grid node.
Interpolation And Contouring
The objective of contouring is to visually describe or delineate the form of a
surface. The surface may represent a structural surface, such as depth to the top
of a reservoir, or may represent the magnitude of a petrophysical property, such
as porosity. Contour lines, strictly speaking, are isolines of elevation. However,
geologists are rather casual about their use of terminology, and usually call any
isoline a contour, whether it depicts elevation, porosity, thickness, composition, or
other property.
Contour maps are a type of three-dimensional graph or diagram, compressed
onto a flat, two-dimensional representation. The X- and Y-axes usually
correspond to the geographical coordinates east-west and north-south. The Z-
axis typically represents the value of the attribute, for example: elevation with
respect to sea level, or porosity, thickness, or some other quantity (Davis, 1986).
Contour lines connect points of equal value on a map, and the space between
two successive contour lines contains only points falling within the interval
defined by the contour lines. It is not possible to know the value of the surface at
every possible location, nor can we measure its value at every point we might
wish to choose. Thus, the purpose of contouring is to summarize large volumes
of data and to depict their three-dimensional spatial distribution on a 2-D paper
surface. We use contour maps to represent the value of the property at
unsampled locations (Davis, 1986; Jones, et al., 1986).
The Interpolation Process
The mapping (interpolation) and contouring process involves four basic steps.
According to Jones, et al. (1986), the four mapping and contouring steps are:
1. Identifying the area and the attribute to be mapped (Figure 1, Location
and values of control points within the mapping area, North Cowden Field,
West Texas);



Figure 1


2. Designing the grid over the area (Figure 2, Grid design superimposed on
the control points);



Figure 2


3. Calculating the values to be assigned at each grid node (Figure 3, Upper
left quadrant of the grid shown in Figure 2.


Figure 3


The values represent interpolated values at the grid nodes. These values
are used to create the contours shown in Figure 4);
4. Using the estimated grid node values to draw contours (Figure 4, Contour
map of porosity, created from the control points in Figure 1 and the grid
mesh values shown in Figure 3).



Figure 4


To illustrate these steps, we will use porosity measurements from the previously
mentioned West Texas data set.

TRADITIONAL INTERPOLATION METHODS
INTRODUCTION
The point-estimation methods described in this section consist of common
methods used to make contour maps. These methods use non-geostatistical
interpolation algorithms and do not require a spatial model. They provide a way to
create an initial “quick look” map of the attributes of interest. This section is not
meant to provide an exhaustive dissertation of the subject, but will introduce
certain concepts needed to understand the principles of geostatistical
interpolation and simulation methods discussed in later sections.
Most interpolation methods use a weighted average of values from control points
in the vicinity of the grid node in order to estimate the value of the attribute
assigned to that node. With this approach, the attribute values of the nearest
control points are weighted according to their distance from the grid node, with
the heavier weights assigned to the closest points. The attribute values of grid
nodes that lie beyond the outermost control points must be extrapolated from
values assigned to the nearest control points.
Many of the following methods require the definition of neighborhood parameters
to characterize the set of sample points used during the estimation process,
given the location of the grid node. For the upcoming examples, we've specified
the following neighborhood parameters:
- Isotropic ellipse with a radius = 5000 feet
- 4 quadrants
- A minimum of 7 sample points, with an optimum of 3 sample points per
quadrant
These examples use porosity measurements, located on a nearly regular grid.
See Figure 1 (Location and values of control points within the mapping area at
North Cowden Field, West Texas) for the sample locations and porosity values.

Figure 1


The following seven estimation methods will be discussed in turn:
- Inverse Distance
- Closest Point
- Moving Average
- Least Squares Polynomial
- Spline
- Polygons of Influence
- Triangulation
The first five estimation methods are accompanied by images that illustrate the
patterns and relative magnitude of the porosity values created by each method.
All images have the same color scale. The lowest value of porosity is dark blue
(5%) and the highest value is red (13%), with a 0.5% color interval. However, for
the purpose of this illustration, the actual values are not important at this time.
(No porosity mapping images were produced for the polygons of influence and
triangulation methods.)
INVERSE DISTANCE
This estimation method uses a linear combination of attribute values from
neighboring control points. The weights assigned to the measured values used in
the interpolation process are inversely proportional to the distance from the grid
node raised to a given power (p). If the smallest distance is smaller
than a given threshold, the value of the corresponding sample is copied to the
grid node. Large values of p (~ 5 or greater) create maps similar to the closest
point method (Isaaks and Srivastava, 1989; Jones, et al., 1986), which will be
described next. The equation for the inverse distance method has the following
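form:
$$Z^* = \sum_{\alpha} \lambda_\alpha Z_\alpha$$
where:
$$\lambda_\alpha = \frac{1/d_\alpha^{\,p}}{\sum_{\alpha} \left(1/d_\alpha^{\,p}\right)}$$
and:
$Z^*$ = the estimated value at the target grid node
$\lambda_\alpha$ = the weights
$Z_\alpha$ = the data points
$d_\alpha^{\,p}$ = the distance from $Z_\alpha$ to $Z^*$, raised to the power p

Figure 2 displays Inverse Distance gridding using a power of 1.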

Figure 2


The Inverse Distance method is recommended as a “first pass” through the data
because it:
- is simple to use and understand.
- produces a “quick map.”
- is an excellent QC tool.
- locates “bulls-eye” effects (lone high or low values).
- spots erroneous sample locations.
- gives a first indication of trends.
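The inverse distance estimator can be sketched as follows. Assumptions not taken from the source: Euclidean distance, all supplied samples are used (no search neighborhood is applied), and `eps` is a hypothetical copy threshold below which the sample value is copied to the node.

```python
import math

def idw(node, samples, p=1.0, eps=1e-6):
    """Inverse distance estimate: Z* = sum(lambda_a * Z_a), lambda_a = (1/d_a**p) / sum(1/d**p)."""
    nx, ny = node
    pairs = []
    for x, y, z in samples:
        d = math.hypot(x - nx, y - ny)
        if d < eps:                      # node coincides with a sample: copy its value
            return z
        pairs.append((1.0 / d ** p, z))
    total = sum(w for w, _ in pairs)     # normalizes the weights so they sum to 1
    return sum(w * z for w, z in pairs) / total

samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0)]   # hypothetical (x, y, porosity)
print(idw((2.0, 0.0), samples, p=1))   # the nearer sample dominates
print(idw((2.0, 0.0), samples, p=8))   # large p: close to the closest-point value of 10
```

Raising p concentrates almost all of the weight on the nearest sample, which is why large powers produce maps resembling the closest point method.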
CLOSEST POINT
The closest point (Figure 3) or nearest neighbor method consists of copying
the value of the closest sample point to the target grid node.


Figure 3


This method can be viewed as a linear combination of the neighboring points with
all the weights equal to 0, except the weight attached to the closest point which is
equal to 100% (Henley, 1981; Jones, et al., 1986).
$$Z^* = Z_{\alpha\,(\text{closest})}$$
where
$Z^*$ = the estimated value at the target grid node
$Z_\alpha$ = the data points
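A minimal sketch of the closest point estimate: the nearest sample gets a weight of 1 and all others 0. The two hypothetical nodes sit just either side of the midpoint between two samples, showing how the estimate jumps abruptly. The function name and data are illustrative.

```python
import math

def closest_point(node, samples):
    """Closest-point estimate: weight 1 for the nearest sample, 0 for all the others."""
    nx, ny = node
    _, _, z = min(samples, key=lambda s: math.hypot(s[0] - nx, s[1] - ny))
    return z

samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 15.0)]   # hypothetical
print(closest_point((4.9, 1.0), samples), closest_point((5.1, 1.0), samples))
```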
MOVING AVERAGE
The moving average method (Figure 4) is the most frequently used estimation
method.


Figure 4


Each neighboring sample point is given the same weight. The weight is
calculated so that the weights of all the neighboring sample points
sum to unity (Henley, 1981; Jones, et al., 1986).
So, if we assume that there are N neighboring data,
$$Z^* = \frac{\sum_{\alpha=1}^{N} Z_\alpha}{N}$$
where
$Z^*$ = the estimated value at the target grid node
$Z_\alpha$ = the data points
N = the number of neighboring data points
The moving average takes its name from the process of estimating the attribute
value at each grid node based on the equally weighted average of nearby control points
in the search neighborhood, and then moving the neighborhood from grid node to
node.
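A minimal sketch of the moving average: each node receives the equal-weight (1/N) mean of the samples inside its neighborhood, and the same neighborhood is then moved to the next node. A simple circular radius stands in for the neighborhood here; that choice, the function names and the data are assumptions (the examples in this section use a quadrant search instead).

```python
import math

def moving_average(node, samples, radius):
    """Equal-weight (1/N) average of the samples within radius of the node; None if there are none."""
    nx, ny = node
    inside = [z for x, y, z in samples if math.hypot(x - nx, y - ny) <= radius]
    if not inside:
        return None          # no data in the neighborhood: node left unestimated
    return sum(inside) / len(inside)

def grid_estimates(nodes, samples, radius):
    """Move the same neighborhood from node to node, estimating each in turn."""
    return [moving_average(node, samples, radius) for node in nodes]

samples = [(0.0, 0.0, 6.0), (1.0, 0.0, 8.0), (5.0, 5.0, 20.0)]   # hypothetical (x, y, value)
print(grid_estimates([(0.5, 0.0), (5.0, 5.0), (10.0, 10.0)], samples, radius=2.0))
```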
LEAST SQUARES POLYNOMIAL
The least squares polynomial method (Figure 5) is commonly used for trend
surface analysis.


Figure 5


The neighboring points are used to fit a polynomial expression of a degree
specified by the user. The polynomial form is a logical choice for surface
approximation, as any continuous function can be approximated arbitrarily closely,
over a bounded area, by a polynomial of sufficiently high degree. The polynomial
surface is a mathematical function involving powers of X and Y. The complexity of
the surface (Table 1) is controlled by the user through the number of terms used,
which depends upon the degree of the polynomial, N, a positive integer (Jones, et al.,
1986; Davis, 1986; Krumbein and Graybill, 1965; Henley, 1981).
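$$Z^* = \sum_{i}\sum_{j} a_{ij} X^i Y^j, \qquad i + j \le N$$
where
$Z^*$ = the estimated value at the target grid node
$a_{ij}$ = the polynomial coefficients, fitted by least squares to the neighboring points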

Table 1: General form of polynomial functions (after Jones, et al., 1986).
Degree | Function
1 | $Z = a_{00} + a_{10}X + a_{01}Y$
2 | $Z = a_{00} + a_{10}X + a_{01}Y + a_{20}X^2 + a_{11}XY + a_{02}Y^2$
N | $Z = a_{00} + a_{10}X + a_{01}Y + \dots + a_{0N}Y^N$
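The polynomial trend surface can be sketched as an ordinary least-squares fit of the terms $X^iY^j$ with $i + j \le N$, generated in the same order as Table 1. This dependency-free sketch assumes the normal equations are adequate (fine for low degrees and well-spread points) and fits whatever points it is given, standing in for the neighborhood. The names `solve`, `poly_terms`, `fit_trend_surface` and `evaluate` are hypothetical.

```python
def solve(A, rhs):
    """Solve the square linear system A w = rhs (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [list(row) + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def poly_terms(degree):
    """Exponent pairs (i, j) with i + j <= degree, ordered as in Table 1."""
    return [(i, t - i) for t in range(degree + 1) for i in range(t, -1, -1)]

def fit_trend_surface(xs, ys, zs, degree):
    """Least-squares coefficients a_ij of Z = sum a_ij X^i Y^j, via the normal equations."""
    terms = poly_terms(degree)
    G = [[x ** i * y ** j for i, j in terms] for x, y in zip(xs, ys)]
    k = len(terms)
    A = [[sum(r[p] * r[q] for r in G) for q in range(k)] for p in range(k)]
    rhs = [sum(r[p] * z for r, z in zip(G, zs)) for p in range(k)]
    return dict(zip(terms, solve(A, rhs)))

def evaluate(coeffs, x, y):
    """Evaluate the fitted surface at (x, y), e.g. at a grid node."""
    return sum(a * x ** i * y ** j for (i, j), a in coeffs.items())

# Hypothetical points sampled from the plane Z = 5 + 0.3X - 0.2Y
pts = [(float(x), float(y)) for x in range(4) for y in range(4)]
xs, ys = [p[0] for p in pts], [p[1] for p in pts]
zs = [5 + 0.3 * x - 0.2 * y for x, y in pts]
coeffs = fit_trend_surface(xs, ys, zs, degree=1)
print(coeffs, evaluate(coeffs, 1.5, 2.5))
```

A degree-N surface has (N + 1)(N + 2)/2 coefficients, so at least that many well-spread points are needed for the fit to be determined.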


SPLINE
Spline fitting is a commonly used quantitative method. The method ignores
geologic trends and allows sample location geometry to dictate the range of
influence of the samples. The bicubic spline (Figure 6) is a two-dimensional
gridding algorithm.


Figure 6


In one dimension, the function has the form of a flexible rod between the sample
points. In two dimensions, the function has the form of a flexible sheet. The
objective of the method is to fit the smoothest possible surface through all the
samples using a least squares polynomial approach (Jones, et al., 1986).
POLYGONS OF INFLUENCE
This is a very simple method, and is often used in the mining industry to estimate
average ore grade within blocks. Often, the value estimated at any location is
simply the value of the closest point. The method is similar to the closest point
approach. Polygonal patterns are created, based on sample location. Polygon
boundaries lie midway between adjacent sample locations.
As long as the points we are estimating fall within the same polygon of influence,
the polygonal estimate does not change. As soon as we encounter a grid node in
a different polygon, the estimate changes to a different value. This method
causes abrupt discontinuities in the surface, and may create unrealistic maps
(Isaaks and Srivastava, 1989; Henley, 1981).
TRIANGULATION
The triangulation method is used to calculate the value of a variable (such as
depth, or porosity for instance) in an area of a map located between 3 known
control points. Triangulation overcomes the problem of the polygonal method,
removing possible discontinuities between adjacent points by fitting a plane
through three sample points that surround the grid node being estimated (Isaaks
and Srivastava, 1989). The equation of the plane is generally expressed as:
$$Z^* = ax + by + c$$
This method starts by connecting lines between the 3 known control points to
form a triangle (denoted as Δrst in Figure 7).


Figure 7


Next, draw a line from the unknown point (point O in the figure) to each of the
corners of the triangle, thereby forming 3 new triangles within the original triangle.
The value of any point located within the triangle (point O in this example) can be
determined through the following steps:
1. Compute the areas of the resulting new triangles
2. Use the areas of each triangle to establish a weight for each corner point
3. Multiply the values of the three corner points by their respective weights,
and
4. Add the resulting products.
The formula to find the area of a triangle is:
A = bh/2
Where
A is the area of the triangle,
b is the length of the base, and
h is the length of the height, taken perpendicular to the base.
Weights are assigned to each value in proportion to the area of the triangle
opposite the known value, as shown by the example in Figure 7. This
example shows how the values from the three closest locations are weighted by
triangular areas to form an estimated value at point O. The control values (r, s, t)
are located at the corners of the triangle.
- The value at point r is weighted by the triangular area $A_{Ost}$,
- point s is weighted by the area $A_{Ort}$, and
- point t is weighted by the area $A_{Ors}$.
Each weight is expressed as a fraction of the total area of triangle rst, so the
3 weights sum to 1.
Now multiply each weight by its associated control value to arrive at a weighted
control value. Do this for each of the three points. Then add up the 3 weighted
control values to obtain the interpolated value (for example, depth) at point O.
Be aware, however, that choosing different meshes of triangles or entering the
data in a different sequence may result in a different set of contours for your map.
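The area-weighting steps can be sketched as follows. Triangle areas use the cross-product form, which equals bh/2, and each corner value is weighted by the area of the sub-triangle opposite it. The sketch assumes O lies inside triangle rst (only then do the three weights sum to 1); the function names and coordinates are hypothetical.

```python
def tri_area(p, q, r):
    """Area of triangle pqr: half the absolute cross product, equivalent to bh/2."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0

def triangulate(o, r, s, t, zr, zs, zt):
    """Estimate at point o inside triangle rst by weighting each corner value by the opposite area."""
    total = tri_area(r, s, t)
    w_r = tri_area(o, s, t) / total   # area A_Ost, opposite r
    w_s = tri_area(o, r, t) / total   # area A_Ort, opposite s
    w_t = tri_area(o, r, s) / total   # area A_Ors, opposite t
    return w_r * zr + w_s * zs + w_t * zt

# Hypothetical corners; the values come from the plane Z = 2 + x + 3y
r, s, t = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
print(triangulate((1.0, 1.0), r, s, t, 2.0, 6.0, 14.0))
```

Because the weights are the area fractions (barycentric coordinates) of O, this procedure reproduces any plane $Z^* = ax + by + c$ through the three control points exactly, which matches the plane-fitting description of the method.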
MAP DISPLAY TYPES
CONTOUR MAPS
Contour maps reveal overall trends in the data values. Hand contouring the data
is an excellent way to become familiar with the data set. Unfortunately, many
data sets are too large to hand contour, so computer contouring is often the only
alternative.
At this preliminary stage of spatial description, the details of the contouring
algorithm need not concern us as long as the contour map is a good visual
display of the data. Gridding and contouring of the data require values to be
interpolated onto a regular grid. For a first pass through the data, the inverse
distance algorithm is a good choice. The inverse distance parameters are easy to
set; then choose an isotropic octant search neighborhood with about two
data points per octant, if possible. Design the grid interval to be about the
average spacing of the wells, or half that size. Figure 1 (Grid mesh and data
location of 55 porosity data points from North Cowden Field in West Texas)


Figure 1


shows the grid design with respect to the data locations and Figure 2 is the
resulting contour map using an inverse distance approach with a distance power
equal to 1.


Figure 2


In this example, the high porosity area is located in the upper right quadrant,
extending down the right side of the mapped area. There is a second region of
high porosity in the southern, central portion of the area. We can see that low
values are generally trending north-south, with a zone of lower porosity trending
east to west through the central portion of the area. Displays such as this will aid
in designing the spatial analysis strategy and help to highlight directional
continuity trends.
SYMBOL MAPS
For many large data sets (for example, seismic), posting individual sample values
may not be feasible and contouring may mask interesting local details (Isaaks
and Srivastava, 1989). An alternative approach is a symbol map. Different colors
for ranges of data values can be used to reveal trends in high and low values.
Figure 3 is a five-color symbol map of 33,800 acoustic impedance values, scaled
between 0 and 1, from a high resolution 3D seismic survey.


Figure 3


Previous studies with this data set and its accompanying porosity data set
(Chambers, et al., 1994) show that acoustic impedance has a -0.83 correlation
with porosity. Therefore, observations from this map may indicate zones of high
porosity associated with the red and orange areas (see contour and data posting
maps, Figure 1 and Figure 2). Low porosity is located in the blue and green
areas. If this relationship holds, the seismic data can be used to infer porosity in
the inter-well regions using a geostatistical data integration technique commonly
referred to as cokriging (which we will describe later in the section on Data
Integration).
INDICATOR MAPS
An indicator map is a special type of symbol map with only two symbols, for
example, black and white. With these two symbols, each data point is assigned to
one of two classes. Indicator maps simply record when a data value is above or
below a certain threshold. Four indicator maps (Figure 4) were created from the
acoustic impedance data shown in Figure 3. The threshold values are 0.2, 0.4,
0.6, and 0.8 acoustic impedance scaled units.
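A minimal sketch of the indicator transform behind these maps, one two-class map per threshold. The convention that 1 means "at or below the threshold" is an assumption (it is a common geostatistical convention, but the source does not state which class is black). The function names and values are hypothetical.

```python
def indicator(values, threshold):
    """One two-class map: 1 where the value is at or below the threshold, 0 otherwise."""
    return [1 if v <= threshold else 0 for v in values]

def indicator_maps(values, thresholds=(0.2, 0.4, 0.6, 0.8)):
    """One indicator map per threshold, keyed by threshold."""
    return {t: indicator(values, t) for t in thresholds}

scaled_ai = [0.10, 0.35, 0.50, 0.65, 0.90]      # hypothetical scaled acoustic impedance values
for t, flags in indicator_maps(scaled_ai).items():
    print(t, flags)
```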


Figure 4


These maps show definite trends in high and low values, which relate to trends in
porosity values.
Figure 4 parts B and C are perhaps the most revealing. Zones of lowest scaled
impedance are located in the upper right quadrant of the study area (Figure 4
part B). Zones of highest impedance are on the western side of the study area,
trending generally north to south. There is also a zone of high impedance cutting
east to west across the southern, central portion of the area, with another north to
south trend in the lower right corner of the map area.
Data posting, contour, symbol and indicator maps provide us with a lot of
information about the spatial arrangement and pattern in our data sets. These are
also excellent quality control displays and provide clues about potential data
problems.
MOVING WINDOW STATISTICS
INTRODUCTION
Moving window statistics provide a way to look for local anomalies in the data set,
such as regions whose variability differs from that of other regions
(heteroscedasticity, in statistical jargon). In earth science data it is quite common to
find data values in some regions that display more variability than in other
regions. For example, local high (thief zones) or low (barriers) zones of
permeability hamper the effective recovery of hydrocarbons and create many
more problems with secondary and tertiary recovery operations.
The calculation of a few summary statistics (mean and standard deviation) within
moving windows is often useful for investigating anomalies in the data (Isaaks
and Srivastava, 1989). The method is quite simple, and consists of:
- dividing the area into local neighborhoods of equal size,
- then computing summary statistics within each local area.
The window size depends on the average data spacing, dimensions of the study
area, and amount of data. We are looking for possible trends in the local mean
and standard deviation. It is also important to see if the magnitude of the local
standard deviation tracks (correlates) with the magnitude of the local mean,
known as the proportionality effect. See Isaaks and Srivastava (1989) for more
details on moving windows and the proportionality effect.
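Here is a sketch of the two steps listed above. It assumes the data already sit on a regular grid, stored as a list of rows, and uses non-overlapping square windows. Windows that would run past the grid edge are skipped. The function name and the 4 x 4 porosity grid are hypothetical.

```python
import statistics


def moving_window_stats(grid, size):
    """Mean and standard deviation in non-overlapping size x size windows.

    grid is a list of equal-length rows of values on a regular grid.
    Returns a list of (row_start, col_start, mean, std) tuples.
    Windows that would run past the grid edge are skipped.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    results = []
    for r0 in range(0, n_rows - size + 1, size):
        for c0 in range(0, n_cols - size + 1, size):
            window = [grid[r][c]
                      for r in range(r0, r0 + size)
                      for c in range(c0, c0 + size)]
            results.append((r0, c0, statistics.mean(window),
                            statistics.pstdev(window)))
    return results


# Hypothetical 4 x 4 porosity grid (%) split into four 2 x 2 windows.
porosity = [
    [8, 9, 14, 18],
    [9, 8, 10, 22],
    [7, 8, 12, 20],
    [8, 9, 16, 24],
]
for r0, c0, mean, std in moving_window_stats(porosity, 2):
    print(f"window at ({r0},{c0}): mean={mean:.2f} std={std:.2f}")
```

In this toy grid the two windows with the higher local means (16 and 18) also have the larger standard deviations. That pattern is a hint of the proportionality effect.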
PROPORTIONALITY EFFECT
The proportionality effect concerns the relationship of the local summary statistics
computed from moving windows. There are four relationships between the local
mean and the local variability (e.g., standard deviation or variance). According to
Isaaks and Srivastava (1989), these relationships are:
- There is no trend in the local mean or the variability. The variability is
independent of the magnitude of the local mean. This is the ideal case,
but is rarely seen.
- There is a trend in the local mean, but the variability is independent of the
local mean and has no trend.
- There is no trend in the local mean, but there is a trend in the local
variability.
- There is a trend in both the local mean and local variability. The magnitude
of the local variability correlates with the magnitude of the local mean.
For estimation purposes, the first two cases are the most favorable. If the local
variability is roughly constant, then estimates anywhere in the mapped area will
be as good as estimates anywhere else. If the local mean shows a trend, then we
need to examine our data for signs of stationarity.
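A simple check for the proportionality effect is to correlate the local means with the local standard deviations. This assumes both have already been computed from moving windows. A strong positive coefficient is consistent with the fourth case above. The values below are hypothetical, and deciding how large a coefficient counts as "strong" is still a judgment call.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical local means and standard deviations from moving windows.
local_means = [8.5, 16.0, 8.0, 18.0, 12.0, 10.0]
local_stds = [0.5, 4.5, 0.7, 4.5, 2.5, 1.6]

r = pearson(local_means, local_stds)
print(f"correlation between local mean and local std: {r:.2f}")
```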
CONCEPT OF STATIONARITY
A stationary property is stable throughout the area measured. Stationarity may be
considered the statistical equivalent of homogeneity, in which statistical
parameters such as mean and standard deviation are not seen to change.
Stationarity requires that values in the data set represent the statistical
population.
Ideally, we would like our data to be independent of sample location. However,
data often show a regular increase (or decrease) in value over large distances,
and the data are then said to be non-stationary, or to show a trend (Hohn, 1988;
Isaaks and Srivastava, 1989; Henley, 1981; Wackernagel, 1995).
The concept of stationarity is used in everyday practice. For example, consider
the following:
The top of Formation A occurs at a depth of about 975 feet TVDss.
This statement, however, does not preclude the possibility that Formation A
varies in depth from well to well. Thus, if the depth to the top of Formation A,
Z_top, is a stationary random function, and
- at location x_i, Formation A occurs at 975 feet TVDss, then
- at location x_i + ½ mile, Formation A should also occur at about 975 feet
TVDss.
However, if Formation A is known to be non-stationary, then predicting the depth
to the top of Formation A in a new well is more difficult, and requires a more
sophisticated model. We will discuss such models and how stationarity influences
them in the section on regionalized variables.
INTRODUCTION
In the reservoir, the variables of interest (e.g., porosity, permeability, sand/shale
volumes, etc.) are products of a variety of complex physical and chemical
processes. These processes superimpose a spatial pattern on the reservoir rock
properties, so it is important to understand the scales and directional aspects of
these features in order to produce hydrocarbons efficiently. The spatial component
adds a degree of complexity to these variables, and increases the
uncertainty about the behavior of attributes at locations between sample points
(sample points are usually wells). Deterministic models cannot handle the
uncertainties associated with such variables, so geostatistics relies instead on
probabilistic models that account for these inevitable uncertainties (Isaaks and
Srivastava, 1989).
THE BASIS OF THE REGIONALIZED VARIABLE AND SPATIAL CORRELATION
Matheron, in his definitive work on Geostatistics entitled Traite de Geostatistique
Appliquee (1963), laid the foundation for regionalized variable theory, a body of
theoretical statistics in which the location was for the first time considered an
important factor in the estimation procedure. Regionalized variable theory
pertains to the statistics of a special type of variable that differs from the ordinary
scalar random variable. Although the regionalized variable possesses the usual
distribution parameters (mean, variance, etc.), it also has a defined spatial
location. Thus, two realizations of a regionalized variable that differ in spatial
location will display in general a non-zero correlation. However, successive
realizations of an ordinary scalar random variable are uncorrelated (Henley,
1981).
Therefore, the premise of regionalized variables and spatial correlation analysis
is to quantify the continuity of sample properties with distance and direction. For
example, we can intuitively surmise that two wells in close proximity are more
likely to have similar reservoir properties than two wells which are further apart.
But just exactly how far can wells be separated yet still yield similar results? We
need a new statistical measure, because the classical univariate or bivariate
statistics cannot capture spatial correlation information. Spatial correlation
analysis is one of the most important steps in a geostatistical reservoir study,
because it conditions the results of subsequent processes, such as kriging and
conditional simulation, and their associated uncertainties.
PROPERTIES OF REGIONALIZED AND RANDOM VARIABLES
One data set can have exactly the same univariate statistics as another set, and
yet exhibit very different spatial continuity. Consider the following two sequences
(Figure 1a and Figure 1b).

Figure 1a and 1b: Comparison of porosity data measured over 50 units of distance with an equal sampling interval.
This graphic shows a plot of porosity measurements along two transects. The
sample spacing is 1 unit of distance. Sequence A, on the left (Regionalized
Variable), shows spatial continuity in porosity, whereas Sequence B, on the right
(Random Variable), shows a random distribution of porosity. However, the mean,
variance and histogram for both porosity sequences are identical.
Statistical Properties Of The Porosity Profiles
The porosity profiles in the above graphic are purely hypothetical, but serve to
illustrate the concepts behind the regionalized variable and spatial correlation.
- Similarities:
  - Same mean = 8.4%
  - Same standard deviation = 2.7%
  - Same frequency distribution (histogram)
- Differences:
  - Sequence A has SPATIAL CONTINUITY
  - Sequence B is RANDOM
Sequence A exhibits a structured or spatial correlation component, and hence is
a REGIONALIZED VARIABLE. Sequence B does not show any spatial continuity,
and so is classified as a RANDOM VARIABLE.
These variables will come into play later, during the mapping process. The
process of hand-contouring data points on a map is a form of geostatistical
modeling, in which the geoscientist has a certain model in mind before attempting
the contouring exercise. However, those who contour data usually assume the
presence of the spatial component (regionalized variable), but typically ignore the
second component (the random variable).
Autocorrelation
Let us further investigate the concept of spatial continuity by plotting Sequences
A and B in a different way. When either data sequence is plotted against itself,
the points fall on a straight line with a 45-degree slope, indicating perfect
correlation. However, if the data are translated by the sampling interval and then
plotted against the original sequence, we begin to see the impact of spatial
correlation, or the lack of spatial correlation.
Sequence A Correlation Plots
Figures 2a through 2f (correlation plots) help to illustrate the concept of spatial
autocorrelation. Here we see a reasonably good correlation even with 3 units of
translation of Sequence A.

Figure 2a-2f: Correlation plots for Sequence A
Sequence B Correlation Plots
When we compare Figures 3a, 3b and 3c (correlation cross-plots) with the plots
for Sequence A, we see a distinct difference in autocorrelation characteristics.
There is a poor correlation after one unit of translation of Sequence B.

Figure 3a-3c: Correlation cross-plots for Sequence B
Observations
Sequences A and B (above) are presented as h-Scatterplots, where h represents
lag, or separation distance. Recall that the concept of the h-Scatterplot was
discussed in the section on Bivariate Statistical Measures and Displays. The h-
Scatterplot forms the basis for describing a model of spatial correlation. The
shape of the cloud on these plots tells us how continuous the data values are
over a certain distance in a particular direction. For this case, h-Scatterplots were
computed along two different transects. If the data values at locations separated
by h are identical, then they will fall on a line x = y, a 45-degree line of perfect
correlation. As the data become less and less similar, the cloud of points on the
h-Scatterplot becomes fatter and more diffuse (Hohn, 1988; Isaaks and
Srivastava, 1989).
The following observations are readily apparent from the previous two figures:
Sequence A is a Regionalized Variable and shows spatial continuity over about 3
units of distance.
- Correlation remains good after three units of translation
- Correlation is 0.21 after five units of translation
Sequence B is a Random Variable with no spatial continuity.
- Correlation is poor after one unit of translation
- Correlation approaches 0 after two units of translation
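The comparison of Sequences A and B can be reproduced with synthetic data. The sketch below builds a smooth profile and a shuffled copy of it. The copy has exactly the same histogram, mean and standard deviation. The code then computes the correlation coefficient of the h-scatterplot pairs at increasing translations. Both sequences are hypothetical stand-ins, so they will not match the 8.4% and 2.7% statistics or the 0.21 value quoted above.

```python
import math
import random


def lag_correlation(seq, lag):
    """Correlation coefficient of the h-scatterplot pairs (seq[i], seq[i + lag]).

    seq is regularly sampled; lag is a whole number of sample spacings (>= 1).
    """
    head, tail = seq[:-lag], seq[lag:]
    n = len(head)
    mh, mt = sum(head) / n, sum(tail) / n
    sxy = sum((a - mh) * (b - mt) for a, b in zip(head, tail))
    sxx = sum((a - mh) ** 2 for a in head)
    syy = sum((b - mt) ** 2 for b in tail)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical stand-ins for Sequences A and B: 50 samples at a spacing of 1 unit.
seq_a = [8.4 + 3.0 * math.sin(i / 4.0) for i in range(50)]  # spatially continuous
seq_b = seq_a[:]                                             # the same values...
random.Random(42).shuffle(seq_b)                             # ...in random order

# Identical histograms, hence identical mean and standard deviation.
assert sorted(seq_a) == sorted(seq_b)

for h in range(1, 6):
    print(h, round(lag_correlation(seq_a, h), 2), round(lag_correlation(seq_b, h), 2))
```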
The Random Function
The complex attributes we study in the reservoir are random functions, which are
combinations of Regionalized and Random Variables. Thus, the random function
has two components:
- Structured Component, consisting of the regionalized variable, which
exhibits some degree of spatial auto-correlation
- Local Random Component, consisting of the random variable (also
referred to as the nugget effect), showing little or no correlation
The random function model assumes that:
- The single measurement z(x_i) at location x_i is one possible outcome of a
random variable Z(x_i) located at that point.
- The set of collected samples, z(x_i), i = 1, …, n, is interpreted as a
particular realization of dependent random variables, Z(x_i), i = 1, …, n,
known as a random function.
The process of quantifying spatial information involves the comparison of
attribute values measured at one location with values of the same attribute
measured at other locations. This method is analogous to the h-Scatterplot. By
studying the spatial dependency between any two measurements of the same
attribute, z(x_i) and z(x_i + h), where h is some measure of distance, we are
essentially studying the spatial correlation between the two corresponding
random variables Z(x_i) and Z(x_i + h) of the random function.
For more information on the random function model, see Hohn (1988), Isaaks
and Srivastava (1989), Deutsch and Journel (1992), Wackernagel (1995) and
Henley (1981).
SPATIAL CONTINUITY ANALYSIS
In previous sections, we discussed classical methods for analyzing single
variables or multiple variables. Those methods, however, could not properly
address the spatial continuity and directionality that is inherent in earth science
data. Traditional interpolation procedures work on the assumption that spatial
correlation within a data set may be modeled by a linear function, based on the
premise that as the distance between sampled locations increases, the variability
between data values increases proportionally. In practice, such a linear
correlation amounts to a gross oversimplification. In this section, we will introduce
the concept of auto-correlation, and the tools used to measure this property.
SPATIAL AUTO-CORRELATION
Spatial auto-correlation describes the relationship between regionalized variables
sampled at different locations. Samples that are auto-correlated are not
independent with regard to distance. The closer two samples are to each other in
space, the more likely they are to be related. In fact, the value of a variable at one
location can be predicted from values sampled at other (nearby) locations.
The two common measures of spatial continuity are the variogram and its close
relative, the correlogram, which allow us to quantify the continuity, anisotropy and
azimuthal properties of our measured data set.
THE VARIOGRAM
Regionalized variable theory uses the concept of semivariance to express the
relationship between different points on a surface. Semivariance is defined as:
$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ z(x_i) - z(x_i + h) \right]^2$$

Where:
γ(h) = semivariance at lag h
h = lag (separation distance)
z(x_i) = value of the sample located at point x_i
z(x_i + h) = value of the sample located at point x_i + h
N(h) = total number of sample pairs for the lag interval h.
Semivariance is used to describe the rate of change of a regionalized variable as
a function of distance. We know intuitively that there should be no change in
values (semivariance = 0) between points located at a lag distance h = 0,
because there are no differences between points that are compared to
themselves. However, when we compare points that are spaced farther apart, we
see a corresponding increase in semivariance (the higher the average
semivariance, the more dissimilar the values of the attribute being examined). As
the distance increases further, the semivariance eventually becomes
approximately equal to the variance of the surface itself. This distance is the
greatest distance over which a variable measured at one point on the surface is
related to that variable at another point.
Semivariance is evaluated by computing the squared difference for every pair of
points in the data set, assigning each pair to a lag interval h, and taking half the
average squared difference within each interval. If we plot a graph of semivariance
versus lag distance, we create a variogram (also known as a semivariogram).
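The procedure just described can be sketched as an omni-directional experimental variogram for scattered (x, y, z) samples. The sketch assumes lag classes centred on whole multiples of the lag, each with a tolerance of half a lag. Pairs closer than half a lag therefore fall in no class. The function name and the toy data are hypothetical. Returning the pair count with each value makes it easy to discard poorly populated lags.

```python
import math


def experimental_variogram(points, lag, n_lags):
    """Omni-directional experimental semivariogram.

    points: list of (x, y, z) samples.
    Each pair is assigned to the lag class k (1..n_lags) whose centre k*lag
    lies within +/- lag/2 of the pair's separation distance.
    Returns a list of (k*lag, gamma, n_pairs); gamma is None for empty classes.
    """
    sums = [0.0] * (n_lags + 1)
    counts = [0] * (n_lags + 1)
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            d = math.hypot(xj - xi, yj - yi)
            k = int(d / lag + 0.5)  # nearest lag class centre (half-up rounding)
            if 1 <= k <= n_lags:
                sums[k] += (zi - zj) ** 2
                counts[k] += 1
    return [(k * lag, sums[k] / (2 * counts[k]) if counts[k] else None, counts[k])
            for k in range(1, n_lags + 1)]


# Hypothetical samples along a line at a spacing of 1 unit.
samples = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 4.0), (3, 0, 7.0), (4, 0, 11.0)]
for h, gamma, n in experimental_variogram(samples, lag=1.0, n_lags=2):
    print(h, gamma, n)
```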
The variogram measures dissimilarity, or increasing variance between points
(decreasing correlation) as a function of distance. In addition to helping us assess
how values at different locations vary over distance, the variogram provides a
way to study the influence of other geologic factors which may affect whether the
spatial correlation varies only with distance (the isotropic case) or with direction
and distance (the anisotropic case).
Because the variogram is the sum of the squared differences of all data pairs
falling within a certain lag distance, divided by twice the number of pairs found for
that lag, we use the variogram to infer the correlation between points indirectly.
That is, rather than showing how alike two points are, or predicting the attribute
value at each point, we plot the average squared difference between values
separated by a given lag distance.
The Experimental Variogram
The variogram described above is known as an experimental variogram. The
experimental variogram is based on the values contained in the data set, and is
computed as a preliminary step in the kriging process. The experimental
variogram serves as a template for the model variogram, which is used to guide
the kriging process.
THE CORRELOGRAM
The correlogram is another measure of spatial dependence. Rather than
measuring dissimilarity, the correlogram is a measure of similarity, or correlation,
versus separation distance.
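$$C(h) = \frac{1}{N(h)} \sum_{i=1}^{N(h)} \left[ Z(x_i) - m \right] \left[ Z(x_i + h) - m \right]$$

Where:
m is the sample mean over all N(h) paired points separated by distance h.
Strictly, C(h) is the covariance; the correlogram is the covariance standardized
by the variance, ρ(h) = C(h)/C(0), so that it equals 1 at h = 0.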
Computing the covariance for increasing lags (double, triple, etc.) allows us to
generate a plot showing decreasing covariance with distance, as shown in
Figure 1a and 1b.

Figure 1a and 1b: Omni-directional variogram (A) and correlogram (B) computed from the same data set.
In this graphic, we see that while the variogram in Frame A measures increasing
variability (dissimilarity) with increasing distance, the correlogram in Frame B
measures decreasing correlation with distance.
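The covariance formula and its standardized form can be sketched for a regularly sampled sequence. Here m is the mean of all values that take part in the N(h) pairs, as in the definition above, and the correlogram is C(h)/C(0). This single-mean version is a simplification; some programs use separate head and tail means and standard deviations instead. The porosity values are hypothetical.

```python
def covariance(seq, h):
    """C(h) = 1/N(h) * sum (z_i - m)(z_{i+h} - m) for a regularly sampled sequence.

    m is the mean of all values that take part in the N(h) pairs.
    """
    head, tail = seq[:len(seq) - h], seq[h:]
    n_pairs = len(head)
    m = (sum(head) + sum(tail)) / (2 * n_pairs)
    return sum((a - m) * (b - m) for a, b in zip(head, tail)) / n_pairs


def correlogram(seq, h):
    """Covariance at lag h standardized by the covariance at lag 0 (the variance)."""
    return covariance(seq, h) / covariance(seq, 0)


# Hypothetical porosity values (%) at a spacing of one distance unit.
porosity = [6.0, 7.0, 9.0, 10.0, 9.0, 8.0, 6.0, 5.0, 6.0, 8.0]
for h in range(4):
    print(h, round(covariance(porosity, h), 3), round(correlogram(porosity, h), 3))
```

At h = 0 the covariance is simply the variance of the data, which is why the correlogram starts at 1.0.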
Anatomy of the correlation model
Notice that the variogram and correlogram plots in Figure 1a and 1b curve in
opposite directions. At the origin, the variogram shows zero variance and the
correlogram shows perfect correlation; both express self-similarity.
The variogram model tends to reach a plateau called a sill (the dashed horizontal
line at the top of Figure 1a). The sill represents the maximum variance (σ²) of the
measured spatial process being modeled. The lag distance at which the variogram
reaches the sill is called the range, which represents the maximum
separation distance at which one data point will be able to correlate with any
other point in the data set. The range plays a role in determining the maximum
separation distance between grids.
The correlogram reaches its range when C(h) = 0. The correlation scale length is
the lag at which the covariance value reaches zero (no correlation).
Working from the same data set, the range for a variogram and correlogram
should be the same for a given set of search parameters. The sill and range are
useful properties when comparing directional trends in the data.
Notice that the correlogram intersects the Y-axis at 1.0, but there is a
discontinuity near zero for the variogram model. The variogram or correlogram
often shows a discontinuity near the origin. Such a discontinuity is called the
nugget effect in geostatistical terminology. The nugget effect is generally
interpreted as a residual variance or spatially independent variability that occurs
at spatial scales below the observational threshold of the sampling, that is,
smaller than the resolution of the sample grid. It can also be caused by random
noise at all scales, or by measurement error.
Variogram Search Strategies
If the sample data have a regular sampling interval, the search strategy is simple.
Unfortunately, point data (well data) rarely form a neat regular array. Therefore,
to extract as much information out of the data as possible, rather than searching
along a simple vector, we search for data within a bin.
Search Parameters
When computing the experimental variogram (covariance model), the search
parameters listed below must be taken into account (Figure 2).

Figure 2: Search strategy along azimuths 45 and 135 degrees.

- Lag: The lag distance is the separation distance, h, between sample
points used in calculating the experimental model.
In a producing field, a good starting distance might be the average well
spacing, rather than the closest well pair spacing. For example, West
Texas fields are commonly drilled on 1320-foot (1/4-mile) spacing.
However, because not all wells will be spaced exactly 1320 feet apart, we
could set the lag at 1400 feet, with a lag tolerance of one-half the lag
interval. Thus, for this example, the first lag bin is from 700
to 2100 feet. (Some programs may set the first bin centered around 350
feet to account for wells spaced closer than 700 feet.) The second lag bin
is from 2100 to 3500 feet, and so forth until the maximum lag distance
specified is reached.
An important consideration in designing a lag strategy involves how we
specify the size of the lag bin. If we decrease the size of the bin, then we
increase the number of bins, and hence the number of points that we plot on
the variogram. This will increase the resolution of the variogram. However,
by decreasing the size of each bin, we also decrease the number of data
pairs within each class, which makes the average semivariance for that class
a less reliable estimate.
- Search Azimuth: Because reservoir data often exhibit directional
properties, we may wish to specify a certain direction for the search
strategy. Such is the case when the continuity of a reservoir property is
more prevalent in one direction than in another. We say this
property exhibits anisotropic behavior. The search azimuth also has
an azimuth tolerance. For example, we may wish to calculate two
directions, at 045 and 135 degrees. By using a ±45-degree tolerance
about each search direction, we will be able to cover all sample
locations in the neighborhood.
- Bandwidth: The bandwidth limits the width of the search cone, measured
perpendicular to the search direction, so that at large lag distances the
azimuth tolerance does not admit pairs lying far from the search direction.
In Figure 2, Point A is compared to Point B. The bandwidth is indicated
by a light dashed line about the azimuthal direction (heavy dashed line) of 45
degrees. Point B lies within one of the search bins designated by the lag
tolerance.
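The three search parameters can be combined into a single accept/reject test for each pair of points. The sketch below rests on several assumptions: x is easting and y is northing, azimuths are measured in degrees clockwise from north, and a direction and its opposite are treated as the same. The bandwidth is the perpendicular distance of the second point from the search direction. These conventions and the function names are illustrative, not those of any particular program. An az_tol of 90 degrees with an unlimited bandwidth gives the omni-directional search described below.

```python
import math


def lag_bins(lag, n_lags, tol=None):
    """(lower, upper) limits of each lag class; tolerance defaults to lag / 2."""
    tol = lag / 2 if tol is None else tol
    return [(k * lag - tol, k * lag + tol) for k in range(1, n_lags + 1)]


def pair_in_search(p, q, lag, n_lags, azimuth, az_tol, bandwidth):
    """Return the lag class (1..n_lags) of the pair p -> q, or None if rejected.

    p, q: (x, y) with x = easting, y = northing.
    azimuth: search direction in degrees clockwise from north.
    az_tol: half-angle of the search cone in degrees (90 = omni-directional).
    bandwidth: maximum perpendicular distance from the search direction.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = math.hypot(dx, dy)
    k = int(d / lag + 0.5)                      # nearest lag class centre
    if not 1 <= k <= n_lags:
        return None
    pair_az = math.degrees(math.atan2(dx, dy)) % 180.0  # direction, sign ignored
    diff = abs(pair_az - azimuth % 180.0)
    diff = min(diff, 180.0 - diff)              # smallest angle between directions
    if diff > az_tol:
        return None
    if d * math.sin(math.radians(diff)) > bandwidth:
        return None
    return k


print(lag_bins(1400, 3))  # [(700.0, 2100.0), (2100.0, 3500.0), (3500.0, 4900.0)]
print(pair_in_search((0, 0), (1000, 1000), 1400, 5, 45, 22.5, 500))   # along 45 degrees
print(pair_in_search((0, 0), (1000, 1000), 1400, 5, 135, 22.5, 500))  # rejected
```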
Omni-directional Experimental Models
There are many ways to design a search strategy, each dictated by the data
configuration and the number of sample points. Quite often, it is necessary to
conduct an omni-directional search, simply because of a sparse data set. (By
calculating an omni-directional variogram, we do not necessarily imply a belief
that the spatial continuity is the same in all directions.) An omni-directional search
is designed by selecting a single azimuthal direction (the angle selected does not
matter) and setting a tolerance of 90 degrees.
The omni-directional model is a good choice for an initial variogram, which can
subsequently be refined by calculating a directional model:
- It is the average of all possible directional variograms.
- It can serve as an early warning for erratic data points.
- If the omni-directional variogram is not able to clearly define the spatial
continuity, then it is unlikely that spatial continuity will be established by
directional variograms.
Anisotropic Experimental Models
Because earth science data are often more continuous in one direction than in
another, we need to design a variogram search strategy to model the maximum
and minimum directions of continuity. In Figure 2, the minimum direction of
continuity might align along the 45-degree azimuth. By definition, the maximum
direction of continuity is orthogonal (90 degrees) to the minimum axis. Note,
however, that this assumption does not always hold true, as the axes may
change direction across the study area (e.g., a meandering channel system), and
it is preferable to have a variogram that conforms to the major axis of anisotropy.
Accounting for Anisotropy
If we base our variogram search along two different azimuths, we often see the
influence of anisotropy. By plotting the results onto a common graph, we produce
two variograms on the same chart. A study of each variogram allows us to further
characterize the nature of the anisotropy.

Figure 3a and 3b: Variograms computed along different azimuths

Figure 3a and 3b show two types of anisotropy:
- Geometric anisotropy (Frame A) is indicated by directional variograms
that have the same sill, but different ranges.
- Zonal anisotropy (Frame B) is seen when variograms have different sills
but the same range.
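One common way to model geometric anisotropy is to rescale the lag vector. The component along the minor axis is stretched by the ratio of the major to the minor range. A single isotropic model with the major range can then be applied. The sketch below assumes a spherical model with no nugget and azimuths in degrees clockwise from north. The ranges and helper names are hypothetical, and this is only one of several possible formulations. Both directions share the same sill but reach it at different distances, which is the signature of geometric anisotropy.

```python
import math


def spherical(h, sill, rng):
    """Spherical variogram model without nugget; equals the sill for h >= rng."""
    if h >= rng:
        return sill
    r = h / rng
    return sill * (1.5 * r - 0.5 * r ** 3)


def anisotropic_distance(dx, dy, major_az, a_major, a_minor):
    """Rescale a lag vector so that an isotropic model with range a_major applies.

    major_az: azimuth of the major axis, degrees clockwise from north.
    """
    t = math.radians(major_az)
    along = dx * math.sin(t) + dy * math.cos(t)    # component along the major axis
    across = dx * math.cos(t) - dy * math.sin(t)   # component along the minor axis
    return math.hypot(along, across * a_major / a_minor)


def lag_vector(length, azimuth):
    """(dx, dy) of a lag of the given length along an azimuth from north."""
    t = math.radians(azimuth)
    return length * math.sin(t), length * math.cos(t)


# Hypothetical model: major axis at 045 with range 3000, minor range 1000, sill 1.
for az in (45, 135):
    dx, dy = lag_vector(1000, az)
    h_red = anisotropic_distance(dx, dy, 45, 3000, 1000)
    print(az, round(spherical(h_red, 1.0, 3000), 3))
```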
Practical Considerations for Computing the Experimental Spatial Variogram
- The omni-directional variogram considers all azimuths simultaneously.
- An omni-directional variogram contains more sample pairs per lag than
any directional variogram, and therefore is more likely to show
structure.
- An omni-directional variogram is the average of all directional variograms.
- The Nugget Effect is more easily determined from the omni-directional
variogram.
- The data may need to be cleaned up prior to calculating the variogram;
variograms are very sensitive to outliers.
- Do not consider variogram values for distances greater than about ½ the
size of the study area.
- Interpret a variogram only if the corresponding number of pairs is sufficient
(e.g., 15 to 20 pairs).
- A saw-toothed pattern may indicate a poor choice of lag increment.
- If data are skewed, consider transforming the data (e.g., to a Gaussian
distribution).
- Consider the effect of data clustering.
- The variogram computation involves a decision of stationarity.
- Non-stationary variograms do not reach a sill and are considered
unbounded (have a characteristic parabolic upward shape).
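A rank-based normal-score transform is one common way to act on the advice about skewed data above. The i-th smallest of n values is mapped to the standard normal quantile of (i - 0.5)/n. This sketch uses Python's `statistics.NormalDist`. Tied values simply receive distinct ranks, which is a simplification, and the permeability values are hypothetical.

```python
from statistics import NormalDist


def normal_score(values):
    """Map values to standard normal scores by rank (ties get distinct ranks).

    The i-th smallest of n values is assigned the quantile of (i - 0.5) / n.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    std_normal = NormalDist()  # mean 0, standard deviation 1
    for rank, i in enumerate(order, start=1):
        scores[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return scores


# Hypothetical, positively skewed permeability values (mD).
perm = [1.2, 3.5, 0.8, 150.0, 12.0, 2.1, 45.0]
print([round(s, 2) for s in normal_score(perm)])
```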

Advantages And Limitations of Variograms And Correlograms
Given a sufficient number of data points, variograms and correlograms provide
useful tools to:
- measure linear spatial dependence
- quantify spatial scales
- identify and quantify anisotropy
- test multiple geological scenarios
However, variograms and correlograms are subject to limitations:
- They are measures of linear spatial interdependence, and may not be
appropriate for non-linear processes.
- Spatial correlation analysis is difficult to perform when data are sparse.
- It is often difficult to select a domain of stationarity (constant mean) for
computation.
SPATIAL CROSS-CORRELATION ANALYSIS
The previous discussion focused on spatial analysis of only a single variable
(e.g., comparing porosity values to other nearby porosity values), a process
known as auto-correlation. To study the spatial relationships between two or
more variables we use the process of cross-correlation.
The cross correlation model is useful when performing cokriging or conditional
cosimulation (e.g., matching well data with seismic data). The cross correlogram
or cross variogram describes spatial correlation in which the paired points
represent different variables. In this case, known values of one variable are
compared to known values of a different variable.
If for example, we need to estimate porosity based on measurements of acoustic
impedance, then it is first necessary to compute the auto-correlation models for
both attributes and then compute the cross-correlation model, as seen in
Figure 1a, 1b and 1c.

Figure 1a, 1b and 1c: Omni-directional variograms of porosity (A), acoustic impedance (B), and their cross variogram (C).
In this graphic, the solid squares on the figures represent the experimental value
averaged over the porosity or acoustic impedance data pairs in each 500-unit lag
interval. The numbers of
data pairs are displayed next to the average experimental data point. The first
point contains only one data pair and should not be taken into consideration
during the modeling step.
Below are the general variogram equations for the primary attribute (porosity), the
secondary attribute (acoustic impedance), and their cross variogram.
Consider the following:
- Z(x_i) = the primary attribute measured at location x_i
- Z(x_i + h) = the primary attribute measured at x_i + some separation distance
(lag), h
- T(x_i) = the secondary attribute measured at location x_i
- T(x_i + h) = the secondary attribute measured at x_i + some separation
distance (lag), h
- N(h) = the number of data pairs separated by lag h
then
- The variogram of the primary attribute is calculated as (Figure 1a):
$$\gamma_Z(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ Z(x_i) - Z(x_i + h) \right]^2$$
- The variogram of the secondary attribute is calculated as (Figure 1b):
$$\gamma_T(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ T(x_i) - T(x_i + h) \right]^2$$
- The cross variogram between the primary and secondary attribute is
calculated as (Figure 1c):
$$\gamma_{ZT}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ Z(x_i) - Z(x_i + h) \right] \left[ T(x_i) - T(x_i + h) \right]$$
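The cross variogram formula can be sketched for the simplest case. Here both attributes are measured at the same, regularly spaced locations, and the lag is a whole number of sample spacings. Passing the same attribute twice gives back the ordinary variogram. The porosity and impedance values are hypothetical. In this example the cross variogram comes out negative because porosity and impedance move in opposite directions.

```python
def cross_variogram(z, t, lag):
    """Experimental cross variogram for two attributes sampled at the same,
    regularly spaced locations (lag in units of the sample spacing).

    gamma_ZT(h) = 1 / (2 N(h)) * sum [Z(x_i) - Z(x_i + h)] [T(x_i) - T(x_i + h)]
    """
    n_pairs = len(z) - lag
    total = sum((z[i] - z[i + lag]) * (t[i] - t[i + lag]) for i in range(n_pairs))
    return total / (2 * n_pairs)


# Hypothetical colocated porosity (%) and scaled acoustic impedance values.
por = [10, 12, 15, 14, 11, 9]
imp = [0.8, 0.7, 0.5, 0.55, 0.75, 0.85]

print(cross_variogram(por, por, 1))   # variogram of porosity
print(cross_variogram(imp, imp, 1))   # variogram of impedance
print(cross_variogram(por, imp, 1))   # cross variogram
```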
SUPPORT EFFECT
Most reservoir studies are concerned with physical rock samples, with
observations corresponding to a portion of rock of finite volume. It is obvious that
once a piece of rock is collected (e.g. cores, hand samples, etc.) from a location,
it is impossible to collect it again from the same location (Henley, 1981).
The shape and volume of the rock are collectively termed the support of the
observation. If the dimensions of the support are very small in comparison to the
sampling area or volume (e.g. the reservoir), the sample can be considered as
point data (Henley, 1981). For much of our work the support does not influence
our mapping, until we compare samples of different sizes or volumes (e.g. core
plugs and whole core, or wireline data and core).
The support effect becomes significant when combining well data and seismic
data. Seismic data cannot be considered point data. The difference in measurement
volume between well logs or cores and seismic data is very large and should not
be ignored. Geostatistical methods can account for the
support effect using a variogram approach. As the support size increases, the
variance decreases until it remains constant after reaching a certain area or
volume (Wackernagel, 1995).
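A small numerical experiment illustrates the first part of this statement, that variance decreases as support size increases. The sketch averages synthetic point values into non-overlapping blocks of increasing size and prints the variance of the block averages. The data are uncorrelated Gaussian values, the simplest possible case. Spatially correlated reservoir data would give a different curve, so the numbers say nothing about where the variance levels off.

```python
import random
import statistics


def block_average(values, block):
    """Average consecutive, non-overlapping groups of `block` values."""
    n_blocks = len(values) // block
    return [statistics.fmean(values[k * block:(k + 1) * block]) for k in range(n_blocks)]


rng = random.Random(7)
point_data = [rng.gauss(20.0, 4.0) for _ in range(1000)]  # hypothetical point-support porosity (%)

for block in (1, 5, 25):
    print(block, round(statistics.pvariance(block_average(point_data, block)), 2))
```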
STATIONARITY IN REGIONALIZED VARIABLES
Stationarity ensures that the spatial correlation can be modeled with a positive
definite function. It states that the expected value, which may be considered the
mean of the data values, is constant: it does not depend on location or on the
distance separating the data points. Mathematically, this assumption means that
the expected value of the difference between two random variables is zero. This
is denoted by the following equation:
$$E\left[Z(x_i + h) - Z(x_i)\right] = 0 \quad \text{for all } x_i, h$$

Where:
Z(x_i), Z(x_i + h) = random variables
E = expected value
x_i = sampled location
h = distance between sampled locations
Stationarity is defined through the first-order (mean) and second-order
(variability) moments of the observed random function, and degrees of
stationarity correspond to the particular moments that remain invariant across the
study area (Hohn, 1988). For a random variable Z(x), observed at location x, the
distribution function of Z(x) has the expectation

$$E\left[Z(x)\right] = m(x)$$

which can depend upon x. This is the first-order moment.
Three second-order moments are useful in geostatistics:
1. The variance of the random variable Z(x):
$$\mathrm{Var}\left[Z(x)\right] = E\left\{\left[Z(x) - m(x)\right]^2\right\}$$
2. The covariance:
$$C(x_1 - x_2) = E\left\{\left[Z(x_1) - m(x_1)\right]\left[Z(x_2) - m(x_2)\right]\right\}$$
where Z(x_1) and Z(x_2) are two random variables observed at locations x_1 and
x_2.
3. The semivariogram function:
$$\gamma(x_1, x_2) = \frac{1}{2}\,\mathrm{Var}\left[Z(x_1) - Z(x_2)\right]$$
Under conditions of second-order stationarity, the semivariogram and covariance
are alternative measures of spatial autocorrelation (Hohn, 1988; Isaaks and
Srivastava, 1989; Henley, 1981; Wackernagel, 1995).
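The link between the two measures can be shown in a short derivation. Under second-order stationarity the variance is constant, Var[Z(x)] = C(0), and the covariance depends only on the lag h. Expanding the semivariogram definition above then gives the standard relation below. This relation is why the sill of a bounded variogram equals the variance of the data.

```latex
\begin{aligned}
\gamma(h) &= \tfrac{1}{2}\,\mathrm{Var}\left[Z(x+h) - Z(x)\right] \\
          &= \tfrac{1}{2}\left[\mathrm{Var}\,Z(x+h) + \mathrm{Var}\,Z(x) - 2\,C(h)\right] \\
          &= \tfrac{1}{2}\left[C(0) + C(0) - 2\,C(h)\right] = C(0) - C(h)
\end{aligned}
```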
HANDLING NON-STATIONARY DATA
An ongoing debate centers on how to handle non-stationary data. Some argue
that stationarity is a matter of scale (Hohn, 1988; Isaaks and Srivastava, 1989;
Henley, 1981; Wackernagel, 1995). A variable of interest may have a trend
across an area, and so it would be deemed non-stationary. However, others
contend that quasi-stationarity (local stationarity) can be assumed if the
maximum distance h used in computing the semivariogram or covariance is
much smaller than the scale of the trend (Hohn, 1988).
The impact of non-stationarity depends in part on the sampling scale in relation to
the scale of the trend. With sufficient sampling, stationarity can be achieved.
Unfortunately, the geoscientist seldom has total control over the sample
distribution.
If a regionalized variable is non-stationary, it can be regarded as a composite of
two parts, the residual and the trend.
$$Z(x) = Y(x) + m(x)$$

Where:
Y(x) = the residual, which has an underlying variogram
m(x) = the trend, which can be approximated by a polynomial
Note: Under stationarity, the bivariate probability law [the joint probability of the
values at Z(x_i) and at Z(x_i + h)] does not depend on the location x_i, but only
on the separation distance, h (Isaaks and Srivastava, 1989).
In practice, it is possible to ignore the trend if the data set contains a “large”
number of data points. However, if the data are sparse, the variogram of the
residuals should be computed, which means detrending the data.
HANDLING TRENDS
The following approach is often used to “detrend” the data. However, this
approach also has its problems.
1. “Stationarize” the data.
a. determine the trend on the sample data (trend surface analysis)
b. subtract the trend from the data (usually from the well data)
2. Compute the variogram (correlogram) on the residuals.
3. Obtain kriging or conditional simulation of the residuals on the grid.
4. Krig the trend to the grid.
5. Calculate the final gridded results by adding residuals to the trend.
Although this is a reasonable approach, the main hurdle is the “correct”
determination of the trend to remove from the raw data. In addition, the trend
removed by this method is the global trend, but perhaps we should be working at
a local scale (e.g., neighborhood scale).
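Steps 1a and 1b of the approach above can be sketched with a first-order (planar) trend surface, z = a + b·x + c·y, fitted by ordinary least squares. The residuals are what step 2 would use to compute the variogram. Several choices here are assumptions: the planar trend, the pure-Python solver and the hypothetical well tops on a 1320-foot grid. Steps 3 to 5 (kriging the residuals and the trend, then adding them back) are not shown.

```python
def fit_plane(points):
    """Least-squares first-order trend surface z = a + b*x + c*y.

    points: list of (x, y, z). Returns (a, b, c).
    """
    # Normal equations (A^T A) beta = A^T z for design-matrix rows [1, x, y].
    ata = [[0.0] * 3 for _ in range(3)]
    atz = [0.0] * 3
    for x, y, z in points:
        row = (1.0, x, y)
        for r in range(3):
            atz[r] += row[r] * z
            for c in range(3):
                ata[r][c] += row[r] * row[c]
    # Solve the 3 x 3 system by Gaussian elimination with partial pivoting.
    m = [ata[r] + [atz[r]] for r in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * 3
    for r in (2, 1, 0):
        beta[r] = (m[r][3] - sum(m[r][c] * beta[c] for c in range(r + 1, 3))) / m[r][r]
    return tuple(beta)


def detrend(points):
    """Steps 1a-1b: return the fitted trend and the residuals z - trend(x, y)."""
    a, b, c = fit_plane(points)
    residuals = [(x, y, z - (a + b * x + c * y)) for x, y, z in points]
    return (a, b, c), residuals


# Hypothetical well tops (feet TVDss) on a dipping surface with local deviations.
coords = [(x, y) for x in (0.0, 1320.0, 2640.0) for y in (0.0, 1320.0, 2640.0)]
deviations = [3.0, -2.0, 0.5, -1.0, 4.0, -3.5, 1.0, -0.5, -1.5]
wells = [(x, y, 975.0 + 0.01 * x + 0.002 * y + e) for (x, y), e in zip(coords, deviations)]

trend, residuals = detrend(wells)
print("trend coefficients:", [round(v, 5) for v in trend])
print("residuals:", [round(r[2], 2) for r in residuals])
```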
MODEL VARIOGRAMS
The experimental variogram and correlogram described in the previous section
are calculated only along specific inter-distance vectors, corresponding to
angular/distance classes. After computing the experimental variogram, the next
step is to define a model variogram. This variogram is a simple mathematical
function that models the trend in the experimental variogram. In turn, this
mathematical model of the variogram is used in kriging computations.
The kriging and conditional simulation processes require a model of spatial
dependency, because:
- Kriging requires knowledge of the correlation function for all possible
distances and azimuths.
- The model smooths the experimental statistics and introduces geological
information.
- The experimental directional covariances cannot be fitted independently of
one another; kriging depends upon a single model drawn from a limited class of
acceptable (positive definite) functions.
Consider a random function $Z(x)$ with an auto-covariance $C(h)$:
- Define an estimator, $Z^* = \sum_i \lambda_i Z(x_i)$
- The variance of $Z^*$ is given by: $\sigma^2_{Z^*} = \sum_i \sum_j \lambda_i \lambda_j C(x_i - x_j) > 0$
- The variance must be positive (positive definiteness criterion)
for any choice of weights ($\lambda_i$ and $\lambda_j$),
and any choice of locations ($x_i$ and $x_j$)
- To honor the above inequality, the experimental covariance must be
fit with a positive definite model $C(h)$.
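The sketch below (Python with NumPy) makes the positive definiteness criterion concrete by evaluating $\sigma^2 = \sum_i \sum_j \lambda_i \lambda_j C(x_i - x_j)$ for three collinear points. The exponential covariance is a permissible model. The step ("boxcar") function is a hypothetical curve, chosen only because a function of this shape could plausibly be drawn through experimental points; it is not positive definite, and for the weights chosen here it returns a negative variance.

```python
import numpy as np

def exponential_cov(h, sill=1.0, a=1.0):
    """Exponential covariance: a permissible (positive definite) model."""
    return sill * np.exp(-h / a)

def boxcar_cov(h, sill=1.0, a=1.0):
    """Hypothetical step function (sill inside distance a, zero beyond): NOT permissible."""
    return np.where(h < a, sill, 0.0)

def estimator_variance(cov_func, coords, weights):
    """sigma^2 = sum_i sum_j lambda_i * lambda_j * C(x_i - x_j) for 1-D locations."""
    h = np.abs(coords[:, None] - coords[None, :])  # pairwise separation distances
    C = cov_func(h)
    return float(weights @ C @ weights)

coords = np.array([0.0, 0.6, 1.2])             # three collinear sample locations
weights = np.array([1.0, -np.sqrt(2.0), 1.0])  # one particular choice of weights

var_exp = estimator_variance(exponential_cov, coords, weights)
var_box = estimator_variance(boxcar_cov, coords, weights)
print(var_exp)  # about 1.498: a valid, positive variance
print(var_box)  # about -1.657: a negative "variance", which is impossible
```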
Spatial modeling is not curve fitting in the least squares sense. An arbitrary
curve fitted by least squares is not guaranteed to satisfy the positive
definiteness criterion. The shape of the experimental variogram usually
constrains the type of model selected; although any permissible model can be
applied, the choice affects the final kriged results.
THE MODEL CHOICE
The important characteristics of the spatial model are its behavior near the origin
and behavior at large distances from the origin.
BEHAVIOR NEAR THE ORIGIN
The behavior near the origin affects the short scale variability of the final map
plot.
Figures 1a, 1b, 1c, and 1d: Variogram behavior near the origin.
Frame A shows purely random behavior. Frame B is linear, with some degree of
random component. Frame C is highly continuous, while Frame D exhibits linear
behavior.
BEHAVIOR AT LARGE DISTANCES
At large distances from the origin, the variogram can be described as either
bounded or unbounded. Figures 2a and 2b (Variogram behavior at large distances
after reaching the variance of the data) show an example of each type of
behavior.
In this example, the bounded variogram (Frame A) reaches a sill and remains at
the sill value for all greater distances. This behavior is typical of the classic
variogram. The unbounded variogram (Frame B), by contrast, never plateaus at a
sill, but shows a continuous increase in variance with increasing distance.
Variograms displaying this characteristic are typical of data that possess a trend.
BASIC COVARIANCE FUNCTIONS
Basic covariance functions for modeling variograms have the following
characteristics:
- simple, isotropic functions (independent of direction)
- correlograms are equal to 1 at h = 0, while variograms are equal to 0 at h = 0
- variance reaches or approaches the sill beyond a certain distance, a
(the range or correlation length)
Figures 3a, 3b, 3c, 3d, 3e, 3f, and 3g (Common variogram models) show a variety of models.
The Spherical model is the most commonly used model, followed by the
Exponential. The Gaussian and Exponential functions reach the sill
asymptotically. For such functions, the range is arbitrarily defined as the distance
at which the covariance (or correlogram) decreases to 5% of its value at the
origin, that is, where the variogram reaches 95% of the sill. The Nested model is a
linear combination of two spherical structures, having short- and long-scale
components. The “Hole” model is used for variograms computed from data that
have a repeating pattern. The Hole model can be dangerous because it imposes its
periodicity on the map, even where that periodicity is not actually present in the
data. There does not appear to be any relationship between depositional
environment and variogram shape.
Below are equations for four common variogram models.

LINEAR MODEL
The linear model describes a straight line variogram. This model has no sill, so
the range is defined arbitrarily to be the distance interval for the last lag class in
the variogram. (Since the range is an arbitrary value, it should not be compared
directly with ranges of other models.) This model is described by the following
formula:
$$\gamma(h) = C_0 + h\left(\frac{C}{A_0}\right)$$
where
h = lag interval,
$C_0$ = nugget variance $\geq 0$,
C = structural variance $> C_0$, and
$A_0$ = range parameter
SPHERICAL MODEL
The spherical model is a modified cubic function where the range marks the
distance at which pairs of points are no longer autocorrelated and the
semivariogram reaches its sill. This model is described by the following
formula:
$$\gamma(h) = C_0 + C\left[1.5\left(\frac{h}{A_0}\right) - 0.5\left(\frac{h}{A_0}\right)^3\right] \quad \text{for } h \leq A_0$$
$$\gamma(h) = C_0 + C \quad \text{for } h > A_0$$
where
h = the lag distance interval,
$C_0$ = nugget variance $\geq 0$,
C = structural variance $> C_0$, and
$A_0$ = range
EXPONENTIAL MODEL
This model is similar to the spherical variogram in that it approaches the sill
gradually, but differs in the rate at which the sill is approached and in the fact that
the model and the sill never actually converge. This model is described by the
following formula:
$$\gamma(h) = C_0 + C\left[1 - \exp\left(-\frac{h}{A_0}\right)\right]$$
where
h = lag interval,
$C_0$ = nugget variance $\geq 0$,
C = structural variance $> C_0$, and
$A_0$ = range parameter
In the exponential model, $A_0$ is a parameter used to derive the range rather than
the range itself. The range is usually taken to be the distance at which the model
approaches 95% of the sill ($C + C_0$), and is estimated as $3A_0$, because
$1 - e^{-3} \approx 0.95$.
GAUSSIAN MODEL
The gaussian or hyperbolic model is similar to the exponential model but
rises gradually (parabolically) near the origin. This model is described by the
following formula:
$$\gamma(h) = C_0 + C\left[1 - \exp\left(-\frac{h^2}{A_0^2}\right)\right]$$
where
h = lag interval,
$C_0$ = nugget variance $\geq 0$,
C = structural variance $> C_0$, and
$A_0$ = range parameter
The range parameter $A_0$ in this model is simply a constant; the range itself is
defined as the point at which 95% of the sill is approached. The range can be
estimated as $1.73A_0$ (1.73 is the square root of 3, and $1 - e^{-3} \approx 0.95$).
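For reference, the four formulas above can be written as NumPy functions. This is a minimal sketch; the function names and parameter values are illustrative, and, like the formulas as written, each function returns $C_0$ at h = 0 (many programs instead set $\gamma(0) = 0$ and treat the nugget as a discontinuity just beyond the origin). The last lines check the practical-range conventions: the exponential model at $3A_0$ and the Gaussian model at $\sqrt{3}A_0$ both reach $1 - e^{-3} \approx 95\%$ of the structural variance, while the spherical model reaches the full sill at $A_0$.

```python
import numpy as np

# Model variograms as written above: c0 = nugget C0, c = structural variance C,
# a0 = range (spherical) or range parameter A0 (linear, exponential, gaussian).

def linear(h, c0, c, a0):
    """Linear model: gamma(h) = C0 + h * (C / A0); it has no sill."""
    return c0 + h * (c / a0)

def spherical(h, c0, c, a0):
    """Spherical model: reaches the sill C0 + C exactly at h = A0."""
    r = np.minimum(np.asarray(h, dtype=float) / a0, 1.0)  # bracket equals 1 beyond A0
    return c0 + c * (1.5 * r - 0.5 * r**3)

def exponential(h, c0, c, a0):
    """Exponential model: approaches the sill asymptotically."""
    return c0 + c * (1.0 - np.exp(-np.asarray(h, dtype=float) / a0))

def gaussian(h, c0, c, a0):
    """Gaussian model: parabolic near the origin, asymptotic at large h."""
    h = np.asarray(h, dtype=float)
    return c0 + c * (1.0 - np.exp(-(h**2) / a0**2))

c0, c, a0 = 0.1, 0.9, 1000.0  # illustrative parameters

# Fraction of the structural variance C reached at the quoted practical ranges
frac_exp = (exponential(3 * a0, c0, c, a0) - c0) / c          # about 0.950
frac_gau = (gaussian(np.sqrt(3) * a0, c0, c, a0) - c0) / c    # about 0.950
frac_sph = (spherical(a0, c0, c, a0) - c0) / c                # 1.0 (the full sill)
print(frac_exp, frac_gau, frac_sph)
```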
Practical Guidelines For Variogram Modeling
- Do not overfit; use the simplest model (fewest number of structures).
- Do not fit the covariance for distances greater than ½ the study area.
- Pay special attention to the fit for small distances and the size of the
nugget effect.
  - The nugget acts as a smoothing function during kriging.
  - The nugget adds variability during conditional simulation.
- Beware of features that may relate to non-stationarity.
- Beware of periodic oscillations.
  - Is the “hole” effect related to true structure?
  - Is the “hole” effect due to sparse data at given lag intervals?
- Compute and fit the covariance along the directions of maximum and
minimum continuity, using a single structure if possible.
KRIGING OVERVIEW
INTRODUCTION
Contouring maps by hand or by computer requires the use of some type of
interpolation procedure. As previously shown in the section on Gridding and
Interpolation, there are a number of algorithms for computer-based interpolation.
At the other end of the spectrum is the geologist who maps by hand, interpolating
between data points (or extrapolating beyond the control points), drawing
contours, smoothing the map to make it look “real” and perhaps biasing the map
with a trend based on geological experience (Hohn, 1988). This section provides
a broad overview of the computer-intensive interpolation process which lies at the
heart of geostatistical modeling.
LINEAR ESTIMATION
Kriging is a geostatistical technique for estimating attribute values at a point, over
an area, or within a volume. It is often used to interpolate grid node values in
mapping and contouring applications. In theory, no other linear interpolation
process can produce better estimates (being unbiased, with minimum error
variance), though the effectiveness of the technique actually depends on
accurately modeling the variogram. The accuracy of kriging estimates is driven by
the use of variogram models to express autocorrelation relationships between
control points in the data set. Kriging also produces a variance estimate for its
interpolation values.
The technique was first used for the estimation of gold ore grade and reserves in
South Africa (hence the origin of the term Nugget Effect), and it is named in
honor of a South African mining engineer, Danie Krige. Its mathematical validity
and foundation were developed by Georges Matheron, who later founded the
Centre de Géostatistique, as part of the École des Mines in Paris, France.
(Henley, 1981; Hohn, 1988; Journel, 1989; Isaaks and Srivastava, 1989; Deutsch
and Journel, 1992; Wackernagel, 1995).
KRIGING FEATURES
Kriging is a highly accurate estimation process which:
- minimizes the estimation error variance (the expected squared difference
between the true value and the estimated value)
- honors “hard” data
- does not introduce an estimation bias
- does not reproduce inter-well variability
- produces a “smoothed” result, like all interpolators
- is a univariate estimator, requiring only one covariance model
- weights control points according to a spatial model (variogram)
- tends to the mean value when control data are sparse
- uses a spatial correlation model to determine the weights (λ)
- assigns negative or null weights to control points outside the correlation
range of the spatial model
- indicates the global relative reliability of the estimate through RMS error
(kriging variance), as a by-product of kriging
- has a general and easily reformulated kriging matrix, making it a very
flexible technique for using more than one variable
- declusters the data as part of the estimation (clustered, redundant control
points receive smaller weights)
Types Of Kriging
There are a number of kriging algorithms, and each is distinguished by how the
mean value is determined and used during the estimation process. The four most
commonly used methods are:
- Simple Kriging: The global mean is known (or supplied by the user), and
is held constant over the entire area of interpolation.
- Ordinary Kriging: The local mean varies, and is re-estimated based on
the control points in the current search neighborhood ellipse.
- Kriging with an External Drift: Although this method uses two variables,
only one covariance model is required, and the shape of the map is
related to a 2-D attribute which guides the interpolation of the primary
attribute known only at discrete locations. A typical application is time-
to-depth conversion, where the primary attribute (such as depth at the
wells) acquires its shape from the secondary attribute, referred to as
external drift (such as two-way travel time known on a 2-D grid).
- Indicator Kriging: estimates the probability of an attribute at each grid
node (e.g., lithology, productivity). The technique requires the following
parameters:
  - Coding of the attribute in binary form, as 0 or 1.
  - Prior probabilities of both classes.
  - Spatial covariance model of the indicator variable.
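The first two indicator kriging requirements can be sketched in a few lines of Python. This sketch assumes a continuous attribute turned into an indicator with a single, hypothetical cutoff, and it uses simple sample proportions as the prior probabilities (declustered proportions may be preferable when the wells are clustered). The third requirement, the indicator covariance model, would then be computed and modeled from `ind` like any other variable.

```python
import numpy as np

def indicator_code(values, threshold):
    """Binary coding for indicator kriging: 1 where value <= threshold, else 0."""
    return (np.asarray(values) <= threshold).astype(int)

# Hypothetical porosity values (fractions) at six wells, and a hypothetical cutoff
porosity = np.array([0.04, 0.11, 0.07, 0.13, 0.09, 0.05])
ind = indicator_code(porosity, threshold=0.08)   # -> [1, 0, 1, 0, 0, 1]

# Prior probabilities of the two classes, taken here as sample proportions
prior_1 = ind.mean()        # class coded 1 (porosity <= cutoff): 0.5
prior_0 = 1.0 - prior_1     # class coded 0: 0.5
```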
The Kriging Process
We will illustrate the estimation process with an example problem, as shown in
Figure 1 (Arrangement of three data points).
Given samples $Z_\alpha$ located at $x_\alpha$, where α = 1, 2, 3, find the most likely value of the
variable Z at the target point (grid node $Z_0^*$, Figure 1). In this graphic, we see the
geometrical arrangement of the three data points $Z_\alpha$, the location of the point whose
value we wish to estimate, $Z_0^*$, and the unknown weights, $\lambda_\alpha$.
- Consider $Z_0^*$ as a linear combination of the data $Z_\alpha$:
  $$Z_0^* = \lambda_0 + \sum_\alpha \lambda_\alpha Z_\alpha$$
  where $\lambda_0 = m_Z\left(1 - \sum_\alpha \lambda_\alpha\right)$; when the weights are constrained so that
  $\sum_\alpha \lambda_\alpha = 1$ (ordinary kriging), $\lambda_0$ vanishes.
- Determine $\lambda_\alpha$ so that, with $Z_0$ the true (unknown) value at the target:
  - $Z_0^*$ is unbiased: $E[Z_0^* - Z_0] = 0$
  - $Z_0^*$ has minimum mean square error (MSE): $E\left[(Z_0^* - Z_0)^2\right]$ is minimum
Recall that the unknown value $Z_0^*$ is estimated by a linear combination of n data
points plus a shift parameter $\lambda_0$:
$$Z_0^* = \lambda_0 + \sum_\alpha \lambda_\alpha Z_\alpha \qquad (1)$$

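By transforming the above equation into a set of linear normal equations, we
solve the following to obtain the weights $\lambda_\alpha$. For ordinary kriging, the set of linear
equations takes the following form:
$$\sum_j \lambda_j C(x_\alpha, x_j) + \mu = C(x_\alpha, x_0) \quad \text{for all } \alpha = 1, \ldots, n, \qquad \sum_j \lambda_j = 1 \qquad (2)$$
or in matrix shorthand notation:
$$\mathbf{C}\,\mathbf{A} = \mathbf{c} \qquad (3)$$
All three terms are matrices where:
- $\mathbf{C}$ holds the covariances $C(x_\alpha, x_j)$ between sample points $x_\alpha$ and $x_j$
(bordered by a row and column of ones for the constraint on the weights)
- $\mathbf{c}$ holds the covariances $C(x_\alpha, x_0)$ between a sample located at $x_\alpha$ and the
target point $x_0$, the estimated point (plus a final 1)
- $\mathbf{A}$ holds the unknown weights, $\lambda_j$, together with μ
- μ is a Lagrange multiplier that converts a constrained minimization problem
into an unconstrained minimization.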
Determine the matrix of unknown weights by solving the matrix equation for A as
follows:
$$\mathbf{C}\,\mathbf{A} = \mathbf{c} \qquad (4)$$
Where
$$\mathbf{A} = \mathbf{C}^{-1}\mathbf{c} \qquad (5)$$
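Written out for the three-point example of Figure 1, the ordinary kriging system becomes the bordered matrix below, where $C_{\alpha j} = C(x_\alpha, x_j)$ and $C_{\alpha 0} = C(x_\alpha, x_0)$. This is a sketch of the ordinary kriging form only: the last row enforces $\sum_\alpha \lambda_\alpha = 1$, and μ is written with a plus sign in the first three rows (texts that use the opposite sign convention obtain the same weights and a μ of opposite sign).

$$
\begin{bmatrix}
C_{11} & C_{12} & C_{13} & 1 \\
C_{21} & C_{22} & C_{23} & 1 \\
C_{31} & C_{32} & C_{33} & 1 \\
1 & 1 & 1 & 0
\end{bmatrix}
\begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \\ \mu \end{bmatrix}
=
\begin{bmatrix} C_{10} \\ C_{20} \\ C_{30} \\ 1 \end{bmatrix}
$$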
Note that equation 3 is written in terms of covariance values, even though we
modeled a variogram or correlogram, not the covariance. We use covariance
values because they are computationally more efficient.
The covariance equals the sill minus the variogram (Figure 2: Relationship
between a spherical variogram and its covariance equivalent):
$$C(h) = \sigma^2\,(\text{sill}) - \gamma(h) \qquad (6)$$
Kriging Variance
In addition to estimating the value of a variable at an unsampled location, the
kriging technique also provides an estimate of the likely error (in the form of
error variance) at every grid node. These error estimates can be mapped to give
an assessment of the relative reliability of the contoured surface. Because the
kriging variances are determined independently of the data values (they depend
only on the data configuration and the variogram model), the kriging error is not a
measure of local reliability (Deutsch and Journel, 1992). Do not attempt to use the
kriging standard deviation like the true classical standard deviation statistic.
The kriging variance equation is:
$$\sigma^2_k = C(x_0, x_0) - \sum_\alpha \lambda_\alpha C(x_\alpha, x_0) - \mu \qquad (7)$$
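The complete ordinary kriging calculation can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions: a spherical variogram with no nugget (sill 1, range 1000 m) converted to covariance through equation 6, hypothetical coordinates and porosity values for three data points, and μ entering the system with a plus sign, so that the kriging variance is $C(x_0, x_0) - \sum_\alpha \lambda_\alpha C(x_\alpha, x_0) - \mu$. The function names are illustrative.

```python
import numpy as np

def spherical_cov(h, sill=1.0, a0=1000.0):
    """Equation 6: C(h) = sill - gamma(h) for a spherical variogram with no nugget."""
    r = np.minimum(np.asarray(h, dtype=float) / a0, 1.0)
    gamma = sill * (1.5 * r - 0.5 * r**3)
    return sill - gamma

def ordinary_kriging(coords, values, target, cov=spherical_cov):
    """Return (estimate, weights, mu, kriging variance) at one target location."""
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)  # data-data
    d0 = np.linalg.norm(coords - target, axis=-1)                          # data-target
    lhs = np.ones((n + 1, n + 1))   # bordered matrix: last row/column enforce sum = 1
    lhs[:n, :n] = cov(d)
    lhs[n, n] = 0.0
    rhs = np.append(cov(d0), 1.0)
    sol = np.linalg.solve(lhs, rhs)          # A = C^-1 c
    weights, mu = sol[:n], sol[n]
    estimate = weights @ values              # lambda_0 = 0 because the weights sum to 1
    variance = cov(0.0) - weights @ cov(d0) - mu   # kriging variance
    return estimate, weights, mu, float(variance)

# Hypothetical three-point configuration (meters) and porosity values (%)
coords = np.array([[0.0, 0.0], [400.0, 0.0], [0.0, 600.0]])
values = np.array([10.0, 14.0, 12.0])

est, w, mu, var = ordinary_kriging(coords, values, np.array([200.0, 200.0]))
print(est, w, w.sum(), var)

# Kriging is an exact interpolator: at a data location it returns the datum itself
est_at_1, _, _, var_at_1 = ordinary_kriging(coords, values, coords[0])
print(est_at_1, var_at_1)   # 10.0 (to rounding) and approximately 0
```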
Search Neighborhood Criterion
All interpolation algorithms require good data selection criteria, specified by a
search neighborhood. The model variogram plays a role in controlling the extent of
the neighborhood. The variogram range defines the maximum size of the
neighborhood from which control points should be selected for estimating a grid
node, in order to take advantage of the statistical correlation among the
observations.
A typical geostatistical routine might interpolate values for a specific location
using nearest neighbor values weighted by distance and the degree of
autocorrelation present for that distance (as defined by the variogram model).
The neighborhood searches would be limited to a specified number of nearest
neighbors, and might also be restricted to a particular direction.
Search neighborhood parameters include:
- Search radius
- Neighborhood shape (isotropic or anisotropic: Figures 3a and 3b)
- Number of sectors (4 or 8 are common)
- Number of data points per sector
  - Unique Neighborhoods use all data points (practical limit is 100)
  - Moving Neighborhoods use a limited number of points per sector (e.g., 4)
- Azimuth of major axis of anisotropy
In this graphic, note the elliptical shape of the anisotropic search neighborhood
and the circular shape of the isotropic neighborhood. Both neighborhoods are
divided into octants, with a maximum of two data points per sector.
The radii of the anisotropic neighborhood are: minor axis = 1000 meters and
major axis = 4000 meters, aligned at N15E. The isotropic neighborhood has a
1500-meter radius. The center of the neighborhood is the target grid node for
estimation. There are 55 sample points (x) within the study area. Weights are
shown for the data control points used for the estimation at the target point.
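The anisotropic search in this graphic can be sketched in Python with NumPy. This is a minimal sketch, assuming x is easting, y is northing, azimuth is measured clockwise from north, and sectors are defined in map coordinates (software packages differ on whether sectors rotate with the ellipse); the function name and data points are hypothetical. It uses the graphic's parameters: major radius 4000 m, minor radius 1000 m, azimuth N15E, octants, and at most two points per sector.

```python
import numpy as np

def select_neighbors(coords, target, azimuth_deg, major, minor,
                     n_sectors=8, max_per_sector=2):
    """Return sorted indices of the data points kept for one target grid node.

    x = easting, y = northing; azimuth is measured clockwise from north.
    """
    dx = coords[:, 0] - target[0]
    dy = coords[:, 1] - target[1]
    theta = np.radians(azimuth_deg)
    along = dx * np.sin(theta) + dy * np.cos(theta)    # component along the major axis
    across = dx * np.cos(theta) - dy * np.sin(theta)   # component along the minor axis
    norm_dist = np.hypot(along / major, across / minor)  # <= 1 means inside the ellipse
    angle = np.mod(np.arctan2(dy, dx), 2 * np.pi)        # direction in map coordinates
    sector = np.minimum((angle // (2 * np.pi / n_sectors)).astype(int), n_sectors - 1)
    chosen = []
    for s in range(n_sectors):
        idx = np.where((sector == s) & (norm_dist <= 1.0))[0]
        idx = idx[np.argsort(norm_dist[idx])][:max_per_sector]  # closest points first
        chosen.extend(idx.tolist())
    return sorted(chosen)

# Hypothetical data around a target node at the origin (meters)
coords = np.array([
    [776.5, 2897.8],   # ~3000 m toward N15E: inside (normalized distance 0.75)
    [2000.0, 0.0],     # 2000 m due east: outside (minor radius is only 1000 m)
    [0.0, -500.0],     # 500 m due south: inside
    [100.0, 1000.0],   # inside, same octant as point 0 but closer to the target
    [150.0, 1500.0],   # inside, same octant again, also closer than point 0
])
selected = select_neighbors(coords, np.array([0.0, 0.0]), 15.0, 4000.0, 1000.0)
print(selected)   # [2, 3, 4]: point 1 is outside; point 0 loses out in a full octant
```

Normalizing distances by the ellipse radii lets a single test (`norm_dist <= 1`) handle both the isotropic case (equal radii) and the anisotropic case.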
Practical Considerations in Designing the Search Neighborhood
- Align the search axis with the direction of maximum anisotropy.
- Search radii (if anisotropic) should be proportional to the correlation ranges.
- Each quadrant should have enough points (~ 4) to avoid directional
sampling bias.
- CPU time and memory requirements grow rapidly as a function of the
number of data points in neighborhood.
- In theory, more data in the kriging system reduces the mean square error.
- In practice, the covariance is poorly known for distances exceeding about
½ to 2/3 the size of the field. Including points that are more distant may
actually increase the error.
- The kriging estimator is built from data within the search neighborhood
centered at the target location $Z_0^*$.
Practical Considerations: Unique versus Moving Search Neighborhoods
- In a moving search neighborhood, a new simple kriging (SK) or ordinary
kriging (OK) system of equations is solved at each grid node.
- A unique search neighborhood uses all the data, so the left side of the
kriging matrix, C, is solved only once and used at each grid node.
- If sufficient wells are available for ordinary kriging, then a moving
neighborhood is preferable to the unique neighborhood.
- Unique neighborhoods tend to prevent artifacts from abrupt changes in the
number and values of the data points.
- Unique neighborhoods smooth the data more than moving neighborhoods.
Practical Considerations: Ordinary (OK) versus Simple Kriging (SK)
- Simple kriging does not adapt to local trends; rather it relies on a constant,
global mean.
- Ordinary kriging uses a local mean ($m_Z$), which amounts to re-estimating
$m_Z$ at each grid node from the data within the search neighborhood.
- When all data points are used (unique neighborhood), ordinary kriging and
simple kriging yield similar results.
- If only a few data points are available in the local search neighborhood,
ordinary kriging may produce spurious weights because of the
constraint that the weights must sum to 1.
- If the wells are known to provide a biased sampling, it may be better to
impose your own $m_Z$ with simple kriging rather than use ordinary
kriging.
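The contrast between the two estimators can be sketched in Python with NumPy. This minimal illustration assumes a spherical covariance (sill 1, range 1000 m), hypothetical wells spaced beyond the range of one another, and a user-supplied global mean of 10; the target node lies beyond the range of every well. Simple kriging then falls back to the imposed global mean, while ordinary kriging, whose weights must sum to 1, returns an average of the local data.

```python
import numpy as np

def spherical_cov(h, sill=1.0, a0=1000.0):
    """Spherical covariance; exactly zero beyond the range a0."""
    r = np.minimum(np.asarray(h, dtype=float) / a0, 1.0)
    return sill * (1.0 - 1.5 * r + 0.5 * r**3)

def distances(coords, target):
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - target, axis=-1)
    return d, d0

def simple_kriging(coords, values, target, mean, cov=spherical_cov):
    """SK: weights from C w = c; the known global mean absorbs the remaining weight."""
    d, d0 = distances(coords, target)
    w = np.linalg.solve(cov(d), cov(d0))
    return mean + w @ (values - mean)

def ordinary_kriging(coords, values, target, cov=spherical_cov):
    """OK: bordered system forcing the weights to sum to 1 (local mean re-estimated)."""
    d, d0 = distances(coords, target)
    n = len(values)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = cov(d)
    lhs[n, n] = 0.0
    sol = np.linalg.solve(lhs, np.append(cov(d0), 1.0))
    return sol[:n] @ values

coords = np.array([[0.0, 0.0], [2000.0, 0.0], [0.0, 2000.0]])  # hypothetical wells
values = np.array([8.0, 12.0, 16.0])
target = np.array([5000.0, 5000.0])   # beyond the range of every well

print(simple_kriging(coords, values, target, mean=10.0))  # 10.0: the global mean
print(ordinary_kriging(coords, values, target))           # about 12.0: local average
```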
Effects of Variogram Parameters on Kriging
Kriging applies weighting functions according to a mathematical model of the
variogram. The resulting kriging estimates are best linear unbiased estimates of
the surface, provided that the surface is stationary and the correct form of the
variogram has been determined. As the shape of the model variogram changes,
so do the kriging results.
- Rescaling the variogram or correlogram (to create a larger or smaller sill):
  - Has no effect on the kriging estimate
  - Changes the kriging variance
- Increasing the nugget component:
  - Acts as a smoothing term during kriging (makes weights more similar)
- Increasing the range tends to increase the influence of more distant data
points and leads to smoother maps.
- The shape of the variogram or correlogram near the origin influences the
continuity of the interpolation process (e.g., the gentler the slope, the
smoother the interpolation). See Figures 4a, 4b, 4c, 4d, 4e, 4f, 4g, 4h, and 4i
(Kriging results from a common data set, based on different variogram models).
In this graphic, Frames A-H use the isotropic neighborhood design shown in
Figure 3b. The nested model (Frame F) used two spherical variograms, with a
short range of 1000 meters and a long range of 10,000 meters. Nested models
are additive. The anisotropic model (Frame I) used the anisotropic neighborhood
design shown in Figure 3. The minor axis of the variogram model is 1000 meters,
and the major axis is 5000 meters (5:1 anisotropy ratio), rotated to N15E. The
color scale is equivalent for all figures. Purple is 5% porosity and red is 13%. All
these illustrations were created using the same input data set.
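The first two effects listed above can be checked numerically. This sketch (Python with NumPy) assumes a nugget-plus-spherical covariance and a hypothetical three-point configuration. It shows that doubling the sill leaves the ordinary kriging weights unchanged while doubling the kriging variance, and that a pure nugget model, the extreme case of increasing the nugget, makes all the weights equal.

```python
import numpy as np

def nugget_spherical_cov(h, nugget=0.0, sill=1.0, a0=1000.0):
    """Covariance of a nugget + spherical model; sill is the total sill (C0 + C)."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a0, 1.0)
    structured = (sill - nugget) * (1.0 - 1.5 * r + 0.5 * r**3)
    return structured + np.where(h == 0.0, nugget, 0.0)  # nugget acts only at h = 0

def ok_weights_and_variance(coords, target, cov):
    """Ordinary kriging weights and kriging variance (mu enters with a plus sign)."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - target, axis=-1)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = cov(d)
    lhs[n, n] = 0.0
    sol = np.linalg.solve(lhs, np.append(cov(d0), 1.0))
    weights, mu = sol[:n], sol[n]
    return weights, float(cov(0.0) - weights @ cov(d0) - mu)

coords = np.array([[0.0, 0.0], [300.0, 0.0], [0.0, 800.0]])  # hypothetical points
target = np.array([100.0, 100.0])

w1, v1 = ok_weights_and_variance(coords, target, lambda h: nugget_spherical_cov(h, sill=1.0))
w2, v2 = ok_weights_and_variance(coords, target, lambda h: nugget_spherical_cov(h, sill=2.0))
w3, _ = ok_weights_and_variance(coords, target,
                                lambda h: nugget_spherical_cov(h, nugget=1.0, sill=1.0))
print(w1, w2)   # identical weights
print(v1, v2)   # v2 is twice v1
print(w3)       # pure nugget: each weight is 1/3
```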
Advantages Of Kriging
- Kriging is an exact interpolator (if the control point coincides with a grid
node).
- Kriging variance:
  - Relative index of the reliability of estimation in different regions.
  - Good indicator of data geometry.
  - Smaller nugget (or sill) gives a smaller kriging variance.
- Minimizes the Mean Square Error.
- Can use a spatial model to control the interpolation process.
- A robust technique (i.e., small changes in kriging parameters produce
small changes in the results).
Disadvantages Of Kriging
Kriging tends to produce smooth images of reality (like all interpolation
techniques). In doing so, it reproduces short-scale variability poorly and
underestimates extremes (high or low values). It also requires the specification of
a spatial covariance model, which may be difficult to infer from sparse data.
Kriging consumes much more computing time than conventional gridding
techniques, because a system of simultaneous equations must be solved for each
grid node estimated. The preliminary processes of generating variograms and
designing search neighborhoods in support of the kriging effort also require much
effort. Therefore, kriging is probably not performed on a routine basis; rather, it
is best used on projects that can justify the need for the highest quality
estimate of a structural surface (or other reservoir attribute), and which are
supported by plenty of good data.
CROSS-VALIDATION
Cross-validation is a process for checking the compatibility between a set of data,
the spatial model, and the neighborhood design. In cross-validation, each control
point is individually removed from the data set, and its value is then re-estimated
from the remaining data using the covariance model. In this way, it is possible to
compare estimated versus actual values.
The procedure consists of the following steps:
1. Consider each control point in turn.
2. Temporarily suppress each control point from the data set.
3. Re-estimate each point from the surrounding data using the covariance
model.
4. Compare the estimated values, $Z_{est}$, to the true values, $Z_{true}$.
This also provides a re-estimation error (the kriging variance is also calculated
at the same time):
$$RE = Z_{est} - Z_{true}$$
5. Calculate a standardized error:
$$SE = \frac{RE}{\sigma_{krig}}$$
- Ideally, it should have a zero mean and a variance equal to 1.
- The numerator is affected by the range
- The denominator is affected by the sill
6. Average the errors for a large number of target points to obtain:
- Mean error
- Mean standardized error
- Mean squared error
- Mean squared standardized error
7. Distribution of errors (in map view) can provide useful criterion for:
- Selecting a search region
- Selecting a covariance model
8. Any data point whose absolute Standardized Error is > 2.5 is considered an
outlier, based on the fact that such a data point falls outside approximately the
99% confidence limits of a standard normal distribution.
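The procedure above can be sketched in Python with NumPy as a leave-one-out loop around an ordinary kriging solver. This is a minimal sketch, assuming a spherical covariance with unit sill and a 1000 m range, all remaining data used at each step (no search neighborhood), and hypothetical data; the function names are illustrative. Step 7, mapping the errors, is left to a plotting tool.

```python
import numpy as np

def spherical_cov(h, sill=1.0, a0=1000.0):
    """Spherical covariance (no nugget), zero beyond the range a0."""
    r = np.minimum(np.asarray(h, dtype=float) / a0, 1.0)
    return sill * (1.0 - 1.5 * r + 0.5 * r**3)

def ok_estimate(coords, values, target, cov):
    """Ordinary kriging estimate and kriging variance at one location."""
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - target, axis=-1)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = cov(d)
    lhs[n, n] = 0.0
    sol = np.linalg.solve(lhs, np.append(cov(d0), 1.0))
    weights, mu = sol[:n], sol[n]
    return weights @ values, float(cov(0.0) - weights @ cov(d0) - mu)

def cross_validate(coords, values, cov=spherical_cov, cutoff=2.5):
    """Leave-one-out cross-validation following steps 1 to 8 above."""
    n = len(values)
    re = np.empty(n)
    se = np.empty(n)
    for i in range(n):                                   # step 1: each point in turn
        keep = np.arange(n) != i                         # step 2: suppress point i
        est, var = ok_estimate(coords[keep], values[keep], coords[i], cov)  # step 3
        re[i] = est - values[i]                          # step 4: RE = Z_est - Z_true
        se[i] = re[i] / np.sqrt(var)                     # step 5: SE = RE / sigma_krig
    summary = {                                          # step 6
        "mean_error": re.mean(),
        "mean_standardized_error": se.mean(),
        "mean_squared_error": np.mean(re**2),
        "mean_squared_standardized_error": np.mean(se**2),
    }
    outliers = np.flatnonzero(np.abs(se) > cutoff)       # step 8
    return re, se, summary, outliers

# Hypothetical data: a center well with an anomalous value and four neighbors
coords = np.array([[0.0, 0.0], [300.0, 0.0], [-300.0, 0.0], [0.0, 300.0], [0.0, -300.0]])
values = np.array([30.0, 10.0, 10.0, 10.0, 10.0])
re, se, summary, outliers = cross_validate(coords, values)
print(re[0], outliers)   # RE at the center is about -20; index 0 is flagged
```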
USEFUL CROSS-VALIDATION PLOTS
Figures 1a, 1b, 1c, and 1d (Cross-validation plots) show output from a cross-validation test.
Figure 1a shows a map view of the magnitude of the Re-estimation Error (RE).
Open circles are over-estimations; solid circles are under-estimations. The solid
red circle falls more than 2.5 standard deviations from a mean of 0. Also, check
whether the over- and under-estimates are well intermixed; spatial clustering of
either is an additional indication of bias. A good model has an equally likely
chance of over- or under-estimating any location.
Figure 1b is a cross plot of the measured porosity at the wells versus
porosity re-estimated at the well locations during the cross-validation test. Again,
open circles are over-estimates. The red, solid circle is the sample from
Figure 1a.
The two most important plots are in Figures 1c and 1d because they help identify
model bias. If the histogram of standardized error (SE) in Figure 1c is skewed, or
if there is a correlation between SE and the estimated values in Figure 1d, then
the model is biased. Such is not the case in this example; however, it is the case
in Figures 2a, 2b, 2c, and 2d (Cross-validation results from a biased model).
In Figure 2a, the over-estimated RE values are clustered in the center of the map.
The histogram of SE (Figure 2c) is slightly skewed towards over-estimation.
Finally, there is a positive correlation between SE and estimated porosity
(Figure 2d). These indicate poor model design, poor neighborhood design, or both.
COKRIGING
INTRODUCTION
In the previous section, we described kriging with a single attribute. Rather than
only consider the spatial correlation between a set of sparse control points, we
will now describe the use of a secondary variable in the kriging process.
In this section, you will learn about multivariate geostatistical data integration
techniques, which fall into the general category called Cokriging and you will
learn more about External Drift.
There are many situations when it is possible to study the covariance between
two or more regionalized variables. The techniques introduced in this section are
appropriate for instances when the primary attribute of interest (such as well
data) is sparse, but there is an abundance of related secondary information (such
as seismic data). The mutual spatial behavior of regionalized variables is called
coregionalization.
The estimation of a primary regionalized variable (e.g., porosity) from two or more
variables (such as acoustic impedance) is known as Cokriging.
TYPES OF COKRIGING
Cokriging is a general multivariate regression technique which has three basic
variations:
- Simple Cokriging uses a multivariate spatial model and a related
secondary 2-D attribute to guide the interpolation of a primary attribute
known only at control points (such as well locations). The mean is
specified explicitly and assumed to be a global constant. The method
uses all primary and secondary data according to the search criteria.
- Ordinary Cokriging is similar to Simple Cokriging in that the mean is still
assumed to be constant, but it is estimated using the neighborhood
control points rather than specified globally.
- Collocated Cokriging is a reduced form of Cokriging, which requires
knowledge of only the hard data covariance model, the Product-
Moment Correlation coefficient between the hard and soft data, and the
variances of the two attributes. There is also a modified search criterion
used in Collocated Cokriging. This method uses all the primary data,
but, in its simplest form, uses only one secondary data value, the value
at the target grid node.
PROPERTIES OF COKRIGING
This method is a powerful extension of kriging, which:
- must satisfy the same conditions as kriging:
  - it minimizes the estimation error
  - it is an unbiased estimator
  - it honors the “hard” data
  - control points are weighted according to a model of coregionalization
- is more demanding than kriging:
  - requires a simple covariance model for the primary and all
secondary attributes
  - requires cross-covariance models for all attributes
  - must be modeled with a single coregionalized model
  - requires neighborhood searches that are more demanding
  - requires more computation time
DATA INTEGRATION
Besides being able to use a spatial model for determining weights during
estimation, one of the more powerful aspects of the geostatistical method is
quantitative data integration. We know from classical multivariate statistics that
models developed from two or more variables often produce better estimates. We
can extend classical multivariate techniques into the geostatistical realm and use
two or more regionalized variables in this geostatistical estimation process.
Basic Concept
We'll illustrate this concept by way of example. From our exploratory data
analysis, we might find a good correlation between a property measured at well
locations and a certain seismic attribute. In such a case, we might want to use
the seismic information to provide better inter-well estimates than could be
obtained from the well data alone. Even when the primary (well) data
(e.g., porosity) are sparse, it is possible to use a densely sampled secondary
attribute (e.g., seismic acoustic impedance) in the interpolation process.
Well data have excellent vertical resolution of reservoir properties, but poor
lateral resolution. Seismic data, on the other hand, have poorer vertical resolution
than well data, but provide densely sampled lateral information. Geostatistical
data integration methods allow us to capitalize on the strengths of both data
types, to yield higher quality reservoir models.
The Cokriging Process
We can illustrate the cokriging process by way of the following problem:
Given samples located at $Z_\alpha$, and seismic data located on a grid, find the most
likely value of the variable $Z_0^*$ at the target grid node (Figure 1: Arrangement of
control points, seismic grid, weights, and target grid node).
This graphic shows the geometrical arrangement of three data control points $Z_\alpha$
(where α = 1, 2, 3), a grid of seismic data, the unknown weights, $\lambda_\alpha$, and the target
grid node, $Z_0^*$.
Traditional Cokriging Estimator
The general Cokriging estimator is expressed as a weighted linear combination of
the primary (well) data $Z_1, \ldots, Z_n$ and the secondary (seismic) data $T_1, \ldots, T_m$.
The traditional cokriging approach uses only secondary data at the well locations,
so if there is a very high correlation between the primary and secondary data, it is
the same as kriging the primary data only. This condition is known as
self-krigability.
$$Z_0^* = \sum_\alpha \lambda_\alpha Z_\alpha + \sum_j \beta_j T_j + c$$
Where:
$\sum_\alpha \lambda_\alpha = 1$ (primary weights)
$\sum_j \beta_j = 0$ (secondary weights)
$c = m_Z\left[1 - \sum_\alpha \lambda_\alpha\right] - m_T \sum_j \beta_j$ to ensure unbiasedness.
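Taking expectations shows why this choice of c removes bias. Assuming stationary means, $E[Z_\alpha] = m_Z$ and $E[T_j] = m_T$:

$$
E[Z_0^*] = m_Z \sum_\alpha \lambda_\alpha + m_T \sum_j \beta_j + c
= m_Z \sum_\alpha \lambda_\alpha + m_T \sum_j \beta_j + m_Z\Big[1 - \sum_\alpha \lambda_\alpha\Big] - m_T \sum_j \beta_j
= m_Z
$$

When the weights satisfy $\sum_\alpha \lambda_\alpha = 1$ and $\sum_j \beta_j = 0$, both terms of c vanish, so the constant matters only when the means are known and those constraints are not imposed (simple cokriging).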
Requirements
- $C_{ZZ}(h)$ is the spatial covariance model of the primary attribute (well data).
- $C_{TT}(h)$ is the spatial covariance model of the secondary attribute (seismic
data).
- $C_{ZT}(h)$ is the spatial cross-covariance model of well and seismic data.
Self-Krigability in the Isotopic Case
A variable is defined as self-krigable when the cokriging estimate (in the isotopic
case, where all attributes are sampled at the same locations) is identical to its
kriging estimate.
Different Cases
- No Correlation: $\gamma_{Z_1 Z_2}(h) = 0 \;\; \forall h$
- Perfect Correlation: $b_{ij} = \pm\sqrt{b_{ii}\, b_{jj}} \;\; \forall\, i, j$ (where $b_{ij}$ are the sill
coefficients of the coregionalization model)
- Intrinsic Correlation occurs when the simple and cross-variograms are
proportional, which is always the case if there is only one basic
structure.
Cokriging: Example Using 3 Data Points
Data Configuration
Figure 2 shows a typical data configuration for traditional cokriging.
Estimator
$$Z_0^* = \lambda_{Z1} Z_1 + \lambda_{Z2} Z_2 + \lambda_{Z3} Z_3 + \beta_{T1} T_1 + \beta_{T2} T_2 + \beta_{T3} T_3$$
Estimated Error
$$\sigma^2 = C_{Z00} - \lambda_{Z1} C_{Z01} - \lambda_{Z2} C_{Z02} - \lambda_{Z3} C_{Z03} - \beta_{T1} C_{T01} - \beta_{T2} C_{T02} - \beta_{T3} C_{T03}$$
Collocated Cokriging
Collocated cokriging is a modification of the general cokriging case:
- In its simplest form, it requires only the simple covariance model of the
primary attribute. In the case of sparse primary data, the covariance
model is often derived from the covariance model of the densely
sampled secondary attribute.
- It uses all primary data according to the search criteria.
- In its simplest form, it uses the secondary attribute located only at the target
grid node during estimation.
- If the cross-covariance model is assumed proportional to the
primary attribute covariance model, then:
  - the correlation coefficient is the constant of proportionality.
  - we can use the correlation coefficient and the ratio of the secondary to
primary variances to transform a univariate covariance model into a
multivariate covariance model. This assumption is termed the Markov-Bayes
assumption (Deutsch and Journel, 1992).
Example Using 3 Data Points
Figure 3 illustrates a typical data configuration for collocated cokriging.


Figure 3


In this graphic, the secondary data at the estimation grid node is the only bit of
seismic data used in this form of the algorithm. Forms that are more complex
combine this data configuration with the one shown in Figure 1, which also
increases the computation time substantially.
Estimator
$$Z_0^* = \lambda_{Z1} Z_1 + \lambda_{Z2} Z_2 + \lambda_{Z3} Z_3 + \beta_{T0} T_0$$
Estimated Error
$$\sigma^2 = C_{Z00} - \lambda_{Z1} C_{Z01} - \lambda_{Z2} C_{Z02} - \lambda_{Z3} C_{Z03} - \beta_{T0} C_{T00}$$
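The collocated form under the Markov-Bayes assumption can be sketched in Python with NumPy. This is a minimal sketch of simple collocated cokriging under stated assumptions: both attributes are standardized to zero mean and unit variance (the result is back-transformed with $Z = m_Z + \sigma_Z Y$), the primary correlogram is spherical with a 1000 m range, the cross-correlation is taken as $\rho\,\rho_Z(h)$, and the well locations, values, and correlation coefficient are hypothetical. The first check reproduces the extrapolation behavior described under Practical Considerations: with no wells within range, the well weights vanish and the estimate becomes the regression value $\rho\,t_0$.

```python
import numpy as np

def spherical_corr(h, a0=1000.0):
    """Spherical correlogram of the standardized primary variable."""
    r = np.minimum(np.asarray(h, dtype=float) / a0, 1.0)
    return 1.0 - 1.5 * r + 0.5 * r**3

def collocated_cokriging(coords, z_std, target, t0_std, rho, corr=spherical_corr):
    """Simple collocated cokriging of standardized data (Markov-Bayes cross-correlation).

    Returns (estimate, primary weights lambda, collocated secondary weight beta).
    """
    n = len(z_std)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - target, axis=-1)
    lhs = np.empty((n + 1, n + 1))
    lhs[:n, :n] = corr(d)               # primary-primary correlations
    lhs[:n, n] = rho * corr(d0)         # primary-secondary: rho * rho_Z(h)
    lhs[n, :n] = rho * corr(d0)
    lhs[n, n] = 1.0                     # secondary variance (standardized)
    rhs = np.append(corr(d0), rho)      # correlations with the unknown Z at the target
    sol = np.linalg.solve(lhs, rhs)
    lam, beta = sol[:n], sol[n]
    return lam @ z_std + beta * t0_std, lam, beta

coords = np.array([[0.0, 0.0], [500.0, 0.0], [0.0, 700.0]])  # hypothetical wells
z_std = np.array([0.8, -0.3, 1.1])   # standardized porosity at the wells
rho = 0.7                            # hypothetical porosity-impedance correlation

# Target far beyond the range of every well: the estimate reduces to rho * t0
est_far, lam_far, beta_far = collocated_cokriging(
    coords, z_std, np.array([5000.0, 5000.0]), 1.5, rho)
print(est_far, lam_far, beta_far)    # about 1.05, zero well weights, beta = 0.7

# Target among the wells: both the wells and the collocated seismic value contribute
est_near, lam_near, beta_near = collocated_cokriging(
    coords, z_std, np.array([200.0, 200.0]), 1.5, rho)
print(est_near, lam_near, beta_near)
```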

General Cokriging Versus Collocated Cokriging
Cokriging
- A secondary variable is not required at all nodes of the estimation grid.
- The traditional method does not incorporate secondary information from
non-collocated data points, it uses only data located at the primary
sample locations.
- Cokriging requires more modeling effort: $C_{ZZ}(h)$, $C_{TT}(h)$ and $C_{ZT}(h)$ must be
specified. No assumption regarding the relationship between the
cross-covariance and the auto-covariance of the primary variable is required.
- It is impractical to incorporate more than two to three secondary variables
into the cokriging matrix because of increased modeling assumptions
and computational time.
- The system of normal equations may be ill conditioned, and it is often
difficult to find a common model of coregionalization.
Collocated Cokriging
- Collocated cokriging assumes that the secondary variable is known at all
nodes of the estimation grid and uses all secondary information during
the estimation process.
- The simplest form of collocated cokriging ignores the influence of non-
collocated secondary data points, because it uses the secondary data
located only at the target grid node.
- Collocated cokriging only requires knowledge of the primary
covariance model ($C_{ZZ}(h)$), the variances of the primary and secondary
attributes ($\sigma_Z^2$, $\sigma_T^2$) and the correlation coefficient between the primary
and secondary attributes ($\rho_{ZT}$).
- The Markov-Bayes approach to collocated cokriging assumes that the
cross-covariance is a scaled version of the primary variable auto-
covariance.
- In general, the system of normal equations is well conditioned.
Practical Considerations: Cokriging and Collocated Cokriging
- In general, increasing the number of data points of the secondary variable
does not improve the cokriging performance. In fact, as the number
increases, the cokriging system can become unstable.
- During extrapolation situations (i.e., no well data within the search radius),
collocated cokriging reduces to a traditional least-squares regression
problem.
- Careful determination of the correlation coefficient, $\rho_{ZT}(0)$, is critical when
applying collocated cokriging, because it controls the scaling between
the primary and secondary data when using the Markov-Bayes
assumption.
  • Remove outliers when computing $\rho_{ZT}(0)$.
  • Analyze and understand the physical meaning of the correlation,
especially if the well data are sparse.
  • Make sure that the estimator yields a meaningful range of estimates,
$Z_0^*$, for the minimum and maximum values of the secondary data (the
well data probably do not calibrate the full range of the secondary
data).
  • With a correlation coefficient below 0.5 in absolute value, the secondary
attribute has relatively little influence during the estimation process.
- Do not use more than one or two seismic attributes, because it is often
difficult to understand the physical meaning of the multivariate
correlation.
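The extrapolation case noted above can be checked with a short derivation. As a sketch, assume standardized (zero-mean, unit-variance) Z and T and the simple collocated cokriging estimator used in this section: with no well data in the neighborhood, only the collocated secondary datum remains, so the system and the error equation collapse to a linear regression of Z on T.

```latex
\beta_{T0}\, C_{TT}(0) = C_{ZT}(0)
\;\Longrightarrow\;
\beta_{T0} = \rho_{ZT}(0), \qquad
Z_0^* = \rho_{ZT}(0)\, T_0, \qquad
\sigma^2 = C_{Z00} - \beta_{T0}\, C_{T0} = 1 - \rho_{ZT}(0)^2
```

With $|\rho_{ZT}(0)| = 0.5$ the error variance is still 0.75 of the prior variance, which is consistent with the weak influence of correlations below 0.5 in absolute value.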
Advantages of Cokriging and Collocated Cokriging
- Allows incorporation of correlated, secondary data into the mapping
process.
- Can calibrate and control the influence of the secondary data via a cross-
covariance model (cokriging) or through the correlation coefficient
(collocated cokriging).
- When compared to traditional least-squares regression, the cokriging
technique honors the primary data and accounts for spatial correlation
in the variations of the secondary data.
- Generally yields more accurate estimates than single-variable kriging.
Limitations of Cokriging and Collocated Cokriging
- Requires more modeling effort than kriging or kriging with an external drift.
- Cokriging system may sometimes be ill conditioned.
- Cokriging tends to produce a smoothed image, but not as smooth as
kriging.
- Inferring a correct linear correlation model is difficult for sparse well data.
EXTERNAL DRIFT KRIGING
This technique allows us to krige, or simulate, in the presence of a trend. Although
technically a univariate problem requiring only the primary attribute covariance
model, external drift kriging (KED) is generally included in the multivariate data
integration discussion of most geostatistical presentations. KED is a true
regression technique, which uses a secondary attribute to define a trend to guide
the estimation of the primary variable (Deutsch and Journel, 1992).
- Regionalized variables are made up of two parts:
  • Drift: the expected value, analogous to a local trend surface, representing
regional features
  • Residual: the deviation from the drift, representing local features
- The basic hypothesis is that the expectation of the variable is a function,
denoted S(x), which is completely defined:
$E[Z(x)] = S(x)$
- To provide greater flexibility, we often express the model as:
$E[Z(x)] = a_0 + a_1 S(x)$
where the coefficients $a_0$ and $a_1$ are unknown.
The coefficients $a_0$ and $a_1$ in the above equation rescale the secondary data
into a local trend (or drift), which is filtered from the primary data. This
is analogous to trend surface analysis for removing a trend based on a
polynomial equation.
Once the local secondary data drift is known, the “residual” is estimated at
the target grid node using traditional kriging methods, then the drift value is
added back to produce the final estimated value.
- Before applying the kriging conditions, the mean from the kriging
neighborhood must be known. Like traditional kriging, external drift
kriging must use an authorized variogram model to ensure the
computation of a positive kriging variance (meeting the positive
definiteness criterion).
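The sum of the weights must equal 1:
$\sum_i \lambda_i = 1$
The weighted sum of the drift values at the data locations must equal the drift
value at the target location (which is the area we want to investigate):
$\sum_i \lambda_i S_i = S_0$
These equations ensure that the system is unbiased. Minimizing the error
variance subject to these constraints (with Lagrange multipliers $\mu_0$ and $\mu_1$)
leads to the traditional error equation:
$\sigma^2 = K_{00} - \sum_i \lambda_i K_{i0} - \mu_0 - \mu_1 S_0$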
- KED is a multi-step process:
  • Compute the coefficients $a_0$ and $a_1$ from a local least-squares
regression using the primary and secondary variables measured at the
wells.
  • Compute residuals of the well data.
  • Estimate (krige) the residuals at all grid nodes.
  • Compute the estimated attribute at all grid nodes (drift plus kriged residual).
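The multi-step description above can be sketched in Python as follows. It is a minimal illustration, assuming a 1-D transect, a spherical residual covariance, and simple (zero-mean) kriging of the residuals; the function `stepwise_drift_kriging` and all positions, times and depths are hypothetical. Note that this stepwise procedure is close to what is often called regression kriging: a full KED system solves for the drift and the residual weights together through the unbiasedness equations given earlier.

```python
import numpy as np

def spherical(h, sill=1.0, a=2000.0):
    """Spherical covariance model used here for the residuals (hypothetical range)."""
    r = np.minimum(np.abs(h) / a, 1.0)
    return sill * (1.0 - 1.5 * r + 0.5 * r**3)

def stepwise_drift_kriging(x_w, z_w, s_w, x_g, s_g, a=2000.0):
    # Step 1: least-squares regression z = a0 + a1 * s at the wells
    a1, a0 = np.polyfit(s_w, z_w, 1)              # polyfit returns [slope, intercept]
    # Step 2: residuals of the well data
    r_w = z_w - (a0 + a1 * s_w)
    # Step 3: simple kriging of the residuals at all grid nodes
    C = spherical(x_w[:, None] - x_w[None, :], a=a)
    c0 = spherical(x_w[:, None] - x_g[None, :], a=a)   # shape (n_wells, n_grid)
    lam = np.linalg.solve(C, c0)
    r_g = lam.T @ r_w
    # Step 4: add the drift back to the kriged residual
    return a0 + a1 * s_g + r_g

x_w = np.array([0.0, 1000.0, 2500.0, 4000.0])          # well positions (m)
s_w = np.array([1200.0, 1250.0, 1180.0, 1300.0])       # two-way time at wells (ms)
z_w = np.array([2410.0, 2515.0, 2370.0, 2600.0])       # measured depth at wells (m)
x_g = np.linspace(0.0, 4000.0, 9)                      # grid every 500 m
s_g = np.array([1200.0, 1230.0, 1250.0, 1220.0, 1190.0,
                1180.0, 1230.0, 1270.0, 1300.0])       # seismic time at grid nodes
depth = stepwise_drift_kriging(x_w, z_w, s_w, x_g, s_g)
```

Because the residuals are kriged without a nugget, the estimate reproduces the well depths exactly at grid nodes that coincide with wells.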
KRIGING WITH EXTERNAL DRIFT:
EXAMPLE USING 3 DATA POINTS
Figure 1 illustrates a typical example (time to depth conversion) for the KED
method.


Figure 1


The objective of KED is to use the seismic data as a correlated shaping function,
a true regression approach, to construct the final depth map. Four wells
intersected the top of a reservoir. Kriging was used to map the solid curved
surface through the data points. This surface is a second or third order
polynomial. The seismic two-way time data (lightweight line) is the External Drift.
The seismic travel times correlate with the measured depth at the wells and
suggest a much more complex surface than the surface created using only the
well data.
KED is appropriate when shape is an important aspect of the study. The
approach assumes a perfect correlation between the well and seismic data. KED
is not an appropriate approach for mapping reservoir rock properties;
collocated cokriging is the better choice.
Figure 2 shows a three data point KED example.


Figure 2


The primary data are located at $Z_\alpha$ and the secondary data at $S_\alpha$. Note that KED
also uses the secondary information at the target grid node. This data
configuration can also be used for a more rigorous application of the collocated
cokriging method.
Estimator
$Z_0^* = \lambda_1 Z_1 + \lambda_2 Z_2 + \lambda_3 Z_3$

Estimated Error
$\sigma^2 = K_{00} - \lambda_1 K_{01} - \lambda_2 K_{02} - \lambda_3 K_{03} - \mu_0 - \mu_1 S_0$
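The three-point KED estimator and error variance can be obtained by solving one linear system that combines the residual covariances with the two unbiasedness constraints. This is a minimal sketch, assuming a 1-D layout, a unit-sill spherical residual covariance (so K_00 = 1) and the Lagrange-multiplier sign convention that yields the error equation above; the function `ked_point` and all values are hypothetical.

```python
import numpy as np

def spherical(h, a=2000.0):
    """Unit-sill spherical covariance for the residuals (hypothetical model)."""
    r = np.minimum(np.abs(h) / a, 1.0)
    return 1.0 - 1.5 * r + 0.5 * r**3

def ked_point(x, z, s, x0, s0, a=2000.0):
    """Kriging with external drift at x0, given drift values s at the data and s0 at x0."""
    n = len(x)
    A = np.zeros((n + 2, n + 2))
    A[:n, :n] = spherical(x[:, None] - x[None, :], a)  # residual covariances K_ij
    A[:n, n] = A[n, :n] = 1.0                          # constraint: sum(lambda_i) = 1
    A[:n, n + 1] = A[n + 1, :n] = s                    # constraint: sum(lambda_i S_i) = S_0
    k0 = spherical(x - x0, a)                          # K_01, K_02, K_03
    b = np.concatenate([k0, [1.0, s0]])
    sol = np.linalg.solve(A, b)
    lam, mu0, mu1 = sol[:n], sol[n], sol[n + 1]
    estimate = lam @ z
    variance = 1.0 - lam @ k0 - mu0 - mu1 * s0         # K_00 = 1 for a unit sill
    return estimate, variance, lam

x = np.array([0.0, 1200.0, 3000.0])      # well positions (m)
s = np.array([1.20, 1.26, 1.31])         # seismic two-way time at the wells (s)
z = np.array([2400.0, 2530.0, 2610.0])   # depth at the wells (m)
est, var, lam = ked_point(x, z, s, x0=1800.0, s0=1.28)
```

If depth happens to be an exact linear function of time, the two constraints guarantee that the estimate reproduces that line at the target, whatever the residual covariance.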
PRACTICAL CONSIDERATIONS: KRIGING WITH EXTERNAL DRIFT
- The external drift must be known at the locations of all primary data and at
all nodes of the estimation grid.
- In theory, the covariance model of the residuals, $K_{ZZ}(h)$, cannot be inferred
from the Z data. In practice, $K_{ZZ}(h) \approx C_{ZZ}(h)$ for small distances h or
along directions not strongly affected by the trend.
- In a moving neighborhood, the coefficients $a_0$ and $a_1$ in the regression
$Z^*(x) = a_0 + a_1 S(x)$ are re-estimated at each grid node. Sufficient data must
fall within the search neighborhood to ensure proper definition of the
regression (to filter the trend).
- Use a unique neighborhood in cases with sparse data.
- Consider a KED approach only when shape is important, or a very high
correlation exists between the primary and secondary attributes. Use a
cross-plot to investigate the relationship:
  • Is there a linear relation?
  • Is it well defined (i.e., $|\rho| > 0.9$)?
  • Is the correlation physically meaningful?
- The external drift should be a smoothly varying function, otherwise the
KED system may be unstable (produce extremely high or low values).
Advantages of KED
- Allows direct integration of a secondary attribute during estimation of the
primary data.
- Easier to implement than cokriging or collocated cokriging because it does
not require any secondary attribute modeling.
- Neighborhood search is identical to kriging.
- Computation time is similar to kriging a single variable.
Limitations of KED
- May be difficult to infer the covariance of the residuals (local features).
- There is no means to calibrate and control the influence of the secondary
variable because the method is a true regression model and assumes
a perfect correlation between the two data types.
- KED system may be unstable if the drift is not a smoothly varying function.
MEASUREMENT ERRORS
Kriging, cokriging and conditional simulation algorithms are flexible enough to
take measurement errors in the primary variable into account. At a data point i,
the measured value is $Z_i = S_i + e_i$, where $S_i$ is the true value and $e_i$ is the
unknown measurement error. We then assume that:
- The errors are random; that is, not spatially correlated.
- The expected mean value of the errors is equal to zero.
- The errors are independent of the true values.
- The errors have a Gaussian distribution.
Using these assumptions, decompose the data values into:
- A signal component with a constant variance, $\sigma_S^2$.
- Zero-mean Gaussian white noise uncorrelated with the signal. It is also
assumed that the variance of the noise, $\sigma_i^2$, is known at every primary
data location.
PRACTICAL CONSIDERATIONS: MEASUREMENT ERROR
- For (co)kriging, the noise variance acts as a smoothing parameter (as
does the Nugget Effect), which determines how closely the primary
data are honored. When $\sigma_n^2$ is equal to zero, the data values are
honored exactly. When $\sigma_n^2$ at one point is large compared to the signal,
the data point receives a much lower weight in the interpolation
process.
- In the simulation mode, $\sigma_n^2$ introduces more variability into the final
simulated result.
- When modeling the experimental covariance of the data, the user must
remove the contribution of the nugget to extract the signal variance.
- It is often useful to allow the noise parameter to vary from one data
location to another. Examples include:
  • interpolating zone average data from wireline logs with differing
accuracy.
  • mixing core-derived and log-derived measurements.
  • building velocity maps from well-derived and NMO-derived average
velocities.
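The smoothing role of the noise variance can be seen in a small Python sketch. It assumes simple kriging with a known zero mean and a spherical signal covariance; each datum carries its own noise variance, which is added only to the diagonal of the data covariance matrix and not to the right-hand side. The function `sk_with_noise`, the positions and the values are hypothetical.

```python
import numpy as np

def sk_with_noise(x, z, noise_var, x0, sill=1.0, a=1500.0):
    """Simple kriging (zero mean) where each datum has its own measurement-error variance."""
    def cov(h):
        r = np.minimum(np.abs(h) / a, 1.0)
        return sill * (1.0 - 1.5 * r + 0.5 * r**3)
    C = cov(x[:, None] - x[None, :]) + np.diag(noise_var)  # noise only on the diagonal
    c0 = cov(x - x0)                                       # signal covariance to the target
    lam = np.linalg.solve(C, c0)
    return lam @ z, lam

x = np.array([0.0, 400.0, 900.0])
z = np.array([0.3, -0.5, 1.0])
est_exact, _ = sk_with_noise(x, z, np.zeros(3), x0=400.0)             # noise-free: honors datum
_, lam_clean = sk_with_noise(x, z, np.zeros(3), x0=450.0)
_, lam_noisy = sk_with_noise(x, z, np.array([0.0, 5.0, 0.0]), x0=450.0)
```

With zero noise the estimate at a data location equals the datum; giving the middle datum a noise variance five times the signal sill sharply reduces its weight at a nearby target.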
Advantages of Using Measurement Error
- Integration of data of varying quality.
- Account for spatially varying measurement error.
- Errors are accounted for in the final result.
Limitations of Measurement Error
- Only implemented for the primary data, therefore, the secondary data are
assumed noise free.
- Amount of smoothing is not proportional to the noise variance.
Data Integration Examples
The following examples illustrate the three data integration methods just
described. Figures 1a, 1b, 1c, and 1d illustrate the basic data configuration for
the three examples. Figures 2a, 2b, 2c, and 2d illustrate cokriging, and Figures
3a, 3b, 3c, and 3d show collocated cokriging and kriging with external drift.

Figure 1a, 1b, 1c, 1d

Figure 2a, 2b, 2c, 2d

Figure 3a, 3b, 3c, 3d

Figure 1a shows porosity data points from well log information. Figure 1b shows
variograms derived from the well data, with the experimental variogram (thin line
labeled D1) superimposed on the model variogram. In Figure 1c, the porosity
data were kriged to the seismic grid using the omni-directional, spherical
variogram model having a range of 1500 meters (shown in Figure 1b). The
isotropic search neighborhood used an octant search with 2 points per sector.
Figure 1d shows the seismic acoustic impedance data. The seismic data reside
on a grid of approximately 12 by 24 meters in X and Y, respectively. This is the
grid mesh used for all the following examples, including Figure 1c.
Figure 2a, 2b, 2c, and 2d illustrate an example of traditional cokriging. Porosity
(Figure 2a) and acoustic impedance (Figure 2b) were modeled with an omni-
directional, spherical variogram with a range of 1500 meters. The lines labeled
D1 represent the experimental variograms upon which the model variograms are
based. The cross variogram (Figure 2c) uses the same spherical model. The
cross variogram shows an inverse relationship between porosity and acoustic
impedance (the correlation is -0.83). The curved, dashed lines show the bounds
of perfect positive or inverse correlation. The sill of the cross variogram reflects
the magnitude of the -0.83 correlation between the data. Figure 2d shows the
results of cokriging using the cross-variogram model from Figure 2c.
Figure 3a, 3b, 3c, and 3d illustrate collocated cokriging (Figure 3a-c) and kriging
with external drift (Figure 3d). The model for the collocated cokriging was derived
from analysis and modeling of the seismic acoustic impedance data from the
West Texas data set. Lines D1 and D2 represent experimental variograms taken
from two different directions, based on the anisotropic search neighborhood. The
well porosity data are sparse (55 data points) in comparison to the densely
sampled seismic data (33,800 data points). We are justified in using the nested,
anisotropic seismic data variogram model (Figure 3a) as a model of porosity
based on the high correlation coefficient (-0.83). Thus, we can use the Markov-
Bayes assumption to create the porosity variogram from the seismic variogram,
calibrate them using the correlation coefficient, and scale them based on their
individual variances. Figure 3b is the result of a Markov-Bayes collocated cokriging
using a correlation of -0.83. Figure 3c is also a collocated cokriging using the
Markov-Bayes assumption, except that the correlation coefficient in this case was
set to -0.1. Figure 3c illustrates the condition of self-krigeability: when the
secondary attribute has almost no correlation to the primary attribute, the
solution reverts to simple kriging. Although KED is not a totally appropriate
method here (Figure 3d), the porosity map using KED shows a slightly wider
range of porosity values. This approach would be similar to a Markov-Bayes
assumption using a -1.0 correlation.

CONDITIONAL SIMULATION AND UNCERTAINTY ESTIMATION
INTRODUCTION
Stochastic modeling, also known as conditional simulation, is a variation of
conventional kriging or cokriging. An important advantage of the geostatistical
approach to mapping is the ability to model the spatial covariance before
interpolation. The covariance models make the final estimates sensitive to the
directional anisotropies present in the data. If the mapping objective is reserve
estimation, then the smoothing properties of kriging in the presence of a large
nugget may be the best approach. However, if the objective is to map directional
reservoir heterogeneity (continuity) and assess model uncertainty, then a method
other than interpolation is required (Hohn, 1988).
Once thought of as stochastic “artwork”, useful only for decorating the walls of
research centers (Srivastava, 1994a), conditional simulation models are
becoming more accepted into our day-to-day reservoir characterization-modeling
efforts because the results contain higher frequency content, and lend a more
realistic appearance to our maps when compared to kriging.
Srivastava (1994a) notes that, in an industry that has become too familiar with
layer-cake stratigraphy, with lithologic units either connected from well-to-well or
that conveniently pinch out halfway, and contour maps that show gracefully
curving undulations, it is often difficult to get people to understand that there is
much more inter-well heterogeneity than depicted by traditional reservoir models.
Because stochastic modeling produces many equiprobable reservoir images,
the thought of needing to analyze more than one result, let alone flow simulate all
of them, changes the paradigm of the traditional reservoir characterization
approach. Some of the realizations may even challenge the prevailing geological
wisdom, and will almost certainly provide a range of predictions from optimistic to
pessimistic (Yarus, 1994).
Most of us are willing to admit that there is uncertainty in our reservoir models,
but it is often difficult to assess the amount of uncertainty. One of the biggest
benefits of geostatistical stochastic modeling is the assessment of risk or
uncertainty in our model. To paraphrase Professor Andre Journel “… it is better
to have a model of uncertainty, than an illusion of reality.”
Before reviewing various conditional simulation methods, it is useful to ask what
is it that we want from a stochastic modeling effort. We really need to consider
the goal of the reservoir modeling exercise itself, because the simulation method
we choose depends, in large part, on the goal of the study and the types of data
available. Not all conditional simulation studies need the Cadillac approach, when
a Volkswagen technique will do fine (Srivastava, 1994a).
WHAT DO WE WANT FROM A CONDITIONAL SIMULATION METHOD?
Srivastava (1994a), in an excellent review of stochastic methods for reservoir
characterization, identifies five major types of stochastic simulation model
approaches:
- Assessing the impact of uncertainty.
- Monte Carlo risk analysis.
- Honoring heterogeneity.
- Facies or rock properties (or both)
- Honoring complex information.
The interested reader should refer to the original article for details; it is only
summarized in this presentation.
Assessing the Impact of Uncertainty
Anyone who forecasts reservoir performance understands that there is always
uncertainty in the reservoir model. Performance forecasts or volumetric
predictions are often based on a “best” case model. However, the reservoir
engineer is also interested in other models, such as the “pessimistic” and
“optimistic” case. These models allow the engineer to assess whether the field
development plan, based on the “best” case scenario, is flexible enough to
handle the uncertainty. When used for this kind of study, stochastic models offer
many models consistent with the input data. We could then sort through the many
realizations, select one that looks like a downside scenario, and find another that
looks like an up-side model.
Monte Carlo Risk Analysis
A critical aspect for the use of stochastic modeling is the belief in some “space of
uncertainty” and that the stochastic simulations are outcomes which sample this
space fairly and adequately. We believe that we can generate a fair
representation of the whole spectrum of possibilities and hope that they do not
have any systematic tendencies to show pessimistic or optimistic scenarios. This
type of study involves the idea of a probability distribution, rather than simply
sorting through a large set of outcomes and selecting two that seem plausible. In
Monte Carlo risk analysis, we depend on the notion of a complete probability
distribution of possible outcomes, and that the simulation realizations fairly
represent the entire population.
Honoring Heterogeneity
Although stochastic techniques are capable of producing many plausible
outcomes, many studies only use a single outcome as the basis of performance
prediction. Over the past decade, it has become increasingly apparent that
reservoir performance predictions are more accurate when based on models that
reflect possible reservoir heterogeneity. We are painfully aware of the countless
examples of failed predictions due to the use of overly simplistic models. The
thought of using only a single outcome from a stochastic modeling effort is often
viewed with disdain by those who like to generate hundreds of realizations.
Srivastava (1994a) argues that “even a single outcome from a stochastic
approach is a better basis for performance prediction than a single outcome from
a traditional technique that does not honor reservoir heterogeneity.”
Granted, many people will argue with this statement, because that one simulation
may be the pessimistic (or optimistic) realization just by the “luck of the draw,”
probabilistically speaking.
Facies or Rock Properties (or both)
Reservoir modelers recognize two fundamentally different aspects of stochastic
reservoir models. The reservoir architecture is usually the first priority, consisting
of the overall structural elements (e.g., faults, top and base of reservoir, etc.), then
defining the geobodies based on the depositional environment (e.g., eolian,
deep-water fan, channels, etc.). Once the spatial arrangement of the different
flow units is modeled, we must then decide how to populate them with rock and
fluid properties. The important difference between modeling facies versus
modeling rock properties is that the former is a categorical variable, whereas the
latter are continuous variables. Articles by Tyler, et al. (1994), MacDonald and
Aasen (1994), and Hatloy (1994) provide excellent overviews of these methods.
Though it is conventionally assumed that a lithofacies model is an appropriate
model of reservoir architecture, we should ask ourselves whether this is a good
assumption. Although the original depositional facies are easily recognized
and described, they may not be the most important control on fluid flow. For
example, permeability variations might be due to later diagenesis or tectonic
events (Srivastava, 1994).

Honoring Complex Information
Stochastic methods allow us to incorporate a broad range of information that
most conventional methods cannot accommodate. Many individuals are not so
much interested in the stochastic simulation because it generates a range of
plausible outcomes, but because they want to integrate seismic data with
petrophysical data while obtaining some measure of reliability.
Properties of Conditional Simulation
Conditional simulation is a Monte Carlo technique designed to:
- honor measured data values
- approximately reproduce the data histogram
- honor the spatial covariance model
- be consistent with secondary data
- assess uncertainty in the reservoir model
Conditional Simulation Methods
The following section is but a very brief review of stochastic simulation methods
in common use, followed by a discussion of important practical advantages and
limitations of each method.
The terms stochastic and conditional are sometimes used interchangeably.
Technically, they each mean something different. Stochastic typically connotes
randomness to most people. In geostatistics, we define stochastic simulation as
the process of drawing equally probable, joint realizations of the component
Random Variables from a Random Function model. These are usually gridded
realizations, and represent a subset of all possible outcomes of the spatial
distribution of the attribute values. Each realization is also called a stochastic
image (Deutsch and Journel, 1992). If the image represents a random drawing
from a population of mean = 0 and variance = 1, based on some spatial model,
we would call this type of realization a non-conditional simulation. However, a
simulation is said to be conditional when it honors the measured values of a
regionalized variable (Hohn, 1988). For the remainder of this discussion,
stochastic and conditional will be used as equivalent processes.
Non-conditional simulations are often used to assess the influence of the spatial
model parameters, such as the nugget and sill values, in the absence of control
data. Each of these parameters has a direct effect on the amount of variability in
the final simulation. Increasing either the sill or nugget increases the amount of
variability in a simulated realization.
Srivastava (1994a) lists the following types of stochastic simulation methods:
- Turning Bands
- Sequential Simulation:
  • Gaussian
  • Indicator
  • Bayesian
- Simulated Annealing
- Boolean, Marked-Point Process and Object Based
- Probability Field
- Matrix Decomposition Methods
We will describe each of these methods in turn.
Turning Bands
This is one of the earliest simulation methods, tackling the simulation problem by
first creating a smooth model by kriging, then adding an appropriate level of
noise. The noise is added through a non-conditional simulation step using the
same histogram and spatial model as in the kriging step, but does not use the
actual data values at the well locations. The final model still honors the original
data and the spatial model, but now also has an appropriate level of spatial
heterogeneity (Srivastava, 1994a; Deutsch and Journel, 1992).
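The kriging-plus-noise idea described above (often called conditioning by kriging) can be sketched in Python on a 1-D grid. For brevity, the non-conditional field is generated here by Cholesky (LU-type) decomposition of the covariance matrix, not by the turning bands algorithm itself, which builds that field by summing 1-D simulations along lines; the exponential model, the function names and the data values are hypothetical.

```python
import numpy as np

def cov(h, a=10.0):
    """Exponential covariance with unit sill and practical range a (hypothetical model)."""
    return np.exp(-3.0 * np.abs(h) / a)

def sk_weights(x_data, x_grid, a):
    """Simple kriging weights from every datum to every grid node, shape (n_data, n_grid)."""
    C = cov(x_data[:, None] - x_data[None, :], a)
    c0 = cov(x_data[:, None] - x_grid[None, :], a)
    return np.linalg.solve(C, c0)

def conditional_sim(x_grid, idx_data, z_data, seed=0, a=10.0):
    rng = np.random.default_rng(seed)
    lam = sk_weights(x_grid[idx_data], x_grid, a)
    z_krig = lam.T @ z_data                      # 1. smooth model from the real data
    K = cov(x_grid[:, None] - x_grid[None, :], a) + 1e-10 * np.eye(len(x_grid))
    s = np.linalg.cholesky(K) @ rng.standard_normal(len(x_grid))  # 2. non-conditional field
    s_krig = lam.T @ s[idx_data]                 # 3. krige the field from its own values at the wells
    return z_krig + (s - s_krig)                 # 4. add the simulated kriging error

x_grid = np.arange(50.0)
idx = np.array([5, 20, 40])
z_data = np.array([1.2, -0.4, 0.7])
sim = conditional_sim(x_grid, idx, z_data, seed=1)
```

At the data nodes the non-conditional field and its kriged version coincide, so the added noise is zero there and the result still honors the data.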
Sequential Simulation
Three sequential simulation (Gaussian, Indicator, and Bayesian) procedures
make use of the same basic algorithm for different data types. The general
process is:
1. Select at random grid node $GN_i$, a point not yet simulated in the grid.
2. Use kriging to estimate the mean, $m_i$, and variance, $\sigma_i^2$, at location $GN_i$
of the local Gaussian conditional probability distribution (lGcpd), working with
data transformed to zero mean and unit variance:
$P(Z_{S_i} \mid Z_{S_1}, \ldots, Z_{S_{i-1}}) \propto \exp\left[-\dfrac{(Z_{S_i} - m_i)^2}{2\sigma_i^2}\right]$
where:
$m_i$ is estimated by any of the kriging methods, including kriging with
external drift (KED)
$\sigma_i^2$ is the error variance of $m_i$.
3. Draw at random a single residual value, $z_i$, from a Gaussian distribution
with zero mean and variance $\sigma_i^2$ (about 95% of such draws fall within $\pm 2\sigma_i$).
4. Create a newly simulated value $Z_{S_i}^* = m_i + z_i$.
5. Include the newly simulated value $Z_{S_i}^*$ in the set of conditioning data. This
ensures that closely spaced values have the correct short scale correlation.
6. Repeat the process until all grid nodes have a simulated value.
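Steps 1 to 6 can be sketched in Python for normal-score data on a 1-D grid. This is a minimal illustration, assuming simple kriging with a hypothetical exponential covariance and, for brevity, no search neighborhood (every previously simulated node is used as conditioning data); the random path is drawn from a seeded generator, so the same seed reproduces the same realization.

```python
import numpy as np

def exp_cov(h, a=10.0):
    """Exponential covariance, unit sill, practical range a (hypothetical model)."""
    return np.exp(-3.0 * np.abs(h) / a)

def sgs_1d(n_nodes, data, seed, a=10.0):
    """Sequential Gaussian simulation; data maps node index -> normal-score value."""
    rng = np.random.default_rng(seed)
    x = np.arange(n_nodes, dtype=float)
    sim = np.full(n_nodes, np.nan)
    for k, v in data.items():
        sim[k] = v                                # conditioning data are honored exactly
    known = list(data)
    path = rng.permutation([i for i in range(n_nodes) if i not in data])  # step 1
    for i in path:
        idx = np.array(known, dtype=int)
        m, var = 0.0, 1.0                         # global N(0, 1) if nothing is known yet
        if idx.size:
            C = exp_cov(x[idx][:, None] - x[idx][None, :], a)
            c0 = exp_cov(x[idx] - x[i], a)
            lam = np.linalg.solve(C, c0)          # step 2: simple kriging weights
            m = lam @ sim[idx]                    # local mean m_i
            var = max(1.0 - lam @ c0, 0.0)        # local variance sigma_i^2
        sim[i] = m + rng.normal(0.0, np.sqrt(var))  # steps 3-4: m_i + z_i
        known.append(i)                           # step 5: add to conditioning set
    return sim                                    # step 6: all nodes simulated

sim = sgs_1d(50, {0: 1.0, 25: -1.5, 49: 0.3}, seed=7)
```

A production implementation would back-transform the normal scores to the data histogram and restrict the conditioning set with a search neighborhood.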
Selection of the Simulated Grid Node
The first step in sequential simulation is the random selection of a location $GN_i$,
then $GN_{i+1}$, until all grid nodes contain a simulated value. The order in which grid
nodes are randomly simulated influences the cumulative feedback effect on the
outcome. The selection process is random, but repeatable:
- For each simulation, shuffle the grid nodes into an order defined by a
random seed value.
- Each random seed corresponds to a unique grid order.
- Different random seed values produce a different path through the grid.
- Although the total possible number of orderings is very large, each random
path is uniquely identified and repeatable.
Sequential Gaussian Simulation (SGS) is a method for the simulation of
continuous variables, such as petrophysical properties. In SGS, the procedure is
essentially the same as (co)kriging, with the addition of a random component
drawn from the local conditional distribution.
Sequential Indicator Simulation (SIS) is a method used to simulate discrete
variables. Using the same methodology as SGS, it creates a grid of 0s and 1s,
which represent “lithofacies” (pay/non-pay, or sand/shale).
SIS requires the following input parameters:
- The a priori probabilities (proportions) of two data classes (Indicators,
denoted as I) coded as 0 or 1, for example:
  • $I(z_x) = 1$ if $z_x$ is shale
  • $I(z_x) = 0$ if $z_x$ is sand
- Indicator histogram
- The Indicator spatial correlation model
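The three SIS inputs (proportion, indicator data, indicator covariance) can be combined in a Python sketch on a 1-D grid. It assumes simple indicator kriging around the a priori shale proportion, a hypothetical exponential indicator correlogram, and no search neighborhood; kriged probabilities are clipped to [0, 1] before the random draw.

```python
import numpy as np

def sis_1d(n_nodes, data, p_shale, seed, a=10.0):
    """Sequential indicator simulation of shale (1) versus sand (0) on a 1-D grid."""
    rng = np.random.default_rng(seed)
    x = np.arange(n_nodes, dtype=float)

    def cov(h):
        # Indicator covariance: variance p(1 - p) times an exponential correlogram
        return p_shale * (1.0 - p_shale) * np.exp(-3.0 * np.abs(h) / a)

    sim = np.full(n_nodes, -1, dtype=int)     # -1 marks "not yet simulated"
    for k, v in data.items():
        sim[k] = v                            # well indicators are honored exactly
    known = list(data)
    path = rng.permutation([i for i in range(n_nodes) if i not in data])
    for i in path:
        idx = np.array(known, dtype=int)
        prob = p_shale                        # a priori proportion if nothing is known
        if idx.size:
            C = cov(x[idx][:, None] - x[idx][None, :])
            c0 = cov(x[idx] - x[i])
            lam = np.linalg.solve(C, c0)
            prob = p_shale + lam @ (sim[idx] - p_shale)  # simple indicator kriging
        prob = min(max(prob, 0.0), 1.0)       # correct order-relation violations
        sim[i] = int(rng.random() < prob)     # draw shale with probability prob
        known.append(i)
    return sim

facies = sis_1d(100, {10: 1, 60: 0}, p_shale=0.3, seed=3)
```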
Bayesian Sequential Indicator Simulation is a later form of SIS (Doyen et al.,
1994). This technique allows direct integration of seismic attributes with well data
using a combination of classification and indicator methods.
Bayesian SIS input parameter requirements:
- Code well data as 0 or 1, as in SIS.
- Classify the seismic attribute into two classes (0, 1):
  • Assuming the two data classes are Normal Distributions, we need the:
    + Mean and standard deviation of the seismic attribute for class 0.
    + Mean and standard deviation of the seismic attribute for class 1.
  • The a priori probabilities of the two classes of seismic data.
  • The Indicator spatial correlation models.
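The classification inputs listed above (class means, standard deviations, and prior probabilities) feed Bayes' rule with Gaussian class likelihoods. The Python sketch below shows only that classification ingredient; it does not reproduce how Doyen et al. (1994) combine the result with indicator kriging, and the impedance values are hypothetical.

```python
import math

def normal_pdf(a, mean, sd):
    """Gaussian probability density of value a."""
    return math.exp(-0.5 * ((a - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def posterior_class1(a, prior1, mean0, sd0, mean1, sd1):
    """P(class 1 | seismic value a) from Bayes' rule with Gaussian class likelihoods."""
    p1 = prior1 * normal_pdf(a, mean1, sd1)
    p0 = (1.0 - prior1) * normal_pdf(a, mean0, sd0)
    return p1 / (p0 + p1)

# Hypothetical impedance statistics: class 0 centered at 6000, class 1 at 7000
p = posterior_class1(7200.0, prior1=0.3, mean0=6000.0, sd0=400.0, mean1=7000.0, sd1=500.0)
```

When both classes have identical seismic statistics the attribute carries no information, and the posterior falls back to the prior probability.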
Simulated Annealing
Annealing is the process where a metallic alloy is heated and then slowly cooled so
that the molecules move around and reorder themselves into a low-energy grain
structure. The probability that the system occupies a state of a given energy is
described by the Boltzmann probability distribution. Simulated annealing is the application of the
annealing mechanism of swapping the attributes assigned to two different grid
node locations, using the Boltzmann probability distribution for accepting the
perturbations (Deutsch, 1994). The process continues until the desired model
conditions are satisfied.
Simulated annealing constructs the reservoir model via an iterative trial and error
process, and does not use an explicit random function model. Rather, the
simulated image is formulated as an optimization process. The first requirement
is an objective (or energy) function, which is some measure of difference
between the desired spatial characteristics and those of the candidate realization
(Deutsch, 1994). For example, we might want to produce an image of a
sand/shale model with a 70% net-to-gross ratio, an average shale length of 60 m,
with average shale thickness of 10 m.
The image starts with pixels arranged randomly, having sand and shale in the
correct global proportion. Because of the random assignment of the sand and
shale, the net-to-gross is correct, but the average shale length and thickness are
much too short. Next, the annealing mechanism swaps attributes at different grid node
locations, applies the Boltzmann probability distribution for accepting the
perturbations, and continues until the model conditions are satisfied.
At first glance, this approach seems terribly inefficient, because millions of
perturbations may be required to arrive at the desired image. However, these
methods are more efficient than they appear in theory (Deutsch, 1994).
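The swap-and-accept loop described above can be sketched in Python on a 1-D sand/shale sequence. This is a minimal illustration under stated assumptions: the objective function is a single hypothetical target (average shale run length of 6 cells) rather than the length and thickness targets of the example, swaps preserve the global proportion, worse images are accepted with the Boltzmann-type (Metropolis) probability exp(-increase / t), and a geometric cooling schedule is used. All parameter values are hypothetical.

```python
import numpy as np

def mean_shale_run(x):
    """Average length of consecutive runs of shale (1) in a 1-D sand/shale sequence."""
    runs, count = [], 0
    for v in x:
        if v == 1:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return float(np.mean(runs)) if runs else 0.0

def anneal(n=100, shale_frac=0.3, target_run=6.0, t0=1.0, cool=0.999,
           n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(n, dtype=int)
    x[: int(round(shale_frac * n))] = 1
    rng.shuffle(x)                                   # random start, correct proportion
    obj = abs(mean_shale_run(x) - target_run)        # objective (energy) function
    init_obj, best, best_obj, t = obj, x.copy(), obj, t0
    for _ in range(n_iter):
        i, j = rng.integers(n, size=2)
        if x[i] != x[j]:
            x[i], x[j] = x[j], x[i]                  # perturb: swap two node values
            new_obj = abs(mean_shale_run(x) - target_run)
            # Always accept improvements; accept a worse image with prob. exp(-increase / t)
            if new_obj <= obj or rng.random() < np.exp(-(new_obj - obj) / t):
                obj = new_obj
                if obj < best_obj:
                    best, best_obj = x.copy(), obj
            else:
                x[i], x[j] = x[j], x[i]              # reject: undo the swap
        t *= cool                                    # cooling schedule
    return best, best_obj, init_obj

best, best_obj, init_obj = anneal()
```

Because only swaps are used, the net-to-gross never changes; the loop works purely on the spatial arrangement.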
Boolean, Marked-Point Process and Object Based
These methods constitute a family of techniques that create reservoir models
based on objects of some genetic significance, rather than being built up from
one elementary node or pixel at a time. To use such methods, you need to select
a basic shape for each lithofacies that describes its geometry. For example, you
might want to model sand channels that look like half ellipses in cross section, or
deltas as triangular wedges in map view. You must also specify the proportions of
the shapes in the final model and choose a distribution for the parameters that
describe the shapes. You must also choose rules that describe how the geobodies are
positioned relative to each other (that is, whether they can overlap, and how, or
whether there must be a minimum distance between the shapes).
After the distribution of parameters and position rules are chosen, follow the
remaining steps in the procedure (Srivastava, 1994a):
1. Fill the reservoir model background with some lithofacies (e.g., shale).
2. Randomly select a starting point in the model.
3. Randomly select one of the lithofacies shapes, and draw an appropriate
size, anisotropy and orientation.
4. Check to see if the shape conflicts with any conditioning data (e.g., well
data) or with other previously simulated shapes. If not, keep the shape,
otherwise reject it and go back to the previous step.
5. Check to see if the global proportions are correct, if not, return to step 2.
6. Simulate petrophysical properties within the geobodies using the more
classical geostatistical methods. If control data must be honored, this step
is typically completed first, and then the inter-well region is simulated. Be
sure that there are no conflicts with known stratigraphic and lithologic
sequences in the wells.
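Steps 1 to 5 can be sketched in Python as a toy object-based model. This is a minimal illustration, assuming rectangular sand bodies of random size dropped into a shale background on a 2-D grid; it only rejects bodies that would put sand where a well observed shale (honoring sand observations at wells needs extra logic), and step 6 (petrophysical properties) is not shown. The function `boolean_sim`, the size ranges and the well cells are hypothetical.

```python
import numpy as np

def boolean_sim(shape=(50, 100), target=0.3, wells=None, seed=0, max_tries=10000):
    """Toy object-based model: rectangular sand bodies (1) in a shale background (0)."""
    rng = np.random.default_rng(seed)
    wells = wells or {}                        # {(row, col): 0 shale or 1 sand}
    grid = np.zeros(shape, dtype=int)          # step 1: shale background
    for _ in range(max_tries):
        if grid.mean() >= target:
            break                              # step 5: global proportion reached
        r, c = rng.integers(shape[0]), rng.integers(shape[1])  # step 2: random start
        h, w = rng.integers(2, 6), rng.integers(10, 30)        # step 3: random size
        body = np.zeros(shape, dtype=bool)
        body[r:r + h, c:c + w] = True
        # Step 4: reject a body that conflicts with shale observed at a well
        if any(body[rc] and v == 0 for rc, v in wells.items()):
            continue
        grid[body] = 1
    return grid

wells = {(10, 10): 0, (30, 70): 0}
grid = boolean_sim(wells=wells, seed=2)
```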
Boolean or object-based techniques are of current interest in the petroleum
industry, with a number of research groups, academic institutions, and
commercial vendors working on new implementation algorithms. In the past,
Boolean-type algorithms could not always honor all of the conditioning data,
because the algorithms were not strict simulators of shape. The large number of
input parameters also made this an almost deterministic method, requiring
considerable up-front knowledge of the depositional system to be modeled.
Articles by Tyler et al. (1994) and Hatloy (1994) provide excellent case studies
using Boolean-type methods to simulate fluvial systems.
Probability Field Simulation
This method is an enhancement of the sequential simulation methods described
earlier. In sequential simulation, the value drawn from the local cumulative
probability distribution at a particular grid node is treated as if it were hard data
and is included as local conditioning data for subsequent nodes. This ensures that
closely spaced values have the correct short-scale correlation; otherwise, the
simulated image would contain too much short-scale (high-frequency noise)
variability.
The idea behind probability field, or P-field, simulation is to increase efficiency
by computing the local conditional probability distributions (lcpd) from the
original well data only, so that they do not have to be recomputed for every
realization. P-field simulation gets around the problem of too much short-scale
variability by controlling the sampling of the distributions, drawing from them
with a spatially correlated field of probability values, rather than controlling
the distributions themselves as sequential simulation does (Srivastava, 1994a).
Srivastava (1994b) shows how P-field simulation improves the ability to visualize
uncertainty, and the article by Bashore et al. (1994) illustrates a P-field
application for establishing an appropriate degree of correlation between porosity
and permeability.
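A minimal 1-D sketch of the P-field idea, assuming that each node's lcpd has already been computed once from the well data and is Gaussian with a known mean and standard deviation (for example, a kriging estimate and kriging standard deviation). A spatially correlated field of probability values is then used to read one value from each local distribution; only that probability field changes between realizations. Here the probability field is built by moving-average smoothing of white noise, which is an assumption for illustration; practical implementations generate it with a specified variogram. The function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()


def correlated_probability_field(n, half_width, rng):
    """Spatially correlated values in (0, 1) from smoothed white noise."""
    noise = rng.standard_normal(n + 2 * half_width)
    kernel = np.ones(2 * half_width + 1)
    # A moving average of independent N(0, 1) values, rescaled so each
    # output value is again N(0, 1) but correlated with its neighbours.
    y = np.convolve(noise, kernel, mode="valid") / np.sqrt(kernel.size)
    return np.array([_STD_NORMAL.cdf(v) for v in y])


def pfield_realization(local_means, local_stds, half_width, seed=None):
    """Sample each node's local distribution at a correlated probability.

    The local distributions are assumed Gaussian and computed once from the
    well data; only the probability field changes between realizations.
    """
    rng = np.random.default_rng(seed)
    p = correlated_probability_field(len(local_means), half_width, rng)
    # Quantile of each local Gaussian distribution at its probability value.
    q = np.array([_STD_NORMAL.inv_cdf(pi) for pi in p])
    return np.asarray(local_means, float) + np.asarray(local_stds, float) * q
```

At well locations the local standard deviation is zero, so every realization returns the data value there; between wells, neighbouring nodes share similar probability values and therefore vary together instead of showing high-frequency noise.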
Matrix Decomposition Methods
Some simulation techniques involve matrix decomposition. LU decomposition is
one such example, in which a matrix is represented as the product of a lower
triangular matrix, L, and an upper triangular matrix, U. This decomposition can be
made unique either by stipulating that the diagonal elements of L be unity, or that
the diagonal elements of L and U be correspondingly identical. In this approach,
different outcomes are created by multiplying vectors of random numbers by a
precalculated matrix built from spatial continuity information supplied by the
user, typically as a variogram or correlogram. Matrix methods can be viewed as a
form of sequential simulation, because multiplying across the rows of the
precalculated triangular matrix and down the column vector of random numbers
can be construed as a sequential process in which the value at each successive
node depends on the values at the previously simulated nodes (Srivastava, 1994a;
Deutsch and Journel, 1992).
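For a symmetric positive-definite covariance matrix C, the factorization with U = L^T is the Cholesky decomposition, C = L L^T. The sketch below is unconditional and assumes 1-D node locations and an exponential covariance model with hypothetical sill and range values. Each vector w of independent standard normal numbers gives a realization y = L w, whose covariance is L I L^T = C. Because L is lower triangular, the value at node i depends only on the first i random numbers, which is the sequential interpretation described above. Conditioning to well data, which requires partitioning C into data and grid blocks, is not shown.

```python
import numpy as np


def exponential_cov(h, sill=1.0, a=10.0):
    """Exponential covariance with practical range a: sill * exp(-3|h|/a)."""
    return sill * np.exp(-3.0 * np.abs(h) / a)


def lu_simulation(x, n_real, sill=1.0, a=10.0, seed=None):
    """Unconditional Gaussian realizations at 1-D locations x (one per column)."""
    x = np.asarray(x, dtype=float)
    C = exponential_cov(x[:, None] - x[None, :], sill, a)
    # Symmetric positive-definite C factors as C = L L^T (i.e. U = L^T).
    L = np.linalg.cholesky(C)
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.size, n_real))  # independent N(0, 1) numbers
    # Row i of L is zero beyond column i, so y_i depends only on w_1..w_i.
    return L @ w
```

The factorization is done once, so each additional realization costs only a matrix-vector product. The drawback is that C grows with the square of the number of nodes, which limits the method to small grids.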
Uncertainty Estimation
Once all of these simulated images have been generated, how do you determine
which one is correct? Strictly speaking, none is uniquely correct: each simulated
image is a possible realization of the reservoir, and all are equally likely given
the data and the spatial model. However, just because an image is
statistically equally probable does not mean it is geologically acceptable. You
must look at each simulated image to determine whether it is a reasonable
representation of what you know about the reservoir; if not, discard it, and run
more simulations if necessary.
Some of the possible maps generated from a suite of simulated images include:
- Mean: This map is the average of n conditional simulations. At each cell,
the program computes the average value, based on the values from all
simulations at the same location. When the number of input simulations
is large, the resultant map converges to the kriged solution.
- Minimum: Each cell displays the smallest value from all input simulations.
- Maximum: Each cell displays the largest value from all input simulations.
- Standard Deviation: A map of the standard deviation at each grid cell,
computed from all input maps. This map serves as a measure of the
standard error and is used to analyze uncertainty.
- Uncertainty or Risk: This map displays the probability of meeting or
exceeding a user-specified threshold value at each grid cell. The grid
cell values range between 0 and 100 percent.
- Iso-Probability: These maps display, at each grid cell, the attribute value
that corresponds to a constant probability threshold (for example, the value
exceeded by 90 percent of the simulations).
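The summary maps listed above are cell-by-cell statistics over a stack of realizations. A minimal NumPy sketch, assuming the realizations are stored as an array of shape (n_sim, ny, nx). The function name, the use of the sample standard deviation (ddof=1), and the convention that the iso-probability map is the value not exceeded with probability iso_prob are assumptions, because software packages differ on these details.

```python
import numpy as np


def summary_maps(realizations, threshold, iso_prob=0.5):
    """Post-process simulations stored as an array of shape (n_sim, ny, nx)."""
    r = np.asarray(realizations, dtype=float)
    return {
        "mean": r.mean(axis=0),                 # average over simulations
        "minimum": r.min(axis=0),
        "maximum": r.max(axis=0),
        "std": r.std(axis=0, ddof=1),           # spread between simulations
        # Risk map: percentage of simulations meeting or exceeding threshold.
        "risk_pct": 100.0 * (r >= threshold).mean(axis=0),
        # Iso-probability map: attribute value at a fixed quantile.
        "iso_prob": np.quantile(r, iso_prob, axis=0),
    }
```

Every statistic is computed independently at each cell, which is why the minimum and maximum maps are not themselves realizations: neighbouring cells may take their extreme values from different simulations.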
Practical Considerations for Conditional Simulation
- In theory, with a sequential simulation approach, the amount of conditioning
data increases with the number of points simulated, because each simulated
point is added to the conditioning data. However, only measured data and
previously simulated points that fall within the search radius are used at any
given time.
- The spatial correlation function is reproduced only for distances within the
search radius. Therefore, the search region must extend at least to the
distances for which the covariance function is to be reproduced.
- Do not expect exact reproduction of the spatial model, because of
uncertainty in the model parameters.
- In sequential simulation, locations are visited according to a random path
to avoid artifacts and maximize simulation variability.
- Correct determination of the correlation coefficient between the primary
and secondary variables is crucial for a Markov-Bayes collocated
cosimulation. Overestimating the correlation may result in overconstrained
simulations and a narrow range of outcomes.
- Simulation with the KED method may yield an unrealistically wide range of
simulated values unless the external drift is a smoothly varying function
(e.g., seismic velocity).
- Sparse data sets will produce a wide range of outcomes.
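Several of the points above (the growing conditioning set, the search radius, and the random path) can be seen in a bare-bones 1-D sequential Gaussian simulation. This sketch assumes the data are already normal scores with mean 0 and variance 1, uses simple kriging with an exponential covariance whose range a is a hypothetical parameter, and measures the search radius in grid nodes. Production codes also limit the number of neighbours and use more careful search strategies. The function name sgs_1d is hypothetical.

```python
import numpy as np


def cov(h, a=10.0):
    """Exponential covariance of normal scores (unit sill, practical range a)."""
    return np.exp(-3.0 * np.abs(h) / a)


def sgs_1d(n_nodes, data, radius, a=10.0, seed=None):
    """One realization on nodes 0..n_nodes-1; data = {node: normal score}."""
    rng = np.random.default_rng(seed)
    values = np.full(n_nodes, np.nan)
    for node, v in data.items():
        values[node] = v                       # hard data are kept as-is
    # Random path through the nodes that still need a value.
    path = rng.permutation([i for i in range(n_nodes) if i not in data])
    for node in path:
        known = np.flatnonzero(~np.isnan(values))
        near = known[np.abs(known - node) <= radius]  # search neighbourhood
        if near.size == 0:
            mean, var = 0.0, 1.0               # nothing nearby: N(0, 1)
        else:
            C = cov(near[:, None] - near[None, :], a)
            c0 = cov(near - node, a)
            lam = np.linalg.solve(C, c0)       # simple kriging weights
            mean = lam @ values[near]
            var = max(1.0 - lam @ c0, 0.0)     # simple kriging variance
        # Draw from the local distribution; the value then conditions later nodes.
        values[node] = mean + np.sqrt(var) * rng.standard_normal()
    return values
```

Calling sgs_1d with different seeds produces different realizations that all honor the hard data exactly. Each simulated node is written into values and so conditions later nodes, but only those within radius, which is why correlation beyond the search radius is not controlled.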
How many simulations should we create and use?
- This is often a difficult question to answer, because it depends, in part, on
the number of conditioning data points and the quality of the correlation
between the primary and secondary data (if performing a cosimulation).
As a rule of thumb, about 100 simulations are sufficient to produce
reasonable probability maps or confidence margins on global
parameters.
- Only a small number of simulated models, representing the minimum, most
likely, and maximum cases, need to be retained for fluid-flow
simulations.
- Discard geologically unrealistic simulations, and run additional
simulations to ensure adequate summary maps.
- The density and quality of the conditioning data control the amount of
variability.
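One rough way to see why on the order of 100 simulations is often considered enough for probability maps: if the realizations are treated as independent draws, the fraction of realizations exceeding a cutoff at a cell is a binomial estimate of the true probability p, with standard error sqrt(p(1 - p)/n). This is a simplifying assumption offered as a guide, not a rule taken from the source.

```python
import math


def prob_standard_error(p, n_sims):
    """Binomial standard error of a probability estimated as the fraction of
    n_sims realizations exceeding a cutoff (realizations assumed independent)."""
    return math.sqrt(p * (1.0 - p) / n_sims)


# The worst case is p = 0.5; the error shrinks only as 1/sqrt(n_sims).
errors = {n: prob_standard_error(0.5, n) for n in (10, 50, 100, 500)}
```

At p = 0.5 this gives roughly 0.16 for 10 realizations, 0.07 for 50, 0.05 for 100, and 0.02 for 500. Each further halving of the error requires four times as many simulations.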
Advantages of Conditional Simulation
- By approximately reproducing the data histogram and the spatial
correlation structure, conditional simulations provide more realistic
reservoir images than (co)kriging, which produces smoothed estimates.
- Because simulations can reproduce extreme values (the tails of the histogram)
and their pattern of connectivity, they are useful for simulating
hydrocarbon production volumes and rates.
- Conditional simulations provide alternative models, all of which are consistent
with the data.
- Simulations generate different but equally probable geological scenarios
for use in risk assessment.
Limitations of Conditional Simulation
- CPU and memory intensive.
- Large numbers of simulations may create data management problems.
- Interpret confidence limits calculated by post-processing simulations with
caution, because the simulations do not account for uncertainty in the
conditioning data, which may be large.
- Simulations are very sensitive to covariance model parameters, such as the sill
and nugget, or to the correlation coefficient if using collocated cosimulation.
- Sparse conditioning data generally produce a wide range of variability
between the simulations.
- Although statistically equally probable, not all images will necessarily be
geologically realistic.
A Conditional Simulation Example
The following figures illustrate a Markov-Bayes collocated cosimulation of
porosity with seismic acoustic impedance for the North Cowden Field data set.
The variogram model, neighborhood configurations, and Markov-Bayes
assumption are identical to those used in the collocated kriging example
described in an earlier section (Figures 1a and 1b).
Fifty simulations (Figure 2) were generated and post-processed to create the
mean and standard deviation maps of the simulations (Figures 3a and 3c); Figure
3b shows the collocated cokriging result for comparison. Figures 4a, 4b, 4c, and
4d illustrate maximum and minimum value maps and risk maps for two different
porosity cutoffs.
Figure 2 shows eleven of the 50 simulations created by a Markov-Bayes collocated
cosimulation approach. The mean of the simulated values is displayed in the
lower right corner. Each image is a reasonable representation of porosity based
on the input data. What we see are repeating global patterns with local variability.
When the same features repeat from image to image, you can have more
confidence that they are real. Figures 3a, 3b, and 3c and Figures 4a, 4b, 4c, and
4d show the results of post-processing the 50 simulations.
Figure 3a is the mean of the 50 simulations compared to the collocated cokriging
result (Figure 3b). The standard deviation or standard error map (Figure 3c) of
the 50 simulations ranges from 0 to about 0.8 porosity percentage units. This
map provides a measure of uncertainty based on the input data and spatial
model.
Figure 4 shows the maximum (Figure 4a) and minimum (Figure 4b) values
simulated at each grid node. Do not use these as the optimistic and pessimistic
cases: they are computed maps, not simulated results, because the value at each
node may come from a different simulation. These displays only show the range of
simulated values. Figure 4c shows the probability that porosity is > 8%, and
Figure 4d shows the probability that porosity is > 10%. These displays are very
useful for risk analysis.
Other Points To Consider When Performing A Simulation
Many stochastic simulation methods, in particular Gaussian methods such as
sequential Gaussian simulation, assume that the data follow a Normal Distribution.
If the sample data are reasonably normal, then it may not be necessary to
transform them. This assumption is easily checked using, for example, a q-q plot:
a graphical approach that plots ordered data values against the expected values
of those observations under a reference distribution. If the values follow that
distribution, the points in such a plot will follow a straight line; departures from
the line show how the data differ from the assumed distribution. However, if the
data are skewed (Figure 5a), it may be necessary to transform the data.
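A minimal sketch of the numbers behind a normal q-q plot, assuming Hazen plotting positions (i - 0.5)/n (other conventions exist). It returns the theoretical standard normal quantiles and the ordered, standardized data, which would be plotted against each other, plus their correlation coefficient as a one-number summary of how straight the plot is. The function name normal_qq is hypothetical; the normal quantiles come from Python's standard-library NormalDist.

```python
from statistics import NormalDist

import numpy as np


def normal_qq(data):
    """Coordinates for a normal q-q plot and their correlation coefficient."""
    z = np.sort(np.asarray(data, dtype=float))       # ordered data values
    n = z.size
    probs = (np.arange(1, n + 1) - 0.5) / n          # Hazen plotting positions
    theo = np.array([NormalDist().inv_cdf(p) for p in probs])
    std = (z - z.mean()) / z.std(ddof=1)             # standardized data
    r = np.corrcoef(theo, std)[0, 1]                 # near 1 => nearly normal
    return theo, std, r
```

A straight-line pattern (and r close to 1) supports the normality assumption; a curved pattern, such as a strongly skewed porosity histogram would produce, indicates that a transformation may be needed.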




One commonly used transformation that maps any data set to a Normal
Distribution is the Hermite polynomial method (Wackernagel, 1995; Hohn, 1988).
This method fits a polynomial with n terms to the histogram and maps the data
from one domain to the other. Variogram modeling, kriging, and simulation are
performed on the transformed variable; the gridded results are then
back-transformed using the stored Hermite coefficients. A Hermite polynomial
transform of 55 porosity values is shown in Figures 5a, 5b, 5c, and 5d.
The shape in Figure 5a shows a truncated porosity distribution (no values lower
than 6%) because a cutoff was used for pay estimation; this truncation creates
the skewed distribution. If we want to honor this histogram (Figure 5a) in the
simulation process, we must transform the original data into a Gaussian (normal)
distribution (Figure 5b). Figures 5c and 5d show the results of a Hermite
polynomial modeling approach to transform the data. The modeled histogram
(blue) is superimposed on the raw histogram (black) in Figure 5c. The cumulative
histogram (Figure 5d) shows a reasonably good match between the model (blue)
and the original (black) data. The purpose of the transformation is to model the
overall shape of the distribution, not every nuance of the raw data, which may
themselves be only an approximation.
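One common formulation of the Hermite approach, often called Gaussian anamorphosis, writes each data value as a function of a standard normal variable y: z = phi(y) = sum over k of psi_k * He_k(y) / sqrt(k!), where He_k are the probabilists' Hermite polynomials. The sketch below is a simplified version under stated assumptions. The coefficients psi_k are estimated by averaging z times the normalized polynomial over the normal scores of the sorted data, rather than by the exact integration used in dedicated codes, and the expansion is truncated at a hypothetical k_max. Only the back-transform is shown; a forward transform would invert phi numerically or use the normal scores directly. All function names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def hermite_he(y, k_max):
    """Probabilists' Hermite polynomials He_0..He_k_max at y (rows = order)."""
    y = np.asarray(y, dtype=float)
    H = [np.ones_like(y), y.copy()]
    for k in range(1, k_max):
        H.append(y * H[k] - k * H[k - 1])  # He_{k+1} = y He_k - k He_{k-1}
    return np.array(H[: k_max + 1])


def _normalized(y, k_max):
    """He_k(y) / sqrt(k!): orthonormal under the standard normal density."""
    scale = np.array([math.sqrt(math.factorial(k)) for k in range(k_max + 1)])
    return hermite_he(y, k_max) / scale[:, None]


def fit_anamorphosis(data, k_max=10):
    """Estimate psi_k in z = sum_k psi_k * He_k(y) / sqrt(k!)."""
    z = np.sort(np.asarray(data, dtype=float))
    n = z.size
    nd = NormalDist()
    # Normal scores of the sorted data (Hazen plotting positions).
    y = np.array([nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)])
    # Simplified estimator: psi_k ~ average of z * normalized He_k(y).
    return _normalized(y, k_max) @ z / n


def back_transform(y, psi):
    """Map Gaussian values y back to data units with the stored coefficients."""
    return psi @ _normalized(np.atleast_1d(y), len(psi) - 1)
```

In the workflow described above, kriging or simulation would be carried out on Gaussian values, and back_transform would then map the gridded results to porosity units using the stored coefficients psi.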
OVERVIEW
INTRODUCTION
Several very good public domain geostatistical mapping and modeling packages
are available to anyone with access to a personal computer. In this section, five
software packages are reviewed, with information on how to obtain them. For a
more complete review, see the article by Clayton (1994).
The geostatistical packages STATPAC, Geo-EAS, GEOPACK, Geostatistical
Toolbox, and GSLIB are reviewed according to their approximate chronological
order of appearance in the public domain. These packages are fairly
sophisticated, reflecting the evolution in personal computer graphics, interfaces,
and advances in geostatistical technology.
These programs are placed into the public domain with the understanding that
the user is ultimately responsible for their proper use. Geostatistical algorithms
are complicated to program and debug, given all the possible combinations of
hardware and operating systems.
A further caution is that different authors use different nomenclature and
mathematical conventions, which adds to the confusion. The International
Association for Mathematical Geology has attempted to standardize geostatistical
jargon through the publication of the Geostatistical Glossary and Multilingual
Dictionary, edited by Ricardo Olea.
STATPAC
STATPAC (STATistical PACkage) is a collection of general-purpose statistical
and geostatistical programs developed by the U. S. Geological Survey. The
programs were compiled in their current form by David Grundy and A. T. Miesch.
STATPAC was released as USGS Open-File Reports 87-411-A, 87-411-B, and
87-411-C, and was last updated in May 1988.
The programs were originally developed for use in applied geochemistry and
petrology within the USGS. The geostatistical program works only for
two-dimensional spatial data analysis. This early program was developed for the
older XT PCs and thus does not take advantage of the high-quality graphics
routines now available. The limited graphics capabilities may discourage
beginning practitioners from using this software, even though STATPAC may have
some advantages over other public domain software (Clayton, 1994).
Order STATPAC from the following sources:
Books and Open-File Reports
U. S. Geological Survey
Federal Center
P. O. Box 25425
Denver, CO 80225
Telephone: (303) 236-7476
Order Reports: OF 87-411-A, B, C
Cost: About $100
GeoApplications
P. O. Box 41082
Tucson, AZ 85717-1082
Telephone: (602) 323-9170
Fax: (602) 327-7752
Cost: call or fax for current pricing
GEO-EAS
Evan Englund (USGS) and Allen Sparks (Computer Sciences Corporation)
developed Geo-EAS (Geostatistical Environmental Assessment Software) for the
U.S. Environmental Protection Agency for environmental site assessment and
monitoring of data collected on a spatial network. Version 1.2.1 was compiled in
July 1990.
Geo-EAS provides practical geostatistical applications for individuals with a
working knowledge of geostatistical concepts. The integrated program layout,
interface design, and excellent user's manual make it an ideal
instructional or self-study tool for learning geostatistical analysis (Clayton, 1994).
Order Geo-EAS from the following sources:
Computer Oriented Geological Survey
P. O. Box 370246
Denver, CO 80237
Telephone: (303) 751-8553
Cost: call for current pricing
National Technical Information Service
Springfield, VA 22161
Telephone: (707) 487-4650
Fax: (703) 321-8547
Cost: about $100
IGWMC USA
Institute for Ground-Water Research and Education
Colorado School of Mines
Golden, CO 80401-1887
Telephone: (303) 273-3103
Fax: (303) 272-3278
Cost: call for current pricing
GeoApplications
P. O. Box 41082
Tucson, AZ 85717-1082
Telephone: (602) 323-9170
Fax: (602) 327-7752
Cost: call or fax for current pricing
GEOPACK
GEOPACK, released by the EPA, is a geostatistical package suitable for teaching,
research, and project work. S. R. Yates (U. S. Department of Agriculture) and
M. V. Yates (University of California-Riverside) developed GEOPACK. Version 1.0
was released in January 1990.
GEOPACK is useful for mining, petroleum, environmental, and research projects
for individuals who do not have access to a powerful workstation or mainframe
computer. It is designed for both novice and experienced geostatistical
practitioners (Clayton, 1994).
Order GEOPACK from:
Computer Oriented Geological Survey
P. O. Box 370246
Denver, CO 80237
Telephone: (303) 751-8553
Cost: call for current pricing
“GEOPACK”
Robert S. Kerr Environmental Research Laboratory
Office of Research and Development
U. S. EPA
Ada, OK 74820
IGWMC USA

Institute for Ground-Water Research and Education
Colorado School of Mines
Golden, CO 80401-1887
Telephone: (303) 273-3103
Fax: (303) 272-3278
Cost: call for current pricing
GEOSTATISTICAL TOOLBOX
FSS International, a consulting company specializing in natural resources and
risk assessment, makes Geostatistical Toolbox available to the public. The
program was developed and written by Roland Froidevaux, with Version 1.30
released in December 1990.
Geostatistical Toolbox is PC-based, interactive, user-friendly geostatistical
software for workers in the mining, petroleum, and environmental industries. It
is also suitable for teaching and academic applications. The program has been
rigorously tested and is recommended for anyone wanting an excellent
two-dimensional geostatistical package.
Order Geostatistical Toolbox from the following sources:
Computer Oriented Geological Survey
P. O. Box 370246
Denver, CO 80237
Telephone: (303) 751-8553
Cost: call for current pricing

FSS International Offices at:
800 Millbank
False Creek
South
Vancouver, BC
Canada V5Z 3Z4

245 Moonshine Circle
Reno, NV 89523
USA

10 Chemin de
Drize
1256 Troinex
Switzerland

P. O. Box 657
Epping 2121
NSW, Australia

GSLIB
GSLIB is a library of geostatistical programs developed at Stanford
University under the direction of Andre Journel, director of the Stanford Center for
Reservoir Forecasting. Oxford University Press published the user's guide and
FORTRAN programs authored by Clayton Deutsch and Andre Journel (1992).
GSLIB addresses the needs of graduate students and advanced geostatistical
practitioners, but is also a useful resource for the novice. GSLIB is the most
advanced public domain geostatistical software available, offering full 2-D and
3-D applications. The program library does not contain executable code, but
rather uncompiled ASCII FORTRAN program listings. These programs should run on
any computer platform with a FORTRAN compiler. Although the user's guide is well
written and documents the programs in an organized, textbook-like fashion with
theoretical background, the novice may find introductory texts, such as Hohn
(1988) or Isaaks and Srivastava (1989), useful supplementary reading.
Order GSLIB from the following sources:
Oxford University Press
Business and Customer Service
2001 Evans Road
Cary, NC 27513
Telephone: 1-800-451-7756
Order: GSLIB: Geostatistical Software Library and User’s Guide by Clayton V.
Deutsch and Andre Journel (ISBN 0-19-507392-4)
Cost: $49.95 plus $2.50 postage
You can also order this book through most bookstores.
SELECTED PUBLISHED GEOSTATISTICAL CASE STUDIES
The following list of published case studies provides an excellent overview of
geostatistical applications within the petroleum industry. Though not
exhaustive, it is representative.
Almeida, A. S. and P. Frykman, 1994, “Geostatistical Modeling of Chalk
Reservoir Properties in the Dan Field, Danish North Sea,” in Stochastic Modeling
and Geostatistics, (1994), J. M. Yarus and R. L. Chambers, Eds., AAPG
Computer Applications in Geology, No. 3, pp. 273-286.
Araktingi, U. G., W. M. Bashore, T. T. B. Tran, and T. Hewett, 1993, “Integration
of Seismic and Well Log Data in Reservoir Modeling,” in Reservoir
Characterization III, B. Linville, Ed., Pennwell Publishing, Tulsa, Oklahoma, pp.
515-554.
Armstrong, M. and G. Matheron, 1987, Geostatistical Case Studies, Reidel,
Dordrecht.
Bashore, W. M., U. G. Araktingi, M. Levy, and W. J. Schweller, 1994, “Importance
of a Geological Framework and Seismic Data Integration for Reservoir Modeling
and Subsequent Fluid-Flow Predictions,” in Stochastic Modeling and
Geostatistics, (1994), J. M. Yarus and R. L. Chambers, Eds., AAPG Computer
Applications in Geology, No. 3, pp. 159-176.
Chambers, R. L., M. A. Zinger and M. C. Kelly, 1994, “Constraining Geostatistical
Reservoir Descriptions with 3-D Seismic Data to Reduce Uncertainty,” in
Stochastic Modeling and Geostatistics, (1994), J. M. Yarus and R. L. Chambers,
Eds., AAPG Computer Applications in Geology, No. 3, pp. 143-158.
Chu, J., W. Xu and A. G. Journel, 1994, “3-D Implementation of Geostatistical
Analyses-The Amoco Case Study,” in Stochastic Modeling and Geostatistics,
(1994), J. M. Yarus and R. L. Chambers, Eds., AAPG Computer Applications in
Geology, No. 3, pp. 201-216.
Cox, D. L., S. J. Linquist, C. L. Bargas, K. G. Havholm, and R. M. Srivastava,
1994, “Integrated Modeling for Optimum Management of a Giant Gas
Condensate Reservoir, Jurassic Eolian Nugget Sandstone, Anschutz Ranch East
Field, Utah Overthrust (USA),” in Stochastic Modeling and Geostatistics, (1994),
J. M. Yarus and R. L. Chambers, Eds., AAPG Computer Applications in Geology,
No. 3, pp. 287-322.
Deutsch, C. V. and A. G. Journel, 1994, “Integrating Well Test-Derived Effective
Absolute Permeabilities in Geostatistical Reservoir Modeling,” in Stochastic
Modeling and Geostatistics, (1994), J. M. Yarus and R. L. Chambers, Eds.,
AAPG Computer Applications in Geology, No. 3, pp. 131-142.
Doyen, P. M., 1988, “Porosity From Seismic Data: A Geostatistical Approach,”
Geophysics, Vol. 53, pp. 1263-1295.
Doyen, P. M. and T. M. Guidish, 1992, “Seismic Discrimination of Lithology and
Porosity, a Monte Carlo Approach,” in Reservoir Geophysics: Investigations in
Geophysics, R. E. Sheriff, Ed., Vol. 7, Society of Exploration Geophysicists,
Tulsa, Oklahoma, pp. 243-250.
Doyen, P. M., D. E. Psaila and S. Strandenes, 1994, “Bayesian Sequential
Indicator Simulation of Channel Sands in the Oseberg Field, Norwegian North
Sea,” SPE Annual Technical Conference and Technical Exhibition, New Orleans,
SPE 28382.
Galli, A. and G. Meunier, 1987, “Study of a Gas Reservoir Using the External Drift
Method,” in Geostatistical Case Studies, M. Armstrong and G. Matheron, Eds.,
Reidel, Dordrecht, pp. 105-119.
Hatloy, A. S., 1994, “Numerical Modeling Combining Deterministic and Stochastic
Methods,” in Stochastic Modeling and Geostatistics, (1994), J. M. Yarus and R. L.
Chambers, Eds., AAPG Computer Applications in Geology, No. 3, pp. 109-120.
Hewett, T., 1994, “Fractal Methods for Fracture Characterization,” in Stochastic
Modeling and Geostatistics, (1994), J. M. Yarus and R. L. Chambers, Eds., AAPG
Computer Applications in Geology, No. 3, pp. 249-260.
Hohn, M. E. and R. R. McDowell, 1994, “Geostatistical Analysis of Oil Production
and Potential Using Indicator Kriging,” in Stochastic Modeling and Geostatistics,
(1994), J. M. Yarus and R. L. Chambers, Eds., AAPG Computer Applications in
Geology, No. 3, pp. 121-130.
Hoye, T., E. Damsleth, and K. Hollund, 1994, “Stochastic Modeling of Troll West
with Special Emphasis on the Thin Oil Zone,” in Stochastic Modeling and
Geostatistics, (1994), J. M. Yarus and R. L. Chambers, Eds., AAPG Computer
Applications in Geology, No. 3, pp. 217-240.
Journel, A. G. and F. G. Alabert, 1990, “New Method for Reservoir Mapping,”
Journal of Petroleum Technology, Vol. 42, No. 2, pp. 212-218.
Kelkar, M. and S. Shibli, 1994, “Description of Reservoir Properties Using
Fractals,” in Stochastic Modeling and Geostatistics, (1994), J. M. Yarus and R. L.
Chambers, Eds., AAPG Computer Applications in Geology, No. 3, pp. 261-272.
MacDonald, A. C. and J. O. Aasen, 1994, “A Prototype Procedure for Stochastic
Modeling of Facies Tract Distribution in Shoreface Reservoirs,” in Stochastic
Modeling and Geostatistics, (1994), J. M. Yarus and R. L. Chambers, Eds.,
AAPG Computer Applications in Geology, No. 3, pp. 91-108.
Marechal, A., 1984, “Kriging Seismic Data in the Presence of Faults,” in
Geostatistics for Natural Resources Characterization, G. Verly, et al., Eds.,
NATO ASI Series C-122, Reidel, Dordrecht, pp. 271-294.
Moinard, L., 1987, “Application of Kriging to the Mapping of a Reef from Wireline
Log and Seismic Data: A Case History,” in Geostatistical Case Studies, M.
Armstrong and G. Matheron, Eds., Reidel, Dordrecht, pp. 93-103.
Srivastava, R. M., 1994a, “An Overview of Stochastic Methods for Reservoir
Characterization,” in Stochastic Modeling and Geostatistics, (1994), J. M. Yarus
and R. L. Chambers, Eds., AAPG Computer Applications in Geology, No. 3, pp.
3-16.
Srivastava, R. M., 1994b, “The Visualization of Spatial Uncertainty,” in Stochastic
Modeling and Geostatistics, (1994), J. M. Yarus and R. L. Chambers, Eds.,
AAPG Computer Applications in Geology, No. 3, pp. 339-346.
Tyler, K., A. Henriquez, and T. Svanes, 1994, “Modeling Heterogeneities in Fluvial
Domains: A Review of the Influences on Production Profiles,” in Stochastic
Modeling and Geostatistics, (1994), J. M. Yarus and R. L. Chambers, Eds.,
AAPG Computer Applications in Geology, No. 3, pp. 77-90.
Wolf, D. J., K. D. Withers, and M. D. Burnaman, 1994, “Integration of Well and
Seismic Data Using Geostatistics,” in Stochastic Modeling and Geostatistics,
(1994), J. M. Yarus and R. L. Chambers, Eds., AAPG Computer Applications in
Geology, No. 3, pp. 177-200.
Xu, W., T. T. Tran, R. M. Srivastava and A. G. Journel, 1992, “Integrating Seismic
Data in Reservoir Modeling: the Collocated Cokriging Alternative,” in Proceedings
of the 67th Annual Technical Conference of the Society of Petroleum Engineers,
Washington, SPE 24742, pp. 833-842.
REFERENCES
Books
Agterberg, F. P., 1974, Geomathematics, New York, Elsevier Scientific Publishing,
596 p.
Armstrong, M., 1998, Basic Linear Geostatistics, Springer-Verlag, Berlin, 153 p.
Clark, I., 1979, Practical Geostatistics, Applied Science Publishers, London, 129 p.
Cressie, N., 1991, Statistics for Spatial Data, Wiley, New York, 900 p.
David, M., 1977, Geostatistical Ore Reserve Estimation, Elsevier Scientific
Publishing, New York, 364 p.
Davis, J. C., 1986, Statistics and Data Analysis in Geology, Second Edition, John
Wiley & Sons, New York, 646 p.
Deutsch, C. V. and A. G. Journel, 1992, GSLIB: Geostatistical Software Library and
User's Guide, Oxford University Press, New York, Oxford, with software diskettes,
340 pp.
Goovaerts, P., 1997, Geostatistics for natural resources evaluation, Oxford
University Press, 483 pp.
Henley, S., 1981, Nonparametric Geostatistics, Elsevier Applied Science Publishers
LTD., Essex, UK, 145 pp.
Hohn, M. E., 1988, Geostatistics and Petroleum Geology, Van Nostrand Reinhold,
NY, 264 pp.
Isaaks, E. H. and R. M. Srivastava, 1989, An Introduction to Applied Geostatistics,
Oxford University Press, Oxford, 561 p.
Journel, A. G., 1989, Fundamentals of Geostatistics in Five Easy Lessons, Short
course in Geology, Vol. 8, American Geophysical Union, 40 p.
Journel, A. G. and C. J. Huijbregts, 1978, Mining Geostatistics, Academic Press,
Orlando, Florida, 600 p.
Kachigan, Sam Kash, 1986, Statistical Analysis: An Interdisciplinary Introduction
to Univariate and Multivariate Methods, (ISBN: 0-942154-99-1) Radius Press, New
York, 589 p.
Olea, R. A., 1991, Geostatistical Glossary and Multilingual Dictionary, Oxford
University Press, New York, 177 p.
Wackernagel, H., 1995, Multivariate Geostatistics: An Introduction with Applications,
Springer-Verlag, Berlin, 256 p.

Papers
Anguy, Y., R. Ehrlich, C. M. Prince, V. L. Riggert, and D. Bernard, 1994, The
Sample Support Problem for Permeability Assessment in Sandstone Reservoirs, in
Stochastic Modeling and Geostatistics, (1994), J. M. Yarus and R. L. Chambers,
Eds., AAPG Computer Applications in Geology, No. 3, pp. 273-286.
Chu, J., Xu, W., Zhu, H., and Journel, A.G., 1991, The Amoco case study: Stanford
Center for Reservoir Forecasting, Report 4 (73 pages).
Clark, I., 1979, The Semivariogram, Part 1: Engineering and Mining Journal, Vol.
180, No. 7, pp. 90-94.
Clark, I., 1979, The Semivariogram, Part 2: Engineering and Mining Journal, Vol.
180, No. 8, pp. 90-97.
Clayton, C. M., 1994, Public Domain Geostatistics Programs: STATPAC, Geo-EAS,
GEOPAC, Geostatistical Toolbox, and GSLIB, in Stochastic Modeling and
Geostatistics, (1994), J. M. Yarus and R. L. Chambers, Eds., AAPG Computer
Applications in Geology, No. 3, pp. 340-367.
Cressie, N., 1990, The Origins of Kriging, Mathematical Geology, Vol. 22, No. 3, pp.
239-252.
Cressie, N. and D. M. Hawkins, 1980, Robust Estimation of the Variogram,
Mathematical Geology, Vol. 12, No. 2, pp. 115-125.
Davis, J. M., Phillips, F. M., Wilson, J. L., Lohmann, R. C., and Love, D. W., 1992, A
sedimentological-geostatistical model of aquifer heterogeneity based on outcrop
studies (Abstract) EOS, American Geophysical Union, v.73 p.122.
Dimitrakopoulos, R., and Desbarats, A.J., 1997, Geostatistical modeling of
gridblock permeabilities for 3D reservoir simulators: SPE Reservoir Engineering, v.8,
p. 13-18.
Englund, E. J., 1990, A Variance of Geostatisticians, Mathematical Geology, Vol.
22, No. 4, pp. 313-341.
Fogg, G. E., Lucia, F. J., and Senger, R. K., 1991, Stochastic simulation of
interwell-scale heterogeneity for improved prediction of sweep efficiency in a
carbonate reservoir, in Reservoir Characterization II, L. W. Lake, H. B. Carroll, and T. C. Wesson,
eds., Orlando, Florida, Academic Press, p. 355-381.
Gomez-Hernandez, J. J., and Srivastava, R. M., 1989, ISIM3D: an ANSI-C
three-dimensional multiple indicator conditional simulation program: Computers and
Geosciences, v. 16, p. 395-440.
Isaaks, E. H. and R. M. Srivastava, 1988, Spatial Continuity Measures for
Probabilistic and Deterministic Geostatistics, Mathematical Geology, Vol. 20, No. 4,
pp. 239-252.
Journel, A. G., 1988, Non-parametric Geostatistics for Risk and Additional Sampling
Assessment, in Principles of Environmental Sampling, L. Keith, Ed., American
Chemical Society, pp. 45-72.
Krige, D. G., 1951, A Statistical Approach to Some Basic Mine Evaluation Problems
on the Witwatersrand, J. Chem. Metall. Min. Soc. South Africa, vol. 52, pp. 119-139.
Lund, H. J., Ates, H., Kasap, E., and Tillman, R. W., 1995, Comparison of single and
multi-facies variograms of Newcastle Sandstone: measures for the distribution of
barriers to flow: SPE paper 29596, p. 507-522.
Matheron, G., 1963, Principles of Geostatistics, Economic Geology, vol. 58, pp.
1246-1266.
Olea, R. A., 1977, Measuring Spatial Dependence with Semivariograms, Lawrence,
Kansas, Kansas Geological Survey, Series on Spatial Analysis, No. 3, 29 p.
Olea, R. A., 1994, Fundamentals of Semivariogram Estimation, Modeling and
Usage, in Stochastic Modeling and Geostatistics, (1994), J. M. Yarus and R. L.
Chambers, Eds., AAPG Computer Applications in Geology, No. 3, pp. 27-36.
Rehfeldt, K.R., Boggs, J.M, and Gelhar, L.W., 1992, Field study of dispersion in a
heterogeneous aquifer 3: geostatistical analysis of hydraulic conductivity: Water
Resources Research, v. 28, p. 3309-3324.
Royle, A. G., 1979, Why Geostatistics?, Engineering and Mining Journal, Vol. 180,
pp. 92-101.

Websites
Easton, V.J., and McColl, J.H., Statistics Glossary v1.1, (HTML Editing by Ian
Jackson), http://www.cas.lancs.ac.uk/glossary_v1.1/main.html (a helpful and
authoritative glossary, with definitions explained in plain language and accompanied
by equations).
ADDITIONAL READING

Caers, J. (2001). "Geostatistical Reservoir Modeling Using Statistical Pattern
Recognition." Petroleum Science and Engineering, V. 29, No. 3-4, p. 177-188.
Capen, E. C. (1993). "A Consistent Probabilistic Approach to Reserves Estimates."
Society of Petroleum Engineers Hydrocarbon Economics and Evaluation
Symposium. SPE Paper 25830, p. 117-122.
Chavez-Cerna, M. and Bianchi-Ramirez, C. (1998). "Abstract: Geostatistics
Applied to a Reservoir Study in Northwestern Peru Talara Basin." AAPG Bulletin, v.
82, p. 1883-1984.
De Araújo Simões-Filho, I. and Queiroz De Castro, J. (2001). "Estimation of
Subseismic Nonreservoir Layers within a Turbidity Oil-Bearing Sandstone, Campos
Basin, Using a Geostatistical Approach." Journal of Petroleum Science and
Engineering, v. 32, No. 2-4, p. 79-86.
Emanuel, A. S., Behrens, R. A., Hewett, T. A. and Alameda, G. K. (1988).
"Reservoir Performance Prediction Methods Based on Fractal Geostatistics:
Abstract." AAPG Bulletin, v. 72, p. 181-181.
Haldorsen, H. H. and Damsleth, L. W. (1990). "Stochastic Modeling." Journal of
Petroleum Technology, April 1990, p. 404-412.
Ioannidis, M. A., Kwiecien, M. J. and Chatzis, I. (1997). "Statistical Analysis of the
Porous Microstructure as a Method for Estimating Reservoir Permeability."
Petroleum Science and Engineering, v. 16, No. 4, p. 251-261.
MacDonald, A. C. and Aasen, J. O. (1993). "Parameter Estimation for Stochastic
Models of Fluvial Channel Reservoirs." AAPG Bulletin, v. 77, p. 1643-1644.
McDowell, R. R., Matchen, D. L., Hohn, M. E., and Vargo, A. G. (1994). "An
Innovative Geostatistical Approach to Oil Volumetric Calculations: Rock Creek Field,
West Virginia: Abstract." AAPG Bulletin, v. 78, p. 1332-1332.
Murray, C. J. (1992). "Geostatistical Simulation of Petrophysical Rock Types."
AAPG Bulletin, v. 76, p. 94-94.
Nederlof, M. H. (1994). "Comparing Probabilistic Predictions with Outcomes in
Petroleum Exploration Prospect Appraisal." Nonrenewable Resources, v. 3, No. 3, p.
183-189.
Norris, R. J. (1996). "Focusing Stochastic Simulation for Effective Problem-Solving
in Reservoir Engineering: Abstract." AAPG Bulletin, v. 80, p. 1318.
Pawar, R. J., Edwards, E. B. and Whitney, E. M. (2001). "Geostatistical
Characterization of the Carpinteria Field, California." Petroleum Science and
Engineering, v. 31, No. 2-4, p. 175-192.
Smyth, M. and Buckley, M.J. (1993). "Statistical Analysis of the Microlithotype
Sequences in the Bulli Seam, Australia, and Relevance to Permeability for Coal
Gas." International Journal of Coal Geology, v. 22, p. 167-187.
Tetzlaff, D. (1996). "Probabilistic Estimates from Reservoir-Scale Sedimentation
Models." Numer. Exp. Stratigr., Lawrence, KS, 1996, p. 145-146.
TERMINOLOGY
In compiling this list of geostatistical terminology, only the most commonly
encountered terms were selected. No attempt was made to duplicate the more
extensive glossary by Ricardo Olea (1991). Some definitions may differ slightly
from those of Olea.
Admissibility (of semivariogram models): for a given covariance model, the
kriging variance must be greater than zero; a model that satisfies this condition
is said to be positive definite.
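Admissibility can be checked numerically. The sketch below is a minimal illustration, assuming a spherical covariance model with made-up parameters (sill 1, range 10) and hypothetical helper names: it builds the covariance matrix for a few sample locations and attempts a Cholesky factorization, which succeeds with strictly positive pivots exactly when a symmetric matrix is positive definite. Plain Python is used so no external libraries are needed.

```python
import math

def spherical_cov(h, sill=1.0, a=10.0):
    """Spherical covariance C(h) = sill - gamma(h), with no nugget."""
    if h >= a:
        return 0.0
    r = h / a
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)

def is_positive_definite(m):
    """True if the symmetric matrix m has a Cholesky factorization with
    strictly positive pivots, i.e. if m is positive definite."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:  # non-positive pivot: matrix is not positive definite
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True

# Covariance matrix for three wells on a line (positions are hypothetical)
wells = [0.0, 3.0, 7.0]
cov = [[spherical_cov(abs(p - q)) for q in wells] for p in wells]
print(is_positive_definite(cov))                        # True
print(is_positive_definite([[1.0, 2.0], [2.0, 1.0]]))   # False
```

Fitting a model from a recognized family (spherical, exponential, and so on) is the usual way to guarantee admissibility; an arbitrary curve drawn through the experimental points carries no such guarantee.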
Anisotropy: refers to changes in a property when measured along different axes.
In geostatistics, anisotropy refers to covariance models that have major and
minor ranges of different distances (correlation scale or lengths). This condition is
most easily seen when a variogram shows a longer range in one direction than in
another. In this module, we discuss two types of anisotropy:
- Geometric anisotropic covariance models have the same sill, but different
ranges;
- Zonal anisotropic covariance models have the same range, but different
sills.
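For geometric anisotropy, a common approach is to rotate each lag vector into the major and minor axes of the anisotropy ellipse and stretch the minor-axis component by the ratio of the ranges, so that a single isotropic model with the major range can be used. The sketch below assumes azimuths in degrees clockwise from north and hypothetical ranges of 2000 m and 500 m; the function name is illustrative. It handles geometric anisotropy only, not zonal anisotropy.

```python
import math

def anisotropic_distance(dx, dy, azimuth_deg, a_major, a_minor):
    """Equivalent isotropic lag for a geometric anisotropy.

    dx, dy: lag components (east, north); azimuth_deg: direction of the
    major axis in degrees clockwise from north; a_major, a_minor: ranges.
    """
    az = math.radians(azimuth_deg)
    h_major = dx * math.sin(az) + dy * math.cos(az)  # along the major axis
    h_minor = dx * math.cos(az) - dy * math.sin(az)  # along the minor axis
    # Stretch the minor component so a lag of a_minor counts as a_major
    return math.hypot(h_major, h_minor * a_major / a_minor)

# Hypothetical model: major range 2000 m toward N45E, minor range 500 m
az = math.radians(45.0)
along_major = anisotropic_distance(
    2000 * math.sin(az), 2000 * math.cos(az), 45.0, 2000.0, 500.0)
along_minor = anisotropic_distance(
    500 * math.cos(az), -500 * math.sin(az), 45.0, 2000.0, 500.0)
# Both lags sit exactly at their range, so both map to 2000
print(round(along_major, 6), round(along_minor, 6))
```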
Auto-correlation: a method of computing a spatial covariance model for a
regionalized variable. It measures a change in variance (variogram) or correlation
(correlogram) with distance and/or azimuth.
Biased estimates: seen when there is a correlation between standardized errors
and estimated values (see Cross-Validation). If the histogram of the standardized
errors is skewed, the estimates are likely biased, and one area of a map may
consistently show estimates higher (or lower) than expected.
Block kriging: making a kriging estimate from nearby sample values over an area
rather than at a point, for example, estimating the average value over a grid cell.
The grid cell is divided into a specified number of sub-cells, a value is kriged to
each sub-cell, and the average of these values is placed at the grid node.
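The sub-cell averaging in block kriging can be sketched as follows. This is a minimal illustration, assuming a square cell and a hypothetical `point_estimator(x, y)` callable; in a real workflow that callable would be point kriging, but here a planar surface stands in so the example runs on its own. Because kriging is linear, averaging the sub-cell point estimates gives the same result as block kriging with the same discretization, provided the same samples are used for every sub-cell.

```python
def block_estimate(point_estimator, x_center, y_center, cell_size, n_sub=4):
    """Average of point estimates at the centres of n_sub x n_sub sub-cells."""
    step = cell_size / n_sub
    x_first = x_center - cell_size / 2.0 + step / 2.0  # centre of first sub-cell
    y_first = y_center - cell_size / 2.0 + step / 2.0
    estimates = [
        point_estimator(x_first + i * step, y_first + j * step)
        for i in range(n_sub)
        for j in range(n_sub)
    ]
    return sum(estimates) / len(estimates)

def planar_surface(x, y):
    """Hypothetical stand-in for a point-kriging estimator."""
    return 2.0 * x + 3.0 * y

# For a plane, the cell average equals the value at the cell centre (about 80)
print(block_estimate(planar_surface, 10.0, 20.0, cell_size=50.0))
```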
Cokriging: the process of estimating a regionalized variable from two or more
variables, using a linear combination of weights obtained from models of spatial
auto-correlation and cross-correlation. The multivariate version of kriging.
Conditional bias: a problem arising from insufficient smoothing which causes
high values of an attribute to be overstated, while low values are understated.
Conditional simulation: a geostatistical method to create multiple (and equally
probable) realizations of a regionalized variable based on a spatial model. It is
conditional only when the actual control data are honored. Conditional simulation
is a variation of conventional kriging or cokriging, and can be considered as an
extrapolation of data, as opposed to the interpolations produced by kriging. By
relaxing some of the kriging constraints (e.g., minimizing the squared error), conditional
simulation is able to reproduce the variance of the control data. Simulations are
not estimations; their goal is to characterize variability or risk. The final “map”
captures the heterogeneity and connectivity most likely present in the reservoir.
Post-processing conditional simulations produces a measure of error (standard
deviation) and other measures of uncertainty, such as iso-probability and
uncertainty maps.
Correlogram: a measure of spatial dependence (correlation) of a regionalized
variable over some distance. The correlogram can also be calculated with an
azimuthal preference.
Covariance: a measure of how two variables vary together. The kriging system
uses covariance, rather than variogram or correlogram values, to determine the
kriging weights, λ. The covariance can be thought of as the variogram turned
upside down: it equals the value of the sill minus the variogram model (or the
sill multiplied by the correlogram).
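The relationships between variogram, covariance, and correlogram can be made concrete with a spherical model. The sketch below assumes a spherical variogram with no nugget and illustrative parameters (sill 4, range 1000 m); the function names are hypothetical.

```python
def spherical_variogram(h, sill=1.0, a=1000.0):
    """Spherical variogram with range a and no nugget."""
    if h >= a:
        return sill
    r = h / a
    return sill * (1.5 * r - 0.5 * r ** 3)

def covariance(h, sill=1.0, a=1000.0):
    """C(h) = sill - gamma(h): the variogram turned upside down."""
    return sill - spherical_variogram(h, sill, a)

def correlogram(h, sill=1.0, a=1000.0):
    """rho(h) = C(h) / C(0): 1 at zero lag, 0 at and beyond the range."""
    return covariance(h, sill, a) / covariance(0.0, sill, a)

# Columns: lag, variogram, covariance, correlogram (sill = 4)
for h in (0.0, 250.0, 500.0, 1000.0):
    print(h, spherical_variogram(h, 4.0), covariance(h, 4.0), correlogram(h, 4.0))
```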
Coregionalization: the mutual spatial behavior between two or more
regionalized variables.
Cross-correlation: a technique used to compute a spatial cross-covariance
model between two regionalized variables. This provides a measure of spatial
correlation between the two variables. It produces a bivariate analogue of the
variogram.
Cross-validation: a procedure to check the compatibility between a data set, its
spatial model and neighborhood design. First, each sampled location is removed in
turn and re-estimated by kriging from the other samples in the search neighborhood.
The estimates are then
compared against the true sample values. Significant differences between
estimated values and true values may be influenced by outliers or other
anomalies. This technique is also used to check for biased estimates produced
by poor model and/or neighborhood design.
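A minimal leave-one-out cross-validation is sketched below. It assumes ordinary kriging with a spherical covariance model (no nugget, sill 1, range 50 m), a unique neighborhood containing all remaining samples, and made-up sample coordinates and values; all helper names are hypothetical, and a small Gaussian-elimination solver is included so the example needs only the Python standard library. A mean error and a mean standardized error close to zero are the usual first checks for bias.

```python
import math

def sph_cov(h, sill=1.0, a=50.0):
    """Spherical covariance with no nugget (illustrative parameters)."""
    if h >= a:
        return 0.0
    r = h / a
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_krige(pts, vals, target):
    """Ordinary kriging estimate and kriging variance at target."""
    n = len(pts)
    lhs = [[sph_cov(math.dist(p, q)) for q in pts] + [1.0] for p in pts]
    lhs.append([1.0] * n + [0.0])  # unbiasedness: weights sum to one
    rhs = [sph_cov(math.dist(p, target)) for p in pts] + [1.0]
    sol = solve(lhs, rhs)
    w, mu = sol[:n], sol[n]  # weights and Lagrange multiplier
    est = sum(wi * zi for wi, zi in zip(w, vals))
    var = sph_cov(0.0) - sum(wi * ci for wi, ci in zip(w, rhs[:n])) - mu
    return est, var

def cross_validate(pts, vals):
    """Leave-one-out: krige each sample from all of the others."""
    errors, z_scores = [], []
    for i in range(len(pts)):
        est, var = ordinary_krige(pts[:i] + pts[i + 1:], vals[:i] + vals[i + 1:], pts[i])
        err = est - vals[i]
        errors.append(err)
        z_scores.append(err / math.sqrt(var) if var > 0 else float("nan"))
    return errors, z_scores

# Hypothetical sample locations (m) and attribute values
pts = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5), (20, 5)]
vals = [10.0, 12.0, 11.0, 15.0, 13.0, 14.0]
errors, z = cross_validate(pts, vals)
print("mean error:", sum(errors) / len(errors))
print("mean standardized error:", sum(z) / len(z))
```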
Drift: often used to describe data containing a trend. Drift usually refers to short
scale trends at the size of the neighborhood.
Estimation variance: the kriging variance at each grid node. This is a measure
of global reliability, not a local estimation of error.
Experimental variogram: a measure of spatial dependence (dissimilarity or
increasing variability) of a regionalized variable over some distance and/or
direction. It is computed from the sample data, and the model variogram is
fitted to it.
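The experimental variogram is computed as gamma(h) = (1 / 2N(h)) * sum of [z(x) - z(x+h)]^2, where N(h) is the number of sample pairs separated by approximately h. The sketch below is omnidirectional (no azimuth) and groups pairs into lag classes with a tolerance of plus or minus one-half the lag, as described under Lag; the function name and the test transect are hypothetical.

```python
import math

def experimental_variogram(coords, values, lag, n_lags):
    """Omnidirectional experimental semivariogram.

    Pairs are assigned to the lag class k * lag (k = 1..n_lags) nearest to
    their separation, i.e. with a tolerance of +/- lag / 2.
    Returns a list of (lag distance, gamma or None, number of pairs).
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            k = int(round(h / lag))  # nearest lag class
            if 1 <= k <= n_lags:
                sums[k - 1] += (values[i] - values[j]) ** 2
                counts[k - 1] += 1
    return [
        (k * lag, s / (2 * c) if c else None, c)
        for k, s, c in zip(range(1, n_lags + 1), sums, counts)
    ]

# Hypothetical transect of six equally spaced samples with a linear trend
coords = [(float(x), 0.0) for x in range(6)]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(experimental_variogram(coords, values, lag=1.0, n_lags=3))
```

On this transect, which contains a pure linear trend, gamma grows as h²/2 instead of levelling off at a sill, the typical signature of drift.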
External drift: a geostatistical linear regression technique that uses a spatial
model of covariance when a secondary regionalized variable (e.g. seismic
attribute) is used to control the shape of the final map created by kriging or
simulation.
Geostatistics: the statistical method used to analyze spatially (or temporally)
correlated data and to predict the values of such variables distributed over
distance or time.
h-Scatterplot: a plot obtained by selecting a value for separation distance, h,
then plotting the pairs Z(x) and Z(x+h) as the two axes of a bivariate plot. The
shape and correlation of the cloud are related to the value of the variogram for
distance h.
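Each h-scatterplot can be summarized numerically: half the mean squared difference of its pairs is the semivariance for that lag (geometrically, the moment of inertia of the cloud about the 45-degree line), and the correlation coefficient of the pairs measures how tightly the cloud clusters. The sketch below assumes regularly spaced 1D data with h counted in sample intervals; the porosity values and function names are hypothetical.

```python
import math

def h_scatter_pairs(values, h):
    """Pairs (z(x), z(x+h)) for regularly spaced data; h in sample intervals."""
    return [(values[i], values[i + h]) for i in range(len(values) - h)]

def semivariance_from_pairs(pairs):
    """Half the mean squared difference: the moment of inertia of the
    h-scatterplot cloud about the 45-degree line."""
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))

def pearson(pairs):
    """Correlation coefficient of the h-scatterplot cloud."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical porosity samples at a regular spacing along a well
porosity = [0.12, 0.14, 0.13, 0.17, 0.18, 0.16, 0.20, 0.21]
for h in (1, 2, 3):
    pairs = h_scatter_pairs(porosity, h)
    print(h, round(semivariance_from_pairs(pairs), 6), round(pearson(pairs), 3))
```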
Histogram: a plot that shows the frequency or number of occurrences (Y-axis)
of data falling into size classes of equal width (X-axis).
Indicator variable: a binary transformation of data to either 1 or 0, depending on
whether the value of the data point surpasses or falls short of a specified cut-off
value.
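A minimal indicator transform is shown below. It assumes the common convention (used, for example, in GSLIB) of coding values at or below the cut-off as 1; some workflows use the opposite convention. The mean of the indicators is then the proportion of samples at or below the cut-off. The porosity values are hypothetical.

```python
def indicator_transform(values, cutoff):
    """1 where the value is at or below the cutoff, 0 otherwise."""
    return [1 if v <= cutoff else 0 for v in values]

porosity = [0.05, 0.12, 0.08, 0.21, 0.15]  # hypothetical samples
ind = indicator_transform(porosity, 0.10)
print(ind)                  # [1, 0, 1, 0, 0]
print(sum(ind) / len(ind))  # 0.4: fraction of samples at or below the cutoff
```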
Interpolation: estimation technique in which samples located within a certain
search neighborhood are weighted to form an estimate, such as the kriging
technique.
Inverse distance weighting: Non-geostatistical interpolation technique that
assumes that attributes vary according to the inverse of their separation (raised
to some power).
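The inverse distance weighting estimate is Z* = sum(w_i z_i) / sum(w_i), with w_i = 1 / d_i^p. The sketch below assumes a power p = 2 (a common default) and hypothetical sample locations and values. Unlike kriging, the weights depend only on distance, not on any model of spatial covariance.

```python
import math

def idw_estimate(coords, values, target, power=2.0):
    """Inverse distance weighted estimate at target. Weights are 1 / d**power,
    normalized to sum to one; a sample at the target is returned directly."""
    weights = []
    for (x, y), v in zip(coords, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # exact interpolator at sample locations
        weights.append(1.0 / d ** power)
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

coords = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
values = [10.0, 20.0, 30.0]
# The target is equidistant from all three samples, so the result is their plain mean
print(idw_estimate(coords, values, (50.0, 50.0)))
```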
Iso-probability map: maps created by post-processing conditional simulations to
show the value of the regionalized variable at a constant probability threshold,
for example, at the 10th, 50th (median), or 90th percentiles. These maps
provide a level of confidence in the mapped results.
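Iso-probability maps can be produced by sorting, node by node, the values from all realizations and reading off the required percentiles. The sketch below assumes realizations stored as lists of node values in a common order and uses linear interpolation between ranked values (one of several percentile conventions); names and data are hypothetical. In some petroleum reserves usage the labels are reversed (P90 meaning the value exceeded with 90% probability), so the convention should be stated on the map.

```python
def percentile(sorted_vals, p):
    """p-th percentile (0-100) of a sorted list, by linear interpolation."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def iso_probability_maps(realizations, probs=(10, 50, 90)):
    """Return {p: map}, each node holding the p-th percentile across realizations."""
    n_nodes = len(realizations[0])
    maps = {p: [] for p in probs}
    for node in range(n_nodes):
        vals = sorted(r[node] for r in realizations)  # all realizations at this node
        for p in probs:
            maps[p].append(percentile(vals, p))
    return maps

# Eleven hypothetical realizations at two grid nodes
realizations = [[float(k), float(2 * k)] for k in range(11)]
maps = iso_probability_maps(realizations)
print(maps)  # {10: [1.0, 2.0], 50: [5.0, 10.0], 90: [9.0, 18.0]}
```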
Kriging: a method of calculating estimates of a regionalized variable using a
linear combination of weights obtained from a model of spatial correlation. It
assigns weights to samples to minimize estimation variance. The univariate
version of cokriging.
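The ordinary kriging system can be sketched in a few lines. The example below assumes a spherical covariance model with no nugget (sill 1, range 500 m), three hypothetical wells, and a unique neighborhood. A Lagrange multiplier enforces the condition that the weights sum to one, and a small Gaussian-elimination solver keeps the example free of external libraries. Names and data are illustrative only.

```python
import math

def spherical_cov(h, sill=1.0, a=500.0):
    """Spherical covariance C(h) = sill - gamma(h), with no nugget."""
    if h >= a:
        return 0.0
    r = h / a
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)

def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(coords, values, target, cov=spherical_cov):
    """Return (estimate, kriging variance, weights) at the target location."""
    n = len(coords)
    # Left-hand side: sample-to-sample covariances bordered by the unbiasedness row
    lhs = [[cov(math.dist(p, q)) for q in coords] + [1.0] for p in coords]
    lhs.append([1.0] * n + [0.0])
    # Right-hand side: sample-to-target covariances, then 1 for sum(weights) = 1
    rhs = [cov(math.dist(p, target)) for p in coords] + [1.0]
    sol = solve(lhs, rhs)
    weights, mu = sol[:n], sol[n]  # mu is the Lagrange multiplier
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = cov(0.0) - sum(w * c for w, c in zip(weights, rhs[:n])) - mu
    return estimate, variance, weights

# Three hypothetical wells (x, y in metres) with porosity as a fraction
wells = [(100.0, 100.0), (400.0, 150.0), (250.0, 400.0)]
porosity = [0.18, 0.12, 0.15]
est, var, w = ordinary_kriging(wells, porosity, (250.0, 200.0))
print("weights:", [round(x, 3) for x in w])
print("estimate:", round(est, 4), "kriging variance:", round(var, 4))
```

Kriging at a sampled location returns the sample value with zero kriging variance (when there is no nugget), which is a quick sanity check on any implementation.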
Kriging variance: see estimation variance.
Lag: a distance parameter (h) used during computation of the experimental
covariance model. The lag distance typically has a tolerance of ± one-half the
initial lag distance.
Linear estimation method: a technique for making estimates based on a linear
weighted average of values, such as seen in kriging.
Model variogram: a function fitted to the experimental variogram as the basis for
kriging.
Moving neighborhood: a search neighborhood designed to use only a portion of
the control data points during kriging or conditional simulation.
Nested variogram model: a linear combination of two or more variogram
(correlogram) models. It has more than one range showing different scales of
spatial variability; for example, a short-range exponential model combined with a
longer-range spherical model. Often, it involves adding a nugget component to
one of the other models.
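A nested model is simply the sum of its components. The sketch below assumes a nugget of 0.1, an exponential structure (0.3, practical range 200 m), and a spherical structure (0.6, range 1500 m); these values, and the practical-range convention for the exponential model, are illustrative assumptions. Following the convention in the Nugget effect entry, the nugget is zero at h = 0 and constant at all other distances.

```python
import math

def nugget(h, c0):
    """Pure nugget: zero at the sample location, c0 at any other distance."""
    return 0.0 if h == 0 else c0

def exponential(h, c, a):
    """Exponential structure; a is the practical range (about 95% of c)."""
    return c * (1.0 - math.exp(-3.0 * h / a))

def spherical(h, c, a):
    """Spherical structure reaching c exactly at the range a."""
    if h >= a:
        return c
    r = h / a
    return c * (1.5 * r - 0.5 * r ** 3)

def nested_variogram(h):
    """Nugget + short-range exponential + long-range spherical (illustrative)."""
    return nugget(h, 0.1) + exponential(h, 0.3, 200.0) + spherical(h, 0.6, 1500.0)

# The total sill (0.1 + 0.3 + 0.6 = 1.0) is approached at large distances
for h in (0.0, 1.0, 200.0, 1500.0, 5000.0):
    print(h, round(nested_variogram(h), 4))
```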
Nonconditional simulation: a method that does not use the control data during
the simulation process; quite often used to observe the behavior of a spatial
model and neighborhood design.
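One simple way to generate a nonconditional simulation on a small set of points is the Cholesky (LU) method: factor the covariance matrix as C = L L^T and multiply L by a vector of independent standard normal values. The sketch below assumes a Gaussian random function with an exponential covariance (sill 1, practical range 30 m) at 1D locations; names and parameters are hypothetical. The method becomes impractical for large grids, where sequential methods such as sequential Gaussian simulation are typically used instead.

```python
import math
import random

def exp_cov(h, sill=1.0, a=30.0):
    """Exponential covariance with practical range a (illustrative)."""
    return sill * math.exp(-3.0 * h / a)

def cholesky(m):
    """Lower-triangular L with L L^T = m, for a positive definite matrix m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def unconditional_simulation(xs, mean=0.0, cov=exp_cov, seed=None):
    """One realization at 1D locations xs: mean + L w, w independent N(0, 1).
    No sample data are used, so the result is nonconditional."""
    rng = random.Random(seed)
    c = [[cov(abs(a - b)) for b in xs] for a in xs]
    low = cholesky(c)
    w = [rng.gauss(0.0, 1.0) for _ in xs]
    return [mean + sum(low[i][k] * w[k] for k in range(i + 1)) for i in range(len(xs))]

# Three realizations sharing the same spatial model but different random draws
xs = [float(x) for x in range(0, 100, 5)]
for s in range(3):
    print([round(v, 2) for v in unconditional_simulation(xs, seed=s)])
```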
Nugget effect: a feature of the covariance model where the experimental points
defining the model do not appear to pass through the origin; extrapolated to zero
distance, the variogram intersects the y-axis above zero. The nugget represents a
chaotic or random component of attribute variability. The nugget model shows
constant variance at all ranges, but is often modeled as zero variance at the
control point (well location). Abbreviated as C0 by convention.
Ordinary (co-)kriging: a technique in which the local mean varies and is re-
estimated based on the control points in the search neighborhood ellipse (moving
neighborhood).
Outliers: data points falling outside about ± 2.5 standard deviations of the mean
value of the sample population, possibly the result of bad data values or local
anomalies.
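A minimal screen for the ±2.5 standard deviation rule is shown below. It assumes the threshold is applied to the sample mean and sample standard deviation; the permeability values are hypothetical. Flagged values are candidates for checking against the source data, since a genuine local anomaly can be real.

```python
import statistics

def flag_outliers(values, k=2.5):
    """Return (index, value) for samples more than k standard deviations
    away from the sample mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) > k * sd]

# Hypothetical core permeabilities (mD); the last value is suspicious
perm = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15, 12, 14, 95]
print(flag_outliers(perm))  # [(12, 95)]
```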
Point kriging: making a kriging estimate at a specific point, for example at a grid
node, or a well location.
Positive definite: see admissibility.
Random function: the random function has two components: (1) a regional
structure component manifesting some degree of spatial auto-correlation
(regionalized variable) and lack of independence in the proximal values of Z(x),
and (2) a local, random component (random variable).
Random variable: a variable created by some random process, whose values
follow a probability distribution, such as a normal distribution.
Range: the distance where the variogram reaches the sill, or when the
correlogram reaches zero correlation. Also known as the correlation range or
correlation scale, it represents the distance at which correlation ceases. It is
abbreviated as a by convention.
Regionalized variable: a variable that has some degree of spatial
auto-correlation and lack of independence in the proximal values of Z(x).
Risk map: see Uncertainty Map
Simple kriging: the global mean is constant over the entire area of interpolation
and is based on all the control points used in a unique neighborhood (or is
supplied by the user).
Semivariogram: a measure of spatial dependence (dissimilarity or increasing
variability) of a regionalized variable over some distance; a plot of dissimilarity
between points as a function of distance between the points. The variogram can
also be calculated with an azimuthal preference. The semivariogram is commonly
called a variogram. See also correlogram.
Sill: the upper level of variance, where the variogram reaches its correlation
range. The variance of the sample population is the theoretical sill of the
variogram.
Smearing: a condition produced by the interpolation process where high-grade
attributes are allowed to influence the estimation of nearby lower grades.
Stationarity: the simplest definition is that the data do not exhibit a trend; spatial
statistical homogeneity. This implies that a moving window average shows
homogeneity in the mean and variance over the study area.
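The moving-window check for stationarity can be sketched as follows: compute the mean and variance in each window along a transect and look for systematic change. The window size, step, and porosity transect are hypothetical. This transect has a deliberate linear trend, so the window means rise steadily, indicating the data are not stationary in the mean.

```python
import statistics

def moving_window_stats(values, window, step=1):
    """(mean, population variance) for each window of consecutive samples."""
    return [
        (statistics.mean(values[s:s + window]), statistics.pvariance(values[s:s + window]))
        for s in range(0, len(values) - window + 1, step)
    ]

# Hypothetical porosity transect with a steady increase (a trend)
transect = [0.10 + 0.01 * i for i in range(12)]
for mean, var in moving_window_stats(transect, window=4, step=4):
    print(round(mean, 4), round(var, 6))
```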
Stochastic modeling: used interchangeably with conditional simulation,
although not all stochastic modeling applications necessarily use control data.
Support: the size, shape, and geometry of volumes upon which we estimate a
variable. As a consequence, attributes measured on a small support are more
variable than those measured on a larger support.
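The support effect can be illustrated by averaging point values into progressively larger blocks and comparing variances. The sketch below assumes independent, normally distributed point values (mean 0.15, standard deviation 0.03) generated with a fixed seed. With independent values the variance drops roughly in proportion to the block size, whereas spatially correlated data show a smaller reduction. Names and data are hypothetical.

```python
import random
import statistics

def block_average(values, size):
    """Average consecutive, non-overlapping groups of `size` point values."""
    return [statistics.mean(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

rng = random.Random(42)
# Hypothetical point-support porosity values, independent for simplicity
points = [rng.gauss(0.15, 0.03) for _ in range(1000)]

variances = {size: statistics.variance(block_average(points, size)) for size in (1, 10, 50)}
for size, var in variances.items():
    print(f"support {size:>2} samples: variance {var:.6f}")
```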
Transformation: a mathematical process used to convert the frequency
distribution of a data set, for example, from lognormal to normal.
Unique neighborhood: a neighborhood search ellipse that uses all available
data control points. The practical limit is 100 control points. A unique
neighborhood is used with simple kriging.
Uncertainty map: a map created by post-processing conditional simulations. A
threshold value is selected (for example, 8% porosity), and the uncertainty map
shows, at each grid node, the probability that porosity is either above or below
the chosen threshold.
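An uncertainty map of this kind is the fraction of realizations, node by node, that fall above (or at or below) the threshold. The sketch below assumes realizations stored as lists of node values in a common order, an 8% porosity threshold (0.08 as a fraction), and four hypothetical realizations at three nodes; in practice many more realizations are used.

```python
def probability_map(realizations, threshold, above=True):
    """Fraction of realizations at each node above (or at/below) threshold."""
    n_real = len(realizations)
    n_nodes = len(realizations[0])
    probs = []
    for node in range(n_nodes):
        if above:
            hits = sum(1 for r in realizations if r[node] > threshold)
        else:
            hits = sum(1 for r in realizations if r[node] <= threshold)
        probs.append(hits / n_real)
    return probs

# Four hypothetical porosity realizations at three grid nodes
realizations = [
    [0.06, 0.09, 0.12],
    [0.07, 0.10, 0.11],
    [0.09, 0.07, 0.13],
    [0.05, 0.08, 0.10],
]
print(probability_map(realizations, 0.08))  # [0.25, 0.5, 1.0]
```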
Variogram: geostatistical measure used to characterize the spatial variability of
an attribute.
Weights: values determined during an interpolation or simulation that are
multiplied by the control data points in the determination of the final estimated or
simulated value at a grid node. To ensure unbiasedness when the mean is unknown
(as in ordinary kriging), the weights, λ, are constrained to sum to unity.
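The unit-sum condition on the weights follows from a short derivation, assuming the regionalized variable has a constant but unknown mean m within the neighborhood, with Z*(x0) the weighted estimate at location x0:

```latex
E\left[Z^*(x_0)\right]
  = E\left[\sum_{i=1}^{n} \lambda_i Z(x_i)\right]
  = \sum_{i=1}^{n} \lambda_i \, E\left[Z(x_i)\right]
  = m \sum_{i=1}^{n} \lambda_i
\\[6pt]
E\left[Z^*(x_0) - Z(x_0)\right]
  = m\left(\sum_{i=1}^{n} \lambda_i - 1\right) = 0
  \ \text{for every } m
  \iff \sum_{i=1}^{n} \lambda_i = 1
```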
SUGGESTED REFERENCE
Olea, R. A., 1991, Geostatistical Glossary and Multilingual Dictionary, New York,
Oxford University Press, 177 pages.

The other two scales. interval and ratio. each more rigorously defined than its predecessor.” “B.” Symbols like “A. for sand. and thereby determines the type of data analysis (Davis. 1986). . The manner in which numerical values are assigned determines the measurement scale. such as “red. and thus containing more information. Measurements are numerical values that reflect the amount or magnitude of some property. Probability function for a random variable Lag distance (distance between two sample points) Sample mean Population size Sample size (or number of observations in a data set) Observed frequencies Outcomes Probability Proportion Standard deviation of a sample Variance Random variable A single value of a random variable MEASUREMENT SYSTEMS Because the conclusions of a quantitative study are based in part on inferences drawn from measurements. 2 and 3.” because they involve determinations of the magnitude of an observation (Davis. or that 3 is “greater than” 2.” or numbers are also often used. in which we classify observations into exclusive categories. respectively. and may therefore code the facies as 1.” “green.” “C. siltstone. it is important to consider the nature of the measurement systems from which data are collected. there is no connotation that 2 is “twice as much” as 1. Nominal Scale This measurement classifies observations into mutually exclusive categories of equal rank. 1986). are the ones we normally think of as “measurements. The first two are the nominal and ordinal scales. and shale. In geostatistics. There are four measurement scales. we may wish to predict facies occurrence.Letter Name E F f f h m N n O o P p s V X x Statistical Notation Event Distribution Frequency. Using this scale.” or “blue.

whereas the geographical coordinates are on an interval scale.) Ratio Scale Ratios not only have equal increments between steps. In the petroleum industry. in trend surface analysis. or thickness. but there are practical limits for the measurements. it is possible to have negative values. The step between successive states is not equal in this scale. A change from 10 to 20 degrees C is the same as the change from 110 to 120 degrees C. but it applies to numerical characteristics of the sample data. (It would be hard to conceive of negative porosity. Within the petroleum industry. because they have units of length. Many geological measurements are based on a ratio scale. Thus. the infinite population might be all wells drilled in the Gulf of Mexico. such elements are measurements or observations made on items of a specific type (porosity or permeability. Ratio scales represent the highest forms of measurement. kerogen types are based on an ordinal scale. A sample is a subset of elements drawn from the population (Davis. volume. The most commonly cited example of an interval scale is temperature. A classic example taken from geology is Mohs‟ scale of hardness. A statistic is similar to a parameter. the independent variable may be measured on a ratio scale. All types of mathematical and statistical operations are performed with them. A finite population might consist of all the wells drilled in the Gulf of Mexico in 1999. for example). and they may occur intermixed in the same problem. Typically. And Statistics Populations possess certain numerical characteristics (such as the population mean) which are known as parameters. no distinction is made between the two. Interval Scale This scale is so named because the width of successive intervals is constant. 1986). indicative of stages of organic diagenesis. reservoir properties are measured along a continuum. whereas. Parameters. and future. present. or of porosity greater than 100%. 
POPULATIONS AND SAMPLES INTRODUCTION Statistical analysis is built around the concepts of “populations” and “samples.Ordinal Scale Observations are sometimes ranked hierarchically. in which mineral rankings extend from one to ten. More specifically. permeability. a population is the entire collection of those elements. mass. Samples are studied in order to make inferences about the population itself. This scale is commonly used for many measurements. or a point where the magnitude is nonexistent. past. Commonly. but also have a zero point.” A population consists of a well-defined set of elements (either finite or infinite). with higher ranks signifying increased hardness. An interval scale does not have a natural zero. Data. For most of our geostatistical studies. we will be primarily concerned with the analysis of interval and ratio data. For example. . Data are measured or observed values obtained by sampling the population. and so forth.

is given by the equation: Cn  n!(N  n)! Where: CNn = the number of combinations of samples N = the number of elements in the population n = the number of elements in the sample If the sampling is conducted in a manner such that each of the C Nn samples has an equal chance of being selected. a random sample must be unbiased. a parameter consists of a fixed value. so that selecting one item from the population has no influence on the selection of other items in the population. Replacement The issue of replacement plays an important role in our sampling strategy. N. while the values from Samples (statistics) are assigned Roman letters. and may change by drawing more than one sample from the same population. For example. as the sample size increases. Random sampling produces an unbiased and independent result. then draw another card Or N N! . if we were to draw samples of cards from a population consisting of a deck. the random sample must be independent. the sampling program is said to be random and the result is a random sample (Mendenhall.  First. which does not change. we could either:  Draw a card from the deck.  Second. One way to determine whether random samples are being drawn is to analyze sampling combinations. the value of a statistic is not fixed. Unlike the parameter. Random Sampling Samples should be acquired from the population in a random manner. we have a better chance of understanding the true nature (distribution) of the population. Random sampling is defined by two properties. 1971). so that. and add it‟s value to our hand.Within the population. Statistics are used to estimate parameters or test hypotheses about the parent population (Davis. so that each item in the sample has the same chance of being chosen as any other item in the sample. Remember that values from Populations (parameters) are often assigned Greek letters. The number of different samples of n measurements that can be drawn for the population. 
Sampling Methods The method of sampling affects our ability to draw inferences about our data (such as estimation of values at unsampled locations) because we must know the probability of an observation in order to arrive at a statistical inference. 1986).

Sampling without replacement prevents us from sampling that value again. Similarly. The sampling routine (also known as the drilling program) is highly biased and dependent. Events can be classified by there relationship to one another: Independent Events Events are classified as Independent if the occurrence of event A has no bearing on the occurrence of event B. and rightly so -any drilling program will be biased toward high porosity. EVENTS An event is a collection of possible outcomes. In the oil industry. (We will discuss bias in more detail in our discussion of summary statistics. a trial is an experiment that produces an outcome which consists of either a success or a failure. then draw a card from the deck again. AND PROBABILITY INTRODUCTION In statistical parlance. note it‟s value. the sampling is considered biased. Furthermore.) However. the process of drilling wells in a reservoir necessarily involves sampling without replacement. And the success or failure of nearby wells will influence further drilling. An event is a collection of possible outcomes of a trial. our task is to infer properties about the entire reservoir from our sample data set. we need to use various statistical tools to understand and summarize the properties of the samples to make inferences about the population (reservoir). Oilfield Applications to Sampling When observations having certain characteristics are systematically excluded from the sample. in the second case we sample with replacement. then provides an overview on probability. high permeability. and this collection may contain zero or more outcomes. Probability is a measure of the likelihood that an event will occur. . Because the sample data set represents a minuscule subset of the population. Draw a card from the deck. while sampling with replacement allows us the chance to pick that same value again in our sample. Typically. we may be interested in the pore volume of a particular reservoir unit for pay estimation. 
despite these limitations. and ultimately. for example. The following discussion introduces events and their relation to one another. thus deliberately biasing the true pore volume to a larger value. we use a threshold or porosity cutoff when making the calculation. or a measure of that event‟s relative frequency. depending on how many trials are conducted. TRIALS. and vice versa. EVENTS. high production. Suppose. To accomplish this. high structural position. any sample data set will provide only a sparse and incomplete picture of the entire reservoir. whether deliberately or inadvertently. we face this situation quite frequently. and put it back in the deck. In the first case. we sample without replacement. we will never really know that actual population distribution function of the reservoir.

such as a percentage scale. The chance of rain is an example of discrete probability. Discrete Probability All of us have an intuitive concept of probability. and  1 represents certainty that the event will occur. and imply that there is a 70% chance it will not rain. 1986: Mendenhall. used almost universally in statistics texts. or graph providing the probability associated with each value of the random variable (Mendenhall. Coin Toss Experiment Coin tossing is a clear-cut example of discrete probability. it either will or it will not rain. The probability distribution for a discrete random variable is a formula. 1971. and E = the event Consider the following classic example of discrete probability. table. 1986). if asked to guess whether it will rain tomorrow. Mutually Exclusive Events Events are Mutually Exclusive if the occurrence of either event precludes the occurrence of the other. or a measure of that event‟s relative frequency. . For a discrete distribution. most of us would reply with some confidence that rain is either likely or unlikely. it must come up either heads or tails (Davis. we will review discrete and conditional probabilities. 1971). 1971). Davis. you might say that there is a 30% chance of rain tomorrow. except for the vanishingly small possibility that the coin will land precisely on edge. The event has two states and must occupy one or the other. PROBABILITY Probability is a measure of the likelihood that an event will occur.Dependent Events Events are classified as Dependent if the occurrence of event A influences the occurrence of event B. For example. where:  0 represents no chance of occurrence. Thus. The measure of probability is scaled from 0 to 1. Two events that are independent events cannot be mutually exclusive. probability can be defined by the following: P(E) = number of outcomes corresponding to event E total number of possible outcomes Where: P = the probability of a particular outcome. In this discussion. 
Probability is just one tool that enables the statistician to use information from samples to make inferences or describe the population from which the samples were obtained (Mendenhall. Another way of expressing the estimate is to use a numerical scale.

This does not imply that every other toss results in a head. . where y = Number of Heads y 0 1 2 Sample Points in y E4 E2. while y =2 contains one sample point. E3 E1 p(y) ¼ ½ ¼ Thus. When a single coin is tossed. Because each outcome is equally likely. y = 1 to sample point E2. y =1 contains two sample points. etc. hence the theoretical relative probability for y = 1 is ½. and y = 2.The experiment is conducted by tossing two unbiased coins. 1971). The histogram is shown in Figure 1 (Probability Histogram for p(y) (modified from Davis. Because p(0) = ¼. for this experiment there is a 25% chance of observing two heads from a single toss of the two coins. y = 1. The sample points for this experiment with their respective probabilities are given below (taken from Mendenhall. p(1) = ½. 1986)). The probability of each value of y may be calculated by adding the probabilities of the sample points in the numerical event. E4. the probability of obtaining a head is ½. E1. but given enough tosses. heads will appear one-half the time. The numerical event y = 0 contains one sample point. etc. The histogram contains three classes for the random variable y. Sample Point E1 E2 E3 E4 Coin 1 H H T T Coin 2 H T H T P(EI) ¼ ¼ ¼ ¼ y 2 1 1 0 Let y equal the number of heads observed. We assign the value y = 2 to sample point E1. corresponding to y = 0. the theoretical relative frequency for y = 0 is ¼. it has two possible outcomes: heads or tails. E2 and E3. The Probability Distribution Function for y. Now let us look at the two-coin example.

is the fraction of the entire population of observations which result in snow. but we would expect the chance of snow. If you repeated the experiment with 1000 coin tosses. Such a dependence on a prior event describes the concept of Conditional Probability: the chance that a particular event will occur depends on whether another event occurred previously. by throwing two balanced coins. and recorded the number of heads observed each time to construct a histogram for the 100 measurements. say 100 times. called the conditional probability of A given B. your histogram would appear very similar to that of Figure 1. is not the same as the probability of snow given the prior information that the temperature is below freezing. the conditional probability that event A will occur given that event B has occurred already is written as: P(A|B) . because once a well is drilled. Conditional Probability The concept of conditional probability is key to oil and gas exploration. Obviously. and allows us to revise our estimates of the probability of further outcomes or events. P(A). given freezing temperatures. Two events are often related in such a way that the probability of occurrence of one event depends upon whether the other event has or has not occurred. but the probability of snow. In statistical notation. The probability of snow. Now examine the sub-population of observations resulting in B. it makes more information available. P(A). A. This fraction. may equal P(A). the similarity would be even more pronounced. temperature below freezing. events A and B are related. and the fraction of these resulting in snow. For example. to be larger. suppose an experiment consists of observing weather on a specific day. Let event A = „snow‟ and B = „temperature below freezing‟.Figure 1 If you were to draw a sample from this population.

where the vertical bar in the parentheses means “given” and events appearing to the right of the bar have occurred (Mendenhall. given that event B occurred at some time in the past. P(AB) = 0 and P(A  B) = P(A) + P(B) Multiplicative Law of Probability The second law of probability is called the Multiplicative Law of Probability. event relations. which applies to unions. Additive Law of Probability Another approach to probability problems is based upon the classification of compound events. and two probability laws. et al. which applies to intersections. The first is the Additive Law of Probability. given that event A has already occurred P(A) = the probability that event A will occur P(B | A') = the probability that event B will occur. (1994) entitled Bayesian Sequential Indicator Simulation of Channel Sands in the Oseberg Field. Quite often. . Bayes‟ Theorem for the probability of causes follows easily from the definition of conditional probability: P(A | B)  P( B | A) P( A) P ( B | A ) P ( A )  P( B | A' ) P ( A ' ) Where: P(A | B) = the probability that event A will occur. 1971). given that event A has not already occurred P(A') = the probability that event A will not occur A practical geostatistical application using Bayes‟ Theorem is described in an article by Doyen. A. The probability of the union (A  B) is equal to: P(A  B) = P(A) + P(B) -P(AB) If A and B are mutually exclusive. given that event B has already occurred P(B | A) = the probability that event B will occur. Thus. we wish to find the conditional probability of an event. Norwegian North Sea. we define the conditional probabilities of A given B as: P(A|B) = P(AB) P(B) and we define the conditional probabilities of B given A as follows: P(B|A) = P(AB) P(A) Bayes’ Theorem on Conditional Probability Bayes‟ Theorem allows the conditional probability of an event to be updated as newer information becomes available.

Given two events, A and B, the probability of the intersection, A ∩ B, is equal to:
$$P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$$
If A and B are independent, then $P(A \cap B) = P(A)\,P(B)$.
RANDOM VARIABLES AND THEIR PROBABILITY DISTRIBUTIONS
INTRODUCTION
Geoscientists are often tasked with estimating the value of a reservoir property at a location where that property has not been previously measured. The estimation procedure must rely upon a model describing how the phenomenon behaves at unsampled locations. Without a model, there is only sample data, and no inference can be made about the values at locations that were not sampled. The underlying model and its behavior is one of the essential elements of the geostatistical framework. Unlike many other estimation methods (such as linear regression, inverse distance, or least squares) that do not state the nature of their model, geostatistical estimation methods clearly identify the basis of the models used (Isaaks and Srivastava, 1989). Random variables and their probability distributions form the foundation of the geostatistical method. In this section, we define the random variable and briefly review the essential concepts of important probability distributions. The random variable is further explained later, in Spatial Correlation Analysis and Modeling.
THE PROBABILISTIC APPROACH
Deterministic models are applicable only when the process that generated the data is known in sufficient detail to enable an accurate description of the entire population to be made from only a few sample values. Unfortunately, few reservoir processes are understood well enough to permit application of deterministic models. Although we know the physics or chemistry of the fundamental processes, the variables we study in reservoir data sets are often the product of complex interactions that are not fully quantifiable. These processes include, for example, depositional mechanisms, tectonic processes, and diagenetic alterations. For most reservoir data sets, we must accept that there is an unavoidable degree of uncertainty about how the attribute behaves between sample locations (Isaaks and Srivastava, 1989). Thus, a probabilistic approach is required, and the random function models introduced herein recognize this fundamental uncertainty, providing us with tools to estimate values at unsampled locations. The following discussion describes the two kinds of random variables. Next, we'll discuss the probability distributions or functions associated with each type of random variable.

RANDOM VARIABLE DEFINED
A random variable can be defined as a numerical outcome of an experiment whose values are generated randomly according to some probabilistic mechanism. A random variable associates a unique numerical value with every outcome, so the value of the random variable will vary with each trial as the experiment is repeated. The throwing of a die, for example, produces values randomly from the set 1, 2, 3, 4, 5, 6. The coin toss is another experiment that produces numbers randomly. (In the case of a coin toss, however, we need to assign a numerical value to each face, such as 0 for "heads" and 1 for "tails"; then we can draw randomly from the set 0, 1.)
TWO CLASSES OF RANDOM VARIABLES
There are two different classes of random variables, with the distinction based on the sample interval associated with the measurement. The two classes are the discrete and the continuous random variable. We will discuss each in turn.
Discrete Random Variables
A discrete random variable may be identified by the number and nature of the values it assumes: it may assume only separate, countable values (distinct values being the operative phrase here, e.g., 0, 1, 2, 3, 4, 5, as opposed to each and every number between 0 and 1, of which there are infinitely many). In most practical problems, discrete random variables represent count (or enumerated) data, such as point counts of minerals in a thin section. The die and coin toss experiments also generate discrete random variables. Discrete random variables are characterized by a probability distribution, which may be described by a formula, table or graph that provides the probability associated with each value of the discrete random variable. The probability distribution function of discrete random variables may be plotted as a histogram. Refer to Figure 1 (Probability histogram) as an example histogram for a discrete random variable.

[Figure 1: Probability histogram]
Frequency Tables and Histograms
Discrete random variables are often recorded in a frequency table, and displayed as a histogram. A frequency table records how often data values fall within certain intervals or classes. A histogram is a graphical representation of the frequency table. It is common to use a constant class width for a histogram, so that the height of each bar is proportional to the number of values within that class. Data is conventionally ranked in ascending order, and thus can be represented as a cumulative frequency histogram, where the total number of values below certain cutoffs are shown, rather than the total number of values in each class.

Table 1 Frequency and cumulative frequency tables of 100 values, X, with a class width of one (modified from Isaaks and Srivastava, 1989).
Class Interval | Frequency (Occurrences) | Frequency (Percentage) | Cumulative Number | Cumulative Percentage
0-1 | 1 | 1 | 1 | 1
1-2 | 1 | 1 | 2 | 2
2-3 | 0 | 0 | 2 | 2
3-4 | 0 | 0 | 2 | 2
4-5 | 3 | 3 | 5 | 5
5-6 | 2 | 2 | 7 | 7
6-7 | 2 | 2 | 9 | 9
7-8 | 13 | 13 | 22 | 22
8-9 | 16 | 16 | 38 | 38
9-10 | 11 | 11 | 49 | 49
10-11 | 13 | 13 | 62 | 62
11-12 | 17 | 17 | 79 | 79
12-13 | 13 | 13 | 92 | 92
13-14 | 4 | 4 | 96 | 96
>14 | 4 | 4 | 100 | 100
Figure 2a and 2b display Frequency and Cumulative frequency histograms of data in Table 1 (modified from Isaaks and Srivastava, 1989).
[Figure 2a: Frequency histogram]
[Figure 2b: Cumulative frequency histogram]
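As a sketch of how the cumulative columns of Table 1 follow from the class counts, the Python snippet below accumulates the counts and converts them to percentages. The helper `frequency_table` is a hypothetical name used only for illustration; because the table holds exactly 100 values, each percentage equals its count.

```python
from itertools import accumulate

# Class labels and counts copied from Table 1 (class width of one).
classes = ["0-1", "1-2", "2-3", "3-4", "4-5", "5-6", "6-7", "7-8",
           "8-9", "9-10", "10-11", "11-12", "12-13", "13-14", ">14"]
counts = [1, 1, 0, 0, 3, 2, 2, 13, 16, 11, 13, 17, 13, 4, 4]


def frequency_table(labels, freqs):
    """Return rows of (class, count, percent, cumulative count, cumulative percent)."""
    total = sum(freqs)
    cumulative = list(accumulate(freqs))  # running total of the counts
    return [
        (label, f, 100.0 * f / total, c, 100.0 * c / total)
        for label, f, c in zip(labels, freqs, cumulative)
    ]


if __name__ == "__main__":
    for row in frequency_table(classes, counts):
        print("{:>6} {:>4} {:>6.1f} {:>5} {:>6.1f}".format(*row))
```

Plotting the second column as bar heights gives the frequency histogram (Figure 2a); plotting the fourth gives the cumulative frequency histogram (Figure 2b).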

Continuous Random Variables
These variables are defined by an infinitely large number of possible values (much like a segment of a number-line, which can be repeatedly subdivided into smaller and smaller intervals to create an infinite number of increments). In most practical problems, continuous random variables represent measurement data, such as the length of a line, or the thickness of a pay zone. The probability density function of the continuous random variable may be plotted as a continuous curve. Although such curves may assume a variety of shapes, it is interesting to note that a very large number of random variables observed in nature approximate a bell-shaped curve. A statistician would say that such a curve approximates a normal distribution (Mendenhall, 1971). (Sometimes, the histograms are converted to continuous curves by running a line through the midpoint of each bar in the histogram. This process may be convenient for comparing continuous and discrete random variables, but may tend to confuse the presentation.)
Probability Distributions Of The Discrete Random Variable
The probability distribution of a discrete random variable consists of the relative frequencies with which a random variable takes each of its possible values. Four common probability distributions for discrete random variables are: Binomial, Negative Binomial, Poisson, and Hypergeometric. Each of these distributions is discussed using practical geological examples taken from Davis (1986).
Binomial Probability Distribution
Binomial distributions only apply to a special type of discrete random variable, called a binary variable. Binary variables can only have two values, such as ON or OFF, SUCCESS or FAILURE, 0 or 1. (Often, values such as ON or OFF, and SUCCESS or FAILURE, will be assigned the numerical values of 1 or 0 respectively.) The number of heads obtained in a series of coin tosses, for example, follows a binomial distribution; a die-throwing experiment does so only if each throw is scored as a success or a failure (e.g., "six" or "not six"). Thus, binomial distributions are only valid for trials in which there are only two possible outcomes for each trial. Furthermore, the total number of trials must be fixed beforehand, all of the trials must have the same probability of success, and the outcomes of all the trials must not be influenced by the outcomes of previous trials. We'll consider how the binomial distribution can be applied to the following oilfield example.
Problem: Forecast the probability of success of a drilling program.
Assumptions: Each wildcat is classified as either:
0 = Failure (dry hole)
1 = Success (discovery)
The binomial distribution is appropriate when a fixed number of wells will be drilled during an exploratory program or during a single period (budget cycle) for which the forecast is made. In this case, each well that is drilled in turn is presumed to be independent; this means that the success or failure of one hole does not influence the outcome of the next. Similarly, the probability of discovery is presumed to remain unchanged as successive wildcats are drilled. (This is true initially, but, as Davis pointed out in 1986, the assumption is difficult to justify in most cases, because a discovery or failure influences the selection of subsequent drilling locations.)
The probability p that a wildcat well will discover gas or oil is estimated using an industry-wide success ratio for drilling in similar areas, or based on the company's own success ratio. Sometimes the success ratio is a subjective "guess." From p, the binomial model can be developed for exploratory drilling as follows:
$p$: the probability that a hole will be successful.
$1 - p$: the probability of failure.
$(1 - p)^{n}$: the probability that n successive wells will be dry.
$(1 - p)^{n-1}\,p$: the probability that the nth hole will be a discovery, but the preceding (n - 1) holes will be dry.
$n\,(1 - p)^{n-1}\,p$: the probability of drilling one discovery well in a series of n wildcat holes, where the discovery can occur in any of the n wildcats.
$(1 - p)^{n-r}\,p^{r}$: the probability that (n - r) dry holes will be drilled, followed by r discoveries.
However, the (n - r) dry holes and the r discoveries may be arranged in combinations, or equivalently, in $\frac{n!}{(n-r)!\,r!}$ different ways, resulting in the equation for the probability that r discoveries will be made in a drilling program of n wildcats:
$$P = \frac{n!}{(n-r)!\,r!}\,(1-p)^{n-r}\,p^{r}$$
This is an expression of the binomial distribution, and gives the probability that r successes will occur in n trials, when the probability of success in a single trial is p.
For example, suppose we want to find the probability of success associated with a 5-well exploration program in a virgin basin where the success ratio is anticipated to be about 10%. What is the probability that the entire exploration program will be a total failure, with no discoveries? The terms of the equation are:
n = 5, r = 0, p = 0.10
$$P = \frac{5!}{5!\,0!}\,(0.9)^{5}\,(0.1)^{0} = 0.59$$
Where:
P = the probability of the outcome (here, exactly r discoveries)
r = the number of discovery wells
p = anticipated success ratio
n = the number of holes drilled in the exploration program
The probability of no discoveries resulting from the exploratory effort is almost 60%. Using either the binomial equation or a table for the binomial distribution, Figure 3 (Discrete distribution giving the probability of making n discoveries in a five-well drilling program when the success ratio (probability of discovery) is 10% (modified from Davis, 1986)) shows the probabilities associated with all possible outcomes of the five-well drilling program.
[Figure 3]
Negative Binomial Probability Distribution
Other discrete distributions can be developed for experimental situations with different basic assumptions.
Problem: Drill as many holes as needed to discover two new fields in a virgin basin.
Assumption: The same conditions that govern the binomial distribution are assumed, except that the number of "trials" is not fixed.
We can develop a Negative Binomial Probability Distribution to find the probability that x dry holes will be drilled before r discoveries are made. Thus we can investigate the probability that it will require 2, 3, 4, …, up to n exploratory wells before two discoveries are made. The probability distribution governing such an experiment is the negative binomial. The expanded form of the negative binomial equation is
$$P = \frac{(r+x-1)!}{(r-1)!\,x!}\,(1-p)^{x}\,p^{r}$$

Where:
P = the probability of the outcome (here, exactly x dry holes before the r-th discovery)
r = the number of discovery wells
x = the number of dry holes
p = regional success ratio
If the regional success ratio is 10%, the probability that a two-hole exploration program will meet the company's goal of two discoveries can be calculated:
r = 2, x = 0, p = 0.10
$$P = \frac{1!}{1!\,0!}\,(0.9)^{0}\,(0.1)^{2} = 0.01$$
The calculated probabilities are low because they relate to the likelihood of obtaining two successes and exactly x dry holes (in this case, x = 0). It may be more appropriate to consider the probability distribution that more than x dry holes must be drilled before the goal of r discoveries is achieved. We do this by first calculating the cumulative form of the negative binomial, as shown in Figure 4 (Discrete distribution giving the cumulative probability that two discoveries will be made by or before a specified hole is drilled, when the success ratio is 10% (modified from Davis, 1986)). This gives the probability that the goal of two successes will be achieved in (x + r) or fewer holes.
[Figure 4]
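The binomial and negative binomial expressions above can be checked numerically. The sketch below is a minimal illustration assuming Python 3.8+ (for `math.comb`); the function names are hypothetical helpers, and `comb(r + x - 1, x)` is used because it equals (r + x - 1)! / [(r - 1)! x!].

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r discoveries in n wildcats with success ratio p."""
    return comb(n, r) * (1 - p) ** (n - r) * p ** r


def neg_binomial_pmf(x, r, p):
    """Probability of exactly x dry holes before the r-th discovery."""
    # comb(r + x - 1, x) == (r + x - 1)! / ((r - 1)! * x!)
    return comb(r + x - 1, x) * (1 - p) ** x * p ** r


def neg_binomial_cdf(x, r, p):
    """Probability of reaching r discoveries with at most x dry holes,
    i.e. in (x + r) or fewer holes."""
    return sum(neg_binomial_pmf(k, r, p) for k in range(x + 1))


if __name__ == "__main__":
    p = 0.10
    # Five-well program: probability of 0, 1, ..., 5 discoveries (compare Figure 3).
    for r in range(6):
        print(f"{r} discoveries: {binomial_pmf(r, 5, p):.4f}")
    # Two-discovery goal: cumulative probability by hole number (compare Figure 4).
    for holes in range(2, 11):
        print(f"by hole {holes}: {neg_binomial_cdf(holes - 2, 2, p):.4f}")
```

Evaluating `binomial_pmf(0, 5, 0.10)` gives 0.59049 (the "almost 60%" chance of total failure), and `neg_binomial_pmf(0, 2, 0.10)` gives 0.01 up to floating-point rounding. `neg_binomial_cdf` accumulates the negative binomial probabilities into the cumulative values plotted in Figure 4.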

Each of these probabilities is then subtracted from 1.0 to yield the desired probability distribution illustrated in Figure 5 (Discrete distribution giving the probability that more than a specified number of holes must be drilled to make two discoveries, when the success ratio is 10% (modified from Davis, 1986)).
[Figure 5]
Poisson Probability Distribution
A Poisson random variable is typically a count of the number of events that occur within a certain time interval or spatial area. The Poisson probability distribution seems to be a reasonable approach to apply to a series of geological events. For example, the historical record of earthquakes in California, the record of volcanic eruptions in the Mediterranean, or the incidence of landslides related to El Niño along the California coast can be characterized by Poisson distributions. The Poisson probability model assumes that:
- events occur independently,
- the probability that an event will occur does not change with time,
- the length of the observation period is fixed in advance,
- the probability that an event will occur in an interval is proportional to the length of the interval, and
- the probability of more than one event occurring at the same time is vanishingly small.

When the probability of success becomes very small, the Poisson Distribution can be used to approximate the binomial distribution with parameters n and p. This is a discrete probability distribution regarded as the limiting case of the binomial when:
- n, the number of trials, becomes very large, and
- p, the probability of success on any one trial, becomes very small.
The Poisson distribution does not require either n or p directly, because we use the product np = λ instead, which is given by the rate of occurrence of events. The equation in this case is
$$p(X) = \frac{e^{-\lambda}\,\lambda^{X}}{X!}$$
Where:
p(X) = probability of occurrence of the discrete random variable X
λ = rate of occurrence
Note that the rate of occurrence, λ, is the only parameter of the distribution.
Hypergeometric Probability Distributions
When wells are drilled on a limited number of prospects, the binomial distribution would not be appropriate for calculating the probability of discovery, because the chance of success changes with each wildcat well. For example, we can use Statistics to argue two distinctly contradictory cases:
- Discovery of one reservoir increases the odds against finding another (fewer fields remaining).
- Drilling a dry hole increases the probability that the remaining untested features will prove productive.
What we need is to find all possible combinations of producing and dry features within the population, then enumerate those combinations that yield the desired number of discoveries. The probability distribution generated by sampling without replacement is called a hypergeometric distribution. Consider the following:
Problem: An offshore concession contains 10 seismic anomalies, with a historical success ratio of 40%; assume that four structures are productive. Our limited budget will permit only three anomalies to be drilled. What will be the number of discoveries?
The probability of making x discoveries in a drilling program consisting of n holes, when sampling from a population of N prospects of which S are believed to contain commercial reservoirs, is
$$P = \frac{\binom{S}{x}\binom{N-S}{n-x}}{\binom{N}{n}}$$
Where:
x = the number of discoveries
N = the number of prospects in the population
n = the number of holes drilled
S = the number of commercial reservoirs
This expression represents the number of combinations of reservoirs, taken by the number of discoveries, times the number of combinations of barren anomalies, taken by the number of dry holes, all divided by the number of combinations of all prospects taken by the total number of holes in the drilling program (Davis, 1989). Applying this to our offshore concession example containing ten seismic anomalies, of which four are likely to be reservoirs, what are the probabilities associated with a three-well drilling program?
- The probability of total failure, with no discoveries among the three structures drilled, is about 17%.
- The probability of one discovery is about 50%.
Note that the probability of some success is (1.00 - 0.17), or 83%. A histogram of all possible outcomes of this exploration strategy is shown in Figure 6 (Discrete distribution giving the probability of n discoveries in three holes drilled on ten prospects, when four of the ten contain reservoirs (modified from Davis, 1986)).
[Figure 6]
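The Poisson and hypergeometric formulas above translate directly into code. The following minimal Python sketch uses only the standard `math` module; the function names are hypothetical, and the Poisson comparison values (n = 100, p = 0.02) are illustrative assumptions rather than figures from Davis (1986).

```python
from math import comb, exp, factorial


def poisson_pmf(k, lam):
    """Poisson probability of k events when the rate of occurrence is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def binomial_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * (1 - p) ** (n - k) * p ** k


def hypergeom_pmf(x, N, S, n):
    """Probability of x discoveries when n of N prospects are drilled
    (sampling without replacement) and S of the N contain reservoirs."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)


if __name__ == "__main__":
    # Poisson approximation to the binomial: large n, small p, lam = n * p.
    n, p = 100, 0.02  # hypothetical values chosen for illustration
    for k in range(5):
        print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, n * p), 4))

    # Offshore concession: N = 10 anomalies, S = 4 reservoirs, n = 3 holes.
    for x in range(4):
        print(f"{x} discoveries: {hypergeom_pmf(x, 10, 4, 3):.3f}")
```

For the concession example the hypergeometric probabilities work out to 1/6 (about 17%), 1/2, 3/10 and 1/30 for zero to three discoveries, consistent with the percentages quoted above.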

Frequency Distributions Of Continuous Random Variables
Frequency distributions of continuous random variables follow a theoretical probability distribution or probability density function that can be represented by a continuous curve. These functions can take on a variety of shapes, as shown in Figure 7a, 7b, 7c and 7d (Examples of some continuous variable probability distributions). Rather than displaying the functions as a curve, the distributions may be displayed as a histogram.
[Figures 7a, 7b, 7c, 7d: Examples of some continuous variable probability distributions]
In this section, we will discuss the following common distribution functions:
- Normal Probability Distribution
- Lognormal Distribution

Normal Probability Distribution
It is often assumed that random variables follow a normal probability density function, but this is not always true. Many algorithms that are used to make estimations or simulations require knowledge about the population density function. If we can accurately predict its behavior using only a few parameters, then our predictions should be more reliable. If the CLT applies, then knowing the sample mean and sample standard deviation, the density distribution can be recreated precisely. The Central Limit Theorem is the foundation of the normal probability distribution, and many statistical (and geostatistical) methods are based on this supposition.
Central Limit Theorem
The Central Limit Theorem (CLT) states that under rather general conditions, as the sample size increases, the sums and means of samples drawn from a population of any distribution will approximate a normal distribution (Sokal and Rohlf, 1969; Mendenhall, 1971). The Central Limit Theorem is defined below:
Central Limit Theorem: If random samples of n observations are drawn from a population with finite mean, μ, and a standard deviation, σ, then, as n grows larger, the sample mean, ȳ, will be approximately normally distributed with mean equal to μ and standard deviation σ/√n. The approximation will become more and more accurate as n becomes large (Mendenhall, 1971).
The Central Limit Theorem consists of three statements:
1. The mean of the sampling distribution of means is equal to the mean of the population from which the samples were drawn.
2. The variance of the sampling distribution of means is equal to the variance of the population from which the samples were drawn, divided by the size of the samples.
3. If the original population is distributed normally (i.e., it is bell shaped), the sampling distribution of means will also be normal, even for small samples. If the original population is not normally distributed, the sampling distribution of means will increasingly approximate a normal distribution as sample size increases (i.e., when increasingly large samples are drawn).
The significance of the Central Limit Theorem is twofold:
1. It explains why some measurements tend to possess (approximately) a normal distribution.
2. Its most important contribution is in statistical inference.
However, the disturbing feature of the CLT, and of most approximation procedures, is that we must have some idea as to how large the sample size, n, must be in order for the approximation to yield useful results. Unfortunately, there is no clear-cut answer to this question, because the appropriate value of n depends upon the population probability distribution as well as the use we make of the approximation. Fortunately, in practice the CLT tends to work very well.
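The three CLT statements can be illustrated with a small simulation. The Python sketch below assumes a uniform parent population on [0, 1) (mean 0.5, standard deviation √(1/12)), which is clearly not normal, and compares the spread of many sample means with σ/√n. The helper `sample_means` and the chosen sample sizes are hypothetical.

```python
import random
import statistics


def sample_means(draw, n, trials, seed=0):
    """Draw `trials` samples of size n with draw(rng) and return their means."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]


if __name__ == "__main__":
    # Hypothetical parent population: uniform on [0, 1), not normal.
    mu, sigma = 0.5, (1 / 12) ** 0.5
    for n in (1, 4, 16, 64):
        means = sample_means(lambda rng: rng.random(), n, trials=5000)
        # Columns: n, mean of the means, spread of the means, sigma / sqrt(n).
        print(n, round(statistics.fmean(means), 3),
              round(statistics.pstdev(means), 4), round(sigma / n ** 0.5, 4))
```

As n increases, the mean of the sample means stays near μ, their standard deviation tracks σ/√n, and a histogram of `means` becomes increasingly bell-shaped.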

Properties of the Normal Distribution
Formally, the Normal Probability Density Function is represented by the following expression:
$$Z = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{Y-\mu}{\sigma}\right)^{2}}$$
Where Z is the height of the ordinate (y-axis) of the curve and represents the density of the function. It is the dependent variable in the expression, being a function of the variable Y. There are two constants in the equation: π, well-known to be approximately 3.14159, making 1/√(2π) equal 0.39894, and e, the base of the Naperian or natural logarithms, whose value is approximately 2.71828. There are two parameters in the normal probability density function. These are the parametric mean, μ, and the standard deviation, σ, which determine the location and shape of the distribution (these parameters are discussed under Summary Statistics). Thus, there is not just one normal distribution; rather there is an infinity of such curves, because the parameters can assume an infinity of values (Sokal and Rohlf, 1969). Figure 8a (Illustration of how changes in the two parameters of the normal distribution affect the shape and position of histograms. Left (μ = 4, σ = 1), Right (μ = 8, σ = 0.5)) illustrates the impact of parameters on the shape of a probability distribution histogram.
[Figure 8a]
The histogram (or curve) is symmetrical about the mean. Therefore the mean, median and mode (described later under this subtopic) of the normal distribution occur at the same point. Figure 8b (Bell curve) shows that the curve of a Gaussian normal distribution can be described by the position of its maximum, which corresponds to its mean (μ), and its points of inflection. The distance between μ and one of the points of inflection represents the standard deviation, sometimes referred to as the mean variation. The square of the mean variation is the variance.
[Figure 8b]
In a normal frequency distribution, the standard deviation may be used to characterize the sample distribution under the bell curve. According to Sokal and Rohlf (1969), 68.3% of all sample values fall within -1σ to +1σ from the mean, while 95.4% of the sample values fall within -2σ and +2σ from the mean, and 99.7% of the values are contained within -3σ and +3σ of the mean. This bears repeating, in a different format this time:
μ ± σ (1 standard deviation) contains 68.3% of the data
μ ± 2σ (2 standard deviations) contains 95.46% of the data
μ ± 3σ (3 standard deviations) contains 99.73% of the data
How are the percentages calculated? The direct calculation of any portion of the area under the normal curve requires an integration of the function shown in the above expression. Fortunately, for those who have forgotten their calculus, the integration has been recorded in tabular form (Sokal and Rohlf, 1969). These tables can be found in most standard statistical books; see, for example, Statistical Tables and Formulas, Table 1 (Hald, 1952).
Application of the Normal Distribution
The normal frequency distribution is the most widely used distribution in statistics. There are three important applications of the density function (Sokal and Rohlf, 1969):

1. Sometimes we need to know whether a given sample is normally distributed before we can apply certain tests. To test whether a sample comes from a normal distribution we must calculate the expected frequencies for a normal curve of the same mean and standard deviation, then compare the two curves.
2. Knowing when a sample comes from a normal distribution may confirm or reject underlying hypotheses about the nature of the phenomenon studied.
3. Finally, if we assume a normal distribution, we may make predictions based upon this assumption. For the geosciences, this means a better and unbiased estimation of reservoir parameters between the well data.
Normal Approximation to the Binomial Distribution
Recall that approximately 95% of the measurements associated with a normal distribution lie within two standard deviations of the mean and almost all lie within three standard deviations. A binomial probability distribution will be nearly symmetrical, and hence well approximated by a normal curve, if it is able to spread out a distance equal to two standard deviations on either side of the mean. Therefore, to determine the normal approximation for n trials, each resulting in a 0 (failure) or a 1 (success) with probabilities q and p respectively (q = 1 - p), we calculate:
$$\mu = np$$
$$\sigma = \sqrt{npq}$$
If the interval μ ± 2σ lies within the binomial bounds, 0 and n, the approximation will be reasonably good (Mendenhall, 1971).
Lognormal Distribution
Many variables in the geosciences do not follow a normal distribution, but are highly skewed, such as the distribution in Figure 7b, and as shown below.
[Figure 9: Schematic histogram of sizes and numbers of oil field discoveries of hundred thousand-barrel equivalent]

The histogram illustrates that most fields are small, with decreasing numbers of larger fields, and a few rare giants that exceed all others in volume. Such variables are said to be lognormal. If the histograms of Figure 7b and Figure 9 are converted to logarithmic forms (that is, we use Yi = log Xi instead of Yi = Xi for each observation), the distribution becomes nearly normal. Because of its frequent use in geology, the lognormal distribution is extremely important. If we look at the transformed variable Yi rather than Xi itself, the properties of the lognormal distribution can be explained simply by reference to the normal distribution. In terms of the original (untransformed) variable Xi, the antilog of the mean of Y corresponds to the nth root of the product of the Xi:
$$\operatorname{antilog}\bar{Y} = GM = \sqrt[n]{\prod_{i=1}^{n} X_i}$$
Where: GM is the geometric mean, and Π is analogous to Σ, except that all the elements in the series are multiplied rather than added together (Davis, 1986). In practice, it is simpler to convert the measurements into logarithms and compute the mean and variance. If you want the geometric mean and variance, compute the antilog of Ȳ and s²y. If you work with the data in the transformed state, all of the statistical procedures that are appropriate for ordinary variables are applicable to the log-transformed variables (Davis, 1986). The characteristics of the lognormal distribution are discussed in a monograph by Aitchison and Brown (1969) and in the geological context by Koch and Link (1981).
Transformation of Lognormal Data to Normal
The data can be converted into logarithmic form by a process known as transformation. Data which display a lognormal distribution, for example, can be transformed to resemble a normal distribution by applying the formula ln(z) to each z variate in the data set prior to conducting statistical analysis. The success of the transformation can be judged by observing its frequency distribution before and after transformation. The distribution of the transformed data should be markedly less skewed than the lognormal data. The transformed values may be back-transformed prior to reporting results. Transforming the data to a standardized normal distribution (i.e., zero mean and unit variance) simplifies data handling and eases comparison to different data sets.
Random Error
Random errors for normal distributions are additive, which means that errors of opposite sign tend to cancel one another, and the final measurement is near the true value. Lognormal distribution random errors are multiplicative, rather than additive, and thus produce an intermediate product near the geometric mean.
UNIVARIATE DATA ANALYSIS
INTRODUCTION
There are several ways in which to summarize a univariate (single attribute) distribution. Quite often we will simply compute the mean and the variance, or plot its histogram. However, these statistics are very sensitive to extreme values (outliers) and do not provide any spatial information, which is the heart of a geostatistical study. In this section, we will describe a number of different methods that can be used to analyse data for a single variable.
SUMMARY STATISTICS
The summary statistics represented by a histogram can be grouped into three categories:
- measures of location,
- measures of spread, and
- measures of shape.
Measures of Location
Measures of location provide information about where the various parts of the data distribution lie, and are represented by the following:
- Minimum: Smallest value.
- Maximum: Largest value.
- Median: Midpoint of all observed data values, when arranged in ascending order. Half the values are above the median, and half are below. This statistic represents the 50th percentile of the cumulative frequency histogram and is not generally affected by an occasional erratic data point.

- Quartiles: In the same way that the median splits the data into halves, the quartiles split the data in quarters. Quartiles represent the 25th, 50th and 75th percentiles on the cumulative frequency histogram.
- Mode: The most frequently occurring value in the data set. This value falls within the tallest bar on the histogram.
- Mean: The arithmetic average of all data values. (This statistic is quite sensitive to extreme high or low values. A single erratic value or outlier can significantly bias the mean.)
We use the following formula to determine the mean of a Population:
$$\mu = \frac{\sum Z_i}{N}$$
where:
μ = population mean
N = number of observations (population size)
ΣZi = sum of individual observations
We can determine the mean of a Sample in a similar manner. The formula below for the sample mean is comparable to the one above, except that population notations have been replaced with those for samples:
$$\bar{x} = \frac{\sum Z_i}{n}$$
where:
x̄ = sample mean
n = number of observations (sample size)
ΣZi = sum of individual observations
Measures of Spread
Measures of spread describe the variability of the data values, and are represented by the following:
- Variance: Average squared difference of the observed values from the mean. Because the variance involves squared differences, this statistic is very sensitive to abnormally high/low values.
$$\sigma^{2} = \frac{\sum (Z_i - \mu)^{2}}{N}$$
Kachigan (1986) notes that the above formula is only appropriate for defining the variance of a population of observations. If this same formula were applied to a sample for the purpose of estimating the variance of the parent population from which the sample was drawn, using the sample mean (x̄) rather than the population mean (μ), then it would tend to underestimate the population variance. This underestimation occurs as repeated samples are drawn from the population and the variance is calculated from each. The resulting average of these variances would be lower than the true value of the population variance (assuming we were able to measure every single member of the population). We can avoid this bias by taking the sum of squared deviations and dividing that sum by the number of observations less one. Thus, the sample estimate of population variance is obtained using the following formula:
$$s^{2} = \frac{\sum (Z_i - \bar{x})^{2}}{n-1}$$
- Standard Deviation: Square root of the variance. The standard deviation is often used instead of the variance, because its units are the same as the units of the attribute being described.
$$\sigma = \sqrt{\sigma^{2}}$$
This measure is used to show the extent to which the data is spread around the vicinity of the mean, such that a small value of standard deviation would indicate that the data was clustered near to the mean. For example, if we had a mean equal to 10, and a standard deviation of 1.3, then we could predict that most of our data (about 68%, if the data are normally distributed) would fall somewhere between (10 - 1.3) and (10 + 1.3), or between 8.7 and 11.3.
- Interquartile Range: Difference between the upper (75th percentile) and the lower (25th percentile) quartile. Because this measure does not use the mean as the center of distribution, it is less sensitive to abnormally high/low values.
Figure 1a and 1b illustrate histograms of porosity with a mean of about 15%, but different variances.
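The measures of location and spread above can be computed with Python's standard `statistics` module, as in the minimal sketch below (Python 3.8+ assumed; `summary` is a hypothetical helper). Note that `statistics.pvariance` divides by N (the population formula), while `statistics.variance` and `statistics.stdev` divide by n - 1 (the sample estimate). The quartiles use the module's default "exclusive" method; other software may use slightly different quartile conventions.

```python
import statistics


def summary(values):
    """Univariate summary statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # 25th, 50th, 75th percentiles
    return {
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "mean": statistics.fmean(values),
        "pop_variance": statistics.pvariance(values),    # divides by N
        "sample_variance": statistics.variance(values),  # divides by n - 1
        "std_dev": statistics.stdev(values),             # sample form
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


if __name__ == "__main__":
    print(summary([2, 4, 4, 4, 5, 5, 7, 9]))  # hypothetical values
```

For the hypothetical values [2, 4, 4, 4, 5, 5, 7, 9], the mean is 5, the population variance is 4 and the sample variance is 32/7 (about 4.57), which shows the effect of dividing by n - 1.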

Figure 1a
Figure 1b
Outliers or "Spurious" Data
Another statistic to consider is the Z-score, which expresses each data value as a summary statistic in terms of standard deviation. It is the ratio of the data value minus the sample mean to the sample standard deviation. Data that "appear" to be anomalous because their Z-scores have absolute values greater than a specified cutoff are termed outliers. The typical cutoff is 2.5 standard deviations from the mean.

$$Z_{score} = \frac{Z_i - \mu}{\sigma}$$
This statistic serves as a caution, signifying either bad data or a true local anomaly, which must be taken into account in the final analysis.
Note: The Z-score transform does not change the shape of the histogram. The transform re-scales the histogram to a mean equal to 0 and a variance equal to 1; the X-axis is now in terms of standard deviation units about the mean of zero. If the histogram is skewed before being transformed, it retains the same shape after the transform.
Measures of Shape
Measures of shape describe the appearance of the histogram and are represented by the following:
Coefficient of Skewness: Average cubed difference between the data values and the mean, divided by the cube of the standard deviation. This measure is very sensitive to abnormally high/low values:
$$CS = \frac{\frac{1}{n}\sum_{i=1}^{n} (Z_i - \mu)^3}{\sigma^3}$$
where: μ is the mean; σ is the standard deviation; n is the number of data values.
The coefficient of skewness allows us to quantify the symmetry of the data distribution, and tells us when a few exceptional values (possibly outliers?) exert a disproportionate effect upon the mean.
- positive: long tail of high values (median < mean)
- negative: long tail of low values (median > mean)
- zero: a symmetrical distribution
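The Z-score cutoff and the coefficient of skewness can be sketched in a few lines. This is a minimal illustration, assuming population (divide-by-n) statistics as in the formulas above; the function names and the porosity values are hypothetical.

```python
import math


def _mean_and_std(values):
    """Population mean and standard deviation (divide by n)."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((z - mu) ** 2 for z in values) / n)
    return mu, sigma


def z_scores(values):
    """Z-score = (Z_i - mu) / sigma for every value."""
    mu, sigma = _mean_and_std(values)
    return [(z - mu) / sigma for z in values]


def flag_outliers(values, cutoff=2.5):
    """Values whose absolute Z-score exceeds the cutoff (2.5 is the typical choice)."""
    return [z for z, score in zip(values, z_scores(values)) if abs(score) > cutoff]


def coefficient_of_skewness(values):
    """CS = (1/n) * sum((Z_i - mu)^3) / sigma^3."""
    mu, sigma = _mean_and_std(values)
    return sum((z - mu) ** 3 for z in values) / len(values) / sigma ** 3


# Hypothetical porosity values (%) with one erratic high value
data = [10, 11, 12, 10, 11, 12, 10, 11, 12, 30]
print(flag_outliers(data))                # the value 30 is flagged
print(coefficient_of_skewness(data) > 0)  # long tail of high values
```

Because the Z-scores are a linear rescaling of the data, their mean is 0 and their variance is 1, while the histogram shape (and hence the skewness) is unchanged, as the note above states.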

Figures 2a, 2b and 2c illustrate histograms with negative, symmetrical and positive skewness.
Figure 2a
Figure 2b
Figure 2c
Coefficient of Variation: Often used as an alternative to skewness as a measure of asymmetry for positively skewed distributions with a minimum at zero. It is defined as the ratio of the standard deviation to the mean. A value of CV > 1 probably indicates the presence of some high erratic values (outliers).

$$CV = \frac{\sigma}{\mu}$$
where: σ is the standard deviation; μ is the mean.
SUMMARY OF UNIVARIATE STATISTICAL MEASURES AND DISPLAYS
Advantages
- Easy to calculate.
- Provides information in a very condensed form.
- Can be used as parameters of a distribution model (e.g., a normal distribution defined by the sample mean and variance).
Limitations
- Summary statistics are too condensed, and do not carry enough information about the shape of the distribution.
- Certain statistics are sensitive to abnormally high/low values that properly belong to the data set (e.g., CS).
- Offers only a limited description, especially if our real interest is in a multivariate data set (attributes are correlated).
BIVARIATE STATISTICAL MEASURES AND DISPLAYS
INTRODUCTION
Methods for bivariate description not only provide a means to describe the relationship between two variables, but are also the basis for tools used to analyze the spatial content of a random function (to be described in the Spatial Correlation and Modeling Analysis section). The bivariate summary methods described in this section only measure the linear relationship between the variables, not their spatial features.
THE RELATIONSHIP BETWEEN VARIABLES
Bivariate analysis seeks to determine the extent to which one variable is related to another variable. We can reason that if one variable is indeed related to another, then information about the first variable might help us to predict the behavior of the second. If, on the other hand, our analysis of these two variables shows absolutely no relationship between the two, then we might need to discard one from the pair in favor of a different variable which will be more predictive of the other variable's behavior.
The relationship between two variables can be described as complementary, parallel, or reciprocal. (For instance, we might observe a simultaneous increase in value between two variables, or a simultaneous decrease. We might even see a simultaneous decrease in the value of one variable while the other increases.)
An alternative way of characterizing the relationship between two variables would be to describe their behaviors in terms of variance. In this case, we observe how the value of one variable may change (or vary) in a manner that leaves the relationship with the second variable unchanged. (For instance, if the relationship was defined by a 1:10 ratio, then as the value of one variable changed, the other would vary by 10 times that amount, thus preserving the relationship.)
Dependent and Independent Variables
Where a relationship between variables does exist, we can characterize each variable as being either dependent or independent. We use the behavior of the independent (or predictor) variable to determine how the dependent (or criterion) variable will react. (For instance, we might expect that an increase in the value of the independent variable would result in a corresponding increase in the value of the dependent variable.)
COMMON BIVARIATE METHODS
The most commonly used bivariate statistical methods include:
- Scatterplots
- Covariance
- Product Moment Correlation Coefficient
- Linear Regression
We will discuss each of these methods in turn.
SCATTERPLOTS
The most common bivariate plot is the Scatterplot (Figure 1, below). This plot follows a common convention, in which the dependent variable (e.g., porosity) is plotted on the Y-axis (ordinate) and the independent variable (e.g., acoustic impedance) is plotted on the X-axis (abscissa).
Figure 1 (Scatterplot of Porosity (dependent variable) versus Acoustic Impedance (independent variable)).
This type of plot serves several purposes:
- detects a linear relationship
- detects a positive or inverse relationship

as porosity increases.g.63 30356. If the data values at locations separated by h are identical. This plot displays an inverse relationship between porosity and acoustic impedance. VARIABLES X and Y X*10 and Y X*10 and Y*10 COVARIAN CE 3035. acoustic impedance decreases.) These plots are used to show how continuous the data values are over a certain distance in a particular direction. and forms the basis for the correlogram and variogram (detailed later).  provides an overall data quality control check. For example. the cloud of points on the hScatterplot becomes fatter and more diffuse. A common geostatistical application of the scatterplot is the h-scatterplot. If both variables are multiplied by k. a high or low value has no real meaning until verified visually. porosity and acoustic impedance). (In geostatistics.. This is illustrated in the table below. Thus. COVARIANCE Covariance is a statistic that measures the correlation between all points of two variables (e. if the Xi values are multiplied by the factor k.3 303563 The covariance formula is:  i   x  i   y COV x. This statistic is a very important tool used in Geostatistics to measure spatial correlation or dissimilarity between variables. A later section will present more detail on the h-scatterplot. h commonly refers to the lag distance between sample points. then the covariance increases by a factor of k. a 45-degree line of perfect correlation. identifies potential outliers. then the covariance increases by k2. like the covariance or correlation coefficient. The magnitude of the covariance statistic is dependent upon the magnitude of the two variables.y = n   . a scalar. As the data becomes less and less similar. they will fall on a line x = y. because many factors affect these statistical measures. that is. This display should be generated before calculating bivariate summary statistics.

where: Xi is the X variable; Yi is the Y variable; μx is the mean of X; μy is the mean of Y; n is the number of X and Y data pairs.
It should be emphasized that the covariance is strongly affected by extreme pairs (outliers).
Product Moment Correlation Coefficient
The product moment correlation coefficient (ρ) is more commonly called simply the correlation coefficient, and is a statistic that measures the linear relation between all points of two variables (e.g., porosity and velocity). This linear relationship is assigned a value that ranges between +1 and -1, depending on the degree of correlation:
- +1 = perfect, positive correlation
- 0 = no linear correlation (e.g., a totally random relation)
- -1 = perfect inverse correlation
Figure 2 illustrates scatterplots showing positive correlation, no correlation, and inverse correlation between two variables.
Figure 2
The numerator for the correlation coefficient is the covariance. This value is divided by the product of the standard deviations for variables X and Y. This normalizes the covariance, thus removing the impact of the magnitude of the data values. Like the covariance, the correlation coefficient is adversely affected by outliers. The Correlation Coefficient formula (for a population) is:
$$\rho_{x,y} = \frac{\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu_x)(Y_i - \mu_y)}{\sigma_x \sigma_y}$$

where: Xi is the X variable; Yi is the Y variable; μx is the mean of X; μy is the mean of Y; σx is the standard deviation of X; σy is the standard deviation of Y; n is the number of X and Y data pairs.
As with other statistical formulas, Greek notation (ρ) is used to signify the measure of a population, while algebraic notation (r) is used for samples. This measure tells us about the extent to which two variables covary. Thus, a value of ρ = -0.83 between porosity and acoustic impedance tells us that as porosity increases in value, acoustic impedance decreases.
Rho Squared
The square of the correlation coefficient, ρ² (also referred to as r²), is a measure of the variance accounted for in a linear relation. That is, it tells us how much of the variance seen in one variable can be predicted from its linear relationship with the other variable. In keeping with statistical notation, the Greek symbol ρ² is used to denote the squared correlation coefficient of a population, while the algebraic equivalent r² is used to refer to that of a sample. However, it is ρ² which has a real physical meaning: with a correlation coefficient of -0.83 between porosity and acoustic impedance, only about 70% (actually (-0.83)², or 68.89%) of the variability in porosity is explained by its relationship with acoustic impedance.
Linear Regression
Linear regression is another method we use to indicate whether a linear relationship exists between two variables. This is a useful tool, because once we establish a linear relationship, we may later be able to interpolate values between points, extrapolate values beyond the data points, detect trends, and detect points that deviate away from the trend. Figure 3 shows a simple display of regression.
Figure 3 (Scatterplot of inverse linear relationship between porosity and acoustic impedance, with a correlation coefficient of -0.83).

When two variables are strongly correlated, we can predict a linear relationship between the two. A regression line drawn through the points of the scatterplot helps us to recognize the relationship between the variables. A positive slope (from lower left to upper right) indicates a positive or direct relationship between variables. A negative slope (from upper left to lower right) indicates a negative or inverse relationship. In the example illustrated in Figure 3, the porosity clearly tends to decrease as acoustic impedance increases.
The regression equation has the following general form:
$$Y = a + bX_i$$
where:
- Y is the dependent variable, or the variable to be estimated (e.g., porosity)
- Xi is the independent variable, or the estimator (e.g., velocity)
- b is the slope, defined as $b = \rho \frac{\sigma_y}{\sigma_x}$, where ρ is the correlation coefficient between X and Y, σx is the standard deviation of X, and σy is the standard deviation of Y
- a is a constant, which defines the ordinate (Y-axis) intercept: $a = \mu_y - b\,\mu_x$, where μx is the mean of X and μy is the mean of Y
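The covariance, correlation coefficient and regression formulas above fit together in a few lines of code. This is a minimal sketch using the population (divide-by-n) forms given in this section; the function names and the impedance/porosity pairs are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """COV = (1/n) * sum((X_i - mu_x) * (Y_i - mu_y)), the population form used above."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def std(values):
    """Population standard deviation: the square root of COV(X, X)."""
    return math.sqrt(covariance(values, values))


def correlation(xs, ys):
    """rho = COV(X, Y) / (sigma_x * sigma_y)."""
    return covariance(xs, ys) / (std(xs) * std(ys))


def regression_line(xs, ys):
    """Intercept a and slope b of Y = a + bX, with b = rho * sigma_y / sigma_x."""
    b = correlation(xs, ys) * std(ys) / std(xs)
    a = mean(ys) - b * mean(xs)
    return a, b


# Hypothetical acoustic impedance (X) and porosity (Y) pairs, for illustration only
impedance = [2.0, 4.0, 6.0, 8.0, 10.0]
porosity = [14.0, 12.5, 9.0, 8.5, 6.0]

rho = correlation(impedance, porosity)
a, b = regression_line(impedance, porosity)
print(f"rho = {rho:.3f}, rho^2 = {rho ** 2:.3f}, Y = {a:.2f} + ({b:.2f})X")
```

For these made-up values the covariance is -8, the slope comes out at -1 and the intercept at 16. Multiplying the impedance values by 10 multiplies the covariance by 10 but leaves ρ unchanged, which is why the correlation coefficient is preferred when comparing variables of different magnitudes.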

With the regression equation $Y = a + bX_i$, we can plot a regression line that will cross the Y-axis at the point "a", and will have a slope equal to "b". Outliers can highly bias a regression prediction equation.
Because these equations need only be linear in their coefficients, they can include polynomial terms of any degree, and may include combinations of logarithmic, exponential or any other non-linear transforms of the variables. The terms in the equation for which coefficients are computed are independent terms, and can be simple (a single variable) or compound (several variables multiplied together). It is also common to use cross terms (the interaction between X and Y), or to use power terms. For example:
- $Z = a + bX + cY$: uses X and Y as predictors and a constant
- $Z = a + bX + cY + dXY$: adds the cross term
- $Z = a + bX + cY + dXY + eX^2 + fY^2$: adds the power terms
SUMMARY OF BIVARIATE STATISTICAL MEASURES AND DISPLAYS
Advantages
- Easy to calculate.
- Provides information in a very condensed form.
- Can be used to estimate one variable from another variable or from multiple variables.
Limitations
- Summary statistics sometimes can be too condensed, and do not carry enough information about the shape of the distribution.
- Certain statistics are sensitive to abnormally high/low values that properly belong to the data set (e.g., covariance, correlation coefficient).
- No spatial information.
EXPLORATORY DATA ANALYSIS
The early phase of a geostatistics project often employs classical statistical tools in a general analysis and description of the data set. This process is commonly referred to as Exploratory Data Analysis, or simply EDA. It is conducted as a way of validating the data itself (you need to be sure each value that you plug into the geostatistical model is valid; remember: garbage in, garbage out!), as well as exploring and describing the data set. By analyzing the data itself, you can determine which points represent anomalous values of an attribute (outliers) that should either be disregarded or should be scrutinized more closely. EDA is an important precursor to the final goal of a geostatistical study, which may be interpolation, including "routine" mapping of attributes, or simulation and assessment of uncertainty. Unfortunately, in many studies, EDA is overlooked. However, it is absolutely necessary to have a good understanding of your data, so taking the time in EDA to check the quality of the data will reward you with improved results.

The classical statistical tools described in previous sections, along with the tools that we will introduce in the sections under the Data Validation heading, will help you to conduct a thorough analysis of your data. Geoscientists tasked with making predictions about the reservoir will always face these limitations:
- Most prospects provide only a very few direct "hard" observations (well data)
- "Soft" data (seismic) is only indirectly related to the "hard" well data
- A scarcity of observations can often lead to a higher degree of uncertainty
These problems can be compounded when errors in the data are overlooked. This is especially troublesome with large data sets and when computers are involved; often, we simply become detached from our data. A thorough EDA will foster an intimate knowledge of the data to help you flag bogus results. Always take the time to explore your data. By employing classical statistical methods to analyze your data, you will not only gain a clearer understanding of your data, but will also discover possible sources of errors and outliers.
EDA PROCESS
Note that there is no one set of prescribed steps in EDA. However, the process will include a number of the following tasks, depending on the amount and type of data involved:
- data preprocessing
- univariate and multivariate statistical analysis
- identification and probable removal of outliers
- identification of sub-populations
- data posting
- quick maps
- sampling of seismic attributes at well locations
At the very least, you should plot the distribution of attribute values within your data set. Look for anomalies in your data, and then look for possible explanations for those anomalies.
SEARCH NEIGHBORHOOD CRITERIA
INTRODUCTION
All interpolation algorithms require a standard for selecting data, referred to as the search neighborhood. The parameters that define a search neighborhood include:
- Search radius
- Neighborhood shape
- Number of sectors (4 or 8 are common)
- Number of data points per sector
- Azimuth of major axis of anisotropy

When designing a search neighborhood, we should remember the following points:
- Each sector should have enough points (4 or more) to avoid directional sampling bias.
- CPU time and memory requirements grow rapidly as a function of the number of data points in a neighborhood.
We will see a further example of the search neighborhood in our later discussion on kriging.
SEARCH STRATEGIES
Two common search procedures are the Nearest Neighbor and the Radial Search methods. These strategies calculate the value of a grid node based on data points in the vicinity of the node.
Nearest Neighbor
One simple search strategy looks for data points that are closest to the grid node, regardless of their angular distribution around the node. The nearest neighbor search routine is quick, and works well as long as samples are spread about evenly. However, it provides poor estimates when sample points are concentrated too closely along widely spaced traverses. Another drawback to the nearest neighbor method occurs when all nearby points are concentrated in a narrow strip along one side of the grid node (such as might be seen when wells are drilled along the edge of a fault or pinchout). When this occurs, the selection of points produces an estimate of the node that is essentially unconstrained, except in one direction. This problem may be avoided by specifying search parameters which select control points that are evenly distributed around the grid node.
Radial Searches
Two common radial search procedures are the quadrant search, and its close relative, the octant search. Each is based on a circular or elliptical area, sliced into four or eight equal sections. These methods require a minimum number of control points for each of the four or eight sections surrounding the grid node. These constrained search procedures test more neighboring control points than the nearest neighbor search, which increases the time required. Such constraints will also expand the size of the search neighborhood surrounding the grid node, because a number of nearby control points will be passed over in favor of more distant points that satisfy the requirement for a specific number of points being selected from a single sector.
In choosing between the simple nearest neighbor approach and the constrained quadrant or octant searches, remember that the autocorrelation of a surface decreases with increasing distance, so the remote data points sometimes used by the constrained searches are less closely related to the location being estimated. This may result in a grid node estimate that is less realistic than that produced by the simpler nearest neighbor search.
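The difference between the two search strategies shows up clearly when nearby wells sit on one side of a node. The sketch below is an illustration under stated assumptions, not any mapping package's algorithm: `nearest_neighbor_search` and `sector_search` are hypothetical names, the neighborhood is circular (isotropic), and `per_sector=1` is used only to keep the example small (the text recommends 4 or more points per sector in practice).

```python
import math


def nearest_neighbor_search(node, samples, k):
    """The k samples closest to the node, regardless of their direction."""
    return sorted(samples, key=lambda s: math.dist(node, s[:2]))[:k]


def sector_search(node, samples, n_sectors=4, per_sector=1, radius=math.inf):
    """Quadrant (n_sectors=4) or octant (n_sectors=8) search.

    The circular neighborhood around the node is sliced into equal angular
    sectors, and the closest `per_sector` samples are kept from each sector.
    """
    x0, y0 = node
    width = 2 * math.pi / n_sectors
    sectors = [[] for _ in range(n_sectors)]
    for s in samples:
        dx, dy = s[0] - x0, s[1] - y0
        d = math.hypot(dx, dy)
        if d > radius:
            continue  # outside the search radius
        angle = math.atan2(dy, dx) % (2 * math.pi)
        index = min(int(angle // width), n_sectors - 1)
        sectors[index].append((d, s))
    selected = []
    for members in sectors:
        members.sort(key=lambda item: item[0])
        selected.extend(s for _, s in members[:per_sector])
    return selected


# Hypothetical wells (x, y, porosity): four crowded on the east side of the node
wells = [(1.0, 0.5, 10.0), (1.5, -0.5, 11.0), (2.0, 0.2, 12.0),
         (2.5, -0.3, 12.5), (-5.0, 1.0, 7.0), (-4.0, -3.0, 6.5)]
node = (0.0, 0.0)
print(nearest_neighbor_search(node, wells, k=4))  # all four eastern wells
print(sector_search(node, wells))                 # one well from each quadrant
```

The nearest neighbor search returns only the eastern strip, leaving the estimate unconstrained to the west, while the quadrant search reaches out to the more distant western wells, which is exactly the trade-off described above.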

SPATIAL DESCRIPTION
One of the distinguishing characteristics of earth science data is that these data sets are assigned to some particular location in space. Spatial features of the data sets, such as the degree of continuity, directional trends and location of extreme values, are of considerable interest in developing a reservoir description. The statistical descriptive tools presented earlier are not able to capture these spatial features. In this section, we will use a data set from West Texas to demonstrate tools that describe spatial aspects of the data. This data set and acoustic impedance data from a high-resolution 3D seismic survey will be used to illustrate many of the geostatistical concepts throughout the remainder of this presentation. More information is available about this data set in an article by Chambers, et al. (1994).
DATA POSTING
Data posting is an important initial step in any study (Figure 1: Posted porosity data for 55 wells from North Cowden Field in West Texas). Not only do these displays reveal obvious errors in data location, but they often also highlight data values that may be suspect. Lone high values surrounded by low values (or vice versa) are worth investigating. Data posting may provide clues as to how the data were acquired: blank areas may indicate inaccessibility (another company's acreage, perhaps), while heavily sampled areas indicate some initial interest. Locating the highest and lowest values may reveal trends in the data. In this example, the data are sampled on a nearly regular grid, with only a few holes in the data locations. The empty spots in the lower right corner are on acreage belonging to another oil company. Other missing points are the result of poor data, and thus are not included in the final data set. The lower values are generally found on the west side of the area, with the larger values in the upper right quadrant.

or clustered (Davis. the coverage may be uniform. The patterns of points are considered uniform in density if the points in any sub-area are equal to the density of points in any other sub-area. .DATA DISTRIBUTION A reservoir property must be mapped on the basis of a relatively small number of discrete sample points (most often consisting of well data). When constructing maps. random. we do not expect to see the same number of points within each sub-area. The distribution of points on maps (Figure 2. for example. however. either by hand or by computer. This is especially true when working on a more regional scale.  Random: When points are distributed at random (Figure 2 -part b) across the map area. a 5-spot well pattern.  Regular: The pattern is regular (Figure 2 -part a) if the points are located on some sort of grid pattern. Typical distribution of data points within the map area. attention must be paid to the distribution of those discrete sample points. 1986).  Clustered: Many of the data sets we work with show a natural clustering (Figure 2 -part c) of points (wells). Figure 2 ) may be classified into three categories: regular.

GRIDS AND GRIDDING
INTRODUCTION
One of our many tasks as geoscientists is to create contour maps. Although contouring is still performed by hand, the computer is used more and more to map data. Unfortunately, data are often fed into the computer without any special treatment or exploratory data analysis. Quite often, defaults are used exclusively in the mapping program, and the resulting maps are accepted without question, even though the maps might violate sound geological principles. Before using a computer to create a contour map it is necessary to create a grid and then use the gridding process to create the contours. This discussion will introduce the basic concepts of grids, gridding and interpolation for making contour maps.
WHAT IS A GRID?
Taken to extremes, every map contains an infinite number of points within its map area. Because it is impractical to sample or estimate the value of any variable at an infinite number of points within the map area, we define a grid to describe locations where estimates will be calculated for use in the contouring process. A grid is formed by arranging a set of values into a regularly spaced array, commonly a square or rectangle, although other grid forms may also be used. The locations of the values represent the geographic locations in the area to be mapped and contoured (Jones, et al., 1986). Grid nodes are formed by the intersection of each column with a row. The area enclosed by adjacent grid nodes is called a grid cell (three nodes for a triangular arrangement, or more commonly, four nodes for a square arrangement). For example, well spacing and known geology might influence your decision to calculate porosity every 450 feet in the north-south direction, and every 300 feet in the east-west direction. By specifying a regular interval of columns (every 450 feet in the north-south direction) and rows (every 300 feet in the east-west direction), you have, in effect, created a grid.
GRID SPACING
The grid interval controls the detail that can be seen in the map. No features smaller than the interval are retained. To accurately define a feature, it must cover two to three grid intervals; thus the cell should be small enough to show the required detail of the feature. However, there is a trade-off involving grid size. Large grid cells produce quick maps with low resolution and a coarse appearance. While small grid cells may produce a finer appearance with better resolution, they also tend to increase the size of the data set, thus leading to longer computer processing time, especially for large data sets; furthermore, a fine grid often imparts gridding artifacts that show up in the resulting map (Jones, et al., 1986). Because the sample data represent discrete points, a grid should be designed to reflect the average spacing between the wells, and designed such that the individual data points lie as closely as possible to a grid node.
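Defining a grid amounts to choosing an origin, an extent and the two node intervals. The sketch below uses the 300 ft east-west and 450 ft north-south intervals from the example above over a hypothetical 3000 ft by 2700 ft area; `grid_nodes` and `max_samples_per_cell` are illustrative names, and the well locations are made up. The second helper checks the common rule of thumb that a grid cell should hold no more than one sample point.

```python
from collections import Counter


def grid_nodes(xmin, xmax, ymin, ymax, dx, dy):
    """Node coordinates of a regular rectangular grid, row by row from (xmin, ymin)."""
    nx = int((xmax - xmin) // dx) + 1   # number of columns
    ny = int((ymax - ymin) // dy) + 1   # number of rows
    return [(xmin + i * dx, ymin + j * dy) for j in range(ny) for i in range(nx)]


def max_samples_per_cell(samples, xmin, ymin, dx, dy):
    """Largest number of sample points that fall inside any one grid cell."""
    cells = Counter((int((x - xmin) // dx), int((y - ymin) // dy)) for x, y in samples)
    return max(cells.values())


# Hypothetical 3000 ft x 2700 ft area: 300 ft east-west, 450 ft north-south
nodes = grid_nodes(0, 3000, 0, 2700, dx=300, dy=450)
print(len(nodes))  # 11 columns x 7 rows = 77 nodes

wells = [(100, 100), (700, 200), (1000, 1000)]  # hypothetical well locations
print(max_samples_per_cell(wells, 0, 0, dx=300, dy=450))  # 1: at most one well per cell
```

Halving both intervals would roughly quadruple the number of nodes, which illustrates the processing-time side of the grid-size trade-off.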

A rule of thumb says that the grid interval should be specified so that a given grid cell contains no more than one sample point. A useful approach is to estimate, by eye, the average well spacing, rounded to an even increment (e.g., 200 rather than 196.7), and use it as the grid interval.
GRIDS AND GRIDDING
Within the realm of geostatistics, you will often discover that seemingly similar words have quite different meanings. In this case, the word "gridding" should not be considered as just a grammatical variation on the word "grid." Gridding is the process of estimating the value of an attribute from isolated points onto a regularly spaced mesh, called a grid (as described above). The attribute's values are estimated at each grid node.
Interpolation And Contouring
The objective of contouring is to visually describe or delineate the form of a surface. The surface may represent a structural surface, such as depth to the top of a reservoir, or may represent the magnitude of a petrophysical property, such as porosity. According to Jones, et al. (1986), the purpose of contouring is to summarize large volumes of data and to depict its three-dimensional spatial distribution on a 2-D paper surface. Contour maps are a type of three-dimensional graph or diagram, compressed onto a flat, two-dimensional representation. The X- and Y-axes usually correspond to the geographical coordinates east-west and north-south. The Z-axis typically represents the value of the attribute, whether it depicts elevation, thickness, porosity, or some other quantity (Davis, 1986). It is not possible to know the value of the surface at every possible location, nor can we measure its value at every point we might wish to choose. We use contour maps to represent the value of the property at unsampled locations (Davis, 1986; Jones, et al., 1986). Contour lines connect points of equal value on a map, and the space between two successive contour lines contains only points falling within the interval defined by the contour lines. Contour lines, strictly speaking, are isolines of elevation. However, geologists are rather casual about their use of terminology, and usually call any isoline a contour, for example: elevation with respect to sea level, composition, thickness, or porosity.
The Interpolation Process
The mapping (interpolation) and contouring process involves four basic steps. In this case, the four mapping and contouring steps are:
1. Identifying the area and the attribute to be mapped (Figure 1: Location and values of control points within the mapping area, North Cowden Field, West Texas).
2. Designing the grid over the area (Figure 2: Grid design superimposed on the control points).
3. Calculating the values to be assigned at each grid node (Figure 3: Upper left quadrant of the grid shown in Figure 2; the values represent interpolated values at the grid nodes, and are used to create the contours shown in Figure 4).
4. Using the estimated grid node values to draw contours (Figure 4: Contour map of porosity, created from the control points in Figure 1 and the grid mesh values shown in Figure 3).
These steps are illustrated with porosity measurements from the previously mentioned West Texas data set.

TRADITIONAL INTERPOLATION METHODS
INTRODUCTION
The point-estimation methods described in this section consist of common methods used to make contour maps. These methods use non-geostatistical interpolation algorithms and do not require a spatial model. They provide a way to create an initial "quick look" map of the attributes of interest. This section is not meant to provide an exhaustive dissertation of the subject, but will introduce certain concepts needed to understand the principles of geostatistical interpolation and simulation methods discussed in later sections.
Most interpolation methods use a weighted average of values from control points in the vicinity of the grid node in order to estimate the value of the attribute assigned to that node. With this approach, the attribute values of the nearest control points are weighted according to their distance from the grid node, with the heavier weights assigned to the closest points. The attribute values of grid nodes that lie beyond the outermost control points must be extrapolated from values assigned to the nearest control points, given the location of the grid node.
Many of the following methods require the definition of Neighborhood parameters to characterize the set of sample points used during the estimation process. For the upcoming examples, we have specified the following neighborhood parameters:
- Isotropic ellipse with a radius = 5000 feet
- 4 quadrants
- A minimum of 7 sample points, with an optimum of 3 sample points per quadrant
These examples use porosity measurements located on a nearly regular grid. See Figure 1 (Location and values of control points within the mapping area at North Cowden Field, West Texas) for the sample locations and porosity values.
Figure 1

The following seven estimation methods will be discussed in turn:
- Inverse Distance
- Closest Point
- Moving Average
- Least Squares Polynomial
- Spline
- Polygons of Influence
- Triangulation
The first five estimation methods are accompanied by images that illustrate the patterns and relative magnitude of the porosity values created by each method. (No porosity mapping images were produced for the polygons of influence and triangulation methods.) All images have the same color scale, with a 0.5% color interval. The lowest value of porosity is dark blue (5%) and the highest value is red (13%). However, for the purpose of this illustration, the actual values are not important at this time.
INVERSE DISTANCE
This estimation method uses a linear combination of attribute values from neighboring control points. The weights assigned to the measured values used in the interpolation process are based on distance from the grid node, and are inversely proportional to that distance raised to a given power (p). If the smallest distance is smaller than a given threshold, the value of the corresponding sample is copied to the grid node. Large values of p (5 or greater) create maps similar to the closest point method, which will be described next (Isaaks and Srivastava, 1989; Jones, et al., 1986). The equation for the inverse distance method has the following form:
$$Z^* = \sum_{i=1}^{n} \lambda_i Z_i, \qquad \lambda_i = \frac{1/d_i^{\,p}}{\sum_{j=1}^{n} 1/d_j^{\,p}}$$
where: Z* = the estimated value at the target grid node location; λi = the weights; Zi = the data points; di = the distance from Zi to Z*; p = the power of distance.
Figure 2 displays Inverse Distance gridding using a power of 1.
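The inverse distance formula above, together with the closest point and moving average methods described next, can all be written as weighted averages that differ only in how the weights are chosen. This is a minimal sketch: the function names, the threshold value and the two control points are hypothetical, and no search neighborhood is applied (every sample passed in is used).

```python
import math


def inverse_distance(node, samples, power=1.0, threshold=1e-9):
    """Z* = sum(lambda_i * Z_i) with lambda_i proportional to 1 / d_i**power."""
    dists = [(math.dist(node, (x, y)), z) for x, y, z in samples]
    d_min, z_min = min(dists, key=lambda t: t[0])
    if d_min < threshold:
        return z_min  # node sits on a sample: copy its value
    weights = [1.0 / d ** power for d, _ in dists]
    return sum(w * z for w, (_, z) in zip(weights, dists)) / sum(weights)


def closest_point(node, samples):
    """Z* = value of the single closest sample (weight 1, all others 0)."""
    return min(samples, key=lambda s: math.dist(node, s[:2]))[2]


def moving_average(samples):
    """Z* = (1/N) * sum(Z_i): equal weights 1/N for the N neighboring samples."""
    return sum(z for _, _, z in samples) / len(samples)


# Two hypothetical control points (x, y, porosity) on either side of a node
samples = [(1.0, 0.0, 10.0), (3.0, 0.0, 20.0)]
node = (0.0, 0.0)
print(inverse_distance(node, samples, power=1))   # weights 1 and 1/3, about 12.5
print(inverse_distance(node, samples, power=10))  # close to the closest point
print(closest_point(node, samples))               # 10.0
print(moving_average(samples))                    # 15.0
```

Raising the power from 1 to 10 pushes almost all of the weight onto the nearest sample, which is why large values of p produce maps similar to the closest point method.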

Figure 2
The Inverse Distance method is recommended as a "first pass" through the data because it:
- is simple to use and understand
- produces a "quick map"
- is an excellent QC tool
- spots erroneous sample locations
- locates the "bulls-eye" effect (lone high or low values)
- gives a first indication of trends
CLOSEST POINT
The closest point (Figure 3) or nearest neighbor methods consist of copying the value of the closest sample point to the target grid node.
Figure 3

This method can be viewed as a linear combination of the neighboring points in which all the weights equal 0, except the weight attached to the closest point, which equals 1 (100%) (Henley, 1981; Jones, et al., 1986).

$$Z^* = Z_{\text{closest}}$$

Where:
Z* = the estimated value at the target grid node
Z_closest = the value of the data point closest to the grid node

MOVING AVERAGE
The moving average method (Figure 4) is the most frequently used estimation method.

Figure 4

Each neighboring sample point is given the same weight. The weight is calculated so that the weights of all the neighboring sample points sum to unity (Henley, 1981; Jones, et al., 1986). So, if we assume that there are N neighboring data, each weight is 1/N and:

$$Z^* = \frac{1}{N} \sum_{i=1}^{N} Z_i$$

Where:
Z* = the estimated value at the target grid node
Zi = the data points
N = the number of neighboring data points

The moving average takes its name from how it works. It estimates the attribute value at each grid node as the equally weighted average of nearby control points in the search neighborhood, and then moves the neighborhood from grid node to grid node.

LEAST SQUARES POLYNOMIAL
The least squares polynomial method (Figure 5) is commonly used for trend surface analysis.

Figure 5

The neighboring points are used to fit a polynomial expression of a degree specified by the user. The polynomial form is a logical choice for surface approximation, as any continuous function can be approximated arbitrarily closely by a polynomial over a bounded map area. The polynomial surface is a mathematical function involving powers of X and Y. The user controls the complexity of the surface (Table 1) through the number of terms used. That number depends on the degree, N, a positive integer (Jones, et al., 1986; Davis, 1986; Krumbein and Graybill, 1965; Henley, 1981).

$$Z^* = \sum_{i + j \le N} a_{ij} X^i Y^j$$

Where Z* = the estimated value at the target grid node, X and Y are its coordinates, and the aij are the coefficients fitted by least squares.

Table 1: General form of polynomial functions (after Jones, et al., 1986)

Degree   Function
1        Z = a00 + a10X + a01Y
2        Z = a00 + a10X + a01Y + a20X² + a11XY + a02Y²
N        Z = a00 + a10X + a01Y + ... + a0NY^N
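The moving average and the first-degree (planar) least squares polynomial described above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions:
- The function names are hypothetical.
- Only the degree 1 form of Table 1 is fitted. Higher degrees would add columns such as X², XY and Y² to the design row.
- The normal equations are solved with a small hand-written Gaussian elimination rather than a numerical library.

```python
def moving_average_estimate(neighbor_values):
    """Z* = (1/N) * sum(Z_i): every neighbour gets the same weight 1/N."""
    return sum(neighbor_values) / len(neighbor_values)


def fit_degree1_trend(samples):
    """Least squares fit of Z = a00 + a10*X + a01*Y to (x, y, z) samples.

    Needs at least three samples that are not all on one straight line,
    otherwise the normal equations are singular. Returns [a00, a10, a01].
    """
    # Accumulate the normal equations (A^T A) a = A^T z, design row [1, x, y].
    ata = [[0.0] * 3 for _ in range(3)]
    atz = [0.0] * 3
    for x, y, z in samples:
        row = (1.0, x, y)
        for i in range(3):
            atz[i] += row[i] * z
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    # Gaussian elimination with partial pivoting on the augmented 3x4 matrix.
    m = [ata[i] + [atz[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    coef = [0.0] * 3
    for i in (2, 1, 0):
        known = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - known) / m[i][i]
    return coef


def trend_value(coef, x, y):
    """Evaluate the fitted plane at (x, y)."""
    a00, a10, a01 = coef
    return a00 + a10 * x + a01 * y
```

As a check, points taken exactly from Z = 1 + 2X + 3Y return the coefficients [1, 2, 3].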

SPLINE
Spline fitting is a commonly used quantitative method. The method ignores geologic trends and allows sample location geometry to dictate the range of influence of the samples. The bicubic spline (Figure 6) is a two-dimensional gridding algorithm.

Figure 6

In one dimension, the function has the form of a flexible rod between the sample points. In two dimensions, the function has the form of a flexible sheet. The objective of the method is to fit the smoothest possible surface through all the samples using a least squares polynomial approach (Jones, et al., 1986).

POLYGONS OF INFLUENCE
This is a very simple method, and is often used in the mining industry to estimate average ore grade within blocks. The method is similar to the closest point approach: often, the value estimated at any location is simply the value of the closest point. Polygonal patterns are created based on sample location, with polygon boundaries lying midway between adjacent sample locations. As long as the points we are estimating fall within the same polygon of influence, the polygonal estimate does not change. As soon as we encounter a grid node in a different polygon, the estimate changes to a different value. This method causes abrupt discontinuities in the surface, and may create unrealistic maps (Isaaks and Srivastava, 1989; Henley, 1981).

TRIANGULATION
The triangulation method is used to calculate the value of a variable (such as depth or porosity) in an area of a map located between 3 known control points. Triangulation overcomes the problem of the polygonal method. It removes possible discontinuities between adjacent points by fitting a plane through the three sample points that surround the grid node being estimated (Isaaks and Srivastava, 1989). The equation of the plane is generally expressed as:

$$Z^* = ax + by + c$$

This method starts by connecting lines between the 3 known control points to form a triangle (denoted as rst in Figure 7).

Figure 7

Next, join a line from the unknown point (point O in the figure) to each of the corners of the triangle, thereby forming 3 new triangles within the original triangle. The control values (r, s, t) are located at the corners of the triangle. The value of any point located within the triangle (point O in this example) can be determined through the following steps:
1. Compute the areas of the resulting new triangles.
2. Use the areas of each triangle to establish a weight for each corner point.
3. Multiply the values of the three corner points by their respective weights.
4. Add the resulting products.

The formula to find the area of a triangle is A = bh/2. Here A is the area of the triangle, b is the length of the base, and h is the height, measured perpendicular to the base. Do this for each of the three new triangles. Each value is assigned a weight in proportion to the area of the triangle opposite it, as shown by the example in Figure 7:
- The value at point r is weighted by the triangular area A_Ost,
- point s is weighted by the area A_Ort, and
- point t is weighted by the area A_Ors.

Each weight is taken as a fraction of the total area of triangle rst, so that the sum of all 3 weights equals 1. Now multiply each weight by its associated control value to arrive at a weighted control value. Then add up the 3 weighted control values to triangulate an interpolated depth for point O. This example shows how the values from the three closest locations are weighted by triangular areas to form an estimated value at point O. Be aware, however, that choosing different meshes of triangles, or entering the data in a different sequence, may result in a different set of contours for your map.
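The four triangulation steps above translate directly into code. In this sketch, the function names are hypothetical. Each triangle area is computed from the corner coordinates with the cross-product form of A = bh/2. Each weight is the area of the opposite sub-triangle divided by the area of rst, so the weights sum to 1. The sketch assumes point O lies inside triangle rst and raises an error otherwise.

```python
def triangle_area(p, q, r):
    """Area of triangle pqr from vertex coordinates (equivalent to A = b*h/2)."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def triangulate(o, r, s, t, z_r, z_s, z_t):
    """Estimate the value at point o inside triangle rst.

    Each corner is weighted by the area of the sub-triangle opposite it,
    expressed as a fraction of the area of rst, so the weights sum to 1.
    """
    total = triangle_area(r, s, t)
    w_r = triangle_area(o, s, t) / total  # A_Ost
    w_s = triangle_area(o, r, t) / total  # A_Ort
    w_t = triangle_area(o, r, s) / total  # A_Ors
    if abs(w_r + w_s + w_t - 1.0) > 1e-9:
        raise ValueError("point o lies outside triangle rst")
    return w_r * z_r + w_s * z_s + w_t * z_t
```

Take a triangle with corners r = (0, 0), s = (4, 0) and t = (0, 4), holding values 10, 20 and 30. The point O = (1, 1) receives weights 0.5, 0.25 and 0.25, giving an estimate of 17.5. The plane Z* = 10 + 2.5x + 5y through the three corners gives the same value at that point.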

MAP DISPLAY TYPES
CONTOUR MAPS
Contour maps reveal overall trends in the data values. Hand contouring the data is an excellent way to become familiar with the data set. Unfortunately, many data sets are too large to hand contour, so computer contouring is often the only alternative. Gridding and contouring of the data requires values to be interpolated onto a regular grid. At this preliminary stage of spatial description, the details of the contouring algorithm need not concern us as long as the contour map is a good visual display of the data. For a first pass through the data, the inverse distance algorithm is a good choice, and its parameters are easy to set. If possible, design the grid interval to be about the average spacing of the wells, or half that size. Then choose an isotropic octant search neighborhood with about two data points per octant.

Figure 1 (Grid mesh and data location of 55 porosity data points from North Cowden Field in West Texas)

Figure 1 shows the grid design with respect to the data locations. Figure 2 is the resulting contour map, made with an inverse distance approach using a distance power of 1.

Figure 2

In this example, the high porosity area is located in the upper right quadrant and extends down the right side of the mapped area. A zone of lower porosity trends east to west through the central portion of the area. There is a second region of high porosity in the southern, central portion of the area.

SYMBOL MAPS
For many large data sets (for example, seismic), posting individual sample values may not be feasible and contouring may mask interesting local details (Isaaks and Srivastava, 1989). An alternative approach is a symbol map. Different colors for ranges of data values can be used to reveal trends in high and low values. Figure 3 is a five-color symbol map of 33,800 acoustic impedance values from a high resolution 3D seismic survey, scaled between 0 and 1. We can see that low values are generally trending north-south. Displays such as this will aid in designing the spatial analysis strategy and help to highlight directional continuity trends.

Figure 3

Previous studies with this data set and its accompanying porosity data set (Chambers, et al., 1994) show that acoustic impedance has a -0.83 correlation with porosity. Therefore, this map may indicate zones of high porosity associated with the red and orange areas (see the contour and data posting maps, Figure 1 and Figure 2). Low porosity is located in the blue and green areas. If this relationship holds, the seismic data can be used to infer porosity in the inter-well regions using a geostatistical data integration technique commonly referred to as cokriging (which we will describe later in the section on Data Integration).

INDICATOR MAPS
An indicator map is a special type of symbol map with only two symbols, for example, black and white. With these two symbols, each data point is assigned to one of two classes. Indicator maps simply record when a data value is above or below a certain threshold. Four indicator maps (Figure 4) were created from the acoustic impedance data shown in Figure 3. The threshold values are 0.2, 0.4, 0.6, and 0.8 acoustic impedance scaled units.
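An indicator map only needs each value to be coded as 0 or 1 against a threshold. The sketch below assumes the common convention of coding 1 when a value is at or below the threshold and 0 otherwise. The source does not say which class is shown in black, so treat that choice as an assumption, along with the hypothetical function names. The default thresholds are the four used for Figure 4.

```python
def indicator(values, threshold):
    """Code each value as 1 if it is <= threshold, else 0 (assumed convention)."""
    return [1 if v <= threshold else 0 for v in values]


def indicator_maps(values, thresholds=(0.2, 0.4, 0.6, 0.8)):
    """One indicator-coded copy of the data per threshold, keyed by threshold."""
    return {t: indicator(values, t) for t in thresholds}
```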

Figure 4

These maps show definite trends in high and low values, which relate to trends in porosity values. Figure 4 parts B and C are perhaps the most revealing. Zones of lowest scaled impedance are located in the upper right quadrant of the study area (Figure 4 part B) and trend generally north to south. Another north to south trend appears in the lower right corner of the map area. Zones of highest impedance are on the western side of the study area. There is also a zone of high impedance cutting east to west across the southern, central portion of the area.

Data posting, contour, symbol and indicator maps provide us with a lot of information about the spatial arrangement and pattern in our data sets. These are also excellent quality control displays and provide clues about potential data problems.

MOVING WINDOW STATISTICS
INTRODUCTION
Moving window statistics provide a way to look for local anomalies in the data set. In earth science data it is quite common to find data values in some regions that display more variability than in other regions, which in statistical jargon is called heteroscedasticity. For example, local zones of high permeability (thief zones) or low permeability (barriers) hamper the effective recovery of hydrocarbons. They also create many more problems with secondary and tertiary recovery operations. The calculation of a few summary statistics (mean and standard deviation) within moving windows is often useful for investigating anomalies in the data (Isaaks and Srivastava, 1989). The method is quite simple, and consists of:

- dividing the area into local neighborhoods of equal size, and
- then computing summary statistics within each local area.

The window size depends on the average data spacing, the dimensions of the study area, and the amount of data.

PROPORTIONALITY EFFECT
The proportionality effect concerns the relationship of the local summary statistics computed from moving windows. We are looking for possible trends in the local mean and standard deviation. It is also important to see if the magnitude of the local standard deviation tracks (correlates) with the magnitude of the local mean. According to Isaaks and Srivastava (1989), there are four relationships between the local mean and the local variability (e.g., standard deviation or variance):
- There is no trend in the local mean or the variability, and the variability is independent of the magnitude of the local mean. This is the ideal case, but it is rarely seen.
- There is a trend in the local mean, but the variability is independent of the local mean and has no trend.
- There is no trend in the local mean, but there is a trend in the local variability.
- There is a trend in both the local mean and the local variability. The magnitude of the local variability correlates with the magnitude of the local mean, a relationship known as the proportionality effect.

If the local variability is roughly constant, then estimates anywhere in the mapped area will be as good as estimates anywhere else. Thus, for estimation purposes, the first two cases are the most favorable. If the local mean shows a trend, then we need to examine our data for signs of stationarity. See Isaaks and Srivastava (1989) for more details on moving windows and the proportionality effect.

CONCEPT OF STATIONARITY
A stationary property is stable throughout the area measured. Stationarity may be considered the statistical equivalent of homogeneity, in which statistical parameters such as mean and standard deviation are not seen to change. Stationarity requires that values in the data set represent the statistical population. Ideally, we would like our data to be independent of sample location. However, data often show a regular increase (or decrease) in value over large distances. Such data are then said to be non-stationary, or to show a trend (Hohn, 1988; Henley, 1981; Isaaks and Srivastava, 1989; Wackernagel, 1995).

The concept of stationarity is used in everyday practice. For example, consider the following statement: the top of Formation A occurs at a depth of about 975 feet TVDss. This statement does not preclude the possibility that Formation A varies in depth from well to well. However, if Z(top) is a stationary random function, then:
- At location Z(xi), Formation A occurs at 975 feet TVDss, and
- At location Z(xi + ½ mile), Formation A should also occur at about 975 feet TVDss.

If, however, Formation A is known to be non-stationary, then predicting the depth to the top of Formation A in a new well is more difficult, and requires a more sophisticated model. We will discuss such models, and how stationarity influences them, in the section on regionalized variables.

INTRODUCTION
In the reservoir, the variables of interest (e.g., porosity, permeability, sand/shale volumes, etc.) are products of a variety of complex physical and chemical processes. These processes superimpose a spatial pattern on the reservoir rock properties. It is therefore important to understand the scales and directional aspects of these features to gain efficient hydrocarbon production. The spatial component adds a degree of complexity to these variables. It also increases the uncertainty about the behavior of attributes at locations between sample points (sample points are usually wells). Deterministic models cannot handle the uncertainties associated with such variables. A geostatistical approach has therefore been developed, because it is based on probabilistic models that account for these inevitable uncertainties (Isaaks and Srivastava, 1989).

THE BASIS OF THE REGIONALIZED VARIABLE AND SPATIAL CORRELATION
Matheron, in his definitive work on Geostatistics entitled Traite de Geostatistique Appliquee (1963), laid the foundation for regionalized variable theory. This is a body of theoretical statistics in which location was, for the first time, considered an important factor in the estimation procedure. Regionalized variable theory pertains to the statistics of a special type of variable that differs from the ordinary scalar random variable. Although the regionalized variable possesses the usual distribution parameters (mean, variance, etc.), it also has a defined spatial location. Successive realizations of an ordinary scalar random variable are uncorrelated. In contrast, two realizations of a regionalized variable that differ in spatial location will in general display a non-zero correlation (Henley, 1981). Thus, the premise of regionalized variables and spatial correlation analysis is to quantify the continuity of sample properties with distance and direction, and their associated uncertainties.

Spatial correlation analysis is one of the most important steps in a geostatistical reservoir study, because it conditions subsequent processes, such as kriging and conditional simulation results. For example, we can intuitively surmise that two wells in close proximity are more likely to have similar reservoir properties than two wells which are further apart. But just how far apart can wells be and still yield similar results? We need a new statistical measure, because classical univariate and bivariate statistics cannot capture spatial correlation information.

PROPERTIES OF REGIONALIZED AND RANDOM VARIABLES
One data set can have exactly the same univariate statistics as another set, and yet exhibit very different spatial continuity. Consider the following two sequences (Figure 1a and Figure 1b):

Figure 1a and Figure 1b: Comparison of porosity data measured over 50 units of distance with an equal sampling interval

This graphic shows a plot of porosity measures along two transects. The sample spacing is 1 unit of distance. Sequence A, on the left (Regionalized Variable), shows spatial continuity in porosity. Sequence B, on the right (Random Variable), shows a random distribution of porosity. However, the mean, variance and histogram for both porosity sequences are identical.

Statistical Properties Of The Porosity Profiles
The porosity profiles in the above graphic are purely hypothetical, but serve to illustrate the concepts behind the regionalized variable and spatial correlation.
Similarities:
- Same mean = 8.4%
- Same standard deviation = 2.7%
- Same frequency distribution (histogram)
Differences:
- Sequence A has SPATIAL CONTINUITY
- Sequence B is RANDOM

Sequence A exhibits a structured or spatial correlation component, and hence is a REGIONALIZED VARIABLE. Sequence B does not show any spatial continuity, and so is classified as a RANDOM VARIABLE. These variables will come into play later.

The process of hand-contouring data points on a map is a form of geostatistical modeling, in which the geoscientist has a certain model in mind before attempting the contouring exercise. However, during the mapping process, those who contour data usually assume the presence of the spatial component (the regionalized variable). They typically ignore the second component: the random variable, or the lack of spatial correlation.

Autocorrelation
Let us further investigate the concept of spatial continuity by plotting Sequences A and B in a different way. When any of these data sequences is plotted against itself, it yields a slope of 45 degrees, indicating perfect correlation. Now translate the data by the sampling interval and plot it against itself again; we will begin to see the impact of spatial correlation. Figures 2a through 2f (correlation plots) will help to illustrate the concept of spatial autocorrelation.

Sequence A Correlation Plots: Figures 2a, 2b, 2c, 2d, 2e and 2f

Here we see a reasonably good correlation even with 3 units of translation of Sequence A.

Sequence B Correlation Plots: Figures 3a, 3b and 3c (correlation cross-plots)

Comparing Figures 3a, 3b and 3c with the previous graphic reveals a distinct difference in autocorrelation characteristics. There is a poor correlation after one unit of translation of Sequence B.

Observations
The correlation plots of Sequences A and B (above) are h-Scatterplots, where h represents lag, or separation distance. Recall that the concept of the h-Scatterplot was discussed in the section on Bivariate Statistical Measures and Displays. For this case, h-Scatterplots were computed along two different transects. The shape of the cloud on these plots tells us how continuous the data values are over a certain distance in a particular direction. If the data values at locations separated by h are identical, they will fall on the line x = y, a 45-degree line of perfect correlation. As the data become less and less similar, the cloud of points on the h-Scatterplot becomes fatter and more diffuse (Hohn, 1988; Isaaks and Srivastava, 1989). The h-Scatterplot forms the basis for describing a model of spatial correlation.

The following observations are readily apparent from the previous two figures:
Sequence A is a Regionalized Variable and shows spatial continuity over about 3 units of distance:
- Good correlation is shown after three units of translation.
- Correlation is 0.21 after five units of translation, showing little or no correlation.
Sequence B is a Random Variable with no spatial continuity:
- Poor correlation is shown after one unit of translation.
- Correlation approaches 0 after two units of translation.

The Random Function
The complex attributes we study in the reservoir are random functions, which are combinations of Regionalized and Random Variables. Thus, the random function has two components:
- Structured Component, consisting of the regionalized variable, which exhibits some degree of spatial auto-correlation
- Local Random Component, consisting of the random variable (also referred to as the nugget effect)

The random function model assumes that:
- The single measurement at location z(xi) is one possible outcome from a random variable located at point Z(xi).
- The set of collected samples, z(xi), i = 1, … n, is interpreted as a particular realization of dependent random variables, Z(xi), i = 1, … n, known as a random function.

By studying the spatial dependency between any two measurements of the same attribute sampled at z(xi) and z(xi+h), we are essentially studying the spatial correlation between two corresponding random functions, Z(xi) and Z(xi+h). For more information on the random function model, see Hohn (1988), Isaaks and Srivastava (1989), Deutsch and Journel (1992), Wackernagel (1995) and Henley (1981).

SPATIAL CONTINUITY ANALYSIS
In previous sections, we discussed classical methods for analyzing single variables or multiple variables. Those methods, however, could not properly address the spatial continuity and directionality that is inherent in earth science data. In this section, we will introduce the concept of auto-correlation and the tools used to measure this property. These tools allow us to quantify the continuity, anisotropy and azimuthal properties of our measured data set. The two common measures of spatial continuity are the variogram and its close relative, the correlogram.

SPATIAL AUTO-CORRELATION
Spatial auto-correlation describes the relationship between regionalized variables sampled at different locations. Samples that are auto-correlated are not independent with regard to distance. The closer two variables are to each other in space, the more likely they are to be related. In fact, the value of a variable at one location can be predicted from values sampled at other (nearby) locations. Quantifying spatial information involves comparing attribute values measured at one location with values of the same attribute measured at other locations. This method is analogous to the h-Scatterplot. Traditional interpolation procedures assume that spatial correlation within a data set may be modeled by a linear function. Their premise is that the variability between data values increases proportionally with the distance between sampled locations. In practice, such a linear correlation amounts to gross oversimplification.

THE VARIOGRAM
Regionalized variable theory uses the concept of semivariance to express the relationship between different points on a surface. Semivariance is defined as:

$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ z(x_i) - z(x_i + h) \right]^2$$

Where:
γ(h) = semivariance
h = lag (separation distance)
z(xi) = value of the sample located at point xi
z(xi+h) = value of the sample located at point xi+h
N(h) = total number of sample pairs for the lag interval h
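The semivariance formula can be computed directly by looping over all sample pairs. The following is a minimal omni-directional sketch under these assumptions:
- A pair is accepted when its separation falls within a lag tolerance of h. This idea is borrowed from the lag binning discussed under search strategies.
- Each unordered pair is counted once.
- The function names are hypothetical.

```python
import math


def experimental_semivariance(samples, lag, lag_tol):
    """gamma(h) = 1/(2N(h)) * sum of [z(x_i) - z(x_i + h)]^2.

    samples -- list of (x, y, z); a pair is used when its separation
               distance lies within lag +/- lag_tol (all directions).
    Returns (gamma, n_pairs); gamma is None when no pair qualifies.
    """
    sq_sum = 0.0
    n_pairs = 0
    for i in range(len(samples)):
        xi, yi, zi = samples[i]
        for j in range(i + 1, len(samples)):  # each pair counted once
            xj, yj, zj = samples[j]
            if abs(math.hypot(xj - xi, yj - yi) - lag) <= lag_tol:
                sq_sum += (zi - zj) ** 2
                n_pairs += 1
    if n_pairs == 0:
        return None, 0
    return sq_sum / (2 * n_pairs), n_pairs


def experimental_variogram(samples, lag, lag_tol, n_lags):
    """(h, gamma, n_pairs) at lags h, 2h, ..., n_lags*h (omni-directional)."""
    return [(k * lag, *experimental_semivariance(samples, k * lag, lag_tol))
            for k in range(1, n_lags + 1)]
```

For four samples spaced one unit apart with values 1, 2, 4 and 7, the first lag uses 3 pairs and gives γ = (1 + 4 + 9) / 6 ≈ 2.33. The semivariance keeps rising at lags 2 and 3, as expected for values that drift steadily.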

Semivariance is used to describe the rate of change of a regionalized variable as a function of distance. We know intuitively that there should be no change in values (semivariance = 0) between points located at a lag distance h = 0, because a point compared to itself shows no difference. However, when we compare points that are spaced farther apart, we see a corresponding increase in semivariance. The higher the average semivariance, the more dissimilar the values of the attribute being examined. As the distance increases further, the semivariance eventually becomes approximately equal to the variance of the surface itself. The distance at which this happens is the greatest distance over which a variable measured at one point on the surface is related to that variable at another point.

Semivariance is evaluated by calculating γ(h) for all pairs of points in the data set and assigning each pair to a lag interval h. If we plot a graph of semivariance versus lag distance, we create a variogram (also known as a semivariogram). The variogram measures dissimilarity, or increasing variance between points (decreasing correlation), as a function of distance. The variogram also helps us assess how values at different locations vary over distance. In addition, it provides a way to study the influence of other geologic factors. These factors determine whether the spatial correlation varies only with distance (the isotropic case) or with both direction and distance (the anisotropic case). The variogram is the sum of the squared differences of all data pairs falling within a certain lag distance, divided by twice the number of pairs found for that lag. We therefore use the variogram to infer the correlation between points indirectly. That is, we do not show how two points are alike, nor predict the attribute value at each point. Instead, we plot the average squared difference between values separated by a given lag distance.

The Experimental Variogram
The variogram described above is known as an experimental variogram.
The experimental variogram is based on the values contained in the data set, and is computed as a preliminary step in the kriging process. The experimental variogram serves as a template for the model variogram, which is used to guide the kriging process.

THE CORRELOGRAM
The correlogram is another measure of spatial dependence. Rather than measuring dissimilarity, the correlogram measures similarity, or correlation, versus separation distance. The covariance at lag h is:

$$C(h) = \frac{1}{N(h)} \sum_{i=1}^{N(h)} \left[ z(x_i) - m \right] \left[ z(x_i + h) - m \right]$$

Where m is the sample mean over all N(h) paired points separated by distance h. Computing the covariance for increasing lags (double, triple, etc.) allows us to generate a plot showing decreasing covariance with distance, as shown in Figure 1a and 1b.

Figure 1a and Figure 1b: Omni-directional variogram (A) and correlogram (B) computed from the same data set
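The sketch below computes the covariance formula above, and a correlogram derived from it, for a regularly spaced transect. It scales by C(0) so that the curve starts at 1.0. That is one simple choice assumed here; some programs instead divide by the standard deviations of the head and tail values at each lag. The function names are hypothetical.

```python
def covariance_at_lag(z, h):
    """C(h) = 1/N(h) * sum [z(x_i) - m][z(x_i + h) - m] for a regular sequence.

    z -- values at equal spacing; h -- lag in sampling intervals (h >= 0).
    m is the mean of all values that take part in the N(h) pairs.
    """
    pairs = list(zip(z, z[h:]))  # (z[i], z[i + h]) for every valid i
    n = len(pairs)
    m = sum(a + b for a, b in pairs) / (2 * n)
    return sum((a - m) * (b - m) for a, b in pairs) / n


def correlogram(z, h):
    """Covariance scaled by C(0), so the curve starts at 1.0 at the origin."""
    return covariance_at_lag(z, h) / covariance_at_lag(z, 0)
```

For the sequence 1, 2, 3, 4, 5, C(0) = 2.0 and C(1) = 1.0, so the correlogram drops from 1.0 to 0.5 at the first lag.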

In this graphic, we see that the variogram in Frame A measures increasing variability (dissimilarity) with increasing distance. In contrast, the correlogram in Frame B measures decreasing correlation with distance.

Anatomy of the correlation model
Notice that the variogram and correlogram plots in Figure 1a and 1b curve in opposite directions. At the origin, the variogram shows zero variance and the correlogram shows perfect correlation, a measure of self-similarity. The variogram model tends to reach a plateau called a sill (the dashed horizontal line at the top of Figure 1a). The sill represents the maximum variance of the measured spatial process being modeled. The lag distance at which the variogram reaches the sill is called the range. The range represents the maximum separation distance at which one data point will be able to correlate with any other point in the data set. The range plays a role in determining the maximum separation distance between grids. The correlogram reaches its range when C(h) = 0; that is, the correlation scale length is reached when the covariance value reaches zero (no correlation). Working from the same data set, the range for a variogram and a correlogram should be the same for a given set of search parameters. The sill and range are useful properties when comparing directional trends in the data.

Notice that the correlogram intersects the Y-axis at 1.0, but the variogram model shows a discontinuity near zero. Often the variogram or correlogram shows a discontinuity near the origin. In geostatistical terminology, such a discontinuity is called the nugget effect. The nugget effect is generally interpreted as residual variance, or spatially independent variability. This variability occurs at spatial scales below the observational threshold of the sampling, that is, smaller than the resolution of the sample grid. It can also be caused by random noise at all scales, or by measurement error.

Variogram Search Strategies
If the sample data have a regular sampling interval, the search strategy is simple. Unfortunately, point data (well data) rarely form a neat regular array. Therefore, to extract as much information out of the data as possible, we search for data within a bin rather than along a simple vector.

Search Parameters
When computing the experimental variogram (covariance model), the following search parameters must be taken into account (Figure 2: Search strategy along azimuths 45 and 135 degrees):

Figure 2

- Lag: The lag distance is the separation distance, h, between sample points used in calculating the experimental model.

In a producing field, a good starting distance might be the average well spacing, rather than the closest well pair spacing. For example, West Texas fields are commonly drilled on 1340-foot (1/4-mile) spacing. However, not all wells will be spaced exactly 1340 feet apart, so for this example we should set the lag at 1400 feet. Then we will program a lag tolerance of one-half the lag interval. Thus, the first lag bin is from 700 to 2100 feet. (Some programs may set the first bin centered around 350 feet to account for wells spaced closer than 700 feet.) The second lag bin is from 2100 to 3500 feet, and so forth until the specified maximum lag distance is reached. In the above graphic, Point A is compared to Point B, which lies within one of the search bins designated by the lag tolerance.

An important consideration in designing a lag strategy is how we specify the size of the lag bin. If we decrease the size of the bin, we increase the number of bins, and hence the number of points that we plot on the variogram. This will increase the resolution of the variogram. However, by decreasing the size of each bin, we also decrease the number of data pairs within each class. This makes it less likely that the average semivariance for that class is accurately estimated.

- Search Azimuth: Because reservoir data often exhibit directional properties, we may wish to specify a certain direction for the search strategy. Such is the case when the continuity of a reservoir property is more prevalent in one direction than in another. We say this property exhibits anisotropic behavior. For example, we may wish to calculate two directions, at 045 and 135 degrees. The search azimuth also has an azimuth tolerance. By using a 45-degree tolerance about each search direction, we will be able to cover all sample locations in the neighborhood.
- Bandwidth: The bandwidth restricts the limits (width) of the azimuth tolerance at large lag distances. The bandwidth is indicated by a light dashed line about the azimuthal direction of 45 degrees (heavy dashed line).

Omni-directional Experimental Models
There are many ways to design a search strategy, each dictated by the data configuration and the number of sample points. Quite often, however, a sparse data set makes it necessary to conduct an omni-directional search. An omni-directional search is designed by selecting a single azimuthal direction (it does not matter which angle is selected) and setting a tolerance of 90 degrees. (Calculating an omni-directional variogram does not necessarily imply a belief that the spatial continuity is the same in all directions.) The omni-directional model is a good choice for an initial variogram, which can subsequently be refined by calculating a directional model:
- It is the average of all possible directional variograms.
- It can serve as an early warning for erratic data points.
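The lag and azimuth rules in this section can be expressed as two small tests applied to each data pair. This is a sketch under these assumptions:
- Distances are in feet.
- Azimuths are measured clockwise from north, with x to the east and y to the north.
- A pair and its reverse are treated as the same direction.
- The function names are hypothetical.

The defaults reproduce the West Texas example (1400-foot lag, 700-foot tolerance).

```python
import math


def lag_bin(distance, lag=1400.0, lag_tol=700.0, max_lag=None):
    """Index k (1, 2, ...) of the lag bin centred on k*lag that holds distance.

    With lag = 1400 ft and lag_tol = 700 ft, bin 1 spans 700-2100 ft and
    bin 2 spans 2100-3500 ft. Returns None below the first bin or past max_lag.
    """
    k = int((distance + lag_tol) // lag)
    if k < 1 or abs(distance - k * lag) > lag_tol:
        return None
    if max_lag is not None and k * lag > max_lag:
        return None
    return k


def in_search_direction(dx, dy, azimuth_deg, az_tol_deg, bandwidth=math.inf):
    """True when the separation vector (dx east, dy north) falls in the search cone.

    A 90-degree tolerance accepts every direction (the omni-directional case).
    """
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return False
    az = math.radians(azimuth_deg)
    ux, uy = math.sin(az), math.cos(az)   # unit vector of the search azimuth
    along = dx * ux + dy * uy             # component along the search direction
    across = abs(dx * uy - dy * ux)       # perpendicular offset from the search line
    angle = math.degrees(math.acos(min(1.0, abs(along) / dist)))
    return angle <= az_tol_deg and across <= bandwidth
```

Consider a pair offset 1000 ft east and 1000 ft north. It falls in the 045-degree search with a 22.5-degree tolerance, but not in the 135-degree search. A bandwidth of 500 ft rejects the pair (5000 ft east, 4000 ft north) even though its direction is within the azimuth tolerance.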

Anisotropic Experimental Models
Earth science data are often more continuous in one direction than in another. We therefore need to design a variogram search strategy to model the maximum and minimum directions of continuity. In Figure 2, the minimum direction of continuity might align along the 45-degree azimuth. By definition, the maximum direction of continuity is orthogonal (90 degrees) to the minimum axis. Note, however, that this assumption does not always hold true, as the axes may change direction across the study area (e.g., a meandering channel system). If the omni-directional variogram is not able to clearly define the spatial continuity, then it is unlikely that directional variograms will establish it.

Accounting for Anisotropy
If we base our variogram search along two different azimuths, we produce two variograms on the same chart. Plotting the results onto a common graph often reveals the influence of anisotropy. It is preferable to have a variogram that conforms to the major axis of anisotropy. A study of each variogram allows us to further characterize the nature of the anisotropy. Figure 3a and 3b (Variograms computed along different azimuths) show two types of anisotropy:

- Geometric anisotropy (Frame A) is indicated by directional variograms that have the same sill, but different ranges.
- Zonal anisotropy (Frame B) is seen when variograms have different sills but the same range.

Practical Considerations for Computing the Experimental Spatial Variogram
- The variogram computation involves a decision of stationarity.
- May need to “clean up” the data prior to calculating the variogram; variograms are very sensitive to “outliers.”
- If data are skewed, consider transforming the data (e.g., to a Gaussian distribution).
- Consider data clustering.
- The omni-directional variogram considers all azimuths simultaneously.
- An omni-directional variogram is the average of all directional variograms.
- An omni-directional variogram contains more sample pairs per lag than any directional variogram, and therefore is more likely to show structure.
- The Nugget Effect is more easily determined from the omni-directional variogram.
- Interpret a variogram only if the corresponding number of pairs is sufficient (e.g., 15 to 20 pairs).
- Do not consider variogram values for distances greater than about ½ the size of the study area.
- A saw-toothed pattern may indicate a poor choice of lag increment.
- Non-stationary variograms do not reach a sill and are considered unbounded (they have a characteristic parabolic upward shape).
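For geometric anisotropy (Frame A), one common way to use the maximum and minimum directions of continuity is to rotate each separation vector onto those axes and stretch the minor-axis component by the anisotropy ratio, so that a single isotropic variogram with the major-axis range can be applied. The sketch below assumes NumPy, orthogonal axes, and azimuths measured clockwise from north; `anisotropic_distance` is a hypothetical helper, and the example uses the 5:1, N15E settings of the Figure 4 kriging example later in this section.

```python
import numpy as np

def anisotropic_distance(dx, dy, major_azimuth_deg, ratio):
    """Equivalent isotropic distance for a separation (dx east, dy north).

    major_azimuth_deg : azimuth of maximum continuity, clockwise from north
    ratio             : major range / minor range (>= 1)
    Assumes geometric anisotropy with orthogonal major and minor axes; the
    result is meant for a variogram whose range equals the major-axis range.
    """
    az = np.radians(major_azimuth_deg)
    h_major = dx * np.sin(az) + dy * np.cos(az)   # component along the major axis
    h_minor = dx * np.cos(az) - dy * np.sin(az)   # component along the minor axis
    return np.hypot(h_major, ratio * h_minor)     # minor component stretched by ratio

s, c = np.sin(np.radians(15)), np.cos(np.radians(15))
print(anisotropic_distance(1000 * s, 1000 * c, 15, 5))    # about 1000 (along N15E)
print(anisotropic_distance(1000 * c, -1000 * s, 15, 5))   # about 5000 (along the minor axis)
```

A 1000-meter step across the grain therefore "counts" as 5000 meters, which is exactly what makes the minor-axis range five times shorter.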

Advantages and Limitations of Variograms and Correlograms
Given a sufficient number of data points, variograms and correlograms provide useful tools to:
- measure linear spatial dependence
- quantify spatial scales
- identify and quantify anisotropy
- test multiple geological scenarios
However, variograms and correlograms are subject to limitations:
- They are measures of linear spatial interdependence, and may not be appropriate for non-linear processes.
- It is often difficult to select a domain of stationarity (constant mean) for computation.
- Spatial correlation analysis is difficult to perform when data are sparse.

SPATIAL CROSS-CORRELATION ANALYSIS
The previous discussion focused on spatial analysis of only a single variable (e.g., comparing porosity values to other nearby porosity values), a process known as auto-correlation. To study the spatial relationships between two or more variables we use the process of cross-correlation. In this case, known values of one variable are compared to known values of a different variable. The cross correlogram or cross variogram describes spatial correlation in which the paired points represent different variables. The cross-correlation model is useful when performing cokriging or conditional cosimulation (e.g., matching well data with seismic data). If, for example, we need to estimate porosity based on measurements of acoustic impedance, then it is first necessary to compute the auto-correlation models for both attributes and then compute the cross-correlation model, as seen in Figure 1a, 1b and 1c (Omni-directional variograms of Porosity (A), Acoustic Impedance (B), and their cross variogram (C)).

In this graphic, the solid squares on the figures represent the average semivariance of the porosity or acoustic impedance data pairs for each 500-unit lag interval. The numbers of data pairs are displayed next to each average experimental data point. The first point contains only one data pair and should not be taken into consideration during the modeling step.

Below are the general variogram equations for the primary attribute (porosity), the secondary attribute (acoustic impedance), and their cross variogram. Consider the following:
- $Z(x_i)$ = the primary attribute measured at location $x_i$
- $Z(x_i + h)$ = the primary attribute measured at $x_i$ + some separation distance (lag), $h$
- $T(x_i)$ = the secondary attribute measured at location $x_i$
- $T(x_i + h)$ = the secondary attribute measured at $x_i$ + some separation distance (lag), $h$
- $N$ = the number of data pairs separated by the lag $h$
then:
- The variogram of the primary attribute is calculated as (Figure 1a):
$$\gamma_Z(h) = \frac{1}{2N}\sum_{i=1}^{N}\left[Z(x_i) - Z(x_i+h)\right]^2$$
- The variogram of the secondary attribute is calculated as (Figure 1b):
$$\gamma_T(h) = \frac{1}{2N}\sum_{i=1}^{N}\left[T(x_i) - T(x_i+h)\right]^2$$
- The cross variogram between the primary and secondary attribute is calculated as (Figure 1c):
$$\gamma_{ZT}(h) = \frac{1}{2N}\sum_{i=1}^{N}\left[Z(x_i) - Z(x_i+h)\right]\left[T(x_i) - T(x_i+h)\right]$$

SUPPORT EFFECT
Most reservoir studies are concerned with physical rock samples, with observations corresponding to a portion of rock of finite volume. The shape and volume of the rock are collectively termed the support of the observation. It is obvious that once a piece of rock is collected (e.g., cores, hand samples, etc.) from a location, it is impossible to collect it again from the same location (Henley, 1981). If the dimensions of the support are very small in comparison to the sampling area or volume (e.g., the reservoir), the sample can be considered as point data (Henley, 1981). For much of our work the support does not influence our mapping, until we compare samples of different sizes or volumes (e.g., core plugs and whole core, or wireline data and core). As the support size increases, the variance decreases until it remains constant after reaching a certain area or volume (Wackernagel, 1995). The support effect becomes significant when combining well data and seismic data. Seismic data cannot be considered point data: the difference in measurement volume between well logs and cores on the one hand and seismic data on the other is very large and should not be ignored. Geostatistical methods can account for the support effect using a variogram approach.

STATIONARITY IN REGIONALIZED VARIABLES
Stationarity ensures that the spatial correlation can be modeled with a positive definite function. It states that the expected value, which may be considered the mean of the data values, is the same everywhere, so that it does not depend on the locations of the data points or on the distance separating them. Mathematically, this assumption means that the expected value of the difference between two random variables is zero. This is denoted by the following equation:
$$E\left[Z(x_i+h) - Z(x_i)\right] = 0 \quad \text{for all } x_i,\ h$$
where:
- $Z(x_i)$, $Z(x_i+h)$ = random variables
- $E$ = expected value
- $x_i$ = sampled location
- $h$ = distance between sampled locations

Stationarity is defined through the first-order (mean) and second-order (variability) moments of the observed random function, and degrees of stationarity correspond to the particular moments that remain invariant across the study area (Hohn, 1988). For a random variable $Z(x)$ observed at location $x$, the distribution function of $Z(x)$ has the expectation
$$E\left[Z(x)\right] = m(x)$$
which can depend upon $x$. This is the first-order moment. Three second-order moments are useful in geostatistics:
1. The variance of the random variable $Z(x)$:
$$\operatorname{Var}\left[Z(x)\right] = E\left[\left(Z(x) - m(x)\right)^2\right]$$
2. The covariance, where $Z(x_1)$ and $Z(x_2)$ are two random variables observed at locations $x_1$ and $x_2$:
$$C(x_1 - x_2) = E\left[\left(Z(x_1) - m(x_1)\right)\left(Z(x_2) - m(x_2)\right)\right]$$
3. The semivariogram function:
$$\gamma(x_1, x_2) = \tfrac{1}{2}\operatorname{Var}\left[Z(x_1) - Z(x_2)\right]$$
Under conditions of second-order stationarity, the semivariogram and covariance are alternative measures of spatial autocorrelation (Hohn, 1988), since then $\gamma(h) = C(0) - C(h)$.

HANDLING NON-STATIONARY DATA
An ongoing debate centers on how to handle non-stationary data. Some argue that stationarity is a matter of scale (Hohn, 1988; Henley, 1981; Isaaks and Srivastava, 1989; Wackernagel, 1995); others contend that they can define quasi-stationarity as local stationarity. A variable of interest may have a trend across an area, and so it would be deemed non-stationary. However, with sufficient sampling, stationarity can be achieved if the maximum distance $h$ used in computing the semivariogram or covariance is much smaller than the scale of the trend (Hohn, 1988; Henley, 1981; Isaaks and Srivastava, 1989; Wackernagel, 1995). The impact of non-stationarity depends in part on the sampling scale in relation to the scale of the trend. Unfortunately, the geoscientist seldom has total control over the sample distribution. If a regionalized variable is non-stationary, it can be regarded as a composite of two parts, the residual and the trend.
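The variogram and cross-variogram formulas given at the start of this discussion (which presuppose a stationary variable) can be sketched for the omni-directional case as follows. This is an illustrative implementation assuming NumPy; the function name and the half-lag binning are assumptions, and the default `min_pairs=15` simply echoes the "15 to 20 pairs" rule of thumb from the practical considerations above. Note that $N$ is the number of pairs in each lag bin.

```python
import numpy as np

def experimental_variograms(xy, z, t, lag, n_lags, min_pairs=15):
    """Omni-directional experimental variograms of Z and T and their cross variogram.

    Bin k collects pairs with separation in [k*lag - lag/2, k*lag + lag/2).
    Bins with fewer than `min_pairs` pairs are reported as NaN (illustrative rule).
    Returns (pair counts N(h), gamma_Z, gamma_T, gamma_ZT).
    """
    xy, z, t = (np.asarray(a, dtype=float) for a in (xy, z, t))
    i, j = np.triu_indices(len(z), k=1)            # every pair counted once
    d = xy[j] - xy[i]
    h = np.hypot(d[:, 0], d[:, 1])
    k = np.floor(h / lag + 0.5).astype(int)        # lag-bin index of each pair
    dz, dt = z[i] - z[j], t[i] - t[j]
    n_pairs = np.zeros(n_lags, dtype=int)
    gz, gt, gzt = (np.full(n_lags, np.nan) for _ in range(3))
    for b in range(1, n_lags + 1):
        m = k == b
        n_pairs[b - 1] = m.sum()                   # N(h) for this lag
        if n_pairs[b - 1] >= min_pairs:
            gz[b - 1] = 0.5 * np.mean(dz[m] ** 2)          # gamma_Z(h)
            gt[b - 1] = 0.5 * np.mean(dt[m] ** 2)          # gamma_T(h)
            gzt[b - 1] = 0.5 * np.mean(dz[m] * dt[m])      # gamma_ZT(h)
    return n_pairs, gz, gt, gzt

# A made-up 1-D transect with a linear primary attribute and T = 2Z
x = np.arange(10.0)
xy = np.column_stack([x, np.zeros_like(x)])
z = x
t = 2.0 * x
n, gz, gt, gzt = experimental_variograms(xy, z, t, lag=1.0, n_lags=3, min_pairs=5)
print(n)   # [9 8 7]
```

For this transect the primary variogram takes the values 0.5, 2.0 and 4.5 at lags 1, 2 and 3; it keeps rising because the data contain a linear trend, which is the unbounded behavior discussed in the next part of this section.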

A non-stationary regionalized variable can be written as:
$$Z(x) = Y(x) + m(x)$$
where $Y(x)$ is the residual, which has an underlying variogram, and $m(x)$ is the trend, which can be approximated by a polynomial.

Note: under stationarity, the bivariate probability law [the probability of remaining at the same value at location $Z(x_i)$ and at $Z(x_i+h)$] does not depend on the location $x$, but only on the separation distance, $h$ (Isaaks and Srivastava, 1989).

It is possible to ignore the trend if the data set contains a “large” number of data points. However, if the data are sparse, the trend should be removed and the variogram of the residuals computed instead.

HANDLING TRENDS
The following approach is often used to “detrend” the data:
1. “Stationarize” the data, which means detrending the data:
   a. determine the trend on the sample data (trend surface analysis)
   b. subtract the trend from the data (usually from the well data)
2. Compute the variogram (correlogram) on the residuals.
3. Obtain kriging or conditional simulation of the residuals on the grid.
4. Krig the trend to the grid.
5. Calculate the final gridded results by adding residuals to the trend.
Although this is a reasonable approach, the main hurdle is the “correct” determination of the trend to remove from the raw data. In practice, this approach also has its problems: the trend removed by this method is the global trend, but perhaps we should be working at a local scale (e.g., neighborhood scale).

MODEL VARIOGRAMS
After computing the experimental variogram, the next step is to define a model variogram. This variogram is a simple mathematical function that models the overall shape (trend) of the experimental variogram. In turn, this mathematical model of the variogram is used in kriging computations. The experimental variogram and correlogram described in the previous section are calculated only along specific inter-distance vectors, corresponding to angular/distance classes. The kriging and conditional simulation processes, however, require a model of spatial dependency, because:
- Kriging requires knowledge of the correlation function for all possible distances and azimuths.
- The model smoothes the experimental statistics and introduces geological information.
- Kriging cannot fit experimental directional covariance models independently, but depends upon a model from a limited class of acceptable functions.
Consider a random function $Z(x)$ with an auto-covariance $C(h)$:

Define an estimator as a weighted linear combination of values of $Z(x)$:
$$Z = \sum_i \lambda_i Z(x_i)$$
The variance of $Z$ is given by:
$$\sigma_Z^2 = \sum_i \sum_j \lambda_i \lambda_j C(x_i - x_j) \ge 0$$
- The variance must be non-negative (positive definiteness criterion) for any choice of weights ($\lambda_i$ and $\lambda_j$) and any choice of locations ($x_i$ and $x_j$).
- To honor the above inequality, the experimental covariance must be fit with a positive definite model $C(h)$.
- A model fitted in the least squares sense is not guaranteed to satisfy the positive definiteness criterion. Spatial modeling is not curve fitting in the least squares sense.

THE MODEL CHOICE
The important characteristics of the spatial model are its behavior near the origin and its behavior at large distances from the origin. The shape of the experimental variogram usually constrains the type of model selected, although any model can be applied; the choice affects the final kriged results.

BEHAVIOR NEAR THE ORIGIN
The behavior near the origin affects the short scale variability of the final map plot.
Figure 1a, 1b, 1c and 1d: Variogram behavior near the origin.

Frame A shows purely random behavior. Frame B is linear, with some degree of random component. Frame C is highly continuous, while Frame D exhibits linear behavior.

BEHAVIOR AT LARGE DISTANCES
At large distances, we can describe the behavior of a variogram as either bounded (it reaches the variance of the data, or sill) or unbounded. Figure 2a and 2b (Variogram behavior at large distances after reaching the variance of the data) show an example of each type of behavior. In this example, we note that the bounded variogram (Frame A) reaches a sill and remains at the sill value for an infinite distance. This behavior is typical of the classic variogram.

Meanwhile, the unbounded variogram (Frame B) never plateaus at the sill, but shows a continuous increase in variance with increasing distance. Variograms displaying this characteristic are typical of data that possess a trend.

BASIC COVARIANCE FUNCTIONS
Basic covariance functions for modeling variograms have the following characteristics:
- simple, isotropic functions (independent of direction)
- correlograms are equal to 1 at h = 0, while variograms are equal to 0 at h = 0
- the variogram reaches or approaches the sill beyond a certain distance (the range or correlation length)
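Modeling is restricted to a limited class of functions because of the positive definiteness criterion stated earlier: for every choice of weights and locations, $\sum_i\sum_j \lambda_i\lambda_j C(x_i - x_j)$ must not be negative. A quick numerical check is that a covariance matrix built from a valid model admits a Cholesky factorization, whereas an arbitrary table of "covariances" may not. The sketch below assumes NumPy; the matrix `C_bad` and the weights `w` are made-up values chosen to show a negative "variance", not data from this document.

```python
import numpy as np

def spherical_cov(h, sill=1.0, a=1.0):
    """Spherical covariance C(h) = sill - gamma(h), no nugget, range a."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)

def is_positive_definite(C):
    """True if the symmetric matrix C has a Cholesky factorization."""
    try:
        np.linalg.cholesky(C)
        return True
    except np.linalg.LinAlgError:
        return False

# Covariance matrix of a 5 x 5 grid of points from a valid (spherical) model
gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
pts = np.column_stack([gx.ravel(), gy.ravel()])
H = np.hypot(pts[:, None, 0] - pts[None, :, 0], pts[:, None, 1] - pts[None, :, 1])
C_valid = spherical_cov(H, sill=1.0, a=4.0)

# A made-up matrix of "covariances" that no valid model could produce
C_bad = np.array([[1.0, 0.9, 0.9],
                  [0.9, 1.0, -0.9],
                  [0.9, -0.9, 1.0]])
w = np.array([1.0, -1.0, -1.0])          # one particular choice of weights lambda_i

print(is_positive_definite(C_valid))     # True
print(is_positive_definite(C_bad))       # False
print(w @ C_bad @ w)                     # a negative "variance" (about -2.4)
```

A kriging system built on a matrix like `C_bad` could return a negative estimation variance, which is why the model is chosen from functions known to be positive definite rather than fitted freely.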


Figure 3a, 3b, 3c, 3d, 3e, 3f and 3g (Common variogram models) show a variety of models. The Spherical model is the most commonly used model, followed by the Exponential. The Gaussian and Exponential functions reach the sill asymptotically. For such functions, the range is arbitrarily defined as the distance at which the covariance decreases to 5% of its value at the origin (equivalently, the distance at which the variogram reaches 95% of the sill). The Nested model is a linear combination of two spherical structures, having short and long scale components. The “Hole” model is used for variograms computed from data which has a repeating pattern. The Hole model can be dangerous because the periodicity will show up in a map, although it is not present in the data. There does not appear to be any relationship between depositional environment and variogram shape. Below are equations for four common variogram models.

LINEAR MODEL
The linear model describes a straight-line variogram. This model has no sill, so the range is defined arbitrarily to be the distance interval for the last lag class in the variogram. (Since the range is an arbitrary value, it should not be compared directly with ranges of other models.) This model is described by the following formula:
$$\gamma(h) = C_0 + h\left(\frac{C}{A_0}\right)$$
where $h$ = lag interval, $C_0$ = nugget variance $\ge 0$, $C$ = structural variance $\ge C_0$, and $A_0$ = range parameter.

SPHERICAL MODEL
The spherical model is a modified cubic function in which the range marks the distance at which pairs of points are no longer autocorrelated and the semivariogram reaches its sill. This model is described by the following formula:
$$\gamma(h) = \begin{cases} C_0 + C\left[1.5\dfrac{h}{A_0} - 0.5\left(\dfrac{h}{A_0}\right)^3\right] & \text{for } h \le A_0 \\ C_0 + C & \text{for } h > A_0 \end{cases}$$
where $h$ = lag distance interval, $C_0$ = nugget variance $\ge 0$, $C$ = structural variance $\ge C_0$, and $A_0$ = range.

EXPONENTIAL MODEL
This model is similar to the spherical variogram in that it approaches the sill gradually, but differs in the rate at which the sill is approached and in the fact that the model and the sill never actually converge. This model is described by the following formula:
$$\gamma(h) = C_0 + C\left[1 - \exp\left(-\frac{h}{A_0}\right)\right]$$
where $h$ = lag interval, $C_0$ = nugget variance $\ge 0$, $C$ = structural variance $\ge C_0$, and $A_0$ = range parameter. In the exponential model, $A_0$ is a parameter used to provide the range, which is usually assumed to be the point at which the model approaches 95% of the sill ($C + C_0$). The range is estimated as $3A_0$.

GAUSSIAN MODEL
The gaussian or hyperbolic model is similar to the exponential model but assumes a gradual rise for the y-intercept. This model is described by the following formula:

$$\gamma(h) = C_0 + C\left[1 - \exp\left(-\frac{h^2}{A_0^2}\right)\right]$$
where $h$ = lag interval, $C_0$ = nugget variance $\ge 0$, $C$ = structural variance $\ge C_0$, and $A_0$ = range parameter. The range parameter in this model is simply a constant; the range itself, the point at which 95% of the sill is approached, can be estimated as $1.73A_0$ (1.73 is the square root of 3).

Practical Guidelines for Variogram Modeling
- Do not over fit: use the simplest model (fewest number of structures).
- Pay special attention to the fit for small distances and the size of the nugget effect:
  - The nugget acts as a smoothing function during kriging.
  - The nugget adds variability during conditional simulation.
- Do not fit the covariance for distances greater than ½ the study area.
- Beware of features that may relate to non-stationarity.
- Beware of periodic oscillations:
  - Is the “hole” effect related to true structure?
  - Is the “hole” effect due to sparse data at given lag intervals?
- Compute and fit the covariance along the directions of maximum and minimum continuity, using a single structure if possible.

KRIGING OVERVIEW
INTRODUCTION
Contouring maps by hand or by computer requires the use of some type of interpolation procedure, interpolating between data points (or extrapolating beyond the control points). As previously shown in the section on Gridding and Interpolation, there are a number of algorithms for computer-based interpolation. At the other end of the spectrum is the geologist who maps by hand, drawing contours, smoothing the map to make it look “real” and perhaps biasing the map with a trend based on geological experience (Hohn, 1988). This section provides a broad overview of the computer-intensive interpolation process which lies at the heart of geostatistical modeling.

LINEAR ESTIMATION
Kriging is a geostatistical technique for estimating attribute values at a point, over an area, or within a volume. It is often used to interpolate grid node values in mapping and contouring applications. In theory, no other linear interpolation process can produce better estimates (being unbiased, with minimum error), though the effectiveness of the technique actually depends on accurately modeling the variogram. Kriging also produces a variance estimate for its interpolation values. The accuracy of kriging estimates is driven by the use of variogram models to express autocorrelation relationships between control points in the data set.

The technique was first used for the estimation of gold ore grade and reserves in South Africa (hence the origin of the term Nugget Effect), and it is named in honor of a South African mining engineer, Danie Krige. Its mathematical validity and foundation were developed by Georges Matheron, who later founded the Centre de Géostatistique, as part of the École des Mines in Paris, France (Henley, 1981; Hohn, 1988; Journel, 1989; Isaaks and Srivastava, 1989; Deutsch and Journel, 1992; Wackernagel, 1995).

KRIGING FEATURES
Kriging is a highly accurate estimation process which:
- minimizes estimation error (the difference between the measured value and the re-estimated value)
- honors “hard” data
- does not introduce an estimation bias
- does not reproduce inter-well variability
- produces a “smoothed” result, like all interpolators
- is a univariate estimator, requiring only one covariance model
- weighs control points according to a spatial model (variogram)
- tends to the mean value when control data are sparse
- uses a spatial correlation model to determine the weights (λ)
- assigns negative or null weights to control points outside the correlation range of the spatial model
- indicates the global relative reliability of the estimate through RMS error (kriging variance), as a by-product of kriging
- has a general and easily reformulated kriging matrix, making it a very flexible technique to use more than one variable
- declusters data before the estimation

Types of Kriging
There are a number of kriging algorithms, and each is distinguished by how the mean value is determined and used during the estimation process. The four most commonly used methods are:
- Simple Kriging: The global mean is known (or supplied by the user), and is held constant over the entire area of interpolation.
- Ordinary Kriging: The local mean varies, and is re-estimated based on the control points in the current search neighborhood ellipse.
- Kriging with an External Drift: Although this method uses two variables, only one covariance model is required. The shape of the map is related to a 2-D attribute, referred to as the external drift (such as two-way travel time known on a 2-D grid), which guides the interpolation of the primary attribute known only at discrete locations. A typical application is time-to-depth conversion, where the primary attribute (such as depth at the wells) acquires its shape from the secondary attribute.

- Indicator Kriging: estimates the probability of an attribute at each grid node (e.g., lithology, productivity). The technique requires the following parameters:
  - Coding of the attribute in binary form, as 0 or 1.
  - Prior probabilities of both classes.
  - Spatial covariance model of the indicator variable.

The Kriging Process
We will illustrate the estimation process with an example problem. Given samples located at $(Z_\alpha)$, where $\alpha = 1, 2, 3$, find the most likely value of the variable $Z$ at the target point (grid node), $Z_0^*$, as shown in Figure 1 (Arrangement of three data points). In this graphic, we see the geometrical arrangement of three data points $Z_\alpha$, the location of the point whose value we wish to estimate, $Z_0^*$, and the unknown weights, $\lambda_\alpha$.
- Consider $Z_0^*$ as a linear combination of the data $Z_\alpha$:
$$Z_0^* = \lambda_0 + \sum_\alpha \lambda_\alpha Z_\alpha$$
where $\lambda_0 = m_Z\left(1 - \sum_\alpha \lambda_\alpha\right)$; under the ordinary kriging constraint $\sum_\alpha \lambda_\alpha = 1$, the shift $\lambda_0$ vanishes.
- Determine $\lambda_\alpha$ so that:
  - $Z_0^*$ is unbiased: $E\left[Z_0^* - Z(x_0)\right] = 0$
  - $Z_0^*$ has minimum mean square error (MSE): $E\left[\left(Z_0^* - Z(x_0)\right)^2\right]$ is minimum
Recall that the unknown value $Z_0^*$ is estimated by a linear combination of $n$ data points plus a shift parameter $\lambda_0$:
$$Z_0^* = \lambda_0 + \sum_{\alpha=1}^{n} \lambda_\alpha Z_\alpha \qquad (1)$$

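By transforming the above equation into a set of linear normal equations, we solve the following to obtain the weights $\lambda$. The set of linear equations takes the following form:
$$\sum_{j=1}^{n} \lambda_j C(x_\alpha, x_j) - \mu = C(x_\alpha, x_0) \quad \text{for all } \alpha = 1, \dots, n, \qquad \sum_{j=1}^{n} \lambda_j = 1 \qquad (2)$$
or in matrix shorthand notation:
$$\mathbf{C}\boldsymbol{\lambda} = \mathbf{c} \qquad (3)$$
All three terms are matrices (or vectors) where:
- $C(x_\alpha, x_j)$ represents a covariance between sample points $x_\alpha$ and $x_j$
- $C(x_\alpha, x_0)$ represents a covariance between a sample located at $x_\alpha$ and the target point $x_0$, the estimated point
- $\lambda$ are the unknown weights
- $\mu$ is a Lagrange multiplier that converts a constrained minimization problem into an unconstrained minimization

Written out in full, the system is:
$$\begin{bmatrix} C(x_1,x_1) & \cdots & C(x_1,x_n) & -1 \\ \vdots & \ddots & \vdots & \vdots \\ C(x_n,x_1) & \cdots & C(x_n,x_n) & -1 \\ 1 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \\ \mu \end{bmatrix} = \begin{bmatrix} C(x_1,x_0) \\ \vdots \\ C(x_n,x_0) \\ 1 \end{bmatrix} \qquad (4)$$
Determine the matrix of unknown weights by solving the matrix equation for $\lambda$ as follows:
$$\boldsymbol{\lambda} = \mathbf{C}^{-1}\mathbf{c} \qquad (5)$$
Note that equation 3 is written in terms of covariance values, whereas we modeled either a variogram or a correlogram. We use covariance values because it is computationally more efficient. The covariance equals the sill minus the variogram (Figure 2: Relationship between a spherical variogram and its covariance equivalent):
$$C(h) = \sigma^2\,(\text{sill}) - \gamma(h) \qquad (6)$$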
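The four model formulas given earlier in this section, together with equation (6), translate directly into code. The sketch below assumes NumPy and uses illustrative function names; it takes $\gamma(0) = 0$ so that $C(0)$ equals the full sill including the nugget (the usual convention, but an assumption here), and it checks that the exponential and Gaussian models reach about 95% of their structural variance at $3A_0$ and $\sqrt{3}A_0 \approx 1.73A_0$.

```python
import numpy as np

def linear(h, c0, c, a0):
    return c0 + np.asarray(h, dtype=float) * (c / a0)

def spherical(h, c0, c, a0):
    r = np.minimum(np.asarray(h, dtype=float) / a0, 1.0)   # flat at the sill beyond A0
    return c0 + c * (1.5 * r - 0.5 * r ** 3)

def exponential(h, c0, c, a0):
    return c0 + c * (1.0 - np.exp(-np.asarray(h, dtype=float) / a0))

def gaussian(h, c0, c, a0):
    h = np.asarray(h, dtype=float)
    return c0 + c * (1.0 - np.exp(-h ** 2 / a0 ** 2))

def covariance(model, h, c0, c, a0):
    """Equation (6): C(h) = sill - gamma(h), with sill = c0 + c.

    gamma(0) is taken as 0, so C(0) is the full sill (nugget included).
    Only meaningful for bounded models, not for the linear model.
    """
    h = np.asarray(h, dtype=float)
    return np.where(h == 0.0, c0 + c, (c0 + c) - model(h, c0, c, a0))

print(covariance(spherical, [0.0, 5.0, 10.0], c0=0.2, c=0.8, a0=10.0))  # about 1.0, 0.25, 0.0
print(exponential(3.0, 0.0, 1.0, 1.0))          # 1 - exp(-3), about 0.950
print(gaussian(np.sqrt(3.0), 0.0, 1.0, 1.0))    # 1 - exp(-3), about 0.950
```

The covariance falls to zero exactly at the spherical range, whereas for the exponential and Gaussian models it only approaches zero, which is why their practical ranges are defined by the 95% convention.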

Kriging Variance
In addition to estimating the value of a variable at an unsampled location, the kriging technique also provides an estimate of the likely error (in the form of error variance) at every grid node. These error estimates can be mapped to give a direct assessment of the reliability of the contoured surface. With the sign convention of equation (2), the kriging variance equation is:
$$\sigma_k^2 = C(x_0, x_0) - \sum_{\alpha=1}^{n} \lambda_\alpha C(x_\alpha, x_0) + \mu \qquad (7)$$
Because the kriging variances are determined independently of the data values, the kriging error is not a measure of local reliability (Deutsch and Journel, 1992). Do not attempt to use the kriging standard deviation like the true classical standard deviation statistic.

Search Neighborhood Criterion
All interpolation algorithms require good data selection criteria, specified by a search neighborhood. A typical geostatistical routine might interpolate values for a specific location using nearest neighbor values weighted by distance and the degree of autocorrelation present for that distance (as defined by the variogram model). The neighborhood searches would be limited to a specified number of nearest neighbors, and might also be restricted to a particular direction, in order to take advantage of the statistical correlation among the observations. The model variogram plays a role in controlling the extent of the neighborhood: the variogram range defines the maximum size of the neighborhood from which control points should be selected for estimating a grid node. Search neighborhood parameters include:
- Search radius
- Neighborhood shape (isotropic or anisotropic: Figure 3a and 3b)
- Number of sectors (4 or 8 are common)
- Number of data points per sector
  - Unique Neighborhoods use all data points (practical limit is 100)
  - Moving Neighborhoods use a limited number of points per sector (e.g., 4)
- Azimuth of major axis of anisotropy

In this graphic, note the elliptical shape of the anisotropic search neighborhood and the circular shape of the isotropic neighborhood. The isotropic model has a 1500-meter radius. The radii for the anisotropic neighborhood are: minor axis = 1000 meters and major axis = 4000 meters, aligned at N15E. The center of the neighborhood is the target grid node for estimation. Both neighborhoods are divided into octants, with a maximum of two data points per sector. There are 55 sample points (x) within the study area. Weights are shown for data control points used for the estimation at the target point.

Practical Considerations in Designing the Search Neighborhood
- Align the search axis with the direction of maximum anisotropy.
- Search radii (if anisotropic) should be proportional to the correlation ranges.
- Each quadrant should have enough points (≥ 4) to avoid directional sampling bias.
- CPU time and memory requirements grow rapidly as a function of the number of data points in the neighborhood.
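Equations (2) and (7) can be assembled into a small ordinary kriging solver. The sketch below is illustrative only: it assumes NumPy, a spherical covariance model, and a unique neighborhood (all data points) rather than the sector search described above; the function names and the three-well data set are made up for the demonstration.

```python
import numpy as np

def spherical_cov(h, sill=1.0, a=1.0):
    """Spherical covariance (no nugget): C(h) = sill - gamma(h)."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)

def ordinary_kriging(xy, z, x0, cov):
    """Ordinary kriging estimate, variance and weights at x0 (unique neighborhood).

    Solves equation (2): sum_j lam_j C(x_a, x_j) - mu = C(x_a, x0), sum_j lam_j = 1,
    then evaluates equation (7): var = C(x0, x0) - sum_a lam_a C(x_a, x0) + mu.
    """
    xy = np.asarray(xy, dtype=float)
    n = len(xy)
    d = xy[:, None, :] - xy[None, :, :]
    H = np.hypot(d[..., 0], d[..., 1])             # data-to-data distances
    d0 = xy - np.asarray(x0, dtype=float)
    h0 = np.hypot(d0[:, 0], d0[:, 1])              # data-to-target distances
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = cov(H)
    A[:n, n] = -1.0                                # the -mu column of equation (4)
    A[n, :n] = 1.0                                 # weights sum to 1
    b = np.append(cov(h0), 1.0)
    sol = np.linalg.solve(A, b)
    lam, mu = sol[:n], sol[n]
    estimate = lam @ np.asarray(z, dtype=float)
    variance = cov(0.0) - lam @ cov(h0) + mu       # equation (7)
    return estimate, variance, lam

def cov(h):
    # Assumed model for the example: spherical, sill 1, range 3000 m
    return spherical_cov(h, sill=1.0, a=3000.0)

xy = [[0.0, 0.0], [1000.0, 0.0], [0.0, 1000.0]]    # three made-up well locations
z = [10.0, 12.0, 14.0]                             # made-up porosity values (%)
est, var, lam = ordinary_kriging(xy, z, [400.0, 300.0], cov)
print(lam.sum())    # about 1.0
print(est, var)
```

Placing the target on a well, for example at (0, 0), returns that well's value with zero kriging variance, which is the exact-interpolator property; the variance itself never looks at `z`, in line with the remark that kriging variances are determined independently of the data values.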

 In theory.  The kriging estimator is built from data within the search neighborhood centered at the target location Zo*. As the shape of the model variogram changes.  Unique neighborhoods smooth the data more than moving neighborhoods.  If the wells are known to provide a biased sampling. then a moving neighborhood is preferable to the unique neighborhood. Including points that are more distant may actually increase the error. which amounts to re-estimating mz at each grid node from the data within the search neighborhood.  If sufficient wells are available for ordinary kriging. Practical Considerations: Unique versus Moving Search Neighborhoods  In a moving search neighborhood. The resulting kriging estimates are best linear unbiased estimates of the surface. ordinary kriging and simple kriging yield similar results.  When all data points are used (unique neighborhood). a new simple kriging (SK) or ordinary kriging (OK) system of equations is solved at each grid node.  A unique search neighborhood uses all the data. is solved only once and used at each grid node.  Unique neighborhoods tend to prevent artifacts from abrupt changes in the number and values of the data points. Practical Considerations: Ordinary (OK) versus Simple Kriging (SK)  Simple kriging does not adapt to local trends. it may be better to impose your own mz with simple kriging rather than use ordinary kriging.  Ordinary kriging uses a local mean (mz).  If only a few data points are available in the local search neighborhood. so the left side of the kriging matrix. so do the kriging results. the covariance is poorly known for distances exceeding about ½ to 2/3 the size of the field. C. Effects of Variogram Parameters on Kriging Kriging applies weighting functions according to a mathematical model of the variogram. rather it relies on a constant. 
 Rescaling the variogram or correlogram (to create a larger or smaller sill):  Has no affect on kriging estimate  Changes the kriging variance  Increasing the nugget component:  Acts as a smoothing term during kriging (makes weights more similar) . provided that the surface is stationary and the correct form of the variogram has been determined. global mean. more data in the kriging system reduces the mean square error.  In practice. ordinary kriging may produce spurious weights because of the constraint that the weights must sum to 1.

- Increasing the range tends to increase the influence of more distant data points and leads to smoother maps.
- The shape of the variogram or correlogram near the origin influences the continuity of the interpolation process (e.g., the gentler the slope, the smoother the interpolation).

See Figure 4a, 4b, 4c, 4d, 4e, 4f, 4g, 4h and 4i: Kriging results from a common data set, based on different variogram models.

In this graphic, Frames A-H use the isotropic neighborhood design shown in Figure 3b. The nested model (Frame F) used two spherical variograms, with a short range of 1000 meters and a long range of 10,000 meters; nested models are additive. The anisotropic model (Frame I) used the anisotropic neighborhood design shown in Figure 3. The minor axis of the variogram model is 1000 meters, with a major axis of 5000 meters (5:1 anisotropy ratio), rotated to N15E. The color scale is equivalent for all figures: purple is 5% porosity and red is 13%. All these illustrations were created using the same input data set.

Advantages of Kriging
- Kriging is an exact interpolator (if the control point coincides with a grid node).
- Kriging variance:
  - is a relative index of the reliability of estimation in different regions
  - is a good indicator of data geometry
  - is smaller for a smaller nugget (or sill)
- Minimizes the Mean Square Error.
- Can use a spatial model to control the interpolation process.
- A robust technique (i.e., small changes in kriging parameters produce small changes in the results).
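The first two items under Effects of Variogram Parameters on Kriging, and the link between the sill and the kriging variance noted just above, can be checked with a two-point example. This is an illustrative sketch assuming NumPy and a spherical model; `make_cov`, `ok_weights` and the data layout are hypothetical. Doubling the sill leaves the weights (and therefore the estimate) unchanged while doubling the kriging variance, and moving half of the sill into the nugget makes the two weights much more similar.

```python
import numpy as np

def make_cov(sill, nugget_fraction, a):
    """Spherical covariance with total sill `sill`, a fraction of it as nugget."""
    def cov(h):
        h = np.asarray(h, dtype=float)
        r = np.minimum(h / a, 1.0)
        structured = sill * (1.0 - nugget_fraction) * (1.0 - 1.5 * r + 0.5 * r ** 3)
        return np.where(h == 0.0, sill, structured)   # the nugget only adds at h = 0
    return cov

def ok_weights(xy, x0, cov):
    """Ordinary kriging weights and variance from equations (2) and (7)."""
    xy = np.asarray(xy, dtype=float)
    n = len(xy)
    d = xy[:, None, :] - xy[None, :, :]
    d0 = xy - np.asarray(x0, dtype=float)
    h0 = np.hypot(d0[:, 0], d0[:, 1])
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = cov(np.hypot(d[..., 0], d[..., 1]))
    A[:n, n], A[n, :n] = -1.0, 1.0
    sol = np.linalg.solve(A, np.append(cov(h0), 1.0))
    lam, mu = sol[:n], sol[n]
    return lam, cov(0.0) - lam @ cov(h0) + mu

xy = [[1.0, 0.0], [4.0, 0.0]]      # two data points; the target is the origin
x0 = [0.0, 0.0]
lam1, var1 = ok_weights(xy, x0, make_cov(1.0, 0.0, 10.0))   # reference model
lam2, var2 = ok_weights(xy, x0, make_cov(2.0, 0.0, 10.0))   # sill doubled
lam3, var3 = ok_weights(xy, x0, make_cov(1.0, 0.5, 10.0))   # half of the sill as nugget
print(lam1, lam2, lam3)   # roughly 0.98/0.02, 0.98/0.02 and 0.65/0.35
print(var2 / var1)        # about 2.0: rescaling changes only the variance
```

With no nugget the nearer point takes almost all the weight; with half the sill as nugget the weights move toward equality, which is the smoothing effect described above.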

Disadvantages of Kriging
Kriging tends to produce smooth images of reality (like all interpolation techniques). In doing so, short scale variability is poorly reproduced, and extreme (high or low) values are underestimated. It also requires the specification of a spatial covariance model, which may be difficult to infer from sparse data. Kriging consumes much more computing time than conventional gridding techniques, requiring a system of simultaneous equations to be solved for each grid node estimated. The preliminary processes of generating variograms and designing search neighborhoods in support of the kriging effort also require much effort. Therefore, kriging is probably not performed on a routine basis; rather, it is best used on projects that can justify the need for the highest quality estimate of a structural surface (or other reservoir attribute), and which are supported by plenty of good data.

CROSS-VALIDATION
Cross-validation is a process for checking the compatibility between a set of data, the spatial model and the neighborhood design. In cross-validation, each control point is individually removed from the data set, and its value is then re-estimated from the remaining data using the covariance model. In this way, it is possible to compare estimated versus actual values. The procedure consists of the following steps:
1. Consider each control point in turn.
2. Temporarily suppress the control point from the data set.
3. Re-estimate the point from the surrounding data using the covariance model.
4. Compare the estimated value, $Z_{est}$, to the true value, $Z_{true}$. This provides a re-estimation error (the kriging variance is also calculated at the same time):
$$RE = Z_{est} - Z_{true}$$
5. Calculate a standardized error, where $\sigma_{krig}$ is the kriging standard deviation:
$$SE = \frac{RE}{\sigma_{krig}}$$
   - Ideally, it should have a zero mean and a variance equal to 1.
   - The numerator is affected by the range.
   - The denominator is affected by the sill.
6. Average the errors for a large number of target points to obtain:
   - Mean error
   - Mean standardized error
   - Mean squared error
   - Mean squared standardized error
7. The distribution of errors (in map view) can provide a useful criterion for:
   - Selecting a search region
   - Selecting a covariance model
8. Any data point whose absolute standardized error is ≥ 2.5 is considered an outlier, because such a value lies well outside the 95% confidence limits (±1.96) of a standard normal distribution.

USEFUL CROSS-VALIDATION PLOTS
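The values shown in cross-validation plots (RE, SE and the re-estimated values) come from the steps listed above. The following leave-one-out sketch is illustrative only: it assumes NumPy, ordinary kriging with a unique neighborhood (all remaining points) and a spherical covariance model, and the function names and the planted bad value in the example are made up.

```python
import numpy as np

def spherical_cov(h, sill=1.0, a=1.0):
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)

def ok_estimate(xy, z, x0, cov):
    """Ordinary kriging estimate and variance at x0 (equations 2 and 7)."""
    n = len(xy)
    d = xy[:, None, :] - xy[None, :, :]
    d0 = xy - x0
    h0 = np.hypot(d0[:, 0], d0[:, 1])
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = cov(np.hypot(d[..., 0], d[..., 1]))
    A[:n, n], A[n, :n] = -1.0, 1.0
    sol = np.linalg.solve(A, np.append(cov(h0), 1.0))
    lam, mu = sol[:n], sol[n]
    return lam @ z, cov(0.0) - lam @ cov(h0) + mu

def cross_validate(xy, z, cov):
    """Leave-one-out cross-validation following steps 1 to 8 above."""
    xy, z = np.asarray(xy, dtype=float), np.asarray(z, dtype=float)
    est, var = np.empty(len(z)), np.empty(len(z))
    for k in range(len(z)):                        # step 1: each control point in turn
        keep = np.arange(len(z)) != k              # step 2: suppress it
        est[k], var[k] = ok_estimate(xy[keep], z[keep], xy[k], cov)   # step 3
    re = est - z                                   # step 4: RE = Zest - Ztrue
    se = re / np.sqrt(var)                         # step 5: SE = RE / sigma_krig
    return {
        "RE": re, "SE": se, "estimate": est,
        "mean_error": re.mean(),                   # step 6
        "mean_standardized_error": se.mean(),
        "mean_squared_error": np.mean(re ** 2),
        "mean_squared_standardized_error": np.mean(se ** 2),
        "outliers": np.flatnonzero(np.abs(se) >= 2.5),         # step 8
        "corr_SE_vs_estimate": np.corrcoef(se, est)[0, 1],     # bias check (SE vs estimate)
    }

gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
xy = np.column_stack([gx.ravel(), gy.ravel()])
z = xy.sum(axis=1)               # a smooth made-up surface
z[12] = 100.0                    # planted bad value at the centre node (2, 2)
res = cross_validate(xy, z, lambda h: spherical_cov(h, sill=1.0, a=5.0))
print(int(np.argmax(np.abs(res["SE"]))))   # 12
```

The planted value produces by far the largest standardized error, and its neighbors are also pulled off because they were re-estimated with the bad value in their neighborhood; mapping RE (as in the first plot below) makes such clusters easy to see.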

Figure 1a, 1b, 1c and 1d (Cross-validation plots) show output from a cross-validation test. Figure 1a shows a map view of the magnitude of the Re-estimation Error (RE). Open circles are over-estimations; solid circles are under-estimations. A good model has an equally likely chance of over or under estimating any location. Also, look for intermixing of the RE as an additional indication of bias: over- and under-estimations should be intermixed rather than clustered. Figure 1b is a cross plot of the measured attribute of porosity at the wells versus porosity re-estimated at the well locations during the cross-validation test; again, open and solid circles distinguish over- and under-estimates. The red, solid circle is the sample from Figure 1a; it falls outside 2.5 standard deviations from a mean of 0. The two most important plots are in Figure 1c and 1d, because they help identify model bias. If the histogram of standardized error (SE) in Figure 1c is skewed, or if there is a correlation between SE and the estimated values in Figure 1d, then the model is biased. Such is not the case in this example.

Figure 2a, 2b, 2c and 2d (Cross-validation results from a biased model), however, show such a case. In Figure 2a, the over-estimated RE values are clustered in the center of the map. The histogram of SE (Figure 2c) is slightly skewed towards over-estimation. Finally, there is a positive correlation between SE and estimated porosity (Figure 2d). These indicate poor model design, poor neighborhood design, or both.

COKRIGING
INTRODUCTION
In the previous section, we described kriging with a single attribute. Rather than only consider the spatial correlation between a set of sparse control points, we will now describe the use of a secondary variable in the kriging process.

In this section, you will learn about multivariate geostatistical data integration techniques, which fall into the general category called cokriging, and you will learn more about external drift. The techniques introduced in this section are appropriate when the primary attribute of interest (such as well data) is sparse, but there is an abundance of related secondary information (such as seismic data). There are many situations in which it is possible to study the covariance between two or more regionalized variables. The mutual spatial behavior of regionalized variables is called coregionalization. The estimation of a primary regionalized variable (e.g., porosity) from two or more regionalized variables (e.g., porosity and acoustic impedance) is known as cokriging.

TYPES OF COKRIGING
Cokriging is a general multivariate regression technique with three basic variations:
- Simple Cokriging uses a multivariate spatial model and a related secondary 2-D attribute to guide the interpolation of a primary attribute known only at control points (such as well locations). The mean is specified explicitly and assumed to be a global constant. The method uses all primary and secondary data according to the search criterion.
- Ordinary Cokriging is similar to Simple Cokriging in that the mean is still assumed to be constant, but it is estimated using the neighborhood control points rather than specified globally.
- Collocated Cokriging is a reduced form of cokriging, which requires knowledge of only the hard data covariance model, the product-moment correlation coefficient between the hard and soft data, and the variances of the two attributes. Collocated Cokriging also uses a modified search criterion: it uses all the primary data but, in its simplest form, only one secondary data value, the value at the target grid node.

PROPERTIES OF COKRIGING
This method is a powerful extension of kriging, which:
- must satisfy the same conditions as kriging:
   - it minimizes the estimation error,
   - it is an unbiased estimator,
   - it honors the "hard" data control points, and
   - the control points are weighted according to a model of coregionalization;
- is more demanding than kriging:
   - it requires a simple covariance model for the primary and all secondary attributes,
   - it requires cross-covariance models for all attributes,
   - all of these must be modeled with a single coregionalization model,
   - it requires more demanding neighborhood searches, and
   - it requires more computation time.

DATA INTEGRATION
Besides being able to use a spatial model for determining weights during estimation, one of the more powerful aspects of the geostatistical method is quantitative data integration.
Basic Concept
We'll illustrate this concept by way of example. Well data have excellent vertical resolution of reservoir properties, but poor lateral resolution. Seismic data, on the other hand, have poorer vertical resolution than well data, but provide densely sampled lateral information. From our exploratory data analysis we might find a good correlation between a property measured at well locations and a certain seismic attribute. In such a case, we might want to use the seismic information to provide better inter-well estimates than could be obtained from the well data alone. We know from classical multivariate statistics that models developed from two or more variables often produce better estimates. We can extend classical multivariate techniques into the geostatistical realm and use two or more regionalized variables in the estimation process. Even when the primary (well) data (e.g., porosity) are sparse, it is possible to use a densely sampled secondary attribute (e.g., seismic acoustic impedance) in the interpolation process. Geostatistical data integration methods allow us to capitalize on the strengths of both data types to yield higher-quality reservoir models.
The Cokriging Process
We can illustrate the cokriging process by way of the following problem: given samples located at Zα and seismic data located on a grid, find the weights that give the most likely value of the variable Z0* at the target grid node (Figure 1: Arrangement of control points, seismic grid, and target grid node).

Figure 1: This graphic shows the geometrical arrangement of three data control points Zα (where α = 1, 2, 3), a grid of seismic data, the unknown weights, and the target grid node, Z0*.

Traditional Cokriging Estimator
The general cokriging estimator is expressed as a weighted linear combination of the primary (well) data Z1, ..., Zn and the secondary (seismic) data T1, ..., Tm:
$$Z_0^* = \sum_{\alpha=1}^{n} \lambda_\alpha Z_\alpha + \sum_{j=1}^{m} \nu_j T_j + c$$
where:
- $\sum_{\alpha} \lambda_\alpha = 1$ (primary weights)
- $\sum_{j} \nu_j = 0$ (secondary weights)
- $c = m_Z \left[1 - \sum_{\alpha} \lambda_\alpha\right] - m_T \sum_{j} \nu_j$ ensures unbiasedness, where $m_Z$ and $m_T$ are the means of the primary and secondary attributes (when both weight constraints hold, c is zero).

Requirements
- CZZ(h) is the spatial covariance model of the primary attribute (well data).
- CTT(h) is the spatial covariance model of the secondary attribute (seismic data).
- CZT(h) is the spatial cross-covariance model of the well and seismic data.

Cokriging: Example Using 3 Data Points
Data Configuration
Figure 2 shows a typical data configuration for traditional cokriging. The traditional cokriging approach uses secondary data only at the well locations, so if there is a very high correlation between the primary and secondary data, the result is the same as kriging the primary data only. This condition is known as self-krigability.
Figure 2: Data configuration for traditional cokriging.
Estimator
$$Z_0^* = \lambda_1 Z_1 + \lambda_2 Z_2 + \lambda_3 Z_3 + \nu_1 T_1 + \nu_2 T_2 + \nu_3 T_3$$
Estimation Error
$$\sigma^2 = C^{Z}_{00} - \lambda_1 C^{Z}_{01} - \lambda_2 C^{Z}_{02} - \lambda_3 C^{Z}_{03} - \nu_1 C^{ZT}_{01} - \nu_2 C^{ZT}_{02} - \nu_3 C^{ZT}_{03} - \mu$$
where $C^{Z}_{0\alpha}$ is the primary covariance between the target and well α, $C^{ZT}_{0j}$ is the cross-covariance between the target and the secondary datum Tj, and μ is the Lagrange multiplier associated with the constraint $\sum_\alpha \lambda_\alpha = 1$.

Different Cases
- No correlation: $\rho_{Z_1 Z_2}(h) = 0$
- Perfect correlation: $|b_{ij}| = \sqrt{b_{ii}\, b_{jj}}$ for all i, j, where the b terms are the sill coefficients of the coregionalization model
- Intrinsic correlation occurs when the simple and cross-variograms are proportional, which is always the case if there is only one basic structure.

Self-Krigability in the Isotopic Case
A variable is defined as self-krigable when its cokriging estimate in the isotopic case (all variables measured at the same locations) is identical to its kriging estimate.

Collocated Cokriging
Collocated cokriging is a modification of the general cokriging case:

- It uses all primary data according to the search criterion.
- In its simplest form, it uses the secondary attribute located only at the target grid node during estimation.
- In its simplest form, it requires only the simple covariance model of the primary attribute. In the case of sparse primary data, this covariance model is often derived from the covariance model of the densely sampled secondary attribute.
- If the secondary attribute covariance model is assumed proportional to the primary attribute covariance model, then:
   - the correlation coefficient is the constant of proportionality, and
   - we can use the correlation coefficient and the ratio of the secondary to primary variances to transform a univariate covariance model into a multivariate covariance model.
This assumption is termed the Markov-Bayes assumption (Deutsch and Journel, 1992).

Example Using 3 Data Points
Figure 3 illustrates a typical data configuration for collocated cokriging. More complex forms combine this data configuration with the one shown in Figure 1, which also increases the computation time substantially.
Figure 3: In this graphic, the secondary data value at the estimation grid node is the only seismic datum used in this form of the algorithm.
Estimator
$$Z_0^* = \lambda_1 Z_1 + \lambda_2 Z_2 + \lambda_3 Z_3 + \nu_0 T_0$$
Estimation Error
$$\sigma^2 = C^{Z}_{00} - \lambda_1 C^{Z}_{01} - \lambda_2 C^{Z}_{02} - \lambda_3 C^{Z}_{03} - \nu_0 C^{ZT}_{00} - \mu$$
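The Markov-Bayes form of collocated cokriging reduces to a small linear system. The sketch below is a minimal illustration under stated assumptions, not the algorithm used for the examples in this document: it uses simple (known-mean) collocated cokriging in 1-D, a spherical correlogram shared by the primary covariance and the cross-covariance (the proportionality assumption above), and hypothetical function names. Because the means are taken as known, there is no Lagrange multiplier; an ordinary form would add an unbiasedness constraint and its multiplier.

```python
import numpy as np


def spherical_corr(h, a):
    """Spherical correlogram: 1 at h = 0, falling to 0 at the range a."""
    r = np.minimum(np.abs(np.asarray(h, dtype=float)) / a, 1.0)
    return 1.0 - 1.5 * r + 0.5 * r**3


def collocated_cokrige(x_wells, z_wells, x0, t0, m_z, m_t, var_z, var_t, rho, a):
    """Simple collocated cokriging at x0 under the Markov-Bayes (proportional) model.

    Assumed models (illustrative):
        C_ZZ(h) = var_z * corr(h)
        C_ZT(h) = rho * sqrt(var_z * var_t) * corr(h)   # scaled copy of C_ZZ
        C_TT(0) = var_t
    Only the secondary value t0 at the target node enters the system.
    """
    x = np.asarray(x_wells, dtype=float)
    z = np.asarray(z_wells, dtype=float)
    n = len(x)
    c_zt0 = rho * np.sqrt(var_z * var_t)              # cross-covariance at h = 0

    A = np.empty((n + 1, n + 1))
    A[:n, :n] = var_z * spherical_corr(x[:, None] - x[None, :], a)
    A[:n, n] = A[n, :n] = c_zt0 * spherical_corr(x - x0, a)
    A[n, n] = var_t
    b = np.empty(n + 1)
    b[:n] = var_z * spherical_corr(x - x0, a)         # primary covariances to target
    b[n] = c_zt0                                      # cross-covariance to collocated T0

    w = np.linalg.solve(A, b)
    lam, nu0 = w[:n], w[n]
    estimate = m_z + lam @ (z - m_z) + nu0 * (t0 - m_t)
    variance = var_z - lam @ b[:n] - nu0 * b[n]
    return estimate, variance, lam, nu0
```

Setting `rho=0.0` makes the secondary weight vanish and returns the simple kriging weights, which mirrors the self-krigability behavior described for the examples; calling the function with no wells returns the linear regression of Z on the collocated secondary value.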

General Cokriging Versus Collocated Cokriging
Cokriging
- A secondary variable is not required at all nodes of the estimation grid.
- In general, it uses secondary data located only at the primary sample locations; the traditional method does not incorporate secondary information from non-collocated data points.
- It requires more modeling effort: CZZ(h), CTT(h) and CZT(h) must all be specified.
- No assumption regarding the relationship between the cross-covariance and the auto-covariance of the primary variable is required.
- The system of normal equations may be ill conditioned.
Collocated Cokriging
- Collocated cokriging assumes that the secondary variable is known at all nodes of the estimation grid; because each node uses its own collocated secondary value, all of the secondary information is used across the estimation process.
- It only requires knowledge of the primary covariance model (CZZ(h)), the variances of the primary and secondary attributes ($\sigma^2_Z$, $\sigma^2_T$), and the correlation coefficient between the primary and secondary attributes ($\rho_{ZT}(0)$).
- The Markov-Bayes approach to collocated cokriging assumes that the cross-covariance is a scaled version of the primary variable autocovariance.
- The simplest form of collocated cokriging ignores the influence of non-collocated secondary data points.
- In general, the system of normal equations is well conditioned.
- During extrapolation situations (i.e., no well data within the search radius), collocated cokriging reduces to a traditional least-squares regression problem, because it uses the secondary data located only at the target grid node.

Practical Considerations: Cokriging and Collocated Cokriging
- In general, increasing the number of data points of the secondary variable does not improve the cokriging performance.
- It is impractical to incorporate more than two or three secondary variables into the cokriging matrix because of increased modeling assumptions and computational time. In fact, as the number increases, the cokriging system can become unstable.
- It is often difficult to find a common model of coregionalization, especially if the well data are sparse.
- Careful determination of the correlation coefficient, $\rho_{ZT}(0)$, is critical when applying collocated cokriging, because it controls the scaling between the primary and secondary data when using the Markov-Bayes assumption.
- Remove outliers when computing $\rho_{ZT}(0)$.
- Analyze and understand the physical meaning of the correlation.

Advantages of Cokriging and Collocated Cokriging
- Allows incorporation of correlated secondary data into the mapping process.
- Yields more accurate estimates than single-variable kriging.
- When compared to traditional least-squares regression, the cokriging technique honors the primary data and accounts for spatial correlation in the variations of the secondary data.
- Can calibrate and control the influence of the secondary data via a cross-covariance model (cokriging) or through the correlation coefficient (collocated cokriging).
Limitations of Cokriging and Collocated Cokriging
- Requires more modeling effort than kriging or kriging with an external drift.
- Inferring a correct linear correlation model is difficult for sparse well data.
- With a correlation coefficient below 0.5 (in absolute value), the secondary attribute has less influence during the estimation process.
- Make sure that the estimator yields a meaningful range of estimates, Z0*, for the minimum and maximum values of the secondary data (the well data probably do not calibrate the full range of the secondary data).
- Do not use more than one or two seismic attributes, because it is often difficult to understand the physical meaning of the multivariate correlation.
- The cokriging system may sometimes be ill conditioned.
- Cokriging tends to produce a smoothed image, but not as smooth as kriging.

EXTERNAL DRIFT KRIGING
This technique allows us to krig, or simulate, in the presence of a trend. Although technically a univariate problem requiring only the primary attribute covariance model, external drift kriging (KED) is generally included in the multivariate data integration discussion of most geostatistical presentations. KED is a true regression technique, which uses a secondary attribute to define a trend to guide the estimation of the primary variable (Deutsch and Journel, 1992).
- Regionalized variables are made up of two parts:
   - Drift: the expected value, representing regional features
   - Residual: the deviation from the drift, representing local features
- The basic hypothesis is that the expectation of the variable is a completely defined function, denoted S(x), analogous to a local trend surface:
$$E[Z(x)] = S(x)$$
- To provide greater flexibility, we often express the model as:
$$E[Z(x)] = a_0 + a_1 S(x)$$
where the coefficients a0 and a1 are unknown.

The a0 and a1 coefficients in the above equation describe the local secondary data trend (or drift), which is filtered during estimation. This is analogous to trend surface analysis, which removes a trend based on a polynomial equation. Like traditional kriging, external drift kriging must use an authorized variogram model to ensure the computation of a positive kriging variance (meeting the positive definiteness criterion). Before applying the kriging conditions, the mean within the kriging neighborhood must be known. The unbiasedness conditions are:
- The sum of the weights must equal 1: $\sum_i \lambda_i = 1$
- The weighted sum of the drift values must equal the drift value at the target location (the location we want to estimate): $\sum_i \lambda_i S_i = S_0$
These equations ensure that the system is unbiased. Minimizing the estimation variance subject to these two constraints leads to the traditional error equation:
$$\sigma^2 = K_{00} - \sum_i \lambda_i K_{i0} - \mu_0 - \mu_1 S_0$$
where K is the covariance model of the residuals, and μ0 and μ1 are the Lagrange multipliers for the two constraints.
KED is a multi-step process:
1. Compute the coefficients a0 and a1 from a local least-squares regression using the primary and secondary variables measured at the wells.
2. Compute the residuals at all data points (the well data).
3. Once the local secondary data drift is known, estimate the residual at the target grid node using traditional kriging methods, then add the drift value back to produce the final estimated value.
4. Compute the estimated attribute at all grid nodes.

KRIGING WITH EXTERNAL DRIFT: EXAMPLE USING 3 DATA POINTS
Figure 1 illustrates a typical example (time-to-depth conversion) for the KED method.

Figure 1: The objective of KED is to use the seismic data as a correlated shaping function. Four wells intersected the top of a reservoir. Kriging was used to map the solid, curved surface through the data points; this surface resembles a second- or third-order polynomial. The seismic two-way time data (lightweight line) is the external drift. The seismic travel times correlate with the measured depth at the wells and suggest a much more complex surface than the one created using only the well data. KED, a true regression approach, was used to construct the final depth map. The approach assumes a perfect correlation between the well and seismic data, so KED is appropriate when shape is an important aspect of the study. KED is not an appropriate approach for mapping reservoir rock properties; for those, collocated cokriging is the better choice.
Figure 2 shows a three-data-point KED example.

Figure 2: The primary data are located at Zα and the secondary data at Sα. Note that KED also uses the secondary information at the target grid node, S0. This data configuration can also be used for a more rigorous application of the collocated cokriging method.
Estimator
$$Z_0^* = \lambda_1 Z_1 + \lambda_2 Z_2 + \lambda_3 Z_3$$
Estimation Error
$$\sigma^2 = K_{00} - \lambda_1 K_{01} - \lambda_2 K_{02} - \lambda_3 K_{03} - \mu_0 - \mu_1 S_0$$

PRACTICAL CONSIDERATIONS: KRIGING WITH EXTERNAL DRIFT
- The external drift must be known at the locations of all primary data and at all nodes of the estimation grid.
- In theory, the covariance model of the residuals, KZZ(h), cannot be inferred from the Z data. In practice, KZZ(h) ≈ CZZ(h) for small distances h or along directions not strongly affected by the trend.
- Use a cross-plot to investigate the relationship:
   - Is there a linear relation?
   - Is it well defined (i.e., |ρ| ≥ 0.9)?
   - Is the correlation physically meaningful?
- Consider a KED approach only when shape is important, or when a very high correlation exists between the primary and secondary attributes.
- In a moving neighborhood, the coefficients a0 and a1 in the regression $Z^*(x) = a_0 + a_1 S(x)$ are re-estimated at each grid node. Sufficient data must fall within the search neighborhood to ensure proper definition of the regression (to filter the trend).
- Use a unique neighborhood in cases with sparse data.
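The KED system implied by the two unbiasedness conditions and the error equation can be sketched as follows. This is a minimal sketch under assumptions: a known spherical residual covariance K (taking K ≈ C, as the practical considerations allow for short distances), 2-D coordinates, a unique neighborhood, and hypothetical function names and two-way-time values. The Lagrange multipliers μ0 and μ1 are solved alongside the weights, and the variance uses the same sign convention as the error equation in the text.

```python
import numpy as np


def spherical_cov(h, sill=1.0, a=1500.0):
    """Spherical covariance used here as the (assumed) residual covariance K(h)."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return sill * (1.0 - 1.5 * r + 0.5 * r**3)


def ked_estimate(xy, z, s, xy0, s0, cov):
    """Kriging with external drift at xy0.

    Solves  sum_j lam_j K_ij + mu0 + mu1 * S_i = K_i0   (i = 1..n)
            sum_j lam_j        = 1
            sum_j lam_j * S_j  = S0
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    s = np.asarray(s, dtype=float)
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)

    A = np.zeros((n + 2, n + 2))
    A[:n, :n] = cov(d)
    A[:n, n] = A[n, :n] = 1.0              # constraint: weights sum to 1
    A[:n, n + 1] = A[n + 1, :n] = s        # constraint: weights reproduce the drift S0
    b = np.empty(n + 2)
    b[:n] = cov(np.linalg.norm(xy - np.asarray(xy0, dtype=float), axis=1))
    b[n] = 1.0
    b[n + 1] = s0

    sol = np.linalg.solve(A, b)
    lam, mu0, mu1 = sol[:n], sol[n], sol[n + 1]
    estimate = lam @ z
    variance = cov(0.0) - lam @ b[:n] - mu0 - mu1 * s0
    return estimate, variance, lam
```

If the well values are an exact linear function of the drift, Z = a0 + a1 S, the estimate is exactly a0 + a1 S0 whatever covariance model is used, because the two constraints reproduce the drift at the target.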

Advantages of KED
- Allows direct integration of a secondary attribute during estimation of the primary data.
- Easier to implement than cokriging or collocated cokriging, because it does not require any secondary attribute modeling.
- Neighborhood search is identical to kriging.
- Computation time is similar to kriging a single variable.
Limitations of KED
- It may be difficult to infer the covariance of the residuals (local features).
- There is no means to calibrate and control the influence of the secondary variable, because the method is a true regression model and assumes a perfect correlation between the two data types.
- The external drift should be a smoothly varying function; otherwise the KED system may be unstable (produce extremely high or low values).

MEASUREMENT ERRORS
Kriging, cokriging and conditional simulation algorithms are flexible enough to take measurement errors in the primary variable into account. At a data point i, the measured value is $Z_i = S_i + \varepsilon_i$, where $S_i$ is the true value and $\varepsilon_i$ is the unknown measurement error. It is also assumed that the variance of the noise, $\sigma_i^2$, is known at every primary data location. The errors are assumed to be:
- random, that is, not spatially correlated;
- independent of the true values;
- zero-mean (the expected value of the errors is equal to zero);
- Gaussian distributed.
Using these assumptions, decompose the data values into:
- a signal component with a constant variance, $\sigma_S^2$, and
- zero-mean Gaussian white noise, uncorrelated with the signal.
PRACTICAL CONSIDERATIONS: MEASUREMENT ERROR
- For (co)kriging, the noise variance, $\sigma_n^2$, acts as a smoothing parameter (as does the nugget effect), which determines how closely the primary data are honored. When $\sigma_n^2$ is equal to zero, the data values are honored exactly. When $\sigma_n^2$ at one point is large compared to the signal variance, that data point receives a much lower weight in the interpolation process.
- In simulation mode, $\sigma_n^2$ introduces more variability into the final simulated result.
- When modeling the experimental covariance of the data, the user must remove the contribution of the nugget to extract the signal variance.
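Adding a per-datum noise variance to the kriging system can be sketched as follows. It is a minimal sketch assuming ordinary kriging, a spherical signal covariance with the nugget already removed (as the last practical consideration requires), and a hypothetical function name: the noise variances $\sigma_i^2$ are added only to the diagonal of the data-to-data covariance matrix, while the data-to-target covariances use the signal covariance alone, so the estimate targets the true value S rather than the noisy Z.

```python
import numpy as np


def spherical_cov(h, sill=1.0, a=1500.0):
    """Spherical signal covariance (nugget removed)."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return sill * (1.0 - 1.5 * r + 0.5 * r**3)


def ok_with_measurement_error(xy, z, noise_var, xy0, cov):
    """Ordinary kriging in which datum i carries its own noise variance sigma_i^2."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    # white noise, uncorrelated with the signal: only the diagonal grows
    A[:n, :n] = cov(d) + np.diag(np.asarray(noise_var, dtype=float))
    A[n, n] = 0.0
    b = np.ones(n + 1)
    # right-hand side: signal covariance only (no noise term)
    b[:n] = cov(np.linalg.norm(xy - np.asarray(xy0, dtype=float), axis=1))
    lam = np.linalg.solve(A, b)[:n]
    return lam @ z, lam
```

With all noise variances set to zero the estimate at a data location equals the measured value; giving one datum a noise variance much larger than the signal sill drives its weight toward zero, which is the behavior described in the practical considerations.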

Advantages of Using Measurement Error
- Integration of data of varying quality.
- Accounts for spatially varying measurement error.
- Errors are accounted for in the final result.
Limitations of Measurement Error
- Only implemented for the primary data; the secondary data are therefore assumed noise free.
- The amount of smoothing is not proportional to the noise variance.
It is often useful to allow the noise parameter to vary from one data location to another. Examples include:
- interpolating zone average data from wireline logs with differing accuracy,
- mixing core-derived and log-derived measurements, and
- building velocity maps from well-derived and NMO-derived average velocities.

Data Integration Examples
The following examples illustrate the three data integration methods just described.

Figures 1a, 1b, 1c and 1d illustrate the basic data configuration for the three examples. Figures 2a, 2b, 2c and 2d illustrate cokriging, and Figures 3a, 3b, 3c and 3d show collocated cokriging and kriging with external drift.

Figure 1a shows porosity data points from well log information. The well porosity data are sparse (55 data points) in comparison to the densely sampled seismic data (33,800 data points). Figure 1b shows variograms derived from the well data, with the experimental variogram (thin line labeled D1) superimposed on the model variogram. In Figure 1c, the porosity data were kriged to the seismic grid using the omni-directional, spherical variogram model having a range of 1500 meters (shown in Figure 1b). The isotropic search neighborhood used an octant search with 2 points per sector. Figure 1d shows the seismic acoustic impedance data. The seismic data reside on a grid of approximately 12 by 24 meters in X and Y. This is the grid mesh used for all the following examples.

Figures 2a, 2b, 2c and 2d illustrate an example of traditional cokriging. Porosity (Figure 2a) and acoustic impedance (Figure 2b) were modeled with an omnidirectional, spherical variogram with a range of 1500 meters. The lines labeled D1 represent the experimental variograms upon which the model variograms are based. The cross variogram (Figure 2c) uses the same spherical model and shows an inverse relationship between porosity and acoustic impedance (the correlation is -0.83). The sill of the cross variogram reflects the magnitude of the -0.83 correlation between the data. The curved, dashed lines show the bounds of perfect positive or inverse correlation. Figure 2d shows the results of cokriging using the cross-variogram model from Figure 2c.

Figures 3a, 3b, 3c and 3d illustrate collocated cokriging (Figures 3a-3c) and kriging with external drift (Figure 3d). The model for the collocated cokriging was derived from analysis and modeling of the seismic acoustic impedance data from the West Texas data set. Lines D1 and D2 represent experimental variograms taken from two different directions. We are justified in using the nested, anisotropic seismic data variogram model (Figure 3a) as a model of porosity, based on the high correlation coefficient (-0.83). Thus, we can use the Markov-Bayes assumption to create the seismic variogram from the porosity variogram, calibrate them using the correlation coefficient, and scale them based on their individual variances. Figure 3b is the result of a Markov-Bayes collocated cokriging using a correlation of -0.83 and the anisotropic search neighborhood. Figure 3c is also a collocated cokriging using the Markov-Bayes assumption, except that the correlation coefficient in this case was set to -0.1. Figure 3c therefore illustrates the condition of self-krigability: when the secondary attribute has (almost) no correlation to the primary attribute, the result reverts to a simple kriging solution. Although not a totally appropriate use of KED, the porosity map using KED (Figure 3d) shows a slightly wider range of porosity values. This approach would be similar to a Markov-Bayes assumption using a -1.0 correlation.

An important advantage of the geostatistical approach to mapping is the ability to model the spatial covariance before interpolation. The covariance models make the final estimates sensitive to the directional anisotropies present in the data (compare Figure 1c with Figure 3b). If the mapping objective is reserve estimation, then the smoothing properties of kriging in the presence of a large nugget may be the best approach. However, if the objective is to map directional reservoir heterogeneity (continuity) and assess model uncertainty, then a method other than interpolation is required (Hohn, 1988).

CONDITIONAL SIMULATION AND UNCERTAINTY ESTIMATION
INTRODUCTION
Stochastic modeling, also known as conditional simulation, is a variation of conventional kriging or cokriging. Once thought of as stochastic “artwork”, useful only for decorating the walls of research centers (Srivastava, 1994a), conditional simulation models are

becoming more accepted into our day-to-day reservoir characterization and modeling efforts, because the results contain higher-frequency content and lend a more realistic appearance to our maps when compared to kriging. Srivastava (1994a) notes that, in an industry that has become too familiar with layer-cake stratigraphy, with lithologic units either connected from well to well or conveniently pinching out halfway, and with contour maps that show gracefully curving undulations, it is often difficult to get people to understand that there is much more inter-well heterogeneity than depicted by traditional reservoir models. Because stochastic modeling produces many equiprobable reservoir images, the thought of needing to analyze more than one result, let alone flow simulate all of them, changes the paradigm of the traditional reservoir characterization approach. Some of the realizations may even challenge the prevailing geological wisdom, and will almost certainly provide a range of predictions from optimistic to pessimistic (Yarus, 1994). Most of us are willing to admit that there is uncertainty in our reservoir models, but it is often difficult to assess the amount of uncertainty. One of the biggest benefits of geostatistical stochastic modeling is the assessment of risk or uncertainty in our model. To paraphrase Professor Andre Journel, “… it is better to have a model of uncertainty, than an illusion of reality.” Before reviewing various conditional simulation methods, it is useful to ask what it is that we want from a stochastic modeling effort. We really need to consider the goal of the reservoir modeling exercise itself, because the simulation method we choose depends, in large part, on the goal of the study and the types of data available. Not all conditional simulation studies need the Cadillac approach when a Volkswagen technique will do fine (Srivastava, 1994a).

WHAT DO WE WANT FROM A CONDITIONAL SIMULATION METHOD?
Srivastava (1994a), in an excellent review of stochastic methods for reservoir characterization, identifies five major types of stochastic simulation model approaches:
- Assessing the impact of uncertainty
- Monte Carlo risk analysis
- Honoring heterogeneity
- Facies or rock properties (or both)
- Honoring complex information
The interested reader should refer to the original article for details, which are only summarized here.
Assessing the Impact of Uncertainty
Anyone who forecasts reservoir performance understands that there is always uncertainty in the reservoir model. Performance forecasts or volumetric predictions are often based on a “best” case model. However, the reservoir engineer is also interested in other models, such as the “pessimistic” and “optimistic” cases. These models allow the engineer to assess whether the field development plan, based on the “best” case scenario, is flexible enough to handle the uncertainty. When used for this kind of study, stochastic models offer

many models consistent with the input data. We could then sort through the many realizations, select one that looks like a downside scenario, and find another that looks like an upside model.
Monte Carlo Risk Analysis
A critical aspect of the use of stochastic modeling is the belief in some “space of uncertainty,” and that the stochastic simulations are outcomes that sample this space fairly and adequately. We believe that we can generate a fair representation of the whole spectrum of possibilities, and hope that the outcomes do not have any systematic tendency to show pessimistic or optimistic scenarios. This type of study involves the idea of a probability distribution, rather than simply sorting through a large set of outcomes and selecting two that seem plausible. In Monte Carlo risk analysis, we depend on the notion of a complete probability distribution of possible outcomes, and on the simulated realizations fairly representing the entire population.
Honoring Heterogeneity
Although stochastic techniques are capable of producing many plausible outcomes, many studies use only a single outcome as the basis of performance prediction. Over the past decade, it has become increasingly apparent that reservoir performance predictions are more accurate when based on models that reflect possible reservoir heterogeneity. We are painfully aware of the countless examples of failed predictions due to the use of overly simplistic models. The thought of using only a single outcome from a stochastic modeling effort is often viewed with disdain by those who like to generate hundreds of realizations.
Srivastava (1994a) argues that “even a single outcome from a stochastic approach is a better basis for performance prediction than a single outcome from a traditional technique that does not honor reservoir heterogeneity.” Granted, many people will argue with this statement, because that one simulation may be the pessimistic (or optimistic) realization just by the “luck of the draw,” probabilistically speaking.
Facies or Rock Properties (or both)
Reservoir modelers recognize two fundamentally different aspects of stochastic reservoir models. The reservoir architecture is usually the first priority: it consists of the overall structural elements (e.g., faults, top and base of reservoir), followed by the geobodies defined by the depositional environment (e.g., eolian, deep-water fan, channels). Once the spatial arrangement of the different flow units has been modeled, we must then decide how to populate them with rock and fluid properties. The important difference between modeling facies and modeling rock properties is that the former is a categorical variable, whereas the latter are continuous variables. Articles by Tyler et al. (1994), MacDonald and Aasen (1994), and Hatloy (1994) provide excellent overviews of these methods. Though it is conventionally assumed that a lithofacies model is an appropriate model of reservoir architecture, we should ask ourselves whether this is a good assumption. Just because the original depositional facies are easily recognized and described does not mean they are the most important control on fluid flow. For example, permeability variations might be due to later diagenesis or tectonic events (Srivastava, 1994).

Honoring Complex Information
Stochastic methods allow us to incorporate a broad range of information that most conventional methods cannot accommodate. Many individuals are interested in stochastic simulation not so much because it generates a range of plausible outcomes, but because they want to integrate seismic data with petrophysical data while obtaining some measure of reliability.
Properties of Conditional Simulation
Conditional simulation is a Monte Carlo technique designed to:
- honor measured data values
- approximately reproduce the data histogram
- honor the spatial covariance model
- be consistent with secondary data
- assess uncertainty in the reservoir model
Conditional Simulation Methods
The following section is a very brief review of stochastic simulation methods in common use, followed by a discussion of important practical advantages and limitations of each method. The terms stochastic and conditional are sometimes used interchangeably; technically, they mean different things. To most people, stochastic connotes randomness. In geostatistics, we define stochastic simulation as the process of drawing equally probable, joint realizations of the component Random Variables from a Random Function model. These are usually gridded realizations, and represent a subset of all possible outcomes of the spatial distribution of the attribute values. Each realization is also called a stochastic image (Deutsch and Journel, 1992). If the image represents a random drawing from a population with mean = 0 and variance = 1, based on some spatial model but without reference to the measured data, we would call this type of realization a non-conditional simulation. However, a simulation is said to be conditional when it honors the measured values of a regionalized variable (Hohn, 1988). For the remainder of this discussion, stochastic and conditional will be used as equivalent processes.
Non-conditional simulations are often used to assess the influence of the spatial model parameters, such as the nugget and sill values, in the absence of control data. Each of these parameters has a direct effect on the amount of variability in the final simulation. Increasing either the sill or the nugget increases the amount of variability in a simulated realization. Srivastava (1994a) lists the following types of stochastic simulation methods:
- Turning Bands
- Sequential Simulation
  - Gaussian
  - Indicator
  - Bayesian
- Simulated Annealing
- Boolean, Marked-Point Process and Object Based
- Probability Field
- Matrix Decomposition Methods
We will describe each of these methods in turn.
Turning Bands
This is one of the earliest simulation methods. It tackles the simulation problem by first creating a smooth model by kriging and then adding an appropriate level of noise. The noise is added through a non-conditional simulation step that uses the same histogram and spatial model as the kriging step, but does not use the actual data values at the well locations. The final model still honors the original data and the spatial model, but now also has an appropriate level of spatial heterogeneity (Srivastava, 1994a; Deutsch and Journel, 1992).
Sequential Simulation
The three sequential simulation procedures (Gaussian, Indicator, and Bayesian) use the same basic algorithm for different data types. The general process is:
1. Select at random a grid node $GN_i$, a point not yet simulated in the grid.
2. Use kriging to estimate the mean, $m_i$, and variance, $\sigma_i^2$, at location $GN_i$ from the local Gaussian conditional probability distribution (lGcpd):
$$P(Z_{S_i} \mid Z_{S_1}, \ldots, Z_{S_{i-1}}) \propto \exp\left[-\frac{(Z_{S_i} - m_i)^2}{2\sigma_i^2}\right]$$
where $m_i$ is estimated by any of the kriging methods, including kriging with external drift (KED), and $\sigma_i^2$ is the error variance of $m_i$.
3. Draw at random a single residual value, $z_i$, from the lGcpd centered on zero (zero mean, variance $\sigma_i^2$). The simulated value therefore usually falls within about $\pm 2\sigma_i$ of $m_i$.
4. Create a newly simulated value $Z^*_{S_i} = m_i + z_i$.
5. Include the newly simulated value $Z^*_{S_i}$ in the set of conditioning data. This ensures that closely spaced values have the correct short scale correlation.
6. Repeat the process: select $GN_{i+1}$, then $GN_{i+2}$, and so on, until all grid nodes contain a simulated value.
Selection of the Simulated Grid Node
The first step in sequential simulation is the random selection of a location $GN_i$. The selection process is random, but repeatable:
- For each simulation, shuffle the grid nodes into an order defined by a random seed value.
- Each random seed corresponds to a unique grid order.
- Different random seed values produce a different path through the grid.
- Although the total possible number of orderings is very large, each random path is uniquely identified and repeatable.
The order in which grid nodes are randomly simulated influences the cumulative feedback effect on the outcome.
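The general process and the seeded random path can be sketched as a short Python program. It rests on several assumptions. The grid is 1-D. Conditioning values are already transformed to normal scores. Kriging is simple kriging with a known global mean of zero, whereas the text allows any kriging method. The exponential covariance model is hypothetical. The search neighborhood is the nearest eight known values. Function names are illustrative only.

```python
import math
import random


def cov(h, sill=1.0, practical_range=10.0):
    """Hypothetical exponential covariance model for standardized (normal-score) values."""
    return sill * math.exp(-3.0 * abs(h) / practical_range)


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sgs_1d(n_nodes, data, seed, max_neighbors=8):
    """Sequential Gaussian simulation on a 1-D grid of nodes 0 .. n_nodes-1.

    data: dict mapping node index -> conditioning value (already in normal-score units).
    """
    rng = random.Random(seed)                  # same seed -> same path and same draws
    values = dict(data)                        # conditioning data are kept exactly
    path = [i for i in range(n_nodes) if i not in values]
    rng.shuffle(path)                          # step 1: random, but repeatable, visiting order
    for node in path:
        # Nearest known values: original data plus previously simulated nodes.
        neigh = sorted(values, key=lambda j: abs(j - node))[:max_neighbors]
        if neigh:
            lhs = [[cov(p - q) for q in neigh] for p in neigh]
            rhs = [cov(p - node) for p in neigh]
            w = solve(lhs, rhs)
            mean = sum(wi * values[p] for wi, p in zip(w, neigh))          # step 2: simple kriging mean
            var = max(cov(0) - sum(wi * r for wi, r in zip(w, rhs)), 0.0)  # step 2: kriging variance
        else:
            mean, var = 0.0, cov(0)
        z = rng.gauss(0.0, math.sqrt(var))     # step 3: residual drawn from N(0, var)
        values[node] = mean + z                # steps 4-5: Z* = m + z, reused as conditioning data
    return [values[i] for i in range(n_nodes)]
```

For example, `sgs_1d(50, {0: 1.0, 25: -0.5, 49: 0.3}, seed=42)` returns 50 values that reproduce the three conditioning values exactly. Calling it again with seed 42 returns the identical realization, while seed 43 produces a different path and a different realization.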

Sequential Gaussian Simulation (SGS)
Sequential Gaussian Simulation (SGS) is a method for the simulation of continuous variables, such as petrophysical properties. In SGS, the procedure is essentially the same as (co)kriging, with the addition of a random component drawn from the local conditional distribution at each node.
Sequential Indicator Simulation (SIS)
Sequential Indicator Simulation (SIS) is a method used to simulate discrete variables, which represent "lithofacies" (pay/non-pay, or sand/shale). It works on a grid of 0s and 1s but otherwise uses the same methodology as SGS. SIS requires the following input parameters:
- The a priori probabilities (proportions) of two data classes (Indicators, denoted as I) coded as 0 or 1, for example:
  - $I(z_x) = 1$ if $z_x$ is shale
  - $I(z_x) = 0$ if $z_x$ is sand
- Indicator histogram
- The Indicator spatial correlation model
Bayesian Sequential Indicator Simulation
Bayesian Sequential Indicator Simulation is a later form of SIS (Doyen et al., 1994). This technique allows direct integration of seismic attributes with well data using a combination of classification and indicator methods. Bayesian SIS input parameter requirements:
- Code well data as 0 or 1, as in SIS.
- Classify the seismic attribute into two classes (0, 1).
- Assuming the two data classes are Normal Distributions, we need the:
  - Mean and standard deviation of the seismic attribute that is 0.
  - Mean and standard deviation of the seismic attribute that is 1.
- The a priori probabilities of the two classes of seismic data.
- The Indicator spatial correlation models.
Simulated Annealing
Annealing is the process in which a metallic alloy is heated and then slowly cooled. The molecules move around and reorder themselves into a low-energy grain structure. The probability that the system occupies a state of a given energy at a given temperature is described by the Boltzmann probability distribution. Simulated annealing applies the annealing mechanism by swapping the attributes assigned to two different grid node locations, using the Boltzmann probability distribution to decide whether to accept each perturbation (Deutsch, 1994). Simulated annealing constructs the reservoir model via an iterative trial and error process, and does not use an explicit random function model. Instead, the simulated image is formulated as an optimization problem. The first requirement is an objective (or energy) function. It is some measure of the difference between the desired spatial characteristics and those of the candidate realization (Deutsch, 1994).
For example, we might want to produce an image of a sand/shale model with a 70% net-to-gross ratio, an average shale length of 60 m, and an average shale thickness of 10 m. The image starts with pixels arranged randomly, having sand and shale in the correct global proportion. Because of the random assignment of the sand and shale, however, the local net-to-gross is incorrect, and the average shale length and thickness are too short. Next, the annealing mechanism swaps attributes at different grid node locations, applies the Boltzmann probability distribution for accepting the perturbations, and continues until the desired model conditions are satisfied. At first glance, this approach seems terribly inefficient, because millions of perturbations may be required to arrive at the desired image. In practice, however, these methods are more efficient than they first appear (Deutsch, 1994).
Boolean, Marked-Point Process and Object Based
These methods constitute a family of techniques that create reservoir models based on objects of some genetic significance, rather than building the model up one elementary node or pixel at a time. To use such methods, you need to select a basic shape for each lithofacies that describes its geometry. For example, you might want to model sand channels that look like half ellipses in cross section, or deltas as triangular wedges in map view. You must also specify the proportions of the shapes in the final model and choose a distribution for the parameters that describe the shapes. Other algorithms describe how the geobodies are positioned relative to each other: can they overlap, or must there be a minimum distance between the shapes? After the distribution of parameters and the position rules are chosen, follow the remaining steps in the procedure (Srivastava, 1994a):
1. Fill the reservoir model background with some lithofacies (e.g., shale).
2. Randomly select one of the lithofacies shapes, and draw an appropriate size, anisotropy and orientation.
3. Randomly select a starting point in the model.
4. Check whether the shape conflicts with any conditioning data (e.g., well data) or with other previously simulated shapes. If not, keep the shape; otherwise, reject it and go back to the previous step.
5. Check whether the global proportions are correct; if not, return to step 2.
6. Simulate petrophysical properties within the geobodies using the more classical geostatistical methods.
If control data must be honored, conditioning to the wells is typically completed first, and then the inter-well region is simulated. Be sure that there are no conflicts with known stratigraphic and lithologic sequences in the wells. In the past, Boolean-type algorithms could not always honor all of the conditioning data, because the algorithms were not strict simulators of shape. The number of input parameters made this almost a deterministic method, requiring much upfront knowledge of the depositional system you wanted to model. Boolean or object-based techniques are of current interest in the petroleum industry. A number of research groups, academic institutions, and commercial vendors are working on new implementation algorithms. Articles by Tyler et al. (1994) and Hatloy (1994) provide excellent case studies using Boolean-type methods to simulate fluvial systems.
Probability Field Simulation
This method is an enhancement of the sequential simulation methods described earlier. In sequential simulation, the value drawn from the local cumulative probability distribution at a particular grid node is treated as if it were hard data and is included as local conditioning data. This ensures that closely spaced values have the correct short scale correlation. Otherwise, the simulated image would contain too much short scale (high frequency noise) variability. The idea behind probability field, or P-field, simulation is to gain efficiency by computing the local conditional probability distribution (lcpd) from the original well data only. P-field simulation avoids the problem of too much short scale variability by controlling the sampling of the distributions, rather than controlling the distributions themselves as in sequential simulation (Srivastava, 1994a). Srivastava (1994b) shows how P-field simulation improves the ability to visualize uncertainty. The article by Bashore et al. (1994) illustrates a P-field application for establishing an appropriate degree of correlation between porosity and permeability.
Matrix Decomposition Methods
Some simulation techniques involve matrix decomposition. In this approach, different outcomes are created by multiplying vectors of random numbers by a precalculated matrix. The matrix is built from spatial continuity information supplied by the user, typically as a variogram or correlogram. L-U decomposition is one such example. It represents the matrix as the product of a lower triangular matrix, L, and an upper triangular matrix, U. This decomposition can be made unique by stipulating either that the diagonal elements of L be unity, or that the diagonal elements of L and U be correspondingly identical. Matrix methods can be viewed as a form of sequential simulation. The multiplication across the rows of the precalculated matrix and down the column vector of random numbers can be construed as a sequential process, in which the value at each successive node depends upon the values at the previously simulated nodes (Srivastava, 1994a; Deutsch and Journel, 1992).
Uncertainty Estimation
Once all of these simulated images have been generated, how do you determine which one is correct? Technically speaking, any one of the simulated images is a possible realization of the reservoir, because each image is equally likely, based on the data and the spatial model. However, an image that is statistically equally probable is not necessarily geologically acceptable. You must look at each simulated image to determine whether it is a reasonable representation of what you know about the reservoir. If it is not, discard it, and run more simulations if necessary. Some of the possible maps generated from a suite of simulated images include:
- Mean: This map is the average of n conditional simulations. At each cell, the program computes the average value, based on the values from all simulations at the same location. When the number of input simulations is large, the resultant map converges to the kriged solution.
- Standard Deviation: A map of the standard deviation at each grid cell, computed from all input maps. This map is used as a measure of the standard error and is used to analyze uncertainty.
- Maximum: Each cell displays the largest value from all input simulations.
- Minimum: Each cell displays the smallest value from all input simulations.
- Uncertainty or Risk: This map displays the probability of meeting or exceeding a user-specified threshold value at each grid cell. The grid cell values range between 0 and 100 percent.
- Iso-Probability: These maps are displayed in terms of the attribute value at a constant probability threshold.
How many simulations should we create and use? This is often a difficult question to answer. The answer depends, in part, on the number of conditioning data points and on the quality of the correlation between the primary and secondary data (if performing a cosimulation).
- Sparse data sets will produce a wide range of outcomes.
- Often, only about 100 simulations are required to produce reasonable probability maps or confidence margins on global parameters.
- Only a small number of simulated models, representing minimum, most likely and maximum cases, need to be retained for fluid flow simulations.
- Discard geologically unrealistic simulations, and compute additional simulations to ensure adequate summary maps.
Practical Considerations for Conditional Simulation
- In theory, using a sequential simulation approach, the amount of conditioning data increases as the number of simulated points increases. In practice, only measured data and previously simulated points that fall within the search radius are used at any given time.
- In sequential simulation, locations are visited according to a random path to avoid artifacts and maximize simulation variability.
- The spatial correlation function is reproduced only for distances within the search radius. Therefore, the search region must extend at least to the distances for which the covariance function is to be reproduced.
- Do not expect exact reproduction of the spatial model, because of uncertainty in the model parameters.
- The density and quality of the conditioning data control the amount of variability.
- Simulation with the KED method may yield an unrealistically wide range of simulated values unless the external drift is a smoothly varying function (e.g., seismic velocity).
- Correct determination of the correlation coefficient between the primary and secondary variables is crucial for a Markov-Bayes collocated simulation. Overestimating the correlation may result in overconstrained simulations and a narrow range of outcomes.
Advantages of Conditional Simulation
- By approximately reproducing the data histogram and the spatial correlation structure, conditional simulations provide more realistic reservoir images than (co)kriging.
- Conditional simulations provide alternative models, which are consistent with the data.
- Simulations generate different, but equally probable, geological scenarios for use in risk assessment.
- Because simulations can reproduce extreme values (tails of histograms) and their pattern of connectivity, they are useful for simulating hydrocarbon production volumes and rates.
Limitations of Conditional Simulation
- CPU and memory intensive.
- Large numbers of simulations may create data management problems.
- Simulations are very sensitive to covariance model parameters, like the sill and nugget, or to the correlation coefficient if using collocated cosimulation.
- Sparse conditioning data generally produce a wide range of variability between the simulations.
- Although statistically equally probable, not all images may be geologically realistic.
- Interpret confidence limits calculated by post-processing simulations with caution, because uncertainty in the conditioning data may be large.
A Conditional Simulation Example
The figures in this example illustrate a Markov-Bayes, collocated cosimulation of porosity with seismic acoustic impedance for the North Cowden Field data set. The variogram model, neighborhood configurations, and Markov-Bayes assumption are identical to those used in the collocated kriging example described in an earlier section (Figures 1a and 1b).
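The post-processed maps used in this example are simple cell-by-cell statistics computed across the stack of realizations, as listed under Uncertainty Estimation. They are the mean, standard deviation, minimum, maximum, and probability of meeting or exceeding a cutoff. A minimal Python sketch follows. It assumes each realization is stored as a flat list of cell values on a common grid. The function name, the dictionary keys, and the tiny three-realization data set are hypothetical. The standard deviation uses the population form (division by n), and `statistics.fmean` requires Python 3.8 or later.

```python
import statistics


def summarize(realizations, threshold):
    """Cell-by-cell summary maps from a stack of equally probable realizations.

    realizations: list of realizations, each a flat list of cell values on the same grid.
    threshold: user-specified cutoff for the risk (probability of exceedance) map.
    """
    n = len(realizations)
    cells = list(zip(*realizations))  # one tuple of n simulated values per grid cell
    return {
        "mean": [statistics.fmean(c) for c in cells],
        "std": [statistics.pstdev(c) for c in cells],  # spread across realizations
        "min": [min(c) for c in cells],
        "max": [max(c) for c in cells],
        # percentage of realizations meeting or exceeding the threshold (0 to 100)
        "prob_ge": [100.0 * sum(v >= threshold for v in c) / n for c in cells],
    }


# Three hypothetical realizations of a two-cell porosity grid (percent)
sims = [[8.0, 12.0], [10.0, 9.0], [12.0, 6.0]]
maps = summarize(sims, threshold=10.0)
```

The minimum and maximum maps are assembled cell by cell from different realizations. For that reason they are computed maps rather than simulated results.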

Fifty simulations (Figure 2) were generated and post-processed to create the mean and standard deviation maps shown in Figures 3a and 3c. Figures 3a, 3b and 3c and Figures 4a, 4b, 4c and 4d show the results of post-processing the 50 simulations.
Figure 2 shows eleven of the 50 simulations created by a Markov-Bayes collocated cosimulation approach. Each image is a reasonable representation of porosity based on the input data. The mean of the simulated values is displayed in the lower right corner. What we see are repeating global patterns with local variability. When you see features repeating from image to image, you should have more confidence that they are real.
Figure 3a is the mean of the 50 simulations, compared to the collocated cokriging result (Figure 3b). The standard deviation or standard error map (Figure 3c) of the 50 simulations ranges from 0 to about 0.8 porosity percentage units. This map provides a measure of uncertainty based on the input data and spatial model.
Figure 4 shows the maximum (Figure 4a) and minimum (Figure 4b) values simulated at each grid node, and risk maps for two different porosity cutoffs (Figures 4c and 4d). The maximum and minimum displays only show the range of simulated values. Do not use them as the pessimistic and optimistic cases, because they are computed maps, not simulated results. Figure 4c shows the probability that the porosity is ≥ 8%, and Figure 4d shows the probability that the porosity is ≥ 10%. These risk displays are very useful for risk analysis.
Other Points To Consider When Performing A Simulation
Many stochastic simulation methods, notably the Gaussian methods such as SGS, assume that the data follow a Normal Distribution. This assumption is easily checked using q-q plots. A q-q plot is a graphical approach that plots ordered data values against the expected values of those observations under a reference distribution. If the values follow the reference distribution, the points in such a plot will follow a straight line. Departures from the line show how the data differ from the assumed distribution. If the sample data are reasonably normal, it may not be necessary to transform them. However, if the data are skewed (Figure 5a), for example, it may be necessary to transform them.
One commonly used transformation that converts any data set into a Normal Distribution is the Hermite polynomial method (Wackernagel, 1995; Hohn, 1988). This method fits a polynomial with n terms to the histogram and maps the data from one domain to another. Variogram modeling, kriging and simulation are then performed on the transformed variable. Finally, the gridded results are back-transformed using the stored Hermite coefficients. A Hermite polynomial transform of 55 porosity data values is shown in Figures 5a, 5b, 5c and 5d.
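The Hermite polynomial fit itself is not reproduced here. The sketch below illustrates the same forward and back-transform workflow with a different, simpler, and also widely used technique: a quantile-matching normal score transform. Each sorted datum is paired with the standard normal quantile at its plotting position. Values are then mapped in either direction by linear interpolation in that table. Several details are assumptions for illustration: the plotting positions (i + 0.5)/n, the clamping of values outside the data range, the function names, and the sample porosity values.

```python
from bisect import bisect_left
from statistics import NormalDist


def normal_score_table(data):
    """Pair the sorted data with standard normal quantiles at plotting positions (i + 0.5) / n."""
    z = sorted(data)
    n = len(z)
    y = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return z, y


def _interp(x, xs, ys):
    """Piecewise-linear interpolation in an increasing table; clamps outside the table range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_left(xs, x)  # guarantees xs[j - 1] < x <= xs[j]
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def to_normal(value, table):
    """Forward transform: data units -> normal scores."""
    z, y = table
    return _interp(value, z, y)


def back_transform(score, table):
    """Back-transform: normal scores (e.g. simulated grid values) -> data units."""
    z, y = table
    return _interp(score, y, z)


# Hypothetical porosity values (percent), truncated at a pay cut-off
porosity = [6.2, 6.8, 7.5, 8.1, 9.4, 10.2, 12.9, 15.3]
table = normal_score_table(porosity)
scores = [to_normal(v, table) for v in porosity]
```

Variogram modeling and simulation would then be carried out on `scores`. Each simulated grid value would be passed through `back_transform` to return it to porosity units.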

Figure 5a shows a truncated porosity distribution (no values lower than 6%) because a cut-off was used for pay estimation. This approach to pay estimation creates the skewed distribution. If we want to honor this histogram (Figure 5a) in the simulation process, we must transform the original data into a Gaussian (normal) distribution (Figure 5b). Figures 5c and 5d show the results of a Hermite polynomial modeling approach to transforming the data. The modeled histogram (blue) is superimposed on the raw histogram (black) in Figure 5c. The cumulative histogram (Figure 5d) shows a reasonably good match between the model (blue) and the original (black) data. The purpose of the transformation is to model the overall shape and not every nuance of the raw data, which may only be an approximation.
OVERVIEW
INTRODUCTION
Several very good public domain geostatistical mapping and modeling packages are available to anyone with access to a personal computer. These packages are fairly sophisticated. They reflect the evolution in personal computer graphics and interfaces, as well as advances in geostatistical technology. Geostatistical algorithms are complicated to program and debug, considering all the possible combinations of hardware and operating systems. These programs are placed into the public domain with the understanding that the user is ultimately responsible for their proper use.
In this section, five software packages are reviewed, with information on how to obtain them. The geostatistical packages STATPAC, Geo-EAS, GEOPACK, Geostatistical Toolbox, and GSLIB are reviewed in their approximate chronological order of appearance in the public domain. For a more complete review, see the article by Clayton (1994).

Another word of caution is that different authors use different nomenclature and mathematical conventions, which only confuses the issue further. The International Association for Mathematical Geology has attempted to standardize geostatistical jargon through the publication of the Geostatistical Glossary and Multilingual Dictionary, edited by Ricardo Olea.
STATPAC
STATPAC (STATistical PACkage) is a collection of general-purpose statistical and geostatistical programs developed by the U.S. Geological Survey. The programs were originally developed for use in applied geochemistry and petrology within the USGS. They were compiled in their current form by David Grundy and A. T. Miesch, and were last updated in May 1988. They were released as USGS Open-File Reports 87-411-A, 87-411-B, and 87-411-C. This early program was developed for the older XT PCs, and thus does not take advantage of the quality graphical routines now available. The geostatistical program only works for 2-dimensional spatial data analysis. The limited graphic capabilities may discourage beginning practitioners from using this software, even though STATPAC may have some advantages over other public domain software (Clayton, 1994).
Order STATPAC from the following sources:
Books and Open-File Reports
U.S. Geological Survey
Federal Center
P.O. Box 25425
Denver, CO 80225
Telephone: (303) 236-7476
Order Reports: OF 87-411-A, B, C
Cost: About $100
GeoApplications
P.O. Box 41082
Tucson, AZ 85717-1082
Telephone: (602) 323-9170
Fax: (602) 327-7752
Cost: call or fax for current pricing
GEO-EAS
Evan Englund (USGS) and Allen Sparks (Computer Sciences Corporation) developed Geo-EAS (Geostatistical Environmental Assessment Software) for the U.S. Environmental Protection Agency. It is intended for environmental site assessment and for monitoring data collected on a spatial network. Version 1.2.1 was compiled in July 1990. Geo-EAS provides practical geostatistical applications for individuals with a working knowledge of geostatistical concepts. The integrated program layout, interface design, and excellent user's manual make this an excellent instructional or self-study tool for learning geostatistical analysis (Clayton, 1994).
Order Geo-EAS from the following sources:

Computer Oriented Geological Survey
P.O. Box 370246
Denver, CO 80237
Telephone: (303) 751-8553
Cost: call for current pricing
National Technical Information Service
Springfield, VA 22161
Telephone: (703) 487-4650
Fax: (703) 321-8547
Cost: about $100
IGWMC USA
Institute for Ground-Water Research and Education
Colorado School of Mines
Golden, CO 80401-1887
Telephone: (303) 273-3103
Fax: (303) 272-3278
Cost: call for current pricing
GeoApplications
P.O. Box 41082
Tucson, AZ 85717-1082
Telephone: (602) 323-9170
Fax: (602) 327-7752
Cost: call or fax for current pricing
GEOPACK
This is a geostatistical package suitable for teaching, research and project work, released by the EPA. S. R. Yates (U.S. Department of Agriculture) and M. V. Yates (University of California-Riverside) developed GEOPACK. Version 1.0 was released in January 1990. GEOPACK is useful for mining, petroleum, environmental, and research projects for individuals who do not have access to a powerful workstation or mainframe computer. It is designed for both novice and experienced geostatistical practitioners (Clayton, 1994).
Order GEOPACK from:
Computer Oriented Geological Survey
P.O. Box 370246
Denver, CO 80237
Telephone: (303) 751-8553
Cost: call for current pricing
Robert S. Kerr Environmental Research Laboratory
Office of Research and Development
U.S. EPA
Ada, OK 74820
IGWMC USA
Institute for Ground-Water Research and Education
Colorado School of Mines
Golden, CO 80401-1887
Telephone: (303) 273-3103
Fax: (303) 272-3278
Cost: call for current pricing
GEOSTATISTICAL TOOLBOX
FSS International, a consulting company specializing in natural resources and risk assessment, makes Geostatistical Toolbox available to the public. The program was developed and written by Roland Froidevaux, and Version 1.30 was released in December 1990. Geostatistical Toolbox is a PC-based, interactive, user-friendly geostatistical toolbox for workers in the mining, petroleum, and environmental industries, offering full 2-D and 3-D applications. It is also suitable for teaching and academic applications. The program has been rigorously tested and is recommended for anyone wanting an excellent 2-dimensional geostatistical package.
Order Geostatistical Toolbox from the following sources:
Computer Oriented Geological Survey
P.O. Box 370246
Denver, CO 80237
Telephone: (303) 751-8553
Cost: call for current pricing
FSS International
Offices at:
800 Millbank, False Creek South, Vancouver, BC, Canada V5Z 3Z4
10 Chemin de Drize, 1256 Troinex, Switzerland
245 Moonshine Circle, Reno, NV 89523, USA
P.O. Box 657, Epping 2121 NSW, Australia
GSLIB
GSLIB is a library of geostatistical programs developed at Stanford University under the direction of Andre Journel, director of the Stanford Center for Reservoir Forecasting. Oxford University Press published the user's guide and FORTRAN programs authored by Clayton Deutsch and Andre Journel (1992). GSLIB is the most advanced public domain geostatistical software available. It addresses the needs of graduate students and advanced geostatistical practitioners, but is also a useful resource for the novice. The program library does not contain executable code, but rather uncompiled ASCII FORTRAN program listings. These programs will run on any computer platform that can compile FORTRAN.

273-286. Bargas. Zinger and M. Tulsa. L. S. M. Yarus and R. (1994). and G. G. No.” in Stochastic Modeling and Geostatistics. J. Eds. “Integration of Seismic and Well Log Data in Reservoir Modeling. 1994. and W. NJ 27513 Telephone: 1-800-451-7756 Order: GSLIB: Geostatistical Software Library and User’s Guide by Clayton V. “Geostatistical Modeling of Chalk Reservoir Properties in the Dan Field. “3-D Implementation of Geostatistical Analyses-The Amoco Case Study. R. Dordrecht. J. J. 1994. Chambers. T. G. 3. L. Oklahoma. Chambers. L. B. 1994. L. 201-216. Chu.. AAPG Computer Applications in Geology. W. Hewett. T. No. “Importance of a Geological Framework and Seismic Data Integration for Reservoir Modeling and Subsequent Fluid-Flow Predictions. (1994). J. W. Armstrong. M. the novice may find introductory texts. AAPG Computer Applications in Geology. (1994). Reidel. and P. B. M. Ed. pp. with theoretical background.. C. Schweller. M. M. Eds. Chambers. Deutsch and Andre Journel (ISBN 0-19-507392-4) Cost: $49. pp.. Bashore. Araktingi. U. Araktingi. Eds. Havholm. . C. Kelly. 143-58. pp. No. Srivastava.” in Stochastic Modeling and Geostatistics. (1994).. 3. Frykman. Geostatistical Case Studies. pp. Linville. L. Bashore. Utah Overthrust (USA). 515-554. Yarus and R. AAPG Computer Applications in Geology. J. such as Hohn (1988) or Isaaks and Srivastava (1989). Danish North Sea. L. K. Tran. (1994). and T. Order GSLIB from the following sources: Oxford University Press Business and Customer Service 2001 Evans Road Cary. “Integrated Modeling for Optimum Management of a Giant Gas Condensate Reservoir. Jurassic Eolian Nugget Sandstone. W. “Constraining Geostatistical Reservoir Descriptions with 3-D Seismic Data to Reduce Uncertainty. 1994. useful supplementary reading. L. Though not an exhaustive list.” in Reservoir Characterization III. M. G. M.50 postage You can also order this book through most bookstores. J.” in Stochastic Modeling and Geostatistics. Journel. Linquist. 
and R.computer platform that can compile FORTRAN.. Eds. 1994.. pp. Chambers.” in Stochastic Modeling and Geostatistics. G. M. S. M. SELECTED PUBLISHED GEOSTATISTICAL CASE STUDIES The following list of published case studies provides an excellent overview of geostatistical applications within the petroleum industry. 1993. U.95 plus $2. 3. Although the user‟s guide is well written and documents the program in an organized text-like fashion. 3. 1987. Levy. 159-176. Almeida. Matheron. Yarus and R. M.” in Stochastic Modeling and Geostatistics. Yarus and R. Cox. Xu and A. No. A. Anschutz Ranch East Field. Pennwell Publishing. D. J. it is a representative list of case studies. Chambers. A. AAPG Computer Applications in Geology..

(1994).” in Stochastic Modeling and Geostatistics. 3.” in Stochastic Modeling and Geostatistics. E. Aasen. 105-119. Journel. Eds. L. Bayesian Sequential Indicator Simulation of Channel Sands in the Oseberg Field. 3. (1994). Eds. 217-240. No. 3. Hollund. L. and F. MacDonald. M. 42. L. AAPG Computer Applications in Geology. Armstrong and G. “A Prototype Procedure for Stochastic Modeling of Facies Tract Distribution in Shoreface Reservoirs. Ed. “Geostatistical Analysis of Oil Production and Potential Using Indicator Kriging. G. 1990. J. T. SPE Annual Technical Conference and Technical Exhibition. 3. L. pp. A. P. “New Method for Reservoir Mapping. J.” Journal of Petroleum Technology.. AAPG Computer Applications in Geology. pp. Yarus and R. Chambers.. “Fractal Methods for Fracture Characterization. pp. Guidish.. 1987. 287-322. No. 1992. 1994.. Doyen. Psaila and S. 7. M. A. J. (1994). 1994.. Hohn. and T.” in Stochastic Modeling and Geostatistics. E. 1988. and A G. 243-250. 91-108. V. Eds. pp. Kelkar. R. 261-272. 1994. J. AAPG Computer Applications in Geology. Meunier. Yarus and R.. “Integrating Well Test-Derived Effective Absolute Permeabilities in Geostatistical Reservoir Modeling. Reidel. and J. 109-120. AAPG Computer Applications in Geology. pp. E. Chambers. M. . Vol. Eds. AAPG Computer Applications in Geology.J. M. M. Chambers. M. M. 53. Chambers. “Numerical Modeling Combining Deterministic and Stochastic Methods. T. “Stochastic Modeling of Troll West with Special Emphasis on the Thin Oil Zone. Tulsa. (1994). Yarus and R. (1994). Doyen. Galli. Dordrecht. 3. M. 3. 3. SPE 28382. 1994. No. M. Hewett. Yarus and R. R. pp. C. Doyen.. J. Strandenes. M. (1994). Sheriff.” in Stochastic Modeling and Geostatistics. 1263-1295. and S. M. 212-218. A.” Geophysics.” in Stochastic Modeling and Geostatistics. Oklahoma. AAPG Computer Applications in Geology. and R. M. Damsleth. Eds. 121-130. pp. O. 2. Eds. pp. pp. Chambers. Vol. Norwegian North Sea. Matheron. Yarus and R. P. (1994). 
Moinard, L., 1987, Application of Kriging to the Mapping of a Reef from Wireline Log and Seismic Data: A Case History, in G. Matheron and M. Armstrong, eds., Geostatistical Case Studies: D. Reidel, Dordrecht, pp. 93-103.
Srivastava, R. M., 1994a, An Overview of Stochastic Methods for Reservoir Characterization, in J. M. Yarus and R. L. Chambers, eds., Stochastic Modeling and Geostatistics: AAPG Computer Applications in Geology, No. 3, pp. 3-16.
Srivastava, R. M., 1994b, The Visualization of Spatial Uncertainty, in J. M. Yarus and R. L. Chambers, eds., Stochastic Modeling and Geostatistics: AAPG Computer Applications in Geology, No. 3, pp. 339-346.
Tyler, K., A. Henriquez, and T. Svanes, 1994, Modeling Heterogeneities in Fluvial Domains: A Review of the Influences on Production Profiles, in J. M. Yarus and R. L. Chambers, eds., Stochastic Modeling and Geostatistics: AAPG Computer Applications in Geology, No. 3, pp. 77-90.
Wolf, D. J., K. D. Withers, and M. D. Burnaman, 1994, Integration of Well and Seismic Data Using Geostatistics, in J. M. Yarus and R. L. Chambers, eds., Stochastic Modeling and Geostatistics: AAPG Computer Applications in Geology, No. 3, pp. 177-200.
Xu, W., T. T. Tran, R. M. Srivastava, and A. G. Journel, 1992, Integrating Seismic Data in Reservoir Modeling: The Collocated Cokriging Alternative, in Proceedings of the 67th Annual Technical Conference of the Society of Petroleum Engineers, Washington, SPE 24742, pp. 833-842.

REFERENCES
Books
Agterberg, F. P., 1974, Geomathematics: Elsevier Scientific Publishing, 596 p.
Armstrong, M., 1998, Basic Linear Geostatistics: Springer-Verlag, Berlin, 153 p.
Clark, I., 1979, Practical Geostatistics: Applied Science Publishers, London, 129 p.
Cressie, N. A. C., 1991, Statistics for Spatial Data: John Wiley & Sons, New York, 900 p.
David, M., 1977, Geostatistical Ore Reserve Estimation: Elsevier Scientific Publishing, 364 p.
Davis, J. C., 1986, Statistics and Data Analysis in Geology, Second Edition: John Wiley & Sons, New York, 646 p.
Deutsch, C. V., and A. G. Journel, 1992, GSLIB: Geostatistical Software Library and User's Guide: Oxford University Press, New York, 340 p.
Goovaerts, P., 1997, Geostatistics for Natural Resources Evaluation: Oxford University Press, New York, 483 p.
Henley, S., 1981, Nonparametric Geostatistics: Elsevier Applied Science Publishers, Essex, 145 p.
Hohn, M. E., 1988, Geostatistics and Petroleum Geology: Van Nostrand Reinhold, New York, 264 p.
Isaaks, E. H., and R. M. Srivastava, 1989, An Introduction to Applied Geostatistics: Oxford University Press, New York, 561 p.
Journel, A. G., 1989, Fundamentals of Geostatistics in Five Easy Lessons: Short Course in Geology, Vol. 8, American Geophysical Union, Washington, D.C., 40 p.
Journel, A. G., and C. J. Huijbregts, 1978, Mining Geostatistics: Academic Press, 600 p.
Kachigan, S. K., 1986, Statistical Analysis: An Interdisciplinary Introduction to Univariate and Multivariate Methods: Radius Press, New York, 589 p. (ISBN 0-942154-99-1).
Olea, R. A., 1991, Geostatistical Glossary and Multilingual Dictionary: Oxford University Press, New York, 177 p.
Verly, G., M. David, A. G. Journel, and A. Marechal, eds., 1984, Geostatistics for Natural Resources Characterization: NATO ASI Series C-122, D. Reidel, Dordrecht.
Wackernagel, H., 1995, Multivariate Geostatistics: An Introduction with Applications: Springer-Verlag, Berlin, 256 p.
Papers
Anguy, Y., R. Ehrlich, C. M. Prince, V. L. Riggert, and D. Bernard, 1994, The Sample Support Problem for Permeability Assessment in Sandstone Reservoirs, in J. M. Yarus and R. L. Chambers, eds., Stochastic Modeling and Geostatistics: AAPG Computer Applications in Geology, No. 3.
Ates, H., and E. Kasap, 1995, Comparison of Single and Multi-Facies Variograms of Newcastle Sandstone: Measures for the Distribution of Barriers to Flow: SPE Paper 29596.
Chu, J., W. Xu, H. Zhu, and A. G. Journel, 1991, The Amoco Case Study: Stanford Center for Reservoir Forecasting, Report 4, 73 p.
Clark, I., 1979, The Semivariogram, Part 1: Engineering and Mining Journal, v. 180, no. 7, pp. 90-94.
Clark, I., 1979, The Semivariogram, Part 2: Engineering and Mining Journal, v. 180, no. 8, pp. 90-97.
Cressie, N. A. C., 1990, The Origins of Kriging: Mathematical Geology, v. 22, no. 3, pp. 239-252.
Cressie, N. A. C., and D. M. Hawkins, 1980, Robust Estimation of the Variogram: Mathematical Geology, v. 12, no. 2, pp. 115-125.
Davis, J. M., R. C. Lohmann, F. M. Phillips, J. L. Wilson, and D. W. Love, 1992, A Sedimentological-Geostatistical Model of Aquifer Heterogeneity Based on Outcrop Studies (abstract): EOS, Transactions, American Geophysical Union, v. 73, p. 122.
Dimitrakopoulos, R., and A. J. Desbarats, 1993, Geostatistical Modeling of Gridblock Permeabilities for 3D Reservoir Simulators: SPE Reservoir Engineering, v. 8, no. 1, pp. 13-18.
Englund, E. J., 1990, A Variance of Geostatisticians: Mathematical Geology, v. 22, no. 4.
Fogg, G. E., F. J. Lucia, and R. K. Senger, 1991, Stochastic Simulation of Interwell-Scale Heterogeneity for Improved Prediction of Sweep Efficiency in a Carbonate Reservoir, in L. W. Lake, H. B. Carroll, and T. C. Wesson, eds., Reservoir Characterization II: Academic Press, Orlando, Florida, pp. 355-381.
Gomez-Hernandez, J. J., and R. M. Srivastava, 1990, ISIM3D: An ANSI-C Three-Dimensional Multiple Indicator Conditional Simulation Program: Computers & Geosciences, v. 16, no. 4, pp. 395-440.
Haldorsen, H. H., and E. Damsleth, 1990, Stochastic Modeling: Journal of Petroleum Technology, April 1990, pp. 404-412.
Isaaks, E. H., and R. M. Srivastava, 1988, Spatial Continuity Measures for Probabilistic and Deterministic Geostatistics: Mathematical Geology, v. 20, no. 4, pp. 313-341.
Journel, A. G., 1988, Non-parametric Geostatistics for Risk and Additional Sampling Assessment, in L. Keith, ed., Principles of Environmental Sampling: American Chemical Society, pp. 45-72.
Krige, D. G., 1951, A Statistical Approach to Some Basic Mine Valuation Problems on the Witwatersrand: Journal of the Chemical, Metallurgical and Mining Society of South Africa, v. 52, pp. 119-139.
Matheron, G., 1963, Principles of Geostatistics: Economic Geology, v. 58, pp. 1246-1266.
Olea, R. A., 1977, Measuring Spatial Dependence with Semivariograms: Kansas Geological Survey, Series on Spatial Analysis, No. 3, Lawrence, Kansas, 29 p.
Olea, R. A., 1994, Fundamentals of Semivariogram Estimation, Modeling, and Usage, in J. M. Yarus and R. L. Chambers, eds., Stochastic Modeling and Geostatistics: AAPG Computer Applications in Geology, No. 3, pp. 27-36.
Public Domain Geostatistics Programs: STATPAC, Geostatistical Toolbox, GEOPAC, Geo-EAS, and GSLIB, 1994, in J. M. Yarus and R. L. Chambers, eds., Stochastic Modeling and Geostatistics: AAPG Computer Applications in Geology, No. 3.
Rehfeldt, K. R., J. M. Boggs, and L. W. Gelhar, 1992, Field Study of Dispersion in a Heterogeneous Aquifer, 3: Geostatistical Analysis of Hydraulic Conductivity: Water Resources Research, v. 28, no. 12, pp. 3309-3324.
Royle, A. G., 1979, Why Geostatistics?: Engineering and Mining Journal, v. 180, pp. 92-101.
Websites
Easton, V. J., and J. H. McColl, Statistics Glossary v1.1 (HTML editing by Ian Jackson), http://www.cas.lancs.ac.uk/glossary_v1.1/main.html. A helpful and authoritative glossary, with definitions explained in plain language and accompanied by equations.

ADDITIONAL READING
Caers, J., 2001, Geostatistical Reservoir Modeling Using Statistical Pattern Recognition: Journal of Petroleum Science and Engineering, v. 29, pp. 177-188.
Capen, E. C., 1993, A Consistent Probabilistic Approach to Reserves Estimates: Society of Petroleum Engineers Hydrocarbon Economics and Evaluation Symposium, SPE Paper 25830, pp. 117-122.
Emanuel, A. S., G. K. Alameda, R. A. Behrens, and T. A. Hewett, 1988, Reservoir Performance Prediction Methods Based on Fractal Geostatistics (abstract): AAPG Bulletin, v. 72, p. 181.
Ioannidis, M. A., M. J. Kwiecien, and I. Chatzis, 1996, Statistical Analysis of the Porous Microstructure as a Method for Estimating Reservoir Permeability: Journal of Petroleum Science and Engineering, v. 16, pp. 251-261.
Pawar, R. J., E. B. Edwards, and E. M. Whitney, 2001, Geostatistical Characterization of the Carpinteria Field, California: Journal of Petroleum Science and Engineering, v. 31, pp. 175-192.
Smyth, M., and M. J. Buckley, 1993, Statistical Analysis of the Microlithotype Sequences in the Bulli Seam, Australia, and Relevance to Permeability for Coal Gas: International Journal of Coal Geology, v. 22, pp. 167-187.

TERMINOLOGY
In compiling this list of geostatistical terminology, no attempt was made to duplicate the more extensive glossary by Ricardo Olea (1991); only the most commonly encountered terms were selected. Some definitions may differ slightly from those of Olea.
Admissibility (of semivariogram models): for a given covariance model, the kriging variance must be ≥ 0; more generally, the variance of any linear combination of the data computed from the model must never be negative. This condition is also known as positive definite.
Anisotropy: refers to changes in a property when measured along different axes. In geostatistics, anisotropy refers to covariance models that have major and minor ranges of different distances (correlation scales or lengths). This condition is most easily seen when a variogram shows a longer range in one direction than in another. In this module, we discuss two types of anisotropy:
- Geometric anisotropic covariance models have the same sill, but different ranges.
- Zonal anisotropic covariance models have the same range, but different sills.

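Geometric anisotropy is commonly handled by rescaling the separation vector so that a single isotropic model can be used: the lag component along the major axis is divided by the major range, and the component along the minor axis by the minor range. The Python sketch below assumes 2-D coordinates and an azimuth measured clockwise from north; the function name and the example ranges are hypothetical, chosen only to illustrate the idea.

```python
import math

def anisotropic_distance(dx, dy, azimuth_deg, major_range, minor_range):
    """Reduced (dimensionless) lag for a geometrically anisotropic model.

    dx, dy       separation in easting and northing
    azimuth_deg  direction of the major axis, clockwise from north (assumed convention)
    Returns a distance in which both the major and the minor range become 1.
    """
    az = math.radians(azimuth_deg)
    # Project the lag onto the major axis (unit vector (sin az, cos az)) and onto the
    # perpendicular minor axis (unit vector (cos az, -sin az)).
    along_major = dx * math.sin(az) + dy * math.cos(az)
    along_minor = dx * math.cos(az) - dy * math.sin(az)
    # Divide each component by its own range, then take the Euclidean length.
    return math.hypot(along_major / major_range, along_minor / minor_range)

# Hypothetical example: major axis at N45E, ranges of 2000 m (major) and 500 m (minor).
az = math.radians(45.0)
d_major = anisotropic_distance(2000 * math.sin(az), 2000 * math.cos(az), 45.0, 2000.0, 500.0)
d_minor = anisotropic_distance(500 * math.cos(az), -500 * math.sin(az), 45.0, 2000.0, 500.0)
print(round(d_major, 6), round(d_minor, 6))  # 1.0 1.0: each lag sits exactly at its own range
```

Zonal anisotropy cannot be captured by this rescaling alone, because the sills differ by direction; it is usually modeled by adding a structure that acts in one direction only.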
Auto-correlation: a method of computing a spatial covariance model for a regionalized variable. It measures a change in variance (variogram) or correlation (correlogram) with distance and/or azimuth.
Biased estimates: seen when there is a correlation between standardized errors and estimated values (see Cross-validation). A histogram of the standardized errors is skewed, suggesting a bias in the estimates, so that there is a chance that one area of a map will always show estimates higher (or lower) than expected.
Block kriging: making a kriging estimate over an area, using nearby sample values, for example estimating the average value over a grid cell. The grid cell is divided into a specified number of sub-cells, a value is kriged to each sub-cell, and the average value is then placed at the grid node.
Cokriging: the process of estimating a regionalized variable from two or more variables, using a linear combination of weights obtained from models of spatial auto-correlation and cross-correlation. The multivariate version of kriging.
Conditional bias: a problem arising from insufficient smoothing, which causes high values of an attribute to be overstated while low values are understated.
Conditional simulation: a geostatistical method to create multiple (and equally probable) realizations of a regionalized variable based on a spatial model. It is conditional only when the actual control data are honored. Conditional simulation is a variation of conventional kriging or cokriging: by relaxing some of the kriging constraints (e.g., minimized square error), conditional simulation is able to reproduce the variance of the control data. The final "map" captures the heterogeneity and connectivity most likely present in the reservoir, and can be considered as an extrapolation of the data, as opposed to the interpolations produced by kriging. Post-processing conditional simulations produces a measure of error (standard deviation) and other measures of uncertainty, such as iso-probability and uncertainty maps. Simulations are not estimations; their goal is to characterize variability or risk.
Coregionalization: the mutual spatial behavior between two or more regionalized variables.
Correlogram: a measure of spatial dependence (correlation) of a regionalized variable over some distance. The correlogram can also be calculated with an azimuthal preference.
Covariance: a measure of how two variables (or one variable at two locations) vary together. The kriging system uses covariance, rather than variogram or correlogram values, to determine the kriging weights. The covariance can be considered as the mirror image of the variogram: it equals the sill minus the variogram model, C(h) = C(0) - γ(h), or equivalently the sill multiplied by the correlogram.
Cross-correlation: a technique used to compute a spatial cross-covariance model between two regionalized variables. This provides a measure of spatial correlation between the two variables. It produces a bivariate analogue of the variogram.
Cross-validation: a procedure to check the compatibility between a data set, its spatial model and neighborhood design. First, each sampled location is kriged with all other samples in the search neighborhood (the sample itself is left out). The estimates are then compared against the true sample values. Significant differences between estimated values and true values may be influenced by outliers or other anomalies. This technique is also used to check for biased estimates produced by poor model and/or neighborhood design.
Drift: often used to describe data containing a trend. Drift usually refers to short-scale trends at the size of the neighborhood.
Estimation variance: the kriging variance at each grid node. This is a measure of global reliability, not a local estimation of error.
Experimental variogram: a measure of spatial dependence (dissimilarity or increasing variability) of a regionalized variable over some distance and/or direction. This is the variogram that is based upon the sample data, and upon which the model variogram will be fitted.
External drift: a geostatistical linear regression technique that uses a spatial model of covariance when a secondary regionalized variable (e.g., a seismic attribute) is used to control the shape of the final map created by kriging or simulation.
Geostatistics: the statistical method used to analyze spatially (or temporally) correlated data and to predict the values of such variables distributed over distance or time.
h-Scatterplot: a plot obtained by selecting a value for separation distance, h, then plotting the pairs Z(x) and Z(x+h) as the two axes of a bivariate plot. The shape and correlation of the cloud is related to the value of the variogram for distance h.
Histogram: a plot which shows the frequency or number of occurrences (Y-axis) of data falling into size classes of equal width (X-axis).
Indicator variable: a binary transformation of data to either 1 or 0, depending on whether the value of the data point surpasses or falls short of a specified cut-off value.
Interpolation: estimation technique in which samples located within a certain search neighborhood are weighted to form an estimate, such as the kriging technique.
Inverse distance weighting: non-geostatistical interpolation technique that assumes that attributes vary according to the inverse of their separation (raised to some power).
Iso-probability map: maps created by post-processing conditional simulations to show the value of the regionalized variable at a constant probability threshold, for example at the 10th, 50th (median), or 90th percentile. These maps provide a level of confidence in the mapped results.
Kriging: a method of calculating estimates of a regionalized variable using a linear combination of weights obtained from a model of spatial correlation. It assigns weights to samples to minimize estimation variance. The univariate version of cokriging.
Kriging variance: see estimation variance.
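The Experimental variogram, h-Scatterplot and Lag entries come together in the usual semivariance estimator, γ(h) = 1/(2N(h)) Σ [Z(x) - Z(x+h)]², averaged over the N(h) pairs in each lag class. The Python sketch below is omnidirectional (no azimuth), uses the ± one-half-lag tolerance described under Lag, and runs on a made-up transect with a steady trend; the function names are illustrative, not from any particular package.

```python
import math
from itertools import combinations

def experimental_variogram(coords, values, lag, n_lags):
    """Omnidirectional experimental semivariogram.

    gamma(h) = 1 / (2 N(h)) * sum of (z_i - z_j)^2 over the N(h) pairs in lag class h.
    Pairs are assigned to classes centred on k * lag (k = 1..n_lags) with a
    tolerance of +/- lag / 2. Returns (mean pair distance, gamma, pair count) per class.
    """
    sq_diff = [0.0] * (n_lags + 1)
    dist_sum = [0.0] * (n_lags + 1)
    count = [0] * (n_lags + 1)
    for (p, zp), (q, zq) in combinations(zip(coords, values), 2):
        d = math.dist(p, q)
        k = int(d / lag + 0.5)            # index of the nearest lag-class centre
        if 1 <= k <= n_lags:
            sq_diff[k] += (zp - zq) ** 2
            dist_sum[k] += d
            count[k] += 1
    return [(dist_sum[k] / count[k], sq_diff[k] / (2 * count[k]), count[k])
            for k in range(1, n_lags + 1) if count[k] > 0]

def h_scatter_pairs(coords, values, h, tol):
    """Value pairs separated by h +/- tol: the points of an h-scatterplot (order within a pair is arbitrary)."""
    return [(zp, zq) for (p, zp), (q, zq) in combinations(zip(coords, values), 2)
            if abs(math.dist(p, q) - h) <= tol]

# Made-up transect with a steady trend (drift): values 1, 2, ..., 6 at unit spacing.
coords = [(float(x), 0.0) for x in range(6)]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(experimental_variogram(coords, values, lag=1.0, n_lags=2))  # [(1.0, 0.5, 5), (2.0, 2.0, 4)]
print(h_scatter_pairs(coords, values, h=1.0, tol=0.5))
```

Because of the trend, γ grows as h²/2 here instead of levelling off at a sill, which is one way drift shows up in an experimental variogram.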

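The Cross-validation entry describes re-estimating each sample from the others and comparing the result with the true value. The Python sketch below follows that leave-one-out loop but, to stay short and dependency-free, swaps in inverse distance weighting (also defined above) as the estimator instead of kriging. Because IDW provides no variance, only raw errors are reported, not standardized errors. The function names, the power of 2 and the five data values are illustrative assumptions.

```python
import math

def idw_estimate(target, coords, values, power=2.0):
    """Inverse distance weighting: weights proportional to 1 / d**power, normalized to sum to 1."""
    weights = []
    for p, z in zip(coords, values):
        d = math.dist(target, p)
        if d == 0.0:
            return z                      # exact interpolator: a coincident sample is honored
        weights.append(1.0 / d ** power)
    return sum(w * z for w, z in zip(weights, values)) / sum(weights)

def leave_one_out(coords, values, power=2.0):
    """Cross-validation: re-estimate each sample from all the others.

    Returns a list of (estimate, true value, error = estimate - true value).
    """
    results = []
    for i in range(len(values)):
        others_xy = coords[:i] + coords[i + 1:]
        others_z = values[:i] + values[i + 1:]
        est = idw_estimate(coords[i], others_xy, others_z, power)
        results.append((est, values[i], est - values[i]))
    return results

# Made-up data: four corner samples and a deliberately anomalous centre value.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
values = [10.0, 12.0, 11.0, 13.0, 30.0]
results = leave_one_out(coords, values)
for (est, true, err), xy in zip(results, coords):
    print(f"{xy}: estimate={est:6.2f} true={true:6.2f} error={err:+6.2f}")
mean_error = sum(err for _, _, err in results) / len(results)
print(f"mean error = {mean_error:+.2f}")
```

In this toy data set the centre sample is badly under-estimated (11.5 against 30) and every corner is over-estimated because the anomalous centre value pulls its neighbours up: the kind of significant difference the Cross-validation entry attributes to outliers.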
Lag: a distance parameter (h) used during computation of the experimental covariance model. The lag distance typically has a tolerance of ± one-half the initial lag distance.
Linear estimation method: a technique for making estimates based on a linear weighted average of values, such as seen in kriging.
Model variogram: a function fitted to the experimental variogram as the basis for kriging.
Moving neighborhood: a search neighborhood designed to use only a portion of the control data points during kriging or conditional simulation.
Nested variogram model: a linear combination of two or more variogram (correlogram) models, for example, a short-range exponential model combined with a longer-range spherical model. It has more than one range, showing different scales of spatial variability. Often, it involves adding a nugget component to one of the other models.
Nonconditional simulation: a method that does not use the control data during the simulation process, quite often used to observe the behavior of a spatial model and neighborhood design.
Nugget effect: a feature of the covariance model where the experimental points defining the model do not appear to intersect the y-axis at the origin. The nugget model shows constant variance at all ranges, but is often modeled as zero variance at the control point (well location). The nugget represents a chaotic or random component of attribute variability. Abbreviated as C0 by convention.
Ordinary (co-)kriging: a technique in which the local mean varies and is re-estimated based on the control points in the search neighborhood ellipse (moving neighborhood).
Outliers: data points falling outside about ±2.5 standard deviations of the mean value of the sample population, possibly the result of bad data values or local anomalies.
Point kriging: making a kriging estimate at a specific point, for example at a grid node or a well location.
Positive definite: see admissibility.
Random function: the random function has two components: (1) a regional structure component manifesting some degree of spatial auto-correlation (regionalized variable) and lack of independence in the proximal values of Z(x), and (2) a local, random component (random variable).
Random variable: a variable created by some random process, whose values follow a probability distribution, such as a normal distribution.
Range: the distance where the variogram reaches the sill, or where the correlogram reaches zero correlation. Also known as the correlation range or correlation scale, it represents the distance at which correlation ceases. By convention it is abbreviated as a.
Regionalized variable: a variable that has some degree of spatial autocorrelation and lack of independence in the proximal values of Z(x).
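The Model variogram, Nested variogram model, Nugget effect, Range and Positive definite entries can be tied together in a few lines. The Python sketch below builds a hypothetical nested model (nugget 0.1, short-range exponential with sill 0.3 and range 200 m, long-range spherical with sill 0.6 and range 1500 m), derives the covariance C(h) = sill - γ(h) and the correlogram C(h)/C(0), and then checks admissibility directly: the variance of a linear combination of data computed from the model must not be negative. The exponential model's "practical range" factor of 3 is an assumed convention (some texts use exp(-h/a) instead), and the "box" covariance is a deliberately invalid model included only for contrast.

```python
import math

def nugget(h, c0):
    # Zero at the control point itself, constant c0 at every other distance.
    return 0.0 if h == 0 else c0

def spherical(h, c, a):
    # Reaches its sill c exactly at the range a.
    if h >= a:
        return c
    r = h / a
    return c * (1.5 * r - 0.5 * r ** 3)

def exponential(h, c, a):
    # Practical-range convention (assumed): about 95% of c is reached at h = a.
    return c * (1.0 - math.exp(-3.0 * h / a))

SILL = 0.1 + 0.3 + 0.6   # total sill of the hypothetical nested model below

def nested_gamma(h):
    """Hypothetical nested model: nugget 0.1 + exponential (0.3, 200 m) + spherical (0.6, 1500 m)."""
    h = abs(h)
    return nugget(h, 0.1) + exponential(h, 0.3, 200.0) + spherical(h, 0.6, 1500.0)

def covariance(h):
    # Mirror image of the variogram: C(h) = sill - gamma(h).
    return SILL - nested_gamma(h)

def correlogram(h):
    # Covariance standardized by C(0), which equals the sill.
    return covariance(h) / SILL

def combination_variance(cov, xs, w):
    # Var(sum_i w_i Z(x_i)) = sum_i sum_j w_i w_j C(x_i - x_j); never negative for an admissible model.
    return sum(w[i] * w[j] * cov(xs[i] - xs[j]) for i in range(len(xs)) for j in range(len(xs)))

def box_cov(h, a=1000.0):
    # Looks like a plausible covariance (1 inside the range, 0 beyond) but is NOT admissible.
    return 1.0 if abs(h) < a else 0.0

for h in (0.0, 100.0, 500.0, 1500.0, 3000.0):
    print(f"h={h:6.0f}  gamma={nested_gamma(h):.3f}  C={covariance(h):.3f}  rho={correlogram(h):.3f}")

xs = [0.0, 600.0, 1200.0]   # three locations along a line (m)
w = [1.0, -1.0, 1.0]        # weights of the combination Z1 - Z2 + Z3
print(combination_variance(covariance, xs, w) >= 0.0)   # True: the nested model is admissible
print(combination_variance(box_cov, xs, w))             # -1.0: an impossible negative "variance"
```

The same quantity with the estimate-minus-true-value weights is the kriging variance, which is why a non-admissible model can produce negative kriging variances.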

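The Point kriging, Ordinary (co-)kriging, Covariance and Weights entries describe one linear system. The Python sketch below solves the ordinary kriging system in covariance form: sample-to-sample covariances bordered by a row and column of ones that force the weights to sum to one, with a Lagrange multiplier μ. Assumptions: a spherical covariance with sill 1 and range 10 distance units, no nugget, made-up control data, hypothetical function names, and a tiny hand-written solver in place of a numerical library so the example stays self-contained.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot; this also handles the zero in the
        # bottom-right corner of the ordinary kriging matrix.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def spherical_cov(h, sill=1.0, a=10.0):
    """Spherical covariance model (no nugget): sill at h = 0, zero beyond the range a."""
    if h >= a:
        return 0.0
    r = h / a
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)

def ordinary_kriging(target, coords, values, cov=spherical_cov):
    """Point ordinary kriging; returns (estimate, kriging variance, weights)."""
    n = len(values)
    # Sample-to-sample covariances, bordered by ones for the unbiasedness constraint.
    A = [[cov(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0] for i in range(n)]
    A.append([1.0] * n + [0.0])
    # Sample-to-target covariances, plus the requirement that the weights sum to 1.
    b = [cov(math.dist(p, target)) for p in coords] + [1.0]
    solution = solve(A, b)
    weights, mu = solution[:n], solution[n]
    estimate = sum(wi * zi for wi, zi in zip(weights, values))
    # With the system written as sum_j w_j C_ij + mu = C_i0, the kriging variance is
    # C(0) - sum_i w_i C_i0 - mu.
    variance = cov(0.0) - sum(wi * ci for wi, ci in zip(weights, b[:n])) - mu
    return estimate, variance, weights

coords = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
values = [1.0, 3.0, 2.0]   # made-up control data
est, var, w = ordinary_kriging((1.0, 1.0), coords, values)
print(f"estimate={est:.3f}  kriging variance={var:.3f}  weights={[round(x, 3) for x in w]}  sum={sum(w):.3f}")
```

Kriging a location that coincides with a control point returns that point's value with zero kriging variance, which is the sense in which kriging honors the data.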
Risk map: see Uncertainty map.
Semivariogram: a measure of spatial dependence (dissimilarity or increasing variability) of a regionalized variable over some distance. The semivariogram is commonly called a variogram.
Sill: the upper level of variance, where the variogram reaches its correlation range. The variance of the sample population is the theoretical sill of the variogram.
Simple kriging: the global mean is constant over the entire area of interpolation and is based on all the control points used in a unique neighborhood (or is supplied by the user). A unique neighborhood is used with simple kriging.
Smearing: a condition produced by the interpolation process where high-grade attributes are allowed to influence the estimation of nearby lower grades.
Stationarity: the simplest definition is that the data do not exhibit a trend (spatial statistical homogeneity). This implies that a moving window average shows homogeneity in the mean and variance over the study area.
Stochastic modeling: used interchangeably with conditional simulation, although not all stochastic modeling applications necessarily use control data.
Support: the size, shape, and geometry of volumes upon which we estimate a variable. As a result, attributes measured on a small support are more variable than those measured on a larger support.
Transformation: a mathematical process used to convert the frequency distribution of a data set, for example from lognormal to normal.
Uncertainty map: maps created by post-processing conditional simulations. A threshold value is selected, for example 8% porosity, and an uncertainty map then shows, at each grid node, the probability that porosity is either above or below the chosen threshold.
Unique neighborhood: a neighborhood search ellipse that uses all available data control points. The practical limit is 100 control points.
Variogram: geostatistical measure used to characterize the spatial variability of an attribute: a plot of dissimilarity between points as a function of the distance between them. The variogram can also be calculated with an azimuthal preference. See also correlogram.
Weights: values determined during an interpolation or simulation that are multiplied by the control data points in the determination of the final estimated or simulated value at a grid node. In ordinary kriging, the weights sum to unity to create a condition of unbiasedness.
SUGGESTED REFERENCE
Olea, R. A., 1991, Geostatistical Glossary and Multilingual Dictionary: Oxford University Press, New York, 177 p.
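The Uncertainty map, Iso-probability map and Indicator variable entries describe post-processing a set of conditional simulations. The Python sketch below assumes the realizations are already available as equal-length lists of grid-node values. Running a real conditional simulation is beyond a glossary example, so it fills them with independent random numbers purely to exercise the code; these placeholders are not conditional realizations. The probability above the 8% porosity threshold is the average of the indicator transform, and the percentiles come from the standard-library statistics.quantiles function; all other names are hypothetical.

```python
import random
import statistics

def indicator(z, cutoff):
    """Indicator transform: 1 if the value surpasses the cut-off, otherwise 0."""
    return 1 if z > cutoff else 0

def uncertainty_map(realizations, cutoff):
    """Per grid node, the fraction of realizations above the cut-off (an estimate of P[Z > cutoff])."""
    n_nodes = len(realizations[0])
    return [sum(indicator(r[k], cutoff) for r in realizations) / len(realizations)
            for k in range(n_nodes)]

def iso_probability_maps(realizations):
    """Per grid node, the 10th, 50th (median) and 90th percentiles across realizations."""
    p10, p50, p90 = [], [], []
    for k in range(len(realizations[0])):
        deciles = statistics.quantiles([r[k] for r in realizations], n=10)
        p10.append(deciles[0])
        p50.append(deciles[4])
        p90.append(deciles[8])
    return p10, p50, p90

# Placeholder "realizations": independent random porosities on a 5-node grid. They only
# exercise the code and are NOT the output of a conditional simulation.
random.seed(0)
node_means = [0.05, 0.07, 0.08, 0.09, 0.12]
realizations = [[random.gauss(m, 0.01) for m in node_means] for _ in range(200)]

prob_above_8 = uncertainty_map(realizations, 0.08)   # threshold of 8% porosity
p10, p50, p90 = iso_probability_maps(realizations)
for k in range(len(node_means)):
    print(f"node {k}: P(phi > 8%)={prob_above_8[k]:.2f}  P10={p10[k]:.3f}  P50={p50[k]:.3f}  P90={p90[k]:.3f}")
```

Mapping prob_above_8 node by node gives the uncertainty map; mapping p10, p50 or p90 gives the corresponding iso-probability map.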

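For the Transformation entry, a natural-log transform is the direct route from a lognormal to a normal distribution. A rank-based normal-score transform is a common, more general alternative that maps any distribution onto a standard normal. The Python sketch below shows both on made-up permeability values; the (rank - 0.5)/n plotting position is one of several conventions and is an assumption here, as are the function names.

```python
import math
import statistics

def log_transform(values):
    """Natural-log transform: maps lognormally distributed (positive) data to a normal distribution."""
    return [math.log(v) for v in values]

def normal_score_transform(values):
    """Rank-based normal-score transform.

    The i-th smallest of n values maps to the standard-normal quantile at cumulative
    probability (i - 0.5) / n. Ties, if any, are broken by their order in the input.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = statistics.NormalDist()          # mean 0, standard deviation 1
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return scores

perm_md = [1.2, 35.0, 4.8, 150.0, 0.6, 12.0, 60.0, 2.5]   # made-up, positively skewed permeabilities (md)
print([round(v, 2) for v in log_transform(perm_md)])
print([round(s, 2) for s in normal_score_transform(perm_md)])
```

The normal-score transform preserves the rank order of the data, so results computed in normal-score space can be mapped back to the original units through the same ranking.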