
PE 362: GEOSTATISTICS

William Apau Marfo


williamapaumarfo@gmail.com
Geostatistics
Study of phenomena that vary in space and time
(Deutsch, 2002)

“Geostatistics can be regarded as a collection of numerical techniques that deal with the characterization of spatial attributes, employing primarily random models in a manner similar to the way in which time series analysis characterizes temporal data.”
(Olea, 1999)

“Geostatistics offers a way of describing the spatial continuity of natural phenomena and provides adaptations of classical regression techniques to take advantage of this continuity.”
(Isaaks and Srivastava, 1989)
Geostatistics
Geostatistics deals with spatially autocorrelated data.

Autocorrelation: correlation between elements of a series and others from the same series separated from them by a given interval. (Oxford American Dictionary)
Some spatially autocorrelated parameters of interest to reservoir
engineers: facies, reservoir thickness, porosity, permeability
Basic Components of Geostatistics

(Semi)variogram analysis – characterization of spatial correlation

Kriging – optimal interpolation; generates best linear unbiased estimate at each location; employs semivariogram model

Stochastic simulation – generation of multiple equiprobable images of the variable; also employs semivariogram model

Geostatistical routines are implemented in the major reservoir modeling packages like Petrel and Roxar’s Irap RMS; used in the generation of grids of facies, permeability, porosity, etc. for the reservoir.
COURSE OBJECTIVES

At the end of the course, students should have:

• an understanding of the theoretical foundations of geostatistics

• a good grasp of its possibilities and limitations


Course Syllabus

• Review of classical statistics: basic statistical concepts; univariate distributions and estimators; measures of heterogeneity; hypothesis testing, correlation, and regression
• Spatial modelling
• Estimation/interpolation methods
References

• Heriot-Watt IPE, Petroleum Geoscience Manual, 2011

• Olea, R.A., A Practical Primer on Geostatistics, Open-File Report 2009–1103, U.S. Geological Survey

• Bohling, G., Introduction to Geostatistics and Variogram Analysis
PE 362: GEOSTATISTICS
CHAPTER 1: REVIEW OF CLASSICAL STATISTICS

POPULATIONS AND SAMPLES

Statistical analysis is built around the concepts of “populations” and “samples.”

A population consists of a well-defined set of elements (either finite or infinite). More specifically, a population is the entire collection of those elements.

• The properties of a reservoir unit for which the geologist or engineer is required to infer (or estimate or guess) values can be considered a population

This population may be:

• the entire reservoir (e.g., the Brent Group reservoir in the North Sea),
• a subdivision of the reservoir (e.g., the Etive or Rannoch Formations of the Brent Group),
• a sedimentological entity within the reservoir (e.g., a bedform or lamina type)

• Populations possess certain numerical characteristics (such as the population mean), which are known as parameters

• Data are measured or observed values obtained by sampling the population

• A statistic is a numerical characteristic of the sample data (such as the sample mean)

• A sample is a subset of elements drawn from the population

• Samples are studied in order to make inferences about the population itself.

• The sample can be a small set of measurements (e.g., core plugs) taken from the reservoir.

• Within the population, a parameter is a fixed value, which does not change.

• Statistics are used to estimate parameters or test hypotheses about the parent population

• Unlike the parameter, the value of a statistic is not fixed, and may change from one sample to the next when more than one sample is drawn from the same population.

• In summary, one can estimate the population parameter by statistical inference from a statistic computed from some sample data

Sampling and Sampling techniques

• Samples should be acquired from the population in a random manner

Random sampling is defined by two properties:

• First, a random sample must be unbiased, so that each item in the population has the same chance of being chosen as any other item in the population
• Second, the random sample must be independent, so that selecting one item from the population has no influence on the selection of other items in the population.

• An unbiased and independent sample gives a better chance of understanding the true nature (distribution) of the population as the sample size increases

Sampling Methods

• The method of sampling affects our ability to draw inferences about our data, because we must know the probability of an observation in order to arrive at a statistical inference.

• We can sample with replacement or without replacement

• Sampling with replacement allows us the chance to pick that same value again in our sample

• Sampling without replacement prevents us from sampling that value again
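The difference between the two schemes can be sketched in a few lines of Python. This is only an illustration: the plug labels and the fixed seed are assumptions made for the example, not data from the course.

```python
import random

rng = random.Random(42)  # fixed seed (an assumption) so the sketch is repeatable

# Hypothetical labels standing in for five core plugs.
core_plugs = ["plug_1", "plug_2", "plug_3", "plug_4", "plug_5"]

# With replacement: each draw is made from the full set,
# so the same plug may appear more than once.
with_replacement = rng.choices(core_plugs, k=5)

# Without replacement: a plug that has been drawn cannot be drawn again.
without_replacement = rng.sample(core_plugs, k=3)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Drilling a well is closer to the second call: once a location has been drilled, it cannot be sampled afresh.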

Oilfield Applications to Sampling

• Sampling in the oilfield is generally biased

• For example, we may be interested in the pore volume of a particular reservoir unit for pay estimation

• Typically, we use a threshold or porosity cutoff when making the calculation, thus deliberately biasing the estimate of the true pore volume toward a larger value.

• Similarly, the process of drilling wells in a reservoir necessarily involves sampling without replacement

• Also, any drilling program will be biased toward high porosity, high permeability, high structural position, and ultimately, high production

• Success or failure of nearby wells will influence further drilling

• The sampling routine (also known as the drilling program) is highly biased and dependent

• It is common to infer the parameters for an entire reservoir (order $10^{8}$ to $10^{10}$ m$^3$) from a few cores ($10$ to $10^{2}$ m$^3$) from which limited samples are taken ($10^{-2}$ to $10^{-3}$ m$^3$)

• Therefore any sample data set will provide only a sparse and incomplete picture of the entire reservoir

MEASUREMENT SYSTEMS

• The conclusions of a quantitative study are based in part on inferences drawn from measurements

• Measurements are numerical values that reflect the amount or magnitude of some property

• The manner in which numerical values are assigned determines the measurement scale, and thereby determines the type of data analysis

Nominal Scale

• This measurement classifies observations into mutually exclusive categories of equal rank

• Observations can be classified as “red,” “green,” or “blue.” Symbols like “A,” “B,” “C,” or numbers are also often used

• In geostatistics, we may wish to predict facies occurrence, and may therefore code the facies as 1, 2 and 3, for sand, siltstone, and shale, respectively

Ordinal Scale

• Observations are ranked hierarchically

• An example in geology is Mohs' scale of hardness, in which mineral rankings extend from one to ten, with higher ranks signifying increased hardness

• The step between successive states is not equal in this scale

• In the petroleum industry, kerogen types are based on an ordinal scale, indicative of stages of organic diagenesis

Interval Scale

• This scale is so named because the width of successive intervals is constant

• An example of an interval scale is temperature. A change from 10 to 20 degrees C is the same as the change from 110 to 120 degrees C.

Ratio Scale

• Ratio scales not only have equal increments between steps, but also have a true zero point.

• Many geological measurements are based on a ratio scale, because they have units of length, volume, mass, and so forth

• For most of our geostatistical studies, we will be primarily concerned with the analysis of interval and ratio data.

• Within the petroleum industry, reservoir properties are measured along a continuum, but there are practical limits for the measurements (it would be hard to conceive of negative porosity, permeability, or thickness, or of porosity greater than 100%).

TRIALS, EVENTS, AND PROBABILITY

• A trial is an experiment that produces an outcome which consists of either a success or a failure

• An event is a collection of possible outcomes of a trial

• Probability is a measure of the likelihood that an event will occur, or a measure of that event's relative frequency

Events can be classified by their relationship to one another:
• Independent, if the occurrence of event A has no bearing on the occurrence of event B, and vice versa
• Dependent, if the occurrence of event A influences the occurrence of event B

• Mutually Exclusive if the occurrence of either
event precludes the occurrence of the other

The measure of probability is scaled from 0 to 1, where:
• 0 represents no chance of occurrence, and
• 1 represents certainty that the event will occur.

• Probability is just one tool that enables the statistician to use information from samples to make inferences about, or describe, the population from which the samples were obtained

Discrete Probability

• Describes the chance that an event will or will not occur

• For a discrete distribution, probability can be defined by the following:

$P(E) = \dfrac{\text{number of outcomes corresponding to event } E}{\text{total number of possible outcomes}}$

Coin Toss Experiment

• This is a classic example of discrete probability

• The event has two states and must occupy one or the other: it must come up either heads or tails

• Because each outcome is equally likely, the probability of obtaining a head is ½

Conditional Probability

• The concept of conditional probability is key to oil and gas exploration.

• Conditional probability: the chance that a particular event will occur depends on whether another event occurred previously

• For example, suppose an experiment consists of observing weather on a specific day

• Let event A = “snow” and B = “temperature below freezing”. Obviously, events A and B are related

• The probability of snow, P(A), is not the same as the probability of snow given the prior information that the temperature is below freezing.

• In statistical notation, the conditional probability that event A will occur given that event B has already occurred is written as P(A|B)

• Thus, we define the conditional probability of A given B as

P(A|B) = P(AB) / P(B)

• and we define the conditional probability of B given A as follows:

P(B|A) = P(AB) / P(A)
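As a small worked illustration of these two definitions, the sketch below applies them to the snow example. The day counts are invented for the example (they are assumptions, not observations); exact fractions are used so the arithmetic is easy to follow.

```python
from fractions import Fraction

# Hypothetical record of 100 days (all counts are assumptions for illustration).
days = 100
n_snow = 15        # days with snow (event A)
n_freezing = 30    # days with temperature below freezing (event B)
n_both = 12        # days with snow AND temperature below freezing (event AB)

p_a = Fraction(n_snow, days)
p_b = Fraction(n_freezing, days)
p_ab = Fraction(n_both, days)

# Conditional probabilities, following P(A|B) = P(AB)/P(B) and P(B|A) = P(AB)/P(A).
p_a_given_b = p_ab / p_b
p_b_given_a = p_ab / p_a

print("P(A)   =", p_a)
print("P(A|B) =", p_a_given_b)
print("P(B|A) =", p_b_given_a)
```

With these made-up counts, knowing that the temperature is below freezing raises the probability of snow from 3/20 to 2/5.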

Additive Law of Probability

Another approach to probability problems is based upon the classification of compound events, event relations, and two probability laws

Additive Law of Probability, which applies to unions: A ∪ B could be read as “A or B”

• The probability of the union (A ∪ B) is equal to:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

If A and B are mutually exclusive,

P(A ∩ B) = 0 and P(A ∪ B) = P(A) + P(B)
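The additive law can be checked by direct enumeration. The sketch below uses a single throw of a fair die; the events chosen ("even", "greater than 3", and so on) are assumptions picked purely for illustration.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}  # one throw of a fair die

def prob(event):
    """Probability of an event as (favourable outcomes) / (possible outcomes)."""
    return Fraction(len(event & outcomes), len(outcomes))

a = {2, 4, 6}   # "even number"
b = {4, 5, 6}   # "greater than 3"

# General case: the overlap {4, 6} must be subtracted once.
union_direct = prob(a | b)
union_by_law = prob(a) + prob(b) - prob(a & b)

# Mutually exclusive case: no overlap, so the probabilities simply add.
c = {1}
d = {2}
exclusive_by_law = prob(c) + prob(d)

print(union_direct, union_by_law, exclusive_by_law)
```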

Multiplicative Law of Probability

• Multiplicative Law of Probability applies to intersections; A ∩ B could be read as “A and B”

• Given two events, A and B, the probability of the intersection, AB, is equal to
P(AB) = P(A)P(B|A) = P(B)P(A|B)

However, if A and B are independent,

P(AB) = P(A)P(B)

Risk Analysis In Petroleum Play

• Petroleum explorationists use the concept of probability to assess the potential of a prospect to produce commercial hydrocarbons

• Risk analysis involves the estimation of the chances of exploration success (defined as finding hydrocarbons)

The probability of a prospect’s exploration success is a function of the individual probabilities concerning:
• RESERVOIR,
• SEAL,
• SOURCE/MIGRATION,
• TRAP,
• and TIMING

• Probabilities are assigned for each of the elements from 0 (meaning impossible) to 1 (meaning certain):

• P(R) - The probability that there is reservoir developed in the prospect

• P(SL) - The probability that sufficient unbreached, non-permeable seal continuously existed above and lateral to the reservoir from prior to hydrocarbon migration until the present day

• P(SR) - The probability that there is a mature hydrocarbon source rock in the vicinity of the structure and that a migration path exists

• P(TR) - The probability that a structural or stratigraphic trap is present

• P(T) - The probability that the trap was developed prior to hydrocarbon migration

Assuming these events are independent, the multiplicative law applies.

The Total Prospect Risk, P (the overall probability of exploration success), is determined as:

P = P(R) × P(SL) × P(SR) × P(TR) × P(T)
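A minimal sketch of this calculation follows. The five element probabilities are hypothetical values chosen only to show the arithmetic, and the product is valid only under the independence assumption stated above.

```python
import math

# Hypothetical element probabilities for one prospect (assumed values).
element_probabilities = {
    "reservoir P(R)": 0.8,
    "seal P(SL)": 0.7,
    "source/migration P(SR)": 0.9,
    "trap P(TR)": 0.8,
    "timing P(T)": 0.9,
}

# Multiplicative law for independent events: multiply the five probabilities.
p_total = math.prod(element_probabilities.values())

print(f"P = {p_total:.3f}")
```

Even though every individual element is rated 0.7 or better, the product is only about 0.36, because the prospect succeeds only if all five elements are present together.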

As a guideline, prospects can be classified by their value of P as follows:

• “Needs more work” prospect: P < 0.4
• High-risk prospect: 0.4 < P < 0.6
• Low-risk prospect: 0.6 < P < 0.8
• Very good prospect: 0.8 < P

RANDOM VARIABLES

• A random variable can be defined as a numerical outcome of an experiment whose values are generated randomly according to some probabilistic mechanism

• The throwing of a die, for example, produces values randomly from the set {1, 2, 3, 4, 5, 6}

• The coin toss is another experiment that produces numbers randomly. In the case of a coin toss, however, we first need to assign numerical values, such as “heads” = 0 and “tails” = 1; then we can draw randomly from the set {0, 1}.

• The concept is used extensively in geostatistics to characterize a population or convey the unknown value that an attribute may take at any spatiotemporal location

TWO CLASSES OF RANDOM VARIABLES

There are two different classes of random variables, with the distinction based on the sample interval associated with the measurement

• DISCRETE RANDOM VARIABLES

• CONTINUOUS RANDOM VARIABLES

Discrete Random Variables

• A discrete random variable may be identified by the number and nature of the values it assumes: only a finite (or countable) set of distinct values

• E.g., 0, 1, 2, 3, 4, 5, as opposed to each and every number between 0 and 1, which would produce an infinite number of values.

• In most practical problems, discrete random variables represent count (or enumerated) data, such as point counts of minerals in a thin section

• Discrete random variables are characterized by a probability distribution, which may be described by a formula, table or graph that provides the probability associated with each value of the discrete random variable

Frequency Tables and Histograms

• Discrete random variables are often recorded in a frequency table, and displayed as a histogram

• A frequency table records how often data values fall within certain intervals or classes

• A histogram is a graphical representation of the frequency table

• In a frequency distribution table, the range of variation of the data is divided into class intervals

• It is common to use a constant class width for a histogram, so that the height of each bar is proportional to the number of values within that class

• Data are conventionally ranked in ascending order, and thus can be represented as a cumulative frequency histogram, where the total number of values below certain cutoffs is shown, rather than the total number of values in each class
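The construction of a frequency table and its cumulative counterpart can be sketched as follows. The porosity values, the class width of 5 units and the lower limit of 5 are all assumptions made for illustration.

```python
from itertools import accumulate

# Hypothetical porosity measurements in percent (assumed values).
porosity = [8.2, 11.5, 12.1, 14.8, 15.3, 16.0, 17.7, 18.4, 21.9, 23.5]

lower_limit = 5.0   # lower edge of the first class
class_width = 5.0   # constant class width
n_classes = 4       # classes: [5,10), [10,15), [15,20), [20,25)

# Frequency table: count how many values fall in each class.
counts = [0] * n_classes
for value in porosity:
    class_index = int((value - lower_limit) // class_width)
    counts[class_index] += 1

# Cumulative frequencies: number of values below each upper class limit.
cumulative = list(accumulate(counts))

for i, (n, c) in enumerate(zip(counts, cumulative)):
    lo = lower_limit + i * class_width
    print(f"{lo:4.0f} to {lo + class_width:4.0f}: count = {n}, cumulative = {c}")
```

Plotting `counts` as bars gives the histogram; plotting `cumulative` gives the cumulative frequency histogram.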

Continuous Random Variables

• These variables can take an infinitely large number of possible values

• The probability density function of the continuous random variable may be plotted as a continuous curve.

• Although such curves may assume a variety of shapes, it is interesting to note that a very large number of random variables observed in nature approximate a bell-shaped curve: the normal distribution

Probability Distributions Of The Discrete Random Variable

The probability distribution of a discrete random variable consists of the relative frequencies with which a random variable takes each of its possible values

Four common probability distributions for discrete random variables are:

• Binomial
• Negative Binomial
• Poisson
• Hypergeometric

Binomial Probability Distribution

• Binomial distributions only apply to a special type of discrete random variable, called a binary variable

• Binary variables can only have two values, such as ON or OFF, SUCCESS or FAILURE, 0 or 1

• The probability distribution governing a coin toss, or a die-throwing experiment scored as success or failure, is a binomial distribution

• The total number of trials must be fixed beforehand; this is an important requirement for the binomial distribution

• The experiment consists of n repeated trials.
• Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure.
• The probability of success, denoted by p, is the same on every trial.
• The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.

• Let us look at the two-coin example.
• The sample points for this experiment are E1 (two heads), E2 and E3 (one head each) and E4 (no heads); with balanced coins each has probability ¼

• Let y equal the number of heads observed.

• The probability of each value of y may be calculated by adding the probabilities of the sample points in the numerical event.

• The numerical event y = 0 contains one sample point, E4; y = 1 contains two sample points, E2 and E3; while y = 2 contains one sample point, E1.

• The probability distribution function for y, where y = number of heads: p(0) = ¼, p(1) = ½, p(2) = ¼

• Thus, for this experiment there is a 25% chance of observing two heads from a single toss of the two coins

• The histogram contains three classes for the random variable y, corresponding to y = 0, y = 1, and y = 2

• Because p(0) = ¼, the theoretical relative frequency for y = 0 is ¼; p(1) = ½, hence the theoretical relative frequency for y = 1 is ½, etc.

• If you were to draw a sample from this population by throwing two balanced coins, say, 100 times, your histogram would appear very similar

• If you repeated the experiment with 1000 coin tosses, the similarity would be even more pronounced
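The experiment can be imitated with a short simulation. This is a sketch only: the function name `simulate_two_coins` and the seed are assumptions introduced here, and the exact frequencies will differ from run to run if the seed is changed.

```python
import random
from collections import Counter

def simulate_two_coins(n_tosses, seed=0):
    """Toss two balanced coins n_tosses times; return relative frequency of y heads."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_tosses):
        heads = rng.randint(0, 1) + rng.randint(0, 1)  # y = 0, 1 or 2
        counts[heads] += 1
    return {y: counts[y] / n_tosses for y in (0, 1, 2)}

theoretical = {0: 0.25, 1: 0.50, 2: 0.25}

for n in (100, 1000):
    observed = simulate_two_coins(n)
    print(n, observed, "theoretical:", theoretical)
```

The observed relative frequencies should scatter around ¼, ½ and ¼, and the scatter tends to shrink as the number of tosses grows.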

• A typical application of this in the oil industry is to forecast the probability of success of a drilling program.

• Each wildcat is classified as either
0 = Failure (dry hole)
1 = Success (discovery)

• The binomial distribution is appropriate when a fixed number of wells will be drilled during an exploratory program or during a single period (budget cycle) for which the forecast is made.

• Each well that is drilled in turn is presumed to be independent, which is an arguable assumption.

The probability p that a wildcat well will discover gas or oil is estimated:

1. using an industry-wide success ratio for drilling in similar areas,
2. based on the company’s own success ratio.
3. Sometimes the success ratio is a subjective guess.

• In mathematical terms:

$P = \dfrac{n!}{(n-r)!\,r!}\,(1-p)^{n-r}\,p^{r}$

P is the probability that r discoveries will be made in a drilling program of n wildcats

• For example, suppose we want to find the probability of success associated with a 5-well exploration program in a virgin basin where the success ratio is anticipated to be about 10%. What is the probability that the entire exploration program will be a total failure, with no discoveries?

The terms of the equation are

n = 5, r = 0, p = 0.10

$P = \dfrac{5!}{5!\,0!}\,(0.10)^{0}\,(0.90)^{5} = (1)(1)(0.59) \approx 0.59$

Where:
• P = the probability of making r discoveries
• r = the number of discovery wells
• p = anticipated success ratio
• n = the number of holes drilled in the exploration program
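The five-well example can be reproduced with a few lines of Python. The function name `binomial_pmf` is an assumption introduced for this sketch; `math.comb` supplies the n!/((n−r)! r!) term.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r discoveries in n independent wildcats."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

n_wells = 5
success_ratio = 0.10

# Probability of a total failure (no discoveries) in the 5-well program.
p_total_failure = binomial_pmf(0, n_wells, success_ratio)
print(f"P(no discoveries) = {p_total_failure:.2f}")

# Full distribution for r = 0 .. 5 discoveries.
for r in range(n_wells + 1):
    print(r, round(binomial_pmf(r, n_wells, success_ratio), 5))
```

The complement, 1 − P(no discoveries), is the chance of at least one discovery in the program.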

• Discrete distribution giving the probability of making r discoveries in a five-well drilling program when the success ratio (probability of discovery) is 10%

Negative Binomial Probability Distribution

• The Negative Binomial Probability Distribution can be developed to find the probability that x dry holes will be drilled before r discoveries are made

• The same conditions that govern the binomial distribution are assumed, except that the number of “trials” is not fixed

• The experiment consists of x + r repeated trials (x failures and r successes).
• Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure.
• The probability of success, denoted by p, is the same on every trial.
• The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.
• The experiment continues until r successes are observed, where r is specified in advance.

• The expanded form of the negative binomial equation is

$P = \dfrac{(r+x-1)!}{(r-1)!\,x!}\,(1-p)^{x}\,p^{r}$

Where:
P = the probability that x dry holes are drilled before the r-th discovery
r = the number of discovery wells
x = the number of dry holes
p = regional success ratio
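A short sketch of the negative binomial calculation is given below. It assumes that x counts dry holes (failures) drilled before the r-th discovery; the function name and the example values (two discoveries, 10% success ratio) are chosen only for illustration.

```python
from math import comb

def negative_binomial_pmf(x, r, p):
    """Probability that exactly x dry holes are drilled before the r-th discovery."""
    # comb(r + x - 1, x) equals (r + x - 1)! / ((r - 1)! x!)
    return comb(r + x - 1, x) * (1 - p) ** x * p**r

r_discoveries = 2
success_ratio = 0.10

# Probability of needing exactly x dry holes, for the first few x.
for x in range(4):
    print(x, round(negative_binomial_pmf(x, r_discoveries, success_ratio), 5))

# Cumulative probability that both discoveries are made by the 10th hole,
# i.e. with at most 8 dry holes.
p_by_hole_10 = sum(
    negative_binomial_pmf(x, r_discoveries, success_ratio) for x in range(9)
)
print(round(p_by_hole_10, 4))
```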

Discrete distribution giving the cumulative probability that two discoveries will be
made by or before a specified hole is drilled, when the success ratio is 10%

Poisson Probability Distribution

• A Poisson random variable is typically a count of the number of events that occur within a certain time interval or spatial area

The Poisson probability model assumes that:

• events occur independently,

• the probability that an event will occur does not change with time,

• the length of the observation period is fixed in advance,

• the probability that an event will occur in an interval is proportional to the length of the interval, and

• the probability of more than one event occurring at the same time is vanishingly small

• The equation in this case is

$p(X) = \dfrac{\lambda^{X}\,e^{-\lambda}}{X!}$

• Where p(X) = probability of occurrence of the discrete random variable X
• λ = rate of occurrence
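The Poisson probabilities are easy to tabulate. In the sketch below the function name `poisson_pmf` and the rate of 2 events per interval are assumptions chosen for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, rate):
    """Probability of observing exactly x events when the expected count is `rate`."""
    return rate**x * exp(-rate) / factorial(x)

rate = 2.0  # hypothetical average number of events per interval

for x in range(6):
    print(x, round(poisson_pmf(x, rate), 4))

# Probability of at least one event in the interval.
p_at_least_one = 1.0 - poisson_pmf(0, rate)
print(round(p_at_least_one, 4))
```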

Hypergeometric Probability Distributions

• When prospects are drilled from a finite population without replacement, the binomial distribution is not appropriate for calculating the probability of discovery, because the chance of success changes with each wildcat well.

• N: The number of items in the population.

• k: The number of items in the population that are classified as successes.

• n: The number of items in the sample.
• x: The number of items in the sample that are classified as successes.

• h(x; N, n, k): hypergeometric probability, the probability that an n-trial hypergeometric experiment results in exactly x successes, when the population consists of N items, k of which are classified as successes.

• The probability of making x discoveries in a drilling program consisting of n holes, when sampling from a population of N prospects of which S are believed to contain commercial reservoirs (S plays the role of k in the notation above), is

$P = \dfrac{\binom{S}{x}\,\binom{N-S}{n-x}}{\binom{N}{n}}$

x = the number of discoveries
N = the number of prospects in the population
n = the number of holes drilled
S = the number of commercial reservoirs

Discrete distribution giving the probability of x discoveries in three holes drilled on ten prospects, when four of the ten contain reservoirs
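The case of three holes drilled on ten prospects, four of which contain reservoirs, can be worked through as a sketch. The function name `hypergeometric_pmf` is an assumption; exact fractions are used so the results can be checked by hand.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(x, N, n, S):
    """Probability of x discoveries in n holes drawn from N prospects, S of them productive."""
    return Fraction(comb(S, x) * comb(N - S, n - x), comb(N, n))

N_prospects = 10   # prospects in the population
n_holes = 3        # holes drilled
S_reservoirs = 4   # prospects that contain reservoirs

distribution = {
    x: hypergeometric_pmf(x, N_prospects, n_holes, S_reservoirs)
    for x in range(n_holes + 1)
}

for x, p in distribution.items():
    print(f"x = {x}: {p} ({float(p):.3f})")
```

Unlike the binomial case, the chance of success changes after every hole, because each drilled prospect is removed from the population.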

Frequency Distributions Of Continuous Random Variables

• The probability density function can be represented by a continuous curve

• These functions can take on a variety of shapes

• But they can also be represented as a histogram, as shown

Normal Probability Distribution

• It is often assumed that random variables follow a normal probability density function

• Many statistical (and geostatistical) methods are based on this supposition

• The Central Limit Theorem is the foundation of the normal probability distribution.

Central Limit Theorem

• The Central Limit Theorem (CLT) states that under rather general conditions, as the sample size increases, the sums and means of samples drawn from a population of any distribution will approximate a normal distribution (Sokal and Rohlf, 1969; Mendenhall, 1971)

• If random samples of n observations are drawn from a population with finite mean, µ, and standard deviation, σ, then, as n grows larger, the sample mean, $\bar{y}$, will be approximately normally distributed with mean equal to µ and standard deviation $\sigma / \sqrt{n}$

The Central Limit Theorem consists of three statements:

1. The mean of the sampling distribution of means is equal to the mean of the population from which the samples were drawn.

2. The variance of the sampling distribution of means is equal to the variance of the population from which the samples were drawn, divided by the size of the samples.

3. If the original population is distributed normally (i.e. it is bell shaped), the sampling distribution of means will also be normal.
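The first two statements can be checked numerically with a small simulation. The sketch below draws samples from a strongly skewed (exponential) population with mean 1 and variance 1; the choice of population, the sample size of 30, the number of samples and the seed are all assumptions made for illustration.

```python
import random
import statistics

def sample_means(n, n_samples, seed=1):
    """Return the means of n_samples random samples, each of size n."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(n_samples)
    ]

n = 30
means = sample_means(n=n, n_samples=2000)

# Statement 1: the mean of the sample means is close to the population mean (1.0).
print("mean of sample means:    ", round(statistics.fmean(means), 3))

# Statement 2: their variance is close to population variance / n (1/30).
print("variance of sample means:", round(statistics.pvariance(means), 4))
print("population variance / n: ", round(1.0 / n, 4))
```

A histogram of `means` would also look far more symmetric and bell shaped than the exponential population itself.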
The most important contribution of the CLT is in statistical inference

• Many algorithms that are used to make estimations or simulations require knowledge about the population density function

• If we can accurately predict its behavior using only a few parameters, then our predictions should be more reliable

• If the CLT applies, then knowing the sample mean and sample standard deviation, the density distribution can be recreated from just those two parameters.

• Formally, the Normal Probability Density Function is represented by the following expression:

$Z = \dfrac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(Y-\mu)^{2}}{2\sigma^{2}}}$

• Z is the height of the ordinate (the density of the function), which depends on the variate Y
• the parametric mean, µ, and the standard deviation, σ, determine the location and shape of the distribution

• The Gaussian (normal) distribution can be described by the position of its maximum, which corresponds to its mean (µ)

• The points of inflection lie one standard deviation (σ) on either side of the mean

• 68.3% of all sample values fall within −1σ to +1σ of the mean

• 95.4% of the sample values fall within −2σ and +2σ of the mean

• 99.7% of the values are contained within −3σ and +3σ of the mean
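These three percentages follow from the normal distribution itself and can be verified with the error function from Python's standard library. The helper name `normal_coverage` is an assumption introduced for this sketch.

```python
import math

def normal_coverage(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within ±{k} sigma: {100 * normal_coverage(k):.1f}%")
```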

Lognormal Distribution

Many variables in the geosciences do not follow a normal distribution, but are highly skewed

Schematic histogram of sizes and numbers of oil field discoveries, in hundred-thousand-barrel equivalents

• Transforming the data to a standardized normal distribution simplifies data handling and eases comparison with different data sets.

• Data which display a lognormal distribution can be transformed to resemble a normal distribution by applying the formula ln(z) to each variate z in the data set prior to conducting statistical analysis
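The effect of the logarithmic transform can be seen on a small, strongly right-skewed data set. The field sizes below are invented for illustration (they are assumptions, not real discoveries), and the transform requires every value to be greater than zero.

```python
import math
import statistics

# Hypothetical field sizes, strongly right-skewed (assumed values, all > 0).
field_sizes = [0.2, 0.5, 0.9, 1.5, 3.0, 8.0, 25.0, 120.0]

# Apply ln(z) to each variate z.
log_sizes = [math.log(z) for z in field_sizes]

# In a skewed distribution the mean is pulled far from the median;
# after the transform the two should lie much closer together.
raw_gap = statistics.fmean(field_sizes) - statistics.median(field_sizes)
log_gap = statistics.fmean(log_sizes) - statistics.median(log_sizes)

print("raw data: mean - median =", round(raw_gap, 3))
print("ln data:  mean - median =", round(log_gap, 3))
```

Results obtained on the transformed values must be back-transformed with exp() before they are reported in the original units.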

