
Geostatistics

Ahmed Talib - 2020


WHAT IS GEOSTATISTICS?

• “Geostatistics: study of phenomena that vary in space and/or time.” (Deutsch, 2002)

• “Geostatistics can be regarded as a collection of numerical techniques that deal with the characterization of spatial attributes, employing primarily random models in a manner similar to the way in which time series analysis characterizes temporal data.” (Olea, 1999)

• “Geostatistics offers a way of describing the spatial continuity of natural phenomena and provides adaptations of classical regression techniques to take advantage of this continuity.” (Isaaks and Srivastava, 1989)

• Geostatistics deals with spatially autocorrelated data.

• Autocorrelation: correlation between elements of a series and others from the same series separated from them by a given interval. (Oxford American Dictionary)

• Some spatially autocorrelated parameters of interest to reservoir engineers: facies, reservoir thickness, porosity, permeability. A minimal sketch of how such spatial autocorrelation can be measured follows below.
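To make the idea concrete, here is a hedged sketch of an experimental semivariogram, a standard way of quantifying spatial autocorrelation. The well coordinates, porosity values, lag spacing, and the helper name experimental_variogram are invented for illustration; they are not from these notes.

```python
# Minimal sketch (invented data): an experimental semivariogram quantifies how
# similarity between porosity values decays with separation distance.
import numpy as np

def experimental_variogram(coords, values, lag, n_lags, tol=None):
    """Classical semivariogram: gamma(h) = 0.5 * mean[(z(x) - z(x+h))^2]."""
    tol = lag / 2 if tol is None else tol
    # pairwise separation distances and squared value differences
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = (values[:, None] - values[None, :]) ** 2
    i, j = np.triu_indices(len(values), k=1)          # each pair counted once
    gammas = []
    for k in range(1, n_lags + 1):
        sel = np.abs(d[i, j] - k * lag) <= tol        # pairs near lag k*lag
        gammas.append(0.5 * sq[i, j][sel].mean() if sel.any() else np.nan)
    return np.arange(1, n_lags + 1) * lag, np.array(gammas)

# Hypothetical well locations (x, y in m) and porosity values
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1000, size=(80, 2))
values = 0.15 + 0.02 * np.sin(coords[:, 0] / 200) + rng.normal(0, 0.005, 80)
lags, gamma = experimental_variogram(coords, values, lag=100.0, n_lags=8)
print(np.round(gamma, 5))   # small gamma at short lags -> strong autocorrelation
```

Low semivariance at short lags and rising semivariance at longer lags is the signature of spatially autocorrelated data.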
SPATIAL STATISTICS
Spatial statistics is a vast subject, in large part because spatial data come in so many different types:

• univariate or multivariate
• categorical or continuous
• real-valued (numerical) or not real-valued
• observational or experimental

The data locations may:

• be points, regions, line segments, or curves
• be regularly or irregularly spaced
• be regularly or irregularly shaped
• belong to a Euclidean or non-Euclidean space

(Euclidean: relating to the geometry described by Euclid, i.e. the study of angles and shapes formed by the relationships between lines.)
Three important prototypes:

1. Geostatistical data
2. Lattice data
3. Spatial point patterns

The distinctions between these three types are not always clear-cut.
Spatial Analysis and Modeling
Geostatistics is:

 A set of mathematical tools that uses:
• Deterministic (best-estimate) techniques and
• Stochastic (probabilistic, like flipping a coin) techniques
 A set of methods that:
• Can integrate different types of data
• Can aid in reservoir characterization
• Preserve heterogeneity and connectivity
 For the purpose of:
• Filling the interwell space
• Assessing uncertainty

A simple definition

Statistics: “The determination of the probable from the possible”
STATISTICS
• Like dreams, statistics are a form of wish fulfillment (Jean
Baudrillard, b. 1929)

• He uses statistics as a drunken man uses a lamppost – for support rather than illumination. (Andrew Lang, 1844–1912)

• There are three kinds of lies: lies, damn lies and statistics.
(Benjamin Disraeli, 1804-1881)
Why is Statistics Important?
• Statistics is part of the quantitative approach to knowledge. In the past, geology was largely qualitative, but it is becoming increasingly quantitative. Statistics can be used to quantify data, yet statistics are often ignored or misrepresented.

• In the “old days” geologists relied more on observational skills: “This looks like granite.”

• Nowadays geologists have large sets of numbers to deal with.

• Methods of statistical data analysis are required to retrieve information from the set of numbers in the computer.
“Explosion” of Observations

• In recent years, new and sophisticated observation methods have led to an overwhelming amount of data in many disciplines that needs to be analyzed.

• Retrieving information from the data, and choosing the right observation source for information retrieval, are the main challenges of the information age.

• Statistics is a way of stepping back and getting the big picture.


NAME    FMTN  LONG     LAT     AS   AU   CR   CU   NA    PB  U      ZN   LA
17657S  2     146.473  65.371  35   -5   60   22   0.79  36  9.4    110  40
16933S  2     145.859  65.478  42   -12  70   26   0.71  20  7.5    67   73
19063S  2     145.889  65.466  57   21   50   41   0.92  16  3      61   43
19100S  2     145.891  65.462  140  30   90   94   0.83  16  3.5    77   40
21999S  2     145.906  65.475  140  43   100  110  0.77  16  3.7    76   40
15849S  2     146.143  65.411  25   -7   70   34   0.75  20  5.7    64   37
24248S  3     146.358  65.453  16   -5   40   20   1.3   12  14.1   120  37
21935S  3     146.364  65.451  14   -8   50   20   1.3   14  16.5   110  38
17130S  3     146.374  65.449  22   -8   50   27   1.3   16  15.4   140  39
23555S  3     146.38   65.446  17   -6   40   23   1.1   16  15.1   110  34
16754S  3     146.387  65.442  18   -10  80   25   1.3   22  21.5   130  51
18145S  3     146.423  65.452  24   -11  60   19   0.94  38  85.9   68   29
19974S  3     146.427  65.455  18   -5   10   7    1.9   10  37     39   37
16716S  3     146.434  65.464  8    -7   20   9    2.4   12  18.5   63   43
17180S  3     146.437  65.454  38   -12  40   12   2     16  88     78   52
18417S  3     146.441  65.412  23   -8   70   13   1.6   18  46.4   58   40
24426S  3     146.443  65.444  12   11   20   7    2.3   14  25.1   55   63
17185S  3     146.445  65.448  28   -10  40   10   2.5   14  51.6   68   85
19652S  3     146.445  65.451  16   -5   -10  6    2.3   8   14.4   38   29
23558S  3     146.446  65.415  17   -5   40   14   1.5   16  44.5   62   32
18273S  3     146.447  65.464  11   9    10   9.5  2.1   16  20     48   45
23455S  3     146.451  65.452  31   11   40   17   1.3   28  149    78   41
22928S  3     146.454  65.448  44   -10  30   12   1.3   18  114    63   43
15975S  3     146.456  65.42   5    -9   60   8    2.1   14  26.8   43   51
15971S  3     146.459  65.418  8    -7   10   10   2.1   10  16.2   48   27
24247S  3     146.466  65.446  25   -9   30   12   1.4   18  86.1   70   47
23028S  3     146.467  65.448  18   6    30   7.5  1.7   16  40.7   47   39
19634S  3     146.473  65.417  13   -5   40   12   2.1   12  19     48   39
22140S  3     146.475  65.443  28   -14  50   11   1.9   20  107    53   50
23546S  3     146.483  65.441  51   -9   30   15   1.1   18  81.1   62   61
21802S  3     146.486  65.418  9    -8   40   10   2.2   14  18.8   44   45


A First Approach of Data Analysis
On the Use of Graphs

What do we do? … We make graphs, e.g. bivariate plots (plots having, or relating to, two variables).

• Are these fields distinct? Are trends significant?
• The use of graphs is a helpful tool to get a first grasp of the information content in the data.
• However, the following questions remain:
  - What do all of these show?
  - Is there any critical analysis of these curves and fields?

• We need a better answer than “They look like it, so they must be different.”
• We need to employ more “rigorous” scientific methods to geologic problems:
  - hypothesis testing
• We need to turn to statistics.


What is the Field of Statistics?
• Statistics is “the determination of the probable from the possible”
  - This implies we need a rigorous definition and quantification of “probable”
  - Statistics is the quantitative study of variance

• Statistics usually deals with data in the form of numbers

• Two common uses of the word “Statistics”:

  Descriptive statistics: numerical or graphical summary of data (what was observed).

  Inferential statistics: used to model patterns in the data, accounting for randomness and drawing inferences about the larger population.
  • answers to yes/no questions (hypothesis testing), estimates of numerical characteristics (estimation), descriptions of association (correlation), or modeling of relationships (regression).
  • Other modeling techniques include ANOVA, time series, and data mining.
Descriptive Statistics
Examples:
 The average age of citizens who voted for the winning candidate in the last presidential election
 The average length of all books about statistics
 The variation in the weight of 100 boxes of cereal selected from a factory’s production line
 Or, more technically: “The adjustments of 14 GPS control points for this orthorectification ranged from 3.63 to 8.36 m, with an arithmetic mean of 5.14 m.”

• Interpretation:

 You are most likely to be familiar with this branch of statistics, because many examples arise in everyday life.
 Descriptive statistics form the basis for analysis and discussion in many fields. A minimal sketch of such a summary is shown below.
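The sketch below computes the kinds of descriptive summaries listed above (minimum, maximum, mean, median, standard deviation). The numbers are invented stand-ins, not the 14 GPS adjustments quoted in the text.

```python
# Minimal sketch of descriptive statistics on a hypothetical sample
# (illustrative values only, not the GPS adjustments from the example above).
import numpy as np

adjustments_m = np.array([3.9, 4.4, 4.8, 5.0, 5.1, 5.3, 5.6, 6.2, 7.1])
print("n        =", adjustments_m.size)
print("min/max  =", adjustments_m.min(), adjustments_m.max())
print("mean     =", round(adjustments_m.mean(), 2))
print("median   =", np.median(adjustments_m))
print("std (s)  =", round(adjustments_m.std(ddof=1), 2))  # sample standard deviation
```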
Inferential Statistics
Examples:
- A survey that sampled 2001 full- or part-time workers ages 50 to 70, conducted by the American Association of Retired Persons (AARP), discovered that 70% of those polled planned to work past the traditional mid-60s retirement age.
  • This statistic could be used to draw conclusions about the population of all workers ages 50 to 70.

- Or, again more technically: “The mean adjustment of any set of GPS points used for orthorectification is no less than 4.3 m and no more than 6.1 m; this statement has a 5% probability of being wrong.” A sketch of such an interval estimate follows below.

• Interpretation:

- If you use inferential statistics, you start with a hypothesis and look to see whether the data are consistent with this hypothesis.
- Inferential statistical methods can be easily misapplied or misconstrued, and many methods require the use of a calculator or computer.
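The second GPS example is a 95% confidence interval for a mean ("5% probability of being wrong"). Here is a hedged sketch of how such an interval can be computed; the data values are invented, not the actual GPS adjustments.

```python
# Minimal sketch (hypothetical data): a 95% confidence interval for a mean,
# the kind of inferential statement quoted above.
import numpy as np
from scipy import stats

x = np.array([3.8, 4.5, 4.9, 5.0, 5.2, 5.4, 5.7, 6.3, 6.9, 7.2])  # invented values
mean = x.mean()
sem = stats.sem(x)                        # standard error of the mean
lo, hi = stats.t.interval(0.95, df=x.size - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f} m, 95% CI = ({lo:.2f}, {hi:.2f}) m")
```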
What is Variance?

• Variance measures how a set of data values for a variable fluctuates around the mean of that variable.
• Variance is the natural error, scatter, or variability in measurements; it can be thought of as the natural spread of the data.
  - Variance is an inherent property of the measurement device being used, or of the object that is observed.

• Variance is also one of many quantitative measures of the variability of data (assuming the data are roughly Gaussian). It is sometimes represented as σ² or s²; see the sketch below.
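A minimal sketch of the sample variance, computed directly as the average squared deviation from the mean and checked against numpy. The porosity values are invented for illustration.

```python
# Minimal sketch: sample variance (s^2) as the average squared deviation from
# the mean, on a small hypothetical set of porosity measurements.
import numpy as np

phi = np.array([0.12, 0.15, 0.14, 0.18, 0.11, 0.16])    # invented values
dev = phi - phi.mean()
s2_manual = (dev ** 2).sum() / (phi.size - 1)            # n - 1: sample variance
print(round(s2_manual, 6), round(phi.var(ddof=1), 6))    # the two agree
```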
Why do we need statistics: why do data vary?
• Geological error (natural variability) – usually what you want to determine
  – what is the composition or variability of a granite pluton?
• Field sampling errors
  – not getting a representative sample
• Preparation errors
  – contamination, final split does not represent the field sample
• Analytical errors
  – calibration errors (setting up the machine)
  – measurement errors (fluctuations in counting)
  – machine errors (properties of the machine, mass fraction)
Fields like geochronology have tried to address these issues.

• Much of the statistical analysis of data focuses upon discovering sources of variation.
Types of data:
1. Nominal: classification scheme; may be numeric but could just as easily be A, B, C
   • 1 – limestone
   • 2 – sandstone in a geologic column
   • no relationship between 1 and 2, i.e. sandstone is not twice as (whatever) as limestone

2. Ordinal: rank-order data, numeric
   • 3 > 2 > 1, but 2 is not necessarily twice 1
   • example: mineral hardness scale – the difference between diamond (10) and corundum (9) is far greater than the difference between talc (1) and corundum (9)
   • example: ranking ice thickness as thin (1), moderately thin (2), medium (3), moderately thick (4), thick (5)

3. Interval: doesn't have a true (absolute) zero
   • can have negative numbers
   • intervals have the same spacing; the zero is arbitrarily set
   • all measurements are relative to that zero
   • e.g. the Centigrade temperature scale: 20° is twice as far from 0° as 10° is, and 120° – 110° = 20° – 10°
   • to convert from one interval scale to another we have to multiply and add to shift the zero point, e.g. Centigrade → Fahrenheit (see the sketch after this list)

4. Ratio: has a true zero
   • negative values are meaningless
   • to convert from one scale to another we need only multiply, e.g. cm to inches
   • length, mass, Kelvin temperature
   Ratio and interval are often used interchangeably. Distance can be an interval scale if you impose a baseline on the data; for example, altitudes are compared to sea level.

5. Angular (directional): similar to interval (arbitrary zero)
   • very important in geology

Data can be “closed” or “open”, “continuous” or “discrete”.

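The sketch below illustrates the conversion rule mentioned above: interval scales need a multiply plus a shift of the zero point, while ratio scales need only a multiplier. The conversion factors are the standard ones; the helper names are just illustrative.

```python
# Minimal sketch: interval scales convert by multiply-and-shift, ratio scales
# by a multiplier only (helper names are illustrative, not from the notes).
def centigrade_to_fahrenheit(c):    # interval scale: scale, then shift the zero
    return c * 9.0 / 5.0 + 32.0

def cm_to_inches(cm):               # ratio scale: true zero, multiply only
    return cm / 2.54

print(centigrade_to_fahrenheit(20.0))  # 68.0
print(cm_to_inches(25.4))              # 10.0
```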

Types of statistics - number of variables
1. Univariate
   – dependent on one variable
   Calculations of mean, standard deviation, median, ANOVA, and most contouring are all univariate techniques.

2. Bivariate
   – two variables, x versus y
   Regression and correlation; in geology, Harker diagrams are bivariate.
   Time series analysis is a special type of bivariate analysis.

3. Multivariate
   – many variables
   Matrix manipulation is generally needed to handle the data.

4. Spatial
   Map-based, having three or four variables analyzed together, two or three of which are spatial. This type of analysis is almost unique to the geosciences and has led to a geology-derived field called geostatistics.
Types of statistics - robustness

There is a trade-off between how confident you can be about a conclusion and how applicable a statistical test is to your data.

1. Parametric
   - greatest degree of confidence, but assumes a distribution or model
   Examples: t-test, F-test, ANOVA, principal component analysis – these require a “normal” distribution.

2. Nonparametric
   - not based on a model; lesser degree of confidence
   - lets the data speak for themselves
   Examples: Mann-Whitney, chi-squared. (A sketch comparing the two approaches follows below.)

3. Robust – the ideal
   – based on a model, but gives meaningful results even if the data do not follow the assumed distribution
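A hedged sketch comparing a parametric test (t-test, which assumes normal distributions) with a nonparametric alternative (Mann-Whitney U) on two invented samples.

```python
# Minimal sketch (invented data): parametric t-test versus the nonparametric
# Mann-Whitney U test mentioned above, applied to two samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(loc=10.0, scale=2.0, size=30)   # e.g. a property measured in unit A
b = rng.normal(loc=11.2, scale=2.0, size=30)   # e.g. the same property in unit B

t_stat, p_t = stats.ttest_ind(a, b)            # assumes normal distributions
u_stat, p_u = stats.mannwhitneyu(a, b)         # no distributional assumption
print(f"t-test p = {p_t:.4f}, Mann-Whitney p = {p_u:.4f}")
```

When the normality assumption holds, the parametric test is typically more powerful; when it does not, the nonparametric test is the safer choice.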
SOME DEFINITIONS
Population versus sample
• Population: the total number of all possible specimens in a study
  – e.g. pebbles on a beach
• Sample: geologically, a single specimen (a geologic sample); statistically, a subset of a population (a group of data)

Statistic versus parameter
• Statistic: something calculated from a set of data (a sample)
• Parameter: a property of a population which usually cannot be measured directly but must be estimated by a statistic
A few words about precision versus accuracy
Precision:
• How well you can measure something
• The reproducibility of the observation
• The quality of the operation by which a result is obtained
• The number of decimal points on an analysis

Accuracy:
• How close you are to the actual value
• The degree of perfection in measurement
• Relates to the quality of the result

You can be precise and not accurate – generally NOT OK.
You can be accurate and not precise – generally OK, but …
Geo-Statistics and Subsurface
Statistics is the science of collecting and pooling samples and making inferences to support decision making.

Geostatistics is a branch of applied statistics focusing on:

 Geologic (spatial) context
 Spatial relationships
 Volumetric support / scale
 Uncertainty
E&P Lifecycle Management
(Diagram: the petroleum investment lifecycle – interpret & model, plan & design, execute & operate – manages business risk across four lifecycles: Exploration (invest in the right prospects), Reservoir (enhance reservoir understanding), Drilling (drill deeper and farther, lower TCO), and Production (optimize production operations), all supported by the Information Lifecycle (enable data-driven decisions) to achieve optimal technical, operational, and economic performance across the E&P lifecycle.)
Exploration Lifecycle Management
Reservoir Lifecycle Management
Drilling Lifecycle Management
Volume Rendering
Multivariate Data Analysis
• Advanced data-analysis tools for examining relationships among multiple variables at the same time.
• Analyze multiple variables
• Enhance sweet-spot identification
• Optimize well placement
• Results are easy to visualize in the form of a map or 3D model, e.g. a quality indicator

Multivariate analysis comprises multiple advanced techniques for examining relationships among multiple variables at the same time. It is assumed that a given outcome of interest is affected or influenced by more than one thing.

Multivariate analytic techniques have been used for many years in other industries such as pharmaceuticals, medicine, and the stock market. In O&G, the application of multivariate analysis is relatively new, driven mainly by the emergence of unconventional resources, more specifically shale reservoirs, where good production targets are a function of multiple variables that belong to the following categories: rock physics, geochemistry, geological location, and engineering parameters.

The main benefit of multivariate data analysis is the ability to easily and effectively combine multiple independent variables into a single “super variable” carrying all the information of the input data, classifying the samples into good, average, and bad areas to drill (a minimal sketch of this idea follows below). The final result can be visualized in maps, 3D grids, or cross sections, where the asset team can plan their next wells.
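One common way to build such a "super variable" is principal component analysis. The sketch below is a hedged illustration, not the method used in the notes: the input variables, their units, and the tercile cutoffs are all invented.

```python
# Minimal sketch (invented data): combine several reservoir-quality variables
# into a single "super variable" with PCA, then bin wells into good/average/bad.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# columns: porosity, TOC, brittleness, net thickness (hypothetical units)
X = rng.normal(size=(50, 4)) * [0.03, 1.0, 0.1, 5.0] + [0.08, 3.0, 0.5, 20.0]

Xs = StandardScaler().fit_transform(X)                   # common scale for all variables
score = PCA(n_components=1).fit_transform(Xs).ravel()    # the "super variable"

# classify by terciles of the score (thresholds are arbitrary illustrations)
good, avg = np.quantile(score, [2 / 3, 1 / 3])
labels = np.where(score >= good, "good", np.where(score >= avg, "average", "bad"))
print(labels[:10])
```

In practice the scores would be mapped back to well locations and displayed on a map or 3D grid, as described above.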
Steps to answer the question about subsurface
• What do we sample?
• Sampling – types of measures
• Sampling representatively
• Sampling bias
• Goal of sampling and statistics – example
• Variable / feature
• Population and sample
• Parameters and statistics
• Predictor and response features
• Inference
• Prediction
• Deterministic and statistical models
• Uncertainty
• Reservoir modeling
• Transfer function
Exploratory data analysis (EDA):
• The following data are 26 acidity measurements of precipitation from Allegheny County, Pennsylvania, collected in 1973–1974. The data are pH values; pH 7 is neutral, and the lower the value, the more acidic the precipitation. These data could bear on the theory that air pollution causes rainfall to be more acidic.
Four histograms of the same data
A better approach is a stem and leaf diagram
Let the data do the talking:
1. Look at the data; find the maximum and minimum.
   5.78 – Max
   4.12 – Min
   1.66 – Difference

2. Choose a suitable pair of adjacent digit positions so that you have 10–20 groups, and split each data value between the positions. For the pH data, split 4.57 into 4.5 | 7: the “45” are the leading digits and the “7” is the trailing digit.

3. Write down a column of all the possible sets of leading digits in order from lowest to highest. These are the stems. We have a stem spacing of 0.1 pH units.

4. For each data value, write down the first trailing digit on the line labeled by its leading digits. These are the leaves, one leaf for each data value.

   4.57 becomes 45|7

   42|669 represents 4.26, 4.26 and 4.29
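The construction above is easy to automate. The sketch below builds a stem-and-leaf display with a stem spacing of 0.1 pH units; the pH values listed are invented stand-ins, since the full 26-value data set is not reproduced in these notes.

```python
# Minimal sketch: a stem-and-leaf display with a stem spacing of 0.1 pH units.
# The values are illustrative stand-ins, not the original 26 measurements.
from collections import defaultdict

ph = [4.12, 4.26, 4.26, 4.29, 4.31, 4.45, 4.52, 4.56, 4.57, 4.82, 5.10, 5.78]

stems = defaultdict(list)
for v in sorted(ph):
    stem, leaf = divmod(round(v * 100), 10)     # 4.57 -> stem 45, leaf 7
    stems[stem].append(str(leaf))

for stem in range(min(stems), max(stems) + 1):  # keep empty stems so gaps show
    print(f"{stem:2d} | {''.join(stems.get(stem, []))}")
```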
From this approach:
• We have maintained the values of the data.
• We have sorted the data.
• We have a “histogram” of the data.
• We can see that the majority of the data are grouped around pH 4.1–4.7, and that there are a few high-pH outliers that we should probably look at.
A silly (but real) example:
Grades in a class
65, 74, 78, 80, 81, 82, 83, 89, 90, 91, 92
Approach 1: 90s – A, 80s – B, 70s – C, 60s – D.
Maybe plot a histogram to make sure that we don’t have all A’s.

Yes, but this ignores the fact that the student with the 89 is probably as deserving of an A as the student with the 90.

Approach 2: start with a stem and leaf or two.

* is used for splitting a group of 10 into 0–4 and 5–9.

So, looking at the stem and leaf preserves the 89, so I can see that it is more like the 90s than the other 80s. I might make my A–B cutoff at 85, and my B–C cutoff at 75.
A dot plot is an approach halfway between a histogram and stem and leaf.
In a dot plot, data from a single sample are displayed as dots on a scale, with
repeated values represented by dots placed on top of each other. Computer-
generated versions have limited resolution along the scale so the ‘repeated values’
need not be identical. This is similar in concept to the histogram, except that we are
not so concerned with grouping the data into classes.
Dot plots are also useful for comparing two or more samples. Figure B2.5.4 represents widths of brachiopods taken from three sources, plotted on identical scales to make comparison easier. Note that the scales must include the lowest value observed in all three samples and the highest in all three. It is easy to see that samples A and B are similar in both their means and the degree of scatter, whilst sample C differs from them in both respects.
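A hedged sketch of a dot-plot comparison in the spirit of the brachiopod example: three samples are stacked on identical scales, with repeated values piled on top of each other. The widths are randomly generated for illustration, not the data behind Figure B2.5.4.

```python
# Minimal sketch (invented widths): dot plots of three samples on identical scales.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
samples = {"A": rng.normal(22, 2, 25), "B": rng.normal(22, 2, 25),
           "C": rng.normal(28, 4, 25)}

fig, axes = plt.subplots(len(samples), 1, sharex=True, figsize=(6, 4))
for ax, (name, x) in zip(axes, samples.items()):
    x = np.round(x)                                    # group near-identical values
    for value, count in zip(*np.unique(x, return_counts=True)):
        ax.plot([value] * count, range(1, count + 1), "ko", ms=4)  # stack repeats
    ax.set_ylabel(name)
    ax.set_yticks([])
axes[-1].set_xlabel("width (mm)")
plt.tight_layout()
plt.show()
```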
Next we should look in more detail at the distribution of the data and we
should summarize the data
To do this we will use a LETTER VALUE DISPLAY.

To construct a letter value display:

1. Order the data from lowest to highest. The stem and leaf is good for this.

2. Determine how far each value is from the low or high end of the batch. This is called the depth of each value.

The “deepest” data value is called the median. It marks the middle of the batch:
depth of the median: d(M) = (n + 1)/2
In our example, n = 26, so d(M) = (26 + 1)/2 = 13.5, i.e. the average of the 13th and 14th data values, which are 4.52 and 4.56, so the MEDIAN is 4.54.

The median divides the data into halves. Similarly, we can divide each half of the data into halves; that division point is called a hinge (H).

Hinges (H): the middle of each half of the data
d(H) = ([d(M)] + 1)/2, where [ ] means the integer part (truncate the value, so 13.5 becomes 13)
Another measure sometimes used is the quartile: d(quartile) = (n + 1)/4.
Hinges and quartiles are not the same. For any batch there are two hinges, an upper hinge and a lower hinge.

For our data:
d(H) = ([13.5] + 1)/2 = 14/2 = 7
Upper hinge = 4.82
Lower hinge = 4.31

On the other hand, for quartiles:
d(quartile) = (26 + 1)/4 = 27/4 = 6.75, which implies some tricky interpolation between the depth-6 and depth-7 values. Most computer programs use quartiles because they are easier for computers to handle.
Similarly, we can divide the first and last quarters of the data in half; these divisions are called eighths:
d(E) = ([d(H)] + 1)/2 = (7 + 1)/2 = 4
Beyond eighths are the letters D, C, B, A, Z, Y, …
The extremes are the largest and smallest values and do not have letters: depth = 1.
The letter values are summarized in a table:

The mid values are the averages of the upper and lower values, and are useful in examining the symmetry of the data. Note how the midpoint increases with decreasing depth.
The spread is the difference between the upper and lower values and shows something about the spread or variability of the data. (A minimal sketch computing these letter values is shown below.)
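A hedged sketch that follows the depth rules above to compute the median, hinges, eighths, and their mids and spreads. The data are invented stand-ins (n = 14 here, not the 26 pH values), and the helper value_at_depth is illustrative.

```python
# Minimal sketch: letter values (median, hinges, eighths) with their mids and
# spreads, following the depth rules above. The data are invented stand-ins.
import numpy as np

def value_at_depth(sorted_x, depth):
    """Value at a (possibly half-integer) depth, counted from the first element."""
    a = sorted_x[int(np.floor(depth)) - 1]
    b = sorted_x[int(np.ceil(depth)) - 1]
    return 0.5 * (a + b)                       # average the two if depth is x.5

x = np.sort(np.array([4.12, 4.26, 4.26, 4.29, 4.31, 4.45, 4.52, 4.56,
                      4.57, 4.70, 4.82, 5.10, 5.35, 5.78]))
n = x.size
depths = {"M": (n + 1) / 2}
depths["H"] = (int(depths["M"]) + 1) / 2       # [d(M)]: truncate, e.g. 13.5 -> 13
depths["E"] = (int(depths["H"]) + 1) / 2

for letter, d in depths.items():
    lower = value_at_depth(x, d)               # d-th value from the bottom
    upper = value_at_depth(x[::-1], d)         # d-th value from the top
    mid, spread = 0.5 * (lower + upper), upper - lower
    print(f"{letter}: depth={d:4.1f} lower={lower:.2f} upper={upper:.2f} "
          f"mid={mid:.2f} spread={spread:.2f}")
print(f"extremes: depth= 1.0 lower={x[0]:.2f} upper={x[-1]:.2f}")
```

For the M row the lower and upper values coincide at the median, so its spread is zero; the H and E rows give the hinge and eighth spreads used to judge symmetry and variability.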
