
Geostatistics Course

[Figure: NW to SE cross section through wells 324-7R, 345-7R, 366-7R, 328-7R, 351-17R, and 362-17R, showing shale, gas, oil, and water intervals and reservoir units A1 to A6.]
Cross Section Through Effective Porosity Model. Green = High Effective Porosity.

Geostatistics Lecture (Kuliah Geostatistik)
GL-5043

Course Organization

Introduction
Geostatistics in Reservoir Management
Introduction to Important Concepts and Vocabulary
Limitations of Geostatistics
Exploratory Data Analysis
Univariate and Bivariate Statistics
Univariate Spatial Statistics
Covariance and Semi-Variogram Function
Variogram Modeling of Continuous Data
Categorical Data and the Indicator Transform
Random Functions and Spatial Models

Course Organization (continued)

Single Variable Geostatistics
Estimation
Kriging
Comparison with Traditional Methods
Simple Vs Ordinary Kriging
Effect of Variogram Parameters on Kriging
Stochastic Techniques
Gaussian-based Methods
Indicator-based Methods
Introduction to Simulated Annealing
Introduction to Boolean (Object-based) Methods

Course Organization (continued)

Brief Introduction to Two-Variable Geostatistics
Estimation Methods
Cokriging
Collocated Cokriging
Stochastic Techniques with Soft Data Integration
Brief Overview of Other Methods
Boolean (Object-based)
Simulated Annealing
Course Summary

Introduction

Statement of Learning Objectives
Definition of Geostatistics and Important Assumptions
Brief History
Important Reference Books
Role of Geostatistics in Reservoir Management
Limitations of Geostatistics
Introduction to Key Concepts
Deterministic Vs Stochastic
Estimation Vs Simulation
Key Steps in a Geostatistical Study

Definition of
Geostatistics

Geostatistics: Branch of Statistics that Deals with Spatially Correlated Data

Basic Assumptions
Sample Values are Not Independent
Spatial Continuity Exists

Goal of Geostatistics
Model Spatial Continuity
Use Model for Estimation and/or Simulation of Spatial Distribution

A Brief History
of Geostatistics
Geostatistics Term Coined by Hart (1952): Application of Statistics in a Geographic Context
Matheron (1962, 1963) Used the Term in a Geological Context for Inferring Ore Reserves from Data Spatially Distributed Within an Ore Body
Developed the Theory of Regionalized Variables
Formal Introduction of a New Statistic, the Semivariogram:

$$\gamma(h_0) = \frac{1}{2n}\sum_{i=1}^{n}\left[z(x_i) - z(x_i + h_0)\right]^2$$

where z(x_i) = data value at location x_i, z(x_i + h_0) = data value at location x_i + h_0, h_0 = separation distance, and n = number of data pairs separated by h_0
Used Kriging to Obtain the Best Estimate of a Property (e.g., Ore Grade) at Some Location in an Ore Deposit
Built the Theory on the Practical Work of Krige (1951, 1960)
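As a rough numerical illustration of the semivariogram formula, the Python sketch below assumes the simplest layout: values sampled at equal spacing along a single line (for example, down a well), so that the separation h0 can be written as a whole number of sample spacings. Irregularly spaced 2-D or 3-D data would also need lag and direction tolerances, which are not shown. The function name and the porosity values are hypothetical.

```python
from typing import Sequence


def experimental_semivariogram(z: Sequence[float], lag: int) -> float:
    """Semi-variance gamma(h0) for values taken at equal spacing along a line.

    `lag` is the separation h0 expressed as a whole number of sample spacings.
    Every pair (z[i], z[i + lag]) is used, so n = len(z) - lag pairs.
    """
    if lag < 1 or lag >= len(z):
        raise ValueError("lag must be between 1 and len(z) - 1")
    diffs = [z[i] - z[i + lag] for i in range(len(z) - lag)]
    n = len(diffs)
    return sum(d * d for d in diffs) / (2 * n)


# Hypothetical porosity values (%) at equally spaced depths in one well.
porosity = [12.0, 13.5, 14.0, 13.0, 11.5, 10.0, 10.5, 12.0]
for h in range(1, 4):
    print(h, round(experimental_semivariogram(porosity, h), 3))
```

Each lag uses every available pair, so n shrinks as h0 grows and the estimate at long lags rests on fewer pairs.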

Semi-Variogram
Example
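[Figure: example semi-variogram; data points plotted against lag or separation distance and fitted with an exponential model (Model Form = EXPONENTIAL); the sill, the nugget (may be zero), and the range are marked.]

Fundamental Parameter is the Semi-Variance:

$$\gamma(h_0) = \frac{1}{2n}\sum_{i=1}^{n}\left[z(x_i) - z(x_i + h_0)\right]^2$$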
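The exponential model named on the example plot can be written in several equivalent ways. The sketch below assumes one common convention in which the range parameter a is the "practical range", where the model reaches about 95% of the structured part of the sill, and the nugget appears as a jump just beyond h = 0. Other texts write the exponent as -h/a instead of -3h/a. The function name and the parameter values are illustrative only.

```python
import math


def exponential_variogram(h: float, nugget: float, sill: float, a: float) -> float:
    """Exponential semi-variogram model with a nugget effect.

    gamma(0) = 0; for h > 0, gamma(h) = nugget + (sill - nugget) * (1 - exp(-3h / a)),
    so about 95% of the structured part (sill - nugget) is reached at h = a.
    """
    if h == 0:
        return 0.0
    structured = sill - nugget  # part of the sill above the nugget
    return nugget + structured * (1.0 - math.exp(-3.0 * h / a))


# Hypothetical parameters: nugget 1.0, total sill 5.0, practical range 100 ft.
for lag in (0.0, 10.0, 50.0, 100.0, 300.0):
    print(lag, round(exponential_variogram(lag, 1.0, 5.0, 100.0), 3))
```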

Reference
Books for
Geostatistics
Principal References: Textbooks
Isaaks, E. H. and Srivastava, R. M., 1989. Applied Geostatistics. Oxford University Press, New York.
Yarus, J. M. and Chambers, R. L., 1995. Stochastic Modeling and Geostatistics: Principles, Methods, and Case Studies. AAPG, Tulsa.
Hohn, M. E., 1999. Geostatistics and Petroleum Geology, 2nd ed. Kluwer Academic Publishers.
Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation. Oxford University Press, New York.
Deutsch, C. V. and Journel, A. G., 1997. GSLIB: Geostatistical Software Library and User's Guide. Oxford University Press, New York (includes FORTRAN code on CD-ROM).
Clark, I., 1979. Practical Geostatistics. Applied Science Publishers, Englewood, New Jersey.
Journel, A. G. and Huijbregts, C. J., 1978. Mining Geostatistics. Academic Press, London.
Cressie, N., 1991. Statistics for Spatial Data. John Wiley & Sons, New York.
Armstrong, M., 1998. Basic Linear Geostatistics. Springer.
Olea, R. A., 1999. Geostatistics for Engineers and Earth Scientists. Kluwer Academic Publishers.
Deutsch, C. V., 2002. Geostatistical Reservoir Modeling. Oxford University Press.
Webster, R. and Oliver, M. A., 2001. Geostatistics for Environmental Scientists. Wiley.

Application of Statistics,
Including Geostatistics,
to Reservoir
Characterization
A Description of the Reservoir is a Necessary Part of Reservoir Management
The Description is Based on a Variety of Data (Core Data, Well Log Data, Seismic Data, Fluid Data, Production Data, etc.)
Little, if Any, Direct Data Are Available
Indirect Data Generally Represent a Very Small Part of the Reservoir Volume
There is Much Uncertainty in the Reservoir Description Due to Limited Data
Statistics Provides a Systematic Way of Describing and Handling the Uncertainty

Geostatistics in
Reservoir
Characterization
[Figure: Well Spacing Vs Reservoir Volume Sampled by Core and Well Log Data*. Reservoir volume sampled (0 to 3.5E-05) plotted against well spacing (5, 10, 20, 40, 80, 160, 320, and 640 acres) for core data and well log data.]

* Core Diameter = 0.5 feet; Well Log Diameter of Investigation = 3.0 feet
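The order of magnitude on this chart can be checked with a simple area ratio. The sketch assumes (this is not stated on the slide) one vertical well per spacing unit, fully penetrating a layer of uniform thickness, so that thickness cancels and the sampled fraction is the cross-sectional area of the core, or of the log's volume of investigation, divided by the spacing area. Under these assumptions the well log value at 5 acres comes out near 3.2E-05, consistent with the chart's vertical scale. The function name is hypothetical.

```python
import math

SQ_FT_PER_ACRE = 43_560.0


def fraction_sampled(diameter_ft: float, spacing_acres: float) -> float:
    """Fraction of the reservoir volume sampled by one well per spacing unit.

    Assumes a vertical well fully penetrating a layer of uniform thickness,
    so thickness cancels and the fraction is a ratio of horizontal areas.
    """
    sampled_area = math.pi * (diameter_ft / 2.0) ** 2
    return sampled_area / (spacing_acres * SQ_FT_PER_ACRE)


for spacing in (5, 10, 20, 40, 80, 160, 320, 640):
    core = fraction_sampled(0.5, spacing)   # core diameter = 0.5 ft
    logs = fraction_sampled(3.0, spacing)   # log diameter of investigation = 3.0 ft
    print(f"{spacing:>4} acres   core {core:.2e}   well log {logs:.2e}")
```

Because the fraction scales with diameter squared, the log curve is always 36 times the core curve, and both fall in proportion to well spacing.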

Geostatistics in
Reservoir
Management
Some Reasons for Strong Industry Interest in Geostatistics
Geostatistical Estimation and Simulation Methods Allow Detailed Reservoir Property (k, Sw) Distributions to be Generated
Geostatistical Distributions Can Be Better than Those from Traditional Methods:
Realistic, Both Geologically and Statistically
Easy to Generate Using Available Software
Geostatistical Techniques Such As Collocated Cokriging Offer Quick Integration of Well-Based Data and Seismic Data
Stochastic Techniques Allow the Uncertainty of a Reservoir Description to be Quantified

Geostatistics in
Reservoir Management
Fundamental Goal is to Calculate a Reservoir Property Distribution Using the Available Well Log, Core, and/or Seismic Data

[Flowchart: Raw Data → Geostatistical Analysis → Selection of Model → Appropriate Estimation or Stochastic Algorithm]
Geostatistics in
Reservoir Management

Quantify Uncertainty Using Multiple Geologically and Statistically Valid Models

[Figure: property (PHI, K) distributions 1 to n are each run through a reservoir flow simulator, giving a distribution of outcomes (recovery). Individual reservoir simulation runs are numbered.]

Limitations of
Geostatistics
Geostatistics Does Not Create Data or Eliminate the Value of Obtaining Additional Good Data
Geostatistics Does Not Replace Sound Qualitative Understanding and Expert Judgment
Geostatistics Does Not Necessarily Save Time, At Least in the Short Term
Geostatistics Does Not Work Well as a Black Box
[Figure: black box output reading "Porosity at X is 13.7%"]

Important Concepts and
Vocabulary

Deterministic Vs Stochastic
Estimation Vs Simulation

Deterministic Vs
Stochastic

Spatial Model Types


Deterministic If One Knows
Enough About the Process
Responsible for the Distribution
Stochastic If the Underlying
Process Is Not Well Understood

Modeling Is Not
Magic!

Deterministic Vs
Stochastic
Deterministic Models Depend on Outside Information Not Contained in the Data Values (i.e., a Quantitative Process Description) and on the Context of the Data
Deterministic Model Examples:
Distance a Ball Will Travel When Thrown
Information Needed:
Equation
Velocity and Angle at Which the Ball Is Thrown
Gravitational Acceleration (g)
Diffusion of a Trace Element When a Pure Metal Bar and a Contaminated Metal Bar Are Joined
Information Needed:
Diffusion Constant and Temperature Dependence of the Diffusion Constant
Diffusion Equation
Initial Concentration of Trace Element in the Contaminated Bar
Concentration of Trace Element at Any Time or Location in the Metal Bar May Be Calculated
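Both deterministic examples reduce to short functions once the process description is known. The sketch below assumes flat ground and no air drag for the thrown ball. For the metal bars it assumes two long bars joined at x = 0 with a constant diffusion coefficient, for which a standard solution of the diffusion equation uses the complementary error function; the temperature dependence is not modelled, so the coefficient passed in must already correspond to the bar temperature. All names and numbers are illustrative.

```python
import math


def throw_distance(speed: float, angle_deg: float, g: float = 9.81) -> float:
    """Horizontal distance (m) of a ball thrown over flat ground, no air drag.

    R = v^2 * sin(2 * theta) / g, with speed in m/s and g in m/s^2.
    """
    theta = math.radians(angle_deg)
    return speed ** 2 * math.sin(2.0 * theta) / g


def trace_concentration(x: float, t: float, c0: float, d: float) -> float:
    """Trace-element concentration at position x and time t after joining.

    Assumes two long bars joined at x = 0, the contaminated bar (initial
    concentration c0) on the x < 0 side, the pure bar on the x > 0 side, and a
    constant diffusion coefficient d evaluated at the bar temperature:
    C(x, t) = (c0 / 2) * erfc(x / (2 * sqrt(d * t))).
    """
    return 0.5 * c0 * math.erfc(x / (2.0 * math.sqrt(d * t)))


print(throw_distance(20.0, 45.0))
# x in m, t in s, d in m^2/s (illustrative values only)
print(trace_concentration(1.0e-4, 3600.0, 1.0, 1.0e-12))
```

Given the same inputs, each function always returns the same answer, which is what makes these models deterministic.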

Deterministic Vs
Stochastic

Stochastic Models
Stochastic Models Are Useful When the
Process Responsible for the Distribution
of Values is Not Well Understood
A Stochastic Model is a Random Model
Controlled by a Spatial Correlation Model
Stochastic Models are a Useful Reservoir
Characterization Tool Because a
Reservoir is the End Product of Many
Poorly Understood Processes Including
Some or All of the Following:
Sedimentation
Compaction
Bioturbation
Diagenesis
Burial
Erosion
Local Tectonics
Regional Tectonics

Estimation Vs
Simulation
Estimation is the Process of Obtaining the Single Best Value of a Reservoir Property at an Unsampled Location. Local Accuracy Takes Precedence Over Global Spatial Variability. Estimation Methods, Therefore, Tend to Produce Smooth Property Distributions.
Many Traditional Methods
Block Averages
Inverse Distance Weighted Interpolation
Triangulation
Many Geostatistical Methods
Ordinary Kriging
Collocated Cokriging
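Inverse distance weighted interpolation, one of the traditional estimation methods listed above, shows why estimation maps tend to be smooth: every estimate is a weighted average of the data, so it can never fall outside their range. The sketch assumes 2-D well locations and a distance power of 2, a common but arbitrary choice; the function name, well coordinates, and values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def idw_estimate(target: Tuple[float, float],
                 samples: Sequence[Tuple[float, float, float]],
                 power: float = 2.0) -> float:
    """Inverse distance weighted estimate at `target` from (x, y, value) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a sample: return the datum
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical wells: (x, y, porosity %)
wells = [(0.0, 0.0, 10.0), (100.0, 0.0, 20.0), (0.0, 100.0, 14.0)]
print(idw_estimate((50.0, 50.0), wells))
print(idw_estimate((90.0, 10.0), wells))
```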

Estimation Vs
Simulation

Simulation is the Process of Obtaining One or More Good Values of a Reservoir Property at an Unsampled Location. The Simulated Distributions Honor Global Features and Statistics Instead of Local Accuracy. Simulation Methods Tend to Produce More Realistic Property Distributions.
Variety of Methods Available, Including:
Sequential Gaussian Simulation (SGS)
Sequential Indicator Simulation (SIS)
Simulated Annealing
Boolean (Marked-Point, Object-Based)

Estimation Vs
Simulation
[Figure: effective porosity maps produced by Estimation and by Simulation.]

Note the Smooth Contours on the Estimation Map Compared to the Simulation (Stochastic) Map.

Note that the Areas of Greatest Difference Between the Two Maps Are in Areas of Little or No Well Control.

Principal Steps in a Geostatistical Reservoir Characterization Study
Basic Geological Study Provides the Structural and Stratigraphic Framework
Data Quality Control and Clean-Up (Univariate and Multivariate Statistical Analysis)
Define Region(s) in which Stationarity is Applicable
Characterize and Model Spatial Continuity (Variogram Modeling) in Selected Regions
Obtain Reservoir Property Distribution(s) by Estimation and/or Conditional Simulation Using Model(s) of Spatial Continuity
Document Results

Introduction
Learning Objectives
Definition of Geostatistics and Important Assumptions
Brief History
Important Reference Books
Role of Geostatistics in Reservoir Management
Limitations of Geostatistics
Introduction to Key Concepts
Deterministic Vs Stochastic
Estimation Vs Simulation
Key Terms
Attribute, Variable, Individual, Population
Parameter, Statistic
Key Steps in a Geostatistical Study

Univariate Statistics

Although Our Knowledge of


Most Reservoirs is Limited,
the Amount of Data is Often
Difficult to Manage and
Communicate.

Univariate Statistics
Statement of Learning Objectives
Review of Measures of Location and
Spread
Mean
Variance
Standard Deviation
Review of Univariate Plots
Histogram
Probability Density Function - pdf
Cumulative Distribution Function - cdf
Handling Outliers
Review of Types of Distributions
Parametric
Normal (Gaussian)
Log-Normal
Non-Parametric

Univariate Statistics
The Mean, Usually Denoted by m, is the Central Value of a Distribution. The Arithmetic Mean is Given by

$$m = \frac{1}{n}\sum_{i=1}^{n} z(i)$$

where z(i) = sample value and n = number of data values
For Log-Normal Distributions, the Geometric Mean is a Better Measure of the Central Value. The Geometric Mean, m_G, is Given by

$$\log m_G = \frac{1}{n}\sum_{i=1}^{n} \log z(i)$$

The Median Is Also Frequently Used As an Appropriate Central Value for Distributions that are Highly Asymmetric. The Median, Usually Denoted by M, is the Value that Splits a Distribution in Half, i.e., 50% of the Values are Less than the Median and 50% are Greater.
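The three measures of central value can be reproduced for the porosity column of Example Data Set #1 (the values are copied from that table). This minimal sketch uses only the Python standard library and computes the geometric mean exactly as in the log formula above; the printed values should agree with the tabulated 10.50, 8.06, and 9.50.

```python
import math
import statistics

# Porosity (%) from Example Data Set #1, depths 1001 to 1020.
porosity = [16, 11, 10, 9, 7, 5, 1, 2, 7, 4, 8, 13, 19, 20, 21, 17, 17, 15, 6, 2]

n = len(porosity)
arithmetic_mean = sum(porosity) / n
# Geometric mean: average the log10 values, then raise 10 to that power.
geometric_mean = 10 ** (sum(math.log10(z) for z in porosity) / n)
# Median: the middle of the sorted values (average of the two middle values here).
median = statistics.median(porosity)

print(f"{arithmetic_mean:.2f} {geometric_mean:.2f} {median:.2f}")
```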

Univariate Statistics

Measures of Location (continued)
Normal Distribution: Mode = Median = Mean

[Figure: frequency (0 to 0.08) vs porosity for a normal distribution, with the mean, mode, and median marked at the same value.]

Univariate Statistics

Measures of Location (continued)
Lognormal Distribution: Mode < Median < Mean

[Figure: frequency (0 to 2) vs permeability for a lognormal distribution, with the mode, median, and arithmetic mean marked.]

Univariate Statistics

The Variance, Usually Denoted by σ², Is a Measure of the Spread of the Data Values Around the Mean. The Variance is Given by

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left[z(i) - m\right]^2$$

where z(i) = sample value and n = number of data values
Note that the Unit of Variance is the Data Value Unit Squared, e.g., %². Also Note that the Variance Magnitude Is Sensitive to the Unit Magnitude, e.g., Millidarcys Vs Darcys.
To Get Around these Problems, the Standard Deviation is Used.
The Standard Deviation, Usually Denoted by σ, is the Square Root of the Variance, that is

$$\sigma = \sqrt{\sigma^2}$$
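One practical detail: the formula above divides by n, whereas many spreadsheet and statistics functions divide by n - 1 (the sample variance). For the porosity column of Example Data Set #1, dividing by n gives 38.75, while dividing by n - 1 gives the 40.79 listed in that table. The short sketch below computes both; in the Python standard library, `statistics.variance` uses the n - 1 divisor and `statistics.pvariance` uses n.

```python
import math
import statistics

porosity = [16, 11, 10, 9, 7, 5, 1, 2, 7, 4, 8, 13, 19, 20, 21, 17, 17, 15, 6, 2]
m = sum(porosity) / len(porosity)

# Divisor n, as in the variance formula above.
var_n = sum((z - m) ** 2 for z in porosity) / len(porosity)
# Divisor n - 1 (sample variance), which matches the Example Data Set #1 table.
var_n1 = statistics.variance(porosity)
std_n1 = math.sqrt(var_n1)

print(var_n, round(var_n1, 2), round(std_n1, 2))
```

For 20 values the two conventions differ by about 5%; the gap shrinks as n grows.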
Univariate Statistics

Example Data Set #1 Showing Calculation of the Arithmetic Mean, Geometric Mean, Median, Variance, and Standard Deviation

Depth               Porosity (%)   Permeability (darcy)   Permeability (md)
1001                16             0.222                  222
1002                11             0.345                  345
1003                10             0.124                  124
1004                9              0.076                  76
1005                7              0.050                  50
1006                5              0.020                  20
1007                1              0.002                  2
1008                2              0.005                  5
1009                7              0.013                  13
1010                4              0.001                  1
1011                8              0.033                  33
1012                13             0.045                  45
1013                19             0.055                  55
1014                20             0.513                  513
1015                21             0.456                  456
1016                17             0.345                  345
1017                17             0.411                  411
1018                15             0.250                  250
1019                6              0.004                  4
1020                2              0.006                  6
Number of Values    20             20                     20
Column Sum          210.00         2.98                   2976.2
Arithmetic Average  10.50          0.15                   148.8
Geometric Average   8.06           0.05                   46.0
Median              9.50           0.05                   52.5
Variance            40.79          0.03                   30390.2
Standard Deviation  6.39           0.17                   174.3

Univariate Statistics
Measures of Spread and Shape (continued)

[Figure: frequency plots of two distributions (values 0.02 to 0.26) with the same mean, m = 0.128.]

Two Distributions with the Same Mean Value. The Standard Deviation (σ) of the Distribution Shown by the Line with Solid Boxes is Larger than the Standard Deviation of the Distribution Shown by the Line with Crosses.
Univariate Statistics

Univariate Distribution Plots


Histogram
Plot of the Data Value (X-axis)
Against Actual Frequency of
Occurrence of the Data Value (Y-axis)
The X-axis is Normally Divided Into a
Number of Bins or Classes.
Probability Density Function (pdf)
Plot of the Data Value (X-axis)
Against Normalized Frequency of
Occurrence of the Data Value (Y-axis)
The Normalized Frequency of
Occurrence is Obtained by Dividing
the Actual Frequency of Occurrence
by Total Number of Sample Values

Univariate Statistics

Univariate Distribution Plots


(continued)
Cumulative Distribution Function (cdf)
Plot of the Data Value (X-axis)
Against Cumulative Frequency of
Occurrence of the Data Value (Y-
axis)
The Cumulative Frequency of
Occurrence is Obtained by
Summing the Normalized
Frequency of Occurrence of the
Current Bin or Class and all Lower
Bins or Classes.
The cdf Y-axis Must Range Between
0 and 1 (or 0% and 100%).

Univariate Statistics
Example of Histogram, pdf, and
cdf Calculations for Example Data
Set #1 (Porosity)
Basic Steps Are:
Set Up Bins (Classes)
Determine Number of Data Values in
Each Bin
Normalized Frequency = Number of
Values in Each Bin Divided by Total
Number of Values
Cumulative Frequency is Obtained by
Summing the Normalized Frequency
of Occurrence of the Current Bin or
Class and all Lower Bins or Classes.
Bin   Raw Frequency   Normalized Frequency   Cumulative Frequency
0     1               0.05                   0.05
2     3               0.15                   0.20
4     2               0.10                   0.30
6     2               0.10                   0.40
8     1               0.05                   0.45
10    2               0.10                   0.55
12    1               0.05                   0.60
14    1               0.05                   0.65
16    2               0.10                   0.75
18    3               0.15                   0.90
20    2               0.10                   1.00
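The normalized and cumulative frequency columns follow mechanically from the raw counts. The sketch below starts from the raw frequencies in the table above, rather than re-binning the data, and reproduces the other two columns.

```python
from itertools import accumulate

bins = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
raw = [1, 3, 2, 2, 1, 2, 1, 1, 2, 3, 2]   # raw frequency per bin, from the table
n = sum(raw)                              # total number of values

normalized = [count / n for count in raw]        # pdf values
cumulative = list(accumulate(normalized))        # running sum: cdf values

for b, r, p, c in zip(bins, raw, normalized, cumulative):
    print(f"{b:>3} {r:>3} {p:.2f} {c:.2f}")
```

The last cumulative value is 1 by construction, which is why the cdf axis always ends at 1 (or 100%).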

Univariate Statistics
Calculation of Histogram, PDF, and CDF Using a Spreadsheet Program (Excel)
Enter Data Values in a Column
Enter Bin Values in a Different Column
Select the Tools Menu
Select the Data Analysis Option
Select the Histogram Function
Execute the Histogram Function by Completing the Pop-up Dialog
Plot Results Using the Chart Wizard

Univariate Statistics
Example of Histogram, pdf, and cdf Plots for Porosity Values in Example Data Set #1

[Figure: three panels vs porosity (0 to 20): Histogram (raw frequency), PDF ("normalized" frequency), and CDF (cumulative frequency, 0 to 1).]

Univariate Statistics
Univariate Distribution Plots (continued)

Selection of Bin or Class Size Is a Function of the Data Range, the Number of Data Values, and the Distribution Information that Is Needed. The Same Data Were Used to Build All Histograms.

[Figure: three histograms of raw frequency vs porosity for the same data, with class sizes of 5, 2, and 1.]
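The effect of class size can be tried directly. The sketch below is hypothetical: it uses the porosity values of Example Data Set #1 rather than the unlisted data behind these plots, and it assumes classes that start at 0 and include their lower edge. Other conventions, such as including the upper edge instead, shift individual counts but not the total.

```python
import math
from collections import Counter

porosity = [16, 11, 10, 9, 7, 5, 1, 2, 7, 4, 8, 13, 19, 20, 21, 17, 17, 15, 6, 2]


def histogram(values, class_size, start=0.0):
    """Raw frequency per class [start + k*size, start + (k+1)*size).

    Returns {class lower edge: count}, including empty classes between the
    lowest and highest occupied classes.
    """
    counts = Counter(math.floor((v - start) / class_size) for v in values)
    return {start + k * class_size: counts[k]
            for k in range(min(counts), max(counts) + 1)}


for size in (5, 2, 1):
    print(size, histogram(porosity, size))
```

With 20 values, a class size of 1 leaves most classes holding 0, 1, or 2 values, while a class size of 5 gives a smoother but coarser picture.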

Univariate Statistics

Histograms, pdfs, and cdfs are Also Used to Spot Outliers
Outliers, Either Extreme Low Values or Extreme High Values, May Strongly Affect Summary Univariate Statistics (Such as the Mean and Variance), Bivariate Statistics (Such as the Covariance and the Correlation Coefficient), and Spatial Statistics (Such as the Semi-Variogram)
Outliers May Be Handled by:
Declaring the Extreme Values to be Erroneous and Discarding Them. This Approach Should be Used Judiciously, as the Extreme Values Are Often the Most Interesting
Transforming the Data to Minimize the Influence of Extreme Values
Classifying the Extremes Into a Separate Statistical Population

Example Data Set with Extreme Values: Summary Statistics

Depth   Permeability (md)
2701    11
2702    5
2703    6
2704    3
2705    5
2706    8
2707    12
2708    13
2709    256
2710    390
2711    44
2712    11
2713    2
2714    1
2715    2
2716    4
2717    5
2718    4
2719    2
2720    5
2721    7
2722    8
2723    14
2724    59
2725    389
2726    17
2727    452
2728    12
2729    11
2730    5
2731    6
2732    8
2733    6
2734    3
2735    2
2736    1
2737    5
2738    7
2739    6
2740    8

Number of Values     40.00
Arithmetic Average   45.38
Geometric Average    9.12
Variance             12778.04
Standard Deviation   113.04
Median               6.50

[Figure: Histogram and CDF (All Data); raw frequency and cumulative frequency vs permeability (0 to 500 md).]

Univariate Statistics
Outlier Handling
Delete Extreme Values
Treat Separately
Transform (next page)

LOW Permeability (md): 11, 5, 6, 3, 5, 8, 12, 13, 44, 11, 2, 1, 2, 4, 5, 4, 2, 5, 7, 8, 14, 59, 17, 12, 11, 5, 6, 8, 6, 3, 2, 1, 5, 7, 6, 8
HIGH Permeability (md): 256, 390, 389, 452

                     LOW       HIGH
Number of Values     36.00     4.00
Arithmetic Average   9.11      371.75
Geometric Average    6.06      364.00
Variance             127.13    6822.92
Standard Deviation   11.28     82.60
Median               6.00      389.50
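The split used on this slide can be reproduced by separating the 40 permeability values at a cutoff. The slide does not state a cutoff; the sketch below assumes 100 md, which places 256, 389, 390, and 452 md in the HIGH group, and uses the n - 1 divisor for the variance to match the tabulated values. `statistics.geometric_mean` requires Python 3.8 or later.

```python
import statistics

# Permeability (md) by depth, 2701 to 2740, from the example data set.
perm = [11, 5, 6, 3, 5, 8, 12, 13, 256, 390, 44, 11, 2, 1, 2, 4, 5, 4, 2, 5,
        7, 8, 14, 59, 389, 17, 452, 12, 11, 5, 6, 8, 6, 3, 2, 1, 5, 7, 6, 8]

CUTOFF_MD = 100.0  # hypothetical cutoff; the slide does not state one

low = [k for k in perm if k < CUTOFF_MD]
high = [k for k in perm if k >= CUTOFF_MD]

for name, group in (("LOW", low), ("HIGH", high)):
    print(name, len(group),
          round(statistics.mean(group), 2),            # arithmetic average
          round(statistics.geometric_mean(group), 2),  # Python 3.8+
          round(statistics.variance(group), 2),        # divisor n - 1
          round(statistics.stdev(group), 2),
          statistics.median(group))
```

Any cutoff between 59 md and 256 md gives the same two groups for these data.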

Univariate Statistics
Outlier Handling (continued): Transform Examples (Log10 and Square Root)

Depth   Permeability (md)   Log10   Square Root
2701    11                  1.041   3.317
2702    5                   0.699   2.236
2703    6                   0.778   2.449
2704    3                   0.477   1.732
2705    5                   0.699   2.236
2706    8                   0.903   2.828
2707    12                  1.079   3.464
2708    13                  1.114   3.606
2709    256                 2.408   16.000
2710    390                 2.591   19.748
2711    44                  1.643   6.633
2712    11                  1.041   3.317
2713    2                   0.301   1.414
2714    1                   0.000   1.000
2715    2                   0.301   1.414
2716    4                   0.602   2.000
2717    5                   0.699   2.236
2718    4                   0.602   2.000
2719    2                   0.301   1.414
2720    5                   0.699   2.236
2721    7                   0.845   2.646
2722    8                   0.903   2.828
2723    14                  1.146   3.742
2724    59                  1.771   7.681
2725    389                 2.590   19.723
2726    17                  1.230   4.123
2727    452                 2.655   21.260
2728    12                  1.079   3.464
2729    11                  1.041   3.317
2730    5                   0.699   2.236
2731    6                   0.778   2.449
2732    8                   0.903   2.828
2733    6                   0.778   2.449
2734    3                   0.477   1.732
2735    2                   0.301   1.414
2736    1                   0.000   1.000
2737    5                   0.699   2.236
2738    7                   0.845   2.646
2739    6                   0.778   2.449
2740    8                   0.903   2.828

                     Permeability (md)   Log10    Square Root
Number of Values     40                  40       40
Arithmetic Average   45.38               0.9601   4.3584
Geometric Average    9.12
Variance             12778.03            0.4224   27.0558
Standard Deviation   113.04              0.6499   5.2015
Median               6.50                0.8116   2.5476
Back-Transformed Arithmetic Average      9.12     19.00

Back-Transforms: 9.12 = 10**(Arithmetic Average of Log10); 19.00 = (Arithmetic Average of Square Root)**2
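The back-transformed averages at the bottom of the table can be checked in a few lines. Raising 10 to the average log10 value gives the geometric mean, and squaring the average square root gives the second back-transformed value; the permeability values are copied from the table.

```python
import math
import statistics

perm = [11, 5, 6, 3, 5, 8, 12, 13, 256, 390, 44, 11, 2, 1, 2, 4, 5, 4, 2, 5,
        7, 8, 14, 59, 389, 17, 452, 12, 11, 5, 6, 8, 6, 3, 2, 1, 5, 7, 6, 8]

logs = [math.log10(k) for k in perm]
roots = [math.sqrt(k) for k in perm]

# Back-transform the arithmetic average of each transformed variable.
log_back = 10 ** statistics.mean(logs)      # equals the geometric mean
sqrt_back = statistics.mean(roots) ** 2

print(round(statistics.mean(perm), 2), round(log_back, 2), round(sqrt_back, 2))
```

Both back-transformed values sit far below the raw arithmetic average, showing how strongly the four extreme values pull the untransformed mean upward.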

Univariate Statistics
Normal Score Transform (Gaussian Anamorphosis)
Essentially Involves Transforming Any Distribution to Its Corresponding Normal or Gaussian Distribution Using Percentile Ranks. That is, One First Ranks the Data and then Assigns to the i-th Ranked Observation the Standard Normal Value Having the Same Cumulative Probability (Percentile).
Important as Many Algorithms Assume a Gaussian Data Distribution
Often, this Transformation Is Done Automatically.
Note that for Log-Normal Distributions the Normal Score Transform is Equivalent to a Logarithmic Transform Followed by Standardization (Subtracting the Mean and Dividing by the Standard Deviation).
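A minimal normal score transform can be written with the standard library's `statistics.NormalDist` (Python 3.8 or later). The sketch assumes the plotting position (i - 0.5)/n for the i-th smallest of n values and breaks ties by original order; both are illustrative choices, and dedicated geostatistics software may handle ties and data weights differently. The function name and data are hypothetical.

```python
from statistics import NormalDist


def normal_scores(values):
    """Normal score transform using the plotting position (i - 0.5) / n.

    The i-th smallest value (i = 1..n) is mapped to the standard normal
    quantile at cumulative probability (i - 0.5) / n. Ties are broken by
    original order, which is one of several possible conventions.
    """
    n = len(values)
    order = sorted(range(n), key=lambda j: values[j])
    standard_normal = NormalDist(0.0, 1.0)
    scores = [0.0] * n
    for rank, j in enumerate(order, start=1):
        scores[j] = standard_normal.inv_cdf((rank - 0.5) / n)
    return scores


# Hypothetical permeability values (md); scores keep the original order.
perm = [11, 5, 256, 390, 44, 2, 1]
print([round(s, 3) for s in normal_scores(perm)])
```

Because only ranks are used, the transform is unaffected by how extreme the largest values are, which is one reason it is popular before Gaussian-based simulation.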

Univariate Statistics

[Figure from Olea, 1999]
Univariate Statistics

Types of Distributions
Parametric Distributions Can Be Completely Described by a Few Parameters such as the Mean and Variance
Normal or Gaussian
Porosity (Sometimes)
Saturation (Usually)
Log-Normal
Permeability (Sometimes)
Non-Parametric Distributions Cannot Easily Be Described by Parameters. They Are Frequently the Result of a Mixture of Populations.
Bi-Modal
Multi-Modal

Univariate Statistics
Theoretical Normal (Gaussian)
Distribution
Some Distributions Have a Concise Mathematical Description
A Normal or Gaussian Distribution is Completely Described by

$$g(z) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{z - m}{\sigma}\right)^2}$$

where g(z) = frequency, m = arithmetic mean, and σ = standard deviation (σ² is the variance)
The Standard Normal Distribution Sets m = 0 and σ = 1
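The normal density formula translates directly into code. The sketch below checks it against `statistics.NormalDist.pdf` from the Python standard library; the mean of 17.1 echoes the Mean = 17.1 porosity example, but the standard deviation of 3.0 is hypothetical.

```python
import math
from statistics import NormalDist


def normal_pdf(z: float, m: float, sigma: float) -> float:
    """g(z) for a normal distribution with mean m and standard deviation sigma."""
    return math.exp(-0.5 * ((z - m) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


# Compare with the standard library at a few porosity values (sigma is hypothetical).
for z in (12.0, 17.1, 22.0):
    print(z, normal_pdf(z, 17.1, 3.0), NormalDist(17.1, 3.0).pdf(z))
```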

Univariate Statistics
Normal Distributions
Example Histogram and CDF
CDF for a Normal (Gaussian) Distribution Has an S Shape

[Figure: Histogram and CDF for Normal Distribution; raw frequency and cumulative frequency vs porosity (10 to 25), Mean = 17.1.]

Unit Normal Distribution Has Mean = 0 and Standard Deviation = 1

Univariate Statistics

Log-Normal Distribution
Example Histogram and CDF
Note the Skewed S Shape of the CDF

[Figure: Histogram and CDF for Log-Normal Distribution; raw frequency and cumulative frequency vs porosity (1 to 16), Mean = 4.2, Geometric Mean = 3.6.]

Univariate Statistics

Non-Parametric Distribution
Example Histogram and cdf for a Bi-Modal Distribution
Note the Step Shape of the CDF

[Figure: Histogram and CDF For Bi-Modal Distribution; raw frequency and cumulative frequency vs porosity, Mean = 13.8.]

Univariate Statistics

Non-Parametric Distribution
Example
Example Data Set (25 Core
Porosity Measurements)
6, 6, 8, 5, 4, 5, 7, 7, 6, 6, 6, 6, 5, 7, 12,
14, 17, 15, 17, 18, 16, 14, 17, 15, 16

Summary Statistics

Mean                 10.2
Median               7
Mode                 6
Standard Deviation   5.03
Variance             25.25
Range                14
Minimum              4
Maximum              18
Sum                  255
Count                25

Univariate Statistics

Non-Parametric Distribution
Example (continued)
Histogram and CDF

[Figure: Histogram and cdf; frequency (0 to 6) and cumulative frequency (%) vs porosity (1% to 20%).]

Univariate Statistics

Non-Parametric Distributions
(continued)
Separating a Bi-Modal or Multi-Modal
Mixture (for Example, Effective
Porosity in a Sand-Shale Sequence)
May Result in Individual Distributions
that are Normal or near-Normal
Indicator Transform Often Provides a
Convenient Way of Splitting Mixed
Populations
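For the 25-value porosity example, an indicator transform with a single cutoff separates the two modes. The cutoff of 10% used below is an assumption chosen to fall between the modes, and the convention i(z) = 1 if z ≤ cutoff, otherwise 0, is also assumed.

```python
porosity = [6, 6, 8, 5, 4, 5, 7, 7, 6, 6, 6, 6, 5, 7, 12,
            14, 17, 15, 17, 18, 16, 14, 17, 15, 16]

CUTOFF = 10  # hypothetical cutoff (%) between the two modes

# Indicator transform: 1 if the value is at or below the cutoff, else 0.
indicator = [1 if z <= CUTOFF else 0 for z in porosity]

low = [z for z, i in zip(porosity, indicator) if i == 1]
high = [z for z, i in zip(porosity, indicator) if i == 0]

# The mean of the indicators is the cdf value at the cutoff.
print(sum(indicator) / len(indicator))
print(len(low), sum(low) / len(low))
print(len(high), round(sum(high) / len(high), 2))
```

Each resulting group is much more compact than the mixed data set, which is the point of splitting the populations before further analysis.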

Univariate Statistics
Session Summary
Measures of Location and Spread
Mean - Measure of Central Value
(Arithmetic Vs Geometric)
Median - Another Measure of Central
Value
Variance and Standard Deviation -
Measure of Spread
Common Displays
Histogram and Probability Density Function (pdf)
Cumulative Distribution Function (cdf)
Outlier Handling
Exclusion
Separate Population
Transform
Types of Distribution
Parametric (Normal, Log-Normal)
Non-parametric (Bi-Modal)

Univariate Statistics
Problem 1 - Calculate Arithmetic Mean,
Geometric Mean, Median, and Mode for
the Porosity and Permeability Values
Given Below

Depth Porosity (%) Permeability (md) Gamma Ray (API)

1000.00 2.00 2.00 78.00


1001.00 3.00 1.00 75.00
1002.00 2.00 5.00 18.00
1003.00 4.00 9.00 72.00
1004.00 3.00 5.00 78.00
1005.00 5.00 5.00 71.00
1006.00 4.00 11.00 67.00
1007.00 4.00 4.00 72.00
1008.00 4.00 17.00 70.00
1009.00 2.00 0.50 71.00
1010.00 5.00 21.00 76.00
1011.00 4.00 8.00 69.00
1012.00 3.00 0.20 78.00
1013.00 7.00 23.00 77.00
1014.00 9.00 45.00 61.00
1015.00 11.00 78.00 44.00
1016.00 13.00 92.00 48.00
1017.00 11.00 126.00 48.00
1018.00 12.00 84.00 37.00
1019.00 14.00 217.00 22.00
1020.00 16.00 460.00 21.00
1021.00 17.00 893.00 11.00
1022.00 15.00 678.00 31.00
1023.00 16.00 431.00 29.00
1024.00 12.00 42.00 44.00
1025.00 7.00 305.00 67.00
1026.00 4.00 21.00 78.00
1027.00 3.00 6.00 69.00
1028.00 4.00 0.10 72.00

Univariate Statistics

Problem 1 Worksheet

Univariate Statistics
Problem 1 - Solution
Arithmetic Mean
Sum All Measurements
Divide Sum by Total Number of
Measurements
Median Value
Order Measurements from Smallest to Largest Value
Middle Value = Median (for an Even Number of Measurements, Average the Two Middle Values)
Mode is the Value (or Values) that Occurs Most Frequently
Geometric Mean
Calculate log10 of Each Value
Sum All log10 Values
Divide Sum by Total Number of Measurements
Raise 10 to the Power of the Result to Get the Geometric Mean
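The solution steps above can be sketched for the Problem 1 data, with values copied from the Problem 1 table. The function name is hypothetical; `statistics.multimode` (Python 3.8 or later) returns every most-frequent value, matching the "value (or values)" wording of the mode definition. The geometric mean needs strictly positive values, which holds for these data.

```python
import math
import statistics

# Problem 1 data (depths 1000 to 1028).
porosity = [2, 3, 2, 4, 3, 5, 4, 4, 4, 2, 5, 4, 3, 7, 9, 11, 13, 11, 12, 14,
            16, 17, 15, 16, 12, 7, 4, 3, 4]
permeability = [2, 1, 5, 9, 5, 5, 11, 4, 17, 0.5, 21, 8, 0.2, 23, 45, 78, 92,
                126, 84, 217, 460, 893, 678, 431, 42, 305, 21, 6, 0.1]


def summarize(values):
    """Return (arithmetic mean, geometric mean, median, mode list)."""
    n = len(values)
    arithmetic = sum(values) / n                     # sum, then divide by n
    ordered = sorted(values)                         # smallest to largest
    mid = n // 2
    if n % 2:                                        # odd count: middle value
        median = ordered[mid]
    else:                                            # even count: average the two middle values
        median = (ordered[mid - 1] + ordered[mid]) / 2
    mode = statistics.multimode(values)              # most frequent value(s)
    log_mean = sum(math.log10(v) for v in values) / n
    geometric = 10 ** log_mean                       # 10 raised to the mean log10
    return arithmetic, geometric, median, mode


for name, data in (("Porosity", porosity), ("Permeability", permeability)):
    print(name, summarize(data))
```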

Univariate Statistics
Learning Objectives
Review of Measures of Location and
Spread
Mean
Variance
Standard Deviation
Review of Univariate Plots
Histogram
Probability Density Function - pdf
Cumulative Distribution Function - cdf
Handling Outliers
Review of Types of Distributions
Parametric
Normal (Gaussian)
Log-Normal
Non-Parametric

Bivariate Statistics

Learning Objectives
Bivariate Data Display
Scatterplot or Crossplot
Bivariate Measures
Covariance
Correlation Coefficient
Rank Correlation Coefficient
Brief Review of Linear
Regression
Procedure
Example
Limitations

Bivariate Statistics
Given the Data Shown in the Table Below, What is the Relationship Between the Gamma Ray Log Trace and the Porosity Values? Is the Line Shown a Good Fit?

MD (FEET)   GR      POROSITY
3260.00     87.71   0.097
3261.00     67.55   0.098
3262.00     42.21   0.090
3263.00     47.33   0.093
3264.00     48.48   0.079
3265.00     46.05   0.061
3266.00     40.49   0.064
3267.00     28.05   0.071
3268.00     15.98   0.075
3269.00     7.61    0.074
3270.00     11.15   0.065
3271.00     24.89   0.058
3272.00     49.39   0.056
3273.00     72.20   0.068
3274.00     79.02   0.105
3275.00     83.61   0.139
3276.00     91.02   0.144
3277.00     94.37   0.149
3278.00     85.52   0.144

[Figure: SCATTERPLOT of POROSITY (0 to 0.2) vs GR (0 to 100), with the fitted line Y = 0.00087(X) + 0.045.]

Bivariate Statistics
Basic Question Is: What Is the Relationship Between Two Variables?
Scattergram or Crossplot Is the Basic Plot Used to Summarize Bivariate Data
Used to Examine the Relationship Between Two Variables
Used to Examine Data for Extreme Values (High or Low)
May Use Linear, Semi-Log, or Log-Log Scales

[Figure: the same bivariate data plotted on linear (0 to 1000), semi-log, and log-log scales.]

Bivariate Statistics
Bivariate Measures
The Covariance, Usually Denoted by C or σ_XY, is the Measure of Joint Variation of Two Variables, X and Y, About Their Respective Means, m_X and m_Y. That is,

$$\sigma_{XY} = \frac{1}{n}\sum_{i=1}^{n}(x_i - m_X)(y_i - m_Y)$$

where n = number of data pairs
Covariance = Variance if X = Y
The Covariance Has a Magnitude and Units Problem Similar to the Variance. Therefore, We Use the Correlation Coefficient, Usually Denoted by ρ_XY.
The Correlation Coefficient Is Given by

$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X\,\sigma_Y}$$
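The covariance and correlation formulas can be coded directly. The sketch below uses the divisor n, as written above; because the same divisor appears in the numerator and denominator, the correlation coefficient does not depend on that choice. The gamma ray and porosity values are copied from the GR versus porosity table earlier in this section, and the function names are hypothetical.

```python
import math

# Gamma ray (API) and porosity (fraction) from the GR versus porosity table.
gr = [87.71, 67.55, 42.21, 47.33, 48.48, 46.05, 40.49, 28.05, 15.98, 7.61,
      11.15, 24.89, 49.39, 72.20, 79.02, 83.61, 91.02, 94.37, 85.52]
phi = [0.097, 0.098, 0.090, 0.093, 0.079, 0.061, 0.064, 0.071, 0.075, 0.074,
       0.065, 0.058, 0.056, 0.068, 0.105, 0.139, 0.144, 0.149, 0.144]


def covariance(x, y):
    """sigma_XY = (1/n) * sum((x_i - m_X) * (y_i - m_Y))."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n


def correlation(x, y):
    """rho_XY = sigma_XY / (sigma_X * sigma_Y); covariance(x, x) is the variance."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


print(round(covariance(gr, phi), 4), round(correlation(gr, phi), 3))
```

The covariance carries GR units times porosity units, while the correlation coefficient is unitless, which is why the latter is preferred for comparing relationships.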

Bivariate Statistics

Bivariate Measures (continued)

The Correlation Coefficient Is Unitless and Varies Between -1 and +1. Values Near Zero Indicate No Significant Linear Correlation.

[Figure: three scatterplots of Y vs X (0 to 10) with different degrees of correlation.]

Note: The Variance (σ²) of the X and Y Values is the Same for All Plots

Bivariate Statistics

Bivariate Measures (continued)

Correlation Coefficient is Extremely Sensitive to Extreme Values (Outliers)

[Figure: scatterplot (X 0 to 60, Y 0 to 25) containing an extreme point.]

ρ (all points) = 0.96
ρ (excluding extreme) = 0.23

May Also Calculate the Rank Correlation Coefficient, R. If R is Significantly Different than ρ, then Extremes are Present or a Non-Linear Relationship Exists.

Bivariate Statistics

Bivariate Measures (continued)

Example Calculation of R for a Case With an Extreme Value

x       y       Rank x   Rank y
1.00    4.00    1        8
1.50    0.30    2        1
2.00    2.00    3        5
2.20    0.80    4        3
2.50    1.60    5        4
3.00    0.60    6        2
3.10    7.00    7        9
4.00    3.00    8        7
4.05    2.40    9        6
18.00   22.00   10       10

Total Correlation Coefficient                    0.94
Correlation Coefficient without Extreme Value    0.18
Rank Correlation Coefficient                     0.48

[Figure: two panels, "Normal" (y vs x) and "Rank" (rank y vs rank x).]

Bivariate Statistics

Bivariate Measures (continued)

Example Calculation of R for a Case With a Non-Linear Relationship

x       y          Rank x   Rank y
1.00    0.00       1        1
2.00    16.00      2        2
3.00    81.00      3        3
4.00    300.00     4        4
5.00    499.00     5        5
6.00    1000.00    6        6
7.00    2401.00    7        7
8.00    4096.00    8        8
9.00    6000.00    9        9
10.00   21000.00   10       10

Total Correlation Coefficient    0.73
Rank Correlation Coefficient     1.00

[Figure: two panels, "Normal" (y vs x) and "Rank" (rank y vs rank x).]
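The rank correlation coefficient R can be computed as the ordinary correlation coefficient of the ranks (Spearman's method). The sketch below applies it to both example tables; it assumes there are no tied values, which holds for these data, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Ordinary (linear) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank 1 = smallest value (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def rank_correlation(x, y):
    """Rank correlation R: the correlation coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


# Case with an extreme value
x1 = [1.00, 1.50, 2.00, 2.20, 2.50, 3.00, 3.10, 4.00, 4.05, 18.00]
y1 = [4.00, 0.30, 2.00, 0.80, 1.60, 0.60, 7.00, 3.00, 2.40, 22.00]
# Case with a non-linear relationship
x2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y2 = [0, 16, 81, 300, 499, 1000, 2401, 4096, 6000, 21000]

for x, y in ((x1, y1), (x2, y2)):
    print(round(pearson(x, y), 2), round(rank_correlation(x, y), 2))
```

In the first case R falls well below the ordinary coefficient because one extreme pair dominates the latter; in the second, R reaches 1 because the relationship is perfectly monotonic even though it is not linear.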

Bivariate Statistics

Linear Regression
Goal is to Summarize a Relationship
Between Two Variables such as
Porosity and Permeability with a
Linear Equation of the Form

y = a + bx

where y = y value, x = x value, b = slope, and a = y intercept
In Addition to Obtaining the Model
Parameters (i.e. a and b), Regression
Analysis Also Gives Information About
the Dependency of the Variables and
the Source of Variability

Bivariate Statistics

Linear Regression (continued)


When Two Variables Have a Non-Zero
Covariance, Measurements of One
Variable Can Be Used To Estimate the
Other
Assume a Deterministic Model of the
Form y = a + bx
The Best Estimates of a and b are
Obtained by Finding the Unbiased,
Minimum Variance Estimates of a and
b
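This Process Gives:

$$b = \frac{\sigma_{XY}}{\sigma_X^2}, \qquad a = m_Y - b\,m_X$$

where m_X and m_Y = mean values of x and y. σ_XY and σ_X² Must Be Computed with the Same Divisor (n or n - 1).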

Bivariate Statistics

Linear Regression
(continued)
The Coefficient of Determination, r², is Used to Estimate How Much of the Overall Variation in Y is Explained by the Linear Model. That is,

r² = Explained Variation of Y / Total Variation of Y

Mathematically, r² is Given by

$$r^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - m_Y)^2}$$

where ŷ_i = a + b x_i is the value predicted by the linear model

Bivariate Statistics

Linear Regression
(continued)
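Sample Plot Showing the Deviations Used in r²

[Figure: a data point (x(1), y(1)) relative to the fitted line y = a + bx and the mean m_y; the total deviation (y1 - m_y) is split into the explained part (ŷ1 - m_y) and the residual (y1 - ŷ1).]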

Bivariate Statistics
Linear Regression Example

Depth               Porosity (%)   Permeability (darcy)   Permeability (md)
1001                16             0.288                  288
1002                11             0.276                  276
1003                10             0.124                  124
1004                9              0.076                  76
1005                7              0.050                  50
1006                5              0.020                  20
1007                1              0.001                  1
1008                2              0.005                  5
1009                7              0.090                  90
1010                4              0.010                  10
1011                8              0.033                  33
1012                13             0.222                  222
1013                19             0.459                  459
1014                20             0.513                  513
1015                29             0.887                  887
1016                17             0.345                  345
1017                17             0.411                  411
1018                15             0.250                  250
1019                6              0.022                  22
1020                2              0.006                  6
Number of Values    20             20                     20
Column Sum          218.00         4.09                   4088.0
Arithmetic Average  10.90          0.20                   204.4
Geometric Average   8.19           0.07                   74.0
Median              9.50           0.11                   107.0
Variance            52.83          0.05                   53566.8
Standard Deviation  7.27           0.23                   231.4

Between Porosity (X) and Permeability (Y):    (darcy)    (md)
Covariance (divisor n)                        1.537      1536.990
Correlation Coefficient                       0.962      0.962
b                                             0.031      30.623
a                                             -0.129     -129.395
r²                                            0.925      0.925

Note: The Covariance Above Uses the Divisor n and the Variances Use n - 1; b is Computed with a Common Divisor (Sum of Cross-Products Divided by the Sum of Squares of X).

[Figure: scatterplot of permeability (md, -200 to 900) vs porosity (%, 0 to 30) with the best-fit line Y = 30.62X - 129.4.]
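The regression results can be recomputed from the porosity and permeability (md) columns using sums of squares and cross-products, so that no divisor choice enters the slope. The slope must use the same divisor in the covariance and the variance; dividing a covariance computed with n by a variance computed with n - 1 makes the slope about 5% too small. This is a minimal sketch in plain Python; the printed slope and intercept should be close to 30.62 and -129.4, with r² near 0.925.

```python
# Porosity (%) and permeability (md) from the Linear Regression Example table.
phi = [16, 11, 10, 9, 7, 5, 1, 2, 7, 4, 8, 13, 19, 20, 29, 17, 17, 15, 6, 2]
perm = [288, 276, 124, 76, 50, 20, 1, 5, 90, 10, 33, 222, 459, 513, 887,
        345, 411, 250, 22, 6]

n = len(phi)
mx = sum(phi) / n
my = sum(perm) / n

# Sums of squares and cross-products: any common divisor cancels in b.
sxy = sum((x - mx) * (y - my) for x, y in zip(phi, perm))
sxx = sum((x - mx) ** 2 for x in phi)
syy = sum((y - my) ** 2 for y in perm)

b = sxy / sxx          # slope
a = my - b * mx        # intercept

# Coefficient of determination: 1 - residual variation / total variation.
residual = sum((y - (a + b * x)) ** 2 for x, y in zip(phi, perm))
r2 = 1.0 - residual / syy

print(f"b = {b:.3f}, a = {a:.3f}, r2 = {r2:.3f}")
```

The negative intercept means the line predicts negative permeability below roughly 4% porosity, a reminder that the fit is only a summary of the data range.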

Bivariate Statistics
Average Weight of Linemen at Texas

[Figure: average weight (0 to 300) vs year (1900 to 2000), with the best-fit line y = -1108 + 0.66x.]

Given the Best-Fit Line Above, What Will Be the Average Weight of Linemen at the University of Texas in the Year 3000?

What About in the Year 1000?

(Data and Example from Larson & Marx, 1986)
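Evaluating the best-fit line at the requested years makes the extrapolation problem concrete. The sketch simply codes the line as given on the slide, in the weight units of the original plot; the function name is hypothetical.

```python
def lineman_weight(year: float) -> float:
    """Best-fit line from the slide: y = -1108 + 0.66x, with x = year."""
    return -1108 + 0.66 * year


for year in (1950, 3000, 1000):
    print(year, round(lineman_weight(year)))
```

The year 3000 gives about 872 and the year 1000 gives about -448, a negative weight, which no physical process allows.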
Bivariate Statistics

Linear Regression
(continued)
Limitations
Linear Regression Only Useful
if Data Trend is Linear.
Extrapolation Using Best-Fit
Line Outside of Data Range is
Often Outright Wrong or
Misleading.
Examine Your Data Carefully. A
Subset of the Data May Have
a Useful Linear Trend Even if
the Entire Data Set Does Not.

Bivariate Statistics
Section Summary
Scatterplot Is Basic Bivariate Display
Relationship Between Variables
Extreme Values Easily Spotted
May Be Linear, Semi-Log, or Log-Log
Bivariate Measures
Covariance
Correlation Coefficient
Unit-Less
Varies Between -1 and +1
Rank Correlation Coefficient
A Rank Correlation Coefficient that Exceeds the Correlation Coefficient Indicates a Non-Linear Relationship
A Rank Correlation Coefficient Less than the Correlation Coefficient Suggests the Presence of Extreme Values
Linear Regression

Bivariate Statistics

Learning Objectives
Bivariate Data Display
Scatterplot or Crossplot
Bivariate Measures
Covariance
Correlation Coefficient
Rank Correlation Coefficient
Brief Review of Linear
Regression
Procedure
Example
Limitations
