Data Analytics: Basic Statistical Parameters

Data Analytics 2
Basic Statistical Parameters

One of the most important tools in statistics is the histogram, which
describes the frequency distribution of data. A histogram is constructed
from data, whereas a probability distribution is a theoretical model. One
of the main advantages of a histogram is its graphic interpretation: it
reveals the frequency properties of the data. The basic statistical
parameters (Table 2.1), such as the mean, variance, and skewness, are
conveyed in the histogram. The mean describes the central tendency of the
data, even though it may or may not be among the data values. The median
and mode, which also describe the central tendency of the data, are
likewise conveyed in the histogram. The variance describes the overall
variation in the data around the mean. The skewness describes the
asymmetry of the data distribution. All these parameters are useful for
exploratory data analysis. Figure 2.1 shows a porosity histogram with a
mean porosity of 0.117, a median of 0.112, a mode of 0.132, and a variance
of 0.00034. These common statistical parameters, along with other useful
parameters, are described in Table 2.1.
Table 2.1 Commonly used statistical parameters
Fig. 2.1 Histogram of porosity from well logs in a fluvial sandstone reservoir
Mean
The mean has several variants, including arithmetic mean (the basic,
default connotation), geometric mean, and harmonic mean.
The arithmetic mean is defined as
$$m_a = \frac{1}{n} \sum_{i=1}^{n} x_i$$
The geometric mean is defined as
$$m_g = \left( \prod_{i=1}^{n} x_i \right)^{1/n} = \sqrt[n]{x_1 x_2 \cdots x_n}$$
The harmonic mean is defined as
$$m_h = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}}$$
The arithmetic mean is the sum of the sample values divided by the number
of samples, and it is often used to describe the central tendency of the
data. It is an unbiased statistical parameter that characterizes the
average mass in the data. The geometric mean is more appropriate for
describing proportional growth, such as exponential growth and varying
growth. For example, the geometric mean can be used for a compounding
growth rate. More generally, the geometric mean is useful for sets of
positive numbers interpreted according to their product. The harmonic
mean provides an average for cases where a rate or ratio is concerned, and
it is thus more useful for sets of numbers defined in relation to some unit.
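As a minimal sketch, the three means defined above can be computed in Python as follows. The function names and the sample growth factors are illustrative assumptions and are not part of the original notes; this sketch accepts only strictly positive values for the geometric and harmonic means.

```python
import math


def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product, computed through logarithms for stability."""
    if any(v <= 0 for v in values):
        raise ValueError("this sketch needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    if any(v <= 0 for v in values):
        raise ValueError("this sketch needs strictly positive values")
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical yearly growth factors (+10 %, +20 %, -10 %), for illustration only.
growth_factors = [1.10, 1.20, 0.90]
m_a = arithmetic_mean(growth_factors)
m_g = geometric_mean(growth_factors)  # equivalent constant compounding factor
m_h = harmonic_mean(growth_factors)
print(f"arithmetic={m_a:.4f} geometric={m_g:.4f} harmonic={m_h:.4f}")
```

For positive data the harmonic mean never exceeds the geometric mean, which in turn never exceeds the arithmetic mean.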
Weighted Mean/Average
The weighted mean is commonly used in geosciences and reservoir data
analyses. In weighted averaging, the mean is expressed as a linear
combination of data with weighting coefficients. Only the relative weights
matter in such a linear combination; once normalized, the weighting
coefficients sum to one (termed a convex combination). The weighted mean
of a dataset of n values, x_i, is
$$m_x = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$$
An element with a higher weighting coefficient contributes more to the
weighted mean than an element with a lower weight. The weights cannot
be negative; some weights can be zero, but not all of them, because
division by zero is not allowed.
Mean, Change of Scale and Sample’s Geometries

One of the most common uses of the mean in geosciences and reservoir
analysis is to average data from a small scale to a larger scale. Core plugs
are a few centimeters, well logs are often sampled or averaged at 15 cm
(half foot) intervals, and 3D reservoir properties are generally modeled
with 20–100 m lateral grid cells and 0.3–10 m in thickness. As a result,
averaging data from a small scale to a larger scale is very common. The
question is what averaging method to use. Change of spatial scale of data
can be treacherous in reservoir characterization and modeling. The
weighted mean has many uses, including the change of scales from well
logs to geocellular grids and from geocellular grids to dynamic simulation
grids, estimating population statistics from samples, and mapping
applications.
Figure 2.2a shows a scheme in which the unweighted mean of three
porosities is 15%, whereas the length-weighted mean is 8.6%:
$$m = \frac{0.5}{0.5 + 0.2 + 1.5} \times 0.2 + \frac{0.2}{0.5 + 0.2 + 1.5} \times 0.22 + \frac{1.5}{0.5 + 0.2 + 1.5} \times 0.03 \cong 0.086$$
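The length-weighted calculation for Figure 2.2a can be reproduced with a short sketch. The function name `weighted_mean` and its input checks are assumptions made for illustration; the porosities and lengths are the ones quoted in the text.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i), with non-negative weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights cannot be negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * x for w, x in zip(weights, values)) / total


# Porosities and sample lengths from the Figure 2.2a example in the text.
porosity = [0.20, 0.22, 0.03]
length = [0.5, 0.2, 1.5]

simple = sum(porosity) / len(porosity)            # unweighted mean, 0.15
length_weighted = weighted_mean(porosity, length)  # about 0.086
print(f"unweighted={simple:.3f} length-weighted={length_weighted:.3f}")
```

The long, low-porosity interval dominates the weighted result, which is why it falls well below the unweighted mean.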
Figure 2.2b shows an upscaling configuration. Assuming the lateral size
(in both the X and Y directions) of the grid cells is constant, the volume
of Cell 1 is 3/4 of the volume of Cell 3, the volume of Cell 2 is 1/4 of
the volume of Cell 3, and Cell 4 has the same volume as Cell 3. Then,
Cell 1 has a weight of 0.25 [i.e., 0.75/(0.75 + 0.25 + 1 + 1) = 0.25],
Cell 2 has a weight of 0.083 [i.e., 0.25/(0.75 + 0.25 + 1 + 1) ≈ 0.083],
and Cells 3 and 4 each have a weight of 1/3.
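The cell weights above follow from dividing each relative volume by the total. A minimal sketch, with the helper name `normalize_weights` chosen here for illustration:

```python
def normalize_weights(relative_sizes):
    """Scale non-negative relative sizes so that they sum to one."""
    if any(s < 0 for s in relative_sizes):
        raise ValueError("relative sizes cannot be negative")
    total = sum(relative_sizes)
    if total == 0:
        raise ValueError("at least one size must be positive")
    return [s / total for s in relative_sizes]


# Volumes of Cells 1 to 4 relative to Cell 3, as stated for Figure 2.2b.
relative_volumes = [0.75, 0.25, 1.0, 1.0]
weights = normalize_weights(relative_volumes)
print([round(w, 3) for w in weights])
```

Because only the relative weights matter, the same result is obtained from absolute cell volumes in any consistent unit.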
Fig. 2.2 (a) Uneven sampling scheme with 3 samples (can be a horizontal or vertical well, or any
1D sampling scheme). (b) Illustration of upscaling 4 grid cells into 1 large cell
Example:
In mapping the area shown below, the fractional volumes of dolomite
(Vdolomite) at two locations are given. The location with Vdolomite 0.1
is exactly in the middle; the location with Vdolomite 0.9 is 1/6 of the
length from the east side. Assuming no geological interpretation was done,
estimate the target Vdolomite for the map.
Solution:
[Figure: map of total length L, with the distances L × (1/2) and L × (1/6) marked]
The sample of Vdolomite 0.1 should have a weight twice that of the sample
of Vdolomite 0.9 in estimating the target fraction for the map, because it
should represent not only the central area but also the western area, where
no sample is available. Therefore, the following weighted average should be
used for estimating the target Vdolomite:
Let L = 1:
$$0.1 \times \left( \frac{1}{2} + \frac{1}{6} \right) + 0.9 \times \left( \frac{1}{6} + \frac{1}{6} \right) = 0.1 \times \frac{2}{3} + 0.9 \times \frac{1}{3} = 0.367$$
In contrast, the estimate by a simple average is 0.5.
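One way to arrive at the weights 2/3 and 1/3 is to give each sample the length of the map section that lies closer to it than to any other sample. The sketch below assumes that nearest-sample interpretation; the function name `influence_lengths` and the coordinate convention (west edge at 0, east edge at L = 1) are illustrative choices, not taken from the notes.

```python
def influence_lengths(positions, start, end):
    """Length of the 1D segment nearest to each sample between start and end."""
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    lengths = [0.0] * len(positions)
    for rank, i in enumerate(order):
        # The zone of a sample runs from the midpoint with its left neighbor
        # (or the start of the line) to the midpoint with its right neighbor
        # (or the end of the line).
        if rank == 0:
            left = start
        else:
            left = 0.5 * (positions[order[rank - 1]] + positions[i])
        if rank == len(order) - 1:
            right = end
        else:
            right = 0.5 * (positions[i] + positions[order[rank + 1]])
        lengths[i] = right - left
    return lengths


# With L = 1: one sample in the middle, one at 1/6 of the length from the east side.
positions = [1 / 2, 1 - 1 / 6]
v_dolomite = [0.1, 0.9]

weights = influence_lengths(positions, 0.0, 1.0)
estimate = sum(w * v for w, v in zip(weights, v_dolomite)) / sum(weights)
print(f"weights={weights} estimate={estimate:.3f}")
```

With these inputs the weights come out as 2/3 and 1/3, matching the hand calculation.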
Averaging Permeability Data
Consider the permeability distributions shown in Figure 2.3. Each
geometry requires a different expression to determine the average
permeability, as outlined below. For horizontal flow through layered
permeability, use the arithmetic average (also called the mean):
$$k_{avg} = k_{arith} = \frac{\sum k_i h_i}{\sum h_i}$$
The unweighted arithmetic average permeability k is determined from:
$$\bar{k}_A = \frac{\sum k_i}{n}$$
For vertical flow through layers, use the harmonic average:
$$k_{avg} = k_{harmonic} = \frac{\sum h_i}{\sum h_i / k_i}$$
The unweighted harmonic average permeability k is determined from:
$$\bar{k}_H = \frac{n}{\sum_{i=1}^{n} \left( \dfrac{1}{k_i} \right)}$$
For flow in any direction through a disorganized permeability distribution,
use the geometric average:
$$k_{avg} = k_{geom} = \bar{k}_G = \sqrt[n]{k_1 k_2 \cdots k_n}.$$
The above formula for $k_{geom}$ is a simplified version of the full form, which
includes the volume $V_i$ of region $i$ and the total system volume $V_t$:
$$k_{avg} = k_{geom} = k_1^{V_1/V_t} \, k_2^{V_2/V_t} \cdots k_n^{V_n/V_t}.$$
For the same set of data, $k_{harm} \le k_{geom} \le k_{arith}$, and they are
equal only in a homogeneous reservoir. These formulas apply at any scale of
heterogeneity. That is, the averages apply equally whether the
heterogeneities are in the form of cm-thick laminations, meter-scale beds,
or km-size regions.
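The three permeability averages can be compared side by side with a short sketch. The function names and the layer data below are hypothetical values chosen for illustration; here the layer thicknesses also serve as the volume weights in the geometric average, which assumes layers of equal lateral extent.

```python
import math


def k_arithmetic(k, h):
    """Thickness-weighted arithmetic average (horizontal flow through layers)."""
    return sum(ki * hi for ki, hi in zip(k, h)) / sum(h)


def k_harmonic(k, h):
    """Thickness-weighted harmonic average (vertical flow through layers)."""
    return sum(h) / sum(hi / ki for ki, hi in zip(k, h))


def k_geometric(k, v):
    """Volume-weighted geometric average: product of k_i ** (V_i / V_t)."""
    total = sum(v)
    return math.exp(sum(vi * math.log(ki) for ki, vi in zip(k, v)) / total)


# Hypothetical three-layer system: permeability in mD, thickness in m.
k = [50.0, 200.0, 10.0]
h = [2.0, 1.0, 3.0]

k_a = k_arithmetic(k, h)
k_g = k_geometric(k, h)
k_h = k_harmonic(k, h)
print(f"harmonic={k_h:.1f} geometric={k_g:.1f} arithmetic={k_a:.1f} mD")
```

For a homogeneous system, where every layer has the same permeability, all three functions return that same value.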
Fig. 2.3 Averaging permeability for different permeability distributions: (a) layered permeability and horizontal flow; (b)
layered permeability and vertical flow; (c) random permeability. Case (c) assumes the regions are roughly equal in size.
EFFECTIVE PERMEABILITY FROM CORE DATA

The effective permeability, obtained from core data, may be estimated
from:
$$k_e = \left( 1 + \frac{\sigma_k^2}{6} \right) \times e^{\bar{k}_G}$$
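where $\bar{k}_G$ is the geometric mean of the natural log of permeability, i.e.:
$$\bar{k}_G = \sqrt[n]{\ln k_1 \, \ln k_2 \, \ln k_3 \cdots \ln k_n}$$
and $\sigma_k^2$ is the variance of the natural log of the permeability estimates:
$$\sigma_k^2 = \frac{\sum_{i=1}^{n} \left( \ln k_i - \overline{\ln k} \right)^2}{n - 1}$$
where
$$\overline{\ln k} = \frac{\sum \ln k_i}{n}$$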
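The definitions above translate into a few lines of Python. This is a sketch that follows the formulas exactly as stated in these notes; the function name and the sample values are hypothetical. Note one consequence of the definition: the geometric mean of ln k is only defined when every ln k is positive, so this sketch requires every permeability to exceed 1 (in the units used, e.g. mD).

```python
import math


def effective_permeability(k_values):
    """k_e = (1 + var(ln k) / 6) * exp(geometric mean of ln k)."""
    n = len(k_values)
    if n < 2:
        raise ValueError("at least two values are needed for the variance")
    ln_k = [math.log(k) for k in k_values]
    if any(v <= 0 for v in ln_k):
        raise ValueError("every permeability must exceed 1 so that ln k > 0")
    # Geometric mean of the ln k values, computed through logarithms.
    k_g = math.exp(sum(math.log(v) for v in ln_k) / n)
    # Sample variance of ln k (n - 1 in the denominator).
    mean_ln_k = sum(ln_k) / n
    variance = sum((v - mean_ln_k) ** 2 for v in ln_k) / (n - 1)
    return (1.0 + variance / 6.0) * math.exp(k_g)


# Hypothetical core permeabilities in mD, for illustration only.
sample = [100.0, 150.0, 200.0]
k_e = effective_permeability(sample)
print(f"effective permeability = {k_e:.1f} mD")
```

When all values are identical, the variance is zero and the function returns that common permeability.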
Example:
Given the permeability data in the table below, calculate:
1- The arithmetic, geometric, and harmonic averages of the core-derived
permeability values.
2- The effective permeability.
Solution:
1)
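$$\bar{k}_A = \frac{\sum k_i}{n} = \frac{120 + 213 + \cdots + 117}{14} = 165 \text{ mD}$$
$$\bar{k}_G = \sqrt[n]{k_1 k_2 \cdots k_n} = \sqrt[14]{120 \times 213 \times \cdots \times 117} = 158.7 \text{ mD}$$
$$\bar{k}_H = \frac{n}{\sum_{i=1}^{n} \left( \dfrac{1}{k_i} \right)} = \frac{14}{\left( \dfrac{1}{120} \right) + \left( \dfrac{1}{213} \right) + \cdots + \left( \dfrac{1}{117} \right)} = 151.4 \text{ mD}$$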
The harmonic averaging technique yields, as expected, the lowest value of
average permeability. However, the differences among the three averages are
not significant, implying that the formation is essentially homogeneous.
2) Natural log of the core-derived permeability values:
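$$\bar{k}_G = \sqrt[n]{\ln k_1 \, \ln k_2 \, \ln k_3 \cdots \ln k_n} = \left( 7.173 \times 10^{9} \right)^{1/14} = 5.058$$
$$\overline{\ln k} = \frac{\sum \ln k_i}{n} = \frac{70.938}{14} = 5.067$$
$$\sigma_k^2 = \frac{\sum_{i=1}^{n} \left( \ln k_i - \overline{\ln k} \right)^2}{n - 1} = \frac{1.2171}{14 - 1} = 0.093623$$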
The arithmetic average of the natural log of the 14 permeability values
(5.067) is practically equal to the geometric mean of the same log values
(5.058). This further indicates that this particular formation is
practically homogeneous. Using the geometric mean of the natural log of k
values, the effective permeability is:
$$k_e = \left( 1 + \frac{\sigma_k^2}{6} \right) \times e^{\bar{k}_G} = \left( 1 + \frac{0.093623}{6} \right) \times e^{5.058} = 159.729 \text{ mD}$$
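The final arithmetic can be checked from the summary statistics quoted in the solution. The individual core values are not reproduced here, so the sketch starts from the reported sum of squared deviations (1.2171) and the reported geometric mean of ln k (5.058); the variable names are illustrative.

```python
import math

n = 14                  # number of core permeability values
sum_sq_dev = 1.2171     # reported sum of (ln k_i - mean ln k)^2
k_g_ln = 5.058          # reported geometric mean of the ln k values

variance = sum_sq_dev / (n - 1)
k_e = (1.0 + variance / 6.0) * math.exp(k_g_ln)
print(f"variance = {variance:.6f}, k_e = {k_e:.2f} mD")
```

The result agrees with the hand calculation to within rounding of the inputs.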
