(GEOSTAT) 1b Review On Basic Statistics

TA3201 Geostatistics for Resources Modeling
Review on Basic Statistics

Review on Basic Statistical Analysis
1. Univariate Statistics
Analysis of a single variable without considering its location. The
data are assumed to be a random variable.
2. Bivariate Statistics
Analysis of two different variables measured at the same location.
3. Spatial Statistics
Analysis of a variable that takes the spatial aspect of the data into account. It
can be applied to natural phenomena by assuming that the data are a
random function.
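Univariate Analysis
Measures of Central Tendency and Shape
1. Mean: $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$
2. Median: the central value of the sorted data
3. Mode: the value with the highest frequency
4. Skewness: $\gamma_1 = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^3$
5. Kurtosis: $\gamma_2 = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^4$
where $\sigma$ is the standard deviation.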
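A minimal Python sketch of the measures above, assuming a plain list of sample values and the 1/n moment convention used in the formulas; the function name `central_tendency_and_shape` is hypothetical. Kurtosis is returned without subtracting 3, so a normal distribution gives a value close to 3. For the mode, ties are resolved by taking the value encountered first, which is an arbitrary choice; for continuous grades the mode is normally read from a histogram class instead.

```python
from collections import Counter
from statistics import median


def central_tendency_and_shape(x):
    """Mean, median, mode, skewness and kurtosis of a list of values (1/n moments)."""
    n = len(x)
    mu = sum(x) / n
    # Standard deviation with the 1/n convention, used to scale the moments
    sigma = (sum((v - mu) ** 2 for v in x) / n) ** 0.5
    # Most frequent value; on ties Counter keeps the value seen first
    mode = Counter(x).most_common(1)[0][0]
    # Third and fourth standardized moments
    skewness = sum(((v - mu) / sigma) ** 3 for v in x) / n
    kurtosis = sum(((v - mu) / sigma) ** 4 for v in x) / n
    return {"mean": mu, "median": median(x), "mode": mode,
            "skewness": skewness, "kurtosis": kurtosis}


print(central_tendency_and_shape([1, 2, 2, 3, 10]))
```

For example, `[1, 2, 2, 3, 10]` gives a mean of 3.6, a median and mode of 2, and a positive skewness, because the single high value pulls the mean above the median.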
Skewness of some histograms: (a) symmetric (normal distribution); (b) negative skewness; and (c) positive skewness (lognormal distribution).
Measures of Dispersion
1. Range: $\text{range} = x_{max} - x_{min}$
2. Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2$
3. Standard Deviation: $\sigma = \sqrt{\sigma^2}$
4. Coefficient of Variation: $CV = \sigma / \mu$
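The dispersion measures can be sketched the same way, again assuming the 1/n variance convention; the function name `dispersion` is hypothetical. Because CV is dimensionless, it allows deposits with different grade units to be compared, as in the table that follows.

```python
import math


def dispersion(x):
    """Range, variance (1/n), standard deviation and coefficient of variation."""
    n = len(x)
    mu = sum(x) / n
    variance = sum((v - mu) ** 2 for v in x) / n
    sd = math.sqrt(variance)
    return {
        "range": max(x) - min(x),
        "variance": variance,
        "std": sd,
        "cv": sd / mu,  # undefined if the mean is zero
    }


print(dispersion([2, 4, 4, 4, 5, 5, 7, 9]))
```

For these eight values the sketch returns a range of 7, a variance of 4.0, a standard deviation of 2.0 and a CV of 0.4.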
Coefficient of variation (CV) of grade values of some mineral deposits in the world

Type of mineral deposit                                    CV
Gold: California, USA; placer Tertiary                     5.10
Tin: Pemali, Bangka, Indonesia; Primary                    2.89
Gold: Loraine, South Africa; Black Bar                     2.81
Gold: Norseman, Australia; Princess Royal Reef             2.22
Gold: Grasberg, Papua, Indonesia                           2.01
Lead: Grasberg, Papua, Indonesia                           1.57
Tungsten: Alaska                                           1.56
Gold: Shamva, Rhodesia                                     1.55
Uranium: Yeelirrie, Australia                              1.19
Gold: Vaal Reefs, South Africa                             1.02
Zinc: Grasberg, Papua, Indonesia                           0.87
Zinc: Frisco, Mexico                                       0.80
Nickel: Kambalda, Australia                                0.70
Manganese                                                  0.58
Lead: Frisco, Mexico                                       0.57
Sulphur in Coal: Lati Mine, Berau, Indonesia               0.48
Lateritic Nickel: Gee Island, East Halmahera, Indonesia    0.44
Iron ore                                                   0.27
Bauxite                                                    0.22
Bivariate Analysis
Scatter plot: used to display the relationship between two different
variables (i.e. x and y variables) measured at the same location.
Covariance ($C_{xy}$): used to measure the joint dispersion of two different
variables (i.e. x and y variables) measured at the same location:
$C_{xy} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu_x)(y_i-\mu_y)$
Coefficient of correlation ($\rho$): used to measure the correlation
between two different variables (i.e. x and y variables) measured at
the same location:
$\rho_{xy} = \frac{C_{xy}}{\sigma_x \sigma_y}$
Linear regression of two variables:
$y = ax + b$, with $a = \frac{C_{xy}}{\sigma_x^2}$ and $b = \mu_y - a\,\mu_x$
where: a = slope
b = y-intercept
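The bivariate formulas above can be sketched in a few lines, assuming paired lists `x` and `y` of equal length (one pair per sample location) and the 1/n convention; the function name `bivariate` is hypothetical. The slope and intercept are the ordinary least-squares estimates of y on x.

```python
import math


def bivariate(x, y):
    """Covariance, correlation coefficient and least-squares line y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Covariance with the 1/n convention
    cxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n)
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)
    rho = cxy / (sx * sy)
    a = cxy / sx ** 2   # slope
    b = my - a * mx     # y-intercept
    return cxy, rho, a, b


print(bivariate([1, 2, 3, 4], [3, 5, 7, 9]))
```

For perfectly linear data such as these (y = 2x + 1), the sketch returns a covariance of 2.5, a slope of 2, an intercept of 1 and a correlation coefficient of 1, up to rounding.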
About Outliers…
• Outliers strongly affect the data distribution and the spatial
structure used in estimation (they tend to cause overestimation).
• There is no strict rule for deciding how to handle outliers, or even
for deciding what an outlier is; any solution relies on judgement and
common sense.
• The presence of outliers may require a special robust estimator of
the mean, e.g. "Sichel's t-estimator" (Sichel, 1966).
• In this topic we only discuss the practical problem of correcting
individual values, by cutting/capping high values.
• Another approach: data lying more than two standard deviations above
or below the mean can be considered outliers (anomalies).
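Cutting/capping can be sketched as below. The top-cut value is assumed to have been chosen beforehand (for example from a probability plot); the function name `cap_grades` and the sample grades are hypothetical.

```python
def cap_grades(grades, top_cut):
    """Cut (cap) every grade above top_cut down to top_cut."""
    capped = [min(g, top_cut) for g in grades]
    # Number of samples whose grade was reduced
    n_capped = sum(1 for g in grades if g > top_cut)
    return capped, n_capped


grades = [0.5, 1.2, 0.8, 2.0, 15.0]  # hypothetical assays with one extreme value
capped, n = cap_grades(grades, top_cut=3.0)
print(capped, n)
print(sum(grades) / len(grades), sum(capped) / len(capped))
```

Here a single capped value lowers the mean from 3.9 to 1.5, which illustrates why uncapped outliers tend to cause overestimation.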
Cumulative frequency curve of uranium grades and suggested correction
for outliers (David, 1988).
Probability plots of (a) Pb and (b) Zn grades for each rock type. The parts
marked by dotted circles show some of the lowest and highest values, which are
considered outliers, while the horizontal dotted line in each graph is the
cut-off between lower and higher grades in the intrusive group (Heriawan et al.,
2008).
Probability plot of tin grade down to -100, Pemali Mine (x-axis: Sn grade TDH, kg/m3; y-axis: cumulative percent). Top-cut for Sn grade = 3.26 kg/m3.
Distribution of metal grades in each rock type for a Cu-Au porphyry deposit: probability plots of Au, Cu, Ag, Pb, Zn and Mo (ppm) against cumulative percent for Acidic-Andesitic Volcanics, Breccia, Porphyritic Diorite and Tuff.
One method to differentiate background and anomalous data: a histogram divided by class limits at M - 2SD (0.7), M - 1SD (0.9), Mean (1.1), M + 1SD (1.3) and M + 2SD (1.6), where M is the mean and SD the standard deviation. Values within 1SD of the mean are background, values between 1SD and 2SD from the mean are slightly anomalous, and values beyond 2SD are anomalous.
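The background/anomaly classification in this figure can be sketched as follows, assuming the population standard deviation and that a value lying exactly on a class limit goes to the lower class; the function name `classify_anomalies` is hypothetical.

```python
import statistics


def classify_anomalies(values):
    """Label each value by its distance from the mean in standard deviations."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    labels = []
    for v in values:
        z = abs(v - m) / sd  # both tails are treated alike
        if z <= 1:
            labels.append("background")
        elif z <= 2:
            labels.append("slightly anomalous")
        else:
            labels.append("anomalous")
    return labels


print(classify_anomalies([1.0] * 8 + [1.5, 2.0]))
```

With eight values of 1.0 plus 1.5 and 2.0, the 1.0 values are background, 1.5 is slightly anomalous and 2.0 is anomalous.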
Recognizing different populations…
Data-point locations of the two populations of sodium content: low content
(smaller than 1 %) and high content (larger than 1 %).
Fe vs. Ni grades in a lateritic nickel deposit (limonite zone and saprolite zone).
The perspective views of Pb-Zn grades in the intrusive group for (a) Pb and (b) Zn; blue and grey colors show high grade and low grade, respectively.

Statistics of Pb-Zn grades for each cut-off in the intrusive group

Statistic        Pb >0.005%   Pb <0.005%   Zn >0.01%   Zn <0.01%
No. of values    979          10671        2666        9107
Min              0.0050       0.0003       0.0100      0.0004
Max              0.8537       0.0049       3.2509      0.0099
Mean             0.0354       0.0014       0.0653      0.0050
Median           0.0086       0.0012       0.0150      0.0047
Std error        0.0032       0.000010     0.0047      0.000022
Variance         0.0103       0.000001     0.0579      0.000004
Coef. of var.    2.8744       0.7010       3.6828      0.4128
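The rows of this table can be computed for any cut-off with a sketch like the one below. Assumptions: grades at or above the cut-off go to the high group (consistent with the minimum of 0.0050 in the Pb >0.005% column), the variance uses the 1/n convention, and the standard error is the standard deviation divided by √n, which is consistent with the Std error and Coef. of var. rows. The function names and sample grades are hypothetical; each group must contain at least one value.

```python
import math
import statistics


def summarize(values):
    """Count, min, max, mean, median, standard error, variance and CV."""
    n = len(values)
    mean = statistics.mean(values)
    var = statistics.pvariance(values, mu=mean)  # 1/n variance
    return {
        "n": n, "min": min(values), "max": max(values),
        "mean": mean, "median": statistics.median(values),
        "std_error": math.sqrt(var / n), "variance": var,
        "cv": math.sqrt(var) / mean,
    }


def split_by_cutoff(grades, cutoff):
    """Summary statistics of the high (>= cutoff) and low (< cutoff) groups."""
    high = [g for g in grades if g >= cutoff]
    low = [g for g in grades if g < cutoff]
    return summarize(high), summarize(low)


high, low = split_by_cutoff([0.001, 0.002, 0.003, 0.006, 0.010], cutoff=0.005)
print(high)
print(low)
```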
Distribution of Spatial Data: isotropic, different populations, trend (plane)
An example of spatial correlation of data: the maps show a good
correlation between Cu and Au grades.
1 1 1 1 2 2 2 2 2 1 1 1
1 1 2 2 2 3 2 3 3 2 2 1
1 2 2 2 2 4 3 3 4 3 2 1
1 2 2 4 4 5 5 5 3 3 3 2
2 2 3 7 8 6 7 6 4 2 2 2
2 2 4 7 9 7 6 5 6 4 2 2
2 2 4 5 8 6 5 7 5 4 2 1
1 2 3 3 2 4 5 3 1 2 2 1
1 1 2 2 2 2 3 2 1 1 1 1
1 1 2 2 2 2 2 2 1 1 1 1
Example of data distribution in blocks
• In the blocks (population), if we select a specific area, we obtain
a different histogram (i.e. a different distribution).
• Selecting all blocks produces histogram C, selecting the light grey
blocks produces histogram A, while selecting the dark grey blocks
produces histogram B.
Histogram of data distribution according to the blocks selection
• If the cut-off grade is known to be ≥ 2%, then the 50 × 50 m blocks
with grade ≥ 2% are distributed over four different mine sites, as
shown in figure (a).
• By chance, the selected blocks in the four mine sites have the same
histogram, as shown in figure (b).
• If, for technical reasons, mineable blocks must have a minimum area
of 100 × 100 m (four adjacent blocks), then not all selected blocks
are mineable.
(a) Example of blocks distribution in four different mine sites
(b) Histogram of data distribution from the blocks in four different mine sites
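The minimum-mining-unit argument above can be sketched on a grid of 50 × 50 m block grades. Assumptions: a block is ore if its grade reaches the cut-off, and it is mineable only if it belongs to at least one 2 × 2 group (100 × 100 m) made entirely of ore blocks, with groups allowed to start at any block; the function name `mineable_blocks` and the grid are hypothetical.

```python
def mineable_blocks(grid, cutoff):
    """Return (ore blocks, mineable blocks) as sets of (row, col) indices."""
    rows, cols = len(grid), len(grid[0])
    ore = {(r, c) for r in range(rows) for c in range(cols)
           if grid[r][c] >= cutoff}
    mineable = set()
    # Slide a 2 x 2 window (one 100 x 100 m unit) over the grid
    for r in range(rows - 1):
        for c in range(cols - 1):
            group = [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
            if all(b in ore for b in group):
                mineable.update(group)
    return ore, mineable


grid = [[3, 3, 1],
        [3, 3, 1],
        [1, 1, 2]]  # hypothetical grades (%) of 50 x 50 m blocks
ore, mineable = mineable_blocks(grid, cutoff=2)
print(len(ore), sorted(mineable))
```

In this grid five blocks pass the 2% cut-off, but the isolated block in the lower right corner is not part of any 100 × 100 m unit, so only four are mineable.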
Pattern-1, Pattern-2 and Pattern-3 (block size = 50 × 50 m) with their histograms; Pattern-3 shows separate populations for low grade and high grade.
Classical Statistics vs. Spatial Statistics
• Two methods are known for the statistical analysis of mineral
deposit characteristics: classical statistics and spatial statistics.
• Classical statistics is used to define the properties of sample
values under the assumption that they are realizations of random
variables.
• In this case, the sample composition/support is largely ignored,
and all sample values are assumed to have the same probability of
being picked.
• The presence of trends and ore shoots in mineralized zones is
ignored.
• In earth sciences, however, two samples taken close to each other
usually give more similar values than samples taken farther apart.
Centre de Geostatistique, Ecole des Mines de Paris,
Fontainebleau
• On the other hand, spatial statistics assumes that the sample values
are realizations of a random function.
• Under this hypothesis, a sample value is a function of its location in
the deposit, so the relative positions of samples are taken into account.
• The similarity of sample values as a function of the distance between
samples is the basic theory of spatial statistics.
• To define how strongly points in a deposit are spatially correlated,
we must know the structural function, which is represented by the
variogram (semi-variogram).
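As an illustration of the structural function, the usual experimental semivariogram is $\gamma(h) = \frac{1}{2N(h)}\sum_{i=1}^{N(h)}\left[z(x_i+h) - z(x_i)\right]^2$, where $N(h)$ is the number of sample pairs separated by lag $h$. The sketch below assumes samples equally spaced along a line (e.g. along a drill hole), with the lag counted in sample intervals; the function name `semivariogram` is hypothetical.

```python
def semivariogram(z, max_lag):
    """Experimental semivariogram of equally spaced 1D data for lags 1..max_lag."""
    gamma = {}
    for h in range(1, max_lag + 1):
        # Squared differences of all pairs separated by h intervals
        diffs = [(z[i + h] - z[i]) ** 2 for i in range(len(z) - h)]
        gamma[h] = sum(diffs) / (2 * len(diffs))
    return gamma


print(semivariogram([1, 2, 3, 4, 5], max_lag=2))
```

For these values γ(1) = 0.5 and γ(2) = 2.0: the semivariogram grows with distance, meaning nearby samples are more similar than distant ones.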
Why spatial analysis?
• Statistical description does not take the data location into
account.
• Statistical description does not take the data density into
account.
• Statistical description produces the same result even if the data
locations are changed randomly.
• Spatial analysis can be prepared by plotting the data
distribution on a map.
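The third point can be checked numerically: shuffling the sample locations leaves the mean and variance unchanged, while a spatial measure such as the lag-1 semivariance changes. This is a sketch assuming equally spaced 1D data; the function names and the seed are hypothetical.

```python
import random
import statistics


def lag1_semivariance(z):
    """Half the mean squared difference between neighbouring samples."""
    d = [(z[i + 1] - z[i]) ** 2 for i in range(len(z) - 1)]
    return sum(d) / (2 * len(d))


def compare_with_shuffled(z, seed=0):
    """Statistics of the data before and after randomly moving the samples."""
    shuffled = z[:]
    random.Random(seed).shuffle(shuffled)
    return {
        "mean": (statistics.mean(z), statistics.mean(shuffled)),
        "variance": (statistics.pvariance(z), statistics.pvariance(shuffled)),
        "gamma_1": (lag1_semivariance(z), lag1_semivariance(shuffled)),
    }


print(compare_with_shuffled(list(range(1, 21))))
```

For data with a trend (1 to 20), the mean and variance are identical before and after shuffling, but the lag-1 semivariance rises from 0.5, because the spatial structure has been destroyed.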
Fundamentals of Geostatistics
Four data sets (random distribution, anisotropic distribution, biased distribution and distribution with trend) have the same average and variance and the same histogram, BUT largely different spatial distributions.
Importance of considering data location
