The sampling theory of Pierre Gy: Comparisons, implementation and

applications for environmental sampling
Chapter · January 1996
The Sampling Theory of Pierre Gy:
Comparisons, Implementation, and
Applications for Environmental Sampling
Leon E. Borgman, John W. Kern,
Richard Anderson-Sprecher and George T. Flatman
ABSTRACT
The sampling theory developed and described by Pierre Gy [1]
is compared to design-based classical finite sampling methods for
estimation of a ratio of random variables. For samples of
materials which can be completely enumerated the methods are
asymptotically equivalent. Gy extends the finite sampling
methods to situations where complete enumeration of samples is
not feasible. Gy's methods involve a set of sampling constants
related to the heterogeneity of the material sampled; methods to
estimate these constants from grouped data are given. Computer
programs for the estimation of these constants are described, and
environmental applications are discussed.
INTRODUCTION
Classical finite sampling theory, sometimes called design-based
sampling, is most often associated with the design and
analysis of sample surveys. The identical theory is also used,
however, to guide the selection and analysis of samples in other
applications. In particular, environmental studies are most
often planned and interpreted from this standpoint (see, for
example, [2]). A thorough description of classical sampling may
be found in Cochran [3], and Thompson [4] describes applications
of the classical theory to selected nonstandard problems.
The major advantage of classical random sampling is that it
is fundamentally objective; assumptions made about the underlying
population are, for practical purposes, nonexistent. An
underlying characteristic of the classical paradigm is that it
considers the real world (the population of interest) to be fixed
and deterministic, and randomness is present only because of the
sample selection process. When estimating a fixed population
parameter, variation within the population is thus a hurdle to be
surmounted, and probabilistic description of variation is neither
necessary nor even meaningful. Whatever patterns of variability
are present within the population are effectively removed or,
more accurately, nullified (on average) by randomization of the
sample.
The alternative to classical sampling is generally
understood to be model-based sampling. Model-based sampling
(like design-based sampling) actually consists of a variety of
methods, the most practical and important variant for
environmental samplers being geostatistical sampling (see [5] for
a thorough, general treatment and [6] for a discussion of
geostatistical applications to environmental problems). The main
distinction between model-based and design-based theories rests
on the use or non-use of a model to account for patterns of
variability within the population. Geostatistical models
describe the spatial covariance structure of variables of
interest. Because these models are stochastic, they interject a
degree of randomness into one's perception of the world itself.
In other words, the world as observed through a window in
space-time is no longer seen as a fixed fact, but is rather viewed as a
single realization of a random process.
Because the model-based approach views randomness as part of
the population itself, random sampling is no longer necessary for
a model-based sample design. In fact, it is generally not even
desirable, because regularly spaced observations usually provide
the best information about the random process one assumes to be
lurking behind the realized population. The price paid for
allowing non-random sampling is that the patterns of variation in
space and/or time--that is, the character of the underlying
process--must be adequately (although not perfectly) understood
if estimates are to be reliable. The payoff is that model-based
sampling is usually more efficient because it makes more complete
use of information about the population.
Model-based sampling is now sufficiently well established
amongst sampling theorists for opposing camps of design- vs.
model-based statisticians to have formed, feuded, and (sometimes)
made truces. More and more practitioners now recognize that both
perspectives have value, depending on the actual problem at hand.
Relevant articles of interest include Borgman and Quimby [7] and
Brus and De Gruijter [8].
Are design-based and model-based sampling the only choices
available? There is, perhaps, a third way. Pierre
Gy, a French mining engineer, developed a sampling theory in
relative isolation from the theoretical statisticians, and his
theory is now being advanced as another alternative to the
classical sampling theory [1], [9], [10]. Like classical
sampling, Gy's theory assumes a particular fixed state of nature
which the sampler wishes to describe with calculable confidence.
Like model-based sampling (in particular, geostatistical
sampling), Gy's theory attempts to address patterns in the
variability of the population, body, or area to be sampled.
Although Gy developed his theory in the context of a particular
problem -- the estimation of the grade (percentage mineral
content) in a sample of ore -- proponents view Gy's contributions
as a general theory of sampling which offers improvements over
standard methods in other contexts as well. In particular,
whenever a sampled medium may be viewed as particulate (including
fluids), Gy's theory may be applied. Environmental samplers who
work with soils and waters are of course interested in any
contributions that Gy's theory may offer.
The following pages describe certain important aspects of
Gy's theory and compare it with classical sampling, thereby
clarifying the theory itself and indicating what practitioners
can hope to gain from this perspective. Lyman [11] compares Gy’s
variance estimator to that of Ingamells, including extensive data
analysis. Much of the mathematics in Gy's theory is equivalent
to that in classical design-based sampling theory, and the most
important connections will be examined. Gy's work is most easily
understood in the context in which it was developed, and the
explanation below makes reference to a population of mineral
particles containing varying amounts of ore. Examples are
given for ways that this model can be used in situations that
parallel problems of interest to environmental samplers.
GY'S SAMPLING THEORY
One of the primary contributions of Gy's theory is its
systematic identification of different aspects of population
variability before sampling begins. Gy treats the world as
deterministic, just as do classical samplers, but the choice of
sample for Gy may depend on one's assessment of population
variability. (This assessment may be made in part by studying
the variogram of the variable being studied, so Gy's theory has
connections with geostatistical sampling.) In particular, the
method seeks to improve upon classical sampling by directly
addressing the various sources of error rather than by simply
relying on randomization to account for all potential variation.
Pragmatically, this separation of error sources is often
essential, because some errors may represent actual biases, not
just simple variation. After carrying out a presampling analysis,
scientists may adopt whatever additional assumptions they believe
are appropriate to organize the sampling process and to interpret
results.
The essence of Gy's theory is to "divide and conquer". Once
sources of error in sampling are identified, an attempt is made
to minimize each type of error separately. Pitard [10] describes
this part of Gy's theory in detail and the following brief
summary draws heavily from his exposition. Some error
minimization is just good technician work. In other cases errors
are intrinsic to the population being sampled. Many of Gy's
procedures for accounting for variability are expressly motivated
by properties of particulate sampling. Most notably, true simple
random samples are not feasible in particulate populations. Also,
units can occasionally be altered and rearranged by physical
crushing and mixing. (Note that environmental populations can
rarely be homogenized in this manner.) Finally, technicalities
arise because the parameter of interest, the percent of a mineral
in a body of ore, is a ratio instead of a simple additive
measure.
The most basic errors identified by Gy result from the
intrinsic heterogeneity of the world--not all particles are the
same, and unlike particles are unevenly distributed in space. To
assess the heterogeneity in a population Gy asks two questions.
First, how much do sampling units differ from each other (What is
the "constitution heterogeneity"); and second, how are different
types of units spread about or clustered within the population
(What is the "distribution heterogeneity")? Both types of
heterogeneity affect the reliability of a sample.

Because the constitution heterogeneity is impossible to
alter, the sampling error that is associated with it is termed
the fundamental error (FE). This error can never be eliminated
and is a major focus of much of Gy's theory. Estimation of the
variance of the fundamental error is the portion of the theory
which is outlined in most detail below.
A variety of errors are related to the interaction between
the distribution heterogeneity and the sampling method used
(always some form of cluster sampling). An important error of
this type results from the interplay between uneven clumps of
units in the population and sampling devices that grab clumps of
units. This error is called the grouping and segregation error
(GE). In brief, the grouping and segregation error is smallest
when clustering is absent in both the population and the sampling
procedure. Environmental samplers rarely have the luxury of
being able to homogenize (mix) their populations, so this error
will be present in most problems of interest. As is always true
in cluster sampling problems, the desire to select small sampling
clusters must be balanced against the practicality of sampling
larger clusters.
When the population is itself a nearly linear flow of
particles, then additional errors are associated with
distribution heterogeneity. In this case the distribution
heterogeneity may be addressed by using the variogram to describe
variation in the population (not in a random process as in
geostatistics). Stratified sampling can then be used as a
remedial measure with strata selected according to the errors
associated with trends (TrE) and cycles (CyE) that are identified
in such a stream. Patterns in two or three dimensions are
substantially more difficult to characterize and Gy does not
attempt to address such patterns.
A perfectly executed sampling plan would contain precisely
the errors described above. For a single lot the ideal sampling
error (ISE) would then be
$$\mathrm{ISE} = \mathrm{FE} + \mathrm{GE} \tag{1}$$
and for a moving stream of particles the error would be
$$\mathrm{ISE} = \mathrm{FE} + \mathrm{GE} + \mathrm{TrE} + \mathrm{CyE}. \tag{2}$$
Gy also carefully delineates and probes errors that can
enter a sample from sources other than those ideally described
thus far. Survey statisticians have long recognized problems
such as processing errors, nonresponse bias, and the effects of
improper sampling. Analogous errors may arise in particulate
sampling, and Gy has gone to great pains to describe, measure,
and minimize errors of this sort. Among major error sources
delineated by Gy are: errors that arise from edge effects of
sampling equipment and from similar mechanical problems
(delimitation and extraction, or, jointly, mechanical errors);
errors in preparing samples for laboratory analysis (preparation
errors) and actual errors from the laboratory (analytic errors).
Those familiar with quality control may note a similarity in
spirit between the above identification of error sources and
similar exercises in the quality literature.
Three comments about these additional types of errors are
relevant at this point. First, these errors are potentially
important to the practitioner because they may bias observations,
as mentioned above. If they are recognized in time, most of
these errors can be minimized, or even eliminated, by proper
physical collection and handling. Second, because sampling is
often done in stages, most of the above-mentioned errors can
enter the problem many times, and the stage with the greatest
error present will form a lower bound on the total sampling
error. Third, the theory itself can only point to the existence
of such variability; it cannot itself remedy errors at this
level.
The error which cannot be removed by even the most careful
technicians and the best instrumentation is the error intrinsic
to the population variability, that is the fundamental error.
The focus below is upon the fundamental error because it is
always present and it is the only error that can be assessed
independently of the sampling method. The fundamental error as
defined by Gy is the relative error in estimating the grade
(proportion of desired mineral), and is thus a measure of the
variation intrinsic in the population of available mineral
particles. Its variance is the square of the coefficient of
variation of the grade. If the grade is expressed as the ratio
of two random variables on a set of sampling units, then the
fundamental error may be estimated using methods given by Cochran
[3]. It can be shown that Gy's methods are equivalent to those
given by Cochran, for a population for which complete enumeration
is possible. However, Gy has developed methods for particulate
sampling where complete enumeration of samples is not feasible or
cost effective.
COMPARISON OF GY'S THEORY AND CLASSICAL SAMPLING
The description below uses the notation of classical
statistics and the physical context of particulate ore sampling.
Most environmental samplers will be accustomed to the notation
used, but they will probably wish to translate physical variables
into those used in their own areas of interest. For example,
grade of ore may be analogous to the percentage of some chemical
present in a particular medium.
Let L represent an ore body, where X is the total mass of
the body and Y is the mass of the mineral of interest. The
parameter to be estimated is R = Y/X, the grade of the ore body.
In the language of classical finite sampling, this is the problem
of estimating the ratio of two random variables. In finite
sampling theory, it is assumed that a population is composed of N
sampling units ($U_i$, $i = 1, 2, 3, \ldots, N$) and that certain attributes of
these units may be enumerated or measured. For the estimation of
ore grade, the finite population consists of a set of fragments
of ore. The mass of the mineral of interest contained in
fragment $i$ is denoted by $y_i$ and the total mass of the fragment is
given by $x_i$ for $i = 1, 2, 3, \ldots, N$. The notational conventions used for
totals, averages and ratios are given in Table 1.

Table 1. Notation for 2 random variables measured on a sample of
size n from a population of size N.

          Population of size N                        Sample of size n
Total     $Y = \sum_{i=1}^{N} y_i$                    $y = \sum_{i=1}^{n} y_i$
Mean      $\bar{Y} = Y/N$, $\bar{X} = X/N$            $\bar{y} = y/n$, $\bar{x} = x/n$
Ratio     $R = Y/X = \bar{Y}/\bar{X}$                 $\hat{R} = y/x = \bar{y}/\bar{x}$

An estimator of the ratio R, and approximations of the first
two moments of that estimate are given both by Cochran [3] and by
Gy [1]. Each uses a slightly different method of derivation to
arrive at results, but the moments they obtain are equivalent up
to the order of approximation. A summary of the derivation of
these results is given.
Estimation of the Grade R̂
In a finite population the statistical expectation operator
is defined by averaging over all $C_N^n$ possible samples of
size n, where
$$C_N^n = \frac{N!}{n!\,(N-n)!}. \tag{3}$$
Cochran [3] shows that the expectation of $\bar{y}$ is given by
$$E[\bar{y}] = \frac{\sum \bar{y}}{C_N^n} = \bar{Y}, \tag{4}$$
where the sum is over all possible samples. Using this result,
it is clear that Ny/n is an unbiased estimator of Y. The natural
estimator of R is based on the ratio of totals
$$\hat{R} = \frac{y}{x}. \tag{5}$$
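The averaging in equation (4) can be checked by brute force on a small population. The following is a minimal sketch, not one of the subroutines described later in the text; the four-fragment population and the names `population`, `expected_y_bar`, `expected_total_estimate`, and `ratio_estimate` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Hypothetical population of N = 4 fragments, each (x_i, y_i) =
# (total mass, mass of the mineral of interest). Values are made up.
population = [(10, 1), (20, 1), (30, 3), (40, 3)]
N, n = len(population), 2

Y = sum(y for _, y in population)      # population total of y
Y_bar = Fraction(Y, N)                 # population mean of y

samples = list(combinations(population, n))
assert len(samples) == comb(N, n)      # equation (3): C_N^n possible samples

# Equation (4): average the sample mean over every possible sample.
expected_y_bar = sum(
    Fraction(sum(y for _, y in s), n) for s in samples
) / len(samples)

# Average of N*y/n over every possible sample (y is the sample total).
expected_total_estimate = sum(
    Fraction(N * sum(y for _, y in s), n) for s in samples
) / len(samples)


def ratio_estimate(sample):
    """Equation (5): ratio of the sample totals y/x."""
    return Fraction(sum(y for _, y in sample), sum(x for x, _ in sample))
```

Exact rational arithmetic is used so that the equalities can be tested without rounding error; for realistic N the enumeration is infeasible, which is why the moment approximations that follow are needed.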
Moments of the Estimated Grade R̂
The moments of a ratio estimator are not obvious. Both Gy
and Cochran find means and approximations of variances of R̂.
Brief derivations follow.
Define
$$\mu_x = E(x) = \frac{nX}{N}, \qquad \mu_y = E(y) = \frac{nY}{N}. \tag{6}$$
One may express R̂ in terms of the relative variables u and v
defined by
$$x = \mu_x (1+u) \quad \text{and} \quad y = \mu_y (1+v). \tag{7}$$
Then in terms of u and v, R̂ is given by
$$\hat{R} = R\,\frac{1+v}{1+u}. \tag{8}$$
Changing the denominator into a multiplicative factor and using
Taylor's theorem gives
$$\hat{R} = R\,(1+v)(1 - u + u^2 - u^3 + \cdots). \tag{9}$$
Because the expectations of both u and v are 0,
$$E(\hat{R}) = R\,\bigl(1 - \mu_{11}(u,v) + \mu_{20}(u,v) + \mu_{21}(u,v) + \cdots\bigr) \tag{10}$$
where
$$\mu_{ij}(u,v) = E\bigl[(u-\mu_u)^i (v-\mu_v)^j\bigr]. \tag{11}$$
This may be written in the form
$$E(\hat{R}) = R\,(1+S) \tag{12}$$
where S is given by
$$S = -\mu_{11}(u,v) + \mu_{20}(u,v) + \mu_{21}(u,v) + \cdots. \tag{13}$$
S is equivalent to what Gy calls the fundamental bias,
$$S = \mathrm{Bias}(\hat{R}) \equiv \frac{E(\hat{R}) - R}{R}, \tag{14}$$
which is the relative bias in the estimate R̂. Gy [1] and
Cochran [3] both independently provided approximations of S using
the first two terms in the series and writing the result in terms
of the correlation between x and y. Matheron [12] in an
examination of Gy’s work used Laplace transforms to derive a
general expression for the expectation of the ratio of 2 random
variables raised to a power. This general result was used to
derive Gy’s formula. Further, Cochran [3] presents the exact
results due to Hartley and Ross [13]
$$E(\hat{R}) = R\left(1 - \frac{\mu_{11}(\hat{R},x)}{\mu_x R}\right) \tag{15}$$
and
$$S = -\frac{\mu_{11}(\hat{R},x)}{E(x)\,R}. \tag{16}$$
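The relation between the series form of S and the exact Hartley and Ross form can be illustrated numerically. The sketch below enumerates every sample from a small, made-up population (the values and all variable names are assumptions for illustration) and computes the relative bias three ways: directly from equation (14), from the first two terms of equation (13), and from equation (16).

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical (x_i, y_i) pairs; not data from the text.
population = [(10, 1), (20, 1), (30, 3), (40, 3)]
N, n = len(population), 2
X = sum(x for x, _ in population)
Y = sum(y for _, y in population)
R = Fraction(Y, X)

samples = list(combinations(population, n))
xs = [Fraction(sum(x for x, _ in s)) for s in samples]   # sample totals x
ys = [Fraction(sum(y for _, y in s)) for s in samples]   # sample totals y
r_hats = [y / x for x, y in zip(xs, ys)]                 # equation (5)


def mean(values):
    """Expectation over all equally likely samples."""
    return sum(values) / len(values)


mu_x, mu_y = mean(xs), mean(ys)          # equation (6)
u = [x / mu_x - 1 for x in xs]           # equation (7), solved for u
v = [y / mu_y - 1 for y in ys]           # equation (7), solved for v

# Equation (14): exact relative bias by enumeration.
S_exact = mean(r_hats) / R - 1

# First two terms of equation (13); u and v have mean zero.
mu20 = mean([a * a for a in u])
mu11 = mean([a * b for a, b in zip(u, v)])
S_second_order = mu20 - mu11

# Equation (16): exact form based on the covariance of R-hat and x.
cov_rx = mean([r * x for r, x in zip(r_hats, xs)]) - mean(r_hats) * mu_x
S_hartley_ross = -cov_rx / (mu_x * R)
```

In this sketch `S_hartley_ross` agrees with `S_exact` identically, whereas `S_second_order` is only an approximation whose quality depends on the sample size and the spread of u.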
Only approximate formulas for the variance of R̂ are
available. Using equation (8) we compute the expectation
$$E(\hat{R}^2) = R^2\,(1+S'), \tag{17}$$
where
$$S' = \mu_{02}(u,v) - 4\mu_{11}(u,v) + 3\mu_{20}(u,v) + \cdots \tag{18}$$
is obtained from the Taylor expansion of $1/(1+u)^2$. Combining
equations (12) and (17) gives
$$\mathrm{Var}(\hat{R}) = E(\hat{R}^2) - \{E(\hat{R})\}^2 = R^2\,(S' - 2S). \tag{19}$$
Using the definition of $\mu_{ij}(u,v)$ given in (11) it can be shown
[3] that, up to the second order moments of u and v,
$$\mathrm{Var}(\hat{R}) \approx \frac{1}{\bar{X}^2}\,\frac{N-n}{nN}\,\sum_{i=1}^{N} \frac{(y_i - R x_i)^2}{N-1}. \tag{20}$$
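As an illustration of equation (20), the following sketch computes the approximate variance for a fully enumerated population and compares it with the exact variance obtained by listing every possible sample. The function names and the four-fragment population are hypothetical; this is not the code referred to later in the text.

```python
from itertools import combinations


def ratio_variance_approx(population, n):
    """Equation (20): second-order approximation to Var(R-hat)."""
    N = len(population)
    X = sum(x for x, _ in population)
    Y = sum(y for _, y in population)
    R = Y / X
    x_bar_pop = X / N
    sum_sq = sum((y - R * x) ** 2 for x, y in population)
    return (1 / x_bar_pop ** 2) * (N - n) / (n * N) * sum_sq / (N - 1)


def ratio_variance_exact(population, n):
    """Exact Var(R-hat), by enumerating every sample of size n."""
    r_hats = [
        sum(y for _, y in s) / sum(x for x, _ in s)
        for s in combinations(population, n)
    ]
    mean = sum(r_hats) / len(r_hats)
    return sum((r - mean) ** 2 for r in r_hats) / len(r_hats)


# Hypothetical (x_i, y_i) pairs; not data from the text.
population = [(10, 1), (20, 1), (30, 3), (40, 3)]
approx = ratio_variance_approx(population, 2)
exact = ratio_variance_exact(population, 2)
```

With so few fragments the two values differ noticeably; the approximation is a large-sample result, so close agreement should only be expected as n grows.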
The Fundamental Error
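Gy [1] defines the fundamental error of estimation and the
relative variance of the fundamental error as
$$\mathrm{FE} = \frac{\hat{R} - R}{R}, \quad \text{and} \quad s^2(\mathrm{FE}) = \frac{\mathrm{Var}(\hat{R})}{[E(\hat{R})]^2} \tag{21}$$
respectively. This is a slightly nonstandard convention in that
the usual variance of the fundamental error is given by
$$\mathrm{Var}(\mathrm{FE}) = \frac{\mathrm{Var}(\hat{R})}{R^2} = S' - 2S. \tag{22}$$
In practice, this convention does not pose any difficulty since
up to second order moments in u and v
$$s^2(\mathrm{FE}) = \mathrm{Var}(\mathrm{FE}). \tag{23}$$
From these forms, Gy derives the approximate form
$$\frac{\mathrm{Var}(\hat{R})}{(E(\hat{R}))^2} \approx \left(\frac{N}{n} - 1\right) \sum_{i=1}^{N} \left(\frac{x_i}{X}\right)^2 \left(\frac{R_i - R}{R}\right)^2, \tag{24}$$
where $R_i = y_i/x_i$ is the grade of fragment $i$. This form
is used in applications below.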
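The connection between Gy's form (24) and Cochran's form (20) can be made concrete. Writing $y_i = R_i x_i$ in equation (20) and dividing by $R^2$ gives equation (24) multiplied by N/(N-1), so the two agree for large N. The sketch below checks this on a made-up population; the function names and data are assumptions for illustration only.

```python
def gy_relative_variance(population, n):
    """Equation (24): Gy's approximate relative variance."""
    N = len(population)
    X = sum(x for x, _ in population)
    R = sum(y for _, y in population) / X
    total = sum(
        (x / X) ** 2 * ((y / x - R) / R) ** 2   # y / x is the grade R_i
        for x, y in population
    )
    return (N / n - 1) * total


def cochran_relative_variance(population, n):
    """Equation (20) divided by R squared."""
    N = len(population)
    X = sum(x for x, _ in population)
    R = sum(y for _, y in population) / X
    x_bar_pop = X / N
    sum_sq = sum((y - R * x) ** 2 for x, y in population)
    var = (N - n) / (n * N) * sum_sq / (N - 1) / x_bar_pop ** 2
    return var / R ** 2


# Hypothetical (x_i, y_i) pairs; not data from the text.
population = [(10, 1), (20, 1), (30, 3), (40, 3)]
N = len(population)
gy = gy_relative_variance(population, 2)
cochran = cochran_relative_variance(population, 2)
```

The factor (N-1)/N separating `gy` from `cochran` is the only difference between the two expressions, which is why they are equivalent for populations of practical size.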
In summary, Gy's fundamental bias, the expected value of the
fundamental error, is exactly the relative bias in the estimate
R̂ of the ratio of random variables. This
is equivalent to the bias given by Cochran. Further, the
variance of the fundamental error as defined by Gy, $s^2(\mathrm{FE})$, is
asymptotically equivalent to the usual variance of the
fundamental error, Var(FE).
In applications where complete enumeration of the sample is
possible, the methods given by Gy are equivalent to those of
classical random sampling. Differences between Gy's methods and
those of finite sampling lie in the methods developed for
sampling of particulate materials after grouping into categories.
These methods are used to reduce the cost of estimation when
complete enumeration of the sample is not feasible. In these
methods, estimators are developed which are similar to those
applied to estimate the mean and variance of grouped data.
APPLICATION TO PARTICULATE MATERIALS

To estimate R and the variance of R̂ requires complete
enumeration of the n units in the sample and measurement of $x_i$
and $y_i$ on each sampled fragment. In the case of particulate
materials, this is impractical. To overcome this problem, Gy
derived a method of estimating R and the variance of R̂ which does
not require fragment-by-fragment enumeration. Details follow.
Let L represent the population of particulate material with
N fragments denoted by $\{U_i,\ i = 1, 2, 3, \ldots, N\}$. These fragments may be
divided into classes $L_{\alpha\beta}$ with average volume $V_\alpha$ and average
density $\delta_\beta$. If each fragment in the class $L_{\alpha\beta}$ is identified with
an average fragment $F_{\alpha\beta}$, then estimates R̂ and Var(R̂) based
on the midpoints of the size and density classes may be used.
This treatment is essentially the same as the computation of the
mean and variance from a grouped frequency distribution. The
necessary notation is listed in Table 2.
Table 2. Definitions of notation used for estimation of R̂ and
Var(R̂) based on size and density classes. Each fragment is
represented by an average fragment denoted $F_{\alpha\beta}$.

                        Population of size N                               Sample of size n
Average Volume          $V_\alpha$                                         $v_\alpha$
Average Density         $\delta_\beta$                                     $d_\beta$
Average Mass            $\bar{X}_{\alpha\beta} = V_\alpha \delta_\beta$    $\bar{x}_{\alpha\beta} = v_\alpha d_\beta$
Average Ratio (Grade)   $R_{\alpha\beta}$                                  $\hat{R}_{\alpha\beta}$

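Consider equation (24) for the variance of the fundamental
error. Summation on the index $i$ is replaced with double sums on
$\alpha$ and $\beta$ as
$$\sum_{i=1}^{N} (R_i - R)^2 x_i^2 \approx \sum_{\alpha=1}^{r} \sum_{\beta=1}^{s} N_{\alpha\beta}\,(R_{\alpha\beta} - R)^2\,\bar{X}_{\alpha\beta}^2. \tag{25}$$
Making this substitution in equation (24) and using the fact that
$\bar{X}_{\alpha\beta} = V_\alpha \delta_\beta$ gives
$$\frac{\mathrm{Var}(\hat{R})}{(E(\hat{R}))^2} = \left(\frac{N}{n} - 1\right) \frac{1}{X} \sum_{\alpha=1}^{r} \sum_{\beta=1}^{s} \left(\frac{R_{\alpha\beta} - R}{R}\right)^2 \frac{N_{\alpha\beta} \bar{X}_{\alpha\beta}}{X}\,V_\alpha \delta_\beta = \left(\frac{N}{n} - 1\right) \frac{1}{X}\,H. \tag{26}$$
H is defined to be a constant of constitution heterogeneity. The
number of fragments $N_{\alpha\beta}$ in class $(\alpha,\beta)$ times the average particle
mass gives the total mass in that size-density class, $X_{\alpha\beta} = N_{\alpha\beta} \bar{X}_{\alpha\beta}$.
Using this relationship,
$$H \approx \sum_{\alpha=1}^{r} \sum_{\beta=1}^{s} \left(\frac{R_{\alpha\beta} - R}{R}\right)^2 \frac{X_{\alpha\beta}}{X}\,V_\alpha \delta_\beta. \tag{27}$$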
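Now an estimate of H based on the sample of n fragments is
needed. One may estimate $V_\alpha$ and $\delta_\beta$ with their sample equivalents
$v_\alpha$ and $d_\beta$ respectively. This gives the estimate
$$x \approx \sum_{\alpha} \sum_{\beta} x_{\alpha\beta}, \quad \text{where} \quad \bar{x}_{\alpha\beta} \approx v_\alpha d_\beta, \quad x_{\alpha\beta} \approx n_{\alpha\beta}\,\bar{x}_{\alpha\beta}, \tag{28}$$
where $n_{\alpha\beta}$ may be known, or estimated based on average volume and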
mass. Depending on the degree of precision desired and the
available resources, the number of grains in each volume density
class may be counted or estimated based on the average size and
density.
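Defining $d_m$ as the density of the constituent of interest,
$d_w$ as the density of the waste and
$$\hat{R}_{\alpha\beta} \approx \frac{\dfrac{1}{d_\beta} - \dfrac{1}{d_w}}{\dfrac{1}{d_m} - \dfrac{1}{d_w}}, \tag{29}$$
an estimate of the ratio R is given by the weighted average
$$\hat{R}_2 = \frac{\sum_{\alpha} \sum_{\beta} x_{\alpha\beta} \hat{R}_{\alpha\beta}}{\sum_{\alpha} \sum_{\beta} x_{\alpha\beta}}. \tag{30}$$
Substituting equation (28) into (27) and using equations (29) and
(30) gives the estimate
$$\hat{H} \approx \sum_{\alpha} \sum_{\beta} \left(\frac{\hat{R}_{\alpha\beta} - \hat{R}_2}{\hat{R}_2}\right)^2 \frac{x_{\alpha\beta}}{x}\,v_\alpha d_\beta. \tag{31}$$
As with $n_{\alpha\beta}$, the average value of $\hat{R}_{\alpha\beta}$ could be estimated through
assay if budget constraints allowed. In most environmental
sampling scenarios, direct assay would be used. It should be
noted that when estimated by equation (29), $\hat{R}_{\alpha\beta}$ appears to depend
only on the density class $\beta$. However, the density of waste and mineral may vary
with volume class depending on the degree of separation between
ore and waste (percent liberation). As the fragment size
decreases, the percent liberation of the constituent of interest
is generally increased. This variation will be captured if $\hat{R}_{\alpha\beta}$
is estimated by assay rather than by equation (29). Finally,
since X >> x, an estimate of the variance of the fundamental error
is
$$\mathrm{var}(\mathrm{FE}) \approx \left(\frac{N}{n} - 1\right) \frac{1}{X}\,\hat{H} = \left(\frac{X \bar{x}}{\bar{X} x} - 1\right) \frac{1}{X}\,\hat{H} \approx \left(\frac{1}{x} - \frac{1}{X}\right) \hat{H} \approx \frac{1}{x}\,\hat{H}. \tag{32}$$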
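The grouped-data estimates in equations (29) to (32) can be sketched as follows. This is a minimal illustration under stated assumptions, not the University of Wyoming subroutines: the table layout (`x[a][b]` for the mass in size class a and density class b), the function names, and the numerical values are all hypothetical, and the grade of each class is computed from the density of its density class as in equation (29).

```python
def grade_from_density(d, d_m, d_w):
    """Equation (29): grade implied by a fragment density d."""
    return (1 / d - 1 / d_w) / (1 / d_m - 1 / d_w)


def grouped_ratio(x, grade):
    """Equation (30): mass-weighted average of the class grades."""
    total = sum(sum(row) for row in x)
    weighted = sum(
        x_ab * g_ab
        for x_row, g_row in zip(x, grade)
        for x_ab, g_ab in zip(x_row, g_row)
    )
    return weighted / total


def heterogeneity(x, grade, v, d):
    """Equation (31): estimate of the constitution heterogeneity H."""
    total = sum(sum(row) for row in x)
    r2 = grouped_ratio(x, grade)
    return sum(
        ((g_ab - r2) / r2) ** 2 * (x_ab / total) * v_a * d_b
        for x_row, g_row, v_a in zip(x, grade, v)
        for x_ab, g_ab, d_b in zip(x_row, g_row, d)
    )


def var_fundamental_error(x, grade, v, d):
    """Last form of equation (32): H-hat / x, assuming X >> x."""
    total = sum(sum(row) for row in x)
    return heterogeneity(x, grade, v, d) / total


# Hypothetical sample: one size class (average volume 0.5) and two
# density classes (pure mineral and pure waste). Values are made up.
d_m, d_w = 5.0, 2.6
d = [d_m, d_w]
x = [[2.0, 8.0]]
v = [0.5]
grade = [[grade_from_density(d_b, d_m, d_w) for d_b in d] for _ in x]

h_hat = heterogeneity(x, grade, v, d)
var_fe = var_fundamental_error(x, grade, v, d)
```

Where class grades are available from assay, they can be passed in `grade` directly instead of being derived from density, which is the route the text suggests for most environmental sampling.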
Estimation of Physical Constants
Gy [9] developed a set of physical constants which can be
used to estimate the variance of the fundamental error for
mineralogical data. Following is a method to estimate those
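physical constants from sample data. To facilitate the
computation of these constants we introduce the usual dot
notation for row and column sums typically associated with the
analysis of variance.
$$x_{\alpha\cdot} = \sum_{\beta=1}^{s} x_{\alpha\beta}, \qquad x_{\cdot\beta} = \sum_{\alpha=1}^{r} x_{\alpha\beta} \tag{33}$$
$$x_{\cdot\cdot} = \sum_{\alpha=1}^{r} \sum_{\beta=1}^{s} x_{\alpha\beta} = \sum_{\alpha=1}^{r} x_{\alpha\cdot} = \sum_{\beta=1}^{s} x_{\cdot\beta} \tag{34}$$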
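For readers less familiar with the dot notation, a short sketch of equations (33) and (34) is given here. The 3-by-2 table of class masses is hypothetical, with rows taken as size classes and columns as density classes.

```python
# Hypothetical masses x[a][b]: 3 size classes (rows), 2 density classes.
x = [
    [4.0, 1.0],
    [3.0, 2.0],
    [6.0, 4.0],
]

row_sums = [sum(row) for row in x]          # x_{alpha .}, equation (33)
col_sums = [sum(col) for col in zip(*x)]    # x_{. beta},  equation (33)
grand_total = sum(row_sums)                 # x_{. .},     equation (34)
```

Summing either the row totals or the column totals recovers the same grand total, which is the content of equation (34).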
Define the constitution heterogeneity for a given size class,
obtained by summing over the density classes,
$$H_\alpha = \sum_{\beta} \frac{x_{\alpha\beta}}{x_{\alpha\cdot}} \left(\frac{\hat{R}_{\alpha\beta} - \hat{R}_2}{\hat{R}_2}\right)^2 d_\beta. \tag{35}$$
Two limiting cases can be identified for $H_\alpha$, given by complete
homogeneity of the material sampled or complete heterogeneity.
These limiting cases help to explain the method being used, and
they also occur in certain applications.
a) Completely homogeneous ($\hat{R}_{\alpha\beta} \equiv \hat{R}_2$ for all $\alpha$).
In this case $H_\alpha = 0$ (no constitution heterogeneity).
b) Completely heterogeneous (completely liberated).
In this case all of the material in class $\alpha$ can be
separated into 2 density classes:
$\beta = 1$, for pure mineral, with grade 1.0
$\beta = 2$, for pure waste, with grade 0.0.

In the completely liberated case, let $x_{\alpha 1}$ be the mass of the
mineral in class $\alpha$ and $x_{\alpha 2}$ be the mass of the waste in class $\alpha$.
Then, in this limiting case, let $c_\alpha = H_\alpha$, so that
$$c_\alpha = R_\alpha\,d_m \left(\frac{1 - \hat{R}_2}{\hat{R}_2}\right)^2 + (1 - R_\alpha)\,d_w \tag{36}$$
where
$$R_\alpha = \frac{x_{\alpha 1}}{x_{\alpha 1} + x_{\alpha 2}} \tag{37}$$
is the ratio of the mass of the constituent of interest to the
total mass in volume class $\alpha$. The liberation ratio
$\ell_\alpha$ is defined to be $H_\alpha / c_\alpha$, so that for the completely
homogeneous case $\ell_\alpha = 0$ and for the completely liberated case
$\ell_\alpha = 1.0$.
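The two limiting cases can be illustrated with a short sketch. It takes the heterogeneity of a size class as the mass-weighted sum over density classes of the squared relative grade deviation times the class density, which is the form that reduces to equation (36) under complete liberation. The function names and the numerical values are assumptions for illustration.

```python
def class_heterogeneity(x_row, grade_row, d, r2):
    """H_alpha for one size class, summed over density classes."""
    x_a = sum(x_row)
    return sum(
        (x_ab / x_a) * ((g_ab - r2) / r2) ** 2 * d_b
        for x_ab, g_ab, d_b in zip(x_row, grade_row, d)
    )


def liberated_heterogeneity(r_a, r2, d_m, d_w):
    """Equation (36): c_alpha, the value of H_alpha at full liberation."""
    return r_a * d_m * ((1 - r2) / r2) ** 2 + (1 - r_a) * d_w


def liberation_ratio(h_a, c_a):
    """Liberation ratio of a size class, H_alpha / c_alpha."""
    return h_a / c_a


# Hypothetical size class: 2 g of mineral and 8 g of waste.
d_m, d_w = 5.0, 2.6
x_row = [2.0, 8.0]
r_a = x_row[0] / sum(x_row)      # equation (37)
r2 = r_a                         # single class, so overall grade equals r_a

c_a = liberated_heterogeneity(r_a, r2, d_m, d_w)

# Case b: pure mineral (grade 1) and pure waste (grade 0).
h_liberated = class_heterogeneity(x_row, [1.0, 0.0], [d_m, d_w], r2)

# Case a: every density class has the overall grade.
h_homogeneous = class_heterogeneity(x_row, [r2, r2], [d_m, d_w], r2)
```

Under these assumptions `liberation_ratio(h_liberated, c_a)` comes out at 1 and `liberation_ratio(h_homogeneous, c_a)` at 0, matching the two limiting cases in the text.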
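Define, with $\ell_\alpha = H_\alpha / c_\alpha$ the liberation ratio of size class $\alpha$,
$$H^* = \frac{\sum_{\alpha} x_{\alpha\cdot}\,v_\alpha\,c_\alpha \ell_\alpha}{\sum_{\alpha} x_{\alpha\cdot}\,v_\alpha}. \tag{38}$$
Then the mineralogical factor c is given by
$$c = \hat{R}_2\,d_m \left(\frac{1 - \hat{R}_2}{\hat{R}_2}\right)^2 + (1 - \hat{R}_2)\,d_w \tag{39}$$
and the liberation factor $\ell$ is the ratio of $H^*$ and c,
$$\ell = \frac{H^*}{c}. \tag{40}$$
Finally the constitution heterogeneity H can be approximated by
$$\hat{H} \approx \sum_{\alpha} \frac{x_{\alpha\cdot}}{x_{\cdot\cdot}}\,v_\alpha\,c\,\ell. \tag{41}$$
Letting $v_{95}$ be the 95th percentile of the volumes, define the
granulometric constant
$$g = \sum_{\alpha} \frac{x_{\alpha\cdot}}{x_{\cdot\cdot}}\,\frac{v_\alpha}{v_{95}}. \tag{42}$$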
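A sketch of how the constants c, the liberation factor, and g might be computed from a cross-classified table is given below. It is an illustration under assumptions rather than the authors' subroutine: the function name `physical_constants`, the table layout, and the data are hypothetical; the per-class heterogeneity is taken without the volume factor so that its product structure matches equation (38); and `v95` is supplied by the caller rather than estimated.

```python
def physical_constants(x, grade, v, d, d_m, d_w, v95):
    """Return (c, lib, g, h_hat) from class masses x[a][b]."""
    row = [sum(r) for r in x]                       # x_{alpha .}
    total = sum(row)                                # x_{. .}
    r2 = sum(
        x_ab * g_ab
        for x_row, g_row in zip(x, grade)
        for x_ab, g_ab in zip(x_row, g_row)
    ) / total                                       # overall grade estimate
    # Per-class heterogeneity, equal to c_alpha * l_alpha by definition.
    h = [
        sum(
            (x_ab / row_a) * ((g_ab - r2) / r2) ** 2 * d_b
            for x_ab, g_ab, d_b in zip(x_row, g_row, d)
        )
        for x_row, g_row, row_a in zip(x, grade, row)
    ]
    weights = [row_a * v_a for row_a, v_a in zip(row, v)]
    h_star = sum(w * h_a for w, h_a in zip(weights, h)) / sum(weights)  # (38)
    c = r2 * d_m * ((1 - r2) / r2) ** 2 + (1 - r2) * d_w                # (39)
    lib = h_star / c                                                    # (40)
    g = sum(row_a / total * v_a / v95 for row_a, v_a in zip(row, v))    # (42)
    h_hat = sum(row_a / total * v_a * c * lib
                for row_a, v_a in zip(row, v))                          # (41)
    return c, lib, g, h_hat


# Hypothetical table: 2 size classes, 2 density classes (mineral, waste).
x = [[2.0, 8.0], [1.0, 9.0]]
grade = [[1.0, 0.0], [1.0, 0.0]]
v = [0.5, 2.0]
d = [5.0, 2.6]
v95 = 2.0                        # assumed 95th percentile of volumes

c, lib, g, h_hat = physical_constants(x, grade, v, d, 5.0, 2.6, v95)
```

With these definitions the product of c, the liberation factor, g, and `v95` reproduces `h_hat`, and `h_hat` in turn equals the direct double sum over size and density classes.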
Then the variance of the fundamental error is estimated by
$$s^2_{\mathrm{FE}} = \frac{\mathrm{var}(\hat{R})}{(E(\hat{R}))^2} \approx \left(\frac{N}{n} - 1\right) \frac{1}{X}\,c\,\ell\,g\,v_{95} \approx \left(\frac{1}{x} - \frac{1}{X}\right) c\,\ell\,g\,v_{95} \approx \frac{c\,\ell\,g\,v_{95}}{x}. \tag{43}$$
The final relation follows because X >> x. In the context of
environmental sampling, the constants H, $\ell$ (the liberation factor), c and g must be
reinterpreted and estimated. François-Bongarçon [14] noted that
Gy’s method, although potentially powerful, has failed to be
widely applied even in mining applications due to the difficulty
in adequately estimating the geological constants. Further
research should be directed toward determination of appropriate
physical constants in the environmental setting. Sinclair [15]
emphasized the importance of characterization of heterogeneity
(denoted as geologic and value continuity) for ore reserve
estimation. Similar characterization is of equal importance in
the environmental sampling context.
Summary of Gy's Basic Formula
Let $K = c\,\ell\,g$, where c is the composition
(mineralogical) constant, $\ell$ is the liberation factor, and g is
the granulometric constant. Note that c has units mass/volume
and the other constants are unitless. Then the basic formula
advanced by Gy is
$$s^2_{\mathrm{FE}} = \frac{\mathrm{var}(\hat{R})}{(E(\hat{R}))^2} \approx \frac{K\,v_{95}}{x}. \tag{44}$$
Here, $v_{95}$ is the 95th percentile of the fragment volumes and x
is the sample weight. The symbol $s^2_{\mathrm{FE}}$ in equation (44) represents
the square of the coefficient of variation of the fundamental
error. This is somewhat different from usual statistical
notation where $s^2$ is reserved for variances, but it is consistent
with Gy's use of the term. Therefore, if K is known, one can
estimate the square of the coefficient of variation of R̂ as a
function of the physical constants, x, and $v_{95}$. Alternatively,
the weight of the sample, x, needed to achieve a specified
coefficient of variation can be computed if $v_{95}$ is known, or the
size to which the material must be ground (that is, the
required $v_{95}$) can be calculated for a fixed weight of sample.
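The three uses of equation (44) described above amount to solving one relation for different unknowns. The helper functions below are a hypothetical sketch (the names are not from the text); `cv` denotes the target coefficient of variation of the fundamental error, and the units of `K`, `v95`, and `x` are assumed to be mutually consistent.

```python
def relative_variance(K, v95, x):
    """Equation (44): squared coefficient of variation, K * v95 / x."""
    return K * v95 / x


def required_mass(K, v95, cv):
    """Sample mass x that gives coefficient of variation cv."""
    return K * v95 / cv ** 2


def required_v95(K, x, cv):
    """Largest v95 (fragment volume) that gives cv at sample mass x."""
    return cv ** 2 * x / K
```

For instance, with the made-up values K = 0.5 and v95 = 4.0, `required_mass(0.5, 4.0, 0.05)` is approximately 800 in the mass units implied by K.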
OTHER APPLICATIONS
Gy's methods were developed specifically for application to
particulate sampling. However, these methods may also be applied
directly to other continuous materials, such as liquids.
Although the physical constants developed empirically for
minerals do not apply to fluids, the histogram methods can be
applied directly. Environmental sampling for contaminants in
liquid media is thus a natural area of application. In
particular, Gy's methods suggest application to composite
sampling. For example, monitoring a river for contaminant
concentrations could be aided by Gy's methods, in that
appropriate sizes of experimental units could be derived through
a size analysis similar to that applied to particulates. Further
research should include empirical experimentation to develop a
set of physical constants for sampling of constituents other than heavy
metals.
Computer subroutines have been developed at the University
of Wyoming to compute estimates of the ratio of random variables
using the methods given by Cochran [3] and Gy [1], [9]. These
subroutines also provide estimates of the constants c, $\ell$, and g.

Some examples of the application of Gy's results follow.
EXAMPLES
Example 1
To compare Gy's approximate method to the classical finite
sampling methods with complete enumeration, we simulated data
representing 1000 soil fragments. The simulated population ratio
was assumed to have a lognormal distribution with expected value
0.05. The fragment masses were assumed to be exponentially
distributed giving many small fragments with a few larger
fragments. The simulated data were analyzed using the computer
subroutines referenced above. Results are included in Appendix
A.
Using the finite sampling methods where individual fragment
by fragment enumeration was required, the estimate of the ratio
was found to be R̂ = 0.05206 with an estimated relative variance
$\mathrm{var}(\hat{R})/\hat{R}^2$ = 0.003327. Using Gy's methods on the same data
after data were cross-classified into size and density classes
resulted in the estimated ratio R̂ = 0.05249 with an estimated
variance of the fundamental error given by $s^2(\mathrm{FE})$ = 0.002601. We
consider this to be relatively good agreement of the two methods,
although results are conditioned on the particular realization of
the simulated population. The physical constants derived by Gy
were also calculated and are given in Appendix A, along with the
other estimates and the cross-classified data.
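A simulation in the spirit of Example 1 can be sketched as follows. It will not reproduce the numbers reported above: the text does not state the spread of the lognormal distribution, the scale of the exponential masses, the population size N, or the random seed, so the values of `sigma`, the mean mass, and the seed below are assumptions, and the finite population correction is ignored. Only the classical estimate is shown.

```python
import math
import random

random.seed(1)                  # arbitrary seed, for repeatability only

n = 1000                        # number of simulated fragments
sigma = 1.0                     # ASSUMED log-scale spread of the grade
mu = math.log(0.05) - sigma ** 2 / 2   # lognormal mean of 0.05

fragments = []
for _ in range(n):
    x_i = random.expovariate(1.0)      # fragment mass, mean 1 (assumed)
    # Grade of the fragment, capped at 1 because a grade is a proportion.
    r_i = min(1.0, random.lognormvariate(mu, sigma))
    fragments.append((x_i, r_i * x_i))  # (x_i, y_i)

x_tot = sum(x for x, _ in fragments)
y_tot = sum(y for _, y in fragments)
r_hat = y_tot / x_tot                   # ratio of totals, equation (5)

# Sample analogue of equation (20) divided by R-hat squared, with the
# finite population correction dropped (that is, assuming N >> n).
x_bar = x_tot / n
resid_ss = sum((y - r_hat * x) ** 2 for x, y in fragments)
rel_var = resid_ss / ((n - 1) * n * x_bar ** 2 * r_hat ** 2)
```

The grouped (Gy) estimate would be obtained by cross-classifying the same fragments into size and density classes before applying the class-based formulas; that step is omitted here because it requires fragment volumes and densities, which this sketch does not simulate.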
Example 2
One application of the use of Gy's formulas is the
determination of the sample size (mass) required to attain a
desired relative precision in the estimation of the grade of a
mineral of interest. This is a standard example due to Ottley
[16] giving the way in which Gy’s formula is typically used in
mining applications. Other more recent examples can be found in
François-Bongarçon [17]. It is anticipated that similar use can
be made in environmental settings. Suppose it is anticipated
that an ore of zinc contains 6.6% Zn as ZnS. If the ore can be
crushed to a maximum size of 2 cm, what mass of sample is
required to ensure that a 95% confidence interval gives an
estimate of the grade with relative error ±10%?
An approximate 95% confidence interval for R is given by
$$\hat{R} \pm 2\,\mathrm{SE}(\hat{R}). \tag{45}$$
The specified precision can be expressed by
$$\frac{2\,\mathrm{SE}(\hat{R})}{\hat{R}} = 0.10 \tag{46}$$
or equivalently as
$$s^2_{\mathrm{FE}} = \left(\frac{0.10}{2}\right)^2 = \frac{(c\,\ell\,g)\,v_{.95}}{x}. \tag{47}$$
Gy substitutes for $v_{.95}$ using the 95th percentile of the diameters
and a shape factor f. Empirical studies have shown that in most
mineralogical applications $v_{.95} = f\,(d_{.95})^3$. In the present
example, f = 0.5 and $d_{.95}$ = 2 cm, giving $v_{.95}$ = 4 cm³. Gy recommends
the granulometric constant g = 0.25 and the liberation factor
$\ell$ = 0.05. The mineralogical factor recommended by Gy is given by
$$c = \frac{1-R}{R}\,\bigl[(1-R)\,d_m + R\,d_w\bigr] \tag{48}$$
where some suitable constants are $d_m$ = 5.0 for the density of
mineral, $d_w$ = 2.6 for the density of waste and R = 0.066 × 1.5 = 0.099.
This gives c = 43.34 and $K = c\,\ell\,g$ = 0.54. Substituting K and $v_{.95}$
into equation (47) gives a sample mass x = 864 g.
To improve the precision of estimates, the sample could be
crushed further. To what diameter should the sample be crushed
to give a relative error of 5% (0.05), given the sample mass of
864 g? Again solve equation (47), now with $x = 864$ g
substituted. This gives $v_{.95} = 1$ cm$^3$, or $d_{.95} = 1.26$
cm, so the sample should be crushed to a diameter less than
1.26 cm.
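The inverse calculation can be sketched in the same way. The Python below is a minimal illustration with hypothetical names: it assumes the relative error is entered as a fraction (0.05, that is, 5%), reuses the rounded K = 0.54 and the shape factor f = 0.5 from the example, and inverts equation (47) for the volume before converting to a diameter.

```python
def crush_diameter(rel_error, K, mass, shape_factor):
    """Invert equation (47) for v95, then v95 = f * d95**3 for d95."""
    target_variance = (rel_error / 2.0) ** 2
    v95 = target_variance * mass / K
    d95 = (v95 / shape_factor) ** (1.0 / 3.0)
    return v95, d95


# Values from the example: x = 864 g, K = 0.54, f = 0.5.
v95, d95 = crush_diameter(rel_error=0.05, K=0.54, mass=864.0, shape_factor=0.5)
print(f"v95 = {v95:.2f} cm^3, d95 = {d95:.2f} cm")
```

Halving the relative error quarters the allowable variance, so at fixed mass the 95th-percentile volume must fall from 4 cm3 to 1 cm3.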
The first example provides evidence of the similarity
between estimates obtained through the use of classical sampling
methods and Gy's methods. The second example gives an indication
of the utility of Gy's specification of physical constants for
sample size and handling determination. Gy has developed a
method for converting the classical sample size determination
problem into one of sample mass and sample handling procedures
appropriate to achieve a specified precision. Pitard [10]
provides many further details and examples in a modern context
for these procedures.
Example 3
A third application is found in the sampling of liquids.
Suppose an estimate of the concentration of an organic
contaminant such as polychlorinated biphenyl (PCB) flowing past a
cross section of river is desired. Water sub-samples of volume $v$
are to be taken at random locations in the cross section and
combined to some total volume $v_t$ to estimate the concentration.
What volume of sub-sample unit should be used and what total
volume is required to give a specified coefficient of variation
($s_{FE}$) for the estimate?
To answer this question, one may use Gy's methods where each
sub-sample unit (i.e., an increment of water and suspended
particulate) is treated analogously to a fragment of solid
material. Assume that the contaminant is found in solution and
as a surfactant on suspended particulate material. If several
sizes of sub-sampling units are used, then the set of sub-sample
observations can be classified into a 2-way table by volume and
density. If there is little suspended particulate, then there
will be just one density class. It is anticipated that the
percentage of suspended particulate may vary with the volume and
density of sample units. Then using Gy's basic equation,
\[ s_{FE}^2 = \frac{(c\,g\,\ell)\,v}{v_t} \tag{49} \]
with a selected value of $s_{FE}$, the volume of an individual
sub-sample unit may be determined for a given total volume of
sample, or alternatively, a total volume of sample may be
determined given a sub-sample unit volume. However, to apply
equation (49), the constants $c$, $g$ and $\ell$ must be determined.
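Equation (49) can be solved in either direction once the three constants are known. The Python sketch below shows both rearrangements. The function names and the numerical constants are hypothetical, chosen only to exercise the formulas, and the volumes are assumed to be in consistent units.

```python
def solve_total_volume(c, g, liberation, unit_vol, cv):
    """Total sample volume vt from equation (49), given the unit volume v."""
    return c * g * liberation * unit_vol / cv ** 2


def solve_unit_volume(c, g, liberation, total_vol, cv):
    """Sub-sample unit volume v from equation (49), given the total volume vt."""
    return cv ** 2 * total_vol / (c * g * liberation)


# Hypothetical constants and targets, not estimates from any data set.
c, g, liberation = 40.0, 0.25, 0.05
target_cv = 0.10                      # desired s_FE

vt = solve_total_volume(c, g, liberation, unit_vol=0.1, cv=target_cv)
v = solve_unit_volume(c, g, liberation, total_vol=vt, cv=target_cv)
print(f"vt = {vt:.3f}, v = {v:.3f}")
```

The two functions are inverses of one another, so feeding the computed total volume back in recovers the original unit volume.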
A basic field exercise may be used to determine these
constants. Suppose a set of $r$ samples of size $n$ is taken where
the volume of each sub-sample unit is intentionally varied so
that $v_1, v_2, \ldots, v_r$ are the sub-sample volumes. Each
sample unit is kept separate, and the volume and density are
recorded. The set of $n \times r$ sub-sample units collected is
then cross-classified based on volume and density. Sample units
are combined within volume and density class and assayed for PCB
content. If individual sample units are sufficiently large, and
PCB concentrations are high, then individual sample units could
be assayed. For volume-density class $(\alpha, \beta)$, the number
of sample units in the class $n_{\alpha\beta}$, the average mass
$x_{\alpha\beta}$, and the average PCB concentration are
available. Based on this table, the estimates of $c$, $g$ and
$\ell$ can be obtained from the formulas previously given in this
paper. These constants may then be used to determine the
relationship between total sample volume $v_t$, sub-sample unit
volume $v$, and $s_{FE}^2$.
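The cross-classification step of this field exercise might be organized as follows. This Python sketch is an assumption-laden illustration: the class boundaries, the field data, and all names are hypothetical, and the class concentration is taken to be the mass-weighted mean of the unit concentrations (mimicking a composited assay). It builds only the table of counts, average masses, and average concentrations; the estimation of the constants from that table is not shown.

```python
from collections import defaultdict


def class_index(value, upper_edges):
    """Index of the first class whose upper edge is not exceeded."""
    for i, edge in enumerate(upper_edges):
        if value <= edge:
            return i
    return len(upper_edges)


def cross_classify(units, volume_edges, density_edges):
    """Group sub-sample units into (volume class, density class) cells.

    units: iterable of (volume, density, mass, concentration) tuples.
    Returns {(a, b): (count, mean mass, mass-weighted mean concentration)}.
    """
    cells = defaultdict(list)
    for volume, density, mass, conc in units:
        key = (class_index(volume, volume_edges),
               class_index(density, density_edges))
        cells[key].append((mass, conc))

    table = {}
    for key, members in cells.items():
        total_mass = sum(m for m, _ in members)
        mean_conc = sum(m * c for m, c in members) / total_mass
        table[key] = (len(members), total_mass / len(members), mean_conc)
    return table


# Hypothetical field data: (volume, density, mass, PCB concentration).
units = [
    (0.5, 1.00, 0.50, 2.0),
    (0.5, 1.00, 0.50, 4.0),
    (1.0, 1.02, 1.02, 3.0),
    (2.0, 1.00, 2.00, 1.0),
]
table = cross_classify(units, volume_edges=[0.75, 1.5], density_edges=[1.01])
for key in sorted(table):
    print(key, table[key])
```

Each entry of `table` corresponds to one cell of the 2-way table, which is the form of input the grouped-data formulas require.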
CONCLUSIONS
The methods of Gy and Cochran for ratio estimation have been
shown to be asymptotically equivalent for samples which can be
completely enumerated. Both are based on finite sample theory.
Gy extends the procedure to treat data grouped into a 2-way table
of fragment volume and fragment density, and provides a simple
estimation procedure for estimating appropriate sample volume and
fragment sizes to attain a specified relative error. A computer
program, available from the authors, has been developed at the
University of Wyoming to estimate the constants from a table of
grouped data.
This paper has shown certain equivalences between finite
sampling theory and Gy’s work for cases where samples may be
completely enumerated. In environmental settings and for the
estimation of certain ores such as precious metals, further
empirical study is required to improve the value of Gy’s method
for sampling materials which are not completely enumerable. Note
that methods outlined in example 3 show how Gy’s method can be
implemented for the important problem of sampling liquid media.
Future work will determine the ultimate value of Gy’s method in
applications other than ore reserve estimation.

REFERENCES
[1] Gy, P.M. (1967). Memoires du Bureau de Recherches Geologiques
Minieres, no. 56, (Chapitre 4, Theorie de l'echantillonnage
equiprobable, pp. 42-51), Paris.
[2] Gilbert, R.O. (1987). Statistical Methods for Environmental
Pollution Monitoring, Van Nostrand Reinhold, New York.
[3] Cochran, W.G. (1977). Sampling Techniques, John Wiley &
Sons, Inc., New York.
[4] Thompson, S.K. (1992). Sampling, John Wiley & Sons, Inc.,
New York.
[5] Cressie, N.A.C. (1991). Statistics for Spatial Data, John
Wiley & Sons, Inc., New York.
[6] Flatman, G.T., Englund, E.J. and Yfantis, A.A. (1988).
Geostatistical Approaches to Design of Sampling Regimes. In
Principles of Environmental Sampling, L.H. Keith, Ed.;
American Chemical Society, Washington, D.C., pp. 73-84.
[7] Borgman, L.E. and Quimby, W.F. (1988). Sampling for Tests of
Hypotheses When Data are Correlated in Space and Time. In
Principles of Environmental Sampling, L.H. Keith, Ed.;
American Chemical Society, Washington, D.C., pp. 25-44.
[8] Brus, D.J. and de Gruijter, J.J. (1993). Environmetrics, (4),
pp. 123-152.
[9] Gy, P.M. (1982). Sampling of Particulate Materials, Theory
and Practice. Elsevier Scientific Publishing Company, New
York.
[10] Pitard, F.F. (1989). Pierre Gy's Sampling Theory and
Sampling Practice, Volume I, Heterogeneity and Sampling. CRC
Press Inc., Boca Raton, Florida.
[11] Lyman, G.J. (1993). Geochimica et Cosmochimica Acta, (57),
pp. 3825-3833.
[12] Matheron, G. (1966). Revue de l'Industrie Minerale, Aug.,
pp. 609-621.
[13] Hartley, H.O. and Ross, A. (1954). Nature, 174, pp. 270-271.
[14] François-Bongarçon, D. (1992). The theory of sampling
broken ores, revisited: An effective geostatistical approach
for the determination of sample variances and minimum sample
masses. In Proceedings of the XVth World Mining Congress,
Madrid, Spain.
[15] Sinclair, A.J. (1994). Explor. Mining Geol., (3) 2,
pp. 95-108.
[16] Ottley, D.J. (1966). World Mining, (19) 9, pp. 40-44.
[17] François-Bongarçon, D. (1991). CIM Bulletin, (84) 970,
pp. 75-81.
APPENDIX A.
CLASSICAL VARIANCE ESTIMATES, COCHRAN (1977)
TOTAL OF X: 48021.5
TOTAL OF Y: 2500.18
ESTIMATED RATIO: 0.520638E-01
ESTIMATED VARIANCE: 0.901763E-05
ESTIMATED SQUARED CV: 0.332676E-02
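The classical estimates listed above are mutually consistent, as a quick check shows. The Python sketch below uses hypothetical variable names and assumes the usual definitions: the ratio is the total of Y divided by the total of X, and the squared coefficient of variation is the estimated variance divided by the squared ratio.

```python
total_x = 48021.5            # TOTAL OF X
total_y = 2500.18            # TOTAL OF Y
variance = 0.901763e-05      # ESTIMATED VARIANCE of the ratio

ratio = total_y / total_x
squared_cv = variance / ratio ** 2
print(f"ratio = {ratio:.6e}, squared CV = {squared_cv:.6e}")
```

Both computed values agree with the printed ESTIMATED RATIO and ESTIMATED SQUARED CV to the six significant digits shown.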
DATA GROUPED BY SIZE AND DENSITY CLASSES
Each cell gives the average fragment mass, the average grade, and
the cell frequency.

                                     DENSITY CLASS
  VOLUME               2.6378    2.8382    3.0070    3.1647    3.3960
  8.9512  MASS        23.6117   25.4051   26.9161   28.3278   30.3988
          GRADE        0.0286    0.1777    0.2800    0.3722    0.4883
          FREQUENCY       632        50        17        12         4
 31.9481  MASS        84.2732   90.6742   96.0669  101.1057  108.4972
          GRADE        0.0308    0.1698    0.2917    0.3616    0.0000
          FREQUENCY       187        18         4         1         0
 56.7869  MASS       149.7934  161.1710  170.7565  179.7128  192.8510
          GRADE        0.0361    0.1571    0.0000    0.0000    0.0000
          FREQUENCY        46         4         0         0         0
 77.9711  MASS       205.6735  221.2953  234.4567  246.7541  264.7936
          GRADE        0.0283    0.1630    0.0000    0.0000    0.0000
          FREQUENCY        19         2         0         0         0
105.5720  MASS       278.4796  299.6315  317.4518  334.1024  358.5276
          GRADE        0.0317    0.0000    0.2655    0.0000    0.0000
          FREQUENCY         3         0         1         0         0
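The grouped table can be checked against the summary quantities reported with it. The Python sketch below is a hedged illustration: it assumes that the total mass is the frequency-weighted sum of the cell average masses and that the approximate grade is the mass-weighted mean of the cell grades. The variable names are hypothetical; the numbers are transcribed from the table.

```python
# Values transcribed from the grouped table in Appendix A.
# Rows are volume classes, columns are density classes.
avg_mass = [
    [23.6117, 25.4051, 26.9161, 28.3278, 30.3988],
    [84.2732, 90.6742, 96.0669, 101.1057, 108.4972],
    [149.7934, 161.1710, 170.7565, 179.7128, 192.8510],
    [205.6735, 221.2953, 234.4567, 246.7541, 264.7936],
    [278.4796, 299.6315, 317.4518, 334.1024, 358.5276],
]
avg_grade = [
    [0.0286, 0.1777, 0.2800, 0.3722, 0.4883],
    [0.0308, 0.1698, 0.2917, 0.3616, 0.0000],
    [0.0361, 0.1571, 0.0000, 0.0000, 0.0000],
    [0.0283, 0.1630, 0.0000, 0.0000, 0.0000],
    [0.0317, 0.0000, 0.2655, 0.0000, 0.0000],
]
frequency = [
    [632, 50, 17, 12, 4],
    [187, 18, 4, 1, 0],
    [46, 4, 0, 0, 0],
    [19, 2, 0, 0, 0],
    [3, 0, 1, 0, 0],
]

total_mass = 0.0      # sum over cells of frequency * average mass
mineral_mass = 0.0    # sum over cells of frequency * average mass * grade
n_fragments = 0
for mass_row, grade_row, freq_row in zip(avg_mass, avg_grade, frequency):
    for m, a, n in zip(mass_row, grade_row, freq_row):
        total_mass += n * m
        mineral_mass += n * m * a
        n_fragments += n

approx_grade = mineral_mass / total_mass
print(f"fragments = {n_fragments}, total mass = {total_mass:.1f}, "
      f"grade = {approx_grade:.5f}")
```

Under these assumptions the table accounts for 1000 fragments, reproduces the reported total mass of 48027.0, and gives a grade that matches the reported approximate grade to about four significant digits; the small remaining difference is consistent with rounding of the printed cell grades.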
VARIANCE ESTIMATES USING VOLUME & DENSITY CLASSES, GY (1982)
APPROXIMATE GRADE: 0.524973E-01
TOTAL MASS: 48027.0
CONSTANT OF CONSTITUTION HETEROGENEITY: 124.920
MINERALOGICAL FACTOR: C= 87.9689
GRANULOMETRIC FACTOR: G= 0.601487
LIBERATION FACTOR: L= 0.422788E-01
95th VOLUME PERCENTILE: 55.8410
ESTIMATED VARIANCE OF THE FUNDAMENTAL ERROR: 0.260103E-02
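The printed Gy estimates appear to be linked by the basic variance formula. The Python sketch below checks this numerically under one assumption made here, which the listing itself does not state: that the constant of constitution heterogeneity is the product of the mineralogical, liberation, and granulometric factors with the 95th volume percentile, and that dividing it by the total mass gives the variance of the fundamental error. Variable names are hypothetical.

```python
c = 87.9689                  # MINERALOGICAL FACTOR
g = 0.601487                 # GRANULOMETRIC FACTOR
liberation = 0.422788e-01    # LIBERATION FACTOR
v95 = 55.8410                # 95th VOLUME PERCENTILE
total_mass = 48027.0         # TOTAL MASS

# Assumed relationship: heterogeneity constant = c * l * g * v95.
heterogeneity_constant = c * liberation * g * v95
fe_variance = heterogeneity_constant / total_mass
print(f"constant = {heterogeneity_constant:.3f}, "
      f"variance of FE = {fe_variance:.6e}")
```

Both results agree with the listed CONSTANT OF CONSTITUTION HETEROGENEITY and ESTIMATED VARIANCE OF THE FUNDAMENTAL ERROR to within rounding of the printed factors.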
CONTRIBUTORS
Anderson-Sprecher, Richard
Statistics Department
University of Wyoming
Laramie, WY 82071

Borgman, Leon E.
Statistics Department
University of Wyoming
Laramie, WY 82071

Flatman, George T.
Exposure Assessment Research Division
U.S. Environmental Protection Agency
Las Vegas, NV 89114-5027

Kern, John W.
Western Ecosystems Technology Inc.
1402 S. Greeley Hwy.
Cheyenne, WY 82007