
Computers & Geosciences Vol. 7, No. 4, pp. 331-334, 1981
0098-3004/81/040331-04$02.00/0
Printed in Great Britain. © 1981 Pergamon Press Ltd.

THE DESIGN OF OPTIMAL SAMPLING SCHEMES
FOR LOCAL ESTIMATION AND MAPPING
OF REGIONALIZED VARIABLES--I

THEORY AND METHOD†

A. B. MCBRATNEY
Department of Soil Science, University of Aberdeen, Scotland

R. WEBSTER
Rothamsted Experimental Station, Harpenden, Hertfordshire, England

and

T. M. BURGESS
Department of Agricultural Science, University of Oxford, England

(Received 6 December 1980; revised 13 February 1981)

Abstract--Surveys of materials at the earth's surface, especially soil, can be planned to make the best use of the
resources for survey or to achieve a certain minimum precision provided the nature of spatial dependence is known
already. A method is described for designing optimal sampling schemes. It is based on the theory of regionalized
variables, and assumes that spatial dependence is expressed quantitatively in the form of the semi-variogram. It
assumes also that the maximum standard error of a kriged estimate is a reasonable measure of the goodness of a
sampling scheme. By sampling on a regular triangular grid, the maximum standard error is kept to a minimum for
any given sampling density, but a square grid is approximately equivalent where variation is isotropic. Given the
semi-variogram for a variable, the sampling density for any prescribed maximum standard error is determined.
Where variation is geometrically anisotropic, the same method is employed to determine sample spacing in the
direction of maximum change, and the grid mesh elongated in the perpendicular direction in proportion to the
anisotropy ratio.

Key Words: Geostatistics, Kriging, Regionalized variables, Sampling, Soil Survey, Soil Science.

†Reprints available from Rothamsted Experimental Station.

INTRODUCTION

A frequent task in regional studies of soil, geochemistry and hydrology is to describe and map one or more variables over the land surface from sample data. In each situation, variation is acknowledged, and the aim is to display this variation and use the knowledge of it to predict or estimate values at unvisited sites within the region. However, questions arise as to the precision of the estimates and to the confidence that can be placed in such predictions, and to the sampling effort required to achieve any particular degree of precision.

In geological surveys, information usually derives from sparse fortuitous exposures of the rock and from deep borings whose locations are determined logistically. At best investigators have to be satisfied with a posteriori estimates of precision from sampling schemes in which they have had or can have little control. At worst, no quantitative estimation procedure is possible. Nevertheless, by subdividing a region into classes each of which can be regarded as reasonably homogeneous, the precision of information from scattered bores and outcrops can be increased by restricting use to the classes within which those borings and exposures lie.

The same approach has been adopted for surveys of soil and other superficial deposits even though these materials are accessible almost everywhere. Again, the reason is economy of effort where only sparse sampling can be afforded. Here, however, quantitative estimation is possible with sound sampling, and the classical statistical model has been applied to determine the precision of estimates and sampling effort in several instances, for example, Thornburn and Larsen (1959) and Kantey and Williams (1962) in the civil engineering field, Webster and Beckett (1968) for general soil survey, and Tidball (1976) for geochemistry.

The classical model assumes that the value of a variable $Z$ at any place $i$ in class $j$ is the sum of three terms:

$$z_{ij} = \mu + \alpha_j + \varepsilon_{ij}, \qquad (1)$$

where $\mu$ is the general mean of the variable in the region as a whole, $\alpha_j$ is the difference between the general mean and the mean of class $j$, $\mu_j$, and $\varepsilon_{ij}$ is a spatially-uncorrelated component distributed normally with zero mean and variance $\sigma^2$. The expected value at any place in the $j$th class is $\mu_j$, and is estimated from a sample of size $n_j$ as

$$\hat{z}_{ij} = \bar{z}_j = \frac{1}{n_j} \sum_{k=1}^{n_j} z_{kj}. \qquad (2)$$

The estimate is, of course, subject to error, and has an estimation variance, $\sigma_E^2$, defined by

$$\sigma_E^2 = E[(z_{ij} - \hat{z}_{ij})^2]. \qquad (3)$$


This is given by

$$\sigma_E^2 = \sigma^2 + \sigma^2/n_j. \qquad (4)$$

The estimation variance is thus the sum of the within-class variance and the estimation variance of the class mean.

The quantity $\sigma/\sqrt{n_j}$ is the familiar standard error. Using the classical model, it is a simple matter to devise a sampling scheme to estimate class means for any given precision if $\sigma$ is known; $n_j$ is chosen large enough to achieve the desired standard error. However, it is a different matter when estimating the value of $Z$ at a particular place. The term affecting precision is predominantly the within-class variance. It sets a minimum to the estimation variance which no amount of sampling can diminish. For the purpose of prediction using the classical model, quality of the classification is paramount, and this must be one of the main reasons for attention being paid to classification in regional resource surveys.

If initial classification of a region accounts for all the spatially-dependent variance, the classical model cannot be improved. However, in intensive surveys, usually there will be substantial spatial dependence within any regional subdivisions. In these circumstances, the classical model, because it takes no account of spatial correlation and the positions of sampling sites, is inadequate. The alternative is to use regionalized variable theory (Matheron, 1965, 1971), and this the earth scientist concerned with surface materials is able to do. Since the land surface is accessible readily, scientists in principle can sample to achieve the precision they desire. To do this they must have prior knowledge of the variability in the region. Given this knowledge in the form of a semi-variogram, either inferred from the survey of a similar neighbouring region or obtained from reconnaissance transects across the region, it is possible to design a sampling scheme that achieves the desired precision for minimum effort, and is, in this sense, optimal. This paper outlines the theoretical background, describes the method and provides a general recipe for survey. Part II presents a FORTRAN program for designing optimal schemes, and illustrates it with examples (McBratney and Webster, 1981). Because we are more familiar with soil than other materials at the earth's surface and our examples are drawn from soil survey, we shall describe the problem in terms of soil, though recognizing that its solution applies equally in other fields.

METHOD

Basis of method

The method of local estimation provided by regionalized variable theory is known as kriging in earth sciences. The theory is described by Matheron (1965, 1971); it is summarized and illustrated with examples of applications by David (1977) and Journel and Huijbregts (1978). The model for simple kriging, which we have discovered to be adequate for soil (Webster and Burgess, 1980), is

$$z(\mathbf{x}) = \mu_s + \varepsilon(\mathbf{x}), \qquad (5)$$

where $z(\mathbf{x})$ is the value of $Z$ at place $\mathbf{x}$ within a surrounding neighbourhood $s$, $\mu_s$ is the mean value in that neighbourhood, and $\varepsilon(\mathbf{x})$ is a spatially-dependent, random component with zero mean and variation defined by

$$\mathrm{Var}[\varepsilon(\mathbf{x}) - \varepsilon(\mathbf{x} + \mathbf{h})] = E[\{\varepsilon(\mathbf{x}) - \varepsilon(\mathbf{x} + \mathbf{h})\}^2] = 2\gamma(\mathbf{h}). \qquad (6)$$

The symbol $\mathbf{x}$ is used to denote a place in the two-dimensional plane, so that $\mathbf{x} = [x_1, x_2]$. The vector $\mathbf{h}$ is the lag, representing the distance and direction separating pairs of places in this expression. The quantity $\gamma(\mathbf{h})$ is the semi-variance at that lag, and the function $\gamma(\mathbf{h})$ for all $\mathbf{h}$ is the semi-variogram.

The kriged estimate of $Z$, $\hat{z}(\mathbf{x}_0)$, at $\mathbf{x}_0$ within $s$ is a weighted average of the observed values, $z(\mathbf{x}_1), z(\mathbf{x}_2), \ldots, z(\mathbf{x}_n)$ in it, thus:

$$\hat{z}(\mathbf{x}_0) = \sum_{i=1}^{n} \lambda_i z(\mathbf{x}_i). \qquad (7)$$

The estimation variance of $\hat{z}(\mathbf{x}_0)$, or kriging variance, is given by

$$\sigma_K^2 = 2 \sum_{i=1}^{n} \lambda_i \gamma(\mathbf{x}_i, \mathbf{x}_0) - \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j \gamma(\mathbf{x}_i, \mathbf{x}_j), \qquad (8)$$

where $\gamma(\mathbf{x}_i, \mathbf{x}_0)$ and $\gamma(\mathbf{x}_i, \mathbf{x}_j)$ are the semi-variances between places $\mathbf{x}_i$ and $\mathbf{x}_0$ and between $\mathbf{x}_i$ and $\mathbf{x}_j$, respectively. The weights, $\lambda_i$, are chosen so that they sum to 1, thereby ensuring that the estimate is unbiased, and at the same time, minimize the estimation variance. The latter condition obtains when

$$\sum_{j=1}^{n} \lambda_j \gamma(\mathbf{x}_i, \mathbf{x}_j) + \psi = \gamma(\mathbf{x}_i, \mathbf{x}_0) \quad \text{for all } i, \qquad (9)$$

where $\psi$ is the Lagrange parameter associated with the minimization. Equation (8) thus becomes

$$\sigma_K^2 = \sum_{i=1}^{n} \lambda_i \gamma(\mathbf{x}_i, \mathbf{x}_0) + \psi. \qquad (10)$$

The coefficients, $\lambda_i$, are obtained by solving the set of $n + 1$ simultaneous equations (9) written in matrix form as

$$\mathbf{c} = \mathbf{A}^{-1} \mathbf{b}, \qquad (11)$$

where

$$\mathbf{A} = \begin{bmatrix} \gamma(\mathbf{x}_1, \mathbf{x}_1) & \gamma(\mathbf{x}_2, \mathbf{x}_1) & \cdots & \gamma(\mathbf{x}_n, \mathbf{x}_1) & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ \gamma(\mathbf{x}_1, \mathbf{x}_n) & \gamma(\mathbf{x}_2, \mathbf{x}_n) & \cdots & \gamma(\mathbf{x}_n, \mathbf{x}_n) & 1 \\ 1 & 1 & \cdots & 1 & 0 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} \gamma(\mathbf{x}_1, \mathbf{x}_0) \\ \gamma(\mathbf{x}_2, \mathbf{x}_0) \\ \vdots \\ \gamma(\mathbf{x}_n, \mathbf{x}_0) \\ 1 \end{bmatrix} \quad \text{and} \quad \mathbf{c} = \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \\ \psi \end{bmatrix}. \qquad (12)$$
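As a rough illustration of equations (9), (11) and (12), the following Python sketch assembles A and b for a handful of points and solves for c. It is not the authors' program: the function names, the use of NumPy, the linear semi-variogram with unit slope and no nugget, and the four-point layout are all assumptions made for the example.

```python
import numpy as np

def linear_semivariogram(h, nugget=0.0, slope=1.0):
    """Hypothetical isotropic linear model: nugget + slope*h for h > 0, and 0 at h = 0."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0.0, nugget + slope * h, 0.0)

def point_kriging(sites, target, gamma):
    """Solve A c = b for the weights and Lagrange parameter at one target point.

    sites  : (n, 2) array of observation coordinates x_1..x_n
    target : (2,) coordinates of x_0
    gamma  : callable returning semi-variances for an array of distances
    Returns (weights, psi, kriging_variance).
    """
    sites = np.asarray(sites, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(sites)

    # Pairwise distances among sites, and from each site to the target.
    d_sites = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2)
    d_target = np.linalg.norm(sites - target, axis=1)

    # A: semi-variances bordered by a row and column of ones, zero in the corner.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d_sites)
    A[n, n] = 0.0

    # b: semi-variances to the target, with a final 1 for the sum-to-one constraint.
    b = np.ones(n + 1)
    b[:n] = gamma(d_target)

    c = np.linalg.solve(A, b)   # same c as A^{-1} b, without forming the inverse
    weights, psi = c[:n], c[n]
    variance = float(b @ c)     # sum_i lambda_i * gamma(x_i, x_0) + psi
    return weights, psi, variance

# Four observations at the corners of a unit square, target at the centre.
sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
weights, psi, variance = point_kriging(sites, [0.5, 0.5], linear_semivariogram)
print(weights, psi, variance)
```

For this symmetric layout each weight comes out as 0.25, and the variance returned is the sum of the weighted semi-variances to the target plus the Lagrange parameter, as in equation (10). Note that only the coordinates and the semi-variogram enter the calculation; no observed values are needed.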
The kriging variance is computed then as

$$\sigma_K^2 = \mathbf{b}^{\mathrm{T}} \mathbf{c}. \qquad (13)$$

Equations (8) and (10) define the kriging variance in those situations where estimates are required for sites of the same area and shape, that is support, as those on which the individual observations were made: they refer specifically to point kriging. In many instances, an estimate will be required over a larger area or block, $V$. The kriging variance, $\sigma^2_{KB}$, is then

$$\sigma^2_{KB} = \sum_{i=1}^{n} \lambda_i \bar{\gamma}(\mathbf{x}_i, V) + \psi_B - \bar{\gamma}(V, V). \qquad (14)$$

The term $\bar{\gamma}(\mathbf{x}_i, V)$ is the average semi-variance between the observation points and block $V$, and $\bar{\gamma}(V, V)$ is the average semi-variance between points within $V$, that is, the within-block variance. The quantity $\psi_B$ is a Lagrange parameter as before.

Equations (10) and (14) reveal an important feature of kriging: the estimation variances depend on the semi-variogram and, through it, on the configuration of observation points in relation to the point or block to be estimated and on the distances among them. They do not depend on the observed values themselves. Thus, if the semi-variogram is known already then the kriging variances for any sampling scheme can be determined before performing the sampling. A prerequisite for this type of survey is, therefore, a reconnaissance from which a semi-variogram can be estimated. This is achieved by sampling along transects in several directions across the region, estimating the semi-variances in integer multiples of the sampling interval, and then fitting an appropriate model by a least-squares or maximum-likelihood procedure.

Sampling pattern

It is evident from equations (8) and (9) that the kriging variance is not constant from place to place as is the estimation variance for the classical model. Since soil semi-variograms are monotonic increasing functions, at least within the distances for which kriged estimates are to be required, the kriging variance will tend to increase as the distance between the interpolated point or block and the observation points increases. Let this distance be $d$. Clearly, $d$ can be shortened to increase the average precision of estimates. However, the maximum distance, $d_{\max}$, between an interpolated point and its nearest sampling point can be minimized for any given sampling density by sampling on an equilateral triangular grid. For unit density, $d_{\max} = 0.6204$ for points at the centers of the grid cells, and it is at these points that the kriging variance is largest. Let this value be $\sigma^2_{K\max}$. Any other sampling scheme, especially random sampling and other irregular schemes, will have some larger values of $d_{\max}$ and, consequently, larger maximum estimation variances. Nevertheless, a square grid, with $d_{\max} = 1/\sqrt{2} = 0.7071$ for unit sampling density, will be generally less precise than the triangular scheme, and for reasons of convenience in indexing, site location and computer handling, and of shorter travelling distances, actually may be preferred. It is worth noting that Matérn (1960) showed equally conclusively that the equilateral grid is the best sampling design for estimating mean values of variables within geographic regions.

Optimal sampling

Assuming, for the moment, that variation is isotropic, the previous description shows that for any sampling density, and hence cost, the maximum error will be minimized if sampling is performed on an equilateral triangular grid, and occurs when estimating values at the centers of the grid cells. If the cost of survey is limiting, the estimation variance can be predicted by solving equation (10). In block kriging, the maximum estimation variance can occur either when blocks are centered at the centers of grid cells or when they are centered on observation points, depending on the size of the blocks in relation to the grid mesh. Therefore, for block kriging, equation (14) should be solved for both situations to identify the best sampling procedure.

If a survey is funded generously, then such sampling may give more than adequate precision, and, hence, waste money. Equally, the maximum tolerable estimation variance may be specified for the survey. In these situations, the problem is to determine the sampling density that provides precisely the required precision. This can be achieved by solving equations (10) or (14), depending on whether the estimates are required for points or blocks, for a range of spacings. A graph is then plotted of $\sigma^2_{K\max}$ against sampling density, and the required density read from the graph. This defines the sampling scheme involving least effort or cost, for a given precision, and is optimal in this sense.

Effect of anisotropy

Equilateral sampling grids are optimal only where variation is isotropic; modification is needed otherwise. Our experience to date is that anisotropy in the lateral plane can be regarded usually as geometric and can be eliminated by linear transformation of the coordinates. The transformation for angle $\theta$ is defined by

$$\Omega(\theta) = \{A^2 \cos^2(\theta - \phi) + B^2 \sin^2(\theta - \phi)\}^{1/2}, \qquad (15)$$

where $A$, $B$ and $\phi$ are parameters of the model (David, 1977). It can be applied to the distance parameter of any isotropic semi-variogram, for example, to the gradient of a linear semi-variogram or to the range of a spherical model, so that the meanings of $A$, $B$ and $\phi$ depend on the particular model being employed. Over the first lags, where the semi-variogram is approximately linear, the following equation serves:

$$\gamma(h, \theta) = C_0 + \{A^2 \cos^2(\theta - \phi) + B^2 \sin^2(\theta - \phi)\}^{1/2} |h|, \qquad (16)$$

where $h$ is the lag and $\theta$ is the direction in which the semi-variance, $\gamma$, is estimated, $C_0$ is the nugget variance, $A$ is the gradient in the direction, $\phi$, of maximum slope, and $B$ is the gradient in the direction perpendicular. The ratio $r = A/B$ is the anisotropy ratio, expressing the differing rates of variation in these two directions.
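The search for a spacing described under Optimal sampling can be sketched numerically for the isotropic point-kriging case. The Python below is only an illustration under stated assumptions, not the FORTRAN program of Part II: it assumes a square grid, a hypothetical linear semi-variogram (nugget 0.2, unit slope), a neighbourhood of the 16 grid nodes nearest a cell centre, and a target maximum variance of 0.8. All function names are invented for the example.

```python
import numpy as np

def gamma(h, nugget=0.2, slope=1.0):
    """Hypothetical linear semi-variogram; zero at h = 0, nugget + slope*h otherwise."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0.0, nugget + slope * h, 0.0)

def max_kriging_variance_square(spacing, half_width=2):
    """Point-kriging variance at a cell centre of a square grid.

    The neighbourhood is the (2*half_width)^2 grid nodes nearest the centre,
    and the target point is placed at the origin.
    """
    offsets = (np.arange(-half_width, half_width) + 0.5) * spacing
    gx, gy = np.meshgrid(offsets, offsets)
    sites = np.column_stack([gx.ravel(), gy.ravel()])
    n = len(sites)

    # Bordered semi-variance matrix and right-hand side of the kriging system.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(np.linalg.norm(sites[:, None] - sites[None, :], axis=2))
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(sites, axis=1))

    # Variance = weighted semi-variances to the target plus the Lagrange parameter.
    return float(b @ np.linalg.solve(A, b))

def largest_spacing(target_variance, spacings):
    """Largest trial spacing whose cell-centre variance does not exceed the target."""
    ok = [a for a in spacings if max_kriging_variance_square(a) <= target_variance]
    return max(ok) if ok else None

spacings = np.linspace(0.1, 2.0, 20)
variances = [max_kriging_variance_square(a) for a in spacings]
best = largest_spacing(0.8, spacings)
for a, v in zip(spacings, variances):
    print(f"spacing {a:.1f}  density {1.0 / a**2:8.3f}  variance {v:.4f}")
print("largest acceptable spacing:", best)
```

Tabulating the variances against the density 1/a² for each trial spacing a stands in for the graph described in the text. A triangular grid or block kriging would need its own site layout, target positions and averaged semi-variances, which this sketch does not attempt.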

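Equations (15) and (16) can be evaluated directly. The sketch below, in Python, uses illustrative parameter values that are not taken from the paper (nugget 0.5, A = 2, B = 1, direction of maximum slope π/6), and the function names are hypothetical.

```python
import math

def anisotropic_gradient(theta, A, B, phi):
    """Direction-dependent gradient: {A^2 cos^2(theta-phi) + B^2 sin^2(theta-phi)}^(1/2)."""
    return math.sqrt((A * math.cos(theta - phi)) ** 2 + (B * math.sin(theta - phi)) ** 2)

def linear_semivariance(h, theta, nugget, A, B, phi):
    """Linear semi-variance at lag distance h in direction theta (angles in radians)."""
    return nugget + anisotropic_gradient(theta, A, B, phi) * abs(h)

# Illustrative values only; not from the paper.
nugget, A, B, phi = 0.5, 2.0, 1.0, math.pi / 6
r = A / B  # anisotropy ratio

# Semi-variance at unit lag along the direction of maximum slope ...
g_max = linear_semivariance(1.0, phi, nugget, A, B, phi)
# ... and at lag r in the perpendicular direction.
g_min = linear_semivariance(r, phi + math.pi / 2, nugget, A, B, phi)
print(r, g_max, g_min)
```

Because A·h equals B·(rh), a lag of h in the direction of maximum slope and a lag of rh in the perpendicular direction give the same semi-variance under this model, so `g_max` and `g_min` agree to rounding error.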
Where such an anisotropy is shown to be present, the sampling grid should be aligned in directions $\phi$ and $\phi + \pi/2$ with spacings in the ratio $r$ with the smaller spacing in direction $\phi$, the direction of maximum rate of variation. An optimal sampling scheme can be obtained using the method described in the previous section by finding the maximum grid spacing for a given $\sigma^2_{K\max}$ in direction $\phi$. The grid spacing for direction $\phi + \pi/2$ is, then, $r$ times the spacing for direction $\phi$. A further implication of anisotropy, in terms of an overall sampling strategy, is that during the reconnaissance stage three or more directions must be studied in order to discover whether or not the semi-variogram shows geometric anisotropy.

Changing variance

Finally, in this section we should point out that surveyors should look for any substantial differences in the patterns of variation from one part of a region to another. If different parts of a region have different semi-variograms, then different optimal strategies will apply to them. Surveyors may choose to compromise, perhaps by oversampling in some parts. If the differences are large, however, then it may pay to adopt different strategies on nonconformant grids with different spacings or orientations or both. This will be inconvenient, and surveyors must judge whether the benefits of optimal sampling are worth the inconvenience.

RECIPE FOR AN OPTIMAL SAMPLING STRATEGY

To summarize the method described and place it in the context of an overall scheme, the following recipe for sampling is suggested. The strategy consists of decisions (D), computations (C), and field-work (F).

D1 Choose the maximum error allowed, $\sigma_{K\max}$, and block size.
D2 Decide the level of presurvey information required.
   (a) If the semi-variogram is known or can be inferred, then go to C3.
   (b) If the scale of variation is known or can be inferred, then go to D3.
   (c) Else, nothing is known or can be inferred about the variable in the region of interest and the scale of variation first should be obtained using F1 and C1.
F1 Obtain the scale of variation using a nested design (Youden and Mehlich, 1937).
C1 Calculate nondirectional semi-variograms for nested design (Miesch, 1975).
D3 Choose transect sample interval from nondirectional semi-variogram and preset $\sigma_{K\max}$, remembering that this sample interval should be considerably less than the final grid spacing to obtain a useful experimental semi-variogram.
F2 Sample transects in 3 or more directions with randomly-located starting points.
C2 Calculate experimental semi-variograms and fit a model.
C3 Obtain grid spacing $a$ for direction of maximum variation $\phi$ using the method described previously for both triangular and square grids. The grid spacing in direction $\phi + \pi/2$ is $ra$, where $r$ is the anisotropy ratio. If the semi-variogram is isotropic, the grid can be oriented in any direction and the grid spacings are equal in both directions.
D4 Choose to sample on a triangular or square grid. Only in exceptional circumstances will the efficiency advantage of the triangular grid outweigh the inconveniences of extra travelling, site location and computer handling.
F3 Sample on grid in direction $\phi$ rad with spacing $a$ and $\phi + \pi/2$ rad with spacing $ra$.

Acknowledgments--We should like to thank the Natural Environment Research Council and the Department of Agriculture and Fisheries for Scotland for the award of studentships to T. M. Burgess and A. B. McBratney, respectively.

REFERENCES

David, M., 1977, Geostatistical ore reserve estimation: Elsevier, Amsterdam, 364 p.
Journel, A. G., and Huijbregts, C. J., 1978, Mining geostatistics: Academic Press, London, 600 p.
Kantey, B. A., and Williams, A. A. B., 1962, The use of soil engineering maps for road projects: Trans. South African Inst. Civil Engngs., v. 4, no. 8, p. 149-159.
McBratney, A. B., and Webster, R., 1981, The design of optimal sampling schemes for local estimation and mapping of regionalized variables. Part II. Program and examples: Computers & Geosciences, v. 7, no. 4, p. 335-365.
Matérn, B., 1960, Spatial variation: Meddelanden från Statens Skogsforskningsinstitut, v. 49, no. 5, 144 p.
Matheron, G., 1965, Les variables régionalisées et leur estimation: Masson, Paris, 305 p.
Matheron, G., 1971, The theory of regionalized variables and its applications: Les Cahiers du Centre de Morphologie Mathématique, no. 5, Centre de Géostatistique, Fontainebleau, 211 p.
Miesch, A. T., 1975, Variograms and variance components in geochemistry and ore evaluation, in Whitten, E. H. T., ed., Quantitative studies in the geological sciences: Geol. Soc. America, Mem. 142, p. 333-340.
Thornburn, T. H., and Larsen, W. R., 1959, A statistical study of soil sampling: Jour. Soil Mechanics and Foundation Division, Proc. Am. Soc. Civil Engngs., v. 85, n. SM5, p. 1-13.
Tidball, R. R., 1976, Chemical variation of soils in Missouri associated with selected levels of the soil classification system: U.S. Geol. Survey Prof. Paper 954-B, 16 p.
Webster, R., and Beckett, P. H. T., 1968, Quality and usefulness of soil maps: Nature, v. 219, no. 5755, p. 680-682.
Webster, R., and Burgess, T. M., 1980, Optimal interpolation and isarithmic mapping of soil properties. III. Changing drift and universal kriging: Jour. Soil Science, v. 31, no. 3, p. 505-524.
Youden, W. J., and Mehlich, A., 1937, Selection of efficient methods for soil sampling: Contr. Boyce Thompson Inst. Plant Res., v. 9, p. 59-70.
