Estimation For Stochastic Soil Model PDF

ESTIMATION FOR STOCHASTIC SOIL MODELS
By Gordon A. Fenton,1 Associate Member, ASCE
ABSTRACT: Although considerable theory exists for the probabilistic treatment of soils, the ability to identify
the nature of spatial stochastic soil variation is almost nonexistent. We all know that we could excavate an entire
site and there would be no doubt about the soil properties. However, there would no longer be anything to rest
our structure on, and so we must live with uncertainty and attempt to quantify it rationally. Twenty years ago
the mean and variance was sufficient. Clients are now demanding full reliability studies, requiring more so-
phisticated models, so that engineers are becoming interested in rational soil correlation structures. Knowing
that soil properties are spatially correlated, what is a reasonable correlation model? Are soils best represented
using fractal models or finite-scale models? What is the difference? How can this question be answered? Once
a model has been decided upon, how can its parameters be estimated? These are questions that this paper
Downloaded from ascelibrary.org by Shiv Nadar University on 01/18/23. Copyright ASCE. For personal use only; all rights reserved.
addresses by looking at a number of tools that aid in selecting appropriate stochastic models. These tools include
the sample covariance, spectral density, variance function, variogram, and wavelet variance functions. Common
models, corresponding to finite scale and fractal models, are investigated, and estimation techniques are dis-
cussed.
INTRODUCTION designs, designs involving a future state, or designs where a

large site is to be characterized on the basis of a small test
The reliability assessment of geotechnical projects has been region. This paper focuses on inferential statistics for several
receiving increased attention from regulatory bodies in recent reasons: (1) Descriptive statistics are already reasonably well
years. To provide a rational reliability analysis of a geotech- established and understood; (2) a priori knowledge of the sec-
nical system, there is a need for realistic random soil models ond-moment (covariance) structure of soil properties is essen-
that can then be used to assess probabilities relating to the tial for BLUE estimators and Bayesian updating; and (3) site
design. Unfortunately, little research on the nature of soil spa- investigations are often not complete enough to even begin a
tial variability is available, and this renders reliability analyses spatial covariance estimation with any accuracy at all. Thus,
using spatial variability suspect. In an attempt to remedy this most reliability based designs will benefit from a database of
situation, this paper lays out the theory and discusses the anal- second-order soil statistics.
ysis and estimation tools needed to analyze spatially distrib- Consider a site from which a reasonably dense set of soil
uted soil data statistically. Because of the complexity of the samples has been gathered. The goal is to make statements
problem, the concentration herein is purely on the 1D case. using this data set about the stochastic nature of the soil at a
That is, the overall goal is to establish reasonable models for different, although presumed similar, site. The collected data
soil variability along a line. To achieve this goal, existing tools are a sample extracted over some sampling domain of extent
and estimators need to be critically reviewed to assess their D from a continuously varying soil property field. An example
performance for both large and small geotechnical data sets. may be seen in Fig. 1, where the solid line could represent the
In general, statistical analyses can be separated into two known, sampled, undrained shear strength of the soil, and the
areas that can be thought of as descriptive and inferential in dashed lines represent unknown shear strengths outside the
nature. In the former, the goal is to best describe a particular sampling domain.
data set with a view toward interpolating within the data set. Clearly, this sample exhibits a strong spatial trend and
For example, this commonly occurs when geotechnical data would classically be represented by an equation of the form
are obtained at a site for which a design is destined. The de-
scriptive techniques most often used are those of regression, Su(z) = m(z) ⫹ ε(z) (1)
using an appropriate polynomial that explains most of the var- where m(z) = deterministic function giving the mean soil prop-
iability, or best linear unbiased estimation (BLUE). Regression erty at z; and ε(z) = random residual. If the goal were purely
is purely geometry and observation based, whereas BLUE also descriptive, then m(z) would likely be selected to allow opti-
incorporates the covariance structure between the data. Thus, mally accurate (minimum variance) interpolation of Su be-
the BLUE techniques require an a priori estimate of the co- tween observations. This generally involves letting m(z) be a
variance function governing the soil’s spatial variability; this polynomial trend in z with coefficients selected to render ε
is often obtained by inference from other sites because it gen- mean zero with small variance.
erally requires a very large data set to estimate reliably. However, if the data shown in Fig. 1 are to be used to
In general, inference occurs whenever one estimates prop- characterize another site, then the trend must be viewed with
erties at any unobserved spatial location. Here, the word in- considerable caution. In particular, one must ask if a similar
ference will be taken to mean the estimation of stochastic trend is expected to be seen at the site being characterized,
model parameters that allow one to make probabilistic state- and, if so, where the trend origin is to be located. In some
ments about an entire site for which data are limited or not cases, where soil properties vary predictably with depth, the
available. This may be necessary, for example, in preliminary answer to this question is affirmative. For example, undrained
shear strength is commonly thought to increase with depth (but
1
Assoc. Prof., Dept. of Engrg. Math., Dalhousie Univ., Halifax, NS, not always). In cases where the same trend is not likely to
Canada B3J 2X4. reappear at the target site, then removal of the trend from the
Note. Discussion open until November 1, 1999. Separate discussions data and dealing with just the residual ε(z) have the following
should be submitted for the individual papers in this symposium. To
extend the closing date one month, a written request must be filed with
implications:
the ASCE Manager of Journals. The manuscript for this paper was sub-
mitted for review and possible publication on May 27, 1998. This paper
• Typically, the covariance structure of ε(z) is drastically
is part of the Journal of Geotechnical and Geoenvironmental Engi- different from that of Su(z)—it shows more spatial inde-
neering, Vol. 125, No. 6, June, 1999. 䉷ASCE, ISSN 1090-0241/99/0006- pendence and has reduced variance.
0470–0485/$8.00 ⫹ $.50 per page. Paper No. 18420. • The reintroduction of the trend to predict the deterministic
470 / JOURNAL OF GEOTECHNICAL AND GEOENVIRONMENTAL ENGINEERING / JUNE 1999
J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

FIG. 1. Soil Sample on Finite-Sampling Domain
part of the soil properties at the target site may be grossly chemical and mechanical details of the soil particles them-
in error. selves), which is common to most soil properties and types.
• The use of only the residual process ε(z) at the target site Although it is undoubtedly true that many exceptions to the
will considerably underestimate the soil variability—the idea of an externally created correlation structure exist, the
reported statistics will be unconservative. In fact, the more idea nevertheless gives a reasonable starting point for this type
variability accounted for by m(z), the less variable is ε(z). of investigation. It allows the correlation structure derived
from a random field of a specific soil property to be used
without change as a reasonable a priori correlation structure
From these considerations, it is easily seen that trends that are for other soil properties of interest and for other similar sites
not likely to reappear, that is, trends that are not physically (or (although changes in the mean and possibly variance may, of
empirically) based and predictable, must not be removed prior course, be necessary).
to performing an inferential statistical analysis. The trend itself Fundamental to the following statistical analysis is the as-
is part of the uncertainty to be characterized and removing it sumption that the soil is spatially statistically homogeneous.
leads to unconservative reported statistics. This means that the mean, covariance, correlation structure,
It should be pointed out at this time that one of the reasons and higher-order moments are independent of position (and
for ‘‘detrending’’ the data is precisely to render the residual thus are the same from any reference origin). In the general
process largely spatially independent. This is desirable because case, isotropy is not usually assumed. That is, the vertical and
virtually all classical statistics are based on the idea that sam- horizontal correlation structures are allowed to be quite dif-
ples are composed of independent and identically distributed ferent. However, this is not an issue in this study because only
observations. Alternatively, when observations are dependent, the 1D case is considered.
the distributions of the estimators become very difficult to es- The assumption of spatial homogeneity does not imply that
tablish. This is compounded by the fact that the actual depen- the process is relatively uniform over any finite domain. It
dence structure is unknown. Only a limited number of asymp- allows apparent trends in the mean, variance, and higher-order
totic results are available to provide insight into the spatially moments as long as those trends are just part of a larger-scale
dependent problem (Beran 1994), and simulation techniques random fluctuation; that is, the mean, variance, etc. need only
are proving very useful in this regard (Cressie 1993). be constant over infinite space, not when viewed over a finite
Another issue to be considered is the level of information distance. (Fig. 1 may be viewed as an example of a homo-
available at the target site. Generally, a design does not pro- geneous random field that appears nonstationary when viewed
ceed in the complete absence of site information. The ideal locally.) Thus, this assumption does not preclude large-scale
case involves gathering enough data to allow the main char- variations, such as those often found in natural soils, although
acteristics, for instance, the mean and variance, of the soil the statistics relating to the large-scale fluctuations are gener-
property to be established with reasonable confidence. Then, ally harder to estimate reliably from a finite-sampling domain.
inferred statistics regarding the spatial correlation (where cor- The assumption of spatial homogeneity does, however,
relation means correlation coefficient throughout this paper) seem to imply that the site over which the measurements are
structure can be used to complete the uncertainty picture and taken is fairly uniform in geological makeup (or soil type).
allow a reasonable reliability analysis or internal linear esti- Again, this assumption relates to the level of uncertainty about
mation. Under this reasoning, this paper will concentrate on the site for which the random model is aimed. Even changing
statistics relating to the spatial correlation structure of a soil. geological units may be viewed as simply part of the overall
Although the mean and variance will be obtained along the randomness or uncertainty, which is to be characterized by the
way as part of the estimation process, these results tend to be random model. The more that is known about a site, the less
specifically related to, and affected by, the soil type; this is random the site model should be. However, the initial model
that is used before significant amounts of data are explicitly
true particularly of the mean. The correlation structure is be-
gathered should be consistent with the level of uncertainty at
lieved to be more related to the formation process of the soil;
the target site at the time the model is applied. Bayesian up-
that is, the correlation between soil properties at two disjoint
dating can be used to improve a prior model under additional
points will be related to where the materials making up the site data.
soil at the two points originated and to the common weathering With these thoughts in place, an appropriate inferential anal-
processes experienced at the two points (i.e., geological dep- ysis proceeds as follows:
osition processes, etc.). Thus, the major factors influencing a
soil’s correlation structure can be thought of as being ‘‘exter- 1. An initial regression analysis may be performed to de-
nal’’ (i.e., related to transport and weathering rather than to termine if a statistically significant spatial trend is pres-
JOURNAL OF GEOTECHNICAL AND GEOENVIRONMENTAL ENGINEERING / JUNE 1999 / 471

ent. Because a trend with, for example, depth may have
some physical basis and may be expected to occur iden-
tically at other sites, it may make sense to predict this
trend and assume it to hold at the target site. If so, the
remainder of the analysis is performed on the detrended
data, ε(x) = Su(x) ⫺ m(x), for example, and the trend
and residual statistics must both be reported for use at
the target site because they are intimately linked and can-
not be considered separately. Using just the residual sta-
tistics leads to a stochastic model that is likely to be
grossly in error.
2. Establish the second-moment behavior of the data set
over space. Here, interest may specifically focus on
whether the soil is best modeled by a finite-scale sto-
chastic model having limited spatial correlation, or by a

fractal model having significant lingering correlation
over very large distances. These terms are discussed in
more detail later.
3. For a selected spatial correlation function, estimate any
required parameters from the data set.
By and large, probably the most common spatial correlation

model in use currently is the 1D Markov model that has an
exponentially decaying correlation function
␳(␶) = exp 冉冊
⫺
2兩␶兩
␪
(2)
where ␳(␶) = correlation coefficient between two points sep-

arated by a lag distance ␶. In this model the parameter ␪ is a
distance called the ‘‘scale of fluctuation.’’ It may be loosely
interpreted as the separation distance beyond which soil prop-
erties are largely uncorrelated. VanMarcke (1984) defined ␪ to
be equal to the area under the correlation function. Eq. (2) is
considered to be a finite-scale model because the correlation
dies out very rapidly for separation distances ␶ > ␪; the area
under this function, in particular, is finite. Such models are
also called short-memory (Beran 1994). Other common finite- FIG. 2. Realization of Self-Similar Fractional Gaussian Noise
scale models are the Gaussian model, ␳(␶) = exp{⫺␲(␶/␪)2}, (H = 0.95) Seen at Various Resolutions
and the spherical model, ␳(␶) = 1 ⫺ 1.5兩␶/␪兩 ⫹ 0.5兩␶/␪兩3, (兩␶兩
ⱕ ␪; ␳(␶) = 0 if 兩␶兩 > ␪). Probably the best way to envisage the spectral density in-
An alternative model, which is rapidly gaining acceptance terpretation of a random process is to think of the random
in a wide variety of applications, is the fractal model, also process as being composed of a number of sinusoids, each
known as statistically self-similar, long-memory, and 1/f noise. with random amplitude (power). The fractal model indicates
This model has an infinite scale of fluctuation, and correlations that these random processes are made up of high amplitude,
remain significant over very large distances. An example of long wavelength (low frequency) sinusoids added to succes-
such a process is shown in Fig. 2. It should be noted that the sively less powerful, short wavelength sinusoids. The long
samples remain statistically similar, regardless of viewing res- wavelength components provide for what are seen as trends
olution, under suitable scaling of the vertical axis. Such pro- when viewed over a finite interval. As one ‘‘zooms’’ out and
cesses are often described by the (one-sided) spectral density views progressively more of the random process, even longer
function wavelength (scale) sinusoids become apparent. Conversely, as
one zooms in, the short wavelength components dominate the
Go (local) picture. This is the nature of self-similarity attributed
G(␻) = (3)
␻␥ to fractal processes—realizations of the process look the same
(statistically) at any viewing scale.
in which the parameter ␥ controls how the spectral power is By locally averaging the fractional Gaussian noise (0 < ␥ <
partitioned from the low-to-high frequencies, and Go can be 1) process over the small distance ␦, Mandelbrot (1968) ren-
viewed as a spectral intensity (white noise intensity when ␥ = ders fractional Gaussian noise physically realizable with finite
0). In particular, the case where 0 ⱕ ␥ < 1 corresponds to variance and correlation function
infinite high frequency power and results in a stationary ran-
dom process called fractional Gaussian noise (Mandelbrot 1
1968). When ␥ > 1, the spectral density falls off more rapidly ␳(␶) = [兩␶ ⫹ ␦兩2H ⫺ 2兩␶兩2H ⫹ 兩␶ ⫺ ␦兩2H] (4)
2␦2H
at high frequencies, but grows more rapidly at low frequencies
so that the infinite power is now in the low frequencies. This where H = (1/2)(␥ ⫹ 1) is called the Hurst or self-similarity
then corresponds to a nonstationary random process called coefficient with 1/2 ⱕ H < 1. The case H = 1/2 gives white
fractional Brownian motion. Both cases are infinite-variance noise, whereas H = 1 corresponds to perfect correlation [all
processes that are physically unrealizable. Their spectral den- X(z) = X, in the stationary case]. Strictly speaking, the process
sities must be truncated in some fashion to render them sta- variance ␴2X is fully determined by Go, H, and ␦ as ␴X2 =
tionary with finite variance. Go␦2H⫺2⌫(1 ⫺ 2H )cos(␲H )/(␲H ), which goes to infinity as the

冘
n
local averaging distance ␦ goes to zero, as expected. The local 1
averaging is effectively a low pass filter, damping out high ␴
ˆ 2X = (x i ⫺ ␮
ˆ X)2 (6)
n i =1
frequency contributions, so that Mandelbrot’s approach essen-
tially truncates the spectral density function at the high end. This estimator is biased because it is not divided by n ⫺ 1
Self-similarity for fractional Gaussian noise is expressed by as is usually seen. (A biased estimator is one whose expected
saying that the process X(z) has the same distribution as the value is not equal to the parameter it purports to estimate.)
scaled process a1⫺HX(az), for some a > 0. Alternatively, self- The use of a biased estimator here is for three reasons:
similarity for fractional Brownian motion means that X(z) has
the same distribution as a⫺HX(az), where the different expo- 1. The expected error variance is (slightly) smaller than that
nent on a is due to the fact that fractional Gaussian noise is for the biased case.
the derivative of fractional Brownian motion. Fig. 2 shows a 2. The biased estimator, when estimating covariances, leads
realization of fractional Gaussian noise with H = 0.95 pro- to a tractable nonnegative definite covariance matrix.
duced using the local average subdivision method (Fenton 3. It is currently the most popular variance estimator in time
1990). The uppermost plot is of length n = 65,536. Each plot series analysis (Priestley 1981).
in Fig. 2 zooms in by a factor of a = 8, so that each lower

plots has its vertical axis stretched by a factor of 80.05 = 1.11 The main reason for its popularity is probably due to its non-
to appear statistically similar to the next higher plot. negative definiteness. The covariance C(␶) between X(z) and
In the following sections, the applicability of these models X(z ⫹ ␶) is estimated, using a biased estimator for reasons
to characterize soil properties will be considered. In particular, discussed above, as
冘
the Second-Order Structural Analysis section takes a qualita- n⫺j ⫹1
tive look at a number of tools that reveal something about 1
Ĉ(␶j) = (x i ⫺ ␮
ˆ X)(x i ⫹ j ⫺1 ⫺ ␮
ˆ X), j = 1, 2, . . . , n (7)
the second-moment behavior of a 1D random process. The n i =1
intent of that section is to evaluate these tools with respect to
their ability to discern between finite scale and fractal behav- where the lag ␶j = ( j ⫺ 1)⌬z. Notice that Ĉ(0) is the same as
ior. In the Estimation of First- and Second-Order Statistical the estimated variance ␴ˆ 2X. The sample correlation is obtained
Parameters section, various maximum likelihood approaches by normalizing
to the estimation of the parameters for the finite-scale and frac- Ĉ(␶j)
tal models are given. Finally, the results are summarized in ␳ˆ (␶j) = (8)
the Conclusions with a view toward their use in developing a Ĉ(0)
priori soil statistics from a large geotechnical database. One of the major difficulties with the sample correlation func-
tion resides in the fact that it is heavily dependent on the
SECOND-ORDER STRUCTURAL ANALYSIS estimated mean ␮ ˆ X. When the soil shows significant long-scale
Attention is now turned to the stochastic characterization of dependence, characterized by long-scale fluctuations (see, e.g.,
the soil data itself. Aside from estimating the mean and vari- Fig. 1), then ␮
ˆ X is almost always a poor estimate of the true
ance, the spatial correlation structure must be deduced. In the mean. In fact, it is not too difficult to show that although the
following it is assumed that the data, xi , i = 1, 2, . . . , n, are mean estimator [(5)] is unbiased, its variance is given by
collected at a sequence of equispaced points along a line and
冋冘冘册
n n
1
that the best stochastic model along that line is to be found. Var[␮
ˆ X] = ␳(␶i⫺j) ␴2X = ␥n␴2X ⯝ ␥(D)␴2X (9)
Note that the xi may be some suitable transformation of the n2 i =1 j =1
actual data derived from the samples. In the following, xi is where D = (n ⫺ 1)⌬z = sampling domain size (interpreted as
an observation of the random process X i = X(zi), where z is an the region defined by n equisized ‘‘cells,’’ each of width ⌬z
index (commonly depth) and zi = (i ⫺ 1)⌬z, i = 1, 2, . . . , n. centered on an observation); ␥(D) = variance function
A variety of tools will be considered in this section and their (VanMarcke 1984) that gives the variance reduction due to
ability to identify the most appropriate stochastic model for Xi averaging over the length D
冕冕冕
will be discussed. In particular, interest focuses on whether the D D D
process X(z) is finite scaled or fractal in nature. The perfor- 1 2
mance of the various tools in answering this question will be ␥(D) = ␳(␶ ⫺ s) d␶ ds = (T ⫺ ␶)␳(␶) d␶ (10)
D2 0 0 D2 0
evaluated via simulation employing 2,000 realizations of fi-
nite-scale (Markov) [(2)] and fractal [(4)] processes. Each sim- The discrete approximation to the variance function, denoted
ulation is 20.48 m in length with ⌬z = 0.02, so that n = 1,024, ␥n in (9), approaches ␥(D) as n becomes large. For highly
and realizations are produced via covariance matrix decom- correlated soil samples (over the sampling domain), ␥(D) re-
position (Fenton 1994)—a method that follows from a Cho- mains close to 1.0, so that ␮
ˆ X remains highly variable, almost
leski decomposition. Unfortunately, large covariance matrices as variable as X(z) itself. Notice that the variance of ␮ ˆ X is
are often nearly singular and so are difficult to decompose unknown because it depends on the unknown correlation struc-
correctly. Because the covariance matrix for a 1D equispaced ture of the process.
random field is symmetric and Toeplitz (i.e., the entire matrix In addition, it can be shown that Ĉ(␶j) is biased according
is known if only the first column is known—all elements to the following (VanMarcke 1984):
冉冊
along each diagonal are equal), the decomposition is done us-
ˆ j)] ⯝ ␴2X n⫺j⫹1
ing the numerically more accurate Levinson-Durbin algorithm E[C(␶ [␳(␶j) ⫺ ␥(D)] (11)
[see Marple (1987) and Brockwell and Davis (1987)]. n
where, again, the approximation improves as n increases. From
Sample Correlation Function this it can be seen that
冉冊冉冊
The classical sample average of x i is computed as ˆ j)] n⫺j⫹1 ␳(␶j) ⫺ ␥(D)
冘
E[C(␶
n E[␳ˆ (␶j)] ⯝ ⯝ (12)
1 ˆ
E[C(0)] n 1 ⫺ ␥(D)
␮
ˆX = xi (5)
n i =1
using a first-order approximation. For soil samples that show
and the sample variance as considerable serial correlation, ␥(D) may remain close to 1,

FIG. 3. Covariance Estimator on Strongly Dependent Finite Sample
and generally the term ␳(␶j) ⫺ ␥(D) will become negative for The major difference between V̂(␶j) and ␳ˆ (␶j) is that the
all j* ⱕ j ⱕ n for some j* less than n. What this means is semivariogram does not depend on ␮ ˆ X. This is a clear advan-
that the estimator ␳ˆ (␶) will typically dip below zero even when tage because many of the problems of the correlation function
the field is actually highly positively correlated. relate to this dependence. In fact, it is easily shown that the
Another way of looking at this problem is as follows. Con- semivariogram is an unbiased estimator with E[V(␶ ˆ j)] =
sider the sample shown in Fig. 1 and assume that the apparent (1/2)E[(X i⫹j ⫺ X i)2]. Fig. 5 shows how this estimator behaves
trend is in fact just part of a long-scale fluctuation. Clearly, for finite-scale and fractal processes. Notice that the finite-
Fig. 1 is then a process with a large scale of fluctuation, com- scale semivariogram rapidly increases to its limiting value (the
pared with D. The local average ␮ ˆ X is estimated and shown variance) and then flattens out whereas the fractal process
by the dashed line in Fig. 3. Now, for any ␶ greater than about leads to a semivariogram that continues to increase gradually
half of the sampling domain, the product of the deviations throughout. This behavior can indicate the underlying process
from ␮ ˆ X in (7) will be negative. This means that the sample type and allow identification of a suitable correlation model.
correlation function will decrease rapidly and become negative Noteworthy, however, is the very wide range between the ob-
somewhere before ␶ = (1/2)D even though the true correlation served minimum and maximum (the maximum going off the
function may remain much closer to 1.0 throughout the sam- plot but having maximum values in the range from 5 to 10 in
ple. both cases). The high variability in the semivariogram may
It should be noted that if the sample does in fact come from hinder its use in discerning between model types unless suf-
a short-scale process, with ␪ << D, the variability of (9) and ficient averaging can be performed.
the bias of (12) largely disappear because ␥(D) ⯝ 0. This The semivariogram finds its primary use in mining geosta-
means that the sample correlation function is a good estimator tistics applications [see, e.g., Journel and Huijbregts (1978)].
of short-scale processes as long as ␪ << D. However, if the Cressie (1993) discussed some of its distributional character-
process does in fact have long-scale dependence, then the cor- istics along with robust estimation issues, but little is known
relation function cannot identify this and in fact continues to about the distribution of the semivariogram when X(z) is spa-
illustrate short-scale behavior. In essence, the estimator is anal- tially dependent. Without the estimator distribution, the
ogous to a self-fulfilling prophecy: It always appears to justify semivariogram cannot easily be used to test rigorously be-
its own assumptions. tween competing model types (as in fractal versus finite scale),
Fig. 4 illustrates the situation graphically using simulations nor can it be used to fit model parameters using the maximum
from finite-scale and fractal processes. The dashed lines show likelihood method. The latter is one of the goals of this paper.
the maximum and minimum correlations observed at each lag For these reasons, the semivariogram will not be pursued fur-
over the 2,000 realizations. The finite-scale (␪ = 3) simulation ther in this paper, although indications are that such a pursuit
shows reasonable agreement between ␳ˆ (␶) and the true corre- may be warranted.
lation because ␪ << D ⯝ 20. However, for the fractal process
(H = 0.95) there is a very large discrepancy between the es- Sample Variance Function
timated average and true correlation functions. Clearly, the
sample correlation function fails to provide any useful infor- The variance function measures the decrease in the variance
mation about large-scale or fractal processes. of an average as an increasing number of sequential random
variables are included in the average. If the local average of
Sample Semivariogram a random process XD is defined by
冕
The semivariogram [half of the variogram, as defined by D
Matheron (1962)] gives essentially the same information as 1

XD = X(z) dz (15)
the correlation function because, for stationary processes, they D 0
are related according to then the variance of XD is just ␥(D )␴2X. In the discrete case,
1 which will be used here, this becomes
冘冘
V(␶j) = E[(X i⫹j ⫺ X i)2] = ␴2X (1 ⫺ ␳(␶)) (13) n n
2 1 1
X̄ = ␮
ˆX = X(zi) = Xi (16)
The sample semivariogram is defined by n i =1 n i =1
冘
n⫺j
1 where Var[X̄ ] = ␥n␴X2 ; and ␥n, defined by (9), is the discrete
V̂(␶j) = (x i⫹j ⫺ x i)2, j = 0, 1, . . . , n ⫺ 1 (14)
2(n ⫺ j) i =1 approximation of ␥(D) (see Sample Correlation Function sub-

FIG. 4. Sample Correlation Functions for Models Averaged Over 2,000 Realizations: (a) Finite Scale (␪ = 3); (b) Fractal (H = 0.95)
section). If the X i values are independent and identically dis- It should be noted that ␥ˆ 1 = 1 as the sum of (18) is the same
tributed then ␥n = 1/n, whereas if X1 = X2 = ⭈ ⭈ ⭈ = X n, then the as that used to find ␴ˆ 2X when i = 1. Also, when i = n, the
values of X are completely correlated and ␥n = 1 so that av- sample variance function ␥ˆ n = 0 because X n, j = X n,1 = ␮
ˆ X. Thus,
eraging does not lead to any variance reduction. In general, the sample variance function always connects the points ␥1 =
for correlation functions that remain nonnegative, 1/n ⱕ ␥n 1 and ␥n = 0.
ⱕ 1. Unfortunately, the sample variance function is biased, and
Conceptually, the rate at which the variance of an average its bias depends on the degree of correlation between obser-
decreases with averaging size states the spatial correlation vations. Specifically, it can be shown that
structure. In fact these are equivalent because in the 1D (con-
tinuous case) ␥i ⫺ ␥n
E[␥ˆ i] ⯝ (20)
冕
D 1 ⫺ ␥n
2 1 ⭸ 2
␥(D) = 2 (D ⫺ ␶)␳(␶) d␶ ⇔ ␳(␶) = [␶2 ␥(␶)] (17)

D 0 2 ⭸␶ 2 using a first-order approximation. This becomes unbiased as n
→ ⬁ only if D = (n ⫺ 1)⌬z → ⬁ and ␥(D) → 0 as well. In
Given a sequence of n equispaced observations over a sam- other words, we need both the averaging region to grow large
pling domain of size D = n⌬z the sample (discrete) variance and the correlation function to decrease sufficiently rapidly
function is estimated to be within the averaging region for the sample variance function
冘
n⫺i⫹1
1 to become unbiased.
␥ˆ i = 2 (X i, j ⫺ ␮
ˆ X)2, i = 1, 2, . . . , n (18) Figs. 6(a and b) show sample variance functions averaged
␴
ˆ X(n ⫺ i ⫹ 1) j =1
over 2,000 simulations of finite-scale and fractal random pro-
where X i, j = local average as follows: cesses, respectively. There is very little difference between the
冘
j ⫹i ⫺1 estimated variance function in the two plots, despite the fact
1 that they come from quite different processes. Clearly, the es-
X i, j = Xk, j = 1, 2, . . . n ⫺ i ⫹ 1 (19)
i k=j timate of the variance function in the fractal case is highly

FIG. 5. Variograms for Models Averaged Over 2,000 Realizations: (a) Finite Scale (␪ = 3); (b) Fractal (H = 0.95)
biased. Thus, the variance function plot does not appear to be they are self-similar in nature [i.e., all wavelets look the same
a good identification tool and is really only a useful estimate when viewed at the appropriate scale (which, in the above
of second-moment behavior for the finite-scale process with ␪ definition, is some power of 2)]. The random process X(z) is
<< D. In the plots, T = i⌬z for i = 1, 2, . . . , n. then expressed as a linear combination of various scalings,
冘冘
translations, and dilations of a common ‘‘shape.’’ Specifically
Wavelet Coefficient Variance
X(z) = X mj ␺mj (z) (22)
The wavelet basis has attracted much attention in recent m j
years in areas of signal analysis, image compression, and If the wavelets are suitably selected to be orthonormal, then
among other things, fractal process modeling (Wornell 1996). the coefficients can be found through the inversion
It can basically be viewed as an alternative to Fourier decom-
冕
⬁
position except that sinusoids are placed by ‘‘wavelets’’ that m
act only over a limited domain. 1D wavelets are usually de- X =
j X(z)␺mj (z) dz (23)
⫺⬁
fined as translations along the real axis and dilations (scalings)
of a ‘‘mother wavelet,’’ as in for which highly efficient numerical solution algorithms exist.
The details of the wavelet decomposition will not be discussed
␺mj (z) = 2m/2␺(2mz ⫺ j) (21)
in this paper. [The interested reader should see, for example,
where m and j = dilation and translation indices, respectively. Strang and Nguyen (1996).]
The appeal to using wavelets to model fractal processes is that A theorem by Wornell (1996) states that, under reasonably

FIG. 6. Sample Variance Functions for Models Averaged Over 2,000 Realizations: (a) Finite Scale (␪ = 1); (b) Fractal (H = 0.98)
general conditions, if the coefficients X mj are mutually uncor- against the scale index m for the finite-scale and fractal sim-
related, zero-mean random variables with variances ulation cases. The fractal simulations yield a straight line, as
expected, whereas the finite-scale simulations show a slight
␴2m = Var[X mj ] = ␴22⫺␥m (24) flattening of the variance at lower values of m (larger scales).
The lowest value of m = 1 is not plotted because the variance
then X(z) obtained through (22) will have a spectrum that is of this estimate is very large, and it appears to suffer from the
very nearly fractal. Furthermore, Wornell made theoretical and same bias as estimates of the spectral density function at ␻ =
simulation based arguments showing that the converse is also 0 (as discussed later). On the basis of Fig. 7, it appears that
approximately true, namely, that if X(z) is fractal with spectral the wavelet coefficient variance plot may have some potential
density proportional to ␻⫺␥, then the coefficients X mj will be in identifying an appropriate stochastic model, although the
approximately uncorrelated with variance given by (24). If this difference in the plots is quite small. Confident conclusions
is the case, then a plot of ln(Var[X mj ]) versus the scale index will require a large data set.
m will be a straight line. If it turns out that X(z) is fractal and Gaussian, then the
Using a fifth-order Daubechies wavelet basis, Fig. 7 shows coefficients X mj are also Gaussian and (largely) uncorrelated as
a plot of the estimated wavelet coefficient variances ␴ˆ 2m, where discussed above. This means that a maximum likelihood es-
冘
2m⫺1 timation can be performed to evaluate the spectral exponent ␥
1 by looking at the likelihood of the computed set of coefficients
␴
ˆ 2m = (x mj )2 (25)
2m⫺1 j =1 x mj .

FIG. 7. Sample Wavelet Coefficient Variance Functions for Models Averaged Over 2,000 Realizations: (a) Finite Scale (␪ = 3); (b) Frac-
tal (H = 0.95)
Sample Spectral Density Function exponentially distributed with means equal to the true one-
sided spectral density G(␻j) [see Beran (1994)]. VanMarcke
The sample spectral density function, referred to here also
(1984) showed that the periodogram itself has a nonzero scale
as the periodogram despite its slightly nonstandard form, is
of fluctuation when D = n⌬z is finite, equal to 2␲/D. This
obtained by first computing the Fourier transform of the data
suggests the presence of serial correlation between periodo-
冘
n⫺1
1 gram estimators. However, because the periodogram estimates
␹(␻j) = X(tk)e⫺i␻j k (26) at Fourier frequencies are separated by 2␲/D, they are, there-
n k =0
fore, approximately independent, according to the physical in-
at each Fourier frequency ␻j = 2␲j/D, j = 0, 1, . . . , (n ⫺ 1)/2. terpretation of the scale of fluctuation distance. The indepen-
This is efficiently achieved using the fast Fourier transform. dence and distribution have also been shown by Yajima (1989)
The periodogram is then given by the squared magnitude of to hold for both fractal and finite-scale processes. Armed with
the complex Fourier coefficients according to this distribution on the periodogram estimates, one can per-
D form maximum likelihood estimation as well as (conceptually)
Ĝ(␻j) = 兩␹(␻j)兩2 (27) hypothesis tests. If the periodogram is smoothed using some
␲
sort of smoothing window, as discussed by Priestley (1981),
where D = n⌬z. For stationary processes with finite variance, the smoothing may lead to loss of independence between es-
the periodogram estimates as defined here are independent and timates at sequential Fourier frequencies so that likelihood ap-

FIG. 8. Periodograms for Models Averaged Over 2,000 Realizations: (a) Finite Scale (␪ = 3); and (b) Fractal (H = 0.95)
proaches become complicated. In this sense, it is best to c, so that a log-log plot of the sample spectral density function
smooth the periodogram (which is notoriously rough) by av- of a fractal process will be a straight line with a slope of ⫺␥.
eraging over an ensemble of periodogram estimates taken from Fig. 8 illustrates how the periodogram behaves when averaged
a sequence of realizations of the random process, where avail- over both fractal and finite-scale simulations. The periodogram
able. is a straight line with a negative slope in the fractal case and
It should be noted that the periodogram estimate at ␻ = 0 becomes more flattened at the origin in the finite-scale case,
is not a good estimator of G(0), and so it should not be in- as was observed for the wavelet variance plot. Again, the dif-
cluded in the periodogram plot. In fact, the periodogram es- ference is only slight, so that a fairly large data set is required
timate at ␻ = 0 is biased with E[Ĝ(0)] = G(0) ⫹ n␮2X /(2␲) to decide on a model with any degree of confidence.
(Brockwell and Davis 1987). Recalling that ␮X is unknown,
and its estimate highly variable when strong correlation exists, ESTIMATION OF FIRST- AND SECOND-ORDER
the estimate Ĝ(0) should not be trusted. In addition, its distri- STATISTICAL PARAMETERS
bution is no longer a simple exponential.
The easiest way to determine whether the data are fractal in Upon deciding on whether a finite-scale or fractal model is
nature is to look directly at a plot of the periodogram. Fractal more appropriate in representing the soil data, the next step is
processes have spectral density functions of the form G(␻) ⬀ to estimate the pertinent parameters. In the case of the finite-
␻⫺␥ for ␥ > 0. Thus, ln G(␻) = c ⫺ ␥ ln ␻, for some constant scale model, the parameter of interest is the scale of fluctuation

␪. For fractal models, the parameter of interest is the spectral succinctly expresses the first two moments of the random field,
exponent ␥, or, equivalently for 0 ⱕ ␥ < 1, the self-similarity where it is hoped that even these can be estimated with some
parameter H = (1/2)(␥ ⫹ 1). confidence.
Because the data are assumed to be jointly normally dis-
Finite-Scale Model tributed, the space domain maximum likelihood estimators are
obtained by maximizing the likelihood of observing the spatial
If the process is deemed to be finite-scale in nature, then a data under the assumed joint distribution. The likelihood of
variety of techniques are available to estimate ␪, some of observing the sequence of observations xT = {x1, x2, . . . , x n}
which are as follows: (superscript T denotes the vector or matrix transpose) given
the distributional parameters ␾T = {␮X , ␴2X , ␪} is
• Directly compute the area under the sample correlation
function. This is a nonparametric approach, although it
assumes that the scale is finite and that the correlation
function is monotonic. The area is usually taken to be the
L(x兩␾) =
1
(2␲)n/2兩C兩1/2
exp 再 ⫺
1
2
(x ⫺ ␮)TC ⫺1(x ⫺ ␮)冎 (28)
area up to when the function first becomes negative (the where C = covariance matrix between the observation; Cij =
scale of fluctuation is not well defined for oscillatory cor- E[(X i ⫺ ␮i)(X j ⫺ ␮j)]; 兩C兩 = determinant of C; and ␮ = vector
relation functions—other parameters may be more appro- of means corresponding to each observation location. In the
priate if this is the case). It should also be noted that following, the data are assumed to be modeled by a stationary
correlation estimates lying within the band ⫾2n⫺1/2 are random field, so that the mean is spatially constant and ␮ =
commonly deemed to be not significantly different than ␮X 1, where 1 is a vector with all elements equal to 1. Again,
zero [see Priestley (1981, p. 340) and Brockwell and Da- if this assumption is not deemed warranted, a deterministic
vis (1987, Chapter 7)]. trend in the mean and variance can be removed from the data
• Use regression to fit a correlation function to ␳ˆ (␶) or a prior to the following statistical analysis through the transfor-
semivariogram to V̂(␶). For certain assumed correlation or mation
semivariogram functions, this regression may be nonlin-
ear in ␪. x(z) ⫺ m(z)
x⬘(z) =
• If the sampling domain D is deemed to be much larger s(z)
than the scale of fluctuation, then the scale can be esti- where m(z) and s(z) = deterministic spatial trends in the mean
mated from the variance function using an iterative tech- and standard deviation, respectively, possibly obtained by re-
nique such as that suggested by VanMarcke (1984, p. gression analysis of the data. Recall, however, that this is gen-
337). erally only warranted if the same trends are expected at the
• Assuming a joint distribution for X(tj) with corresponding target site.
correlation function model, estimate unknown parameters Also due to the stationarity assumption, the covariance ma-
(␮X , ␴2X, and correlation function parameters) using max- trix can be written in terms of the correlation matrix ␳ as
imum likelihood in the space domain.
• Using the established results regarding the joint distribu- C = ␴2X ␳ (29)
tion for periodogram estimates at the set of Fourier fre-
quencies, an assumed spectral density function can be where ␳ = function only of the unknown correlation function
‘‘fit’’ to the periodogram using maximum likelihood. parameter ␪. If the correlation function has more than one
parameter, then ␪ is treated as a vector of unknown parameters
Because of the reasonably high bias in the sample correlation and the maximum likelihood will generally be found via a
(or covariance) function estimates, even for finite-scale pro- gradient or grid search in these parameters. With (29), the
cesses, using the sample correlation directly to estimate ␪ will likelihood function of (28) can be written as
not be pursued here. The variance function techniques have
not been found by the writer to be particularly advantageous
over, for example, the maximum likelihood approaches and
L(x兩␾) =
1
(2␲␴ ) 兩␳兩1/2
2 n/2
X
exp 再 ⫺
2␴X2
冎
(x ⫺ ␮)T␳⫺1(x ⫺ ␮)
(30)
are also prone to error due to their high bias in long-scale or Because the likelihood function is strictly nonnegative, maxi-
fractal cases. Here, the maximum likelihood estimator in the mizing L(x兩␾) is equivalent to maximizing its logarithm,
space domain will be discussed briefly. The maximum likeli- which, ignoring constants, is given by
hood estimator in the frequency domain will be considered
later. n 1 (x ⫺ ␮)T␳⫺1(x ⫺ ␮)
ᏸ(x兩␾) = ⫺ ln ␴2X ⫺ ln兩␳兩 ⫺ (31)
It will be assumed that the data are normally distributed, or 2 2 2␴X2
have been transformed from their raw state into something that
is at least approximately normally distributed. For example, The maximum of (31) can in principle be found by differen-
many soil properties are commonly modeled using the log- tiating with respect to each unknown parameter, ␮X , ␴2X , and
normal distribution, often primarily because this distribution ␪, in turn and setting the results to zero. This gives three equa-
is strictly nonnegative. To convert lognormally distributed data tions in three unknowns. The partial derivative of ᏸ with re-
to a normal distribution, it is sufficient merely to take the nat- spect to ␮X , when set equal to zero, leads to the following
ural logarithm of the data prior to further statistical analysis. estimator for the mean
It should be noted that the normal model is commonly used
1T␳⫺1x
for at least two reasons: (1) It is analytically tractable in many ␮
ˆX = (32)
ways; and (2) it is completely defined through knowledge of 1T␳⫺11
the first two moments, namely the mean and covariance struc- Because this estimator still involves the unknown correlation
ture. Because other distributions commonly require higher mo- matrix, it should be viewed as the value of ␮X that maximizes
ments and because higher moments are generally quite difficult the likelihood function for a given value of the correlation
to estimate accurately, particularly in the case of geotechnical parameter ␪. If the two vectors r and s are solutions of the
samples that are typically limited in size, the use of other dis- two systems of equations
tributions is often difficult to justify. The normal assumption
can be thought of as a minimum knowledge assumption that ␳r = x; ␳s = 1 (33a,b)

then the estimator for the mean can be written as more accurate decomposition and allows the direct computa-
tion of the log-determinant as part of the solution [see, e.g.,
1Tr Marple (1987)].
␮
ˆX = (34)
1Ts One finite-scale model that has a particularly simple maxi-
mum likelihood formulation is the jointly normal distribution
It should be noted that this estimator is generally not very
with the Markov correlation function given by (2). When ob-
different from the usual estimator obtained by simply aver-
servations are equispaced, the correlation matrix has a simple
aging the observations (Beran 1994). It should also be noted
closed form determinant and a tridiagonal inverse
that 1Tr = xTs, so that only s needs to be found to compute
␮ˆ X . However, r will be needed in (35), so it should be found 兩␳兩 = (1 ⫺ q2)n⫺1 (37)
冉冊
anyhow.
The partial derivative of ᏸ with respect to ␴2X , when set 1
equal to zero, leads to the following estimator for ␴2X ␳⫺1 =
1 ⫺ q2
1
␴
ˆ 2X = (x ⫺ ␮
ˆ X 1)Tr (35)
n 1 ⫺q 0 0 ⭈⭈⭈ 0 0
⫺q 1 ⫹ q2 ⫺q 0 ⭈⭈⭈ 0 0
which is also implicitly dependent on the correlation function 0 ⫺q 1 ⫹ q2 ⫺q ⭈⭈⭈ 0 0
parameter through ␮ˆ X and r. ⭈ 0 0 ⫺q 1 ⫹ q2 ⭈⭈⭈ 0 0
Thus, both the mean and variance estimators can be ex- ⭈⭈ ⭈⭈ ⭈⭈ ⭈⭈ ⭈⭈ ⭈⭈⭈ ⭈⭈⭈
pressed in terms of the unknown parameter ␪. Using these ⭈ ⭈ ⭈ ⭈ ⭈
0 0 0 0 ⭈⭈⭈ 1 ⫹ q2 ⫺q
results, the maximization problem simplifies to finding the 0 0 0 0 ⭈⭈⭈ ⫺q 1
maximum of (38)
n 1 where q = exp{⫺2⌬z/␪} for observations spaced ⌬z apart.

ᏸ(x兩␾) = ⫺ ln ␴
ˆ 2X ⫺ ln兩␳兩 (36) Using these results, the maximum likelihood estimation of q
2 2
reduces to finding the root of the cubic equation
where the last term in (31) became simply n/2 and was
dropped from (36) because it does not affect the location of f (q) = b0 ⫹ b1q ⫹ b2q 2 ⫹ b3q 3 = 0 (39)
the maximum.
In principle, (36) need now only be differentiated with re- on the interval q 僆 (0, 1), where
spect to ␪, and the result set to zero to yield the optimal es- b0 = nR1; b1 = ⫺(R 0 ⫹ nR⬘);
0 b2 = ⫺(n ⫺ 2)R1 (40a–c)
timate ␪ˆ and subsequently ␴ˆ 2X and ␮
ˆ X . Unfortunately, this in-
冘
n
volves differentiating the determinant of the correlation matrix,
and closed form solutions do not always exist. Solution of the b3 = (n ⫺ 1)R⬘;
0 R0 = (x i ⫺ ␮
ˆ X)2 (40d,e)
i =1
maximum likelihood estimators may therefore proceed by it-
eration as follows: R⬘0 = R 0 ⫺ (x1 ⫺ ␮
ˆ X)2 ⫺ (x n ⫺ ␮
ˆ X)2 (40f )
1. Guess at an initial value for ␪.
冘
n⫺1
2. Compute the corresponding correlation matrix elements R1 = (xi ⫺ ␮

ˆ X)(x i⫹1 ⫺ ␮
ˆ X) (40g)
␳ij = ␳(兩z i ⫺ z j 兩), which in the current equispaced 1D case i =1
is both symmetric and Toeplitz (i.e., elements along each For given q, the corresponding maximum likelihood estimates
diagonal are equal). of ␮X and ␴2X are
3. Solve (33) for vectors r and s.
4. Solve for the determinant of ␳ (because this is often van- Q n ⫺ q(Q n ⫹ Q⬘)
n ⫹ q Q⬘
2
␮
n
ishing, it is usually better to compute the log-determinant ˆX = (41a)
n ⫺ 2q(n ⫺ 1) ⫹ q (n ⫺ 2)
2
directly to avoid numerical underflow).

5. Compute the mean and variance estimates using (35) and R 0 ⫺ 2qR1 ⫹ q 2R⬘0
(34). ␴
ˆ 2X = (41b)
n(1 ⫺ q 2)
6. Compute the log-likelihood value ᏸ using (36).
7. Guess at a new value for ␪ and repeat Steps 2–7 until where
the global maximum value of ᏸ is found.
冘
n
Qn = x i; Q⬘n = Q n ⫺ x1 ⫺ x n (42a,b)
Guesses for ␪ can be arrived at simply by stepping discretely i =1
through a likely range (and the speed of modern computers
make this a reasonable approach), increasing the resolution in According to Anderson (1971), (39) will have one root be-
the region of located maxima. Alternatively, more sophisti- tween 0 and 1 (for positive R1, which is n times the lag 1
cated techniques may be employed that look also at the mag- covariance and should be positive under this model) and two
nitude of the likelihood in previous guesses. One advantage to roots outside the interval (⫺1, 1). The root of interest is the
the brute force approach of stepping along at predefined in- one lying between 0 and 1 and it can be efficiently found using
crements is that it is more likely to find the global maximum Newton-Raphson iterations with starting point q = R1 /R⬘, 0 as
in the event that multiple local maxima are present. With the long as that starting point lies within (0, 1) (if not, use starting
speed of modern computers, this approach has been found to point q = 0.5).
be acceptably fast for a low number of unknown parameters, Because the coefficients of the cubic depend on ␮ ˆ X , which
for instance, less than four, and where bounds on the param- in turn depends on q, the procedure actually involves a global
eters are approximately known. iteration outside the Newton-Raphson root finding iterations.
For large samples, the correlation matrix ␳ can become However, ␮ ˆ X changes only slightly with changing q, so global
nearly singular, so that numerical calculations become unsta- convergence is rapid if it is bothered with at all. Once the root
ble. In the 1D case, the Durbin-Levinson recursion, taking full q of (39) has been determined, the maximum likelihood esti-
advantage of the Toeplitz character of ␳, yields a faster and mate of the scale of fluctuation is determined from

2⌬z Alternatively, if both ␦ and H are estimated simultaneously via
␪ˆ = ⫺ (43) maximum likelihood in the space domain (see previous sub-
ln q
section), then one finds that the likelihood surface has many
In general, estimates of the variances of the maximum likeli- local maxima making it difficult to find the global maximum.
hood estimates derived above are also desirable. One of the Even when it has been found with reasonable confidence, it is
features of the maximum likelihood approach is that asymp- the writer’s experience that the global maximum tends to cor-
totic bounds on the covariance matrix C␪ˆ between the esti- respond to unreasonably large values of ␦ corresponding to
mators ␪ˆ T = {␮
ˆ X , ␴ˆ 2X , ␪]
ˆ can be found. This covariance matrix over-averaging and thus is far too smooth a process. Why this
is called the Cramer-Rao bound, and the bound has been is so is yet to be determined.
shown to hold asymptotically for both finite-scale and fractal A better approach to the fractal model is to employ the
processes (Dahlhaus 1989; Beran 1994), in both cases for both spectral representation, with G(␻) = Go /␻␥, and apply an upper
n and the domain size going to infinity. If we let ␪ = {␮X , frequency cutoff, in the event that 0 ⱕ ␥ˆ < 1, or a lower
␴2X , ␪} be the vector of unknown parameters and define the frequency cutoff in the event that ␥ˆ > 1. Both of these ap-
vector proaches render the process stationary and having finite vari-
ance. When ␥ = 1, the infinite variance contribution appears
⭸
ᏸ⬘ = ᏸ (44) at both ends of the spectrum and both an upper and lower
⬃ ⭸␪j cutoff are needed. The appropriate cutoff frequencies should
where ᏸ is the log-likelihood function defined by (31), then be selected on the basis of the following:
the matrix C␪ˆ is given by the inverse of the Fisher information
matrix • A minimum descriptive scale, in the case 0 ⱕ ␥ˆ < 1,
below which details of the process are of no interest. For
C ⫺1
ˆ
␪ = E[[ᏸ⬘][ᏸ⬘]T ] (45) example, if the random process is intended to be soil per-
⬃ ⬃
meability, then the minimum scale of interest might cor-
where the superscript T denotes the transpose and the expec- respond to the laboratory sample scale d at which per-
tation is overall possible values of X using its joint distribution meability tests are carried out. The upper frequency cutoff
[see (28)] with parameters ␪. ˆ The above expectation is gen- might then be selected such that this laboratory scale cor-
erally computed numerically because it is quite complex an- responds to, for instance, one wavelength, ␻u = 2␲/d.
alytically. For the Gauss-Markov model the vector ᏸ⬘ is given • For the case ␥ˆ > 1, the lower bound cutoff frequency must
⬃
by be selected on the basis of the dimension of the site under
consideration. Because the local mean will be almost cer-
1 T ⫺1 tainly estimated by collecting some observations at the
1 ␳ (X ⫺ ␮)
␴2X site, one can eliminate frequencies with wavelengths that
1 n are large compared with the site dimension. This issue is
ᏸ⬘ = (X ⫺ ␮)T␳⫺1(X ⫺ ␮) ⫺ (46) more delicate than that of an upper frequency cutoff dis-
⬃ 2␴4X 2␴2X
cussed above because there is no natural lower frequency
2⌬z(n ⫺ 1)q 2 1 bound corresponding to a certain finite scale (whereas
⫺ (X ⫺ ␮)TR(X ⫺ ␮)
␪2(1 ⫺ q 2) 2␴X2 there is an upper bound corresponding to a certain finite-
sampling resolution). If the frequency bound is made to
where R is the partial derivative of ␳⫺1 with respect to the be too high, then the resulting process may be missing
scale of fluctuation ␪. the apparent long-scale trends seen in the original data
set. As a tentative recommendation, the writer suggests
Fractal Model using a lower cutoff frequency equal to the least nonzero
The use of a fractal model is considerably more delicate Fourier frequency ␻o = 2␲/D, where D is the site dimen-
than that of the finite-scale model. This is because the fractal sion in the direction of the model.
model, with G(␻) ⬀ ␻⫺␥, has infinite variance. When 0 ⱕ ␥
< 1, the infinite variance contribution comes from the high The parameter ␥ is perhaps best estimated directly in the fre-
frequencies so that the process is stationary but physically un- quency domain via maximum likelihood. There are at least
realizable. Alternatively, when ␥ > 1, the infinite variance two possible approaches, but here only the wavelet and per-
comes from the low frequencies that yield a nonstationary iodogram maximum likelihood estimators will be discussed.
(fractional Brownian motion) process. In the latter case, the In the case of the wavelet basis representation, the approach
infinite variance basically arises from the gradual meandering is as follows [from Wornell (1996)]. For an observed Gaussian
of the process over increasingly large distances as one looks process, x(ti), i = 1, 2, . . . , n, the wavelet coefficients x mj , m
over increasingly large scales. Though nonstationarity is an = 1, 2, . . . , M, j = 1, 2, . . . , 2m⫺1, where n = 2M, can be
interesting mathematical concept, it is not particularly useful obtained via (23) (preferably using an efficient wavelet decom-
nor practical in soil characterization. It does, however, em- position algorithm). Because the input is assumed to be Gauss-
phasize the dependence of the overall soil variation on the size ian, the wavelet coefficients will also be Gaussian. Wornell
of the region considered. This explicit emphasis on domain has shown that they are mean zero and largely independent if
size is an important feature of the fractal model. X(z) comes from a fractal process so that the likelihood of
To render the fractal model physically useful for the case obtaining the set of coefficients x mj is given by
when 0 ⱕ ␥ < 1, Mandelbrot (1968) introduced a distance ␦
写写再冎
M 2m⫺1
1 (x jm)2
over which the fractal process is averaged to smooth out the L(x; ␥) = exp ⫺ (47)
high frequencies and eliminate the high frequency (infinite) m =1 j =1 兹2␲␴2m 2␴2m
variance contribution. The resulting correlation function is where ␴2m = ␴22⫺␥m = model variance for some unknown in-
given by (4). Unfortunately, the rather arbitrary nature of ␦ tensity ␴2. The log-likelihood function is thus
renders Mandelbrot’s model of questionable practical value,
冘冘冉冊冘
2 m⫺1
particularly from an estimation point of view. If ␦ is treated
M M
1 1 1
as known, then one finds that the parameter H can be estimated ᏸ=⫺ (2m⫺1
)ln(␴ ) ⫺
2
(x jm)2 (48)
␴2m
m
2 2
to be any value desired simply by manipulating the size of ␦.
m =1 m =1 j =1

(discarding constant terms) that must be maximized with re- However, it should be pointed out that these variances are only
spect to ␴2 and ␥. [See Wornell (1996, Section 4.3) for details achieved asymptotically as the sample length increases to in-
on a relatively efficient algorithm to maximize ᏸ.] finity. In practice, this is not possible, so that estimates of ␥
An alternative approach to estimating ␥ is via the periodo- and Go will show much greater variability from sample to sam-
gram. In the writer’s opinion, the periodogram approach is ple than suggested by the above bounds. What this means is
somewhat superior to that of the wavelet because the two that the uncertainty in the estimates of ␥ and Go are best ob-
methods have virtually the same ability to discern between tained via simulation or by considering a large number of sam-
finite-scale and fractal processes, and the periodogram has a pling domains.
vast array of available theoretical results dealing with its use
and interpretation. Although the wavelet basis may warrant CONCLUSIONS
further detailed study, its use in geotechnical analysis seems
unnecessarily complicated at this time. When attempting to identify which of a suite of stochastic
Because the periodogram has been shown to consist of ap- models is best suited to represent a soil property, a variety of
proximately independent exponentially distributed estimates at data transforms are available. Most commonly, these are the
sample correlation or covariance function, the semivariogram,
the Fourier frequencies for a wide variety of random processes,

including fractal, it leads easily to a maximum likelihood es- the sample variance function, the sample wavelet coefficient
timator for ␥. In terms of the one-sided spectral density func- variance function, and the periodogram. In trying to determine
tion, G(␻), the fractal process is defined by whether the soil property best follows a finite-scale model or
a fractal 1/f-type noise, the periodogram, wavelet variance, and
Go semivariogram plots were found to be the most discriminating.
G(␻) = , 0ⱕ␻<⬁ (49) In this sense, the periodogram is perhaps the most preferable
␻␥
due to the fact it has been extensively studied, particularly in
The likelihood of seeing the periodogram estimates Ĝj = Ĝ(␻j), the context of time-series analysis, and because it has a nice
j = 1, 2, . . . , k, where k = (n ⫺ 1)/2, and ␻j = 2␲j/D is just physical interpretation relating to the distribution of power to
写冉冊再冎
k various component frequencies.
ˆ ␪) = ␻j␥ ␻j␥ ˆ It is recognized that these data transforms have been eval-
L(G; exp ⫺ Gj (50)
j =1 Go Go uated by averaging over an ensemble of realizations. In many
real situations only one or a few data sets will be available.
and its logarithm is
Judging from the minimum/maximum curves shown on the
冘冘
k k
1 plots of Figs. 4–8, it may be difficult to assess the true nature
ᏸ = ⫺k ln Go ⫹ ␥ ln ␻j ⫺ ␻␥j G
ˆj (51) of the process if little or no averaging can be performed. This
j =1 Go j =1
is, unfortunately, a fundamental issue in all statistical analy-
which is maximized with respect to Go and ␥. The spectral ses—confidence in the results decreases as the number of in-
intensity parameter Go is not necessarily of primary interest dependent observations decreases. All that can really be said
because it may have to be adjusted anyhow to ensure that the about the tools discussed above (i.e., the periodogram) is that
area under G(␻) is equal to ␴ˆ 2x after the cutoff frequency dis- on average it shows a straight line with negative slope for
cussed above is employed (or, alternatively, the cutoff fre- fractal processes and flattens out at the origin for finite-scale
quency adjusted for fixed Go). processes. Because the periodogram ordinates are exponen-
Differentiating (51) with respect to Go and setting the result tially distributed, and the ability to distinguish between process
to zero yields types depends on just the first few (low frequency) ordinates,
冘
k the use of only a single sample set may not lead to a firm
ˆo = 1
G ˆ j ␻␥j
G (52) conclusion. For this, special large-scale investigations, yield-
k j =1 ing a large number of soil samples, may be necessary.
The sample correlation or covariance functions are accept-
In turn, differentiating (51) with respect to ␥ and setting the able measures of second moment structure when the scale of
result to zero leads to the following root finding problem in ␥ fluctuation of the process is small relative to the sampling
冘
k domain, implying that many of the observations in the sample
冘
Ĝj ␻j␥ ln ␻j k are effectively independent. However, these sample functions
become severely biased when the sample shows strong depen-
冘
⫺ ln ␻j = 0
j =1
k (53)
j =1 dence, preventing them from being useful to discern between
Ĝj ␻␥j finite-scale and fractal type processes. Because the level of
j =1
dependence is generally not known a priori, inferences based
For almost all common processes, 0 ⱕ ␥ ⱕ 3, so that (53) on the sample covariance and correlation functions are not
can be solved efficiently via bisection. generally reliable. Likewise, the sample variance function is
The Fisher information matrix, and thus the covariance ma- heavily biased in the presence of strong dependence, rendering
trix between the unknown parameters Go and ␥, is especially its use questionable unless the soil property is known to be
simple to compute for the periodogram maximum likelihood finite scale with ␪ << D.
estimator because, asymptotically, the estimator Ĝo is indepen- Once a class of stochastic models has been determined using
dent of ␥ˆ . The estimated variances of each estimated parameter the periodogram, the periodogram can again be used to esti-
can be found from mate the parameters of an assumed spectral density function
via maximum likelihood. This method can be applied to either
Ĝ 2o finite-scale or fractal processes, requiring only an assumption
冉冘冊冘
␴2Ĝo ⯝ k 2 k (54)
on the functional form of the spectral density function. The
␻1⫺␥
i
ˆ
⫺k ⫺ ␻i2(1⫺␥ˆ ) maximum likelihood approach is preferred over other esti-
i =1 i =1
mation techniques such as regression, because of the many
1 available results dealing with the distribution of maximum
冉冘冊
␴2␥ˆ ⯝ k 2 (55) likelihood estimates [see, e.g., Beran (1994) and DeGroot and
ln ␻i ⫺ k ⫹k Baecher (1993)].
i =1 If the resulting class of models is deemed to be fractal with

0 ⱕ ␥ˆ < 1, then the Mandelbrot model of (4) can also be fitted moires du Bureau de Recherches Geologiques et Minieres, 14, Editions
using maximum likelihood in the space domain (preferably Technip, Paris (in French).
Priestley, M. B. (1981). Spectral analysis and time series. Vol. 1, Uni-
with ␦ taken to be some assumed small averaging length, be- variate Series, Academic, New York.
low which details of the random soil property process are of Strang, G., and Nguyen, T. (1996). Wavelets and filter banks. Wellesley-
no interest). For ␥ > 1, the fitted spectral density function G(␻) Cambridge, New York.
= Go /␻␥ is still of limited use because it corresponds to infinite VanMarcke, E. H. (1984). Random fields: Analysis and synthesis, MIT
variance. It must be truncated at some appropriate upper or Press, Cambridge, Mass.
lower bound (depending on whether ␥ is below or above 1.0) Wornell, G. W. (1996). Signal processing with fractals: A wavelet based
approach. Prentice-Hall, Englewood Cliffs, N.J.
to render the model physically useful. The choice of truncation Yajima, Y. (1989). ‘‘A central limit theorem of Fourier transforms of
point needs additional investigation although some rough strongly dependent stationary processes.’’ J. Time Ser. Anal., 10, 375–
guidelines were suggested above. 383.
In the finite-scale case, usually only a single parameter, the
scale of fluctuation, needs to be estimated to provide a com- APPENDIX II. NOTATION
pletely usable stationary stochastic model. However, indica-
tions in a companion paper (Fenton 1999) are that soil prop- The following symbols are used in this paper:
erties are fractal in nature, exhibiting significant correlations
over very large distances. This proposition is reasonable if one a = constant;
thinks about the formation processes leading to soil deposits— bi = coefficients in Gauss-Markov maximum likelihood
the transport of soil particles by water, ice, or air, often takes problem;
place over hundreds if not thousands of kilometers. There may, C = covariance matrix;
however, still be a place for finite-scale models in soil models. Cij = element of covariance matrix;
The major strength of the fractal model lies in its emphasis on Ĉ(␶j) = estimated covariance function at discrete lag ␶j ;
the relationship between the soil variability and the size of the D = sampling domain length or depth;
domain being considered. However, once a site has been es- d = representative length;
tablished, there may be little difference between a properly f (q) = cubic function in Gauss-Markov maximum likeli-
selected finite-scale model and the real fractal model over the hood problem;
finite domain. The relationship between such an ‘‘effective’’ Go = spectral intensity parameter;
finite-scale model and the true but finite-domain fractal model Ĝo = estimated spectral intensity parameter;
G(␻) = one-sided spectral density function;
can be readily established via simulation and is the subject of
Ĝ(␻) = sample one-sided spectral density function;
ongoing research.
H = Hurst or self-similarity coefficient;
i = index, also 兹⫺1 when appearing in exponential;
ACKNOWLEDGMENTS L( ⭈ ) = likelihood function;
The writer would like to thank the Norwegian Geotechnical Institute ᏸ( ⭈ ) = log-likelihood function;
(NGI) and, in particular, Suzanne Lacasse, Farrokh Nadim, and Odd Gre- m = scale (dilation) index for wavelet basis;
gerson for their comments on the early research as well as for their hos- m(z) = mean trend varying with spatial position;
pitality during the summer of 1997. The writer extends his gratitude to n = number of observations in sample of X(z);
the Research Council of Norway for their generous support and for mak- Q n, Q⬘n = coefficients in Gauss-Markov maximum likelihood
ing the project possible. Over the 10 months following the NGI visit, problem;
further analysis work was carried out at Princeton University, and thanks
are due to Erik VanMarcke and the Department of Civil Engineering and
q = root in Gauss-Markov maximum likelihood prob-
Operations Research at Princeton for the use of their facilities and for lem;
their support in many ways. Finally, the writer would like to thank the R 0, R⬘,
0 R1 = coefficients in Gauss-Markov maximum likelihood
National Sciences and Engineering Research Council of Canada for their problem;
support under operating Grant OPG0105445. r = vector used in space domain maximum likelihood
estimator;
APPENDIX I. REFERENCES s = vector used in space domain maximum likelihood
estimator;
Anderson, T. W. (1971). ‘‘The statistical analysis of time series.’’ Prob- s(z) = standard deviation trend varying with spatial posi-
ability and mathematical statistics, Wiley, New York.
Beran, J. (1994). ‘‘Statistics for long-memory processes.’’ Monographs
tion;
on statistics and applied probability, Chapman & Hall, New York. T = averaging dimension;
Brockwell, P. J., and Davis, R. A. (1987). Time series: Theory and meth- V(␶j) = variogram;
ods. Springer, New York. V̂(␶j) = estimated variogram;
Cressie, N. A. C. (1993). Statistics for spatial data, 2nd Ed., Wiley, New X̄ = average of X i across sample of length n;
York. X i = random value of X(z) at zi;
Dahlhaus, R. (1989). ‘‘Efficient parameter estimation for self-similar pro- Xi, j = average of X j, Xj ⫹1, . . . , X j ⫹i;
cesses.’’ Ann. Statist., 17, 1749–1766. X mj = random wavelet coefficient;
DeGroot, D. J., and Baecher, G. B. (1993). ‘‘Estimating autocovariance XT = random local average of X(z) over length T;
of in-situ soil properties.’’ J. Geotech. Engrg., ASCE, 119(1), 147–
166.
X(z) = random 1D process;
Fenton, G. A. (1994). ‘‘Error evaluation of three random field genera- x = vector of observations in sample;
tors,’’ J. Engrg. Mech., ASCE, 120(12), 2478–2497. x i = observed value of X(z) at z i;
Fenton, G. A. (1999). ‘‘Random field modeling of CPT data.’’ J. Geotech. x mj = computed wavelet coefficient from transform;
and Geoenvir. Engrg., ASCE, 125(6), 486–498. x⬘(z) = mean and variance standardized random process;
Fenton, G. A., and VanMarcke, E. H. (1990). ‘‘Simulation of random Y(z) = random 1D process;
fields via local average subdivision,’’ J. Engrg. Mech., ASCE, 116(8), z = coordinate, commonly with depth;
1733–1749. z i = discrete points along 1D random process;
Journel, A. G., and Huijbregts, Ch. J. (1978). Mining geostatistics. Aca- ␥ = spectral exponent in fractal model;
demic, New York.
Marple, S. L., Jr. (1987). Digital spectral analysis. Prentice-Hall, Engle-
␥ˆ = estimated spectral exponent;
wood Cliffs, N.J. ␥(D) = variance function;
Mandelbrot, B. B., and Ness, J. W. (1968). ‘‘Fractional Brownian mo- ␥ˆ (D) = sample variance function;
tions, fractional noises and applications.’’ SIAM Rev., 10(4), 422–437. ⌬z = incremental distance between observations;
Matheron, G. (1962). ‘‘Traite de Geostatistique Appliquee, Tome I,’’ Me- ␦ = averaging dimension in Mandelbrot’s fractal model;

ε(z) = residual random process; ␳ˆ (␶j) = estimated correlation function at discrete lag ␶j ;
␪ = scale of fluctuation; ␴2 = wavelet coefficient intensity parameter;
␪ˆ = estimated scale of fluctuation; ␴2m = wavelet coefficient variance at scale m;
␪ = vector of unknown parameters to be estimated; ␴2X = variance of X(z);
␪ˆ = vector of estimated parameters; ␴
ˆ 2X = estimated variance of X(z);
␮ = vector of mean values associated with set of spatial ␶ = separation distance;
points; ␶j = discrete separation distance, =j⌬z;
␮j = element of ␮; ␹(␻j) = complex Fourier coefficient;
␮X = mean of X(z); ␺mj (z) = daughter wavelet;
␮ˆX = estimated mean of X(z); ␺(z) = mother wavelet;
␳ = correlation coefficient matrix; ␻ = frequency or wavenumber;
␳i, j = element of correlation coefficient matrix; ␻o = lower frequency cutoff;
␳(␶) = correlation coefficient between two points separated ␻u = upper frequency cutoff; and
by ␶; 1 = vector with all elements equal to 1.

Estimation For Stochastic Soil Model PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Estimation For Stochastic Soil Model PDF

Uploaded by

Copyright:

Available Formats

ESTIMATION FOR STOCHASTIC SOIL MODELS

By Gordon A. Fenton,1 Associate Member, ASCE

INTRODUCTION designs, designs involving a future state, or designs where a

J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

FIG. 1. Soil Sample on Finite-Sampling Domain

J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

chastic model having limited spatial correlation, or by a

By and large, probably the most common spatial correlation

where ␳(␶) = correlation coefficient between two points sep-

J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

in Fig. 2 zooms in by a factor of a = 8, so that each lower

J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

FIG. 3. Covariance Estimator on Strongly Dependent Finite Sample

Matheron (1962)] gives essentially the same information as 1

J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

␥(D) = 2 (D ⫺ ␶)␳(␶) d␶ ⇔ ␳(␶) = [␶2 ␥(␶)] (17)

J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

480 / JOURNAL OF GEOTECHNICAL AND GEOENVIRONMENTAL ENGINEERING / JUNE 1999

J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

n 1 where q = exp{⫺2⌬z/␪} for observations spaced ⌬z apart.

2. Compute the corresponding correlation matrix elements R1 = (xi ⫺ ␮

directly to avoid numerical underflow).

J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

482 / JOURNAL OF GEOTECHNICAL AND GEOENVIRONMENTAL ENGINEERING / JUNE 1999

J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

the Fourier frequencies for a wide variety of random processes,

J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

484 / JOURNAL OF GEOTECHNICAL AND GEOENVIRONMENTAL ENGINEERING / JUNE 1999

J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

JOURNAL OF GEOTECHNICAL AND GEOENVIRONMENTAL ENGINEERING / JUNE 1999 / 485

J. Geotech. Geoenviron. Eng., 1999, 125(6): 470-485

You might also like