ARTICLE
Sample size determination in geotechnical site investigation
considering spatial variation and correlation
Yu Wang, Zheng Guan, and Tengyuan Zhao
Abstract: Site investigation is a fundamental element in geotechnical engineering practice, but only a small portion of geomaterials is sampled and tested during site investigation. This leads to a question of sample size determination: how many samples are needed to achieve a target level of accuracy for the results inferred from the samples? Sample size determination is a well-known topic in statistics and has many applications in a wide variety of areas. However, conventional statistical methods, which mainly deal with independent data, only have limited applications in geotechnical site investigation because geotechnical data are not independent, but spatially varying and correlated. Existing design codes around the world (e.g., Eurocode 7) only provide conceptual principles on sample size determination. No scientific or quantitative method is available for sample size determination in site investigation considering spatial variation and correlation of geotechnical properties. This study performs an extensive parametric study and develops a statistical chart for sample size determination with consideration of spatial variation and correlation using Bayesian compressive sensing or sampling. Real cone penetration test data and real laboratory test data are used to illustrate application of the proposed statistical chart, and the method is shown to perform well.
Key words: geotechnical site investigation, sample size, Bayesian method, compressive sensing, random field.
Received 13 July 2018. Accepted 2 October 2018.

Y. Wang,* Z. Guan, and T. Zhao. Department of Architecture and Civil Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong.
Corresponding author: Yu Wang (email: yuwang@cityu.edu.hk).
*Y. Wang currently serves as an Editorial Board Member; peer review and editorial decisions regarding this manuscript were handled by C. Lake.
Copyright remains with the author(s) or their institution(s). Permission for reuse (free in most cases) can be obtained from RightsLink.
Can. Geotech. J. 56: 992–1002 (2019) dx.doi.org/10.1139/cgj-2018-0474 Published at www.nrcresearchpress.com/cgj on 3 October 2018.

Introduction
Site investigation is a fundamental element in geotechnical engineering practice, and interpretation of site investigation data leads to the expected ground condition (e.g., spatial distribution of different soil types) and design profiles of geotechnical properties that are subsequently used in geotechnical analyses and designs. Without proper site investigation or data interpretation to produce reliable input parameters, the subsequent geotechnical analyses and designs are much less meaningful, and the geotechnical construction projects might be subjected to significant risk. Clayton (2001) conducted a survey in the UK and found that about 42% of the problems that occurred in geotechnical construction projects were caused by uncertainty associated with site investigation. During geotechnical investigation of a site, only a small portion of geomaterials is examined, and site investigation data are often sparsely measured (e.g., Mayne et al. 2002; Phoon 2017; Wang et al. 2017; Orr 2017; Zhao et al. 2018). This results in estimation error in the interpretation results and leads to a question of sample size determination: how many samples are needed to achieve a target level of accuracy for the results inferred from the samples?

Sample size determination is a well-known topic in statistics (e.g., Krejcie and Morgan 1970) and has many applications in a wide variety of areas (e.g., quality control, surveys, and polls). However, conventional statistical methods, which mainly deal with independent data, only have limited application in geotechnical site investigation, because geotechnical data are not independent, but spatially correlated. Note that soils are natural geomaterials, whose properties are affected by many spatially varying but correlated factors during the geological process, such
as the properties of their parent materials, weathering and erosion processes, transportation agents, and sedimentation conditions. Geotechnical properties are therefore spatially varying and correlated (e.g., Lacasse and Nadim 1996; Phoon and Kulhawy 1999; Wang et al. 2016). To deal with spatially varying and correlated data, a suite of specialized statistical methods called geostatistics has been developed, originally for ore estimation. Geostatistics generally requires a large number of data points measured at preferably regular intervals for proper construction of a semi-variogram (e.g., Wang et al. 2017). Geotechnical data are, however, often sparsely measured during site investigation, and the number of measured data points may not be sufficient for proper application of geostatistics (e.g., Baecher and Christian 2003).

Currently there is no scientific or quantitative method to determine sample size in geotechnical site investigation. Existing design codes around the world only provide conceptual principles on sample size determination. For example, clause 2.4.2.4 in Eurocode 7 (CEN 2007) states that “The necessary number of specimens to be tested shall be established depending on the homogeneity of the ground, the quality and amount of comparable experience with the ground and the geotechnical category of the problem.” No detailed or quantitative method has been provided on how to determine the sample size in engineering practice. A straightforward way to bypass this problem is to increase the sample size, or even to test the geomaterials at every location in a site. However, this substantially increases investigation costs and the manpower and time required. Even if additional cost and manpower are available, access may not be possible for a long period of site investigation, particularly in urban areas where the ground surface is already occupied by buildings and ongoing activities.

Some attempts to tackle the sample size determination problem have been reported in the geotechnical literature. For example, statistical methods have been developed for determining the required sample size for rock strength tests (Yamaguchi 1970; Gill et al. 2005; Cui et al. 2017) or construction of compacted soil liners (Benson et al. 1994), but these studies still adopt the assumption of independent samples (i.e., no consideration of spatial correlation). More recently, Fenton et al. (2015) and Liza et al. (2017) developed statistical methods to determine the sample size for quality control of cement-based solidification–stabilization and to check whether or not the sample average of the hydraulic conductivity satisfies the regulatory hydraulic conductivity. Their methods rely on random finite element simulation of flow and do not aim at design profiles of geotechnical properties (i.e., spatial variation of geotechnical properties), which is the objective of site investigation. In addition, the effect of sampling location has also been explored (e.g., Goldsworthy et al. 2007).

This paper aims at developing a rational method and statistical chart for sample size determination in geotechnical site investigation, with consideration of spatial variation and correlation in soil properties. An extensive parametric study is performed to develop a relationship between sample size and interpretation accuracy with consideration of spatial variation and correlation. After this introduction, development of the relationship is described, followed by development of the statistical chart for sample size determination. Examples of using the proposed statistical chart are also provided.

Development of relationship between sample size and interpretation accuracy considering spatial variation and correlation
To determine the sample size required to achieve a target level of accuracy for the inferred geotechnical profiles (e.g., spatial variation of a soil property along depth), a relationship between the desired level of accuracy and sample size should be established first. Figure 1 shows a conceptual framework for developing such a relationship in geotechnical site investigation. As shown in Fig. 1, the framework mainly involves three stages: (1) the relationship between the desired level of accuracy and sample size for a given soil property profile, (2) the relationship for a given site, and finally (3) the relationship for a wide variety of typical sites commonly encountered in geotechnical site investigation. Note that Fig. 1 only deals with sampling along depth, and this study focuses on one-dimensional cases.

Fig. 1. Framework for establishing relationship between sample size and accuracy in geotechnical site investigation. [Colour online.]
Stage 1: for a given soil property profile
  Step 1: Suppose that there is a soil property profile, all data points of which are known.
  Step 2: Obtain a number, M (e.g., M = 5), of measured data points from this given soil property profile.
  Step 3: Interpolate the M measured data points to infer a complete interpreted soil property profile using Bayesian compressive sampling (BCS).
  Step 4: Compare the interpreted soil property profile with the original one, and use relative error to evaluate the accuracy of results obtained from M measured data points.
  Step 5: Repeat Steps 2–4 for many different M values.
  Step 6: Establish a relationship between sample size M and relative error for a given soil property profile.
Stage 2: for a given site
  Step 7: Suppose that there is a given site where many (e.g., 50) sets of complete soil property profiles are available. Repeat Steps 1 to 6 for each set of the soil property profiles in this site.
  Step 8: Calculate statistics (e.g., mean) of relative error at different sample sizes, and establish a relationship between sample size and relative error statistic for the given site.
Stage 3: for a wide variety of sites
  Step 9: Suppose that there are a wide variety of typical sites commonly encountered in geotechnical site investigation. Repeat Steps 7 and 8 for each of the sites.
  Step 10: Summarize the results for all the sites above and develop a relationship between normalized sample size and relative error statistic.

In stage 1, it is instructive to first consider the required sample size for one soil property profile (i.e., steps 1–6 in Fig. 1). Suppose that there is a soil property profile that is completely known (i.e., all data points in the profile are known). To mimic data measurement and interpretation in geotechnical site investigation, a number, M (e.g., M = 5), of data points is obtained from the soil property profile, and these points are used as measurement data points to interpret a complete soil property profile using interpolation methods, such as Bayesian compressive sampling (BCS) (Wang and Zhao 2017). Because the original soil property profile is completely known, the soil property profile interpreted from the M measurement data points may be compared with the original soil property profile to evaluate the accuracy of the interpreted profile (i.e., steps 2–4 in Fig. 1). Let X = [X(D1), X(D2), …, X(DN)]^T be a vector representing the original profile of soil property X along depths D with a total number N of data points, and let a vector X̂ =
[X̂(D1), X̂(D2), …, X̂(DN)]^T denote the soil property profile interpreted from the M measurement data points. The accuracy of the inferred geotechnical profile X̂ can be quantified by a relative error, RE, between X and X̂, which is expressed as

$$\mathrm{RE} = \frac{\lVert X - \hat{X} \rVert}{\lVert X \rVert} = \frac{\sqrt{\sum_{i=1}^{N} \left[X(D_i) - \hat{X}(D_i)\right]^2}}{\sqrt{\sum_{i=1}^{N} X(D_i)^2}} \times 100\% \tag{1}$$

where the symbol “‖ · ‖” denotes the vector norm operator, which is defined as the square root of the sum of the squared values of all elements, as shown on the right-hand side of eq. (1). Note that the definition of RE in eq. (1) is similar to the normalized root-mean-square error in statistics if the original soil property profile X is treated as the mean. To investigate the influence of different sample sizes (i.e., different M values) on the accuracy of the inferred geotechnical profile, steps 2–4 described above are repeated for many different M values (e.g., M = 10), and the corresponding REs are calculated (i.e., step 5 in Fig. 1). When the sample size M is small, the soil property profile interpreted from such sparse measurement data might not be accurate, and the RE is relatively large. As M increases, the accuracy of the inferred geotechnical profile improves, and the RE gradually decreases. When all data points in the original soil property profile are treated as measurement data points (i.e., M = N), the inferred geotechnical profile converges to the original soil property profile, and the RE approaches zero. The relationship between RE and M for a given soil property profile can be established after steps 2–4 are repeated for many different M values (i.e., step 6). This relationship quantifies the variation of the estimation error as sample size changes for a given soil property profile.

In stage 2 (i.e., steps 7–8 in Fig. 1), the study is extended from a given soil property profile (i.e., stage 1) to a given site. Suppose that there is a given site where many (e.g., 50 or 100) sets of complete soil property profiles are available. The procedure developed in stage 1 (i.e., steps 1–6) may be applied repeatedly to each set of the complete soil property profiles in this site, resulting in many sets of relationships between RE and M. Statistical analysis can be performed to obtain the mean of RE, μRE, at different M values for all the relationships obtained. This establishes a relationship between the interpretation accuracy (e.g., using μRE) and sample sizes for a given site.

In stage 3 (i.e., steps 9–10 in Fig. 1), the study is further extended from a given site (i.e., stage 2) to a wide variety of typical sites commonly encountered in geotechnical site investigation. The procedure developed in stage 2 (i.e., steps 7–8) may be applied repeatedly to each typical site, resulting in many sets of relationships between μRE and M. Because well-investigated and well-documented sites are rare in reality, random field simulation (e.g., Vanmarcke 1977) is adopted in this study to generate a large number of typical sites as a parametric study in stage 3. Three statistical parameters are commonly needed in a random field simulation of spatial variation of a soil property: (i) the mean μ of the soil property, (ii) the variance or coefficient of variation, COV, and (iii) the correlation length λ, which quantitatively reflects the spatial correlation of the soil property at different depths. Typical ranges of μ, COV, and λ for a variety of soil properties have been reported in the literature (e.g., Phoon and Kulhawy 1999; Cao et al. 2016), and these typical ranges are used in the random field simulation in the parametric study to generate a large number of typical sites that mimic spatially varying and correlated soil properties in real site conditions. The procedure developed in stage 2 (i.e., steps 7–8) is applied repeatedly to each typical site in the parametric study, resulting in many sets of relationships between μRE and M for different site conditions. The large number of μRE and M relationships obtained in the parametric study will be summarized in a dimensionless plot, and a relationship between the normalized μRE and normalized M will be developed from the dimensionless plot. The normalized μRE and normalized M relationship provides a link between the accuracy of the interpretation results and sample size, and this relationship will be used subsequently to develop a statistical chart for sample size determination with consideration of spatial variation and correlation of soil properties.

In the proposed framework, a key element is BCS, which is used to interpret a complete soil property profile from M measurement data points (Wang and Zhao 2017). A brief review of the BCS method is provided in the next section.

Brief review of Bayesian compressive sampling (BCS) method
The BCS method is a probabilistic version of compressive sensing (CS), which is a novel sampling method that can reconstruct a signal from sparse measurements on that signal (e.g., Candès et al. 2006; Donoho 2006; Tropp and Gilbert 2007; Ji et al. 2008, 2009; Wang and Zhao 2016; Zhao and Wang 2018a). A signal may be defined loosely as a quantity that exhibits variation with time or space, such as a variation of soil property along depth. Therefore, spatial variation of a soil property with depth can be interpreted from sparse and limited measurement data using the BCS method.

The BCS method is based on the fact that natural signals have clear trends and are compressible. The term “compressible” means that a signal can be concisely represented by a weighted summation of a limited number of basis functions, such as wavelet functions (e.g., Candès et al. 2006; Donoho 2006; Tropp and Gilbert 2007).

Mathematically, let f = [f1, f2, …, fN]^T be a soil property profile with a length of N, which denotes spatial variation of a soil property with depth. The measurement data drawn from f are defined as y = [y1, y2, …, yM]^T, which is a column vector with a length of M, where M < N. The soil property profile interpreted from the BCS method is denoted as f̂ = [f̂1, f̂2, …, f̂N]^T, which can be expressed as

$$\hat{f} = B\,\omega_s \tag{2}$$

where B is an orthonormal matrix with a dimension of N × N, each column of which represents a basis function; ωs is a weight column vector with a length of N, corresponding to the N basis functions in B. Note that matrix B is independent of the soil property profile f, and that B can be constructed using the discrete wavelet transform (DWT) (e.g., Daubechies 1992).

Wang and Zhao (2017) developed a Bayesian framework to statistically reconstruct the soil property profile f̂ from sparsely measured data. Using the Bayesian framework, the posterior distribution of ωs is derived as a multivariate Student’s t distribution with 2cn degrees of freedom, where cn = M/2 + c, with c being a very small nonnegative number. The mean and covariance matrix of ωs are expressed as (Wang and Zhao 2017)

$$\mu_{\omega_s} = H A^{T} y \tag{3}$$

$$\mathrm{COV}_{\omega_s} = \frac{d_n H}{c_n - 1} \tag{4}$$

where H = (A^T A + D)^−1, A = ΨB (where Ψ is a measurement matrix that represents the locations of components of y in f), and D is a diagonal matrix with diagonal components αi (i = 1, 2, …, N) that can be obtained by maximizing the likelihood of y (i.e., p(y)); cn = M/2 + c and dn = d + (y^T y − μωs^T H^−1 μωs)/2, where c and d are very small nonnegative numbers (e.g., c = d = 10^−4). As f̂ = Bωs, the mean and covariance of f̂ are expressed as
$$\mu_{\hat{f}} = B\,\mu_{\omega_s} \tag{5}$$

$$\mathrm{COV}_{\hat{f}} = B\,\mathrm{COV}_{\omega_s}\,B^{T} \tag{6}$$

The BCS method is able not only to provide an expected (or mean) soil property profile interpreted from sparse measurement data, but also to quantify the estimation error of the inferred geotechnical profile (e.g., Zhao et al. 2018). Wang and Zhao (2017) showed that, as the number M of measured data points increases, the BCS mean soil property profile gradually converges to the original soil property profile, and the statistical uncertainty gradually reduces and approaches a negligible value when all data points in the original soil property profile are measured. Note that measurement error has been explicitly modeled in BCS as a Gaussian random variable with a zero mean and unknown variance, which is subsequently integrated out through marginalization in the Bayesian framework. Because BCS has the unique capability of interpolating sparsely measured data and quantifying the evolution of interpolation error as the number of sparse measurements increases, BCS is used in this study to develop the relationship between interpretation accuracy and sample size. In the next sections, stages 1 and 2 of the proposed framework are illustrated using BCS and real cone penetration test (CPT) data.

Illustration of stage 1 using real CPT data
To illustrate stages 1 and 2 of the proposed framework, 50 sets of real CPT tip resistance qc data are taken from Jaksa et al. (1999) and used in this study. Because CPT data are almost continuous and have a measurement interval as low as 0.005 m, the high-resolution qc profile may be compared with the qc profile reconstructed from the BCS method with a limited number M of measured qc data points as input, as described in stages 1 and 2 of the proposed framework. A large number of CPTs were carried out in a relatively homogeneous, stiff, and overconsolidated clay layer in Adelaide, Australia (Jaksa et al. 1999). The clay is known as Keswick clay, and details of the 50 sets of CPT data are summarized in Table 1. The 50 sets of CPT tip resistance qc data used in this section were downloaded from the ISSMGE TC304 database website (http://140.112.12.21/issmge/Database_2011.htm). According to Jaksa and Kaggwa (1994), the Keswick clay extends from a depth of about 1.5 m to about 5 m below the ground surface at this site. To guarantee that all CPT data used in this study come from the Keswick clay, CPT data ranging from 2.01 to 4.56 m depth were used in this section. The CPT qc data were recorded with a sampling interval of 0.005 m, and the total number of data points for each set of CPT data is N = 512. In addition, the 50 sets of CPTs were performed with a small horizontal interval of 0.5 m, and each of the 50 sets of CPT data may be considered as a random sample of a statistically homogeneous one-dimensional random field that represents spatial variations of qc with depth in the homogeneous Keswick clay.

Table 1. Summary of real CPT data used in illustration example.
Soil type | CPT No. | Min. depth (m) | Max. depth (m) | Thickness (m) | No. of data points
Keswick clay | CD1–CD50 | 2.01 | 4.56 | 2.55 | 512

To illustrate stage 1 in Fig. 1 (i.e., steps 1–6), different numbers of measured data points (i.e., sample sizes), M = 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, and 80, are taken from CPT No. CD5 and used as input to the BCS method for interpreting the complete qc data profile. For simplicity, the measured data points are taken at an equal interval. Figure 2 shows the interpretation results for M = 10, 15, 20, and 60. In this figure, the solid lines represent the original qc profile, and the open circles represent the measurement data taken from the original qc profile and used as input to the BCS method. The dashed lines indicate the mean qc profiles interpreted from the BCS method, and the statistical uncertainty of the interpreted qc profiles is represented by an interval between two dotted lines, which are the mean qc profile plus or minus one standard deviation of qc obtained from the BCS method. The corresponding relative errors, RE, for M = 10, 15, 20, and 60 are calculated using eq. (1) as 8.29%, 6.28%, 3.71%, and 1.90%, respectively. Figure 2 shows that RE decreases significantly as the number M of measured data points increases. In other words, the accuracy of the qc profile interpreted by the BCS method improves as M increases. When M increases to 60, out of a total number of data points N = 512, the mean qc profile interpreted from the BCS method almost completely overlaps with the original complete qc profile. In addition, the interval between the two dotted lines narrows as M increases, and the statistical uncertainty of the interpreted qc data profile also decreases with the increase of M.

Fig. 2. Interpreted qc profiles from BCS method for different numbers M of measured data points: (a) M = 10; (b) M = 15; (c) M = 20; (d) M = 60 for CPT No. CD5. [Colour online.] [Figure: four panels of depth (m) versus qc (MPa), each showing the original qc profile, the BCS mean qc profile, the BCS mean qc profile ± 1σ, and measurement data y; RE = 8.29% (a), 6.28% (b), 3.71% (c), and 1.90% (d).]

The REs for different numbers of measurement data points are summarized in Fig. 3 (i.e., step 6 in Fig. 1). The vertical axis in Fig. 3 represents the RE value, while the horizontal axis denotes the number of measured data points or sample size. It shows that the RE decreases from about 10% to about 1% as the number of measurement data increases from 5 to 80. Figure 3 establishes a relationship between sample size M and RE for the qc profile of CPT No. CD5.

Illustration of stage 2 using real CPT data
To illustrate stage 2 of the proposed framework, steps 1–6 in stage 1 are repeated for each of the 50 sets of CPT data in the Keswick clay layer. For each CPT qc profile, different numbers M of measured data points, i.e., M = 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, and 80, are taken from the respective original qc profile and used repeatedly as input to the BCS method to interpret the corresponding complete qc profiles. For example, Fig. 4 shows the original qc profiles for CPT No. CD1 to CD11 by gray solid lines, together with the complete qc profiles interpreted from the BCS method using M = 10, 20, and 60 measured data points taken from each set of CPT data as input. The interpreted qc profiles are shown by dashed lines, dotted lines, and dash-dotted lines for M = 10, 20, and 60, respectively. Similar to Fig. 2, Fig. 4 shows that, as M increases, the interpreted qc profiles gradually approach the respective original qc profiles. Similar to Fig. 3, a relationship between RE and M may be obtained for each set of CPT data.

Figure 5 summarizes the RE and M relationships for the 50 sets of CPT data in the Keswick clay layer. Note that the horizontal axis of Fig. 5 is the sampling interval, ds. Because the thickness of the Keswick clay layer in which the 50 sets of CPTs were performed is known and the measured data points are taken at an equal interval, the sampling interval ds may be determined as the thickness (e.g., 2.55 m in this illustration) divided by (M − 1). For example, M = 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, and 80 lead to ds = 0.64, 0.28, 0.18, 0.13, 0.11, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, and 0.03 m, respectively. Figure 5 shows REs for the 50 sets of CPT data with ds = 0.64, 0.28, 0.18, 0.13, 0.11, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, and 0.03 m, respectively. It is obvious in Fig. 5 that RE increases as ds increases or M decreases. In addition, the variability of RE also increases as ds increases. At each ds value, statistical analysis is performed to estimate the mean, μRE, of the 50 RE values obtained for the 50 sets of CPT data. The variation of μRE as a function of ds is shown in Fig. 6. The mean RE, μRE, increases from about 1.7% to about 11.2% as the sampling interval increases from 0.03 to 0.64 m. Figure 6 establishes a relationship between the interpretation accuracy (e.g., μRE) and sample size (i.e., sampling interval ds) for the given Keswick clay layer.

In an extensive parametric study reported in the next section, stages 1 and 2 illustrated in these two sections are repeated for a wide variety of sites typically encountered in geotechnical practice. Because well-characterized and well-documented sites are rare in reality, random field simulation is used to generate a large number of typical sites that mimic spatially varying and correlated soil properties in real site conditions.

Parametric study using random field simulation data (stage 3)
Random field simulation of soil property profiles
Random fields have been successfully used to model spatially correlated geotechnical properties during the last several decades (e.g., Vanmarcke 1977; Vanmarcke et al. 1986; Phoon and Kulhawy 1999; Fenton and Griffiths 2008; Bong and Stuedlein 2018; Wang et al. 2018; Zhao and Wang 2018b). In this study, soil property profiles from a wide variety of statistically homogeneous and stationary soil layers are simulated using random fields. Consider, for example, a one-dimensional stationary Gaussian random field X(D), where D is depth and X is a random variable representing a soil property of interest with a mean μ and a standard deviation σ. The spatial correlation between X(D) at different depths is modeled by a correlation function ρ(ΔD). A single exponential correlation function (SECF) is adopted in this study. For a SECF, the correlation function ρ(ΔD) between X(Di) and X(Dj) at the respective depths Di and Dj may be expressed as

$$\rho(\Delta D) = \exp\left(-\frac{2\lvert \Delta D \rvert}{\lambda}\right) \tag{7}$$

in which λ is the correlation length and ΔD is the separation distance between Di and Dj (= |Di − Dj|).
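Stage 3 strings together eq. (7), the eigen-decomposition construction of eq. (8), equally spaced sampling with ds = thickness/(M − 1), and the relative error of eq. (1). The Python/NumPy sketch below shows one way these pieces could fit together; it is a minimal illustration under stated assumptions, not the authors' implementation. It assumes σ = COV × μ, treats Z in eq. (8) as a vector of independent standard normal variables, and takes diag[L] as the eigenvalues of C = σ²R. The helper names (secf_correlation_matrix, simulate_profile, relative_error, equal_interval_indices) and the values in the usage lines (μ = 2.0, COV = 0.1, λ = 0.5 m, a 2.55 m layer with N = 512) are hypothetical. Linear interpolation (numpy.interp) is deliberately swapped in for BCS to keep the example short.

```python
import numpy as np


def secf_correlation_matrix(depths, corr_length):
    """Correlation matrix R whose (i, j) entry is eq. (7): exp(-2|D_i - D_j| / lambda)."""
    depths = np.asarray(depths, dtype=float)
    lag = np.abs(depths[:, None] - depths[None, :])  # separation distance |D_i - D_j|
    return np.exp(-2.0 * lag / corr_length)


def simulate_profile(depths, mean, coef_var, corr_length, rng):
    """One stationary Gaussian realization X = mu*1 + V sqrt(diag[L]) Z, as in eq. (8).

    Assumptions: sigma = COV * mu (mu > 0), diag[L] holds the eigenvalues of C,
    and Z is a vector of independent standard normal variables.
    """
    depths = np.asarray(depths, dtype=float)
    sigma = coef_var * mean
    C = sigma**2 * secf_correlation_matrix(depths, corr_length)  # C = sigma^2 R
    eigvals, V = np.linalg.eigh(C)          # C is symmetric, so eigh applies
    eigvals = np.clip(eigvals, 0.0, None)   # drop tiny negative round-off
    Z = rng.standard_normal(depths.size)
    return mean + V @ (np.sqrt(eigvals) * Z)


def relative_error(x_true, x_hat):
    """RE of eq. (1): ||X - X_hat|| / ||X|| x 100, in percent."""
    x_true = np.asarray(x_true, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return np.linalg.norm(x_true - x_hat) / np.linalg.norm(x_true) * 100.0


def equal_interval_indices(n_points, m_samples):
    """Indices of M measurement points spread (approximately) evenly over N points."""
    return np.round(np.linspace(0, n_points - 1, m_samples)).astype(int)


# Hypothetical usage: a 2.55 m layer with N = 512 points and illustrative statistics.
rng = np.random.default_rng(0)
depths = np.linspace(0.0, 2.55, 512)
x = simulate_profile(depths, mean=2.0, coef_var=0.1, corr_length=0.5, rng=rng)
thickness = depths[-1] - depths[0]
for m in (5, 10, 20, 60):
    idx = equal_interval_indices(depths.size, m)
    # Linear interpolation stands in for BCS here (a simplification, not BCS).
    x_hat = np.interp(depths, depths[idx], x[idx])
    ds = thickness / (m - 1)  # sampling interval = thickness / (M - 1)
    print(f"M = {m:2d}, ds = {ds:.2f} m, RE = {relative_error(x, x_hat):.2f}%")
```

Because linear interpolation does not exploit the compressibility that BCS relies on, the RE values printed by this sketch are not estimates of the BCS results in Figs. 2 to 6; the example only shows how a simulated profile, equal-interval sampling, and eq. (1) combine in steps 1 to 6 for one random field sample.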

Wang et al. 997
Fig. 3. Relationship between number, M, of measured data points and relative error for CPT No. CD5. (Axes: number of measured data points, M, versus relative error, RE (%).)

Let X = [X(D1), X(D2), …, X(DN)]^T be a vector of X(D) at N different depths. X is a Gaussian vector with a mean vector μI_{N×1} and covariance matrix C = σ²R, where I_{N×1} is a column vector whose N components are all equal to one, and R is the correlation matrix of X, each component of which is calculated using eq. (7). Then X can be expressed as

$$\mathbf{X} = \mu\,\mathbf{I}_{N\times 1} + \mathbf{V}\sqrt{\mathrm{diag}[\mathbf{L}]}\,\mathbf{Z} \tag{8}$$

where V (= [v1, …, vN]) is the eigenvector matrix of the covariance matrix C; diag[L] is a diagonal matrix whose diagonal elements are the eigenvalues of C; and Z is a standard Gaussian vector (e.g., Au and Wang 2014). In essence, eq. (8) is a Karhunen–Loève expansion of the random vector X, and random field samples of X may be generated using eq. (8) (e.g., Zhang and Ellingwood 1994; Huang et al. 2001; Wang et al. 2018).

Parametric study
To simulate a wide variety of typical sites, typical ranges of the three statistical parameters (i.e., μ, COV, and λ) for random field modeling of soil properties are considered in the parametric study. For example, Phoon and Kulhawy (1999) and Cao et al. (2016) summarized the typical ranges of μ and COV for a wide variety of soil properties, including undrained shear strength, friction angle, natural water content, plastic and liquid limits, unit weight, relative density, and field measurements (e.g., CPT; vane shear test, VST; standard penetration test, SPT). Generally speaking, the mean values range from 0.5 to 700, while the COV varies from 2% to 90%. The typical values of vertical correlation length λ for different soil properties range from 0.8 to 12.7 m (e.g., Phoon and Kulhawy 1999). Accordingly, the parametric study adopts parameter ranges that cover these typical values. Table 2 summarizes the parameters used in the parametric study. The μ values range from 0.5 to 1000, while the COV values range from 1% to 100%. The values of vertical correlation length λ vary from 0.5 to 20 m. The thicknesses of the soil layers vary from 2.0 to 51.1 m, which covers the typical range of thickness for a homogeneous soil layer in geotechnical practice. In total, 504 random fields are simulated. For each random field, 100 random field samples are generated, and the total number of data points for each random field sample is N = 512. Then, 12 different sample sizes (i.e., M = 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, and 80) are adopted for each random field sample. The resulting sets of 5 to 80 measurement data points are used as input to BCS for interpreting the complete soil property profiles.

In stage 3, for each of the 504 random fields, 100 random field samples are generated, and steps 1–6 are repeatedly performed for each random field sample at 12 different sample sizes (i.e., step 7), resulting in 504 × 100 × 12 = 604 800 relative errors. Then, step 8 is repeatedly performed to estimate the mean relative error for each M value and random field, leading to 504 × 12 = 6048 values of μRE. In the next section, the parametric study results (i.e., 6048 data pairs of μRE and M) are summarized to develop a dimensionless relationship between the normalized sample size and μRE.

Dimensionless relationship between sample size and relative error
The parametric study results are summarized in Fig. 7 with two dimensionless parameters, the normalized mean relative error μRE/COV and the normalized sampling interval ds/λ. The vertical axis in Fig. 7 (i.e., μRE/COV) represents the interpretation accuracy (i.e., mean RE) normalized with respect to the variability (i.e., COV) of the soil property concerned. The horizontal axis in Fig. 7 (i.e., ds/λ) represents the sample size (i.e., sampling interval) normalized with respect to the spatial correlation (i.e., correlation length). Figure 7 shows the 6048 data pairs of μRE/COV and ds/λ as open triangles. The data pairs reveal that μRE/COV increases as ds/λ increases from 0 to about 3, and that μRE/COV approaches an asymptotic value when ds/λ is larger than 3.

When the normalized sampling interval ds/λ is close to 0, a small positive intercept is observed along the vertical axis of μRE/COV. There are two possible scenarios leading to this small intercept. In the first scenario, the sampling interval ds approaches 0, which means that the soil properties are sampled and measured at every location. The small intercept on the vertical axis of μRE/COV then reflects the nugget effect, or the difference in soil properties at neighboring locations (e.g., Matheron 1963). In the second scenario, the correlation length λ is very large, which indicates that the soils are uniform within the layer and that the soil properties at different locations within the layer are perfectly correlated with each other. Therefore, the entire soil property profile can be represented by a single sample and measurement. The small intercept on the vertical axis of μRE/COV in this scenario represents the measurement error, which is also often referred to as the nugget effect in geostatistics. Note that measurement error has been probabilistically modeled in BCS as a Gaussian random variable with a zero mean and unknown variance, which is subsequently integrated out through marginalization in the Bayesian framework. More discussion on the nugget effect is provided later in this section.

Conversely, when the normalized sampling interval ds/λ is large (e.g., larger than 3), the sample size is quite small and the sampling interval is much larger than the distance within which the measurement data are significantly correlated. In this case, ds/λ has a negligible effect on μRE/COV, which plots more or less horizontally over different ds/λ values in Fig. 7. As long as ds/λ remains larger than 3, reducing the sampling interval (i.e., ds) or increasing the sample size (i.e., M) does not improve the level of accuracy of the interpretation results. This suggests that there is a minimum sample size threshold, or equivalently a maximum sampling interval threshold, for geotechnical site investigation. When the sample size is smaller than the minimum sample size threshold or the sampling interval is larger than the maximum sampling interval threshold (e.g., 3λ), changing the sample size or sampling interval has negligible influence on the level of accuracy of the interpretation results.

Note that the data pattern in Fig. 7 is very similar to a semi-variogram in geostatistics, which describes how half the variance of the difference between two values varies with their lag distance. Figure 8 shows a schematic of a typical semi-variogram where the experimental semi-variogram data (i.e., open circles) are fitted to an exponential function with a

nugget component (i.e., solid line). As shown in Fig. 8, the semi-variogram monotonically increases as lag distance increases, and it approaches an asymptotic value, defined as the sill, s, in geostatistics, when the lag distance is larger than an effective range, a (e.g., Webster and Oliver 2007). The effective range is usually taken as the lag distance at which the semi-variogram reaches 95% of its sill value. The semi-variogram contains a small positive intercept, ng, on the vertical axis, which is referred to as the nugget effect in geostatistics and is used to model measurement errors and spatial variation within the shortest sampling interval (e.g., Matheron 1963).

Fig. 4. Interpreted qc profiles from BCS method for CPT No. CD1 to CD11 with different number M of measured data points. (Panels: CPT No. CD1, CD2, CD3, CD4, CD6, CD7, CD8, CD9, CD10, and CD11; axes: qc (MPa) versus depth (m); lines: original qc profile and interpreted qc profiles for M = 10, 20, and 60.)

Fig. 5. Variation of relative error with sampling interval from 50 sets of real CPT data. (Axes: sampling interval, ds (m), versus relative error, RE (%).)

Fig. 6. Relationship between mean relative error and sampling interval. (Axes: sampling interval, ds (m), versus mean relative error, μRE (%).)

Table 2. Summary of parameters used in parametric study.
Thickness (m): 2.0, 5.0, 10.2, 20.4, 30.7, 40.9, and 51.1
Mean, μ: 0.5, 20, 40, 50, 60, 80, 100, and 1000
COV (%): 1, 5, 10, 20, 30, 40, 50, and 100
Correlation length, λ (m): 0.5, 1, 2, 3, 5, 7, 9, 11, 13, 15, 18, and 20
Number of measurement data points, M: 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, and 80

Fig. 7. Dimensionless plot of normalized mean relative error versus normalized sampling interval. (Axes: normalized sampling interval, ds/λ, versus normalized mean relative error, μRE/COV; open triangles: parametric study results (total of 6048); lines: best-fit curve y = 0.157 + 0.975{1 − exp(−x/0.657)}, with y = μRE/COV, x = ds/λ, and R² = 0.969, and best-fit curve ± 1σ.)

Fig. 8. Illustration of an exponential semi-variogram model with fitting parameters. (Axes: lag distance, h (m), versus semi-variogram, γ(h); open circles: experimental semi-variogram; solid line: fitted model y = ng + c1{1 − exp(−x/r)} with y = γ(h) and x = h; annotations: nugget, ng, sill, s = ng + c1, and effective range, a.)

Fig. 9. Statistical chart for sampling interval determination. (Axes: target normalized relative error, RET/COV, versus normalized sampling interval, ds/λ; lines: best-fit curve and best-fit curve ± 1σ.)

Fig. 10. Comparison between original qc profile and qc profile interpreted from BCS method for real CPT data example. (Axes: qc (MPa) versus depth (m).)

Given that Fig. 7 exhibits a pattern similar to an exponential semi-variogram, an exponential function is used in a regression analysis to best fit the 6048 data points in Fig. 7. The best-fit line is shown by a solid line in Fig. 7, and the corresponding best-fit equation, with a coefficient of determination R² = 0.969, is expressed as

$$y = 0.157 + 0.975\left[1 - \exp\left(-\frac{x}{0.657}\right)\right] \tag{9}$$

where y = μRE/COV and x = ds/λ. The best-fit ng value in eq. (9) is 0.157, and the sill s = 0.157 + 0.975 = 1.132. The effective range a = 1.8 can be estimated from the best-fit line in Fig. 7. A dotted line and a dash-dotted line are also included in Fig. 7 to represent the best-fit curve plus or minus one standard deviation, respectively. The equations for the dotted and dash-dotted lines are expressed, respectively, as

$$y = 0.209 + 0.975\left[1 - \exp\left(-\frac{x}{0.657}\right)\right] \tag{10}$$

$$y = 0.105 + 0.975\left[1 - \exp\left(-\frac{x}{0.657}\right)\right] \tag{11}$$

The relationship between the normalized mean RE and the normalized sampling interval shown in eqs. (9)–(11) establishes a connection between the sample size and the level of accuracy of the interpretation results. This relationship may be further used to develop a statistical chart for sample size determination with consideration of spatial variation and correlation of soil properties, as described in the next section.

Development of statistical chart
As shown in Fig. 9, a statistical chart is developed for determination of sample size along the vertical direction in a single statistically homogeneous soil layer. The chart is developed from the parametric study results and Fig. 7. Similar to Fig. 7, Fig. 9 is plotted in terms of two normalized dimensionless parameters: a target normalized relative error (i.e., RET/COV) on the horizontal axis and a normalized sampling interval (i.e., ds/λ) on the vertical axis. The target relative error, denoted as RET, is the desired level of accuracy of the interpretation results that a user would like to achieve. Replacing μRE by RET in eqs. (9)–(11) and switching the vertical and horizontal axes of Fig. 7 generates the statistical chart shown in Fig. 9. Similar to Fig. 7, a dotted line and a dash-dotted line are also included in Fig. 9 to represent the best-fit curve minus or plus one standard deviation, respectively. Figure 9 shows that, as the RET value decreases, i.e., as a more stringent level of accuracy is required for the interpretation results, the RET/COV value decreases, and the normalized sampling interval decreases as well. As the sampling interval decreases, the required number of samples increases. Figure 9 provides a quantitative tool for geotechnical practitioners to rationally determine sample size with consideration of the desired level of accuracy and the spatial variation and correlation of soil properties.
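Eqs. (9)–(11) can also be read in the forward direction, to predict the normalized mean relative error for a planned sampling interval. The sketch below is a hypothetical helper, not code from the study, and uses only the constants reported in the text. It reproduces the nugget (0.157) at x = 0 and the sill (1.132) for large x, and, for the values of the CPT example in the application section (ds = 1.00 m, λ = 2.0 m, COV = 15%), it predicts a mean RE of about 10.1%.

```python
import math

# Constants of eqs. (9)-(11) as reported in the text:
# y = n_g + c1 * (1 - exp(-x / r)), with y = muRE/COV and x = ds/lambda.
C1 = 0.975
R_SCALE = 0.657
NUGGET = {
    "best_fit": 0.157,   # eq. (9)
    "plus_1sd": 0.209,   # eq. (10), dotted line
    "minus_1sd": 0.105,  # eq. (11), dash-dotted line
}


def normalized_mean_re(x: float, curve: str = "best_fit") -> float:
    """Normalized mean relative error muRE/COV for x = ds/lambda."""
    return NUGGET[curve] + C1 * (1.0 - math.exp(-x / R_SCALE))


def expected_mean_re(ds: float, corr_length: float, cov: float,
                     curve: str = "best_fit") -> float:
    """Mean relative error implied by a planned interval ds (same units as cov)."""
    return cov * normalized_mean_re(ds / corr_length, curve)


print(f"{normalized_mean_re(0.0):.3f}")    # 0.157, the nugget n_g
print(f"{normalized_mean_re(50.0):.3f}")   # 1.132, the sill s = n_g + c1
# CPT example values: ds = 1.0 m, lambda = 2.0 m, COV = 0.15
print(f"{expected_mean_re(1.0, 2.0, 0.15):.3f}")  # 0.101
```

Keeping the three nugget values in one table makes it easy to bracket the prediction with the ±1σ curves of eqs. (10) and (11).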
To use the statistical chart in Fig. 9, users must first specify their target relative error RET. The target relative error is determined depending on the importance and risk of the project (e.g., failure consequences), as well as the complexity of the local geology. Then, the coefficient of variation, COV, and vertical correlation length, λ, of the soil property concerned are estimated based on existing knowledge of the site and soil property. Such existing knowledge includes, but is not limited to, local experience on soil properties and sites under similar conditions, geologic and topographic maps, and the typical ranges of COV and λ for different soil properties reported in the literature (e.g., Phoon and Kulhawy 1999; Cao et al. 2016). With the estimated COV value and specified RET value, RET/COV on the horizontal axis of Fig. 9 is calculated. Then, Fig. 9 is used to determine the corresponding ds/λ value on the vertical axis. Finally, the sampling interval ds is calculated from the ds/λ value obtained from Fig. 9 and the estimated λ value, and the required sample size M is calculated from ds and the thickness H of the soil layer (for equally spaced samples spanning the layer, M = H/ds + 1, rounded to an integer).
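The chart-reading procedure above can be approximated in closed form by solving eq. (9) for x = ds/λ, i.e., x = −0.657 ln[1 − (RET/COV − 0.157)/0.975], which is defined only for 0.157 < RET/COV < 1.132. The Python sketch below is an illustrative assumption rather than the authors' implementation: the helper names are hypothetical, and rounding M to the nearest integer is assumed to match the "≈" used in the application examples.

```python
import math

# Best-fit constants from eq. (9); pass nugget=0.209 or 0.105 to use the
# +1 or -1 standard deviation curves of eqs. (10) and (11) instead.
C1 = 0.975
R_SCALE = 0.657


def normalized_interval(target_re: float, cov: float, nugget: float = 0.157) -> float:
    """ds/lambda from the statistical chart, obtained by inverting eq. (9).

    target_re and cov are fractions, e.g. 0.10 for RET = 10 %.
    """
    y = target_re / cov
    sill = nugget + C1
    if not nugget < y < sill:
        # Below the nugget no interval is small enough; at or above the sill
        # any interval already meets the target, so the inverse is undefined.
        raise ValueError(f"RET/COV = {y:.3f} is outside ({nugget}, {sill:.3f})")
    return -R_SCALE * math.log(1.0 - (y - nugget) / C1)


def sample_size(thickness: float, ds: float) -> int:
    """Number of equally spaced samples spanning the layer, M = H/ds + 1."""
    return round(thickness / ds) + 1


x = normalized_interval(0.10, 0.15)     # CPT example: RET = 10 %, COV = 15 %
print(f"{x:.2f}")                        # 0.49 (the text reads 0.50 from Fig. 9)
print(sample_size(10.22, 0.50 * 2.0))    # CPT example, lambda = 2.0 m: 11
print(sample_size(27.1, 0.50 * 5.0))     # liquid limit, lambda = 5.0 m: 12
```

The closed-form inversion gives ds/λ ≈ 0.49 for both application examples (RET/COV ≈ 0.67), marginally below the 0.50 read from Fig. 9; the sample sizes printed above use the text's value of 0.50. Passing nugget=0.209 (eq. 10) returns a smaller, more conservative interval.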
Application examples
To demonstrate application of the proposed statistical chart, real CPT qc data and laboratory test results (i.e., liquid limit) are used as illustrative examples in this section. CPT data are used in the first illustrative example because CPT is performed in a nearly continuous manner, so the complete CPT record can be used to validate the profile interpreted from measurement data whose sample size is determined from the statistical chart. The proposed quantitative method and statistical chart are equally applicable to measurements of other soil properties, such as the liquid limit data in the second example.

Real CPT qc data example
A CPT (i.e., test number CPT-1) was carried out in a cohesive soil layer in eastern Houston, Texas, USA, and its qc data were downloaded from the ISSMGE TC304dB database website (http://140.112.12.21/issmge/Database_2011.htm) and used in this illustrative example. According to Stuedlein et al. (2012), the water table is about 1.8 m below the ground surface, and the site contains a thick layer of stiff, slightly silty clay. The CPT qc data from 5.10 to 15.32 m depth are used in this example. The qc data were recorded at a 20 mm interval, so the total number of data points for this CPT qc data profile is 512.

Suppose that the target relative error, RET, in this application is 10%. The COV and vertical λ of the qc data at this site are estimated as 15% and 2.0 m, respectively, based on local experience (e.g., nearby CPT data). These values are also within the typical ranges of 5%–40% for COV and 0.1–2.2 m for vertical λ of clay CPT qc data reported in the literature (e.g., Phoon and Kulhawy 1999). Then, the value of RET/COV is calculated as 0.67. Using the best-fit curve in the statistical chart, the normalized sampling interval ds/λ = 0.50 is obtained. Because λ is estimated as 2.0 m, the sampling interval is then calculated as ds = 1.00 m.

The thickness of the soil property profile is 10.22 m, and the number of measurement data points is estimated as M = 10.22/1.00 + 1 ≈ 11. Subsequently, 11 equally spaced data points are taken from the CPT data and used as measurement data, i.e., input to the BCS method for interpreting the complete soil property profile. In Fig. 10, the measurement data are shown by open circles. The solid line represents the original CPT qc profile, and the dashed line indicates the mean qc profile obtained from BCS. The statistical uncertainty of the interpreted qc profiles is indicated by the interval between two dotted lines, which are the mean qc profile plus or minus one standard deviation. The RE between the original and mean qc profiles is 9.98%, which is quite close to the target relative error of 10%. In other words, the statistical chart provides reasonable results and achieves the target level of accuracy for this site.

Liquid limit data example
In this subsection, liquid limit, wL, data measured in the laboratory are used to illustrate application of the proposed statistical chart. Consider, for example, determination of sample size for quantifying spatial variation of liquid limit along depth within a silty clay layer at the construction site of Taipei City Hall Station, Taiwan (Chin et al. 1994). The site contains a thick layer of silty clay from a depth of about 2.9 to 30.0 m.

Suppose that the target relative error is 12%. The COV and vertical λ of the liquid limit in this silty clay layer are estimated as 18% and 5.0 m, respectively. These two values are approximately the midpoints of the typical ranges of 6%–30% for COV and 1.6–8.7 m for vertical λ of clay liquid limit reported in the literature (e.g., Phoon and Kulhawy 1999). Then, RET/COV = 0.67 is calculated. Using the best-fit curve in the statistical chart, ds/λ = 0.5 is obtained. Because λ = 5 m, the sampling interval ds = 2.5 m is estimated. The thickness of the silty clay layer is 27.1 m, and the number of measurement data points is estimated as M = (27.1/2.5) + 1 ≈ 12. Subsequently, 12 samples within the silty clay layer (i.e., from a depth of about 2.9 to 30.0 m) should be obtained from a borehole with an equal sampling interval for carrying out liquid limit tests in the laboratory and quantifying spatial variation of the liquid limit along depth.

Chin et al. (1994) reported that a total of 39 soil samples (i.e., T1–T39) were obtained from a borehole within this silty clay layer for liquid limit tests in the laboratory. Figure 11 shows the 39 measured wL values along depth by open triangles. To validate the performance of the proposed statistical chart in this illustrative example, 12 wL values are taken from the 39 wL values with an approximately equal sampling interval, and they are shown in Fig. 11 by solid circles. These 12 wL values are used as the measurement data, i.e., input to BCS for interpolating a high-resolution liquid limit profile with 256 data points. Figure 11 also shows the mean liquid limit profile interpreted from BCS by a dashed line and the mean plus or minus one standard deviation profiles by two dotted lines. It is observed from Fig. 11 that the dashed line follows a trend quite similar to that of the open triangles (i.e., all 39 wL values), and that most of the 39 wL data points (i.e., the open triangles) fall within the dotted lines. The relative error RE between the 39 original liquid limit values and the mean liquid limit values obtained from BCS at the corresponding depths is calculated as 11.64%, which is very consistent with the target relative error of 12%. This consistency suggests that the sample size determined by the proposed statistical chart (i.e., Fig. 9) achieves the target level of accuracy for the results interpreted from the given number of measurements. In summary, the proposed statistical chart performs well in the illustrative examples for both in situ and laboratory test data.

Fig. 11. Comparison between original profile and profile interpreted from BCS method for liquid limit data example.

Conclusion
This paper performed an extensive parametric study and developed a quantitative method and statistical chart for sample size determination in geotechnical site investigation with consideration of spatial variation and correlation of soil properties. The parametric study was performed to establish a quantitative relationship between the sample size M and the corresponding level of accuracy of the results interpreted from the M measured data points. The parametric study results were summarized using a dimensionless plot of the normalized mean relative error μRE/COV against the normalized sampling interval ds/λ. The dimensionless plot showed that the normalized mean relative error increases as the normalized sampling interval increases from 0 to about 3, and that the normalized mean relative error approaches an asymptotic value when the normalized sampling interval is larger than 3. A nugget effect, which represents measurement errors and spatial variation in soil properties at neighboring locations, was observed in the dimensionless plot. There also exists a minimum sample size threshold, or equivalently a maximum sampling interval threshold, for geotechnical site investigation. When the sample size is smaller than the minimum sample size threshold or the sampling interval is larger than the maximum sampling interval threshold (e.g., 3λ), changing the sample size or sampling interval has negligible influence on the level of accuracy of the interpreted spatial variation of soil properties.

A statistical chart for sample size determination was further developed from the dimensionless plot. Real CPT data and real liquid limit data were used to illustrate the application of the proposed statistical chart. The statistical chart was shown to perform well in both illustrative examples.

Acknowledgements
The work described in this paper was supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. 9042516 (CityU 11213117) and Project No. 8779012 (T22-603/15N)). The financial support is gratefully acknowledged. The authors would like to thank the members of the TC304 Committee on Engineering Practice of Risk Assessment and Management of the International Society of Soil Mechanics and Geotechnical Engineering for developing the database 304dB used in this study and making it available for scientific inquiry. The authors also wish to thank M. Jaksa and A.W. Stuedlein for contributing this database to the TC304 compendium of databases.

References
Au, S.-K., and Wang, Y. 2014. Engineering risk assessment with subset simulation. John Wiley and Sons, Singapore.
Baecher, G.B., and Christian, J.T. 2003. Reliability and statistics in geotechnical engineering. John Wiley and Sons, Hoboken, New Jersey.
Benson, C.H., Zhai, H., and Rashad, S.M. 1994. Statistical sample size for construction of soil liners. Journal of Geotechnical Engineering, 120(10): 1704–1724. doi:10.1061/(ASCE)0733-9410(1994)120:10(1704).
Bong, T., and Stuedlein, A.W. 2018. Efficient methodology for probabilistic analysis of consolidation considering spatial variation. Engineering Geology, 237: 53–63. doi:10.1016/j.enggeo.2018.02.009.
Candes, E.J., Romberg, J.K., and Tao, T. 2006. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8): 1207–1223. doi:10.1002/cpa.20124.
Cao, Z., Wang, Y., and Li, D. 2016. Quantification of prior knowledge in geotechnical site characterization. Engineering Geology, 203: 107–116. doi:10.1016/j.enggeo.2015.08.018.
CEN. 2007. Eurocode 7: Geotechnical design – Part 1: General rules. EN 1997-1:2007. European Committee for Standardization (CEN), Brussels, Belgium.
Chin, C., Crooks, J., and Moh, Z. 1994. Geotechnical properties of the cohesive Sungshan deposits, Taipei. Geotechnical Engineering, 25(2).
Clayton, C.R. 2001. Managing geotechnical risk: improving productivity in UK building and construction. Thomas Telford Publishing, London.
Cui, J., Jiang, Q., Li, S., Feng, X., Zhang, M., and Yang, B. 2017. Estimation of the number of specimens required for acquiring reliable rock mechanical parameters in laboratory uniaxial compression tests. Engineering Geology, 222: 186–200. doi:10.1016/j.enggeo.2017.03.023.
Daubechies, I. 1992. Ten lectures on wavelets. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, Pa., USA.
Donoho, D.L. 2006. Compressed sensing. IEEE Transactions on Information Theory, 52(4): 1289–1306. doi:10.1109/TIT.2006.871582.
Fenton, G.A., and Griffiths, D.V. 2008. Risk assessment in geotechnical engineering. John Wiley and Sons, New York, USA.
Fenton, G.A., Liza, R., Lake, C.B., Menzies, W.T., and Griffiths, D.V. 2015. Statistical sample size for quality control programs of cement-based “solidification/stabilization”. Canadian Geotechnical Journal, 52(10): 1620–1628. doi:10.1139/cgj-2013-0478.
Gill, D.E., Corthésy, R., and Leite, M.H. 2005. Determining the minimal number of specimens for laboratory testing of rock properties. Engineering Geology, 78(1–2): 29–51. doi:10.1016/j.enggeo.2004.10.005.
Goldsworthy, J.S., Jaksa, M.B., Fenton, G.A., Kaggwa, W.S., Griffiths, V., and Poulos, H.G. 2007. Effect of sample location on the reliability based design of pad foundations. Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards, 1(3): 155–166. doi:10.1080/17499510701697377.

Huang, S., Quek, S., and Phoon, K. 2001. Convergence study of the truncated Karhunen–Loeve expansion for simulation of stochastic processes. International Journal for Numerical Methods in Engineering, 52(9): 1029–1043. doi:10.1002/nme.255.
Jaksa, M., and Kaggwa, W.S. 1994. A micro-computer based data acquisition system for the cone penetration test. Department of Civil and Environmental Engineering, University of Adelaide.
Jaksa, M., Kaggwa, W., and Brooker, P. 1999. Experimental evaluation of the scale of fluctuation of a stiff clay. In Proceedings of the 8th International Conference on the Application of Statistics and Probability, A.A. Balkema, Sydney, Rotterdam. Vol. 1, pp. 415–422.
Ji, S., Xue, Y., and Carin, L. 2008. Bayesian compressive sensing. IEEE Transactions on Signal Processing, 56(6): 2346–2356. doi:10.1109/TSP.2007.914345.
Ji, S., Dunson, D., and Carin, L. 2009. Multitask compressive sensing. IEEE Transactions on Signal Processing, 57(1): 92–106. doi:10.1109/TSP.2008.2005866.
Krejcie, R.V., and Morgan, D.W. 1970. Determining sample size for research activities. Educational and Psychological Measurement, 30(3): 607–610. doi:10.1177/001316447003000308.
Lacasse, S., and Nadim, F. 1996. Uncertainties in characterising soil properties. In Uncertainty in the geologic environment: From theory to practice. Publication No. 201. Norwegian Geotechnical Institute, Oslo, Norway, pp. 49–75.
Liza, R., Fenton, G.A., Lake, C.B., and Griffiths, D.V. 2017. An analytical approach to assess quality control sample sizes of cement-based “solidification/stabilization”. Canadian Geotechnical Journal, 54(3): 419–427. doi:10.1139/cgj-2016-0218.
Matheron, G. 1963. Principles of geostatistics. Economic Geology, 58(8): 1246–1266. doi:10.2113/gsecongeo.58.8.1246.
Mayne, P.W., Christopher, B.R., and Dejong, J. 2002. Manual on subsurface investigations. Nat. Highway Inst. Sp. Pub. FHWA NHI-01-031. Federal Highway Administration, Washington, D.C.
Orr, T.L.L. 2017. Defining and selecting characteristic values of geotechnical parameters for designs to Eurocode 7. Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards, 11(1): 103–115. doi:10.1080/17499518.2016.1235711.
Phoon, K.K. 2017. Role of reliability calculations in geotechnical design. Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards, 11(1): 4–21. doi:10.1080/17499518.2016.1265653.
Phoon, K.-K., and Kulhawy, F.H. 1999. Characterization of geotechnical variability. Canadian Geotechnical Journal, 36(4): 612–624. doi:10.1139/t99-038.
Stuedlein, A.W., Kramer, S.L., Arduino, P., and Holtz, R.D. 2012. Geotechnical characterization and random field modeling of desiccated clay. Journal of Geotechnical and Geoenvironmental Engineering, 138(11): 1301–1313. doi:10.1061/(ASCE)GT.1943-5606.0000723.
Tropp, J.A., and Gilbert, A.C. 2007. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12): 4655–4666. doi:10.1109/TIT.2007.909108.
Vanmarcke, E.H. 1977. Probabilistic modeling of soil profiles. Journal of the Geotechnical Engineering Division, ASCE, 103(11): 1227–1246.
Vanmarcke, E., Shinozuka, M., Nakagiri, S., Schueller, G., and Grigoriu, M. 1986. Random fields and stochastic finite elements. Structural Safety, 3(3–4): 143–166. doi:10.1016/0167-4730(86)90002-0.
Wang, Y., and Zhao, T. 2016. Interpretation of soil property profile from limited measurement data: a compressive sampling perspective. Canadian Geotechnical Journal, 53(9): 1547–1559. doi:10.1139/cgj-2015-0545.
Wang, Y., and Zhao, T. 2017. Statistical interpretation of soil property profiles from sparse data using Bayesian compressive sampling. Géotechnique, 67(6): 523–536. doi:10.1680/jgeot.16.P.143.
Wang, Y., Cao, Z., and Li, D. 2016. Bayesian perspective on geotechnical variability and site characterization. Engineering Geology, 203: 117–125. doi:10.1016/j.enggeo.2015.08.017.
Wang, Y., Akeju, O.V., and Zhao, T. 2017. Interpolation of spatially varying but sparsely measured geo-data: a comparative study. Engineering Geology, 231: 200–217. doi:10.1016/j.enggeo.2017.10.019.
Wang, Y., Zhao, T., and Phoon, K.-K. 2018. Direct simulation of random field samples from sparsely measured geotechnical data with consideration of uncertainty in interpretation. Canadian Geotechnical Journal, 55(6): 862–880. doi:10.1139/cgj-2017-0254.
Webster, R., and Oliver, M.A. 2007. Geostatistics for environmental scientists. John Wiley and Sons, Chichester, England.
Yamaguchi, U. 1970. The number of test-pieces required to determine the strength of rock. International Journal of Rock Mechanics and Mining Sciences & Geomechanics Abstracts, 7: 209–227. doi:10.1016/0148-9062(70)90013-6.
Zhang, J., and Ellingwood, B. 1994. Orthogonal series expansions of random fields in reliability analysis. Journal of Engineering Mechanics, 120(12): 2660–2677. doi:10.1061/(ASCE)0733-9399(1994)120:12(2660).
Zhao, T., and Wang, Y. 2018a. Interpretation of pile lateral response from deflection measurement data: a compressive sampling-based method. Soils and Foundations, 58(4): 957–971. doi:10.1016/j.sandf.2018.05.002.
Zhao, T., and Wang, Y. 2018b. Simulation of cross-correlated random field samples from sparse measurements using Bayesian compressive sensing. Mechanical Systems and Signal Processing, 112: 384–400. doi:10.1016/j.ymssp.2018.04.042.
Zhao, T., Montoya-Noguera, S., Phoon, K.K., and Wang, Y. 2018. Interpolating spatially varying soil property values from sparse data for facilitating characteristic value selection. Canadian Geotechnical Journal, 55(2): 171–181. doi:10.1139/cgj-2017-0219.

Received 13 July 2018. Accepted 2 October 2018.

Published by NRC Research Press
