ARTICLE
Sample size determination in geotechnical site investigation
considering spatial variation and correlation
Yu Wang, Zheng Guan, and Tengyuan Zhao
Can. Geotech. J. Downloaded from cdnsciencepub.com by FAC CIENCIAS EST PROFESIONALES on 11/03/21
Abstract: Site investigation is a fundamental element in geotechnical engineering practice, but only a small portion of geomaterials is sampled and tested during site investigation. This leads to a question of sample size determination: how many samples are needed to achieve a target level of accuracy for the results inferred from the samples? Sample size determination is a well-known topic in statistics and has many applications in a wide variety of areas. However, conventional statistical methods, which mainly deal with independent data, only have limited applications in geotechnical site investigation because geotechnical data are not independent, but spatially varying and correlated. Existing design codes around the world (e.g., Eurocode 7) only provide conceptual principles on sample size determination. No scientific or quantitative method is available for sample size determination in site investigation considering spatial variation and correlation of geotechnical properties. This study performs an extensive parametric study and develops a statistical chart for sample size determination with consideration of spatial variation and correlation using Bayesian compressive sensing or sampling. Real cone penetration test data and real laboratory test data are used to illustrate application of the proposed statistical chart, and the method is shown to perform well.
Key words: geotechnical site investigation, sample size, Bayesian method, compressive sensing, random field.
Résumé : L'étude du site est un élément fondamental de la pratique de l’ingénierie géotechnique, mais seule une petite partie
des géomatériaux est échantillonnée et testée au cours de l’étude du site. Cela conduit à une question de détermination de la
For personal use only.
taille de l’échantillon : combien faut-il d’échantillons pour atteindre un niveau cible d’exactitude pour les résultats déduits des
échantillons? La détermination de la taille des échantillons est un sujet bien connu en statistique et a de nombreuses applica-
tions dans de nombreux domaines. Cependant, les méthodes statistiques conventionnelles, qui traitent principalement de
données indépendantes, n’ont que des applications limitées dans l’étude géotechnique de sites, car les données géotechniques
ne sont pas indépendantes, mais varient dans l’espace et en corrélation. Les codes de conception existants dans le monde
(Eurocode 7, par exemple) ne fournissent que des principes conceptuels pour la détermination de la taille de l’échantillon.
Aucune méthode scientifique ou quantitative n’est disponible pour la détermination de la taille de l’échantillon dans l’étude du
site, compte tenu de la variation spatiale et de la corrélation des propriétés géotechniques. Cette étude réalise une évaluation
paramétrique approfondie et développe un tableau statistique pour la détermination de la taille de l’échantillon en prenant en
compte la variation et la corrélation spatiales à l’aide de la détection ou de l’échantillonnage bayésien en compression. Les
données des tests de pénétration au cône réel et les données des tests de laboratoire réels sont utilisées pour illustrer
l’application du diagramme statistique proposé, et il est démontré que la méthode donne de bons résultats. [Traduit par la
Rédaction]
Mots-clés : étude géotechnique du site, taille de l’échantillon, méthode bayésienne, détection de compression, champ aléatoire.
Introduction
Site investigation is a fundamental element in geotechnical engineering practice, and interpretation of site investigation data leads to the expected ground condition (e.g., spatial distribution of different soil types) and design profiles of geotechnical properties that are subsequently used in geotechnical analyses and designs. Without proper site investigation or data interpretation to produce reliable input parameters, the subsequent geotechnical analyses and designs are much less meaningful, and the geotechnical construction projects might be subjected to significant risk. Clayton (2001) conducted a survey in the UK and found that about 42% of the problems that occurred in geotechnical construction projects were caused by uncertainty associated with site investigation. During geotechnical investigation of a site, only a small portion of geomaterials is examined, and site investigation data are often sparsely measured (e.g., Mayne et al. 2002; Phoon 2017; Wang et al. 2017; Orr 2017; Zhao et al. 2018). This results in estimation error in the interpretation results and leads to a question of sample size determination: how many samples are needed to achieve a target level of accuracy for the results inferred from the samples?
Sample size determination is a well-known topic in statistics (e.g., Krejcie and Morgan 1970) and has many applications in a wide variety of areas (e.g., quality control, surveys, and polls). However, conventional statistical methods, which mainly deal with independent data, only have limited application in geotechnical site investigation, because geotechnical data are not independent, but spatially correlated. Note that soils are natural geomaterials, whose properties are affected by many spatially varying but correlated factors during the geological process, such
Can. Geotech. J. 56: 992–1002 (2019) dx.doi.org/10.1139/cgj-2018-0474 Published at www.nrcresearchpress.com/cgj on 3 October 2018.
as the properties of their parent materials, weathering and erosion processes, transportation agents, and sedimentation conditions. Geotechnical properties are therefore spatially varying and correlated (e.g., Lacasse and Nadim 1996; Phoon and Kulhawy 1999; Wang et al. 2016). To deal with spatially varying and correlated data, a suite of specialized statistical methods called geostatistics has been developed, originally for ore estimation. Geostatistics generally require a large number of data points measured at preferably regular intervals for proper construction of semi-variogram (e.g., Wang et al. 2017). Geotechnical data are,

Fig. 1. Framework for establishing relationship between sample size and accuracy in geotechnical site investigation. [Colour online.]
Step 1: Suppose that there is a soil property profile, all data points of which are known
Step 2: Obtain a number, M (e.g., M = 5), of measured data points from this given soil property profile
Step 3: Interpolate the M measured data points to infer a
[X̂(D1), X̂(D2), …, X̂(DN)]T denotes the soil property profile interpreted from the M measurement data points. The accuracy of the inferred geotechnical profile X̂ can be quantified by a relative error, RE, between X and X̂, which is expressed as

$$\mathrm{RE} = \frac{\lVert \mathbf{X} - \hat{\mathbf{X}} \rVert}{\lVert \mathbf{X} \rVert} = \sqrt{\frac{\sum_{i=1}^{N} [X(D_i) - \hat{X}(D_i)]^{2}}{\sum_{i=1}^{N} X(D_i)^{2}}} \times 100\% \tag{1}$$

where the symbol "‖·‖" denotes an operator of vector norm, which is defined as the square root of the sum of squared value of all elements, as shown on the right-hand side of eq. (1). Note that the definition of RE in eq. (1) is similar to the normalized root-mean-square error in statistics if the original soil property profile X is treated as the mean. To investigate the influence of different sample sizes (i.e., different M values) on the accuracy of the inferred geotechnical profile, steps 2–4 described above are repeated for many different M values (e.g., M = 10), and the corresponding REs are calculated (i.e., step 5 in Fig. 1). When the sample size M is small, the soil property profile interpreted from such sparse measurement data might not be accurate, and the RE is relatively large. As M increases, the accuracy of the inferred geotechnical profile improves, and the RE gradually decreases. When all data points in the original soil property profile are treated as measurement data points (i.e., M = N), the inferred geotechnical profile converges to the original soil property profile, and the RE approaches zero. The relationship between RE and M for a given soil property profile can be established after steps 2–4 are repeated for many different M values (i.e., step 6). This relationship quantifies the variation of the estimation error as sample size changes for a given soil property profile.

In stage 2 (i.e., steps 7–8 in Fig. 1), the study is extended from a given soil property profile (i.e., stage 1) to a given site. Suppose that there is a given site where many (e.g., 50 or 100) sets of complete soil property profiles are available. The procedure developed in stage 1 (i.e., steps 1–6) may be applied repeatedly to each set of the complete soil property profiles in this site, resulting in many sets of relationships between RE and M. Statistical analysis can be performed to obtain the mean of RE, μ_RE, at different M values for all the relationships obtained. This establishes a relationship between the interpretation accuracy (e.g., using μ_RE) and sample sizes for a given site.

In stage 3 (i.e., steps 9–10 in Fig. 1), the study is further extended from a given site (i.e., stage 2) to a wide variety of typical sites commonly encountered in geotechnical site investigation. The procedure developed in stage 2 (i.e., steps 7–8) may be used repeatedly for each typical site, resulting in many sets of relationships between μ_RE and M. Because well-investigated and well-documented sites are rare in reality, random field simulation (e.g., Vanmarcke 1977) is adopted in this study to generate a large number of typical sites as a parametric study in stage 3. Three statistical parameters are commonly needed in a random field simulation of spatial variation of a soil property: (i) mean μ of the soil property, (ii) variance or coefficient of variation, COV, and (iii) correlation length λ, which reflects quantitatively spatial correlation of the soil property at different depths. Typical ranges of μ, COV, and λ for a variety of soil properties have been reported in the literature (e.g., Phoon and Kulhawy 1999; Cao et al. 2016), and these typical ranges are used in the random field simulation in the parametric study to generate a large number of typical sites that mimic spatially varying and correlated soil properties in real site conditions. The procedure developed in stage 2 (i.e., steps 7–8) is applied repeatedly to each typical site in the parametric study, resulting in many sets of relationship between μ_RE and M for different site conditions. The large number of μ_RE and M relationships obtained in the parametric study will be summarized in a dimensionless plot, and a relationship between the normalized μ_RE and normalized M will be developed from the dimensionless plot. The normalized μ_RE and normalized M relationship provides a link between the accuracy of the interpretation results and sample size, and this relationship will be used subsequently to develop a statistical chart for sample size determination with consideration of spatial variation and correlation of soil properties.

In the proposed framework, a key element is BCS that is used to interpret a complete soil property profile from M measurement data points (Wang and Zhao 2017). A brief review of BCS method is provided in the next section.

Brief review of Bayesian compressive sampling (BCS) method
The BCS method is a probabilistic version of compressive sensing (CS), which is a novel sampling method that can reconstruct a signal from sparse measurements on that signal (e.g., Candès et al. 2006; Donoho 2006; Tropp and Gilbert 2007; Ji et al. 2008, 2009; Wang and Zhao 2016; Zhao and Wang 2018a). A signal may be defined loosely as a quantity that exhibits variation with time or space, such as a variation of soil property along depth. Therefore, spatial variation of a soil property with depth can be interpreted from sparse and limited measurement data using BCS method. The BCS method is based on the fact that natural signals have clear trends and are compressible. The term "compressible" means that a signal can be concisely represented by a weighted summation of a limited number of basis functions, such as wavelet functions (e.g., Candès et al. 2006; Donoho 2006; Tropp and Gilbert 2007).

Mathematically, let f = [f1, f2, …, fN]T be a soil property profile with a length of N, which denotes spatial variation of a soil property with depth. The measurement data drawn from f are defined as y = [y1, y2, …, yM]T, which is a column vector with a length of M, where M < N. The soil property profile interpreted from the BCS method is denoted as f̂ = [f̂1, f̂2, …, f̂N]T, which can be expressed as

$$\hat{\mathbf{f}} = \mathbf{B}\mathbf{s} \tag{2}$$

where B is an orthonormal matrix with a dimension of N × N, each column of which represents a basis function; s is a weight column vector with a length of N, corresponding to N basis functions in B. Note that matrix B is independent of the soil property profile f, and that B can be constructed using discrete wavelet transform (DWT) (e.g., Daubechies 1992).

Wang and Zhao (2017) developed a Bayesian framework to statistically reconstruct the soil property profile f̂ from sparsely measured data. Using the Bayesian framework, the posterior distribution of s is derived as a multivariate Student's t distribution with 2c_n degrees of freedom, where c_n = M/2 + c, with c being a very small nonnegative number. The mean and covariance matrix of s are expressed as (Wang and Zhao 2017)

$$\boldsymbol{\mu}_{s} = \mathbf{H}\mathbf{A}^{T}\mathbf{y} \tag{3}$$

$$\mathrm{COV}_{s} = \frac{d_n \mathbf{H}}{c_n - 1} \tag{4}$$

where $\mathbf{H} = (\mathbf{A}^{T}\mathbf{A} + \mathbf{D})^{-1}$; $\mathbf{A} = \boldsymbol{\Psi}\mathbf{B}$ (where Ψ is a measurement matrix that represents the locations of components of y in f); D is a diagonal matrix with diagonal components α_i (i = 1, 2, …, N) that can be obtained by maximizing the likelihood of y (i.e., p(y)); $c_n = M/2 + c$; and $d_n = d + (\mathbf{y}^{T}\mathbf{y} - \boldsymbol{\mu}_{s}^{T}\mathbf{H}^{-1}\boldsymbol{\mu}_{s})/2$, where c and d are very small nonnegative numbers (e.g., c = d = 10^−4). As f̂ = Bs, the mean and covariance of f̂ are expressed as
$$\boldsymbol{\mu}_{\hat{f}} = \mathbf{B}\boldsymbol{\mu}_{s} \tag{5}$$

$$\mathrm{COV}_{\hat{f}} = \mathbf{B}\,\mathrm{COV}_{s}\,\mathbf{B}^{T} \tag{6}$$

Table 1. Summary of real CPT data used in illustration example.
Soil type | CPT No. | Min. depth (m) | Max. depth (m) | Thickness (m) | No. of data points
Keswick clay | CD1 to CD50 | 2.01 | 4.56 | 2.55 | 512

method. The dashed lines indicate the mean qc profiles interpreted from the BCS method, and the statistical uncertainty of the interpreted qc profiles is represented by an interval with two dotted lines that are the mean qc profile plus or minus one standard deviation of qc obtained from the BCS method. The corresponding relative errors, RE, for each M value (e.g., M = 10, 15, 20, 60) are calculated using eq. (1) as RE = 8.29%, 6.28%, 3.71%, and 1.90%, respectively. Figure 2 shows that RE decreases significantly as the number M of measured data points increases. In other words, the accuracy of the interpreted qc profile from BCS method improves
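Equations (2)–(6) can be illustrated with a short numerical sketch. This is not the authors' implementation: it assumes a discrete cosine basis in place of the wavelet basis B, and it fixes the diagonal entries α_i of D by hand (`alpha`), whereas the paper obtains them by maximizing the likelihood of y. The helper names `dct_basis` and `bcs_posterior` and the synthetic profile are hypothetical.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis; columns are basis functions (assumed stand-in for B)."""
    k = np.arange(n)[:, None]          # basis-function index
    i = np.arange(n)[None, :]          # depth index
    T = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    T[0, :] /= np.sqrt(2.0)            # rescale the constant row for orthonormality
    return T.T

def bcs_posterior(y, idx, B, alpha, c=1e-4, d=1e-4):
    """Posterior mean and covariance of the profile for fixed alpha, eqs. (3)-(6)."""
    M = len(y)
    A = B[idx, :]                              # A = Psi B: rows of B at measured depths
    H_inv = A.T @ A + np.diag(alpha)           # H^{-1} = A^T A + D
    H = np.linalg.inv(H_inv)
    mu_s = H @ A.T @ y                         # eq. (3)
    c_n = M / 2 + c
    d_n = d + (y @ y - mu_s @ H_inv @ mu_s) / 2
    cov_s = d_n * H / (c_n - 1)                # eq. (4)
    mu_f = B @ mu_s                            # eq. (5)
    cov_f = B @ cov_s @ B.T                    # eq. (6)
    return mu_f, cov_f

# Synthetic example (all values are made up for illustration).
N, M = 64, 12
B = dct_basis(N)
depth = np.linspace(2.0, 4.5, N)
f_true = 2.0 + 0.3 * np.sin(2.0 * depth)       # hypothetical smooth profile
idx = np.linspace(0, N - 1, M).round().astype(int)
y = f_true[idx]
alpha = 1e-3 * (1.0 + np.arange(N)) ** 2       # hand-picked, penalizes high frequencies
mu_f, cov_f = bcs_posterior(y, idx, B, alpha)
sd_f = np.sqrt(np.clip(np.diag(cov_f), 0.0, None))  # one-standard-deviation band
```

Here `mu_f` plays the role of the mean profile and `mu_f ± sd_f` the one-standard-deviation interval; with hand-picked `alpha` the result is only indicative of how the equations fit together.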
Fig. 2. Interpreted qc profiles from BCS method for different number M of measured data points: (a) M = 10; (b) M = 15; (c) M = 20; (d) M = 60 for CPT No. CD5. [Colour online.]
[Four panels plotting qc (MPa) against depth (m). Each panel shows the original qc profile, the BCS mean qc profile, the BCS mean qc profile ± 1σ, and the measurement data y. Annotations recovered from the plot: RE = 3.71% in panel (c) and RE = 1.90% in panel (d).]
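The relative error of eq. (1) and the repetition of steps 2–5 over several sample sizes can be sketched as follows. This is a minimal illustration under stated assumptions: linear interpolation (`np.interp`) stands in for BCS, the measured points are taken at approximately equal spacing, and the profile `x_true` is synthetic. Only the depth range (2.01 to 4.56 m) and the 512 data points are borrowed from Table 1; the function names are hypothetical.

```python
import numpy as np

def relative_error(x_true, x_hat):
    """Relative error of eq. (1), in percent."""
    return 100.0 * np.linalg.norm(x_true - x_hat) / np.linalg.norm(x_true)

def re_versus_m(depth, x_true, m_values):
    """Steps 2-5: subsample M points, interpolate, and compute RE for each M."""
    n = len(x_true)
    results = {}
    for m in m_values:
        # Approximately equally spaced measurement locations, including both ends.
        idx = np.unique(np.linspace(0, n - 1, m).round().astype(int))
        x_hat = np.interp(depth, depth[idx], x_true[idx])  # stand-in for BCS
        results[m] = relative_error(x_true, x_hat)
    return results

rng = np.random.default_rng(0)
depth = np.linspace(2.01, 4.56, 512)
# Hypothetical profile: smooth trend plus small-scale fluctuation.
x_true = 2.2 + 0.3 * np.sin(3.0 * depth) + 0.05 * rng.standard_normal(512)
re_by_m = re_versus_m(depth, x_true, [10, 15, 20, 60, 512])
```

When M equals N every point is measured, so the interpolated profile coincides with the original and the RE is zero, which matches the limiting behaviour described for the framework.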
tice. Because well-characterized and well-documented sites are rare in reality, random field simulation is used to generate a large number of typical sites that mimic spatially varying and correlated soil properties in real site conditions.

Parametric study using random field simulation data (stage 3)
Random field simulation of soil property profiles
Random fields have been successfully used to model spatially correlated geotechnical properties during the last several decades (e.g., Vanmarcke 1977; Vanmarcke et al. 1986; Phoon and Kulhawy 1999; Fenton and Griffiths 2008; Bong and Stuedlein 2018; Wang et al. 2018; Zhao and Wang 2018b). In this study, soil property profiles from a wide variety of statistically homogenous and stationary soil layers are simulated using random fields. Consider, for example, a one-dimensional stationary Gaussian random field X(D), where D is depth and X is a random variable representing a soil property of interest with a mean μ and a standard deviation σ. The spatial correlation between X(D) at different depths is modeled by a correlation function ρ(ΔD). A single exponential correlation function (SECF) is adopted in this study. For a SECF, the correlation function ρ(ΔD) between X(Di) and X(Dj) at the respective depth of Di and Dj may be expressed as

$$\rho(\Delta D) = \exp(-2|\Delta D|/\lambda) \tag{7}$$

in which λ is the correlation length, and ΔD is the separation distance between Di and Dj (= |Di – Dj|).
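A minimal sketch of sampling such a stationary Gaussian field follows: it builds the correlation matrix from eq. (7), scales it to a covariance matrix, and draws samples through an eigendecomposition, which is the Karhunen–Loève form the paper gives as eq. (8). The function name `simulate_profiles`, the depth grid, and the parameter values (μ = 50, σ = 10, λ = 2 m) are assumptions chosen for illustration.

```python
import numpy as np

def simulate_profiles(depths, mean, std, corr_len, n_samples, rng):
    """Draw random field samples with a single exponential correlation function."""
    delta = np.abs(depths[:, None] - depths[None, :])   # separation distances
    R = np.exp(-2.0 * delta / corr_len)                  # eq. (7)
    C = std**2 * R                                       # covariance matrix
    eigvals, V = np.linalg.eigh(C)                       # C = V diag(L) V^T
    eigvals = np.clip(eigvals, 0.0, None)                # guard against round-off
    Z = rng.standard_normal((len(depths), n_samples))    # standard Gaussian vectors
    return mean + V @ (np.sqrt(eigvals)[:, None] * Z)    # one profile per column

rng = np.random.default_rng(1)
depths = 0.1 * np.arange(64)          # hypothetical 0.1 m spacing
X = simulate_profiles(depths, mean=50.0, std=10.0, corr_len=2.0,
                      n_samples=2000, rng=rng)
```

Each column of `X` is one simulated soil property profile; points 1 m apart should show a sample correlation near exp(−1) for λ = 2 m.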
Fig. 3. Relationship between number, M, of measured data points and relative error for CPT No. CD5.
[Plot with vertical axis: relative error, RE (%).]

nent of R is calculated using eq. (7). Then X can be expressed as

$$\mathbf{X} = \mu\,\mathbf{I}_{N\times 1} + \mathbf{V}\sqrt{\mathrm{diag}[\mathbf{L}]}\,\mathbf{Z} \tag{8}$$

where V (= [v1, …, vN]) is the eigenvector matrix of the covariance matrix C; diag[L] is a diagonal matrix, with diagonal elements being eigenvalues of C; and Z is a standard Gaussian vector (e.g., Au and Wang 2014). In essence, eq. (8) is a Karhunen–Loève expansion of the random vector X, and random field samples of X may be generated using eq. (8) (e.g., Zhang and Ellingwood 1994; Huang et al. 2001; Wang et al. 2018).

Parametric study
To simulate a wide variety of typical sites, typical ranges of the three statistical parameters (i.e., μ, COV, and λ) for random field modeling of soil properties are considered and used in the parametric study. For example, Phoon and Kulhawy (1999) and Cao et al. (2016) summarized the typical ranges of μ and COV for a wide variety of soil properties, including undrained shear strength, friction angle, natural water content, plastic and liquid limits, unit weight, relative density, and field measurements (e.g., CPT; vane shear test, VST; standard penetration test, SPT). Generally speaking, the mean values range from 0.5 to 700, while the COV varies from 2% to 90%. The typical values of vertical correlation length for different soil properties range from 0.8 to 12.7 m (e.g., Phoon and Kulhawy 1999). Hence, these typical ranges of random field parameters are used in the parametric study. Table 2 summarizes the parameters used in the parametric study. The μ values range from 0.5 to 1000, while the COV values range from 1% to 100%. The values of vertical correlation length vary from 0.5 to 20 m. The thicknesses of the soil layers vary from 2.0 to 51.1 m, which cover the typical ranges of thickness for a homogeneous soil layer in geotechnical practice. In total, 504 random fields are simulated. For each random field, 100 random field samples are generated, and the total number of data points for each random field sample is N = 512. Then, 12 different sample sizes (i.e., M = 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, and 80) are adopted for each random field sample. The numbers of measurement data points ranging from 5 to 80 are used as input to BCS for interpreting the complete soil property profiles.

In stage 3, for each of the 504 random fields, 100 random field samples are generated, and steps 1–6 are repeatedly performed for each random field sample at 12 different sample sizes (i.e., step 7), resulting in 504 × 100 × 12 = 604 800 relative errors. Then, step 8 is repeatedly performed to estimate the mean relative error for each M value and random field, leading to 504 × 12 = 6048 μ_RE values. In the next section, the parametric study results (i.e., 6048 data pairs of μ_RE

COV. There are two possible scenarios leading to this small intercept. In the first scenario, the sampling interval ds approaches 0, which means that the soil properties are sampled and measured at every location. The small intercept in the vertical axis of μ_RE/COV reflects the nugget effect or the difference in soil properties at neighboring locations (e.g., Matheron 1963). In the second scenario, the correlation length λ is very large, which indicates that the soils are uniform within the layer and that the soil properties at different locations within the soil layer are perfectly correlated with each other. Therefore, the entire soil property profile can be represented by a single sample and measurement. The small intercept in the vertical axis of μ_RE/COV in this scenario represents the measurement error, which is also often referred to as nugget effect in geostatistics. Note that measurement error has been probabilistically modeled in BCS as a Gaussian random variable with a zero mean and unknown variance, which is subsequently integrated through a marginalization in the Bayesian framework. More discussion on the nugget effect is provided subsequently in this section.

On the contrary, when the normalized sampling interval ds/λ is large (e.g., larger than 3), the sample size is quite small and the sampling interval is much larger than the distance within which the measurement data are significantly correlated. In this case, ds/λ has negligible effect on μ_RE/COV, which plots more or less horizontally over different ds/λ values in Fig. 7. When ds/λ is larger than 3, reducing sampling interval (i.e., ds) or increasing sample size (i.e., M) does not improve the level of accuracy for the interpretation results. This suggests that there is a minimum sample size threshold or maximum sampling interval threshold for geotechnical site investigation. When the sample size is smaller than the minimum sample size threshold or the sampling interval is larger than the maximum sampling interval threshold (e.g., 3λ), changing the sample size or sampling interval has negligible influence on the level of accuracy for the interpretation results.

Note that the data pattern in Fig. 7 is very similar to a semi-variogram in geostatistics, which shows variation of a half of autocovariance with lag distance. Figure 8 shows a schematic of a typical semi-variogram where the experimental semi-variogram data (i.e., open circles) are fitted to an exponential function with a
Fig. 4. Interpreted qc profiles from BCS method for CPT No. CD1 to CD11 with different number M of measured data points.
[Ten panels (CPT No. CD1, CD2, CD3, CD4, CD6, CD7, CD8, CD9, CD10, and CD11) plotting qc (MPa) against depth (m). Each panel shows the original qc profile and the interpreted qc profiles for M = 10, M = 20, and M = 60.]

Fig. 5. Variation of relative error with sampling interval from 50 sets of real CPT data.
[Plot of relative error, RE (%), against sampling interval, ds (m), for M = 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, and 80.]

Fig. 6. Relationship between mean relative error and sampling interval.
[Plot of mean relative error, μ_RE (%), against sampling interval, ds (m).]
nugget component (i.e., solid line). As shown in Fig. 8, the semi-variogram monotonically increases as lag distance increases, and it approaches an asymptotic value, defined as sill, s, in geostatistics, when lag distance is larger than an effective range, a (e.g., Webster and Oliver 2007). The effective range is usually taken as the lag distance where the value of semi-variogram reaches 95% of its sill value. The semi-variogram contains a small positive intercept, ng, at the vertical axis, which is referred to as nugget effect in geostatistics for modeling measurement errors and spatial variation within the shortest sampling interval (e.g., Matheron 1963).
Given that Fig. 7 exhibits a pattern that is similar to an exponential semi-variogram, an exponential function is used in a re-
Table 2. Summary of parameters used in parametric study.
Parameter | Variation
Thickness (m) | 2.0, 5.0, 10.2, 20.4, 30.7, 40.9, and 51.1
Mean, μ | 0.5, 20, 40, 50, 60, 80, 100, and 1000
COV (%) | 1, 5, 10, 20, 30, 40, 50, and 100

[Fig. 7 plot annotations: equation of best-fit curve y = 0.157 + 0.975 × {1 − exp(−x/0.657)}, with y = μ_RE/COV and x = ds/λ; R² = 0.969. Horizontal axis: normalized sampling interval, ds/λ, from 0 to 6. Curves shown: best-fit curve, best-fit curve + 1σ, and best-fit curve − 1σ.]

where y = μ_RE/COV and x = ds/λ. The best-fit ng value in eq. (9) is 0.157, and the sill s = 0.157 + 0.975 = 1.132. The effective range a = 1.8 can be estimated from the best-fit line in Fig. 7. A dotted line and a dashed dotted line are also included in Fig. 7 to represent the best-fit curve plus or minus one standard deviation, respectively. The equations for the dotted and dashed dotted lines are expressed, respectively, as

$$y = 0.209 + 0.975\left[1 - \exp\left(-\frac{x}{0.657}\right)\right] \tag{10}$$

$$y = 0.105 + 0.975\left[1 - \exp\left(-\frac{x}{0.657}\right)\right] \tag{11}$$

The relationship between the normalized mean RE and normalized sampling interval shown in eqs. (9)–(11) establishes a connection between the sample size and level of accuracy for the interpretation results. This relationship may be further used to develop statistical chart for sample size determination with consideration of spatial variation and correlation of soil properties, as

Fig. 8. Illustration of an exponential semi-variogram model with fitting parameters.
[Plot of semi-variogram, γ(h), showing the fitted semi-variogram model and the experimental semi-variogram.]

Fig. 9. Statistical chart for sampling interval determination.
[Chart with axis: target normalized relative error, RE_T/COV. Curves shown: best-fit curve, best-fit curve + 1σ, and best-fit curve − 1σ.]
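The fitted curves can be evaluated directly. The sketch below codes the best-fit curve annotated on Fig. 7 and its ±1σ variants in eqs. (10) and (11), and solves for the normalized interval at which the curve reaches 95% of its sill. The function and constant names are hypothetical, and applying the 95% convention to the fitted curve is an assumption about how the effective range was read from the figure.

```python
import math

NUGGET, PARTIAL_SILL, RATE = 0.157, 0.975, 0.657   # best-fit values quoted for Fig. 7

def normalized_mean_re(x, nugget=NUGGET):
    """Best-fit curve: y = nugget + 0.975 * (1 - exp(-x / 0.657)), with x = ds/lambda."""
    return nugget + PARTIAL_SILL * (1.0 - math.exp(-x / RATE))

def effective_range(fraction=0.95, nugget=NUGGET):
    """Normalized interval at which the curve reaches `fraction` of its sill."""
    sill = nugget + PARTIAL_SILL
    return -RATE * math.log(1.0 - (fraction * sill - nugget) / PARTIAL_SILL)

sill = NUGGET + PARTIAL_SILL                     # 0.157 + 0.975
a = effective_range()                            # 95% of sill convention
upper = normalized_mean_re(1.0, nugget=0.209)    # eq. (10) evaluated at ds/lambda = 1
lower = normalized_mean_re(1.0, nugget=0.105)    # eq. (11) evaluated at ds/lambda = 1
```

Under the 95% convention this gives an effective range of about 1.87, in line with the value of roughly 1.8 quoted in the text.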
sample size with consideration of desired level of accuracy and spatial variation and correlation of soil properties.

To use the statistical chart in Fig. 9, users must first specify their target relative error RE_T. Target relative error is determined depending on the importance and risk of the project (e.g., failure consequences), as well as complexity of local geology. Then, the coefficient of variation, COV, and vertical correlation length, λ, of the soil properties concerned are estimated based on existing knowledge of the site and soil properties concerned. Such existing knowledge includes, but is not limited to, local experience on soil

calculated using the ds/λ value obtained from Fig. 9 and the estimated λ value. The required sample size M is calculated from the ds value and the thickness of soil layer.

Application examples
To demonstrate application of the proposed statistical chart, real CPT qc data and laboratory test result (i.e., liquid limit) are used as illustrative examples in this section. Note that CPT data are used in the first illustrative example, because CPT is performed in a nearly continuous manner. The CPT data then can be used for validating the interpretation results of measurement

Fig. 10. Comparison between original qc profile and qc profile interpreted from BCS method for real CPT data example. [Colour online.]
[Plot of qc (MPa) against depth (m).]
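The chart-reading procedure (specify RE_T, estimate COV and λ, obtain ds/λ, then compute ds and M) can be sketched by inverting the best-fit curve y = 0.157 + 0.975[1 − exp(−x/0.657)] annotated on Fig. 7. This is a hedged illustration, not the published chart: the function names are hypothetical, the input values are invented, and the conversion from ds to M (equally spaced samples that include both layer boundaries) is an assumption, since the text only states that M follows from ds and the layer thickness.

```python
import math

def normalized_interval(re_target, cov, nugget=0.157, partial_sill=0.975, rate=0.657):
    """Invert y = nugget + partial_sill * (1 - exp(-x / rate)) for x = ds/lambda."""
    y = re_target / cov                      # target normalized relative error
    if y <= nugget:
        raise ValueError("target is below the nugget level of the fitted curve")
    if y >= nugget + partial_sill:
        return math.inf                      # target looser than the sill: no limit from the curve
    return -rate * math.log(1.0 - (y - nugget) / partial_sill)

def sample_size(re_target, cov, corr_length, thickness):
    """Return the sampling interval ds (m) and the sample size M."""
    x = normalized_interval(re_target, cov)
    ds = x * corr_length
    # Assumed conversion: equally spaced samples including both layer boundaries.
    m = math.ceil(thickness / ds) + 1
    return ds, m

# Hypothetical inputs: RE_T = 10%, COV = 30%, lambda = 2 m, layer thickness = 10 m.
ds, m = sample_size(re_target=0.10, cov=0.30, corr_length=2.0, thickness=10.0)
```

Using the +1σ curve of eq. (10) instead (nugget 0.209) would give a smaller interval and hence a more conservative sample size.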
Fig. 11. Comparison between original and profile interpreted from normalized mean relative error approaches an asymptotic value
BCS method for liquid limit data example. [Colour online.] when the normalized sampling interval is larger than 3. A nugget
effect was observed from the dimensionless plot that represents
measurement errors and spatial variation in soil properties at
neighboring locations. There also exists a minimum sample size
threshold or maximum sampling interval threshold for geotech-
nical site investigation. When the sample size is smaller than the
minimum sample size threshold or the sampling interval is larger
than the maximum sampling interval threshold (e.g., 3), chang-
ing the sample size or sampling interval has negligible influence
Can. Geotech. J. Downloaded from cdnsciencepub.com by FAC CIENCIAS EST PROFESIONALES on 11/03/21
Acknowledgements
The work described in this paper was supported by grants from
the Research Grants Council of the Hong Kong Special Adminis-
trative Region, China (Project No. 9042516 (CityU 11213117) and
Project No. 8779012 (T22-603/15N)). The financial support is grate-
fully acknowledged. The authors would like to thank the mem-
bers of the TC304 Committee on Engineering Practice of Risk
Assessment and Management of the International Society of Soil
Mechanics and Geotechnical Engineering for developing the da-
tabase 304dB used in this study and making it available for scien-
tific inquiry. The authors also wish to thank M. Jaksa and A.W.
For personal use only.
References
tistical chart in this illustrative example, 12 wL values are taken Au, S.-K., and Wang, Y. 2014. Engineering risk assessment with subset simula-
from the 39 wL values with an approximately equal sampling tion. John Wiley and Sons, Singapore.
interval, and they are shown in Fig. 11 by solid circles. The 12 wL
values obtained are used as the measurement data, i.e., input to
BCS for interpolating a high-resolution liquid limit profile with
256 data points. Figure 11 also shows the mean liquid limit profile
interpreted from BCS by a dashed line and the mean plus or minus
one standard deviation profiles by two dotted lines. It is observed
from Fig. 11 that the dashed line follows a trend quite similar to
that of the open triangles (i.e., all 39 wL values), and that most of
the 39 wL data points (i.e., the open triangles) fall within the
dotted lines. The relative error RE between the 39 original liquid limit
values and the mean liquid limit values at the corresponding depths
obtained from BCS is calculated as 11.64%, which agrees closely
with the target relative error of 12%. This agreement suggests that
the sample size determined by the proposed statistical chart (i.e.,
Fig. 9) achieves the target level of accuracy for the results
interpreted from the given number of measurements. In summary, the
proposed statistical chart performs well in the illustrative
examples for both in situ and laboratory test data.
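The comparison of a computed relative error with a target value can be illustrated with a minimal sketch. It assumes that RE is the L2 norm of the difference between the measured and interpreted profiles divided by the L2 norm of the measured profile; this excerpt does not state the definition, so the formula, the function name `relative_error`, and the synthetic numbers below are assumptions for illustration, not the authors' procedure or data.

```python
import numpy as np


def relative_error(measured, interpreted):
    """Relative error between two equal-length profiles.

    Assumed definition (not confirmed by the text): ||measured - interpreted||_2
    divided by ||measured||_2.
    """
    measured = np.asarray(measured, dtype=float)
    interpreted = np.asarray(interpreted, dtype=float)
    if measured.shape != interpreted.shape:
        raise ValueError("profiles must be sampled at the same depths")
    return np.linalg.norm(measured - interpreted) / np.linalg.norm(measured)


# Synthetic stand-in data: 39 "measured" values and an "interpreted" mean
# profile at the same depths (hypothetical numbers, not the paper's data).
depths = np.linspace(0.0, 38.0, 39)
measured = 40.0 + 5.0 * np.sin(depths / 6.0)
interpreted = measured * 1.10  # uniform 10% overestimate, for illustration

re = relative_error(measured, interpreted)
target = 0.12  # target relative error of 12%
print(f"RE = {re:.2%}, target = {target:.0%}, meets target: {re <= target}")
```

With real data, `measured` would hold the original test values and `interpreted` the interpolated mean values at the same depths.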
Conclusion
This paper performed an extensive parametric study and developed
a quantitative method and statistical chart for sample size
determination in geotechnical site investigation with consideration
of spatial variation and correlation of soil properties. The
parametric study established a quantitative relationship between
the sample size M and the corresponding level of accuracy for the
results interpreted from the M measured data points. The parametric
study results were summarized using a dimensionless plot of the
normalized mean relative error RE/COV against the normalized sampling
interval ds/λ. The plot showed that the normalized mean relative
error increases as the normalized sampling interval increases from
0 to about 3, and that the
Huang, S., Quek, S., and Phoon, K. 2001. Convergence study of the truncated via orthogonal matching pursuit. IEEE Transactions on Information Theory,
Karhunen–Loeve expansion for simulation of stochastic processes. Interna- 53(12): 4655–4666. doi:10.1109/TIT.2007.909108.
tional Journal for Numerical Methods in Engineering, 52(9): 1029–1043. doi: Vanmarcke, E.H. 1977. Probabilistic modeling of soil profiles. Journal of the
10.1002/nme.255. Geotechnical Engineering Division, ASCE 103(11): 1227–1246.
Jaksa, M., and Kaggwa, W.S. 1994. A micro-computer based data acquisition Vanmarcke, E., Shinozuka, M., Nakagiri, S., Schueller, G., and Grigoriu, M. 1986.
system for the cone penetration test. Department of Civil and Environmental Random fields and stochastic finite elements. Structural Safety, 3(3–4): 143–
Engineering, University of Adelaide. 166. doi:10.1016/0167-4730(86)90002-0.
Jaksa, M., Kaggwa, W., and Brooker, P. 1999. Experimental evaluation of the Wang, Y., and Zhao, T. 2016. Interpretation of soil property profile from limited
scale of fluctuation of a stiff clay. In Proceedings of the 8th International measurement data: a compressive sampling perspective. Canadian Geotech-
Conference on the Application of Statistics and Probability, A.A. Balkema, nical Journal, 53(9): 1547–1559. doi:10.1139/cgj-2015-0545.
Sydney, Rotterdam. Vol. 1, pp. 415–422. Wang, Y., and Zhao, T. 2017. Statistical interpretation of soil property profiles
Ji, S., Xue, Y., and Carin, L. 2008. Bayesian compressive sensing. IEEE Transac- from sparse data using Bayesian compressive sampling. Géotechnique, 67(6):
tions on Signal Processing, 56(6): 2346–2356. doi:10.1109/TSP.2007.914345. 523–536. doi:10.1680/jgeot.16.P.143.
Can. Geotech. J. Downloaded from cdnsciencepub.com by FAC CIENCIAS EST PROFESIONALES on 11/03/21
Ji, S., Dunson, D., and Carin, L. 2009. Multitask compressive sensing. IEEE Trans- Wang, Y., Cao, Z., and Li, D. 2016. Bayesian perspective on geotechnical variabil-
actions on Signal Processing, 57(1): 92–106. doi:10.1109/TSP.2008.2005866. ity and site characterization. Engineering Geology, 203: 117–125. doi:10.1016/
Krejcie, R.V., and Morgan, D.W. 1970. Determining sample size for research j.enggeo.2015.08.017.
activities. Educational and Psychological Measurement, 30(3): 607–610. doi: Wang, Y., Akeju, O.V., and Zhao, T. 2017. Interpolation of spatially varying but
10.1177/001316447003000308. sparsely measured geo-data: a comparative study. Engineering Geology, 231:
Lacasse, S., and Nadim, F. 1996. Uncertainties in characterising soil properties. In 200–217. doi:10.1016/j.enggeo.2017.10.019.
Uncertainty in the geologic environment: From theory to practice. Publica- Wang, Y., Zhao, T., and Phoon, K.-K. 2018. Direct simulation of random field
tion No. 201. Norwegian Geotechnical Institute, Oslo, Norway, pp. 49–75. samples from sparsely measured geotechnical data with consideration of
Liza, R., Fenton, G.A., Lake, C.B., and Griffiths, D.V. 2017. An analytical approach to uncertainty in interpretation. Canadian Geotechnical Journal, 55(6): 862–
assess quality control sample sizes of cement-based “solidification/stabilization”. 880. doi:10.1139/cgj-2017-0254.
Canadian Geotechnical Journal, 54(3): 419–427. doi:10.1139/cgj-2016-0218. Webster, R., and Oliver, M.A. 2007. Geostatistics for environmental scientists.
Matheron, G. 1963. Principles of geostatistics. Economic Geology, 58(8): 1246– John Wiley and Sons, Chichester, England.
1266. doi:10.2113/gsecongeo.58.8.1246. Yamaguchi, U. 1970. The number of test-pieces required to determine the
Mayne, P.W., Christopher, B.R., and Dejong, J. 2002. Manual on subsurface in- strength of rock. International Journal of Rock Mechanics and Mining
vestigations. Nat. Highway Inst. Sp. Pub. FHWA NHI-01-031. Federal Highway Sciences & Geomechanics Abstracts, 7: 209–227. doi:10.1016/0148-9062(70)
Administration, Washington, D.C. 90013-6.
Orr, T.L.L. 2017. Defining and selecting characteristic values of geotechnical Zhang, J., and Ellingwood, B. 1994. Orthogonal series expansions of random
parameters for designs to Eurocode 7. Georisk: Assessment and Management fields in reliability analysis. Journal of Engineering Mechanics, 120(12): 2660–
of Risk for Engineered Systems and Geohazards, 11(1): 103–115. doi:10.1080/ 2677. doi:10.1061/(ASCE)0733-9399(1994)120:12(2660).
17499518.2016.1235711. Zhao, T., and Wang, Y. 2018a. Interpretation of pile lateral response from deflec-
Phoon, K.K. 2017. Role of reliability calculations in geotechnical design. Georisk: tion measurement data: a compressive sampling-based method. Soils and
Assessment and Management of Risk for Engineered Systems and Geohazards, Foundations, 58(4): 957–971. doi:10.1016/j.sandf.2018.05.002.
11(1): 4–21. doi:10.1080/17499518.2016.1265653. Zhao, T., and Wang, Y. 2018b. Simulation of cross-correlated random field sam-
For personal use only.
Phoon, K.-K., and Kulhawy, F.H. 1999. Characterization of geotechnical variabil- ples from sparse measurements using Bayesian compressive sensing. Me-
ity. Canadian Geotechnical Journal, 36(4): 612–624. doi:10.1139/t99-038. chanical Systems and Signal Processing, 112: 384–400. doi:10.1016/j.ymssp.
Stuedlein, A.W., Kramer, S.L., Arduino, P., and Holtz, R.D. 2012. Geotechnical 2018.04.042.
characterization and random field modeling of desiccated clay. Journal of Zhao, T., Montoya-Noguera, S., Phoon, K.K., and Wang, Y. 2018. Interpolating
Geotechnical and Geoenvironmental Engineering, 138(11): 1301–1313. doi:10. spatially varying soil property values from sparse data for facilitating char-
1061/(ASCE)GT.1943-5606.0000723. acteristic value selection. Canadian Geotechnical Journal, 55(2): 171–181. doi:
Tropp, J.A., and Gilbert, A.C. 2007. Signal recovery from random measurements 10.1139/cgj-2017-0219.