2015 Jacques MONNET Et Al, THE USE OF CLUSTER ANALYSIS FOR IDENTIFICATION OF SOIL LAYER THE GRAYS HARBOR PONTOON CONSTRUCTION PROJECT PDF
ABSTRACT - Cluster analysis is a statistical method used for grouping data which have
similar mathematical characteristics. We present here an analysis of pressuremeter tests using
the cluster method, with pLM, Ee and Ee/pLM as the statistical variables. The results show
the relevance of these variables for identifying the soil horizons and for characterizing the various
geological layers of the Grays Harbor site.
RÉSUMÉ - L'analyse par cluster est une méthode statistique utilisée pour regrouper les
données qui ont des caractéristiques mathématiques similaires. Nous présentons ici une
analyse des essais pressiométriques selon la méthode de cluster, qui utilise comme variables
statistiques: pLM; Ee; Ee / pLM. Les résultats indiquent la pertinence de ces variables pour
identifier les horizons du sol, et caractériser les différentes couches du site de Grays Harbor.
1. Introduction
The study of soil stratification is a necessary step in geotechnical site characterization. It involves
the mapping of horizontal boundaries and the identification of soil layers. It is assumed that in
each layer the mechanical characteristics of the soil are homogeneous. Therefore, if pressuremeter
tests show that materials have similar mechanical characteristics, these materials can be grouped
into soil types, which allows the different soil layers to be identified. This hypothesis is not new:
it was used by Robertson and Wride (1998) to determine soil stratigraphy from the cone
penetrometer resistance qc and the friction ratio Rf.
Cluster analysis (Everitt, 1974) is a statistical method used for grouping data which
have similar mathematical characteristics into subsets with common or homogeneous values,
which can then be used for the stratification analysis. This method can be applied to
various kinds of data for which no division is known a priori, and is useful for the
identification of soil horizons in site characterization. Cluster analyses for in-situ
applications are composed of the following six steps: selection of variables; standardization of
the data; computation of the resemblance (distance) matrix; choice of clustering technique;
determination of the number of clusters; interpretation of cluster results.
The grouping of the data in clusters can be carried out along a vertical profile (Saboya and
Balbi, 2003), but also in given planes (Mlynarek and Lunne, 1997). In both cases the method
of calculation is identical; the only difference is whether the problem is treated along an axis
or in a plane. Using this method it is possible to: objectively define similar groups in the soil profile;
delineate the layer boundaries; allocate lenses within a sub-layer; determine the mean
mechanical values of each layer.
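As an illustration of how layer boundaries and mean values can be read from a clustered vertical profile, here is a minimal Python sketch. The function name `delineate_layers`, the input layout and the sample numbers are hypothetical and are not taken from the Grays Harbor data.

```python
from itertools import groupby
from statistics import mean


def delineate_layers(depths, labels, values):
    """Group consecutive depths that share a cluster label into layers.

    Returns one dict per layer with the cluster label, the top and bottom
    depths, and the mean of `values` inside the layer.
    """
    rows = list(zip(depths, labels, values))
    layers = []
    # groupby merges only *consecutive* rows with the same label, so a
    # cluster that reappears deeper is reported as a separate layer (lens).
    for label, group in groupby(rows, key=lambda row: row[1]):
        group = list(group)
        layers.append({
            "cluster": label,
            "top": group[0][0],
            "bottom": group[-1][0],
            "mean": mean(row[2] for row in group),
        })
    return layers


# Hypothetical profile: depth (m), cluster label, measured value (kPa)
depths = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = [1, 1, 2, 2, 1, 1]
values = [400.0, 600.0, 1000.0, 1200.0, 500.0, 700.0]

layers = delineate_layers(depths, labels, values)
for layer in layers:
    print(layer)
```

With this hypothetical input the profile splits into three layers, the middle one (cluster 2, between 3 m and 4 m) playing the role of a lens inside cluster 1.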
Various types of in-situ tests can be treated in this manner, such as the cone penetrometer (CPT)
or the piezocone (CPTU) (Hegazy and Mayne, 2002). Recent applications relate mainly to the
piezocone, but some relate to pressuremeter measurements (Baud, 2005; Baud and Gambin,
2008; Monnet et al., 2003; Monnet et al., 2006; Monnet and Allagnat, 2006; Monnet and
Broucke, 2012). We present here a cluster analysis of pressuremeter tests, which uses the
variables pLM, Ee and Ee/pLM. The results concern the relevance of these variables to the
identification of soil horizons, and the best possible combination of data needed to
characterize the various geological layers of the Grays Harbor site, Washington State, USA.
ISP7-PRESSIO 2015. Frikha, Varaksin & Gambin(Eds.) © 2015
Data standardization is a way to use and represent data in a dimensionless space, so that the
effects of the different units of each variable are reduced. From a theoretical point of view
standardization is not essential to the cluster method, but it is a logical step to work with
data that have a mean value of zero and a standard deviation of one. At a depth i, the
measurement of variable j (j varying from 1 to p) is noted xij. The standardized
variable zij corresponding to xij is determined by the relation:
$$ z_{ij} = \frac{x_{ij} - \bar{x}_j}{S(x_j)} \qquad (1) $$
where $\bar{x}_j$ and $S(x_j)$ are respectively the mean value and the standard deviation of the variable j,
computed over $x_{ij}$ with i varying from 1 to n.
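Relation (1) can be sketched in a few lines of Python. The function name `standardize` and the sample values are hypothetical; the use of the sample standard deviation (divisor n - 1) is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev


def standardize(column):
    """Return the standardized values z = (x - mean) / S for one variable.

    `column` holds the n measurements of a single variable (one per depth).
    Assumption: S is the sample standard deviation (divisor n - 1).
    """
    m = mean(column)
    s = stdev(column)
    return [(x - m) / s for x in column]


# Hypothetical measurements of one variable at three depths
x = [100.0, 200.0, 300.0]
z = standardize(x)
print(z)  # mean 0 and standard deviation 1 by construction
```

Each variable is standardized separately, so that variables expressed in different units contribute on the same scale to the distance between data points.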
2.2. Similarity between data and concept of distance between data series
The similarity between two data series can be evaluated by at least seven different
methods: Pearson; Manhattan; Minkowski; Power; Euclidean; Chebyshev; Cosine (Norusis,
1988). There is no specific rule for choosing how to measure the similarity between data. According
to Hegazy and Mayne (2002) and Mlynarek et al. (2005), the best method to estimate the
deviations between data points is the Cosine method (2) rather than the a priori simpler
Euclidean method (3):
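$$ d_{ij} = \frac{\sum_{k=1}^{p} z_{ik}\, z_{jk}}{\sqrt{\sum_{k=1}^{p} z_{ik}^{2} \; \sum_{k=1}^{p} z_{jk}^{2}}} \qquad (2) $$
$$ d_{ij} = \sqrt{\sum_{k=1}^{p} \left( z_{ik} - z_{jk} \right)^{2}} \qquad (3) $$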
It can easily be seen that the Euclidean relation gives a null result for two identical data
points in the same cluster, and produces a large deviation for two different points in two
different clusters. The Cosine method was chosen to measure the deviation between data
because its result can be read as a correlation. The result produced by
relation (2) lies between -1 and +1, with +1 corresponding to identical data,
-1 to anti-correlated data and 0 to uncorrelated data, whereas relation (3) has no
bound that would identify an uncorrelated series of data.
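The two measures of relations (2) and (3) can be compared with a minimal Python sketch. The function names and the sample points are hypothetical; each point is a list of p standardized values.

```python
import math


def cosine_measure(zi, zj):
    """Relation (2): cosine of the angle between two data points.

    Undefined (division by zero) if one of the points is the null vector.
    """
    dot = sum(a * b for a, b in zip(zi, zj))
    norm = math.sqrt(sum(a * a for a in zi) * sum(b * b for b in zj))
    return dot / norm


def euclidean_measure(zi, zj):
    """Relation (3): Euclidean distance between two data points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(zi, zj)))


a = [1.0, 2.0]
b = [2.0, 4.0]    # proportional to a
c = [-1.0, -2.0]  # opposite to a

print(cosine_measure(a, b))     # 1.0
print(cosine_measure(a, c))     # -1.0
print(euclidean_measure(a, a))  # 0.0
print(euclidean_measure(a, b))  # sqrt(5), with no upper bound in general
```

The example also shows a property worth keeping in mind: relation (2) returns 1 for any two proportional points, not only for strictly identical ones, whereas relation (3) returns zero only for identical points.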
Hegazy (1998) tested various methods of regrouping data and finally recommended the
nearest-neighbor method, which satisfies the mathematical conditions of continuity and
minimum distortion. It is this method which is used here for the analysis of the pressuremeter
tests.
The method is organized as shown in Figure 1. At the first step, for a set of n given data,
the two closest values are determined (dij minimum for the Euclidean distance or dij
nearest to 1 for the Cosine distance). If the deviation is above the threshold (Cosine method) or
below the threshold (Euclidean method), the second point joins the cluster of the first point. In
the following steps, the two points being analyzed are either both outside any existing cluster,
in which case they form a new cluster, or one of the points belongs to an existing cluster,
in which case the second point joins the cluster of the first point. The program controls the
condition of keenness by checking that the distance between the mean values of two different
clusters is under (Cosine method) or above (Euclidean method) the threshold, so that these two
clusters are related to different soil layers. Finally the program controls the condition of
homogeneity by checking that the distance between each point of a cluster and the mean value
of this cluster is above (Cosine method) or below (Euclidean method) the threshold.
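The grouping step described above can be sketched as a threshold-based nearest-neighbor (single-linkage) clustering. This is a simplified illustration under stated assumptions, not the program used for the study: the names `single_linkage` and `euclidean` and the sample points are hypothetical, and the keenness and homogeneity checks are left out.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two data points, as in relation (3)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points, threshold, distance=euclidean):
    """Join any two points whose distance is at most `threshold`.

    Returns one label per point: cluster numbers start at 1, and a point
    with no neighbor within the threshold is left unclassified (None).
    """
    n = len(points)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Examine the pairs from the closest to the farthest.
    pairs = sorted(
        (distance(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    for d, i, j in pairs:
        if d > threshold:
            break  # every remaining pair is farther apart
        parent[find(i)] = find(j)

    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1

    ids, labels = {}, []
    for i in range(n):
        root = find(i)
        if sizes[root] == 1:
            labels.append(None)  # isolated point: unclassified
        else:
            labels.append(ids.setdefault(root, len(ids) + 1))
    return labels


# Hypothetical standardized data points (two variables each)
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (3.0, 3.0), (3.1, 3.0), (10.0, 10.0)]
labels = single_linkage(points, threshold=0.5)
print(labels)  # [1, 1, 1, 2, 2, None]
```

For the Cosine method, one possible adaptation (an assumption of this sketch) is to pass a function returning one minus the cosine measure as `distance`, so that a minimum cosine of 0.95 becomes a maximum distance of 0.05.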
The variables used are numbered z1 to z3. We use dimensionless variables to
ensure that the units have no influence. z1 corresponds to pLM, which is usually used to
identify the shearing resistance of the soil; z2 corresponds to Ee, which yields a measurement of
the Young's modulus of the soil (Young's modulus measured on the unloading-reloading cycle);
and z3 corresponds to the ratio Ee/pLM used by Baguelin et al. (1978) to classify the soil.
The chosen variables should be independent from each other. To check this condition the
correlation coefficient was calculated with relation (4) where Cov(xi, xj) is the covariance of the
variables xi and xj :
$$ \rho_{ij} = \frac{\mathrm{Cov}(x_i, x_j)}{S(x_i)\, S(x_j)} \qquad (4) $$
Theoretically the correlation between independent variables is zero, and the correlation
between linearly linked variables is either 1 or -1. In practice, a correlation below a threshold
of 0.65 is taken as acceptable for our 142 variables.
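The independence check of relation (4) can be sketched as follows. The function name `correlation`, the constant `THRESHOLD` and the sample series are hypothetical; comparing the absolute value of the coefficient with 0.65 is one reading of the criterion stated above.

```python
from statistics import mean, stdev


def correlation(x, y):
    """Relation (4): covariance divided by the product of standard deviations.

    Uses the sample covariance and sample standard deviation (divisor n - 1).
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))


THRESHOLD = 0.65  # acceptance threshold quoted in the text

# Hypothetical series: y is exactly linear in x, w decreases linearly with x
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
w = [8.0, 6.0, 4.0, 2.0]

for name, series in (("y", y), ("w", w)):
    rho = correlation(x, series)
    verdict = "acceptable" if abs(rho) < THRESHOLD else "too strongly linked"
    print(name, round(rho, 3), verdict)
```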
The Grays Harbor site investigation (Figure 2) was performed in Washington State, USA, in
order to construct a facility capable of building floating pontoons for the Highway 520 Bridge in
Seattle, Washington. From the technical and economic points of view, the project must be
designed and dimensioned to suit the sub-soil constraints, which must therefore be
identified. The usual method of investigation is to take undisturbed samples and
test them in a laboratory. Unfortunately the soils of the Grays Harbor site, which
are generally sandy, silty and clayey, lie below the water table and are difficult to
sample. The pressuremeter test is useful because it characterizes the
soil in its natural structure at the depth where the test is performed. This
measurement technique was therefore used for the study of the site.
The description of the site investigation and the characteristics of the main geological
formations are shown in Figure 3. The scope of work consisted of 15 boreholes for sampling
(to depths of 49 m max), 23 cone penetrometer tests with qc and Rf measurements (to depths
of 49 m), and 4 pressuremeter boreholes with Ee and pl measurements (to depths of 45 m max).
Figure 2. Map of the section profile and location of the boreholes of the Grays Harbor site
The results of the pressuremeter tests do not show any apparent organization (Figures 4 to 7), and the
laboratory tests are used for the classification of the soil.
$$ \frac{E_M}{E_{oed}} \qquad (5) $$
[Figures 4 to 7: profiles of Log(pl) and Log(Ee) versus Depth (m); horizontal axis from 100 to 1000000, depth from 0 to 45 m; series Pl and Ee.]
Table II shows that with these three variables the algorithm detects the 4 layers of the site.
The Cosine distance measurement identifies 4 different layers once the threshold value of 0.7
is reached, with a reasonably low number of unclassified data (1%). As the difference between
the families Unit 2 and Unit 3 is small in terms of the mean values of the pressuremeter modulus
and of the net limit pressure, it was decided to increase the threshold value of the correlation to
0.95.
In Table III, the analysis in Euclidean deviation is presented. The analysis performed using
the Euclidean distance tends to be the most sensitive one: it is difficult to reduce the level of
unclassified data while the number of clusters remains high. If we consider a normal
distribution for these results, a common value of the deviation from the mean is 0.67 of the
standard deviation, within which 50% of the data lie. This deviation is required to identify 4 layers.
For these two values the number of unclassified data remains larger (6%) than the unclassified
data of the Cosine method (4%).
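The link between a deviation of 0.67 standard deviation and 50% of the data can be checked numerically, assuming a normal distribution. The function name `fraction_within` is hypothetical; the only library call is `math.erf` from the Python standard library.

```python
import math


def fraction_within(k):
    """Fraction of a normal population lying within k standard deviations
    of the mean, on either side."""
    return math.erf(k / math.sqrt(2.0))


# About one half of the data lie within 0.67 standard deviation of the mean.
print(round(fraction_within(0.67), 2))
# For comparison, one full standard deviation covers roughly 68%.
print(round(fraction_within(1.0), 2))
```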
Table IV: Results of the geotechnical inverse analysis for the Cosine method; dij minimum 0.95; 2 variables
Family   Description       Cluster   Number of values   Ee (kPa)   pl (kPa)
Unit 1   Silts             1         17                 478        9915
Unit 2   Silty Sands       1         21                 478        9915
                           2         9                  1233       34885
                           3         3                  1530       103421
                           4         2                  1187       20402
Unit 3   Silts             2         24                 1233       34885
                           3         7                  1530       103421
                           4         20                 1187       20402
                           5         19                 1006       13397
                           6         3                  1096       16317
Unit 4   Sand and Gravel   2         3                  1233       34885
                           4         2                  1187       20402
                           5         2                  1006       13397
Table V: Results of the geotechnical inverse analysis for the Euclidean method; dij maximum 0.67; 2 variables
Family   Description       Cluster   Number of values   Ee (kPa)   pl (kPa)
Unit 1   Silts             1         3                  623        13576
                           3         14                 225        3401
Unit 2   Silty Sands       1         18                 623        13576
                           2         17                 1068       20995
                           4         3                  1364       42701
Unit 3   Silts             2         56                 1068       20995
                           4         9                  1364       42701
                           5         4                  1368       17236
Unit 4   Sand and Gravel   2         4                  1068       20995
                           4         3                  1364       42701
mechanical characteristics), Unit 2 is made of two very different clusters (clusters 1 and 2),
which have rather different pressuremeter moduli and limit pressures, where closer values are
expected. Once again, this can be interpreted as an interface between Units 1 and 2
lying lower than assumed by the driller. Unit 3 is found to be mainly composed of 2 different clusters (2,
4) with the same pressuremeter modulus, but with different limit pressures.
On borehole TH28 (Figure 8), with the Cosine method, the results show a correspondence
between Unit 2 (Silty Sands) and clusters 2 and 1 for the upper 25 m. The fact that Unit 3 is
divided into 3 different clusters (clusters 2, 4 and 5) shows its heterogeneity, which is seen in
Table IV. The interface between Unit 2 and 3 appears at 25 m.
On borehole TH28 (Figure 9), with the Euclidean method, the results show a
correspondence between Unit 2 (Silty Sands) and cluster 2 above 25 m. The interface
between Unit 2 and 3 is not detected. The fact that Unit 3 is divided into 3 different clusters
(clusters 2, 4 and 5) shows its heterogeneity, which is seen in Table V.
Figure 8. Comparison between the cluster classification and the driller's description; borehole TH28; Cosine method; dij minimum 0.95; 2 variables
Figure 9. Comparison between the cluster classification and the driller's description; borehole TH28; Euclidean method; dij maximum 0.67; 2 variables
Figure 10. Comparison between the cluster classification and the driller's description; borehole TH29; Cosine method; dij minimum 0.95; 2 variables
Figure 11. Comparison between the cluster classification and the driller's description; borehole TH29; Euclidean method; dij maximum 0.67; 2 variables
Figure 12. Comparison between the cluster classification and the driller's description; borehole TH30; Cosine method; dij minimum 0.95; 2 variables
Figure 13. Comparison between the cluster classification and the driller's description; borehole TH30; Euclidean method; dij maximum 0.67; 2 variables
[Axes in each figure: Cluster (0 to 10) and Depth (0 to 45).]
On borehole TH29, the Cosine method (Figure 10) shows a correspondence between Unit 1
(Silt) and cluster 1 for the upper 9 m. The interface between Unit 1 and 2 is found at 18 m, where
the driller finds it at 9 m. Unit 2 (Silty sand) is divided into 2 different clusters (clusters 1 and 3)
and Unit 3 shows its heterogeneity by its division into 3 different clusters (2, 4 and 5), which is
seen in Table IV.
On borehole TH29, the Euclidean method (Figure 11) finds the same interface between Unit
1 and 2 as the driller, at 9 m depth. It finds a correspondence between Unit 1 and cluster 3
above 9 m. It also finds a correspondence between Unit 2 (Silty Sands) and 2 clusters (1 and 2).
For the lower part of the borehole, below 25 m, Unit 3 (Silts) appears in correspondence with
cluster 2, but the interface between Unit 2 and Unit 3 is not detected at 25 m depth.
On borehole TH30, the Cosine method (Figure 12) shows a correspondence between Unit 1
(Silt) and cluster 1 for the upper 12 m. The interface between Unit 1 and 2 is not found. Unit 2
(Silty sand) shows a correspondence with cluster 1. The fact that Unit 3 is divided into 3
different clusters (clusters 2, 4 and 5) shows its heterogeneity, as seen in Table IV. The pocket
of cluster 4 is found in correspondence with cluster 2.
On borehole TH30, the Euclidean method (Figure 13) succeeds in identifying an interface
between Unit 1 and Unit 2 at 9 m depth, where the driller finds the interface
at 12 m depth. It finds a correspondence between Unit 2 (Silty Sands) and cluster 1. For the
lower part of the borehole, below 25 m, Unit 3 (Silts) appears in correspondence with clusters 2
and 4. The pocket of cluster 4 is found in correspondence with cluster 4.
4. Conclusions
The analysis of the pressuremeter tests of the pontoon construction project in Grays Harbor,
Washington, was carried out by the cluster method. This study shows that the use of the 2
variables pl and Ee allows the different soil layers to be identified. The measurement of the
deviation between the data by the Euclidean method sometimes appears incomplete, whereas the
Cosine deviation seems simpler to use and able to point out the differences between
the geological layers of the soil. It allows the determination of the mean value of the mechanical
characteristics of each layer, the position of the interfaces and the thickness of the different
layers of the site.
5. References
Baguelin F., Jezequel J.F., Shields D.H. (1978) The pressuremeter and foundation engineering, Trans. Tech.
Publications, Aedermannsdorf, Switzerland.
Baud J.-P. (2005) A [Log(pl), Log(EM/pl)] diagram for spectral analysis of Menard PMT results and its application
to geotechnical site survey, ISP5, Presses des Ponts et Chaussées, Paris, 167-186.
Baud J.-P., Gambin M. (2008) Homogenizing MPM tests curves by using a Hyperbolic model, ISC'3, A.-B., Taylor
and Francis, London.
Everitt B. (1974) Cluster analysis, Halsted-Wiley, N.Y.
Hegazy Y.A. (1998) Delineating geostratigraphy by cluster analysis of piezocone data, Ph.D. thesis, School of Civil
and Environmental Engineering dept, Georgia Inst. of Tech., Atlanta.
Hegazy Y.A., Mayne P.W. (2002) Objective site characterization using clustering of piezocone data, Journal of
Geotechnical and Geoenvironmental Engineering, 128 (12), 986-996.
Mlynarek Z., Lunne T. (1997) Statistical estimation of the homogeneity of North Sea overconsolidated clay, Proc. of
Int. Conf. on Statistical and Application Probability, Vancouver, Canada.
Mlynarek Z., Wierzbicki J., Wolynski W. (2005) Use of cluster method for in situ tests, Studia Geotechnica et
Mechanica, Vol. XXVII (3–4), 15-27.
Monnet J., Chapeau C., Godard G. (2003) Caractérisation des sols pour le tunnel de la Rocade Nord de Grenoble,
Revue Française de Géotechnique (101), 57-70.
Monnet J., Allagnat D., Teston J., Billet P., Baguelin F. (2006) Foundation design for a large arch bridge on alluvial
soils, Proceeding of the Institution of the Civil Engineers - Geotechnical Engineering, London, vol. 159, n°1,
19-28.
Monnet J., Allagnat D. (2006) Interpretation of pressuremeter results for the design of diaphragm wall, Geotechnical
Testing Journal, American Soc. Testing Materials, Boston, vol. 29, n°2, 126-132.
Monnet J., Broucke M. (2012) The use of a cluster analysis in a Ménard pressuremeter survey, ICE - Geotechnical
Engineering, London, Volume 165, Issue 6, 367-377.
Norusis M.J. (1988) The SPSS guide to data analysis for SPSS, SPSS Inc., Chicago, 402.
Robertson P.K., Wride C.E. (1998) Evaluating cyclic liquefaction potential using the cone penetration test, Canadian
Geotechnical Journal (35), 442–459.
Saboya F., Balbi D.J.G. (2003) Soil profile interpretation based on similarity concept for CPTU data, Proc. ISC-2 on
Geot. and Geophys. Site Charact., Porto, Ed. V. de Fonseca, P. Mayne.