
Engineering with Computers (2019) 35:955–965

https://doi.org/10.1007/s00366-018-0643-1

ORIGINAL ARTICLE

A swarm intelligence-based machine learning approach for predicting
soil shear strength for road construction: a case study at Trung Luong
National Expressway Project (Vietnam)
Dieu Tien Bui1,2 · Nhat‑Duc Hoang3 · Viet‑Ha Nhu4

Received: 2 July 2018 / Accepted: 11 September 2018 / Published online: 15 September 2018
© Springer-Verlag London Ltd., part of Springer Nature 2018

Abstract
Determining the shear strength of soil is an important task in the design phase of a construction project. This study puts forward an artificial intelligence (AI) solution to estimate this soil parameter. The proposed approach is a hybrid AI model that integrates the least squares support vector machine (LSSVM) and cuckoo search optimization (CSO). A dataset of 332 soil samples collected from the Trung Luong National Expressway Project in Vietnam was used to construct and validate the AI model. The sample depth, sand percentage, loam percentage, clay percentage, moisture content, wet density of soil, specific gravity, liquid limit, plastic limit, plastic index, and liquid index are used as input variables to predict the output variable, shear strength. In the hybrid AI framework, LSSVM is employed to generalize the functional mapping that estimates the shear strength from the aforementioned input variables. Since establishing an LSSVM model requires a proper setting of the regularization and kernel function parameters, the CSO algorithm is used to determine these parameters automatically. Experimental results show that the prediction accuracy of the hybrid LSSVM-CSO method (RMSE = 0.082, MAPE = 14.841%, and R² = 0.885) is better than that of the benchmark approaches, including the standard LSSVM, the artificial neural network, and the regression tree. Therefore, the proposed method is a promising alternative for assisting construction engineers in the task of soil shear strength estimation.

Keywords Soil · Shear strength · Expressway · Hybrid artificial intelligence · Optimization · Data-driven method

* Dieu Tien Bui
  buitiendieu@tdt.edu.vn

1 Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Vietnam
2 Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Vietnam
3 Faculty of Civil Engineering, Institute of Research and Development, Duy Tan University, P809‑03 Quang Trung, Da Nang 550000, Vietnam
4 Department of Geological‑Geotech Engineering, Hanoi University of Mining and Geology, No. 18 Pho Vien, Duc Thang, Bac Tu Liem, Hanoi, Vietnam

1 Introduction

Shear strength is defined as the capability of soils to withstand internal movement or slippage when subjected to an imposed load. This soil property is a crucial factor that is often employed in the design phase of many large-scale infrastructure projects, including highways, pavements, earth dams, retaining walls, and high-rise buildings [1]. The process of obtaining the soil shear strength in the laboratory is not only time-consuming but also costly [2]. This is due to instrument handling problems and the meticulous procedures needed to guarantee reliable measurement outcomes.

For these reasons, many studies have been dedicated to establishing alternative ways to obtain the shear strength of soils, including traditional formula-based methods [3] and advanced data-driven methods [4]. Although the formula-based methods can explicitly express the functional relationship between the shear strength and its conditional variables, the modeling accuracy of these traditional methods is limited. The first reason is that the soil shear strength is affected by many factors, including the soil density, plastic index, liquid limit, moisture content, and clay content [5]. The second is that the functional relationship between the shear strength of soils and those influencing factors can be highly nonlinear. Conventional formula-based approaches are restricted in both nonlinear and multivariate modeling;


they therefore cannot produce predictive results with satisfactory accuracy [6].

For these reasons, various scholars have recently resorted to artificial intelligence (AI) as an advanced data analysis tool to construct soil shear strength prediction models. AI-based models excel at nonlinear modeling and are capable of taking into account the many input variables that determine the value of soil shear strength. AI models are also flexible; they can adjust their model structures adaptively according to changes in the collected geotechnical data [7–11].

Kanungo et al. [12] employed an artificial neural network (ANN) and a regression tree (RT) to estimate unsaturated soil shear strength parameters, including cohesion and angle of internal friction. The percentages of gravel, sand, silt, and clay, the dry density, and the plasticity index were employed as input information. That study found that ANN performed better than RT. In another study, Kiran et al. [13] applied a probabilistic neural network to predict the shear strength parameters of soil. A model based on a functional network for estimating the residual strength of clay has been proposed by Khan et al. [14]. Hashemi Jokar and Mirasi [6] relied on an adaptive neuro-fuzzy inference system for modeling the shear strength of unsaturated soils. Pham et al. [4] recently utilized neuro-fuzzy inference models, support vector regression, and ANN to predict the shear strength of soft soils.

In general, the common finding of the recent literature is that AI methods are highly suitable for predicting soil shear strength. In addition, since the problem of interest is complicated, other advanced AI models should be investigated to identify better estimation models. The least squares support vector machine (LSSVM) is a capable tool for nonlinear and multivariate modeling. This AI approach has been applied successfully in geotechnical and geological engineering [15–21]. Nevertheless, none of the previous studies has employed LSSVM to predict the shear strength of soils. Therefore, the current study is an attempt to fill this gap in the literature.

Moreover, establishing an LSSVM model requires an appropriate setting of its hyperparameters, namely the regularization and the kernel function parameters. These two parameters strongly affect the outcome of the learning phase and therefore determine the predictive capability of the LSSVM-based soil shear strength estimation model. Specifying them is not a trivial task because they must be searched in continuous domains [22]; hence, there is an infinite number of candidate parameter sets.

Since the tuning of machine learning model parameters can be formulated as an optimization problem, various researchers have combined metaheuristic algorithms with machine learning models. Mohammad Emami et al. [23] relied on the teaching–learning-based optimization, imperialist competitive, and artificial bee colony metaheuristics for estimating the shear-wave velocity of sandstone. Prayogo and Susanto [16] combined symbiotic organisms search and the least squares support vector machine to enhance the predictive accuracy of models for the friction capacity of driven piles in cohesive soil. Particle swarm optimization and a genetic algorithm have been utilized in [4] to optimize the fuzzy neural networks employed for forecasting the shear strength of soft soil. A model that incorporates a support vector machine and an improved firefly algorithm has been proposed in [24] for predicting subbase soil modulus. These previous studies have demonstrated the efficacy of metaheuristic algorithms in modeling complex phenomena in geotechnical engineering.

Proposed in [25], cuckoo search optimization (CSO) is a novel and capable metaheuristic method whose performance can surpass that of particle swarm optimization and the genetic algorithm. This algorithm relies on the Lévy flights mechanism, which mimics the foraging path of an animal, to explore the search space [26]. Owing to this property, CSO has good explorative power and is capable of escaping from local minima. Good performance of CSO in complex optimization problems has been widely reported in the literature [27–29]. Therefore, this algorithm has increasingly attracted the interest of researchers [30–32]. Nevertheless, the application of CSO in soil shear strength estimation has not yet been investigated. Therefore, this study applies this metaheuristic to optimize the performance of the LSSVM model with respect to its hyperparameter selection.

The LSSVM and CSO methods are combined to construct an integrated AI model for soil shear strength prediction. To train and verify the hybrid AI approach, this study has collected 332 soil samples with shear strength test results. The data were recorded during the design and construction phases of the Trung Luong national expressway project in Vietnam. In total, eleven factors, namely sample depth, sand percentage, loam percentage, clay percentage, moisture content, wet density of soil, specific gravity, liquid limit, plastic limit, plastic index, and liquid index, are employed as input information.

The remainder of the study is organized as follows: The second section reviews the mathematical background of the employed AI algorithms, including LSSVM and CSO. The third section presents the collected dataset, followed by the description of the hybrid AI model used for soil shear strength prediction. The fifth section reports the experimental results, followed by some concluding remarks in the final section.


2 Mathematical background of the artificial intelligence algorithms

2.1 Least squares support vector machine (LSSVM)

LSSVM, proposed by Suykens et al. [33], is an advanced artificial intelligence (AI) approach which is established on the principle of structural risk minimization. The learning phase of LSSVM is fast since it only requires solving a set of linear equations. To construct the prediction model, a dataset of soil shear strength records is prepared in the following form: $D = \{x_k, y_k\}$, $k = 1, 2, \ldots, N$, where $k$ denotes the $k$th data sample and $N$ is the total number of data samples.

It is noted that $x_k$ is a vector of eleven elements; $x_{k1}, x_{k2}, x_{k3}, x_{k4}, x_{k5}, \ldots, x_{k11}$ denote the set of shear strength conditional factors. In addition, $y_k$ represents the shear strength of the $k$th soil sample. The objective of the LSSVM learning phase is to construct a mapping function $y(x)$ that estimates the response variable of soil shear strength based on a collected set of influencing variables $x$. The influencing variable $x$ contains the information of the soil sample needed for strength prediction. Because the mapping function $y(x)$ is possibly nonlinear, LSSVM deals with data nonlinearity by mapping the data from the original input space to a high-dimensional feature space; this data mapping is achievable through a mapping function $\phi(x)$ [34]. Thus, linear regression analysis can be feasibly done in such a high-dimensional feature space. The concept of LSSVM is demonstrated in Fig. 1.

Fig. 1 The LSSVM model for estimating soil shear strength

To establish an LSSVM model, it is necessary to solve the following optimization problem [33]:

$$\text{Minimize } J_p(w, e) = \frac{1}{2} w^T w + \gamma \frac{1}{2} \sum_{k=1}^{N} e_k^2 \tag{1}$$

$$\text{Subject to } y_k = w^T \phi(x_k) + b + e_k, \quad k = 1, \ldots, N,$$

where $e_k \in R$ is the $k$th error variable; $\gamma > 0$ denotes a regularization constant. $w$ and $b$ are the LSSVM model's parameters which specify the hyperplane used for function approximation. $\phi(x_k)$ represents a mapping function.

To find the solution of the aforementioned constrained optimization problem, the Lagrangian is formulated as follows:

$$L(w, b, e; \alpha) = J_p(w, e) - \sum_{k=1}^{N} \alpha_k \left\{ w^T \phi(x_k) + b + e_k - y_k \right\}, \tag{2}$$

where $\alpha_k$ denotes the Lagrange multipliers.

The Karush–Kuhn–Tucker conditions for optimality are obtained by differentiating the Lagrangian function $L(w, b, e; \alpha)$ with respect to the variables in the following manner:

$$
\begin{cases}
\dfrac{\partial L}{\partial w} = 0 \rightarrow w = \sum_{k=1}^{N} \alpha_k \phi(x_k) \\
\dfrac{\partial L}{\partial b} = 0 \rightarrow \sum_{k=1}^{N} \alpha_k = 0 \\
\dfrac{\partial L}{\partial e_k} = 0 \rightarrow \alpha_k = \gamma e_k, \quad k = 1, \ldots, N \\
\dfrac{\partial L}{\partial \alpha_k} = 0 \rightarrow w^T \phi(x_k) + b + e_k - y_k = 0, \quad k = 1, \ldots, N.
\end{cases} \tag{3}
$$

After the above linear system is solved, the final LSSVM model used for function approximation is expressed compactly as follows:

$$y(x) = \sum_{k=1}^{N} \alpha_k K(x_k, x_l) + b, \tag{4}$$

where $\alpha_k$ and $b$ denote the solution to the linear system. $k$ and $N$ represent the index and the total number of data samples in the training set. $x_k$ and $x_l$ are input patterns in the training and testing sets, respectively. $K(\cdot)$ denotes the kernel function which performs data mapping. The Radial Basis Function (RBF) kernel is widely utilized [35–37]; therefore, it is selected for this work. Its functional form is described as follows:

$$K(x_k, x_l) = \exp\left( -\frac{\lVert x_k - x_l \rVert^2}{2\sigma^2} \right), \tag{5}$$
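As a rough illustration of Eqs. (1)–(5), the sketch below trains an LSSVM regressor in Python with NumPy. It assumes the standard reduction of the conditions in Eq. (3): eliminating $w$ and $e_k$ leaves a linear system in $b$ and $\alpha$ whose first row enforces $\sum_k \alpha_k = 0$ and whose remaining rows read $\sum_l \alpha_l K(x_l, x_k) + b + \alpha_k / \gamma = y_k$. The function names (`rbf_kernel`, `lssvm_fit`, `lssvm_predict`) and the synthetic data are hypothetical; this is not the toolbox code used in the study.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """RBF kernel of Eq. (5) between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def lssvm_fit(X, y, gamma, sigma):
    """Solve the (N+1)x(N+1) linear system for the bias b and multipliers alpha."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # row enforcing sum_k alpha_k = 0
    A[1:, 0] = 1.0                      # bias term b in each constraint
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]


def lssvm_predict(X_train, b, alpha, X_new, sigma):
    """Eq. (4): y(x) = sum_k alpha_k K(x_k, x) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


# Synthetic stand-in data (NOT the soil dataset): 40 samples, 11 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 11))
y_demo = np.sin(X_demo[:, 0]) + 0.1 * rng.normal(size=40)

# Hyperparameter values taken from the results reported in Sect. 5.
gamma_demo, sigma_demo = 12.45, 3.72
b_demo, alpha_demo = lssvm_fit(X_demo, y_demo, gamma_demo, sigma_demo)
fitted = lssvm_predict(X_demo, b_demo, alpha_demo, X_demo, sigma_demo)
```

On the training data, the residuals `y_demo - fitted` equal `alpha_demo / gamma_demo`, which is the third condition of Eq. (3) and a convenient sanity check for an implementation.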


In Eq. (5), $\sigma$ represents the radial basis kernel function parameter.

2.2 Cuckoo search optimization (CSO)

CSO [25] is a recently developed metaheuristic algorithm inspired by the brood parasitism of cuckoo species. Equipped with Lévy flight random walks, CSO is an effective swarm intelligence-based algorithm for global optimization in continuous space [26]. Successful applications of this metaheuristic in tackling complex optimization problems have been observed in various engineering fields [27, 38–40].

In the CSO operation, a possible solution to the optimization problem at hand is represented by an egg. Each cuckoo can lay one egg; therefore, a cuckoo is associated with a solution in the search space. During the iterative process of exploiting and exploring the search space, a new and potentially better solution (cuckoo) is identified and replaces an inferior solution [38].

The CSO algorithm keeps a balance between the exploration and exploitation of the search space using both global and local random walks. The global random walk, which relies on Lévy flights, is employed to modify the position of a current solution; its mathematical equation is as follows:

$$x_i^{g+1} = x_i^{g} + \alpha \cdot \mathrm{Levy}, \tag{6}$$

where $\alpha > 0$ is the step length, Levy is a random number drawn from the Lévy distribution, and $g$ denotes the generation index.

In addition, the local random walk used in CSO aims at exploring the vicinity of a current solution $x_i^{g}$ in the $g$th generation via the following equation:

$$x_i^{g+1} = x_i^{g} + \alpha \cdot s \otimes H(p_a - \varepsilon) \otimes \left( x_j^{g} - x_k^{g} \right), \tag{7}$$

where $x_j^{g}$ and $x_k^{g}$ denote two randomly selected solutions; $H(\cdot)$ is the Heaviside step function; $\varepsilon \in [0, 1]$ denotes a uniformly distributed random number; $s$ represents the step length parameter; $\otimes$ is the entry-wise product of two vectors; and $p_a$ denotes the probability of local random walk [25].

3 Study area and data

The study area (Fig. 2) is the Trung Luong National Expressway section (Km 81–87), which is a part of the Ho Chi Minh–Can Tho Expressway in the Mekong River delta of Vietnam with a total length of 139 km. The expressway is under construction and is expected to be completed at the end of 2019. The road is designed for 4-lane traffic (road width of 25.5 m) in the first stage and for 6-lane traffic (road width of 33.0 m) in the completion stage. The design speed for the expressway is 120 km/h.

Fig. 2 Location of the Trung Luong National Expressway Project

This study employed AI methods as a reliable tool for estimating the shear strength of soils. Accordingly, a dataset has been collected during the geotechnical investigation phase of the Trung Luong expressway project in Vietnam,


section from Km 81 to 87. It should be noted that the geological survey and laboratory tests were carried out with reference to the Vietnam Standards (22TCN259, 262, 263–2000) for the expressway. In this section, 73 boreholes were drilled with a total drilling length of 2168.7 m. The deepest borehole reaches 70.5 m and the shallowest 12.5 m. To obtain the information of the soil layers, the Standard penetration test [41], Cone penetration test [42], and Vane shear test [43] were performed. In addition, based on the Unified Soil Classification System (USCS) [44], the soil in the study area can be classified into the categories of CL (lean clay), MH (elastic silt), and CH (fat clay).

In total, 332 samples of soil with the testing results of shear strength (Y) have been collected. Eleven variables including sample depth (X1), sand percentage (X2), loam percentage (X3), clay percentage (X4), moisture content (X5), wet density of soil (X6), specific gravity (X7), liquid limit (X8), plastic limit (X9), plastic index (X10), and liquid index (X11) are employed as input information. The statistical characteristics of the variables used in this study are shown in Table 1. Histograms of the variables in the collected dataset are provided in Fig. 3.

Table 1 Statistical description of variables

Variables            Unit      Notation  Min     Mean    Median  Std     Max
Sample depth         m         X1        1.00    16.88   13.50   12.79   52.00
Sand percentage      %         X2        0.00    8.13    2.40    12.77   57.30
Loam percentage      %         X3        20.00   54.15   56.35   12.22   83.70
Clay percentage      %         X4        10.40   37.42   37.70   11.52   63.90
Moisture content     %         X5        1.64    47.07   34.40   24.09   90.30
Wet density of soil  g/cm³     X6        1.07    1.78    1.85    0.22    2.15
Specific gravity     Unitless  X7        2.37    2.70    2.70    0.06    3.69
Liquid limit         %         X8        21.40   54.27   54.65   14.56   79.90
Plastic limit        %         X9        12.90   24.67   24.05   5.28    35.90
Plastic index        %         X10       6.60    29.60   29.50   9.79    49.70
Liquid index         Unitless  X11       0.00    0.66    0.70    0.48    1.73
Shear strength       kG/cm²    Y         0.06    0.35    0.27    0.27    1.04

Fig. 3 Histograms of variables

In addition, the eleven shear strength influencing factors of the dataset have been normalized by the Z score transformation. The data transformation aims to avoid the circumstance in which influencing factors with large magnitudes dominate those with small magnitudes. The Z score data transformation is mathematically described as follows:

$$X_N = \frac{X_O - m_X}{s_X}, \tag{8}$$

where $X_N$ and $X_O$ denote the normalized and the original shear strength influencing factors, respectively. $m_X$ and $s_X$ represent the mean value and the standard deviation of the original influencing factors, respectively.

4 Proposed hybrid artificial intelligence approach for predicting the shear strength of soil

This section of the study describes the proposed AI approach for estimating soil shear strength in detail. The proposed approach is a hybridization of the LSSVM and CSO algorithms. Therefore, the hybrid model is named LSSVM-CSO. LSSVM is employed to generalize the underlying functional relationship that computes the shear strength of soil samples based on a set of influencing factors. In addition, CSO serves as an optimizer to determine the LSSVM hyperparameters automatically. The overall model structure of LSSVM-CSO is provided in Fig. 4. The hybrid model is constructed in the MATLAB environment with the help of the LS-SVMlab Toolbox [45].

At the first iteration (Iter), the two hyperparameters ($\gamma$ and $\sigma$) of LSSVM are randomly generated within the range of the lower and upper boundaries by the following equation:

$$Par = LB + RN \times (UB - LB), \tag{9}$$

where $Par$ denotes the tuning parameter at the first iteration. $RN$ is a uniform random number generated within the range of 0 and 1. Based on the suggestion in the previous work of [15], $LB = 0.001$ and $UB = 1000$ are the lower and upper boundaries of the LSSVM parameters, respectively.

To identify the most appropriate set of the LSSVM hyperparameters (the regularization parameter $\gamma$ and the kernel function parameter $\sigma$), the original dataset has been divided into two mutually exclusive sets: Dataset 1 (80%) and Dataset 2 (20%). Dataset 1 is again separated into two subsets: the training (80%) and validating (20%) subsets. Dataset 1 is used for training and validating purposes. Dataset 2 serves as testing cases to assess the predictive capability of the constructed LSSVM model.

During the optimization process, the CSO algorithm employs its local and global random walks to explore the appropriateness of the model hyperparameters. The optimization algorithm gradually discards inferior values of $\gamma$ and $\sigma$; it preserves better values of the hyperparameters which lead to better prediction accuracy.

To identify better sets of the model hyperparameters, the following cost function (CF) is minimized by the CSO algorithm:

$$CF = \frac{RMSE_{Training} + RMSE_{Validating}}{2}, \tag{10}$$
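The data handling and the cost function of Eq. (10) can be sketched in Python as follows. This is a minimal illustration rather than the MATLAB code used in the study: the helper names, the rounding of subset sizes, and the use of the population standard deviation in the Z score are assumptions. The two terms of Eq. (10) are the root mean squared errors on the training and validating subsets of Dataset 1.

```python
import numpy as np


def zscore(X):
    """Eq. (8): column-wise (X_O - m_X) / s_X (population std is an assumption)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def nested_split(n_samples, rng, frac=0.8):
    """Return index arrays (train, validate, test) for the 80/20 then 80/20 split."""
    order = rng.permutation(n_samples)
    n_set1 = round(frac * n_samples)        # Dataset 1 (training + validating)
    n_train = round(frac * n_set1)          # training subset of Dataset 1
    return order[:n_train], order[n_train:n_set1], order[n_set1:]


def rmse(pred, actual):
    """Root mean squared error between predictions and actual values."""
    diff = np.asarray(pred, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def cost_function(pred_train, t_train, pred_val, t_val):
    """Eq. (10): average of the training and validating RMSE."""
    return 0.5 * (rmse(pred_train, t_train) + rmse(pred_val, t_val))


rng = np.random.default_rng(42)
train_idx, val_idx, test_idx = nested_split(332, rng)
print(len(train_idx), len(val_idx), len(test_idx))   # 213 53 66
```

Under these rounding assumptions, the 332 samples yield 213 training, 53 validating, and 66 testing samples; the study itself does not state the exact subset sizes.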
Fig. 4 The proposed LSSVM-CSO for predicting soil shear strength (flowchart: the collected dataset is split into Dataset 1, the training and validating data, and Dataset 2, the testing data; after tuning parameter initialization at Iter = 1, the CSO algorithm repeats the LSSVM training phase, the LSSVM prediction phase, and the cost function evaluation, setting Iter = Iter + 1 while the stopping condition is unsatisfied; once it is satisfied, the optimized LSSVM model produces the shear strength prediction result for the testing data)
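The optimization loop of Fig. 4 can be sketched with a compact cuckoo search in Python. Several details are assumptions rather than the authors' settings: Lévy steps are drawn with Mantegna's algorithm, the global step is scaled by the search range, a greedy replacement keeps the better of the old and new solution, the product of the step parameters in Eq. (7) is drawn as a single uniform random factor, and the values of `n_nests`, `alpha`, and `pa` are placeholders. The toy `demo_cost` merely stands in for the cost function of Eq. (10), which would require training an LSSVM model for every candidate pair of hyperparameters.

```python
import math
import numpy as np


def levy_steps(rng, shape, beta=1.5):
    """Heavy-tailed steps via Mantegna's algorithm (an assumed sampling scheme)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)


def keep_better(cost, nests, fitness, trials):
    """Greedy replacement: a trial solution replaces its nest only if it is better."""
    trial_fitness = np.array([cost(x) for x in trials])
    better = trial_fitness < fitness
    nests = np.where(better[:, None], trials, nests)
    return nests, np.where(better, trial_fitness, fitness)


def cuckoo_search(cost, lb, ub, n_nests=15, max_iter=100, alpha=0.01, pa=0.25, seed=0):
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    shape = (n_nests, lb.size)
    nests = lb + rng.random(shape) * (ub - lb)            # Eq. (9)
    fitness = np.array([cost(x) for x in nests])
    history = [float(fitness.min())]
    for _ in range(max_iter):
        # Global random walk, Eq. (6), with the step scaled by the search range.
        trials = nests + alpha * (ub - lb) * levy_steps(rng, shape)
        nests, fitness = keep_better(cost, nests, fitness, np.clip(trials, lb, ub))
        # Local random walk, Eq. (7): H(pa - eps) switches each component on or off.
        j, k = rng.permutation(n_nests), rng.permutation(n_nests)
        switch = rng.random(shape) < pa
        trials = nests + rng.random(shape) * switch * (nests[j] - nests[k])
        nests, fitness = keep_better(cost, nests, fitness, np.clip(trials, lb, ub))
        history.append(float(fitness.min()))
    best = int(np.argmin(fitness))
    return nests[best], float(fitness[best]), history


def demo_cost(params):
    """Toy surrogate for CF with its minimum at gamma = 10, sigma = 3 (illustrative)."""
    gamma, sigma = params
    return (math.log10(gamma) - 1.0) ** 2 + (math.log10(sigma) - math.log10(3.0)) ** 2


best_params, best_cost, history = cuckoo_search(demo_cost, [0.001, 0.001], [1000.0, 1000.0])
```

Because of the greedy replacement, the best cost stored in `history` never increases from one iteration to the next. The bounds and the iteration limit of 100 follow the values stated in the text.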


In Eq. (10), $RMSE_{Training}$ and $RMSE_{Validating}$ denote the root mean squared error (RMSE) of the prediction results for the training and validating data, respectively.

The RMSE is calculated as follows:

$$RMSE = \sqrt{\frac{\sum_{i=1}^{N} (y_i - t_i)^2}{N}}, \tag{11}$$

where $y_i$ and $t_i$ are the predicted and the actual values of the shear strength, respectively; $N$ represents the number of data samples.

It is noted that the training and the validating sets belong to Dataset 1, which occupies 80% of the original dataset. The purpose of the CF described in Eq. (10) is to guard against overfitting. Minimizing $RMSE_{Training}$ alone may lead to a prediction model that fits the training data very well but performs very poorly when it estimates data in the testing set. This situation is commonly known as overfitting [46]. Therefore, the inclusion of $RMSE_{Validating}$ in the CF can help to alleviate the overfitting problem. The CSO carries out the optimization process until a sufficient number of searching iterations (Iter) is reached. Herein, based on several trial-and-error runs as well as the suggestions in the previous work of [38], the maximum number of iterations is set to 100. When the optimization process terminates, the optimized LSSVM model with an appropriate set of hyperparameters is ready for predicting the shear strength of the soil samples in the testing dataset.

5 Results and discussion

To evaluate the performance of the LSSVM-CSO, besides RMSE, the mean absolute percentage error (MAPE), Variance Accounted For (VAF) [6], and the coefficient of determination (R²) [47] are also employed in this section of the study. The equations used to compute these performance metrics are provided as follows:

$$MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{t_i - y_i}{t_i} \right|, \tag{12}$$

$$VAF = \left[ 1 - \frac{\mathrm{var}(t_i - y_i)}{\mathrm{var}(t_i)} \right] \times 100, \tag{13}$$

$$R^2 = \left\{ (1/n) \times \sum_{i=1}^{n} \left[ (y_i - y_{Mean})(t_i - t_{Mean}) \right] / (\sigma_y \sigma_t) \right\}^2, \tag{14}$$

where $t$ and $y$ are the actual and the predicted values of shear strength. $y_{Mean}$ and $t_{Mean}$ denote the average values of $y$ and $t$, respectively. $\sigma_y$ and $\sigma_t$ represent the standard deviations of $y$ and $t$, respectively. Additionally, $n$ denotes the number of data samples.

To demonstrate the capability of the newly developed model, the performance of LSSVM-CSO is compared with those of benchmark models including LSSVM (without using CSO for parameter optimization), Backpropagation Artificial Neural Network (BPANN), and Regression Tree (RT). The LSSVM is operated via the MATLAB toolbox developed by De Brabanter et al. [45]. The BPANN and RT models are implemented in the MATLAB environment with the help of the statistics and machine learning toolbox [48]. It is noted that the parameters of the benchmark LSSVM are tuned via a grid search process [49]. The number of neurons in the BPANN model structure is selected in the range of 5–20. Additionally, for the case of RT, the minimum number of leaf node observations is allowed to vary from 1 to 5% of the number of data instances in the training set. The parameters of LSSVM, BPANN, and RT associated with the smallest $RMSE_{Validating}$ are selected to be used in the prediction phase.

In the first experiment, 20% of the whole dataset (332 data samples) is randomly extracted to be used as testing data; 80% of the dataset is used for training and validating purposes. Based on the experiment, the CSO algorithm has identified the optimal regularization parameter and kernel function parameter of LSSVM as follows: $\gamma = 12.45$ and $\sigma = 3.72$. The most appropriate regularization parameter and kernel function parameter of LSSVM found by the grid search algorithm are 500 and 10, respectively. The number of neurons in the BPANN's hidden layer is optimally set to 9. For RT, a minimum number of leaf node observations equal to 1% of the number of data instances in the training set leads to the smallest RMSE value. The experimental results of the training and prediction phases of the employed models are presented in Table 2. As can be seen from this table, the predictive outcomes of LSSVM-CSO (RMSE = 0.082, MAPE = 14.841%, VAF = 93.110%, R² = 0.885) are better than those of LSSVM (RMSE = 0.085, MAPE = 17.874%, VAF = 91.330%, R² = 0.875), BPANN (RMSE = 0.097, MAPE = 21.789%, VAF = 88.151%, R² = 0.838), and RT (RMSE = 0.113, MAPE = 18.805%, VAF = 86.333%, R² = 0.778). Illustrations of the performance of the hybrid LSSVM-CSO model are provided in Figs. 5 and 6.

Moreover, since a single run of model training and testing may not reflect the true predictive capability of the model due to randomness in data selection, this study has performed a random subsampling of the dataset which includes 20 runs. In each run, 80% of the data is used for model establishment and the rest of the data serves as testing samples. The experimental results are reported in Table 3. As can be observed from the table, the average result of the


testing phase of the hybrid LSSVM-CSO (RMSE = 0.096, MAPE = 19.165%, VAF = 93.088%, R² = 0.871) is superior to those of LSSVM (RMSE = 0.102, MAPE = 20.321%, VAF = 91.661%, R² = 0.851), BPANN (RMSE = 0.113, MAPE = 23.926%, VAF = 88.639%, R² = 0.812), and RT (RMSE = 0.123, MAPE = 20.020%, VAF = 88.266%, R² = 0.794). These experimental outcomes indicate that LSSVM-CSO is best suited for the collected dataset of soil shear strength in this study.

Table 2 Experimental result of the training and prediction phases

Phase     Metric     LSSVM-CSO  LSSVM    BPANN    RT
Training  RMSE       0.078      0.079    0.081    0.052
          MAPE (%)   14.910     16.059   18.150   8.529
          VAF (%)    95.501     96.550   94.408   97.794
          R²         0.922      0.919    0.915    0.966
Testing   RMSE       0.082      0.085    0.097    0.113
          MAPE (%)   14.841     17.874   21.789   18.805
          VAF (%)    93.110     91.330   88.151   86.333
          R²         0.885      0.875    0.838    0.778

6 Concluding remarks

The shear strength of soil is a crucial parameter widely used in the design phase of construction projects. The conventional process of shear strength determination in the laboratory is both costly and time-consuming. To avoid this limitation of the conventional process, this study proposes a data-driven method for estimating the shear strength of soils. The newly proposed approach is a hybridization of LSSVM and CSO. LSSVM is used as a function approximation technique which can predict the value of shear strength based on a set of input information. The CSO algorithm is employed to fine-tune the hyperparameters of LSSVM, namely the regularization and the kernel parameters.

Fig. 5 Prediction performance of LSSVM-CSO in the training and testing phases

Fig. 6 Actual vs. predicted value of shear strength yielded by LSSVM-CSO in the training and testing phases


Table 3 Results of the random subsampling process with 20 runs

Phase     Metric     LSSVM-CSO        LSSVM            BPANN            RT
                     Mean     SD      Mean     SD      Mean     SD      Mean     SD
Training  RMSE       0.076    0.002   0.066    0.003   0.090    0.013   0.049    0.003
          MAPE (%)   14.300   0.378   12.255   0.490   18.863   3.256   8.247    0.418
          VAF (%)    95.363   0.285   96.450   0.246   93.253   3.310   97.738   0.413
          R²         0.922    0.004   0.941    0.004   0.888    0.032   0.966    0.004
Testing   RMSE       0.096    0.011   0.102    0.018   0.113    0.033   0.123    0.018
          MAPE (%)   19.165   3.190   20.321   3.638   23.926   6.390   20.020   3.320
          VAF (%)    93.088   1.340   91.661   4.101   88.639   7.792   88.266   2.767
          R²         0.871    0.027   0.851    0.050   0.812    0.148   0.794    0.055
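For reference, the metrics reported in Tables 2 and 3 can be computed as in the following Python sketch of Eqs. (11)–(14). The function names are hypothetical, and the population form of the variance and standard deviation is an assumption (the choice cancels out in VAF and matches the 1/n factor in Eq. 14). The numbers in the example are made up and are not taken from the soil dataset.

```python
import numpy as np


def rmse(y, t):
    """Eq. (11): root mean squared error."""
    return float(np.sqrt(np.mean((y - t) ** 2)))


def mape(y, t):
    """Eq. (12): mean absolute percentage error, in percent."""
    return float(100.0 * np.mean(np.abs((t - y) / t)))


def vaf(y, t):
    """Eq. (13): variance accounted for, in percent."""
    return float((1.0 - np.var(t - y) / np.var(t)) * 100.0)


def r_squared(y, t):
    """Eq. (14): squared correlation between predicted and actual values."""
    cov = np.mean((y - y.mean()) * (t - t.mean()))
    return float((cov / (y.std() * t.std())) ** 2)


# Made-up example: every prediction is 0.1 too high.
t_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_demo = t_demo + 0.1
print(rmse(y_demo, t_demo), mape(y_demo, t_demo),
      vaf(y_demo, t_demo), r_squared(y_demo, t_demo))
```

In this example RMSE is about 0.1 and MAPE about 5.21%, while VAF is about 100 and R² about 1. A constant offset is therefore penalized by RMSE and MAPE but not by VAF or R², which is one reason to report the four metrics together.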

To construct the hybrid model, a dataset of 332 soil more historical cases of soil shear strength test should be
samples with testing results of shear strength has been col- collected to enhance the generalization of the machine learn-
lected during the geotechnical investigation process of an ing based model.
expressway construction project in Vietnam. A set of vari-
ables including sample depth, sand percentage, loam per- Acknowledgements This research was supported by the Geographic
Information Science research group, Ton Duc Thang University, Ho
centage, clay percentage, moisture content, wet density of Chi Minh city, Vietnam. We would like to thank the Transport Engi-
soil, specific gravity, liquid limit, plastic limit, plastic index, neering Design Inc.—TEDI, Hanoi, Vietnam, for providing the data
liquid index have been employed as the shear strength’s for this analysis.
predictors. Experimental results have pointed out that the
hybrid LSSVM-CSO has successfully identified the map- Compliance with ethical standards
ping function that computes the shear strength based on
the input information provided by the influencing factors. This fact is confirmed by the good predictive performance of LSSVM-CSO, with R2 = 0.885. This R2 value implies that about 88.5% of the variation in the output variable, the shear strength, can be explained by the model. This is a highly satisfactory performance, since the estimation of the shear strength of soil is widely known to be complex and uncertain.

Moreover, the predictive capability of LSSVM-CSO has been shown to be superior to that of the benchmark approaches of LSSVM, BPANN, and RT. Thus, one advantage of the proposed LSSVM-CSO is that it can help to attain highly accurate predictions of soil shear strength. Another advantage of the newly constructed approach is that the tuning parameters of LSSVM can be identified adaptively; the model training phase can therefore be performed automatically, without domain knowledge in machine learning and metaheuristics. Hence, the proposed hybrid AI method is very promising for helping geotechnical engineers to estimate the soil shear strength quickly and reliably. Nevertheless, one disadvantage of the model is that it cannot express the level of importance of each input variable. In addition, LSSVM-CSO is a black-box model: the model structure cannot be expressed explicitly in the form of predictive equations. This drawback of the current modeling approach may make it difficult for practicing engineers to comprehend the model behavior. Moreover, a limitation of the current study is that the size of the dataset is still relatively small; thus,

Conflict of interest The authors declare no conflict of interest.
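As a rough illustration of how the goodness-of-fit figures quoted in the concluding remarks (RMSE, MAPE, and R2) can be computed and read, the sketch below assumes that R2 is the usual coefficient of determination, so that a value of 0.885 corresponds to about 88.5% of the output variance being explained. The function names and the sample values are hypothetical and are not taken from the study's dataset or code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up shear strength values for illustration only (assumption, not study data).
y_true = [0.30, 0.45, 0.50, 0.62, 0.80]
y_pred = [0.32, 0.40, 0.55, 0.60, 0.78]

r2 = r_squared(y_true, y_pred)
print(f"RMSE = {rmse(y_true, y_pred):.3f}")
print(f"MAPE = {mape(y_true, y_pred):.2f}%")
print(f"R2   = {r2:.3f} (about {100 * r2:.1f}% of the variance explained)")
```

Under this definition, a model that always predicts the mean of the observed values scores R2 = 0 and a perfect model scores R2 = 1, which is why R2 can be read directly as the explained share of variance.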

13
964 Engineering with Computers (2019) 35:955–965

9. Hoang N-D, Tien Bui D, Liao K-W (2016) Groutability esti- 25. Yang XS, Deb SD Cuckoo search via Levy flights. In: 2009
mation of grouting processes with cement grouts using dif- World Congress on nature & biologically inspired computing
ferential flower pollination optimized support vector machine. (NaBIC), 9–11 Dec 2009, pp 210–214. https​://doi.org/10.1109/
Appl Soft Comput 45:173–186. https​: //doi.org/10.1016/j. NABIC​.2009.53936​90
asoc.2016.04.031 26. Joshi AS, Kulkarni O, Kakandikar GM, Nandedkar VM (2017)
10. Asim KM, Awais M, Martínez–Álvarez F, Iqbal T (2017) Seismic Cuckoo search optimization—a review. Mater Today Proc
activity prediction using computational intelligence techniques in 4(8):7262–7269. https​://doi.org/10.1016/j.matpr​.2017.07.055
northern Pakistan. Acta Geophys 65(5):919–930 27. Etedali S, Mollayi N (2018) Cuckoo search-based least squares
11. Asim KM, Idris A, Iqbal T, Martínez-Álvarez F (2018) Seismic support vector machine models for optimum tuning of tuned
indicators based earthquake predictor system using Genetic Pro- mass dampers. Int J Struct Stab Dyn 18(02):1850028. https​://
gramming and AdaBoost classification. Soil Dyn Earthqu Engi doi.org/10.1142/s0219​45541​85002​81
111:1–7 28. Suresh S, Lal S (2016) An efficient cuckoo search algorithm
12. Kanungo DP, Sharma S, Pain A (2014) Artificial Neural Net- based multilevel thresholding for segmentation of satellite
work (ANN) and Regression Tree (CART) applications for the images using different objective functions. Expert Syst Appl
indirect estimation of unsaturated soil shear strength parameters. 58:184–209. https​://doi.org/10.1016/j.eswa.2016.03.032
Front Earth Sci 8(3):439–456. https​://doi.org/10.1007/s1170​ 29. Khoja I, Ladhari T, M’sahli F, Sakly A (2018) Cuckoo
7-014-0416-0 search approach for parameter identification of an acti-
13. Kiran S, Lal B, Tripathy SS (2016) Shear strength prediction vated sludge process. Comput Intell Neurosci. https​: //doi.
of soil based on probabilistic neural network. J Sci Technol org/10.1155/2018/34768​51
9(41):1–6 30. Rakhshani H, Rahati A (2017) Snap-drift cuckoo search: a
14. Khan SZ, Suman S, Pavani M, Das SK (2016) Prediction of the novel cuckoo search optimization algorithm. Appl Soft Comput
residual strength of clay using functional networks. Geosci Front 52:771–794. https​://doi.org/10.1016/j.asoc.2016.09.048
7(1):67–74. https​://doi.org/10.1016/j.gsf.2014.12.008 31. Das P, Naskar SK, Patra SN (2018) Hardware efficient FIR fil-
15. Tien Bui D, Pham BT, Nguyen QP, Hoang N-D (2016) Spatial ter design using global best steered quantum inspired cuckoo
prediction of rainfall-induced shallow landslides using hybrid search algorithm. Appl Soft Comput 71:1–19. https​: //doi.
integration approach of least-squares support vector machines org/10.1016/j.asoc.2018.06.030
and differential evolution optimization: a case study in Cen- 32. Chen G, Qiu S, Zhang Z, Sun Z, Liao H (2017) Optimal power
tral Vietnam. Int J Digit Earth 9(11):1077–1097. https​://doi. flow using gbest-guided cuckoo search algorithm with feedback
org/10.1080/17538​947.2016.11695​61 control strategy and constraint domination rule. Math Probl
16. Prayogo D, Susanto YTT (2018) Optimizing the prediction accu- Eng. https​://doi.org/10.1155/2017/90675​20
racy of friction capacity of driven piles in cohesive soil using a 33. Suykens J, Gestel JV, Brabanter JD, Moor BD, Vandewalle J
novel self-tuning least squares support vector machine. Adv Civ (2002) Least square support vector machines. World Scientific
Engi. https​://doi.org/10.1155/2018/64901​69 Publishing Co. Pte. Ltd., Singapore (ISBN: 9812381511)
17. Cheng M-Y, Hoang N-D (2014) Groutability prediction of micro- 34. Hoang N-D (2018) An artificial intelligence method for asphalt
fine cement based soil improvement using evolutionary LS-SVM pavement pothole detection using least squares support vector
inference model. J Civ Eng Manag 20(6):839–848. https​://doi. machine and neural network with steerable filter-based feature
org/10.3846/13923​730.2013.80271​7 extraction. Adv Civ Eng. https​://doi.org/10.1155/2018/74190​58
18. Niu D, Dai S (2017) A short-term load forecasting model with a 35. Tien Bui D, Bui K-TT, Bui Q-T, Doan CV, Hoang N-D (2017)
modified particle swarm optimization algorithm and least squares Hybrid intelligent model based on least squares support vector
support vector machine based on the denoising method of empiri- regression and artificial bee colony optimization for time-series
cal mode decomposition and grey relational analysis. Energies modeling and forecasting horizontal displacement of hydro-
10(3):408 power dam. In: Samui P, Sekhar S, Balas VE (eds) Handbook
19. Cheng M-Y, Prayogo D, Wu Y-W (2018) Prediction of perma- of neural computation, Chap 15. Academic Press, pp 279–293.
nent deformation in asphalt pavements using a novel symbiotic https​://doi.org/10.1016/B978-0-12-81131​8-9.00015​-6
organisms search-least squares support vector regression. Neural 36. Cheng M-Y, Hoang N-D, Wu Y-W (2013) Hybrid intelligence
Comput Appl. https​://doi.org/10.1007/s0052​1-018-3426-0 approach based on LS-SVM and differential evolution for con-
20. Samui P, Kurup P (2012) Multivariate adaptive regression spline struction cost index estimation: a Taiwan case study. Autom
(MARS) and least squares support vector machine (LSSVM) for Constr 35:306–313
OCR prediction. Soft Comput 16(8):1347–1351 37. Tien Bui D, Pham TB, Nguyen Q-P, Hoang N-D (2016) Spatial
21. Samui P, Kim D (2013) Least square support vector machine and prediction of rainfall-induced shallow landslides using hybrid
multivariate adaptive regression spline for modeling lateral load integration approach of least squares support vector machines
capacity of piles. Neural Comput Appl 23(3):1123–1127. https​:// and differential evolution optimization: a case study in central
doi.org/10.1007/s0052​1-012-1043-x Vietnam. Int J Digit Earth 9(11):1077–1097
22. Wu Y-H, Shen H (2018) Grey-related least squares support vector 38. Hoang N-D, Bui DT (2016) A novel relevance vector machine
machine optimization model and its application in predicting natu- classifier with cuckoo search optimization for spatial prediction
ral gas consumption demand. J Comput Appl Math 338:212–220. of landslides. J Comput Civ Eng 30(5):04016001. https​://doi.
https​://doi.org/10.1016/j.cam.2018.01.033 org/10.1061/(ASCE)CP.1943-5487.00005​57 doi
23. Mohammad Emami N, Rasool Amiri K, Mohammad Khodaiy A, 39. Yasar M (2016) Optimization of reservoir operation using
Mahdi Shahbazi R (2018) Metaheuristic optimization approaches cuckoo search algorithm: example of Adiguzel Dam, Denizli,
to predict shear-wave velocity from conventional well logs in Turkey. Math Probl Eng. https​://doi.org/10.1155/2016/13160​38
sandstone and carbonate case studies. J Geophys Eng 15(3):1071 40. Mohamad AB, Zain AM, Nazira Bazin NE (2014) Cuckoo
24. Chou J-S, Chong WK, Bui D-K (2016) Nature-inspired search algorithm for optimization problems—a literature review
metaheuristic regression system: programming and implemen- and its applications. Appl Artif Intell 28(5):419–448. https​://
tation for civil engineering applications. J Comput Civ Eng doi.org/10.1080/08839​514.2014.90459​9
30(5):04016007. https ​ : //doi.org/10.1061/(ASCE)CP.1943-
5487.00005​61 doi

13
Engineering with Computers (2019) 35:955–965 965

41. Clayton CR (1995) The standard penetration test (SPT): meth- 47. Jokar MH, Khosravi A, Heidaripanah A, Soltani F (2018) Unsat-
ods and use. Construction Industry Research and Information urated soils permeability estimation by adaptive neuro-fuzzy
Association inference system. Soft Comput. https​://doi.org/10.1007/s0050​
42. Schmertmann JH (1978) Guidelines for cone penetration test: 0-018-3326-3
performance and design. United States. Federal Highway 48. Matwork (2017) Statistics and machine learning toolbox user’s
Administration guide. Matwork Inc., https​://www.mathw​orks.com/help/pdf_doc/
43. (ASTM) ASfTaM (2005) ASTM D4648/D4648M-16, Standard stats​/stats​.pdf. Accessed 28 Apr 2018
test methods for laboratory miniature vane shear test for satu- 49. Hoang N-D, Bui DT (2018) Predicting earthquake-induced soil
rated fine-grained clayey soil. Active Standard ASTM D4648, liquefaction based on a hybridization of kernel Fisher discrimi-
vol ASTM International, West Conshohocken, PA, 2016. https​:// nant analysis and a least squares support vector machine: a multi-
www.astm.org. Accessed 14 Mar 2018 dataset study. Bull Eng Geol Environ 77(1):191–204. https​://doi.
44. ASTM (1985) Classification of soils for engineering purposes: org/10.1007/s1006​4-016-0924-0
annual book of ASTM standards, D 2487-83, 04.08. American
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
