Construction and Building Materials: Jian Zhou, Peixi Yang, Chuanqi Li, Kun Du

Construction and Building Materials 409 (2023) 133911
Hybrid random forest-based models for predicting shear strength of structural surfaces based on surface morphology parameters and metaheuristic algorithms
Jian Zhou a,*, Peixi Yang a, Chuanqi Li b, Kun Du a,*
a School of Resources and Safety Engineering, Central South University, Changsha 410083, China
b Laboratory 3SR, CNRS UMR 5521, Grenoble Alpes University, Grenoble 38000, France
ARTICLE INFO

Keywords: Peak interface efficiency; Morphological parameters; Dragonfly algorithm; Sparrow search algorithm; Whale optimization algorithm

ABSTRACT

The prediction of shear strength at soil-structure interfaces is of great significance to the stability of geotechnical engineering. In this study, 480 morphological data sets with seven morphological parameters (root mean square deviation of the profile (Pq), skewness of the height distribution of the profile (Psk), kurtosis of the height distribution of the profile (Pku), average width of outline elements (PSm), root mean square slope of the profile (Pdq), material ratio of the profile (Pmr), and number of peaks (Ppc)) were selected to generate a comprehensive database for predicting the peak interface efficiency (IEp), considering three different soil particle sizes (0.35 mm, 0.53 mm, and 0.80 mm). Three random forest (RF) models, optimized using the dragonfly algorithm (DA-RF), the sparrow search algorithm (SSA-RF), and the whale optimization algorithm (WOA-RF), were generated to predict IEp, and their predictive performance was compared with that of extreme learning machine (ELM), support vector regression with radial basis function kernel (SVR-RBF), and initial RF models. The mean absolute error (MAE), the mean absolute percentage error (MAPE), the root mean square error (RMSE), and the coefficient of determination (R2) were used to evaluate the performance of all models. The results showed that the WOA-RF model achieved the best performance, with MAE of (0.0145, 0.0181, 0.0179 and 0.0210, 0.0273, 0.0216), MAPE of (1.9866, 2.6417, 2.5310 and 2.8924, 4.0294, 3.0816), RMSE of (0.0178, 0.0237, 0.0224 and 0.0252, 0.0362, 0.0276), and R2 of (0.9473, 0.9262, 0.9352 and 0.9404, 0.8433, 0.9313) in the training and testing phases, respectively. The results of significance analysis indicated that Pdq and Pq are more important than other parameters for predicting IEp.
* Corresponding authors.
E-mail addresses: csujzhou@hotmail.com, j.zhou@csu.edu.cn (J. Zhou), dukuncsu@csu.edu.cn (K. Du).
https://doi.org/10.1016/j.conbuildmat.2023.133911
Received 7 August 2023; Received in revised form 19 October 2023; Accepted 20 October 2023
0950-0618/© 2023 Elsevier Ltd. All rights reserved.

1. Introduction

Geotechnical engineering has always faced the challenge of understanding the interaction between soil and structure, which must be addressed from a practical perspective. One of the main areas of concern is the interaction induced by contact interfaces between two different objects. An unstable system at the interface can lead to relative displacement of the contact points and, in turn, to structural instability [2,13,22,75,90,91,100,109]. For instance, weak structural surfaces affect slope stability [69], and joint surfaces are closely related to the fragmentation of rock mass [12]. In particular, the shear strength of structural surfaces plays an important role in maintaining structural stability [14].

The mechanical characteristics of the soil-structure interaction result from various factors; particle properties [19,20] and soil density [50,52,112], for example, are significant parameters affecting soil-structure interface behavior. In addition to the intrinsic properties of the soil, the morphological characteristics of the soil surface also exert a notable influence on the shear strength. Therefore, a comprehensive consideration of various morphological factors is required to enhance the accuracy of evaluating the mechanical properties at the soil-structure interface. Several parameters, including the geometry and material type of the structure surface, dryness and humidity, particle size, water content, and surface roughness, have been examined for their effect on the mechanical properties of the soil-structure interface in order to assess the shear strength of structural surfaces [9,23,43,68,80,82,89]. Among these morphological characteristics, the interface roughness has an important impact on the shear strength at the soil contact point and the subsequent deformation [4,20,28,66,79,82,102]. However, classifying interface roughness as smooth or rough on the basis of qualitative analysis is not sufficient to clarify the role of interface roughness in the soil-structure interaction. Uesugi et al. [73] proposed the relative roughness as a classic quantitative index to describe surface morphology (Rn = Rmax/D50, where Rmax represents the maximum height of the surface, and D50 represents the average particle diameter). This study provides a detailed classification of D50 to investigate the shear strength of structural surfaces. However, the traditional approach of conducting numerous experiments to measure morphological characteristics and study the relationship between them and the shear strength of structural surfaces is time-consuming and labor-intensive. Thus, there is a need to develop more efficient methods to understand the complexity of the soil-structure interface.

AI technologies, such as machine learning (ML), have become increasingly popular in structural and geotechnical engineering due to their exceptional merit and powerful learning abilities [31,35,61,67]. In recent years, several studies have demonstrated the effectiveness of ML models in predicting various mechanical properties of interfaces, such as the effect of surface roughness, relative soil density and normal stiffness on interfacial behavior [97]; the interface shear behavior between frozen soil and structure surface [15]; the behavior of underground structures in soils and rocks [33]; and the influence of soil-structure interactions on the seismic response of building systems [47,81]. Furthermore, some scholars combined ML algorithms with simulation experiments to replace the capture of underlying physics or system processes by the simulation model with the autonomous learning of ML algorithms, improving accuracy while greatly reducing the time originally required for simulation [3,72,76,80,88]. Numerous ML models have been utilized to solve diverse geological problems, such as the extreme learning machine (ELM) [37,38,62], support vector regression with radial basis function kernel (SVR-RBF) [8,39,53], adaptive boosting (AdaBoost) [24], gradient boosting regression tree (GBRT) [25,116], artificial neural networks (ANN) [36], least square support vector machine (LSSVM) [60,94], genetic programming (GP) [65], the extreme gradient boosting (XGBoost) algorithm [84,114,115], and the backpropagation neural network (BPNN) [113]. However, the above ML models are more or less restricted by limitations that affect their predictive performance. For instance, although ELM has shown success in some applications, it is prone to overfitting and stochasticity issues when dealing with practical problems [21,29]. SVR, in turn, has limitations in its applicability and effectiveness in handling nonlinear and non-monotonic relationships, and requires extensive data preprocessing for optimal performance [55]. By contrast, random forest (RF) regression is widely applied because of its strong robustness, inclusiveness and powerful ability to avoid overfitting [1,17,40,42,96,97,99,101,110]. In the RF model, selecting the appropriate combination of hyperparameters is essential for enhancing the generalizability of the model, mitigating the risk of overfitting, and ultimately improving the accuracy of predictions [7]. Grid search, random search and metaheuristic algorithms are typical optimization methods. However, grid search can be prohibitively slow, and random search is restricted by the distribution of the search space. These approaches are primarily suitable for smaller search spaces and models. As engineering problems become increasingly intricate, the size of the search space and the model volume continue to expand. Thus, traditional search methods are inadequate to meet the demand for efficient and effective problem-solving [41]. Metaheuristic algorithms based on swarm intelligence are favored for their demonstrated efficiency in dealing with high-dimensional and complex problems [11]. Moayedi et al. [51] utilized the dragonfly optimization algorithm (DA) and the whale optimization algorithm (WOA) to optimize the artificial neural network (ANN) model for predicting the soil shear strength (SSS). As a result, the DA-ANN and WOA-ANN hybrid models achieved higher R2 values (0.7647 and 0.7442) than the traditional ANN model (0.6329). Yang et al. [86] utilized the sparrow search algorithm (SSA) to optimize the gradient boosted regression tree (GBRT) model for predicting the penetration rate (PR) of tunnel boring machines (TBM). The evaluation results indicated that the optimized GBRT model is superior to the initial GBRT model, as shown by its R and RMSE values (0.871 and 0.120).

Fig. 1. Schematic flow chart of DA.
Fig. 2. Schematic flow chart of SSA.
Therefore, this paper aims to combine the DA, SSA and WOA algorithms with the RF model for predicting the shear strength of the soil-structure interface. The rest of this paper is organized as follows. In Section 2, the fundamental RF model and the DA, SSA and WOA algorithms are reviewed. In Section 3, the surface morphological parameters of the structural interface are preprocessed and several evaluation indices are introduced to determine model performance. In Section 4, three hybrid RF models (DA-RF, SSA-RF, and WOA-RF) are developed and compared to the unoptimized RF, ELM, and SVR-RBF models. The significance analysis is adopted to calculate parameter importance. The main conclusions and limitations of this paper are discussed in Section 5.

2. Materials and method

2.1. Random forest

The Random Forest (RF) was proposed by Breiman [7], combining the "Bootstrap aggregating" method and the "Random subspace" method [6,27]. The procedure for using the RF model is as follows. First, the Bootstrap method is used to generate a number of random training sets, i.e., by resampling. Then, a series of decision trees corresponding to these training sets are generated, and each non-leaf node is split with a random selection of attributes to complete the tree growth. Next, all decision trees need to be verified using the testing set. Finally, the average of the values predicted by all decision trees is taken as the final prediction result.

2.2. Dragonfly algorithm

DA is a bionic algorithm proposed by Mirjalili et al. [48], which is centered on simulating the behavior of individuals and clusters of dragonflies in nature to perform optimization search. Dragonfly habits consist of the following five behaviors: separating, lining up, aligning, feeding, and avoiding natural predators. The aim of the separating behavior is to prevent collisions between individual dragonflies; alignment is the matching of speed between individuals when flying with each other; and the alliance behavior describes moving together as a dragonfly group. The resulting schematic flow diagram of the DA algorithm is shown in Fig. 1.

In DA, the two important dragonfly behaviors of searching for prey and hiding from natural enemies can be expressed by Eq. (1):

$$\begin{cases} F_m = X^{+} - X \\ E_m = X^{-} - X \end{cases} \tag{1}$$

where $F_m$ represents the position vector of the m-th dragonfly in searching for prey, $X$ represents the current position of the individual dragonfly, and $X^{+}$ represents the position of the prey; $E_m$ denotes the direction vector of the dragonfly to avoid the natural enemy, and $X^{-}$ denotes the position vector of the natural enemy.

Since the dragonfly is in constant motion most of the time, the dynamic position of the dragonfly ($X_{t+1}$) and the corresponding step vector $\Delta X_{t+1}$ can be expressed by the following equation:

$$\begin{cases} X_{t+1} = X_t + \Delta X_{t+1} \\ \Delta X_{t+1} = (sS_m + aA_m + cC_m + fF_m + eE_m) + \omega \Delta X_t \end{cases} \tag{2}$$

where $s$, $a$, $c$, $f$, and $e$ represent the weights of the five behaviors, $S_m$, $A_m$, $C_m$, $F_m$, and $E_m$ represent the position vectors of the five behaviors, respectively, and $\omega$ is the inertia weight.

If no neighboring solution is found in the vicinity of similar individuals, the space is searched by means of a Levy flight, and the position update is changed as follows:

$$X_{t+1} = X_m + \mathrm{Levy}(d)\,\Delta X_m \tag{3}$$

where $d$ denotes the dimensionality, and Levy denotes the Levy flight expression [44].

2.3. Sparrow search algorithm

Xue and Shen [85] proposed a novel metaheuristic algorithm, called the sparrow search algorithm (SSA), by observing the predatory and anti-predatory behavior of sparrows. The SSA is inspired by the social behavior of sparrows, as shown in Fig. 2. The algorithm involves a search process in which sparrows move around
in search of food, and the best sparrow becomes the leader of the group. In the optimization process, solutions are represented by sparrows, and their positions are updated based on the leader's position and the best solution found so far. The algorithm aims to converge to the optimal solution by gradually improving the solutions of the sparrow population. The discoverers (also called producers) usually have a larger foraging search space than the joiners, and the position of a discoverer in the search space can be expressed by the following equation:

$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\left(-\dfrac{i}{\alpha \cdot iter_{max}}\right), & \text{if } R_2 < ST \\ X_{i,j}^{t} + Q \cdot L, & \text{if } R_2 \geq ST \end{cases} \tag{4}$$

where $t$ represents the current number of iterations, $iter_{max}$ denotes the maximum number of iterations, $X_{i,j}$ denotes the position of the i-th sparrow in the j-th dimension, and $\alpha$ is a random number in (0,1]. $R_2$ represents the current hazard level value, and $ST$ represents the hazard threshold. $Q$ is a random number with a normal distribution, and $L$ denotes a matrix whose elements are all 1. When $R_2 < ST$, the current area is safe and the sparrows continue to search for food. Otherwise (i.e., $R_2 \geq ST$), the discoverers have found enemies and the sparrows flee quickly.

Meanwhile, joiners often mingle with discoverers to participate in searching for and obtaining food. The adaptability of joiners also varies: the lower the adaptability value, the more difficult it is to forage for food. The identities of discoverers and joiners are always dynamic, and a joiner may also become a discoverer.

2.4. Whale optimization algorithm

The WOA is a metaheuristic optimization algorithm proposed by Mirjalili et al. [49]. The algorithm is inspired by the special predatory behavior of humpback whales in the ocean, which is called bubble-net feeding. The WOA mimics the behavior of whales in encircling prey, bubble-net feeding, and searching for food in the ocean, as shown in Fig. 3. The WOA employs a search space exploration mechanism using two basic steps, namely, the encircling and the bubble-net hunting operations. These two steps are based on the position of the best individual and the current position of each whale.

Fig. 3. Schematic flow chart of WOA.

Humpback whales need to search for prey before they start hunting; the searching process can be represented by the following equation:

$$\begin{cases} D = |C X_{rand} - X(n)| \\ X(n+1) = X_{rand} - A D \end{cases} \tag{5}$$

Before hunting, the prey is encircled by the humpback whales, and this behavior can be represented in the model by Eq. (6):

$$\begin{cases} D = |C X^{*}(n) - X(n)| \\ X(n+1) = X^{*}(n) - A D \end{cases} \tag{6}$$

where $n$ denotes the current number of iterations; $A$ and $C$ are coefficients, $X^{*}(n)$ denotes the optimal position of the whale at the current iteration, and $X(n)$ represents the current position of the whale.

Humpback whales approach their targets with a spiral motion in hunting, and this behavioral pattern can be represented by Eq. (7):

$$X(n+1) = \begin{cases} X^{*}(n) - A D, & p < P_i \\ X^{*}(n) + D_p e^{bl} \cos(2\pi l), & \text{otherwise} \end{cases} \tag{7}$$

$D_p$ in the formula represents the distance between the prey and the whale, $b$ is a constant used to define the helix, and $l$ is a random number within [-1, 1]. The coefficient $A$ reflects the distance between the humpback whale and its prey. If $A > 1$, the humpback whale abandons the current prey and moves away from it to continue searching for better prey. Otherwise, the humpback whale starts to attack the prey.

3. Database preparation

3.1. Data source

The three-dimensional discrete element method (DEM) has shown
remarkable success in solving problems in the geotechnical and structural engineering fields. A number of scholars have utilized DEM to simulate various mechanical problems in soil and rock, and to explore the correlation between different parameters and material behavior [26,32,46,56,74,77,95,111]. Numerical simulation methods are typically time-intensive, which can limit their utility for real-time or rapid analysis. As an alternative, machine learning models can be trained using numerical simulation parameters as inputs to predict simulation results or to optimize simulation parameters. Recent studies have shown that these hybrid methods are more efficient and accurate than relying solely on numerical simulations [34,45,54,83,98]. Therefore, the present study aims to utilize data from random soil-structure interface shear tests simulated by DEM to further explore the correlation between different parameters and the material behavior.

Fig. 4. Definition of each surface morphology parameter.

3.2. Data description

Sadowski et al. [59] investigated the microscopic changes at the interface between the concrete overlay and the substrate. Four different surface morphologies were obtained using different surface treatment techniques: T1-raw, T2-as cast, T3-ground and T4-shotblasted. Chen et al. [14] collected a number of surface morphological parameters using Mountains Map to parse 60 surface profiles based on the four types of concrete surfaces from previous studies. These surface morphology parameters contain amplitude parameters [30]: the minimum height value (Pv), the maximum height value (Pp), the height difference between the maximum and minimum values (Pz), average height of the profile (Pc), arithmetic mean deviation of the profile (Pa), root mean square deviation of the profile (Pq), skewness of the height distribution of the profile (Psk), kurtosis of the height distribution of the profile (Pku); spacing parameters: average width of outline elements (PSm);
hybrid parameters: root mean square slope of the profile (Pdq); material ratio curves and correlation parameters: material ratio of the profile (Pmr); and peak parameters: relative material ratio of the profile, number of peaks (Ppc) [30], as described in Fig. 4.

Based on the aforementioned factor (particle properties) that affects the mechanical properties of the soil-structure interface (Section 1), the database used in this study contains three different average particle sizes, i.e., D50 = 0.35 mm (4444 particles and 120 shear tests), D50 = 0.53 mm (10,000 particles and 240 shear tests) and D50 = 0.80 mm (22,500 particles and 120 shear tests). Chen et al. [14] applied an index named the interface efficiency (IE) to normalize the interface stress of soil-structure, which can be calculated by the following expression:

$$IE = \frac{\tan\delta}{\tan\phi} \tag{8}$$

where $\tan\delta$ and $\tan\phi$ represent the friction coefficients of the interface and of the pure soil, respectively. Furthermore, the peak IE (IEp) can be calculated from their peak friction coefficients to represent the shear strength of soil-structure.

3.3. Data processing

In ML, highly correlated variables can lead to multicollinearity, which can cause issues in model training and affect the interpretation of the model. The Pearson correlation coefficient expresses the degree of similarity between two variables, and is widely used to reflect the degree of similarity between calculated features and categories [16]. By identifying highly correlated variables, the number of input variables can be reduced and the performance of the model can be improved [63]. In general, input parameters with high correlation coefficients (Pearson correlation coefficients above 0.8) usually need to be filtered to prevent data redundancy [63]. When the correlation between two input parameters is more than 0.80, a trade-off is typically made. The surface topographical parameters of the database were examined using Pearson correlation analysis to eliminate data redundancy, and the findings are displayed in Fig. 5. As can be seen in this figure, the Pearson correlation coefficient results illustrate that the association between Pp, Pv, Pz, Pc, Pa, Pq, and Pdc is stronger than the others. Therefore, the highly correlated parameters Pp, Pv, Pz, Pc, Pa and Pdc are eliminated. Thus, the input parameters considered in this paper to predict the interface shear strength of soil-structure are Pq, Psk, Pku, PSm, Pdq, Pmr and Ppc, and the output parameter is IEp.

Fig. 5. Pearson correlation coefficient analysis of surface topography parameters.

3.4. Evaluation indicators

Performance evaluation is essential to verify the efficacy of the established ML models [58,103,106]. In this study, the mean absolute error (MAE) measures the average absolute difference between predicted and actual values, the mean absolute percentage error (MAPE) provides a measure of the accuracy of the predictions in percentage terms, the coefficient of determination (R2) provides a measure of the goodness of fit of the model, and the root mean square error (RMSE) provides a measure of the average magnitude of the errors in the predictions. The application of these evaluation indices is detailed in [40,64,103–108,110]. In this paper, MAE, MAPE, R2, and RMSE were used as the evaluation indices of the proposed models, and these indices can be defined as follows:

$$MAE = \frac{1}{m}\sum_{i=1}^{m}\left|\hat{y}_i - y_i\right| \tag{9}$$

$$MAPE = \frac{100\%}{m}\sum_{i=1}^{m}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \tag{10}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{m}(\hat{y}_i - y_i)^2}{\sum_{i=1}^{m}(y_i - \bar{y})^2} \tag{11}$$

$$RMSE = \sqrt{\frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)^2} \tag{12}$$

where $\hat{y}_i$, $y_i$ and $\bar{y}$ represent the predicted value, actual value and average value, respectively, and $m$ indicates the number of training or testing samples.
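The four evaluation indices of Eqs. (9)–(12) can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the authors' implementation: the function names and the toy values below are assumptions introduced here and are not data from the paper.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error, Eq. (9)."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, Eq. (10).
    Every actual value must be nonzero."""
    m = len(y_true)
    return 100.0 / m * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred))

def r2(y_true, y_pred):
    """Coefficient of determination, Eq. (11)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error, Eq. (12)."""
    m = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / m)

# Toy values for illustration only (hypothetical, not taken from the database).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

for name, fn in (("MAE", mae), ("MAPE", mape), ("R2", r2), ("RMSE", rmse)):
    print(name, fn(actual, predicted))
```

MAE and RMSE are in the units of the target, MAPE is a percentage, and R2 is dimensionless, which is why the four indices are reported side by side rather than merged into one number.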
Fig. 6. Flowchart of RF-based models (DA-RF, WOA-RF, SSA-RF).
3.5. RF-based hybrid models

The database of surface topography parameters in this paper contains a total of 480 data sets (120 data sets for D50 = 0.35 mm, 240 data sets for D50 = 0.53 mm, and 120 data sets for D50 = 0.80 mm, where D50 indicates the soil particle size). As shown in Fig. 6, these data are randomly divided into a training set (75 %) and a test set (25 %), and all data were normalized to [-1, 1]. Cross-validation (CV) improves model performance and reduces the risk of overfitting by providing more accurate estimates of model performance on new data. It also maximizes the use of existing data and improves the reliability of the results, making it a valuable tool for model evaluation and selection [92]. In this study, a five-fold CV is used to prevent overfitting of the proposed models. In RF regression, the optimization objectives are usually the number of trees and the maximum depth of the trees [57]. Therefore, the number of trees and the maximum depth of trees are optimized by DA, SSA, and WOA in this paper. For optimizing ML models, the population size and number of iterations are set manually to control the search process within a reasonable time frame and avoid overfitting or underfitting. The population size represents the number of solutions or individuals that exist in each generation of the optimization algorithm, while the number of iterations determines the number of generations to be created during the optimization process [93]. The parameter settings for the three types of hybrid models (DA-RF, SSA-RF, and WOA-RF) are shown in Table 1. During the merit-seeking iteration of the model, the average RMSE value produced from the five-fold CV training is employed as the fitness evaluation function [87]. Finally, the performance of the proposed models for predicting the IEp is evaluated comprehensively using MAE, MAPE, R2, and RMSE. The flowchart of constructing DA-RF, WOA-RF and SSA-RF models for predicting the shear strength of structural surfaces is shown in Fig. 6.

Table 1
The parameter setting of hybrid models.

Hybrid model | Number of iterations | Population size | Objectives of optimization
DA-RF | 600 | 10, 20, 50, 100 | the number of trees, the maximum depth of trees
SSA-RF | 600 | 10, 20, 50, 100 | the number of trees, the maximum depth of trees
WOA-RF | 600 | 10, 20, 50, 100 | the number of trees, the maximum depth of trees

Table 2
The parameter settings of RF, ELM and SVR-RBF.

Model | Hyperparameter | Typical default values | Reference
RF | mtry | p/3 for regression | [57]
RF | samples size | n |
RF | node size | 5 (for regression) |
RF | number of trees | 1000 |
RF | splitting rule | Gini impurity |
ELM | the number of hidden nodes | p | [78]
ELM | activation function | Sigmoid function |
ELM | regularization coefficient | 0.1 |
SVR-RBF | regularization parameter C | 1.0 | [18]
SVR-RBF | kernel coefficient gamma | 1/p |
SVR-RBF | epsilon for the loss function | 0.1 |

Note: n is the number of observations and p is the number of variables in the dataset.

4. Results and discussion

4.1. RF, ELM and SVR-RBF models

In light of the advantages of the RF model over traditional machine learning models in preventing overfitting, ELM and SVR-RBF models were selected as comparison groups for predicting IEp in order to evaluate the relative effectiveness of each model. In the RF model, the number of randomly drawn candidate variables (mtry) is usually set to p/3 (p is the number of variables in the dataset) in some RF
software packages for regression problems. The choice of the sample size parameter is usually a trade-off between stability and accuracy of the tree, and is expressed here in terms of the number of observations n. The node size parameter specifies the minimum number of observations in the terminal nodes; in some standard packages, the default is 1 for classification and 5 for regression. Therefore, node size is set equal to 5 to generate the model for predicting IEp. The number of trees in a forest is not a tuning parameter in the classical sense, but should be set high enough, usually to 500 or 1000 [57], and it is set equal to 1000 in this paper. The splitting rule can be seen as one of the core features of RF performance, and the classic Gini impurity is chosen as the splitting rule in this paper. For the other parameter settings of ELM and SVR-RBF, refer to Table 2; all models were trained and evaluated on the same training and test sets.

Fig. 7. Performance of ELM, SVR-RBF and RF for IEp prediction.

The performance results of ELM, SVR-RBF and RF for IEp prediction are shown in Fig. 7. Through an analysis of Fig. 7(a), (b), and (c), it was observed that the performance of ELM was satisfactory in the D50 = 0.35 mm data set. However, ELM exhibited poor performance in the D50 = 0.80 mm dataset, with R2 values of 0.748 and 0.6991 in the training and test sets, respectively. Additionally, ELM demonstrated significant overfitting in the D50 = 0.53 mm dataset for surface topography parameters. These findings indicate that ELM may have limited noise control capabilities when dealing with certain surface topography parameter data sets. In Fig. 7(d), (e), and (f), it is observed that SVR-RBF exhibits considerable overfitting in both the D50 = 0.35 mm and D50 = 0.80 mm datasets. Moreover, even though there is no overfitting in the surface topography parameter dataset with D50 = 0.53 mm, the prediction performance is poor, as indicated by the low R2 values of 0.6511 and 0.6383 in the training and test sets, respectively. Fig. 7(g), (h), and (i) show that RF performs exceptionally well in all three types of surface
topography data sets without any significant overfitting. Therefore, it can be concluded that RF has superior noise control capabilities for surface topography parameters and can more accurately identify the relationship between surface topography parameters and IEp, making it a more appropriate choice for predicting IEp than ELM and SVR-RBF.

4.2. Model validity judgment

In contrast to other commonly used error assessment metrics like MAE, RMSE is preferred for quantifying the difference between true and predicted values as it avoids the need for absolute values. This property makes RMSE a crucial factor in evaluating some model parameters where determining the sensitivity of MAE is challenging [10]. By monitoring the change in RMSE in RF, the hyperparameters can be tuned to minimize the model fitness value. In this study, RMSE is chosen as the fitness function for model optimization. The iteration curves of the three hybrid models with different population sizes are presented in Fig. 8.

Fig. 8. Iteration curves of three types of hybrid models DA-RF, SSA-RF and WOA-RF.

The performance of a random forest model optimized using three optimization algorithms, DA, SSA, and WOA, was examined for predicting the interface stress of soil-structure based on surface morphology parameters. The effect of population size on model performance was assessed by plotting fitness curves based on four different population sizes, i.e., 10, 20, 50, and 100. Results indicate that the WOA-RF model demonstrated the best overall performance, achieving the best fitness value across all population sizes, with a population size of 10 yielding the best performance. In contrast, the DA-RF model showed the slowest improvement rate, with a population size of 20 yielding the best performance. The SSA-RF model exhibited intermediate performance, with a population size of 100 yielding the best performance. These findings highlight the importance of selecting appropriate optimization algorithms and population sizes for optimizing the random forest model for predicting the IEp based on surface morphology parameters.

4.3. Hybrid models performance evaluation

Fig. 9 intuitively shows the comparison between the predicted IEp value and the actual IEp value. In these diagrams, each color blob represents the distribution of the predicted and actual values for a class of particle sizes (i.e., red: D50 = 0.35 mm; blue: D50 = 0.53 mm; green: D50 = 0.80 mm). The predicted value and the actual value jointly locate the position of the data point in the graph. In other words, the closer the data point is to the diagonal dotted line, the smaller the difference between
9
Fig. 9. Distribution of prediction results and histogram of differences for three types of hybrid models.
10
Fig. 10. Comprehensive information on different hybrid model evaluation indicators.
11
Fig. 11. Comparison of the overall performance score of each model (The marked is the optimal model).
the predicted value and the actual value. For each particle size, the range and distribution of the differences between the actual and predicted values are shown by the histogram in the top right corner. The predicted values obtained by the three hybrid RF models are all centered around the dotted line, indicating a high level of prediction accuracy. The DA-RF (pop = 20) model demonstrated a prediction error range of 0.08, while the WOA-RF (pop = 10) and SSA-RF (pop = 100) models showed lower error ranges of 0.06. These findings suggest that the WOA-RF (pop = 10) and SSA-RF (pop = 100) models may be more accurate in predicting the interface stress of soil-structure (IEp) than the DA-RF (pop = 20) model, given their lower prediction error ranges.

Table 3
Comparison of the prediction effect of WOA-RF (Pop = 10) and previous study.

| Method | Particle size | R2 (training set) | R2 (testing set) |
|---|---|---|---|
| WOA-RF (Pop = 10) | D50 = 0.35 mm | 0.9473 | 0.9404 |
| WOA-RF (Pop = 10) | D50 = 0.53 mm | 0.9262 | 0.8433 |
| WOA-RF (Pop = 10) | D50 = 0.80 mm | 0.9352 | 0.9352 |
| BNGR [14] | | 0.88 | 0.77 |
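As a rough check on overfitting, the gap between training and testing R2 in Table 3 can be computed directly. The sketch below only uses the values listed in Table 3; the helper name `r2_gap` is illustrative, and the paper does not define a threshold for what counts as a significant gap.

```python
# R2 values copied from Table 3 as (training set, testing set).
r2_table3 = {
    "WOA-RF, D50 = 0.35 mm": (0.9473, 0.9404),
    "WOA-RF, D50 = 0.53 mm": (0.9262, 0.8433),
    "WOA-RF, D50 = 0.80 mm": (0.9352, 0.9352),
    "BNGR [14]": (0.88, 0.77),
}


def r2_gap(train_r2: float, test_r2: float) -> float:
    """Training minus testing R2; a larger positive gap hints at more overfitting."""
    return round(train_r2 - test_r2, 4)


gaps = {name: r2_gap(train, test) for name, (train, test) in r2_table3.items()}

for name, gap in gaps.items():
    print(f"{name}: train-test R2 gap = {gap:.4f}")
```

On these numbers the largest WOA-RF gap occurs at D50 = 0.53 mm, and it is still smaller than the gap of the BNGR model from [14].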
The Taylor diagram was used to assess the performance of RF, SVR-RBF, ELM, DA-RF, SSA-RF, and WOA-RF in predicting the shear strength of structural surfaces, as illustrated in Fig. 10. It is evident from the diagram that the data points associated with DA-RF, SSA-RF, and WOA-RF are closer to the observation point than those of the unoptimized RF model, implying that these models exhibit lower RMSE in their predictions. Moreover, the smaller azimuth angle signifies that DA-RF, SSA-RF, and WOA-RF exhibit higher correlation coefficients [70]. These findings indicate that combining RF with DA, SSA, or WOA helps to select effective hyperparameters for the RF model and improves the predictive performance of the initial RF model. This is an efficient optimization search that captures the intrinsic relationship between the surface morphology parameters and IEp.
To comprehensively evaluate the predictive performance of DA-RF, SSA-RF, and WOA-RF for IEp, this study computed evaluation parameters for each model and ranked them based on scores from high to low. The resulting comprehensive evaluation is presented in Fig. 11, where a larger fan-shaped area denotes a higher score and better performance. Specifically, Fig. 11(a) compares the performance of DA-RF in predicting IEp under different population sizes and suggests that setting the population size to 20 can better optimize the hyperparameters of RF. Additionally, in Fig. 11(b) and (c), SSA-RF (pop = 100) and WOA-RF (pop = 10) exhibited the best performance, respectively. Finally, Fig. 11(d) provides a comprehensive comparison of RF, ELM, SVR-RBF, DA-RF (pop = 20), SSA-RF (pop = 100), and WOA-RF (pop = 10) in predicting IEp. The results indicate that both SSA-RF (pop = 100) and WOA-RF (pop = 10) significantly outperformed the other models, with WOA-RF (pop = 10) performing the best (Training set: MAE = 0.0145, MAPE = 1.9866, R2 = 0.9473, RMSE = 0.0178 at D50 = 0.35 mm; MAE = 0.0181, MAPE = 2.6417, R2 = 0.9262, RMSE = 0.0237 at D50 = 0.53 mm; MAE = 0.0179, MAPE = 2.5310, R2 = 0.9352, RMSE = 0.0224 at D50 = 0.80 mm; Testing set: MAE = 0.0210, MAPE = 2.8924, R2 = 0.9404, RMSE = 0.0252 at D50 = 0.35 mm; MAE = 0.0273, MAPE = 4.0294, R2 = 0.8433, RMSE = 0.0362 at D50 = 0.53 mm; MAE = 0.0216, MAPE = 3.0, R2 = 0.9352, RMSE = 0.0224 at D50 = 0.80 mm), suggesting that the WOA algorithm with a population size of 10 can effectively search for optimal hyperparameter combinations and significantly improve the ability of the RF model to capture the intrinsic relationship between surface morphology parameters and IEp.
Table 3 compares the prediction performance of the WOA-RF model proposed in this study with the published results of Chen et al. [14]. The analysis reveals that the proposed WOA-RF model outperforms the previous approach (based on the same database) in predicting IEp. The results indicate that the WOA-RF model exhibits good prediction accuracy on both the training and test datasets, without any significant overfitting. Therefore, it is recommended to use the WOA-RF model as an effective tool for predicting IEp.
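The four indicators used in this comparison (MAE, MAPE, RMSE, and R2) can be computed as in the following minimal sketch. It assumes the standard textbook definitions, with MAPE expressed in percent and R2 taken as the coefficient of determination; the definitions used in this study are given earlier in the paper and may differ in detail. The function name `regression_metrics` and the sample values are hypothetical and are not data from the paper.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MAE, MAPE (in percent), RMSE and R2 for paired actual/predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero; this sketch assumes none are.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}


# Hypothetical actual and predicted IEp values, for illustration only.
sample_true = [0.70, 0.75, 0.80]
sample_pred = [0.72, 0.74, 0.79]
sample_metrics = regression_metrics(sample_true, sample_pred)
```

Lower MAE, MAPE, and RMSE together with a higher R2 indicate a better fit, which is the direction of comparison used in the text.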
Fig. 12. Importance distribution of each parameter of the optimal model.
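The Gini index that underlies the importance scores in Fig. 12 (Eq. (13) in Section 4.4) can be sketched for a set of discrete labels as follows. This is a minimal illustration of the formula 1 minus the sum of squared class probabilities, not the implementation used in this study, which the paper does not specify. Note, as a general remark, that regression forests are often split on a variance-based criterion instead (for example, scikit-learn's `RandomForestRegressor` defaults to squared error). The function name `gini_index` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_index(labels: Sequence[Hashable]) -> float:
    """Gini index 1 - sum_b p_b**2, where p_b is the empirical share of class b."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen for this sketch: an empty node is pure
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


# A pure node has index 0; an evenly split two-class node has index 0.5.
pure_node = gini_index(["a", "a", "a", "a"])
mixed_node = gini_index(["a", "a", "b", "b"])
```

A feature that produces purer child nodes (a larger drop in this index) across the trees of the forest receives a higher importance score.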
Table 4
Comparison of WOA-RF results between the dataset with Pmr removed and the original dataset.

| WOA-RF (Pop = 10) | Particle size | MAE | MAPE | R2 | RMSE |
|---|---|---|---|---|---|
| Without Pmr | D50 = 0.35 mm | 0.0244 | 3.3677 | 0.9329 | 0.0280 |
| Without Pmr | D50 = 0.53 mm | 0.0288 | 4.2157 | 0.8348 | 0.0369 |
| Without Pmr | D50 = 0.80 mm | 0.0218 | 3.1296 | 0.9269 | 0.0277 |
| Original | D50 = 0.35 mm | 0.0210 | 2.8924 | 0.9404 | 0.0252 |
| Original | D50 = 0.53 mm | 0.0273 | 4.0294 | 0.8433 | 0.0362 |
| Original | D50 = 0.80 mm | 0.0216 | 3.0816 | 0.9313 | 0.0276 |

4.4. Parameter importance analysis

Feature importance analysis is a crucial function in Random Forest (RF) algorithms. On the one hand, it aids the model in recognizing significant features, eliminating noise interference, and mitigating overfitting risks. On the other hand, feature importance plays a pivotal role in the interpretability of the model. In the RF model, the significance of a variable refers to the contribution of that variable to the model [71]. Breiman et al. [5] considered that more sensitive variables have a higher Gini index, which is computed as follows:

$$
\mathrm{Gini}(X_a) = \sum_{b=1}^{B} P(X_a = Y_b)\,\bigl(1 - P(X_a = Y_b)\bigr) = 1 - \sum_{b=1}^{B} P(X_a = Y_b)^2 \tag{13}
$$

where Gini(Xa) represents the Gini index, P(Xa = Yb) is the probability, and Xa = Yb represents the estimated values.
Fig. 12 shows the importance of each input variable in the RF model based on three different particle sizes (i.e., D50 = 0.35 mm, D50 = 0.53 mm, and D50 = 0.80 mm). Based on the feature importance analysis and the data presented in the figure, it was observed that Pdq and Pq exhibit the highest importance scores among the surface morphological parameters. This highlights their close association with IEp and their critical role in predicting IEp using the RF model. These results also suggest that adjusting Pdq and Pq may have a more significant impact on IEp in numerical simulations. Additionally, the moderate importance scores of PSm, Pku, Psk, and Ppc imply that these parameters contribute to the prediction of IEp, albeit to a lesser extent than Pdq and Pq. Nevertheless, their intrinsic relationship with IEp remains relatively strong.
On the other hand, the lower importance score of Pmr implies a weaker association with IEp and a reduced impact on the prediction of IEp using the RF model. Consequently, this parameter may not be as crucial as other surface morphological parameters for predicting IEp.
To conclude, Pdq and Pq are closely related to IEp and are critical for predicting IEp using the RF model. These parameters should be the focus of adjustment when tuning simulation parameters in numerical simulations. Additionally, the potential effects of PSm, Pku, Psk, and Ppc on IEp should be taken into account. Finally, due to its low correlation with IEp, Pmr can be considered last. The parameter importance analysis in this study provides insights into the various connections between different surface topography parameters and IEp, and forms a basis for optimizing parameter adjustment in numerical simulations.
To further validate the role of feature importance analysis, the least important feature, Pmr, was eliminated from the dataset. The best-performing model, WOA-RF (Pop = 10), was then retrained on the reduced dataset. The results are presented in Table 4.
It is evident that upon removing Pmr from the datasets with D50 = 0.35 mm, D50 = 0.53 mm, and D50 = 0.80 mm, there was a slight decrease in the R2 values, which represent model accuracy. Conversely, error indicators such as MAE, MAPE, and RMSE experienced a slight increase. This suggests that although Pmr has a low importance level in predicting IEp, it remains an integral part of accurately forecasting IEp. However, this marginal decline underscores the substantial influence of these identified variables. The fact that the model retains a large proportion of its accuracy, even when other variables are excluded, unequivocally demonstrates the paramount importance of the selected variables in predicting shear strength based on surface morphology parameters. This observation not only validates our variable selection but also emphasizes the potential efficiency in scenarios where measurement or computation resources might be constrained, thereby enabling researchers to focus primarily on these pivotal parameters.

5. Conclusions

The interaction between soil structural surfaces has been the focus of study in geotechnical engineering, and numerical simulation has become an important study tool for many scholars due to the complex field environment and the difficulty of conducting multiple types of experiments. In this study, 480 sets of numerical simulation data from DEM were used, including 120 sets with D50 = 0.35 mm, 240 sets with D50 = 0.53 mm, and 120 sets with D50 = 0.80 mm. Several models including ELM, SVR-RBF, RF (not optimized), DA-RF, SSA-RF, and WOA-RF were constructed with seven types of surface morphology parameters (Pq, Psk, Pku, PSm, Pdq, Pmr and Ppc) as training objects to predict IEp.
Based on the analysis of the model results, RF outperforms ELM and SVR-RBF, demonstrating better prediction performance, a stronger ability to capture the intrinsic correlation between surface topography parameters and IEp, and the ability to prevent overfitting. Furthermore, the optimized RF model using WOA with a population size of 10 achieves the best IEp prediction performance. This indicates that WOA has an excellent hyperparameter search capability for the surface topography parameter-IEp relationship problem in RF.
Based on the feature importance analysis, it was found that Pdq and Pq played crucial roles in predicting IEp. Meanwhile, PSm, Pku, Psk, and Ppc showed less contribution, and the correlation between Pmr and IEp was weak. If only parameters with salient features are considered, there lies an inherent risk of overlooking the subtle interactions amongst less prominent features, which, collectively, may exert a significant impact on the prediction outcomes. The data used in this study were derived from numerical simulations that incorporated surface morphology data from multiple prior studies, indicating their generalizability. However, caution must be exercised when applying the findings to different soil structural surface environments. Future studies should incorporate a more comprehensive range of surface topography parameters and adjust the model parameters to optimize feature values, thereby enhancing the accuracy and reliability of the predictive model.

CRediT authorship contribution statement

Jian Zhou: Writing – review & editing, Supervision, Methodology, Funding acquisition, Conceptualization. Peixi Yang: Writing – original draft, Software, Methodology, Data curation. Chuanqi Li: Writing – review & editing, Visualization, Formal analysis. Kun Du: Writing – review & editing, Validation, Supervision, Investigation, Formal analysis.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.
Acknowledgements

This research is partially supported by the National Natural Science Foundation Project of China (42177164), and the Distinguished Youth Science Foundation of Hunan Province of China (2022JJ10073).

References

[1] E. Avuçlu, A. Elen, Evaluation of train and test performance of machine learning algorithms and Parkinson diagnosis with statistical measurements, Med. Biol. Eng. Compu. 58 (2020) 2775–2788.
[2] R.J. Bathurst, B. Huang, T.M. Allen, LRFD calibration of the ultimate pullout limit state for geogrid reinforced soil retaining walls, Int. J. Geomech. 12 (4) (2012) 399–413.
[3] B. Bohn, J. Garcke, R. Iza-Teran, A. Paprotny, B. Peherstorfer, U. Schepsmeier, C.A. Thole, Analysis of car crash simulation data with nonlinear machine learning methods, Procedia Comput. Sci. 18 (2013) 621–630.
[4] L. Borana, J.H. Yin, D.N. Singh, S.K. Shukla, Interface behavior from suction-controlled direct shear test on completely decomposed granitic soil and steel surfaces, Int. J. Geomech. 16 (6) (2016) D4016008.
[5] L. Breiman, J. Friedman, C.J. Stone, R.A. Olshen, Classification and regression trees, CRC Press, 1984.
[6] L. Breiman, Bagging Predictors. Machine Learning 24 (2) (1996) 123–140.
[7] L. Breiman, Random Forests. Machine Learning 45 (1) (2001) 5–32.
[8] Broomhead, D. S., & Lowe, D. (1988). Radial basis functions, multi-variable functional interpolation and adaptive networks. Royal Signals and Radar Establishment Malvern (United Kingdom).
[9] H. Canakci, M. Hamed, F. Celik, W. Sidik, F. Eviz, Friction characteristics of organic soil with construction materials, Soils Found. 56 (6) (2016) 965–972.
[10] T. Chai, R.R. Draxler, Root mean square error (RMSE) or mean absolute error (MAE)?–Arguments against avoiding RMSE in the literature, Geosci. Model Dev. 7 (3) (2014) 1247–1250.
[11] A. Chakraborty, A.K. Kar, Swarm intelligence: A review of algorithms, Theory and applications, Nature-inspired computing and optimization, 2017, pp. 475–494.
[12] J. Chen, M. Zhou, H. Huang, D. Zhang, Z. Peng, Automated extraction and evaluation of fracture trace maps from rock tunnel face images via deep learning, Int. J. Rock Mech. Min. Sci. 142 (2021), 104745.
[13] C. Li, J. Zhou, K. Du, D. Dias, Stability prediction of hard rock pillar using support vector machine optimized by three metaheuristic algorithms. International Journal of Mining Science and Technology, 2023, 33(8), 1019-1036.
[14] W.B. Chen, W.H. Zhou, Ł. Sadowski, Z.Y. Yin, Metaheuristic model for the interface shear strength between granular soil and structure considering surface morphology, Comput. Geotech. 135 (2021), 104141.
[15] W. Chen, Q. Luo, J. Liu, T. Wang, L. Wang, Modeling of frozen soil-structure interface shear behavior by supervised deep learning, Cold Reg. Sci. Technol. 200 (2022), 103589.
[16] I. Cohen, Y. Huang, J. Chen, J. Benesty, J. Benesty, J. Chen, I. Cohen, Pearson correlation coefficient, Noise Reduction in Speech Processing (2009) 1–4.
[17] J. Zhou, Z. Wang, C. Li, W. Wei, S. Wang, D.J. Armaghani, K. Peng, Hybridized random forest with population-based optimization for predicting shear properties of rock fractures, Journal of Computational Science 72 (2023) 102097.
[18] C.E. da Silva Santos, R.C. Sampaio, L. dos Santos Coelho, G.A. Bestard, C.H. Llanos, Multi-objective adaptive differential evolution for SVM/SVR hyperparameters selection, Pattern Recogn. 110 (2021), 107649.
[19] J.T. DeJong, Z.J. Westgate, Role of initial state, material properties, and confinement condition on local and global soil-structure interface behavior, J. Geotech. Geoenviron. Eng. 135 (11) (2009) 1646–1660.
[20] J.T. DeJong, D.J. White, M.F. Randolph, Microscale observation and modeling of soil-structure interface behavior using particle image velocimetry, Soils Found. 46 (1) (2006) 15–28.
[21] Deng, W., Zheng, Q., & Chen, L. (2009, March). Regularized extreme learning machine. In 2009 IEEE symposium on computational intelligence and data mining (pp. 389-395). IEEE.
[22] S. Dong, H. Zhang, Y. Peng, Z. Lu, W. Hou, Method of calculating shear strength of rock mass joint surface considering cyclic shear degradation, Sci. Rep. 12 (1) (2022) 9406.
[23] J.E. Dove, J.B. Jarrett, Behavior of dilative sand interfaces in a geotribology framework, J. Geotech. Geoenviron. Eng. 128 (1) (2002) 25–37.
[24] D.C. Feng, Z.T. Liu, X.D. Wang, Z.M. Jiang, S.X. Liang, Failure mode classification and bearing capacity prediction for reinforced concrete columns based on ensemble machine learning algorithm, Adv. Eng. Inf. 45 (2020), 101126.
[25] B. Fu, D.C. Feng, A machine learning-based time-dependent shear strength model for corroded reinforced concrete beams, Journal of Building Engineering 36 (2021), 102118.
[26] M. Gu, J. Han, M. Zhao, Three-dimensional discrete-element method analysis of stresses and deformations of a single geogrid-encased stone column, Int. J. Geomech. 17 (9) (2017) 04017070.
[27] T.K. Ho, The random subspace method for constructing decision forests, IEEE Trans. Pattern Anal. Mach. Intell. 20 (8) (1998) 832–844.
[28] L. Hu, J. Pu, Testing and modeling of soil-structure interface, Journal of Geotechnical and Geoenvironmental Engineering 130 (8) (2004) 851–860.
[29] G.B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: a new learning scheme of feedforward neural networks, in: 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No. 04CH37541), Vol. 2, IEEE, 2004, pp. 985–990.
[30] ISO, E., 2009. 4287: 2009. Geometrical Product Specifications (GPS)-Surface texture: Profile method–Terms, definitions and surface texture parameters (ISO 4287: 1997+Cor 1: 1998+ Cor 2: 2005+ Amd 1: 2009) (includes Corrigendum AC: 2008 and Amendment A1: 2009).
[31] C. Janiesch, P. Zschech, K. Heinrich, Machine learning and deep learning, Electron. Mark. 31 (3) (2021) 685–695.
[32] X.Y. Jing, W.H. Zhou, H.X. Zhu, Z.Y. Yin, Y. Li, Analysis of soil-structural interface behavior using three-dimensional DEM simulations, Int. J. Numer. Anal. Meth. Geomech. 42 (2) (2018) 339–357.
[33] S.C. Jong, D.E.L. Ong, E. Oh, State-of-the-art review of geotechnical-driven artificial intelligence techniques in underground soil-structure interaction, Tunn. Undergr. Space Technol. 113 (2021), 103946.
[34] D. Kawabata, J. Bandibas, Landslide susceptibility mapping using geological data, a DEM from ASTER images and an Artificial Neural Network (ANN), Geomorphology 113 (1–2) (2009) 97–109.
[35] M. Kuhn, K. Johnson, Applied Predictive Modeling, Vol. 26 (2013) p. 13.
[36] S. Lee, C. Lee, Prediction of shear strength of FRP-reinforced concrete flexural members without stirrups using artificial neural networks, Eng. Struct. 61 (2014) 99–112.
[37] C. Li, J. Zhou, M. Khandelwal, X. Zhang, M. Monjezi, Y. Qiu, Six novel hybrid extreme learning machine–swarm intelligence optimization (ELM–SIO) models for predicting backbreak in open-pit blasting, Nat. Resour. Res. 31 (5) (2022) 3017–3039.
[38] C. Li, J. Zhou, M. Tao, K. Du, S. Wang, D.J. Armaghani, E.T. Mohamad, Developing hybrid ELM-ALO, ELM-LSO and ELM-SOA models for predicting advance rate of TBM, Transp. Geotech. 36 (2022), 100819.
[39] E. Li, F. Yang, M. Ren, X. Zhang, J. Zhou, M. Khandelwal, Prediction of blasting mean fragment size using support vector regression combined with five optimization algorithms, J. Rock Mech. Geotech. Eng. 13 (6) (2021) 1380–1397.
[40] Y. Li, C. Zou, M. Berecibar, E. Nanini-Maury, J.C.W. Chan, P. Van den Bossche, N. Omar, Random forest regression for online capacity estimation of lithium-ion batteries, Appl. Energy 232 (2018) 197–210.
[41] Liashchynskyi, P., & Liashchynskyi, P. (2019). Grid search, random search, genetic algorithm: a big comparison for NAS. arXiv preprint arXiv:1912.06059.
[42] H.B. Ly, T.A. Nguyen, B.T. Pham, Estimation of soil cohesion using machine learning method: A random forest approach, Advances in Civil Engineering 2021 (2021) 1–14.
[43] S. Maghsoodi, O. Cuisinier, F. Masrouri, Thermal effects on mechanical behaviour of soil–structure interface, Can. Geotech. J. 57 (1) (2020) 32–47.
[44] Mandelbrot, B. B. (1995). Introduction to fractal sums of pulses. In Lévy Flights and Related Topics in Physics: Proceedings of the International Workshop Held at Nice, France, 27–30 June 1994 (pp. 110-123). Springer Berlin Heidelberg.
[45] J. McBeck, Y. Ben-Zion, F. Renard, Predicting fault reactivation and macroscopic failure in discrete element method simulations of restraining and releasing step overs, Earth Planet. Sci. Lett. 593 (2022), 117667.
[46] C.X. Miao, J.J. Zheng, R.J. Zhang, L. Cui, DEM modeling of pullout behavior of geogrid reinforced ballast: the effect of particle shape, Comput. Geotech. 81 (2017) 249–261.
[47] R.T. Mirhosseini, Seismic response of soil-structure interaction using the support vector regression, Structural Engineering and Mechanics, an Int’l Journal 63 (1) (2017) 115–124.
[48] S. Mirjalili, Dragonfly algorithm: a new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems, Neural Comput. & Applic. 27 (4) (2016) 1053–1073.
[49] S. Mirjalili, A. Lewis, The whale optimization algorithm, Adv. Eng. Softw. 95 (2016) 51–67.
[50] K. Miura, K. Maeda, M. Furukawa, S. Toki, Mechanical characteristics of sands with different primary properties, Soils Found. 38 (4) (1998).
[51] H. Moayedi, D. Tien Bui, A. Dounis, L. Kok Foong, B. Kalantar, Novel nature-inspired hybrids of neural computing for estimating soil shear strength, Appl. Sci. 9 (21) (2019) 4643.
[52] Y. Nakata, Y. Kato, M. Hyodo, A.F. Hyde, H. Murata, One-dimensional compression behaviour of uniformly graded sand related to single particle crushing strength, Soils Found. 41 (2) (2001) 39–51.
[53] H. Nguyen, Y. Choi, X.N. Bui, T. Nguyen-Thoi, Predicting blast-induced ground vibration in open-pit mines using vibration sensors and support vector regression-based optimization algorithms, Sensors 20 (1) (2019) 132.
[54] A.R. Peeketi, R.K. Desu, P. Kumbhar, R.K. Annabattula, Thermal analysis of large granular assemblies using a hierarchical approach coupling the macro-scale finite element method and micro-scale discrete element method through artificial neural networks, Computational Particle Mechanics 6 (2019) 811–822.
[55] Y. Peng, M.H. Nagata, An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data, Chaos Solitons Fractals 139 (2020), 110055.
[56] S.W. Perkins, B.R. Christopher, B.A. Lacina, J. Klompmaker, Mechanistic-empirical modeling of geosynthetic-reinforced unpaved roads, Int. J. Geomech. 12 (4) (2012) 370–380.
[57] P. Probst, M.N. Wright, A.L. Boulesteix, Hyperparameters and tuning strategies for random forest, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9 (3) (2019) e1301.
[58] Y. Qiu, J. Zhou, M. Khandelwal, H. Yang, P. Yang, C. Li, Performance evaluation of hybrid WOA-XGBoost, GWO-XGBoost and BO-XGBoost models to predict blast-induced ground vibration, Eng. Comput. (2021) 1–18.
[59] Ł. Sadowski, D. Stefaniuk, Microstructural evolution within the interphase between hardening overlay and existing concrete substrates, Appl. Sci. 7 (2) (2017) 123.
[60] B.A. Salami, T. Olayiwola, T.A. Oyehan, I.A. Raji, Data-driven model for ternary-blend concrete compressive strength prediction using machine learning approach, Constr. Build. Mater. 301 (2021), 124152.
[61] H. Salehi, R. Burgueño, Emerging artificial intelligence methods in structural engineering, Eng. Struct. 171 (2018) 170–189.
[62] M. Shariati, M.S. Mafipour, B. Ghahremani, F. Azarhomayun, M. Ahmadi, N.T. Trung, A. Shariati, A novel hybrid extreme learning machine–grey wolf optimizer (ELM-GWO) model to predict compressive strength of concrete with partial replacements for cement, Eng. Comput. (2022) 1–23.
[63] N. Shrestha, Detecting multicollinearity in regression analysis, Am. J. Appl. Math. Stat. 8 (2) (2020) 39–42.
[64] P.F. Smith, S. Ganesh, P. Liu, A comparison of random forest regression and multiple linear regression for prediction in neuroscience, J. Neurosci. Methods 220 (1) (2013) 85–91.
[65] R. Solhmirzaei, H. Salehi, V. Kodur, M.Z. Naser, Machine learning framework for predicting failure mode and shear capacity of ultra high performance concrete beams, Eng. Struct. 224 (2020), 111221.
[66] L.J. Su, W.H. Zhou, W.B. Chen, X. Jie, Effects of relative roughness and mean particle size on the shear strength of sand-steel interface, Measurement 122 (2018) 339–346.
[67] H. Sun, H.V. Burton, H. Huang, Machine learning applications for building structural design and performance assessment: State-of-the-art review, Journal of Building Engineering 33 (2021), 101816.
[68] L. Tang, Y. Du, L. Liu, L. Jin, L. Yang, G. Li, Effect mechanism of unfrozen water on the frozen soil-structure interface during the freezing-thawing process, Geomechanics and Engineering 22 (3) (2020) 245–254.
[69] Z. Tao, Y. Shu, X. Yang, Y. Peng, Q. Chen, H. Zhang, Physical model test study on shear strength characteristics of slope sliding surface in Nanfen open-pit mine, Int. J. Min. Sci. Technol. 30 (3) (2020) 421–429.
[70] K.E. Taylor, Summarizing multiple aspects of model performance in a single diagram, J. Geophys. Res. Atmos. 106 (D7) (2001) 7183–7192.
[71] W. Tian, A review of sensitivity analysis methods in building energy analysis, Renew. Sustain. Energy Rev. 20 (2013) 411–419.
[72] H. Tongal, M.J. Booij, Simulation and forecasting of streamflows using machine learning models coupled with base flow separation, J. Hydrol. 564 (2018) 266–282.
[73] M. Uesugi, H. Kishida, Frictional resistance at yield between dry sand and mild steel, Soils Found. 26 (4) (1986) 139–149.
[74] S. Utili, T. Zhao, G.T. Houlsby, 3D DEM investigation of granular column collapse: evaluation of debris motion and its destructive power, Eng. Geol. 186 (2015) 3–16.
[75] Q. Van Nguyen, B. Fatahi, A.S. Hokmabadi, Influence of size and load-bearing mechanism of piles on seismic performance of buildings considering soil–pile–structure interaction, Int. J. Geomech. 17 (7) (2017) 04017007.
[76] von Rueden, L., Mayer, S., Sifa, R., Bauckhage, C., & Garcke, J. (2020). Combining machine learning and simulation to a hybrid modelling approach: Current and future directions. In Advances in Intelligent Data Analysis XVIII: 18th International Symposium on Intelligent Data Analysis, IDA 2020, Konstanz, Germany, April 27–29, 2020, Proceedings 18 (pp. 548-560). Springer International Publishing.
[77] J. Wang, M. Jiang, Unified soil behavior of interface shear test and direct shear test under the influence of lower moving boundaries, Granul. Matter 13 (5) (2011) 631–641.
[78] J. Wang, S. Lu, S.H. Wang, Y.D. Zhang, A review on extreme learning machine, Multimed. Tools Appl. 81 (29) (2022) 41611–41660.
[79] R. Wang, D.E. Ong, M.I. Peerun, D.S. Jeng, Influence of surface roughness and particle characteristics on soil–structure interactions: A state-of-the-art review, Geosciences 12 (4) (2022) 145.
[80] S. Wang, K. Guan, C. Zhang, D. Lee, A.J. Margenot, Y. Ge, Y. Huang, Using soil library hyperspectral reflectance and machine learning to predict soil organic carbon: Assessing potential of airborne and spaceborne optical soil sensing, Remote Sens. Environ. 271 (2022), 112914.
[81] J. Won, J. Shin, Machine learning-based approach for seismic damage prediction method of building structures considering soil-structure interaction, Sustainability 13 (8) (2021) 4334.
[82] X. Wu, J. Yang, Tests of the interface between structures and filling soil of mountain area airport, Geomechanics & Engineering 12 (3) (2017) 399–415.
[83] C. Xu, D.D. Tannant, W. Zheng, K. Liu, Discrete element method and support vector machine applied to the analysis of steel mesh pinned by rockbolts, Int. J. Rock Mech. Min. Sci. 125 (2020), 104163.
[84] J.G. Xu, S.Z. Chen, W.J. Xu, Z.S. Shen, Concrete-to-concrete interface shear strength prediction based on explainable extreme gradient boosting approach, Constr. Build. Mater. 308 (2021), 125088.
[85] J. Xue, B. Shen, A novel swarm intelligence optimization approach: sparrow search algorithm, Systems Science & Control Engineering 8 (1) (2020) 22–34.
[86] H. Yang, X. Liu, K. Song, A novel gradient boosting regression tree technique optimized by improved sparrow search algorithm for predicting TBM penetration rate, Arab. J. Geosci. 15 (6) (2022) 461.
[87] H. Yao, X. Li, H. Pang, L. Sheng, W. Wang, Application of random forest algorithm in hail forecasting over Shandong Peninsula, Atmos. Res. 244 (2020), 105093.
[88] Z.M. Yaseen, An insight into machine learning models era in simulating soil, water bodies and adsorption heavy metals: Review, challenges and solutions, Chemosphere 277 (2021), 130126.
[89] K. Yin, A.L. Fauchille, E. Di Filippo, P. Kotronis, G. Sciarra, A review of sand–clay mixture and soil–structure interface direct shear test, Geotechnics 1 (2) (2021) 260–306.
[90] Z.Y. Yin, C.S. Chang, Stress–dilatancy behavior for sand under loading and unloading conditions, Int. J. Numer. Anal. Meth. Geomech. 37 (8) (2013) 855–870.
[91] Z.Y. Yin, C.S. Chang, P.Y. Hicher, Micromechanical modelling for effect of inherent anisotropy on cyclic behaviour of sand, Int. J. Solids Struct. 47 (14–15) (2010) 1933–1951.
[92] J. Yoon, Forecasting of real GDP growth using machine learning models: Gradient boosting and random forest approach, Comput. Econ. 57 (1) (2021) 247–265.
[93] Z. Yu, X. Shi, J. Zhou, X. Chen, X. Qiu, Effective assessment of blast-induced ground vibration using an optimized random forest model based on a Harris hawks optimization algorithm, Appl. Sci. 10 (4) (2020) 1403.
[94] C. Zhang, J. Ji, Y. Gui, J. Kodikara, S.Q. Yang, L. He, Evaluation of soil-concrete interface shear strength based on LS-SVM, Geomech. Eng 11 (3) (2016) 361–372.
[95] J. Zhang, N. Yasufuku, H. Ochiai, A few considerations of pullout test characteristics of geogrid reinforced sand using DEM analysis, Geosynth. Eng. J. 22 (2007) 103–110.
[96] J. Zhang, Y. Sun, G. Li, Y. Wang, J. Sun, J. Li, Machine-learning-assisted shear strength prediction of reinforced concrete beams with and without stirrups, Eng. Comput. (2022) 1–15.
[97] P. Zhang, Y. Yang, Z.Y. Yin, BiLSTM-based soil–structure interface modeling, Int. J. Geomech. 21 (7) (2021) 04021096.
[98] W. Zhang, A. Mehta, P.S. Desai, C.F. Higgs III, Machine Learning Enabled Powder Spreading Process Map for Metal Additive Manufacturing (AM), University of Texas at Austin, 2017.
[99] W. Zhang, C. Wu, H. Zhong, Y. Li, L. Wang, Prediction of undrained shear strength using extreme gradient boosting and random forest based on Bayesian optimization, Geosci. Front. 12 (1) (2021) 469–477.
[100] L.S. Zhao, W.H. Zhou, B. Fatahi, X.B. Li, K.V. Yuen, A dual beam model for geosynthetic-reinforced granular fill on an elastic foundation, App. Math. Model. 40 (21–22) (2016) 9254–9268.
[101] J. Zhou, Y. Dai, K. Du, M. Khandelwal, C. Li, Y. Qiu, COSMA-RF: New intelligent model based on chaos optimized slime mould algorithm and random forest for estimating the peak cutting force of conical picks, Transp. Geotech. 36 (2022), 100806.
[102] J.J. Zhou, X.N. Gong, K.H. Wang, R.H. Zhang, Shaft capacity of the pre-bored grouted planted pile in dense sand, Acta Geotech. 13 (5) (2018) 1227–1239.
[103] J. Zhou, Y. Qiu, D.J. Armaghani, W. Zhang, C. Li, S. Zhu, R. Tarinejad, Predicting TBM penetration rate in hard rock condition: a comparative study among six XGB-based metaheuristic techniques, Geosci. Front. 12 (3) (2021), 101091.
[104] J. Zhou, Y.G. Qiu, S.L. Zhu, D.J. Armaghani, C.Q. Li, H. Nguyen, S. Yagiz, Optimization of support vector machine through the use of metaheuristic algorithms in forecasting TBM advance rate, Eng. Appl. Artif. Intel. 97 (2021), 104015.
[105] J. Zhou, Y. Qiu, M. Khandelwal, S. Zhu, X. Zhang, Developing a hybrid model of Jaya algorithm-based extreme gradient boosting machine to estimate blast-induced ground vibrations, Int. J. Rock Mech. Min. Sci. 145 (2021), 104856.
[106] J. Zhou, Y. Dai, S. Huang, D.J. Armaghani, Y. Qiu, Proposing several hybrid SSA–machine learning techniques for estimating rock cuttability by conical pick with relieved cutting modes, Acta Geotech. 18 (3) (2023) 1431–1446.
[107] J. Zhou, X. Shi, K. Du, X. Qiu, X. Li, H.S. Mitri, Feasibility of random-forest approach for prediction of ground settlements induced by the construction of a shield-driven tunnel, Int. J. Geomech. 17 (6) (2017) 04016129.
[108] J. Zhou, X.J. Shen, Y.G. Qiu, X.Z. Shi, M. Khandelwal, Cross-correlation stacking-based microseismic source location using three metaheuristic optimization algorithms, Tunn. Undergr. Space Technol. 126 (2022), 104570.
[109] W.H. Zhou, K.V. Yuen, F. Tan, Estimation of maximum pullout shear stress of grouted soil nails using Bayesian probabilistic approach, Int. J. Geomech. 13 (5) (2013) 659–664.
[110] J. Zhou, S. Huang, Y. Qiu, Optimization of random forest through the use of MVO, GWO and MFO in evaluating the stability of underground entry-type excavations, Tunnelling and Underground Space Technology 124 (2022) 104494.
[111] H.X. Zhu, W.H. Zhou, X.Y. Jing, Z.Y. Yin, Observations on fabric evolution to a common micromechanical state at the soil-structure interface, Int. J. Numer. Anal. Meth. Geomech. 43 (15) (2019) 2449–2470.
[112] H. Zhu, W.H. Zhou, Z.Y. Yin, Deformation mechanism of strain localization in 2D numerical interface tests, Acta Geotech. 13 (3) (2018) 557–573.
[113] L. Zhu, Q. Liao, Z. Wang, J. Chen, Z. Chen, Q. Bian, Q. Zhang, Prediction of Soil Shear Strength Parameters Using Combined Data and Different Machine Learning Models, Appl. Sci. 12 (10) (2022) 5100.
[114] Y. Qiu, J. Zhou, Short-Term Rockburst Damage Assessment in Burst-Prone Mines: An Explainable XGBOOST Hybrid Model with SCSO Algorithm, Rock Mechanics and Rock Engineering (2023) 1–26. https://doi.org/10.1007/s00603-023-03522-w.
[115] Y. Qiu, J. Zhou, Short-term rockburst prediction in underground project: insights from an explainable and interpretable ensemble learning model, Acta Geotechnica (2023) 1–31. https://doi.org/10.1007/s11440-023-01988-0.
[116] J. Zhou, S. Huang, M. Tao, M. Khandelwal, Y. Dai, M. Zhao, Stability prediction of underground entry-type excavations based on particle swarm optimization and gradient boosting decision tree, Underground Space 9 (2023) 234–249.