Journal of Rock Mechanics and Geotechnical Engineering
Article info

Article history:
Received 16 April 2021
Received in revised form 6 June 2021
Accepted 20 June 2021
Available online 14 September 2021

Keywords:
Rockhead
Machine learning (ML)
Probabilistic model
Gradient boosting

Abstract

The spatial information of rockhead is crucial for the design and construction of tunneling or underground excavation. Although the conventional site investigation methods (i.e. borehole drilling) could provide local engineering geological information, the accurate prediction of the rockhead position with limited borehole data is still challenging due to its spatial variation and great uncertainties involved. With the development of computer science, machine learning (ML) has been proved to be a promising way to avoid subjective judgments by human beings and to establish complex relationships with mega data automatically. However, few studies have been reported on the adoption of ML models for the prediction of the rockhead position. In this paper, we proposed a robust probabilistic ML model for predicting the rockhead distribution using the spatial geographic information. The framework of the natural gradient boosting (NGBoost) algorithm combined with the extreme gradient boosting (XGBoost) is used as the basic learner. The XGBoost model was also compared with some other ML models such as the gradient boosting regression tree (GBRT), the light gradient boosting machine (LightGBM), the multivariate linear regression (MLR), the artificial neural network (ANN), and the support vector machine (SVM). The results demonstrate that the XGBoost algorithm, the core algorithm of the probabilistic N-XGBoost model, outperformed the other conventional ML models with a coefficient of determination (R²) of 0.89 and a root mean squared error (RMSE) of 5.8 m for the prediction of rockhead position based on limited borehole data. The probabilistic N-XGBoost model not only achieved a higher prediction accuracy, but also provided a predictive estimation of the uncertainty. Thus, the proposed N-XGBoost probabilistic model has the potential to be used as a reliable and effective ML algorithm for the prediction of rockhead position in rock and geotechnical engineering.

© 2021 Institute of Rock and Soil Mechanics, Chinese Academy of Sciences. Production and hosting by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
https://doi.org/10.1016/j.jrmge.2021.06.012
1674-7755 © 2021 Institute of Rock and Soil Mechanics, Chinese Academy of Sciences. Production and hosting by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
X. Zhu et al. / Journal of Rock Mechanics and Geotechnical Engineering 13 (2021) 1231–1245
Fig. 1. Boreholes along MRT project line and two-dimensional (2D) geological map of Singapore.
Pan et al., 2018). Although the Kriging interpolation method is widely used, its performance is lower than expected when the dataset is nonlinear or sparse (Qi et al., 2020a). Meanwhile, results interpreted by this approach could conflict significantly with engineers' geological knowledge of the site in a complicated case. Different from geological strata, the rockhead is normally distributed in the same formation. Due to weathering or other geological complications, the prediction of rockhead through Kriging interpolation based on limited borehole data is still challenging.

With the rapid development of artificial intelligence (AI), machine learning (ML) can provide a promising and effective way to deal with challenges in engineering prediction (Dixit et al., 2020; Fuentes et al., 2020; Huang et al., 2020; Zhang et al., 2021a, b; Zhao et al., 2021). A good ML system could reduce the cost of manpower and provide an accurate reference for decision-making by learning the inherent laws from a big dataset. For instance, a general regression neural network was introduced to present the spatial distribution of soil type using borehole data (Zhou et al., 2018). The method is able to predict a simple soil distribution in an area of 72 m × 40 m with only spatial coordinates. The support vector machine (SVM) method has also been applied for interpreting sparse geological information (Smirnoff et al., 2008), where the task is regarded as a pure classification problem, and a cross-validation procedure is conducted for describing findings from different training sets. In this case, SVM can be considered as a novel learning method for treating small data samples, especially when boreholes and cross-section data are limited. Wei et al. (2017) built a global spatial bedrock prediction model based on the random forest and gradient boosting tree algorithms, but the data among sparse global areas strongly impacted the precision of the proposed models in a local region. Qi et al. (2020b) employed polynomial regression, spline interpolation, one-dimensional (1D) spline regression, and Bayesian-based conditional random field algorithms to spatially predict the soil-rock interfaces. The results indicated that the spline regression method outperformed the other three algorithms. Nevertheless, the prediction was merely based on statistical methods, and attributes (e.g. location, topographical features) were not considered in the modeling process. Quantile regression forests (QRF) were used by Chen et al. (2020) in a spatial model to predict the soil thickness of loess deposits in central France, but the prediction accuracy was poor and only a mean coefficient of determination (R²) of 0.33 was achieved. Overall, most of the existing studies only adopted geostatistical methods or a single ML regressor as their core algorithms, and the prediction accuracy of those models still has room for improvement.

Chen and Guestrin (2016) proposed the extreme gradient boosting (XGBoost) model, which is a powerful scalable tree boosting ML framework and a sparsity-aware algorithm for sparse data. For efficient performance, XGBoost implements the architecture of gradient boosted decision trees, which could yield high accuracy in both classification and regression tasks. It has been applied in disease prediction (Budholiya et al., 2020; Davagdorj et al., 2020), gene expression prediction (Li et al., 2019), casualty prediction for terrorist attacks (Feng et al., 2020), industrial prediction (Zheng and Wu, 2019), and construction engineering (Zhao et al., 2019; Duan et al., 2020a; Zhang et al., 2020a, c, 2021a, b). However, the application of XGBoost in rockhead prediction with limited and sparse borehole data has not been reported so far.

Motivated by the increasing demand for underground development in Singapore, this study proposed a hybrid ML framework titled N-XGBoost based on the XGBoost and the natural gradient boosting (NGBoost) methods to improve the predictive accuracy of DTB. In this framework, XGBoost is used as the base learner of the NGBoost algorithm. Borehole data and local terrain parameters from a tunneling project in Singapore were chosen as the data source to train, validate, and evaluate the proposed ML framework. Other existing ML algorithms such as multivariate linear regression
(MLR), artificial neural network (ANN), SVM, gradient boosting regression tree (GBRT), and light gradient boosting machine (LightGBM) were also evaluated on the same dataset in this study for the purpose of comparison. The main contributions of this study include: (1) the proposal and first application of the XGBoost-based hybrid ML model for the accurate prediction of bedrock depth with a limited borehole dataset, and (2) the ability of the proposed model to provide not only accurate point predictions but also an estimate of the predictive uncertainty for reliable decision-making.

2. Geological conditions

Understanding the geological formation is necessary for construction and evaluation of the proposed ML model. From a regional-scale view, Singapore and its several smaller islands lie in the southern extension of the Malaysian Peninsula, with a total land area of about 650 km² (Sharma et al., 1999; Qi et al., 2020a). As shown in Fig. 1, the geological formation of Singapore contains three main parts: sedimentary rocks (Jurong formation, JF) in the west, igneous rocks (Bukit Timah granite, BTG) in the center, and quaternary deposits (Old Alluvium soils and soft soil deposits called Kallang formation, KF) in the east. BTG is the largest physiographic area of Singapore, which is characterized by hills and valleys of both high and low relief. Most of the hills in this area are less than 60 m in height; however, the granite near its western contact with JF formed steeper and more prominent hills, the highest of which rises to 163 m. The BTG is a general name for the acid rocks including granite, adamellite, granodiorite, and the acid and intermediate hybrids (Qi et al., 2020a). Due to the humid tropical climate in Singapore, the acid rocks in the BTG formation have been heavily weathered. The thickness of residual soils derived from weathered BTG ranges from a few meters up to 70 m and the average thickness is 30 m (Sharma et al., 1999; Wee and Zhou, 2009). To ensure safe underground development in Singapore, borehole investigations were carried out in the area of interest to investigate the geological conditions. Based on practical engineering experience in Singapore, the rock mass weathered to Grades I to III is classified as rock, whereas the rock mass weathered beyond Grade III is usually regarded as soil-like material (Sharma et al., 1999). The rockhead is normally regarded as the elevation between Grades III and IV in engineering practice (Qi et al., 2020a).

To support underground space development in Singapore, the Singapore Land Authority is working to develop a 3D map of subsurface utilities. In this case, GeM2S is established as a web-based 3D design tool for managing the shallow borehole data to present the subsurface formation for future underground projects in Singapore (Pan et al., 2020). However, in the 3D geological model construction process, interpolation methods like Kriging interpolation followed by expert justification could be time-consuming and introduce unexpected uncertainties when the geo-model is complex (Smirnoff et al., 2008). Therefore, it is desirable to utilize ML techniques to estimate the unseen geological information between boreholes and automatically update the 3D geological model when new data are obtained. In this paper, a hybrid ML
Fig. 2. XGBoost-based ML framework for spatial rockhead prediction. GSE: ground surface elevation; EVRS: explained variance regression score; MAE: mean absolute error.
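$$
\Omega(\theta_j) = \gamma T_j + \frac{1}{2}\lambda \lVert w \rVert^2 = \gamma T_j + \frac{1}{2}\lambda \sum_{k=1}^{T_j} \left[ w_k^{(j)} \right]^2 \tag{9}
$$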
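As a minimal illustration of Eq. (9), the regularization term of a single tree can be evaluated directly from its leaf weights. The sketch below assumes the standard XGBoost reading of the symbols, which is not spelled out in this part of the text: T_j is taken as the number of leaves of the j-th tree and w_k^{(j)} as the weight of its k-th leaf. The function name and the example weights are hypothetical.

```python
def regularization_term(leaf_weights, gamma, lam):
    """Evaluate Omega = gamma * T + 0.5 * lam * sum(w_k^2) for one tree.

    leaf_weights: weights w_k of the tree's leaves (assumed meaning of w).
    gamma: minimum loss reduction (penalty per leaf).
    lam: L2 regularization factor.
    """
    n_leaves = len(leaf_weights)  # T_j, assumed to be the leaf count
    l2_penalty = 0.5 * lam * sum(w * w for w in leaf_weights)
    return gamma * n_leaves + l2_penalty


# Hypothetical tree with three leaves.
omega = regularization_term([0.5, -1.0, 2.0], gamma=1.0, lam=2.0)
print(omega)
```

Larger values of gamma penalize every additional leaf, while larger values of lam shrink the leaf weights, which is consistent with the "more conservative" behaviour described for these two hyperparameters in Table 2.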
of LightGBM focuses on performance and scalability. Due to the similarity in base theory, the details of LightGBM are not presented here and can be found in Ke et al. (2017) and Liang et al. (2020).

4. Verification of the proposed method

4.1. Data source
Fig. 8. Distribution of DTB in this study: (a) Histogram of DTB, and (b) Distribution of DTB in boxplot.
It can further be seen that the number of DTBs lower than 33 m was small. Thus, the boreholes with DTB deeper than 33 m were recognized as outliers in the boxplot view in Fig. 8b. However, as a purely data-driven methodology, ML could be strongly affected by the outliers due to the imbalanced distribution of DTB in the original samples. To solve this problem, a new data preprocessing method called SMOGN was adopted.

Several studies have claimed that the thickness of the soil is likely to be related to the local terrain (Themistocleous et al., 2016; Wei et al., 2017; Simon et al., 2020). Therefore, both the borehole data from site investigation and the local terrain features (i.e. slope, aspect, and curvature) derived from the high-precision DEM of Singapore were utilized to create the dataset for the ML model established in this study. Table 1 presents the summary statistics of the dataset.

4.2. Data preprocessing

In this study, the observations of DTB, borehole locations, and local terrain features under the Singapore coordinate system generated a dataset which was then randomly divided into a training set (80% of the whole data) and a testing set (20% of the whole data).

In order to overcome the performance degradation problem caused by imbalanced data, Torgo et al. (2013) proposed the SMOTER algorithm, which changes the distribution of the given training dataset to balance the rare and the most frequent cases. Branco et al. (2017) further introduced Gaussian noise to SMOTER, i.e. SMOGN, for dealing with imbalanced regression problems where the most important cases to the user are poorly represented in the available data. SMOGN generates new synthetic examples with SMOTER only when the seed example and the k-nearest neighbor (KNN) selected are 'close enough', and uses the introduction of Gaussian noise when the two examples are 'more distant'. As shown in Fig. 9, the key idea of the SMOGN algorithm is to generate new synthetic samples with the three nearest neighbors of the seed case, which are supposed to have similar DTBs and local terrain features (e.g. slope and aspect). Therefore, SMOGN was adopted to oversample the rare data points (DTB < 33 m) in this study to help improve the robustness of the ML model for predicting a deeper DTB. More details on the SMOTER and SMOGN algorithms can be found in Torgo et al. (2013) and Branco et al. (2017).

As a common data preprocessing method, normalization not only enhances the overall predictive performance of ML models, but also improves the computing efficiency (Pu et al., 2019; Yu et al., 2020; Zhang et al., 2021a). In this study, min-max normalization was adopted to convert the dataset to a range from 0 to 1:

$$
f_i'(k) = \frac{f_i(k) - \min_i f_i(k)}{\max_i f_i(k) - \min_i f_i(k)} \tag{10}
$$

where $i$ is the sample index, and $f_i(k)$ denotes the i-th sample in the k-th feature domain.

The innovative idea of the SMOGN method is to oversample the minority in the training data to improve the predictive ability of the ML model, choosing between two oversampling techniques according to the KNN distances in feature space around a given observation (Branco et al., 2017). If the neighbor is close enough to the given observation, SMOTER is applied. If the distance is too large, Gaussian noise is introduced to oversample.
Table 1
Summary statistics of dataset.
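The random 80/20 split and the min-max normalization of Eq. (10) described in Section 4.2 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the random seed, the rounding of the test-set size, and the handling of a constant feature (mapped to 0) are choices made here and are not taken from the paper.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle sample indices and split them into (train, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)  # rounding rule is an assumption
    return indices[n_test:], indices[:n_test]


def min_max_normalize(rows):
    """Scale every feature (column) of `rows` to [0, 1] following Eq. (10)."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant feature has max == min; it is mapped to 0.0 here (assumption).
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, mins, maxs)
        ])
    return scaled


# 502 boreholes, as reported for this study; the feature rows are made up.
train_idx, test_idx = train_test_split_indices(502)
features = [[10.0, 0.0], [20.0, 5.0], [30.0, 10.0]]
print(len(train_idx), len(test_idx))
print(min_max_normalize(features))
```

Whether the minimum and maximum are taken over the whole dataset or over the training set only is not stated in the text; the sketch simply normalizes whatever rows it is given.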
4.3. Model establishment

As presented above, the primary objective of this study is to build a novel AI method for predicting the rockhead elevation in engineering practice based on the XGBoost model and the NGBoost probabilistic prediction algorithm, called the N-XGBoost methodology. Accordingly, borehole data together with local ground surface parameters such as slope, aspect, and curvature were prepared as predictors. The rockhead elevation was carefully identified by experts from the borehole logs and was regarded as the target variable. Data from 502 boreholes were randomly divided into two parts for training and testing purposes: 80% of the total samples were used for training the N-XGBoost model with a ten-fold cross-validation strategy, whereas the remaining 20% were used for testing the precision of the developed N-XGBoost model. To overcome the bias caused by rare samples in regression, a preprocessing method called SMOGN was introduced before training in this study to improve the predictive capabilities of N-XGBoost in a larger space. Fig. 10 shows the flowchart of the N-XGBoost model in this study.

In the developed hybrid N-XGBoost model, the XGBoost algorithm was introduced to the NGBoost probabilistic prediction algorithm as the base learner. To establish the model, an initial XGBoost model was first fitted to the training dataset. Meanwhile, the four key hyperparameters of the XGBoost model were chosen after trial and error, and were further optimized by the Bayesian optimization algorithm. With the optimal XGBoost model, the predictive accuracy can be enhanced to some extent.

For comparison, popular ML models like MLR, MLP-ANN, and SVM were also trained on the same training dataset. All the developed ML models were evaluated by indices like R², MAE, and RMSE on the same testing dataset. R² represents the correlation and goodness of fit between the predicted and observed values. MAE is a measurement of the average error over all predictions. RMSE is widely applied when sensible error estimation is required. In rockhead elevation prediction, RMSE and MAE are in meters. EVRS was also used to evaluate the variance explained by the model; the higher the EVRS, the more variance the model explains.

For n target values, the statistical criteria stated above can be calculated by

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} \tag{11}
$$

$$
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{12}
$$

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{13}
$$

$$
\mathrm{EVRS} = \left[ 1 - \frac{\mathrm{var}(\hat{y}_i - y_i)}{\mathrm{var}(y_i)} \right] \times 100 \tag{14}
$$

where $\bar{y}$ is the mean observed value; and $y_i$ and $\hat{y}_i$ are the i-th observed and predicted values, respectively.

4.3.1. SMOGN in preprocessing

Fig. 11a shows the distribution of rockhead elevation of the original training data in this study. It demonstrates that the samples deeper than 33 m are so rare that they can be considered as abnormal points in the boxplot. To overcome the data imbalance problem, SMOGN was adopted to change the distribution of minority samples as shown in Fig. 11b. The benefits of SMOGN in improving the performance of ML models will be presented in Section 4.4. Additionally, to reduce the effects of different feature scales on the performance of the ML models, the training dataset should also be normalized by Eq. (10).

4.3.2. Training ML models

As mentioned in Section 3.3.1, an initial XGBoost model as the N-XGBoost base learner was developed with four important hyperparameters in this study: max tree depth ($d_{\max}$), learning rate ($\alpha$), minimum loss reduction ($\gamma$) and L2 regularization factor ($\lambda$). The optimum values of these four parameters were selected by Bayesian optimization. Bayesian optimization is widely used to search for the minimum of an objective function by building an alternative (surrogate) function from the evaluation results. It becomes a powerful tool when the objective function is unknown and the operation is complex (Zhou et al., 2018), therefore eliminating plenty of wasted effort. Other hyperparameters were set to their default values. Table 2 shows the results of hyperparameter tuning. All the other ML models were also trained with the best hyperparameter sets obtained by Bayesian optimization based on the same training dataset.

After the optimum hyperparameters were set, the training dataset was used to fit the ML models described above. The performance of each ML model was evaluated by the ten-fold cross-validation method. In the training stage, the developed XGBoost model was systematically compared with the three conventional ML models (MLR, MLP-ANN, and SVM), GBRT, and LightGBM. Fig. 12 shows the comparison results with respect to the predictive accuracy and robustness under ten-fold cross-validation. It demonstrates that the developed XGBoost model, with an average R² of 0.895, achieved a better performance than GBRT and LightGBM, and outperformed the other three ML models significantly as well. Meanwhile, the R² curves for the three conventional ML models (MLR, MLP-ANN, and SVM) indicate that they have poor robustness across different training data subsets.

Fig. 13 illustrates the accuracy of the predictive values by the different ML algorithms for estimating the rockhead position (DTB). Overall, the tree-based ML models (LightGBM, GBRT, and XGBoost) have a higher prediction accuracy than SVM, MLR, and MLP-ANN for determining the DTB values with limited data. The developed XGBoost model achieved the highest R² and lowest RMSE among those ML models.

4.4. Model evaluation

The performances of the developed XGBoost model and the other five ML models (MLR, MLP-ANN, SVM, LightGBM, and GBRT) in predicting the rockhead position (DTB) were evaluated on both the training and testing datasets by the four indicators (R², RMSE, MAE, and EVRS). Fig. 14 shows the predictive results of the testing dataset using different ML models, which demonstrates that the tree-based models are more suitable than other conventional ML models to predict the DTB based on sparse borehole data. The performances of different models with or without SMOGN preprocessing are presented in Table 3. The developed XGBoost model achieved the best performance among all the ML models in both training and testing datasets. It can also be concluded that the SMOGN preprocessing method could help improve the overall predictive performance by oversampling the rare samples distributed below 30 m.

Based on the results shown in Table 3, the developed XGBoost model was selected to be combined with the NGBoost algorithm to make probabilistic predictions.
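The four indicators of Eqs. (11) to (14) and the ten-fold partition used during training can be written out in plain Python as follows. This is a sketch for illustration only: it uses the population variance in EVRS and contiguous, unshuffled folds, both of which are assumptions, and the function names and sample values are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _variance(values):
    """Population variance (dividing by n), assumed for Eq. (14)."""
    m = _mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def r2_score(y_true, y_pred):
    """Coefficient of determination, Eq. (11)."""
    y_bar = _mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error, Eq. (12)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, Eq. (13)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def evrs(y_true, y_pred):
    """Explained variance regression score in percent, Eq. (14)."""
    errors = [p - y for y, p in zip(y_true, y_pred)]
    return (1.0 - _variance(errors) / _variance(y_true)) * 100.0


def k_fold_indices(n_samples, k=10):
    """Yield (train, validation) index lists for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


# Hypothetical observed and predicted rockhead values (m).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r2_score(observed, predicted), mae(observed, predicted))
print(rmse(observed, predicted), evrs(observed, predicted))
```

When the prediction errors have zero mean, EVRS equals 100 times R², which is why the two columns in Table 3 track each other closely.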
Fig. 11. Preprocessing results: (a) Before SMOGN, and (b) After SMOGN.
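The two-branch logic of SMOGN described in Section 4.2 (SMOTER interpolation for a "close enough" neighbor, Gaussian noise otherwise) can be illustrated with a deliberately simplified sketch. It is not the reference implementation of Branco et al. (2017): the closeness threshold (half the median neighbor distance), the noise scale, the one-synthetic-sample-per-seed policy, and all names and data below are assumptions made for illustration.

```python
import math
import random
import statistics


def _distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smogn_like_oversample(X, y, rare_mask, k=3, perturbation=0.02, seed=0):
    """Create one synthetic sample per rare seed case (simplified SMOGN idea)."""
    rng = random.Random(seed)
    rare = [i for i, flag in enumerate(rare_mask) if flag]
    if len(rare) < 2:
        return [], []  # no neighbor available to oversample from
    n_features = len(X[0])
    # Spread of the rare cases, used to scale the Gaussian noise.
    feature_std = [statistics.pstdev([X[i][f] for i in rare]) for f in range(n_features)]
    target_std = statistics.pstdev([y[i] for i in rare])
    new_X, new_y = [], []
    for i in rare:
        # k nearest rare neighbors of the seed case, as (distance, index) pairs.
        neighbours = sorted((_distance(X[i], X[j]), j) for j in rare if j != i)[:k]
        # "Close enough" threshold: half the median neighbor distance (assumption).
        safe_distance = statistics.median([d for d, _ in neighbours]) / 2.0
        d, j = rng.choice(neighbours)
        if d < safe_distance:
            # SMOTER branch: interpolate between the seed and the neighbor.
            t = rng.random()
            x_new = [a + t * (b - a) for a, b in zip(X[i], X[j])]
            d_seed, d_nb = _distance(x_new, X[i]), _distance(x_new, X[j])
            total = d_seed + d_nb
            # Target is a distance-weighted average of the two targets.
            y_new = y[i] if total == 0 else (d_nb * y[i] + d_seed * y[j]) / total
        else:
            # Gaussian-noise branch: perturb the seed case itself.
            x_new = [a + rng.gauss(0.0, perturbation * s) for a, s in zip(X[i], feature_std)]
            y_new = y[i] + rng.gauss(0.0, perturbation * target_std)
        new_X.append(x_new)
        new_y.append(y_new)
    return new_X, new_y


# Hypothetical features (two terrain attributes) and targets; the last four are "rare".
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.1, 10.0], [12.0, 12.0], [15.0, 9.0]]
y = [5.0, 6.0, 7.0, 35.0, 36.0, 40.0, 45.0]
rare_mask = [False, False, False, True, True, True, True]
synthetic_X, synthetic_y = smogn_like_oversample(X, y, rare_mask)
print(len(synthetic_X), len(synthetic_y))
```

The default k=3 mirrors the three nearest neighbors mentioned in the text; how the rare cases are flagged (here an explicit mask) would in practice follow the relevance threshold chosen for DTB.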
Table 2
Hyperparameters tuning for XGBoost in this study.

| Hyperparameter | Default value | Search range | Description | Optimal value |
|---|---|---|---|---|
| Learning rate, α | 0.3 | [0.01, 0.3] | The weight of each step is tuned to improve the robustness of the model | 0.098 |
| Max tree depth, d_max | 6 | [3, 10] | Maximum depth of a tree. Increasing this value will make the model more complex and more likely to overfit | 8 |
| Minimum loss reduction, γ | 0 | [0.1, 5] | Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger γ is, the more conservative the algorithm will be | 0.012 |
| L2 regularization factor, λ | 1 | [0.1, 20] | Increasing this value will make model more conservative | 6.996 |
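For readers who want to reuse the tuned values in Table 2, the sketch below collects them in a dictionary keyed by the parameter names of the XGBoost scikit-learn interface (`learning_rate`, `max_depth`, `gamma`, `reg_lambda`). The mapping from the symbols in Table 2 to these names is an assumption made here, and the helper function is hypothetical; the commented line only indicates how such a dictionary is typically passed to a regressor.

```python
# Tuned values from Table 2, keyed by (assumed) XGBoost scikit-learn parameter names.
tuned_params = {
    "learning_rate": 0.098,  # learning rate, alpha
    "max_depth": 8,          # max tree depth, d_max
    "gamma": 0.012,          # minimum loss reduction, gamma
    "reg_lambda": 6.996,     # L2 regularization factor, lambda
}

# Search ranges as listed in Table 2 (inclusive bounds).
search_space = {
    "learning_rate": (0.01, 0.3),
    "max_depth": (3, 10),
    "gamma": (0.1, 5),
    "reg_lambda": (0.1, 20),
}


def outside_search_range(params, space):
    """Return the names of parameters whose value lies outside its listed range."""
    return [
        name for name, value in params.items()
        if not space[name][0] <= value <= space[name][1]
    ]


print(outside_search_range(tuned_params, search_space))

# Typical use with the XGBoost scikit-learn wrapper (not executed here):
# model = xgboost.XGBRegressor(**tuned_params)
```

As tabulated, the reported optimum for γ (0.012) lies below the lower bound of its listed range (0.1), so the check flags `gamma`; this may reflect a typesetting or extraction issue in the table rather than the tuning itself.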
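Table 3
Comparison of the predictive models both in training and testing datasets.

| Preprocessing | Model | R² (training) | R² (testing) | RMSE (m, training) | RMSE (m, testing) | MAE (m, training) | MAE (m, testing) | EVRS (%, training) | EVRS (%, testing) |
|---|---|---|---|---|---|---|---|---|---|
| With SMOGN | MLR | 0.765 | 0.79 | 8.28 | 7.99 | 6.53 | 6.32 | 76.5 | 79 |
| With SMOGN | MLP-ANN | 0.832 | 0.822 | 7.02 | 7.35 | 5.34 | 5.83 | 83.2 | 82.3 |
| With SMOGN | SVM | 0.825 | 0.816 | 7.15 | 7.47 | 5.04 | 5.66 | 82.9 | 81.8 |
| With SMOGN | LightGBM | 0.965 | 0.856 | 3.19 | 6.62 | 2.32 | 4.94 | 96.5 | 86 |
| With SMOGN | GBRT | 0.954 | 0.862 | 2.63 | 6.47 | 2.76 | 4.83 | 95.5 | 86.7 |
| With SMOGN | XGBoost | 0.985 | 0.889 | 2.06 | 5.81 | 1.45 | 4.35 | 98.5 | 89.2 |
| Without SMOGN | MLR | 0.646 | 0.632 | 7.57 | 7.88 | 5.99 | 5.87 | 64.6 | 63.7 |
| Without SMOGN | MLP-ANN | 0.646 | 0.655 | 7.57 | 7.63 | 6.01 | 5.63 | 64.6 | 66.9 |
| Without SMOGN | SVM | 0.705 | 0.659 | 6.91 | 7.59 | 5.23 | 5.59 | 70.6 | 67.1 |
| Without SMOGN | LightGBM | 0.903 | 0.74 | 3.97 | 6.63 | 2.96 | 4.97 | 90.3 | 74 |
| Without SMOGN | GBRT | 0.891 | 0.805 | 4.2 | 5.74 | 3.06 | 4.39 | 89.1 | 80.6 |
| Without SMOGN | XGBoost | 0.981 | 0.812 | 1.76 | 5.64 | 1.27 | 4.43 | 98.1 | 81.4 |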
Fig. 15. Plots of the predictive capability of the hybrid N-XGBoost model with SMOGN: (a) Prediction with uncertainty estimation of test data along the MRT line, and (b) Prediction
of training data along the MRT line.
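A probabilistic model of this kind returns a predictive distribution for each location rather than a single value, and uncertainty bands such as those in Fig. 15a can be derived from it. The sketch below assumes that each prediction is a normal distribution described by a mean and a standard deviation; the distribution family is not stated in this part of the text, so this is an assumption, and the function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def prediction_interval(mean, std, level=0.95):
    """Central prediction interval of a Normal(mean, std) predictive distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - z * std, mean + z * std


def gaussian_nll(y, mean, std):
    """Negative log-likelihood of an observation under Normal(mean, std)."""
    return 0.5 * math.log(2.0 * math.pi * std ** 2) + (y - mean) ** 2 / (2.0 * std ** 2)


def coverage(y_true, means, stds, level=0.95):
    """Fraction of observations that fall inside their prediction interval."""
    hits = 0
    for y, m, s in zip(y_true, means, stds):
        lower, upper = prediction_interval(m, s, level)
        hits += lower <= y <= upper
    return hits / len(y_true)


# Hypothetical predictive means and standard deviations (m) for two boreholes.
observed = [-18.0, -40.0]
means = [-20.0, -20.0]
stds = [5.0, 5.0]
print(prediction_interval(means[0], stds[0]))
print(coverage(observed, means, stds))
```

Comparing the empirical coverage with the nominal level is one simple way to check whether the estimated uncertainty is too narrow or too wide on the test boreholes.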
coordinate y were the top three features that most affected the predictive results of rockhead position in this study. The reason may be that the variation of DTB is greatly affected by the spatial coordinates and GSE in the study area.

Although the proposed model obtains desirable predictive results, there are some limitations that need to be addressed in future studies:

(1) Because the predictive performance of ML is greatly affected by the number and quality of the observation dataset, increasing the number of high-quality borehole samples, as well as the available features, would help to improve the prediction accuracy.
(2) Since the borehole information is regarded as discrete data samples in this study, the prediction of rockhead is strongly related to the current GSE and limited spatial relationships among rockhead points. Some other features not detected may also have influences on the predictive results, such as seismic velocity of rock, mechanical parameters of rock samples, and rock quality. To further improve the performance of the proposed model, these continuous features can be included in the training model.
(3) The zone of interest is line-type in this study. The performance of the proposed model in a more complex shape of area (e.g. rectangular, circular, and irregular) needs to be further validated.

Fig. 16. Plots of the predictive capability of the hybrid N-XGBoost model without SMOGN: (a) Prediction with uncertainty estimation of test data along the MRT line, and (b) Prediction of training data along the MRT line.

Fig. 17. Feature importance ranking.

Declaration of competing interest

The authors wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Acknowledgments

References

Adepelumi, A.A., Fayemi, O., 2012. Joint application of ground penetrating radar and electrical resistivity measurements for characterization of subsurface stratigraphy in Southwestern Nigeria. J. Geophys. Eng. 9 (4), 397–412.
Ahmadi, M.A., 2015. Developing a robust surrogate model of chemical flooding based on the artificial neural network for enhanced oil recovery implications. Math. Probl. Eng. https://doi.org/10.1155/2015/706897.
Bačić, M., Librić, L., Kaćunić, D.J., Kovačević, M.S., 2020. The usefulness of seismic surveys for geotechnical engineering in karst: some practical examples. Geosciences 10 (10), 406.
Branco, P., Torgo, L., Ribeiro, R.P., 2017. SMOGN: a pre-processing approach for imbalanced regression. Proc. Mach. Learn. Res. 74, 36–50.
Bressan, T.S., de Souza, M.K., Girelli, T.J., Junior, F.C., 2020. Evaluation of machine learning methods for lithology classification using geophysical data. Comput. Geosci. 139.
Budholiya, K., Shrivastava, S.K., Sharma, V., 2020. An optimized XGBoost based diagnostic system for effective prediction of heart disease. J. King Saud Univer. - Comput. Inform. Sci. https://doi.org/10.1016/j.jksuci.2020.10.013.
Chen, T.Q., Guestrin, C., 2016. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference, pp. 785–794.
Chen, S.C., Richer-De-Forges, A.C., Mulder, V.L., Martelet, G., Loiseau, T., Lehmann, S., Arrouays, D., 2020. Digital mapping of the soil thickness of loess deposits over a calcareous bedrock in central France. Catena 198. https://doi.org/10.1016/j.catena.2020.105062.
Cho, E., Jacobs, J.M., Jia, X., Kraatz, S., 2019. Identifying subsurface drainage using satellite big data and machine learning via google earth engine. Water Resour. Res. 55 (10), 8028–8045.
Cremasco, D., 2013. Estimating Depth to Bedrock in Weathered Terrains Using Ground Penetrating Radar: A Case Study in the Adelaide Hills. BSc Thesis. University of Adelaide.
Davagdorj, K., Pham, V.H., Theera-Umpon, N., Ryu, K.H., 2020. XGBoost-based framework for smoking-induced noncommunicable disease prediction. Int. J. Environ. Res. Publ. Health 17 (18).
Dixit, N., Mccolgan, P., Kusler, K., 2020. Machine learning-based probabilistic lithofacies prediction from conventional well logs: a case from the umiat oil field of Alaska. Energies 13 (18).
Du, Y., Xu, P., Ling, S., Tian, B., You, Z., Zhang, R., 2019. Determining the soil-bedrock interface and fracture-zone scope in the central urban area of the Jinan city, China, by using microtremor signals. J. Geophys. Eng. 16 (4), 680–689.
Duan, J., Asteris, P.G., Nguyen, H., Bui, X.N., Moayedi, H., 2020a. A novel artificial intelligence technique to predict compressive strength of recycled aggregate concrete using ICA-XGBoost model. Eng. Comput. https://doi.org/10.1007/s00366-020-01003-0.
Duan, T., Anand, A., Ding, D.Y., Thai, K.K., Basu, S., Ng, A., Schuler, A., 2020b. NGBoost: natural gradient boosting for probabilistic prediction. In: Proceedings of the 37th International Conference on Machine Learning, vol. 119, pp. 2690–2700.
Feng, Y., Wang, D., Yin, Y., Li, Z., Hu, Z., 2020. An XGBoost-based casualty prediction method for terrorist attacks. Complex Intell. Syst. 6 (3), 721–740.
Fuentes, I., Padarian, J., Iwanaga, T., Willem Vervoort, R., 2020. 3D lithological mapping of borehole descriptions using word embeddings. Comput. Geosci. 141.
Gao, L., Ding, Y., 2020. Disease prediction via Bayesian hyperparameter optimization and ensemble learning. BMC Res. Notes 205 (13), 1–6.
Huang, H.W., Zhao, S., Zhang, D.M., Chen, J.Y., 2020. Deep learning-based instance segmentation of cracks from shield tunnel lining images. Struct. Infrastruct. Eng. https://doi.org/10.1080/15732479.2020.1838559.
Ke, G.L., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., Liu, T.Y., 2017. LightGBM: a highly efficient gradient boosting decision tree. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (Eds.), Proceedings of the NIPS2017.
Lawal, A.I., Kwon, S., 2020. Application of artificial intelligence to rock mechanics: an overview. J. Rock Mech. Geotech. Eng. 13 (1), 248–266.
Li, W., Yin, Y., Quan, X., Zhang, H., 2019. Gene expression value prediction based on XGBoost algorithm. Front. Genet. 10. https://doi.org/10.3389/fgene.2019.01077.
Liang, W., Luo, S., Zhao, G., Wu, H., 2020. Predicting hard rock pillar stability using GBDT, XGBoost, and LightGBM algorithms. Mathematics 8 (5).
Moon, S.W., Subramaniam, P., Zhang, Y., Vinoth, G., Ku, T., 2019. Bedrock depth evaluation using microtremor measurement: empirical guidelines at weathered granite formation in Singapore. J. Appl. Geophys. 171.
Nath, R.R., Kumar, G., Sharma, M.L., Gupta, S.C., 2018. Estimation of bedrock depth
Simon, A., Geitner, C., Katzensteiner, K., 2020. A framework for the predictive mapping of forest soil properties in mountain areas. Geoderma 371.
Smirnoff, A., Boisvert, E., Paradis, S.J., 2008. Support vector machine for 3D modelling from sparse geological information of various origins. Comput. Geosci. 34 (2), 127–143.
Svozil, D., Kvasnička, V., Pospíchal, J., 1997. Introduction to multi-layer feed-forward neural networks. Chemometr. Intell. Lab. Syst. 39, 43–62.
Thanh, H.V., Sugai, Y., Nguele, R., Sasaki, K., 2019. Integrated workflow in 3D geological model construction for evaluation of CO2 storage capacity of a fractured basement reservoir in Cuu Long Basin, Vietnam. Int. J. Greenh. Gas Control 90. https://doi.org/10.1016/j.ijggc.2019.102826.
Themistocleous, K., Hadjimitsis, D.G., Michaelides, S., Papadavid, G., Kavoura, K., Konstantopoulou, M., Kyriou, A., Nikolakopoulos, K.G., Sabatakakis, N., Depountis, N., 2016. 3D subsurface geological modeling using GIS, remote sensing, and boreholes data. Proceedings of Fourth International Conference on Remote Sensing and Geoinformation of the Environment. RSCy2016.
Torgo, L., Ribeiro, R.P., Pfahringer, B., Branco, P., 2013. SMOTE for regression. Progress in Artificial Intelligence, pp. 378–389.
Vapnik, V., Cortes, C., 1995. Support-vector networks. Mach. Learn. 20, 273–297.
Wang, Y., Sherry Ni, X., 2019. A xgboost risk model via feature selection and Bayesian hyper-parameter optimization. Int. J. Database Manag. Syst. 11 (1), 1–17.
Wang, L., Wu, C., Tang, L., Zhang, W., Lacasse, S., Liu, H., Gao, L., 2020. Efficient reliability analysis of earth dam slope stability using extreme gradient boosting method. Acta Geotech. 15 (11), 3135–3150.
Wee, L.K., Zhou, Y., 2009. Geology of Singapore, second ed. Defence Science and Technology Agency, Singapore.
Wei, S., Hengl, T., Mendes De Jesus, J., Yuan, H., Dai, Y., 2017. Mapping the global depth to bedrock for land surface modeling. J. Adv. Model. Earth Syst. 9 (1), 65–88.
Young, D.S., 2017. Handbook of Regression Methods. CRC Press.
Yu, X., Xu, Y., 2015. A methodology for automatically 3D geological modeling based on geophysical data grids. Proceedings of 2015 8th International Conference on Intelligent Computation Technology and Automation (ICICTA), pp. 40–43.
Yu, H., Chen, G., Gu, H., 2020. A machine learning methodology for multivariate pore-pressure prediction. Comput. Geosci. 143.
Zhang, W., Zhang, R., Wu, C., Goh, A.T.C., Lacasse, S., Liu, Z., Liu, H., 2020a. State-of-the-art review of soft computing applications in underground excavations. Geosci. Front. 11, 1095–1106.
Zhang, X., Zhang, Y., Xu, L., Zhang, J., Tian, Y., Wang, S., Li, Z., 2020b. Urban geological 3D modeling based on papery borehole log. ISPRS Int. J. Geo-Inf. 9 (6).
Zhang, W., Zhang, R., Wu, C., Goh, A.T.C., Wang, L., 2020c. Assessment of basal heave stability for braced excavations in anisotropic clay using extreme gradient boosting and random forest regression. Undergr. Space. https://doi.org/10.1016/j.undsp.2020.03.001.
Zhang, W., Wu, C., Zhong, H., Li, Y., Wang, L., 2021a. Prediction of undrained shear strength using extreme gradient boosting and random forest based on Bayesian optimization. Geosci. Front. 12 (1), 469–477.
Zhang, W., Li, H., Li, Y., Liu, H., Chen, Y., Ding, X., 2021b. Application of deep learning algorithms in geotechnical engineering: a short critical review. Artif. Intell. Rev. https://doi.org/10.1007/s10462-021-09967-1.
Zhao, J., Shi, M., Hu, G., Song, X., Zhang, C., Tao, D., Wu, W., 2019. A data-driven framework for tunnel geological-type prediction based on TBM operating data. IEEE Access 7, 66703–66713.
Zhao, S., Shadabfar, M., Zhang, D., Chen, J., Huang, H., 2021. Deep learning-based classification and instance segmentation of leakage-area and scaling images of shield tunnel linings. Struct. Contr. Health Monit. 28 (6).
for a part of Garhwal Himalayas using two different geophysical techniques. Zheng, H., Wu, Y., 2019. A XGBoost model with weather similarity analysis and
Geosci. Lett. 5 (1). feature engineering for short-term wind power forecasting. Appl. Sci. 9 (15).
Olive, D.J., 2017. Linear Regression, 1rd ed. Springer, USA. Zhou, W.H., Zhao, L.S., Chen, G.M., Yuen, K.V., 2018. 3D geologic modelling with borehole
Pan, X., Guo, W., Aung, Z., Nyo, A.K., Chiam, K., Wu, D., Chu, J., 2018. Procedure for data by general regression neural network. Proceedings of the 6th International
establishing a 3D geological model for Singapore. Proceedings of the Geo- Symposium on Reliability Engineering and Risk Management (6ISRERM).
Shanghai 2018 International Conference: Transportation Geotechnics and Zhu, L.F., Zhang, C.J., Li, M.J., Pan, X., Sun, J.Z., 2012. Building 3D solid models of
Pavement Engineering, pp. 81e89. sedimentary stratigraphic systems from borehole data: an automatic method
Pan, X., Chu, J., Aung, Z., Chiam, K., Wu, D., 2020. 3D geological modelling: a case and case studies. Eng. Geol. 127, 1e13.
study for Singapore. Information Technology in Geo-Engineering, pp. 161e167.
Prion, S.K., Haerling, K.A., 2020. Making sense of methods and measurements:
simple linear regression. Clin. Simul. Nurs. 48, 94e95. Dr. Xing Zhu is Postdoc Research Fellow at Nanyang
Pu, Y., Apel, D.B., Liu, V., Mitri, H., 2019. Machine learning methods for rockburst University of Technology, Singapore. His research interests
prediction-state-of-the-art review. Int. J. Min. Sci. Technol. 29 (4), 565e570. are in application of Artificial Intelligent (AI) in engi-
Qi, X.H., Pan, X., Chiam, K., Lim, Y.S., Lau, S.G., 2020a. Comparative spatial pre- neering geology, Wireless Sensor Network, Geological In-
dictions of the locations of soil-rock interface. Eng. Geol. 272. formation and Data Mining. He is also Associate Professor
Qi, X.H., Wang, H., Pan, X.H., Chu, J., Chiam, K., 2020b. Prediction of interfaces of at Chengdu University of Technology, China, where he
geological formations using the multivariate adaptive regression spline method. received his PhD in Geotechnical Engineering in 2014.
Undergr. Space 6 (3), 252e266.
Rodriguez-Galiano, V., Sanchez-Castillo, M., Chica-Olmo, M., Chica-Rivas, M., 2015.
Machine learning predictive models for mineral prospectivity: an evaluation of
neural networks, random forest, regression trees and support vector machines.
Ore Geol. Rev. 71, 804e818.
Sharma, J.S., Chu, J., Zhao, J., 1999. Geological and Geotechnical features of
Singapore: an overview. Tunn. Undergr. Space Technol. 14 (4), 419e431.