
EVALUATION OF MACHINE LEARNING ALGORITHMS FOR THE

CLASSIFICATION OF LITHOLOGY USING GEOPHYSICAL LOGS


O. Atita¹, R. Durrheim¹, E. Saffou¹
¹ University of the Witwatersrand

Summary
Classification of subsurface formation lithology from well log data is a significant task in geoscience,
petroleum exploration and engineering. Several machine learning algorithms have been implemented
for lithology classification to improve prediction accuracy. However, because of complex geological
conditions, such algorithms have rarely been adopted for mineral deposits. In this paper, we evaluate
three popular machine learning algorithms: the Support Vector Machine (SVM), Random Forest and
Gradient Boosting Decision Tree (GBDT). Grid search and 10-fold cross-validation were used to
optimize the hyperparameters of each model and avoid overfitting. The performance of each model is
evaluated using the accuracy, precision, recall and F1-score of the predicted lithology labels against
the true labels. The results show that the GBDT model has the best lithology classification
performance of the three models, with a precision of 97.74%, recall of 98.67% and F1-score of 98.20%.
The interpretation of the GBDT model shows that the order of features contributing to the lithology
classification is Vp > density > Vs > natural gamma. The study reveals that the GBDT model can
provide significant information for further exploration targeting of deep mineral deposits.

Near Surface Geoscience Conference & Exhibition 2022



Introduction

Classification of subsurface formation lithological units was traditionally conducted by describing and
analysing recovered core samples and by examining cuttings retrieved during drilling operations. These
traditional methods are the most direct and practical. However, they are not always suitable because
coring is costly and core recovery is sometimes incomplete. In addition, because of complex geological
conditions, different geologists may give different interpretations, leading to uncertainty (Benaouda
et al. 1999; Xie et al. 2018). Downhole geophysical logging has been adopted to address these
challenges. Downhole geophysical log data provide high vertical resolution and good continuity of
in-situ information. Log signatures from the parts of the borehole with core recovery can also be used
to interpret the parts of the borehole with core loss (Benaouda et al. 1999; Xie et al. 2018). Geophysical
logs are therefore a significant source of subsurface rock information. However, the relationship
between geophysical logging signatures and formation lithology is often complex.

Recently, several machine learning algorithms have been used to classify lithology using geophysical
log data. These algorithms help geoscientists to handle the non-linear relationship between
geophysical logging signatures and subsurface lithologies and so improve classification accuracy
(Xie et al. 2018). The aim of this research is to evaluate machine learning algorithms for the
classification of formation lithological units using geophysical logs from a gold deposit, and to compare
the performance of three popular optimized machine learning algorithms on the basis of
computational training time and classification accuracy.

The study site is at Moab Khotsong gold mine in the Klerksdorp gold field region on the northwest
border of the Witwatersrand Basin (Figure 1). The Witwatersrand Basin is unique and of great
importance to geoscientists and explorationists. It is the oldest well-preserved, laterally extensive
sedimentary succession in the world and hosts rich gold-bearing conglomerate beds (Frimmel
and Minter, 2002). The geological structure of Moab Khotsong gold mine is complex, with a series of
faults and intrusives cutting across igneous and metasedimentary rocks. Presently, mining at Moab
Khotsong is approaching a depth of about 4 km. Boreholes were drilled under the auspices of the
International Continental Scientific Drilling Program project DSeis to investigate the nature of the fault
zone that hosted an M5.5 earthquake in August 2014 (Ogasawara et al. 2017).

Figure 1 Geological map of the Witwatersrand Basin, South Africa. The location of Moab Khotsong
gold mine is denoted by the blue circle on the enlarged map (modified after Dankert and Hein, 2010).



Method

Random Forest method


The Random Forest method is an ensemble supervised machine learning algorithm, first proposed by
Breiman, that constructs many decorrelated decision trees independently and at random and
combines their predictions by averaging or majority vote (Breiman, 2001).
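
The idea of independently built random trees whose predictions are combined can be illustrated with a minimal sketch. It is not the implementation used in this study: each "tree" is reduced to a single-split stump, the feature values and the two lithology labels are invented toy data, and all function names are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature):
    """One-split tree on a single feature; each side predicts its majority label."""
    best = None
    values = sorted({row[feature] for row in rows})
    for threshold in values[:-1]:
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # the bootstrap sample held a single value for this feature
        only = majority(labels)
        return feature, values[0], only, only
    return (feature,) + best[1:]

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw row indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Choosing the feature at random decorrelates the trees.
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = [left if row[feature] <= threshold else right
             for feature, threshold, left, right in forest]
    return majority(votes)

# Invented, already-scaled toy samples with two features per sample.
rows = [(0.10, 0.20), (0.15, 0.10), (0.20, 0.15), (0.12, 0.18),
        (0.80, 0.90), (0.85, 0.95), (0.90, 0.80), (0.95, 0.85)]
labels = ["quartzite"] * 4 + ["intrusive"] * 4

forest = fit_forest(rows, labels)
print(predict(forest, (0.05, 0.05)))
print(predict(forest, (0.99, 0.99)))
```

A full Random Forest grows deep trees and draws a random feature subset at every split; the stump version keeps only the two ingredients named in the text, random independent construction and combination of predictions.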

Gradient Boosting Decision Tree (GBDT)


GBDT is a boosting ensemble supervised machine learning algorithm. It builds a strong model by
combining many weak decision trees that are generated in sequence, with each new tree
reducing the bias of the combined model at that step (Friedman, 2002).
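
The sequential construction can be sketched as follows. This is a simplified illustration under stated assumptions, not the classifier used in this study: it boosts single-split regression stumps on squared loss (where the negative gradient is simply the residual), the data are invented, and the function names are hypothetical.

```python
def fit_regression_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature under squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps, history = [], []
    for _ in range(n_trees):
        # For squared loss the negative gradient equals the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_regression_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
        # Track the training mean squared error after each added stump.
        history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return base, stumps, history

# Invented toy data: a step in the response between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

base, stumps, history = boost(xs, ys)
print(stumps[0][0])             # threshold chosen by the first stump
print(history[0], history[-1])  # training error after the first and last stump
```

Each stump on its own is a weak model; the training error falls as stumps are added because every new stump is fitted to what the current combination still gets wrong.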

Support Vector Machine (SVM)


The SVM is a supervised machine learning algorithm originally built for binary classification problems,
where it separates the data points of the two classes with a linear boundary. However, most real datasets
are not linearly separable, so the data are mapped by a non-linear transformation into a
high-dimensional space in which a linear separation becomes possible and classification accuracy
improves. Here the transformation is provided by the radial basis function kernel (Xie et al. 2018).
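
As an illustration of the radial basis function kernel, the sketch below evaluates K(x, y) = exp(-gamma * ||x - y||^2) for pairs of samples. The value of gamma and the sample values are assumptions chosen for the example and are not taken from this study.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Invented, already-scaled log samples: (Vp, density, Vs, natural gamma).
sample_a = (0.2, 0.1, 0.3, 0.9)
sample_b = (0.2, 0.1, 0.3, 0.9)
sample_c = (1.5, 1.2, 1.4, 0.1)

print(rbf_kernel(sample_a, sample_b))  # identical samples give 1.0
print(rbf_kernel(sample_a, sample_c))  # dissimilar samples give a value well below 1
```

The kernel acts as a similarity measure: it equals 1 for identical samples and decays towards 0 as samples move apart, with gamma controlling how quickly the similarity falls off.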

Hyperparameter tuning
Hyperparameter tuning is the process of selecting, for each classifier model, the search ranges and
hyperparameter values that give the best value of a performance metric (here accuracy), so that each
model is trained optimally. This was achieved using a grid search. In addition, validation score curves
from 10-fold cross-validation were used to identify hyperparameter values at which a model
overfits or underfits.
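
A grid search with 10-fold cross-validation can be sketched as below. The use of scikit-learn is an assumption (the software used in this study is not stated), the data are synthetic stand-ins for four log features, and the search grid is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for four log features and three lithology classes.
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Hypothetical search grid; the ranges used in the study are not given.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

# 10-fold cross-validation that keeps class proportions in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid, cv=cv, scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)           # combination with the best mean CV accuracy
print(round(search.best_score_, 3))  # that mean accuracy
```

Every combination in the grid is scored by its mean accuracy over the ten held-out folds, so the selected values are those that generalize best within the training data rather than those that fit it most closely.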

K-means clustering analysis


K-means clustering is a non-hierarchical, unsupervised multivariate machine learning method
that partitions a dataset into classes according to the similarity or dissimilarity of the
features in the dataset (Benaouda et al. 1999).
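
A minimal sketch of K-means (Lloyd's algorithm) is given below for illustration. The two-feature samples are invented and already scaled, and the function name and defaults are hypothetical; this is not the clustering code used in this study.

```python
import random

def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct samples
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centroids[j])))
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [pt for pt, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels

# Invented, already-scaled samples forming two well-separated groups.
points = [(0.10, 0.20), (0.15, 0.22), (0.12, 0.18),
          (0.90, 0.80), (0.88, 0.85), (0.95, 0.79)]

centroids, labels = kmeans(points, k=2)
print(labels)
```

No lithology labels are supplied; the grouping follows only from the similarity of the feature values, which is why the cluster numbers themselves carry no geological meaning until they are compared with core observations.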

A summarized workflow of the proposed methods is shown in Figure 2 and the results in Figure 3.

Figure 2 Workflow of proposed machine learning classification methods.



Figure 3 Model evaluation: (a) confusion matrix plots; (b) classified lithological profile.



Results and Interpretation

1) Overall, the results show that our supervised models have similar classification accuracies, all
above 90%. Also, more than 80% of the samples of each lithology were assigned to the correct
label (Figure 3a). However, the GBDT model has the best lithology classification performance of
the optimized models, with an accuracy of 97.06%, a precision of 97.04%, a recall of 97.06%
and an F1-score of 97.04%;
2) The results demonstrate that quartzite and intrusive rocks were more accurately classified (Figure
3b). This could be due to the homogeneity of these rocks, their large sample sizes, and their
textures and structures. On the other hand, siltstone had the highest misclassification rate and was
often misclassified as quartzite. This could be because the Vp and density values of siltstone and
quartzite are similar, because siltstone is heterogeneous, with frequent interbedding of shale and
quartz, and because the sample size of siltstone was too small for the relationship between logging
response and lithology to be fitted;
3) The K-means clustering results improve knowledge of where significant amounts of
radioactive elements occur (Figure 3b). Also, quartzite and intrusive rocks were more accurately
predicted without lithological information from core analysis and observation.
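
The evaluation metrics quoted above can all be derived from a confusion matrix, as the following sketch shows. The counts are invented for illustration and are not the results of this study; the function name is hypothetical.

```python
def classification_report(confusion, class_names):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    report = {"accuracy": correct / total}
    for i, name in enumerate(class_names):
        true_positive = confusion[i][i]
        predicted = sum(row[i] for row in confusion)  # column sum
        actual = sum(confusion[i])                    # row sum
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    return report

# Invented counts: five siltstone samples are misclassified as quartzite.
class_names = ["quartzite", "intrusive", "siltstone"]
confusion = [
    [50, 0, 0],
    [0, 30, 0],
    [5, 0, 15],
]

report = classification_report(confusion, class_names)
print(report["accuracy"])             # 0.95
print(report["siltstone"]["recall"])  # 0.75
```

In this toy matrix the siltstone errors lower the recall of siltstone and the precision of quartzite at the same time, which is the pattern to look for in Figure 3a when one lithology is absorbed by another.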

Conclusions

This research shows that machine learning methods can effectively classify lithologies in mineral
deposits using geophysical log data, with or without lithology labels. In addition,
the results demonstrate that the GBDT model outperformed the other two models in
classification accuracy on the test set, while the SVM model had the fastest computational training time.
Finally, according to the interpretation of the GBDT model, the order of importance of the logging
features that determine lithology at Moab Khotsong gold mine is Vp > density > Vs > natural gamma.

References

Benaouda, D., Wadge, G., Whitmarsh, R.B., Rothwell, R.G. and MacLeod, C. [1999] Inferring the
lithology of borehole rocks by applying neural network classifiers to downhole logs: an example from
the Ocean Drilling Program. Geophysical Journal International, 136(2), 477-491.

Breiman, L. [2001] Random forests. Machine Learning, 45(1), 5-32.

Dankert, B. T. and Hein, K. A. A. [2010] Evaluating the structural character and tectonic history of
the Witwatersrand Basin. Precambrian Research, 177(1-2), 1-22.

Friedman, J. H. [2002] Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4),
367-378.

Frimmel, H. E. and Minter, W. E. L. [2002] Recent developments concerning the geological history
and genesis of the Witwatersrand gold deposits, South Africa. Society of Economic Geologists,
Special Publication, 9, 45-117.

Ogasawara, H., Durrheim, R. J., Yabe, Y., Ito, T., van Aswegen, G., Grobbelaar, M., Funato, A.,
Ishida, A., Mngadi, S., Manzi, M. S. D. and Ziegler, M. [2017] Drilling into seismogenic zones of
M2.0–M5.5 earthquakes from deep South African gold mines (DSeis): establishment of research
sites. In Proceedings of ISRM AfriRock 2017, 3-5 October 2017, Cape Town, South African Institute
of Mining & Metallurgy, Symposium Series S93, 237-248.

Xie, Y., Zhu, C., Zhou, W., Li, Z., Liu, X. and Tu, M. [2018] Evaluation of machine learning methods
for formation lithology identification: A comparison of tuning processes and model performances.
Journal of Petroleum Science and Engineering, 160, 182-193.
