
Computers & Geosciences 171 (2023) 105266


Prediction of reservoir brittleness from geophysical logs using machine learning algorithms

Tobi Ore, Dengliang Gao *
Department of Geology and Geography, West Virginia University, Morgantown, WV, USA

Keywords: Machine learning; Unconventional reservoir; Brittleness; Reservoir property estimation; Marcellus shale

Abstract

Brittleness is an important geomechanical property of reservoirs, which is usually estimated from cores or sonic logs that are expensive to acquire. In this study, we report data-driven, machine learning workflows to predict brittleness from less expensive, readily available conventional logs. We propose three strategies to predict brittleness using gamma ray, neutron porosity, density, caliper, sonic, and photoelectric factor logs by utilizing gradient boosting (GB), support vector regression (SVR), and neural networks (NN). The first strategy involves predicting brittleness directly from the logs, while the second strategy predicts shear sonic logs used for the estimation of brittleness. The performance of the models, given as R2 on deployment on the testing set, is GB (0.87), SVR (0.73), and NN (0.82) for the first strategy, and GB (0.94), SVR (0.87), and NN (0.94) for the second strategy. In the third strategy, we convert the prediction into a classification task by grouping the brittleness estimate into ductile, transition, and brittle. The accuracy of model deployment on the testing set for the third strategy is GB (89.37%), SVM (89.06%), and NN (89.16%). We demonstrate that, depending on the strategy adopted, the gradient boosting algorithm outperforms the other peer algorithms in terms of training and validation scores. Furthermore, we combine the three algorithms using a committee machine to improve the performance of the model. The workflow in this study can be adopted to predict other reservoir properties from available logs. The workflow can also be adopted to characterize reservoir heterogeneity from seismic traces trained by well logs.

1. Introduction

Over the years, organic-rich shale, which had previously been regarded as a source rock for conventional reservoirs, has been directly drilled as an unconventional reservoir to produce hydrocarbons owing to advancements in technology. Technological advancements, such as multi-stage horizontal drilling and hydraulic fracturing, made it not only feasible but also profitable to tap into this resource. Efforts have been made to properly understand the behavior of shale and to better characterize shale reservoirs in order to improve the efficiency of this new exploitation technique. A common step in reservoir characterization is to delineate the spatial distribution and variability of reservoir geomechanical properties using geophysical logs, seismic signals, and cores. Of these different data sets reflecting reservoir properties at various scales, geophysical log data are the most abundant, and the subsurface information obtained from such logs, individually or in combination, is vital in the characterization and prediction of unconventional reservoirs.

Critical to the exploration of unconventional plays is the estimation of the brittleness of reservoir rocks. Brittleness is a measure of a rock's behavior under stress that is dictated by mineralogy and elastic properties. It also indicates whether fractures created after a rock fails under stress will be maintained (Rickman et al., 2008). This property is essential in the development of reservoirs because, during the exploration for and production of energy resources, regions with high brittleness are predominantly selected. This is a heuristic in practice because brittle rocks tend to contain natural fractures, with the possibility of additional induced fractures, which matters during proppant injection in stimulation processes, as both old and new fractures are kept open with the proppant. Ductile rocks, in contrast, deform plastically during stimulation, resulting in the created fractures closing and hindering the flow of fluids (Evans et al., 2019a, 2019b).

Typically, the brittleness of rock is inferred from laboratory studies of rock samples and cores or from empirical equations that utilize geophysical log data. Core data are expensive and are mostly collected for exploratory wells, while the geophysical logs of a well always have variable availability. This poses a major challenge in the exploration for unconventional plays where these data are not available.

* Corresponding author.
E-mail address: dengliang.gao@mail.wvu.edu (D. Gao).

https://doi.org/10.1016/j.cageo.2022.105266
Received 2 May 2022; Received in revised form 13 October 2022; Accepted 4 November 2022
Available online 8 November 2022
0098-3004/© 2022 Elsevier Ltd. All rights reserved.

Fig. 1. A base map of the location where the data was acquired, with the available wells and the seismic survey. The red wells on the map have full-wave sonic logs for the estimation of brittleness, while the black wells do not. The wells outside the seismic survey are used to enlarge the database used to build the model for better prediction of brittleness in the black wells within the seismic survey. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

One possible solution to this is the use of machine learning methods to fill the gap in a data-driven fashion.

The adoption of machine learning to predict reservoir brittleness has been well documented in recent literature. Wood (2021) utilized a data-matching algorithm to predict the mineralogical brittleness index of the Lower Barnett Shale using gamma ray, density, neutron, resistivity, and sonic logs. Negara et al. (2017) applied support vector regression to predict the brittleness index from elemental spectroscopy and petrophysical properties and documented that the prediction was a good match with the laboratory-measured brittleness indices. Ahmadov (2021) utilized linear, ridge, and lasso regression, K-nearest neighbors, support vector machine, decision tree, random forest, AdaBoost, and gradient boosting to predict the geomechanically derived brittleness index of the Tuscaloosa Shale and found that the tree-based methods (gradient boosting and random forest) outperformed all other methods, which agrees with the results of Ore and Gao (2021) and Ore (2020). In a similar work, Sun et al. (2020) applied a Chi-square automatic interaction detector, random forest, support vector machine, K-nearest neighbors, and artificial neural network to predict the brittleness of rock samples from a water transfer tunneling project in Malaysia, using the results of simple rock index tests as predictors. They found that the random forest model performed best on the training and validation sets, while on the test set the artificial neural network performed better. However, they emphasized that the K-nearest neighbors model was the most stable of all the models. Ye et al. (2022) proposed a method to predict the brittleness index of the Wufeng-Longmaxi and Baota formations of the Sichuan Basin via principal component analysis and back-propagation neural networks.

In this study, we adopt a workflow that can efficiently estimate and predict the brittleness of reservoirs in wells without the right set of data by utilizing gradient boosting, support vector regression, and artificial neural network algorithms. Advanced pre-processing of the data is incorporated into the workflow to reduce uncertainty and improve the predictive ability of the models. This study represents the first documented attempt to estimate and predict the brittleness of the Marcellus Shale in the Appalachian Basin. We propose a new brittleness index that conforms with the known physical properties of rocks, highlight several best practices in building machine learning models, and suggest three strategies to approach the problem of brittleness prediction. The output of these models will aid the understanding of the brittleness variability in the reservoir, which will be useful for the optimization of well operations and for reservoir modeling.

1.1. Dataset

The dataset is from the Appalachian Basin, with the target reservoir being the Marcellus Shale. There are 9 wells with different available geophysical logs, 6 of which are within the seismic survey that, in the future, will be used for brittleness inversion (Fig. 1). Since the logs


needed for building the machine learning models must be available in all wells, this poses a significant constraint on the process. For this reason, gamma ray (GR), neutron porosity (NPHI), density (RHOZ), caliper (HCAL), compressional sonic (DTCO), and photoelectric factor (PEFZ) are used as the predictors, and all the logs are environmentally corrected before use in the model building. Only 4 of the 9 available wells have the full sonic-wave log suites needed to estimate brittleness; as a result, these 4 wells are used to build the model and to evaluate how well it predicts brittleness for other wells.

1.2. Brittleness

Brittleness is the ease with which a rock breaks and creates planes of weakness under a certain differential stress. This property depends on lithology, mineral content, temperature, fluids, diagenesis, and effective stress (Perez and Marfurt, 2013). The goal of a hydraulic fracturing operation is to increase permeability in the rock through the opening of natural fractures or the creation of new fractures, making rocks with high brittleness suitable candidates. Other factors, such as loading history, engineering design of the stimulation, and lithology of the top and base of the reservoir, are important in determining whether fractures will eventually be induced by the stimulation.

Researchers often debate the definition of brittleness and the method of estimating this geomechanical property, making it a contentious topic. Consequently, no globally accepted brittleness estimation method has been put forward to date (Altindag and Guney, 2010). However, based on extensive reviews of the subject in the literature, the methods that utilize mineralogy and elastic properties seem to be gaining traction (Mews et al., 2019).

Jarvie et al. (2007) estimated brittleness using a ratio of the mineral content (equation (1)). They hypothesized that mineralogical characteristics are important for the success of stimulation and fracturing. Wang and Gale (2009) modified this relationship by considering the impact of dolomite and organic content on the brittleness (equation (2)). These estimation methods do not factor in the effect of stress regime and diagenesis on the brittleness, resulting in poor applicability in different plays with unique mineralogical properties.

BI_j = Q / (Q + Ca + Cl)    (1)

BI_w = (Q + D) / (Q + D + Ca + Cl + TOC)    (2)

where BI_j and BI_w are the Brittleness Index of Jarvie et al. (2007) and Wang and Gale (2009), respectively, and Q, Ca, D, Cl, and TOC are the quartz, calcite, dolomite, clay, and total organic carbon content, respectively.

Rickman et al. (2008) proposed a relationship, using elastic properties, to distinguish between brittle and ductile rocks. They utilized Young's modulus (YME) and Poisson's ratio (PR) by normalizing the elastic properties and averaging them in a term they call "Brittleness Average". The idea of combining these two elastic properties is borne out of the belief regarding the behavior of materials: ductile materials exhibit a low YME and a high PR, whereas brittle materials exhibit a low PR and a moderate to high YME (Grieser and Bray, 2007).

Generally, the YME and PR of rock are estimated from the dynamic elastic moduli (shear and bulk modulus), which are calculated using empirical equations (3) and (4) from the sonic and density logs:

G = 13474.45 ρ_b / (Δt_shear)²    (3)

K = 13474.45 ρ_b / (Δt_comp)² − (4/3) G    (4)

where G is the shear modulus (Pa), K is the bulk modulus (Pa), Δt_shear is the shear sonic (µs/ft), Δt_comp is the compressional sonic (µs/ft), and ρ_b is the bulk density (kg/m³).

The YME and PR are estimated using equations (5) and (6):

E = 9GK / (G + 3K)    (5)

v = (3K − 2G) / (6K + 2G)    (6)

where E and v are Young's modulus and Poisson's ratio, respectively.

Conventionally, the computed YME and PR are normalized by subtracting the minimum from each value and dividing by the range (min-max normalization, equations (7) and (8)), and the brittleness average is then estimated using equation (9):

E_brittleness = (E − E_min) / (E_max − E_min)    (7)

v_brittleness = (v − v_min) / (v_max − v_min)    (8)

BA = (E_brittleness + v_brittleness) / 2    (9)

where E_min is the minimum YME, E_max is the maximum YME, v_min is the minimum PR, v_max is the maximum PR, and BA is the brittleness average.

However, Rickman et al.'s (2008) brittleness average of equation (9) depends on how the YME brittleness and PR brittleness are normalized and can be incorrect if the conventional min-max normalization scheme is used. For example, the conventional normalization of the PR brittleness using equation (8) can lead to an incorrect estimate of the brittleness, as tested in section 3.1. Rickman et al. (2008) used a normalization scheme for the YME and PR brittleness, but it is complicated, inconsistent, and difficult to interpret.

To evaluate the brittleness average correctly and consistently with a simple normalization scheme, we propose a modified normalization for the PR brittleness, equation (10). This approach gives a straightforward and better-constrained brittleness estimate and conforms with scientific knowledge about the geomechanical behavior of materials, as tested by computational experimentation in section 3.1:

v_brittleness = (1/v − min(1/v)) / (max(1/v) − min(1/v))    (10)

To adopt this elastic-property technique for brittleness estimation, compressional and shear sonic logs are required. Unfortunately, full-waveform acoustic logs are often not available in wells and are often acquired only over limited intervals. To this end, we apply machine learning to build and train models capable of predicting brittleness from readily available geophysical logs and compare their performance and efficiency.
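For illustration, the following is a minimal Python sketch of the elastic-property brittleness workflow of equations (3)–(10) above. The log values shown are synthetic and the unit conventions follow the equations as printed; this is an illustrative sketch rather than the exact implementation used in this study.

```python
import numpy as np

def brittleness_average(dt_shear, dt_comp, rho_b):
    """Brittleness average from sonic (us/ft) and bulk density (kg/m3) logs,
    following equations (3)-(7), (9), and the proposed PR normalization (10)."""
    G = 13474.45 * rho_b / dt_shear**2                    # shear modulus, eq. (3)
    K = 13474.45 * rho_b / dt_comp**2 - (4.0 / 3.0) * G   # bulk modulus, eq. (4)
    E = 9.0 * G * K / (G + 3.0 * K)                       # Young's modulus, eq. (5)
    v = (3.0 * K - 2.0 * G) / (6.0 * K + 2.0 * G)         # Poisson's ratio, eq. (6)

    E_b = (E - E.min()) / (E.max() - E.min())             # YME brittleness, eq. (7)
    inv_v = 1.0 / v
    v_b = (inv_v - inv_v.min()) / (inv_v.max() - inv_v.min())  # proposed PR brittleness, eq. (10)
    return 0.5 * (E_b + v_b)                              # brittleness average, eq. (9)

# synthetic example values for three depth samples
dt_shear = np.array([120.0, 135.0, 150.0])   # shear sonic, us/ft
dt_comp = np.array([70.0, 78.0, 88.0])       # compressional sonic, us/ft
rho_b = np.array([2650.0, 2600.0, 2550.0])   # bulk density, kg/m3
print(brittleness_average(dt_shear, dt_comp, rho_b))
```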
2. Methodologies

2.1. Support vector regression

Support Vector Regression (SVR) is a supervised machine learning technique that generalizes well to unseen data because it is grounded in statistical learning theory. Support vector algorithms have grown into a popular machine learning technique for both regression and classification tasks. The support vector machine (SVM) was first applied to the character recognition classification problem; the SVR was adapted from the SVM for regression tasks but works on the same principle (Smola and Schölkopf, 2004).

Given training data (X, y), where X are the input features and y the target values, the algorithm attempts to find a function f(x) that deviates from the actual targets y_i by at most ε and that, at the same time, is as flat


Fig. 2. The architecture of the Neural Network adopted in this study.

Fig. 3. The framework of a Committee Machine. The estimators are machine learning models independently trained to make predictions. These predictions are averaged, resulting in an ensemble with reduced error variance.
as possible, i.e., w is small in equation (11) (Cortes and Vapnik, 1995). For simplicity, let us assume a linear function f of the form:

f(x) = 〈w, x〉 + b, with w ∈ X, b ∈ R    (11)

where 〈·, ·〉 denotes the dot product in X. The above case is flat when w is small, which can be guaranteed by minimizing the risk:

R[f] = (1/2)‖w‖² + C ∑_{i=1}^{m} L(y, f(x))    (12)

where the constant C is a regularization parameter that determines the trade-off between the flatness of f and the tolerance level of the slack variable ε, and L(y, f(x)) is an ε-insensitive loss function that can either be linear (equation (13)):

L(y, f(x)) = |y − f(x)|_ε = max(0, |y − f(x)| − ε)    (13)

or quadratic (equation (14)):

L(y, f(x)) = |y − f(x)|²_ε    (14)
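As a minimal illustration of how such a model can be set up in practice, the sketch below fits an ε-insensitive SVR with an RBF kernel to synthetic data using scikit-learn; the data and parameter values are placeholders, not those of this study.

```python
import numpy as np
from sklearn.svm import SVR

# synthetic stand-in for six conventional logs (X) and a brittleness target (y)
rng = np.random.default_rng(0)
X = rng.random((200, 6))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.05, 200)

# C is the regularization constant of eq. (12); epsilon is the width of the
# insensitive tube in the loss of eq. (13)
model = SVR(kernel="rbf", C=1.0, epsilon=0.01, gamma="scale")
model.fit(X, y)
print(model.predict(X[:5]))
```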
2.2. Gradient boosting

Boosting is an ensemble method in which new models are added sequentially to build a stronger model. A new weak learner is trained at each iteration, taking into consideration the ensemble error. Gradient boosting employs a gradient-descent-based optimization of the loss and is made up of a weak prediction model, an additive model to combine the weak models, and a loss function (Friedman, 2001).

In gradient boosting, the weak prediction models are decision trees characterized by branches (input features) and leaves (output feature). During an iteration, an appropriate weight is used to combine the ensemble with the new decision tree, which has learned from the mistakes of the ensemble. Consider, at the first training iteration, a weak model F_m that can predict the mean of the output. Gradient boosting constructs a new model that improves on F_m by adding an estimator h (fitted to the residual of the previous model) to create a better model F_{m+1}. Given a differentiable loss function L(y, F(x)), the gradient boosting algorithm (Hastie et al., 2009) is outlined as follows:

Input: training set {(x_i, y_i)}_{i=1}^n

1. Initialize the model: F_0(x) = argmin_γ ∑_{i=1}^n L(y_i, γ)
for m = 1, …, M do
2. Compute the pseudo-residuals r_im = −[∂L(y_i, F(x_i))/∂F(x_i)]_{F(x)=F_{m−1}(x)} for i = 1, …, n
3. Fit a weak learner h_m(x) to the residuals and solve γ_m = argmin_γ ∑_{i=1}^n L(y_i, F_{m−1}(x_i) + γ h_m(x_i))
4. Update the model: F_m(x) = F_{m−1}(x) + γ_m h_m(x)
end for
Output: F_M(x).
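The following Python sketch mirrors this outline for a squared-error loss, where each new tree is fitted to the residuals (the negative gradient) of the current ensemble; it is a simplified, hypothetical implementation for illustration only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    """Gradient boosting with squared-error loss, following the outline above."""
    F0 = y.mean()                                # step 1: initialize with the mean
    F = np.full(len(y), F0)
    trees = []
    for _ in range(n_estimators):
        residuals = y - F                        # step 2: negative gradient for squared error
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)  # step 3
        F = F + learning_rate * tree.predict(X)  # step 4: update the ensemble
        trees.append(tree)
    return F0, trees

def boosted_predict(F0, trees, X, learning_rate=0.1):
    return F0 + learning_rate * sum(t.predict(X) for t in trees)

rng = np.random.default_rng(1)
X = rng.random((300, 6))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=300)
F0, trees = gradient_boost(X, y)
print(boosted_predict(F0, trees, X[:3]))
```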
2.3. Neural network

Neural networks (NN), inspired by the brain, are nonlinear statistical modeling tools made up of several components that work together for predictive modeling (Hopfield, 1988). These components, an input layer, hidden layer(s), and an output layer, are connected by synapses that have associated weights (Fig. 2). At the layers, the input signal x_j at the synapse j connected to neuron k is multiplied by the weight w_kj. For each neuron, a bias (b_k) is added to the weighted signal before introducing nonlinearity via the activation function (Hastie et al., 2009).

Mathematically, the neuron can be expressed as:

z_k = ∑_{j=1}^n w_kj x_j + b_k    (15)

y_k = σ(z_k)    (16)

where z_k is the linear output, y_k is the output of the neuron, and σ(·) is the activation function.

There are several ways the neurons can be combined, resulting in different architectures such as feedforward neural networks (Sazli, 2006), convolutional neural networks (O'Shea and Nash, 2015), and recurrent neural networks (Medsker and Jain, 1999). Though the algorithms work on similar principles, their adoption depends on the nature of the modeling problem, the traditional applicability to the problem, and the objective of the project.
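A small numpy sketch of equations (15) and (16) is given below for a feedforward pass through one hidden layer; the sigmoid activation and the random weights are illustrative choices, not those of the trained model in this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Feedforward pass: each layer computes z = W x + b (eq. 15) and y = sigma(z) (eq. 16)."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
x = rng.random(6)                                      # six log readings as inputs
layers = [(rng.normal(size=(10, 6)), np.zeros(10)),    # hidden layer with 10 neurons
          (rng.normal(size=(1, 10)), np.zeros(1))]     # single output neuron
print(forward(x, layers))
```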


2.4. Committee machine

Committee machines are an ensemble of estimators that combine the predictions of the individual committee members to predict a new input (Fig. 3). The committee machine typically outperforms the individual estimators in test-set prediction because combining the predictions cancels out the errors of the individual estimators, reducing the variance (Tresp, 2018).

Consider a set of N trained estimators f_i(x), where i = 1, …, N. Each estimator is the sum of the desired function g(x) and an error function (Bishop, 1995):

f_i(x) = g(x) + e_i(x)    (17)

The average sum-of-squares error of this model can be expressed as:

E[e_i²] = E[(f_i(x) − g(x))²]    (18)

The committee's output is the average of the N estimators incorporated in the committee machine, in the form:

f_COM(x) = (1/N) ∑_{i=1}^N f_i(x)    (19)

If it is assumed that the errors e_i(x) are uncorrelated and have zero mean, we have

E_COM = (1/N) E_AV    (20)

where E_COM and E_AV are the average errors made by the committee machine and the individual estimators, respectively.

However, for highly correlated and biased estimators it can easily be proved that (see Bishop, 1995; Perrone, 1993; Krogh and Vedelsby, 1994):

E_COM ≤ E_AV    (21)

Consequently, averaging will not make things worse: if the committee members have good prediction performance, the committee will be at least as good as the average model, thereby improving the generalization capacity of the model.
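The variance-reduction argument of equations (17)–(20) can be demonstrated numerically with the short, self-contained sketch below, in which the "estimators" are simulated as the true function plus independent noise.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
g = np.sin(2 * np.pi * x)                     # desired function g(x)

N = 5
estimators = [g + rng.normal(0.0, 0.1, x.size) for _ in range(N)]  # f_i = g + e_i, eq. (17)

E_av = np.mean([np.mean((f - g) ** 2) for f in estimators])  # average individual error
f_com = np.mean(estimators, axis=0)                          # committee output, eq. (19)
E_com = np.mean((f_com - g) ** 2)                            # committee error

# with uncorrelated, zero-mean errors E_com is close to E_av / N, eq. (20)
print(E_av, E_com, E_av / N)
```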
2.5. Data preprocessing

Data in its raw form is often not consumable by machine learning models, either because it is not in the right format or because it contains information that can adversely affect the performance of the model. Therefore, preprocessing is the de facto first step of a machine learning pipeline, comprising several steps, the first being outlier removal (Felix and Lee, 2019).

Extreme data points, which can represent errors in data collection, are called outliers. These outliers are often removed by studying the box-and-whisker plot. However, this process is subjective and can lead to bias in the outlier removal. The isolation forest algorithm is adopted here to remove the outliers in the dataset. This is a tree-based algorithm that isolates data points by randomly selecting a feature and splitting it between the maximum and minimum (Liu et al., 2008). The path length between the root and the terminating node is equivalent to the number of splits required to isolate a data point, which is averaged over the forest. In the splitting, shorter path lengths are representative of potential anomalies that would adversely affect the model's performance.

In machine learning classification tasks, a major problem is class imbalance in the dataset, where models achieve high accuracy simply by predicting the majority class but fail to capture the minority class. A widely adopted technique to circumvent this problem is resampling, i.e., removing samples (under-sampling) and/or adding more samples (over-sampling). In this work, we implement a combination, developed by Batista et al. (2004), of over-sampling (Synthetic Minority Oversampling Technique) and under-sampling (Edited Nearest Neighbor), together referred to as SMOTEENN. When using the Synthetic Minority Oversampling Technique (SMOTE), a point from the minority class is selected at random, and its k-nearest neighbors are calculated. Between the selected point and its neighbors, synthetic points are added (Chawla et al., 2002). The Edited Nearest Neighbor (ENN) under-sampling technique eliminates instances of the majority class on the boundary whose predictions by the k-nearest neighbors algorithm differ from the other majority class points (Wilson, 1972).

The dataset is then split into partitions: train, validation, and test. The model learns the relationship between the input features and the output from the training set. Typically, the model is trained iteratively via grid search cross-validation, and the validation set is used to evaluate the performance of the model to prevent overfitting. The test set is used to assess the generalization of the model, as the model has not seen this set and data leakage is avoided. It is important to split the dataset in a manner such that the partitions have similar distributions, to avoid bias.
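A hedged sketch of these two preprocessing steps, using scikit-learn's IsolationForest and the SMOTEENN combination from the imbalanced-learn package on a synthetic, imbalanced dataset, is shown below; the contamination value and class construction are illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from imblearn.combine import SMOTEENN   # requires the imbalanced-learn package

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))
y = (X[:, 0] > 1.0).astype(int)          # deliberately imbalanced two-class labels

# 1) outlier removal: isolation forest flags inliers as +1
inliers = IsolationForest(contamination=0.1, random_state=0).fit_predict(X) == 1
X_clean, y_clean = X[inliers], y[inliers]

# 2) resampling: SMOTE over-sampling followed by ENN under-sampling
X_bal, y_bal = SMOTEENN(random_state=0).fit_resample(X_clean, y_clean)
print(np.bincount(y_clean), np.bincount(y_bal))
```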
2.6. Feature scaling

Some machine learning algorithms are sensitive to the magnitude of the range of features and the variance in the dataset, making feature scaling an important step in the model-building pipeline. Generally, the performance of most machine learning algorithms improves significantly upon scaling the features (Han et al., 2011). This can be done by normalization or standardization of the features, and the choice of scaling depends on the algorithm. Here, we use the min-max normalization and the Standard scaler approach and assess which technique results in a better-performing model. The min-max scaler transforms each feature to the range between 0 and 1, preserving the original data distribution, by using the relationship in equation (22):

x̂ = (x − min(x)) / (max(x) − min(x))    (22)

where x is the original value and x̂ is the normalized value. The Standard scaler centralizes the data around a mean of 0 with a standard deviation of 1 through equation (23):


Fig. 4. Schematic diagram of the structure of 3-fold cross-validation. The performance metric is the R2 score for the regression task and accuracy for the classification task.

Fig. 5. Confusion matrix for a binary classification task, which contains all the information on the classification model performance.

x̂ = (x − μ) / σ    (23)

where μ is the mean of the values and σ is the standard deviation of the values.
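The two scalers of equations (22) and (23) are available off the shelf in scikit-learn; the short sketch below applies both to a few illustrative log values (the numbers are placeholders loosely based on Table 3).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# three samples of three features, e.g. GR (api), NPHI, RHOZ (g/cm3)
X = np.array([[24.46, 0.00, 2.29],
              [145.32, 0.20, 2.68],
              [780.63, 0.38, 3.12]])

X_minmax = MinMaxScaler().fit_transform(X)      # eq. (22): each feature mapped to [0, 1]
X_standard = StandardScaler().fit_transform(X)  # eq. (23): zero mean, unit standard deviation
print(X_minmax)
print(X_standard)
```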
2.7. Hyperparameter tuning

A machine learning model's performance depends on configurations that can either be learned from the data or predefined prior to training the model. Parameters are internal to the model and are often estimated through an optimization technique, e.g., the support vectors of an SVR or the weights in a NN. On the other hand, hyperparameters are external to the model and are set manually, e.g., the learning rate (Li et al., 2017). Hyperparameters can be thought of as a-priori parameters whose suitable values are found by searching over a range of combinations. Table 1 describes the hyperparameters considered for each model.

Table 1
Hyperparameters considered in the training of the machine learning models. These were carefully selected from the host of possible parameters because of their degree of influence on the performance of the models.

Support Vector Regression
- C: Determines the strength of the regularization.
- Gamma: The kernel coefficient. In this research, the kernel used is the radial basis function (exp(−γ‖x_i − x_j‖²)), and the gamma parameter controls the distance of the influence of a single training point. Typically, models with a very large gamma tend to overfit.

Gradient Boosting
- Maximum depth: The maximum number of nodes allowed from the root to the farthest leaf of a tree. The ability of the model to learn complex relationships depends on the depth; however, deeper trees are susceptible to overfitting.
- Minimum child weight: The minimum weight required for a new node to be created in the tree. More branches are created in the tree if it is small, but the problem of overfitting can also arise.
- Number of estimators: The number of gradient-boosted trees, which is equivalent to the number of boosting rounds.
- Learning rate: Related to the weights of the nodes; determines how fast the boosting learning will reach a minimum.

Neural Network
- Number of hidden layers: The robustness of the data and the nature of the problem to be modeled must be considered when selecting the appropriate number of hidden layers. As described above, too few lead to high bias and too many lead to overfitting.
- Number of neurons per layer: This also must be tuned to find the combination that results in a properly trained network, with output between high bias and high variance.
- Solver: The algorithm the neural network uses to update the weights of every layer after each epoch. Popular algorithms to be tested are stochastic gradient descent, Adam, and lbfgs (Limited-memory Broyden-Fletcher-Goldfarb-Shanno).
- Learning rate: The rate at which the weights are updated.
- Number of iterations: The number of times the learning algorithm will train over the entire training dataset; one iteration/epoch means that each sample in the training dataset has had a chance to update the internal model parameters (weights). This plays an important role in how well the model fits the training dataset and is usually tuned based on computational power and time.

An efficient way to train the model with a host of hyperparameters is Grid Search Cross-Validation. Here, an array of possible hyperparameters is passed, and all possible combinations are tested for the optimal hyperparameters based on a predefined evaluation metric (Krstajic et al., 2014). The training data is split into K folds, where K − 1 folds are used to train the model with a certain hyperparameter combination and the left-out fold is used for testing (Fig. 4). This is an iterative process that is repeated K times for each hyperparameter combination.
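As an illustration of this procedure, the sketch below runs a small, hypothetical grid search with 3-fold cross-validation over a gradient boosting regressor using scikit-learn; the grid values and synthetic data are placeholders, not the grids used in this study.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.random((400, 6))
y = np.sin(3 * X[:, 0]) + 0.2 * X[:, 1] + rng.normal(0, 0.05, 400)

# every combination in the grid is scored by K-fold cross-validation
param_grid = {"max_depth": [3, 5, 7],
              "n_estimators": [100, 250],
              "learning_rate": [0.05, 0.1]}
search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      param_grid, cv=3, scoring="r2")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```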
2.8. Estimating model performance

The predictive capability of a machine learning model is assessed through its performance on unseen data. This performance is typically evaluated using statistical relationships that compare the difference between the predicted and true values. Depending on the modeling task (classification or regression), there exists a plethora of evaluation metrics. Here, we utilize R-squared (R2), Mean Squared Error (MSE), and Mean Absolute Error (MAE) for the regression, while the accuracy score and confusion matrix are used to define the performance for the classification.

The R2 score is a measure of the amount of variation in the dependent variable that is explained by the independent variable. It is calculated using equation (24):

R² = 1 − ∑_{i=1}^N (y_i − ŷ_i)² / ∑_{i=1}^N (y_i − ȳ)²    (24)

The MSE is the average of the squared differences between the predicted and true values:

MSE = (1/N) ∑_{i=1}^N (y_i − ŷ_i)²    (25)

The MAE is the average of the absolute errors:

MAE = (1/N) ∑_{i=1}^N |y_i − ŷ_i|    (26)

where N is the data size, y_i is the true value, ŷ_i is the predicted value, and ȳ is the average of the true values (ȳ = (1/N) ∑_{i=1}^N y_i).

The confusion matrix is used to assess the performance of a classification model through a count of misclassifications, which can be displayed in tabular form (Fig. 5). This table gives all the information required to quantify the predictive ability of a classification model. The accuracy score is the ratio of the number of correct predictions to the total number of predictions made.
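All of these metrics are available in scikit-learn; the short sketch below evaluates them on small, made-up vectors purely to show the calls (equations (24)–(26) for regression, plus accuracy and the confusion matrix for classification).

```python
import numpy as np
from sklearn.metrics import (r2_score, mean_squared_error, mean_absolute_error,
                             accuracy_score, confusion_matrix)

# regression metrics, eqs. (24)-(26)
y_true = np.array([0.30, 0.45, 0.62, 0.21, 0.55])
y_pred = np.array([0.28, 0.50, 0.60, 0.25, 0.52])
print(r2_score(y_true, y_pred))
print(mean_squared_error(y_true, y_pred))
print(mean_absolute_error(y_true, y_pred))

# classification metrics: accuracy score and confusion matrix (Fig. 5)
c_true = ["ductile", "brittle", "transition", "ductile"]
c_pred = ["ductile", "transition", "transition", "ductile"]
print(accuracy_score(c_true, c_pred))
print(confusion_matrix(c_true, c_pred, labels=["ductile", "transition", "brittle"]))
```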
3. Results and discussion

3.1. Elastic based brittleness estimation

Brittleness is estimated using equations (5)–(10) for comparative analysis. The estimates utilizing the Poisson's ratio brittleness relationships in equations (8) and (10) are compared with Young's modulus and Poisson's ratio via a cross-plot (Fig. 6), and the viability of the methods to estimate the brittleness of the Marcellus shale is assessed.

Rocks in the subsurface are confined, restricting how much they can extend or shorten, especially in the horizontal direction. For example, a buried sedimentary rock would experience a vertical shortening that could only be compensated by a horizontal extension. One can simulate this in the laboratory by squeezing a sample, creating horizontal stresses that balance the axial shortening (Fossen, 2016). This axial strain can be expressed as:

e_z = (1/E) [σ_1 − v(σ_2 + σ_3)]    (27)

A similar expression can be found for the horizontal stresses (Fossen, 2016). Considering the vertical stress σ_1, σ_2 = σ_3, and the boundary condition e_x = 0, we obtain:

σ_2 = σ_3 = (v / (1 − v)) σ_1    (28)

The Poisson's ratio (v) is directly proportional to σ_3, which has implications for the differential stress required for failure of the rock. In other words, an increase in the Poisson's ratio results in a higher differential stress needed to create fractures in the rock; the Poisson's ratio is therefore inversely proportional to the brittleness of the rock. This contrasts with the brittleness estimate using the min-max normalization scheme, in which the brittleness increases with increasing Poisson's ratio (Fig. 6a), while our proposed method shows the expected relationship (Fig. 6b).

As a proof of concept, we compare the brittleness relationship proposed in this paper with known rock properties and mineralogy. In Fig. 7, the two estimates are compared with closure pressure, a hydraulic fracture design parameter obtained from a diagnostic fracture injection test that identifies the pressure at which the fracture closes without proppant in place.


Fig. 6. A cross-plot of Young's modulus against Poisson's ratio with an overlay of the brittleness estimate (color) using the Poisson's ratio brittleness normalization of equation (8) on the left and the proposed normalization of equation (10) on the right. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Fig. 7. A cross-plot of closure pressure against the brittleness estimate. The closure pressure has an inverse relationship to the brittleness of the rock (Li et al., 2020). The brittleness estimate on the left does not reflect this relationship; however, the proposed brittleness estimate on the right does.

Table 2
The correlation of the mineralogy of the Marcellus Shale and the brittleness estimate. Silicate minerals are considered brittle and should be positively correlated with the brittleness estimate.

                            SILICATE   CARBONATE   CLAY
Correctly normalized BI     0.08       0.09        −0.18
Incorrectly normalized BI   −0.37      0.56        −0.21

The closure pressure approximates the minimum stress, and an increase in closure pressure is indicative of enhanced rock plasticity and decreased brittleness, resulting in a monotonic decrease in the brittleness index (Li et al., 2020). In Fig. 7b, the brittleness estimated by the proposed relationship increases with a decrease in closure pressure, in line with the expected behavior. Contrary to this, the brittleness estimate in Fig. 7a increases as closure pressure increases. A plausible explanation for this is that the Barnett shale, where the min-max normalization scheme was first adopted, has a different mineralogy from the Marcellus Shale: the Marcellus Shale is a silica-rich argillaceous mudstone, while the Barnett shale is a mixed siliceous mudstone. Also, the estimate averages two measurements, Young's Modulus and Poisson's Ratio, which are inversely proportional.

Next, we compare these two brittleness estimates with the mineralogy of the rock. The brittleness depends on the mineral content, as described in Wang and Gale (2009), where silicate minerals are referred to as brittle and clay minerals as ductile. As alluded to earlier, a brittleness estimate from mineral content alone is not sufficient as a proxy for brittleness, because the stress regime and tectonics are not factored into the mineralogical property. Table 2 shows that the incorrectly normalized brittleness estimate is negatively correlated with the silicate minerals. Fig. 8 shows that the high brittleness clusters around the low silicate content, while the opposite holds for the proposed brittleness estimate. We also found that our suggested technique has better agreement with the observed fracture counts in the MIP3H well.

3.2. Feature selection

Experience and domain knowledge are used to reduce the dozens of logs available in the well to those that have predictive significance for the property of interest. These logs were selected based on the definition of brittleness, which depends on the mineralogical composition and geomechanical properties of the formation. Therefore, logs that are


Fig. 8. Ternary diagram of the mineral composition of the silica-rich Marcellus Shale overlaid by the brittleness estimate. On the left is the incorrectly normalized
brittleness estimate which shows that the high brittleness is clustered in the low silicate mineral region.

Fig. 9. Statistical relationship between the geophysical logs. (a) Mutual information (b) Pearson’s correlation.

sensitive to mineralogical change and to elastic properties that have brittleness implications are considered. These logs are caliper (HCAL), bulk density (RHOZ), gamma ray (GR), neutron porosity (NPHI), photoelectric factor (PEFZ), and compressional sonic (DTCO).

To assess the relationship and association between the features, the mutual information and Pearson correlation are utilized. Mutual information is a measure of the dependence between two random variables and has a non-negative value; it measures the amount of information about one random variable that can be learned from observing another random variable. Higher values indicate higher dependence, and it equals zero only when the two random variables are independent (Kraskov et al., 2004). The Pearson correlation is a measure of the linear correlation between two random variables. It is the ratio between the covariance of the variables and the product of their standard deviations, resulting in a value between −1 and 1. The results show that, in terms of mutual information, the compressional sonic log has the highest association with brittleness, while in terms of correlation, the neutron porosity has the highest correlation (Fig. 9). The caliper log has the lowest association with brittleness but is included as a predictor since its correlation is > 0.1. However, the density log is not used as a predictor in the model building, as its association is low and its correlation is close to 0.
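Both association measures can be computed directly from the log data; the sketch below shows one possible way using scikit-learn and pandas on synthetic stand-ins for the logs (the column names and data are illustrative).

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
logs = pd.DataFrame(rng.random((500, 4)), columns=["GR", "NPHI", "DTCO", "HCAL"])
brittleness = 0.6 * logs["DTCO"] + 0.3 * logs["NPHI"] + rng.normal(0, 0.05, 500)

mi = mutual_info_regression(logs, brittleness)    # non-negative association scores
pearson = logs.corrwith(brittleness)              # linear correlation in [-1, 1]
print(pd.Series(mi, index=logs.columns))
print(pearson)
```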
3.3. Data preprocessing

Basic statistics are used to understand the nature of the data and its distribution. Specifically, we observe the range and variance of the features, as machine learning models are, most times, sensitive to

Table 3
The statistical summary of the raw data prior to preprocessing. The range and order of magnitude of the features are different, reinforcing the need to normalize the
data before model building.
count mean std min 25% 50% 75% max

GR (api) 5798 140.09 61.76 24.46 111.06 145.32 164.41 780.63


NPHI 5798 0.17 0.07 0.00 0.15 0.20 0.22 0.38
RHOZ (g/cm3) 5798 2.67 0.08 2.29 2.62 2.68 2.73 3.12
HCAL (in) 5798 8.69 0.51 7.59 8.39 8.78 8.90 11.16
DTCO (μs/ft) 5798 73.05 10.42 49.56 69.02 72.84 79.49 105.45
PEFZ 5798 3.81 0.69 2.32 3.54 3.80 4.04 9.78
Brittleness 5798 0.32 0.18 0.01 0.19 0.25 0.48 0.93


Table 4
Dataset summary after scaling using the min-max approach. The range of each feature in the data set is constrained to 1 without affecting the shape of the distribution.
count mean std min 25% 50% 75% max

GR (api) 5798 0.15 0.08 0.00 0.11 0.16 0.19 1.00


NPHI 5798 0.45 0.19 0.00 0.39 0.52 0.57 1.00
RHOZ (g/cm3) 5798 0.46 0.09 0.00 0.39 0.47 0.54 1.00
HCAL (in) 5798 0.31 0.14 0.00 0.22 0.33 0.37 1.00
DTCO (μs/ft) 5798 0.42 0.19 0.00 0.35 0.42 0.54 1.00
PEFZ 5798 0.20 0.09 0.00 0.16 0.20 0.23 1.00
Brittleness 5798 0.34 0.19 0.00 0.20 0.26 0.51 1.00

Table 5
Dataset summary after standardization. The features are constrained to a distribution with mean of 0 and standard deviation of 1.
count mean std min 25% 50% 75% max

GR (api) 5798 0.00 1.00 − 1.87 − 0.47 0.08 0.39 10.37


NPHI 5798 0.00 1.00 − 2.40 − 0.30 0.37 0.65 2.93
RHOZ (g/cm3) 5798 0.00 1.00 − 4.98 − 0.73 0.15 0.85 5.89
HCAL (in) 5798 0.00 1.00 − 2.18 − 0.60 0.18 0.40 4.87
DTCO (μs/ft) 5798 0.00 1.00 − 2.25 − 0.39 − 0.02 0.62 3.11
PEFZ 5798 0.00 1.00 − 2.14 − 0.38 − 0.01 0.34 8.62
Brittleness 5798 0.00 1.00 − 1.80 − 0.77 − 0.43 0.90 3.47

features with varying orders of these statistics. In our case, Table 3 shows that the features need to be normalized/standardized using the techniques described in section 2.6. Tables 4 and 5 show the new statistics of the features when the min-max scaler and standard scaler are applied.

The data has 4657 records (Boggess [6905–8330 ft], MIP 3H [6804–7804 ft], and St Whipkey [6740–7880 ft] wells), which is split into training and validation sets by an 80%–20% scheme. To test the predictive power of the models, a blind well (Poseidon well [7185–8140 ft]) was utilized. This well has 1910 records and was not used in the model building; therefore, it serves as a good check of the model's performance on new wells in the area (Fig. 10).
Fig. 10. Data splitting before the building of the machine learning models. The testing set is a blind well completely left out of the training process, while the training and validation sets are a random, shuffled 80-20 split of the other wells' data.

Fig. 11. Boxplot of the well log data, which is informative of the skewness and the outliers. Using the information on the interquartile ranges from the boxplot alone to remove outliers would result in the loss of vital information.

Machine learning models are sensitive to outliers, observations that are far from the majority in the feature space. The outliers in the training dataset, as shown in Fig. 11, are removed using the Isolation Forest with a contamination of 0.1, and the training dataset is reduced to 3353 records. The boxplot still shows outliers (Fig. 12); however, care was taken not to remove


Fig. 12. Boxplot of the data after outlier removal using the Isolation Forest algorithm. This data is better constrained for building machine learning models,
especially those sensitive to outliers.

Table 6
Model performance of strategy 1. The gradient boosting model has the best training set performance, but the committee machine outperformed the other models on the validation and testing sets.

Model                       Data        R2      MSE          MAE
Support Vector Regression   Training    0.9588  9.2 x 10^-4  2.0 x 10^-2
                            Validation  0.9297  1.7 x 10^-3  2.6 x 10^-2
                            Testing     0.7298  3.3 x 10^-3  4.4 x 10^-2
Gradient Boosting           Training    0.9928  1.6 x 10^-4  7.6 x 10^-3
                            Validation  0.9505  1.2 x 10^-3  2.2 x 10^-2
                            Testing     0.8685  1.6 x 10^-3  2.8 x 10^-2
Neural Network              Training    0.9057  2.2 x 10^-3  3.5 x 10^-2
                            Validation  0.9043  2.3 x 10^-3  3.5 x 10^-2
                            Testing     0.8193  3.5 x 10^-2  3.5 x 10^-2
Committee Machine           Training    0.9625  9.1 x 10^-4  2.0 x 10^-2
                            Validation  0.9596  9.8 x 10^-4  2.0 x 10^-2
                            Testing     0.8782  1.5 x 10^-3  2.8 x 10^-2

Fig. 13. A histogram showing the significance of each feature in training the decision trees of the gradient boosting model.

actual subsurface data, which can give more illumination of the physical behavior of the stratigraphy.

3.4. Model results

In this research, three strategies to predict brittleness are explored. In the first, the brittleness is predicted directly from other geophysical logs, while the second strategy involves predicting the shear sonic and then calculating the brittleness using the empirical relationship. The third strategy involves converting the problem into a classification one, where the brittleness values are split into groups.

3.4.1. Strategy 1

Here, we predict brittleness directly from other geophysical logs, using grid search cross-validation to train the machine learning models. In the SVR model, the radial basis function (rbf) kernel was utilized, the gamma value was set to scale (1/(number of features * Variance(X))), and the optimum epsilon and C were found to be 0.01 and 1, respectively. The performance of the SVR model on the testing data is an R2 of 0.7298, MSE of 0.00033, and MAE of 0.044. The GB model with a huber loss function had an optimum maximum depth of 7, minimum samples leaf of 1, number of estimators of 250, and minimum samples split of 4. The performance on the testing set is an R2 score of 0.8685, MSE of 0.0016, and MAE of 0.028. In building the NN model, care was taken to avoid overfitting, because, given the right parameters, training for a long period will most times result in a high training accuracy. Such a model has no predictive power, as it has merely memorized the data in the training set and has not generalized the data distribution. The number of neurons in the hidden layer used to train the network was limited due to the small data size. Ten hidden neurons with a maximum of 500 iterations and a tolerance of 0.00001 using the lbfgs solver were found to be the optimum parameter combination. The performance of the NN model on the test set is an R2 score of 0.8193, MSE of 0.0035, and MAE of 0.035 (see Table 6 for the performance metrics of the models on the training and validation sets).

The GB outperformed all other models on the training set. This is expected, as tree-based models are well suited to modeling problems such as the one in this study (Ore and Gao, 2021; Ore, 2020; Ahmadov, 2021). Also, the trees in the GB model can be visualized, and information on feature importance in the model building can be


Fig. 14. Model result on deployment on the blind well for strategy 1.

Fig. 15. Error histogram showing the distribution of the difference of the actual and predicted brittleness on deployment of the strategy 1 model on the validation
and test set.


Table 7
Model performance of strategy 2. The gradient boosting model outperformed the other models on the training and validation sets, but the committee machine is the best for the blind test set.

Model                       Data        R2      MSE       MAE
Support Vector Regression   Training    0.9887  4.4042    1.5111
                            Validation  0.9514  20.5059   2.3744
                            Testing     0.8684  31.6586   4.0317
Gradient Boosting           Training    0.9980  0.7844    0.64637
                            Validation  0.9863  5.7939    1.5502
                            Testing     0.9353  15.55721  3.0236
Neural Network              Training    0.9821  6.9836    2.0579
                            Validation  0.9752  10.4656   2.3759
                            Testing     0.9432  13.6690   2.6000
Committee Machine           Training    0.9925  2.9301    1.2978
                            Validation  0.9838  6.8545    1.8009
                            Testing     0.9505  11.8954   2.5860
further investigated (Fig. 13). The sonic and neutron logs, referred to as porosity logs, are the most influential features in the building of the GB model. The neutron log is sensitive to the presence of gas, which can serve as a proxy for natural fractures in shales, while the sonic log is the basis for the brittleness estimation; this further supports the algorithm's selection of these features as important in the splitting of the decision trees for the ensemble model. It also follows physical intuition, as the brittleness estimate is related to porosity (Heidari et al., 2014).

The SVR, NN, and GB performed relatively well in modeling the brittleness, as can be seen in the prediction on the blind well (test set). However, it appears that they are sensitive to the gamma-ray values, as regions with high gamma ray display unusual patterns. Irrespective of this, the models correctly predict the trend of brittleness in the blind well (Fig. 14). Analysis of the error histograms gives further information on the distribution of the errors between the estimated and predicted brittleness, which is also indicative of the uncertainty associated with the models. In all models, the errors in the prediction of the validation set are normally distributed and centered at zero. However, the errors of the SVR in the prediction of the test set are skewed toward the underestimated region, with about 80% of them in that region, and the bound for the prediction is ±0.2. The SVR, NN, and GB results are combined to form a committee machine capable of better generalization for the prediction of brittleness from the geophysical logs. This is reflected by the improvement in the performance scores (testing set scores: R2 of 0.8782, MSE of 0.0015, and MAE of 0.028) and the reduction in the error variance (Fig. 15).
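A hedged sketch of how the strategy 1 models and the committee average could be assembled with the hyperparameter values quoted above is given below; the training arrays are placeholders for the scaled logs and brittleness target, and the scikit-learn calls are an illustrative reconstruction rather than the original code.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor

# models configured with the strategy 1 hyperparameters reported in the text
svr = SVR(kernel="rbf", gamma="scale", epsilon=0.01, C=1)
gb = GradientBoostingRegressor(loss="huber", max_depth=7, min_samples_leaf=1,
                               n_estimators=250, min_samples_split=4)
nn = MLPRegressor(hidden_layer_sizes=(10,), max_iter=500, tol=1e-5, solver="lbfgs")

# placeholder arrays standing in for the scaled logs (X) and brittleness (y)
rng = np.random.default_rng(0)
X_train, y_train = rng.random((500, 5)), rng.random(500)
X_blind = rng.random((50, 5))

models = [svr, gb, nn]
for m in models:
    m.fit(X_train, y_train)

# committee machine: average of the three individual predictions
committee_pred = np.mean([m.predict(X_blind) for m in models], axis=0)
print(committee_pred[:5])
```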
3.4.2. Strategy 2

Another approach to estimating brittleness for wells without the required geophysical logs is to use machine learning algorithms to predict the shear sonic log before using an empirical relationship to calculate the brittleness estimate. This problem has been extensively addressed in the literature, making the approach feasible (Bukar et al., 2019; Zhang et al., 2022; Rajabi et al., 2010; Yu et al., 2021). However, a major drawback of this strategy is that the wells in which the brittleness is to be predicted must have the compressional sonic log for the brittleness calculation, as this information is also very important for enhancing the predictive capability of the machine learning models for the shear sonic. Typically, sonic logs are not available for all wells, making this strategy not the first choice for predicting brittleness.

For this task, the geophysical logs used as predictors are the gamma ray, neutron porosity, caliper, compressional sonic, and photoelectric factor. The three machine learning algorithms (SVR, GB, and NN) are trained with the shear sonic as the target. The hyperparameters for the GB model that resulted in the best performance are a learning rate of 0.1, maximum depth of 6, minimum samples leaf of 3, minimum samples split of 2, and number of estimators of 200. The R2 score on deployment of the model on the test set is 0.9353, with an MSE of 15.6 and an MAE of 3.0. For the SVR model, the hyperparameter combination is a C of 10, an epsilon of 0.02, gamma set to scale (1/(number of features * Variance(X))), and a radial basis function kernel. The performance metrics for the testing set are: R2 of 0.8684, MSE of 31.7, and MAE of 4.0. The best hyperparameters for the NN model were found to be one hidden layer containing 17 neurons and an lbfgs solver. The testing set R2 is 0.9432, MSE is 13.7, and MAE is 2.6. The fusion of the three trained models into a committee machine resulted in a model with better predictive power, reflected by the performance on the test set, where R2 is 0.9505, MSE is 11.9, and MAE is 2.6 (see Table 7 for the performance metrics of the models on the training and validation sets).

The errors of the models are normally distributed, except for the SVR, which has a left-skewed distribution toward the overestimated region (Fig. 17). However, more than 80% of the errors are within the ±10 bounds, with the committee machine having a tighter bound of ±5, making the committee machine a more robust solution to the problem, as it performs well on the blind well. The model is deployed to predict the shear sonic log, and the brittleness is then estimated from the relationship described in the methodology (Fig. 16). All models capture the general trend of brittleness variability in the blind well. However, a significant issue arises from the amplification of errors in the models. The brittleness estimates obtained from the predicted shear sonic appear shifted, with the SVR model affected most significantly, even though all the models perform well in predicting the shear sonic. This is because, in the brittleness computation, the error from the shear sonic prediction is squared multiple times, magnifying the errors and consequently making this strategy not the first choice for the prediction of brittleness.

Fig. 16. Model result on deployment on the blind well for strategy 2.


Fig. 17. Error histogram showing the distribution of the difference of the actual and predicted brittleness on deployment of the strategy 2 model on the validation
and test set.

Fig. 18. Histogram showing the brittleness distribution for the training set. The brittleness estimates are grouped as ductile (0.0–0.27), transition (0.27–0.5) and brittle (0.5–1.0).

3.4.3. Strategy 3

Experts oftentimes interpret brittleness qualitatively as low or high. The idea behind this is hinged upon the fact that the actual brittleness definition of rocks has not been properly constrained to date, and all the relationships used are proxies and serve as estimates. One can therefore lump the brittleness estimate values into groups and solve this prediction problem as classification. One advantage of doing this is that the complexity of the problem is reduced, and we can build a more robust and scalable model to predict the brittleness classes. One drawback of this approach is that a great deal of information is lost. For instance, the end goal of our study is to invert brittleness from seismic trace data, which will be difficult to achieve using only the information of brittleness classes; however, we implement this strategy and demonstrate the resulting performance and capabilities.

The first step is to group the brittleness estimates into classes. The cutoffs for the classes come either from prior knowledge of the reservoir, which is often biased, or through unsupervised techniques such as K-nearest neighbors. In this study, 3 groups were found: ductile, transition, and brittle, with 0.27 and 0.5 as cutoffs (Fig. 18). Isolation Forest, with a contamination of 0.1, is used to identify and remove outliers in the training data, while the classification variants of the three machine learning algorithms were used to build the models, and the metrics used to assess the performance of the models are accuracy and a confusion matrix.

Before building the model, about 60% of the training data was found to be from the low-brittleness class, which brings up the issue of class imbalance. This imbalance has grave implications for the model performance, as the models will find it harder to predict the minority classes. To work around this issue, we adopted a resampling technique popularly known as SMOTEENN (see section 2.5) to balance the data, resulting in the 3 groups having similar sample sizes.

Using the same training strategy as for the regression problem, the models' hyperparameters were tuned. For the GB model, the optimum hyperparameters were a maximum depth of 7, minimum samples leaf of 2, and minimum samples split of 2, with an accuracy on the training set of 100%, on the validation set of 98.58%, and on the testing set of 89.37%. Using a gamma of 1/(number of features * Variance(X)), the optimum hyperparameters for the SVM are a C of 100 and a radial basis function kernel, which had a performance of 96.01% on the training set, 95.15% on the validation set, and 89.06% on the testing set. The NN model was built with one hidden layer containing 19 neurons and an lbfgs solver, with an accuracy of 94.46% on the training set, 94.63% on the validation set, and 89.16% on the testing set (Table 8).
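A minimal, hypothetical sketch of this classification pipeline, grouping a continuous brittleness estimate with the 0.27 and 0.5 cutoffs, removing outliers, balancing the classes with SMOTEENN, and fitting a gradient boosting classifier with the hyperparameters quoted above, is shown below; all data are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from imblearn.combine import SMOTEENN   # requires the imbalanced-learn package

rng = np.random.default_rng(0)
X = rng.random((2000, 5))                       # stand-in for the conventional logs
brittleness = 0.6 * X[:, 0] + 0.4 * X[:, 1]     # stand-in for the brittleness estimate

# group the continuous estimate into 0 = ductile, 1 = transition, 2 = brittle
y = np.digitize(brittleness, [0.27, 0.5])

inliers = IsolationForest(contamination=0.1, random_state=0).fit_predict(X) == 1
X_bal, y_bal = SMOTEENN(random_state=0).fit_resample(X[inliers], y[inliers])

clf = GradientBoostingClassifier(max_depth=7, min_samples_leaf=2,
                                 min_samples_split=2, random_state=0)
clf.fit(X_bal, y_bal)
pred = clf.predict(X)
print(accuracy_score(y, pred))
print(confusion_matrix(y, pred))
```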


Table 8
Model performance of strategy 3 as assessed by the accuracy score. The gradient boosting model outperformed the other models in all sets.

Model                    Training accuracy  Validation accuracy  Testing accuracy
Support Vector Machine   96.01%             95.15%               89.06%
Gradient Boosting        100%               98.58%               89.37%
Neural Network           94.46%             94.63%               89.16%
way to fill this data availability gap.
and 89.16% on the testing set (Table 8). The confusion matrix (Fig. 19) In this study, we utilize support vector regression, neural network,
gives information on the error made in the class predictions for the and gradient boosting for prediction of brittleness using geophysical
testing set. Though the accuracy scores indicate that the GB algorithm logs. In terms of training scores, the gradient-boosting model out­
outperformed the SVM and the NN in terms of training, testing, and performed other models. However, a committee machine that combines
validation accuracy, the confusion matrix shows that each one of the all three models performed better on testing when applied to unseen
models is doing equally just as well in distinguishing the three classes data. In the classification task, the gradient boosting outperformed all
and predicting correctly but they are all misclassifying transition sam­ other models reinforcing the predictive capability of tree-based models.
ples as ductile samples at the similar rate. Further inspection of the Machine learning algorithms offer a workaround to predicting brit­
prediction on the blind well (Fig. 20), the accuracy and confusion matrix tleness in wells without the relevant geophysical logs. The workflow
do not tell the full story of the model’s performance. The NN prediction adopted here can be used in the prediction of other reservoir properties
in the Marcellus Shale interval, when compared with the actual classes, such as permeability and porosity. Using the workflow reported in this
is very different. However, the GB and SVM models do a decent job in study, we plan to predict reservoir heterogeneity from seismic traces to
the prediction. Note that this Marcellus layer is mostly ductile to tran­ be trained by well logs.
sitional in brittleness values, making the algorithms struggle to predict
the only brittle layer in the interval (See Table 8 for the accuracy of the Authorship contribution statement
models on the training and validation set).
T. O. performed data collection, processing, analysis., and quality
checking. D. G. conceived the study, designed the project and secured
4. Conclusions
data and funding from industry. T. O. and D. G. wrote and revised the
paper.
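As an illustration of the committee-machine idea mentioned above, a minimal sketch of combining the three regressors by simple averaging is given below; the equal weights and the function name are assumptions, and the combination scheme actually used in the study may differ:

    import numpy as np

    def committee_predict(models, X):
        # Average the brittleness predictions of several fitted regressors
        # (e.g., the GB, SVR, and NN models) with equal weights.
        predictions = np.column_stack([m.predict(X) for m in models])
        return predictions.mean(axis=1)

    # Usage (assuming gb_reg, svr_reg, and nn_reg are already fitted):
    # brittleness_hat = committee_predict([gb_reg, svr_reg, nn_reg], X_test)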
Authorship contribution statement

T. O. performed data collection, processing, analysis, and quality checking. D. G. conceived the study, designed the project, and secured data and funding from industry. T. O. and D. G. wrote and revised the paper.

Fig. 19. Confusion matrix of the models on deployment on the test set. The three models struggle to predict the transition class correctly.


Fig. 20. Model results on deployment on the blind well for strategy 3. The green flag represents high brittleness, while the yellow represents low brittleness. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgements

We thank Occidental Corporation for the support of our joint industry geophysics consortium. Energy Corporation of America (ECA) provided well data along with a 3D seismic survey. Tim Carr offered two additional sets of well logs from the MIP 3H and Boggess wells in the database. Journal peer reviews by the Associate Editor and four anonymous reviewers helped improve the quality of the paper.

Code availability

The code used in this study is available for download at the link: https://github.com/tobi-ore/Brittleness-Predicition-using-Machine-Learning

References

Ahmadov, J., 2021. Utilizing data-driven models to predict brittleness in Tuscaloosa marine shale: a machine learning approach. September. In: SPE Annual Technical Conference and Exhibition. OnePetro.
Altindag, R., Guney, A., 2010. Predicting the relationships between brittleness and mechanical properties (UCS, TS and SH) of rocks. Sci. Res. Essays 5, 2107–2118.
Batista, G.E., Prati, R.C., Monard, M.C., 2004. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter 6 (1), 20–29.
Bishop, C.M., 1995. Neural Networks for Pattern Recognition. Oxford University Press.
Bukar, I., Adamu, M.B., Hassan, U., 2019. A machine learning approach to shear sonic log prediction. August. In: SPE Nigeria Annual International Conference and Exhibition. OnePetro.
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., 2002. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357.
Cortes, C., Vapnik, V., 1995. Support-vector networks. Mach. Learn. 20 (3), 273–297.
Evans, K., Toth, R., Ore, T., Smith, J., Bannikova, N., Carr, T., Ghahfarokhi, P.K., 2019a. Fracture analysis before and after hydraulic fracturing in the Marcellus shale using the Mohr-Coulomb failure criteria. In: Unconventional Resources Technology Conference, Denver, Colorado, 22-24 July 2019, pp. 4036–4046.
Evans, K.G., Carr, T.R., Ghahfarokhi, P.K., Ore, T., Smith, J., Toth, R., 2019b. Improving completion techniques of unconventional shale reservoirs through the analysis of geomechanical properties and fracture imaging; a study of horizontal velocity and image logs within the MIP-3H Marcellus shale well in Monongalia County, West Virginia. In: 2019 AAPG Eastern Section Meeting: Energy from the Heartland.
Felix, E.A., Lee, S.P., 2019. Systematic literature review of preprocessing techniques for imbalanced data. IET Softw. 13 (6), 479–496.
Fossen, H., 2016. Structural Geology. Cambridge University Press.
Friedman, J.H., 2001. Greedy function approximation: a gradient boosting machine. Ann. Stat. 1189–1232.
Grieser, W.V., Bray, J.M., 2007. Identification of production potential in unconventional reservoirs. In: Production and Operations Symposium. Society of Petroleum Engineers.
Han, J., Pei, J., Kamber, M., 2011. Data Mining: Concepts and Techniques. Elsevier.
Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media.
Heidari, M., Khanlari, G.R., Torabi-Kaveh, M., Kargarian, S., Saneie, S., 2014. Effect of porosity on rock brittleness. Rock Mech. Rock Eng. 47 (2), 785–790.
Hopfield, J.J., 1988. Artificial neural networks. IEEE Circ. Dev. Mag. 4 (5), 3–10.
Jarvie, D.M., Hill, R.J., Ruble, T.E., Pollastro, R.M., 2007. Unconventional shale-gas systems: the Mississippian Barnett Shale of North-Central Texas as one model for thermogenic shale-gas assessment. AAPG (Am. Assoc. Pet. Geol.) Bull. 91, 475–499.
Kraskov, A., Stögbauer, H., Grassberger, P., 2004. Estimating mutual information. Phys. Rev. 69 (6), 066138.
Krogh, A., Vedelsby, J., 1994. Neural network ensembles, cross validation, and active learning. Adv. Neural Inf. Process. Syst. 7.
Krstajic, D., Buturovic, L.J., Leahy, D.E., Thomas, S., 2014. Cross-validation pitfalls when selecting and assessing regression and classification models. J. Cheminf. 6 (1), 1–15.
Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., Talwalkar, A., 2017. Hyperband: a novel bandit-based approach to hyperparameter optimization. J. Mach. Learn. Res. 18 (1), 6765–6816.
Li, Y., Zhou, L., Li, D., Zhang, S., Tian, F., Xie, Z., Liu, B., 2020. Shale brittleness index based on the energy evolution theory and evaluation with logging data: a case study of the Guandong block. ACS Omega 5 (22), 13164–13175.
Liu, F.T., Ting, K.M., Zhou, Z.H., 2008. Isolation forest. December. In: 2008 Eighth IEEE International Conference on Data Mining. IEEE, pp. 413–422.
Medsker, L., Jain, L.C. (Eds.), 1999. Recurrent Neural Networks: Design and Applications. CRC Press.
Mews, K.S., Alhubail, M.M., Barati, R.G., 2019. A review of brittleness index correlations for unconventional tight and ultra-tight reservoirs. Geosciences 9 (7), 319.
Negara, A., Ali, S.S., Al Dhamen, A., Kesserwan, H., Jin, G., 2017. Data-driven brittleness index prediction from elemental spectroscopy and petrophysical properties using support-vector regression. June. In: SPWLA 58th Annual Logging Symposium. OnePetro.
Ore, T.M., 2020. A machine learning and data-driven prediction and inversion of reservoir brittleness from geophysical logs and seismic signals: a case study in Southwest Pennsylvania, Central Appalachian Basin. M.S. Thesis, West Virginia University.


Ore, T., Gao, D., 2021. Supervised machine learning to predict brittleness using well logs and seismic signal attributes: methods and application in an unconventional reservoir. In: First International Meeting for Applied Geoscience & Energy. Society of Exploration Geophysicists, pp. 1566–1570.
O'Shea, K., Nash, R., 2015. An Introduction to Convolutional Neural Networks. arXiv preprint arXiv:1511.08458.
Perez, R., Marfurt, K., 2013. Brittleness estimation from seismic measurements in unconventional reservoirs: application to the Barnett Shale. January. In: 2013 SEG Annual Meeting. Society of Exploration Geophysicists.
Perrone, M.P., 1993. Improving Regression Estimation: Averaging Methods for Variance Reduction with Extensions to General Convex Measure Optimization. Brown University.
Rajabi, M., Bohloli, B., Ahangar, E.G., 2010. Intelligent approaches for prediction of compressional, shear and Stoneley wave velocities from conventional well log data: a case study from the Sarvak carbonate reservoir in the Abadan Plain (Southwestern Iran). Comput. Geosci. 36 (5), 647–664.
Rickman, R., Mullen, M.J., Petre, J.E., Grieser, W.V., Kundert, D., 2008. A practical use of shale petrophysics for stimulation design optimization: all shale plays are not clones of the Barnett Shale. January. In: SPE Annual Technical Conference and Exhibition. Society of Petroleum Engineers.
Sazli, M.H., 2006. A brief review of feed-forward neural networks. Communications Faculty of Sciences University of Ankara Series A2-A3. Phys. Sci. Eng. 50 (1).
Smola, A.J., Schölkopf, B., 2004. A tutorial on support vector regression. Stat. Comput. 14 (3), 199–222.
Sun, D., Lonbani, M., Askarian, B., Jahed Armaghani, D., Tarinejad, R., Thai Pham, B., Huynh, V.V., 2020. Investigating the applications of machine learning techniques to predict the rock brittleness index. Appl. Sci. 10 (5), 1691.
Tresp, V., 2018. Committee machines. In: Handbook of Neural Network Signal Processing. CRC Press, 5-1.
Wang, F.P., Gale, J.F.W., 2009. Screening criteria for shale-gas systems. Gulf Coast Assoc. Geol. Soc. Transact. 59, 779–793.
Wilson, D.L., 1972. Asymptotic properties of nearest neighbor rules using edited data. IEEE Transact. Syst. Man Cybernetics (3), 408–421.
Wood, D.A., 2021. Brittleness index predictions from Lower Barnett Shale well-log data applying an optimized data matching algorithm at various sampling densities. Geosci. Front. 12 (6), 101087.
Ye, Y., Tang, S., Xi, Z., Jiang, D., Duan, Y., 2022. A new method to predict brittleness index for shale gas reservoirs: insights from well logging data. J. Petrol. Sci. Eng. 208, 109431.
Yu, Y., Xu, C., Misra, S., Li, W., Ashby, M., Pan, W., Deng, T., Jo, H., Santos, J.E., Fu, L., Wang, C., 2021. Synthetic sonic log generation with machine learning: a contest summary from five methods. Petrophysics-The SPWLA J. Format. Evaluat. Reservoir Description 62 (4), 393–406.
Zhang, F., Deng, S., Wang, S., Sun, H., 2022. Convolutional neural network long short-term memory deep learning model for sonic well log generation for brittleness evaluation. Interpretation 10 (2), T367–T378.

