
Accepted Manuscript

Identifying channel sand-body from multiple seismic attributes with an improved random forest algorithm

Yile Ao, Hongqi Li, Liping Zhu, Sikandar Ali, Zhongguo Yang

PII: S0920-4105(18)30916-1
DOI: https://doi.org/10.1016/j.petrol.2018.10.048
Reference: PETROL 5411
To appear in: Journal of Petroleum Science and Engineering
Received Date: 11 August 2018
Revised Date: 10 October 2018
Accepted Date: 16 October 2018

Please cite this article as: Ao, Y., Li, H., Zhu, L., Ali, S., Yang, Z., Identifying channel sand-body from multiple seismic attributes with an improved random forest algorithm, Journal of Petroleum Science and Engineering (2018), doi: https://doi.org/10.1016/j.petrol.2018.10.048.


Identifying Channel Sand-body from Multiple Seismic Attributes with an Improved Random Forest Algorithm

Yile Ao^a,*, Hongqi Li^a, Liping Zhu^a, Sikandar Ali^a, Zhongguo Yang^b

^a China University of Petroleum-Beijing, Changping, Beijing, China
^b North China University of Technology, Shijingshan, Beijing, China

Abstract

Machine learning provides numerous data-driven tools for automatic pattern recognition. Even though various algorithms such as neural networks and support vector machines have been widely applied, it is still necessary to explore new paradigms and algorithms to improve machine learning assisted seismic interpretation. Random Forest (RF) is a widely used ensemble algorithm; however, only limited studies of random forest in seismic applications have been published. In this article, the methodology of random forest is introduced systematically. Meanwhile, to solve the problem of hyper-parameter determination, we propose an improved algorithm named Pruning Random Forest (PRF). To reveal the advantages of PRF in terms of predictive performance, robustness, and feature selection compared with support vector machine, neural network, and decision tree, several well-designed experiments are executed on seismic data from the western Bohai Sea of China. The potential and advantages of random forest in the present case are confirmed by these experiments, which substantiates that the proposed pruning random forest algorithm provides a reliable alternative for further machine learning assisted seismic interpretation.

Keywords: machine learning, random forest, ensemble learning, seismic interpretation, channel identification

1. Introduction

Literature studies show that machine learning algorithms exhibit superior performance in terms of high accuracy and low error compared to conventional statistical methods [1, 2]. In the field of seismic interpretation, multifarious algorithms such as artificial neural networks [3, 4, 5], support vector machines [6, 7, 8], and fuzzy logic [9] have been widely applied for geological-body identification [10, 11], lithofacies identification [12, 13], fault detection [14, 15], formation property estimation [16, 17, 18], and reservoir characterization [19, 20]. In the above studies, the effectiveness of using machine learning algorithms in seismic interpretation has been confirmed. However, there are still many algorithms that have not been explored, and whether the application of machine learning assisted seismic interpretation can be improved by introducing more sophisticated algorithms is still worth studying.

∗ Corresponding author
Email address: aoyile@yeah.com (Yile Ao)

Random Forest (RF) is an advanced ensemble learning algorithm proposed by Breiman [21] based on the concept of randomized decision trees [22], which combines the essence of bagging ensembles [23] and the random feature selection technique [24]. Real-world applications show that random forests have significant advantages in solving machine learning problems in many fields [25, 26, 27, 28, 29]. In the field of applied geophysics and geosciences, random forests have been employed for seafloor sediment porosity evaluation [30], volcanic magma tectonic setting discrimination [31], subsurface mineral prospectivity mapping [32, 33, 34, 35, 36], surface lithological mapping [37, 38, 39, 40], soil organic matter mapping [41, 42, 43, 44], and groundwater potential evaluation [45, 46]. In this research, random forest is often compared with algorithms such as neural networks, support vector machines, and decision trees [33, 36, 39, 37, 45]. In most cases, the superiority of the random forest algorithm is substantiated, which indicates the enormous potential of random forest for petroleum geophysics applications.

However, applications of random forest in petroleum geophysics are concentrated on borehole formation interpretation, such as borehole lithology interpretation [47], logging facies analysis [48], and borehole formation evaluation [49, 50, 51]. Only limited studies [10, 52, 53] have explored the possibility of using random forest for seismic interpretation. In this article, we continue to explore the application of random forest to seismic interpretation. However, different from the previous studies, we take a deeper look at why random forest works better than other algorithms based on the analysis of two issues:
1. Inevitable Data Error: Noise in seismic data and sample mislabeling bring a large number of errors into the constructed sample set. Under this circumstance, effective machine learning requires the employed algorithm to be robust enough to eliminate the influence of data errors.
2. Irrelevant Information: In machine learning assisted seismic interpretation, we often wish to use as many seismic attributes as possible for pattern recognition. However, too many attributes also bring irrelevant information. Therefore, the applied algorithm should have strong feature selection ability to reduce the influence of irrelevant attributes.


Focusing on the above issues, we designed several experiments to reveal the advantages of random forest in terms of robustness and adaptive feature selection. Through our analysis, the superiority of applying random forests to seismic interpretation is elucidated, which we believe to be the biggest contribution of our study. The remainder of this article is structured as follows. After this brief introduction, Section 2 presents the methodology of the decision tree and random forest algorithms; meanwhile, to solve the problem of hyper-parameter determination, we propose an improved random forest algorithm with post-pruning, which makes random forest much easier to tune and train. In Section 3, the proposed algorithm is applied to a real-world seismic interpretation task to explore its effectiveness in practice. Then in Section 4, by comparing with support vector machines, neural networks, and decision trees, the advantages of pruning random forest on the aforementioned issues are revealed by experiments. Finally, Section 5 presents the conclusions of our research.

2. Methodology

In this section, the methodologies of the decision tree and random forest algorithms are presented systematically. As the basis of random forest, the decision tree algorithm is introduced first. After the introduction of random forest, we point out the problem of hyper-parameter determination in the practice of the ordinary random forest algorithm, and then propose an improved algorithm which uses a post-pruning operation to reduce the number of hyper-parameters.
2.1. Algorithm of decision tree
Decision trees refer to a family of algorithms which use a divide-and-conquer strategy to induce rules from data for further prediction. There are many specific decision-tree algorithms, including CHAID [54], ID3 [55], C4.5 [56], and CART [22]. The differences between them mainly lie in their partition types (binary partition for CART, multi-way partition for C4.5) and partition criteria (Bonferroni testing significance for CHAID, information gain ratio for C4.5, Gini index for CART). As the basis of the random forest algorithm, the CART algorithm is chosen as an example to illustrate the principle of classification decision trees.

CART (Classification And Regression Tree) is a non-parametric algorithm that produces either classification or regression trees, depending on whether the learning target is categorical or numeric. The core of the CART algorithm is its recursive partition process. For each partition, denoting the input space of the parent node N_C as X_C, CART tries to find the best split that partitions X_C into two subspaces X_L and X_R, corresponding to the left child node N_L and the right child node N_R. Denoting the split of the pth input at a specified value v as S_{p,v}, for numerical inputs such as seismic attributes, S_{p,v} is expressed as:

S_{p,v}:  X_L = {x ∈ X_C | x_p < v};  X_R = {x ∈ X_C | x_p ≥ v}   (1)


In fact, S_{p,v} is equivalent to an IF-ELSE rule. During the training of CART, a cascaded logic chain is learned and stored as a binary tree by recursive binary partition. For a K-class classification task, the split for each node is optimized by minimizing the joint impurity, which is defined as:

JI(S_{p,v}) = Prob(x ∈ X_L) · Gini(X_L) + Prob(x ∈ X_R) · Gini(X_R)   (2)

In the definition of Eq. (2), Gini(X) represents the Gini impurity of X, which is:

Gini(X) = 1 − Σ_{k=1}^{K} Prob(y = y_k | x ∈ X)²   (3)

By traversing every possible split over all P inputs, the best split S_{p,v} with the minimum JI(S_{p,v}) is selected for the partition of X_C. Then X_L and X_R are regarded as parent nodes for further recursive partition, until any of the following conditions is met (a minimal search sketch follows this list):

• No split is necessary, since the classes of all samples in the subspace are the same.
• No split is available, since the input values of all samples in the subspace are the same.
• The tree depth exceeds the preset maximum depth limitation (maxdepth).
• The sample count in the subspace is less than the preset minimum leaf size limitation (minleave).
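The split search described above can be sketched in a few lines of R (the language used for our experiments). The helper names gini and best_split are illustrative only; the sketch assumes a numeric input matrix X and a factor label vector y.

```r
# Gini impurity of a set of class labels (Eq. 3)
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Exhaustive search for the best split S_{p,v}, minimizing the
# joint impurity of Eq. (2) over all inputs and split values.
best_split <- function(X, y) {
  best <- list(ji = Inf, p = NA, v = NA)
  for (p in seq_len(ncol(X))) {
    for (v in unique(X[, p])) {
      left <- X[, p] < v
      if (!any(left) || all(left)) next
      ji <- mean(left) * gini(y[left]) + mean(!left) * gini(y[!left])
      if (ji < best$ji) best <- list(ji = ji, p = p, v = v)
    }
  }
  best
}
```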


After the tree is fully grown with optimization, the whole input space is partitioned into L subspaces X_1 ... X_L defined by the L leaf nodes of the tree structure. For a sample x_0 in the lth subspace X_l, the conditional probability of the kth class is estimated as:

Prob(y = y_k | x = x_0) = [Σ_{i=1}^{N} I(y_i = y_k, x_i ∈ X_l)] / [Σ_{i=1}^{N} I(x_i ∈ X_l)]   (4)

where N represents the number of samples in the training set, and I(c) is an indicator function which returns 1 if c is true and 0 if c is false. For an observation with input vector x_0, by substituting x_0 into the logic chain to find the corresponding leaf node and subspace, the conditional probability of each class is estimated by Eq. (4). The CART tree then chooses the class with the highest conditional probability as the conclusion for x_0.

Figure 1: A demonstration of the CART algorithm. The right side is the learned tree structure, which partitions the whole input space into six subspaces, visualized on the left side.
For an intuitive understanding of the algorithm, the classification result of CART on a simulated dataset 1 is illustrated in Fig. 1. With a carefully selected hyper-parameter setting (minleave = 8 and maxdepth = 5), a tree structure with five splits is learned from the samples, which partitions the whole input space into X_1 to X_6. Accordingly, the class type of each node is determined by the majority class of the samples in the corresponding subspace. As a result, for any sample with known x and y, its class is determined by substituting it into the logic chain represented by the tree structure.
Decision trees are fast to train and easy to understand; however, their drawbacks are undeniable: 1) they are very unstable, as a small change in the training samples will engender a significant change in the tree structure; 2) decision trees are easy to overfit, which may create over-complex trees with low predictive ability; 3) the hyper-parameters of decision trees are hard to determine; 4) the classification boundary of a decision tree may be too rough for some problems. These drawbacks seriously affect the practice of decision trees, especially for classification problems with serious noise and mislabeling. However, with the benefits of ensemble learning [58], these drawbacks can be eliminated to a certain extent.

1 This dataset was created by Hastie, Tibshirani, et al. [57], and is also used for demonstration in the famous book The Elements of Statistical Learning.



2.2. Algorithm of random forest

The formal definition of Random Forest (RF) was first given by Breiman [21] in 2001: a bagging ensemble of uncorrelated decision trees learned with randomized node optimization. The framework of Breiman's random forest is demonstrated in Fig. 3 (a). Based on M sample subsets generated from the whole sample set S by bootstrap sampling, M randomized decision trees are constructed independently and then integrated as a bagging model. The randomness of these trees is embodied in two aspects:

• Random sample selection: The construction of the mth decision tree T_m(x) is based on a random sample subset S_m, obtained by bootstrap sampling from the whole sample set S.

• Random feature selection: During the partition of tree nodes, instead of traversing every possible split over all P inputs, the randomized tree only considers splits within a randomly selected input subset. The size of this subset is specified by the hyper-parameter ntry.

After their independent construction in parallel, the M base trees are integrated as the final random forest model F(x). For the prediction of a sample x_0, the predictions of the M trees are determined first, and the conditional probability of the kth class of x_0 is estimated by:

Prob(y = y_k | x = x_0) = (1/M) Σ_{m=1}^{M} I(T_m(x_0) = y_k)   (5)

The class with the highest conditional probability is chosen as the classification result of x_0, which is equivalent to a majority-voting aggregation:

F(x_0) = argmax_{y_k ∈ {y_1, ..., y_K}} Prob(y = y_k | x = x_0) = Mode(T_1(x_0), T_2(x_0), ..., T_M(x_0))   (6)
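The aggregation of Eqs. (5)-(6) amounts to a per-sample vote count. A minimal R sketch, assuming tree_preds is an N × M character matrix holding one column of class predictions per tree:

```r
# Majority-voting aggregation of M tree predictions (Eqs. 5-6).
rf_vote <- function(tree_preds) {
  apply(tree_preds, 1, function(votes) {
    counts <- table(votes)            # vote shares = Prob(y = y_k | x)
    names(counts)[which.max(counts)]  # class with the highest share
  })
}
```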
TE

For bagging-style ensembles, the diversity between base models has been proven beneficial for the performance and robustness of ensemble models [23, 59, 60]. As a special version of bagging trees, random forest inherits this nature. Due to the additional randomized node optimization, the diversity of the base trees is increased significantly, which makes random forest more accurate and robust than common bagging trees without randomization. However, similar to other machine learning algorithms, the performance of random forest models is also influenced by the setting of hyper-parameters. In general, the hyper-parameters of random forest include:

• ntree: The number of generated randomized trees. Predicting with a large number of trees improves the stability and accuracy of results. However, too many trees also bring unnecessary computational consumption. For real-world practice, a recommended value for ntree is 500.

• ntry: The size of the randomly selected feature set, which is directly associated with the randomness of the trees [61]. It is generally believed that the smaller ntry is, the more diversity is obtained. However, ntry also impacts the feature selection ability of the base trees [62]; therefore, a too-small ntry will seriously aggravate the influence of irrelevant features.

• maxdepth: The maximum depth limitation of each tree, inherited from the CART algorithm. maxdepth controls the complexity of the base tree models: too-shallow trees cannot fit the data effectively (underfitting), while too-deep trees tend to overfit [22]. Thus, the determination of maxdepth is not easy.

• minleave: The minimum sample count in a leaf, also inherited from the CART algorithm. Even though a larger minleave helps to eliminate the influence of mislabeling [62], the determination of minleave also faces the dilemma of overfitting and underfitting.
It is obvious that, except for ntree, the values of the remaining hyper-parameters are not easy to determine. As a result, to ensure the effectiveness of the trained random forest model, tuning of the hyper-parameters is necessary. A safe way is to traverse every possible setting by grid search to find the best one. For instance, supposing 10 possible values are available for each hyper-parameter, there will be 10^3 = 1000 combinations for the setting of {ntry, maxdepth, minleave}, and traversing all 1000 combinations by grid search is clearly not a wise choice (a sketch of this burden is shown below). Although there are some more sophisticated tuning techniques [63, 64, 65], a better solution to this problem is to find a way to reduce the number of hyper-parameters of the random forest.
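For illustration, such a grid search could be written with the ranger package as sketched below; the data frame train with a factor column Type is an assumption, and ranger's mtry, max.depth, and min.node.size arguments play the roles of ntry, maxdepth, and minleave.

```r
library(ranger)

# Exhaustive grid search over three hyper-parameters: with 10
# candidate values each, 1000 forests have to be trained.
grid <- expand.grid(ntry = 1:10, maxdepth = seq(3, 30, 3),
                    minleave = seq(10, 100, 10))
grid$oob_error <- NA
for (i in seq_len(nrow(grid))) {
  fit <- ranger(Type ~ ., data = train, num.trees = 500,
                mtry = grid$ntry[i], max.depth = grid$maxdepth[i],
                min.node.size = grid$minleave[i])
  grid$oob_error[i] <- fit$prediction.error  # out-of-bag error
}
grid[which.min(grid$oob_error), ]            # best setting found
```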
2.3. The pruning random forest

To simplify the hyper-parameter tuning process, we propose an improved version. The basic idea of our improvement is to replace the complexity control implemented by hyper-parameters with post-pruning operations. Consequently, with a specified ntree, only ntry needs to be determined, which makes the random forest algorithm much easier to use in practice.
Figure 2: A demonstration of the post-pruning operations. The tree in (a) is fully grown without any limitation on the tree depth or leaf size, and overfits the samples seriously. However, since the overfitted nodes in (a) are pruned during the post-pruning, the result in (b) is competitive with the tree in Fig. 1 and its fine parameter setting.

Pruning is a technique that simplifies overfitted decision trees by removing nodes with poor predictive ability. After the pruning operations, the risk of overfitting is greatly reduced since the complexity of the tree is decreased. In our improvement, a post-pruning process is employed. The term post-pruning refers to pruning operations executed after the construction of trees. For the post-pruning of a decision tree, an additional pruning set S^P independent of the training set S^T is required. During the pruning phase, for a branch node N_c in the tree structure T* constructed with S^T, two predictive functions are available:

• L_c(x): this function views N_c as a leaf node and uses the majority class of the samples in N_c for classification prediction.

• T_c(x): this function views N_c and its child nodes as a sub-tree and classifies samples with its logic chain.

By substituting the samples in S^P into T*, the classification accuracies of L_c(x) and T_c(x) are evaluated for each branch node N_c. If the accuracy of L_c(x) is higher than that of T_c(x) (which indicates that T_c(x) is overfitted), the sub-tree of N_c is pruned and N_c is converted into a leaf node. A demonstration of post-pruning is provided in Fig. 2. After the pruning phase, the overfitted tree in Fig. 2 (a) is corrected effectively, and the pruned result is competitive with Fig. 1. This demonstration illustrates that, without any hyper-parameter optimization, the robustness and accuracy of decision trees can also be guaranteed by post-pruning operations.
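A minimal R sketch of this reduced-error pruning rule, assuming a tree stored as a nested list with fields split (a function of a sample returning "L" or "R"), left, right, and label (the node's majority training class); all names are illustrative.

```r
# Classify one sample by walking the logic chain of a (sub-)tree.
tree_predict <- function(node, x) {
  if (is.null(node$left)) return(node$label)  # leaf node reached
  child <- if (node$split(x) == "L") node$left else node$right
  tree_predict(child, x)
}

# Reduced-error post-pruning: collapse a branch node into a leaf
# whenever the leaf rule L_c(x) is at least as accurate on the
# pruning samples reaching the node as the sub-tree rule T_c(x).
prune_tree <- function(node, Xp, yp) {
  if (is.null(node$left) || length(yp) == 0) return(node)
  go_left <- vapply(seq_len(nrow(Xp)),
                    function(i) node$split(Xp[i, ]) == "L", logical(1))
  node$left  <- prune_tree(node$left,  Xp[go_left, , drop = FALSE],
                           yp[go_left])
  node$right <- prune_tree(node$right, Xp[!go_left, , drop = FALSE],
                           yp[!go_left])
  acc_subtree <- mean(vapply(seq_len(nrow(Xp)),
                             function(i) tree_predict(node, Xp[i, ]),
                             character(1)) == yp)
  acc_leaf <- mean(node$label == yp)
  if (acc_leaf >= acc_subtree) {              # convert to a leaf
    node$left <- NULL; node$right <- NULL
  }
  node
}
```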
Figure 3: Frameworks of the two random forest algorithms.


The construction of training-and-pruning trees requires a random bipartition of the whole sample set into a training set and a pruning set. Due to the randomness of the bipartition, such a design brings uncertainty into the tree structure. However, if we combine this process with the random forest framework, the uncertainty is no longer a problem, since it is eliminated by the integration of many trees. Furthermore, the randomness of the sample partition exactly meets random forest's demand for diversity, with an effect equivalent to bootstrap sampling. Therefore, we propose an improved random forest with post-pruning operations. The framework of the proposed algorithm is shown in Fig. 3 (b). To differentiate it from the original random forest algorithm, we name the new algorithm Pruning Random Forest (PRF); the main differences between the two focus on two points:


1. Random Bipartition: For the construction of the mth base tree, the whole sample set S is partitioned randomly into a training set S_m^T and a pruning set S_m^P, instead of bootstrap sampling.
2. Pruning after Training: For the generation of the mth base tree, a randomized tree structure T_m* is first learned from the training set S_m^T without any limitation on depth or leaf size. The final tree T_m is then obtained by pruning T_m* with the pruning set S_m^P (a construction sketch follows below).
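Putting the two points together, the PRF construction can be sketched as below, reusing the illustrative prune_tree and tree_predict helpers from above; grow_tree is likewise a hypothetical function standing for fully growing an unrestricted randomized CART tree that tries only ntry randomly chosen attributes per node.

```r
# Sketch of PRF construction: random bipartition + grow + prune.
prf_fit <- function(X, y, ntree = 500, ntry = 5) {
  lapply(seq_len(ntree), function(m) {
    tr <- sample(nrow(X), floor(nrow(X) / 2))   # random bipartition
    pr <- setdiff(seq_len(nrow(X)), tr)
    tree <- grow_tree(X[tr, ], y[tr], ntry)     # no depth/leaf limits
    prune_tree(tree, X[pr, , drop = FALSE], y[pr])
  })
}

# Prediction: majority vote over the pruned trees (cf. Eq. 6).
prf_predict <- function(forest, x) {
  votes <- sapply(forest, tree_predict, x = x)
  names(which.max(table(votes)))
}
```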
In fact, the performance of a random forest can be decomposed into two parts: 1) the mean performance of the base trees, and 2) the performance gain contributed by the diversity of the base trees [58]. Since the diversity associated with random sampling and random feature selection is unchanged, while the mean performance of the base trees is improved by post-pruning, the performance of PRF models will be better than that of ordinary random forest models.

However, the above conclusion has a precondition: there must be enough samples for the training of PRF. In simpler terms, we must guarantee that the samples in S_m^T are sufficient (since half of the samples have been removed for pruning) to reflect the patterns of the classes. Fortunately, for classification tasks with large sample sets this is not a problem. The present case reveals that, although the design of PRF aims to simplify the hyper-parameter tuning process, our improvement also brings advantages in predictive ability, robustness, and feature selection, which will be discussed in Section 4.

3. Method Application

In this section, a real-world application of the proposed algorithm is illustrated based on a channel identification task for a block in the western Bohai Sea 2. The geological survey and data preparation work is introduced first, and the details of model training and prediction are presented afterwards.

3.1. Geological survey and data preparation

In the present case, the formation F33 is chosen as the research target. Preliminary geological studies show that F33 was deposited in a fluvial environment, which developed large-scale meandering rivers and a large number of small braided rivers at the same time. Since the channel sand-bodies are the major hydrocarbon storage space of this block, the key to reservoir evaluation is to determine the horizontal distribution of channels. A slice of the post-stack seismic volume for F33 is visualized in Fig. 4. At first glance, the channel sand-bodies of a large-scale meandering river and another middle-scale meandering river are easy to distinguish. However, due to the influence of heterogeneity and the tuning thickness phenomenon [66], ambiguity in amplitude makes it difficult to identify the small-scale braided channels clearly. Therefore, additional processing is still necessary to improve the interpretation of channels based on the seismic data.

Waveform analysis shows that, due to the existence of a strong reflection interface, the waveforms of the channel formation are characterized by strong negative amplitude with a long wavelength. Meanwhile, for the non-channel formation, the responses are weak and variable negative waveforms or no significant response. The image of section INLINE 1450 is illustrated in Fig. 5. Due to the significant negative responses, channel sand-bodies are easy to distinguish from the background. Comparing the sketched areas (a), (b), and (c) in Fig. 5, it is clear that the

2 For the purposes of data confidentiality, the real names of the block, formation, and wells are replaced.

Figure 4: The slice image of the post-stack seismic volume for F33. The white stripes in the image represent areas affected by faults, which are not considered in the further analysis.
Figure 5: The image of section INLINE 1450. The sketched areas (a) and (b) are typical seismic responses of braided river channels, while (c) is a typical seismic response of meandering river channels.


amplitude, wavelength, and stability of braided river channels are much smaller than those of meandering river channels. In addition to amplitude, response differences are also reflected in phase, frequency, and morphology [67], and the identification of channels can be improved by synthesizing various attributes from different perspectives.

Table 1: Abbreviations and meanings of the 16 candidate seismic attributes

Abbreviation    Attribute Meaning                Abbreviation    Attribute Meaning
Seismic         post-stack seismic volume        InsAmp          instantaneous amplitude
CosPhase        instantaneous phase cosine       SinPhase        instantaneous phase sine
InsFreq         instantaneous frequency          InsBand         instantaneous bandwidth
InsQuality      instantaneous quality factor     RmsAmp          rooted mean squared amplitude
ArcLength       arc length of the waveform       ToughAmp        amplitude of the trough
ToughArea       area of the waveform             ToughCurvity    mean curvity of the trough
ToughFreq       frequency of the trough          EffectBand      interval effective bandwidth
Symmetry        symmetry of the waveform         Variance        interval response variance
To enrich the representation space, 16 kinds of seismic attributes [68] are obtained as candidate inputs. Their abbreviations and meanings are listed in Table 1, and the calculation methods are illustrated in Appendix A. In the present case, corresponding to meandering river channel, braided river channel, and non-channel, the formations are summarized into three classes denoted MRC, BRC, and NRC respectively. For each trace, the values of the 16 attributes are collected to form the inputs of a sample. As a supervised learning task, the labels of enough samples must be determined for supervision. However, only 48 wells are available for F33, which means that only 48 samples can be labeled based on the well-point conclusions. Such a small sample size is far from sufficient to guarantee the effectiveness of machine learning, so more samples must be labeled in other ways.


Figure 6: A demonstration of artificial sample labeling in a selected area. The formation types of W30 and W35 are BRC, while the formation types of W33, W37, and W38 are NRC. Besides, it is clear that the discontinuity of the strips in ToughAmp can be supplemented by CosPhase.

Fortunately, depending on the significant amplitude difference, the channels of meandering rivers are easy to sketch out and label with MRC. Meanwhile, since the horizontal development of the formation is relatively stable, the neighboring traces of well-points with similar seismic responses share the same class. Fig. 6 provides a demonstration of the artificial sample labeling process. Since the formation types of W30 and W35 are BRC while the formation types of W33, W37, and W38 are NRC, the strips with high amplitude near W30 and W35 are labeled BRC, and the areas with low or positive amplitude near W33, W37, and W38 are labeled NRC. Besides, there are also many high-amplitude strips which are easy to distinguish as BRC. During the labeling process, the discontinuity of strips on the amplitude slices is supplemented by the phase slices. Meanwhile, to avoid artificial mislabeling, the areas near the possible boundaries of channels are reserved without labeling.

Table 2: A subset of the collected training samples for demonstration

Type  Seismic          InsAmp          SinPhase  CosPhase  ArcLength       RmsAmp         ...  Symmetry
MRC   −4.351 × 10^9    4.291 × 10^9    -0.091    -0.728    1.434 × 10^10   8.547 × 10^9   ...  0.653
MRC   −4.802 × 10^9    4.842 × 10^9    -0.176    -0.699    1.579 × 10^10   9.441 × 10^9   ...  0.792
MRC   −5.020 × 10^9    4.999 × 10^9    -0.164    -0.701    1.602 × 10^10   9.915 × 10^9   ...  0.817
MRC   −4.750 × 10^9    4.632 × 10^9    -0.048    -0.720    1.542 × 10^10   9.233 × 10^9   ...  0.610
MRC   −1.923 × 10^9    2.177 × 10^9    -0.173    -0.629    7.888 × 10^9    3.705 × 10^9   ...  0.700
...   ...              ...             ...       ...       ...             ...            ...  ...
BRC   −3.430 × 10^9    3.520 × 10^9    0.131     -0.816    1.253 × 10^10   7.486 × 10^9   ...  0.554
BRC   −4.330 × 10^9    4.235 × 10^9    0.234     -0.814    1.293 × 10^10   9.361 × 10^9   ...  0.631
BRC   −3.954 × 10^9    3.787 × 10^9    0.357     -0.761    1.081 × 10^10   8.243 × 10^9   ...  0.660
BRC   −2.102 × 10^9    2.149 × 10^9    0.403     -0.729    6.407 × 10^9    4.241 × 10^9   ...  0.688
BRC   −7.397 × 10^8    9.837 × 10^8    0.416     -0.468    4.376 × 10^9    1.309 × 10^9   ...  0.470
...   ...              ...             ...       ...       ...             ...            ...  ...
NRC   −3.599 × 10^7    2.016 × 10^8    0.859     0.245     6.620 × 10^8    5.986 × 10^7   ...  0.122
NRC   −3.474 × 10^8    4.129 × 10^8    0.267     -0.771    1.265 × 10^9    8.214 × 10^8   ...  0.378
NRC   −4.124 × 10^8    4.552 × 10^8    0.140     -0.594    1.933 × 10^9    8.097 × 10^8   ...  0.300
NRC   −2.247 × 10^8    4.590 × 10^8    0.083     -0.371    2.115 × 10^9    6.990 × 10^8   ...  0.492
NRC   −2.900 × 10^8    6.097 × 10^8    0.051     -0.427    2.906 × 10^9    9.240 × 10^8   ...  0.449
...   ...              ...             ...       ...       ...             ...            ...  ...

Frankly, these artificially labeled samples are not guaranteed to be absolutely correct, but the majority of them are objective and reasonable. Overall, the benefit of sufficient samples outweighs the impact of a few mislabeled samples (which can be eliminated by using robust algorithms). Through the above process, 66807 traces are labeled artificially (including 33085 MRC, 13132 BRC, and 20590 NRC). A subset of the labeled samples is illustrated in Table 2 for demonstration, where each column represents an input and the first column represents the formation type of each sample. Fig. 7 shows the horizontal locations of the 66807 labeled samples.
So far, the channel identification task has been converted into a 3-class classification problem. Based on 66807 samples with 16 input attributes, a classifier can be trained with a suitable learning algorithm to predict the formation type at all traces. In the next subsection, we apply the newly proposed pruning random forest algorithm to identify the distribution of channels in this way.

3.2. Model training and prediction

To validate the utility of the proposed algorithm, a PRF classifier is trained on the collected 66807 samples for the prediction of formation type. During the training phase, 500 randomized trees are constructed for the PRF ensemble. For the mth randomized tree, the 66807 samples are randomly divided into a training set S_m^T and a pruning set S_m^P of equal size. The mth tree is then trained on S_m^T without any depth or leaf size limitation and subsequently pruned based on S_m^P.

Figure 7: Visualization of the artificially labeled samples for MRC, BRC, and NRC. The unknown areas are colored white; for these areas, there is insufficient evidence for experts to determine definitely which kind of formation they are.

To determine the hyper-parameter ntry, we examine every possible value of ntry ranging from 1 to 16. The performance of the models is evaluated by 50 rounds of bootstrap [69] (a better way than cross-validation) and represented by the mean accuracy. The evaluated accuracies for different values of ntry are visualized in Fig. 8: the accuracy first increases and then decreases. The initial increase is mainly due to the strengthening of feature selection; after the maximum accuracy is achieved, the accuracy decreases gradually as the diversity between trees is reduced. Since the accuracy reaches its peak of 0.853 at ntry = 5, we use the hyper-parameter setting ntree = 500, ntry = 5 for the training of the final PRF model (a tuning sketch follows below).
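The ntry scan under this protocol could be sketched as follows, reusing the illustrative prf_fit and prf_predict helpers from Section 2.3; the bootstrap evaluation itself is plain base R.

```r
# Score one ntry setting by the mean out-of-bag accuracy over
# 50 bootstrap repetitions.
score_ntry <- function(X, y, ntry, reps = 50) {
  mean(replicate(reps, {
    idx <- sample(nrow(X), replace = TRUE)         # bootstrap resample
    oob <- setdiff(seq_len(nrow(X)), unique(idx))  # held-out samples
    fit <- prf_fit(X[idx, ], y[idx], ntree = 500, ntry = ntry)
    pred <- sapply(oob, function(i) prf_predict(fit, X[i, ]))
    mean(pred == y[oob])
  }))
}

acc <- sapply(1:16, function(k) score_ntry(X, y, k))
best_ntry <- which.max(acc)  # peaks at ntry = 5 in the present case
```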
Figure 8: Accuracy for different values of ntry; the accuracy reaches its maximum of 0.853 when ntry equals 5.

The trained model is applied to the whole area for prediction. Instead of predicting the class of each trace directly, we predict the conditional probabilities of MRC, BRC, and NRC for a fuzzy characterization. The predicted result is visualized by the dominant probability with different colors in Fig. 9. Through the differences in probabilities, the channels of meandering rivers and braided rivers are easy to distinguish. In addition to the labeled area in Fig. 7, the trained classifier predicts the channel distribution of the unlabeled areas effectively. Meanwhile, the predicted result is consistent with the pattern of channel distribution, which supports the reasonableness of the prediction. Compared with the slice image in Fig. 4, the visibility and clarity of Fig. 9 are highly improved. Based on the prediction result, the distribution of channel sand-bodies can be sketched out easily, which achieves our desired goal.

Figure 9: Visualization of the prediction result. The image is colored by the predominant probability of MRC, BRC, and NRC, and the distribution of channel sand-bodies can easily be sketched out based on the differences in color.

4. Comparison and Discussion

To highlight the superiority of the proposed PRF algorithm, in this section six algorithms, including LSVM (Linear Support Vector Machine), KSVM (Kernel Support Vector Machine), NNET (Neural Network), DT (Decision Tree), RF (Random Forest), and PRF (Pruning Random Forest), are compared from the perspectives of predictive ability, robustness, and attribute contributions. The experiments are executed in the R language environment, and the packages nnet, kernlab, rpart, and ranger are used to implement the competing algorithms (illustrative fitting calls are sketched below).
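As an illustration, the competing models with the settings of Table 3 could be fitted roughly as below; the data frame train with a factor column Type is an assumption, and the exact arguments should be checked against each package's documentation.

```r
library(nnet); library(kernlab); library(rpart); library(ranger)

lsvm <- ksvm(Type ~ ., data = train, kernel = "vanilladot", C = 10)
ksvm_fit <- ksvm(Type ~ ., data = train, kernel = "rbfdot",
                 kpar = list(sigma = 3), C = 5)
nn <- nnet(Type ~ ., data = train, size = 6, decay = 1e-4)
dt <- rpart(Type ~ ., data = train,
            control = rpart.control(minbucket = 50, maxdepth = 30, cp = 0))
rf <- ranger(Type ~ ., data = train, num.trees = 500, mtry = 12,
             min.node.size = 20)
```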

4.1. Comparison of the predictive ability

As the matter of most concern, the predictive ability is compared first. For each algorithm, the optimized hyper-parameter setting and the corresponding accuracy are listed in Table 3. It is easy to see that:

1. The accuracy of the LSVM model is much lower than the others. This phenomenon indicates that the classification problem in the present case is linearly inseparable.
2. For more sophisticated nonlinear algorithms like KSVM, NNET, and DT, the performance improves significantly relative to LSVM. These improvements are competitive with one another, while the accuracy of the DT model is slightly higher.
3. Due to the benefit of ensemble learning, the predictive accuracy is further improved to a higher level by RF and PRF. Meanwhile, the PRF model is more accurate than RF.

4. With the highest accuracy of 85.3%, PRF outperforms all its competitors in the present case.

Table 3: Hyper-parameter setting and predictive accuracy of each model

Algorithm   Hyper-parameter                                                     Bootstrap Accuracy
LSVM        linear kernel, C = 10.0                                             73.4%
KSVM        radial basis kernel with sigma = 3.0, C = 5.0                       80.7%
NNET        3-layer network with 6 nodes in the hidden layer, decay = 0.0001    79.6%
DT          minleave = 50, maxdepth = 30, without any pruning                   80.9%
RF          ntree = 500, ntry = 12, minleave = 20, no limitation on maxdepth    82.1%
PRF         ntree = 500, ntry = 5                                               85.3%

Furthermore, in order to evaluate the predictive ability more intuitively, the prediction results for the small area in the lower-right corner are visualized for comparison in Fig. 10. It is obvious that the LSVM result lacks the ability to distinguish the formations of meandering and braided river channels, whereas the remaining models are able to distinguish them more or less. Besides, even though their accuracies are competitive, the result of DT is more acceptable than those of KSVM and NNET, since the shapes of the braided river channels are already roughly revealed. This may be attributed to the fact that the hypothesis space of DT is more suitable for the classification problem in the present case. The result of DT is improved by RF and PRF, and in the end, with the clearest result, PRF outperforms all the other algorithms, which further confirms its advantage in predictive ability.
Figure 10: The prediction results of the six algorithms for the lower-right corner.

4.2. Comparison of the algorithm robustness

In practice, error in the data is a factor that cannot be ignored in machine learning. Learning with erroneous data requires the employed algorithm to be highly robust: even though there are unavoidable errors, the algorithm can still learn the patterns reflected by the population. For the present case, there are two kinds of data error:

1. Input noise: Noise widely exists in seismic data due to signal interference and accuracy limitations in measurement. Based on the seismic signal with noise, the computed seismic attributes are inevitably mixed with errors. In order to simulate the situation under different noise intensities, Gaussian noise of different intensity (measured as a multiple of the standard deviation of the seismic wave amplitude) is added to the seismic waves for the experiments. The seismic attributes are then recomputed based on the noisy seismic signal to construct a new sample set for modeling and evaluation.
2. Mislabeling: In the present case, samples are labeled artificially to obtain sufficient samples for training, and it is inevitable that some instances are mislabeled. For example, samples belonging to NRC near a channel may be mislabeled as BRC or MRC. To simulate different degrees of mislabeling, we shuffle the labels of samples in different proportions (shuffling 50% of the samples may cause 20%-30% of the samples to be mislabeled) to create controllable mislabeling (a simulation sketch follows this list).
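A minimal base-R sketch of the two corruption protocols; add_noise and shuffle_labels are illustrative names, wave stands for one seismic trace, and labels for the factor of sample classes.

```r
# Additive Gaussian noise, with intensity expressed as a multiple
# of the standard deviation of the seismic amplitude.
add_noise <- function(wave, intensity) {
  wave + rnorm(length(wave), sd = intensity * sd(wave))
}

# Controlled mislabeling: shuffle the labels of a given proportion
# of randomly chosen samples (shuffling 50% mislabels roughly 20-30%).
shuffle_labels <- function(labels, ratio) {
  idx <- sample(length(labels), round(ratio * length(labels)))
  labels[idx] <- sample(labels[idx])
  labels
}
```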

U
The accuracy of each algorithm with different additional noise intensities and label shuffled
ratios is visualized in Fig. 11 (a) and (b) separately. For each algorithm, in order to eliminate the
influence of randomness, the accuracy evaluation is repeated for 10 times and the mean accuracy
AN
is accepted as the final result.
M
D
TE

Figure 11: Accuracy curves of algorithms under different additional noise intensity and label shuffled ratio
In general, data error increases the risk of overfitting, which in turn leads to performance degradation. This understanding is confirmed by the observation in Fig. 11 (a) and (b) that the accuracy of all algorithms decreases as the noise intensity and shuffling ratio increase. However, for different algorithms, the performance declines differ in speed and degree:

1. Since the hypothesis space of LSVM is simple and not prone to overfitting, the influence of noise and mislabeling is relatively smaller than for the other algorithms.
2. For KSVM, NNET, and DT, when the noise intensity reaches a high level (≥ 0.3), no significant accuracy advantage over LSVM remains. However, RF and PRF appear more robust, since their performance drops are relatively smaller.
3. Mislabeling seems to have a greater impact. At shuffling ratios greater than 0.3, the accuracies of KSVM, NNET, DT, and RF are even lower than that of LSVM.
4. At least for the present case, PRF seems to be less affected by noise and mislabeling, and keeps outperforming its competitors even at very high noise intensities or shuffling ratios.
Based on the evidence that PRF outperforms the other algorithms at all levels of noise and mislabeling, the advantage of PRF in robustness is confirmed. An intuitive explanation for this phenomenon is that, even though the base randomized trees may overfit the training set, the overfitting is corrected by the subsequent post-pruning operation based on the additional pruning set. As mentioned above, due to the inevitable problems of noise and mislabeling in the samples, machine learning assisted seismic interpretation puts high demands on the robustness of algorithms. Compared with conventional algorithms such as support vector machines and neural networks, the highly robust pruning random forest is more suitable for machine learning assisted seismic interpretation.

4.3. Comparison of the feature selection ability

In machine learning assisted seismic interpretation, we often wish to use as many seismic attributes as possible to enrich the representation space of patterns. However, increasing the number of attributes also brings a lot of irrelevant information, so the employed algorithm must have a strong embedded feature selection ability to reduce the influence of irrelevant information. In this subsection, we illustrate the strong feature selection ability of PRF compared to KSVM, NNET, and RF from the perspective of attribute contributions.

The contribution of an attribute refers to the degree of its influence on the model prediction. In our experiments, the contribution of each attribute is evaluated as follows. For a sample set S with N samples, a new sample set S_p is constructed by randomly shuffling the values of the pth attribute (the values of the other attributes remain the same). Substituting S and S_p into the model F(x), the contribution score of the pth attribute for model F(x) is defined as:

C_p(F, S) = (1/N) Σ_{i=1}^{N} I(F(x_i) ≠ F(x_{p,i}))   (7)

In Eq. (7), x_i and x_{p,i} represent the ith sample in S and S_p respectively. Obviously, C_p(F, S) is the percentage of predictions that are changed by the random shuffling of the pth attribute. If the value of the pth attribute is very important for F(x), many samples will get different results and the corresponding C_p(F, S) will be large. On the contrary, if the pth attribute is useless for decision making, there is no significant difference between F(x_i) and F(x_{p,i}), so C_p(F, S) ≈ 0. In practice, to eliminate the influence of randomness in shuffling, the evaluation of C_p(F, S) is repeated 10 times, and the average of the computed scores is accepted as the final contribution score of the pth attribute (a sketch follows below).
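A base-R sketch of the contribution score of Eq. (7); model_predict stands for any fitted model's prediction function (e.g. a wrapper around predict()) and is the only assumed interface.

```r
# Contribution score C_p(F, S) of Eq. (7): the fraction of
# predictions changed by shuffling attribute p, averaged over
# `reps` shuffles to damp the randomness.
contribution <- function(model_predict, X, p, reps = 10) {
  base_pred <- model_predict(X)
  mean(replicate(reps, {
    Xp <- X
    Xp[, p] <- sample(Xp[, p])          # shuffle the pth attribute
    mean(model_predict(Xp) != base_pred)
  }))
}

# Score every attribute of the input matrix X.
scores <- sapply(seq_len(ncol(X)), contribution,
                 model_predict = model_predict, X = X)
```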
We compute the contribution scores of each attribute for the KSVM, NNET, RF, and PRF models obtained in Section 4.1 and visualize them as bar charts in Fig. 12. It can be observed that the scores of the different models differ significantly in order and distribution. However, there are some common characteristics. For example, attributes such as Seismic, ToughAmp, and InsAmp always top the list, which indicates that they are important attributes for all the models. Conversely, attributes like EffectBand, InsFreq, and Symmetry always lie at the bottom, which indicates that they are irrelevant attributes. Besides, attributes with moderate scores, such as ArcLength, CosPhase, InsBand, and InsQuality, can be viewed as complementary attributes: they may provide additional information for the classification. However, irrelevant information still exists in them, which indicates that they cannot be used as the main basis for decision making.

Figure 12: Bar charts of the attribute contribution scores for the trained KSVM, NNET, RF, and PRF models.
For a model trained by a specified algorithm, the distribution of its contributions reflects the feature selection ability of the algorithm: the more concentrated the contributions are, the more effective the feature selection made by the algorithm. As illustrated in Fig. 12, the attribute contributions of RF and PRF are more concentrated than those of KSVM and NNET. Pairwise comparison shows that the degree of concentration is more prominent in PRF: the scores of Seismic, ToughAmp, and InsAmp are significantly higher than those of the following complementary attributes, while for irrelevant attributes like InsFreq, EffectBand, and Symmetry, the corresponding scores are much smaller than in NNET, KSVM, and RF. These observations illustrate that, with a modest complement of other attributes, the PRF model makes predictions mainly based on the first three attributes, whereas the influence of irrelevant attributes is reduced to a minimum. In other words, PRF has a stronger feature selection ability than the other algorithms.
We argue that the strong feature selection ability of PRF also stems from the additional pruning operations. During the training of the base trees, since there are enough samples for inference, the partitions of nodes close to the tree root are optimized effectively, which automatically rejects splits on irrelevant attributes. For nodes deeper in the tree, some occasional rules related to the irrelevant attributes may appear, since only a few samples are available for optimization. These rules are overfitted by chance and cannot be applied for further prediction. As the post-pruning operations remove these overfitted nodes, the influence of the irrelevant attributes is eliminated effectively, and as a consequence, the embedded feature selection ability of PRF is strengthened. This phenomenon also elucidates why the predictive ability of PRF is stronger than the others: it is less affected by irrelevant information.
Strong feature selection ability is very important in machine learning assisted seismic interpretation, since the input attributes tend to have different sensitivities for modeling. One may argue that the problem of irrelevant information can be solved by explicit attribute selection [17, 70, 71] before modeling. However, this is very time-consuming work. More importantly, since most of the attributes (especially the supplementary attributes) lie between useful and useless, it is difficult to decide with "hard" explicit feature selection whether to select these attributes or not. In contrast, using algorithms with strong feature selection ability performs a "soft" selection inside the model, which fully utilizes the effective information in the attributes while adaptively eliminating the influence of irrelevant information. Learning with such algorithms allows us to incorporate as many attributes as possible into the interpretation model without presupposing their sensitivities, which is obviously beneficial to machine learning assisted seismic interpretation.

5. Conclusion

In this article, we propose the pruning random forest algorithm for machine learning assisted seismic interpretation. Demonstrated with a real-world channel identification task, a systematic analysis in terms of predictive performance, robustness, and feature selection ability is presented. The research conclusions are summed up as follows:

1. Combined with effective post-pruning operations, the proposed algorithm greatly simplifies the tuning process of the random forest algorithm, which makes it much easier to apply to machine learning assisted seismic interpretation.
2. Through well-designed experiments, the advantages of pruning random forest in predictive performance, robustness, and feature selection are revealed. These properties make it more suitable for the machine learning tasks in seismic interpretation, which often face problems of data error and irrelevant information.
3. By comparison with support vector machines, neural networks, and decision trees, the advantages of the proposed algorithm in seismic interpretation applications are confirmed, which provides a reliable alternative for further machine learning assisted seismic interpretation.

However, there are still drawbacks in the application of pruning random forest. Since the construction of the decision trees in PRF requires enough samples both for training and for pruning, samples based on well-points alone are far from sufficient. Thus, in the present case, more samples were labeled with expert assistance, which brings subjective understandings into the constructed sample set. When the seismic responses of geological bodies are not sufficiently significant, this subjective cognitive intervention will have an impact on the objectivity of the application, which needs to be handled carefully.

References
[1] M. Nikravesh, F. Aminzadeh, Past, present and future intelligent reservoir characterization trends, Journal of
Petroleum Science and Engineering 31 (2-4) (2001) 67–79.
[2] T. Zhao, V. Jayaram, A. Roy, K. J. Marfurt, A comparison of classification techniques for seismic facies recognition,
Interpretation 3 (4) (2015) SAE29–SAE58.


[3] M. M. Poulton, Neural networks as an intelligence amplification tool: A review of applications, Geophysics 67 (3)
(2002) 979–993.

[4] G. Wang, T. R. Carr, Organic-rich marcellus shale lithofacies modeling and distribution pattern analysis in the
appalachian basin organic-rich shale lithofacies modeling, appalachian basin, AAPG Bulletin 97 (12) (2013) 2173–
2205.
[5] G. Wang, T. R. Carr, Marcellus shale lithofacies prediction by multiclass neural network classification in the ap-
palachian basin, Mathematical Geosciences 44 (8) (2012) 975–1004.
[6] T. Zhao, V. Jayaram, K. J. Marfurt, H. Zhou, Lithofacies classification in barnett shale using proximal support
vector machines, in: SEG Technical Program Expanded Abstracts 2014, Society of Exploration Geophysicists,
2014, pp. 1491–1495.
[7] G. Wang, T. R. Carr, Y. Ju, C. Li, Identifying organic-rich marcellus shale lithofacies by support vector machine
classifier in the appalachian basin, Computers & Geosciences 64 (2014) 52–60.

[8] W.-P. Luo, H.-Q. Li, N. Shi, Semi-supervised least squares support vector machine algorithm: application to off-
shore oil reservoir, Applied Geophysics 13 (2) (2016) 406–415.
[9] M. Nikravesh, Soft computing-based computational intelligent for reservoir characterization, Expert Systems with
Applications 26 (1) (2004) 19–38.
[10] H. Di, G. AlRegib, Seismic multi-attribute classification for salt boundary detection - a comparison, in: Proceedings
of the 79th EAGE Conference and Exhibition 2017, 2017.

[11] W. Lewis, D. Vigh, Deep learning prior models from seismic images for full-waveform inversion, in: SEG Techni-
cal Program Expanded Abstracts 2017, Society of Exploration Geophysicists, 2017, pp. 1512–1517.
[12] G. Wang, T. R. Carr, Methodology of organic-rich shale lithofacies identification and prediction: A case study from
marcellus shale in the appalachian basin, Computers & Geosciences 49 (2012) 151–163.

[13] S. Bhattacharya, T. R. Carr, M. Pal, Comparison of supervised and unsupervised approaches for mudstone litho-
facies classification: Case studies from the bakken and mahantango-marcellus shale, usa, Journal of Natural Gas
Science and Engineering 33 (2016) 1119–1133.
[14] P. Meldahl, R. Heggland, B. Bril, P. D. Groot, Identifying fault and gas chimneys using multi-attributes and neural
networks, The Leading Edge 20 (5) (2001) 474–482.

[15] M. Araya-Polo, T. Dahlke, C. Frogner, C. Zhang, T. Poggio, D. Hohl, Automated fault detection without seismic
processing, The Leading Edge 36 (3) (2017) 208–214.
[16] H. Khoshdel, M. A. Riahi, Multi attribute transform and neural network in porosity estimation of an offshore oil
field - a case study, Journal of Petroleum Science and Engineering 78 (3) (2011) 740–747.
[17] U. Iturrarn-Viveros, J. O. Parra, Artificial neural networks applied to estimate permeability, porosity and intrinsic

attenuation using seismic attributes and well-log data, Journal of Applied Geophysics 107 (107) (2014) 45–54.
[18] S. R. Naimi, S. R. Shadizadeh, M. A. Riahi, M. Mirzakhanian, Estimation of reservoir porosity and water saturation
based on seismic attributes using support vector regression approach, Journal of Applied Geophysics 107 (4) (2014)
93–101.
[19] A. K. Verma, S. Chaki, A. Routray, W. K. Mohanty, M. Jenamani, Quantification of sand fraction from seismic
attributes using neuro-fuzzy approach, Journal of Applied Geophysics 111 (2014) 141–155.
[20] S. Chaki, A. Routray, W. K. Mohanty, A novel preprocessing scheme to improve the prediction of sand fraction
from seismic attributes using neural networks, IEEE Journal of Selected Topics in Applied Earth Observations and
Remote Sensing 8 (4) (2015) 1808–1820.


[21] L. Breiman, Random forests, Machine Learning 45 (1) (2001) 5–32.
[22] L. Breiman, J. H. Friedman, R. Olshen, C. Stone, Classification and regression trees.
[23] L. Breiman, Bagging predictors, Machine Learning 24 (2) (1996) 123–140.
D

[24] T. K. Ho, The random subspace method for constructing decision forests, IEEE Transactions on Pattern Analysis
and Machine Intelligence 20 (8) (1998) 832–844.
[25] M. Pal, Random forest classifier for remote sensing classification, International Journal of Remote Sensing 26 (1)
TE

(2005) 217–222.
[26] P. O. Gislason, J. A. Benediktsson, J. R. Sveinsson, Random forests for land cover classification, Pattern Recogni-
tion Letters 27 (4) (2006) 294–300.
[27] D. R. Cutler, T. C. Edwards Jr, K. H. Beard, A. Cutler, K. T. Hess, J. Gibson, J. J. Lawler, Random forests for
classification in ecology, Ecology 88 (11) (2007) 2783–2792.
[28] X. Chen, H. Ishwaran, Random forests for genomic data analysis, Genomics 99 (6) (2012) 323–329.
[29] G. Fanelli, M. Dantone, J. Gall, A. Fossati, L. Van Gool, Random forests for real time 3d face analysis, International
Journal of Computer Vision 101 (3) (2013) 437–458.
[30] K. M. Martin, W. T. Wood, J. J. Becker, A global prediction of seafloor sediment porosity using machine learning,
Geophysical Research Letters 42 (24) (2015) 10–640.
[31] K. Ueki, H. Hino, T. Kuwatani, Geochemical discrimination and characteristics of magmatic tectonic settings: A
machine-learning-based approach, Geochemistry, Geophysics, Geosystems 19 (4) (2018) 1327–1347.
[32] E. J. M. Carranza, A. G. Laborte, Random forest predictive modeling of mineral prospectivity with small number
of prospects and data with missing values in abra (philippines), Computers and Geosciences 74 (2015) 60–70.
[33] V. Rodriguez-Galiano, M. Chica-Olmo, M. Chica-Rivas, Predictive modelling of gold potential with the integration
of multisource information based on random forest: a case study on the rodalquilar area, southern spain, Interna-
tional Journal of Geographical Information Science 28 (7) (2014) 1336–1354.
[34] V. Rodriguez-Galiano, M. Sanchez-Castillo, M. Chica-Olmo, M. Chica-Rivas, Machine learning predictive models
for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector
machines, Ore Geology Reviews 71 (2015) 804–818.
[35] E. J. M. Carranza, A. G. Laborte, Data-driven predictive mapping of gold prospectivity, baguio district, philippines:
Application of random forests algorithm, Ore Geology Reviews 71 (2015) 777–787.
[36] G. McKay, J. Harris, Comparison of the data-driven random forests model and a knowledge-driven method for
mineral prospectivity mapping: a case study for gold deposits around the huritz group and nueltin suite, nunavut,

canada, Natural Resources Research 25 (2) (2016) 125–143.


[37] M. J. Cracknell, A. M. Reading, Geological mapping using remote sensing data: A comparison of five machine
learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit
spatial information, Computers and Geosciences 63 (2014) 22–33.
[38] J. Harris, E. Grunsky, Predictive lithological mapping of canada’s north using random forest classification applied
to geophysical and geochemical data, Computers and Geosciences 80 (2015) 9–25.

[39] J. He, J. Harris, M. Sawada, P. Behnia, A comparison of classification algorithms using landsat-7 and landsat-8
data for mapping lithology in canada's arctic, International Journal of Remote Sensing 36 (8) (2015) 2252–2276.
[40] S. Kuhn, M. J. Cracknell, A. M. Reading, Lithologic mapping using random forests applied to geophysical and
remote-sensing data: A demonstration study from the eastern goldfields of australia, Geophysics 83 (4) (2018)

B183–B193.
[41] R. Grimm, T. Behrens, M. Märker, H. Elsenbeer, Soil organic carbon concentrations and stocks on barro colorado
island - digital soil mapping using random forests analysis, Geoderma 146 (1-2) (2008) 102–113.
[42] M. Wiesmeier, F. Barthold, B. Blank, I. Kögel-Knabner, Digital mapping of soil organic matter stocks using random
forest modeling in a semi-arid steppe ecosystem, Plant and Soil 340 (1-2) (2011) 7–24.

SC
[43] B. Heung, C. E. Bulmer, M. G. Schmidt, Predictive soil parent material mapping at a regional-scale: A random
forest approach, Geoderma 214 (2014) 141–154.
[44] M. Ließ, B. Glaser, B. Huwe, Uncertainty in the spatial prediction of soil texture: comparison of regression tree
and random forest models, Geoderma 170 (2012) 70–79.
[45] S. A. Naghibi, H. R. Pourghasemi, A comparative assessment between three machine learning models and their

U
performance comparison by bivariate and multivariate statistical methods in groundwater potential mapping, Water
Resources Management 29 (14) (2015) 5217–5236.
[46] O. Rahmati, H. R. Pourghasemi, A. M. Melesse, Application of gis-based data driven random forest and maximum
AN
entropy models for groundwater potential mapping: a case study at mehran region, iran, Catena 137 (2016) 360–
372.
[47] Y. Xie, C. Zhu, W. Zhou, Z. Li, X. Liu, M. Tu, Evaluation of machine learning methods for formation lithol-
ogy identification: A comparison of tuning processes and model performances, Journal of Petroleum Science and
Engineering 160 (2018) 182–193.
M

[48] S. Bhattacharya, S. Mishra, Applications of machine learning for facies and fracture prediction using bayesian
network theory and random forest: Case studies from the appalachian basin, usa, Journal of Petroleum Science and
Engineering.
[49] S. Baziar, H. B. Shahripour, M. Tadayoni, M. Nabi-Bidhendi, Prediction of water saturation in a tight gas sandstone
D

reservoir by using four intelligent methods: a comparative study, Neural Computing and Applications (2016) 1–15.
[50] F. Anifowose, J. Labadin, A. Abdulraheem, Improving the prediction of petroleum reservoir characterization with
a stacked generalization ensemble model of support vector machines, Applied Soft Computing 26 (2015) 483–496.
TE

[51] F. Anifowose, J. Labadin, A. Abdulraheem, Predicting petroleum reservoir properties from downhole sensor data
using an ensemble model of neural networks, in: Proceedings of Workshop on Machine Learning for Sensory Data
Analysis 2013, ACM, 2013, p. 27.
[52] V. L. Hauge, G. H. Hermansen, Machine learning methods for sweet spot detection: a case study, in: Geostatistics
Valencia 2016, Springer, 2017, pp. 573–588.
EP

[53] S. Bhattacharya, S. Mishra, Applications of machine learning for facies and fracture prediction using bayesian
network theory and random forest: Case studies from the appalachian basin, usa, Journal of Petroleum Science and
Engineering 170 (2018) 1005–1017.
[54] G. V. Kass, An exploratory technique for investigating large quantities of categorical data, Applied Statistics (1980)
119–127.
C

[55] J. R. Quinlan, Induction of decision trees, Machine Learning 1 (1) (1986) 81–106.
[56] R. J. Quinlan, C4. 5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[57] T. Hastie, R. Tibshirani, J. Friedman, The Element of Statistical Learning, Springer, 2009.
AC

[58] Z. H. Zhou, Ensemble Methods: Foundations and Algorithms, Taylor & Francis, 2012.
[59] N. Ueda, R. Nakano, Generalization error of ensemble estimators, in: Proceedings of the IEEE International Con-
ference on Neural Networks 1996, 1996, pp. 90–95 vol.1.
[60] G. Brown, J. L. Wyatt, P. TiÅo, Managing diversity in regression ensembles, Journal of Machine Learning Research
6 (1) (2005) 1621–1650.
[61] S. Bernard, L. Heutte, S. Adam, bastien, Forest-rk: A new random forest induction method, Lecture Notes in
Computer Science 5227 (2008) 430–437.
[62] P. Geurts, D. Ernst, L. Wehenkel, Extremely randomized trees, Machine Learning 63 (1) (2006) 3–42.
[63] A. C. Lorena, A. C. de Carvalho, Evolutionary tuning of svm parameter values in multiclass problems, Neurocom-
puting 71 (1618) (2008) 3326–3334.
[64] J. Bergstra, Y. Bengio, Random search for hyper-parameter optimization, Journal of Machine Learning Research
20
ACCEPTED MANUSCRIPT

13 (1) (2012) 281–305.


[65] C. Thornton, F. Hutter, H. H. Hoos, K. Leytonbrown, Auto-weka: combined selection and hyperparameter opti-
mization of classification algorithms, Computer Science (2013) 847–855.
[66] M. Widess, How thin is a thin bed?, Geophysics 38 (6) (1973) 1176–1180.
[67] S. Chopra, K. J. Marfurt, Seismic attributes for prospect identification and reservoir characterization, Society of
Exploration Geophysicists and European Association of Geoscientists and Engineers, 2007.

PT
[68] A. E. Barnes, Handbook of poststack seismic attributes, Society of Exploration Geophysicists, 2016.
[69] A. K. Jain, R. C. Dubes, C.-C. Chen, Bootstrap techniques for error estimation, IEEE Transactions on Pattern
Analysis and Machine Intelligence (5) (1987) 628–633.
[70] A. Gholami, H. R. Ansari, Estimation of porosity from seismic attributes using a committee model with bat-inspired

RI
optimization algorithm, Journal of Petroleum Science and Engineering 152 (2017) 238–249.
[71] K. P. Dorrington, C. A. Link, Genetic-algorithm/neural-network approach to seismic attribute selection for well-log
prediction, Geophysics 69 (1) (2004) 212–221.

SC
Appendix A: Calculation Methods of Seismic Attributes

According to differences in their calculation, we divide the 16 attributes involved into three categories: 1) instantaneous volume attributes; 2) interval statistic attributes; 3) waveform morphology attributes. The calculation methods are introduced below by category.
1. Instantaneous Volume Attributes
Attributes in this category are first computed as volumes; a slice operation is then performed to obtain attribute slices along the provided horizon of the target layer. The basis of the instantaneous attributes is the Hilbert transform. Denoting the post-stack waveform of one trace as x(t) and its Hilbert transform as y(t), the instantaneous attributes can be computed as follows:
$$A(t) = \sqrt{x^2(t) + y^2(t)}$$

$$\theta(t) = \arctan\!\left(\frac{y(t)}{x(t)}\right)$$

$$F(t) = \frac{1}{2\pi} \frac{\mathrm{d}\theta(t)}{\mathrm{d}t} \approx \frac{\theta(t+\Delta t) - \theta(t)}{2\pi \cdot \Delta t}$$

$$B(t) = \frac{1}{2\pi} \left| \frac{\mathrm{d}}{\mathrm{d}t} \lg A(t) \right| \approx \frac{\left| \lg A(t+\Delta t) - \lg A(t) \right|}{2\pi \cdot \Delta t}$$

$$Q(t) = -\frac{1}{2} \cdot \frac{\theta(t+\Delta t) - \theta(t)}{\left| \lg A(t+\Delta t) - \lg A(t) \right|}$$
In the above formulas, A(t) is the instantaneous amplitude (InsAmp), θ(t) is the instantaneous phase (InsPhase), F(t) is the instantaneous frequency (InsFreq), B(t) is the instantaneous bandwidth (InsBand), and Q(t) is the instantaneous quality factor (InsQuality). In practice, the sine and cosine values (SinPhase and CosPhase) of θ(t) are used to represent the instantaneous phase, since θ(t) itself wraps between −π and π. Based on the above definitions, the attribute volumes are computed first, and slices are then extracted along the provided horizon with a window size of 15 ms.
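As a concrete illustration, the following minimal sketch computes these instantaneous attributes for a single trace with NumPy and SciPy; the function name, variable names, and the small epsilon guards are our own illustrative choices rather than part of the original workflow:

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_attributes(x, dt):
    # x: 1-D array of post-stack amplitudes for one trace; dt: sample interval in seconds
    z = hilbert(x)                                # analytic signal x(t) + i*y(t)
    amp = np.abs(z)                               # InsAmp, A(t)
    phase = np.angle(z)                           # InsPhase, theta(t), wrapped to (-pi, pi]
    sin_p, cos_p = np.sin(phase), np.cos(phase)   # SinPhase and CosPhase
    dphase = np.diff(np.unwrap(phase))            # unwrapped phase increments
    dlog_amp = np.diff(np.log10(amp + 1e-12))     # increments of lg A(t), guarded against log(0)
    freq = dphase / (2.0 * np.pi * dt)            # InsFreq, F(t)
    band = np.abs(dlog_amp) / (2.0 * np.pi * dt)  # InsBand, B(t)
    qual = -0.5 * dphase / (np.abs(dlog_amp) + 1e-12)  # InsQuality, Q(t)
    return amp, sin_p, cos_p, freq, band, qual
```

Applying such a function trace by trace yields the attribute volumes, from which horizon slices can then be extracted.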

2. Interval Statistic Attributes


Attributes in this category are statistics of the wave records within a specified time interval and can therefore be computed as slices directly. Given the window size w and the picked horizon of the target formation, these attributes are computed from different statistics over the interval [tu, tl], where tu = th − w/2, tl = th + w/2, and th is the horizon time of the trace. Specifically:
$$A_{rms}(w, t_h) = \sqrt{\frac{1}{w} \sum_{t \in [t_u,\, t_l]} x^2(t)}$$

$$L_{arc}(w, t_h) = \sum_{t \in [t_u+\Delta t,\, t_l]} \left| x(t) - x(t-\Delta t) \right|$$

$$\sigma^2(w, t_h) = \frac{1}{w} \sum_{t \in [t_u,\, t_l]} \left( x(t) - E(x) \right)^2$$

where Arms represents RmsAmp, Larc represents ArcLength, and σ2 represents Variance. The calculation of EffectBand (short for Effective Bandwidth) is more sophisticated: it is an empirical estimate of bandwidth derived from the autocorrelation of the waveform in the specified interval. Based on the autocorrelation φ(t) at different lags, the effective bandwidth can be computed by:
$$B_e(w, t_h) = \frac{1}{2\Delta t} \cdot \frac{\varphi(t_h)}{\sum_{t \in [t_u,\, t_l]} |\varphi(t)|}$$


In the present case, the window size is set to 15 ms, since the average wavelength (period) of the seismic responses is about 30 ms.
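A minimal sketch of these interval statistics for one trace is given below (names are ours; note that for EffectBand we place the zero-lag autocorrelation in the numerator, which is our reading of the definition above rather than a detail confirmed by the manuscript):

```python
import numpy as np

def interval_statistics(x, dt, t_h, w=0.015):
    # x: 1-D trace; dt: sample interval (s); t_h: horizon time (s); w: window length (s)
    i_u = int(round((t_h - w / 2.0) / dt))        # sample index of t_u
    i_l = int(round((t_h + w / 2.0) / dt))        # sample index of t_l
    seg = x[i_u:i_l + 1]                          # wave records in [t_u, t_l]
    rms_amp = np.sqrt(np.mean(seg ** 2))          # RmsAmp
    arc_len = np.sum(np.abs(np.diff(seg)))        # ArcLength
    variance = np.mean((seg - seg.mean()) ** 2)   # Variance
    # EffectBand: autocorrelation of the windowed waveform at non-negative lags
    acorr = np.correlate(seg, seg, mode="full")[len(seg) - 1:]
    eff_band = acorr[0] / (2.0 * dt * np.sum(np.abs(acorr)))
    return rms_amp, arc_len, variance, eff_band
```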
3. Waveform Morphology Attributes


For the target formation F33, the presence of channel sand-bodies causes trough anomalies in the seismic responses, so the morphological characteristics of trough waveforms are believed to provide effective information. To compute these attributes, the wave trough nearest to the horizon in each trace is identified (its time position is denoted t0), and the upper and lower zero-crossings tu and tl are then located to define the time range of the trough. The trough search is restricted to the interval [th − 10 ms, th + 10 ms] to avoid drifting away from the horizon. When no trough is found in this interval, the defaults t0 = th, tu = th − 10 ms and tl = th + 10 ms are accepted.
Figure 13: Schematic diagrams for the calculation of morphological attributes.

Fig. 13 provides schematic diagrams for the calculation of the morphological attributes:
1. TroughAmp is the amplitude value corresponding to t0. The signed value is retained, since it may be positive in some cases.
2. TroughArea is the area of the response wave in the interval [tu, tl], which can be computed by:

$$S_{trough}(t_u, t_l) = \sum_{t \in [t_u+\Delta t,\, t_l]} \Delta t \cdot \frac{x(t) + x(t+\Delta t)}{2}$$
3. TroughCurvity is the mean absolute curvature of the waveform in the interval [tu, tl], which can be computed by:

$$C_{trough}(t_u, t_l) = \frac{1}{t_l - t_u - 2} \sum_{t \in [t_u+\Delta t,\, t_l-\Delta t]} \left| \big(x(t+\Delta t) - x(t)\big) - \big(x(t) - x(t-\Delta t)\big) \right|$$
4. TroughFreq is the approximate frequency of the waveform. Since the half-wavelength is identified by λ/2 = tl − tu, an approximation of the frequency can be estimated by:

$$F_{trough}(t_u, t_l) = 2\pi/\lambda$$
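A sketch combining the four trough attributes is shown below, assuming the trough position t0 and the zero-crossings tu and tl have already been converted to sample indices; note that TroughCurvity is normalized here by the number of interior samples, a simplification of the 1/(tl − tu − 2) factor above:

```python
import numpy as np

def trough_morphology(x, dt, i0, i_u, i_l):
    # x: 1-D trace; dt: sample interval (s); i0, i_u, i_l: indices of t0, t_u, t_l
    trough_amp = x[i0]                                   # TroughAmp (signed value retained)
    seg = x[i_u:i_l + 1]                                 # waveform over [t_u, t_l]
    trough_area = np.sum(dt * (seg[:-1] + seg[1:]) / 2)  # TroughArea (trapezoidal sum)
    trough_curvity = np.mean(np.abs(np.diff(seg, n=2)))  # TroughCurvity (mean |2nd difference|)
    wavelength = 2.0 * (i_l - i_u) * dt                  # lambda from lambda/2 = t_l - t_u
    trough_freq = 2.0 * np.pi / wavelength               # TroughFreq, as defined above
    return trough_amp, trough_area, trough_curvity, trough_freq
```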

In addition, Symmetry measures the symmetry of the waveform. Its calculation differs slightly from the other morphological attributes because it is based on the wave records in a symmetric interval [t0 − 10 ms, t0 + 10 ms]. Taking t0 as the axis of reflection, a mirror waveform is obtained; in our study, Symmetry is defined as the linear correlation coefficient between the raw and mirrored waveforms.
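The Symmetry attribute can be sketched as follows, an illustrative implementation under the same indexing assumptions as above:

```python
import numpy as np

def symmetry(x, dt, i0, half=0.010):
    # x: 1-D trace; dt: sample interval (s); i0: sample index of the trough time t0
    k = int(round(half / dt))            # 10 ms expressed in samples
    win = x[i0 - k: i0 + k + 1]          # wave records in [t0 - 10 ms, t0 + 10 ms]
    mirror = win[::-1]                   # waveform mirrored about the axis t0
    # Symmetry: linear correlation coefficient between raw and mirrored waveforms
    return np.corrcoef(win, mirror)[0, 1]
```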
Highlights:

• The superiority of applying random forests for seismic interpretation is elucidated.

• The Pruning Random Forest (PRF) algorithm is proposed to simplify the hyper-parameter tuning of the ordinary random forest algorithm.

• The additional advantages of PRF in predictive performance, robustness and feature selection are revealed by well-designed experiments.

• Compared with support vector machine, neural networks, decision tree and the ordinary random forest, the advantages of pruning random forest in seismic interpretation are confirmed.
