Bayesian deep learning for seismic facies classification and its uncertainty estimation

Pradip Mukhopadhyay and Subhashis Mallick, Department of Geology and Geophysics, University of Wyoming
Downloaded 06/21/20 to 130.238.7.40. Redistribution subject to SEG license or copyright; see Terms of Use at http://library.seg.org/

SUMMARY

Probabilistic pixel-wise semantic segmentation using convolutional neural networks and Monte Carlo sampling with dropout during testing has gained popularity because of its ability to obtain uncertainty maps from deep learning models and to reduce model over-fitting when predicting semantic classes. In this work, we propose an encoder-decoder based Bayesian SegNet architecture for seismic facies classification and introduce the concept of predictive entropy to obtain uncertainty maps. By applying the method to real seismic data with salt and sediment structures, we observe high prediction uncertainty at facies boundaries, for data samples that are affected by processing and imaging artifacts, and in the zones where sediments are trapped within the salt bodies. Comparing the Monte Carlo dropout model prediction with the state-of-the-art SegNet architecture without dropout, we demonstrate the usability of the proposed Bayesian SegNet for seismic facies classification and uncertainty estimation.

INTRODUCTION

With the increased use of artificial intelligence (AI) systems in real-life scenarios like autonomous driving and medical diagnosis, convolutional neural networks (CNNs), a deep learning technique, are also gaining popularity in the oil and gas industry. CNN models designed for the image segmentation task, known as semantic segmentation, require an understanding of an image at the pixel level. This method has been used successfully for seismic facies classification, either on two-dimensional (2D) seismic profiles or by traversing all 2D profiles in three-dimensional (3D) data volumes, using both small data patches and encoder-decoder based architectures (Zhao, 2018; Badrinarayanan et al., 2017; Chen et al., 2018).

Quantifying uncertainty should be a natural component of any prediction system, because it is important to know the confidence with which we can trust our prediction for decision making. Seismic data pose their own challenges due to the limited number of available training samples, the presence of incoherent noise, and the noise resulting from processing and imaging artifacts. Unlike other computer vision images, where features are well defined within each class, seismic facies are often nonunique and are contaminated by samples from other classes because of geologic and tectonic complexities. It is therefore important to provide uncertainty maps along with pixel-wise model predictions, which an experienced seismic interpreter can use for decision making.

To measure the prediction uncertainty, Gal and Ghahramani (2016) introduced dropout as approximate Bayesian inference over the network's weights and showed that dropout can be used during the testing phase to impose a Bernoulli distribution over the convolution net filter weights to estimate model uncertainty in conjunction with model prediction. The dropout technique is also used to regularize CNNs to prevent model overfitting and co-adaptation of features. SegNet (Badrinarayanan et al., 2015) is a state-of-the-art, easy-to-implement, CNN-based semantic segmentation architecture that can be trained end-to-end in one step owing to its lower parameterization compared with heavily parameterized Fully Convolutional Networks (FCN) (Long et al., 2014) and Dilation Network (Yu and Koltun, 2016). Kendall et al. (2016) used the concept of Monte Carlo (MC) dropout in the SegNet architecture and called it Bayesian SegNet. They successfully demonstrated the practical use of this technique for model prediction and uncertainty estimation.

The classification prediction uncertainty can be described in terms of epistemic uncertainty, representing what the model does not know due to insufficient data, and aleatoric uncertainty, caused by noisy data measurements. The entropy of the predictive distribution, also known as predictive entropy, captures predictive uncertainty, which combines both epistemic and aleatoric uncertainties (Gal, 2016). We introduce this predictive entropy measure into the Bayesian SegNet architecture to measure model prediction uncertainty while classifying seismic facies. Although deep learning prediction uncertainty estimation using dropout has been successfully studied in computer vision applications, it has not yet been introduced or extensively studied in geoscience applications.

In this work, we use the encoder-decoder based Bayesian SegNet model to classify seismic facies from real seismic data. We present the classification results along with prediction uncertainty maps obtained by measuring predictive entropy, and demonstrate the possible sources of high uncertainty in our predictions. Finally, we conclude that modelling uncertainty using MC dropout also improves segmentation performance by minimizing model overfitting.

BAYESIAN CONVOLUTIONAL ENCODER-DECODER NEURAL NETWORK

The encoder-decoder based CNN consists of a sequence of non-linear processing layers or encoders and a

© 2019 SEG 10.1190/segam2019-3216870.1


SEG International Exposition and 89th Annual Meeting Page 2488
Seismic facies classification using Deep Neural Network

corresponding set of decoders followed by a pixel-wise classifier. In general, each encoder unit consists of one or more convolutional layers with batch normalization and a rectified linear unit (ReLU), followed by non-overlapping max-pooling and sub-sampling. Next, in the decoder process, the spatial dimensions are gradually recovered using deconvolution and up-sampling, where the max-pooling indices from the encoding sequence are utilized. In computer vision terminology this process is known as semantic segmentation and is widely used for visual scene identification.

Deep neural networks with model prediction and uncertainty quantification capabilities are commonly known as Bayesian neural networks. As performing inference in Bayesian neural networks is difficult, variational inference is used to model the posterior. Gal and Ghahramani (2016) introduced dropout as approximate Bayesian inference over the network's weights and linked it to variational inference in deep neural networks with Bernoulli distributions over the network's weights. This is achieved by sampling the network with randomly dropped-out units during testing, in a fashion similar to MC sampling from the posterior distribution. In a CNN architecture, the posterior distribution over the convolutional weights $w$ given the observed training data $\mathbf{x}$ and labels $y$, $p(w \mid \mathbf{x}, y)$, is not tractable. Hence, the distribution of these weights is approximated using variational inference as $q(w)$ by minimizing the Kullback-Leibler (KL) divergence between this approximate distribution and the full posterior:

$$\mathrm{KL}\left(q(w) \,\|\, p(w \mid \mathbf{x}, y)\right) \tag{1}$$

In equation 1, the approximating variational distribution $q(w_i)$ for every $K \times K$ dimensional convolutional layer $i$, with units $j$, is defined as:

$$\begin{aligned} b_{i,j} &\sim \mathrm{Bernoulli}(p_i) \quad \text{for } j = 1, \ldots, K_i, \\ w_i &= M_i \,\mathrm{diag}(\mathbf{b}_i) \end{aligned} \tag{2}$$

with $\mathbf{b}_i$ vectors of Bernoulli distributed random variables, $p_i$ the dropout probability, and $M_i$ the variational parameters. With this choice, we obtain the approximate model of the Gaussian process.

Bayesian SegNet is a popular probabilistic encoder-decoder network architecture in which the model is trained with dropout inserted in the training process, and the distribution over the weights is sampled during testing using Monte Carlo stochastic sampling to obtain the posterior distribution of softmax class probabilities (Kendall et al., 2016). Figure 1 shows the schematic of the Bayesian SegNet architecture used in this work. We inserted dropout layers only in the middle flow of SegNet, as the low-level features in the shallow layers of the network are mostly consistent across the distribution of the models and hence can be represented using deterministic weights, whereas the higher-level features in the deep layers are better modelled using probabilistic weights.

A Bayesian model for semantic segmentation not only produces model predictions for each pixel but also generates pixel-wise uncertainty estimates. Among other measures, the model prediction uncertainty can be quantified using predictive entropy. The predictive entropy captures predictive uncertainty, which combines both epistemic and aleatoric uncertainty (Gal, 2016).

The predictive entropy $\hat{H}[y \mid \mathbf{x}, D_{\mathrm{train}}]$ given a test input $\mathbf{x}$ and the training data $D_{\mathrm{train}}$ can be approximated as:

$$\hat{H}[y \mid \mathbf{x}, D_{\mathrm{train}}] = -\sum_{c} \left( \frac{1}{T} \sum_{t} p(y = c \mid \mathbf{x}, \hat{w}_t) \right) \log \left( \frac{1}{T} \sum_{t} p(y = c \mid \mathbf{x}, \hat{w}_t) \right) \tag{3}$$

where $y$ is the output variable, $c$ ranges over all the classes, $T$ is the number of Monte Carlo samples (stochastic forward passes), $p(y = c \mid \mathbf{x}, \hat{w}_t)$ is the softmax probability of input $\mathbf{x}$ being in class $c$, and $\hat{w}_t$ are the model parameters on the $t$-th Monte Carlo sample. The predictive entropy attains its maximum value when all classes are predicted to have equal and uniform probability, and its minimum value when one class has probability 1 and all others a probability of 0 (i.e. the prediction is certain).
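
The sampling in equation 2 and the entropy estimate in equation 3 can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' TensorFlow implementation: a single toy linear layer stands in for the SegNet layers that carry dropout, the names `mc_dropout_probs` and `predictive_entropy` are hypothetical, and the Bernoulli parameter `p` is taken here as the probability that a unit is kept (b = 1).

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def mc_dropout_probs(x, M, p, T, rng):
    """Run T stochastic forward passes of a toy linear classifier.

    x : (n_pixels, n_features) per-pixel features (toy stand-in for deep features)
    M : (n_classes, n_features) variational parameters (toy weight matrix)
    p : Bernoulli parameter, assumed here to be P(b = 1), i.e. the unit is kept
    Returns softmax probabilities of shape (T, n_pixels, n_classes).
    """
    n_features = M.shape[1]
    samples = []
    for _ in range(T):
        b = rng.binomial(1, p, size=n_features)  # b_j ~ Bernoulli(p), as in equation 2
        w = M @ np.diag(b)                       # w = M diag(b), as in equation 2
        samples.append(softmax(x @ w.T))         # one stochastic forward pass
    return np.stack(samples)


def predictive_entropy(probs, eps=1e-12):
    """Equation 3: entropy of the mean softmax over the T passes, per pixel."""
    mean_p = probs.mean(axis=0)                  # (n_pixels, n_classes)
    return -(mean_p * np.log(mean_p + eps)).sum(axis=-1)


# Toy usage: 5 "pixels", 8 features, 2 classes (e.g. salt and sediment).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
M = rng.normal(size=(2, 8))
probs = mc_dropout_probs(x, M, p=0.5, T=200, rng=rng)
prediction = probs.mean(axis=0).argmax(axis=-1)  # class with the largest mean softmax
entropy = predictive_entropy(probs)              # per-pixel uncertainty
```

For a two-class problem such as salt versus sediment, the entropy computed this way lies between 0 (a certain prediction) and log 2 ≈ 0.693 (both classes equally likely), so an entropy map can be read on a fixed scale.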


APPLICATION OF BAYESIAN CNN

To demonstrate the application of Bayesian CNN for pixel-wise seismic facies classification and uncertainty quantification, we used stacked seismic data where each pixel is labelled as either salt or sediment. The data used in this study were prepared by TGS and are available for academic research.

A seismic image is a representation of the digital signal amplitudes recorded by sensors, which are generated by the interaction of the source waveform with the elastic impedance boundaries between different rock formations. These images show the amplitude differences at different rock boundaries but do not provide a quantitative geological interpretation of each rock type. The impedance contrast within the facies represented by the salt structure is not significant, whereas the contrast at the boundaries between salt and sediment layers is high and produces strong reflected signal amplitudes. Finally, the sediment layers within the salt bodies pose other imaging challenges, as the incident signal becomes obstructed due to low illumination angles. The main goal of this image-based classification is to automatically identify salt and sediment facies from seismic images and to measure the prediction uncertainty using a machine learning technique; we used Bayesian SegNet to achieve this goal.

We used 4000 training images and their corresponding classes, labelled as either salt or sediment, to train our Bayesian network. Each training image had 101x101 pixels. We used batch normalization layers after each convolution layer and trained the network with median frequency class balancing using the method proposed by Eigen and Fergus (2014). The MC dropout was inserted in the middle layers. For training, we used the stochastic gradient descent method with a base learning rate of 0.001 and a weight decay parameter of 0.0005. We continued training with dropout until convergence, when no further improvement in the training loss was recorded.

Following training, 1000 test images were fed into the network to obtain model prediction and uncertainty estimation. We used 200 stochastic forward passes with dropout to estimate per-pixel model prediction and uncertainty. We used the mean of the softmax probability as a measure of final model prediction. We also recorded mean model accuracy, per class accuracy, and predictive entropy from Equation 3. In Figure 2, we present some qualitative results on the test images. The samples marked in red are classified as salt and the grey samples are classified as sediments. We achieved about 92% prediction accuracy using the Bayesian SegNet architecture. This result confirms that the Bayesian CNN technique can be used for seismic facies classification from stacked seismic images. To benchmark our Bayesian SegNet algorithm against the state-of-the-art SegNet architecture, we used the same training samples to train the SegNet algorithm and tested it against all 1000 images. Table 1 shows the seismic facies prediction accuracy using Bayesian SegNet with and without dropout. This result shows that we achieved a performance improvement of about 4 percentage points (91.56% versus 87.14%) using Monte Carlo sampling with dropout, compared with a standard SegNet architecture without dropout (see Table 1). Thus, the results confirm that modeling uncertainty using Monte Carlo sampling with dropout not only produces uncertainty maps but also helps reduce network over-fitting and achieve superior predictions.

Knowing the confidence with which we can rely on our model prediction is important for any decision-making system, and this is especially vital for geological interpretations, as we use passively measured data to interpret or distinguish geological facies. Deep neural networks are a widely accepted machine learning architecture used for regression, natural clustering and classification problems. The practical use of deep neural networks in geoscience applications is subject to some naturally occurring errors and limitations, such as model over-fitting, prediction error due to insufficient training samples, and incoherent noise present in the data caused by processing artifacts. In Figure 2, a darker color of predictive entropy indicates higher prediction uncertainty. We observe high uncertainty at the facies boundaries. A geological facies boundary is the region where different sedimentary layers change gradationally from one into another with a mixture of mineralogical compositions. Therefore, high uncertainty at the facies boundaries makes geological sense. We also observe higher uncertainty (marked as green boundaries in Figure 2) where the samples are contaminated by processing or imaging artifacts, and where there are sediments trapped inside the salt body.

CNN                 Mean accuracy (%)
With MC dropout     91.56
Without dropout     87.14

Table 1. Seismic facies classification accuracy with and without dropout.

CONCLUSION

In this study, we used Bayesian CNN models for seismic facies classification and uncertainty estimation. The Bayesian model clearly classifies the salt and sediment facies from stacked seismic amplitude images and produces prediction uncertainty estimates. Our study also shows that Bayesian CNN is a useful architecture for facies classification with limited training samples. We achieved superior prediction performance using Monte Carlo sampling with dropout by reducing model over-fitting compared to a normal CNN architecture. The uncertainty maps produced by our method are valuable and can provide guidance to an experienced interpreter when using machine-generated facies classes in production environments. In future, we plan to extend this classification method to continuous 2D or 3D seismic sections to bring more meaningful interpretations of our predictions.

ACKNOWLEDGEMENT

The authors thank TGS for sharing the classified seismic data for academic research and publication. The CNN models discussed in this study were implemented in TensorFlow, an open source library from Google, and use the SegNet architecture.

REFERENCES
Badrinarayanan, V., A. Kendall, and R. Cipolla, 2015, SegNet: A deep convolutional encoder-decoder architecture for image segmentation: arXiv:1511.00561.
Chen, L. C., Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, 2018, Encoder-decoder with atrous separable convolution for semantic image segmentation: arXiv:1802.02611v2.
Eigen, D., and R. Fergus, 2014, Predicting depth, surface normal and semantic labels with a common multi-scale convolutional architecture: arXiv:1411.4734.
Gal, Y., 2016, Uncertainty in deep learning: Ph.D. thesis, Cambridge University.
Gal, Y., and Z. Ghahramani, 2016, Dropout as a Bayesian approximation: Representing model uncertainty in deep learning: arXiv:1506.02142v6.
Kendall, A., V. Badrinarayanan, and R. Cipolla, 2016, Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding: arXiv:1511.02680v2.
Long, J., E. Shelhamer, and T. Darrell, 2014, Fully convolutional networks for semantic segmentation: arXiv:1411.4038.
Mukhoti, J., and Y. Gal, 2018, Evaluating Bayesian deep learning methods for semantic segmentation: arXiv:1811.12709.
Yu, F., and V. Koltun, 2016, Multi-scale context aggregation by dilated convolutions: ICLR.
Zhao, T., 2018, Seismic facies classification using different deep convolution neural network: 88th Annual International Meeting, SEG, Expanded Abstracts, 2046–2049, doi: 10.1190/segam2018-2997085.1.
Zhao, T., and P. Mukhopadhyay, 2018, A fault detection workflow using deep learning and image processing: 88th Annual International Meeting, SEG, Expanded Abstracts, 1966–1969, doi: 10.1190/segam2018-2997005.1.
