10.1190/GEO2019-0332.1
Zhaoqi Gao1, Chuang Li1, Bing Zhang1, Xiudi Jiang2, Zhibin Pan3, Jinghuai Gao1, and Zongben Xu4
INTRODUCTION

Estimating high-fidelity models for subsurface parameters is crucial for the exploration of oil and gas. Among several parameters, density plays a key role in lithology interpretation, reservoir evaluation, and description. Consequently, building a density model is important in geophysics. However, it is well known that large-scale density can hardly be directly inverted using the information carried by seismic waves because the amplitude of the scattered wavefield corresponding to the density perturbation decreases quickly with the increase of the scattering angle (Tarantola, 1986; Forgues and Lambaré, 1997; Virieux and Operto, 2009). Thus, additional information is needed for building a large-scale density model.

By incorporating well-log data, one simple and widely used approach can be used to build a large-scale density model. This approach extrapolates the density from well locations to other places using some extrapolation method, and usually the horizons of the subsurface medium are needed to guarantee a laterally reasonable extrapolated model. However, this approach has two weaknesses: (1) picking accurate horizons remains a challenge, and (2) it is firmly established that well-log data have high vertical resolution but poor horizontal continuity. Thus, the extrapolated density model is correct at the well positions but unreliable at places that are far away from the wells. This weakness becomes more serious when the subsurface medium is laterally heterogeneous
Manuscript received by the Editor 21 May 2019; revised manuscript received 30 July 2020; published ahead of production 4 October 2020; published online 16 December 2020.
1 Xi'an Jiaotong University, School of Information and Communications Engineering, Xi'an, Shaanxi 710049, China and Xi'an Jiaotong University, National Engineering Laboratory for Offshore Oil Exploration, Xi'an, Shaanxi 710049, China. E-mail: zq_gao@xjtu.edu.cn (corresponding author); chli0409@126.com; xawslhh@163.com; jhgao@mail.xjtu.edu.cn.
2 CNOOC Research Institute, Beijing 100028, China. E-mail: jiangxd2@cnooc.com.cn.
3 Xi'an Jiaotong University, School of Information and Communications Engineering, Xi'an, Shaanxi 710049, China. E-mail: zbpan@mail.xjtu.edu.cn.
4 Xi'an Jiaotong University, School of Mathematics and Statistics, Xi'an, Shaanxi 710049, China. E-mail: zbxu@mail.xjtu.edu.cn.
© 2021 Society of Exploration Geophysicists. All rights reserved.
and only a few well logs are available. Another commonly used approach builds a large-scale density model from the large-scale velocity model by using some empirical relation between velocity and density, such as the Gardner relation (Gardner et al., 1974) and the generalized Gardner relation (Ursenbach, 2005). However, this approach also faces two problems: (1) it is challenging to build an accurate large-scale velocity model, and (2) it is nontrivial to determine the parameters of an empirical relation because they are lithology-dependent, meaning that an empirical relation with fixed parameters is unreliable and will introduce artifacts into the density model. As a result, more advanced techniques are required for building a large-scale density model.

As a data-driven machine learning algorithm, neural networks, which are inspired by the biological neural networks that constitute animal brains, enable us to perform tasks by learning from examples. Neural networks have been applied to geophysical problems for a long time (van der Baan and Jutten, 2000; Poulton, 2002). Recently, Maiti and Tiwari (2010) propose a new method, which is set in a Bayesian neural network framework and uses a hybrid Monte Carlo simulation scheme to identify facies changes from complex well-log data; their neural network is optimized by a hybrid genetic algorithm and particle swarm optimization method. Ardjmandpour et al. (2011) propose a forward modeling and an inversion method based on neural networks. Wit et al. (2013) use a mixture density network to obtain 1D marginal posterior probability density functions, which provide a quantitative description of the individual earth parameters. Kahrizi and Hashemi (2014) propose to use the neuron curve to find the optimal layer size and the minimum size of the training set of the neural network and apply it to find the first-break picks of seismic data. Konaté et al. (2015) propose the self-organizing map neural network and use it for the classification of metamorphic rocks from log data. Keynejad et al. (2017) compare simultaneous prestack inversion and neural network methods in creating 3D Poisson's ratio models built upon low-frequency initial models; the comparison shows that the neural-network-based method is notably more successful than the extrapolation-based method for the results beyond the logged sections in the wells and away from the control wells. Chen et al. (2018) use the pseudo-back-propagation neural network method to invert the gravity anomalies of multidensity interfaces. The latest variant of neural networks, that is, deep learning, has also been used in solving geophysical problems, including but not limited to seismic inversion, velocity model building, and seismic facies analysis (Lewis and Vigh, 2017; Araya-Polo et al., 2018; Qian et al., 2018; Gao et al., 2019, 2020a, 2020b), and it promises breakthroughs.

Recently, the applications of neural networks in reservoir characterization have also been investigated. Ahmadi et al. (2013) propose to implement a soft sensor on the basis of a feed-forward neural network to forecast the permeability of a reservoir. Chaki et al. (2015) propose a preprocessing scheme to improve the prediction of the sand fraction from seismic attributes using neural networks. Cersósimo et al. (2016) use a neural network to predict lateral variations of seismic velocity, density, thickness, and gamma rays. Boateng et al. (2017) propose a porosity estimation method based on Caianiello neural networks and Levenberg-Marquardt optimization, and they demonstrate that this method is robust. Motaz and Ghassan (2018) propose a petrophysical property estimation method using a deep network called the recurrent neural network (RNN), and they demonstrate that this method can build a density model from seismic data through a learned nonlinear relation between them.

In this paper, we propose a deep-learning-based data-driven method to build a large-scale density model. Our basic idea is to build a mapping from seismic data to a large-scale density model using end-to-end deep learning. To build such a nonlinear relation, the deep architecture called the long short-term memory (LSTM) network is adopted. To train the LSTM network and improve its applicability in complex cases (e.g., in a laterally heterogeneous medium), we propose the following method to construct the data set for network training and validation based on seismic data and a few well logs. First, pairs of seismic traces and the corresponding large-scale density model directly obtained from well logs are used to form the data set. Because the limited wells are sparsely distributed in space, this data set is insufficient to guarantee the generalization ability of the deep network, especially for a laterally heterogeneous medium. To overcome this shortcoming, we randomly generate many velocity and density models according to the statistical distributions of well-log data, based on which several pairs of synthetic seismic data and the corresponding large-scale density model are generated to significantly enlarge the size and diversity of the data set. The proposed method has two important characteristics: (1) it avoids the requirements of large-scale velocity building and horizon picking, and (2) it has the potential to handle complex cases even if only limited well logs are available. We test the deep-learning-based data-driven method using synthetic and field data examples. The numerical results clearly demonstrate two facts: (1) the randomly generated data set is indeed the key to improving the performance of the proposed method in dealing with a strongly heterogeneous medium, and (2) the proposed method performs significantly better than the commonly used methods.

This paper is organized as follows. We first present the basic concept and information about the LSTM network. Then, a detailed description of the proposed method is provided. Finally, numerical examples are presented to verify the effectiveness of the proposed method.

THE LSTM NETWORK

LSTM, first proposed by Hochreiter and Schmidhuber (1997), is a kind of RNN that is powerful for processing time-series data. Different from the traditional RNN, LSTM has a unique architecture, as shown in Figure 1. The key of LSTM is the cell state, which is represented by the horizontal line running through the top of a unit. It enables LSTM to have continuous gradient flow to prevent back-propagated errors from vanishing or exploding; consequently, LSTM can work well for tasks that require memories of events that happened thousands or even millions of discrete time steps earlier.

For each unit of LSTM, the relation between its input and output can be summarized as follows:

    f_t = σ_g(W_f x_t + U_f h_{t−1} + b_f)
    i_t = σ_g(W_i x_t + U_i h_{t−1} + b_i)
    o_t = σ_g(W_o x_t + U_o h_{t−1} + b_o)          (1)
    c_t = f_t ∘ c_{t−1} + i_t ∘ σ_c(W_c x_t + U_c h_{t−1} + b_c)
    h_t = o_t ∘ σ_h(c_t)

where x_t ∈ R^d is the input vector; f_t ∈ R^h is the forget gate's activation vector; i_t ∈ R^h is the input gate's activation vector; o_t ∈ R^h is the output gate's activation vector; h_t ∈ R^h is the output vector;
"∘" denotes element-wise multiplication; c_t ∈ R^h is the cell state vector; σ is the activation function; and W ∈ R^{h×d}, U ∈ R^{h×h}, and b ∈ R^h are the learnable weights and biases, which can be obtained through network training with the gradient calculated through back propagation (Rumelhart et al., 1986). Herein, σ_g(x) = 1/(1 + e^{−x}) is the sigmoid function, and σ_c and σ_h are the tanh function tanh(x) = (e^x − e^{−x})/(e^x + e^{−x}).

METHODOLOGY

Although large-scale density can hardly be inverted from seismic data (Forgues and Lambaré, 1997), it is still reasonable to believe that a relation between them certainly exists because of the following two facts: (1) the traveltime of seismic data is related to the large-scale velocity model of the subsurface, and (2) the large-scale velocity and density are nonlinearly related (Gardner et al., 1974). As a consequence, it is possible to obtain large-scale density from seismic data given a sufficiently accurate nonlinear relation between them. In this work, we propose to establish such a nonlinear relation using deep learning in a data-driven framework, with the aim of providing a more advanced and robust large-scale density model building method that can not only work for simple cases but also be effective in dealing with complex cases, such as a strongly heterogeneous medium. Considering that seismic data and well-log density are temporally dynamic, meaning that each data point is not isolated but depends on the data points before and after it, we think that the deep learning architecture called LSTM is suitable for our method because of its effectiveness in processing time-series data, as introduced above. We call the proposed method the deep-learning-based method, and its workflow is shown in Figure 2. There are several important parts of the proposed method, and each of them will be explained in detail as follows.

The deep learning architecture of the proposed method

The basic ingredient of a successful deep learning application is the network architecture. Herein, we use the deep learning architecture shown in Figure 3 in our method. The input of the deep network is the seismic data. Considering the temporal and spatial correlation, at each time step, we use the seismic data within a sliding window (marked by the green rectangles in Figure 3) as the input rather than just inputting a single data point. This brings us two obvious advantages. First, the correlation of seismic data enables
deep learning to build a more sophisticated nonlinear relation. Second, by taking the seismic data within a window into consideration, a nonlinear relation with better noise resistibility can be expected. In application, the size of the sliding window in the time direction should be chosen based on the wavelength of the seismic wavelet, whereas its size in lateral distance can be fixed as a constant. Once the seismic data are input into the deep network, the unit of LSTM at each time step will capture the information of the input and then output a vector, which will then be used as the input of the fully connected neural network (FCNN) to output a density value for the current time step.

Figure 1. The architecture of the LSTM network. Symbol "A" represents the unit of LSTM, which contains several operations and gates. The terms x_t, t = 1, 2, …, N_t and h_t, t = 1, 2, …, N_t are the input and output of a unit, respectively.
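For concreteness, the per-unit recursion of equation 1 can be sketched in plain NumPy. This is an illustrative reimplementation, not the authors' TensorFlow code; the dimensions (d = 3 window samples, h = 4 hidden units) and the random initialization are arbitrary toy choices.

```python
import numpy as np

def lstm_unit(x_t, h_prev, c_prev, W, U, b):
    """One LSTM unit step, following equation 1. W, U, and b are dicts
    keyed by gate: 'f' (forget), 'i' (input), 'o' (output), 'c' (cell candidate)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))            # sigma_g
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])  # forget gate
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])  # input gate
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])  # output gate
    # cell state: keep part of the old state and add the new candidate (sigma_c = tanh)
    c_t = f_t * c_prev + i_t * np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])
    h_t = o_t * np.tanh(c_t)                                # sigma_h = tanh
    return h_t, c_t

# toy sizes: d = 3 input samples per step, h = 4 hidden units
rng = np.random.default_rng(0)
d, h = 3, 4
W = {k: rng.normal(0.0, 0.1, (h, d)) for k in 'fioc'}
U = {k: rng.normal(0.0, 0.1, (h, h)) for k in 'fioc'}
b = {k: np.full(h, 0.1) for k in 'fioc'}

h_t, c_t = lstm_unit(rng.normal(size=d), np.zeros(h), np.zeros(h), W, U, b)
print(h_t.shape, c_t.shape)  # (4,) (4,)
```

In the full architecture, the output vector h_t of each unit is then passed to the shared FCNN layer to produce the density sample for that time step.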
Based on the deep learning architecture, the output density at time t can be expressed as

    ρ_t = σ_ρ(W_ρ h_t + b_ρ),          (2)

where W_ρ ∈ R^h and b_ρ ∈ R are the weights and bias of the FCNN layer, respectively, h_t is the output of the unit of LSTM at time t, and σ_ρ is the activation function of the FCNN layer; the rectified linear unit function σ_ρ(x) = max(0, x) is used in our work. It is worth noting from equation 2 that the weights and bias of the FCNN layer do not change along with time t. In other words, we use shared weights and bias for all time steps with the aim of reducing the free parameters of the deep learning architecture. In summary, the learnable parameters of the deep learning architecture can be summarized as follows:

    Θ = {W_f, U_f, b_f, W_i, U_i, b_i, W_o, U_o, b_o, W_c, U_c, b_c, W_ρ, b_ρ}.          (3)

Constructing data sets for training and validation

For all supervised deep learning applications, multiple pairs of data are crucial for training the deep network, and massive data sets are often required for more complex data analysis applications. For our purposes, we need pairs of seismic data and large-scale density. Herein, we use the well-log data to construct such data sets. The well-log data are converted from the depth to the time domain before use to make sure that the seismic data and well logs are in the same domain. Based on the well logs, we construct two kinds of data sets, and the details will be introduced next.

For the first kind of data set, we directly obtain large-scale density traces from well logs by applying a filter to extract the low-frequency trend of the density curves, and then we associate them with the corresponding seismic data to form several data pairs. However, training the deep network based on this data set alone faces problems because the number of available wells is usually limited in practice, which seriously hinders the generalization ability of deep learning, especially for a laterally heterogeneous medium.

The second kind of data set is a supplement to the first kind to improve the generalization ability of deep learning. Herein, our basic idea is to randomly generate several models and calculate their corresponding seismic data to form many data pairs, with the aim of including different geologic structures (such as a salt-like structure) in the data set. This can be realized through the following two steps:

1) Obtain the statistical distributions of well-log data (compressional wave [P-wave] velocity and density). As shown in Figure 4, we first equally divide the entire well-log curve into several sections based on its value. Specifically, the maximum value v_max and the minimum value v_min of the well-log curve are obtained, and then the value range [v_min, v_max] is divided into several sections with the lower and upper bounds of each section being

    s_l^lower = v_min + (l − 1) × (v_max − v_min)/L
    s_l^upper = v_min + l × (v_max − v_min)/L,          (4)

where s_l^lower and s_l^upper are the lower and upper bounds of section s_l, l = 1, 2, …, L, respectively, and L is the total number of sections. Then, a probability is assigned to each section according to the number of data points within it as follows:

    P(s_l) = N_{s_l}/N_T,          (5)

where N_{s_l} is the number of data points within the section s_l and N_T is the total number of data points of the whole well-log curve. In this step, choosing a reasonable value of L (the number of sections) is the key to obtaining the probability of the well log because a too small L will lead to low accuracy, whereas a too large L will make the calculation of the probability computationally expensive. A reasonable criterion is to choose the value of L as small as possible on the premise of certain accuracy. In practice, the algorithm summarized in Algorithm 1 can be adopted to determine the optimal value of L. In the algorithm, P^(L) is the probability obtained using equations 4 and 5 with the number of sections being L, and Φ_L(·) is an operator that maps a probability with L sections to a probability with L_max sections using interpolation. This operator guarantees that Φ_L(P^(L)) and P^(L_max) have the same dimensionality; consequently, their L2 distance can be measured. Herein, we empirically set ε = 2% with the aim of balancing accuracy and efficiency, and L_max is chosen as 100.

Algorithm 1. Determine the optimal value of L.
Input: Error threshold ε and the maximum number of sections L_max
Output: Optimal value L*
1: Calculate the probability P^(L_max)
2: Set L = 2, calculate the probability P^(2), and calculate Error(2) = ‖Φ_2(P^(2)) − P^(L_max)‖²₂
3: while Error(L)/Error(2) > ε do
4:     Set L = L + 1
5:     Calculate P^(L) and Error(L) = ‖Φ_L(P^(L)) − P^(L_max)‖²₂
6: end while
7: return L* = L

2) Generate random models and the corresponding synthetic seismic data. Based on the probability defined above, we first generate several 1D P-wave velocity and density models. Each 1D model has several layers (ranging randomly from four to seven), and the thickness of each layer is randomly given under the premise that the whole model has the same length as the well-log curve. The value (P-wave velocity or density) of each layer is randomly generated according to the calculated probability. In this paper, this is realized by using the built-in function randsrc of the MATLAB software. Then, we generate a 2D P-wave velocity model and a 2D density model by interpolating these 1D models in the lateral direction. Next, we obtain seismic data of the 2D model in a trace-by-trace manner using the convolution model (Robinson, 1967) as follows:

                  ⎡ w_1                     ⎤
                  ⎢ w_2  w_1                ⎥ ⎡ r_{m,1}   ⎤
                  ⎢  ⋮   w_2   ⋱           ⎥ ⎢ r_{m,2}   ⎥
    s_m = W r_m = ⎢ w_L   ⋮    ⋱   w_1     ⎥ ⎢    ⋮      ⎥,          (6)
                  ⎢      w_L   ⋱   w_2     ⎥ ⎣ r_{m,N_t} ⎦
                  ⎢            ⋱    ⋮      ⎥
                  ⎣                 w_L    ⎦
where s_m is the seismic data of the mth trace; w = [w_1, w_2, ⋯, w_L]^T is the seismic wavelet; and r_m = [r_{m,1}, r_{m,2}, ⋯, r_{m,N_t}]^T is the reflectivity vector of the mth trace, of which each element is

    r_{m,i} = (v_{m,i+1} ρ_{m,i+1} − v_{m,i} ρ_{m,i}) / (v_{m,i+1} ρ_{m,i+1} + v_{m,i} ρ_{m,i}),   i = 1, 2, ⋯, N_t − 1,
    r_{m,i} = 0,   i = N_t,          (7)

where v_{m,i} and ρ_{m,i} are the ith elements of the P-wave velocity and density, respectively, corresponding to the mth trace. Apart from the seismic data, the large-scale density of each trace of the 2D model is derived by applying a filter to the corresponding trace of the 2D density model.

After finishing these two steps, we can obtain plentiful data pairs of seismic traces and the corresponding large-scale density, which are then used to train the deep learning architecture together with the data pairs directly obtained from well logs. It is worth noting that, although a continuous-to-discrete conversion is used during construction of the random data set, the seismic traces and large-scale density models within the random data set are all continuous.

    …          (8)

… function with the aim to guarantee the estimation accuracy of deep learning on the well-log data. The loss function shown in equation 8 can be minimized using a gradient-based local optimization technique with the descent direction calculated through back propagation (Rumelhart et al., 1986).

Implementation of the proposed method

For the implementation of the proposed method, we use TensorFlow for training our deep learning architecture. TensorFlow is a free open-source software library for dataflow and differentiable programming across a range of tasks, and it can automatically and correctly map the flow of gradients back to individual weights and biases during back propagation (Abadi et al., 2015). We train the deep learning architecture using Adam's optimizer (Kingma and Ba, 2014) with a fixed learning rate of 0.001 and a fixed batch size of 32. A new minibatch is created before each iteration using random shuffling to ensure that there is no bias in learning. Before iteration, all weights and biases are initialized. Specifically, all weights are randomly initialized based on a normal distribution whose mean and variance are 0 and 0.01, respectively, and all biases are initialized to a constant of 0.1. The entire training procedure contains 100 epochs, and the early stopping strategy (Nielsen, 2015) is adopted to prevent overfitting.

The advantage of the proposed deep-learning-based method

The proposed deep-learning-based method provides a general framework to obtain a large-scale density model based on seismic data and well-log data. The proposed method uses a machine learning algorithm, that is, deep learning, to learn a nonlinear relation between seismic data and large-scale density. Such an approach does not require large-scale velocity or picked horizons of the subsurface, which are the prerequisites of the well extrapolation method and the empirical-relation-based method. Thus, the proposed method can be more robust. More importantly, a large number of randomly generated data pairs, which cover a variety of subsurface structures, are incorporated into the training data set, giving the proposed deep-learning-based method the potential to handle complex cases.

SYNTHETIC EXAMPLES

In this section, the performance of the proposed deep-learning-based method will be assessed through synthetic experiments, which
are based on a modified Marmousi II model. The true P-wave velocity and density models are shown in Figure 5a and 5b, respectively. This model is generated from the top portion of the original Marmousi II model, and the water layer is removed. In addition, the original depth-domain model is converted to the time domain. The whole modified Marmousi II model has 1600 traces in the lateral direction, each of which has a length of 1.2 s. We set the sampling interval to 0.002 s; consequently, each trace has 601 points in time. This model has three pseudo wells, which are located at common depth points (CDPs) of 100, 400, and 1100. The P-wave velocity and density curves of these wells are shown in Figure 6. It is worth noting that the three pseudo wells offer us nearly no information about the middle part of the model, which is complex in structure because faults and anomalies are present. As a consequence, it is suitable to use this test to investigate the performance of different methods in dealing with complex cases. Based on the P-wave velocity and density models, the observed seismic data, as shown in Figure 7, are obtained for all 1600 traces using the convolution model shown in equation 6. A Ricker wavelet with a peak frequency of 10 Hz is used as the source in this test. Based on the modified Marmousi II model, we conduct two experiments. First, we use the proposed deep-learning-based method to build a large-scale density model and compare its performance with the well extrapolation method. Second, we conduct an uncertainty analysis to assess how Gaussian noise in the seismic data influences the performance of the proposed method. Detailed information on these experiments is given in the following sections.

Figure 6. The P-wave velocity (the blue lines) and density (the red lines) of the pseudo wells: (a–c) correspond to wells 1–3, respectively.
Figure 8. The probabilities calculated from the three pseudo wells shown in Figure 6 using the method illustrated in Figure 4: (a and b) correspond to the P-wave velocity and density, respectively.

Figure 10. The large-scale density models: (a) the true model generated from Figure 5b using smoothing, (b) the model built by well extrapolation, (c) the model built by the proposed method without the randomly generated data set, and (d) the model built by the proposed method with the randomly generated data set.
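The models of Figure 10 are compared quantitatively below through the peak signal-to-noise ratio (PS/N) and structural similarity (SSIM). As a hedged sketch of such metrics (the paper does not state its exact SSIM configuration, so a simplified single-window SSIM is shown here, and the two arrays are random stand-ins rather than real density models):

```python
import numpy as np

def psnr(true, est):
    """Peak signal-to-noise ratio (dB) between a true and an estimated model."""
    mse = np.mean((true - est) ** 2)
    return 10.0 * np.log10(true.max() ** 2 / mse)

def global_ssim(x, y):
    """Structural similarity computed once over the whole model
    (a simplification of the usual locally windowed SSIM)."""
    drange = max(x.max(), y.max()) - min(x.min(), y.min())   # dynamic range
    c1, c2 = (0.01 * drange) ** 2, (0.03 * drange) ** 2      # stabilizers
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    lum = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)      # luminance term
    struct = (2 * cov + c2) / (x.var() + y.var() + c2)       # contrast/structure term
    return lum * struct

rng = np.random.default_rng(2)
true = 2.0 + 0.3 * rng.random((100, 100))            # toy "density model"
est = true + 0.01 * rng.normal(size=true.shape)      # toy estimate with small errors
print(round(psnr(true, est), 1), round(global_ssim(true, est), 4))
```

A nearly perfect estimate drives SSIM toward 1 and PS/N to large values, which is the behavior exploited in Table 2.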
… seismic data will be used as the input of the unit of LSTM. It is worth noting that the number of time steps of LSTM (the number of LSTM units) is set to 121 here, meaning that the time step of the sliding window is chosen as 0.01 s even though the sampling interval of the data is 0.002 s. This mechanism can significantly reduce the number of LSTM units and consequently leads to efficiency improvements during network training. It is reasonable to use this mechanism in our application because the seismic data and large-scale density are band limited, and no information will be lost as long as the Nyquist sampling theorem is satisfied. In addition, the dimension of the output vector of LSTM at each time step is 128 here.

The key step is to construct the data set for training the deep learning architecture. As introduced above, part of the data set is directly obtained from the three pseudo wells, whereas the rest comes from the randomly generated models. To generate such random models, we first calculate the probabilities of the P-wave velocity and density from the three pseudo wells (see Figure 8). Then, 57 different 1D P-wave velocity and density models are randomly generated based on the calculated probabilities following the method introduced above. Next, 2D P-wave velocity and density models are generated by interpolating these 1D models in the lateral direction. By setting the interval between different 1D models to 200 traces, we obtain a 2D P-wave velocity model as well as a 2D density model, both of which have 11,201 traces after interpolation. Figure 9 displays part of the randomly generated 2D models. Finally, the 2D models together with their corresponding synthetic seismic data are used to generate an additional data set, which is then used for network training. The built large-scale density models of the different methods are shown in Figures 10 and 11. To quantitatively measure the quality of these models, we calculate their corresponding peak signal-to-noise ratio (PS/N) and structural similarity (SSIM), and the results are shown in Table 2.
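The random-model construction above — binning the well log into L sections (equations 4 and 5) and drawing layered 1D models from the resulting probabilities — can be sketched as below. NumPy's `choice` plays the role of MATLAB's `randsrc`, and the particular layer-thickness sampling is an illustrative assumption, since the paper only states that thicknesses are random.

```python
import numpy as np

def section_probabilities(curve, L):
    """Equations 4 and 5: split [v_min, v_max] into L equal sections and
    assign each section the fraction of well-log samples that fall in it."""
    counts, edges = np.histogram(curve, bins=L, range=(curve.min(), curve.max()))
    return counts / counts.sum(), edges

def random_1d_model(prob, edges, n_samples, rng, min_layers=4, max_layers=7):
    """A layered 1D model: 4-7 layers with random thicknesses spanning the
    well-log length; each layer value is drawn from the section probabilities."""
    n_layers = rng.integers(min_layers, max_layers + 1)
    # random interface positions covering the full well-log length (assumption)
    cuts = np.sort(rng.choice(np.arange(1, n_samples), n_layers - 1, replace=False))
    bounds = np.concatenate(([0], cuts, [n_samples]))
    centers = 0.5 * (edges[:-1] + edges[1:])       # representative value per section
    model = np.empty(n_samples)
    for k in range(n_layers):
        model[bounds[k]:bounds[k + 1]] = rng.choice(centers, p=prob)
    return model

rng = np.random.default_rng(1)
log_curve = 2000.0 + 500.0 * rng.random(601)       # stand-in P-wave well log
prob, edges = section_probabilities(log_curve, L=20)
model = random_1d_model(prob, edges, n_samples=601, rng=rng)
print(model.shape)  # (601,)
```

Repeating this for many velocity-density pairs and interpolating laterally (57 models at 200-trace spacing in this experiment) yields the 2D random models used to enlarge the training set.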
Figures 10b and 11b display the built large-scale density model of the well extrapolation method, and its corresponding PS/N and SSIM are shown in Table 2. It is clear from the results that this model is quite different from the true large-scale density model (Figure 10a) in density structures and values. This model can depict neither the density anomalies around CDP 250 and CDP 800 nor the discontinuity in the middle of the model. This is because well extrapolation has a strong dependence on the well logs and can only offer reliable results for the places around wells, and the wells used in this experiment offer no information about these density anomalies.

Figures 10c and 11c display the built large-scale density model of the proposed method without the randomly generated data set. Compared to the model built by well extrapolation, this model is more consistent with the true model, as the larger PS/N and SSIM values indicate. Specifically, this model tells us that a low-density anomaly is present around CDP 250 and 0.6 s in time. In addition, this model displays some high-density anomalies in the deep part, which is consistent with the true model. This implies that the performance of the proposed method is better than that of the well extrapolation method, even though only three wells are used in the network training. However, the built large-scale density model here also has some incorrect structures; for example, the density values inside the red dashed area (Figure 11c) are lower than those in the true model (Figure 11a). This is because the data set obtained directly from the three pseudo wells is insufficient for deep learning to establish a nonlinear relation that can be generalized to a variety of subsurface structures.

Figure 11. A detailed comparison of the large-scale density models: (a–d) correspond to the models shown in Figure 10a–10d, respectively. The density structures inside the red marked areas are quite different from those of the true model.

Table 2. Comparison of large-scale density models built by different methods. These models correspond to the noise-free synthetic example based on the modified Marmousi II model. The bold values indicate that the proposed method with the randomly generated data set has the best performance.

              Well             Proposed method without        Proposed method with
Method        extrapolation    randomly generated data set    randomly generated data set
PS/N (dB)     32.8276          39.5746                        49.2250
SSIM          0.9708           0.9982                         0.9996

Figures 10d and 11d display the built large-scale density model of the proposed method with the randomly generated data set. Compared to the model shown in Figure 10c, this model shows a significant improvement in quality. Specifically, it has a very high PS/N value (49.2250 dB), which is almost 10 dB larger than that without the randomly generated data set, and also a high SSIM value (0.9996). In addition, the overall density structures and values are consistent with those of the true model, even for some small-size
density anomalies, whose accurate depiction is considered to be challenging, especially when no adequate well is available. These results imply the following: (1) the randomly generated data set is crucial for the performance of the proposed deep-learning-based method, especially for its implementation in complex cases, and (2) the proposed method works well in this synthetic experiment even though only three wells are available, and it achieves superior performance over the well extrapolation method.

Uncertainty analysis on Gaussian noise

The proposed method builds the large-scale density model from seismic data. In the previous experiment, we assumed noise-free seismic data. Herein, we conduct an uncertainty analysis to investigate the sensitivity of the proposed method to Gaussian noise. Three different levels of Gaussian noise with signal-to-noise ratios (S/Ns) of 10, 5, and 0 dB are added to the seismic data to generate three noisy data sets (see Figure 12). It is clear from Figure 12 that, by increasing the energy of the Gaussian noise, the effective signal contained in the seismic data is seriously contaminated by the noise, especially when the S/N is 0 dB, which makes building a large-scale density model from the seismic data challenging. Besides the seismic data, the other experimental conditions used here are the same as those used above. The randomly generated data set is used in this experiment during training of the proposed deep learning architecture. The results are shown in Figure 13 and Table 3.

Figure 12. The noisy seismic data: (a–c) correspond to the S/N of 10, 5, and 0 dB, respectively; (d–f) are the comparisons of the noise-free data (the red lines) and the noisy data (the blue dashed lines), which correspond to (a–c), respectively. The data are from CDP 400.

Figure 13b displays the built large-scale density model corresponding to the noisy data with the S/N of 10 dB. In this case, the built model is comparable to the noise-free case (Figure 13a) because both of them have consistent density structures. In addition, their corresponding PS/N and SSIM values are very close to each other. Figure 13c displays the built large-scale density model corresponding to the noisy data with the S/N of 5 dB. It is clear from this model that, with the increase of the Gaussian noise, some artifacts are present in the model. This phenomenon can be more clearly observed in Figure 13d, which corresponds to the case with the S/N
of 0 dB. Specifically, three obvious artifacts marked by the red ure 14. This field data set has 810 traces in lateral direction, and
dashed rectangles can be observed, and the PS/N value of this model each trace has a length of 4.0 s. Herein, we only use part of the
is lower than the noise-free case by 4.8015 dB, which indicates the data, which ranges from 1.8 to 2.4 s in time. The sampling interval
reduction in quality. This is maybe because the strong noise destroys of these data is 0.002 s; thus, each trace has 301 samples. This part
the lateral continuity of seismic data. Considering that the proposed of the data is quite complex in structure because it has faults and
deep-learning-based method builds the large-scale density model in a anomalies. As a consequence, its large-scale density model building
trace-by-trace manner, this discontinuity is definitely harmful. is challenging. These data have five wells, which are located at
Although strong noise can cause the quality of the built large- CDPs of 115, 163, 294, 560, and 677. Figure 15 displays the P-
scale density model to decrease, it is worth noting that the proposed wave and density curves of the five wells. Different from the other
method can also accurately depict the complex density structures of wells, well 2 (Figure 15g) has a high anomaly from approximately
the middle part of the model even for the case with an S/N of 0 dB. 2.15 to 2.2 s. To further complicate the density model building,
To be specific, the positions and shapes of the low- and high-density we will not use well 2 for training but for testing, meaning that
anomalies can be accurately characterized. In addition, even for the no information about the density anomaly will be provided to the
extreme case (S/N is 0 dB), the large-scale density model built by proposed method. In the following, we will conduct two different
the proposed method is still significantly better than that of the well experiments: (1) we apply the well extrapolation method and the
extrapolation method. This suggests that the proposed method can proposed method to build the large-scale density model of this field
tolerate a wide range of Gaussian noise levels. data set and compare their performance, and (2) we use two differ-
ent well groups to train the deep learning architecture to investigate
FIELD DATA EXAMPLES the influence of the training data set on the performance of the pro-
posed method for uncertainty analysis.
In this section, the performance of the proposed deep-learning-
based method will be investigated in field data examples, which are Comparison of different methods in building a
conducted based on the 2D poststack seismic data shown in Fig- large-scale density model
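For concreteness, the way noisy data sets such as those in the uncertainty analysis above are generated — zero-mean Gaussian noise added at a prescribed S/N in decibels — can be sketched as follows. The function name and the use of NumPy are our own illustrative choices, not part of the original workflow.

```python
import numpy as np

def add_gaussian_noise(trace, snr_db, rng=None):
    """Add zero-mean Gaussian noise to a trace at a target S/N (dB)."""
    if rng is None:
        rng = np.random.default_rng(0)
    signal_power = np.mean(trace ** 2)
    # S/N (dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), trace.shape)
    return trace + noise
```

Calling this with `snr_db` set to 10, 5, and 0 yields three noisy copies of the same record, analogous to the three data sets of Figure 12.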
Comparison of different methods in building a large-scale density model

Herein, the well extrapolation method and the proposed deep-learning-based method are used to build large-scale density models. The well extrapolation method uses four wells located at CDPs of 115, 294, 560, and 677 to obtain the large-scale density model for all traces using extrapolation constrained by the picked horizons. The same wells are used to construct the training data set for the proposed method. Specifically, four pairs of the seismic data and the corresponding large-scale density from the well logs form the first part of the training data set, and the remaining data pairs are constructed based on the 2D random models, which are generated based on the probabilities shown in Figure 16. The 2D random models have a total of 16,200 traces, which are generated from 163 randomly generated 1D P-wave velocity and density models using lateral interpolation. Figure 17 displays part of the 2D random models. The wavelet used to generate the synthetic seismic data of the 2D random models is extracted from the observed seismic data. Table 4 displays the detailed information of the deep architecture used in this example. As in the synthetic example, we set the size of the sliding window to 10 × 5, and the dimension of the output vector of the LSTM in each time step is fixed at 128. Herein, the number of time steps of the LSTM is 151, meaning that the time step of the sliding window is 0.004 s. Figure 18 displays the large-scale density models built by the well extrapolation method and the proposed method.

Figure 15. The P-wave velocity (the blue lines) and density (the red lines) of the five wells: (a-e) the P-wave velocity curves corresponding to wells 1–5, respectively, and (f-j) the density curves corresponding to wells 1–5, respectively.
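The sliding-window input construction described above (a 10 × 5 window advanced two samples, i.e., 0.004 s, per LSTM time step, giving 151 steps for a 301-sample trace) can be sketched as follows. The window orientation (10 samples in time by 5 neighboring traces), the centering, and the edge padding are our assumptions; the paper does not state its boundary handling.

```python
import numpy as np

def make_lstm_inputs(seismic, cdp, win_t=10, win_x=5, stride=2):
    """Extract sliding-window inputs around one CDP for a sequence model.

    seismic has shape (n_samples, n_traces); each LSTM time step receives
    the flattened win_t-by-win_x window, advanced by `stride` samples.
    """
    n_samples, _ = seismic.shape
    half_x = win_x // 2
    # Edge-pad in time and space so every window is full size
    # (an assumption; the paper does not describe its boundary handling).
    padded = np.pad(seismic, ((0, win_t), (half_x, half_x)), mode="edge")
    steps = []
    for t0 in range(0, n_samples, stride):
        window = padded[t0:t0 + win_t, cdp:cdp + win_x]
        steps.append(window.ravel())   # one 50-dim vector per time step
    return np.stack(steps)             # shape (n_steps, win_t * win_x)
```

For a 301-sample trace with a 0.002 s interval, `stride=2` corresponds to a 0.004 s time step and yields 151 steps, matching the configuration quoted above.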
Figures 18b and 19c display the large-scale density model built by the proposed method. This model is quite different in structure from that built by the well extrapolation method, as compared in Figure 19. More importantly, the comparison of density curves shown in Figure 19d indicates that the built model of the proposed method is consistent with the true density model obtained from the well log. Specifically, the built model of the proposed method clearly depicts the structure and the value of the high-density anomaly in well 2. This consistency between the built model and the true density model is also verified by the PS/N and SSIM values in Table 5, which are 47.0095 dB and 0.9255, respectively. It is worth noting that the information of well 2 is not used in training but in testing; consequently, the above results are evidence that the trained deep learning architecture has good generalization ability in this experiment.

Table 4. The detailed deep architecture used in field data examples.

Uncertainty analysis on the training data set

In the preceding experiment, we use the well group that includes four wells (wells 1 and 3–5) to train our deep learning architecture. Herein, we investigate how different well combinations used in network training influence the performance of the proposed method.
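The PS/N values quoted above can be reproduced with a standard peak signal-to-noise ratio. The paper does not state its exact definition, so in this sketch the peak is taken as the maximum of the reference model, which is one common convention.

```python
import numpy as np

def ps_n(model, reference):
    """Peak S/N (dB) of a built model against the reference model,
    with the peak taken as the maximum of the reference (an assumption)."""
    mse = np.mean((model - reference) ** 2)
    return 10.0 * np.log10(reference.max() ** 2 / mse)
```

The companion SSIM values could be computed analogously with any standard structural-similarity implementation.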
DISCUSSION

In the preceding sections, the deep-learning-based method has been successfully applied to build large-scale density models, and its advantages over the well extrapolation method have been clearly demonstrated through theoretical analysis and numerical examples. In addition, uncertainty analyses on Gaussian noise in the seismic data and on the choice of wells used in network training have been conducted to verify the robustness of the proposed method. The main advantages of the proposed method can be briefly summarized as follows. First, it builds a large-scale density model from seismic data using a nonlinear relation between them described by a deep learning architecture that is established based on the LSTM network. This approach avoids the need for a large-scale velocity model and picked horizons, which some other methods depend on; that dependence weakens those methods because large-scale velocity model building and horizon picking are known to be challenging. Second, the proposed method uses randomly generated models to greatly enlarge the size and diversity of the training data set instead of using only the limited well logs for network training. This strategy significantly improves the performance of the proposed method in dealing with complex cases. Below, we address some important aspects of the proposed method.

Figure 20. The large-scale density models: (a and b) the model built by the proposed deep-learning-based method using well groups 1 and 2 for network training, respectively.
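The random-model strategy, which generates many 1D property profiles and laterally interpolates them into 2D panels, can be illustrated with the following sketch. Sampling layer values from the empirical distribution of well-log samples and using linear lateral interpolation are our assumptions; the probability-based generator of the paper (Figure 16) is not fully specified here, and the function and parameter names are hypothetical.

```python
import numpy as np

def random_2d_models(log_values, n_profiles=4, n_samples=301,
                     traces_between=100, rng=None):
    """Draw blocky 1D profiles whose layer values follow the empirical
    distribution of the well-log samples, then interpolate laterally
    between consecutive profiles to form a 2D random model."""
    if rng is None:
        rng = np.random.default_rng(0)
    profiles = []
    for _ in range(n_profiles):
        profile = np.empty(n_samples)
        t = 0
        while t < n_samples:
            thickness = int(rng.integers(10, 40))   # random layer thickness
            profile[t:t + thickness] = rng.choice(log_values)
            t += thickness
        profiles.append(profile)
    # endpoint=False avoids duplicating the shared profile at panel seams,
    # so 163 profiles would give 162 * traces_between = 16,200 traces.
    w = np.linspace(0.0, 1.0, traces_between, endpoint=False)[None, :]
    panels = [(1 - w) * a[:, None] + w * b[:, None]
              for a, b in zip(profiles[:-1], profiles[1:])]
    return np.hstack(panels)   # (n_samples, (n_profiles - 1) * traces_between)
```

With 163 profiles and 100 traces between neighbors, this construction reproduces the 16,200-trace count of the field-data training set described above.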
Potential limitations

The proposed deep-learning-based method builds a large-scale density model from seismic data based on the nonlinear relation between them. Consequently, its effectiveness has two prerequisite conditions. First, sufficient well logs are required by the proposed method to build a reliable nonlinear relation. In other words, the proposed method is not applicable to cases in which no well log is available. Second, the well-tie condition between the seismic data and the well logs is required by the proposed method to ensure that a reasonable nonlinear relation can be built.

Extension to 3D cases

The extension of the proposed method to 3D cases is straightforward because it uses a trace-by-trace strategy to build the large-scale density model, so that its applications in 2D and 3D cases are basically the same. However, one should pay attention to the following two aspects to ensure a successful 3D implementation. First, as introduced in the 2D cases, we use a sliding window to extract the seismic data inside it as the input of the LSTM unit for each time step.

CONCLUSION

Building an accurate large-scale density model is of great importance in exploration geophysics. We propose a deep-learning-based method to build such a model. This method builds the large-scale density model from seismic data based on the nonlinear relation between them described by a deep learning architecture, and the well logs are used to construct a training data set so that the deep learning architecture can learn such a nonlinear relation. The proposed method has two important characteristics. First, the LSTM network is used to make full use of the dynamic nature of seismic data and well logs. Second, and more important, the random models, which are generated based on the probabilities calculated from well logs, are used to greatly enlarge the size and diversity of the training data set, thereby significantly improving the ability of the proposed method to handle complex cases. Synthetic examples based on the modified Marmousi II model and field data examples display that the proposed method can build reasonable large-scale density models even though the well logs used in network training are limited, and it achieves superior performance: the built large-scale density model is 11.9666 dB and 0.6740 higher in PS/N and SSIM, respectively, than that of the well extrapolation method in the field data example. In addition, uncertainty analyses of the proposed method on Gaussian noise in the seismic data and on the choice of wells used in network training are conducted to display the robustness of the proposed method. In the future, we will extend the current work to 3D applications and also investigate its implementation in building models for other reservoir parameters.

ACKNOWLEDGMENTS

The authors gratefully thank the editors and three anonymous reviewers for their comments, which have greatly helped to improve the quality of this paper. The authors also thank the National Natural Science Foundation of China under grant no. 41804113, the National Postdoctoral Program for Innovative Talents under grant no. BX201700193, the National Key R&D Program of the Ministry of Science and Technology of China under grant nos. 2018YFC1504200 and 2018YFC0603501, and the National Science and Technology Major Project under grant nos. 2016ZX05024-001-007 and 2017ZX05069 for their financial support.

DATA AND MATERIALS AVAILABILITY

Data related to the synthetic examples are available and can be obtained by contacting the corresponding author. Data related to the field data examples are confidential and cannot be released.

REFERENCES

Abadi, M., A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, 2015, TensorFlow: Large-scale machine learning on heterogeneous systems. (Software available from tensorflow.org).
Agudo, Ò. C., N. V. da Silva, M. Warner, T. Kalinicheva, and J. Morgan, 2018, Addressing viscous effects in acoustic full-waveform inversion: Geophysics, 83, no. 6, R611–R628, doi: 10.1190/geo2018-0027.1.
Ahmadi, M. A., S. Zendehboudi, A. Lohi, A. Elkamel, and I. Chatzis, 2013, Reservoir permeability prediction by neural networks combined with hybrid genetic algorithm and particle swarm optimization: Geophysical Prospecting, 61, 582–598, doi: 10.1111/j.1365-2478.2012.01080.x.
Araya-Polo, M., J. Jennings, A. Adler, and T. Dahlke, 2018, Deep-learning tomography: The Leading Edge, 37, 58–66, doi: 10.1190/tle37010058.1.
Ardjmandpour, N., C. Pain, J. Singer, J. Saunders, E. Aristodemou, and J. Carter, 2011, Artificial neural network forward modelling and inversion of electrokinetic logging data: Geophysical Prospecting, 59, 721–748, doi: 10.1111/j.1365-2478.2010.00935.x.
Biswas, R., M. K. Sen, V. Das, and T. Mukerji, 2019, Prestack and poststack inversion using a physics-guided convolutional neural network: Interpretation, 7, no. 3, SE161–SE174, doi: 10.1190/INT-2018-0236.1.
Boateng, C. D., L.-Y. Fu, W. Yu, and G. Xizhu, 2017, Porosity inversion by Caianiello neural networks with Levenberg-Marquardt optimization: Interpretation, 5, no. 3, SL33–SL42, doi: 10.1190/INT-2016-0119.1.
Causse, E., R. Mittet, and B. Ursin, 1999, Preconditioning of full-waveform inversion in viscoacoustic media: Geophysics, 64, 130–145, doi: 10.1190/1.1444510.
Cersósimo, D. S., C. L. Ravazzoli, and R. G. Martínez, 2016, Prediction of lateral variations in reservoir properties throughout an interpreted seismic horizon using an artificial neural network: The Leading Edge, 35, 265–269, doi: 10.1190/tle35030265.1.
Chaki, S., A. Routray, and W. K. Mohanty, 2015, A novel preprocessing scheme to improve the prediction of sand fraction from seismic attributes using neural networks: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 8, 1808–1820, doi: 10.1109/JSTARS.2015.2404808.
Chen, X., Y. Du, Z. Liu, W. Zhao, and X. Chen, 2018, Inversion of density interfaces using the pseudo-backpropagation neural network method: Pure and Applied Geophysics, 175, 4427–4447, doi: 10.1007/s00024-018-1889-7.
Forgues, E., and G. Lambaré, 1997, Parameterization study for acoustic and elastic ray plus Born inversion: Journal of Seismic Exploration, 6, 253–277.
Gao, Z., S. Hu, C. Li, H. Chen, J. Gao, and Z. Xu, 2020a, Reflectivity inversion of nonstationary seismic data with deep learning based data correction: 82nd Annual International Conference and Exhibition, EAGE, Extended Abstracts, doi: 10.3997/2214-4609.202011195.
Gao, Z., C. Li, T. Yang, Z. Pan, J. Gao, and Z. Xu, 2020b, OMMDE-Net: A deep learning-based global optimization method for seismic inversion: IEEE Geoscience and Remote Sensing Letters, Early Access, doi: 10.1109/LGRS.2020.2973266.
Gao, Z., Z. Pan, C. Zuo, J. Gao, and Z. Xu, 2019, An optimized deep network representation of multimutation differential evolution and its application in seismic inversion: IEEE Transactions on Geoscience and Remote Sensing, 57, 4720–4734, doi: 10.1109/TGRS.2019.2892567.
Gardner, G. H. F., L. W. Gardner, and A. R. Gregory, 1974, Formation velocity and density — the diagnostic basics for stratigraphic traps: Geophysics, 39, 770–780, doi: 10.1190/1.1440465.
Hochreiter, S., and J. Schmidhuber, 1997, Long short-term memory: Neural Computation, 9, 1735–1780, doi: 10.1162/neco.1997.9.8.1735.
Kahrizi, A., and H. Hashemi, 2014, Neuron curve as a tool for performance evaluation of MLP and RBF architecture in first break picking of seismic data: Journal of Applied Geophysics, 108, 159–166, doi: 10.1016/j.jappgeo.2014.06.012.
Keynejad, S., M. L. Sbar, and R. A. Johnson, 2017, Comparison of model-based generalized regression neural network and prestack inversion in predicting Poisson's ratio in Heidrun Field, North Sea: The Leading Edge, 36, 938–946, doi: 10.1190/tle36110938.1.
Kingma, D. P., and J. Ba, 2014, Adam: A method for stochastic optimization: arXiv preprint arXiv:1412.6980.
Konaté, A. A., H. Pan, S. Fang, S. Asim, Y. Z. Yao, C. Deng, and N. Khan, 2015, Capability of self-organizing map neural network in geophysical log data classification: Case study from the CCSD-MH: Journal of Applied Geophysics, 118, 37–46, doi: 10.1016/j.jappgeo.2015.04.004.
Lewis, W., and D. Vigh, 2017, Deep learning prior models from seismic images for full-waveform inversion: 87th Annual International Meeting, SEG, Expanded Abstracts, 1512–1517, doi: 10.1190/segam2017-17627643.1.
Maiti, S., and R. K. Tiwari, 2010, Automatic discriminations among geophysical signals via the Bayesian neural networks approach: Geophysics, 75, no. 1, E67–E78, doi: 10.1190/1.3298501.
Margrave, G. F., M. P. Lamoureux, and D. C. Henley, 2011, Gabor deconvolution: Estimating reflectivity by nonstationary deconvolution of seismic data: Geophysics, 76, no. 3, W15–W30, doi: 10.1190/1.3560167.
Motaz, A., and A. Ghassan, 2018, Petrophysical-property estimation from seismic data using recurrent neural networks: 88th Annual International Meeting, SEG, Expanded Abstracts, 2141–2146, doi: 10.1190/segam2018-2995752.1.
Nielsen, M. A., 2015, Neural networks and deep learning: Determination Press.
Poulton, M. M., 2002, Neural networks as an intelligence amplification tool: A review of applications: Geophysics, 67, 979–993, doi: 10.1190/1.1484539.
Qian, F., M. Yin, X.-Y. Liu, Y.-J. Wang, C. Lu, and G.-M. Hu, 2018, Unsupervised seismic facies analysis via deep convolutional autoencoders: Geophysics, 83, no. 3, A39–A43, doi: 10.1190/geo2017-0524.1.
Robinson, E. A., 1967, Predictive decomposition of time series with application to seismic exploration: Geophysics, 32, 418–484, doi: 10.1190/1.1439873.
Rumelhart, D. E., G. E. Hinton, and R. J. Williams, 1986, Learning representations by back-propagating errors: Nature, 323, 533–536, doi: 10.1038/323533a0.
Sui, Y., and J. Ma, 2019, A nonstationary sparse spike deconvolution with anelastic attenuation: Geophysics, 84, no. 2, R221–R234, doi: 10.1190/geo2017-0846.1.
Tarantola, A., 1986, A strategy for nonlinear elastic inversion of seismic reflection data: Geophysics, 51, 1893–1903, doi: 10.1190/1.1442046.
Ursenbach, C. P., 2005, Generalized Gardner relations: 75th Annual International Meeting, SEG, Expanded Abstracts, 1885–1888, doi: 10.1190/1.1817057.
van der Baan, M., and C. Jutten, 2000, Neural networks in geophysical applications: Geophysics, 65, 1032–1047, doi: 10.1190/1.1444797.
Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics: Geophysics, 74, no. 6, WCC1–WCC26, doi: 10.1190/1.3238367.
Wang, L., J. Gao, W. Zhao, and X. Jiang, 2013, Enhancing resolution of nonstationary seismic data by molecular-Gabor transform: Geophysics, 78, no. 1, V31–V41, doi: 10.1190/geo2011-0450.1.
Wit, R. W. L. D., A. P. Valentine, and J. Trampert, 2013, Bayesian inference of Earth's radial seismic structure from body-wave traveltimes using neural networks: Geophysical Journal International, 195, 408–422, doi: 10.1093/gji/ggt220.

Biographies and photographs of the authors are not available.