Multi-Scale T-Matrix Completion Method For FWI in The Absence of A Good Starting Model

Morten Jakobsen (University of Bergen) and Ru-Shan Wu (University of California, Santa Cruz)
SUMMARY

Researchers in the mathematical physics community have recently proposed a conceptually new method for solving nonlinear inverse scattering problems, inspired by the theory of nonlocality of physical interactions. This method, which may be referred to as the T-matrix completion (TMC) method, is of particular interest because it is not based on linearization at any stage. The TMC method rests on the observation that the scattered wavefield one can observe at the surface of the Earth is linearly related to the so-called T-matrix, which, in turn, is related to the unknown scattering potential by an integral equation of the Lippmann-Schwinger type, independent of the source-receiver configuration. The main physical requirements are that the T-matrix should be data-compatible and that the scattering potential operator should be dominantly local, although a non-local scattering potential operator is allowed in the intermediate iterations. The convergence radius of the original TMC method is seriously restricted by its use of single-frequency scattering data only. In this study, we have therefore developed a multi-scale version of the TMC method with much better convergence properties, as required for seismic FWI in the absence of a good starting model. Essentially, we use a simplified real-space version of the original TMC method within two nested loops over angular frequency and the target degree of non-locality. The loop over angular frequency provides a first layer of multi-scale regularization similar to the standard sequential frequency inversion method. The second loop over the target degree of non-locality provides a second layer of multi-scale regularization, which is particularly important in the absence of a good starting model and/or ultra-low frequencies. A formal proof of convergence is still lacking, but the results we have obtained in a series of numerical experiments based on synthetic data for the strongly scattering SEG/EAGE salt model are very encouraging.

INTRODUCTION

The full waveform inversion (FWI) method often produces images of higher quality and resolution than conventional seismic amplitude and/or traveltime analysis. However, there are still several challenges related to this comprehensive and emerging imaging method, including its huge computational cost and sensitivity to the starting model (Virieux and Operto, 2009). Traditional FWI is often implemented with a gradient-based local optimization method (e.g., Pratt and Worthington, 1990; Pratt, 1999; Sirgue and Pratt, 2004). A faster convergence rate may be achieved by using Newton-based methods (Pratt et al., 1998; Pratt, 1999), at the expense of increasing the computational cost. Since conventional FWI is based on the use of local optimization methods for solving a nonlinear optimization problem, it should not come as a surprise that there are often convergence problems associated with the application of FWI to strongly scattering media (e.g., sub-salt structures).

Multi-scale regularization has been developed to reduce the starting model dependence (Bunks et al., 1995; Pratt et al., 1998; Pratt, 1999; Sirgue and Pratt, 2004). The method is a cascade inversion starting from the lowest frequency available for the recovery of the largest scale possible. The recent development of low-frequency land sources (down to 1.5 Hz) has allowed multi-scale FWI to use a 1-D smooth starting model (Baeten et al., 2013). These authors showed that the lowest frequency band (1.5-2.0 Hz) is crucial in recovering the correct long-wavelength background velocity structure. However, in general, ultra-low frequency sources are very expensive and usually not available. Therefore, a good starting model is still required for conventional FWI.

Shin and Cha (2009) developed a method for waveform inversion in the Laplace-Fourier domain that can be used independently or to generate a satisfactory starting model for conventional FWI. There are also methods which operate with scattering angle filters or combinations of waveform inversion with traveltime inversion, wave-equation migration analysis or tomography (see Luo and Wu, 2014). In addition, the seismic envelope inversion (SEI) method has been introduced as a tool for nonlinearly separating the response of large-scale structures from that of fine-scale structures (Luo and Wu, 2014). However, it is not clear if the hybrid SEI + FWI will always converge in the presence of strong contrasts (e.g., related to salt structures).

Seismic FWI is essentially a nonlinear inverse scattering problem similar to those in physics (Pike and Sabatier, 2002; Weglein et al., 2003; Weglein, 2013; Jakobsen and Ursin, 2015; Wu and Zheng, 2014; Wu et al., 2015a,b). A concept from quantum field theory called renormalization has recently been used to derive convergent scattering series (Lesage et al., 2014; Jakobsen and Wu, 2016a) and to justify and develop the envelope inversion method (see Wu et al., 2014; Wu and Luo, 2015). The so-called renormalization group has also been used to obtain a new interpretation of the well-known Born and Rytov approximations (Kirkinis, 2008) that are often used for the calculation of sensitivities in traditional FWI. The multi-scale aspects of the present work are related to renormalization.

Inspired by the theory of nonlocality of physical interactions, Levinson and Markel (2016a,b) proposed a conceptually new T-matrix completion (TMC) method for solving nonlinear inverse scattering problems. In contrast with established methods for FWI and nonlinear inverse scattering (Virieux and Operto, 2009; Jakobsen and Ursin, 2015), the TMC method is not based on linearization at any stage. Since the more direct TMC method does not involve any objective function, the problem of convergence to local minima has apparently been eliminated. However, since the original TMC method is based on the use of single-frequency data only, there are convergence problems in strongly scattering media. Based on the work of Jakobsen and Wu (2016c), we have developed a multi-scale version of the TMC method, which seems to converge even in the absence of a good starting model or ultra-low frequencies.
© 2017 SEG Page 1274

SEG International Exposition and 87th Annual Meeting
FWI via T-matrix completion
SINGLE-FREQUENCY T-MATRIX COMPLETION

The scattered wavefield

In the frequency domain, the scalar wave equation reduces to the Helmholtz equation, which can be solved using the concept of Green's functions (Jakobsen and Ursin, 2015). By decomposing the actual medium with velocity field $c(\mathbf{x})$ into an arbitrary background medium with velocity field $c^{(0)}(\mathbf{x})$ and a corresponding perturbation, one can show that the corresponding Green's functions $G(\mathbf{x}, \mathbf{x}')$ and $G^{(0)}(\mathbf{x}, \mathbf{x}')$ for the actual and reference media are related by a volume integral equation of the Lippmann-Schwinger type (Jakobsen, 2012). After spatial discretization, this integral equation can be represented by two coupled matrix equations (Jakobsen and Ursin, 2015):

$$G_{RS} = G_{RS}^{(0)} + G_{RV}^{(0)} V G_{VS} \tag{1}$$

$$G_{VS} = G_{VS}^{(0)} + G_{VV}^{(0)} V G_{VS} \tag{2}$$

Here, $G_{VS}^{(0)}$, $G_{VV}^{(0)}$ and $G_{RV}^{(0)}$ are the so-called source-volume, volume-volume and volume-receiver Green's function matrices, and $V$ is the scattering potential matrix (associated with perturbations in the squared slowness). Following Jakobsen (2012), we now define the T-matrix by

$$V G_{VS} = T G_{VS}^{(0)}. \tag{3}$$

The T-matrix accounts for all nonlinear effects of multiple scattering (Jakobsen, 2012; Jakobsen and Ursin, 2015). From equations (2) and (3) and the fact that the background medium is arbitrary, we obtain (Jakobsen and Ursin, 2015)

$$T = V + V G_{VV}^{(0)} T. \tag{4}$$

It follows from equations (1) and (3) that

$$\delta G_{RS} \equiv G_{RS} - G_{RS}^{(0)} = G_{RV}^{(0)} T G_{VS}^{(0)}. \tag{5}$$

Since the observable scattered wavefield is given by the product of $\delta G_{RS}$ and the source wavelet (Jakobsen and Ursin, 2015), we regard $\delta G_{RS}$ as data when the source wavelet is known. Equations (5) and (4) suggest that the nonlinear inverse scattering problem can potentially be solved in two steps: (1) First one performs a linear inversion of $\delta G_{RS}$ for the T-matrix. (2) Then one solves equation (4) for the scattering potential under the assumption that the T-matrix is known or determined via an experiment (see Figure 1).

Figure 1: Forward (direct) vs inverse scattering.

The first estimate of T and V

Since the determination of the T-matrix from the scattered wavefield data in equation (5) represents a highly ill-posed linear inverse problem, the two-stage inversion method outlined in the previous section will not work. In what follows, we shall describe a simplified version of the single-frequency T-matrix completion method of Levinson and Markel (2016a,b). If the scattered wavefield $\delta G_{RS}$ is assumed known (observed), then a first estimate of the T-matrix can be obtained from (Levinson and Markel, 2016a,b)

$$T^{(1)} = G_{VR}^{(0)}\, \delta G_{RS}\, G_{SV}^{(0)} \tag{6}$$

where

$$G_{VR}^{(0)} \equiv \left(G_{RV}^{(r)}\right)^{+}, \qquad G_{SV}^{(0)} \equiv \left(G_{VS}^{(r)}\right)^{+} \tag{7}$$

and the $+$ symbol denotes the Moore-Penrose pseudoinverse (Levinson and Markel, 2016a,b). Equation (6) can be regarded as a transformation of the scattered wavefield data from the data domain to the image domain (see Figure 1). In our direct nonlinear inversion method, the data are used only once; that is, to calculate the experimental T-matrix from the scattered wavefield residuals using equation (6). Given the experimental T-matrix, one can reconstruct the unknown scattering potential using an iterative process that involves the intermediate use of a non-local scattering potential operator. The experimental T-matrix can be used to find a first estimate of the scattering potential, denoted by $V^{(1)}$. By inserting the experimental T-matrix into the exact relation (4) between the scattering potential $V$ and the T-matrix and solving for $V$, one obtains a first estimate of the unknown scattering potential:

$$V^{(1)} = \left(I + T^{(1)} G_{VV}^{(0)}\right)^{-1} T^{(1)} \tag{8}$$

If the first estimate $T^{(1)}$ of the T-matrix were exactly equal to the true T-matrix for the unknown true model, then the first estimate $V^{(1)}$ would be exactly equal to the true scattering potential, which is perfectly diagonal. However, the first estimate of the T-matrix will generally be very different from the true T-matrix, so it is clear that $V^{(1)}$ will generally be nonlocal. The use of a nonlocal scattering potential in the intermediate steps of an iterative algorithm is satisfactory from a mathematical point of view. However, our inversion algorithm should output a local scattering potential. The local scattering potential model corresponding to $V^{(1)}$ is given by

$$D^{(1)} = \mathrm{diag}\left(V^{(1)}\right). \tag{9}$$

In order to check whether the estimated local scattering potential $D^{(1)}$ is a physically satisfactory solution, one can calculate the distance between the nondiagonal and diagonal matrices $V^{(1)}$ and $D^{(1)}$ using the following (normalized) norm:

$$\eta^{(1)} = \frac{\left|V^{(1)} - D^{(1)}\right|}{\left|V^{(1)}\right|}. \tag{10}$$

If $\eta^{(1)}$ is significantly smaller than 1, then the estimated scattering potential is considered dominantly diagonal; that is, more or less satisfactory from a physical point of view.
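Equations (6) to (10) amount to a handful of dense linear-algebra operations. The NumPy sketch below is an illustration under stated assumptions, not the authors' implementation: the function name `first_estimate` and its argument names are hypothetical, the pseudoinverses of equation (7) are taken of the background matrices passed in, $V^{(1)}$ is obtained by solving equation (4) for $V$, which gives $V = (I + T G_{VV}^{(0)})^{-1} T$, and the norm in equation (10) is assumed to be the Frobenius norm.

```python
import numpy as np

def first_estimate(dG_RS, G0_RV, G0_VS, G0_VV):
    """Hypothetical helper: first estimates of T, V, D and eta at one frequency.

    dG_RS : (Nr, Ns) scattered-field data, eq. (5)
    G0_RV : (Nr, N)  volume-to-receiver background Green's matrix
    G0_VS : (N, Ns)  source-to-volume background Green's matrix
    G0_VV : (N, N)   volume-to-volume background Green's matrix
    """
    # Eq. (7): Moore-Penrose pseudoinverses (assumed taken of the inputs as given)
    G0_VR = np.linalg.pinv(G0_RV)            # shape (N, Nr)
    G0_SV = np.linalg.pinv(G0_VS)            # shape (Ns, N)

    # Eq. (6): map the data to an experimental T-matrix
    T1 = G0_VR @ dG_RS @ G0_SV               # shape (N, N)

    # Eq. (4) solved for V: (I + T G) V = T
    I = np.eye(T1.shape[0])
    V1 = np.linalg.solve(I + T1 @ G0_VV, T1)

    # Eq. (9): keep only the local (diagonal) part
    D1 = np.diag(np.diag(V1))

    # Eq. (10): normalized off-diagonal weight (Frobenius norm is an assumption)
    eta1 = np.linalg.norm(V1 - D1) / np.linalg.norm(V1)
    return T1, V1, D1, eta1
```

If $G_{RV}^{(0)}$ has full column rank and $G_{VS}^{(0)}$ has full row rank, the pseudoinverses recover the true T-matrix exactly and $\eta^{(1)}$ vanishes for a diagonal true potential. With fewer independent source-receiver pairs than pixels, the first estimate is generally non-local.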

The second estimate of T and V

In order to decrease the degree of nonlocality, one can try to iterate between $V$ and $T$ by making use of the exact relations between these two operators or matrices given in Figure 1, but always using the diagonal part of the estimated scattering potential. Thus, the second estimate of the T-matrix is given by

$$T^{(2)} = \left(I - D^{(1)} G_{VV}^{(0)}\right)^{-1} V^{(1)} \tag{11}$$

The first estimate of the T-matrix was clearly compatible with the scattered wavefield data, since it was obtained by using the Moore-Penrose inverse directly in conjunction with the data equation. However, the second estimate of the T-matrix will generally not be compatible with the observed data. Therefore, one needs to add a correction term $\Delta T^{(2)}$ to the second estimate of the T-matrix in order to be consistent with the data:

$$T^{(2)} \rightarrow \tilde{T}^{(2)} = T^{(2)} + \Delta T^{(2)} \tag{12}$$

Data compatibility implies that

$$\delta G_{RS} = G_{RV}^{(0)} \tilde{T}^{(2)} G_{VS}^{(0)} \tag{13}$$

If one multiplies the above equation with the Moore-Penrose inverse matrices $G_{VR}^{(0)}$ and $G_{SV}^{(0)}$ from the left and right, respectively, then one can verify that the correction term is given by

$$\Delta T^{(2)} = T^{(1)} - G_{VRRV}^{(0)}\, T^{(2)}\, G_{VSSV}^{(0)} \tag{14}$$

where

$$G_{VRRV}^{(0)} \equiv G_{VR}^{(0)} G_{RV}^{(0)}, \qquad G_{VSSV}^{(0)} \equiv G_{VS}^{(0)} G_{SV}^{(0)} \tag{15}$$

are $N \times N$ dimensional matrices, where $N$ is the number of pixels in the seismic image.

The kth estimate of T and V

Following Levinson and Markel (2016a,b), we consider the case when the iterations start from an initial guess. We first set $k = 1$ and use equation (6) to calculate $T^{(k)} = T^{(1)}$. We then estimate the kth approximation to the scattering potential (Step 1):

$$V^{(k)} = T^{(k)} \left(I + G_{VV}^{(0)} T^{(k)}\right)^{-1}.$$

If $V^{(k)}$ is not sufficiently local/diagonal, we regard it as unphysical and extract its diagonal (Step 2):

$$D^{(k)} = \mathrm{diag}\left(V^{(k)}\right).$$

If the reconstructed scattering potential is dominantly local, in the sense that $\eta^{(k)} \equiv |V^{(k)} - D^{(k)}| / |V^{(k)}| \ll \eta_{\mathrm{target}}$, then we exit the iterative loop. Otherwise, we proceed to (Step 3):

$$T'^{(k)} = \left(I - V^{(k)} G_{VV}^{(0)}\right)^{-1} V^{(k)}.$$

Unlike $T^{(k)}$, the new estimate $T'^{(k)}$ is no longer data compatible. Therefore, we add a correction to make the T-matrix data compatible and advance the iterative index $k$ by one (Step 4):

$$T^{(k+1)} = T^{(1)} + T'^{(k)} - G_{VRRV}^{(0)}\, T'^{(k)}\, G_{VSSV}^{(0)}.$$

Finally, we return to Step 1. The above workflow, which is illustrated in Figure 2, represents a simplified real-space representation of the original TMC algorithm of Levinson and Markel (2016a,b). We have found that the original TMC method can be applied to seismic FWI if and only if it is supplemented by some kind of multi-scale regularization.

MULTI-SCALE T-MATRIX COMPLETION

In our multi-scale TMC method, the single-frequency TMC algorithm described above is placed within two nested loops over the angular frequency $\omega$ and the degree of non-locality $\eta$. The background medium Green's functions are updated after each iteration within the two nested loops over $\omega$ and $\eta$. The outer loop over $\omega$ provides a first layer of regularization similar to the standard sequential frequency inversion method of Pratt (1999). The inner loop over $\eta$ represents an additional layer of regularization, which is particularly important in the absence of a good starting model and/or ultra-low frequencies. In the inner loop over $\eta$, the target degree of non-locality $\eta_{\mathrm{target}}$ is gradually decreased from a relatively high value (corresponding to a very non-local scattering potential) to a much smaller value (corresponding to a dominantly local scattering potential). By allowing for a non-local scattering potential in the intermediate iterations, we effectively increase the number of degrees of freedom, so that we avoid cycle-skipping problems, even when the starting frequency is so high that the standard sequential frequency inversion method is guaranteed to fail.

NUMERICAL EXAMPLE

We consider a resampled SEG/EAGE salt model (2D) which is 2088 m wide and 432 m deep. We assume 174 sources and 174 receivers uniformly located along a single line at the surface. Since the grid size is 24 m in each direction, there are 87 and 18 grid blocks in the horizontal and vertical directions, respectively. We assume a Ricker wavelet with a central frequency equal to 7.5 Hz. We apply the multi-scale TMC method to a set of synthetic frequency-domain waveform data without noise. In Figure 3, one can see that the inverted model is converging towards the true model.

CONCLUDING REMARKS

We have developed a conceptually new method for FWI in the absence of a good starting model and/or ultra-low frequencies, which is based on the use of a real-space T-matrix completion algorithm within two nested (multi-scale regularization) loops over the angular frequency and the degree of non-locality of the physical interaction. A formal proof of convergence is still lacking, but the new method has given encouraging results in numerical experiments based on synthetic data. The computational cost of our new multi-scale TMC method is relatively high, but it may in principle be significantly reduced via the use of domain decomposition methods (e.g., Jakobsen and Wu, 2016b; Wang et al., 2016) and/or lowest-frequency inversion on a coarser grid for generation of starting models for conventional FWI.

The theoretical work presented here can in principle be generalized to anisotropic elastic media (see Jakobsen and Ursin, 2015; Jakobsen et al., 2015).

Acknowledgments The work of Morten Jakobsen is associated with Petromaks 2 project 228357 funded by the Norwegian Research Council. Ru-Shan Wu acknowledges support from the sponsors of the WTOPI consortium at the UCSC.

Figure 2: Workflow for single-frequency T-matrix completion. The multi-scale TMC algorithm was designed by locating the
single-frequency TMC within two nested loops over frequency and the degree of non-locality of the physical interaction. The exact
formal expressions for the T and V matrices can be replaced by direct and inverse cluster expansions or thin-slab propagators.
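The single-frequency workflow (Steps 1 to 4) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function name `tmc_single_frequency` and the `max_iter` safeguard are hypothetical, the Frobenius norm is assumed for $\eta$, the exit test is written as a plain `eta < eta_target`, and Step 3 is evaluated with the diagonal part $D^{(k)}$, in line with the remark that the iteration always uses the diagonal part of the estimated scattering potential.

```python
import numpy as np

def tmc_single_frequency(dG_RS, G0_RV, G0_VS, G0_VV, eta_target, max_iter=100):
    """Sketch of Steps 1-4 at one frequency; returns (D, eta, k)."""
    G0_VR = np.linalg.pinv(G0_RV)        # pseudoinverse, shape (N, Nr)
    G0_SV = np.linalg.pinv(G0_VS)        # pseudoinverse, shape (Ns, N)
    G0_VRRV = G0_VR @ G0_RV              # N x N matrix of eq. (15)
    G0_VSSV = G0_VS @ G0_SV              # N x N matrix of eq. (15)
    I = np.eye(G0_VV.shape[0])

    T1 = G0_VR @ dG_RS @ G0_SV           # experimental T-matrix, eq. (6)
    T = T1
    for k in range(1, max_iter + 1):
        # Step 1: V = T (I + G T)^-1, computed in the equivalent form (I + T G)^-1 T
        V = np.linalg.solve(I + T @ G0_VV, T)
        # Step 2: local (diagonal) part and degree of non-locality
        D = np.diag(np.diag(V))
        eta = np.linalg.norm(V - D) / np.linalg.norm(V)
        if eta < eta_target:
            break
        # Step 3 (ASSUMPTION: uses D, the diagonal part of V)
        T_prime = np.linalg.solve(I - D @ G0_VV, D)
        # Step 4: restore data compatibility
        T = T1 + T_prime - G0_VRRV @ T_prime @ G0_VSSV
    return D, eta, k
```

The dense solves cost $O(N^3)$ per iteration, which is consistent with the relatively high computational cost noted in the concluding remarks.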
Figure 3: Numerical example of multi-scale TMC. The left column shows the inversion results we have obtained at 3 Hz by
gradually searching for an increasingly local scattering potential model. The right column shows the results of a subsequent
sequential frequency inversion. The starting model was laterally homogeneous and involved a linear increase of the velocity with
increasing depth from 2000 m/s to 4500 m/s; that is, very different from the true model.
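The nested-loop schedule of the multi-scale method (outer loop over frequency, inner loop over a decreasing target degree of non-locality, background Green's functions recomputed at every pass) can be outlined as below. This is only a structural sketch: every callable (`observed`, `greens`, `tmc`, `update`) is a hypothetical placeholder supplied by the caller, and the mapping from the diagonal potential to a model update is deliberately left to `update` because its scaling is not specified here.

```python
import numpy as np

def multiscale_tmc(freqs_hz, eta_targets, model0, observed, greens, tmc, update):
    """Structural sketch of the two nested multi-scale loops.

    observed(f)            -> measured G_RS at frequency f (hypothetical)
    greens(omega, model)   -> (G0_RS, G0_RV, G0_VS, G0_VV) for the current model
    tmc(dG, G0_RV, G0_VS, G0_VV, eta_target) -> diagonal potential D
    update(model, D, omega) -> new background model
    """
    model = np.array(model0, dtype=float)
    history = []
    for f in freqs_hz:                       # outer loop: low to high frequency
        omega = 2.0 * np.pi * f
        for eta_target in eta_targets:       # inner loop: decreasing non-locality
            # Background Green's functions are rebuilt for the current model
            G0_RS, G0_RV, G0_VS, G0_VV = greens(omega, model)
            dG_RS = observed(f) - G0_RS      # scattered field relative to background
            D = tmc(dG_RS, G0_RV, G0_VS, G0_VV, eta_target)
            model = update(model, D, omega)
            history.append((f, eta_target))
    return model, history
```

For the 87 by 18 grid of the numerical example, $N = 87 \times 18 = 1566$ pixels, so each T-matrix is a dense $1566 \times 1566$ complex array (about 2.45 million entries).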

REFERENCES
Baeten, G., J. W. de Maag, R. E. Plessix, M. Klaassen, T. Qureshi, M. Kleemeyer, F. ten Kroode, and R.
Zhang, 2013, The use of the low frequencies in a full waveform inversion and impedance
inversion land seismic case study: Geophysical Prospecting, 61, 701–711,
http://doi.org/10.1111/1365-2478.12010.
Bunks, C., F. M. Saleck, S. Zaleski, and G. Chavent, 1995, Multiscale seismic waveform inversion:
Geophysics, 60, 1457–1473, https://doi.org/10.1190/1.1443880.
Jakobsen, M., 2012, T-matrix approach to seismic forward modelling in the acoustic approximation:
Studia Geophysica et Geodaetica, 56, 1–20, http://doi.org/10.1007/s11200-010-9081-2.
Jakobsen, M., and B. Ursin, 2015, Full waveform inversion in the frequency domain using direct iterative
T-matrix methods: Journal of Geophysics and Engineering, 12, 400–418,
https://doi.org/10.1088/1742-2132/12/3/400.
Jakobsen, M., I. Pilskog, and M. Lopez, 2015, Generalized T-matrix approach to seismic modeling in
fractured reservoirs and related anisotropic systems: 77th Annual International Conference and
Exhibition, EAGE, Extended Abstracts, http://doi.org/10.3997/2214-4609.201412933.
Jakobsen, M., and R. S. Wu, 2016a, Renormalized scattering series for frequency-domain waveform
modelling of strong velocity contrasts: Geophysical Journal International, 206, 880–899,
https://doi.org/10.1093/gji/ggw169.
Jakobsen, M., and R. S. Wu, 2016b, Domain decomposition method for efficient waveform inversion in
strongly scattering media: 86th Annual International Meeting, SEG, Expanded Abstracts, 1395–
1399, https://doi.org/10.1190/segam2016-13951062.1.
Jakobsen, M., and R. S. Wu, 2016c, Direct nonlinear inversion by multi-frequency T-matrix completion:
AGU Annual Fall meeting, Abstract and oral presentation.
Kirkinis, E., 2008, Renormalization group interpretation of the Born and Rytov approximations: Journal
of the Optical Society of America A, 25, 2499–2508, https://doi.org/10.1364/JOSAA.25.002499.
Lesage, A. C., J. Yao, F. Hussain, and D. J. Kouri, 2014, Multi-dimensional inverse scattering series
using the Volterra renormalization of the Lippmann-Schwinger equation: 84th Annual
International Meeting, SEG, Expanded Abstracts, 3118–3122,
https://doi.org/10.1190/segam2014-1349.1.
Levinson, H. W., and V. A. Markel, 2016a, Solution to the inverse scattering problem by T-matrix
completion. I. Theory: Physical Review E, 94, 043317,
https://doi.org/10.1103/PhysRevE.94.043317.
Levinson, H. W., and V. A. Markel, 2016b, Solution to the inverse scattering problem by T-matrix
completion. II. Simulations: Physical Review E, 94, 043318,
https://doi.org/10.1103/PhysRevE.94.043318.
Luo, J., and R. S. Wu, 2014, Nonlinear scale separation and misfit configuration of envelope inversion:
84th Annual International Meeting, SEG, Expanded Abstracts, 1216–1221.
Pike, R., and P. Sabatier, 2002, Scattering and inverse scattering in pure and applied science: Academic
Press.
Pratt, R. G., and M. H. Worthington, 1990, Inverse-theory applied to multisource cross-hole tomography:
Acoustic wave equation method: Geophysical Prospecting, 38, 287–310,
https://doi.org/10.1111/j.1365-2478.1990.tb01846.x.

Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, part 1: Theory and verification
in a physical scale model: Geophysics, 64, 888–901, https://doi.org/10.1190/1.1444597.
Pratt, R. G., C. Shin, and G. J. Hicks, 1998, Gauss-Newton and full Newton method in frequency-space
seismic waveform inversion: Geophysical Journal International, 133, 341–362,
https://doi.org/10.1046/j.1365-246X.1998.00498.x.
Shin, C., and Y. Cha, 2009, Waveform inversion in the Laplace-Fourier domain: Geophysical Journal
International, 177, 1067–1079, https://doi.org/10.1111/j.1365-246X.2009.04102.x.
Sirgue, L., and R. G. Pratt, 2004, Efficient waveform inversion and imaging: A strategy for selecting
temporal frequencies: Geophysics, 69, 231–248, https://doi.org/10.1190/1.1649391.
Virieux, J., and S. Operto, 2009, An overview of full waveform inversion in exploration geophysics:
Geophysics, 74, no. 6, WCC127–WCC152, https://doi.org/10.1190/1.3238367.
Weglein, A. B., 2013, A timely and necessary antidote to indirect methods and so-called P-wave FWI:
The Leading Edge, 32, 1192–1204, https://doi.org/10.1190/tle32101192.1.
Weglein, A. B., F. V. Araujo, P. M. Carvalho, R. H. Stolt, K. H. Matson, R. T. Coates, D. Corrigan, D. J.
Foster, and S. A. Shaw, 2003, Inverse scattering series and seismic exploration: Inverse
Problems, 19, R27–R83, https://doi.org/10.1088/0266-5611/19/6/R01.
Wu, R. S., B. Wang, and C. Hu, 2015a, Renormalized nonlinear sensitivity kernel and inverse thin-slab
propagator in T-matrix formalism for wave-equation tomography: Inverse Problems, 31, 115004,
https://doi.org/10.1088/0266-5611/31/11/115004.
Wu, R. S., B. Wang, and M. Jakobsen, 2015b, Green's function and T-matrix reconstruction using surface
data for direct nonlinear inversion: 85th Annual International Meeting, SEG, Expanded Abstracts,
1286–1291, https://doi.org/10.1190/segam2015-5926527.1.
Wu, R. S., and J. Luo, 2015, Nonlinear scale separation and a renormalization interpretation in seismic
envelope inversion: AGU meeting, Abstract.
Wu, R. S., J. Luo, and B. Wang, 2014, Seismic envelope inversion and modulation signal model:
Geophysics, 79, no. 3, WA13–WA24, https://doi.org/10.1190/geo2013-0294.1.
Wu, R. S., and Y. Zheng, 2014, Nonlinear partial derivative and its De Wolf approximation for nonlinear
seismic inversion: Geophysical Journal International, 196, 1827–1843,
https://doi.org/10.1093/gji/ggt496.

2D Full Waveform Inversion and Uncertainty Estimation using the Reversible Jump Hamiltonian
Monte Carlo
Reetam Biswas∗ and Mrinal K. Sen, Institute for Geophysics, University of Texas at Austin
SUMMARY

Seismic data are used to generate high-resolution subsurface images, which require detailed velocity models. Full Waveform Inversion (FWI) has recently gained immense popularity for inverting for the elastic wave velocities from seismic data. FWI is a non-linear and non-unique inverse problem that uses complete time and amplitude information for estimating the elastic properties. Typically, FWI is performed using local optimization methods in which the subsurface model is described by a large number of grid points. The number of model parameters is determined a priori. In addition, the convergence of the algorithm to the globally optimum answer is largely dictated by the choice of a starting model. Here, we apply a trans-dimensional approach, which is based on a Bayesian framework, to solve the waveform inversion problem. In our approach, the number of model parameters is also treated as a variable, which we hope to estimate. We use Voronoi cells, represent our 2D velocity model using certain nuclei points, and employ a recently developed method called the Reversible Jump Hamiltonian Monte Carlo (RJHMC). RJHMC is an effective tool for model exploration and uncertainty quantification. It combines the reversible jump MCMC with the gradient-based Hamiltonian Monte Carlo (HMC). We solve our forward problem using a time-domain finite-difference method, while the adjoint method is used to compute the gradient vector required at the HMC stage. We demonstrate our algorithm with noisy synthetic data for the well-known Marmousi model. Convergence of the chain is attained in about 3000 iterations; marginal posterior density plots of velocity models demonstrate uncertainty in the obtained velocity models.

INTRODUCTION

Unlike traveltime tomography and AVO, Full Waveform Inversion (Tarantola, 1986; Virieux and Operto, 2009) uses both amplitude and phase information to generate subsurface velocity models using an iterative least-squares data fitting procedure. It minimizes the data misfit by updating the velocity model by a scaled gradient of the misfit function. Generally, the adjoint state method (Plessix, 2006) is used to calculate the gradient of the misfit function. The most popular formulation of FWI is the acoustic FWI, which solves for the P-wave velocity. In FWI with an over-parameterized model description, it is necessary to smooth the data, which is done using regularization (Sen and Biswas, 2015).

The above formulation for solving an FWI problem requires the dimension of the model, i.e. the number of model parameters, to be fixed, which is perhaps the least known of all parameters. In an inverse problem it is very important to determine the exact model parametrization, i.e. the number of model parameters, to be consistent with the resolving power of the data. Using too few parameters can lead to under-fitting the data and estimating biased parameters. On the other hand, considering too many model parameters can over-fit the data, which leads to estimating under-determined parameters with enormous uncertainty (Dosso et al., 2014). Thus it makes sense to make the number of model parameters itself a parameter to be solved for. One other limitation of FWI is the requirement of a good starting model. Although this problem has recently been addressed using global optimization methods (e.g., Datta and Sen, 2016), no attempt has yet been made to characterize uncertainty using a fully nonlinear sampling method. Zhu et al. (2016) estimate uncertainty assuming the posterior probability distribution (PPD) to be Gaussian. In this paper, we primarily address these two issues using a trans-dimensional approach.

Markov chain Monte Carlo (MCMC) based on the Metropolis-Hastings update rule is the most common method for sampling from a distribution that does not have a simple analytic form in fixed-dimensional problems. Its extension to problems with a variable-dimensional model space (reversible jump MCMC, RJMCMC) was developed by Green (1995) and subsequently applied to geophysical inverse problems by Malinverno and Leaney (2000, 2005), Malinverno (2002) and Ray et al. (2016). Most recently, Sen and Biswas (2017) combined RJMCMC with Hamiltonian Monte Carlo (RJHMC) to speed up convergence and applied it to 1D seismic inversion. Here we apply RJHMC to a computationally intensive 2D FWI problem and demonstrate its feasibility and usefulness using a noisy synthetic dataset.

THEORY

To solve the FWI problem, we first describe the parametrization used in describing a given velocity model and then the forward modeling algorithm. Finally, we describe model exploration based on RJHMC.

Model Parametrization

To solve a trans-dimensional 2D FWI problem using RJHMC, we choose to represent our model space using an adaptive ensemble of elastic nuclei. These nuclei define the 2D distribution of the elastic properties in the model. The nuclei distribution as well as their number can adaptively change as a function of iteration during sampling, depending on the misfit of the data. This may lead to concentrating nuclei where high resolution is required in the model and a sparser mesh elsewhere (Agostinetti et al., 2015; Bodin et al., 2012). To connect the nuclei distributed in the model we used Voronoi tessellation. It covers the entire model with non-overlapping, convex cells from the distribution of the nuclei (Sambridge et al., 1995; Agostinetti et al., 2015).

Following Agostinetti et al. (2015) we define our model vector
$\mathbf{m}$ as

$$\mathbf{m} = (k, \mathbf{l}, \mathbf{v}), \tag{1}$$

where $k$ is the number of nuclei in the current model. The position of the nuclei is given by a $2k$ vector $\mathbf{l} = (x_1, z_1, \ldots, x_k, z_k)$ and the seismic P-wave velocity is given by the $k$ vector $\mathbf{v} = (V_1, \ldots, V_k)$. To generate the full-scale velocity model from the nuclei, we first create the Voronoi connection between the nuclei; each grid point that falls in a Voronoi cell then takes the velocity of the respective nucleus. Figure 1 shows an example of the transformation from nuclei cells to a complete velocity model.

Forward Modeling

Once a gridded model is obtained, we perform forward modeling using the constant-density acoustic wave equation, where the only elastic parameter is the P-wave velocity. Here we assume that the reflections are due to the contrast in P-wave velocities only. This wave equation is given by

$$\frac{1}{c^2}\frac{\partial^2 P}{\partial t^2} = \nabla^2 P + S(\mathbf{x}, t), \tag{2}$$

where $P$ is the pressure wavefield, $c$ is the P-wave velocity, $\nabla^2$ is the Laplacian given by $\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial z^2}$, and $S(\mathbf{x}, t)$ is the source term.

At each iteration, after the velocity model is perturbed, the wave equation is solved to compute a new seismogram ($\mathbf{d}_{cal}$) and the difference from the observed seismogram ($\mathbf{d}_{obs}$) is calculated using

$$E = \|\mathbf{d}_{obs} - \mathbf{d}_{cal}\|_2. \tag{3}$$

The gradient is calculated by the adjoint state method, using the zero-lag cross-correlation in time of the forward-propagated source wavefield and the backward-propagated data residuals. The adjoint wavefield satisfies

$$\frac{1}{c^2}\frac{\partial^2 R}{\partial t^2} = \nabla^2 R + \Delta S(\mathbf{x}, t), \tag{4}$$

where $R$ is the adjoint wavefield, and $\Delta S(\mathbf{x}, t)$ is the data residual between the observed data ($\mathbf{d}_{obs}$) and the calculated data ($\mathbf{d}_{cal}$). The gradient of the misfit function is calculated by summing the zero-lag cross-correlation over multiple shots:

$$\frac{\partial E}{\partial m} = \frac{1}{c(x, z)^3} \sum_{\text{shots}} \ddot{P}(x, z, t)\, R(x, z, t), \tag{5}$$

where $P$ and $R$ are the source and adjoint wavefields in space and time.

... $\mathbf{m}$ having $k$ nuclei, may jump towards an increase in dimension to the proposed state $\mathbf{m}'$ having $k' = k + n$ nuclei. The prior for the model dimension is given by $q(\mathbf{m}', k')$. The second step is a fixed-dimensional Hamiltonian Monte Carlo (HMC) step (Neal, 2011), where the model $\mathbf{m}'$ moves to a new state $\mathbf{m}^*$ having the same number of nuclei points $k'$. In the reverse step of this move, we first perform the fixed-dimensional HMC to change the state of the model from $\mathbf{m}^*$ to $\mathbf{m}'$ having $k'$ nuclei and then perform the trans-D step to move model $\mathbf{m}'$ having $k'$ nuclei to model state $\mathbf{m}$ having $k$ nuclei. The transition from model $\mathbf{m}$ ($k$) to model $\mathbf{m}'$ ($k'$) to the final model $\mathbf{m}^*$ ($k'$) is given by the RJHMC acceptance probability.

First: Trans-dimensional step

In the first, trans-dimensional step, the random moves can jump between states having different model dimensions (here, the number of nuclei). Here we have the transition from model $\mathbf{m}$ with $k$ nuclei to model $\mathbf{m}'$ having $k'$ nuclei. To represent such a transition, Bayes' rule can be modified to accommodate the variability in model parameterization, so that the PPD is given by

$$\pi(k, \mathbf{m}) = \frac{p(k)\, p(\mathbf{m}|k)\, p(\mathbf{d}|\mathbf{m}, k)}{\sum_{k \in \kappa} \int p(k)\, p(\mathbf{m}|k)\, p(\mathbf{d}|\mathbf{m}, k)\, d\mathbf{m}}, \tag{6}$$

where $\kappa$ represents all possible values of $k$. $p(k)$ is the prior distribution for choosing the model dimension. We define it to be a bounded uniform distribution between the minimum number of nuclei $k_{min}$ and the maximum number of nuclei possible $k_{max}$. $p(\mathbf{m}|k)$ is the prior distribution on the parameter values and is defined as a Gaussian prior:

$$p(\mathbf{m}|k) = \frac{1}{\left[(2\pi)^k \det(\bar{C}_p)\right]^{1/2}} \exp\left[-\frac{1}{2} (\mathbf{m} - \bar{\mathbf{m}})^T (\bar{C}_p)^{-1} (\mathbf{m} - \bar{\mathbf{m}})\right], \tag{7}$$

where $\bar{C}_p$ is the prior covariance matrix and $\bar{\mathbf{m}}$ represents the prior mean model.

$p(\mathbf{d}|\mathbf{m}, k)$ in equation 6 represents the likelihood, which is given by the magnitude of the error in the predicted data. Assuming the error covariance matrix to be an identity matrix, the magnitude of the error can be given by the L2 norm of the difference in seismograms as given in equation 3. Thus the likelihood can be represented as

$$p(\mathbf{d}|\mathbf{m}, k) = \frac{1}{(2\pi)^{N/2}} \exp\left[-\frac{E^2}{2}\right]. \tag{8}$$

Second: Fixed-dimensional step (HMC)
Inverse formulation This is a fixed Dimensional MCMC step, where the model in
FWI is a non-unique, non linear inverse problem. It is impos- state m0 with k0 nuclei moves to a new state m∗ with same
sible to get a unique solution in FWI. Therefore using a prob- k0 nuclei using Hamiltonian Monte Carlo. HMC uses Hamil-
abilistic approach seems to be a better way to address such a tonian Dynamics to produce distant proposal for Metropolis
problem. Here, we follow the Reversible Jump Hamiltonian algorithm and avoid slow exploration by using gradient infor-
Monte Carlo algorithm (Sen and Biswas, 2017). This is a mation while changing state. In HMC, we calculate the total
stochastic method which is based on Bayes’ rule. It consist Potential energy (U) and Kinetic energy (K) of the state, i.e.
of two step (Al-Awadhi et al., 2004). The first step is a trans- the Hamiltonian Energy. HMC augments the state space of the
dimensional step similar to RJMCMC. In this step the model target distribution by adding auxiliary sets of variables termed

[Figure 1 image: Velocity model VP; horizontal axis X Dir., vertical axis Depth; color scale 2000 to 5000.]
Figure 1: Velocity model created from nuclei cells; the nuclei are shown as black dots. The white crosses are the grid points of the velocity model. The color of each Voronoi cell is the velocity of the respective nucleus.
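As a minimal sketch of the nuclei-to-grid mapping illustrated in Figure 1, the snippet below gives each grid point the velocity of its nearest nucleus, which is equivalent to looking up the Voronoi cell that contains the point. The function name `voronoi_velocity_grid`, the grid spacings `dx` and `dz`, and the three example nuclei are illustrative assumptions and are not taken from the paper.

```python
def voronoi_velocity_grid(nuclei, velocities, nx, nz, dx=1.0, dz=1.0):
    """Return an nz-by-nx grid where each point takes the velocity of its nearest nucleus.

    nuclei     : list of (x, z) nucleus positions (hypothetical example layout)
    velocities : list of P-wave velocities, one per nucleus
    """
    grid = []
    for iz in range(nz):
        row = []
        for ix in range(nx):
            x, z = ix * dx, iz * dz
            # The nearest nucleus (squared Euclidean distance) defines the Voronoi cell.
            nearest = min(
                range(len(nuclei)),
                key=lambda i: (nuclei[i][0] - x) ** 2 + (nuclei[i][1] - z) ** 2,
            )
            row.append(velocities[nearest])
        grid.append(row)
    return grid


# Three made-up nuclei on an 8 x 8 grid.
nuclei = [(1.0, 1.0), (6.0, 2.0), (3.0, 6.0)]
velocities = [2000.0, 3500.0, 5000.0]
model = voronoi_velocity_grid(nuclei, velocities, nx=8, nz=8)
```

This brute-force lookup costs one distance evaluation per nucleus per grid point; for thousands of nuclei a spatial index would likely be preferable.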
as momentum $p$. Each nucleus is augmented with an initial momentum value drawn from a normal distribution $N(0, 1)$. The total Hamiltonian energy of the model $m^*$ with momentum $p$ is given by
$$H(m^*, p) = U(m^*) + K(p). \qquad (9)$$
The potential energy $U$ is defined as the negative logarithm of the posterior distribution,
$$U(m^*) = -\log\left[p(m^*)\,p(d|m^*)\right], \qquad (10)$$
where the prior probability $p(m^*)$ is taken to be a bounded uniform distribution and the likelihood $p(d|m^*)$ is the same as in equation 8. The kinetic energy term $K$ is calculated as
$$K(p) = \frac{p^T p}{2}. \qquad (11)$$
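To make the roles of $U$, $K$ and $H$ concrete, here is a minimal sketch of a generic fixed-dimensional HMC move using the standard leapfrog integrator and Metropolis test described by Neal (2011). It is not the authors' implementation: the quadratic `toy_potential` and `toy_gradient` are stand-ins (assumptions) for the FWI misfit and its adjoint-state gradient, and all function names, the step size and the number of leapfrog steps are hypothetical.

```python
import math
import random


def kinetic(p):
    """Kinetic energy K(p) = p^T p / 2 (equation 11)."""
    return 0.5 * sum(pi * pi for pi in p)


def leapfrog(m, p, grad_potential, step_size, n_steps):
    """Integrate Hamiltonian dynamics with the leapfrog scheme."""
    m, p = list(m), list(p)
    g = grad_potential(m)
    p = [pi - 0.5 * step_size * gi for pi, gi in zip(p, g)]  # initial half step
    for step in range(n_steps):
        m = [mi + step_size * pi for mi, pi in zip(m, p)]    # full position step
        g = grad_potential(m)
        scale = 0.5 if step == n_steps - 1 else 1.0          # final half step
        p = [pi - scale * step_size * gi for pi, gi in zip(p, g)]
    return m, p


def hmc_step(m, potential, grad_potential, step_size, n_steps, rng):
    """One HMC move; returns (model, accepted)."""
    p0 = [rng.gauss(0.0, 1.0) for _ in m]                    # momentum ~ N(0, 1)
    h0 = potential(m) + kinetic(p0)                          # H = U + K (equation 9)
    m_new, p_new = leapfrog(m, p0, grad_potential, step_size, n_steps)
    h1 = potential(m_new) + kinetic(p_new)
    # Standard Metropolis test on the change in total energy.
    if rng.random() < math.exp(min(0.0, h0 - h1)):
        return m_new, True
    return list(m), False


# Toy stand-in for the negative log posterior: a unit Gaussian centred on TARGET.
TARGET = [1.0, 2.0]


def toy_potential(m):
    return 0.5 * sum((mi - ti) ** 2 for mi, ti in zip(m, TARGET))


def toy_gradient(m):
    return [mi - ti for mi, ti in zip(m, TARGET)]


rng = random.Random(0)
model, samples = [0.0, 0.0], []
for _ in range(2000):
    model, _accepted = hmc_step(model, toy_potential, toy_gradient, 0.2, 5, rng)
    samples.append(model)
```

In an FWI setting each call to `grad_potential` would require a forward and an adjoint wave-equation solve, so the number of leapfrog steps directly controls the cost of one move.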
To perform the Hamiltonian step, the model $(m', k')$ at the beginning of the second step is assigned a momentum $p$ for each nucleus. It has a Hamiltonian energy $H(m', p)$. This momentum value is drawn randomly from a normal distribution $N(0, 1)$. Then we simulate Hamiltonian dynamics for $L$ steps. In each Hamiltonian dynamics step we update the momentum $p$ using the gradient information calculated by the FWI adjoint-state method, and use this updated momentum to update the model. At the end of the Hamiltonian simulation we have a new model state $m^*$ and momentum $p^*$ with Hamiltonian energy $H(m^*, p^*)$. We calculate the difference in the total Hamiltonian energy of the two states.

After completion of the two steps, i.e. a transition from model $(m, k) \to (m', k') \to (m^*, k')$, we accept this new model with the RJHMC acceptance probability given in Sen and Biswas (2017). The acceptance probability is given by
$$\alpha(m^*, k'|m, k) = \min\left[1, \frac{p(m^*|k')\,p(d|m^*,k')}{p(m|k)\,p(d|m,k)} \times \frac{q(m|k)}{q(m'|k')} \times \frac{\exp(-H(m'))}{\exp(-H(m^*))}\right], \qquad (12)$$
where $q(m'|k')$ is the proposal distribution for the model parameter values, which is chosen to be an approximation of a posterior distribution in order to achieve a high acceptance ratio. Figure 2 shows the flowchart of the algorithm.

Figure 2: Flowchart of RJHMC.

RESULT
To demonstrate our algorithm, we invert the Marmousi model for P-wave velocity. The model has dimensions of 251 × 767 with a grid spacing of 12 m. We generated synthetic data using a 15 Hz Ricker wavelet. We used 21 shots at an interval of 36 grid spacings with 767 receivers per shot. Gaussian noise was added to the traces, which were then used in the inversion. To perform the inversion, we started with 1000 nuclei points randomly distributed in the space. In this example we performed two steps of Hamiltonian dynamics for each iteration of RJHMC. In the trans-dimensional step of RJHMC, the algorithm can add or delete up to a maximum of 30 nuclei at a time.

Figure 3 shows the result from the inversion. The first model is the true velocity model. The rest of the models show the velocity at various iterations of RJHMC along with the nuclei count at that step. We started with 1000 nuclei as a starting model (Figure 3b). Note that, in general, we observe an increase in the number of nuclei until convergence. The burn-in phase

[Figure 3 images: (a) True Velocity; (b) Iter=0, NC=1000; (c) Iter=1258, NC=3010; (d) Iter=3969, NC=6850. Axes: X Dir. (km) and Depth (km); color scale 2000 to 5000.]
Figure 3: Velocity models at different iterations of RJHMC: (a) the true velocity model, (b)-(d) models at progressive iterations. Each plot shows the iteration number (Iter) and the nuclei count (NC) of the model.
seems to have completed in 3000 iterations only. Figure 3d shows the model from the last iteration. Here we consider a well at location X = 7140 m, marked in green.

[Figure 4 image: VP (m/s) versus Depth (km); legend: PPD, Log, Mean Parameter estimate, 95% confidence interval.]
Figure 4: Plot of marginal PPD (grey cloud), true log (green), mean parameter estimate (red), and 95% confidence interval (dashed line) for a well at the green location.

Figure 4 shows the velocity model at X = 7140 m. It shows the marginal PPD (grey cloud), the true log (green), the mean parameter estimate (red), and the 95% confidence interval (dashed line) for the well marked in green. The well has the target reservoir of the Marmousi model at a depth of 2.3 km, which is fairly well resolved. We also observe that the uncertainty in the estimated P-wave velocity increases with depth. Figure 5 shows the mean model and the standard deviation (as a percentage of the mean) at each grid point. The maximum percentage standard deviation is nearly 10%, and the standard deviation seems to increase with depth as the data coverage becomes poorer with depth.
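The summaries plotted in Figures 4 and 5 (mean model, percentage standard deviation, 95% interval) can be computed from the sampled models once the burn-in samples are discarded. The sketch below assumes each sample is stored as a list of velocities on a fixed set of grid points, and it uses a simple nearest-rank percentile for the 95% interval; the function name `summarize_ensemble`, the `burn_in` argument and the tiny example ensemble are hypothetical.

```python
import statistics


def summarize_ensemble(samples, burn_in=0):
    """Per-grid-point mean, percentage standard deviation and 95% interval.

    samples : list of models, each a list of velocities at the same grid points
    burn_in : number of leading samples to discard (an assumed convention)
    """
    kept = samples[burn_in:]
    n = len(kept)
    summary = []
    for j in range(len(kept[0])):
        values = sorted(model[j] for model in kept)
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        # Nearest-rank percentiles; an assumption, other conventions exist.
        lower = values[round(0.025 * (n - 1))]
        upper = values[round(0.975 * (n - 1))]
        summary.append(
            {"mean": mean, "pct_std": 100.0 * std / mean, "ci95": (lower, upper)}
        )
    return summary


# Made-up ensemble: two grid points (shallow, deep), first sample treated as burn-in.
ensemble = [
    [9999.0, 9999.0],
    [2000.0, 3000.0],
    [2200.0, 3600.0],
    [1800.0, 2400.0],
]
summary = summarize_ensemble(ensemble, burn_in=1)
```

In this toy ensemble the deeper point has the larger percentage standard deviation, mirroring the trend reported for Figure 5.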
[Figure 5 images: (a) Mean Model, color scale 2000 to 5000; (b) Percentage Standard Deviation, color scale Low to High. Axes: X Dir. (km) and Depth (km).]
Figure 5: (a) Plot of the estimated mean velocity model. (b) Plot of the percentage standard deviation of the estimated velocity model.

DISCUSSION AND CONCLUSION

In this paper, we demonstrated the RJHMC algorithm and its application to a 2D FWI problem using the Marmousi model. We represented our 2D model using a set of nuclei points. The RJHMC algorithm solves the 2D FWI problem in a trans-dimensional way. It solves for the number of nuclei as well as the velocity of the nuclei points. The number of nuclei required is completely dictated by the data and is determined largely automatically. The maximum number of nuclei possible is the dimension of the velocity grid, i.e. 251 × 767 = 192517, but in the example RJHMC is able to reconstruct the model using only ≈ 5000 nuclei points. Using the gradient information in HMC, the algorithm can make larger jumps in the model space and sample it efficiently. RJHMC successfully reconstructed the complete model with an optimal number of model parameters as needed by the data. RJHMC explores the pertinent model space having various numbers of parameters. It is able to quantify uncertainty in the estimation of the P-wave velocity using 2D FWI.

REFERENCES
Agostinetti, N. P., G. Giacomuzzi, and A. Malinverno, 2015, Local three-dimensional earthquake
tomography by trans-dimensional monte carlo sampling: Geophysical Journal International, 201,
1598–1617, http://doi.org/10.1093/gji/ggv084.
Al-Awadhi, F., M. Hurn, and C. Jennison, 2004, Improving the acceptance rate of reversible jump mcmc
proposals: Statistics & probability letters, 69, 189–198, http://doi.org/10.1016/j.spl.2004.06.025.
Bodin, T., M. Sambridge, N. Rawlinson, and P. Arroucau, 2012, Transdimensional tomography with
unknown data noise: Geophysical Journal International, 189, 1536–1556,
http://doi.org/10.1111/j.1365-246X.2012.05414.x.
Datta, D., and M. K. Sen, 2016, Estimating a starting model for full-waveform inversion using a global
optimization method: Geophysics, 81, no. 4, R211–R223, http://doi.org/10.1190/geo2015-0339.1.
Dosso, S. E., J. Dettmer, G. Steininger, and C. W. Holland, 2014, Efficient trans-dimensional bayesian
inversion for geoacoustic profile estimation: Inverse Problems, 30, 114018,
http://doi.org/10.1088/0266-5611/30/11/114018.
Green, P. J., 1995, Reversible jump markov chain monte carlo computation and bayesian model
determination: Biometrika, 82, 711–732, http://doi.org/10.1093/biomet/82.4.711.
Malinverno, A., 2002, Parsimonious bayesian markov chain monte carlo inversion in a nonlinear
geophysical problem: Geophysical Journal International, 151, 675–688,
http://doi.org/10.1046/j.1365-246X.2002.01847.x.
Malinverno, A., and S. Leaney, 2000, A monte carlo method to quantify uncertainty in the inversion of
zero-offset VSP data: 70th Annual International Meeting, SEG, Expanded Abstracts, 2393–2396,
http://doi.org/10.1190/1.1815943.
Malinverno, A., and W. S. Leaney, 2005, Monte-carlo bayesian look-ahead inversion of walkaway
vertical seismic profiles: Geophysical prospecting, 53, 689–703,
http://doi.org/10.1111/j.1365-2478.2005.00496.x.
Neal, R. M., 2011, Mcmc using hamiltonian dynamics, in Handbook of Markov Chain Monte Carlo:
Chapman & Hall/CRC.
Plessix, R.-E., 2006, A review of the adjoint-state method for computing the gradient of a functional with
geophysical applications: Geophysical Journal International, 167, 495–503,
http://doi.org/10.1111/j.1365-246X.2006.02978.x.
Ray, A., A. Sekar, G. M. Hoversten, and U. Albertin, 2016, Frequency domain full waveform elastic
inversion of marine seismic data from the alba field using a bayesian trans-dimensional
algorithm: Geophysical Journal International, 205, 915–937, http://doi.org/10.1093/gji/ggw061.
Sambridge, M., J. Braun, and H. McQueen, 1995, Geophysical parametrization and interpolation of
irregular data using natural neighbours: Geophysical Journal International, 122, 837–857,
http://doi.org/10.1111/j.1365-246X.1995.tb06841.x.
Sen, M. K., and R. Biswas, 2015, Choice of regularization weight in basis pursuit reflectivity inversion:
Journal of Geophysics and Engineering, 12, 70–79, http://doi.org/10.1088/1742-2132/12/1/70.
Sen, M. K., and R. Biswas, 2017, Transdimensional seismic inversion using the reversible jump
hamiltonian monte carlo algorithm: Geophysics, 82, no. 3, R119–R134,
http://doi.org/10.1190/geo2016-0010.1.
Tarantola, A., 1986, A strategy for nonlinear elastic inversion of seismic reflection data: Geophysics, 51,
1893–1903, http://doi.org/10.1190/1.1442046.

Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics:
Geophysics, 74, no. 6, WCC1–WCC26, http://doi.org/10.1190/1.3238367.
Zhu, H., S. Li, S. Fomel, G. Stadler, and O. Ghattas, 2016, A bayesian approach to estimate uncertainty
for full-waveform inversion using a priori information from depth migration: Geophysics, 81, no.
5, R307–R323, http://doi.org/10.1190/geo2015-0641.1.

Full waveform inversion with an exponentially-encoded optimal transport norm
Lingyun Qiu *, Jaime Ramos-Martínez and Alejandro Valenciano, PGS; Yunan Yang and Björn Engquist,
University of Texas at Austin
Summary

Full waveform inversion (FWI) with an $L^2$ norm objective function often suffers from cycle skipping, which causes the solution to be trapped in a local minimum, usually far from the true model. We introduce a new norm based on optimal transport theory for measuring the data mismatch to overcome this problem. The new solution uses an exponential encoding scheme and enhances the phase information when compared with the conventional $L^2$ norm. The adjoint source is calculated trace-wise based on the 1D Wasserstein distance. It uses an explicit solution of the optimal transport over the real line. It results in an efficient implementation with a computational complexity of the adjoint source proportional to the number of shots, receivers and the length of recording time. We demonstrate the effectiveness of our solution by using the Marmousi model. A second example, using the BP 2004 velocity benchmark model, illustrates the benefit of the combination of the new norm and Total Variation (TV) regularization.

Introduction

FWI is formulated as a nonlinear inverse problem matching modeled data to the recorded field data (Tarantola, 1984). Usually, a least-squares objective function is used for measuring the data misfit. This misfit is minimized with respect to the model parameters, and the model update is computed using the adjoint-state method. FWI can produce high-resolution models of the subsurface when compared to ray-based methods. Due to the large scale of the problem, local rather than global optimization methods are mandatory. However, FWI is often an ill-posed problem due to the band-limited nature of the seismic data and the limitations of the acquisition geometries. Furthermore, the non-convexity resulting from the least-squares objective function causes local minima, i.e., the cycle-skipping problem, especially with data lacking low-frequency information.

It is well known that the least-squares formulation of FWI tends to produce many local minima. This is because only the pointwise amplitude difference is measured with the $L^2$ norm, while the phase or travel-time information embedded in the data is more critical for the inversion. Different approaches have been proposed to capture the travel-time difference, such as dynamic time warping and convolution-based methods. This information is used in order to convexify the objective function or enlarge the true solution valley. In this direction, we mention the works in (Luo and Sava, 2011), (Ma and Hale, 2013) and (Warner and Guasch, 2014).

Recently, the Wasserstein distance has been proposed to replace the $L^2$ distance for the objective function in FWI (Engquist and Froese, 2014). The Wasserstein distance is a well-defined metric from the theory of optimal transport in mathematics. It was first brought up by Gaspard Monge in 1781 (Monge, 1781) and more recently by Kantorovich (Kantorovich, 1942), seeking the optimal cost of rearranging one density into the other, where the transportation cost per unit mass is the Euclidean distance or Manhattan distance.

The Wasserstein distance has the ability to consider both phase shifts and amplitude differences. It has been demonstrated in (Engquist, Froese and Yang, 2016) that $W_2$ bears some advantageous mathematical properties, such as convexity with respect to shift and dilation and insensitivity to noise. In (Yang, Engquist, Sun and Froese, 2016), $W_2$ on 2D data is applied to FWI on synthetic benchmark models. The calculation of the corresponding adjoint source requires solving a Monge–Ampère equation, which can be computationally demanding. Another popular optimal transport metric used for FWI is the 1-Wasserstein distance ($W_1$), approximated by the Kantorovich Rubinstein (KR) norm (Métivier, et al, 2016). For this metric the transport map is not unique. The KR norm does not require the data to be positive and mass preserving. Therefore it can be directly applied to the seismic data without transferring them into a probability density function (pdf). Both analysis and numerical results show the potential of FWI with optimal transport to mitigate the cycle-skipping problem.

The Wasserstein metric is designed to measure the distance between two pdfs. Thus, non-negativeness and unit mass are desired for the input. But oscillation and sign change are typical features of seismic data. Therefore, we need a misfit function that takes the global features of the data into consideration and is robust to periodicity and sign change. Since seismic data are not naturally positive, a proper normalization method is the key to Wasserstein distance inversion. Some previous methods may lead to a non-differentiable misfit function and are not compatible with the adjoint-state method, or lose information of the original data during the normalization.

Here, we address the issue of how to transform seismic data into pdfs. The new solution uses an exponential encoding scheme and enhances the phase information when compared with the conventional $L^2$ norm. The algorithm

uses the 1D Wasserstein metric. As a result, the implementation of the adjoint source has the same order of computational complexity as that of the conventional $L^2$ norm. We illustrate our method by using the Marmousi and the BP 2004 velocity benchmark models.

Exponentially-encoded Wasserstein distance for seismic data

In this section, we define a procedure to transfer the seismic data into pdf-like data before we calculate the Wasserstein distance between them. Meanwhile, we also aim to extract the phase information from the seismic data for computing the Wasserstein distance. Seismic data are not naturally positive, which makes it a challenge to apply $W_2$ directly. Some previous methods, such as comparing the positive and negative parts separately (Engquist and Froese, 2014), seem not to be compatible with the adjoint-state method. The linear transformation (Yang, Engquist, Sun and Froese, 2016) may lose the global convexity that $W_2$ has for positive signals. Therefore a proper data normalization method is the key for inversion.

Suppose we have seismic data $d$, which has both positive and negative values. We let

[...]

where $\alpha$ is a prescribed positive constant to control the upper bound of the power for numerical accuracy. Since the exponential function has a much milder derivative on the negative half of the real axis, the above procedure treats the negative and positive parts of the seismic data differently. At the same time, the processed data is non-negative. We apply this procedure to both the recorded data and the simulation with the same constant. With an additional scaling, we turn the recorded data $d$ and simulated data $u$ into pdf-like functions $\tilde{d}$ and $\tilde{u}$. Therefore, we can apply the Wasserstein distance to measure their difference.

Intuitively, the above algorithm is nothing but an uneven encoding process. All the information in the positive part of the data is amplified and stored in $(1, +\infty)$, and the information from the negative part is compressed in $(0, 1)$. In this way, the phase information is extracted mainly from the positive side of the seismic data for the FWI. This encoding process is invertible and Fréchet differentiable. Therefore, according to the chain rule, the only additional work is to multiply the adjoint source by

[...].

FWI with this encoding process will be biased to match the travel time provided by the positive signal. The negative side is also needed, especially for FWI with reflection data.

To make use of the phase information from the negative part of the data, we balance this uneven encoding by also taking into account the data reformed by the map

[...].

In practice, we perform the inversion in an alternating fashion. That is, we switch the data encoding process between [...] and [...] every few iterations.

The corresponding objective function is constructed as

[...].

Since we only change the objective function, the corresponding modification for the conventional FWI is to use a new adjoint source. It can be computed as

[...]

Note that [...] and [...] are 1D functions. We can take advantage of the explicit expression of the Wasserstein distance for distributions over the real line. In this way, the computational complexity for obtaining the adjoint source is [...], where [...], [...] and [...] stand for the number of receivers, shots and time steps, respectively. In practice, we find that the additional computational time is very small compared with the conventional method to calculate the adjoint source, which is a subtraction with the same order of complexity [...].

The quadratic Wasserstein distance between two 1D pdfs [...] and [...] is defined as

[...]

Here, [...] and [...] are the associated cumulative distribution functions (cdf) and $\cdot^{-1}$ stands for the pseudo-inverse defined as

[...].

The Fréchet derivative with respect to [...] is given by

[...]

The above equality can be simplified using the inverse function theorem and we have that

[...].

Note that both [...] and [...] are monotonically increasing functions. Hence, [...] and [...] are computed in [...] operations and both are monotonic functions. Therefore, we can obtain the adjoint source for a single trace with [...] operations. Once the adjoint source is obtained, the rest of the inversion is the same as the conventional FWI.
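The trace-wise idea described above can be illustrated with the standard closed form for optimal transport on the real line, in which the squared quadratic Wasserstein distance is the integral over $s \in (0, 1)$ of $|F^{-1}(s) - G^{-1}(s)|^2$, with $F$ and $G$ the cdfs of the two pdfs. The sketch below is a hedged illustration, not the authors' code: the encoding $\exp(\alpha d)$ followed by rescaling to unit mass is an assumption inferred from the description of the encoding (its exact formula is not reproduced in this text), and the function names, the Gaussian test pulses and the quantile count are hypothetical.

```python
import bisect
import math


def encode_to_pdf(trace, dt, alpha=1.0):
    """ASSUMED encoding: exp(alpha * d), then rescaled so the trace integrates to one."""
    encoded = [math.exp(alpha * d) for d in trace]
    mass = sum(encoded) * dt
    return [e / mass for e in encoded]


def cumulative(pdf, dt):
    """Discrete cdf of a non-negative, unit-mass trace."""
    total, cdf = 0.0, []
    for value in pdf:
        total += value * dt
        cdf.append(total)
    return cdf


def pseudo_inverse(cdf, times, s):
    """Smallest sample time t with F(t) >= s."""
    idx = min(bisect.bisect_left(cdf, s), len(times) - 1)
    return times[idx]


def w2_squared(f_pdf, g_pdf, times, dt, n_quantiles=2000):
    """Squared 1D quadratic Wasserstein distance via a midpoint rule over quantiles."""
    F, G = cumulative(f_pdf, dt), cumulative(g_pdf, dt)
    total = 0.0
    for i in range(n_quantiles):
        s = (i + 0.5) / n_quantiles
        total += (pseudo_inverse(F, times, s) - pseudo_inverse(G, times, s)) ** 2
    return total / n_quantiles


dt = 0.005
times = [i * dt for i in range(601)]


def pulse(center, width=0.1):
    """Unit-mass Gaussian pulse used as a stand-in for an encoded trace."""
    raw = [math.exp(-0.5 * ((t - center) / width) ** 2) for t in times]
    mass = sum(raw) * dt
    return [r / mass for r in raw]


reference = pulse(1.0)
misfit_small_shift = w2_squared(reference, pulse(1.25), times, dt)
misfit_large_shift = w2_squared(reference, pulse(1.5), times, dt)

# An oscillating, sign-changing trace becomes positive with unit mass after encoding.
encoded = encode_to_pdf([math.sin(2.0 * math.pi * t) for t in times], dt, alpha=1.0)
```

For a pure time shift the misfit grows with the square of the shift (about 0.0625 and 0.25 here), which is the convexity with respect to shift that motivates the metric. The binary search makes each quantile lookup logarithmic in the trace length; a single merged sweep over both cdfs would reach linear cost per trace.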
Numerical experiments

We first investigated the use of our method on the Marmousi model (Figure 1a). The model contains many reflectors, steep dips, and strong velocity variations in both the lateral and the vertical direction. The velocity model is 9.2 km × 3.2 km. The synthetic data was created with a minimum frequency of 5 Hz (zero power) and 7 Hz full power. The sources and receivers are both uniformly distributed every 20 m at 40 m depth. The maximum recording time is 8 s. We randomly select 31 sources per iteration. The initial model (Figure 1b) is created by smoothing the true model using a Gaussian filter with 2 km correlation length. With this initial model, inversion with the $L^2$ objective function fails to provide a good reconstruction (Figure 1c), but $W_2$ gives a result closer to the true model (Figure 1d).

Figure 1: (a) True model, (b) initial model, (c) FWI with $L^2$, (d) FWI with $W_2$.

Next, we perform a numerical test on the BP 2004 benchmark velocity model (Figure 2a) (Billette and Brandsberg-Dahl, 2005). The model is 28.5 km × 7.5 km and contains a salt body in the middle of the domain of interest. The synthetic data was created with a minimum frequency of 1 Hz (zero power) and 3 Hz full power. For the acquisition geometry, the sources are uniformly distributed every 40 m and the receivers are deployed every 40 m with a maximum offset of 20 km. Both sources and receivers are located at 40 m depth. With this long-offset setting, the maximum recording time of the data is set to 12 s. For efficiency, 36 randomly selected shots are used per iteration.

A heavily smoothed version of the true model (1.1 km correlation length) with the water layer fixed is used as the starting velocity model for FWI (Figure 2b). From this initial model, the conventional FWI with $L^2$ distance fails to recover the salt boundary (Figure 2c). As shown in Figure 2d, inversion with the proposed algorithm produces a better reconstruction. The salt body shallower than 7 km depth is well restored. Slices of the initial model, true model, $L^2$ reconstructed model and $W_2$ reconstructed model at x = 12 km are shown in Figure 3.

In this work, we focus on measuring the difference in data space. Thus, no conditioning or stabilization procedure, such as smoothing of the gradient or regularization of the model, is applied to the inversion results shown in Figures 2 and 3.

The oscillatory noise in FWI can be efficiently removed using total variation type regularization (Qiu, et al., 2016). The regularization is necessary to stabilize the inversion and inject a priori information into the optimization. The extension of the proposed algorithm to incorporate TV regularization is straightforward. The inversion results are shown in Figures 4 and 5. The TV regularization helps to produce a blocky inverted model. But, from the slice view (Figure 5), it is clear that FWI with $L^2$ distance (blue curve) and TV regularization does not restore the salt boundary correctly. In contrast, the $W_2$ model is close to the true model, showing almost perfect sediment velocity and salt boundary reconstruction.
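To illustrate why a total variation penalty favors blocky models over oscillatory ones, the sketch below evaluates a plain anisotropic TV seminorm (the sum of absolute differences between neighboring grid values) on two small velocity grids. This is a generic textbook form chosen for illustration; it is not the steerable variant of Qiu et al. (2016), and the function name and the 4 × 4 example models are assumptions.

```python
def total_variation(model):
    """Anisotropic TV: sum of absolute differences between horizontal and vertical neighbors."""
    nz, nx = len(model), len(model[0])
    tv = 0.0
    for iz in range(nz):
        for ix in range(nx):
            if ix + 1 < nx:
                tv += abs(model[iz][ix + 1] - model[iz][ix])
            if iz + 1 < nz:
                tv += abs(model[iz + 1][ix] - model[iz][ix])
    return tv


# Blocky model: sediment-like velocity on the left, salt-like velocity on the right.
blocky = [[1500.0, 1500.0, 4500.0, 4500.0] for _ in range(4)]

# Oscillatory model with the same two velocities arranged as a checkerboard.
oscillatory = [
    [1500.0 if (iz + ix) % 2 == 0 else 4500.0 for ix in range(4)]
    for iz in range(4)
]

tv_blocky = total_variation(blocky)
tv_oscillatory = total_variation(oscillatory)
```

Both grids contain the same velocities, but the single sharp interface costs far less than the oscillations, so adding such a term to the objective pushes the inversion toward piecewise-constant structure while still allowing a sharp salt boundary.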

Figure 2: (a) True model, (b) initial model, (c) FWI with $L^2$, (d) FWI with $W_2$.

Figure 3: Slices of the velocity models in Figure 2.

Figure 4: (a) $L^2$ with TV regularization, (b) $W_2$ with TV regularization.

Figure 5: Slices of the velocity models with TV regularization in Figure 4.

Conclusions

The formulation of FWI with the Wasserstein distance shows the potential to mitigate the cycle-skipping problem present in the $L^2$ solution. We propose an exponential-encoding process to transfer the seismic data into pdfs with emphasis on phase information. The adjoint source is calculated using the explicit solution of the optimal transport over the real line. Together, these lead to an efficient and robust seismic inversion scheme. The numerical results demonstrate the advantages of the proposed algorithm. In the Marmousi example the new method allows the FWI to start from a heavily smoothed model with high-frequency data and obtain a good result. The BP 2004 benchmark example shows how, by combining the new norm with TV regularization, the salt body velocity and boundaries can be reconstructed starting from a smooth model.
Acknowledgments
We thank PGS for permission to publish the results.

REFERENCES
Billette, F. J., and S. Brandsberg-Dahl, 2005, The 2004 bp velocity benchmark: 67th Annual International
Conference and Exhibition, EAGE, Extended Abstracts.
Ramos-Martinez, J., S. Crawley, S. Kelly, and B. Tsimelzon, 2011, Full-waveform inversion by pseudo-
analytic extrapolation: 81st Annual International Meeting, SEG, Expanded Abstracts,
http://dx.doi.org/10.1190/1.3627750.
Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49,
1259–1266, http://doi.org/10.1190/1.1441754.
Engquist, B., and B. D. Froese, 2014, Application of the Wasserstein metric to seismic signals:
Communications in Mathematical Sciences, 12, 979–988,
http://dx.doi.org/10.4310/CMS.2014.v12.n5.a7.
Engquist, B., B. D. Froese, and Y. Yang, 2016, Optimal transport for seismic full waveform inversion:
Yang, Y., B. Engquist, J. Sun, and B. D. Froese, 2016, Application of optimal transport and the quadratic
Wasserstein metric to full-waveform inversion: arXiv preprint arXiv:1612.05075.
Métivier, L., R. Brossier, Q. Mérigot, E. Oudet, and J. Virieux, 2016, Measuring the misfit between
seismograms using an optimal transport distance: application to full waveform inversion:
Geophysical Journal International, 205, 345–377.
Warner, M., and L. Guasch, 2014, Adaptive waveform inversion: Theory: 84th Annual International
Meeting, SEG, Expanded Abstracts, 1089–1093, http://dx.doi.org/10.1190/segam2014-0371.1.
Luo, S., and P. Sava, 2011, A deconvolution-based objective function for wave-equation inversion: 81st
Annual International Meeting, SEG, Expanded Abstracts, 2788–2792,
http://dx.doi.org/10.1190/1.3627773.
Ma, Y., and D. Hale, 2013, Wave-equation reflection traveltime inversion with dynamic warping and full-
waveform inversion: Geophysics, 78, R223-R233, https://doi.org/10.1190/geo2013-0004.1.
Monge, G., 1781, Mémoire sur la théorie des déblais et des remblais: De l'Imprimerie Royale.
Kantorovich, L. V., 1942, On the translocation of masses: In Dokl. Akad. Nauk SSSR, 37, 199–201.
Qiu, L., N. Chemingui, Z. Zou, and A. Valenciano, 2016, Full-waveform inversion with steerable
variation regularization: 86th Annual International Meeting, SEG, Expanded Abstracts, 1174–
1178, http://dx.doi.org/10.1190/segam2016-13872436.1.

Analysis of optimal transport and related misfit functions in FWI
Yunan Yang and Björn Engquist, The University of Texas at Austin
SUMMARY

We summarize and compare four different misfit functions for full waveform inversion (FWI): the conventional least-squares norm, the integral wavefields misfit functional, the Normalized Integration Method (NIM) and the quadratic Wasserstein metric. The integral wavefields misfit functional and NIM are equivalent to the norm for Sobolev space, which has intrinsic connections with the quadratic Wasserstein metric. We extract two important features of optimal transport. The first one is integration of data, which reduces high frequencies and globally compares observed and synthetic seismic waveforms. The other is rescaling of the data to be nonnegative. Numerical results illustrate that FWI with the quadratic Wasserstein metric can effectively overcome the cycle skipping problem. A mathematical study on the convexity of the four misfit functions demonstrates the importance of data nonnegativity and integration in dealing with local minima in inversion.

INTRODUCTION

Full waveform inversion (FWI) is a data-driven method to obtain high-resolution subsurface properties by minimizing the difference between observed and synthetic seismic waveforms (Virieux et al., 2017). In the past three decades, the least-squares norm ($L^2$) has been widely used as a misfit function (Tarantola and Valette, 1982; Lailly, 1983), which is known to suffer from cycle skipping issues with local minimum trapping and sensitivity to noise (Virieux and Operto, 2009). Other misfit functions proposed in the literature include the $L^1$ norm (Brossier et al., 2010), the Huber norm (Ha et al., 2009), filter-based misfit functions (Warner and Guasch, 2014; Zhu and Fomel, 2016), seismic envelope inversion (Luo and Wu, 2015) and some others.

A recently introduced class of misfit functions is optimal-transport related (Engquist and Froese, 2014; Métivier et al., 2016; Engquist et al., 2016; Métivier et al., 2016; Yang et al., 2016). As a useful tool from the theory of optimal transport, the quadratic Wasserstein metric ($W_2$) computes the optimal cost of rearranging one distribution into another with a quadratic cost function, while the 1-Wasserstein metric ($W_1$) uses an absolute value cost function.

In this paper, we will also discuss the Normalized Integration Method (NIM), which computes the least-squares difference between two normalized data sets (Liu et al., 2012; Chauris et al., 2012; Donno et al., 2013). If we consider the data to be properly rescaled, the misfit of NIM is the norm of the Sobolev space $H^{-1}$ in mathematics. The connection between $W_2$ and $H^{-1}$ is not obvious from the optimal transport definition, but

The goal of this paper is to analyze important features of optimal transport and to compare with methods introduced earlier. We focus on two features in particular. One is integration of data and the other is the need to rescale the data to be nonnegative. Integration provides a global comparison between observed and synthetic data and also shifts the focus to lower frequencies. Nonnegativity further reduces the risk of cycle skipping.

THEORY

Full waveform inversion is a PDE-constrained optimization problem, minimizing the data misfit $d(f, g)$ by updating the model $m$, i.e.:
$$m^{*} = \underset{m}{\operatorname{argmin}}\; d(f(x_r, t; m), g(x_r, t)), \qquad (1)$$
where $g$ is the observed data, $f$ is the simulated data, $x_r$ are the receiver locations, and $m$ is the model parameter. We get the modeled data $f(x, t; m)$ by solving a wave equation with a finite difference method (FDM) in both the space and time domain (Alford et al., 1974).

The generalized least squares functional is a weighted sum of the squared errors and hence a generalized version of the standard least squares misfit function. The formulation is
$$J_1(m) = \sum_r \int \left| W(f(x_r, t; m)) - W(g(x_r, t)) \right|^2 dt, \qquad (2)$$
where $W$ is an operator. In the conventional $L^2$ misfit, $W = I$, the identity operator.

The integral wavefields misfit functional (Huang et al., 2014) is a generalized least squares functional applied to full-waveform inversion (FWI) with weighting operator $W(u) = \int_0^t u(x, \tau)\,d\tau$. The objective function is defined as
$$J_2(m) = \sum_r \int \left| \int_0^t f(x_r, \tau; m)\,d\tau - \int_0^t g(x_r, \tau)\,d\tau \right|^2 dt. \qquad (3)$$

If we define the integral wavefields $U(x, t) = \int_0^t u(x, \tau)\,d\tau$, then misfit function (3) is the ordinary least squares misfit between the observed and predicted integral wavefields $\int_0^t g(x_r, \tau)\,d\tau$ and $\int_0^t f(x_r, \tau; m)\,d\tau$. The integral wavefields still satisfy the original acoustic wave equation with a different source term: $\delta(\vec{x} - \vec{x}_s) \int_0^t s(\tau)\,d\tau = \delta(\vec{x} - \vec{x}_s)\, H(t) * s(t)$, where $s$ is the original source term and $H(t)$ is the Heaviside step function (Huang et al., 2014).

Normalized Integration Method (NIM) is another generalized least squares functional, similar to the integral wavefields mis-
is clear from the 1D closed solution formula. We shall also
fit functional. However, compared with integral wavefields
see that this is valid in higher dimensions even if there is no
misfit functional which directly integrates the observed and
explicit solution formula.

synthetic data in time, NIM first preconditions the data and then takes the integration. The objective function is

$$ J_3(m) = \frac{1}{2} \sum_r \int \left| Q(f(x_r, t; m)) - Q(g(x_r, t)) \right|^p dt, \tag{4} $$

where $Q$ is a transformation of the wavefield $u$, defined as

$$ Q(u)(x_r, t) = \frac{\int_0^t P(u)(x_r, \tau)\,d\tau}{\int_0^T P(u)(x_r, \tau)\,d\tau}. \tag{5} $$

The operator $P$ is included to make the data nonnegative. Three common choices are $P_1(u) = |u|$, $P_2(u) = u^2$ and $P_3(u) = E(u)$, which correspond to the absolute value, the square and the envelope of the signal (Liu et al., 2012).

Although both methods measure the $L^2$ misfit, NIM differs from conventional FWI in three respects: the data sets are normalized to be nonnegative, mass balanced, and integrated in time. The first two are exactly the prerequisites of optimal transport based misfit functions, i.e. the Wasserstein metrics.

Optimal transport

Optimal transport refers to the problem of seeking the minimum cost required to transport the mass of one distribution into another given a cost function, e.g. $|x - y|^p$. The mathematical definition of the distance between the distributions $f : X \to \mathbb{R}^+$ and $g : Y \to \mathbb{R}^+$ can then be formulated as

$$ W_p^p(f, g) = \inf_{T_{f,g} \in \mathcal{M}} \int_X \left| x - T_{f,g}(x) \right|^p f(x)\,dx, \tag{6} $$

where $\mathcal{M}$ is the set of all maps $T_{f,g}$ that rearrange the distribution $f$ into $g$ (Villani, 2003).

The optimal transport formulation requires nonnegative distributions and equal total masses, $\int f(x)\,dx = \int g(x)\,dx$, which are not natural for seismic signals. Therefore a proper data normalization is required before inversion. Data sets $f$ and $g$ can be rescaled to be nonnegative with values in the range [0, 1] and to have equal mass. This step is exactly the same as the one in Equation (5) in NIM.

We can compare the data trace by trace and use the Wasserstein metric ($W_p$) in 1D to measure the misfit. The overall misfit is then

$$ J_4(m) = \sum_{r=1}^{R} W_p^p\big(f(x_r, t; m),\, g(x_r, t)\big), \tag{7} $$

where $R$ is the total number of traces. In this paper, we mainly discuss the quadratic Wasserstein metric ($W_2$), i.e. $p = 2$ in (6) and (7).

PROPERTIES

Next we discuss the similarities and differences among the misfit functions mentioned above. We regard $f$ and $g$ as the synthetic and observed data from a single trace, i.e. as a 1D problem.

Relations among misfit functions

Conventional full-waveform inversion measures the $L^2$ norm difference $\int |f(t) - g(t)|^2 dt$, indicated by the shaded part in Figure 1a. The integral wavefields misfit functional first integrates $f$ and $g$ in time, and then measures their $L^2$ misfit (3). The integral wavefields can be viewed as wavefields produced by a low-passed seismic wavelet. The lower frequency components created in this way (Figure 1b) explain the improvement in inversion (Huang et al., 2014).

[Figure 1 panels: (a) Misfit for L2 norm; (b) Misfit for Integral L2 method; (c) Comparison among NIM, W2, W1 in 1D]

Figure 1: The shaded areas represent the mismatch each misfit function considers. (a) $L^2$: $\int (f - g)^2 dt$. (b) Integral wavefields method: $\int (\int f - \int g)^2 dt$. (c) After data normalization, NIM measures $\int (F - G)^2 dt$, while $W_2$ considers $\int (F^{-1} - G^{-1})^2 dt$ and $W_1$ considers $\int |F^{-1} - G^{-1}|\,dt$.

With a proper normalization method, it is possible to scale the data to have nonnegativity and mass balance. This step is essential for both NIM and $W_2$. Since processing data trace by trace is a 1D problem, we are able to solve the optimal transport problem exactly (Villani, 2003). The optimal map is the unique monotone rearrangement of the density $f$ into $g$. In order to compute the quadratic Wasserstein metric, we need the cumulative distribution functions $F$ and $G$ and their inverses $F^{-1}$ and $G^{-1}$. The explicit formulation for the 1D Wasserstein metric is

$$ W_p^p(f, g) = \int_0^1 \left| F^{-1}(x) - G^{-1}(x) \right|^p dx. \tag{8} $$

The interesting fact is that $W_2$ computes the $L^2$ misfit between $F^{-1}$ and $G^{-1}$ (Figure 1c), while the objective function of NIM measures the $L^2$ misfit between $F$ and $G$, i.e. $\int_0^T |F(t) - G(t)|^2 dt$ (Figure 1c). This is identical to the norm of the Sobolev space $H^{-1}$, $\|f - g\|^2_{H^{-1}}$, given that $f$ and $g$ are nonnegative and share equal mass.

Since $F$ and $G$ are both monotone increasing, one can show that there is an equivalence between the NIM and $W_2$ misfits with

the same data normalization. Another demonstration of the similarity between NIM and optimal transport based metrics arises when $p = 1$ in (4) and (8). The two misfits are then identical, since $\int |F(t) - G(t)|\,dt = \int_0^1 |F^{-1}(x) - G^{-1}(x)|\,dx$.

Mathematical connection between H^{-1} norm and W2 norm

Next we move to the general case in which $f$ and $g$ are synthetic and observed data in higher dimensions that satisfy nonnegativity and conservation of mass. To compute the quadratic Wasserstein metric, we solve the following Monge-Ampère equation (Brenier, 1991):

$$ \det(D^2 u(x)) = f(x) / g(\nabla u(x)). \tag{9} $$

If $f$ and $g$ are close enough and $g = (1 + \varepsilon h + O(\varepsilon^2)) f$, where $h$ has mean zero, we can linearize (9) and also derive an approximation of the quadratic Wasserstein metric between $f$ and $g$ (Villani, 2003, p. 126-127):

$$ W_2^2(f, g) \approx \int_{\mathbb{R}^n} |\nabla \phi(x)|^2 f(x)\,dx = \|f - g\|^2_{H^{-1}(d\mu)}, \tag{10} $$

where $d\mu = f(x)\,dx$. In short, the quadratic Wasserstein metric is a weighted $H^{-1}$ norm.

Besides, the dynamical characterization of the Wasserstein metric proposed by Benamou and Brenier (2000) suggests that $H^{-1}$ and $W_2$ belong to the same class of measures. One can refer to Dolbeault et al. (2009) and Cardaliaguet et al. (2012) for more theoretical details, and Papadakis et al. (2014) for computational schemes. Mathematically, the misfits computed by NIM and $W_2$ are also close in higher dimensions.

Convexity

In order to illustrate the convexity of the different objective functions, we borrow an example from Engquist and Froese (2014) that compares the misfit between a Ricker wavelet $f$ and its shift $f(x - s)$. One can refer to the blue and red curves in Figure 1a. We plot the data misfits as a function of $s$ in Figure 2. Conventional $L^2$, integral $L^2$ and NIM are compared in the first row. The second row displays $W_2$ misfits with three different scaling functions.

[Figure 2 panels: misfit versus shift for conventional L2, integral L2, NIM with p(x) = x^2, and W2 with p(x) = x^2, p(x) = ax + b and p(x) = exp(c*x)]

Figure 2: The misfit between $f(x)$ and $f(x - s)$ by six different misfit functions. First row shows conventional $L^2$ (left), integral wavefield method (middle) and NIM with $p(x) = x^2$ (right). Second row shows the $W_2$ misfit with different normalization methods: $p(x) = x^2$ (left), $ax + b$ (middle) and $\exp(c \cdot x)$ (right).

[Figure 3 panels: misfit surfaces over the model parameters for (a) L2 misfit and (b) W2 misfit]

Figure 3: (a) Convexity plot of conventional $L^2$ (b) Convexity plot of trace-by-trace $W_2$ with normalization $p_2(x) = ax + b$

The top-left panel, for the conventional $L^2$, motivated Engquist and Froese (2014) to bring the quadratic Wasserstein metric into seismic inversion. The many local minima in this panel are unfavorable for gradient-based optimization. The top-middle panel is the result of the integral wavefields misfit functional. It creates lower frequency components, which decreases the chance of cycle skipping. Although it has fewer local minima than conventional $L^2$, this method is still ill-posed in inversion. Integrating the wavefields or integrating the source may help invert the low-wavenumber components of velocity, but the inversion still suffers from cycle skipping.

As demonstrated by Engquist et al. (2016), the squared Wasserstein metric has several properties that make it attractive as a choice of misfit function. One highly desirable feature is its convexity with respect to several parameterizations. However, the convexity depends strongly on the data normalization method used to satisfy nonnegativity and mass balance. The curves in the second row of Figure 2 are the $W_2^2$ distance with different scaling functions: $p_1(x) = x^2$, $p_2(x) = ax + b$ and $p_3(x) = \exp(c \cdot x)$. Theoretically, $p_1$ gives perfect convexity, but it causes difficulties in inversion with the adjoint-state method. By Taylor expansion, $p_3$ is very close to $p_2$ when $c$ is small, but it blows up easily for large $c$. Our current choice is to normalize the data with $p_2$, but it is worth devising a new normalization function that preserves the convexity better.

It is interesting to compare the graph for NIM (upper right)

with the one for $W_2$ (lower left), both of which use the same normalization function ($p_1$) and are globally convex with respect to the shift $s$. When $f(x)$ and $f(x - s)$ are close (i.e. $|s|$ is small), $W_2$ is a weighted $H^{-1}$ norm, as (10) states. Both curves have good convexity, behaving as $O(s^2)$ around zero. As $|s|$ gets larger, $W_2^2(f, f_s)$ is still $O(s^2)$, while the misfit measured by NIM is $O(s)$. The convexity of NIM thus becomes somewhat weaker.

Finally, we present a convexity result in the model domain. We borrow the example from Métivier et al. (2016). The velocity model is assumed to vary linearly in depth as $v(x, z) = v_{p,0} + \alpha z$, where $v_{p,0}$ is the starting velocity at the surface, $\alpha$ is the vertical gradient and $z$ is depth. The reference for $(v_{p,0}, \alpha)$ is (2 km/s, 0.7 s$^{-1}$), and we plot the misfit curves with $\alpha \in [0.4, 1]$ and $v_{p,0} \in [1.75, 2.25]$ on a 41 × 45 grid in Figure 3. We observe many local minima and maxima in Figure 3a. Although $W_2$ is not convex in the data domain with normalization method $p_2(x) = ax + b$ (Figure 2), the curve for $W_2$ (Figure 3b) is globally convex in the model parameters $v_{p,0}$ and $\alpha$. This demonstrates the capacity of $W_2$ to mitigate cycle skipping.

NUMERICAL EXAMPLE

In this section, we use a part of the BP 2004 benchmark velocity model (Billette and Brandsberg-Dahl, 2005) (Figure 4a) and an initial model without the upper salt part (Figure 4b) to perform inversion with the $W_2$ and $L^2$ norms, respectively. A fixed-spread surface acquisition is used, involving 11 shots located every 1.6 km on top. A Ricker wavelet centered on 5 Hz is used to generate the synthetic data, with a bandpass filter keeping only the 3 to 9 Hz components. We stopped the inversion after 300 L-BFGS iterations.

Here we precondition the data with the function $p_2(x) = ax + b$ to satisfy the nonnegativity and mass balance required in optimal transport. Inversion with the trace-by-trace $W_2$ norm successfully reconstructs the shape of the salt bodies (Figure 4d), while FWI with the conventional $L^2$ norm fails to recover the boundaries of the salt bodies, as shown in Figure 4c.

[Figure 4 panels (a)-(d): velocity sections, horizontal axis x (km), vertical axis z (km)]

Figure 4: (a) True model velocity (b) Initial velocity (c) Inversion result using $L^2$ (d) Inversion result using $W_2$

CONCLUSION

In this paper, we summarize and compare four misfit functions: the conventional least-squares inversion ($L^2$), the integral wavefields misfit function, the Normalized Integration Method (NIM), and the quadratic Wasserstein metric ($W_2$) from optimal transport. The $L^2$ norm is popular in general inverse problems, but suffers from cycle skipping in seismic inversion. The other three methods all incorporate the idea of integrating the waveforms. Integration helps enhance the lower frequency components, but cannot avoid local minima coming from the oscillatory periodicity of the data. Ideally, a preconditioning operator would “break” the periodicity and “record” the previous data information in time.

One solution is to combine nonnegativity and integration in time. Both NIM and the quadratic Wasserstein metric include these ideas as essential steps. A detailed discussion illustrates that the quadratic Wasserstein metric and the $H^{-1}$ norm which NIM computes belong to the same family of mathematical measures. Moreover, $H^{-1}$ and $W_2$ become equivalent when the two data sets are close. The analysis of these misfit functions for FWI brings additional insights into the importance of seismic data preconditioning, which can also be seen in examples of large-scale FWI.

ACKNOWLEDGMENTS

We thank Sergey Fomel, Junzhe Sun and Zhiguang Xue for helpful discussions, and thank the sponsors of the Texas Consortium for Computational Seismology (TCCS) for financial support. This work was also partially supported by NSF DMS-1620396.
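The shifted-wavelet comparison behind Figure 2 can be reproduced approximately in a few lines. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the time window, the sampling interval and the 5 Hz peak frequency are illustrative choices, the normalization is fixed to $P(u) = u^2$, and the inverse cumulative distributions in Equation (8) are approximated by linear interpolation with `numpy.interp`.

```python
import numpy as np


def ricker(t, f0=5.0, t0=0.0):
    """Ricker wavelet with peak frequency f0 (Hz), centred at time t0 (s)."""
    a = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


def l2_misfit(f, g, dt):
    """Conventional L2 misfit: integral of (f - g)^2 dt."""
    return np.sum((f - g) ** 2) * dt


def integral_l2_misfit(f, g, dt):
    """Equation (3) for one trace: L2 misfit of the time-integrated traces."""
    F = np.cumsum(f) * dt
    G = np.cumsum(g) * dt
    return np.sum((F - G) ** 2) * dt


def normalized_cdf(u, dt):
    """Q(u) of Equation (5) with the assumed choice P(u) = u^2."""
    c = np.cumsum(u ** 2) * dt
    return c / c[-1]


def nim_misfit(f, g, dt):
    """NIM misfit with p = 2: integral of (F - G)^2 dt."""
    return np.sum((normalized_cdf(f, dt) - normalized_cdf(g, dt)) ** 2) * dt


def w2_squared(f, g, t, dt, n_quantiles=2000):
    """Squared 1D W2 of Equation (8): integral of (F^-1 - G^-1)^2 over [0, 1]."""
    y = (np.arange(n_quantiles) + 0.5) / n_quantiles  # midpoint rule on (0, 1)
    f_inv = np.interp(y, normalized_cdf(f, dt), t)    # F^-1(y) by interpolation
    g_inv = np.interp(y, normalized_cdf(g, dt), t)    # G^-1(y) by interpolation
    return np.mean((f_inv - g_inv) ** 2)


def misfit_curves(shifts, dt=0.001):
    """Misfit between a Ricker wavelet and its shifted copy for each shift s."""
    t = np.arange(-2.0, 2.0 + dt / 2, dt)
    f = ricker(t)
    curves = {"L2": [], "integral L2": [], "NIM": [], "W2": []}
    for s in shifts:
        g = ricker(t, t0=s)
        curves["L2"].append(l2_misfit(f, g, dt))
        curves["integral L2"].append(integral_l2_misfit(f, g, dt))
        curves["NIM"].append(nim_misfit(f, g, dt))
        curves["W2"].append(w2_squared(f, g, t, dt))
    return {name: np.array(values) for name, values in curves.items()}


if __name__ == "__main__":
    shifts = np.arange(0.0, 1.0 + 1e-9, 0.02)
    for name, values in misfit_curves(shifts).items():
        print(name, values[:5])
```

Because the two normalized densities are translates of each other, the squared $W_2$ value in this sketch grows as $s^2$, whereas the $L^2$ curve rises and then falls again once the wavelets no longer overlap, which is the cycle-skipping behavior discussed in the Convexity section.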

EDITED REFERENCES
REFERENCES
Alford, R., K. Kelly, and D.M. Boore, 1974, Accuracy of finite-difference modeling of the acoustic wave
equation: Geophysics, 39, 834–842, http://dx.doi.org/10.1190/1.1440470.
Benamou, J.-D., and Y. Brenier, 2000, A computational fluid mechanics solution to the monge-
kantorovich mass transfer problem: Numerische Mathematik, 84, 375–393,
http://dx.doi.org/10.1007/s002110050002.
Billette, F., and S. Brandsberg-Dahl, 2005, The 2004 bp velocity benchmark: Presented at the 67th EAGE
Conference & Exhibition.
Brenier, Y., 1991, Polar factorization and monotone rearrangement of vector-valued functions:
Communications on pure and applied mathematics, 44, 375–417,
http://dx.doi.org/10.1002/(ISSN)1097-0312.
Brossier, R., S. Operto, and J. Virieux, 2010, Which data residual norm for robust elastic frequency-
domain full waveform inversion?: Geophysics, 75, no. 3, R37–R46,
http://dx.doi.org/10.1190/1.3379323.
Cardaliaguet, P., G. Carlier, and B. Nazaret, 2012, Geodesics for a class of distances in the space of
probability measures: Calculus of Variations and Partial Differential Equations, 1–26,
https://doi.org/10.1007/s00526-012-0555-7.
Chauris, H., D. Donno, and H. Calandra, 2012, Velocity estimation with the normalized integration
method: Presented at the 74th EAGE Conference and Exhibition incorporating EUROPEC 2012,
https://doi.org/10.3997/2214-4609.20148721.
Dolbeault, J., B. Nazaret, and G. Savaré, 2009, A new class of transport distances between measures:
Calculus of Variations and Partial Differential Equations, 34, 193–231,
http://dx.doi.org/10.1007/s00526-008-0182-5.
Donno, D., H. Chauris, and H. Calandra, 2013, Estimating the background velocity model with the
normalized integration method: 75th Annual International Conference and Exhibition
incorporating SPE EUROPEC, EAGE, Extended Abstracts,
http://dx.doi.org/10.3997/2214-4609.20130411.
Engquist, B., and B. D. Froese, 2014, Application of the Wasserstein metric to seismic signals:
Communications in Mathematical Sciences 12, https://doi.org/10.4310/cms.2014.v12.n5.a7.
Engquist, B., B. D. Froese, and Y. Yang, 2016, Optimal transport for seismic full waveform inversion:
Ha, T., W. Chung, and C. Shin, 2009, Waveform inversion using a back-propagation algorithm and a
huber function norm: Geophysics, 74, no. 3, R15–R24, http://dx.doi.org/10.1190/1.3112572.
Huang, G., H. Wang, and H. Ren, 2014, Two new gradient precondition schemes for full waveform
inversion: arXiv preprint arXiv:1406.1864.
Lailly, P., 1983, The seismic inverse problem as a sequence of before stack migrations: Conference on
inverse scattering: Theory and application: Society for Industrial and Applied Mathematics, 206–
220.
Liu, J., H. Chauris, and H. Calandra, 2012, The normalized integration method-an alternative to full
waveform inversion?: Presented at the 25th Symposium on the Application of Geophysics to
Engineering & Environmental Problems, https://doi.org/10.3997/2214-4609.20144373.

Luo, J., and R.-S. Wu, 2015, Seismic envelope inversion: Reduction of local minima and noise resistance:
Geophysical Prospecting, 63, 597–614, http://dx.doi.org/10.1111/1365-2478.12208.
Métivier, L., R. Brossier, Q. Mérigot, E. Oudet, and J. Virieux, 2016, Measuring the misfit between
seismograms using an optimal transport distance: Application to full waveform inversion:
Geophysical Journal International, 205, 345–377, http://dx.doi.org/10.1093/gji/ggw014.
Métivier, L., R. Brossier, Q. Mérigot, E. Oudet, and J. Virieux, 2016, An optimal transport approach for
seismic tomography: application to 3d full waveform inversion: Inverse Problems, 32, 115008,
http://dx.doi.org/10.1088/0266-5611/32/11/115008.
Papadakis, N., G. Peyré, and E. Oudet, 2014, Optimal transport with proximal splitting: SIAM Journal on
Imaging Sciences, 7, 212–238, http://dx.doi.org/10.1137/130920058.
Tarantola, A., and B. Valette, 1982, Generalized nonlinear inverse problems solved using the least
squares criterion: Reviews of Geophysics, 20, 219–232,
http://dx.doi.org/10.1029/RG020i002p00219.
Villani, C., 2003, Topics in optimal transportation: American Mathematical Society, Graduate Studies in
Mathematics 58.
Virieux, J., A. Asnaashari, R. Brossier, L. Métivier, A. Ribodetti, and W. Zhou, 2017, In 6. An
introduction to full waveform inversion: Encyclopedia of Exploration Geophysics, R1-1–R1-40,
http://dx.doi.org/10.1190/1.9781560803027.entry6.
Geophysics, 74, no. 6, WCC1–WCC26, http://dx.doi.org/10.1190/1.3238367.
Warner, M., and L. Guasch, 2014, In Adaptive waveform inversion: Theory: 84th Annual International
Yang, Y., B. Engquist, J. Sun, and B.-D. Froese, 2016, Application of optimal transport and the quadratic
wasserstein metric to full-waveform inversion: arXiv preprint arXiv:1612.05075.
Zhu, H., and S. Fomel, 2016, Building good starting models for full-waveform inversion using adaptive
matching filtering misfit: Geophysics, 81, no. 5, U61–U72,
http://dx.doi.org/10.1190/geo2015-0596.1.

Time-domain broadband phase-only full-waveform inversion with implicit shaping
Musa Maharramov∗ , Anatoly I. Baumstein, Yaxun Tang, Partha S. Routh, Sunwoong Lee, and Spyros K. Lazaratos,
ExxonMobil Upstream Research Company
SUMMARY

We propose a new full-waveform inversion (FWI) method that approximates broadband tomographic inversion and has a reduced sensitivity to errors in the observed-data amplitude information. The method is based on fitting the observed-data phase spectrum while automatically shaping forward-modeled wave fields to the observed-data amplitude spectrum. This is achieved by using a phase-only objective function that allows broadband time-domain inversion of the observed-data phase information. We demonstrate our method’s reduced sensitivity to dynamic information under the traveltime approximation, and compare the new objective function to the normalized $L^2$ FWI objective function in an experiment on synthetic data with a frequency-dependent attenuation.

INTRODUCTION

Inversion of subsurface velocity models from seismic data using both phase and amplitude content of the observed data is sensitive to data quality and accuracy of the underlying mathematical model of wave propagation. For example, inverting amplitude-versus-offset (AVO) effects requires full elastic modeling; anisotropic phase-and-amplitude FWI requires accurate modeling of wavefield amplitudes in anisotropic media; unknown or changing amplitude spectra of the source often need to be taken into account to resolve subtle (for example, time-lapse) effects; transmission or absorption attenuation effects may easily leak into the inverted velocity models unless an adequate Q model is used. However, a good deal of information about subsurface velocity models can be, and routinely is, extracted from purely kinematic data, such as arrival times. Tomographic techniques based on this approach are very successful in practice and form the backbone of many existing velocity model building methods, but may require a significant amount of manual picking and analysis. Our objective is to achieve automated time-domain full-waveform inversion of subsurface velocity models using mostly kinematic information contained in broadband seismic data, while largely ignoring dynamic (amplitude) information.

One approach to constructing true kinematic FWI objective functions is based on extracting traveltime differences between the observed and predicted data by cross-correlation (Luo and Schuster, 1991; Gee and Jordan, 1992; Van Leeuwen and Mulder, 2010), and is sensitive to noise in the data and ambiguity in event picking. Another approach is based on constructing phase and amplitude misfits in time and frequency domains (Fichtner and Igel, 2008; Bozdağ et al., 2011; Fichtner, 2011). The latter approach, as well as the normalized $L^2$ FWI (Routh et al., 2011), do not fully separate kinematic and dynamic information, but provide an approximation to kinematic inversion by using phase as a proxy for traveltime information. However, due to the lack of full separation, amplitude errors in the observed or predicted data leak into the inversion result, limiting the utility of such methods. In this work we propose an objective function for a broadband FWI that fully separates kinematic and dynamic information for a single transmission or reflection event even in cases of dispersive propagation. We demonstrate that the proposed method still outperforms a popular alternative in a realistic example with multiple reflection and transmission events, and in the presence of amplitude attenuation.

THEORY

Our method is based on minimizing the following objective function that represents a weighted phase misfit between the observed and forward-modeled data:

$$ \left\| \, |\hat{d}| \left( \frac{\hat{u}}{|\hat{u}|} - \frac{\hat{d}}{|\hat{d}|} \right) \right\|^2 = \int \left| \, |\hat{d}| \left( \frac{\hat{u}}{|\hat{u}|} - \frac{\hat{d}}{|\hat{d}|} \right) \right|^2 d\omega, \tag{1} $$

where $d = d(t, s, r)$ stands for the observed data trace for a single source-receiver pair $s, r$, and $u = u(t, s, r)$ is predicted (forward-modeled) data. The “hats” above the wave field symbols denote temporal Fourier transforms. Note that in the above objective function the amplitude spectrum of the predicted data is shaped to the amplitude spectrum of the observed data, thus always matching it. Ignoring multipathing (i.e., assuming a single transmitted or reflected wave) the observed data response to a delta-function source is asymptotically given by the following traveltime (high-frequency or WKB) approximation:

$$ \hat{d}(\omega, s, r) = A_d(\omega, s, r) \exp\left[ i \omega \tau_d(\omega, s, r) \right], \tag{2} $$

where $\tau_d(\omega, s, r)$ is the observed traveltime between source $s$ and receiver $r$, and $A_d(\omega, s, r)$ is the observed wave field amplitude. Note that in Equation 2 we allow frequency-dependent kinematic and dynamic propagation effects, such as attenuation due to transmission and absorption. Similarly, for the forward-modeled wave field, we have

$$ \hat{u}(\omega, s, r) = A_u(\omega, s, r) \exp\left[ i \omega \tau_u(\omega, s, r) \right]. \tag{3} $$

Substituting Equations 2 and 3 into Equation 1, for the misfit functional we obtain

$$ \int A_d^2 \left| \exp\left[ i \omega \tau_u \right] - \exp\left[ i \omega \tau_d \right] \right|^2 d\omega, \tag{4} $$

where the dependence of $A_d$, $\tau_d$, and $\tau_u$ on $\omega$, $s$, and $r$ is omitted for brevity. The integrand in Equation 4 is a measure of the misfit between the observed and predicted traveltimes $\tau_d$ and $\tau_u$ modulo $2\pi/\omega$, weighted by the observed wave field amplitude. The objective function becomes zero if the traveltime is accurately predicted regardless of any errors in either forward-modeled $A_u$ or observed $A_d$ amplitudes. In this sense
Equation 1 provides a true “kinematic” objective function under the assumption of single-event asymptotic representations in Equations 2 and 3. Of course, instead of Equation 1 we can use a simple frequency-domain normalized misfit

$$ \left\| \frac{\hat{u}}{|\hat{u}|} - \frac{\hat{d}}{|\hat{d}|} \right\|^2 \tag{5} $$

in either time or frequency-domain FWI. As with Equation 1, Equation 5 provides an amplitude-insensitive objective function for fitting single events of Equations 2 and 3. However, our proposed objective function in Equation 1 provides a broadband inversion over an arbitrary range of frequencies with the data amplitude spectrum acting as a frequency-dependent misfit weight. Any desired shaping can be applied to the observed data amplitude spectrum in a single data preprocessing step, for example, boosting lower frequencies to improve FWI convergence and reduce sensitivity to cycle skips (Lazaratos et al., 2011; Plessix and Li, 2013). Another advantage of the new objective function is its ability to handle singularities in the normalized observed wave field due to notches in the amplitude spectrum in the presence of complex multipathing. In any realistic experiment the observed wave field is the sum of multiple events of Equation 2, representing various transmitted and reflected waves,

$$ \hat{d}(\omega, s, r) = \sum_j A_d^j(\omega, s, r) \exp\left[ i \omega \tau_d^j(\omega, s, r) \right], \tag{6} $$

with various traveltimes $\tau_d^j$ and amplitudes $A_d^j$. Complex multipathing may result in the amplitude spectrum of the multiple-event trace of Equation 6 being close to zero, leading to errors in the normalized wave fields in Equations 1 and 5. Weighting of the integrand in Equation 1 by the observed data amplitude spectrum effectively eliminates the contribution of such singularities. It must be noted that when a single-event asymptotic of Equation 2 is not valid, our method can no longer be regarded as a true kinematic inversion. Indeed, the phase spectrum of a multiple-event trace in Equation 6 is determined by both traveltimes and amplitudes of the individual events. Amplitude errors in either forward-modeled or observed data leak into the phase spectrum and inverted models, reducing the contribution of the weakest events to the inversion. This obvious limitation of the proposed method can be partially overcome using data masking for isolating individual events of interest as, for example, in a target-oriented or time-lapse inversion.

One existing broadband alternative to the proposed method is the normalized $L^2$ objective function (Routh et al., 2011),

$$ \left\| \frac{u}{\|u\|} - \frac{d}{\|d\|} \right\|^2 = \int \left| \frac{u}{\|u\|} - \frac{d}{\|d\|} \right|^2 dt. \tag{7} $$

For a single-event asymptotic of Equation 2 and in the absence of frequency dispersion (i.e., when both traveltime $\tau_d$ and amplitude $A_d$ are independent of $\omega$), the normalized $L^2$ objective function of Equation 7 yields a purely kinematic misfit equivalent to that of Equation 1. However, for a dispersive propagation with the amplitude $A_d$ as a function of frequency, or in the presence of a frequency-dependent noise, division by the full trace norm in Equation 7 may not completely remove dynamic effects, resulting in a leakage of amplitude information into the misfit. If forward modeling is not dynamically accurate, this leakage may result in significant inversion errors.

EXAMPLES

We conduct time-domain full-waveform inversion experiments with the objective functions in Equations 1 and 7 using synthetic data with and without amplitude attenuation. The adjoint source for the new objective function is computed as

$$ f(t, s, r) = F^{-1}_{\omega \to t} \left\{ \frac{2i}{\hat{u}}\, \mathrm{Im} \left[ w \hat{u}\, \overline{\left( w \hat{u} - \hat{d} \right)} \right] \right\}, \tag{8} $$

where $w(\omega) = |\hat{d}| / |\hat{u}|$, the overline denotes complex conjugation, and $F^{-1}_{\omega \to t}$ is the inverse Fourier transform. We generated synthetic data using acoustic modeling with density. The true velocity model used in our experiments is shown in Figure 1; the true density model is obtained from the true velocity model by dividing it by 1500 (setting water density to 1). Both forward modeling and inversion are performed on a 1000 (horizontal) by 800 (vertical) computational grid with a 10 m horizontal and 5 m vertical spacing. A Ricker wavelet centered at 10 Hz is used for the source, and absorbing boundary conditions are applied at the surface to avoid surface-related multiples. A streamer acquisition is used with 39 shots and a 260 m shot spacing, with offsets ranging from 10 m to 10 km. We use a starting velocity model obtained from the true model using a 400 m smoothing filter.

Figure 1: The true velocity model. The true density model was obtained from velocity by dividing it by 1500.
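To make the difference between Equations 1 and 7 concrete for a single event, the two objective functions can be evaluated on a synthetic trace whose amplitude spectrum is perturbed while its phase spectrum is kept intact. The sketch below is an illustration under stated assumptions and not the authors' code: the function names, the small stabilizer `eps`, the zero-phase decay `exp(-pi f t*)` with `t_star = 0.03` s, and the restriction of the frequency integral to the non-negative FFT frequencies are all choices made here.

```python
import numpy as np


def ricker(t, f0=10.0, t0=0.5):
    """Ricker wavelet with peak frequency f0 (Hz), centred at time t0 (s)."""
    a = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


def phase_only_misfit(u, d, dt, eps=1e-12):
    """Equation 1, summed over the non-negative FFT frequencies."""
    U = np.fft.rfft(u) * dt  # scaled to approximate the continuous transform
    D = np.fft.rfft(d) * dt
    # Implicit shaping: predicted spectrum takes the observed amplitude |D|.
    shaped = np.abs(D) * U / (np.abs(U) + eps)
    domega = 2.0 * np.pi / (len(u) * dt)
    return np.sum(np.abs(shaped - D) ** 2) * domega


def normalized_l2_misfit(u, d, dt):
    """Equation 7: L2 misfit of traces divided by their full-trace norms."""
    norm_u = np.sqrt(np.sum(u ** 2) * dt)
    norm_d = np.sqrt(np.sum(d ** 2) * dt)
    return np.sum((u / norm_u - d / norm_d) ** 2) * dt


def attenuate_amplitude_only(d, dt, t_star=0.03):
    """Zero-phase, frequency-dependent amplitude decay exp(-pi f t*)."""
    freqs = np.fft.rfftfreq(len(d), dt)
    spectrum = np.fft.rfft(d) * np.exp(-np.pi * freqs * t_star)
    return np.fft.irfft(spectrum, n=len(d))


dt = 0.002
t = np.arange(1000) * dt
d = ricker(t)                                  # "observed" single event
u_attenuated = attenuate_amplitude_only(d, dt)  # correct phase, wrong amplitude
u_shifted = ricker(t, t0=0.52)                 # correct amplitude, wrong traveltime

if __name__ == "__main__":
    print("Eq. 1, amplitude error only:", phase_only_misfit(u_attenuated, d, dt))
    print("Eq. 7, amplitude error only:", normalized_l2_misfit(u_attenuated, d, dt))
    print("Eq. 1, traveltime error    :", phase_only_misfit(u_shifted, d, dt))
```

In this single-event setting the phase-only misfit stays at the level of round-off for the amplitude-perturbed trace and becomes clearly nonzero only when the traveltime is wrong, while the normalized $L^2$ misfit already responds to the frequency-dependent amplitude change.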
We use no frequency continuation and conduct a 0-25 Hz broadband time-domain full-waveform inversion to convergence, using the objective functions of Equations 1 and 7. In our experiments we intentionally avoid density inversion in order to study the effect of amplitude errors on our objective functions. Since the true model generates multiple refraction and reflection events, the neglected density effects are expected to leak into the velocity inversion. Indeed, for the acoustic reflection coefficient in the absence of elastic conversions, for small reflection angles less than ≈ 30° we have (Aki and Richards, 1980; Shuey, 1985):

$$ R(\theta) \approx \frac{1}{2} \left( \frac{\Delta V_P}{V_P} + \frac{\Delta \rho}{\rho} \right) + \frac{1}{2} \frac{\Delta V_P}{V_P} \sin^2 \theta, \tag{9} $$

where $\theta$ is the reflection angle, $V_P$ and $\rho$ are the acoustic velocity and density below a model contrast, and $\Delta V_P$ and $\Delta \rho$ are the velocity and density changes across the model contrast.

Figure 2: The inverted velocity model using the objective function of Equation 1. Density was not inverted and non-constant density effects leaked into the inverted velocity, resulting in sharper model contrasts.

Equation 9 demonstrates that with our choice of the true density model, density contrasts boost the amplitude effects of velocity contrasts, especially at near offsets, and the leakage of density into the inverted acoustic velocity model should result in sharper velocity contrasts. Figure 2 shows the acoustic velocity model obtained by FWI with the proposed phase-only objective function of Equation 1. As expected, the inversion exhibits a good qualitative agreement with the true velocity model of Figure 1; however, velocity contrasts are overpredicted. FWI using the normalized $L^2$ objective function of Equation 7 produced a similar result with no clear advantage to either method, as demonstrated by the forward-modeled traces shown in Figure 3. In the absence of frequency dispersion, both approaches deliver similar results, and both suffer from amplitude leakage into the phase spectrum of multiple-event traces in Equation 6.

Figure 3: Predicted traces at ≈1 km offset using Equation 1 (green) and Equation 7 (blue) versus unattenuated true data (red). In the absence of frequency dispersion the normalized $L^2$ produces results comparable to those obtained using the new objective function.

In our next experiment we apply amplitude attenuation to the synthetic data of the previous experiment using a uniform constant Q of 60. No phase effects of Q are applied as our objective is to mimic kinematically accurate propagators with dynamic errors in either data or the propagator. FWI using Equation 1 produced both qualitatively and quantitatively superior results across all depths as shown in a sample velocity log of Figure 4.

While both methods suffer from amplitude leakage, the phase-
tive to the frequency-continuation approach based on the mis-

fit functional of Equation 5 as described by Maharramov et al.
(2016).
ACKNOWLEDGMENTS
The authors would like to thank Ramesh Neelamani, Ganglin

Chen, Tom Dickens, David Johnston, and Eric Wildermuth for
a number of useful discussions, and ExxonMobil Upstream
Research Company for the permission to publish this work.
Figure 4: Logs of the velocity difference from the starting ve-

locity at 1 km inline coordinate using FWI with Equation 1
(green), Equation 7 (blue), and true velocity model (red). The
phase-only objective function with implicit shaping delivers a
more accurate velocity across all depths in comparison with
the normalized L2 FWI.
only objective function with implicit shaping can at least cor-

rectly separate phase and amplitude information for single ev-
ents, resulting in a reduced sensitivity to amplitude errors, es-
pecially for weaker events. This is demonstrated in Figure 5,
which shows forward-modeled data traces versus unattenuated
true data. The normalized L2 FWI appears to be most sensi-
tive to the strongest events, while the new phase-only method
delivers a better overall kinematic agreement.
CONCLUSIONS
FWI based on our new phase-only objective function with im-

plicit shaping can achieve time-domain broadband inversion
of subsurface models from the phase spectrum of the observed
data. While in the presence of multipathing the method does
not deliver a true kinematic inversion, it is less sensitive to fre-
quency dispersion in the amplitude information in comparison
with the normalized L2 . The new method can deliver quali- Figure 5: Predicted traces at ≈1 km offset using FWI veloc-
tatively and quantitatively better results in the presence of at- ity models from attenuated data with Q = 60 and the objective
tenuation and other dispersive phenomena. Implicit shaping functions of Equation 1 (green) and Equation 7 (blue) versus
to the observed-data amplitude spectrum provides frequency- the unattenuated true data (red). In the presence of frequency
dependent weighting of phase misfits and can help avoid com- dispersion the new method achieves a better kinematic accu-
putationally expensive frequency continuation in time-domain racy in comparison with the normalized L2 FWI.
FWI. Applications of the proposed method can include FWI
using kinematically accurate but dynamically wrong numerical
propagators (for example, pseudo-acoustic anisotropic propa-
gators), inversion of field data with noisy or unreliable ampli-
tude information, kinematic source inversion, and time-lapse
FWI. Application of the proposed method to time-lapse FWI
may be of particular interest, providing a broadband alterna-
REFERENCES
Aki, K., and P. G. Richards, 1980, Quantitative seismology: Theory and methods: W. H. Freeman and
Co.
Bozdag, E., J. Trampert, and J. Tromp, 2011, Misfit functions for full waveform inversion based on
instantaneous phase and envelope measurements: Geophysical Journal International, 185, 845–
870, http://dx.doi.org/10.1111/j.1365-246X.2011.04970.x.
Fichtner, A., 2011, Full seismic modeling and inversion: Springer.
Fichtner, A., and H. Igel, 2008, Efficient numerical surface wave propagation through the optimization of
discrete crustal models — a technique based on non-linear dispersion curve matching (DCM):
Geophysical Journal International, 173, 519–533,
http://dx.doi.org/10.1111/j.1365-246X.2008.03746.x.
Gee, L. S., and T. H. Jordan, 1992, Generalized seismological data functionals: Geophysical Journal
International, 111, 363–390, http://dx.doi.org/10.1111/j.1365-246X.1992.tb00584.x.
Lazaratos, S., I. Chikichev, and K. Wang, 2011, Improving the convergence rate of full wavefield
inversion using spectral shaping: 81st Annual International Meeting, SEG, Expanded Abstracts,
2428–2432, http://dx.doi.org/10.1190/1.3627696.
Luo, Y., and G. T. Schuster, 1991, Wave-equation traveltime inversion: Geophysics, 56, 645–653,
http://dx.doi.org/10.1190/1.1443081.
Maharramov, M., B. L. Biondi, and M. A. Meadows, 2016, Time-lapse inverse theory with applications:
Geophysics, 81, no. 6, R485–R501, http://dx.doi.org/10.1190/geo2016-0131.1.
Plessix, R.-E., and Y. Li, 2013, Waveform acoustic impedance inversion with spectral shaping:
Geophysical Journal International, 195, 301–314, http://dx.doi.org/10.1093/gji/ggt233.
Routh, P. S., J. R. Krebs, S. Lazaratos, A. I. Baumstein, I. Chikichev, N. Downey, D. Hinkley, and J. E.
Anderson, 2011, Full-wavefield inversion of marine streamer data with the encoded simultaneous
source method: 73rd Annual International Conference and Exhibition, EAGE, Extended
Abstracts, F032, http://dx.doi.org/10.3997/2214-4609.20149730.
Shuey, R. T., 1985, A simplification of the Zoeppritz equations: Geophysics, 50, 609–614,
http://dx.doi.org/10.1190/1.1441936.
Van Leeuwen, T., and W. A. Mulder, 2010, A correlation-based misfit criterion for wave-equation
traveltime tomography: Geophysical Journal International, 182, 1383–1394,
http://dx.doi.org/10.1111/j.1365-246X.2010.04681.x.

Robust full-waveform inversion based on particle swarm optimization
Guiting Chen* and Zhenli Wang. Institute of Geology and Geophysics, Chinese Academy of Sciences. University of
Chinese academy of Sciences
Summary

Particle swarm optimization (PSO) is a swarm-intelligence global optimization method that simulates how birds in nature search for food. PSO has a good global search capability and is easy to implement, which makes it well suited to non-linear optimization problems. We propose a novel robust search method for full-waveform inversion based on PSO, which avoids the difficulty that gradient-like methods have in handling the Hessian matrix and is more robust when dealing with a poor initial model. We add the gradient term of full-waveform inversion to the standard PSO to construct a new random search strategy. The gradient term, acting as guidance, can significantly speed up convergence. A smoothed Marmousi model is used as the initial model, and acoustic time-domain full-waveform inversion is performed with both the nonlinear conjugate gradient (NLCG) method and the particle swarm optimization method. The results demonstrate that, with a poor initial velocity model, the NLCG method is prematurely trapped in a local optimum, whereas the full-waveform inversion method based on the particle swarm algorithm can still obtain a satisfactory result, with faster convergence in the early iterations. Thus, PSO is a feasible and effective method for solving the nonlinear inversion problem and provides robust global optimization. It offers a new way to handle the FWI problem when the initial model is poor.

Introduction

The particle swarm optimization algorithm is a random evolutionary computing technique (Kennedy and Eberhart, 1995). It is a simple, easy-to-implement global swarm-intelligence algorithm with a random search strategy and a good convergence speed. A particle's movement is affected by three factors: the best position the particle itself has found in the model domain, the best position the swarm has found, and the particle's inertia. The randomly weighted combination of these three factors gives the particles a robust global search ability (Fernández-Martínez and García-Gonzalo, 2009).

Using swarm-intelligence random evolutionary search algorithms to solve global optimization problems is an active research topic. Geophysical applications of swarm intelligence algorithms have also been presented, including the use of PSO to invert geophysical data (Shaw and Srivastava, 2007), traveltime tomography using PSO (Tronicke et al., 2012), nonlinear joint inversion of tomographic data (Paasche and Tronicke, 2014), and traveltime and constrained AVO inversion using PSO (Agarwal et al., 2016). These studies illustrate that swarm intelligence algorithms are an effective and feasible way to solve geophysical inversion problems.

The classical time-domain full-waveform inversion (FWI) is a method proposed by Tarantola (1984) to refine the migration velocity model by minimizing the misfit between the observed data and the predicted data in the least-squares sense (Symes, 2008). It uses the data residual as the boundary condition for reverse-time propagation and cross-correlates the result with the forward wavefield to calculate the gradient (Lailly, 1983; Tarantola, 1984). In this paper, we propose a full-waveform inversion method based on the particle swarm optimization algorithm. We derive the gradient of the objective function for time-domain full-waveform inversion and integrate the gradient term into the particle swarm algorithm as a guiding term to obtain a time-domain full-waveform inversion scheme. The smoothed Marmousi model is used to test the new algorithm, and the result indicates that it is a feasible method for performing full-waveform inversion.

Method

The particle swarm algorithm is a global random evolutionary computing technique (Kennedy and Eberhart, 1995) that simulates swarm behavior. The position of food in the birds' airspace is assumed to be the global optimal position. By sharing information and using their own cognition, the birds constantly change their search direction. The search direction of each bird is a random weighting of the three factors.

Assuming there are n particles of dimension m, with the position of each particle denoted $x_{id}$ and its velocity $v_{id}$, the kinematic equations of each particle are:

$$v_{id}^{k+1} = \omega\, v_{id}^{k} + c_1\,\mathrm{rand}()\,\bigl(p_{id} - x_{id}^{k}\bigr) + c_2\,\mathrm{rand}()\,\bigl(g_{d} - x_{id}^{k}\bigr) \qquad (1)$$

$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1} \qquad (2)$$

where i is the particle number, i = 1, 2, 3, …, n; d is the dimension, d = 1, 2, 3, …, m; k is the iteration index; $p_{id}$ is the best position found so far by the i-th particle in dimension d; $g_d$ is the best position found so far by all particles in dimension d; ω is the inertia constant; $c_1$ and $c_2$ are cognitive constants; and rand() is a random function between 0 and 1. The positions satisfy $x_{id} \in [X_{low}, X_{up}]$, where $X_{low}$ and $X_{up}$ are the lower and upper boundaries of the model domain. The first term in equation 1 is the inertia term of the particle, the second term is the cognitive term, and the third term is the social component. The movement of a particle is controlled by the vector v, which is determined by combining the inertial, cognitive and social components with a random disturbance.
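
As a minimal sketch of how equations 1 and 2 can be iterated, the following Python function runs a plain PSO loop on a generic objective. The function name `pso_minimize`, the default parameter values, the independent random draw per component, and the clipping of positions to the model domain are illustrative assumptions, not details taken from this paper.

```python
import numpy as np


def pso_minimize(objective, x_low, x_up, n_particles=30, n_iter=100,
                 omega=0.3, c1=0.3, c2=1.2, seed=0):
    """Plain PSO following equations 1 and 2 on a box-bounded model domain."""
    rng = np.random.default_rng(seed)
    x_low = np.asarray(x_low, dtype=float)
    x_up = np.asarray(x_up, dtype=float)

    # Random initial positions inside the model domain, zero initial velocity.
    x = rng.uniform(x_low, x_up, size=(n_particles, x_low.size))
    v = np.zeros_like(x)

    # Best position found so far by each particle (p) and by the swarm (g).
    p_best = x.copy()
    p_cost = np.array([objective(xi) for xi in x])
    g_best = p_best[np.argmin(p_cost)].copy()
    g_cost = float(p_cost.min())
    history = [g_cost]

    for _ in range(n_iter):
        r1 = rng.random(x.shape)  # rand() of the cognitive term
        r2 = rng.random(x.shape)  # rand() of the social term
        # Equation 1: inertia + cognitive + social components.
        v = omega * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        # Equation 2, followed by clipping to the domain (an assumption).
        x = np.clip(x + v, x_low, x_up)

        # Update the personal and global best positions.
        cost = np.array([objective(xi) for xi in x])
        improved = cost < p_cost
        p_best[improved] = x[improved]
        p_cost[improved] = cost[improved]
        if p_cost.min() < g_cost:
            g_best = p_best[np.argmin(p_cost)].copy()
            g_cost = float(p_cost.min())
        history.append(g_cost)

    return g_best, g_cost, history
```

A call such as `pso_minimize(lambda x: float(np.sum(x**2)), [-5.0, -5.0], [5.0, 5.0])` returns the best position, its cost, and the history of the swarm-best cost, which by construction never increases.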

Numerous studies have shown that the size of the swarm has a large effect on the result, while the parameters ω, $c_1$ and $c_2$ critically influence the performance of the algorithm and determine whether it converges. Van den Bergh (2002) suggests that the best convergence behavior is obtained when the PSO parameters satisfy:

$$\frac{c_1 + c_2}{2} - 1 < \omega$$

The criteria for updating the optimal locations are:

$$\text{if } F(x_i) < F(p_i):\quad p_i = x_i \qquad (3)$$

$$\text{if } F(p_i) < F(g):\quad g = p_i \qquad (4)$$

and F(·) is the objective function. In the PSO, the particles are initialized by placing them randomly in model space, and the initial velocity v is set to zero.

For the constant-density acoustic wave equation:

$$\frac{1}{v^2(\mathbf{x})}\frac{\partial^2 p(\mathbf{x},t;\mathbf{x}_s)}{\partial t^2} - \nabla^2 p(\mathbf{x},t;\mathbf{x}_s) = f_s(\mathbf{x},t;\mathbf{x}_s) \qquad (5)$$

where $f_s(\mathbf{x},t;\mathbf{x}_s) = f(t')\,\delta(t - t')$ is the source, p is the pressure field, $v(\mathbf{x})$ represents the medium velocity, and $\nabla^2 = \nabla\cdot\nabla = \partial_{xx} + \partial_{zz}$ is the Laplacian operator. Based on the above constant-density acoustic equation, the misfit vector $\Delta\mathbf{p} = \mathbf{p}_{cal} - \mathbf{p}_{obs}$ is defined as the difference between the forward-modeled data and the recorded data, with $p_{cal} = f(\mathbf{m})$, where f(·) denotes the forward modeling process and m is the corresponding velocity model. The purpose of full-waveform inversion is to minimize the misfit over the iterations by modifying the velocity model, so we define the objective function as:

$$E(\mathbf{m}) = \frac{1}{2}\Delta\mathbf{p}^{*}\Delta\mathbf{p} = \frac{1}{2}\left\| p_{cal} - p_{obs} \right\|^2 = \frac{1}{2}\sum_{r=1}^{n_r}\sum_{s=1}^{n_s}\int_0^{t_{max}} \left| p_{cal}(\mathbf{x}_r,t;\mathbf{x}_s) - p_{obs}(\mathbf{x}_r,t;\mathbf{x}_s) \right|^2 dt \qquad (6)$$

The gradient of the objective function can be expressed as (Bunks et al., 1995):

$$\nabla E_m = \frac{2}{v^3(\mathbf{x})}\sum_{r=1}^{n_r}\sum_{s=1}^{n_s}\int_0^{t_{max}} p_{res}(\mathbf{x}_r,t;\mathbf{x}_s)\,\frac{\partial^2 p_{cal}(\mathbf{x},t;\mathbf{x}_s)}{\partial t^2}\,dt \qquad (7)$$

Here $p_{res}(\mathbf{x}_r,t;\mathbf{x}_s)$ is the residual wavefield, which is back-propagated using the misfit vector $\Delta\mathbf{p}$ as a boundary condition (Tarantola, 1984). The $\partial^2 p(\mathbf{x},t;\mathbf{x}_s)/\partial t^2$ term can be approximated by the Laplacian operator of p. Considering the geometric spreading of energy as the seismic waves propagate, the gradient is normalized using the seismic illumination as a preconditioner (Gauthier et al., 1986; Bai et al., 2014):

$$\nabla E_m = \frac{\nabla E_m}{\sqrt{\sum_{s=1}^{n_s}\int_0^{t_{max}} p_{cal}^2(\mathbf{x},t;\mathbf{x}_s)\,dt + \gamma}} \qquad (8)$$

where we define γ as a stabilizing factor that avoids division by zero.

Because of the "curse of dimensionality", countless solutions found directly with the PSO algorithm can match the observed data, which may cause a so-called "mosaic" phenomenon in the result. Thus, we add the derived gradient term as guidance to the PSO formula, integrating formula 8 into formulas 1 and 2:

$$v_i^{k+1} = \omega\, v_i^{k} + c_1\,\mathrm{rand}()\,\bigl(p_i - m_i^{k}\bigr) + c_2\,\mathrm{rand}()\,\bigl(g - m_i^{k}\bigr) - \lambda\,\mathrm{rand}()\,\nabla E\bigl(m_i^{k}\bigr) \qquad (9)$$

$$m_i^{k+1} = m_i^{k} + v_i^{k+1} \qquad (10)$$

where λ is the weight of the gradient term. The objective function of particle swarm full-waveform inversion is formula 6.

The first term of formula 9 is the inertia term, which makes the particle swarm depend on the previous step. The second term drives the particle toward the best position it has found itself. Similarly, the third term propels the particle toward the best position found by the swarm. Special attention should be paid to the fourth term, the random gradient term, which makes the search direction depend on the gradient while the step size is randomly generated. The gradient term, acting as guidance, can significantly speed up convergence. Through the combination of these four terms, a particle may fly toward the optimal position or away from it; this is a random search process, which gives the particle swarm a robust global search ability.

In summary, the procedure for performing full-waveform inversion based on the particle swarm is:

1. Apply random perturbations to the initial model to obtain the initial position of each particle, and set the initial velocity to zero.

2. Perform finite-difference forward modeling and calculate the residual vector $\Delta\mathbf{p}$.

3. Use the residual records as boundary conditions to calculate the reverse-propagated wavefield, and use Eqs 7 and 8 to calculate the gradient.

4. Calculate the objective function E(m) and determine, using Eqs 3 and 4, whether to update the best positions found by the individuals and by the swarm.

5. Use Eqs 9 and 10 to update the velocity and position of the particles.

6. Repeat steps 2-5 until the requirements are met.
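
The six-step procedure can be sketched in Python on a toy problem. This is only an illustration under stated assumptions: the callables `objective` and `gradient` stand in for finite-difference forward modeling and the adjoint-state gradient, the symbol `lam` for the gradient weight is a hypothetical name, and the per-particle scaling by the largest gradient magnitude is a crude stand-in for the illumination preconditioning, not the scheme used by the authors.

```python
import numpy as np


def gradient_guided_pso(objective, gradient, m0, n_particles=20, n_iter=50,
                        omega=0.3, c1=0.3, c2=1.2, lam=0.2,
                        perturb=0.5, gamma=1e-12, seed=0):
    """PSO with an added random gradient term, following steps 1 to 6."""
    rng = np.random.default_rng(seed)
    m0 = np.asarray(m0, dtype=float)

    # Step 1: perturb the initial model randomly, start with zero velocity.
    m = m0 + perturb * rng.standard_normal((n_particles, m0.size))
    v = np.zeros_like(m)
    p_best = m.copy()
    p_cost = np.full(n_particles, np.inf)
    g_best, g_cost = m[0].copy(), np.inf
    history = []

    for _ in range(n_iter):
        # Steps 2 and 3: misfit and gradient for every particle.
        cost = np.array([objective(mi) for mi in m])
        grad = np.array([gradient(mi) for mi in m])
        # Stand-in normalization; gamma avoids division by zero.
        grad = grad / (np.abs(grad).max(axis=1, keepdims=True) + gamma)

        # Step 4: update the individual and swarm best positions.
        improved = cost < p_cost
        p_best[improved] = m[improved]
        p_cost[improved] = cost[improved]
        if p_cost.min() < g_cost:
            g_best = p_best[np.argmin(p_cost)].copy()
            g_cost = float(p_cost.min())
        history.append(g_cost)

        # Step 5: velocity update with the extra gradient term, then move.
        r1, r2, r3 = (rng.random(m.shape) for _ in range(3))
        v = (omega * v + c1 * r1 * (p_best - m) + c2 * r2 * (g_best - m)
             - lam * r3 * grad)
        m = m + v
        # Step 6: the loop repeats steps 2 to 5.

    return g_best, g_cost, history


# Toy linear least-squares problem standing in for the FWI misfit.
G = np.array([[2.0, 0.0], [0.0, 1.0]])
d = np.array([2.0, -1.0])


def toy_objective(m):
    return 0.5 * float(np.sum((G @ m - d) ** 2))


def toy_gradient(m):
    return G.T @ (G @ m - d)


best_m, best_cost, cost_history = gradient_guided_pso(
    toy_objective, toy_gradient, m0=np.zeros(2))
```

In a real FWI setting each objective and gradient evaluation requires a forward and an adjoint simulation per particle, so the swarm size directly multiplies the cost of one iteration.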
Numerical Examples

Marmousi model
In this paper, we design a comparative experiment to study the sensitivity of the two methods to a poor initial model. We smooth a resized Marmousi model to obtain a poor initial velocity model, shown in Figure 2. The PSO algorithm and the nonlinear conjugate gradient method (NLCG; Hager and Zhang, 2006; Yang et al., 2015) are used to perform FWI starting from the smoothed initial model. We analyze the experimental results and the convergence rates of the two algorithms. A 10 Hz Ricker wavelet is used to forward model 15 shots of seismic records with 2600 time samples. Because FWI here refines the velocity model starting from a rough initial model, it is not necessary to use a PML boundary or a high-precision staggered-grid scheme. Therefore, we adopt a second-order regular-grid finite-difference scheme and the Clayton-Engquist absorbing boundary condition (ABC) proposed by Clayton and Engquist (1977) and Engquist and Majda (1977).
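
For illustration, a 10 Hz Ricker source wavelet with 2600 time samples can be generated as follows. The text gives only the dominant frequency and the number of time steps, so the 1 ms sampling interval and the delay of one dominant period are assumptions made for this sketch, and `ricker` is a hypothetical helper name.

```python
import numpy as np


def ricker(f0, dt, nt, t0=None):
    """Ricker wavelet with dominant frequency f0 (Hz), delayed by t0 (s)."""
    if t0 is None:
        t0 = 1.0 / f0  # assumed delay: one dominant period
    t = np.arange(nt) * dt
    arg = (np.pi * f0 * (t - t0)) ** 2
    return t, (1.0 - 2.0 * arg) * np.exp(-arg)


# 10 Hz wavelet, 2600 samples; dt = 1 ms is an assumed sampling interval.
time_axis, wavelet = ricker(f0=10.0, dt=1.0e-3, nt=2600)
```

The wavelet peaks with unit amplitude at the delay time, so a delay of at least one period keeps its onset close to zero at the first sample.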
Figure 2. The Smooth1 model, obtained by smoothing the Marmousi model.
We set the swarm size to 50 and apply random disturbances to the initial smoothed model to obtain the initial position of each particle. The inertia parameter is ω = 0.3, the weight coefficients are $c_1$ = 0.3 and $c_2$ = 1.2, the gradient-term weight is λ = 0.2, and the boundaries of the particle velocity search range are $X_{low} = m_{min} - 500$ and $X_{up} = m_{max} + 500$. The swarm is iterated 300 times and the result is shown in Figure 4. The result of full-waveform inversion using the nonlinear conjugate gradient (NLCG) method with the Hestenes-Stiefel and Dai-Yuan formulas (Hager and Zhang, 2006) is shown in Figure 3. The evolution curves of the two algorithms are shown in Figure 5.
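
As a quick arithmetic check, the parameter values quoted above can be tested against the convergence condition attributed to van den Bergh (2002) in the Method section, (c1 + c2)/2 − 1 < ω. Reading the two weight coefficients as c1 and c2 is an assumption, and the helper name below is hypothetical.

```python
def satisfies_convergence_condition(omega, c1, c2):
    """Return True when (c1 + c2) / 2 - 1 < omega."""
    return 0.5 * (c1 + c2) - 1.0 < omega


omega, c1, c2 = 0.3, 0.3, 1.2
threshold = 0.5 * (c1 + c2) - 1.0  # (0.3 + 1.2) / 2 - 1 = -0.25
margin = omega - threshold         # 0.3 - (-0.25) = 0.55
condition_met = satisfies_convergence_condition(omega, c1, c2)
```

With these values the threshold is negative, so any positive inertia weight satisfies the condition.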
Figure 1. Resized Marmousi model
Figure 3. The result of NLCG full-waveform inversion after 300 iterations.

Figure 4. The result of PSO full-waveform inversion after 300 iterations.

From the two figures above, we can see that NLCG full-waveform inversion starting from the poor Smooth1 model falls prematurely into a local optimum, and the resolution of the result is very poor, especially in the deep region. In contrast, the PSO algorithm starting from Smooth1 still gives a satisfactory result. The shallow region is well characterized, and the deep region is also greatly improved compared with the initial model. The comparison shows that particle swarm full-waveform inversion has robust global optimization performance when the initial model is poor. It is a random search method based on the Monte Carlo principle, which adapts better to the initial model, so the desired result can be obtained even with a poor initial model.

Figure 5. The evolution curve of the misfit function.

The pink solid line in Figure 5 is the result of the NLCG method, and the dotted line is the result of the PSO method in this paper. The evolution curve of PSO drops by 80% after 300 iterations and declines very steeply in the early iterations, which indicates that the algorithm searches very efficiently in the early iterations. In contrast, the NLCG curve converges very slowly: the method falls into a local optimum too early, without a comprehensive search of the model domain.

Through a large number of numerical simulations, we have found that the gradient term has a significant influence on convergence. The amplitude of the gradient obtained by cross-correlation is extremely unbalanced; although the amplitude is compensated by illumination, there is still stronger energy at the source locations. This is an important reason for the slower convergence of gradient-like search methods. We use stochastic evolution theory, which is based on Monte Carlo stochastic theory and does not need to satisfy complex mathematical relations, so we can apply flexible amplitude equalization to the gradient term to improve the convergence rate. The motion parameters of the particle swarm algorithm have a very large influence on the trajectories of the particles. Unsuitable parameter settings easily lead the particles to gather and converge prematurely, the so-called "premature" phenomenon, in which the swarm does not finish a full search. Therefore, some scholars have put forward PSO algorithms with dynamic parameters (Zhang et al., 2010), which adjust the motion parameters and targets dynamically according to certain rules, so that the improved algorithm can effectively avoid premature convergence. In summary, particle swarm full-waveform inversion is a feasible way to refine coarse initial models, with a faster convergence rate in the early iterations.

Conclusions

In this study, we propose a particle swarm full-waveform inversion method based on robust global optimization, a stochastic evolution method that simulates the behavior of swarms in nature. By using the information-sharing mechanism of the swarm and the gradient term as guidance, we obtain a new search method for time-domain full-waveform inversion. It avoids processing the Hessian matrix, requires no complex mathematical relationships, and exhibits very good robustness. We use the Marmousi model to test the algorithm and, compared with NLCG, obtain good results after 300 iterations; at the same time, we obtain a steady, fast convergence rate in the early iterations, so the method is feasible and effective for nonlinear inversion problems. The result of this comparative experiment suggests a strategy for refining velocity models: first, use migration velocity analysis or tomography to get a rough velocity model; then use this result to perform PSO full-waveform inversion to get a better initial model; and finally use NLCG or other algorithms for fast full-waveform inversion.

REFERENCES
Agarwal, A., K. Sain, and S. Shalivahan, 2016, Traveltime and constrained AVO inversion using FDR
PSO: 86th Annual International Meeting, SEG, Expanded Abstracts, 577–581.
http://doi.org/10.1190/segam2016-13959236.1.
Clayton, R., and B. Engquist, 1977, Absorbing boundary conditions for acoustic and elastic wave
equations: Bulletin of the Seismological Society of America, 67, 1529–1540.
Engquist, B., and A. Majda, 1977, Absorbing boundary conditions for numerical simulation of waves:
Proceedings of the National Academy of Sciences, 74, 1765–1766,
http://doi.org/10.1073/pnas.74.5.1765.
Fernández-Martínez, J. L., and E. García-Gonzalo, 2009, The PSO family: Deduction, stochastic analysis
and comparison: Swarm Intelligence, 3, 245–273, http://doi.org/10.1007/s11721-009-0034-8.
Gauthier, O., J. Virieux, and A. Tarantola, 1986, Two-dimensional nonlinear inversion of seismic
waveforms: Numerical results: Geophysics, 51, 1387–1403, http://doi.org/10.1190/1.1442188.
Hager, W. W., and H. Zhang, 2006, A survey of nonlinear conjugate gradient methods: Pacific Journal of
Optimization, 2, 35–58.
Kennedy, J., and R. C. Eberhart, 1995, Particle swarm optimization: Proceedings of the IEEE
International Conference on Neural Networks, 4, 1942–1948.
Luis, J., T. Mukerji, E. García- Gonzalo, and A. Suman, 2010, Reservoir characterization and inversion
uncertainty via a family of Particle Swarm Optimizers: 80th Annual International Meeting, SEG,
Expanded Abstracts, 2334–2339, http://doi.org/10.1190/1.3513319.
Onwunalu, J. E., & L. J. Durlofsky, 2010, Application of a particle swarm optimization algorithm for
determining optimum well location and type: Computational Geosciences, 14, 183–198,
http://doi.org/10.1007/s10596-009-9142-1.
Paasche, H., and J. Tronicke, 2014, Nonlinear joint inversion of tomographic data using swarm
intelligence: Geophysics, 79, no. 4, R133–R149, http://doi.org/10.1190/geo2013-0423.1.
Shaw, R., and S. Srivastava, 2007, Particle swarm optimization: A new tool to invert geophysical data:
Geophysics, 72, no. 2, F75–F83, http://doi.org/10.1190/1.2432481.
Shi, Y., and R. C. Eberhart, 1998, Parameter selection in particle swarm optimization: Proceedings of the
7th Annual Conference on Evolutionary Programming, 591–600,
http://doi.org/10.1007/BFb0040810.
Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49,
1259–1266, http://doi.org/10.1190/1.1441754.
Tronicke, J., H. Paasche, and U. Böniger, 2012, Crosshole traveltime tomography using particle swarm
optimization: A near-surface field example: Geophysics, 77, no. 1, R19–R32,
http://doi.org/10.1190/geo2010-0411.1.
Van den Bergh, F., 2002, An analysis of particle swarm optimizers: Ph.D. thesis, University of Pretoria.
Yang, P., J. Gao, and B. Wang, 2015, A graphics processing unit implementation of time-domain full-
waveform inversion: Geophysics, 80, no. 3, F31–F39, http://doi.org/10.1190/geo2014-0283.1.
Zhang, Y., Y. Jun, G. Wei, and L. Wu, 2010, Find multi-objective paths in stochastic networks via
chaotic immune PSO: Expert Systems with Applications, 37, no. 3, 1911–1919,
http://doi.org/10.1016/j.eswa.2009.07.025.

An Ensemble-Transform Kalman Filter - Full Waveform Inversion scheme for Uncertainty estimation
J. Thurin,1 R. Brossier1 and L. Métivier2,1
1 Univ. Grenoble Alpes, ISTerre, 2 Univ. Grenoble Alpes, CNRS, LJK
SUMMARY

Uncertainty Quantification is a major topic for most geophysical tomography techniques, in particular for large-scale problems. In this work, we present an original application of ensemble-based methods to Full Waveform Inversion. This approach relies on a deterministic Ensemble-Transform Kalman Filter borrowed from the Data Assimilation community, and a frequency-domain Full Waveform Inversion. This methodology gives access to a low-rank version of the posterior covariance matrix of our inverse problem, thanks to the ensemble repartition. We can thus extract information from this covariance matrix to assess uncertainty in the Bayesian sense. This proof-of-concept is applied to a 2D Marmousi case, before discussing many questions associated with the design of the scheme.

INTRODUCTION

Geophysical tomography allows retrieving subsurface physical properties from discrete surface measurements. While many different techniques exist, Full Waveform Inversion (FWI) has become, over the past decade, a widely used method thanks to its high-resolution and quantitative outputs, both in academia and in the exploration industry (Plessix, 2009; Sirgue et al., 2010; Plessix et al., 2012; Warner et al., 2013; Fichtner et al., 2013; Zhu et al., 2015; Operto et al., 2015). These detailed outputs come, however, at the expense of a more complex inverse problem to be solved, compared to travel time tomography, as the entire recorded wavefield is used as data to be fit. Since the advent of the technique (Lailly, 1983; Tarantola, 1984), most of the research has been dedicated to making the approach work, focusing on theoretical and methodological developments, computational hardware capacities and seismic acquisition design. The method is now becoming more mature, but uncertainty estimation has largely been left aside and very few studies in the literature have tried to tackle this problem. The quantification of uncertainties remains a significant challenge for most targets. Up to now, most quality control relies on data fit, in-situ comparisons with well-log data, common-image gathers flatness or comparison with other geophysical methods.

In order to assess uncertainties, the Bayesian Inference Framework for general inverse problems, as developed by Tarantola (2005), may provide a good solution. Tarantola (2005) states that close to the global minimum of an optimization problem, the inverse Hessian operator is also the posterior covariance matrix. In a least-squares sense and under the assumption that the global minimum is reached, these operators would, therefore, be crucial for uncertainty estimation. For realistic large-scale FWI problems, however, computing those operators is out of reach, as they grow larger with the problem size. While the matrix-free methods proposed by Fichtner and Trampert (2011a); Métivier et al. (2013, 2014) allow assessing the effect of the Hessian matrix on a vector, their use for estimating the whole inverse Hessian remains challenging. Promising methods have been proposed to decrease the computational cost of Hessian (or inverse Hessian) estimation by selecting prior parameterization of the Hessian or employing random probing strategies. This enables some uncertainty estimation (Fichtner and Trampert, 2011b; Fang et al., 2014; Zhu et al., 2016; Fichtner and van Leeuwen, 2015).

Aside from those works, the Data Assimilation (DA) community has developed tools to solve complex non-linear inverse problems, with difficulties similar to those encountered in FWI. Typical applications are found in weather forecasting, climatology, and oceanography. As opposed to FWI, where the goal is parameter estimation in a static case, DA often aims at recovering the state of a dynamic system given a set of sparse measurements through time. Despite this difference, we believe some of the DA methods could be useful for uncertainty estimation in FWI. Indeed, due to the nature of their applications, the DA community focused early on uncertainty quantification and quality control, with schemes such as the Kalman Filter (KF) algorithm (Kalman, 1960), which cleverly combines modeling and measurements while accounting for their errors. However, KF is only suited for small linear problems due to its extensive manipulation of covariance and noise matrices. While the Extended KF was proposed to cope with non-linearity, it is still not relevant for high-dimensional cases. Thanks to Evensen (1994, 2009) and his formulation of the Ensemble Kalman Filter (EnKF), explicit manipulation of such large matrices can be avoided, relying on low-rank approximations contained through the repartition of an ensemble of states. Thus, EnKF appears to be well suited for large scale applications and is nowadays an operational tool in the DA community for solving problems with scales similar to those found in FWI.

Du et al. (2012); Jordan (2015) have suggested ensemble-based approaches for tomographic problems. Their approaches do not include, however, least-squares analysis, characteristic of KF algorithms. Jin et al. (2008) also proposed an EnKF scheme for 1D prestack FWI based on a convolutional model. In this abstract, we intend to propose a combination of EnKF with the FWI problem, in order to access a low-rank approximation of the posterior covariance matrix. This allows reaching quantitative uncertainty information for large scale FWI problems. The difficulty of adapting this peculiar method, used in dynamic applications, to the static FWI case is discussed. In a first part, we propose a short review of the EnKF and Ensemble Transform Kalman Filter (ETKF) formalism. The details of our ETKF-FWI method are then exposed before showing some preliminary results of this proof of concept. Finally, we address the challenges and open questions associated with this original scheme.

BRIEF ENKF REVIEW

The Kalman filter (KF) retrieves the state of a linear dynamic system, using the modeling and measurement properties and their respective errors. Schemes of the KF family are used to define the best tradeoff between modeling and measurements. KF schemes are classically formalized as algorithms alternating two steps (figure 1):

Prediction step: During this step, the state $\mathbf{m}$ of a dynamic system is forecast from time/state $k$ to $k+1$ by applying a modeling operator $\mathcal{F}$,

$$\mathbf{m}^f_{k+1} = \mathcal{F}(\mathbf{m}_k), \qquad (1)$$

where the superscript $f$ denotes the forecast system.

Analysis step: Using measured data and the forecast state at $k+1$, the analyzed system state at $k+1$ is computed in the least-squares sense, weighted by the ratio of modeling and measurement errors. The analyzed system state is generally denoted by the superscript $a$.

The EnKF application requires the definition of an ensemble $\mathbf{m} = [\mathbf{m}_{1,k}, ..., \mathbf{m}_{N_e,k}]$. In the following, matrices are denoted by bold letters and the transpose operator by the superscript $T$. This matrix contains $N_e$ state vectors, each of size $N$, the number of state parameters. From the ensemble, the mean $\bar{\mathbf{m}}$ and the perturbation matrix $\mathbf{M}$ are given by

$$\bar{\mathbf{m}}_k = \frac{1}{N_e} \sum_{i=1}^{N_e} \mathbf{m}_{i,k}, \qquad (2)$$

$$\mathbf{M}_k = [\mathbf{m}_{1,k} - \bar{\mathbf{m}}_k, ..., \mathbf{m}_{N_e,k} - \bar{\mathbf{m}}_k]. \qquad (3)$$

From equations (2) and (3) the ensemble covariance matrix $\mathbf{P}_{e,k}$ is computed as

$$\mathbf{P}_{e,k} = \frac{1}{N_e - 1} \sum_{i=1}^{N_e} (\mathbf{m}_{i,k} - \bar{\mathbf{m}}_k)(\mathbf{m}_{i,k} - \bar{\mathbf{m}}_k)^T = \frac{1}{N_e - 1} \mathbf{M}_k \mathbf{M}_k^T. \qquad (4)$$

This covariance can be extracted from the ensemble distribution during both the forecast and analysis steps.

Each ensemble member is forecast independently by applying the operator $\mathcal{F}$. It is worth mentioning that this is an embarrassingly parallel process, as the members are independent. During the analysis, the forecasts and measured observations are combined in the least-squares sense. We choose here to follow the formalism of the deterministic Ensemble Transform KF (ETKF, Bishop et al., 2001), among the variety of existing EnKF formalisms. Deterministic EnKFs are chosen, as they converge toward the solution faster than their stochastic counterparts, and the ETKF analysis phase also has a lower algorithmic cost (Tippett et al., 2003). EnKF relies on the definition of $\mathbf{P}_e$ as a product of square root matrices, based on equation (4) (the subscript $k$ is removed for clarity in the following equations). The update during the analysis requires first computing $\mathbf{M}^a$ according to the definition

$$\mathbf{M}^a = \mathbf{M}^f \mathbf{T}, \qquad (5)$$

where $\mathbf{T}$ is a transformation matrix of size $(N_e, N_e)$. The least-squares formalism tells us that

$$\mathbf{T}\mathbf{T}^T = \left[ \mathbf{I}_{N_e} + \frac{1}{N_e - 1} \mathbf{Y}^{f\,T} \mathbf{R}^{-1} \mathbf{Y}^f \right]^{-1}, \qquad (6)$$

where $\mathbf{R}$ is the measurement noise matrix, $\mathbf{I}_{N_e}$ the identity matrix of size $(N_e, N_e)$ and $\mathbf{Y}$ is the perturbation observation matrix defined as

$$\mathbf{Y} = [\mathbf{d}_1 - \bar{\mathbf{d}}, \mathbf{d}_2 - \bar{\mathbf{d}}, ..., \mathbf{d}_{N_e} - \bar{\mathbf{d}}], \qquad (7)$$

with the observations mean

$$\bar{\mathbf{d}} = \frac{1}{N_e} \sum_{i=1}^{N_e} \mathbf{d}_i. \qquad (8)$$

For a non-linear observation operator $\mathcal{H}$, we have $\mathbf{d}_i = \mathcal{H}(\mathbf{m}_i)$.

However, determining the square root of the operator $\mathbf{T}\mathbf{T}^T$ is a nonunique problem. Wang et al. (2004) and Ott et al. (2004) propose to use the truncated singular value decomposition (SVD) of $\mathbf{T}\mathbf{T}^T$, giving

$$\mathbf{T}\mathbf{T}^T = \mathbf{C}\mathbf{\Gamma}\mathbf{C}^T \rightarrow \mathbf{T} = \mathbf{C}\mathbf{\Gamma}^{-1/2}\mathbf{C}^T. \qquad (9)$$

Here, $\mathbf{C}$ is the singular vectors matrix and $\mathbf{\Gamma}$ the diagonal matrix containing the truncated singular values of $\mathbf{T}\mathbf{T}^T$. If the ensemble members are uncorrelated, the rank of $\mathbf{T}\mathbf{T}^T$ can be shown to be $\min(N_e - 1, N_{obs})$, with $N_{obs}$ the number of observations. Recall that these steps have a low computational cost, since the operator in (6) is only of size $N_e$.

From the definition of $\mathbf{T}$, the updated $\mathbf{M}^a$ and $\bar{\mathbf{m}}^a$ are given by

$$\mathbf{M}^a = \sqrt{N_e - 1}\, \mathbf{M}^f \mathbf{C}\mathbf{\Gamma}^{-1/2}\mathbf{C}^T, \qquad (10)$$

$$\bar{\mathbf{m}}^a = \bar{\mathbf{m}}^f + \mathbf{M}^f \mathbf{C}\mathbf{\Gamma}^{-1}\mathbf{C}^T \mathbf{Y}^{f\,T} \mathbf{R}^{-1} (\mathbf{d}_{obs} - \bar{\mathbf{d}}), \qquad (11)$$

giving the new analysed ensemble $\mathbf{m}^a = \bar{\mathbf{m}}^a + \mathbf{M}^a$.

The whole analysis phase is equivalent to the following variational minimization problem (Hunt et al., 2007)

$$\mathcal{C}(\mathbf{m}) = \frac{1}{2} (\mathbf{m} - \bar{\mathbf{m}}^f)^T \mathbf{P}^{-1} (\mathbf{m} - \bar{\mathbf{m}}^f) + \frac{1}{2} (\mathbf{d}_{obs} - \bar{\mathbf{d}})^T \mathbf{R}^{-1} (\mathbf{d}_{obs} - \bar{\mathbf{d}}), \qquad (12)$$

which combines, in the least-squares sense, the forecast state and the data, considering their respective uncertainties.

ETKF-FWI SCHEME

FWI is a static inverse problem, which does not relate directly to the dynamic/evolutive problems specific to DA. Our idea is that the hierarchical strategies commonly used in FWI could conveniently replace temporal evolution in ETKF-FWI. The most common strategy in FWI is frequency continuation, originally employed to limit cycle skipping (Bunks et al., 1995; Sirgue and Pratt, 2004). As a first proposition, we chose to replace the time evolution of DA by a selection of increasing frequency bands. Then, the state vector $\mathbf{m}$ must be defined, as well as the content of the data vector $\mathbf{d}$, and the observation $\mathcal{H}$ and forecasting $\mathcal{F}$ operators. In our application, $\mathbf{m}$ encapsulates the physical properties of the subsurface. From the non-linear operator $\mathcal{H}$, $\mathbf{d}$ is defined as the seismic wavefield recorded at the receivers. Finally, our chosen forecasting operator $\mathcal{F}$ corresponds to the non-linear FWI process for a given initial model $\mathbf{m}_{i,k}$ at one specified frequency band. Thus we have

$$\mathbf{m}^f_{i,k+1} = \mathcal{F}(\mathbf{m}_{i,k}) = \arg\min_{\mathbf{m}_{i,k}} \frac{1}{2} \| \mathcal{H}(\mathbf{m}_{i,k}) - \mathbf{d}_{obs,k} \|^2. \qquad (13)$$
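The analysis update of equations (5) to (11), which follows each forecast of equation (13), can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a diagonal R, and it reads $\mathbf{\Gamma}$ in equations (10) and (11) as the eigenvalues of $(N_e - 1)\mathbf{I} + \mathbf{Y}^T\mathbf{R}^{-1}\mathbf{Y}$, the normalization used by Hunt et al. (2007). No truncation is applied. The function name `etkf_analysis` and the toy linear operator `H` are hypothetical.

```python
import numpy as np

def etkf_analysis(members, predicted, d_obs, r_diag):
    """One ETKF analysis step (illustrative sketch).

    members   : (N, Ne) forecast ensemble, one model per column
    predicted : (Nobs, Ne) modeled data, column i is H(m_i)
    d_obs     : (Nobs,) observed data
    r_diag    : (Nobs,) diagonal of the measurement noise matrix R
    """
    n_e = members.shape[1]
    m_mean = members.mean(axis=1, keepdims=True)     # equation (2)
    d_mean = predicted.mean(axis=1, keepdims=True)   # equation (8)
    M = members - m_mean                             # equation (3)
    Y = predicted - d_mean                           # equation (7)

    Yt_Rinv = Y.T / r_diag                           # Y^T R^-1, shape (Ne, Nobs)
    # Assumption: Gamma holds the eigenvalues of (Ne - 1) I + Y^T R^-1 Y.
    A = (n_e - 1) * np.eye(n_e) + Yt_Rinv @ Y
    gamma, C = np.linalg.eigh(A)                     # A = C diag(gamma) C^T

    # Perturbation update, equation (10): sqrt(Ne - 1) M C Gamma^-1/2 C^T.
    W = np.sqrt(n_e - 1) * (C / np.sqrt(gamma)) @ C.T
    M_a = M @ W
    # Mean update, equation (11): M C Gamma^-1 C^T Y^T R^-1 (d_obs - d_mean).
    innovation = d_obs.reshape(-1, 1) - d_mean
    m_mean_a = m_mean + M @ ((C / gamma) @ (C.T @ (Yt_Rinv @ innovation)))
    return m_mean_a + M_a


# Toy usage with a hypothetical linear observation operator H.
rng = np.random.default_rng(0)
N, Nobs, Ne = 6, 3, 20
H = rng.normal(size=(Nobs, N))
members = rng.normal(size=(N, Ne))
d_obs = rng.normal(size=Nobs)
r_diag = np.full(Nobs, 0.1)
analysed = etkf_analysis(members, H @ members, d_obs, r_diag)
```

With a linear H, the analysed mean and covariance coincide with the classical Kalman update built from the ensemble covariance of equation (4), which gives a convenient check of the sketch. Only $N_e \times N_e$ matrices are decomposed, in line with the low cost noted for the operator in (6).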

Equation (13) provides the model minimizing the $\ell_2$ norm of the misfit between modeled data $\mathcal{H}(\mathbf{m}_{i,k}) = \mathbf{d}_{i,k}$ and measured data $\mathbf{d}_{obs,k}$. The ETKF-FWI scheme is represented in figure 1.

[Figure 1 graphic: ensemble forecast (FWI) and observation (modeling) across steps k−1, k, k+1 (modeling frequency)]

Figure 1: EnKF algorithm schematics. Ensemble members are represented by dots, data by stars, and ellipses represent uncertainty. The forecast ensemble is denoted in blue, the analyzed ensemble in red, the observed data in green, and the modeled data from the forecast are depicted in gray. In bold we have the general EnKF operations, while in parentheses we have the associated case for our ETKF-FWI application. The dashed lines denote the Analysis steps.

The common way to generate an ensemble with given statistics would be to first factorize the desired covariance matrix with a Cholesky decomposition as $\mathbf{P} = \mathbf{L}\mathbf{L}^T$, and then build a vector $\mathbf{v}$ satisfying this covariance by

$$\mathbf{v} = \mathbf{L}\mathbf{u}, \qquad (14)$$

from a random vector $\mathbf{u}$. However, for the large-scale applications we target, a Cholesky decomposition is not achievable. A straightforward and pragmatic way to generate the initial ensemble is considered instead. The population is built from a consistent starting model $\mathbf{m}_0$, by considering $N_e$ zero-mean random vectors $\mathbf{u}_i$ (white noise), convolved with a 2D Gaussian filter with realistic correlation lengths. Each ensemble member can be written as $\mathbf{m}_{0,i} = \mathbf{m}_0 + \mathcal{G}\mathbf{u}_i$, with $\mathcal{G}$ the convolution operator with the 2D Gaussian filter. Thus, the prior covariance obtained with this ensemble generation strategy is a squared Gaussian, $\mathbf{P} = \mathcal{G}\mathcal{G}^T$.

APPLICATION - MARMOUSI EXAMPLE

In this part, a synthetic experiment is conducted on the 2D Marmousi model (fig. 2-A) with our ETKF-FWI strategy. 2D visco-acoustic frequency-domain modeling is chosen for this application. The formulation and operators are set as described in the previous section, and frequency evolution replaces the dynamic evolution. 25 dB of white noise has been added to the data, avoiding a noise-free inverse crime and testing the sensitivity of the technique to noise. The acquisition is a fixed-spread surface geometry with 144 sources and 660 receivers at 25 m depth below the water surface. ETKF-FWI has been applied from 3 to 15 Hz, every 0.4 Hz, from the initial model $\mathbf{m}_0$ (fig. 2-B). The 200 ensemble members are generated by using a Gaussian filter with a 500 m correlation length and ensuring perturbations around the mean with amplitudes ranging from −100 m.s⁻¹ to 100 m.s⁻¹. The measurement noise matrix $\mathbf{R}$ is taken as diagonal, with small values compatible with the set level of noise.

Fig. 2-C shows the result of the ETKF-FWI workflow. This result is similar to the model that could be obtained from FWI alone with the same settings, starting from $\mathbf{m}_0$. This also implies that our initial model was sufficiently good to ensure convergence.

The approximated covariance matrix is extracted as a low-rank version from the ensemble distribution. The covariance for the velocity is given in m².s⁻² and represents the local diversity across the ensemble members. The variance (diagonal of the covariance matrix) can be displayed as a 2D map (fig. 2-D). This result corresponds to expectations from the theoretical understanding of the FWI problem with a surface acquisition setup, in terms of uncertainty quantification. The variance map can be interpreted as the superposition of a low-wavenumber background and a high-wavenumber perturbation. The low-wavenumber background has low variance values near the acquisition and progressively increases with distance and depth, due to the decrease of wavefield coverage and of wave amplitude with geometrical spreading. The high-wavenumber component of the variance map highlights the interfaces. This can be attributed to the band-limited data, which do not constrain the solution enough.

DISCUSSION AND CONCLUSION

ETKF-FWI seems to be a powerful and straightforward method allowing uncertainty quantification in FWI. Variance maps are easily readable to evaluate inversion results, and resolution could be studied from lines of $\mathbf{P}$. Still, this original application raises many questions that will require extensive work.

First, the actual meaning of "uncertainty" as extracted from the ensemble must be understood. Working with finite-frequency wave propagation and limited coverage causes a filter-like effect. Can the quantitative uncertainties therefore be associated with direct uncertainty on real physical parameters? Alternatively, will they only account for the apparent macro-model seen by the waves, as questioned by Capdeville et al. (2010) for homogenization and down-scaling problems?

Considering the current state of development of our method, many points need to be explored. How should the measurement noise matrix $\mathbf{R}$ be designed? This parameter should be simple to consider and be related to the recording noise level and sensor design. Up to now, the process noise matrix has been left aside, but it should be associated with the noise and error of the forecasting operator. Classically, it is a troublesome parameter to estimate even for linear operators in DA. In the frame of ETKF-FWI, for which the FWI process is considered as forecasting, estimating this matrix could be a challenge but will ultimately be needed in practical applications. The initial ensemble distribution, directly linked to the prior covariance $\mathbf{P}$, is also an open question. A pragmatic approach is to use a Gaussian filter, leading to a squared Gaussian covariance.
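The Gaussian-filter ensemble generation, $\mathbf{m}_{0,i} = \mathbf{m}_0 + \mathcal{G}\mathbf{u}_i$, and the variance read from the diagonal of equation (4) can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used for the Marmousi experiment. It works on a 1D profile instead of a 2D model, treats the correlation length as the Gaussian standard deviation, and rescales each perturbation so that its peak amplitude equals `max_pert`. All function and parameter names are hypothetical.

```python
import numpy as np

def gaussian_kernel(corr_len, dx):
    """Normalized 1D Gaussian filter; corr_len is used as the standard deviation."""
    half = int(3 * corr_len / dx)
    x = np.arange(-half, half + 1) * dx
    kernel = np.exp(-0.5 * (x / corr_len) ** 2)
    return kernel / kernel.sum()

def make_ensemble(m0, n_members, corr_len, dx, max_pert, seed=0):
    """Build members m0 + G u_i from white noise u_i smoothed by a Gaussian filter."""
    rng = np.random.default_rng(seed)
    kernel = gaussian_kernel(corr_len, dx)
    members = np.empty((m0.size, n_members))
    for i in range(n_members):
        u = rng.standard_normal(m0.size)              # zero-mean white noise
        g_u = np.convolve(u, kernel, mode="same")     # G u_i
        g_u *= max_pert / np.abs(g_u).max()           # assumption: peak set to max_pert
        members[:, i] = m0 + g_u
    return members

def ensemble_variance(members):
    """Diagonal of the ensemble covariance of equation (4)."""
    M = members - members.mean(axis=1, keepdims=True)
    return (M ** 2).sum(axis=1) / (members.shape[1] - 1)


# Toy usage: a 10 km profile sampled every 25 m, 200 members, 500 m correlation length.
m0 = np.linspace(1500.0, 4000.0, 400)
members = make_ensemble(m0, n_members=200, corr_len=500.0, dx=25.0, max_pert=100.0)
variance = ensemble_variance(members)
```

Only the diagonal of the covariance is formed here, which avoids storing the full matrix; the same per-parameter computation applied to the analysed ensemble would give a variance map of the kind shown in fig. 2-D.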

Other filters, such as Laplace or Bessel filters (Trinh et al., 2017), may also be relevant if used with the same strategy. These filters directly affect the spatial shape of the covariance between parameters, but also its amplitude. This amplitude should be cautiously set, as too large values could lead to significant kinematic differences in the data, resulting in cycle-skipping. However, the values should be sufficiently large to ensure satisfactory exploration of the model space and provide meaningful information about the local curvature of the misfit function. Dramatically low values should also be avoided, to ensure enough diversity in the ensemble distribution and avoid ensemble collapse. The number of ensemble members is also an important point to be assessed. EnKF usually involves a few tens to hundreds of members for large-scale problems. The optimal number of members for the ETKF-FWI scheme is still to be determined and may vary according to model complexity/size and acquisition design.

[Figure 2 graphic: panels A to D plotted against Offset (m) and Depth (m); color scales are Velocity (m.s⁻¹) for A to C and Variance (m².s⁻²) for D]

Figure 2: Velocity models and variance map associated with the experiment depicted in the Application section. A) 2D Marmousi true velocity model. B) ETKF-FWI initial model $\mathbf{m}_0$. C) ETKF-FWI final mean model after 30 assimilation steps from 3 Hz to 15 Hz every 0.4 Hz. D) ETKF-FWI final variance map after 30 assimilation steps from 3 Hz to 15 Hz every 0.4 Hz.

The "dynamic" strategy of the proposed method may also be re-designed. For frequency-domain FWI, frequency selection seems a natural hierarchy. This proxy for dynamic evolution may be extended or replaced with frequency bands and/or time windows for time-domain FWI. The combination with data subsampling, shot decimation or source assembling (Krebs et al., 2009; Warner et al., 2013) may prove to be pertinent. This would also reduce the cost of the technique.

A more global view of the approach also leads to questioning the variables and observations of the filter itself. Only the velocity has been accounted for up to now, but multi-parameter unknowns are inherently easy to consider. Well-log data, for instance, may be used as direct observations or constraints in addition to the seismic wavefield. The entire wavefield may also be considered as an unknown variable in the EnKF, making a link with the Wavefield Reconstruction Inversion proposed by van Leeuwen and Herrmann (2013), as both the physical parameter and the wavefield would be considered as unknowns.

Finally, the differences and added value of the proposed approach must be explicitly evaluated against other methods from the literature. The interest of using the analysis step of EnKF, compared to a purely independent ensemble approach as used by Du et al. (2012) and Jordan (2015), needs to be determined. How does the method compare to a Markov chain Monte Carlo sampling of the misfit function at the convergence point as proposed by Fang et al. (2014), or to the random sampling of the Hessian at the convergence point as suggested by Fichtner and van Leeuwen (2015)?

Of course, one point to carefully consider is the computational cost, as each ensemble member requires solving the FWI problem on its own, increasing the cost by one or two orders of magnitude compared to classical FWI. Nonetheless, because this scheme is embarrassingly parallel, and thanks to the development of hardware capacities towards the exascale and the current trends in grid computing, ETKF-FWI applications may soon be achievable even for large-scale FWI problems, as is the case for actual DA problems.

ACKNOWLEDGMENTS

This study was partially funded by the SEISCOPE consortium (http://seiscope2.osug.fr), sponsored by CGG, CHEVRON, EXXON-MOBIL, JGI, SHELL, SINOPEC, STATOIL, TOTAL and WOODSIDE. This study was granted access to the HPC resources of the Froggy platform of the CIMENT infrastructure (https://ciment.ujf-grenoble.fr), which is supported by the Rhône-Alpes region (GRANT CPER07 13 CIRA), the OSUG@2020 labex (reference ANR10 LABX56) and the Equip@Meso project (reference ANR-10-EQPX-29-01).

REFERENCES
Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001, Adaptive sampling with the ensemble transform
Kalman filter. Part I: Theoretical aspects: Monthly Weather Review, 129, 420–436,
http://doi.org/10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2.
Bunks, C., F. M. Salek, S. Zaleski, and G. Chavent, 1995, Multiscale seismic waveform inversion:
Geophysics, 60, 1457–1473, http://doi.org/10.1190/1.1443880.
Capdeville, Y., L. Guillot, and J.-J. Marigo, 2010, 2-D non-periodic homogenization to upscale elastic
media for P-SV waves: Geophysical Journal International, 182, 903–922,
http://doi.org/10.1111/j.1365-246X.2010.04636.x.
Du, Z., E. Querendez, and M. Jordan, 2012, Resolution and uncertainty in 3D stereotomographic
inversion: 74th Annual International Conference and Exhibition, EAGE, Extended Abstracts,
http://doi.org/10.3997/2214-4609.20148576.
Evensen, G., 1994, Sequential data assimilation with nonlinear quasi-geostrophic model using Monte
Carlo methods to forecast error statistics: Journal of Geophysical Research, 99, 143–162,
http://doi.org/10.1029/94JC00572.
Evensen, G., 2009, Data assimilation: The ensemble Kalman filter: Springer.
Fang, Z., F. J. Herrmann, and C. D. Silva, 2014, Fast uncertainty quantification of 2D full-waveform
inversion with randomized source subsampling: 76th Annual International Conference and
Exhibition, EAGE, Extended Abstract, http://doi.org/10.3997/2214-4609.20140715.
Fichtner, A., and J. Trampert, 2011a, Hessian kernels of seismic data functionals based upon adjoint
techniques: Geophysical Journal International, 185, 775–798,
http://doi.org/10.1111/j.1365-246X.2011.04966.x.
Fichtner, A., and J. Trampert, 2011b, Resolution analysis in full waveform inversion: Geophysical
Journal International, 187, 1604–1624, http://doi.org/10.1111/j.1365-246X.2011.05218.x.
Fichtner, A., J. Trampert, P. Cupillard, E. Saygin, T. Taymaz, Y. Capdeville, and A. Villaseñor, 2013,
Multiscale full waveform inversion: Geophysical Journal International, 194, 534–556,
http://doi.org/10.1093/gji/ggt118.
Fichtner, A., and T. van Leeuwen, 2015, Resolution analysis by random probing: Journal of Geophysical
Research: Solid Earth, 120, 5549–5573, http://doi.org/10.1002/2015JB012106.
Hunt, B., E. Kostelich, and I. Szunyogh, 2007, Efficient data assimilation for spatiotemporal chaos: A
local ensemble transform Kalman filter: Physica D: Nonlinear Phenomena, 230, 112–126,
http://doi.org/10.1016/j.physd.2006.11.008.
Jin, L., M. K. Sen, and P. L. Stoffa, 2008, One-dimensional prestack seismic waveform inversion using
ensemble Kalman filter: 78th Annual International Meeting, SEG, Expanded Abstracts, 1920–1924,
http://doi.org/10.1190/1.3063815.
Jordan, M., 2015, Estimation of spatial uncertainties in tomographic images: 77th Annual International
Conference and Exhibition, EAGE, Extended Abstracts,
http://doi.org/10.3997/2214-4609.201413555.
Kalman, R., 1960, A new approach to linear filtering and prediction problems: Journal of Basic
Engineering, 82, 35–45, https://doi.org/10.1115/1.3662552.
Krebs, J., J. Anderson, D. Hinkley, R. Neelamani, S. Lee, A. Baumstein, and M. D. Lacasse, 2009, Fast
full-wavefield seismic inversion using encoded sources: Geophysics, 74, no. 6, WCC105–WCC116,
http://doi.org/10.1190/1.3230502.

Lailly, P., 1983, The seismic problem as a sequence of before-stack migrations: Presented at the
Conference on Inverse Scattering: Theory and Applications, SIAM, Philadelphia.
Métivier, L., F. Bretaudeau, R. Brossier, S. Operto, and J. Virieux, 2014, Full waveform inversion and the
truncated Newton method: quantitative imaging of complex subsurface structures: Geophysical
Prospecting, 62, no. 6, 1353–1375, http://doi.org/10.1111/1365-2478.12136.
Métivier, L., R. Brossier, J. Virieux, and S. Operto, 2013, Full waveform inversion and the truncated
Newton method: SIAM Journal On Scientific Computing, 35, B401–B437,
http://doi.org/10.1137/120877854.
Operto, S., A. Miniussi, R. Brossier, L. Combe, L. Métivier, V. Monteiller, A. Ribodetti, and J. Virieux,
2015, Efficient 3-D frequency-domain mono-parameter full-waveform inversion of ocean-bottom
cable data: application to Valhall in the visco-acoustic vertical transverse isotropic
approximation: Geophysical Journal International, 202, 1362–1391,
http://doi.org/10.1093/gji/ggv226.
Ott, E., B. R. Hunt, I. Szunyogh, A. V. Zimin, E. J. Kostelich, M. Corazza, E. Kalnay, D. Patil, and J. A.
Yorke, 2004, A local Ensemble Kalman filter for atmospheric data assimilation: Tellus A, 56,
415–428, http://doi.org/10.1111/j.1600-0870.2004.00076.x.
Plessix, R. E., 2009, Three-dimensional frequency-domain full-waveform inversion with an iterative
solver: Geophysics, 74, no. 6, WCC53–WCC61, http://doi.org/10.1190/1.3211198.
Plessix, R.-E., G. Baeten, J. W. de Maag, and F. ten Kroode, 2012, Full waveform inversion and distance
separated simultaneous sweeping: a study with a land seismic data set: Geophysical Prospecting,
60, 733–747, http://doi.org/10.1111/j.1365-2478.2011.01036.x.
Sirgue, L., O. I. Barkved, J. Dellinger, J. Etgen, U. Albertin, and J. H. Kommedal, 2010, Full waveform
inversion: The next leap forward in imaging at Valhall: First Break, 28, 65–70,
http://doi.org/10.3997/1365-2397.2010012.
Sirgue, L., and R. G. Pratt, 2004, Efficient waveform inversion and imaging: a strategy for selecting
temporal frequencies: Geophysics, 69, 231–248, http://doi.org/10.1190/1.1649391.
Tarantola, A., 1984, Linearized inversion of seismic reflection data: Geophysical Prospecting, 32,
998–1015, http://doi.org/10.1111/j.1365-2478.1984.tb00751.x.
Tarantola, A., 2005, Inverse problem theory and methods for model parameter estimation: Society for
Industrial and Applied Mathematics.
Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003, Ensemble square
root filters: Monthly Weather Review, 131, 1485–1490.
Trinh, P.-T., R. Brossier, L. Métivier, J. Virieux, and P. Wellington, 2017, Bessel smoothing filter for
spectral element mesh: Geophysical Journal International, 209, 1489-1512,
https://doi.org/10.1093/gji/ggx103.
van Leeuwen, T., and F. J. Herrmann, 2013, Mitigating local minima in full-waveform inversion by
expanding the search space: Geophysical Journal International, http://doi.org/10.1093/gji/ggt258.
Wang, X., C. H. Bishop, and S. J. Julier, 2004, Which is better, an ensemble of positive-negative pairs or
a centered spherical simplex ensemble?: Monthly Weather Review, 132, 1590–1605,
https://doi.org/10.1175/1520-0493(2004)132%3C1590:wibaeo%3E2.0.co;2.
Warner, M., A. Ratcliffe, T. Nangoo, J. Morgan, A. Umpleby, N. Shah, V. Vinje, I. Stekl, L. Guasch, C.
Win, G. Conroy, and A. Bertrand, 2013, Anisotropic 3D full-waveform inversion: Geophysics,
78, R59–R80, http://doi.org/10.1190/geo2012-0338.1.
Zhu, H., E. Bozdag, and J. Tromp, 2015, Seismic structure of the European upper mantle based on adjoint
tomography: Geophysical Journal International, 201, 18–52, https://doi.org/10.1093/gji/ggu492.

Zhu, H., S. Li, S. Fomel, G. Stadler, and O. Ghattas, 2016, A Bayesian approach to estimate uncertainty
for full-waveform inversion using a priori information from depth migration: Geophysics, 81,
R307–R323, https://doi.org/10.1190/geo2015-0641.1.

A new least-square correlation-based near surface full traveltime inversion
Jia Yi1,2* and Yike Liu1
1. Key Laboratory of Shale Gas and Geoengineering, Institute of Geology and Geophysics, Chinese Academy of
Sciences
2. University of Chinese Academy of Sciences
SUMMARY

To mitigate the problem of cycle-skipping in full waveform inversion (FWI), the conventional correlation-based approach has been widely used as an optimization criterion. However, it has two severe problems. The first is that it cannot obtain an accurate gradient direction, since the derivation of the gradient is based on the Born approximation. The second is that the conventional correlation-based misfit function struggles to make the desired solution converge to the global minimum. Therefore, we propose a new least-square correlation-based full traveltime inversion algorithm to tackle these two problems. Our misfit function incorporates the auto-correlation in addition to the cross-correlation term, and takes the least-squares difference of the two terms to ensure global convergence. To circumvent the inaccurate gradient direction, we utilize the Rytov approximation to derive the gradient. We illustrate the validity of our method on a crosswell model. We then explore our method in near surface velocity reconstruction; synthetic tests demonstrate that our approach can provide satisfactory inversion results.

INTRODUCTION

Because low-velocity zones and rugged topography are widespread at the near surface, complex near surface imaging and velocity reconstruction are still challenging. Many studies have been devoted to near surface velocity reconstruction. Traveltime tomography (Nolet, 1987; Zhang and Toksoz, 1998) has been an important ray-based technique in the last decades. The widely used Multichannel Analysis of Surface Wave method (Xia and Miller, 1999) extracts dispersion curves and uses the dispersion characteristics of surface waves to invert near surface parameters. FWI (Tarantola, 1984; Pratt et al., 1998) has been applied to invert near surface velocity in recent years; it theoretically yields higher-resolution images and reveals more detailed information on subsurface structure than traveltime tomography. Yuan and Simons (2014) developed a waveform inversion with surface and body waves, because surface waves are important to constrain shallow structure. Shen (2014) extracted the early-arrival waveform to estimate near surface velocity.

However, FWI faces many obstacles that impede its successful implementation, such as cycle-skipping (Mulder and Plessix, 2008) caused by an initial velocity model that is far away from the target or by a lack of low frequencies in the seismic data. Meanwhile, cycle-skipping is the main culprit causing local minima in FWI. In order to overcome this problem, Luo and Schuster (1991) developed a least-square traveltime shift misfit function which has a robust convergence property, but it requires picking traveltimes. Van Leeuwen and Mulder (2010) proposed the conventional correlation-based approach, which does not need picked traveltimes as an optimization criterion. However, two problems remain for conventional correlation-based traveltime inversion. The first is that the conventional correlation-based misfit function cannot obtain an accurate gradient with the Born-based approximation. Luo et al. (2016) presented a full traveltime inversion (FTI) which replaces the Born approximation with the Rytov approximation. Because the Rytov approximation has more advantages in traveltime inversion, it can give a more accurate background gradient. The other problem is that the conventional correlation-based misfit function struggles to make the desired solution converge to the global minimum if the auto-correlation of the observed wavefields has some energy at non-zero time lags (Choi and Alkhalifah, 2016).

In this paper, we propose a new least-square correlation-based full traveltime inversion method to reconstruct near surface velocity, in which we extend the optimized correlation-based misfit function (Choi and Alkhalifah, 2016) to bring the desired solution close to the global minimum. In order to obtain a more accurate gradient direction, we put emphasis on the more reliable traveltime information and utilize the Rytov approximation to derive the gradient. In addition, we use the Spectral Element Method (Tromp et al., 2008) to simulate near surface seismic wave propagation, since the regular mesh of the traditional finite difference method is not suitable for rugged near surface forward modelling.

THEORY

Strong nonlinearity in a high-dimensional model space inverse problem leads to possible convergence to local minima of the misfit function, especially when the starting model is far from the target. Moreover, in the case of slow surface waves propagating in the low-velocity medium of the near surface, a more robust misfit function should be developed to avoid local minima. In this section, we first discuss the two problems of the conventional correlation-based method in detail, and then propose our new approach. Finally, we derive our gradient and adjoint source for traveltime inversion.
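For a single trace, the least-square correlation misfit developed in this section (the cross-correlation of equation 1, the auto-correlation of equation 4 and the misfit of equation 3) reduces to a few array operations. The following is a minimal sketch under stated assumptions, not the authors' implementation: the sum over sources and receivers is omitted, integrals are replaced by sums times `dt`, and the Ricker wavelets stand in for observed and calculated seismograms. The function names and the test signal are hypothetical.

```python
import numpy as np

def correlation(p_a, p_b, dt):
    """Return lags and C(tau) = integral of p_a(t) p_b(t + tau) dt for equal-length traces."""
    n = p_a.size
    lags = np.arange(-(n - 1), n) * dt
    # np.correlate(p_b, p_a)[k] = sum_n p_b[n + k] * p_a[n], lags in ascending order.
    return lags, np.correlate(p_b, p_a, mode="full") * dt

def ls_correlation_misfit(p_obs, p_cal, dt, max_tau):
    """Single-trace version of E = 1/2 int [tau (C_cross - C_auto)]^2 dtau."""
    lags, c_cross = correlation(p_cal, p_obs, dt)   # equation (1)
    _, c_auto = correlation(p_obs, p_obs, dt)       # equation (4)
    window = np.abs(lags) <= max_tau                # tau in [-max_tau, max_tau]
    residual = lags[window] * (c_cross[window] - c_auto[window])
    return 0.5 * np.sum(residual ** 2) * dt         # equation (3)

def ricker(t, f_peak, t0):
    """Ricker wavelet centered at t0, used here only as a test signal."""
    a = (np.pi * f_peak * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


# Toy usage: the calculated trace arrives 0.1 s later than the observed one.
dt = 0.002
t = np.arange(0.0, 2.0, dt)
p_obs = ricker(t, 10.0, 1.0)
p_cal = ricker(t, 10.0, 1.1)
e_same = ls_correlation_misfit(p_obs, p_obs, dt, max_tau=0.5)
e_shift = ls_correlation_misfit(p_obs, p_cal, dt, max_tau=0.5)
```

The misfit vanishes when the calculated trace equals the observed one, because the cross-correlation then coincides with the auto-correlation, and it is positive for the shifted trace. This is the behavior the least-squares matching of the two correlations is designed to provide.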

Conventional correlation-based misfit function

Let $p_o$ and $p_c$ be the observed and calculated wavefields, respectively. The cross-correlation between observed and calculated wavefields is given as

$$C_{cross}(\tau) = \int p_c(\mathbf{r}_g, t; \mathbf{r}_s)\, p_o(\mathbf{r}_g, t + \tau; \mathbf{r}_s)\, dt. \qquad (1)$$

The conventional correlation-based misfit function for wave-equation traveltime inversion (Van Leeuwen and Mulder, 2010) is defined as

$$E = \frac{1}{2} \sum_{\mathbf{r}_g, \mathbf{r}_s} \int \{ W(t) \cdot C_{cross}(\tau) \}^2 d\tau, \qquad (2)$$

where $W(t)$ is the weight function that penalizes energy at non-zero time lags. We use a 1D seismogram test to illustrate the first problem of the conventional correlation-based misfit function. Figure 1 shows the calculated and observed seismograms and their auto-correlation and cross-correlation. In the process of traveltime inversion, we try to minimize the cross-correlation. However, if the auto-correlation of the observed wavefields has some energy at non-zero traveltime shift, a traveltime shift still exists when the cross-correlation is minimized. Thus, the desired solution cannot be located at the global minimum (black solid line in Figure 1b), and it will be stuck in the local minimum (black dotted line in Figure 1b).

[Figure 1 graphic: panels (a) and (b)]

Figure 1: (a) The observed and calculated seismograms; (b) their auto-correlation and cross-correlation.

The new least-square correlation-based misfit function

To avoid the local minima caused by the conventional correlation-based misfit function, we take the auto-correlation into consideration, and extend the optimized correlation-based misfit function (Choi and Alkhalifah, 2016) to propose a new least-square correlation-based misfit function, which is defined as

$$E = \frac{1}{2} \sum_{\mathbf{r}_g, \mathbf{r}_s} \int [\tau \cdot \{ C_{cross}(\tau) - C_{auto}(\tau) \}]^2 d\tau, \qquad (3)$$

$$C_{auto}(\tau) = \int p_o(\mathbf{r}_g, t; \mathbf{r}_s)\, p_o(\mathbf{r}_g, t + \tau; \mathbf{r}_s)\, dt, \qquad (4)$$

where $\tau \in [-maxtau, maxtau]$ is a linear weight function in equation (3), and $C_{auto}$ is the auto-correlation of the observed wavefields. The new misfit function attempts to drive the cross-correlation to match the auto-correlation through the least-squares connection between them. Finally, the desired solution will be located at the global minimum, and the traveltime shift will tend to zero. This new misfit function combines the advantages of the previously proposed approaches. There is no need to pick traveltimes as an optimization criterion, and it can prompt the inversion to converge to the global minimum.

Comparison between Born and Rytov approximation

The Born and Rytov approximations are both derived under the assumption of weak scattering of wavefields. However, the Born approximation makes use of the amplitude of the wavefields, while the Rytov approximation is based on the complex phase of the wavefields.

Born approximation: We define a function $O(\mathbf{r}) = v_0 / v(\mathbf{r}) - 1$ which represents the velocity perturbations. Substituting $O(\mathbf{r})$ into the scalar acoustic wave equation in the frequency domain

$$\left( \Delta + \frac{\omega^2}{v^2(\mathbf{r})} \right) p(\mathbf{r}) = 0, \qquad (5)$$

we can obtain

$$\Delta p + k_0^2 p = -2 k_0^2 O(\mathbf{r}) p(\mathbf{r}) - k_0^2 O^2(\mathbf{r}) p(\mathbf{r}). \qquad (6)$$

We assume that the wavefield has the series solution $p = \sum_{m=0}^{\infty} p_m$, where $p_m$ is related to the $m$-th order of $O(\mathbf{r})$. Then, substituting the series solution into equation (6) and comparing each order of $O(\mathbf{r})$, we get the first-order Born approximation

$$p_1(\mathbf{r}) = -2 k_0^2 \int d\mathbf{r}'\, G(\mathbf{r} - \mathbf{r}')\, O(\mathbf{r}')\, p_0(\mathbf{r}'), \qquad (7)$$

where $G$ is the Green's function, and $p_0$ and $k_0$ represent the incident wavefield and the background wavenumber, respectively. When the scale of the object (that is, the velocity perturbation) is small, we can use the incident wavefield $p_0$ to replace the total wavefield $p$. However, when the scale of the object is too large, the Born approximation is invalid. The Born approximation can be applied only when the magnitude of the differential field is smaller than the incident wavefield, which implies that the change in phase between the incident field and the wave propagating through the object is less than π (Woodward, 1992).

Rytov approximation: We consider the incident wave as a plane wave $p = A e^{\Psi}$, where $\Psi = i\omega t$ is the complex phase of the wavefield. Assuming that the phase has the series solution $\Psi = \sum_{m=0}^{\infty} \Psi_m$, and using the same derivation as for the

Born approximation, we can obtain the first-order Rytov approximation

$$\Psi_1 = -\frac{2k_0^2}{p_0(\mathbf{r})} \int d\mathbf{r}'\, G(\mathbf{r}-\mathbf{r}')\, O(\mathbf{r}')\, p_0(\mathbf{r}'). \tag{8}$$

Comparing equations (7) and (8), we note that the expressions of the two approximations are almost the same. However, the Rytov approximation has more advantages for traveltime inversion, because it naturally separates traveltimes from amplitudes (Luo et al., 2016; Woodward, 1992):

$$\Delta\Psi(\omega) = \ln(p(\omega)) - \ln(p_0(\omega)) = \ln\left(\frac{A}{A_0}\right) + i\omega\Delta\tau, \tag{9}$$

where the wavefields are defined as $p(\omega) = Ae^{i\omega t}$, $A$ represents the amplitude, and $\Delta\tau = t - t_0$ is the traveltime shift. Furthermore, the Rytov approximation is expected to be valid for large-aperture angles and a small amount of scattering per wavelength (Chi et al., 2012).

Born and Rytov approximation in traveltime inversion:

We use a homogeneous model test to illustrate the second problem of the conventional correlation-based method. As shown in Figure 2, the true velocity is 4000 m/s, and we use a high (5000 m/s) and a low (3000 m/s) initial velocity to compare the gradients based on the two approximations. The gradient of the conventional correlation-based misfit function, which is based on the Born approximation, cannot distinguish between positive and negative gradient directions (Figures 2a and 2b). However, the Rytov approximation clearly indicates the gradient direction (Figures 2c and 2d).

Figure 2: Comparing the Born and Rytov approximation based gradient for high and low velocity. (a) Born-gradient with high velocity, (b) Born-gradient with low velocity, (c) Rytov-gradient with high velocity, (d) Rytov-gradient with low velocity.

Gradient and Adjoint source

From the above discussion, we recognize that the Rytov approximation is a better choice for traveltime inversion. We therefore use the Rytov approximation to derive our gradient. Following the idea of FTI, we ignore amplitude and put the emphasis on the more reliable traveltime information, because amplitude can introduce strong nonlinearity in traveltime inversion. We take the derivative of our new least-square correlation-based misfit function in equation (3) with respect to the model parameter (slowness $s(\mathbf{r})$) and obtain

$$\frac{\partial E}{\partial s} = \sum_{r_g, r_s} \int \tau^2 \cdot \{C_{cross} - C_{auto}\} \left(-\frac{\partial C_{cross}}{\partial \tau} \cdot \frac{\partial \Delta\tau}{\partial s}\right) d\tau. \tag{10}$$

From the Rytov scattering equation

$$p(r_g, \omega; r_s)\,\Delta\Psi = -2 \int s_0(r')\, G(r_g, \omega; r')\, \ddot{p}(r', \omega; r_s)\, \Delta s(r')\, dr', \tag{11}$$

where $s_0$ is the background slowness, we can get the derivative of $\Delta\tau$ with respect to the slowness

$$\frac{\partial \Delta\tau}{\partial s} = -2 s_0 \cdot \frac{\int \dot{p}_o(r_g, t+\tau; r_s)\, G(r', t; r_g) * \ddot{p}_c(r', t; r_s)\, dt}{\int \dot{p}_o(r_g, t+\tau; r_s)\, \dot{p}_o(r_g, t+\tau; r_s)\, dt}. \tag{12}$$

Substituting equation (12) into equation (10), and using the adjoint-state method, we can rewrite the gradient as

$$\boldsymbol{\Upsilon} = \frac{\partial E}{\partial s} = -4 s_0 \sum_{r_g, r_s} \int \ddot{p}_c \cdot p', \tag{13}$$

where $p'$ is the back-propagated wavefield from the receiver location, and our adjoint source is

$$\boldsymbol{\delta s} = \int \tau \cdot \{C_{cross} - C_{auto}\} \cdot C_{cross} \cdot \frac{\dot{p}_o(t+\tau)}{\int \dot{p}_o(t+\tau)\, \dot{p}_o(t+\tau)\, dt}\, d\tau. \tag{14}$$

We use the conjugate gradient method to update the velocity model.

EXAMPLES

In order to verify the effectiveness of our method, we first test it on a crosswell velocity model shown in Figure 3a. The grid size of the model is 132 × 329, and the grid interval is 1 m. We use a Ricker-wavelet source with a peak frequency of 25 Hz. Sources and receivers are distributed on both sides of the well. We start with a constant velocity of 5275 m/s. Figure 3b shows the inversion result after only 8 iterations. Then, we use the inverted result as the initial model for conventional FWI and obtain the high-resolution result shown in Figure 3c. These results show that our method is valid for traveltime inversion and that it can provide a very good initial model for conventional FWI.

In the second example, we apply our method to the complex near surface model shown in Figure 4b to illustrate the capability of our approach in near surface velocity reconstruction. The grid size of the model is 833 × 100 with a grid interval of 10 m. We use a source wavelet with a peak frequency of 15 Hz to model the data. Sources and receivers are evenly distributed on the rugged surface. In order to obtain more accurate wavefields, we use the Spectral Element Method to simulate near surface seismic wave propagation. Figure 4a is the irregular quadrilateral mesh for spectral

element method. The starting model is also a homogeneous medium of 4800 m/s, and it is far away from the target. The velocity model returned by the conventional correlation-based method is shown in Figure 4c. It clearly shows that its gradient does not produce the correct update direction. Figure 4d displays the inverted model using our proposed method after 25 iterations, and it converges very well. The result of FWI using our inverted result as the initial model is shown in Figure 4e, and it matches the true model very well.

Figure 3: (a) The true velocity model, (b) the inversion result of our method, (c) the inversion result of conventional FWI, using our inverted model as initial model.

Figure 4: (a) The irregular quadrilateral mesh for spectral element method, the mesh is sparse for demonstration, (b) the true complex near surface model, (c) the inversion result of conventional correlation-based traveltime inversion, (d) the inversion result of our method, (e) the inversion result of FWI, using our inverted model as initial velocity model.

CONCLUSIONS

Traveltime inversion with the conventional correlation-based misfit function has difficulty converging to the global minimum. Meanwhile, the Born approximation based gradient is unable to provide an accurate gradient direction. Therefore, we proposed a new least-square correlation-based misfit function to tackle the first problem. To solve the second problem, we used the Rytov approximation to derive the gradient and obtained a correct gradient direction. Based on successful numerical experiments, we have validated our method, which can generate well-converged results even without an accurate initial model. Our approach overcame or even eliminated the cycle-skipping problem, and the inverted result could be a very good initial model for FWI. Finally, we obtained a high-resolution near surface velocity model through the combination of our proposed method and conventional FWI.

ACKNOWLEDGEMENTS

We would like to thank Yanhua Yuan for helpful suggestions. The research was funded by the National Natural Science Foundation of China (Grant No. 42430321) and Statoil Petroleum (Grant No. 4503288025).

EDITED REFERENCES
REFERENCES
Chi, B., Y. Wang, Y. Liu, and L. Dong, 2012, Hybrid Born and Rytov scattering series and its application
in full waveform inversion: 82nd Annual International Meeting, SEG, Expanded Abstracts, 1–5,
http://doi.org/10.1190/segam2012-0934.1.
Choi, Y., and T. Alkhalifah, 2016, An optimized correlation-based full waveform inversion: 76th Annual
International Conference and Exhibition, EAGE, Extended Abstracts, Tu P1 13,
http://doi.org/10.3997/2214-4609.201600642.
Luo, Y., Y. Ma, Y. Wu, H. Liu, and C. Lei, 2016, Full-traveltime inversion: Geophysics, 81, no. 5, R261–
R274, http://doi.org/10.1190/geo2015-0353.1.
http://doi.org/10.1190/1.1443081.
Mulder, W., and R. E. Plessix, 2008, Exploring some issues in acoustic full-waveform inversion:
Geophysical Prospecting, 56, 827–841, http://doi.org/10.1111/j.1365-2478.2008.00708.x.
Nolet, G., 1987, Seismic tomography: Reidel.
Pratt, R. G., C. Shin, and G. J. Hicks, 1998, Gauss-Newton and full Newton methods in frequency domain
https://doi.org/10.1046/j.1365-246X.1998.00498.x.
Shen, X., 2014, Early-arrival waveform inversion for near-surface velocity estimation: Ph.D. thesis,
Stanford Exploration Project, Geophysics Department.
1259–1266, http://doi.org/10.1190/1.1441754.
Tromp, J., D. Komatitsch, and Q. Liu, 2008, Spectral-element and adjoint methods in seismology: Communications in
Computational Physics, 3, 1–32.
Van Leeuwen, T., and W. A. Mulder, 2010, A correlation-based misfit criterion for wave-equation
https://doi.org/10.1111/j.1365-246X.2010.04681.x.
Woodward, M. J., 1992, Wave-equation tomography: Geophysics, 57, 15–26,
http://doi.org/10.1190/1.1443179.
Xia, J., R. D. Miller, C. B. Park, and G. Tian, 2002, Determining Q of near-surface materials from
Rayleigh waves: Journal of Applied Geophysics, 51, 121–129, http://doi.org/10.1016/S0926-
9851(02)00228-8.
Yuan, Y. O., F. J. Simons, and E. Bozdag, 2015, Multiscale adjoint waveform tomography for surface
and body waves: Geophysics, 80, no. 5, R281–R302, http://doi.org/10.1190/geo2014-0461.1.
Zhang, J., and M. N. Toksöz, 1998, Nonlinear refraction traveltime tomography: Geophysics, 63, 1726–
1737, http://doi.org/10.1190/1.1444468.

Building good initial model for full-waveform inversion using frequency shift filter
Guanchao Wang, Sanyi Yuan, Wanwan Wei, Shangxu Wang, Xianshi Ye.
China University of Petroleum-Beijing, State Key Laboratory of Petroleum Resources and Prospecting, CNPC Key
Laboratory of Geophysical Exploration
Summary

An accurate initial model is crucial for full waveform inversion. The instantaneous amplitude envelope of a wavelet is invariant under frequency shift. This means that resolution is constant for a given frequency bandwidth, and independent of the actual values of the frequencies (Knapp, 1990). Based on this property, we propose a frequency shift filter (FSF) to build the relationship between low- and high-frequency information with constant frequency bandwidth. Then, we can use the high-frequency information to invert the low-frequency velocity model. Numerical results using synthetic data from the Marmousi model demonstrate that our proposed envelope misfit function based on the frequency shift filter can build an initial model with more accurate long-wavelength components.

Introduction

The full-waveform inversion (FWI) technique is an important geophysical tool for recovering unknown subsurface structures and physical properties, such as velocity, density or the quality factor (Qu et al. 2016), from pre-stack seismic data. Since Tarantola (1984) put forward the basic theory of FWI, it has been intensively developed in both synthetic and field data experiments (Virieux and Operto 2009, for a recent review).

Although FWI has been successfully applied in many blind and real data cases, there are still many practical challenges. Among these difficulties, the most significant is that FWI tends to become trapped in local minima or to suffer from cycle-skipping because of the local character of the optimization method used. Two ways to tackle this challenge are low-frequency information in the seismic data and an accurate starting model. In the past few years, considerable effort in the industry has been put into developing new sources and receivers that can generate and record low-frequency signals (1.5-5 Hz) at a significantly higher cost than previously. However, in general, low-frequency sources are not always available, and traditional seismic records go down only to approximately 5 Hz. Therefore, the starting model is still a pressing issue for FWI. Fortunately, many strategies have been developed to improve the starting model and to relax the high nonlinearity. Shin and Cha (2008) proposed a Laplace domain FWI approach to build a long-wavelength velocity model. Later, they extended the idea and proposed the Laplace-Fourier domain FWI to provide a more refined result compared with Laplace domain FWI. The tomography components in the FWI gradient have been separated and preserved in the context of FWI by Xu et al. (2012), Ma and Hale (2013) and Chi et al. (2015) to update the background velocity model. Alkhalifah (2015) further separated the gradient components based on the scattering angle filter. Wu et al. (2013) proposed the envelope inversion method. Similarly, Hu et al. (2014) proposed the beat tone inversion, which uses two adjacent high frequencies to obtain a good starting model.

In this paper, within the framework of the envelope inversion (EI) method, we introduce an FSF method into FWI to build a relationship between low- and high-frequency information. Therefore, we can use the high-frequency signal in seismic data to recover the low-wavenumber components. The effectiveness of the method is demonstrated with synthetic data generated from the Marmousi model. The numerical results show that the proposed method can update the long-wavelength model and provide a better initial model for traditional FWI.

Method

Invariance of wavelet envelope under frequency shift

The analytic transform $x_a(t)$ of seismic trace $x(t)$ is

$$x_a(t) = x(t) + i x_h(t), \tag{1}$$

where $x(t)$ is the real seismic trace and $x_h(t)$ is the Hilbert transform of $x(t)$. The instantaneous amplitude envelope of $x_a(t)$ [and of $x(t)$] is

$$E[x(t)] = [x_a(t)\, x_a^{*}(t)]^{1/2}, \tag{2}$$

where $x_a^{*}(t)$ is the complex conjugate of $x_a(t)$. If $X(f)$ is the Fourier transform of $x(t)$, then the frequency shift theorem of the Fourier transform is given by

$$X(f - f_0) \underset{\mathrm{FT}}{\overset{\mathrm{IFT}}{\rightleftharpoons}} x(t)\, e^{i 2\pi t f_0}. \tag{3}$$

Then,

$$\begin{aligned} E[x(t)\, e^{i 2\pi t f_0}] &= \{x(t)\, e^{i 2\pi t f_0}\, [x(t)\, e^{i 2\pi t f_0}]^{*}\}^{1/2} \\ &= \{x(t)\, e^{i 2\pi t f_0}\, x^{*}(t)\, e^{-i 2\pi t f_0}\}^{1/2} \\ &= [x(t)\, x^{*}(t)]^{1/2} \\ &= E[x(t)]. \end{aligned} \tag{4}$$

This derivation shows that the instantaneous amplitude envelope is invariant under frequency shift, as shown in Figure 1.
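The invariance stated in Equations (1) to (4) can be checked numerically. The following Python sketch is an illustration only: the 5 Hz Ricker wavelet, the 7 Hz shift, the sampling and the helper names `ricker` and `envelope` are assumptions chosen for the demonstration, not values prescribed by the method. The shift is applied to the analytic trace, so that the shifted real trace is the real part of $x_a(t)e^{i2\pi t f_0}$.

```python
import numpy as np
from scipy.signal import hilbert


def ricker(t, peak_hz):
    """Zero-phase Ricker wavelet with the given peak frequency (illustrative helper)."""
    a = (np.pi * peak_hz * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


def envelope(x):
    """Instantaneous amplitude envelope of a real trace, Equation (2)."""
    return np.abs(hilbert(x))  # hilbert() returns the analytic signal x + i*x_h


# Assumed sampling: 1000 samples at 1 ms, centred on t = 0.
n, dt = 1000, 0.001
t = (np.arange(n) - n // 2) * dt
x = ricker(t, 5.0)
f0 = 7.0  # assumed frequency shift in Hz

# Frequency shift theorem, Equation (3), applied to the analytic trace.
shifted = np.real(hilbert(x) * np.exp(2j * np.pi * f0 * t))

# The envelopes should agree up to rounding error, Equation (4).
max_env_diff = float(np.max(np.abs(envelope(shifted) - envelope(x))))

# The spectral peak moves from 5 Hz to 5 + 7 = 12 Hz.
freqs = np.fft.rfftfreq(n, dt)
peak_before = float(freqs[np.argmax(np.abs(np.fft.rfft(x)))])
peak_after = float(freqs[np.argmax(np.abs(np.fft.rfft(shifted)))])
print(max_env_diff, peak_before, peak_after)
```

The shifted wavelet oscillates faster than the original one, but its envelope, and therefore its resolution in the sense of Knapp (1990), is unchanged.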

Building good initial model using FSF
Figure 1. Invariance of wavelet envelope under frequency shift. (a) Wavelets with constant frequency bandwidth in the frequency domain (wavelet spectrum and shifted-wavelet spectrum; amplitude versus frequency in Hz). (b) Corresponding wavelets and their envelopes in the time domain (amplitude versus time in s).

Frequency shift filter (FSF) based envelope inversion

In this paper, based on the envelope invariance property with constant frequency bandwidth, we propose a filter whose frequency domain formula is

$$P(W_{Original}, W_{Target}, f_0) = \frac{W_{Original}(f)\, W_{Target}(f - f_0)}{|W_{Original}(f)|^2 + \varepsilon^2}, \tag{5}$$

where $W_{Original}(f)$ is the original wavelet, $W_{Target}(f)$ and $W_{Target}(f - f_0)$ represent the low-frequency target wavelet and the frequency shifted target wavelet, respectively, and $f_0$ is the frequency shift.

We define the envelope misfit function based on the frequency shift filter and derive the corresponding gradient operator:

$$\sigma = \frac{1}{2} \int_0^T dt\, \{E[d_{cal}]^p - E[P(d_{obs})]^p\}^2. \tag{6}$$

Then,

$$\frac{\partial \sigma}{\partial m} = p \int_0^T dt\, \{R_{res}\, d_{cal}\, E(d_{cal})^{p-2} - [R_{res}\, d_{cal}\, E(d_{cal})^{p-2}]_h\} \frac{\partial u}{\partial m}, \tag{7}$$

where

$$d_{cal} = Su = S(G * w_{Target}), \tag{8}$$

$$R_{res} = E[d_{cal}]^p - E[P(d_{obs})]^p, \tag{9}$$

$d_{obs}$ is the original observed seismic data, $S$ is the sampling operator at the receiver locations, $p > 0$ is the power for the envelope data, $G$ represents the Green’s function, and $*$ denotes a temporal convolution. According to Equation 7, the gradient can also be calculated by using the back-propagation method, and the term inside the braces serves as the effective residual. The step length of our method can be calculated using the variable step length method (Köhn, 2012).

Examples

We test the effectiveness of the proposed frequency shift filter on the Marmousi model using synthetic data. We extract a section (Figure 2a) of 4.65 km × 1.48 km from the original Marmousi model. The linear gradient initial model is shown in Figure 2b. The first-order acoustic wave equation is used to produce the synthetic data. The forward modeling is performed with a high-order staggered grid finite difference scheme in the time domain. The grid interval is 10 m × 10 m in the horizontal and vertical directions. When generating synthetic seismograms, the Ricker source with a peak frequency of 20 Hz was filtered with a 7-Hz low-cut taper. 45 sources are fired at a depth of 50 m with a spacing of 100 m between adjacent sources.

Figure 2. The true velocity model (a) and the linear initial model (b)
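The filter of Equation 5 and the envelope residual of Equations 6 and 9 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the numerator uses the complex conjugate of the original spectrum (a common stabilized-deconvolution form that coincides with Equation 5 for zero-phase wavelets), the stabilization constant is set to a small fraction of the peak spectral amplitude, and the "trace" is simply a plain 20 Hz Ricker wavelet without the low-cut taper used in the Examples section.

```python
import numpy as np
from scipy.signal import hilbert


def ricker(t, peak_hz):
    """Zero-phase Ricker wavelet (illustrative helper)."""
    a = (np.pi * peak_hz * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


def envelope(x):
    """Instantaneous amplitude envelope of a real trace."""
    return np.abs(hilbert(x))


def frequency_shift_filter(w_original, w_target, f0, dt, eps_rel=1e-3):
    """One-sided (rfft) sketch of the filter in Equation 5.

    eps_rel is an assumed stabilization level relative to max|W_Original|.
    """
    n = len(w_original)
    wo = np.fft.rfft(w_original)
    wt = np.fft.rfft(w_target)
    k0 = int(round(f0 * n * dt))           # frequency shift in bins
    wt_shifted = np.zeros_like(wt)
    wt_shifted[k0:] = wt[:len(wt) - k0]    # W_Target(f - f0)
    eps = eps_rel * np.max(np.abs(wo))
    return np.conj(wo) * wt_shifted / (np.abs(wo) ** 2 + eps ** 2)


def apply_filter(trace, filt):
    """Apply a one-sided frequency-domain filter to a real trace."""
    return np.fft.irfft(np.fft.rfft(trace) * filt, n=len(trace))


def envelope_misfit(d_cal, d_obs_filtered, dt, p=2.0):
    """Discrete form of Equation 6 for a single trace."""
    r_res = envelope(d_cal) ** p - envelope(d_obs_filtered) ** p  # Equation 9
    return 0.5 * float(np.sum(r_res ** 2)) * dt


n, dt = 1000, 0.001
t = (np.arange(n) - n // 2) * dt
w20 = ricker(t, 20.0)   # original wavelet, also used as the observed trace
w5 = ricker(t, 5.0)     # low-frequency target wavelet

filt = frequency_shift_filter(w20, w5, f0=7.0, dt=dt)
filtered = apply_filter(w20, filt)

# The filtered trace has no energy below 7 Hz, yet its envelope matches
# the envelope of the 5 Hz target wavelet.
rel_env_err = float(np.max(np.abs(envelope(filtered) - envelope(w5)))
                    / np.max(envelope(w5)))
misfit_raw = envelope_misfit(w5, w20, dt)
misfit_filtered = envelope_misfit(w5, filtered, dt)
print(rel_env_err, misfit_raw, misfit_filtered)
```

In this toy setting the envelope misfit against the 5 Hz reference drops by several orders of magnitude once the observed trace has been passed through the filter, which is the behavior the method relies on.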

Figure 3. Top row: seismic records; bottom row: the corresponding envelopes. Original shot record generated by a low-cut (<7 Hz) 20 Hz Ricker wavelet (left), original shot record filtered by the frequency shift filter using a 5 Hz Ricker wavelet as the target wavelet with $f_0$ = 7 Hz (middle), and reference shot record generated by a 5 Hz Ricker wavelet (right).

Figure 3 shows the behavior of the frequency shift filter. Figures 3a and 3d show the original shot record generated by a low-cut (<7 Hz) 20 Hz Ricker wavelet and its envelope, respectively. Figure 3b shows the original shot record after filtering by the frequency shift filter using a 5 Hz Ricker wavelet as the target wavelet with $f_0$ = 7 Hz. Compared with the original record, the filtered record in Figure 3b has more events (black arrow) and a lower resolution. However, Figure 3b and the reference shot record in Figure 3c, generated by a 5 Hz Ricker wavelet, have the same envelope (Figure 3e and Figure 3f) and hence the same resolution (Knapp, 1990). This is the relationship between low- and high-frequency information with constant frequency bandwidth, and the reason why we can use the filtered record to recover the low-wavenumber components.

The model inverted from the Marmousi data after 12 iterations of conventional envelope inversion is shown in Figure 4a. The final result of EI + FWI is shown in Figure 4b. The velocity model from the conventional EI method is seriously contaminated by high-wavenumber information. In addition, the final result of EI + FWI is far from the true model.

We first apply our proposed frequency shift filter to the data sets. Then, we take Equation 6 as the misfit function. The result of our FSF method after 12 iterations, using a 5 Hz Ricker wavelet as the target wavelet with frequency shift $f_0$ = 7 Hz, is shown in Figure 5a. The final inverted model of FSF + FWI (Figure 5b) is closer to the true one than that shown in Figure 4b. The experimental results demonstrate that the long wavelength components recovered by FSF

method can be used as an initial model for FWI to improve the final results.

Figure 4. (a) The result of conventional EI method. (b) The final result of EI + FWI.

Figure 5. (a) The result of FSF method. (b) The final result of FSF + FWI.

Discussion

The aim of this study is to build a relationship between low- and high-frequency information. Based on the invariance of the wavelet envelope under frequency shift, we design a frequency shift filter to build the relationship between low- and high-frequency information with constant frequency bandwidth. Therefore, we can use the high-frequency information filtered out by FSF to obtain the low-frequency information in the model. Our FSF method has two clear advantages. On the one hand, it can reduce the cycle-skipping or instability of conventional envelope inversion when the seismic record contains a wider range of frequencies (Wang et al., 2017). On the other hand, the FSF method does not need any low-frequency information, so it can avoid the effects caused by serious low-frequency noise or energy in the seismic shot record.

Conclusions

In this paper, we introduced a frequency shift filter into the misfit function to build a relationship between low- and high-frequency information, based on the invariance of the wavelet envelope under frequency shift. Therefore, we can use the high-frequency signal in seismic data to recover the low-wavenumber components in the model. The effectiveness of the FSF method is demonstrated with a 2D synthetic data set generated from the Marmousi model. The numerical results demonstrate that the proposed FSF method can update the long-wavelength information effectively and build a better initial model for FWI.

Acknowledgments

This research was supported by the National Basic Research Program (2013CB228600), and the National Natural Science Foundation of China (41674127). We also thank reviewers for reviewing this manuscript.

EDITED REFERENCES
REFERENCES
Alkhalifah, T., 2015, Scattering-angle based filtering of the waveform inversion gradients: Geophysical
Journal International, 200, 363–373, http://dx.doi.org/10.1093/gji/ggu379.
Chi, B. X., L. G. Dong, and Y. Z Liu, 2015, Correlation-based reflection full-waveform inversion:
Hu, W., 2014, FWI without low frequency data-beat tone inversion: 84th Annual International Meeting,
SEG, Expanded Abstracts, 1116–1120, http://dx.doi.org/10.1190/segam2014-0978.1.
Knapp, R. W., 1990, Vertical resolution of thick beds, thin beds, and thin-bed cyclothems: Geophysics,
55, no. 6, 1183–1190, http://dx.doi.org/10.1190/1.1442934.
Köhn, D., 2011, Time domain 2D elastic full waveform tomography: Ph.D. dissertation, Kiel University.
Ma, Y., and D. Hale, 2013, Wave-equation reflection traveltime inversion with dynamic warping and
full-waveform inversion: Geophysics, 78, no. 6, R223–R233, http://dx.doi.org/10.1190/geo2013-
0004.1.
Qu, Y. M., Z. C. Li, J. P. Huang, and J. L. Li, 2016, Viscoacoustic anisotropic full waveform inversion:
Journal of Applied Geophysics, 13, no. 3, 511–518,
http://dx.doi.org/10.1016/j.jappgeo.2016.12.001.
Shin, C., and Y. H. Cha, 2008, Waveform inversion in the Laplace domain: Geophysical Journal
International, 173, 922–931, http://dx.doi.org/10.1111/j.1365-246X.2008.03768.x.
1259–1266, http://dx.doi.org/10.1190/1.1441754.
Wang, G. C., S. X. Wang, and S. Y. Yuan, 2017, Multi-scale envelope inversion method based on scale
separation: 79th Annual International Meeting, EAGE, Expanded Abstracts.
Wu, R.-S., J. Luo, and B. Wu, 2014, Seismic envelope inversion and modulation signal model:
Geophysics, 79, no. 3, WA13–WA24, http://dx.doi.org/10.1190/geo2013-0294.1.
Xu, S., D. Wang, Y. Chen, Y. Zhang, and G. Lambare, 2012, Full waveform inversion for reflected
seismic data: 74th Annual International Conference and Exhibition, EAGE, Extended Abstracts,
http://dx.doi.org/10.3997/2214-4609.20148725.

Fast Double Plane Wave FWI using the Scattering Integral Method in Frequency Domain
Zeyu Zhao* and Mrinal K. Sen, Institute for Geophysics, Jackson School of Geosciences, The University of
Texas at Austin
SUMMARY

We propose a fast full waveform inversion (FWI) method using double plane wave (DPW) data in the frequency domain. The gradient of the misfit function can be obtained efficiently using the scattering integral (SI) method, where plane wave Green’s functions are used to construct the sensitivity kernel. The number of plane wave Green’s functions required for computing a gradient without aliasing depends only on the frequencies and on the source line and receiver line lengths. This number is relatively small for low frequencies, and it does not change with increasing data volume or increasing model size. The sensitivity kernels can be constructed very efficiently, which facilitates the sensitivity analysis. The effectiveness of the proposed method is demonstrated using the synthetic 2D Marmousi model.

INTRODUCTION

Building accurate subsurface elastic parameters using FWI has become increasingly popular in exploration applications (Tarantola 1984; Pratt 1999; Virieux and Operto 2009). FWI in the time (Mora 1987) and frequency domains (Pratt et al., 1998) has been demonstrated to be effective in recovering subsurface model parameters with the help of low frequency and long offset data (Mora 1988; Bunks et al., 1995; Xue et al., 2016).

FWI is typically formulated as a local optimization method where the gradient of the misfit function is required to minimize the differences between observed and predicted data. Both SI (Chen et al., 2007; Tao and Sen 2013a) and adjoint-state methods (Plessix 2006) can be used to construct the gradient. In exploration applications, where the number of sources is generally significantly smaller than the number of receivers, implementing the adjoint-state method is believed to be more efficient than the SI method. Nevertheless, both methods become computationally expensive with increasing seismic data volume. Simultaneous source (Herrmann et al., 2009; Ben-Hadj-Ali et al., 2011) and plane wave (Vigh and Starr 2008; Tao and Sen 2013b) FWI methods have been proposed to improve the computational efficiency.

In this study, we propose a fast FWI method using the SI method with DPW data in the frequency domain, which we name DPW FWI. The DPW domain is a pure plane wave domain, where seismic data are fully decomposed into plane wave components. Full waveform modeling in the DPW-frequency domain was introduced by Sen and Frazer (1991). Different imaging methods using DPW data have been proposed by Fokkema and van den Berg (1992), Stoffa et al. (2006), and Zhao et al. (2016, 2017). We show that both the sensitivity kernel and the gradient can be constructed with high efficiency in DPW FWI.

THEORY

FWI misfit and gradient

In traditional shot-profile FWI, the differences between observed data and predicted data are measured by (Tarantola 1984)

$$E(\mathbf{m}) = \frac{1}{2}\, \delta\mathbf{d}^{\dagger} \delta\mathbf{d}, \tag{1}$$

where $E(\mathbf{m})$ is the misfit function, $\mathbf{m}$ is the model parameter, $\delta\mathbf{d} = \mathbf{d}_{obs} - \mathbf{d}_{pre}$ is the difference between the observed data $\mathbf{d}_{obs}$ and the predicted data $\mathbf{d}_{pre}$, and the superscript $\dagger$ represents the adjoint operator. In FWI, $\mathbf{d}_{pre}$ has a non-linear relation with $\mathbf{m}$. The best set of model parameters describing the data can be found when the misfit function reaches its minimum.

Assuming our model is close to the true model, the non-linear problem can be linearized by expanding the misfit function in the vicinity of the best solution (Pratt et al., 1998; Virieux and Operto 2009). We can apply the gradient method to update the model parameters iteratively by

$$\mathbf{m}_{k+1} = \mathbf{m}_k + \alpha \nabla_{\mathbf{m}} E_k, \tag{2}$$

where $k$ is the iteration number and $\alpha$ is the step length along the updating direction. $\nabla_{\mathbf{m}} E$ is the gradient of the misfit function with respect to the model parameter, $\nabla_{\mathbf{m}} E = \mathrm{Re}(\mathbf{J}^{\dagger} \delta\mathbf{d})$, where $\mathbf{J}^{\dagger}$ is the conjugate transpose of the Frechet derivative matrix. In the frequency domain, assuming constant density acoustic media and weak scattering (Sirgue and Pratt 2004, Zhao 2017), the gradient can be written element wise as

$$g(\mathbf{x}) = \mathrm{Re}\left( \sum_{s} \sum_{r} \frac{-2\omega^2 f(\omega)}{v^3(\mathbf{x})}\, G(\mathbf{s}, \mathbf{x}, \omega)\, G(\mathbf{x}, \mathbf{r}, \omega)\, \delta d^{*}(\mathbf{s}, \mathbf{r}, \omega) \right), \tag{3}$$

DPW FWI
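where superscript $*$ represents complex conjugate, $\mathbf{x}$ is the parameter location, $\mathbf{s}$ is the source location, $\mathbf{r}$ is the receiver location, $\omega$ represents angular frequency, $f(\omega)$ is the source function in the frequency domain, $v(\mathbf{x})$ is the velocity, and $G(\mathbf{s}, \mathbf{x}, \omega)$ and $G(\mathbf{x}, \mathbf{r}, \omega)$ are the Green’s functions from $\mathbf{s}$ to $\mathbf{x}$ and from $\mathbf{x}$ to $\mathbf{r}$, respectively. Implementing the SI method using Equation (3) requires calculating a large number of Green’s functions, which is computationally expensive. Therefore, the adjoint-state method (Tromp et al., 2005), which requires only two wavefield propagation processes for each shot, is typically utilized to compute the gradient.

DPW FWI with scattering integral method

We start with Equation (3) and derive an expression for the gradient to be computed with the DPW data residual. Using the reciprocity principle (i.e. $G(\mathbf{x}, \mathbf{r}, \omega) = G(\mathbf{r}, \mathbf{x}, \omega)$), Equation (3) becomes

$$g(\mathbf{x}) = \mathrm{Re}\left( \sum_{s} \sum_{r} \frac{-2\omega^2 f(\omega)}{v^3(\mathbf{x})}\, G(\mathbf{s}, \mathbf{x}, \omega)\, G(\mathbf{r}, \mathbf{x}, \omega)\, \delta d^{*}(\mathbf{s}, \mathbf{r}, \omega) \right). \tag{4}$$

The data residual $\delta d(\mathbf{s}, \mathbf{r}, \omega)$ can be represented by DPW data (Stoffa et al., 2006; Zhao et al., 2015a), and it can be written as

$$\delta d(\mathbf{s}, \mathbf{r}, \omega) = \omega^4 \iint \delta d(\mathbf{p}_s, \mathbf{p}_r, \omega) \exp[-i\omega(\mathbf{p}_s \cdot (\mathbf{s} - \mathbf{x}_{ref}) + \mathbf{p}_r \cdot (\mathbf{r} - \mathbf{x}_{ref}))]\, d\mathbf{p}_s\, d\mathbf{p}_r, \tag{5}$$

where $\mathbf{p}_s$ and $\mathbf{p}_r$ are the source and receiver plane wave ray-parameters, respectively, $\mathbf{x}_{ref}$ is the reference point for the DPW transform, and $\delta d(\mathbf{p}_s, \mathbf{p}_r, \omega)$ is the difference between observed and predicted DPW data. We will explain how to compute $\delta d(\mathbf{p}_s, \mathbf{p}_r, \omega)$ later in this section. Substituting Equation (5) into Equation (4) and replacing the summations with integrals, we obtain

$$\begin{aligned} g(\mathbf{x}) = \mathrm{Re}\Big( & \frac{-2\omega^6 f(\omega)}{v^3(\mathbf{x})} \int_{s} \int_{r} G(\mathbf{s}, \mathbf{x}, \omega)\, G(\mathbf{r}, \mathbf{x}, \omega) \\ & \times \int_{p_s} \int_{p_r} \delta d^{*}(\mathbf{p}_s, \mathbf{p}_r, \omega) \\ & \times \exp[+i\omega(\mathbf{p}_s \cdot (\mathbf{s} - \mathbf{x}_{ref}) + \mathbf{p}_r \cdot (\mathbf{r} - \mathbf{x}_{ref}))] \\ & \times d\mathbf{p}_s\, d\mathbf{p}_r\, d\mathbf{s}\, d\mathbf{r} \Big). \end{aligned} \tag{6}$$

Reorganizing terms and slant stacking Green’s functions in Equation (6), we obtain

$$\begin{aligned} g(\mathbf{x}) = \mathrm{Re}\Big( & \frac{-2\omega^6 f(\omega)}{v^3(\mathbf{x})} \int_{p_s} \int_{p_r} G(\mathbf{p}_s, \mathbf{x}, \omega)\, G(\mathbf{p}_r, \mathbf{x}, \omega) \\ & \times \delta d^{*}(\mathbf{p}_s, \mathbf{p}_r, \omega) \exp[+i\omega(\mathbf{p}_s + \mathbf{p}_r) \cdot (\mathbf{x}_h - \mathbf{x}_{ref})] \\ & \times d\mathbf{p}_s\, d\mathbf{p}_r \Big), \end{aligned} \tag{7}$$

where $\mathbf{x}_h$ is the horizontal position of $\mathbf{x}$, and $G(\mathbf{p}_s, \mathbf{x}, \omega)$ and $G(\mathbf{p}_r, \mathbf{x}, \omega)$ are the source and receiver plane wave Green’s functions, respectively, which can be computed by

$$\begin{aligned} G(\mathbf{p}_s, \mathbf{x}, \omega) &= \int G(\mathbf{s}, \mathbf{x}, \omega) \exp(+i\omega\, \mathbf{p}_s \cdot (\mathbf{s} - \mathbf{x}_h))\, d\mathbf{s}, \\ G(\mathbf{p}_r, \mathbf{x}, \omega) &= \int G(\mathbf{r}, \mathbf{x}, \omega) \exp(+i\omega\, \mathbf{p}_r \cdot (\mathbf{r} - \mathbf{x}_h))\, d\mathbf{r}. \end{aligned} \tag{8}$$

Detailed derivations for Equation (7) can be found elsewhere (Zhao et al., 2015b). The gradient of the misfit function can be computed using Equation (7) in the form of SI, where the DPW data residual and plane wave Green’s functions are utilized. The DPW data residual $\delta d(\mathbf{p}_s, \mathbf{p}_r, \omega)$ can be obtained by the DPW transform of $\delta d(\mathbf{s}, \mathbf{r}, \omega)$:

$$\delta d(\mathbf{p}_s, \mathbf{p}_r, \omega) = \iint [d_{obs}(\mathbf{s}, \mathbf{r}, \omega) - d_{pre}(\mathbf{s}, \mathbf{r}, \omega)] \exp[+i\omega(\mathbf{p}_s \cdot (\mathbf{s} - \mathbf{x}_{ref}) + \mathbf{p}_r \cdot (\mathbf{r} - \mathbf{x}_{ref}))]\, d\mathbf{s}\, d\mathbf{r}. \tag{9}$$

Plane wave Green’s functions (i.e. Equation (8)) do not depend on source or receiver locations. A plane wave Green’s function can be used for either $\mathbf{p}_s$ or $\mathbf{p}_r$. Depending on the plane wave ranges and model setups, the number of Green’s functions that need to be computed for the entire model is relatively small. Therefore, the computational efficiency of computing the gradient can be greatly improved. Selecting only one $\mathbf{p}_s$-$\mathbf{p}_r$ pair from Equation (7), we identify that the sensitivity kernel can be written as

$$J(\mathbf{x}, \mathbf{p}_s, \mathbf{p}_r) = \frac{-2\omega^6 f(\omega)}{v^3(\mathbf{x})}\, G(\mathbf{p}_s, \mathbf{x}, \omega)\, G(\mathbf{p}_r, \mathbf{x}, \omega) \exp[+i\omega(\mathbf{p}_s + \mathbf{p}_r) \cdot (\mathbf{x}_h - \mathbf{x}_{ref})]. \tag{10}$$

The analysis of this sensitivity kernel is reported in a separate paper (Zhao 2017).

Selecting plane waves

The number of plane wave Green’s functions required for computing the gradient without aliasing depends on the frequencies and on the source line and receiver line lengths. The quantitative requirement for representing a spherical wave with plane waves is given by (Zhang et al., 2005)

$$\Delta p = \frac{p_{max} - p_{min}}{N_p} \le \frac{1}{fL}, \tag{11}$$

where $N_p$ is the number of plane waves, $f$ is the frequency and $L$ is the length of either source line or receiver line.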
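Equations (9) and (11) lend themselves to a short numerical illustration. The Python sketch below is an assumption-laden example rather than the authors' code: the function names `min_plane_waves` and `dpw_transform` are hypothetical, sources and receivers lie on a single horizontal line so that ray-parameters and positions are scalars, the integrals in Equation (9) are replaced by rectangle-rule sums, and the 9.18 km line length (459 intervals of 0.02 km) is an assumed value.

```python
import math

import numpy as np


def min_plane_waves(freq_hz, line_length_km, p_min, p_max):
    """Smallest Np that satisfies Equation (11): (p_max - p_min) / Np <= 1 / (f L)."""
    return math.ceil((p_max - p_min) * freq_hz * line_length_km)


def dpw_transform(residual, src_x, rec_x, p_s, p_r, omega, x_ref):
    """Rectangle-rule form of Equation (9) for one angular frequency.

    residual : complex array (n_src, n_rec) holding d_obs - d_pre
    returns  : complex array (len(p_s), len(p_r))
    """
    ds = src_x[1] - src_x[0]
    dr = rec_x[1] - rec_x[0]
    phase_s = np.exp(1j * omega * np.outer(p_s, src_x - x_ref))  # (n_ps, n_src)
    phase_r = np.exp(1j * omega * np.outer(p_r, rec_x - x_ref))  # (n_pr, n_rec)
    return phase_s @ residual @ phase_r.T * ds * dr


# Plane-wave counts from Equation (11) for an assumed 9.18 km line
# and ray-parameters between -0.7 and 0.7 s/km.
counts = {f: min_plane_waves(f, 9.18, -0.7, 0.7) for f in (5.0, 6.5, 8.5, 15.0)}

# Synthetic check of the transform: a residual made of a single
# (p_s, p_r) component should map to a peak at that pair.
freq = 5.0
omega = 2.0 * np.pi * freq
x = np.arange(64) * 0.02            # 64 positions, 0.02 km apart (assumed)
x_ref = float(x.mean())
p = np.linspace(-0.7, 0.7, 29)      # ray-parameter grid, 0.05 s/km step
i_s, i_r = 18, 8                    # p[18] is about 0.2, p[8] about -0.3 s/km
residual = np.exp(-1j * omega * (p[i_s] * (x[:, None] - x_ref)
                                 + p[i_r] * (x[None, :] - x_ref)))
dpw = dpw_transform(residual, x, x, p, p, omega, x_ref)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(np.abs(dpw)), dpw.shape))
print(counts, peak)
```

Under these assumptions the bound gives 65 plane waves at 5 Hz and 84 at 6.5 Hz. Writing the transform as two matrix products keeps the cost of one frequency at two dense multiplications, independent of how the residual was generated.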
NUMERICAL EXAMPLE

We demonstrate the application of our proposed method with the synthetic 2D Marmousi model. The true model and the initial model for FWI are shown in Figure 1. There are 460 horizontal and 150 vertical grid points. Grid spacing is 0.02 km in both directions. There were 460 shot gathers generated using a frequency domain finite difference method with 460 receivers for each shot. The frequency components of the shot gathers ranged from 5 to 20 Hz with a 0.25 Hz interval (61 frequencies in total). Shot and receiver intervals were both 0.02 km. The shot-gather residuals were transformed into a DPW data residual with 281 $\mathbf{p}_s$ and 281 $\mathbf{p}_r$ plane waves, both of which were equally spaced between -0.7 and 0.7 s/km. There were 78961 (281 × 281) traces in the DPW domain. The reference point for the DPW transform is at $x_{ref}$ = 4.6 km.

Figure 1. a) True Marmousi model. b) Smoothed initial model for DPW FWI.

The gradient of the misfit function is computed by stacking the product of each element of the sensitivity kernel with the corresponding DPW data residual, where plane wave Green’s functions are required to construct the sensitivity kernel. In Figure 2, we show two plane wave Green’s functions with different ray-parameters, which were computed using a finite difference method in the frequency domain.

Figure 2. a) and b) are the real parts of plane wave Green’s functions with p = 0.3 s/km and p = 0.2 s/km. All results were computed at 5 Hz.

A gradient equivalent to the one generated by time domain shot-profile FWI can be obtained by stacking the individual gradients for each DPW data residual and stacking all frequency components. Figure 3 shows the gradient obtained with the smoothed initial model, where all 78961 traces of the DPW data residual and all 61 frequency components were stacked.

Figure 3. First gradient generated with the initial model using all DPW data residual and all frequency components.

We carried out a frequency domain FWI experiment using 5 temporal frequencies at 5, 6.5, 8.5, 11.5 and 15 Hz. The frequency selection strategy was proposed by Sirgue and Pratt (2004). The strategy was derived for the 1D case; its application in 2D needs further study. Nevertheless, we found that it worked well in our application. More frequency selection strategies can be found in Brossier et al. (2009). Given the same shot and receiver line lengths, the number of plane wave Green’s functions required to compute a gradient without aliasing varies with frequency. It is relatively small at low frequencies according to Equation (11). In Figure 4, we show several gradient profiles generated using different numbers of plane wave Green’s functions at 5 Hz. In this experiment, 71 plane wave Green’s functions were enough for constructing the gradient without aliasing. Further decreasing the

number deteriorated the gradient. The numbers of plane wave Green’s functions required at 6.5, 8.5, 11.5 and 15 Hz are 84, 110, 141, and 193, respectively.

Figure 4. The first gradients generated at 5 Hz with different numbers of plane wave Green’s functions. a) 281 $\mathbf{p}_s$ and 281 $\mathbf{p}_r$ Green’s functions. b) 71 $\mathbf{p}_s$ and 71 $\mathbf{p}_r$ Green’s functions. c) 36 $\mathbf{p}_s$ and 36 $\mathbf{p}_r$ Green’s functions.

Starting with the initial smooth model, the proposed method was run for each frequency sequentially for 20 iterations. The L-BFGS method was used in the updating process. The inverted result of each frequency was used as the starting model for the next frequency. Inverted results after 5, 8.5 and 15 Hz are shown in Figure 5. Most features in the model are successfully recovered; increasing the number of iterations and adding more frequencies to the inversion can further improve the result.

Figure 5. Inverted results after a) 5 Hz, b) 8.5 Hz and c) 15 Hz, respectively.

CONCLUSIONS

We propose implementing FWI in the frequency domain using compacted DPW datasets. Frequency domain plane wave Green’s functions are constructed to implement the SI method, where the gradient of the misfit function can be computed with high efficiency. The number of plane wave Green’s functions required for computing a gradient without aliasing at low frequencies is generally small. This number will increase with higher frequencies and larger source and receiver line lengths. This number, however, will not change with increasing data volume or increasing model size, given a frequency and the source and receiver line lengths. The sensitivity kernel can be constructed without any extra overhead, which facilitates the sensitivity analysis.

ACKNOWLEDGEMENTS

We thank Siwei Li, Junzhe Sun and Zhiguang Xue for helpful discussions and suggestions.

EDITED REFERENCES
REFERENCES
Ben-Hadj-Ali, H., S. Operto, and J. Virieux, 2011, An efficient frequency-domain full waveform
inversion method using simultaneous encoded sources: Geophysics, 76, no. 4, R109–R124,
http://dx.doi.org/10.1190/1.3581357.
Brossier, R., S. Operto, and J. Virieux, 2009, Seismic imaging of complex onshore structures by 2D
elastic frequency-domain full-waveform inversion: Geophysics, 74, no. 6, WCC105–WCC118,
http://dx.doi.org/10.1190/1.3215771.
Bunks, C., F. M. Saleck, S. Zaleski, and G. Chavent, 1995, Multiscale seismic waveform inversion:
Geophysics, 60, 1457–1473, http://dx.doi.org/10.1190/1.1443880.
Chen, P., T. H. Jordan, and L. Zhao, 2007, Full three-dimensional tomography: A comparison between
the scattering-integral and adjoint-wavefield methods: Geophysical Journal International, 170,
175–181, http://dx.doi.org/10.1111/j.1365-246X.2007.03429.x.
Fokkema, J. T., and P. M. van den Berg, 1992, Reflector imaging: Geophysical Journal International, 110,
191–200, http://dx.doi.org/10.1111/j.1365-246X.1992.tb00721.x.
Herrmann, F. J., Y. A. Erlangga, and T. T. Y. Lin, 2009, Compressive simultaneous full-waveform
simulation: Geophysics, 74, no. 4, A35–A40, http://dx.doi.org/10.1190/1.3115122.
Mora, P., 1987, Nonlinear two-dimensional elastic inversion of multioffset seismic data: Geophysics, 52,
1211–1228, http://dx.doi.org/10.1190/1.1442384.
Mora, P., 1988, Elastic wave-field inversion of reflection and transmission data: Geophysics, 53, 750–
759, http://dx.doi.org/10.1190/1.1442510.
Plessix, R. E., 2006, A review of the adjoint-state method for computing the gradient of a functional with
Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain: Part 1, theory and verification in
a physical scale model: Geophysics, 64, 888–901, http://dx.doi.org/10.1190/1.1444597.
Pratt, R. G., C. Shin, and G. J. Hick, 1998, Gauss–newton and full newton methods in frequency–space
Sen, M. K., and L. N. Frazer, 1991, Multifold phase space path integral synthetic seismograms:
Geophysical Journal International, 104, 479–487, http://dx.doi.org/10.1111/j.1365-
246X.1991.tb05695.x.
temporal frequencies: Geophysics, 69, 231–248, http://dx.doi.org/10.1190/1.1649391.
Stoffa, P. L., M. K. Sen, R. K. Seifoullaev, R. C. Pestana, and J. T. Fokkema, 2006, Plane-wave depth
migration: Geophysics, 71, no. 6, S261–S272, http://dx.doi.org/10.1190/1.2357832.
Tao, Y., and M. K. Sen, 2013a, Frequency-domain full waveform inversion with a scattering-integral
approach and its sensitivity analysis: Journal of Geophysics and Engineering, 10, 065008.
Tao, Y., and M. K. Sen, 2013b, Frequency-domain full waveform inversion with plane-wave data:
1259–1266, http://dx.doi.org/10.1190/1.1441754.
Tromp, J., C. Tape, and Q. Y. Liu, 2005, Seismic tomography, adjoint methods, time reversal and banana-
doughnut kernels: Geophysical Journal International, 160, 195–216,
http://dx.doi.org/10.1111/J.1365-246x.2004.02453.X.
Vigh, D., and E. W. Starr, 2008, 3D prestack plane-wave, full-waveform inversion: Geophysics, 73, no. 5,
VE135–VE144, http://dx.doi.org/10.1190/1.2952623.
Xue, Z., N. Alger, and S. Fomel, 2016, Full-waveform inversion using smoothing kernels: 86th Annual
http://dx.doi.org/10.1190/segam2016-13948739.1.
Zhang, Y., J. Sun, C. Notfors, S. H. Gray, L. Chernis, and J. Young, 2005, Delayed-shot 3D depth
migration: Geophysics, 70, no. 5, E21–E28, http://dx.doi.org/10.1190/1.2057980.
Zhao, Z., 2017, Sensitivity kernel for double plane wave full waveform inversion: 87th Annual
International Meeting, SEG, Expanded Abstracts, submitted.
Zhao, Z., M. K. Sen, and P. L. Stoffa, 2015a, Plane wave reverse time migration in VTI media using
Green’s function: 85th Annual International Meeting, SEG, Expanded Abstracts, 4002–4007,
Zhao, Z., M. K. Sen, and P. L. Stoffa, 2016, Double-plane-wave reverse time migration in the frequency
domain: Geophysics, 81, no. 5, S367–S382, http://dx.doi.org/10.1190/geo2015-0687.1.
Zhao, Z., M. K. Sen, and P. L. Stoffa, 2017, Double plane-wave reverse-time migration: Geophysical
Prospecting, http://dx.doi.org/10.1111/1365-2478.12507.
Zhao, Z., M. K. Sen, P. L. Stoffa, and H. Zhu, 2015b, Double plane wave least squares reverse time
migration: 85th Annual International Meeting, SEG, Expanded Abstracts, 4170–4174,

Frequency-domain full waveform inversion with updates based on nonlinear sensitivities
Yu Geng*, Wenyong Pan and Kristopher A. Innanen, CREWES project, Dept. of Geoscience, University of
Calgary
SUMMARY
Although full waveform inversion (FWI) is a highly nonlinear inverse problem, it is usually solved as a local optimization problem under a linear approximation, where small angle backscattered data (via the residuals) are treated linearly. Adding nonlinearity within each update may have important consequences for convergence rates and parameter accuracy. One approach is to include higher-order scattering terms into the sensitivities during the construction of the gradient, by varying not the current but the updated model at each iteration. By applying inverse scattering theory, this additional sensitivity term can be computed from the data residuals at the current iteration. A nonlinear frequency-domain FWI inversion scheme, with an inner and an outer loop, implementing this idea is presented here. A perturbation is inverted from the data residuals within the inner loop, and the descent direction based on the nonlinear sensitivity to update the model is computed involving this perturbation in the outer loop. We test this nonlinear FWI on acoustic single-parameter Marmousi synthetics. The inverted results vary depending on data frequency ranges and initial models, but we conclude that the nonlinear FWI has the capability to generate high resolution model estimates in both shallow and deep regions, and to converge rapidly, relative to a benchmark FWI approach involving the standard gradient.

INTRODUCTION
Although FWI (Lailly, 1983, Tarantola, 1984, Virieux and Operto, 2009) is a highly nonlinear inverse problem, it is usually solved as a local optimization problem in the framework of the Born approximation. The fundamentally nonlinear data-model relationship is accounted for mainly through iteration and updating of linearized relationships. Linearization error in seismic inversion can be significant in many cases (Wu and Zheng, 2014), especially when large-contrast and spatially sustained perturbations are involved. The former situation prevails when large-angle reflection information is being considered, which, in reflection configurations, is the regime in which many high-priority elastic and/or anisotropic parameters are distinguished (Innanen, 2015). The Fréchet derivative, or sensitivity, restricts the resolution of FWI and could cause a strong deficit of low wavenumber components in the updated model due to the lack of large-aperture illumination and low frequency information. Furthermore, cycle-skipping artifacts occur when the Born approximation is no longer valid, leading the optimization to a local minimum (Virieux and Operto, 2009).

To establish a better-posed reconstruction of low wavenumber model components, and help FWI to converge to a global minimum, several approaches can be employed, e.g., by retrieving the low frequency information in the data as in Laplace-domain and Laplace-Fourier-domain inversion (Shin and Cha, 2008, 2009), envelope inversion (Wu, Luo, and Wu, 2014); by building the low wavenumber background model through reflection information alone (Xu et al., 2012, Brossier, Operto, and Virieux, 2015), or, together with refractions (Wang et al., 2015, Zhou et al., 2015); and by building the background model and perturbation simultaneously, in the data domain or mixed data/image domain (Sun and Symes, 2012, Albertin, Shan, and Washbourne, 2013, Biondi and Almomin, 2014, Wu and Alkhalifah, 2015, Alkhalifah and Wu, 2016).

Mitigation of linearization errors within each iteration could have a significant impact on convergence rates and inversion results. Using nonlinear sensitivities, e.g., in resistivity inversion (Mcgillivray and Oldenburg, 1990), optical imaging (Kwon and Yazici, 2010) and seismic inversion (Wu and Zheng, 2014, Innanen, 2015) has been considered. Here, we consider an extension of frequency domain FWI (e.g., Sirgue and Pratt, 2004) to incorporate nonlinear sensitivities, building on the procedure introduced by Innanen (2015). A two-loop inversion scheme is employed, in which, in the inner iterations, a perturbation is determined from the data residuals using linear inversion, and this is used, in the outer iterations, to determine a descent direction which in principle anticipates some of the curvature of the objective function caused by data-model nonlinearity. In wave physics terms, the introduction of higher order sensitivities involves transmission wave paths from each scattering point to both sources and receivers at the surface. The application on the Marmousi model shows that this nonlinear FWI converges more rapidly than does FWI with a conventional gradient, and appears to be adept at reconstructing low wavenumber components even absent rich low frequencies.

THEORY

Nonlinear sensitivity

In this paper, the space/frequency-domain isotropic acoustic wave equation with constant density is used to describe wave motion:

$$\left[\omega^2 s(r) + \nabla^2\right] G(r, r_s, \omega) = \delta(r - r_s), \qquad (1)$$
where $\omega$ is the frequency, $s(r)$ is the squared slowness, $r_s$ is the source location, and $G(r, r_s, \omega)$ is the Green’s function. For simplicity, we hereafter omit $\omega$ in all wave expressions, e.g., we use $G(r, r_s)$ rather than $G(r, r_s, \omega)$.

At the nth iteration, FWI seeks to minimize the misfit function

$$\Phi(s) = \frac{1}{2} \sum_{r_s} \sum_{r_g} \left\| \delta d(s_n) \right\|^2, \qquad (2)$$

which is the L2 norm of the data residual $\delta d(s_n)$, which is the difference between the recorded data $d(r_g, r_s)$ at receiver $r_g$ and modeled data $P(r_g, r_s | s_n) = F G(r_g, r_s | s_n)$, where $F$ is the source spectrum. It does so by iteratively updating the current model $s_n(r)$ with a perturbation $\delta s_n(r)$ along some appropriate descent direction with a certain step length $\mu_n$. This is assumed to converge to the true velocity model $s(r)$.

Although the descent direction is different for different optimization methods, the gradient of the misfit function is always involved in the calculation. At a given frequency, the gradient is

$$g_n(r) = -\sum_{r_s, r_g} \mathrm{Re}\left\{ \omega^2 F \frac{\partial G(r_g, r_s | s_n)}{\partial s(r)} \delta d^*(r_g, r_s | s_n) \right\}, \qquad (3)$$

where $*$ stands for the complex conjugate, and $\partial G / \partial s$ is the sensitivity determined by varying the field $G(r_g, r_s | s_n)$ in the current medium iterate $s_n(r)$:

$$\frac{\partial G(r_g, r_s | s_n)}{\partial s(r)} = -\omega^2 G(r_g, r | s_n) G(r, r_s | s_n). \qquad (4)$$

Equation (4) reflects the “true” sensitivity of the medium only to the extent that $s_n(r)$ reflects the true medium properties. Prior to convergence, a different sensitivity, in fact one which implicitly or explicitly depends on the residuals, is expected if we instead vary the field in the medium iterate under construction, $s_{n+1}(r)$. At the (n+1)th iteration, this updated model is $s_{n+1}(r) = s_n(r) + \delta s_n(r)$. Varying the medium by a $\delta s$ localized at $r$ in $s_{n+1}(r)$, the perturbation caused by this added variation $\delta s$ and $\delta s_n(r)$ can be written (Innanen, 2015)

$$\delta G(r_g, r_s | s_{n+1}, \delta s) = -\delta s\, \omega^2 \left[ G(r_g, r | s_n) G(r, r_s | s_n) + \delta G(r_g, r | s_n, \delta s_n) G(r, r_s | s_n) + G(r_g, r | s_n) \delta G(r, r_s | s_n, \delta s_n) \right] + \ldots, \qquad (5)$$

where the first term can be interpreted as a scattering process involving one interaction with the variation $\delta s$ only, and $\delta G(r, r_s | s_n, \delta s_n)$ and $\delta G(r_g, r | s_n, \delta s_n)$ are the scattered wavefields related to $\delta s_n(r)$,

$$\delta G(r, r_s | s_n, \delta s_n) = -\omega^2 \int dr'\, G(r, r' | s_n)\, \delta s_n(r')\, G(r', r_s | s_{n+1}),$$

$$\delta G(r_g, r | s_n, \delta s_n) = -\omega^2 \int dr'\, G(r_g, r' | s_n)\, \delta s_n(r')\, G(r', r | s_{n+1}),$$

which includes multiple interactions with $\delta s_n(r)$ and one interaction with $\delta s$ in the remaining terms in equation (5). The resulting sensitivity is

$$\frac{\partial G(r_g, r_s | s_{n+1})}{\partial s(r)} = \left[ \frac{\partial G(r_g, r_s | s_{n+1})}{\partial s(r)} \right]_0 + \left[ \frac{\partial G(r_g, r_s | s_{n+1})}{\partial s(r)} \right]_1 + \ldots, \qquad (6)$$

where the index refers to the order of the term in the perturbation $\delta s_n(r)$,

$$\left[ \frac{\partial G(r_g, r_s | s_{n+1})}{\partial s(r)} \right]_0 = -\omega^2 G(r_g, r | s_n) G(r, r_s | s_n), \qquad (7)$$

$$\left[ \frac{\partial G(r_g, r_s | s_{n+1})}{\partial s(r)} \right]_1 + \left[ \frac{\partial G(r_g, r_s | s_{n+1})}{\partial s(r)} \right]_2 + \ldots = -\omega^2 \left[ \delta G(r_g, r | s_n, \delta s_n) G(r, r_s | s_n) + G(r_g, r | s_n) \delta G(r, r_s | s_n, \delta s_n) \right]. \qquad (8)$$

When $\delta s_n(r)$ is zero, the higher order terms (8) vanish, and the sensitivity reduces to the conventional form in (4).

Calculation of $\delta s_n(r)$ before the (n+1)th iteration

All of the Green’s functions used in the new sensitivity in (6) are still calculated in the nth model $s_n(r)$. However, to calculate the higher order terms (8) in the new sensitivities, $\delta s_n(r)$ is required. This quantity is unknown, but it has a relationship with the nth data residual. Innanen (2015) showed the perturbation $\delta s_n(r)$ can be expressed with a series related to the nth data residual $\delta d(s_n)$, according to inverse scattering theory,

$$\delta d(s_n) = -F \omega^2 \int dr'\, G(r_g, r' | s_n)\, \delta s_n(r')\, G(r', r_s | s_n) + \ldots. \qquad (9)$$

Instead of using standard inverse scattering series techniques (Weglein et al., 2003), taking only the first order term, the perturbation $\delta s_n(r)$ can be approximated by the solution $\delta s(r)$ of a further, second data fitting scheme in a least squares sense, by minimizing

$$\Phi(\delta s) = \frac{1}{2} \sum_{r_s} \sum_{r_g} \left\| \delta d(s_n) - \delta P(r_g, r_s | s_n) \right\|^2, \qquad (10)$$

where $\delta P(r_g, r_s | s_n)$ is the scattered data obtained by sampling the scattered wavefield $\delta P(r, r_s | s_n)$ at receiver $r_g$, which is found by solving the wave equation with a virtual secondary source $-\omega^2 \delta s(r') P(r', r_s | s_n)$:

$$\left[\omega^2 s_n(r) + \nabla^2\right] \delta P(r, r_s | s_n) = -\omega^2 \delta s(r') P(r', r_s | s_n). \qquad (11)$$

Gradient with higher-order sensitivities

Substituting the perturbation $\delta s(r)$ into the nonlinear sensitivity (6), the nth model update for $s_n(r)$ is
$$g_n(r) = \sum_{r_s, r_g} \mathrm{Re}\left\{ \omega^2 \delta d^*(s_n) \left[ G(r_g, r | s_n) P(r, r_s | s_n) + \delta P(r_g, r | s_n) P(r, r_s | s_n) + G(r_g, r | s_n) \delta P(r, r_s | s_n) \right] \right\}, \qquad (12)$$

where $\delta P(r, r_s | s_n)$ is as in (11) and $\delta P(r_g, r | s_n)$ is the scattered wavefield obtained by solving the wave equation with virtual secondary source $-\omega^2 \delta s(r') G(r_g, r' | s_n)$. In this approximation, only up to the 1st order term in (6) is included. The perturbation $\delta s(r)$ can also lead to a direct update using the perturbation without line searching as

$$s_{n+1}(r) = s_n(r) + \delta s(r). \qquad (13)$$

By exchanging $\delta d(s_n)$ with $\delta d(s_{n+1})$ in (12), we obtain the gradient for updating $s_{n+1}(r)$ with the higher order sensitivities in the model $s_{n+1}(r)$,

$$g_{n+1}(r) = \sum_{r_s, r_g} \mathrm{Re}\left\{ \omega^2 \delta d^*(s_{n+1}) \left[ G(r_g, r | s_n) P(r, r_s | s_n) + \delta P(r_g, r | s_{n+1}) P(r, r_s | s_n) + G(r_g, r | s_n) \delta P(r, r_s | s_{n+1}) \right] \right\}$$
$$\phantom{g_{n+1}(r)} = \sum_{r_s, r_g} \mathrm{Re}\left\{ \omega^2 \delta d^*(s_{n+1}) \left[ G(r_g, r | s_{n+1}) P(r, r_s | s_{n+1}) - \delta P(r_g, r | s_{n+1}) \delta P(r, r_s | s_{n+1}) \right] \right\}. \qquad (14)$$

EXAMPLES
The Marmousi model is used to test the use of the nonlinear gradients (12) and (14), hereafter FOFWI and NFWI respectively. FWI solved with conventional gradients, by steepest descent (SD) and truncated Gauss-Newton (GN) methods (Metivier et al., 2013, Pan, Innanen, and Liao, 2017), will be used for comparison. Two initial models and different frequency ranges are used to test convergence and resolution characteristics of the method. Synthetic data are generated in the frequency domain with 461 receivers and 46 sources along the surface. The true model is shown in Figure 1a, which has a 500m added water layer on top of the classical Marmousi model.

The first initial model is obtained from the true velocity using a Gaussian smoother (Figure 1b). With 3 frequencies (4Hz, 6.6Hz, 14.9Hz) starting at 4Hz, 10 iterations per frequency are used in both conventional and nonlinear FWI (updating only the velocity under the added water layer). The SD FWI result is shown in Figure 1c. 5 inner iterations are used for the GN FWI method (Figure 1d) and in FOFWI (Figure 1e) and NFWI (Figure 1f). For the first frequency, Figure 2a shows the residual norm vs. number of iterations, and Figure 2b shows the model error norm vs. number of iterations, with the red, cyan, blue, and blue plus lines standing for the FWI, GN FWI, NFWI and FOFWI results, respectively. FWI with the nonlinear gradient converges more rapidly than conventional SD FWI. Because of the lack of low frequencies, GN FWI performs similarly to SD FWI, but the convergence of both FOFWI and NFWI outperforms SD FWI; the NFWI result is particularly close to the true model.

Figure 1: a) True Marmousi velocity model; b) initial velocity model. Inversion results with conventional gradient c) using SD and d) GN method, e) FOFWI and f) NFWI with 10 iterations for 3 frequencies starting from 4Hz to 15Hz.

Next, we use a linear initial model (Figure 3a) to test NFWI. 5 frequencies (2Hz, 3.3Hz, 5.5Hz, 9Hz and 14.9Hz) are used, with 10 iterations per frequency. Figure 3b-c show the SD and GN FWI results respectively, and Figure 3d shows the NFWI result (5 inner iterations are used for both GN FWI and NFWI). Profiles along x = 4km and x = 6km are shown in Figure 4, with the red, cyan and blue lines standing for the FWI, GN FWI and NFWI results, respectively. For the first frequency, Figure 5a shows the residual norm vs. number of iterations, and Figure 5b shows the model error norm vs. number of iterations. We observe that, for the given number of iterations, with frequency as low as 2Hz, SD FWI can only reconstruct rough model structures, while GN FWI can help converge much faster towards the true model. But NFWI outperforms both, especially in the deeper regions.
Figure 2: a) Norm of data residual vector vs. number of iterations; b) relative model least-squares error vs. number of iterations.
Figure 4: Velocity profiles along a) x = 4km and b) x = 6km.
Figure 5: a) Norm of data residual vector vs. number of iterations; b) relative model least-squares error vs. number of iterations.
CONCLUSIONS
In this study, by computing both zeroth and higher order sensitivity terms, we have constructed a two-loop nonlinear frequency domain FWI. In this approach, at each frequency, a linear inversion is used to obtain a rough perturbation model from the current data residual before calculating the nonlinear FWI gradient. The resulting nonlinear FWI appears to provide accurate and well-resolved inversion results. Tests on the Marmousi model with different initial models and frequency bands illustrate the convergence characteristics of the nonlinear FWI.
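
The two-loop structure summarized above can be sketched on a toy problem. The Python below is a minimal illustration under stated assumptions, not the authors’ implementation: the forward model d_i = exp(-s_i) stands in for the acoustic wave equation, the function names (forward, sensitivity, inner_loop, outer_loop) are hypothetical, the inner loop is a simple relaxed iteration, a backtracking line search is assumed in the outer loop, and the sensitivity is evaluated exactly in the anticipated model s + ds rather than through the first-order scattering terms used in the paper.

```python
import math


def forward(s):
    """Toy forward model (assumed): one datum per parameter, d_i = exp(-s_i)."""
    return [math.exp(-si) for si in s]


def sensitivity(s):
    """Derivative of each datum with respect to its own parameter."""
    return [-math.exp(-si) for si in s]


def misfit(s, d_obs):
    """Least-squares misfit, 0.5 * sum of squared residuals."""
    return 0.5 * sum((do - dm) ** 2 for do, dm in zip(d_obs, forward(s)))


def inner_loop(s, residual, n_inner=5, relax=0.5):
    """Inner loop: rough linear inversion of the residual for a perturbation ds."""
    jac = sensitivity(s)
    ds = [0.0] * len(s)
    for _ in range(n_inner):
        # Relaxed update toward the linearized least-squares solution residual / jac.
        ds = [dsi + relax * (r / j - dsi) for dsi, j, r in zip(ds, jac, residual)]
    return ds


def outer_loop(s0, d_obs, n_outer=10):
    """Outer loop: gradient built from the sensitivity in the anticipated model."""
    s = list(s0)
    history = [misfit(s, d_obs)]
    for _ in range(n_outer):
        residual = [do - dm for do, dm in zip(d_obs, forward(s))]
        ds = inner_loop(s, residual)
        # Sensitivity evaluated at s + ds rather than at the current iterate s.
        jac_nl = sensitivity([si + dsi for si, dsi in zip(s, ds)])
        grad = [-j * r for j, r in zip(jac_nl, residual)]
        # Backtracking line search: accept the first step that lowers the misfit.
        step = 1.0
        while step > 1e-8:
            trial = [si - step * gi for si, gi in zip(s, grad)]
            if misfit(trial, d_obs) < history[-1]:
                s = trial
                break
            step *= 0.5
        history.append(misfit(s, d_obs))
    return s, history


s_true = [0.5, 1.0, 1.5]
d_obs = forward(s_true)
s_est, history = outer_loop([1.0, 1.0, 1.0], d_obs)
print(f"misfit: {history[0]:.3e} -> {history[-1]:.3e}")
```

Because each datum in this toy depends on a single parameter, the sensitivity at s + ds has the same sign as the conventional one, so the line search always finds a descent step; that guarantee is a property of the toy and is not claimed for the acoustic case.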
ACKNOWLEDGEMENTS
We thank the sponsors of CREWES for support. This work was funded by CREWES and NSERC (Natural Sciences and Engineering Research Council of Canada) through the grant CRDPJ 379744-08.
Figure 3: a) Initial velocity model; inversion results with conventional gradient by b) SD and c) GN, and d) NFWI with 10 iterations for 5 frequencies starting from 2Hz to 15Hz.

EDITED REFERENCES
REFERENCES
Albertin, U., G. Shan, and J. Washbourne, 2013, Gradient orthogonalization in adjoint scattering-series
inversion: 75th Annual International Conference and Exhibition, EAGE, Extended Abstracts,
Alkhalifah, T., and Z. D. Wu, 2016, The natural combination of full and image-based waveform
inversion: Geophysical Prospecting, 64, 19–30, http://doi.org/10.1111/1365-2478.12264.
Biondi, B., and A. Almomin, 2014, Simultaneous inversion of full data bandwidth by tomographic full-
waveform inversion: Geophysics, 79, no. 3, WA129–WA140, https://doi.org/10.1190/geo2013-
0340.1.
Brossier, R., S. Operto, and J. Virieux, 2015, Velocity model building from seismic reflection data by
full-waveform inversion: Geophysical Prospecting, 63, 354–367, http://doi.org/10.1111/1365-
2478.12190.
Innanen, K. A., 2015, Full waveform inversion updating in the presence of high angle/high contrast
reflectivity: 85th Annual International Meeting, SEG, Expanded Abstracts, 1314–1319,
Kwon, K., and B. Yazici, 2010, Born expansion and Frechet derivatives in nonlinear diffuse optical
tomography: Computers & Mathematics with Applications, 59, 3377–3397,
https://doi.org/10.1016/j.camwa.2009.07.088.
Inverse Scattering, Theory and Application, Society of Industrial and Applied Mathematics,
Expanded Abstracts, 206–220.
Mcgillivray, P. R., and D. W. Oldenburg, 1990, Methods for calculating Frechet derivatives and
sensitivities for the nonlinear inverse problem — a comparative-study: Geophysical Prospecting,
38, 499–524, http://doi.org/10.1111/j.1365-2478.1990.tb01859.x.
Metivier, L., R. Brossier, J. Virieux, and S. Operto, 2013, Full waveform inversion and the truncated
Newton method: SIAM Journal on Scientific Computing, 35, B401–B437,
https://doi.org/10.1137/120877854.
Pan, W. Y., K. A. Innanen, and W. Y. Liao, 2017, Accelerating Hessian-free Gauss-Newton full-
waveform inversion via l-BFGS preconditioned conjugate-gradient algorithm: Geophysics, 82,
no. 2, R49–R64, https://doi.org/10.1190/geo2015-0595.1.
Shin, C., and Y. H. Cha, 2008, Waveform inversion in the Laplace domain: Geophysical Journal
International, 173, 922–931, http://doi.org/10.1111/j.1365-246X.2008.03768.x.
Shin, C., and Y. H. Cha, 2009, Waveform inversion in the Laplace-Fourier domain: Geophysical Journal
International, 177, 1067–1079, http://doi.org/10.1111/j.1365-246X.2009.04102.x.
Sun, D., and W. W. Symes, 2012, Waveform inversion via non-linear differential semblance
optimization: 82nd Annual International Meeting, SEG, Expanded Abstracts, 1–7,
Tarantola, A., 1984, Inversion of seismic-reflection data in the acoustic approximation: Geophysics, 49,
1259–1266, https://doi.org/10.1190/1.1441754.
Wang, H. Y., S. C. Singh, F. Audebert, and H. Calandra, 2015, Inversion of seismic refraction and
reflection data for building long-wavelength velocity models: Geophysics, 80, no. 2, R81–R93,
https://doi.org/10.1190/geo2014-0174.1.
Weglein, A. B., F. V. Araujo, P. M. Carvalho, R. H. Stolt, K. H. Matson, R. T. Coates, D. Corrigan, D. J.
Foster, S. A. Shaw, and H. Y. Zhang, 2003, Inverse scattering series and seismic exploration:
Inverse Problems, 19, R27–R83, https://doi.org/10.1088/0266-5611/19/6/R01.

Wu, R. S., J. R. Luo, and B. Y. Wu, 2014, Seismic envelope inversion and modulation signal model:
Geophysics, 79, no. 3, WA13–WA24, https://doi.org/10.1190/geo2013-0294.1.
Wu, R. S., and Y. C. Zheng, 2014, Non-linear partial derivative and its De Wolf approximation for non-
linear seismic inversion: Geophysical Journal International, 196, 1827–1843,
Wu, Z. D., and T. Alkhalifah, 2015, Simultaneous inversion of the background velocity and the
perturbation in full-waveform inversion: Geophysics, 80, no. 6, R317–R329,
https://doi.org/10.1190/geo2014-0365.1.
Xu, S., D. Wang, F. Chen, G. Lambaré, and Y. Zhang, 2012, Inversion on reflected seismic wave: 82nd
Zhou, W., R. Brossier, S. Operto, and J. Virieux, 2015, Full waveform inversion of diving & reflected
waves for velocity model building with impedance inversion based on scale separation:
Geophysical Journal International, 202, 1535–1554, https://doi.org/10.1093/gji/ggv228.

Seismogram registration via Markov chain Monte Carlo optimization and its applications in full waveform inversion
Hejun Zhu, The University of Texas at Dallas
SUMMARY
Cycle skipping is a serious issue in full waveform inversion (FWI) since it leads to local minima. To date, most FWI algorithms depend on local gradient based optimization approaches, which cannot guarantee convergence towards the global minimum if the misfit function involves local minima and the starting model is far away from the true solution. In this study, I propose a misfit function based on non-stationary time warping functions, which can be calculated by solving a seismogram registration problem. Considering the inherent cycle skipping and local minima issues of the registration problem, I use a Markov chain Monte Carlo method to solve it. An a priori constraint on the sparsity of the local warping functions is incorporated to eliminate unreasonable solutions. No window selections are required in this procedure. Several numerical examples demonstrate that the proposed misfit function allows us to tackle the cycle skipping problem and construct accurate long-wavelength velocity models even without low frequency data and good starting models.

INTRODUCTION
How to effectively and accurately measure differences between two seismograms is a general problem in seismology. Least-squares waveform difference is one way to quantify the differences (Tarantola, 1984; Pratt, 1999; Shin and Cha, 1998). However, if travel time differences between two seismograms are greater than half a period of the signals, the least-squares waveform difference suffers from the cycle skipping problem. Cycle skipping is a serious issue in FWI (Virieux and Operto, 2009) since it usually leads to local minima in misfit functions. To date, most inversion algorithms used in FWI depend on local gradient based optimization approaches. These methods only enable us to search local minima around the initial model. Therefore, if the starting model is far away from the global minimum, they cannot guarantee the convergence of FWI towards the true solution.

One way to tackle the cycle skipping problem is to design well-behaved misfit functions in FWI. A good misfit function should be convex, avoid local minima, involve broad basins of attraction, and be insensitive to the selection of frequency bands. To date, there have been numerous misfit functions proposed to tackle the cycle skipping problem in FWI. For instance, Luo and Schuster (1991) and van Leeuwen and Mulder (2010) proposed misfit functions based on cross correlation functions. This type of misfit function focuses on phase differences between observed and predicted seismograms, which are more linear with respect to velocity perturbations in comparison with waveform differences. To consider non-stationary travel time differences, either local cross correlation (Hale, 2006; Diaz and Sava, 2017) or window selection strategies (Maggi et al., 2009) can be used. Ma and Hale (2013) used dynamic warping to measure non-stationary travel time differences, which were applied in reflection travel time inversion. Fichtner et al. (2008) and Wu et al. (2014) measured the differences of seismograms in the time-frequency domain. Either phase or envelope functions have been used to design misfit functions. Warner and Guasch (2016) proposed adaptive waveform inversion, which uses a stationary Wiener filter to quantify differences between observed and predicted seismograms. It has been applied to experiments without low frequency data and good starting models. Zhu and Fomel (2016) extended this idea to non-stationary matching filters. It has been used to build good starting models for waveform inversion. Engquist and Froese (2014) first demonstrated the potential applications of the Wasserstein metric in seismology, which is successively applied in waveform inversion recently (Metivier et al., 2016a,b; Yang et al., 2017).

Baek et al. (2013) considered the measurement procedure as a seismogram registration problem (Fomel and Jin, 2009). By solving a least-squares optimization problem, they were able to extract non-stationary time and amplitude discrepancies between two seismograms. However, as noticed in their paper, similar to FWI, this optimization problem inherently suffers from the cycle skipping and local minima issues. They have to use a similar frequency continuation, multi-scale strategy and low frequency amplified signals to obtain correct measurements. This strategy is time consuming and prone to producing wrong phase/amplitude measurements due to the presence of local minima. In this study, I propose to use a Markov chain Monte Carlo (MCMC) approach (Mosegaard and Tarantola, 1995) to solve the seismogram registration problem. This global optimization method allows us to directly sample the global minimum and avoid being trapped in local minima (Rothman, 1986, 1985; Sen and Stoffa, 1991, 2013). Then, a misfit function for velocity model building is designed based on the inverted, non-stationary time warping functions. This is similar to the procedure in Luo and Schuster (1991) and Ma and Hale (2013). The adjoint source of the proposed misfit function can be derived based on the idea of connectivity functions proposed by Luo and Schuster (1991).

THEORY
Given observation d(t) and prediction p(t), our goal is to measure their travel time differences. Baek et al. (2013) formulated this as a seismogram registration problem and designed a misfit function to quantify least-squares waveform differences between data and time/amplitude warped prediction. Considering the difficulties of working with amplitudes, in this study, I only focus on travel time differences between observation and prediction. Similar to Baek et al. (2013), I define a misfit function to measure the least-squares differences between prediction p(t) and time warped observation d(t + w(t)), where
w(t) is the unknown local warping function.

$$c(w) = \frac{1}{2}\int_0^T \left[ d(t + w(t)) - p(t) \right]^2 dt , \tag{1}$$

Local warping of a signal can be implemented by shifting the coordinate of the input signal and then interpolating back to the regular coordinate. In this study, I use cubic spline functions for the interpolation.

Considering the non-uniqueness and nonlinearity of this optimization problem, I add regularization to the misfit function. The choice of regularization scheme depends on the a priori assumptions about the solution; either L2 or L1 regularization can be used. If the a priori assumption is that the solution should be smooth and have finite energy, Tikhonov regularization can be added to the misfit function $c(w)$ as

$$c^{L2}(w) = c(w) + l_1 \int_0^T \left[ \frac{dw(t)}{dt} \right]^2 dt + l_2 \int_0^T [w(t)]^2 dt , \tag{2}$$

where $l_1$ and $l_2$ are the regularization parameters.

In this study, I choose sparsity constraints, with the a priori assumption that the local time warping function should be sparse with respect to some basis functions. Therefore, the misfit function used for the seismogram registration problem is

$$c^{L1}(w) = c(w) + l_3 \int_0^T |w(t)| \, dt , \tag{3}$$

where $l_3$ is the regularization parameter, which can be chosen based on the relative magnitudes of $c(w)$ and $\int_0^T |w(t)| \, dt$.

Considering the potential local minima and cycle skipping problems of the seismogram registration problem (Baek et al., 2013), in this study I choose an MCMC method (Mosegaard and Tarantola, 1995) to minimize the misfit function $c^{L1}(w)$, which allows us to directly sample the global minimum and avoid local minima. The efficiency of the MCMC method depends on the size of the model space. In order to reduce the number of unknowns and impose smoothness on the solution, I parameterize the warping function with cubic spline basis functions $f_k(t)$. Thus, the unknown model parameters are reduced to the coefficients of the spline functions, i.e., $w_k$:

$$w(t) = \sum_{k=1}^{N} f_k(t) \, w_k . \tag{4}$$

Another advantage of the MCMC method is that I can easily impose additional constraints on the local warping function. For instance, similar to Ma and Hale (2013) and Hale (2013), to avoid unrealistically large stretch and squeeze, I set an upper limit for the first derivative of the local warping function:

$$\left| \frac{dw(t)}{dt} \right| \le s . \tag{5}$$

Here I choose s = 1 to avoid 100% stretch and squeeze.

Once I solve the seismogram registration problem, the inverted local time warping function quantifies travel time differences between observations and predictions. I can then define a misfit function based on the local warping function and apply it in the velocity model building procedure. Similar to Luo and Schuster (1991) and Ma and Hale (2013), I choose the following misfit function:

$$J = \frac{1}{2} \sum_{s=1}^{N_s} \sum_{r=1}^{N_r} \int_0^T [w(t)]^2 dt , \tag{6}$$

where $N_s$ and $N_r$ are the numbers of sources and receivers, respectively.

In order to use this misfit function in waveform inversion, I need to compute adjoint sources, which are used to drive adjoint wavefields and construct misfit gradients (Tromp et al., 2005). Based on Plessix (2006), the misfit gradient can be computed as

$$\frac{\partial J}{\partial c(\mathbf{x})} = \sum_{s=1}^{N_s} \sum_{r=1}^{N_r} \int_0^T w(t) \frac{\partial w(t)}{\partial c(\mathbf{x})} dt = \sum_{s=1}^{N_s} \sum_{r=1}^{N_r} \int_0^T w(t) \frac{\partial w(t)}{\partial p(t)} \frac{\partial p(t)}{\partial c(\mathbf{x})} dt , \tag{7}$$

where $c(\mathbf{x})$ is the velocity (distinct from the registration misfit $c(w)$) and I use the chain rule for $\partial w(t) / \partial c(\mathbf{x})$. For $\partial p(t) / \partial c(\mathbf{x})$, I use the sensitivity kernel based on the Born approximation (Tarantola, 1984; Luo and Schuster, 1991):

$$\frac{\partial p(t)}{\partial c(\mathbf{x}')} = \frac{2}{c^3} \, G(\mathbf{x}_r ; \mathbf{x}') * \frac{\partial^2 p(\mathbf{x}' ; \mathbf{x}_s)}{\partial t^2} . \tag{8}$$

Substituting the sensitivity kernel into Equation 7, I can define the adjoint source $f^{\dagger}(t)$ for each pair of shot and receiver as

$$f^{\dagger}(t) = w(t) \frac{\partial w(t)}{\partial p(t)} . \tag{9}$$

Therefore, the key procedure is to compute the derivative of the local warping function with respect to the prediction. Luo and Schuster (1991) provided a connectivity function method to compute this kind of derivative. I introduce a connectivity function $F(t)$ and rewrite the previous expression as

$$f^{\dagger}(t) = -w(t) \, \frac{\partial F(t)}{\partial p(t)} \Big/ \frac{\partial F(t)}{\partial w(t)} . \tag{10}$$

The connectivity function $F(t)$ has to satisfy

$$\frac{\partial F(t)}{\partial w(t)} \neq 0 . \tag{11}$$

Here I choose the following connectivity function:

$$F = \frac{\partial c^{L1}}{\partial w(t)} = \int \left[ d(t + w(t)) - p(t) \right] \dot{d}(t + w(t)) \, dt + l_3 \, \mathrm{sgn}[w(t)] = 0 . \tag{12}$$

This choice is made because, if the local warping function is the correct solution, the derivative of the misfit function $c^{L1}$ with respect to $w(t)$ should be zero. Otherwise, I can modify the local warping function to reduce the derivative. Meanwhile, the second derivative of $c^{L1}$ (that is, $\partial F / \partial w(t)$) is nonzero, which satisfies the requirement in Equation 11.
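The step from Equation 9 to Equation 10 is an implicit differentiation. A short sketch, assuming that F depends smoothly on both w and p and that F = 0 holds before and after a small perturbation of the prediction:

```latex
\begin{align*}
F(w, p) = 0
  \;&\Rightarrow\;
  \delta F = \frac{\partial F}{\partial p}\,\delta p
           + \frac{\partial F}{\partial w}\,\delta w = 0 \\
  \;&\Rightarrow\;
  \frac{\partial w}{\partial p}
  = -\,\frac{\partial F / \partial p}{\partial F / \partial w} ,
  \qquad \text{provided } \frac{\partial F}{\partial w} \neq 0 .
\end{align*}
```

Inserting this ratio into Equation 9 gives Equation 10, and the division is the reason Equation 11 is needed.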
With the definition of the connectivity function, I have

$$\frac{\partial F}{\partial p(t)} = -\dot{d}(t + w(t)) , \tag{13}$$

and

$$\frac{\partial F}{\partial w(t)} = \int \left\{ \left[ d(t + w(t)) - p(t) \right] \ddot{d}(t + w(t)) + \left[ \dot{d}(t + w(t)) \right]^2 \right\} dt . \tag{14}$$

Assuming that, once I have obtained the correct warping function, the warped data approximate the prediction, i.e.,

$$d(t + w(t)) \approx p(t) , \tag{15}$$

I then have

$$\frac{\partial F}{\partial p(t)} = -\dot{p}(t) , \tag{16}$$

and

$$\frac{\partial F}{\partial w(t)} = \int [\dot{p}(t)]^2 dt . \tag{17}$$

Combining the previous expressions, the adjoint source for the misfit function in Equation 6 is

$$f^{\dagger}(t) = w(t) \, \frac{\dot{p}(t)}{\int [\dot{p}(t)]^2 dt} , \tag{18}$$

which is similar to the solution in Luo and Schuster (1991), except that here the measurements are non-stationary.

NUMERICAL EXAMPLES

Figure 1 presents a simple example to illustrate the seismogram registration problem. The input signal is a seismogram computed for the 2D Marmousi model. A Ricker wavelet with a central frequency of 20 Hz is used in this example. I apply a synthetic random time warping function (Figure 1c) to the input signal and obtain the warped signal shown in Figure 1a. Given the input and warped signals, I use the MCMC method to minimize the misfit function $c^{L1}(w)$ and search for the unknown warping function. 30 cubic spline functions are used to parameterize the local warping function. The regularization parameter $l_3$ is chosen as 0.003. 3000 MCMC iterations are applied and the misfit function $c^{L1}(w)$ is gradually reduced (Figure 1d). Figure 1c compares the synthetic and recovered local warping functions and Figure 1b compares the input and unwarped signals. The MCMC optimization enables us to recover this random time warping function and align phases between the two seismograms.

Figure 1: Seismogram registration with a random time warping function. (a) compares an input signal (red) and its time warped version (dashed blue) with a local warping function in (c). (b) compares the input (red) and unwarped (dashed blue) signals. (c) compares synthetic (blue) and recovered (dashed red) local warping functions. (d) presents the evolution of misfit $c^{L1}$.

Next, I use a simple tomography experiment to illustrate the performance of the misfit function J in Equation 6. The true velocity model involves four Gaussian anomalies with alternating signs, and the starting model is homogeneous with a velocity of 4 km/s. The maximum velocity perturbation reaches 12.5%. 11 shots are located at z=0.5 km and 151 receivers are located at z=3.5 km. I use a time domain waveform inversion algorithm and a Ricker wavelet with a central frequency of 15 Hz. For the first experiment, I use a least-squares waveform misfit function with a multi-scale inversion strategy (Bunks et al., 1995; Sirgue and Pratt, 2004). Four frequency groups with low pass filters at 1 Hz, 5 Hz, 10 Hz and 15 Hz are used. 10 conjugate gradient iterations are used in each frequency group. Figure 2a and b compare the true and recovered velocity models. With this multi-scale inversion strategy, I am able to recover these four strong Gaussian anomalies.

Figure 2: A tomography experiment with Gaussian anomalies. (a) presents true velocity model. (b) is a recovered velocity model using a least-squares misfit with a multi-scale inversion strategy. (c) presents a recovered velocity model with a least-squares misfit but no multi-scale inversion strategy. (d) presents a recovered model with the misfit function based on local warping functions (Equation 6).

However, if I do not use the multi-scale inversion strategy and work directly on data with a central frequency of 15 Hz, the recovered velocity model after 10 iterations is as presented in Figure 2c. Next, I switch to the misfit function based on local warping functions measured from seismogram registration. 3000 MCMC iterations are used to solve the registration problem and the regularization parameter $l_3$ is set to 0.5. A Ricker wavelet with a central frequency of 15 Hz is used and no frequency continuation strategy is applied. After 10 conjugate gradient iterations, the recovered velocity model is presented in Figure 2d.
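The measurement step behind these experiments (Equations 3 to 5) and the adjoint source of Equation 18 can be illustrated with a minimal NumPy sketch. It is not the implementation used in the paper. Several pieces are assumptions made for brevity: piecewise-linear hat functions stand in for the cubic spline basis, linear interpolation stands in for cubic spline interpolation, and a plain Metropolis random walk with a hypothetical `temperature` parameter stands in for the MCMC sampler. All function and parameter names are illustrative.

```python
import numpy as np

def ricker(tau, f0=5.0):
    # Ricker wavelet with central frequency f0 (Hz), used only for the demo below.
    a = (np.pi * f0 * tau) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def hat_basis(t, knots):
    # Piecewise-linear basis functions f_k(t); an assumed stand-in for cubic splines.
    eye = np.eye(len(knots))
    return np.stack([np.interp(t, knots, row) for row in eye], axis=1)

def warp(d, t, w):
    # d(t + w(t)) resampled on the regular grid (linear interpolation, an assumption).
    return np.interp(t + w, t, d)

def misfit_l1(w, d, p, t, lam3):
    # Equation 3 with rectangle-rule integration.
    dt = t[1] - t[0]
    data_term = 0.5 * np.sum((warp(d, t, w) - p) ** 2) * dt
    return data_term + lam3 * np.sum(np.abs(w)) * dt

def register(d, p, t, n_knots=8, lam3=1e-3, n_iter=3000,
             step=0.01, temperature=1e-4, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    basis = hat_basis(t, np.linspace(t[0], t[-1], n_knots))
    coef = np.zeros(n_knots)                      # coefficients w_k of Equation 4
    cost = misfit_l1(basis @ coef, d, p, t, lam3)
    best_coef, best_cost = coef.copy(), cost
    for _ in range(n_iter):
        trial = coef.copy()
        trial[rng.integers(n_knots)] += step * rng.standard_normal()
        w = basis @ trial
        if np.max(np.abs(np.gradient(w, t))) > sigma:
            continue                              # Equation 5: reject large stretch or squeeze
        trial_cost = misfit_l1(w, d, p, t, lam3)
        # Metropolis rule: keep improvements, occasionally keep a worse model.
        if trial_cost <= cost or rng.random() < np.exp((cost - trial_cost) / temperature):
            coef, cost = trial, trial_cost
            if cost < best_cost:
                best_coef, best_cost = coef.copy(), cost
    return basis @ best_coef, best_cost

def adjoint_source(w, p, t):
    # Equation 18: f(t) = w(t) * p_dot(t) / integral of p_dot(t)^2.
    p_dot = np.gradient(p, t)
    dt = t[1] - t[0]
    return w * p_dot / (np.sum(p_dot ** 2) * dt)

# Demo: the observation lags the prediction by 0.05 s, so d(t + 0.05) = p(t).
t = np.linspace(0.0, 1.0, 501)
p = ricker(t - 0.5)
d = ricker(t - 0.55)
w_est, final_cost = register(d, p, t)
f_adj = adjoint_source(w_est, p, t)
```

In this sketch the warping function is only constrained where the wavelet has energy; elsewhere the L1 term pulls the coefficients toward zero, which is the sparsity assumption stated for Equation 3.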
Finally, I use the Marmousi2 model as an example to further illustrate the inversion strategy. The starting model (Figure 3b) is a highly smoothed version of the true model in Figure 3a. The velocity perturbation between the true and starting models is presented in Figure 3c. 12 shots with a spacing of 1.2 km and 201 receivers with a spacing of 0.08 km are used. Both shots and receivers are located at 5 m. First, I use a least-squares waveform misfit function with a frequency continuation inversion strategy. Low pass filters with corner frequencies at 1 Hz, 3 Hz and 5 Hz are used. 10 conjugate gradient iterations are used in each frequency group. Figure 4a and b present the recovered velocity and perturbation after the inversion.

Figure 3: Starting and true models for Marmousi2 experiment. (a) and (b) are the true and starting models, respectively. (c) shows velocity perturbation between (a) and (b).

Instead of using a frequency continuation strategy, I next use a Ricker wavelet with a central frequency of 5 Hz and apply a high pass filter to eliminate frequency content below 3 Hz. If I still use a least-squares waveform misfit function and perform 10 conjugate gradient iterations, the recovered velocity and perturbation are as illustrated in Figure 4c and d. The inversion reaches a local minimum instead of the global minimum.

Figure 4: Recovered velocity and perturbation based on a least-squares waveform misfit. (a) and (b) are the recovered velocity and perturbation with a multi-scale inversion strategy. (c) and (d) are results without the multi-scale strategy.

To deal with this cycle skipping problem, I first run 10 iterations with the misfit function defined in Equation 6. The regularization parameter $l_3$ is chosen as 0.05 and 3000 MCMC iterations are used to solve the seismogram registration problem. The recovered velocity and perturbation are shown in Figure 5a and b. Then I switch to the least-squares waveform misfit function and perform 20 additional iterations. The final recovered velocity and perturbation are presented in Figure 5c and d.

Figure 5: Recovered velocity and perturbation based on inversion strategy proposed in this paper. (a) and (b) are the recovered velocity and perturbation after 10 conjugate gradient iterations using a misfit function based on local warping functions. (c) and (d) are results after 20 additional iterations with a least-squares misfit and using (a) as the starting model.

DISCUSSION AND CONCLUSIONS

In this paper, I use a Markov chain Monte Carlo method to solve the seismogram registration problem and measure non-stationary time differences between two seismograms. An a priori assumption of sparsity of the local warping function is built into the registration problem. 1D seismogram registration examples demonstrate the performance of the proposed methods. A misfit function for velocity model building is designed based on the calculated non-stationary time warping functions. Numerical experiments demonstrate that the proposed misfit function enables us to tackle the cycle skipping problem and build accurate long-wavelength models for experiments without low frequency data and good starting models.

ACKNOWLEDGMENTS

I thank the Texas Advanced Computing Center for providing computational resources for this work.

EDITED REFERENCES
REFERENCES
Baek, H., H. Calandra, and L. Demanet, 2013, Velocity estimation via registration-guided least-squares
inversion: Geophysics, 79, no. 2, R79–R89, http://doi.org/10.1190/geo2013-0146.1.
Diaz, E., and P. Sava, 2017, Seismic tomography using local correlation functions: Geophysics, in press.
Engquist, B., and B. Froese, 2014, Application of the Wasserstein metric to seismic signals:
Communications in Mathematical Sciences, 12, no. 5, 979–988,
http://doi.org/10.4310/CMS.2014.v12.n5.a7.
Fichtner, A., B. Kennett, H. Bunge, and H. Igel, 2008, Theoretical background for continent and global
scale full-waveform inversion in the time-frequency domain: Geophysical Journal International,
175, 665–685, http://doi.org/10.1111/gji.2008.175.issue-2.
Fomel, S., and L. Jin, 2009, Time-lapse image registration using the local similarity attribute: Geophysics,
74, no. 2, A7–A11, http://doi.org/10.1190/1.3054136.
Hale, D., 2006, Fast local cross-correlation of images: 76th Annual International Meeting, SEG,
Hale, D., 2013, Dynamic warping of seismic images: Geophysics, 78, no. 2, S105–S115,
http://doi.org/10.1190/geo2012-0327.1.
Luo, Y., and G. Schuster, 1991, Wave-equation traveltime inversion: Geophysics, 56, 645–653,
http://doi.org/10.1190/1.1443081.
Ma, Y., and D. Hale, 2013, Wave-equation reflection traveltime inversion with dynamic warping and full-
waveform inversion: Geophysics, 78, no. 6, R223–R233, http://doi.org/10.1190/geo2013-0004.1.
Maggi, A., C. Tape, M. Chen, D. Chao, and J. Tromp, 2009, An automated time-window selection
algorithm for seismic tomography: Geophysical Journal International, 178, 257–281,
http://doi.org/10.1111/gji.2009.178.issue-1.
Metivier, L., R. Brossier, Q. Merigot, E. Oudet, and J. Virieux, 2016a, Measuring the misfit between
Geophysical Journal International, 205, 345–377, http://doi.org/10.1093/gji/ggw014.
Metivier, L., R. Brossier, Q. Merigot, E. Oudet, and J. Virieux, 2016b, An optimal transport approach for
seismic tomography: Application to 3D full waveform inversion: Inverse Problems, 32, 115008,
http://doi.org/10.1088/0266-5611/32/11/115008.
Mosegaard, K., and A. Tarantola, 1995, Monte Carlo sampling of solutions to inverse problems: Journal
of Geophysical Research, 100, 12431–12447, http://doi.org/10.1029/94JB03097.
Plessix, R., 2006, A review of the adjoint-state method for computing the gradient of a functional with
http://doi.org/10.1111/j.1365-246X.2006.02978.x.
Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, Part 1: Theory and verification
in a physical scale model: Geophysics, 64, 888–901, http://doi.org/10.1190/1.1444597.
Rothman, D., 1985, Nonlinear inversion, statistical mechanics, and residual statics estimation:
Rothman, D., 1986, Automatic estimation of large residual statics corrections: Geophysics, 51, 332–346,
http://doi.org/10.1190/1.1442092.

Sen, M., and P. Stoffa, 1991, Nonlinear one-dimensional seismic waveform inversion using simulated
annealing: Geophysics, 56, 1624–1638, http://doi.org/10.1190/1.1442973.
Sen, M., and P. Stoffa, 2013, Global optimization methods in geophysical inversion: Cambridge
University Press, http://doi.org/10.1017/CBO9780511997570.
Shin, C., and Y. Cha, 2008, Waveform inversion in the Laplace domain: Geophysical Journal
International, 173, 922–931, http://doi.org/10.1111/gji.2008.173.issue-3.

Sirgue, L., and G. Pratt, 2004, Efficient waveform inversion and imaging: A strategy for selecting
1259–1266, http://doi.org/10.1190/1.1441754.
Tromp, J., C. Tape, and Q. Y. Liu, 2005, Seismic tomography, adjoint methods, time reversal and banana-
doughnut kernels: Geophysical Journal International, 160, 195–216,
http://doi.org/10.1111/gji.2005.160.issue-1.
van Leeuwen, T., and W. A. Mulder, 2010, A correlation-based misfit criterion from wave-equation
http://doi.org/10.1111/j.1365-246X.2010.04681.x.
Warner, M., and L. Guasch, 2016, Adaptive waveform inversion: Theory: Geophysics, 81, no. 6, R429–
R445, http://doi.org/10.1190/geo2015-0387.1.
Wu, R., J. Luo, and B. Wu, 2014, Seismic envelope inversion and modulation signal model: Geophysics,
79, no. 3, WA13–WA24, http://doi.org/10.1190/geo2013-0294.1.
Yang, Y., B. Engquist, J. Sun, and B. Froese, 2017, Application of optimal transport and the quadratic
Wasserstein metric to full-waveform inversion: arXiv, 1, 1–22.
Zhu, H., and S. Fomel, 2016, Building good starting models for full waveform inversion based on
adaptive matching filtering misfit: Geophysics, 81, no. 5, U61–U72,
http://doi.org/10.1190/geo2015-0596.1

Balancing Amplitude and Phase in Tomographic Full Waveform Inversion
Ali Almomin, Saudi Aramco, and Biondo Biondi, Stanford University
SUMMARY

We analyze the source of slow convergence of tomographic full waveform inversion and find that it is caused by the unbalanced effects of amplitudes and phase both in the formulation of the regularization term and the enhancing operator. This imbalance results in a strong dependence of the kinematic updates on the amplitude fitting, slowing convergence. To mitigate the problem we propose two modifications to the tomographic inversion. First, we modify the regularization term to focus more on the phase information. Second, we use an alternative enhancing operator that is less sensitive to the amplitudes in the extended model. The modifications reduce the gradient artifacts and allow for explicit control over the amplitudes and phases of the residuals.

INTRODUCTION

Tomographic full waveform inversion (TFWI) (Symes, 2008; Sun and Symes, 2012; Biondi and Almomin, 2012) is a new inversion procedure that retains all the advantages and benefits of FWI while avoiding its strict initial model requirement and cycle-skipping challenges. To achieve this goal, TFWI modifies full waveform inversion (FWI) by combining its classical form with a modified form of wave-equation migration-velocity analysis (WEMVA). This combination manifests itself as an extension of the velocity model through virtual axes (Biondi and Almomin, 2013).

By extending the velocity model with the appropriate axis, the modeling operator can match the observed data regardless of the accuracy of the starting model by using kinematic information from the extended axis even when cycle skipping occurs. We set up the inversion to extract all the necessary information from the virtual axes and smoothly collapse them back into the physical, non-extended form of the model. The resulting method successfully inverts the kinematic and dynamic information of the data with outstanding robustness and accuracy.

While it avoids the cycle-skipping problem, TFWI does come with its own challenges, namely its high computational cost and the large number of iterations required (Almomin and Biondi, 2013). The conventional solution in the framework of FWI is to only match the phase using a single frequency per iteration (Pratt, 1999; Shin and Ha, 2008). However, not using amplitudes prevents the simultaneous inversion of scales, resulting in reduced accuracy of the solution. Another alternative is to modify the gradient calculation to reduce some “kinematic” artifacts (Fei and Williamson, 2010; Shen and Symes, 2015). These methods are suitable for image-domain velocity analysis methods, e.g. WEMVA, but cannot be directly applied in TFWI due to the explicit calculation of the nonlinear modeling operator and residuals in the data space.

To improve the convergence of TFWI, we examine the relationship between the data-fitting term and the model regularization term and analyze the cause of the convergence behavior. Then, we propose two modifications to TFWI that both reduce the slow convergence and allow for direct control of the ratio between amplitude and phase effects in the inversion. These modifications are consistent in the framework of TFWI and allow for a proper calculation of the gradient in the data space. We test the modifications on a synthetic model and show a reduction in the kinematic artifacts in the gradient.

MODIFIED REGULARIZATION

The conventional L2 objective function for TFWI can be written as follows:

$$J = \frac{1}{2} \left\| \tilde{\mathbf{F}}(\tilde{\mathbf{m}}) \mathbf{s} - \mathbf{d}_{obs} \right\|_2^2 + \frac{\epsilon}{2} \left\| \mathbf{A} \tilde{\mathbf{m}} \right\|_2^2 , \tag{1}$$

where $\tilde{\mathbf{m}}$ is the extended model, $\mathbf{s}$ is the source function, $\tilde{\mathbf{F}}$ is the extended forward modeling operator, $\mathbf{d}_{obs}$ is the observed surface data, and $\mathbf{A}$ is the defocusing operator. The gradient $\mathbf{g}(\tilde{\mathbf{m}})$ can be written as follows:

$$\mathbf{g}(\tilde{\mathbf{m}}) = \tilde{\mathbf{L}}^{*}(\tilde{\mathbf{m}}, \mathbf{s}) \left( \tilde{\mathbf{F}}(\tilde{\mathbf{m}}) \mathbf{s} - \mathbf{d}_{obs} \right) + \epsilon \mathbf{A}^{*} (\mathbf{A} \tilde{\mathbf{m}}) , \tag{2}$$

where $*$ denotes an adjoint and $\tilde{\mathbf{L}}$ is the extended linearized modeling operator. To avoid cycle skipping, the modeled data needs to be very close to the observed data for the data residual to be basically a 90-degree phase rotation of the observed data. In TFWI, the data-fitting term can fit the observed data regardless of the initial model because the model is extended. The regularization term adjusts the model in subsequent iterations to introduce shifts from the observed data to create a kinematic difference that results in a meaningful tomographic update.

To better understand the process, we can think of the different stages of TFWI as it iterates. First, the data-fitting term fits the observed data by creating reflectors in the extended model. Then, the regularization term slightly focuses the model, which results in a small shift of the modeled data and a decrease in the data fit. Finally, the data-fitting term fits the shifted modeled data to the observed data by creating smooth tomographic updates in the model and adjusts the reflector locations. Notice that, except for the
first few iterations, all these stages happen simultaneously in every iteration.

There are a few convergence challenges that occur in practice. First, the data-fitting term creates both the reflectors and the tomographic updates, which makes it more difficult to balance or emphasize either amplitude or phase fitting. Second, the data-fitting term takes several iterations until it becomes sufficiently small, mainly because of amplitude differences resulting from using an adjoint instead of an inverse. If the regularization term focuses the model before the modeled data fits the observed data, then the data residual does not produce tomographic updates. Third, the regularization term continues to focus the model at a rate that only depends on the weighting term epsilon, regardless of the data-fitting term, resulting in a strong sensitivity to epsilon. If epsilon is too small, the focusing is too slow, and the inversion might take thousands of iterations before producing any useful tomographic updates. If epsilon is too large, the model is focused too fast, which results in cycle skipping.

To solve the previous challenges, we want to modify the regularization term so that it directly produces the tomographic updates instead of producing them indirectly through the data-fitting term. We can start by rewriting the regularization term as follows:

$$\frac{\epsilon}{2} \left\| \mathbf{A} \tilde{\mathbf{m}} \right\|_2^2 = \frac{\epsilon}{2} \left\| \tilde{\mathbf{m}} - (\mathbf{I} - \mathbf{A}) \tilde{\mathbf{m}} \right\|_2^2 = \frac{\epsilon}{2} \left\| \tilde{\mathbf{m}} - \mathbf{E} \tilde{\mathbf{m}} \right\|_2^2 , \tag{3}$$

where $\mathbf{E}$ is an enhancing or focusing operator which is the complement of $\mathbf{A}$. It is now easier to see that the regularization term minimizes the difference between the extended model and an enhanced version of the extended model by focusing the model. This formulation makes it similar to the data-fitting term; however, it is still missing the wave-equation operators. Therefore, we propose a new regularization term as follows:

$$J = \frac{1}{2} \left\| \tilde{\mathbf{F}}(\tilde{\mathbf{m}}) \mathbf{s} - \mathbf{d}_{obs} \right\|_2^2 + \frac{\epsilon}{2} \left\| \tilde{\mathbf{F}}(\tilde{\mathbf{m}}) \mathbf{s} - \tilde{\mathbf{F}}(\mathbf{E} \tilde{\mathbf{m}}) \mathbf{s} \right\|_2^2 . \tag{4}$$

This new regularization term completely changes the behavior of TFWI because it does not modify the model. Instead, it directly calculates a residual between the modeled data and focused modeled data, and back-projects it into a tomographic update. This regularization residual is guaranteed to have the correct amount of kinematic shift compared to the modeled data regardless of how well the observed data is fit. Therefore, the data-fitting term only produces reflectors while the regularization term only produces tomographic updates. Moreover, epsilon now balances amplitude fitting and phase fitting without any danger of cycle skipping at any value, making the inversion process less sensitive to epsilon.

ENHANCING OPERATORS

The enhancing operator is required to create a kinematic shift in the regularization term that can be back-projected into tomographic updates. In the case of differential semblance optimization (DSO), the enhancing operator, i.e., the complement of the defocusing operator $\mathbf{A}$ as described in Equation 3, is:

$$\mathbf{E}_{DSO} \, \tilde{\mathbf{m}}(\mathbf{q}) = \left( 1 - \frac{\mathbf{q}}{\mathbf{q}_{max}} \right) \tilde{\mathbf{m}}(\mathbf{q}) , \tag{5}$$

where $\mathbf{q}$ is the extended axis value, either in subsurface offset or time lag. As described in the previous section, a proper enhancing operator should result in a residual that is a 90 degree phase rotation of the modeled data. However, the DSO operator creates a tomographic update without creating such a phase-rotated residual. Instead, it scales the amplitudes so that the combined effect of all lags (or all offsets in data space) results in the desired tomographic update.

This DSO approach has a few shortcomings. First, DSO assumes the amplitudes do not change significantly along the reflectors or along offset. This assumption breaks down in many cases, such as in the presence of amplitude versus offset (AVO) effects, complex geometry, or irregular acquisition. Second, DSO enhances the image by scaling down the amplitudes along the extended axis. Because the total energy in the data is conserved, this approach assumes the energy reduction in the image is converted to tomographic updates in the velocity model, and therefore it indirectly focuses the reflection energy toward zero lag. However, this assumption ignores the possibility that the energy can simply be converted into reflectors in the null-space of the modeling operator, which are usually present at the edge of the model.

We illustrate the effects of these shortcomings in a simple constant velocity synthetic example with a single reflector. We calculate a shot profile from the modeled data with a slow background velocity using an extended image. The extension of the image preserves all the kinematic information in the observed data. We then calculate a shot profile from the modeled data with a slow background velocity using DSO operator regularization on the extended image. Figure 1 compares a modeled trace at 2 km offset (bottom) with the DSO regularization residual (middle). We can see there is no significant change in the phase when we compare the DSO regularization residual to the modeled data and only a small amplitude change.

We propose using a different enhancing operator that shifts the energy toward the zero lag of the extended axis as follows:
$$\mathbf{E}_{shift} \, \tilde{\mathbf{m}}(\mathbf{q}) = \tilde{\mathbf{m}}(\mathbf{q} + 1 * \mathrm{sign}(\mathbf{q})) . \tag{6}$$

This shifting operator directly forces the energy to move toward the zero lag. The main advantage of this enhancing operator compared to the DSO enhancing operator is that it rotates each trace by 90 degrees. This means that the tomographic update does not depend on how the amplitudes of different traces affect each other. In other words, we have now removed the amplitude dependence. Furthermore, because the energy is directly focused, there are no artifacts at the edge of the model.

We recalculate the regularization residuals with the shifting operator. The trace at 2 km is shown in Figure 1 (top). We can see the phase rotation at every offset of the residual.

Figure 1: A trace at 2-km offset from the modeled data with a slow background velocity using an extended image (bottom), DSO operator regularization on the extended image (middle), and shifting operator regularization on the extended image (top).

We repeat the previous example but with a fast background velocity instead of a slow background velocity. We calculate a shot profile from the modeled data with a fast background velocity using an extended image. We then calculate a shot profile from the model with a fast background velocity using DSO operator regularization and shifting operator regularization, respectively, on the extended image. Figure 2 compares a trace at 2 km offset (bottom) with the DSO regularization residual (middle) and shifting operator regularization (top). Again, we can see there is no significant change in the phase when we compare the DSO regularization residual to the modeled data and only a small amplitude change. On the other hand, the shifting operator shows a clear phase rotation (and in the opposite direction to the previous example).

Figure 2: A trace at 2-km offset from the modeled data with a fast background velocity using an extended image (bottom), DSO operator regularization on the extended image (middle), and shifting operator regularization on the extended image (top).

Next, we compare enhancing operators in the model space by calculating the tomographic update for two Gaussian velocity anomalies, one faster than the background velocity and one slower than the background velocity, with a flat reflector below. The flat, constant amplitude reflector is the best possible scenario for DSO. Figure 3 shows the tomographic update for the fast and slow anomalies using DSO regularization. Figure 4 shows the tomographic update of the fast and slow anomalies using shifting operator regularization. There is a significant reduction in kinematic artifacts of the shifting operator around the edges of the model. However, since DSO uses the amplitudes of the image, it results in better focusing at the location of the anomaly.

CONCLUSIONS

We present modifications to TFWI that both reduce the source of the slow convergence and allow for direct control of the ratio between amplitude and phase effects in the inversion. These modifications are consistent in the framework of TFWI and allow for a proper calculation of the gradient in the data space. The explicit control on the ratio between the effect of the amplitude and phase information in the inversion allows the method to have many flavors that range between amplitude-only methods (e.g. least-squares reverse time migration) and phase-only methods (e.g. phase-only FWI).

ACKNOWLEDGMENTS

We wish to thank Saudi Aramco for the opportunity to publish this paper and the Stanford Exploration Project affiliate companies for financial support.

Figure 3: The tomographic update of using DSO regularization of a fast anomaly (top) and a slow anomaly (bottom).

Figure 4: The tomographic update of using shifting operator regularization of a fast anomaly (top) and a slow anomaly (bottom).
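The two enhancing operators compared in the ENHANCING OPERATORS section can be sketched on a discrete lag axis. This is an illustration only, under stated assumptions: the lag axis is the first array axis, lags are integer samples sorted in increasing order with unit spacing, the DSO weight is taken to depend on |q| so that both signs of lag are attenuated, and samples requested from outside the lag range are set to zero. The function names are hypothetical.

```python
import numpy as np

def enhance_dso(m_ext, q):
    # DSO enhancing operator: scale each lag by (1 - |q| / q_max).
    # Using |q| is an assumption for a lag axis that is symmetric about zero.
    weight = 1.0 - np.abs(q) / np.max(np.abs(q))
    return weight.reshape((-1,) + (1,) * (m_ext.ndim - 1)) * m_ext

def enhance_shift(m_ext, q):
    # Shifting operator: output at lag q is the input at lag q + sign(q),
    # so energy moves one sample toward zero lag.
    out = np.zeros_like(m_ext)
    n = len(q)
    for i in range(n):
        j = i + int(np.sign(q[i]))   # index of lag q + sign(q) for unit lag spacing
        if 0 <= j < n:
            out[i] = m_ext[j]        # lags beyond the range contribute zero
    return out

# Demo: a single spike at lag +2 on a lag axis from -3 to +3.
q = np.arange(-3, 4)
m = np.zeros(7)
m[5] = 1.0
residual_dso = m - enhance_dso(m, q)      # scaled copy of m, same position
residual_shift = m - enhance_shift(m, q)  # difference between neighboring lags
```

In this toy case the DSO residual stays at the original lag and only changes amplitude, while the shift residual is a finite difference along the lag axis, which is the discrete counterpart of the amplitude-versus-phase contrast described in the text.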

EDITED REFERENCES
REFERENCES
Almomin, A., and B. Biondi, 2013, Tomographic full-waveform inversion (TFWI) by successive
linearizations and scale separations: 83rd Annual International Meeting, SEG, Expanded
Abstracts, 1048–1052, http://dx.doi.org/10.1190/segam2013-1378.1.
Biondi, B., and A. Almomin, 2012, Tomographic full waveform inversion (TFWI) by combining full
waveform inversion with wave-equation migration velocity analysis: 82nd Annual International
Biondi, B., and A. Almomin, 2013, Tomographic full-waveform inversion (TFWI) by extending the
velocity model along the time-lag axis: 83rd Annual International Meeting, SEG, Expanded
Fei, W., and P. Williamson, 2010, On the gradient artifacts in migration velocity analysis based on
differential semblance optimization: 80th Annual International Meeting, SEG, Expanded
Abstracts, 4071–4076, http://dx.doi.org/10.1190/1.3513710.
Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, Part 1: Theory and verification
in a physical scale model: Geophysics, 64, 888–901, http://dx.doi.org/10.1190/1.1444597.
Shen, P., and W. Symes, 2015, Horizontal contraction in image domain for velocity inversion:
Shin, C., and W. Ha, 2008, A comparison between the behavior of objective functions for waveform
inversion in the frequency and Laplace domains: Geophysics, 73, no. 5, VE119–VE133,
http://dx.doi.org/10.1190/1.2953978.
Sun, D., and W. Symes, 2012, Waveform inversion via non-linear differential semblance optimization:
82nd Annual International Meeting, SEG, Expanded Abstracts, 1–7,
Symes, W., 2008, Migration velocity analysis and waveform inversion: Geophysical Prospecting, 56,
765–790, http://dx.doi.org/10.1111/j.1365-2478.2008.00698.x.

Massive 3D seismic data compression and inversion with hierarchical Tucker
Yiming Zhang*, Curt Da Silva, Rajiv Kumar, and Felix J. Herrmann
Seismic Laboratory for Imaging and Modeling (SLIM), University of British Columbia
ABSTRACT

Modern-day oil and gas exploration, especially in areas of complex geology such as fault belts and sub-salt areas, is an increasingly expensive and risky endeavour. Typically, long-offset and densely sampled seismic data are required for subsequent shot-based processing procedures, e.g. wave-equation based inversion (WEI) and surface-related multiple elimination (SRME). However, these strict requirements result in an exponential growth in data volume size and prohibitive demands on computational resources, given the multidimensional nature of the data volumes. Moreover, physical constraints and cost limitations impose restrictions on acquiring fully sampled data. In this work, we propose to invert our large-scale data from a set of subsampled measurements, resulting in an estimate of the true volume in a compressed low-rank tensor format. Rather than expanding the data to its fully sampled form for later downstream processes, we demonstrate how to use this compressed data directly via on-the-fly extraction of common shot or receiver gathers. The combination of massive compression and fast on-demand reconstruction of 3D shot or receiver gathers leads to a substantial reduction in memory costs with minimal effects on the results of subsequent processing procedures. We demonstrate the effective implementation of our proposed framework on full-waveform inversion of a 3D synthetic seismic data set generated from an Overthrust model.

INTRODUCTION

In recent years, industrial seismic exploration has moved towards acquiring data in challenging-to-image regions. In order to satisfy Nyquist sampling criteria and to provide more useful information in deep or steeply dipping geologies, companies typically acquire finely sampled and long-offset seismic data in order to avoid aliasing and inaccuracy in subsequent processing steps (Abma et al., 2007). Owing to the large scale and dimensionality of 3D seismic experiments, acquiring fully sampled data is an exceedingly time-consuming and cost-intensive process. For 3D seismic surveys, the high dimensionality (two source coordinates, two receiver coordinates and time) leads to exponentially increasing data processing costs as the size of the area of interest grows, and fully sampled volumes can easily reach terabytes or even petabytes in size.

Terrain limitations or cost restrictions often prevent full sampling of the data in realistic scenarios. To mitigate the issue of missing traces or coarsely sampled data, so-called transform-based methods typically transform the data into different domains, such as the Radon (Kabir and Verschuur, 1995; Wang et al., 2010), Fourier (Sacchi et al., 2009; Curry, 2010), Wavelet (Villasenor et al., 1996), and Curvelet (Hennenfent and Herrmann, 2006; Herrmann and Hennenfent, 2008) domains, and exploit sparsity or correlations among coefficients in order to recover the fully sampled volumes. The authors in (Hennenfent and Herrmann, 2006; Herrmann and Hennenfent, 2008) successfully reconstruct incomplete seismic data in the Curvelet domain based upon ideas from compressive sensing using $\ell_1$-based reconstruction. Although these methods are sufficient for storing and retrieving data due to the storage of small subsets of total coefficients, transform-based methods are unable to easily provide query-based access to elements of interest from the data volume such as common shot gathers. As such, if one recovers a subsampled volume with one of these techniques, invariably one must re-form the entire data set before working with it in processes such as full-waveform inversion.

Alternatives to transform-based methods are matrix completion (Oropeza and Sacchi, 2011; Kumar et al., 2015) and tensor completion (Kreimer and Sacchi, 2012; Trickett et al., 2013; Da Silva and Herrmann, 2015) techniques, which exploit the natural low-rank behavior of seismic data under various permutations. Critical in applying these methods to realistically sized data is avoiding the computation of singular value decompositions (SVDs) on data volumes. Methods proposed by (Kreimer and Sacchi, 2012), for instance, which employ a projection onto convex sets technique to complete seismic data in the Tucker tensor format, suffer from the high computational complexity of such operations. As a result, practitioners often have to resort to working with small subsets or windows of the data, which may degrade the recovered signal (Kumar et al., 2015). A more computationally feasible approach is to formulate algorithms directly in terms of the low-rank components of the signal, which eliminates the necessity of computing expensive SVDs or resorting to aggressive data windowing. The authors in Da Silva and Herrmann (2015) develop an optimization framework for interpolating Hierarchical Tucker (HT) tensors using the smooth manifold structure of the format, which we explore further.

In this work, we outline a workflow for interpolating seismic data from missing traces directly in compressed HT form. Once we have an estimate for the true data in terms of the HT parameters, we can reconstruct shot or receiver gathers on a per-query basis. Using this technique, we do not have to form the full data volume when performing full-waveform inversion and instead allow the stochastic algorithm to extract shot gathers as it requires them throughout the inversion process. This shot extraction procedure only requires efficient matrix-matrix and tensor-matrix products of small parameter matrices and adds little overhead compared to the cost of solving the partial differential equations (PDEs). In doing so, we greatly reduce the memory costs involved in storing and processing these data volumes in an inversion context.

HIERARCHICAL TUCKER REPRESENTATION

Hierarchical Tucker (HT) tensors are a novel structured tensor format introduced in (Hackbusch and Kühn, 2009). This format is extremely storage-efficient, with the number of parameters growing linearly with the number of dimensions rather than exponentially as with traditional point-wise array storage, which makes it computationally tractable for parametrizing high-dimensional problems. To help describe the HT format, we give some preliminary definitions.

Definition 1: The matricization $X^{(l)}$ of a tensor $X \in \mathbb{C}^{n_1 \times n_2 \times \cdots \times n_d}$ reshapes the dimensions specified by $l = \{l_1, l_2, \ldots, l_i\} \subset \{1, 2, \ldots, d\}$ into the row indices and $l^c := \{1, 2, \ldots, d\} \setminus l$ into the column indices of the matrix $X^{(l)}$.

For example, given a 4-dimensional tensor $X \in \mathbb{C}^{n_1 \times n_2 \times n_3 \times n_4}$, $X^{(1,2)}$ is an $n_1 n_2 \times n_3 n_4$ matrix with the first two dimensions along the rows and the last two dimensions along the columns.

Definition 2: The multilinear product of a three-dimensional tensor $X \in \mathbb{C}^{n_1 \times n_2 \times n_3}$ with the matrices $A_i \in \mathbb{C}^{m_i \times n_i}$ for $i = 1, 2, 3$, denoted $A_1 \times_1 A_2 \times_2 A_3 \times_3 X$, is defined in vectorized form as $\mathrm{vec}(A_1 \times_1 A_2 \times_2 A_3 \times_3 X) := (A_3 \otimes A_2 \otimes A_1)\,\mathrm{vec}(X)$. Intuitively, this is simply multiplying the tensor $X$ by $A_i$ in the $i$th dimension for each $i$.

Definition 3: A dimension tree $T$ for a $d$-dimensional tensor is a binary tree where each node is associated to a subset of $\{1, 2, \ldots, d\}$, the root node is $t_{root} = \{1, 2, \ldots, d\}$, and each non-leaf node $t$ is the disjoint union of its left and right children, $t = t_l \cup t_r$, $t_l \cap t_r = \emptyset$. We can think of a dimension tree as defining a recursive partitioning of groups of dimensions, where the dimensions present in the left child node are "separated" from the dimensions in the right child node.

Figure 1: Hierarchical Tucker format for a 4D tensor $X \in \mathbb{C}^{n_1 \times n_2 \times n_3 \times n_4}$.

Although slightly technical to define, the Hierarchical Tucker format aims to "separate" groups of dimensions from each other, where "separate" is understood in the sense of an SVD-type decomposition, depicted in Figure 1. For a 4D tensor $X$, such as a 3D frequency slice, we first reshape the given tensor as a matrix with the first two dimensions along the rows and the other two along the columns, namely $X^{(1,2)}$. Then we separate the dimensions (1, 2) from (3, 4) by performing an SVD-like factorization of this matrix, which yields two intermediate matrices $U_{(1,2)}$ and $U_{(3,4)}$ at this stage. By further reshaping $U_{(1,2)}$ and $U_{(3,4)}$ into 3D cubes, we finally isolate the singletons, i.e. dimensions 1, 2, 3, 4, from the grouped dimensions (1, 2) and (3, 4). The full tensor can be formed by reversing this process, i.e., multiplying $B_{(1,2)}$ by $U_1$ in the first dimension and $U_2$ in the second dimension to form $U_{(1,2)}$, and so on for the other matrices. See (Da Silva and Herrmann, 2015) for more details.

By virtue of the recursive construction in Figure 1, it is not necessary to store the intermediate matrices $U_{src\,x,\,rec\,x}$ or $U_{src\,y,\,rec\,y}$. It is sufficient to reconstruct a $d$-dimensional full tensor by only storing $d$ small matrices $U_t$ and $d - 2$ small 3-dimensional transfer tensors $B_t$. Hence, the storage requirement is bounded above by $dNk + (d-2)k^3 + k^2$, where $N = \max\{n_1, n_2, \ldots, n_d\}$ and $k$ is the maximum rank (Hackbusch and Kühn, 2009). Compared to the $N^d$ parameters needed to store the full data, the HT format greatly reduces the number of parameters that need to be stored and manipulated. For 3D seismic data, the internal ranks of the tensor format increase as the temporal frequency grows, so that lower frequencies compress more easily than higher frequencies, as shown in Table 1.

Figure 2: Non-canonical dimension tree for the HT format applied to seismic data. The root node $B_{src\,x,\,rec\,x,\,src\,y,\,rec\,y}$ has the children $B_{src\,x,\,rec\,x}$ (with leaves $U_{src\,x}$, $U_{rec\,x}$) and $B_{src\,y,\,rec\,y}$ (with leaves $U_{src\,y}$, $U_{rec\,y}$).

It is critical to note that the organization of the tensor has a major impact on its low-rank behaviour. In the seismic context, we permute our data from the typical or canonical organization (source x, source y, receiver x, receiver y) into a non-canonical organization (source x, receiver x, source y, receiver y), which results in faster decaying singular values for the associated matricizations (Da Silva and Herrmann, 2015; Aravkin et al., 2014). The dimension tree associated to this organization is shown in Figure 2. From the perspective of low-rank reconstruction, considering randomly missing sources or receivers in the non-canonical ordering leads to growth of the singular values in the corresponding matricizations of the tensor, leading to more favourable recovery conditions as explored in (Kumar et al., 2015). The resulting HT parameters are indexed as

$$U_t \in \mathbb{C}^{n_t \times k_t}, \quad t = \text{src x},\ \text{src y},\ \text{rec x},\ \text{rec y}$$
$$B_t \in \mathbb{C}^{k_{t_l} \times k_{t_r} \times k_t}, \quad t = (\text{src x}, \text{rec x}),\ (\text{src y}, \text{rec y}),\ (\text{src x}, \text{rec x}, \text{src y}, \text{rec y}) \qquad (1)$$

where $n_t$ and $k_t$ correspond to the dimensions and ranks indexed by the label $t$, respectively.
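The definitions above can be checked numerically on a small example. The following is a minimal NumPy sketch, not code from the paper: the function names (`matricize`, `multilinear_product`, `ht_storage_bound`) and the tiny random tensors are assumptions made for illustration. It uses column-major vectorization so that the Kronecker identity of Definition 2 holds as written, and it evaluates the storage bound $dNk + (d-2)k^3 + k^2$.

```python
import numpy as np

def matricize(X, rows):
    """Matricization X^(l): dimensions in `rows` index the rows,
    the remaining dimensions index the columns (column-major ordering)."""
    cols = [d for d in range(X.ndim) if d not in rows]
    n_rows = int(np.prod([X.shape[d] for d in rows]))
    return np.transpose(X, list(rows) + cols).reshape(n_rows, -1, order="F")

def multilinear_product(A1, A2, A3, X):
    """A1 x_1 A2 x_2 A3 x_3 X: multiply X by A_i along dimension i."""
    return np.einsum("ai,bj,ck,ijk->abc", A1, A2, A3, X)

def ht_storage_bound(d, N, k):
    """Upper bound on the number of HT parameters: dNk + (d-2)k^3 + k^2."""
    return d * N * k + (d - 2) * k**3 + k**2

# Illustrative sizes only (hypothetical, far smaller than seismic data).
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3, 4))
A1, A2, A3 = (rng.standard_normal(s) for s in [(5, 2), (6, 3), (7, 4)])

# Definition 2: vec(A1 x_1 A2 x_2 A3 x_3 X) = (A3 kron A2 kron A1) vec(X)
lhs = multilinear_product(A1, A2, A3, X).reshape(-1, order="F")
rhs = np.kron(A3, np.kron(A2, A1)) @ X.reshape(-1, order="F")
kron_identity_holds = np.allclose(lhs, rhs)

# Definition 1: a 4D tensor matricized along its first two dimensions.
X4 = rng.standard_normal((2, 3, 4, 5))
X12 = matricize(X4, (0, 1))  # shape (2*3, 4*5)
```

For a sense of scale under these assumptions, `ht_storage_bound(4, 396, 50)` can be compared with `396**4`, the number of entries of a dense 4D array with 396 points per dimension; the rank value 50 is purely hypothetical.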

Frequency (Hz)   Parameter size   Compression ratio   SNR
3                71 MB            98.8%               42.8 dB
6                421 MB           92.9%               43.0 dB

Table 1: Compression ratio comparison between non-canonical and canonical organizations with the hierarchical Tucker truncation method. Synthetic data is generated on the 3D Overthrust model with $50^2$ sources and $396^2$ receivers, resulting in each frequency slice requiring 5.8 GB of storage.

Given fully sampled seismic data in the non-canonical organization, one can truncate the full tensor to HT form via the algorithm in (Tobler, 2012), given a prescribed error tolerance and maximum inner rank. In the more realistic case when we have subsampled data, we can use the algorithm described in (Da Silva and Herrmann, 2015) to recover the full data volume by solving

$$\min_x \| \mathcal{A} f(x) - b \|_2^2. \qquad (2)$$

Here $x$ is the vectorized set of HT parameters $(U_t, B_t)$ from (1), $f$ maps $x$ to the fully expanded tensor $f(x)$, $\mathcal{A}$ is the subsampling operator, and $b$ is our subsampled data. This algorithm can interpolate each 4D monochromatic slice quickly and efficiently as it does not compute SVDs on large matrices.

ON-THE-FLY EXTRACTION OF SHOT/RECEIVER GATHERS

Irrespective of our sampling regime, once we have a representation of our data volume in the HT format, we can greatly reduce the computational costs of working with our data. In order to make use of the data directly in its compressed form, we present a method for extracting a shot (or receiver) gather at a given position $(i_x, i_y)$ directly from the compressed parameters. We use Matlab colon notation $A(i, :)$ to denote the extraction of the $i$th row of the matrix $A$, and similarly for column extraction. The common shot gather can be reconstructed by computing

$$U_{srcx,recx} = U_{srcx}(i_x, :) \times_1 U_{recx} \times_2 B_{srcx,recx}$$
$$U_{srcy,recy} = U_{srcy}(i_y, :) \times_1 U_{recy} \times_2 B_{srcy,recy} \qquad (3)$$
$$D_{i_x,i_y} = U_{srcx,recx} \times_1 U_{srcy,recy} \times_2 B_{srcx,recx,srcy,recy}$$

Most importantly, all the computations in Equation (3) can be efficiently implemented with Kronecker products or matrix-vector products, as outlined in Algorithm 1. Note that the intermediate quantities can be constructed through efficient multilinear products and are much smaller than the ambient dimensionality.

Algorithm 1: Extracting a common shot gather from compressed hierarchical Tucker parameters
Input: Source position $i_x$ and $i_y$, and dimension tree
1. Extract the vector $u_{srcx}$ from the matrix $U_{srcx}(i_x, :)$
2. Multiply $B_{srcx,recx}$ along the $k_{srcx}$ dimension with the vector $u_{srcx}$ (in the sense of Definition 2)
3. Multiply the matrix obtained from step 2 along the $k_{recx}$ dimension (second dimension) with $U_{recx}$, resulting in a matrix $U_{srcx,recx}$ of size $n_{recx} \times k_{(srcx,recx)}$
4. Repeat steps 1, 2, 3 along the y coordinate to obtain the matrix $U_{srcy,recy}$ of size $n_{recy} \times k_{(srcy,recy)}$
5. The product $U_{srcx,recx} B_{srcx,recx,srcy,recy} U_{srcy,recy}^T$ results in the final shot gather

EXPERIMENTS & RESULTS

To demonstrate that our method significantly reduces the memory costs of working with the data volume, we consider full-waveform inversion on the SEG/EAGE 3D Overthrust velocity model. This 20 km × 20 km × 5 km velocity model, discretized on a 50 m grid in each direction, contains structurally complex areas such as fault belts and channels. We modify this model by adding a 500 m water layer on top.

We generate our data at 4 frequencies between 3 Hz and 6 Hz with a spacing of 1 Hz, using a 50 × 50 source grid with 400 m spacing and a 396 × 396 receiver grid with 50 m spacing on the ocean bottom. The size of each frequency slice is roughly 5.8 GB. From the full data, we randomly remove 80% of the receivers from each frequency slice. We are then able to obtain our compressed data volume for each monochromatic slice by interpolating in the HT format. Figure 3 demonstrates the successful interpolation of the data volume from a high level of missing data, resulting in shot gathers that are simple to extract with (3).

Figure 3: Comparison of a common shot gather from the Overthrust data at 6 Hz between the originally full data and direct extraction from the compressed data with Algorithm (1) after interpolation. Panels: (a) true data, (b) missing 80% data, (c) extracted from compressed, recovered data, (d) difference. Axes: Receiver X [km] versus Receiver Y [km].
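The steps of Algorithm 1 can be sketched for a small 4D tensor in the non-canonical ordering (source x, receiver x, source y, receiver y). This is an illustrative NumPy sketch under stated assumptions, not the authors' Matlab implementation: the dictionary layout, the helper names (`random_ht`, `expand_full`, `extract_shot_gather`), the real-valued random parameters and the tiny sizes are all hypothetical. The full tensor is formed only to verify the extracted gather.

```python
import numpy as np

def random_ht(n_src, n_rec, k, seed=0):
    """Hypothetical HT parameters with all ranks equal to k (real-valued for brevity)."""
    rng = np.random.default_rng(seed)
    return {
        "U_srcx": rng.standard_normal((n_src, k)),
        "U_recx": rng.standard_normal((n_rec, k)),
        "U_srcy": rng.standard_normal((n_src, k)),
        "U_recy": rng.standard_normal((n_rec, k)),
        "B_x": rng.standard_normal((k, k, k)),     # k_srcx x k_recx x k_(srcx,recx)
        "B_y": rng.standard_normal((k, k, k)),     # k_srcy x k_recy x k_(srcy,recy)
        "B_root": rng.standard_normal((k, k)),     # k_(srcx,recx) x k_(srcy,recy)
    }

def extract_shot_gather(ht, ix, iy):
    """Shot gather at source index (ix, iy), computed from HT parameters only."""
    u_sx = ht["U_srcx"][ix, :]                     # step 1
    Mx = np.einsum("a,abp->bp", u_sx, ht["B_x"])   # step 2: contract k_srcx
    Ux = ht["U_recx"] @ Mx                         # step 3: n_recx x k_(srcx,recx)
    u_sy = ht["U_srcy"][iy, :]                     # step 4: same along y
    My = np.einsum("c,cdq->dq", u_sy, ht["B_y"])
    Uy = ht["U_recy"] @ My                         # n_recy x k_(srcy,recy)
    return Ux @ ht["B_root"] @ Uy.T                # step 5: n_recx x n_recy

def expand_full(ht):
    """Full tensor D[srcx, recx, srcy, recy]; only feasible for tiny examples."""
    return np.einsum(
        "ia,jb,abp,pq,cdq,kc,ld->ijkl",
        ht["U_srcx"], ht["U_recx"], ht["B_x"], ht["B_root"],
        ht["B_y"], ht["U_srcy"], ht["U_recy"], optimize=True,
    )

ht = random_ht(n_src=4, n_rec=6, k=3)
gather = extract_shot_gather(ht, ix=1, iy=2)
reference = expand_full(ht)[1, :, 2, :]
```

In this sketch the extraction touches only matrices with at most `n_rec * k` entries, which is the reason the per-query cost stays small compared with forming the full volume.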

The modeling code we employ is WAVEFORM, which is a Helmholtz inversion framework written primarily in Matlab (Da Silva and Herrmann, 2016). Considering the large number of sources in these experiments, we use a stochastic optimization algorithm that works on randomly subsampled subsets of shots rather than using all the shots at each iteration, which reduces the number of PDEs to be solved at a given iteration. For each subset of shots, we partially minimize the resulting least-squares objective function with the LBFGS algorithm with bound constraints, i.e. minimum and maximum allowed velocities (Schmidt et al., 2009). We invert a single frequency at a time using 50 nodes with 8 Matlab workers each, where each node has 20 CPU cores and 256 GB of RAM.

We run the 3D FWI experiments for both the full data and the compressed HT data recovered from interpolation, fixing the total number of PDEs solved. In this case, we design an interface to our stochastic inversion algorithm that allows the algorithm to automatically determine the shot gather indices required at a given iteration, which are subsequently generated by (3). Figures 4 and 5 show inversion results for both full data and compressed data. Despite working with heavily subsampled initial data, the interpolation algorithm is able to accurately estimate the entire data volume, and the compressed parameters are used to successfully invert the velocity model at a greatly reduced memory cost. In this case, the data volume sizes are reduced by over 90%, with little visual difference in the final inversion results.

Figure 4: FWI results for the z = 1000 m depth slice. (a) true model, (b) initial model, (c) inverted model with full data, (d) inverted model with compressed data. The number of PDE solves is the same in both cases. Axes: x [km] versus y [km].

Figure 5: FWI results for the x = 12500 m lateral slice. (a) true model, (b) initial model, (c) inverted model with full data, (d) inverted model with compressed data. The number of PDE solves is the same in both cases. Axes: y [km] versus z [km].

CONCLUSIONS & DISCUSSION

In this paper, we propose an approach to represent our large-scale 5D data set in terms of its low-rank tensor parametrization. This approach is suitable when the data is fully sampled or is missing randomly distributed sources or receivers. Rather than explicitly forming the full data volume, we utilize the data directly in its compressed form, giving query-based access to the full volume. This approach extracts any common shot or receiver gather on the fly, so its code is easily embedded into other processing frameworks such as 3D FWI. The proposed approach is computationally and memory efficient without degrading subsequent results.

The techniques outlined in this work have the potential to substantially reduce data communication costs in distributed wave-equation based inversion. In a parallel environment, we can cheaply store a compressed form of the full data volume at a given frequency on every node. This technique also lends itself to generating simultaneous shots on the fly in a similar manner to Algorithm (1), without the associated data communication costs one would incur from distributing the full data volume over the source dimension. The very high compression ratios (greater than 90% in this example) are particularly enticing for low-frequency full-waveform inversion.

ACKNOWLEDGEMENTS

This research was carried out as part of the SINBAD II project with the support of the member organizations of the SINBAD Consortium. The authors wish to acknowledge the SENAI CIMATEC Supercomputing Center for Industrial Innovation, with support from BG Brasil, Shell, and the Brazilian Authority for Oil, Gas and Biofuels (ANP), for the provision and operation of computational facilities and the commitment to invest in Research & Development.

REFERENCES
Abma, R., C. Kelley, and J. Kaldy, 2007, Sources and treatments of migration-introduced artifacts and
noise: 77th Annual International Meeting, SEG, Expanded Abstracts, 2349–2353.
Aravkin, A., R. Kumar, H. Mansour, B. Recht, and F. J. Herrmann, 2014, Fast methods for denoising
matrix completion formulations, with applications to robust seismic data interpolation: SIAM
Journal on Scientific Computing, 36, S237–S266, http://dx.doi.org/10.1137/130919210.
Curry, W., 2010, Interpolation with Fourier-radial adaptive thresholding: Geophysics, 75, no. 6, WB95–
WB102, http://dx.doi.org/10.1190/1.3500977.
Da Silva, C., and F. J. Herrmann, 2015, Optimization on the hierarchical tucker manifold — Applications
to tensor completion: Linear Algebra and its Applications, 481, 131–173,
http://dx.doi.org/10.1016/j.laa.2015.04.015.
Da Silva, C., and F. Herrmann, 2016, A unified 2D/3D software environment for large-scale time-
harmonic full-waveform inversion: 86th Annual International Meeting, SEG, Expanded
Hackbusch, W., and S. Kühn, 2009, A new scheme for the tensor representation: Journal of Fourier
analysis and applications, 15, 706–722, http://dx.doi.org/10.1007/s00041-009-9094-9.
Hennenfent, G., and F. J. Herrmann, 2006, Application of stable signal recovery to seismic data
interpolation: 76th Annual International Meeting, SEG, Expanded Abstracts, 2797–2801,
http://dx.doi.org/10.1190/1.2370105.
Herrmann, F. J., and G. Hennenfent, 2008, Non-parametric seismic data recovery with curvelet frames:
Geophysical Journal International, 173, 233–248, http://dx.doi.org/10.1111/gji.2008.173.issue-1.
Kabir, M. N., and D. Verschuur, 1995, Restoration of missing offsets by parabolic radon transform:
Geophysical Prospecting, 43, 347–368, http://dx.doi.org/10.1111/gpr.1995.43.issue-3.
Kreimer, N., and M. D. Sacchi, 2012, A tensor higher-order singular value decomposition for prestack
seismic data noise reduction and interpolation: Geophysics, 77, no. 3, V113–V122,
http://dx.doi.org/10.1190/geo2011-0399.1.
Kumar, R., C. Da Silva, O. Akalin, A. Y. Aravkin, H. Mansour, B. Recht, and F. J. Herrmann, 2015,
Efficient matrix completion for seismic data reconstruction: Geophysics, 80, no. 5, V97–V114,
Oropeza, V., and M. Sacchi, 2011, Simultaneous seismic data denoising and reconstruction via
multichannel singular spectrum analysis: Geophysics, 76, no. 3, V25–V32,
http://dx.doi.org/10.1190/1.3552706.
Sacchi, M., S. Kaplan, and M. Naghizadeh, 2009, Fx gabor seismic data reconstruction: 71st Annual
International Conference and Exhibition, EAGE, Extended Abstracts,
http://dx.doi.org/10.3997/2214-4609.201400441.
Schmidt, M. W., E. Van Den Berg, M. P. Friedlander, and K. P. Murphy, 2009, Optimizing costly
functions with simple constraints: A limited-memory projected quasi-Newton algorithm:
Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
(AISTATS-09), 456–463.
Tobler, C., 2012, Low-rank tensor methods for linear systems and eigenvalue problems: Ph.D. thesis,
ETH Zürich.

Trickett, S., L. Burroughs, and A. Milton, 2013, Interpolation using Hankel tensor completion: 83rd
Villasenor, J. D., R. Ergas, and P. Donoho, 1996, Seismic data compression using high-dimensional
wavelet transforms: Proceedings of Data Compression Conference, IEEE, 396–405,
http://dx.doi.org/10.1109/DCC.1996.488345.
Wang, J., M. Ng, and M. Perz, 2010, Seismic data interpolation by greedy local radon transform:
Geophysics, 75, no. 6, WB225–WB234, http://dx.doi.org/10.1190/1.3484195.

Zero offset VSP velocity inversion with FWI using segmented fast simulated annealing (SFSA)
Xuanying Han*1,2, Danping Cao1,2, Xingyao Yin1,2 and Kai Liang1,2
1 School of Geosciences, China University of Petroleum, Qingdao, China
2 Laboratory for Marine Mineral Resources, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
Summary

In order to address the low computational efficiency of global optimization algorithms in wave equation inversion, we propose a segmented fast simulated annealing (SFSA) algorithm for velocity full waveform inversion of VSP data. The SFSA algorithm adopts a segmented strategy that uses different disturbance models and annealing methods at different iterations. In addition, in the later stage of the algorithm we generate the model disturbance using the idea of non-uniform mutation in order to improve the convergence speed. We perform zero offset VSP P-wave velocity inversion with FWI using the SFSA algorithm and very fast simulated annealing (VFSA), respectively, with the same initial temperature and the same Markov chain length. The results confirm the effectiveness and efficiency of the SFSA algorithm.

Introduction

As an effective nonlinear optimization algorithm, the simulated annealing algorithm has been proved to be rigorous in theory and effective in practical applications, provided that there are enough model perturbations and iterations and that a strict annealing scheme is used. However, in practical applications these requirements cannot be satisfied, and the efficiency is not high enough. Many scholars have improved the simulated annealing algorithm. Ingber (1989) proposed a very fast simulated annealing algorithm, which has a certain ability to solve practical problems, but the efficiency is still low. In this paper, a segmented fast simulated annealing algorithm based on non-uniform mutation is proposed to further improve the efficiency of the algorithm.

Random search algorithms with global optimization performance, such as the simulated annealing algorithm, place no requirements on the initial model and can search for the target randomly in the global space in order to ensure that the inversion result is the optimal solution. These algorithms have been widely used in seismic inversion. Liu et al. (1995) used the synthetic simulated annealing algorithm to perform one-dimensional acoustic inversion. Zhang (2005) inverted wave impedance by using the fast simulated annealing algorithm. In this paper, we use the SFSA algorithm to perform zero offset VSP wave equation inversion to obtain the precise P-wave velocity around the borehole. The small amount of VSP data and the high efficiency of the SFSA algorithm make this time-consuming inversion feasible. The result is the foundation for a precise time-depth relationship and for the interpretation and prediction of oil and gas.

Disturbance model based on non-uniform mutation

In a genetic algorithm, the mutation operator assists the crossover operator in generating new individuals and determines the local search ability of the algorithm. The main idea of non-uniform mutation is to search a small area near the original individual. For the non-uniform mutation operation from $X = X_1 X_2 \ldots X_k \ldots X_l$ to $X' = X_1 X_2 \ldots X_k' \ldots X_l$, the formula for the generation of the new gene $X_k'$ is:

$$X_k' = \begin{cases} X_k + \Delta(t,\, U_{max}^k - X_k) & \text{if } random(0,1) = 0 \\ X_k - \Delta(t,\, X_k - U_{min}^k) & \text{if } random(0,1) = 1 \end{cases} \qquad (1)$$

Here, $\Delta(t, y)$ represents a random number drawn from a non-uniform distribution in the range $[0, y]$. Based on this idea, Jiang et al. (2007) proposed a new method to enhance the local search ability, using a non-uniform mutation strategy to generate new model parameters:

$$x_i' = x_i + y_i (x_{i\,max} - x_{i\,min})$$
$$y_i = r (1 - T/N)^a \,\mathrm{sgn}(r - 0.5) \qquad (2)$$

Here, $r$ is a random number in (0, 1), $T$ is the current temperature and $N$ represents the maximum number of iterations associated with the maximum temperature and the minimum temperature. $a$ is a constant which determines the degree of non-uniformity.

In order to further improve the local search ability and the search efficiency, we can place constraints on the model and gradually reduce its perturbation space, so as to quickly approach the optimal solution. In this paper, we add a restriction factor $m(k)$, which depends on the iteration number $k$, to the disturbance model, and the new formula can be expressed as:

$$y_i = r (1 - T/N)^a \,\mathrm{sgn}(r - 0.5) / m(k) \qquad (3)$$

When $k$ increases, $m$ also increases and the search range becomes smaller. Therefore, as the number of iterations increases, the model perturbation is carried out in a smaller and smaller interval around the current model in order to find the optimal solution.
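The later-stage disturbance model of formulas (2) and (3) can be sketched as follows. This is only an illustrative NumPy sketch under several assumptions that are not in the paper: the text does not give the form of the restriction factor m(k), so `restriction_factor` below is a hypothetical increasing function; the exponent value `a=2.0`, the clipping of the base (1 - T/N) at zero, and the clipping of the perturbed model to its bounds are likewise choices made here for the example.

```python
import numpy as np

def restriction_factor(k, growth=0.01):
    """Hypothetical m(k): any function that increases with the iteration number k.
    The paper does not specify its form; a linear growth is assumed here."""
    return 1.0 + growth * k

def perturb(x, x_min, x_max, T, N, k, a=2.0, rng=None):
    """Non-uniform mutation step:
    y_i  = r (1 - T/N)^a sgn(r - 0.5) / m(k),
    x_i' = x_i + y_i (x_max - x_min)."""
    rng = np.random.default_rng() if rng is None else rng
    r = rng.random(np.shape(x))
    base = max(0.0, 1.0 - T / N)          # assumption: keep the base non-negative
    y = r * base**a * np.sign(r - 0.5) / restriction_factor(k)
    # Assumption: keep the perturbed model inside its allowed range.
    return np.clip(x + y * (x_max - x_min), x_min, x_max)

# Same random draws, early versus late iteration: the late step is smaller.
x = np.full(5, 2000.0)                     # hypothetical velocities in m/s
early = perturb(x, 1000.0, 3000.0, T=10.0, N=529, k=0, rng=np.random.default_rng(0))
late = perturb(x, 1000.0, 3000.0, T=10.0, N=529, k=500, rng=np.random.default_rng(0))
```

With the same random numbers, the step at k = 500 is never larger than the step at k = 0, which mirrors the statement that the search range shrinks as m(k) grows.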

The criterion of acceptance probability

The criterion of acceptance probability in VFSA only considers the absolute change of the energy of the objective function. According to the rules governing the phase transition from the molten state to the crystalline state in thermodynamics and statistical physics, we need to consider the relative change of the energy. So the criterion can be optimized with the following formula (Gu, 1999):

P  [1  (h  E / T )( E1 / E  ]1/ h (4)

Here, E1 is the energy function of the current model, E is the energy function of the new model, and ΔE is the relative energy; h,  and  are given nonnegative real numbers. The algorithm can have faster search ability with the optimized criterion of acceptance probability.

The SFSA algorithm

In this paper, we propose a segmented fast simulated annealing algorithm based on non-uniform mutation. In the early stage of the algorithm, when the number of iterations is less than the truncated iteration number K, the model uses a global perturbation pattern and the entire solution space is traversed. The method of model perturbation is the same as in the conventional fast simulated annealing algorithm (formula (5)).

$$y_i = T \,\mathrm{sgn}(\mu - 0.5)\left[(1 + 1/T)^{|2\mu - 1|} - 1\right] \qquad (5)$$

Here, $\mu$ is a random number in (0, 1). The matching annealing plan is:

$$T(k) = T_0\, \alpha^{k^{1/N}} \qquad (6)$$

Here, $T_0$ is the initial temperature, $k$ is the number of iterations, $N$ represents the number of parameters to be inverted and $\alpha$ is the attenuation coefficient. In this stage, we set $\alpha$ to 0.98. The aim of the early stage is to search for and lock the interval of the optimal solution.

In the later stage of the algorithm, when the number of iterations is greater than the truncated iteration number K, we use formula (3) to generate the perturbation of the model in order to improve the local search ability of the algorithm, and formula (6) with $\alpha = 0.97$ as the annealing plan. The purpose of this stage is to quickly search for the optimal solution in the locked interval. The criterion of acceptance probability of the whole algorithm is formula (4).

Wave equation modeling and objective function

In this paper, we perform inversion based on wave equation modeling in the frequency domain using an optimized 9-point finite difference algorithm (Stekl, 1998). The P-wave velocity model we set is shown in Figure 1. The modeling yields the zero offset VSP seismic records in Figure 1(b). The L-2 norm of the difference between the calculated results and the real seismic record is the objective function here.

Figure 1: P-wave velocity model (a) and the seismic record (b). Axes: depth (m) and time (s).

Zero offset VSP P-wave velocity inversion

First of all, we need to determine the truncated iteration number K. In this paper, several iteration numbers are taken as K, and corresponding experiments are carried out to choose the best number. The initial temperature of the inversion is 10 degrees centigrade and the Markov chain length is 20. The total number of iterations is 529. The velocity inversion results (Figure 2) and the evolution of the temperature and the disturbance modulus (Figure 3) show that the smaller K is, the stronger the local search ability of the algorithm, but the more easily the algorithm falls into a local extremum. A larger truncation iteration number is not conducive to finding the optimal solution either. Here, we get better inversion results when the truncation number is in the range [100, 150].
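The early-stage perturbation (5) and the annealing plan (6), together with the switch of the attenuation coefficient at the truncated iteration number K, can be sketched as below. This is a hedged NumPy sketch, not the authors' code: it assumes the standard very fast simulated annealing form of the perturbation, y = T sgn(u - 0.5)[(1 + 1/T)^|2u - 1| - 1], and the schedule T(k) = T0 * alpha^(k^(1/N)); the function names and the example values of N and K are hypothetical.

```python
import numpy as np

def vfsa_step(T, rng, size):
    """Early-stage perturbation, formula (5):
    y = T sgn(u - 0.5) [(1 + 1/T)^|2u - 1| - 1], with u uniform on (0, 1).
    The result always lies in [-1, 1]."""
    u = rng.random(size)
    return T * np.sign(u - 0.5) * ((1.0 + 1.0 / T) ** np.abs(2.0 * u - 1.0) - 1.0)

def stage_alpha(k, K):
    """Attenuation coefficient: 0.98 before the truncated iteration K, 0.97 after."""
    return 0.98 if k < K else 0.97

def temperature(k, T0, alpha, N):
    """Annealing plan, formula (6): T(k) = T0 * alpha^(k^(1/N))."""
    return T0 * alpha ** (k ** (1.0 / N))

rng = np.random.default_rng(0)
y = vfsa_step(10.0, rng, size=1000)

# Hypothetical settings: T0 = 10, N = 20 model parameters, K = 100.
T0, N, K = 10.0, 20, 100
schedule = [temperature(k, T0, stage_alpha(k, K), N) for k in range(0, 529)]
```

Because the exponent k^(1/N) grows slowly for large N, the temperature in this sketch decays gently within each stage and drops once when the coefficient switches from 0.98 to 0.97 at k = K.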

Figure 2: Velocity inversion results for different values of the truncated iteration number K: (a) K=10, (b) K=50, (c) K=100, (d) K=150, (e) K=250, (f) K=300. Axes: velocity (m/s) versus depth (m). (The red line is the true value, and the blue line is the result of inversion.)

Figure 3, panels (a) to (d): The variation of temperature (top) and disturbance modulus (bottom) with the number of iterations when (a) K=10, (b) K=50, (c) K=100, (d) K=150.

[Figure 3 plots not reproduced: panels (e)-(f) show the variation of temperature (top) and disturbance modulus (bottom) with the number of iterations when K=250 and K=300, respectively.]
Figure 3: The variation of temperature and disturbance modulus with different K

[Figure 5 plots not reproduced: velocity (m/s) versus depth (m), panels (a) VFSA, (b) SFSA.]
Figure 5: The velocity inversion results using different algorithms on the condition of same initial temperature (10) and the same Markov chain length (20). (The red line is the true value, and the blue line is the result of inversion.)

The VFSA and SFSA algorithms are used to invert the P-wave velocity from VSP data, with all other conditions being equal. The inversion results obtained with an initial temperature of 100 degrees centigrade and a Markov chain length of 200 are shown in Figure 4; the iteration counts of VFSA and SFSA are 798 and 604, respectively. Figure 5 shows the inversion results with a temperature of 10 degrees centigrade and a Markov chain length of 20; the iteration counts of VFSA and SFSA are 912 and 529, respectively. The results reveal that the SFSA algorithm can obtain a good inversion result at a lower temperature and with fewer iterations. The SFSA algorithm can therefore effectively improve the efficiency of the inversion.
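The iteration counts quoted above can be turned into relative savings with a few lines of arithmetic. The sketch below only restates the reported numbers (798 vs. 604 and 912 vs. 529); the helper name `reduction_percent` is hypothetical.

```python
def reduction_percent(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# (VFSA iterations, SFSA iterations) as reported for Figures 4 and 5.
runs = {
    "T0=100, chain length 200 (Figure 4)": (798, 604),
    "T0=10, chain length 20 (Figure 5)": (912, 529),
}

for label, (vfsa, sfsa) in runs.items():
    saving = reduction_percent(vfsa, sfsa)
    print(f"{label}: SFSA needed {saving:.1f}% fewer iterations than VFSA")
```

Under these numbers the saving is roughly a quarter of the iterations in the first setting and roughly two fifths in the second.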
Conclusions

The traditional VFSA algorithm has been applied in seismic inversion, but its efficiency is still low. In order to improve the efficiency of the simulated annealing algorithm in zero-offset VSP velocity inversion with FWI, we propose a segmented fast simulated annealing (SFSA) algorithm. The model test shows that the SFSA algorithm clearly improves the convergence speed and obtains a better inversion result with fewer iterations than VFSA. The results demonstrate the effectiveness of the SFSA algorithm.

[Figure 4 plots not reproduced: velocity (m/s) versus depth (m), panels (a) VFSA, (b) SFSA.]
Figure 4: The velocity inversion results using different algorithms on the condition of same initial temperature (100) and the same Markov chain length (200). (The red line is the true value, and the blue line is the result of inversion.)

REFERENCES
Bohachevsky, I. O., M. E. Johnson, and M. L. Stein, 1986, Generalized simulated annealing for function
optimization: Technometrics, 28, 209–217, http://dx.doi.org/10.2307/1269076.
Ingber, L., 1989, Very fast simulated annealing: Mathematical and Computer Modeling, 12, 967–973,
http://dx.doi.org/10.1016/0895-7177(89)90202-1.
Liu, P.-C., and S. H. Hartzell, 1993, Nonlinear multiparameter inversion using a hybrid global search
algorithm: Examples from 1D acoustic waveform inversion: Geophysical Journal International,
submitted.
Longcong, J., and L. Jiangping, 2007, Simulated annealing algorithm and its improved: Chinese Journal
of Engineering Geophysics, 2, 135–140.
Sen, M. K., and P. L. Stoffa, 1995, Global optimization methods in geophysical inversion,
https://doi.org/10.1016/s0921-9366(06)x8001-5.
Štekl, I., and R. G. Pratt, 1998, Accurate viscoelastic modeling by frequency-domain finite differences
using rotated operators: Geophysics, 63, 1779–1794, http://dx.doi.org/10.1190/1.1444472.

Waveform Inversion In Laplace-Fourier Domain For Estimation Of Seismic Event Location
And Moment Tensor
Petr V. Petrov and Gregory A. Newman, Lawrence Berkeley National Laboratory.
Summary

Enhanced geothermal system (EGS) development will cause increased seismicity, the physical mechanisms of which are still not fully understood. To define the nature of induced seismicity, a second-order dynamic moment tensor analysis may be used to ascertain whether seismicity arises from stress release along preexisting faults or from fracture permeability creation associated with EGS activities. With dynamic moment tensor analysis, an accurate estimation of the source parameters, including location, in the presence of complex geological media with highly variable seismic velocity properties is a non-trivial task. Here, we employ full elastic waveform inversion (FWI) in the Laplace-Fourier domain to estimate the location and seismic moment tensor parameters of sources embedded in 3D isotropic heterogeneous media. Forward modeling is carried out with a 3D finite-difference code that generates P- and S-waves from point sources defined by second-order moment tensors. We apply the nonlinear conjugate gradient (NLCG) method for the minimization of the objective function and source model updating. The FWI algorithm is shown to be stable in the presence of complex geometry, including faults, and random Gaussian noise. We present results of testing the FWI methodology on a synthetic dataset for the Raft River geothermal field, Idaho. The detectors were placed on the earth's surface and seismic events were simulated at a depth of about 2000 meters. Matching the waveforms from seismic events provides improved source location along with estimates of the pertinent components of the moment tensor. The influence of the velocity model and of the initial position of an event was also investigated.

Introduction

Passive seismic monitoring of the earthquakes arising from production/fracking activities in oil and gas is now a well-established technology. Seismic monitoring of geothermal fields, which focuses on characterization of the mechanisms of induced microseismic events, is another application of the technology and is currently of great interest due to the development of enhanced geothermal systems (EGS) for energy production.

Conventional event-location techniques are based on picking the arrival times of the direct P- and S-waves in a borehole or at the surface (Rutledge and Scott, 2003). Also, microseismic events can be located without time picking, by employing stacking (Anikiev et al., 2014) and migration-based methods (Zhang and Zhang, 2013). Most existing algorithms invert the seismic moment tensor m under the assumption that the source position xs and origin time t0 are known. However, migration-based and FWI methods have the advantage of resolving tensor m and location xs simultaneously (Gajewski and Tessmer, 2005; Michel and Tsvankin, 2014). Waveform inversion is a nonlinear optimization technique that uses full-wavefield propagation for simulating the data and iteratively matches seismic waveforms to estimate the model parameters. The most attractive feature of FWI is the improved resolution of the inverted models due to the inclusion of phase and amplitude information, and the ability to invert multicomponent data consistently.

In contrast to the method used by Michel and Tsvankin (2014), where the inversion is performed in the time domain simultaneously for the source position xs, origin time t0, and tensor m, we use FWI in the Laplace-Fourier domain and determine the source location and moment tensor. Here we describe the inversion algorithm and evaluate its robustness by adding random Gaussian noise to the data and by changing the velocity model. Next, we apply the algorithm to a synthetic seismic data set generated for a design study at the Raft River geothermal field (Idaho, USA).

Inversion Methodology

The objective is to determine the best-fitting source-moment model that might arise from fluid-related processes in the geothermal system. Here we try to address this problem from the standpoint of FWI (Plessix et al., 2010; Sirgue et al., 2010). We propose a full-waveform approach for the seismic source model using data in the Laplace-Fourier domain from a few surface monitoring stations. The algorithm is based on nonlinear gradient calculations for the minimization of the objective function and source model updating for a seismic source embedded in complex heterogeneous geological media exhibiting isotropic properties.

Following the inverse problem formulation in the Laplace-Fourier domain, we solve the following nonlinear problem by minimizing the error functional

$\varphi(\mathbf{m}, \mathbf{r}_s) = 0.5 \sum_{k,q} (\mathbf{d}^{obs}_{q,k} - \mathbf{d}^{sim}_{q,k})^H \mathbf{E}^T \mathbf{E} (\mathbf{d}^{obs}_{q,k} - \mathbf{d}^{sim}_{q,k})$ (1)

In eq. (1), $\mathbf{d}^{obs}_{q,k}$ and $\mathbf{d}^{sim}_{q,k}$ are the observed and predicted data vectors, with subscript q indicating the station position, $\mathbf{E}$ is the diagonal matrix of weights defined by the data error, and the symbols "H" and "T" denote the Hermitian conjugate and transpose operations, respectively. The observed data vectors consist
of the Laplace-Fourier image of elastic displacement velocities, obtained from the measured time-domain seismic wavefield data with a set of complex frequencies $s_k = \sigma_k + i\omega_k$, $k = 1 \ldots N_k$, where $\sigma_k$ is the Laplace damping constant and $\omega_k$ is the angular frequency, with $i = \sqrt{-1}$:

$\mathbf{d}^{obs}_{q,k} = \int_0^{\infty} \mathbf{d}^{obs}_q(t)\, e^{-s_k t}\, dt$ (2)

The predicted data $\mathbf{d}^{sim}_{q,k}$ are defined by the displacement velocity field and depend upon the velocity model and the source model:

$\mathbf{d}^{sim}_{q,k} = \hat{Q}_q \mathbf{v}_k$ (3)

where $\hat{Q}_q$ is the interpolation operator applied to the calculated velocity field in the vicinity of the station. For the elastic medium with density $\rho$ and bulk and shear moduli $\kappa$, $\mu$, the velocity components $\mathbf{v}_k = (v_{xk}, v_{yk}, v_{zk})$ specified in eq. (3) satisfy the equations of motion, which may be obtained directly by taking the Laplace-Fourier transform of the time-domain system for complex frequency $s_k$. After finite-difference implementation (Virieux 1986), $\mathbf{v}_k$ can be written via a Green function $\mathbf{G}_k$:

$\mathbf{v}_k = \mathbf{G}_k \mathbf{M}_k, \quad \mathbf{G}_k = s_k \left( s_k^2 \mathbf{I} - \mathbf{b}\hat{D}_\tau \mathbf{k}_\mu \hat{D}_v \right)^{-1} \mathbf{b}\hat{D}_\tau, \quad \mathbf{M}_k = (M_{xx}, M_{xy}, M_{xz}, M_{yy}, M_{yz}, M_{zz})^T$ (4)

where $\mathbf{M}$ is the Laplace-Fourier image of the spatio-temporal moment density, $\mathbf{k}_\mu$ and $\mathbf{b}$ are block matrices of the averaged elastic parameters, and $\hat{D}_\tau$, $\hat{D}_v$ are the finite-difference operators, whose explicit expressions can be found in Petrov and Newman (2012, 2014). When the source-receiver distance is much larger than the source scale and the wavelength, the seismic source can be considered as a point source at the position $\mathbf{r}_s = (x_s, y_s, z_s)$ and can be represented by its moment tensor (Aki and Richards, 2009):

$\mathbf{M}_k(\mathbf{r}) = (m^k_{xx}, m^k_{xy}, m^k_{xz}, m^k_{yy}, m^k_{yz}, m^k_{zz})^T\, \delta(\mathbf{r}_s - \mathbf{r}), \quad \mathbf{m}_k = \int \mathbf{M}_k(\mathbf{r})\, dV$ (5)

Then equation (4) and, accordingly, the vector of predicted data $\mathbf{d}^{sim}_k = (d^{sim}_{1,k}, d^{sim}_{2,k}, \ldots, d^{sim}_{q,k}, \ldots, d^{sim}_{N_q,k})$ transform to

$\mathbf{v}_k = \mathbf{G}_k \mathbf{P} \mathbf{m}_k, \quad \mathbf{d}^{sim}_k = \mathbf{B}_k \mathbf{m}_k, \quad \mathbf{B}_k = \mathbf{Q} \mathbf{G}_k \mathbf{P}$ (6)

where the operator $\mathbf{P} = \mathrm{diag}(1\ 1\ 1\ 1\ 1\ 1) \otimes \delta(\mathbf{r} - \mathbf{r}_s)$ defines the source position for each component of the moment tensor, and the sign $\otimes$ denotes the Hadamard product.

To carry out simultaneous inversion for $\mathbf{r}_s$ and $\mathbf{m}_k$ we employ an iterative scheme, where each iteration is divided into two steps. In the first step we assume a known position of the source at point $\mathbf{r}_s^0$ and determine the best fit for the moment tensor. Because the position of the source is known, the minimization of (1) becomes a linear problem with respect to the 6 components of the moment tensor and can be solved by the least-squares method:

$\mathbf{B}_k^H(\mathbf{r}_s^0)\, \mathbf{E}^T \mathbf{E} \left( \mathbf{d}^{obs}_k - \mathbf{B}_k(\mathbf{r}_s^0) \cdot \mathbf{m}_k^0 \right) = 0$ (7)

The elements of the sensitivity matrix $\mathbf{B}_k(\mathbf{r}_s^0)$ are estimated from the 6 solutions of the forward problem (6), separately for each moment tensor component. Specifically,

$\mathbf{B}_k = [\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_6], \quad \mathbf{b}_l = \mathbf{B}_k \cdot \mathbf{e}_l = \mathbf{Q} \mathbf{G}_k \mathbf{P} \cdot \mathbf{e}_l, \quad l = 1 \ldots 6$ (8)

where $\mathbf{e}_l$ is a 6-component vector with one in the l-th position and zeros in the others.

In the second step we update the source position using the NLCG method (Polyak & Ribiere 1969), where the difference between observed and predicted data is $\mathbf{e}_k^0 = (\mathbf{d}^{obs}_k - \mathbf{d}^{sim,0}_k)$, and $\mathbf{d}^{sim,0}_k = \mathbf{B}_k(\mathbf{r}_s^0)\, \mathbf{m}_k^0$ is obtained in the first step for the assumed source position $\mathbf{r}_s^0$. Thus

$\mathbf{r}_s^1 = \mathbf{r}_s^0 + \alpha \nabla_{r_s}\varphi, \quad \nabla_{r_s}\varphi = -2\,\mathrm{Re}\left( (\mathbf{e}_k^0 \mathbf{E})^H \mathbf{E}\, \nabla_{r_s} \mathbf{d}^{sim,0}_k \right) = -2\,\mathrm{Re}\left( \mathbf{g}^H \cdot s_k \mathbf{b}\hat{D}_\tau \nabla_{r_s} \mathbf{P}(\mathbf{r}_s^0)\, \mathbf{m}_k^0 \right)$ (9)

Here the vector $\mathbf{g}^H$ is the solution of the adjoint equation (the symbol "*" is the conjugate operation)

$\mathbf{K}_k^T \mathbf{g}^* = \mathbf{Q}^T \mathbf{E} \mathbf{E}^T \left( \mathbf{d}^{obs}_k(s_k) - \mathbf{d}^{sim,0}_k \right)^*$ (10)

The iteration process repeats until the changes in the relative gradient become less than 0.5%.

Implementation for Raft River elastic model

For the algorithm validation we use an existing seismic monitoring survey with 8 stations at the Raft River geothermal field (Figure 1). The Raft River geothermal field is a Department of Energy EGS test site, located in Cassia County, Idaho, roughly 100 miles northwest of Salt Lake City on the Utah-Idaho border. The Elba quartzite, located in Precambrian rocks, is the primary geothermal reservoir with an average resource temperature of 300°F (Ayling et al., 2011; Jones et al., 2011). Analysis of water chemistry indicates that the field is bisected into two regions separated by shear faulting, termed the Narrows Zone (Figure 1), as described by Ayling & Moore (2013). The Narrows Zone strikes to the northeast through the middle of the field. Microseismic activity attributed to plant activity suggests that, while acting as a barrier between the two regions, the Narrows Zone allows for fluid movement along its length. Tracking production and distribution of geothermal fluids is realized with microseismic monitoring, which includes determination of seismic source location and mechanisms. Essentially the knowledge of
seismic-source mechanisms can provide insights into the fracturing behavior of the reservoir and surrounding rocks and the understanding of the evolution of the stress field.

Figure 1: Geometry of survey and locations of microearthquakes relative to Narrows Zone in the Raft River Geothermal Area.

For the synthetic data generation, we use part of the 3D Raft River velocity model designed by Nash and Moore (2012). To reduce both the required memory and the computing time, we used a part of the model that contains all eight LBNL seismic stations. The model is rotated so that the Narrows Zone aligns along the X-axis, with a broadside extension of 396<x<5724 and, along the Y-axis, 1068<y<5148 (in meters). The depth extent of the model is 0<z<2256 (in meters). The distributions of P-wave velocities in the planes x=3600 m and z=2000 m are shown in Figure 2. The number of grid nodes is 111 × 85 × 47 with a grid spacing of 48 m. A free-surface boundary condition was imposed on the surface z=0. On the other boundaries the perfectly matched layer (PML) boundary condition is applied, with 5 cells in each direction.

Synthetic data were generated by the Laplace-Fourier domain finite-difference modeling technique (Petrov and Newman 2012) with the same grid size. Each of the 8 seismic stations records three components of displacement velocity near the surface. The projections of the survey geometry are presented in Figure 2. Two different positions of microseismic sources were examined near the Narrows Zone, at xs=4000, ys=3600, zs=1800 and xs=4500, ys=2000, zs=1500 meters. The value of the moment tensor was the same in both cases and included all 6 components. During the inversion all three components of displacement velocity from all stations were used as the observed data.

For the inversion we used only one complex frequency, with f=2 Hz and damping constant σ = 6 1/s. The initial positions of the seismic sources were selected at a distance of 500 to 800 meters from the exact location. As Figures 3 and 4 show, the inversion gives correct results for the location of the source and does not depend on the initial position of the source inside the convergence radius, roughly 1 km.

Figure 2: P-wave velocities (m/s) of the Raft River elastic model and seismic survey geometry.

Figure 3: The trajectories of inversion steps during the source position definitions.

Figure 4: Inverted moment tensor for the exact velocity model.

To investigate the sensitivity of the results to noise in the input data, we add random Gaussian noise to the Laplace-Fourier image of the observed data with the variance equal to 5 and 10%. The inversions were performed for the source in position 1. The FWI algorithm, applied together with the exact velocity model, estimates correct values of $\mathbf{r}_s$ and $\mathbf{m}$. Our simulations show a minor influence of the stochastic noise on the source location: the deviation from the exact position does not exceed 10 meters. The moment-tensor components are more sensitive to the noise (Figure 5) and show an approximately linear dependence on the noise level (Figure 6).

It is more important to evaluate the influence of the velocity model on the inversion results. To investigate this factor we performed inversions with a few inexact models,
which were obtained by different levels of smoothing of the exact velocity distribution (Figure 7). The difference between models may be characterized by the relative error:

$e_s = \| V^{exact}_{p,s} - V^{smooth}_{p,s} \| \, / \, \| V^{exact}_{p,s} \| \times 100\%$ (11)

Figure 5: The mean values of moment tensor components inverted with the random noise in initial data.

Figure 6: The relative error of moment tensor components inverted with the random noise in initial data.

Despite the difference between the exact and the used models, the errors in the estimated source coordinates are relatively small (less than 20 m), in contrast to the moment-tensor elements (Figure 8), which are more sensitive to the velocity model.

Figure 7: Inexact velocity models with the different levels of deviation (smoothing) from the exact model: a) es = 6%, b) es = 9%, c) es = 12%.

An increasing difference between the used and exact models leads to increasing errors in the moment tensor estimates, although the relations between different components are still approximately conserved. If the relative error for the moment tensor is defined as

$e_M = | M_i^{exact} - M_i^{inv} | \, / \, | M_i^{exact} | \times 100\%, \quad i = 1 \ldots 6$ (12)

then we can obtain the pattern between $e_s$ and $e_M$ shown in Figure 8. Clearly, moment tensor inversion results are quite sensitive to the assumed velocity model, and moment tensor estimates can vary significantly with a poor choice of the velocity model.

Figure 8: Results of inversion of the moment tensor components for different inexact (smoothed) velocity models with the relative deviation error es = 6%, 9% and 12%.

Conclusions

We present an FWI method for the estimation of microseismic source parameters (location and moment tensor) in the Laplace-Fourier domain for 3D elastic heterogeneous isotropic media. The stability of the algorithm was tested for input data contaminated with Gaussian noise and for inexact velocity models. In noise-free tests the method converges to the exact values of the source parameters, provided the initial source location estimates are within roughly 1 km of the actual event location. In the presence of noise and inexact velocity models the inversion still provides very good source location estimates, but the moment tensor estimates are degraded in proportion to the noise magnitude and the velocity model deviation. The methodology shows the importance of an accurate background velocity model, which is now a focus of our ongoing research efforts on full waveform source moment inversion. Nevertheless, the proposed algorithm can provide robust inversion of microseismic parameters given a robust background velocity model. We have demonstrated this finding using an existing microseismic monitoring array at the Raft River geothermal field.

Acknowledgments

This material is based upon work supported by the U.S. Department of Energy Office of Energy Efficiency and Renewable Energy (EERE) Geothermal Technologies Program, under Award Number GT-480010-19823-10. Computational resources were provided by the National Energy Research Scientific Computing (NERSC) Center. All simulations were performed on the CRAY XC30 supercomputers. The authors acknowledge Joseph Moore and John Queen for providing the 3D elastic Raft River model.

REFERENCES
Aki, K., and P. G. Richards, 2002, Quantitative seismology: University Science Books.
Anikiev, D., J. Valenta, F. Staněk, and L. Eisner, 2014, Joint location and source mechanism inversion of
microseismic events: Benchmarking on seismicity induced by hydraulic fracturing: Geophysical Journal
International, 198, 249–258, https://doi.org/10.1093/gji/ggu126.
Ayling, B., and J. N. Moore, 2013, Fluid geochemistry at the Raft River geothermal field, Idaho, USA:
New data and hydrogeological implications: Geothermics, 47, 116–126,
https://doi.org/10.1016/j.geothermics.2013.02.004.
Ayling, B., P. Molling, R. Nye, and J. Moore, 2011, Fluid geochemistry at the Raft River geothermal
field, Idaho: New data and hydrogeological implications: 36th Workshop on Geothermal Reservoir
Engineering, SGP-TR-191.
Gajewski, D., and E. Tessmer, 2005, Reverse modeling for seismic event characterization: Geophysical
Journal International, 163, 276–284, https://doi.org/10.1111/j.1365-246X.2005.02732.x.
Jones C., J. Moore, W. Teplow, and S. Craig, 2011, Geology and hydrothermal alteration of the Raft
River geothermal system: 36th Workshop on Geothermal Reservoir Engineering, SGP-TR-191.
Michel Jarillo, O., and I. Tsvankin, 2014, Gradient calculation for waveform inversion of microseismic
data in VTI media: Journal of Seismic Exploration, 23, 201–217.
Nash, G. D., and J. N. Moore, 2012, Raft river EGS project: A GIS-centric review of geology: GRC
Transactions, 36, 951–958.
Petrov, P. V., and G. A. Newman, 2012, 3D finite-difference modeling of elastic wave propagation in the
Laplace-Fourier domain: Geophysics, 77, no. 4, T137–T155, https://doi.org/10.1190/geo2011-
0238.1.
Petrov, P. V., and G. A. Newman, 2014, Three-dimensional inverse modelling of damped elastic wave
propagation in the Fourier domain: Geophysical Journal International, 198, 1599–1617,
https://doi.org/10.1093/gji/ggu222.
Plessix, R. E., G. Baeten, J. W. de Maag, M. Klaasen, Z. Rujie, and T. Zhifei, 2010, Application of
acoustic full-waveform inversion to a low-frequency large-offset land data set: 80th Annual
International Meeting, SEG, Expanded Abstracts, 930–934, https://doi.org/10.1190/1.3513930.
Polyak, E., and G. Ribière, 1969, Note sur la convergence de méthodes de directions conjuguées: Revue Française
d’Informatique et de Recherche Opérationnelle, 16, 35–43.
Rutledge, J. T., and P.W. Scott, 2003, Hydraulic stimulation of natural fractures as revealed by induced
microearthquakes, Carthage cotton valley gas field, east Texas: Geophysics, 68, 441–452.
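Sirgue, L., O. I. Barkved, J. Dellinger, J. Etgen, U. Albertin, and J. H. Kommedal, 2010, Full waveform
inversion: the next leap forward in imaging at Valhall: First Break, 28, 65–70,
https://doi.org/10.3997/1365-2397.2010012.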
Virieux, J., 1986, P-SV wave propagation in heterogeneous media: Velocity-stress finite-difference
method: Geophysics, 51, 889–901, https://doi.org/10.1190/1.1442147.
Zhang, W., and J. Zhang, 2013, Microseismic migration by semblance weighted stacking and
interferometry: 83rd Annual International Meeting, SEG, Expanded Abstracts, 397, 2045–2049,

Velocity model building challenges and solutions for seabed- and paleo-canyons: a case study in
Campos Basin, Brazil
Kai Zhang, Javier Subia, Chevron; Chanjuan Sun, Hao Shen, Nuree Han, CGG
Summary

The Campos Basin, offshore Brazil, features complex shallow geology in the form of pronounced seabed canyons and paleo-canyons. The rapid variations in the velocity field due to these complex shallow geologic features can be difficult for ray-based tomography techniques to resolve, resulting in distorted images in the deeper section. Full waveform inversion (FWI) is able to utilize the recorded diving-wave energy to resolve a high-resolution velocity model in these geologically complex areas. Additionally, dip-constrained non-linear slope tomography introduces dip constraints to ray-based residual move-out tomography and is able to capture small-scale velocity anomalies associated with these shallow heterogeneities. A combined workflow of FWI and dip-constrained tomography enabled Chevron to build accurate and detailed velocity models in the Campos Basin, resulting in fewer seismic image distortions. We demonstrate the method using a narrow-azimuth streamer dataset from the Campos Basin, Brazil.

Introduction

It has long been noticed that submarine canyons incised into the continental-slope water bottom present significant challenges in seismic imaging and interpretation for oil and gas exploration. Such complex seafloor bathymetry can distort seismic wavefronts and amplitudes, making it difficult to estimate accurate velocity models and construct real geologic reflectivity images, and therefore negatively impacts the results of the derived interpretation. In time processing, it was suggested to use wave-equation datuming (Berryhill, 1986) or time-variant statics (Dent, 1983) to reduce the “pull up” or “push down” effects. Such approaches do not solve the velocity problem but instead provide horizons which are perceived as true geology. Debenham and Westlake (2013) compared pre-stack time migration (PSTM) and pre-stack depth migration (PSDM) workflows in a 2D line study and demonstrated that image distortions and amplitude dim zones cannot be fixed by a static-correction method, while the PSDM image with a careful depth model building workflow shows obvious uplift. The practice of 3D depth imaging in the Campos Basin has taught us a couple of things. First, initial velocity models that honor shallow geologic features are very helpful and sometimes critical for tomography to converge quickly to reasonable final models. Birdus (2009) proposed the use of geo-mechanical methods to build initial seabed-canyon models. Arnaud et al. (2008) built an initial model with 1D inversion of a picked sub-channel horizon to a reference horizon. The second thing experience has taught us is that high-resolution (HR) model building techniques are important to capture the velocity variation associated with the narrow canyon widths, which are typically narrower than the acquisition streamer length. Thus far, techniques for implementing or improving ray-based tomography have dominated discussions of velocity model-building strategy for submarine canyons; analysis relies heavily on the quality and density of the residual moveout (RMO) picks to build accurate high-resolution models (Fruehn et al., 2015). In addition, different techniques such as pick weighting and layer constraints (Sun et al., 2011; Chen and Shen, 2012; Chen and Hu, 2014), and reference horizons and offset-consistent dip constraints (Graham and Richard, 2009; Guillaume et al., 2013; Chen and Hu, 2014), have been explored to help invert good models.

However, ray-based analysis has been hindered not only by the lack of sufficient offsets for picking shallow events, but also by the complex ray paths generated by steep canyon walls. Deeks and Lumley (2015) pointed out that multiple paths of prism waves are usually generated by the canyons, even at short offsets. Prism wave energy becomes stronger in narrow and deep canyons and creates shadow events in stacked images and CDP gathers (Deeks and Lumley, 2015; Debenham and Westlake, 2014). Confidence in picking proper RMO and geological events is low in the presence of prism waves. Furthermore, multi-pathed energy cannot be properly handled by single-path residual curvature analysis (RCA) and migration algorithms like Kirchhoff migration, which are typically used in this case.

FWI provides a different way to tackle the problem. It uses the wave equation to produce high-resolution velocity models by directly comparing modeled data to the real seismic records. It handles complex ray paths naturally and doesn’t rely on a priori geology assumptions or RMO picking. We applied FWI in this case study in the Campos Basin, followed by a ray-based model building workflow with dip-constrained non-linear slope tomography (Guillaume et al., 2013). The result shows improvement in the sub-canyon image and satisfying quality in both the data and gather domains.

Study area and workflow

The study area is situated in the Campos Basin, offshore Brazil. Water depths range from 150m to 1,500m. The
typical rugose seafloor in the area is presented in Figure 1, where canyons are carved about 300m into the continental slope.

The first 3D narrow-azimuth (NAZ) streamer survey in this area was acquired and processed with a conventional ray-based tomography model building workflow for PSDM in 2008. Imaging suffers from structure distortions below seabed canyons down to the Cretaceous at around 3 km depth. In 2010, a new 3D NAZ streamer survey was acquired. This data was processed in 2011-2012 using a workflow of layer-constrained HR tomography with structurally-guided weighting. Details of the workflow can be found in Chen and Shen (2012). Compared to the original imaging, the 2012 workflow was able to greatly reduce sub-canyon image distortions. However, with this method, a lot of labor-intensive effort and attention was required during each tomography iteration to evaluate the stack image in order to identify potential non-geologic artifacts in the velocity model. Despite the good effort in the 2011-2012 work, there were still residual image distortions in the deeper part of the section because we didn’t fully resolve velocity anomalies in the shallow overburden, as proved by well data.

Due to FWI’s ability to provide a high-resolution velocity model in the shallow overburden, it was designed to be the main part of a 2015 re-imaging effort to update the velocity model using the 2010 NAZ dataset. We utilized an acoustic, finite-difference, time-domain FWI for the study (Ratcliffe et al., 2011). The velocity update is primarily driven by refraction energy. The smoothed 2012 model served as the input model for the 2015 FWI. 48 iterations of FWI were run from 5 to 10Hz in order to minimize the misfit of phase between synthetic shot gathers and real shot gathers. After reaching a good match in the data domain, Kirchhoff PSDM (KPSDM) was run to evaluate the FWI updated model in the image and gather domains. As we’ll show in the next section, most of the non-geological structural undulations were removed from the stack, but some small residuals remained. We then applied additional iterations of dip-constrained non-linear tomography. RMO and dip fields on near, middle, and far stacks were picked for dip-constrained tomography. A model was derived to flatten the gathers as well as minimize the difference between offset-dependent dips and a reference dip. This final model healed all residual non-geological post-FWI structural undulations.

Results and analysis

The top panel is modeled data with the initial model; the middle panel is modeled data with the FWI updated model, and the bottom panel shows the recorded field data. The initial model provides a good match for near-water-bottom events, but a poor match for deeper events. A better match with the real data is seen after the FWI update, which indicates that it worked as expected.

KPSDM results with the initial model and the FWI updated model are shown in Figure 3 (a) and (b). The “pull-up” and “push-down” structure associated with the canyon shape is clearly observed in the initial model image from ~1,400m to ~3,000m. The FWI model fixes the “pull-up” pointed out by the arrows; however, some short-wavelength undulations still remain. Multiple reasons could explain why FWI could not fully resolve the canyon-related velocity issues. First, the strong feathering of the NAZ acquisition, without good uniform source-receiver patterns, could be an issue. Second, no usable signal could be extracted below 5Hz from the NAZ data, while a lower frequency is usually important for FWI to avoid cycle skipping. Third, the acquisition direction, which is parallel to the canyon direction (dip to structure), may also play a role in preventing FWI from resolving a perfect model. Besides the data limitations, the FWI algorithm can also suffer from possible anisotropy and density leakage. In order to completely fix the image distortions in the deeper section, subsequent ray-based tomography is applied. Dip fields are picked on the stack and reference structural dips are created by smoothing out these small undulations. After dip-constrained tomography, we obtained the 2015 final model. Figure 3 (c) is the KPSDM QC of the 2015 model. It shows smooth sediment layers in the entire section. For a fair comparison, we generated a KPSDM stack using the 2012 final model, as shown in Figure 3 (d). The 2012 model also shows a good fix at shallow depth. However, in the deeper section, a mild residual can be seen below 2,600m. From depth slice QC’s, this observation is even more apparent. Figure 4 compares depth slice images at 1,500m and 2,100m for the initial, the 2012 and the 2015 final models. In the initial model image, jitters caused by the seafloor canyons exist everywhere and become milder at greater depth. The 2012 image did a good job at reducing most of the distortions, but some small residuals can still be seen on the 2,100m depth slice. In the 2015 image, all sediment contours are smooth without distortion.

Gather domain QC’s provide additional support for the 2015 model. Figure 5 presents the three models overlaid on their corresponding KPSDM stack with CIG gathers on the right side. The initial model is a smoothed velocity field
without any lateral changes. The gathers show quite large
RMO. Far offsets (~2,000m) quickly stretch out or
Since FWI attempts to match modeled and recorded shots, disappear in this area, which is not ideal for ray-base
we performed data domain QC first to validate the FWI tomography. The speed up and slow down associated with
update. Near/mid/far channels are evaluated in figure 2. canyon flanks and valleys are captured in the 2012 and
2015 final models. The latter puts a smoother velocity inside canyon flanks (black circle) and a stronger velocity variation around 1,500m. The oscillating velocities healed the imaging undulations and minimized gather RMO. The 2015 final image provided a higher level of confidence in the interpretation results. The image depthing accuracy was confirmed by a well drilled shortly after the processing. Other products based on the seismic, like seismic inversion, also benefited greatly, showing better continuity and a better match with well data. By resolving the effects of seabed canyons, we were able to reduce the uncertainty on the placement of horizontal wells in thin and low dip-angle reservoirs in the field.

Figure 1 Campos Basin Location Map and water bottom map of the study area.
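The FWI stage of this workflow minimizes the phase misfit between synthetic and recorded shot gathers over 5 to 10 Hz. A minimal sketch of such a phase-only objective is given below, assuming a plain sum of squared, wrapped phase differences per frequency. The function names, the naive DFT and the Ricker test wavelets are illustrative assumptions and are not taken from the implementation used in the study.

```python
import cmath
import math


def dft_bin(trace, k):
    """Single bin of a naive discrete Fourier transform (short traces only)."""
    n = len(trace)
    return sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(trace))


def phase_misfit(observed, modeled, dt, f_min=5.0, f_max=10.0):
    """Sum of squared wrapped phase differences over [f_min, f_max] Hz.

    Amplitudes are ignored: only the phase of the cross-spectrum is used.
    """
    n = len(observed)
    eps = 1e-9  # tolerance so band edges are not lost to rounding
    misfit = 0.0
    for k in range(1, n // 2):
        freq = k / (n * dt)
        if f_min - eps <= freq <= f_max + eps:
            cross = dft_bin(modeled, k) * dft_bin(observed, k).conjugate()
            # phase(modeled) - phase(observed), wrapped to (-pi, pi]
            misfit += cmath.phase(cross) ** 2
    return misfit


def ricker(t, f_peak, t0):
    """Ricker wavelet centred at t0 (hypothetical test signal)."""
    a = (math.pi * f_peak * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)


# Two wavelets that differ only by a 10 ms time shift.
dt, n = 0.004, 250
observed = [ricker(i * dt, 8.0, 0.50) for i in range(n)]
modeled = [ricker(i * dt, 8.0, 0.51) for i in range(n)]
print(phase_misfit(observed, observed, dt))  # identical traces: zero misfit
print(phase_misfit(observed, modeled, dt))   # grows with the time shift
```

Because the phase is wrapped, a time shift larger than half a period at a given frequency would alias. This is one way to see why the absence of usable signal below 5 Hz matters for cycle skipping.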
Figure 2 On the left, near/mid/far channels (a) modeled with initial smooth model, (b) modeled with FWI update model and (c) recorded. On the right, the recorded shot is presented with wiggles and the modeled shot is overlaid with positive amplitude in red and negative amplitude in blue. Better blue-to-trough and red-to-peak alignment indicates better matching between the modeled and recorded shots. (d) Modeled shot with initial model overlaid on recorded shot gather; (e) modeled shot with FWI model overlaid on recorded shot gather; (f) recorded shot gather.

Conclusions

We presented a model building workflow combining FWI and ray-based dip-constrained tomography to solve for an accurate model in the complex geological setting of seabed canyons. FWI has an advantage over conventional ray-based tomography, which can break down without good picks and cannot handle multi-pathing. Although FWI alone did not fully resolve the problem with this NAZ data, it fixes large image distortions and provides a better starting point for the ray-based tomographic update. Dip-constrained tomography further resolves the residual undulations using RMO and offset-dependent dips. This workflow is effective at automatically resolving image distortions down to target level without manually pre-setting layers or regions for the tomographic update. With the proposed workflow, we efficiently achieved a high-resolution model that is geologically conformable and geomechanically meaningful.
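The dip-constrained tomography step seeks a model that flattens the gathers while minimizing the difference between offset-dependent dips and a reference dip. One way to picture that objective is as a two-term cost. The sketch below assumes a plain least-squares form and a hypothetical trade-off weight (`dip_weight`); neither is specified in the text, so this is an illustration rather than the cost function actually used.

```python
def dip_constrained_cost(rmo, offset_dips, reference_dips, dip_weight=1.0):
    """Two-term cost: gather flatness plus dip consistency across partial stacks.

    rmo            -- residual moveout picks, one value per picked event
    offset_dips    -- dict mapping a partial stack name ("near", "mid", "far")
                      to the dips picked on that stack
    reference_dips -- smoothed reference dips, same length as each dip list
    dip_weight     -- hypothetical weight balancing the two terms
    """
    # Flat gathers have zero residual moveout.
    flatness = sum(r ** 2 for r in rmo)
    # Dips on every partial stack should agree with the reference dip.
    dip_term = 0.0
    for dips in offset_dips.values():
        dip_term += sum((d - ref) ** 2 for d, ref in zip(dips, reference_dips))
    return flatness + dip_weight * dip_term


picks = {"near": [0.10, 0.20], "mid": [0.00, 0.00], "far": [0.00, 0.00]}
print(dip_constrained_cost([3.0, 4.0], picks, [0.0, 0.0], dip_weight=2.0))
```

A model update that lowers this cost reduces RMO and removes dip differences between near, middle and far stacks at the same time, which is the behavior described for the post-FWI tomography.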
Acknowledgements
The authors thank Chevron and CGG for their support for
this project.
Additional acknowledgements:
Chris Manuel, Chevron
Vanessa Brown, Chevron
Figure 3 Kirchhoff PSDM stacks for (a) initial model (b) FWI update model (c) 2015 final model (d) 2012 final model.

Figure 4 Depth slice at 1,500m (left) and 2,100m (right) for (a) initial model (b) 2012 final model and (c) 2015 final model.

Figure 5 Model overlaid on PSDM stack for (a) initial model (b) 2012 final model and (c) 2015 final model. Gathers at the selected location are shown on the right side.

EDITED REFERENCES
REFERENCES
Arnaud, J., S. Hollingworth, A. Woodcock, L. Ben-Brahim, C. Tindle, and S. Varley, 2008, Water bottom
channels’ impact on pre-stack depth migration on Elgin field, North Sea: 70th EAGE Conference
and Exhibition incorporating SPE EUROPEC 2008, http://dx.doi.org/10.3997/2214-4609.20147984.
Berryhill, J., 1986, Submarine canyons: Velocity replacement by wave-equation datuming before stack:
Geophysics, 51, 1572–1579, http://dx.doi.org/10.1190/1.1442207.
Birdus, S., 2009, Geomechanical modeling to resolve velocity anomalies and image distortions below
seafloor with complex topography: 71st Annual International Conference and Exhibition, EAGE,
Extended Abstracts, U014.
Chen, G., and H. Shen, 2012, High-resolution tomography for paleo-canyons: a case study in Para-
Maranhao Basin, Brazil: 82nd Annual International Meeting, SEG, Expanded Abstracts,
Chen, G., and L. Hu, 2014, Dip-constrained tomography with weighting flow for paleo-canyons: a case
study in Para-Maranhao Basin, Brazil: 84th Annual International Meeting, SEG, Expanded
Abstracts, http://dx.doi.org/10.1190/segam2014-0513.1.
Debenham, H., and S. Westlake, 2014, Pre-stack depth migration for improved imaging under seafloor
canyons: 2D case study of Browse Basin, Australia: Exploration Geophysics, 45, 216–222,
http://dx.doi.org/10.1071/EG12085.
Deeks, J., and D. Lumley, 2015, Prism waves in seafloor canyons and their effects on seismic imaging:
Geophysics, 80, S213–S222, http://dx.doi.org/10.1190/geo2015-0014.1.
Dent, B., 1983, Compensation of marine seismic data for the effects of highly variable water depth using
ray-trace modeling, a case history: Geophysics, 48, 910–933,
http://dx.doi.org/10.1190/1.1441519.
Fruehn, J., V. Valler, S. Greenwood, S. Phani, S. Sarkar, C. G. Rao, P. Kumar, and P. Routray, 2015,
Velocity model update via inversion of non-parametric RMO picks over canyon areas offshore
Sri Lanka: 14th International Congress of the Brazilian Geophysical Society and EXPOGEF,
1063–1066, http://dx.doi.org/10.1190/sbgf2015-211.
Graham, C., and L. Richard, 2009, Channel tomography: 71st Annual International Conference and
Exhibition, EAGE, Extended Abstracts.
Guillaume, P., M. Reinier, G. Lambaré, A. Cavalié, M. Adamsen, and B. Bruun, 2013, Dip-constrained
non-linear slope tomography: an application to shallow channel characterization: 75th Annual
International Conference and Exhibition, EAGE, Extended Abstracts, 1587–1591.
Sun, Y., Q. Guo, S. Carroll, J. Chen, and E. Liebes, 2011, A high-resolution velocity anisotropy case
study: 81st Annual International Meeting, SEG, Expanded Abstracts, 3918–3922,
http://dx.doi.org/10.1190/1.3628023.

A refraction/reflection full-waveform inversion case study from North West Shelf offshore Australia
David Dickinson* (Woodside Energy Ltd), Fabio Mancini (Woodside Energy Ltd), Xiang Li (CGG), Kai Zhao
(CGG) and Shuo Ji (CGG)
Summary

The robustness of diving wave Full-Waveform Inversion (FWI) has been proven in industry, but its effectiveness is limited by its penetration depth. To target deeper reservoirs, the application of FWI using reflection energy is necessary. This paper presents a real data 25Hz VTI FWI case study from the North-West Shelf (NWS), Australia, utilizing the full wave-field. Starting from a high-quality reflection tomography VTI model, a top-down approach has been adopted. Diving wave FWI updates the shallow section, then reflection FWI is introduced to further update the deeper section. The updated FWI model demonstrates significant uplifts in resolution and conformance with the underlying geology. Two promising aspects can be observed: (1) solid uplifts in mitigating the imaging challenges: FWI reduces wave-field distortions and leads to overall improved focusing, gather flatness, continuity, and better positioning in depth; and (2) geological features uncovered beyond imaging: high-resolution FWI delineates small shallow anomalies and velocity boundaries across faults, and reveals the strong acoustic impedance contrasts at reservoir level. It demonstrates that FWI can aid both in reducing the velocity uncertainty and in understanding the underlying geological formation.

Introduction

In the past decade, FWI has gained wide acceptance and gradually become a standard step in velocity model building. Recently, there have been more efforts towards higher frequency FWI, and direct interpretation on high-resolution FWI output models (Lu et al., 2016 and Mancini et al., 2016). The robustness of Diving Wave FWI (DW-FWI) for shallow velocities is well accepted. Privitera et al. (EAGE, 2016) illustrated in their offshore Gabon case study how DW-FWI can help identify shallow gas pockets and dewatered faults. However, the effectiveness of DW-FWI for deeper sections is limited by the penetration depth. Reflection FWI, on the contrary, has no penetration depth limitation, though the limited angle makes it less robust, so a more accurate input model is usually required. Using reflection FWI for the deeper update is now increasingly important. Recently Mancini et al. (TLE, 2016), in their 30 Hz 2D FWI trial utilizing both refraction and reflection energies, demonstrated FWI’s potential in providing models that aid direct interpretation even for deeper sections.

In this paper, we use a real 3D data example (with a 7km cable) from the Northern Carnarvon Basin (Western Australia) to evaluate FWI’s business impact: what value can high-resolution FWI bring us at the reservoir level?

Geology background

The survey is located in an area with a rugose water bottom and large sea floor depth variations (200 to 1200m, Figure 1). The field lies beneath the continental slope and the target is a gas-bearing Triassic reservoir from 3 to 4km depth. Local geology challenges include a large tilted faulting system and complex shallow overburden. Numerous seafloor canyons and escarpments make it quite a challenge for both velocity model building and imaging at reservoir level. The distorted wave-fields in turn reduce the confidence of the underlying target interpretations due to structural uncertainty and amplitude distortions.

Figure 1: Water Bottom and Location Map

Method: FWI on top of Tomography, with a layer-stripping scheme

In our case a top-down layer-stripping approach is adopted: 1) Starting from a high-quality PreSDM tomography model, DW-FWI is carried out first, limited to relatively shallow updates (up to 1.7km); 2) Reflection FWI is then introduced to update from the shallow to the deeper section. Both DW-FWI and reflection FWI iterate from low frequency (4Hz) to high frequency, up to 25Hz. The FWI updates go deeper than 4km, completely covering the reservoir interval.
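The layer-stripping scheme described in the Method section (diving-wave updates limited to about 1.7km, then reflection updates, each sweeping from 4Hz up to 25Hz) can be written down as a simple stage plan. The sketch below is only an illustration: the geometric spacing of the frequency bands, the number of bands, and the names `frequency_schedule` and `build_stages` are assumptions, since the text states only the 4Hz start and the 25Hz end point.

```python
def frequency_schedule(f_min=4.0, f_max=25.0, n_bands=5):
    """Geometrically spaced maximum frequencies (Hz) from f_min to f_max."""
    ratio = (f_max / f_min) ** (1.0 / (n_bands - 1))
    return [round(f_min * ratio ** i, 2) for i in range(n_bands)]


def build_stages(bands, diving_max_depth_km=1.7):
    """Top-down plan: all diving-wave stages first, then all reflection stages.

    Each mode sweeps the bands from low to high frequency. A depth limit of
    None means the update is not restricted in depth.
    """
    stages = []
    for mode, depth in (("diving", diving_max_depth_km), ("reflection", None)):
        for f in bands:
            stages.append({"mode": mode, "max_freq_hz": f, "max_depth_km": depth})
    return stages


bands = frequency_schedule()
for stage in build_stages(bands):
    print(stage)
```

Running the diving-wave stages to completion before any reflection stage mirrors the top-down order in the text: the shallow model is settled before reflections are asked to constrain the deeper section.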
The starting model is of very high quality with good spatial and vertical resolution. After more than 10 iterations, it matches the low-frequency components of the sonic logs well (Figure 2, Figure 6), captures major velocity variations conforming to geology (Figure 3), and overall the corresponding migration gathers are fairly flat. Although it is a state-of-the-art reflection tomography flow, the vertical resolution declines with depth (due to the decreasing subsurface reflection angle). At reservoir level, the tomography velocity has a vertical resolution of ~500m, but well data indicate the existence of important shorter wavelength velocity variations (Figure 2, Figure 5). The missing high frequencies in the velocity model, in both shallow and deep sections, lead to noticeable wave-field distortions in both (Kirchhoff) migrated stacks and gathers (Figures 4(a) and 4(b)). Small perturbations in the common image gathers (CIGs) further indicate velocity anomalies of small horizontal dimensions (relative to max offset), and the resultant small pull-ups and push-downs in the stack at reservoir level can be observed. Knowing it would be difficult to push ray-based tomography further, FWI is introduced. During FWI updates, the anisotropy parameters δ and ε are kept unchanged. Diving wave analysis shows that refraction energy illumination is limited to a maximum depth of ~1.7km. Refraction energy is introduced first to update the complex shallow area; reflections are then employed to further update the deeper section where refraction could not reach.

Final results

By incorporating both refraction and reflection data, our FWI yields fairly robust results. In Figure 2, the velocity profiles indicate that 7Hz FWI already catches variations of wavelengths ~150m, and the following 25Hz FWI yields even higher resolution, reaching wavelengths ~50m, matching the wells nicely (Figure 2). The velocity evolution is illustrated in Figure 3. Starting from a top quality tomography model, FWI gradually reveals more velocity details hidden in the recorded wave-fields. In Figure 5, the 3D FWI model follows the geologic structure closely in the complex overburden and delineates small geo-bodies, channels and faults clearly. Figures 4(b) and 4(d) further demonstrate in the CIG domain how FWI mitigates the imprints of missing high-frequency components in the input model at reservoir level: perturbations in the gathers have been greatly reduced and focusing has been improved. FWI also helps to improve the focusing in the gas reservoir beneath the unconformity (Figures 4(a) and 4(c), ~3.2km), as the updated model has better resolved the complex structures above the reservoir.

The FWI model itself reveals key reservoir characteristics. Important impedance contrasts can be observed in the high-resolution model (Figure 3, Figure 5), as well as potential indications of gas-water contacts. Additional sonic log comparisons at other well locations (Figure 6) within the survey further enhance confidence in FWI’s detailed delineation of stratigraphy at the reservoir level (Figure 5).

Discussion: Factors affecting FWI applications

In FWI applications, certain factors may affect the accuracy of the output velocity model besides the well-known signal-to-noise ratio: 1) Complex density and anisotropy variations: In this case, density was derived via Gardner’s law and low-frequency background anisotropy (VTI) models were employed so as to only update the P velocity. When we push toward higher frequencies for fine details, those assumptions might no longer hold, because localized density and anisotropy variations could have impacts of about the same magnitude as the high-frequency velocity components, so detailed density and anisotropic models might be required. 2) Seismic attenuation (Q): Similar to the discussion above, when we pursue higher resolution, detailed Q modeling might be needed. Even if the Q field is indeed smooth, Q-FWI is likely needed, especially in the shallow water environment.

Conclusions

Starting from a good quality tomography VTI model, our 25 Hz FWI, by utilizing both refraction and reflection energies in a top-down layer-stripping manner, produces a high-resolution model that conforms well with geology. For depth imaging: 1) FWI resolves the gather distortions due to lateral velocity variations from shallow to deep; 2) Events at reservoir level are overall geologically flatter. At the same time, the resultant FWI model significantly improves the velocity resolution at reservoir level, closely matches the sonic logs, and uncovers geological features beyond imaging: high-resolution FWI delineates small shallow anomalies and velocity boundaries across faults, and reveals the strong acoustic impedance contrasts at the reservoir level. It demonstrates that FWI can aid both in reducing the velocity uncertainty and in providing a geological interpretation.

Acknowledgments

We thank our joint venture participants for permission to present this work. We thank Sergey Birdus and Alexey Artemov of CGG for the model building work, Todd Mojesky of CGG for reviewing, and the CGG R&D team for technical support.

Figure 2: Frequency-evolution FWI comparison - Sonic log QC (PSDM model, 7Hz FWI model and 25Hz FWI model).

Figure 3: Velocity model panels. (a) Final PSDM model. (b) and (c) are the 7Hz and 25Hz FWI results.

Figure 4: Kirchhoff PSDM stack and CDP gathers. (a) Full angle stack and (b) CDP gathers using the tomography model (FWI input). (c) Full angle stack and (d) CDP gathers using the 25Hz FWI final model.

Figure 5: 3D model view: (a) Initial tomography model. (b) 25Hz FWI final model.

Figure 6: Sonic log comparisons for four separate wells (PSDM model and FWI model).
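The Discussion notes that density was tied to P velocity through Gardner's law so that only velocity was updated. As an illustration, the sketch below uses the commonly quoted Gardner coefficients (0.31 and 0.25, for velocity in m/s and density in g/cm^3). The coefficients actually used in this study are not stated, so they are assumptions here, as is the function name.

```python
def gardner_density(vp_m_per_s, a=0.31, b=0.25):
    """Bulk density (g/cm^3) from P velocity (m/s) via rho = a * Vp**b.

    The default a and b are the widely quoted values for Vp in m/s;
    they are assumed here, not taken from the study.
    """
    return a * vp_m_per_s ** b


# Density implied across the velocity range shown in the model panels.
for vp in (1500.0, 2000.0, 3000.0, 4000.0):
    print(vp, round(gardner_density(vp), 3))
```

Because the exponent is small, density varies slowly with velocity under this relation. A localized density contrast that departs from the trend is therefore not represented, which is the limitation the Discussion raises for high-frequency FWI.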

EDITED REFERENCES
REFERENCES
Lambare, G., P. Guillaume, and J. P. Montel, 2014, Recent advances in ray-based tomography: 76th
Annual International Conference and Exhibition, EAGE, Extended Abstracts,
http://doi.org/10.3997/2214-4609.20141151.
Lu, R., S. Lazaratos, S. Hughes, and D. Leslie, 2016, 86th Annual International Meeting, SEG, Expanded
Abstracts, 1242–1246.
Mancini, F., K. Prindle, T. Ridsdill-Smith, and J. Moss, 2016, Full-waveform inversion as a game
changer: Are we there yet?: The Leading Edge, 35, 445–451,
http://doi.org/10.1190/tle35050445.1.
Privitera, A., A. Ratcliffe, and N. Kotova, 2016, A full-waveform inversion case study from offshore
Gabon: 78th Annual International Conference and Exhibition, EAGE, Extended Abstracts,
http://doi.org/10.3997/2214-4609.201600830.

Improved seismic imaging of fault shadows and shallow gas via multi-parameter Full-Wavefield Inversion: A Niger Delta Case Study
Gboyega Ayeni, Rishi Bansal, Jaewoo Park, Jacob Violet, Spyros Lazaratos, Rongrong Lu, Eric Wildermuth, Neelamani Ramesh (ExxonMobil Upstream Research Company), Michael Gurch and Gary Lewis (ExxonMobil Exploration Company (Div. of Exxon Mobil Corporation))
Summary

We demonstrate improved seismic imaging derived from multi-parameter FWI application to an offshore field that is impacted by fault shadows and shallow gas anomalies. Due to these imaging challenges, conventional tomographic model building methods are inadequate. However, high-resolution models derived from FWI adequately define the strong lateral subsurface heterogeneities which are associated with fault shadows and which are present in the vicinity of shallow gas anomalies. The study area, located offshore Niger Delta, presents both these challenges and we show that multi-parameter FWI overcomes them. Furthermore, the highly resolved gas anomalies derived through FWI enable estimation of and compensation for attenuation.

Introduction

Full Wavefield Inversion (FWI) utilizes kinematic and amplitude information in recorded seismic data to estimate high-resolution models of subsurface properties (Tarantola, 1984; Pratt et al., 1998; Virieux and Operto, 2009). In many cases, FWI models (which produce synthetic seismograms that match the recorded data) can provide improved imaging of the recorded seismic data and can be interpreted directly. Several published applications demonstrate that FWI technology can overcome challenges that exist in conventional model building techniques (Sirgue et al., 2009, Warner et al., 2010, Lu et al., 2013).

One common imaging challenge is the presence of so-called “fault shadows” in seismic images. Fault shadows can be caused by an inaccurate estimate of the strong velocity heterogeneities across faults (Fagin, 1996). Such heterogeneities may result from changes in lithology or pressure regimes across faults. Although there are other factors such as illumination or data pre-processing which contribute to the existence of fault shadows in seismic images, in this paper, we consider only the effects associated with model inaccuracies. The presence of gas clouds in the shallow subsurface above targets is also a recurring problem in seismic imaging. A good estimate of seismic velocities in and around gas clouds is required in order to obtain reliable seismic images below such accumulations. Furthermore, the attenuative properties of these gas accumulations commonly necessitate estimation and compensation in order to reduce the associated seismic image deterioration. FWI provides high-resolution models of subsurface heterogeneities associated with fault shadows and shallow gas accumulations. In regions impacted by fault shadows or shallow gas anomalies, such FWI models can provide improved imaging relative to conventional model building methods. Furthermore, highly-resolved models facilitate estimation of and compensation for seismic attenuation caused by shallow gas anomalies. This case study demonstrates improvements in seismic imaging in regions impacted by fault shadow and shallow gas anomalies through the application of multi-parameter Full Wavefield Inversion.

Case Study

The study area is located in the Niger Delta at a water depth of approximately 20m (Figure 1). The goal of this study is to improve seismic imaging within fault shadow regions and beneath shallow gas anomalies. The prevalence of these imaging challenges in this region has been discussed by previous authors (Reilly and Edoziem, 1998, Aikulola et al., 2010). The input data comprise the raw hydrophone component of a 4C narrow-azimuth Ocean Bottom Cable (OBC) survey. The nominal inline and crossline offsets are approximately 8 km and 400 m, respectively.

Figure 1: Map of Offshore Niger-Delta showing location of the study area

We follow a multi-scale strategy (Bunks et al., 1995), by first inverting the lowest frequencies and then progressively increasing the bandwidth of the input data (up to 15Hz), updating vertical velocity and anisotropy. Figure 2 (a) shows the starting vertical velocity model which was derived from reflection tomography. Figure 2 (b) shows the
corresponding model obtained by FWI. Reflection tomography fails to resolve the low velocities associated with the shallow gas anomaly, whereas FWI provides a reliable definition of the shallow gas accumulation.

Furthermore, FWI captures the strong lateral velocity change across the major graben faults. In addition, the FWI model provides a high-resolution definition of potential drilling hazards. The high-resolution FWI model also facilitates estimation of a Q-model which compensates for seismic energy loss due to attenuation of seismic waves propagating through the gas. Figure 3 shows a section and map view of the inverse Q distribution overlain on the corresponding Kirchhoff PSDM images.

Figure 4 shows the migrated images derived from the conventional reflection tomography and from FWI. On the footwall of the major fault, note that reflector sags are reduced and imaging of deeper targets is improved. In addition, imaging of the faults and reflections within the graben is improved in the FWI image. FWI image gathers show significant improvements over the initial model (Figure 5), thereby providing improvements in the stacked image. Furthermore, simulated data through the initial and final models show that FWI provides an improved match of the kinematics in the field data (Figure 6).
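The case study uses the FWI model to estimate a Q-model that compensates for energy lost in the gas. The sketch below illustrates the standard constant-Q amplitude decay, exp(-pi f t / Q), accumulated along a path through cells with different inverse Q, and the matching compensation gain. The path discretization, the gain cap (`max_gain`) and the function names are assumptions for illustration and do not describe the compensation actually applied in the study.

```python
import math


def attenuation_factor(freq_hz, traveltimes_s, inv_q):
    """Amplitude decay exp(-pi * f * sum(dt_i / Q_i)) along a path.

    traveltimes_s -- time spent in each cell along the path (s)
    inv_q         -- inverse Q of each cell (0.0 means no attenuation)
    """
    t_star = sum(dt * iq for dt, iq in zip(traveltimes_s, inv_q))
    return math.exp(-math.pi * freq_hz * t_star)


def compensation_gain(freq_hz, traveltimes_s, inv_q, max_gain=10.0):
    """Inverse of the decay, clipped so noise is not amplified without bound."""
    return min(1.0 / attenuation_factor(freq_hz, traveltimes_s, inv_q), max_gain)


# A path spending 0.2 s in a gas body with Q = 30 and 0.8 s in background rock.
times = [0.2, 0.8]
inverse_q = [1.0 / 30.0, 0.0]
for f in (5.0, 15.0):
    print(f, attenuation_factor(f, times, inverse_q), compensation_gain(f, times, inverse_q))
```

The decay grows with frequency, so an uncompensated image below a gas body loses high frequencies first. This is consistent with the text's point that a well-resolved gas anomaly is needed before compensation can be applied where it belongs.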
Figure 2: (a) Initial and (b) FWI vertical velocity models.

Figure 3: Inverse Q anomalies associated with gas accumulations within the study area. (a) N-S cross-section through the study area, and (b) depth slice highlighted with dashed line in (a).

Conclusions

Multi-parameter FWI provides high-resolution models that define the strong heterogeneities associated with fault shadow and shallow gas accumulations. Furthermore, such highly-resolved models facilitate estimation and compensation for seismic attenuation caused by shallow gas anomalies. Conventional methods may not sufficiently address these challenges. This case study shows that where such challenges exist, multi-parameter FWI can provide significant imaging improvements.

Acknowledgements

We thank the joint venture partners Nigerian National Petroleum Corporation (NNPC) and Mobil Producing Nigeria Unlimited (MPN) for permission to show these results. We thank Eric Wildermuth, Tim Garfield and Carey Marcinkovich for technical insights and ExxonMobil Upstream Research Company for permission to publish.

Figure 4: Migrated images derived from (a) Conventional PSDM and (b) FWI. Note that FWI provides improved imaging within the graben and across the faults.
Figure 5: Kirchhoff image gathers derived from (a) Initial Model and (b) FWI Model. Note the improved gather flatness derived
from the high-resolution FWI model
Figure 6: Shot records from (a) forward simulation through initial model (b) forward simulation through FWI model and (c) field
data. Note that timing errors (indicated by the arrows) which are present in the simulated data through the initial model have been
corrected by FWI.

EDITED REFERENCES
REFERENCES
Aikulola, U. O., S. O. Olotu, and I. Yamusa, 2010, Investigating fault shadows in the Niger Delta: The
Leading Edge, 29, 16–22, http://doi.org/10.1190/1.3284048.
Fagin, S., 1996, The fault shadow problem: Its nature and elimination: The Leading Edge, 15, 1005–
1013, http://doi.org/10.1190/1.1437403.
Lu, R., S. Lazaratos, K. Wang, Y. H. Cha, I. Chikichev, and R. Prosser, 2013, High-resolution elastic
FWI for reservoir characterization: 75th Annual International Conference and Exhibition, EAGE,
Extended Abstracts, Th 10–02, http://doi.org/10.3997/2214-4609.20130113.
Mora, P., 1987, Nonlinear two-dimensional elastic inversion of multioffset seismic data: Geophysics, 52,
1211–1228, http://doi.org/10.1190/1.1442384.
Pratt, R. G., C. Shin, and G. J. Hicks, 1998, Gauss-Newton and full Newton methods in frequency-space
http://doi.org/10.1046/j.1365-246X.1998.00498.x.
Reilly, J., and C. Edoziem E., 1998, Seeing below gas seepage, in P. Shultz ed., The seismic velocity
model as an interpretation asset (No. 2): SEG.
Sirgue, L., O. I. Barkved, J. P. Van Gestel, O. J. Askim, and J. H. Kommedal, 2009, 3D waveform
inversion on Valhall wide-azimuth OBC: 71st Annual International Conference and
Exhibition, EAGE, Extended Abstracts, U038, http://doi.org/10.3997/2214-4609.201400395.
Tarantola, A., 1984, Inversion of seismic-reflection data in the acoustic approximation: Geophysics, 49,
1259–1266, http://doi.org/10.1190/1.1441754.
Warner, M., A. Ratcliffe, T. Nangoo, J. Morgan, A. Umpleby, N. Shah, V. Vinje, I. Štekl, L. Guasch, C.
Win, G. Conroy and A. Bertrand, 2013, Anisotropic 3D full-waveform inversion: Geophysics, 78,
no. 2, R59–R80, http://doi.org/10.1190/geo2012-0338.1.

Full-bandwidth AWI at the reservoir
Full-bandwidth adaptive waveform inversion at the reservoir

Henry A. Debens* & Fabio Mancini, Woodside Energy Ltd; Mike Warner & Lluís Guasch, S-CUBE London
Summary minima in FWI, for example the misidentification of

multiples as primaries or the misidentification of one branch
Adaptive waveform inversion (AWI) is one of a new breed of a multi-pathed arrival with another. Some of these non-
of full-waveform inversion (FWI) algorithms that seek to cycle-skipped causes of local minima become more likely at
mitigate the effects of cycle skipping (Warner & Guasch, higher frequencies.
2016). The phenomenon of cycle skipping is inherent to the
classical formulation of FWI, owing to the manner in which Given these characteristics, it is not immediately clear that
it tries to minimize the difference between oscillatory the multiscale approach commonly applied in conventional
signals. AWI avoids this by instead seeking to drive the ratio FWI (Bunks et al., 1995) will still provide the most effective
of the Fourier transform of the same signals to unity. One of strategy for AWI. Its use in FWI is essential to avoid cycle
the strategies most widely employed by FWI practitioners skipping unless the starting model is extremely accurate, but
when trying to overcome cycle skipping, is to introduce in AWI cycle skipping is no longer a consideration. This
progressively the more nonlinear components of the data, raises the question as to whether traditional approaches that
referred to as multiscale inversion. Since AWI is insensitive move from low to high frequency during FWI are still
to cycle skipping, we assess here whether this multiscale appropriate for AWI. If it is the case that the multiscale
approach still provides an appropriate strategy for AWI. approach is no longer required, then this reduces the
requirement to capture ultra-low frequency data for
Introduction waveform inversion, and can avoid the cost of ensuring that
this portion of the data is clean.
Full-waveform inversion is now considered by many to be a
routine tool for exploration and development (e.g. Mancini Field data example
et al., 2015). This is because of FWI’s ability to generate
high-resolution high-fidelity models of subsurface To explore the behavior of AWI with respect to inversion
properties, principally acoustic velocity, notwithstanding the bandwidth, we designed a series of experiments using a
large computational costs associated where runtimes subset of narrow azimuth towed-streamer data collected over
increase as the fourth power of the maximum frequency. the Rakhine Basin, offshore Myanmar. This dataset,
acquired in 2014, used flip-flop air gun arrays, deployed
FWI is not however without its limitations. Classical FWI 50 m apart, fired sequentially every 50 m, into ten 7-km
seeks to minimize, in a least-squares sense, the difference cables towed at 15 m depth. The resultant data were high
between observed and modeled seismic data (Tarantola, density and of high quality, with good low-frequency
1984). Because seismic data are band-limited and content down to the hydrophone low-cut filter at 3 Hz. The
oscillatory, the sum of the squares of their differences will target was an accumulation of biogenic gas within a tertiary
pass through a local minimum whenever one dataset is clastic reservoir thought to have been deposited by the
shifted in time by an integer number of cycles with respect Ganges-Brahmaputra system (Harrowfield, 2015).
to the other. The resultant phenomenon of cycle skipping is
one of the principal roadblocks to FWI’s widespread The data were minimally preprocessed for waveform
application to seismic data. inversion. Swell noise was removed, and incoherent and
linear noise were filtered from the low-frequency
AWI reformulates the inversion problem so that it seeks not component. The data were then low-pass filtered and
to drive the difference of the two datasets to zero but instead decimated in the receiver domain, before being muted ahead
seeks to drive their ratio to unity. In practice, this ratio is of the first arrivals. For FWI, reflection events arriving after
formulated in the frequency domain where AWI then 6 s were muted; for AWI no such bottom mute was applied.
becomes equivalent to designing a Wiener filter that matches From one of these preprocessed sail lines, which passed
one dataset to the other, and the inversion seeks to drive this directly over the target and near to a recent exploration
filter towards a unit-amplitude, zero-lag, band-limited, delta wellbore, a single 2D gun-cable combination was extracted.
function. This formulation does not pass through a local The cable was feathered by up to 8°.
minimum when the two datasets differ by an integer number
of wave cycles, and so it is entirely unaffected by cycle A source wavelet was derived from numerical modeling of
skipping. the air gun array, corrected in phase and amplitude for 2D
propagation. The initial velocity model was based on an
Although it is able to circumvent cycle skipping, AWI early iteration of reflection traveltime tomography, scaled to
possesses no special immunity to other causes of local checkshot data from the nearby well. The position and
© 2017 SEG Page 1378

reflectivity of the water bottom was refined by modeling synthetic gathers at regular intervals. A VTI anisotropy model was derived using residual moveout on Kirchhoff offset-domain common-image gathers (ODCIGs) migrated using the initial model, and the depth to a marker horizon in these ODCIGs relative to the nearby well.

Results

The first experiment was to compare the results of AWI, undertaken without using multiscale inversion, when using different bandwidths of input data. This was done to assess the frequency at which full-bandwidth AWI breaks down, and to determine whether different bandwidths produce comparable results. Figure 1 shows the results of four AWI runs using input data that had been low-pass filtered at 5, 11, 23 and 47 Hz, respectively. Each inversion used a starting model similar to that shown in Figure 2a, identical parameterization and 80 iterations using the full bandwidth of the input data.

As would be expected, the main difference between each run is the resolution captured by the inversion. Figures 1a-c show broadly the same features: a low-velocity, horizontal, laterally-extensive reservoir at approximately 3 km depth, a number of shallow gas pockets in an otherwise benign overburden, and simple horizontal stratigraphy. The final inversion, Figure 1d, however does not capture the reservoir and is dominated by high-frequency dipping artefacts in the shallow section. Similar noise is also present at a lower spatial frequency in Figure 1c below 3 km, but is less prevalent; the same noise does not obviously exist in Figures 1a or 1b.

In a second experiment, Figure 2, we compared full-bandwidth AWI and FWI at 23 Hz with the results of conventional multiscale FWI, beginning at 3 Hz and running up to 23 Hz. For this test, shallow velocities in the starting model, Figure 2a, were reduced by 1.5%. This is sufficient to introduce cycle skipping into the data for full-bandwidth FWI. For AWI, in order to avoid the generation of sub-vertical noise at depth, we smoothed the model partway through the inversion before continuing. This smoothing increased with depth, and was predominantly horizontal.

The results of the smoothed full-bandwidth AWI are shown in Figure 2b; comparison with Figure 1c shows the effect of the smoothing in suppressing the noise. Figure 2c shows the results of applying full-bandwidth FWI at 23 Hz. This model is now severely compromised by cycle skipping and there is significant spurious structure around and above the reservoir, even in the shallow subsurface. This is unsurprising; the starting model is not especially accurate and conventional FWI is not designed to begin at these high frequencies from an imperfect starting model.

Figure 1: Full-bandwidth AWI results, using low-pass filters with corner frequencies of: (a) 5 Hz, (b) 11 Hz, (c) 23 Hz, and (d) 47 Hz. Each inversion used a spatial filter of identical scale length applied to the AWI gradient.
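The cycle skipping that compromises full-bandwidth FWI can be illustrated with a minimal Python sketch, which evaluates a least-squares misfit between a band-limited trace and time-shifted copies of itself. The Gabor-type test signal, its 10 Hz dominant frequency and all function names are assumptions made for illustration; none of them is taken from the field experiment.

```python
import numpy as np

def gabor(t, f0, sigma):
    """Oscillatory, band-limited test trace (assumed shape, for illustration)."""
    return np.cos(2.0 * np.pi * f0 * t) * np.exp(-(t / sigma) ** 2)

def l2_misfit_vs_shift(f0=10.0, sigma=0.15, dt=0.001, max_shift=0.25):
    """Least-squares misfit between a trace and copies delayed by each shift."""
    t = np.arange(-1.0, 1.0, dt)
    observed = gabor(t, f0, sigma)
    shifts = np.arange(0.0, max_shift + 0.5 * dt, dt)
    misfit = np.array([
        0.5 * np.sum((gabor(t - s, f0, sigma) - observed) ** 2)
        for s in shifts
    ])
    return shifts, misfit

def local_minima(shifts, misfit):
    """Shifts at which the misfit is lower than at both neighbouring shifts."""
    interior = (misfit[1:-1] < misfit[:-2]) & (misfit[1:-1] < misfit[2:])
    return shifts[1:-1][interior]

shifts, misfit = l2_misfit_vs_shift()
spurious = local_minima(shifts, misfit)  # minima away from zero shift
print(spurious)
```

Under these assumptions the misfit is zero at zero shift but also has local minima close to shifts of one and two periods of the dominant frequency. A local optimizer that starts more than about half a cycle from the correct alignment converges to one of these instead of the true solution.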

Using multiscale FWI, Figure 2d, largely avoids the effects of cycle skipping, and produces a more geologically realistic model, particularly in the first 1.5 km below mudline. The reservoir in Figure 2d appears fragmented at its fringes, with suggestions of gas migration into the overburden. Full-bandwidth AWI converges to a similar solution in the shallow section, but it produces a flatter reservoir that agrees closely with the reflection image.

Full-bandwidth AWI and multiscale FWI also differ in the impact of the limited data fold towards the margins of the model. Spurious updates in this region are stronger in the FWI results than in AWI, most likely because of the narrower aperture of the AWI impulse response. In the left-hand side of Figure 2d, in particular, these sweeping artefacts appear to merge into geology, placing uncertainty on the model update here, and impacting migration quality.

Another noticeable difference between the two results is the shape and position of the target reservoir. The reservoir appears more central and flatter in the AWI result than the multiscale FWI. One concern with the AWI model is the presence of broad bowl-like features in the shallow subsurface, similar in appearance to migration aperture artefacts in imaging. Comparable structures are prominent in Figure 1d, where AWI has clearly gone awry, suggesting that these are likely to be spurious.

In an effort to assess which of Figures 2b and 2d is a more accurate representation of the true subsurface, the data were demultipled, deghosted and zero-phased, and migrated using Kirchhoff prestack depth migration (PreSDM). The resultant full stack sections are displayed in Figure 3, overlain on the respective velocity models used to migrate the data. Whilst the differences between the images themselves are subtle, the AWI result, Figure 3b, appears to correlate significantly better with its stack. As the image and model were generated using two quite distinct regimes of wavefield propagation, this agreement suggests that the AWI model provides a reliable account of the geology. The AWI migration result is also generally flatter and more regular.

ODCIGs were generated as a byproduct of the PreSDM, Figure 4. It can be seen that the multiscale FWI gathers contain non-systematic errors throughout. It is especially noticeable that the gathers in the weaker lower-half of the FWI section are under migrated, suggesting that the FWI model has not been able to recover from the perturbation that was introduced deliberately into the starting model. The AWI gathers are flatter, brighter, and more continuous in form. This suggests that the AWI model is indeed more kinematically accurate than its multiscale FWI counterpart.

Figure 2: (a) Early-stage reflection tomography model, adjusted by 1.5%, used as a starting model; (b) full-bandwidth AWI model at 23 Hz; (c) full-bandwidth FWI model at 23 Hz; and (d) multiscale FWI model increasing by stages from 3 to 23 Hz.
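The AWI results compared here rest on a Wiener filter that matches one dataset to the other, with the inversion driving that filter towards a zero-lag delta function. A minimal Python sketch of this idea follows. The frequency-domain water-level deconvolution, the lag-weighted normalized penalty, the test signal and all names are assumptions chosen for illustration; they are loosely patterned on the general form described by Warner and Guasch (2016) and are not the authors' implementation.

```python
import numpy as np

def gabor(t, f0=10.0, sigma=0.15):
    """Oscillatory, band-limited test trace (assumed shape, for illustration)."""
    return np.cos(2.0 * np.pi * f0 * t) * np.exp(-(t / sigma) ** 2)

def wiener_filter(predicted, observed, water_level=1e-3):
    """Stabilized least-squares filter w such that predicted * w ~ observed."""
    n = predicted.size
    P = np.fft.rfft(predicted)
    D = np.fft.rfft(observed)
    power = np.abs(P) ** 2
    eps = water_level * power.max()      # assumed water-level stabilization
    return np.fft.irfft(D * np.conj(P) / (power + eps), n=n)

def awi_functional(w, dt):
    """Lag-weighted, normalized filter energy; small when w sits at zero lag."""
    n = w.size
    lags = dt * np.fft.fftfreq(n, d=1.0 / n)   # wrapped lags in seconds
    return 0.5 * np.sum((lags * w) ** 2) / np.sum(w ** 2)

dt = 0.001
n = 2000
t = (np.arange(n) - n // 2) * dt
predicted = gabor(t)
shifts = np.arange(0.0, 0.2501, 0.005)
values = np.array([
    awi_functional(wiener_filter(predicted, gabor(t - s)), dt) for s in shifts
])
print(values[0], values[-1])
```

In this sketch the filter for a delayed trace is a band-limited spike located at the delay, so the penalty grows steadily with the size of the time shift and shows no secondary minima at whole-cycle shifts, in contrast to a least-squares difference of the traces.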

Figure 3: (a) Multiscale FWI and (b) full-bandwidth AWI inversion results after 95 total iterations with PreSDM overlay. Both sections are 27.5 km in length.

Figure 4: Kirchhoff PreSDM ODCIGs generated using: (a) multiscale FWI (Figure 2d) and (b) full-bandwidth AWI (Figure 2b). The ODCIGs are spaced 150 m apart and have had an outer angle mute applied at 50°.

Discussion and conclusions

AWI is a powerful new approach to waveform inversion that circumvents the phenomenon of cycle skipping. The manner in which AWI and similar domain-extension techniques appear set to advance waveform inversion capabilities requires a corresponding reassessment of the optimal strategy for successful and commercially efficient waveform inversion. One such strategy is Bunks et al.’s (1995) multiscale approach designed to accentuate the more linear low-frequency component of the wavefield, and as the inversion converges, gradually introduce the more nonlinear high-frequency component.

AWI does not require this approach in order to mitigate cycle skipping. When using only a moderately accurate starting model, full-bandwidth AWI at 23 Hz appears able to generate a velocity model that produces superior migration outcomes when compared to conventional multiscale FWI iterated from 3 to 23 Hz. Full-bandwidth FWI at 23 Hz of the same data of course fails entirely. Despite its insensitivity to cycle skipping, AWI is not able to begin inversion at frequencies as high as 47 Hz, at least for this quality of starting model. Given a better starting model though, AWI should be able to achieve even this.

A major benefit of AWI is a reduced reliance on low-frequency data to begin waveform inversion. These data are expensive to acquire and require additional processing effort to ensure that their signal-to-noise is suitable for waveform inversion. Although we have shown that full-bandwidth AWI can be effective, we do not advocate that AWI should normally begin at frequencies as high as 23 Hz. Typically we obtain the best outcome from AWI by starting at some intermediate frequency, lower than 23 Hz, but well above the 3 Hz that has become typical for FWI.

AWI has the potential to be integrated with and lend its benefits to other waveform inversion frameworks. An example of such is the combined local and global inversion for anisotropy parameters (Debens et al., 2015), where the use of AWI provides resilience against the effects of local minima related to cycle skipping.

Acknowledgements

The authors would like to thank Woodside, Woodside’s joint venture participant POSCO DAEWOO, and Myanma Oil and Gas Enterprise (MOGE), for permission to present this work. They also wish to express gratitude towards the FULLWAVE Game Changer JIP at Imperial College London, for their role in developing the FWI engine used. Any views or opinions expressed in this paper are solely those of the authors and do not necessarily represent those of Woodside or S-CUBE.

REFERENCES
Debens, H. A., M. Warner, A. Umpleby, and N. V. da Silva, 2015, Global anisotropic 3D FWI: 85th
Harrowfield, G., 2015, Mass transport complexes of the Rakhine Basin, Myanmar – Deepwater examples
from recent 3D seismic data: SEAPEX Exploration Conference.
Mancini, F., K. Prindle, T. Ridsdill-Smith, and J. Moss, 2016, Full-waveform inversion as a game
changer: Are we there yet? The Leading Edge, 35, 445–448, 450–451,
https://doi.org/10.1190/tle35050445.1.
1259–1266, https://doi.org/10.1190/1.1441754.
Warner, M., and L. Guasch, 2016, Adaptive waveform inversion: Theory: Geophysics, 81, no. 6, R429–
R445, https://doi.org/10.1190/geo2015-0387.1.

Two vs three-dimensional FWI in a 3D world
T. Kalinicheva*, M. Warner, J. Ashley (Imperial College London) and F. Mancini (Woodside Energy Ltd.)
Summary

Three-dimensional anisotropic acoustic FWI has become a relatively routine component of depth-model building for PSDM and shallow-hazard identification, and is increasingly being used for pore-pressure prediction and reservoir characterization. However, 3D FWI is relatively resource intensive, especially at higher frequencies. Consequently 2D FWI can provide a low-cost option when extensive initial testing is required for parameter selection and quality control of starting models. In addition, during early exploration, full 3D coverage may not be available everywhere, and long 2D lines are not uncommonly used to tie new 3D surveys to more-distant wells and provide regional context. In these circumstances, 2D FWI may have a role to play as part of a larger 3D FWI workflow.

Here we apply both 2D and 3D FWI to the same field datasets to explore the utility, accuracy and limitations of the former. We show that 2D FWI applied to long regional lines has exploration benefit and that the additional benefit of applying full 3D FWI to this type of data is limited. We also demonstrate that early testing may be rapidly and usefully performed using 2D FWI ahead of full-3D production FWI with a consequent saving in both time and cost. Initial 2D testing can be especially relevant ahead of cost-sensitive decisions to run FWI to higher frequencies where the 2D results can provide a low-cost initial indication of the potential benefits of increased bandwidth in 3D.

Introduction

3D FWI allows the use of velocity models that vary in three dimensions, but it also allows sources and receivers to be properly distributed in space and it allows those sources and receivers to act as points rather than the lines that 2D wave propagation assumes. The latter changes both the amplitude and phase spectra of the predicted data and influences its temporal decay. Consequently, 3D simulation provides different results and is superior to 2D modelling of the same data. Some of these differences can be mitigated during 2D FWI and some cannot.

Most algorithms that are used to simulate the seismic wavefields required for FWI scale as $n^3$ in 2D and as $n^4$ in 3D for a single source, where $n$ is a measure of the linear dimensions of the model in mesh points. In addition, 3D modelling requires additional source coverage in the third dimension so that in 3D the number of sources required scales as $n^2$ whereas in 2D it scales as $n$. Consequently the computational cost of 3D FWI is around $n^2$ times greater than that of 2D FWI; typically $n$ is a few hundred or more.

In practice, depending in part upon the acquisition geometry, it is often possible to reduce the source density in 3D FWI below that required in pure 2D so that this scaling is not quite as severe as suggested, but 3D FWI is always significantly more expensive than 2D. In addition, as the maximum frequency of the data increases, the number of mesh points required to capture the wavefield accurately also increases so that the difference in cost between 2D and 3D FWI becomes more marked.

Field data

We have applied both 2D and 3D FWI to two datasets from the Carnarvon basin on the NW Australian shelf. In the first dataset, an 80-km 2D line was acquired using a single source and single streamer. Data acquisition was optimized for FWI by employing a 10,000-m cable, towed at 25-m depth, and a large low-frequency source array towed at 10 m; the shot interval was 50 m. For this survey, the shallow velocity model was reasonably benign for FWI but there are thin high-velocity igneous intrusions in the deeper section. The water depth ranged from 600 to 1600 m. Figure 1a shows a typical shot record; there are strong water-bottom multiples, significant refracted energy at longer offsets, and good signal-to-noise at low frequencies.

The second dataset was taken from a conventional 3D narrow-azimuth towed-streamer survey designed principally to enhance spatial resolution rather than for improved FWI. The survey used flip-flop sources and eight cables. The cable length was 5500 m, towed at 6-m depth, and the sources were towed at 5-m depth with an 18.75-m shot interval. The underlying velocity model was more complicated than for the first survey with several generations of buried channels, mini-basins and velocity inversions, and this velocity model is difficult to recover without assistance from extensive reflection tomography. The water depth is around 1400 m. Figure 1b shows a typical shot record; there is less refracted energy, less low-frequency signal, and more noise than in Figure 1a.

We applied acoustic VTI anisotropic 2D and 3D FWI on both datasets with minimal preprocessing. Both refracted and reflected arrivals were used throughout, and surface ghosts and multiples were retained within the field data. We took no explicit account of attenuation or elastic effects, but our FWI code is designed to be robust against systematic amplitude variations in the field data that do not match our acoustic assumptions. We used regional values for anisotropy and held these values constant during FWI.
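The scaling argument given in the Introduction can be made concrete with a short Python sketch. The function name and the `source_thinning` parameter, which stands in for the reduced 3D source density mentioned above, are hypothetical and introduced only for illustration.

```python
def cost_ratio_3d_to_2d(n, source_thinning=1.0):
    """Approximate cost of 3D FWI relative to 2D FWI for n mesh points per side.

    Per-source modelling cost: n**3 in 2D, n**4 in 3D.
    Number of sources: n in 2D, n**2 in 3D, optionally divided by an
    assumed thinning factor for the 3D source density.
    """
    per_source_2d = n ** 3
    per_source_3d = n ** 4
    sources_2d = n
    sources_3d = n ** 2 / source_thinning
    return (per_source_3d * sources_3d) / (per_source_2d * sources_2d)

for n in (200, 300, 600):
    print(n, cost_ratio_3d_to_2d(n), cost_ratio_3d_to_2d(n, source_thinning=10.0))
```

With no thinning the ratio is simply $n^2$, so doubling the number of mesh points per side quadruples the relative cost of 3D, which is one way to see why the gap between 2D and 3D widens as the maximum frequency increases.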

Figure 1a: Dataset one, shot record, filtered below 30 Hz.

Figure 1b: Dataset two, shot record, filtered below 30 Hz.

The source wavelet required for FWI differs between 2D and 3D. The appropriate wavelet was obtained using an initial estimate to model the direct arrival through the water, subsequently using a Wiener filter to match this to the observed direct arrival. This approach automatically corrects the phase and amplitude spectra of the source for 2D, and also deals correctly with the surface ghost.

We did not explicitly correct the temporal decay of amplitude for 2D FWI. Instead we correct this heuristically by matching RMS amplitudes between predicted and observed data, suitably stabilized, at each time slice within each shot record. This forms part of our default parameterization for marine field data where it is designed principally to deal with the amplitude effects of anelasticity and sub-grid-cell scattering, but it has the useful side effect of also dealing reasonably effectively with the amplitude differences introduced by 2D simulation.

Results – dataset one

We began the inversion at 2.5 Hz using a starting velocity model based upon smoothed PSTM stacking velocities, Figure 2a. We ran 2D inversion to a maximum frequency of 24 Hz. For this inversion, the data were collapsed onto a 2D line, preserving source-receiver offset; the maximum feathering of the 10-km cable was 650 m.

The final 2D FWI-derived velocity model is shown in Figure 2b, and overlain by the 2D PSDM section in Figure 2c. The strong irregular reflections in the lower half of the section are from basaltic intrusions; these appear as high-velocity features in the FWI velocity model. Both the FWI velocity model and the PSDM pick out a major unconformity that traverses the section, and both show shallow channels in the upper parts of the section. Figure 3 shows a close-up of both the FWI model and the PSDM where the detailed correlation between the two sections is clear.

To assess the accuracy of the 2D FWI velocity model, surface-offset common-image gathers were generated using 2D Kirchhoff PSDM, Figure 4. Figure 4a uses the starting model, Figure 4b uses the results of isotropic 2D FWI, and Figure 4c uses the results of anisotropic 2D FWI. It is clear from these gathers that purely 2D FWI is able to produce a velocity model that is sufficiently accurate to migrate these data. The gathers are flatter following 2D FWI, and even though the inversion is only in 2D, it is clear that the inclusion of anisotropy during FWI improves the outcome.

Figure 2: a) Starting velocity model. b) Anisotropic 2D FWI-derived velocity model at 24 Hz. c) FWI velocity model overlain by the PSDM section.
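The heuristic RMS amplitude matching described above for 2D FWI can be sketched in Python as follows. This is only one plausible reading of the description: the choice to rescale the predicted data, the form of the stabilization, the array layout and the function name are all assumptions rather than details given in the text.

```python
import numpy as np

def match_rms_per_time_slice(predicted, observed, stab=1e-3):
    """Rescale predicted data so its RMS matches the observed RMS at each time.

    Both inputs are 2D arrays for one shot record, shaped
    (n_traces, n_samples). The RMS is taken across traces at each time
    sample. `stab` sets an assumed stabilization floor, expressed as a
    fraction of the largest predicted RMS, to avoid division by near-zero.
    """
    rms_pred = np.sqrt(np.mean(predicted ** 2, axis=0))
    rms_obs = np.sqrt(np.mean(observed ** 2, axis=0))
    floor = stab * rms_pred.max()
    scale = rms_obs / (rms_pred + floor)
    return predicted * scale

# Synthetic check: observed data decay as 1/sqrt(t) relative to predicted.
rng = np.random.default_rng(0)
predicted = rng.standard_normal((50, 200))
gain = 1.0 / np.sqrt(np.linspace(0.1, 2.0, 200))
observed = predicted * gain
matched = match_rms_per_time_slice(predicted, observed)
```

Because only one scale factor is applied per time sample, the relative amplitudes between traces within a time slice are left unchanged; the correction acts on the temporal decay alone.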

Figure 4: Close up of Figure 3; lateral extent is 18 km.

Figure 5: CIGs generated using: a) the starting model, b) isotropic 2D FWI, c) anisotropic 2D FWI.

The migration is not perfect; both FWI and PSDM were here performed in 2D, and cable feathering and cross-line velocity changes are not dealt with correctly by either method. Nonetheless, it is clear that the 2D FWI model migrates the data better than the starting model. For long regional tie-lines, 2D FWI clearly provides benefit.

We repeated the inversion of this 2D line using full 3D FWI run to a maximum frequency of 12 Hz. Since there is little control over cross-line variations in velocity provided by this 2D dataset, we regularized the model strongly in the cross-line direction during FWI. Apart from this regularization, the inversion was fully in 3D with sources and receivers placed in their true positions.

Figure 6 shows a comparison between the velocity models obtained using 2D and 3D FWI. These models are not identical, but they are remarkably similar. The local absolute velocities do not always agree but we could ascertain no systematic differences between them. Sometimes one and sometimes the other model appears to be better resolved and to match the PSDM more closely. There are significant differences between the two results in the area circled in Figure 6. Here the streamer was not straight, the feathering was larger than normal, and the 2D assumption was more significantly in error. Unsurprisingly, in these circumstances, using the full 3D geometry leads to a significantly improved outcome for FWI.

Figure 6: FWI velocity model at 12 Hz: a) 2D, b) 3D. The 3D result is about 400 times the cost of 2D FWI.

Results – dataset two

Unlike dataset one, here the data were acquired with full 3D acquisition. Consequently the main role of 2D FWI in this survey is in early parameter testing and in assessing the commercial value of more-expensive algorithms, for example extending FWI to higher frequencies. This can be especially relevant when production runs are performed on the cloud and early testing is performed using more-limited local resources.

Figure 7 shows the starting model and results of 2D and 3D anisotropic FWI for this survey. FWI was run between 3.6 and 7.8 Hz; similar parameterization was used for both inversions. In dataset one, 2D FWI used a single source and single cable so that cable feathering becomes an issue. In dataset two, we have full 3D acquisition, and so are able to select a subset of sources and receivers from multiple cables that closely match a true 2D geometry; this match is not perfect because of the finite cross-line spacing of both sources and receivers, but it is much closer than can normally be obtained by using data from a single cable.

Figure 7: (a) Starting model, (b) 2D FWI, (c) 3D FWI. The inversion is to a maximum frequency of 7.8 Hz. The base of the section is at 4 km below the sea surface. Vertical line shows well from Figure 8.

Qualitatively the 2D and 3D results are similar, but the velocity anomalies introduced by FWI are almost always more intense in the 3D model. It is not clear why that should be the case. Significantly more data are used to generate the 3D result, and these additional data will act to improve the effective signal-to-noise ratio during FWI. This dataset has high noise levels at low frequencies because of the shallow tow depth, and we speculate that the amplitudes of the velocity updates during 2D FWI are suppressed by these higher noise levels acting to compromise the data fit.

For this survey, we used extensive testing with 2D FWI to design the optimal parameterization for subsequent 3D inversion. This is a difficult dataset to invert, and previous attempts have relied heavily on reflection tomography. We found the initial 2D testing to be invaluable in developing the full 3D workflow. Figure 8 demonstrates that the final anisotropic 3D result matches check-shot data at the well, and serves to validate this approach to 3D FWI.

Figure 8: Comparison of check shots, starting model, and 2D and 3D anisotropic FWI models. The location of the well is shown in Figure 7.

Conclusions

We have successfully applied anisotropic full-waveform inversion to marine-streamer datasets in both 2D and 3D. We have shown that 2D FWI applied to regional 2D tie-lines appears to work as well as full 3D FWI provided that the acquisition geometry is approximately two-dimensional. We have also shown that 2D FWI of conventional 3D NATS data can provide velocity models that are qualitatively similar to those generated by full 3D FWI. We have also shown that 2D FWI proves to be useful for early parameter testing while working with 3D datasets.

We thank Woodside Energy Limited and Mitsui E&P Australia for permission to publish this paper.


Application of visco-acoustic waveform tomography to walkaway VSP data from a carbonate reservoir
E.M. Takam Takougang*, Y. Bouzidi, The Petroleum Institute

SUMMARY

2D visco-acoustic waveform tomography was applied to walkaway VSP data acquired over a carbonate reservoir in the United Arab Emirates, to form high resolution images of P-wave velocity and attenuation structures. The waveform tomography algorithm was implemented in the frequency domain, and the field data were inverted between the frequencies 4-50 Hz. The preconditioning of the input data, the inversion strategy and the starting models were all critical for the success of the inversion. A specific inversion strategy was used to decouple updates in velocity and attenuation parameters during the inversion. A high resolution velocity model that correlates with the available sonic log was obtained, confirming the reliability of the results. Zones with anomalous low velocity values associated with relatively high attenuation values at known locations of hydrocarbon reservoirs were identified.

INTRODUCTION

Waveform tomography can provide subsurface images of high resolution (velocity, density, attenuation) that can potentially be directly comparable to seismic migration. The usual difficulties in the implementation of waveform tomography are its nonlinearity due to the abundance of information used from the input data (amplitude and phase), the variable data quality especially at low frequencies, the starting models and the starting frequency. Most of the time, it is necessary to develop a specific inversion strategy for a successful application (e.g. Takougang and Calvert, 2013; Kamei et al., 2014). Several applications, ranging from crustal studies using wide angle refraction data to reflection seismic data have been reported (e.g. Takougang and Calvert, 2013; Pratt et al., 1998; Pratt, 1999; Pratt et al., 2004; Operto et al., 2006; Malinowski et al., 2011; Mothi and Kumar, 2014; Kamei et al., 2013; Sirgue et al., 2010), some with crosshole data (e.g. Wang and Rao, 2006; Zhou and Greenhalgh, 2003; Pratt et al., 2004), but only a few with walkaway VSP data (e.g. Gao et al., 2007; Barnes and Charara, 2009; Yang et al., 2011; Takougang and Bouzidi, 2016). However, VSP data are generally of greater resolution with higher frequency content compared to surface seismic data, due to less energy loss from propagation effects. Therefore, the application of waveform tomography to walkaway VSP data can reveal additional subsurface information that could be used to complement interpretation based on surface seismic data.

This paper presents the application of visco-acoustic waveform tomography to walkaway VSP data from an oil field in a carbonate reservoir offshore Abu Dhabi in the United Arab Emirates. We describe the waveform tomography preprocessing steps as well as the inversion strategy that resulted in forming high resolution P-wave velocity and attenuation structures. We compare the final velocity model with the available sonic log followed by an interpretation of the results.

DATA ACQUISITION

Eight multi-offset Vertical Seismic Profile (walkaway VSP) datasets were collected offshore, in an oil field in Abu Dhabi, United Arab Emirates. The purpose of the survey was to provide structural information of the reservoir, around and away from the borehole. The water depth in the area was about 14 m, and an air gun was used to shoot the lines at 4 m depth, with a shot interval of 25 m. A typical recording tool with 20 receivers spaced every 15.1 m was deployed in a deviated borehole at different depths for each line, from 521 m to 2742 m depth. We selected 5 parallel and very close lines and merged them to form the input dataset for waveform tomography.

WAVEFORM INVERSION

We adopted the 2D frequency domain visco-acoustic waveform tomography approach described by Pratt et al. (1998) and Pratt (1999). The success of the inversion was determined by 3 important factors: the preconditioning of the input data, the starting model, and the inversion strategy.

Data preconditioning

The successful application of waveform tomography requires a good preconditioning of the input data. Any aspect of the input data that cannot be predicted by the 2D visco-acoustic propagation scheme, e.g., shear waves, coherent noise, shot to shot energy variations, amplitude discrepancy and bad traces, was removed or corrected. Figure 1 shows a receiver gather at shallow depth; the presence of shear waves (indicated by the blue arrow in Figure 1a), probably due to mode conversion at the water-seabed interface, can only be predicted by the elastic wave-equation, and was removed from the input data. The following preconditioning steps were adopted:

1. An f-k filter was chosen to remove the shear waves from the input data (see Figure 1). The data were further muted with a maximum time window of 200 ms after the first arrival to avoid the inclusion of late converted waves as well as late multiples. A time-window Tw of 2 s was used for the inversion.

2. The amplitudes of the data were corrected from 3D to 2D propagation by multiplying the data by $\sqrt{t}$ ($t$ is the propagation time), and traces contaminated by noise were removed. At this stage, the amplitude variation with offset (AVO) of the data was scaled using a scaling factor obtained upon comparison of the logarithm of the Root Mean Square (RMS) AVO of the field and the modelled data (Brenders and Pratt, 2007), to force
the AVO of the input data to have a behavior similar to that of the AVO predicted by the visco-acoustic modelling.

[Figure 1, panels (a) and (b): horizontal axis Shot Number (ticks 1, 81, 161); vertical axis Time (s), 0 to 3.]

Figure 1: Common receiver gather at 536 m depth. (a) Field data before f-k filtering, and (b) field data after f-k filtering. The blue arrow represents the location of shear waves in (a) that were significantly removed in (b) after f-k filtering. First arrival traveltime picks are indicated in red. An AGC operator length of 500 ms was used to display the data.

Starting models and density

Traveltime tomography (Aldridge and Oldenburg, 1993) was used to generate the starting velocity model (Vp) from first-arrival traveltime picks. The traveltimes of first arrivals were picked with an accuracy estimated to be within ± 4 ms. A 1D model with velocities starting from 2500 m/s and increasing linearly with depth, with a constant velocity gradient of 1.059 s−1, was used as the starting model. The velocity gradient of the starting model was estimated from the sonic logs at the borehole location. Due to the presence of spurious artifacts during the inversion at the receivers' locations, large smoothness constraints were used. After 80 iterations, a velocity model (Figure 2a) was obtained with an average root mean square (RMS) misfit of 11 ms. The velocity model was then tested to ensure that it predicts the first-arrival traveltime picks to within half a cycle, a requirement necessary to ensure successful convergence of waveform tomography (Sirgue and Pratt, 2003).

The starting attenuation model was obtained using the centroid frequency shift method of Quan and Harris (1997). This method is based on the principle that the high-frequency components of a seismic wave are attenuated more rapidly than the low-frequency components as waves propagate in the subsurface. Consequently, the amplitude of the centroid frequency of the seismic wave diminishes during propagation. Assuming that the attenuation model ($Q_p^{-1}$) is frequency independent, the decline in amplitude of the centroid frequency of the seismic wave is proportional to an integral along the raypath through a given subsurface model. The algorithm for estimation of attenuation is therefore similar to the Eikonal equation of traveltime tomography (e.g. Aldridge and Oldenburg, 1993), in which the first-arrival traveltimes are replaced by the centroid frequencies (frequency shifted data) and the slowness by the quantity $\alpha_0 = \pi/(Q_p V_p)$. After 10 iterations, we obtained a final attenuation model (Figure 2c) that is structurally similar to the starting velocity model, with attenuation values in the range $Q_p^{-1}$ = 0.01-0.03.

The Gardner law (Gardner et al., 1974) is generally used to estimate density values from velocity values. However, this relationship, mainly valid for siliciclastic rocks, may not be appropriate for carbonate rocks (the velocities predicted by the Gardner law are generally lower than the actual velocity values in carbonate rock) (Liebermann and Sondergeld, 1994). For this reason, the density model was estimated from an empirical correlation derived from the sonic and density logs at the borehole location. We derived the following relationship:

$$\rho = 293.2\, V_p^{0.25616}, \qquad (1)$$

with ρ in kg/m3 and Vp in m/s. During the inversion, the density was updated at the same time as the velocity model using equation 1.

Waveform inversion strategy

We conducted the inversion in a visco-acoustic medium. The attenuation ($1/Q_p$) was introduced using frequency dependent complex velocities with the assumption that $Q_p$ is large and frequency independent. We have:

$$V = V_r + iV_i \qquad (2)$$

with the attenuation expressed as:

$$Q_p = -\frac{V_r}{2V_i}, \qquad (3)$$

with $V_r$ and $V_i$ the real and imaginary components of the velocity. Our inversion strategy consisted of inverting from the lowest frequency to the highest frequency (Bunks et al., 1995) using a series of time damping constants τ in the Laplace-Fourier implementation.

[Figure 2 panels: (a) and (b) Vp (m/s), color scale 2400-5400; (c) 1/Qp, color scale 0.01-0.03; (d) 1/Qp, color scale 0-0.06. Axes: Distance (horizontal) and Depth (vertical), with 300 m scale bars. R marks the reservoir location.]
Figure 2: Velocity and attenuation models with the receivers’ locations (gray dots) in the deviated borehole superimposed. (a)
Starting velocity model from traveltime tomography, (b) velocity model after 4-50 Hz waveform inversion, (c) starting attenuation
model from the frequency shift method and (d) attenuation model after 4-15 Hz waveform inversion. There is a significant improve-
ment of the velocity and attenuation models after waveform inversion. There is a decrease in velocity associated with increase in
attenuation at a reservoir location (R).
After Shin and Cha (2009), the Laplace transform of a time-domain wavefield u(x; t) can be expressed as:

$$u(x; s) = \int_{0}^{+\infty} u(x; t)\, e^{-st}\, dt, \qquad (4)$$

where s is the complex-valued Laplace parameter. By introducing a time decay constant τ and a real-valued angular frequency ω, we can write s as:

$$s = 1/\tau + i\omega. \qquad (5)$$

Considering u(x; t < 0) = 0, equation 4 can then be expressed as (Kamei et al., 2013):

$$u(x; \tau, \omega) = \int_{-\infty}^{+\infty} u(x; t)\, e^{-t/\tau}\, e^{-i\omega t}\, dt. \qquad (6)$$

Equation 6 can be seen as the real-frequency Fourier transform of a time-domain wavefield multiplied by a time damping term, $e^{-t/\tau}$. The damping term serves to suppress late arrivals. The smaller the decay constant τ, the larger the suppression of late arrivals. We realized that a judicious selection of the decay constant τ contributed significantly to the success of the inversion. Selecting a larger τ at an early stage of the inversion caused the inversion to fail by converging to a local minimum, while smaller values of τ focus on early-arrival waveforms, which are less prone to cycle skipping. The inversion was performed in 2 stages, with frequencies ranging from 4 Hz to 50 Hz, and 4 values of τ were selected: 0.4 s, 0.8 s, 1.2 s and 1.6 s. The lowest value of τ (0.5 s) suppresses waveforms arriving after 0.7 s of our selected 2 s time window, while τ = 0.8 s, τ = 1.2 s and τ = 1.6 s progressively include late arrivals. The frequencies 4-25 Hz were used with τ = 0.4 s, 0.8 s and 1.2 s, while the frequencies 25-50 Hz were used with τ = 1.6 s. Only the velocity model was recovered during stage 1, while both velocity and attenuation models were recovered during stage 2. This was done to reduce coupling between velocity and attenuation parameters during model updates (see Pratt et al., 2004; Kamei and Pratt, 2008; Kamei et al., 2013). A more reliable attenuation model is generally obtained when a high-resolution velocity model is used (see Kamei and Pratt, 2008; Takougang and Calvert, 2013; Kamei et al., 2013). We started the inversion at the lowest frequency (4 Hz) and gradually progressed to the highest frequencies, with a frequency interval of 0.5 Hz. The frequencies were inverted in groups of five, spaced every 0.5 Hz, with the frequencies of each group inverted simultaneously. Five iterations were used for each group of frequencies. The updated models obtained after the inversion of each group of frequencies were used as starting models for the inversion of the next group of frequencies. The source signature was obtained following the method of Pratt (1999) and was re-estimated after the inversion of each group of frequencies. During stage 1 of the inversion procedure, we swept through the frequencies with our selected values of τ (0.5 s, 0.8 s, 1.2 s and 1.6 s) and repeated the same procedure during stage 2. The logarithmic data residual (Bednar et al., 2007; Shin and

Ming, 2006) was used during the inversion. The logarithmic data residual, defined as the difference between the logarithm of the observed wavefield and the logarithm of the predicted wavefield, has been shown to provide a more uniform and deeper illumination than the conventional data residual, defined as the difference between the predicted and observed wavefields (Kamei et al., 2014). The gradient was preconditioned by wavenumber filtering to reduce high-wavenumber artifacts (Sirgue and Pratt, 2004).

[Figure 3 plot: curves labeled Sonic, Starting and 4-50 Hz; horizontal axis Velocity (m/s), 3000 to 7000; vertical axis Depth; scale bars 500 m/s and 250 m; R marks the reservoir.]

Figure 3: 1D velocity profile at the projected borehole location. There is generally a good fit between the waveform tomography model (4-50 Hz) and the sonic log (Sonic). The starting model (Starting) has been significantly improved after waveform tomography. The 4-50 Hz velocity model is slightly underestimated in places, particularly around 790 m depth (see arrow). The gray dots represent the projected locations of the receivers. (R) represents the location of a reservoir associated with a decrease in velocity.

RESULTS AND DISCUSSION

The starting models and the result of the inversion are displayed in Figure 2. The waveform tomography velocity model was recovered for the frequency range 4-50 Hz, while the inversion for attenuation was stopped at 15 Hz due to increasing artifacts. The velocity (Figure 2b) and attenuation (Figure 2d) models from waveform tomography have much greater resolution compared to the starting velocity model from traveltime tomography and the starting attenuation model from the frequency shift method. The attenuation values are generally consistent with seismic attenuation in carbonate rocks from an Abu Dhabi oil field (Bouchaala et al., 2016). Deeper, a low-velocity zone (Vp = 2500-3200 m/s) associated with an increase in attenuation ($Q_p^{-1}$ = 0.05-0.06) correlates with a known location of hydrocarbon reservoirs (R). A comparison of 1D velocity profiles at the borehole location and the sonic log shows an overall good match (Figure 3). The decrease in velocity from waveform tomography (4-50 Hz model) correlates with a decrease in velocity from the sonic log at the reservoir location (R). However, the waveform tomography velocity model appears to be slightly underestimated in a few places, for example at 790 m depth (see arrow in Figure 3). The inaccuracy of the velocity model at 790 m depth can be explained by the absence of receivers around that depth.

CONCLUSIONS

We applied 2D visco-acoustic waveform tomography to walkaway VSP data acquired over a carbonate reservoir offshore Abu Dhabi, in the United Arab Emirates. Five parallel walkaway VSP lines acquired at 25 m shot interval were merged to form the input data. Each line was acquired at different depths using a recording tool with 20 receivers at 15.1 m receiver interval deployed in a deviated borehole. Waveform tomography was performed in the frequency domain. Three parameters were critical for its success: the starting velocity and attenuation models, obtained respectively from traveltime tomography and from attenuation tomography (centroid frequency shift method); the preconditioning of the input data to remove converted shear waves and noise; and a specific inversion strategy based on decoupling velocity and attenuation parameter updates and on a judicious selection of the time damping constant τ in the Laplace-Fourier domain, to mitigate non-linearity. The density was calculated using an empirical relationship derived from the available sonic and density logs. The inversion was performed using the frequencies 4-50 Hz, and a velocity model that generally correlates with the sonic log was obtained. The attenuation values are generally consistent with estimated seismic attenuation in carbonate rocks from an Abu Dhabi oil field. The velocity model shows a zone with anomalously low values, associated with an increase in attenuation, that correlates with a known location of hydrocarbon reservoirs. This study shows that the application of waveform tomography to walkaway VSP data acquired over a carbonate reservoir can provide substantial information on the properties of the reservoir rocks.

ACKNOWLEDGMENTS

We are grateful to the oil-subcommittee of the Abu-Dhabi National Oil Company (ADNOC) and its operating companies (OPCOs) for sponsoring this project and providing the data. Special thanks to Marc Michel Durandeau and to all the committee members for their input and encouragement. The waveform tomography code was provided by Gerhard Pratt.
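The time damping of equation 6, the logarithmic residual and the frequency grouping described in the waveform inversion strategy can be illustrated with a short sketch. It is a simplified illustration under stated assumptions, not the waveform tomography code used in the study: the function names, the synthetic spike trace and the non-overlapping grouping of frequencies are all hypothetical choices.

```python
import cmath
import math


def laplace_fourier(trace, dt, tau, freq_hz):
    """Discrete form of equation 6:
    sum over samples of u(t) * exp(-t/tau) * exp(-i*omega*t) * dt."""
    omega = 2.0 * math.pi * freq_hz
    total = 0j
    for k, u in enumerate(trace):
        t = k * dt
        total += u * math.exp(-t / tau) * cmath.exp(-1j * omega * t) * dt
    return total


def log_residual(observed, predicted):
    """Logarithmic residual: log(observed) - log(predicted).
    The principal branch of the complex logarithm is used (no phase unwrapping)."""
    return cmath.log(observed) - cmath.log(predicted)


def frequency_groups(f_min, f_max, step, group_size):
    """Split the frequencies f_min..f_max (spacing `step`) into consecutive,
    non-overlapping groups; non-overlap is an assumption of this sketch."""
    n = int(round((f_max - f_min) / step)) + 1
    freqs = [f_min + i * step for i in range(n)]
    return [freqs[i:i + group_size] for i in range(0, n, group_size)]


# Hypothetical 2 s trace with one late arrival (unit spike at about 1.5 s).
dt = 0.004
trace = [0.0] * 501
trace[375] = 1.0

for tau in (0.4, 0.8, 1.2, 1.6):
    amplitude = abs(laplace_fourier(trace, dt, tau, 10.0))
    print(f"tau = {tau} s: damped amplitude of the late arrival = {amplitude:.3e}")

groups = frequency_groups(4.0, 50.0, 0.5, 5)
print("number of groups:", len(groups), "first group:", groups[0])
```

The loop shows the behavior relied on in the inversion: the smaller the value of τ, the more strongly the late arrival is suppressed.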

EDITED REFERENCES
REFERENCES
Aldridge, D. F., and D. Oldenburg, 1993, Two-dimensional tomographic inversion with finite-difference
traveltimes: Journal of Seismic Exploration, 2, 257–274.
Barnes, C., and M. Charara, 2009, Viscoelastic full waveform inversion of north sea offset vsp data: 79th
http://dx.doi.org/abs/10.1190/1.3255315.
Bednar, J. B., C. Shin, and S. Pyun, 2007, Comparison of waveform inversion, part 2: Phase approach:
Geophysical Prospecting, 55, 465–475, http://dx.doi/10.1111/j.1365-2478.2007.00618.x.
Bouchaala, F., M. Ali, and J. Matsushima, 2016, Estimation of seismic attenuation in carbonate rocks
using three different methods: Application on VSP data from Abu Dhabi oilfield: Journal of
Applied Geophysics, 129, 79–91, https://doi.org/10.1016/j.jappgeo.2016.03.014
Brenders, A. J., and R. G. Pratt, 2007, Full waveform inversion tomography for lithospheric imaging:
results from a blind test in a realistic crustal model: Geophysical Journal International, 168, 133–
151, https://doi/10.1111/j.1365-246X.2006.03156.x.
Geophysics, 60, 1457–1473. https://doi.org/10.1190/1.1443880
Gao, F., A. Levander, R. Pratt, C. Zelt, and G.-L. Fradelizio, 2007, Waveform tomography at a
groundwater contamination site: Surface reflection data: Geophysics, 72, no. 5, G45–G55,
https://org/doi/abs/10.1190/1.2752744.
Gardner, G. H. F., L. W. Gardner, and A. R. Gregory, 1974, Formation velocity and density: The diagnostic
basics for stratigraphic traps: Geophysics, 39, 770–780, https://org/doi/abs/10.1190/1.1440465.
Kamei, R., and R. Pratt, 2008, Waveform tomography strategies for imaging attenuation structure for
cross-hole data: 70th Annual International Conference and Exhibition, EAGE, Extended
Abstracts, F019, https://org/doi.10.3997/2214-4609.20147680.
Kamei, R., R. G. Pratt, and T. Tsuji, 2013, On acoustic waveform tomography of wide-angle obs data –
strategies for pre-conditioning and inversion: Geophysical Journal International, 194, 1250–1280,
Kamei, R., R. G. Pratt, and T. Tsuji, 2014, Misfit functionals in laplace-fourier domain waveform
inversion, with application to wide-angle ocean bottom seismograph data: Geophysical
Prospecting, 62, 1054–1074, https://doi/10.1111/1365-2478.12127.
Liebermann, R. C., and C. H. Sondergeld, 1994, Experimental techniques in mineral and rock physics:
The schreiber volume: Birkhauser Verlag AG.
Malinowski, M., S. Operto, and A. Ribodetti, 2011, High-resolution seismic attenuation imaging from
wide-aperture onshore data by visco-acoustic frequency-domain full-waveform inversion:
Geophysical Journal International, 186, 1179–1204. https://doi/10.1111/j.1365-
246X.2011.05098.x.
Mothi, S., and R. Kumar, 2014, Detecting and estimating anisotropy errors using full waveform inversion
and ray-based tomography: A case study using long-offset acquisition in the gulf of mexico: 84th
http://doi/abs/10.1190/segam2014-0324.1.
Operto, S., J. Virieux, J. X. Dessa, and G. Pascal, 2006, Crustal seismic imaging from multifold ocean
bottom seismometer data by frequency domain full waveform tomography: Application to the
eastern-Nankai trough: Journal of Geophysical Research, 111, B09306,
http://doi/10.1029/2005JB003835.
Pratt, R., F. Hou, K. Bauer, and M. H. Weber, 2004, Waveform tomography images of velocity and
inelastic attenuation from the Mallik 2002 crosshole seismic surveys; in scientific results from the
Mallik 2002 gas hydrate production research well program, in S. R. Dallimore and T. S. Collet,
eds: Geological Survey of Canada, 545, 14.

Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, part 1: Theory and verification
in a physical scale model: Geophysics, 64, 888–901, http://org/doi/abs/10.1190/1.1444597.
Pratt, R. G., C. Shin, and G. J. Hicks, 1998, Gauss-Newton and full Newton methods in frequency space
http://doi/10.1046/j.1365-246X.1998.00498.x.
Quan, Y., and J. M. Harris, 1997, Seismic attenuation tomography using the frequency shift method:
Geophysics, 62, 895–905, http://seg.org/doi/abs/10.1190/1.1444197.
Shin, C., and Y. H. Cha, 2009, Waveform inversion in the laplace-fourier domain: Geophysical Journal
International, 177, 1067–1079, http://doi/10.1111/j.1365-246X.2009.04102.x.
Shin, C., and D. J. Ming, 2006, Waveform inversion using a logarithmic wavefield: Geophysics, 71, no.
3, R31–R42, http://org/doi/abs/10.1190/1.2194523.
Sirgue, L., O. Barkved, J. Dellinger, J. Etgen, U. Albertin, and J. Kommedal, 2010, Full waveform
inversion: The next leap forward in imaging at valhall: First Break, 28, 65–70.
Sirgue, L., and R. G. Pratt, 2003, Waveform inversion under realistic conditions: Mitigation of non-
linearity: 73rd Annual International Meeting, SEG, Expanded Abstracts, 22, 694–697,
http://seg.org/doi/abs/10.1190/1.1844073.
temporal frequencies: Geophysics, 69, 231–248, http://org/doi/abs/10.1190/1.1649391.
Takougang, E. M. T., and A. J. Calvert, 2013, Seismic waveform tomography across the seattle fault zone
in puget sound: Resolution analysis and effectiveness of visco-acoustic inversion of viscoelastic
data: Geophysical Journal International, 193, 763–787.
Takougang, E. T., and Y. Bouzidi, 2016, Application of fullwaveform tomography to vsp walkaway data:
ASEG Extended Abstracts 2016: 25th International Geophysical Conference and Exhibition,
484–487.
Wang, Y., and Y. Rao, 2006, Crosshole seismic waveform tomography-1. strategy for real data
application: Geophysical Journal International, 166, 1224–1236, http://doi/10.1111/j.1365-
246X.2006.03030.x.
Yang, D., M. Fehler, A. Malcolm, and L. Huang, 2011, Carbon sequestration monitoring with acoustic
double difference waveform inversion: A case study on sacroc walkaway vsp data: 81th Annual
http://seg.org/doi/abs/10.1190/1.3628099.
Zhou, B., and S. Greenhalgh, 2003, Crosshole seismic inversion with normalized full-waveform
amplitude data: Geophysics, 68, 1320–1330. https://library.seg.org/doi/abs/10.1190/1.1598125.

Full-3-D waveform inversion with near-surface ambient-noise data based on discontinuous
Galerkin method
Wei Wang*, Po Chen, Ian S. Keifer, Ken G. Dueker, Department of Geology and Geophysics, University of Wyoming
Summary

We successfully applied Full-3-D tomography (F3DT) based on the adjoint-wavefield method to passive-source ambient-noise data over a critical zone site in the Blair Wallis Watershed, Wyoming. The aim of this study is to image the Critical Zone in three dimensions in order to better understand its architecture and associated processes. In the forward modeling, we utilize a 3-D elastic wave equation discontinuous Galerkin (DG) solver to simulate wave propagation within a model volume, which is constructed from tetrahedral elements with variable size determined by the starting model velocity as well as the desired resolution. We applied the adjoint-wavefield method to calculate sensitivity (Fréchet) kernels using the frequency-dependent group delay misfit between model-predicted waveforms and field data.

Introduction

The critical zone is the outermost layer of the solid earth that extends from the deepest reach of the groundwater chemical reactions to the top of the vegetation canopy (Anderson et al., 2007; Brantley et al., 2007). It is a highly dynamic layer that hosts a wide variety of physical, chemical, hydrological and biological processes. These processes and the interactions among them create and transform the environment that sustains agriculture and most terrestrial life. To understand these processes and to build predictive quantitative models that can accurately describe them requires us to be able to characterize the structure of the critical zone across its full depth range, from the soil at the surface and shallow depths to the saprolite, regolith and fractured bedrock, which may extend to depths of tens to hundreds of meters and are often difficult to access directly. Drilling and coring is expensive and only provides direct spot measurements of the deep critical zone (DCZ), which may not be representative of the whole area, especially in areas with strong lateral heterogeneities, where geostatistical interpolation may fail to provide an adequate representation. Geophysical imaging techniques (GITs) are minimally invasive, relatively inexpensive and provide indirect estimates of physical properties over large areas quickly. Thus, GITs are highly useful complements to drilling/coring, especially when they can be calibrated with direct measurements, and they can provide crucial guidance for selecting the most useful sites for excavation and sampling efforts (Riebe & Chorover, 2013).

A variety of GITs have been applied to image the DCZ and each has its own strengths and limitations (Parsekian et al., 2015). In practice, by combining results from multiple types of GITs, we can potentially obtain a more robust interpretation. It has been shown that images of subsurface seismic velocities (i.e., physical quantities related to density and elastic moduli) obtained from seismic refraction surveys (SRS) using ray-theoretic travel-time tomography (RTT) correlate well with the degree of weathering in the DCZ and can be used to delineate the weathering interfaces (WI) (or reaction zone, depending upon the observation scale) separating soil, weathered bedrock and fresh crystalline bedrock (e.g., Hunter et al., 1984; Dasios et al., 1999; Befus et al., 2011; Holbrook et al., 2014). The architecture of the WI is an important indicator of the balance between the rates of weathering and erosion and often controls the flow and storage of subsurface fluids (e.g., Drake et al., 2009; Brantley et al., 2011; Holbrook et al., 2014). It is therefore critical to be able to image not only the overall geometry but also the internal structure of the WI. The resolution of the images obtained from SRS using first-arrival RTT is typically on the order of tens of meters, which is comparable to or even larger than the thicknesses of some WI (Brantley et al., 2011). Our preliminary results (Wang et al., 2016) strongly indicate that the application of F3DT is capable of improving the resolution of seismic velocity images to (sub)meter-scale.

By applying F3DT to image the DCZ with the passive-source ambient-noise seismic waveform data, we will provide high-resolution, three-dimensional seismic velocity images that will lead to a better understanding of the DCZ weathering processes.

Field site and data acquisition

The field site is located in the Laramie Range, approximately 21 km southeast of Laramie (Fig. 1). The three arrays (Fig. 1) were deployed in the Blair-Wallis Watershed during the summer of 2015. Green, red, and yellow dots show locations of the Blair Wallis 1, 2, and 3 nodal arrays, respectively. Data from the Blair Wallis 1 array are presented in this work. Stars indicate the locations of borehole wells, on which the three arrays were centered for ground truthing. Each array is roughly 200 m by 200 m. Each receiver recorded the ambient-noise field continuously for 3-4 days. The ambient-noise Green's Function (ANGF) is extracted from the raw ambient-noise data by seismic interferometry processing (e.g. Bensen et al., 2007). A scalable parallel algorithm for seismic
interferometry (pSIN) (Chen et al., 2016) is employed to efficiently speed up the workflow.

[Figure 1 map: both axes in meters.]

Figure 1: A map of the study area and the ambient-noise data array

Methods

F3DT is in essence an iterative numerical optimization algorithm, in which we search for an optimal subsurface seismic structure model that can minimize the waveform discrepancies between model-predicted (i.e., synthetic) seismograms and observed seismograms. The seismic structure model is usually represented using (visco)elastic parameters, such as elastic moduli, density and seismic velocities (Vp and Vs). These parameters are 3D continuous functions of the space coordinate and can in general be discretized using a 3D spatial mesh. During optimization we search for the optimal discrete representation of the structural parameters.

To initiate the iterative optimization algorithm, we need a "starting model", an initial estimate of the discretized structure model, which does NOT have to be very close to the optimal structure model that is to be found. Synthetic seismograms can then be computed using the starting model by solving the elastodynamic equations with the discontinuous Galerkin method. The discrepancies between synthetic and observed waveforms can be used to make corrections to the starting model. These corrections are often called "perturbations". The mathematical instrument that converts waveform discrepancies to model perturbations is known as the "sensitivity (Fréchet) kernel", which in essence is the derivative (in a generalized sense) of the waveform discrepancy with respect to the structure model, evaluated at the structure model used for computing the synthetic seismograms. After applying the perturbations to the starting model, an updated structure model is obtained, which can then be used for computing updated synthetic seismograms, computing updated sensitivity kernels and making new corrections to the updated structure model. The entire procedure is repeated until no more corrections can be made.

order accuracy on fully unstructured tetrahedral meshes, which is suitable to deal with geological structure models with complex geometries associated with topography, faults and other types of structural discontinuities. A detailed description of the F3DT algorithm and its implementation is provided in Lee et al. (2014).

Discontinuous Galerkin (DG) solution of elastodynamic equations.

The DG method is particularly suitable for solving elastodynamic equations in a complex geological model, such as the highly heterogeneous critical zone. The advantage of the DG scheme over the spectral-element scheme is that the DG solution is allowed to be discontinuous across element boundaries. For the Blair Wallis 1 data array, we constructed a 3D tetrahedral mesh with elements adapting to both the spatial variations of seismic velocities and the desired resolution (Fig. 2). The 1D starting model (Fig. 3b), extracted by averaging the velocity model from ambient-noise data analysis, was extended by interpolation onto our 3D tetrahedral mesh. An ANGF envelope data example from one station is shown in Fig. 3a. Because velocity in the starting model generally increases with depth, wavelengths at shallower depths are much shorter. Hence, we used tetrahedral elements with smaller size near the surface (4 m at the ground surface) than at larger depth (10 m at about 90 m depth). As the seismic velocity model is updated from iteration to iteration, the tetrahedral mesh can be updated accordingly to adapt to the changing velocity model.

Figure 2. A perspective view of the tetrahedral mesh used in our DG simulations. The top surface of the mesh conforms to local topography derived from Lidar data. Locations of array stations are indicated using green dots on the surface.

Adjoint Kernels

To capture the rich information in waveforms without introducing significant nonlinearity into our optimization process, we quantify waveform discrepancies using frequency-dependent phase and amplitude misfits extracted from synthetic and observed waveforms that are localized in the time-frequency domain (Lee & Chen, 2013). The waveform measurements were picked automatically with a guided signal window (Fig. 3a) and confirmed by several criteria such as signal-to-noise ratio. The sensitivity (Fréchet) kernels for the waveform misfit measurements with respect to structure parameters were computed using the adjoint-wavefield method. Intuitively, the sensitivity (Fréchet) kernels tell us how to change our structure model in order to reduce the discrepancies between synthetic and observed waveforms quantified using a specific misfit measure. An example of adjoint kernels with different measurements and frequencies is shown in Fig. 4. Those frequency-dependent sensitivity kernels are then applied to update the model and in turn improve the waveform group delay misfits.

Figure 3. a) ANGF envelope data of station 400 ordered by increasing offset. The thick white line is the guidance window for waveform extraction; b) 1D starting model extracted from ambient-noise data analysis

Figure 4. Examples of adjoint kernels for a surface wave along the hill slope. a), c): cross-section views perpendicular to the source-receiver plane; b), d) map views at 2 m below topography. a)-b) kernel with negative phase-delay measurement; c)-d) kernel with positive phase-delay measurement. Warm and cool colors indicate negative and positive perturbations to S-velocity, respectively.

Waveform Inversion

Starting with the 1D velocity model, we used a bandpass filter with corner frequencies of 15 and 25 Hz as the source time function to calculate synthetic waveforms. Nearly 10000 waveforms were chosen for waveform misfit with confidence and sufficient data coverage over the entire modeling volume. We have completed the third iteration of the waveform inversion. The total waveform misfit drops by 45.7% from the starting model to the resulting model, as shown in the misfit distribution diagram (Fig. 5), and the resulting velocity model at different depths relative to topography is shown in Fig. 6.

[Figure 5 histogram: horizontal axis Group Delay (s), -0.5 to 0.5; vertical axis Measurement count (x 10^4); legend: Starting Model, 3rd-iteration Model.]

Figure 5. Group Delay misfit distribution

Conclusions

We have successfully applied Full-3-D tomography based on the adjoint-wavefield method to image the deep Critical Zone structure with passive-source ambient-noise data. Nearly 10000 waveforms were inverted to construct the near-surface velocity structure. The resulting 3D S-velocity model reveals small-scale heterogeneities of the near-surface structure at high resolution. We will continue to carry out further waveform inversion iterations in order to improve the current model and thus obtain a reliable picture of the Critical Zone architecture and its relation to the associated processes.
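The iterative update cycle described in the Methods section (compute synthetics, measure the misfit, convert the misfit to a model perturbation, repeat) can be sketched on a toy linear problem. This is only an illustration under strong assumptions: the DG wave-equation solver and the adjoint kernels of F3DT are replaced here by an explicit straight-ray travel-time operator and its gradient, and all names, path lengths and slowness values are hypothetical.

```python
def forward(model, paths):
    """Predicted travel times: path length in each cell times cell slowness."""
    return [sum(length * s for length, s in zip(row, model)) for row in paths]


def misfit(model, paths, observed):
    """Least-squares misfit between predicted and observed travel times."""
    predicted = forward(model, paths)
    return 0.5 * sum((p - o) ** 2 for p, o in zip(predicted, observed))


def gradient(model, paths, observed):
    """Derivative of the misfit with respect to each cell slowness
    (a toy stand-in for a sensitivity kernel)."""
    residual = [p - o for p, o in zip(forward(model, paths), observed)]
    return [sum(row[j] * r for row, r in zip(paths, residual))
            for j in range(len(model))]


def invert(start, paths, observed, step=0.005, tol=1e-12, max_iter=10000):
    """Repeat the update cycle until the correction becomes negligible."""
    model = list(start)
    for _ in range(max_iter):
        update = [-step * g for g in gradient(model, paths, observed)]
        model = [m + u for m, u in zip(model, update)]
        if max(abs(u) for u in update) < tol:
            break
    return model


# Hypothetical setup: three rays crossing two cells (path lengths in m).
paths = [[10.0, 0.0], [0.0, 10.0], [6.0, 8.0]]
true_model = [0.002, 0.001]        # slowness in s/m (500 m/s and 1000 m/s)
observed = forward(true_model, paths)
start = [0.0015, 0.0015]           # starting model, not close to the answer

result = invert(start, paths, observed)
reduction = 100.0 * (1.0 - misfit(result, paths, observed)
                     / misfit(start, paths, observed))
print("recovered slowness:", result)
print("misfit reduction (%):", reduction)
```

In this linear toy the loop converges to the true model; in the real nonlinear problem each pass requires new wavefield simulations and new kernels, which is why only a few iterations are typically reported.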

Figure 6. F3DT velocity model at different depth related to topography

EDITED REFERENCES
REFERENCES
Anderson, S. P., F. von Blanckenburg, and A. F. White, 2007, Physical and chemical controls on the
critical zone: Elements, 3, 315–319, http://dx.doi.org/10.2113/gselements.3.5.315.
Befus, K. M., A. F. Sheehan, M. Leopold, S. P. Anderson, and R. S. Anderson, 2011, Seismic constraints
on critical zone architecture, Boulder Creek watershed, Front Range, Colorado: Vadose Zone
Journal, 10, 1342, http://dx.doi.org/10.2136/vzj2010.0108er.
Bensen, G. D., M. H. Ritzwoller, M. P. Barmin, A. L. Levshin, F. Lin, M. P. Moschetti, N. M. Shapiro,
and Y. Yang, 2007, Processing seismic ambient noise data to obtain reliable broad-band surface
wave dispersion measurements: Geophysical Journal International, 169, 1239–1260,
Brantley, S. L., M. B. Goldhaber, and K. V. Ragnarsdottir, 2007, Crossing disciplines and scales to
understand the critical zone: Elements, 3, 307–314, http://dx.doi.org/10.2113/gselements.3.5.307.
Brantley, S. L., H. Buss, M. Lebedeva, R. C. Fletcher, and L. Ma, 2011, Investigating the complex
interface where bedrock transforms to regolith: Applied Geochemistry, 26, S12–S15,
http://dx.doi.org/10.1016/j.apgeochem.2011.03.017.
Chen P., N. J. Taylor, K. G. Dueker, I. S. Keifer, A. K. Wilson, C. L. McGuffy, C. G. Novitsky, A. J.
Spears, and W. S. Holbrook, 2016, pSIN: a scalable, Parallel algorithm for Seismic
INterferometry of large-N ambient-noise data: Computers & Geosciences, 93, 88–95,
http://dx.doi.org/10.1016/j.cageo.2016.05.003.
Dasios, A., C. McCann, T. R. Astin, D. M. McCann, and P. Fenning, 1999, Seismic imaging of the
shallow subsurface: shear-wave case histories: Geophysical Prospecting, 47, 565–591,
http://dx.doi.org/10.1046/j.1365-2478.1999.00138.x.
Drake, H., E. Tullborg, and A. B. MacKenzie, 2009, Detecting the near-surface redox front in crystalline
bedrock using fracture mineral distribution, geochemistry and U-series disequilibrium: Applied
Geochemistry, 24, 1023–1039, http://dx.doi.org/10.1016/j.apgeochem.2009.03.004.
Holbrook, W. S., C. S. Riebe, M. Elwaseif, J. L. Hayes, K. Basler-Reeder, D. L. Harry, A. Malazian, A.
Dosseto, P. C. Hartsough, and J. W. Hopmans, 2014, Geophysical constraints on deep weathering
and water storage potential in the Southern Sierra Critical Zone Observatory: Earth Surface
Processes and Landforms, 39, 366–380, http://dx.doi.org/10.1002/esp.3502.
Hunter, J. A., S. E. Pullan, R. A. Burns, R. M. Gagne, and R. L. Good, 1984, Shallow seismic reflection
mapping of the overburden–bedrock interface with the engineering seismograph; some simple
techniques: Geophysics, 49, 1381–1385, http://dx.doi.org/10.1190/1.1441766.
Lee, E. J., P. Chen, T. H. Jordan, P. B. Maechling, M. A. Denolle, and G. C. Beroza, 2014, Full-3-D
tomography for crustal structure in southern California based on the scattering-integral and the
adjoint-wavefield methods: Journal of Geophysical Research: Solid Earth, 119, 6421–6451 .
Parsekian, A. D., K. Singha, B. J. Minsley, W. S. Holbrook, and L. Slater, 2015, Multiscale geophysical
imaging of the critical zone: Reviews of Geophysics, 53, 1–26,
http://dx.doi.org/10.1002/2014RG000465.
Riebe, C. S. and J. Chorover, 2013, Report on drilling, sampling, and imaging the depths of the critical
zone, an NSF workshop.

Wang, W., P. Chen, and W. S. Holbrook, 2016, Near-surface adjoint tomography based on the
discontinuous Galerkin method, 86th Annual International Meeting, SEG, Expanded Abstracts,
2387–2392, http://dx.doi.org/10.1190/segam2016-13878688.1.

Multiscale Phase Inversion of Seismic Marine Data
Lei Fu
King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.
SUMMARY
We test the feasibility of applying multiscale phase inversion (MPI) to seismic marine data. To avoid cycle-skipping, the multiscale strategy temporally integrates the traces several times, i.e. high-order integration, to produce low-boost seismograms that are used as input data for the initial iterations of MPI. As the iterations proceed, higher frequencies in the data are boosted by using integrated traces of lower order as the input data. Tests with synthetic data and field data from the Gulf of Mexico produce robust and accurate results if the model does not contain strong velocity contrasts such as salt-sediment interfaces.

INTRODUCTION
One of the most significant problems with full waveform inversion (FWI) is the cycle-skipping problem (Virieux and Operto, 2009; Warner et al., 2013; Warner and Guasch, 2014), where an iterative solution gets stuck in a local minimum. Another problem is that the amplitudes of the predicted traces do not fully match those of the observed data because not all of the actual physics is used in computing the predicted traces. To mitigate both problems, Sun and Schuster (1993) proposed a multiscale phase inversion (MPI) method. To avoid cycle-skipping, the multiscale strategy temporally integrates the traces several times to produce low-boost seismograms that are used as input data for the initial iterations of MPI. To avoid the necessity of exactly predicting amplitudes, only the phase of the seismic data is predicted and the amplitude information is largely ignored. The penalty for not matching amplitudes is a moderate loss in resolution in the velocity tomogram. Sun and Schuster (1993) demonstrated the feasibility of this method by applying it to synthetic crosswell data. We now test the MPI strategy on both synthetic data and field data recorded in a marine seismic survey. The next section describes the theory of MPI, which is then followed by the numerical results section. Both synthetic data and field data are inverted with the MPI strategy for sediments with moderate velocity contrasts. The final section presents the summary.

THEORY
For phase inversion, we replace the magnitude spectrum of a calculated trace with the magnitude spectrum of the corresponding observed trace so that the amplitude strengths of the two traces are equalized. The predicted and observed traces are Fourier transformed to obtain the magnitude $A$ and phase $\phi$ spectra, where

$$F[p(g,t;s)_{cal}] = A(g,\omega;s)_{cal}\, e^{i\phi(g,\omega;s)_{cal}}, \tag{1}$$

$$F[p(g,t;s)_{obs}] = A(g,\omega;s)_{obs}\, e^{i\phi(g,\omega;s)_{obs}}. \tag{2}$$

Here, $s$ is the location of the source, and $g$ is the location of the geophone for a monochromatic source at frequency $\omega$. The modified traces $\bar{p}(g,t;s)_{cal}$ are obtained by replacing $A(g,\omega;s)_{cal}$ with $A(g,\omega;s)_{obs}$ and performing the inverse Fourier transform,

$$\bar{p}(g,t;s)_{cal} = F^{-1}\left\{ L(\omega)\, A(g,\omega;s)_{new}\, e^{i\phi(g,\omega;s)_{cal}} \right\}, \tag{3}$$

$$\bar{p}(g,t;s)_{obs} = F^{-1}\left\{ L(\omega)\, A(g,\omega;s)_{obs}\, e^{i\phi(g,\omega;s)_{obs}} \right\}, \tag{4}$$

where $A(g,\omega;s)_{new} = A(g,\omega;s)_{obs}$ and $L(\omega)$ is a low-pass filter applied to the data.

Misfit function. The modified predicted and observed traces are time-integrated and their residuals are used for the MPI misfit function,

$$\varepsilon_{mpi} = \sum_{s,g} \int dt \left[ I^n \bar{p}(g,t;s)_{cal} - I^n \bar{p}(g,t;s)_{obs} \right]^2, \tag{5}$$

where $I^n$ is an integration operator $I \equiv \int dt$ performed $n$ times, and $\bar{p}(g,t;s)_{cal}$ and $\bar{p}(g,t;s)_{obs}$ are the modified traces in equations (3) and (4). If we set $A(g,\omega;s)_{new} = A(g,\omega;s)_{cal}$ in equation (3), then the MPI misfit function becomes that for full wave inversion, except that the traces have been modified by the filter $L(\omega)$ and the integration operator.

Gradient. The gradient of the MPI misfit function $\varepsilon_{mpi}$ w.r.t. the velocity field $c(x)$ is

$$\gamma_{mpi}(x) = \frac{\partial \varepsilon_{mpi}}{\partial c(x)} = \frac{1}{c(x)^3} \sum_{s} \int dt \left[ I^n \dot{\bar{p}}(x,t;s) \right] \left[ I^n \dot{\bar{p}}'(x,t;s) \right], \tag{6}$$

where the dot means time differentiation, $\bar{p}(x,t;s)$ is the pressure wavefield generated by the source at $s$, and $\bar{p}'(x,t;s)$ is the wavefield computed by backprojecting the seismogram residual $\delta\bar{p}$ (Luo and Schuster, 1991),

$$\bar{p}'(x,t;s) = \sum_{r} g(x,-t;g,0) \ast \delta\bar{p}, \tag{7}$$

with

$$\delta\bar{p} = \bar{p}(g,t;s)_{obs} - \bar{p}(g,t;s)_{cal}. \tag{8}$$

In theory, the integration operator $I^n$ on the fields $\bar{p}$ and $\bar{p}'$ in equation 6 can alternatively be applied to the source functions that generate them without changing the misfit gradient. For example, $I^n \dot{\bar{p}}(x,t;s)$ is equivalent to generating a wavefield using a source wavelet with $I^{n-1}$ integrations.
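The trace modification in equations 3 and 4 and the single-trace form of the misfit in equation 5 can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the author's implementation: the function names (`ricker`, `phase_only_trace`, `integrate_n`, `mpi_misfit`), the use of a cumulative sum for the integration operator, and the test wavelets are all assumptions made here for the example.

```python
import numpy as np


def ricker(f0, t):
    """Zero-phase Ricker wavelet with peak frequency f0 (Hz) at times t (s)."""
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


def phase_only_trace(p_phase, p_amp, lowpass=None):
    """Equations 3 and 4: keep the phase of p_phase, take the magnitude from p_amp."""
    spec_phase = np.fft.rfft(p_phase)
    spec_amp = np.fft.rfft(p_amp)
    if lowpass is None:
        lowpass = np.ones(spec_amp.size)  # L(omega) = 1, i.e. no filtering
    modified = lowpass * np.abs(spec_amp) * np.exp(1j * np.angle(spec_phase))
    return np.fft.irfft(modified, n=p_phase.size)


def integrate_n(trace, dt, n):
    """Apply a running time integral (cumulative sum times dt) n times."""
    for _ in range(n):
        trace = np.cumsum(trace) * dt
    return trace


def mpi_misfit(p_cal, p_obs, dt, n, lowpass=None):
    """Equation 5 for a single source-receiver pair."""
    cal_bar = phase_only_trace(p_cal, p_obs, lowpass)
    obs_bar = phase_only_trace(p_obs, p_obs, lowpass)
    residual = integrate_n(cal_bar, dt, n) - integrate_n(obs_bar, dt, n)
    return float(np.sum(residual ** 2) * dt)


# Hypothetical traces: a pure amplitude error and a pure time shift.
dt = 0.004
t = np.arange(500) * dt
p_obs = ricker(15.0, t - 1.0)
p_scaled = 0.3 * p_obs               # same phase, wrong amplitude
p_late = ricker(15.0, t - 1.04)      # same amplitude, wrong phase
```

In this sketch a predicted trace that differs from the observed one only by a scale factor gives an essentially zero MPI misfit, while a time-shifted trace does not, which is the behavior the phase-only formulation is meant to have.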
For comparison, the traditional full wave inversion (FWI) gradient is

$$\gamma_{fwi}(x) = \frac{\partial \varepsilon_{fwi}}{\partial c(x)} = \frac{1}{c(x)^3} \sum_{s} \int dt \left[ \dot{p}(x,t;s) \right] \left[ \dot{p}'(x,t;s) \right], \tag{9}$$

where the FWI gradient is the dot product between the forward-modeled source wavefield and the backprojected wavefield with the data residual $\delta p = p(g,t;s)_{obs} - p(g,t;s)_{cal}$.

Multiscale frequency strategy. The data should be band-pass filtered into different frequency bands with different peak frequencies, and then the FWI or MPI method is applied to low-frequency data at the early iterations and to high-frequency data at later iterations. A low-pass Wiener filter (Boonyasiriwat et al., 2009) can be computed by

$$L_{wiener}(\omega) = \frac{W_{target}(\omega)\, W_{original}^{\dagger}(\omega)}{|W_{original}(\omega)|^2 + \epsilon^2}, \tag{10}$$

where $L_{wiener}(\omega)$ is the Wiener filter, $W_{original}(\omega)$ is the original wavelet, $W_{target}(\omega)$ is the target wavelet, $\dagger$ denotes complex conjugate, $\omega$ is the angular frequency, and $\epsilon$ is a damping factor to prevent numerical instability. One formula for choosing optimal frequency bands proposed by Sirgue and Pratt (2004) is

$$f_{n+1} = \frac{f_n}{\alpha_{min}}, \tag{11}$$

where $f_n$ is the current frequency, $f_{n+1}$ is the next frequency to be chosen, and $\alpha_{min} = z/\sqrt{h^2 + z^2}$ is the parameter that depends on the maximum half offset $h$ and the maximum depth $z$ to be imaged.

Figure 1: a) The Marmousi velocity model, b) the Q model used for generating visco-acoustic data.

Figure 2: Synthetic seismic data, a) constant density acoustic data, b) visco-acoustic data, and c) elastic data, for the shot location at x = 0 km. All subplots have the same scale.
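As a rough illustration of equations 10 and 11, the sketch below builds the Wiener filter from two wavelets and generates a frequency schedule. The function names, the Ricker wavelets, the damping value, and the stopping rule at a maximum frequency are assumptions introduced for this example and are not taken from the paper.

```python
import numpy as np


def ricker(f0, t):
    """Zero-phase Ricker wavelet with peak frequency f0 (Hz) at times t (s)."""
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


def wiener_lowpass(w_original, w_target, eps):
    """Equation 10: filter that shapes the original wavelet into the target wavelet."""
    W_o = np.fft.rfft(w_original)
    W_t = np.fft.rfft(w_target)
    return W_t * np.conj(W_o) / (np.abs(W_o) ** 2 + eps ** 2)


def next_frequency(f_n, half_offset, depth):
    """Equation 11 with alpha_min = z / sqrt(h^2 + z^2)."""
    alpha_min = depth / np.sqrt(half_offset ** 2 + depth ** 2)
    return f_n / alpha_min


def frequency_schedule(f_start, f_max, half_offset, depth):
    """Repeat equation 11 until the next frequency would exceed f_max."""
    if half_offset <= 0 or depth <= 0:
        raise ValueError("half_offset and depth must be positive")
    freqs = [f_start]
    while True:
        f_next = next_frequency(freqs[-1], half_offset, depth)
        if f_next > f_max:
            return freqs
        freqs.append(f_next)


# Hypothetical example: shape a 15-Hz wavelet into a 5-Hz wavelet.
dt = 0.002
t = np.arange(1000) * dt - 1.0
w15 = ricker(15.0, t)
w5 = ricker(5.0, t)
L = wiener_lowpass(w15, w5, eps=1e-3)
shaped = np.fft.irfft(L * np.fft.rfft(w15), n=w15.size)
```

With a maximum half offset of 3 km and a maximum depth of 4 km, alpha_min is 0.8, so a 3-Hz starting frequency would be followed by 3.75 Hz, then about 4.69 Hz, and so on.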
NUMERICAL RESULTS
To demonstrate the effectiveness of MPI and its advantages, we invert one synthetic data set from the Marmousi model and marine data from the Gulf of Mexico. The modeling kernels are based on numerical solutions to the constant-density acoustic wave equation, while the observed input data are generated by solving the constant-density acoustic wave equation (Alford et al., 1974), visco-acoustic equation (Operto et al., 2007) or elastic equation (Levander, 1988) in the synthetic cases.

Marmousi Model
The Marmousi model (Figure 1a) is discretized into a 284 × 461 gridded velocity model with spacing of 10 m in both directions. There are 116 point sources spaced at an interval of 40 m on the free surface and 461 receivers separated by a 10 m interval along the free surface.
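The acquisition numbers quoted for the Marmousi test are mutually consistent, which a short check makes explicit. The sketch assumes that the first source and the first receiver both sit at x = 0 and that 284 is the number of depth samples; neither is stated in the text.

```python
import numpy as np

# Grid and acquisition parameters quoted in the text.
nz, nx, dx = 284, 461, 10.0           # samples in z, samples in x, spacing (m)
n_src, src_dx = 116, 40.0             # number of sources, source interval (m)
n_rec, rec_dx = 461, 10.0             # number of receivers, receiver interval (m)

# Assumption: both lines start at x = 0 m.
src_x = np.arange(n_src) * src_dx
rec_x = np.arange(n_rec) * rec_dx

model_width = (nx - 1) * dx           # 4600 m
model_depth = (nz - 1) * dx           # 2830 m
```

Under these assumptions the last source and the last receiver both fall at 4600 m, which is exactly the lateral extent of the 461-column grid.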
Acoustic Data
The original acoustic data are generated by solving the constant density acoustic equation with a 15-Hz Ricker wavelet. A common shot gather for the source at (z, x) = (0, 0) m is shown in Figure 2a. Different bandpass filters are applied to the original data, and the frequency multiscale strategy is used for both the FWI and MPI methods.

Figure 3: Inversion results for acoustic data. a) The smoothed initial velocity model with an average velocity error of 12%, c) FWI and e) MPI tomograms based on the initial model a), g) velocity profile comparison for true (black), initial (blue), FWI (green) and MPI (red) tomograms; b) the v(z) initial velocity model with an average velocity error of 22%, d) FWI and f) MPI tomograms based on the initial model b), h) velocity profile comparison for true (black), initial (blue), FWI (green) and MPI (red) tomograms.

Figure 3a is the smoothed initial velocity model with an average velocity error of 12%, and Figure 3b is the v(z) initial velocity model with an average velocity error of 22%. The FWI and MPI tomograms with the smoothed initial model (Figure 3a) are shown in Figures 3c and 3e, respectively. The FWI and MPI tomograms with the v(z) initial model (Figure 3b) are shown in Figures 3d and 3f, respectively. Figures 3g and 3h show the velocity profile comparison for the true (black), initial (blue), FWI (green) and MPI (red) velocity models at different offsets. We can see that both the FWI and MPI tomograms are in good agreement with the true model when the initial model is not far away from the true model. However, when the initial model is far away from the true model, traditional FWI gets stuck in a local minimum, while MPI provides an accurate tomogram. Thus, the MPI method has a more robust convergence than FWI for this model.

Visco-acoustic Data
We now use visco-acoustic data as input traces to the acoustic FWI and MPI algorithms. The goal is to test the sensitivity of each method to the unmodeled attenuation effects in the data. The visco-acoustic data are generated by solving visco-acoustic equations, where the source wavelet and acquisition geometry are the same as in the acoustic case. A pressure source is injected in the water, and the pressure field is recorded. The true $v_p$ model is shown in Figure 1a and the Q model is shown in Figure 1b, where the minimum Q is 5. The visco-acoustic data for the shot location at (z, x) = (0, 0) m are shown in Figure 2b, where we can see that reflections are highly attenuated due to the highly attenuative medium. The acoustic FWI and MPI methods are applied to these synthetic visco-acoustic data, where the initial $v_p$ model is shown in Figure 3a. The FWI and MPI tomograms are shown in Figures 4a and 4c, respectively, and the velocity profile comparison at different offsets is shown in Figure 4e. It is found that MPI is slightly more accurate than FWI.

Figure 4: Inversion results for visco-acoustic and elastic data. a) FWI and c) MPI tomograms, e) velocity profile comparison for true (black), initial (blue), FWI (green) and MPI (red) tomograms at different offsets for visco-acoustic data; b) FWI and d) MPI tomograms, f) velocity profile comparison for true (black), initial (blue), FWI (green) and MPI (red) tomograms at different offsets for elastic data.

Elastic Data
We now use elastic data as input traces to the acoustic FWI and MPI algorithms. The goal is to test the sensitivity of each method to the unmodeled elastic effects in the data. The elastic data are generated by solving the elastic wave equation, where the source wavelet and acquisition geometry are the same as in the acoustic case. The pressure is injected in the water and the pressure field is recorded as the negative of the average of the normal stresses. The true $v_p$ model is shown in Figure 1a, the density is given by $\rho = 0.31\, v_p^{0.25}$, and $v_s = v_p/\sqrt{3}$, except that the shear velocity of the ocean water is set to 0 m/s. The elastic data at shot location (z, x) = (0, 0) m are shown in Figure 2c. We can see some converted waves which do not appear in the acoustic data. The FWI and MPI tomograms are shown in Figures 4b and 4d, respectively, and the corresponding velocity profiles are shown in Figure 4f. We can see that the MPI tomogram is moderately more accurate than the FWI tomogram.

Gulf of Mexico Data
The MPI method is applied to a streamer data set recorded in the Gulf of Mexico using 515 shots with a shot interval of 37.5 m, a time-sampling interval of 2 ms, a recording time of 10 s, and 480 hydrophones per shot. The hydrophone interval is 12.5 m, with the minimum and maximum source-receiver offsets of 198 m and 6 km, respectively. The v(z) velocity model shown in Figure 5a is used as the initial model for multiscale FWI and MPI. The initial velocity model is discretized into 402 × 3008 grids with a grid spacing of 6.25 m in both directions.

Figure 5: The a) v(z) initial model, b) multiscale FWI, and c) MPI tomograms for the Gulf of Mexico data.

Figures 5b and 5c depict the FWI tomogram after 26 iterations and the MPI tomogram after 52 iterations, respectively. Both the FWI and MPI tomograms have a higher resolution compared with the initial velocity model. In addition, the resolution of the MPI tomogram is slightly higher than that seen in the FWI tomogram. In order to verify the reconstructed FWI and MPI tomograms, we compare the migration images and angle domain common image gathers (ADCIGs).
The original data are migrated using reverse time migration (RTM) computed with the initial velocity model, the FWI tomogram and the MPI tomogram, and the results are shown in Figures 6a, 6b, and 6c, respectively. We see that the RTM images computed with the FWI and MPI tomograms are quite similar. The corresponding ADCIGs are shown in Figures 7a, 7b and 7c, respectively. Comparing the ADCIGs, we can see that the ADCIGs associated with the FWI and MPI tomograms are flatter than those from the initial velocity model, and that the ADCIGs (in the red box) from MPI are slightly flatter than those from FWI. Figure 8 shows the data similarity between the observed and predicted data. We find that the traces predicted from the MPI tomogram correlate better with the observed traces than those predicted from the FWI tomogram.

Figure 6: RTM migration images computed from a) initial velocity model, b) FWI tomogram, and c) MPI tomogram.

Figure 7: Angle domain common image gathers (ADCIGs) from −45° to 0° based on the a) initial model, b) FWI tomogram, and c) MPI tomogram.

Figure 8: The average correlation, which is calculated by the correlations between the observed and predicted data, for each shot.

SUMMARY AND CONCLUSIONS
The multiscale strategy temporally integrates the traces several times to produce low-boost seismograms that are used as input data for the initial iterations of MPI. Synthetic examples show that both the MPI and FWI methods can obtain similar tomograms when the initial velocity model is not far away from the true model. However, limited tests suggest that the MPI method gives a more accurate tomogram than FWI when the initial model is far from the true model. In addition, tests suggest that MPI can provide a more accurate tomogram than FWI when inverting elastic data. These examples suggest that MPI is more robust than FWI for inverting seismic marine data.

In the GOM marine data case, both the FWI and MPI methods successfully inverted the marine data to obtain tomograms that are more accurate than the initial velocity model. Comparing the RTM images, ADCIGs and data correlations, it is found that MPI has a slightly higher accuracy than FWI.

ACKNOWLEDGEMENTS
The research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST) in Thuwal, Saudi Arabia. We are grateful to the sponsors of the Center for Subsurface Imaging and Modeling (CSIM) Consortium for their financial support. For computer time, this research used the resources of the Supercomputing Laboratory at KAUST and the IT Research Computing Group. We thank them for providing the computational resources required for carrying out this work.
REFERENCES
Alford, R., K. Kelly, and D. M. Boore, 1974, Accuracy of finite-difference modeling of the acoustic wave
equation: Geophysics, 39, 834–842, http://doi.org/10.1190/1.1440470.
Boonyasiriwat, C., P. Valasek, P. Routh, W. Cao, G. T. Schuster, and B. Macy, 2009, An efficient
multiscale method for time-domain waveform tomography: Geophysics, 74, no. 6, WCC59–
WCC68, http://doi.org/10.1190/1.3151869.
Levander, A. R., 1988, Fourth-order finite-difference P-SV seismograms: Geophysics, 53, 1425–1436,
http://doi.org/10.1190/1.1442422.
http://doi.org/10.1190/1.1443081.
Operto, S., J. Virieux, P. Amestoy, J.-Y. L'Excellent, L. Giraud, and H. B. H. Ali, 2007, 3D finite-
difference frequency domain modeling of visco-acoustic wave propagation using a massively
parallel direct solver: A feasibility study: Geophysics, 72, no. 5, SM195–SM211,
http://doi.org/10.1190/1.2759835.
Sun, Y., and G. T. Schuster, 1993, Time-domain phase inversion: 63rd Annual International Meeting,
SEG, Expanded Abstracts, 684–687, http://doi.org/10.1190/1.1822588.
Warner, M., and L. Guasch, 2014, Adaptive waveform inversion — FWI without cycle skipping-theory:
76th Annual International Conference and Exhibition, EAGE, Extended Abstracts,
http://doi.org/10.3997/2214-4609.20141092.
Warner, M., T. Nangoo, N. Shah, A. Umpleby, and J. Morgan, 2013, Full-waveform inversion of cycle-
skipped seismic data by frequency down-shifting: 83rd Annual International Meeting, SEG,
Expanded Abstracts, 903–907, http://doi.org/10.1190/segam2013-1067.1

An effective multi-parameter Full Waveform Inversion in acoustic anisotropic media
J. Ramos-Martínez, PGS, J. Shi, Rice University, L. Qiu, PGS and A.A. Valenciano, PGS
Summary

Multi-parameter Full Waveform Inversion (FWI) suffers from leakage among different medium properties. Here we discuss a practical multi-parameter FWI solution that minimizes leakage and retrieves the long-wavelength features of the anisotropic earth model. Our algorithm is based on a regularization of the objective function and a specific parameterization of an acoustic Transversely Isotropic (TI) medium consisting of the vertical velocity, and Thomsen parameters epsilon and delta. We show, by using synthetic data, that Total Variation (TV) regularization significantly reduces leakage of the vertical velocity in the epsilon model during the inversion. Finally, we show an application on field data from the Gulf of Mexico where the flatness of the common image gathers is significantly improved.

Introduction

In most geologic scenarios accounting for anisotropy is crucial for successful application of FWI. Although using FWI to estimate the vertical velocity is a routine practice, jointly updating velocity and anisotropic parameters by inversion has proven to be difficult. The challenge arises from the coupled effect that vertical velocity and anisotropy produce on the surface seismic response. This problem is referred to as leakage or crosstalk between parameters and is shared by inversion methods that rely only on surface seismic data (either ray tomography or FWI).

In multi-parameter FWI for acoustic anisotropic media, a variety of parameterizations and inversion strategies have been proposed (e.g., Plessix and Cao, 2011; Alkhalifah and Plessix, 2014; Cheng et al., 2015). In this context, a useful conclusion obtained by some authors (e.g., Plessix and Cao, 2011, Debens et al., 2015) is that in order to match the kinematics of the propagating waves at large offsets, it is enough to produce a low-resolution ε field. In addition, field data applications call for different data selection strategies that could be used to reduce leakage of the different medium properties. Cheng et al. (2016) concluded that the diving waves sense in a similar way both vertical velocity vz and ε. They recommend the inclusion of reflected events in the FWI when vz is inverted to provide independent information and reduce leakage in the two-parameter updates. Another approach is to filter the gradient by wavenumber content (Alkhalifah, 2015); this also has the potential to reduce leakage of the different model parameters.

Here, we use a parameterization consisting of vertical velocity vz and Thomsen parameters ε and δ to describe an acoustic TI medium (VTI or TTI). We assume that the δ field is obtained from other types of data (e.g., well logs, VSP) and we invert for vz and ε. The resulting vz sensitivity kernel for this parameterization is similar in form to one derived for inverting isotropic acoustic velocity (Ramos-Martinez et al., 2016). This velocity kernel targets long-wavelength model updates (diving waves and "rabbit ears") because it suppresses the specular reflectivity in the gradient (migration isochrones). To compute the gradient for ε, we derive a new adjoint-state equation within the framework of our pseudo-analytical extrapolator. The result is a cross correlation-type gradient that enables updating all wavelength components of the ε field. To enhance the long-wavelength features during model updating, we apply TV regularization to the inversion. Our TV implementation is based on the split Bregman method that provides an effective algorithm for solving L1 optimization problems. TV regularization enables the incorporation of a priori information on the smoothness of the ε parameter. Thus, it reduces leakage of vz into the ε update. From a synthetic example, we validate the ε kernel and the role of the regularization in the minimization of leakage between parameters. We also show a successful application in a field dual-sensor dataset from the Gulf of Mexico, performing an alternating inversion between vz and ε.

Theory

FWI solves a nonlinear inverse problem by iteratively updating the model parameters to improve the match between modeled and recorded field data. This is often accomplished by minimizing an objective function typically in a least-squares sense. The inversion algorithm computes the model updates as a scaled representation of the objective function gradient with respect to each model parameter. This gradient depends on the sensitivity kernel of the different parameters describing the earth model, either isotropic or anisotropic.

Our FWI solves the acoustic wave equation by using the pseudo-analytical method introduced by Etgen and Brandsberg-Dahl (2009). We depart from the VTI dispersion relation:

$$\omega^2 = v_z^2 k_z^2 + v_h^2\left(k_x^2 + k_y^2\right) + \left(v_{nmo}^2 - v_h^2\right)\frac{\left(k_x^2 + k_y^2\right)k_z^2}{k_x^2 + k_y^2 + k_z^2} \tag{1}$$
where ω is the angular frequency, vz, vh and vnmo are the vertical, horizontal and NMO velocities, and ki are the wavenumber vector components along the space coordinates x. After transforming equation (1) to the space-time domain, and solving the time derivative using a second-order finite-difference approximation, we obtain the following time-stepping scheme:

$$\begin{aligned} S(\mathbf{x},t+\Delta t) - 2S(\mathbf{x},t) + S(\mathbf{x},t-\Delta t) ={} & \Delta t^2 v_z^2\, FT^{-1}\left[ f_z(k_z) S(\mathbf{k},t) \right] \\ & + \Delta t^2 v_h^2\, FT^{-1}\left[ f_h(k_x,k_y) S(\mathbf{k},t) \right] \\ & + \Delta t^2 \left[ v_{nmo}^2(\mathbf{x}) - v_h^2(\mathbf{x}) \right] FT^{-1}\left[ f_n(k_x,k_y,k_z) S(\mathbf{k},t) \right] + s_f \end{aligned} \tag{2}$$

where S is the forward-propagated wavefield, the point-source term with source wavelet w(t) is $s_f = \delta(\mathbf{x} - \mathbf{x}_s)\, w(t)$, and FT^-1 stands for inverse Fourier Transform. The differential operators fi are combinations of the normalized pseudo-Laplacian operator (Chiu and Stoffa, 2011), which corrects for the inaccuracy produced by the second-order finite-difference approximation to the time derivative with time step Δt. For example:

$$f_z(k_z) = \frac{2\cos(v_v k \Delta t) - 2}{\Delta t^2 v_z^2 k^2}\, k_z^2 \tag{3}$$

$$f_h(k_x,k_y) = \frac{2\cos(v_v k \Delta t) - 2}{\Delta t^2 v_z^2 k^2} \left( k_x^2 + k_y^2 \right) \tag{4}$$

The adjoint-state equation corresponding to the state equation (2) has the form

$$\begin{aligned} R(\mathbf{x},t+\Delta t) - 2R(\mathbf{x},t) + R(\mathbf{x},t-\Delta t) ={} & \Delta t^2 v_z^2\, FT^{-1}\left[ f_z(k_z) R(\mathbf{k},t) \right] \\ & + \Delta t^2 v_h^2\, FT^{-1}\left[ f_h(k_x,k_y)\, FT\left[ (1+2\varepsilon) R(\mathbf{x},t) \right] \right] \\ & + \Delta t^2 \left[ v_{nmo}^2(\mathbf{x}) - v_h^2(\mathbf{x}) \right] FT^{-1}\left[ f_n(k_x,k_y,k_z)\, FT\left[ 2(\delta-\varepsilon) R(\mathbf{x},t) \right] \right] + s \end{aligned} \tag{5}$$

where R is the back-propagated wavefield, and ε and δ are the Thomsen parameters. The adjoint source term is the residual wavefield $s = S(\mathbf{x}_R,t) - d(\mathbf{x}_R,t)$.

For a parameterization consisting of vz, acoustic impedance (computed from vz), ε and δ, and assuming that δ is constant during the inversion, the gradients for vz and ε have the following form

$$G_{v_z}(\mathbf{x}) = \frac{1}{2A_{v_z}(\mathbf{x})} \int_t \left[ W_1(\mathbf{x},t)\, \nabla S(\mathbf{x},t) \cdot \nabla R(\mathbf{x},T-t) + W_2(\mathbf{x},t)\, \frac{1}{v_z^2(\mathbf{x})} \frac{\partial S(\mathbf{x},t)}{\partial t} \frac{\partial R(\mathbf{x},T-t)}{\partial t} \right] dt \tag{6}$$

$$G_{\varepsilon}(\mathbf{x}) = \frac{2}{A_{\varepsilon}(\mathbf{x})} \int FT^{-1}\left[ f_h(k_x,k_y) S(\mathbf{k},t) - f_n(k_x,k_y,k_z) S(\mathbf{k},t) \right] R(\mathbf{x},T-t)\, dt \tag{7}$$

where $A_{v_z}$ and $A_{\varepsilon}$ are source illumination terms, and Wi are dynamic weights designed to optimally suppress the high wavenumber components in the gradient and thus produce long-wavelength updates (Ramos-Martínez et al., 2016).

On the other hand, the epsilon gradient (equation 7) is built by crosscorrelating a modified version of the source forward-propagated wavefield solved from equation 2, and the back-propagated residual wavefield solved from the adjoint-state equation 5. To enhance long-wavelength updates with this gradient, we apply TV regularization (Guo and de Hoop, 2013). In our TV implementation, we use split Bregman iterations (Goldstein and Osher, 2009), an effective algorithm for solving L1 optimization problems. The result is a computationally efficient and accurate implementation.

For simplicity, here we show the forms of the adjoint-state equation and the epsilon gradient for a VTI medium. However, these equations can be easily extended for the TTI case by a rotation of the wavenumbers to match the tilted symmetry axes.

Examples

We use a synthetic example to illustrate the performance of the new ε gradient and the role of TV regularization in reducing leakage between the inverted parameters. Figure 1a shows the vz model, which is a modified version of the SEAM sediment model, including a low velocity anomaly in the centre. Figure 1b shows the difference between the true and the starting epsilon fields. The inversion for ε assumes that the velocity field is exact. Figure 1c shows the difference between the inverted and the starting ε models without regularization. Even for the case of exact vertical velocity, leakage of the velocity anomaly at the centre is observed in the ε updates. Figure 1d shows the ε difference for the inversion using TV regularization. The velocity anomaly leakage is significantly reduced and the updates resemble the true perturbations shown in Figure 1b.
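To make the role of the pseudo-Laplacian in the time-stepping scheme of equation 2 more concrete, the sketch below applies the same idea to the simplest possible case: a 1D, isotropic, constant-velocity medium on a periodic grid. It is an assumption-laden illustration, not the anisotropic extrapolator described in the text; the function names and the plain (un-normalized) pseudo-Laplacian form are choices made for this example, following the general idea of Etgen and Brandsberg-Dahl (2009).

```python
import numpy as np


def pseudo_laplacian(k, v, dt):
    """F(k) = (2 cos(v |k| dt) - 2) / (v^2 dt^2); tends to -k^2 as dt -> 0."""
    return (2.0 * np.cos(v * np.abs(k) * dt) - 2.0) / (v ** 2 * dt ** 2)


def step_pseudo_analytical(u_now, u_prev, v, dt, dx):
    """One second-order time step with the spatial operator applied in the wavenumber domain."""
    k = 2.0 * np.pi * np.fft.fftfreq(u_now.size, d=dx)
    lap = np.fft.ifft(pseudo_laplacian(k, v, dt) * np.fft.fft(u_now)).real
    return 2.0 * u_now - u_prev + dt ** 2 * v ** 2 * lap


# Hypothetical test setup: a plane wave with four cycles across a periodic grid.
n, dx, v, dt = 128, 10.0, 2000.0, 0.004
x = np.arange(n) * dx
k0 = 2.0 * np.pi * 4.0 / (n * dx)


def plane_wave(time):
    """Exact solution cos(k0 x - omega t) with omega = v k0."""
    return np.cos(k0 * x - v * k0 * time)


u_next = step_pseudo_analytical(plane_wave(0.0), plane_wave(-dt), v, dt, dx)
```

For constant velocity the cosine factor cancels the time-discretization error exactly, so `u_next` matches the analytic plane wave at time `dt` to rounding error. In a heterogeneous or anisotropic medium the correction is only approximate, which is where the combined operators fi of the text come in.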

Figure 1. a) Vertical velocity and b) difference between the true and starting epsilon models for the synthetic example. Difference between the inverted and starting epsilon models after inversion without (c) and with (d) TV regularization.

We tested the multi-parameter FWI on a dual-sensor field data set from the De Soto Canyon area in the Gulf of Mexico. Figure 2 shows sample shot records used in the inversion. The data have a maximum offset of 12 km. The starting vertical velocity model is shown in Figure 3a. The initial ε and δ models are zero in the water column and constant in the sediments with values of 0.08 and 0.04, respectively. We use a maximum frequency of 7 Hz, performing an alternating inversion between vz and ε. Figure 3b and Figure 3c show the final vz and ε models. Figures 4a and 4b illustrate clear improvement in the flatness of the Kirchhoff offset gathers after inversion.

Figure 2. Sample shot records corresponding to a dataset acquired with dual-sensor technology in the De Soto Canyon area, Gulf of Mexico. Maximum offset is 12 km.

Figure 3. a) Starting vertical velocity model for the Gulf of Mexico field data example; epsilon and delta models are zero in the water column, and homogeneous from the water bottom with values 0.08 and 0.04, respectively. b) Vertical velocity and c) epsilon models after cascade inversion.

Figure 4. Kirchhoff offset gathers for the a) starting and b) inverted vertical velocity and epsilon models corresponding to the field data example of Gulf of Mexico.

Conclusions

We introduce a practical FWI approach to retrieve the long wavelength updates in an acoustic anisotropic medium with transverse isotropy. We use a parameterization consisting of vertical velocity, epsilon and delta, to obtain the sensitivity kernels for vertical velocity and epsilon. First, we update the vertical velocity from a long-wavelength gradient which is similar in form to one derived from the isotropic velocity sensitivity kernel that is able to suppress migration isochrones. Then, long-wavelength epsilon updates are obtained by TV regularization, which significantly reduces leakage of the velocity imprint in the epsilon field. We show a successful application of the approach to a dual-sensor dataset acquired in the Gulf of Mexico. Results are validated by the improvement of the flatness of the image gathers using the inverted models after two stages of cascaded inversions of vertical velocity and epsilon.

Acknowledgments

We thank PGS for permission to publish the results and PGS MultiClient for providing the data. We thank Nizar Chemingui, Sean Crawley, Jan Kirkebo, Dan Whitmore, Faqi Liu and Volker Dirks for useful suggestions.
REFERENCES
Alkhalifah, T., 2015, Conditioning the full-waveform inversion gradient to welcome anisotropy:
Geophysics, 80, no. 3, R111–R122, https://doi.org/10.1190/geo2014-0390.1.
Alkhalifah, T., and R. E. Plessix, 2014, A recipe for practical full-waveform inversion in anisotropic
media: An analytical parameter resolution study: Geophysics, 79, no. 3, R91–R101,
https://doi.org/10.1190/geo2013-0366.1.
Cheng, X., K. Jiao, D. Su, and D. Vigh, 2016, Multiparameter estimation with acoustic vertical transverse
isotropic full-waveform inversion of surface seismic data: Interpretation, 4, SU1–SU16,
https://doi.org/10.1190/INT-2016-0029.1.
Chiu, C., and P. L. Stoffa, 2011, Application of normalized pseudo-Laplacian to elastic wave modeling
on staggered grids: Geophysics, 76, no. 11, T113–T121, http://dx.doi.org/10.1190/geo2011-
0069.1.
Debens, H., M. Warner, A. Umpleby, and N. da Silva, 2015, Global anisotropic 3D FWI: 85th Annual
International Meeting, SEG, Expanded Abstracts, https://doi.org/10.1190/segam2015-5921944.1.
Etgen, J. T., and S. Brandsberg-Dahl, 2009, The pseudo-analytical method: application of pseudo-
Laplacians to acoustic and acoustic anisotropic wave propagation: 79th Annual International
Meeting, SEG, Expanded Abstracts, 2552–2555, https://doi.org/10.1190/1.3255375.
Goldstein, T., and S. Osher, 2009, The split Bregman method for l1-regularized problems: SIAM
Journal on Imaging Sciences, 2, 323–343, https://doi.org/10.1137/080725891.
Guo, Z., and M. de Hoop, 2013, Shape optimization and level set method in full waveform inversion with
3D body reconstruction: 83rd Annual International Meeting, SEG, Expanded Abstracts,
Plessix, R. E., and Q. Cao, 2011, A parameterization study for surface seismic full waveform inversion in
an acoustic vertical transversely isotropic medium: Geophysical Journal International, 185, 539–
556, https://doi.org/10.1111/j.1365-246X.2011.04957.x.
Ramos-Martinez, J., S. Crawley, Z. Zou, A. A. Valenciano, L. Qiu, and N. Chemingui, 2016, A robust
gradient for long wavelength FWI updates: 76th Annual International Conference and Exhibition,
EAGE, Extended Abstracts, https://doi.org/10.3997/2214-4609.201601536.

Flattening common image gathers after full-waveform inversion: the challenge of anisotropy
estimation
Thibaut Allemand*, Anna Sedova and Olivier Hermant, CGG
Summary

Full-waveform inversion (FWI) is an unrivalled tool for velocity model building in areas covered by recorded reflected and diving waves. However, being driven mainly by the kinematics of diving waves, the resulting velocity models do not always flatten common image point gathers. This is generally interpreted as the result of poor estimation of the anisotropic parameters, arising even when multi-parameter anisotropic FWI is performed. Various regularizations can be introduced to mitigate the issue but fundamentally it should be solved by incorporating the kinematics of reflected waves. We propose a new approach involving a tilted transverse isotropic (TTI) joint reflected and diving ray tomography for estimating the initial anisotropic model for FWI. This step provides anisotropic parameters, which, for example, may then be kept fixed during FWI. The use of an original non-linear tomography algorithm for the joint reflected and diving rays is a key component for the efficiency and accuracy of our approach. We present here the algorithm and an application demonstrating the capability of the approach within a land FWI context on a full azimuth and ultra-long offset broadband dataset from the Sultanate of Oman.

Introduction

The estimation of an accurate velocity model of the subsurface is a crucial step in seismic imaging. In recent years FWI has become a well-established technique for velocity model building (see Virieux and Operto (2009) for a review). FWI can estimate both the long and short wavelength components of the velocity model in the area penetrated by diving waves, pushing the resolution beyond the capability of ray-based tomography.

However, it is often observed that common image point gathers (CIGs) depth-migrated using a velocity model updated with FWI are not flat, preventing a good focusing of the migrated seismic image (Mothi and Kumar, 2014). This is usually attributed to the improper estimation of anisotropy parameters. Indeed, FWI is mainly driven by the kinematics of diving waves, which travel nearly horizontally in the subsurface, while CIGs are computed using reflected waves, which travel more in the vertical direction. Therefore in an anisotropic medium, the velocity seen by diving and reflected waves may differ significantly.

Anisotropy estimation can be done through multi-parameter FWI. However, it is fundamentally an ill-conditioned problem. Some strategies have been proposed to tackle it. Among them, Stopin et al. (2014) use a strong regularization on the anisotropy parameter to improve the conditioning of the inversion; Debens et al. (2015) use a global optimization scheme for a smooth anisotropy parameter coupled with a local optimization for the velocity; Cheng et al. (2014) switch to multi-parameter inversion only when velocity-only inversion has stabilized.

Improving the conditioning of joint velocity and anisotropy inversion can actually only be done if diving and reflected waves are handled simultaneously. A heuristic method to do so consists of alternating steps of ray-based tomography and FWI, the classical workflow being the sequence tomography-FWI-tomography (Mothi and Kumar, 2014). In this sequential approach, in the area investigated by diving waves, FWI provides the high resolution velocity model, while tomography provides the extra information necessary for assessing anisotropy. However, strong coupling of velocity and anisotropy parameters makes convergence challenging. We believe that this problem can be solved more efficiently using a ray-based approach that directly combines both types of waves in a non-linear way. This should be used as a prior step to FWI allowing the estimation of the anisotropic parameters that are then fixed during the FWI update, or provide a much better starting point for multi-parameter FWI. The idea of using ray-based techniques to estimate the anisotropy prior to FWI is not new, see for example Qin et al. (2014) or Xie et al. (2017), but the use of a non-linear approach combining picks from reflected and diving waves makes it particularly accurate and efficient (Prieux et al., 2013).

In the following sections we first present the method and then apply it on a 3D land dataset to show its ability to recover both diving wave kinematics and CIG flattening. Finally, we discuss the results obtained using the joint tomography-FWI workflow.

Joint reflection-diving ray tomography

While traveltime tomography of diving waves can be easily implemented in a non-linear way (Taillandier et al., 2011), with first break times picked from shot or receiver gathers and then used in an iterative scheme to update the velocity model, most ray-based migration velocity analysis tools are only able to provide a linear update after each dip and residual move-out (RMO) picking step (Woodward et al., 2008). Therefore, non-linear slope tomography (Guillaume et al., 2008), with its non-linear forward modeling functionality, offers a tremendous advantage for the combination with diving ray tomography.

The core of this non-linear tomography is the use of the so- solving the Eikonal equation, which provides a more stable
called kinematic invariants, which represent the kinematic solution than ray tracing. However, solving the anisotropic
characteristics of locally coherent events in the un-migrated Eikonal equation can be computer intensive. Here we split
domain. They are generally obtained through a kinematic first break modeling into two steps. First, we solve the
demigration of dip and RMO picks in the pre-stack depth isotropic Eikonal equation $|\nabla T|^2 = 1/V_h^2$ for each source
migration domain. They are then used to feed a non-linear using the horizontal velocity $V_h = V_v\sqrt{1 + 2\epsilon}$. This gives
iterative algorithm involving kinematic migrations and an approximated traveltime map. A trajectory between
updates of the velocity model in order to minimize the source and receiver is computed following the traveltime
slope of RMO (Montel et al., 2009). gradients. Then we perturb this trajectory using a ray
bending algorithm to get rays that satisfy Fermat’s principle
We propose to combine this with diving ray tomography, in the full anisotropic model.
which can be seen as a ray-based version of diving wave
FWI, taking advantage of the non-linear capabilities of the Which parameters should we invert for? Several studies
two methods. The associated joint cost function can be deal with sensitivity analysis. Djebbi et al. (2017) compute
expressed as the traveltime sensitivity kernels in a VTI context for
$$C(m) = \sum_{i=1}^{N_{RMO}} a_i\,|dRMO_i|^l + w \sum_{j=1}^{N_{FB}} b_j\,|\Delta t_j|^l + R(m)$$
several parameterizations. They conclude that diving waves are mostly sensitive to the horizontal velocity 𝑉ℎ whereas reflections are mostly sensitive to the NMO
where 𝑁𝑅𝑀𝑂 and 𝑁𝐹𝐵 are the number of picked reflected velocity $V_{NMO} = V_v\sqrt{1 + 2\delta}$. When diving waves and
events and the number of picked first breaks, respectively; reflections are used simultaneously, they suggest to
𝑑𝑅𝑀𝑂 is the slope of the reflected event in the CIG use (𝑉𝑁𝑀𝑂, 𝜂, 𝛿), where the anellipticity parameter is
(derivative of the depth position with respect to the CIG defined by $\eta = (\epsilon - \delta)/(1 + 2\delta)$. It was demonstrated that we cannot
parameter, namely offset or angle); Δ𝑡 is the traveltime
misfit (difference between computed time in the current recover the three parameters using surface P-wave seismics
model and picked traveltime on the data); 𝑎𝑖 and 𝑏𝑗 are only in layered models (Alkhalifah and Tsvankin, 1995).
Usually, 𝛿 is derived from, or constrained by, well data or
weights on each data (they can be offset-based or include a
data quality factor, for example); w is a global weight regional geological knowledge, while 𝑉𝑣 and 𝜖 are
applied to diving ray misfit term; l describes the chosen estimated keeping 𝛿 fixed (or obeying a priori rock
norm, and finally 𝑅(𝑚) stands for additional constraint and physics-based relationship between 𝜖 and 𝛿). In the
regularization terms applied to the model (including example below, we choose to keep 𝛿 fixed during the
Tikhonov type, or Laplacian, in particular), that are tomography. In this case, solving for (𝑉𝑣 , 𝜖) or (𝑉𝑁𝑀𝑂 , 𝜂)
required in every ill-posed inverse problem. does not make much difference, and we choose the former
for simplicity. We allow long wavelength spatial variations
The cost function is minimized through a non-linear only for 𝜖 in order to mitigate the tradeoff between
iterative multi-scale procedure. The first step involves a anisotropy and velocity (as in Stopin et al., 2014).
kinematic migration which allows re-localizing the
invariants in the updated velocity model. Then Fréchet Additionally, the computation of the Fréchet derivatives
derivatives, with respect to the model parameters, are gives access to the Gauss-Newton approximation of the
computed and the model perturbation is found by solving Hessian, which is used to precondition the gradient and
the normal equations of the least squares problem using an limit the crosstalk between parameters.
iterative linear solver.
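
The joint cost function C(m) defined above can be evaluated with a few lines of code once the RMO slopes and first-break misfits are available. The following is a minimal sketch, not the authors' implementation: the function name `joint_cost`, its argument names, and the treatment of R(m) as a precomputed scalar are assumptions made for illustration only.

```python
import numpy as np

def joint_cost(d_rmo, dt, a=None, b=None, w=1.0, l=2.0, reg=0.0):
    """Evaluate C(m) = sum_i a_i |dRMO_i|^l + w * sum_j b_j |dt_j|^l + R(m).

    d_rmo : RMO slopes of the picked reflected events (length N_RMO)
    dt    : first-break traveltime misfits (length N_FB)
    a, b  : per-datum weights (hypothetical default: all ones)
    w     : global weight applied to the diving-ray misfit term
    l     : exponent describing the chosen norm
    reg   : value of the regularization term R(m), assumed precomputed
    """
    d_rmo = np.asarray(d_rmo, dtype=float)
    dt = np.asarray(dt, dtype=float)
    a = np.ones_like(d_rmo) if a is None else np.asarray(a, dtype=float)
    b = np.ones_like(dt) if b is None else np.asarray(b, dtype=float)
    rmo_term = np.sum(a * np.abs(d_rmo) ** l)   # reflection (CIG flatness) term
    fb_term = np.sum(b * np.abs(dt) ** l)       # diving-ray (first-break) term
    return rmo_term + w * fb_term + reg

# Illustrative numbers only: three RMO slopes and two first-break misfits (s).
cost = joint_cost([0.02, -0.01, 0.03], [0.004, -0.006], w=10.0, l=2.0)
print(cost)
```

In this sketch the global weight `w` is the only coupling between the two data types, which mirrors its role in the formula: increasing it shifts the balance of the update towards honoring first-break traveltimes.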
Land 3D field example
Dealing with anisotropy
We tested our method on a 3D land broadband wide
Our approach aims at estimating accurately the anisotropy azimuth vibroseis dataset from the Sultanate of Oman
parameters for the later FWI update. In a tilted transverse (Mahrooqi et al., 2012), with a 9 s sweep from 1.5 to 86
isotropic (TTI) medium with known tilt angles, there are Hz. The acquisition design is 50 m by 50 m interval for the
three unknowns: the velocity along the principal symmetry shots, and 250 m by 25 m interval for the receivers. A full
axis 𝑉𝑣, and the anisotropy parameters 𝜖 and 𝛿, following time and depth processing project has been completed on
Thomsen (1986). In practice, we commonly assume that the this dataset, without FWI, using multi-layer TTI reflection
tilt angles follow the structures of the migrated image. tomography. In the example shown here we work on a
subset area of 800 km² on which an FWI study has been
Kinematic migration of reflected events in an anisotropic conducted recently (Sedova et al., 2017).
medium does not bring much complexity with respect to First breaks were picked up to 10 km offset, while RMO
the isotropic case. First breaks are usually modeled by picks were available from the recently completed depth

processing project. Two wells are located in the center of the study area. We computed an initial velocity model by smoothing the pre-stack time migration (PreSTM) RMS velocities in the time domain and converting them to interval velocities in the depth domain. The initial anisotropy parameters were constant: 𝜖 = 12% and 𝛿 = 5%, coming from known regional values.

Figure 1: Common image point snail gathers migrated in depth for: a) initial model, b) reflection tomography model, and c) joint reflection-diving ray tomography model. The vertical scale is in km and the offsets are from 0 to 4 km.

Figure 2: Common receiver gathers. Real data (black wiggles) are overlaid on synthetic data (blue/red) computed using: a) initial model, b) reflection tomography model, and c) joint reflection-diving ray tomography model. The vertical scale is in seconds and the offsets are from 3.8 km to 10 km. Black wiggles should overlap with red: a perfect QC should show black and blue only.

We ran two tomography tests updating jointly 𝑉𝑣 and 𝜖 starting from the same initial model: the first time using reflections only, and the second time using reflections and first breaks jointly. CIGs and wave equation modeled diving wave synthetics are shown in Figure 1 and Figure 2, respectively. Reflection tomography and joint reflection-diving ray tomography both achieve a good flattening of the CIGs, but only the latter also correctly models the diving wave kinematics. This gives us good confidence in the method: flat CIGs and honored first break traveltimes mean that the estimated 𝑉𝑣 and 𝜖 are reliable.

Using joint tomography as a starting model for FWI

We now use the joint reflection-diving ray tomography to build a starting model for FWI. Starting from the PreSTM-derived depth model, and using the aforementioned constant anisotropy, we perform a first pass of joint tomography to update 𝑉𝑣 and 𝜖. Then we update 𝑉𝑣 and 𝛿 using well data extrapolated along two horizons, and we perform a second pass of joint tomography for both 𝑉𝑣 and 𝜖. The 𝜖 and 𝛿 models after joint tomography are shown in Figure 3, overlaid on a migrated stack section. The output model is used as an input to FWI inverting both diving waves and reflections from 3 Hz up to 13 Hz, for updating 𝑉𝑣 only. The preprocessing applied to the data prior to FWI is the same as in Sedova et al. (2017). We observe in Figure 4 that the convergence of diving wave FWI is improved when we start from the joint tomography model rather than the simpler initial model, especially at near offsets, due to the better estimation of anisotropy.

To QC our final result after the FWI at 13 Hz, we compare in Figure 5 a CIG computed in the legacy multi-layer TTI reflection tomography model with the same CIG computed in the FWI model. We clearly see a reduction of wobbling across offsets and an overall satisfactory flatness of events. We also observe good agreement between the velocity model and the well log. Synthetics overlaid on the real seismics are shown in Figure 6 for the same two models. Unsurprisingly, the match between synthetics and real data is better after FWI than after reflection tomography, and the modeling using the FWI model is able to reproduce more events. Hence, the described approach allows honoring both the reflections and the diving wave kinematics, and matching well data.

Finally, three depth slices of the final velocity model overlaid on seismics are displayed in Figure 7 together with an inline section. FWI has recovered fine details like near surface channels and faults that nicely match with the seismics.

Conclusion

We presented a joint reflection-diving ray tomography that allows the recovered anisotropy to be used in FWI. We applied it to a real land dataset from the Sultanate of Oman. The computed FWI model gives an excellent match between synthetic and real data, especially at long offsets. It flattens the CIGs and the vertical velocity profile compares nicely with the well log, demonstrating the capability of the method.

Acknowledgements

We acknowledge PDO and the Ministry for Oil and Gas of the Sultanate of Oman for permission to use the data. We thank our colleagues Gilles Lambaré and Patrice Guillaume for many helpful discussions, and CGG for permission to publish this work.

Figure 3: Epsilon (left) and delta (right) models after joint reflection-diving ray tomography.

Figure 4: Common receiver gathers displayed back to back. Real data filtered at 9 Hz (black wiggles) is overlaid on synthetics (blue / red) computed using: left) 9 Hz FWI from initial model, and right) 9 Hz FWI from joint reflection-diving ray tomography model. The vertical scale is in seconds, offsets are from 0 to 8 km.

Figure 5: a) Snail CIG in legacy tomography model, b) snail CIG in FWI model, and c) comparison to well log: cyan is FWI, blue is the well log. The vertical scale is depth in km. Offset is from 0 to 4 km; velocity is in m/s.

Figure 6: Common receiver gathers displayed back to back. Real data (black wiggles) is overlaid on synthetics (blue / red) computed using: left) legacy multi-layer tomography model, right) 13 Hz FWI model. The vertical scale is in seconds and the offsets are from 2 to 8 km. Black wiggles should overlap with red.

Figure 7: The three left panels are depth slices of velocity after 13 Hz FWI overlaid on seismics. The depth in meters is indicated in the top right
corner of each panel. The velocity is in m/s. The right panel is an inline section of the velocity overlaid on the seismic stack. The white line in the
second panel indicates the position of the inline section.

REFERENCES
Alkhalifah, T. and I. Tsvankin, 1995, Velocity analysis for transversely isotropic media: Geophysics, 60,
1550–1566, http://dx.doi.org/10.1190/1.1443888.
Cheng, X., K. Jiao, D. Sun, and D. Vigh, 2014, Anisotropic parameter estimation with full-waveform
inversion of surface seismic data: 84th Annual International Meeting, SEG, Expanded Abstracts,
Debens, H. A., M. Warner, A. Umpleby, and N. V. da Silva, 2015, Global anisotropic 3D FWI: 85th
Djebbi, R., R.-É. Plessix and T. Alkhalifah, 2017, Analysis of the traveltime sensitivity kernels for an
acoustic transversely isotropic medium with a vertical axis of symmetry: Geophysical
Prospecting, 65, 22–34, http://dx.doi.org/10.1111/1365-2478.12361.
Guillaume, P., G. Lambaré, O. Leblanc, P. Mitouard, J. Le Moigne, J.-P. Montel, T. Prescott, R. Siliqi, N.
Vidal, X. Zhang, and S. Zimine, 2008, Kinematic invariants: an efficient and flexible approach
for velocity model building: 78th Annual International Meeting, SEG, Expanded Abstracts,
3687–3692, http://dx.doi.org/10.1190/1.3064100.
Mahrooqi, S., S. Rawahi, S. Yarubi, S. Abri, A. Yahyai, M. Jahdhami, K. Hunt, and J. Shorter, 2012,
Land seismic low frequencies: Acquisition, processing and full wave inversion of 1.5-86 Hz:
Montel, J.-P., N. Deladerriere, P. Guillaume, G. Lambaré, T. Prescott, J.-P. Touré, Y. Traonmilin, and X.
Zhang, 2009, Kinematic invariants describing locally coherent events: an efficient and flexible
approach to non linear tomography: 71st Annual International Conference and Exhibition, EAGE,
Extended Abstracts, Workshop WS 1: Locally Coherent Events–A New Perspective for Seismic
Imaging.
Mothi, S., and R. Kumar, 2014, Detecting and estimating anisotropy errors using full waveform inversion
and ray-based tomography: A case study using long-offset acquisition in the Gulf of Mexico: 84th
Annual International Meeting, SEG, Expanded Abstract, 1066–1071,
Prieux, V., G. Lambaré, S. Operto, and J. Virieux, 2013, Building starting model for full waveform
inversion from wide-aperture data by stereotomography: Geophysical Prospecting, 61, 109–137,
http://dx.doi.org/10.1111/j.1365-2478.2012.01099.x.
Qin, B., V. Prieux, H. Bi, A. Ratcliffe, J.-P. Montel, D. Carotti, and G. Lambaré, 2014, Towards high-
frequency full waveform inversion-A Case Study: 76th Conference and Exhibition, EAGE,
Expanded Abstracts, We E106 07, http://dx.doi.org/10.3997/2214-4609.20141086.
Sedova, A., G. Royle, O. Hermant, M. Retailleau, and G. Lambaré, 2017, High-resolution land full-
waveform inversion: a case study on a data set from the Sultanate of Oman: 79th Conference and
Exhibition, EAGE, Expanded Abstracts, http://dx.doi.org/10.3997/2214-4609.201701163.
Stopin, A., R.-É. Plessix, and S. Al Abri, 2014, Multiparameter waveform inversion of a large wide-
azimuth low-frequency land data set in Oman: Geophysics, 79, WA69–WA77,

Taillandier, C., N. Deladerrière, A. Therond, and D. Le Meur, 2011, First arrival traveltime tomography-
when simpler is better: 73rd Conference and Exhibition, EAGE, Expanded Abstracts,
http://dx.doi.org/10.3997/2214-4609.20149305.
Thomsen, L., 1986, Weak elastic anisotropy: Geophysics, 51, 1954–1966,
http://dx.doi.org/10.1190/1.1442051.
Woodward, M. J., D. Nichols, O. Zdraveva, P. Whitfield, and T. Johns, 2008, A decade of tomography:
Geophysics, 73, no. 5, VE5–VE11, http://dx.doi.org/10.1190/1.2969907.
Xie, Y., B. Zhou, J. Zhou, J. Hu, L. Xu, X. Wu, N. Lin, F. C. Loh, L. Liu, and Z. Wang, 2017,
Orthorhombic full-waveform inversion for imaging the Luda field using wide-azimuth ocean-
bottom-cable data: The Leading Edge, 36, 75–80, http://dx.doi.org/10.1190/tle36010075.1

Cross-talk and frequency bands in truncated Newton an-acoustic full waveform inversion
Scott Keating and Kristopher A. Innanen, Department of Geoscience, University of Calgary
SUMMARY P-wave velocity and Q it is, in contrast, the differences in the frequency-
dependence of scattering amplitudes which permits them to be distin-
Simultaneous use of data within relatively broad frequency bands is es- guished (Innanen and Weglein, 2007; Hak and Mulder, 2011). This
sential to discriminating between velocity and Q errors in the construc- fact ties together, in an unusually close manner, issues of multiscale
tion of an-acoustic full waveform inversion (QFWI) updates. Individ- FWI, in which iterations or groups of iterations involve different fre-
ual frequencies or narrow bands in isolation cannot provide sufficient quency bands, parameter cross-talk, and the degree of approximation
information to resolve cross-talk issues in a surface seismic acquisition with which off-diagonal elements of the inverse Hessian are incorpo-
geometry. Truncated Newton (TN) optimization methods offer the po- rated through TN iterations. In this paper we analyze this relationship
tential for reducing computational cost while incorporating approxi- in the context of synthetic an-acoustic frequency domain FWI. Be-
mate versions of the Newton update to reduce these cross-talk issues, cause the exact manner in which dispersion is modelled determines
with the trade-off being mediated by the chosen number of inner TN the character of the cross-talk, the attenuation model type, which must
iterations. In fact, in TN-QFWI we are able to choose between two be selected prior to formulating a detailed FWI algorithm, plays a key
qualitatively distinct “modes” of an-acoustic inversion: one in which role. This issue is discussed in a companion paper (Keating and Inna-
the estimation of a velocity model uncorrupted by the influence of Q is nen, 2017).
the desired outcome, and another in which both a velocity model and
a Q model are the desired outcomes. Both can in principle be accom-
plished in the context of TN-QFWI, with the former at significantly THEORY
reduced computational expense.
Cross-talk
INTRODUCTION
The FWI problem considered here has an objective function given by
$$\phi(m) = \tfrac{1}{2}\,\|d_{obs} - d_{mod}\|_2^2 , \qquad (1)$$
Full waveform inversion (FWI) is a technique which attempts to recover the true subsurface parameters by iteratively minimizing the
difference between measured data and modeled data generated from
the current estimated subsurface parameters (Lailly, 1983; Tarantola,
1984; Virieux and Operto, 2009). While multiparameter versions of where φ , a function of the subsurface model m, measures the discrep-
FWI have been formulated and studied, the majority of research on ancy between the measured data dobs and the modelled data dmod . To
FWI is focused on a single parameter problem, specifically that in recover the true properties of the subsurface, this objective is mini-
which acoustic wave propagation is assumed and density is treated mized in FWI through gradient-based, or Newton type updates.
as constant. In this problem, only P-wave velocity varies in the model. Cross-talk is the phenomenon where data residuals introduced by an
However, in the effort to make FWI effective in the determination of error in one model parameter are attributed to errors in the estimate of
larger numbers of smaller scale (e.g., reservoir) properties, the multi- another parameter. For example, cross-talk is present if an estimate of
parameter FWI problem must be brought to bear. In multiparameter density is modified due to data residuals introduced by errors in a ve-
FWI (e.g., Operto et al., 2013; Plessix et al., 2013; Pan et al., 2016), locity estimate. Cross-talk is a major concern in FWI, as it can severely
allowance is made in the gradient/Hessian quantities for simultaneous harm the accuracy of the recovered model and the convergence of the
and independent variations of several parameters, either to support ve- scheme (e.g., Plessix et al., 2013; Innanen, 2014; Pan et al., 2016).
locity model building, or to push towards elastic characterization of Gradient updates are particularly vulnerable to cross-talk. This is due
the subsurface (Tarantola, 1986; Choi et al., 2008). to the fact that the gradient considers only the derivative of the objec-
tive function with respect to each variable parameterizing the model.
Attenuation and dispersion play important roles in both of these ap- If changes in several different variables can reduce the same part of
plications of multiparameter FWI. It can be a powerful nuisance to the data residual, all will be changed in a gradient update.
acoustic and elastic FWI, strongly influencing the amplitude and phase
of the waveforms we would like to interrogate for acoustic/elastic in- Newton optimization employs both the first order (gradient) and sec-
formation, but it can also be a rich source of information by which ond order (Hessian) derivatives of the objective function. In Newton
fluids and viscosities can be determined or discriminated. So, we can optimization, the update p is given by
also distinguish between whether we wish to specifically determine Q
in FWI, or merely “protect” the recovery of other parameters from its
$$p = -H^{-1} g , \qquad (2)$$
influence. Either motivation requires that the physics of attenuation be
included in an FWI scheme. An-acoustic FWI (QFWI for short), in where g is the gradient of the objective function, and H is the Hessian
which attenuation and dispersion parameters are determined simulta- matrix. The Hessian provides information about how the derivative
neously alongside their elastic counterparts, has been carefully investi- with respect to one variable will change as another variable changes.
gated (e.g., Hak and Mulder, 2011; Hicks and Pratt, 2001; Malinowski This helps to prevent several variables from being used in reducing the
et al., 2011; Kamei and Pratt, 2013; Métivier et al., 2015). In much data residual introduced by an error in one, mitigating cross-talk. Un-
of this existing research, however, incorporating attenuation is treated fortunately, in realistic FWI applications, Newton optimization tends
as a small addition to the classical acoustic/elastic FWI problem, with not to be a viable approach, because of the excessive cost for the stor-
relatively little focus on how the nature of the problem changes. Pa- age and inversion of the Hessian.
rameter cross-talk, in which one parameter is mistakenly updated to
account for data residuals caused by another, affects an-acoustic FWI Optimization
significantly and in a unique manner requiring special study.
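
The difference between a gradient update and the Newton update of equation 2 can be seen on a toy problem. The sketch below is purely illustrative and is not taken from the paper: it assumes a hypothetical linear forward operator `J` with two model parameters whose data signatures are strongly correlated, and an error in the first parameter only. Because the problem is linear, the Gauss-Newton Hessian `J.T @ J` is the exact Hessian of the objective in equation 1.

```python
import numpy as np

# Hypothetical linear forward operator: the two columns (one per model
# parameter) are nearly parallel, so both parameters can explain the
# same part of the data residual.
J = np.array([[1.0, 0.9],
              [1.0, 1.1],
              [1.0, 1.0]])
m_true = np.array([1.0, 0.0])   # only the first parameter is in error
m0 = np.zeros(2)                # starting model

residual = J @ m0 - J @ m_true  # d_mod - d_obs
g = J.T @ residual              # gradient of 0.5 * ||d_obs - d_mod||^2
H = J.T @ J                     # Hessian (exact here because the problem is linear)

p_gradient = -g                     # steepest-descent direction
p_newton = -np.linalg.solve(H, g)   # Newton update, p = -H^{-1} g

print("gradient direction:", p_gradient)
print("Newton update:     ", p_newton)
```

The gradient direction changes both parameters by the same amount even though only the first one is wrong, which is the cross-talk described above; the Newton update uses the off-diagonal Hessian terms to assign the residual to the first parameter alone.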
Two approaches which attempt to approximate exact Newton opti-
Simultaneous variations in acoustic and/or elastic properties can be mization but at reduced cost are quasi-Newton methods and truncated
separately estimated in FWI primarily because of differences in the Newton methods. Quasi-Newton methods obtain an exact solution
angle-dependence of scattering from one parameter to another. With to an equation approximating equation 2, whereas truncated Newton

(TN) methods are those which obtain an approximate solution to equa- where c is the acoustic wave velocity at the reference frequency ω0 ,
tion 2. Both attempt to provide an efficient alternative to exact Newton and Q is the quality factor. For a chosen ω0 , the an-acoustic FWI prob-
optimization while still retaining important information about the Hes- lem is to determine the unknown spatial distributions of two parame-
sian, which helps to mitigate cross-talk. In this report we focus on the ters, c and Q. Inspection of equation 9 identifies a specific challenge
TN method. that the QFWI problem faces. The size of the frequency dependent
term in s, which models dispersion, is determined by Q. In effect, both
TN optimization is similar to exact Newton optimization, but rather c and Q co-determine the wave velocity at a given frequency. This
than directly solving 2, an approximate solution is obtained by itera- opens the possibility of considerable cross-talk, and is suggestive that
tively minimizing (Nocedal and Wright, 2006) variations from one frequency to another will be instrumental in miti-
gating it.
$$\theta(p) = \tfrac{1}{2}\,p^T H p + g^T p. \qquad (3)$$

Predicting cross-talk with an-acoustic scattering potentials
At a minimum of this objective function, the gradient of θ is zero, so The radiation patterns, or scattering potentials, of point perturbations
in active FWI parameters, plotted as functions of experimental vari-
Hp + g = 0, (4) ables (e.g., angle between incoming and outgoing rays, frequency,
etc.), are often used to determine the degree of expected parameter
satisfying equation 2. In this research, the truncated Gauss-Newton cross talk in multi-parameter FWI. Parameters which generate poten-
method is used, where H is replaced with HGN , the residual indepen- tials with proportional amplitude variations over a given range of these
dent part of the Hessian. Following Metivier et al. (2013), FWI updates experimental variables are easily confused with one another. The scat-
are iteratively constructed in what will be called the outer loop, and the tering potential V for position x and frequency ω associated with our
minimization of equation 3, which occurs once for each FWI update, chosen an-acoustic wave equation is
but is itself iterative, involves what will be called the inner loop. Provided a suitable optimization approach is employed in the inner loop,
this method does not require the storage or inversion of the Hessian
matrix $H_{GN}$, only the product of the Hessian with an arbitrary p. This
Hessian-vector product can be efficiently calculated using the adjoint
state method, as described in Metivier et al. (2013).
$$V(x,\omega) \approx -\frac{\omega^2}{c_0(x)^2}\,\left[V_Q(x,\omega) + V_c(x,\omega)\right], \qquad (10)$$
where
$$V_Q(x,\omega) = \frac{F(\omega)}{Q_0(x)}\,\Delta Q(x), \qquad V_c(x,\omega) = \left[1 + \frac{F(\omega)}{Q_0(x)}\right]\Delta c(x) \qquad (11)$$
We implement the inner loop of the TN FWI algorithm with a BFGS
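and where $F(\omega) = i - (2/\pi)\log(\omega/\omega_0)$; the ∆· quantities,
inner solver, wherein p is determined by iteratively solving
$$p_k = p_{k-1} + \alpha_k \Delta p, \quad \text{where } \Delta p = -Q\nabla p_{k-1}, \qquad (5)$$
$$\Delta Q(x) = 1 - \frac{Q_0(x)}{Q(x)}, \qquad \Delta c(x) = 1 - \frac{c_0(x)^2}{c(x)^2}, \qquad (12)$$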
Q is the BFGS approximation of the inverse Hessian of θ (which is represent localized jumps in their corresponding model parameters Q
the same as the inverse Hessian of φ ), and and c.
$$\alpha_k = -\frac{(\partial\theta/\partial p)^T \Delta p}{\Delta p^T Q\,\Delta p}. \qquad (6)$$
Notably, the Q and c components do not vary independently of one another with scattering angle. This means that angle-based considerations in the discrimination of different model parameters, which are
crucial in elastic and anisotropic FWI, do not apply here. The com-
The computational cost of the truncated Newton method is determined ponents do, however, undergo relative variation as frequency changes;
largely by the number of inner iterations used in minimizing (3) in our conclusion is that only through simultaneous inversion of a range
each FWI iteration. This cost is controlled by specifying stopping of frequencies can a QFWI update distinguish between the influence
conditions. The stopping conditions used here are satisfied when a of c and that of Q. This is illustrated in Figure 1. By inspection of
maximum number of iterations are reached, or the condition this plot, we can furthermore predict that if only a small range of fre-
quencies are considered, over which the two scattering potentials vary
||HGN p + g|| ≤ ||ηg|| (7) roughly in proportion, c and Q will be extremely difficult to distin-
guish. Over broader ranges of frequencies, however, significant differ-
holds, where η is a chosen forcing term. The smaller this forcing term, ences between the two scattering potentials become prevalent, which
the greater the cost and lower the cross-talk; the larger the forcing term, should enable a QFWI iteration to create meaningful updates in both.
the less the computational cost and the greater the cross talk. This introduces a new feature to multi-scale FWI workflows, in which
demands already exist on the frequencies considered.
Wave equations
Gradient for QFWI
In order to study in isolation new aspects of cross talk (etc.) in multi-
parameter FWI which are introduced by attenuation and dispersion, Gradients for c and Q, consistent with equations (1), (8), and (9) can
we consider waves whose propagation is governed by be written

2
ω s(r, ω) + ∇2 u(r, ω) = f (r, ω), (8) gc (r) = ∑ ω 2 1 + β (ω)sq0 (r) G0 (rg , r)G0 (r, rs ) δ d ∗ (13)
rg ,rs ,ω
where u is the pressure field, f is a source term, and the model pa-
and
rameter s includes a dispersive velocity and an attenuation. No single
attenuative-dispersive model is likely always to be entirely correct, so gq (r) = ∑ ω 2 β (ω)sc0 (r) G0 (rg , r)G0 (r, rs )δ d ∗ , (14)
rg ,rs ,ω
many exist, meaning that several anacoustic models could be consid-
ered. This variation, which has its own set of issues for FWI (Keating where δ d = δ d(rg , rs ) are the residulas, sc0 = c−2 −1
0 , sq0 = Q0 , are
and Innanen, 2017), is reflected in different specific forms of s. Here, the current model iterates, G0 (r, r0 ) is the Green’s function describing
the constant Q Kolsky-Futterman attenuation model is considered: propagation from r0 to r in the current medium iterate, and
2
1 1 2 ω β = i− log (ω/ω0 ) . (15)
s(r, ω) = 1+ i − log , (9) π
c2 (r) Q(r) π ω0
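The role of the forcing term in the stopping condition (7) can be illustrated with a minimal sketch. It is not the authors' implementation: a matrix-free conjugate gradient iteration is substituted here for the BFGS inner solver described above, and the small matrix `H` is a hypothetical stand-in for the Gauss-Newton Hessian, whose products with a vector would in practice come from the adjoint state method.

```python
import math

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def truncated_newton_step(hess_vec, g, eta=1e-3, max_inner=50):
    """Approximately solve H p = -g using only Hessian-vector products.

    hess_vec : callable v -> H v (stand-in for an adjoint-state product)
    g        : gradient
    eta      : forcing term of the stopping condition ||H p + g|| <= eta ||g||
    Returns the update p and the number of Hessian-vector products used.
    """
    p = [0.0] * len(g)
    r = [-gi for gi in g]          # r = -(H p + g) at p = 0
    d = list(r)
    g_norm = math.sqrt(dot(g, g))
    n_products = 0
    for _ in range(max_inner):
        if math.sqrt(dot(r, r)) <= eta * g_norm:
            break                  # forcing-term condition satisfied
        Hd = hess_vec(d)
        n_products += 1
        alpha = dot(r, r) / dot(d, Hd)
        p = [pi + alpha * di for pi, di in zip(p, d)]
        r_new = [ri - alpha * hi for ri, hi in zip(r, Hd)]
        beta = dot(r_new, r_new) / dot(r, r)
        d = [ri + beta * di for ri, di in zip(r_new, d)]
        r = r_new
    return p, n_products

# Toy symmetric positive-definite matrix (an assumption, not an FWI Hessian).
H = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
g = [1.0, -2.0, 0.5]
hess_vec = lambda v: [dot(row, v) for row in H]

p_loose, n_loose = truncated_newton_step(hess_vec, g, eta=1e-1)
p_tight, n_tight = truncated_newton_step(hess_vec, g, eta=1e-6)
```

A larger forcing term stops the inner loop earlier and therefore never uses more Hessian-vector products than a smaller one, which is the cost side of the trade-off described above.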

Figure 3: Gauss-Newton QFWI, inverting only one frequency at each iteration.
Figure 1: Amplitude of the scattering potential as a function of frequency for velocity perturbation (blue) and Q perturbation (red). Amplitudes have been normalized to 1 at 1 Hz. Background $Q_0 = 20$, and the reference frequency was 15 Hz for this example.
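The frequency behaviour summarized in the Figure 1 caption can be reproduced approximately from equations (10) and (11). The sketch below is an illustration only: it assumes the logarithm is natural, uses the caption's values ($Q_0 = 20$, reference frequency 15 Hz) as defaults, and evaluates the amplitudes per unit $\Delta c$ and $\Delta Q$, omitting the common factor $1/c_0^2$. The function names are hypothetical.

```python
import math

def kf_factor(freq_hz, ref_hz):
    """F(omega) = i - (2/pi) log(omega/omega_0); natural log assumed."""
    return 1j - (2.0 / math.pi) * math.log(freq_hz / ref_hz)

def potential_amplitudes(freq_hz, q0=20.0, ref_hz=15.0):
    """Return (|V_c|, |V_Q|) per unit jump, up to the common factor 1/c0^2."""
    omega_sq = (2.0 * math.pi * freq_hz) ** 2
    f = kf_factor(freq_hz, ref_hz)
    return omega_sq * abs(1.0 + f / q0), omega_sq * abs(f / q0)

def normalized_ratio(freq_hz, norm_hz=1.0, q0=20.0, ref_hz=15.0):
    """Ratio of the Q and c amplitudes, each normalized to 1 at norm_hz."""
    v_c, v_q = potential_amplitudes(freq_hz, q0, ref_hz)
    v_c1, v_q1 = potential_amplitudes(norm_hz, q0, ref_hz)
    return (v_q / v_q1) / (v_c / v_c1)

# Normalized Q-to-c amplitude ratio at a few frequencies in the 1-25 Hz band.
ratios = {f: normalized_ratio(float(f)) for f in (1, 2, 5, 14, 15, 25)}
```

Under these assumptions the ratio is 1 at 1 Hz by construction and falls to roughly one half by 15 Hz, while it changes by less than 1% between 14 Hz and 15 Hz. A narrow band therefore sees the two potentials as nearly proportional, whereas a broad band does not.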
Figure 4: Gauss-Newton QFWI, inverting a 1 Hz band of frequencies at each iteration. Compare with Figure 3.
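The three frequency-selection strategies compared in Figures 3 to 5, whose parameters are given in the numerical examples, can be written down as explicit schedules. This is a sketch with hypothetical function names; each returned entry is the list of frequencies (in Hz) inverted in one iteration, following the description in the text literally.

```python
def linspace(start, stop, n):
    """n evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]

def single_frequency_schedule(f_min=1, f_max=25, iters_per_freq=6):
    """One frequency per iteration, several iterations per frequency (Figure 3)."""
    return [[float(f)] for f in range(f_min, f_max + 1)
            for _ in range(iters_per_freq)]

def narrow_band_schedule(first_center=1.5, last_center=24.5, width=1.0, n_freq=6):
    """A band of fixed width whose centre moves up by 1 Hz per iteration (Figure 4)."""
    n_bands = int(round(last_center - first_center)) + 1
    return [linspace(first_center + k - width / 2.0,
                     first_center + k + width / 2.0, n_freq)
            for k in range(n_bands)]

def broad_band_schedule(f_min=1.0, first_max=2, last_max=25, n_freq=6):
    """A band anchored at f_min whose upper limit grows by 1 Hz per iteration (Figure 5)."""
    return [linspace(f_min, float(f_max), n_freq)
            for f_max in range(first_max, last_max + 1)]

single = single_frequency_schedule()
narrow = narrow_band_schedule()
broad = broad_band_schedule()
```

The narrow and broad schedules invert the same number of frequencies per iteration, so any difference in cross-talk between them comes from the frequency range spanned rather than from the amount of data.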
Figure 2: Benchmark model, velocity (left) and Q (right). The velocity values correspond with reference frequency $\omega_0/2\pi = 30$ Hz.
NUMERICAL EXAMPLES

We use the QFWI framework in the previous section to examine the effects of (1) frequency groups used in individual iterations, and (2) optimization method on parameter cross-talk. 2D frequency-domain finite difference modelling is used for simulations; models used are defined on a 2D 50×50 grid with 10 m grid cells. 24 sources at 30 m depth are spaced 20 m apart from 10-470 m; 48 receivers at 20 m depth are spaced 10 m apart from 10-480 m. Frequencies from 1 Hz to 25 Hz are assumed to be available, and the source function is considered to have a uniform amplitude spectrum over this range. First order Engquist boundary conditions are implemented at every boundary. The reference frequency considered was 30 Hz. Figure 2 illustrates the benchmark model used for all of the examples; the initial model consists of homogeneous $c_0$ and $Q_0$ values equal to the background values in Figure 2. Velocity perturbations placed above and below a Q anomaly were chosen to highlight cross-talk issues as they arise due to optimization strategies and frequency bandwidth.

Frequencies in multiscale QFWI

In the examples shown in this section, the effect of frequency information on cross-talk in QFWI is investigated. Full Gauss-Newton optimization is employed to ensure that cross-talk is not being introduced by numerical optimization.

Figure 3 illustrates the results of QFWI when one frequency is inverted at each iteration. 6 iterations were performed at each frequency, beginning at the lowest frequency, 1 Hz, then increasing in 1 Hz increments up to 25 Hz. The problems in Figure 3 highlight important features of QFWI. Cross-talk impairs the Q estimate far more strongly than the c estimate, with the Q anomaly not meaningfully recovered. The c estimate is also strongly impacted by cross-talk, with the lower anomaly (to illuminate which raypaths have had to traverse the Q-anomaly) being much more poorly reconstructed. Because this cross-talk occurs despite the use of full Gauss-Newton optimization, and comprehensive simulated acquisition, it is reliably traceable to the use of a single frequency, which is similar to attempting to solve for several elastic parameters with a single angle of data.

Figure 4 illustrates the results of QFWI in which a narrow band of frequencies (6 evenly-spaced frequencies in a 1 Hz range) was inverted at each iteration. One iteration was carried out per band, beginning with a band centered at 1.5 Hz, and increasing the center frequency by 1 Hz at each iteration, up to 24.5 Hz. The improved recovery of the deeper velocity anomaly and the Q anomaly are notable (compare with Figure 3). This is further evidence supporting the prediction from scattering potential analysis that groups of frequencies at each iteration offer the only tangible means of discriminating between velocity and Q, and mitigating cross-talk.

Figure 5 illustrates the results of QFWI in which a broad band of frequencies (6 evenly-spaced frequencies over a growing range) was inverted at each iteration. The 6 frequencies were distributed from 1 Hz to a maximum frequency which began at 2 Hz, and increased by 1 Hz per iteration to a maximum of 25 Hz. This approach produces the best recovery of the three strategies. All else having been held fixed in these experiments, we conclude that iterations involving a large, varied range of frequencies are optimal for suppressing cross-talk. This is in keeping with the general principle that, to separate any two parameters, data must span experimental variables across which the two have different characteristic scattering signatures.

Figure 5: Gauss-Newton QFWI, inverting a broad band of frequencies at each iteration. Compare with Figures 3 and 4.

Optimization strategy

Based on our conclusions above, we will from this point on use the broad frequency-band multiscale approach (i.e., the scheme by which Figure 5 was generated). An outstanding weakness in those results,
however promising, is that to achieve them we have used an exact Gauss-Newton optimization strategy, which, while viable for small models, is computationally very expensive. To improve efficiency, a steepest descent scheme can be employed, in which the Hessian is approximated with the unit operator. Figure 6 illustrates the results of doing so. For this example, 23 iterations were carried out at each frequency band. Cross-talk is dominant in the result, despite the very large number of iterations used. So, although a broad frequency band brings into the inversion sufficient information to suppress cross-talk, only with the weights provided by the Hessian is that information correctly used. In this example, the gradient was evaluated 575 times, which, in addition to the 2356 objective function evaluations required in the 575 necessary line searches, brought the total cost to 3506 wavefield simulations.

Figure 6: Result of steepest descent FWI, inverting a large band of frequencies at each iteration. Severe cross-talk is evident, despite using the same frequencies for inversion as in Figure 5.

In common with other multiparameter FWI problems, something lying between the full Gauss-Newton and steepest descent schemes is needed to efficiently suppress cross-talk. The truncated Newton (TN) optimization approach plays this role. Figure 7 illustrates the result of applying truncated Gauss-Newton QFWI (hereafter TN-QFWI) with a forcing term of $10^{-3}$. One iteration was carried out at each frequency band. This result shows little evidence of cross-talk. The small forcing term makes this a close approximation to exact Gauss-Newton optimization. This forcing term magnitude also meant that a total of 1789 Hessian-vector products were evaluated, each with the same computational cost as a gradient calculation. In addition to the 25 gradient calculations, and the 100 objective function evaluations required in the 25 line searches performed, this brought the total cost to 3728 wavefield propagation problems solved.

Figure 7: TN-QFWI, inverting a large band of frequencies at each iteration with a forcing term of $10^{-3}$.

We next consider to what degree cross-talk suppression like this can be obtained with larger forcing terms, i.e., greater efficiency. In Figure 8 the result with a forcing term of $10^{-2}$, with all else unchanged, is illustrated. The recovered model contains significant errors, mostly in the recovered Q. This approach required a total of 803 Hessian-vector product calculations, in addition to 25 gradient calculations and 100 objective function evaluations required in 25 line searches, for a total cost of 1756 wavefield propagation problems. This suggests that two modes of TN-QFWI be considered: a relatively low-cost mode involving a high forcing term, producing a velocity model that is very successfully “protected” from the influence of Q, but a Q model which is itself of limited use; and a higher-cost mode involving a low forcing term, producing high-fidelity reconstructions of both parameters.

Figure 8: TN-QFWI, inverting a large band of frequencies at each iteration with a forcing term of $10^{-2}$.

An interesting case to consider is that in which a less accurate TN approximation is used, but with a greater number of outer iterations performed. Figure 9 illustrates the results of this experiment: $\eta = 10^{-2}$ is used, with 2 iterations per frequency band. Very little cross-talk is observed in the recovered model for this example. The computational cost is high, however: a total of 2475 Hessian-vector products are evaluated, 683 on first iterations of a frequency band, and 1792 on the second. Combined with 50 gradient calculations and 219 objective function evaluations required in 50 line searches, a total of 5269 wavefield propagation problems were solved. There appears therefore to be a close relationship between outer iterations performed, and inner iterations required; a simple compromise between these is not easily achieved.

Figure 9: TN-QFWI, inverting a large band of frequencies at each iteration with a forcing term of $10^{-2}$, 2 iterations per frequency band.

CONCLUSIONS

Cross-talk is a serious concern in QFWI, and has a particularly strong impact on the recovered Q model. Frequency dependent effects play a major role in eliminating this cross-talk. Single frequency updates produce results dominated by cross-talk, even when exact Gauss-Newton optimization is used. Inverting even a small band of frequencies per iteration offers a notable improvement, and the best results were achieved inverting the largest range of frequencies possible. Adequate consideration of Hessian information, for instance through truncated Newton methods, is crucial to using these frequencies to suppress cross-talk. A greater number of FWI iterations can compensate for a less precise estimate of the Hessian, but the cost of approximating the Hessian is not constant, and the cost of performing additional iterations at a chosen level of accuracy may not be easy to predict. We observe that TN-QFWI in “efficient” mode, with a high forcing term, may provide efficient and robust velocity estimates, protected from attenuation, at the cost of a well-resolved Q model estimate.

ACKNOWLEDGMENTS

We thank the sponsors of CREWES for continued support. This work was funded by CREWES industrial sponsors and NSERC (Natural Science and Engineering Research Council of Canada) through the grant CRDPJ 461179-13.
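The wavefield-simulation totals quoted in the optimization strategy section can be checked with a short accounting sketch. It rests on an inference rather than a statement in the text: the quoted totals are reproduced if each gradient and each Hessian-vector product costs two wavefield solves, and each objective function evaluation costs one. The function name is hypothetical.

```python
def wavefield_solves(n_gradients, n_hessian_vector, n_objective):
    """Total wavefield solves for one experiment.

    Assumption (inferred from the quoted totals, not stated in the text):
    a gradient or a Hessian-vector product costs two solves, and an
    objective function evaluation costs one.
    """
    return 2 * (n_gradients + n_hessian_vector) + n_objective

# (gradients, Hessian-vector products, objective evaluations) as quoted.
experiments = {
    "steepest descent (Figure 6)": (575, 0, 2356),
    "TN-QFWI, forcing term 1e-3 (Figure 7)": (25, 1789, 100),
    "TN-QFWI, forcing term 1e-2 (Figure 8)": (25, 803, 100),
    "TN-QFWI, forcing term 1e-2, 2 iterations per band (Figure 9)": (50, 2475, 219),
}
totals = {name: wavefield_solves(*counts) for name, counts in experiments.items()}
```

Under this assumption the four totals come out as 3506, 3728, 1756 and 5269, matching the figures quoted for the steepest descent and the three TN-QFWI experiments.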

REFERENCES
Choi, Y., D. Min, and C. Shin, 2008, Two-dimensional waveform inversion of multi-component data in
acoustic-elastic coupled media: Geophysical Prospecting, 56, 863–881,
http://doi.org/10.1111/j.1365-2478.2008.00735.x.
Hak, B., and W. Mulder, 2011, Seismic attenuation imaging with causality: Geophysical Journal
International, 184, 439–451, https://doi.org/10.1111/j.1365-246X.2010.04848.x.
Hicks, G., and R. Pratt, 2001, Reflection waveform inversion using local descent methods: Estimating
attenuation and velocity over a gas-sand deposit: Geophysics, 66, 598–612,
https://doi.org/10.1190/1.1444951.
Innanen, K. A., 2014, Seismic AVO and the inverse Hessian in precritical reflection full waveform
inversion: Geophysical Journal International, 199, 717–734, https://doi.org/10.1093/gji/ggu291.
Innanen, K. A., and A. B. Weglein, 2007, On the construction of an absorptive-dispersive medium model
via direct linear inversion of reflected seismic primaries: Inverse Problems, 23, 2289–2310,
https://doi.org/10.1088/0266-5611/23/6/001.
Kamei, R., and R. Pratt, 2013, Inversion strategies for visco-acoustic waveform inversion: Geophysical
Journal International, 194, 859–884, https://doi.org/10.1093/gji/ggt109.
Keating, S., and K. A. Innanen, 2017, Characterizing and mitigating uncertainty in the physics of
attenuation in an-acoustic full waveform inversion: 87th Annual International Meeting, SEG,
Expanded Abstracts (submitted).
Inverse Scattering, Theory and Application, Society for Industrial and Applied Mathematics,
Malinowski, M., S. Operto, and A. Ribodetti, 2011, High-resolution seismic attenuation imaging from
wide-aperture onshore data by visco-acoustic frequency-domain full-waveform inversion:
Geophysical Journal International, 186, 1179–1204, https://doi.org/10.1111/j.1365-246X.2011.05098.x.
Metivier, L., R. Brossier, J. Virieux, and S. Operto, 2013, Full waveform inversion and the truncated
Newton method: SIAM Journal on Scientific Computing, 35, B401–B437,
https://doi.org/10.1137/120877854.
Metivier, L., R. Brossier, S. Operto, and J. Virieux, 2015, Acoustic multi-parameter FWI for the
reconstruction of P-wave velocity, density and attenuation: preconditioned truncated Newton
approach: 85th Annual International Meeting, SEG, Expanded Abstracts, 1198–1203.
Nocedal, J., and S. J. Wright, 2006, Numerical optimization (2nd ed.): Springer.
Operto, S., Y. Gholami, V. Prieux, A. Ribodetti, R. Brossier, L. Metivier, and J. Virieux, 2013, A guided
tour of multiparameter full-waveform inversion with multicomponent data: From theory to
practice: The Leading Edge, 32, 1040–1054, https://doi.org/10.1190/tle32091040.1.
Pan, W., K. A. Innanen, G. F. Margrave, M. Fehler, X. Fang, and J. Li, 2016, Estimation of elastic
constants for HTI media using Gauss-Newton and full Newton multi-parameter full waveform
inversion: Geophysics, 81, no. 5, E323–E339, https://doi.org/10.1190/geo2015-0594.1.
Plessix, R.-E., P. Milcik, H. Rynia, A. Stopin, K. Matson, and S. Abri, 2013, Multiparameter full-
waveform inversion: Marine and land examples: The Leading Edge, 32, 1030–1038,
https://doi.org/10.1190/tle32091030.1.
1259–1266, https://doi.org/10.1190/1.1441754.
Tarantola, A., 1986, A strategy for nonlinear inversion of seismic reflection data: Geophysics, 51, 1893–
1903, https://doi.org/10.1190/1.1442046.
Joint FWI for velocity model building: a real case study in the viscoacoustic approximation
W. Zhou∗† , R. Brossier† , S. Operto‡ , J. Virieux† , P. Yang†
†ISTerre, Univ. Grenoble Alpes, ‡ Géoazur-CNRS, Univ. Nice–Sophia-Antipolis
SUMMARY

Joint full waveform inversion (JFWI) combines reflection (RWI) and early-arrival (EWI) waveform inversions to build a large-scale velocity model of the subsurface. It is alternated with a waveform inversion/migration of near-offset reflections to build a short-scale impedance model that is used as an input to build the sensitivity kernel of RWI along the two-way reflection paths. The velocity macromodel built by JFWI can be used as the initial model of standard FWI to enrich the high wavenumber content of the model. We present an application of this workflow to a real 2D OBC profile across a gas cloud in the North Sea. First, we highlight the footprint of attenuation by comparing recorded seismograms with the synthetics computed in a viscoacoustic velocity model previously developed by 3D FWI. Then, the main promises and pitfalls of JFWI are highlighted using two initial models of increasing accuracy. When starting from a crude 1D initial model, JFWI is influenced by cycle-skipping artifacts and fails to update the low-wavenumber content of the subsurface model. When a more accurate initial model is used, the procedure of JFWI followed by standard FWI (with the resulting JFWI model as initial model) succeeds in building a velocity model which is more accurate than the one built directly by standard FWI. This study suggests that JFWI is more efficient than FWI at updating the low horizontal wavenumbers along the reflection wavepaths during the velocity macromodel building task, hence leading to a more accurate reconstruction of the gas cloud. However, it remains prone to cycle skipping when a conventional difference-based misfit function is considered. Therefore, more robust misfit functions must be used in the future to reduce the demand on an accurate initial model.

INTRODUCTION

With the development of long-offset wide-azimuth acquisition geometries, waveform inversion of early arrivals (EWI) such as diving waves and wide-angle reflections is useful to build an accurate velocity model (Virieux and Operto, 2009). However, the limitation of EWI is the insufficient sensitivity to deep structures that are below the maximum penetration depth of early arrivals. In addition, a brute-force application of EWI is often prevented by cycle-skipping issues due to long propagation distances when very low frequencies are unavailable (e.g. Warner and Guasch, 2014; Luo et al., 2016; Métivier et al., 2016). Alternatively, reflection waveform inversion (RWI) (e.g. Xu et al., 2012; Wang et al., 2013; Brossier et al., 2015) has been proposed to build the velocity model by restricting the sensitivity kernel of full waveform inversion (FWI) along the two-way transmission paths of short-spread reflections, hence increasing the depth sensitivity compared to EWI. A short-scale reflectivity model, produced by migration/inversion of reflection data, is used as an input to build the RWI gradient. A key property of RWI is the significant reduction of cycle-skipping issues in short offset ranges.

Due to the different wavepaths involved in EWI and RWI, the two inversion schemes tend to preferentially sample the vertical and horizontal wavenumber components of the velocity model, respectively. This prompts us to combine EWI and RWI into a joint inversion (JFWI) to broaden the wavenumber spectrum of the velocity model (Zhou et al., 2015). They show with a synthetic experiment that JFWI outperforms RWI for near-surface reconstruction, which translates to more accurate reflector images at great depths. A real-case study further reveals that JFWI is still sensitive to cycle skipping mainly caused by early arrivals, which leads us to implement a layer-stripping approach by offset continuation (Zhou et al., 2016). However, we did not account for attenuation, the footprint of which in the wavefield is not negligible in this target zone (Prieux et al., 2011; Operto et al., 2015).

The sensitivity of FWI to attenuation effects has been reviewed in Kurzmann et al. (2013), who have shown a significant improvement of the velocity model when attenuation is accounted for during seismic modeling, even in an approximate way. Therefore, we are motivated to re-assess JFWI when seismic modeling is performed with attenuation. In this abstract, we shall first illustrate the attenuation footprint in the data set, and then assess the sensitivity of JFWI to cycle-skipping issues. The velocity macromodel built by JFWI will be further assessed in terms of kinematic accuracy and spatial resolution as an initial model for FWI.

METHODOLOGY

Two key ingredients of JFWI are, on the data side, the explicit separation between the early arrivals $d^e$ and short-spread reflections $d^r$ and, on the model side, the scale separation between the velocity macromodel $V_P$ and the short-scale impedance model $I_P$ (Jannane et al., 1989; Operto et al., 2013). The weighted misfit function of JFWI writes (Zhou et al., 2015)

$$C(V_P)|_{I_P} = \tfrac{1}{2}\,\|W^e (d^e - R u_0)\|^2 + \tfrac{1}{2}\,\|W^r (d^r - R\,\delta u)\|^2,$$

where the background wavefield $u_0$ is computed in the smooth $V_P$ model. The full scattered wavefield $\delta u$ is the difference between the full wavefield $u$ that is computed in the smooth $V_P$ and rough $I_P$ models and the background wavefield (i.e. $\delta u = u - u_0$). The explicit data separation between early arrivals and reflections is needed to evaluate the two corresponding independent L2 functionals, the gradient of which naturally excludes the high-wavenumber migration isochrones (first order). Such separation may require a careful preprocessing as reviewed in Zhou et al. (2015). The computational cost of JFWI is twice that of standard FWI while no memory overhead is required. The overall workflow consists of the following steps starting from smooth $V_P$ and $I_P$ models (Zhou et al., 2015): [1] Least-squares migration of near-offset reflections to generate $I_P$ perturbations; [2] JFWI to update smooth $V_P$; [3] Go back to
Step [1] until convergence. The repetition of Step [1] is needed to recreate $I_P$ perturbations consistently with new $V_P$ models while keeping the background component of $I_P$ fixed to its initial value, such that near-offset reflections are still matched during Step [2].

[Figure 1, panels (a) and (b). Annotations: sediment layers, gas cloud, caprock. Axes: x (km), z (km). Color scale: 1.4 to 3.0 km/s.]
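The weighted misfit function given in the methodology section can be evaluated with a few lines of code. The sketch below is an illustration under stated assumptions, not the authors' implementation: wavefields are flattened to short real-valued lists, the operator $R$ is assumed to sample the wavefield at receiver indices, and $W^e$ and $W^r$ are assumed to be diagonal weights. All names are hypothetical.

```python
def weighted_half_l2(weights, residual):
    """0.5 * || W r ||^2 for a diagonal weighting W."""
    return 0.5 * sum((w * r) ** 2 for w, r in zip(weights, residual))

def jfwi_misfit(d_early, d_refl, u0, u_full, receiver_idx, w_early, w_refl):
    """C = 0.5||W^e (d^e - R u0)||^2 + 0.5||W^r (d^r - R du)||^2, du = u - u0.

    u0     : background wavefield (smooth V_P model only)
    u_full : full wavefield (smooth V_P plus rough I_P models)
    R is assumed here to pick the samples listed in receiver_idx.
    """
    r_u0 = [u0[i] for i in receiver_idx]
    r_du = [u_full[i] - u0[i] for i in receiver_idx]
    res_early = [d - s for d, s in zip(d_early, r_u0)]
    res_refl = [d - s for d, s in zip(d_refl, r_du)]
    return (weighted_half_l2(w_early, res_early)
            + weighted_half_l2(w_refl, res_refl))

# Toy numbers (assumptions for illustration only).
u0 = [1.0, 2.0, 3.0, 4.0]
u_full = [1.5, 2.0, 2.0, 4.0]
receivers = [0, 2]
misfit_exact = jfwi_misfit([1.0, 3.0], [0.5, -1.0], u0, u_full, receivers,
                           [1.0, 1.0], [1.0, 1.0])
misfit_early_error = jfwi_misfit([2.0, 3.0], [0.5, -1.0], u0, u_full, receivers,
                                 [2.0, 1.0], [1.0, 1.0])
```

Because the two terms use separate data and separate weights, an error confined to the early arrivals changes only the first term, which is what allows the two functionals to be balanced independently.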
DATA ANATOMY: TOWARDS VISCOACOUSTIC MODELING

We apply JFWI to an OBC line acquired in the North Sea. A velocity model has been built by 3D reflection traveltime tomography (Fig. 1a), which shows a low-velocity gas area (blue area) embedded in the soft sediments above the caprock at 2.4 km depth (Barkved et al., 2010; Haller et al., 2016). Using the tomographic model as the initial model, various applications of 3D FWI (e.g., Sirgue et al., 2010; Operto et al., 2015) have largely increased the resolution of the subsurface models (Fig. 1b).

Figure 1: 2D sections of $V_P$ models built by (a) reflection tomography (Courtesy of BP) and (b) 3D FWI (Operto et al., 2015). Corresponding $Q_P$ models (c,d) and migrated images (e,f) inferred from (a,b), respectively.

Two frequency bands, 3–5.1 Hz and 3–7.1 Hz, are sequentially considered for inversion (Bunks et al., 1995). Gibbs effect after band-pass filtering was mitigated by reshaping the source wavelets estimated from zero offset traces. The main phases identified in Fig. 2a are direct and diving waves (yellow arrows), short-spread reflections from shallow reflectors (blue) and the sediment-caprock interface (red). At far offsets, the postcritical reflections (dashed red) and refractions (dashed magenta) from the sediment-caprock interface provide low-wavenumber sensitivity down to 2.5 km depth and are treated as early arrivals accordingly. Based on this phase identification, we define the offset-dependent time window function for data separation as shown in Fig. 2a, yellow line.

Before inversion, we cautiously made a decision concerning the modeling tool for wavefield simulation. Faithful anisotropic models ($\varepsilon$ and $\delta$) are available (courtesy of BP) and the elastic footprint in the data set can be neglected (Operto et al., 2015, their figure 23). However, this VTI acoustic approximation of wave propagation is not kinematically accurate enough as suggested by the non-negligible mismatches in Fig. 2b, black arrow, between the recorded diving waves and the synthetics that are computed in the existing 3D FWI velocity model of Operto et al. (2015). We attribute these mismatches to attenuation caused by the gas cloud and soft sediments.

To account for attenuation effects, we use three standard linear solid (SLS) mechanisms (Emmerich and Korn, 1987; Carcione et al., 1988; Robertsson et al., 1994; Moczo and Kristek, 2005) to achieve a nearly frequency-independent $Q_P$ (Plessix, 2016; Yang et al., 2016). For simplicity, we assume $Q_P = 1000$ in the water layer, $Q_P = 95.17 \times (V_P - 1.3)^{2.5} + 50$ for $1.5 \le V_P \le 2.5$ km/s giving $Q_P \approx 60$ in the gas cloud, and $Q_P = 200$ below the caprock (Fig. 1c,d). As expected, this viscoacoustic modeling engine allows for an improved data fit, especially for the direct wave and postcritical reflection (Fig. 2c). Although a better fit of diving waves and postcritical reflections is achieved with 3D modeling (Fig. 2d), we limit our study to 2D geometry for the sake of numerical efficiency. The 2D assumption raises the issue of the footprint of the out-of-plane effects. Nevertheless, we shall leave them in the data set and process them during inversion as coherent noise.

Figure 2: (a) One receiver gather of real data after low-pass filtering (cutoff at 7.1 Hz) and wavelet reshaping. (b-d) Same gather plotted in a blue-white-red color scale. The overlaid synthetics (black wiggles) are computed in the $V_P$ model of Fig. 1b with different modeling engines (titled below). The two data sets are in phase if the black wiggles cover the blue area. An increasing level of data fit is observed from (b) to (d) (black arrows). To achieve potentially acceptable data fit while avoiding cumbersome computations, we use viscoacoustic 2D modeling in this study.
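The empirical $Q_P$ model described above can be written as a small function. This is a sketch with hypothetical names; the water layer and the region below the caprock are selected here by explicit flags, since the text defines them by position rather than by velocity, and the velocity of 1.7 km/s used in the example is chosen only for illustration.

```python
def q_p(v_p_km_s, in_water=False, below_caprock=False):
    """Quality factor Q_P from the simple model described in the text.

    v_p_km_s      : P-wave velocity in km/s
    in_water      : True for the water layer (Q_P = 1000)
    below_caprock : True below the caprock (Q_P = 200)
    Elsewhere the velocity-dependent relation is used for 1.5 <= V_P <= 2.5.
    """
    if in_water:
        return 1000.0
    if below_caprock:
        return 200.0
    if not 1.5 <= v_p_km_s <= 2.5:
        raise ValueError("relation is stated only for 1.5 <= V_P <= 2.5 km/s")
    return 95.17 * (v_p_km_s - 1.3) ** 2.5 + 50.0

q_low = q_p(1.7)    # a low sediment velocity, chosen for illustration
q_top = q_p(2.5)    # upper end of the stated velocity range
```

At 1.7 km/s the relation gives a value close to 60, in line with the $Q_P \approx 60$ quoted for the gas cloud, and at 2.5 km/s it gives approximately 200, so it joins the constant value used below the caprock almost continuously.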
RESULTS

Joint FWI

We shall use two initial models of increasing accuracy to test the sensitivity of JFWI to the initial model. The first one is built by smoothing the tomographic model below the sea bed to remove the reflectivity component followed by a lateral averaging (Fig. 3a). Hence, the purpose of using this 1D model is to assess whether JFWI can recover the lateral variations generated by the gas cloud. We assess the velocity model by computing migrated images or $I_P$ perturbations by IpWI. The flatness of the sediment-caprock interface, revealed by former studies, is used to assess the faithfulness of the velocity in the gas cloud (Fig. 1e,f). The migrated image computed in the 1D initial model shows the deepening of the sediment-caprock interface below the gas cloud (Fig. 3c), which results from overestimated velocities in the overburden.
Fig. 3b shows the inversion result. Not having imaged the gas cloud, JFWI creates a pair of high-velocity anomalies along reflection paths above the caprock together with a low-velocity blob in the middle of the gas cloud. These velocity artifacts in the overburden lead to an inaccurate migrated image highlighted by the non-flatness of the caprock reflector (Fig. 3d). Fig. 3e,f shows the data fit achieved by the 1D initial and the resulting JFWI $V_P$ models, in which the $I_P$ perturbations are added to generate reflected waves. Both $V_P$ models provide good data match at short offsets. The postcritical reflections are missing in the synthetic data; these phases may either not be produced or interfere with diving/refracted waves, due to the high velocities in the overburden, and hence are not observable. On the other hand, an improved fit of refractions is shown after JFWI; however, this is a cycle-skipped fit. To verify this statement, we show the seismograms at the x = 5 km position in Fig. 4. It is obvious that JFWI has decreased the least-squares based misfit function at the price of a larger traveltime lag, leading to the aforementioned high-velocity artifacts (Fig. 3b).

Figure 3: (a) 1D initial model. (b) Final JFWI velocity model. (c,d) Corresponding migrated images. (e,f) Data fit at 5.1 Hz. Note the synclinal-shape caprock image resulting from overestimated velocities in the overburden.

[Figure 4 panels: before JFWI, after JFWI. Curves: recorded trace, modeled trace, and their envelopes. Annotations: desired shift, cycle-skipped shift. Horizontal axis: t (s), 6.0 to 7.8.]
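The cycle-skipping behaviour described above, in which the least-squares misfit decreases while the traveltime lag grows, can be reproduced with a synthetic toy example. The sketch below is not derived from the field data: it assumes a 5 Hz oscillation under a Gaussian envelope as the "trace" and evaluates the L2 misfit between a recorded trace and a modeled trace delayed by a given lag. All names and parameter values are hypothetical.

```python
import math

def trace(t, arrival, freq_hz=5.0, sigma=0.4):
    """Toy seismic arrival: a cosine under a Gaussian envelope (an assumption)."""
    tau = t - arrival
    return math.exp(-(tau / sigma) ** 2) * math.cos(2.0 * math.pi * freq_hz * tau)

def l2_misfit(lag, freq_hz=5.0, dt=0.002, t_max=4.0, arrival=2.0):
    """0.5 * integral of (recorded - modeled)^2 for a modeled trace delayed by lag."""
    n = int(round(t_max / dt))
    total = 0.0
    for k in range(n + 1):
        t = k * dt
        r = trace(t, arrival, freq_hz) - trace(t, arrival + lag, freq_hz)
        total += r * r * dt
    return 0.5 * total

period = 1.0 / 5.0
misfit_half = l2_misfit(0.5 * period)    # modeled trace in anti-phase
misfit_start = l2_misfit(0.7 * period)   # starting lag above half a period
misfit_skipped = l2_misfit(period)       # lag of one full period
```

Starting from a lag of 0.7 periods, the misfit falls as the lag increases towards one full period and rises as the lag decreases towards half a period. A local descent method therefore moves away from the correct alignment at zero lag, which corresponds to the "cycle skipped shift" as opposed to the "desired shift".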
Figure 4: Cycle skipping. Zoom of the refracted wave at x = 5 km (Figs 3e,f). The waveform envelopes (dashed lines) help quantify the traveltime which is undesirably increased after inversion.

We build a second initial model by smoothing the tomographic model such that the large-scale trend of the gas cloud is preserved (Fig. 5a). Unlike the 1D initial model, this 2D initial model generates a migrated image with a flat caprock reflector (Fig. 5c) as the smoothing, less aggressive than averaging, does not significantly degrade the kinematic accuracy of the original tomographic model.

Fig. 5b shows the JFWI result. No cycle skipping is witnessed. Sufficiently low velocities are recovered in the gas cloud, preserving flat structures in the migrated image (Fig. 5d). To further assess this JFWI model as an initial model for FWI, we perform standard FWI with a classical frequency continuation scheme that inverts the 5.1 Hz data set before the 7.1 Hz one. The final FWI model shows a reasonable broadband reconstruction of the subsurface (Fig. 5e).

Figure 5: (a) 2D initial model. (b) Final JFWI velocity model. (c,d) Corresponding migrated images. (e) FWI velocity model starting from (b). (f) FWI velocity model starting from (a).

The low-velocity blob in the 2D initial model improves the fit of the postcritical reflections (Fig. 6a) compared with the 1D initial model (Fig. 3e). However, some mismatches are still visible, implying a deficit of low wavenumbers in this model. Fig. 6b,c show the data fit for the 5.1 Hz JFWI model and
7.1 Hz subsequent FWI model, respectively. The former shows an improved data fit suggesting a reliable update of the large-scale velocity variations by JFWI. The latter also shows a good fit except for the postcritical reflections at the x = 7 km position (Fig. 6c, black arrow), which is a desirable mismatch as these reflections come from out-of-plane propagation. We acknowledge an unfortunate cycle-skipping phenomenon for near-offset direct waves at the border of the near-surface (gray arrow). A 3D inversion combined with more prudent frequency continuation schemes may help solve this problem (Operto et al., 2015).

Standard FWI

To assess the effectiveness of JFWI in initial macro model building for FWI, we perform FWI using the 2D smooth model (Fig. 5a) as initial model. The result is shown in Fig. 5f. We also show in Fig. 7 the direct comparison between the two FWI results along a horizontal profile at 2.3 km depth crossing the gas cloud. Both suggest that the FWI result starting from the JFWI model has a higher lateral resolution in the gas cloud with a richer low wavenumber content than the other FWI result. This is because JFWI has succeeded in updating the low horizontal wavenumbers along sub-vertical wavepaths connecting the reflectors to the surface, whereas FWI is more suitable to update the low vertical wavenumbers along the wavepaths associated with wide-aperture arrivals (diving waves and postcritical reflections).

Such a deficit of low wavenumbers in the FWI model of Fig. 5f cannot be easily detected when we assess the accuracy of the model through data fit (Fig. 6d). The improved match of diving waves by the standard FWI model is undesirable as these waves have undergone out-of-plane propagation. In order to reduce the misfit function, standard FWI may have shifted the synthetic postcritical reflections to earlier traveltimes, leading to overestimated velocities in the gas cloud as shown in Fig. 7, black arrow. In contrast, it seems that the higher sensitivity of JFWI to lateral subsurface variations has prevented such artifacts and helped recover the low velocities in the gas cloud although the misfit value is slightly higher.

extensions to 3D geometry to take advantage of broader aperture illumination and account more accurately for 3D wave propagation effects.

Acknowledgements This study was funded by the SEISCOPE consortium, sponsored by CGG, CHEVRON, EXXON-MOBIL, JGI, SHELL, SINOPEC, STATOIL, TOTAL and WOODSIDE. This study was granted access to the HPC resources of CIMENT infrastructure and CINES/IDRIS under the allocation 046091 made by GENCI. The authors thank BP Norge AS and Hess Norge AS for providing the data set and the permission to publish this work. The first author appreciates fruitful discussions with F. Audebert (TOTAL), H. Chauris (MinesParisTech), A. Górszczyk (PAN), G. Lambaré (CGG), L. Métivier (LJK-CNRS, UGA), and R.-E. Plessix (Shell).
CONCLUSIONS Figure 6: Data fit at 5.1 Hz (a,b) and 7.1 Hz (c,d). Synthetics are
computed in (a) 2D initial model (Fig. 5a), (b) JFWI model (Fig. 5b),
(c) FWI result using JFWI model (Fig. 5e), and (d) FWI result using
We have combined the sensitivity kernels associated to early
2D initial model (Fig. 5f). Compared with (c), the FWI data fit of
arrivals and reflections for velocity macromodel building that (d) shows a over-fitting of the postcritical reflections (black arrows)
is suitable for standard FWI implementation. The approach despite a better match of shallow transmitted waves (gray arrows).
has been applied to a 2D OBC data set collected across a gas
cloud. We have considered attenuation in the wave simulation 2.2
part of the inversion, the significance of which has been illus-
Vp (km/s)
trated on the postcritical reflections and refracted waves from 2.0

below the gas cloud (i.e. early arrivals at far offsets). However, 1.8
the associated cycle-skipping issue has prevented us from us-
ing a crude initial model. Therefore, it is still required to de- 1.6
sign a decent initial model for the proposed least-squares based 5 7 9 11 13 15
misfit function. When using this initial model, the workflow x (km)
that alternates JFWI and impedance waveform inversion builds Figure 7: Horizontal profiles at 2.3 km depth of the 3D FWI model
an enough accurate initial model for standard FWI: low veloc- (Fig. 1b) (black) as reference, 2D smooth model (dashed), FWI re-
ities are nicely reconstructed in the gas cloud, unlike the direct sult using JFWI model (green) and FWI result using 2D initial model
application of FWI considering the same initial model that we (blue). Note the deviation of the former FWI model resulting from the
over-fitting of the postcritical reflections (black arrow), whereas JFWI
have taken for JFWI. Further investigations should deal with
followed by FWI matches reasonably well the reference profile.
alternative misfit function to mitigate cycle skipping as well as
© 2017 SEG Page 1425

REFERENCES
Barkved, O., U. Albertin, P. Heavey, J. Kommedal, J. van Gestel, R. Synnove, H. Pettersen, and C. Kent,
2010, Business impact of full waveform inversion at Valhall: 91st SEG Annual International
Meeting, https://doi.org/10.1190/1.3513929.
Brossier, R., S. Operto, and J. Virieux, 2015, Velocity model building from seismic reflection data by full
waveform inversion: Geophysical Prospecting, 63, 354–367, https://doi.org/10.1111/1365-
2478.12190.
Bunks, C., F. M. Salek, S. Zaleski, and G. Chavent, 1995, Multiscale seismic waveform inversion:
Geophysics, 60, no. 5, 1457–1473, https://doi.org/10.1190/1.1443880.
Carcione, J., D. Kosloff, and R. Kosloff, 1988, Wave propagation simulation in a linear viscoacoustic
medium: Geophysical Journal International, 93, 393–401, https://doi.org/10.1111/j.1365-
246x.1988.tb02010.x.
Emmerich, H., and M. Korn, 1987, Incorporation of attenuation into time-domain computation
: Geophysics, 52, 1252–1264, https://doi.org/10.1190/1.1442386.
Haller, N., R. Flateboe, C. Twallin, V. Dahl-Eriksen, P. Heavey, E. Kjos, R. Milne, and W. Rietveld,
2016, Valhall case study — value of seismic technology for reducing risks in a reactive
overburden: Presented at the 78th EAGE Annual Meeting, https://doi.org/10.3997/2214-
4609.201600817.
Jannane, M., W. Beydoun, E. Crase, D. Cao, Z. Koren, E. Landa, M. Mendes, A. Pica, M. Noble, G.
Roeth, S. Singh, R. Snieder, A. Tarantola, and D. Trezeguet, 1989, Wavelengths of Earth
structures that can be resolved from seismic reflection data: Geophysics, 54, 906–910,
https://doi.org/10.1190/1.1442719.
Kurzmann, A., A. Przebindowska, D. Kohn, and T. Bohlen, 2013, Acoustic full waveform tomography in
the presence of attenuation: a sensitivity analysis: Geophysical Journal International, 195, 985–
1000, https://doi.org/10.1093/gji/ggt305.
Luo, Y., Y. Ma, Y. Wu, H. Liu, and L. Cao, 2016, Full-traveltime inversion: Geophysics, 81, R261–
R274, https://doi.org/10.1190/geo2015-0353.1.
Metivier, L., R. Brossier, Q. Merigot, E. Oudet, and J. Virieux, 2016, Measuring the misfit between
Geophysical Journal International, 205, 345–377, https://doi.org/10.1093/gji/ggw014.
Moczo, P., and J. Kristek, 2005, On the rheological models used for time-domain methods of seismic
wave propagation: Geophysical Research Letters, 32, https://doi.org/10.1029/2004gl021598.
Operto, S., R. Brossier, Y. Gholami, L. Metivier, V. Prieux, A. Ribodetti, and J. Virieux, 2013, A guided
tour of multiparameter full waveform inversion for multicomponent data: from theory to practice:
The Leading Edge, 32, 1040–1054, https://doi.org/10.1190/tle32091040.1.
Operto, S., A. Miniussi, R. Brossier, L. Combe, L. Metivier, V. Monteiller, A. Ribodetti, and J. Virieux,
2015, Efficient 3-D frequency-domain mono-parameter full-waveform inversion of ocean-bottom
cable data: application to Valhall in the visco-acoustic vertical transverse isotropic
approximation: Geophysical Journal International, 202, 1362–1391,
https://doi.org/10.1093/gji/ggv226.
Plessix, R., 2016, Visco-acoustic full waveform inversion: Presented at the 78th EAGE Annual Meeting,
https://doi.org/10.3997/2214-4609.201600827.
Prieux, V., R. Brossier, Y. Gholami, S. Operto, J. Virieux, O. Barkved, and J. Kommedal, 2011, On the
footprint of anisotropy on isotropic full waveform inversion: the Valhall case study: Geophysical
Journal International, 187, 1495–1515, https://doi.org/10.1111/j.1365-246x.2011.05209.x.
Robertsson, J., J. Blanch, and W. Symes, 1994, Viscoelastic finite-difference modeling: Geophysics, 59,
1444–1456, https://doi.org/10.1190/1.1443701.
inversion: the next leap forward in imaging at Valhall: First Break, 28, 65–70.
Geophysics, 74, WCC1-WCC26.
Wang, S., F. Chen, H. Zhang, and Y. Shen, 2013, Reflection-based full waveform inversion (RFWI) in
the frequency domain: SEG Technical Program Expanded Abstracts, SEG, 877–881,
Warner, M., and L. Guasch, 2014, Adaptative waveform inversion — FWI without cycle skipping —
theory: 76th EAGE Conference and Exhibition 2014, We E106 13.
Xu, S., D. Wang, F. Chen, G. Lambare, and Y. Zhang, 2012, Inversion on reflected seismic wave: SEG
Technical Program Expanded Abstracts 2012, 1–7, https://doi.org/10.1190/segam2012-1473.1.
Yang, P., R. Brossier, L. Metivier, and J. Virieux, 2016, A review on the systematic formulation of 3D
multiparameter full waveform inversion in viscoelastic medium: Geophysical Journal
International, 207, 129–149, https://doi.org/10.1093/gji/ggw262.
Zhou, W., R. Brossier, S. Operto, and J. Virieux, 2015, Full waveform inversion of diving and reflected
Geophysical Journal International, 202, 1535–1554, https://doi.org/10.1093/gji/ggv228.
Zhou, W., R. Brossier, S. Operto, and J. Virieux, 2016, Joint full waveform inversion of early arrivals and
reflections: A real obc case study with gas cloud: SEG Technical Program Expanded Abstracts
2016, SEG, 1247–1251, https://doi.org/10.1190/segam2016-13954099.1.

Multiscale time-domain time-lapse FWI with a model-difference regularization
Musa Maharramov∗ , Ganglin Chen, Partha S. Routh, Anatoly I. Baumstein, Sunwoong Lee, and Spyros K. Lazaratos
ExxonMobil Upstream Research Company
SUMMARY

We present a multiscale time-lapse full-waveform inversion (4D FWI) technique based on a
cascaded time-domain simultaneous inversion of multiple surveys with a model-difference
regularization. In our cascaded approach, different model scales are recovered using different
objective functions and regularization penalties. We apply our method to a synthetic example,
and demonstrate a robust recovery of production-induced velocity changes in the presence of
repeatability issues and errors in the amplitude information.

INTRODUCTION

Most of the existing time-lapse seismic inversion techniques rely on extracting time shifts
and amplitude differences either directly from baseline and monitor surveys or from migrated
baseline and monitor image gathers, and converting them into subsurface model changes
(Johnston, 2013). This approach is the mainstay of prevalent industry time-lapse practices,
and requires a significant amount of survey cross-equalization and manual intervention. Our
objective is to automate time-lapse analysis, reducing the amount of interpretation and
quality control associated with a typical time-lapse survey. Some existing model and
image-space techniques lend themselves to automation but have their own constraints. For
example, techniques based on time-lapse wave-equation velocity analysis (WEMVA) (Shragge and
Lumley, 2013) inherit resolution limitations of conventional WEMVA, while those based on
image-space tomography (Maharramov and Albertin, 2007; Girard and Vasconcelos, 2010) or mixed
data-image space methods (Qu and Verschuur, 2016) still require data cross-equalization.

4D FWI, on the other hand, avoids extraction of time-lapse information directly from data or
image differences, takes advantage of the high-resolution power of FWI, and can use
wide-offset seismic acquisitions in an automated inversion of production-induced model
differences (Routh et al., 2012; Zheng et al., 2011; Asnaashari et al., 2012; Raknes et al.,
2013; Maharramov and Biondi, 2014; Yang et al., 2014; Maharramov et al., 2015; Willemsen and
Malcolm, 2015; Alemie and Sacchi, 2016; Maharramov et al., 2016). Time-lapse FWI (with the
exception of double-difference FWI), as a model-space technique, can be less sensitive to
survey repeatability issues because direct manipulation of the baseline and monitor data, such
as extraction of time displacements, is avoided. In practice, however, differences in survey
acquisition parameters and coverage result in inversion artifacts that propagate into the
inverted model difference (Asnaashari et al., 2012). Simultaneous inversion of multiple survey
vintages in a linearized-waveform inversion (Ayeni, 2011; Ayeni and Biondi, 2012) or
full-waveform inversion (Maharramov and Biondi, 2014) was proposed to mitigate this
sensitivity, while model-difference regularization (Maharramov and Biondi, 2014) is
specifically intended for penalizing artifacts in the model difference. A total-variation (TV)
model-difference regularization is useful for the recovery of “blocky” model changes in a
tomographic 4D FWI (Maharramov et al., 2015, 2016), while an L1-based model-difference
regularization helps to recover isolated “spiky” anomalies, and can be applied in a cascaded
fashion after the inversion with a TV regularization (Maharramov and Biondi, 2017).

In this work we extend the frequency-domain 4D FWI technique of Maharramov et al. (2016) to
time domain, and propose a cascaded inversion work flow using alternative objective functions
and regularization penalties.

THEORY

We simultaneously invert baseline and monitor models $m_1$ and $m_2$ by solving the following
optimization problem:

$$\alpha \left\| \frac{u_1}{\|u_1\|} - \frac{d_1}{\|d_1\|} \right\|^2 + \beta \left\| \frac{u_2}{\|u_2\|} - \frac{d_2}{\|d_2\|} \right\|^2 + \tag{1}$$
$$\delta \left\| R\,(m_2 - m_1 - \Delta m_0) \right\|_1, \tag{2}$$

where $d_1$ and $d_2$ denote the observed baseline and monitor data, $u_1$ and $u_2$ are the
predicted baseline and monitor data, $\alpha$, $\beta$ and $\delta$ are misfit and
regularization weights, $\Delta m_0$ is either a model difference inverted at an earlier stage
of the inversion, or a model-difference prior, and $R$ can be either the identity operator or
the spatial gradient operator, $R = \nabla$. The data misfit terms in Equation 1 are computed
for each trace, then summed up for all source and receiver pairs. We have chosen the
normalized L2 misfit function (Routh et al., 2011) because of its reduced sensitivity to
amplitude errors in the data or forward modeling. At the first stage of our cascaded inversion
we recover “blocky” velocity differences by minimizing the objective function in Equations 1
and 2 with $R = \nabla$ and $\Delta m_0 = 0$, effectively conducting a simultaneous 4D FWI
with a TV regularization of the model difference (Maharramov et al., 2016). At the second
stage, we recover “spiky” velocity anomalies by first setting $\Delta m_0$ to the
model-difference result of the first inversion, then minimizing the objective function in
Equations 1 and 2 with the operator $R$ equal to the identity operator. In some situations our
approximation to a “tomographic” inversion using the normalized L2 may not be sufficiently
sensitive to spiky anomalies in the model difference, as is the case with thin reservoirs and
coarse computational grids. In such applications, the second stage of the inversion may use
the standard L2 data misfit function,

$$\alpha \|u_1 - d_1\|^2 + \beta \|u_2 - d_2\|^2 + \tag{3}$$
$$\delta \|m_2 - m_1 - \Delta m_0\|_1, \tag{4}$$

thus making the inversion more sensitive to amplitude effects (Maharramov et al., 2016).
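As a rough illustration of the objective in Equations 1 and 2, the sketch below evaluates the normalized L2 data misfit and the model-difference penalty with NumPy. It is a minimal sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, traces are assumed to be stored as rows of a 2D array, the gradient operator is approximated by unit-spacing forward differences with an anisotropic L1 sum, and the small `eps` guarding against empty traces is an addition.

```python
import numpy as np


def normalized_l2_misfit(u, d, eps=1e-12):
    """Sum over traces (rows) of || u/||u|| - d/||d|| ||^2 (hypothetical helper)."""
    u_hat = u / (np.linalg.norm(u, axis=1, keepdims=True) + eps)
    d_hat = d / (np.linalg.norm(d, axis=1, keepdims=True) + eps)
    return float(np.sum((u_hat - d_hat) ** 2))


def model_difference_penalty(m1, m2, dm0=None, use_gradient=True):
    """L1 norm of R(m2 - m1 - dm0) for a 2D model.

    use_gradient=True  -> R is a forward-difference gradient (TV-like penalty).
    use_gradient=False -> R is the identity (plain L1 penalty).
    Unit grid spacing and the anisotropic sum are assumptions of this sketch.
    """
    dm = m2 - m1 - (0.0 if dm0 is None else dm0)
    if not use_gradient:
        return float(np.sum(np.abs(dm)))
    return float(np.sum(np.abs(np.diff(dm, axis=0)))
                 + np.sum(np.abs(np.diff(dm, axis=1))))


def objective(u1, d1, u2, d2, m1, m2, alpha=1.0, beta=1.0, delta=0.1,
              dm0=None, use_gradient=True):
    """Weighted sum of both data misfits and the model-difference penalty."""
    return (alpha * normalized_l2_misfit(u1, d1)
            + beta * normalized_l2_misfit(u2, d2)
            + delta * model_difference_penalty(m1, m2, dm0, use_gradient))


if __name__ == "__main__":
    d = np.array([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]])
    m1 = np.zeros((6, 6))
    m2 = m1.copy()
    m2[2:4, 2:4] = 1.0  # a small "blocky" change
    # Predicted data that differ from the observed data only by a scale factor.
    print(objective(3.0 * d, d, 0.5 * d, d, m1, m2))
```

Scaling the predicted traces by a constant leaves the normalized misfit essentially unchanged, which is the reduced amplitude sensitivity that motivates this choice of misfit.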
EXAMPLES

We generated synthetic data using acoustic modeling with density. The true baseline velocity
model used in our experiments is shown in Figure 1, the difference between the monitor and
baseline is shown in the zoomed-in Figure 2. The true density model and model difference are
obtained from the true velocity model and model difference by dividing them by 1500 (setting
water density to 1).

The true velocity difference of Figure 2 was designed to imitate three large 30 m-thick
reservoir compartments, and three smaller compartments located up-dip from a partially
permeable fault. Velocity changes of −300, −200 and 150 m/s are prescribed in the large
compartments to model the effect of gas coming out of the solution (Johnston, 2013) and water
substitution (in the lowest compartment). The three small compartments have only negative
velocity changes of −200, −100, and −50 m/s to model the effect of gas migrating up-dip
through the partially conductive fault.

Figure 1: The true baseline velocity model. The true density model was obtained from the
velocity by dividing it by 1500.

Figure 2: The true velocity change. The true density change was obtained from the velocity
change by dividing it by 1500.

Both forward modeling and inversion are performed on a 700 (horizontal) by 600 (vertical)
computational grid with a 10 m horizontal and 5 m vertical spacing. A Ricker wavelet centered
at 10 Hz is used as a source, and absorbing boundary conditions are applied at the surface to
avoid surface-related multiples. Two different streamer acquisition geometries are used for
the baseline and monitor surveys with 39 shots and a 260 m shot spacing, with offsets ranging
from 10 m to 7 km. The two surveys are shifted by 100 meters with respect to each other. Our
starting velocity model was obtained from the true baseline model using a 400 m smoothing
filter. We use frequency continuation from 10 to 30 Hz, with the maximum frequency for each
experiment determined using the method of Sirgue and Pratt (2004). In each experiment we
conduct a broadband (from 0 to the maximum frequency) time-domain full-waveform inversion to
convergence, using the objective functions of Equations 1 and 2 at the first stage of the
inversion, and Equations 3 and 4 at the second stage. We intentionally avoid density inversion
in order to demonstrate the effect of amplitude errors on our 4D FWI.

First, we conduct a parallel-difference FWI (Asnaashari et al., 2012) after adding random
Gaussian noise to the data. The signal-to-noise ratio (SNR) peaked at about 10 dB, but
deteriorated to 1 dB below 5 and above 24 Hz. Figure 3 shows the result of the
parallel-difference FWI. The inversion produced oscillatory artifacts and quantitative errors
that are evident both in Figure 3 and the well logs of Figures 5 and 6. The result of the
first stage of our simultaneous regularized inversion is shown in Figure 4, and is in a good
quantitative agreement with the true model difference (see Figures 5 and 6).

Figure 3: Velocity difference inverted using parallel-difference time-lapse FWI.

Figure 4: Velocity difference inverted at the first stage of the inversion using the
normalized L2 data misfit and a model-difference TV regularization.

However, the blocky inverted model of Figure 4 has an obvious flaw: the TV-regularized
inversion on our 5 m grid has removed the separation between the top two large compartments
(see Figure 5) and the separation between any of the small compartments (see Figure 6).
Although the effects on the travel times of the two ≈ 15 m-thick reservoir separation layers
are small and ignored by the blockiness-promoting TV regularized inversion, the amplitude
effects of the velocity model difference at the reservoir interfaces may be significant, and
can be fitted by sparse and “spiky” diffractors at the reservoir boundaries.

Therefore, at the second stage of our cascaded inversion we minimize the standard L2 misfit
function with a sparsity-promoting L1 model-difference regularization in Equations 3 and 4,
setting $\Delta m_0$ equal to the model difference of Figure 4. The results of the cascaded
inversion are shown in Figures 7 and 8. We now recover both separators for the large
compartments and one separator for the small compartments without creating any additional
oscillatory artifacts in comparison with the first stage of the inversion. The weakest and
deepest of the negative velocity anomalies in the up-dip compartments is not well resolved by
any inversion, apparently due to limited resolution. The model-difference amplitudes at
reservoir boundaries are over-predicted in the second inversion because these now account for
the effects of both velocity and density change. With the true density change chosen
proportional to the true velocity change as described earlier, the neglected density has the
effect of boosting the reflectivity, resulting in a leakage into the inverted velocity
difference when the L2 data misfit function of Equation 3 is used.

Figure 5: Model-difference logs at a 2.7 km inline coordinate showing the true (red),
parallel-difference time-lapse FWI (black), and TV-regularized simultaneous time-lapse FWI
(green) model differences.
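The examples above add random Gaussian noise to the synthetic data at a stated signal-to-noise ratio. The sketch below shows one common way to scale white Gaussian noise to a target SNR in decibels. It is an illustration only: the function names are hypothetical, and the noise here is white at a single SNR, whereas the text describes a frequency-dependent SNR (about 10 dB at its peak, about 1 dB below 5 Hz and above 24 Hz) whose construction is not given.

```python
import numpy as np


def add_noise_at_snr(data, snr_db, rng=None):
    """Return (noisy_data, noise) with white Gaussian noise at a target SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(data ** 2)
    # SNR_dB = 10 log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=data.shape)
    return data + noise, noise


def measured_snr_db(signal, noise):
    """Empirical SNR in dB from a clean signal and the noise that was added to it."""
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 100_000)
    clean = np.sin(2.0 * np.pi * 10.0 * t)  # stand-in for a 10 Hz trace
    noisy, noise = add_noise_at_snr(clean, snr_db=10.0, rng=np.random.default_rng(0))
    print(f"measured SNR: {measured_snr_db(clean, noise):.2f} dB")
```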
CONCLUSIONS

The proposed cascaded multiscale 4D FWI method harnesses the power of various misfit
functionals and regularization penalties to resolve subsurface model changes at various
scales. A total-variation model-difference regularization helps to reduce oscillatory
artifacts, but it may also penalize fine features of interest, especially when applied over
coarse grids. An L1 regularization, on the other hand, may penalize blocky features, leaving
out important effects within the reservoir and overburden. However, a cascaded application of
4D FWI in combination with the TV seminorm and L1 norm can separate travel-time and amplitude
effects, and provide an imaging tool that complements the existing image-difference
techniques.

Figure 6: Model-difference logs at a 2 km inline coordinate showing the true (red),
parallel-difference time-lapse FWI (black), and TV-regularized simultaneous time-lapse FWI
(green) model differences.

Figure 7: Model-difference logs at a 2.7 km inline coordinate showing the true (red),
parallel-difference time-lapse FWI (black), TV-regularized simultaneous time-lapse FWI
(green), and L1-regularized cascaded time-lapse FWI (blue) model differences.

Figure 8: Model-difference logs at a 2 km inline coordinate showing the true (red),
parallel-difference time-lapse FWI (black), TV-regularized simultaneous time-lapse FWI
(green), and L1-regularized cascaded time-lapse FWI (blue) model differences.

ACKNOWLEDGMENTS

The authors would like to thank David Johnston, Biondo Biondi, Tom Dickens, Xinyou Lu, Gboyega
Ayeni, Grant Gist, and Eric Wildermuth for a number of useful discussions, and ExxonMobil
Upstream Research Company for the permission to publish this work.
REFERENCES
Alemie, W., and M. Sacchi, 2016, Joint reparametrized time-lapse full-waveform inversion: 86th Annual
Asnaashari, A., R. Brossier, S. Garambois, F. Audebert, P. Thore, and J. Virieux, 2012, Time-lapse
imaging using regularized FWI: A robustness study: 82nd Annual International Meeting, SEG,
Expanded Abstracts, 1–5, http://dx.doi.org/10.1190/segam2012-0699.1.
Ayeni, G., 2011, Time-lapse seismic imaging by linearized joint inversion: Ph.D. thesis, Stanford
University.
Ayeni, G., and B. Biondi, 2012, Time-lapse seismic imaging by linearized joint inversion — A Valhall
Field case study: 82nd Annual International Meeting, SEG, Expanded Abstracts, 1–6,
Girard, A., and I. Vasconcelos, 2010, Image-domain time-lapse inversion with extended images: 80th
http://dx.doi.org/10.1190/1.3513744.
Johnston, D., 2013, Practical applications of time-lapse seismic data: SEG.
Maharramov, M., and U. Albertin, 2007, Localized image-difference wave-equation tomography: 77th
http://dx.doi.org/10.1190/1.2793096.
Maharramov, M., and B. Biondi, 2014, Joint full-waveform inversion of time-lapse seismic data sets:
Maharramov, M., and B. L. Biondi, 2017, Full waveform inversion for reservoir monitoring — Pushing
the limits of subsurface resolution: 79th Annual International Conference and Exhibition, EAGE,
Extended Abstracts, We PRM 16, http://dx.doi.org/10.3997/2214-4609.201700025.
Maharramov, M., B. L. Biondi, and M. A. Meadows, 2016, Time-lapse inverse theory with applications:
Maharramov, M., B. Biondi, and S. Ronen, 2015, Robust simultaneous time-lapse full-waveform
inversion with total-variation regularization of model difference: 77th Annual International
Conference and Exhibition, EAGE, Extended Abstracts, We P3 09,
http://dx.doi.org/10.3997/2214-4609.201413085.
Qu, S., and D. Verschuur, 2016, Getting accurate time-lapse information using geology-constrained
simultaneous joint migration-inversion: 86th Annual International Meeting, SEG, Expanded
Raknes, E., W. Weibull, and B. Arntsen, 2013, Time-lapse full waveform inversion: Synthetic and real
data examples: 83rd Annual International Meeting, SEG, Expanded Abstracts, 944–948,
Routh, P. S., J. R. Krebs, S. Lazaratos, A. I. Baumstein, I. Chikichev, N. Downey, D. Hinkley, and J. E.
Anderson, 2011, Full-wavefield inversion of marine streamer data with the encoded simultaneous
source method: 73rd Annual International Conference and Exhibition, EAGE, Extended
Abstracts, F032, http://dx.doi.org/10.3997/2214-4609.20149730.
Routh, P., G. Palacharla, I. Chikichev, and S. Lazaratos, 2012, Full wavefield inversion of time-lapse data
for improved imaging and reservoir characterization: 82nd Annual International Meeting, SEG,
Expanded Abstracts, 1–6, http://dx.doi.org/10.1190/segam2012-1043.1.
Shragge, J., and D. Lumley, 2013, Time-lapse wave-equation migration velocity analysis: ASEG
Extended Abstracts 2012: 22nd Geophysical Conference, 1–5,
http://dx.doi.org/10.1071/ASEG2012ab197.
Sirgue, L., and R. Pratt, 2004, Efficient waveform inversion and imaging: A strategy for selecting
temporal frequencies: Geophysics, 69, 231–248, http://dx.doi.org/10.1190/1.1649391.
Willemsen, B., and A. Malcolm, 2015, Regularizing velocity differences in time-lapse FWI using
gradient mismatch information: 85th Annual International Meeting, SEG, Expanded Abstracts,
5384–5388, http://dx.doi.org/10.1190/segam2015-5908610.1.
Yang, D., A. E. Malcolm, and M. C. Fehler, 2014, Time-lapse full waveform inversion and uncertainty
analysis with different survey geometries: 76th Annual International Conference and
Exhibition, EAGE, Extended Abstracts, We ELI1 10, http://dx.doi.org/10.3997/2214-
4609.20141120.
Zheng, Y., P. Barton, and S. Singh, 2011, Strategies for elastic full waveform inversion of timelapse
ocean bottom cable (OBC) seismic data: 81st Annual International Meeting, SEG, Expanded

A statistical comparison of three 4D Full-Waveform Inversion schemes
Maria Kotsi* and Alison Malcolm, Memorial University of Newfoundland
SUMMARY

Multiple seismic data sets are often recorded to monitor changes in Earth properties. To image
these changes, several different 4D full waveform inversion (FWI) schemes have been
successfully applied over the past decade. We compare three different 4D FWI schemes on two
simple numerical examples to quantify how each method performs. To do this, we create
correlated Gaussian noise realizations and add them to our models to determine how errors in
the models are translated to errors in the final images. We compute spatial characteristics of
the recovered models and compare the performance of the different 4D FWI schemes. Our results
indicate that, while there are minor differences between the different proposed methods, all
perform reasonably well for this type of noise in these simple models. The methods that
specifically target 4D changes do result in fewer artifacts outside the region of true change,
but all methods recover the true change with similar accuracy.

INTRODUCTION

During hydrocarbon production, changes occur in the reservoir geometry and pore fluid
properties. Geophysical monitoring of these changes allows for the estimation of the
extraction efficiency and determination of the remaining reserves (Orange et al., 2009).
Time-lapse (4D) seismic is the most commonly used technique for geophysical monitoring. In 4D
seismic, differences between multiple surveys at the same site reveal changes in the
reservoir. The first survey acquired is called the baseline survey and all subsequent surveys
are called monitor surveys (Lumley, 2001). Results from studies using Full Waveform Inversion
(FWI) to recover 4D changes have been encouraging thus far (Asnaashari et al., 2015). Like all
inversion methods, the objective of FWI is to deliver a velocity model of the subsurface by
iteratively matching predicted and observed seismic data (Tarantola, 1984; Virieux et al.,
2009). FWI can be extended to the time-lapse case successfully; however, artifacts may arise
due to the nonlinearity of the inverse problem and the non-repeatability of the surveys. To
overcome this challenge, different FWI approaches have been developed and used (Watanabe et
al., 2005; Zheng et al., 2011; Yang et al., 2013; Maharramov and Biondi, 2014). Parallel,
Double Difference, and Alternating FWI are used in this study and they will be explained in
greater detail in the following section.

Since 4D monitoring involves looking for small changes in localized regions, understanding the
uncertainty in the measurement of those changes is key. For this study we use two different
numerical examples: one of two horizontal reflectors and a five layer model. We then try to
understand the uncertainty in the recovered changes by comparing the performances of the three
methods. To do this, we introduce coherent noise and observe how errors in the models
propagate to the final image.

Although we consider simple models here, we are aiming at a more comprehensive understanding
of uncertainty. Characterizing the uncertainty pixel-by-pixel in a large model is not
computationally feasible and it is not clear that doing so would help in the interpretation
due to the volume of information generated. We focus instead on the idea of characterizing the
uncertainty of key elements of the image. As a first step towards this goal, we estimate some
parameters of the image which we refer to as spatial characteristics, and we explore how they
are recovered by different FWI methods.

THEORY

The most commonly used objective function in FWI is a least squares measure:

$$E(m) = \frac{1}{2}\,\|F(m) - d\|^2, \tag{1}$$

where $m$ is the model, usually $1/c^2$ where $c$ is the velocity, $d$ is the observed data
and $F$ is the forward modeling operator. The extension of FWI to the 4D case can be
straightforward or complicated in a number of ways. In this section, we describe the methods
compared in this paper.

The first method we explore is Parallel FWI. In this method, FWI is performed on the baseline
and monitor data sets separately, and the difference of the two velocity models is computed to
illuminate time-lapse changes. Parallel FWI is simple to apply; however, it has some
challenges. As is well known, the FWI problem is nonlinear, resulting in the possibility that
the minimization gets stuck in a wrong local minimum. With multiple surveys over time that
cannot be perfectly repeated, this introduces some artifacts that can lead to mistaken signals
in the time-lapse change image. In order to overcome those challenges, different schemes have
been developed.

The second method we compare is Double difference FWI (DDFWI) (Waldhauser and Ellsworth, 2000;
Denli and Huang, 2009), which is a method that uses the differences in the predicted and
observed data between the baseline and monitor models to deliver a time-lapse change image of
the subsurface. First the baseline data are inverted and the resulting model is used as the
initial baseline model to start the joint inversion. Instead of minimizing the usual objective
function given in equation (1), DDFWI minimizes:

$$E(m) = \frac{1}{2}\,\left\|\left(F(m_{0\mathrm{inv}}) - F(m_1)\right) - (d_0 - d_1)\right\|^2, \tag{2}$$

where $F(m_{0\mathrm{inv}})$ and $F(m_1)$ are the modeled data for the baseline and monitor
models, and $d_0$ and $d_1$ are the observed data for the respective models. A downside of
DDFWI is that the two data sets must be carefully repeated to allow for the data subtraction.
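The difference between the least-squares objective of equation (1) and the double-difference objective of equation (2) can be made concrete with a toy problem. The sketch below is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the forward operator is a small linear matrix `G` standing in for wave-equation modeling. An unmodeled term `c` that is common to both surveys cancels in the data subtraction, so the double-difference misfit vanishes at the true monitor model while the ordinary misfit does not.

```python
import numpy as np


def least_squares_misfit(forward, m, d):
    """Equation (1): E(m) = 0.5 * ||F(m) - d||^2."""
    r = np.ravel(forward(m) - d)
    return 0.5 * float(np.dot(r, r))


def double_difference_misfit(forward, m0_inv, m1, d0, d1):
    """Equation (2): E = 0.5 * ||(F(m0_inv) - F(m1)) - (d0 - d1)||^2."""
    r = np.ravel((forward(m0_inv) - forward(m1)) - (d0 - d1))
    return 0.5 * float(np.dot(r, r))


if __name__ == "__main__":
    # Toy linear "forward operator" (assumption; real FWI solves a wave equation).
    G = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, 1.0]])
    forward = lambda m: G @ m

    m0_true = np.array([1.0, 2.0])   # baseline model
    m1_true = np.array([1.5, 1.0])   # monitor model
    c = np.array([0.3, -0.2, 0.1])   # unmodeled signal shared by both surveys
    d0 = forward(m0_true) + c
    d1 = forward(m1_true) + c

    print("ordinary misfit at true monitor model:",
          least_squares_misfit(forward, m1_true, d1))
    print("double-difference misfit at true monitor model:",
          double_difference_misfit(forward, m0_true, m1_true, d0, d1))
```

If the shared term differed between the two surveys, it would no longer cancel, which is the repeatability requirement noted above.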

Statistical comparison of 4D FWI
Figure 1: Horizontal reflectors example: true time-lapse change together with the time-lapse changes recovered by all three FWI schemes for the same model realization.

The third method we study, Alternating FWI (AFWI), attempts both to mitigate the reliance on perfectly repeatable surveys and to give us a measure of uncertainty. To do this, a set of weights, β, is calculated from the differences in how the baseline and monitor models converge (Yang et al., 2014; Kotsi and Malcolm, 2017). This set of weights β can be thought of as a confidence map of changes that highlights areas that have the highest probability of change. β is then used as a regularisation parameter to constrain the final joint inversion for the change in the material properties. The objective function to be minimized in this case is:

$$E(m_0, m_1) = \frac{1}{2}\left\| F(m_0) - d_0 \right\|^2 + \frac{1}{2}\left\| F(m_1) - d_1 \right\|^2 + \frac{1}{2}\left\| \frac{m_0 - m_1}{\beta} \right\|^2, \qquad (3)$$

where $m_0$ and $d_0$ are the baseline model and data, $m_1$ and $d_1$ are the monitor model and data, and β is calculated via:

$$\beta = \sum_i \left(1 - \mathrm{sgn}\left[(m_{i-1} - m_i)(m_{i+1} - m_i)\right]\right) \left| m_{i+1} - m_i \right|. \qquad (4)$$
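A minimal sketch of the weights in equation (4) follows. It assumes, since the text does not spell this out, that the index i runs over a stored sequence of model estimates and that the sum is evaluated independently at each grid point; the array `history` and its layout are hypothetical.

```python
import numpy as np

def confidence_weights(history):
    """Equation (4) evaluated per grid point.

    history: array of shape (n_estimates, n_points), one model estimate per row.
    """
    m = np.asarray(history, dtype=float)
    prev_step = m[:-2] - m[1:-1]    # m_{i-1} - m_i
    next_step = m[2:] - m[1:-1]     # m_{i+1} - m_i
    # sgn = +1 when m_i is a local extremum of the sequence (oscillation): term is 0.
    # sgn = -1 when the sequence moves steadily in one direction: term is 2|step|.
    terms = (1.0 - np.sign(prev_step * next_step)) * np.abs(next_step)
    return terms.sum(axis=0)

# Two grid points: the first changes steadily, the second oscillates.
history = np.array([[0.0, 0.0],
                    [1.0, 1.0],
                    [2.0, 0.5],
                    [3.0, 1.0]])
beta = confidence_weights(history)
print(beta)
```

Under this reading, points whose estimates drift consistently accumulate a large β, while points that merely oscillate accumulate none, which matches the description of β as a confidence map of change.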
In the next section, we describe the two numerical examples and the spatial characteristics calculation for the respective examples. We then compare the recovered spatial characteristics for the three methods on two simple models.

NUMERICAL EXAMPLES

2D Horizontal Reflectors

For the first example we use a simple model with a homogeneous background and two horizontal reflectors. All of our calculations are done with the PySit package (Hewett and Demanet, 2013). To introduce a change between baseline and monitor models we shifted the position of the top reflector (Kotsi and Malcolm, 2017). We generated one hundred different realizations of Gaussian-distributed correlated random noise, with a correlation length that is bigger in the x-direction than in the z-direction. We then scale these random distributions so that they have the same average velocity as the true model and so that we are adding a zero-mean random field to the squared slowness. Specifically, we take the normalized perturbations (normPert) and scale them via:

$$\mathrm{velRan} = \frac{\mathrm{velAve}}{\sqrt{1 + \mathrm{pertAmpl} \cdot \mathrm{normPert}}}, \qquad (5)$$

where velRan is the final random noise velocity model, velAve is the average velocity used, and pertAmpl is the amplitude of the perturbations.

We then add these random perturbations to the true baseline and monitor velocity models and generate noisy data sets (note that we use different noise realizations for the baseline and monitor data sets). For each of the 100 noise realizations we apply each of the three FWI methods described above to recover the time-lapse changes (Figure 1). We then apply an image stacking procedure as follows. For each horizontal position, we compute the distance Δz between the recovered reflectors by taking the maximum value of the image in a window. We then average the recovered distances over the entire image to obtain an average Δz for a particular image. Figure 2 shows the histograms of all the recovered Δz for each of the FWI schemes together with their calculated standard deviations.

Figure 2: Left: Histograms of the Δz of the recovered time-lapse changes from the three FWI schemes. Right: the corresponding standard deviations.

The recovered Δz from all three histograms are approximately normally distributed, with most of the results concentrated around Δz = 16, which is the true value. Even though the deviations in Figure 2 are sometimes large, none of them are larger than 1 m from the true Δz. Defining an accurate picking window is important: in this particular case a larger window allows more artifacts to be included in the calculation of Δz, leading to less reliable results with higher standard deviations.

Layered Model

We use a five layer model to incrementally increase the complexity of the model. The thickness of the middle layer is increased from baseline to monitor, introducing a velocity perturbation of magnitude 0.5 km/s and a change in thickness of half the layer thickness. The initial model for the inversion was created by applying a Gaussian smoothing filter with a σ of 10 to the true baseline model (Figure 3). To see how random noise affects this model, we consider two cases. In case 1 we calculate the perturbed velocity model as in equation (5) by applying zero-mean perturbations around the squared slowness. In case 2 we add the normalized correlated noise directly to the velocity instead of the squared slowness.

Figure 3: The true five layer baseline model on the left and the initial guess used for the inversion on the right.

We create fifty noise realizations and add them to the true models. Once again, we apply all three 4D FWI approaches to estimate the time-lapse changes (Figure 4). Due to the different nature of the time-lapse change in this experiment, the spatial characteristic we are interested in is the area of the recovered changes. To compare the different methods, we compute the area of the changed region. To do this, we define a target velocity change (we used ±30% of the true change) and compute the number of pixels within a depth dz that are within our velocity range, giving us an estimate of the area of the estimated change. Note that this calculation does not consider any lateral discontinuities that might be present in the recovery of the layer. Addressing this is a topic of current work.

Figure 4: Five layer example: true time-lapse change together with the time-lapse changes recovered by all three FWI schemes for the same model realization with case 1 noise.

The true change has an area of 630 m². For comparison we performed the 4D FWI schemes on the noise free case. The resulting area for parallel FWI was 376 m², for DDFWI 416 m², and for AFWI 370 m². To be able to compare the noisy and noise free cases, instead of plotting the histograms of areas we plot the deviations from the areas in the noise free case; in other words, we normalize the results of each method by the recovered area in the noise free case for that method. We calculate the area both within the depth range dz and throughout the entire model. The area within dz gives us a measure of how well each method recovers the true change, and the calculated area within the entire domain gives us a measure of how many artifacts are introduced into the recovered change image. Figure 5 shows the histograms of the area deviations for the case 1 noise, within the depth range dz on the left and over the total depth on the right. Within dz all three methods perform well, with the peaks of the histograms being near 1 as expected. Parallel FWI has the flattest distribution, indicating that it is the method with the least precision in recovering the final model. Of course extremes are also present in the other two cases, but they are fewer and the distributions are thus a bit sharper around the optimal recovered area. When we look at the whole depth, we let all the potential artifacts be included in our calculation. An overestimation is therefore expected. DDFWI and AFWI perform better than parallel FWI, meaning they are more effective at suppressing random noise and artifacts.

For the case 2 noise we do not scale the normalized correlated noise and we are thus adding higher amplitude noise, which translates into a higher noise level in our final images. Therefore, we extend the velocity range we consider to be a correct recovery by a factor of 2. Figure 6 shows the resulting histograms of the area deviations. All of the histograms are broader, indicating that the accuracy of all of the methods is diminished. This is to be expected, as adding noise to the velocity instead of the squared slowness introduces a more complicated error because the wave equation is linear in squared slowness but non-linear in velocity. Both AFWI and DDFWI clearly outperform parallel FWI in this case, as evidenced by their significantly narrower distributions with fewer outliers, particularly when comparing areas within the region of interest.
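The noise scaling of equation (5) and the pixel-counting area measure can be sketched as follows. This is a sketch under assumptions rather than the authors' code: the perturbation field here is uncorrelated white noise used only as a stand-in, and the names `recovered_area`, `cell_area` and `rows` are hypothetical.

```python
import numpy as np

def random_velocity(vel_ave, pert_ampl, norm_pert):
    """Equation (5): velRan = velAve / sqrt(1 + pertAmpl * normPert)."""
    return vel_ave / np.sqrt(1.0 + pert_ampl * norm_pert)

def recovered_area(change, true_change, tol=0.3, cell_area=1.0, rows=None):
    """Count cells whose recovered change lies within +/- tol of the true change.

    rows: optional slice selecting the depth window dz (first array axis).
    """
    lo, hi = sorted(((1.0 - tol) * true_change, (1.0 + tol) * true_change))
    window = change if rows is None else change[rows]
    hits = (window >= lo) & (window <= hi)
    return hits.sum() * cell_area

# Stand-in perturbation field: zero mean, normalized to a maximum magnitude of 1.
rng = np.random.default_rng(0)
norm_pert = rng.standard_normal((20, 30))
norm_pert -= norm_pert.mean()
norm_pert /= np.abs(norm_pert).max()
vel_ran = random_velocity(2000.0, 0.05, norm_pert)

# Toy recovered change image: a two-row layer plus one artifact outside dz.
change = np.zeros((6, 4))
change[2:4, :] = 0.5
change[5, 0] = 0.45
area_dz = recovered_area(change, 0.5, rows=slice(2, 4))
area_total = recovered_area(change, 0.5)
print(area_dz, area_total)
```

Since the squared slowness of `vel_ran` equals (1 + pertAmpl * normPert) / velAve², a zero-mean `norm_pert` leaves the mean squared slowness unchanged, which is the property the scaling is meant to provide. Dividing `area_dz` or `area_total` by the corresponding noise free area gives the deviation that is plotted in the histograms.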

DISCUSSION

Thus far, we have tested simple models where the time-lapse changes are big compared to the lateral extent of the image, meaning that it is reasonable that all methods did well. A next step will be to test the recovery of a smaller perturbation. We can also consider methods such as that proposed by Hale (2013) that warp one image or data set onto another to more quickly compute the changes in a data set caused by small perturbations in the model. This would open up the possibility of using more robust statistical inversion techniques. Since we are comparing different methods, it is good to compare their computational cost. AFWI takes almost 5 times as long as DDFWI and Parallel FWI, which may become important in more complex models. Last but not least, the surveys considered were perfectly repeatable, which is clearly a big assumption. Changing acquisitions is a situation in which the extra cost of AFWI may be justified, as it is equipped to handle such changes where DDFWI is not.

Figure 5: Case 1 noise results from all three FWI schemes. Left: Histograms of the area deviations in depth dz. Right: Histograms of area deviation in the total depth.

Figure 6: Case 2 noise results from all three methods. Left: Histograms of the area deviations in depth dz. Right: Histograms of area deviation in the total amount of depth.

CONCLUSIONS

In this paper we compared three 4D FWI approaches to evaluate their relative performance in a statistical way. We used two simple numerical examples: two horizontal reflectors in a homogeneous background, and a five layer model. We added different realizations of correlated Gaussian random noise to our models, and we calculated spatial characteristics in the recovered images. In the two horizontal reflector example we calculated the distance between the two recovered reflectors in the time-lapse change. The histograms were approximately normally distributed and the standard deviations show similarly good performance for all three FWI methods. In the five layer model we calculated the area of the time-lapse change in the final image for two different types of noise. We found that when adding noise to the velocity, rather than to the squared slowness, we get significantly poorer recovered images. This is likely because the wave equation depends linearly on the squared slowness but non-linearly on the velocity, resulting in a deterioration in the recovered models in the latter case. This deterioration of results primarily flattens the associated histograms and is particularly noticeable for the parallel FWI case. Our results also indicate that both AFWI and DDFWI are successful at attenuating artifacts outside of the region of true change.

ACKNOWLEDGMENTS

This work is supported by Chevron and with grants from the Natural Sciences and Engineering Research Council of Canada Industrial Research Chair Program and the Research and Development Corporation of Newfoundland and Labrador and by the Hibernia Management and Development Corporation. In addition, we would like to thank Oleg V. Poliannikov from MIT for sharing his code for the generation of the Gaussian perturbed velocity models.

EDITED REFERENCES
REFERENCES
Asnaashari, A., R. Brossier, S. Garambois, F. Audebert, P. Thore, and J. Virieux, 2015, Time-lapse
seismic imaging using regularized full-waveform inversion with a prior model: Which strategy?:
Geophysical Prospecting, 63, 78–98, http://dx.doi.org/10.1111/1365-2478.12176.
Yang, D., M. Meadows, P. Inderwiesen, J. Landa, A. Malcolm, and M. Fehler, 2015, Double Difference
Waveform Inversion: Feasibility and Robustness Study with Pressure Data: Geophysics, 80,
M129–M141, http://dx.doi.org/10.1190/geo2014-0489.1.
Denli, H., and L. Huang, 2009, Double difference elastic waveform inversion tomography in the time
domain: 79th Annual International Meeting, SEG, Expanded Abstracts, 28, 2302–2306,
https://doi.org/10.1190/1.3255320.
Hale, D., 2013, Dynamic warping of seismic images: Geophysics, 78, S105–S115,
Hewett, R., and L. Demanet, 2013, The pysit team, pysit: Python seismic imaging toolbox v0.5: Release
0.6.
Kotsi, M., and A. Malcolm, 2017, Estimating the Error Distribution of Recovered Changes in Earth
Properties with Full – Waveform Inversion: 13th International Conference on Mathematical and
Numerical Aspects of Wave Propagation, Extended Abstract.
Lumley, D. E., 2001, Time-lapse seismic reservoir monitoring: Geophysics, 66, 50–53,
http://dx.doi.org/10.1190/1.1444921.
Maharramov, M., and B. Biondi, 2014, Joint full waveform inversion of time-lapse seismic data sets: 84th
Orange, A., K. Key, and S. Constable, 2009, The feasibility of reservoir monitoring using time-lapse
marine CSEM: Geophysics, 74, F21–F29, http://dx.doi.org/10.1190/1.3059600.
1259–1266, http://dx.doi.org/10.1190/1.1441754.
Virieux, J. and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics:
Geophysics, 74, WCC1–WCC26, http://dx.doi.org/10.1190/1.3238367.
Waldhauser, F., and W. L. Ellsworth, 2000, A double difference earthquake location algorithm: method
and application to the Northern Hayward Fault, California: Bulletin of the seismological society
of America, 90, 1353–1368, http://dx.doi.org/10.1785/0120000006.
Watanabe, T., S. Shimizu, E. Asakawa, T. Matsuoka, 2005, Differential waveform tomography for time-
lapse crosswell seismic data with application to gas hydrate production monitoring: 75th Annual
International Meeting, SEG, Expanded Abstracts, http://dx.doi.org/10.1190/1.1845221.
Zheng, Y., P. Barton, and S. Singh, 2011, Strategies for elastic full waveforms inversion of time-lapse
ocean bottom cable (OBC) seismic data: 81st Annual International Meeting, SEG, Expanded
Yang, D., M. Fehler, A. Malcolm, F. Liu, and S. Morton, 2013, Double difference waveform inversion of
4D ocean bottom cable data: Application to Valhall, North Sea: 83rd Annual International

Time-lapse full waveform inversion for cross-well monitoring of microbubble injection
Rie Kamei, University of Western Australia, U Geun Jang, Korea Polar Research Institute, David Lumley,
University of Texas at Dallas, Takuji Mouri, Masashi Nakatsukasa, Mamoru Takanashi, Ayato Kato, JOGMEC
Summary
Seismic monitoring is increasingly important to understand

time-varying changes in subsurface physical properties for
hydrocarbon production, CO2 geosequestration, and near-
surface engineering purposes. Since the resulting changes in
elastic parameters and then in recorded seismic waveforms
tend to be small, full waveform inversion (FWI) can be a
powerful method to accurately estimate time-lapse velocity
changes by maximizing the use of waveform information.
We apply time-lapse FWI to cross-well survey data acquired
with highly repeatable pseudo-random sources to monitor
microbubble injection into shallow unconsolidated
sediments. We use parallel time-lapse inversion, and
successfully detect very small time-lapse velocity changes
(<1 %) within a thin layer (< 1m) due to highly repeatable
data sets, careful data preprocessing, and well-designed
inversion procedures. The velocity changes indicate the
potential influence of the fluvial depositional environment
on the migration of injected microbubble water.
Introduction
Seismic monitoring provides valuable information regarding time-varying changes in subsurface physical properties during hydrocarbon production, hydraulic fracturing, CO2 geosequestration, groundwater injection/withdrawal and others. However, the resulting changes in subsurface properties are often small both in terms of magnitude and spatial extent, leading to seismic data differences that are difficult to detect at typical non-repeatable noise levels. In order to better extract information from the time-lapse data, exploiting the full seismic waveform information can be critical, since amplitude or traveltime changes may not be reliably detected.

Full waveform inversion (FWI) represents a set of methods to find a subsurface model that fits waveforms based on the numerical solution of the wave equation, and is becoming established as a new technique to estimate high-resolution velocity models (Virieux and Operto, 2009). Time-lapse FWI is a favourable approach to fully utilise waveform information and to estimate small time-lapse velocity changes. However, time-lapse inversion strongly depends on the knowledge of a velocity model, and the repeatability of acquisition parameters and data noise (Kamei and Lumley, 2017). Thus, developing a robust time-lapse FWI method has been challenging. A variety of inversion strategies have been proposed to mitigate the issues (e.g., Routh et al., 2012; Asnaashari et al., 2014; Kamei and Lumley, 2017). In this study, we demonstrate the potential of time-lapse FWI in field data applications by estimating the P-wave velocity changes that occur during the injection of microbubble water into shallow sediments. A microbubble is a gas bubble with a diameter of less than 1 mm, and is used for CO2 EOR (Klusman, 2003), geosequestration (Koide and Xue, 2009), and ground liquefaction mitigation (Kobayashi et al., 2010). We use a cross-well geometry that can be useful when the near-surface condition is sub-optimal (e.g., due to strong weathering and scattering).

Figure 1: (top) Microbubble water injection schedule, (bottom) schematic diagram of acquisition geometry. Green shading indicates the source depths, orange shading depths of receivers deployed during both baseline and monitoring surveys, and yellow shading the depths of receivers deployed only during the baseline survey.

Data

We injected a total of 8000 m3 of water infused with air microbubbles into the unconsolidated Quaternary sediments in the Kanto basin of Japan over 74 hours at depths between 22 and 25 m from the surface (Figure 1). The fluvial sediments mostly consist of sands and clay, and the water table is at an approximate depth of 13 m.

The injection was monitored by repeating cross-well seismic surveys using the geometry depicted in Figure 1b. A pre-

injection baseline cross-well survey was conducted 10 days before the injection, and monitoring cross-well surveys were conducted 14 times during and shortly after the injection (Figure 1a). The distance between source and receiver wells is 64.4 m, and the injection well is located at a horizontal distance of 24 m from the source well. Hydrophones were placed at 1-m intervals at depths between 13 and 50 m during the baseline survey, and at depths between 13 and 36 m during the monitoring surveys. Pseudo-random sources (Takanashi et al., 2016) were used to achieve good source repeatability, and were excited at three frequencies of 1, 2 and 4 kHz at depths between 13 and 50 m at 1-m intervals. Note that the 1 and 4 kHz sources were employed during the baseline survey only. A source encoding technique was used to reduce the acquisition time; an array of six piezoelectric transducers simultaneously excited six different pseudorandom source wavelets with amplitude modulation.

Figure 2: (a) Observed baseline shot gathers shown for every other shot. (b) Predicted and (c) residual baseline shot gathers after the baseline FWI. (d-e) Baseline and monitoring observed same-level traces at the (d) 35-meter and (e) 24-meter depth. Monitoring 0 indicates the baseline survey.

The acquired baseline and monitoring data exhibit high signal-to-noise ratio. We display the recorded baseline shot gathers for the 1-kHz sources in Figure 2a after application of a bandpass filter between 100 and 1050 Hz. We can

clearly identify first-arrival, reflected and diffracted waveforms.

The recorded data show excellent repeatability, as well as small but clear changes in waveforms. In Figure 2d-e, we display the representative bandpass-filtered same-level traces of the 2 kHz data at depths of 35 and 24 m for the baseline and monitoring surveys. The injected microbubble water is not expected to affect seismic waveforms at a source-receiver pair at 35 m (much deeper than the injection depth). In Figure 2d, all traces indeed coincide with each other extremely well, with the cross-correlation coefficient well above 0.97, suggesting good data repeatability. On the contrary, for the source-receiver pair at the 24 m depth (near the injection depth), we observe small but apparent changes in waveforms in terms of arrival time, amplitude and frequency content. The waveform changes are time-dependent; for example, the first arrival time advances during the early stage of the injection, then delays and later returns to nearly the same time as that of the baseline survey data.

Baseline FWI

We apply multi-scale Laplace-Fourier-domain acoustic FWI (Jang and Lumley, 2015; Kamei and Lumley, 2015) to the baseline and monitoring data sets, and estimate P-wave velocity changes. Prior to estimating time-lapse velocity changes, we first invert for baseline P-wave velocity. We use a P-wave velocity model obtained by traveltime tomography (Figure 4a) as an initial model for the baseline inversion. We use the 1-kHz baseline data for this purpose as the data include abundant low frequency information to stabilize FWI. The baseline data are preprocessed by trace editing, muting, and the bandpass filter (200-1050 Hz). We apply surface-consistent calibration similar to Kamei et al. (2015), and use conventional phase-only data residuals of Kamei et al. (2014) (i.e., amplitude-normalized data) to remove large variations in source characteristics (Figure 3). The gradient is then preconditioned to reduce undesired high-frequency oscillations, acquisition footprint, and singularities at source and receiver locations.

Figure 3: Estimated source wavelet for each source (top) before and (bottom) after the baseline FWI. Note that every sixth source (corresponding to the bottom source of the source array) exhibits a time shift and different amplitudes compared to the others.

We display the baseline velocity model after FWI in Figure 4b. The baseline FWI model delineates a thin 1 m low velocity layer at a depth of 24 m bounded by two thin 1 m high velocity layers. The layers are nearly flat, but also show some horizontal heterogeneities. In order to examine the quality of the baseline velocity model, we compute predicted waveforms. The predicted data in Figure 2b agree exceptionally well with the observed data at the transmitted first arrivals and the later reflected arrivals. The excellent fit validates the high quality of the baseline FWI model, and encourages time-lapse inversion, as shown next.

Figure 4: (a) Traveltime tomography result used as an initial velocity model for baseline FWI, (b) FWI-estimated baseline velocity model. White cylinder indicates the injection well location.

Time-lapse FWI

The excellent baseline velocity inversion result and very high repeatability of the monitoring data allow us to adopt a parallel inversion approach (Shragge and Lumley, 2013;

Asnaashari et al., 2014; Kamei and Lumley, 2017). After applying similar data preconditioning, the 2 kHz baseline and monitoring data are inverted for P-wave velocity, starting from the FWI-estimated baseline velocity model. The inversion runs are conducted in parallel for each data set, and the time-lapse velocity changes are estimated by subtracting the updated baseline FWI model from the monitoring FWI models.

We display the final estimated velocity changes at 26, 28, and 74 hours after the start of microbubble injection in Figure 5. We detect a velocity increase at horizontal distances between 20 and 50 m and a depth of approximately 24 m. The magnitude of the velocity increase also varies over time. During the early stage of the injection, the velocity increases. However, during the late stage of the injection (after 71 hours), the velocity decreases, and then becomes close to the pre-injection values at the end of the injection. We are currently investigating rock physics relationships for microbubble injection based in part on these results.

Figure 5: FWI-estimated velocity change at (a) 26, (b) 28, and (c) 74 hours after the start of the injection.

Next, we validate the estimated time-lapse velocity changes by comparing the predicted and observed waveforms before and after time-lapse FWI (Figure 6). Even prior to the time-lapse inversion, the predicted waveforms mostly coincide with the observed waveforms due to the excellent quality of the previous baseline inversion, but show slight disagreement in both traveltime and amplitudes (Figure 6a). After time-lapse FWI, we observe clear improvement in waveform fitting (Figure 6b): the predicted and observed waveforms fit each other nearly perfectly, indicating the high reliability of the inversion results.

Figure 6: Predicted and observed shot gathers at 28 hours after the start of the injection for a source located at a depth of 26 m. Predicted waveforms are computed from (top) the starting model for time-lapse FWI, and (bottom) the final time-lapse FWI velocity model.

Conclusions

We successfully apply time-lapse FWI to cross-well data sets, and monitor the injection of microbubble water in shallow unconsolidated sediments. Time-lapse monitoring data exhibit exceptionally high repeatability, and allow us to reveal transient behavior in waveforms and P-wave velocities during the microbubble injection. High-resolution time-lapse FWI detects changes in P-wave velocity of less than 1 percent, initially as velocity increases and subsequently as velocity decreases. The velocity changes are mainly imaged within a thin layer between the injection and receiver wells, indicating the potential influence of the fluvial depositional environment on the flow of the injected fluid. The resulting velocity models fit the observed waveforms very well, supporting the consistency of the estimated velocity changes.
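Two of the quantities used above, the cross-correlation coefficient between same-level traces and the time-lapse change obtained by differencing parallel inversion results, can be sketched as follows. The sketch assumes a zero-lag, mean-removed correlation coefficient, since the text does not say how the coefficient was computed, and all trace and velocity values are made up for illustration.

```python
import numpy as np

def zero_lag_correlation(a, b):
    """Normalized zero-lag cross-correlation coefficient of two traces."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def percent_velocity_change(v_base, v_monitor):
    """Parallel time-lapse estimate: monitor model minus baseline model, in percent."""
    return 100.0 * (v_monitor - v_base) / v_base

# Synthetic 1 kHz wavelet standing in for a same-level trace (10 ms, 201 samples).
t = np.linspace(0.0, 0.01, 201)
base = np.sin(2.0 * np.pi * 1000.0 * t) * np.hanning(t.size)
corr_same = zero_lag_correlation(base, base.copy())       # perfectly repeated trace
corr_shifted = zero_lag_correlation(base, np.roll(base, 5))  # small arrival-time shift

# Hypothetical velocities (m/s) at one grid cell before and during injection.
change = percent_velocity_change(1600.0, 1612.0)
print(corr_same, corr_shifted, change)
```

A small time shift already lowers the coefficient noticeably, which is why values well above 0.97 at the 35 m level are read as evidence of good repeatability, while a sub-percent velocity change of the kind reported corresponds to only a few metres per second.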

EDITED REFERENCES
REFERENCES
Asnaashari, A., R. Brossier, S. Garambois, F. Audebert, P. Thore, and J. Virieux, 2014, Time-lapse
seismic imaging using regularized full-waveform inversion with a prior model: Which strategy?:
Geophysical Prospecting, 63, 78–98, http://doi.org/10.1111/1365-2478.12176.
Jang, U. J., and D. Lumley, 2015, Full waveform inversion comparison of conventional and broadband
marine seismic streamer data, NW Shelf Australia: 24th International Geophysical Conference
and Exhibition, Perth, Australia, ASEG-PESA.
Kamei, R., and D. Lumley, 2017, Full waveform inversion of repeating seismic events to estimate time-
lapse velocity changes: Geophysical Journal International, Accepted.
Kamei, R., R. G. Pratt, and T. Tsuji, 2014, On misfit functions for Laplace-Fourier waveform inversion,
with applications to wide-angle OBS data: Geophysical Prospecting, 62, 1054–1074,
http://doi.org/10.1111/1365-2478.12127.
Kobayashi, M., N. Suemasa, T. Katada, and K. Nagano, 2010, Feasibility study on countermeasure
against liquefaction using micro bubble: Proceedings of the 12th International Offshore and Polar
Engineering Conference, Beijing, China.
Koide, H., and Z. Xue, 2009, Carbon microbubbles sequestration: A novel technology for stable
underground emplacement of greenhouse gases into wide variety of saline aquifers, fractured
rocks and tight reservoirs: Energy Procedia, 1, 3655–3662,
http://doi.org/10.1016/j.egypro.2009.02.162.
Klusman, R. W., 2003, Evaluation of leakage potential from a carbon dioxide EOR/sequestration project:
Energy Conversion and Management, 44, 1921–1940, http://doi.org/10.1016/S0196-8904(02)00226-1.
Routh, P., G. Palacharla, I. Chikichev, and S. Lazaratos, 2012, Full wavefield inversion of time-lapse data
for improving imaging and reservoir characterization: 82nd Annual International Meeting, SEG,
Expanded Abstracts, http://doi.org/10.1190/segam2012-1043.1.
Shragge, J., and D. Lumley, 2013, Time-lapse wave-equation migration velocity analysis: Geophysics,
78, no. 2, S69–S79, http://doi.org/10.1190/geo2012-0182.1.
Takanashi, M., Y. Nakamura, M. Nakatsukasa, and J. Sakakibara, 2016, Crosstalk-free simultaneous
acquisition by arbitrary sweeps with amplitude modulation: 78th Annual International Conference
and Exhibition, EAGE, Extended Abstracts, http://doi.org/10.3997/2214-4609.201601412.
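Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics:
Geophysics, 74, no. 6, WCC1–WCC26, http://doi.org/10.1190/1.3238367.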

Efficient 3D localized elastic full waveform inversion for time-lapse seismic surveys
Shihao Yuan*, Nobuaki Fuji and Satish Singh, Institut de Physique du Globe de Paris, Dmitry Borisov,
Princeton University
Summary

We present a methodology to invert seismic data for a localized area by combining source-side wavefield injection and a receiver-side extrapolation method. Despite the high resolving power of seismic full waveform inversion (FWI), the computational cost of practical-scale elastic FWI remains a heavy burden. This can be much more severe for time-lapse surveys, which require real-time seismic imaging on a daily or weekly basis. Besides, structure changes during time-lapse surveys are likely to occur in a small area, such as an oil reservoir, rather than the whole region. We thus propose an approach that allows us to effectively image localized structure changes located deep below and far from both source and receiver arrays. We perform both forward and back propagation only inside the target region. We present 2D and 3D elastic numerical examples of the proposed method and quantitatively evaluate the inversion errors, in comparison to those of conventional full-model inversions. The results show that the proposed localized waveform inversion is not only efficient and robust but also accurate even in the presence of errors in baseline models.

Introduction

Despite its high resolution (Virieux et al., 2009), full waveform inversion requires repeated full wavefield modeling and it is thus computationally expensive to invert for a large area. Computational cost can be particularly severe for time-lapse surveys in exploration seismology, which require real-time model estimations on a regular basis (daily or weekly). Since oil/gas production does not change the overall substructures, there should be much redundancy in conventional full-model waveform inversions. In this study, we propose a methodology that allows us to perform waveform inversions locally in a region disconnected from both source and receiver arrays.

Fig. 1 shows how our methodology works. We have a physical source (array) and a receiver array on the Earth's surface or ocean bottom. Instead of performing inversion inside the whole model region, we would like to focus on a subvolume in V+, where the model changes are expected to occur. All forward and backward simulations will be conducted within the reduced absorbing boundary (the dashed square shown in Fig. 1). Such a strategy of localized waveform inversion requires two levels of wavefield reconstructions: i) representation of seismic sources in the subdomain (wavefield injection); and ii) extrapolation of residual wavefield between baseline and monitor surveys recorded at the physical receivers to virtual receivers in the subdomain (residual wavefield extrapolation). Borisov et al. (2015) proposed a localized waveform inversion technique for time-lapse surveys within a target region disconnected from source arrays. However, there are few studies that extended it to a region disconnected both from source and receiver arrays. To demonstrate the feasibility, efficiency and robustness of the proposed methodology, we perform 2D and 3D elastic synthetic tests in comparison to those of conventional full-model inversions.

Figure 1: A schematic illustration of the proposed localized inversion. Black star denotes the physical source (air-gun). Grey and yellow triangles denote physical and virtual receivers, respectively. The dashed square represents the absorbing boundary and the solid square represents the source injection boundary. (Labels in the figure: sea-air interface; source; seabed 4C-OBC/OBN; V-; V+; injection boundary; absorbing boundary.)

Method – Wavefield injection and extrapolation

For the source side, the wavefield injection method (Robertsson et al., 2000; Borisov et al., 2015; Masson et al., 2017; Yuan et al., 2017) reconstructs a source expression on the surface of target regions, which serves as a localized forward modeling scheme and allows efficient calculation of synthetic seismograms after model alterations within a certain area. Wavefield injection is essentially a hybrid approach, where we impose an artificial boundary enclosing the target regions and divide the whole computational domain into two subdomains V+ and V- (Fig. 1). Given a baseline (background) model of the full region, we can calculate the wavefield along the injection boundary (Fig. 1, solid square) for each physical source and store it (both traction and velocity components). The stored wavefield will serve as new source distributions (boundary conditions) for subsequent localized forward simulations. The only missing part of the reconstructed wavefield using
injection method is the higher-order interactions between the scattered wavefield caused by the altered
model within the target area and the unaltered baseline model outside the reduced absorbing boundary.
To maintain all interactions, the so-called exact boundary condition should be applied (van Manen et
al., 2007; Malcolm et al., 2016; Masson et al., 2017). However, it is more computationally demanding
and requires a more accurate baseline model, since the higher-order scattered wavefields are more
sensitive to model error. Besides, as we show in the following numerical examples, those higher-order,
long-range interactions may have a limited effect on the final inversion results. Considering the facts
above, we stick to wavefield injection without implementing the exact boundary condition.

For the receiver side, classical wavefield extrapolation (redatuming) from physical to virtual
receivers in the subsurface is needed in our method to perform localized waveform inversion. We
extrapolate the residual wavefield between baseline and monitor in the frequency domain using the
correlation-type representation theorem (Wapenaar et al., 2006; Ravasi et al., 2014), which is similar
to reverse time migration. The Green's functions between physical and virtual receivers, serving as
back-propagators, are calculated from two-way wave equations and stored just once using the baseline
model. Both the truncated integral due to the limited physical receiver aperture (the single-sided
illumination problem) and an inaccurate baseline (especially in the shallow structures) will lead to
some spurious (non-physical) events in the extrapolated wavefield. As we will show in the later
examples, receiver-side wavefield extrapolation combined with source-side wavefield injection can
partly neglect these non-physical extrapolated events due to causal conditions.

2D synthetic test – Marmousi model

We validate our localized inversion strategy on the modified Marmousi model in comparison to full-model
inversions. The Vp model is 7.5 km × 2.5 km (Fig. 2a). Vs (Vs ≈ Vp/1.5 + 300 m/s) and density models
are linearly linked to the Vp model. We generate 4 s of data recorded by 271 receivers at 300 m depth
with a regular spacing of 25 m. A Ricker wavelet with a dominant frequency of 6.5 Hz is used as the
source wavelet and added onto the vertical velocity component. The grid spacing for the wave
propagation is 25 m and the time step is 2 ms. 55 sources are deployed at 275 m depth with an interval
of 125 m. The time-lapse velocity perturbations are about 5% of the average baseline model (Fig. 2b).

The 2D Vp perturbations obtained from full-model and localized inversions under the true baseline model
are shown in Fig. 2c-d, respectively. Compared to the conventional full-model inversion, the localized
inversion retrieves the velocity anomalies with a higher accuracy after the same 100 iterations. The
corresponding 1D velocity profiles are shown in Fig. 2e-h. Fig. 3 shows the vertical velocity
components of initial and final residuals from one shot gather. The residuals of the major events are
better fitted by the localized inversion (Fig. 3d) in comparison to the final full-model inversion
residuals (Fig. 3b). We also plot the theoretical initial residuals (Fig. 3e) for the case in which we
directly record the wavefield at the same positions as the extrapolated virtual receivers. The residual
difference between Fig. 3c and 3e is shown in Fig. 3f. This difference essentially represents the
spurious events resulting from wavefield extrapolation. We notice that the localized inversion does not
"see" those spurious events when comparing it with the final residuals of the localized inversion
(Fig. 3d and 3f). We also calculated the inverted model errors as a function of iterations (Fig. 4),
showing that localized inversions converge faster. Hence, the localized inversions require fewer
iterations to obtain models with the same accuracy as full-model inversions. This will further enhance
the speed-up and efficiency.

[Figure 4 plot, "Marmousi model - Vp": normalised model error (0 to 1) versus iteration number (up to 100) for full-model and localised inversions.]
Figure 4: Comparisons of inverted errors as a function of iteration number between full-model and
localised waveform inversions in 2D Marmousi model test.

3D synthetic test – SEG/EAGE overthrust model

We apply the same strategy on the 3D SEG/EAGE overthrust model. The Vp baseline model (4.8 km × 4.8 km
× 3.2 km) is shown in Fig. 5a. Vs and density models are linked to the Vp model. 11236 receivers are
put at 240 m depth with a regular spacing of 40 m. A Ricker wavelet with a dominant frequency of 5.5 Hz
is used as the explosive source wavelet. The grid spacing for the wave propagation is 20 m and the time
step is 1.8 ms. 36 sources are deployed at 220 m depth with an interval of 800 m. One selected

profile of the true time-lapse velocity perturbations is shown in Fig. 5b.

The 3D Vp perturbations obtained from full-model and localized inversions under the true baseline model
are shown in Fig. 5c-d, respectively. The corresponding 2D profiles are shown in Fig. 5e-f. This is a
preliminary 3D result and we may notice some inversion errors (high velocity anomalies in Fig. 5f),
which are probably due to the inaccurate extrapolated wavefield. However, compared to the conventional
full-model inversion, the localized inversion still retrieves the velocity anomalies with a higher
accuracy after the same 30 iterations.

Conclusions

We combine wavefield injection and extrapolation to perform localized waveform inversion for time-lapse
seismic surveys. The results show that the localized inversion can enhance the image quality of local
time-lapse variations while reducing the computational cost to a large extent. The proposed strategy
reveals that putting virtual sources and receivers in the vicinity of the target region contributes to
improving the accuracy and robustness of waveform inversion. Due to the reduced modeling size in
localized inversions, a single iteration not only takes much less time, but the inversion also
converges much faster. The severe computational cost of 3D elastic full waveform inversion during
practical time-lapse surveys can be reduced to a more acceptable and economic level. Furthermore, it
also indicates the potential for high-resolution elastic imaging of reservoirs by inverting higher
frequencies (above 30 Hz) at relatively low computational cost.

Acknowledgments

The study was carried out as a part of the Paris Exploration Geophysics Group project (GPX) funded by
the French National Research Agency (ANR), CGG, TOTAL and Schlumberger. A significant part of the
calculations in this work was performed on S-CAPAD at Institut de Physique du Globe de Paris.
[Figure 2 panels (axes: X direction and depth in km; Vp in m/s): (a) True baseline model; (b) True time-lapse perturbation; (c) Full-model inversion (error: 37.2%); (d) Localized inversion (error: 16.3%); (e-h) 1D perturbations #1 and #2, true versus inverted P-wave velocity. Results shown after 100 iterations.]
Figure 2: Time-lapse Vp model and inverted results. (a) The baseline Vp model selected from the true Marmousi model and the acquisition
geometry. Red star and yellow triangles denote physical source and receivers, respectively. White triangles denote the extrapolated virtual
receivers. Two arrows point to the locations where the time-lapse perturbations occur. (b) Zoomed view of the true time-lapse Vp perturbations within the
dashed square in (a). (c) Full-model inversion result. (d) Localized inversion result. The 1D true and inverted velocity perturbations (#1 and #2)
in (c) and (d) are shown in (e, g) and (f, h), respectively.
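
As a rough illustration of the 2D test setup described in the text (271 receivers every 25 m, 55 sources every 125 m, and a 6.5 Hz Ricker wavelet sampled at 2 ms for 4 s), the acquisition can be sketched in Python with NumPy. The function name `ricker`, the wavelet delay `t0`, the inclusive node count, and the choice of placing the first source and receiver at x = 0 are assumptions made for this sketch; they are not stated in the paper.

```python
import numpy as np

def ricker(t, f0, t0):
    """Ricker wavelet with dominant frequency f0 (Hz), peaking at time t0 (s)."""
    a = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

# Values quoted in the text for the 2D Marmousi test.
dt = 0.002                        # time step (s)
nt = int(round(4.0 / dt)) + 1     # 4 s of data
t = np.arange(nt) * dt
f0 = 6.5                          # dominant frequency (Hz)
t0 = 1.5 / f0                     # assumed delay so the wavelet starts near zero
wavelet = ricker(t, f0, t0)

# Grid nodes for a 7.5 km x 2.5 km model at 25 m spacing (inclusive count assumed).
nx = int(round(7500.0 / 25.0)) + 1
nz = int(round(2500.0 / 25.0)) + 1

# Horizontal positions (m); the origin at the first station is an assumption.
rec_x = np.arange(271) * 25.0     # receivers at 300 m depth
src_x = np.arange(55) * 125.0     # sources at 275 m depth

print(f"receiver line length: {rec_x[-1]:.0f} m, source line length: {src_x[-1]:.0f} m")
```

With these numbers both the receiver line and the source line span 6750 m, which fits inside the 7.5 km wide model.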

[Figure 3 panels (vertical velocity component Vz; axes: trace number, time in ms): (a) Full-model inversion, initial; (b) Full-model inversion, final; (c) Localised inversion, initial; (d) Localised inversion, final; (e) Without wavefield extrapolation, initial; (f) Residual difference, (e) - (c).]
Figure 3: Vertical component of inversion residuals from one shot example under true baseline Marmousi model. (a-b) Initial and final (after 100
iterations) residuals using full-model inversion. (c-d) Initial and final (after 100 iterations) residuals using localised inversion. (e) Initial residuals
of localised inversion using the true wavefield directly observed at virtual receiver positions without extrapolation. (f) The residual difference
between (c) and (e).
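
The receiver-side extrapolation whose artefacts panel (f) isolates can be illustrated for a single frequency with a deliberately simplified sketch. It assumes a scalar acoustic wavefield and a homogeneous-medium Green's function, and it omits the traction terms and amplitude factors of the full correlation-type representation theorem, so it is not the authors' implementation. The names `greens_3d` and `extrapolate`, the velocity `c`, and the virtual receiver position are hypothetical.

```python
import numpy as np

def greens_3d(x_a, x_b, omega, c):
    """Homogeneous acoustic Green's function exp(i*omega*r/c) / (4*pi*r) (assumed medium)."""
    r = np.linalg.norm(np.asarray(x_a) - np.asarray(x_b), axis=-1)
    return np.exp(1j * omega * r / c) / (4.0 * np.pi * r)

def extrapolate(u_rec, x_rec, x_virtual, omega, c):
    """Back-propagate one frequency of the recorded field to a virtual receiver
    by correlating with the Green's functions and summing over physical receivers."""
    g = greens_3d(x_rec, x_virtual, omega, c)
    return np.sum(np.conj(g) * u_rec)

c = 2000.0                          # assumed constant velocity (m/s)
omega = 2.0 * np.pi * 6.5           # angular frequency at 6.5 Hz
x_rec = np.column_stack([np.arange(271) * 25.0,
                         np.zeros(271),
                         np.full(271, 300.0)])    # physical receivers at 300 m depth
x_virtual = np.array([3375.0, 0.0, 1200.0])       # hypothetical virtual receiver

# Synthetic check: a field radiated from the virtual position with a known phase.
amp = 2.0 * np.exp(1j * 0.7)
u_rec = amp * greens_3d(x_rec, x_virtual, omega, c)
u_virtual = extrapolate(u_rec, x_rec, x_virtual, omega, c)
print(f"recovered phase: {np.angle(u_virtual):.3f} rad")
```

Correlating with the conjugate Green's function removes the propagation phase, so the phase of the test field is recovered at the virtual position; the limited aperture only affects the amplitude in this idealised case.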
[Figure 5 panels (Vp in m/s): (a) True baseline model; (b) True time-lapse perturbation; (c) Full-model inversion; (d) Localized inversion; (e) Full-model inversion (2D profile); (f) Localized inversion (2D profile). Results shown after 30 iterations.]
Figure 5: 3D SEG/EAGE overthrust Vp model and inverted results. (a) The baseline Vp model. (b) The 2D profile of true time-lapse Vp
perturbations. (c) Full-model inversion result. (d) Localized inversion result. 2D profiles of inverted velocity perturbations in (c) and (d) are
shown in (e) and (f), respectively.
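
The survey numbers quoted for the 3D test can be cross-checked with a few lines of arithmetic. The sketch below assumes that receivers and sources are laid out on square grids and that grid nodes are counted inclusively; neither assumption is stated in the paper.

```python
import math

# Model size and grid spacing quoted in the text.
nx = ny = round(4800 / 20) + 1      # nodes along each horizontal axis
nz = round(3200 / 20) + 1           # nodes along depth

# Square layouts are an assumption; the paper only gives totals and spacings.
n_rec_side = math.isqrt(11236)      # receivers per side
n_src_side = math.isqrt(36)         # sources per side
rec_extent = (n_rec_side - 1) * 40  # receiver patch width (m)
src_extent = (n_src_side - 1) * 800 # source patch width (m)

print(f"grid: {nx} x {ny} x {nz} nodes")
print(f"receivers: {n_rec_side} x {n_rec_side} over {rec_extent} m")
print(f"sources: {n_src_side} x {n_src_side} over {src_extent} m")
```

Under these assumptions both patches fit within the 4.8 km lateral extent of the model.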

REFERENCES
Borisov, D., S. C. Singh, and N. Fuji, 2015, An efficient method of 3-D elastic full waveform inversion
using a finite-difference injection method for time-lapse imaging: Geophysical Journal
International, 202, 1908–1922, https://doi.org/10.1093/gji/ggv268.
Malcolm, A., and B. Willemsen, 2016, Rapid 4D FWI using a local wave solver: The Leading Edge, 35,
1053–1059, https://doi.org/10.1190/tle35121053.1.
Masson, Y., and B. Romanowicz, 2017, Fast computation of synthetic seismograms within a medium
containing remote localized perturbations: A numerical solution to the scattering problem:
Geophysical Journal International, 208, 674–692, https://doi.org/10.1093/gji/ggw412.
Ravasi, M., and A. Curtis, 2013, Elastic imaging with exact wavefield extrapolation for application to
ocean-bottom 4C seismic data: Geophysics, 78, no. 6, S265–S284,
https://doi.org/10.1190/geo2013-0152.1.
Robertsson, J. O., and C. H. Chapman, 2000, An efficient method for calculating finite-difference
seismograms after model alterations: Geophysics, 65, 907–918,
https://doi.org/10.1190/1.1444787.
van Manen, D. J., J. O. Robertsson, and A. Curtis, 2007, Exact wave field simulation for finite-volume
scattering problems: The Journal of the Acoustical Society of America, 122, EL115–EL121,
https://doi.org/10.1121/1.2771371.
Wapenaar, K., and J. Fokkema, 2006, Green’s function representations for seismic interferometry:
Geophysics, 71, no. 4, SI33–SI46, https://doi.org/10.1190/1.2213955.
Yuan, S., N. Fuji, S. Singh, and D. Borisov, 2017, Localised time-lapse elastic waveform inversion using
wavefield injection and extrapolation: 2D parametric studies: Geophysical Journal International,
209, 1699–1717, https://doi.org/10.1093/gji/ggx118.

Reconstructed Full Waveform Inversion with the Extended Source
Chao Wang, David Yingst, Paul Farmer, Ian Jones, Gary Martin, and Jacques Leveille, ION
SUMMARY

Conventional full waveform inversion (FWI) has been extensively applied to real seismic data and has
successfully generated high-fidelity earth models for better seismic imaging and structural
interpretation. Considering the nonlinearity and ill-posedness of the problem for conventional FWI, the
success in providing reliable updated models relies heavily on the accuracy of the initial models and
the low-frequency content of the acquired data.

To relax the requirements of good initial models and adequate low frequencies, we propose a novel
approach to time domain reconstructed full waveform inversion with the extended source (RFWI). RFWI
relaxes the constraint that the forward modeled data exactly solve the wave equation as in conventional
FWI, and instead uses an $\ell_2$ approximate solution. RFWI estimates earth models and jointly
reconstructs an extended source by minimizing an objective function that penalizes the wave equation
error while fitting the data. RFWI extends the solution space and therefore overcomes some of the
problems with local minima that prevent conventional FWI from obtaining a reliable solution with an
inaccurate starting model and/or insufficient low-frequency data.

INTRODUCTION

Conventional FWI has been an essential tool to build high-fidelity earth models by minimizing the
misfit between the acquired and forward modeled data (Lailly, 1983; Tarantola, 1984; Virieux and
Operto, 2009). This least-squares problem has been implemented in both the time and frequency domain
(Sirgue and Pratt, 2004; Wang et al., 2013). However, it is a highly nonlinear, ill-posed inversion
problem and mitigating convergence to local minima is a severe challenge. The greatest limitation of
FWI that affects the success in generating reliable solutions for large-scale production jobs is the
critical requirement of low-frequency data coupled with good starting models.

Over the last decade, various efforts have been made to mitigate the problems of local minima and many
alternative methods have been proposed (Shen and Symes, 2008; van Leeuwen and Hermann, 2013; Biondi and
Almomin, 2014; Warner and Guasch, 2016; Wang et al., 2016; Huang et al., 2016). All these previous
works and our proposed method in this paper aim to avoid convergence to local minima by adding
additional parameters to the models and expanding the solution space.

We now focus on the time-domain method and its implementation using a finite difference scheme. To
compute the misfit function for time-domain FWI, the conventional forward modeled data are extracted
from the source wavefield generated by solving the forward wave equation exactly with the given source
signature. Our reconstructed forward modeled data are obtained from the reconstructed source wavefield
by solving the wave equation with the extended or reconstructed source. We refer to this method as
reconstructed full waveform inversion with the extended source (RFWI). Having simulated the forward
modeled data and source wavefield, conventional FWI searches for earth models such that the synthetic
data have the best match to the field data in the least-squares sense. RFWI optimizes over earth models
and the source wavefield jointly to minimize the data misfit subject to the wavefield being consistent
with the wave equation in an $\ell_2$ sense. By reconstructing a better source wavefield from the
extended source instead of the original source signature, RFWI relaxes the severe requirements for FWI
and provides more reliable inverted models.

By including the source wavefield as an additional parameter to the search space, RFWI adds the wave
equation error as a penalty term to the original data misfit in conventional FWI and formulates a joint
minimization problem. We reconstruct the source wavefield and estimate the earth models in an
alternating fashion. We first reconstruct the extended source by minimizing the wave equation error
together with the data misfit. It is estimated by solving the normal equation, which is equivalent to
the least-squares solution. The source wavefield is then reconstructed from forward propagation of the
extended source and the receiver wavefield is reconstructed from another backward propagation. With the
reconstructed wavefields and the extended source, models are updated with a gradient based optimization
method and the inverted models are used for reconstructing another extended source and wavefields at
the next iteration. Time-domain derivation and implementation differentiate RFWI from other previous
works that are related to wavefield reconstruction inversion or source extension in the frequency
domain (van Leeuwen and Hermann, 2013; Huang et al., 2016) and provide a more suitable solver for
processing 3D large-scale production data sets.

By expanding the search space, RFWI reconstructs the forward modeled data to better fit the field data
and avoid cycle skips. Therefore it mitigates some of the problems with local minima with inaccurate
initial models and/or inadequate low-frequency content that limit the success of FWI. While FWI usually
relies on diving waves, RFWI takes advantage of reflected waves from wavefield reconstruction and
produces deeper model updates. It also compensates for the wavefield errors that relate to the acoustic
assumptions and approximations during the wave propagation. From the observations of both synthetic and
field examples, RFWI demonstrates more advantages in areas with sharp velocity contrasts, especially in
the presence of salt.

This paper first presents the theory and methodology for 3D time domain RFWI. It also discusses the
differences and similarities between conventional FWI and RFWI. The benefits of RFWI over FWI are
demonstrated using a 2D synthetic example. Finally, the applicability of RFWI on field data is
illustrated on both 2D and 3D streamer data from offshore Mexico.
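
The extended-source step outlined above (a penalized least-squares problem solved through its normal equation) can be illustrated with a small dense linear-algebra toy. This is only a sketch under stated assumptions: the random lower-triangular matrix `S` stands in for a wave-equation solution operator, `P` for the receiver sampling, and the function name `reconstruct_extended_source` and all sizes are hypothetical. A production implementation would apply these operators through wave propagation rather than forming matrices.

```python
import numpy as np

def reconstruct_extended_source(S, P, d0, f, lam):
    """Minimize 0.5*||P S g - d0||^2 + 0.5*lam^2*||g - f||^2 over g
    by solving the normal equation (S^T P^T P S + lam^2 I) g = S^T P^T d0 + lam^2 f."""
    A = P @ S                                   # maps a source to data at the receivers
    lhs = A.T @ A + lam**2 * np.eye(S.shape[1])
    rhs = A.T @ d0 + lam**2 * f
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(0)
n, n_rec = 40, 8
S = np.tril(rng.normal(size=(n, n))) / n        # toy stand-in for the solution operator
P = np.eye(n)[::5]                              # toy restriction: keep every 5th sample
f = rng.normal(size=n)                          # original source signature
d0 = P @ S @ f + 0.1 * rng.normal(size=n_rec)   # "field data" with a mismatch

lam = 10.0
g = reconstruct_extended_source(S, P, d0, f, lam)

# Leading-order correction in 1/lam^2: g is approximately f plus the back-projected residual.
g_first_order = f + (P @ S).T @ (d0 - P @ S @ f) / lam**2
rel_gap = np.linalg.norm(g - g_first_order) / np.linalg.norm(g - f)
print(f"relative gap between exact and first-order extended source: {rel_gap:.2e}")
```

In this toy the extended source absorbs part of the data mismatch, and it moves back towards the original source signature as the penalty factor grows, which mirrors the statement that RFWI approaches conventional FWI for a large penalty.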

THEORY

Consider the following general wave equation,

$$\Box[m]\,u = f, \tag{1}$$

where $m$ represents the subsurface model parameters, $\Box[m]$ is the wave operator or D'Alembert
operator, $u$ is the forward propagated or source wavefield, and $f$ is the source signature. Let
$S[m]$ denote the solution operator of the forward propagated wave equation (1). At each iteration,
conventional FWI solves the wave equation exactly with the given source and the current model to obtain
the source wavefield $u = S[m]f$. The objective function for conventional FWI, which uses the $\ell_2$
norm of the difference between the acquired field data and simulated forward modeled data and depends
on the model $m$ only, is defined as

$$J[m] = \frac{1}{2}\,\| PS[m]f - d_0 \|_2^2. \tag{2}$$

Here $P$ is the restriction operator (a projection) that records the source wavefield $u$ at the
receiver locations and $d_0$ is the field data.

Unlike conventional FWI, the idea of RFWI is to relax the constraint that $u$ be an exact solution of
the wave equation to an $\ell_2$ approximation, by adding the wave equation error as a penalty term.
Thus a new penalized objective function depending on both the source wavefield $u$ and the model $m$ is
introduced as

$$\bar{J}_\lambda[u, m] = \frac{1}{2}\,\| Pu - d_0 \|_2^2 + \frac{\lambda^2}{2}\,\| \Box[m]u - f \|_2^2, \tag{3}$$

where $\lambda$ is a penalty factor that needs to be updated during the inversion. The source wavefield
$u$ should be forward going and therefore in the range of $S$, i.e. $u = S[m]g$ for some $g$. We refer
to $g$ as the extended or reconstructed source since it varies with space as well as with time for each
shot and it is used to reconstruct the source wavefield $u$. Note then that $\Box[m]u = g$ and so the
objective function with the extended source $g$ and the model $m$ can be redefined as

$$J_\lambda[g, m] = \frac{1}{2}\,\| PS[m]g - d_0 \|_2^2 + \frac{\lambda^2}{2}\,\| g - f \|_2^2. \tag{4}$$

To make this joint minimization problem computationally feasible, we solve for $g$ and $m$ in an
alternating fashion. We first minimize the above objective function w.r.t. $g$ using the current model.
Since the data mismatch will be built into the extended source, the reconstructed wavefield better
matches the true wavefield for both reflections and refractions, which mitigates the issues of cycle
skipping associated with conventional FWI. The least-squares problem for reconstructing the extended
source $g$ is equivalent to solving the following normal equation

$$S^* P^* P S g + \lambda^2 g = S^* P^* d_0 + \lambda^2 f.$$

Now assume that the extended source $g$ is the solution of the normal equation and write $g$ as a
perturbation of $f$

$$g = f + \delta f,$$

where

$$\delta f = S^* P^* d_0/\lambda^2 - S^* P^* P S f/\lambda^2 + O(1/\lambda^4), \tag{5}$$

according to the asymptotic expansion. To make this computationally feasible, we ignore the
$1/\lambda^4$ terms on the right hand side of equation (5) and use the symbol $\tilde{\ }$ for any
wavefield that has been approximately reconstructed. The extended source can then be approximately
computed using

$$\tilde{g} = f + \delta\tilde{f}, \tag{6}$$

where

$$\delta\tilde{f} = S^* P^* (d_0 - P S f)/\lambda^2. \tag{7}$$

Our next step is to minimize the above objective function (4) w.r.t. $m$ using the reconstructed
$\tilde{g}$. Note that we can now reconstruct the forward source wavefield $\tilde{u} = S\tilde{g}$.
Then we need an explicit form of the wave operator; let us consider the following operator for a VTI
system of two coupled second-order partial differential equations in terms of the P-wave vertical
velocity $v$ and the Thomsen parameters $\varepsilon$ and $\delta$, assuming a constant density and zero
shear velocity

$$\Box[v] \triangleq \frac{1}{v^2}\,\partial_t^2 - \begin{pmatrix} 1+2\varepsilon & 1 \\ 1+2\delta & 1 \end{pmatrix} \begin{pmatrix} \partial_x^2 + \partial_y^2 & 0 \\ 0 & \partial_z^2 \end{pmatrix}, \tag{8}$$

where $v$ is the velocity model that will be inverted for, while the anisotropy parameters are fixed.
The velocity model $v$ can then be updated using a conjugate gradient method and the gradient of the
objective function (4) w.r.t. $v$ can be calculated using

$$\begin{aligned}
\nabla_v J_\lambda[g(v), v] &\approx \nabla_v J_\lambda[g, v] \\
&= -\frac{2}{v^3}\left\langle \partial_t^2 S g,\; S^* P^* d_0 - S^* P^* P S g \right\rangle \\
&\approx -\frac{2}{v^3}\left\langle \partial_t^2 S \tilde{g},\; S^* P^* d_0 - S^* P^* P S \tilde{g} \right\rangle \\
&\approx -\frac{2\lambda^2}{v^3}\left\langle \partial_t^2 \tilde{u},\; \delta\tilde{f} \right\rangle + \frac{2}{v^3}\left\langle \partial_t^2 S f,\; S^* P^* P S\, \delta\tilde{f} \right\rangle .
\end{aligned}$$

This gradient computation can be easily extended to more general wave equations and used to perform
multi-parameter inversion for velocity and other earth models, such as anisotropy parameters,
attenuation quality factor, and/or density, either simultaneously or sequentially. When the penalty
factor $\lambda$ is large enough, RFWI and conventional FWI converge to very similar results. Thus the
penalty factor has to be chosen carefully to make RFWI produce favorable results and it varies with
iterations. The second term in the gradient computation may be ignored for certain $\lambda$ to reduce
the computation cost.

SYNTHETIC EXAMPLE

We first demonstrate the advantage of RFWI by applying it to a 2D synthetic data set and draw a
comparison with conventional FWI. The true model is a modified SMAART Pluto synthetic salt model as
shown in Figure 1(a), which was used to generate the field data set that has 250 shot gathers with a
shot spacing of 20 m. Each shot gather contains 600 receivers with an interval of 20 m. The lowest
frequency used for inversion was 4 Hz and the maximum offset was 6000 m. The initial velocity model is
displayed in Figure 1(b), which is a smoothed version of

the true model. If we compare the inverted models from 87 iterations of conventional FWI in Figure 1(c)
and 54 iterations of RFWI in Figure 1(d), we notice that RFWI recovers more detailed results with
faster convergence, especially for the salt body and sub-salt area.

[Figure 1 panels: (a) True model; (b) Initial model; (c) Inverted model from conventional FWI; (d) Inverted model from RFWI.]
Figure 1: Synthetic models

FIELD EXAMPLES

The second example involves an application of RFWI to 2D streamer data from offshore Mexico. 745 shots
were used with a shot spacing of 200 m. The lowest frequency used for inversion was 3 Hz. Figure 2(a)
shows the simple initial velocity model with a maximum depth of 6000 m. Figure 2(b) shows the inversion
result from RFWI. After the inversion using RFWI, not only has the shallow velocity been updated, but
the deeper structure has been refined as well. We then compare the stack images from the offset gathers
to better QC the results. The stack using the inverted model demonstrates better focusing compared to
the stack using the initial model, as pointed out by the red arrows in Figure 3. Finally, we forward
modeled the data using the initial and inverted models and generated shot gathers that are displayed in
Figure 4(a) and 4(b). Compared with the field data in Figure 4(d) for the same shot record, the forward
modeled data using the inverted model after RFWI fit the field data much better than those using the
initial model, with clear improvement in data matching for both the near and far offsets. If we further
investigate the RFWI results, the reconstructed forward modeled data that are extracted from the
reconstructed source wavefield using the RFWI inverted model, shown in Figure 4(c), have the best fit
to the acquired data. By reconstructing a more reliable wavefield, RFWI provides deep updates while
benefiting from both reflected and refracted energy.

[Figure 2 panels: (a) Initial; (b) RFWI inverted.]
Figure 2: Velocity models

[Figure 3 panels: (a) Using initial model; (b) Using RFWI inverted model.]
Figure 3: Stack images

We finally present an application of RFWI to 3D streamer data. This narrow azimuth survey was located
in the Campeche area offshore Mexico. We used 2280 sources with an interval of 500 m. The maximum
offset was 6200 m and the lowest frequency used for inversion was 3 Hz. Figure 5(a) shows the legacy
velocity model that was used as the initial model for RFWI with a maximum depth of 6000 m and Figure
5(b) shows the inverted model from RFWI. We then compare the offset gathers to QC the inversion
results. Comparing the shallow and top-salt events, the offset gathers using the RFWI inverted model
shown in Figure 6(b) are flatter than those using the initial model shown in Figure 6(a). Another tool
for QC of RFWI results is reverse time migration (RTM). Compared with the RTM image using the initial
model in Figure 7(a), the image using the RFWI inverted model in Figure 7(b) shows better image
focusing and event continuity at the top salt and also in the deeper area below the salt, as indicated
by the red rectangles. RFWI demonstrates advantages in providing higher-resolution velocity for areas
with strong velocity contrasts.

[Figure 4 panels: (a) Using initial model; (b) Using RFWI model; (c) Using reconstruction; (d) Field data.]
Figure 4: Shot gathers

[Figure 5 panels: (a) Initial; (b) RFWI inverted.]
Figure 5: Velocity models

[Figure 6 panels: (a) Using initial velocity model; (b) Using RFWI inverted velocity model.]
Figure 6: Offset gathers

[Figure 7 panels: (a) Using initial velocity model; (b) Using RFWI inverted velocity model.]
Figure 7: RTM images

CONCLUSION

We presented the methods and applications of our proposed novel inversion method, time domain RFWI.
RFWI helps avoid cycle skipping issues to overcome some of the problems with local minima and relaxes
the requirements for conventional FWI. It demonstrates more advantages in areas with strong velocity
contrasts, which makes it a beneficial method for velocity model building in the presence of salt.

ACKNOWLEDGEMENTS

We would like to thank SMAART for providing the Pluto synthetic model. The Campeche reimaging program
data was reprocessed and reimaged by ION in partnership with Schlumberger, who holds data licensing
rights. We also thank ION for permission to publish the results and our colleagues for providing
valuable discussion and support.

REFERENCES
Biondi, B., and A. Almomin, 2014, Simultaneous inversion of full data bandwidth by tomographic full
0340.1.
Huang, H., W. Symes, and R. Nammour, 2016, Matched source waveform inversion: Volume extension:
Lailly, P., 1983, The seismic inverse problem as a sequence of before-stack migrations, In: Bednar, J., ed.,
Conference on Inverse Scattering: Theory and Applications: Society for Industrial and Applied
Mathematics, 206–220.
van Leeuwen, T., and F. Hermann, 2013, Mitigating local minima in full-waveform inversion by
expanding the search space: Geophysical Journal International, 195, 661–667,
Shen, P., and W. Symes, 2008, Automatic velocity analysis via shot profile migration: Geophysics, 73,
no. 5, VE49–VE59, https://doi.org/10.1190/1.2972021.
Sirgue, L., and R. Pratt, 2004, Efficient waveform inversion and imaging: A strategy for selecting
1259–1266, https://doi.org/10.1190/1.1441754.
Wang, C., D. Yingst, J. Bai, J. Leveille, P. Farmer, and J. Brittan, 2013, Waveform inversion including
well constraints, anisotropy, and attenuation: The Leading Edge, 32, 1056–1062,
https://doi.org/10.1190/tle32091056.1.
Wang, C., D. Yingst, P. Farmer, and J. Leveille, 2016, Full-waveform inversion with the reconstructed
wavefield method: 86th Annual International Meeting, SEG, Expanded Abstracts,
Warner, M. and L. Guasch, 2016, Adaptive waveform inversion: Theory: Geophysics, 81, no. 6, R429–
R445, https://doi.org/10.1190/segam2014-0371.1.

Extending the reach of FWI with reflection data: Potential and challenges
Adriano Gomes* and Nicolas Chazalnoel, CGG
Summary

We present a Reflection FWI (RFWI) workflow to update the velocity model using the low-wavenumber
component of the FWI gradient of reflection data. This is achieved by alternately using high-wavenumber
and low-wavenumber components to update density and velocity models, respectively. With synthetic
examples, we discuss the limitations and requirements of this approach and propose possible ways to
overcome some of the limitations. Finally, the method is applied to a deep-water survey in the Gulf of
Mexico, where improvement is observed both in the migrated image and gathers.

Introduction

Despite the increasing popularity of full waveform inversion (FWI) (Tarantola, 1984), the improvements
it makes to the velocity model and seismic image are often insufficient to fully resolve the complexity
in deeper areas. This is due to the well-known depth limitation of the diving waves that are normally
used to drive FWI (Sirgue and Pratt, 2004), which is caused by practical constraints on the maximum
offset recorded in the seismic data and the local velocity regime, as well as the low signal-to-noise
ratio of the diving-wave low-frequency energy at large offsets (Dellinger et al., 2017).

One option to address this limitation is to look to reflection data, which contain information about
deeper events. However, the modeling of reflection data requires a reasonably accurate velocity model
and density/reflectivity model in order to avoid cycle-skipping at higher frequencies (Virieux and
Operto, 2009) and to model the correct relative amplitudes (Guitton, 2014). When these conditions are
met, reflection data can be used in standard FWI to add detailed features to the velocity model (Qin et
al., 2014). However, the high vertical-wavenumber component that dominates the FWI gradient of
reflection data has limited impact on the model kinematics.

In the last few years, several methods have been proposed to increase the significance of reflection
data in the FWI workflow, e.g., Xu et al. (2012), Tang et al. (2013), Brossier et al. (2014),
Alkhalifah et al. (2014), Irabor and Warner (2016), Vigh et al. (2016), and Ramos-Martinez et al.
(2016), among others. A common feature in all these methods is the extraction and/or enhancement of the
low-wavenumber component of the FWI gradient of reflection data. As shown by Mora (1989), reflection
data produce
and the low-wavenumber component, also known as the tomographic term or “rabbit ears” (Figure 1). This
tomographic term is generated along the reflection wavepath; therefore, it contains significant
information about the kinematics of the velocity model, including areas beyond the reach of diving
waves.

In this paper, an RFWI method to update the velocity model using the rabbit ears is presented. Based on
synthetic tests, the limitations and requirements of this approach are discussed. Finally, the method
is applied to a deep-water survey in the Gulf of Mexico.

Method

In practice, using the rabbit ears in RFWI has two basic conditions: it requires some model or gradient
decomposition to decouple the influence of the migration term from that of the tomographic term, and it
requires sharp boundaries in the model to generate the backscattered energy that will form the rabbit
ears.

The terms of the gradient can be distinguished by the direction of propagation of the source and
residual wavefields (Mora, 1989). The separation can be achieved by explicit model separation using the
Born approximation (Xu et al., 2012; Vigh et al., 2016) or by decomposition techniques, such as the
inverse-scattering imaging condition (Ramos-Martinez et al., 2016), a scattering-angle filter
(Alkhalifah et al., 2014), or wavefield decomposition (Tang et al., 2013; Irabor and Warner, 2016).

In our work, an up-down wavefield decomposition method proposed by Liu et al. (2011) is used to
separate the components of the gradient:

$$\begin{aligned}
g_t(\mathbf{x}) &= \frac{1}{2}\int_0^{t_{\max}} \left[ s(\mathbf{x},t)\,r(\mathbf{x},t) + H_z\big(s(\mathbf{x},t)\big)\,H_z\big(r(\mathbf{x},t)\big) \right] dt, \\
g_m(\mathbf{x}) &= \frac{1}{2}\int_0^{t_{\max}} \left[ s(\mathbf{x},t)\,r(\mathbf{x},t) - H_z\big(s(\mathbf{x},t)\big)\,H_z\big(r(\mathbf{x},t)\big) \right] dt,
\end{aligned} \tag{1}$$

where $s$ is the source wavefield, $r$ is the back-propagated residual wavefield, $H_z$ represents the
Hilbert transform in the $k_z$ direction, and $g_t$ and $g_m$ are the tomographic and migration terms,
respectively.

In order to produce the back-scattered energy necessary to generate the rabbit ears, a bootstrapping
approach is used to
two different components in the FWI gradient: the high- estimate the location of the reflectors. More specifically, in
wavenumber component, also known as the migration term, the first iteration, the high-wavenumber component gm of
© 2017 SEG Page 1454

Reflection FWI: Potential and challenges
the gradient is used to estimate a density model that will contain the necessary sharp contrasts. This is followed by a velocity update iteration, this time using the low-wavenumber component $g_t$ of the gradient. These iterations are then alternated, meaning that both the background velocity and reflector locations are sequentially updated, until a convergence criterion is reached.

Although the assumption that all reflection data are generated by density contrasts is not precise, the placement of the reflectors is consistent with the current velocity model; therefore, the traveltime information obtained with the estimated density model can be used to infer kinematic errors in the background velocity model.

In the proposed method, the least-squares objective function (Tarantola, 1984) is chosen as the misfit measurement between real and modeled data, though different objective functions can be used within the general RFWI framework (Brossier et al., 2014; Vigh et al., 2016).

Simple synthetic example

A simple two-layer model, consisting of a constant velocity of 2000 m/s with a density contrast at z = 3000 m, is used to illustrate the RFWI operation. In this test, two different initial models are compared, one with 5% lower velocity and another with 5% higher velocity. In both cases, an initial constant density model is provided to RFWI. The maximum offset in the data is 4000 m, and a maximum frequency of 10 Hz is used for the RFWI iterations.

For these tests, zero-offset data are used to update the density model at the first iteration, using the high-wavenumber component of the gradient. This is followed by a velocity update iteration. Because the reflector depth is self-derived from each initial model, both tests match the observed data at zero-offset. RFWI will then derive the velocity update from the data mismatch at different offsets. In practice, any offset group can be used as the reference to estimate the reflector location at the first iteration, although that does not guarantee convergence to the same final model, since some offsets might have large accumulated errors, increasing the chance of convergence to local minima if used as the reference. If all offsets are used in the first iteration, the stacked migration term is typically dominated by the near-offset data. However, for complex geologies, the curvature of migrated gathers can take more complicated shapes.

Figures 1a and 1b show the RFWI low-wavenumber velocity update using a single trace with 3000 m offset. Figures 1c and 1d show the update for 700 shots. The derived density model in the first iteration is shown in the background. In both cases, the correct update direction is obtained, i.e., speed up (red) for the 5% slower model and slow down (blue) for the 5% faster model. Nonetheless, it is important to note that, although the reflector location is self-derived from the velocity model, RFWI is still susceptible to cycle-skipping, as the timing error normally increases as we go further from the reference offset.

It is clear that despite having the potential to extend the maximum update depth beyond that of diving-wave FWI, RFWI is also subject to its own restrictions, due to the “tomographic” nature of the problem. In the following two sections, some limitations of this approach are revisited, focusing on the contribution provided by deeper layers.

Figure 1: RFWI update using: (a) and (c) 5% slower model; (b) and (d) 5% faster model. The top row corresponds to the gradient of a single shot, with a single offset of 3000 m. The bottom row corresponds to the gradient of all shots and offsets.

Resolution analysis

First, we analyze the wavenumber resolution of the RFWI gradient. For this purpose, we use the model shown in Figure 2, which contains three velocity anomalies with different wavenumber contents. The density model contains a single reflector at z = 10 km, indicated by the black line. From this model, we created a data set with maximum offset of 8 km and maximum frequency of 20 Hz.

Figure 2: (a) Velocity model with three different shaped anomalies.

Taking these parameters into account, the wavenumber resolution of the RFWI gradient (i.e., the model wavenumbers that are sampled by RFWI) is analytically calculated as in Zhou (2016) and shown in Figure 3. As previously stated, the RFWI gradient is formed along the reflection wavepath, i.e., by crosscorrelation of the scattered source wavefield and incident receiver wavefield (and vice-versa). As the reflector gets deeper, the reflection angle normally decreases, given the offset limitation in the recorded data set. Therefore, for mildly dipping events, the reflection wavepath becomes more vertical, which means that rapid horizontal variations can be naturally sampled by different scattering points along the reflector. However, rapid vertical variations are averaged out along the wavepath. This effect can be observed in Figure 3, in which

the RFWI gradient (in red) provides good coverage of the horizontal wavenumbers (kx), but it is concentrated around the low vertical wavenumbers (kz).

In addition to the RFWI gradient, the dominant wavenumbers (larger than -30 dB) of each anomaly in Figure 2 are calculated and plotted in Figure 3. Comparing the spectra, it is clear that Anomaly #1 is well aligned with the RFWI gradient, while the other two have many wavenumbers that are not sampled by the deep reflector.

Figure 4 shows the RFWI result, starting from a constant velocity of 2500 m/s, after a total of 35 iterations. As expected from the wavenumber analysis, while horizontal wavenumbers are well resolved, only the small vertical wavenumbers are recovered by RFWI. This is sufficient for Anomaly #1 but insufficient for Anomalies #2 and #3, which contain higher vertical wavenumbers. As a result, Anomaly #1 is well resolved and the other two are smeared vertically from the reflector location to the surface. However, a migration QC indicates that, for all three anomalies, the kinematics of the velocity model are well recovered at the reflector depth.

Figure 3: Wavenumber spectrum of RFWI gradient overlaid with spectra of velocity anomalies.

Figure 4: RFWI result after 35 iterations.

In practice, the spectrum of the RFWI gradient can be extended by the presence of additional reflectors at different depths and with varying dips (Alkhalifah, 2016). However, unlike tomographic methods based on residual moveout, in which each sensitivity kernel (i.e., the sensitivity of the data residual to the model parameters) is computed individually, the contribution of many kernels is calculated simultaneously in RFWI. As a result, the sensitivity kernel is more susceptible to the effects of amplitude imbalance, such as poor illumination or low reflectivity events. Ultimately, this can lead to an initial dominance by stronger events, which can introduce a bias towards certain wavenumbers. To alleviate this problem, strategies such as top-down inversion and regularization can be considered.

Reflector depth uncertainty

Another challenge faced by RFWI is the uncertainty regarding the true reflector depth. Unlike conventional diving-wave FWI, which only requires a smooth velocity, RFWI needs sharp contrasts in the model in order to generate the backscattered energy that forms the rabbit ears. Since the traveltimes depend on both velocity and reflector position, the non-linearity of the problem is increased, i.e., RFWI can converge to incorrect velocities and reflector depths that still match the traveltimes, just as ray-based reflection tomography can.

This problem is illustrated in Figure 5, in which RFWI using only deep reflectors is performed with (Figure 5c) and without (Figure 5d) a priori information about the reflector depths. Since the initial velocity error is large (up to 30%) and there are not enough events to fully constrain the inversion, RFWI without a priori information converges to an alternative model that does not give the correct stacked image, although it improves the flatness of the migrated gathers (Figure 5h). On the other hand, with a priori information about the reflector depth, RFWI correctly recovers the wavenumbers sampled by the deep reflector and is able to match the true image (Figure 5b) at that depth.

Figure 5: Migrated image and velocity perturbation: a) Initial, b) true model, c) RFWI with a priori information, d) RFWI without a priori information. e), f), g), and h) are SOGs corresponding to models a), b), c), and d) respectively. The location of the gathers is indicated by the arrows.

Although this velocity-depth ambiguity is well known in migration velocity analysis (MVA) methods (Stork, 1992), imposing constraints in RFWI is less straightforward since the contribution from many events is combined together. Therefore, for the moment, we recommend applying RFWI starting from a reasonably good initial model, in which the location and focusing of the reflectors are not too damaged.
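
The trade-off between velocity and reflector depth can be made concrete with straight-ray traveltimes. The following is a minimal sketch, not the authors' code: it assumes a single flat reflector in a constant-velocity layer and hyperbolic moveout, and the function names are hypothetical. The numbers are those of the simple synthetic example (2000 m/s, reflector at 3000 m, 5% velocity errors, offsets up to 4000 m).

```python
import math


def reflection_time(offset, depth, velocity):
    """Two-way traveltime (s) for a flat reflector in a constant-velocity layer."""
    return math.sqrt(offset ** 2 + 4.0 * depth ** 2) / velocity


def self_derived_depth(true_depth, true_velocity, trial_velocity):
    """Depth at which the trial velocity reproduces the zero-offset traveltime."""
    return true_depth * trial_velocity / true_velocity


def moveout_residual(offset, true_depth, true_velocity, trial_velocity):
    """Modeled minus observed traveltime once the zero-offset times are tied."""
    trial_depth = self_derived_depth(true_depth, true_velocity, trial_velocity)
    modeled = reflection_time(offset, trial_depth, trial_velocity)
    observed = reflection_time(offset, true_depth, true_velocity)
    return modeled - observed


v_true, z_true = 2000.0, 3000.0
for label, v_trial in (("5% slower", 0.95 * v_true), ("5% faster", 1.05 * v_true)):
    z_trial = self_derived_depth(z_true, v_true, v_trial)
    residuals_ms = [
        round(1e3 * moveout_residual(h, z_true, v_true, v_trial), 1)
        for h in (0.0, 2000.0, 4000.0)
    ]
    print(label, "reflector at", round(z_trial, 1), "m, residuals (ms):", residuals_ms)
```

Both trial models match the zero-offset time exactly with the wrong velocity and the wrong depth; only the offset-dependent residual distinguishes them, and its sign gives the update direction. Under these assumptions the residual at 4000 m offset is roughly 59 ms for the slower model and roughly -52 ms for the faster one, which is comparable to half a period at 10 Hz (50 ms).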
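
Returning to the gradient separation of equation 1 in the Method section, its behaviour can be checked on a toy problem. The sketch below is an illustration under stated assumptions, not the production implementation: wavefields are 1-D plane waves sampled as `field[it][iz]`, the time step is set to 1, and the Hilbert transform along z uses a naive DFT so that the example needs only the standard library (an FFT library would be used in practice). The names `hilbert`, `split_gradient`, and `plane_wave` are hypothetical.

```python
import cmath
import math


def hilbert(samples):
    """Hilbert transform of a real, periodic sequence via a naive DFT."""
    n = len(samples)
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * j / n) for j, x in enumerate(samples))
        for k in range(n)
    ]
    transformed = []
    for j in range(n):
        total = 0j
        for k, coeff in enumerate(spectrum):
            if k == 0 or 2 * k == n:
                continue  # zero and Nyquist wavenumbers have no Hilbert pair
            sign = 1.0 if 2 * k < n else -1.0
            total += -1j * sign * coeff * cmath.exp(2j * math.pi * k * j / n)
        transformed.append(total.real / n)
    return transformed


def split_gradient(source, residual):
    """Return (g_t, g_m) over depth for wavefields stored as field[it][iz]."""
    nz = len(source[0])
    g_t = [0.0] * nz
    g_m = [0.0] * nz
    for s_row, r_row in zip(source, residual):
        hs, hr = hilbert(s_row), hilbert(r_row)
        for iz in range(nz):
            direct = s_row[iz] * r_row[iz]
            paired = hs[iz] * hr[iz]
            g_t[iz] += 0.5 * (direct + paired)  # same vertical direction
            g_m[iz] += 0.5 * (direct - paired)  # opposite vertical directions
    return g_t, g_m


nz, nt, cycles = 32, 16, 4


def plane_wave(direction):
    """cos(k z - direction * w t): +1 travels towards +z, -1 towards -z."""
    return [
        [math.cos(2.0 * math.pi * (cycles * iz / nz - direction * it / nt))
         for iz in range(nz)]
        for it in range(nt)
    ]


down, up = plane_wave(+1), plane_wave(-1)
gt_same, gm_same = split_gradient(down, down)
gt_opp, gm_opp = split_gradient(down, up)
print("same direction:     max|g_t| =", max(map(abs, gt_same)), " max|g_m| =", max(map(abs, gm_same)))
print("opposite direction: max|g_t| =", max(map(abs, gt_opp)), " max|g_m| =", max(map(abs, gm_opp)))
```

In this toy setting, two fields travelling in the same vertical direction contribute only to the smooth term g_t, while fields travelling in opposite directions contribute only to the oscillatory term g_m, which is the behaviour the decomposition relies on.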

Real data example

Finally, we applied RFWI to a deep-water survey on the Mexican side of the Gulf of Mexico (GoM). The area of interest is located on the prolific Perdido fold belt. The water bottom depth ranges from 200 m to 3500 m. The seismic data were acquired using a flat-cable wide-azimuth (WAZ) acquisition configuration with maximum offset of 8.1 km along the cables and 4.2 km across the cables.

The initial model for RFWI (Figure 6a) was obtained after diving-wave FWI, along with velocity scans and ray-based tomography for the deeper shales (Chazalnoel et al., 2017). However, due to the complexity of the folds combined with the low reflectivity of the shales in the overburden, some discontinuities remain at the deep Wilcox and Cretaceous events (white arrows in Figure 6a). These discontinuities can also be observed on the gathers (Figure 6d).

RFWI was then performed from 4 to 7 Hz, using data after source and receiver deghosting, zero-phasing, and SRME demultiple. After RFWI application, a significant improvement is observed in the continuity of deeper events, both in the migrated image and gathers (Figures 6b and 6e). An analysis of the RFWI perturbation (Figure 6c) reveals more consistency with the structures in the fold area, while the perturbation in the deeper part consists mostly of low vertical wavenumbers. This is due to stronger contributions from the deep events around 10 km, compared with the low reflectivity shales between the shallow folds and the deep events. However, the recovered wavenumbers are still able to significantly improve the kinematics throughout the section, most notably at the Wilcox and Cretaceous but also in the shales, and the final velocity model has good consistency with the geology.

Discussion and conclusions

We have shown that RFWI has the potential to extend low-wavenumber updates of FWI to much deeper areas, beyond the reach of diving waves. In fact, RFWI shares many concepts with MVA methods, such as ray-based reflection tomography. However, since the contributions from many events are calculated simultaneously, RFWI is susceptible to the effects of amplitude imbalance, which can lead to limited vertical resolution and convergence to local minima. Therefore, at the current stage, RFWI can be viewed as a complement, rather than a replacement, to established velocity inversion methods. Nonetheless, the significant improvement obtained by RFWI in the real data example shows that this technique is worth understanding and improving further, as it could become a valuable tool for updating the deeper section of velocity models.

Acknowledgments

We thank CGG Multi-Client & New Ventures and the Mexican Comisión Nacional de Hidrocarburos for permission to show these results.

Figure 6: Vertical section with the velocity model overlaid on an RTM stack for: a) initial model, b) RFWI updated model, and c) RFWI velocity perturbation. RTM surface offset gathers over the same line from: d) initial model, and e) RFWI updated model.

REFERENCES
Alkhalifah, T., 2014, Scattering-angle based filtering of the waveform inversion gradients:
Geophysical Journal International, 200, 363–373, http://doi.org/10.1093/gji/ggu379.
Alkhalifah, T., 2016, Full-model wavenumber inversion: An emphasis on the appropriate
wavenumber continuation: Geophysics, 81, no. 3, R89-R98,
https://doi.org/10.1190/geo2015-0537.1.
Brossier, R., S. Operto, and J. Virieux, 2014, Velocity model building from seismic reflection
data by full-waveform inversion: Geophysical Prospecting, 63, 354-367,
https://doi.org/10.1111/1365-2478.12190.
Chazalnoel, N., A. Gomes, W. Zhao, and B. Wray, 2017, Revealing shallow and deep complex
geological features with FWI: Lessons learned: 79th Annual International Conference
and Exhibition, EAGE, Extended Abstracts, We-A3-02.
Dellinger, J., A.J. Brenders, J.R. Sandschaper, C. Regone, J. Etgen, I. Ahmed, and K.J. Lee,
2017, The Garden Banks model experience: The Leading Edge, 36, 151–158,
Guitton, A., 2014, On the velocity-density ambiguity in acoustic full-waveform inversion: 76th
Annual International Conference and Exhibition, EAGE, Extended Abstracts, We-E106-
03, https://doi.org/10.3997/2214-4609.20141082.
Irabor, K., and M. Warner, 2016, Reflection FWI: 86th Annual International Meeting, SEG,
Expanded Abstracts, 1136–1140, https://doi.org/10.1190/segam2016-13944219.1.
Liu, F., G. Zhang, S. Morton, and J. Leveille, 2011, An effective imaging condition for reverse-
time migration using wavefield decomposition: Geophysics, 76, no. 1, S29-S39,
https://doi.org/10.1190/1.3533914.
Mora, P., 1989, Inversion = migration + tomography: Geophysics, 54, 1575–1586,
https://doi.org/10.1190/1.1442625.
Qin, B., V. Prieux, H. Bi, A. Ratcliffe, J.P. Montel, D. Carotti, and G. Lambaré, 2014, Towards
high-frequency full waveform inversion - A case study: 76th Annual International
Conference and Exhibition, EAGE, Extended Abstracts, We-E106-07,
https://doi.org/10.3997/2214-4609.20141086.
Ramos-Martinez, J., N. Chemingui, S. Crawley, Z. Zou, A. Valenciano, and E. Klochikhina,
2016, A robust FWI gradient for high-resolution velocity model building: 86th Annual
Sirgue, L., and G. Pratt, 2004, Efficient waveform inversion and imaging: A strategy for
selecting temporal frequencies: Geophysics, 69, 231–248,
https://doi.org/10.1190/1.1649391.
Stork, C., 1992, Reflection tomography in the postmigrated domain: Geophysics, 57, 680–692,
https://doi.org/10.1190/1.1443282.

Tang, Y., S. Lee, A. Baumstein, and D. Hinkley, 2013, Tomographically enhanced full wavefield
inversion: 83rd Annual International Meeting, SEG, Expanded Abstracts, 1037–1041,
Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation:
Vigh, D., K. Jiao, X. Cheng, D. Sun, and W. Lewis, 2016, Earth-model building from shallow to
deep with full-waveform inversion: The Leading Edge, 35, 1025–1030,
https://doi.org/10.1190/tle35121025.1.
Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion in exploration
geophysics: Geophysics, 74, no. 6, WCC1–WCC26, https://doi.org/10.1190/1.3238367.
Xu, S., D. Wang, F. Chen, Y. Zhang, and G. Lambaré, 2012, Full waveform inversion for
reflected seismic data: 74th Annual International Conference and Exhibition, EAGE,
Extended Abstracts, W024, https://doi.org/10.3997/2214-4609.20148725.
Zhou, W., 2016, Velocity model building by full waveform inversion of early arrivals &
reflections and case study with gas cloud effect: Ph.D. thesis, Grenoble Alpes University.

Born Modeling based Adjustive Reflection Full Waveform Inversion
Dong Sun*, Kun Jiao, Xin Cheng, Zhen Xu, Luxin Zhang, Denes Vigh, Schlumberger

Summary

One of the most challenging tasks for full waveform inversion (FWI) is to construct background velocity models with reflections, especially in deep regions. To accomplish this task, we describe a reflection-based FWI algorithm to robustly build kinematically correct velocity models with reflected energy. This approach decomposes a subsurface model into a smooth background that is updated by means of minimizing a kinematics-oriented objective function, and a rough reflectivity that is computed through a migration at the current background. Based on this model decomposition strategy and the Born modeling, we can explicitly compute the low-wavenumber gradient components based on reflections, which cannot be achieved with conventional approaches. To guarantee that these low-wavenumber components contribute to updating the background in proper directions, an adjustive objective function is employed to robustly identify the kinematic discrepancies between the predicted and observed reflections. More specifically, our approach measures the moveout differences between the predicted and observed reflections in terms of local traveltime shifts, and then updates the background model to reduce such moveout discrepancies. Numerical experiments with both synthetic and real data demonstrate the success of the proposed algorithm for robustly constructing kinematically correct models in complex geological settings.

Introduction

Conventional full waveform inversion (FWI) makes inferences about subsurface models from recorded seismograms by solving a nonlinear least-squares (LS) optimization problem:

$$
\min_{m} \; \tfrac{1}{2}\,\| p[m] - d \|^2, \qquad (1)
$$

where m stands for subsurface models, d denotes observed seismograms, and p[m] is the prediction from wavefield simulations. Although FWI is capable of reconstructing high-resolution models, the LS objective is very ill-conditioned and has many spurious local minima for typical seismic data that have limited offsets and lack usable very-low-frequency energy. To be successful, conventional FWI usually requires a kinematically appropriate initial model, or that the recorded data contain sufficient transmitted energy that illuminates the target regions (usually in the shallow part of a model). In a previous study (Sun et al., 2016), we proposed a reflection-based full waveform inversion (RFWI) algorithm that used a data-domain differential semblance optimization to build kinematically correct models using reflected energy, especially for deep regions that cannot be illuminated by transmitted energy. This work further improves the robustness of our RFWI process by adopting the adjustive objective (Jiao et al., 2015) instead of the data-domain differential semblance objective.

As is well known, both low-wavenumber and high-wavenumber components reside in the gradient of conventional FWI (Mora, 1989). For data with sufficient transmissions (such as long-offset data with rich diving and refracted energy), the low-wavenumber components dominate this gradient and, thus, conventional LS-FWI or inversion with other alternative objectives can effectively build kinematically correct low-wavenumber models, especially for shallow regions well illuminated by transmitted energy (Luo and Schuster, 1991; Ravaut et al., 2004; Vigh and Starr, 2008; Sirgue et al., 2009; Vigh et al., 2011; Jiao et al., 2015). However, for data dominated by reflections, the high-wavenumber components dominate, and conventional data-fitting-based inversion is not amenable to providing low-wavenumber updates, especially in deep regions.

Among various strategies to promote low-wavenumber updates during inversion, we adopt the model decomposition strategy seen in some data domain approaches (Zhang et al., 2011; Xu et al., 2012; Ma and Hale, 2013; Brossier et al., 2014) and in various migration velocity analysis variants (Symes and Carazzone, 1991; Mulder and ten Kroode, 2002; Sava and Biondi, 2004; Symes, 2008; Biondi and Almomin, 2014) that decompose the subsurface model into a smooth background and a rough reflectivity. With this model decomposition and the Born approximation, the low-wavenumber background gradient can be explicitly computed based on reflections. The remaining question is how to ensure that this low-wavenumber gradient will contribute to updating the background model in correct directions (or, in other words, to avoid cycle skips).

To achieve this goal, a kinematics-driven objective function is desired. Because the moveouts of reflection events are mainly driven by model kinematics, we adopt the adjustive objective function discussed in our previous work (Jiao et al., 2015). This objective measures the local traveltime differences between predicted and observed reflections, and forms them as a function of space and time. Such a local traveltime function quantifies well the moveout discrepancies between predicted and observed reflections. Hence, the proposed RFWI algorithm updates the background model to minimize moveout discrepancies dominated by model kinematic errors. Applications of the new approach to both synthetic and real data sets demonstrate that the proposed algorithm can effectively reconstruct kinematically correct background models with reflected energy in complex geophysical settings.

Method

This section introduces the proposed RFWI strategy and its main components. Here, we consider that the observed data d mainly consists of reflections, and assume that the model m is composed of a smooth low-wavenumber component v (background model) and a rough high-wavenumber component r

RFWI
Figure 1: Results for model with 20% lower background velocity: (a) predicted and observed reflections; (b) computed local travel-
time shifts; (c) RFWI gradient (indicating positive updating direction). Results for model with 20% higher background velocity: (d)
predicted and observed reflections; (e) computed local traveltime shifts; (f) RFWI gradient (indicating negative updating direction).
(reflectivity), such that

$$
m = v(1 + r).
$$

Then, the predicted reflection $\delta p$ is computed through the Born modeling (or de-migration) procedure defined as:

$$
\delta p[v, r] := F[v]\,r,
$$

where $F[v]$ denotes the Born operator at background model $v$. The inverse problem we want to solve is:

$$
\text{Given observed reflections } d, \text{ find } v, r \text{ so that } F[v]\,r \text{ is close to } d. \qquad (2)
$$

Note that, in this work, for the sake of computational efficiency, we compute $r[v]$ through standard reverse time migration (RTM) instead of LS-RTM, i.e., $r[v] := F[v]^T d$.

The proposed RFWI updates $v$ through minimizing the traveltime shift between the Born prediction and observed reflections. In time domain, we view this traveltime shift as a local attribute and, hence, a function of both space and time that can be translated into the corresponding unwrapped instantaneous phase error indicating the local phase misalignment. This way, the new optimization problem can be formulated as:

$$
\min_{v} \; J[v] := \tfrac{1}{2}\,\| \Delta T(x,t) \|^2 = \tfrac{1}{2}\,\| \Delta T[\Delta\phi](x,t) \|^2, \qquad (3)
$$

where $\Delta T$ and $\Delta\phi$ are the local traveltime shift and instantaneous phase difference, respectively, between $\delta p$ and $d$.

Using the chain rule and standard adjoint state derivation, the gradient of $J[v]$ with respect to $v$ can be computed through:

$$
\nabla J = (D_v \delta p)^T \, (D_{\delta p} \Delta\phi)^T \, (D_{\Delta\phi} \Delta T)^T \, \Delta T, \qquad (4)
$$

where $D_v \delta p$, $D_{\delta p} \Delta\phi$, and $D_{\Delta\phi} \Delta T$ are the derivative operators of $\delta p$, $\Delta\phi$, and $\Delta T$, respectively. $(D_v \delta p)^T$, $(D_{\delta p} \Delta\phi)^T$, and $(D_{\Delta\phi} \Delta T)^T$ stand for the corresponding adjoint operators. Among them, the adjoint of the derivative operator $D_{\delta p} \Delta\phi$ can be written as:

$$
(D_{\delta p} \Delta\phi)^T q = \left\{ \frac{H[\delta p]}{E(x,t)^2}\, q + H\!\left[ \frac{q\,\delta p}{E(x,t)^2} \right] \right\},
$$

where $H[\cdot]$ stands for Hilbert transform, $E(x,t)$ denotes the envelope of $\delta p(x,t)$, and $q$ indicates any vector in the range of $(D_{\delta p} \Delta\phi)^T$. To compute gradient (4), we take

$$
q = (D_{\Delta\phi} \Delta T)^T \Delta T.
$$

Note that, in gradient (4), the operator $(D_v \delta p)^T$ explicitly generates the reflection-based low-wavenumber components, and $(D_{\Delta\phi} \Delta T)^T \Delta T$ dictates the updating direction.

Numerical examples

The first example is based on a simple 8 by 15 km velocity model with a constant background velocity at 2.28 km/s and one horizontal reflector at a depth of 4 km. There are 188 shots and 751 receivers evenly distributed at a depth of 10 m. We computed the true reflection $d$, the Born predictions $\delta p$, and the corresponding local traveltime shifts $\Delta T$ for two incorrect models whose velocities are 20% lower than and 20% higher than the true background velocity, respectively. Figures 1(a) and (d) demonstrate that the Born predictions $\delta p$ and $d$ share the same arrival time at offset 0 and start to diverge along different moveouts as offset increases, which is driven by different model kinematics. Such moveout discrepancies are quantified by local traveltime shifts $\Delta T$ between $\delta p$ and $d$ (Figures 1(b) and (e)). Figures 1(c) and (f) show the RFWI gradients for the two incorrect models. Apparently, those gradients based on the proposed objective indicate desired updating directions.

The second experiment is based on the Marmousi model (Figure 2(f)) with a fixed-spread acquisition geometry consisting of 151 shots and 301 receivers evenly distributed at a depth of 10 m. The target synthetic data is generated with a Ricker

Figure 2: (a) initial model; (b) RFWI inverted model; (c) RFWI + LS-FWI inverted model; (d) reflectivity at initial model; (e)
reflectivity at RFWI inverted model; (f) true model; (g) Born prediction at initial model; (h) target shot; (i) Born prediction at RFWI
inverted model; (j) LS-FWI only inverted model.
wavelet with 30 Hz maximum frequency and an isotropic finite-difference simulator that is different from the one used by our inversions. All of the following inversion tests run in a 3 to 18 Hz frequency band with 7 Hz as the dominant frequency. We first run the proposed RFWI starting from a 1D model (Figure 2(a)). The target reflection-dominant data is extracted from the full synthetic data by muting out diving waves and refractions beyond water bottom reflections. Figure 2(h) plots one of the target shot gathers and highlights the moveout of a major reflection with a green curve. After 20 RFWI iterations, the inverted background model (Figure 2(b)) already infers the correct velocity trend. Figures 2(d) and (e) show the reflectivities for the initial and RFWI inverted models, which demonstrate a clear uplift of the reflectivity image due to the background improvement. Figure 2(g) plots one of the shot gathers for the initial model, delineates the moveout of a major reflection with a red curve, and draws the target moveout curve in green; Figure 2(i) shows the shot gather for the RFWI inverted model in the same way as Figure 2(g). Clearly, RFWI greatly reduced the moveout discrepancies between the predicted and target reflections. Starting from the RFWI inverted model, conventional LS-FWI further improves the accuracy and resolution of the reconstructed model; the inverted model after 20 iterations of LS-FWI is plotted in Figure 2(c). On the contrary, starting directly from the initial background model (Figure 2(a)), conventional LS-FWI stalls at an incorrect model (Figure 2(j)). As shown, the proposed algorithm effectively corrects the erroneous background using reflection-dominant data; the reconstructed model ensures a successful application of conventional FWI to further improve model quality.

The third example is based on a wide-azimuth (WAZ) field dataset with approximately 9.6 km maximum offset acquired in the Gulf of Mexico. The geological environment has extensive salt/shale sheets with intervening deep-water sediment-filled mini-basins. This reflection-based inversion starts from a model computed from an early-arrival adjustive FWI (Jiao et al., 2015) that employed a simple 1D initial model and was performed in a multi-scale manner with two frequency bands, i.e., 10 iterations at 4 Hz and 7 iterations at 6 Hz. After the first stage inversion, the model updates were limited to a depth around 2.7 km. To further build velocity in the deep parts of the basins and beneath the salts, we exercised the proposed RFWI with dominant frequency at 4.5 Hz. After 12 iterations, the RFWI leads to promising mobile shale and subsalt improvements, as demonstrated by comparing Figures 3(a) and (b), which plot one inline of RTM images for the RFWI input and inverted models. The overlaid velocities show deep updates in regions where transmitted energy cannot reach. Due to those deep RFWI updates, the RTM image below the salt presents clear uplift in terms of reflector continuity and focusing, and thus the sub-salt structures become more prominent. What’s more, Figures 4(a) and (b) present the Kirchhoff gathers for the RFWI input and inverted models near the location indicated by the yellow shaded zone in Figures 3(a) and (b). Clearly, RFWI improves the model kinematics.
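
The local traveltime shifts used throughout these examples can be illustrated on a single pair of traces. The sketch below is only an assumption-laden illustration, not the adjustive objective of Jiao et al. (2015): it converts the instantaneous phase difference of two analytic signals into a time shift by dividing by 2πf0 for a nominal dominant frequency f0, skips phase unwrapping (so it is valid only for shifts under half a period), and masks samples with a weak envelope. The function names, the Gaussian-tapered wavelet, and the sign convention (positive when the observed event arrives later) are hypothetical choices; the naive DFT keeps the example within the standard library.

```python
import cmath
import math


def analytic_signal(trace):
    """Analytic signal of a real trace via a naive DFT (fine for short traces)."""
    n = len(trace)
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * j / n) for j, x in enumerate(trace))
        for k in range(n)
    ]
    # Keep zero/Nyquist once, double positive frequencies, drop negative ones.
    weights = [1.0 if (k == 0 or 2 * k == n) else (2.0 if 2 * k < n else 0.0)
               for k in range(n)]
    return [
        sum(w * c * cmath.exp(2j * math.pi * k * j / n)
            for k, (w, c) in enumerate(zip(weights, spectrum)) if w) / n
        for j in range(n)
    ]


def local_time_shift(predicted, observed, f0, floor=0.1):
    """Per-sample shift in seconds; positive when the observed event is later."""
    a_p, a_d = analytic_signal(predicted), analytic_signal(observed)
    peak = max(abs(a) for a in a_p + a_d)
    shifts = []
    for p, d in zip(a_p, a_d):
        if min(abs(p), abs(d)) < floor * peak:
            shifts.append(None)  # phase is unreliable where the envelope is weak
        else:
            dphi = cmath.phase(p * d.conjugate())  # phase(p) - phase(d)
            shifts.append(dphi / (2.0 * math.pi * f0))
    return shifts


def wavelet(n, dt, f0, t0, sigma=0.2):
    """Gaussian-tapered cosine centred at t0 (an illustrative stand-in wavelet)."""
    return [
        math.exp(-((i * dt - t0) / sigma) ** 2) * math.cos(2.0 * math.pi * f0 * (i * dt - t0))
        for i in range(n)
    ]


n, dt, f0 = 200, 0.01, 5.0
predicted = wavelet(n, dt, f0, t0=1.00)
observed = wavelet(n, dt, f0, t0=1.03)  # observed event arrives 30 ms later
shifts = local_time_shift(predicted, observed, f0)
print("shift at t = 1.0 s:", shifts[100])
```

Swapping the two traces flips the sign of the estimate, which is the property that lets a traveltime-based objective indicate the update direction.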

Figure 3: RTM image and velocity overlay for: (a) RFWI input model; (b) RFWI inverted model (after 12 iterations)
Conclusions
We present a robust RFWI algorithm. The approach employs

the Born modeling based simulation kernel to explicitly gen-
erate a low-wavenumber background gradient. To ensure that
this low-wavenumber gradient contributes to updating the back-
ground in the correct direction (, that is, to mitigate the cycle-
skipping issue), we adopt the adjustive objective function to
measure the moveout discrepancies between predicted and ob-
served reflections in terms of local traveltime shifts. Based
on this kinematics-oriented objective function, the proposed
RFWI updates the background through minimizing the move-
out discrepancies that are driven by the kinematic model er-
rors. Numerical experiments with both synthetic and real ex-
periments demonstrate the effectiveness and robustness of the
proposed algorithm in constructing kinematically correct mod-
els (especially in deep regions) in complex geophysical set-
tings. Starting from a background model from the RFWI pro-
cess, conventional FWI could further improve model quality.
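
The idea of measuring moveout discrepancies as local traveltime shifts can be illustrated with a small sketch. The code below is not the adjustive objective function itself; it is a generic windowed, normalized cross-correlation picker, and the function names (`ricker`, `local_time_shifts`), the window length, and the lag range are all assumptions made for illustration.

```python
import numpy as np

def ricker(t, f0):
    """Ricker wavelet with peak frequency f0 (Hz), centered at t = 0."""
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def local_time_shifts(pred, obs, dt, win, max_lag, min_energy=1e-3):
    """Per window, find the lag (s) that best aligns pred with obs.

    A positive shift means the observed event arrives later than predicted.
    Windows with negligible observed energy return NaN.
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    floor = min_energy * np.max(obs ** 2)
    centers, shifts = [], []
    for start in range(max_lag, len(obs) - win - max_lag + 1, win // 2):
        o = obs[start:start + win]
        oo = np.dot(o, o)
        best_lag, best_cc = np.nan, -np.inf
        if oo > floor:
            for lag in range(-max_lag, max_lag + 1):
                p = pred[start - lag:start - lag + win]
                pp = np.dot(p, p)
                if pp > floor:
                    cc = np.dot(o, p) / np.sqrt(oo * pp)  # normalized correlation
                    if cc > best_cc:
                        best_cc, best_lag = cc, lag
        centers.append((start + win / 2) * dt)
        shifts.append(best_lag * dt)
    return np.array(centers), np.array(shifts)

# Hypothetical test traces: three reflections, observed data delayed by 5 samples.
dt = 0.004
n = 1500
t = np.arange(n) * dt
pred = np.zeros(n)
for t0 in (1.2, 2.8, 4.4):
    pred += ricker(t - t0, 10.0)
obs = np.zeros(n)
obs[5:] = pred[:-5]
centers, shifts = local_time_shifts(pred, obs, dt, win=100, max_lag=20)
```

In a kinematics-oriented objective, shifts of this kind (rather than sample-by-sample amplitude differences) would be the quantity driven toward zero, which is what makes the measure less sensitive to cycle skipping.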
Acknowledgements
The authors would like to thank Schlumberger for support and

permission to present this work.
Figure 4: Kirchhoff Gathers for: (a) RFWI input model; (b)

RFWI inverted model (after 12 iterations).

EDITED REFERENCES
REFERENCES
Biondi, B., and A. Almomin, 2014, Simultaneous inversion of full data bandwidth by tomographic full-
0340.1.
full-waveform inversion: Geophysical Prospecting, 63, 354–367, https://doi.org/10.1111/1365-
2478.12190.
Jiao, K., D. Sun, X. Cheng, and D. Vigh, 2015, Adjustive full waveform inversion: 85th Annual
https://doi.org/10.1190/1.1443081.
waveform inversion: Geophysics, 78, no. 6, R223–R233, https://doi.org/10.1190/geo2013-
0004.1.
https://doi.org/10.1190/1.1442625.
Mulder, W. A., and A. P. E. ten Kroode, 2002, Automatic velocity analysis by differential semblance
optimization: Geophysics, 67, 1184–1191, https://doi.org/10.1190/1.1500380.
Ravaut, C., S. Operto, L. Improta, J. Virieux, A. Herrero, and P. Dell’Aversana, 2004, Multiscale imaging
of complex structures from multifold wide-aperture seismic data by frequency-domain full-
waveform tomography: Application to a thrust belt: Geophysical Journal International, 159,
1032–1056, https://doi.org/10.1111/j.1365-246X.2004.02442.x.
Sava, P., and B. Biondi, 2004, Wave-equation migration velocity analysis. I. theory: Geophysical
Prospecting, 52, 593–606, https://doi.org/10.1111/j.1365-2478.2004.00447.x.
Sirgue, L., O. I. Barkved, J. P. Gestel, O. J. Askim, and J. H. Kommedal, 2009, 3D waveform inversion
on Valhall wide-azimuth OBC: 71st Annual International Conference and Exhibition, EAGE,
Extended Abstracts, https://doi.org/10.3997/2214-4609.201400395.
Sun, D., K. Jiao, X. Cheng, and D. Vigh, 2016, Reflection based waveform inversion: 86th Annual
Symes, W. W., 2008, Migration velocity analysis and waveform inversion: Geophysical Prospecting, 56,
765–790, https://doi.org/10.1111/j.1365-2478.2008.00698.x.
Symes, W. W., and J. J. Carazzone, 1991, Velocity inversion by differential semblance optimization:
Vigh, D. and E. W. Starr, 2008, 3D plane-wave full-waveform inversion: Geophysics, 73, no. 5, VE135–
VE144, https://doi.org/10.1190/1.2952623.
Vigh, D., J. Kapoor, N. Moldoveanu, and H. Li, 2011, Breakthrough acquisition and technologies for
subsalt imaging: Geophysics, 76, no. 5, WB41–WB51, https://doi.org/10.1190/geo2010-0399.1.
Xu, S., D. Wang, F. Chen, G. Lambare, Y. Zhang, 2012, Inversion on reflected seismic wave: 82nd

Zhang, S., G. Schuster, Y. Luo, 2011, Wave-equation reflection traveltime inversion: 81st Annual
International Meeting, SEG, Expanded Abstracts, 2705–2710, https://doi.org/10.1190/1.3627756.

Seeing below the diving wave penetration with full waveform inversion
Denes Vigh*, Kun Jiao, Xin Cheng, Dong Sun and Lu Xin Zhang, Schlumberger WesternGeco
Summary

Full waveform inversion (FWI) is a high-resolution model building technique that uses the entire seismic record content to build the earth model. Conventional FWI usually utilizes diving and refracted waves to update the low-wavenumber/background components of the model; however, the update is often depth limited due to the limited offset range acquired. To extend conventional FWI beyond these limits of the transmitted energy, we must use reflection data as well. Field data examples demonstrate that, even in a complex subsalt Gulf of Mexico setting, the background velocity model can be updated from shallow to deep water using conventional FWI followed by reflection-based FWI.

Introduction

FWI has emerged over the last decade as a high-end tool for high-resolution, complex model building. As a data-fitting algorithm, FWI exploits the full content of the recorded seismic waveform to derive subsurface earth models. Diving waves, pre-critical and post-critical reflections, and diffractions all carry different information about the subsurface. The oil and gas industry has seen very successful applications of FWI in different geological environments with various data acquisition methods. Usually, FWI works best with data acquisition techniques that can deliver long-offset and/or low-frequency seismic data to construct the deep parts of the subsurface and to mitigate the cycle-skip issue due to the nonlinear nature of FWI. In the past decade, successful FWI applications have all used the early arrivals, diving waves, and refractions to update the shallow parts of the model. The remaining question is how to achieve meaningful deep updates beyond those that conventional FWI can provide, using data-domain inversion rather than relying on image-domain approaches such as ray-based tomography or wave-equation migration velocity analysis. This issue becomes significant when the data do not have long offsets and are dominated by reflections. To use reflection data to update the low-wavenumber model in FWI, Xu et al. (2012) and Brossier et al. (2014) proposed a method to decompose the model representation into a background model that governs the kinematics, and a reflectivity model that governs the dynamics of the wavefield. During FWI iterations, both the background model and the reflectivity model are updated. In our proposed reflection FWI approach, long- and short-wavelength model updates are decoupled; the long-wavelength information contributes to background model updates while the short-wavelength content contributes to the reflectivity updates. While deriving the deep updates, both the background and the reflectivity changes are subject to quality control measures to ensure geologic consistency. In this paper, we demonstrate successful applications of conventional FWI followed by the reflection FWI workflow in real data environments. These inversions start from very simple models without the standard tomographic model building process.

Method

The recovered wavelengths in FWI are heavily influenced by the subsurface, the local velocity, and the illumination angular aperture. For transmissions and refractions, the large illumination angle apertures facilitate the reconstruction of the long-wavelength parts of velocity models. For reflections, only short-wavelength parts tend to be recovered by FWI due to the narrow range of reflection angle apertures. This explains why FWI recovers long-wavelength components only in shallow areas and why its resolving area improves when longer-offset data and lower frequencies are present. Unfortunately, for conventional streamer data, the maximum offset is usually limited to 8 km, in which case FWI updates are limited to the shallow section, especially in deep-water environments. To achieve meaningful deep updates, reflection data must be incorporated.

FWI updates obey different sensitivity kernels with respect to different types of wavefield components, which is demonstrated by the following simple numerical experiment. Assuming that the velocity model is composed of a homogeneous background model with a reflector included in density, the complete FWI sensitivity kernel (Figure 1a) is made up of the primary sensitivity kernel ($S_1 \otimes R_1$) and the secondary sensitivity kernels ($S_2 \otimes R_3$, $S_3 \otimes R_2$, $S_2 \otimes R_2$, and $S_3 \otimes R_3$). The primary sensitivity kernel $S_1 \otimes R_1$ is formed by correlation of transmitted source and receiver wavefields. This primary sensitivity kernel has a long-wavelength form and is also the dominant part in conventional FWI. Beneath the primary sensitivity kernel, there are two secondary sensitivity kernels. The first is long-wavelength and is built by the correlation of the downgoing source-side wavefield and the

Deep updates via FWI
upgoing reflected receiver wavefield $S_2 \otimes R_3$ and the correlation of the upgoing source-side reflected wavefield and the downgoing receiver-side wavefield $S_3 \otimes R_2$. The second is short-wavelength and is formed by the correlation of the downgoing source-side wavefield and downgoing receiver-side wavefield $S_2 \otimes R_2$, and the correlation of upgoing reflected source wavefield and upgoing reflected receiver wavefield $S_3 \otimes R_3$. The first described above provides the long-to-intermediate wavelength updates, while the second provides the short-wavelength updates. One of the key issues for reflection FWI is to separate the sensitivity kernels so that FWI can use mainly the $S_2 \otimes R_3$ and $S_3 \otimes R_2$ kernels to update models. This is a challenge especially for conventional FWI, because the updates will be dominated by the primary sensitivity kernel as shown in Figure 1a. In our proposed reflection FWI, a Born modeling-based algorithm can explicitly compute the kernel that provides the long-wavelength updates plotted in Figure 1b.

This kernel clearly illustrates that the resolving power of reflection is very different from the primary refracted diving wave (the primary sensitivity kernel). It uses the reflector-to-surface transmitted wave paths, which are dominated by the first Fresnel zones associated with a set of virtual sources located at the reflector and virtual point receivers located at the source position and at the real receiver position, respectively. This example also highlights the key role of the reflectivity model acting as secondary sources in depth to update the long-to-intermediate wavelengths of the subsurface model. The reflectivity can be computed in different ways, such as by migration algorithms.

Field data validation

The field example consists of a 2 x 4 linear wide-azimuth (WAZ) data set with approximately 9.6 km maximum offset acquired in the Gulf of Mexico. The gun array, the shot depth, and the cable depth allowed us to record low frequencies of about 2.5 to 3.5 Hz. In spite of the rich low-frequency content, we elected to start the conventional FWI from 4 Hz because the adjustive (cycle-skip mitigated) option (Jiao et al., 2015) was employed from a 1D type of starting velocity field (Figure 2a). First, we used the early arrivals as input to FWI from wide-azimuth data. After 10 iterations at 4 Hz, we increased the frequency to 6 Hz in the multiscale manner to achieve higher resolution in the velocity update with a further 7 iterations. After the first two frequency bands using early arrivals for the FWI update (Figure 2c), the updates were limited to 2.7 km. When the depth slice difference between the starting and the updated model is interrogated (Figure 2b and Figure 2d), the first obvious conclusion is that FWI picked up the small- and medium-scale high-velocity carbonate carapaces and the low-velocity shale bodies from the simple starting velocity field.

To extend the updates to the deep part of the basins and under the salt, we switched to the reflection-based FWI. The reflection-based FWI was run through 12 iterations at 4.5 Hz. In the deep part of the section, we separated the low and high wavenumbers such that the low wavenumbers contributed to the velocity field while the high wavenumbers contributed to the reflectivity field. This product of reflection FWI may be used as a quality control because it is proportional to the image. The deep update is demonstrated by mobile shale and subsalt improvements in terms of imaging and velocity update. The initial velocity field was fine-tuned using reflection-based FWI to achieve the final updated velocity field (Figure 3b). This was used to compare the initial image (Figure 3a) with the final velocity image (Figure 3b); the deep part of the model at the Cretaceous depth shows improvements in reflector continuity and focusing after the deep reflection FWI updates in the shale section. The second enhancement was seen subsalt: inputting a simple subsalt velocity trend to the reflection FWI changed the shape of the reservoir after 12 iterations of velocity update (Figure 4b versus Figure 4a), and the salt feeder between two salt bodies was reinserted by FWI, producing the deeper imaging improvement below the reinserted salt feeder.

Conclusions

Conventional FWI is limited by diving-wave penetration, which may reach only a few kilometers below the sea floor and is too shallow in a deep-water environment. To extend this, reflections should be taken into account with the correct sensitivity kernel. This enables FWI to update the deeper section. Using the reflectivity and the transmitted part of the reflection ray-path, we can update both the low- and the medium-wavenumber parts of the velocity field beyond the diving wave sensitivity kernel.

Acknowledgements

The authors would like to thank WesternGeco for the permission to publish this paper, and also thank the project team.
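
The aperture argument in the Method section can be made quantitative with the standard single-scattering relation between the local opening angle θ (between source-side and receiver-side rays at a scattering point) and the wavenumber sampled there, |k| = (2ω/v) cos(θ/2). The sketch below is an illustration under that relation only; the function name and the 2000 m/s velocity are assumptions, not values from the field example.

```python
import numpy as np

def resolved_wavelength(velocity, frequency, opening_angle_deg):
    """Model wavelength (m) sampled at one frequency and opening angle.

    Uses |k| = (2 * omega / v) * cos(theta / 2), i.e.
    wavelength = v / (2 * f * cos(theta / 2)).
    """
    theta = np.radians(opening_angle_deg)
    return velocity / (2.0 * frequency * np.cos(theta / 2.0))

# Hypothetical numbers: 4 Hz data in a 2000 m/s medium.
for angle in (0.0, 60.0, 120.0, 160.0):
    print(angle, round(float(resolved_wavelength(2000.0, 4.0, angle)), 1))
```

Small opening angles (near-offset reflections) sample short model wavelengths, while opening angles approaching 180 degrees (transmissions and diving waves) sample progressively longer ones, which is consistent with the statement that reflections alone tend to recover only the short-wavelength part of the model.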

Figure 1: (a) Full-wave FWI kernel, (b) Reflection FWI kernel.
Figure 2: (a) Initial 1D velocity model vertical section, (b) Initial 1D velocity model depth slice at 2 km, (c) FWI updated velocity after 2
frequency band update in vertical section, (d) FWI updated velocity after 2 frequency band update depth slice at 2 km.

Figure 3: (a) Input model to reflection FWI with image overlay in the shale area, (b) Reflection FWI updated model with image overlay in the
shale area.
Figure 4: (a) Input model to reflection FWI with image overlay in the subsalt area, (b) Reflection FWI updated model with image overlay in the
subsalt area.

EDITED REFERENCES
REFERENCES
full-waveform inversion: Geophysical Prospecting, 63, 354–367, http://doi.org/10.1111/1365-
2478.12190.
Jiao, K., D. Sun, X. Cheng, and D. Vigh, 2015, Adjustive full waveform inversion: 85th Annual
International Meeting, SEG, Expanded Abstracts, 1091–1095, http://doi.org/10.1190/segam2015-
5901541.1.
Xu, S., F. Chen, G. Lambaré, Y. Zhang, and D. Wang, 2012, Inversion on reflected seismic wave: 82nd
Annual International Meeting, SEG, Expanded Abstracts, 1–7, http://doi.org/10.1190/segam2012-
1473.1.

Least-squares reverse-time migration guided full-waveform inversion
Benxin Chi*, Kai Gao, and Lianjie Huang, Los Alamos National Laboratory, Los Alamos, NM 87545
Summary

Full-waveform inversion (FWI) has become a powerful tool for high-resolution velocity building. However, FWI suffers from the local-minima problem, particularly when the initial velocity model is inaccurate and low-frequency data are absent. To alleviate this problem and improve the convergence rate, we develop a new full-waveform inversion method using least-squares reverse-time migration (LSRTM) to guide interface updates and an efficient, implicit wavefield separation scheme to alternately update the low-wavenumber and high-wavenumber components of velocity models. During each iteration step, our new method first employs a migration-like kernel to update high-wavenumber velocity perturbations using LSRTM, and then utilizes a tomography-like kernel to recover the low-wavenumber background velocity. To accurately compute these two types of kernels, we employ an efficient, implicit wavefield-separation scheme, rather than using the reflection waveform inversion under the Born approximation. We validate our new FWI method using synthetic data for the Marmousi model. We demonstrate that our new method is more robust than conventional FWI, particularly when initial velocity models are poor and low-frequency data are not available.

Introduction

Full-waveform inversion (FWI) has the potential to build a high-resolution velocity model by iteratively minimizing the misfit between the observed and synthetic data (Tarantola, 1984; Virieux and Operto, 2009). This is because FWI contains both migration and tomographic components (Mora, 1989). However, to reconstruct an accurate subsurface velocity model, a sufficiently accurate initial velocity model is needed for FWI, such that all synthetic arrivals are misaligned with the observed data by less than half a period. Satisfying this requirement is seldom possible for practical applications, partially because the initial velocity model usually lacks high-wavenumber components, or interfaces. FWI produces high-wavenumber migration-like updates when forward- and back-propagated wavefields travel in opposite directions, and yields low-wavenumber tomographic model updates when the forward- and back-propagated wavefields propagate in the same direction (Tang et al., 2013). However, the conventional FWI gradient is often dominated by high-wavenumber updates and fails to update the background velocity model, since the desirable tomographic component is one order of magnitude smaller than the migration-like component.

To tackle this problem, there are two major categories of methods for separating the migration and tomographic components of gradients in FWI. The first one is the wavefield decomposition method (Wang et al., 2016). Several wavefield-separation approaches have been developed for reverse-time migration (RTM), using the Poynting vector (Yoon and Marfurt, 2006; Chen and Huang, 2014), Fourier transform (Liu et al., 2011; Tan and Huang, 2014), or analytic wavefields (Fei et al., 2015; Shen and Albertin, 2015). The other one is called reflection waveform inversion (RWI) (Xu et al., 2012; Alkhalifah and Wu, 2016; Chi et al., 2016). RWI uses migration and demigration to predict reflections, and back-propagates the residuals along the reflection wavepath. However, RWI is based on the Born approximation and cannot properly handle refraction and reflection data simultaneously.

We develop a new LSRTM-guided FWI method. We decompose the FWI gradient into the migration-like and tomographic components using an efficient, implicit wavefield separation scheme. For a given initial model, we first perform least-squares reverse-time migration (LSRTM) to update the model perturbations using the migration kernel. The velocity perturbation updates of LSRTM enable us to compute synthetic reflections to match the observed data, and the reflectors of LSRTM guide us to update the low-wavenumber component of the model above the reflectors using the tomographic kernel. In our method, the model perturbation and the background velocity are updated alternately. We use synthetic seismic data for the Marmousi velocity model to validate the improved capability of our new FWI method. We compare our inversion results with those obtained using the conventional FWI. Our numerical examples demonstrate that our new LSRTM-guided FWI method produces more accurate velocity models than the conventional FWI.

Theory

Conventional FWI uses the zero time-lag cross-correlation between the forward-propagated source wavefield and the back-propagated adjoint wavefield to form gradients for model update. Mora (1989) showed that an FWI kernel $G_{\mathrm{full}}$ is equivalent to the summation of migration and reflection tomography:

$$G_{\mathrm{full}} = G_{\mathrm{mig}} + G_{\mathrm{tomo}}, \tag{1}$$

where appropriate coefficients are suppressed for brevity. The migration kernel and the tomographic kernel are expressed as:

LSRTM-guided FWI
$$G_{\mathrm{mig}} = G_{du} + G_{ud} = \sum_{N_s, N_r} \int_0^T \left( u_{s,d}\, u_{r,u} + u_{s,u}\, u_{r,d} \right) dt, \tag{2a}$$

$$G_{\mathrm{tomo}} = G_{dd} + G_{uu} = \sum_{N_s, N_r} \int_0^T \left( u_{s,d}\, u_{r,d} + u_{s,u}\, u_{r,u} \right) dt, \tag{2b}$$

where the subscripts $s$ and $r$ represent the source and receiver, respectively, and the subscripts $u$ and $d$ represent the up- and down-going components. In the framework of the adjoint-state method, the receiver wavefield is a result of the misfit signals back-propagating from the receiver location as the adjoint source. Equation (2) shows that the migration kernel is the cross-correlation result of the source and receiver wavefields propagating in different spatial directions, while the tomographic kernel results from those wavefields propagating in the same spatial directions.

Separating $G_{du} + G_{ud}$ from $G_{dd} + G_{uu}$ is not a trivial task. We employ the approach of Fei et al. (2015). This approach was originally developed for isolating only the cross-correlation between the down-going source wavefield and the up-going receiver wavefield for RTM:

$$I_{du} = \sum_{N_s, N_r} \int_0^T \Big[ u_s u_r - H_z(u_s)\, H_z(u_r) - u_s\, H_z\big(H_t(u_r)\big) - H_z(u_s)\, H_t(u_r) \Big]\, dt, \tag{3}$$

where $H_z$ and $H_t$ are Hilbert transforms in the depth and time domains, respectively. The above imaging condition is derived based on the so-called extended (or analytic) wavefields in the depth and time domains. Analogously, the cross-correlation between the up-going source wavefield and the down-going receiver wavefield can be expressed as

$$I_{ud} = \sum_{N_s, N_r} \int_0^T \Big[ u_s u_r - H_z(u_s)\, H_z(u_r) + u_s\, H_z\big(H_t(u_r)\big) + H_z(u_s)\, H_t(u_r) \Big]\, dt. \tag{4}$$

Combining Equations (3) and (4) results in

$$I_{du} + I_{ud} = 2 \sum_{N_s, N_r} \int_0^T \Big[ u_s u_r - H_z(u_s)\, H_z(u_r) \Big]\, dt. \tag{5}$$

Similarly, we obtain

$$I_{dd} + I_{uu} = 2 \sum_{N_s, N_r} \int_0^T \Big[ u_s u_r + H_z(u_s)\, H_z(u_r) \Big]\, dt. \tag{6}$$

The wavefield separation in Equations (5) and (6) is accomplished using an implicit scheme, i.e., neither $u_d$ nor $u_u$ is separated from the total wavefield. Compared to Shen and Albertin (2015), Equations (5) and (6) do not require an additional wavefield propagation process, and the only additional computational cost comes from the computation of Hilbert transforms, which can be efficiently achieved with discrete convolution.

Using Equations (5) and (6), we obtain the migration-like kernel and tomography-like kernel as

$$G_{\mathrm{mig/tomo}} = \frac{1}{V^3} \sum_{N_s, N_r} \int_0^T \left[ \frac{\partial^2 u}{\partial t^2}\, u^{\dagger} + \alpha\, H_z\!\left( \frac{\partial^2 u}{\partial t^2} \right) H_z\!\left( u^{\dagger} \right) \right] dt, \tag{7}$$

where $\alpha = -1$ for the migration-like kernel, and $\alpha = 1$ for the tomography-like kernel. If we set $\alpha = 0$, the kernel becomes the conventional FWI kernel. The above kernels could also be obtained using other possible approaches such as that based on the Fourier transform (Liu et al., 2011). The advantage of using Equation (7) is that its computational efficiency is higher than the others.

Figure 1 shows the difference between the migration-like and tomography-like kernels for a simple two-layer model. It illustrates that the high-wavenumber migration isochrone (Figure 1a) and the low-wavenumber wavepath associated with the direct waves and backscattering reflections (Figure 1b) can be separated from the full mixed kernel (Figure 1c) accurately using Equation (7).

Figure 1: Migration-like (a) and tomography-like (b) kernels calculated using Equation (7) together with the conventional FWI kernel (c).

The most important advantage of our LSRTM-guided FWI is that we can easily divide our FWI iterations into two stages of velocity updates during each FWI iteration: the LSRTM stage and the tomography stage. In the LSRTM stage, we use Equation (7) with $\alpha = -1$ to update the high-
wavenumber model perturbation $\delta V$. After a few LSRTM iterations, we turn to the tomography stage using Equation (7) with $\alpha = 1$ to update the low-wavenumber background velocity $V_0$ above the reflectors of LSRTM. Consequently, the model perturbation and the background velocity are updated alternately.

We use synthetic seismic data for a layered model and the Marmousi model to validate our new FWI method.

Numerical Examples

We first use a simple 2D synthetic example to show migration-like and tomography-like gradients of our LSRTM-guided FWI. Figure 2 depicts a four-layer model containing a 5% negative anomaly and a 5% positive anomaly (125 m/s) in the second and third layer relative to the first layer, respectively. The initial velocity model for FWI is a homogeneous model with the velocity of the first layer. The source wavelet is a Ricker time function with a center frequency of 15 Hz.

Figure 2: A four-layer velocity model.

Figure 3: The migration-like gradient (a) and tomography-like gradient (b) of LSRTM-guided FWI for the layered model in Figure 2.

Figure 3a is the migration-like gradient, resulting in high-wavenumber model perturbation updates. The tomography-like gradient of our LSRTM-guided FWI in Figure 3b provides smooth velocity updates above the reflectors.

Next, we test our LSRTM-guided FWI method on the Marmousi model (Figure 4a). The grid interval for both the horizontal and vertical directions is 12.5 m. We position 50 shots with a spatial interval of 100 m at the depth of 12.5 m and 400 receivers at all grid points from a distance of 0 to 5 km also at the depth of 12.5 m. A Ricker wavelet with a center frequency of 15 Hz is used for the modeling and inversion. The initial model is a Gaussian-smoothed model (Figure 4b). To illustrate the effectiveness of our LSRTM-guided FWI method, the inversion experiment is performed without using any multi-scale strategy.

Figure 4: The Marmousi velocity model (a) and the initial model (b) for FWI.

Because the initial model deviates substantially from the true model and no multi-scale strategy is employed, the conventional FWI produces reasonable updates mainly in the shallow region of the Marmousi model. The target area below 1 km, including the anticline structures, is not recovered, as indicated by the inverted model shown in Figure 5a.

In contrast, our new LSRTM-guided FWI yields a significantly improved inversion result as depicted in Figure 5b. The inversion artifacts in the shallow region of the model in Figure 5a disappear in Figure 5b. In addition, the anticline structures in the deep region of the model are well reconstructed using our new method. This can be clearly observed in the zoom-in figures of the inversion results, as shown in Figure 6.
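
The implicit separation behind Equations (5) and (6), on which both gradients in these examples rely, can be checked on a toy problem. The sketch below is not the authors' implementation: it builds idealized plane-wave pulses on a periodic depth-time grid (an assumption that keeps the FFT-based Hilbert transform exact), and the names `split_kernels`, `down_a`, `down_b`, and `up` are hypothetical. It uses `scipy.signal.hilbert`, whose imaginary part is the Hilbert transform along the chosen axis.

```python
import numpy as np
from scipy.signal import hilbert

def ricker(t, f0):
    """Ricker wavelet with peak frequency f0 (Hz), centered at t = 0."""
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

n = 512                       # samples in depth and in time (periodic toy grid)
dt = 0.002                    # time step (s); depth step is taken as c * dt
w = ricker((np.arange(n) - n // 2) * dt, 15.0)
w -= w.mean()                 # remove DC so the discrete Hilbert pair is exact

iz, it = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
down_a = w[(it - iz) % n]                # f(t - z/c): travels toward larger z
down_b = np.roll(w, 10)[(it - iz) % n]   # same direction, delayed by 10 samples
up = np.roll(w, 40)[(it + iz) % n]       # g(t + z/c): travels toward smaller z

def hz(u):
    """Hilbert transform along the depth axis (axis 0)."""
    return np.imag(hilbert(u, axis=0))

def split_kernels(us, ur):
    """Zero-lag correlations split by relative vertical propagation direction.

    Returns (opposite-direction part, same-direction part), following the
    minus/plus combinations of Equations (5) and (6) with a 1/2 normalization.
    """
    plain = np.sum(us * ur, axis=1)
    cross = np.sum(hz(us) * hz(ur), axis=1)
    return 0.5 * (plain - cross), 0.5 * (plain + cross)

opp_1, same_1 = split_kernels(down_a, up)      # oppositely travelling pair
opp_2, same_2 = split_kernels(down_a, down_b)  # pair travelling the same way
```

For the oppositely travelling pair the same-direction output vanishes to rounding error, and for the co-travelling pair the opposite-direction output vanishes, without either wavefield ever being decomposed explicitly. Only one extra Hilbert transform per wavefield is needed, which is the efficiency argument made for Equation (7).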
Figure 5: Comparison between the conventional FWI (a) and our new LSRTM-guided FWI (b).

Figure 6: Zoom-in comparison among the Marmousi velocity model (a), the conventional FWI (b) and our new LSRTM-guided FWI (c) of the anticline structures in the Marmousi model.

To quantitatively compare the inversion results using different methods, we plot both the data misfit and the model misfit convergence curves in Figure 7. The LSRTM-guided FWI not only converges much faster than the conventional FWI, but also reduces the data misfit by a further 90% and the model misfit by a further 50% relative to the conventional FWI.

Figure 7: Convergence curves of the normalized data misfit (a) and the normalized model misfit (b) for the conventional FWI (blue line) and the LSRTM-guided FWI (red line) for the Marmousi model.

Conclusions

We have developed a new LSRTM-guided FWI method to improve the convergence and robustness of full-waveform inversion, particularly when the initial model is poor. We use an efficient, implicit wavefield separation scheme and a least-squares reverse-time migration method to alternately update the low-wavenumber and high-wavenumber components of velocity models. We have validated the improvement of our new FWI method using synthetic data for the Marmousi model, and demonstrated that our new method produces a more accurate velocity model than the conventional FWI, and converges much faster than the latter.

Acknowledgments

This work was supported by the U.S. Department of Energy through contract DE-AC52-06NA25396 to Los Alamos National Laboratory (LANL). The computation was performed using the super-computers of LANL’s Institutional Computing Program.

EDITED REFERENCES
REFERENCES
Alkhalifah, T., and Z. Wu, 2016, The natural combination of full and image-based waveform inversion:
Geophysical Prospecting, 64, 19–30, http://doi.org/10.1111/1365-2478.12264.
Chen, T., and L. Huang, 2014, Elastic reverse-time migration with an excitation amplitude imaging
condition: 84th Annual International Meeting, SEG, Expanded Abstracts, 1868–1872,
Chi, B., L. Dong, and Y. Liu, 2015, Correlation-based reflection full-waveform inversion: Geophysics,
80, no. 4, R189–R202, http://doi.org/10.1190/geo2014-0345.1.
Fei, T. W., Y. Luo, J. Yang, H. Liu, and F. Qin, 2015, Removing false images in reverse time migration:
The concept of de-primary: Geophysics, 80, no. 6, S237–S244, http://doi.org/10.1190/geo2015-
0289.1.
Liu, F., G. Zhang, S. Morton, and J. Leveille, 2011, An effective imaging condition for reverse-time
migration using wavefield decomposition: Geophysics, 76, no. 1, S29–S39,
http://doi.org/10.1190/1.3533914.
http://doi.org/10.1190/1.1442625.
Shen, P., and U. Albertin, 2015, Up-down separation using Hilbert transformed source for causal imaging
condition: 85th Annual International Meeting, SEG, Expanded Abstracts, 4175–4179,
Tan, S., and L. Huang, 2014, Least-squares reverse-time migration with a wavefield-separation imaging
condition and updated source wavefields: Geophysics, 79, no. 5, S195–S205,
http://doi.org/10.1190/geo2014-0020.1.
Tang, Y., and S. Lee, 2013, Tomographically enhanced full wavefield inversion: 82nd Annual
1259–1266, http://doi.org/10.1190/1.1441754.
Wang, F., D. Donno, H. Chauris, H. Calandra, and F. Audebert, 2016, Waveform inversion based on
wavefield decomposition: Geophysics, 81, no. 6, R457–R470, http://doi.org/10.1190/geo2015-
0340.1.
Xu, S., D. Wang, F. Chen, G. Lambare, and Y. Zhang, 2012, Inversion on reflected seismic wave: 82nd
Yoon, K., and K. Marfurt, 2006, Reverse-time migration using the Poynting vector: Exploration
Geophysics, 37, 102–107, http://doi.org/10.1071/EG06102.

Double Difference Wave Equation Reflection Travel time Inversion
Chao Cui*, Jianping Huang,Yundong Guo and ZhenChun Li
China University of Petroleum(East China)
Summary parameters, RFWI is also a strongly ill-posed problem (Chi

et al. 2015) when using the L2-norm of the data distance as
Wave Equation Reflection Travel time Inversion (WERTI) misfit function. One efficient method to improve the
is capable of estimating the long wave-length model linearity of RFWI is the separation of travel time and
component using seismic reflections. One of the key steps amplitude information. Wang et al (2015) used the
of WERTI is the migration/de-migration procedure to windowed cross correlation to get the travel time difference
generate reflected wave from the smooth model. However, based on semiautomatic global picking strategy. Chi et al
the waveforms of de-migrated data and observed data are (2015) introduced the spatial cross correlation and temporal
significantly different without the application of least cross correlation as misfit function to emphasize on the
square migration and instability can be observed when travel time information, which is successfully applied to
picking up the travel time difference. To further improve field data. However, the cross-correlation based method is
the stability of WERTI, this paper introduces a double not suitable for the rapidly varying time shifts. Ma and
difference wave-equation reflection travel time inversion Hale (2013) applied the Wave Equation Reflection Travel
(DWERTI) method and modifies the conventional time Inversion (WERTI) to get the long wavelength
Dynamic Image Warping (DIW) method with a structure based on the Dynamic Image Warping (DIW)
regularization term. The travel time difference is measured method (Hale, 2013). One of the key steps of RFWI is the
within the observed or synthetic data rather than between migration/de-migration procedure to generate reflected
them. The misfit function is defined and the corresponding wave from the smooth model. Without the application of
gradient is obtained using the Lagrange multiplier true amplitude migration, the de-migrated data and the
technique. The test on two time shifted noisy traces is used observed data are usually different in terms of the
to illustrate the stability of the modified DIW method with amplitude and waveform. Because DIW relies on the
a regularization term. Then we apply the proposed strategy amplitude information of seismic traces (Venstad, 2014),
on a portion of Sigsbee 2A model to verify its feasibility. the travel time difference measured by DIW is not accurate
Because an accurate initial model can be estimated, enough for inversion without the application of Least
combined with the FWI engine, the proposed inversion square migration.
strategy outperforms the conventional method substantially.
In this paper, we propose a double difference wave-
Introduction equation reflection travel time inversion (DWERTI)
strategy. By avoiding picking travel time difference
Because conventional Full Waveform Inversion (FWI) between observed and modeled data, the proposed method
lacks the inversion ability for low wavenumber model significantly improves the inversion ability for low
component and relies on the quality of low frequency data wavenumber model component. Besides, we add a
and the accuracy of initial model, an accurate low regularization term to conventional DIW method for
wavenumber model is needed to guarantee that the stability. The experiment on a portion of Sigsbee 2A model
optimization procedure converges to the global minimum. verifies the merits of DWERTI compared with
Conventionally, the low wavenumber component is carried conventional FWI.
by the long offset early arrivals propagating through the
shallow structure, such as direct wave and diving wave. Theory
However, limited by the acquisition ability, the long offset
early arrivals are usually unavailable in the field data. Xu et We realize that the waveforms of de-migrated data and
al (2012) proposed the Reflection Full Waveform Inversion observed data are significantly different without the
(RFWI) method to explicitly make use of the long- application of true amplitude migration. In this case, the
wavelength information carried by reflected wave and travel time difference picked by DIW may not be accurate
retrieve the background model in the middle to deep part. enough. In this paper, the misfit function based on travel
RFWI uses a two steps work flow in which one repeatedly time difference is modified as:
updates the background velocity and the high wavenumber 1 2
component and is capable of recovering an accurate   obs  xr , t ; xs    cal  xr , t ; xs  2
background model which can be used for conventional FWI 2 (1)
or migration. It is demonstrated that, because of the
complex relationship between the data and model
where $\Delta\tau_{obs}$ and $\Delta\tau_{cal}$ are the travel time differences between different traces within the observed and de-migrated data, respectively.

As shown in Figure (1), the physical meaning of the proposed misfit function is as follows. The black solid line and the dashed line represent the real reflector and the migration image, respectively. The migration image differs from the real reflector position because of the error in the background model. $t_{11}$ and $t_{21}$ are the travel times of the observed data $d_{11}$ and $d_{21}$, respectively. $t_{10}$ and $t_{20}$ are the travel times of the de-migrated data $d_{10}$ and $d_{20}$, respectively. At zero offset, the travel time of the observed data $t_{01}$ equals the travel time of the de-migrated data $t_{00}$, while for the far-offset data, cycle skipping can be observed because of the inaccurate background model. Conventional WERTI aims at minimizing the difference between $t_{11}$ and $t_{10}$, and the travel time difference is measured between the observed and de-migrated data. Because DIW relies on the amplitude information of seismic traces and the waveforms of de-migrated and observed data are usually different, the travel time difference picked by DIW is not accurate enough for inversion. The misfit function proposed in this paper instead aims at minimizing the difference between $t_{21} - t_{11}$ and $t_{20} - t_{10}$. Because the waveforms of different traces within the modelled or de-migrated common shot gather are similar, the travel time difference can be obtained accurately. In addition, the travel time difference within the common shot gathers is measured between neighbouring traces and a summation is performed from the near offset to the far offset, which further avoids the waveform difference between the near-offset and far-offset traces. When the global minimum of the misfit function is reached, using the zero-offset trace as reference, we can claim that the travel time information of the inverted model is accurate.

Figure 1 Physical meaning of the misfit function of double difference wave equation reflection travel time inversion

The travel time difference is measured by the DIW method, which is suitable for rapidly varying time shifts (Hale, 2013). However, instability can be observed for noisy data. To further improve the performance of the traditional DIW method, we add a regularization term to the conventional DIW misfit function, which can be expressed as:

$$D_{cal}(l) = \frac{1}{2}\int dx_s \int dx_r \int dt \left[ \left( p_c(x_r + \Delta x_r,\, t + l(x_r, t; x_s);\, x_s) - p_c(x_r, t; x_s) \right)^2 + \lambda l^2 \right]$$

$$D_{obs}(l) = \frac{1}{2}\int dx_s \int dx_r \int dt \left[ \left( p_{obs}(x_r + \Delta x_r,\, t + l(x_r, t; x_s);\, x_s) - p_{obs}(x_r, t; x_s) \right)^2 + \lambda l^2 \right] \qquad (2)$$

where $p_{obs}$ and $p_c$ represent the observed and modeled data, respectively, and $l$ represents the travel time difference for the data recorded at receiver position $x_r$ and source position $x_s$. $\Delta x_r$ represents the trace interval for travel time difference estimation within the common shot gather. $\lambda$ is a weight factor that is chosen empirically. The travel time difference is obtained by minimizing the misfit function defined by equation (2). In this paper, the optimization problem is solved following the strategy proposed by Hale (2013).

Figure (2) shows the travel time difference measured by DIW with and without the regularization term for the noisy data. The dashed black line represents the real travel time difference and the red line represents the estimated travel time difference. Because the panel of the misfit function is very irregular for the noisy data, the travel time difference measured by the conventional DIW method differs from the real travel time difference. We also note that where the amplitude of the signal is zero, the travel time difference is dominated by the noise, which may lead to errors in the inversion. In contrast, with the help of the regularization term, the measured travel time difference fits the real travel time difference very well.
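The effect of the regularization term can be illustrated with a minimal sketch. It is not the authors' implementation: it warps a single pair of traces rather than integrating over shots and receivers, uses integer lags, clamps indices at the trace ends, and limits the lag change between neighbouring samples to one sample. The names `ricker`, `regularized_warp`, `lam` (standing in for the weight factor that multiplies $l^2$ in equation (2)) and `strain` are hypothetical.

```python
import numpy as np


def ricker(n, dt, f_peak, i0):
    """Ricker wavelet with peak frequency f_peak (Hz), centred on sample i0."""
    t = (np.arange(n) - i0) * dt
    a = (np.pi * f_peak * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


def regularized_warp(f, g, max_lag, lam=0.0, strain=1):
    """Integer lags l[i] such that f[i] ~ g[i + l[i]].

    Alignment error: e[i, l] = (f[i] - g[i + l])**2 + lam * l**2.
    The errors are accumulated by dynamic programming under the
    constraint |l[i] - l[i-1]| <= strain, then backtracked.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    n = len(f)
    lags = np.arange(-max_lag, max_lag + 1)
    nl = len(lags)
    # Sample indices i + l, clamped to the trace ends (an assumption).
    idx = np.clip(np.arange(n)[:, None] + lags[None, :], 0, n - 1)
    err = (f[:, None] - g[idx]) ** 2 + lam * lags[None, :] ** 2

    acc = np.empty_like(err)              # accumulated errors
    back = np.zeros((n, nl), dtype=int)   # best predecessor lag index
    acc[0] = err[0]
    for i in range(1, n):
        for j in range(nl):
            lo = max(0, j - strain)
            hi = min(nl, j + strain + 1)
            k = lo + int(np.argmin(acc[i - 1, lo:hi]))
            acc[i, j] = err[i, j] + acc[i - 1, k]
            back[i, j] = k

    # Backtrack from the cheapest end point.
    path = np.empty(n, dtype=int)
    path[-1] = int(np.argmin(acc[-1]))
    for i in range(n - 1, 0, -1):
        path[i - 1] = back[i, path[i]]
    return lags[path]


f = ricker(400, 0.004, 10.0, 200)   # reference trace
g = np.roll(f, 5)                   # same wavelet delayed by 5 samples
lags = regularized_warp(f, g, max_lag=10, lam=1e-5)
```

With `lam > 0` the lag is pulled towards zero wherever the traces carry no signal, while the lag at the wavelet is still controlled by the data term. With `lam = 0` the lag in the signal-free parts is not constrained by the data at all.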

Figure 2 The estimated travel time difference by DIW (a) without and (b) with the regularization term. The dashed black line represents the real travel time difference and the red line represents the measured travel time difference.

In this paper, we only invert for the background model using the DWERTI method. Using the Lagrange multiplier technique (Ma and Hale, 2013), the gradient of the misfit function with respect to the background model parameters can be expressed as:

$$g(x) = \int dx_s \int dt \left[ q_b(x, t; x_s)\, p_c(x, t; x_s) + q_c(x, t; x_s)\, p_b(x, t; x_s) \right] \qquad (3)$$

where:

$$\delta(x_r, t; x_s) = \frac{\Delta\tau_{cal}(x_r, t; x_s) - \Delta\tau_{obs}(x_r, t; x_s)}{\dot{p}_c(x_r + \Delta x_r,\, t + \Delta\tau_{cal}(x_r, t; x_s);\, x_s)^2 + \left[ p_c(x_r + \Delta x_r,\, t + \Delta\tau_{cal}(x_r, t; x_s);\, x_s) - p_c(x_r, t; x_s) \right] \ddot{p}_c(x_r + \Delta x_r,\, t + \Delta\tau_{cal}(x_r, t; x_s);\, x_s)}$$

$$\left( m(x) \frac{\partial^2}{\partial t^2} - \nabla^2 \right) q_b(x, t) = \delta(x_r, t; x_s)\, \dot{p}_c(x_r + \Delta x_r,\, t + \Delta\tau_{cal}(x_r, t; x_s);\, x_s)$$

$$\left( m(x) \frac{\partial^2}{\partial t^2} - \nabla^2 \right) q_c(x, t) = -m_r(x) \frac{\partial^2}{\partial t^2} q_b(x, t)$$

where $m$ and $m_r$ represent the smooth background model and the reflectivity model obtained by near-offset migration at position $x$, respectively.

We iteratively update the background model using equation (3) and the reflectivity by near-offset migration. The inverted background model, which provides accurate travel time information, can then be used as the initial model for conventional FWI.

Example

The synthetic data test involves a portion of the Sigsbee 2A model, which is shown in Figure (3.a). The initial model is a linear gradient model except for the water layer, which is the same as in the true model. 79 shots and 399 receivers are evenly distributed on the surface of the model at 80 m and 16 m intervals, respectively. A Ricker wavelet with a peak frequency of 10 Hz is used as the source and is assumed to be known during the inversion.

First, conventional FWI is performed; the inversion result after 500 iterations is shown in Figure (3.b). The shallow part of the true model (shallower than 0.5 km) is partly revealed, benefiting from the accuracy of the initial model in this portion and from the long-offset refracted data. For the deep part, however, the inversion result only provides some incorrect high-wavenumber updates and no useful structural information can be observed.

Then we use the DWERTI method to update the background model. First, the reflectivity model is generated by migration using the near-offset data. The imaging result is high-pass filtered in the wavenumber domain to eliminate the low-wavenumber noise. The travel time difference between different traces within the de-migrated and observed data is measured by the regularized DIW method. We use the gradient of the misfit function (Eq. 3) to update the background model. With the updated background model, the reflectivity is gradually corrected to the real position. The updates of the low-wavenumber background model and of the reflectivity take place in the same iteration but in an alternating fashion (Ma and Hale, 2013).

Figure (3.c) shows the updated low-wavenumber background model. Because the background model is kinematically accurate, conventional full waveform inversion is capable of providing a high-resolution inversion result using the inverted model of DWERTI as the initial model. As shown in Figure (3.d), the recovered model is almost identical to the real model for the shallow to middle depths (shallower than 1.5 km). Some artifacts are present in the deep part of the model because of poor illumination. Overall, the inversion result of DWERTI is capable of providing an accurate initial model for conventional FWI.

To further investigate the reliability of the inversion result of DWERTI, portions of the RTM results using the real model, the initial model and the inversion result of DWERTI are compared in Figure (4). Because the constant gradient model is far from the real model, almost all of the reflectors are located in incorrect positions and the energy is not focused in the RTM result (Figure 4.b), compared with the RTM result using the real model. With the accurate background model provided by DWERTI, the RTM image is substantially improved. Two major faults are better resolved and the reflectors are now close to their true positions.

Conclusions

In this paper, we propose a double difference wave equation travel time inversion with a regularized DIW
method. Based on the theoretical analysis and the numerical test, we draw the following conclusions: (1) Conventional FWI relies heavily on the accuracy of the initial model and, when the initial model is not accurate enough, fails to produce a satisfactory inversion result except for the shallow part of the model illuminated by early arrivals. (2) Making use of the travel time information in the seismic data is the key to building an accurate initial model for conventional FWI. The method proposed in this paper uses the travel time information accurately and provides a reasonable initial model for conventional FWI.

Acknowledgment

We thank SWPI of China University of Petroleum (East China) for discussion and support.

Figure 3 (a) The real model; (b) The inversion result of the conventional FWI using the linear increasing model as initial model; (c) The inversion result of DWERTI; (d) The inversion result of conventional FWI using (c) as initial model

Figure 4 Part of the migration result using (a) real model (b) the initial model and (c) the inversion result of DWERTI
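As a hedged illustration of the double-difference misfit of equation (1), the sketch below evaluates it for a single flat reflector in a constant-velocity medium. This toy setting is an assumption made only for illustration: the "migration image" is placed at the depth that preserves the zero-offset travel time, and travel times are computed analytically instead of being measured by DIW. The function names are hypothetical.

```python
import numpy as np


def reflection_time(offset, depth, velocity):
    """Two-way travel time of a flat reflector in a constant-velocity medium."""
    return 2.0 * np.sqrt(depth ** 2 + (offset / 2.0) ** 2) / velocity


def double_difference_misfit(t_obs, t_cal, step=1):
    """E = 0.5 * sum((dtau_obs - dtau_cal)**2) for one shot gather.

    dtau[r] = t[r + step] - t[r] is the travel time difference between
    neighbouring traces WITHIN the observed or the calculated data.
    """
    dtau_obs = t_obs[step:] - t_obs[:-step]
    dtau_cal = t_cal[step:] - t_cal[:-step]
    return 0.5 * np.sum((dtau_obs - dtau_cal) ** 2)


def misfit_for_velocity(v_trial, v_true=2000.0, depth=1000.0):
    offsets = np.arange(0.0, 3001.0, 100.0)        # offsets in metres
    t_obs = reflection_time(offsets, depth, v_true)
    # Imaged depth with the trial velocity keeps the zero-offset time fixed.
    depth_image = depth * v_trial / v_true
    t_cal = reflection_time(offsets, depth_image, v_trial)
    return double_difference_misfit(t_obs, t_cal)


e_true = misfit_for_velocity(2000.0)
e_small_error = misfit_for_velocity(1800.0)
e_large_error = misfit_for_velocity(1600.0)
```

In this toy case the misfit vanishes for the true velocity and grows with the velocity error, even though observed and calculated traces are never compared with each other directly.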

EDITED REFERENCES
REFERENCES
80, no. 4, R189–R202, http://doi.org/10.1190/geo2014-0345.1.
http://doi.org/10.1190/geo2012-0327.1.
waveform inversion: Geophysics, 78, no. 6, R223–R233, http://doi.org/10.1190/geo2013-0004.1.
Venstad, J. M., 2014, Dynamic time warping — An improved method for 4D and tomography time shift
estimation?: Geophysics, 79, no. 5, R209–R220, http://doi.org/10.1190/geo2013-0239.1.
Wang, H., S. C. Singh, F. Audebert, and H. Calandra, 2015, Inversion of seismic refraction and reflection
data for building long-wavelength velocity models: Geophysics, 80, no. 2, R81–R93,
http://doi.org/10.1190/geo2014-0174.1.
Xu, S., D. Wang, F. Chen, G. Lambaré, and Y. Zhang, 2012, Inversion on reflected seismic wave: 82nd Annual International Meeting, SEG, Expanded Abstracts, 1–7, http://doi.org/10.1190/segam2012-1473.1.

Wavefield Separation by Energy Norm Born Scattering
Bingbing Sun and Tariq Alkhalifah, King Abdullah University of Science and Technology
SUMMARY

In Reflection Based Waveform Inversion, the gradient is computed by cross-correlating the direct and Born scattered wavefields with their adjoints applied to the data residuals. In this case, the transmitted part of the Born scattered wavefield produces high-wavenumber artifacts, which harm the convergence of the inversion process. We propose an efficient Energy Norm Born Scattering (ENBS) to attenuate the transmission components of the Born modeling and allow it to produce only reflections. ENBS is derived from the adjoint of the Energy Norm (inverse scattering) imaging condition. To gain deeper insight into how this method works, we show analytically that, given an image in which reflectivity is represented by a Dirac delta function, ENBS attenuates transmission energy perfectly. We use numerical examples to demonstrate that ENBS works in both the time and the frequency domain. We also show that in reflection waveform inversion (RWI) the wave path constructed by ENBS is cleaner and free of the high-wavenumber artifacts associated with conventional Born scattering.

INTRODUCTION

As conventional Full Waveform Inversion (FWI) (Virieux and Operto, 2009) relies on low frequencies in the data and is prone to "cycle-skipping", especially in dealing with reflections, reflection based Waveform Inversion (RWI) provides a stable alternative for inverting the background model (Xu et al., 2012; Irabor and Warner, 2016). The gradient in RWI is computed by cross-correlating the back-propagated residual wavefield with the demigrated one (the Born scattered wavefield). As conventional Born scattering produces both reflections and transmissions, the constructed gradient will contain not only the low-wavenumber wavepath component for macro velocity updating, but also high-wavenumber migration ellipses that may induce cycle-skipped gradients. Thus, one task in RWI is to remove those high-wavenumber components effectively, which is related to the separation of reflection and transmission wavefields.

To attenuate such high-wavenumber artifacts, we need to remove the transmission component of the Born scattered energy. There are a few methods available to do so, including for example using the Hilbert Transform (Liu et al., 2011) to separate up- and down-going wavefields, or splitting the wave equation (Irabor and Warner, 2016). However, such approaches are relatively expensive: for example, the method based on the Hilbert Transform needs extra convolution operations in each time step during the imaging process, while the wave equation splitting method requires solving the wave equation with more auxiliary variables. Both of these methods can only deal with up-going and down-going waves and cannot distinguish between reflection and transmission waves propagating horizontally.

In this abstract, we propose the Energy Norm Born Scattering (ENBS) method. It eliminates the transmission waves in Born modeling effectively and it works for both vertically and horizontally propagating wavefields. In theory, it is the adjoint operator of the Energy Norm (inverse scattering) imaging condition (ENIC) (Whitmore and Crawley, 2012; Rocha et al., 2016). Thus, we first describe the method through the adjoint analysis of the ENIC, and then, by high-frequency asymptotic analysis, we show analytically that this method produces pure reflections when the image or the velocity perturbation is given by Dirac delta functions. We illustrate that the method can remove transmission waves effectively in both the time and frequency domains. Finally, we use the proposed method to obtain a clean gradient for velocity updating in RWI.

THEORY

Adjoint of the energy norm imaging condition

The Energy Norm Imaging Condition (ENIC) for Reverse Time Migration (RTM) is described in Rocha et al. (2016) as

$$I = \int \left[ \frac{1}{v^2} \frac{\partial S}{\partial t} \frac{\partial R}{\partial t} - \nabla S \cdot \nabla R \right] dt, \qquad (1)$$

where $S$ and $R$ are the source and receiver wavefields, respectively, $v$ is the velocity, and $I$ is the image. $\nabla = \left( \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z} \right)$ is the space gradient operator. It can also be expressed in the frequency domain using Green functions:

$$I(\mathbf{x}) = \iint \left[ \frac{\omega^2}{v(\mathbf{x})^2}\, \bar{w}(\omega)\, \bar{G}(\mathbf{x}, \mathbf{x}_s, \omega)\, \bar{G}(\mathbf{x}, \mathbf{x}_r, \omega) - \nabla \bar{G}(\mathbf{x}, \mathbf{x}_s, \omega) \cdot \nabla \bar{G}(\mathbf{x}, \mathbf{x}_r, \omega)\, \bar{w}(\omega) \right] d(\mathbf{x}_r, \mathbf{x}_s, \omega)\, d\mathbf{x}_r\, d\omega, \qquad (2)$$

where $\mathbf{x} = (x, y, z)$ represents the Cartesian coordinates, $\mathbf{x}_s$ and $\mathbf{x}_r$ are the source and receiver coordinates, and $w(\omega)$ and $d(\mathbf{x}_r, \mathbf{x}_s, \omega)$ are the source wavelet and the recorded data, respectively. The overline denotes the complex conjugate and $G(\mathbf{b}, \mathbf{a}, \omega)$ represents the Green function from position $\mathbf{a}$ to $\mathbf{b}$, with $G(\mathbf{a}, \mathbf{b}, \omega) = G(\mathbf{b}, \mathbf{a}, \omega)$.

The ENIC of equation (2) defines the imaging operator $M: d \rightarrow I$; its adjoint operator is a form of Born modeling (or demigration), which produces data given an image: $M^{*}: I \rightarrow d$. Taking the adjoint of the operator in equation (2) and applying it
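to the image, we obtain

$$\begin{aligned}
d(\mathbf{x}_r, \mathbf{x}_s, \omega)
&= \int \left[ \frac{\omega^2}{v(\mathbf{x})^2}\, \bar{w}(\omega)\, \bar{G}(\mathbf{x}, \mathbf{x}_s, \omega)\, \bar{G}(\mathbf{x}, \mathbf{x}_r, \omega) - \nabla \bar{G}(\mathbf{x}, \mathbf{x}_s, \omega) \cdot \nabla \bar{G}(\mathbf{x}, \mathbf{x}_r, \omega)\, \bar{w}(\omega) \right]^{*} I(\mathbf{x})\, d\mathbf{x} \\
&= \int \left\{ \frac{\omega^2}{v(\mathbf{x})^2}\, w(\omega)\, G(\mathbf{x}, \mathbf{x}_s, \omega)\, G(\mathbf{x}, \mathbf{x}_r, \omega)\, I(\mathbf{x}) + \nabla \cdot \left[ \nabla G(\mathbf{x}, \mathbf{x}_s, \omega)\, I(\mathbf{x}) \right] G(\mathbf{x}, \mathbf{x}_r, \omega)\, w(\omega) \right\} d\mathbf{x} \\
&= \int \left\{ \frac{\omega^2}{v(\mathbf{x})^2}\, w(\omega)\, G(\mathbf{x}, \mathbf{x}_s, \omega)\, I(\mathbf{x})\, G(\mathbf{x}_r, \mathbf{x}, \omega) + \nabla \cdot \left[ \nabla G(\mathbf{x}, \mathbf{x}_s, \omega)\, I(\mathbf{x}) \right] G(\mathbf{x}_r, \mathbf{x}, \omega)\, w(\omega) \right\} d\mathbf{x}
\end{aligned} \qquad (3)$$

where $*$ denotes the adjoint operation and the overline represents the complex conjugate. We re-express equation (3) as a partial differential equation (PDE) in the time domain:

$$\frac{1}{v^2} \frac{\partial^2 p}{\partial t^2} = \Delta p + \frac{1}{v^2} \frac{\partial^2 u}{\partial t^2}\, \delta v - \nabla \cdot (\delta v\, \nabla u), \qquad (4)$$

where we have replaced the image $I(\mathbf{x})$ with the velocity perturbation $\delta v$, $u$ is the incident (background) wavefield and $p$ is the Born scattered wavefield.

High frequency asymptotic analysis

In RWI, the velocity perturbation $\delta v$ is a band-limited Dirac delta function. In the following, we demonstrate mathematically that when the velocity perturbation is a Dirac delta function, ENBS given by equation (4) produces pure reflections.

As the background wavefield $u$ satisfies the wave equation

$$\frac{1}{v^2} \frac{\partial^2 u}{\partial t^2} = \nabla \cdot \nabla u, \qquad (5)$$

the source term for ENBS in equation (4) can be simplified as

$$f = \frac{1}{v^2} \frac{\partial^2 u}{\partial t^2}\, \delta v - \nabla \cdot (\delta v\, \nabla u) = -\nabla \delta v \cdot \nabla u. \qquad (6)$$

In the frequency domain, the scattered wavefield can also be expressed as a 3D integration using the Green function:

$$p(x_0, y_0, z_0, \omega) = -\iiint \nabla \delta v \cdot \nabla u\, \frac{e^{i\omega R/v}}{R}\, dV, \qquad (7)$$

where $R = \sqrt{(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2}$.

Suppose the velocity perturbation is a Dirac delta function, $\delta v = \delta(z)$. Equation (7) can then be simplified as

$$p(x_0, y_0, z_0, \omega) = \iint \left[ \frac{\partial}{\partial z} \left( \frac{\partial u}{\partial z} \frac{e^{i\omega R/v}}{R} \right) \right]_{z=0} dx\, dy. \qquad (8)$$

Here we use the shifting property of the Dirac delta function:

$$\int \frac{\partial \delta(z)}{\partial z}\, q(z)\, dz = -\left. \frac{\partial q(z)}{\partial z} \right|_{z=0}. \qquad (9)$$

Considering the incident wavefield as a plane wave, we have:

$$u = e^{i(k_x x + k_y y + k_z z)}, \quad k_x^2 + k_y^2 + k_z^2 = \frac{\omega^2}{v^2}, \quad \frac{\partial u}{\partial z} = i k_z u, \quad \frac{\partial^2 u}{\partial z^2} = -k_z^2 u. \qquad (10)$$

The derivative of the Green function can be evaluated as

$$\frac{\partial}{\partial z} \left( \frac{e^{i\omega R/v}}{R} \right) = \frac{e^{i\omega R/v}}{R} \left( i \frac{\omega}{v} - \frac{1}{R} \right) \frac{\partial R}{\partial z} \approx i\, \frac{e^{i\omega R/v}}{R}\, \frac{\omega}{v}\, \frac{\partial R}{\partial z}, \qquad (11)$$

where, for the far-field approximation, we assume that $1/R \ll \omega/v$.

Substituting the derivatives of equations (10) and (11) into equation (8), and using $\partial R / \partial z = -z_0 / R$ at $z = 0$, we obtain

$$p(x_0, y_0, z_0, \omega) = -\iint \left[ \frac{e^{i\omega R/v}}{R}\, u \left( k_z^2 - \frac{\omega}{v} \frac{z_0}{R} k_z \right) \right]_{z=0} dx\, dy. \qquad (12)$$

This is an oscillatory integral and can be approximated using the stationary phase method. The phase of this oscillatory integral is expressed as:

$$\Phi(x, y) = \frac{R}{v} + \frac{k_x x}{\omega} + \frac{k_y y}{\omega}. \qquad (13)$$

The stationary phase approximation gives the result in the form of

$$p = -c \left[ \frac{e^{i\omega R/v}}{R}\, u \left( k_z^2 - \frac{\omega}{v} \frac{z_0}{R} k_z \right) \right]_{(x, y) = (x^{*}, y^{*}),\, z = 0}, \qquad (14)$$

where $c$ is the weighting term at the stationary point. $(x^{*}, y^{*})$ is the stationary point; it is calculated by setting the gradient of the phase of equation (13) to zero, i.e., $\nabla \Phi = 0$:

$$\frac{\partial \Phi}{\partial x} = \frac{x^{*} - x_0}{R v} + \frac{k_x}{\omega} = 0; \qquad (15)$$

$$\frac{\partial \Phi}{\partial y} = \frac{y^{*} - y_0}{R v} + \frac{k_y}{\omega} = 0. \qquad (16)$$

Thus, at the stationary point, we have

$$k_x^2 + k_y^2 = \frac{\omega^2}{v^2} \left[ \frac{(x^{*} - x_0)^2}{R^2} + \frac{(y^{*} - y_0)^2}{R^2} \right] = \frac{\omega^2}{v^2} - \frac{\omega^2}{v^2} \frac{z_0^2}{R^2}. \qquad (17)$$

Using the dispersion relation for plane waves in equation (10), it can be simplified as

$$k_z^2 = \frac{\omega^2}{v^2} \frac{z_0^2}{R^2} \quad \text{or} \quad R = \frac{\omega}{v |k_z|} |z_0|. \qquad (18)$$

Substituting this equation into equation (14), we obtain the Born wavefield as

$$p = -c \left[ \frac{e^{i\omega R/v}}{R}\, u \right]_{(x, y) = (x^{*}, y^{*}),\, z = 0} \left[ k_z^2 - \mathrm{sign}(k_z)\, k_z^2\, \frac{z_0}{|z_0|} \right]. \qquad (19)$$

Considering the term in the second bracket, it is clear that if the incident wavefield is down-going with $k_z > 0$, the Born wavefield using ENBS is zero for $z_0 > 0$, which corresponds to the zone of transmission waves in this setup. In the derivations, we assumed the velocity perturbation or the image to be a Dirac delta function. As we pointed out before, in RWI the image would actually be a band-limited Dirac function; the examples show that in this slightly degenerate situation, the proposed ENBS can still perform well in attenuating transmission waves.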
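The role of the second bracket in equation (19) can be checked numerically. The sketch below is an illustration under assumptions, not part of the original derivation: it takes a down-going plane wave ($k_z > 0$), builds the stationary point from equations (15), (16) and (18), and evaluates the factor $k_z^2 - (\omega/v)(z_0/R)\,k_z$ that appears in equations (12) and (14). The function name and the chosen incidence direction are hypothetical.

```python
import numpy as np


def enbs_factor_at_stationary_point(omega, v, kx, ky, z0):
    """Evaluate the bracket k_z^2 - (omega/v)(z0/R) k_z at the stationary point.

    Returns (factor, kz, R, R_check), where R follows equation (18) and
    R_check is the distance recomputed from the stationary point itself.
    """
    k = omega / v
    kz = np.sqrt(k ** 2 - kx ** 2 - ky ** 2)   # down-going plane wave, kz > 0
    R = k * abs(z0) / kz                        # equation (18)
    # Equations (15)-(16): (x* - x0) / (R v) + kx / omega = 0, same for y.
    dx = -kx * R / k
    dy = -ky * R / k
    # Distance from the stationary point (x*, y*, 0) to (x0, y0, z0).
    R_check = np.sqrt(dx ** 2 + dy ** 2 + z0 ** 2)
    factor = kz ** 2 - k * (z0 / R) * kz
    return factor, kz, R, R_check


omega = 2.0 * np.pi * 10.0      # 10 Hz
v = 2.0                         # km/s
k = omega / v
f_trans, kz, R_t, R_t_check = enbs_factor_at_stationary_point(omega, v, 0.3 * k, 0.2 * k, 0.5)
f_refl, _, R_r, R_r_check = enbs_factor_at_stationary_point(omega, v, 0.3 * k, 0.2 * k, -0.5)
```

For an observation point on the transmission side ($z_0 > 0$) the factor vanishes to rounding error, while on the reflection side ($z_0 < 0$) it equals $2 k_z^2$, for oblique as well as vertical incidence.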

EXAMPLES

In the first example, we show an application of ENBS in the time domain. In Figure 1a, we place the source in the upper-left side of the model (denoted by the yellow star) and set the velocity perturbation (image) as a band-limited Dirac delta function. The main frequency for modeling is 10 Hz and the background velocity is constant and equal to 2 km/s. Figures 1b and 1c are snapshots of the Born scattered wavefield at t = 2.8 s for the conventional Born modeling and for ENBS, respectively. This result shows that ENBS attenuates transmission waves in the scattered wavefield effectively for both vertically and horizontally propagating incident wavefields.

The same features hold in the frequency domain when solving for the scattered wavefield using the Helmholtz wave equation with the proper source function. This time the source is placed on the surface in the middle (yellow star in Figure 2a), and the background velocity is 2 km/s. Figure 2a shows the velocity perturbation; it is a two-reflector model, and the reflectivity is also given by a band-limited Dirac delta function. The real parts of the scattered wavefield at 10 Hz for the conventional Born modeling and ENBS are shown in Figures 2b and 2c, respectively. The results are in line with the theoretical prediction: the transmission wavefield is attenuated effectively in the frequency domain as well as in the time domain.

The last example corresponds to the computation of the wavepath update in RWI. Figures 3a and 3b show the gradients computed using conventional Born modeling and ENBS, respectively. We can see that the migration ellipse is attenuated successfully in the ENBS implementation. Controlling the wavepath direction of the scattered field to isolate, for example, reflections has many applications beyond RWI. Even least-squares migration can attain higher resolution with such a feature. We will discuss these additional features and show examples in the presentation of this work.

CONCLUSIONS

We proposed an effective and efficient method to attenuate transmissions in Born modeling, which we refer to as the Energy Norm Born Scattering (ENBS). We showed analytically that, given a velocity perturbation represented by a Dirac delta function, ENBS produces pure reflections, i.e., transmission energy is attenuated. Numerically, similar observations are made with band-limited delta functions. Specifically, the numerical examples show that the proposed method produces pure reflections in both the time and frequency domains. Using ENBS, we can produce a smooth gradient in RWI free of high-wavenumber artifacts, which is essential for good convergence of the inversion process.

Figure 1: Application of ENBS in time domain: (a) The velocity perturbation, where the yellow star represents the source position. The scattered wavefield using (b) the conventional Born modeling, and using (c) ENBS. Axes: distance [km], depth [km].
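The frequency-domain behaviour described in the examples can be reproduced in a much simpler setting. The sketch below is a 1D analogue under stated assumptions and not the authors' 2D Helmholtz solver: the medium is homogeneous, the incident field is a plane wave travelling towards +x, the band-limited delta is a Gaussian of width `sigma`, and the scattered field is obtained by integrating the source against the 1D Green function $\frac{i}{2k} e^{ik|x - x'|}$ with the $e^{-i\omega t}$ convention. The function name `born_1d` and all parameter values are hypothetical.

```python
import numpy as np


def born_1d(freq, v, sigma, x_obs, enbs):
    """Scattered field at x_obs for a Gaussian perturbation centred at x = 0.

    enbs=False: conventional Born source, (1/v^2) u_tt dv  ->  -k^2 u dv.
    enbs=True:  ENBS source of equation (6) in 1D,  -(d dv/dx) (du/dx).
    """
    k = 2.0 * np.pi * freq / v
    x = np.linspace(-10.0 * sigma, 10.0 * sigma, 4001)
    dx = x[1] - x[0]
    dv = np.exp(-0.5 * (x / sigma) ** 2)     # band-limited "delta"
    ddv = -x / sigma ** 2 * dv               # its analytic x-derivative
    u = np.exp(1j * k * x)                   # incident plane wave
    if enbs:
        src = -ddv * (1j * k * u)
    else:
        src = -(k ** 2) * u * dv
    green = 1j / (2.0 * k) * np.exp(1j * k * np.abs(x_obs - x))
    return np.sum(green * src) * dx          # simple quadrature


freq, v, sigma = 10.0, 2.0, 0.01             # Hz, km/s, km
conv_trans = born_1d(freq, v, sigma, +0.5, enbs=False)   # beyond the scatterer
enbs_trans = born_1d(freq, v, sigma, +0.5, enbs=True)
enbs_refl = born_1d(freq, v, sigma, -0.5, enbs=True)     # back towards the source
```

In this analogue the forward-scattered (transmitted) ENBS field integrates to zero because the source reduces to a total derivative of the perturbation, whereas the conventional Born source leaves a non-zero transmitted field and the ENBS reflection is retained.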
Figure 2: Application of ENBS in frequency domain using the Helmholtz solver: (a) The velocity perturbation, where the yellow star represents the source position. The real part of the scattered wavefield using (b) the conventional Born modeling, and using (c) ENBS. Axes: distance [km], depth [km].

Figure 3: Application of ENBS in reflection waveform inversion: wavepath-based gradients computed using (a) the conventional Born modeling, and using (b) ENBS. Axes: distance [km], depth [km].
EDITED REFERENCES
REFERENCES
Irabor, K., and M. Warner, 2016, Reflection FWI: 81st Annual International Meeting, SEG, Expanded
Abstracts, 1136–1140, https://doi.org/10.1190/segam2016-13944219.1.
Liu, F., G. Zhang, S. A. Morton, and J. P. Leveille, 2011, An effective imaging condition for reverse-time
migration using wavefield decomposition: Geophysics, 76, S29–S39,
https://doi.org/10.1190/1.3533914.
Rocha, D., N. Tanushev, and P. Sava, 2016, Acoustic wavefield imaging using the energy norm:
Geophysics, 81, no. 4, S151–S163, https://doi.org/10.1190/geo2015-0486.1.

Born reflection kernel analysis and wave equation reflection traveltime inversion in elastic media
Tengfei Wang and Jiubing Cheng, Tongji University
SUMMARY

Elastic reflection waveform inversion (ERWI) utilizes the reflections to update the low and intermediate wavenumbers in the deeper part of the model. However, ERWI suffers from the cycle-skipping problem because its objective function is based on waveform residuals. Since traveltime information relates to the background model more linearly, we use the traveltime residuals as the objective function to update the background velocity model using wave equation reflected traveltime inversion (WERTI). The reflection kernel analysis shows that mode decomposition can suppress the artifacts in the gradient calculation. We design a two-step inversion strategy, in which PP reflections are first used to invert the P-wave velocity (Vp), followed by S-wave velocity (Vs) inversion with PS reflections. P/S separation of multi-component seismograms and spatial wave mode decomposition can reduce the nonlinearity of the inversion effectively by selecting suitable P or S wave subsets for hierarchical inversion. A numerical example on the Sigsbee2A model validates the effectiveness of the algorithms and strategies for elastic WERTI (E-WERTI).

INTRODUCTION

With the development of high-performance computing, people are paying more attention to elastic full waveform inversion (EFWI) to recover the elastic properties of the subsurface (Vigh et al., 2014). EFWI provides high-resolution model estimation but notoriously suffers from the same nonlinearities or cycle-skipping problems as in the acoustic case, and also from the trade-offs of multi-parameter inversion (Operto et al., 2013). In the absence of low-frequency data and/or good initial models, EFWI easily falls into local minima because of its inability to recover the low and intermediate wavenumber components of the model.

Xu et al. (2012) suggested using a reflection waveform inversion (RWI) method to suppress the nonlinearity in FWI, which aims to invert the long-wavelength components of the model by using the reflections predicted by a migration/demigration process. RWI relies heavily on an accurate reflectivity to generate reflections that can match the observed data. However, it is very challenging and also expensive to obtain a good reflectivity model through least-squares migration when the initial model is far away from the true value. Through minimizing a waveform misfit function in the data domain, the RWI method has been developed in several works (Wu and Alkhalifah, 2015; Zhou et al., 2015), and was recently extended to the elastic case by Guo and Alkhalifah (2016). The misfit function can also be built in the image domain in the manner of wave equation migration velocity analysis (WEMVA), which tries to maximize the energy at the zero-offset location in the extended image space (Biondi and Almomin, 2012; Sun and Symes, 2012). Raknes and Weibull (2016) developed the image-domain method in 3D elastic media. Wang et al. (2017) utilized the PS image to update the Vs model through mode decomposition. However, a major issue is that the extended-domain approach requires a huge computational cost, especially in the 3D case.

Traveltime information is more sensitive and more linearly related to the low-wavenumber components of the model. Therefore, traveltime inversion is more robust and helpful for building initial models containing long-wavelength components for conventional FWI (Chi et al., 2015; Luo et al., 2016). Ma and Hale (2013) introduced a wave equation reflected traveltime inversion (WERTI) method to build the low-wavenumber part of the model. Unfortunately, in the elastic case, traveltimes of a particular wave mode are difficult to extract because of the complicated mode conversions. The multi-parameter trade-offs increase the nonlinearity as well. Wang and Cheng (2017) obtained good results by utilizing wave mode decomposition to precondition the gradients in EFWI. Mode decomposition has proved to be an efficient tool for providing decomposed data for hierarchical strategies.

We calculate the reflection wavepath and its components for different wave modes in elastic media. The investigation of the reflection wavepath (kernel) shows the effectiveness of mode decomposition in suppressing the artifacts in the gradient calculation. P/S separation of multicomponent seismograms is applied to the observed and predicted reflection data to extract the individual traveltime residuals through DIW (Hale, 2013). Based on the analysis of the elastic reflection kernels and the separated traveltime residuals, we design a two-stage workflow to implement the E-WERTI method, in which the traveltime of PP reflections is first used to recover the background Vp model, followed by inverting the background Vs model through the traveltime of PS reflections. During the Vs inversion, we precondition the Vs gradient through spatial wave mode decomposition. Finally, the numerical example on the Sigsbee2A model demonstrates the robustness and validity of our E-WERTI method.

METHOD

Objective function and gradients of elastic WERTI

Assume that there is a perturbation $\delta c_{ijkl}$ in the background elastic medium $c_{ijkl}$; the background wavefields $u_i$ and perturbed wavefields $\delta u_i$ satisfy:

$$\rho \frac{\partial^2 u_i}{\partial t^2} - \frac{\partial}{\partial x_j} \left[ c_{ijkl} \frac{\partial u_k}{\partial x_l} \right] = f_i, \qquad (1)$$

and

$$\rho \frac{\partial^2 \delta u_i}{\partial t^2} - \frac{\partial}{\partial x_j} \left[ c_{ijkl} \frac{\partial \delta u_k}{\partial x_l} \right] = \frac{\partial}{\partial x_j} \left[ \delta c_{ijkl} \frac{\partial u_k}{\partial x_l} \right], \qquad (2)$$

where $\delta u_i$ can be seen as the demigrated reflection data using the image perturbation $\delta c_{ijkl}$ obtained from RTM or another imaging method. In WERTI, we aim to minimize the traveltime
differences between the observed data $d^o$ and the calculated data $d^c$. The objective function is then:

$$\tau(x_r, t; x_s) = \arg\min_{\tau} \left\| d^c(x_r, t; x_s) - d^o(x_r, t+\tau; x_s) \right\|^2,$$
$$E = \frac{1}{2} \int \tau^2(x_r, t; x_s)\, dt\, dx_r\, dx_s, \tag{3}$$

where the time differences $\tau(x_r, t; x_s)$ can be extracted through DIW. Following a derivation similar to that of Ma and Hale (2013), the gradient of equation (3) can be expressed as:

$$\frac{\partial E}{\partial c_{ijkl}} = -\int \left( \frac{\partial u_i}{\partial x_j} \frac{\partial \delta\psi_k}{\partial x_l} + \frac{\partial \delta u_i}{\partial x_j} \frac{\partial \psi_k}{\partial x_l} \right), \tag{4}$$

where $\psi_i$ and $\delta\psi_i$ are the adjoint wavefields satisfying:

$$\rho \frac{\partial^2 \psi_i}{\partial t^2} - \frac{\partial}{\partial x_j}\left( c_{ijkl} \frac{\partial \psi_k}{\partial x_l} \right) = \tau(x_r, t; x_s)\, \frac{\dot{d}_i^o(x_r, t+\tau; x_s)}{h_i(x_r, t; x_s)}, \tag{5}$$

and

$$\rho \frac{\partial^2 \delta\psi_i}{\partial t^2} - \frac{\partial}{\partial x_j}\left( c_{ijkl} \frac{\partial \delta\psi_k}{\partial x_l} \right) = \frac{\partial}{\partial x_j}\left( \delta c_{ijkl} \frac{\partial \psi_k}{\partial x_l} \right), \tag{6}$$

with $h_i(x_r, t) = \dot{d}_i^o(x_r, t+\tau)^2 - \ddot{d}_i^o(x_r, t+\tau)\,\bigl(d_i^c(x_r, t) - d_i^o(x_r, t+\tau)\bigr)$. The overdot denotes the time derivative. On the right-hand side (RHS) of equation (4), the first and second terms indicate the source and receiver parts of the reflection wavepath, respectively. We can then obtain the gradients in terms of the P- and S-wave velocities through the chain rule:

$$\frac{\partial E}{\partial V_p} = 2\rho V_p\, \delta_{ij}\delta_{kl}\, \frac{\partial E}{\partial c_{ijkl}},$$
$$\frac{\partial E}{\partial V_s} = 2\rho V_s \left( -2\delta_{ij}\delta_{kl} + \delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk} \right) \frac{\partial E}{\partial c_{ijkl}}. \tag{7}$$

Elastic Born reflection kernel analysis

The key point of reflection inversion is to calculate the reflection kernel. RWI and WERTI utilize different objective functions, which only induce different types of adjoint sources, but they share similar reflection kernels. Due to the complex mode conversions in elastic wavefields, the wavepaths of elastic reflections are far more complicated than those in the acoustic case. Here, we decompose the original kernel into four components, which represent cross-correlations of different wave modes.

For simplicity, we rewrite equation (4) as follows:

$$\nabla E(m_0) = -\int \left( u \cdot \delta\psi + \delta u \cdot \psi \right), \tag{8}$$

where $u$ and $\psi$ are the forward and adjoint background wavefields, and $\delta u$ and $\delta\psi$ are the forward and adjoint perturbed wavefields. The operator $\cdot$ denotes the cross-correlation between two wavefields. Note that equation (8) only shows the manner of cross-correlation schematically. The detailed formulas should be derived according to the parameter $m_0$ through the chain rule, just as in equation (7). Considering mode decomposition, the above formula can be decomposed into four types:

$$K^{MN}_{m_0} = -\int \left( u^M \cdot \delta\psi^N + \delta u^M \cdot \psi^N \right), \tag{9}$$

where $m_0 \in \{V_p, V_s\}$ and $M, N \in \{P, S\}$. $K^{MN}_{m_0}$ represents the cross-correlation between the M-mode forward wavefields and the N-mode adjoint wavefields. Note that it does not denote the kernel of the MN-mode data. For example, the reflection kernel of the PS data should be $(u^P \cdot \delta\psi^P + \delta u^S \cdot \psi^S)$, not $K^{PS}$.

To analyze the elastic reflection kernels, we calculate them with single-source, single-receiver data synthesized from a single reflector with a pure P-wave source. In the first model, we place a single Vp reflector in a homogeneous background (Figure 1a and 1b). Since there is no perturbation of Vs, only the PP reflection exists in the data, which is almost the same as in acoustic media. As shown in Figure 1c and 1d, the reflection kernel consists of two "rabbit ears", the source and receiver parts. We can see the migration impulse below the reflector due to the down-going perturbed wavefields (Zone D mentioned by Zhou et al. (2015)). The energy in the Vs kernel is focused on the edges of the wavepath rather than in the first Fresnel zone, as it is in the Vp kernel. One plausible reason is that the Vs kernel is relatively insensitive to the PP data generated by the Vp reflector.

Figure 1: Kernels with a single reflector in the Vp model. (a) Vp model, (b) Vs model, (c) $K_{V_p}$, (d) $K_{V_s}$.

In the second model, we use a Vs reflector (Figure 2a and 2b) to generate both PP and PS reflections. The Vp kernel excludes the S-wavefield automatically because of the divergence operator implied by the term $\delta_{ij}\delta_{kl}$ (equation (7)). However, $\delta\psi$ contains the non-physical converted SP wavefields generated by the back-propagated $\psi^S$ at the location of the reflector. These SP wavefields make the Vp kernel slightly different from that in Figure 1c. If we only back-propagate the PP data, the Vp kernel will be the same as in Figure 1c.

For the Vs kernel (Figure 2d), due to mode conversions, multi-wavepaths overlapping with each other make it much more
complicated and difficult to find the correct PS reflection kernel. Using this kernel directly in the gradient calculation will very likely cause severe cross-talk during reflection inversion. According to equation (9), we calculate the components of $K_{V_s}$, as shown in Figure 3. $K^{PP}_{V_s}$ is similar to $K^{PP}_{V_p}$ but with the opposite sign. $K^{PS}_{V_s}$ and $K^{SP}_{V_s}$ mainly consist of high-wavenumber energy, which is in fact the migration impulse of cross-mode wavefields. Most importantly, we expect to update the Vs model through the S-wavepath of the PS reflection, and $K^{SS}_{V_s}$ (Figure 3d) is exactly what we want. Therefore, we recommend using $K^{SS}_{V_s}$ to mitigate the cross-talk in the Vs gradient calculation.
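The mode-wise kernel components of equation (9) can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the P/S separation is done by a wavenumber-domain Helmholtz decomposition on a periodic 2D grid (valid for isotropic media), the cross-correlation is taken at zero lag with all wavefields stored on a common time axis, and the plain vector inner product stands in for the parameter-specific formulas that follow from equation (7). The function names `helmholtz_split` and `kernel_component` are hypothetical.

```python
import numpy as np

def helmholtz_split(ux, uz, dx=1.0, dz=1.0):
    """Split a 2D vector field into curl-free (P) and divergence-free (S) parts.

    Assumes a periodic grid with axis 0 = x and axis 1 = z.
    Returns ((px, pz), (sx, sz)).
    """
    nx, nz = ux.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)[:, None]
    kz = 2.0 * np.pi * np.fft.fftfreq(nz, d=dz)[None, :]
    k2 = kx**2 + kz**2
    k2[0, 0] = 1.0  # avoid 0/0 at k = 0; the mean ends up in the S part
    ux_hat = np.fft.fft2(ux)
    uz_hat = np.fft.fft2(uz)
    # Project onto the wavenumber direction: u_P = k (k . u) / |k|^2
    proj = (kx * ux_hat + kz * uz_hat) / k2
    px = np.real(np.fft.ifft2(kx * proj))
    pz = np.real(np.fft.ifft2(kz * proj))
    return (px, pz), (ux - px, uz - pz)

def kernel_component(u_m, dpsi_n, du_m, psi_n):
    """Schematic K^{MN} = -sum_t (u^M . dpsi^N + du^M . psi^N).

    Each argument has shape (nt, 2, nx, nz): time, vector component, x, z.
    The sum over axis 1 is the vector inner product; the sum over axis 0
    is the zero-lag cross-correlation in time.
    """
    integrand = np.sum(u_m * dpsi_n + du_m * psi_n, axis=1)
    return -np.sum(integrand, axis=0)
```

In this sketch, $K^{SS}_{V_s}$ would be obtained by passing only the S parts returned by `helmholtz_split` (applied to each time slice) to `kernel_component`, so that the cross-mode terms never enter the gradient.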

Figure 2: Kernels with a single reflector in the Vs model. (a) Vp model, (b) Vs model, (c) $K_{V_p}$, (d) $K_{V_s}$.

Figure 3: Four components of $K_{V_s}$. (a) $K^{PP}_{V_s}$, (b) $K^{PS}_{V_s}$, (c) $K^{SP}_{V_s}$, (d) $K^{SS}_{V_s}$.

Workflow of elastic WERTI

DIW extracts the reflection traveltime differences in the data domain. In the elastic case, the crossing points between different mode conversions would be singularities for DIW. Therefore, $\tau(x_r, t; x_s)$ would be inaccurate if the original multicomponent seismic data were used, which makes equation (7) difficult to implement. Besides, according to the reflection kernel analysis, injecting the PP or PS recordings individually can also mitigate the cross-talk in the gradient calculation. Thus, we decompose the observed and calculated data into P- and S-wave parts through P/S separation of multicomponent seismograms (Li et al., 2016). In this way, the traveltime differences can be decoupled into P- and S-wave parts, with which we can implement the elastic WERTI through a two-stage workflow, i.e., the P-wave stage followed by the S-wave stage.

In the first stage, we use the PP traveltimes to recover the Vp model. Since only the traveltime is considered in WERTI, plain ERTM is applied to obtain the image, rather than a least-squares approach (LSRTM) that seeks the correct reflectivity. Therefore, the objective function is:

$$E_{pp} = \frac{1}{2} \int \tau_{pp}^2(x_r, t; x_s)\, dt\, dx_r\, dx_s. \tag{10}$$

Thus, we can obtain the gradient with respect to Vp ($\partial E / \partial V_p$) by using only the P-wave seismograms to calculate the RHS of equation (5) and replacing $\delta c_{ijkl}$ with $\delta V_p$ in equations (2) and (6).

In the S-wave stage, the objective function becomes:

$$E_{ps} = \frac{1}{2} \int \tau_{ps}^2(x_r, t; x_s)\, dt\, dx_r\, dx_s. \tag{11}$$

However, the implementation is a little different from the previous stage. After the P-wave stage inversion, the background Vp should be well recovered. In most cases, Vp and Vs share the same structure in the subsurface. Therefore, we recommend using the well-imaged $\delta V_p$ instead of $\delta V_s$ to generate the PS reflections. Besides, to make sure that the reflected S-wavepath is used to update Vs, wave mode decomposition is implemented to calculate $K^{SS}_{V_s}$ with:

$$\frac{\partial E_{ps}}{\partial V_s} = -2\rho V_s \int \left( \frac{\partial \delta u_i^S}{\partial x_j} \frac{\partial \psi_k^S}{\partial x_l} \right) \left( \delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk} \right). \tag{12}$$

This is similar to the gradient preconditioning for EFWI proposed by Wang and Cheng (2017). The mode decomposition can mitigate parameter trade-offs and suppress artifacts in the Vs inversion.
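The traveltime objectives in equations (10) and (11) share the form of equation (3), and a rough sketch may help make the sign convention of $\tau$ concrete. The sketch below is not the authors' implementation: it replaces DIW, which estimates time-varying shifts, with a single cross-correlation lag per trace, and it replaces the integrals by a plain sum over traces. The function names are hypothetical.

```python
import numpy as np

def traveltime_shift(d_calc, d_obs, dt):
    """Estimate one shift tau per trace (a stand-in for DIW).

    tau maximizes sum_t d_calc(t) * d_obs(t + tau), which for traces of
    fixed energy corresponds to minimizing ||d_calc(t) - d_obs(t + tau)||^2.
    tau > 0 means the observed event arrives later than the calculated one.
    """
    # np.correlate(a, v, "full")[k + len(v) - 1] = sum_n a[n + k] * v[n]
    xcorr = np.correlate(d_obs, d_calc, mode="full")
    lag = np.argmax(xcorr) - (len(d_calc) - 1)
    return lag * dt

def traveltime_misfit(calc_gather, obs_gather, dt):
    """E = 1/2 * sum over traces of tau^2, for gathers of shape (ntraces, nt)."""
    taus = np.array([traveltime_shift(c, o, dt)
                     for c, o in zip(calc_gather, obs_gather)])
    return 0.5 * np.sum(taus**2), taus
```

Under these assumptions, the P-wave stage would call `traveltime_misfit` on the separated P-wave seismograms and the S-wave stage on the separated S-wave seismograms, so that the two sets of shifts are never mixed.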

Figure 4: Sigsbee2A model example. Top: true models of Vp (a) and Vs (b). Bottom: initial models of Vp (c) and Vs (d), linearly increasing with depth. Axes: Position (km) and Depth (km); color scales: Vp (km/s) and Vs (km/s).
NUMERICAL EXAMPLE

We select a part of the Sigsbee2A model (Figure 4a and 4b) to test the inversion algorithm and strategy. The Vs model is generated using a fixed Poisson's ratio. The initial models for E-WERTI are shown in Figure 4c and 4d. The linearly increasing initial Vp model is generally lower than the true model, while the initial Vs model is higher, and both are far from the true values. 36 shots are evenly deployed on the surface, and the receivers are fixed with a maximum offset of 4 km. The dominant frequency of the P-wave source is 15 Hz.
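For readers who want to reproduce a comparable setup, the two model-building steps mentioned above can be sketched as follows. For an isotropic medium with Poisson's ratio $\nu$, the standard relation is $V_s = V_p \sqrt{(1-2\nu)/(2(1-\nu))}$. The paper does not state the value of $\nu$ or the exact velocity bounds of the initial models, so the numbers below are purely illustrative assumptions, as are the function names.

```python
import numpy as np

def vs_from_poisson(vp, nu):
    """Vs from Vp for a fixed Poisson's ratio nu (isotropic medium)."""
    return vp * np.sqrt((1.0 - 2.0 * nu) / (2.0 * (1.0 - nu)))

def linear_model(v_top, v_bottom, nz, nx):
    """Laterally invariant model increasing linearly with depth; shape (nz, nx)."""
    profile = np.linspace(v_top, v_bottom, nz)
    return np.tile(profile[:, None], (1, nx))

# Illustrative values only (not taken from the paper).
vp_init = linear_model(1.5, 3.0, nz=201, nx=501)   # km/s
vs_example = vs_from_poisson(vp_init, nu=0.25)     # Vp / Vs = sqrt(3) for nu = 0.25
```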
Figure 5a and 5b show the inverted results of E-WERTI. After 40 iterations for each stage, WERTI provides a good recovery of the background information for both Vp and Vs. Nonetheless, on the right part of the model, the reflection coverage of the surface acquisition is insufficient for WERTI to rebuild the long-wavelength components. Using the inverted results of WERTI as starting models, we also perform conventional EFWI. As shown in Figure 5c and 5d, both the inverted Vp and Vs models are well reconstructed except in the right part.
Figure 5: Inverted results of WERTI and EFWI. (a) and (b) are the inverted Vp and Vs models obtained through two-stage elastic WERTI with the linearly increasing models as initial models. (c) and (d) are the inverted Vp and Vs models obtained through EFWI using (a) and (b) as starting models.

CONCLUSIONS

Reflection traveltime inversion minimizes only traveltime misfits, which are more sensitive to, and more linearly related to, low-wavenumber model perturbations. The kernel analysis of different wave modes shows that mode decomposition can suppress artifacts and recover the correct reflection wavepath in the gradient calculation. With the aid of DIW and P/S separation of 3C seismograms, we can obtain the traveltime differences of the PP and PS reflections separately. To build the long-wavelength components of the model, we introduce a two-stage WERTI workflow that uses first the PP and then the PS reflections, through which the nonlinearity of reflection inversion is reduced effectively. In the second stage, wave mode decomposition is introduced in the calculation of the Vs gradient to mitigate the trade-off between Vp and Vs. The Sigsbee2A model example shows that, even starting from a poor initial model, the two-stage E-WERTI can provide a reliable starting model for conventional EFWI.

ACKNOWLEDGEMENT

This work is supported by the National Natural Science Foundation of China (NO.41474099, 41674117 & 41630964). This paper is also based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under award NO. 2230. We appreciate the open-source package DENISE from https://github.com/daniel-koehn/ and the Mines Java Toolkit from https://github.com/dhale. We thank Tariq Alkhalifah (KAUST), Qiang Guo (KAUST), Zedong Wu (KAUST), Chenlong Wang (Tongji University) and Benxin Chi (Los Alamos) for their useful advice.

REFERENCES
Biondi, B., and A. Almomin, 2012, Tomographic full waveform inversion (TFWI) by combining full
waveform inversion with wave-equation migration velocity analysis: SEG.
Technical Program Expanded Abstracts 2012: Society of Exploration Geophysicists.
80, no. 4, R189–R202, http://dx.doi.org/10.1190/geo2014-0345.1.
Guo, Q., and T. Alkhalifah, 2016, A nonlinear approach of elastic reflection waveform inversion: SEG,
1421–1425.
Li, Z., X. Ma, C. Fu, B. Gu, and G. Liang, 2016, Frequency wavenumber implementation for P- and S-
wave separation from multi-component seismic data: Exploration Geophysics, 47, 32,
http://dx.doi.org/10.1071/EG14047.
Luo, Y., Y. Ma, Y. Wu, H. Liu, and L. Cao, 2016, Full-traveltime inversion: Geophysics, 81, no. 5,
R261–R274, http://dx.doi.org/10.1190/geo2015-0353.1.
waveform inversion: Geophysics, 78, no. 6, R223–R233, http://dx.doi.org/10.1190/geo2013-
0004.1.
Operto, S., Y. Gholami, V. Prieux, A. Ribodetti, R. Brossier, L. Metivier, and J. Virieux, 2013, A guided
tour of multiparameter full-waveform inversion with multicomponent data: From theory to
practice: The Leading Edge, 32, 1040–1054, http://dx.doi.org/10.1190/tle32091040.1.
Raknes, E. B., and W. Weibull, 2016, Combining wave-equation migration velocity analysis and full-
waveform inversion for improved 3D elastic parameter estimation: 86th Annual International
Meeting, SEG, Expanded Abstracts, 1320–1324, http://dx.doi.org/10.1190/segam2016-
13858670.1.
Sun, D., and W. W. Symes, 2012, Waveform inversion via nonlinear differential semblance optimization:
Vigh, D., K. Jiao, D. Watts, and D. Sun, 2014, Elastic full waveform inversion application using
multicomponent measurements of seismic data collection: Geophysics, 79, no. 2, R63–R77,
Wang, C., W. Weibull, J. Cheng, and B. Arntsen, 2017, Automatic shear-wave velocity analysis with
elastic reverse time migration: 79th EAGE Conference and Exhibition 2017, Expanded Abstracts.
Wang, T., and J. Cheng, 2017, Elastic full waveform inversion based on mode decomposition: the
approach and mechanism: Geophysical Journal International, 209, 606–622,
http://dx.doi.org/10.1093/gji/ggx038.
Wu, Z., and T. Alkhalifah, 2015, Simultaneous inversion of the background velocity and the perturbation
in full-waveform inversion: Geophysics, 80, no. 6, R317–R329,
Xu, S., D. Wang, F. Chen, G. Lambare, and Y. Zhang, 2012, Inversion on reflected seismic wave: 82nd
Zhou, W., R. Brossier, S. Operto, and J. Virieux, 2015, Full waveform inversion of diving & reflected
Geophysical Journal International, 202, 1535–1554, http://dx.doi.org/10.1093/gji/ggv228.