
132 IEEE JOURNAL ON MULTISCALE AND MULTIPHYSICS COMPUTATIONAL TECHNIQUES, VOL. 5, 2020

Bifidelity Gradient-Based Approach for Nonlinear Well-Logging Inverse Problems
Han Lu, Qiuyang Shen, Jiefu Chen, Xuqing Wu, Xin Fu, Mohammad Khalil, Cosmin Safta, and Yueqin Huang

Abstract—Solving a nonlinear inverse problem is challenging in computational science and engineering. Sampling-based methods require a large number of model evaluations. Gradient-based methods require fewer model evaluations but find only local minima. Multifidelity optimization combines the low-fidelity model and the high-fidelity model to achieve both high accuracy and high efficiency. In this article, we present a bifidelity approach to solve nonlinear inverse problems. In the bifidelity inversion method, the low-fidelity model is used to acquire a good initial guess, and the high-fidelity model is used to locate the global minimum. Combined with a multistart optimization scheme, the proposed approach significantly increases the probability of finding the global minimum for nonlinear inverse problems with many local minima. The method is tested on two toy problems and then applied to an electromagnetic well-logging inverse problem, which is difficult to solve using traditional gradient-based methods. The bifidelity method provides promising inversion results and can be easily applied to traditional gradient-based methods.

Index Terms—Bifidelity, gradient-based inversion, inverse problem, multifidelity, polynomial chaos expansion (PCE), well logging.

I. INTRODUCTION

Inverse problems are ubiquitous in scientific and engineering fields, such as geophysics, astronomy, computer vision, and medical imaging. Most of the time, these inverse problems are ill-posed and nonlinear and, thus, cannot be solved well using deterministic inversion methods [1]. Bayesian inference approaches, such as Markov chain Monte Carlo (MCMC) sampling [2], make it possible to find the global optimum. The major challenge of sampling-based methods is the computational burden induced by a large number of evaluations of the forward model. When the forward simulation is CPU intensive, sampling-based methods become computationally prohibitive. A forward model whose outputs satisfy the accuracy requirement of the task at hand is normally called a high-fidelity model. A low-fidelity model, or surrogate model, is constructed as an approximation of the high-fidelity model. It is cheaper to evaluate but not necessarily accurate. Fast surrogate models are also valuable for sampling-based methods because they dramatically reduce computational costs. Many surrogate modeling methods have been developed and successfully employed for many optimization tasks [9], [10]. These include polynomial chaos expansion (PCE) [3], Gaussian process regression [4], support vector machines [5], [6], and other simplified models [7], [8]. However, simply replacing the high-fidelity model with the low-fidelity model can lead to biased parameter estimates. Multifidelity optimization methods take advantage of both low- and high-fidelity models. They have drawn great attention because they can achieve the desired accuracy and efficiency.

Many previous works implemented the multifidelity model by adapting the low-fidelity model during the inversion process. These methods rely on a large number of low-fidelity evaluations along with a small number of high-fidelity evaluations to build a correction for the low-fidelity model. The adjusted model then serves as a multifidelity approximation to the high-fidelity model. For example, in [11], the PCE surrogate modeling error is modeled by another PCE during the MCMC sampling process.
The low-fidelity model can also be corrected by a multiplicative and/or additive term (see, e.g., [12]–[14]).

Some works also use a model management strategy based on filtering, in which the high-fidelity model is invoked after a low-fidelity filter has been evaluated. This strategy is mainly employed in the context of statistical inference [15]–[19]. However, these methods may have limited applicability for two reasons. First, they are sensitive to large modeling errors introduced by the low-fidelity model [11]. Second, the computational budget might still be unacceptable for high-dimensional (d > 10) problems [20]. For example, the logging-while-drilling (LWD) electromagnetic (EM) resistivity measurement inverse problem shown in Section V is high dimensional (usually with more than ten unknowns), is strongly nonlinear, and requires real-time inversion. Our tests demonstrate that a nine-parameter LWD forward model is too hard to approximate at an affordable computational cost in the context of a multifidelity statistical inference method.

Manuscript received March 16, 2020; revised May 16, 2020 and June 27, 2020; accepted July 5, 2020. Date of publication July 7, 2020; date of current version July 23, 2020. This work was supported by the U.S. Department of Energy, Office of Science, Office of Advanced Science Computing Research, under Award DE-SC0017033. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under Contract DE-NA0003525. (Corresponding author: Jiefu Chen.)

Han Lu, Jiefu Chen, and Xin Fu are with the Department of Electrical and Computer Engineering, University of Houston, Houston, TX 77004 USA (e-mail: hlu10@central.uh.edu; jchen82@central.uh.edu; xfu8@central.uh.edu).
Qiuyang Shen is with the Cyentech Consulting LLC, Houston, TX 77057 USA (e-mail: qiuyangshen@cyentech.com).
Xuqing Wu is with the Department of Information and Logistics Technology, University of Houston, Houston, TX 77004 USA (e-mail: xwu8@central.uh.edu).
Mohammad Khalil and Cosmin Safta are with the Sandia National Laboratories, Livermore, CA 94550 USA (e-mail: mkhalil@sandia.gov; csafta@sandia.gov).
Yueqin Huang is with the Cyentech Consulting LLC, Cypress, TX 77429 USA (e-mail: yueqinhuang@cyentech.com).
Digital Object Identifier 10.1109/JMMCT.2020.3007839

2379-8793 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: University of Houston. Downloaded on August 11,2020 at 19:44:30 UTC from IEEE Xplore. Restrictions apply.
LU et al.: BIFIDELITY GRADIENT-BASED APPROACH FOR NONLINEAR WELL-LOGGING INVERSE PROBLEMS 133

Unlike previous works, this article develops a bifidelity approach designed for gradient-based methods. The motivation is twofold. First, an accurate surrogate is not always available. Second, sampling-based methods are not computationally efficient enough for inverse problems in some real-time applications, even with low-fidelity models. To address these problems, our bifidelity approach is summarized as follows.
1) The surrogate model is used in a gradient-based optimizer. The result is taken as the initial guess for a subsequent gradient-based optimization that uses the high-fidelity model. This two-step approach is less sensitive to the modeling error of the low-fidelity model. The surrogate model smooths out local minima of the high-fidelity model and is much cheaper to construct.
2) To reduce the probability of being stuck in local minima, we use a multistart framework for the deterministic inversion. The multistart framework is thoroughly described in [21]. It is efficient for problems where solutions can be easily constructed but which have multiple local minima.

Although surrogate modeling methods have been widely explored in recent years, no surrogate model works well for all kinds of applications. In this article, we use PCEs as low-fidelity models of the EM resistivity LWD forward model for the purpose of bifidelity inversion. We chose PCE as the surrogate model for two reasons: 1) PCE is good at capturing global trends in the high-fidelity model; and 2) the input dimensionality is too high for many other surrogate models. The original PCE model also suffers from the curse of dimensionality: a prohibitively large number of sample points is required when the problem dimensionality is high. To combat this challenge, Sargsyan et al. developed a methodology to construct a sparse PCE surrogate [22]. Their method uses sparse Bayesian learning to learn and retain the most relevant polynomial basis terms. We adopt this approach to construct a sparse PCE surrogate of the EM resistivity LWD forward model for multilayer earth models. The proposed approach is tested on 2-D and 3-D Shekel functions as well as on multilayer LWD inverse problems. LWD inverse problems usually have many locally optimal solutions because the forward model is nonlinear. Given the high dimensionality and the time requirements of LWD inverse problems, deterministic gradient-based inversion methods are still widely used [23]–[26]. To the best of our knowledge, this is the first work that uses multifidelity optimization for LWD inverse problems. Test results show that our approach significantly improves the inversion accuracy with negligible computational overhead.

II. BACKGROUND

In this section, we first describe gradient-based optimization methods for inverse problems. We then introduce a PCE surrogate construction method that uses sparse learning in a Bayesian framework.

A. Inverse Problems

In this article, we are interested in estimating a set of unknown model parameters $x \in \mathbb{R}^n$ from indirect measurements $y \in \mathbb{R}^m$. Let $f(x)$ be a nonlinear forward function that yields the responses for given model parameters. The inverse problem is then defined as follows:

$$\arg\min_{x \in \mathbb{R}^n} \lVert f(x) - y \rVert_2^2. \tag{1}$$

This is an unconstrained nonlinear least-squares optimization problem. Many numerical methods have been developed to solve such problems by iteratively updating the model parameters to minimize the misfit between the observation $y$ and the model prediction $f(x)$. To determine the direction of the model update, the derivative of the objective function must be calculated at each iteration. For example, consider the objective function $F(x) = \lVert f(x) - y \rVert_2^2$. At the $i$th iteration, the gradient-descent algorithm [27] moves the model parameters $x$ in the direction of the negative gradient of $F$ at $x_{i-1}$, i.e., $-\nabla F(x_{i-1})$:

$$x_i = x_{i-1} - \gamma \nabla F(x_{i-1}) \tag{2}$$

where the positive step size $\gamma$ controls the convergence speed and the accuracy of the algorithm. For multivariate functions, the gradient $\nabla F(x_{i-1})$ is computed from the Jacobian matrix $J$ of $f$ at $x_{i-1}$. Under certain assumptions on the function $F$ and with particular choices of $\gamma$, the algorithm is guaranteed to converge to a local minimum. If $F$ is twice differentiable, Newton's method [28] can minimize the objective function more efficiently with the following updating rule:

$$x_i = x_{i-1} - \left(\nabla^2 F(x_{i-1})\right)^{-1} \nabla F(x_{i-1}) \tag{3}$$

where $\nabla^2 F(x_{i-1})$ is the Hessian matrix of $F$ at $x_{i-1}$. Unlike (2), this update determines the step size automatically. However, if the second-order derivative is computationally expensive, computing it at each iteration becomes inefficient. The Gauss–Newton algorithm modifies Newton's method by replacing the second derivative with the approximation

$$\nabla^2 F(x_i) \approx J(x_i)^T J(x_i). \tag{4}$$

The Levenberg–Marquardt algorithm (LMA) [29], [30], also known as the damped least-squares method, is a hybrid approach. It adds a damping term $\lambda I$ to the Gauss–Newton updating rule:

$$x_i = x_{i-1} - \left(J(x_{i-1})^T J(x_{i-1}) + \lambda I\right)^{-1} \nabla F(x_{i-1}) \tag{5}$$

Small values of $\lambda$ result in a Gauss–Newton update, and large values of $\lambda$ result in a gradient-descent update. The damping factor is adjusted at each iteration. As a result, the solution typically reaches the local minimum faster. In this work, we use the LMA to solve the inverse problems.

B. PCE Surrogate

In this section, we describe the polynomial chaos (PC) approximation to the forward model $y = f(x)$, where $x \in \mathbb{R}^n$ is an $n$-dimensional input vector and $y$ is a scalar output. In PC theory [31], [32], both the input parameters and the output of interest are represented as a series of orthogonal polynomials $\Psi_k(\xi)$. These polynomials are functions of standard independent identically distributed (i.i.d.) random variables $\xi \in \mathbb{R}^{\tilde{n}}$. As such, the input parameters can be written


as follows:

$$x_i \cong \sum_{k=0}^{K_{\mathrm{in}}-1} x_{i,k} \Psi_k(\xi) \tag{6}$$

where $x_{i,k}$, for $i = 1, 2, \ldots, n$ and $k = 0, 1, \ldots, K_{\mathrm{in}}-1$, are the expansion coefficients of the PCE for the input parameter $x$. $K_{\mathrm{in}}$ is the number of basis terms in the input PCEs. It can be fixed at 2 so that $x$ and $\xi$ have a linear relationship. We go through this exercise to construct a probabilistically consistent functional form for the dependence of the model output on the input parameters. The output is written as

$$y \cong \sum_{k=0}^{K-1} c_k \Psi_k(\xi) \tag{7}$$

where $c_k$, for $k = 0, 1, \ldots, K-1$, are the PCE coefficients for the output $y$. The value of $K$ is chosen according to the modeling accuracy requirements.

The type of polynomials $\Psi_k(\xi)$ is chosen to be consistent with the distribution of $\xi$. For example, Hermite polynomials are used for normally distributed random variables, and Legendre polynomials are used for uniformly distributed random variables. For surrogate construction, one can use the first-order polynomial of $\xi_i$ to represent the input parameter $x_i$. An available dataset of input–output pairs $\{x_i, y_i\}_{i=1}^N$ can then be linearly transformed to $D = \{\xi_i, y_i\}_{i=1}^N$. What remains is to determine $c_k$ in (7) from the given data.

There are intrusive and nonintrusive methods [33] for calculating the polynomial coefficients $c_k$. The intrusive approach requires reformulating the solution method and rewriting the code, which is not always practical. The nonintrusive method, in contrast, does not require an explicit representation of the forward model but treats it as a black box. Nonintrusive methods attempt to solve the following explicit problem:

$$\sum_{k=0}^{K-1} c_k \Psi_k(\xi) \cong f\left(\sum_{k=0}^{K_{\mathrm{in}}-1} x_{i,k} \Psi_k(\xi)\right) \tag{8}$$

This problem can be solved using nonintrusive spectral projection (NISP) [34] or regression-based methods [35] to compute the coefficients $c_k$. NISP provides the coefficients as

$$c_k = \frac{\langle y, \Psi_k \rangle}{\langle \Psi_k, \Psi_k \rangle} \tag{9}$$

where $\langle X(\xi), Y(\xi) \rangle$ is the inner product of two functions $X$ and $Y$ with respect to the probability density function of $\xi$. In practice, $\langle \Psi_k, \Psi_k \rangle$ is known exactly, and $\langle y, \Psi_k \rangle$ can be calculated by numerical integration:

$$\langle y, \Psi_k \rangle = \int y \, \Psi_k(\xi) \, p(\xi) \, d\xi. \tag{10}$$

Deterministic (Gauss quadrature) or random (Monte Carlo) sampling can be used to compute the integral. However, for high-dimensional problems, both methods require a large number of simulation runs, even with sparse sampling techniques. This renders them impractical for the inversion problem under consideration.

Alternatively, (7) can be treated as a regression model, and the coefficients $c_k$ can be computed using regression methods. In the following subsection, we demonstrate how to solve this regression problem efficiently with Bayesian compressive sensing (BCS).

C. BCS for Polynomial Regression

From the Bayesian point of view, the solution to the problem of determining the coefficients $c_k$ in (7) is a posterior probability density function $q(c)$:

$$q(c) \propto L_D(c)\, p(c) \tag{11}$$

where $L_D(c)$ is the likelihood of $c$ and $p(c)$ is the prior distribution of $c$. The likelihood measures how well the corresponding surrogate fits the given data $D$. Given a zero-mean, normally distributed noise model $\epsilon$ with standard deviation $\sigma$, we can write the likelihood as

$$L_D(c) = \left(2\pi\sigma^2\right)^{-N/2} \exp\left(-\sum_{i=1}^{N} \frac{\left(y_i - y_c(\xi_i)\right)^2}{2\sigma^2}\right) \tag{12}$$

where $y_c$ denotes the PCE surrogate (7) with coefficients $c$. This problem becomes intractable for high-dimensional problems, since the number of unknown coefficients grows rapidly with increasing dimensionality. In many applications, most of the basis functions in the PCE have negligible impact, i.e., the vector $c$ is sparse. It is therefore efficient and reasonable to compute only the most significant terms of the PCE, both in constructing and in evaluating the PCE surrogates. To this end, the authors of [22] proposed using BCS to find a sparse representation of $c$ given the available data [36], [37]. The key to inferring a sparse PCE is to impose a prior distribution on $c$ that induces sparsity. A commonly used sparsity-inducing prior is the Laplace prior:

$$p(c) = \left(\frac{\alpha}{2}\right)^{K} \exp\left(-\alpha \lVert c \rVert_1\right). \tag{13}$$

The vector $c$ that maximizes the posterior $q(c)$ in (11) coincides with the solution of the classical compressive sensing problem

$$\arg\max_{c} \left(\log L_D(c) - \alpha \lVert c \rVert_1\right) \tag{14}$$

where the regularization term $\alpha \lVert c \rVert_1$ corresponds to the sparsity-inducing prior distribution. The positive parameter $\alpha$ is a user-defined value that controls the level of sparsity. However, the Laplace distribution is not conjugate to the Gaussian likelihood and thus does not allow a tractable Bayesian analysis. Sparse Bayesian learning addressed this issue, particularly with the relevance vector machine [38]. Instead of using the Laplace prior directly, it constructs a hierarchical prior. This hierarchy places a Gaussian prior distribution on $c$,

$$p(c_k \mid s_k^2) = \frac{1}{\sqrt{2\pi s_k^2}} \exp\left(-\frac{c_k^2}{2 s_k^2}\right) \tag{15}$$

and a gamma prior on the hyperparameter $s_k^2$,

$$p(s_k^2 \mid \alpha^2) = \frac{\alpha^2}{2} \exp\left(-\frac{\alpha^2 s_k^2}{2}\right) \tag{16}$$


with a resulting (marginalized) Laplace prior density

$$p(c \mid \alpha^2) = \prod_{k=0}^{K-1} \int_0^\infty p(c_k \mid s_k^2)\, p(s_k^2 \mid \alpha^2)\, ds_k^2 = \prod_{k=0}^{K-1} \frac{\alpha}{2} e^{-\alpha |c_k|}. \tag{17}$$

This procedure has been implemented in the Bayesian LASSO method [39]. For implementation details, see [22], [37], and [40]. Finding the PCE coefficients $c$ has now become an optimization problem. The goal is to find the hyperparameters $\sigma^2$, $s^2$, and $\alpha$ that maximize the evidence, or integrated likelihood:

$$E(\sigma^2, s^2, \alpha) = \int_{\mathbb{R}^K} L_D(c; \sigma^2)\, p(c \mid s^2)\, p(s^2 \mid \alpha)\, p(\alpha)\, p(\sigma^2)\, dc \propto p(\alpha)\, p(\sigma^2)\, p(s^2 \mid \alpha)\, \sigma^{-1} |C|^{-1/2} \exp\left(-\frac{1}{2} y^T C^{-1} y\right) \tag{18}$$

where $C = I + \Psi S^{-1} \Psi^T$. Here, $\Psi$ is an $N \times K$ projection matrix with entries $\Psi_{ik} = \Psi_k(\xi_i)$, and $S = \mathrm{diag}(\sigma/s_0^2, \ldots, \sigma/s_{K-1}^2)$. In practice, for many basis terms, the inverse variance $1/s_k^2$ that maximizes (18) grows indefinitely, i.e., $s_k^2 \to 0$. These terms are purged from the basis set.

Let $K'$ denote the number of retained basis functions, reindexed by $k = 0, 1, \ldots, K'-1$. One obtains a Gaussian posterior distribution for $c$ with mean and covariance

$$\mu = \sigma^{-2} \Sigma \Psi^T y, \qquad \Sigma = \sigma^2 \left(\Psi^T \Psi + S\right)^{-1} \tag{19}$$

where $\Psi$ is now the $N \times K'$ projection matrix with entries $\Psi_{ik} = \Psi_k(\xi_i)$, and $S = \mathrm{diag}(\sigma/s_0^2, \ldots, \sigma/s_{K'-1}^2)$. Finally, $\mu$ is used as the coefficient vector of the sparse PCE surrogate:

$$y \cong \sum_{k=0}^{K'-1} \mu_k \Psi_k(\xi). \tag{20}$$

III. BIFIDELITY APPROACH FOR GRADIENT-BASED OPTIMIZATION

The major drawback of gradient-based methods is that they are inherently local. However, as pointed out in [41], one should first consider multistart gradient-based optimization for differentiable problems. It is easy to implement and can exploit derivative information. With many randomly distributed initial guesses, it is likely that some of them lie close to the global minimum and will eventually converge to it. For high-dimensional problems, however, the number of initial guesses needed to reach the global minimum grows exponentially. This puts heavy pressure on computational resources. Empirical observation suggests that a low-order surrogate with large modeling error may not mislead the global search [42], because it can smooth out the high-fidelity model, as illustrated in Fig. 1. In this article, we propose a bifidelity approach that uses a sparse PCE surrogate to help find initial guesses that may converge to the global minimum. The approach then uses the high-fidelity model for an accurate inversion. PCE is a spectral expansion approach that represents quantities of interest as a series of orthogonal polynomials of standard i.i.d. random variables, as introduced in Section II-B. In this work, we use LMA for the inversion task because of its good performance on nonsmooth objective functions. Any other gradient-based algorithm can be substituted, depending on the behavior of the objective function. The workflow of the proposed method is shown in Fig. 2.

Fig. 1. Example of a surrogate model that has large modeling error at some points but is helpful for global search.

Before performing the bifidelity inversion, the low-fidelity model must be constructed from the high-fidelity model. As shown in Fig. 2, the surrogate replaces the high-fidelity model in the first stage of the inversion. The surrogate used in step 1 captures the global trends of the response surface, so the solution at this stage lies closer to the global optimum. In step 2, the LMA inversion is performed with the high-fidelity model, initialized with the solutions from the previous stage. Finally, the solution with the smallest data misfit is selected as the final result. Single-fidelity inversion uses only the high-fidelity model. Compared with it, this bifidelity strategy has a better chance of escaping local minima and reaching the global minimum. The approach uses the multistart optimization scheme, and the low-fidelity model greatly reduces the number of initial models required.

IV. NUMERICAL TESTS

For the purpose of visualization, we test the performance of the proposed approach on 2-D and 3-D Shekel functions. The Shekel function is a multidimensional, multimodal, and continuous function that is commonly used to test optimization algorithms [43]. As shown in Fig. 3(a), the Shekel function has ten local minima. The global minimum has a value of −11.03 and is located at (4, 4). We construct PCE surrogates as low-fidelity models of the Shekel functions for the purpose of bifidelity inversion. The 2-D Shekel function is cheap to evaluate and contains only two input variables. We therefore construct full PCE surrogate models of orders 30 and 15 using the nonintrusive method introduced in Section II-B. The samples used for PCE construction are generated with the uniform Legendre quadrature rule [35]. The 30th- and 15th-order surrogate models are shown in Fig. 3(b) and (c), respectively. When the PCE order is increased, more local


minima of the high-fidelity model are captured. Although the approximation error is large at some points for both surrogate models, the response surface is smoothed.

Fig. 2. Bifidelity LMA inversion workflow.

Fig. 3. (a) Original response: the 2-D Shekel function with ten local minima. (b) 30th-order PCE surrogate of the 2-D Shekel function. (c) 15th-order PCE surrogate of the 2-D Shekel function.

Fig. 4. Inversion results of the 2-D Shekel function with 10 initial models. (a) Inversion with the high-fidelity model. (b) Bifidelity inversion; PCE order = 30. (c) Bifidelity inversion; PCE order = 15.

We first perform LMA inversion on the 2-D Shekel problem with only the high-fidelity model, which we refer to as the single-fidelity inversion. Ten different initial guesses are generated by the Latin hypercube sampling (LHS) algorithm, with both input parameters in the range [−15, 15]. The inversion results are shown in Fig. 4(a). Only one model converges to the global minimum, and the rest converge to the other four local minima. Fig. 4(b) shows the inversion results of the proposed bifidelity method with a 30th-order PCE surrogate model. Compared with the high-fidelity case, more initial models converge to the global minimum. However, some initial models are still trapped in local minima. Fig. 4(c) shows the result of the bifidelity inversion with a 15th-order PCE surrogate, where all initial models converge to the global minimum. This example demonstrates that the fidelity of the surrogate model can have a great effect on the bifidelity inversion result. The best choice of low-fidelity model can differ between applications. For PCE surrogates, we suggest that one start with a surrogate model of higher fidelity and then


gradually reduce the fidelity to find a proper surrogate model for the bifidelity inversion.

Next, we increase the number of initial models to 100. The results are shown in Fig. 5. In the single-fidelity results shown in Fig. 5(a), only 7% of the models converge to the global minimum. Many models converge to the local minima located at (3, 6), (5, 3), (6, 6), and (7, 3.5). The bifidelity inversion result with a 30th-order PCE surrogate is shown in Fig. 5(b). In this case, 25% of the models converge to the global minimum. When the 15th-order PCE surrogate is used for the bifidelity inversion, all of the models converge to the global minimum, as shown in Fig. 5(c). The examples in Figs. 4 and 5 demonstrate that a proper low-fidelity surrogate model can help the bifidelity inversion method avoid local minima.

Fig. 5. Inversion results of the 2-D Shekel function with 100 initial models. (a) Inversion with the high-fidelity model. (b) Bifidelity inversion; PCE order = 30. (c) Bifidelity inversion; PCE order = 15.

Based on the observations from the 2-D Shekel example, we construct a full PCE surrogate of order 15 for the 3-D Shekel function. The 3-D Shekel problem is more difficult to solve because its dimensionality is higher, so more initial models are needed to cover the parameter space. Fig. 6 shows the model response of the 3-D Shekel function along the second and third coordinates, with the first coordinate fixed at 4. We generate 100 initial models with the LHS algorithm. When only the high-fidelity model is used, just two of them converge to the global minimum. With the help of the low-fidelity model, 83 models converge to the global minimum. As shown in Fig. 7, the low-fidelity model smooths out most of the local minima; only the one located at (5, 3, 5) remains.

Fig. 6. Model responses when fixing the first coordinate of (a) the 3-D Shekel model and (b) the PCE surrogate of the 3-D Shekel model.

V. APPLICATION TO THE LWD INVERSE PROBLEM

In this section, we demonstrate the bifidelity gradient-based inversion approach by solving resistivity LWD inverse problems. The objective is to infer the earth model parameters from downhole LWD measurements. Examples of these parameters are the resistivity of each formation layer and the distances from the logging tool to the formation interfaces. The recent development of azimuthal EM resistivity LWD tools has greatly extended the depth of investigation. This, in turn, has increased the number of unknown parameters in the LWD inverse problem. Meanwhile, EM responses are highly nonlinear because of multiple transmissions and reflections between formation interfaces. Because the problem is inherently high-dimensional and ill-posed, one needs to perform a large number of independent gradient-based


optimization runs with different initial models to obtain an acceptable result. The problem can also be solved by statistical inference methods [44], [45]; however, their computational cost is too high for real-time use.

Fig. 7. Inversion results of the 3-D Shekel function with 100 initial models. (a) and (b) Inversion results along the first and second coordinates. (c) and (d) Inversion results along the second and third coordinates. (e) and (f) Inversion results along the second and third coordinates.

The remainder of this section describes the implementation details of the bifidelity LWD inversion. These include the forward model of the ultradeep EM resistivity LWD tool, the sparse PCE surrogate construction, and the bifidelity inversion process.

A. High-Fidelity and Low-Fidelity Models

In the LWD inversion application, the high-fidelity model is the forward model that simulates the responses of azimuthal


EM LWD tools in 1-D earth models. For an $n$-layer earth model, the input consists of $n$ resistivity values and $n-1$ depths to the boundaries. Multiple transmitter–receiver pairs are placed at different locations on the tool and operate at multiple frequencies (2, 6, and 24 kHz are used here). In this article, we synthesize an azimuthal EM LWD tool that generates 72 measurements at each logging point. We build a surrogate model for each of the 72 signals, i.e., the low-fidelity model consists of 72 independent single-output PCEs. Seven-layer earth models are assumed in the following numerical studies. With 13 input parameters, a full PCE representation is infeasible at an acceptable computational cost. For example, constructing a fifth-order PCE surrogate model requires $(5+1)^{13} \approx 1.3 \times 10^{10}$ quadrature points. In this work, Bayesian sparse learning is used to construct sparse PCEs for the high-dimensional LWD inverse problems.

Resistivity values range from 0.1 to 300 Ω·m. For surrogate modeling, they are first transformed to a logarithmic scale and then rescaled to [0, 1]. The outputs are linearly rescaled to [0, 1]. Given the detection range of the ultradeep directional logging tool, we tested earth models with seven layers in this investigation. As the drilling tool penetrates bed boundaries, the model response can become highly nonlinear and difficult to capture with a global PCE approximation. To avoid this situation, we split the training data into seven subsets that correspond to seven scenarios (i.e., the tool is located in the first layer, the tool is located in the second layer, and so forth). The following steps build a piecewise PCE surrogate for the LWD forward model.
1) Split the training dataset into nonoverlapping subsets $D_i$, $i = 1, 2, \ldots, 7$, where $x \in D_i$ means that the tool is in the $i$th layer.
2) Construct a sparse PCE $g_{ij}(x)$ from each dataset for each output individually ($i = 1, 2, \ldots, 7$; $j = 1, 2, \ldots, 72$).
3) Declare the piecewise PCE surrogates

$$g_i(x) = \left(g_{i1}(x), g_{i2}(x), \ldots, g_{i72}(x)\right) \quad \text{if } x \in D_i \quad (i = 1, 2, \ldots, 7). \tag{21}$$

The data in each subset consist of $5 \times 10^4$ samples generated by the LHS algorithm. The surrogate construction takes 8 h in total using six 64-bit Intel Xeon CPU E5-2650 v4 @ 2.20-GHz processors. The high-fidelity model takes $4 \times 10^{-2}$ s for

TABLE I
LAYER RESISTIVITIES AND THICKNESSES OF THE SYNTHETIC EARTH MODEL

Fig. 8. Tested seven-layer model. The color bar represents the resistivity value of each layer.

the tool keeps drilling down with a dip angle of 82° and travels through six boundaries. The LWD data are collected every 10 ft, and the total working region extends 800 ft horizontally. The problem therefore consists of 80 continuous 1-D LWD inverse problems, which we solve independently. Fig. 8 shows the structure of the earth model and the drilling trajectory, represented by the black dotted line.

We perform LMA inversion on each of the 80 LWD inverse problems using both the single-fidelity and the bifidelity inversion methods. The initial models are randomly generated by LHS over the same parameter range as the surrogate training data. For fairness, both approaches use the same initial models. The results are shown in Fig. 9. In the single-fidelity inversion, the resistivity and the locations of layer boundaries near the borehole are well inferred in most cases. However, the parameters of layers away from the borehole cannot be accurately estimated because of local minima. There are two reasons. First, signals are reflected by many boundaries, so multiple solutions may produce similar responses. Second, signals weaken because of attenuation. Fig. 9 (left column) shows the effect of increasing the number of initial models from 50 to 500. The single-fidelity inversion results improve, because more models are initialized around the global minimum. In the bifidelity inversion, the low-fidelity model smooths out the tool responses, so many local minima are skipped. The bifidelity inversion with only 50 initial models performs better than the single-fidelity inversion with 500 initial models. Building the surrogate adds a one-time cost before inversion, but the surrogate models can be reused for similar inverse problems in the future.

To further examine the feasibility of the proposed method, we perform single-fidelity and bifidelity inversions with LWD data contaminated by synthetic zero-mean Gaussian noise. The noise standard deviations $\sigma_{\mathrm{ps}} = 0.375$ and $\sigma_{\mathrm{att}} = 0.0625$ are
used for phase-shift and attenuation measurements, respectively.
each evaluation, and the corresponding surrogate takes only 3 ×
Fig. 10 shows three example measurements with and without
10−3 s.
noise. We run the inversion algorithms using the same configu-
rations as those for the clean data example. Reconstructed 2-D
B. LWD Inverse Problem earth models are shown in Fig. 11. Compared to the clean data
The synthetic earth model has seven layers, and the parameter case, both the results for the single-fidelity and the bifidelity in-
values are shown in Table I. Consider a drilling process where version are less accurate due to the existence of noises. However,

Authorized licensed use limited to: University of Houston. Downloaded on August 11,2020 at 19:44:30 UTC from IEEE Xplore. Restrictions apply.
140 IEEE JOURNAL ON MULTISCALE AND MULTIPHYSICS COMPUTATIONAL TECHNIQUES, VOL. 5, 2020

Fig. 9. Curtain plot of the inversion results of the single-fidelity inversion and the bifidelity inversion with different number of initial models. The number of
initial models are increased from 50 to 500. (a) Single-fidelity, 50 initial models. (b) Bifidelity, 50 initial models. (c) Single-fidelity, 100 initial models. (d) Bifidelity,
100 initial models. (e) Single-fidelity, 500 initial models. (f) Bifidelity, 500 initial models.

Fig. 10. (a)–(c) Three example measurements with and without noise.

the bifidelity inversion algorithm still exhibits good capability To better quantify the inversion accuracy, we compare data
of inverting noisy data. Similar to the case without noise, the misfits of the two approaches, which is defined as follows:
bifidelity inversion result with only 50 initial values achieves
similar results as the single-fidelity inversion with 500 initial 80
values. These observations are also reflected by the inverted data 1  f (xireal ) − f (xiinv ))2
e= . (22)
misfit defined as follows. 80 i=1 f (xireal )2
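Equation (22) translates directly into code. The plain-Python sketch below assumes that each forward response is available as a list of rescaled measurements (72 per model in this example) and reads the squares in (22) as squared Euclidean norms over those measurements. The function names are hypothetical, and the average runs over however many inverse problems are supplied (80 in the LWD example).

```python
from typing import Sequence

Response = Sequence[float]  # one forward response f(x), e.g., 72 rescaled measurements


def relative_sq_error(f_real: Response, f_inv: Response) -> float:
    """||f(x_real) - f(x_inv)||^2 / ||f(x_real)||^2 for a single inverse problem."""
    if len(f_real) != len(f_inv):
        raise ValueError("responses must have the same length")
    num = sum((a - b) ** 2 for a, b in zip(f_real, f_inv))
    den = sum(a * a for a in f_real)
    if den == 0.0:
        raise ValueError("reference response has zero norm")
    return num / den


def data_misfit(real: Sequence[Response], inverted: Sequence[Response]) -> float:
    """Average normalized misfit e of (22) over all inverse problems."""
    if len(real) != len(inverted) or not real:
        raise ValueError("need matching, non-empty lists of responses")
    return sum(relative_sq_error(r, v) for r, v in zip(real, inverted)) / len(real)
```

For example, `data_misfit([[1.0, 2.0], [2.0, 0.0]], [[1.0, 2.0], [1.0, 0.0]])` averages the per-problem errors 0 and 0.25 and returns 0.125. Normalizing by the norm of the true response keeps problems with strong and weak signals on a comparable scale.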


Fig. 11. Curtain plot of the inversion results of the single-fidelity inversion and the bifidelity inversion where the measurements are contaminated by synthetic noise. The number of initial models increases from 50 to 500. (a) Single-fidelity, 50 initial models. (b) Bifidelity, 50 initial models. (c) Single-fidelity, 100 initial models. (d) Bifidelity, 100 initial models. (e) Single-fidelity, 500 initial models. (f) Bifidelity, 500 initial models.
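The piecewise surrogate (21), which serves as the low-fidelity model in both the clean-data and the noisy-data tests, amounts to a dispatch on the layer that contains the tool. The plain-Python sketch below is illustrative only: the per-output callables stand in for the trained sparse PCEs $g_{ij}$ (the Bayesian sparse-learning fit itself is not shown), and the names `rescale_log_resistivity`, `tool_layer`, and `PiecewiseSurrogate`, the use of base-10 logarithms, and the convention that boundary depths are sorted from shallowest to deepest are assumptions rather than details given in this article.

```python
from bisect import bisect_right
from math import log10
from typing import Callable, List, Sequence

# A scalar surrogate for one output, standing in for a trained sparse PCE g_ij(x).
OutputModel = Callable[[Sequence[float]], float]


def rescale_log_resistivity(r: float, r_min: float = 0.1, r_max: float = 300.0) -> float:
    """Map a resistivity in [r_min, r_max] (ohm-m) to [0, 1] on a log10 scale."""
    lo, hi = log10(r_min), log10(r_max)
    return (log10(r) - lo) / (hi - lo)


def tool_layer(tool_depth: float, boundary_depths: Sequence[float]) -> int:
    """1-based index i of the layer containing the tool (x in D_i).

    Assumes boundary_depths is sorted from shallowest to deepest; a tool
    sitting exactly on a boundary is assigned to the layer below it.
    """
    return bisect_right(boundary_depths, tool_depth) + 1


class PiecewiseSurrogate:
    """g(x) = (g_i1(x), ..., g_iJ(x)) for the subset D_i that contains x, as in (21)."""

    def __init__(self, models: List[List[OutputModel]],
                 layer_of: Callable[[Sequence[float]], int]) -> None:
        self.models = models      # models[i - 1][j - 1] plays the role of g_ij
        self.layer_of = layer_of  # decides which subset D_i the input x belongs to

    def __call__(self, x: Sequence[float]) -> List[float]:
        i = self.layer_of(x)
        if not 1 <= i <= len(self.models):
            raise ValueError(f"no surrogate for layer {i}")
        return [g_ij(x) for g_ij in self.models[i - 1]]
```

As a toy usage with two layers separated by a boundary at depth 5.0 and hypothetical output models, `PiecewiseSurrogate([[lambda x: 1.0, lambda x: x[1]], [lambda x: 2.0, lambda x: -x[1]]], lambda x: tool_layer(x[0], [5.0]))` returns `[1.0, 3.0]` for `x = [1.0, 3.0]` and `[2.0, -3.0]` for `x = [6.0, 3.0]`. In the LWD setting there would be seven lists of 72 models each, so each PCE only has to represent the smooth response within one layer.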

Fig. 12 shows the average normalized data misfit of the 80 inverse problems for the single-fidelity and bifidelity approaches in both the clean-data and noisy-data tests. In both tests, the bifidelity inversion method has a much lower data misfit than the single-fidelity approach. The bifidelity approach with only 50 initial models obtains an inversion accuracy similar to that of the single-fidelity method with 1000 initial models, which means that the bifidelity method requires only about 5% of the computational resources (50 versus 1000 multistart runs of comparable cost) to achieve similar accuracy.

Fig. 12. Average normalized errors of the single-fidelity and the bifidelity inversion methods for both clean data and noisy data.

In the single-fidelity approach, the high-fidelity model was evaluated 417 times on average per optimization; in the bifidelity approach, the low-fidelity and high-fidelity models were evaluated 35 and 370 times, respectively. The average time cost of the two approaches is shown in Fig. 13(a): even with the extra low-fidelity inversion step, the average run time of the bifidelity approach is lower than that of the single-fidelity approach. Fig. 13(b) shows that in the bifidelity inversion, the low-fidelity model evaluations take only 4% of the total run time, which means that the bifidelity approach improves the inversion accuracy with negligible computational overhead.
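The two-stage workflow described above (an LMA run on the low-fidelity model from each multistart initial model, followed by an LMA refinement on the high-fidelity model) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the identity damping term, the damping schedule (halve λ on an accepted step, multiply it by 10 on a rejected one), the forward-difference Jacobian, the LHS helper, and all function names are hypothetical choices, and `res_lo` and `res_hi` stand for residual functions built from the low- and high-fidelity forward models.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# A residual function r(x) = f(x) - d_obs for a given forward model (hypothetical).
Residual = Callable[[List[float]], List[float]]


def latin_hypercube(n: int, bounds: Sequence[Tuple[float, float]],
                    rng: random.Random) -> List[List[float]]:
    """Draw n points so that each dimension has exactly one point per 1/n stratum."""
    columns = []
    for lo, hi in bounds:
        strata = [(k + rng.random()) / n for k in range(n)]
        rng.shuffle(strata)  # random pairing of strata across dimensions
        columns.append([lo + (hi - lo) * u for u in strata])
    return [[col[i] for col in columns] for i in range(n)]


def solve_dense(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= factor * m[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def jacobian(res: Residual, x: List[float], h: float = 1e-6) -> List[List[float]]:
    """Forward-difference Jacobian with J[i][k] = d r_i / d x_k."""
    r0 = res(x)
    cols = []
    for k in range(len(x)):
        xp = list(x)
        xp[k] += h
        cols.append([(a - b) / h for a, b in zip(res(xp), r0)])
    return [[cols[k][i] for k in range(len(x))] for i in range(len(r0))]


def levenberg_marquardt(res: Residual, x0: Sequence[float], lam: float = 1e-2,
                        max_iter: int = 100, tol: float = 1e-12) -> Tuple[List[float], float]:
    """Minimize sum_i r_i(x)^2 starting from x0; returns (x, cost)."""
    x = list(x0)
    r = res(x)
    cost = sum(v * v for v in r)
    n = len(x)
    for _ in range(max_iter):
        J = jacobian(res, x)
        # Damped normal equations: (J^T J + lam * I) step = -J^T r
        A = [[sum(row[a] * row[b] for row in J) + (lam if a == b else 0.0)
              for b in range(n)] for a in range(n)]
        g = [sum(J[i][a] * r[i] for i in range(len(r))) for a in range(n)]
        step = solve_dense(A, [-v for v in g])
        trial = [xi + si for xi, si in zip(x, step)]
        r_trial = res(trial)
        trial_cost = sum(v * v for v in r_trial)
        if trial_cost < cost:
            improvement = cost - trial_cost
            x, r, cost = trial, r_trial, trial_cost
            lam *= 0.5  # accepted: lean toward Gauss-Newton
            if improvement < tol:
                break
        else:
            lam *= 10.0  # rejected: lean toward gradient descent
            if lam > 1e12:
                break
    return x, cost


def bifidelity_multistart(res_lo: Residual, res_hi: Residual,
                          starts: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """For each start: low-fidelity LM, then high-fidelity LM from its result; keep the best."""
    best_x: Optional[List[float]] = None
    best_cost = float("inf")
    for x0 in starts:
        x_lo, _ = levenberg_marquardt(res_lo, x0)        # stage 1: cheap, smoothed model
        x_hi, cost = levenberg_marquardt(res_hi, x_lo)  # stage 2: accurate model refines
        if cost < best_cost:
            best_x, best_cost = x_hi, cost
    if best_x is None:
        raise ValueError("at least one initial model is required")
    return best_x, best_cost
```

In the LWD setting, `starts` would hold the 50 (or more) LHS initial models drawn over the training-data ranges of the 13 parameters, e.g., `latin_hypercube(50, bounds, random.Random(0))` with a hypothetical `bounds` list. Because `res_lo` is cheap to evaluate, the first stage adds little to a run time dominated by `res_hi`, and since each start is processed independently, the loop over `starts` parallelizes trivially.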


Fig. 13. (a) Average run time for each LMA inversion. (b) Percentage of the time used for the high-fidelity evaluation and the low-fidelity evaluation in the bifidelity inversion.

VI. CONCLUSION
In this article, we presented a bifidelity gradient-based inversion method for solving high-dimensional, nonlinear inverse problems. The key idea is to use a low-fidelity model that smooths out the forward model responses to find initial models that are close to the global minimum; the high-fidelity model is then used to refine the inversion result.

We first applied the proposed method to the 2-D and 3-D Shekel optimization problems for ease of visualization. The results show that the PCE surrogate avoids most of the local minima. We then demonstrated the performance of the bifidelity approach on a 13-parameter LWD inverse problem. Compared with single-fidelity gradient-based inversion, the proposed method significantly improves the inversion accuracy: with only 50 initial models, it reaches an accuracy similar to that of the single-fidelity approach with 1000 initial models, while the low-fidelity stage accounts for only about 4% of the run time. The method can be readily applied to other problems that require gradient-based inversion.

DISCLAIMER
This report was prepared as an account of work sponsored by an agency of the U.S. Government. Neither the U.S. Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the U.S. Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the U.S. Government or any agency thereof.

REFERENCES
[1] D. H. Rothman, “Nonlinear inversion, statistical mechanics, and residual statics estimation,” Geophysics, vol. 50, no. 12, pp. 2784–2796, 1985.
[2] W. R. Gilks, S. Richardson, and D. Spiegelhalter, Markov Chain Monte Carlo in Practice. London, U.K.: Chapman & Hall, 1995.
[3] D. Xiu and G. E. Karniadakis, “The Wiener–Askey polynomial chaos for stochastic differential equations,” SIAM J. Sci. Comput., vol. 24, no. 2, pp. 619–644, 2002.
[4] I. Kaymaz, “Application of kriging method to structural reliability problems,” Struct. Saf., vol. 27, no. 2, pp. 133–151, 2005.
[5] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
[6] A. J. Majda and B. Gershgorin, “Quantifying uncertainty in climate change science through empirical information theory,” Proc. Nat. Acad. Sci., vol. 107, no. 34, pp. 14958–14963, 2010.
[7] P. Piperni, A. DeBlois, and R. Henderson, “Development of a multilevel multidisciplinary-optimization capability for an industrial environment,” AIAA J., vol. 51, no. 10, pp. 2335–2352, 2013.
[8] Z.-H. Han, S. Görtz, and R. Zimmermann, “Improving variable-fidelity surrogate modeling via gradient-enhanced kriging and a generalized hybrid bridge function,” Aerosp. Sci. Technol., vol. 25, no. 1, pp. 177–189, 2013.
[9] B. Peherstorfer, K. Willcox, and M. Gunzburger, “Survey of multifidelity methods in uncertainty propagation, inference, and optimization,” SIAM Rev., vol. 60, no. 3, pp. 550–591, 2018.
[10] M. G. Fernández-Godino, C. Park, N.-H. Kim, and R. T. Haftka, “Review of multi-fidelity models,” 2016, arXiv:1609.07196.
[11] L. Yan and T. Zhou, “Adaptive multi-fidelity polynomial chaos approach to Bayesian inference in inverse problems,” J. Comput. Phys., vol. 381, pp. 110–128, 2019.
[12] P. Perdikaris, D. Venturi, J. O. Royset, and G. E. Karniadakis, “Multi-fidelity modelling via recursive co-kriging and Gaussian–Markov random fields,” Proc. Roy. Soc. A: Math., Phys. Eng. Sci., vol. 471, no. 2179, 2015, Art. no. 20150018.
[13] C. C. Fischer, R. V. Grandhi, and P. S. Beran, “Bayesian low-fidelity correction approach to multi-fidelity aerospace design,” in Proc. 58th AIAA/ASCE/AHS/ASC Struct., Struct. Dyn. Mater. Conf., 2017, p. 0133.
[14] J. Zheng, X. Shao, L. Gao, P. Jiang, and Z. Li, “A hybrid variable-fidelity global approximation modelling method combining tuned radial basis function base and kriging correction,” J. Eng. Des., vol. 24, no. 8, pp. 604–622, 2013.
[15] J. A. Christen and C. Fox, “Markov chain Monte Carlo using an approximation,” J. Comput. Graph. Statist., vol. 14, no. 4, pp. 795–810, 2005.
[16] E. Laloy, B. Rogiers, J. A. Vrugt, D. Mallants, and D. Jacques, “Efficient posterior exploration of a high-dimensional groundwater model from two-stage Markov chain Monte Carlo simulation and polynomial chaos expansion,” Water Resour. Res., vol. 49, no. 5, pp. 2664–2682, 2013.
[17] B. Peherstorfer, B. Kramer, and K. Willcox, “Combining multiple surrogate models to accelerate failure probability estimation with expensive high-fidelity models,” J. Comput. Phys., vol. 341, pp. 61–75, 2017.
[18] B. Peherstorfer, T. Cui, Y. Marzouk, and K. Willcox, “Multifidelity importance sampling,” Comput. Methods Appl. Mech. Eng., vol. 300, pp. 490–509, 2016.
[19] M. Razi, R. M. Kirby, and A. Narayan, “Fast predictive multi-fidelity prediction with models of quantized fidelity levels,” J. Comput. Phys., vol. 376, pp. 992–1008, 2019.
[20] A. H. Elsheikh, I. Hoteit, and M. F. Wheeler, “Efficient Bayesian inference of subsurface flow models using nested sampling and sparse polynomial chaos surrogates,” Comput. Methods Appl. Mech. Eng., vol. 269, pp. 515–537, 2014.
[21] R. Martí, M. G. Resende, and C. C. Ribeiro, “Multi-start methods for combinatorial optimization,” Eur. J. Oper. Res., vol. 226, no. 1, pp. 1–8, 2013.


[22] K. Sargsyan, C. Safta, H. N. Najm, B. J. Debusschere, D. Ricciuto, and P. Thornton, “Dimensionality reduction for complex models via Bayesian compressive sensing,” Int. J. Uncertainty Quantification, vol. 4, no. 1, pp. 63–93, 2014.
[23] O. Ijasan, C. Torres-Verdín, and W. E. Preeg, “Inversion-based petrophysical interpretation of logging-while-drilling nuclear and resistivity measurements,” Geophysics, vol. 78, no. 6, pp. D473–D489, 2013.
[24] D. Pardo and C. Torres-Verdín, “Fast 1D inversion of logging-while-drilling resistivity measurements for improved estimation of formation resistivity in high-angle and horizontal wells,” Geophysics, vol. 80, no. 2, pp. E111–E124, 2015.
[25] S. A. Bakr, D. Pardo, and C. Torres-Verdín, “Fast inversion of logging-while-drilling resistivity measurements acquired in multiple wells,” Geophysics, vol. 82, no. 3, pp. E111–E120, 2017.
[26] B. I. Anderson, “Modeling and inversion methods for the interpretation of resistivity logging tool response,” Ph.D. dissertation, Dept. Inf. Syst. Technol., Delft Univ. Technol., Delft, Netherlands, 2001.
[27] A. A. Goldstein, “On steepest descent,” J. Soc. Ind. Appl. Math., Ser. A: Control, vol. 3, no. 1, pp. 147–151, 1965.
[28] R. Fletcher, Practical Methods of Optimization. Hoboken, NJ, USA: Wiley, 2013.
[29] J. J. Moré, “The Levenberg-Marquardt algorithm: Implementation and theory,” in Numerical Analysis. New York, NY, USA: Springer, 1978, pp. 105–116.
[30] A. Ranganathan, “The Levenberg-Marquardt algorithm,” Tutorial LM Algorithm, vol. 11, no. 1, pp. 101–110, 2004.
[31] N. Wiener, “The homogeneous chaos,” Amer. J. Math., vol. 60, no. 4, pp. 897–936, 1938.
[32] R. G. Ghanem and P. D. Spanos, Stochastic Finite Elements: A Spectral Approach. Chelmsford, MA, USA: Courier Corp., 2003.
[33] M. Eldred and J. Burkardt, “Comparison of non-intrusive polynomial chaos and stochastic collocation methods for uncertainty quantification,” in Proc. 47th AIAA Aerosp. Sci. Meeting Including New Horizons Forum Aerosp. Expo., 2009, Art. no. 976.
[34] R. G. Ghanem, J. Red-Horse, and A. Sarka, “Modal properties of a space-frame with localized system uncertainties,” in Proc. 8th ASCE Specialty Conf. Probab. Mech. Struct. Rel., 2000.
[35] A. O’Hagan, “Polynomial chaos: A tutorial and critique from a statistician’s perspective,” SIAM/ASA J. Uncertainty Quantification, vol. 20, pp. 1–20, 2013.
[36] S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2346–2356, Jun. 2008.
[37] S. D. Babacan, R. Molina, and A. K. Katsaggelos, “Bayesian compressive sensing using Laplace priors,” IEEE Trans. Image Process., vol. 19, no. 1, pp. 53–63, Jan. 2010.
[38] M. E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” J. Mach. Learn. Res., vol. 1, pp. 211–244, 2001.
[39] C. Hans, “Bayesian lasso regression,” Biometrika, vol. 96, no. 4, pp. 835–845, 2009.
[40] M. E. Tipping and A. Faul, “Fast marginal likelihood maximisation for sparse Bayesian models,” in Proc. 9th Int. Workshop Artif. Intell. Statist., 2003, pp. 3–6.
[41] R. T. Haftka, D. Villanueva, and A. Chaudhuri, “Parallel surrogate-assisted global optimization with expensive functions—A survey,” Struct. Multidisciplinary Optim., vol. 54, no. 1, pp. 3–13, 2016.
[42] Y. Jin, “Surrogate-assisted evolutionary computation: Recent advances and future challenges,” Swarm Evol. Comput., vol. 1, no. 2, pp. 61–70, 2011.
[43] J. Shekel, “Test functions for multimodal search techniques,” in Proc. 5th Annu. Princeton Conf. Inf. Sci. Syst., 1971, pp. 354–359.
[44] H. Lu, Q. Shen, J. Chen, X. Wu, and X. Fu, “Parallel multiple-chain DRAM MCMC for large-scale geosteering inversion and uncertainty quantification,” J. Petroleum Sci. Eng., vol. 174, pp. 189–200, 2019.
[45] Q. Shen, X. Wu, J. Chen, Z. Han, and Y. Huang, “Solving geosteering inverse problems by stochastic hybrid Monte Carlo method,” J. Petroleum Sci. Eng., vol. 161, pp. 9–16, 2018.
