# A MATLAB Implementation of the

Minimum Relative Entropy Method
for Linear Inverse Problems
Roseanna M. Neupauer
1
and Brian Borchers
2
1
Department of Earth and Environmental Science
2
Department of Mathematics
New Mexico Institute of Mining and Technology, Socorro, NM 87801, USA
borchers@nmt.edu
March 31, 2000
Abstract
The minimum relative entropy (MRE) method can be used to solve linear inverse
problems of the form Gm = d, where m is a vector of unknown model parameters and
d is a vector of measured data. The MRE method treats the elements of m as random
variables, and obtains a multivariate probability density function for m. The proba-
bility density function is constrained by prior information about the upper and lower
bounds of m, a prior expected value of m, and the measured data. The solution of the
1
inverse problem is the expected value of m, based on the derived probability density
function. We present a MATLAB implementation of the MRE method. Several nu-
merical issues arise in the implementation of the MRE method and are discussed here.
We present the source history reconstruction problem from groundwater hydrology as
an example of the MRE implementation.
Keywords Maximum Entropy, Minimum Relative Entropy, Inverse Problems
1 Introduction
In this paper, we present a MATLAB implementation of the minimum relative entropy
(MRE) method to solve linear inverse problems of the form
Gm = d, (1)
where G is a matrix of size N by M, m is a vector of length M containing unknown
model parameters, and d is a data vector of length N. The vector d consists of
measured data, and typically includes measurement error. The matrix G is typically
very badly conditioned, making a conventional least squares approach impractical.
In the MRE method, the unknown model parameters, m, are treated as random
variables, and the solution to the inverse problem is obtained from the multivariate
probability density function (pdf) of m. The pdf is selected in a two-step process.
First, we generate a prior distribution whose entropy is minimized subject to constraints
2
imposed by the lower and upper bounds of m(l and u, respectively) and by an estimate,
s, of the expected value of m. This prior distribution ensures that the solution of the
inverse problem is within the speciﬁed bounds on m, but does not guarantee that
Eq. (1) is satisﬁed.
In the second step, we select a posterior distribution for m, whose entropy is min-
imized relative to the prior distribution and subject to the constraint imposed by the
measured data d. The mean of the posterior pdf, ˆ m, can be used as a “solution” to
the inverse problem. The 5th and 95th percentiles of the posterior distribution deﬁne
a 90% probability interval, similar to the 90% conﬁdence interval in classical statisti-
cal methods. We can investigate the properties of the solution by using the posterior
distribution to randomly generate many realizations of m.
The goal of this paper is to present the MRE method in the context of a general
discrete linear inverse problem, to discuss some important issues in the implementation
of the method, and to describe our MATLAB implementation of the MRE method.
We also present an example taken from groundwater hydrology.
2 The MRE Method
In this section, we describe the generation of the prior distribution, p(m), and the
posterior distribution, q(m), of m. The elements of m are assumed to be independent,
so p(m) =
¸
M
i=1
p(m
i
) and q(m) =
¸
M
i=1
q(m
i
), where M is the number of model
parameters.
In this discussion, we assume that the lower bound is zero. As we show later,
3
non-zero lower bounds can easily be incorporated with a change of variables. For
a continuous random variable with ﬁnite upper bounds, zero lower bounds, and a
ﬁnite expected value within the bounds, it can be shown that the maximum entropy
distribution is a truncated exponential distribution (Kapur and Kesavan, 1992). Using
this fact we can derive the prior distribution
p(m
i
) =
β
i
exp(−β
i
m
i
)
1 −exp(−β
i
U
i
)
for β
i
= 0 (2)
p(m
i
) =
1
U
i
for β
i
= 0 ;
p(m) =
M
¸
i=1
p(m
i
) ,
where U
i
is the upper bound of parameter m
i
, and β
i
is a Lagrange multiplier whose
value is determined by satisfying the expected value constraint,

m
i
p(m
i
)dm
i
= s
i
,
where s
i
is the prior expected value of m
i
for i = 1, 2, . . . , M. By integrating Eq. (2),
we obtain the expected value equation that is used to evaluate β
i
−(β
i
U
i
+ 1) exp(−β
i
U
i
) + 1
β
i
[1 −exp(−β
i
U
i
)]
= s
i
. (3)
Details of this derivation can be found in Neupauer (1999).
To obtain the posterior distribution, q(m), we minimize its entropy relative to the
prior distribution, p(m). The entropy to be minimized is
H(q, p) =

m
q(m) ln
¸
q(m)
p(m)

dm , (4)
where p(m) is given in Eq. (2). This minimization is subject to two constraints—the
normalization requirement

m
q(m)dm = 1, and the requirement that the posterior
4
mean solution ﬁt the data within a speciﬁed tolerance
||d −Gˆ m||
2
≤ ξ
2
ǫ
2
, (5)
where || · || denotes the L
2
norm, ˆ m is the mean of the posterior distribution, ǫ is the
measurement error, and ξ is a parameter that depends on the assumed error model.
This constrained optimization problem is solved by the method of Lagrange multipliers
by minimizing
φ = H(q, p) + µ
¸
m
q(m)dm−1

+ γ

N
¸
j=1

M
¸
i=1
g
ji
ˆ m
i
−d
j

2
−ξ
2
ǫ
2

, (6)
where µ and γ are Lagrange multipliers. We have replaced the data inequality con-
straint in Eq. (5) with an equality constraint. If the initial guess of the solution does
not satisfy the data constraint, the estimate will be modiﬁed until the data constraint
is satisﬁed, which will ﬁrst occur when ||d −Gˆ m||
2
= ξ
2
ǫ
2
.
This objective function, φ, is minimized relative to q(m) when the following equality
holds (Johnson and Shore, 1984):
0 = ln
¸
q(m)
p(m)

+ 1 + µ +
N
¸
j=1
λ
j

M
¸
i=1
g
ji
m
i
−d
j

, (7)
where λ
j
= 2γ

¸
M
i=1
g
ji
m
i
−d
j

are Lagrange multipliers on the individual measured
data points. In terms of the Lagrange multipliers, λ
j
, the data constraint in Eq. (5) can
be rewritten as ||λ||
2
= 4γ
2
ξ
2
ǫ
2
, showing that γ = ||λ||/(2ξǫ). With this deﬁnition of γ
and the deﬁnition of λ
j
above, the data constraint holds when λ satisﬁes the nonlinear
5
system of equations
d −Gˆ m(λ) +
ξǫ
||λ||
λ = 0 . (8)
These Lagrange multipliers constrain the individual entries of model solution vector so
that the norm of the residuals between the measured and modeled plume satisﬁes the
(equality) data constraint.
The resulting posterior distribution takes the form
q(m
i
) =
a
i
exp(−a
i
m
i
)
1 −exp(−a
i
U
i
)
for a
i
= 0 (9)
q(m
i
) =
1
U
i
for a
i
= 0 ;
q(m) =
M
¸
i=1
q(m
i
) ,
where a
i
= β
i
+
¸
N
j=1
λ
j
g
ji
. Details of the derivation are given in Neupauer (1999).
The solution to this inverse problem is ˆ m, given by
ˆ m
i
=
1 −(a
i
U
i
+ 1) exp(−a
i
U
i
)
a
i
[1 −exp(−a
i
U
i
)]
for a
i
= 0
ˆ m
i
=
U
i
2
for a
i
= 0 . (10)
The uncertainty in the MRE solution can be expressed with probability level-
s of q(m). For example, the 90% probability interval has as its bounds the 5th
and 95th percentile probability levels, which are given by the values of m
i
such that

m
i
0
q(m

i
) dm

i
= 0.05 and

m
i
0
q(m

i
) dm

i
= 0.95, respectively.
6
3 Implementation Issues
We wrote a MATLAB program to implement the MRE method, following the approach
of the previous section. Several numerical issues arose that may not be apparent from
the previous discussion, and are discussed here. The code is available at ftp.iamg.org.
3.1 Non-zero Lower Bounds
The code was written with lower bounds of zero. For non-zero lower bounds, the
problem can be re-scaled by deﬁning ˆ m = ˆ m
o
+l, where ˆ m is the true solution, ˆ m
o
is
the corresponding model solution for a zero lower bound, and l is the vector of lower
bounds. We can solve for ˆ m
o
using the MATLAB routine, with the data modiﬁed as
d
L
= d−Gl, where d
L
is the data vector used in the MRE program, d is the true data
vector, and G is the matrix of kernel values. The upper bounds and expected values
must be replaced by U
i
−L
i
and s
i
− L
i
, respectively, where L
i
is the lower bound of
model parameter m
i
. After ˆ m
o
is computed with the MRE method, the true solution
is obtained from ˆ m = ˆ m
o
+l (Woodbury and Ulrych, 1996).
3.2 Error Models
The data constraint (Eq. 5) and the system of equations used to obtain the values of the
Lagrange multipliers λ
j
(Eq. 8) contain a parameter ξ that depends on the error model.
We consider two possible error models–an additive error model with d
j
= d
0
j

a
δ
j
and
a multiplicative error model with d
j
= d
0
j
+ d
0
j
ǫ
m
δ
j
, where ǫ
a
and ǫ
m
are the standard
deviations of the additive and multiplicative error, respectively, δ
j
is a standard normal
7
random number, d
0
j
is the true value of the data, and d
j
is the measured value.
If the additive error model is used, then ξ
2
= N; and with the multiplicative
error model, we use ξ
2
= ||d||
2
. The MRE implementation allows both additive and
multiplicative components in the error model by replacing Eq. (8) with
d −Gˆ m(λ) +
λ
||λ||

a
+||d||ǫ
m

= 0 . (11)
3.3 Prior Distribution
The prior distribution is calculated from Eq. (2), which is written in terms of the
Lagrange multipliers, β
i
. The values for each β
i
are determined individually from
Eq. (3) by using the bisection method (Gill et al., 1981). For certain values of β
i
,
Eq. (3) is subject to numerical error, and asymptotic approximations must be used.
If β
i
= 0, Eq. (3) is indeterminate. In the limit as s
i
→ U
i
/2, β
i
→ 0. Thus, if
s
i
= U
i
/2, the MRE program assigns β
i
= 0. If s
i
≈ U
i
/2, β
i
is small but non-zero,
and Eq. (3) is subject to numerical error. To avoid this error, we solve for β
i
using an
asymptotic approximation to Eq. (3) for |β
i
| small (|β
i
| < 10
−4
in the MRE code):
s
i

12U
i
−8β
i
U
2
i
+ 3β
2
i
U
3
i
24 −12β
i
U
i
+ 4β
2
i
U
2
i
−β
3
i
U
3
i
. (12)
If s
i
≈ U
i
, then β
i
→ −∞, and Eq. (3) is subject to numerical error. To avoid
this error, we use the asymptotic approximation of β
i
= −1/(U
i
− s
i
) for s
i
≈ U
i
in
our MRE algorithm. If s
i
≈ 0, then β
i
→ ∞. For this case, the MRE algorithm sets
β
i
to the upper limit of the starting interval of the bisection method, and no further
approximations are needed.
8
The prior distribution, p(m), is shown in Eq. (2) in terms of the Lagrange multi-
pliers, β
i
. If β
i
≪−1 (β
i
< −10
2
in the MRE code), the expressions shown in Eq. (2)
cannot be evaluated numerically, and the prior distribution is approximated by
p(m
i
) =

−2β
i
[1 + β
i
(U
i
−m
i
)] U
i
+ 1/β
i
≤ m
i
≤ U
i
,
0 otherwise .
(13)
3.4 Posterior Distribution
3.4.1 Lagrange Multipliers
The posterior distribution is calculated from Eq. (9), which is written in terms of
a
i
= β
i
+
¸
N
j=1
λ
j
g
ji
, where λ
j
are Lagrange multipliers. The values of λ (λ =

1
, λ
2
, . . . , λ
N
]
T
, where T denotes transpose) are calculated using the Newton-Raphson
method to solve Eq. (8), using F(λ) = 0 where
F(λ)
j
= d
j

M
¸
i=1
g
ji
ˆ m
i
(λ) + ξǫ
λ
j
||λ||
. (14)
Here d
j
is the j
th
measured data point, ˆ m
i
is the expected value of m
i
(Eq. 10), and
ξǫ =

a
+||d||ǫ
m
. We begin with λ
j
= 1. If this satisﬁes the data constraint (Eq. 5),
we use λ
j
= 1 in Eq. (10) to obtain the solution to the inverse problem. Otherwise, we
use the Newton-Raphson method to iteratively solve for the zeroes of Eq. (14) using
λ
k
= λ
k−1

∂F
∂λ

λ
k−1

−1
F
k−1
, (15)
9
where the superscripts denote the iteration, and the terms of the Jacobian matrix,
∂F/∂λ, are
∂F
j
∂λ
l
= −
M
¸
i=1
g
ji
¸
∂ ˆ m
i
∂a
i
g
li

+
ξǫ
||λ||
¸
δ
jl

λ
j
λ
l
||λ||
2

, (16)
where l = 1, 2, . . . , N, δ
jl
is the Kronecker delta, and
∂ ˆ m
i
∂a
i
=
a
2
i
U
2
i
exp (−a
i
U
i
) −[1 −exp(−a
i
U
i
)]
2
a
2
i
[1 −exp(−a
i
U
i
)]
2
. (17)
Eq. (14) is suﬃciently non-linear that the step size calculated in Eq. (15) is not
necessarily optimal. Therefore, we use the Newton-Raphson method to calculate the
optimal search direction; then, we use a univariate golden section search (Gill et al.,
1981) in that direction to calculate the optimal step length. The process is repeated
until ||F||/(1 + ||d||) is less than a user-speciﬁed tolerance. We have found that the
condition number of the Jacobian matrix can be high when ǫ = 0. To reduce the
condition number, we perform row-scaling on the Jacobian matrix. It is also possible
to substantially reduce the condition number by including a small amount of noise, at
a level that is negligible compared to the true data values.
3.4.2 Asymptotic Approximations
The posterior distribution, q(m), in Eq. (9), and the expressions for ˆ m
i
(Eq. 10) and
∂m
i
/∂a
i
(Eq. 17) are all written in terms of a
i
= β
i
+
¸
N
j=1
λ
j
g
ji
. For certain values of β
i
and a
i
, these expressions are subject to numerical error and asymptotic approximations
must be used.
10
We obtained Eq. (9) by minimizing the entropy of the posterior distribution relative
to the prior distribution shown in Eq. (2). However, if β
i
≪ −1, an asymptotic ap-
proximation to the prior distribution is used (Eq. 13), and the resulting the asymptotic
approximation of the posterior distribution is
q(m
i
) =

2
i
(b
i
−3β
i
)
[1 + β
i
(U
i
−m
i
)]e
b
i
(U
i
−m
i
)
U
i
+ 1/β
i
≤ m
i
≤ U
i
,
0 otherwise ,
(18)
where b
i
=
¸
N
j=1
λ
j
g
ji
.
The expression for ˆ m
i
in Eq. (10) is subject to numerical error when β
i
≪ −1
(i.e., when Eq. (18) is used as an asymptotic approximation of q(m
i
)) and when |a
i
| is
very small or very large. Using asymptotic approximations when necessary, the general
expression for ˆ m
i
is
ˆ m
i
≈ U
i
−1/a
i
, a
i
≪−1 (a
i
< −10
2
)
ˆ m
i
≈ 1/a
i
, a
i
≫1 (a
i
> 10
2
)
ˆ m
i

12U
i
−8a
i
U
2
i
+3a
2
i
U
3
i
24−12a
i
U
i
+4a
2
i
U
2
i
−a
3
i
U
3
i
, |a
i
| ≪1 (|a
i
| < 10
−4
)
ˆ m
i
≈ U
i
+ (b
i
−β
i
)/[β
i
(b
i
−3β
i
)], β
i
≪−1
ˆ m
i
= U
i
/2, a
i
= 0
ˆ m
i
=
1−(a
i
U
i
+1) exp(−a
i
U
i
)
a
i
[1−exp(−a
i
U
i
)]
, otherwise. (19)
The expression for ∂ ˆ m
i
/∂a
i
in Eq. (16) is also subject to numerical error when β
i

−1 and when |a
i
| is very small or very large. When β
i
≪ −1, the partial derivative,
11
∂ ˆ m
i
/∂a
i
, in Eq. (16) is replaced with ∂ ˆ m
i
/∂b
i
. Using asymptotic approximations when
necessary, the general expression for ∂ ˆ m
i
/∂a
i
is
∂ ˆ m
i
∂a
i
≈ −U
2
i
15−15a
i
U
i
+8a
2
i
U
2
i
180−180a
i
U
i
+105a
2
i
U
2
i
, |a
i
| ≪−1
∂ ˆ m
i
∂a
i
≈ −1/a
2
i
, |a
i
| ≫1
∂m
i
∂b
i
≈ −2/(b
i
−3β
i
)
2
β
i
≪−1
∂ ˆ m
i
∂a
i
=
a
2
i
U
2
i
exp(−a
i
U
i
)−[1−exp(−a
i
U
i
)]
2
a
2
i
[1−exp(−a
i
U
i
)]
2
, otherwise. (20)
4 Example
The MRE inversion approach has been used in groundwater hydrology for parameter
estimation (Woodbury and Ulrych, 1993; Woodbury et al., 1995) and for the source
history reconstruction problem (Woodbury and Ulrych, 1996; Woodbury et al., 1998;
Woodbury and Ulrych, 1998). We present a source history reconstruction example
from Woodbury and Ulrych (1996) as an illustration of our MRE algorithm. The data
ﬁles are available at ftp.iamg.org.
The source history reconstruction problem involves a point source of groundwa-
ter contamination at a known location in a one-dimensional, saturated, homogeneous
porous medium. The source concentration over time is unknown, but the present-day
spatial distribution of the contamination plume is measured at discrete locations. Us-
ing this data with the MRE method, we reconstruct the concentration history at the
contamination source.
12
For this example, the true source history function is
C
in
(t) = exp
¸

(t −130)
2
2(5)
2
¸
+ 0.3 exp
¸

(t −150)
2
2(10)
2
¸
+ (21)
0.5 exp
¸

(t −190)
2
2(7)
2
¸
,
where t is in days, and C
in
is dimensionless. This source history function is shown in
Fig. 1. The solution vector, m, is a discretized version of this curve. We discretized the
source history function into 100 evenly-spaced times, at three-day intervals between
t = 0 days and t = 297 days.
The governing equation of contaminant transport in groundwater is the advection
dispersion equation
∂C
∂t
= D

2
C
∂x
2
−v
∂C
∂x
, (22)
where C = C(x, t) is concentration at time t at location x, D is the dispersion co-
eﬃcient, and v is the groundwater velocity. The initial and boundary conditions for
Eq. (22) are C(x, 0) = 0, C(0, t) = C
in
(t), and C(x, t) → 0 as x → ∞. For this
problem, we use v = 1 m/d and D = 1 m
2
/d. Using Eqs. (21) and (22), the plume at
T = 300 days was calculated and is shown in Fig. 2. The plume was sampled at sixty
points, at ﬁve-meter intervals from x = 5 m to x = 300 m; we assumed an additive
noise with ǫ
a
= 0.005. The sampled data are also shown in Fig. 2. These data comprise
the vector d.
The elements of the matrix G are g
ji
= ∆tf(x
j
, T −t
i
), where ∆t is the temporal
discretization of the source history function (∆t = 3 days); T is the sample time
13
(T = 300 days), x
j
is the location of the j
th
measurement, t
i
is the time corresponding
to the i
th
element of m, and f(x
j
, T − t
i
) is the solution of Eq. (22) for a pulse input
at x = 0 and t = 0, given by
f(x
j
, T −t
i
) =
x
j
2

πD(T −t
i
)
3
exp

[x
j
−v(T −t
i
)]
2
4D(T −t
i
)
¸
. (23)
For this problem, we assume an upper bound on m of U
i
= 1.1, and a prior expected
value of s
i
= exp{−(t
i
− 150)
2
/800}. We solved this inverse problem using the MRE
algorithm, and obtained the solution shown in Fig. 3. The results of MRE algorithm
match well with the true solution, although the small peak near t = 150 days is not
reproduced. The true solution falls within the 90% probability interval.
Using the posterior distribution, q(m), we generated 30 realizations of the model
solution vector m. An example of one realization is shown in Fig. 4. The realization
follows the same pattern as the true solution, but is less smooth. The results of the 30
realizations are plotted in Fig. 5, along with the 90% probability interval. The results
show that the randomly-generated solutions fall within the 90% probability interval
most of the time.
5 Summary and Conclusions
A MATLAB program for solving linear inverse problems using the minimum relative
entropy method was presented. We discussed several implementation issues that may
not be apparent from the MRE theory. We solved the minimization problem using the
method of Lagrange multipliers. In evaluating the Lagrange multipliers, the system of
14
equations is suﬃciently non-linear that the Newton-Raphson method fails to converge.
We used a damped Newton-Raphson method using the Newton-Raphson method to
compute a search direction and a univariate golden section search to obtain the optimal
step length. Other implementation issues included asymptotic approximations as the
values of the Lagrange multipliers approach ±∞ or zero.
We demonstrated the MATLAB algorithm on a source history reconstruction prob-
lem from groundwater hydrology. The results show that the MRE method performs
well for this simple test case.
Acknowledgments
This research was supported in part by the Environmental Protection Agency’s
STAR Fellowship program under Fellowship No. U-915324-01-0. This work has not
been subjected to the EPA’s peer and administrative review and therefore may not
necessarily reﬂect the views of the Agency and no oﬃcial endorsement should be in-
ferred. Allan Woodbury provided information about his earlier implementation of the
minimum relative entropy inversion method.
References
Gill, P. E., Murray, W., Wright, M. H., 1981. Practical Optimization. Academic Press,
Inc., San Diego, California, 401 pp.
Johnson, R. W., and Shore, J. E., 1984. Relative-entropy minimization with uncertain
constraints – Theory and application to spectrum analysis. NRL Memorandum
15
Report 5478, Naval Research Laboratory, Washington, D.C.
Kapur, J. N., Kesavan, H. K., 1992. Entropy Optimization Principles with Applica-
tions. Academic Press, San Diego, California, 408 pp.
Neupauer, R. M., 1999. A Comparison of Two Methods for Recovering the Release
History of a Groundwater Contamination Source. M.Sc. Thesis, New Mexico Insti-
tute of Mining and Technology, Socorro, New Mexico, 197 pp.
Woodbury, A. D., Render, F., Ulrych, T., 1995. Practical probabilistic ground-water
modeling. Ground Water 33 (4), 532–538.
Woodbury, A. D., Sudicky, E., Ulrych, T. J., Ludwig, R., 1998. Three-dimensional
plume source reconstruction using minimum relative entropy inversion. Journal of
Contaminant Hydrology 32, 131–158.
Woodbury, A. D., Ulrych, T. J., 1993. Minimum relative entropy: Forward probabilis-
tic modeling. Water Resources Research 29 (8), 2847–2860.
Woodbury, A. D., Ulrych, T. J., 1996. Minimum relative entropy inversion: Theory
and application to recovering the release history of a groundwater contaminant.
Water Resources Research 32 (9), 2671–2681.
Woodbury, A. D., Ulrych, T. J., 1998. Minimum relative entropy and probabilistic in-
version in groundwater hydrology. Stochastic Hydrology and Hydraulics 12, 317–
358.
16
Figure Captions
Figure 1: True source history function.
Figure 2: True plume at T = 300 days (solid line) and sampled data (circles).
Figure 3: Inverse problem solution for the example problem. (A) MRE solution
(solid line), true source history function (dashed line), prior expected value (dot-dashed
line). (B) MRE solution (solid line), 5th percentile probability level (dashed line), 95th
percentile probability level (dot-dashed line).
Figure 4: One realization of the model solution. solid line: realization; dashed line:
true source history.
Figure 5: Results of 30 realizations of the model solution (dots) and the 90% proba-
bility interval (solid lines).
17
0 50 100 150 200 250 300
0
0.5
1
1.5
Time (days)
C
i
n
(
t
)
Figure 1
18
0 50 100 150 200 250 300
0
0.2
0.4
Distance (m)
C
(
x
,
T
=
3
0
0

d
a
y
s
)
Figure 2
19

0
0.5
1
1.5
A
C
i
n
(
t
)
0 50 100 150 200 250 300
0
0.5
1
1.5
B
C
i
n
(
t
)
Time (days)
Figure 3
20
0 50 100 150 200 250 300
0
0.5
1
1.5
Time (days)
C
i
n
(
t
)
Figure 4
21
0 50 100 150 200 250 300
0
0.5
1
1.5
Time (days)
C
i
n
(
t
)
Figure 5
22

inverse problem is the expected value of m, based on the derived probability density function. We present a MATLAB implementation of the MRE method. Several numerical issues arise in the implementation of the MRE method and are discussed here. We present the source history reconstruction problem from groundwater hydrology as an example of the MRE implementation.

Keywords Maximum Entropy, Minimum Relative Entropy, Inverse Problems

1

Introduction

In this paper, we present a MATLAB implementation of the minimum relative entropy (MRE) method to solve linear inverse problems of the form

Gm = d,

(1)

where G is a matrix of size N by M , m is a vector of length M containing unknown model parameters, and d is a data vector of length N . The vector d consists of measured data, and typically includes measurement error. The matrix G is typically very badly conditioned, making a conventional least squares approach impractical. In the MRE method, the unknown model parameters, m, are treated as random variables, and the solution to the inverse problem is obtained from the multivariate probability density function (pdf) of m. The pdf is selected in a two-step process. First, we generate a prior distribution whose entropy is minimized subject to constraints

2

The elements of m are assumed to be independent. m. to discuss some important issues in the implementation of the method. We can investigate the properties of the solution by using the posterior distribution to randomly generate many realizations of m. of the expected value of m. where M is the number of model 3 . As we show later. we assume that the lower bound is zero. This prior distribution ensures that the solution of the inverse problem is within the speciﬁed bounds on m. and the posterior distribution. of m. so p(m) = parameters. In the second step. In this discussion. can be used as a “solution” to the inverse problem. similar to the 90% conﬁdence interval in classical statistical methods. q(m). respectively) and by an estimate. M i=1 p(mi ) and q(m) = M i=1 q(mi ). we describe the generation of the prior distribution. We also present an example taken from groundwater hydrology. The mean of the posterior pdf. we select a posterior distribution for m. The goal of this paper is to present the MRE method in the context of a general discrete linear inverse problem. whose entropy is minimized relative to the prior distribution and subject to the constraint imposed by the ˆ measured data d. s. 2 The MRE Method In this section. but does not guarantee that Eq. p(m).imposed by the lower and upper bounds of m (l and u. (1) is satisﬁed. The 5th and 95th percentiles of the posterior distribution deﬁne a 90% probability interval. and to describe our MATLAB implementation of the MRE method.

(2). 1992). This minimization is subject to two constraints—the normalization requirement m q(m)dm = 1. and βi is a Lagrange multiplier whose value is determined by satisfying the expected value constraint. it can be shown that the maximum entropy distribution is a truncated exponential distribution (Kapur and Kesavan. 2. p) = m q(m) ln (4) where p(m) is given in Eq. i=1 where Ui is the upper bound of parameter mi . p(m) (3) H(q. By integrating Eq. . . mi p(mi )dmi = si . . . M .non-zero lower bounds can easily be incorporated with a change of variables. we obtain the expected value equation that is used to evaluate βi −(βi Ui + 1) exp(−βi Ui ) + 1 = si . To obtain the posterior distribution. and the requirement that the posterior 4 . and a ﬁnite expected value within the bounds. q(m). The entropy to be minimized is q(m) dm . p(m). zero lower bounds. βi [1 − exp(−βi Ui )] Details of this derivation can be found in Neupauer (1999). For a continuous random variable with ﬁnite upper bounds. where si is the prior expected value of mi for i = 1. Using this fact we can derive the prior distribution βi exp(−βi mi ) for βi = 0 1 − exp(−βi Ui ) 1 for βi = 0 . Ui M p(mi ) = p(mi ) = p(m) = (2) p(mi ) . (2). we minimize its entropy relative to the prior distribution.

This constrained optimization problem is solved by the method of Lagrange multipliers by minimizing  N M i=1 2 φ = H(q. In terms of the Lagrange multipliers. the data constraint holds when λ satisﬁes the nonlinear 5 . which will ﬁrst occur when ||d − Gm||2 = ξ 2 ǫ2 . φ. the estimate will be modiﬁed until the data constraint ˆ is satisﬁed. is minimized relative to q(m) when the following equality holds (Johnson and Shore. (7) where λj = 2γ M i=1 gji mi − dj are Lagrange multipliers on the individual measured data points. 1984): N q(m) +1+µ+ λj p(m) j=1 M i=1 0 = ln gji mi − dj . ǫ is the measurement error. (5) ˆ where || · || denotes the L2 norm. (5) can be rewritten as ||λ||2 = 4γ 2 ξ 2 ǫ2 . We have replaced the data inequality constraint in Eq.mean solution ﬁt the data within a speciﬁed tolerance ˆ ||d − Gm||2 ≤ ξ 2 ǫ2 . the data constraint in Eq. and ξ is a parameter that depends on the assumed error model. (5) with an equality constraint. If the initial guess of the solution does not satisfy the data constraint. m is the mean of the posterior distribution. λj . (6) where µ and γ are Lagrange multipliers. This objective function. p) + µ m q(m)dm − 1 + γ  j=1 gji mi − dj ˆ − ξ 2 ǫ2    . With this deﬁnition of γ and the deﬁnition of λj above. showing that γ = ||λ||/(2ξǫ).

respectively. The resulting posterior distribution takes the form ai exp(−ai mi ) for ai = 0 1 − exp(−ai Ui ) 1 for ai = 0 . ˆ The solution to this inverse problem is m.05 and i i mi 0 q(m′ ) dm′ = 0. i=1 where ai = βi + N j=1 λj gji . which are given by the values of mi such that mi 0 q(m′ ) dm′ = 0. the 90% probability interval has as its bounds the 5th and 95th percentile probability levels. 2 mi = ˆ mi = ˆ (10) The uncertainty in the MRE solution can be expressed with probability levels of q(m). i i 6 .95. For example.system of equations ˆ d − Gm(λ) + ξǫ λ=0. given by 1 − (ai Ui + 1) exp(−ai Ui ) for ai = 0 ai [1 − exp(−ai Ui )] Ui for ai = 0 . Details of the derivation are given in Neupauer (1999). ||λ|| (8) These Lagrange multipliers constrain the individual entries of model solution vector so that the norm of the residuals between the measured and modeled plume satisﬁes the (equality) data constraint. Ui M q(mi ) = q(mi ) = q(m) = (9) q(mi ) .

where ǫa and ǫm are the standard j j deviations of the additive and multiplicative error. respectively. where Li is the lower bound of ˆ model parameter mi . The code is available at ftp. and l is the vector of lower ˆ bounds. where dL is the data vector used in the MRE program. δj is a standard normal 7 . 3. the true solution ˆ ˆ is obtained from m = mo + l (Woodbury and Ulrych. and are discussed here. 0 We consider two possible error models–an additive error model with dj = dj + ǫa δj and a multiplicative error model with dj = d0 + d0 ǫm δj . respectively. and G is the matrix of kernel values.org. with the data modiﬁed as dL = d − Gl. 5) and the system of equations used to obtain the values of the Lagrange multipliers λj (Eq. the ˆ ˆ ˆ ˆ problem can be re-scaled by deﬁning m = mo + l. 8) contain a parameter ξ that depends on the error model. After mo is computed with the MRE method. We can solve for mo using the MATLAB routine. where m is the true solution. following the approach of the previous section.1 Non-zero Lower Bounds The code was written with lower bounds of zero.3 Implementation Issues We wrote a MATLAB program to implement the MRE method. 1996). 3.iamg.2 Error Models The data constraint (Eq. d is the true data vector. mo is the corresponding model solution for a zero lower bound. Several numerical issues arose that may not be apparent from the previous discussion. For non-zero lower bounds. The upper bounds and expected values must be replaced by Ui − Li and si − Li .

8 . (2). 1981).random number. βi is small but non-zero. If βi = 0. To avoid this error. (3) for |βi | small (|βi | < 10−4 in the MRE code): si ≈ 2 12Ui − 8βi Ui2 + 3βi Ui3 . To avoid this error. (3) by using the bisection method (Gill et al. and with the multiplicative error model. βi . The values for each βi are determined individually from Eq. the MRE program assigns βi = 0.3 Prior Distribution The prior distribution is calculated from Eq. then βi → −∞. In the limit as si → Ui /2. d0 is the true value of the data. (3) is subject to numerical error. and no further approximations are needed. we use the asymptotic approximation of βi = −1/(Ui − si ) for si ≈ Ui in our MRE algorithm.. and dj is the measured value. if si = Ui /2. 2 3 24 − 12βi Ui + 4βi Ui2 − βi Ui3 (12) If si ≈ Ui . ||λ|| (11) 3. (3) is subject to numerical error. Eq. we use ξ 2 = ||d||2 . For this case. If si ≈ Ui /2. and Eq. j If the additive error model is used. and Eq. The MRE implementation allows both additive and multiplicative components in the error model by replacing Eq. then βi → ∞. (3) is indeterminate. For certain values of βi . Eq. βi → 0. we solve for βi using an asymptotic approximation to Eq. (8) with ˆ d − Gm(λ) + λ √ N ǫa + ||d||ǫm = 0 . then ξ 2 = N . Thus. and asymptotic approximations must be used. which is written in terms of the Lagrange multipliers. the MRE algorithm sets βi to the upper limit of the starting interval of the bisection method. If si ≈ 0. (3) is subject to numerical error.

the expressions shown in Eq. (2) in terms of the Lagrange multipliers. is shown in Eq. λN ]T . using F (λ) = 0 where M F (λ)j = dj − gji mi (λ) + ξǫ ˆ i=1 λj . 5). which is written in terms of ai = βi + N j=1 λj gji . If this satisﬁes the data constraint (Eq. . The values of λ (λ = [λ1 . p(m). (14) using ∂F ∂λ λk−1 −1 λk = λk−1 − Fk−1 . (10) to obtain the solution to the inverse problem. we use the Newton-Raphson method to iteratively solve for the zeroes of Eq. otherwise . where λj are Lagrange multipliers. (8). βi . λ2 . we use λj = 1 in Eq.1 Posterior Distribution Lagrange Multipliers The posterior distribution is calculated from Eq.4. We begin with λj = 1. and ˆ ξǫ = √ N ǫa +||d||ǫm . and the prior distribution is approximated by      −2βi [1 + βi (Ui − mi )]    0  p(mi ) = Ui + 1/βi ≤ mi ≤ Ui . Otherwise.4 3. (9). . . (15) 9 . . (13) 3. If βi ≪ −1 (βi < −102 in the MRE code). where T denotes transpose) are calculated using the Newton-Raphson method to solve Eq. ||λ|| (14) Here dj is the j th measured data point.The prior distribution. (2) cannot be evaluated numerically. 10). mi is the expected value of mi (Eq.

10 . and the expressions for mi (Eq. . and ∂ mi ˆ ∂ai a2 Ui2 exp (−ai Ui ) − [1 − exp(−ai Ui )]2 i . . a2 [1 − exp(−ai Ui )]2 i = (17) Eq. at a level that is negligible compared to the true data values. we use the Newton-Raphson method to calculate the optimal search direction.. The process is repeated until ||F||/(1 + ||d||) is less than a user-speciﬁed tolerance. we perform row-scaling on the Jacobian matrix. . q(m). (9).2 Asymptotic Approximations The posterior distribution.4. are M ∂ mi ˆ ξǫ λj λl ∂Fj =− gji gli + δjl − ∂λl ∂ai ||λ|| ||λ||2 i=1 . To reduce the condition number. (14) is suﬃciently non-linear that the step size calculated in Eq. It is also possible to substantially reduce the condition number by including a small amount of noise. 10) and ˆ ∂mi /∂ai (Eq. (15) is not necessarily optimal. we use a univariate golden section search (Gill et al. in Eq. these expressions are subject to numerical error and asymptotic approximations must be used. For certain values of βi and ai . and the terms of the Jacobian matrix. then. N . Therefore. 2. ∂F/∂λ. We have found that the condition number of the Jacobian matrix can be high when ǫ = 0. 17) are all written in terms of ai = βi + N j=1 λj gji .where the superscripts denote the iteration. 1981) in that direction to calculate the optimal step length. 3. . δjl is the Kronecker delta. (16) where l = 1.

(2). 1/ai . When βi ≪ −1. (16) is also subject to numerical error when βi ≪ ˆ −1 and when |ai | is very small or very large. the partial derivative. 3 2 12Ui −8ai Ui +3a2 Ui i 2 U 2 −a3 U 3 . when Eq. and the resulting the asymptotic approximation of the posterior distribution is          2 6βi (bi −3βi ) [1 + βi (Ui − mi )]ebi (Ui −mi ) Ui + 1/βi ≤ mi ≤ Ui . q(mi ) =         0 where bi = N j=1 λj gji . However. (18) is used as an asymptotic approximation of q(mi )) and when |ai | is very small or very large. (10) is subject to numerical error when βi ≪ −1 ˆ (i. ˆ mi = ˆ mi = ˆ Ui /2. 13). (19) mi ≈ Ui + (bi − βi )/[βi (bi − 3βi )]. 1−(ai Ui +1) exp(−ai Ui ) . the general expression for mi is ˆ mi ≈ ˆ mi ≈ ˆ mi ≈ ˆ Ui − 1/ai . 11 . (9) by minimizing the entropy of the posterior distribution relative to the prior distribution shown in Eq. ai [1−exp(−ai Ui )] The expression for ∂ mi /∂ai in Eq. an asymptotic approximation to the prior distribution is used (Eq.. if βi ≪ −1. (18) otherwise . The expression for mi in Eq. 24−12ai Ui +4ai i i i ai ≪ −1 (ai < −102 ) ai ≫ 1 (ai > 102 ) |ai | ≪ 1 (|ai | < 10−4 ) βi ≪ −1 ai = 0 otherwise.e. Using asymptotic approximations when necessary.We obtained Eq.

1995) and for the source history reconstruction problem (Woodbury and Ulrych. homogeneous porous medium. Woodbury et al. 1993. Woodbury et al. 12 . −2/(bi − 3βi )2 2 a2 Ui exp(−ai Ui )−[1−exp(−ai Ui )]2 i . 1996. but the present-day spatial distribution of the contamination plume is measured at discrete locations. Woodbury and Ulrych. (16) is replaced with ∂ mi /∂bi . the general expression for ∂ mi /∂ai is ˆ ∂ mi ˆ ∂ai ∂ mi ˆ ∂ai ∂mi ∂bi ∂ mi ˆ ∂ai ≈ ≈ ≈ = i −Ui2 180−180ai Ui +105a2iU 2 .. in Eq.∂ mi /∂ai . a2 [1−exp(−ai Ui )]2 i 4 Example The MRE inversion approach has been used in groundwater hydrology for parameter estimation (Woodbury and Ulrych. We present a source history reconstruction example from Woodbury and Ulrych (1996) as an illustration of our MRE algorithm. The source history reconstruction problem involves a point source of groundwater contamination at a known location in a one-dimensional. (20) 2 −1/ai . 1998.org. Using this data with the MRE method. i i i i 15−15a U +8a2 U 2 |ai | ≪ −1 |ai | ≫ 1 βi ≪ −1 otherwise. saturated.. we reconstruct the concentration history at the contamination source. The data ﬁles are available at ftp. Using asymptotic approximations when ˆ ˆ necessary.iamg. 1998). The source concentration over time is unknown.

For this example. the plume at T = 300 days was calculated and is shown in Fig. The plume was sampled at sixty points. 1. we assumed an additive noise with ǫa = 0. For this problem. The sampled data are also shown in Fig. D is the dispersion coeﬃcient. where ∆t is the temporal discretization of the source history function (∆t = 3 days). These data comprise the vector d. The initial and boundary conditions for Eq. t) = Cin (t). 2.005. and C(x. and Cin is dimensionless. Using Eqs. is a discretized version of this curve. t) → 0 as x → ∞. The elements of the matrix G are gji = ∆tf (xj . and v is the groundwater velocity. m. The governing equation of contaminant transport in groundwater is the advection dispersion equation ∂C ∂2C ∂C =D 2 −v . the true source history function is (t − 150)2 (t − 130)2 + 0. (21) and (22). C(0. The solution vector. T − ti ). at ﬁve-meter intervals from x = 5 m to x = 300 m. This source history function is shown in Fig. at three-day intervals between t = 0 days and t = 297 days. 0) = 0. We discretized the source history function into 100 evenly-spaced times. 2. Cin (t) = exp − (21) 0. t) is concentration at time t at location x. ∂t ∂x ∂x (22) where C = C(x.3 exp − + 2(5)2 2(10)2 (t − 190)2 2(7)2 . (22) are C(x.5 exp − where t is in days. we use v = 1 m/d and D = 1 m2 /d. T is the sample time 13 .

The results show that the randomly-generated solutions fall within the 90% probability interval most of the time. 4. 5. 3. 5 Summary and Conclusions A MATLAB program for solving linear inverse problems using the minimum relative entropy method was presented. ti is the time corresponding to the ith element of m. along with the 90% probability interval.1. We solved this inverse problem using the MRE algorithm. We discussed several implementation issues that may not be apparent from the MRE theory. the system of 14 . given by xj [xj − v(T − ti )]2 exp − 4D(T − ti ) 2 πD(T − ti )3 f (xj . (22) for a pulse input at x = 0 and t = 0. The results of MRE algorithm match well with the true solution. although the small peak near t = 150 days is not reproduced. In evaluating the Lagrange multipliers. The results of the 30 realizations are plotted in Fig. xj is the location of the j th measurement. we generated 30 realizations of the model solution vector m. We solved the minimization problem using the method of Lagrange multipliers. we assume an upper bound on m of Ui = 1. (23) For this problem. The realization follows the same pattern as the true solution. q(m). The true solution falls within the 90% probability interval. and f (xj . and a prior expected value of si = exp{−(ti − 150)2 /800}. but is less smooth. T − ti ) = . Using the posterior distribution. and obtained the solution shown in Fig. T − ti ) is the solution of Eq.(T = 300 days). An example of one realization is shown in Fig.

U-915324-01-0. Murray. 401 pp. California. NRL Memorandum 15 . E. 1984. San Diego. We demonstrated the MATLAB algorithm on a source history reconstruction problem from groundwater hydrology. M.equations is suﬃciently non-linear that the Newton-Raphson method fails to converge. Johnson. P... Allan Woodbury provided information about his earlier implementation of the minimum relative entropy inversion method. and Shore. W. References Gill. Acknowledgments This research was supported in part by the Environmental Protection Agency’s STAR Fellowship program under Fellowship No. R. The results show that the MRE method performs well for this simple test case. Relative-entropy minimization with uncertain constraints – Theory and application to spectrum analysis. E. This work has not been subjected to the EPA’s peer and administrative review and therefore may not necessarily reﬂect the views of the Agency and no oﬃcial endorsement should be inferred. Inc. Wright. W.. Practical Optimization. Other implementation issues included asymptotic approximations as the values of the Lagrange multipliers approach ±∞ or zero. 1981. Academic Press. J... H. We used a damped Newton-Raphson method using the Newton-Raphson method to compute a search direction and a univariate golden section search to obtain the optimal step length..

San Diego.. Woodbury. New Mexico. A.. D. Ulrych. Woodbury. T.Sc. M. Ulrych. 532–538. 2847–2860. 1998. 1992. 1995. 1993.. Three-dimensional plume source reconstruction using minimum relative entropy inversion. T. Ulrych. 1998. Minimum relative entropy inversion: Theory and application to recovering the release history of a groundwater contaminant. M. Woodbury. A.. A. 197 pp. 2671–2681.. Naval Research Laboratory.. Stochastic Hydrology and Hydraulics 12. D. T.. Neupauer. F. Socorro. H. Ulrych. 408 pp. Ground Water 33 (4).. 131–158. T. Render.. 1999. 317– 358. J. A Comparison of Two Methods for Recovering the Release History of a Groundwater Contamination Source. J. Water Resources Research 29 (8).. T. Ludwig. Practical probabilistic ground-water modeling.. Academic Press. K. Journal of Contaminant Hydrology 32. D. Kesavan. Thesis. N. Woodbury. D. Entropy Optimization Principles with Applications. R. California. J. Minimum relative entropy and probabilistic inversion in groundwater hydrology.. E. 16 . Washington. Minimum relative entropy: Forward probabilistic modeling. Ulrych. New Mexico Institute of Mining and Technology. J. D.C. A.. A. Sudicky. 1996.. Woodbury. Kapur. J. Water Resources Research 32 (9)... D.Report 5478. R.

5th percentile probability level (dashed line). solid line: realization. Figure 4: One realization of the model solution. (A) MRE solution (solid line). 17 .Figure Captions Figure 1: True source history function. (B) MRE solution (solid line). true source history function (dashed line). dashed line: true source history. Figure 3: Inverse problem solution for the example problem. 95th percentile probability level (dot-dashed line). Figure 5: Results of 30 realizations of the model solution (dots) and the 90% probability interval (solid lines). Figure 2: True plume at T = 300 days (solid line) and sampled data (circles). prior expected value (dot-dashed line).

5 0 0 50 100 150 200 Time (days) 250 300 Figure 1 18 .5 Cin(t) 1 0.1.

C(x.T=300 days) 0.4 0.2 0 0 50 100 150 200 Distance (m) 250 300 Figure 2 19 .

1.5 0 1.5 A Cin(t) 1 0.5 B Cin(t) 1 0.5 0 0 50 100 150 200 Time (days) 250 300 Figure 3 20 .

5 Cin(t) 1 0.5 0 0 50 100 150 200 Time (days) 250 300 Figure 4 21 .1.

1.5 0 0 50 100 150 200 Time (days) 250 300 Figure 5 22 .5 Cin(t) 1 0.