
$\chi^2$ tests for choice of regularization parameter in nonlinear inverse problems

J.L. Mead and C.C. Hammerquist


Supported by NSF grant DMS 1043107. Boise State University, Department of Mathematics, Boise, ID 83725-1555. Tel: 208-426-2432, Fax: 208-426-1354, Email: jmead@boisestate.edu.

Abstract. We address discrete nonlinear inverse problems with weighted least squares and Tikhonov regularization. Regularization is a way to add more information to the problem when it is ill-posed or ill-conditioned. However, it is still an open question how to weight this information. The discrepancy principle considers the residual norm to determine the regularization weight or parameter, while the $\chi^2$ method [8, 9, 10, 12] uses the regularized residual. Using the regularized residual has the benefit of giving a clear $\chi^2$ test with a fixed noise level when the number of parameters is equal to or greater than the number of data. Previous work with the $\chi^2$ method has been for linear problems, and here we extend it to nonlinear problems. In particular, we determine the appropriate $\chi^2$ tests for the Gauss-Newton and Levenberg-Marquardt algorithms, and these tests are used to find a regularization parameter or weights on initial parameter estimate errors. This algorithm is applied to a two-dimensional cross-well tomography problem and a one-dimensional electromagnetic problem.

Keywords: Regularization, nonlinear, least squares, covariance

Submitted to: Inverse Problems

1. Introduction

We address nonlinear inverse problems of the form $F(x) = d$, where $d \in \mathbb{R}^m$ represents measurements and $F: \mathbb{R}^n \to \mathbb{R}^m$ is a nonlinear model that depends on parameters $x \in \mathbb{R}^n$. It is fairly straightforward, although possibly computationally demanding, to recover parameter estimates $x$ from data $d$ for a linear operator $F = A$ when $A$ is well-conditioned. If $A$ is ill-conditioned, typically constraints or penalties are added and the problem is regularized. The most common regularization technique is Tikhonov regularization, where least squares is used to minimize
$$\|d - Ax\|_2^2 + \lambda^2 \|Lx\|_2^2. \qquad (1)$$

The minimum occurs at $\hat{x} = (A^T A + \lambda^2 L^T L)^{-1} A^T d$, where it is clear how an appropriate choice of $\lambda$ creates a well-conditioned problem. Popular methods for choosing $\lambda$ are the discrepancy principle, the L-curve, and generalized cross validation (GCV) [5], while $L$ can be chosen to be the identity matrix or a discrete approximation to a derivative operator. The problem with this approach is that large values of $\lambda$ may significantly change the problem, or create undesirably smooth solutions. Nonlinear problems have the same issues but, in addition, carry the added burden of linear approximations [11]. The parameter estimate of the regularized nonlinear problem minimizes
$$\|d - F(x)\|_2^2 + \lambda^2 \|Lx\|_2^2. \qquad (2)$$
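For concreteness, a minimal numerical sketch of computing the Tikhonov estimate via the normal equations is given below; the matrix $A$, data $d$, and operator $L$ are illustrative placeholders, not quantities from this paper.

```python
import numpy as np

def tikhonov_solve(A, d, lam, L=None):
    # Minimize ||d - A x||_2^2 + lam^2 ||L x||_2^2 via the normal equations:
    # x = (A^T A + lam^2 L^T L)^{-1} A^T d
    n = A.shape[1]
    if L is None:
        L = np.eye(n)                      # identity regularization
    lhs = A.T @ A + lam**2 * (L.T @ L)
    return np.linalg.solve(lhs, A.T @ d)

# Hypothetical ill-conditioned example
rng = np.random.default_rng(0)
A = np.vander(np.linspace(0, 1, 20), 8, increasing=True)   # ill-conditioned columns
x_true = rng.standard_normal(8)
d = A @ x_true + 1e-3 * rng.standard_normal(20)
x_hat = tikhonov_solve(A, d, lam=1e-2)
```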

In [3] it is shown that this regularized problem is stable and that there is a weak connection between the ill-posedness of the non-linear problem and that of the linearization. In [13] the well-posedness of the regularized problem and the convergence of regularized solutions are proved. The least squares estimate can be viewed as the maximum a posteriori (MAP) estimate [14] when the data and initial parameter estimate $x_p$ have errors that are normally distributed with error covariances $C_\epsilon$ and $C_f$, respectively. Regularization in this case takes the form of adding a priori information about the parameters. The regularized maximum likelihood estimate minimizes the following objective function:
$$\mathcal{J} = (d - F(x))^T C_\epsilon^{-1} (d - F(x)) + (x - x_p)^T C_f^{-1} (x - x_p). \qquad (3)$$

The difficulty with this viewpoint is that the distributions of the data and initial parameter errors may be unknown, or not normal. In addition, good estimates of $C_\epsilon$ and $C_f$ can be difficult to acquire. However, different norms or functionals can be sampled which represent error distributions other than normal, and scientific knowledge can be used to


form prior error covariance matrices. Empirical Bayes methods can also be used to find hyperparameters, i.e. parameters which define the prior distribution. In this work we use the least squares estimator and weight the squared errors with inverse covariance matrices, but do not assume the errors are normally distributed. Matrix weights alleviate the problem of undesirably smooth solutions because they give piecewise smooth solutions; thus discontinuous parameters can be obtained. In preliminary work [8] and [10], diagonal matrix weights $C_f$ were found, and future work involves more robust methods for finding $C_f$. In this initial work on nonlinear problems, we approximate the unknown error covariance matrix with a scalar, but future work involves finding denser matrices.

To find this scalar weight we use the $\chi^2$ method, which is similar to the discrepancy principle. The discrepancy principle can be viewed as a $\chi^2$ test on the data residual, but this is not possible when the number of parameters is greater than or equal to the number of data, because the number of degrees of freedom in the $\chi^2$ test is then zero or negative. Instead we suggest applying the $\chi^2$ test to the regularized residual (1), in which case the number of degrees of freedom is equal to the number of data [1]. This is called the $\chi^2$ method, and previous work applying the method to linear problems has been done [8, 9, 10, 12].

The $\chi^2$ method is described in Section 2, where we also develop it for nonlinear problems. We prove in Section 2 that the $\chi^2$ principle applies to each iterate of the Gauss-Newton and Levenberg-Marquardt methods, and give the appropriate functionals from which the regularization parameter, or error covariance matrices, can be estimated. In Section 3 we give results from the method on benchmark problems from [1], and compare it to other methods. In Section 4 we give conclusions and future work.

2. Nonlinear $\chi^2$ method

We formulate the problem as minimizing the weighted errors $\epsilon$ and $f$ in a least squares sense, where the errors are defined by
$$d = F(x) + \epsilon, \qquad (4)$$
$$x = x_p + f. \qquad (5)$$

The weighted least squares fit of the measurements $d$ and initial parameter estimate $x_p$ is found by minimizing the objective function in (3). This means the weights are chosen to be the inverse covariance matrices for the respective errors. Statistical information comes in the form of error covariance matrices, but they are only used to weight the errors or residuals, and not used in the sense of Bayesian inference, because the errors may not be Gaussian. An initial estimate $x_p$ is required for this method, but this often comes from some understanding of the problem. The goal is then to find a way to weight this initial


estimate. The elements in $C_f$ should reflect whether $x_p$ is a good or poor estimate, and ideally quantify the correlation in the parameter errors. Estimation of $C_f$ is thus a challenge with limited understanding of the parameters, but regularization methods are an approach to its approximation. In this work we use a modified form of the discrepancy principle, called the $\chi^2$ method [8], to estimate it.

It is well known that the linear form of the objective function (3) follows a $\chi^2$ distribution [1, 8], and this fact is often used to check the validity of estimates. This property is given in the following theorem.

Theorem 1. If $F(x) = Ax$, the errors $\epsilon$ and $f$ are independent and identically distributed random variables from unknown distributions with $\bar{\epsilon} = \bar{f} = 0$, $E[\epsilon \epsilon^T] = C_\epsilon$, $E[f f^T] = C_f$ and $E[\epsilon f^T] = 0$, and the number of data $m$ is large, then the minimum value of (3) follows a $\chi^2$ distribution with $m$ degrees of freedom.

A proof of this is given in [8]. In the case that $x_p$ is not the mean of $x$, (3) follows a non-central $\chi^2$ distribution, as shown in [12]. The requirement that $E[\epsilon f^T] = 0$ assumes that the errors in $d$ and $x_p$ are uncorrelated, and hence that we do not get $x_p$ from $d$.

The $\chi^2$ method is used to find weights for misfits in the initial parameter estimates, $C_f$, or data, $C_\epsilon$, and is based on the $\chi^2$ test given in Theorem 1. The method can also be used to find weights for the regularization term when it contains a first or second derivative operator $L$; in this case the inverse covariance matrix is viewed as weighting the error in an initial estimate of the appropriate derivative. The minimum value of (3) is $\mathcal{J}(\hat{x}) = r^T (A C_f A^T + C_\epsilon)^{-1} r$ with $r = d - A x_p$ [8]. Its expected value is the number of degrees of freedom, which is the number of measurements in $d$. The $\chi^2$ method thus estimates weights $C_f$ or $C_\epsilon$ by solving the nonlinear system of equations
$$r^T (A C_f A^T + C_\epsilon)^{-1} r = m. \qquad (6)$$

For example, with $d$ given by the data and $x_p$ an initial parameter estimate (possibly 0), if we have an estimate of the data uncertainty $C_\epsilon$ we can solve for $C_f$ (alternatively, we can solve for $C_\epsilon$ given $C_f$). This differs from the discrepancy principle in that the regularized residual is used for the $\chi^2$ test, rather than just the data residual. One advantage of this approach over the discrepancy principle is that it can be applied when the number of parameters is greater than or equal to the number of data. Results from the $\chi^2$ method for the case where $C_f = \sigma_f^2 I$, i.e. the scalar $\chi^2$ method, are given in [9, 12]. It is shown there that it is an attractive alternative to the L-curve, generalized cross validation (GCV), and the discrepancy principle, among other methods.
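As an illustration of solving (6) in the scalar case, the sketch below (with hypothetical inputs) finds $\sigma_f$ such that the regularized residual equals $m$; the original work uses Newton-type root finding [9], so bisection here is only a schematic stand-in.

```python
import numpy as np

def chi2_residual(sf, A, r, C_eps, m):
    # Left side of (6) minus m, with C_f = sf^2 I:
    # r^T (sf^2 A A^T + C_eps)^{-1} r - m
    P = sf**2 * (A @ A.T) + C_eps
    return r @ np.linalg.solve(P, r) - m

def scalar_chi2_weight(A, d, x_p, C_eps, lo=1e-8, hi=1e8, tol=1e-10):
    # The left side of (6) decreases monotonically in sf, so bisection
    # applies once a sign change brackets the root.
    m = len(d)
    r = d - A @ x_p
    f_lo = chi2_residual(lo, A, r, C_eps, m)
    f_hi = chi2_residual(hi, A, r, C_eps, m)
    if f_lo * f_hi > 0:
        raise ValueError("no root bracketed in [lo, hi]")
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if chi2_residual(mid, A, r, C_eps, m) * f_lo > 0:
            lo = mid          # root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```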


Preliminary work on denser estimates of $C_\epsilon$ or $C_f$ has been done in [8, 10], and future work involves efficient solution of nonlinear systems similar to (6) to estimate denser $C_f$ or $C_\epsilon$.

Here we extend the estimation of $C_f$ or $C_\epsilon$ by $\chi^2$ tests to nonlinear problems. It has been shown that the nonlinear data residual has properties similar to those of the linear data residual []; in particular, it behaves like a $\chi^2$ random variable with $m - n$ degrees of freedom. Here we show that when Newton-type methods are used to find the minimum of (3), the corresponding cost functions at each iterate are also $\chi^2$, and we give the $\chi^2$ tests for both the Gauss-Newton and Levenberg-Marquardt methods.

2.1. Gauss-Newton method

The Gauss-Newton method uses Newton's method to estimate the minimum of a function. When we use it to find parameters that minimize (3) (or (2) with $x_p = 0$, $C_f^{-1} = \lambda^2 L^T L$), we get the following estimate at each iterate:
$$x_{k+1} = x_k + (J_k^T C_\epsilon^{-1} J_k + C_f^{-1})^{-1} (J_k^T C_\epsilon^{-1} r_k - C_f^{-1} \Delta x_k) \qquad (7)$$

where $J_k$ is the Jacobian of $F(x)$ evaluated at the previous estimate $x_k$, $r_k = d - F(x_k)$, and $\Delta x_k = x_k - x_p$. The corresponding objective function that is minimized at each iterate is essentially the linearization of (3):
$$\tilde{\mathcal{J}}_k(x) = (\tilde{d}_k - J_k x)^T C_\epsilon^{-1} (\tilde{d}_k - J_k x) + (x - x_p)^T C_f^{-1} (x - x_p), \qquad (8)$$

with $\tilde{d}_k = d - F(x_k) + J_k x_k$. Just as the linear form of (3) follows a $\chi^2$ distribution with $m$ degrees of freedom, the underlying cost function (8) at each iterate of the Gauss-Newton method for nonlinear problems also follows a $\chi^2$ distribution with $m$ degrees of freedom, and this is proved in Theorem 2. The problem (4)-(5) is now formulated as minimizing the weighted errors $\epsilon_k$ and $f$ defined by
$$\tilde{d}_k = J_k x + \epsilon_k, \qquad (9)$$
$$x = x_p + f, \qquad (10)$$
where $\epsilon_k = \epsilon + \eta_k$ and $\eta_k$ represents the error in the linear approximation at step $k$, i.e. $\eta_k = F(x) - F(x_k) - J_k(x - x_k)$.

Theorem 2. Let $\bar{\epsilon}_k = \bar{f} = 0$, $E[\epsilon_k f^T] = 0$, $E[\epsilon_k \epsilon_k^T] = C_{\epsilon_k}$, and $E[f f^T] = C_f$. Then the minimum value of (8) at each iterate is
$$\tilde{\mathcal{J}}_k(x_{k+1}) = (r_k + J_k \Delta x_k)^T P_k^{-1} (r_k + J_k \Delta x_k) \qquad (11)$$
with $P_k = J_k C_f J_k^T + C_{\epsilon_k}$, and it follows a $\chi^2$ distribution with $m$ degrees of freedom.


Proof. The minimum occurs at $x_{k+1}$, defined by
$$x_{k+1} = x_k + (J_k^T C_{\epsilon_k}^{-1} J_k + C_f^{-1})^{-1} (J_k^T C_{\epsilon_k}^{-1} r_k - C_f^{-1} \Delta x_k) \qquad (12)$$
where $r_k = \tilde{d}_k - J_k x_k$ and $\Delta x_k = x_k - x_p$. Define $Q_k = J_k^T C_{\epsilon_k}^{-1} J_k + C_f^{-1}$ so that
$$x_{k+1} = x_k + C_f J_k^T P_k^{-1} r_k - Q_k^{-1} C_f^{-1} \Delta x_k. \qquad (13)$$

The minimum value of (8) is found by substituting (13) into it. Simplifying, we get
$$\tilde{\mathcal{J}}_k(x_{k+1}) = r_k^T P_k^{-1} r_k + 2 \Delta x_k^T J_k^T P_k^{-1} r_k + \Delta x_k^T C_f^{-1} (C_f - Q_k^{-1}) C_f^{-1} \Delta x_k. \qquad (14)$$

Simplifying further, we note that
$$C_f^{-1} (C_f - Q_k^{-1}) C_f^{-1} = C_f^{-1} - C_f^{-1} Q_k^{-1} C_f^{-1} = C_f^{-1} Q_k^{-1} (Q_k - C_f^{-1}) = C_f^{-1} Q_k^{-1} J_k^T C_{\epsilon_k}^{-1} J_k = J_k^T P_k^{-1} J_k,$$
where the last equality uses the fact that $C_f J_k^T P_k^{-1} = Q_k^{-1} J_k^T C_{\epsilon_k}^{-1}$. Now we have the quadratic form
$$\tilde{\mathcal{J}}_k(x_{k+1}) = (r_k + J_k \Delta x_k)^T P_k^{-1} (r_k + J_k \Delta x_k) = k^T k,$$
where $k = P_k^{-1/2} (r_k + J_k \Delta x_k)$. The square root of $P_k$ is defined since $C_{\epsilon_k}$ and $C_f$ are covariance matrices, and hence $P_k$ is symmetric positive definite. The mean of $k$ is $\bar{k} = P_k^{-1/2} (\bar{r}_k + J_k (\bar{x}_k - x_p))$, with $\bar{r}_k = J_k (x_p - \bar{x}_k)$ if $\bar{\epsilon} = \bar{\eta}_k = 0$. Now using (13),
$$\bar{x}_k - x_p = (I - Q_{k-1}^{-1} C_f^{-1} - C_f J_{k-1}^T P_{k-1}^{-1} J_{k-1}) (\bar{x}_{k-1} - x_p) = Q_{k-1}^{-1} (Q_{k-1} - C_f^{-1} - J_{k-1}^T C_{\epsilon_{k-1}}^{-1} J_{k-1}) (\bar{x}_{k-1} - x_p) = 0,$$
thus $\bar{k} = 0$. In addition, the covariance of $k$ is
$$E[k k^T] = P_k^{-1/2} (J_k C_f J_k^T + C_{\epsilon_k}) P_k^{-1/2} = I.$$
Hence $k$ has zero mean and identity covariance, and as in the linear case [8] the quadratic form $k^T k$ follows a $\chi^2$ distribution with $m$ degrees of freedom for large $m$.
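To make the quantities in Theorem 2 concrete, the following sketch implements one Gauss-Newton step (7) and evaluates the minimum value (11); the function and variable names are illustrative, and F and Jac stand for a user-supplied forward model and Jacobian.

```python
import numpy as np

def gauss_newton_step(F, Jac, x_k, d, x_p, C_eps_inv, C_f_inv):
    # One Gauss-Newton step (7):
    # x_{k+1} = x_k + (J^T Ce^{-1} J + Cf^{-1})^{-1} (J^T Ce^{-1} r_k - Cf^{-1} (x_k - x_p))
    J = Jac(x_k)
    r_k = d - F(x_k)
    Q = J.T @ C_eps_inv @ J + C_f_inv
    g = J.T @ C_eps_inv @ r_k - C_f_inv @ (x_k - x_p)
    return x_k + np.linalg.solve(Q, g)

def linearized_minimum(J_k, r_k, x_k, x_p, C_eps_k, C_f):
    # Minimum value (11) of the linearized objective (8):
    # (r_k + J_k dx)^T P_k^{-1} (r_k + J_k dx),  P_k = J_k C_f J_k^T + C_eps_k.
    # Under the assumptions of Theorem 2 this is ~ chi^2 with m degrees of freedom.
    s = r_k + J_k @ (x_k - x_p)
    P = J_k @ C_f @ J_k.T + C_eps_k
    return s @ np.linalg.solve(P, s)
```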

Note that $C_{\epsilon_k} = C_\epsilon + C_{\eta_k}$, and $C_{\eta_k}$ can be approximated using the Hessian $H_k$ at each iterate, i.e.
$$C_{\eta_k} \approx \frac{H_k}{2} \, \mathrm{cov}\!\left((x - x_k)^2\right) \frac{H_k^T}{2}. \qquad (15)$$

2.2. Occam's method

Occam's method estimates the parameters that minimize (3) iteratively with (7), with $C_f^{-1} = \lambda_k^2 L^T L$ in (7); it differs from Gauss-Newton in that the regularization parameter is updated at each iterate. The regularization parameter is chosen according to the discrepancy principle, and the operator $L$ is almost always chosen to represent the second derivative [1]. The discrepancy principle finds the value of $\lambda_k$ for which $\|d - F(x_{k+1})\|_2^2 \approx \|\epsilon\|_2^2$.

The goal in Occam's inversion is to find smooth parameters that fit the data [4]. When $x_p$, and hence the second derivative estimate, is chosen to be 0, smooth results are obtained. The discrepancy principle uses the $\chi^2$ test for the data residual to find weights $\lambda_k$ that minimize this second derivative estimate of zero, which is why the resulting parameters are smooth. It is possible to find non-smooth parameters even when $x_p$ is 0 if the second derivative operator is weighted with an appropriate error covariance matrix: proper weighting of the residuals, i.e. not giving each element the same weight, allows piecewise smooth estimates [10].

The nonlinear $\chi^2$ method differs from Occam's inversion in that the regularized residual is used to find the weights $\lambda_k$. When the operator $L$ is not full rank, the number of degrees of freedom in the $\chi^2$ test changes [9]. This leads to the following corollary to Theorem 2.

Corollary 3. Let
$$\tilde{\mathcal{J}}_L(x) = (\tilde{d}_k - J_k x)^T C_\epsilon^{-1} (\tilde{d}_k - J_k x) + (L(x - x_p))^T (C_f^L)^{-1} (L(x - x_p)), \qquad (16)$$

and assume the invertibility condition
$$\mathcal{N}(J_k) \cap \mathcal{N}(L) = \{0\}, \qquad (17)$$

where $\mathcal{N}(J_k)$ is the null space of the matrix $J_k$. In addition, let the assumptions in Theorem 2 hold, except that $E[Lf (Lf)^T] = C_f^L$. If $L$ has rank $p$, the minimum value of the functional (16) follows a $\chi^2$ distribution with $m - n + p$ degrees of freedom.

Proof. The proof for linear problems is given in [9].

2.3. Levenberg-Marquardt method

With the Gauss-Newton method it may be the case that the objective function does not decrease, or decreases slowly, at each iteration. The solution can be incremented

in the direction of steepest descent by introducing the damping parameter $\lambda_k$. The iterate in this case at each step is
$$x_{k+1} = x_k + (J_k^T C_\epsilon^{-1} J_k + C_f^{-1} + \lambda_k I)^{-1} (J_k^T C_\epsilon^{-1} r_k - C_f^{-1} \Delta x_k). \qquad (18)$$

In [1] they state that this approach differs from Tikhonov regularization because it does not alter the objective function (3). However, at each iterate it does minimize an altered form of $\tilde{\mathcal{J}}_k(x)$ in (8), i.e. it minimizes
$$\tilde{\mathcal{J}}_k^{LM}(x) = (\tilde{d}_k - J_k x)^T C_\epsilon^{-1} (\tilde{d}_k - J_k x) + (x - x_p)^T C_f^{-1} (x - x_p) + \lambda_k (x - x_k)^T (x - x_k). \qquad (19)$$

By viewing each iteration of the LM algorithm as minimizing the cost function (19), we can make a simple modification to Theorem 2 and obtain the corresponding $\chi^2$ test for the LM method.

Theorem 4. With the same assumptions as in Theorem 2, the minimum value of (19) at each iterate is
$$\tilde{\mathcal{J}}_k^{LM}(x_{k+1}) = (r_k + J_k \Delta x_k)^T P_k^{-1} (r_k + J_k \Delta x_k) \qquad (20)$$
with $P_k = J_k (C_f^{-1} + \lambda_k I)^{-1} J_k^T + C_{\epsilon_k}$, and it follows a $\chi^2$ distribution with $m$ degrees of freedom.
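A corresponding sketch for the Levenberg-Marquardt iterate (18) and the statistic (20) follows; again F, Jac, and the damping parameter are assumed given, and the code is schematic rather than the authors' implementation.

```python
import numpy as np

def levenberg_marquardt_step(F, Jac, x_k, d, x_p, C_eps_inv, C_f_inv, damp):
    # One Levenberg-Marquardt step (18): the Gauss-Newton system with
    # additional damping damp * I on the left-hand side.
    J = Jac(x_k)
    r_k = d - F(x_k)
    lhs = J.T @ C_eps_inv @ J + C_f_inv + damp * np.eye(len(x_k))
    rhs = J.T @ C_eps_inv @ r_k - C_f_inv @ (x_k - x_p)
    return x_k + np.linalg.solve(lhs, rhs)

def lm_chi2_statistic(J, r_k, x_k, x_p, C_eps_k, C_f_inv, damp):
    # Minimum value (20) of the damped linearized objective (19):
    # P_k = J (C_f^{-1} + damp I)^{-1} J^T + C_eps_k; statistic ~ chi^2_m.
    s = r_k + J @ (x_k - x_p)
    M = np.linalg.inv(C_f_inv + damp * np.eye(len(x_k)))
    P = J @ M @ J.T + C_eps_k
    return s @ np.linalg.solve(P, s)
```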

In Section 3 we give some results on the cross-well tomography problem.

3. Numerical experiments

We present several numerical experiments to illustrate both the validity of the previous results, with the corresponding necessary approximations, and the applicability of these results to solving real problems. For these numerical experiments we consider two non-linear ill-posed problems from Chapter 10 of [1], where the problems are described and solved, and corresponding Matlab codes are provided with the text that both set up the forward problems and find solutions for the inverse problems.

3.1. Two ill-posed non-linear least squares problems

The first problem is an implementation of non-linear cross-well tomography. The forward model includes ray path refraction, and the refracted rays tend to travel through high-velocity regions and avoid low-velocity regions, adding non-linearity to the problem. It is set up with two wells spaced 1600 m apart, with 150 sources and receivers equally spaced at 100 m depths down the wells. The travel time between each pair of opposing sources and receivers is recorded, and the objective is to recover the two-dimensional velocity structure between the two wells. The true velocity structure has a background of 2.9 km/s with an embedded Gaussian-shaped region that is about 10% faster and


another Gaussian-shaped region that is about 15% slower than the background. The observations for this particular problem consist of 64 travel times between pairs of opposing sources and receivers. The true velocity model along with the 64 ray paths is plotted in Figure 1. The region between the two wells is discretized into 64 square blocks, so there are 64 model parameters (the slowness of each block) and 64 observations (the ray path travel times).


Figure 1: The setup of the cross-well tomography problem. Shown here is the true velocity model (m/s) and the corresponding ray paths.

A second problem considered is the estimation of a soil electrical conductivity profile from above-ground electromagnetic induction measurements. The forward problem models a ground conductivity meter which has two coils on a 1 meter long bar. Alternating current is sent through one of the coils, which induces currents in the soil, and both coils measure the magnetic field created by the subsurface currents. For a complete treatment of the instrument and corresponding mathematical model see [7]; for the purposes of this paper the model is treated as the proverbial black-box function. There are a total of 18 observations, and the subsurface electrical conductivity of the ground is discretized into 10 layers, each 20 cm thick, with a semi-infinite layer below 2 m, resulting in 11 conductivities to be estimated.

3.2. Numerical validation of $\chi^2$ behavior

We numerically tested Theorem 2 on the cross-well tomography problem by using the Gauss-Newton method to minimize (3) for 1000 different realizations of $\epsilon$ and $f$, which were sampled from $N(0, (0.001)^2 I)$ and $N(0, (0.00001)^2 I)$, respectively, and found the approximate distributions of $\tilde{\mathcal{J}}_k$ at each iteration. For perspective, the values of $d$ are $O(0.1)$ and the values of $x$ are $O(0.0001)$. Each of these inversions converged in 6 iterations. An additional simplifying assumption was made that $C_\epsilon \approx C_{\epsilon_k}$. These approximate distributions for $\tilde{\mathcal{J}}_k$ are shown in Figure 2. Since there are 64 observations and 64 model parameters, the theory says that $\tilde{\mathcal{J}}_k \sim \chi^2_{64}$ and so $E(\tilde{\mathcal{J}}_k) = 64$. However, $C_\epsilon$ slightly underestimates $C_{\epsilon_k}$, so the mean of $\tilde{\mathcal{J}}_k$ is somewhat larger than the theoretical mean.

We also tested Corollary 3 using this tomography problem. In [1] they use a discrete approximation of the Laplacian to regularize this problem. Using this operator and the same assumptions as above, we found approximate distributions of $\tilde{\mathcal{J}}_{L_k}$ from (16). The rank of this operator is 63, so $E(\tilde{\mathcal{J}}_{L_k}) = 63$. The approximate distributions found are shown in Figure 2.
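A schematic of this Monte Carlo validation is sketched below, with a generic forward model standing in for the tomography code of [1] (which we do not reproduce); sampling choices mirror the description above, and names are illustrative.

```python
import numpy as np

def monte_carlo_chi2_check(F, Jac, x_true, sig_eps, sig_f,
                           n_trials=1000, n_iters=6, seed=0):
    # Empirical check that the linearized objective (8), evaluated at each
    # Gauss-Newton iterate via (11), has mean near m = len(d) (Theorem 2).
    rng = np.random.default_rng(seed)
    m, n = len(F(x_true)), len(x_true)
    C_eps = sig_eps**2 * np.eye(m)          # data error covariance
    C_f = sig_f**2 * np.eye(n)              # prior error covariance
    C_eps_inv, C_f_inv = np.linalg.inv(C_eps), np.linalg.inv(C_f)
    samples = np.zeros((n_iters, n_trials))
    for t in range(n_trials):
        d = F(x_true) + sig_eps * rng.standard_normal(m)   # noisy data
        x_p = x_true - sig_f * rng.standard_normal(n)      # so that x = x_p + f
        x_k = x_p.copy()
        for k in range(n_iters):
            J, r_k = Jac(x_k), d - F(x_k)
            s = r_k + J @ (x_k - x_p)
            P = J @ C_f @ J.T + C_eps                      # C_eps approximates C_eps_k
            samples[k, t] = s @ np.linalg.solve(P, s)      # value of (11)
            Q = J.T @ C_eps_inv @ J + C_f_inv
            g = J.T @ C_eps_inv @ r_k - C_f_inv @ (x_k - x_p)
            x_k = x_k + np.linalg.solve(Q, g)              # Gauss-Newton step (7)
    return samples   # row k samples J~_k; each row should have mean near m
```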
Figure 2: The approximate distributions showing the sample distribution of $\tilde{\mathcal{J}}_k(x)$ at each Gauss-Newton iteration for the cross-well tomography problem. The mean of the sample is shown as the middle tick.

Similarly, we numerically tested Theorem 4 using Matlab codes adapted from those provided by [1] for the electrical conductivity problem. For this numerical experiment an implementation of the Levenberg-Marquardt algorithm was used to estimate the model parameters by minimizing (3). This inversion was repeated for 1000 different realizations of $\epsilon$ and $f$, which were sampled from $N(0, (0.1)^2 I)$ and $N(0, (100)^2 I)$, respectively. For perspective, the values of $d$ are $O(100)$ and the values of $x$ are $O(100)$. Almost all of these trials converged in 6 iterations or fewer, with the majority converging in 5 iterations. As before, the assumption was made that $C_\epsilon \approx C_{\epsilon_k}$. These approximate distributions for $\tilde{\mathcal{J}}_k^{LM}(x)$ are shown in Figure 3. There were 18 observations for this problem and 11 parameters, so $E(\tilde{\mathcal{J}}_k^{LM}) = 18$. This experiment was repeated using an approximate second-order differential operator of rank 9 to


regularize the inversion instead of the inverse prior covariance of the parameters alone. In this experiment $f$ was sampled from $N(0, (10)^2 I)$ since the values of $Lx$ are $O(10)$. All of these LM inversions converged in 5 or fewer iterations, with the majority converging in 4 iterations. These results are also shown in Figure 3.
Figure 3: The approximate distributions showing the sample distribution of $\tilde{\mathcal{J}}_k^{LM}(x)$ at each Levenberg-Marquardt iteration for the electrical conductivities problem, with $C_f = \sigma_f^2 I$ (left) and $C_f^{-1} = \sigma_f^2 L^T L$ (right). The sample mean is shown as the middle tick.

3.3. Solving the inverse problem and comparison of results

To apply the previous results to solving nonlinear inverse problems we propose the following algorithm:

Algorithm 1: Nonlinear $\chi^2$ method
  Start with initial point $x_0$.
  for $k = 0, 1, 2, \dots$ do
    Calculate $J(x_k)$ and $\tilde{d}_k$.
    Choose $\lambda$ such that the linearized cost function (8) satisfies $\tilde{\mathcal{J}}_\lambda(x_{k+1}) \approx E(\tilde{\mathcal{J}}_{k+1})$, to within $2\sqrt{\mathrm{Var}(\tilde{\mathcal{J}}_{k+1})}$, where $x_{k+1}$ is found using the Gauss-Newton step (7).
    if converged then
      $\hat{x} = x_{k+1}$; return
    end if
  end for
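A sketch of Algorithm 1 under the scalar assumption $C_f^{-1} = \lambda^2 I$ is given below; it drives the linearized cost (8) toward its expected value $m$ with a secant iteration on $\lambda$ (any standard root-finding method may be substituted, as noted below). The forward model F and Jacobian Jac are assumed supplied; names and tolerances are illustrative.

```python
import numpy as np

def linearized_cost(lam, J, r_k, dx_k, C_eps):
    # Minimum (11) of the linearized cost (8) with C_f^{-1} = lam^2 I,
    # i.e. C_f = I / lam^2:  s^T (J C_f J^T + C_eps)^{-1} s,  s = r_k + J dx_k.
    s = r_k + J @ dx_k
    P = (J @ J.T) / lam**2 + C_eps
    return s @ np.linalg.solve(P, s)

def nonlinear_chi2_method(F, Jac, d, x_p, C_eps, lam0=1.0, max_iter=50, tol=1e-8):
    # Sketch of Algorithm 1 with Gauss-Newton steps (7) and a secant
    # iteration choosing lam so the linearized cost is near m = E[chi^2_m].
    m = len(d)
    x_k, lam = x_p.copy(), lam0
    C_eps_inv = np.linalg.inv(C_eps)
    for _ in range(max_iter):
        J, r_k, dx_k = Jac(x_k), d - F(x_k), x_k - x_p
        # Secant iteration on lam for:  linearized_cost(lam) - m = 0
        l0, l1 = lam, 1.5 * lam
        f0 = linearized_cost(l0, J, r_k, dx_k, C_eps) - m
        f1 = linearized_cost(l1, J, r_k, dx_k, C_eps) - m
        for _ in range(20):
            if abs(f1) < 1e-6 * m or f1 == f0:
                break
            l0, l1 = l1, l1 - f1 * (l1 - l0) / (f1 - f0)
            if l1 <= 0:
                l1 = 0.5 * l0          # keep the weight positive
            f0, f1 = f1, linearized_cost(l1, J, r_k, dx_k, C_eps) - m
        lam = l1
        # Gauss-Newton step (7) with C_f^{-1} = lam^2 I
        Q = J.T @ C_eps_inv @ J + lam**2 * np.eye(len(x_k))
        g = J.T @ C_eps_inv @ r_k - lam**2 * dx_k
        x_next = x_k + np.linalg.solve(Q, g)
        if np.linalg.norm(x_next - x_k) <= tol * (1 + np.linalg.norm(x_k)):
            return x_next, lam
        x_k = x_next
    return x_k, lam
```

In practice the statistic could also simply be required to lie within the $\pm 2\sqrt{\mathrm{Var}}$ interval stated in Algorithm 1 rather than driven close to $m$; both readings are consistent with $E(\tilde{\mathcal{J}}) = m$ and $\mathrm{Var}(\tilde{\mathcal{J}}) = 2m$ for a $\chi^2_m$ variable.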

Figure 4: Solutions found for the tomography problem using $C_f^{-1} = \lambda^2 L^T L$. (Left) The solution corresponding to the discrepancy principle. (Right) The solution corresponding to the nonlinear $\chi^2$ method.

In the application of this algorithm, $\lambda$ can be found using any standard root-finding algorithm; we used the secant method, which converged in 4 or 5 steps. For more difficult problems the Gauss-Newton step and linearized cost function can be replaced with the Levenberg-Marquardt step and its linearized cost function. To test this algorithm, we compared it against the discrepancy principle and Occam's inversion on the inverse problems previously discussed, in the next two sections.

3.3.1. Cross-well tomography

In [1] they solve the non-linear tomography problem by minimizing
$$\mathcal{J}_\lambda = (d - F(\hat{x}))^T C_\epsilon^{-1} (d - F(\hat{x})) + \lambda^2 (\hat{x} - x_p)^T L^T L (\hat{x} - x_p)$$
for a range of values of $\lambda$, where $L$ represents the two-dimensional Laplacian operator, after which they picked the $\lambda$ for which $\mathcal{J}_{data}(\hat{x}) = (d - F(\hat{x}))^T C_\epsilon^{-1} (d - F(\hat{x}))$ best satisfied the discrepancy principle. The discrepancy principle says that $\mathcal{J}_{data}(\hat{x})$ should be approximately $m$. For comparison, we also used this same methodology to solve the inverse problem using $\mathcal{J}_\lambda = (d - F(\hat{x}))^T C_\epsilon^{-1} (d - F(\hat{x})) + \lambda^2 (\hat{x} - x_p)^T (\hat{x} - x_p)$, such that $\mathcal{J}_{data}(\hat{x})$ satisfied the discrepancy principle. We then compared these solutions to the solutions found using the nonlinear $\chi^2$ method. These solutions are shown in Figures 4 and 5.


Figure 5: Solutions found for the tomography problem using $C_f^{-1} = \lambda^2 I$. (Left) The solution corresponding to the discrepancy principle. (Right) The solution corresponding to the nonlinear $\chi^2$ method.

In order to establish a good comparison, the above procedure for both the discrepancy principle and the $\chi^2$ method was repeated for 200 different realizations of $\epsilon$. The mean $\mu$ and standard deviation $\sigma$ of the relative error $\|\hat{x} - x_{true}\| / \|x_{true}\|$ over the 200 trials for each method are given in Table 1. The $\chi^2$ method gave better results on average when $C_f^{-1} = \lambda^2 I$ was used as the regularizing operator, but the discrepancy principle did better on average when $C_f^{-1} = \lambda^2 L^T L$ was used.

Table 1: Comparison of the discrepancy principle to the $\chi^2$ method for the cross-well tomography problem.

Method                 | $C_f^{-1} = \lambda^2 I$            | $C_f^{-1} = \lambda^2 L^T L$
$\chi^2$ method        | $\mu = 0.01672$, $\sigma = 0.00050$ | $\mu = 0.0206$, $\sigma = 0.00456$
Discrepancy principle  | $\mu = 0.1628$, $\sigma = 0.0006$   | $\mu = 0.018$, $\sigma = 0.0021$
3.4. Subsurface conductivity estimation

The subsurface electrical conductivity inverse problem is in some ways more difficult than the previous problem. The Gauss-Newton step does not always lead to a reduction in the nonlinear cost function, and it is not always possible to find a regularization parameter for which the solution satisfies the discrepancy principle. In [1] they used an implementation of the Levenberg-Marquardt algorithm to minimize the data misfit, but this results in a solution that is under-regularized and highly non-physical. This solution is plotted in Figure 6. They then solved the problem using


Occam's inversion with an approximate second-order differential operator as the regularizing operator, which gave a reasonable solution; this solution is also shown in Figure 6. Using this as a basis for comparison, we used the nonlinear $\chi^2$ method with the Levenberg-Marquardt step to find the solution using both $C_f^{-1} = \lambda^2 L^T L$ and $C_f^{-1} = \lambda^2 I$ as regularizing operators. These solutions are plotted in Figure 7.
Figure 6: (Left) The solution found using a Levenberg-Marquardt algorithm. (Right) The solution found with Occam's inversion.

Figure 7: (Left) The parameters found using the $\chi^2$ method and $C_f^{-1} = \lambda^2 L^T L$. (Right) The parameters found using the $\chi^2$ method and $C_f^{-1} = \lambda^2 I$.

Once again, to establish a good comparison, each of these methods was run for 200 different realizations of $\epsilon$. The mean and standard deviation of $\|\hat{x} - x_{true}\| / \|x_{true}\|$ over


the 200 trials for each method are given in Table 2. These results show that the $\chi^2$ method consistently gave much better answers than Occam's inversion for this problem. Occam's inversion was not run with $\lambda^2 I$ as the regularizing operator because it was unable to find good solutions.

Table 2: Comparison of the $\chi^2$ method to Occam's inversion for the estimation of subsurface conductivities.

Method              | $C_f^{-1} = \lambda^2 I$           | $C_f^{-1} = \lambda^2 L^T L$
$\chi^2$ method     | $\mu = 0.1827$, $\sigma = 0.0295$  | $\mu = 0.0308$, $\sigma = 0.0281$
Occam's inversion   | —                                  | $\mu = 0.4376$, $\sigma = 0.6615$


4. Summary and conclusions

Weighted least squares with Tikhonov regularization is a popular approach to solving ill-posed inverse problems. For nonlinear problems, the resulting parameter estimates minimize a nonlinear functional, and Gauss-Newton or Levenberg-Marquardt algorithms are commonly used to find the minimizer. These nonlinear optimization methods iteratively minimize a linearized form of the objective function, and we have shown for both methods that the minimum value of the functional at each iterate follows a $\chi^2$ distribution. The minimum values of the Gauss-Newton and Levenberg-Marquardt functionals differ, but both have degrees of freedom equal to the number of data. We illustrate this $\chi^2$ behavior for the Gauss-Newton method on a two-dimensional cross-well tomography problem, and for the Levenberg-Marquardt method on a one-dimensional electromagnetic problem. Since the minimum value of the functional follows a $\chi^2$ distribution at each iterate, this gives $\chi^2$ tests from which a regularization parameter can be found for nonlinear problems. We give the resulting algorithms and test them by estimating parameters for the cross-well tomography and electromagnetic problems. We show that this $\chi^2$ method gives better results than the discrepancy principle, Occam's method, or the Levenberg-Marquardt method with no regularization.

5. References
[1] Aster, R.C., Borchers, B. and Thurber, C., 2005, Parameter Estimation and Inverse Problems, Academic Press, p 301.
[2] Constable, S.C., Parker, R.L. and Constable, C.G., 1987, Occam's inversion: A practical algorithm for generating smooth models from electromagnetic sounding data, Geophysics, 52(3), 289-300.
[3] Engl, H.W., Kunisch, K. and Neubauer, A., 1989, Convergence rates for Tikhonov regularisation of non-linear ill-posed problems, Inverse Problems, 5, 523.
[4] Gouveia, W.P. and Scales, J.A., 1997, Resolution of seismic waveform inversion: Bayes versus Occam, Inverse Problems, 13, 323-349.
[5] Hansen, P.C., 1998, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM, Philadelphia.
[6] Heath, M.T., 2002, Scientific Computing: An Introductory Survey, 2nd edition, McGraw-Hill, New York.
[7] Hendrickx, J.M.H., Borchers, B., Corwin, D.L., Lesch, S.M., et al., 2002, Inversion of soil conductivity profiles from electromagnetic induction measurements: Theory and experimental verification, Soil Science Society of America Journal, 66(3), 673-685.
[8] Mead, J., 2008, Parameter estimation: A new approach to weighting a priori information, J. Inv. Ill-posed Problems, 16(2), 175-194.
[9] Mead, J. and Renaut, R.A., 2009, A Newton root-finding algorithm for estimating the regularization parameter for solving ill-conditioned least squares problems, Inverse Problems, 25(2), 025002.
[10] Mead, J., 2011, Discontinuous parameter estimates with least squares estimators, Applied Mathematics and Computation, in revision.
[11] Rieder, A., 1999, On the regularization of nonlinear ill-posed problems via inexact Newton iterations, Inverse Problems, 15, 309.
[12] Renaut, R.A., Hnetynkova, I. and Mead, J.L., 2010, Regularization parameter estimation for large scale Tikhonov regularization using a priori information, Computational Statistics and Data Analysis, 54, 3430-3445.
[13] Seidman, T.I. and Vogel, C.R., 1989, Well posedness and convergence of some regularisation methods for non-linear ill posed problems, Inverse Problems, 5, 227.
[14] Tarantola, A., 2005, Inverse Problem Theory and Methods for Model Parameter Estimation, SIAM.
[15] Vogel, C.R., 2002, Computational Methods for Inverse Problems, SIAM Frontiers in Applied Mathematics.
