Convergence Properties of a Self-adaptive Levenberg-Marquardt Algorithm Under Local Error Bound Condition∗

© 2005 Springer Science + Business Media, Inc. Manufactured in The Netherlands.
DOI: 10.1007/s10589-005-3074-z

JINYAN FAN
Department of Mathematics, Shanghai Jiao Tong University, Shanghai, P.R. China

JIANYU PAN
Department of Mathematics, East China Normal University, Shanghai, 200062, P.R. China

Received December 17, 2003; Revised November 17, 2004; Accepted May 24, 2005
Abstract. We propose a new self-adaptive Levenberg-Marquardt algorithm for the system of nonlinear equations F(x) = 0. The Levenberg-Marquardt parameter is chosen as the product of ‖F_k‖^δ, with δ being a positive constant, and some function of the ratio between the actual reduction and the predicted reduction of the merit function. Under the local error bound condition, which is weaker than nonsingularity, we show that the Levenberg-Marquardt method converges superlinearly to the solution for δ ∈ (0, 1), and quadratically for δ ∈ [1, 2]. Numerical results show that the new algorithm performs very well on nonlinear equations with high rank deficiency.
1. Introduction
In this paper, we consider the problem of solving the system of nonlinear equations

F(x) = 0, (1)

where F(x) : R^n → R^m is a continuously differentiable mapping. The classical Levenberg-Marquardt method computes the trial step

d_k = −(J_k^T J_k + μ_k I)^{−1} J_k^T F_k, (2)

at every iteration, where F_k = F(x_k), J_k = J(x_k) is the Jacobian of F at x_k, and μ_k > 0 is the Levenberg-Marquardt parameter, updated from iteration to iteration.
∗ This work is supported by Chinese NSFC grants 10401023 and 10501013, Research Grants for Young Teachers of Shanghai Jiao Tong University, and E-Institute of Shanghai Municipal Education Commission, No. E03004.
It is easy to see that d_k is the minimizer of

min_{d ∈ R^n} ‖F_k + J_k d‖^2 + μ_k ‖d‖^2 = ψ_k(d). (4)

Let

Δ_k = ‖(J_k^T J_k + μ_k I)^{−1} J_k^T F_k‖. (5)

Then d_k is also the unique solution of the trust region subproblem

min_{d ∈ R^n} ‖F_k + J_k d‖^2  s.t. ‖d‖ ≤ Δ_k. (6)
Hence, we can regard the Levenberg-Marquardt method as a trust region method. The main difference between the two methods is that the trust region method updates the trust region radius Δ_k directly, while the Levenberg-Marquardt method updates the parameter μ_k, which in turn modifies Δ_k implicitly. For more details on the relations between these two methods, please see [7, 8, 16, 17].
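To make this correspondence concrete, the following Python sketch (ours, purely illustrative; the residual F and Jacobian J below are hypothetical stand-ins, not test functions from this paper) computes the step (2) and reports the implicit trust region radius Δ_k = ‖d_k‖ from (5):

```python
import numpy as np

def lm_step(J, F, mu):
    """Levenberg-Marquardt trial step d = -(J^T J + mu I)^{-1} J^T F.

    Solving the regularized normal equations also yields the implicit
    trust region radius Delta = ||d|| of subproblem (6).
    """
    n = J.shape[1]
    A = J.T @ J + mu * np.eye(n)    # regularized Gauss-Newton matrix
    d = np.linalg.solve(A, -J.T @ F)
    return d, np.linalg.norm(d)     # step and implicit radius Delta

# Tiny illustration with a made-up residual F(x) = (x1^2 - 1, x1*x2).
x = np.array([2.0, 1.0])
F = np.array([x[0]**2 - 1.0, x[0] * x[1]])
J = np.array([[2 * x[0], 0.0], [x[1], x[0]]])
d, Delta = lm_step(J, F, mu=np.linalg.norm(F))   # mu_k = ||F_k|| as in [2]
print(d, Delta)
```

Enlarging mu shrinks Delta and vice versa, which is exactly the implicit radius update described above.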
It is well known that the Levenberg-Marquardt method has quadratic convergence if the Jacobian matrix is nonsingular at the solution x∗ of (1) and if the initial point is chosen sufficiently close to x∗. However, the nonsingularity assumption is too strong. Recently, Yamashita and Fukushima [14] showed that if the Levenberg-Marquardt parameter is chosen as μ_k = ‖F_k‖^2 in (2), and if the initial point is sufficiently close to the solution set X∗ of (1), then under the local error bound condition, which is weaker than nonsingularity, the Levenberg-Marquardt method converges quadratically to the solution set X∗. However, when the sequence is sufficiently close to the solution set, the parameter μ_k = ‖F_k‖^2 may be smaller than the machine epsilon, so it loses its effect. On the other hand, when the sequence is far away from the solution set, μ_k = ‖F_k‖^2 may be very large, making the step d_k too small and preventing the sequence from converging quickly. Based on these observations, Fan and Yuan [2] choose the parameter as μ_k = ‖F_k‖ in (2), and obtain that the Levenberg-Marquardt method converges quadratically to the solution of (1) under the local error bound condition.
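The scaling issue is easy to see numerically; the magnitudes below are illustrative values of ours, not data from the experiments in section 4:

```python
import numpy as np

eps = np.finfo(float).eps            # machine epsilon, about 2.2e-16
Fnorm_near = 1e-9                    # ||F_k|| close to the solution set
Fnorm_far = 1e6                      # ||F_k|| far from the solution set

print(Fnorm_near**2 < eps)           # True: mu_k = ||F_k||^2 loses its effect
print(Fnorm_near, Fnorm_near**2)     # 1e-09 vs 1e-18
print(Fnorm_far, Fnorm_far**2)       # 1e+06 vs 1e+12: the step is over-damped
```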
More recently, H. Dan, N. Yamashita and M. Fukushima [1] propose an inexact Levenberg-Marquardt method for the nonlinear equations (1) with the parameter chosen as

μ_k = ‖F_k‖^δ, (7)
where δ is a positive constant. They show that this inexact Levenberg-Marquardt method converges to the solution superlinearly when δ ∈ (0, 2), and quadratically when δ = 2.
To obtain global convergence, we use the trust region technique and present a modified Levenberg-Marquardt algorithm in [3]. We define the actual reduction and the predicted reduction of the merit function as follows:

Ared_k = ‖F_k‖^2 − ‖F(x_k + d_k)‖^2 (8)

and

Pred_k = ‖F_k‖^2 − ‖F_k + J_k d_k‖^2. (9)

The ratio

r_k = Ared_k / Pred_k (10)

plays a key role in deciding whether to accept the trial step and how to adjust the parameter. In [3], we choose the Levenberg-Marquardt parameter as

μ_k = α_k ‖F_k‖, (11)
where α_k is computed by

α_{k+1} =
  4α_k,            if r_k < p_1,
  α_k,             if r_k ∈ [p_1, p_2],   (12)
  max{α_k/4, m},   if r_k > p_2,

where 0 < p_1 ≤ p_2 < 1 and m is a small positive constant. In this paper, we choose the Levenberg-Marquardt parameter as

μ_k = α_k ‖F_k‖^δ with α_{k+1} = max{q(r_k) α_k, m}, (13)

where q(r) is a continuous nonnegative function of r and δ ∈ (0, 2]. The main purpose of introducing q(r) is to obtain better numerical results. The factor α_k is updated at a variable rate according to the ratio r_k, rather than by simply enlarging or reducing it at a constant rate as in (12). So we call it a self-adaptive Levenberg-Marquardt
method. This technique is similar to that proposed by Hei [4] for updating the trust region radius in unconstrained optimization problems.
In the next section, we present the new self-adaptive Levenberg-Marquardt algorithm and establish its global convergence: the sequence generated by the new algorithm converges to a stationary point of the merit function. In section 3, we show that under the local error bound condition the new algorithm converges superlinearly to the solution of (1) for δ ∈ (0, 1), and quadratically for δ ∈ [1, 2]. Finally, in section 4, numerical results are given for some singular systems of nonlinear equations.
2. The self-adaptive Levenberg-Marquardt algorithm

In this section, we first construct the function q(r) and discuss its properties, and then present the new self-adaptive Levenberg-Marquardt algorithm and prove its global convergence.
Usually, if r_k is satisfactory, then we reduce α_k; otherwise we enlarge α_k. Hence, similarly to the updating rule (12), the continuous nonnegative function q(r) in (13) should satisfy

q(r̃) = 1;
q(r) ≥ 1, ∀ r < r̃;   (14)
0 < q(r) ≤ 1, ∀ r > r̃

for some given 0 < r̃ < 1. For example, we can construct q(r) as

q(r) = max{1/4, 1 − 2(2r − 1)^3}. (15)
Its graph is illustrated in Figure 1. For comparison, we also indicate the discontinuous updating rule of α_k given by (12). Of course, there are other choices of q(r),
such as

q(r) = max{1/4, 1 − (r − 1/2)^{1/3}}. (16)
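As a small sanity check, here is a Python sketch of our own for the two candidate functions (15) and (16), together with a test of the properties (14) at r̃ = 1/2; taking the real cube root via np.cbrt is an implementation detail, not something prescribed by the paper:

```python
import numpy as np

def q15(r):
    """q(r) = max{1/4, 1 - 2(2r - 1)^3}, the choice (15)."""
    return max(0.25, 1.0 - 2.0 * (2.0 * r - 1.0) ** 3)

def q16(r):
    """q(r) = max{1/4, 1 - (r - 1/2)^{1/3}}, the choice (16).

    np.cbrt returns the real cube root, so negative arguments are fine.
    """
    return max(0.25, 1.0 - float(np.cbrt(r - 0.5)))

# Properties (14) with r_tilde = 1/2: q(r~) = 1, q >= 1 to the left of r~,
# and 0 < q <= 1 to the right of r~.
for q in (q15, q16):
    assert abs(q(0.5) - 1.0) < 1e-12
    assert all(q(r) >= 1.0 for r in np.linspace(0.0, 0.49, 50))
    assert all(0.0 < q(r) <= 1.0 for r in np.linspace(0.51, 1.5, 50))
```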
Step 2. Solve

(J_k^T J_k + α_k ‖F_k‖^δ I) d = −J_k^T F_k, (17)

giving d_k.
Step 3. Compute r_k = Ared_k / Pred_k; set

x_{k+1} =
  x_k + d_k,   if r_k > p_0,   (18)
  x_k,         otherwise,

and

α_{k+1} = max{q(r_k) α_k, m}. (19)
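Putting the visible pieces together, the following is a minimal Python sketch of the self-adaptive iteration, assuming the choices stated above (μ_k = α_k ‖F_k‖^δ from (13), the acceptance test r_k > p_0 in (18), the update (19), and q from (15)); the stopping tolerance and the default constants are illustrative choices of ours, not values prescribed by the paper:

```python
import numpy as np

def self_adaptive_lm(F, J, x, delta=1.0, alpha=1.0, m=1e-8,
                     p0=1e-4, tol=1e-10, max_iter=1000):
    """Sketch of the self-adaptive Levenberg-Marquardt iteration.

    F, J: callables returning the residual vector and the Jacobian matrix.
    The LM parameter is mu_k = alpha_k * ||F_k||^delta, and alpha_k is
    rescaled by the continuous factor q(r_k) from (15).
    """
    q = lambda r: max(0.25, 1.0 - 2.0 * (2.0 * r - 1.0) ** 3)
    for _ in range(max_iter):
        Fk, Jk = F(x), J(x)
        g = Jk.T @ Fk
        if np.linalg.norm(g) <= tol:           # stationary point of ||F||^2
            break
        mu = alpha * np.linalg.norm(Fk) ** delta
        d = np.linalg.solve(Jk.T @ Jk + mu * np.eye(len(x)), -g)  # (17)
        ared = np.linalg.norm(Fk)**2 - np.linalg.norm(F(x + d))**2
        pred = np.linalg.norm(Fk)**2 - np.linalg.norm(Fk + Jk @ d)**2
        r = ared / pred if pred > 0 else 0.0   # ratio (10); guard pred = 0
        if r > p0:                             # accept the trial step, (18)
            x = x + d
        alpha = max(q(r) * alpha, m)           # continuous update, (19)
    return x
```

Note that a failed step (r ≤ p0) gives q(r) > 1, so the damping α grows and the implicit radius shrinks, while a successful step lets α decay smoothly toward its floor m.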
Now, from the famous result of Powell [11], we have the following lemma.

Lemma 2.1. Let d_k be computed by (17); then the inequality

Pred_k = ϕ_k(0) − ϕ_k(d_k) ≥ ‖J_k^T F_k‖ min{ ‖d_k‖, ‖J_k^T F_k‖ / ‖J_k^T J_k‖ } (22)

holds, where ϕ_k(d) = ‖F_k + J_k d‖^2.
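A quick randomized check of Powell's bound (22) in Python — a sketch of ours with arbitrary test data, not an experiment from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(100):
    J = rng.standard_normal((5, 3))
    F = rng.standard_normal(5)
    mu = rng.uniform(0.1, 10.0)              # any positive LM parameter
    g = J.T @ F
    d = np.linalg.solve(J.T @ J + mu * np.eye(3), -g)
    pred = np.linalg.norm(F)**2 - np.linalg.norm(F + J @ d)**2
    bound = np.linalg.norm(g) * min(np.linalg.norm(d),
                                    np.linalg.norm(g) / np.linalg.norm(J.T @ J, 2))
    assert pred >= bound - 1e-10             # inequality (22)
```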
Theorem 2.1. Under the conditions of Assumption 2.1, the sequence {x_k} generated by Algorithm 2.1 satisfies

lim_{k→∞} ‖J_k^T F_k‖ = 0. (23)

Proof: If the theorem is not true, then there exist a positive constant τ and infinitely many k such that

‖J_k^T F_k‖ ≥ τ. (24)
and

T = {k | x_{k+1} ≠ x_k, k ∈ S},

which gives

Σ_{k∈T} ‖d_k‖ < +∞. (26)

The above inequality, together with (20) and (21), implies that

Σ_{k∈T} ‖J_k^T F_k − J_{k+1}^T F_{k+1}‖ < +∞. (27)
Relation (27) and the fact that (24) holds for infinitely many k indicate that there exists a k̂ with ‖J_{k̂}^T F_{k̂}‖ ≥ τ such that

Σ_{k∈T, k≥k̂} ‖J_k^T F_k − J_{k+1}^T F_{k+1}‖ < τ/2. (28)
By induction, we obtain that ‖J_k^T F_k‖ ≥ τ/2 holds for all k ≥ k̂. This result and (26) mean that

lim_{k→∞} ‖d_k‖ = 0. (29)
So it follows from (17) that α_k → +∞. On the other hand, it follows from (24), (26) and Lemma 2.1 that

|r_k − 1| = | Ared_k / Pred_k − 1 |
          = ( ‖F_k + J_k d_k‖ O(‖d_k‖^2) + O(‖d_k‖^4) ) / Pred_k
          ≤ ( ‖F_k + J_k d_k‖ O(‖d_k‖^2) + O(‖d_k‖^4) ) / ( ‖J_k^T F_k‖ min{ ‖d_k‖, ‖J_k^T F_k‖ / ‖J_k^T J_k‖ } )
          ≤ O(‖d_k‖^2) / ‖d_k‖ → 0, (30)
that is, r_k → 1. Hence, there exists a positive constant M > m such that

α_k < M

holds for all large k, which gives a contradiction. Therefore, the assumption (24) cannot be true. The proof is completed.
3. Local convergence

Assumption 3.1. (a) F(x) is continuously differentiable, and the Jacobian J(x) is Lipschitz continuous on some neighbourhood of x∗ ∈ X∗, i.e., there exist two positive constants L_1 and b_1 < 1 such that

‖J(y) − J(x)‖ ≤ L_1 ‖y − x‖, ∀ x, y ∈ N(x∗, b_1) = {x | ‖x − x∗‖ ≤ b_1}. (31)

(b) ‖F(x)‖ provides a local error bound on N(x∗, b_1) for the system (1), i.e., there exists a positive constant c_1 > 0 such that

c_1 dist(x, X∗) ≤ ‖F(x)‖, ∀ x ∈ N(x∗, b_1). (32)

It is known that if J(x∗) is nonsingular, then ‖F(x)‖ provides a local error bound near x∗, while the converse is not necessarily true; for more details, please see [14]. Hence, a local error bound condition is weaker than nonsingularity.
By Assumption 3.1(a), we have

‖F(y) − F(x) − J(x)(y − x)‖ ≤ L_1 ‖y − x‖^2, ∀ x, y ∈ N(x∗, b_1). (33)

Let

t = min{ b_1/2, c_1/(2L_1) }. (36)
To obtain the local convergence rate of {x_k}, we first give some lemmas. In the following, x̄_k denotes a point in X∗ such that ‖x̄_k − x_k‖ = dist(x_k, X∗).

Lemma 3.1. Under the conditions of Assumption 3.1, if x_k ∈ N(x∗, t), then there exists a constant c_2 > 0 such that

‖d_k‖ ≤ c_2 ‖x_k − x̄_k‖. (37)

Proof: Since d_k is the minimizer of ψ_k(d), we have

‖d_k‖^2 ≤ (1 / (α_k ‖F_k‖^δ)) ψ_k(d_k) ≤ (1 / (α_k ‖F_k‖^δ)) ψ_k(x̄_k − x_k)
        = (1 / (α_k ‖F_k‖^δ)) ( ‖F_k + J_k (x̄_k − x_k)‖^2 + α_k ‖F_k‖^δ ‖x̄_k − x_k‖^2 )
        ≤ (L_1^2 / (m c_1^δ)) ‖x̄_k − x_k‖^{4−δ} + ‖x̄_k − x_k‖^2
        ≤ ( L_1^2 / (m c_1^δ) + 1 ) ‖x̄_k − x_k‖^2. (40)

Taking square roots gives (37) with c_2 = ( L_1^2 / (m c_1^δ) + 1 )^{1/2}.
Lemma 3.2. Under the conditions of Assumption 3.1, if x_k ∈ N(x∗, t), then there exists a positive constant c_3 such that the predicted reduction satisfies

Proof: We consider two cases. If ‖x̄_k − x_k‖ ≤ ‖d_k‖, then x̄_k − x_k is a feasible point of problem (6), so it follows from (32), (33), (42) and Lemma 3.1 that

Inequalities (43) and (44), together with Lemma 3.1, show that

so we have r_k → 1, which means that q(r_k) < 1 for all sufficiently large k; thus there exists a constant M > m such that

α_k < M (47)
Lemma 3.3. Under the conditions of Assumption 3.1, if x_1 is chosen sufficiently close to X∗, then the sequence generated by Algorithm 2.1 converges to some solution x̄ of (1) superlinearly for δ ∈ (0, 2), and quadratically for δ = 2.

Proof: Since d_k is the minimizer of ψ_k(d), it follows from (33), (47) and ‖F_k‖ = ‖F_k − F(x̄_k)‖ ≤ L_2 ‖x̄_k − x_k‖, where L_2 is the Lipschitz constant of F on N(x∗, b_1), that

ψ_k(d_k) ≤ ψ_k(x̄_k − x_k)
        = ‖F_k + J_k (x̄_k − x_k)‖^2 + α_k ‖F_k‖^δ ‖x̄_k − x_k‖^2
        ≤ L_1^2 ‖x̄_k − x_k‖^4 + M L_2^δ ‖x̄_k − x_k‖^{2+δ}
        ≤ ( L_1^2 + M L_2^δ ) ‖x̄_k − x_k‖^{2+δ}.

So we have

‖F_k + J_k d_k‖ ≤ ( L_1^2 + M L_2^δ )^{1/2} ‖x̄_k − x_k‖^{(2+δ)/2}.

Hence, by (32), (33) and Lemma 3.1,

dist(x_k + d_k, X∗) ≤ (1/c_1) ‖F(x_k + d_k)‖ ≤ c_4 dist(x_k, X∗)^{(2+δ)/2}, (48)

where c_4 = ( (L_1^2 + M L_2^δ)^{1/2} + L_1 c_2^2 ) / c_1.
Now, using the same analysis as in [2, 14], we can show that if the initial point x_1 is chosen sufficiently close to the solution set X∗, then all x_k lie in N(x∗, t). So it follows from (48) that

Σ_{k=0}^{∞} dist(x_k, X∗) < +∞,

for all large k. Thus, from (37), (48) and (49), we obtain that

‖d_{k+1}‖ = O( ‖d_k‖^{(2+δ)/2} ).
We now use the singular value decomposition of the Jacobian matrix to give a more accurate convergence rate of Algorithm 2.1 according to the exponent δ. Without loss of generality, we suppose {x_k} → x∗, rank(J(x∗)) = r, and that the singular value decomposition (SVD) of J(x∗) is

J(x∗) = U∗ Σ∗ V∗^T = (U_1∗, U_2∗) diag(Σ_1∗, 0) (V_1∗, V_2∗)^T = U_1∗ Σ_1∗ V_1∗^T, (50)

where Σ_1∗ = diag(σ_1∗, …, σ_r∗) with σ_1∗ ≥ ··· ≥ σ_r∗ > 0. Suppose the SVD of J_k
is

J_k = U_k Σ_k V_k^T = (U_{k,1}, U_{k,2}, U_{k,3}) diag(Σ_{k,1}, Σ_{k,2}, 0) (V_{k,1}, V_{k,2}, V_{k,3})^T,

where Σ_{k,1} = diag(σ_{k,1}, …, σ_{k,r}) and Σ_{k,2} = diag(σ_{k,r+1}, …, σ_{k,r+q}) with σ_{k,1} ≥ ··· ≥ σ_{k,r} ≥ σ_{k,r+1} ≥ ··· ≥ σ_{k,r+q} > 0, q ≥ 0. In the following, if the context is clear, we write the above formula as

J_k = U_1 Σ_1 V_1^T + U_2 Σ_2 V_2^T.
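For intuition, the following Python sketch of ours performs this partition numerically, splitting the SVD of a given J_k at the index r into the blocks (Σ_1, Σ_2) used in the analysis; the sample matrix is an illustrative choice, not a test problem from the paper:

```python
import numpy as np

def split_svd(Jk, r):
    """Partition the SVD of Jk: the r 'large' singular values, which
    converge to those of J(x*), versus the remaining positive ones,
    which tend to zero when rank(J(x*)) = r."""
    U, s, Vt = np.linalg.svd(Jk)
    pos = s > 0
    S1, S2 = s[:r], s[r:][pos[r:]]           # Sigma_{k,1} and Sigma_{k,2}
    U1, U2 = U[:, :r], U[:, r:r + len(S2)]   # matching left singular vectors
    V1t, V2t = Vt[:r], Vt[r:r + len(S2)]
    return (U1, S1, V1t), (U2, S2, V2t)

# J~_k = U1 Sigma_1 V1^T is the rank-r part used in (55).
Jk = np.array([[1.0, 0.0], [0.0, 1e-8]])
(U1, S1, V1t), _ = split_svd(Jk, r=1)
J_tilde = U1 @ np.diag(S1) @ V1t
```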
Then we have the following lemma, which plays an important role in the later analysis.

Proof: Result (a) follows immediately from (21). By (20) and the theory of matrix perturbation [13], we have

Let s_k = −J_k^+ F_k, where J_k^+ is the pseudo-inverse of J_k. It is easy to see that s_k is the least squares solution of min_s ‖F_k + J_k s‖, so we obtain from (33) that

‖U_3 U_3^T F_k‖ = ‖F_k + J_k s_k‖ ≤ ‖F_k + J_k (x̄_k − x_k)‖ ≤ O(‖x_k − x̄_k‖^2). (54)
Let J̃_k = U_1 Σ_1 V_1^T and s̃_k = −J̃_k^+ F_k. Since s̃_k is the least squares solution of min_s ‖F_k + J̃_k s‖, it follows from (33) and (53) that

‖(U_2 U_2^T + U_3 U_3^T) F_k‖ = ‖F_k + J̃_k s̃_k‖
    ≤ ‖F_k + J̃_k (x̄_k − x_k)‖
    ≤ ‖F_k + J_k (x̄_k − x_k)‖ + ‖(J̃_k − J_k)(x̄_k − x_k)‖
    ≤ L_1 ‖x̄_k − x_k‖^2 + ‖U_2 Σ_2 V_2^T‖ ‖x̄_k − x_k‖
    ≤ O(‖x_k − x∗‖^2). (55)
Theorem 3.1. Under the conditions of Assumption 3.1, the sequence {x_k} generated by Algorithm 2.1 converges to the solution of (1) superlinearly for δ ∈ (0, 1), and quadratically for δ ∈ [1, 2].

Proof: It follows from (17) and the SVD of J_k that

d_k = −V_1 (Σ_1^2 + α_k ‖F_k‖^δ I)^{−1} Σ_1 U_1^T F_k − V_2 (Σ_2^2 + α_k ‖F_k‖^δ I)^{−1} Σ_2 U_2^T F_k (56)

and

F_k + J_k d_k = F_k − U_1 Σ_1 (Σ_1^2 + α_k ‖F_k‖^δ I)^{−1} Σ_1 U_1^T F_k
              − U_2 Σ_2 (Σ_2^2 + α_k ‖F_k‖^δ I)^{−1} Σ_2 U_2^T F_k
            = α_k ‖F_k‖^δ U_1 (Σ_1^2 + α_k ‖F_k‖^δ I)^{−1} U_1^T F_k
              + α_k ‖F_k‖^δ U_2 (Σ_2^2 + α_k ‖F_k‖^δ I)^{−1} U_2^T F_k + U_3 U_3^T F_k. (57)
Since {x_k} converges to x∗, without loss of generality, we assume that L_1 ‖x_k − x∗‖ < σ_r∗/2 holds for all large k. Thus we obtain from (53) that

‖(Σ_1^2 + α_k ‖F_k‖^δ I)^{−1}‖ ≤ ‖Σ_1^{−2}‖ ≤ 1 / (σ_r∗ − L_1 ‖x_k − x∗‖)^2 < 4 / σ_r∗^2 (58)
holds for all sufficiently large k, which, together with Lemma 3.1, means that

That is, {x_k} converges to the solution of system (1) superlinearly for δ ∈ (0, 1), and quadratically for δ ∈ [1, 2]. The proof is completed.
Remark. Theorem 3.1 improves the results given in Lemma 3.3. From the analysis above, we can see that the sequence {x_k} converges linearly to the solution of (1) for δ = 0. For δ ∈ (2, 3), we have ‖x_{k+1} − x̄_{k+1}‖ ≤ o(‖x_k − x̄_k‖), which means that {x_k} converges superlinearly to the solution set of (1). However, we cannot obtain ‖x_{k+1} − x∗‖ ≤ o(‖x_k − x∗‖) for δ ∈ (2, 3); the problem is that we cannot show ‖d_k‖ ≤ O(‖x_k − x̄_k‖) for δ ∈ (2, 3).
4. Numerical results

In this section, we test Algorithm 2.1 on some singular systems of nonlinear equations and compare it with the modified Levenberg-Marquardt algorithm given in [3]. The singular problems are created by modifying the nonsingular test problems of Moré, Garbow and Hillstrom [9] in the way proposed by Schnabel and Frank [12]:

F̂(x) = F(x) − J(x∗) A (A^T A)^{−1} A^T (x − x∗), (65)

where F(x) is the standard nonsingular test function given by Moré, Garbow and Hillstrom [9], x∗ is its root, and A ∈ R^{n×k} has full column rank with 1 ≤ k ≤ n. Hence,

F̂(x∗) = 0, (66)

and

Ĵ(x∗) = J(x∗)( I − A (A^T A)^{−1} A^T ) (67)

has rank n − k. For

A ∈ R^{n×1}, A^T = (1, 1, · · · , 1), (68)

these two algorithms perform almost the same, so we omit the results here. However, for

A ∈ R^{n×2}, A^T = ( 1  1  1  1  · · ·  1
                     1 −1  1 −1  · · · ±1 ), (69)
the new self-adaptive Levenberg-Marquardt Algorithm 2.1 performs much better than the modified Levenberg-Marquardt algorithm. The results are listed in the following table. The third column of the table indicates that the starting point is x_1, 10x_1 or 100x_1, where x_1 is suggested by Moré, Garbow and Hillstrom in [9]; “NF” and “NJ” represent the numbers of function evaluations and Jacobian evaluations, respectively; and “n.s.x∗?” gives Y (yes) if the algorithm converges to the same solution as the corresponding nonsingular problem, and N (no) otherwise. If the algorithm fails to find the solution within 100(n+1) iterations, we denote it by the sign “–”; and if the iterations have underflows or overflows, we denote it by “OF”.
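For readers who wish to reproduce such problems, here is a Python sketch of ours implementing the construction (65); the Rosenbrock-type residual below is only a stand-in for the Moré-Garbow-Hillstrom test functions, chosen for illustration:

```python
import numpy as np

def make_singular_problem(F, J, x_star, A):
    """Schnabel-Frank modification (65):
    Fhat(x) = F(x) - J(x*) A (A^T A)^{-1} A^T (x - x*),
    whose Jacobian at x* has rank n - rank(A), cf. (67)."""
    P = A @ np.linalg.solve(A.T @ A, A.T)    # projector onto range(A)
    JP = J(x_star) @ P
    Fhat = lambda x: F(x) - JP @ (x - x_star)
    Jhat = lambda x: J(x) - JP
    return Fhat, Jhat

# Illustration with a stand-in residual whose root is x* = (1, 1).
F = lambda x: np.array([10.0 * (x[1] - x[0]**2), 1.0 - x[0]])
J = lambda x: np.array([[-20.0 * x[0], 10.0], [-1.0, 0.0]])
x_star = np.array([1.0, 1.0])
A = np.ones((2, 1))                          # rank deficiency one, as in (68)
Fhat, Jhat = make_singular_problem(F, J, x_star, A)
print(Fhat(x_star))                          # ~0, consistent with (66)
print(np.linalg.matrix_rank(Jhat(x_star)))   # n - 1 = 1
```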
From the table we can see that our new self-adaptive Levenberg-Marquardt Algorithm 2.1 performs more efficiently than the modified Levenberg-Marquardt algorithm for most of the problems. The continuous updating rule of α_k results in faster convergence than just reducing or enlarging α_k at a constant rate. Hence, the new updating strategy
of α_k with the continuous function q(r) satisfying (14) seems to be more efficient for equations with high rank deficiency of the Jacobian J(x∗).
Table: Results on singular nonlinear equations with rank(F′(x∗)) = n − 2.
Acknowledgments
The authors would like to thank the referees for their helpful comments and valuable suggestions, which have greatly improved the presentation.
References
9. J.J. Moré, B.S. Garbow, and K.H. Hillstrom, “Testing unconstrained optimization software,” ACM Trans.
Math. Software, vol. 7, pp. 17–41, 1981.
10. J. Nocedal and Y.X. Yuan, “Combining trust region and line search techniques,” in Advances in Nonlinear
Programming, Y. Yuan (Ed.), Kluwer, pp. 153–175, 1998.
11. M.J.D. Powell, “Convergence properties of a class of minimization algorithms,” in Nonlinear Program-
ming 2, O.L. Mangasarian, R.R. Meyer, and S.M. Robinson (Eds.), Academic Press: New York, 1975,
pp. 1–27.
12. R.B. Schnabel and P.D. Frank, “Tensor methods for nonlinear equations,” SIAM J. Numer. Anal., vol.
21, pp. 815–843, 1984.
13. G.W. Stewart and J.G. Sun, Matrix Perturbation Theory. Academic Press: San Diego, CA, 1990.
14. N. Yamashita and M. Fukushima, “On the rate of convergence of the Levenberg-Marquardt method,”
Computing (Suppl), vol. 15, pp. 239–249, 2001.
15. Y.X. Yuan, “Trust region algorithms for nonlinear programming,” in Contemporary Mathematics, Z.C. Shi (Ed.), vol. 163, American Mathematical Society, 1994, pp. 205–225.
16. Y.X. Yuan, “Trust region algorithms for nonlinear equations,” Information, vol. 1, pp. 7–20, 1998.
17. Y.X. Yuan, “A review of trust region algorithms for optimization,” in ICM99: Proceedings of the Fourth International Congress on Industrial and Applied Mathematics, J.M. Ball and J.C.R. Hunt (Eds.), Oxford University Press, 2000, pp. 271–282.