Convergence Properties of a Self-adaptive Levenberg-Marquardt Algorithm Under Local Error Bound Condition∗

© 2005 Springer Science + Business Media, Inc. Manufactured in The Netherlands.
DOI: 10.1007/s10589-005-3074-z

JINYAN FAN
Department of Mathematics, Shanghai Jiao Tong University, Shanghai, P.R. China

JIANYU PAN
Department of Mathematics, East China Normal University, Shanghai, 200062, P.R. China

Received December 17, 2003; Revised November 17, 2004; Accepted May 24, 2005
Abstract. We propose a new self-adaptive Levenberg-Marquardt algorithm for the system of nonlinear equations F(x) = 0. The Levenberg-Marquardt parameter is chosen as the product of ‖F_k‖^δ, with δ being a positive constant, and some function of the ratio between the actual reduction and the predicted reduction of the merit function. Under the local error bound condition, which is weaker than nonsingularity, we show that the Levenberg-Marquardt method converges superlinearly to the solution for δ ∈ (0, 1), and quadratically for δ ∈ [1, 2]. Numerical results show that the new algorithm performs very well on nonlinear equations with high rank deficiency.
1. Introduction
In this paper, we consider the problem of solving the system of nonlinear equations

F(x) = 0, (1)

where F(x) : R^n → R^m is a continuously differentiable mapping. The classical Levenberg-Marquardt method computes the trial step

d_k = −(J_k^T J_k + μ_k I)^{−1} J_k^T F_k, (2)

at every iteration, where F_k = F(x_k), J_k = J(x_k) is the Jacobian of F at x_k, and μ_k > 0 is the Levenberg-Marquardt parameter, updated from iteration to iteration.
∗ This work is supported by Chinese NSFC grants 10401023 and 10501013, Research Grants for Young Teachers of Shanghai Jiao Tong University, and E-Institute of Shanghai Municipal Education Commission, No. E03004.
It is easy to see that d_k is the minimizer of

min_{d ∈ R^n} ‖F_k + J_k d‖^2 + μ_k ‖d‖^2 = ψ_k(d). (4)

Let

Δ_k = ‖(J_k^T J_k + μ_k I)^{−1} J_k^T F_k‖. (5)

Then d_k is also the unique solution of the trust region subproblem

min_{d ∈ R^n} ‖F_k + J_k d‖^2  s.t. ‖d‖ ≤ Δ_k. (6)
Hence, we can regard the Levenberg-Marquardt method as a trust region method. The main difference between the two methods is that the trust region method updates the trust region radius Δ_k directly, while the Levenberg-Marquardt method updates the parameter μ_k, which in turn modifies Δ_k implicitly. For more details on the relations between these two methods, please see [7, 8, 16, 17].
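To make this correspondence concrete, the following Python sketch (ours, purely illustrative; the residual F and Jacobian J below are hypothetical stand-ins, not test functions from this paper) computes the step (2) and reports the implicit trust region radius Δ_k = ‖d_k‖ from (5):

```python
import numpy as np

def lm_step(J, F, mu):
    """Levenberg-Marquardt trial step d = -(J^T J + mu I)^{-1} J^T F.

    Solving the regularized normal equations also yields the implicit
    trust region radius Delta = ||d|| of subproblem (6).
    """
    n = J.shape[1]
    A = J.T @ J + mu * np.eye(n)    # regularized Gauss-Newton matrix
    d = np.linalg.solve(A, -J.T @ F)
    return d, np.linalg.norm(d)     # step and implicit radius Delta

# Tiny illustration with a made-up residual F(x) = (x1^2 - 1, x1*x2).
x = np.array([2.0, 1.0])
F = np.array([x[0]**2 - 1.0, x[0] * x[1]])
J = np.array([[2 * x[0], 0.0], [x[1], x[0]]])
d, Delta = lm_step(J, F, mu=np.linalg.norm(F))   # mu_k = ||F_k|| as in [2]
print(d, Delta)
```

Enlarging mu shrinks Delta and vice versa, which is exactly the implicit radius update described above.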
It is well known that the Levenberg-Marquardt method has quadratic convergence if the Jacobian matrix is nonsingular at the solution x∗ of (1) and if the initial point is chosen sufficiently close to x∗. However, the nonsingularity assumption is too strong. Recently, Yamashita and Fukushima [14] showed that if the Levenberg-Marquardt parameter is chosen as μ_k = ‖F_k‖^2 in (2), and if the initial point is sufficiently close to the solution set X∗ of (1), then under the local error bound condition, which is weaker than nonsingularity, the Levenberg-Marquardt method converges quadratically to the solution set X∗. However, when the sequence is sufficiently close to the solution set, the parameter μ_k = ‖F_k‖^2 may be smaller than the machine epsilon, so it loses its effect. On the other hand, when the sequence is far away from the solution set, μ_k = ‖F_k‖^2 may be very large, making the step d_k too small and preventing the sequence from converging quickly. Based on these observations, Fan and Yuan [2] choose the parameter as μ_k = ‖F_k‖ in (2), and obtain that the Levenberg-Marquardt method converges quadratically to the solution of (1) under the local error bound condition.
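The scaling issue is easy to see numerically; the magnitudes below are illustrative values of ours, not data from the experiments in section 4:

```python
import numpy as np

eps = np.finfo(float).eps            # machine epsilon, about 2.2e-16
Fnorm_near = 1e-9                    # ||F_k|| close to the solution set
Fnorm_far = 1e6                      # ||F_k|| far from the solution set

print(Fnorm_near**2 < eps)           # True: mu_k = ||F_k||^2 loses its effect
print(Fnorm_near, Fnorm_near**2)     # 1e-09 vs 1e-18
print(Fnorm_far, Fnorm_far**2)       # 1e+06 vs 1e+12: the step is over-damped
```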
More recently, H. Dan, N. Yamashita and M. Fukushima [1] propose an inexact Levenberg-Marquardt method for the nonlinear equations (1) with the parameter chosen as

μ_k = ‖F_k‖^δ, (7)
where δ is a positive constant. They show that this inexact Levenberg-Marquardt method converges to the solution superlinearly when δ ∈ (0, 2), and quadratically when δ = 2.
To obtain global convergence, we use the trust region technique and present a modified Levenberg-Marquardt algorithm in [3]. We define the actual reduction and the predicted reduction of the merit function as follows:

Ared_k = ‖F_k‖^2 − ‖F(x_k + d_k)‖^2 (8)

and

Pred_k = ‖F_k‖^2 − ‖F_k + J_k d_k‖^2. (9)

The ratio

r_k = Ared_k / Pred_k (10)

plays a key role in deciding whether to accept the trial step and how to adjust the parameter. In [3], we choose the Levenberg-Marquardt parameter as

μ_k = α_k ‖F_k‖, (11)
where α_k is computed by

α_{k+1} =
  4α_k,            if r_k < p_1,
  α_k,             if r_k ∈ [p_1, p_2],   (12)
  max{α_k/4, m},   if r_k > p_2,

where 0 < p_1 ≤ p_2 < 1 and m is a small positive constant. In this paper, we choose the Levenberg-Marquardt parameter as

μ_k = α_k ‖F_k‖^δ with α_{k+1} = max{q(r_k) α_k, m}, (13)

where q(r) is a continuous nonnegative function of r and δ ∈ (0, 2]. The main purpose of introducing q(r) is to obtain better numerical results. The factor α_k is updated at a variable rate according to the ratio r_k, rather than by simply enlarging or reducing it at a constant rate as in (12). So we call it a self-adaptive Levenberg-Marquardt
method. This technique is similar to that proposed by Hei [4] for updating the trust region radius in unconstrained optimization problems.
In the next section, we present the new self-adaptive Levenberg-Marquardt algorithm and establish its global convergence: the sequence generated by the new algorithm converges to a stationary point of the merit function. In section 3, we show that under the local error bound condition the new algorithm converges superlinearly to the solution of (1) for δ ∈ (0, 1), and quadratically for δ ∈ [1, 2]. Finally, in section 4, numerical results are given for some singular systems of nonlinear equations.
2. The self-adaptive Levenberg-Marquardt algorithm

In this section, we first construct the function q(r) and discuss its properties, and then present the new self-adaptive Levenberg-Marquardt algorithm and prove its global convergence.
Usually, if r_k is satisfactory, then we reduce α_k; otherwise we enlarge α_k. Hence, similarly to the updating rule (12), the continuous nonnegative function q(r) in (13) should satisfy

q(r̃) = 1;
q(r) ≥ 1, ∀ r < r̃;   (14)
0 < q(r) ≤ 1, ∀ r > r̃

for some given 0 < r̃ < 1. For example, we can construct q(r) as

q(r) = max{1/4, 1 − 2(2r − 1)^3}. (15)
Its graph is illustrated in Figure 1. For comparison, we also indicate the discontinuous updating rule of α_k given by (12). Of course, there are other choices of q(r),
such as

q(r) = max{1/4, 1 − (r − 1/2)^{1/3}}. (16)
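As a small sanity check, here is a Python sketch of our own for the two candidate functions (15) and (16), together with a test of the properties (14) at r̃ = 1/2; taking the real cube root via np.cbrt is an implementation detail, not something prescribed by the paper:

```python
import numpy as np

def q15(r):
    """q(r) = max{1/4, 1 - 2(2r - 1)^3}, the choice (15)."""
    return max(0.25, 1.0 - 2.0 * (2.0 * r - 1.0) ** 3)

def q16(r):
    """q(r) = max{1/4, 1 - (r - 1/2)^{1/3}}, the choice (16).

    np.cbrt returns the real cube root, so negative arguments are fine.
    """
    return max(0.25, 1.0 - float(np.cbrt(r - 0.5)))

# Properties (14) with r_tilde = 1/2: q(r~) = 1, q >= 1 to the left of r~,
# and 0 < q <= 1 to the right of r~.
for q in (q15, q16):
    assert abs(q(0.5) - 1.0) < 1e-12
    assert all(q(r) >= 1.0 for r in np.linspace(0.0, 0.49, 50))
    assert all(0.0 < q(r) <= 1.0 for r in np.linspace(0.51, 1.5, 50))
```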
Step 2. Solve

(J_k^T J_k + α_k ‖F_k‖^δ I) d = −J_k^T F_k, (17)

giving d_k.
Step 3. Compute r_k = Ared_k / Pred_k; set

x_{k+1} =
  x_k + d_k,   if r_k > p_0,   (18)
  x_k,         otherwise,

and

α_{k+1} = max{q(r_k) α_k, m}. (19)
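Putting the visible pieces together, the following is a minimal Python sketch of the self-adaptive iteration, assuming the choices stated above (μ_k = α_k ‖F_k‖^δ from (13), the acceptance test r_k > p_0 in (18), the update (19), and q from (15)); the stopping tolerance and the default constants are illustrative choices of ours, not values prescribed by the paper:

```python
import numpy as np

def self_adaptive_lm(F, J, x, delta=1.0, alpha=1.0, m=1e-8,
                     p0=1e-4, tol=1e-10, max_iter=1000):
    """Sketch of the self-adaptive Levenberg-Marquardt iteration.

    F, J: callables returning the residual vector and the Jacobian matrix.
    The LM parameter is mu_k = alpha_k * ||F_k||^delta, and alpha_k is
    rescaled by the continuous factor q(r_k) from (15).
    """
    q = lambda r: max(0.25, 1.0 - 2.0 * (2.0 * r - 1.0) ** 3)
    for _ in range(max_iter):
        Fk, Jk = F(x), J(x)
        g = Jk.T @ Fk
        if np.linalg.norm(g) <= tol:           # stationary point of ||F||^2
            break
        mu = alpha * np.linalg.norm(Fk) ** delta
        d = np.linalg.solve(Jk.T @ Jk + mu * np.eye(len(x)), -g)  # (17)
        ared = np.linalg.norm(Fk)**2 - np.linalg.norm(F(x + d))**2
        pred = np.linalg.norm(Fk)**2 - np.linalg.norm(Fk + Jk @ d)**2
        r = ared / pred if pred > 0 else 0.0   # ratio (10); guard pred = 0
        if r > p0:                             # accept the trial step, (18)
            x = x + d
        alpha = max(q(r) * alpha, m)           # continuous update, (19)
    return x
```

Note that a failed step (r ≤ p0) gives q(r) > 1, so the damping α grows and the implicit radius shrinks, while a successful step lets α decay smoothly toward its floor m.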
Now, from the famous result of Powell [11], we have the following lemma.

Lemma 2.1. Let d_k be computed by (17); then the inequality

Pred_k = ϕ_k(0) − ϕ_k(d_k) ≥ ‖J_k^T F_k‖ min{ ‖d_k‖, ‖J_k^T F_k‖ / ‖J_k^T J_k‖ } (22)

holds, where ϕ_k(d) = ‖F_k + J_k d‖^2.
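A quick randomized check of Powell's bound (22) in Python — a sketch of ours with arbitrary test data, not an experiment from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(100):
    J = rng.standard_normal((5, 3))
    F = rng.standard_normal(5)
    mu = rng.uniform(0.1, 10.0)              # any positive LM parameter
    g = J.T @ F
    d = np.linalg.solve(J.T @ J + mu * np.eye(3), -g)
    pred = np.linalg.norm(F)**2 - np.linalg.norm(F + J @ d)**2
    bound = np.linalg.norm(g) * min(np.linalg.norm(d),
                                    np.linalg.norm(g) / np.linalg.norm(J.T @ J, 2))
    assert pred >= bound - 1e-10             # inequality (22)
```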
Theorem 2.1. Under the conditions of Assumption 2.1, the sequence {x_k} generated by Algorithm 2.1 satisfies

lim_{k→∞} ‖J_k^T F_k‖ = 0. (23)

Proof: If the theorem is not true, then there exist a positive constant τ and infinitely many k such that

‖J_k^T F_k‖ ≥ τ. (24)
and

T = {k | x_{k+1} ≠ x_k, k ∈ S},

which gives

Σ_{k∈T} ‖d_k‖ < +∞. (26)

The above inequality, together with (20) and (21), implies that

Σ_{k∈T} ‖J_k^T F_k − J_{k+1}^T F_{k+1}‖ < +∞. (27)
Relation (27) and the fact that (24) holds for infinitely many k indicate that there exists a k̂ with ‖J_{k̂}^T F_{k̂}‖ ≥ τ such that

Σ_{k∈T, k≥k̂} ‖J_k^T F_k − J_{k+1}^T F_{k+1}‖ < τ/2. (28)
By induction, we obtain that ‖J_k^T F_k‖ ≥ τ/2 holds for all k ≥ k̂. This result and (26) mean that

lim_{k→∞} ‖d_k‖ = 0. (29)
So it follows from (17) that α_k → +∞. On the other hand, it follows from (24), (26) and Lemma 2.1 that

|r_k − 1| = | Ared_k / Pred_k − 1 |
          = ( ‖F_k + J_k d_k‖ O(‖d_k‖^2) + O(‖d_k‖^4) ) / Pred_k
          ≤ ( ‖F_k + J_k d_k‖ O(‖d_k‖^2) + O(‖d_k‖^4) ) / ( ‖J_k^T F_k‖ min{ ‖d_k‖, ‖J_k^T F_k‖ / ‖J_k^T J_k‖ } )
          ≤ O(‖d_k‖^2) / ‖d_k‖ → 0, (30)
that is, r_k → 1. Hence, there exists a positive constant M > m such that

α_k < M

holds for all large k, which gives a contradiction. Therefore, the assumption (24) cannot be true. The proof is completed.
3. Local convergence

Assumption 3.1. (a) F(x) is continuously differentiable, and the Jacobian J(x) is Lipschitz continuous on some neighbourhood of x∗ ∈ X∗, i.e., there exist two positive constants L_1 and b_1 < 1 such that

‖J(y) − J(x)‖ ≤ L_1 ‖y − x‖, ∀ x, y ∈ N(x∗, b_1) = {x | ‖x − x∗‖ ≤ b_1}. (31)

(b) ‖F(x)‖ provides a local error bound on N(x∗, b_1) for the system (1), i.e., there exists a positive constant c_1 > 0 such that

c_1 dist(x, X∗) ≤ ‖F(x)‖, ∀ x ∈ N(x∗, b_1). (32)

It is known that if J(x∗) is nonsingular, then ‖F(x)‖ provides a local error bound near x∗, while the converse is not necessarily true; for more details, please see [14]. Hence, a local error bound condition is weaker than nonsingularity.
By Assumption 3.1(a), we have

‖F(y) − F(x) − J(x)(y − x)‖ ≤ L_1 ‖y − x‖^2, ∀ x, y ∈ N(x∗, b_1). (33)

Let

t = min{ b_1/2, c_1/(2L_1) }. (36)
To obtain the local convergence rate of {x_k}, we first give some lemmas. In the following, x̄_k denotes a point in X∗ such that ‖x̄_k − x_k‖ = dist(x_k, X∗).

Lemma 3.1. Under the conditions of Assumption 3.1, if x_k ∈ N(x∗, t), then there exists a constant c_2 > 0 such that

‖d_k‖ ≤ c_2 ‖x_k − x̄_k‖. (37)

Proof: Since d_k is the minimizer of ψ_k(d), we have

‖d_k‖^2 ≤ (1 / (α_k ‖F_k‖^δ)) ψ_k(d_k) ≤ (1 / (α_k ‖F_k‖^δ)) ψ_k(x̄_k − x_k)
        = (1 / (α_k ‖F_k‖^δ)) ( ‖F_k + J_k (x̄_k − x_k)‖^2 + α_k ‖F_k‖^δ ‖x̄_k − x_k‖^2 )
        ≤ (L_1^2 / (m c_1^δ)) ‖x̄_k − x_k‖^{4−δ} + ‖x̄_k − x_k‖^2
        ≤ ( L_1^2 / (m c_1^δ) + 1 ) ‖x̄_k − x_k‖^2. (40)

Taking square roots gives (37) with c_2 = ( L_1^2 / (m c_1^δ) + 1 )^{1/2}.
Lemma 3.2. Under the conditions of Assumption 3.1, if x_k ∈ N(x∗, t), then there exists a positive constant c_3 such that the predicted reduction satisfies

Proof: We consider two cases. If ‖x̄_k − x_k‖ ≤ ‖d_k‖, then x̄_k − x_k is a feasible point of problem (6), so it follows from (32), (33), (42) and Lemma 3.1 that

Inequalities (43) and (44), together with Lemma 3.1, show that

so we have r_k → 1, which means that q(r_k) < 1 for all sufficiently large k; thus there exists a constant M > m such that

α_k < M (47)
Lemma 3.3. Under the conditions of Assumption 3.1, if x_1 is chosen sufficiently close to X∗, then the sequence generated by Algorithm 2.1 converges to some solution x̄ of (1) superlinearly for δ ∈ (0, 2), and quadratically for δ = 2.

Proof: Since d_k is the minimizer of ψ_k(d), it follows from (33), (47) and ‖F_k‖ = ‖F_k − F(x̄_k)‖ ≤ L_2 ‖x̄_k − x_k‖, where L_2 is the Lipschitz constant of F on N(x∗, b_1), that

ψ_k(d_k) ≤ ψ_k(x̄_k − x_k)
        = ‖F_k + J_k (x̄_k − x_k)‖^2 + α_k ‖F_k‖^δ ‖x̄_k − x_k‖^2
        ≤ L_1^2 ‖x̄_k − x_k‖^4 + M L_2^δ ‖x̄_k − x_k‖^{2+δ}
        ≤ ( L_1^2 + M L_2^δ ) ‖x̄_k − x_k‖^{2+δ}.

So we have

‖F_k + J_k d_k‖ ≤ ( L_1^2 + M L_2^δ )^{1/2} ‖x̄_k − x_k‖^{(2+δ)/2}.

Hence, by (32), (33) and Lemma 3.1,

dist(x_k + d_k, X∗) ≤ (1/c_1) ‖F(x_k + d_k)‖ ≤ c_4 dist(x_k, X∗)^{(2+δ)/2}, (48)

where c_4 = ( (L_1^2 + M L_2^δ)^{1/2} + L_1 c_2^2 ) / c_1.
Now, using the same analysis as in [2, 14], we can show that if the initial point x_1 is chosen sufficiently close to the solution set X∗, then all x_k lie in N(x∗, t). So it follows from (48) that

Σ_{k=0}^{∞} dist(x_k, X∗) < +∞,

for all large k. Thus, from (37), (48) and (49), we obtain that

‖d_{k+1}‖ = O( ‖d_k‖^{(2+δ)/2} ).
We now use the singular value decomposition of the Jacobian matrix to give a more accurate convergence rate of Algorithm 2.1 according to the exponent δ. Without loss of generality, we suppose {x_k} → x∗, rank(J(x∗)) = r, and that the singular value decomposition (SVD) of J(x∗) is

J(x∗) = U∗ Σ∗ V∗^T = (U_1∗, U_2∗) diag(Σ_1∗, 0) (V_1∗, V_2∗)^T = U_1∗ Σ_1∗ V_1∗^T, (50)

where Σ_1∗ = diag(σ_1∗, …, σ_r∗) with σ_1∗ ≥ ··· ≥ σ_r∗ > 0. Suppose the SVD of J_k
is

J_k = U_k Σ_k V_k^T = (U_{k,1}, U_{k,2}, U_{k,3}) diag(Σ_{k,1}, Σ_{k,2}, 0) (V_{k,1}, V_{k,2}, V_{k,3})^T,

where Σ_{k,1} = diag(σ_{k,1}, …, σ_{k,r}) and Σ_{k,2} = diag(σ_{k,r+1}, …, σ_{k,r+q}) with σ_{k,1} ≥ ··· ≥ σ_{k,r} ≥ σ_{k,r+1} ≥ ··· ≥ σ_{k,r+q} > 0, q ≥ 0. In the following, if the context is clear, we write the above formula as

J_k = U_1 Σ_1 V_1^T + U_2 Σ_2 V_2^T.
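For intuition, the following Python sketch of ours performs this partition numerically, splitting the SVD of a given J_k at the index r into the blocks (Σ_1, Σ_2) used in the analysis; the sample matrix is an illustrative choice, not a test problem from the paper:

```python
import numpy as np

def split_svd(Jk, r):
    """Partition the SVD of Jk: the r 'large' singular values, which
    converge to those of J(x*), versus the remaining positive ones,
    which tend to zero when rank(J(x*)) = r."""
    U, s, Vt = np.linalg.svd(Jk)
    pos = s > 0
    S1, S2 = s[:r], s[r:][pos[r:]]           # Sigma_{k,1} and Sigma_{k,2}
    U1, U2 = U[:, :r], U[:, r:r + len(S2)]   # matching left singular vectors
    V1t, V2t = Vt[:r], Vt[r:r + len(S2)]
    return (U1, S1, V1t), (U2, S2, V2t)

# J~_k = U1 Sigma_1 V1^T is the rank-r part used in (55).
Jk = np.array([[1.0, 0.0], [0.0, 1e-8]])
(U1, S1, V1t), _ = split_svd(Jk, r=1)
J_tilde = U1 @ np.diag(S1) @ V1t
```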
Then we have the following lemma, which plays an important role in the later analysis.

Proof: Result (a) follows immediately from (21). By (20) and the theory of matrix perturbation [13], we have

Let s_k = −J_k^+ F_k, where J_k^+ is the pseudo-inverse of J_k. It is easy to see that s_k is the least squares solution of min_s ‖F_k + J_k s‖, so we obtain from (33) that

‖U_3 U_3^T F_k‖ = ‖F_k + J_k s_k‖ ≤ ‖F_k + J_k (x̄_k − x_k)‖ ≤ O(‖x_k − x̄_k‖^2). (54)
Let J̃_k = U_1 Σ_1 V_1^T and s̃_k = −J̃_k^+ F_k. Since s̃_k is the least squares solution of min_s ‖F_k + J̃_k s‖, it follows from (33) and (53) that

‖(U_2 U_2^T + U_3 U_3^T) F_k‖ = ‖F_k + J̃_k s̃_k‖
    ≤ ‖F_k + J̃_k (x̄_k − x_k)‖
    ≤ ‖F_k + J_k (x̄_k − x_k)‖ + ‖(J̃_k − J_k)(x̄_k − x_k)‖
    ≤ L_1 ‖x̄_k − x_k‖^2 + ‖U_2 Σ_2 V_2^T‖ ‖x̄_k − x_k‖
    ≤ O(‖x_k − x∗‖^2). (55)
Theorem 3.1. Under the conditions of Assumption 3.1, the sequence {x_k} generated by Algorithm 2.1 converges to the solution of (1) superlinearly for δ ∈ (0, 1), and quadratically for δ ∈ [1, 2].

Proof: It follows from (17) and the SVD of J_k that

d_k = −V_1 (Σ_1^2 + α_k ‖F_k‖^δ I)^{−1} Σ_1 U_1^T F_k − V_2 (Σ_2^2 + α_k ‖F_k‖^δ I)^{−1} Σ_2 U_2^T F_k (56)

and

F_k + J_k d_k = F_k − U_1 Σ_1 (Σ_1^2 + α_k ‖F_k‖^δ I)^{−1} Σ_1 U_1^T F_k
              − U_2 Σ_2 (Σ_2^2 + α_k ‖F_k‖^δ I)^{−1} Σ_2 U_2^T F_k
            = α_k ‖F_k‖^δ U_1 (Σ_1^2 + α_k ‖F_k‖^δ I)^{−1} U_1^T F_k
              + α_k ‖F_k‖^δ U_2 (Σ_2^2 + α_k ‖F_k‖^δ I)^{−1} U_2^T F_k + U_3 U_3^T F_k. (57)
Since {x_k} converges to x∗, without loss of generality, we assume that L_1 ‖x_k − x∗‖ < σ_r∗/2 holds for all large k. Thus we obtain from (53) that

‖(Σ_1^2 + α_k ‖F_k‖^δ I)^{−1}‖ ≤ ‖Σ_1^{−2}‖ ≤ 1 / (σ_r∗ − L_1 ‖x_k − x∗‖)^2 < 4 / σ_r∗^2 (58)
holds for all sufficiently large k, which, together with Lemma 3.1, means that

That is, {x_k} converges to the solution of system (1) superlinearly for δ ∈ (0, 1), and quadratically for δ ∈ [1, 2]. The proof is completed.
Remark. Theorem 3.1 improves the results given in Lemma 3.3. From the analysis above, we can see that the sequence {x_k} converges linearly to the solution of (1) for δ = 0. For δ ∈ (2, 3), we have ‖x_{k+1} − x̄_{k+1}‖ ≤ o(‖x_k − x̄_k‖), which means that {x_k} converges superlinearly to the solution set of (1). However, we cannot obtain ‖x_{k+1} − x∗‖ ≤ o(‖x_k − x∗‖) for δ ∈ (2, 3); the problem is that we cannot show ‖d_k‖ ≤ O(‖x_k − x̄_k‖) for δ ∈ (2, 3).
4. Numerical results

In this section, we test Algorithm 2.1 on some singular systems of nonlinear equations and compare it with the modified Levenberg-Marquardt algorithm given in [3]. The singular problems are created by modifying the nonsingular test problems of Moré, Garbow and Hillstrom [9] in the way proposed by Schnabel and Frank [12]:

F̂(x) = F(x) − J(x∗) A (A^T A)^{−1} A^T (x − x∗), (65)

where F(x) is the standard nonsingular test function given by Moré, Garbow and Hillstrom [9], x∗ is its root, and A ∈ R^{n×k} has full column rank with 1 ≤ k ≤ n. Hence,

F̂(x∗) = 0, (66)

and

Ĵ(x∗) = J(x∗)( I − A (A^T A)^{−1} A^T ) (67)

has rank n − k. For

A ∈ R^{n×1}, A^T = (1, 1, · · · , 1), (68)

these two algorithms perform almost the same, so we omit the results here. However, for

A ∈ R^{n×2}, A^T = ( 1  1  1  1  · · ·  1
                     1 −1  1 −1  · · · ±1 ), (69)
the new self-adaptive Levenberg-Marquardt Algorithm 2.1 performs much better than the modified Levenberg-Marquardt algorithm. The results are listed in the following table. The third column of the table indicates that the starting point is x_1, 10x_1 or 100x_1, where x_1 is suggested by Moré, Garbow and Hillstrom in [9]; “NF” and “NJ” represent the numbers of function evaluations and Jacobian evaluations, respectively; and “n.s.x∗?” gives Y (yes) if the algorithm converges to the same solution as the corresponding nonsingular problem, and N (no) otherwise. If the algorithm fails to find the solution within 100(n+1) iterations, we denote it by the sign “–”; and if the iterations have underflows or overflows, we denote it by “OF”.
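For readers who wish to reproduce such problems, here is a Python sketch of ours implementing the construction (65); the Rosenbrock-type residual below is only a stand-in for the Moré-Garbow-Hillstrom test functions, chosen for illustration:

```python
import numpy as np

def make_singular_problem(F, J, x_star, A):
    """Schnabel-Frank modification (65):
    Fhat(x) = F(x) - J(x*) A (A^T A)^{-1} A^T (x - x*),
    whose Jacobian at x* has rank n - rank(A), cf. (67)."""
    P = A @ np.linalg.solve(A.T @ A, A.T)    # projector onto range(A)
    JP = J(x_star) @ P
    Fhat = lambda x: F(x) - JP @ (x - x_star)
    Jhat = lambda x: J(x) - JP
    return Fhat, Jhat

# Illustration with a stand-in residual whose root is x* = (1, 1).
F = lambda x: np.array([10.0 * (x[1] - x[0]**2), 1.0 - x[0]])
J = lambda x: np.array([[-20.0 * x[0], 10.0], [-1.0, 0.0]])
x_star = np.array([1.0, 1.0])
A = np.ones((2, 1))                          # rank deficiency one, as in (68)
Fhat, Jhat = make_singular_problem(F, J, x_star, A)
print(Fhat(x_star))                          # ~0, consistent with (66)
print(np.linalg.matrix_rank(Jhat(x_star)))   # n - 1 = 1
```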
From the table we can see that our new self-adaptive Levenberg-Marquardt Algorithm 2.1 performs more efficiently than the modified Levenberg-Marquardt algorithm for most of the problems. The continuous updating rule of α_k results in faster convergence than just reducing or enlarging α_k at a constant rate. Hence, the new updating strategy
of α_k with the continuous function q(r) satisfying (14) seems to be more efficient for equations with high rank deficiency of the Jacobian J(x∗).
Table: Results on singular nonlinear equations with rank(F′(x∗)) = n − 2.
Acknowledgments
The authors would like to thank the referees for their helpful comments and valuable suggestions, which have greatly improved the presentation.
References
9. J.J. Moré, B.S. Garbow, and K.H. Hillstrom, “Testing unconstrained optimization software,” ACM Trans.
Math. Software, vol. 7, pp. 17–41, 1981.
10. J. Nocedal and Y.X. Yuan, “Combining trust region and line search techniques,” in Advances in Nonlinear
Programming, Y. Yuan (Ed.), Kluwer, pp. 153–175, 1998.
11. M.J.D. Powell, “Convergence properties of a class of minimization algorithms,” in Nonlinear Program-
ming 2, O.L. Mangasarian, R.R. Meyer, and S.M. Robinson (Eds.), Academic Press: New York, 1975,
pp. 1–27.
12. R.B. Schnabel and P.D. Frank, “Tensor methods for nonlinear equations,” SIAM J. Numer. Anal., vol.
21, pp. 815–843, 1984.
13. G.W. Stewart and J.G. Sun, Matrix Perturbation Theory. Academic Press: San Diego, CA, 1990.
14. N. Yamashita and M. Fukushima, “On the rate of convergence of the Levenberg-Marquardt method,”
Computing (Suppl), vol. 15, pp. 239–249, 2001.
15. Y.X. Yuan, “Trust region algorithms for nonlinear programming,” in Contemporary Mathematics, Z.C. Shi (Ed.), vol. 163, American Mathematical Society, 1994, pp. 205–225.
16. Y.X. Yuan, “Trust region algorithms for nonlinear equations,” Information, vol. 1, pp. 7–20, 1998.
17. Y.X. Yuan, “A review of trust region algorithms for optimization,” in ICM99: Proceedings of the Fourth International Congress on Industrial and Applied Mathematics, J.M. Ball and J.C.R. Hunt (Eds.), Oxford University Press, 2000, pp. 271–282.