(1.1)
(1.2)
(1.3)
(1.4)
(ii) The sequence $\{x^k\}_{k \in \mathbb{N}}$ of Newton iterates is well defined with $x^k \in B(x^0, \rho_0)$, $k \in \mathbb{N}_0$, and $x^k \to x^* \in \bar B(x^0, \rho_0)$, $k \in \mathbb{N}_0$ $(k \to \infty)$, where $F(x^*) = 0$,
(iii) The convergence $x^k \to x^*$ $(k \to \infty)$ is quadratic,
(iv) The solution $x^*$ of $F(x) = 0$ is unique in
$$\bar B(x^0, \rho_0) \cup \big( D \cap B(x^0, \bar\rho_0) \big) , \qquad \bar\rho_0 := \frac{1 + \sqrt{1 - 2 h_0}}{\gamma \, \|F'(x^0)^{-1}\|} .$$
Proof. We have
$$\|F'(x^k) - F'(x^0)\| \le \gamma \, \|x^k - x^0\| \le \gamma \, t_k$$
for some upper bound $t_k$, $k \in \mathbb{N}$. If we can prove $x^k \in B(x^0, \rho_0)$ and $\bar t_k := \|F'(x^0)^{-1}\| \, \gamma \, t_k < 1$, $k \in \mathbb{N}$, then by the Banach perturbation lemma
$$\|F'(x^k)^{-1}\| \le \frac{\|F'(x^0)^{-1}\|}{1 - \|F'(x^0)^{-1}\| \, \|F'(x^k) - F'(x^0)\|} \le \frac{\|F'(x^0)^{-1}\|}{1 - \|F'(x^0)^{-1}\| \, \gamma \, \|x^k - x^0\|} \le \frac{\|F'(x^0)^{-1}\|}{1 - \bar t_k} =: \beta_k . \tag{1.5}$$
Assuming the assertion to be true for some $k \in \mathbb{N}$, for $k+1$, using (1.2) we obtain
$$\begin{aligned}
\|x^{k+1} - x^k\| &= \|F'(x^k)^{-1} F(x^k)\| = \|F'(x^k)^{-1} \big( F(x^k) - F(x^{k-1}) - F'(x^{k-1}) \Delta x^{k-1} \big)\| \\
&= \Big\| F'(x^k)^{-1} \int_0^1 \big( F'(x^{k-1} + s \Delta x^{k-1}) - F'(x^{k-1}) \big) \Delta x^{k-1} \, ds \Big\| \\
&\le \|F'(x^k)^{-1}\| \, \gamma \int_0^1 s \, ds \; \|\Delta x^{k-1}\|^2 \le \frac{1}{2} \, \gamma \, \beta_k \, \|x^k - x^{k-1}\|^2 . 
\end{aligned} \tag{1.6}$$
Setting
$$h_k := \|x^{k+1} - x^k\| ,$$
we thus get the recursion
$$h_k \le \frac{1}{2} \, \gamma \, \beta_k \, h_{k-1}^2 , \qquad k \in \mathbb{N} . \tag{1.7}$$
Hence, multiplying both sides with $\gamma \, \|F'(x^0)^{-1}\|$ and setting
$$t_{k+1} - t_k := \gamma \, \|F'(x^0)^{-1}\| \, h_k , \tag{1.8}$$
we end up with the following three-term recursion for the $t_k$:
$$t_{k+1} - t_k \le \frac{1}{2} \, \frac{(t_k - t_{k-1})^2}{1 - t_k} , \qquad t_0 = 0 , \; t_1 = h_0 . \tag{1.9}$$
For the majorant sequence given by equality in (1.9) we can rewrite this as
$$\underbrace{t_{k+1} - t_{k+1} t_k + \tfrac{1}{2} t_k^2}_{=: \, \Phi(t_{k+1}, t_k)} \; = \; \underbrace{t_k - t_k t_{k-1} + \tfrac{1}{2} t_{k-1}^2}_{= \, \Phi(t_k, t_{k-1})} .$$
It follows that
$$\Phi(t_{k+1}, t_k) = \Phi(t_1, t_0) = h_0 ,$$
from which we deduce
$$t_{k+1} - t_k = \frac{h_0 - t_k + \tfrac{1}{2} t_k^2}{1 - t_k} = - \frac{\psi(t_k)}{\psi'(t_k)} ,$$
where $\psi : \mathbb{R} \to \mathbb{R}$ is given by
$$\psi(t) := h_0 - t + \tfrac{1}{2} t^2 .$$
Obviously, $\psi$ has the zeroes
$$t_1 := 1 - \sqrt{1 - 2 h_0} , \qquad t_2 := 1 + \sqrt{1 - 2 h_0} ,$$
i.e., the majorants $t_k$ are the Newton iterates for $\psi$ starting from $t_0 = 0$ and converge monotonically to $t_1$.
j
B(x0 , ) D , :=
h02 1
.
1
h
0
j=0
(1.10)
(1.11)
(1.12)
(1.13)
(1.14)
(ii)
$$\|\Delta x^k\| \le \frac{\omega}{2} \, \|x^k - x^{k-1}\|^2 , \qquad k \in \mathbb{N} ,$$
(iii)
, where
X k j
1
1
(h20 )2 )
:=
(1 +
.
2
2 1 h20k
j=1
Proof. Observing
$$F'(x^{k-1}) \Delta x^{k-1} + F(x^{k-1}) = 0 ,$$
we obtain
$$\begin{aligned}
\|\Delta x^k\| &= \|F'(x^k)^{-1} F(x^k)\| = \|F'(x^k)^{-1} \big( F(x^k) - F(x^{k-1}) - F'(x^{k-1}) \Delta x^{k-1} \big)\| \\
&\le \Big\| F'(x^k)^{-1} \int_0^1 \big( F'(x^{k-1} + s \Delta x^{k-1}) - F'(x^{k-1}) \big) \Delta x^{k-1} \, ds \Big\| \le \frac{\omega}{2} \, \|\Delta x^{k-1}\|^2 ,
\end{aligned}$$
which gives the assertion (ii).
We now prove that $\{x^k\}_{\mathbb{N}_0}$ is a Cauchy sequence in $\bar B(x^0, \rho)$. By induction on $k$ we show
$$\|x^{k+1} - x^k\| \le \frac{2}{\omega} \Big( \frac{h_0}{2} \Big)^{2^k} , \qquad k \in \mathbb{N}_0 . \tag{1.15}$$
Indeed, for $k = 0$ this is just $\|\Delta x^0\| = \frac{2}{\omega} \frac{h_0}{2}$, and the induction step follows from (ii).
It follows readily from (1.15) that $x^{k+1} \in \bar B(x^0, \rho)$:
$$\|x^{k+1} - x^0\| \le \|x^{k+1} - x^k\| + \ldots + \|x^1 - x^0\| \le \frac{2}{\omega} \sum_{j=0}^{k} \Big( \frac{h_0}{2} \Big)^{2^j} \le \|\Delta x^0\| \sum_{j=0}^{\infty} \Big( \frac{h_0}{2} \Big)^{2^j - 1} = \rho .$$
Since $\{x^k\}_{\mathbb{N}_0}$ is a Cauchy sequence in $\bar B(x^0, \rho)$, there exists $x^* \in \bar B(x^0, \rho)$ such that $x^k \to x^*$ $(k \to \infty)$. Hence,
$$\Delta x^k = - F'(x^k)^{-1} F(x^k) \to - F'(x^*)^{-1} F(x^*) = 0 ,$$
and thus $F(x^*) = 0$, which proves (i).
The assertion (iii) is shown as follows: Setting
$$\bar h_k := \frac{\omega}{2} \, \|\Delta x^k\| ,$$
we obtain
$$\begin{aligned}
\|x^k - x^*\| &= \lim_{m \to \infty} \|x^k - x^m\| \le \lim_{m \to \infty} \big[ \|x^m - x^{m-1}\| + \ldots + \|x^{k+1} - x^k\| \big] \\
&\le \frac{2}{\omega} \lim_{m \to \infty} \big[ \bar h_{m-1} + \ldots + \bar h_k \big] = \|\Delta x^k\| \, \lim_{m \to \infty} \Big[ 1 + \frac{\bar h_{k+1}}{\bar h_k} + \ldots + \frac{\bar h_{m-1}}{\bar h_k} \Big] .
\end{aligned}$$
On the other hand, taking (ii) into account,
$$\bar h_k = \frac{\omega}{2} \, \|\Delta x^k\| \le \frac{\omega^2}{4} \, \|\Delta x^{k-1}\|^2 = \bar h_{k-1}^2 ,$$
whence
$$\bar h_{k+\ell} \le \bar h_k^{2^\ell} , \qquad k, \ell \in \mathbb{N}_0 .$$
We conclude
$$\|x^k - x^*\| \le \Big( 1 + \sum_{j=1}^{\infty} \bar h_k^{2^j - 1} \Big) \, \frac{\omega}{2} \, \|x^k - x^{k-1}\|^2 ,$$
which is the asserted quadratic convergence.
Assume that the following affine covariant Lipschitz condition holds:
$$\|F'(x)^{-1} \big( F'(y) - F'(x) \big)\| \le \omega \, \|y - x\| , \qquad x, y \in D , \tag{1.16}$$
that $h_0 \le \frac{1}{2}$, and that
$$\bar B(x^0, \rho_0) \subset D , \qquad \rho_0 := \frac{1 - \sqrt{1 - 2 h_0}}{\omega} . \tag{1.17}$$
Then:
(ii) The sequence $\{x^k\}_{\mathbb{N}}$ of Newton iterates is well defined with $x^k \in B(x^0, \rho_0)$, $k \in \mathbb{N}_0$, and $x^k \to x^* \in \bar B(x^0, \rho_0)$, $k \in \mathbb{N}_0$ $(k \to \infty)$, where $F(x^*) = 0$,
(iii) The convergence $x^k \to x^*$ $(k \to \infty)$ is quadratic,
(iv) The solution $x^*$ of $F(x) = 0$ is unique in
$$\bar B(x^0, \rho_0) \cup \big( D \cap B(x^0, \bar\rho_0) \big) , \qquad \bar\rho_0 := \frac{1 + \sqrt{1 - 2 h_0}}{\omega} .$$
Proof. First homework assignment.
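For orientation, the ordinary Newton iteration analyzed by the theorems above can be sketched in a few lines. This is a minimal NumPy sketch; the test problem, tolerance and iteration cap are illustrative assumptions, not part of the notes:

```python
import numpy as np

def newton(F, J, x0, tol=1e-12, maxit=50):
    """Ordinary Newton iteration: solve F'(x_k) dx_k = -F(x_k), x_{k+1} = x_k + dx_k."""
    x = np.asarray(x0, dtype=float)
    for k in range(maxit):
        dx = np.linalg.solve(J(x), -F(x))  # exact linear solve per step
        x = x + dx
        if np.linalg.norm(dx) < tol:       # increment-based (affine covariant) test
            break
    return x

# illustrative example: intersection of the unit circle with the diagonal
F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
root = newton(F, J, [1.0, 0.0])
```

Started inside the convergence ball, the increments contract quadratically, as asserted in (iii).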
(1.20)
$$\|F'(z)^{-1} \big( F'(y) - F'(x) \big)\| \le \omega \, \|y - x\| , \qquad x, y, z \in D , \tag{1.21}$$
$$h_0 := \omega \, \|\Delta x^0\| < 2 , \tag{1.22}$$
$$\bar B(x^0, \rho) \subset D , \qquad \rho := \frac{\|\Delta x^0\|}{1 - \frac{h_0}{2}} . \tag{1.23}$$
$$\|x^k - x^*\| \le \frac{\frac{\omega}{2} \, \|x^k - x^{k-1}\|^2}{1 - \frac{\omega}{2} \, \|x^k - x^{k-1}\|} .$$
$$L_\gamma := \Big\{ x \in D \; \Big| \; \|F(x)\| < \frac{2}{\gamma} \Big\} , \tag{1.25}$$
$$h_0 := \gamma \, \|F(x^0)\| < 2 . \tag{1.26}$$
Then, the sequence $\{x^k\}_{\mathbb{N}_0}$ of Newton iterates stays in $L_\gamma$, and there exists an $x^* \in L_\gamma$ such that $x^{k'} \to x^*$ for some subsequence $\mathbb{N}' \subset \mathbb{N}$ and $F(x^*) = 0$. Moreover, for the residuals $F(x^k)$ there holds
$$\|F(x^{k+1})\| \le \frac{\gamma}{2} \, \|F(x^k)\|^2 .$$
Proof. For $k = 0$, in view of (1.26),
$$\|F(x^0)\| < \frac{2}{\gamma} \quad \Longrightarrow \quad x^0 \in L_\gamma .$$
kF (x + x )k = kF (x ) +
0
i
F 0 (xk + txk ) F 0 (xk ) xk + (1 ) F (xk ) dtk
|
{z
}
0
tkF 0 (xk )xk )k2
Z
k F 0 (xk )xk k2 t dt + (1 ) kF (xk )k =
| {z }
= F (xk )
(1.27)
1
= (1 + 2 kF (xk )k) kF (xk )k .
2
We assume
xk+1 = xk + xk
/ L .
Then there exists
:= min{ (0, 1] | xk + xk L } .
It follows from (1.27)
kF (xk + xk )k
1 2
(1 + kF (xk )k) kF (xk )k <
| {z }
2
<
2
2
< (1 + ) kF (xk )k <
,
|
{z
} | {z }
< 1
<
1 2
kF (xk )k ,
2
i.e.,
1
1 2
hk = hk hk .
2
2
hk+1
Since h0 < 2, for k = 0 we obtain
h1
1
h0 h0 < h0 ,
2 |{z}
< 2
k lN0 .
Moreover,
kF (xk+1 )k < kF (xk )k <
lim kF (xk )k = 0 ,
and
which implies
xk L D
k lN .
Affine conjugacy
Assume that $D \subset \mathbb{R}^n$ is a convex set and that $f : D \to \mathbb{R}$ is a strictly convex functional. Consider the minimization problem
$$\min_{x \in D} f(x) .$$
We note that the Jacobian $F'(x) = f''(x)$ is symmetric and uniformly positive definite on $D$. In particular, $F'(x)^{1/2}$ is well defined and symmetric, positive definite as well. Consequently, the energy product
$$(u, v)_E := u^T F'(x) v , \qquad u, v \in \mathbb{R}^n , \; x \in D ,$$
is well defined. Affine conjugacy refers to invariance under transformations
$$g(y) := f(B y) , \qquad x = B y ,$$
with nonsingular $B$.
(1.28)
xD
xD.
(1.29)
(1.31)
(1.32)
(1.33)
1/2
f (x ) f (x )
5
6
0
.
1 h20
(1.36)
Proof: Assertion (i) and (1.33) can be verified as in the proof of the affine
contravariant version of the Newton-Mysovskikh theorem.
For the proof of (1.34) in (ii), observing
F 0 (xk )xk = F (xk ) ,
we obtain
f (xk+1 ) f (xk ) +
1
kF 0 (xk )1/2 xk k2 =
2
Z1
< F (xk + sxk ), xk > ds < F (xk ), xk >
=
s=0
1
< F 0 (xk )xk , xk > =
2
Z1
< F (xk + sxk ) F (xk ), xk > ds
=
s=0
Z1
1
< F 0 (xk )xk , xk > =
2
Z1
< F 0 (xk + stxk )xk , xk > dt ds
=
s=0 t=0
Z1
Z1
< (F 0 (xk + stxk ) F 0 (xk ))xk , xk > dt ds =
|
{z
}
s
s=0
t=0
Z1
=
s=0
Z1
s
s=0
=: wk
Z1
s
1
< F 0 (xk )xk , xk >=
2
Z1
kF 0 (xk )1/2 xk k2 kF 0 (xk )1/2 xk k
|
{z
} |
{z
}
= k
= hk
Z1
s2
s=0
t dt
t=0
1
1
5
+ hk ) k <
k .
2
6
6
1
hk k ,
6
1
1
1
hk ) k >
k .
2
6
6
0 (f (x ) f (x ))
k=0
5 2
k =
6
5 2
5
1
=
hk =
4 ( hk )2 .
6
6
2
Using
1
1
1
hk+1 ( hk )2
hk < 1 ,
2
2
2
we further get
1
1
h0 )2 + ( h1 )2 + ...
2
2
1
1
1
( h0 )2 + ( h0 )4 + ( h1 )4 + ...
2
2
2
1 2
X
h
1
1 2
( h0 )k = 4 h00 ,
h0
4
2
1 2
k=0
(
$$F'(x^k) \Delta x^k = - F(x^k) . \tag{1.37}$$
The classical convergence theorems of Newton-Kantorovich and Newton-Mysovskikh and their affine covariant, affine contravariant, and affine conjugate versions assume the exact solution of (1.37).
In practice, however, in particular if the dimension is large, (1.37) will be solved by an iterative method. In this case, we end up with an outer/inner iteration, where the outer iterations are the Newton steps and the inner iterations result from the application of an iterative scheme to (1.37). It is important to tune the outer and inner iterations and to keep track of the iteration errors.
With regard to affine covariance, affine contravariance, and affine conjugacy, the iterative scheme for the inner iterations has to be chosen in such a way that it easily provides information about the
- error norm in case of affine covariance,
- residual norm in case of affine contravariance, and
- energy norm in case of affine conjugacy.
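The outer/inner structure described above can be sketched generically. In the following sketch the forcing parameter eta, the direct-solve stand-in for the inner iteration, and the test problem are illustrative assumptions:

```python
import numpy as np

def inexact_newton(F, J, x0, inner_solve, eta=0.5, tol=1e-10, maxit=50):
    """Outer Newton loop with an approximate inner linear solve.
    inner_solve(A, b, tol_abs) must return dx with ||A dx - b|| <= tol_abs."""
    x = np.asarray(x0, dtype=float)
    for k in range(maxit):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        # inner accuracy tied to the current outer residual (forcing-term idea)
        dx = inner_solve(J(x), -Fx, eta * np.linalg.norm(Fx))
        x = x + dx
    return x

def direct_stand_in(A, b, tol_abs):
    # stand-in for CGNE/GMRES/PCG: a direct solve trivially meets any tolerance
    return np.linalg.solve(A, b)

F = lambda x: np.array([x[0] + x[1] - 3.0, x[0] * x[1] - 2.0])
J = lambda x: np.array([[1.0, 1.0], [x[1], x[0]]])
root = inexact_newton(F, J, [0.5, 2.5], direct_stand_in)
```

Replacing `direct_stand_in` by an iterative solver with the matching termination criterion gives the inexact Newton variants discussed below.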
Except for convex optimization, we cannot expect $F'(x)$, $x \in D$, to be symmetric positive definite. Hence, for affine covariance and affine contravariance we have to pick iterative solvers that are designed for nonsymmetric matrices. Appropriate candidates are
- CGNE (Conjugate Gradient for the Normal Equations) in case of affine covariance,
- GMRES (Generalized Minimum RESidual) in case of affine contravariance, and
- PCG (Preconditioned Conjugate Gradient) in case of affine conjugacy.
(1.38)
As the name already suggests, CGNE is the conjugate gradient method applied
to the normal equations:
It solves the system
$$A A^T z = b , \qquad x = A^T z , \tag{1.39}$$
by means of the recursion
$$\alpha_i = \frac{\sigma_{i-1}}{\|p^i\|^2} , \qquad x^i = x^{i-1} + \alpha_i p^i , \qquad r^i = r^{i-1} - \alpha_i A p^i , \tag{1.40}$$
$$\sigma_i = \|r^i\|^2 , \qquad \beta_i = \frac{\sigma_i}{\sigma_{i-1}} , \qquad p^{i+1} = A^T r^i + \beta_i p^i .$$
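A minimal sketch of CGNE along these lines, i.e. CG applied to $A A^T z = b$ without ever forming $A A^T$ (the small nonsymmetric test system is an illustrative assumption):

```python
import numpy as np

def cgne(A, b, tol=1e-12, maxit=None):
    """CG applied to A A^T z = b with x = A^T z, without forming A A^T.
    Works for nonsymmetric A; the iterates minimize the error norm ||x - x_i||."""
    m, n = A.shape
    maxit = maxit if maxit is not None else 10 * m
    x = np.zeros(n)
    r = b.astype(float).copy()   # residual of A A^T z = b equals b - A x
    p = A.T @ r
    sigma = r @ r
    for _ in range(maxit):
        alpha = sigma / (p @ p)
        x += alpha * p
        r -= alpha * (A @ p)
        sigma_new = r @ r
        if np.sqrt(sigma_new) < tol:
            break
        p = A.T @ r + (sigma_new / sigma) * p
        sigma = sigma_new
    return x

A = np.array([[3.0, 1.0], [0.0, 2.0]])   # nonsymmetric test matrix
b = np.array([4.0, 2.0])
x = cgne(A, b)
```

Because the method minimizes the error norm, it matches the affine covariant setting, where error-norm information is needed.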
min
ky vk ,
(1.41)
(1.42)
n1
X
j2 .
(1.43)
j=i
m lN .
(1.44)
(1.45)
i1
X
i1
X
kyj+1 yj k =
j=0
j2 .
(1.46)
j=0
n1
P
j=0
= 2i
j2
i1
P
j=0
(1.47)
j2
i+m
X
j2
m lN,
(1.48)
j=i
,
kyi k
kyi k
where $\varepsilon$ is a user-specified accuracy.
(1.49)
(1.50)
We will measure the impact of the inexact solution of (1.37) by the relative
error
k :=
kxk xk k
.
kxk k
(1.51)
hk
1 + k2
hk := kxk k = p
(1.53)
0 < 1,
(1.54)
1
h
2 k
+ k (1 + hk )
p
< 1,
1 + k2
(1.55)
(i)
B(x0 , ) ,
:=
kx0 k
1
(1.56)
(1.57)
kxk k
2
1+
(1.58)
k+1
= kF (x
(1.59)
h
i
k+1
k
) F (x ) F (x ) + F 0 (xk+1 )1
k+1 1
F (xk )
| {z }
= rk F 0 (xk )xk
= kF (x
k+1 1
h
F (x
k+1
i
) F (x ) F (x )x k +
+ kF 0 (xk+1 )1
rk
|{z}
= F 0 (xk )(xk xk )
Z1
h
i
kF 0 (xk+1 )1 F 0 (xk + txk ) F 0 (xk ) xk kdt +
|0
{z
=: I
Using the affine covariant Lipschitz condition (1.52), the first term on the
right-hand side in (1.59) can be estimated according to
Z1
I kxk k2
t dt =
1
kxk k2 .
2
(1.60)
k
k
kxk+1 k
1
1
kxk xk k
k
k kx x k
+
kx
k
kx
k
+
kxk k
2 | {z }
2
kxk
kxk k
|
{z
}
|
{z
}
= h
k
= k hk
= k
hk + k (1 + hk ) .
1
h
2 k
+ k (1 + hk )
p
< 1,
1 + k2
(1.62)
.
(1.63)
2
2
kxk k
1 + k+1
kxk k
1 + k+1
It can be easily shown that {xk }lN0 is a Cauchy sequence in B(x0 , ). Consequently, there exists x B(x0 , ) such that xk x (k ). Since
F 0 (xk )xk = F (xk ) + rk ,
| {z }
|
{z
}
0
we conclude F (x ) = 0.
F (x )
2
1+
(1.64)
for some appropriate > 0 and control the inner iterations such that
hk
.
2 1 + hk
(1.65)
B(x , ) ,
kx0 k
:=
1 1+
h0
2
(1.66)
kxk+1 k
kx k
=
kxk k
1
h
2 k
+ k (1 + hk )
p
.
1 + k2
1 + k2 kxk+1 k
.
2
1 + k+1
kxk k
hk .
2
k
kx k
2 1 + k
2
(1.67)
(1.68)
and
kxk+1 k
1+
1+
hk
q
hk ,
kxk k
2
2
2
1 + k+1
from which (1.67) and (1.68) follow by the definition of the Kantorovich quantities.
In order to deduce quadratic convergence we have to make sure that the initial
increments (k = 0) are small enough, i.e.,
1+
1+
h0
h0 < 1 .
2
2
(1.69)
Furthermore, (1.68) and (1.69) allow us to show that the iterates xk , k lN stay
in B(x0 , ). Indeed, (1.68) implies
kxj k
1+
1+
hj1 kxj1 k
h0 kxj1 k , j lN ,
2
2
and hence,
k
X
1+
kx0 k
kx x k
kx k kx k
(
h0 )j
.
1+
2
1
h
0
2
j=0
j=0
k
k
X
k := t
,
(1.70)
2
k
kx k
1+
k
2
where k and k+1 are computationally available estimates of k2 and k+1
.
1 + k2 kxk k.
k1 and
Consequently, replacing k1 and k by the computable quantities
k , we arrive at the termination criterion
q
2
1 + k
(1.71)
XTOL .
2
1
k1
1
h
2 k
+ k (1 + hk )
p
< 1.
1 + k2
On the other hand, in view of (1.65) of Theorem 3.2, in the quadratic convergence mode the termination criterion is
hk
.
2 1 + hk
hk = kxk k = p
are not directly accessible, we have to replace them by computationally available estimates [hk ].
We recall that for hk we have the a priori estimate
[hk ] = 2 2k1 hk .
k1 (cf. (1.70)), we
Consequently, replacing k by k , hk by [hk ], and k1 by
get the a priori estimates
[hk ]
[hk ] = q
1+
2
k
2
[hk ] = 2
k1
k lN .
(1.72)
For k = 0, we choose 0 = 0 = 41 .
In practice, for k 1 we begin with the quadratic convergence mode and switch
2
(iii)1 Quadratic convergence mode
The computationally realizable termination criterion for the inner iteration in the quadratic convergence mode is
k
[hk ]
.
2 1 + [hk ]
(1.73)
(1.74)
is met.
The computationally realizable termination criterion for the inner iteration in the linear convergence mode is
[(hk , k )] := ([hk ], k ) =
1
[h ]
2 k
+ k (1 + [hk ])
q
.
2
1 + k
(1.75)
In practice, we require the monotonicity test (1.70) in CGNE and run the
inner iterations until k satisfies (1.75) or divergence occurs, i.e.,
k > 2 .
(1.76)
v1 :=
r0
V1 := v1 .
(1.77)
(1.78)
(1.79)
II. Normalization:
$$v_{i+1} = \frac{\tilde v_{i+1}}{\|\tilde v_{i+1}\|} . \tag{1.80}$$
III. Update:
Vi+1 =
Hi =
Hi =
Vi vi+1 .
hi
k
vi+1 k
(1.81)
Hi1
hi
0
k
vi+1 k
i=1,
(1.82)
(1.83)
i>1.
(1.84)
V. Approximate solution:
yi = Vi zi + y0 .
(1.85)
min
kb Azk .
(1.86)
i lN0 .
(1.87)
i lN ,
(1.88)
i lN .
(1.89)
kri k
.
kr0 k
(1.90)
if i 6= 0 .
(1.91)
(1.92)
(1.93)
r0k = F (xk ) .
(1.94)
Consequently, during the inner GMRES iteration the relative error i , i lN0 ,
in the residuals satisfies
i =
krik k
1 ,
kF (xk )k
i+1 < i , if i 6= 0 .
(1.95)
In the sequel, we drop the subindices $i$ for the inner iterations and refer to $\eta_k$ as the final value of the inner iterations at each outer iteration step $k$.
Theorem 3.3 Affine contravariant convergence theorem for the inexact Newton GMRES method. Part I: Linear convergence
Suppose that $F : D \subset \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable on $D$ and let $x^0 \in D$ be some initial guess. Let further the following affine contravariant Lipschitz condition be satisfied:
$$\|(F'(y) - F'(x)) (y - x)\| \le \gamma \, \|F'(x) (y - x)\|^2 , \qquad x, y \in D , \quad \gamma \ge 0 . \tag{1.96}$$
Assume further that the level set
$$L_0 := \{ x \in \mathbb{R}^n \mid \|F(x)\| \le \|F(x^0)\| \} \tag{1.97}$$
is a compact subset of $D$.
In terms of the Kantorovich quantities
$$h_k := \gamma \, \|F(x^k)\| , \qquad k \in \mathbb{N}_0 , \tag{1.98}$$
there holds
$$\|F(x^{k+1})\| \le \Big[ \eta_k + \frac{1}{2} \, (1 - \eta_k^2) \, h_k \Big] \, \|F(x^k)\| . \tag{1.99}$$
Assume that
$$h_0 < 2 \tag{1.100}$$
and that the inner iterations are controlled such that
$$\eta_k \le 1 - \frac{1}{2} \, h_k , \qquad k \in \mathbb{N}_0 . \tag{1.101}$$
(1.102)
(1.103)
(1.104)
$$F(x^{k+1}) = F(x^k) + \int_0^1 F'(x^k + t \Delta x^k) \, \Delta x^k \, dt . \tag{1.105}$$
Hence,
$$\begin{aligned}
\|F(x^{k+1})\| &= \Big\| \int_0^1 \big( F'(x^k + t \Delta x^k) - F'(x^k) \big) \Delta x^k \, dt + r^k \Big\| \\
&\le \int_0^1 \big\| \big( F'(x^k + t \Delta x^k) - F'(x^k) \big) \Delta x^k \big\| \, dt + \|r^k\| \\
&\le \frac{\gamma}{2} \, \|F'(x^k) \Delta x^k\|^2 + \|r^k\| = \frac{\gamma}{2} \, \|F(x^k) - r^k\|^2 + \|r^k\| .
\end{aligned}$$
We recall (1.93)
$$\|r^k - F(x^k)\|^2 = (1 - \eta_k^2) \, \|F(x^k)\|^2 ,$$
from which (1.99) can be immediately deduced.
Now, in view of (1.101), (1.99) yields
$$\|F(x^{k+1})\| \le \Big[ \underbrace{\eta_k}_{\le \, 1 - \frac{1}{2} h_k} + \; \frac{1}{2} \, (1 - \eta_k^2) \, h_k \Big] \, \|F(x^k)\| \le \Big( 1 - \frac{1}{2} \, \eta_k^2 \, h_k \Big) \, \|F(x^k)\| \le \|F(x^k)\| , \qquad k \in \mathbb{N}_0 .$$
(1 + ) kF (x0 )k 0 (k lN ) ,
the whole sequence must converge to x .
Theorem 3.4 Affine contravariant convergence theorem for the inexact Newton GMRES method. Part II: Quadratic convergence
Under the same assumptions on $F : D \subset \mathbb{R}^n \to \mathbb{R}^n$ as in Theorem 3.3, suppose that the initial guess $x^0 \in D$ satisfies
$$h_0 < \frac{2}{1 + \varepsilon} \tag{1.106}$$
for some appropriate $\varepsilon > 0$, and control the inner iterations such that
$$\frac{\eta_k}{1 - \eta_k^2} \le \frac{\varepsilon}{2} \, h_k . \tag{1.107}$$
Then the residuals converge quadratically:
$$\|F(x^{k+1})\| \le \frac{1}{2} \, (1 + \varepsilon) \, (1 - \eta_k^2) \, \gamma \, \|F(x^k)\|^2 . \tag{1.108}$$
Proof. Inserting (1.107) into (1.99) and observing $h_k = \gamma \, \|F(x^k)\|$ gives the assertion.
(1.109)
(1.110)
(1.111)
1
hk ,
2
hk .
2
1 k
2
Again, we replace the theoretical Kantorovich quantities hk by some computationally easily available a priori estimates. We distinguish between the quadratic
and the linear convergence mode:
(iii)1 Quadratic convergence mode
We recall the termination criterion (1.107) for the quadratic convergence mode
k
hk .
2
1 k
2
It suggests the a posteriori estimate
[hk ]2 :=
2 k
hk .
(1 + ) (1 k2 )
(1.112)
2
1 k
2
1.0 .
(1.113)
(1.114)
1
kF (xk ) rk k2 =
(1 k2 ) hk kF (xk )k . (1.115)
2
2
2 kF (xk+1 ) rk k
(1 k2 )kF (xk )k
hk
(1.116)
(1.117)
1
[hk+1 ] .
2
(1.118)
If we find
k+1 < k
(1.119)
with k from (1.113), we continue the iteration in the quadratic convergence mode.
Otherwise, we realize the linear convergence mode with some
k+1 k+1 .
(1.120)
u, v lRn .
ri+1 = ri
1
Api
i
i2 =
pi+1 = ri+1 +
,
i
i
1
pi ,
i
ri+1 = Bri+1
i =
(= kyi+1 yi k2A ) ,
i+1
pi
i
kpi k2A
i
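For the affine conjugate setting, a preconditioned CG sketch may look as follows. The Jacobi preconditioner and the small test system are illustrative assumptions; the scalars $(p, Ap)$ are the quantities that give access to energy-norm information:

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-12, maxit=None):
    """Preconditioned CG for symmetric positive definite A."""
    n = len(b)
    maxit = maxit if maxit is not None else 10 * n
    y = np.zeros(n)
    r = b.astype(float).copy()
    z = M_inv @ r
    p = z.copy()
    rho = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rho / (p @ Ap)     # (p, Ap) = ||y_{i+1} - y_i||_A^2 / alpha^2
        y += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = M_inv @ r
        rho_new = r @ z
        p = z + (rho_new / rho) * p
        rho = rho_new
    return y

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
M_inv = np.diag(1.0 / np.diag(A))  # Jacobi preconditioner
y = pcg(A, b, M_inv)
```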
min
ky zkA ,
(1.121)
(1.122)
m lN .
(1.123)
n1
X
j2 .
(1.124)
j=i
(1.125)
= i2
kyi
y0 k2A
i1
X
kyj+1
yj k2A
j=0
i1
X
j2 .
(1.126)
j=0
n1
P
j=0
j2
= 2i
i1
P
j=0
(1.127)
j2
i+m
X
j2 .
(1.128)
j=0
In the inexact Newton PCG method we will control the inner PCG iterations by the relative energy error norms
$$\varepsilon_i = \frac{\|y - y^i\|_A}{\|y^i\|_A} \tag{1.129}$$
and use the termination criterion
$$\varepsilon_i \le \varepsilon . \tag{1.130}$$
(1.131)
Again, we will drop the subindices $i$ for the inner PCG iterations and refer to $\varepsilon_k$ as the final value of the inner iterations at each outer iteration step $k$. We recall the Galerkin orthogonality (cf. (1.123))
$$\big( \delta x^k , \, F'(x^k) ( \Delta x^k - \delta x^k ) \big) = - \big( \delta x^k , r^k \big) = 0 , \tag{1.132}$$
where $\delta x^k$ denotes the inexact Newton increment delivered by PCG.
is compact.
Let further the following affine conjugate Lipschitz condition be satisfied
k
1 + k2
and
hk := kF 0 (xk )1/2 xk k = p
hk
,
1 + k2
0 k 1/2
k
k
kF (x )
x x k
.
k :=
kF 0 (xk )1/2 xk k
Assume that for some < 1
h0 < 2 < 2
(1.134)
and that
k+1 k
k lN0
2
hk + k hk + 4 + (hk )
p
(hk , k ) :=
.
2 1 + k2
(1.135)
(1.136)
Then, the inexact Newton PCG iterates xk , k lN0 stay in L0 and converge linearly to some x L0 with f (x ) = min f (x).
xD
k lN0 ,
k lN0 .
(1.137)
(1.138)
1
2
1
hk k f (xk ) f (xk+1 )
k
h .
10
3
10 k k
(1.139)
Proof. Observing
rk = F (xk ) + F 0 (xk )xk
k lN0 ,
f (x + x ) f (x ) =
(1.140)
s=0
Z
k
(xk , F (xk )) ds =
(x , F (x + sx ) F (x )) ds +
s=0
s=0
Z
=
Zs
(xk , F 0 (xk + stxk )xk ) dt ds +
s
s=0
t=0
s=0
Zs
Z
=
s
s=0
Z
+
s=0
t=0
Zs
Z
k
(xk ,
(x , F (x )x ) dt ds +
t=0
(xk , F (xk )) ds =
s=0
F (xk ) ) ds =
| {z }
rk F 0 (xk )xk
Zs
s=0
t=0
Z
+
Zs
Z
(xk , F 0 (xk )xk ) dt ds
s
s=0
t=0
Z
+
(xk , rk )
| {z }
ds
1 4
1 6
hk k +
k 2 k .
10
3
1
1
hk k + ( 2 1) k ) .
10
3
(1.141)
(1.142)
= kF 0 (xk+1 )1/2
Z1
(1.143)
1
kF 0 (xk )1/2 xk k2 + kF 0 (xk+1 )1/2 rk k .
2
Setting z = xk xk , for the second term on the right-hand side of the previous
inequality we get the implicit estimate
kF 0 (xk+1 )1/2 rk k2
kF 0 (xk )1/2 zk2 + hk kF 0 (xk )1/2 zk kF 0 (xk+1 )1/2 rk k ,
which gives the explicit bound
kF 0 (xk+1 )1/2 rk k
1
hk +
4 + (hk )2 kF 0 (xk )zk .
2
(1.144)
1 2
1
0 k 1/2
k 2
2
kF (x ) x k +
hk + 4 + (hk ) kF 0 (xk )1/2 zk .
|
{z
}
|
{z
}
2
2
= (hk )2
= k hk
Taking (1.136) into account, we thus get the contraction factor estimate
k :=
1+k hk
(1.145)
` = k, k + 1 ,
kF 0 (xk )1/2 xk k
1 + k2
k k .
2
1 + k+1
(1.146)
By standard arguments we further show that the sequence {xk }lN0 of inexact
Newton PCG iterates is a Cauchy sequence in L0 and there exists an x L0
such that xk x (k ) with F (x ) = 0.
2
1+
(1.147)
for some appropriate > 0 and control the inner iterations such that
k
pk
.
2 hk + 4 + (hk )2
(1.148)
1+
kF 0 (xk )1/2 xk k2 ,
2
(1.149)
1+
kF 0 (xk )1/2 xk k2 .
2
(1.150)
(1 + ) hk ,
0
k
1/2
k
2
kF (x ) x k
2
2 1 + k
which proves (1.149) in view of hk hk h0 < 2.
The proof of (1.150) follows along the same line by using (1.148) in (1.146).
f (xk+1 ) f (xk )
(1.151)
k+1
1/2
(1 + 2 ) 1/2
k+1
k+1
2
(1 + k ) k
< 1
(1.152)
k = (1 + k ) k ETOL2 .
(1.153)
1
ETOL2 .
2
(1.154)
or
f (xk ) f (xk+1 )
quadratic convergence mode and switch to the linear convergence mode as soon as the approximate contraction factor is below some prespecified threshold value $\frac{1}{2}$.
(iii)1 Quadratic convergence mode
A computationally realizable termination criterion for the inner PCG
iteration in the quadratic convergence mode is given by
k
[hk ]
p
,
[hk ] +
4 + [hk ]2
(1.155)
10
1
k+1
k
|f
(x
)
f
(x
)
+
|
3 k
k
(1.156)
and
q
[hk ]2 :=
1 + k |[hk ]2 .
(1.157)
(1.158)
Using (1.158) in (1.157), for the inexact Kantorovich quantity we obtain the
following a priori estimate
[hk ]
[hk ] := q
.
2
1 + k
(1.159)
(1.160)
is satisfied.
The computationally realizable termination criterion for the inner iteration in the linear convergence mode is
[(hk , k )] := ([hk ], k ) .
(1.161)
k q
(k ) ,
1
we observe:
4. Quasi-Newton Methods
4.1 Introduction
Given $F : D \subset \mathbb{R}^n \to \mathbb{R}^n$ as well as $x^k, x^{k+1} \in D$, $x^k \ne x^{k+1}$, the idea is to approximate $F$ locally around $x^{k+1}$ by an affine function
$$S_{k+1}(x) := F(x^{k+1}) + J_{k+1} (x - x^{k+1}) , \qquad J_{k+1} \in \mathbb{R}^{n \times n} , \tag{1.162}$$
such that
$$S_{k+1}(x^k) = F(x^k) . \tag{1.163}$$
Inserting (1.162) into (1.163) yields the secant condition
$$J_{k+1} \underbrace{\big( x^{k+1} - x^k \big)}_{=: \, \Delta x^k} \; = \; \underbrace{F(x^{k+1}) - F(x^k)}_{=: \, y^k} . \tag{1.164}$$
(1.165)
(1.166)
where
(1.167)
An appropriate idea is to choose $J_{k+1} \in S_{k+1}$ such that there is a least change in the affine model in the sense
$$\|J_{k+1} - J_k\|_F = \min_{J \in S_{k+1}} \|J - J_k\|_F , \tag{1.168}$$
where $\| \cdot \|_F$ denotes the Frobenius norm
$$\|J\|_F := \Big( \sum_{i,k=1}^{n} J_{ik}^2 \Big)^{1/2} . \tag{1.169}$$
The solution is the rank 1 update
$$J_{k+1} - J_k = v^k (\Delta x^k)^T , \qquad v^k := \frac{y^k - J_k \Delta x^k}{(\Delta x^k)^T \Delta x^k} . \tag{1.171}$$
Here we recall the Sherman-Morrison formula: for nonsingular $A$ and $1 + v^T A^{-1} u \ne 0$,
$$\big( A + u v^T \big)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u} . \tag{1.173}$$
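Combining the least change update (1.171) with the Sherman-Morrison formula (1.173) yields Broyden's method with the inverse approximation maintained directly. The following sketch is illustrative only; the test problem, the initial Jacobian $J_0 = I$ and the tolerances are assumptions:

```python
import numpy as np

def broyden(F, x0, J0, tol=1e-10, maxit=100):
    """Broyden's rank 1 (least change) quasi-Newton method.
    The inverse Jacobian approximation is updated via Sherman-Morrison."""
    x = np.asarray(x0, dtype=float)
    Jinv = np.linalg.inv(np.asarray(J0, dtype=float))
    Fx = F(x)
    for _ in range(maxit):
        dx = -Jinv @ Fx
        x = x + dx
        Fx_new = F(x)
        if np.linalg.norm(Fx_new) < tol:
            break
        y = Fx_new - Fx
        u = Jinv @ y
        # Sherman-Morrison applied to J_{k+1} = J_k + (y - J_k dx) dx^T / (dx^T dx)
        Jinv += np.outer(dx - u, dx @ Jinv) / (dx @ u)
        Fx = Fx_new
    return x

F = lambda x: np.array([x[0]**2 - 2.0, x[1] - 1.0])
root = broyden(F, [1.5, 0.5], np.eye(2))
```

Note that no Jacobian evaluations are needed after the initialization; only one $F$-evaluation per step.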
Setting
A := Jk
u := F (xk+1 ) F (xk ) Jk xk
we obtain
1
= Jk1
Jk+1
v :=
(xk )T
,
(xk )T xk
i
h
1
k+1
k
k
x Jk (F (x ) F (x )) (xk )T Jk1
h
i
+
.
(xk )T Jk1 F (xk+1 ) F (xk )
(1.174)
min kJ 1 Jk1 kF .
J Sk+1
(1.175)
1
Jk+1
i
T
xk Jk1 F (xk+1 ) F (xk )
F (xk+1 ) F (xk )
= Jk1 +
.(1.176)
k+1
k
k+1
k
F (x ) F (x )
F (x ) F (x )
(1.177)
(1.178)
=: Ek (J)
xk+1 v T
Jk+1 = Jk I
,
v T xk
v lRn \ {0}
(1.179)
Jk+1 = Jk
xk+1 (xk )T
I
kxk k2
(1.180)
kxk+1 k
1
<
k
kx k
2
(1.181)
The update matrix Jk+1 is a least change update in the sense that
kEk (Jk+1 )k kEk (J)k ,
J Sk+1 ,
kEk (Jk+1 )k k .
(ii)
(1.182)
(1.183)
I +
xk+1 (xk )T 1
Jk ,
(1 k+1 ) kxk k2
(1.184)
where
k+1 =
(xk )T xk+1
1
<
.
k
2
kx k
2
xk+1
.
1 k+1
(1.185)
< 1.
k
kx k
1 k+1
(1.186)
xk+1 (xk )T
xk (xk )T
k
=
kE
(J)
k kEk (J)k ,
k
kxk k2
kxk k2
kxk+1 k
xk+1 (xk )T
k
= k .
kxk k2
kxk k
1
,
2
< 1.
=
k
kx k
1 k+1
1 k
Finally, the proofs of (ii) and (iii) are direct consequences of the ShermanMorrison-Woodbury formula (1.173).
kF 0 (x )1 F 0 (x) F 0 (x ) vk kx x k kvk ,
(1.187)
where x, x + v D, v lRn .
For some 0 < < 1 assume further that:
(a)
(b)
.
1+
(1.188)
1
0 .
2 1+
(1.189)
(1.190)
kxk+1 k kxk k .
(1.191)
(1.192)
<
.
2
1+
(1.193)
(1.194)
= kF 0 (x )1 F (xk+1 ) F (xk ) +
F 0 (x )1 F (xk )
{z
}
|
= F 0 (x )1 Jk xk = Ek xk xk
Z1
Z1
kxk + t
Z1
0
xk
|{z}
x k dt kxk k + kEk xk k
= xk+1 xk
1
2
(tk+1 + tk ) +
kEk xk k
kxk k .
kxk k
xk+1 k
kEk+1
kxk+1 k
< 1 we obtain
kEk+1
xk+1 k
) kxk+1 k.
(1.196)
kxk+1 k
Setting
` :=
kE` x` k
, ` = k, k + 1,
kx` k
1
(tk + tk+1 ) ,
2
tk :=
.
k
kx k
1 k+1
(1.197)
= ek Jk1 F (xk ) F (x ) =
= ek Jk1 F 0 (x ) F 0 (x )1 F (xk ) F (x ) =
| {z }
= (I+Ek )1
= (I +
Ek )1
h
(I +
Ek )ek
F (x )
= (I + Ek )1 Ek ek F 0 (x )1
Z1
i
F (x ) F (x )
=
k
i
F 0 (xk + tek ) F 0 (x ) ek dt .
kEk k + 21 tk
tk .
1 kEk k
(1.198)
k :=
kEk k + tk
.
1 kEk k
(1.199)
As far as the approximation properties of the rank 1 updates are concerned, we start from
Ek+1
= F 0 (x )1 Jk+1 I =
(1.200)
= Jk1 F (xk+1 )
= F 0 (x )1 Jk I
z }| {
xk+1
(xk )T
I =
kxk k2
= Ek + F 0 (x )1
F (xk+1 ) (xk )T
.
kxk k2
We further have
= Jk xk
{z
= (I+Ek )xk
Z1
= F 0 (x )1
= F 0 (x )1
|
Z1
F 0 (xk + txk ) F 0 (x ) dt xk Ek xk .
{z
=: Dk+1
xk (xk )T
xk (xk )T
Ek+1
= Ek I
+
D
.
k+1
kxk k2
kxk k2
|
{z
}
| {z }
=: Qk
(1.201)
=: IQk =Q
k
)T = Qk (Ek )T + Q
(Ek+1
k Dk+1 .
(1.202)
kEk+1
xk k
kDk+1 xk k
=
tk ,
kxk k
kxk k
(1.203)
kEk+1
k = max
v6=0
max
v6=0
k(Ek+1
)T vk
kvk
(1.204)
k(Ek )T vk
|(Dk+1 xk , v)|
+ max
kEk k + tk .
v6=0
kvk
kxk k kvk
k < 1 ,
k lN0 ,
it is natural to assume
k
+ t0
:= .
1
(1.205)
This gives
kEk+1
k
kE0 k
k
X
t` < kE0 k +
`=0
t0
,
1
(1.206)
t0
.
1
(1.207)
If we insert the expression for into (1.205) and solve for t0 , we obtain
2 t0 = (1 ) (1 + ) kE0 k .
Taking into account that
t0 =
1
1
(t0 + t1 )
(1 + ) t0 ,
2
2
kE0 k .
1+
(1.208)
1
<
2
1+
for < 1 .
(1.209)
kE0 k = kF 0 (x )1 J0 Ik = kF 0 (x )1 J0 F 0 (x ) F 0 (x0 ) k
kF 0 (x )1 J0 F 0 (x0 ) k + kF 0 (x )1 F 0 (x0 ) F 0 (x ) k .
{z
}
|
{z
}
|
= 0
kx0 x k=t0
(2 ) t0 (1 )
0
1+
t0
1
0 ,
2 1+
k(Ek+1
)T vk2 = k(Ek )T vk2
(Dk+1 xk , v)2
(Ek xk , v)2
+
.
kxk k2
kxk k2
X
X
kE`+1
vk2
kE0 vk2
(Dk+1 xk , v)2
(Ek xk , v)2
=
+
.
kvk2 kxk k2
kvk2
kvk2
kvk2 kxk k2
k=0
k=0
kEk+1
xk k
kDk+1 xk k
=
tk ,
kxk k
kxk k
we obtain
X
X
X
(Ek xk , v)2
2k 2
2
2
2
kE0 k +
tk kE0 k +
t0 .
2
k
2
kvk kx k
k=0
k=0
k=0
2
Since t0
1
(1
2
X
(Ek xk , v)2
1 1+ 2
2
kE0 k +
2 t0 .
2 kxk k2
kvk
2
1
k=0
v lRn \ {0} .
Consequently, setting
k :=
xk
,
kxk k
we have
lim Ek k = 0 ,
The recursion (1.184) cannot be used directly for the computation of the Quasi-Newton increments. To come up with a computationally feasible recursion, we rewrite (1.184) according to
$$J_{k+1}^{-1} = \Big( I + \frac{\Delta x^{k+1} (\Delta x^k)^T}{\|\Delta x^k\|^2} \Big) \, J_k^{-1} , \tag{1.210}$$
whence
$$J_k^{-1} = \prod_{\ell=0}^{k-1} \Big( I + \frac{\Delta x^{\ell+1} (\Delta x^\ell)^T}{\|\Delta x^\ell\|^2} \Big) \, J_0^{-1} . \tag{1.211}$$
In order to monitor the condition number of the approximations of the Jacobians, we provide the following elementary result for rank 1 updates.
Lemma 4.1 Condition number of rank 1 updates of the identity
For a matrix $A$ of the form
$$A = I - \frac{u v^T}{v^T v} , \qquad \beta := \frac{\|u\|}{\|v\|} < 1 ,$$
there holds
$$\|A\| \le 1 + \Big\| \frac{u v^T}{v^T v} \Big\| \le 1 + \beta , \qquad \|A^{-1}\| \le \Big( 1 - \Big\| \frac{u v^T}{v^T v} \Big\| \Big)^{-1} \le \frac{1}{1 - \beta} ,$$
and hence
$$\mathrm{cond}(A) \le \frac{1 + \beta}{1 - \beta} .$$
Applying Lemma 4.1 to the recursions (1.56) and (1.60) and observing
$$\Theta_k = \frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} < \frac{1}{2} ,$$
we obtain
$$\mathrm{cond}(J_{k+1}) \le \frac{1 + \Theta_k}{1 - \Theta_k} \, \mathrm{cond}(J_k) < 3 \, \mathrm{cond}(J_k) . \tag{1.212}$$
1
,
2
(1.213)
Termination criterion
(1.214)
multiplication by Jk results in
Jk xk
| {z }
= Jk J 1 F k+1 ,
= F (xk )
whence
Jk J 1 F k+1 = F k+1 F (xk+1 ) .
We thus get
I Jk J 1 F k+1 = F (xk+1 ) .
|
{z
}
(1.215)
= Ek (J)
F (xk+1 )v T
1
Jk+1
= Jk1 I
,
v T F k+1
v lRn \ {0}
(1.216)
(1.217)
The update matrix Jk+1 is a least change update in the sense that
kEk (Jk+1 )k kEk (J)k ,
J Sk+1 ,
(1.218)
k
.
(1.219)
1 k
is regular as well. Jk+1 can be represented
kEk (Jk+1 )k
(1.220)
(F k+1 )T F (xk+1 )
1
xk+1 = Jk+1
F (xk+1 ) = 1
( Jk1 F (xk+1 ))
(1.221)
.
|
{z
}
kF k+1 k2
=: xk+1
which gives
kEk (Jk+1 )F k+1 k
kF (xk+1 )k
kEk (Jk+1 )k =
=
=
kF k+1 k
kF k+1 k
=
J Sk+1 .
Using
kF (xk+1 )k
kF (xk+1 ) F (xk )k kF (xk )k kF (xk+1 )k = kf (xk )k 1
,
kF (xk )k
we find
kF (xk+1 )k
k
kEk (Jk+1 )k =
.
k+1
kF k
1 k
In the subsequent convergence proof for the affine contravariant Quasi-Newton method we need the following technical result:
Lemma 4.2 An elementary technical estimate
Assume $0 < \Theta < 1$, $0 \le \delta_0 < \Theta$ and
$$t \le \frac{\Theta - \delta_0}{1 + \delta_0 + \frac{4}{3} (1 - \Theta)^{-1}} .$$
Setting
$$\delta = \delta_0 + \frac{t}{(1 - t)(1 - \Theta)} ,$$
there holds
$$\delta + (1 + \delta) \, t \le \Theta .$$
Proof. For t we have
t
1+
4
)
3(1
3(1 )
=: g() .
7 3
= 0 +
0 +
7
t
6
4
3
+ (1 + 0 +
7 28
3
)t =
1
6
)t
t
t
+ (1 + 0 +
) t = + (1 + )t .
(1 t)(1 )
(1 t)(1 )
0
0 +
k F (x) F (x ) (y x)k kF 0 (x )(x x )k kF 0 (x )(y x)k (, 1.222)
where x, y D, and denote by
Ek := I F 0 (x )Jk1
(1.223)
(b)
(1.224)
0
.
1 + 0 + 43 (1 )1
(1.225)
(1.226)
kF (xk+1 )k kF (xk )k .
(1.227)
(1.228)
t0
.
(1 t0 )(1 )
(1.229)
(1.230)
k+1
) =
= Jk xk
Z1
=
F 0 (xk + txk ) xk dt =
F (x ) +
| {z }
0
F 0 (xk + txk ) F 0 (x ) xk dt +
0
=
xk .
F 0 (x ) Jk
{z
}
|
k
1
(F 0 (x )Jk I) Jk x
| {z }
= F (xk )
Z1
kF 0 (x )
Z1
(xk + txk x )
{z
}
|
= (1t)(xk x )+t(xk+1 x )
+ kEk k kF (xk )k =
1
(tk + tk+1 ) k
2
=
where
tk := kF 0 (x )(xk x )k ,
k lN0 .
Setting
tk :=
1
(tk + tk+1 ) ,
2
k lN0 ,
we obtain
kF (xk+1 )k tk k(Ek I)F (xk )k + kEk k kF (xk )k
h
(1.231)
i
tk (1 + kEk k) + kEk k kF (xk )k .
xk
|{z}
= Jk1 F (xk )
{z
= F 0 (x )(xk x ) +
F (x ) F (xk )
|
{z
}
=
R1
+ Ek F (xk ) ,
and hence,
kF 0 (x )(xk x )k
Z1
t kF 0 (x )(xk x )k2 dt + kEk k kF (xk ) F 0 (x )(xk x )k
+ kF 0 (x )(xk x )k .
Treating F 0 (x )(xk x ) F (xk ) as before and multiplying by yields
tk+1
1 2
1
tk + kEk k ( t2k + tk ) =
2
2
(1.232)
1 + kEk k
kEk k +
tk tk .
2
Ek+1
= I F 0 (x )Jk+1
= I F 0 (x )Jk1 I
=
kF k+1 k2
= (I F (x
)Jk1 )
= (I F 0 (x )Jk1 ) (I
{z
}
|
|
= Ek
= (IQk ) = Q
k
(I F (x
)Jk1 )
1
0
= Ek Q
k + F (x )Jk
F k+1 (F k+1 )T
.
kF k+1 k2
|
{z
}
= Qk
0
Ek+1
= Ek Q
k + F (x )Jk
Ek Q
k
I F (x
) Jk1
|
kF k+1 k2
= (1.233)
= Ek Q
k + Ek+1 Qk .
Qk Qk
kEk+1
k kEk Q
k k + kEk+1 Qk k
kEk k
(1.234)
kEk+1
F k+1 k
+
.
kF k+1 k
and hence
F k+1 = F k+1 F 0 (x ) xk =
Ek+1
Z1
=
(1.235)
F 0 (xk + txk ) F 0 (x ) xk dt .
kEk+1
F k+1 k
Z1 h
i
tkF 0 (x )(xk+1 x )k + (1 t)kF 0 (x )(xk x )k kF 0 (x )xk k dt
kEk+1
F k+1 k
tk
kF k+1 k .
1 tk
kEk+1
k kEk k +
tk
.
1 tk
(1.236)
The bounded deterioration property and the contraction of the residuals is now proved by an induction argument. We assume that we have
k
P
kEk k kE0 k +
t0
`=0
1 t0
:= kE0 k +
t0
.
(1 t0 )(1 )
and
k
tk t0 .
Then, (1.232) in combination with Lemma 4.2 yields
k
kEk+1
k kEk k +
k1
P
kE0 k
t0
`=0
1 t0
k
P
kE0 k +
tk
1 t0
k+1
t0
+
1 t0
`
t0
`=0
1 t0
(Ek+1
)T = Q
+ Qk (Ek+1
)T .
k (Ek )
Observing
Qk (Ek+1
)T v
= F
k+1
k+1
, v)
(F k+1 , (Ek+1
)T v)
k+1 (Ek+1 F
= F
,
k+1
2
k+1
2
kF k
kF k
we get
T
2
T
2
k(Ek+1
)T vk2 = kQ
k (Ek ) vk + kQk (Ek+1 ) vk =
T
2
2
T
= k(Ek )T vk2 kQ
k (Ek ) vk + kQk (Ek+1 ) vk =
= k(Ek )T vk2
F k+1 , v)2
(Ek+1
(Ek F k+1 , v)2
+
.
kF k+1 k2
kF k+1 k2
k=0
kF k+1 k2 kvk2
X
kE`+1
k2
F k+1 , v)2
(Ek+1
kE0 k2
=
+
.
kvk2
kvk2
kF k+1 k2 kvk2
k=0
Estimating from above by neglecting the negative term and observing (1.236), for $\ell \to \infty$ we obtain
`
X
(E F k+1 , v)2
k
k=0
kF k+1 k2 kvk2
kE0 k2 +
X
X
tk 2
t20
2k
kE0 k2 +
.
2
1
t
(1
t
k
0)
k=0
|k=0{z }
=
1
2
1
and thus
lim kEk k = 0 .
The superlinear convergence of the residuals results from the following reasoning: Observing
Ek F (xk+1 ) = F (xk+1 ) F 0 (x )Jk1 F (xk+1 ) ,
we have
kF (xk+1 )k kF 0 (x )Jk1 F (xk+1 )k kEk F (xk+1 )k kEk kkF (xk+1 )k ,
and hence,
kF (xk+1 )k
kF 0 (x )Jk1 F (xk+1 )k
.
1 kEk k
(1.237)
which gives
kF 0 (x )Jk1 F (xk+1 )k
k
kEk k+ 1t
2kEk k +
tk
(1 + ) kF (xk )k .
1 tk
tk
2kEk k + 1t
k
kF (xk+1 )k
kF (xk )k .
1 kEk k
(1.238)
1
12k
Convergence monitor
We choose
$$\Theta_{\max} < \frac{1}{2} , \qquad \text{e.g. } \Theta_{\max} = \frac{1}{4} , \tag{1.239}$$
and check
$$\Theta_k \le \Theta_{\max} . \tag{1.240}$$
We consider the level function
$$T(x) := \frac{1}{2} \, \|F(x)\|^2 = \frac{1}{2} \, F(x)^T F(x) \tag{1.241}$$
and the monotonicity criterion
$$T(x^{k+1}) < T(x^k) , \qquad \text{if } T(x^k) \ne 0 . \tag{1.242}$$
In terms of the level set G, the monotonicity criterion (1.242) can be stated as
xk+1 int G(xk ) ,
if int G(xk ) 6= .
(1.244)
In the steepest descent method, the negative gradient of the level function is used as the direction of the iterative correction:
$$\Delta x^k = - \, \mathrm{grad} \; T(x^k) = - F'(x^k)^T F(x^k) , \qquad x^{k+1} = x^k + s_k \, \Delta x^k . \tag{1.245}$$
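A single steepest descent step can be sketched as follows (the linear model problem and the fixed steplength are illustrative assumptions):

```python
import numpy as np

def steepest_descent_step(F, J, x, s):
    """One steepest descent step for T(x) = ||F(x)||^2 / 2:
    the correction is dx = -grad T(x) = -F'(x)^T F(x)."""
    dx = -J(x).T @ F(x)
    return x + s * dx

F = lambda x: np.array([2.0 * x[0], x[1]])        # linear model problem
J = lambda x: np.array([[2.0, 0.0], [0.0, 1.0]])
T = lambda x: 0.5 * F(x) @ F(x)
x0 = np.array([1.0, 1.0])
x1 = steepest_descent_step(F, J, x0, s=0.1)
```

For a sufficiently small steplength the level function decreases, in accordance with (1.242).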
(1.246)
i lN0
< 1 (e.g.,
1
.
2
(1.247)
the downhill property (1.246) assures that a finite number of reductions will
result in a feasible steplength sk > 0.
The prediction strategy selects s0k+1 on the basis of an ad-hoc rule with
respect to the steplength history
s
min (smax , k ) , if sk1 sk
0
sk+1 :=
.
(1.248)
sk , otherwise
Remark 5.1 The speed of convergence of the steepest descent method may be
slow and problems may occur due to an ill-conditioning of the Jacobian F 0 (x).
The steepest descent method as described above is not affine covariant. Indeed, given a nonsingular matrix $A \in \mathbb{R}^{n \times n}$, we may introduce another level function
$$T_A(x) := \frac{1}{2} \, \|A F(x)\|^2 . \tag{1.249}$$
(1.250)
there exists a class of regular matrices A such that for some > 0
TA (x + s) > TA (x) ,
0<s< .
(1.251)
Proof. We have
$$\Delta x^T \, \mathrm{grad} \; T_A(x) = - \big( F'(x)^T F(x) \big)^T F'(x)^T A^T A \, F(x) = - F(x)^T F'(x) F'(x)^T A^T A \, F(x) .$$
We choose $A \in \mathbb{R}^{n \times n}$ such that
$$A^T A = \big( F'(x) F'(x)^T \big)^{-1} + y y^T ,$$
where $y \in \mathbb{R}^n$ satisfies
$$F(x)^T \big( F'(x) F'(x)^T + \varepsilon I \big) y = 0 , \qquad \varepsilon > 0 , \qquad F(x)^T y \ne 0 .$$
Then
$$\Delta x^T \, \mathrm{grad} \; T_A(x) = - \|F(x)\|^2 + \varepsilon \, \big( F(x)^T y \big)^2 > 0$$
for $y$ scaled appropriately, which proves (1.251).
(1.252)
if int GA (xk ) 6= .
Denoting by GL(n) the set of all regular nn matrices, we introduce the affine
covariant level set
\
GA (x) .
GA (x) :=
(1.253)
AGL(n)
A GL(n) ,
(1.254)
(1.255)
(1.256)
(1.257)
H(x0 ) :=
HA (x0 ) .
(1.258)
AGL(n)
A A =
i2 qi qiT .
i=1
A := {A GL(n) | A A =
i2 qi qiT , q1 =
i=1
F (x0 )
}.
kF (x0 )k
n
X
b j qj
bj lR , 1 j n ,
j=1
and hence,
2
kAyk = y A Ay =
n
X
i2 b2i ,
i=1
HA (x ) = {y lR |
n
X
i=1
2
1
2
n
2
2
b
+
b
+
...
+
b2n 1 .
2
kF (x0 )k2 1
1 kF (x0 )k
1 kF (x0 )k
For A A, all ellipsoids have a common b1 -axis of length kF (x0 )k, whereas the
lengths of the other axes differ (cf. Figure 1).
It follows readily that
0 ) = {y lRn | y = b1 q1 , |b1 | kF (x0 )k} =
H(x
= {y lRn | y = (1 )F (x0 ) , [0, 2]} =
= {y lRn | Ay = (1 )AF (x0 ) , [0, 2] , A GL(n)} .
Since A GL(n), we have
0) .
H(x0 ) H(x
0 ) and A A
On the other hand, for y H(x
kAyk2 = (1 )2 kAF (x0 )k2 kAF (x0 )k2 ,
which shows
0 ) H(x0 ) .
H(x
(1.259)
The final stage of the proof is done by an appropriate lifting of the path H(x0 )
to G(x0 ) using the homotopy
(x, ) := F (x) (1 )F (x0 ) .
In view of
x = F 0 (x) ,
= F (x0 )
and observing that x is nonsingular for x D and GA (x0 ) D, local continuation from x(0) = x0 by the implicit function theorem, applied to 0,
delivers the existence of the path
x GA (x0 ) D
with the properties (1.256),(1.257). The assertions (1.254) and (1.255) are now
a direct consequence of (1.259).
Remark 5.2 The implication of the previous theorem is that even far from the solution, the Newton increment $\Delta x^0 / \|\Delta x^0\|$, which is tangent to the Newton path originating from $x^0$, plays a decisive role and should be used in an affine invariant globalization strategy. However, its length may be too large and thus has to be controlled appropriately.
Remark 5.3 The previous theorem assumes that the Jacobian is regular in D.
However, sometimes the situation is encountered where the Jacobian is singular
at a critical point x even close to the initial guess x0 . In this case, the implicit
function theorem tells us that the Newton path ends at that critical point.
5.2 Trust region concepts
As we have seen, far away from the solution the ordinary Newton method can still be used, provided an appropriate damping of the Newton increment is applied. Of course, we would like to know how to determine the damping factor, or, in other words, what is the region around the current iterate where we can rely on the linearization with respect to the tangent to the Newton path. The specification of such regions is known as the trust region concept.
5.2.1 Trust region based on the Levenberg-Marquardt method
Given a current iterate xk lRn and a prespecified parameter > 0, the idea of
the Levenberg-Marquardt method is to determine an increment xk lRn
as the solution of the constrained minimization problem
inf
xk K
xk lRn
sup L(xk , )
lR+
(1.260)
kxk k2 2 0 ,
(1.261)
0 ,
(kxk k2 2 ) = 0 .
Denoting the solution of the saddle point problem by (xk (), ), we observe
0+
>> 1
xk () F 0 (xk )F (xk ) ,
xk ()
1 0 k
1
F (x )F (xk ) = grad T (xk ) .
This means:
Close to the solution, the method coincides with the ordinary Newton method,
whereas far from the solution, it corresponds to a steepest descent with the
steplength parameter 1 .
The Levenberg-Marquardt method looks robust, since the coefficient matrix F'(x^k)ᵀF'(x^k) + μI in (1.260) is regular even if the Jacobian F'(x^k) is singular. However, the method may terminate for singular F'(x^k), since then the right-hand side in (1.260) also degenerates. Moreover, the Levenberg-Marquardt method lacks affine invariance.
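The two limit cases above can be checked numerically. The following sketch (assuming NumPy; the test function and all names are illustrative, not from the text) computes the increment from the normal equations (F'(x^k)ᵀF'(x^k) + μI)Δx^k = −F'(x^k)ᵀF(x^k):

```python
import numpy as np

def levenberg_marquardt_step(J, Fx, mu):
    """One Levenberg-Marquardt increment: solve (J^T J + mu*I) dx = -J^T F(x).

    For mu -> 0 this tends to the ordinary Newton step; for mu >> 1 it
    approaches -(1/mu) * grad T(x) with T(x) = 0.5*||F(x)||^2.
    """
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + mu * np.eye(n), -J.T @ Fx)

# Illustration on F(x) = (x1^2 - 1, x2) with Jacobian diag(2*x1, 1)
x = np.array([2.0, 1.0])
F = np.array([x[0]**2 - 1.0, x[1]])
J = np.array([[2.0 * x[0], 0.0], [0.0, 1.0]])

newton = np.linalg.solve(J, -F)                    # ordinary Newton step
lm_small = levenberg_marquardt_step(J, F, 1e-12)   # mu -> 0 limit
lm_large = levenberg_marquardt_step(J, F, 1e6)     # mu >> 1 limit
grad_T = J.T @ F                                   # gradient of T(x)
```

For small μ the increment agrees with the Newton step, for large μ with the scaled negative gradient, mirroring the two limits displayed above.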
5.2.2 The Armijo damping strategy
An empirical damping strategy is the Armijo strategy: with the level function T(x) := ½‖F(x)‖², choose the largest steplength λ_k ∈ {1, 1/2, 1/4, ..., λ_min} with the property

    T(x^k + λ_k Δx^k) ≤ (1 − λ_k/2) T(x^k) .    (1.262)
Obviously, the choice of the level function T (x) in the Armijo rule does not
reflect affine covariance. We will develop an affine covariant damping strategy
below.
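The Armijo rule can be sketched in a few lines. The following one-dimensional example (F = arctan is an illustrative choice, not from the text) shows a far-out iterate where the full Newton step is rejected and the steplength is halved several times:

```python
import numpy as np

def armijo_damping(F, x, dx, lam_min=2.0**-10):
    """Armijo rule for T(x) = 0.5*F(x)^2: take the largest lam in
    {1, 1/2, 1/4, ...} with T(x + lam*dx) <= (1 - lam/2)*T(x).
    Returns None if lam falls below lam_min (convergence failure)."""
    T = lambda y: 0.5 * F(y)**2
    T0 = T(x)
    lam = 1.0
    while lam >= lam_min:
        if T(x + lam * dx) <= (1.0 - 0.5 * lam) * T0:
            return lam
        lam *= 0.5
    return None

F = np.arctan                          # scalar residual function
x = 10.0                               # iterate far from the root x* = 0
dx = -np.arctan(x) * (1.0 + x**2)      # ordinary Newton increment
lam = armijo_damping(F, x, dx)
```

Here the undamped Newton step overshoots wildly, and the rule settles on a strongly damped step.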
5.2.3 Affine covariant trust region method
The Levenberg-Marquardt method can be easily reformulated to yield an affine
covariant version. Since affine covariance means affine invariance with respect
to transformations in the domain of definition, we have to modify the objective
functional:
The increment Δx^k is now determined from

    inf_{Δx^k ∈ K̄} … ,    (1.263)

whereas the set of constraints K̄ is given as follows:

    K̄ := {Δx^k ∈ ℝⁿ | ‖F'(x^k) Δx^k‖ ≤ δ} .    (1.264)
There is basically the same geometric interpretation as before with the only
difference that now the picture has to be drawn in the range space.
5.3 Globalization of affine contravariant Newton methods
5.3.1 Convergence of the damped Newton iteration
We consider the damped Newton iteration

    F'(x^k) Δx^k = −F(x^k) ,
    x^{k+1} = x^k + λ_k Δx^k ,   λ_k ∈ [0, 1] ,    (1.265)

under the affine contravariant Lipschitz condition

    ‖(F'(y) − F'(x)) (y − x)‖ ≤ ω ‖F'(x)(y − x)‖² ,   x, y ∈ D .    (1.266)

With the Kantorovich quantity h_k := ω ‖F(x^k)‖, the residual monotonicity test

    ‖F(x^k + λ Δx^k)‖ ≤ (1 − λ + ½ λ² h_k) ‖F(x^k)‖    (1.267)

holds, and the right-hand side attains its minimum at the optimal damping factor

    λ_k = min(1, 1/h_k) .    (1.268)
Proof. We have

    ‖F(x^k + λ Δx^k)‖ ≤ ‖∫₀^λ (F'(x^k + t Δx^k) − F'(x^k)) Δx^k dt‖ + (1 − λ) ‖F(x^k)‖ .

The first term on the right-hand side measures the deviation from the Newton path. Using the affine contravariant Lipschitz condition, it can be estimated as follows:

    ‖∫₀^λ (F'(x^k + t Δx^k) − F'(x^k)) Δx^k dt‖ ≤ ½ λ² ω ‖F'(x^k) Δx^k‖² = ½ λ² h_k ‖F(x^k)‖ .

Inserting this estimate into the previous one and minimizing t_k(λ) proves the theorem.
Moreover, for all damping factors

    λ_k ∈ [ε, 2λ_k − ε]    (1.269)

with ε > 0 sufficiently small, the damped Newton iterates x^k, k ∈ ℕ₀, converge to some x* ∈ D₀ with F(x*) = 0. Indeed, the parabola in (1.267) can be bounded from above by the polygonal bound

    t_k(λ) ≤ 1 − λ/2          for 0 ≤ λ ≤ 1/h_k ,
    t_k(λ) ≤ 1 − λ + ½ λ² h_k   for 1/h_k ≤ λ ≤ 2/h_k .    (1.270)

In practice, h_k is not available and has to be replaced by an estimate

    [h_k] := [ω] ‖F(x^k)‖ ≤ h_k ,    (1.271)

where [ω] is a lower bound for the domain dependent Lipschitz constant that can be obtained by pointwise sampling. Then, an estimate of the optimal damping factor is given by means of

    [λ_k] := min(1, 1/[h_k]) .    (1.272)

Since [h_k] ≤ h_k, we have [λ_k] ≥ λ_k,
i.e., we may have a considerable overestimation. As a remedy, repeated reductions must be performed by appropriate prediction and correction strategies.
The following bit counting lemma gives information about the contraction
in the residuals in terms of the accuracy of the estimate for the Kantorovich
quantity.
Lemma 5.3 Bit counting lemma
Assume that for some 0 < ε ≤ 1 there holds

    0 ≤ h_k − [h_k] < ε max(1, [h_k]) .    (1.273)

Then, the residual monotonicity test (1.267) yields

    ‖F(x^{k+1})‖ ≤ (1 − ½ (1 − ε) λ_k) ‖F(x^k)‖ .    (1.274)

Proof. Evaluating the parabola at λ = [λ_k] and using (1.273), we obtain

    ‖F(x^{k+1})‖ ≤ [1 − λ + ½ λ² h_k]|_{λ=[λ_k]} ‖F(x^k)‖
                ≤ [1 − λ + ½ (1 + ε) λ² [h_k]]|_{λ=[λ_k]} ‖F(x^k)‖
                ≤ (1 − ½ (1 − ε) λ_k) ‖F(x^k)‖ .

In particular, the choice ε = 1/2 results in the residual monotonicity test

    ‖F(x^{k+1})‖ ≤ (1 − ¼ λ_k) ‖F(x^k)‖ .    (1.275)
As far as the correction step is concerned, we recall that the damped Newton method with damping factor λ ∈ [0, 1] represents a deviation from the Newton path which can be measured by means of

    ‖F(x^{k+1}) − (1 − λ) F(x^k)‖ ≤ ½ λ² ω ‖F(x^k)‖² .

This leads us to the following lower bound for the affine contravariant Kantorovich quantity:

    [h_k] := 2 ‖F(x^{k+1}) − (1 − λ) F(x^k)‖ / (λ² ‖F(x^k)‖) ≤ h_k .

Using the prediction λ_k⁰ from the previous step, for i ≥ 0 we compute the trial iterate

    x^{k+1} = x^k + λ_k^i Δx^k

and perform the residual monotonicity test

    ‖F(x^{k+1})‖ ≤ (1 − ¼ λ_k^i) ‖F(x^k)‖ .

If the test is successful, we accept the current value λ_k^i as the damping factor. Otherwise, we set

    λ_k^{i+1} := min( ½ λ_k^i , 1/[h_k^i] ) .

As long as λ_k^{i+1} ≥ λ_min, this gives us a new trial iterate. However, if λ_k^{i+1} < λ_min, the process is stopped (convergence failure).

For the prediction of a damping factor λ_{k+1}⁰, we recall

    h_{k+1} = (‖F(x^{k+1})‖ / ‖F(x^k)‖) h_k .

Denoting by i the index for which λ_k^i passed the residual monotonicity test, we use the lower bound

    [h_{k+1}⁰] = (‖F(x^{k+1})‖ / ‖F(x^k)‖) [h_k^i] ≤ h_{k+1}

and set

    λ_{k+1}⁰ := min(1, 1/[h_{k+1}⁰]) .    (1.276)
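The residual-based correction and prediction strategies above can be combined into a compact damping loop. The following sketch (assuming NumPy; the test problem, tolerances, and all names are illustrative assumptions, not from the text) uses the test with factor (1 − λ/4), the a-posteriori estimate [h_k], and the residual-quotient prediction:

```python
import numpy as np

def damped_newton_residual(F, J, x0, tol=1e-10, lam_min=1e-8, kmax=50):
    """Affine contravariant damped Newton sketch: residual monotonicity
    test ||F(x_new)|| <= (1 - lam/4)*||F(x)||, Kantorovich estimate
    [h_k] = 2*||F(x_new) - (1-lam)*F(x)|| / (lam^2 * ||F(x)||),
    correction lam <- min(lam/2, 1/[h_k]) on failure, and prediction
    h_{k+1} ~ (||F(x_new)||/||F(x)||) * [h_k] after acceptance."""
    x = np.asarray(x0, float)
    lam = 1.0
    for _ in range(kmax):
        Fx = F(x)
        nF = np.linalg.norm(Fx)
        if nF <= tol:
            return x
        dx = np.linalg.solve(J(x), -Fx)
        while True:
            x_trial = x + lam * dx
            F_trial = F(x_trial)
            h_est = 2.0 * np.linalg.norm(F_trial - (1.0 - lam) * Fx) / (lam**2 * nF)
            if np.linalg.norm(F_trial) <= (1.0 - 0.25 * lam) * nF:
                break                              # test passed: accept
            lam = min(0.5 * lam, 1.0 / h_est)      # correction strategy
            if lam < lam_min:
                raise RuntimeError("convergence failure")
        h_next = (np.linalg.norm(F_trial) / nF) * h_est
        lam = min(1.0, 1.0 / h_next) if h_next > 0 else 1.0  # prediction
        x = x_trial
    return x

# Example: circle-line intersection with root (sqrt(2), sqrt(2))
F = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
root = damped_newton_residual(F, J, [10.0, 1.0])
```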
5.4 Globalization of affine covariant Newton methods
A natural monotonicity criterion in the domain space reads

    ‖Δx̄^{k+1}‖ < ‖Δx^k‖ ,    (1.277)

where the simplified Newton correction Δx̄^{k+1} is given by

    F'(x^k) Δx̄^{k+1} = −F(x^{k+1}) .    (1.278)

More generally, we consider the family of level functions

    T(x | A) := ½ ‖A F(x)‖² ,   A ∈ GL(n) .    (1.279)

The Newton direction is a descent direction for each of these level functions:

    grad T(x | A)ᵀ Δx = −2 T(x | A) < 0 ,   A ∈ GL(n) .    (1.280)

The previous result tells us that with regard to first order information all level functions are equally well suited. In order to be more selective, we have to use second order information.
Theorem 5.4 Affine covariant downhill property
Assume that F : D ⊂ ℝⁿ → ℝⁿ, D ⊂ ℝⁿ convex, is continuously differentiable on D with regular Jacobian F'(x), x ∈ D, and suppose further that the following affine covariant Lipschitz condition holds true:

    ‖F'(x)⁻¹ (F'(y) − F'(x)) (y − x)‖ ≤ ω ‖y − x‖² ,   x, y ∈ D .    (1.282)

With

    h_k := ω ‖Δx^k‖ ,   h̄_k := h_k cond(A F'(x^k)) ,    (1.283)

the level function T(· | A), A ∈ GL(n), satisfies

    T(x^k + λ Δx^k | A) ≤ t_k^A(λ)² T(x^k | A) ,    (1.284)

where

    t_k^A(λ) := 1 − λ + ½ λ² h̄_k ,    (1.285)

or, equivalently,

    ‖A F(x^k + λ Δx^k)‖ ≤ t_k^A(λ) ‖A F(x^k)‖ .    (1.286)
Proof. Using F(x^k) = −F'(x^k) Δx^k, we obtain

    ‖A F(x^k + λ Δx^k)‖ ≤ ‖A ∫₀^λ (F'(x^k + t Δx^k) − F'(x^k)) Δx^k dt‖ + (1 − λ) ‖A F(x^k)‖ .

Invoking the affine covariant Lipschitz condition, for the first term on the right-hand side we obtain

    ‖A F'(x^k) ∫₀^λ F'(x^k)⁻¹ (F'(x^k + t Δx^k) − F'(x^k)) Δx^k dt‖
        ≤ ‖A F'(x^k)‖ ∫₀^λ ω t ‖Δx^k‖² dt = ½ λ² h_k ‖A F'(x^k)‖ ‖Δx^k‖ .

Observing Δx^k = −(A F'(x^k))⁻¹ A F(x^k), this yields

    ½ λ² h_k ‖A F'(x^k)‖ ‖(A F'(x^k))⁻¹ A F(x^k)‖
        ≤ ½ λ² h_k ‖A F'(x^k)‖ ‖(A F'(x^k))⁻¹‖ ‖A F(x^k)‖ = ½ λ² h̄_k ‖A F(x^k)‖ .

Combining the previous estimates gives the assertions.
In view of Theorem 5.4 we readily get the following global convergence result.

Theorem 5.5 Affine covariant global convergence theorem
In addition to the assumptions of Theorem 5.4, let x⁰ ∈ D be an initial guess such that the path-connected component D₀ of G_A(x⁰) is a compact subset of D. Then, for all damping parameters

    λ_k ∈ [ε, 2 λ_k(A) − ε] ,    (1.287)

with ε > 0 being sufficiently small, the damped Newton method converges to some x* ∈ D₀ with F(x*) = 0.

Proof. As before, we remark that the parabola t_k^A(λ) can be bounded from above by a polygonal bound according to

    t_k^A(λ) ≤ 1 − λ/2 ,   0 < λ ≤ 1/h̄_k ,    (1.288)

with the optimal damping factor λ_k(A) := min(1, 1/h̄_k). For the choice A = I, we get

    h̄_k = h_k cond(F'(x^k)) ≥ h_k ,    (1.289)

whereas the choice A = A_k := F'(x^k)⁻¹ yields cond(A_k F'(x^k)) = 1 and hence the least restrictive bound

    h̄_k = h_k ,   λ_k(A_k) = min(1, 1/h_k) .    (1.290)
The associated level function T_{F'(x^k)⁻¹} is called the natural level function, which gives rise to the natural monotonicity test

    ‖Δx̄^{k+1}‖ ≤ ‖Δx^k‖ ,    (1.291)

where

    Δx̄^{k+1} = −F'(x^k)⁻¹ F(x^{k+1}) .    (1.292)
Several remarks are due with respect to the properties of the natural level
function.
Remark 5.6 Extremal properties
As shown in Figure 3, for A ∈ GL(n) the reduction factors and the optimal damping factors satisfy

    t_k^{A_k}(λ) = 1 − λ + ½ λ² h_k ≤ t_k^A(λ) ,    (1.293)
    λ_k(A) ≤ min(1, 1/h_k) = λ_k(A_k) ,    (1.294)

i.e., the natural level function admits the largest damping factors. Close to the solution we have h_k → 0 and hence

    λ_k(A_k) = 1 ,    (1.296)

i.e., the ordinary Newton method is recovered.

Remark 5.9 Asymptotic distance function
At the solution, the natural level function acts as an asymptotic distance function:

    T_{F'(x*)⁻¹}(x) = ½ ‖x − x*‖² + O(‖x − x*‖³) ,

i.e., its level surfaces close to x* are spheres measuring the distance to x*. Note that for other level functions, the level surface is an ellipsoid close to x*, with the ratio of the largest to the smallest half-axis being related to the condition number of the Jacobian, and an osculating ellipsoid off x*.

Remark 5.10 Local descent
If we insert A = A_k into (1.285), (1.286) of Theorem 5.4, we get the local descent property

    ‖Δx̄^{k+1}‖ ≤ (1 − λ + ½ λ² h_k) ‖Δx^k‖ .    (1.297)
Remark 5.11 Global convergence
We note that the results of Theorem 5.5 are not applicable to the situation at hand, since A = A_k changes from one step to the next. Taking the asymptotic distance function property into account, in the subsequent global convergence result we make the fixed choice A = F'(x*)⁻¹.
Theorem 5.6 Global convergence of the affine covariant damped Newton method with natural level functions; Part I
Assume that F : D ⊂ ℝⁿ → ℝⁿ, D ⊂ ℝⁿ convex, is continuously differentiable on D with regular Jacobian F'(x), x ∈ D, and suppose that the following affine covariant Lipschitz condition is fulfilled:

    ‖F'(x*)⁻¹ (F'(y) − F'(x)) (y − x)‖ ≤ ω ‖y − x‖² ,   x, y ∈ D .    (1.298)

Then, for damping factors

    0 < λ ≤ λ_k ,    (1.299)

where

    λ_k := min(1, 1/h_k) ,   h_k := ω ‖Δx^k‖ ‖F'(x^k)⁻¹ F'(x*)‖ ,    (1.300)

the damped Newton method converges to the solution x* ∈ D₀ of F(x) = 0.

Proof. Proceeding as in the proof of Theorem 5.4 with the fixed choice A = F'(x*)⁻¹, the deviation from the Newton path is now bounded by

    ½ λ² ω ‖Δx^k‖ ‖F'(x^k)⁻¹ F'(x*)‖ ‖F'(x*)⁻¹ F(x^k)‖ = ½ λ² h_k ‖F'(x*)⁻¹ F(x^k)‖ .

The rest of the proof proceeds in exactly the same manner as in the proof of Theorem 5.5.
In much the same way as we derived Theorem 5.5 from Theorem 5.4, the previous results imply the following convergence statement in a more realistic scenario:

Corollary 5.7 Global convergence of the affine invariant damped Newton method; Part II
Assume that all assumptions of Theorem 5.6 are met, except that the affine covariant Lipschitz condition is replaced by one with a local Lipschitz constant ω(z). Then, the corresponding convergence results hold for damping factors

    0 < λ ≤ λ_k(z) ,    (1.302)

where

    λ_k(z) := min(1, 1/h_k(z)) ,    (1.303)
    h_k(z) := ω(z) ‖Δx^k‖ ‖F'(x^k)⁻¹ F'(z)‖ .    (1.304)
Figure 5: Newton path G(x^k), trust region around x^k and Newton step with locally optimal damping factor

We have a local level function reduction according to

    T_{F'(z)⁻¹}(x^k + λ Δx^k) ≤ (1 − λ + ½ λ² h_k(z))² T_{F'(z)⁻¹}(x^k) ,    (1.305)

where the radius ρ_k describes the local trust region around the current iterate x^k.
5.4.3 Adaptive trust region strategies
We provide lower estimates

    [ω_k] ≤ ω_k ,   [h_k] ≤ h_k    (1.306)

for the Lipschitz constant and the Kantorovich quantity (e.g., by pointwise sampling of the domain), and thus get an upper estimate

    [λ_k] := min(1, 1/[h_k]) ≥ λ_k    (1.307)

for the optimal damping factor. In analogy to Lemma 5.3, assume that for some 0 < ε ≤ 1

    0 ≤ h_k − [h_k] < ε max(1, [h_k]) .    (1.308)

Then the natural monotonicity test yields

    ‖Δx̄^{k+1}‖ ≤ (1 − ½ (1 − ε) λ_k) ‖Δx^k‖ ,    (1.309)

and in particular, for ε = 1/2, the test

    ‖Δx̄^{k+1}‖ ≤ (1 − ¼ λ_k) ‖Δx^k‖ .    (1.310)

Correction strategy
We have to monitor the deviation from the Newton path. In an affine covariant setting we have the upper bound

    ‖Δx̄^{k+1}(λ) − (1 − λ) Δx^k‖ ≤ ½ λ² ω_k ‖Δx^k‖² ,    (1.311)

which provides the a posteriori estimate [h_k](λ) := 2 ‖Δx̄^{k+1}(λ) − (1 − λ) Δx^k‖ / (λ² ‖Δx^k‖). If a trial damping factor λ_k^j fails the monotonicity test, it is corrected according to

    λ_k^{j+1} := min( ½ λ_k^j , 1/[h_k](λ_k^j) ) .    (1.312)

From the last accepted step we obtain the a posteriori Lipschitz estimate

    [ω_k] := ‖Δx̄^k − Δx^k‖ / (λ_{k−1} ‖Δx^{k−1}‖ ‖Δx̄^k‖) ≤ ω_k .    (1.313)

The estimate (1.313) exploits the newest information and leads us to the prediction strategy

    λ_k⁰ := min(1, μ_k) ,
    μ_k := 1/([ω_k] ‖Δx^k‖) = (‖Δx^{k−1}‖ ‖Δx̄^k‖) / (‖Δx̄^k − Δx^k‖ ‖Δx^k‖) · λ_{k−1} .    (1.314)
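The affine covariant counterpart of the earlier residual-based loop can be sketched analogously, now with the natural monotonicity test (1.291) and the simplified Newton correction. All names, defaults, and the test problem are illustrative assumptions, not from the text:

```python
import numpy as np

def damped_newton_natural(F, J, x0, tol=1e-10, lam_min=1e-8, kmax=50):
    """Affine covariant damped Newton sketch: natural monotonicity test
    ||dx_bar|| <= ||dx|| with the simplified correction
    dx_bar = -J(x_k)^{-1} F(x_trial); the a-posteriori estimate
    [h_k] = 2*||dx_bar - (1-lam)*dx|| / (lam^2 * ||dx||)
    drives the correction lam <- min(lam/2, 1/[h_k])."""
    x = np.asarray(x0, float)
    lam = 1.0
    for _ in range(kmax):
        Jx = J(x)
        dx = np.linalg.solve(Jx, -F(x))
        ndx = np.linalg.norm(dx)
        if ndx <= tol:
            return x
        while True:
            x_trial = x + lam * dx
            dx_bar = np.linalg.solve(Jx, -F(x_trial))   # simplified correction
            h_est = 2.0 * np.linalg.norm(dx_bar - (1.0 - lam) * dx) / (lam**2 * ndx)
            if np.linalg.norm(dx_bar) <= ndx:           # natural monotonicity test
                break
            lam = min(0.5 * lam, 1.0 / h_est)           # correction strategy
            if lam < lam_min:
                raise RuntimeError("convergence failure")
        x = x_trial
        lam = min(1.0, 1.0 / h_est) if h_est > 0 else 1.0  # prediction
    return x

# Example: circle-line intersection with root (sqrt(2), sqrt(2))
F = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
root = damped_newton_natural(F, J, [3.0, 1.0])
```

Note that only one Jacobian factorization per outer iteration would be needed in a production code; here `solve` is called repeatedly for brevity.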
The model problem (1.315) describes certain exothermal chemical reactions, where λ > 0 stands for the so-called Arrhenius parameter.
We consider a parameter dependent nonlinear system

    F(x, λ) = 0 ,    (1.317)

whose solution path x(λ) satisfies the Davidenko equation

    F_x(x(λ), λ) x'(λ) = −F_λ(x(λ), λ) .    (1.318)

Along the path, points y* with rank-deficient Jacobian may occur; in the simplest case

    dim ker F'(y*) = 1 ,    (1.322)

and more generally

    dim ker F'(y*) = k + 1 .    (1.323)

In a discrete continuation method, the path is computed at parameter values λ_ν:

    F(x_ν, λ_ν) = 0 ,   0 ≤ ν ≤ N .    (1.325)

The solution of (1.325) requires a good initial guess, which will be provided by some appropriately chosen prediction method. A related important issue is an adaptive selection of the steplengths

    Δλ_ν := λ_{ν+1} − λ_ν ,   0 ≤ ν ≤ N − 1 .

A prediction path x̂(λ) is said to be of order p, if

    ‖x(λ) − x̂(λ)‖ ≤ γ Δλ^p ,   where Δλ := λ − λ_ν .    (1.326)
In the sequel, as the most important examples of prediction paths we will consider
- the classical continuation method,
- the tangent continuation method,
- the standard and the partial standard embedding,
- the polynomial continuation method.
(i) Classical continuation
The most simple prediction path is the constant path

    x̂(λ) := x(λ_ν) ,   λ_ν ≤ λ ≤ λ_{ν+1} .    (1.327)

Obviously, we have

    ‖x(λ) − x̂(λ)‖ = ‖x(λ) − x(λ_ν)‖ ≤ (λ − λ_ν) max_{s ∈ [λ_ν, λ_{ν+1}]} ‖x'(s)‖ ≤ Δλ_ν max_{s ∈ [λ_ν, λ_{ν+1}]} ‖x'(s)‖ ,

i.e., the classical continuation method is of order p = 1.
(ii) Tangent continuation
An alternative way to obtain a prediction path is to apply the explicit Euler method to the Davidenko equation (1.318):

    x̂(λ) := x(λ_ν) + (λ − λ_ν) x'(λ_ν) ,   λ_ν ≤ λ ≤ λ_{ν+1} .    (1.328)

Therefore, the tangent continuation is also referred to as the Euler continuation or the method of incremental load. Figure 7 shows both the classical and the tangent continuation method.
As far as the order is concerned, we have

    ‖x(λ) − x̂(λ)‖ = ‖x(λ) − x(λ_ν) − (λ − λ_ν) x'(λ_ν)‖
        ≤ ½ (λ − λ_ν)² max_{s ∈ [λ_ν, λ_{ν+1}]} ‖x''(s)‖ ≤ ½ Δλ_ν² max_{s ∈ [λ_ν, λ_{ν+1}]} ‖x''(s)‖ ,

i.e., the tangent continuation method is of order p = 2.
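Both prediction paths can be sketched on a scalar model problem. The embedding F(x, λ) = x³ + x − λ below is an illustrative choice (not from the text); its tangent x'(λ) = 1/(3x² + 1) follows from the Davidenko equation:

```python
import numpy as np

F = lambda x, lam: x**3 + x - lam
Fx = lambda x, lam: 3.0 * x**2 + 1.0

def corrector(x, lam, tol=1e-12):
    # plain Newton iteration in x at fixed lam
    for _ in range(50):
        dx = -F(x, lam) / Fx(x, lam)
        x += dx
        if abs(dx) < tol:
            break
    return x

def continuation(lam_grid, tangent=False):
    x = 0.0                       # known solution at lam = 0
    path = [x]
    for lam0, lam1 in zip(lam_grid[:-1], lam_grid[1:]):
        if tangent:               # Euler step on the Davidenko equation
            x_pred = x + (lam1 - lam0) / (3.0 * x**2 + 1.0)
        else:                     # classical (constant) prediction
            x_pred = x
        x = corrector(x_pred, lam1)
        path.append(x)
    return np.array(path)

lams = np.linspace(0.0, 10.0, 21)
path = continuation(lams, tangent=True)
```

At λ = 10 the exact path value is x = 2 (since 2³ + 2 = 10), which both variants reach; the tangent predictor simply needs fewer corrector iterations per step.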
(iii) Standard embedding
A generally applicable homotopy is the standard embedding

    F(x, λ) := F(x) − (1 − λ) F(x⁰) ,   0 ≤ λ ≤ 1 .    (1.329)

However, this method does not exploit any structure of F. Therefore, a better way is to select only a component of the mapping, which leads to the so-called partial standard embedding

    F(x, λ) := F(x) − (1 − λ) P F(x⁰) ,    (1.330)

with a suitable projection P.
In order to quantify admissible steplengths, we assume the affine covariant Lipschitz condition

    ‖F'(x̂(λ))⁻¹ (F'(x) − F'(x̂(λ)))‖ ≤ ω ‖x − x̂(λ)‖ ,   x, x̂(λ) ∈ D ,  λ ∈ I .    (1.331)

Then, the order coefficient for the classical continuation method is

    γ₁ = max_{λ ∈ I} ‖F'(x(λ))⁻¹ P F(x⁰)‖ ,    (1.332)

and for the tangent continuation method

    γ₂ = ½ max_{λ ∈ I} ‖x''(λ)‖ .    (1.333)

Indeed, for the partial standard embedding we have

    F_x(x, λ) = F'(x) ,   F_λ(x, λ) = P F(x⁰) ,

and hence,

    x'(λ) = −F'(x(λ))⁻¹ P F(x⁰) .

This readily gives (1.332). For the tangent continuation, we must invoke the Lipschitz condition, which yields

    ‖F'(x(λ_ν))⁻¹ (F'(x(λ)) − F'(x(λ_ν)))‖ ‖x'(λ)‖ ≤ ω ‖x(λ) − x(λ_ν)‖ γ₁ ≤ ω γ₁² Δλ .
(iv) Polynomial continuation
We distinguish between extrapolation by Lagrange and by Hermite interpolation.

(iv)₁ Lagrange extrapolation
We assume that for some q > 0 the data

    x(λ_ℓ) ,   ν − q ≤ ℓ ≤ ν ,

are available and define the prediction path by Lagrange extrapolation:

    x̂(λ) := Σ_{ℓ=ν−q}^{ν} x(λ_ℓ) L_ℓ^q(λ) .    (1.334)

The associated extrapolation error is

    ‖x(λ) − x̂(λ)‖ ≤ ( max_s ‖x^{(q+1)}(s)‖ / (q + 1)! ) |ω(λ)| ,    (1.335)

where

    ω(λ) := Π_{ℓ=ν−q}^{ν} (λ − λ_ℓ) .    (1.336)

(iv)₂ Hermite extrapolation
If also the tangents x'(λ_ℓ), ν − q ≤ ℓ ≤ ν, are available, Hermite extrapolation doubles the order, where now

    ω(λ) := Π_{ℓ=ν−q}^{ν} (λ − λ_ℓ)² .
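The Lagrange extrapolation predictor (1.334) can be sketched directly in its classical form (function names and the sample data are illustrative, not from the text):

```python
import numpy as np

def lagrange_predict(lams, xs, lam_new):
    """Evaluate the Lagrange polynomial through (lam_l, x_l) at lam_new:
    xhat(lam) = sum_l x_l * L_l(lam), used here as an extrapolation
    predictor beyond the last continuation point."""
    lams = np.asarray(lams, float)
    xhat = 0.0
    for l, xl in enumerate(xs):
        L = 1.0
        for m, lm in enumerate(lams):
            if m != l:
                L *= (lam_new - lm) / (lams[l] - lm)
        xhat += xl * L
    return xhat

# On a path that is exactly quadratic in lam, q = 2 extrapolation is exact:
lams = [0.0, 0.5, 1.0]
xs = [1.0, 1.75, 3.0]          # samples of x(lam) = 1 + lam + lam^2
pred = lagrange_predict(lams, xs, 1.5)
```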
For the convergence of the corrector iteration, we assume the affine covariant Lipschitz condition

    ‖F_x(x̂(λ), λ)⁻¹ (F_x(y, λ) − F_x(x, λ))‖ ≤ ω₀ ‖y − x‖ ,   x, y ∈ D ,  λ ∈ I .    (1.337)
Under the Lipschitz condition (1.337) and for a prediction path of order p with coefficient γ, steplengths Δλ_ν satisfying

    ω₀ γ Δλ_ν^p ≤ √2 − 1    (1.338)

guarantee that the ordinary Newton method with initial guess x̂(λ_{ν+1}) converges to the solution point x(λ_{ν+1}).

Proof. For the ease of exposition, we write Δλ instead of Δλ_ν. The affine covariant Newton-Kantorovich theorem requires

    h := ω₀ ‖Δx⁰(λ)‖ ≤ ½ .    (1.339)

For the initial Newton increment we obtain

    ‖Δx⁰(λ)‖ = ‖F_x(x̂(λ), λ)⁻¹ F(x̂(λ), λ)‖ = ‖F_x(x̂, λ)⁻¹ (F(x̂, λ) − F(x, λ))‖
             = ‖F_x(x̂, λ)⁻¹ ∫₀¹ F_x(x + t(x̂ − x), λ) (x̂ − x) dt‖
             ≤ (1 + ½ ω₀ ‖x̂ − x‖) ‖x̂ − x‖ ,

and hence, by the order of the prediction path,

    ‖Δx⁰(λ)‖ ≤ γ Δλ^p (1 + ½ ω₀ γ Δλ^p) =: δ(Δλ) .    (1.340)

The Kantorovich condition (1.339) is thus guaranteed by

    ω₀ γ Δλ^p (1 + ½ ω₀ γ Δλ^p) ≤ ½ ,

which is equivalent to

    ω₀ γ Δλ^p ≤ √2 − 1 .

For the computational realization, we monitor the quantity

    h₀(λ) := ½ ω₀ ‖Δx⁰(λ)‖ ,    (1.342)

for which (1.340) yields

    2 h₀(λ) ≤ ω₀ γ Δλ^p (1 + ½ ω₀ γ Δλ^p) ,

which leads to

    ω₀ γ Δλ^p ≥ g(h₀(λ)) ,   where g(θ) := √(1 + 4θ) − 1 .

Since convergence is guaranteed for ω₀ γ Δλ^p ≤ g(1/4) = √2 − 1, the steplength is chosen according to

    Δλ_new := ( g(1/4) / g(h₀(λ)) )^{1/p} Δλ .    (1.343)

Remark: If the termination criterion detects some k such that h_k > ½, the last continuation step has to be repeated with the reduced steplength

    Δλ⁰ := ( g(1/4) / g(h_k) )^{1/p} Δλ ,    (1.344)
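The steplength adaptation based on g can be sketched in a few lines (parameter names are illustrative; the rule rescales by the ratio of g-values, leaving the steplength unchanged exactly at the target monitor value 1/4):

```python
import numpy as np

def g(theta):
    # g from the steplength analysis: g(theta) = sqrt(1 + 4*theta) - 1
    return np.sqrt(1.0 + 4.0 * theta) - 1.0

def adapted_steplength(dlam, h, p=1):
    """Steplength adaptation in the spirit of (1.343)/(1.344): rescale the
    last steplength by (g(1/4)/g(h))**(1/p), where h is the computed
    convergence monitor of the corrector iteration."""
    return (g(0.25) / g(h)) ** (1.0 / p) * dlam

dlam_reduced = adapted_steplength(0.1, h=2.0)    # failed step: shrink
dlam_grown = adapted_steplength(0.1, h=0.05)     # easy step: grow
```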
For the first steplength after a successfully computed continuation point, the quantities ω₀ and γ can be estimated a posteriori by

    [ω₀] := 2 Θ₀(λ_ν) / ‖Δx⁰(λ_ν)‖ ,   [γ] := ‖x̂(λ_ν) − x(λ_ν)‖ / |λ_ν − λ_{ν−1}|^p ,    (1.345)

where Θ₀ denotes the first Newton contraction factor; the prediction then reads Δλ⁰ := ( g(1/4) / ([ω₀][γ]) )^{1/p}.

Remark: The prediction strategy (1.345) is robust with respect to the accuracy of x̄: even if only a single Newton step is performed, i.e.,

    x̄ = x̂(λ_ν) + Δx⁰(λ_ν) ,

the prediction takes the form

    Δλ⁰ := ( g(1/4) / g(Θ₀) )^{1/p} Δλ_ν .    (1.346)
We now turn to singular points. Let A := F'(y*) with

    rank F'(y*) = n − k ,    (1.347)

together with the orthogonal decompositions

    ℝ^{n+1} = N(A) ⊕ N(A)^⊥ ,   ℝⁿ = R(A) ⊕ R(A)^⊥ ,    (1.348)

where

    dim N(A) = k + 1 ,   dim R(A)^⊥ = k .

The matrix

    P := A A⁺    (1.349)

projects onto R(A). We split the difference y − y* according to

    w := P_{N⊥} (y − y*) ,   v := P_N (y − y*) .    (1.350)

Then, in view of (1.347), the implicit function theorem asserts the existence of a function w = w*(v) such that

    P F(y* + v + w) = 0   ⟺   w = w*(v) .    (1.351)

Hence, the singular part of the problem is reduced to the k-dimensional bifurcation equation

    f(v) := (I − P) F(y* + v + w*(v)) = 0 .    (1.352)

Choosing orthonormal bases t := [t₁, ..., t_{k+1}] of N(A) and z := [z₁, ..., z_k] of R(A)^⊥, we obviously have

    A t = 0 ,   tᵀ t = I_{k+1} ,   P_N = t tᵀ ,
    Aᵀ z = 0 ,   zᵀ z = I_k ,   I − P = z zᵀ .    (1.353)

Writing v = t ξ with ξ ∈ ℝ^{k+1}, in terms of the function φ : ℝ^{k+1} → ℝ^k the reduced system (1.352) can be written as

    φ(ξ) := zᵀ f(tξ) = zᵀ F(y* + tξ + w*(tξ)) = 0 .    (1.354)
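The orthonormal bases t of N(A) and z of R(A)^⊥ in (1.353) can be computed numerically via the SVD. The following sketch (a numerical illustration with an invented rank-deficient matrix, not the text's y*) also recovers the dimensions stated above:

```python
import numpy as np

def nullspace_bases(A, tol=1e-10):
    """Orthonormal bases t of ker(A) and z of ker(A^T) via the SVD;
    t@t.T and z@z.T are then the orthogonal projectors used in the
    Lyapunov-Schmidt reduction."""
    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > tol))          # numerical rank
    t = Vt[r:].T                      # ker(A):   A @ t = 0
    z = U[:, r:]                      # ker(A^T): A.T @ z = 0
    return t, z

# Rank-deficient 3x4 example: rank 2, so dim ker(A) = 2, dim ker(A^T) = 1
A = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0]])
t, z = nullspace_bases(A)
```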
In particular, if we have

    G(h(0), λ) = G(0, λ) = p(0, λ) ≠ 0 ,

the reduced system (1.354) has to be replaced by

    zᵀ F(y) = p(0, λ) .

These k equations together with the n − k equations P F = 0 then give rise to the n equations

    F(y) = z p(0, λ) .    (1.364)

Augmenting the unknowns by z and a scalar μ, we arrive at the extended system

    F'(y)ᵀ z = 0 ,    (1.365)
    F(y) + μ z = 0 ,    (1.366)
    ½ (zᵀ z − 1) = 0 .    (1.367)
The Jacobian of the extended system (1.365)-(1.367) is given by

    J(y, z, μ) = ( C    Aᵀ    0 )
                 ( A    μIₙ   z )    (1.368)
                 ( 0    zᵀ    0 ) ,

where

    C := ∂/∂y (F'(y)ᵀ z) = Σ_{i=1}^{n} f_i''(y) z_i ,   A := F'(y) .

The associated Newton method

    ( C    Aᵀ    0 ) ( Δy )      ( F'(y)ᵀ z     )
    ( A    μIₙ   z ) ( Δz ) = −  ( F(y) + μ z   )    (1.369)
    ( 0    zᵀ    0 ) ( Δμ )      ( ½ (zᵀz − 1)  )

is well-defined in a neighborhood of y*.
Instead of (1.369), replacing J(y, z, μ) by J(y, z, 0) and A by Ā ≈ F'(y*), we consider the Newton-like method

    ( C    Āᵀ   0 ) ( Δy )      ( F'(y)ᵀ z     )
    ( Ā    0    z ) ( Δz ) = −  ( F(y) + μ z   ) .    (1.370)
    ( 0    zᵀ   0 ) ( Δμ )      ( ½ (zᵀz − 1)  )
In order to compute Ā, we use a QR decomposition of A:

    A = Q ( R   S  ) Πᵀ ,
          ( 0   τᵀ )

where Q is an orthogonal n×n matrix, R is an upper triangular (n−1)×(n−1) matrix, S is an (n−1)×2 matrix, τ ∈ ℝ², and Π is an (n+1)×(n+1) permutation matrix.
For y close to y*, the matrix R is nonsingular and the vector τ is small. Hence, we may choose

    Ā = Q ( R   S ) Πᵀ .    (1.371)
          ( 0   0 )

Using (1.371) in (1.370) suggests the partitioning

    C̄ := Πᵀ C Π = ( C₁₁   C₁₂ ) ,   C₂₂ ∈ ℝ^{2×2} ,
                   ( C₁₂ᵀ  C₂₂ )
    z̄ := Qᵀ z = ( w ; σ ) ,   w ∈ ℝ^{n−1} ,  σ ∈ ℝ ,
    Πᵀ Δy = ( u ; v ) ,   u ∈ ℝ^{n−1} ,  v ∈ ℝ² ,
    Qᵀ Δz = ( Δw ; Δσ ) ,   Δw ∈ ℝ^{n−1} ,  Δσ ∈ ℝ ,
    Πᵀ Aᵀ z = ( f₁ ; f₂ ) ,   f₁ ∈ ℝ^{n−1} ,  f₂ ∈ ℝ² ,
    Qᵀ (F + μ z) = ( g₁ ; g₂ ) ,   g₁ ∈ ℝ^{n−1} ,  g₂ ∈ ℝ ,
    h := ½ (zᵀ z − 1) ,

which leads to the linear system

    ( C₁₁   C₁₂   Rᵀ   0   0 ) ( u  )      ( f₁ )
    ( C₁₂ᵀ  C₂₂   Sᵀ   0   0 ) ( v  )      ( f₂ )
    ( R     S     0    0   w ) ( Δw ) = −  ( g₁ )
    ( 0     0     0    0   σ ) ( Δσ )      ( g₂ )
    ( 0     0     wᵀ   σ   0 ) ( Δμ )      ( h  )
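Block-structured systems of this kind are typically solved by block elimination rather than by forming the full matrix. The following generic sketch (all names are illustrative; this is not the particular system above, but the standard bordered-system technique behind it) solves [[M, b], [cᵀ, d]] (u; s) = (f; g) via the Schur complement, assuming M is nonsingular:

```python
import numpy as np

def solve_bordered(M, b, c, d, f, g):
    """Solve the bordered system [[M, b], [c^T, d]] [u; s] = [f; g] by
    block elimination: eliminate u with M, then solve the scalar Schur
    complement equation for the border unknown s."""
    Minv_b = np.linalg.solve(M, b)
    Minv_f = np.linalg.solve(M, f)
    s = (g - c @ Minv_f) / (d - c @ Minv_b)   # Schur complement step
    u = Minv_f - s * Minv_b
    return u, s

M = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
c = np.array([2.0, 1.0])
d = 0.0
f = np.array([1.0, 0.0])
g = 1.0
u, s = solve_bordered(M, b, c, d, f, g)
```

Only factorizations of M are needed, which is what makes the QR-based partitioning above attractive: the triangular block R plays the role of M.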