NONLINEAR CONJUGATE GRADIENT METHODS

A. T. Chronopoulos
Department of Computer Science, University of Minnesota at Minneapolis, Minneapolis, Minnesota 55455

and

Z. Zlatev
Air Pollution Laboratory, Danish Agency of Environmental Protection, Risø National Laboratory, DK-4000 Roskilde, Denmark
ABSTRACT
A nonlinear conjugate gradient method has been introduced and analyzed by J. W. Daniel. This method applies to nonlinear operators with symmetric Jacobians. The conjugate gradient method applied to the normal equations can be used to approximate the solution of general nonsymmetric linear systems of equations if the condition number of the coefficient matrix is small. In this article, we obtain nonlinear generalizations of this method which apply directly to nonlinear operator equations. Under conditions on the Jacobian and the Hessian of the operators, we prove that these methods converge to a unique solution. Error bounds are obtained.
1. INTRODUCTION
Nonlinear systems of equations often arise when solving initial or boundary value problems in ordinary or partial differential equations. We consider the nonlinear system of equations

F(x) = 0,   (1.1)

where F(x) is a nonlinear operator from a real Euclidean space of dimension N (or, more generally, a Hilbert space) into itself. The Newton method coupled with Gaussian elimination is an efficient way to solve such nonlinear systems when the dimension of the Jacobian is small.
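This baseline solver can be sketched in a few lines. The following NumPy implementation and the two-equation test system are illustrative choices (not taken from the paper), with a dense linear solve standing in for Gaussian elimination:

```python
import numpy as np

def newton(F, J, x, tol=1e-12, max_iter=50):
    # Newton's method for F(x) = 0: solve J(x_k) d = -F(x_k) by Gaussian
    # elimination (dense LU) and update x_{k+1} = x_k + d.
    for _ in range(max_iter):
        fx = F(x)
        if np.linalg.norm(fx) < tol:
            break
        x = x + np.linalg.solve(J(x), -fx)
    return x

# Illustrative system with a nonsymmetric Jacobian; one solution is (1, 1).
F = lambda x: np.array([x[0] + x[1]**2 - 2.0, x[0]**3 + x[1] - 2.0])
J = lambda x: np.array([[1.0, 2.0 * x[1]], [3.0 * x[0]**2, 1.0]])
print(newton(F, J, np.array([1.2, 0.8])))   # started near the root
```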
When the Jacobian is large and sparse, some kind of iterative method may be used. This can be a nonlinear iteration (for example, functional iteration for contractive operators) or an inexact Newton method, in which the solution of the resulting linear systems is approximated by a linear iterative method (cf. [15], [6]).

Nonlinear steepest descent methods for the minimal residual and normal equations have been studied by many authors (cf. [12] and [14]). R. Fletcher and C. M. Reeves [8] and J. W. Daniel [4] have obtained a nonlinear conjugate gradient method that converges if the Jacobian is symmetric and uniformly positive definite. These nonlinear methods reduce to the standard conjugate gradient methods for linear systems. They are based on an exact line search at each iteration and thus must solve a scalar nonlinear minimization problem in order to determine the steplengths. Several authors have suggested inexact line searches and have given conditions under which these methods still converge [8]. This is done to avoid solving exactly the scalar minimization problem, whose derivative evaluation involves evaluation of the nonlinear operator.

The conjugate gradient method applied to the normal equations can be used to solve nonsymmetric linear systems iteratively when the condition number of the coefficient matrix is small. Preconditioning applied to the original linear system can be used to achieve this goal. Two algorithms exist for the conjugate gradient method applied to the normal equations: CGNR [1, 11] and CGNE [1, 3] (or Craig's method).

In this article we obtain a nonlinear extension of the conjugate gradient methods applied to the normal equations. We assume that the Jacobian and the Hessian of the nonlinear operator are uniformly bounded. We prove global convergence and local convergence results for the nonlinear algorithms. We also give asymptotic steplength estimates and error bounds. These steplengths can be used in implementing these methods.

In Section 2, we review the CGNR and CGNE methods. In Section 3, we derive a nonlinear CGNR method and prove global convergence. In Section 4, we derive a nonlinear CGNE method and prove local convergence. In Section 5, we obtain asymptotic steplength and error estimates.
2. THE CONJUGATE GRADIENT METHOD APPLIED TO THE NORMAL EQUATIONS
Let us consider the system of linear equations Ax = f, where A is a nonsingular, nonsymmetric matrix of order N. This system can be solved by either of the two normal equations systems:

A^T A x = A^T f   (2.1)

or

A A^T y = f,  x = A^T y.   (2.2)
Since both A^T A and A A^T have the same spectrum, we can apply CG to either system to obtain an approximate solution of Ax = f. The CGNR method [11] applies CG to (2.1). Then x_n minimizes the norm of the residual error E(x_n) = ||f - A x_n||_2 over the affine Krylov subspace x_0 + span{A^T r_0, (A^T A) A^T r_0, ..., (A^T A)^{n-1} A^T r_0}. The resulting algorithm is the following.

The CGNR Algorithm.

r_0 = f - A x_0, p_0 = A^T r_0
For n = 0 Until Convergence Do
  1. a_n = (A^T r_n, A^T r_n) / (A p_n, A p_n)
  2. x_{n+1} = x_n + a_n p_n and r_{n+1} = r_n - a_n A p_n
  3. p_{n+1} = A^T r_{n+1} + b_n p_n, where b_n = -(A A^T r_{n+1}, A p_n) / ||A p_n||^2
EndFor
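A minimal NumPy sketch of the CGNR algorithm as stated above; the test matrix, tolerance, and iteration cap are illustrative assumptions, and the Gram-Schmidt form of b_n coincides in exact arithmetic with the classical coefficient ||A^T r_{n+1}||^2 / ||A^T r_n||^2:

```python
import numpy as np

def cgnr(A, f, x0, tol=1e-10, max_iter=None):
    """CGNR: conjugate gradients on A^T A x = A^T f.
    Minimizes ||f - A x|| over the affine Krylov subspace."""
    x = x0.copy()
    r = f - A @ x            # residual of the original system
    p = A.T @ r              # initial direction p0 = A^T r0
    for _ in range(max_iter or 2 * len(f)):
        Ap = A @ p
        ATr = A.T @ r
        a = (ATr @ ATr) / (Ap @ Ap)        # a_n = ||A^T r_n||^2 / ||A p_n||^2
        x = x + a * p
        r = r - a * Ap
        ATr_new = A.T @ r
        if np.linalg.norm(ATr_new) < tol:
            break
        # Gram-Schmidt form of b_n, enforcing (A p_{n+1}, A p_n) = 0
        b = -((A @ ATr_new) @ Ap) / (Ap @ Ap)
        p = ATr_new + b * p
    return x

rng = np.random.default_rng(0)
N = 50
A = 3.0 * np.eye(N) + rng.standard_normal((N, N)) / np.sqrt(N)  # well conditioned, nonsymmetric
f = rng.standard_normal(N)
x = cgnr(A, f, np.zeros(N))
print(np.linalg.norm(f - A @ x))
```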
The CGNE method [3] applies CG to (2.2). Then x_n minimizes the norm of the error E(x_n) = ||x* - x_n||_2 over the same affine Krylov subspace as CGNR, and the resulting algorithm is the following.

The CGNE Algorithm.

r_0 = f - A x_0, p_0 = A^T r_0
For n = 0 Until Convergence Do
  1. a_n = (r_n, r_n) / (p_n, p_n)
  2. x_{n+1} = x_n + a_n p_n and r_{n+1} = r_n - a_n A p_n
  3. p_{n+1} = A^T r_{n+1} + b_n p_n, where b_n = -(A^T r_{n+1}, p_n) / ||p_n||^2
EndFor

Since the spectra of the matrices A A^T and A^T A are the same, we should expect the performance of CGNR and CGNE to be similar. However, CGNE minimizes the norm of the error and may yield better performance.
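A corresponding sketch of the CGNE algorithm; as with the CGNR sketch above, everything outside the update formulas is an illustrative choice:

```python
import numpy as np

def cgne(A, f, x0, tol=1e-10, max_iter=None):
    """CGNE (Craig's method): CG on A A^T y = f with x = A^T y.
    Minimizes the error ||x* - x_n|| over the same Krylov subspace as CGNR."""
    x = x0.copy()
    r = f - A @ x
    p = A.T @ r
    for _ in range(max_iter or 2 * len(f)):
        a = (r @ r) / (p @ p)              # a_n = ||r_n||^2 / ||p_n||^2
        x = x + a * p
        r = r - a * (A @ p)
        if np.linalg.norm(r) < tol:
            break
        ATr = A.T @ r
        b = -(ATr @ p) / (p @ p)           # enforces (p_{n+1}, p_n) = 0
        p = ATr + b * p
    return x
```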
The CGNE method was first proposed by E. J. Craig. The following bound can be obtained for the error E(x_n):

E(x_n) <= 2 [ (1 - 1/ρ) / (1 + 1/ρ) ]^n E(x_0),   (2.3)

where ρ = ||A||_2 ||A^{-1}||_2 is the condition number of the matrix.
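The bound (2.3) can be checked numerically. The following script (with an arbitrary, well-conditioned test matrix; all parameters are illustrative) runs the CGNE recurrence and prints the measured error next to the bound:

```python
import numpy as np

# Empirical check of (2.3): the CGNE error after n steps should satisfy
# E(x_n) <= 2 * ((1 - 1/rho)/(1 + 1/rho))**n * E(x_0), rho = cond_2(A).
rng = np.random.default_rng(1)
N = 40
A = 3.0 * np.eye(N) + rng.standard_normal((N, N)) / np.sqrt(N)
x_star = rng.standard_normal(N)
f = A @ x_star
rho = np.linalg.cond(A)

x = np.zeros(N)
r = f - A @ x
p = A.T @ r
for n in range(1, 11):
    a = (r @ r) / (p @ p)
    x = x + a * p
    r = r - a * (A @ p)
    ATr = A.T @ r
    p = ATr - ((ATr @ p) / (p @ p)) * p
    bound = 2.0 * ((1.0 - 1.0 / rho) / (1.0 + 1.0 / rho)) ** n * np.linalg.norm(x_star)
    print(n, np.linalg.norm(x_star - x), bound)
```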
3. THE NONLINEAR CGNR METHOD
In this section, we generalize the CGNR iteration to a nonlinear iteration which requires the solution of a scalar equation to determine the steplength. We then prove a global convergence result under the assumptions that the Hessian and the Jacobian are uniformly bounded.

Let F(x) be an operator mapping the Euclidean space R^N (or, even more generally, a real Hilbert space) into itself. The notation F'(x) and F''(x) will be used to denote the first and second Fréchet (or Gâteaux) derivatives, respectively. Also, for simplicity, F_n and F'_n will denote F(x_n) and F'(x_n), respectively. We seek to solve iteratively the nonlinear system of equations F(x) = 0. In the linear case F(x) = Ax - b and F'(x) = A. Assume that F'(x) and F''(x) exist at all x and that there exist scalars 0 < m <= M and 0 < B, independent of x, so that the following conditions are satisfied for any vectors x and u:

m^2 ||u||^2 <= ||F'(x) u||^2 <= M^2 ||u||^2   (3.1a)

and

||F''(x)(u, u)|| <= B ||u||^2.   (3.1b)
We note the following:

(i) The operator F'(x)^T F'(x) is symmetric, and (3.1a) is equivalent to

m^2 ||u||^2 <= (F'(x)^T F'(x) u, u) <= M^2 ||u||^2.

(ii) The left inequality in (3.1a) and the inverse function theorem for differential operators imply that the inverse F^{-1}(x) exists and is differentiable.
From the left inequality in (3.1a) and the inverse function theorem, we conclude that F^{-1}(x) exists and is differentiable at all x. We use the mean value theorem for the operator F^{-1}(x) to obtain, for some z on the segment joining F(x) and F(y),

(y - x, y - x) = ((F^{-1})'(z)(F(y) - F(x)), y - x).

Combining this with (3.1a), we obtain

||y - x||^2 <= (1/m) ||F(y) - F(x)|| ||y - x||.

This inequality implies that

m ||y - x|| <= ||F(y) - F(x)||,   (3.2)

and assumption (3.1a), with the mean value theorem applied directly to F, yields

||F(y) - F(x)|| <= M ||y - x||.   (3.3)
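The two-sided estimate (3.2)-(3.3) is easy to observe numerically. In the sketch below, F, J, the sampling region, and the sample size are illustrative assumptions; m and M are estimated from the extreme singular values of the Jacobian over the sampled points:

```python
import numpy as np

# Numerical illustration of (3.2)-(3.3):
#   m ||y - x|| <= ||F(y) - F(x)|| <= M ||y - x||,
# with m, M estimated from singular values of F'(z) over sample points.
F = lambda x: np.array([x[0] + x[1]**2 - 2.0, x[0]**3 + x[1] - 2.0])
J = lambda x: np.array([[1.0, 2.0 * x[1]], [3.0 * x[0]**2, 1.0]])

rng = np.random.default_rng(2)
pts = 0.5 + 0.5 * rng.random((200, 2))   # sample region around the root (1, 1)
svals = np.array([np.linalg.svd(J(z), compute_uv=False) for z in pts])
m, M = svals.min(), svals.max()          # local estimates of m and M

x, y = pts[0], pts[1]
d = np.linalg.norm(F(y) - F(x))
print(m * np.linalg.norm(y - x), d, M * np.linalg.norm(y - x))
```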
We next introduce a nonlinear generalization of the CGNR algorithm.

The Nonlinear CGNR Algorithm (Algorithm 3.1).
r_0 = -F(x_0), p_0 = F'_0^T r_0
For n = 0 Until Convergence Do
  1. Select the smallest positive c_n to minimize ||F(x_n + c p_n)||, c > 0
  2. x_{n+1} = x_n + c_n p_n and r_{n+1} = -F(x_{n+1})
  3. p_{n+1} = F'_{n+1}^T r_{n+1} + b_n p_n, where
     b_n = -(F'_{n+1} F'_{n+1}^T r_{n+1}, F'_{n+1} p_n) / ||F'_{n+1} p_n||^2
EndFor

The scalars c_n and b_n are defined to guarantee the orthogonality relations

(F'_{n+1}^T r_{n+1}, p_n) = 0   (3.4)

and

(F'_n p_n, F'_n p_{n-1}) = 0.   (3.5)
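A sketch of Algorithm 3.1 in NumPy/SciPy; the exact line search of Step 1 is approximated here by SciPy's bounded scalar minimizer, and F, J, the search bracket c_max, and the starting point are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nonlinear_cgnr(F, J, x, tol=1e-8, max_iter=200, c_max=5.0):
    """Sketch of the nonlinear CGNR iteration (Algorithm 3.1).
    Step 1 is replaced by a bounded 1-D minimization of ||F(x_n + c p_n)||."""
    r = -F(x)
    p = J(x).T @ r
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        f = lambda c: np.linalg.norm(F(x + c * p))
        c = minimize_scalar(f, bounds=(0.0, c_max), method="bounded").x
        x = x + c * p
        r = -F(x)
        Jn = J(x)
        Jp = Jn @ p
        JTr = Jn.T @ r
        b = -((Jn @ JTr) @ Jp) / (Jp @ Jp)   # enforces (F'_n p_{n+1}, F'_n p_n) = 0
        p = JTr + b * p
    return x

F = lambda x: np.array([x[0] + x[1]**2 - 2.0, x[0]**3 + x[1] - 2.0])
J = lambda x: np.array([[1.0, 2.0 * x[1]], [3.0 * x[0]**2, 1.0]])
print(nonlinear_cgnr(F, J, np.array([1.2, 0.8])))
```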
Under the assumptions (3.1), the following lemma holds.

LEMMA 3.1. Let {r_n} be the nonlinear residuals and {p_n} be the direction vectors in Algorithm 3.1. Then the following identities hold:

(i) (F'_n^T r_n, p_n) = ||F'_n^T r_n||^2
(ii) ||p_n||^2 = ||F'_n^T r_n||^2 + b_{n-1}^2 ||p_{n-1}||^2
(iii) ||F'_n p_n||^2 + b_{n-1}^2 ||F'_n p_{n-1}||^2 = ||F'_n F'_n^T r_n||^2
(iv) m ||r_n|| <= ||F'_n^T r_n|| <= ||p_n||
(v) ||p_n|| <= (M^2/m) ||r_n||
(vi) ||r_{n+1}|| <= ||r_n||.

PROOF. The orthogonality relations (3.4) and (3.5), combined with the definition of p_{n+1} in Algorithm 3.1, imply (i)-(iii). Equality (ii) and (3.1a) are used in proving the inequalities in (iv). Part (v) follows from (iii) and (3.1a), and part (vi) follows from the choice of c_n in Step 1. □
REMARK 3.1. Let f_n(c) denote (1/2) ||F(x_n + c p_n)||^2. Its first and second derivatives are

f'_n(c) = (F'(x_n + c p_n) p_n, F(x_n + c p_n))   (3.6)

and

f''_n(c) = ||F'(x_n + c p_n) p_n||^2 + (F''(x_n + c p_n)(p_n, p_n), F(x_n + c p_n)).   (3.7)

Combining (3.7) with the assumptions (3.1), and using ||F(x_n + c p_n)|| <= ||r_n|| for the steplengths of interest, yields the bounds

||p_n||^2 (m^2 - B ||r_n||) <= f''_n(c)   (3.8)

and

f''_n(c) <= ||p_n||^2 (M^2 + B ||r_n||) <= ||p_n||^2 (M^2 + B ||r_0||).   (3.9)
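Formula (3.6) can be verified directly with a central finite difference; the operator and the evaluation point below are arbitrary illustrative choices:

```python
import numpy as np

# Finite-difference check of the derivative formula (3.6):
#   f_n'(c) = (F'(x_n + c p_n) p_n, F(x_n + c p_n)).
F = lambda x: np.array([x[0] + x[1]**2 - 2.0, x[0]**3 + x[1] - 2.0])
J = lambda x: np.array([[1.0, 2.0 * x[1]], [3.0 * x[0]**2, 1.0]])
x, p, c, h = np.array([0.5, 0.5]), np.array([0.3, -0.2]), 0.7, 1e-6

fn = lambda c: 0.5 * np.linalg.norm(F(x + c * p))**2
analytic = (J(x + c * p) @ p) @ F(x + c * p)
numeric = (fn(c + h) - fn(c - h)) / (2.0 * h)
print(analytic, numeric)   # the two values should agree closely
```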
THEOREM 3.1. Under the assumptions (3.1) on the nonlinear operator F(x), the sequence x_n generated by Algorithm 3.1 is well-defined for any x_0; it converges to the unique solution x* of the nonlinear system F(x) = 0, and

m ||x* - x_n|| <= ||F(x_n)||.
PROOF. The proof is divided into four parts.

Firstly, we prove the existence of c_n in Step 1 of Algorithm 3.1. The derivative of the real function f_n at zero [because of Lemma 3.1 (i)] is

f'_n(0) = -||F'_n^T r_n||^2 < 0,

so there exists c > 0 such that ||F(x_n + c p_n)|| < ||r_n||. We must prove that there is a c > 0 such that f_n(0) < f_n(c); this would imply that f_n assumes a local minimum. From inequality (3.2), by inserting x = x_n and y = x_n + c p_n, we conclude that ||F(y)|| grows unboundedly as c -> infinity. This proves that there is a c > 0 such that f_n(0) < f_n(c).

Secondly, we obtain a lower bound on the steplength c_n. Taylor's expansion gives f'_n(c_n) = 0 = f'_n(0) + c_n f''_n(ξ_n), where ξ_n = t c_n for some t in (0, 1). We solve for c_n and then use the upper bound in inequality (3.9), (3.1a), and Lemma 3.1 (v) to obtain

c_n = ||F'_n^T r_n||^2 / f''_n(ξ_n) >= m^2 ||r_n||^2 / (||p_n||^2 (M^2 + B ||r_0||)) >= m^4 / (M^4 (M^2 + B ||r_0||)).

Thirdly, we prove that the sequence of residual norms decreases. For ξ = t c, for some t in [0, 1], we have

f_n(c) = f_n(0) + c f'_n(0) + (c^2/2) f''_n(ξ).   (3.10)

Now, by inserting c = ||F'_n^T r_n||^2 / (||p_n||^2 (M^2 + B ||r_n||)) in (3.10) and using (3.9), we obtain

f_n(c_n) <= f_n(c) <= f_n(0) - ||F'_n^T r_n||^4 / (2 ||p_n||^2 (M^2 + B ||r_n||)).

Now, using m^2 ||r_n||^2 <= ||F'_n^T r_n||^2 and Lemma 3.1 (v), we prove that the norm of the residual is reduced at each iteration by a constant factor that is less than one:

||r_{n+1}||^2 <= ||r_n||^2 [1 - m^6 / (M^4 (M^2 + B ||r_0||))].

This implies that ||r_n|| converges to zero.

Finally, we prove that the sequence of iterates converges to a unique solution of the nonlinear operator equation. By use of (3.2) with x = x_n and y = x_{n+k}, we obtain that the sequence x_n is a Cauchy sequence. Thus it converges to x*, and F(x*) = 0. The uniqueness and the error bound in the theorem statement follow from (3.2) with x = x_n and y = x*. □
4. THE NONLINEAR CGNE METHOD
Let us assume that (3.1a) and (3.1b) hold throughout this section. From Theorem 3.1, it follows that a unique solution of F(x) = 0 exists. Next, we introduce a nonlinear version of CGNE, and we prove a local convergence theorem.
The Nonlinear CGNE Algorithm (Algorithm 4.1).

r_0 = -F(x_0), p_0 = F'_0^T r_0
For n = 0 Until Convergence Do
  1. Select the smallest positive c_n to minimize ||x* - (x_n + c p_n)||, c > 0
  2. x_{n+1} = x_n + c_n p_n and r_{n+1} = -F(x_{n+1})
  3. p_{n+1} = F'_{n+1}^T r_{n+1} + b_n p_n, where b_n = -(F'_{n+1}^T r_{n+1}, p_n) / ||p_n||^2
EndFor
REMARK 4.0. The error function in Step 1 of the algorithm is not computable, because it uses the exact solution. However, it is possible to determine an approximation to c_n that guarantees local convergence of the algorithm to the solution. Let us denote the true error x* - x_n by e_n. The definitions of the scalars c_n and b_n imply the following two orthogonality relations:

(e_{n+1}, p_n) = 0   (4.1)

and

(p_{n+1}, p_n) = 0.   (4.2)
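Following Remark 4.0, a computable variant replaces the exact-error line search with an approximate steplength. The choice c_n = ||r_n||^2 / ||p_n||^2 used in the sketch below is suggested by the asymptotic estimates of Section 5; the operator and the starting point are illustrative, and convergence is only expected locally, as in Theorem 4.1:

```python
import numpy as np

def nonlinear_cgne(F, J, x, tol=1e-8, max_iter=200):
    """Sketch of a computable variant of Algorithm 4.1: the uncomputable
    exact-error line search is replaced by the asymptotic steplength
    c_n ~ ||r_n||^2 / ||p_n||^2 from the estimates of Section 5."""
    r = -F(x)
    p = J(x).T @ r
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        c = (r @ r) / (p @ p)
        x = x + c * p
        r = -F(x)
        JTr = J(x).T @ r
        b = -(JTr @ p) / (p @ p)   # enforces (p_{n+1}, p_n) = 0
        p = JTr + b * p
    return x

F = lambda x: np.array([x[0] + x[1]**2 - 2.0, x[0]**3 + x[1] - 2.0])
J = lambda x: np.array([[1.0, 2.0 * x[1]], [3.0 * x[0]**2, 1.0]])
print(nonlinear_cgne(F, J, np.array([1.1, 0.9])))   # start near the root (1, 1)
```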
LEMMA 4.1. Let {r_n} be the nonlinear residuals and {p_n} be the direction vectors in Algorithm 4.1. Then the following hold:

(i) (p_n, e_n) = (F'_n^T r_n, e_n)
(ii) ||p_n||^2 + b_{n-1}^2 ||p_{n-1}||^2 = ||F'_n^T r_n||^2
(iii) ||p_n|| <= M ||r_n||
(iv) m ||e_n|| <= ||r_n|| <= M ||e_n||
(v) ||e_{n+1}|| <= ||e_n||.

PROOF. Part (i) follows from relation (4.1) and the definition of p_n in Algorithm 4.1. We prove (ii) from the definition of p_{n+1} in Algorithm 4.1 and relation (4.2). Part (iii) follows from (ii) and (3.1a). By using inequalities (3.2) and (3.3) (at the beginning of Section 3) with y = x_n and x = x*, we prove (iv). Part (v) follows from the selection of c_n. □
Let f_n(c) denote the error function in Step 1. Since e_{n+1} = e_n - c_n p_n, we have

f_n(c) = ||e_n - c p_n||^2.   (4.3)

Its derivative is

f'_n(c) = -2 (e_n, p_n) + 2 c ||p_n||^2.   (4.4)

Using Lemma 4.1 (i), the minimizing steplength is

c_n = (p_n, e_n) / ||p_n||^2 = (F'_n^T r_n, e_n) / ||p_n||^2.   (4.5)
We next prove that, under the assumptions (3.1), Algorithm 4.1 converges locally to the unique solution of the nonlinear operator equation.
THEOREM 4.1. Under the assumptions (3.1), if x_0 is selected such that the initial error e_0 = x* - x_0 satisfies B ||e_0|| <= m, then Algorithm 4.1 is well-defined and converges to the unique solution x* of the nonlinear operator equation F(x) = 0.

PROOF. Firstly, we prove the existence of the nonlinear steplengths c_n. It suffices to prove that the first derivative of f_n(c) is negative at c = 0 and that its second derivative is positive on an interval [0, c_1). By using Lemma 4.1 (iv) and (v) and the assumption of the theorem (note that ||e_n|| <= ||e_0||, so B ||e_n|| <= m for all n), we prove the following inequalities:

|(r_n - F'_n e_n, e_n)| <= (B/2) ||e_n||^3   (4.7)

and

||p_n|| >= (m/2) ||r_n||.   (4.8)
This inequality shows that the second derivative of f_n(c), which equals 2 ||p_n||^2, is positive if convergence has not been reached (i.e., r_n /= 0); moreover, (4.7) implies (p_n, e_n) >= ||r_n||^2 / 2 > 0, so that f'_n(0) < 0.

Secondly, we obtain a lower bound on the nonlinear steplength c_n. Since f'_n(c_n) = 0, we use (4.5), (4.7), and Lemma 4.1 (iii) to obtain

c_n = (p_n, e_n) / ||p_n||^2 >= ||r_n||^2 / (2 ||p_n||^2) >= 1 / (2 M^2).

We insert c = 1/(2 M^2) in f_n(c) = ||e_n - c p_n||^2 and use Lemma 4.1 (iii) and (iv) to obtain the following error bound:

||e_{n+1}||^2 <= ||e_n||^2 (1 - m^2 / (4 M^2)).

This proves the local convergence of the iteration. □
5. ASYMPTOTIC STEPLENGTH ESTIMATES AND ERROR BOUNDS
In this section, we obtain asymptotic estimates of the steplengths c_n near the solution. We also obtain an asymptotic error factor estimate. We state these results only for Algorithm 4.1; similar results can be obtained for Algorithm 3.1. We first obtain asymptotic estimates of the steplengths c_n under the assumptions of Theorem 4.1.
PROPOSITION 5.1. Under the assumptions of Theorem 4.1, the steplengths c_n satisfy

(||r_n||^2 / ||p_n||^2) (1 - (B/(2 m^2)) ||r_n||) <= c_n <= (||r_n||^2 / ||p_n||^2) (1 + (B/(2 m^2)) ||r_n||).
PROOF. We will prove only the rightmost inequality; the leftmost inequality is proved similarly. From equality (4.5), inequality (4.7), and Lemma 4.1 (iv), we obtain

c_n = (p_n, e_n) / ||p_n||^2 <= (||r_n||^2 + (B/2) ||e_n||^2 ||r_n||) / ||p_n||^2 <= (||r_n||^2 / ||p_n||^2) (1 + (B/(2 m^2)) ||r_n||). □
PROPOSITION 5.2. Under the assumptions of Theorem 4.1, we obtain the following inequality on the errors:

||e_{n+1}||^2 <= ||e_n||^2 [1 - (m^2/M^2)(1 - q_n)^2], where q_n = (B/(2 m^2)) ||r_n||.

PROOF. We note that from (4.1) and (4.5) we obtain

||e_{n+1}||^2 = ||e_n||^2 - c_n (p_n, e_n).   (5.1)

Now, using equality (4.5), inequality (4.7), and Lemma 4.1 (iv), we obtain

(p_n, e_n) >= ||r_n||^2 (1 - q_n),   (5.2)

so that, using Lemma 4.1 (iii) and (iv),

c_n (p_n, e_n) = (p_n, e_n)^2 / ||p_n||^2 >= ||r_n||^4 (1 - q_n)^2 / (M^2 ||r_n||^2) >= (m^2/M^2) ||e_n||^2 (1 - q_n)^2.

Substituting in (5.1) proves the proposition. □

As r_n -> 0 we have q_n -> 0, so the asymptotic error reduction factor per step is 1 - m^2/M^2.
6. CONCLUSIONS
We have presented and analyzed nonlinear generalizations of the CGNR and CGNE methods. These nonlinear methods apply to nonlinear operator equations with nonsymmetric Jacobians. We show that, under certain uniform boundedness assumptions on the Jacobians and Hessians, the nonlinear CGNR method is guaranteed to converge globally to a unique solution. For the nonlinear CGNE method, under the same assumptions as for CGNR, we prove local convergence results and give asymptotic error bound estimates. These results extend the work of other authors [4, 8] to nonlinear methods for nonsymmetric Jacobians.

This research was partially supported by NSF under grant CCR-8722260.
A conference proceedings version of this paper appeared in the proceedings of the S.E. Conference on Theory of Approximation, March 1991, Marcel-Dekker, Pure and Applied Mathematics, Vol. 138, edited by George A. Anastassiou.
REFERENCES

1. S. F. Ashby, T. A. Manteuffel, and P. E. Saylor, A taxonomy for conjugate gradient methods, SIAM J. Numer. Anal. 27:1542-1568 (1990).
2. P. N. Brown, A local convergence theory for combined inexact-Newton/finite-difference projection methods, SIAM J. Numer. Anal. 24:407-434 (1987).
3. E. J. Craig, The N-step iteration procedures, J. Math. Phys. 34:64-73 (1955).
4. J. W. Daniel, The Approximate Minimization of Functionals, Prentice-Hall, Englewood Cliffs, NJ, 1971.
5. R. S. Dembo, S. C. Eisenstat, and T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal. 19:400-408 (1982).
6. S. C. Eisenstat, H. C. Elman, and M. H. Schultz, Variational iterative methods for nonsymmetric systems of linear equations, SIAM J. Numer. Anal. 20:345-357 (1983).
7. R. Fletcher, Practical Methods of Optimization, Wiley, Chichester, 1980.
8. R. Fletcher and C. M. Reeves, Function minimization by conjugate gradients, Comput. J. 7:149-154 (1964).
9. G. H. Golub, The convergence of a two-stage iteration for solving nonlinear systems, BIT, 209-219.
10. R. Kannan, On general operator equations and variational methods.
11. M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards 49:409-436 (1952).
12. M. Z. Nashed, The convergence of the method of steepest descents for nonlinear operator equations with variational boundary conditions, J. Math. Mech. 13:765-794 (1964).
13. M. Z. Nashed, On general iterative methods for the solutions of a class of nonlinear operator equations, Math. Comp. 19:14-24 (1965).
14. T. L. Saaty, Modern Nonlinear Equations, Dover Publications, New York, 1981.
15. D. P. O'Leary, A discrete Newton algorithm for minimizing a function of many variables, Math. Programming 23:20-33 (1982).