
Iterative Methods for Nonlinear Operator Equations

A. T. Chronopoulos
Department of Computer Science, University of Minnesota at Minneapolis, Minneapolis, Minnesota 55455

and

Z. Zlatev
Air Pollution Laboratory, Danish Agency of Environmental Protection, Risoe National Laboratory, DK-4000 Roskilde, Denmark

ABSTRACT

A nonlinear conjugate gradient method has been introduced and analyzed by J. W. Daniel. This method applies to nonlinear operators with symmetric Jacobians. The conjugate gradient method applied to the normal equations can be used to approximate the solution of general nonsymmetric linear systems of equations if the condition number of the coefficient matrix is small. In this article, we obtain nonlinear generalizations of this method which apply directly to nonlinear operator equations. Under conditions on the Jacobian and the Hessian of the operators, we prove that these methods converge to a unique solution. Error bounds and local convergence results are also obtained.

1. INTRODUCTION

Nonlinear systems of equations often arise when solving initial or boundary value problems in ordinary or partial differential equations. We consider the nonlinear system of equations

F(x) = 0,        (1.1)

where F(x) is a nonlinear operator from a real Euclidean space of dimension N (or a Hilbert space) into itself. The Newton method coupled with Gaussian elimination is an efficient way to solve such nonlinear systems when the dimension of the Jacobian is small.
When the Jacobian is large and sparse, some kind of iterative method may be used. This can be a nonlinear iteration (for example, functional iteration for contractive operators) or an inexact Newton method, in which the solution of the resulting linear systems is approximated by a linear iterative method (cf. [15], [6]). Nonlinear steepest descent methods for the minimal residual and normal equations have been studied by many authors (cf. [12] and [14]). R. Fletcher and C. M. Reeves [8], and J. W. Daniel [4], have obtained a nonlinear conjugate gradient method that converges if the Jacobian is symmetric and uniformly positive definite. These nonlinear methods reduce to the standard conjugate gradient methods for linear systems. They are based on an exact line search at each iteration and thus must solve a scalar nonlinear minimization problem in order to determine the steplengths. Several authors have suggested inexact line searches and have given conditions under which these methods still converge [8]. This is done to avoid solving exactly the scalar minimization problem, whose derivative evaluation involves evaluation of the nonlinear operator.

The conjugate gradient method applied to the normal equations can be used to solve nonsymmetric linear systems iteratively when the condition number of the Jacobian is small. Some preconditioning applied to the original linear system can be used to achieve this goal. Two algorithms exist for the conjugate gradient method applied to the normal equations: CGNR [1, 11] and CGNE [1, 3] (or Craig's method).

In this article we obtain nonlinear extensions of the conjugate gradient methods applied to the normal equations. We assume that the Jacobian and the Hessian of the nonlinear operator are uniformly bounded. We prove global convergence and local convergence results for the nonlinear algorithms. We also give asymptotic steplength estimates and error bounds; these steplengths can be used in implementing the methods. In Section 2, we review the CGNR and CGNE methods. In Section 3, we derive a nonlinear CGNR method and prove global convergence. In Section 4, we derive a nonlinear CGNE method and prove local convergence. In Section 5, we obtain asymptotic steplength and error estimates.

2. THE CONJUGATE GRADIENT APPLIED TO THE NORMAL EQUATIONS

Let us consider the system of linear equations Ax = f, where A is a nonsingular, nonsymmetric matrix of order N. This system can be solved via either of the two normal equations systems:

A^T A x = A^T f        (2.1)

or

A A^T y = f,   x = A^T y.        (2.2)

Since both A^T A and A A^T have the same spectrum, we can apply CG to either system to obtain an approximate solution of Ax = f. The CGNR method [11] applies CG to (2.1). Then x_n minimizes the norm of the residual error E(x_n) = ||f - A x_n||_2 over the affine Krylov subspace

x_0 + span{ A^T r_0, (A^T A) A^T r_0, ..., (A^T A)^{n-1} A^T r_0 },

and the resulting algorithm is the following.

ALGORITHM 2.1. The CGNR algorithm.

Initial vector x_0
r_0 = f - A x_0,  p_0 = A^T r_0
For n = 0 Until Convergence Do
  1. a_n = (A^T r_n, A^T r_n) / (A p_n, A p_n)
  2. x_{n+1} = x_n + a_n p_n  and  r_{n+1} = r_n - a_n A p_n
  3. p_{n+1} = A^T r_{n+1} + b_n p_n,  where  b_n = -(A A^T r_{n+1}, A p_n) / ||A p_n||^2
EndFor.
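As an illustration only (not part of the original paper), a minimal NumPy sketch of Algorithm 2.1 follows. The routine name cgnr and its arguments are assumptions made for this example; the updates mirror the three steps above.

```python
import numpy as np

def cgnr(A, f, x0, tol=1e-10, max_iter=200):
    """Sketch of Algorithm 2.1 (CGNR) for a nonsingular, nonsymmetric matrix A."""
    x = x0.copy()
    r = f - A @ x                          # r_0 = f - A x_0
    p = A.T @ r                            # p_0 = A^T r_0
    for _ in range(max_iter):
        Ap = A @ p
        ATr = A.T @ r
        a = np.dot(ATr, ATr) / np.dot(Ap, Ap)          # Step 1
        x = x + a * p                                  # Step 2
        r = r - a * Ap
        if np.linalg.norm(r) < tol:
            break
        ATr_new = A.T @ r
        b = -np.dot(A @ ATr_new, Ap) / np.dot(Ap, Ap)  # Step 3: makes (A p_{n+1}, A p_n) = 0
        p = ATr_new + b * p
    return x
```

Because the iteration works implicitly with A^T A, its effective condition number is the square of that of A, which is one reason the text restricts these methods to well-conditioned (or preconditioned) problems.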

The CGNE method [3] applies CG to (2.2). Then x_n minimizes the norm of the error E(x_n) = ||x* - x_n||_2 over the same affine Krylov subspace as CGNR, and the resulting algorithm is the following.

ALGORITHM 2.2. The CGNE algorithm.

Initial vector x_0
r_0 = f - A x_0,  p_0 = A^T r_0
For n = 0 Until Convergence Do
  1. a_n = (r_n, r_n) / (p_n, p_n)
  2. x_{n+1} = x_n + a_n p_n  and  r_{n+1} = r_n - a_n A p_n
  3. p_{n+1} = A^T r_{n+1} + b_n p_n,  where  b_n = -(A^T r_{n+1}, p_n) / ||p_n||^2
EndFor.
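A matching sketch of Algorithm 2.2 (again our illustration, with assumed names) differs from the CGNR sketch only in the two scalars:

```python
import numpy as np

def cgne(A, f, x0, tol=1e-10, max_iter=200):
    """Sketch of Algorithm 2.2 (CGNE, Craig's method)."""
    x = x0.copy()
    r = f - A @ x
    p = A.T @ r
    for _ in range(max_iter):
        a = np.dot(r, r) / np.dot(p, p)        # Step 1
        x = x + a * p                          # Step 2
        r = r - a * (A @ p)
        if np.linalg.norm(r) < tol:
            break
        ATr = A.T @ r
        b = -np.dot(ATr, p) / np.dot(p, p)     # Step 3: makes (p_{n+1}, p_n) = 0
        p = ATr + b * p
    return x
```

In exact arithmetic the b_n used here equals the classical Craig update ||r_{n+1}||^2/||r_n||^2; the orthogonality form is kept because it is the one that carries over to the nonlinear Algorithm 4.1 below.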

Since the spectra of the matrices A A^T and A^T A are the same, we should expect the performance of CGNR and CGNE to be similar. However, CGNE minimizes the norm of the error and may yield better performance.

The CGNE method is sometimes called Craig's method because it was first proposed by E. J. Craig. The following bound can be obtained [5] for the error functional E(x):

E(x_n) ≤ 2 [ (1 - 1/ρ) / (1 + 1/ρ) ]^n E(x_0),        (2.3)

where ρ = ||A||_2 ||A^{-1}||_2 is the condition number of the matrix.
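As a rough illustrative calculation (the numbers are ours, not the authors'): for ρ = 10 the contraction factor in (2.3) is (1 - 0.1)/(1 + 0.1) ≈ 0.82, so the bound guarantees roughly a tenfold error reduction about every 12 iterations, while for ρ = 100 the factor is about 0.98 and the guaranteed progress per iteration becomes very slow. This is why the introduction restricts these methods to problems whose condition number is small, possibly after preconditioning.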

3. THE NONLINEAR CGNR METHOD

In this section, we generalize the CGNR iteration to a nonlinear iteration which requires the solution of a scalar equation to determine the steplength. We then prove a global convergence result under the assumptions that the Hessian and the Jacobian are uniformly bounded.

Let F(x) be an operator mapping the Euclidean space R^N (or, more generally, a real Hilbert space) into itself. The notation F'(x) and F''(x) will be used to denote the Frechet and Gateaux derivatives, respectively. Also, for simplicity, F'_n and F''_n will denote F'(x_n) and F''(x_n), respectively. We seek to solve iteratively the nonlinear system of equations F(x) = 0. In the linear case F(x) = Ax - b and F'(x) = A.

Assume that F'(x) and F''(x) exist at all x and that there exist scalars 0 < m < M and 0 < B, independent of x, so that the following conditions are satisfied for any vectors x and u:

m^2 ||u||^2 ≤ (F'(x)^T F'(x) u, u) ≤ M^2 ||u||^2        (3.1a)

||F''(x)|| ≤ B.        (3.1b)

REMARK 3.0.

(i) The symmetric positive definite operators F'(x)^T F'(x) and F'(x) F'(x)^T have the same eigenvalues. Thus, the following inequality also holds:

m^2 ||u||^2 ≤ (F'(x) F'(x)^T u, u) ≤ M^2 ||u||^2.

(ii) The left inequality in (3.1a) and the inverse function theorem for differential operators imply that the inverse F^{-1}(x) exists and is differentiable.


From the left inequality in (3.1a) and the inverse function theorem, we conclude that F^{-1}(x) exists and is differentiable at all x. We use the mean value theorem for the operator F^{-1}(x) to obtain the following equation:

(y - x, y - x) = (F'^{-1}(ξ)(F(y) - F(x)), y - x),

for some intermediate point ξ. Combining this with (3.1a) we obtain

||y - x||^2 ≤ (1/m) ||F(y) - F(x)|| ||y - x||.

This inequality implies that

m ||y - x|| ≤ ||F(y) - F(x)||.        (3.2)

By use of the mean value theorem for the operator F(x) and assumption (3.1a), we obtain the following inequality:

||F(y) - F(x)|| ≤ M ||y - x||.        (3.3)

Under assumptions (3.1), we consider the following nonlinear generalization of CGNR.

ALGORITHM 3.1. The Nonlinear CGNR Algorithm.

Initial vector x_0
r_0 = -F(x_0),  p_0 = F'_0^T r_0
For n = 0 Until Convergence Do
  1. Select the smallest positive c_n which minimizes ||F(x_n + c p_n)||, c > 0
  2. x_{n+1} = x_n + c_n p_n  and  r_{n+1} = -F(x_{n+1})
  3. p_{n+1} = F'_{n+1}^T r_{n+1} + b_n p_n,  where  b_n = -(F'_{n+1} F'_{n+1}^T r_{n+1}, F'_{n+1} p_n) / ||F'_{n+1} p_n||^2
EndFor

The scalars c_n and b_n are defined to guarantee the following two orthogonality relations:

(r_n, F'_n p_{n-1}) = 0        (3.4)

and

(F'_n p_n, F'_n p_{n-1}) = 0.        (3.5)
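The following Python sketch is our illustration of Algorithm 3.1, not code from the paper. The names nonlinear_cgnr, F, jac, and the crude sampling line search line_min are all assumptions; an actual implementation would use a more careful scalar minimization for Step 1 (a Newton-type alternative based on Remark 3.1 is sketched later in this section).

```python
import numpy as np

def line_min(phi, c_max=1.0, samples=50):
    """Crude sampling minimizer for the scalar function phi(c) on (0, c_max].
    A stand-in for the exact line search of Step 1; not from the paper."""
    cs = np.linspace(c_max / samples, c_max, samples)
    vals = [phi(c) for c in cs]
    return cs[int(np.argmin(vals))]

def nonlinear_cgnr(F, jac, x0, tol=1e-8, max_iter=100):
    """Sketch of Algorithm 3.1; F(x) returns the residual vector and
    jac(x) returns the Jacobian F'(x) as a dense array."""
    x = x0.copy()
    r = -F(x)
    p = jac(x).T @ r                                   # p_0 = F'_0^T r_0
    for _ in range(max_iter):
        c = line_min(lambda s: np.linalg.norm(F(x + s * p)))   # Step 1 (approximate)
        x = x + c * p                                  # Step 2
        r = -F(x)
        if np.linalg.norm(r) < tol:
            break
        J = jac(x)
        Jp = J @ p
        JTr = J.T @ r
        b = -np.dot(J @ JTr, Jp) / np.dot(Jp, Jp)      # Step 3: enforces (3.5)
        p = JTr + b * p
    return x
```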

Under the assumptions (3.1), the following lemma holds.

LEMMA 3.1. Let {r_n} be the nonlinear residuals and {p_n} be the direction vectors in Algorithm 3.1. Then the following identities hold:

(i)   (r_n, F'_n p_n) = ||F'_n^T r_n||^2
(ii)  ||p_n||^2 = ||F'_n^T r_n||^2 + b_{n-1}^2 ||p_{n-1}||^2
(iii) ||F'_n F'_n^T r_n||^2 = ||F'_n p_n||^2 + b_{n-1}^2 ||F'_n p_{n-1}||^2
(iv)  m ||r_n|| ≤ ||F'_n^T r_n|| ≤ ||p_n||
(v)   ||p_n|| ≤ (M^2/m) ||r_n||
(vi)  ||r_{n+1}|| ≤ ||r_n||.

PROOF. The orthogonality relations (3.4) and (3.5), combined with Step 3 of Algorithm 3.1, imply (i)-(iii). Equality (ii) and (3.1a) are used in proving the inequalities in (iv) as follows:

m ||r_n|| ≤ ||F'_n^T r_n|| ≤ ||p_n||.

Equality (iii) and (3.1a) are used in proving inequality (v) as follows:

m ||p_n|| ≤ ||F'_n p_n|| ≤ ||F'_n F'_n^T r_n|| ≤ M^2 ||r_n||.

Inequality (vi) follows from the definition of c_n.  □

REMARK 3.1. Let f_n(c) denote the scalar function (1/2)||F(x_n + c p_n)||^2. Its first and second derivatives are given by:

f'_n(c) = (F'(x_n + c p_n) p_n, F(x_n + c p_n))        (3.6)

f''_n(c) = ||F'(x_n + c p_n) p_n||^2 + (F''(x_n + c p_n)(p_n, p_n), F(x_n + c p_n)).        (3.7)

The following upper and lower bounds on f''_n(c) can be computed from (3.7), the assumptions (3.1), and Lemma 3.1 (vi):

||p_n||^2 (m^2 - B ||r_n||) = m^2 ||p_n||^2 - B ||p_n||^2 ||r_n|| ≤ f''_n(c)        (3.8)

f''_n(c) ≤ ||p_n||^2 (M^2 + B ||r_n||).        (3.9)
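The derivatives (3.6) and (3.7) can drive a Newton iteration for the scalar equation f'_n(c) = 0, which is one way to implement Step 1 of Algorithm 3.1. The sketch below is only an illustration under our own naming assumptions: the term involving F'' in (3.7) is approximated by a finite difference of F'(x)p, and the halving step is a safeguard we add because (3.8) only guarantees f''_n > 0 when B||r_n|| < m^2.

```python
import numpy as np

def newton_steplength(F, jac, x, p, c0=1.0, iters=5, eps=1e-6):
    """Approximate the positive root of f'_n(c) = 0 for f_n(c) = 0.5*||F(x + c*p)||^2.
    Uses Eq. (3.6) exactly and approximates Eq. (3.7) by finite differences."""
    c = c0
    for _ in range(iters):
        y = x + c * p
        Fy = F(y)
        Jp = jac(y) @ p                              # F'(x_n + c p_n) p_n
        fp = np.dot(Jp, Fy)                          # first derivative, Eq. (3.6)
        # second derivative, Eq. (3.7); F''(y)(p, p) ~ d/dc [F'(y) p] by differencing
        Jp_eps = jac(y + eps * p) @ p
        fpp = np.dot(Jp, Jp) + np.dot((Jp_eps - Jp) / eps, Fy)
        if fpp <= 0.0:                               # safeguard when curvature is not positive
            c *= 0.5
            continue
        c = max(c - fp / fpp, 1e-12)                 # keep the steplength positive
    return c
```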
We next prove that, under assumptions (3.1), the nonlinear CGNR iteration converges globally to a unique solution.

THEOREM 3.1. Under the assumptions (3.1) on the nonlinear operator F(x), the sequence x_n generated by Algorithm 3.1 is well-defined for any x_0, it converges to a unique solution x* of the nonlinear system F(x) = 0, and

||x_n - x*|| ≤ (1/m) ||F(x_n)||.

PROOF. The proof is divided into four parts.

Firstly, we prove the existence of c_n in Step 1 of Algorithm 3.1. The derivative of the real function f_n at zero [because of Lemma 3.1 (i)] is

f'_n(0) = -||F'_n^T r_n||^2 < 0,

so there exists c > 0 such that ||F(x_n + c p_n)|| < ||r_n||. We must also prove that there is a c > 0 such that f_n(0) < f_n(c); this implies that there exists c_n > 0 where f_n(c) assumes a local minimum. From inequality (3.2), by inserting x = x_n and y = x_n + c p_n, we conclude that ||F(y)|| grows unbounded as c → ∞. This proves that there is a c > 0 such that f_n(0) < f_n(c).

Secondly, we obtain a lower bound on the steplength c_n. Taylor's expansion gives f'_n(c_n) = 0 = f'_n(0) + c_n f''_n(ξ_n), where ξ_n = t_n c_n for some t_n in (0, 1). We solve for c_n. We then use the upper bound in inequality (3.9), (3.1a), and Lemma 3.1 (v) to obtain

c_n ≥ ||F'_n^T r_n||^2 / (||p_n||^2 (M^2 + B||r_n||)) ≥ m^2 ||r_n||^2 / (||p_n||^2 (M^2 + B||r_0||)) ≥ m^4 / (M^4 (M^2 + B||r_0||)).        (3.10)

Thirdly, we prove that the sequence of residual norms decreases to zero. For ĉ = t c, for some t in [0, 1], we have

||F(x_n + c p_n)||^2 = ||r_n||^2 - 2c ||F'_n^T r_n||^2 + c^2 f''_n(ĉ) ≤ ||r_n||^2 - 2c ||F'_n^T r_n||^2 + c^2 ||p_n||^2 (M^2 + B||r_n||).

Now, by inserting c = ||F'_n^T r_n||^2 / (||p_n||^2 (M^2 + B||r_n||)) we obtain

||r_{n+1}||^2 ≤ ||r_n||^2 - ||F'_n^T r_n||^4 / (||p_n||^2 (M^2 + B||r_n||)).

Now, using m^2 ||r_n||^2 ≤ ||F'_n^T r_n||^2 [from Lemma 3.1 (iv)] we obtain

||r_{n+1}||^2 ≤ ||r_n||^2 [ 1 - m^2 ||F'_n^T r_n||^2 / (||p_n||^2 (M^2 + B||r_n||)) ].

If we substitute the fraction term in the square brackets by the leftmost term in (3.10), we prove that the norm of the residual is reduced at each iteration by a constant factor that is less than one. This implies that ||r_n|| converges to zero.

Finally, we prove that the sequence of iterates converges to a unique solution of the nonlinear operator equation. By use of (3.2) with x = x_n and y = x_{n+k}, we obtain that the sequence x_n is a Cauchy sequence. Thus, it converges to x* and F(x*) = 0. The uniqueness and the error bound inequality in the theorem statement follow from (3.2) with x = x_n and y = x*.  □

4. THE NONLINEAR CGNE METHOD

Let us assume that (3.1a) and (3.1b) hold in this section. From Theorem 3.1, it follows that a unique solution of F(x) = 0 exists. Next, we introduce a nonlinear version of CGNE, and we prove a local convergence theorem.

ALGORITHM 4.1. The Nonlinear CGNE Algorithm.

Initial vector x_0
r_0 = -F(x_0),  p_0 = F'_0^T r_0
For n = 0 Until Convergence Do
  1. Select the smallest positive c_n which minimizes ||x* - (x_n + c p_n)||, c > 0
  2. x_{n+1} = x_n + c_n p_n  and  r_{n+1} = -F(x_{n+1})
  3. p_{n+1} = F'_{n+1}^T r_{n+1} + b_n p_n,  where  b_n = -(F'_{n+1}^T r_{n+1}, p_n) / ||p_n||^2
EndFor
REMARK 4.0. The error function in Step 1 of the algorithm is not computable because it uses the exact solution. However, it is possible to determine an approximation to c_n that guarantees local convergence of the algorithm to the solution.

Let us denote the true error x* - x_n by e_n. The definitions of the scalars c_n and b_n imply the following two orthogonality relations:

(e_{n+1}, p_n) = 0        (4.1)

and

(p_{n+1}, p_n) = 0.        (4.2)

Under the assumptions (3.1) the following lemma holds for Algorithm 4.1.

LEMMA 4.1. Let {r_n} be the nonlinear residuals and {p_n} be the direction vectors in Algorithm 4.1. Then the following identities hold:

(i)   (p_n, e_n) = (F'_n^T r_n, e_n)
(ii)  ||p_n||^2 = ||F'_n^T r_n||^2 + b_{n-1}^2 ||p_{n-1}||^2
(iii) ||p_n|| ≤ M ||r_n||
(iv)  M ||e_n|| ≥ ||r_n|| ≥ m ||e_n||
(v)   ||e_{n+1}|| ≤ ||e_n||.

PROOF. Part (i) follows from relation (4.1) and Step 3 of Algorithm 4.1. We prove (ii) from Step 3 of Algorithm 4.1 and relation (4.2). Part (iii) follows from (ii) and (3.1). By using inequalities (3.2) and (3.3) (at the beginning of Section 3) with y = x_n and x = x*, we prove (iv). Part (v) follows from the selection of c_n.  □
REMARK 4.1. Let us denote by f̂_n(c) the scalar function f̂_n(c) = (1/2)||x* - (x_n + c p_n)||^2 = (1/2)||e_n - c p_n||^2. The first and second derivatives are:

f̂'_n(c) = c ||p_n||^2 - (p_n, e_n),   f̂''_n(c) = ||p_n||^2.

We expand F(x*) = 0 in a Taylor series around x_n to obtain

r_n = F'_n e_n + F''(x̄)(e_n, e_n),   where x̄ = x_n + t e_n for some t in (0, 1).        (4.3)

Using Lemma 4.1 (i) and (4.3), we obtain

(p_n, e_n) = ||r_n||^2 - (r_n, F''(x̄)(e_n, e_n))        (4.4)

and

f̂'_n(c) = c ||p_n||^2 - ||r_n||^2 + (r_n, F''(x̄)(e_n, e_n)).        (4.5)

We next prove that, under the assumptions (3.1), Algorithm 4.1 converges locally to the unique solution of the nonlinear operator equation.

THEOREM 4.1. Assume that conditions (3.1) hold. Also, assume that x_0 is selected such that ||F(x_0)|| ≤ m^2/(2B). Then the sequence x_n generated by Algorithm 4.1 is well-defined and converges to the unique solution x* of the nonlinear operator equation F(x) = 0.

PROOF. Firstly, we prove the existence of the nonlinear steplengths c_n. It suffices to prove that the first derivative of f̂_n(c) is negative at c = 0 and that its second derivative is positive in an interval [0, c_1). By using Lemma 4.1 (iv) and (v) and the assumption of the theorem, we prove the following inequality:

|(r_n, F''(x̄)(e_n, e_n))| ≤ B ||r_n|| ||e_n||^2 ≤ (B/m) ||r_n||^2 ||e_n|| ≤ (1/2) ||r_n||^2.        (4.6)

Combining (4.5) and (4.6), we conclude that f̂'_n(0) is negative. Also, (4.3) and (4.6) imply that

|f̂'_n(0)| = |(e_n, p_n)| ≥ (1/2) ||r_n||^2.        (4.7)

Now, using Lemma 4.1 (iv), we obtain

||p_n|| ≥ (m/2) ||r_n||.        (4.8)

This inequality shows that the second derivative of f̂_n(c) is positive if convergence has not been reached (i.e., r_n ≠ 0).

Secondly, we obtain a lower bound on the nonlinear steplength c_n. Since f̂'_n(c_n) = 0, we use (4.5) and (4.7) to obtain

c_n = (p_n, e_n) / ||p_n||^2 ≥ ||r_n||^2 / (2 ||p_n||^2) ≥ 1/(2M^2).

We insert c = 1/(2M^2) in f̂_n(c) = (1/2)||e_n - c p_n||^2 and use Lemma 4.1 (iii) and (iv) to obtain the following error bound:

||e_{n+1}||^2 ≤ 2 f̂_n(c) = ||e_n||^2 - (p_n, e_n)/M^2 + ||p_n||^2/(4M^4) ≤ ||e_n||^2 - ||r_n||^2/(4M^2) ≤ ||e_n||^2 (1 - m^2/(4M^2)).

This proves the convergence of the iteration and also gives a bound on the factor of the linear convergence.  □

5. ASYMPTOTIC STEPLENGTH ESTIMATES AND ERROR BOUNDS

In this section, we obtain asymptotic estimates of the steplengths c_n near the solution. We also obtain an asymptotic error factor estimate. We obtain these results only for Algorithm 4.1; similar results can be obtained for Algorithm 3.1. We next obtain asymptotic estimates of the steplengths c_n under the assumptions of Theorem 4.1.

PROPOSITION 5.1.

||r_n||^2 / (||p_n||^2 (1 + ε_n)) ≤ c_n ≤ ||r_n||^2 / (||p_n||^2 (1 - ε_n)),

where ε_n = B ||r_n|| ||e_n|| / ||p_n||^2.


PROOF. We will prove only the rightmost inequality; the leftmost inequality is proved similarly. From equality (4.5) and f̂'_n(c_n) = 0, we obtain the following inequality:

c_n ≤ ||r_n||^2 / (||p_n||^2 - B ||r_n|| ||e_n||) = ||r_n||^2 / (||p_n||^2 (1 - ε_n)),

where ε_n is the fraction B ||r_n|| ||e_n|| / ||p_n||^2. Now, we use Lemma 4.1 (iv) and inequality (4.8) to obtain

ε_n ≤ 4B ||e_n|| / (m^2 ||r_n||) ≤ 4B / m^3.  □

We next obtain an asymptotic error bound for the iterates in Algorithm 4.1.

PROPOSITION 5.2. Under the assumptions of Theorem 4.1, we obtain the following inequality on the errors:

||e_{n+1}||^2 ≤ ||e_n||^2 d_n,

where

d_n = 1 - m^2/(M^2 (1 + ε_n)) + α_n   and   α_n = 2B ||r_n|| / m^2.

PROOF. We note that by using relation (4.1) and Lemma 4.1 (i) we obtain:

||e_{n+1}||^2 = (e_{n+1}, e_n - c_n p_n) = (e_{n+1}, e_n) = ||e_n||^2 - c_n (p_n, e_n) = ||e_n||^2 - c_n (r_n, F'_n e_n).

Now, using equality (4.4) and Lemma 4.1 (iv), we obtain:

||e_{n+1}||^2 ≤ ||e_n||^2 - c_n ||r_n||^2 + c_n B ||r_n|| ||e_n||^2.        (5.1)

Using Proposition 5.1 and Lemma 4.1 (iii), we prove the following inequality:

c_n ||r_n||^2 ≥ ||r_n||^4 / (||p_n||^2 (1 + ε_n)) ≥ (m^2/M^2) ||e_n||^2 / (1 + ε_n).        (5.2)

Now, using (5.2) in (5.1), we obtain:

||e_{n+1}||^2 ≤ ||e_n||^2 [1 - m^2/(M^2 (1 + ε_n))] + c_n B ||r_n|| ||e_n||^2.

The last term in this inequality is less than (2B ||r_n|| / m^2) ||e_n||^2, so that

||e_{n+1}||^2 ≤ ||e_n||^2 [1 - m^2/(M^2 (1 + ε_n)) + α_n],   where α_n = 2B ||r_n|| / m^2.  □
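Putting Algorithm 4.1 and Proposition 5.1 together suggests an implementable variant in which the non-computable exact steplength of Step 1 is replaced by its asymptotic estimate ||r_n||^2/||p_n||^2. The sketch below is our illustration of that idea under assumed names (nonlinear_cgne, F, jac); it is not the authors' code.

```python
import numpy as np

def nonlinear_cgne(F, jac, x0, tol=1e-8, max_iter=100):
    """Sketch of Algorithm 4.1 with the steplength estimate of Proposition 5.1,
    c_n ~ ||r_n||^2 / ||p_n||^2, in place of the exact error minimization."""
    x = x0.copy()
    r = -F(x)
    p = jac(x).T @ r                          # p_0 = F'_0^T r_0
    for _ in range(max_iter):
        c = np.dot(r, r) / np.dot(p, p)       # asymptotic steplength estimate
        x = x + c * p
        r = -F(x)
        if np.linalg.norm(r) < tol:
            break
        JTr = jac(x).T @ r
        b = -np.dot(JTr, p) / np.dot(p, p)    # enforces (p_{n+1}, p_n) = 0, Eq. (4.2)
        p = JTr + b * p
    return x
```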

6. CONCLUSIONS

We have presented and analyzed nonlinear generalizations of the CGNR and CGNE methods. These nonlinear methods apply to nonlinear operator equations with nonsymmetric Jacobians. We show that, under certain uniform assumptions on the Jacobian and Hessian, the nonlinear CGNR method is guaranteed to converge globally to a unique solution. For the nonlinear CGNE method, under the same assumptions as for CGNR, we prove local convergence results and give asymptotic error bound estimates. These results extend the work of other authors [4, 8] to nonlinear methods for nonsymmetric Jacobians.

The research was partially supported by NSF under grant CCR-8722260.

A conference proceedings version of this paper appeared in the Proceedings of the S.E. Conference on Theory of Approximation, March 1991, Marcel Dekker, Pure and Applied Mathematics, Vol. 138, edited by George A. Anastassiou.

REFERENCES

1. S. F. Ashby, T. A. Manteuffel, and P. E. Saylor, A taxonomy for conjugate gradient methods, SIAM J. Numer. Anal. 27:1542-1568 (1990).
2. P. N. Brown, A local convergence theory for combined inexact-Newton/finite-difference projection methods, SIAM J. Numer. Anal. 24:610-638 (1987).
3. E. J. Craig, The N-step iteration procedures, J. Math. Phys. 34:64-73 (1955).
4. J. W. Daniel, The conjugate gradient method for linear and nonlinear operator equations, SIAM J. Numer. Anal. 4:10-26 (1967).
5. J. W. Daniel, The Approximate Minimization of Functionals, Prentice-Hall, Englewood Cliffs, New Jersey, 1971.
6. R. S. Dembo, S. C. Eisenstat, and T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal. 19:400-408 (1982).
7. S. C. Eisenstat, H. C. Elman, and M. H. Schultz, Variational iterative methods for nonsymmetric systems of linear equations, SIAM J. Numer. Anal. 20:345-357 (1983).
8. R. Fletcher and C. M. Reeves, Function minimization by conjugate gradients, Comput. J. 7:149-154 (1964).
9. R. Fletcher, Practical Methods of Optimization, Vol. 2, Unconstrained Optimization, Wiley, Chichester, 1980.
10. G. H. Golub and R. Kannan, Convergence of a two-stage Richardson process for nonlinear equations, BIT 26:209-219 (1986).
11. M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards 49:409-436 (1952).
12. M. Z. Nashed, On general iterative methods for the solutions of a class of nonlinear operator equations, Math. Comp. 19:14-24 (1965).
13. M. Z. Nashed, The convergence of the method of steepest descent for nonlinear equations with variational or quasi-variational operators, J. Math. Mech. 13:765-794 (1964).
14. T. L. Saaty, Modern Nonlinear Equations, Dover Publications, Inc., New York, 1981.
15. D. P. O'Leary, A discrete Newton algorithm for minimizing a function of many variables, Math. Programming 23:20-33 (1982).
