Solving Nonlinear Equations with Newton's Method

Fundamentals of Algorithms
Editor-in-Chief: Nicholas J. Higham, University of Manchester

The SIAM series on Fundamentals of Algorithms publishes monographs on state-of-the-art numerical methods to provide the reader with sufficient knowledge to choose the appropriate method for a given application and to aid the reader in understanding the limitations of each method. The monographs focus on numerical methods and algorithms to solve specific classes of problems and are written for researchers, practitioners, and students. The goal of the series is to produce a collection of short books written by experts on numerical methods that include an explanation of each method and a summary of theoretical background. What distinguishes a book in this series is its emphasis on explaining how to best choose a method, algorithm, or software program to solve a specific type of problem and its descriptions of when a given algorithm or method succeeds or fails.

Kelley, C. T.

Solving Nonlinear Equations with Newton's Method

C. T. Kelley
North Carolina State University
Raleigh, North Carolina

Solving Nonlinear Equations with Newton's Method

Society for Industrial and Applied Mathematics
Philadelphia

To my students.

Contents

Preface xi

How to Get the Software xiii

1 Introduction 1
  1.1 What Is the Problem? 1
    1.1.1 Notation 1
  1.2 Newton's Method 2
    1.2.1 Local Convergence Theory 3
  1.3 Approximating the Jacobian 5
  1.4 Inexact Newton Methods 7
  1.5 Termination of the Iteration 9
  1.6 Global Convergence and the Armijo Rule 11
  1.7 A Basic Algorithm 12
    1.7.1 Warning! 14
  1.8 Things to Consider 15
    1.8.1 Human Time and Public Domain Codes 15
    1.8.2 The Initial Iterate 15
    1.8.3 Computing the Newton Step 16
    1.8.4 Choosing a Solver 16
  1.9 What Can Go Wrong? 17
    1.9.1 Nonsmooth Functions 17
    1.9.2 Failure to Converge 18
    1.9.3 Failure of the Line Search 19
    1.9.4 Slow Convergence 19
    1.9.5 Multiple Solutions 20
    1.9.6 Storage Problems 20
  1.10 Three Codes for Scalar Equations 20
    1.10.1 Common Features 21
    1.10.2 newtsol.m 21
    1.10.3 chordsol.m 22
    1.10.4 secant.m 23
  1.11 Projects 24

    1.11.1 Estimating the q-order 24
    1.11.2 Singular Problems 25

2 Finding the Newton Step with Gaussian Elimination 27
  2.1 Direct Methods for Solving Linear Equations 27
  2.2 The Newton-Armijo Iteration 28
  2.3 Computing a Finite Difference Jacobian 29
  2.4 The Chord and Shamanskii Methods 33
  2.5 What Can Go Wrong? 34
    2.5.1 Poor Jacobians 34
    2.5.2 Finite Difference Jacobian Error 35
    2.5.3 Pivoting 35
  2.6 Using nsold.m 35
    2.6.1 Input to nsold.m 36
    2.6.2 Output from nsold.m 37
  2.7 Examples 37
    2.7.1 Arctangent Function 38
    2.7.2 A Simple Two-Dimensional Example 39
    2.7.3 Chandrasekhar H-equation 41
    2.7.4 A Two-Point Boundary Value Problem 43
    2.7.5 Stiff Initial Value Problems 47
  2.8 Projects 50
    2.8.1 Chandrasekhar H-equation 50
    2.8.2 Nested Iteration 50
  2.9 Source Code for nsold.m 51

3 Newton-Krylov Methods 57
  3.1 Krylov Methods for Solving Linear Equations 57
    3.1.1 GMRES 58
    3.1.2 Low-Storage Krylov Methods 59
    3.1.3 Preconditioning 60
  3.2 Computing an Approximate Newton Step 61
    3.2.1 Jacobian-Vector Products 61
    3.2.2 Preconditioning Nonlinear Equations 61
    3.2.3 Choosing the Forcing Term 62
  3.3 Preconditioners 63
  3.4 What Can Go Wrong? 64
    3.4.1 Failure of the Inner Iteration 64
    3.4.2 Loss of Orthogonality 64
  3.5 Using nsoli.m 65
    3.5.1 Input to nsoli.m 65
    3.5.2 Output from nsoli.m 65
  3.6 Examples 66
    3.6.1 Chandrasekhar H-equation 66
    3.6.2 The Ornstein-Zernike Equations 67
    3.6.3 Convection-Diffusion Equation 71

    3.6.4 Time-Dependent Convection-Diffusion Equation 73
  3.7 Projects 74
    3.7.1 Krylov Methods and the Forcing Term 74
    3.7.2 Left and Right Preconditioning 74
    3.7.3 Two-Point Boundary Value Problem 74
    3.7.4 Making a Movie 75
  3.8 Source Code for nsoli.m 76

4 Broyden's Method 85
  4.1 Convergence Theory 86
  4.2 An Algorithmic Sketch 86
  4.3 Computing the Broyden Step and Update 87
  4.4 What Can Go Wrong? 89
    4.4.1 Failure of the Line Search 89
    4.4.2 Failure to Converge 89
  4.5 Using brsola.m 89
    4.5.1 Input to brsola.m 90
    4.5.2 Output from brsola.m 90
  4.6 Examples 90
    4.6.1 Chandrasekhar H-equation 91
    4.6.2 Convection-Diffusion Equation 91
  4.7 Source Code for brsola.m 93

Bibliography 97

Index 103

Preface

This small book on Newton's method is a user-oriented guide to algorithms and implementation. Its purpose is to show, via algorithms in pseudocode, in MATLAB®, and with several examples, how one can choose an appropriate Newton-type method for a given problem and write an efficient solver or apply one written by others. This book is intended to complement my larger book [42], which focuses on in-depth treatment of convergence theory, but does not discuss the details of solving particular problems, implementation in any particular language, or evaluating a solver for a given problem.

The computational examples in this book were done with MATLAB v6.5 on an Apple Macintosh G4 and a SONY VAIO. The MATLAB codes for the solvers and all the examples accompany this book. MATLAB is an excellent environment for prototyping and testing and for moderate-sized production work. I have used the three main solvers nsold.m, nsoli.m, and brsola.m from the collection of MATLAB codes in my own research. The codes were designed for production work on small- to medium-scale problems having at most a few thousand unknowns. Large-scale problems are best done in a compiled language with a high-quality public domain code.

We assume that the reader has a good understanding of elementary numerical analysis at the level of [4] and of numerical linear algebra at the level of [23,76]. Because the examples are so closely coupled to the text, this book cannot be understood without a working knowledge of MATLAB. There are many introductory books on MATLAB. Either of [71] and [37] would be a good place to start.

Parts of this book are based on research supported by the National Science Foundation and the Army Research Office, most recently by grants DMS-0070641, DMS-0112542, DMS-0209695, DAAD19-02-1-0111, and DAAD19-02-1-0391. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation or the Army Research Office.

Many of my students, colleagues, and friends helped with this project. I'm particularly grateful to these stellar rootfinders for their direct and indirect assistance and inspiration: Don Alfonso, Charlie Berger, Paul Boggs, Peter Brown, Steve Campbell, Todd Coffey, Hong-Liang Cui, Steve Davis, John Dennis, Matthew Farthing, Dan Finkel, Tom Fogwell, Jorg Gablonsky, Jackie Hallberg, Russ Harmon, Jan Hesthaven, Nick Higham, Alan Hindmarsh, Jeff Holland, Stacy Howington, Mac Hyman, Ilse Ipsen, Lea Jenkins, Katie Kavanagh, Vickie Kearn, Chris Kees, Carl and Betty Kelley, David Keyes, Dana Knoll, Tammy Kolda, Matthew Lasater, Debbie Lockhart, Carl Meyer, Casey Miller, Tom Mullikin, Stephen Nash, Chung-Wei Ng, Jim Ortega, Jong-Shi Pang, Mike Pernice, Monte Pettitt, Linda Petzold, Greg Racine, Jill Reese, Ekkehard Sachs, Joe Schmidt, Bobby Schnabel, Chuck Siewert, Linda Thiel, Homer Walker, Carol Woodward, Dwight Woolard, Sam Young, Peiji Zhao, and every student who ever took my nonlinear equations course.

C. T. Kelley
Raleigh, North Carolina
May 2003

How to Get the Software

This book is tightly coupled to a suite of MATLAB codes. The codes are available from SIAM at the URL http://www.siam.org/books/fa01

The software is organized into the following five directories. You should put the SOLVERS directory in your MATLAB path.

(1) SOLVERS
    — nsold.m: Newton's method, direct factorization of Jacobians
    — nsoli.m: Newton-Krylov methods, no matrix storage
    — brsol.m: Broyden's method, no matrix storage
(2) Chapter 1: solvers for scalar equations with examples
(3) Chapter 2: examples that use nsold.m
(4) Chapter 3: examples that use nsoli.m
(5) Chapter 4: examples that use brsol.m

One can obtain MATLAB from

The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098
(508) 647-7000
Fax: (508) 647-7001
Email: info@mathworks.com
WWW: http://www.mathworks.com

Chapter 1

Introduction

1.1 What Is the Problem?

Nonlinear equations are solved as part of almost all simulations of physical processes. Physical models that are expressed as nonlinear partial differential equations, for example, become large systems of nonlinear equations when discretized. Authors of simulation codes must either use a nonlinear solver as a tool or write one from scratch. The purpose of this book is to show these authors what technology is available, sketch the implementation, and warn of the problems. We do this via algorithmic outlines, examples in MATLAB, nonlinear solvers in MATLAB that can be used for production work, and chapter-ending projects.

We use the standard notation

F(x) = 0    (1.1)

for systems of N equations in N unknowns. Here F : R^N -> R^N. We will call F the nonlinear residual or simply the residual. Rarely can the solution of a nonlinear equation be given by a closed-form expression, so iterative methods must be used to approximate the solution numerically. The output of an iterative method is a sequence of approximations to a solution.

1.1.1 Notation

In this book, following the convention in [42,43], vectors are to be understood as column vectors. The vector x* will denote a solution, x a potential solution, and {x_n}_{n>=0} the sequence of iterates. We will refer to x_0 as the initial iterate (not guess!). We will denote the ith component of a vector x by (x)_i (note the parentheses) and the ith component of x_n by (x_n)_i. We will rarely need to refer to individual components of vectors. The vector e = x - x* will denote the error; e_n = x_n - x* is the error in the nth iterate. As is standard [42], we will let df/d(x)_i denote the partial derivative of f with respect to (x)_i. So, if the components of F are differentiable at x in R^N, we define the Jacobian matrix F'(x) by

(F'(x))_{ij} = d(F)_i / d(x)_j.

1.2 Newton's Method

The methods in this book are variations of Newton's method. The Newton sequence is

x_{n+1} = x_n - F'(x_n)^{-1} F(x_n).    (1.2)

The interpretation of (1.2) is that we model F at the current iterate x_n with the linear function

M_n(x) = F(x_n) + F'(x_n)(x - x_n);

M_n is called the local linear model. If F'(x_n) is nonsingular, then M_n(x_{n+1}) = 0 is equivalent to (1.2), so Newton's method lets the root of M_n be the next iteration.

Figure 1.1 illustrates the local linear model and the Newton iteration for the scalar equation arctan(x) = 0 with initial iterate x_0 = 1. We graph the local linear model at x_j from the point (x_j, y_j) = (x_j, F(x_j)) to the next iteration (x_{j+1}, 0). The MATLAB program ataneg.m creates Figure 1.1 and the other figures in this chapter for the arctan function. The iteration converges rapidly and one can see the linear model becoming more and more accurate. The third iterate is visually indistinguishable from the solution.

The computation of a Newton iteration requires

1. evaluation of F(x_n) and a test for termination,
2. approximate solution of the equation

   F'(x_n) s = -F(x_n)    (1.3)

   for the Newton step s, and
3. construction of x_{n+1} = x_n + λs, where the step length λ is selected to guarantee decrease in ||F||.

Throughout the book, || · || will denote the Euclidean norm on R^N. Item 2, the computation of the Newton step, consumes most of the work, and the variations in Newton's method that we discuss in this book differ most significantly in how the Newton step is approximated.
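As a concrete illustration of the iteration (1.2), the following Python sketch (my own illustration, with hypothetical function names; the book's codes are in MATLAB) applies Newton's method to arctan(x) = 0 from x_0 = 1, the example of Figure 1.1:

```python
import math

def newton_scalar(f, fprime, x0, maxit=8):
    """Plain Newton iteration for a scalar equation f(x) = 0."""
    x = x0
    history = [abs(f(x))]      # record |f(x_n)| at each iterate
    for _ in range(maxit):
        x = x - f(x) / fprime(x)   # step to the root of the local linear model
        history.append(abs(f(x)))
    return x, history

# arctan(x) = 0 with initial iterate x0 = 1, as in Figure 1.1
root, resid = newton_scalar(math.atan, lambda x: 1.0 / (1.0 + x * x), 1.0)
```

The recorded residuals drop extremely fast, consistent with the rapid convergence visible in Figure 1.1: the third iterate is already very close to the root.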

Computing the step may require evaluation and factorization of the Jacobian matrix or the solution of (1.3) by an iterative method. Not all methods for computing the Newton step require the complete Jacobian matrix, which, as we will see in Chapter 2, can be very expensive.

In the example from Figure 1.1, the step s in item 2 was satisfactory, and item 3 was not needed. The reader should be warned that attention to the step length is generally very important. One should not write one's own nonlinear solver without step-length control (see section 1.6).

Figure 1.1. Newton iteration for the arctan function.

1.2.1 Local Convergence Theory

The convergence theory for Newton's method [24,42,57] that is most often seen in an elementary course in numerical methods is local. This means that one assumes that the initial iterate x_0 is near a solution. The local convergence theory from [24,42,57] requires the standard assumptions.

Assumption 1.2.1. (standard assumptions)
1. Equation 1.1 has a solution x*.
2. F' : Ω -> R^{N×N} is Lipschitz continuous near x*.
3. F'(x*) is nonsingular.

Recall that Lipschitz continuity near x* means that there is γ > 0 (the Lipschitz constant) such that

||F'(x) - F'(y)|| <= γ ||x - y||

for all x, y sufficiently near x*.

The classic convergence theorem is as follows.

Theorem 1.1. Let the standard assumptions hold. If x_0 is sufficiently near x*, then the Newton sequence exists (i.e., F'(x_n) is nonsingular for all n >= 0) and converges to x*, and there is K > 0 such that

||e_{n+1}|| <= K ||e_n||^2    (1.4)

for n sufficiently large.

The convergence described by (1.4), in which the error in the solution will be roughly squared with each iteration, is called q-quadratic. Squaring the error roughly means that the number of significant figures in the result doubles with each iteration. Of course, one cannot examine the error without knowing the solution. However, we can observe the quadratic reduction in the error computationally, because the nonlinear residual will also be roughly squared with each iteration. Therefore, we should see the exponent field of the norm of the nonlinear residual roughly double with each iteration.

In Table 1.1 we report the Newton iteration for the scalar (N = 1) nonlinear equation

tan(x) - x = 0.    (1.5)

The solution is x* ≈ 4.493.

Table 1.1. Residual history for Newton's method.

n    |F(x_n)|
0    1.3733e-01
1    4.1319e-03
2    3.9818e-06
3    5.5955e-12
4    8.8818e-16
5    8.8818e-16

The decrease in the function is as the theory predicts for the first three iterations, then progress slows down for iteration 4 and stops completely after that. The reason for this stagnation is clear: one cannot evaluate the function to higher precision than (roughly) machine unit roundoff, which in the IEEE [39,58] floating point system is about 10^-16.

The results reported in Table 1.1 used a forward difference approximation to the derivative with a difference increment of 10^-6. With this choice of difference increment, the convergence speed of the nonlinear iteration is as fast as that for Newton's method, at least for this example, until stagnation takes over. Stagnation is not affected by the accuracy in the derivative. The reader should be aware that difference approximations to derivatives, while usually reliable, are often expensive and can be very inaccurate. An inaccurate Jacobian can cause many problems (see section 1.9).
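The behavior reported in Table 1.1 is easy to reproduce. The following Python sketch (an illustration, not the book's MATLAB code) uses a forward difference derivative with increment 10^-6; the starting point x_0 = 4.5 is my assumption, chosen to be consistent with the first residual in the table:

```python
import math

def fd_newton(f, x0, h=1e-6, maxit=8):
    """Newton iteration with a forward difference derivative; records
    |f(x_n)| so the quadratic residual reduction is visible."""
    x = x0
    history = [abs(f(x))]
    for _ in range(maxit):
        fx = f(x)
        b = (f(x + h) - fx) / h     # forward difference, increment 1e-6
        x = x - fx / b
        history.append(abs(f(x)))
    return x, history

f = lambda x: math.tan(x) - x
root, resid = fd_newton(f, 4.5)     # x* ≈ 4.493; compare with Table 1.1
```

The exponent of the residual roughly doubles for the first few iterations and then stagnates near machine roundoff, exactly as the table shows.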

An analytic Jacobian can require some human effort, but can be worth it in terms of computer time and robustness when a difference Jacobian performs poorly.

One can quantify this stagnation by adding the errors in the function evaluation and derivative evaluations to Theorem 1.1.

Theorem 1.2. Let the standard assumptions hold. Let a matrix-valued function A(x) and a vector-valued function ε(x) be such that

||A(x)|| <= δ_J and ||ε(x)|| <= δ_F

for all x near x*. Then, if x_0 is sufficiently near x* and δ_J and δ_F are sufficiently small, the sequence

x_{n+1} = x_n - (F'(x_n) + A(x_n))^{-1} (F(x_n) + ε(x_n))

is defined (i.e., F'(x_n) + A(x_n) is nonsingular for all n) and satisfies

||e_{n+1}|| <= K (||e_n||^2 + δ_J ||e_n|| + δ_F)

for some K > 0.

The messages of Theorem 1.2 are as follows:
• Small errors, for example, machine roundoff, in the function evaluation can lead to stagnation. This type of stagnation is usually benign and, if the Jacobian is well conditioned (see (1.13) in section 1.5), the results will be as accurate as the evaluation of F.
• Errors in the Jacobian and in the solution of the linear equation for the Newton step (1.3) will affect the speed of the nonlinear iteration, but not the limit of the sequence.

We will ignore the errors in the function in the rest of this book, but one needs to be aware that stagnation of the nonlinear iteration is all but certain in finite-precision arithmetic. However, the asymptotic convergence results for exact arithmetic describe the observations well for most problems.

While Table 1.1 gives a clear picture of quadratic convergence, it's easier to appreciate a graph. Figure 1.2 is a semilog plot of residual history, i.e., the norm of the nonlinear residual against the iteration number. One uses the semilogy command in MATLAB for this. See the file tandemo.m, which generated Figures 1.2 and 1.3, for an example. The concavity of the plot is the signature of superlinear convergence.

1.3 Approximating the Jacobian

As we will see in the subsequent chapters, it is usually most efficient to approximate the Newton step in some way. One way to do this is to approximate F'(x_n) in a

way that not only avoids computation of the derivative, but also saves linear algebra work and matrix storage.

The price for such an approximation is that the nonlinear iteration converges more slowly; i.e., the convergence is q-linear. This means that there is ρ ∈ (0,1) such that, for n sufficiently large,

||e_{n+1}|| <= ρ ||e_n||.

The formal definition of q-linear convergence allows for faster convergence; q-quadratic convergence is also q-linear, as you can see from the definition (1.4). In many cases of q-linear convergence, one observes that

||e_{n+1}|| ≈ ρ ||e_n||.

In these cases, q-linear convergence is usually easy to see on a semilog plot of the residual norms against the iteration number: the curve appears to be a line with slope ≈ log(ρ). The constant ρ is called the q-factor.

Figure 1.2. Newton iteration for tan(x) − x = 0.

One way to approximate the Jacobian is to compute F'(x_0) and use that as an approximation to F'(x_n) throughout the iteration. This is the chord method or modified Newton method. The convergence of the chord iteration is not as fast as Newton's method. Assuming that the initial iterate is near enough to x*, we can apply Theorem 1.2 with ε = 0 and ||A(x_n)|| = O(||e_0||) and conclude that the q-factor ρ is proportional to the initial error. More nonlinear iterations are needed to solve the problem than with Newton's method. However, because the computation of the Newton step is less expensive, the overall cost of the solve is usually significantly less.

The secant method for scalar equations approximates the derivative using a finite difference but, rather than a forward difference, uses the most recent two

iterations to form the difference quotient. So

x_{n+1} = x_n - F(x_n)/b_n, where b_n = (F(x_n) - F(x_{n-1})) / (x_n - x_{n-1}),

where x_n is the current iteration and x_{n-1} is the iteration before that. The secant method must be initialized with two points. One way to do that is to let x_{-1} = 0.99 x_0. This is what we do in our MATLAB code secant.m.

The secant method's approximation to F'(x_n) converges to F'(x*) as the iteration progresses. Theorem 1.2, with ε = 0 and ||A(x_n)|| = O(||e_{n-1}||), implies that the iteration converges q-superlinearly. This means that either x_n = x* for some finite n or

lim_{n->∞} ||e_{n+1}|| / ||e_n|| = 0.

More generally, if x_n -> x* and, for some p > 1, ||e_{n+1}|| = O(||e_n||^p), we say that x_n -> x* q-superlinearly with q-order p. Q-quadratic convergence is a special case of q-superlinear convergence. Q-superlinear convergence is hard to distinguish from q-quadratic convergence by visual inspection of the semilog plot of the residual history; the residual curve for q-superlinear convergence is concave down but drops less rapidly than the one for Newton's method.

The secant method has the dangerous property that the difference between x_n and x_{n-1} could be too small for an accurate difference approximation. The division by zero that we observed is an extreme case. The formula for the secant method does not extend to systems of equations (N > 1) because the denominator in the fraction would be a difference of vectors. We discuss one of the many generalizations of the secant method for systems of equations in Chapter 4.

In Figure 1.3, we compare Newton's method with the chord method and the secant method for our model problem (1.5). We see the convergence behavior that the theory predicts in the linear curve for the chord method and in the concave curves for Newton's method and the secant method. We also see the stagnation in the terminal phase. The figure does not show the division by zero that halted the secant method computation at iteration 6.

The MATLAB codes for these examples are ftst.m for the residual, newtsol.m, chordsol.m, and secant.m for the solvers, and tandemo.m to apply the solvers and make the plots. These solvers are basic scalar codes which have user interfaces similar to those of the more advanced codes which we cover in subsequent chapters. We will discuss the design of these codes in section 1.10.

1.4 Inexact Newton Methods

Rather than approximate the Jacobian, one could instead solve the equation for the Newton step approximately. An inexact Newton method [22] uses as a Newton

step a vector s that satisfies the inexact Newton condition

||F'(x_n) s + F(x_n)|| <= η ||F(x_n)||.    (1.10)

The parameter η (the forcing term) can be varied as the Newton iteration progresses. Choosing a small value of η will make the iteration more like Newton's method, therefore leading to convergence in fewer iterations. However, a small value of η may make computing a step that satisfies (1.10) very expensive. The local convergence theory [22,42] for inexact Newton methods reflects the intuitive idea that a small value of η leads to fewer iterations. Theorem 1.3 is a typical example of such a convergence result.

Figure 1.3. Newton/chord/secant comparison for tan(x) − x.

Theorem 1.3. Let the standard assumptions hold. Then there are δ and η̄ such that, if x_0 ∈ B(δ) and {η_n} ⊂ [0, η̄], then the inexact Newton iteration

x_{n+1} = x_n + s_n, where ||F'(x_n) s_n + F(x_n)|| <= η_n ||F(x_n)||,    (1.11)

converges q-linearly to x*. Moreover,
• if η_n -> 0, the convergence is q-superlinear, and
• if η_n <= K_η ||F(x_n)||^p for some K_η > 0, the convergence is q-superlinear with q-order 1 + p.
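The chord and secant iterations of section 1.3 are easy to sketch and compare in the spirit of Figure 1.3. The following Python code is an illustrative translation of the ideas behind newtsol.m, chordsol.m, and secant.m (not the book's codes themselves), applied to the model problem (1.5):

```python
import math

f = lambda x: math.tan(x) - x
fprime = lambda x: math.tan(x) ** 2   # d/dx (tan x - x) = sec^2 x - 1 = tan^2 x

def newton(x0, maxit=25):
    x = x0
    hist = [abs(f(x))]
    for _ in range(maxit):
        x = x - f(x) / fprime(x)
        hist.append(abs(f(x)))
    return x, hist

def chord(x0, maxit=25):
    """Chord method: the derivative is frozen at the initial iterate."""
    x, b = x0, fprime(x0)
    hist = [abs(f(x))]
    for _ in range(maxit):
        x = x - f(x) / b
        hist.append(abs(f(x)))
    return x, hist

def secant(x0, maxit=25):
    """Secant method, initialized with x_{-1} = 0.99*x0 as in the text."""
    xm, x = 0.99 * x0, x0
    hist = [abs(f(x))]
    for _ in range(maxit):
        if x == xm or f(x) == f(xm):
            break                     # difference quotient would divide by zero
        b = (f(x) - f(xm)) / (x - xm)
        xm, x = x, x - f(x) / b
        hist.append(abs(f(x)))
    return x, hist

xn, hn = newton(4.5)
xc, hc = chord(4.5)
xs, hs = secant(4.5)
```

Plotted on a semilog scale, the chord residuals decay along a straight line (q-linear), while the Newton and secant residuals produce concave curves, matching the description of Figure 1.3; the guard in secant is a simple response to the division-by-zero hazard mentioned in the text.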

Errors in the function evaluation will, in general, lead to stagnation of the iteration.

One can use Theorem 1.3 to analyze the chord method or the secant method. In the case of the chord method, the steps satisfy (1.11) with η_n = O(||e_0||), which implies q-linear convergence if ||e_0|| is sufficiently small. For the secant method, η_n = O(||e_{n-1}||), implying q-superlinear convergence.

Theorem 1.3 does not fully describe the performance of inexact methods in practice, because the theorem ignores the method used to obtain a step that satisfies (1.10) and ignores the dependence of the cost of computing the step as a function of η. Iterative methods for solving the equation for the Newton step would typically use (1.10) as a termination criterion. In this case, the overall nonlinear solver is called a Newton iterative method. Newton iterative methods are named by the particular iterative method used for the linear equation. For example, the nsoli.m code, which we describe in Chapter 3, is an implementation of several Newton-Krylov methods.

An unfortunate choice of the forcing term η can lead to very poor results. The reader is invited to try the two choices η = 10^-6 and η = 0.9 in nsoli.m to see this. Better choices of η include η = 0.1, the author's personal favorite, and a more complex approach (see section 3.2.3) from [29] and [42] that is the default in nsoli.m. Either of these usually leads to rapid convergence near the solution, but at a much lower cost for the linear solver than a very small forcing term such as η = 10^-4.

1.5 Termination of the Iteration

While one cannot know the error without knowing the solution, in most cases the norm of F(x) can be used as a reliable indicator of the rate of decay in ||e|| as the iteration progresses [42]. Based on this heuristic, we terminate the iteration in our codes when

||F(x)|| <= τ_r ||F(x_0)|| + τ_a.    (1.12)

The relative τ_r and absolute τ_a error tolerances are both important. Using only the relative reduction in the nonlinear residual as a basis for termination (i.e., setting τ_a = 0) is a poor idea, because an initial iterate that is near the solution may make (1.12) impossible to satisfy with τ_a = 0.

One way to quantify the utility of termination when ||F(x)|| is small is to compare a relative reduction in the norm of the error with a relative reduction in the norm of the nonlinear residual. If the standard assumptions hold and x_0 and x are sufficiently near the root, then

||e|| / ||e_0|| <= 4 κ(F'(x*)) ||F(x)|| / ||F(x_0)||,    (1.13)

where

κ(F'(x*)) = ||F'(x*)|| ||F'(x*)^{-1}||

is the condition number of F'(x*) relative to the norm || · ||. From (1.13) we conclude that, if the Jacobian is well conditioned (i.e., κ(F'(x*)) is not very large), then (1.12) is a useful termination criterion. This is analogous to the linear case, where a small residual implies a small error if the matrix is well conditioned.

Another approach, which is supported by theory only for superlinearly convergent methods, is to exploit the fast convergence to estimate the error in terms of the step s_n = x_{n+1} - x_n. If the iteration is converging superlinearly, then ||e_{n+1}|| = o(||e_n||), and hence

||s_n|| = ||e_n - e_{n+1}|| = ||e_n|| + o(||e_n||).

Therefore, when the iteration is converging superlinearly, one may use ||s_n|| as an estimate of ||e_n||. So, for a superlinearly convergent method, terminating the iteration with x_{n+1} as soon as

||s_n|| < τ    (1.14)

will imply that ||e_{n+1}|| < τ.

Termination using (1.14) is supported by theory only for superlinearly convergent methods, but is used for linearly convergent methods in some initial value problem solvers [8,61]. The trick is to estimate the q-factor ρ. If ||e_{n+1}|| ≈ ρ ||e_n||, then

||s_n|| ≈ (1 - ρ) ||e_n||,

and hence

||e_{n+1}|| ≈ (ρ / (1 - ρ)) ||s_n||.    (1.16)

One can estimate the current rate of convergence from above by

ρ_n = ||s_n|| / ||s_{n-1}||.

Hence, if we terminate the iteration when

(ρ_n / (1 - ρ_n)) ||s_n|| < τ    (1.17)

and the estimate of ρ is an overestimate, then (1.16) will imply that ||e_{n+1}|| < τ. In practice, a safety factor is used on the left side of (1.17) to guard against an underestimate. Assuming that the estimate of ρ is reasonable, this works well; if, however, the estimate of ρ is much smaller than the actual q-factor, the iteration can terminate too soon. This can happen in practice if the Jacobian is ill conditioned and the initial iterate is far from the solution [45].
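The q-factor estimate ρ_n = ||s_n||/||s_{n−1}|| is cheap to form. This Python sketch (an illustration of the idea, not code from the book) runs the chord method on the model problem (1.5) and builds the step-based error estimate:

```python
import math

f = lambda x: math.tan(x) - x
x = 4.5
b = math.tan(x) ** 2        # frozen derivative, so the convergence is q-linear
steps = []
for _ in range(6):
    s = -f(x) / b
    x += s
    steps.append(abs(s))

rho = steps[-1] / steps[-2]                # estimated q-factor
err_est = rho / (1.0 - rho) * steps[-1]    # estimated ||e_{n+1}||
err_true = abs(x - 4.493409457909064)      # first positive root of tan(x) = x
```

For this problem the step ratios settle quickly to a small constant, and the estimate tracks the true error to within its order of magnitude.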

1.6 Global Convergence and the Armijo Rule

The requirement in the local convergence theory that the initial iterate be near the solution is more than mathematical pedantry. To see this, we apply Newton's method to find the root x* = 0 of the function F(x) = arctan(x) with initial iterate x_0 = 10. This initial iterate is too far from the root for the local convergence theory to hold. The initial iterate and the four subsequent iterates are approximately

10, -1.4 × 10^2, 3.0 × 10^4, -1.4 × 10^9, 3.1 × 10^18.

As you can see, the Newton step points in the correct direction, i.e., toward x* = 0, but overshoots by larger and larger amounts; the step, while in the correct direction, is far too large in magnitude.

The simple artifice of reducing the step by half until ||F(x)|| has been reduced will usually solve this problem. In order to clearly describe this, we will now make a distinction between the Newton direction d = -F'(x)^{-1} F(x) and the Newton step when we discuss global convergence. For the methods in this book, the Newton step will be a positive scalar multiple of the Newton direction. When we talk about local convergence and are taking full steps (λ = 1 and s = d), we will not make this distinction and only refer to the step.

A rigorous convergence analysis requires a bit more detail. We begin by computing the Newton direction d. To keep the step from going too far, we find the smallest integer m >= 0 such that

||F(x_n + 2^{-m} d)|| < (1 - α 2^{-m}) ||F(x_n)||    (1.18)

and let the step be s = 2^{-m} d and x_{n+1} = x_n + 2^{-m} d. The condition in (1.18) is called the sufficient decrease of ||F||. The parameter α ∈ (0,1) is a small number intended to make (1.18) as easy as possible to satisfy; α = 10^{-4} is typical and used in our codes. In Figure 1.4, created by ataneg.m, we show how this approach, called the Armijo rule [2], succeeds. The circled points are iterations for which m > 1 and the value of m is above the circle. Methods like the Armijo rule are called line search methods because one searches for a decrease in ||F|| along the line segment [x_n, x_n + d].

The line search in our codes manages the reduction in the step size with more sophistication than simply halving an unsuccessful step. The motivation for this is that some problems respond well to one or two reductions in the step length by modest amounts (such as 1/2) and others require many such reductions, but might do much better if a more aggressive step-length reduction (by factors of 1/10, say) is used.
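A minimal version of the step-halving strategy is easy to write down. This Python sketch (mine, not the book's line search, which is more sophisticated) repeats the arctan experiment with x_0 = 10 and the sufficient decrease test (1.18):

```python
import math

def newton_armijo_scalar(f, fprime, x0, alpha=1e-4, tol=1e-10, maxit=40):
    """Damped Newton: halve the step until the sufficient decrease
    condition |f(x + 2^-m d)| < (1 - alpha*2^-m)|f(x)| holds."""
    x = x0
    for _ in range(maxit):
        fx = f(x)
        if abs(fx) <= tol:
            break
        d = -fx / fprime(x)        # Newton direction
        lam = 1.0
        while abs(f(x + lam * d)) >= (1.0 - alpha * lam) * abs(fx):
            lam /= 2.0             # Armijo rule: reduce the step by half
        x += lam * d
    return x

# plain Newton diverges from x0 = 10; the damped iteration converges
root = newton_armijo_scalar(math.atan, lambda x: 1.0 / (1.0 + x * x), 10.0)
```

Several early iterations need a few halvings; once the iterates are close to the root, full steps are accepted and the fast local convergence takes over.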

To address this possibility, when two reductions by halving do not lead to sufficient decrease, we build a quadratic polynomial model of

φ(λ) = ||F(x_n + λd)||^2    (1.19)

based on interpolation of φ at the three most recent values of λ. The next λ is the minimizer of the quadratic model, subject to the safeguard that the reduction in λ be at least a factor of two and at most a factor of ten. So the algorithm generates a sequence of candidate step-length factors {λ_m}, with λ_0 = 1 and

λ_{m+1} ∈ [λ_m/10, λ_m/2] for m >= 1.

The norm in (1.19) is squared to make φ a smooth function that can be accurately modeled by a quadratic over small ranges of λ. The line search terminates with the smallest m >= 0 such that

||F(x_n + λ_m d)|| < (1 - α λ_m) ||F(x_n)||.    (1.20)

In the advanced codes from the subsequent chapters, we use the three-point parabolic model from [42]. In this approach, λ_1 = 1/2. To compute λ_{m+1} for m >= 1, a parabola is fitted to the data φ(0), φ(λ_m), and φ(λ_{m-1}); λ_{m+1} is the minimum of this parabola on the interval [λ_m/10, λ_m/2]. We refer the reader to [42] for the details and to [24,28,57] for a discussion of other ways to implement a line search.

Figure 1.4. Newton-Armijo for arctan(x).

1.7 A Basic Algorithm

Algorithm nsolg is a general formulation of an inexact Newton-Armijo iteration. The methods in Chapters 2 and 3 are special cases of nsolg.
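The safeguarded parabolic step can be written as a small helper. The following Python function is an illustrative sketch (the names and the convexity fallback are my choices, not the book's implementation): it fits a parabola through (0, φ(0)), (λ_c, φ(λ_c)), and (λ_p, φ(λ_p)) and clamps the minimizer to [λ_c/10, λ_c/2]:

```python
def parabolic_step(lam_c, lam_p, phi0, phic, phip):
    """Safeguarded minimizer of the parabola through (0, phi0),
    (lam_c, phic), and (lam_p, phip), clamped to [lam_c/10, lam_c/2]."""
    # quadratic q(l) = phi0 + b*l + a*l^2 interpolating the three points
    denom = lam_c * lam_p * (lam_c - lam_p)
    a = (lam_p * (phic - phi0) - lam_c * (phip - phi0)) / denom
    b = (lam_c ** 2 * (phip - phi0) - lam_p ** 2 * (phic - phi0)) / denom
    # minimizer of q if q is convex; otherwise fall back to simple halving
    lam = -b / (2.0 * a) if a > 0 else lam_c / 2.0
    return min(max(lam, lam_c / 10.0), lam_c / 2.0)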

77 is bounded away from one (in the sense of (1.7. then the forcing term 77 is determined implicitly. It's standard in line search implementations to use a polynomial model like the one we described in section 1.F. terminate with failure.5. The number of nonlinear iterations. and changes in the step length all should be limited. while ||F(z)|| > r do Find d such that \\F'(x}d + F(x}\\ < rj\\F(x}\\ If no such d can be found. when near the solution. end while x <— x + \d end while The theory for Algorithm nsolg is very satisfying. The algorithm does not cover all aspects of a useful implementation. but is not necessary in practice if you use direct solvers. and the relative and absolute termination tolerances ra and rr.rr\F(x)\ + ra. x will be the approximate solution on output. If you use a direct solver. . If nsolg terminates successfully.6.ra.4 states this precisely. the Jacobians remain well conditioned throughout the iteration. 2. The essential input arguments are the initial iterate x. m. then rj = 0 in exact arithmetic. using either the Jacobian F'(x] or an approximation of it. Theorem 1. which we describe in Chapter 3). and 3.10) is the termination criterion for that linear solver. nsolg(z. If F is sufficiently smooth. For example. you do not need to provide one. we compute a step length A and a step s = Ad so that the sufficient decrease condition (1.1. then usually (1. If you use an iterative linear solver. Knowing about n helps you understand and apply the theory. We list some of the potential causes of failure in sections 1.4.1/2] is computed by minimizing the polynomial model of ||F(arn + Ad)||2. where a 6 [1/10. You'll need to make a decision about the forcing term in that case (or accept the defaults from a code like nsoli. the computation of the Newton direction d can be done with direct or iterative linear solvers. then the iteration converges to a solution and. and the sequence {xn} remains bounded. T <. 
A Basic Algorithm 13 freedom in Algorithm nsolg. If you use an approximate Jacobian and solve with a direct method. The theoretical requirements on the forcing term 77 are that it be safely bounded away from one (1. Within the algorithm. Failure of any of these loops to terminate reasonably rapidly indicates that something is wrong. A= l while \\F(x + Xd)\\ > (1 . Having computed the Newton direction.22). if you solve the equation for the Newton step with a direct method.1.9.22)).aA)||F(z)|| do A <— 0-A. the function F. then rj is proportional to the error in the Jacobian. Algorithm 1.21) holds. the convergence is as fast as the quality of the linear solver permits.Tr) Evaluate F(x). linear iterations.
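A compact Python rendering of Algorithm nsolg may help make the control flow concrete. This is a sketch under simplifying assumptions: the direction comes from an exact direct solve (so η = 0), the step-length reduction uses a fixed σ = 1/2 instead of the polynomial model, and the two-dimensional test problem is mine, not from the book:

```python
import numpy as np

def nsolg(x, F, Jsolve, tau_a, tau_r, alpha=1e-4, sigma=0.5, maxit=100):
    """Sketch of Algorithm nsolg. Jsolve(x, Fx) returns a direction d with
    ||F'(x) d + F(x)|| <= eta ||F(x)|| (here an exact solve, so eta = 0)."""
    x = np.array(x, dtype=float)
    Fx = F(x)
    tau = tau_r * np.linalg.norm(Fx) + tau_a
    for _ in range(maxit):
        if np.linalg.norm(Fx) <= tau:
            return x
        d = Jsolve(x, Fx)
        lam = 1.0
        # line search: enforce the sufficient decrease condition
        while np.linalg.norm(F(x + lam * d)) > (1 - alpha * lam) * np.linalg.norm(Fx):
            lam *= sigma
            if lam < 1e-12:
                raise RuntimeError("line search failure")
        x = x + lam * d
        Fx = F(x)
    raise RuntimeError("too many nonlinear iterations")

# illustrative two-dimensional test problem with root (1, 1)
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 2.0, x[0] - x[1]])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
sol = nsolg([3.0, 1.0], F, lambda x, Fx: np.linalg.solve(J(x), -Fx), 1e-12, 1e-12)
```

Swapping Jsolve for an iterative linear solver terminated by (1.10) turns this sketch into a Newton iterative method, with the forcing term determined by the inner solver's tolerance.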

Theorem 1.4. Let x0 ∈ RN and α ∈ (0,1) be given. Assume that {xn} is given by Algorithm nsolg, F is Lipschitz continuously differentiable, η is bounded away from one in the sense of (1.22), and {xn} and {‖F′(xn)⁻¹‖} are bounded. Then {xn} converges to a root x* of F at which the standard assumptions hold, full steps (λ = 1) are taken for n sufficiently large, and the convergence behavior in the final phase of the iteration is that given by the local theory for inexact Newton methods (Theorem 1.3).

The important thing that you should remember is that, for smooth F, there are only three possibilities for the iteration of Algorithm nsolg:
• {xn} will converge to a solution x* at which the standard assumptions hold,
• {xn} will be unbounded, or
• F′(xn) will become singular.

1.7.1 Warning!
The theory for convergence of the inexact Newton-Armijo iteration is only valid if F′(xn), or a very good approximation (forward difference, for example), is used to compute the step. A poor approximation to the Jacobian will cause the Newton step to be inaccurate. While this can result in slow convergence when the iterations are near the root, the outcome can be much worse when far from a solution. The reason for this is that the success of the line search is very sensitive to the direction. In particular, if x0 is far from x* there is no reason to expect the secant or chord method to converge. Sometimes methods like the secant and chord methods work fine with a line search when the initial iterate is far from a solution, but users of nonlinear solvers should be aware that the line search can fail. A good code will watch for this failure and respond by using a more accurate Jacobian or Jacobian-vector product.

Difference approximations to the Jacobian are usually sufficiently accurate. However, for smooth F, there are particularly hard problems [48] for which differentiation in the coordinate directions is very inaccurate, whereas differentiation in the directions of the iterations, residuals, and steps, which are natural directions for the problem, is very accurate. The inexact Newton methods, such as the Newton-Krylov methods in Chapter 3, use a forward difference approximation for Jacobian-vector products (with vectors that are natural for the problem) and will usually (but not always) work well when far from a solution.

While the line search paradigm is the simplest way to find a solution if the initial iterate is far from a root, other methods are available and can sometimes overcome stagnation or, in the case of many solutions, find the solution that is appropriate to a physical problem. Trust region globalization [24,60], pseudotransient continuation [19,36,44], and homotopy methods [78] are three such alternatives.

in which the initial iterate is the output of a predictor, and nested iteration (see section 2.2), where problems such as differential equations are solved on a coarse mesh and the initial iterate for the solution on finer meshes is an interpolation of the solution from a coarser mesh. It is more common to have a little information about the solution in advance, in which case one should try to exploit those data about the solution. For example, if your problem is a discretized differential equation, make sure that any boundary conditions are reflected in your initial iterate. If you know the signs of some components of the solution, be sure that the signs of the corresponding components of the initial iterate agree with those of the solution.

1.8.3 Computing the Newton Step
If function and Jacobian evaluations are very costly, the Newton-Krylov methods from Chapter 3 and Broyden's method from Chapter 4 are worth exploring. Both methods avoid explicit computation of Jacobians. For very large problems, storing a Jacobian is difficult and factoring one may be impossible. Low-storage Newton-Krylov methods, such as Newton-BiCGSTAB, may be the only choice. Even if the storage is available, factorization of the Jacobian is usually a poor choice for very large problems. Iterative methods avoid the factorization, but usually require preconditioning (see Chapter 3), so it is worth considerable effort to build a good preconditioner for an iterative method. If these efforts fail and the linear iteration fails to converge, then you must either reformulate the problem or find the storage for a direct method.

A direct method is not always the best choice for a small problem. Integral equations, such as the example in section 2.7.3, are one type for which iterative methods perform better than direct methods even for problems with small numbers of unknowns and dense Jacobians.

1.8.4 Choosing a Solver
The most important issues in selecting a solver are
• the size of the problem,
• the cost of evaluating F and F′, and
• the way linear systems of equations will be solved.
The items in the list above are not independent. The reader in a hurry could use the outline below and probably do well.
• If N is small and F is cheap, computing F′ with forward differences and using direct solvers for linear algebra makes sense. The methods from Chapter 2 are a good choice. These methods are probably the optimal choice in terms of saving your time.
• If you can exploit sparsity in the Jacobian, you will save a significant amount of work in the computation of the Jacobian and may be able to use a direct solver. Sparse differencing can be done in considerable generality [20,21].

• If you can obtain the sparsity pattern easily and the computational cost of a direct factorization is acceptable, a direct method is a very attractive choice. We discuss how to do this for banded Jacobians in section 2.3 and implement a banded differencing algorithm in nsold.m. The internal MATLAB code numjac will do sparse differencing, but requires the sparsity pattern from you.
• If N is large or computing and storing F′ is very expensive, you may not be able to use a direct method.
— If F′ is sparse, you might be able to use a sparse differencing method to approximate F′ and a sparse direct solver. If you can store F′, you can use that matrix to build an incomplete factorization [62] preconditioner.
— If you have a good preconditioner, a Newton-Krylov code is a good start. The discussion in section 3.1 will help you choose a Krylov method.
— If you can't compute or store F′ at all, then the matrix-free methods in Chapters 3 and 4 may be your only options.

1.9 What Can Go Wrong?
Even the best and most robust codes can (and do) fail in practice. In this section we give some guidance that may help you troubleshoot your own solvers or interpret hard-to-understand results from solvers written by others. These are some problems that can arise for all choices of methods. We will also repeat some of these things in subsequent chapters, when we discuss problems that are specific to a method for approximating the Newton direction.

1.9.1 Nonsmooth Functions
Most nonlinear equation codes, including the ones that accompany this book, are intended to solve problems for which F′ is Lipschitz continuous. The codes will behave unpredictably if your function is not Lipschitz continuously differentiable. If, for example, the code for your function contains
• nondifferentiable functions such as the absolute value, a vector norm, or a fractional power,
• internal interpolations from tabulated data,
• control structures like case or if-then-else that govern the value returned by F, or
• calls to other codes,
you may well have a nondifferentiable problem. If your function is close to a smooth function, the codes may do very well. On the other hand, a nonsmooth nonlinearity can cause any of the failures listed in this section.

1.9.2 Failure to Converge
The theory, as stated in Theorem 1.4, does not imply that the iteration will converge, only that nonconvergence can be identified easily. The clear symptoms of this are divergence of the iteration or failure of the residual to converge to zero. So, if the iteration fails to converge to a root, then either the iteration will become unbounded or the Jacobian will become singular.

Inaccurate function evaluation
Most nonlinear solvers, including the ones that accompany this book, assume that the errors in the evaluation are on the order of machine roundoff and, therefore, use a difference increment of ≈ 10⁻⁷ for finite difference Jacobians and Jacobian-vector products. If the error in your function evaluation is larger than that, the Newton direction can be poor enough for the iteration to fail. The causes in practice are less clear: errors in programming (a.k.a. bugs) are the likely source, but internal tolerances to algorithms within the computation of F may be too loose, or internal calculations based on table lookup and interpolation may be inaccurate. Thinking about the errors in your function and, if necessary, changing the difference increment in the solvers will usually solve this problem.

No solution
If your problem has no solution, then any solver will have trouble. If F is a model of a physical problem, the model itself may be wrong. The algorithm for computing F, while technically correct, may have been realized in a way that destroys the solution. If the iteration fails to converge, you should see if you've made a modeling error and thus posed a problem with no solution. If F(x) = e⁻ˣ, then the Newton iteration will diverge to +∞ from any starting point. If F(x) = x² + 1, the Newton-Armijo iteration will converge to 0, the minimum of |F(x)|, which is not a root. In this case the step lengths approach zero, so if one terminates when the step is small and fails to check that F is approaching zero, one can incorrectly conclude that a root has been found.

Singular Jacobian
The case where F′ approaches singularity is particularly dangerous. An example in Chapter 2 illustrates how an unfortunate choice of initial iterate can lead to this behavior.

Alternatives to Newton-Armijo
If you find that a Newton-Armijo code fails for your problem, there are alternatives to line search globalization that, while complex and often more costly, can be more robust than Newton-Armijo. Among these methods are trust region methods [24,60], homotopy [78], and pseudotransient continuation [44]. There are public domain codes for the first two of these alternatives.
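The e⁻ˣ example can be checked directly: since F′(x) = −e⁻ˣ, the Newton step is exactly +1 at every iterate, so the sequence marches off to +∞ by one unit per step while the residual decays but never reaches zero. A small Python illustration (not from the book):

```python
import math

def newton_on_exp(x0, steps):
    """Newton's method on F(x) = exp(-x), which has no root.

    The step is -F/F' = -exp(-x)/(-exp(-x)) = +1 exactly, so the
    iterates diverge to +infinity one unit at a time."""
    x = x0
    for _ in range(steps):
        F = math.exp(-x)
        Fp = -math.exp(-x)
        x = x - F / Fp
    return x

# Starting from 0, five steps give x = 5; the residual exp(-5) is small
# but the iteration will never terminate at a root.
```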

1.9.3 Failure of the Line Search
If the line search reduces the step size to an unacceptably small value and the Jacobian is not becoming singular, the quality of the Newton direction is poor. We repeat the caution from section 1.7.1 that the theory for convergence of the Armijo rule depends on using the exact Jacobian, or a very accurate approximation. A difference approximation to a Jacobian or Jacobian-vector product is usually, but not always, sufficient. If the errors in F are significantly larger than floating point roundoff, then increase the difference increment in a difference Jacobian to roughly the square root of the errors in the function [42]. If these methods fail, an analytic Jacobian may make the line search perform much better. Failure of the line search in a Newton-Krylov iteration may be a symptom of loss of orthogonality in the linear solver. See section 3.2 for more about this problem.

1.9.4 Slow Convergence
If you use Newton's method and observe slow convergence, the chances are good that the Jacobian, Jacobian-vector product, or linear solver is inaccurate. The local superlinear convergence results from Theorems 1.1 and 1.3 only hold if the correct linear system is solved to high accuracy. If you expect to see superlinear convergence, but do not, you might try these things:
• Check your computation of the Jacobian (by comparing it to a difference, for example).
• If you are using a sparse-matrix code to solve for the Newton step, be sure that you have specified the correct sparsity pattern.
• Make sure the tolerances for an iterative linear solver are set tightly enough to get the convergence you want.
• Check for errors in the preconditioner and try to investigate its quality.

The difference increment in a forward difference approximation to a Jacobian or a Jacobian-vector product should be a bit more than the square root of the error in the function. One should scale the finite difference increment to reflect the size of x (see section 2.3). Our codes use h = 10⁻⁷, which is a good choice unless the function contains components such as a table lookup or output from an instrument that would reduce the accuracy. Central difference approximations, where the optimal increment is roughly the cube root of the error in the function, can improve the performance of the solver, but for large problems the cost, twice that of a forward difference, is rarely justified.
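The increment rules quoted above (square root of the function error for forward differences, cube root for central differences) are easy to check numerically. The sketch below, which is not from the book, uses double-precision roundoff as the error level:

```python
import math

def fd_derivative(f, x, h):
    """One-sided forward difference with increment h."""
    return (f(x + h) - f(x)) / h

def cd_derivative(f, x, h):
    """Central difference: about twice the cost of a forward difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

eps = 2.0 ** -52            # double-precision roundoff
h_fwd = math.sqrt(eps)      # ~1.5e-8: square root of the error in f
h_ctr = eps ** (1.0 / 3.0)  # ~6e-6: cube root of the error in f

f, fprime = math.sin, math.cos
err_fwd = abs(fd_derivative(f, 1.0, h_fwd) - fprime(1.0))
err_ctr = abs(cd_derivative(f, 1.0, h_ctr) - fprime(1.0))
# Both errors are tiny; the central difference is typically a few digits
# more accurate, at twice the number of function evaluations.
```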

• If you are using a GMRES solver, make sure that you have not lost orthogonality (see section 3.2).

1.9.5 Multiple Solutions
In general, there is no guarantee that an equation has a unique solution. The solvers we discuss in this book, as well as the alternatives we listed in section 1.9.2, are supported by the theory that says that either the solver will converge to a root or it will fail in some well-defined manner. No theory can say that the iteration will converge to the solution that you want. Several of the problems we discuss in Chapters 2 and 3 have multiple solutions.

1.9.6 Storage Problems
If your problem is large and the Jacobian is dense, you may be unable to store that Jacobian. If your Jacobian is sparse, you may not be able to store the factors that the sparse Gaussian elimination in MATLAB creates. Even if you use an iterative method, you may not be able to store the data that the method needs to converge; GMRES needs a vector for each linear iteration, for example. Many computing environments, MATLAB among them, will tell you that there is not enough storage for your job. MATLAB, for example, will print this message:

Out of memory. Type HELP MEMORY for your options.

When this happens, you can find a way to obtain more memory or a larger computer, or use a solver that requires less storage. The Newton-Krylov methods and Broyden's method are good candidates for the latter.

Other computing environments solve run-time storage problems with virtual memory. This means that data are sent to and from disk as the computation proceeds. This is called paging and will slow down the computation by factors of 100 or more. This is rarely acceptable. Your best option is to find a computer with more memory.

Modern computer architectures have complex memory hierarchies. The registers in the CPU are the fastest; below the registers can be several layers of cache memory; below the cache is RAM; and below that is disk. Cache memory is faster than RAM, but much more expensive, so a cache is small. You do best if you can keep data in registers as long as possible. Simple things such as ordering loops to improve the locality of reference can speed up a code dramatically. The discussion of loop ordering in [23] is a good place to start learning about efficient programming for computers with memory hierarchies. You probably don't have to think about cache in MATLAB, but in FORTRAN or C you do.

1.10 Three Codes for Scalar Equations
Three simple codes for scalar equations illustrate the fundamental ideas well. newtsol.m, chordsol.m, and secant.m are MATLAB implementations of Newton's method, the chord method, and the secant method, respectively.

They have features in common with the more elaborate codes from the rest of the book. As codes for scalar equations, they do not need to pay attention to numerical linear algebra or worry about storing the iteration history.

1.10.1 Common Features
The three codes require an initial iterate x, the function f, and relative and absolute residual tolerances tola and tolr. The calling sequence is

[x, hist] = solver(x, f, tola, tolr)

or, if you're not interested in the history array,

x = solver(x, f, tola, tolr).

The output is the final result and (optionally) a history of the iteration. The history is kept in a two- or four-column hist array. The first column is the iteration counter and the second the absolute value of the residual after that iteration. The third and fourth, for Newton's method only, are the number of times the line search reduced the step size and the Newton sequence {xn}. Of course, one need not keep the iteration number in the history, and our codes for systems do not, but doing so makes it as easy as possible to plot iteration statistics. One MATLAB command will make a semilog plot of the residual history:

semilogy(hist(:,1), hist(:,2))

Each of the scalar codes has a limit of 100 nonlinear iterations.

1.10.2 newtsol.m
newtsol.m is the only one of the scalar codes that uses a line search. The step-length reduction is done by halving, not by the more sophisticated polynomial model based method used in the codes for systems of equations. The secant and chord method codes do not use a line search at all, taking the warning in section 1.7.1 a bit too seriously.

newtsol.m lets you choose between evaluating the derivative with a forward difference (the default) and analytically in the function evaluation. jdiff is an optional argument; the calling sequence is

[x, hist] = newtsol(x, f, tola, tolr, jdiff).

Setting jdiff = 1 directs newtsol.m to expect a function f with two output arguments,

[y, yp] = f(x),

where y = F(x) and yp = F′(x). The most efficient way to write such a function is to only compute F′ if it is requested. The function fatan.m returns the arctan function and, optionally, its derivative. Here is an example.

function [y, yp] = fatan(x)
% FATAN Arctangent function with optional derivative
% [Y,YP] = FATAN(X) returns Y = atan(X) and
% (optionally) YP = 1/(1+X^2).
y = atan(x);
if nargout == 2
    yp = 1/(1+x^2);
end

Here is a call to newtsol followed by an examination of the first five rows of the history array:

>> x0=10; tol=1.d-12;
>> [x,hist] = newtsol(x0, 'fatan', tol, tol);
>> hist(1:5,:)
ans =
            0   1.4711e+00            0   1.0000e+01
   1.0000e+00   1.4547e+00   3.0000e+00  -8.5730e+00
   2.0000e+00   1.3724e+00   3.0000e+00   4.9730e+00
   3.0000e+00   1.3170e+00   2.0000e+00  -3.8549e+00
   4.0000e+00   9.3921e-01   2.0000e+00   1.3670e+00

The third column tells us that the step size was reduced for the first through fourth iterates. After that, full steps were taken. The fourth column contains the Newton sequence. This is the information we need to locate the circles and the numbers on the graph in Figure 1.4. Once we know that the line search is active only on iterations 1, 2, 3, and 4, we can use rows 2 through 5 of the history array to plot circles where the line search was required. The code below, for example, creates the plot in Figure 1.4:

% EXAMPLE
% Draw Figure 1.4.
%
x0=10; tol=1.d-12;
[x,hist] = newtsol(x0, 'fatan', tol, tol);
semilogy(hist(:,1), abs(hist(:,2)), ...
    hist(2:5,1), abs(hist(2:5,2)), 'o')
xlabel('iterations'); ylabel('function absolute values');

1.10.3 chordsol.m
chordsol.m approximates the Jacobian at the initial iterate with a forward difference and uses that approximation for the entire nonlinear iteration. The calling sequence is


[x, hist] = chordsol (x, f, tola, tolr). The hist array has two columns, the iteration counter and the absolute value of the nonlinear residual. If you write / as you would for newtsol.m, with an optional second output argument, chordsol.m will accept it but won't exploit the analytic derivative. We invite the reader to extend chordsol.m to accept analytic derivatives; this is not hard to do by reusing some code from newtsol.m.
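chordsol.m's strategy, freezing the derivative at the initial iterate and reusing it for every step, can be sketched in a few lines of Python (a hypothetical scalar version, not the book's code):

```python
def chord_scalar(x, f, fprime, tol_a=1e-12, tol_r=1e-12, max_it=100):
    """Scalar chord iteration: the derivative is evaluated once, at the
    initial iterate, and reused for every subsequent step."""
    fp0 = fprime(x)              # frozen derivative, the scalar "Jacobian"
    fx = f(x)
    tau = tol_r * abs(fx) + tol_a
    hist = [(0, abs(fx))]
    for n in range(1, max_it + 1):
        if abs(fx) <= tau:
            break
        x = x - fx / fp0         # same direction formula, stale slope
        fx = f(x)
        hist.append((n, abs(fx)))
    return x, hist
```

On f(x) = x² − 4 with x0 = 3 this converges q-linearly to 2; the iteration map has derivative 1 − f′(x*)/f′(x0) = 1/3 at the root, so each step cuts the error by roughly a factor of 3, in contrast to Newton's q-quadratic convergence.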

1.10.4 secant.m

The secant method needs two approximations to x* to begin the iteration. secant.m uses the initial iterate x0 = x and then sets x₋₁ = 0.99x0.

When stagnation takes place, a secant method code must take care to avoid division by zero in (1.8). secant.m does this by only updating the iteration if x_{n−1} ≠ x_n. The calling sequence is the same as for chordsol.m:

[x, hist] = secant(x, f, tola, tolr).

The three codes newtsol.m, chordsol.m, and secant.m were used together in tandemo.m to create Figure 1.3, Table 1.1, and Figure 1.2. The script begins with initialization of the solvers and calls to all three:

% EXAMPLE
% Draw Figure 1.3.
%
x0=4.5; tol=1.d-20;
%
% Solve the problem three times.
%
[x,hist]=newtsol(x0,'ftan',tol,tol,1);
[x,histc]=chordsol(x0,'ftan',tol,tol);
[x,hists]=secant(x0,'ftan',tol,tol);
%
% Plot 15 iterations for all three methods.
%
maxit=15;
semilogy(hist(1:maxit,1),abs(hist(1:maxit,2)),'-',...
    histc(1:maxit,1),abs(histc(1:maxit,2)),'--',...
    hists(1:maxit,1),abs(hists(1:maxit,2)),'-.');
legend('Newton','Chord','Secant');
xlabel('Nonlinear iterations');
ylabel('Absolute Nonlinear Residual');
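The stagnation guard in secant.m translates directly: skip the update whenever the two most recent iterates (or function values) coincide, so the secant formula never divides by zero. A Python sketch (the second starting point here is a hypothetical choice, not necessarily the one secant.m uses):

```python
def secant_scalar(x, f, tol_a=1e-12, tol_r=1e-12, max_it=100):
    """Scalar secant iteration with a guard against division by zero."""
    xm = 0.99 * x if x != 0 else 1.0e-4   # hypothetical second starting point
    fm, fx = f(xm), f(x)
    tau = tol_r * abs(fx) + tol_a
    for _ in range(max_it):
        if abs(fx) <= tau:
            break
        if x == xm or fx == fm:           # stagnation: secant slope undefined
            break
        x, xm = x - fx * (x - xm) / (fx - fm), x
        fm, fx = fx, f(x)
    return x
```

Unlike the chord method, the secant slope is updated at every step, which is what buys the superlinear local convergence rate.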

1.11 Projects

1.11.1 Estimating the q-order

One can examine the data in the it_hist array to estimate the q-order in the following way. If xn → x* with q-order p, then one might hope that

‖F(x_{n+1})‖ ≈ K‖F(x_n)‖^p

for some K > 0. If that happens, then, as n → ∞,

log ‖F(x_{n+1})‖ ≈ log K + p log ‖F(x_n)‖

and so

p ≈ log ‖F(x_{n+1})‖ / log ‖F(x_n)‖.

Hence, by looking at the it_hist array, we can estimate p. This MATLAB code uses nsold.m to do exactly that for the functions f(x) = x − cos(x) and f(x) = arctan(x).
% QORDER a program to estimate the q-order
%
% Set nsold for Newton's method, tight tolerances.
%
x0 = 1.0; parms = [40,1,0]; tol = [1.d-8,1.d-8];
[x,histc] = nsold(x0, 'fcos', tol, parms);
lhc = length(histc(:,2));
%
% Estimate the q-order.
%
qc = log(histc(2:lhc,1))./log(histc(1:lhc-1,1));
%
% Try it again with f(x) = atan(x).
%
[x,histt] = nsold(x0, 'atan', tol, parms);
lht = length(histt(:,2));
%
% Estimate the q-order.
%
qt = log(histt(2:lht,1))./log(histt(1:lht-1,1));

If we examine the last few elements of the arrays qc and qt we should see a good estimate of the q-order until the iteration stagnates. The last three elements of qc are 3.8, 2.4, and 2.1, as close to the quadratic convergence q-order of 2 as we're likely to see. For f(x) = arctan(x), the residual at the end is 2 × 10⁻²⁴, and the final four elements of qt are 3.7, 3.2, 3.2, and 3.1. In fact, the correct q-order for this problem is 3. Why?

Apply this idea to the secant and chord methods for the example problems in this chapter. Try it for sin(x) = 0 with an initial iterate of x0 = 3. Are the estimated q-orders consistent with the theory? Can you explain the q-order that you observe for the secant method?
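The same log-ratio estimate is easy to replicate outside MATLAB. This Python sketch (not from the book) applies it to an artificial residual sequence with exact q-quadratic decay:

```python
import math

def estimate_qorder(residuals):
    """Estimate p via log||F(x_{n+1})|| / log||F(x_n)||.

    Meaningful once the residual norms are below 1, so that the
    logarithms are negative and the ratio tends to p."""
    return [math.log(residuals[n + 1]) / math.log(residuals[n])
            for n in range(len(residuals) - 1)]

# Residuals obeying r_{n+1} = r_n^2, i.e., exact q-quadratic convergence:
r = [1e-1, 1e-2, 1e-4, 1e-8, 1e-16]
q = estimate_qorder(r)   # every entry is (numerically) 2
```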

1.11.2 Singular Problems

Solve F(x) = x² = 0 with Newton's method, the chord method, and the secant method. Try the alternative iteration

x_{n+1} = x_n − 2F(x_n)/F′(x_n).

Chapter 2

Finding the Newton Step with Gaussian Elimination

Direct methods for solving the equation for the Newton step are a good idea if
• the Jacobian can be computed and stored efficiently and
• the cost of the factorization of the Jacobian is not excessive or
• iterative methods do not converge for your problem.
Direct methods are more robust than iterative methods and do not require your worrying about the possible convergence failure of an iterative method or preconditioning. Even when direct methods work well, Jacobian factorization and storage of that factorization may be more expensive than a solution by iteration. However, one can, of course, approximate the Jacobian or evaluate it only a few times during the nonlinear iteration, exchanging an increase in the number of nonlinear iterations for a dramatic reduction in the cost of the computation of the steps.

2.1 Direct Methods for Solving Linear Equations
In this chapter we solve the equation for the Newton step with Gaussian elimination. If the linear equation for the Newton step is solved exactly and the Jacobian is computed and factored with each nonlinear iteration (i.e., η = 0 in Algorithm nsolg), one should expect to see q-quadratic convergence until finite-precision effects produce stagnation (as predicted in Theorem 1.2). The typical implementation of Gaussian elimination, called an LU factorization, factors the coefficient matrix A into a product of a permutation matrix and lower and upper triangular factors:

PA = LU.

The factorization may be simpler and less costly if the matrix has an advantageous structure (sparsity, symmetry, positivity, for example) [1,27,32,74,76]. As is standard in numerical linear algebra (see [23,27], for example), we distinguish between the factorization and the solve.

The permutation matrix reflects row interchanges that are done during the factorization to improve stability. Most linear algebra software [1,27] manages the permutation for you in some way. In MATLAB, for example, P is not explicitly referenced in the LU factorization [l,u]=lu(A) returned by the MATLAB command; the permutation is instead encoded in l, which is a row permutation of a unit lower triangular matrix. We will ignore the permutation for the remainder of this chapter, but the reader should remember that it is important.

The cost of an LU factorization of an N × N matrix is N³/3 + O(N²) flops, where, following [27], we define a flop as an add, a multiply, and some address computations. Following the factorization, one can solve the linear system As = b by solving the two triangular systems Lz = b and Us = z. The cost of the two triangular solves is N² + O(N) flops, so the factorization is the most expensive part of the solution. The factorization can fail if, for example, F′ is singular or highly ill conditioned.

2.2 The Newton-Armijo Iteration
Algorithm newton is an implementation of Newton's method that uses Gaussian elimination to compute the Newton step. The significant contributors to the computational cost are the computation and LU factorization of the Jacobian.
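The factor-once, solve-cheaply division of labor can be sketched in Python with a hand-rolled LU with partial pivoting (a teaching sketch, written assuming no library factorization is available; a production code would call a LAPACK-backed routine instead):

```python
import numpy as np

def lu_factor(A):
    """LU with partial pivoting: returns L, U, piv with A[piv] = L @ U."""
    A = A.astype(float).copy()
    n = A.shape[0]
    piv = np.arange(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))   # row interchange for stability
        if p != k:
            A[[k, p]] = A[[p, k]]
            piv[[k, p]] = piv[[p, k]]
        if A[k, k] == 0.0:
            raise RuntimeError("factorization failed: singular matrix")
        A[k+1:, k] /= A[k, k]                 # multipliers stored below diagonal
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
    L = np.tril(A, -1) + np.eye(n)
    U = np.triu(A)
    return L, U, piv

def lu_solve(L, U, piv, b):
    """Two O(N^2) triangular solves: Lz = Pb, then Us = z."""
    n = b.size
    z = np.zeros(n)
    for i in range(n):                        # forward substitution (L unit diagonal)
        z[i] = b[piv[i]] - L[i, :i] @ z[:i]
    s = np.zeros(n)
    for i in range(n - 1, -1, -1):            # back substitution
        s[i] = (z[i] - U[i, i+1:] @ s[i+1:]) / U[i, i]
    return s
```

The O(N³) cost sits entirely in lu_factor; each lu_solve is only the two O(N²) triangular sweeps, which is why reusing a factorization across several steps (as the chord method does) pays off.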

Algorithm 2.1. newton(x, F, τa, τr)
    Evaluate F(x); τ ← τr‖F(x)‖ + τa.
    while ‖F(x)‖ > τ do
        Compute F′(x); factor F′(x) = LU.
        if the factorization fails then
            report an error and terminate
        else
            solve LUs = −F(x)
        end if
        Find a step length λ using a polynomial model.
        x ← x + λs
        Evaluate F(x).
    end while

2.3 Computing a Finite Difference Jacobian
The effort in the computation of the Jacobian can be substantial. In some cases one can compute the function and the Jacobian at the same time and the Jacobian costs little more than the evaluation of the function (see the example in section 2.7.3). However, if only function evaluations are available, then approximating the Jacobian by differences is the only option, and a finite difference Jacobian costs N function evaluations.

One computes the forward difference approximation (∇hF)(x) to the Jacobian by columns; each column requires one new function evaluation. The jth column is

(∇hF)(x)ej = [F(x + hej) − F(x)]/h.    (2.1)

In (2.1) ej is the unit vector in the jth coordinate direction. As we said in Chapter 1, the difference increment h should be no smaller than the square root of the inaccuracy in F, and therefore roughly the square root of the error in F. This usually causes no problems in the nonlinear iteration, and a forward difference approximation is probably sufficient.

The difference increment in (2.1) should be scaled. Rather than simply perturb x by a difference increment h in each coordinate direction, we multiply the perturbation to compute the jth column by a factor σj, with a view toward varying the correct fraction of the low-order bits in (x)j. While this scaling usually makes little difference, it can be crucial if |(x)j| is very large. Note that we do not make adjustments if |(x)j| is very small, because the lower limit on the size of the difference increment is determined by the error in F.
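A column-at-a-time forward difference Jacobian with this kind of scaling looks like the following in Python (a sketch, not the book's code; the sign convention used for the scaling takes the sign of zero to be +1):

```python
import numpy as np

def fd_jacobian(F, x, F0=None, h=1.0e-7):
    """Forward difference Jacobian by columns.

    The perturbation in column j is scaled by
    sigma_j = max(|x_j|, 1) * sign_+(x_j), with sign_+(0) = 1, so that
    roughly the low-order half of the digits of x_j are varied."""
    if F0 is None:
        F0 = F(x)                       # reuse F(x) for every column
    n = x.size
    J = np.zeros((F0.size, n))
    for j in range(n):
        sigma = max(abs(x[j]), 1.0) * (1.0 if x[j] >= 0 else -1.0)
        xp = x.copy()
        xp[j] += sigma * h
        J[:, j] = (F(xp) - F0) / (sigma * h)
    return J
```

Each column costs one new function evaluation, N in total, which is the cost quoted in the text for a dense difference Jacobian.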

If evaluations of F are accurate to 16 decimal digits, for example, the difference increment should change roughly the last 8 digits of x. Hence we use the scaled perturbation σjh, where

σj = max(|(x)j|, 1) sign((x)j)    (2.2)

and, in (2.2), sign(t) = 1 for t ≥ 0 and sign(t) = −1 for t < 0. This is different from the MATLAB sign function, for which sign(0) = 0.

The cost estimates for a difference Jacobian change if F′ is sparse. If F′ is sparse, then one can compute a numerical Jacobian several columns at a time, so one can compute several columns of the Jacobian with a single new function evaluation. The methods for doing this for general sparsity patterns [20,21] are too complex for this book, but we can illustrate the ideas with a forward difference algorithm for banded Jacobians.

A matrix A is banded with upper bandwidth nu and lower bandwidth nl if

Aij = 0 if j < i − nl or j > i + nu.

The LU factorization of a banded matrix takes less time and less storage than that of a full matrix [23]. The factors have at most nl + nu + 1 nonzeros. The cost of the factorization, when nl and nu are small in comparison to N, is 2N nl nu (1 + o(1)) floating point operations. The MATLAB sparse matrix commands exploit this structure.

The Jacobian F′ is banded with upper and lower bandwidths nu and nl if (F)i depends only on (x)j for i − nl ≤ j ≤ i + nu. For example, if F′ is tridiagonal, then nl = nu = 1 and only (F)1 and (F)2 depend on (x)1. Since (F)k for k > 4 is completely independent of any variables upon which (F)1 or (F)2 depend, we can differentiate F with respect to (x)1 and (x)4 at the same time. If F′ is tridiagonal, we can let

p1 = e1 + e4 + e7 + ⋯

and compute

DhF = [F(x + hp1) − F(x)]/h.

From DhF we can recover the first, fourth, ... columns of ∇hF as

(∇hF)ij ≈ (DhF)i for j = 1, 4, 7, … and j − 1 ≤ i ≤ j + 1,

since each component of DhF is influenced by at most one of the perturbed coordinates. We can compute the remainder of the Jacobian after only two more evaluations. If we set

p2 = e2 + e5 + e8 + ⋯ and p3 = e3 + e6 + e9 + ⋯,

we can use formulas analogous to (2.4) and (2.5) to obtain the second, fifth, ... columns from the perturbation in the direction p2, and the final third of the columns from the perturbation in the direction p3. Hence a tridiagonal Jacobian can be approximated with differences using only three new function evaluations.

For a general banded matrix, the bookkeeping is a bit more complicated, but the central idea is the same. If we perturb in the first coordinate direction, then (F)k depends on (x)1 for 1 ≤ k ≤ 1 + nl, so we cannot perturb in any other direction that influences any (F)k that depends on (x)1. Hence the next admissible coordinate for perturbation is 2 + nl + nu, and we can compute the forward difference approximations of ∂F/∂(x)1 and ∂F/∂(x)2+nl+nu with a single perturbation. Continuing in this way, we define pk for 1 ≤ k ≤ 1 + nl + nu by

pk = ek + ek+(1+nl+nu) + ek+2(1+nl+nu) + ⋯,

where there are k − 1 zeros before the first one and nl + nu zeros between the ones. By using the vectors {pk} as the differencing directions, we can compute the forward difference Jacobian with 1 + nl + nu perturbations.

Our nsold.m solver uses this algorithm if the upper and lower bandwidths are given as input arguments. The matrix is stored in MATLAB's sparse format. When MATLAB factors a matrix in this format, it uses efficient factorization and storage methods for banded matrices.

function jac = bandjac(f,x,f0,nl,nu)
% BANDJAC  Compute a banded Jacobian f'(x) by forward differences.
%
% Inputs: f, x = function and point
%         f0 = f(x), precomputed function value
%         nl, nu = lower and upper bandwidth
%
n = length(x);
jac = sparse(n,n);
dv = zeros(n,1);
epsnew = 1.d-7;
%
% delr(ip)+1 = next row to include after ip in the
%              perturbation vector pt
% il(ip), ih(ip) = range of indices that influence f(ip)
%
for ip = 1:n
    delr(ip) = min([nl+nu+ip,n]);
    ih(ip) = min([ip+nl,n]);
    il(ip) = max([ip-nu,1]);
end
%
% Sweep through the delr(1) perturbations of f.
%
for is = 1:delr(1)
    ist = is;
    %
    % Build the perturbation vector.
    %
    pt = zeros(n,1);
    while ist <= n
        pt(ist) = 1;
        ist = delr(ist)+1;
    end
    %
    % Compute the forward difference.
    %
    x1 = x+epsnew*pt;
    f1 = feval(f,x1);
    dv = (f1-f0)/epsnew;
    %
    % Fill the appropriate columns of the Jacobian.
    %
    ist = is;
    while ist <= n
        ilt = il(ist); iht = ih(ist);
        m = iht-ilt;
        jac(ilt:iht,ist) = dv(ilt:iht);
        ist = delr(ist)+1;
    end
end
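The tridiagonal case fits in a few lines of Python (a sketch of the same coloring idea, not a translation of bandjac.m): three perturbation vectors p1, p2, p3 recover all of the columns.

```python
import numpy as np

def tridiag_fd_jacobian(F, x, h=1.0e-7):
    """Forward difference approximation to a tridiagonal Jacobian using
    only three new function evaluations.  Columns j, j+3, j+6, ... are
    differenced together because they influence disjoint sets of rows."""
    n = x.size
    F0 = F(x)
    J = np.zeros((n, n))
    for start in range(3):              # 1 + nl + nu = 3 perturbations
        p = np.zeros(n)
        p[start::3] = 1.0               # p_k = e_k + e_{k+3} + e_{k+6} + ...
        dF = (F(x + h * p) - F0) / h
        for j in range(start, n, 3):
            lo, hi = max(j - 1, 0), min(j + 1, n - 1)
            J[lo:hi+1, j] = dF[lo:hi+1]  # rows j-1..j+1 belong to column j
    return J
```

The same bookkeeping with 1 + nl + nu perturbation vectors handles a general banded Jacobian, as bandjac.m does above.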

The internal MATLAB code numjac is a more general finite difference Jacobian code. numjac was designed to work with the stiff ordinary differential equation integrators [68] in MATLAB. numjac will, for example, let you input a general sparsity pattern for the Jacobian and then use a sophisticated sparse differencing algorithm.

2.4 The Chord and Shamanskii Methods

If the computational cost of a forward difference Jacobian is high (F is expensive and/or N is large) and if an analytic Jacobian is not available, it is wise to amortize this cost over several nonlinear iterations. The chord method does exactly that. Recall that the chord method differs from Newton's method in that the evaluation and factorization of the Jacobian are done only once, for F'(x0), and the computation of the step is based on an LU factorization of the Jacobian at an iterate that is generally not the current one. Algorithms chord and shamanskii below are special cases of nsolg, so the step and the direction are the same; global convergence problems have been ignored.

Algorithm 2.3. chord(x, F, τ_a, τ_r)
    Evaluate F(x); τ ← τ_r ||F(x)|| + τ_a.
    Compute F'(x); factor F'(x) = LU.
    if the factorization fails then
        report an error and terminate
    else
        while ||F(x)|| > τ do
            Solve LUs = -F(x).
            x ← x + s
            Evaluate F(x).
        end while
    end if

While the convergence is only q-linear, and more nonlinear iterations will be needed than for Newton's method, the overall cost of the solve will usually be much less, since both the N function evaluations and the O(N^3) work (in the dense matrix case) in the matrix factorization are done only once. The advantages of the chord method increase as N increases. The chord method is the solver of choice in many codes for stiff initial value problems [3,61], where the Jacobian may not be updated for several time steps.

A middle ground is the Shamanskii method [66]. Here the Jacobian evaluation and factorization are done after every m computations of the step.
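The amortization idea can be sketched in a few lines. The book's solvers are MATLAB; the following is an illustrative NumPy version (the name shamanskii and the use of numpy.linalg.solve in place of a stored LU factorization are my simplifications): with m = 1 it is Newton's method, and with a very large m it is the chord method, which evaluates the Jacobian only at the initial iterate.

```python
import numpy as np

def shamanskii(x, F, J, atol, rtol, m, maxit=200):
    # Amortized Newton iteration: the Jacobian is evaluated (and, in a
    # serious implementation, LU-factored once) only every m steps.
    # m = 1 is Newton's method; very large m is the chord method.
    f = F(x)
    tau = rtol * np.linalg.norm(f) + atol
    it = 0
    while np.linalg.norm(f) > tau and it < maxit:
        Jc = J(x)                       # refreshed once per m steps
        for _ in range(m):
            x = x + np.linalg.solve(Jc, -f)
            f = F(x)
            it += 1
            if np.linalg.norm(f) <= tau:
                break
    return x, it
```

On a well-behaved problem the chord variant takes many more (q-linearly convergent) steps than Newton, but each step after the first avoids the Jacobian evaluation and factorization, which is exactly the trade the text describes.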

Algorithm 2.4. shamanskii(x, F, τ_a, τ_r, m)
    Evaluate F(x); τ ← τ_r ||F(x)|| + τ_a.
    while ||F(x)|| > τ do
        Compute F'(x); factor F'(x) = LU.
        if the factorization fails then report an error and terminate end if
        for p = 1 : m do
            Solve LUs = -F(x).
            x ← x + s
            Evaluate F(x); if ||F(x)|| <= τ terminate.
        end for
    end while

Newton's method, of course, is the m = 1 case. If one counts as a complete iteration the full m steps between Jacobian computations and factorizations, then the Shamanskii method converges q-superlinearly with q-order m + 1; i.e., for some K > 0,

    ||e_{n+1}|| <= K ||e_n||^{m+1}.

You should think of the chord and Shamanskii methods as local algorithms, to which a code will switch after a Newton-Armijo iteration has resolved any global convergence problems.

2.5 What Can Go Wrong?

The list in section 1.9 is complete, but it's worth thinking about a few specific problems that can arise when you compute the Newton step with a direct method.

2.5.1 Poor Jacobians

The chord method and other methods that amortize factorizations over many nonlinear iterations perform well because factorizations are done infrequently. This means that the Jacobians will be inaccurate. If the initial iterate is good, the Jacobians will be accurate enough for the overall performance to be far better than a Newton iteration. However, if your initial iterate is far from a solution, the convergence may be slower than you'd like. The major point to remember is that, if you use an approximation to the Jacobian, this inaccuracy can cause a line search to fail; even if the initial iterate is acceptable, the line search can fail. Our code nsold.m (see section 2.6) watches for these problems and updates the Jacobian if either the line search fails or the rate of reduction in the nonlinear residual is too slow.

2.5.2 Finite Difference Jacobian Error

The choice of finite difference increment h deserves some thought. You were warned in Chapter 1 that the difference increment in a forward difference approximation to a Jacobian or a Jacobian-vector product should be a bit more than the square root of the error in the function. Most codes, including ours, assume that the error in the function is on the order of floating point roundoff. If that assumption is not valid for your problem, the difference increment must be adjusted to reflect that. Check that you have scaled the difference increment to reflect the size of x. If the components of x differ dramatically in size, one should scale h, as we did in (2.1), or consider a change of independent variables to rescale them. Switching to centered differences can also help, but the cost of a centered difference Jacobian is very high.

Another approach [49,73] uses complex arithmetic to get higher order accuracy. If F is smooth and can be evaluated for complex arguments, then you can get a second-order accurate derivative with a single function evaluation by using the formula

    F'(x)w = Im(F(x + ihw))/h.                                   (2.6)

One should use (2.6) with some care if there are errors in F.

One other approach to more accurate derivatives is automatic differentiation [34]. Automatic differentiation software takes as its input a code for F and produces a code for F and F'. The derivatives are exact, but the codes are usually less efficient and larger than a hand-coded Jacobian program would be. Automatic differentiation software for C and FORTRAN is available from Argonne National Laboratory [38].

2.5.3 Pivoting

If F' is sparse, you may have the option to compute a sparse factorization without pivoting. For sparse problems, the cost of pivoting can be large and it is tempting to avoid it. If, for example, F' is symmetric and positive definite, this is the way to proceed. For general F', however, pivoting can be essential for a factorization to produce useful solutions. If the line search fails and you have disabled pivoting in your sparse factorization, it's probably a good idea to re-enable it.

2.6 Using nsold.m

nsold.m is a Newton-Armijo code that uses Gaussian elimination to compute the Newton step. The calling sequence is

    [sol, it_hist, ierr, x_hist] = nsold(x, f, tol, parms);

The default behavior of nsold.m is to try to avoid computation of the Jacobian and, if the reduction in the norm of the nonlinear residual is large enough (a factor of two), not to update the Jacobian and to reuse the factorization. So, if nsold.m is called with no optional arguments, a forward difference Jacobian is computed and factored only if the ratio ||F(x_n)||/||F(x_{n-1})|| > 0.5 or the line search fails. This means that nsold.m becomes the chord method once the iteration is near the solution.
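The complex-step formula (2.6) from section 2.5.2 is worth a small numerical check. Here is a minimal NumPy sketch (the helper name complex_step_dirder is mine); it assumes the code for F is real-analytic and accepts complex input, in which case the imaginary part carries the directional derivative with no subtractive cancellation, so h can be taken absurdly small.

```python
import numpy as np

def complex_step_dirder(F, x, w, h=1e-20):
    # Directional derivative F'(x) w via the complex-step formula
    # F'(x) w = Im(F(x + i h w)) / h.
    # Unlike a forward difference, there is no subtraction of nearly
    # equal quantities, so the result is accurate to roundoff.
    return np.imag(F(x + 1j * h * w)) / h
```

For functions with floating-point noise, or codes that branch on real comparisons, the formula must be used with the care the text advises.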
In practice this means that the Jacobian is almost always updated in the global phase of the iteration (i.e., when the iteration is far from the solution) and that it is almost never updated in the local phase (i.e., when the iteration is near a solution that satisfies the standard assumptions). In this way the risk of using an out-of-date Jacobian when far from a solution is reduced.

2.6.1 Input to nsold.m

The required input data are an initial iterate x, the function f, and the tolerances for termination. All our codes expect x and f to be column vectors of the same length. As in all our codes, the vector tol = (τ_a, τ_r) contains the tolerances for the termination criterion (1.12).

The syntax for the function f is

    function = f(x)    or    [function, jacobian] = f(x).

The H-equation code heq.m from section 2.7.3 in the software collection is a nontrivial example of a function with an optional Jacobian. The scalar function fatan.m from section 2.7.1 is a simpler example.

The parms array controls the details of the iteration. Its first component, maxit, is the upper limit on the nonlinear iteration; the default is 40, which is usually enough. The next two components are isham and rsham. The Jacobian is computed and factored after every isham nonlinear iterations or whenever the ratio of successive norms of the nonlinear residual is larger than rsham. So, for example, isham = 1 and rsham = 0 is Newton's method. The default is isham = 1000 and rsham = 0.5, so the Jacobian is updated only if the decrease in the nonlinear residual is not sufficiently rapid.

The reader was warned in Chapter 1 that this strategy of reusing an out-of-date Jacobian could defeat the line search. nsold.m takes this danger into account by updating the Jacobian if the reduction in the norm of the residual is too small or if the line search fails (see section 2.5.1).

The next parameter, jdiff, controls the computation of the Jacobian. A forward difference approximation (jdiff = 1) is the default. If it is easy for you to compute a Jacobian analytically, it is generally faster if you do that rather than let nsold compute the Jacobian as a full or banded matrix with a forward difference. If your Jacobian is sparse, but not banded, and you want to use the MATLAB sparse matrix functions, you must compute the Jacobian and store it as a MATLAB sparse matrix. You can leave the jdiff argument out if you want a difference Jacobian and you are not using the banded Jacobian factorization.

If you can provide an analytic Jacobian (using the optional second output argument to the function), set jdiff = 0. Analytic Jacobians almost always make the solver more efficient, but require human effort, sometimes more than is worthwhile; for the H-equation example in section 2.7.3 the difference Jacobian computation takes more time than the rest of the solve! Automatic differentiation (see section 2.5.2) is a different way to obtain exact Jacobian information, but it also requires some human and computational effort. If your Jacobian is banded, give the lower and upper bandwidths to nsold.m as the last two parameters; these can be left out for full Jacobians. If your Jacobian is sparse, MATLAB will automatically use a sparse factorization.

2.6.2 Output from nsold.m

The outputs are the solution sol and, optionally, a history of the iteration, an error flag, and the entire sequence {x_n}. The history array it_hist has two columns. The first is the l2-norm of the nonlinear residual and the second is the number of step-size reductions done in the line search. The sequence of iterates, stored in the columns of the array x_hist, is useful for making movies or generating figures like Figure 2.1. Be warned: asking for the entire iteration history can expend all of MATLAB's storage.

The error flag ierr is useful, for example, if nsold is used within a larger code and one needs a test for success. ierr is 0 if the nonlinear iteration terminates successfully. The failure modes are ierr = 1, which means that the termination criterion is not met after maxit iterations, and ierr = 2, which means that the step length was reduced 20 times in the line search without satisfaction of the sufficient decrease condition (1.21). The limit of 20 can be changed with an internal parameter maxarm in the code.

2.7 Examples

The purposes of these examples are to illustrate the use of nsold.m and to compare the pure Newton's method with the default strategy. We provide codes for each example that call nsold.m twice, once with the default iteration parameters and once with the parameters for Newton's method. Note the parameter jdiff = 0 in the examples, indicating that we provide an analytic Jacobian. We invite the reader to try jdiff = 1. We also give simple examples of how one can use the solver from the command line.
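The line-search failure mode behind ierr = 2 is easy to see in isolation. nsold.m's line search uses a three-point parabolic model (see section 2.7.1); the sketch below, in Python rather than the book's MATLAB and with names of my own choosing, uses plain step halving, but it captures the bookkeeping: count the reductions, and give up with a failure flag after maxarm of them.

```python
import numpy as np

def armijo(x, d, f0, F, maxarm=20, alpha=1e-4):
    # Backtracking line search on ||F||: halve the step until the
    # sufficient decrease condition holds, or fail after maxarm
    # reductions (the analogue of nsold's ierr = 2).
    lam, iarm = 1.0, 0
    n0 = np.linalg.norm(f0)
    while True:
        ft = F(x + lam * d)
        if np.linalg.norm(ft) < (1.0 - alpha * lam) * n0:
            return x + lam * d, ft, iarm, 0
        iarm += 1
        if iarm > maxarm:
            return x, f0, iarm, 1      # armflag = 1: line search failure
        lam /= 2.0
```

For the arctangent problem of section 2.7.1 with x0 = 10, the full Newton step overshoots badly and several halvings are needed before sufficient decrease holds, which is exactly the behavior reported in the second column of it_hist.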

2.7.1 Arctangent Function

This is a simple example to show how a function should be built for nsold.m. The line search in nsold.m uses the polynomial model and, therefore, the iteration history for Newton's method with x0 = 10 is a bit different from that in Figure 1.4. One can run the solver from the command line to get a feel for its operation and its output. In the lines below we apply Newton's method with coarse tolerances and report the solutions and iteration history. With an initial iterate of x0 = 10, even this small problem is difficult for the solver and the step length is reduced many times.

>> x0 = 10;
>> tol = [1.d-2, 1.d-2];
>> params = [40, 1, 0, 0];
>> [sol, hist] = nsold(x0, 'fatan', tol, params);
>> sol
sol =
   9.6605e-04
>> hist
hist =
   1.4711e+00            0
   1.4547e+00   5.0000e+00
   1.3724e+00   3.0000e+00
   1.3170e+00   2.0000e+00
       ...          ...
   9.6605e-04            0

The columns in the hist array are the residual norms and the number of times the line search reduced the step length. The MATLAB code atandemo.m solves this problem using nsold.m with τ_a = τ_r = 10^-6 and compares the iteration histories graphically. It takes several iterations before nsold's default mode stops updating the Jacobian and the two iterations begin to differ. The alert reader will see that the solution and the residual norm are the same to five significant figures. Why is that? Run the code and compare the plots yourself.
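The same experiment can be reproduced outside MATLAB. The following NumPy sketch (my own illustrative code, not atandemo.m) runs damped Newton on arctan(x) = 0 from x0 = 10 with simple step halving in place of the polynomial model; it also shows why the damping is needed, since the full Newton step from x0 = 10 overshoots past -100.

```python
import numpy as np

def newton_armijo_atan(x0, tol=1e-6, maxit=40):
    # Damped Newton for arctan(x) = 0. Without the line search,
    # the full Newton step -arctan(x)*(1 + x^2) overshoots wildly
    # for large |x| and the iteration diverges.
    x = x0
    for _ in range(maxit):
        f = np.arctan(x)
        if abs(f) <= tol:
            break
        d = -f * (1.0 + x * x)          # Newton step for scalar arctan
        lam = 1.0
        while abs(np.arctan(x + lam * d)) >= (1.0 - 1e-4 * lam) * abs(f):
            lam /= 2.0                  # reduce until sufficient decrease
        x += lam * d
    return x
```

From x0 = 10 the damped iteration converges to the root x = 0 in roughly a dozen iterations, with several step reductions early on.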

2.7.2 A Simple Two-Dimensional Example

This example is from [24]. Here N = 2 and

    F(x) = ( (x)_1^2 + (x)_2^2 - 2,  e^{(x)_1 - 1} + (x)_2^2 - 2 )^T.

This function is simple enough for us to put the MATLAB code that computes the function and Jacobian here.

function [f, jac] = simple(x)
% SIMPLE  simple two-dimensional problem with interesting
%         global convergence behavior
%
f = zeros(2,1);
f(1) = x(1)*x(1) + x(2)*x(2) - 2;
f(2) = exp(x(1)-1) + x(2)*x(2) - 2;
%
% Return the Jacobian if it's needed.
%
if nargout == 2
    jac = [2*x(1), 2*x(2); exp(x(1)-1), 2*x(2)];
end

The MATLAB code for this function is simple.m, and the code that generated Figure 2.1 is simpdemo.m. We investigated two initial iterates. In this example τ_a = τ_r = 10^-6. For x0 = (2, 0.5)^T, the step length was reduced twice on the first iteration; full steps were taken after that. This is an interesting example because the iteration can stagnate at a point where F'(x) is singular. If x0 = (3, 5)^T, the line search will fail and the stagnation point will be near the x(1)-axis, where the Jacobian is singular. The iteration that stagnates converges, but not to a root! Line search codes that terminate when the step is small should also check that the solution is an approximate root, perhaps by evaluating F. In Figure 2.1 we plot the iteration history for both choices of initial iterate on a contour plot of ||F||. This is a fragment from simpdemo.m:

% SIMPDEMO
% This program solves the simple two-dimensional
% problem in Chapter 2 and makes Figure 2.1.
%
tol = [1.d-6, 1.d-6];
%
% Create the mesh for the contour plot of ||f||.
%
vl = .25:.5:2;
vr = 2:4:40;
v = [vl, vr];

Here's the code that produced Figure 2.1:

%
% x0 is a good initial iterate.
%
x0 = [2, 0.5]';
params = [40, 1, 0, 0];
[sn, errsn, ierrn, x_hist] = nsold(x0, 'simple', tol, params);
%
% x1 is a poor initial iterate. The iteration from x1 will stagnate
% at a point where F' is singular.
%
x1 = [3, 5]';
[sn2, errsn2, ierrn2, x_hist2] = nsold(x1, 'simple', tol, params);
%
% Draw a contour plot of ||f||.
%
xr = -5:.2:5;
n = length(xr);
z = zeros(n,n);
for i = 1:n
    for j = 1:n
        w = [xr(i), xr(j)]';
        z(i,j) = norm(simple(w));
    end
end
figure(1)
contour(xr, xr, z, v)
hold
%
% Use the x_hist array to plot the iterations on the contour plot.
%
plot(x_hist(1,:), x_hist(2,:), '-*', x_hist2(1,:), x_hist2(2,:), '-o');
legend('Convergence', 'Stagnation');
xlabel('x_1'); ylabel('x_2');
axis([0 5 -5 5])

Figure 2.1. Solution of two-dimensional example with nsold.m.
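For readers without MATLAB, the convergent branch of this example is easy to replicate. The NumPy sketch below (illustrative code of mine, not simpdemo.m) applies plain Newton with the analytic Jacobian from the good initial iterate (2, 0.5)^T; from this starting point even the undamped iteration reaches the root (1, 1)^T.

```python
import numpy as np

def Fsimple(x):
    # The two-dimensional example from [24]
    return np.array([x[0]**2 + x[1]**2 - 2.0,
                     np.exp(x[0] - 1.0) + x[1]**2 - 2.0])

def Jsimple(x):
    # Analytic Jacobian, as in simple.m
    return np.array([[2.0*x[0], 2.0*x[1]],
                     [np.exp(x[0] - 1.0), 2.0*x[1]]])

def newton(x, F, J, tol=1e-10, maxit=20):
    # Undamped Newton iteration
    for _ in range(maxit):
        f = F(x)
        if np.linalg.norm(f) <= tol:
            break
        x = x + np.linalg.solve(J(x), -f)
    return x
```

Note that x = (1, 1)^T really is a root: 1 + 1 - 2 = 0 and e^0 + 1 - 2 = 0. The stagnating branch from (3, 5)^T needs the line search and is best explored with the book's codes.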

2.7.3 Chandrasekhar H-equation

The Chandrasekhar H-equation [15,17] is

    F(H)(μ) = H(μ) - ( 1 - (c/2) ∫_0^1 [μ/(μ + ν)] H(ν) dν )^{-1} = 0.

This equation arises in radiative transfer theory. There are two solutions unless c = 0 or c = 1. The algorithms and initial iterates we use in this book find the solution that is of interest physically [46]. Can you find the other one?

We will approximate the integrals by the composite midpoint rule:

    ∫_0^1 f(μ) dμ ≈ (1/N) Σ_{j=1}^N f(μ_j),

where μ_i = (i - 1/2)/N for 1 ≤ i ≤ N. The resulting discrete problem is

    F(x)_i = (x)_i - ( 1 - (c/2N) Σ_{j=1}^N [μ_i (x)_j/(μ_i + μ_j)] )^{-1}.     (2.8)

We will express (2.8) in a more compact form. Let A be the matrix

    A_{ij} = c μ_i / (2N (μ_i + μ_j)).

Our program heqdemo.m for solving the H-equation stores A as a MATLAB global variable and uses it in the evaluation of both F and F'. Once A is stored, F(x) can be rapidly evaluated as

    F(x)_i = (x)_i - (1 - (Ax)_i)^{-1}.

The Jacobian is given by

    F'(x)_{ij} = δ_{ij} - A_{ij}/(1 - (Ax)_i)^2.
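The compact form above translates directly into array code. Here is an illustrative NumPy sketch (function names are mine; the book's MATLAB codes are heq.m and heqdemo.m) that forms A once and then runs an undamped Newton iteration from the standard initial iterate of all ones, which the text reports is an easy case for c = 0.9.

```python
import numpy as np

def heq_residual_and_jac(x, A):
    # F(x)_i = x_i - (1 - (A x)_i)^(-1); the Jacobian reuses the work:
    # F'(x) = I - diag(ph^2) A  with  ph = 1/(1 - A x).
    ph = 1.0 / (1.0 - A @ x)
    F = x - ph
    J = np.eye(len(x)) - (ph * ph)[:, None] * A
    return F, J

def solve_heq(c=0.9, N=100, tol=1e-10):
    # Midpoint nodes and the kernel matrix A_ij = c mu_i/(2N(mu_i+mu_j))
    mu = (np.arange(1, N + 1) - 0.5) / N
    A = (0.5 * c / N) * mu[:, None] / (mu[:, None] + mu[None, :])
    x = np.ones(N)                      # standard initial iterate
    for _ in range(20):
        F, J = heq_residual_and_jac(x, A)
        if np.linalg.norm(F) <= tol:
            break
        x = x + np.linalg.solve(J, -F)
    return mu, x
```

As the text says, once F has been computed the Jacobian costs almost nothing extra: ph is already available, and J is one outer-product scaling of A away.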

Hence, once F has been computed, there is almost no new computation needed to obtain F'. The MATLAB code for the H-equation is heq.m:

function [h, hjac] = heq(x)
% HEQ  Chandrasekhar H-equation residual.
%      Jacobian uses precomputed data for fast evaluation.
%
% Be sure to store the correct data in the global array A_heq.
%
global A_heq;
n = length(x);
h = ones(n,1)-(A_heq*x);
ph = ones(n,1)./h;
h = x-ph;
if nargout == 2
    hjac = (ph.*ph)*ones(1,n);
    hjac = A_heq.*hjac;
    hjac = eye(n)-hjac;
end

The MATLAB code heqdemo.m solves this equation with initial iterate x0 = (1, ..., 1)^T, c = 0.9, N = 100, and τ_a = τ_r = 10^-6. Notice how the analytic Jacobian appears in the argument list.

% HEQDEMO  This program creates the H-equation example in Chapter 2.
%          Solve the H-equation with the default parameters in nsold
%          and plot the results.
%
global A_heq;
c = .9;
n = 100;
%
% Set the nodal points for the midpoint rule.
%
mu = 1:n; mu = (mu-.5)/n; mu = mu';
%
% Form and store the kernel of the integral operator in a
% global variable.
%
cc = .5*c/n;
A_heq = ones(n,1)*mu';
A_heq = cc*A_heq'./(A_heq+A_heq');
%
tol = [1.d-6, 1.d-6];
x = ones(n,1);

The output is the plot in Figure 2.2.

[hc, errsd, ierrd] = nsold(x, 'heq', tol);
%
% Plot the results.
%
plot(mu, hc);
xlabel('\mu'); ylabel('H');

Figure 2.2. Solution of the H-equation for c = 0.9.

This is a very easy problem and the Jacobian is only computed and factored once with the default settings of the parameters. Things are somewhat different with, for example, c = 1 and rsham = 0.

2.7.4 A Two-Point Boundary Value Problem

This example, an exercise from [3], shows how to use the banded differencing algorithm from section 2.3. We seek v ∈ C^2([0, 20]) such that

    v'' = -(4/t) v' + (1 - t v) v,    v'(0) = 0,  v(20) = 0.

This problem has at least two solutions. One, v = 0, is not interesting, so the objective is to find a nonzero solution. We begin by converting (2.9) to a first-order system for U = (v, v')^T.

The equation for U = (u_1, u_2)^T = (v, v')^T is

    U' = ( u_2,  -(4/t) u_2 + (1 - t u_1) u_1 )^T.

We will discretize this problem with the trapezoid rule [3,40] on an equally spaced mesh {t_i}_{i=1}^N, where t_i = (i - 1) h for 1 ≤ i ≤ N and h = 20/(N - 1). The discretization approximates the differential equation with the 2N - 2 equations for U_i ≈ U(t_i):

    U_{i+1} - U_i - (h/2) ( G(t_{i+1}, U_{i+1}) + G(t_i, U_i) ) = 0,

where G denotes the right side of the first-order system. The boundary data provide the remaining two equations. We can express the problem for {U_i}_{i=1}^N as a nonlinear equation F(x) = 0 with a banded Jacobian of upper and lower bandwidth two by grouping the unknowns at the same point on the mesh:

    (x)_{2i-1} ≈ v(t_i)  and  (x)_{2i} ≈ v'(t_i).

The relation v' = u_2 is expressed in the odd components of F as

    (F)_{2i+1} = v_{i+1} - v_i - (h/2)(v'_{i+1} + v'_i)  for 1 ≤ i ≤ N - 1,

and the even components of F are the discretization of the original differential equation. The boundary conditions v'(0) = 0 and v(20) = 0 are the first and last equations. The MATLAB code for the nonlinear function is bvpsys.m. It is a direct translation of the formulas above.

Calling nsold.m is equally straightforward.

function fb = bvpsys(u)
% BVPSYS  Two-point BVP for two unknown functions.
%         An exercise (p. 187) from [3].
%
%         v'' = -(4/t) v' + (1 - t v) v
%         v'(0) = 0, v(L) = 0
%
global L
n2 = length(u);
n = n2/2;
h = L/(n-1);
fb = zeros(n2,1);
f1 = zeros(n,1);
f2 = zeros(n,1);
r = 0:n-1; r = r'*h;
%
% Separate v and v' from their storage in u.
%
v = u(1:2:n2-1);
vp = u(2:2:n2);
f1(2:n) = v(2:n)-v(1:n-1)-h*.5*(vp(2:n)+vp(1:n-1));
%
% The division by zero really doesn't happen. Fix it up.
%
cof = r; cof(1) = 1;
cof = 4./cof; cof(1) = 0;
rhs = cof.*vp + (r.*v - 1).*v;
f2(1:n-1) = vp(2:n)-vp(1:n-1) + h*.5*(rhs(2:n)+rhs(1:n-1));
%
% Set the boundary conditions.
%
f1(1) = vp(1);        % v'(0) = 0
f2(n) = v(n);         % v(L) = 0
fb(1:2:n2-1) = f1;
fb(2:2:n2) = f2;

bvp2demo.m plots v and v' as functions of t. We can find a nonzero solution, but the solver struggles, with the line search being active for three of the nine iterations, and you may not get the same solution each time! Run the code bvp2demo.m, and then change the initial iterate to the zero vector and see what happens. The zero solution is easy to find, too. We plot the nonzero solution in Figure 2.3.
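The claim that the interleaved ordering gives upper and lower bandwidth two can be verified mechanically. The sketch below is an illustrative Python translation of the residual (my own code, not bvpsys.m); the test differentiates it column by column and checks that every nonzero Jacobian entry satisfies |i - j| ≤ 2.

```python
import numpy as np

def bvpsys(u, L=20.0):
    # Residual for the first-order system; unknowns interleaved as
    # u = (v_1, v'_1, v_2, v'_2, ...), so the Jacobian has upper and
    # lower bandwidth 2.
    n = len(u) // 2
    h = L / (n - 1)
    t = h * np.arange(n)
    v, vp = u[0::2], u[1::2]
    f1 = np.zeros(n)
    f2 = np.zeros(n)
    # odd equations: v' = u_2, trapezoid rule
    f1[1:] = v[1:] - v[:-1] - 0.5 * h * (vp[1:] + vp[:-1])
    # even equations: the second-order ODE, with the t = 0 term zeroed
    cof = np.zeros(n)
    cof[1:] = 4.0 / t[1:]
    rhs = cof * vp + (t * v - 1.0) * v
    f2[:-1] = vp[1:] - vp[:-1] + 0.5 * h * (rhs[1:] + rhs[:-1])
    # boundary conditions
    f1[0] = vp[0]          # v'(0) = 0
    f2[-1] = v[-1]         # v(L) = 0
    fb = np.zeros(2 * n)
    fb[0::2] = f1
    fb[1::2] = f2
    return fb
```

This is exactly the structural fact that lets nsold.m approximate the Jacobian with only 1 + nl + nu = 5 difference evaluations.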

% BVP2DEMO
% This script solves the system of two-point boundary value
% problems in Chapter 2 with nsold.m.
%
global L
L = 20;
n = 800;
nh = n/2;
h = L/(nh-1);
r = 0:nh-1; r = r'*h;
%
% This choice of initial iterate gives the "correct" result.
% Try different initial iterates and
% watch Newton find a different solution!
%
v = exp(-r.*r*.1);
vp = -.2*r.*v;
u = zeros(n,1);
u(1:2:n-1) = v;
u(2:2:n) = vp;
%
% Use Newton's method. The upper and lower bandwidths are both 2.
%
tol = [1.d-12, 1.d-12];
parms = [40, 1, 0, 1, 2, 2];
[sol, it_hist, ierr] = nsold(u, 'bvpsys', tol, parms);
it_hist
v = sol(1:2:n-1);
vp = sol(2:2:n);
plot(r, v, '-', r, vp, '--');
xlabel('t');
legend('v', 'v\prime');

Figure 2.3. Solution of (2.9).

2.7.5 Stiff Initial Value Problems

Nonlinear solvers are important parts of codes for stiff initial value problems. In general terms [3,61], stiffness means that either implicit methods must be used to integrate in time or, in the case of an explicit method, very small time steps must be taken. If the problem is nonlinear, a nonlinear solver must be used at each time step. We refer the reader to the literature for a complete account of how nonlinear solvers are managed in initial value problem codes and focus here on a very basic example.

The most elementary example is the implicit Euler method. To solve the initial value problem with the implicit Euler method, we specify a time step δt and approximate the value of the solution at the mesh point n δt by u^n, where u^n solves the nonlinear equation

    F(u^n) = u^n - u^{n-1} - δt f(u^n) = 0.

The nonlinear solver is given this function and an initial iterate. The initial iterate is usually either U_0 = u^{n-1} or a linear predictor U_0 = 2u^{n-1} - u^{n-2}. The time step h depends on n in any modern initial value problem code, so the solver sees a different function (varying u^{n-1} and h) at each time step. Moreover, the Jacobian is updated very infrequently, rarely at every time step and certainly not at every nonlinear iteration. This combination can lead to problems, but is usually very robust. In most modern codes [3,61] the termination criterion is based on small step lengths, usually something like requiring that ||x_{n+1} - x_n|| be sufficiently small.

As an example, consider the nonlinear parabolic problem

    u_t = u_xx + e^u,  0 < x < 1,  0 < t < 1,

with boundary data u(0,t) = u(1,t) = 0 and initial data u(x,0) = 0. We solve this on a spatial mesh with width δx = 1/64 and use a time step of δt = 0.1. The unknowns are approximations to u(x_i, t_n) for the interior nodes x_i = i δx and times t_n = n δt. The discretized problem is a stiff system of 63 ordinary differential equations.
This eliminates the need to evaluate the function only to verify a termination condition.1.Our discretization in space is the standard central difference approximation to the second derivative with homogeneous Dirichlet boundary conditions. Examples 47 2. As an example.17).

with the backward Euler discretization. FTIME '/.m. 7.dt * (exp(u) . calling nsold.t) = u(l. 7.m with MATLAB global variables.t) = 0. The value of u at the current time and the time step are passed 7. n=length(u).0) =0. so we 7. Nonlinear residual for time-dependent problem in Chapter 2 7. the components of the function F sent to nsold. so we can store the time history of the . we use the banded difference Jacobian approximation in nsold. use the banded differencing function.d2u). 0 < t < 1 7. 7.and superdiagonals. for the nonlinear solver. while computing it analytically is easy. 7t Nonlinear residual for implicit Euler discretization 7. ft=(u . The code timedep. 7. 7. u(0. d2u=d2u/OT2). 7. Finding the Newton Step with Gaussian Elimination For a given time step n and time increment 6t. The discrete second derivative D^ is the tridiagonal matrix with —2 along the diagonal and 1 along the sub. 7. u(x. 7.m integrates the initial value problem. All of this is encoded in the MATLAB code ftime.m are given by for 1 < i < N = 63.m generates the time-space plot of the solution in Figure 2. d2u(l:n-l)=d2u(l:n-l)-u(2:n).48 Chapter 2.4. h=l/(n+l). 7. 7. timedep. d2u is the numerical negative second derivative. 7o TIMEDEP This code solves the nonlinear parabolic pde 7. Newton's method is used 7. 7. to the nonlinear residual as MATLAB global variables. d2u(2:n)=d2u(2:n)-u(l:n-l).m at each time step. The time step and solution are passed as globals. We pass the time step and un~l to ftime. function ft=ftime(u) global uold dt 7. u_t = u_xx + exp(u).m. This problem is 1-D. d2u=2*u. The Jacobian is tridiagonal and. 7.uold) . The Jacobian is tridiagonal. This code has the zero boundary conditions built in.

d-6. global uold dt dt-. dx=l/(nx+l). 7. xval=0:dx:1. 1. uold=zeros(nx. 0. parms=[40.nt).d-6]. end 7.1). uhist(2:nx+1. and a tridiagonal Jacobian. integration and draw a surface plot. uold=unew. Solution of (2. 1]. Newton's method. tol=[l.uhist) y. it_hist. for it=l:nt-l [unew.xval.it+1)=unew. 7. 7. Use tight tolerances.l.l. 1. ierr] =nsold (uold.7. nt=l+l/dt. 7.2.' f time'. tval=0:dt:1. . 7. tol. 1. parms) . mesh(tval. Plot the results.13).4. uhist=zeros(nx+2. nx=63. Figure 2. Examples 49 '/.

Do this for these examples and explain your results. 2.8 2. interpolating the solution to a finer mesh. t) tends to a limit as t —> oo. brsola.8. If the discretization is secondorder accurate and you halve the mesh width at each level. nsoli .m.9999.m.8. Finding the Newton Step with Gaussian Elimination You can see from the plot that u(x. Of course.99.0.44]. Do the data in the itJiist array indicate superlinear convergence? Does the choice of the forcing term in nsoli.1].9. If c 7^ 0.1.36. for c = 0. then the H-equation has two solutions [41.0.50 Chapter 2. using piecewise linear interpolation to move from coarse to fine meshes.1 Projects Chandrasekhar H-equation Solve the H-equation and plot residual histories for all of nsold. resolving on the finer mesh. integrating accurately in time is a wasteful way to solve the steady-state problem.14) would be to solve the time-dependent problem and look for convergence of u as t —> oo. called pseudotransient continuation. Apply this idea to some of the examples in the text. how should you terminate the solver at each level? What kind of iteration statistics would tell you that you've done a satisfactory job? . In this case that limit is a solution of the steady-state (time-independent) equation with boundary data This might give you the idea that one way to solve (2.0.1. and then repeating the process until you have a solution on a target mesh. but an extension of this idea. you can estimate the q-factor by examining the ratios of successive residual norms. does work [19.1. For c £ (—00.0. How would you compute them? 2. Try to find the other one.5.m.52]. 25.m affect your results? If you suspect the convergence is q-linear. the two solutions are complex. This is especially entertaining for c < 0.2 Nested Iteration Solving a differential or integral equation by nested iteration or grid sequencing means resolving the rough features of the solution of a differential or integral equation on a coarse mesh. 
The one you have been computing is easy to find.

function [sol, it_hist, ierr, x_hist] = nsold(x,f,tol,parms)
% NSOLD  Newton-Armijo nonlinear solver
%
% Factor Jacobians with Gaussian Elimination
%
% Hybrid of Newton, Shamanskii, Chord
%
% C. T. Kelley, April 1, 2003
%
% This code comes with no guarantee or warranty of any kind.
%
% function [sol, it_hist, ierr, x_hist] = nsold(x,f,tol,parms)
%
% inputs:
%        initial iterate = x
%        function = f
%        tol = [atol, rtol] relative/absolute
%             error tolerances
%        parms = [maxit, isham, rsham, jdiff, nl, nu]
%        maxit = maximum number of iterations
%             default = 40
%        isham, rsham: The Jacobian matrix is
%             computed and factored after isham
%             updates of x or whenever the ratio
%             of successive l2-norms of the
%             nonlinear residual exceeds rsham.
%
%             isham = -1, rsham = .5 is the default.
%             isham = 1, rsham = 0 is Newton's method.
%             isham = -1, rsham = 1 is the chord method.
%             isham = m, rsham = 1 is the Shamanskii method with
%                  m steps per Jacobian evaluation.
%
%             The Jacobian is computed and factored
%             whenever the step size
%             is reduced in the line search.
%
%        jdiff = 1: compute Jacobians with forward differences.
%        jdiff = 0: a call to f will provide analytic Jacobians
%             using the syntax [function,Jacobian] = f(x).
%             defaults = [40, 1000, .5, 1]
%
%        nl, nu: lower and upper bandwidths of a banded Jacobian.
%             If you include nl and nu in the parameter list,
%             the Jacobian will be evaluated with a banded differencing
%             scheme and stored as a sparse matrix.
%
% output:
%        sol = solution
%        it_hist = array of iteration history, useful for tables
%             and plots. The two columns are the residual norm and
%             number of step-size reductions done in the line search.
%
%        ierr = 0 upon successful termination.
%        ierr = 1 if after maxit iterations
%             the termination criterion is not satisfied.
%        ierr = 2 failure in the line search. The iteration
%             is terminated if too many step-length reductions
%             are taken.
%
%        x_hist = matrix of the entire iteration history.
%             The columns are the nonlinear iterates. This
%             is useful for making movies, for example, but
%             can consume way too much storage. This is an
%             OPTIONAL argument. Storage is only allocated
%             if x_hist is in the output argument list.
%
% internal parameters:
%        debug = turns on/off iteration statistics display as
%             the iteration progresses
%        maxarm = 20, limit on step-size reductions in the line search
%
% Here is an example. The example computes pi as a root of sin(x)
% with Newton's method and forward difference derivatives
% and plots the iteration history. Note that x_hist is not in
% the output list.
%
% x = 10; tol = [1.d-6, 1.d-6]; params = [40, 1, 0];
% [result, errs, ierr] = nsold(x, 'sin', tol, params);
% result
% semilogy(errs)
%
% Set the debug parameter; 1 turns display on, otherwise off.
%
debug = 0;
%
% Initialize it_hist, ierr, x_hist, and set the iteration parameters.
%
ierr = 0;
maxarm = 20;
maxit = 40;
isham = -1;
rsham = .5;
jdiff = 1;
iband = 0;
if nargin >= 4 & length(parms) ~= 0
function [sol. ierr.f 17 7. otherwise off.1. This is an 67 % OPTIONAL argument.d-6]. Kelley.tol. The Jacobian is computed and factored 35 % whenever the step size 36 '/. can consume way too much storage. 1] 4. iband = 0. [result. 41 % defaults = [ 0 1000. is useful for making movies. The iteration 60 % is terminated if too many step-length reductions 61 % are taken. params = [40. jdiff.parms) 13 % 14 % inputs: 15 '/. l. it. params). rsham = 0 is Newton's method. tol. 39 % jdiff • 0: a call to f will provide analytic Jacobians 40 % using the syntax [function. 0]. rsham = 1 is the Shamanskii method with 32 X m steps per Jacobian evaluation. isham • 1.0. useful for tables and plots. Note that x_hist is not in 78 % the output list. Chord 7 % 8 % C. 53 '/. tol = [atol. 83 % result '4 % semilogy(errs) 85 7.-1. it. is reduced in the line search. jdiff . 2003 9 '/. % ierr . 48 % 49 % 11 •/. computed and factored after isham 24 % updates of x or whenever the ratio 25 % of successive 12-norms of the 26 % nonlinear residual exceeds rsham. % % Initialize it. 57 % ierr . Shamanskii. 33 % 34 '/. rsham = 1 is the chord method. The two columns are the residual norm and 54 % number of step-size reductions done in the line search. ierr. 63 7. 12 '/. ierr] = nsold(x.5.

    maxit = parms(1); isham = parms(2); rsham = parms(3);
    if length(parms) >= 4
        jdiff = parms(4);
    end
    if length(parms) >= 6
        nl = parms(5); nu = parms(6);
        iband = 1;
    end
end
rtol = tol(2); atol = tol(1);
it_hist = [];
n = length(x);
if nargout == 4, x_hist = x; end
fnrm = 1;
itc = 0;
armflag = 0;
%
% Evaluate f at the initial iterate, and
% compute the stop tolerance.
%
f0 = feval(f,x);
fnrm = norm(f0);
it_hist = [fnrm,0];
fnrmo = 1;
itsham = isham;
stop_tol = atol + rtol*fnrm;
%
% main iteration loop
%
while(fnrm > stop_tol & itc < maxit)
%
% Keep track of the ratio (rat = fnrm/fnrmo)
% of successive residual norms and
% the iteration counter (itc).
%
    rat = fnrm/fnrmo;
    outstat(itc+1, :) = [itc fnrm rat];
    fnrmo = fnrm;
    itc = itc+1;
%
% Evaluate and factor the Jacobian
% on the first iteration, every isham iterates, or
% if the ratio of successive residual norms is too large.
%
    if(itc == 1 | rat > rsham | itsham == 0 | armflag == 1)
        itsham = isham;
        jac_age = -1;
        if jdiff == 1
            if iband == 0
                [l, u] = diffjac(x,f,f0);
            else
                jacb = bandjac(f,x,f0,nl,nu);
                [l, u] = lu(jacb);
            end
        else
            [fv, jac] = feval(f,x);
            [l, u] = lu(jac);
        end
    end
    itsham = itsham-1;
%
% Compute the Newton direction.
%
    tmp = -l\f0;
    direction = u\tmp;
%
% Add one to the age of the Jacobian after the factors have been
% used in a solve. A fresh Jacobian has an age of -1 at birth.
%
    jac_age = jac_age+1;
    xold = x; fold = f0;
    [step, iarm, x, f0, armflag] = armijo(direction,x,f0,f,maxarm);
%
% If the line search fails and the Jacobian is old, update it.
% If the Jacobian is fresh, you're dead.
%
    if armflag == 1
        if jac_age > 0
            sol = xold;
            x = xold; f0 = fold;
            disp('Armijo failure; recompute Jacobian.');
        else
            disp('Complete Armijo failure.');
            sol = xold;
            ierr = 2;
            it_hist = [it_hist', [fnrm,iarm]']';
            if nargout == 4, x_hist = [x_hist,x]; end
            return
        end
    end
    fnrm = norm(f0);
    it_hist = [it_hist', [fnrm,iarm]']';
    if nargout == 4, x_hist = [x_hist,x]; end
    rat = fnrm/fnrmo;
    if debug == 1, disp([itc fnrm rat]); end
    outstat(itc+1, :) = [itc fnrm rat];
% end while
end
sol = x;
if debug == 1, disp(outstat); end
%
% On failure, set the error flag.
%
if fnrm > stop_tol, ierr = 1; end
137 fnrmo = fnrm. recompute Jacobian.[fnrm.u] = lu(jac). 102 if length(parms) >= 4 103 jdiff = parms(4).iarm]'] '. 143 7. 7.jac] = feval(f. 129 while(fnrm > stop. 107 iband = 1.isham. A fresh Jacobian has an age of -1 at birth.101 maxit = parms(l).x). x_hist = [x_hist. 104 end 105 if length(parms) >= 6 106 nl = parms(5). Compute the stop tolerance. 7.fO. 161 7. 134 7. xold = x. 118 '/.-l\fO.x. of successive residual norms and 133 7. 119 '/. 126 % 127 '/. 146 jac_age = -1. 160 7. [fnrm. 125 stop. it_hist = [it_hist' .x).

function [l, u] = diffjac(x, f, f0)
% Compute a forward difference dense Jacobian f'(x); return lu factors.
%
% (uses dirder)
%
% C. T. Kelley, April 1, 2003
%
% This code comes with no guarantee or warranty of any kind.
%
% inputs:
%          x, f = point and function
%          f0 = f(x), preevaluated
%
n = length(x);
for j = 1:n
    zz = zeros(n,1);
    zz(j) = 1;
    jac(:,j) = dirder(x,zz,f,f0);
end
[l, u] = lu(jac);

function jac = bandjac(f, x, f0, nl, nu)
% BANDJAC  Compute a banded Jacobian f'(x) by forward differences.
%
% inputs:
%          f, x = function and point
%          f0 = f(x), preevaluated
%          nl, nu = lower and upper bandwidth
%
n = length(x);
jac = sparse(n,n);
dv = zeros(n,1);
epsnew = 1.d-7;
%
% delr(ip)+1 = next row to include after ip in the
%              perturbation vector pt
%
% We'll need delr(1) new function evaluations.
%
% ih(ip), il(ip) = range of indices that influence f(ip)
%
for ip = 1:n
    delr(ip) = min([nl+nu+ip,n]);
    ih(ip) = min([ip+nl,n]);
    il(ip) = max([ip-nu,1]);
end
%
% Sweep through the delr(1) perturbations of f.
%
for is = 1:delr(1)
    ist = is;
%
% Build the perturbation vector.
%
    pt = zeros(n,1);
    while ist <= n
        pt(ist) = 1;
        ist = delr(ist)+1;
    end
%
% Compute the forward difference.
%
    x1 = x+epsnew*pt;
    f1 = feval(f,x1);
    dv = (f1-f0)./epsnew;
%
% Fill the appropriate columns of the Jacobian.
%
    ist = is;
    while ist <= n
        ilt = il(ist); iht = ih(ist);
        jac(ilt:iht,ist) = dv(ilt:iht);
        ist = delr(ist)+1;
    end
end

function z = dirder(x,w,f,f0)
% Compute a finite difference directional derivative.
% Approximate f'(x) w.
%
% C. T. Kelley, April 1, 2003
%
% This code comes with no guarantee or warranty of any kind.
%
% function z = dirder(x,w,f,f0)
%
% inputs:
%          x, w = point and direction
%          f = function
%          f0 = f(x), in nonlinear iterations f(x) has usually
%               been computed before the call to dirder.
%
% Hardwired difference increment.
%
epsnew = 1.d-7;
%
n = length(x);
%
% Scale the step.
%
if norm(w) == 0
    z = zeros(n,1);
    return
end
%
% Now scale the difference increment.
%
xs = (x'*w)/norm(w);
if xs ~= 0.d0
    epsnew = epsnew*max(abs(xs),1.d0)*sign(xs);
end
epsnew = epsnew/norm(w);
%
% del and f1 could share the same space if storage
% is more important than clarity.
%
del = x+epsnew*w;
f1 = feval(f,del);
z = (f1 - f0)/epsnew;

function [step,iarm,xp,fp,armflag] = armijo(direction,x,f0,f,maxarm)
%
% Line search with the Armijo rule and a three-point parabolic model.
%
iarm = 0;
sigma1 = .5;
alpha = 1.d-4;
armflag = 0;
xp = x; fp = f0;
xold = x;
lambda = 1; lamm = 1; lamc = lambda;
step = lambda*direction;
xt = x + step;
ft = feval(f,xt);
nft = norm(ft); nf0 = norm(f0);
ff0 = nf0*nf0; ffc = nft*nft; ffm = nft*nft;
while nft >= (1 - alpha*lambda) * nf0
%
% Apply the three-point parabolic model.
%
    if iarm == 0
        lambda = sigma1*lambda;
    else
        lambda = parab3p(lamc, lamm, ff0, ffc, ffm);
    end
%
% Update x; keep the books on lambda.
%
    step = lambda*direction;
    xt = x + step;
    lamm = lamc;
    lamc = lambda;
%
% Keep the books on the function norms.
%
    ft = feval(f,xt);
    nft = norm(ft);
    ffm = ffc;
    ffc = nft*nft;
    iarm = iarm+1;
    if iarm > maxarm
        disp('Armijo failure, too many reductions');
        armflag = 1;
        return;
    end
end
xp = xt; fp = ft;
%
% end of line search
%

function lambdap = parab3p(lambdac, lambdam, ff0, ffc, ffm)
% Apply three-point safeguarded parabolic model for a line search.
%
% C. T. Kelley, April 1, 2003
%
% This code comes with no guarantee or warranty of any kind.
%
% function lambdap = parab3p(lambdac, lambdam, ff0, ffc, ffm)
%
% input:
%        lambdac = current step-length
%        lambdam = previous step-length
%        ff0 = value of || F(x_c) ||^2
%        ffc = value of || F(x_c + lambdac d) ||^2
%        ffm = value of || F(x_c + lambdam d) ||^2
%
% output:
%        lambdap = new value of lambda given parabolic model
%
% internal parameters:
%        sigma0 = .1, sigma1 = .5, safeguarding bounds for the
%        line search
%
% Set internal parameters.
%
sigma0 = .1; sigma1 = .5;

%
% Compute coefficients of interpolation polynomial.
%
% p(lambda) = ff0 + (c1 lambda + c2 lambda^2)/d1
%
% d1 = (lambdac - lambdam)*lambdac*lambdam < 0
%
% So if c2 > 0 we have negative curvature and default to
% lambdap = sigma1 * lambdac.
%
c2 = lambdam*(ffc-ff0)-lambdac*(ffm-ff0);
if c2 >= 0
    lambdap = sigma1*lambdac; return
end
c1 = lambdac*lambdac*(ffm-ff0)-lambdam*lambdam*(ffc-ff0);
lambdap = -c1*.5/c2;
if lambdap < sigma0*lambdac, lambdap = sigma0*lambdac; end
if lambdap > sigma1*lambdac, lambdap = sigma1*lambdac; end


Chapter 3

Newton–Krylov Methods

Recall from section 1.4 that an inexact Newton method approximates the Newton direction with a vector d such that

    || F'(x) d + F(x) || <= eta || F(x) ||.        (3.1)

The parameter eta is called the forcing term. Newton iterative methods realize the inexact Newton condition (3.1) by applying a linear iterative method to the equation for the Newton step and terminating that iteration when (3.1) holds. We sometimes refer to this linear iteration as an inner iteration. Similarly, the nonlinear iteration (the while loop in Algorithm nsolg) is often called the outer iteration. The Newton-Krylov methods, as the name suggests, use Krylov subspace-based linear solvers. The methods differ in storage requirements, cost in evaluations of F, and robustness. Our code, nsoli.m, includes three Krylov linear solvers: GMRES [64], BiCGSTAB [77], and TFQMR [31]. Following convention, we will refer to the nonlinear methods as Newton-GMRES, Newton-BiCGSTAB, and Newton-TFQMR.
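To make the inner/outer structure concrete, here is a small self-contained Python sketch of an inexact Newton method. The inner solver is a plain Richardson iteration on the equation for the Newton step, stopped exactly when the inexact Newton condition holds. The 2-by-2 test system, the Richardson damping factor 0.2, and eta = 0.1 are all illustrative choices, not anything taken from the book's codes.

```python
import math

def F(x):
    # Hypothetical test system: F(x) = A x + 0.1 x.^3 - b with
    # A = [4 1; 1 3], chosen so the root is exactly (1, 1).
    return [4*x[0] + x[1] + 0.1*x[0]**3 - 5.1,
            x[0] + 3*x[1] + 0.1*x[1]**3 - 4.1]

def J(x):
    return [[4 + 0.3*x[0]**2, 1.0], [1.0, 3 + 0.3*x[1]**2]]

def norm(v):
    return math.sqrt(sum(vi*vi for vi in v))

def inexact_newton(x, eta=0.1, tol=1e-10, maxit=50):
    # Outer (nonlinear) iteration: Newton with an inner Richardson
    # solve of J(x) d = -F(x), terminated when the inexact Newton
    # condition ||J d + F|| <= eta ||F|| is satisfied.
    for _ in range(maxit):
        Fx = F(x)
        if norm(Fx) <= tol:
            break
        Jx = J(x)
        d = [0.0, 0.0]                      # d0 = 0: no better guess
        while True:
            r = [Jx[i][0]*d[0] + Jx[i][1]*d[1] + Fx[i] for i in range(2)]
            if norm(r) <= eta*norm(Fx):     # inexact Newton condition
                break
            d = [d[i] - 0.2*r[i] for i in range(2)]  # Richardson step
        x = [x[i] + d[i] for i in range(2)]
    return x
```

In nsoli.m the inner iteration is a Krylov method rather than Richardson iteration, but the termination logic plays the same role.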

3.1

Krylov Methods for Solving Linear Equations

Krylov iterative methods approximate the solution of a linear system Ad = b with a sum of the form

    d_k = d_0 + sum_{j=0}^{k-1} gamma_j A^j r_0,

where r_0 = b - A d_0 and d_0 is the initial iterate. If the goal is to approximate a Newton step, as it is here, the most sensible initial iterate is d_0 = 0, because we have no a priori knowledge of the direction but, at least in the local phase of the iteration, expect it to be small. We express this in compact form as d_k in K_k, where the kth Krylov subspace is

    K_k = span{ r_0, A r_0, ..., A^(k-1) r_0 }.

Krylov methods build the iteration by evaluating matrix-vector products, rather than details of the matrix itself, to build an iterate in the appropriate Krylov subspace. The reason for this is that only matrix-vector products, which can be approximated in very different ways, are needed to implement a Krylov method. This is an important property of the methods because one can, for example, implement a matrix-free method. GMRES, like other Krylov methods, is often, but far from always, implemented as a matrix-free method. Our nsoli.m code, like most implementations of Newton-Krylov methods, approximates Jacobian-vector products with forward differences (see section 3.2.1). If you find that the iteration is stagnating, you might see if an analytic Jacobian-vector product helps.

3.1.1 GMRES

The easiest Krylov method to understand is the GMRES [64] method, the default linear solver in nsoli.m. The kth GMRES iterate is the solution of the linear least squares problem of minimizing || b - A d || over K_k. We refer the reader to [23, 42, 64, 76] for the details of the implementation, pointing out only that it is not a completely trivial task to implement GMRES well.

One way to understand GMRES, keeping in mind that d_0 = 0, is to observe that the kth GMRES residual is in K_k and hence can be written as a polynomial in A applied to the residual:

    r_k = p(A) r_0.

Here p is in P_k, the set of kth-degree residual polynomials. This is the set of polynomials of degree k with p(0) = 1. Since the kth GMRES iteration minimizes the residual over K_k, we must have [42]

    || r_k || <= min over p in P_k of || p(A) r_0 ||.

This simple fact can lead to very useful error estimates.

Convergence of GMRES

As a general rule (but not an absolute law! [53]), GMRES, like other Krylov methods, performs best if the eigenvalues of A are in a few tight clusters [16, 23, 42, 76].

Any implementation of GMRES must limit the size of the Krylov subspace. GMRES must accumulate the history of the linear iteration as an orthonormal basis for the Krylov subspace and can, and often does for large problems, exhaust the available fast memory. GMRES(m) does this by restarting the iteration when the size of the Krylov space exceeds m vectors; nsoli.m has a default value of m = 40. The convergence theory for GMRES does not apply to GMRES(m), so the performance of GMRES(m) can be poor if m is small.
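The residual-polynomial point of view gives concrete numbers. For a diagonalizable A whose eigenvalues lie in a disk of radius 0.1 about 1, with kappa(V) = 100, the residual polynomial p_k(z) = (1 - z)^k bounds the relative residual by 100 x 0.1^k. The following Python fragment (illustrative arithmetic only) checks the seven-iteration figure quoted in this section's convergence discussion.

```python
import cmath

# Bound from the convergence estimate for diagonalizable matrices:
#   ||r_k|| / ||r_0|| <= kappa(V) * max_i |p_k(lambda_i)|,
# with residual polynomial p_k(z) = (1 - z)^k.  If every eigenvalue
# lies in a disk of radius 0.1 about 1, then |1 - lambda| <= 0.1.
kappa_V = 100.0
radius = 0.1
k = 7
bound = kappa_V * radius**k          # 100 * 0.1^7, about 1e-5

# Sample eigenvalues on the boundary of the disk and evaluate the
# residual polynomial there; no sample can beat the bound.
eigs = [1 + radius*cmath.exp(2j*cmath.pi*j/8) for j in range(8)]
worst = max(abs(1 - lam)**k for lam in eigs)
reduction = kappa_V * worst          # guaranteed relative residual
```

So seven GMRES iterations guarantee a residual reduction by roughly a factor of 10^5 in this situation, which is the kind of estimate that makes the bound useful inside an inexact Newton method.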

Theorem 3.1.1 is a convergence result for diagonalizable matrices. A is diagonalizable if there is a nonsingular matrix V such that

    A = V Lambda V^(-1).

Here Lambda is a diagonal matrix with the eigenvalues of A on the diagonal. A is normal if the diagonalizing transformation V is unitary. In this case the columns of V are the eigenvectors of A and V^(-1) = V^H. Here V^H is the complex conjugate transpose of V. The reader should be aware that V and Lambda can be complex even if A is real. If A is a diagonalizable matrix and p is a polynomial, then p(A) = V p(Lambda) V^(-1).

Theorem 3.1.1. Let A = V Lambda V^(-1) be a nonsingular diagonalizable matrix. Let d_k be the kth GMRES iterate. Then, for all p_k in P_k,

    || r_k || / || r_0 || <= kappa(V) max_i | p_k(lambda_i) |.

Proof. We can easily estimate || p_k(A) || by

    || p_k(A) || = || V p_k(Lambda) V^(-1) || <= || V || || V^(-1) || max_i | p_k(lambda_i) | = kappa(V) max_i | p_k(lambda_i) |,

as asserted.

Suppose, for example, that kappa(V) = 100 and all the eigenvalues of A lie in a disk of radius 0.1 centered about 1 in the complex plane. Theorem 3.1.1 implies (using p_k(z) = (1 - z)^k) that

    || r_k || / || r_0 || <= 100 x 0.1^k.

Hence, GMRES will reduce the residual by a factor of, say, 10^5 after seven iterations. Since reduction of the residual is the goal of the linear iteration in an inexact Newton method, this is a very useful bound. See [16] for examples of similar estimates when the eigenvalues are contained in a small number of clusters. One objective of preconditioning (see section 3.3) is to change A to obtain an advantageous distribution of eigenvalues.

3.1.2 Low-Storage Krylov Methods

If A is symmetric and positive definite, the conjugate gradient (CG) method [35] has better convergence and storage properties than the more generally applicable Krylov methods. In exact arithmetic the kth CG iteration minimizes

the A-norm of the error over the kth Krylov subspace. The symmetry and positivity can be exploited so that the storage requirements do not grow with the number of iterations.

A tempting idea is to multiply a general system Ax = b by A^T to obtain the normal equations A^T A x = A^T b, which have a symmetric positive definite coefficient matrix A^T A, and then apply CG to the new problem. This approach, called CGNR, has the disadvantage that the condition number of A^T A is the square of that of A, and hence the convergence of the CG iteration can be far too slow. A similar approach, called CGNE, solves A A^T z = b with CG and then sets x = A^T z. The need for a transpose-vector multiplication is a major problem unless one wants to store the Jacobian matrix: it is simple (see section 3.2.1) to approximate a Jacobian-vector product with a forward difference, but no matrix-free way to obtain a transpose-vector product is known. Because the condition number is squared and a transpose-vector multiplication is needed, CGNR and CGNE are used far less frequently than the other low-storage methods. If, however, you can store the Jacobian, or can compute a transpose-vector product in an efficient way, and the Jacobian is well conditioned, applying CG iteration to the normal equations can be a good idea, since convergence, at least in exact arithmetic, is guaranteed [33, 42].

Low-storage alternatives to GMRES that do not need a transpose-vector product are available [31, 33, 77] but do not have the robust theoretical properties of GMRES or CG. Two such low-storage solvers, BiCGSTAB [77] and TFQMR [31], can be used in nsoli.m. We refer the reader to [31, 42, 77] for detailed descriptions of these methods. The number of linear iterations that BiCGSTAB and TFQMR need for convergence can be roughly the same as for GMRES, but each linear iteration needs two matrix-vector products (i.e., two new evaluations of F), while both have the advantage of a fixed storage requirement throughout the linear iteration.

GMRES(m) should be your first choice. If, however, you cannot allocate the storage that GMRES(m) needs to perform well, one of BiCGSTAB or TFQMR may solve your problem. If you consider BiCGSTAB and TFQMR as solvers, you should be aware that either method can break down; this means that the iteration will cause a division by zero. This is not an artifact of the floating point number system but is intrinsic to the methods. While GMRES(m) can also fail to converge, that failure will manifest itself as a stagnation in the iteration, not a division by zero or an overflow.

3.1.3 Preconditioning

Preconditioning the matrix A means multiplying A from the right, left, or both sides by a preconditioner M. One does this with the expectation that systems with the coefficient matrix MA or AM are easier to solve than those with A. Of course, preconditioning can be done in a matrix-free manner: one needs only a function that performs a preconditioner-vector product.
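The condition-number squaring that handicaps CGNR and CGNE, discussed in section 3.1.2, is easy to see for a diagonal matrix, where the singular values can be read off directly. A small illustrative Python check:

```python
# Singular values of a diagonal matrix are the absolute values of
# its diagonal entries, so for A = diag(8, 1, 0.5):
sigmas = [8.0, 1.0, 0.5]
cond_A = max(sigmas)/min(sigmas)          # kappa(A) = 8/0.5 = 16

# A^T A = diag(64, 1, 0.25) has singular values sigma_i^2, so its
# condition number is kappa(A)^2 -- the normal equations square the
# condition number, which is the drawback of CGNR/CGNE.
cond_normal = max(s*s for s in sigmas)/min(s*s for s in sigmas)
```

A modestly ill-conditioned A (kappa = 10^4, say) thus produces normal equations with kappa = 10^8, slowing CG dramatically.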

Left preconditioning multiplies the equation Ax = b on both sides by M to obtain the preconditioned system MAx = Mb. One then applies the Krylov method to the preconditioned system. One would hope that the condition number of MA is really smaller than that of A; if so, the residual of the preconditioned system will be a better reflection of the error than that of the original system. This matters, since the preconditioned residual will be used to terminate the linear iteration.

Right preconditioning solves the system AMy = b with the Krylov method. Then the solution of the original problem is recovered by setting x = My. Right preconditioning has the feature that the residual upon which termination is based is the residual for the original problem. Two-sided preconditioning replaces A with M_left A M_right.

3.2 Computing an Approximate Newton Step

3.2.1 Jacobian-Vector Products

For nonlinear equations, the Jacobian-vector product is easy to approximate with a forward difference directional derivative. The scaling is important. We first scale w to be a unit vector and take a numerical directional derivative in the direction w/||w||. The forward difference directional derivative at x in the direction w is

    D_h F(x : w) = ( F(x + h w/||w||) - F(x) ) / h.

If h is roughly the square root of the error in F, we use a difference increment in the forward difference to make sure that the appropriate low-order bits of x are perturbed. So we multiply h by

    max( |x^T w| / ||w||, 1 ) sgn( x^T w ).

The same scaling was used in the forward difference Jacobian in (2.3). Remember not to use the MATLAB sign function for sgn.

3.2.2 Preconditioning Nonlinear Equations

Our code nsoli.m expects you to incorporate preconditioning into F. The reason for this is that the data structures and algorithms for the construction and application of preconditioners are too diverse to all fit into a nonlinear solver code.

A different approach, which is integrated into some initial value problem codes [10, 12], is to pretend that the Jacobian is banded, even if it isn't, and to use Jacobian-vector products and the forward difference method for banded Jacobians from section 2.3 to form a banded approximation to the Jacobian. One factors the banded approximation and uses that factorization as the preconditioner.

To precondition the equation for the Newton step from the left, one simply applies nsoli.m to the preconditioned nonlinear problem

    G(x) = M F(x) = 0.

The equation for the Newton step for G is

    M F'(x) s = -M F(x),

which is the left-preconditioned equation for the Newton step for F. If we set x = My and solve G(y) = F(My) = 0 with Newton's method, then the equation for the step is

    F'(My) M s = -F(My),

which is the right-preconditioned equation for the step. To recover the step in x we might use Ms or, equivalently, x_+ = M(y_+ + s), but it's simpler to solve G(y) = 0 to the desired accuracy and set x = My at the end of the nonlinear solve.

Left or Right Preconditioning?

There is no general rule for choosing between left and right preconditioning. You should keep in mind that the two approaches terminate the nonlinear iteration differently, so you need to decide what you're interested in. Left preconditioning will terminate the iteration when ||MF(x)|| is small. If M is a good approximation to F'(x*)^(-1), then MF(x) is approximately x - x*, and this termination criterion captures the actual error. Right preconditioning, on the other hand, terminates the nonlinear iteration when ||F(x)|| is small, responding to the problem statement "Make ||F|| small," which is often the real objective. As in the linear case, the nonlinear residual is the same as that for the original problem. Linear equations present us with exactly the same issues.

3.2.3 Choosing the Forcing Term

The approach in [29] changes the forcing term eta in (3.1) as the nonlinear iteration progresses. The formula is complex and motivated by a lengthy story, which we condense from [42]. The overall goal in [29] is to solve the linear equation for the Newton step to just enough precision to make good progress when far from a solution, but also to obtain quadratic convergence when near a solution. One might base a choice of eta on residual norms; one way to do this is

    eta_n^res = gamma || F(x_n) ||^2 / || F(x_(n-1)) ||^2,

where gamma in (0, 1] is a parameter. If eta_n^res is bounded away from 1 for the entire iteration, the choice eta_n = eta_n^res will do the job, assuming we make a good choice
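The scaled forward-difference approximation to a Jacobian-vector product described in section 3.2.1 can be sketched in Python as follows. This is an illustrative transliteration of the idea behind dirder.m (the 1e-7 increment and the scaling by max(|x^T w|/||w||, 1) sgn(x^T w) follow the MATLAB code); the test function used with it is hypothetical.

```python
import math

def dirder(x, w, f, f0, eps=1e-7):
    """Forward-difference approximation of F'(x) w.

    x, w: point and direction (lists of floats)
    f:    the nonlinear function, returning a list of floats
    f0:   f(x), usually available from the nonlinear iteration
    """
    n = len(x)
    normw = math.sqrt(sum(wi*wi for wi in w))
    if normw == 0.0:
        return [0.0]*n
    # Scale the difference increment so that the low-order bits of x
    # are perturbed: multiply by max(|x'w|/||w||, 1)*sgn(x'w), then
    # divide by ||w|| so that w is effectively a unit vector.
    xs = sum(xi*wi for xi, wi in zip(x, w))/normw
    if xs != 0.0:
        eps = eps*max(abs(xs), 1.0)*math.copysign(1.0, xs)
    eps = eps/normw
    x1 = [xi + eps*wi for xi, wi in zip(x, w)]
    f1 = f(x1)
    return [(a - b)/eps for a, b in zip(f1, f0)]
```

Note that math.copysign(1.0, xs) plays the role of sgn here and, unlike MATLAB's sign, never returns zero for a nonzero argument of either sign.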

m are 7 = 0. Nmax is an upper limit on the forcing term and In [29] the choices 7 = 0. 3. we can simply limit its maximum size. Of course. one finally arrives at [42] The term is the termination tolerance for the nonlinear iteration and is included in the formula to prevent oversolving on the final iteration. a fast transform method (see section 3. If the high-order term is linear.3) or a multigrid iteration [9].9 and Nmax = 0. a method of safeguarding was proposed in [29] to avoid volatile decreases in Nn.6. then the inverse of the high-order part of the differential operator (with the correct boundary conditions) is an excellent preconditioner [50]. then the linear equation for the Newton step can be solved to far more precision than is really needed. The defaults in nsoli. If your problem is a discretization of an elliptic differential equation. for example. After taking all this into account. Preconditioners 63 for no.9 and rjmax = 0. we do not let Nn decrease by too much. and combining . Domain decomposition preconditioners [72] approximate the inverse of the high-order term (or the entire operator) by subdividing the geometric domain of the differential operator. To make sure that Nn stays well away from one. if Nn is too small in the early stage of the iteration.9999 are used. Multigrid implementation is difficult and a more typical application is to use a single multigrid iteration (for the high-order term) as a preconditioner. The idea is that if rjn-i is sufficiently large. it can often be shown that a solution can be obtained at a cost of O(N) operations.9. Multigrid methods exploit the smoothing properties of the classical stationary iterative methods by mapping the equation through a sequence of grids. Ideally the preconditioner should be close to the inverse of the Jacobian. one can get away with far less.3 Preconditioners This section is not an exhaustive account of preconditioning and is only intended to point the reader to the literature. 
one might be able to compute the preconditionervector product rapidly with. [29] suggests limiting the decrease to a factor of rj n -1. where N is the number of unknowns. When multigrid methods are used as a solver.3. computing the inverses on the subdomains. To protect against oversolving.3. In practice.

those inverses. When implemented in an optimal way, the condition number of the preconditioned matrix is independent of the discretization mesh size.

Algebraic preconditioners use the sparsity structure of the Jacobian matrix. This is important, for example, for problems that do not come from discretizations of differential or integral equations or for discretizations of differential equations on unstructured grids. An example of such a preconditioner is algebraic multigrid, which is designed for discretized differential equations on unstructured grids. Algebraic multigrid attempts to recover geometric information from the sparsity pattern of the Jacobian and thereby simulate the intergrid transfers and smoothing used in a conventional geometric multigrid preconditioner. Another algebraic approach is incomplete factorization [62, 63]. Incomplete factorization preconditioners compute a factorization of a sparse matrix but do not store those elements in the factors that are too small or lie outside a prescribed sparsity pattern. These preconditioners require that the Jacobian be stored as a sparse matrix, which may be generated by computer programs. The MATLAB commands luinc and cholinc implement incomplete LU and Cholesky factorizations.

3.4 What Can Go Wrong?

Any problem from section 1.9 can, of course, arise if you solve linear systems by iteration. There are a few problems that are unique to Newton iterative methods. The symptoms of these problems are unexpectedly slow convergence or even failure/stagnation of the nonlinear iteration.

3.4.1 Failure of the Inner Iteration

When the linear iteration does not satisfy the inexact Newton condition (3.1) and the limit on linear iterations has been reached, a sensible response is to warn the user and return the step to the nonlinear iteration. Most codes, including nsoli.m, do this. While it is likely that the nonlinear iteration will continue to make progress, convergence is not certain and one may have to allow the linear solver more iterations, use a different linear solver, or, in extreme cases, find enough storage to use a direct solver.

3.4.2 Loss of Orthogonality

GMRES and CG exploit orthogonality of the Krylov basis to estimate the residual and, in the case of CG, conserve storage. In finite-precision arithmetic this orthogonality can be lost and the estimate of the residual in the iteration can be poor. The iteration could terminate prematurely because the estimated residual satisfies (3.1) while the true residual does not. This is a much more subtle problem than failure to converge because the linear solver can report success but return an inaccurate and useless step. The GMRES code in nsoli.m, like the ones based on the GMRES solver in [11], tests for loss of orthogonality and tries to correct it. We refer the reader

to [42] for the details. You have the option in most GMRES codes of forcing the iteration to maintain the orthogonality of the Krylov basis at a cost of doubling the number of scalar products in the linear iteration.

3.5 Using nsoli.m

nsoli.m is a Newton-Krylov code that uses one of several Krylov methods to satisfy the inexact Newton condition (3.1). nsoli.m expects the preconditioner to be part of the nonlinear function, as described in section 3.2.2. The calling sequence is similar to that for nsold.m:

[sol, it_hist, ierr, x_hist] = nsoli(x, f, tol, parms);

3.5.1 Input to nsoli.m

The required data for nsoli.m are x, the function f, and the tolerances for termination. These are the same as for nsold.m (see section 2.5). x and f must be column vectors of the same length. The syntax for f is

function = f(x).

The vector tol = (tau_a, tau_r) contains the tolerances for the termination criterion (1.12).

The parms array is more complex than that for nsold.m. The components of

parms = [maxit, maxitl, etamax, lmeth, restart_limit]

are as follows. maxit is the upper limit on the nonlinear iterations, as it is in all our codes; the default is 40. maxitl is the maximum number of linear iterations per nonlinear iteration, except for GMRES(m), where it is the maximum number of iterations before a restart; the default is 40. etamax controls the linear tolerance in the inexact Newton condition (3.1). This parameter has a dual role. If etamax < 0, then eta = |etamax|. If etamax > 0, then eta is determined as described in section 3.2.3. The default is etamax = 0.9.

The choice of Krylov method is governed by the parameter lmeth. GMRES (lmeth = 1) is the default. The other alternatives are GMRES(m) (lmeth = 2), BiCGSTAB (lmeth = 3), and TFQMR (lmeth = 4). The values of maxit, maxitl, and etamax are the defaults but must be included if lmeth is to be varied. If GMRES(m) is the linear solver, one must also specify the total number of restarts in restart_limit. The default is 20, which means that GMRES(m) is allowed 20 x 40 = 800 linear iterations per nonlinear iteration.

3.5.2 Output from nsoli.m

Like nsold.m, the outputs are the solution sol and, optionally, a history of the iteration, an error flag, and the entire sequence {x_n}. The sequence of iterates is useful for making movies or generating figures like Figure 2.1. Don't ask for the sequence {x_n} unless you have enough storage for this array. For large problems,

asking for the iteration history {x_n} by including x_hist in the argument list can expend all of MATLAB's storage. The code ozmovie.m in the directory for this chapter is an example of how to use the sequence of iterates to make a movie.

The history array it_hist has three columns. The first is the Euclidean norm of the nonlinear residual ||F(x)||, the second is the cumulative number of calls to F, and the third is the number of step-size reductions done in the line search. The error flag ierr is 0 if the nonlinear iteration terminates successfully. The failure modes are ierr = 1, which means that the termination criterion is not met after maxit iterations, and ierr = 2, which means that the step length was reduced 20 times in the line search without satisfaction of the sufficient decrease condition (1.21). The limit of 20 can be changed with an internal parameter maxarm in the code.

3.6 Examples

Often iterative methods are faster than direct methods even if the Jacobian is small and dense. That's the case with the H-equation in our first example in section 3.6.1. If the Jacobian is too expensive to compute and store, factoring the Jacobian is not an option, as is the case with the other two examples.

3.6.1 Chandrasekhar H-equation

To get started, we solve the H-equation (2.7) on a mesh of 100 points with a variety of Newton-Krylov methods and compare the performance by plotting the relative nonlinear residual ||F(x_n)||/||F(x_0)|| against the number of calls to F. In this way we can better estimate the total cost and see, for this example, that GMRES requires fewer calls to F than the other two linear solvers and therefore is preferable if the storage that GMRES needs is available. TFQMR and BiCGSTAB need two Jacobian-vector products for each linear iteration, which accounts for their added cost.

The code heqkdemo.m calls nsoli.m with three sets of the parameter array with lmeth = 1, 3, and 4 for Newton-GMRES, Newton-BiCGSTAB, and Newton-TFQMR, respectively. The initial iterate was the vector ones(100,1), the forcing term was computed using (3.6), and tau_a = tau_r = 10^-8. heqkdemo.m draws two figures, one that plots the residual against the nonlinear iteration count and another, shown in Figure 3.1, with the number of calls to F on the horizontal axis.

Generating such a plot is simple. This MATLAB fragment does the job with Newton-TFQMR and an initial iterate of H = 1 to produce a plot of the norm of the nonlinear residual against the number of function evaluations (the dot-dash curve in Figure 3.1).

% NEWTON-TFQMR SOLUTION OF H-EQUATION
%
% Call nsoli to solve the H-equation with Newton-TFQMR.
%

x = ones(100,1);
tol = [1.d-8, 1.d-8];
parms = [40, 40, .9, 4];
[sol, it_hist, ierr] = nsoli(x, 'heq', tol, parms);
%
% Plot a residual history.
%
semilogy(it_hist(:,2), it_hist(:,1)/it_hist(1,1));

Figure 3.1. Nonlinear residual versus calls to F.

3.6.2 The Ornstein-Zernike Equations

This example is taken from [7, 18, 40, 56]. The problem is an integral equation coupled with an algebraic constraint. The unknowns are two continuous functions h and c defined on 0 <= r < infinity. It is standard to truncate the computational domain and seek h, c in C[0, L]. For this example L = 9. After approximating the integral with the trapezoid rule, we find that the function can be most efficiently evaluated with a fast Fourier transform (FFT); the computation of the Newton step with a direct method is impractical. In their simplest isotropic form the Ornstein-Zernike (OZ) equations are a system consisting of an integral equation

    h = c + rho (h * c),                (3.7)

where

the convolution h * c is given by (3.8) and the integral is over R^3. Here rho is a parameter. The nonlinear algebraic constraint is (3.9), in which beta, epsilon, and sigma are parameters. For this example we use beta = 10, epsilon = 0.1, rho = 0.2, and sigma = 2.

Discrete Problem

We will approximate the values of h and c on the mesh

    r_i = (i - 1) delta,

where delta = L/(N - 1) is the mesh width. If h decays sufficiently rapidly, as we assume, the convolution h * c in (3.7) can be computed with only one-dimensional integrals using the spherical-Bessel transform. To approximate h * c, we begin by discretizing frequency in a way that allows us to use the FFT to evaluate the convolution. Let k_j = (j - 1) delta_k, where delta_k = pi/(delta (N + 1)). We then define discrete sine transforms of h and c, for 2 <= i <= N - 1 and 2 <= j <= N - 1, in (3.10) and (3.11), and we compute h * c by discretizing the formula (3.12), where hc is the pointwise product of functions.

We set (u * v)_N = 0 and define (u * v)_i by linear interpolation as in (3.13). The sums in (3.12) and (3.13) can be done with a fast sine transform using the MATLAB FFT. To compute the sums for 1 <= i <= N - 1 one can use the MATLAB code lsint to compute l = lsint(f). The sine transform code is as follows:

function lf=lsint(f)
% LSINT
% Fast sine transform with MATLAB's FFT
%
n=length(f);
ft=-fft([0,f']',2*n+2);
lf=imag(ft(2:n+1));

To prepare this problem for nsoli.m, we must first consolidate h and c into a single vector x = (h^T, c^T)^T. The function oz.m does this, organizing the computation as was done in [47]. We also use global variables to avoid repeated computations of the potential u in (3.9).

The MATLAB code ozdemo.m solves this problem, plots h and c as functions of r, and compares the cost of a few strategies for computing the forcing term. Here is the part of ozdemo.m that produces the graph of the solution in Figure 3.2:

function [h,c]=ozdemo
% OZDEMO This program creates the OZ example in Chapter 3.
%
% [H,C]=OZDEMO returns the solution on a grid with a mesh
% spacing of 1/256.
%
global L U rho
n=257;
epsilon=.1; sigma=2; rho=.2; beta=10; L=9;
dx=L/(n-1); r=0:dx:L; r=r';
%
% Compute the potential and store it in a global variable.
%

% Put the initial iterate, tolerances, and parameters in place,
% call the solver, and plot the solution.
%
x=zeros(2*n,1);
tol=[1.d-8,1.d-8];
parms=[40,80,-.1];
[sol, it_hist, ierr] = nsoli(x,'oz',tol,parms);
%
% Unpack h and c.
%
h=sol(1:n);
c=sol(n+1:2*n);
%
% Plot the solution.
%
subplot(1,2,1); plot(r,h,'-');
xlabel('r'); ylabel('h','Rotation',1);
subplot(1,2,2); plot(r,c,'-');
xlabel('r'); ylabel('c','Rotation',1);

Figure 3.2. Solution of the OZ equations.

It is easy to compare methods for computing the forcing term with nsoli.m. In Figure 3.3, also produced with ozdemo.m, we compare the default strategy (3.6) with the fixed choice eta = 0.1. For both computations the tolerances were tau_a = tau_r = 10^-8, much smaller than is needed for a mesh this coarse and used only to illustrate the differences between the choices of the forcing term. For this example, the default choice of eta is best if the goal is very small residuals, but the choice eta = 0.1 is superior for realistic values of the tolerances.
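The forcing-term strategies being compared can be stated in a few lines. The sketch below (a hypothetical helper, not taken from the book's codes) implements the safeguarded Eisenstat-Walker choice that nsoli.m uses when etamax > 0; a fixed eta, such as eta = 0.1, simply bypasses this update. (nsoli.m additionally floors eta at 0.5*stop_tol/||F||, which is omitted here.)

```python
def ew_forcing(fnrm, fnrm_old, eta_old, etamax=0.9, gamma=0.9):
    # eta_n = gamma * (||F_n|| / ||F_{n-1}||)^2, with a safeguard that
    # keeps eta from dropping too fast, capped at etamax.
    eta = gamma * (fnrm / fnrm_old) ** 2
    if gamma * eta_old ** 2 > 0.1:     # safeguard: remember the old eta
        eta = max(eta, gamma * eta_old ** 2)
    return min(eta, etamax)
```

For example, after a tenfold residual reduction with a previous eta of 0.9, the safeguard keeps eta at 0.729 rather than letting it collapse to 0.009.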

Figure 3.3. Nonlinear residual versus calls to F.

3.6.3 Convection-Diffusion Equation

This example is taken from [42] and shows how to incorporate left and right preconditioning into F. The problem is a semilinear (i.e., linear in the highest order derivative) convection-diffusion equation with homogeneous Dirichlet boundary conditions on the unit square (0,1) x (0,1):

    -Laplacian(u) + C u (u_x + u_y) = f,  with C = 20.

Here Laplacian is the Laplacian operator, and f has been constructed so that the exact solution is the discretization of 10xy(1-x)(1-y)exp(x^4.5). We discretized on a uniform mesh with 31 interior grid points in each direction using centered differences and terminated the iterations consistently with the second-order accuracy of the difference scheme by setting the tolerances proportional to h^2.

The physical grid is two-dimensional, but solvers expect one-dimensional vectors. MATLAB makes it easy to alternate between a two-dimensional u (padded with the zero boundary conditions), where one applies the differential operators, and the one-dimensional vector that the solvers require. All of this was done within the matrix-free difference operators dxmf.m (d/dx), dymf.m (d/dy), and lapmf.m (Laplacian). As an example, here is the source of dxmf.m.
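The pad-differentiate-flatten pattern that dxmf.m uses translates directly to NumPy. The sketch below is an illustrative Python analogue (not a transcription of the book's code; here the first array index plays the role of x), checked against a smooth function that satisfies the homogeneous Dirichlet boundary conditions:

```python
import numpy as np

def dxmf(u):
    # Centered-difference d/dx on an n x n interior grid with homogeneous
    # Dirichlet BCs; u is a flat vector of length n^2.
    n = int(round(np.sqrt(u.size)))
    h = 1.0 / (n + 1)
    uu = np.zeros((n + 2, n + 2))
    uu[1:n + 1, 1:n + 1] = u.reshape(n, n)   # pad with zero boundary values
    dxuu = uu[2:n + 2, 1:n + 1] - uu[0:n, 1:n + 1]
    return (0.5 * dxuu / h).ravel()          # divide by 2h and flatten
```

For u = sin(pi x) sin(pi y) on a 31 x 31 interior grid the result matches pi cos(pi x) sin(pi y) to the expected O(h^2) accuracy.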

function dxu = dxmf(u)
% DXMF Matrix-free partial derivative wrt x;
%      homogeneous Dirichlet BC
%
n2=length(u);
n=sqrt(n2);
h=1/(n+1);
%
% Turn u into a 2-D array with the BCs built in.
%
uu=zeros(n+2,n+2);
vv=zeros(n,n);
vv(:)=u;
uu(2:n+1,2:n+1)=vv;
%
% Compute the partial derivative.
%
dxuu=zeros(n,n);
dxuu=uu(3:n+2,2:n+1)-uu(1:n,2:n+1);
%
% Divide by 2*h.
%
dxu=.5*dxuu(:)/h;

We can exploit the regular grid by using a fast Poisson solver M as a preconditioner. Our solver fish2d.m uses the MATLAB FFT to solve the discrete form of -Laplacian(g) = u with homogeneous Dirichlet boundary conditions and to return g = Mu. We apply the preconditioner from the left to (3.15) to obtain the preconditioned equation. pdeleft.m is the MATLAB code for the nonlinear residual. Notice that the preconditioner is applied only to the low-order term. There is no need to apply fish2d.m to lapmf(u); that would simply recover u.

function w=pdeleft(u)
% PDELEFT W=PDELEFT(U) is the nonlinear residual of the left-
%         preconditioned pde example with C=20.
%
global rhsf prhsf
%
% Compute the low-order nonlinear term.
%
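The fast Poisson solve itself is a sine-transform diagonalization of the 5-point Laplacian. The following Python sketch (hypothetical names dst1 and fish2d; it is a stand-in for the book's fish2d.m, not a transcription) solves the discrete -Laplacian(u) = f with homogeneous Dirichlet boundary conditions:

```python
import numpy as np

def dst1(a, axis=0):
    # Unnormalized DST-I along `axis`, via the same zero-padded FFT trick
    # as lsint; applying it twice multiplies by (n+1)/2.
    a = np.moveaxis(a, axis, 0)
    n = a.shape[0]
    pad = np.zeros((2 * n + 2,) + a.shape[1:])
    pad[1:n + 1] = a
    out = (-np.fft.fft(pad, axis=0))[1:n + 1].imag
    return np.moveaxis(out, 0, axis)

def fish2d(f):
    # Solve -laplacian_h(u) = f (5-point stencil, homogeneous Dirichlet)
    # on an n x n interior grid by diagonalizing in the sine basis.
    n = int(round(np.sqrt(f.size)))
    h = 1.0 / (n + 1)
    lam = (2.0 - 2.0 * np.cos(np.pi * np.arange(1, n + 1) / (n + 1))) / h**2
    fh = dst1(dst1(f.reshape(n, n), axis=0), axis=1)
    uh = fh / (lam[:, None] + lam[None, :])         # divide by eigenvalues
    u = dst1(dst1(uh, axis=0), axis=1) * (2.0 / (n + 1)) ** 2  # inverse
    return u.ravel()
```

Because the sine modes are exact eigenvectors of the discrete Laplacian, the solver is exact for the discrete problem; applying it to the high-order term costs O(N log N) per call.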

v=20*u.*(dxmf(u)+dymf(u));
%
% Apply fish2d to the entire pde.
%
w=u+fish2d(v)-prhsf;

Preconditioning a semilinear boundary value problem with an exact inverse for the high-order term, as we do here, is optimal in the sense that the convergence speed of the linear iteration will be independent of the mesh spacing [50]. Multigrid or domain decomposition preconditioners [9,72] also do this, but are more complicated to implement. Were this a larger problem, in three space dimensions, say, the storage for full GMRES could well be unavailable and the low-storage solvers could be the only choices.

The preconditioned right side is stored as the global variable prhsf in pdeldemo.m, which calls the solver. With right preconditioning, we set u = Mw and solve the preconditioned equation for w. The MATLAB code for this is pderight.m. Recall that the residual has a different meaning than for the left-preconditioned problem, so it isn't completely valid to compare the left- and right-preconditioned iterations. It does make sense to compare the choices for linear solvers and forcing terms. One can examine the performance for the three linear solvers and find, as before, that the number of function evaluations is lower for GMRES than for BiCGSTAB and TFQMR, while the number of nonlinear iterations is roughly the same. The Armijo rule made a difference for the right-preconditioned problem: the step length was reduced once at the first nonlinear iteration for all three choices of linear solver and, once again, on the second nonlinear iteration.

3.6.4 Time-Dependent Convection-Diffusion Equation

This example is a time-dependent form of the equation in section 3.6.3. The function f(x,y) is the same as in section 3.6.3, so the solution of the steady-state (time-independent) problem is the solution of the problem in section 3.6.3. As in section 3.6.3, we impose homogeneous Dirichlet boundary conditions on the unit square (0,1) x (0,1). We will use the implicit Euler method to integrate the nonlinear parabolic initial boundary value problem (3.17) in time for 0 < t < 1, and we expect the solution u(x,t) of (3.17) to converge to u_steady as t -> infinity. After discretization in space, the problem becomes a large system of ordinary differential equations.
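On a small scale, the implicit Euler/Newton structure can be sketched as follows (a toy Python analogue of what pdetime.m does; it uses a dense direct solve in place of preconditioned Newton-GMRES, and the linear test problem is hypothetical):

```python
import numpy as np

def implicit_euler(F, J, u0, dt, nsteps, tol=1e-12):
    # Integrate u' = -F(u). Each step solves G(u) = u - uold + dt*F(u) = 0
    # for the new u by Newton's method with the exact Jacobian J.
    u = np.array(u0, dtype=float)
    for _ in range(nsteps):
        uold = u.copy()
        for _ in range(50):
            G = u - uold + dt * F(u)
            if np.linalg.norm(G) < tol:
                break
            u = u - np.linalg.solve(np.eye(u.size) + dt * J(u), G)
    return u

# Linear test problem: F(u) = u - b, whose steady state is u = b.
b = np.array([1.0, 2.0])
u = implicit_euler(lambda u: u - b, lambda u: np.eye(2),
                   np.zeros(2), dt=0.5, nsteps=60)
```

Because the method is implicit, the time step is limited by accuracy, not stiffness; the iterate decays toward the steady state by a factor 1/(1 + dt) per step for this linear problem.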

First, we discretize in space with centered differences to obtain a system of ordinary differential equations, which we write as (3.18). This system is stiff, so implicit methods are necessary if we want to avoid unreasonably small time steps. The nonlinear equation that must be solved for the implicit Euler method with a fixed time step of dt is (3.19). To precondition from the left with the fast Poisson solver fish2d.m, one solves, at each time step, the nonlinear equation (3.20), where M represents the application of the fish2d.m solver. We follow the procedure from section 2.7.5 to prepare the nonlinear systems that must be solved for each time step. Because the nonlinear residual F has been constructed, integration in time proceeds just as it did in section 2.7.5, and the MATLAB code pdetime.m for the solve is shorter than the explanation above. The integration code is pdetimedemo.m, which uses a 63 x 63 spatial mesh and a time step of 0.1 and, at the end, compares the result at t = 1 with the steady-state solution.

3.7 Projects

3.7.1 Krylov Methods and the Forcing Term

Compare the performance of the three Krylov methods and the various choices of the forcing term for the H-equation, the OZ equations, and the convection-diffusion equation. Make the comparison in terms of computing time, number of function evaluations needed to reach a given tolerance, and storage requirements. If GMRES is limited to the storage that BiCGSTAB or TFQMR needs, how well does it perform? Do all choices of the forcing term lead to the same root?

3.7.2 Left and Right Preconditioning

Use the pdeldemo.m and pderdemo.m codes in the directory for this chapter to examine the quality of the Poisson solver preconditioner. Modify the codes to refine the mesh and see if the performance of the preconditioner degrades as the mesh is refined. Compare the accuracy of the results.

3.7.3 Two-Point Boundary Value Problem

Try to solve the boundary value problem from section 2.7.4 with nsoli.m, which solves linear systems with GMRES. You'll need a preconditioner to get good performance. Try using a factorization of F'(x_0) to build one. Would an incomplete factorization (like luinc from MATLAB) work?

3.7.4 Making a Movie

The code ozmovie.m in the directory for this chapter solves the OZ equations and makes a movie of the iterations for h. Use this code as a template to make movies of solutions and nonlinear residuals for some of the other examples. This is especially interesting for the differential equation problems.

Source Code for nsoli.m

function [sol, it_hist, ierr, x_hist] = nsoli(x,f,tol,parms)
% NSOLI  Newton-Krylov solver, globally convergent
%        solver for f(x) = 0
%
% Inexact Newton-Armijo iteration
%
% Eisenstat-Walker forcing term
%
% Parabolic line search via three-point interpolation
%
% C. T. Kelley, April 1, 2003
%
% This code comes with no guarantee or warranty of any kind.
%
% function [sol, it_hist, ierr, x_hist] = nsoli(x,f,tol,parms)
%
% inputs:
%        initial iterate = x
%        function = f
%        tol = [atol, rtol] relative/absolute
%             error tolerances for the nonlinear iteration
%        parms = [maxit, maxitl, etamax, lmeth, restart_limit]
%             maxit = maximum number of nonlinear iterations
%                 default = 40
%             maxitl = maximum number of inner iterations before restart
%                 in GMRES(m), m = maxitl
%                 default = 40
%
%                 For iterative methods other than GMRES(m) maxitl
%                 is the upper bound on linear iterations.
%
%             |etamax| = Maximum error tolerance for residual in inner
%                 iteration. The inner iteration terminates
%                 when the relative linear residual is
%                 smaller than eta*|F(x_c)|. eta is determined
%                 by the modified Eisenstat-Walker formula if etamax > 0.
%                 If etamax < 0, then eta = |etamax| for the entire
%                 iteration.
%                 default: etamax = .9
%
%             lmeth = choice of linear iterative method
%                 1 (GMRES), 2 GMRES(m),
%                 3 (BICGSTAB), 4 (TFQMR)
%                 default = 1 (GMRES, no restarts)
%
%             restart_limit = max number of restarts for GMRES if
%                 lmeth = 2
%                 default = 20
%
% output:
%        sol = solution
%        it_hist(maxit,3) = l2-norms of nonlinear residuals
%             for the iteration, number of function evaluations,
%             and number of step-length reductions
%        ierr = 0 upon successful termination
%        ierr = 1 if after maxit iterations
%             the termination criterion is not satisfied
%        ierr = 2 failure in the line search. The iteration
%             is terminated if too many step-length reductions
%             are taken.
%
%        x_hist = matrix of the entire iteration history.
%             The columns are the nonlinear iterates. This
%             is useful for making movies, for example, but
%             can consume way too much storage. This is an
%             OPTIONAL argument. Storage is only allocated
%             if x_hist is in the output argument list.
%
% internal parameters:
%        debug = turns on/off iteration statistics display as
%             the iteration progresses
%
%        alpha = 1.d-4, parameter to measure sufficient decrease
%
%        sigma0 = .1, sigma1 = .5, safeguarding bounds for the line search
%
%        maxarm = 20, maximum number of step-length reductions before
%             failure is reported
%
%
% Set the debug parameter; 1 turns display on, otherwise off.
%
debug = 0;
%
% Set internal parameters.
%
alpha = 1.d-4; sigma0 = .1; sigma1 = .5;
maxarm = 20; gamma = .9;
%
% Initialize it_hist, ierr, x_hist, and set the default values of
% those iteration parameters that are optional inputs.
%
ierr = 0; maxit = 40; lmaxit = 40; etamax = .9;
it_histx = zeros(maxit,3);
lmeth = 1; restart_limit = 20;
if nargout == 4, x_hist = x; end
%
% Initialize parameters for the iterative methods.

% Check for optional inputs.
%
gmparms = [abs(etamax), lmaxit];
if nargin == 4
    maxit = parms(1); lmaxit = parms(2); etamax = parms(3);
    it_histx = zeros(maxit,3);
    gmparms = [abs(etamax), lmaxit];
    if length(parms) >= 4
        lmeth = parms(4);
    end
    if length(parms) == 5
        gmparms = [abs(etamax), lmaxit, parms(5), 1];
    end
end
%
rtol = tol(2); atol = tol(1); n = length(x); fnrm = 1; itc = 0;
%
% Evaluate f at the initial iterate, and
% compute the stop tolerance.
%
f0 = feval(f,x);
fnrm = norm(f0);
it_histx(itc+1,1) = fnrm; it_histx(itc+1,2) = 0; it_histx(itc+1,3) = 0;
fnrmo = 1;
stop_tol = atol + rtol*fnrm;
outstat(itc+1, :) = [itc fnrm 0 0 0];
%
% main iteration loop
%
while(fnrm > stop_tol & itc < maxit)
%
%   Keep track of the ratio (rat = fnrm/fnrmo)
%   of successive residual norms and
%   the iteration counter (itc).
%
    rat = fnrm/fnrmo;
    fnrmo = fnrm;
    itc = itc+1;
    [step, errstep, inner_it_count, inner_f_evals] = ...
        dkrylov(f0, f, x, gmparms, lmeth);
%
%   The line search starts here.
%
    xold = x;
    lambda = 1; lamm = 1; lamc = lambda; iarm = 0;
    xt = x + lambda*step;
    ft = feval(f,xt);
    nft = norm(ft); nf0 = norm(f0);
    ff0 = nf0*nf0; ffc = nft*nft; ffm = nft*nft;
    while nft >= (1 - alpha*lambda) * nf0;
%
%       Apply the three-point parabolic model.
%
        if iarm == 0
            lambda = sigma1*lambda;
        else
            lambda = parab3p(lamc, lamm, ff0, ffc, ffm);
        end
%
%       Update x; keep the books on lambda.
%
        xt = x+lambda*step;
        lamm = lamc;
        lamc = lambda;
%
%       Keep the books on the function norms.
%
        ft = feval(f,xt);
        nft = norm(ft);
        ffm = ffc;
        ffc = nft*nft;
        iarm = iarm+1;
        if iarm > maxarm
            disp('Armijo failure, too many reductions');
            ierr = 2;
            disp(outstat)
            it_hist = it_histx(1:itc+1,:);
            if nargout == 4, x_hist = [x_hist,x]; end
            sol = xold;
            return;
        end
    end
    x = xt;
    f0 = ft;
%
%   End of line search.
%
    if nargout == 4, x_hist = [x_hist,x]; end
    fnrm = norm(f0);
    it_histx(itc+1,1) = fnrm;
%
%   How many function evaluations did this iteration require?
%
    it_histx(itc+1,2) = it_histx(itc,2)+inner_f_evals+iarm+1;
    if itc == 1, it_histx(itc+1,2) = it_histx(itc+1,2)+1; end;
    it_histx(itc+1,3) = iarm;
%
    rat = fnrm/fnrmo;
%
%   Adjust eta as per Eisenstat-Walker.
%
    if etamax > 0

        etaold = gmparms(1);
        etanew = gamma*rat*rat;
        if gamma*etaold*etaold > .1
            etanew = max(etanew, gamma*etaold*etaold);
        end
        gmparms(1) = min([etanew, etamax]);
        gmparms(1) = max(gmparms(1), .5*stop_tol/fnrm);
    end
%
    outstat(itc+1, :) = [itc fnrm inner_it_count rat iarm];
%
end
sol = x;
it_hist = it_histx(1:itc+1,:);
if debug == 1
    disp(outstat)
    it_hist = it_histx(1:itc+1,:);
end
%
% On failure, set the error flag.
%
if fnrm > stop_tol, ierr = 1; end
%
%
function lambdap = parab3p(lambdac, lambdam, ff0, ffc, ffm)
% Apply three-point safeguarded parabolic model for a line search.
%
% C. T. Kelley, April 1, 2003
%
% This code comes with no guarantee or warranty of any kind.
%
% function lambdap = parab3p(lambdac, lambdam, ff0, ffc, ffm)
%
% input:
%        lambdac = current step length
%        lambdam = previous step length
%        ff0 = value of || F(x_c) ||^2
%        ffc = value of || F(x_c + lambdac d) ||^2
%        ffm = value of || F(x_c + lambdam d) ||^2
%
% output:
%        lambdap = new value of lambda given parabolic model
%
% internal parameters:
%        sigma0 = .1, sigma1 = .5, safeguarding bounds for the line search
%
%
% Set internal parameters.
%
sigma0 = .1; sigma1 = .5;
%
% Compute coefficients of interpolation polynomial.
%
% p(lambda) = ff0 + (c1 lambda + c2 lambda^2)/d1
%
% d1 = (lambdac - lambdam)*lambdac*lambdam < 0
%      so if c2 > 0 we have negative curvature and default to
%      lambdap = sigma1 * lambdac.
%
c2 = lambdam*(ffc-ff0)-lambdac*(ffm-ff0);
if c2 >= 0
    lambdap = sigma1*lambdac; return
end
c1 = lambdac*lambdac*(ffm-ff0)-lambdam*lambdam*(ffc-ff0);
lambdap = -c1*.5/c2;
if lambdap < sigma0*lambdac, lambdap = sigma0*lambdac; end
if lambdap > sigma1*lambdac, lambdap = sigma1*lambdac; end
%
%
function [step, errstep, total_iters, f_evals] = ...
    dkrylov(f0, f, x, params, lmeth)
% Krylov linear equation solver for use in nsoli
%
% C. T. Kelley, April 1, 2003
%
% This code comes with no guarantee or warranty of any kind.
%
% function [step, errstep, total_iters, f_evals]
%                           = dkrylov(f0, f, x, params, lmeth)
%
% Input:  f0 = function at current point
%         f = nonlinear function
%              The format for f is function fx = f(x).
%              Note that for Newton-GMRES we incorporate any
%              preconditioning into the function routine.
%         x = current point
%         params = vector to control iteration
%              params(1) = relative residual reduction factor
%              params(2) = max number of iterations
%              params(3) = max number of restarts for GMRES(m)
%              params(4) (Optional) = reorthogonalization method in GMRES
%                   1 -- Brown/Hindmarsh condition (default)
%                   2 -- Never reorthogonalize (not recommended)
%                   3 -- Always reorthogonalize (not cheap!)
%
%         lmeth = method choice
%              1 GMRES without restarts (default)
%              2 GMRES(m), m = params(2) and the maximum number
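The parabolic model in parab3p is easy to sanity-check: if the three samples actually lie on a quadratic whose minimizer is inside the safeguard interval, the routine should return that minimizer exactly. Here is a Python rendering of the same logic (a sketch for testing, not part of the distributed codes):

```python
def parab3p(lambdac, lambdam, ff0, ffc, ffm, sigma0=0.1, sigma1=0.5):
    # Minimizer of the quadratic through (0, ff0), (lambdac, ffc),
    # (lambdam, ffm), safeguarded to [sigma0*lambdac, sigma1*lambdac].
    c2 = lambdam * (ffc - ff0) - lambdac * (ffm - ff0)
    if c2 >= 0:                    # negative curvature: just cut the step
        return sigma1 * lambdac
    c1 = lambdac**2 * (ffm - ff0) - lambdam**2 * (ffc - ff0)
    return min(max(-0.5 * c1 / c2, sigma0 * lambdac), sigma1 * lambdac)

# Samples of p(lam) = (lam - 0.2)^2 + 0.01 at lam = 0, 0.5, 1.0;
# the minimizer 0.2 lies inside the safeguard interval [0.05, 0.25].
lam = parab3p(0.5, 1.0, 0.05, 0.10, 0.65)
```

With exact quadratic data the interpolation is exact, so lam comes back as 0.2.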

%                   of restarts is params(3)
%              3 BiCGSTAB
%              4 TFQMR
%
% Output: x = solution
%         errstep = vector of residual norms for the history of
%                 the iteration
%         total_iters = number of iterations
%
%
% initialization
%
lmaxit = params(2);
restart_limit = 20;
if length(params) >= 3
    restart_limit = params(3);
end
if lmeth == 1, restart_limit = 0; end
if length(params) == 3
%
% default reorthogonalization
%
    gmparms = [params(1), params(2), 1];
elseif length(params) == 4
%
% reorthogonalization method is params(4)
%
    gmparms = [params(1), params(2), params(4)];
else
    gmparms = [params(1), params(2)];
end
%
% linear iterative methods
%
if lmeth == 1 | lmeth == 2  % GMRES or GMRES(m)
%
%   Compute the step using a GMRES routine especially designed
%   for this purpose.
%
    [step, errstep, total_iters] = dgmres(f0, f, x, gmparms);
    kinn = 0;
%
%   Restart at most restart_limit times.
%
    while total_iters == lmaxit & ...
          errstep(total_iters) > gmparms(1)*norm(f0) & ...
          kinn < restart_limit
        kinn = kinn+1;
        [step, errstep, total_iters] = ...
            dgmres(f0, f, x, gmparms, step);
    end
    total_iters = total_iters+kinn*lmaxit;
    f_evals = total_iters+kinn;
%
% BiCGSTAB
%
elseif lmeth == 3
    [step, errstep, total_iters] = dcgstab(f0, f, x, gmparms);
    f_evals = 2*total_iters;
%
% TFQMR
%
elseif lmeth == 4
    [step, errstep, total_iters] = dtfqmr(f0, f, x, gmparms);
    f_evals = 2*total_iters;
else
    error('lmeth error in fdkrylov')
end
%
%
function z = dirder(x,w,f,f0)
% Finite difference directional derivative
% Approximate f'(x) w.
%
% C. T. Kelley, April 1, 2003
%
% This code comes with no guarantee or warranty of any kind.
%
% function z = dirder(x,w,f,f0)
%
% inputs:
%         x, w = point and direction
%         f = function
%         f0 = f(x), in nonlinear iterations
%              f(x) has usually been computed
%              before the call to dirder.
%
% Use a hardwired difference increment.
%
epsnew = 1.d-7;
%
n = length(x);
%
% Scale the step.
%
if norm(w) == 0
    z = zeros(n,1);
    return
end

%
% Now scale the difference increment.
%
xs=(x'*w)/norm(w);
if xs ~= 0.d0
    epsnew=epsnew*max(abs(xs),1.d0)*sign(xs);
end
epsnew=epsnew/norm(w);
%
% del and f1 could share the same space if storage
% is more important than clarity.
%
del = x+epsnew*w;
f1 = feval(f,del);
z = (f1 - f0)/epsnew;
%
%
function [x, error, total_iters] = dgmres(f0, f, xc, params, xinit)
% GMRES linear equation solver for use in Newton-GMRES solver
%
% C. T. Kelley, April 1, 2003
%
% This code comes with no guarantee or warranty of any kind.
%
% function [x, error, total_iters] = dgmres(f0, f, xc, params, xinit)
%
% Input:  f0 = function at current point
%         f = nonlinear function
%              The format for f is function fx = f(x).
%              Note that for Newton-GMRES we incorporate any
%              preconditioning into the function routine.
%         xc = current point
%         params = vector to control iteration
%              params(1) = relative residual reduction factor
%              params(2) = max number of iterations
%              params(3) (Optional) = reorthogonalization method
%                   1 -- Brown/Hindmarsh condition (default)
%                   2 -- Never reorthogonalize (not recommended)
%                   3 -- Always reorthogonalize (not cheap!)
%
%         xinit = initial iterate. xinit = 0 is the default. This
%              is a reasonable choice unless restarted GMRES
%              will be used as the linear solver.
%
% Output: x = solution
%         error = vector of residual norms for the history of
%                 the iteration
%         total_iters = number of iterations
%
% requires givapp.m, dirder.m
%
% initialization
%
errtol = params(1);
kmax = params(2);
reorth = 1;
if length(params) == 3
    reorth = params(3);
end
%
% The right side of the linear equation for the step is -f0.
%
b = -f0;
n = length(b);
%
% Use zero vector as initial iterate for Newton step unless
% the calling routine has a better idea (useful for GMRES(m)).
%
x = zeros(n,1);
r = b;
if nargin == 5
    x = xinit;
    r = -dirder(xc, x, f, f0)-f0;
end
%
h = zeros(kmax);
v = zeros(n,kmax);
c = zeros(kmax+1,1);
s = zeros(kmax+1,1);
rho = norm(r);
g = rho*eye(kmax+1,1);
errtol = errtol*norm(b);
error = [];
%
% Test for termination on entry.
%
error = [error,rho];
total_iters = 0;
if(rho < errtol)
%   disp('early termination')
    return
end
%
v(:,1) = r/rho;
beta = rho;
k = 0;
%
% GMRES iteration
%
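The increment scaling in dirder is the part worth testing: the difference step is scaled both by the size of x along w and by ||w||. A Python analogue (an illustrative sketch, with a hypothetical test function):

```python
import numpy as np

def dirder(x, w, F, f0, eps=1e-7):
    # Forward-difference approximation to F'(x) w, with the same
    # scaling of the increment as dirder in nsoli.m.
    normw = np.linalg.norm(w)
    if normw == 0:
        return np.zeros_like(x)
    xs = np.dot(x, w) / normw
    if xs != 0.0:
        eps = eps * max(abs(xs), 1.0) * np.sign(xs)
    eps = eps / normw
    return (F(x + eps * w) - f0) / eps

F = lambda x: np.array([x[0]**2, x[0]*x[1]])
x = np.array([1.0, 2.0]); w = np.array([1.0, 1.0])
jw = dirder(x, w, F, F(x))    # exact F'(x)w is [2, 3] for this F
```

Only one extra evaluation of F is needed per Jacobian-vector product, which is what makes the matrix-free Krylov solvers above possible.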

while((rho > errtol) & (k < kmax))
    k = k+1;
%
%   Call directional derivative function.
%
    v(:,k+1) = dirder(xc, v(:,k), f, f0);
    normav = norm(v(:,k+1));
%
%   Modified Gram-Schmidt
%
    for j = 1:k
        h(j,k) = v(:,j)'*v(:,k+1);
        v(:,k+1) = v(:,k+1)-h(j,k)*v(:,j);
    end
    h(k+1,k) = norm(v(:,k+1));
    normav2 = h(k+1,k);
%
%   Reorthogonalize?
%
    if (reorth == 1 & normav + .001*normav2 == normav) | reorth == 3
        for j = 1:k
            hr = v(:,j)'*v(:,k+1);
            h(j,k) = h(j,k)+hr;
            v(:,k+1) = v(:,k+1)-hr*v(:,j);
        end
        h(k+1,k) = norm(v(:,k+1));
    end
%
%   Watch out for happy breakdown.
%
    if(h(k+1,k) ~= 0)
        v(:,k+1) = v(:,k+1)/h(k+1,k);
    end
%
%   Form and store the information for the new Givens rotation.
%
    if k > 1
        h(1:k,k) = givapp(c(1:k-1),s(1:k-1),h(1:k,k),k-1);
    end
%
%   Don't divide by zero if solution has been found.
%
    nu = norm(h(k:k+1,k));
    if nu ~= 0
        c(k) = conj(h(k,k)/nu);
        s(k) = -h(k+1,k)/nu;
        h(k,k) = c(k)*h(k,k)-s(k)*h(k+1,k);
        h(k+1,k) = 0;
        g(k:k+1) = givapp(c(k),s(k),g(k:k+1),1);
    end
%
%   Update the residual norm.
%
    rho = abs(g(k+1));
    error = [error,rho];
%
%   end of the main while loop
%
end
%
% At this point either k > kmax or rho < errtol.
% It's time to compute x and leave.
%
y = h(1:k,1:k)\g(1:k);
total_iters = k;
x = x + v(1:n,1:k)*y;
%
%
function vrot = givapp(c,s,vin,k)
% Apply a sequence of k Givens rotations, used within GMRES codes.
%
% C. T. Kelley, April 1, 2003
%
% This code comes with no guarantee or warranty of any kind.
%
% function vrot = givapp(c, s, vin, k)
%
vrot = vin;
for i = 1:k
    w1 = c(i)*vrot(i)-s(i)*vrot(i+1);
%
%   Here's a modest change that makes the code work in complex
%   arithmetic. Thanks to Howard Elman for this.
%
%   w2 = s(i)*vrot(i)+c(i)*vrot(i+1);
    w2 = s(i)*vrot(i)+conj(c(i))*vrot(i+1);
    vrot(i:i+1) = [w1,w2];
end
%
%
function [x, error, total_iters] = ...
    dcgstab(f0, f, xc, params, xinit)
% Forward difference BiCGSTAB solver for use in nsoli
%
% C. T. Kelley, April 1, 2003
%
% This code comes with no guarantee or warranty of any kind.
%

% function [x, error, total_iters]
%                    = dcgstab(f0, f, xc, params, xinit)
%
% Input:  f0 = function at current point
%         f = nonlinear function
%              The format for f is function fx = f(x).
%              Note that for Newton-GMRES we incorporate any
%              preconditioning into the function routine.
%         xc = current point
%         params = two-dimensional vector to control iteration
%              params(1) = relative residual reduction factor
%              params(2) = max number of iterations
%
%         xinit = initial iterate. xinit = 0 is the default. This
%              is a reasonable choice unless restarts are needed.
%
% Output: x = solution
%         error = vector of residual norms for the history of
%                 the iteration
%         total_iters = number of iterations
%
% Requires: dirder.m
%
% initialization
%
b = -f0;
n = length(b); errtol = params(1)*norm(b); kmax = params(2); error = [];
rho = zeros(kmax+1,1);
%
% Use zero vector as initial iterate for Newton step unless
% the calling routine has a better idea (useful for GMRES(m)).
%
x = zeros(n,1);
r = b;
if nargin == 5
    x = xinit;
    r = -dirder(xc, x, f, f0)-f0;
end
%
hatr0 = r;
k = 0; rho(1) = 1; alpha = 1; omega = 1;
v = zeros(n,1); p = zeros(n,1); rho(2) = hatr0'*r;
zeta = norm(r); error = [error,zeta];
%
% BiCGSTAB iteration
%
while((zeta > errtol) & (k < kmax))
    k = k+1;
    if omega == 0
        error('BiCGSTAB breakdown, omega = 0');
    end
    beta = (rho(k+1)/rho(k))*(alpha/omega);
    p = r+beta*(p - omega*v);
    v = dirder(xc,p,f,f0);
    tau = hatr0'*v;
    if tau == 0
        error('BiCGSTAB breakdown, tau = 0');
    end
    alpha = rho(k+1)/tau;
    s = r-alpha*v;
    t = dirder(xc,s,f,f0);
    tau = t'*t;
    if tau == 0
        error('BiCGSTAB breakdown, t = 0');
    end
    omega = t'*s/tau;
    rho(k+2) = -omega*(hatr0'*t);
    x = x+alpha*p+omega*s;
    r = s-omega*t;
    zeta = norm(r);
    total_iters = k;
    error = [error, zeta];
end
%
%
function [x, error, total_iters] = ...
    dtfqmr(f0, f, xc, params, xinit)
% Forward difference TFQMR solver for use in nsoli
%
% C. T. Kelley, April 1, 2003
%
% This code comes with no guarantee or warranty of any kind.
%
% function [x, error, total_iters]
%                    = dtfqmr(f0, f, xc, params, xinit)
%
% Input:  f0 = function at current point
%         f = nonlinear function
%              The format for f is function fx = f(x).
%              Note that for Newton-GMRES we incorporate any
%              preconditioning into the function routine.
%         xc = current point
%         params = vector to control iteration
%              params(1) = relative residual reduction factor
%              params(2) = max number of iterations
%
%         xinit = initial iterate. xinit = 0 is the default. This
%              is a reasonable choice unless restarts are needed.

% Output: x = solution
%         error = vector of residual norms for the history of
%                 the iteration
%         total_iters = number of iterations
%
% Requires: dirder.m
%
% initialization
%
b = -f0;
n = length(b); errtol = params(1)*norm(b); kmax = params(2); error = [];
x = zeros(n,1);
r = b;
if nargin == 5
    x = xinit;
    r = -dirder(xc, x, f, f0)-f0;
end
%
u = zeros(n,2); y = zeros(n,2); w = r; y(:,1) = r;
k = 0; d = zeros(n,1);
v = dirder(xc, y(:,1),f,f0);
u(:,1) = v;
theta = 0; eta = 0; tau = norm(r); error = [error,tau];
rho = tau*tau;
%
% TFQMR iteration
%
while( k < kmax)
    k = k+1;
    sigma = r'*v;
%
    if sigma == 0
        error('TFQMR breakdown, sigma = 0')
    end
%
    alpha = rho/sigma;
%
    for j = 1:2
%
%       Compute y2 and u2 only if you have to.
%
        if j == 2
            y(:,2) = y(:,1)-alpha*v;
            u(:,2) = dirder(xc, y(:,2),f,f0);
        end
        m = 2*k-2+j;
        w = w-alpha*u(:,j);
        d = y(:,j)+(theta*theta*eta/alpha)*d;
        theta = norm(w)/tau; c = 1/sqrt(1+theta*theta);
        tau = tau*theta*c; eta = c*c*alpha;
        x = x+eta*d;
%
%       Try to terminate the iteration at each pass through the loop.
%
        if tau*sqrt(m+1) <= errtol
            error = [error, tau];
            total_iters = k;
            return
        end
    end
%
    if rho == 0
        error('TFQMR breakdown, rho = 0')
    end
%
    rhon = r'*w; beta = rhon/rho; rho = rhon;
    y(:,1) = w + beta*y(:,2);
    u(:,1) = dirder(xc, y(:,1),f,f0);
    v = u(:,1)+beta*(u(:,2)+beta*v);
    error = [error, tau];
    total_iters = k;
end
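To see how the pieces fit together without the Krylov machinery, here is a stripped-down Newton-Armijo loop in Python (an illustrative sketch only: it builds the Jacobian column by column with dirder-style forward differences and uses a dense solve where nsoli.m would call GMRES; the test problem is hypothetical):

```python
import numpy as np

def fd_jac(F, x, f0, eps=1e-7):
    # Forward-difference Jacobian, one column per coordinate direction.
    n = x.size
    J = np.zeros((n, n))
    for j in range(n):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (F(xp) - f0) / eps
    return J

def newton_armijo(F, x, atol=1e-10, maxit=40, alpha=1e-4):
    fx = F(x)
    for _ in range(maxit):
        if np.linalg.norm(fx) <= atol:
            break
        step = np.linalg.solve(fd_jac(F, x, fx), -fx)
        lam = 1.0
        # Armijo: accept when ||F(x + lam*step)|| < (1 - alpha*lam)*||F(x)||
        while np.linalg.norm(F(x + lam * step)) >= (1 - alpha * lam) * np.linalg.norm(fx):
            lam *= 0.5            # simple halving instead of parab3p
            if lam < 1e-12:
                raise RuntimeError('line search failure')
        x = x + lam * step
        fx = F(x)
    return x

# Hypothetical test problem with root (1, 1):
F = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0, x[0] - x[1]])
root = newton_armijo(F, np.array([2.0, 0.5]))
```

The structure (outer Newton loop, inner line search, sufficient-decrease test) is the same as in nsoli.m; only the linear solver differs.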

Chapter 4

Broyden's Method

Broyden's method [14] approximates the Newton direction by using an approximation of the Jacobian (or its inverse), which is updated as the nonlinear iteration progresses. The cost of this updating in the modern implementation we advocate here is one vector for each nonlinear iteration. Contrast this cost to Newton-GMRES, where the storage is accumulated during a linear iteration. For a problem where the initial iterate is far from a solution and the number of nonlinear iterations will be large, this is a significant disadvantage for Broyden's method. Broyden's method usually requires preconditioning to perform well, so the decisions you will make are the same as those for a Newton-Krylov method. Moreover, Broyden's method, like the secant method for scalar equations, does not guarantee that the approximate Newton direction will be a descent direction for ||F||, and therefore a line search may fail. For these reasons, the Newton-Krylov methods are now (2003) used more frequently than Broyden's method. Having said that, when the initial iterate is near the solution, Broyden's method can perform very well.

Broyden's method is the simplest of the quasi-Newton methods. These methods are extensions of the secant method to several variables. Recall that the secant method approximates f'(x_n) with

    b_n = (f(x_n) - f(x_{n-1}))/(x_n - x_{n-1})     (4.1)

and then takes the step

    x_{n+1} = x_n - f(x_n)/b_n.

One way to mimic this in higher dimensions is to carry an approximation to the Jacobian along with the approximation to x* and update the approximate Jacobian as the iteration progresses. The formula (4.1) for b_n will not do, because one can't divide by a vector. However, one can ask that B_n, the current approximation to F'(x_n), satisfy the secant equation

    B_n s_{n-1} = y_{n-1},     (4.2)

where s_{n-1} = x_n - x_{n-1} and y_{n-1} = F(x_n) - F(x_{n-1}).

For scalar equations, (4.2) and (4.1) are equivalent. For equations in more than one variable, (4.1) is meaningless, and the secant equation alone does not determine B_n, so a wide variety of methods that satisfy the secant equation have been designed to preserve such properties of the Jacobian as the sparsity pattern or symmetry [24,42,43].

4.1 Convergence Theory

The convergence theory for Broyden's method is only local and, therefore, less satisfactory than that for the Newton and Newton-Krylov methods. Theorem 4.1 is all there is.

Theorem 4.1. Let the standard assumptions hold. Then there are delta and delta_B such that, if

    ||x_0 - x*|| <= delta  and  ||B_0 - F'(x*)|| <= delta_B,

then the Broyden sequence for the data (F, x_0, B_0) exists and x_n -> x* q-superlinearly.

4.2 An Algorithmic Sketch

Most implementations of Broyden's method, our code brsola.m among them, incorporate a line search. The line search cannot be proved to compensate for a poor initial iterate; this may not work, and a code must be prepared for the line search to fail. Keep in mind the warning in section 1.7.1! The algorithm follows the broad outline of nsolg. The data now include an initial approximation B to the Jacobian. In the case of Broyden's method, if x_n and B_n are the current approximate solution and Jacobian, respectively, then

    x_{n+1} = x_n + lambda_n d_n,     (4.3)

where lambda_n is the step length for the Broyden direction

    d_n = -B_n^{-1} F(x_n).

After the computation of x_{n+1}, B_n is updated to form B_{n+1} using the Broyden update

    B_{n+1} = B_n + (y - B_n s) s^T / (s^T s),     (4.4)

where y = F(x_{n+1}) - F(x_n) and, in (4.4), s = x_{n+1} - x_n = lambda_n d_n.

Algorithm 4.1. broyden_sketch(x, B, F, τ_a, τ_r)
    Evaluate F(x); τ ← τ_r ||F(x)|| + τ_a.
    while ||F(x)|| > τ do
        Solve Bd = -F(x).
        Use a line search to compute a step length λ. If the line search fails, terminate.
        x ← x + λd
        Evaluate F(x); update B.
    end while

The local convergence theory for Broyden's method is completely satisfactory. If the standard assumptions hold and the data x_0 and B_0 are accurate approximations to x* and F'(x*), then the convergence is q-superlinear. Unlike inexact Newton methods or Newton iterative methods, quasi-Newton methods need only one function evaluation for each nonlinear iteration. The storage requirements for Broyden's method, as we will see, are very similar to those for Newton-GMRES.

There are many ways to obtain a good B_0. If the initial iterate is accurate, B_0 = F'(x_0) is a good choice. Letting B_0 be the highest order term in a discretized elliptic partial differential equation or the noncompact term in an integral equation is another example.

Suppose A ≈ F'(x*). Rather than use B_0 = A, one could apply Broyden's method to the left-preconditioned problem A^{-1}F(x) = 0 and use B_0 = I. Keep in mind that one will never compute and store A^{-1}, but rather factor A and store the factors. This will amortize the O(N^3) factorization of A over the entire nonlinear iteration. One then applies this factorization at a cost of O(N^2) floating point operations whenever one wants to compute A^{-1}F(x) or F(A^{-1}x). Left preconditioning works in the following way: the sequence of approximate solutions produced for A^{-1}F(x) = 0 with B_0 = I and the sequence produced for F(x) = 0 with B_0 = A are exactly the same [42]. If, instead, one uses the right-preconditioned problem F(A^{-1}x) = 0, B_0 = I is still a good choice, but the nonlinear iteration will be different.

4.3 Computing the Broyden Step and Update

One way to solve the equation for the Broyden step is to factor B_n with each iteration. This, of course, eliminates part of the advantage of approximating the Jacobian. One can also factor B_0 and update that factorization (see [24] for one way to do this), but this is also extremely costly. Most quasi-Newton codes instead update B^{-1} as the iteration progresses, using preconditioning to arrange things so that B_0 = I. The next step is to use the Sherman-Morrison formula [69,70]: if B is a nonsingular matrix and u, v ∈ R^N, then B + uv^T is nonsingular if and only if

    1 + v^T B^{-1} u ≠ 0.

In that case,

    (B + uv^T)^{-1} = (I - (B^{-1}u) v^T / (1 + v^T B^{-1}u)) B^{-1}.    (4.5)
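The Sherman-Morrison identity is easy to check numerically. The following Python sketch compares the two sides of the formula on arbitrary random test data (none of which comes from the book):

```python
import numpy as np

# Numerical check of the Sherman-Morrison formula:
# if 1 + v^T B^{-1} u != 0, then
#   (B + u v^T)^{-1} = (I - (B^{-1} u) v^T / (1 + v^T B^{-1} u)) B^{-1}.
rng = np.random.default_rng(0)
N = 5
B = np.eye(N) + 0.1 * rng.standard_normal((N, N))  # small perturbation of I,
u = rng.standard_normal(N)                          # so B is safely nonsingular
v = rng.standard_normal(N)

Binv = np.linalg.inv(B)
denom = 1.0 + v @ Binv @ u          # must be nonzero for the update to exist
left = np.linalg.inv(B + np.outer(u, v))
right = (np.eye(N) - np.outer(Binv @ u, v) / denom) @ Binv

assert np.allclose(left, right)     # the two expressions agree
```

The point of the formula is that the right-hand side never inverts the updated matrix: given an existing way to apply B^{-1}, the rank-one update costs only matrix-vector work.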

To apply (4.5) to Broyden's method, keeping in mind that we assume that F has been preconditioned and that B_0 = I, we write (4.4) as

    B_{n+1} = B_n + u_n s_n^T,  where u_n = (y_n - B_n s_n)/(s_n^T s_n).

Then (4.5) gives

    B_{n+1}^{-1} = (I - w_n s_n^T) B_n^{-1},  where w_n = B_n^{-1} u_n / (1 + s_n^T B_n^{-1} u_n),

and hence, since B_0 = I,

    B_n^{-1} = (I - w_{n-1} s_{n-1}^T)(I - w_{n-2} s_{n-2}^T) ... (I - w_0 s_0^T).    (4.6)

So, to apply B_n^{-1} to a vector p, we use (4.6) at a cost of O(Nn) floating point operations and storage of the 2n vectors {w_k}_{k=0}^{n-1} and {s_k}_{k=0}^{n-1}. Note that we also must store the sequence of step lengths.

The storage can be halved with a trick [26,42] using the observation that the search direction satisfies

    d_n = -B_n^{-1} F(x_n).

Hence (see [42] for details) one can compute the search direction and update B simultaneously and only have to store one new vector for each nonlinear iteration. Algorithm broyden shows how this is implemented in our Broyden-Armijo MATLAB code brsola.m.

Algorithm 4.2. broyden(x, F, τ_a, τ_r)
    Evaluate F(x); τ ← τ_r ||F(x)|| + τ_a.
    d ← -F(x); compute λ_0 with a line search; terminate if the line search fails.
    s_0 ← λ_0 d; x ← x + s_0; n ← 0
    while ||F(x)|| > τ do
        z ← -F(x)
        for j = 0, ..., n - 1 do
            apply the stored update for step j to z (see (4.6) and [42])
        end for
        d ← z
        Compute λ_{n+1} with a line search. Terminate if the line search fails.
        s_{n+1} ← λ_{n+1} d; x ← x + s_{n+1}; n ← n + 1
        Evaluate F(x)
    end while
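The product form (4.6) can be exercised in a few lines. The following Python sketch (variable names follow the text; the matrices and vectors are arbitrary test data) applies the inverse as a product of rank-one factors, never forming a matrix, and checks the result against dense arithmetic:

```python
import numpy as np

def apply_Binv(p, w_list, s_list):
    # B_n^{-1} p = (I - w_{n-1} s_{n-1}^T) ... (I - w_0 s_0^T) p,
    # applied innermost (k = 0) factor first: O(N n) flops.
    for w, s in zip(w_list, s_list):
        p = p - w * (s @ p)
    return p

def new_w(w_list, s_list, u, s):
    # For the next rank-one update B_{k+1} = B_k + u s^T, compute
    # w_k = B_k^{-1} u / (1 + s^T B_k^{-1} u) from the stored pairs.
    Binv_u = apply_Binv(u, w_list, s_list)
    return Binv_u / (1.0 + s @ Binv_u)

# Check against dense arithmetic for a few random rank-one updates of I.
rng = np.random.default_rng(1)
N, n = 6, 3
B = np.eye(N)                          # B_0 = I, as the text assumes
w_list, s_list = [], []
for _ in range(n):
    u = 0.01 * rng.standard_normal(N)  # small updates keep B well conditioned
    s = rng.standard_normal(N)
    w_list.append(new_w(w_list, s_list, u, s))
    s_list.append(s)
    B = B + np.outer(u, s)             # dense bookkeeping for the check only

p = rng.standard_normal(N)
assert np.allclose(apply_Binv(p, w_list, s_list), np.linalg.solve(B, p))
```

Only the 2n vectors {w_k} and {s_k} are stored; the dense matrix B appears here purely to verify the product form.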

As is the case with GMRES, the iteration can be restarted if there is no more room to store the vectors [30,42]. A different approach, called limited memory in the optimization literature [54,55], is to replace the oldest of the stored steps with the most recent one. Our MATLAB code brsola.m allows for this.

4.4 What Can Go Wrong?

Most of the problems you'll encounter are shared with the Newton-Krylov methods. There are a few failure modes that are unique to Broyden's method.

4.4.1 Failure of the Line Search

There is no guarantee that a line search will succeed with Broyden's method. Our code brsola.m has a line search but, like the chord method, has no guarantee of global convergence. When the line search fails, better preconditioning may fix the problem, but you may need to find a better preconditioner or switch to a Newton-Krylov method.

4.4.2 Failure to Converge

The local theory for Broyden's method states that the convergence is superlinear if the data x_0 and B_0 are good. If the data are poor or you use all available storage for updating B, the nonlinear iteration may fail. As with line search failure, the preconditioner is one likely cause. When the nonlinear iteration converges slowly or the method completely fails, better preconditioning may fix this.

4.5 Using brsola.m

brsola.m is an implementation of Broyden's method as described in Algorithm broyden. The user interface is similar to those of nsold.m and nsoli.m:

    [sol, it_hist, ierr, x_hist] = brsola(x, f, tol, parms);
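The two storage policies just described (restart versus limited memory) can be sketched as follows. This Python fragment uses placeholder strings for the stored update vectors and a hypothetical storage limit maxdim = 3; it illustrates only the bookkeeping, not the numerical updates:

```python
from collections import deque

maxdim = 3  # hypothetical storage limit for the update vectors

# Policy 1: restart -- throw the whole history away when storage runs out.
store = []
for k in range(5):
    if len(store) == maxdim:
        store.clear()              # out of room: restart the update history
    store.append(f"s{k}")

# Policy 2: limited memory -- replace the oldest stored step [54,55].
window = deque(maxlen=maxdim)      # deque drops the oldest item automatically
for k in range(5):
    window.append(f"s{k}")
```

After five steps the restart policy holds only the steps since the last restart, while the limited-memory window always holds the maxdim most recent ones.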

4.5.1 Input to brsola.m

The required data for brsola.m are x, the function f, and the tolerances for termination. Exactly as in nsoli.m, x and f must be column vectors of the same length. The syntax for f is

    function = f(x)

The vector tol = (τ_a, τ_r) contains the tolerances for the termination criterion (1.12). The parms array is

    parms = (maxit, maxdim).

maxit is the upper limit on the nonlinear iterations; the default is 40. maxdim is the maximum number of nonlinear iterations before a restart (so maxdim - 1 vectors are stored for the nonlinear iteration); the default is 40. Notice that we give the line search only 10 chances to satisfy (1.21), rather than the generous 20 given to nsoli.m. One can increase this by changing an internal parameter maxarm in the code.

4.5.2 Output from brsola.m

Exactly as in nsoli.m, the outputs are the solution sol and, optionally, a history of the iteration, an error flag, and the entire sequence {x_n}. We warn you again not to ask for the sequence {x_n} unless you have the storage for this array. For large problems, asking for the iteration history {x_n} by including x_hist in the argument list can expend all of MATLAB's storage. The code heqmovie.m in the directory for this chapter is an example of how to use brsola.m and the sequence of iterates to make a movie.

The history array it_hist has three columns. The first is the Euclidean norm of the nonlinear residual ||F(x)||, the second is the cumulative number of calls to F, and the third is the number of step-size reductions done in the line search.

The error flag ierr is 0 if the nonlinear iteration terminates successfully. The failure modes are ierr = 1, which means that the termination criterion is not met after maxit iterations, and ierr = 2, which means that the step length was reduced 10 times in the line search without satisfaction of the sufficient decrease condition (1.21).

4.6 Examples

Broyden's method, when working well, is superlinearly convergent in the terminal phase of the iteration. Because of the uncertainty of the line search, however, Broyden's method is not as generally applicable as a Newton-Krylov method. For example, the line search will fail if you use brsola.m to solve the OZ equations from section 3.6.2 (unless you find a good preconditioner).

4.6.1 Chandrasekhar H-equation

We'll solve the same problem (equation, initial iterate, and tolerances) as we did in sections 2.7.3 and 3.6.1 with brsola.m using the default choices of the parameters. We used the identity as the initial approximation for the Jacobian (i.e., we did not precondition). This fragment from heqbdemo.m is the call to brsola.m:

% Solve the H-equation with brsola.m.
x=ones(n,1);
[sol, it_hist, ierr] = brsola(x,'heq',tol);

Broyden's method terminated after 7 nonlinear iterations and 8 function evaluations. We compare brsola.m with both nsoli.m and nsold.m. In Figure 4.1 (nonlinear residual versus calls to F) one can see that the nonlinear iteration is slower than the two Newton-based methods for the first few iterations, after which the updating takes effect. nsoli terminates in 5 iterations, but at a cost of 15 function evaluations. Since we can evaluate the Jacobian for the H-equation very efficiently, nsold evaluates the Jacobian only once and takes 12 nonlinear iterations and 13 function evaluations to terminate; the overall cost is about the same. broyden is at its best for this kind of problem. The MATLAB code heqbdemo.m generated these results.

Figure 4.1. Nonlinear residual versus calls to F.

4.6.2 Convection-Diffusion Equation

In this section we compare Broyden's method to the right-preconditioned partial differential equation from section 3.6.3.

The MATLAB code that generated this example is pdebrr.m. Broyden's method required 10 nonlinear iterations at a cost of 10 vectors of storage. One of the plots created by pdebrr.m shows that simply counting iterations is not enough to compare the methods. While Broyden's method takes more nonlinear iterations, the cost in terms of calls to the function is significantly less: Broyden's method takes more than 20% fewer nonlinear function evaluations. Newton-GMRES, on the other hand, took at most 6 linear iterations for each nonlinear iteration. This is an interesting example because the line search in brsola.m reduces the step length once on the second nonlinear iteration. Contrast this with the two solves using nsoli.m, which required no reduction. In spite of the extra work in the line search, Broyden's method does best on this example.

For left preconditioning, the results (obtained with pdebrl.m) are similar. In the case of the left-preconditioned convection-diffusion problem, brsola.m succeeds, reducing the step length once on iterations 2 and 3, while nsoli.m does not need the line search at all. Even so, the cost of Broyden's method in terms of calls to the function is significantly less. When storage is limited, Broyden's method is less impressive.

4.7 Source Code for brsola.m

function [sol, it_hist, ierr, x_hist] = brsola(x,f,tol,parms)
% BRSOLA  Broyden's Method solver, globally convergent
%         solver for f(x) = 0, Armijo rule, one vector storage
%
% C. T. Kelley, April 1, 2003
%
% This code comes with no guarantee or warranty of any kind.
%
% function [sol, it_hist, ierr, x_hist] = brsola(x,f,tol,parms)
%
% inputs:
%        initial iterate = x
%        function = f
%        tol = [atol, rtol] relative/absolute
%            error tolerances for the nonlinear iteration
%        parms = [maxit, maxdim]
%            maxit = maximum number of nonlinear iterations
%                default = 40
%            maxdim = maximum number of Broyden iterations
%                before restart, so maxdim-1 vectors are
%                stored
%                default = 40
%
% output:
%        sol = solution
%        it_hist(maxit,3) = l2 norms of nonlinear residuals
%            for the iteration, number function evaluations,
%            and number of steplength reductions
%        ierr = 0 upon successful termination
%        ierr = 1 if after maxit iterations
%            the termination criterion is not satisfied
%        ierr = 2 failure in the line search. The iteration
%            is terminated if too many steplength reductions
%            are taken.
%
%        x_hist = matrix of the entire iteration history.
%            The columns are the nonlinear iterates. This
%            is useful for making movies, for example, but
%            can consume way too much storage. Storage is
%            only allocated if x_hist is in the output
%            argument list. This is an OPTIONAL argument.
%
% internal parameters:
%        debug = turns on/off iteration statistics display as
%            the iteration progresses
%
%        alpha = 1.d-4, parameter to measure sufficient decrease
%
%        maxarm = 10, maximum number of steplength reductions before
%            failure is reported
%
% set the debug parameter, 1 turns display on, otherwise off
%
debug=0;
%
% initialize it_hist, ierr, and set the iteration parameters
%
ierr = 0; maxit=40; maxdim=39;
it_histx=zeros(maxit,3);
maxarm=10;
%
if nargin == 4
    maxit=parms(1); maxdim=parms(2)-1;
end
rtol=tol(2); atol=tol(1);
n = length(x); fnrm=1; itc=0; nbroy=0;
%
% evaluate f at the initial iterate, compute the stop tolerance
%
f0=feval(f,x);
fc=f0;
fnrm=norm(f0);
it_histx(itc+1,1)=fnrm; it_histx(itc+1,2)=0; it_histx(itc+1,3)=0;
fnrmo=1;
stop_tol=atol + rtol*fnrm;
outstat(itc+1, :) = [itc fnrm 0 0];
%
% terminate on entry?
%
if fnrm < stop_tol
    sol=x;
    return
end
%
% initialize the iteration history storage matrices
%
stp=zeros(n,maxdim);
stp_nrm=zeros(maxdim,1);
lam_rec=ones(maxdim,1);
%
% Set the initial step to -F, compute the step norm
%
lambda=1;
stp(:,1) = -fc;
stp_nrm(1)=stp(:,1)'*stp(:,1);
if nargout == 4
    x_hist=x;
end

%
% main iteration loop
%
while(itc < maxit)
    %
    nbroy=nbroy+1;
    %
    % keep track of successive residual norms and
    % the iteration counter (itc)
    %
    fnrmo=fnrm; itc=itc+1;
    %
    % compute the new point, test for termination before
    % adding to iteration history
    %
    xold=x; lambda=1; iarm=0; lrat=.5; alpha=1.d-4;
    x = x + stp(:,nbroy);
    if nargout == 4
        x_hist=[x_hist,x];
    end
    fc=feval(f,x);
    fnrm=norm(fc);
    ff0=fnrmo*fnrmo; ffc=fnrm*fnrm; lamc=lambda;
    %
    % Line search, we assume that the Broyden direction is an
    % inexact Newton direction. If the line search fails to
    % find sufficient decrease after maxarm steplength reductions
    % brsola.m returns with failure.
    %
    % Three-point parabolic line search
    %
    while fnrm >= (1 - lambda*alpha)*fnrmo & iarm < maxarm
        if iarm == 0
            lambda=lambda*lrat;
        else
            lambda=parab3p(lamc, lamm, ff0, ffc, ffm);
        end
        lamm=lamc; ffm=ffc; lamc=lambda;
        x = xold + lambda*stp(:,nbroy);
        fc=feval(f,x);
        fnrm=norm(fc); ffc=fnrm*fnrm;
        iarm=iarm+1;
    end
    %
    % set error flag and return on failure of the line search
    %
    if iarm == maxarm
        disp('Line search failure in brsola.m')
        ierr=2;
        it_hist=it_histx(1:itc+1,:);
        sol=xold;
        if nargout == 4
            x_hist=[x_hist,x];
        end
        return;
    end
    %
    % How many function evaluations did this iteration require?
    %
    it_histx(itc+1,1)=fnrm;
    it_histx(itc+1,2)=it_histx(itc,2)+iarm+1;
    if (itc == 1)
        it_histx(itc+1,2) = it_histx(itc+1,2)+1;
    end
    it_histx(itc+1,3)=iarm;
    %
    % terminate?
    %
    if fnrm < stop_tol
        sol=x;
        rat=fnrm/fnrmo;
        outstat(itc+1, :) = [itc fnrm iarm rat];
        it_hist=it_histx(1:itc+1,:);
        if debug == 1
            disp(outstat(itc+1,:))
        end
        return
    end
    %
    % modify the step and step norm if needed to reflect the line search
    %
    lam_rec(nbroy)=lambda;
    if lambda ~= 1
        stp(:,nbroy)=lambda*stp(:,nbroy);
        stp_nrm(nbroy)=lambda*lambda*stp_nrm(nbroy);
    end
    %
    it_hist(itc+1)=fnrm;
    rat=fnrm/fnrmo;
    outstat(itc+1, :) = [itc fnrm iarm rat];
    if debug == 1
        disp(outstat(itc+1,:))
    end
    %
    % if there's room, compute the next search direction and step norm
    % and add to the iteration history

    %
    if nbroy < maxdim+1
        z=-fc;
        if nbroy > 1
            for kbr = 1:nbroy-1
                ztmp=stp(:,kbr+1)/lam_rec(kbr+1);
                ztmp=ztmp+(1 - 1/lam_rec(kbr))*stp(:,kbr);
                ztmp=ztmp*lam_rec(kbr);
                z=z+ztmp*((stp(:,kbr)'*z)/stp_nrm(kbr));
            end
        end
        %
        % store the new search direction and its norm
        %
        a2=-lam_rec(nbroy)/stp_nrm(nbroy);
        a1=1 - lam_rec(nbroy);
        zz=stp(:,nbroy)'*z;
        a3=a1*zz/stp_nrm(nbroy);
        a4=1+a2*zz;
        stp(:,nbroy+1)=(z-a3*stp(:,nbroy))/a4;
        stp_nrm(nbroy+1)=stp(:,nbroy+1)'*stp(:,nbroy+1);
        %
    else
        %
        % out of room, time to restart
        %
        stp(:,1)=-fc;
        stp_nrm(1)=stp(:,1)'*stp(:,1);
        nbroy=0;
        %
    end
    %
    % end while
end
%
% We're not supposed to be here, we've taken the maximum
% number of iterations and not terminated.
%
sol=x;
it_hist=it_histx(1:itc+1,:);
if nargout == 4
    x_hist=[x_hist,x];
end
ierr=1;
if debug == 1
    disp(outstat)
end
%
function lambdap = parab3p(lambdac, lambdam, ff0, ffc, ffm)
% Apply three-point safeguarded parabolic model for a line search.
%
% C. T. Kelley, April 1, 2003
%
% This code comes with no guarantee or warranty of any kind.
%
% function lambdap = parab3p(lambdac, lambdam, ff0, ffc, ffm)
%
% input:
%       lambdac = current steplength
%       lambdam = previous steplength
%       ff0 = value of \| F(x_c) \|^2
%       ffc = value of \| F(x_c + \lambdac d) \|^2
%       ffm = value of \| F(x_c + \lambdam d) \|^2
%
% output:
%       lambdap = new value of lambda given parabolic model
%
% internal parameters:
%       sigma0 = .1, sigma1 = .5, safeguarding bounds for the linesearch
%
% Set internal parameters.
%
sigma0=.1; sigma1=.5;
%
% Compute coefficients of interpolation polynomial.
%
% p(lambda) = ff0 + (c1 lambda + c2 lambda^2)/d1
%
% d1 = (lambdac - lambdam)*lambdac*lambdam < 0
% so, if c2 > 0 we have negative curvature and default to
% lambdap = sigma1 * lambdac.
%
c2 = lambdam*(ffc-ff0)-lambdac*(ffm-ff0);
if c2 >= 0
    lambdap = sigma1*lambdac;
    return
end
c1 = lambdac*lambdac*(ffm-ff0)-lambdam*lambdam*(ffc-ff0);
lambdap = -c1*.5/c2;
if lambdap < sigma0*lambdac, lambdap = sigma0*lambdac; end
if lambdap > sigma1*lambdac, lambdap = sigma1*lambdac; end
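For readers following along outside MATLAB, here is a Python transcription of the parab3p model above (a sketch, not part of the book's distribution):

```python
def parab3p(lambdac, lambdam, ff0, ffc, ffm, sigma0=0.1, sigma1=0.5):
    """Three-point safeguarded parabolic model for a line search.

    lambdac, lambdam: current and previous steplengths;
    ff0, ffc, ffm: squared residual norms at steps 0, lambdac, lambdam.
    Returns the new steplength, safeguarded to [sigma0, sigma1]*lambdac.
    """
    # Interpolating polynomial p(lam) = ff0 + (c1*lam + c2*lam**2)/d1 with
    # d1 = (lambdac - lambdam)*lambdac*lambdam < 0, so c2 >= 0 signals
    # negative curvature and we fall back to simple halving-style reduction.
    c2 = lambdam * (ffc - ff0) - lambdac * (ffm - ff0)
    if c2 >= 0:
        return sigma1 * lambdac
    c1 = lambdac**2 * (ffm - ff0) - lambdam**2 * (ffc - ff0)
    lambdap = -c1 * 0.5 / c2            # minimizer of the parabolic model
    lambdap = max(lambdap, sigma0 * lambdac)   # safeguard from below
    lambdap = min(lambdap, sigma1 * lambdac)   # safeguard from above
    return lambdap
```

For example, with data sampled from the quadratic (lam - 0.1)**2 at lam = 0, 0.5, 1, the model recovers the minimizer 0.1 exactly, while data whose model minimizer lies outside the safeguard interval are clipped to sigma1*lambdac.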

Bibliography

[1] E. ANDERSON, Z. BAI, C. BISCHOF, S. BLACKFORD, J. DEMMEL, J. DONGARRA, J. DU CROZ, A. GREENBAUM, S. HAMMARLING, A. MCKENNEY, AND D. SORENSEN, LAPACK Users' Guide, Third Edition, SIAM, Philadelphia, 1999.
[2] L. ARMIJO, Minimization of functions having Lipschitz-continuous first partial derivatives, Pacific J. Math., 16 (1966), pp. 1-3.
[3] U. ASCHER AND L. PETZOLD, Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations, SIAM, Philadelphia, 1998.
[4] K. ATKINSON, An Introduction to Numerical Analysis, Second Edition, John Wiley and Sons, New York, 1989.
[5] S. BALAY, W. GROPP, L. C. MCINNES, AND B. SMITH, PETSc 2.0 Users Manual, Tech. Rep. ANL-95/11 - Revision 2.0.28, Argonne National Laboratory, Argonne, IL, 2000.
[6] S. BALAY, W. GROPP, L. C. MCINNES, AND B. SMITH, Portable, Extensible Toolkit for Scientific Computation (PETSc) home page, http://www.mcs.anl.gov/petsc.
[7] M. BOOTH, A. SCHLIPER, L. SCALES, AND A. HAYMET, Efficient solution of liquid state integral equations using the Newton-GMRES algorithm, Comput. Phys. Comm., 119 (1999), pp. 122-134.
[8] K. E. BRENAN, S. L. CAMPBELL, AND L. R. PETZOLD, Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations, Vol. 14 in Classics in Applied Mathematics, SIAM, Philadelphia, 1996.
[9] W. BRIGGS, V. E. HENSON, AND S. MCCORMICK, A Multigrid Tutorial, Second Edition, SIAM, Philadelphia, 2000.
[10] P. N. BROWN AND A. C. HINDMARSH, Reduced storage matrix methods in stiff ODE systems, J. Appl. Math. Comput., 31 (1989), pp. 40-91.
[11] P. N. BROWN AND A. C. HINDMARSH, Matrix-free methods for stiff systems of ODE's, SIAM J. Numer. Anal., 23 (1986), pp. 610-638.

[12] P. N. BROWN AND Y. SAAD, Hybrid Krylov methods for nonlinear systems of equations, SIAM J. Sci. Stat. Comput., 11 (1990), pp. 450-481.
[13] P. N. BROWN, A. C. HINDMARSH, AND L. R. PETZOLD, Using Krylov methods in the solution of large-scale differential-algebraic systems, SIAM J. Sci. Comput., 15 (1994), pp. 1467-1488.
[14] C. G. BROYDEN, A class of methods for solving nonlinear simultaneous equations, Math. Comput., 19 (1965), pp. 577-593.
[15] I. W. BUSBRIDGE, The Mathematics of Radiative Transfer, Vol. 50 in Cambridge Tracts, Cambridge University Press, Cambridge, U.K., 1960.
[16] S. L. CAMPBELL, I. C. F. IPSEN, C. T. KELLEY, AND C. D. MEYER, GMRES and the minimal polynomial, BIT, 36 (1996), pp. 664-675.
[17] S. CHANDRASEKHAR, Radiative Transfer, Dover, New York, 1960.
[18] Z. CHEN AND B. M. PETTITT, Non-isotropic solution of an OZ equation: Matrix methods for integral equations, Comput. Phys. Comm., 85 (1995), pp. 239-250.
[19] T. COFFEY, C. T. KELLEY, AND D. E. KEYES, Pseudo-transient continuation and differential-algebraic equations, SIAM J. Sci. Comput., to appear.
[20] T. F. COLEMAN AND J. J. MORE, Estimation of sparse Jacobian matrices and graph coloring problems, SIAM J. Numer. Anal., 20 (1983), pp. 187-209.
[21] A. R. CURTIS, M. J. D. POWELL, AND J. K. REID, On the estimation of sparse Jacobian matrices, J. Inst. Math. Appl., 13 (1974), pp. 117-119.
[22] R. DEMBO, S. EISENSTAT, AND T. STEIHAUG, Inexact Newton methods, SIAM J. Numer. Anal., 19 (1982), pp. 400-408.
[23] J. W. DEMMEL, Applied Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[24] J. E. DENNIS AND R. B. SCHNABEL, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Vol. 16 in Classics in Applied Mathematics, SIAM, Philadelphia, 1996.
[25] P. DEUFLHARD, Adaptive Pseudo-Transient Continuation for Nonlinear Steady State Problems, Tech. Rep. 02-14, Konrad-Zuse-Zentrum fur Informationstechnik, Berlin, March 2002.
[26] P. DEUFLHARD, R. FREUND, AND A. WALTER, Fast secant methods for the iterative solution of large nonsymmetric linear systems, Impact Comput. Sci. Engrg., 2 (1990), pp. 244-276.
[27] J. J. DONGARRA, J. R. BUNCH, C. B. MOLER, AND G. W. STEWART, LINPACK Users' Guide, SIAM, Philadelphia, 1979.

[28] S. C. EISENSTAT AND H. F. WALKER, Globally convergent inexact Newton methods, SIAM J. Optim., 4 (1994), pp. 393-422.
[29] S. C. EISENSTAT AND H. F. WALKER, Choosing the forcing terms in an inexact Newton method, SIAM J. Sci. Comput., 17 (1996), pp. 16-32.
[30] M. ENGELMAN, G. STRANG, AND K. J. BATHE, The application of quasi-Newton methods in fluid mechanics, Internat. J. Numer. Methods Engrg., 17 (1981), pp. 707-718.
[31] R. W. FREUND, A transpose-free quasi-minimal residual algorithm for non-Hermitian linear systems, SIAM J. Sci. Comput., 14 (1993), pp. 470-482.
[32] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, Third Edition, Johns Hopkins Studies in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, 1996.
[33] A. GREENBAUM, Iterative Methods for Solving Linear Systems, Vol. 17 in Frontiers in Applied Mathematics, SIAM, Philadelphia, 1997.
[34] A. GRIEWANK, Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Vol. 19 in Frontiers in Applied Mathematics, SIAM, Philadelphia, 2000.
[35] M. R. HESTENES AND E. STIEFEL, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bureau Standards, 49 (1952), pp. 409-436.
[36] D. J. HIGHAM, Trust region algorithms and timestep selection, SIAM J. Numer. Anal., 37 (1999), pp. 194-210.
[37] D. J. HIGHAM AND N. J. HIGHAM, MATLAB Guide, SIAM, Philadelphia, 2000.
[38] P. HOVLAND AND B. NORRIS, Argonne National Laboratory Computational Differentiation Project, http://www-fp.mcs.anl.gov/autodiff/.
[39] IEEE Standard for Binary Floating Point Arithmetic, Std 754-1985, IEEE, Piscataway, NJ, 1985.
[40] H. B. KELLER, Numerical Solution of Two Point Boundary Value Problems, Vol. 24 in CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, Philadelphia, 1976.
[41] C. T. KELLEY, Solution of the Chandrasekhar H-equation by Newton's method, J. Math. Phys., 21 (1980), pp. 1625-1628.
[42] C. T. KELLEY, Iterative Methods for Linear and Nonlinear Equations, Vol. 16 in Frontiers in Applied Mathematics, SIAM, Philadelphia, 1995.

[43] C. T. KELLEY, Iterative Methods for Optimization, Vol. 18 in Frontiers in Applied Mathematics, SIAM, Philadelphia, 1999.
[44] C. T. KELLEY AND D. E. KEYES, Convergence analysis of pseudo-transient continuation, SIAM J. Numer. Anal., 35 (1998), pp. 508-523.
[45] C. T. KELLEY, C. T. MILLER, AND M. D. TOCCI, Termination of Newton/chord iterations and the method of lines, SIAM J. Sci. Comput., 19 (1998), pp. 280-290.
[46] C. T. KELLEY AND T. W. MULLIKIN, Solution by iteration of H-equations in multigroup neutron transport, J. Math. Phys., 19 (1978), pp. 500-501.
[47] C. T. KELLEY AND B. M. PETTITT, A Fast Algorithm for the Ornstein-Zernike Equations, Tech. Rep. CRSC-TR02-12, Center for Research in Scientific Computation, North Carolina State University, Raleigh, NC, April 2002.
[48] T. KERKHOVEN AND Y. SAAD, On acceleration methods for coupled nonlinear elliptic systems, Numer. Math., 60 (1992), pp. 525-548.
[49] J. N. LYNESS AND C. B. MOLER, Numerical differentiation of analytic functions, SIAM J. Numer. Anal., 4 (1967), pp. 202-210.
[50] T. A. MANTEUFFEL AND S. V. PARTER, Preconditioning and boundary conditions, SIAM J. Numer. Anal., 27 (1990), pp. 656-694.
[51] J. J. MORE, B. S. GARBOW, AND K. E. HILLSTROM, User Guide for MINPACK-1, Tech. Rep. ANL-80-74, Argonne National Laboratory, Argonne, IL, 1980.
[52] T. W. MULLIKIN, Some probability distributions for neutron transport in a half space, J. Appl. Prob., 5 (1968), pp. 357-374.
[53] N. M. NACHTIGAL, S. C. REDDY, AND L. N. TREFETHEN, How fast are nonsymmetric matrix iterations?, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 778-795.
[54] J. L. NAZARETH, Conjugate gradient methods less dependent on conjugacy, SIAM Rev., 28 (1986), pp. 501-512.
[55] J. NOCEDAL, Theory of algorithms for unconstrained optimization, Acta Numer., 1 (1991), pp. 199-242.
[56] L. S. ORNSTEIN AND F. ZERNIKE, Accidental deviations of density and opalescence at the critical point of a single substance, Proc. Konink. Nederl. Akad. Wetensch., 17 (1914), pp. 793-806.
[57] J. M. ORTEGA AND W. C. RHEINBOLDT, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.

[58] M. L. OVERTON, Numerical Computing with IEEE Floating Point Arithmetic, SIAM, Philadelphia, 2001.
[59] M. PERNICE AND H. F. WALKER, NITSOL: A Newton iterative solver for nonlinear systems, SIAM J. Sci. Comput., 19 (1998), pp. 302-318.
[60] M. J. D. POWELL, A hybrid method for nonlinear equations, in Numerical Methods for Nonlinear Algebraic Equations, P. Rabinowitz, ed., Gordon and Breach, New York, 1970, pp. 87-114.
[61] K. RADHAKRISHNAN AND A. C. HINDMARSH, Description and Use of LSODE, the Livermore Solver for Ordinary Differential Equations, Tech. Rep. UCRL-ID-113855, Lawrence Livermore National Laboratory, Livermore, CA, December 1993.
[62] Y. SAAD, ILUM: A multi-elimination ILU preconditioner for general sparse matrices, SIAM J. Sci. Comput., 17 (1996), pp. 830-847.
[63] Y. SAAD, Iterative Methods for Sparse Linear Systems, Second Edition, SIAM, Philadelphia, 2003.
[64] Y. SAAD AND M. SCHULTZ, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7 (1986), pp. 856-869.
[65] R. B. SCHNABEL, J. E. KOONTZ, AND B. E. WEISS, A modular system of algorithms for unconstrained minimization, ACM TOMS, 11 (1985), pp. 419-440. ftp://ftp.cs.colorado.edu/users/uncmin/tape.jan30/shar
[66] V. E. SHAMANSKII, A modification of Newton's method, Ukrain. Mat. Zh., 19 (1967), pp. 133-138 (in Russian).
[67] L. F. SHAMPINE, Implementation of implicit formulas for the solution of ODEs, SIAM J. Sci. Stat. Comput., 1 (1980), pp. 103-118.
[68] L. F. SHAMPINE AND M. W. REICHELT, The MATLAB ODE suite, SIAM J. Sci. Comput., 18 (1997), pp. 1-22.
[69] J. SHERMAN AND W. J. MORRISON, Adjustment of an inverse matrix corresponding to changes in the elements of a given column or a given row of the original matrix (abstract), Ann. Math. Stat., 20 (1949), p. 621.
[70] J. SHERMAN AND W. J. MORRISON, Adjustment of an inverse matrix corresponding to a change in one element of a given matrix, Ann. Math. Stat., 21 (1950), pp. 124-127.
[71] K. SIGMON AND T. A. DAVIS, MATLAB Primer, Sixth Edition, CRC Press, Boca Raton, FL, 2002.

[72] B. SMITH, P. BJORSTAD, AND W. GROPP, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, Cambridge University Press, Cambridge, U.K., 1996.
[73] W. SQUIRE AND G. TRAPP, Using complex variables to estimate derivatives of real functions, SIAM Rev., 40 (1998), pp. 110-112.
[74] G. W. STEWART, Introduction to Matrix Computations, Academic Press, New York, 1973.
[75] A. G. TAYLOR AND A. C. HINDMARSH, User Documentation for KINSOL, a Nonlinear Solver for Sequential and Parallel Computers, Tech. Rep. UCRL-ID-131185, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, July 1998.
[76] L. N. TREFETHEN AND D. BAU III, Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[77] H. A. VAN DER VORST, BI-CGSTAB: A fast and smoothly converging variant of BI-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 13 (1992), pp. 631-644.
[78] L. T. WATSON, S. C. BILLUPS, AND A. P. MORGAN, Algorithm 652: HOMPACK: A suite of codes for globally convergent homotopy algorithms, ACM Trans. Math. Software, 13 (1987), pp. 281-310.

Index Armijo rule. 73 Diagonalizable matrix. 9. 44 CGNE. 59 Conjugate transpose. 43 directional derivative. 59 Convection-diffusion equation. 65 .m. 59 Difference increment scaling. 57 Limited memory. 89 Line search. 8. 15 iterate. 35 nsoli. 60 CGNR. 1 Inner iteration. 61 Failure to converge. 10 Conjugate gradient method. 27 GMRES. 2 difference approximation. 89 nsold.m. 61 Jacobian. 64 Chord method. 30. 15 Krylov subspace.m. 21 Forcing term. 18. 8 Inexact Newton method. 29 KINSOL. 41 heq. 86 brsola. 67.m. 1. 74 MATLAB code brsola. 29 Fourier transform. 64. 59 Diagonalizing transformation. 86 convergence. 30. 57 Forward difference 103 banded Jacobian. 27 luinc. 45 bvpsys.m. 42 Homotopy. 58 H-equation. 35 Banded Jacobian. 11 Lipschitz constant. 11 Automatic differentiation. 41. 7 Initial guess. 57 Jacobian matrix. 85. 6. 89 Fast Poisson solver. 14 Inexact Newton condition.m. 33 Codes public domain. 71. 2 Local quadratic convergence. 92 time-dependent. 89 bvpdemo. 69 fatan.m. 72 Gaussian elimination. 4 LU factorization. 66. 60 Break down. 50. 60 Chandrasekhar H-equation. 3 Local linear model. 29. 72 Fast sine transform. 91 cholinc. 58 GMRES(m).m. 60 Broyden's method. 43 Bandwidth. 15 Condition number. 30 BiCGSTAB.

58. 33 Singular Jacobian. 89 multiple solutions. 64 incomplete factorization. 73 Stiff initial value problems. 15 Modified Newton method. 2 algorithm. 35 nsoli. 6 Nested iteration. 59 nsold. 62 right. 50 Newton direction.m. 1 polynomial. 9 Newton step. 64 Preconditioning. 59 Well-conditioned Jacobian. 4 Quasi-Newton methods. 60 Normal matrix. 6 Shamanskii method.104 Index MATLAB FFT. 18 slow convergence. 69 Matrix-free. 30 numjac. 34 estimation. 25 SNES. 61. 9. 62 two-sided. 29. 10 . 15 Sparse Jacobian. 63 PETSc. 33 Ornstein-Zernike equations. 4 Standard assumptions. 15 Preconditioner algebraic multigrid. 15 Q-factor. 14 Two-point boundary value problem. 61 Secant equation. 61 Problems finite difference Jacobian error. 6 Q-linear convergence. 18 pivoting. 14 Public domain. 15 Normal equations. 60 left. 5. 20 MINPACK. 6 Q-order. 35 line search. 34 singular Jacobian. 35 poor Jacobians. 85 Secant method.m. 5 nonlinear. 20 no solution. 15 Unitary transformation. 19 storage. 59. 61. 24 Q-quadratic convergence. 16. 33 Stagnation.m. 85 Residual history. 57 NKSOL. 58 Scaling difference increment. 3 Steady-state solution. 19. 11 Newton's method. 28 Newton-Krylov methods. 57 Oversolving. 67 Outer iteration. 7. 65 numjac. 60 Trust region. 60 Memory paging. 20 Pseudotransient continuation. 11 Newton iterative method. 74 UNCMIN. 18. 17. 47 Sufficient decrease. 11 TFQMR.