
Lecture 6 & 7

Newton’s Method and Modifications

Course Website
https://sites.google.com/view/kporwal/teaching/mtl107
Newton iteration

▶ Let the function f, for which we seek a zero x∗, be differentiable.
▶ In practice, the derivative f′ should be cheap to compute
  (comparable in cost to evaluating f itself).
▶ In Newton's method the function f is linearized at some
  approximate value xk ≈ x∗.
▶ Define the function t(x) that has at x = xk the same function value
  and derivative as f (the Taylor polynomial of degree one):

      t(x) = f(xk) + f′(xk)(x − xk).

▶ The single zero of t(x) = 0 yields xk+1.


Newton iteration (cont.)

▶ The single zero of t(x) = 0 yields xk+1:

      xk+1 = xk − f(xk)/f′(xk),   k = 0, 1, ....

▶ Newton's method is a fixed point iteration with iteration function

      g(x) = x − f(x)/f′(x).

  Clearly, g(x∗) = x∗.
▶ Since we have neglected the second- and higher-order terms of the
  Taylor expansion of f at x∗, we can expect t(x) to be a very
  good approximation of f(x) if xk ≈ x∗.
Newton's method: geometric interpretation

Figure 7: Geometric interpretation (xk+1 is the zero of the tangent to f at xk)

Algorithm: Newton's iteration

Given a scalar differentiable function f (x).


▶ Start from an initial guess x0 .
▶ For k = 0, 1, 2, ... set

      xk+1 = xk − f(xk)/f′(xk)

  until xk+1 satisfies some termination criterion.
Newton’s method code:

function [x,it] = newton(f,df,x)
% NEWTON  Newton iteration for scalar equation f(x)=0.
%    x = newton(f,df,x0)
dx = f(x)/df(x); it = 0;
while abs(dx) > 1e-10
    x = x - dx;
    dx = f(x)/df(x);
    it = it + 1;
end
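
A minimal usage sketch (assuming anonymous function handles), computing √2 as in the example that follows:

% usage sketch: compute sqrt(2) as the zero of x^2 - 2, starting at x0 = 2
f  = @(x) x.^2 - 2;             % function
df = @(x) 2*x;                  % its derivative
[x, it] = newton(f, df, 2)      % x ~ 1.41421356..., it = 4 corrections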
Example (Convergence of Newton’s iteration)


Iteration for computing √a, a > 0:
▶ f(x) = x² − a,  f′(x) = 2x.
▶ g(x) = x − (x² − a)/(2x) = (x² + a)/(2x) = (x + a/x)/2
▶ g′(x) = 1 − (x² + a)/(2x²) = 1/2 − a/(2x²),   g′(√a) = 0,
▶ g″(x) = a/x³,   g″(√a) = 1/√a.

      xk+1 = (xk + a/xk)/2   =⇒   |xk+1 − √a| = |xk − √a|² / (2|xk|)
Example (Convergence of Newton iteration (cont.))
Numerical experiment: iterates for a = 2:

k   xk                     ek := xk − √2          log(|ek|/|ek−1|) : log(|ek−1|/|ek−2|)
0   2.00000000000000000    0.58578643762690485
1   1.50000000000000000    0.08578643762690485
2   1.41666666666666652    0.00245310429357137    1.850
3   1.41421568627450966    0.00000212390141452    1.984
4   1.41421356237468987    0.00000000000159472    2.000
5   1.41421356237309492    0.00000000000000022    0.630

The number of significant digits in xk essentially doubles at each
iteration. When roundoff level is reached, no meaningful
improvement can be obtained any further. The improvement from
the 4th to the 5th iteration (in this example) is minimal.
Example (Choice of initial guess)

Consider f(x) = 2 cosh(x/4) − x. f has 2 roots:

      x1∗ ≈ 2.35755106,   x2∗ ≈ 8.50719958.

The Newton iteration here is

      xk+1 = xk − (2 cosh(xk/4) − xk) / (0.5 sinh(xk/4) − 1).

Iterate until |f(xk+1)| < 10⁻⁸.

▶ Starting from x0 = 2 requires 4 iterations to reach x1∗ to within
  the given tolerance.
▶ Starting from x0 = 4 requires 5 iterations to reach x1∗.
▶ Starting from x0 = 8 requires 5 iterations to reach x2∗.
▶ Starting from x0 = 10 requires 6 iterations to reach x2∗.
The values of f(xk), starting from x0 = 8, are:

k       0        1        2        3        4         5
f(xk)   4.76e-1  8.43e-2  1.56e-3  5.65e-7  7.28e-14  1.78e-15

The number of significant digits essentially doubles at each iteration.
When roundoff level is reached, no meaningful improvement can be
obtained by heaping on more floating point operations, and the
improvement from the 4th to the 5th iteration in this example is
marginal.
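
The experiment can be reproduced with a few lines of MATLAB (a sketch; the stopping test is the one stated above):

% sketch: Newton for f(x) = 2*cosh(x/4) - x from different starting points
f  = @(x) 2*cosh(x/4) - x;
df = @(x) 0.5*sinh(x/4) - 1;
x = 8; k = 0;                        % try x0 = 2, 4, 8, 10
while abs(f(x)) >= 1e-8
    x = x - f(x)/df(x); k = k + 1;   % x0 = 2, 4 -> x1*;  x0 = 8, 10 -> x2*
end
[x, k]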
Order of convergence

The method is said to be


▶ linearly convergent if there is a constant ρ < 1 such that

|xk+1 − x ∗ | ≤ ρ|xk − x ∗ |, for k sufficiently large;

▶ quadratically convergent if there is a constant M such that

|xk+1 − x ∗ | ≤ M|xk − x ∗ |2 , for k sufficiently large;

▶ superlinearly convergent if there is a sequence of constants
  ρk → 0 such that

|xk+1 − x ∗ | ≤ ρk |xk − x ∗ |, for k sufficiently large;

The quadratic case is superlinear with ρk = M|xk − x∗| → 0.
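
The order can be estimated numerically from consecutive errors; a small sketch using the iterates of the √a example above:

% sketch: for a method of order p, log(|e_k|/|e_k-1|) / log(|e_k-1|/|e_k-2|) -> p
xs = [2; 1.5; 1.41666666666666652; 1.41421568627450966; 1.41421356237468987];
e  = abs(xs - sqrt(2));              % errors e_0, ..., e_4
r  = log(e(2:end) ./ e(1:end-1));    % log of successive error reductions
p  = r(2:end) ./ r(1:end-1)          % -> 1.850, 1.984, 2.000 (cf. table above)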


Convergence of Newton’s method
Theorem (Convergence of Newton's iteration)
If f ∈ C 2 [a, b] has a root x ∗ in [a, b] and f ′ (x ∗ ) ̸= 0, then there
exists δ > 0 such that, for any x0 in [x ∗ − δ, x ∗ + δ], Newton’s
method converges quadratically.
Proof:
(i) Since the iteration function g is continuously differentiable
with g′(x∗) = 0, there is some neighborhood [x∗ − δ, x∗ + δ] of x∗
in which |g′| < 1, so the fixed point iteration converges there.
(ii) For the convergence order use the Taylor expansion of g at x∗:

x∗ − xk+1 = g(x∗) − g(xk)
          = g(x∗) − [g(x∗) + g′(x∗)(xk − x∗) + ½ g″(ξ)(xk − x∗)²]
          = −½ g″(ξ)(xk − x∗)²,

since g′(x∗) = 0. Hence |xk+1 − x∗| ≤ M |xk − x∗|² with M = max |g″|/2.

Another example

The equation f(x) = x³ − 3x + 2 = (x + 2)(x − 1)² has two zeros:
−2 (simple) and 1 (double). The Newton iteration is

      xk+1 = g(xk) = 2xk/3 + 2/(3(xk + 1)),

      g′(x) = 2/3 − 2/(3(x + 1)²).

Since g′(−2) = 0, convergence to the simple zero −2 is quadratic;
but g′(1) = 1/2, so convergence to the double zero 1 is only linear,
the error roughly halving in each step (see the sketch below).
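
A small sketch illustrating the linear convergence to the double zero (starting at x0 = 2, in the basin of the zero 1):

% sketch: iterate g and watch the error to the double zero 1
g = @(x) 2*x/3 + 2./(3*(x + 1));
x = 2; e = [];
for k = 1:10
    x = g(x); e(k) = abs(x - 1);
end
e(2:end) ./ e(1:end-1)     % ratios tend to g'(1) = 1/2: linear convergence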
Newton iteration for multiple roots
Let x∗ be a multiple root of f: f(x) = (x − x∗)^m q(x) with q
differentiable and q(x∗) ̸= 0. (We write q for the smooth factor to
avoid a clash with the iteration function g.) Then,

g(x) = x − f(x)/f′(x)
     = x − (x − x∗)^m q(x) / [m(x − x∗)^(m−1) q(x) + (x − x∗)^m q′(x)]
     = x − (x − x∗) q(x) / [m q(x) + (x − x∗) q′(x)]

g′(x) = 1 − [q(x) + (x − x∗)q′(x)] / [m q(x) + (x − x∗)q′(x)]
        + (x − x∗)q(x) [(m + 1)q′(x) + (x − x∗)q″(x)] / [m q(x) + (x − x∗)q′(x)]²

g′(x∗) = 1 − 1/m.                                            (0.3)
EXERCISE
Newton iteration for multiple roots (cont.)

Therefore, Newton's iteration converges only linearly to multiple
roots, with convergence factor 1 − 1/m. For large m the convergence
is very slow.

For a double root (m = 2) the Lipschitz constant is L = 1/2.

Remedy: We have to extend the step length in accordance with
the multiplicity of the zero of f(x), i.e. take the step
xk+1 = xk − m f(xk)/f′(xk), which restores quadratic convergence
(see the sketch below).

Note: Often we do not know the multiplicity of a root.

Remark: One may try to apply Newton to the function u(x) = f(x)/f′(x),
which has only simple roots.
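
A minimal sketch of the remedy (a hypothetical newton_mult; assumes the multiplicity m is known in advance):

function [x,it] = newton_mult(f, df, x, m)
% NEWTON_MULT  Newton step scaled by the multiplicity m of the root.
% For a root of multiplicity m this restores quadratic convergence.
dx = m*f(x)/df(x); it = 0;
while abs(dx) > 1e-10
    x = x - dx;
    dx = m*f(x)/df(x);
    it = it + 1;
end

Applied to f(x) = x³ − 3x + 2 with m = 2 and x0 = 2, it converges quadratically to the double zero 1.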
Simplified Newton iteration

xk+1 = xk − f(xk)/f′(x0),   k = 0, 1, ....

Linear convergence, with convergence factor

K := |g′(x∗)| = |1 − f′(x∗)/f′(x0)|.

The simplified Newton iteration can be very effective if x0 is a good
approximation of x∗. Then

1 − f′(x∗)/f′(x0) ≈ 1 − f′(x∗)/f′(x∗) = 0,

so that the convergence factor K is small.
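
A sketch of the corresponding routine (the derivative is evaluated once, at x0; cf. the newton routine above):

function [x,it] = newton_simple(f, df, x)
% NEWTON_SIMPLE  simplified Newton: fixed slope f'(x0)
d0 = df(x);                 % evaluate the derivative only once
dx = f(x)/d0; it = 0;
while abs(dx) > 1e-10
    x = x - dx;
    dx = f(x)/d0;
    it = it + 1;
end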


Damped Newton
To avoid overshooting one can damp (shorten) the Newton step

xk+1 = xk − λk f (xk )/f ′ (xk ), k = 0, 1, ...

λk is chosen such that |f (xk+1 )| < |f (xk )|.

Δx = f(x)/f′(x);
while |f(x − λΔx)| > |f(x)|
    λ = λ/2;
end

Close to convergence we should let λk → 1 to have the full step


length and quadratic convergence. Before each iteration step:
λ = min(1, 2λ)
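
Putting the pieces together, a minimal sketch of a damped Newton routine (no safeguard against a stagnating inner loop, so not production quality):

function [x,it] = newton_damped(f, df, x)
% NEWTON_DAMPED  Newton iteration with step-length damping
lambda = 1; it = 0;
dx = f(x)/df(x);
while abs(dx) > 1e-10
    lambda = min(1, 2*lambda);                 % try to return to full steps
    while abs(f(x - lambda*dx)) > abs(f(x))
        lambda = lambda/2;                     % shorten until |f| decreases
    end
    x = x - lambda*dx;
    dx = f(x)/df(x); it = it + 1;
end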
Secant method

xk+1 = xk − f(xk) (xk − xk−1) / (f(xk) − f(xk−1)).      (secant step)

Notice that the secant method is obtained by approximating the
derivative in Newton's method by a finite difference,

f′(xk) ≈ (f(xk) − f(xk−1)) / (xk − xk−1).

The secant method is not a fixed point method but a multi-point
method. It can be interpreted as follows: the next iterate xk+1 is the
zero of the degree-1 polynomial that interpolates f at xk and xk−1.
Convergence rate: (1 + √5)/2 ≈ 1.618, i.e. superlinear! No derivative needed.
Secant method (cont.)

function [x,i] = secant(x0,x1,f,tol,maxit)
% SECANT  secant method for scalar equation f(x)=0
f0 = f(x0);
for i = 1:maxit
    f1 = f(x1);
    s = f1*(x1-x0)/(f1-f0);     % secant correction
    x0 = x1; x1 = x1 - s;
    if abs(s) < tol, x = x1; return; end
    f0 = f1;
end
x = NaN;                        % no convergence within maxit steps
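
A usage sketch (the tolerance 1e-10 here is an arbitrary choice; cf. the comparison table later in the lecture):

% usage sketch: smallest zero of cos(x)*cosh(x) + 1 from [1.5, 3]
[x, it] = secant(1.5, 3, @(x) cos(x).*cosh(x) + 1, 1e-10, 50)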
Inverse Interpolation

Given a data set

(xi, yi = f(xi)),   i = 0, 1, ..., n.

In inverse interpolation we want to find a position x̄ such that, for
a given ȳ, f(x̄) = ȳ.
If the given function f is monotone in the interval, then for each y
there is only one x for which f (x) = y . In this situation, it makes
sense to interpolate the points (yi , xi = f −1 (yi )).

Here: we are looking for x ∗ such that

f(x∗) = 0  ⟺  x∗ = f⁻¹(0).
Inverse Linear Interpolation

The secant method can be derived via linear interpolation:

The function that linearly interpolates (yk−1, f⁻¹(yk−1)) and
(yk, f⁻¹(yk)) is

p(y) = f⁻¹(yk) (y − yk−1)/(yk − yk−1) − f⁻¹(yk−1) (y − yk)/(yk − yk−1).

The value of this function at y = 0 gives the approximation xk+1:

xk+1 = (−xk yk−1 + xk−1 yk)/(yk − yk−1)                         (0.4)
     = xk − yk (xk − xk−1)/(yk − yk−1),   yk ≡ f(xk),           (0.5)

i.e. exactly the secant step.
Inverse quadratic interpolation

xk+1 = xk · fk−2 fk−1 / [(fk − fk−2)(fk − fk−1)]
     + xk−1 · fk−2 fk / [(fk−1 − fk−2)(fk−1 − fk)]
     + xk−2 · fk−1 fk / [(fk−2 − fk−1)(fk−2 − fk)],   fi ≡ f(xi).

This is the degree-2 polynomial that interpolates the points (fi, xi),
i = k − 2, k − 1, k, evaluated at y = 0.

For a complete implementation see Moler, Numerical Computing with
MATLAB, SIAM, 2004.

Convergence rate: 1.839. No derivatives needed!!
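
Moler's code is not reproduced here; the following is a minimal sketch of a single IQI step, a direct transcription of the formula above (x and fx hold the last three iterates and their f-values):

function xnew = iqi_step(x, fx)
% IQI_STEP  one inverse-quadratic-interpolation step;
% x = [x_k-2, x_k-1, x_k], fx = f(x); interpolates (f_i, x_i) at y = 0
xnew = x(3)*fx(1)*fx(2)/((fx(3)-fx(1))*(fx(3)-fx(2))) ...
     + x(2)*fx(1)*fx(3)/((fx(2)-fx(1))*(fx(2)-fx(3))) ...
     + x(1)*fx(2)*fx(3)/((fx(1)-fx(2))*(fx(1)-fx(3)));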
Example for IQI

Find the zero of f(x) = x eˣ − 1 = 0;   x(0) = 0, x(1) = 2.5, x(2) = 5.

k    xk                  f(xk)                ek := xk − x∗          log(|ek+1|/|ek|) : log(|ek|/|ek−1|)
3    0.08520390058175   -0.90721814294134    -0.48193938982803
4    0.16009252622586   -0.81211229637354    -0.40705076418392      3.33791154378839
5    0.79879381816390    0.77560534067946     0.23165052775411      2.28740488912208
6    0.63094636752843    0.18579323999999     0.06380307711864      1.82494667289715
7    0.56107750991028   -0.01667806436181    -0.00606578049951      1.87323264214217
8    0.56706941033107   -0.00020413476766    -0.00007388007872      1.79832936980454
9    0.56714331707092    0.00000007367067     0.00000002666114      1.84841261527097
10   0.56714329040980    0.00000000000003     0.00000000000001

Figure 8: Errors
Zeroin (MATLAB’s fzero)

Combine the reliability of bisection with the convergence speed of
the secant method and inverse quadratic interpolation (IQI). Requires
only function evaluations.
Outline:
▶ Start with a and b s.t. f(a)f(b) < 0.
▶ Use a secant step to get c between a and b.
▶ Repeat the following steps until |b − a| < ε|b| (ε a given
  tolerance) or f(b) = 0.
▶ Arrange a, b, and c so that
  ▶ f(a)f(b) < 0,
  ▶ |f(b)| ≤ |f(a)|,
  ▶ c is the previous value of b.
▶ If c ̸= a, consider an IQI step.
▶ If c = a, consider a secant step.
Zeroin (MATLAB’s fzero) (cont.)

▶ If the IQI or secant step lands in the interval [a, b], take it.
▶ If the step is not in the interval, use bisection.

This algorithm is foolproof: it never loses track of the zero trapped
in a shrinking interval.
It uses rapidly convergent methods when they are reliable.
It uses a slow, but sure, method when it is necessary.
It only uses function values, no derivatives.
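
In MATLAB the algorithm is available as fzero; a usage sketch on the earlier example:

% usage sketch: first root of 2*cosh(x/4) - x; f changes sign on [0, 4]
x1 = fzero(@(x) 2*cosh(x/4) - x, [0, 4])     % -> 2.35755106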
Computing multiple zeros

If we have found a zero z of f(x) = 0 and want to compute another
one, we want to avoid recomputing the already found z.
We can explicitly deflate the zero by defining a new function

f1(x) := f(x)/(x − z),

and apply the method of choice to f1. This can in particular be
done with polynomials (dividing out the linear factor), but it is
error prone if z is not accurate.
We can proceed similarly for multiple zeros z1, ..., zm.
Computing multiple zeros (cont.)

For the reciprocal Newton correction for f1 we get

f1′(x)/f1(x) = [f′(x)/(x − z) − f(x)/(x − z)²] / [f(x)/(x − z)]
             = f′(x)/f(x) − 1/(x − z).

Then the Newton correction becomes

xk+1 = xk − 1 / ( f′(xk)/f(xk) − 1/(xk − z) ),

and similarly for multiple zeros z1, ..., zm.

The above procedure is called implicit deflation: f is not modified.
In this way errors in z are not propagated to f1.
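
A sketch of the resulting iteration (a hypothetical newton_deflated; z is a vector of already computed zeros, possibly empty):

function [x,it] = newton_deflated(f, df, x, z)
% NEWTON_DEFLATED  Newton with implicit deflation of the zeros in z
dx = inf; it = 0;
while abs(dx) > 1e-10
    dx = 1/(df(x)/f(x) - sum(1./(x - z)));   % deflated Newton correction
    x = x - dx; it = it + 1;
end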
Comparison of methods

Comparison of some methods for computing the smallest zero of

f(x) = cos(x) cosh(x) + 1 = 0

method           start          steps   function evals
bisection        [0, 3]         32      34
secant method    [1.5, 3]       8       9
secant method    [0, 3]         15      16
Newton           x(0) = 1.5     5       11
Brent            [0, 1.5, 3]    6       9

Notes: (1) Brent's method is similar to MATLAB's fzero.
(2) These numbers depend on the function f!
Minimizing a function of one variable

▶ A major source of applications giving rise to root finding is
  optimization.
▶ One-variable version: find an argument x = x̂ that minimizes
  a given objective function ϕ(x).
▶ Example from earlier: find the minimum of the function

      ϕ(x) = 10 cosh(x/4) − x

  over the real line.
▶ Note: maximizing ψ(x) ⟺ minimizing ϕ(x) = −ψ(x).
Conditions for minimum point

Assume that ϕ ∈ C 2 [a, b]. Denote

f (x) = ϕ′ (x).

An argument x ∗ satisfying a < x ∗ < b is called a critical point if

f (x ∗ ) = 0.

For a parameter h small enough that x∗ + h ∈ [a, b] we can expand
in a Taylor series; at a critical point ϕ′(x∗) = 0, so

ϕ(x∗ + h) = ϕ(x∗) + h ϕ′(x∗) + (h²/2) ϕ″(x∗) + · · ·
          = ϕ(x∗) + (h²/2) [ϕ″(x∗) + O(h)].
Conditions for a minimum point (cont.)

Since |h| can be taken arbitrarily small, it is now clear that at a
critical point:
▶ If ϕ″(x∗) > 0, then x̂ = x∗ is a local minimizer of ϕ(x). This
  means that ϕ attains a minimum at x̂ = x∗ in some
  neighborhood that includes x∗.
▶ If ϕ″(x∗) < 0, then x̂ = x∗ is a local maximizer of ϕ(x). This
  means that ϕ attains a maximum at x̂ = x∗ in some
  neighborhood that includes x∗.
▶ If ϕ″(x∗) = 0, then further investigation at x∗ is required.
Computation of minima of functions

If ϕ(x) attains a minimum (or maximum) at a point x̂, then this
point must be critical, i.e. f(x̂) = 0.
We can apply any of the zero finders to find x̂ with f(x̂) = 0.
Example. For the function ϕ(x) = 10 cosh(x/4) − x we have

ϕ′(x) = f(x) = (10/4) sinh(x/4) − 1,    ϕ″(x) = f′(x) = (10/16) cosh(x/4).

Note that for quadratic convergence of Newton's method, ϕ(x)
must have three continuous derivatives.
Note: The problem of finding all minima of a given function ϕ(x) can be
solved by finding all the critical points and then checking for each whether
it is a minimum by examining the sign of the second derivative of ϕ.
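
A usage sketch with the newton routine from the beginning of this lecture:

% sketch: minimize phi(x) = 10*cosh(x/4) - x via phi'(x) = 0
f  = @(x) (10/4)*sinh(x/4) - 1;      % phi'
df = @(x) (10/16)*cosh(x/4);         % phi''  (> 0 everywhere, so phi is convex)
[xhat, it] = newton(f, df, 1);       % xhat ~ 1.56: the global minimizer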
