
Lecture 6 & 7

Newton’s Method and Modifications

Course Website
https://sites.google.com/view/kporwal/teaching/mtl107
Newton iteration

▶ Let the function f, for which we seek a zero x∗, be differentiable.
▶ In practice, the derivative f′ should be cheap to compute
  (comparable in cost to evaluating f itself).
▶ In Newton's method the function f is linearized at some
  approximate value xk ≈ x∗.
▶ Define the function t(x) that has at x = xk the same function value
  and derivative as f (the Taylor polynomial of degree one):

      t(x) = f(xk) + f′(xk)(x − xk).

▶ The single zero of t(x) = 0 yields xk+1.


Newton iteration (cont.)

▶ The single zero of t(x) = 0 yields xk+1:

      xk+1 = xk − f(xk)/f′(xk),   k = 0, 1, ....

▶ Newton's method is a fixed point iteration with iteration function

      g(x) = x − f(x)/f′(x).

  Clearly, g(x∗) = x∗.
▶ Since we have neglected the second- and higher-order terms of the
  Taylor expansion of f at x∗, we can expect t(x) to be a very
  good approximation of f(x) if xk ≈ x∗.
Newton's method: geometric interpretation

Figure 7: Geometric interpretation (xk+1 is the zero of the tangent to f at xk)

Algorithm: Newton's iteration

Given a scalar differentiable function f (x).


▶ Start from an initial guess x0 .
▶ For k = 0, 1, 2, ... set

      xk+1 = xk − f(xk)/f′(xk)

  until xk+1 satisfies some termination criterion.
Newton’s method code:

function [x,it] = newton(f,df,x)
% NEWTON  Newton iteration for scalar equation f(x)=0.
%    x = newton(f,df,x0)
dx = f(x)/df(x); it = 0;
while abs(dx) > 1e-10
    x = x - dx;
    dx = f(x)/df(x);
    it = it + 1;
end
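
A minimal usage sketch (assuming anonymous function handles), computing √2 as in the example that follows:

% usage sketch: compute sqrt(2) as the zero of x^2 - 2, starting at x0 = 2
f  = @(x) x.^2 - 2;             % function
df = @(x) 2*x;                  % its derivative
[x, it] = newton(f, df, 2)      % x ~ 1.41421356..., it = 4 corrections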
Example (Convergence of Newton’s iteration)


Iteration for computing √a, a > 0:
▶ f(x) = x² − a,  f′(x) = 2x.
▶ g(x) = x − (x² − a)/(2x) = (x² + a)/(2x) = (x + a/x)/2
▶ g′(x) = 1 − (x² + a)/(2x²) = 1/2 − a/(2x²),   g′(√a) = 0,
▶ g″(x) = a/x³,   g″(√a) = 1/√a.

      xk+1 = (xk + a/xk)/2   =⇒   |xk+1 − √a| = |xk − √a|² / (2|xk|)
Example (Convergence of Newton iteration (cont.))
Numerical experiment: iterates for a = 2:

k   xk                     ek := xk − √2          log(|ek|/|ek−1|) : log(|ek−1|/|ek−2|)
0   2.00000000000000000    0.58578643762690485
1   1.50000000000000000    0.08578643762690485
2   1.41666666666666652    0.00245310429357137    1.850
3   1.41421568627450966    0.00000212390141452    1.984
4   1.41421356237468987    0.00000000000159472    2.000
5   1.41421356237309492    0.00000000000000022    0.630

The number of significant digits in xk essentially doubles at each
iteration. When roundoff level is reached, no meaningful
improvement can be obtained any further. The improvement from
the 4th to the 5th iteration (in this example) is minimal.
Example (Choice of initial guess)

Consider f(x) = 2 cosh(x/4) − x. f has 2 roots:

      x1∗ ≈ 2.35755106,   x2∗ ≈ 8.50719958.

The Newton iteration here is

      xk+1 = xk − (2 cosh(xk/4) − xk) / (0.5 sinh(xk/4) − 1).

Iterate until |f(xk+1)| < 10⁻⁸.

▶ Starting from x0 = 2 requires 4 iterations to reach x1∗ to within
  the given tolerance.
▶ Starting from x0 = 4 requires 5 iterations to reach x1∗.
▶ Starting from x0 = 8 requires 5 iterations to reach x2∗.
▶ Starting from x0 = 10 requires 6 iterations to reach x2∗.
The values of f(xk), starting from x0 = 8, are:

k       0        1        2        3        4         5
f(xk)   4.76e-1  8.43e-2  1.56e-3  5.65e-7  7.28e-14  1.78e-15

The number of significant digits essentially doubles at each iteration.
When roundoff level is reached, no meaningful improvement can be
obtained by heaping on more floating point operations, and the
improvement from the 4th to the 5th iteration in this example is
marginal.
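
The experiment can be reproduced with a few lines of MATLAB (a sketch; the stopping test is the one stated above):

% sketch: Newton for f(x) = 2*cosh(x/4) - x from different starting points
f  = @(x) 2*cosh(x/4) - x;
df = @(x) 0.5*sinh(x/4) - 1;
x = 8; k = 0;                        % try x0 = 2, 4, 8, 10
while abs(f(x)) >= 1e-8
    x = x - f(x)/df(x); k = k + 1;   % x0 = 2, 4 -> x1*;  x0 = 8, 10 -> x2*
end
[x, k]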
Order of convergence

The method is said to be


▶ linearly convergent if there is a constant ρ < 1 such that

|xk+1 − x ∗ | ≤ ρ|xk − x ∗ |, for k sufficiently large;

▶ quadratically convergent if there is a constant M such that

|xk+1 − x ∗ | ≤ M|xk − x ∗ |2 , for k sufficiently large;

▶ superlinearly convergent if there is a sequence of constants
  ρk → 0 such that

|xk+1 − x ∗ | ≤ ρk |xk − x ∗ |, for k sufficiently large;

The quadratic case is superlinear with ρk = M|xk − x∗| → 0.
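
The order can be estimated numerically from consecutive errors; a small sketch using the iterates of the √a example above:

% sketch: for a method of order p, log(|e_k|/|e_k-1|) / log(|e_k-1|/|e_k-2|) -> p
xs = [2; 1.5; 1.41666666666666652; 1.41421568627450966; 1.41421356237468987];
e  = abs(xs - sqrt(2));              % errors e_0, ..., e_4
r  = log(e(2:end) ./ e(1:end-1));    % log of successive error reductions
p  = r(2:end) ./ r(1:end-1)          % -> 1.850, 1.984, 2.000 (cf. table above)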


Convergence of Newton’s method
Theorem (Convergence of Newton's iteration)
If f ∈ C 2 [a, b] has a root x ∗ in [a, b] and f ′ (x ∗ ) ̸= 0, then there
exists δ > 0 such that, for any x0 in [x ∗ − δ, x ∗ + δ], Newton’s
method converges quadratically.
Proof:
(i) Since the iteration function g is continuously differentiable
with g′(x∗) = 0, there is some neighborhood [x∗ − δ, x∗ + δ] of x∗
in which |g′| < 1, so the fixed point iteration converges there.
(ii) For the convergence order use the Taylor expansion of g at x∗:

x∗ − xk+1 = g(x∗) − g(xk)
          = g(x∗) − [g(x∗) + g′(x∗)(xk − x∗) + ½ g″(ξ)(xk − x∗)²]
          = −½ g″(ξ)(xk − x∗)²,

since g′(x∗) = 0. Hence |xk+1 − x∗| ≤ M |xk − x∗|² with M = max |g″|/2.

Another example

The equation f(x) = x³ − 3x + 2 = (x + 2)(x − 1)² has two zeros:
−2 (simple) and 1 (double). The Newton iteration is

      xk+1 = g(xk) = 2xk/3 + 2/(3(xk + 1)),

      g′(x) = 2/3 − 2/(3(x + 1)²).

Since g′(−2) = 0, convergence to the simple zero −2 is quadratic;
but g′(1) = 1/2, so convergence to the double zero 1 is only linear,
the error roughly halving in each step (see the sketch below).
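
A small sketch illustrating the linear convergence to the double zero (starting at x0 = 2, in the basin of the zero 1):

% sketch: iterate g and watch the error to the double zero 1
g = @(x) 2*x/3 + 2./(3*(x + 1));
x = 2; e = [];
for k = 1:10
    x = g(x); e(k) = abs(x - 1);
end
e(2:end) ./ e(1:end-1)     % ratios tend to g'(1) = 1/2: linear convergence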
Newton iteration for multiple roots
Let x∗ be a multiple root of f: f(x) = (x − x∗)^m q(x) with q
differentiable and q(x∗) ̸= 0. (We write q for the smooth factor to
avoid a clash with the iteration function g.) Then,

g(x) = x − f(x)/f′(x)
     = x − (x − x∗)^m q(x) / [m(x − x∗)^(m−1) q(x) + (x − x∗)^m q′(x)]
     = x − (x − x∗) q(x) / [m q(x) + (x − x∗) q′(x)]

g′(x) = 1 − [q(x) + (x − x∗)q′(x)] / [m q(x) + (x − x∗)q′(x)]
        + (x − x∗)q(x) [(m + 1)q′(x) + (x − x∗)q″(x)] / [m q(x) + (x − x∗)q′(x)]²

g′(x∗) = 1 − 1/m.                                            (0.3)
EXERCISE
Newton iteration for multiple roots (cont.)

Therefore, Newton's iteration converges only linearly to multiple
roots, with convergence factor 1 − 1/m. For large m the convergence
is very slow.

For a double root (m = 2) the Lipschitz constant is L = 1/2.

Remedy: We have to extend the step length in accordance with
the multiplicity of the zero of f(x), i.e. take the step
xk+1 = xk − m f(xk)/f′(xk), which restores quadratic convergence
(see the sketch below).

Note: Often we do not know the multiplicity of a root.

Remark: One may try to apply Newton to the function u(x) = f(x)/f′(x),
which has only simple roots.
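
A minimal sketch of the remedy (a hypothetical newton_mult; assumes the multiplicity m is known in advance):

function [x,it] = newton_mult(f, df, x, m)
% NEWTON_MULT  Newton step scaled by the multiplicity m of the root.
% For a root of multiplicity m this restores quadratic convergence.
dx = m*f(x)/df(x); it = 0;
while abs(dx) > 1e-10
    x = x - dx;
    dx = m*f(x)/df(x);
    it = it + 1;
end

Applied to f(x) = x³ − 3x + 2 with m = 2 and x0 = 2, it converges quadratically to the double zero 1.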
Simplified Newton iteration

xk+1 = xk − f(xk)/f′(x0),   k = 0, 1, ....

Linear convergence, with convergence factor

K := |g′(x∗)| = |1 − f′(x∗)/f′(x0)|.

The simplified Newton iteration can be very effective if x0 is a good
approximation of x∗. Then

1 − f′(x∗)/f′(x0) ≈ 1 − f′(x∗)/f′(x∗) = 0,

so that the convergence factor K is small.
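
A sketch of the corresponding routine (the derivative is evaluated once, at x0; cf. the newton routine above):

function [x,it] = newton_simple(f, df, x)
% NEWTON_SIMPLE  simplified Newton: fixed slope f'(x0)
d0 = df(x);                 % evaluate the derivative only once
dx = f(x)/d0; it = 0;
while abs(dx) > 1e-10
    x = x - dx;
    dx = f(x)/d0;
    it = it + 1;
end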


Damped Newton
To avoid overshooting one can damp (shorten) the Newton step

xk+1 = xk − λk f (xk )/f ′ (xk ), k = 0, 1, ...

λk is chosen such that |f (xk+1 )| < |f (xk )|.

Δx = f(x)/f′(x);
while |f(x − λΔx)| > |f(x)|
    λ = λ/2;
end

Close to convergence we should let λk → 1 to have the full step


length and quadratic convergence. Before each iteration step:
λ = min(1, 2λ)
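
Putting the pieces together, a minimal sketch of a damped Newton routine (no safeguard against a stagnating inner loop, so not production quality):

function [x,it] = newton_damped(f, df, x)
% NEWTON_DAMPED  Newton iteration with step-length damping
lambda = 1; it = 0;
dx = f(x)/df(x);
while abs(dx) > 1e-10
    lambda = min(1, 2*lambda);                 % try to return to full steps
    while abs(f(x - lambda*dx)) > abs(f(x))
        lambda = lambda/2;                     % shorten until |f| decreases
    end
    x = x - lambda*dx;
    dx = f(x)/df(x); it = it + 1;
end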
Secant method

xk+1 = xk − f(xk) (xk − xk−1) / (f(xk) − f(xk−1)).      (secant step)

Notice that the secant method is obtained by approximating the
derivative in Newton's method by a finite difference,

f′(xk) ≈ (f(xk) − f(xk−1)) / (xk − xk−1).

The secant method is not a fixed point method but a multi-point
method. It can be interpreted as follows: the next iterate xk+1 is the
zero of the degree-1 polynomial that interpolates f at xk and xk−1.
Convergence rate: (1 + √5)/2 ≈ 1.618, i.e. superlinear! No derivative needed.
Secant method (cont.)

function [x,i] = secant(x0,x1,f,tol,maxit)
% SECANT  secant method for scalar equation f(x)=0
f0 = f(x0);
for i = 1:maxit
    f1 = f(x1);
    s = f1*(x1-x0)/(f1-f0);     % secant correction
    x0 = x1; x1 = x1 - s;
    if abs(s) < tol, x = x1; return; end
    f0 = f1;
end
x = NaN;                        % no convergence within maxit steps
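
A usage sketch (the tolerance 1e-10 here is an arbitrary choice; cf. the comparison table later in the lecture):

% usage sketch: smallest zero of cos(x)*cosh(x) + 1 from [1.5, 3]
[x, it] = secant(1.5, 3, @(x) cos(x).*cosh(x) + 1, 1e-10, 50)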
Inverse Interpolation

Given a data set

(xi, yi = f(xi)),   i = 0, 1, ..., n.

In inverse interpolation we want to find a position x̄ such that, for
a given ȳ, f(x̄) = ȳ.
If the given function f is monotone in the interval, then for each y
there is only one x for which f (x) = y . In this situation, it makes
sense to interpolate the points (yi , xi = f −1 (yi )).

Here: we are looking for x ∗ such that

f(x∗) = 0  ⟺  x∗ = f⁻¹(0).
Inverse Linear Interpolation

The secant method can be derived via linear interpolation:

The function that linearly interpolates (yk−1, f⁻¹(yk−1)) and
(yk, f⁻¹(yk)) is

p(y) = f⁻¹(yk) (y − yk−1)/(yk − yk−1) − f⁻¹(yk−1) (y − yk)/(yk − yk−1).

The value of this function at y = 0 gives the approximation xk+1:

xk+1 = (−xk yk−1 + xk−1 yk)/(yk − yk−1)                         (0.4)
     = xk − yk (xk − xk−1)/(yk − yk−1),   yk ≡ f(xk),           (0.5)

i.e. exactly the secant step.
Inverse quadratic interpolation

xk+1 = xk · fk−2 fk−1 / [(fk − fk−2)(fk − fk−1)]
     + xk−1 · fk−2 fk / [(fk−1 − fk−2)(fk−1 − fk)]
     + xk−2 · fk−1 fk / [(fk−2 − fk−1)(fk−2 − fk)],   fi ≡ f(xi).

This is the degree-2 polynomial that interpolates the points (fi, xi),
i = k − 2, k − 1, k, evaluated at y = 0.

For a complete implementation see Moler, Numerical Computing with
MATLAB, SIAM, 2004.

Convergence rate: 1.839. No derivatives needed!!
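
Moler's code is not reproduced here; the following is a minimal sketch of a single IQI step, a direct transcription of the formula above (x and fx hold the last three iterates and their f-values):

function xnew = iqi_step(x, fx)
% IQI_STEP  one inverse-quadratic-interpolation step;
% x = [x_k-2, x_k-1, x_k], fx = f(x); interpolates (f_i, x_i) at y = 0
xnew = x(3)*fx(1)*fx(2)/((fx(3)-fx(1))*(fx(3)-fx(2))) ...
     + x(2)*fx(1)*fx(3)/((fx(2)-fx(1))*(fx(2)-fx(3))) ...
     + x(1)*fx(2)*fx(3)/((fx(1)-fx(2))*(fx(1)-fx(3)));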
Example for IQI

Find the zero of f(x) = x eˣ − 1 = 0;   x(0) = 0, x(1) = 2.5, x(2) = 5.

k    xk                  f(xk)                ek := xk − x∗          log(|ek+1|/|ek|) : log(|ek|/|ek−1|)
3    0.08520390058175   -0.90721814294134    -0.48193938982803
4    0.16009252622586   -0.81211229637354    -0.40705076418392      3.33791154378839
5    0.79879381816390    0.77560534067946     0.23165052775411      2.28740488912208
6    0.63094636752843    0.18579323999999     0.06380307711864      1.82494667289715
7    0.56107750991028   -0.01667806436181    -0.00606578049951      1.87323264214217
8    0.56706941033107   -0.00020413476766    -0.00007388007872      1.79832936980454
9    0.56714331707092    0.00000007367067     0.00000002666114      1.84841261527097
10   0.56714329040980    0.00000000000003     0.00000000000001

Figure 8: Errors
Zeroin (MATLAB’s fzero)

Combine the reliability of bisection with the convergence speed of
the secant method and inverse quadratic interpolation (IQI). Requires
only function evaluations.
Outline:
▶ Start with a and b s.t. f(a)f(b) < 0.
▶ Use a secant step to get c between a and b.
▶ Repeat the following steps until |b − a| < ε|b| (ε a given
  tolerance) or f(b) = 0.
▶ Arrange a, b, and c so that
  ▶ f(a)f(b) < 0,
  ▶ |f(b)| ≤ |f(a)|,
  ▶ c is the previous value of b.
▶ If c ̸= a, consider an IQI step.
▶ If c = a, consider a secant step.
Zeroin (MATLAB’s fzero) (cont.)

▶ If the IQI or secant step lands in the interval [a, b], take it.
▶ If the step is not in the interval, use bisection.

This algorithm is foolproof: it never loses track of the zero trapped
in a shrinking interval.
It uses rapidly convergent methods when they are reliable.
It uses a slow, but sure, method when it is necessary.
It only uses function values, no derivatives.
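
In MATLAB the algorithm is available as fzero; a usage sketch on the earlier example:

% usage sketch: first root of 2*cosh(x/4) - x; f changes sign on [0, 4]
x1 = fzero(@(x) 2*cosh(x/4) - x, [0, 4])     % -> 2.35755106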
Computing multiple zeros

If we have found a zero z of f(x) = 0 and want to compute another
one, we want to avoid recomputing the already found z.
We can explicitly deflate the zero by defining a new function

f1(x) := f(x)/(x − z),

and apply the method of choice to f1. This can in particular be
done with polynomials (dividing out the linear factor), but it is
error prone if z is not accurate.
We can proceed similarly for multiple zeros z1, ..., zm.
Computing multiple zeros (cont.)

For the reciprocal Newton correction for f1 we get

f1′(x)/f1(x) = [f′(x)/(x − z) − f(x)/(x − z)²] / [f(x)/(x − z)]
             = f′(x)/f(x) − 1/(x − z).

Then the Newton correction becomes

xk+1 = xk − 1 / ( f′(xk)/f(xk) − 1/(xk − z) ),

and similarly for multiple zeros z1, ..., zm.

The above procedure is called implicit deflation: f is not modified.
In this way errors in z are not propagated to f1.
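
A sketch of the resulting iteration (a hypothetical newton_deflated; z is a vector of already computed zeros, possibly empty):

function [x,it] = newton_deflated(f, df, x, z)
% NEWTON_DEFLATED  Newton with implicit deflation of the zeros in z
dx = inf; it = 0;
while abs(dx) > 1e-10
    dx = 1/(df(x)/f(x) - sum(1./(x - z)));   % deflated Newton correction
    x = x - dx; it = it + 1;
end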
Comparison of methods

Comparison of some methods for computing the smallest zero of

f(x) = cos(x) cosh(x) + 1 = 0

method           start          steps   function evals
bisection        [0, 3]         32      34
secant method    [1.5, 3]       8       9
secant method    [0, 3]         15      16
Newton           x(0) = 1.5     5       11
Brent            [0, 1.5, 3]    6       9

Notes: (1) Brent's method is similar to MATLAB's fzero.
(2) These numbers depend on the function f!
Minimizing a function of one variable

▶ A major source of applications giving rise to root finding is
  optimization.
▶ One-variable version: find an argument x = x̂ that minimizes
  a given objective function ϕ(x).
▶ Example from earlier: find the minimum of the function

      ϕ(x) = 10 cosh(x/4) − x

  over the real line.
▶ Note: maximizing ψ(x) ⟺ minimizing ϕ(x) = −ψ(x).
Conditions for minimum point

Assume that ϕ ∈ C 2 [a, b]. Denote

f (x) = ϕ′ (x).

An argument x ∗ satisfying a < x ∗ < b is called a critical point if

f (x ∗ ) = 0.

For a parameter h small enough that x∗ + h ∈ [a, b] we can expand
in a Taylor series; at a critical point ϕ′(x∗) = 0, so

ϕ(x∗ + h) = ϕ(x∗) + h ϕ′(x∗) + (h²/2) ϕ″(x∗) + · · ·
          = ϕ(x∗) + (h²/2) [ϕ″(x∗) + O(h)].
Conditions for a minimum point (cont.)

Since |h| can be taken arbitrarily small, it is now clear that at a
critical point:
▶ If ϕ″(x∗) > 0, then x̂ = x∗ is a local minimizer of ϕ(x). This
  means that ϕ attains a minimum at x̂ = x∗ in some
  neighborhood that includes x∗.
▶ If ϕ″(x∗) < 0, then x̂ = x∗ is a local maximizer of ϕ(x). This
  means that ϕ attains a maximum at x̂ = x∗ in some
  neighborhood that includes x∗.
▶ If ϕ″(x∗) = 0, then further investigation at x∗ is required.
Computation of minima of functions

If ϕ(x) attains a minimum (or maximum) at a point x̂, then this
point must be critical, i.e. f(x̂) = 0.
We can apply any of the zero finders to find x̂ with f(x̂) = 0.
Example. For the function ϕ(x) = 10 cosh(x/4) − x we have

ϕ′(x) = f(x) = (10/4) sinh(x/4) − 1,    ϕ″(x) = f′(x) = (10/16) cosh(x/4).

Note that for quadratic convergence of Newton's method, ϕ(x)
must have three continuous derivatives.
Note: The problem of finding all minima of a given function ϕ(x) can be
solved by finding all the critical points and then checking for each whether
it is a minimum by examining the sign of the second derivative of ϕ.
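
A usage sketch with the newton routine from the beginning of this lecture:

% sketch: minimize phi(x) = 10*cosh(x/4) - x via phi'(x) = 0
f  = @(x) (10/4)*sinh(x/4) - 1;      % phi'
df = @(x) (10/16)*cosh(x/4);         % phi''  (> 0 everywhere, so phi is convex)
[xhat, it] = newton(f, df, 1);       % xhat ~ 1.56: the global minimizer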
