
Numerical solution of non-linear equations

October 14, 2020

Anne Kværnø (modified by André Massing and Markus Grasmair)


Date: May 27, 2020
Revision: Oct 11, 2020

1 Introduction
We know that the quadratic equation

$$a x^2 + b x + c = 0$$

has the roots

$$r_{\pm} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
More generally, for a given function f , a number r satisfying f (r ) = 0 is called a root (or solution)
to the equation
$$f(x) = 0.$$
In many applications, we encounter such equations for which we do not have a simple solution
formula as for quadratic functions. In fact, an analytical solution formula might not even exist!
Thus the goal of the chapter is to develop some numerical techniques for solving nonlinear scalar
equations (one equation, one unknown), such as, for example

$$x^3 + x^2 - 3x = 3,$$

or systems of equations, such as, for example

$$x e^y = 1,$$
$$-x^2 + y = 1.$$

NB!

• We refer to section 3.1 in Preliminaries for some general comments on convergence.


• There are no examples of numerical calculations done by hand in this note. If you would
like some, you can easily generate them yourself. Take one of the computer exercises, do
your calculations by hand, if needed modify the code so that you get the output you want,
run the code, and compare with the results you got by your pencil and paper calculation.

2 Scalar equations
In this section we discuss the solution of scalar equations. The techniques we will use are known
from previous courses. When they are repeated here, it is because the techniques used to develop
and analyse these methods can, at least in principle, be extended to systems of equations. We will
also emphasise the error analysis of the methods.
A scalar equation is given by
$$f(x) = 0, \qquad (1)$$
where f is a continuous function defined on some interval [ a, b]. A solution r of the equation is
called a zero or a root of f . A nonlinear equation may have one, more than one, or no roots.

Example 1: Consider the function $f(x) = x^3 + x^2 - 3x - 3$ on the interval $[-2, 2]$. Plot the function on the interval and locate possible roots.

In [5]: %matplotlib inline

import numpy as np
from numpy import pi
from numpy.linalg import solve, norm # Solve linear systems and compute norms
import matplotlib.pyplot as plt

newparams = {'figure.figsize': (8.0, 4.0), 'axes.grid': True,
             'lines.markersize': 8, 'lines.linewidth': 2,
             'font.size': 14}
plt.rcParams.update(newparams)

In [6]: def f(x):                       # Define the function
    return x**3 + x**2 - 3*x - 3

# Plot the function on an interval
x = np.linspace(-2, 2, 101)             # x-values (evaluation points)
plt.plot(x, f(x))                       # Plot the function
plt.plot(x, 0*x, 'r')                   # Plot the x-axis
plt.xlabel('x')
plt.grid(True)
plt.ylabel('f(x)');

According to the plot, the function f has three real roots in the interval $[-2, 2]$. The function can be rewritten as

$$f(x) = x^3 + x^2 - 3x - 3 = (x + 1)(x^2 - 3).$$

Thus the roots of f are $-1$ and $\pm\sqrt{3}$. We will use this example as a test problem later on.
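As a quick cross-check (an aside of ours, not part of the original notes), numpy's polynomial root finder, using the imports from the cell above, returns the same three roots:

# Roots of x^3 + x^2 - 3x - 3 (coefficients in decreasing order of degree)
print(np.roots([1, 1, -3, -3]))     # approximately 1.732, -1.732 and -1 (in some order)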

2.1 Existence and uniqueness of solutions


The following theorem is well known:

Theorem 1: Existence and uniqueness of a solution.

• If $f \in C[a, b]$ with $f(a)$ and $f(b)$ of opposite sign, then there exists at least one $r \in (a, b)$ such that $f(r) = 0$.

• The solution is unique if $f \in C^1[a, b]$ and $f'(x) > 0$ or $f'(x) < 0$ for all $x \in (a, b)$.

The first condition guarantees that the graph of f crosses the x-axis at some point r; the second guarantees that the function is either strictly increasing or strictly decreasing.
We note that the condition that f ( a) and f (b) are of opposite sign can be summarised in the
condition that f ( a) · f (b) < 0.
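For Example 1 on $[-2, 2]$, for instance, a small check of our own (not from the original notes) shows that the existence condition holds while the uniqueness condition fails, consistent with the three roots seen in the plot:

def f(x):
    return x**3 + x**2 - 3*x - 3

a, b = -2, 2
print(f(a)*f(b) < 0)        # True: at least one root in (-2, 2)
# f'(x) = 3x^2 + 2x - 3 changes sign on (-2, 2), so f is not strictly
# monotone there and the uniqueness condition is not satisfied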

2.2 Bisection method


The first part of the theorem can be used to construct a first, quite intuitive algorithm for solving scalar equations. Given an interval, check whether f has a root in the interval (by comparing the signs of the function values at the end-points), divide it in two, check in which of the two halves the root lies, and continue until a root is located with sufficient accuracy.

Bisection method.

• Given the function f and the interval I0 := [ a, b], such that f ( a) · f (b) < 0.

• Set a0 = a, b0 = b.

• For k = 0, 1, 2, 3, . . .

• $c_k = \dfrac{a_k + b_k}{2}$

• $I_{k+1} := [a_{k+1}, b_{k+1}] = \begin{cases} [a_k, c_k] & \text{if } f(a_k) \cdot f(c_k) \le 0, \\ [c_k, b_k] & \text{if } f(b_k) \cdot f(c_k) \le 0. \end{cases}$

Use $c_k$ as the approximation to the root. Since $c_k$ is the midpoint of the interval $[a_k, b_k]$, the error satisfies $|c_k - r| \le (b_k - a_k)/2$. The loop is terminated when $(b_k - a_k)/2$ is smaller than some user specified tolerance. Since the length of the interval $I_k$ satisfies $(b_k - a_k) = 2^{-k}(b - a)$, we also have an a priori estimate of the error after k bisections,

$$|c_k - r| \le \frac{1}{2}(b_k - a_k) \le \frac{1}{2^{k+1}}(b - a).$$

Consequently, we can estimate how many bisections we have to perform to compute a root up to a given tolerance tol by simply requiring that

$$\frac{1}{2^{k+1}}(b - a) \le \mathrm{tol} \quad \Leftrightarrow \quad \frac{b - a}{2\,\mathrm{tol}} \le 2^k \quad \Leftrightarrow \quad \frac{\log\frac{b - a}{2\,\mathrm{tol}}}{\log 2} \le k.$$

We will of course also terminate the loop if $f(c_k)$ is very close to 0.
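For instance, the a priori estimate can be used to predict the number of bisections before the loop is run. The following is a small sketch of our own (not one of the original notebook cells), using the numpy import from above; for the interval $[1.5, 2]$ and tol $= 10^{-6}$ it gives k = 18, consistent with the run in Example 2 below terminating at k = 18.

def num_bisections(a, b, tol):
    # Smallest integer k such that (b - a)/2**(k+1) <= tol
    return int(np.ceil(np.log2((b - a)/tol) - 1))

print(num_bisections(1.5, 2, 1.e-6))    # 18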

2.2.1 Implementation
The algorithm is implemented in the function bisection().

In [7]: def bisection(f, a, b, tol=1.e-6, max_iter=100):
    # Solve the scalar equation f(x)=0 by bisection
    # The result of each iteration is printed
    # Input:
    # f: The function
    # a, b: The interval
    # tol: Tolerance
    # max_iter: Maximum number of iterations
    # Output:
    # The root and the number of iterations

    fa = f(a)
    fb = f(b)
    if fa*fb > 0:
        print('Error: f(a)*f(b)>0, there may be no root in the interval.')
        return 0, 0
    for k in range(max_iter):
        c = 0.5*(a+b)                   # The midpoint
        fc = f(c)
        print('k ={:3d}, a = {:.6f}, b = {:.6f}, c = {:10.6f}, f(c) = {:10.3e}'
              .format(k, a, b, c, fc))
        if abs(fc) < 1.e-14 or (b-a) < 2*tol:   # The zero is found!
            break
        elif fa*fc <= 0:
            b = c                       # There is a root in [a, c]
        else:
            a = c                       # There is a root in [c, b]
    return c, k
Example 2: Use the code above to find the root of $f(x) = x^3 + x^2 - 3x - 3$ in the interval $[1.5, 2]$. Use $10^{-6}$ as the tolerance.
In [8]: # Example 2
def f(x):                               # Define the function
    return x**3 + x**2 - 3*x - 3

a, b = 1.5, 2                           # The interval
c, nit = bisection(f, a, b, tol=1.e-6)  # Apply the bisection method
k = 0, a = 1.500000, b = 2.000000, c = 1.750000, f(c) = 1.719e-01
k = 1, a = 1.500000, b = 1.750000, c = 1.625000, f(c) = -9.434e-01
k = 2, a = 1.625000, b = 1.750000, c = 1.687500, f(c) = -4.094e-01
k = 3, a = 1.687500, b = 1.750000, c = 1.718750, f(c) = -1.248e-01
k = 4, a = 1.718750, b = 1.750000, c = 1.734375, f(c) = 2.203e-02
k = 5, a = 1.718750, b = 1.734375, c = 1.726562, f(c) = -5.176e-02
k = 6, a = 1.726562, b = 1.734375, c = 1.730469, f(c) = -1.496e-02
k = 7, a = 1.730469, b = 1.734375, c = 1.732422, f(c) = 3.513e-03
k = 8, a = 1.730469, b = 1.732422, c = 1.731445, f(c) = -5.728e-03
k = 9, a = 1.731445, b = 1.732422, c = 1.731934, f(c) = -1.109e-03
k = 10, a = 1.731934, b = 1.732422, c = 1.732178, f(c) = 1.201e-03
k = 11, a = 1.731934, b = 1.732178, c = 1.732056, f(c) = 4.596e-05
k = 12, a = 1.731934, b = 1.732056, c = 1.731995, f(c) = -5.317e-04
k = 13, a = 1.731995, b = 1.732056, c = 1.732025, f(c) = -2.429e-04
k = 14, a = 1.732025, b = 1.732056, c = 1.732040, f(c) = -9.845e-05
k = 15, a = 1.732040, b = 1.732056, c = 1.732048, f(c) = -2.624e-05
k = 16, a = 1.732048, b = 1.732056, c = 1.732052, f(c) = 9.860e-06
k = 17, a = 1.732048, b = 1.732052, c = 1.732050, f(c) = -8.192e-06
k = 18, a = 1.732050, b = 1.732052, c = 1.732051, f(c) = 8.340e-07

Check that the numerical result is within the error tolerance:


In [9]: r = np.sqrt(3)
print('The error: |c_k-r|={:.2e}'.format(abs(r-c)))
The error: |c_k-r|=8.81e-08

2.3 Exercise 1: Bisection method
a) Compute the solution(s) of $x^2 + \sin(x) - 0.5 = 0$ by the bisection method.

b) Given a root in the interval $[1.5, 2]$: how many iterations are required to guarantee that the error is less than $10^{-4}$?

2.4 Fixed point iterations


The bisection method is very robust, but not particularly fast. In addition, the bisection method in this form can only be used for the solution of equations in a single variable, and it is difficult to use similar methods for systems of equations in several variables.
We will therefore discuss a major class of iteration schemes, namely the so-called fixed point
iterations. The idea is:

• Given a scalar equation f ( x ) = 0 with a root r.

• Rewrite the equation in fixed point form x = g( x ) such that the root r of f is a fixed point of g,
that is, r satisfies r = g(r ).

The fixed point iterations are then given by:

Fixed point iterations.

• Given g and a starting value x0 .

• For k = 0, 1, 2, 3, . . .
$$x_{k+1} = g(x_k)$$

2.4.1 Implementation
The fixed point scheme is implemented in the function fixedpoint. The iterations are terminated
when either the error estimate | xk+1 − xk | is less than a given tolerance tol, or the number of
iterations reaches some maximal number max_iter.

In [10]: def fixedpoint(g, x0, tol=1.e-8, max_iter=30):
    # Solve x=g(x) by fixed point iterations
    # The output of each iteration is printed
    # Input:
    # g: The function g(x)
    # x0: Initial value
    # tol: The tolerance
    # Output:
    # The root and the number of iterations
    x = x0
    print('k ={:3d}, x = {:14.10f}'.format(0, x))
    for k in range(max_iter):
        x_old = x                       # Store the old value for error estimation
        x = g(x)                        # The iteration
        err = abs(x-x_old)              # Error estimate
        print('k ={:3d}, x = {:14.10f}'.format(k+1, x))
        if err < tol:                   # The solution is accepted
            break
    return x, k+1

Example 3: The equation

$$x^3 + x^2 - 3x - 3 = 0 \qquad \text{can be rewritten as} \qquad x = \frac{x^3 + x^2 - 3}{3}.$$

The fixed points are the intersections of the two graphs $y = x$ and $y = \frac{x^3 + x^2 - 3}{3}$, as can be demonstrated by the following script:

In [11]: # Example 3
x = np.linspace(-2, 2, 101)
plt.plot(x, (x**3+x**2-3)/3,'b', x, x, '--r' )
plt.axis([-2, 3, -2, 2])
plt.xlabel('x')
plt.grid(True)
plt.legend(['g(x)','x']);

We observe that the fixed points of g are the same as the zeros of f .
Apply fixed point iterations on $g(x)$. Aim for the fixed point between 1 and 2, so choose $x_0 = 1.5$. Do the iterations converge to the root $r = \sqrt{3}$?

In [12]: # Solve the equation from Example 3 by fixed point iterations.

# Define the function
def g(x):
    return (x**3 + x**2 - 3)/3

# Apply the fixed point scheme
x, nit = fixedpoint(g, x0=1.5, tol=1.e-6, max_iter=30)

print('\n\nResult:\nx = {}, number of iterations={}'.format(x, nit))

k = 0, x = 1.5000000000
k = 1, x = 0.8750000000
k = 2, x = -0.5214843750
k = 3, x = -0.9566232041
k = 4, x = -0.9867682272
k = 5, x = -0.9957053567
k = 6, x = -0.9985807218
k = 7, x = -0.9995282492
k = 8, x = -0.9998428981
k = 9, x = -0.9999476491
k = 10, x = -0.9999825515
k = 11, x = -0.9999941841
k = 12, x = -0.9999980614
k = 13, x = -0.9999993538
k = 14, x = -0.9999997846

Result:
x = -0.9999997845980656, number of iterations=14

2.5 Exercise 2: Fixed point method part I


Repeat the experiment with the following reformulations of $f(x) = 0$:

$$x = g_2(x) = \frac{-x^2 + 3x + 3}{x^2},$$
$$x = g_3(x) = \sqrt[3]{3 + 3x - x^2},$$
$$x = g_4(x) = \sqrt{\frac{3 + 3x - x^2}{x}}.$$

Use $x_0 = 1.5$ in your experiments; you may well experiment with other values as well.

2.6 Theory
Let us first state some existence and uniqueness results. Apply Theorem 1 in this note to the equation $f(x) = x - g(x) = 0$. The following then holds (the details are left to the reader):

• If $g \in C[a, b]$ and $a < g(x) < b$ for all $x \in [a, b]$, then g has at least one fixed point $r \in (a, b)$.

• If in addition $g \in C^1[a, b]$ and $|g'(x)| < 1$ for all $x \in [a, b]$, then the fixed point is unique.

In the following, we will write the assumption a < g( x ) < b for all x ∈ [ a, b] as g([ a, b]) ⊂
( a, b).
In this section we will discuss the convergence properties of the fixed point iterations, under
the conditions for existence and uniqueness given above.
The error after k iterations is given by $e_k = r - x_k$. The iterations converge when $e_k \to 0$ as $k \to \infty$. Under which conditions is this the case?
This is the trick: For every k we have:

$$x_{k+1} = g(x_k), \qquad \text{the iteration},$$
$$r = g(r), \qquad \text{the fixed point}.$$

Take the difference between those and use the mean value theorem (Result 3 in section 5 in Pre-
liminaries), and finally take the absolute value of each expression in the sequence of equalities:

$$|e_{k+1}| = |r - x_{k+1}| = |g(r) - g(x_k)| = |g'(\xi_k)| \cdot |r - x_k| = |g'(\xi_k)| \cdot |e_k|. \qquad (2)$$

Here ξ k is some unknown value between xk (known) and r (unknown). We can now use the
assumptions from the existence and uniqueness result. Note here that the condition g([ a, b]) ⊂
( a, b) guarantees that if x0 ∈ [ a, b] then xk ∈ ( a, b) for k = 1, 2, 3, . . ..

The condition $|g'(x)| \le L < 1$ guarantees convergence towards the unique fixed point r, since

$$|e_{k+1}| \le L\,|e_k| \qquad \Rightarrow \qquad |e_k| \le L^k\,|e_0| \to 0 \ \text{as}\ k \to \infty,$$

and $L^k \to 0$ as $k \to \infty$ when $L < 1$. In addition, we have that

$$|x_{k+1} - x_k| = |g(x_k) - g(x_{k-1})| = |g'(\xi_k)| \cdot |x_k - x_{k-1}|$$

for some $\xi_k$ between $x_{k-1}$ and $x_k$, which shows that

$$|x_{k+1} - x_k| \le L\,|x_k - x_{k-1}|.$$

Repeating this argumentation k times yields the estimate

$$|x_{k+1} - x_k| \le L^k\,|x_1 - x_0|.$$

As a consequence, since $r = \lim_{k\to\infty} x_k$, we obtain that

$$|x_1 - r| = \Big|\sum_{k=0}^{\infty}(x_{k+1} - x_{k+2})\Big| \le \sum_{k=0}^{\infty}|x_{k+1} - x_{k+2}| \le \sum_{k=0}^{\infty} L^{k+1}\,|x_1 - x_0| = \frac{L}{1-L}\,|x_1 - x_0|.$$

A similar argumentation (but now starting at $x_k$ instead of $x_1$) yields that

$$|x_{k+1} - r| \le \frac{L}{1-L}\,|x_{k+1} - x_k| \le \frac{L^{k+1}}{1-L}\,|x_1 - x_0|.$$
In summary we have

The fixed point theorem. If there is an interval $[a, b]$ such that $g \in C^1[a, b]$, $g([a, b]) \subset (a, b)$, and there exists a positive constant $L < 1$ such that $|g'(x)| \le L < 1$ for all $x \in [a, b]$, then the following hold:

• g has a unique fixed point r in $(a, b)$.

• The fixed point iteration $x_{k+1} = g(x_k)$ converges towards r for all starting values $x_0 \in [a, b]$.

• The error $e_{k+1} = r - x_{k+1}$ after iteration k + 1 satisfies the estimates

• $|e_{k+1}| \le L\,|e_k|$ (error reduction rate),

• $|e_{k+1}| \le \dfrac{L^{k+1}}{1 - L}\,|x_1 - x_0|$ (a priori error estimate),

• $|e_{k+1}| \le \dfrac{L}{1 - L}\,|x_{k+1} - x_k|$ (a posteriori error estimate).
From the discussion above, we can draw some conclusions:
• The smaller the constant L, the faster the convergence.
• If $|g'(r)| < 1$, then there will always be an interval $(r - \delta, r + \delta)$ around r for some $\delta > 0$ on which all the conditions are satisfied, meaning that the iterations will always converge if $x_0$ is sufficiently close to r.

• If $|g'(r)| > 1$, a similar argumentation shows that the fixed point iterates move away from the point r. In practice, this means that the fixed point iteration in this case will never converge towards r.

• If the constant L is known, the a priori error estimate can be used to estimate the number of steps it takes to obtain an error smaller than some specified tolerance tol. To that end, we simply have to solve the inequality

$$\frac{L^{k+1}}{1 - L}\,|x_1 - x_0| \le \mathrm{tol}$$

for k.

• The a posteriori estimate can be used to estimate the actual error based on the size of the last update $|x_{k+1} - x_k|$.

• If the constant L is not known, one can estimate it using the ideas found in preliminaries.ipynb. Then one can use this estimate of L in the a posteriori error estimate in order to estimate the actual error, as sketched below.
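For illustration, the following is a minimal sketch of our own (not one of the original notebook cells) of how this can be done for g from Example 3: the constant L is estimated from the ratio of the last two increments, which is an assumption on our part and not necessarily the approach described in preliminaries.ipynb.

def g(x):                               # g from Example 3
    return (x**3 + x**2 - 3)/3

x = [1.5]                               # starting value as in Example 3
for k in range(15):                     # a fixed number of iterations, for illustration
    x.append(g(x[-1]))

L_est = abs(x[-1] - x[-2])/abs(x[-2] - x[-3])       # estimated contraction constant
err_est = L_est/(1 - L_est)*abs(x[-1] - x[-2])      # a posteriori error estimate
print('L ~ {:.3f}, estimated error: {:.2e}, true error: {:.2e}'
      .format(L_est, err_est, abs(x[-1] - (-1))))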
Example 3 revisited: Use the theory above to analyse the iteration scheme from Example 3,
where
$$g(x) = \frac{x^3 + x^2 - 3}{3}, \qquad g'(x) = \frac{3x^2 + 2x}{3}.$$

It is clear that g is differentiable. We already know that g has three fixed points, $r = \pm\sqrt{3}$ and $r = -1$. For the first two, we have that $g'(\sqrt{3}) = 3 + \frac{2}{3}\sqrt{3} \approx 4.15$ and $g'(-\sqrt{3}) = 3 - \frac{2}{3}\sqrt{3} \approx 1.85$, so the fixed point iterations will never converge towards those roots. But $g'(-1) = 1/3$, so we get convergence towards this root, given sufficiently good starting values. The figure below demonstrates that the assumptions of the theorem are satisfied in some region around $x = -1$, for example $[-1.3, -0.7]$.

In [13]: plt.rcParams['figure.figsize'] = 10, 5 # Resize the figure for a nicer plot

In [14]: # Demonstrate the assumptions of the fixed point theorem

def g(x):                               # The function g(x)
    return (x**3 + x**2 - 3)/3

def dg(x):                              # The derivative g'(x)
    return (3*x**2 + 2*x)/3

a, b = -4/3, -2/3                       # The interval [a,b] around r
x = np.linspace(a, b, 101)              # x-values on [a,b]
y_is_1 = np.ones(101)                   # For plotting the bounds +1 and -1 for g'

plt.rcParams['figure.figsize'] = 10, 5  # Resize the figure

# Plot x and g(x) around the fixed point r=-1
plt.subplot(1, 2, 1)
plt.plot(x, g(x), x, x, 'r--')
plt.xlabel('x')
plt.axis([a, b, a, b])
plt.grid(True)
plt.legend(['g(x)', 'x'])

# Plot g'(x), and the limits -1 and +1
plt.subplot(1, 2, 2)
plt.plot(x, dg(x), x, y_is_1, 'r--', x, -y_is_1, 'r--');
plt.xlabel('x')
plt.grid(True)
plt.title('dg/dx');

In [15]: # Run the fixed point iteration starting at x_0 = -2/3
x, nit = fixedpoint(g, x0=-2/3, tol=1.e-6, max_iter=30)

print('\n\nResult:\nx = {}, number of iterations={}'.format(x, nit))

k = 0, x = -0.6666666667
k = 1, x = -0.9506172840
k = 2, x = -0.9851247206
k = 3, x = -0.9951879923
k = 4, x = -0.9984113972
k = 5, x = -0.9994721469
k = 6, x = -0.9998242347
k = 7, x = -0.9999414321
k = 8, x = -0.9999804797
k = 9, x = -0.9999934935
k = 10, x = -0.9999978312
k = 11, x = -0.9999992771
k = 12, x = -0.9999997590

Result:
x = -0.9999997590221911, number of iterations=12

The plot to the left demonstrates the assumption $g([a, b]) \subset (a, b)$, as the graph $y = g(x)$ enters at the left boundary, exits at the right, and does not leave the region $[a, b]$ anywhere in between. The plot to the right shows that $|g'(x)| \le |g'(a)| = L < 1$ is satisfied in the interval.
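These assumptions can also be checked numerically. The following small check is our own addition (not an original cell) and reuses g, dg, a, b and x from the cell above:

# g is increasing on [a, b] (dg >= 0 there), so checking the endpoints suffices
print(g(a), g(b))                   # approximately -1.198 and -0.951, both inside (a, b)
print(np.max(np.abs(dg(x))))        # approximately 0.889 = |g'(a)| < 1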

2.7 Exercise 3: Fixed point method part II


a) See how far the interval $[a, b]$ can be stretched while convergence is still guaranteed.

b) Do a similar analysis on the three other iteration schemes suggested above, and confirm the
results numerically. The schemes are:

$$x = g_2(x) = \frac{-x^2 + 3x + 3}{x^2},$$
$$x = g_3(x) = \sqrt[3]{3 + 3x - x^2},$$
$$x = g_4(x) = \sqrt{\frac{3 + 3x - x^2}{x}}.$$

Example 3 revisited: The fixed point theorem stated that $|e_{k+1}| \le L\,|e_k|$ with $L < 1$, and thus the fixed point iteration is linearly convergent.

In [17]: def eocs(errs):
    # Computes the experimentally observed convergence rates (EOCs) for a
    # sequence of iterates
    # errs: numpy array of computed errors
    # Output:
    # array of computed eocs

    if len(errs) < 3:
        print('Error: Need at least 3 iterates to compute EOCs')

    # (Recall list indexing and slicing in Python if you have trouble
    # understanding the next line)
    ps = np.log(errs[2:]/errs[1:-1])/np.log(errs[1:-1]/errs[:-2])

    # Print the computed rates
    for k in range(len(ps)):
        print("k = {:3d}, eoc(k) = {:3.3f}".format(k+2, ps[k]))

    return ps

We will also modify the fixedpoint() function so that it returns not only the last iterate, but all iterates, so that we can compute the estimated order of convergence.

In [18]: def fixedpoint_v2(g, x0, tol=1.e-8, max_iter=30):
    # Solve x=g(x) by fixed point iterations
    # The output of each iteration is printed
    # Input:
    # g: The function g(x)
    # x0: Initial value
    # tol: The tolerance
    # Output:
    # List of iterates
    x = [x0]
    print('k ={:3d}, x = {:14.10f}'.format(0, x[-1]))
    for k in range(max_iter):
        x.append(g(x[-1]))
        err = abs(x[-1]-x[-2])          # Error estimate
        print('k ={:3d}, x = {:14.10f}'.format(k+1, x[-1]))
        if err < tol:                   # The solution is accepted
            break
    # Return as numpy array
    return np.array(x)

Now we can rerun Example 3 with $g(x) = (x^3 + x^2 - 3)/3$ and compute the estimated order of convergence:

In [19]: ## Compute the eoc for the fixed point method when solving Example 3.

# Define the function
def g(x):
    return (x**3 + x**2 - 3)/3

# Apply the fixed point scheme
x = fixedpoint_v2(g, x0=-2/3, tol=1.e-6, max_iter=30)

# Compute x_k - r
r = -1                                  # Exact solution
errs = x - r                            # Error for each iterate

# Find the order p and the error constant C
print('\nThe order p and the error constant C')
for k in range(len(errs)-2):
    p = np.log(np.abs(errs[k+2]/errs[k+1]))/np.log(np.abs(errs[k+1]/errs[k]))
    C = np.abs(errs[k+2])/np.abs(errs[k+1])**p
    print('k = {:2d}, p = {:4.2f}, C = {:6.4f}'.format(k, p, C))

k = 0, x = -0.6666666667
k = 1, x = -0.9506172840
k = 2, x = -0.9851247206
k = 3, x = -0.9951879923
k = 4, x = -0.9984113972
k = 5, x = -0.9994721469
k = 6, x = -0.9998242347
k = 7, x = -0.9999414321
k = 8, x = -0.9999804797
k = 9, x = -0.9999934935
k = 10, x = -0.9999978312
k = 11, x = -0.9999992771
k = 12, x = -0.9999997590

The order p and the error constant C


k = 0, p = 0.63, C = 0.0985
k = 1, p = 0.94, C = 0.2519
k = 2, p = 0.98, C = 0.2999
k = 3, p = 0.99, C = 0.3200
k = 4, p = 1.00, C = 0.3282
k = 5, p = 1.00, C = 0.3314
k = 6, p = 1.00, C = 0.3326
k = 7, p = 1.00, C = 0.3331
k = 8, p = 1.00, C = 0.3332
k = 9, p = 1.00, C = 0.3333
k = 10, p = 1.00, C = 0.3333

Note that the computed order p tends to 1 and the error constant C tends to $1/3 = |g'(-1)|$, in agreement with equation (2): asymptotically, $|e_{k+1}| \approx |g'(r)|\,|e_k|$.

2.8 Newton’s method


Based on the previous discussion, it is clear that fast convergence can be achieved if $g'(r)$ is as small as possible, preferably $g'(r) = 0$. Can this be achieved for a general problem?
We have that $x_k = r - e_k$, where $e_k$ is the error in iteration k. Do a Taylor expansion (Preliminaries, section 4) of $g(x_k)$ around r:

$$e_{k+1} = r - x_{k+1} = g(r) - g(x_k) = g(r) - g(r - e_k) = g'(r)\,e_k - \frac{1}{2} g''(\xi_k)\,e_k^2.$$
If $g'(r) = 0$ and there is a constant M such that $|g''(x)|/2 \le M$, then

$$|e_{k+1}| \le M |e_k|^2$$

and quadratic convergence has been achieved; see Preliminaries, section 3.1 on "Convergence of an iterative process".
Question: Given the equation $f(x) = 0$ with an unknown solution r, can we find a g with r as a fixed point satisfying $g'(r) = 0$?

Idea: Let

$$g(x) = x - h(x) f(x)$$

for some function $h(x)$. Independent of the choice of $h(x)$, r will be a fixed point of g. Choose $h(x)$ such that $g'(r) = 0$. Since

$$g'(x) = 1 - h'(x) f(x) - h(x) f'(x) \qquad \Rightarrow \qquad g'(r) = 1 - h(r) f'(r)$$

(using $f(r) = 0$), we have to choose $h(x)$ in such a way that $h(r) = 1/f'(r)$. There are several choices for h with this property, but the most obvious one is to choose $h(x) = 1/f'(x)$. The result is

Newton’s method.

• Given f and f', and a starting value $x_0$.

• For k = 0, 1, 2, 3, . . .
$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$$

2.8.1 Implementation
The method is implemented in the function newton(). The iterations are terminated when | f ( xk )|
is less than a given tolerance.

In [20]: def newton(f, df, x0, tol=1.e-8, max_iter=30):
    # Solve f(x)=0 by Newton's method
    # The output of each iteration is printed
    # Input:
    # f, df: The function f and its derivative f'
    # x0: Initial value
    # tol: The tolerance
    # Output:
    # The root and the number of iterations
    x = x0
    print('k ={:3d}, x = {:18.15f}, f(x) = {:10.3e}'.format(0, x, f(x)))
    for k in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:               # Accept the solution
            break
        x = x - fx/df(x)                # Newton iteration
        print('k ={:3d}, x = {:18.15f}, f(x) = {:10.3e}'.format(k+1, x, f(x)))
    return x, k+1

Example 4: Solve $f(x) = x^3 + x^2 - 3x - 3 = 0$ by Newton's method, choosing $x_0 = -4/3$ as the starting value. The derivative of f is $f'(x) = 3x^2 + 2x - 3$.

In [28]: # Example 4
def f(x):                               # The function f
    return x**3 + x**2 - 3*x - 3

def df(x):                              # The derivative f'
    return 3*x**2 + 2*x - 3

x0 = -4/3                               # Starting value
x, nit = newton(f, df, x0, tol=1.e-14, max_iter=30)    # Apply Newton's method
print('\n\nResult:\nx={}, number of iterations={}'.format(x, nit))

k = 0, x = -1.333333333333333, f(x) = 4.074e-01
k = 1, x = -0.111111111111110, f(x) = -2.656e+00
k = 2, x = -0.944875107665806, f(x) = -1.162e-01
k = 3, x = -0.997403215651661, f(x) = -5.207e-03
k = 4, x = -0.999993308904898, f(x) = -1.338e-05
k = 5, x = -0.999999999955230, f(x) = -8.954e-11
k = 6, x = -1.000000000000000, f(x) = 0.000e+00

Result:
x=-1.0, number of iterations=7

2.8.2 Error analysis


The method was constructed to give quadratic convergence, that is,

$$|e_{k+1}| \le M |e_k|^2,$$

where $e_k = r - x_k$ is the error after k iterations. But under which conditions is this true, and can we say something about the size of the constant M?
By a Taylor expansion of $f(r)$ around $x_k$ we get

$$0 = f(r) = f(x_k) + f'(x_k)(r - x_k) + \frac{1}{2} f''(\xi_k)(r - x_k)^2 \qquad \text{(Taylor series)}$$
$$0 = f(x_k) + f'(x_k)(x_{k+1} - x_k) \qquad \text{(Newton's method)}$$

where $\xi_k$ is between r and $x_k$.

Subtract the two equations to get

$$f'(x_k)(r - x_{k+1}) + \frac{1}{2} f''(\xi_k)(r - x_k)^2 = 0 \qquad \Rightarrow \qquad e_{k+1} = -\frac{1}{2}\frac{f''(\xi_k)}{f'(x_k)}\, e_k^2.$$

So we obtain quadratic convergence if f is twice differentiable around r, $f'(x_k) \neq 0$, and $x_0$ is chosen sufficiently close to r. More precisely:

Theorem: Convergence of Newton iterations. Assume that the function f has a root r, and let $I_\delta = [r - \delta, r + \delta]$ for some $\delta > 0$. Assume further that

• $f \in C^2(I_\delta)$,

• there is a positive constant M such that
$$\left| \frac{f''(y)}{f'(x)} \right| \le 2M \qquad \text{for all } x, y \in I_\delta.$$

In this case, Newton's iterations converge quadratically,

$$|e_{k+1}| \le M |e_k|^2,$$

for all starting values satisfying $|x_0 - r| < \min\{1/M, \delta\}$.

Here the condition $|x_0 - r| < 1/M$ guarantees that the error actually decreases in each step, since under this condition $|e_{k+1}| \le M|e_k|^2 < |e_k|$.

2.9 Exercise 4: Newton’s method for a scalar equation


a) Repeat Example 4 using different starting values x0 . Find the two other roots.
b) Verify quadratic convergence numerically. How to do so is explained in Preliminaries, section
3.1.
c) Solve the equation x (1 − cos( x )) = 0, both by the bisection method and by Newton’s method
with x0 = 1. Comment on the result.

3 System of nonlinear equations


In this section we will discuss how to solve systems of nonlinear equations, given by

$$f_1(x_1, x_2, \dots, x_n) = 0,$$
$$f_2(x_1, x_2, \dots, x_n) = 0,$$
$$\vdots$$
$$f_n(x_1, x_2, \dots, x_n) = 0,$$

or, in short, by

$$\mathbf{f}(\mathbf{x}) = \mathbf{0},$$

where $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n$.
Example 5: We are given the two equations

$$x_1^3 - x_2 + \frac{1}{4} = 0,$$
$$x_1^2 + x_2^2 - 1 = 0.$$

This is illustrated in the following figure.

In [22]: plt.rcParams['figure.figsize'] = 6, 6 # Resize the figure for a nicer plot

In [23]: # Example 5 with graphs


x = np.linspace(-1.0, 1.0, 101)
plt.plot(x, x**3+0.25);
t = np.linspace(0,2*pi,101)
plt.plot(np.cos(t), np.sin(t));
plt.axis('equal');
plt.xlabel('x1')
plt.ylabel('x2')
plt.grid(True)
plt.legend(['$x_1^3-x_2+1/4=0$','$x_1^2+x_2^2-1=0$']);

The solutions of the two equations are the intersections of the two graphs. So there are two
solutions, one in the first and one in the third quadrant. We will use this as a test example in the
sequel.

3.1 Newton’s method for systems of equations
The idea of fixed point iterations can be extended to systems of equations. But it is in general
hard to find convergent schemes. So we will concentrate on the extension of Newton’s method to
systems of equations. And for the sake of illustration, we only discuss systems of two equations
and two unknowns written as
f ( x, y) = 0,
g( x, y) = 0,
to avoid getting completely lost in indices.
Let $\mathbf{r} = [r_x, r_y]^T$ be a solution to these equations and $\hat{\mathbf{x}} = [\hat{x}, \hat{y}]^T$ a known approximation to $\mathbf{r}$. We search for a better approximation. This can be done by replacing the nonlinear equation $\mathbf{f}(\mathbf{x}) = \mathbf{0}$ by its linear approximation, which can be found by a multidimensional Taylor expansion around $\hat{\mathbf{x}}$:

$$f(x, y) = f(\hat{x}, \hat{y}) + \frac{\partial f}{\partial x}(\hat{x}, \hat{y})(x - \hat{x}) + \frac{\partial f}{\partial y}(\hat{x}, \hat{y})(y - \hat{y}) + \dotsb$$
$$g(x, y) = g(\hat{x}, \hat{y}) + \frac{\partial g}{\partial x}(\hat{x}, \hat{y})(x - \hat{x}) + \frac{\partial g}{\partial y}(\hat{x}, \hat{y})(y - \hat{y}) + \dotsb$$

where the dots represent higher order terms, which are small if $\hat{\mathbf{x}} \approx \mathbf{x}$. By ignoring these terms we get a linear approximation to $\mathbf{f}(\mathbf{x})$, and rather than solving the original nonlinear system, we can solve its linear approximation:

$$f(\hat{x}, \hat{y}) + \frac{\partial f}{\partial x}(\hat{x}, \hat{y})(x - \hat{x}) + \frac{\partial f}{\partial y}(\hat{x}, \hat{y})(y - \hat{y}) = 0$$
$$g(\hat{x}, \hat{y}) + \frac{\partial g}{\partial x}(\hat{x}, \hat{y})(x - \hat{x}) + \frac{\partial g}{\partial y}(\hat{x}, \hat{y})(y - \hat{y}) = 0$$

or, more compactly,

$$\mathbf{f}(\hat{\mathbf{x}}) + J(\hat{\mathbf{x}})(\mathbf{x} - \hat{\mathbf{x}}) = \mathbf{0},$$

where the Jacobian $J(\mathbf{x})$ is given by

$$J(\mathbf{x}) = \begin{pmatrix} \dfrac{\partial f}{\partial x}(x, y) & \dfrac{\partial f}{\partial y}(x, y) \\[2mm] \dfrac{\partial g}{\partial x}(x, y) & \dfrac{\partial g}{\partial y}(x, y) \end{pmatrix}.$$

It is to be hoped that the solution $\mathbf{x}$ of the linear equation provides a better approximation to $\mathbf{r}$ than our initial guess $\hat{\mathbf{x}}$, so the process can be repeated, resulting in

Newton’s method for system of equations.

• Given a function $\mathbf{f}(\mathbf{x})$, its Jacobian $J(\mathbf{x})$, and a starting value $\mathbf{x}_0$.

• For k = 0, 1, 2, 3, . . .

• Solve the system $J(\mathbf{x}_k)\Delta_k = -\mathbf{f}(\mathbf{x}_k)$.

• Let $\mathbf{x}_{k+1} = \mathbf{x}_k + \Delta_k$.

The strategy can be generalized to systems of n equations in n unknowns, in which case the Jacobian is given by

$$J(\mathbf{x}) = \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1}(\mathbf{x}) & \dfrac{\partial f_1}{\partial x_2}(\mathbf{x}) & \cdots & \dfrac{\partial f_1}{\partial x_n}(\mathbf{x}) \\[2mm] \dfrac{\partial f_2}{\partial x_1}(\mathbf{x}) & \dfrac{\partial f_2}{\partial x_2}(\mathbf{x}) & \cdots & \dfrac{\partial f_2}{\partial x_n}(\mathbf{x}) \\[2mm] \vdots & \vdots & & \vdots \\[2mm] \dfrac{\partial f_n}{\partial x_1}(\mathbf{x}) & \dfrac{\partial f_n}{\partial x_2}(\mathbf{x}) & \cdots & \dfrac{\partial f_n}{\partial x_n}(\mathbf{x}) \end{pmatrix}.$$

3.1.1 Implementation
Newton’s method for system of equations is implemented in the function newton_system. The nu-
merical solution is accepted when all components of f(xk ) are smaller than a tolerance in absolute
value, that means when |f(xk)|{∞} < tol. See Preliminaries, section 1 for a description of norms.

In [24]: np.set_printoptions(precision=15) # Output with high accuracy

In [25]: def newton_system(f, jac, x0, tol=1.e-10, max_iter=20):
    x = x0
    print('k ={:3d}, x = '.format(0), x)
    for k in range(max_iter):
        fx = f(x)
        if norm(fx, np.inf) < tol:      # The solution is accepted
            break
        Jx = jac(x)
        try:
            delta = solve(Jx, -fx)      # Solve the linear Newton system
        except:
            print("Error in solving the Newton system")
            k = -1                      # Return -1 as the iteration number
            break
        x = x + delta
        print('k ={:3d}, x = '.format(k+1), x)
    return x, k

Example 6: Solve the equations from Example 5 by Newton's method, that is, the system

$$x_1^3 - x_2 + \frac{1}{4} = 0,$$
$$x_1^2 + x_2^2 - 1 = 0.$$

The vector valued function $\mathbf{f}$ and the Jacobian $J$ are in this case

$$\mathbf{f}(\mathbf{x}) = \begin{pmatrix} x_1^3 - x_2 + \frac{1}{4} \\ x_1^2 + x_2^2 - 1 \end{pmatrix} \qquad \text{and} \qquad J(\mathbf{x}) = \begin{pmatrix} 3x_1^2 & -1 \\ 2x_1 & 2x_2 \end{pmatrix}.$$

We already know that the system has two solutions, one in the first and one in the third quadrant. To find the first one, choose $\mathbf{x}_0 = [1, 1]^T$ as the starting value.

In [33]: # Example 6

# The vector valued function. Notice the indexing.
def f(x):
    y = np.array([x[0]**3 - x[1] + 0.25,
                  x[0]**2 + x[1]**2 - 1])
    return y

# The Jacobian
def jac(x):
    J = np.array([[3*x[0]**2, -1],
                  [2*x[0], 2*x[1]]])
    return J

x0 = np.array([1.0, 1.0])               # Starting values
max_iter = 50
x, nit = newton_system(f, jac, x0, tol=1.e-12, max_iter=max_iter)   # Apply Newton's method

print('\nTest: f(x)={}'.format(f(x)))
if nit == max_iter-1:
    print('Warning: Convergence has not been achieved')

k = 0, x = [ 1. 1.]
k = 1, x = [ 0.8125 0.6875]
k = 2, x = [ 0.750687815833801 0.663959854014599]
k = 3, x = [ 0.746302675769953 0.665623251157924]
k = 4, x = [ 0.746281278080405 0.665630719318386]
k = 5, x = [ 0.746281277575054 0.665630719499142]

Test: f(x)=[ 1.110223024625157e-16 0.000000000000000e+00]

3.2 Exercise 5: Newton’s method for a system


a) Search for the solution of Example 5 in the third quadrant by changing the initial values.

b) Apply Newton’s method to the system

xey = 1,
using x0 = y0 = −1.
− x2 + y = 1,

3.2.1 Final remarks


A complete error and convergence analysis of Newton’s method for systems is far from trivial,
and outside the scope of this course. But in summary: If f is sufficiently differentiable, and there
is a solution r of the system f(x) = 0 and J (r) is nonsingular, then the Newton iterations will
converge quadratically towards r for all x0 sufficiently close to r.
Finding solutions of nonlinear equations is difficult. Even if the Newton iterations will in principle converge, it can be very hard to find sufficiently good starting values. Nonlinear equations may have no solution or many solutions, and even when there are solutions, there is no guarantee that the solution you find is the one you want.

If n is large, each iteration is computationally expensive since the Jacobian is evaluated in each
iteration, and a linear system has to be solved. In practice, some modified and more efficient
version of Newton’s method will be used, maybe together with more robust but slow algorithms
for finding sufficiently good starting values.
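As an illustration of one such modification (a sketch of ours, not part of the original notes), the so-called simplified Newton method evaluates the Jacobian only once and reuses it in every step. This reduces the cost per iteration, typically at the price of losing quadratic convergence.

def simplified_newton(f, jac, x0, tol=1.e-10, max_iter=20):
    # Simplified Newton: the Jacobian is evaluated only at x0 and then reused
    x = x0
    Jx = jac(x0)                    # Evaluate the Jacobian once
    for k in range(max_iter):
        fx = f(x)
        if norm(fx, np.inf) < tol:  # Same stopping criterion as in newton_system
            break
        x = x + solve(Jx, -fx)      # Reuse the same Jacobian in every step
    return x, k

It can be called exactly like newton_system above, for example with f, jac and x0 from Example 6. In a more careful implementation one would also factorise Jx once and reuse the factorisation in every step.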
