Iterative Methods: 2.1.1 Simple Iteration Example
Iterative Methods
2.1 Introduction
In this section, we will consider three different iterative methods for solving a set of equations. First, we consider a series of examples to illustrate iterative methods. To construct an iterative method, we try to rearrange the system of equations so that it generates a sequence of approximations to the solution.
2.1.1 Simple Iteration Example

Consider the problem of finding the intersection of the curves

    y = e^(-x)   and   y = 2 - x ,                                    (2.1)

i.e. of solving e^(-x) = 2 - x. One way to construct an iterative method is to rearrange this equation as

    x = 2 - e^(-x) ,

i.e. to compute the sequence

    x^(k+1) = 2 - e^(-x^(k))

from some starting value x^(0). The table below shows two such sequences, one starting from x^(0) = 1.0 and the other from x^(0) = -1.0:

    x^(k)       x^(k)
    1.0        -1.0
    1.63212    -0.71828
    1.80449    -0.05091
    1.83544     0.947776
    1.84046     1.61240
    1.84126     1.80059
    1.84138     1.83480
    1.84140     1.84124
    1.84141     1.84138
    ...         ...
In this example, both sequences appear to converge to a value close to the root α = 1.84141, where 0 < α < 2. Hence, we have constructed a simple algorithm for solving an equation, and it appears to be a robust iterative method.

However, (2.1) has two solutions: a positive root at 1.84141 and a negative root at -1.14619. Why do we only find one root?

If f(x) = 0 has a solution x = α, then x^(k+1) = g(x^(k)) will converge to α, provided |g'(α)| < 1 and x^(0) is suitably chosen.
Here

    g(x) = 2 - e^(-x)

and

    g'(x) = e^(-x) ,

so

    |g'(x)| < 1   if   x > 0 .
So this method can be used to find the positive root of (2.1). However, it will never converge to the
negative root. Hence, this kind of approach will not always converge to a solution.
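The behaviour above is easy to reproduce. The sketch below (plain Python; the tolerance and iteration cap are illustrative choices, not from the text) runs the fixed-point iteration x^(k+1) = 2 - e^(-x^(k)) from both starting guesses.

```python
import math

def fixed_point(g, x0, tol=1e-6, max_iter=100):
    """Iterate x_(k+1) = g(x_(k)) until successive values agree to tol."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Rearrangement of e^(-x) = 2 - x as x = g(x):
g = lambda x: 2.0 - math.exp(-x)

print(fixed_point(g, 1.0))   # ~ 1.84141, as in the table
print(fixed_point(g, -1.0))  # ~ 1.84141: both guesses find the positive root
```

Even a guess near the negative root -1.14619 drifts to the positive root, since |g'(x)| = e^(-x) > 1 for x < 0.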
2.1.2 Linear Systems

Iterative methods can also be used to solve systems of linear equations. Consider, for example, the system

    10 x1 + x2 = 12 ,
    x1 + 10 x2 = 21 .

Rearranging the first equation for x1 and the second for x2 suggests the iteration

    x1^(k+1) = 1.2 - x2^(k)/10 ,
    x2^(k+1) = 2.1 - x1^(k)/10 ,

to generate a sequence of vectors x^(k) = (x1^(k), x2^(k))^T from some starting vector, x^(0).
If

    x^(0) = [ 0 ]
            [ 0 ]

then

    x^(1) = [ 1.2 ] ,  x^(2) = [ 0.99 ] ,  x^(3) = [ 1.002 ] ,  ...
            [ 2.1 ]            [ 1.98 ]            [ 2.001 ]

where

    x^(k) → [ 1 ]   as   k → ∞ ,
            [ 2 ]

which is the exact solution of the system.
Alternatively, we could rearrange the first equation for x2 and the second for x1, computing

    x1^(k+1) = 21 - 10 x2^(k) ,
    x2^(k+1) = 12 - 10 x1^(k) .

If

    x^(0) = [ 0 ]
            [ 0 ]

then

    x^(1) = [ 21 ] ,  x^(2) = [  -99 ] ,  x^(3) = [ 2001 ] ,  ...
            [ 12 ]            [ -198 ]            [ 1002 ]

and the sequence diverges, even though it is generated from the same set of equations.
Notice that in these iterations x1^(k+1) is computed from x2^(k), and x2^(k+1) is computed from x1^(k). It seems more natural, from a computational point of view, to use x1^(k+1) rather than x1^(k) in the second step, i.e. to use the latest available value. In effect, we want to compute the following:

    x1^(k+1) = 1.2 - x2^(k)/10 ,
    x2^(k+1) = 2.1 - x1^(k+1)/10 ,

which gives, from

    x^(0) = [ 0 ] ,
            [ 0 ]

    x^(1) = [ 1.2  ] ,  x^(2) = [ 1.002  ] ,  ... ,
            [ 1.98 ]            [ 1.9998 ]

which converges to

    [ 1 ]
    [ 2 ]

noticeably faster than before.
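A quick sketch of the two rearrangements of this system (plain Python; the iteration counts are arbitrary) shows the contrast:

```python
# Convergent rearrangement: x1 <- 1.2 - x2/10, x2 <- 2.1 - x1/10.
x1, x2 = 0.0, 0.0
for _ in range(20):
    x1, x2 = 1.2 - x2 / 10.0, 2.1 - x1 / 10.0  # simultaneous update
print(x1, x2)  # approaches the solution (1, 2)

# Divergent rearrangement: x1 <- 21 - 10*x2, x2 <- 12 - 10*x1.
y1, y2 = 0.0, 0.0
for _ in range(5):
    y1, y2 = 21.0 - 10.0 * y2, 12.0 - 10.0 * y1
print(y1, y2)  # magnitudes grow by roughly a factor of 10 per step
```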
In the following sections, we will consider, in general terms, iterative methods for solving a system Ax = b. First, though, we introduce some important results about sequences of vectors.
2.2 Sequences of Vectors

2.2.1 The Limit of a Sequence

Let {x^(k)}, k = 0, 1, 2, ..., be a sequence in a Vector Space V. How do we know if this sequence has a limit? First observe that ||x|| = ||y|| does not imply x = y, i.e. two distinct objects in a Vector Space can have the same size. However, from rule 1 for norms (1.1) we know that if ||x - y|| = 0, then x ≡ y. So if

    lim (k→∞) ||x^(k) - x|| = 0 ,

then

    lim (k→∞) x^(k) = x .
2.2.2 Convergence of a Sequence

Consider an iterative method of the form

    x^(k+1) = Bx^(k) + c ,

whose fixed point is the true solution, x = Bx + c. Subtracting these two equations gives x^(k+1) - x = B(x^(k) - x). Say we start from an initial guess x^(0), so that x^(1) - x = B(x^(0) - x). Then

    x^(2) - x = B(x^(1) - x)
              = B[B(x^(0) - x)]
              = B²(x^(0) - x) ,

and, in general, x^(k) - x = B^k (x^(0) - x). Taking norms,

    ||x^(k) - x|| ≤ ||B||^k ||x^(0) - x|| ,

so the sequence converges for any starting guess if

    ||B|| < 1 ,

i.e. we have a monotonically decreasing sequence, or, in other words, the error in the approximations decreases. Conversely, if

    ρ(B) > 1 ,

then

    ||B|| > 1

for every norm, and the errors grow for a general starting guess. In fact, a necessary and sufficient condition for convergence from an arbitrary x^(0) is ρ(B) < 1.
2.2.3 Rate of Convergence

In numerical analysis, to compare different methods for solving systems of equations, we are interested in determining the rate of convergence of the method. As we will see below, the spectral radius is a measure of the rate of convergence.

Consider the situation where the N × N matrix B has N linearly independent eigenvectors e_i with eigenvalues λ_i, ordered so that |λ_1| > |λ_2| ≥ ... ≥ |λ_N|. As before we have

    x^(k+1) - x = B(x^(k) - x) ,

or, substituting v^(k) = x^(k) - x,

    v^(k+1) = Bv^(k) .

Now write v^(0) = Σ (i=1..N) α_i e_i, an expansion in the eigenvectors of B. Then

    v^(1) = B Σ (i=1..N) α_i e_i = Σ (i=1..N) α_i B e_i = Σ (i=1..N) α_i λ_i e_i ,

    v^(2) = B Σ (i=1..N) α_i λ_i e_i = Σ (i=1..N) α_i λ_i B e_i = Σ (i=1..N) α_i λ_i² e_i ,

and, in general,

    v^(k) = Σ (i=1..N) α_i λ_i^k e_i .

Factoring out the largest eigenvalue,

    v^(k) = λ_1^k [ α_1 e_1 + Σ (i=2..N) α_i (λ_i/λ_1)^k e_i ] .

Since |λ_i/λ_1| < 1 for i ≥ 2, the sum in brackets tends to α_1 e_1 as k → ∞, so for large k the error satisfies v^(k) ≈ λ_1^k α_1 e_1: each iteration reduces the error by a factor of approximately |λ_1| = ρ(B).
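This error-reduction factor can be observed directly. The sketch below (plain Python) repeatedly applies a 2 × 2 matrix with eigenvalues ±1/10, taken from the examples later in this chapter, to an arbitrary error vector; the ratio of successive infinity norms settles at ρ(B) = 0.1.

```python
# B is the 2x2 Jacobi iteration matrix used later in this chapter;
# its eigenvalues are +1/10 and -1/10, so rho(B) = 0.1.
B = [[0.0, -0.1], [-0.1, 0.0]]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def norm_inf(v):
    return max(abs(c) for c in v)

v = [1.0, 0.3]  # an arbitrary initial error vector v^(0)
for _ in range(10):
    v_next = mat_vec(B, v)
    ratio = norm_inf(v_next) / norm_inf(v)  # error-reduction factor per step
    v = v_next
print(ratio)  # settles at rho(B) = 0.1
```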
2.2.4 Gerschgorin's Theorem

The above result means that if we know the magnitude of the largest eigenvalue of the iteration matrix, we can estimate the rate of convergence of a particular method for a system of equations. However, this requires the magnitudes of the eigenvalues to be known, and these would probably have to be determined numerically.

Gerschgorin's Theorem is a surprisingly simple result concerning eigenvalues that allows us to put bounds on the size of the eigenvalues of a matrix without actually finding the eigenvalues themselves.
The equation Ae = λe, where (λ, e) is an eigenvalue-eigenvector pair of the matrix A, can be written in component notation as

    Σ (j=1..N) a_ij e_j = a_ii e_i + Σ (j=1..N, j≠i) a_ij e_j = λ e_i .

Rearranging implies

    e_i (a_ii - λ) = - Σ (j≠i) a_ij e_j ,

and thus,

    |e_i| |a_ii - λ| ≤ Σ (j≠i) |a_ij| |e_j| .

Suppose the component of the eigenvector e with the largest absolute value is |e_l|, such that |e_l| ≥ |e_j| for all j (note e_l ≠ 0). Then, taking i = l above,

    |e_l| |a_ll - λ| ≤ Σ (j≠l) |a_lj| |e_j| ≤ Σ (j≠l) |a_lj| |e_l| ,

and dividing by |e_l|,

    |a_ll - λ| ≤ Σ (j≠l) |a_lj| .

Each eigenvalue therefore lies inside a circle with centre a_ll and radius Σ (j≠l) |a_lj|. However, we don't know l without finding λ and e. But we can say that the union of all such circles, one for each row of A, must contain all the eigenvalues. This is Gerschgorin's Theorem.
Example 2.5.1: Determine the bounds on the eigenvalues of the matrix

    A = [  2 -1  0  0 ]
        [ -1  2 -1  0 ]
        [  0 -1  2 -1 ]
        [  0  0 -1  2 ]

Every eigenvalue λ satisfies, for some row l,

    |λ - a_ll| ≤ Σ (j≠l) |a_lj| .

Rows 1 and 4 give the circle |λ - 2| ≤ 1, and rows 2 and 3 give |λ - 2| ≤ 2. The union of these circles is |λ - 2| ≤ 2, and since A is symmetric its eigenvalues are real, so all the eigenvalues lie in the interval 0 ≤ λ ≤ 4.
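These bounds are cheap to compute. A sketch for the matrix of Example 2.5.1 (plain Python; the closed-form eigenvalues 2 - 2cos(kπ/5) of this tridiagonal matrix are used only as a check):

```python
import math

A = [[ 2, -1,  0,  0],
     [-1,  2, -1,  0],
     [ 0, -1,  2, -1],
     [ 0,  0, -1,  2]]

# One Gerschgorin disc per row: centre a_ll, radius sum over j != l of |a_lj|.
discs = [(row[l], sum(abs(a) for j, a in enumerate(row) if j != l))
         for l, row in enumerate(A)]
print(discs)  # [(2, 1), (2, 2), (2, 2), (2, 1)]

lo = min(c - r for c, r in discs)
hi = max(c + r for c, r in discs)
print(lo, hi)  # 0 4: all eigenvalues lie in [0, 4]

# Check: the eigenvalues of this tridiagonal matrix are 2 - 2cos(k*pi/5).
eigs = [2 - 2 * math.cos(k * math.pi / 5) for k in range(1, 5)]
assert all(lo <= e <= hi for e in eigs)
```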
2.3 The Jacobi Iterative Method

The Jacobi Iterative Method follows the approach of the iteration shown in Example 2.1.2. Consider the linear system

    Ax = b ,

or, in component form,

    Σ (j=1..N) a_ij x_j = b_i ,    i = 1, ..., N .

Solving the ith equation for x_i (assuming a_ii ≠ 0) gives

    x_i = (1/a_ii) [ b_i - Σ (j≠i) a_ij x_j ] ,

which suggests the iteration

    x_i^(k+1) = (1/a_ii) [ b_i - Σ (j≠i) a_ij x_j^(k) ] .             (2.2)

To write this in matrix form, split A into its diagonal, strictly lower triangular and strictly upper triangular parts,

    A = D - L - U ,                                                   (2.3)

where D contains the diagonal entries a_ii, and

    l_ij = -a_ij for i > j ,   l_ij = 0 for i ≤ j ,
    u_ij = -a_ij for i < j ,   u_ij = 0 for i ≥ j .

Then Ax = b becomes

    (D - L - U)x = b ,

or,

    Dx = (L + U)x + b .

Dividing each equation by a_ii is equivalent to writing

    x = D^(-1)(L + U)x + D^(-1)b ,

where the elements of D^(-1) are 1/a_ii, so we have pre-multiplied by the inverse of D. Hence, the matrix form of the iterative method (2.2), known as the Jacobi Iteration Method, is

    x^(k+1) = D^(-1)(L + U)x^(k) + D^(-1)b .                          (2.4)

The matrix B_J = D^(-1)(L + U) is called the iteration matrix for the Jacobi Iteration Method.
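A sketch of the component form (2.2) in plain Python (dense lists and a fixed number of sweeps rather than a stopping test, purely for illustration):

```python
def jacobi(A, b, x0, sweeps=50):
    """Jacobi iteration (2.2): every component update uses only values
    from the previous sweep."""
    n = len(A)
    x = list(x0)
    for _ in range(sweeps):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

# The 2x2 system used throughout this chapter; exact solution (1, 2).
A = [[10.0, 1.0], [1.0, 10.0]]
b = [12.0, 21.0]
print(jacobi(A, b, [0.0, 0.0]))  # ~ [1.0, 2.0]
```

Building a fresh list each sweep is what makes this Jacobi rather than Gauss-Seidel: no updated value is reused within a sweep.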
2.3.1 Convergence of the Jacobi Method

From 2.2.2, recall that an iterative method of the form x^(k+1) = Bx^(k) + c will converge provided ||B|| < 1, and that a necessary and sufficient condition for convergence is ρ(B) < 1. Thus, for the Jacobi Method, it is sufficient that ||B_J|| = ||D^(-1)(L + U)|| < 1, and necessary and sufficient that ρ(B_J) < 1.
Example 2.3.1: Let us return once more to Example 2.1.2 and recast it in the form of the Jacobi iterative method. The linear system we wish to solve is

    Ax = [ 10  1 ] [ x1 ] = [ 12 ] = b .
         [  1 10 ] [ x2 ]   [ 21 ]

The first thing we need to do is find D and L + U, where A = D - L - U:

    D = [ 10  0 ]   and   L + U = [  0 -1 ] ,
        [  0 10 ]                 [ -1  0 ]

hence,

    B_J = D^(-1)(L + U) = [   0    -1/10 ] .
                          [ -1/10    0   ]

Now, choosing the matrix norm subordinate to the infinity norm, we find

    ||B_J|| = 1/10 < 1 .

Alternatively, we can consider the spectral radius of B_J. The eigenvalues of B_J are given by

    λ² - 1/100 = 0 ,

and so

    ρ(B_J) = 1/10 ,

and the error is reduced according to

    ||x^(k+1) - x|| ≤ (1/10) ||x^(k) - x|| .

Remember, the true solution is

    x = [ 1 ] ,
        [ 2 ]

so with x^(0) = (0, 0)^T we have ||x^(0) - x|| = 2 and

    ||x^(1) - x|| ≤ (1/10) × 2 = 0.2 .

Indeed,

    x^(1) = [ 1.2 ]   and   x^(1) - x = [ 0.2 ] ,
            [ 2.1 ]                     [ 0.1 ]

so

    ||x^(1) - x|| = 0.2 .

Since the size of ρ(B_J) is an indication of the rate of convergence, we see here that this system converges at a rate of ρ(B_J) = 0.1. The smaller the spectral radius, the more rapid the convergence. So is it possible to modify this method to make it faster?
2.4 The Gauss-Seidel Iterative Method

To produce a faster iterative method, we amend the Jacobi Method to make use of the new values as they become available (e.g. as in Example 2.2.2).
Expanding out the Jacobi Method (2.4), we have

    x^(k+1) = D^(-1)(L + U)x^(k) + D^(-1)b
            = D^(-1)Lx^(k) + D^(-1)Ux^(k) + D^(-1)b .

Here D^(-1)L is a strictly lower triangular matrix, so the ith row of D^(-1)Lx^(k) contains the values

    x_1^(k), x_2^(k), x_3^(k), ..., x_(i-1)^(k)

(components up to, but not including, the diagonal). Likewise, D^(-1)U is a strictly upper triangular matrix, so the ith row of D^(-1)Ux^(k) contains

    x_(i+1)^(k), x_(i+2)^(k), ..., x_N^(k) .

If we compute the x_i^(k+1) in order of increasing i (i.e. from the top of the vector to the bottom), then when computing x_i^(k+1) we already have available

    x_1^(k+1), x_2^(k+1), ..., x_(i-1)^(k+1) .

Hence, a more efficient version of the Jacobi Method is to compute (in order of increasing i)

    x^(k+1) = D^(-1)Lx^(k+1) + D^(-1)Ux^(k) + D^(-1)b .

This is equivalent to finding x^(k+1) from

    (I - D^(-1)L)x^(k+1) = D^(-1)Ux^(k) + D^(-1)b ,

or,

    x^(k+1) = (I - D^(-1)L)^(-1)D^(-1)Ux^(k) + (I - D^(-1)L)^(-1)D^(-1)b .    (2.5)

This is known as the Gauss-Seidel Iterative Method. Its iteration matrix is

    B_GS = (I - D^(-1)L)^(-1)D^(-1)U
         = [D(I - D^(-1)L)]^(-1)U
         = (D - L)^(-1)U .

Thus, for convergence (from 2.2.2) we require that

    ||B_GS|| = ||(D - L)^(-1)U|| < 1 .
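In code, the only change from the Jacobi sweep is that the solution vector is updated in place, so each component immediately sees the newest values of the components before it; a sketch:

```python
def gauss_seidel(A, b, x0, sweeps=50):
    """Gauss-Seidel iteration: x is overwritten in place, so computing
    x[i] uses the new values of x[0], ..., x[i-1] and the old values of
    x[i+1], ..., x[n-1]."""
    n = len(A)
    x = list(x0)
    for _ in range(sweeps):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

A = [[10.0, 1.0], [1.0, 10.0]]
b = [12.0, 21.0]
print(gauss_seidel(A, b, [0.0, 0.0]))  # ~ [1.0, 2.0]
```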
Example 2.4.1: Again we reconsider the linear system used in Examples (2.1.2, 2.1.3 & 2.3.1) and recast it in the form of the Gauss-Seidel Method:

    A = [ 10  1 ] ,
        [  1 10 ]

and since A = D - L - U, we have

    D - L = [ 10  0 ]   and   U = [ 0 -1 ] .
            [  1 10 ]             [ 0  0 ]

Then

    (D - L)^(-1) = [  1/10     0  ] ,
                   [ -1/100  1/10 ]

so

    B_GS = (D - L)^(-1)U = [ 0  -1/10 ] .
                           [ 0  1/100 ]

Choosing the matrix norm subordinate to the infinity norm again,

    ||B_GS|| = 1/10 < 1 .

The eigenvalues of B_GS satisfy

    det [ -λ     -1/10   ] = 0 ,
        [  0   1/100 - λ ]

or,

    λ (λ - 1/100) = 0 ,

so we have

    λ = 0   or   λ = 1/100 ,

and hence,

    ρ(B_GS) = ρ((D - L)^(-1)U) = 1/100 .

Observe that in this example, even though ||B_GS|| = ||B_J||, we have ρ(B_GS) = [ρ(B_J)]² (cf. Example 2.3.1), implying that Gauss-Seidel converges twice as fast as Jacobi.
2.5 The Successive Over Relaxation (SOR) Iterative Method

The third iterative method we will consider is a method which accelerates the Gauss-Seidel Method. Consider the system Ax = b, with A = D - L - U as before. When trying to solve Ax = b, we obtain an approximate solution x^(k) of the true solution x. The quantity

    r^(k) = b - Ax^(k)

is called the residual, and it is a measure of the accuracy of x^(k). Clearly, we would like to make the residual r^(k) as small as possible for each approximate solution x^(k).

If, as in the Gauss-Seidel iterative method, we use the most recent approximations x_1^(k+1), ..., x_(i-1)^(k+1) when computing x_i^(k+1), the residual vector is given by

    r^(k) = b - Dx^(k) + Lx^(k+1) + Ux^(k) .

Ultimately, we wish to make x - x^(k) as small as possible. However, as we don't know x yet, we instead consider x^(k+1) - x^(k) as a measure for x - x^(k). We now wish to calculate x^(k+1) such that

    D(x^(k+1) - x^(k)) = ω (b - Dx^(k) + Lx^(k+1) + Ux^(k)) ,

where ω is called the relaxation parameter. Rearranging, we get

    (D - ωL)x^(k+1) = ((1 - ω)D + ωU)x^(k) + ωb ,

and hence, the recurrence relation is given by

    x^(k+1) = (D - ωL)^(-1)((1 - ω)D + ωU)x^(k) + ω(D - ωL)^(-1)b .   (2.6)

The process of reducing residuals at each stage is called Successive Relaxation. If 0 < ω < 1, the iterative method is known as Successive Under Relaxation; such schemes can be used to obtain convergence when the Gauss-Seidel scheme is not convergent. For choices of ω > 1, the scheme is known as Successive Over Relaxation (SOR) and can accelerate convergence. The true solution satisfies

    (D - ωL)x = ((1 - ω)D + ωU) x + ωb ,

so the iteration matrix is

    B_SOR = (D - ωL)^(-1)[(1 - ω)D + ωU] .

The aim is to choose ω such that the rate of convergence is maximised, that is, the spectral radius ρ(B_SOR(ω)) is minimised. How do we find the value of ω that does this? There is no complete answer for general N × N systems, but it is known that if a_ii ≠ 0 for each 1 ≤ i ≤ N, then

    ρ(B_SOR) ≥ |1 - ω| .

This means that for convergence we must have 0 < ω < 2.
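Component-wise, (2.6) moves each component a fraction ω of the way from its old value towards its Gauss-Seidel update; a sketch (plain Python; ω = 1 reduces exactly to Gauss-Seidel):

```python
def sor(A, b, x0, omega, sweeps=50):
    """SOR: x_i <- (1 - omega) * x_i + omega * (Gauss-Seidel update),
    using the newest values of earlier components, as in (2.6)."""
    n = len(A)
    x = list(x0)
    for _ in range(sweeps):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - s) / A[i][i]
    return x

A = [[10.0, 1.0], [1.0, 10.0]]
b = [12.0, 21.0]
print(sor(A, b, [0.0, 0.0], omega=1.0))     # Gauss-Seidel
print(sor(A, b, [0.0, 0.0], omega=1.0025))  # slightly over-relaxed
```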
Example 2.6.1: We return once more to the linear system considered throughout this chapter in Examples (2.1.1, 2.1.2, 2.3.1 & 2.4.1) and recast it in terms of the SOR iterative method. Recall,

    A = [ 10  1 ] .
        [  1 10 ]

Now

    (1 - ω)D + ωU = (1 - ω)[ 10  0 ] + ω[ 0 -1 ] = [ 10(1 - ω)     -ω      ] ,
                           [  0 10 ]    [ 0  0 ]   [     0      10(1 - ω)  ]

and

    D - ωL = [ 10  0 ] - ω[  0  0 ] = [ 10   0 ] ,
             [  0 10 ]    [ -1  0 ]   [  ω  10 ]

so

    (D - ωL)^(-1) = [  1/10     0   ] .
                    [ -ω/100   1/10 ]

The eigenvalues of B_SOR = (D - ωL)^(-1)[(1 - ω)D + ωU] satisfy det[(1 - ω)D + ωU - λ(D - ωL)] = 0, i.e.

    det [ 10[(1 - ω) - λ]        -ω         ] = 0 ,
        [      -λω         10[(1 - ω) - λ]  ]

or,

    100[(1 - ω) - λ]² - λω² = 0 ,

which expands to

    λ² - [2(1 - ω) + ω²/100] λ + (1 - ω)² = 0 .

Solving this quadratic for λ gives

    λ = (1 - ω) + ω²/200 ± (1/2)[ (2(1 - ω) + ω²/100)² - 4(1 - ω)² ]^(1/2)
      = (1 - ω) + ω²/200 ± (1/2)[ 4(1 - ω)ω²/100 + ω⁴/10⁴ ]^(1/2)
      = (1 - ω) + ω²/200 ± (ω/20)[ 4(1 - ω) + ω²/100 ]^(1/2) .

The two roots coincide when the discriminant vanishes,

    4(1 - ω) + ω²/100 = 0 ,

which, as shown in general in Section 2.6, is where the best choice of ω lies. Note also that setting ω = 1 recovers

    λ² - λ/100 = 0 ,

the Gauss-Seidel result of Example 2.4.1, with ρ = 1/100 when ω = 1.
2.6 Choosing the Optimum ω

In general, it is not easy to find an appropriate ω for the SOR method, and so an ω is usually chosen which lies in the range 1 < ω < 2 and leads to a spectral radius ρ(B_SOR) which is as small as reasonably possible. However, there is a set of matrices for which it is relatively easy to find the optimum ω.

Consider the linear system Ax = b and let A = D - L - U. If the eigenvalues of

    α D^(-1)L + (1/α) D^(-1)U ,    α ≠ 0 ,

are independent of α, then the matrix is said to be Consistently Ordered, and the optimum ω for the SOR iterative method is

    ω = 2 / (1 + √(1 - [ρ(B_J)]²)) .

Explanation
First, we note that for such a matrix, consistent ordering (the eigenvalues are the same for all α) implies that the eigenvalues of

    α D^(-1)L + (1/α) D^(-1)U

are the same as those of D^(-1)L + D^(-1)U = B_J, the Jacobi iteration matrix (i.e. put α = 1).

Now consider the eigenvalues λ of B_SOR. They satisfy the polynomial

    det(B_SOR - λI) = 0 ,

or

    det[ (D - ωL)^(-1)((1 - ω)D + ωU) - λI ] = 0 ,

and hence,

    det[(D - ωL)^(-1)] · det[ (1 - ω)D + ωU - λ(D - ωL) ] = 0 ,

where the first factor is non-zero, so the λ satisfy

    det[ (1 - ω - λ)D + ωU + λωL ] = 0 .

Since ω ≠ 0, the non-zero eigenvalues satisfy (dividing through by ω λ^(1/2))

    det[ ((1 - ω - λ)/(ω λ^(1/2))) D + λ^(-1/2) U + λ^(1/2) L ] = 0 ,

and thus, pre-multiplying by D^(-1),

    det[ λ^(1/2) D^(-1)L + λ^(-1/2) D^(-1)U - ((λ + ω - 1)/(ω λ^(1/2))) I ] = 0 .

When A is consistently ordered, the eigenvalues of

    λ^(1/2) D^(-1)L + λ^(-1/2) D^(-1)U

are the eigenvalues μ of B_J (take α = λ^(1/2)), so

    (λ + ω - 1)/(ω λ^(1/2)) = μ ,

and squaring,

    ω²μ²λ = λ² + 2λ(ω - 1) + (ω - 1)² ,

or,

    λ² + (2(ω - 1) - ω²μ²)λ + (ω - 1)² = 0 .
The eigenvalues of B_SOR are then given by

    λ = (ω²μ²/2) - (ω - 1) ± (1/2) √[ (2(ω - 1) - ω²μ²)² - 4(ω - 1)² ]
      = 1 - ω + (ω²μ²/2) ± ωμ √[ (1 - ω) + (ω²μ²/4) ] .

For each μ² there are two values of λ, and these may be real or complex. If they are complex (note this requires ω > 1), they form a conjugate pair whose product is the constant term of the quadratic, so

    |λ|² = (ω - 1)² ,   or   |λ| = ω - 1 .

Hence, for such ω,

    ρ(B_SOR) = ω - 1 .
For the fastest convergence we require ρ(B_SOR) to be as small as possible. It can be shown that the best outcome is to make the roots of the quadratic equal when μ = ρ(B_J), i.e. for the largest μ. This implies that the discriminant vanishes:

    (1 - ω) + ω²μ²/4 = 0 ,   i.e.   ω²μ² - 4ω + 4 = 0 .

Solving this quadratic for ω gives

    ω = [4 ± √(16 - 16μ²)] / (2μ²)
      = 2(1 ± √(1 - μ²)) / μ²
      = 2(1 ± √(1 - μ²))(1 ∓ √(1 - μ²)) / [μ² (1 ∓ √(1 - μ²))]
      = 2 / (1 ∓ √(1 - μ²)) .

We are looking for the smallest value of ω, and so we take the positive sign in the denominator. Hence, with μ = ρ(B_J), the best possible choice for ω is

    ω = 2 / (1 + √(1 - [ρ(B_J)]²)) .
Example 2.6.1: We again return to the linear system of Examples (2.2.1, 2.2.2, 2.3.1 & 2.4.1), show that its matrix is consistently ordered, and determine the optimum ω, and hence the fastest rate of convergence, for the SOR method. As before, we have

    A = [ 10  1 ] ,
        [  1 10 ]

so

    α D^(-1)L + (1/α) D^(-1)U = [    0      -1/(10α) ] .
                                [ -α/10        0     ]

The eigenvalues of this matrix satisfy

    det [  -λ      -1/(10α) ] = 0 ,
        [ -α/10      -λ     ]

i.e.

    λ² - 1/100 = 0 ,

which is independent of α, so A is consistently ordered. Moreover, μ² = [ρ(B_J)]² = 1/100, so

    ω = 2 / (1 + √(1 - 1/100))
      = 2 / (1 + √0.99)
      ≈ 1.0025126 ,

and the fastest achievable rate of convergence is ρ(B_SOR) = ω - 1 ≈ 0.0025126.
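As a final numerical check (plain Python; the quoted per-iteration error-reduction factors are those derived in this chapter):

```python
import math

rho_jacobi = 0.1  # spectral radius of B_J for this system
omega_opt = 2.0 / (1.0 + math.sqrt(1.0 - rho_jacobi ** 2))
print(omega_opt)        # ~ 1.0025126
print(omega_opt - 1.0)  # ~ 0.0025126 = rho(B_SOR), the SOR convergence rate

# Per-iteration error-reduction factors derived in this chapter:
#   Jacobi ~ 0.1, Gauss-Seidel ~ 0.01, optimally relaxed SOR ~ 0.0025.
```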