
Chapter 3

Linear Systems of Equations
The solution of linear systems of equations is an extremely important pro-
cess in scientific computing. Linear systems of equations directly serve as
mathematical models in many situations, while solution of linear systems of
equations is an important intermediary computation in the analysis of other
models, such as nonlinear systems of differential equations.
Example 3.1
Find x_1, x_2, and x_3 such that

    x_1 + 2x_2 + 3x_3 = -1,
    4x_1 + 5x_2 + 6x_3 = 0,
    7x_1 + 8x_2 + 10x_3 = 1.
This chapter deals with the analysis and approximate solution of such sys-
tems of equations with floating point arithmetic. We will study two direct
methods, Gaussian elimination (the LU decomposition) and the QR decomposition,
as well as iterative methods, such as the Gauss–Seidel method, for
solving such systems. (Computations in direct methods finish with a finite
number of operations, while iterative methods involve a limiting process, as
fixed point iteration does.) We will also study the singular value decomposi-
tion, a powerful technique for obtaining information about linear systems of
equations, the mathematical models that give rise to such linear systems, and
the effects of errors in the data on the solution of such systems.
The process of dealing with linear systems of equations comprises the subject
numerical linear algebra. Before studying the actual solution of linear
systems, we introduce (or review) the underlying, commonly used notation and
facts.
3.1 Matrices, Vectors, and Basic Properties
The coefficients of x_1, x_2, and x_3 in Example 3.1 can be written as an array
of numbers

    A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{pmatrix},
which we call a matrix. The horizontal lines of numbers are the rows, while
the vertical lines are the columns. In the example, we say “A is a 3 by 3
matrix,” meaning that it has 3 rows and 3 columns. (If a matrix B had two
rows and 5 columns, for example, we would say “B is a 2 by 5 matrix.”)
In numerical linear algebra, the variables x_1, x_2, and x_3 as in Example 3.1
are typically represented in a matrix

    x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}

with 3 rows and 1 column, as is the set of right members of the equations:

    b = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix};

x and b are called column vectors.
Often, the system of linear equations will have real coefficients, but the
coefficients will sometimes be complex. If the system has n variables in the
column vector x, and the variables are assumed to be real, we say that x ∈ R^n.
If B is an m by n matrix whose entries are real numbers, we say B ∈ R^{m×n}.
If the vector x has n complex coefficients, we say x ∈ C^n, and if an m by n matrix
B has complex coefficients, we say B ∈ C^{m×n}.
Systems such as in Example 3.1 can be written using matrices and vectors,
with the concept of matrix multiplication. We use upper case letters to denote
matrices and lower case letters without subscripts to denote vectors (which we
consider to be column vectors). We denote the element in the i-th row, j-th
column of a matrix A by a_{ij}, and we sometimes denote the entire matrix A
by (a_{ij}). The numbers that comprise the elements of matrices and vectors are
called scalars.
DEFINITION 3.1 If A = (a_{ij}), then A^T = (a_{ji}) and A^H = (\bar{a}_{ji}) denote
the transpose and conjugate transpose of A, respectively.
Example 3.2
On most computers in use today, the basic quantity in matlab is a matrix
whose entries are double precision floating point numbers according to the
IEEE 754 standard. Matrices are marked with square brackets “[” and “]”,
with commas or spaces separating the entries in a row and semicolons or the
end of a line separating the rows. The transpose of a matrix is obtained by
typing a single quotation mark (or apostrophe) after the matrix. Consider
the following matlab dialog.
>> A = [1 2 3;4 5 6;7 8 10]
A =
1 2 3
4 5 6
7 8 10
>> A’
ans =
1 4 7
2 5 8
3 6 10
>>
DEFINITION 3.2 If A is an m × n matrix and B is an n × p matrix,
then C = AB, where

    c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}

for i = 1, · · · , m, j = 1, · · · , p. Thus, C is an m × p matrix.
Example 3.3
Continuing the matlab dialog from Example 3.2, we have
>> B = [-1 0 1
2 3 4
3 2 1]
B =
-1 0 1
2 3 4
3 2 1
>> C = A*B
C =
12 12 12
24 27 30
39 44 49
>>
(If the reader or student is not already comfortable with matrix multiplica-
tion, we suggest confirming the above calculation by doing it with paper and
pencil.)
With matrix multiplication, we can write the linear system in Example 3.1
at the beginning of this chapter as

    Ax = b,   A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{pmatrix},
    x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix},
    b = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}.
Matrix multiplication can be easily described in terms of the dot product:
DEFINITION 3.3 Suppose we have two real vectors

    v = (v_1, v_2, · · · , v_n)^T   and   w = (w_1, w_2, · · · , w_n)^T.

Then the dot product v ◦ w, also written (v, w), of v and w is the matrix
product

    v^T w = \sum_{i=1}^{n} v_i w_i.

If the vectors have complex components, the dot product is defined to be

    v^H w = \sum_{i=1}^{n} \bar{v}_i w_i,

where \bar{v}_i is the complex conjugate of v_i.
Dot products can also be defined more generally and abstractly, and are
useful throughout pure and applied mathematics. However, our interest here
is the fact that many computations in scientific computing can be written in
terms of dot products, and most modern computers have special circuitry and
software to do dot products efficiently.
Example 3.4
matlab represents the transpose of a vector V as V’. Also, if A is an n by
n matrix in matlab, the i-th row of A is accessed as A(i,:), while the j-th
column of A is accessed as A(:,j). Continuing Example 3.3, we have the
following matlab dialog, illustrating writing the product matrix C in terms
of dot products.
>> C = [ A(1,:)*B(:,1), A(1,:)*B(:,2), A(1,:)*B(:,3)
A(2,:)*B(:,1), A(2,:)*B(:,2), A(2,:)*B(:,3)
A(3,:)*B(:,1), A(3,:)*B(:,2), A(3,:)*B(:,3)]
C =
12 12 12
24 27 30
39 44 49
>>
Matrix inverses are also useful in describing linear systems of equations:
DEFINITION 3.4 Suppose A is an n by n matrix. (That is, suppose
A is square.) Then A^{-1} is the inverse of A if A^{-1}A = AA^{-1} = I, where I
is the n by n identity matrix, consisting of 1's on the diagonal and 0's in all
off-diagonal elements. If A has an inverse, then A is said to be nonsingular
or invertible.
Example 3.5
Continuing the matlab dialog from the previous examples, we have
>> Ainv = inv(A)
Ainv =
-0.66667 -1.33333 1.00000
-0.66667 3.66667 -2.00000
1.00000 -2.00000 1.00000
>> Ainv*A
ans =
1.00000 0.00000 -0.00000
0.00000 1.00000 -0.00000
-0.00000 0.00000 1.00000
>> A*Ainv
ans =
1.0000e+00 4.4409e-16 -4.4409e-16
1.1102e-16 1.0000e+00 -1.1102e-15
3.3307e-16 2.2204e-15 1.0000e+00
>> eye(3)
ans =
1 0 0
0 1 0
0 0 1
>> eye(3)-A*inv(A)
ans =
1.1102e-16 -4.4409e-16 4.4409e-16
-1.1102e-16 -8.8818e-16 1.1102e-15
-3.3307e-16 -2.2204e-15 2.3315e-15
>> eye(3)*B-B
ans =
0 0 0
0 0 0
0 0 0
>>
Above, observe that I · B = B. Also observe that the computed value of
I − AA^{-1} is not exactly the matrix consisting entirely of zeros, but is a
matrix whose entries are small multiples of the machine epsilon for IEEE
double precision floating point arithmetic.
Some matrices do not have inverses.
DEFINITION 3.5 A matrix that does not have an inverse is called a
singular matrix. A matrix that does have an inverse is said to be non-singular.
Singular matrices are analogous to the number zero when we are dealing
with a single equation in a single unknown. In particular, if we have the
system of equations Ax = b and A is nonsingular, it follows that x = A^{-1}b
(since A^{-1}(Ax) = (A^{-1}A)x = Ix = x = A^{-1}b), just as if ax = b with a ≠ 0,
then x = (1/a)b.
Example 3.6
The matrix

    A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
is singular. However, if we use matlab to try to find an inverse, we obtain:
>> A = [1 2 3;4 5 6;7 8 9]
A =
1 2 3
4 5 6
7 8 9
>> inv(A)
ans =
1.0e+016 *
-0.4504 0.9007 -0.4504
0.9007 -1.8014 0.9007
-0.4504 0.9007 -0.4504
>>
Observe that the matrix matlab gives for the inverse has large elements (on
the order of the reciprocal of the machine epsilon ε_m ≈ 1.11 × 10^{-16} times
the elements of A). This is due to roundoff error. This can be viewed as
analogous to trying to form 1/a when a = 0 but, due to roundoff error (such
as some cancellation error), a is a small number, on the order of the machine
epsilon ε_m. We then have that 1/a is on the order of 1/ε_m.
The following two definitions and theorem clarify which matrices are sin-
gular and clarify the relationship between singular matrices and solution of
linear systems of equations involving those matrices.
DEFINITION 3.6 Let {v^{(i)}}_{i=1}^{m} be m vectors. Then {v^{(i)}}_{i=1}^{m} is said
to be linearly independent provided that, whenever \sum_{i=1}^{m} β_i v^{(i)} = 0, then
β_i = 0 for i = 1, 2, · · · , m.
Example 3.7
Let

    a_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix},
    a_2 = \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix},
    and
    a_3 = \begin{pmatrix} 7 \\ 8 \\ 9 \end{pmatrix}

be the rows of the matrix A from Example 3.6 (expressed as column vectors).
Then

    a_1 − 2a_2 + a_3 = 0,

so a_1, a_2, and a_3 are linearly dependent. In particular, the third row of A is
two times the second row minus the first row.
DEFINITION 3.7 The rank of a matrix A, rank(A), is the maximum
number of linearly independent rows it possesses. It can be shown that this is
the same as the maximum number of linearly independent columns. If A is an
m by n matrix and rank(A) = min{m, n}, then A is said to be of full rank.
For example, if m < n and the rows of A are linearly independent, then A is
of full rank.
The following theorem deals with rank, nonsingularity, and solutions to
systems of equations.
THEOREM 3.1
Let A be an n × n matrix (A ∈ L(C^n)). Then the following are equivalent:
1. A is nonsingular.
2. det(A) ≠ 0, where det(A) is the determinant¹ of the matrix A.
3. The linear system Ax = 0 has only the solution x = 0.
4. For any b ∈ C^n, the linear system Ax = b has a unique solution.
5. The columns (and rows) of A are linearly independent. (That is, A is
   of full rank, i.e., rank(A) = n.)
When the matrices for a system of equations have special properties, we
can often use these properties to take short cuts in the computation to solve
corresponding systems of equations, or to know that roundoff error will not
accumulate when solving such systems. Symmetry and positive definiteness
are important properties for these purposes.
DEFINITION 3.8 If A^T = A, then A is said to be symmetric. If
A^H = A, then A is said to be Hermitian.
Example 3.8
If A = \begin{pmatrix} 1 & 2-i \\ 2+i & 3 \end{pmatrix}, then
A^H = \begin{pmatrix} 1 & 2-i \\ 2+i & 3 \end{pmatrix} = A, so A is Hermitian.
DEFINITION 3.9 If A is an n by n matrix with real entries, if A^T = A
and x^T A x > 0 for any x ∈ R^n except x = 0, then A is said to be symmetric
positive definite. If A is an n by n matrix with complex entries, if A^H = A
and x^H A x > 0 for x ∈ C^n, x ≠ 0, then A is said to be Hermitian positive
definite. Similarly, if x^T A x ≥ 0 (for a real matrix A) or x^H A x ≥ 0 (for
a complex matrix A) for every x ≠ 0, we say that A is symmetric positive
semi-definite or Hermitian positive semi-definite, respectively.
¹We will not give a formal definition of determinants here, but we will use their
properties. Determinants are generally defined well in a good linear algebra course. We
explain a good way of computing determinants in Section 3.2.3 on page 86. When computing
the determinant of a small matrix symbolically, expansion by minors is often used.
Example 3.9
If A = \begin{pmatrix} 4 & 1 \\ 1 & 3 \end{pmatrix}, then A^T = A, so A is symmetric.
Also,

    x^T A x = 4x_1^2 + 2x_1 x_2 + 3x_2^2 = 3x_1^2 + (x_1 + x_2)^2 + 2x_2^2 > 0   for x ≠ 0.

Thus, A is symmetric positive definite.
Prior to studying actual methods for analyzing systems of linear equations,
we introduce the following concepts.
DEFINITION 3.10 If v = (v_1, . . . , v_n)^T is a vector and λ is a number,
we define scalar multiplication w = λv by w_i = λv_i, that is, we multiply each
component of v by λ. We say that we have scaled v by λ. We can similarly
scale a matrix.
Example 3.10
Observe the following matlab dialog.
>> v = [1;-1;2]
v =
1
-1
2
>> lambda = 3
lambda =
3
>> lambda*v
ans =
3
-3
6
>>
DEFINITION 3.11 If A is an n×n matrix, a scalar λ and a nonzero x
are an eigenvalue and eigenvector of A if Ax = λx.
DEFINITION 3.12 ρ(A) = max_{1≤i≤n} |λ_i|, where {λ_i}_{i=1}^{n} is the set of
eigenvalues of A, is called the spectral radius of A.
Example 3.11
The matlab function eig computes eigenvalues and eigenvectors. Consider
the following matlab dialog.
>> A = [1,2,3
4 5 6
7 8 10]
A =
1 2 3
4 5 6
7 8 10
>> [V,Lambda] = eig(A)
V =
-0.2235 -0.8658 0.2783
-0.5039 0.0857 -0.8318
-0.8343 0.4929 0.4802
Lambda =
16.7075 0 0
0 -0.9057 0
0 0 0.1982
>> A*V(:,1) - Lambda(1,1)*V(:,1)
ans =
1.0e-014 *
0.0444
-0.1776
0.1776
>> A*V(:,2) - Lambda(2,2)*V(:,2)
ans =
1.0e-014 *
0.0777
0.1985
0.0944
>> A*V(:,3) - Lambda(3,3)*V(:,3)
ans =
1.0e-014 *
0.0666
-0.0444
0.2109
>>
Note that the eigenvectors of the matrix A are stored in the columns of V, while
corresponding eigenvalues are stored in the diagonal entries of the diagonal
matrix Lambda. In this case, the spectral radius is ρ(A) ≈ 16.7075.
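The spectral radius itself can be computed directly from the vector of eigenvalues
returned by eig; a minimal one-line check with the same matrix A as above is

>> rho = max(abs(eig(A)))   % spectral radius; approximately 16.7075 for this A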
Although we won’t study computation of eigenvalues and eigenvectors until
Chapter 5, we refer to the concept in this chapter.
With these facts and concepts, we can now study the actual solution of
systems of equations on computers.
3.2 Gaussian Elimination
We can think of Gaussian elimination as a process of repeatedly adding
a multiple of one equation to another equation to transform the system of
equations into one that is easy to solve. We first focus on these elementary
row operations.
DEFINITION 3.13 Consider a linear system of equations Ax = b, where
A is n × n and b, x ∈ R^n. Elementary row operations on a system of linear
equations are of the following three types:
1. interchanging two equations,
2. multiplying an equation by a nonzero number,
3. adding to one equation a scalar multiple of another equation.
THEOREM 3.2
If system Bx = d is obtained from system Ax = b by a finite sequence of
elementary operations, then the two systems have the same solutions.
(A proof of Theorem 3.2 can be found in elementary texts on linear alge-
bra and can be done, for example, with Theorem 3.1 and using elementary
properties of determinants.)
The idea underlying Gaussian elimination is simple:
1. Subtract multiples of the first equation from the second through the
   n-th equations to eliminate x_1 from the second through n-th equations.
2. Then, subtract multiples of the new second equation from the third
   through n-th equations to eliminate x_2 from these. After this step, the
   third through n-th equations contain neither x_1 nor x_2.
3. Continue this process until the resulting n-th equation contains only x_n,
   the resulting (n−1)-st equation contains only x_n and x_{n−1}, etc.
4. Solve the resulting n-th equation for x_n.
5. Plug the value for x_n into the resulting (n−1)-st equation, and solve
   that equation for x_{n−1}.
6. Continue this back-substitution process until we have solved for x_1 in
   the first equation.
Example 3.12
We will apply this process to the system in Example 3.1. In illustrating
the process, we can write the original system and transformed systems as an
augmented matrix, with a number's position in the matrix telling us to which
variable (or right-hand side) and which equation it belongs. The original
system is thus written as

    \left(\begin{array}{rrr|r} 1 & 2 & 3 & -1 \\ 4 & 5 & 6 & 0 \\ 7 & 8 & 10 & 1 \end{array}\right).

We will use ∼ to denote that two systems of equations are equivalent, and we
will indicate below this symbol which multiples are subtracted: For example,
R_3 ← R_3 − 2R_2 would mean that we replace the third row (i.e., the third
equation) by the third equation minus two times the second equation. The
Gaussian elimination process then proceeds as follows.

    \left(\begin{array}{rrr|r} 1 & 2 & 3 & -1 \\ 4 & 5 & 6 & 0 \\ 7 & 8 & 10 & 1 \end{array}\right)
    \underset{\substack{R_2 \leftarrow R_2 - 4R_1 \\ R_3 \leftarrow R_3 - 7R_1}}{\sim}
    \left(\begin{array}{rrr|r} 1 & 2 & 3 & -1 \\ 0 & -3 & -6 & 4 \\ 0 & -6 & -11 & 8 \end{array}\right)
    \underset{R_3 \leftarrow R_3 - 2R_2}{\sim}
    \left(\begin{array}{rrr|r} 1 & 2 & 3 & -1 \\ 0 & -3 & -6 & 4 \\ 0 & 0 & 1 & 0 \end{array}\right).
The transformed third equation now reads "x_3 = 0," while the transformed
second equation reads "−3x_2 − 6x_3 = 4." Plugging x_3 = 0 into the trans-
formed second equation thus gives

    x_2 = (4 + 6 x_3)/(−3) = 4/(−3) = −4/3.

Similarly, plugging x_3 = 0 and x_2 = −4/3 into the transformed first equation
gives

    x_1 = −1 − 2x_2 − 3x_3 = −1 − 2(−4/3) = 5/3.

The solution vector is thus

    \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 5/3 \\ -4/3 \\ 0 \end{pmatrix}.

We check by computing the residual:

    Ax − b = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{pmatrix}
             \begin{pmatrix} 5/3 \\ -4/3 \\ 0 \end{pmatrix}
             − \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}
           = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} − \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}
           = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
Example 3.13
If we had used floating point arithmetic in Example 3.12, 5/3 and −4/3
would not have been exactly representable, and the residual would not have
been exactly zero. In fact, a variant² of Gaussian elimination with back-
substitution is programmed in matlab and accessible with the backslash (\)
operator:
>> A = [1 2 3
4 5 6
7 8 10]
A =
1 2 3
4 5 6
7 8 10
>> b = [-1;0;1]
b =
-1
0
1
>> x = A\b
x =
1.6667
-1.3333
-0.0000
>> A*x-b
ans =
1.0e-015 *
0.2220
0.8882
0.8882
>>
3.2.1 The Gaussian Elimination Algorithm
Following the pattern in the examples we have presented, we can write down
the process in general. The system will be written

    a_{11} x_1 + a_{12} x_2 + · · · + a_{1n} x_n = b_1,
    a_{21} x_1 + a_{22} x_2 + · · · + a_{2n} x_n = b_2,
        ...
    a_{n1} x_1 + a_{n2} x_2 + · · · + a_{nn} x_n = b_n.

²using partial pivoting, which we will see later
Now, the transformed matrix

    \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & 0 & 1 \end{pmatrix}

from Example 3.12 (page 80) is termed an upper triangular matrix, since it has
zeros in all entries below the diagonal. The goal of Gaussian elimination is to
reduce A to an upper triangular matrix through a sequence of elementary row
operations as in Definition 3.13. We will call the transformed matrix before
working on the r-th column A^{(r)}, with associated right-hand-side vector b^{(r)},
and we begin with A^{(1)} = A = (a^{(1)}_{ij}) and b = b^{(1)} = (b^{(1)}_1, b^{(1)}_2, · · · , b^{(1)}_n)^T,
with A^{(1)} x = b^{(1)}. The process can then be described as follows.
Step 1: Assume that a^{(1)}_{11} ≠ 0. (Otherwise, the nonsingularity of A guaran-
   tees that the rows of A can be interchanged in such a way that the new
   a^{(1)}_{11} is nonzero.) Let

       m_{i1} = a^{(1)}_{i1} / a^{(1)}_{11},   2 ≤ i ≤ n.

   Now multiply the first equation of A^{(1)} x = b^{(1)} by m_{i1} and subtract the
   result from the i-th equation. Repeat this for each i, 2 ≤ i ≤ n. As a
   result, we obtain A^{(2)} x = b^{(2)}, where

       A^{(2)} = \begin{pmatrix}
       a^{(1)}_{11} & a^{(1)}_{12} & \cdots & a^{(1)}_{1n} \\
       0 & a^{(2)}_{22} & \cdots & a^{(2)}_{2n} \\
       \vdots & \vdots & & \vdots \\
       0 & a^{(2)}_{n2} & \cdots & a^{(2)}_{nn}
       \end{pmatrix}
       and
       b^{(2)} = \begin{pmatrix} b^{(1)}_1 \\ b^{(2)}_2 \\ \vdots \\ b^{(2)}_n \end{pmatrix}.
Step 2: We consider the (n−1) × (n−1) submatrix Ã^{(2)} of A^{(2)} defined by
   Ã^{(2)} = (a^{(2)}_{ij}), 2 ≤ i, j ≤ n. We eliminate the first column of Ã^{(2)}
   in a manner identical to the procedure for A^{(1)}. The result is the system
   A^{(3)} x = b^{(3)}, where A^{(3)} has the form

       A^{(3)} = \begin{pmatrix}
       a^{(1)}_{11} & a^{(1)}_{12} & \cdots & \cdots & a^{(1)}_{1n} \\
       0 & a^{(2)}_{22} & \cdots & \cdots & a^{(2)}_{2n} \\
       \vdots & 0 & a^{(3)}_{33} & \cdots & a^{(3)}_{3n} \\
       \vdots & \vdots & \vdots & & \vdots \\
       0 & 0 & a^{(3)}_{n3} & \cdots & a^{(3)}_{nn}
       \end{pmatrix}.
Steps 3 to n−1: The process continues as above, where at the k-th stage
   we have A^{(k)} x = b^{(k)}, 1 ≤ k ≤ n−1, where A^{(k)} has zeros below the
   diagonal in its first k−1 columns:

       A^{(k)} = \begin{pmatrix}
       a^{(1)}_{11} & a^{(1)}_{12} & \cdots & \cdots & \cdots & a^{(1)}_{1n} \\
       0 & a^{(2)}_{22} & \cdots & \cdots & \cdots & a^{(2)}_{2n} \\
       \vdots & \ddots & \ddots & & & \vdots \\
       0 & \cdots & 0 & a^{(k)}_{kk} & \cdots & a^{(k)}_{kn} \\
       \vdots & & \vdots & \vdots & & \vdots \\
       0 & \cdots & 0 & a^{(k)}_{nk} & \cdots & a^{(k)}_{nn}
       \end{pmatrix}
       and
       b^{(k)} = \begin{pmatrix} b^{(1)}_1 \\ b^{(2)}_2 \\ \vdots \\ b^{(k-1)}_{k-1} \\ b^{(k)}_k \\ \vdots \\ b^{(k)}_n \end{pmatrix}.      (3.1)

   For every i, k+1 ≤ i ≤ n, the k-th equation is multiplied by

       m_{ik} = a^{(k)}_{ik} / a^{(k)}_{kk}

   and subtracted from the i-th equation. (We assume that, if necessary, a row
   is interchanged so that a^{(k)}_{kk} ≠ 0.) After step k = n−1, the resulting
   system is A^{(n)} x = b^{(n)}, where A^{(n)} is upper triangular.
On a computer, this algorithm can be programmed as:
ALGORITHM 3.1
(Gaussian elimination, forward phase)
INPUT: the n by n matrix A and the n-vector b ∈ R^n.
OUTPUT: A^{(n)} and b^{(n)}.
FOR k = 1, 2, · · · , n−1
   FOR i = k+1, · · · , n
      (a) m_{ik} ← a_{ik}/a_{kk}.
      (b) FOR j = k, k+1, · · · , n
             a_{ij} ← a_{ij} − m_{ik} a_{kj}.
          END FOR
      (c) b_i ← b_i − m_{ik} b_k.
   END FOR
END FOR
END ALGORITHM 3.1.
Note: In Step (b) of Algorithm 3.1, we need only do the loop for j = k + 1
to n, since we know that each resulting a_{ik} will equal 0.
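For readers who wish to experiment, the forward phase can be written as a short
matlab function. The following is only a sketch of Algorithm 3.1 (the function
name gauss_forward is our own choice, not a built-in routine), with the note
above incorporated by updating only columns k+1 through n:

function [A, b] = gauss_forward(A, b)
% Forward phase of Gaussian elimination (no pivoting); a sketch of Algorithm 3.1.
% Overwrites A with the upper triangular matrix A^(n) and b with b^(n).
n = length(b);
for k = 1:n-1
    for i = k+1:n
        m = A(i,k) / A(k,k);                    % multiplier m_ik
        A(i,k) = 0;                             % entry below the pivot becomes zero
        A(i,k+1:n) = A(i,k+1:n) - m*A(k,k+1:n); % update the rest of row i
        b(i) = b(i) - m*b(k);                   % update the right-hand side
    end
end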
Back solving can be programmed as:
ALGORITHM 3.2
(Gaussian elimination, back solving phase)
INPUT: A^{(n)} and b^{(n)} from Algorithm 3.1.
OUTPUT: x ∈ R^n as a solution to Ax = b.
1. x_n ← b_n / a_{nn}.
2. FOR k = n−1, n−2, · · · , 1

      x_k ← ( b_k − \sum_{j=k+1}^{n} a_{kj} x_j ) / a_{kk}.

   END FOR
END ALGORITHM 3.2.
Note: To solve Ax = b using Gaussian elimination requires n³/3 + O(n²)
multiplications and divisions. (See Exercise 4 on page 142.)
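A corresponding sketch of Algorithm 3.2 (again, back_solve is only an
illustrative name):

function x = back_solve(A, b)
% Back substitution for an upper triangular system A x = b; a sketch of Algorithm 3.2.
n = length(b);
x = zeros(n,1);
x(n) = b(n) / A(n,n);
for k = n-1:-1:1
    x(k) = (b(k) - A(k,k+1:n)*x(k+1:n)) / A(k,k);  % subtract known terms, divide by pivot
end

For the system of Example 3.1, [A1,b1] = gauss_forward(A,b) followed by
x = back_solve(A1,b1) should reproduce x ≈ (5/3, −4/3, 0)^T from Example 3.12,
up to roundoff.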
3.2.2 The LU decomposition
We now explain how the Gaussian elimination process we have just pre-
sented can be viewed as finding a lower triangular matrix L (i.e., a matrix
with zeros above the diagonal) and an upper triangular matrix U such that
A = LU. Assume first that no row interchanges are performed in Gaussian
elimination. Let

    M^{(1)} = \begin{pmatrix}
    1 & 0 & \cdots & 0 \\
    -m_{21} & & & \\
    -m_{31} & & I_{n-1} & \\
    \vdots & & & \\
    -m_{n1} & & &
    \end{pmatrix},

where I_{n-1} is the (n−1) × (n−1) identity matrix, and where the m_{i1}, 2 ≤ i ≤ n, are
defined in Gaussian elimination. Then A^{(2)} = M^{(1)} A^{(1)} and b^{(2)} = M^{(1)} b^{(1)}.
At the r-th stage of the Gaussian elimination process,

    M^{(r)} = \begin{pmatrix}
    I_{r-1} & 0 & 0 & \cdots & 0 \\
    0 & 1 & 0 & \cdots & 0 \\
    0 & -m_{r+1,r} & & & \\
    \vdots & \vdots & & I_{n-r} & \\
    0 & -m_{n,r} & & &
    \end{pmatrix},      (3.2)

where the 1 appears in the r-th row and r-th column. Also,

    (M^{(r)})^{-1} = \begin{pmatrix}
    I_{r-1} & 0 & 0 & \cdots & 0 \\
    0 & 1 & 0 & \cdots & 0 \\
    0 & m_{r+1,r} & & & \\
    \vdots & \vdots & & I_{n-r} & \\
    0 & m_{n,r} & & &
    \end{pmatrix},      (3.3)

where the m_{ir}, r+1 ≤ i ≤ n, are given in the Gaussian elimination process, and
A^{(r+1)} = M^{(r)} A^{(r)} and b^{(r+1)} = M^{(r)} b^{(r)}. (Note: We are assuming here that
a^{(r)}_{rr} ≠ 0 and no row interchanges are required.) Collecting the above results,
we obtain A^{(n)} x = b^{(n)}, where

    A^{(n)} = M^{(n-1)} M^{(n-2)} · · · M^{(1)} A^{(1)}   and   b^{(n)} = M^{(n-1)} M^{(n-2)} · · · M^{(1)} b^{(1)}.

Recalling that A^{(n)} is upper triangular and setting A^{(n)} = U, we have

    A = (M^{(n-1)} M^{(n-2)} · · · M^{(1)})^{-1} U.      (3.4)
Example 3.14
Following the Gaussian elimination process from Example 3.12, we have

    M^{(1)} = \begin{pmatrix} 1 & 0 & 0 \\ -4 & 1 & 0 \\ -7 & 0 & 1 \end{pmatrix},
    M^{(2)} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{pmatrix},

    (M^{(1)})^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 7 & 0 & 1 \end{pmatrix},
    (M^{(2)})^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{pmatrix},

and A = LU, with

    L = (M^{(1)})^{-1} (M^{(2)})^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 7 & 2 & 1 \end{pmatrix},
    U = \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & 0 & 1 \end{pmatrix}.

Applying the Gaussian elimination process to b can be viewed as solving
Ly = b for y, then solving Ux = y for x. Solving Ly = b involves forming
M^{(1)} b, then forming M^{(2)}(M^{(1)} b), while solving Ux = y is simply the back-
substitution process.
Note: The product of two lower triangular matrices is lower triangular, and
the inverse of a nonsingular lower triangular matrix is lower triangular. Thus,
L = (M^{(1)})^{-1} (M^{(2)})^{-1} · · · (M^{(n-1)})^{-1} is lower triangular. Hence, A = LU,
i.e., A is expressed as a product of lower and upper triangular matrices. The
result is called the LU decomposition (also known as the LU factorization,
triangular factorization, or triangular decomposition) of A. The final matrices
L and U are given by:

    L = \begin{pmatrix}
    1 & 0 & \cdots & & 0 \\
    m_{21} & 1 & 0 & \cdots & 0 \\
    m_{31} & m_{32} & 1 & \cdots & 0 \\
    \vdots & \vdots & & \ddots & \vdots \\
    m_{n1} & m_{n2} & \cdots & m_{n,n-1} & 1
    \end{pmatrix}
    and
    U = \begin{pmatrix}
    a^{(1)}_{11} & a^{(1)}_{12} & \cdots & a^{(1)}_{1n} \\
    0 & a^{(2)}_{22} & \cdots & a^{(2)}_{2n} \\
    \vdots & & \ddots & \vdots \\
    0 & \cdots & 0 & a^{(n)}_{nn}
    \end{pmatrix}.      (3.5)
This decomposition can be so formed when no row interchanges are required.
Thus, the original problem Ax = b is transformed into LUx = b.
Note: Since computing the LU decomposition of A is done by Gaussian
elimination, it requires O(n³) operations. However, if L and U are already
available, computing y with Ly = b and then computing x with Ux = y requires
only O(n²) operations.
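In matlab, once L and U are in hand, the two triangular solves can each be done
with the backslash operator, which recognizes triangular coefficient matrices. A
small illustration with L, U, and b from Examples 3.13 and 3.14 (the result should
agree with A\b there, up to roundoff):

>> L = [1 0 0; 4 1 0; 7 2 1];  U = [1 2 3; 0 -3 -6; 0 0 1];
>> b = [-1; 0; 1];
>> y = L\b;      % forward solve  L y = b
>> x = U\y       % back solve     U x = y;  x is approximately (5/3, -4/3, 0)'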
Note: In some software, the multiplying factors, that is, the nonzero off-
diagonal elements of L, are stored in the locations of corresponding entries
of A that are made equal to zero, thus obviating the need for extra storage.
Effectively, such software returns the elements of L and U in the same array
that was used to store A.
3.2.3 Determinants and Inverses
Usually, the solution x of a system of equations Ax = b is desired, and the
determinant det(A) is not of interest, even though one method of computing
x, Cramer’s rule, involves first computing determinants. (In fact, computing
x with Gaussian elimination with back substitution is more efficient than
using Cramer’s rule, and is definitely more practical for large n.) However,
occasionally the determinant of A is desired for other reasons. An efficient
way of computing the determinant of a matrix is with Gaussian elimination.
If A = LU, then

    det(A) = det(L) det(U) = \prod_{j=1}^{n} a^{(j)}_{jj}.

(Using expansion by minors to compute the determinant requires O(n!) mul-
tiplications.)
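In matlab, once the factors are known (here, the factor U from Example 3.14,
where no row interchanges were needed), this formula is a one-line computation;
prod forms the product of the entries of a vector:

>> U = [1 2 3; 0 -3 -6; 0 0 1];   % U from Example 3.14
>> prod(diag(U))                  % gives det(A) = -3, as computed by hand in Example 3.15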
Similarly, even though we could in principle compute A^{-1}, then compute
x = A^{-1}b, computing A^{-1} is less efficient than applying Gaussian elimination
with back-substitution. However, if we need A^{-1} for some other reason, we
can compute it relatively efficiently by solving the n systems A x^{(j)} = e^{(j)},
where e^{(j)}_i = δ_{ij}, and δ_{ij} is the Kronecker delta function defined by

    δ_{ij} = 1 if i = j,   δ_{ij} = 0 if i ≠ j.

If A = LU, we perform n pairs of forward and backward solves, to obtain

    A^{-1} = (x^{(1)}, x^{(2)}, · · · , x^{(n)}).
Example 3.15
In Example 3.14, for

    A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{pmatrix},

we used Gaussian elimination to obtain A = LU, with

    L = \begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 7 & 2 & 1 \end{pmatrix}
    and
    U = \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & 0 & 1 \end{pmatrix}.

Thus,

    det(A) = u_{11} u_{22} u_{33} = (1)(−3)(1) = −3.

We now compute A^{-1}. Using L and U to solve

    A x^{(1)} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
    gives x^{(1)} = \begin{pmatrix} -2/3 \\ -2/3 \\ 1 \end{pmatrix},

solving

    A x^{(2)} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
    gives x^{(2)} = \begin{pmatrix} -4/3 \\ 11/3 \\ -2 \end{pmatrix},

and solving

    A x^{(3)} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
    gives x^{(3)} = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}.

Thus,

    A^{-1} = \begin{pmatrix} -2/3 & -4/3 & 1 \\ -2/3 & 11/3 & -2 \\ 1 & -2 & 1 \end{pmatrix},
    A A^{-1} = I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
3.2.4 Pivoting in Gaussian Elimination
In our explanation and examples of Gaussian elimination so far, we have
assumed that "no row interchanges are required." In particular, we must have
a_{kk} ≠ 0 in each step of Algorithm 3.1. Otherwise, we may need to do a "row
interchange," that is, we may need to rearrange the order of the transformed
equations. We have two questions:
1. When can Gaussian elimination be performed without row interchanges?
2. If row interchanges are employed, can Gaussian elimination always be
   performed?
THEOREM 3.3
(Existence of an LU factorization) Assume that the n × n matrix A is nonsingular.
Then A = LU if and only if all the leading principal submatrices of A are
nonsingular.³ Moreover, the LU decomposition is unique if we require that
the diagonal elements of L all equal 1.

REMARK 3.1 Two important types of matrices that have nonsingular
leading principal submatrices are symmetric positive definite matrices and
strictly diagonally dominant matrices, i.e., matrices for which

    |a_{ii}| > \sum_{j=1, j≠i}^{n} |a_{ij}|,   for i = 1, 2, · · · , n.
We now consider our second question, “If row interchanges are employed, can
Gaussian elimination be performed for any nonsingular A?” Switching the
rows of a matrix A can be done by multiplying A on the left by a permutation
matrix:
DEFINITION 3.14 A permutation matrix P is a matrix whose columns
consist of the n different vectors e_j, 1 ≤ j ≤ n, in any order.
³The leading principal submatrices of A have the form

    \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & \ddots & \vdots \\ a_{k1} & \cdots & a_{kk} \end{pmatrix}   for k = 1, 2, · · · , n.
Example 3.16

    P = (e_1, e_3, e_4, e_2) = \begin{pmatrix}
    1 & 0 & 0 & 0 \\
    0 & 0 & 0 & 1 \\
    0 & 1 & 0 & 0 \\
    0 & 0 & 1 & 0
    \end{pmatrix}

is a permutation matrix such that the first row of PA is the first row of A, the
second row of PA is the fourth row of A, the third row of PA is the second
row of A, and the fourth row of PA is the third row of A. Note that the
permutation of the columns of the identity matrix in P corresponds to the
permutation of the rows of A. For example,

    \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
    \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{pmatrix}
    =
    \begin{pmatrix} 4 & 5 & 6 \\ 7 & 8 & 10 \\ 1 & 2 & 3 \end{pmatrix}.
Thus, by proper choice of P, any two or more rows can be interchanged.
Note: det P = ±1, since P is obtained from I by row interchanges.
Now, Gaussian elimination with row interchanges can be performed by the
following matrix operations:⁴

    A^{(n)} = M^{(n-1)} P^{(n-1)} M^{(n-2)} P^{(n-2)} · · · M^{(2)} P^{(2)} M^{(1)} P^{(1)} A,
    b^{(n)} = M^{(n-1)} P^{(n-1)} · · · M^{(2)} P^{(2)} M^{(1)} P^{(1)} b.

It follows that U = L̂ A^{(1)}, where L̂ is no longer lower triangular. However, if
we perform all the row interchanges first, at once, then

    M^{(n-1)} · · · M^{(1)} P A x = M^{(n-1)} M^{(n-2)} · · · M^{(1)} P b,

or

    L̃ P A x = L̃ P b,   so   L̃ P A = U.

Thus,

    PA = L̃^{-1} U = LU.

We can state these facts as follows.

⁴When implementing Gaussian elimination, we usually don't actually multiply full n by
n matrices together, since this is not efficient. However, viewing the process as matrix
multiplications has advantages when we analyze it.
THEOREM 3.4
If A is a nonsingular n × n matrix, then there is a permutation matrix P
such that PA = LU, where L is lower triangular and U is upper triangular.
(Note: det(PA) = ±det(A) = det(L) det(U).)
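matlab's lu function returns this factorization directly, including the permutation
matrix. For example, with the matrix A of Example 3.1:

>> A = [1 2 3; 4 5 6; 7 8 10];
>> [L, U, P] = lu(A);       % P*A = L*U, computed with partial pivoting
>> norm(P*A - L*U)          % should be zero up to roundoff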
We now examine the actual operations we do to complete the Gaussian
elimination process with row interchanges (known as pivoting).
Example 3.17
Consider the system

    0.0001 x_1 + x_2 = 1,
           x_1 + x_2 = 2.

The exact solution of this system is x_1 ≈ 1.00010 and x_2 ≈ 0.99990. Let
us solve the system using Gaussian elimination without row interchanges.
We will assume calculations are performed using three-digit rounding decimal
arithmetic. We obtain

    m_{21} ← a^{(1)}_{21} / a^{(1)}_{11} ≈ 0.1 × 10^5,
    a^{(2)}_{22} ← a^{(1)}_{22} − m_{21} a^{(1)}_{12} ≈ 0.1 × 10^1 − 0.1 × 10^5 ≈ −0.100 × 10^5.

Also, b^{(2)} ≈ (0.1 × 10^1, −0.1 × 10^5)^T, so the computed (approximate) upper
triangular system is

    0.1 × 10^{-3} x_1 + 0.1 × 10^1 x_2 = 0.1 × 10^1,
                      −0.1 × 10^5 x_2 = −0.1 × 10^5,

whose solutions are x_2 = 1 and x_1 = 0. If instead we first interchange the
equations so that a^{(1)}_{11} = 1, we find that x_1 = x_2 = 1, correct to the accuracy
used.
Example 3.17 illustrates that small values of a^{(r)}_{rr} in the r-th stage lead to
large values of the m_{ir}'s and may result in a loss of accuracy. Therefore, we
want the pivots a^{(r)}_{rr} to be large.
Two common pivoting strategies are:
Partial pivoting: In partial pivoting, the elements a^{(r)}_{ir} for r ≤ i ≤ n in the
   r-th column of A^{(r)} are searched to find the element of largest absolute
   value, and row interchanges are made to place that element in the pivot
   position.

Full pivoting: In full pivoting, the pivot element is selected as the element
   a^{(r)}_{ij}, r ≤ i, j ≤ n, of maximum absolute value among all elements of
   the (n−r) × (n−r) submatrix of A^{(r)}. This strategy requires row and
   column interchanges.

In theory, full pivoting is required in general to assure that the process does
not result in excessive roundoff error. However, partial pivoting is adequate
in most cases. For some classes of matrices, no pivoting strategy is required
for a stable elimination procedure. For example, no pivoting is required for a
real symmetric positive definite matrix or for a strictly diagonally dominant
matrix [41].
We now present a formal algorithm for Gaussian elimination with partial
pivoting. In reading this algorithm, recall that

    a_{11} x_1 + a_{12} x_2 + · · · + a_{1n} x_n = b_1,
    a_{21} x_1 + a_{22} x_2 + · · · + a_{2n} x_n = b_2,
        ...
    a_{n1} x_1 + a_{n2} x_2 + · · · + a_{nn} x_n = b_n.
ALGORITHM 3.3
(Solution of a linear system of equations with Gaussian elimination with par-
tial pivoting and back-substitution)
INPUT: The n by n matrix A and right-hand-side vector b.
OUTPUT: An approximate solution⁵ x to Ax = b.
FOR k = 1, 2, · · · , n−1
   1. Find ℓ such that |a_{ℓk}| = max_{k≤j≤n} |a_{jk}|   (k ≤ ℓ ≤ n).
   2. Interchange row k with row ℓ: for j = 1, 2, . . . , n,
         c_j ← a_{kj},  a_{kj} ← a_{ℓj},  a_{ℓj} ← c_j,
      and
         d ← b_k,  b_k ← b_ℓ,  b_ℓ ← d.
   3. FOR i = k+1, · · · , n
         (a) m_{ik} ← a_{ik}/a_{kk}.
         (b) FOR j = k, k+1, · · · , n
                a_{ij} ← a_{ij} − m_{ik} a_{kj}.
             END FOR
         (c) b_i ← b_i − m_{ik} b_k.
      END FOR
END FOR
4. Back-substitution:
   (a) x_n ← b_n/a_{nn}, and
   (b) x_k ← ( b_k − \sum_{j=k+1}^{n} a_{kj} x_j ) / a_{kk}   for k = n−1, n−2, · · · , 1.
END ALGORITHM 3.3.

⁵approximate because of roundoff error
REMARK 3.2 In Algorithm 3.3, the computations are arranged "serially,"
that is, they are arranged so each individual addition and multiplication
is done separately. However, on modern machines, which have "pipelined"
operations and usually also more than one processor, it is efficient to
think of the operations as being done on vectors. Furthermore, we don't
necessarily need to interchange entire rows, but can just keep track of a set
of indices indicating which rows are interchanged; for large systems, this saves
a significant number of storage and retrieval operations. For views of the Gaussian
elimination process in terms of vector operations, see [16]. For an example of
software that takes account of the way machines are built, see [5].
REMARK 3.3 If U is the upper triangular matrix resulting from Gaus-
sian elimination with partial pivoting, we have

    det(A) = (−1)^K det(U) = (−1)^K a^{(1)}_{11} a^{(2)}_{22} · · · a^{(n)}_{nn},

where K is the number of row interchanges made.
3.2.5 Systems with a Special Structure
We now consider some special but commonly encountered kinds of matrices.
3.2.5.1 Symmetric, Positive Definite Matrices
We first characterize positive definite matrices.
THEOREM 3.5
Let A be a real symmetric n × n matrix. Then A is positive definite if and
only if there exists an invertible lower triangular matrix L such that A = LL^T.
Furthermore, we can choose the diagonal elements ℓ_{ii}, 1 ≤ i ≤ n, of L to be
positive numbers.
The decomposition with positive ℓ_{ii} is called the Cholesky factorization of
A. It can be shown that this decomposition is unique. L can be computed
using a variant of Gaussian elimination. Set ℓ_{11} = \sqrt{a_{11}} and ℓ_{j1} = a_{j1}/\sqrt{a_{11}}
for 2 ≤ j ≤ n. (Note that x^T A x > 0, and the choice x = e_j implies that
a_{jj} > 0.) Then, for i = 1, 2, 3, · · · , n, set

    ℓ_{ii} = ( a_{ii} − \sum_{k=1}^{i-1} (ℓ_{ik})^2 )^{1/2},

    ℓ_{ji} = (1/ℓ_{ii}) ( a_{ji} − \sum_{k=1}^{i-1} ℓ_{ik} ℓ_{jk} )   for i+1 ≤ j ≤ n.

If A is real symmetric and L can be computed in this way, then A is positive
definite. (This is an efficient way to show positive definiteness.) To solve
Ax = b where A is real symmetric positive definite, L can be formed in this
way, and the pair Ly = b and L^T x = y can be solved for x, analogously to
the way we use the LU decomposition to solve a system.

Note: The multiplication and division count for the Cholesky decomposition is

    n³/6 + O(n²).

Thus, for large n, about 1/2 the multiplications and divisions are required
compared to standard Gaussian elimination.
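A minimal matlab sketch of this recurrence (the name cholesky_lower is our own;
the built-in function chol, used in Example 3.18 below, returns the upper
triangular factor L^T instead):

function L = cholesky_lower(A)
% Cholesky factorization A = L*L' for a symmetric positive definite A (a sketch).
n = size(A,1);
L = zeros(n);
for i = 1:n
    L(i,i) = sqrt(A(i,i) - sum(L(i,1:i-1).^2));          % diagonal entry l_ii
    for j = i+1:n
        L(j,i) = (A(j,i) - L(j,1:i-1)*L(i,1:i-1)') / L(i,i);  % entries below the diagonal
    end
end

For the matrix of Example 3.18, cholesky_lower(A) should agree with the factor
chol(A)' shown there, up to roundoff.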
Example 3.18
Consider solving approximately

    x''(t) = −sin(πt),   x(0) = x(1) = 0.

One technique of approximately solving this equation is to replace x'' in the
differential equation by

    x''(t) ≈ ( x(t+h) − 2x(t) + x(t−h) ) / h².      (3.6)

If we subdivide the interval [0, 1] into four subintervals, then the end points of
these subintervals are t_0 = 0, t_1 = 1/4, t_2 = 1/2, t_3 = 3/4, and t_4 = 1. If we
require the approximate differential equation with x'' replaced using (3.6) to
be exact at t_1, t_2, and t_3, and take h = 1/4 to be the length of a subinterval,
we obtain:

    at t_1 = 1/4:  ( x_2 − 2x_1 + x_0 ) / (1/16) = −sin(π/4),
    at t_2 = 1/2:  ( x_3 − 2x_2 + x_1 ) / (1/16) = −sin(π/2),
    at t_3 = 3/4:  ( x_4 − 2x_3 + x_2 ) / (1/16) = −sin(3π/4),

with t_k = k/4, k = 0, 1, 2, 3, 4. If we plug in x_0 = 0 and x_4 = 0, multiply
both sides of each of these three equations by −h² = −1/16, and write the
equations in matrix form, we obtain

    \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}
    \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
    = \frac{1}{16} \begin{pmatrix} \sin(π/4) \\ \sin(π/2) \\ \sin(3π/4) \end{pmatrix}.
The matrix for this system is symmetric. There is a matlab function chol
that performs a Cholesky factorization. We use it as follows:
>> A = [2 -1 0
-1 2 -1
0 -1 2]
A =
2 -1 0
-1 2 -1
0 -1 2
>> b = (1/16)*[sin(pi/4); sin(pi/2); sin(3*pi/4)]
b =
0.0442
0.0625
0.0442
>> L = chol(A)’
L =
1.4142 0 0
-0.7071 1.2247 0
0 -0.8165 1.1547
>> L*L’-A
ans =
1.0e-015 *
0.4441 0 0
0 -0.4441 0
0 0 0
>> y = L\b
y =
0.0312
0.0691
0.0871
>> x = L’\y
x =
0.0754
0.1067
0.0754
>> A\b
ans =
0.0754
0.1067
0.0754
>>
3.2.5.2 Tridiagonal Matrices
A tridiagonal matrix is a matrix of the form

    A = \begin{pmatrix}
    a_1 & c_1 & 0 & \cdots & & 0 \\
    b_2 & a_2 & c_2 & 0 & \cdots & 0 \\
    0 & b_3 & a_3 & c_3 & \cdots & 0 \\
    \vdots & & \ddots & \ddots & \ddots & \vdots \\
    0 & \cdots & 0 & b_{n-1} & a_{n-1} & c_{n-1} \\
    0 & \cdots & & 0 & b_n & a_n
    \end{pmatrix}.

For example, the matrix from Example 3.18 is tridiagonal. In many cases im-
portant in applications, A can be decomposed into a product of two bidiagonal
matrices, that is,

    A = LU = \begin{pmatrix}
    α_1 & 0 & \cdots & 0 \\
    b_2 & α_2 & \cdots & 0 \\
    \vdots & \ddots & \ddots & \vdots \\
    0 & \cdots & b_n & α_n
    \end{pmatrix}
    \begin{pmatrix}
    1 & γ_1 & \cdots & 0 \\
    \vdots & \ddots & \ddots & \vdots \\
    & & \ddots & γ_{n-1} \\
    0 & \cdots & 0 & 1
    \end{pmatrix}.      (3.7)
In such cases, multiplying the matrices on the right of (3.7) together and
equating the resulting matrix entries with corresponding entries of A gives
the following variant of Gaussian elimination:

    α_1 = a_1,
    γ_1 = c_1/α_1,
    α_i = a_i − b_i γ_{i-1},   γ_i = c_i/α_i   for i = 2, · · · , n−1,
    α_n = a_n − b_n γ_{n-1}.      (3.8)

Thus, if α_i ≠ 0, 1 ≤ i ≤ n, we can compute the decomposition (3.7). Fur-
thermore, we can compute the solution to Ax = f = (f_1, f_2, · · · , f_n)^T by
successively solving Ly = f and Ux = y, i.e.,

    y_1 = f_1/α_1,
    y_i = (f_i − b_i y_{i-1})/α_i   for i = 2, 3, · · · , n,
    x_n = y_n,
    x_j = y_j − γ_j x_{j+1}   for j = n−1, n−2, · · · , 1.      (3.9)
Sufficient conditions to guarantee the decomposition (3.7) are as follows.
THEOREM 3.6
Suppose the elements a_i, b_i, and c_i of A satisfy |a_1| > |c_1| > 0, |a_i| ≥ |b_i| + |c_i|
and b_i c_i ≠ 0 for 2 ≤ i ≤ n−1, and suppose |a_n| > |b_n| > 0. Then A is
invertible and the α_i's are nonzero. (Consequently, the factorization (3.7) is
possible.)
Note: It can be verified that solution of a linear system having a tridiago-
nal coefficient matrix using (3.8) and (3.9) requires (5n − 4) multiplications
and divisions and 3(n − 1) additions and subtractions. (Recall that we need
n³/3 + O(n²) multiplications and divisions for Gaussian elimination.) Storage
requirements are also drastically reduced, to 3n locations, versus n² for a full
matrix.
Example 3.19
The matrix from Example 3.18 is tridiagonal and satisfies the conditions
in Theorem 3.6. This holds true if we form the linear system of equations
in the same way as in Example 3.18, regardless of how small we make h
and how large the resulting system is. Thus, we may solve such systems
with the forward substitution and back substitution algorithms represented
by (3.8) and (3.9). If we want less truncation error in the approximation to
the differential equation, we need to solve a larger system (with h smaller). It
is more practical to do so with (3.8) and (3.9) than with the general Gaussian
elimination algorithm, since the amount of work the computer has to do is
proportional to n, rather than n³.
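For readers who wish to experiment, here is a minimal matlab sketch of (3.8)
and (3.9), storing only the three diagonals; tridiag_solve is an illustrative
name, and the subdiagonal is passed with an unused first entry so that b(i)
matches the indexing in (3.8):

function x = tridiag_solve(a, b, c, f)
% Solve A x = f for tridiagonal A with diagonal a(1:n), subdiagonal b(2:n),
% and superdiagonal c(1:n-1), using (3.8) and (3.9).  b(1) is unused.
n = length(a);
alpha = zeros(n,1);  gamma = zeros(n-1,1);  y = zeros(n,1);  x = zeros(n,1);
alpha(1) = a(1);  gamma(1) = c(1)/alpha(1);
for i = 2:n-1
    alpha(i) = a(i) - b(i)*gamma(i-1);
    gamma(i) = c(i)/alpha(i);
end
alpha(n) = a(n) - b(n)*gamma(n-1);
y(1) = f(1)/alpha(1);                 % forward solve L y = f
for i = 2:n
    y(i) = (f(i) - b(i)*y(i-1))/alpha(i);
end
x(n) = y(n);                          % back solve U x = y
for j = n-1:-1:1
    x(j) = y(j) - gamma(j)*x(j+1);
end

For the system of Example 3.18, tridiag_solve([2;2;2], [0;-1;-1], [-1;-1],
(1/16)*[sin(pi/4); sin(pi/2); sin(3*pi/4)]) should reproduce the solution
x ≈ (0.0754, 0.1067, 0.0754)^T found there.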
3.2.5.3 Block Tridiagonal Matrices
We now consider briefly block tridiagonal matrices, that is, matrices of the
form

    A = \begin{pmatrix}
    A_1 & C_1 & 0 & \cdots & 0 & 0 \\
    B_2 & A_2 & C_2 & 0 & \cdots & 0 \\
    0 & B_3 & A_3 & C_3 & & 0 \\
    \vdots & & \ddots & \ddots & \ddots & \vdots \\
    0 & 0 & 0 & B_{n-1} & A_{n-1} & C_{n-1} \\
    0 & 0 & 0 & 0 & B_n & A_n
    \end{pmatrix},

where the A_i, B_i, and C_i are m × m matrices. Analogous to the tridiagonal case,
we construct a factorization of the form

    A = \begin{pmatrix}
    Â_1 & 0 & \cdots & 0 \\
    B_2 & Â_2 & \cdots & 0 \\
    \vdots & \ddots & \ddots & \vdots \\
    0 & \cdots & B_n & Â_n
    \end{pmatrix}
    \begin{pmatrix}
    I & E_1 & \cdots & 0 \\
    0 & I & \ddots & \vdots \\
    \vdots & & \ddots & E_{n-1} \\
    0 & \cdots & 0 & I
    \end{pmatrix}.
Provided the Â_i, 1 ≤ i ≤ n, are nonsingular, we can compute:

    Â_1 = A_1,
    E_1 = Â_1^{-1} C_1,
    Â_i = A_i − B_i E_{i-1}   for 2 ≤ i ≤ n,
    E_i = Â_i^{-1} C_i         for 2 ≤ i ≤ n−1.

For efficiency, the Â_i^{-1} are generally not computed; instead, the columns
of E_i are computed by factoring Â_i and solving a pair of triangular systems.
That is, Â_i E_i = C_i with Â_i = L_i U_i becomes L_i U_i E_i = C_i.

Note: The number of operations for computing a block factorization of a
block tridiagonal system is proportional to nm³. This is significantly less
than the number of operations, proportional to n³, for completing the general
Gaussian elimination algorithm, for m small relative to n. In such cases,
tremendous savings are achieved by taking advantage of the zero elements.
Now consider

    Ax = b,   x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix},
    b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix},

where x_i, b_i ∈ R^m. Then, with the factorization A = LU, Ax = b can be
solved as follows: Ly = b, Ux = y, with

    Â_1 y_1 = b_1,
    Â_i y_i = b_i − B_i y_{i-1}   for i = 2, · · · , n,
    x_n = y_n,
    x_j = y_j − E_j x_{j+1}        for j = n−1, · · · , 1.
Block tridiagonal systems arise in various applications, such as in equi-
librium models for diffusion processes in two and three variables, a simple
prototype of which is the equation

    ∂²u/∂x² + ∂²u/∂y² = −f(x, y),

when we approximate the partial derivatives in a manner similar to how we
approximated x'' in Example 3.18. In that case, not only is the overall system
block tridiagonal, but, depending on how we order the equations and variables,
the individual matrices A_i, B_i, and C_i are tridiagonal, or contain mostly zeros.
Taking advantage of these facts is absolutely necessary to be able to achieve
the desired accuracy in the approximation to the solutions of certain models.
3.2.5.4 Banded Matrices
A generalization of a tridiagonal matrix arising in many applications is a
banded matrix. Such matrices have non-zero elements only on the diagonal
and p entries above and below the diagonal. For example, p = 1 for a tridi-
agonal matrix. The number p is called the semi-bandwidth of the matrix.
Example 3.20

    \begin{pmatrix}
    3 & -1 & 1 & 0 & 0 \\
    -1 & 3 & -1 & 1.1 & 0 \\
    0.9 & -1 & 3 & -1 & 1.1 \\
    0 & 1.1 & -1 & 3 & -1 \\
    0 & 0 & 0.9 & -1 & 3
    \end{pmatrix}

is a banded matrix with semi-bandwidth equal to 2.
Provided Gaussian elimination without pivoting is applicable, banded ma-
trices may be stored and solved analogously to tridiagonal matrices. In partic-
ular, we may store the matrix in 2p+1 vectors, and we may use an algorithm
similar to (3.8) and (3.9), based on the general Gaussian elimination algorithm
(Algorithm 3.1 on page 83), but with the loop on i having an upper bound
equal to min{k+p, n}, rather than n, and with the a_{i,j} replaced by appropriate
references to the n by 2p+1 matrix in which the non-zero entries are stored.
It is advantageous to handle a matrix as a banded matrix when its dimension
n is large relative to p.
3.2.5.5 General Sparse Matrices
Numerous applications, such as models of communications and transporta-
tion networks, give rise to matrices most of whose elements are zero, but do
not have an easily usable structure such as a block or banded structure. Ma-
trices most of whose elements are zero are called sparse matrices. Matrices
that are not sparse are called dense or full . Special, more sophisticated vari-
ants of Gaussian elimination, as well as iterative methods, which we treat
later in Section 3.5, may be used for sparse matrices.
Several different schemes are used to store sparse matrices. One such scheme
is to store two integer vectors r and c and one floating point vector v, such
that the number of entries in r, c, and v is the total number of non-zero
elements in the matrix; r_i gives the row index of the i-th non-zero element, c_i
gives the corresponding column index, and v_i gives the value.
Example 3.21
The matrix

    \begin{pmatrix}
    0 & 0 & 1 & 0 & 0 \\
    -3 & 0 & 0 & 0 & 1 \\
    -2 & -1 & 0 & 1.1 & 0 \\
    0 & 0 & 0 & 5 & -1 \\
    7 & -8 & 0 & 0 & 0
    \end{pmatrix}

may be stored with the vectors

    r = (2, 3, 5, 3, 5, 1, 3, 4, 2, 4)^T,
    c = (1, 1, 1, 2, 2, 3, 4, 4, 5, 5)^T,
    v = (−3, −2, 7, −1, −8, 1, 1.1, 5, 1, −1)^T.

Note that there are 25 entries in this matrix, but only 10 nonzero entries.
There is a question concerning whether or not a particular matrix should
be considered to be sparse, rather than treated as dense. In particular, if the
matrix has some elements that are zero, but many are not, it may be more
efficient to treat the matrix as dense. This is because there is extra overhead
in the algorithms used to solve the systems with matrices that are stored as
sparse, and the elimination process can cause fill-in, the introduction of non-
zeros in the transformed matrix into elements that were zero in the original
matrix. Whether a matrix should be considered to be sparse or not depends
on the application, the type of computer used to solve the system, etc. Sparse
systems that have a banded or block structure are more efficiently treated
with special algorithms for banded or block systems than with algorithms for
general sparse matrices.
There is extensive support for sparse matrices in matlab. This is detailed in
matlab’s help system. One method of describing a sparse matrix in matlab
is as we have done in Example 3.21.
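For instance, matlab's sparse function accepts exactly such a triple of row
indices, column indices, and values. A minimal illustration with the data of
Example 3.21:

>> r = [2 3 5 3 5 1 3 4 2 4];
>> c = [1 1 1 2 2 3 4 4 5 5];
>> v = [-3 -2 7 -1 -8 1 1.1 5 1 -1];
>> S = sparse(r, c, v, 5, 5);   % 5 by 5 sparse matrix from the (r, c, v) triple
>> full(S)                      % recovers the dense matrix of Example 3.21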
3.3 Roundoff Error and Conditioning
On page 17, we defined the condition number of a function in terms of the
ratio of the relative error in the function value to the relative error in its argu-
ment. Also, in Example 1.17 on page 16, we saw that one way of computing
a quantity can lead to a large relative error in the result, while another way
leads to an accurate result; that is, one algorithm can be numerically unstable
while another is stable.
Similar concepts hold for solutions to systems of linear equations. For ex-
ample, Example 3.17 on page 90 illustrated that Gaussian elimination without
partial pivoting can be numerically unstable for a system of equations where
Gaussian elimination with partial pivoting is stable. We also have a concept of
condition number of a matrix, which relates the relative change of components
of x to changes in the elements of the matrix A and right-hand-side vector
b for the system. To understand the most commonly used type of condition
number of a matrix, we introduce norms.
3.3.1 Norms
We use norms to describe errors in vectors and convergence of sequences of
vectors.
DEFINITION 3.15 A function that assigns a non-negative real number
‖v‖ to a vector v is called a norm, provided it has the following properties:
1. ‖u‖ ≥ 0.
2. ‖u‖ = 0 if and only if u = 0.
3. ‖λu‖ = |λ| ‖u‖ for λ ∈ R (or λ ∈ C if u is a complex vector).
4. ‖u + v‖ ≤ ‖u‖ + ‖v‖ (triangle inequality).
Consider V = C^n, the vector space of n-tuples of complex numbers. Note
that x ∈ C^n has the form x = (x_1, x_2, · · · , x_n)^T. Also,

    x + y = (x_1 + y_1, x_2 + y_2, · · · , x_n + y_n)^T   and   λx = (λx_1, λx_2, · · · , λx_n)^T.

Important norms on C^n are:

(a) ‖x‖_∞ = max_{1≤i≤n} |x_i|: the ℓ_∞ or max norm (for z = a + ib,
    |z| = \sqrt{a^2 + b^2} = \sqrt{\bar{z} z}).

(b) ‖x‖_1 = \sum_{i=1}^{n} |x_i|: the ℓ_1 norm.

(c) ‖x‖_2 = ( \sum_{i=1}^{n} |x_i|^2 )^{1/2}: the ℓ_2 norm (Euclidean norm).

(d) Scaled versions of the above norms, where we define ‖v‖_a = ‖a^T v‖,
    where a = (a_1, a_2, · · · , a_n)^T with a_i > 0 for 1 ≤ i ≤ n.
A useful property relating the Euclidean norm and the dot product is:

THEOREM 3.7
(the Cauchy–Schwarz inequality)

    |v ◦ w| = |v^T w| ≤ ‖v‖_2 ‖w‖_2.
We now introduce a concept and notation for describing errors in compu-
tations involving vectors.
DEFINITION 3.16 The distance from u to v is defined as ‖u − v‖.
The following concept and associated theorem are worth keeping in mind,
since they hint that, in many cases, it does not matter much, from the point
of view of the size of the error, which norm we choose to describe the error
in a vector.
DEFINITION 3.17 Two norms ‖ · ‖_α and ‖ · ‖_β are called equivalent if
there exist positive constants c_1 and c_2 such that

    c_1 ‖x‖_α ≤ ‖x‖_β ≤ c_2 ‖x‖_α.

Hence, also,

    (1/c_2) ‖x‖_β ≤ ‖x‖_α ≤ (1/c_1) ‖x‖_β.
THEOREM 3.8
Any two norms on C^n are equivalent.
The following are the constants associated with the 1-, 2-, and ∞-norms:

    (a) ‖x‖_∞ ≤ ‖x‖_2 ≤ \sqrt{n} ‖x‖_∞,
    (b) (1/\sqrt{n}) ‖x‖_1 ≤ ‖x‖_2 ≤ ‖x‖_1,
    (c) (1/n) ‖x‖_1 ≤ ‖x‖_∞ ≤ ‖x‖_1.      (3.10)

The above relations are sharp in the sense that vectors can be found for which
the inequalities are actually equalities. Thus, in a sense, the 1-, 2-, and ∞-
norms of vectors become "less equivalent" the larger the vector space.
Example 3.22
The matlab function norm computes norms of vectors. Consider the follow-
ing dialog.
>> x = [1;1;1;1;1]
x =
1
1
1
1
1
>> norm(x,1)
ans =
5
>> norm(x,2)
ans =
2.2361
>> norm(x,inf)
ans =
1
>> n=1000;
>> for i=1:n;x(i)=1;end;
>> norm(x,1)
ans =
1000
>> norm(x,2)
ans =
31.6228
>> norm(x,inf)
ans =
1
>>
This illustrates that, for a vector all of whose entries are equal to 1, the second
inequality in (3.10)(a) holds with equality, the first inequality in (b) holds with
equality, and the first inequality in (c) holds with equality.
To discuss the condition number of a matrix, we use the concept of the
norm of a matrix. In the following, A and B are arbitrary square matrices
and λ is a complex number.

DEFINITION 3.18 A matrix norm is a real-valued function of A, de-
noted by ‖ · ‖, satisfying:
1. ‖A‖ ≥ 0.
2. ‖A‖ = 0 if and only if A = 0.
3. ‖λA‖ = |λ| ‖A‖.
4. ‖A + B‖ ≤ ‖A‖ + ‖B‖.
5. ‖AB‖ ≤ ‖A‖ ‖B‖.
REMARK 3.4 In contrast to vector norms, we have an additional fifth
property, referred to as a submultiplicative property, dealing with the norm
of the product of two matrices.
Example 3.23
The quantity

    ‖A‖_E = ( \sum_{i,j=1}^{n} |a_{ij}|^2 )^{1/2}

is called the Frobenius norm. Since the Frobenius norm is the Euclidean
norm of the matrix when the matrix is viewed to be a single vector formed
by concatenating its columns (or rows), the Frobenius norm is a norm. It is
also possible to prove that the Frobenius norm is a matrix norm.
To relate norms of matrices to errors in the solution of linear systems, we
relate vector norms to matrix norms:

DEFINITION 3.19 A matrix norm ‖A‖ and a vector norm ‖x‖ are called
compatible if for all vectors x and matrices A we have ‖Ax‖ ≤ ‖A‖ ‖x‖.
REMARK 3.5 A consequence of the Cauchy–Schwarz inequality is that
‖Ax‖_2 ≤ ‖A‖_E ‖x‖_2, i.e., the Euclidean norm ‖ · ‖_E for matrices is compatible
with the ℓ_2-norm ‖ · ‖_2 for vectors.
In fact, every vector norm has associated with it a sharply defined compat-
ible matrix norm:

DEFINITION 3.20 Given a vector norm ‖ · ‖, we define a natural or
induced matrix norm associated with it as

    ‖A‖ = sup_{x ≠ 0} ‖Ax‖ / ‖x‖.      (3.11)

It is straightforward to show that an induced matrix norm satisfies the five
properties required of a matrix norm. Also, from the definition of induced
norm, an induced matrix norm is compatible with the given vector norm,
that is,

    ‖A‖ ‖x‖ ≥ ‖Ax‖   for all x ∈ C^n.      (3.12)
REMARK 3.6 Definition 3.20 is equivalent to

    ‖A‖ = sup_{‖y‖=1} ‖Ay‖,

since

    ‖A‖ = sup_{x ≠ 0} ‖Ax‖/‖x‖ = sup_{x ≠ 0} ‖A (x/‖x‖)‖ = sup_{‖y‖=1} ‖Ay‖

(letting y = x/‖x‖).
We now present explicit expressions for ‖A‖_∞, ‖A‖_1, and ‖A‖_2.

THEOREM 3.9
(Formulas for common induced matrix norms)

(a) ‖A‖_∞ = max_{1≤i≤n} \sum_{j=1}^{n} |a_{ij}| = {maximum absolute row sum}.

(b) ‖A‖_1 = max_{1≤j≤n} \sum_{i=1}^{n} |a_{ij}| = {maximum absolute column sum}.

(c) ‖A‖_2 = \sqrt{ρ(A^H A)}, where ρ(M) is the spectral radius of the matrix M,
    that is, the maximum absolute value of an eigenvalue of M.
    (We will study eigenvalues and eigenvectors in Chapter 5. The spectral
    radius plays a fundamental role in a more advanced study of matrix
    norms. In particular, ρ(A) ≤ ‖A‖ for any square matrix A and any
    matrix norm, and, for any square matrix A and any ε > 0, there is a
    matrix norm ‖ · ‖ such that ‖A‖ ≤ ρ(A) + ε.)
Note that ‖A‖_2 is not equal to the Frobenius norm.
Example 3.24
The norm function in matlab gives the induced matrix norm when its ar-
gument is a matrix. With the matrix A as in Example 3.13 (on page 81),
consider the following matlab dialog (edited for brevity):
>> A
A =
1 2 3
4 5 6
7 8 10
>> x’
ans = 1 1 1
>> norm(A,1)
ans = 19
>> norm(x,1)
ans = 3
>> norm(A*x,1)
ans = 46
>> norm(A,1)*norm(x,1)
ans = 57
>> norm(A,2)
ans = 17.4125
>> norm(x,2)
ans = 1.7321
>> norm(A*x,2)
ans = 29.7658
>> norm(A,2)*norm(x,2)
ans = 30.1593
>> norm(A,inf)
ans = 25
>> norm(x,inf)
ans = 1
>> norm(A*x,inf)
ans = 25
>> norm(A,inf)*norm(x,inf)
ans = 25
>>
We are now prepared to discuss condition numbers of matrices.
3.3.2 Condition Numbers
We begin with the following:
DEFINITION 3.21 If the solution x of Ax = b changes drastically when
A or b is perturbed slightly, then the system Ax = b is called ill-conditioned.
Because rounding errors are unavoidable with floating point arithmetic,
much accuracy can be lost during Gaussian elimination for ill-conditioned
systems. In fact, the final solution may be considerably different than the
exact solution.
Example 3.25
An ill-conditioned system is

    Ax = \begin{pmatrix} 1 & 0.99 \\ 0.99 & 0.98 \end{pmatrix}
         \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
       = \begin{pmatrix} 1.99 \\ 1.97 \end{pmatrix},
    whose exact solution is x = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.

However,

    Ax = \begin{pmatrix} 1.989903 \\ 1.970106 \end{pmatrix}
    has solution x = \begin{pmatrix} 3 \\ -1.0203 \end{pmatrix}.

Thus, a change of

    δb = \begin{pmatrix} -0.000097 \\ 0.000106 \end{pmatrix}
    produces a change δx = \begin{pmatrix} 2.0000 \\ -2.0203 \end{pmatrix}.
We first study the phenomenon of ill-conditioning, then study roundoff error
in Gaussian elimination. We begin with

THEOREM 3.10
Let ‖ · ‖_β be an induced matrix norm. Let x be the solution of Ax = b with
A an n × n invertible complex matrix. Let x + δx be the solution of

    (A + δA)(x + δx) = b + δb.      (3.13)

Assume that

    ‖δA‖_β ‖A^{-1}‖_β < 1.      (3.14)

Then

    ‖δx‖_β / ‖x‖_β ≤ κ_β(A) (1 − ‖δA‖_β ‖A^{-1}‖_β)^{-1} ( ‖δb‖_β/‖b‖_β + ‖δA‖_β/‖A‖_β ),      (3.15)

where

    κ_β(A) = ‖A‖_β ‖A^{-1}‖_β

is defined to be the condition number of the matrix A with respect to the norm
‖ · ‖_β. There exist perturbations δx and δb for which (3.15) holds with equality.
That is, inequality (3.15) is sharp.
(We supply a proof of Theorem 3.10 in [1].)
The condition number satisfies κ_β(A) ≥ 1 for any induced matrix norm and any
matrix A, since

    1 = ‖I‖_β = ‖A^{-1}A‖_β ≤ ‖A^{-1}‖_β ‖A‖_β = κ_β(A).
Example 3.26
Consider the system of equations from Example 3.25, and the following mat-
lab dialog.
>> A = [1 0.99
0.99 0.98]
A =
1.0000 0.9900
0.9900 0.9800
>> norm(A,1)*norm(inv(A),1)
ans =
3.9601e+004
>> b = [1.99;1.97]
b =
1.9900
1.9700
>> x = A\b
x =
1.0000
1.0000
>> btilde = [1.989903;1.980106]
btilde =
1.9899
1.9801
>> xtilde = A\btilde
xtilde =
102.0000
-101.0203
>> norm(x-xtilde,1)/norm(x,1)
ans =
101.5102
>> sol_error = norm(x-xtilde,1)/norm(x,1)
sol_error =
101.5102
>> data_error = norm(b-btilde,1)/norm(b,1)
data_error =
0.0026
>> data_error * cond(A,1)
ans =
102.0326
>> cond(A,1)
ans =
3.9601e+004
>> cond(A,2)
ans =
3.9206e+004
>> cond(A,inf)
ans =
3.9601e+004
>>
This illustrates the definition of the condition number, as well as the fact that
the relative error in the norm of the solution can be estimated by the relative
error in the norms of the matrix and the right-hand-side vector multiplied
by the condition number of the matrix. Also, in this two-dimensional case,
the condition numbers in the 1-, 2-, and ∞-norms do not differ by much.
The actual errors in the computed solutions A\b and A\btilde are small relative
to the displayed digits, in this case.
If δA = 0, we have

    ‖δx‖_β / ‖x‖_β ≤ κ_β(A) ‖δb‖_β / ‖b‖_β,

and if δb = 0, then

    ‖δx‖_β / ‖x‖_β ≤ ( κ_β(A) / (1 − ‖A^{-1}‖_β ‖δA‖_β) ) · ‖δA‖_β / ‖A‖_β.
Note: In solving systems using Gaussian elimination with partial pivoting,
we can use the condition number as a rule of thumb in estimating the number
of digits correct in the solution. For example, if double precision arithmetic is
used, errors in storing the matrix into internal binary format and in each step
of the Gaussian elimination process are on the order of 10^{-16}. If the condition
number is 10^4, then we might expect 16 − 4 = 12 digits to be correct in the
solution. In many cases, this is close. (For more foolproof bounds on the
error, interval arithmetic techniques can sometimes be used.)
Note: For a unitary matrix U, i.e., U^H U = I, we have κ_2(U) = 1. Such a
matrix is called perfectly conditioned, since κ_β(A) ≥ 1 for any β and A.
A classic example of an ill-conditioned matrix is the Hilbert matrix of order n:

   H_n = [ 1      1/2      1/3      ···  1/n      ]
         [ 1/2    1/3      1/4      ···  1/(n+1)  ]
         [  ⋮                              ⋮       ]
         [ 1/n    1/(n+1)  1/(n+2)  ···  1/(2n−1) ].
Hilbert matrices and matrices that are approximately Hilbert matrices occur
in approximation of data and functions. Condition numbers for some Hilbert
matrices appear in Table 3.1.

TABLE 3.1: Condition numbers of some Hilbert matrices

   n         |   3       5        6        8        16         32         64
   κ_2(H_n)  | 5×10^2  5×10^5  15×10^6  15×10^9  2.0×10^22  4.8×10^46  3.5×10^95

The reader may verify entries in this table, using the following matlab dialog
as an example.
>> hilb(3)
ans =
1.0000 0.5000 0.3333
0.5000 0.3333 0.2500
0.3333 0.2500 0.2000
>> cond(hilb(3))
ans =
524.0568
>> cond(hilb(3),2)
ans =
524.0568
REMARK 3.7 Consider Ax = b. Ill-conditioning combined with rounding
errors can have a disastrous effect in Gaussian elimination. Sometimes, the
conditioning can be improved (κ decreased) by scaling the equations. A
common scaling strategy is to row equilibrate the matrix A by choosing a
diagonal matrix D such that premultiplying A by D causes

   max_{1≤j≤n} |a_ij| = 1   for i = 1, 2, · · · , n.

Thus, DAx = Db becomes the scaled system, with the maximum element
magnitude in each row of DA equal to unity. (This procedure is generally
recommended before Gaussian elimination with partial pivoting is employed
[19]. However, there is no guarantee that equilibration with partial pivoting
will not suffer greatly from effects of roundoff error.)
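As a small sketch of this scaling step (our own illustration, not a library
routine), one can form D from the reciprocals of the row maxima and scale
both A and b:

% Row-equilibrate the system Ax = b so that the maximum magnitude
% in each row of D*A equals 1.
row_max = max(abs(A), [], 2);     % column vector of row maxima
D = diag(1 ./ row_max);           % assumes no row of A is identically zero
A_scaled = D * A;
b_scaled = D * b;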
Example 3.27
The condition number does not give the entire story in Gaussian elimination.
In particular, if we multiply an entire equation by a non-zero number, this
changes the condition number of the matrix, but does not have an effect on
Gaussian elimination. Consider the following matlab dialog.
>> A = [1 1
-1 1]
A =
1 1
-1 1
>> cond(A)
ans =
1.0000
>> A(1,:) = 1e16*A(1,:)
A =
1.0e+016 *
1.0000 1.0000
-0.0000 0.0000
>> cond(A)
ans =
1.0000e+016
>>
However, the strange scaling in the first row of the matrix will not cause
serious roundoff error when Gaussian elimination proceeds with floating point
arithmetic, if the right-hand-sides are scaled accordingly.
3.3.3 Roundoff Error in Gaussian Elimination
Consider the solution of Ax = b. On a computer, elements of A and b
are represented by floating point numbers. Solving this linear system on a
computer only produces an approximate solution x̂.

There are two kinds of rounding error analysis. In backward error analysis,
one shows that the computed solution x̂ is the exact solution of a perturbed
system of the form (A + F)x̂ = b. (See, for example, [30] or [42].) Then we
have
   Ax − Ax̂ = −F x̂,
that is,
   x − x̂ = −A^{-1} F x̂,
from which we obtain

   ‖x − x̂‖ / ‖x̂‖  ≤  ‖A^{-1}‖ ‖F‖  =  κ(A) ‖F‖ / ‖A‖.                   (3.16)

Thus, assuming that we have estimates for κ(A) and ‖F‖, we can use
(3.16) to estimate the error ‖x − x̂‖.
In forward error analysis, one keeps track of roundoff error at each step of
the elimination procedure. Then, x − x̂ is estimated in some norm in terms
of, for example, A, κ(A), and θ = (p/2) β^{1−t} [37, 38].

The analyses are lengthy and are not given here. The results, however, are
useful to understand. Basically, it is shown that

   ‖F‖ / ‖A‖  ≤  c_n g θ,                                               (3.17)

where

   c_n is a constant that depends on the size of the n × n matrix A,

   g is a growth factor,  g = max_{i,j,k} |a_ij^(k)|  /  max_{i,j} |a_ij|,   and

   θ is the unit roundoff error,  θ = (p/2) β^{1−t}.
Note: Using backward error analysis, c_n = 1.01n^3 + 5(n + 1)^2, and using
forward error analysis, c_n = (1/6)(n^3 + 15n^2 + 2n − 12).

Note: The growth factor g depends on the pivoting strategy: g ≤ 2^{n−1} for
partial pivoting (this bound cannot be improved, since g = 2^{n−1} for certain
matrices), while g ≤ n^{1/2} (2 · 3^{1/2} · 4^{1/3} · · · n^{1/(n−1)})^{1/2} for full pivoting.
(Wilkinson conjectured that this can be improved to g ≤ n.) For example,
for n = 100, g ≤ 2^{99} ≈ 10^{30} for partial pivoting and g ≤ 3300 for full pivoting.

Note: Thus, by (3.16) and (3.17), the relative error ‖x − x̂‖/‖x̂‖ depends
directly on κ(A), θ, n^3, and the pivoting strategy.
REMARK 3.8 The factor of 2^{n−1} discouraged numerical analysts in
the 1950's from using Gaussian elimination, and spurred study of iterative
methods for solving linear systems. However, it was found that, for most
matrices, the growth factor is much less, and Gaussian elimination with partial
pivoting is usually practical.
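For a particular matrix, the growth behavior can be examined directly. The
following sketch (ours; it uses the commonly quoted ratio max|u_ij| / max|a_ij|
as a slightly optimistic proxy for g) illustrates how modest the growth usually
is for a random matrix under partial pivoting:

% Estimate the growth factor for Gaussian elimination with partial
% pivoting on a random test matrix.
n = 100;
A = randn(n);
[L, U, P] = lu(A);                        % P*A = L*U with partial pivoting
g_estimate = max(abs(U(:))) / max(abs(A(:)))

Typically g_estimate is far smaller than the worst-case bound 2^{n−1}.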
3.3.4 Interval Bounds
In many instances, it is practical to obtain rigorous bounds on the solution
x to a linear system Ax = b. The algorithm is a modification of the gen-
eral Gaussian elimination algorithm (Algorithm 3.1) and back substitution
(Algorithm 3.2), as follows.
ALGORITHM 3.4
(Interval bounds for the solution to a linear system)

INPUT: the n by n matrix A and the n-vector b ∈ R^n.

OUTPUT: an interval vector x such that the exact solution to Ax = b must
be within the bounds x.

1. Use Algorithm 3.1 and Algorithm 3.2 (that is, Gaussian elimination with
   back substitution, or any other technique) and floating point arithmetic
   to compute an approximation Y to A^{-1}.

2. Use interval arithmetic, with directed rounding, to compute interval
   enclosures of Y A and Y b. That is,

   (a) Ã ← Y A (computed with interval arithmetic),
   (b) b̃ ← Y b (computed with interval arithmetic).

3. FOR k = 1, 2, · · · , n − 1 (forward phase using interval arithmetic)
      FOR i = k + 1, · · · , n
         (a) m_ik ← ã_ik / ã_kk.
         (b) ã_ik ← [0, 0].
         (c) FOR j = k + 1, · · · , n
                ã_ij ← ã_ij − m_ik ã_kj.
             END FOR
         (d) b̃_i ← b̃_i − m_ik b̃_k.
      END FOR
   END FOR

4. x_n ← b̃_n / ã_nn.

5. FOR k = n − 1, n − 2, · · · , 1 (back substitution)
      x_k ← ( b̃_k − Σ_{j=k+1}^n ã_kj x_j ) / ã_kk.
   END FOR

END ALGORITHM 3.4.
Note: We can explicitly set ã_ik to zero without loss of mathematical rigor,
even though, using interval arithmetic, ã_ik − m_ik ã_kk may not be exactly [0, 0].
In fact, this operation does not even need to be done, since we need not
reference ã_ik in the back substitution process.

Note: Obtaining the rigorous bounds x in Algorithm 3.4 is more costly
than computing an approximate solution with floating point arithmetic using
Gaussian elimination with back substitution, because an approximate inverse
Y must explicitly be computed to precondition the system. However, both
computations take O(n^3) operations for general systems.
The following theorem clarifies why we may use Algorithm 3.4 to obtain
mathematically rigorous bounds.
THEOREM 3.11
Define the solution set of Ãx = b̃ to be

   Σ(Ã, b̃) = { x | Ax = b for some point matrix A ∈ Ã and some point vector b ∈ b̃ }.

If Ax* = b, then x* ∈ Σ(Ã, b̃). Furthermore, if x is the output of Algorithm 3.4,
then Σ(Ã, b̃) ⊆ x.
For facts enabling a proof of Theorem 3.11, see [29] or other references on
interval analysis.
Example 3.28
Consider the system

   [ 3.3330  15920.   −10.333 ] [ x_1 ]   [ 15913. ]
   [ 2.2220  16.710     9.612 ] [ x_2 ] = [ 28.544 ].
   [ 1.5611   5.1791    1.6852] [ x_3 ]   [ 8.4254 ]

For this problem, κ(A) ≈ 16000 and the exact solution is x = [1, 1, 1]^T. We
will use matlab (providing IEEE double precision floating point arithmetic)
to compute Y, and we will use the intlab interval arithmetic toolbox for
matlab (based on IEEE double precision). Rounded to 14 decimal digits (as
matlab displays them), we obtain

       [ −0.00012055643706  −0.14988499865822   0.85417095741675 ]
   Y ≈ [  0.00006278655296   0.00012125786211  −0.00030664438576 ].
       [ −0.00008128244868   0.13847464088044  −0.19692507695527 ]

Using outward rounding in both the computation and the decimal display, we
obtain

   Ã ⊆ [ [ 1.00000000000000, 1.00000000000000]  [−0.00000000000012, −0.00000000000011]  [−0.00000000000001, −0.00000000000000] ]
       [ [ 0.00000000000000, 0.00000000000001]  [ 1.00000000000000,  1.00000000000001]  [−0.00000000000001, −0.00000000000000] ]
       [ [ 0.00000000000000, 0.00000000000001]  [ 0.00000000000013,  0.00000000000014]  [ 0.99999999999999,  1.00000000000001] ]

and

   b̃ ⊆ [ [0.99999999999988, 0.99999999999989] ]
       [ [1.00000000000000, 1.00000000000001] ].
       [ [1.00000000000013, 1.00000000000014] ]

Completing the remainder of Algorithm 3.4 then gives

   x* ∈ x ⊆ [ [0.99999999999999, 1.00000000000001] ]
            [ [0.99999999999999, 1.00000000000001] ].
            [ [0.99999999999999, 1.00000000000001] ]
The actual matlab dialog is as follows:
>> format long
>> intvalinit(’DisplayInfsup’)
===> Default display of intervals by infimum/supremum (e.g. [ 3.14 , 3.15 ])
>> x = interval_Gaussian_elimination(A,b)
x =
1.000000000000000
1.000000000000000
1.000000000000001
>> IA = [intval(3.3330) intval(15920.) intval(-10.333)
intval(2.2220) intval(16.710) intval(9.612)
intval(1.5611) intval(5.1791) intval(1.6852)]
intval IA =
1.0e+004 *
Columns 1 through 2
[ 0.00033330000000, 0.00033330000001] [ 1.59200000000000, 1.59200000000000]
[ 0.00022219999999, 0.00022220000000] [ 0.00167100000000, 0.00167100000001]
[ 0.00015610999999, 0.00015611000000] [ 0.00051791000000, 0.00051791000001]
Column 3
[ -0.00103330000001, -0.00103330000000]
[ 0.00096120000000, 0.00096120000001]
[ 0.00016851999999, 0.00016852000001]
>> Ib = [intval(15913.);intval(28.544);intval(8.4254)]
intval Ib =
1.0e+004 *
[ 1.59130000000000, 1.59130000000000]
[ 0.00285440000000, 0.00285440000001]
[ 0.00084253999999, 0.00084254000000]
>> YA = Y*IA
intval YA =
Columns 1 through 2
[ 0.99999999999999, 1.00000000000001] [ -0.00000000000100, -0.00000000000099]
[ -0.00000000000001, 0.00000000000001] [ 1.00000000000000, 1.00000000000001]
[ -0.00000000000001, 0.00000000000001] [ 0.00000000000013, 0.00000000000014]
Column 3
[ 0.00000000000000, 0.00000000000001]
[ -0.00000000000001, -0.00000000000000]
[ 0.99999999999999, 1.00000000000001]
>> Yb = Y*Ib
intval Yb =
[ 0.99999999999900, 0.99999999999901]
[ 1.00000000000000, 1.00000000000001]
[ 1.00000000000013, 1.00000000000014]
>> x = interval_Gaussian_elimination(A,b)
x =
1.000000000000000
1.000000000000000
1.000000000000001
Here, we need to use the intlab function intval to convert the decimal
strings representing the matrix and right-hand side vector elements to small
intervals containing the actual decimal values. This is because, even though
the original system did not have interval entries, the elements cannot all be
represented exactly as binary floating point numbers, so we must enclose
the exact values in floating point intervals to be certain that the bounds we
compute contain the actual solution. This is not necessary in computing the
floating point preconditioning matrix Y , since Y need not be an exact inverse.
The function interval Gaussian elimination, not a part of intlab, is as
follows:
function [x] = interval_Gaussian_elimination(A, b)
% [x] = interval_Gaussian_elimination(A, b)
% returns the result of Algorithm 3.4 in the book.
% The matrix A and vector b should be intervals,
% although they may be point intervals (i.e. of width zero).
n = length(b);
Y = inv(mid(A));
Atilde = Y*A;
btilde = Y*b;
error_occurred = 0;
for k=1:n
for i=k+1:n
m_ik = Atilde(i,k)/Atilde(k,k);
for j=k+1:n
Atilde(i,j) = Atilde(i,j) - m_ik*Atilde(k,j);
end
btilde(i) = btilde(i) -m_ik*btilde(k);
end
end
x(n) = btilde(n)/Atilde(n,n);
for k=n-1:-1:1
x(k) = btilde(k);
for j=k+1:n
x(k) = x(k) - Atilde(k,j)*x(j);
end
x(k) = x(k)/Atilde(k,k);
end
x = x’;
Note: There are various ways of using interval arithmetic to obtain rigorous
bounds on the solution set to linear systems of equations. Some of these are
related mathematically to the interval Newton method introduced in §2.4 on
page 56, while others are related to the iterative techniques we discuss later in
this section. The effectiveness and practicality of a particular such technique
depend on the condition of the system, and whether the entries in the matrix
A and right hand side vector b are points to start, or whether there are larger
uncertainties in them (that is, whether or not these coefficients are wide or
narrow intervals). A good theoretical reference is [29] and some additional
practical detail is given in our monograph [20].
We now consider another method for computing the solution of a linear
system Ax = b. This method is particularly appropriate for various statistical
computations, such as least squares fits, when there are more equations than
unknowns.
3.4 Orthogonal Decomposition (QR Decomposition)
This method for computing the solution of Ax = b is based on orthogonal
decomposition, also known as the QR decomposition or QR factorization. In
addition to solving linear systems, the QR factorization is also useful in least
squares problems and eigenvalue computations.
We will use the following concept heavily in this section, as well as when
we study the singular value decomposition.
DEFINITION 3.22 Two vectors u and v are called orthogonal provided
the dot product u ◦ v = 0. A set of vectors v^(i) is said to be orthonormal
provided v^(i) ◦ v^(j) = δ_ij, where δ_ij is the Kronecker delta function

   δ_ij = { 1 if i = j,
          { 0 if i ≠ j.

A matrix Q whose columns are orthonormal vectors is called an orthogonal
matrix.
In QR-decompositions, we compute an orthogonal matrix Q and an upper
triangular matrix R (also known as a "right triangular" matrix; this is the
reason for the notation "R") such that A = QR. Advantages of the QR
decomposition include the fact that systems involving an upper triangular
matrix R can be solved by back substitution, the fact that Q is perfectly
conditioned (with condition number in the 2-norm equal to 1), and the fact
that the solution to Qy = b is simply y = Q^T b.

There are several ways of computing QR-decompositions. These are detailed,
for example, in our graduate-level text [1]. Here, we focus on the properties
of the decomposition and its use.

Note: The QR decomposition is not unique. Hence, different software may
come up with different QR decompositions for the same matrix.
3.4.1 Properties of Orthogonal Matrices
The following two properties, easily provable, make the QR decomposition
a numerically stable way of dealing with systems of equations.
THEOREM 3.12
Suppose Q is an n by n orthogonal matrix. Then Q has the following properties.

1. Q^T Q = I, that is, Q^T = Q^{-1}. Thus, solving the system Qy = b can be
   done with a matrix multiplication. (With the usual way of multiplying
   matrices, this is n^2 multiplications, more than with back substitution,
   but still O(n^2). Furthermore, it can be done with n dot products,
   something that is efficient on many machines.)

2. ‖Q‖_2 = ‖Q^T‖_2 = 1.

3. Hence, κ_2(Q) = 1, where κ_2(Q) is the condition number of Q in the
   2-norm. That is, Q is perfectly conditioned with respect to the 2-norm
   (and working with systems of equations involving Q will not lead to
   excessive roundoff error accumulation).

4. ‖Qx‖_2 = ‖x‖_2 for every x ∈ R^n. Hence ‖QA‖_2 = ‖A‖_2 for every n by
   n matrix A.
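These properties are easy to check numerically. The following short matlab
sketch (ours) extracts Q from a QR factorization and verifies the properties
up to roundoff:

% Verify the properties of an orthogonal matrix Q numerically.
A = [1 0 0; 1 1 1; 1 2 4; 1 3 9];
[Q, R] = qr(A);                 % Q is 4 by 4 and orthogonal
norm(Q'*Q - eye(4))             % approximately 0 (property 1)
norm(Q)                         % approximately 1 (property 2)
cond(Q)                         % approximately 1 (property 3)
x = randn(4,1);
norm(Q*x) - norm(x)             % approximately 0 (property 4)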
3.4.2 Least Squares and the QR Decomposition
Overdetermined linear systems (with more equations than unknowns) occur
frequently in data fitting, in mathematical modeling and statistics. For
example, we may have data of the form {(t_i, y_i)}_{i=1}^m, and we wish to model
the dependence of y on t by a linear combination of n basis functions {ϕ_j}_{j=1}^n,
that is,

   y ≈ f(t) = Σ_{i=1}^n x_i ϕ_i(t),                                     (3.18)

where m > n. Setting f(t_i) = y_i, 1 ≤ i ≤ m, gives the overdetermined linear
system
   [ ϕ_1(t_1)  ϕ_2(t_1)  · · ·  ϕ_n(t_1) ] [ x_1 ]   [ y_1 ]
   [ ϕ_1(t_2)  ϕ_2(t_2)  · · ·  ϕ_n(t_2) ] [ x_2 ]   [ y_2 ]
   [    ⋮         ⋮                ⋮     ] [  ⋮  ] = [  ⋮  ],           (3.19)
   [ ϕ_1(t_m)  ϕ_2(t_m)  · · ·  ϕ_n(t_m) ] [ x_n ]   [ y_m ]

that is,

   Ax = b,  where A ∈ L(R^n, R^m),  a_ij = ϕ_j(t_i),  and b_i = y_i.    (3.20)
Perhaps the most common way of fitting data is with least squares, in which
we find x* such that

   (1/2) ‖Ax* − b‖_2^2 = min_{x∈R^n} ϕ(x),   where  ϕ(x) = (1/2) ‖Ax − b‖_2^2.   (3.21)

(Note that x* minimizes the 2-norm of the residual vector r(x) = Ax − b,
since the function g(u) = u^2 is increasing.)
The naive way of finding x* is to set the gradient ∇ϕ(x) = 0 and simplify.
Doing so gives the normal equations:

   A^T A x = A^T b.                                                     (3.22)

(See Exercise 11 on page 143.) However, the normal equations tend to be
very ill-conditioned. For example, if m = n, κ_2(A^T A) = κ_2(A)^2. Fortunately,
the least squares solution x* may be computed with a QR decomposition. In
particular,

   ‖Ax − b‖_2 = ‖QRx − b‖_2 = ‖Q^T(QRx − b)‖_2 = ‖Rx − Q^T b‖_2.

(Above, we used ‖Ux‖_2 = ‖x‖_2 when U is orthogonal.) However,
   ‖Rx − Q^T b‖_2^2 = Σ_{i=1}^n ( Σ_{j=i}^n r_ij x_j − (Q^T b)_i )^2 + Σ_{i=n+1}^m (Q^T b)_i^2.   (3.23)

Observe now:

1. All m terms in the sums in (3.23) are nonnegative.

2. The first n terms can be made exactly zero.

3. The last m − n terms are constant.

Therefore,

   min_{x∈R^n} ‖Ax − b‖_2^2 = Σ_{i=n+1}^m (Q^T b)_i^2,

and the minimizer x* can be computed by backsolving the square triangular
system consisting of the first n rows of Rx = Q^T b.

We summarize these computations in the following algorithm.
ALGORITHM 3.5
(Least squares fits with a QR decomposition)

INPUT: the m by n matrix A, m ≥ n, and b ∈ R^m.

OUTPUT: the least squares fit x ∈ R^n such that ‖Ax − b‖_2 is minimized, as
well as the residual norm ‖Ax − b‖_2.

1. Compute Q and R such that Q is an m by m orthogonal matrix, R is an
   m by n upper triangular (or "right triangular") matrix, and A = QR.

2. Form y = Q^T b.

3. Solve the n by n upper triangular system R_{1:n,1:n} x = y_{1:n} using
   Algorithm 3.2 (the back-substitution algorithm). Here, R_{1:n,1:n}
   corresponds to A^(n) and y_{1:n} corresponds to b^(n).

4. Set the residual norm ‖Ax − b‖_2 to sqrt( Σ_{i=n+1}^m y_i^2 ) = ‖y_{n+1:m}‖_2.
END ALGORITHM 3.5.
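The following matlab function is our own sketch of Algorithm 3.5 (the name
qr_least_squares is not part of matlab):

function [x, resid_norm] = qr_least_squares(A, b)
% Sketch of Algorithm 3.5: least squares solution of the
% overdetermined system Ax = b (m >= n) via a QR decomposition.
  [m, n] = size(A);
  [Q, R] = qr(A);                % A = Q*R, with Q m by m orthogonal
  y = Q' * b;                    % step 2
  x = R(1:n, 1:n) \ y(1:n);      % step 3: back substitution
  resid_norm = norm(y(n+1:m));   % step 4
end

For the data of Example 3.29 below, this should return x ≈ (1.2, 2.2, 0)^T and
a residual norm of approximately 0.8944.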
Example 3.29
Consider fitting the data

   t   y
   0   1
   1   4
   2   5
   3   8

in the least squares sense with a polynomial of the form

   p_2(t) = x_0 ϕ_0(t) + x_1 ϕ_1(t) + x_2 ϕ_2(t),

where ϕ_0(t) ≡ 1, ϕ_1(t) ≡ t, and ϕ_2(t) ≡ t^2. The overdetermined system
(3.19) becomes

   [ 1  0  0 ] [ x_0 ]   [ 1 ]
   [ 1  1  1 ] [ x_1 ] = [ 4 ].
   [ 1  2  4 ] [ x_2 ]   [ 5 ]
   [ 1  3  9 ]           [ 8 ]
We use matlab to perform a QR decomposition and find the least squares
solution:
>> format short
>> clear x
>> A = [1 0 0
1 1 1
1 2 4
1 3 9]
A =
1 0 0
1 1 1
1 2 4
1 3 9
>> b = [1;4;5;8]
b =
1
4
5
8
>> [Q,R] = qr(A)
Q =
-0.5000 0.6708 0.5000 0.2236
-0.5000 0.2236 -0.5000 -0.6708
-0.5000 -0.2236 -0.5000 0.6708
-0.5000 -0.6708 0.5000 -0.2236
R =
-2.0000 -3.0000 -7.0000
0 -2.2361 -6.7082
0 0 2.0000
0 0 0
>> Qtb = Q’*b;
>> x(3) = Qtb(3)/R(3,3);
>> x(2) = (Qtb(2) - R(2,3)*x(3))/R(2,2);
>> x(1) = (Qtb(1) - R(1,2)*x(2) - R(1,3)*x(3))/R(1,1)
x =
    1.2000    2.2000    0.0000
>> x=x';
>> resid = A*x - b
resid =
    0.2000
   -0.6000
    0.6000
   -0.2000
>> tt = linspace(0,3);
>> yy = x(1) + x(2)*tt + x(3)*tt.^2;
>> axis([-0.1,3.1,0.9,8.1])
>> hold
Current plot held
>> plot(A(:,2),b,’LineStyle’,’none’,’Marker’,’*’,’MarkerEdgeColor’,’red’,’Markersize’,15)
>> plot(tt,yy)
>> y = Q’*b
y =
-9.0000
-4.9193
0.0000
-0.8944
>> x = R(1:n,1:n)\y(1:n)
x =
1.2000
2.2000
0.0000
>> resid_norm = norm(y(n+1:m),2)
resid_norm =
0.8944
>> norm(A*x-b,2)
ans =
0.8944
>>
>>
This dialog results in the following plot, illustrating the data points as stars
and the quadratic fit (which in this case happens to be linear) as a blue curve.
[Figure: the data points, plotted as stars, together with the least squares fit
p_2(t) = 1.2 + 2.2t, plotted as a curve over 0 ≤ t ≤ 3.]
Note that the fit does not pass exactly through any of the data points; the
residuals are largest at the second and third data points.
(The portion of the dialog following the plot commands illustrates alternative
views of the computation of x and the residual norm.)
Although working with the QR decomposition is a stable process, care
should be taken when computing Q and R. We discuss actually computing Q
and R in [1].
We now turn to iterative techniques for linear systems of equations.
3.5 Iterative Methods for Solving Linear Systems
Here, we study iterative solution of linear systems
   Ax = b,   i.e.,   Σ_{k=1}^n a_jk x_k = b_j,   j = 1, 2, . . . , n.   (3.24)
Example 3.30
Consider Example 3.18 (on page 93), where we replaced a second derivative
in a differential equation by a difference approximation, to obtain the system

   [  2  −1   0 ] [ x_1 ]          [ sin(π/4)  ]
   [ −1   2  −1 ] [ x_2 ] = (1/16) [ sin(π/2)  ].
   [  0  −1   2 ] [ x_3 ]          [ sin(3π/4) ]

In other words, the equations are

    2x_1 −  x_2        = (1/16) sin(π/4),
   −x_1  + 2x_2 −  x_3 = (1/16) sin(π/2),
          − x_2 + 2x_3 = (1/16) sin(3π/4).
Solving the first equation for x_1, the second equation for x_2, and the third
equation for x_3, we obtain

   x_1 = (1/2) [ (1/16) sin(π/4) + x_2 ],
   x_2 = (1/2) [ (1/16) sin(π/2) + x_1 + x_3 ],
   x_3 = (1/2) [ (1/16) sin(3π/4) + x_2 ],
which can be written in matrix form as

   [ x_1 ]   [  0   1/2   0  ] [ x_1 ]          [ sin(π/4)  ]
   [ x_2 ] = [ 1/2   0   1/2 ] [ x_2 ] + (1/32) [ sin(π/2)  ],
   [ x_3 ]   [  0   1/2   0  ] [ x_3 ]          [ sin(3π/4) ]

that is,

   x = Gx + c,                                                          (3.25)
with

       [ x_1 ]        [  0   1/2   0  ]                   [ sin(π/4)  ]
   x = [ x_2 ],   G = [ 1/2   0   1/2 ],   and c = (1/32) [ sin(π/2)  ].
       [ x_3 ]        [  0   1/2   0  ]                   [ sin(3π/4) ]
Equation (3.25) can form the basis of an iterative method:

   x^(k+1) = Gx^(k) + c.                                                (3.26)

Starting with x^(0) = (0, 0, 0)^T, we obtain the following in matlab:
>> x = [0,0,0]’
x =
0
0
0
>> G = [0 1/2 0
1/2 0 1/2
0 1/2 0]
G =
0 0.5000 0
0.5000 0 0.5000
0 0.5000 0
>> c = (1/32)*[sin(pi/4); sin(pi/2); sin(3*pi/4)]
c =
0.0221
0.0313
0.0221
>> x = G*x + c
x =
0.0221
0.0313
0.0221
>> x = G*x + c
x =
0.0377
0.0533
0.0377
>> x = G*x + c
x =
0.0488
0.0690
0.0488
>> x = G*x + c
x =
0.0566
0.0800
0.0566
>> x = G*x + c
x =
0.0621
0.0878
0.0621
>> x = G*x + c
x =
0.0660
0.0934
0.0660
>> x = G*x + c
x =
0.0688
0.0973
0.0688
>> x = G*x + c
x =
0.0707
0.1000
0.0707
>> x = G*x + c
x =
0.0721
0.1020
0.0721
>>
Comparing with the solution in Example 3.18, we see that the components
of x tend to the components of the solution to Ax = b as we iterate (3.26).
This is an example of an iterative method (namely, the Jacobi method) for
solving the system of equations Ax = b.

Good references for iterative solution of linear systems are [23, 30, 39, 44].
Why may we wish to solve (3.24) iteratively? Suppose that n = 10,000 or
more, which is not unreasonable for many problems. Then A has 10^8 elements,
making it difficult to store or solve (3.24) directly using, for example, Gaussian
elimination.
To discuss iterative techniques involving vectors and matrices, we use:
DEFINITION 3.23 A sequence of vectors {x_k}_{k=1}^∞ is said to converge
to a vector x ∈ C^n if and only if ‖x_k − x‖ → 0 as k → ∞ for some norm ‖·‖.

Definition 3.23 implies that a sequence of vectors {x_k} ⊂ R^n (or ⊂ C^n)
converges to x if and only if (x_k)_i → x_i as k → ∞ for all i.
Note: Iterates defined by (3.26) can be viewed as fixed point iterates that,
under certain conditions, converge to the fixed point.

DEFINITION 3.24 The iterative method defined by (3.26) is called
convergent if, for all initial values x^(0), we have x^(k) → A^{-1}b as k → ∞.
We now take a closer look at the Jacobi method, as well as the related
Gauss–Seidel method and SOR method.
3.5.1 The Jacobi Method
We can think of the Jacobi method illustrated in the above example in
matrix form as follows. Let L be the strictly lower triangular part of the
matrix A, U the strictly upper triangular part, and D the diagonal part.
Example 3.31
In Example 3.30,

       [  0   0   0 ]        [ 0  −1   0 ]             [ 2  0  0 ]
   L = [ −1   0   0 ],   U = [ 0   0  −1 ],   and  D = [ 0  2  0 ].
       [  0  −1   0 ]        [ 0   0   0 ]             [ 0  0  2 ]

Then the Jacobi method may be written in matrix form as

   G = −D^{-1}(L + U) ≡ J.                                              (3.27)

J is called the iteration matrix for the Jacobi method. The iterative method
becomes:

   x^(k+1) = −D^{-1}(L + U)x^(k) + D^{-1}b,   k = 0, 1, 2, . . .        (3.28)
Generally, one uses the following equations to solve for x^(k+1):

   x_i^(0) is given,

   x_i^(k+1) = (1/a_ii) [ b_i − Σ_{j=1}^{i−1} a_ij x_j^(k) − Σ_{j=i+1}^n a_ij x_j^(k) ],   (3.29)

for k ≥ 0 and 1 ≤ i ≤ n (where a sum is absent if its lower limit on j is larger
than its upper limit). Equations (3.29) are easily programmed.
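For example, a minimal matlab sketch of (3.29) (ours; the function name
jacobi_iteration is not from any library) is:

function x = jacobi_iteration(A, b, x0, n_iter)
% Sketch of the Jacobi method (3.29): perform n_iter sweeps
% starting from the initial guess x0.
  n = length(b);
  x = x0;
  for k = 1:n_iter
      x_old = x;                              % Jacobi uses only old values
      for i = 1:n
          s = b(i) - A(i,:)*x_old + A(i,i)*x_old(i);
          x(i) = s / A(i,i);
      end
  end
end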
3.5.2 The Gauss–Seidel Method
We now discuss the Gauss–Seidel method, or successive relaxation method.
If, in the Jacobi method, we use the new values of x_j as they become available,
then

   x_i^(0) is given,

   x_i^(k+1) = (1/a_ii) [ b_i − Σ_{j=1}^{i−1} a_ij x_j^(k+1) − Σ_{j=i+1}^n a_ij x_j^(k) ],   (3.30)

for k ≥ 0 and 1 ≤ i ≤ n. (We continue to assume that a_ii ≠ 0 for i =
1, 2, . . . , n.) The iterative method (3.30) is called the Gauss–Seidel method,
and can be written in matrix form with

   G = −(L + D)^{-1} U,

so

   x^(k+1) = −(L + D)^{-1} U x^(k) + (L + D)^{-1} b   for k ≥ 0.        (3.31)
Note: The Gauss–Seidel method only requires storage of

   ( x_1^(k+1), x_2^(k+1), . . . , x_{i−1}^(k+1), x_i^(k), x_{i+1}^(k), . . . , x_n^(k) )^T

to compute x_i^(k+1). The Jacobi method requires storage of x^(k) as well as
x^(k+1). Also, the Gauss–Seidel method generally converges faster. This gives
an advantage to the Gauss–Seidel method. However, on some machines,
separate rows of the Jacobi iteration equation may be processed simultaneously
in parallel, while the Gauss–Seidel method requires that the coordinates be
processed sequentially (with the equations in some specified order).
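A matlab sketch of one implementation of (3.30) (ours; gauss_seidel_iteration
is not a built-in) differs from the Jacobi sketch above only in that updated
components are used immediately:

function x = gauss_seidel_iteration(A, b, x0, n_iter)
% Sketch of the Gauss-Seidel method (3.30): components are updated in
% place, so new values are used as soon as they are available.
  n = length(b);
  x = x0;
  for k = 1:n_iter
      for i = 1:n
          s = b(i) - A(i,:)*x + A(i,i)*x(i);   % x holds mixed old/new values
          x(i) = s / A(i,i);
      end
  end
end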
Example 3.32
Consider

   [  2  1 ] [ x_1 ]   [ 3 ],   that is,    2x_1 +  x_2 = 3,
   [ −1  3 ] [ x_2 ] = [ 2 ]               −x_1  + 3x_2 = 2.

(The exact solution is x_1 = x_2 = 1.) The Jacobi and Gauss–Seidel methods
have the forms

   Jacobi:         x_1^(k+1) = 3/2 − (1/2) x_2^(k),
                   x_2^(k+1) = 2/3 + (1/3) x_1^(k),

   Gauss–Seidel:   x_1^(k+1) = 3/2 − (1/2) x_2^(k),
                   x_2^(k+1) = 2/3 + (1/3) x_1^(k+1).

The results in Table 3.2 are obtained with x^(0) = (0, 0)^T. Observe that the
Gauss–Seidel method converges roughly twice as fast as the Jacobi method.
This behavior is provable.

TABLE 3.2: Iterates of the Jacobi and Gauss–Seidel methods, for Example 3.32

   k   x_1^(k) Jacobi   x_2^(k) Jacobi   x_1^(k) G–S   x_2^(k) G–S
   0       0                0                0             0
   1       1.5              0.667            1.5           1.167
   2       1.167            1.167            0.917         0.972
   3       0.917            1.056            1.014         1.005
   4       0.972            0.972            0.998         0.999
   5       1.014            0.991            1.000         1.000
   6       1.005            1.005
   7       0.998            1.002
   8       0.999            0.999
   9       1.000            1.000
3.5.3 Successive Overrelaxation
We now describe Successive OverRelaxation (SOR). In the SOR method,
one computes x_i^(k+1) to be a weighted mean of x_i^(k) and the Gauss–Seidel
iterate for that element. Specifically, for σ ≠ 0 a real parameter, the SOR
method is given by

   x_i^(0) is given,

   x_i^(k+1) = (1 − σ) x_i^(k)
             + (σ/a_ii) [ b_i − Σ_{j=1}^{i−1} a_ij x_j^(k+1) − Σ_{j=i+1}^n a_ij x_j^(k) ],   (3.32)
for 1 ≤ i ≤ n and for k ≥ 0. The parameter σ is called a relaxation factor. If
σ < 1, we call σ an underrelaxation factor and if σ > 1, we call σ an overre-
laxation factor. Note that if σ = 1, the Gauss–Seidel method is obtained.
Note: For certain classes of matrices and certain σ between 1 and 2, the SOR
method converges faster than the Gauss–Seidel method.

We can write (3.32) in the matrix form:

   ( L + (1/σ) D ) x^(k+1) = − ( U + (1 − 1/σ) D ) x^(k) + b            (3.33)

for k = 0, 1, 2, . . . , with x^(0) given. Thus,

   G = (σL + D)^{-1} [ (1 − σ)D − σU ] ≡ S_σ,

and

   x^(k+1) = S_σ x^(k) + ( L + (1/σ) D )^{-1} b.                        (3.34)

The matrix S_σ is called the SOR matrix. Note that σ = 1 gives the
Gauss–Seidel iteration matrix.
A classic reference on iterative methods, and the SOR method in particular,
is [44].
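A matlab sketch of one SOR sweep based on (3.32) (ours; sor_iteration is
not a built-in) is the Gauss–Seidel sweep with the weighted average applied
componentwise:

function x = sor_iteration(A, b, x0, sigma, n_iter)
% Sketch of the SOR method (3.32) with relaxation factor sigma;
% sigma = 1 reproduces the Gauss-Seidel method.
  n = length(b);
  x = x0;
  for k = 1:n_iter
      for i = 1:n
          gs = (b(i) - A(i,:)*x + A(i,i)*x(i)) / A(i,i);   % Gauss-Seidel value
          x(i) = (1 - sigma)*x(i) + sigma*gs;
      end
  end
end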
3.5.4 Convergence of Iterative Methods
The general iteration equation (3.26) (on page 122) gives

   x^(k+1) = Gx^(k) + c   and   x^(k) = Gx^(k−1) + c.

Subtracting these equations and using properties of vector addition and
matrix-vector multiplication gives

   x^(k+1) − x^(k) = G( x^(k) − x^(k−1) ).                              (3.35)

Furthermore, similar rearrangements give

   (I − G)( x^(k) − x ) = x^(k) − x^(k+1)                               (3.36)

because

   x = Gx + c   and   x^(k+1) = Gx^(k) + c.

Combining (3.35) and (3.36) gives

   x^(k) − x = −(I − G)^{-1} G ( x^(k) − x^(k−1) )
             = −(I − G)^{-1} G^2 ( x^(k−1) − x^(k−2) ) = · · · ,

and taking norms gives

   ‖x^(k) − x‖ ≤ ‖(I − G)^{-1}‖ ‖G‖ ‖x^(k) − x^(k−1)‖
              = ‖(I − G)^{-1}‖ ‖G‖ ‖G( x^(k−1) − x^(k−2) )‖
              ≤ ‖(I − G)^{-1}‖ ‖G‖^2 ‖x^(k−1) − x^(k−2)‖
              ⋮
              ≤ ‖(I − G)^{-1}‖ ‖G‖^k ‖x^(1) − x^(0)‖.
It is not hard to show that, for any induced matrix norm, if ‖G‖ < 1 then

   ‖(I − G)^{-1}‖ ≤ 1 / (1 − ‖G‖).

Therefore,

   ‖x^(k) − x‖ ≤ ( ‖G‖^k / (1 − ‖G‖) ) ‖x^(1) − x^(0)‖.                 (3.37)

The practical importance of this error estimate is that we can expect linear
convergence of our iterative method when ‖G‖ < 1.
Example 3.33
We revisit Example 3.30, with the following matlab dialog:
>> x = [0,0,0]’
x =
0
0
0
>> G = [0 1/2 0
1/2 0 1/2
0 1/2 0]
G =
0 0.5000 0
0.5000 0 0.5000
0 0.5000 0
>> c = (1/32)*[sin(pi/4); sin(pi/2); sin(3*pi/4)]
c =
0.0221
0.0313
0.0221
>> exact_solution = (eye(3)-G)\c
exact_solution =
0.0754
0.1067
0.0754
>> normG = norm(G)
normG =
0.7071
>> for i=1:5;
old_norm = norm(x-exact_solution);
x = G*x + c;
new_norm = norm(x-exact_solution);
ratio = new_norm/old_norm
end
ratio =
0.7071
ratio =
0.7071
ratio =
0.7071
ratio =
0.7071
ratio =
0.7071
>> x
x =
0.0660
0.0934
0.0660
>>
We thus see linear convergence with the Jacobi method, with convergence
factor �G� ≈ 0.7071, just as we discussed in Section 1.1.3 (page 7) and our
study of the fixed point method for solving a single nonlinear equation (Sec-
tion 2.2, starting on page 47).
Example 3.34
We examine the norm of the iteration matrix for the Gauss–Seidel method
for Example 3.30:
>> L = [0 0 0
-1 0 0
0 -1 0]
L =
0 0 0
-1 0 0
0 -1 0
>> U = [0 -1 0
0 0 -1
0 0 0]
U =
0 -1 0
0 0 -1
0 0 0
>> D = [2 0 0
0 2 0
0 0 2]
D =
2 0 0
0 2 0
0 0 2
>> GS = -inv(L+D)*U
GS =
0 0.5000 0
0 0.2500 0.5000
0 0.1250 0.2500
>> norm(GS)
ans =
0.6905
We see that this norm is less than the norm of the iteration matrix for
the Jacobi method, so we may expect the Gauss–Seidel method to converge
somewhat faster.
The error estimates hold if ‖·‖ is any norm. Furthermore, it is possible to
prove the following.

THEOREM 3.13
Suppose

   ρ(G) < 1,

where ρ(G) is the spectral radius of G, that is,

   ρ(G) = max{ |λ| : λ is an eigenvalue of G }.

Then the iterative method

   x^(k+1) = Gx^(k) + c

converges.
In particular, the Jacobi method and Gauss–Seidel method for matrices
of the form in Example 3.30 all converge, although �G� becomes nearer to 1
(and hence, the convergence is slower), the finer we subdivide the interval [0, 1]
(and hence the larger n becomes). There is some theory relating the spectral
radius of various iteration matrices, and matrices arising from discretizations
such as in Example 3.30 have been analyzed extensively.
One criterion that is easy to check is diagonal dominance, as defined in
Remark 3.1 on page 88:
THEOREM 3.14
Suppose

   |a_ii| ≥ Σ_{j=1, j≠i}^n |a_ij|,   for i = 1, 2, · · · , n,

and suppose that the inequality is strict for at least one i. Then the Jacobi
method and Gauss–Seidel method for Ax = b converge.

We present a more detailed analysis in [1].
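The hypothesis of Theorem 3.14 is easy to check computationally. Here is a
small sketch of our own that tests weak diagonal dominance with strictness
in at least one row, for a matrix A already in the workspace:

% Check the diagonal dominance condition of Theorem 3.14.
d = abs(diag(A));
off = sum(abs(A), 2) - d;            % row sums of off-diagonal magnitudes
weakly_dominant  = all(d >= off);
strict_somewhere = any(d > off);
satisfies_theorem_3_14 = weakly_dominant && strict_somewhere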
3.5.5 The Interval Gauss–Seidel Method
The interval Gauss–Seidel method is an alternative to the interval version of
Gaussian elimination of Section 3.3.4 (on page 111) for using floating point
arithmetic to obtain mathematically rigorous lower and upper bounds on the
solution to a system of linear equations. The interval Gauss–Seidel method
has several advantages, especially when there are uncertainties in the
right-hand-side vector b that are represented in the form of relatively wide
intervals [b̲_i, b̄_i], and when there are also uncertainties [a̲_ij, ā_ij] in the
coefficients of the matrix A. That is, we assume that the matrix is A ∈ IR^{n×n},
b ∈ IR^n, and we wish to find an interval vector (or "box") x that bounds

   Σ(A, b) = { x | Ax = b for some A ∈ A and some b ∈ b },              (3.38)

where IR^{n×n} denotes the set of all n by n matrices whose entries are
intervals, IR^n denotes the set of all n-vectors whose entries are intervals, and
A ∈ A means that each element of the point matrix A is contained in the
corresponding element of the interval matrix A (and similarly for b ∈ b).

The interval Gauss–Seidel method is similar to the point Gauss–Seidel
method as defined in (3.30) on page 124, except that, for general systems,
we almost always precondition. In particular, let Ã = Y A and b̃ = Y b,
where Y is a preconditioning matrix. We then have the preconditioned system

   Y Ax = Y b,   i.e.,   Ãx = b̃.                                        (3.39)

We have
THEOREM 3.15
(The solution set for the preconditioned system contains the solution set for
the original system.)   Σ(A, b) ⊆ Σ(Y A, Y b) = Σ(Ã, b̃).
This theorem is a fairly straightforward consequence of the subdistributivity
(Equation (1.4) on page 26) of interval arithmetic. For a proof of this and
other facts concerning interval linear systems, see, for example, [29].
Analogously to the noninterval version of Gauss–Seidel iteration (3.30), the
interval Gauss–Seidel method is given as
   x_i^(k+1) ← (1/ã_ii) [ b̃_i − Σ_{j=1}^{i−1} ã_ij x_j^(k+1) − Σ_{j=i+1}^n ã_ij x_j^(k) ]   (3.40)

for i = 1, 2, . . . , n, where a sum is interpreted to be absent if its lower index
is greater than its upper index, and with x_i^(0) given for i = 1, 2, . . . , n.
REMARK 3.9 As with the interval version of Gaussian elimination
(Algorithm 3.4 on page 111), a common preconditioner Y for the interval
Gauss–Seidel method is the inverse midpoint matrix Y = (m(A))^{-1}, where
m(A) is the matrix whose elements are midpoints of corresponding elements
of the interval matrix A. However, when the elements of A have particularly
large widths, specially designed preconditioners (see [20, Chapter 3]) may be
more appropriate.

REMARK 3.10 Point iterative methods are also often preconditioned.
However, computing an inverse of a point matrix A leads to Y A ≈ I, where
I is the identity matrix, so the system will already have been solved (except
for, possibly, iterative refinement). Moreover, such point iterative methods
are usually employed for very large systems of equations, with matrices with
"0" for many elements. Although the elements that are 0 need not be stored,
the inverse generally does not have 0's in any of its elements [13], so it may
be impractical to even store the inverse, let alone compute it. (Of course,
the inverse could be computed one row at a time, but this may still be
impractical for large systems.) Thus, special approximations are used for
these preconditioners; much work has appeared in the research literature on
such preconditioners. Preconditioners for the point Gauss–Seidel method,
the conjugate gradient method (explained in our graduate text [1]), etc. are
often viewed as operators that increase the separation between the largest
eigenvalue of A and the remaining eigenvalues of A, rather than as
computations of an approximate inverse.
The following theorem tells us that the interval Gauss–Seidel method can
be used to prove existence and uniqueness of a solution of a system of linear
equations.
THEOREM 3.16
Suppose (3.40) is used, starting with initial interval vector x^(0), and obtaining
interval vector x^(k) after a number of iterations. Then, if x^(k) ⊆ x^(0), for each
A ∈ A and each b ∈ b, there is an x ∈ x^(k) such that Ax = b.
The proof of Theorem 3.16 can be found in many places, such as in [20] or
[29].
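As an illustration only, here is a sketch of one preconditioned interval
Gauss–Seidel sweep written with intlab (our own code, not part of intlab;
it assumes the intlab functions intval and mid behave as in the examples of
this chapter, and it does not intersect the result with the previous iterate, as
more careful implementations do):

function x = interval_gauss_seidel_sweep(A, b, x)
% One sweep of the preconditioned interval Gauss-Seidel method (3.40).
% A and b are interval (intval) quantities; x is the current enclosure,
% also an intval vector.
  n = length(b);
  Y = inv(mid(A));          % floating point preconditioner
  Atilde = Y * A;           % computed with interval arithmetic
  btilde = Y * b;
  for i = 1:n
      s = btilde(i);
      for j = [1:i-1, i+1:n]
          s = s - Atilde(i,j) * x(j);
      end
      x(i) = s / Atilde(i,i);
  end
end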
Example 3.35
Consider Ax = b, where

   A = [ [0.99, 1.01]  [1.99, 2.01] ],   b = [ [−1.01, −0.99] ],   x^(0) = [ [−10, 10] ].
       [ [2.99, 3.01]  [3.99, 4.01] ]        [ [ 0.99,  1.01] ]            [ [−10, 10] ]

Then (with the computations done with the aid of intlab, a matlab toolbox
available free of charge for non-commercial use),

   m(A) = [ 1  2 ],   Y ≈ m(A)^{-1} = [ −2.0   1.0 ],
          [ 3  4 ]                    [  1.5  −0.5 ]

   Ã = Y A ⊆ [ [ 0.97, 1.03]   [−0.03, 0.03] ],   b̃ = Y b ⊆ [ [ 2.97,  3.03] ].
             [ [−0.02, 0.02]   [ 0.98, 1.02] ]               [ [−2.02, −1.98] ]

We then have

   x_1^(1) ← (1/[0.97, 1.03]) ( [2.97, 3.03] − [−0.03, 0.03][−10, 10] ) ⊆ [2.5922, 3.4330],

   x_2^(1) ← (1/[0.98, 1.02]) ( [−2.02, −1.98] − [−0.02, 0.02][2.5922, 3.4330] ) ⊆ [−2.1313, −1.8738].

If we continue this process, we eventually obtain

   x^(4) = ( [2.8215, 3.1895], [−2.1264, −1.8786] )^T,

which, to four significant figures, is the same as x^(3). Thus, we have found
mathematically rigorous bounds on the set of all solutions to Ax = b such
that A ∈ A and b ∈ b.
that A ∈ A and b ∈ b.
In Example 3.35, uncertainties of ±0.01 are present in each element of the
matrix and right-hand-side vector. Although the bounds produced with the
preconditioned interval Gauss–Seidel method are not guaranteed to be the
tightest possible with these uncertainties, they will be closer to the tightest
possible when the uncertainties are smaller.
Convergence of the interval Gauss–Seidel method is related closely to con-
vergence of the point Gauss–Seidel method, through the concept of diagonal
dominance. We give a hint of this convergence theory here.
DEFINITION 3.25 If a = [a̲, ā] is an interval, then the magnitude of a
is defined to be

   mag(a) = max{ |a̲|, |ā| }.

Similarly, the mignitude of a is defined to be

   mig(a) = min_{a∈a} |a|.
Given the matrix Ã, form the matrix H = (h_ij) such that

   h_ij = { mag(ã_ij) if i ≠ j,
          { mig(ã_ij) if i = j.

Then, basically, the interval Gauss–Seidel method will be convergent if H is
diagonally dominant.
For a careful review of convergence theory for the interval Gauss–Seidel
method and other interval methods for linear systems, see [29]. Also, see [32].
3.6 The Singular Value Decomposition
The singular value decomposition, which we will abbreviate “SVD,” is not
always the most efficient way of analyzing a linear system, but is extremely
flexible, and is sometimes used in signal processing (smoothing), sensitivity
analysis, statistical analysis, etc., especially if a large amount of information
about the numerical properties of the system is desired. The major libraries
for programmers (e.g. Lapack) and software systems (e.g. matlab, Mathe-
matica) have facilities for computing the SVD. The SVD is often used in the
same context as a QR factorization, but the component matrices in an SVD
are computed with an iterative technique related to techniques for computing
eigenvalues and eigenvectors (in Chapter 5 of this book).
The following theorem defines the SVD.
THEOREM 3.17
Let A be an m by n real matrix, but otherwise arbitrary. Then there are
orthogonal matrices U and V and an m by n matrix Σ = [Σ_ij], with Σ_ij = 0
for i ≠ j, Σ_ii = σ_i ≥ 0 for 1 ≤ i ≤ p = min{m, n}, and σ_1 ≥ σ_2 ≥ · · · ≥ σ_p,
such that

   A = UΣV^T.

For a proof and further explanation, see G. W. Stewart, Introduction to
Matrix Computations [35] or G. H. Golub and C. F. van Loan, Matrix
Computations [16]. (Gene Golub, a famous numerical analyst, a professor of
Computer Science and, for many years, department chairman at Stanford
University, invented the efficient algorithm used today for computing the
singular value decomposition.)

Note: The SVD for a particular matrix is not necessarily unique.

Note: The SVD is defined similarly for complex matrices (that is, matrices
whose elements are complex numbers).
REMARK 3.11 A simple algorithm to find a singular value decomposition
is: (1) find the nonzero eigenvalues λ_i, i = 1, 2, . . . , r, of A^T A, (2) find
orthonormal eigenvectors of A^T A and arrange them as the columns of the
n × n matrix V, (3) form the m × n matrix Σ with diagonal entries
σ_i = sqrt(λ_i), (4) let u_i = σ_i^{-1} A v_i, i = 1, 2, . . . , r, and compute u_i,
i = r + 1, r + 2, . . . , m, using Gram–Schmidt orthogonalization. However, a
well-known efficient method for computing the SVD is the Golub–Reinsch
algorithm [36], which employs Householder bidiagonalization and a variant
of the QR method.
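The following matlab sketch (ours) carries out the simple construction of
Remark 3.11 for a small square full-rank matrix; for rank-deficient or
rectangular matrices the remaining columns of U would have to be completed
by Gram–Schmidt, and in practice one would simply call svd:

% Naive SVD of a small full-rank matrix A via the eigensystem of A'*A.
A = [1 2; 3 4];
[V, Lambda] = eig(A'*A);                  % A'*A is symmetric, so V is orthogonal
[lambda, idx] = sort(diag(Lambda), 'descend');
V = V(:, idx);
sigma = sqrt(lambda);
U = A * V * diag(1 ./ sigma);             % u_i = (1/sigma_i) * A * v_i
Sigma = diag(sigma);
norm(A - U*Sigma*V')                      % approximately 0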
Example 3.36
Let

   A = [ 1  2 ]
       [ 3  4 ].
       [ 5  6 ]

Then

       [ −0.2298   0.8835   0.4082 ]        [ 9.5255  0      ]
   U ≈ [ −0.5247   0.2408  −0.8165 ],   Σ ≈ [ 0       0.5143 ],   and
       [ −0.8196  −0.4019   0.4082 ]        [ 0       0      ]

   V ≈ [ −0.6196  −0.7849 ]
       [ −0.7849   0.6196 ]

is a singular value decomposition of A. This approximate singular value
decomposition was obtained with the following matlab dialog.
composition was obtained with the following matlab dialog.
>> A = [1 2;3 4;5 6]
A =
1 2
3 4
5 6
>> [U,Sigma,V] = svd(A)
U =
-0.2298 0.8835 0.4082
-0.5247 0.2408 -0.8165
-0.8196 -0.4019 0.4082
Sigma =
9.5255 0
0 0.5143
0 0
V =
-0.6196 -0.7849
-0.7849 0.6196
>> U*Sigma*V’
ans =
1.0000 2.0000
3.0000 4.0000
5.0000 6.0000
>>
Note: If A = UΣV^T represents a singular value decomposition of A, then,
for Ã = A^T, Ã = V Σ^T U^T represents a singular value decomposition of Ã.

DEFINITION 3.26 The vectors V(:, i), 1 ≤ i ≤ p, are called the right
singular vectors of A, while the corresponding U(:, i) are called the left
singular vectors of A corresponding to the singular values σ_i.
The singular values are like eigenvalues, and the singular vectors are like
eigenvectors. In fact, we have

THEOREM 3.18
Let the n by n matrix A be symmetric and positive definite. Let {λ_i}_{i=1}^n be
the eigenvalues of A, ordered so that λ_1 ≥ λ_2 ≥ · · · ≥ λ_n, and let v_i be the
eigenvector corresponding to λ_i. Furthermore, choose the v_i so that {v_i}_{i=1}^n
is an orthonormal set, and form V = [v_1, · · · , v_n] and Λ = diag(λ_1, · · · , λ_n).
Then

   A = V Λ V^T

represents a singular value decomposition of A.

This theorem follows directly from the definition of the SVD. We also have

THEOREM 3.19
Let the n by n matrix A be invertible, and let A = UΣV^T represent a singular
value decomposition of A. Then the 2-norm condition number of A is
κ_2(A) = σ_1 / σ_n.
Thus, the condition number of a matrix is obtainable directly from the
SVD, but the SVD gives us more useful information about the sensitivity of
solutions than just that single number, as we’ll see shortly.
The singular value decomposition is related directly to the Moore–Penrose
pseudo-inverse. In fact, the pseudo-inverse can be defined directly in terms
of the singular value decomposition.
DEFINITION 3.27 Let A be an m by n matrix, let A = UΣV^T represent
a singular value decomposition of A, and assume r ≤ p is such that σ_1 ≥ σ_2 ≥
· · · ≥ σ_r > 0 and σ_{r+1} = σ_{r+2} = · · · = σ_p = 0. Then the Moore–Penrose
pseudo-inverse of A is defined to be

   A^+ = V Σ^+ U^T,

where Σ^+ = [Σ^+_ij] is an n by m matrix such that Σ^+_ij = 0 if i ≠ j or i > r,
and Σ^+_ii = 1/σ_i if 1 ≤ i ≤ r.
Part of the power of the singular value decomposition comes from the
following.

THEOREM 3.20
Suppose A is an m by n matrix and we wish to find approximate solutions to
Ax = b, where b ∈ R^m. Then:

• If Ax = b is inconsistent, then x = A^+ b represents the least squares
  solution of minimum 2-norm.

• If Ax = b is consistent (but possibly underdetermined), then x = A^+ b
  represents the solution of minimum 2-norm.

• In general, x = A^+ b represents the least squares solution to Ax = b of
  minimum 2-norm.
The proof of Theorem 3.20 is left as an exercise (on page 144).
REMARK 3.12 If m < n, one would expect the system to be underdetermined
but full rank. In that case, A^+ b gives the solution x such that ‖x‖_2 is
minimum; however, if the system were also inconsistent, then there would be
many least squares solutions, and A^+ b would be the least squares solution of
minimum norm. Similarly, if m > n, one would expect there to be a single
least squares solution; however, if the rank of A is r < p = n, then there would
be many such least squares solutions, and A^+ b would be the least squares
solution of minimum norm.
Example 3.37
Consider Ax = b, where

   A = [ 1  2  3 ]   and   b = [ −1 ].
       [ 4  5  6 ]             [  0 ]
       [ 7  8  9 ]             [  1 ]

Then

       [ −0.2148   0.8872   0.4082 ]        [ 16.8481  0       0      ]
   U ≈ [ −0.5206   0.2496  −0.8165 ],   Σ ≈ [ 0        1.0684  0      ],
       [ −0.8263  −0.3879   0.4082 ]        [ 0        0       0.0000 ]

       [ −0.4797  −0.7767  −0.4082 ]              [ 0.0594  0       0 ]
   V ≈ [ −0.5724  −0.0757   0.8165 ],   and Σ^+ ≈ [ 0       0.9360  0 ].
       [ −0.6651   0.6253  −0.4082 ]              [ 0       0       0 ]

Since σ_3 = 0, we note that the system is not of full rank, so it could be either
inconsistent or underdetermined. We compute x ≈ [0.9444, 0.1111, −0.7222]^T,
and we obtain ‖Ax − b‖_2 ≈ 2.5 × 10^{-15}. (The computations in this example
were done using matlab, and were thus done in IEEE double precision. The
digits displayed here are the results from that computation, rounded to four
significant decimal digits with matlab's intrinsic display routines.) Thus,
Ax = b, although apparently underdetermined, is apparently consistent, and
x represents that solution of Ax = b which has minimum 2-norm.

As with other methods for computing solutions, we usually do not form the
pseudo-inverse A^+ to compute A^+ b, but we use the following.
ALGORITHM 3.6
(Computing A^+ b)

INPUT:

(a) the m by n matrix A ∈ L(R^n, R^m),

(b) the right-hand-side vector b ∈ R^m,

(c) a tolerance ε such that a singular value σ_i is considered to be equal to
    0 if σ_i / σ_1 < ε.

OUTPUT: an approximation x to A^+ b.

1. Compute the SVD of A, that is, compute approximations to U ∈ L(R^m),
   Σ ∈ L(R^n, R^m), and V ∈ L(R^n) such that A = UΣV^T.

2. p ← min{m, n}.

3. r ← p.

4. FOR i = 1 to p
      IF σ_i / σ_1 > ε THEN
         σ_i^+ ← 1/σ_i.
      ELSE
         i.  r ← i − 1.
         ii. EXIT FOR
      END IF
   END FOR

5. Compute w = (w_1, · · · , w_r)^T ∈ R^r,  w ← U(:, 1:r)^T b, where U(:, 1:r) ∈
   R^{m×r} is the matrix whose columns are the first r columns of U.

6. FOR i = 1 to r:  w_i ← σ_i^+ w_i.

7. x ← Σ_{i=1}^r w_i V(:, i).

END ALGORITHM 3.6.
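A matlab sketch of Algorithm 3.6 (ours; the name pseudo_inverse_solve is
not from matlab, whose built-in pinv performs a similar truncation) is:

function x = pseudo_inverse_solve(A, b, epsilon)
% Sketch of Algorithm 3.6: compute an approximation to A^+ b,
% treating singular values with sigma_i/sigma_1 < epsilon as zero.
  [U, S, V] = svd(A);
  sigma = diag(S);
  r = sum(sigma/sigma(1) > epsilon);   % number of singular values kept
  w = U(:, 1:r)' * b;                  % step 5
  w = w ./ sigma(1:r);                 % step 6
  x = V(:, 1:r) * w;                   % step 7
end

For the data of Example 3.37, pseudo_inverse_solve(A, b, 1e-10) should
return approximately (0.9444, 0.1111, −0.7222)^T, matching the x reported
there.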
REMARK 3.13 Ill-conditioning (i.e., sensitivity to roundoff error) in the
computations in Algorithm 3.6 occurs when small singular values σ_i are used.
For example, suppose σ_i/σ_1 ≈ 10^{-6}, and there is an error δU(:, i) in the
vector b, that is, b = b̃ − δU(:, i) (that is, we perturb b by δ in the direction
of U(:, i)). Then, instead of A^+ b, we compute

   A^+ (b + δU(:, i)) = A^+ b + A^+ δU(:, i) = A^+ b + δ (1/σ_i) V(:, i).   (3.41)

Thus, the norm of the error δU(:, i) is magnified by 1/σ_i. Now, if, in addition,
b happened to be in the direction of U(:, 1), that is, b = δ_1 U(:, 1), then
‖A^+ b‖_2 = ‖δ_1 (1/σ_1) V(:, 1)‖_2 = (1/σ_1) ‖b‖_2. Thus, the relative error, in
this case, would be magnified by σ_1/σ_i.
In view of Remark 3.13, we are led to consider modifying the problem
slightly to reduce the sensitivity to roundoff error. For example, suppose that
we are data fitting, with m data points (t_i, y_i) (as in Section 3.4 on page 117),
and A is the matrix as in Equation (3.19), where m ≫ n. Then we assume
there is some error in the right-hand-side vector b. However, since {U(:, i)}
forms an orthonormal basis for R^m,

   b = Σ_{i=1}^m β_i U(:, i)   for some coefficients {β_i}_{i=1}^m.

Therefore, U^T b = (β_1, . . . , β_m)^T, and we see that x will be more sensitive to
changes in the components β_i of b corresponding to larger indices i (that is,
to the smaller singular values). If we know that typical errors in the data are
on the order of ε̂, then, intuitively, it makes sense not to use components of
b for which the magnification of errors will be larger than that. That is, it
makes sense in such cases to choose ε = ε̂ in Algorithm 3.6.

Use of ε ≠ 0 in Algorithm 3.6 can be viewed as replacing the smallest
singular values of the matrix A by 0. In the case that A ∈ L(R^n) is square
and only σ_n is replaced by zero, this amounts to replacing an ill-conditioned
matrix A by a matrix that is exactly singular. One (of many possible)
theorems dealing with this replacement process is
THEOREM 3.21
Suppose A is an n by n matrix, and suppose we replace σ_n ≠ 0 in the singular
value decomposition of A by 0, then form Ã = UΣ̃V^T, where A = UΣV^T
represents the singular value decomposition of A, and Σ̃ = diag(σ_1, · · · , σ_{n−1}, 0).
Then

   ‖A − Ã‖_2 = min { ‖A − B‖_2 : B ∈ L(R^n), rank(B) < n }.
Suppose now that Ã has been obtained from A by replacing the smallest
singular values of A by 0, so the nonzero singular values of Ã are σ_1 ≥ σ_2 ≥
· · · ≥ σ_r > 0, and define x = Ã^+ b. Then, perturbations of size ‖Δb‖ in b
result in perturbations of size at most (σ_1/σ_r)‖Δb‖ in x. This prompts us to
define a generalization of the condition number as follows.

DEFINITION 3.28 Let A be an m by n matrix with m and n arbitrary,
and assume the nonzero singular values of A are σ_1 ≥ σ_2 ≥ · · · ≥ σ_r > 0.
Then the generalized condition number of A is σ_1/σ_r.
Example 3.38
Consider

   A = [ 1  2   3 ],
       [ 4  5   6 ]
       [ 7  8  10 ]

whose singular value decomposition is approximately

       [ 0.2093   0.9644   0.1617 ]        [ 17.4125  0       0      ]
   U ≈ [ 0.5038   0.0353  −0.8631 ],   Σ ≈ [ 0        0.8752  0      ],   and
       [ 0.8380  −0.2621   0.4785 ]        [ 0        0       0.1969 ]

       [ 0.4647  −0.8333   0.2995 ]
   V ≈ [ 0.5538   0.0095  −0.8326 ].
       [ 0.6910   0.5528   0.4659 ]

Suppose we want to solve the system Ax = b, where b = [1, −1, 1]^T, but that,
due to noise in the data, we do not wish to deal with any system of equations
with condition number equal to 25 or greater. How can we describe the set of
solutions, based on the best information we can obtain from the noisy data?

We first observe that κ_2(A) = σ_1/σ_3 ≈ 88.4483. However, σ_1/σ_2 ≈
19.8963 < 25. We may thus form a new matrix Ã = UΣ̃V^T, where Σ̃ is
obtained from Σ by replacing σ_3 by 0. This is equivalent to projecting A
onto the set of singular matrices according to Theorem 3.21. We then use
Algorithm 3.6 (applied to Ã) to determine x as x = Ã^+ b. We obtain
x ≈ (−0.6205, 0.0245, 0.4428)^T. Thus, to within the accuracy of 1/25 = 4%,
we can only determine that the solution lies along the line

   [ −0.6205 ]
   [  0.0245 ] + y_3 V(:, 3),   y_3 ∈ R.
   [  0.4428 ]

This technique is a common type of analysis in data fitting. The parameter
y_3 (or multiple parameters, in case of higher-order rank deficiency) needs to
be chosen through other information available with the application.
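The computation in Example 3.38 can be reproduced with a few lines of
matlab (our own sketch, using the pseudo_inverse_solve function sketched
after Algorithm 3.6; the tolerance 1/25 discards singular values whose ratio
to σ_1 is below 1/25):

% Reproduce Example 3.38: drop singular values that would make the
% generalized condition number 25 or greater.
A = [1 2 3; 4 5 6; 7 8 10];
b = [1; -1; 1];
x = pseudo_inverse_solve(A, b, 1/25)   % should be approx (-0.6205, 0.0245, 0.4428)'
[U, S, V] = svd(A);
direction = V(:,3)                     % the solution set is x + y3*direction, y3 real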
3.7 Applications
Consider the following difference equation model [3], which describes the
dynamics of a population divided into three stages:

   J(t + 1) = (1 − γ_1) s_1 J(t) + b B(t),
   N(t + 1) = γ_1 s_1 J(t) + (1 − γ_2) s_2 N(t),                        (3.42)
   B(t + 1) = γ_2 s_2 N(t) + s_3 B(t).

The variables J(t), N(t), and B(t) represent the number of juveniles,
non-breeders, and breeders, respectively, at time t. The parameter b > 0 is
the birth rate, while γ_1, γ_2 ∈ (0, 1) represent the fraction (in one time unit)
of juveniles that become non-breeders and non-breeders that become breeders,
respectively. Parameters s_1, s_2, s_3 ∈ (0, 1) are the survivor rates of juveniles,
non-breeders, and breeders, respectively.

To analyze the model numerically, we let b = 0.6, γ_1 = 0.8, γ_2 = 0.7,
s_1 = 0.7, s_2 = 0.8, s_3 = 0.9. Also notice the model can be written as

   [ J(t + 1) ]   [ 0.14  0     0.6 ] [ J(t) ]
   [ N(t + 1) ] = [ 0.56  0.24  0   ] [ N(t) ],
   [ B(t + 1) ]   [ 0     0.56  0.9 ] [ B(t) ]

or in matrix form

   X(t + 1) = AX(t),

where X(t) = (J(t), N(t), B(t))^T and

   A = [ 0.14  0     0.6 ].
       [ 0.56  0.24  0   ]
       [ 0     0.56  0.9 ]
Suppose we know all the eigenvectors v_i, i = 1, 2, 3, and their associated
eigenvalues λ_i, i = 1, 2, 3, of the matrix A. From linear algebra, any initial
vector X(0) can be expressed as a linear combination of the eigenvectors:

   X(0) = c_1 v_1 + c_2 v_2 + c_3 v_3,

so

   X(1) = AX(0) = A(c_1 v_1 + c_2 v_2 + c_3 v_3)
        = c_1 Av_1 + c_2 Av_2 + c_3 Av_3
        = c_1 λ_1 v_1 + c_2 λ_2 v_2 + c_3 λ_3 v_3.

Applying the same techniques, we get

   X(2) = AX(1) = A(c_1 λ_1 v_1 + c_2 λ_2 v_2 + c_3 λ_3 v_3)
        = c_1 λ_1^2 v_1 + c_2 λ_2^2 v_2 + c_3 λ_3^2 v_3.

Continuing the above will lead to the general solution of the population
dynamical model (3.42):

   X(t) = Σ_{i=1}^3 c_i λ_i^t v_i.

Now, to compute the eigenvalues and eigenvectors of A, we could simply type
the following in the matlab command window:
>> A=[0.14 0 0.6; 0.56 0.24 0; 0 0.56 0.9]
A =
0.1400 0 0.6000
0.5600 0.2400 0
0 0.5600 0.9000
>> [v,lambda]=eig(A)
v =
-0.1989 + 0.5421i -0.1989 - 0.5421i 0.4959
0.6989 0.6989 0.3160
-0.3728 - 0.1977i -0.3728 + 0.1977i 0.8089
lambda =
0.0806 + 0.4344i 0 0
0 0.0806 - 0.4344i 0
0 0 1.1188
From the result, we see that the spectral radius of A is λ_3 = 1.1188 and its
corresponding eigenvector is v_3 = (0.4959, 0.3160, 0.8089)^T. Hence, for large t,

   X(t) = A^t X(0) ≈ c_3 (1.1188)^t v_3.

This shows the population size will increase geometrically as time increases.
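The coefficients c_i and the predicted growth can be checked numerically. The
following sketch (ours; the initial population vector is a hypothetical choice
made only for illustration) expands X(0) in the eigenvector basis and compares
the full iteration with the dominant term:

% Simulate the stage-structured model X(t+1) = A*X(t) and compare with
% the dominant-eigenvalue approximation X(t) ~ c_k*lambda_k^t*v_k.
A  = [0.14 0 0.6; 0.56 0.24 0; 0 0.56 0.9];
X0 = [100; 100; 100];                  % hypothetical initial population
[v, lambda] = eig(A);
c = v \ X0;                            % coefficients in X0 = c1*v1 + c2*v2 + c3*v3
[~, k] = max(abs(diag(lambda)));       % index of the dominant eigenvalue
t = 20;
X_exact  = A^t * X0;
X_approx = real( c(k) * lambda(k,k)^t * v(:,k) );
[X_exact, X_approx]                    % the two columns should be close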
3.8 Exercises
1. Let

      A = [  5  −2 ].
          [ −4   7 ]

   Find ‖A‖_1, ‖A‖_∞, ‖A‖_2, and ρ(A). Verify that ρ(A) ≤ ‖A‖_1,
   ρ(A) ≤ ‖A‖_∞, and ρ(A) ≤ ‖A‖_2.
2. Show that back solving for Gaussian elimination (that is, show that
   completion of Algorithm 3.2) requires (n^2 + n)/2 multiplications and
   divisions and (n^2 − n)/2 additions and subtractions.
3. Consider Example 3.14 (on page 85).

   (a) Fill in the details of the computations. In particular, by multiplying
       the matrices together, show that M_1^{-1} and M_2^{-1} are as stated,
       that A = LU, and that L = M_1^{-1} M_2^{-1}.

   (b) Solve Ax = b as mentioned in the example, by first solving Ly = b,
       then solving Ux = y. (You may use matlab, but print the entire
       dialog.)
4. Show that performing the forward phase of Gaussian elimination for
   Ax = b (that is, completing Algorithm 3.1) requires (1/3)n^3 + O(n^2)
   multiplications and divisions.
5. Show that the inverse of a nonsingular lower triangular matrix is lower
triangular.
6. Explain why A = LU, where L and U are as in Equation (3.5) on
page 86.
7. Verify the details in Example 3.15 by actually computing the solutions
   to the three linear systems, and by multiplying A and A^{-1}. (If you use
   matlab, print the details.)
8. Program the tridiagonal version of Gaussian elimination represented by
   equations (3.8) and (3.9) on page 95. Use your program to approximately
   solve

      u'' = −1,   u(0) = u(1) = 0,

   using the technique from Example 3.18 (on page 93), with h = 1/4, 1/8,
   1/64, and 1/4096. Compare with the exact solution u(x) = (1/2) x(1 − x).
9. Store the matrices from Problem 8 in matlab’s sparse matrix format,
and solve the systems from Problem 8 in matlab, using the sparse
matrix format. Compare with the results you obtained from your tridi-
agonal system solver.
10. Let

       A = [ 1  2   3 ]   and   b = [ −1 ].
           [ 4  5   6 ]             [  0 ]
           [ 7  8  10 ]             [  1 ]

    (a) Compute κ_∞(A) approximately.

    (b) Use floating point arithmetic with β = 10 and t = 3 (3-digit decimal
        arithmetic), rounding-to-nearest, and Algorithms 3.1 and 3.2 to find
        an approximation to the solution x to Ax = b.

    (c) Execute Algorithm 3.4 by hand, using t = 3, β = 10, and outwardly
        rounded interval arithmetic (and rounding-to-nearest for computing Y).

    (d) Find the exact solution to Ax = b by hand.

    (e) Compare the results you have obtained.
11. Derive the normal equations (3.22) from (3.21).
12. Let

       A = [ 2   1  1 ].
           [ 4   4  1 ]
           [ 6  −5  8 ]

    (a) Find the LU factorization of A, such that L is lower triangular and
        U is unit upper triangular.

    (b) Perform forward solving and then back solving to find a solution x
        for the system of equations Ax = b = [4  7  15]^T.
13. Find the Cholesky factorization of

       A = [  1  −1   2 ].
           [ −1   5   4 ]
           [  2   4  29 ]

    Also explain why A is positive definite.
14. Let

       A = [ 0.1α  0.1α ].
           [ 1.0   1.5  ]

    Determine α such that κ_∞(A), the condition number in the induced
    ∞-norm, is minimized.
15. Let A be the n × n lower triangular matrix with elements

       a_ij = {  1 if i = j,
              { −1 if i = j + 1,
              {  0 otherwise.

    Determine the condition number of A using the matrix norm ‖·‖_∞.
16. Consider the matrix system Au = b given by

       [ 1/2   0    0    0   ] [ u_1 ]   [ 1 ]
       [ 1/4   1/2  0    0   ] [ u_2 ] = [ 0 ].
       [ 1/8   1/4  1/2  0   ] [ u_3 ]   [ 0 ]
       [ 1/16  1/8  1/4  1/2 ] [ u_4 ]   [ 1 ]

    (a) Determine A^{-1} by hand.

    (b) Determine the infinity-norm condition number of the matrix A.

    (c) Let ũ be the solution when the right-hand side vector b is perturbed
        to b̃ = (1.01  0  0  0.99)^T. Estimate ‖u − ũ‖_∞, without computing ũ.
17. Complete the computations, to check that x^(4) is as given in
    Example 3.35 on page 131. (You may use intlab. Also see the code
    gauss_seidel_step.m available from http://www.siam.org/books/ot110.)

18. Repeat Example 3.28, but with the interval Gauss–Seidel method, instead
    of interval Gaussian elimination, starting with x_i^(0) = [−10, 10],
    1 ≤ i ≤ 3. Compare the results.
19. Let A be the n × n tridiagonal matrix with

       a_ij = {  4 if i = j,
              { −1 if i = j + 1 or i = j − 1,
              {  0 otherwise.

    Prove that the Gauss–Seidel and Jacobi methods converge for this matrix.
20. Consider the linear system

       [ 3  2 ] [ x_1 ]   [  7 ].
       [ 2  4 ] [ x_2 ] = [ 10 ]

    Using the starting vector x^(0) = (0, 0)^T, carry out two iterations of the
    Gauss–Seidel method to solve the system.
21. Prove Theorem 3.20 on page 136. (Hint: You may need to consider
various cases. In any case, you’ll probably want to use the properties of
orthogonal matrices, as in the proof of Theorem 3.19.)
22. Given U, Σ, and V as given in Example 3.37 (on page 136), compute
    A^+ b by using Algorithm 3.6. How does the x that you obtain compare
    with the x reported in Example 3.37?

23. Find the singular value decomposition of the matrix

       A = [ 1  2 ].
           [ 1  1 ]
           [ 1  3 ]