
Is Gaussian elimination backward stable?

Why pivot?
Consider the following example:
\[
A = \begin{pmatrix} 10^{-16} & 1 \\ 1 & 1 \end{pmatrix} \tag{1}
\]
If we compute the LU-factorization of A in double-precision arithmetic but without pivoting (swapping
rows) then we obtain the following:

\[
\tilde L = \begin{pmatrix} 1 & 0 \\ (1/10^{-16}) & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 10^{16} & 1 \end{pmatrix},
\qquad
\tilde U = \begin{pmatrix} 10^{-16} & 1 \\ 0 & (1 - (10^{16} \cdot 1)) \end{pmatrix}
= \begin{pmatrix} 10^{-16} & 1 \\ 0 & -10^{16} \end{pmatrix}
\]
Note that
\[
A - \tilde L \tilde U = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},
\]
in both floating-point and exact arithmetic. The norm of this error (or more precisely, the residual) is roughly the same size as the norm of the input $A$.
On the other hand, if we use partial pivoting we obtain the following:
\[
PA = \begin{pmatrix} 1 & 1 \\ 10^{-16} & 1 \end{pmatrix},
\qquad
\tilde L = \begin{pmatrix} 1 & 0 \\ (10^{-16}/1) & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 10^{-16} & 1 \end{pmatrix},
\qquad
\tilde U = \begin{pmatrix} 1 & 1 \\ 0 & (1 - (10^{-16} \cdot 1)) \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
\]
Now we have $PA = \tilde L \tilde U$ (but not in exact arithmetic!).
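We can verify this arithmetic directly. The following is a small Python sketch of our own (ordinary Python floats stand in for the double-precision computation above); the variable names are illustrative, not from any posted code:

```python
# The 2x2 example: eliminate without pivoting, then with partial pivoting.
a11, a12, a21, a22 = 1e-16, 1.0, 1.0, 1.0

# Without pivoting the multiplier is enormous (about 1e16) ...
l21 = a21 / a11
u22 = a22 - l21 * a12            # massive cancellation: fl(1 - 1e16)
# ... and the (2,2) entry of the residual A - LU is as big as A itself.
r22 = a22 - (l21 * a12 + u22)
print(abs(r22))                  # 1.0

# With partial pivoting (rows swapped first) the multiplier is tiny ...
l21p = a11 / a21
u22p = a12 - l21p * a22          # fl(1 - 1e-16): harmless rounding
# ... and the corresponding residual entry of PA - LU vanishes.
r22p = a12 - (l21p * a22 + u22p)
print(r22p)                      # 0.0
```

The point is not the particular numbers but the contrast: the unpivoted residual is as large as the input, while the pivoted residual is zero in floating point.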


Code for GEPP and GECP
There are two main strategies for pivoting. The first method, called partial pivoting, swaps rows to
ensure that the entry in the pivot position has the greatest absolute value in that column on or below
that row. Stripped to its essence, here is GEPP in pseudocode:
for each column j = 1:m
    let i = row in range j:m such that abs(A(i,j)) is max
    swap rows j and i
    set A(j+1:m,j) = A(j+1:m,j)/A(j,j)
    set A(j+1:m,j+1:m) = A(j+1:m,j+1:m) - A(j+1:m,j)*A(j,j+1:m)
(A full version is posted.) Note that this algorithm has the property that it keeps the entries of L small.
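For concreteness, the pseudocode translates into a short pure-Python routine. This is a teaching sketch of our own (the function name and list-of-lists matrix representation are our choices, not the posted version):

```python
def gepp(A):
    """Gaussian elimination with partial pivoting (teaching sketch).
    Returns perm, L, U such that row i of PA is A[perm[i]] and PA = LU."""
    m = len(A)
    U = [row[:] for row in A]            # work on a copy of A
    L = [[0.0] * m for _ in range(m)]
    perm = list(range(m))
    for j in range(m):
        # pivot: the row in j:m whose column-j entry is largest in magnitude
        i = max(range(j, m), key=lambda r: abs(U[r][j]))
        U[j], U[i] = U[i], U[j]
        L[j][:j], L[i][:j] = L[i][:j], L[j][:j]   # carry earlier multipliers along
        perm[j], perm[i] = perm[i], perm[j]
        L[j][j] = 1.0
        for r in range(j + 1, m):
            L[r][j] = U[r][j] / U[j][j]           # multiplier; |L[r][j]| <= 1
            for c in range(j, m):
                U[r][c] -= L[r][j] * U[j][c]      # subtract multiple of pivot row
    return perm, L, U
```

Because each pivot is the largest entry in its column, every multiplier satisfies $|L_{rj}| \le 1$, which is exactly the sense in which GEPP keeps the entries of $L$ small.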
The second main strategy for pivoting is called complete pivoting. GECP swaps both rows and columns to
ensure that the entry in the pivot position has the greatest absolute value among all entries below and
to the right of that position. Stripped to its essence, here is GECP in pseudocode:
for each column j = 1:m
    let i,k = row,column in ranges j:m,j:m such that abs(A(i,k)) is max
    swap rows j and i
    swap columns j and k
    set A(j+1:m,j) = A(j+1:m,j)/A(j,j)
    set A(j+1:m,j+1:m) = A(j+1:m,j+1:m) - A(j+1:m,j)*A(j,j+1:m)
This is a vastly more expensive search operation at each stage. Fortunately, as we shall see, GECP is
rarely necessary. In practice partial pivoting is stable.
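GECP differs from the GEPP sketch only in the pivot search and the extra column bookkeeping. Again, a hypothetical pure-Python sketch (names and representation are our own):

```python
def gecp(A):
    """Gaussian elimination with complete pivoting (teaching sketch).
    Returns rperm, cperm, L, U with A[rperm[i]][cperm[j]] = sum_k L[i][k]*U[k][j]."""
    m = len(A)
    U = [row[:] for row in A]
    L = [[0.0] * m for _ in range(m)]
    rperm, cperm = list(range(m)), list(range(m))
    for j in range(m):
        # search the whole trailing submatrix for the entry largest in magnitude
        i, k = max(((r, c) for r in range(j, m) for c in range(j, m)),
                   key=lambda rc: abs(U[rc[0]][rc[1]]))
        U[j], U[i] = U[i], U[j]                        # swap rows j and i
        L[j][:j], L[i][:j] = L[i][:j], L[j][:j]
        rperm[j], rperm[i] = rperm[i], rperm[j]
        for r in range(m):                             # swap columns j and k
            U[r][j], U[r][k] = U[r][k], U[r][j]
        cperm[j], cperm[k] = cperm[k], cperm[j]
        L[j][j] = 1.0
        for r in range(j + 1, m):
            L[r][j] = U[r][j] / U[j][j]
            for c in range(j, m):
                U[r][c] -= L[r][j] * U[j][c]
    return rperm, cperm, L, U
```

The $O(m^2)$ search at every step, versus the $O(m)$ column search in GEPP, is exactly the extra expense referred to above.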
Growth factors
To understand the stability of GEPP and GECP, we can assume that the pivoting (row swapping) has
all been done beforehand. With this in mind, let's look at the key theorem.
Theorem 1 If $A = LU$, and if $\tilde L$ and $\tilde U$ are the values computed by Gaussian elimination without
pivoting, on an IEEE-compliant machine, then there exists a $\delta A$ such that
\[
\tilde L \tilde U = A + \delta A, \quad \text{where } \|\delta A\| \le C\,\epsilon_{\text{machine}}\,\|L\|\,\|U\|, \tag{2}
\]
and $C$ is a constant which depends on the dimension $m$, but not on the matrix entries.
Note that we make no claim about the errors $L - \tilde L$ or $U - \tilde U$, only about the residual $A - \tilde L \tilde U$. We
will see more and more theorems of this flavor. Also, note that this does not give backward stability,
since the factor on the right is $\|L\|\,\|U\|$, not $\|A\|$. Thus, the question of stability comes down to the
question of how big this factor is compared to $\|A\|$.
Since in both GEPP and GECP the factor $\|L\|$ stays bounded (never bigger than $m$, depending on
the norm used), the question boils down to this: how big can the largest absolute value in $U$ be compared
to the largest absolute value in $A$? This leads us to define the growth factor for GEPP:
\[
g_{\text{PP}}(A) = \frac{|U|_{\max}}{|A|_{\max}} \tag{3}
\]
How bad can it be?
Consider the following $m \times m$ example:
\[
A = \begin{pmatrix}
1 & 0 & \cdots & 0 & 1 \\
-1 & 1 & & 0 & 1 \\
-1 & -1 & 1 & & 1 \\
\vdots & & & \ddots & \vdots \\
-1 & -1 & \cdots & -1 & 1
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & \cdots & 0 & 0 \\
-1 & 1 & & & 0 \\
-1 & -1 & 1 & & 0 \\
\vdots & & & \ddots & \vdots \\
-1 & -1 & \cdots & -1 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & \cdots & 0 & 1 \\
0 & 1 & & 0 & 2 \\
0 & 0 & 1 & & 4 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 2^{m-1}
\end{pmatrix} \tag{4}
\]
At each stage of Gaussian elimination we add the pivot row to all of the rows below it. At each stage
this doubles the values of all the entries in the last column below the pivot row. The entry in the lower
right gets doubled $m - 1$ times. In this case, therefore, $g_{\text{PP}} = 2^{m-1}$.
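This doubling is easy to watch numerically. Here is a small Python sketch of our own that builds the matrix of (4) and runs the elimination; partial pivoting would swap nothing here, since every candidate pivot already has the maximal magnitude 1:

```python
# Build the m x m matrix of (4): ones on the diagonal and in the last
# column, -1 everywhere below the diagonal, zeros elsewhere.
m = 10
A = [[1.0 if i == j or j == m - 1 else (-1.0 if i > j else 0.0)
     for j in range(m)] for i in range(m)]

U = [row[:] for row in A]
for j in range(m):
    for r in range(j + 1, m):
        mult = U[r][j] / U[j][j]           # always -1 for this matrix
        for c in range(j, m):
            U[r][c] -= mult * U[j][c]      # i.e. add the pivot row

growth = max(abs(x) for row in U for x in row)   # |A|_max is 1, so this is g_PP
print(growth)                                    # 512.0 == 2**(m-1)
```

The doublings are exact in binary floating point, so the computed growth factor is exactly $2^{m-1}$.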
In fact, this is as bad as it gets: with GEPP (or GECP) the growth factor never exceeds $2^{m-1}$.
This is because at each step we are adding a multiple of one row to another, and that multiplier is at
most 1 in absolute value. Thus, after each type III operation each new entry is at most twice the value
of one of the previous matrix entries. Since each entry is modified (at most) $m - 1$ times, each entry is
at worst multiplied by $2^{m-1}$.
Of course, we wouldn't expect such continual doubling, if for no other reason than the fact that
the signs of the entries will not all be the same, and so there is as likely to be cancellation as there is
doubling. Moreover, since the entries vary in size, and since we always take the biggest in a given column
as the pivot, the multipliers will usually be less than one.
But these are heuristic arguments. Are they valid in practice? If so, can we give some more precise
explanation as to why this is so? These are questions we will explore next.
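One way to start probing these heuristics is experimentally. The sketch below is our own quick experiment (uniformly random entries, fixed seed, pure Python); it measures the growth factor of (3) for a random $50 \times 50$ matrix, and typical values come out orders of magnitude below $2^{m-1}$:

```python
import random

def gepp_growth(A):
    """Growth factor |U|_max / |A|_max under partial pivoting (sketch)."""
    m = len(A)
    U = [row[:] for row in A]
    amax = max(abs(x) for row in U for x in row)
    for j in range(m):
        i = max(range(j, m), key=lambda r: abs(U[r][j]))   # partial pivot
        U[j], U[i] = U[i], U[j]
        for r in range(j + 1, m):
            mult = U[r][j] / U[j][j]                       # |mult| <= 1
            for c in range(j, m):
                U[r][c] -= mult * U[j][c]
    return max(abs(x) for row in U for x in row) / amax

random.seed(1)
m = 50
A = [[random.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(m)]
print(gepp_growth(A))    # modest: nowhere near 2**(m-1)
```

A single random trial proves nothing, of course, but repeating it with different seeds and sizes gives a feel for how rare large growth is in practice.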
Homework problems: First version due Friday, 17 February
Final version due Friday, 3 March
Do two of the following four problems.
1. In theory the function defined by the rule
\[
f(x) = \begin{cases} \dfrac{\log(1 + x)}{x}, & \text{if } x \neq 0 \\ 1, & \text{if } x = 0 \end{cases}
\]
is smooth everywhere, even when $x = 0$. [Why? Explain!] Define this function in Matlab or
Octave, then plot it, first over the range $[-1, 1]$ and then over the range $[-10\epsilon, 10\epsilon]$, where $\epsilon$ is the
machine epsilon. Is the picture smooth?
Now define a new algorithm to compute y = f(x):
s = 1 + x;
if s == 1
    y = 1;
else
    y = log(s)/(s-1);
end
Graph the output of this new algorithm over the same intervals. Compare the two sets of graphs.
What do you find? How do you explain it?
2. Consider the problem of solving $AX = B$, where $A$ is $n \times n$ and both $X$ and $B$ are $n \times m$. I want
you to compare two algorithms:
a. Use Gaussian elimination to find the factorization $PA = LU$, then use forward and backward
substitution to solve for $X$.
b. Use Gaussian elimination to find $A^{-1}$, then compute $A^{-1}B$.
Count the maximum number of flops needed for both. (This is the theoretical maximum, which is
a function of $m$ and $n$.) Which algorithm is faster? Use Matlab or Octave to verify your prediction.
3. Given a nonsingular matrix $A$ and a vector $b$, prove that for all sufficiently small positive $\epsilon$ there
are choices $\delta A$ and $\delta b$ such that
\[
\|\delta x\|_2 = \epsilon\,\|A^{-1}\|_2 \left( \|A\|_2 \|x\|_2 + \|b\|_2 \right),
\]
where (as before) $Ax = b$, $\delta x = \tilde x - x$, and
\[
(A + \delta A)\,\tilde x = b + \delta b.
\]
4. Consider the problem of solving a $2 \times 2$ system:
\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} e \\ f \end{pmatrix}.
\]
I want you to compare two algorithms:
a. GEPP.
b. Cramer's Rule:
det = a*d - b*c
x = (d*e - b*f)/det
y = (-c*e + a*f)/det
Show by means of a numerical example that Cramer's Rule is not backward stable. What does
backward stability imply about the size of the residual?