
Three Iterative Methods for Solving the Eigenproblem

for Various Classes of Matrices
Nicholas Benthem
Kurt O’Hearn
Connor Scholten
December 9, 2011
1 Introduction
Our paper presents and discusses three iterative techniques for solving the eigenproblem: the Power method, the QR algorithm, and the Jacobi method. Of these, the Power method and the QR algorithm can be employed on any square matrix to determine its eigen-information, while the Jacobi method can be used to determine eigen-information for symmetric matrices. In addition to providing the steps of these methods and some examples of their use, the paper also discusses indicators of convergence rates and variations upon these three methods.
1.1 Motivation
We first define the terms direct method and iterative method. A direct method produces a result in a finite number of prescribed computations. An iterative method produces a sequence of approximations for a result that (hopefully) converges to the true solution. With this distinction between direct and iterative methods in mind, we now consider the following question: why are iterative methods necessary for solving the eigenproblem?
Recall that every $n \times n$ matrix has an associated characteristic equation, which is a degree-$n$ polynomial. To determine the eigenvalues of a matrix, we must solve this polynomial. Hence, solving the eigenproblem is equivalent to solving polynomials. Now, also consider the following result established by Niels Henrik Abel in 1823.
Theorem 1 (Abel's Theorem). Let $p(x)$ be a general polynomial of degree $n$ with complex coefficients. Then if $n \geq 5$, $p$ admits no 'solution by radicals'. That is, there exists no direct method that determines the roots of an arbitrary such $p$ by means of a finite number of elementary operations.
Thus, Abel proved that there is no direct method for finding the roots of general polynomials of degree 5 or higher. This astonishing historical result means we must employ iterative methods to approximate the solutions of such polynomials, or equivalently to approximate the eigenvalues of $n \times n$ matrices where $n \geq 5$.
1.2 Definitions, Notations, and Remarks Concerning this Paper
For this paper, all matrices will be square with real entries. We will use the following notation:
Notation 1. Let $A$ be an $n \times n$ matrix with real entries. The following expressions are equivalent:
1. $A$ is an $n \times n$ real matrix
2. $A \in \mathbb{R}^{n \times n}$
We also note that although we only consider real matrices within this paper, many of the ideas hold for complex matrices as well. Additionally, the following two definitions will be employed throughout the paper.
Definition 1. Let $A \in \mathbb{R}^{n \times n}$. We say $A$ is simple when $A$ has $n$ linearly independent eigenvectors. Otherwise, $A$ is said to be defective.
Definition 2. Let $A \in \mathbb{R}^{n \times n}$. We say $A$ is nonsingular when $A$ is invertible. Otherwise, $A$ is said to be singular.
2 The Power Method
The Power method is used to find the strictly dominant eigenvector of any matrix $A \in \mathbb{R}^{n \times n}$. By strictly dominant eigenvector $v_1$, we mean the eigenvector associated with the strictly dominant eigenvalue $\lambda_1$ of $A$. That is,
$$ |\lambda_1| > |\lambda_2| \geq |\lambda_3| \geq \ldots \geq |\lambda_n| $$
for the eigenvalues $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n$ of $A$. To avoid complicating the discussion of the Power method, we require that $A$ be simple. In the following subsections, we present the steps of the Power method, discuss why the Power method converges, present and discuss an indicator of the rate of convergence, and consider variations upon the Power method.
2.1 Steps of the Power Method
As mentioned, let $A \in \mathbb{R}^{n \times n}$ be a simple matrix, and let $v_1$ be the strictly dominant eigenvector associated with $A$. The following are the steps in the Power method.
Method 1 (The Power Method).
1. Choose an initial $x_0$ in $\mathbb{R}^n$.
2. For $k = 0, 1, \ldots$, end
   (a) Compute $A x_k$.
   (b) Let $\mu_k$ be the largest entry in absolute value of $A x_k$.
   (c) Compute $x_{k+1} = \frac{1}{\mu_k} A x_k$.
Then the sequence
$$ X = \{x_0,\ x_1 = A x_0,\ x_2 = A x_1,\ x_3 = A x_2,\ \ldots,\ x_m = A x_{m-1},\ \ldots\} $$
generated by the Power method converges to the dominant eigenvector of $A$, and the sequence
$$ Y = \{\mu_0, \mu_1, \mu_2, \mu_3, \ldots, \mu_m, \ldots\} $$
converges to the dominant eigenvalue of $A$.
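To make the steps concrete, the following is a minimal sketch of Method 1 in Python with NumPy. It is our own illustrative code rather than part of the method statement; the function name, iteration cap, and stopping tolerance are assumptions we introduce here.

import numpy as np

def power_method(A, x0, max_iter=100, tol=1e-10):
    """Sketch of Method 1: approximate the dominant eigenpair of a simple matrix A."""
    x = np.asarray(x0, dtype=float)
    mu = 0.0
    for _ in range(max_iter):
        y = A @ x                          # step (a): compute A x_k
        mu = y[np.argmax(np.abs(y))]       # step (b): largest entry in absolute value
        x_new = y / mu                     # step (c): x_{k+1} = (1/mu_k) A x_k
        if np.linalg.norm(x_new - x) < tol:
            return mu, x_new
        x = x_new
    return mu, x

Applied to the matrix $A$ and vector $x_0$ of Section 2.5 below, this sketch should reproduce the behavior reported there, with $\mu_k$ settling near 15.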
2.2 Proof of the Convergence of the Power Method
To show why the sequence $X$ of approximations converges to the dominant eigenvector, we will first analyze a representation of our initial vector $x_0$ in another basis. Then we will consider some consequences of this representation, particularly as we progress far into the sequence. Recall that, since $A$ is simple, the eigenvectors $v_1, v_2, \ldots, v_n$ of $A$ are linearly independent and therefore form a basis for $\mathbb{R}^n$. In light of this fact, we can take an arbitrary $q$ in $\mathbb{R}^n$ and write this vector as a linear combination of these basis elements. That is, there exist real numbers $c_1, c_2, \ldots, c_n$ such that
$$ q = c_1 v_1 + c_2 v_2 + \ldots + c_n v_n. \tag{1} $$
Additionally, we stipulate that in the eigenvector representation (1) of our arbitrary $q$, the coefficient $c_1$ is nonzero. That is, we assert that the component of $q$ in the direction of the dominant eigenvector $v_1$ of $A$ is nonzero. Now, observe that if we multiply both sides of (1) on the left by $A$, we obtain
$$ Aq = c_1 A v_1 + c_2 A v_2 + \ldots + c_n A v_n = c_1 \lambda_1 v_1 + c_2 \lambda_2 v_2 + \ldots + c_n \lambda_n v_n. \tag{2} $$
We note that the last equality above is a consequence of $v_1, v_2, \ldots, v_n$ being eigenvectors of $A$. Now, multiplying both sides of (2) on the left by $A$ once more, we obtain
$$ A^2 q = c_1 \lambda_1^2 v_1 + c_2 \lambda_2^2 v_2 + \ldots + c_n \lambda_n^2 v_n. $$
Hence, if we continue this left multiplication by $A$, we have for any positive integer $j$,
$$ A^j q = c_1 \lambda_1^j v_1 + c_2 \lambda_2^j v_2 + \ldots + c_n \lambda_n^j v_n. \tag{3} $$
Now, since $|\lambda_1| > 0$ because $\lambda_1$ is the dominant eigenvalue, we can conclude from (3) that
$$ \frac{1}{\lambda_1^j} A^j q = c_1 v_1 + c_2 \frac{\lambda_2^j}{\lambda_1^j} v_2 + \ldots + c_n \frac{\lambda_n^j}{\lambda_1^j} v_n. \tag{4} $$
And again, since $|\lambda_1| > |\lambda_2| \geq |\lambda_3| \geq \ldots \geq |\lambda_n|$, we have
$$ \left|\frac{\lambda_n^j}{\lambda_1^j}\right| \leq \ldots \leq \left|\frac{\lambda_2^j}{\lambda_1^j}\right| < 1. \tag{5} $$
Thus, as $j \to \infty$, we can observe from (5) that
$$ \frac{\lambda_2^j}{\lambda_1^j} \to 0, \quad \ldots, \quad \frac{\lambda_n^j}{\lambda_1^j} \to 0. $$
Thus, we see from the above limits that in (4), $\frac{1}{\lambda_1^j} A^j q$ converges to a scalar multiple of the dominant eigenvector $v_1$ of $A$. With this result, we now consider the sequence of approximations $X$ generated by the Power method in the following form:
$$ X = \{x_0,\ x_1 = A x_0,\ x_2 = A^2 x_0,\ x_3 = A^3 x_0,\ \ldots,\ x_m = A^m x_0,\ \ldots\}. $$
Hence, if we represent each of the above terms in $X$ using our previously established eigenvector basis in (1), we now know from (4) that this sequence does indeed converge to a scalar multiple of the dominant eigenvector $v_1$, and the sequence $Y$ must indeed converge to the dominant eigenvalue $\lambda_1$.
2.3 Discussion of the Convergence of the Power Method
Since we have now established that the sequences generated by the method do indeed converge, we now consider how these sequences behave as they converge. Observe that if $|\lambda_1| = 1$, then
$$ \frac{1}{\lambda_1^j} A^j x_0 = c_1 v_1 + c_2 \frac{\lambda_2^j}{\lambda_1^j} v_2 + \ldots + c_n \frac{\lambda_n^j}{\lambda_1^j} v_n $$
clearly converges to $c_1 v_1$ as $j$ tends to $\infty$. However, consider what happens if $|\lambda_1| > 1$ or $|\lambda_1| < 1$. Since $\lambda_1$ is not known in practice, the iteration actually works with $A^j x_0$ rather than $\frac{1}{\lambda_1^j} A^j x_0$, and we can observe that if $|\lambda_1| < 1$, then $A^j x_0 \to 0$ as $j \to \infty$. Likewise, if $|\lambda_1| > 1$, then the entries of $A^j x_0$ increase without bound as $j \to \infty$.
To account for both situations, we scale each term in the sequence $X$ of approximations in step (2b) of the Power method by the multiplicative inverse of the largest entry in absolute value of $A x_k$. We choose this scaling value $\mu_k$ because $\lambda_1$ is not known in practice but is instead useful only in theoretical discussion. This scaling by $\mu_k$ then adjusts the entries of the approximation to $v_1$ so that the largest entry is 1. Thus, the scaling accounts for the cases with $|\lambda_1| \neq 1$, in which the sequence would otherwise converge to the zero vector or the entries of its terms would grow without bound.
2.4 Convergence Rate of the Power Method
Let $Q$ be the sequence defined as
$$ Q = \left\{ q_0 = x_0,\ q_1 = \frac{1}{\lambda_1} A x_0,\ q_2 = \frac{1}{\lambda_1^2} A^2 x_0,\ \ldots,\ q_m = \frac{1}{\lambda_1^m} A^m x_0,\ \ldots \right\}. \tag{6} $$
To better understand how well a particular sequence of approximations $X$ from the Power method converges, we consider the error of approximation of the $m$th iterate of the algorithm using $q_m$ in place of $x_m$, denoted $\|q_m - c_1 v_1\|$; we perform this substitution since $\mu_m$ from the Power method is unknown and unpredictable unless an initial vector $x_0$ is given. Now, observe that from our previous representations, we have
$$ \begin{aligned}
\|q_m - c_1 v_1\| &= \left\| \frac{1}{\lambda_1^m} A^m x_0 - c_1 v_1 \right\| \\
&= \left\| \frac{1}{\lambda_1^m} \left( c_1 \lambda_1^m v_1 + c_2 \lambda_2^m v_2 + c_3 \lambda_3^m v_3 + \ldots + c_n \lambda_n^m v_n \right) - c_1 v_1 \right\| \\
&= \left\| c_1 v_1 + c_2 \left( \frac{\lambda_2}{\lambda_1} \right)^m v_2 + c_3 \left( \frac{\lambda_3}{\lambda_1} \right)^m v_3 + \ldots + c_n \left( \frac{\lambda_n}{\lambda_1} \right)^m v_n - c_1 v_1 \right\| \\
&= \left\| c_2 \left( \frac{\lambda_2}{\lambda_1} \right)^m v_2 + c_3 \left( \frac{\lambda_3}{\lambda_1} \right)^m v_3 + \ldots + c_n \left( \frac{\lambda_n}{\lambda_1} \right)^m v_n \right\| \\
&\leq |c_2| \left| \frac{\lambda_2}{\lambda_1} \right|^m \|v_2\| + |c_3| \left| \frac{\lambda_3}{\lambda_1} \right|^m \|v_3\| + \ldots + |c_n| \left| \frac{\lambda_n}{\lambda_1} \right|^m \|v_n\| \quad \text{by the Triangle Inequality} \\
&\leq |c_2| \left| \frac{\lambda_2}{\lambda_1} \right|^m \|v_2\| + |c_3| \left| \frac{\lambda_2}{\lambda_1} \right|^m \|v_3\| + \ldots + |c_n| \left| \frac{\lambda_2}{\lambda_1} \right|^m \|v_n\| \quad \text{by (5)} \\
&= \left| \frac{\lambda_2}{\lambda_1} \right|^m \left( |c_2| \|v_2\| + |c_3| \|v_3\| + \ldots + |c_n| \|v_n\| \right). 
\end{aligned} \tag{7} $$
Hence, we see that $\left| \frac{\lambda_2}{\lambda_1} \right|$ is an important indicator of the convergence rate of the Power method. In fact, (7) tells us that the error in the approximation of $c_1 v_1$ decreases by a factor of at least $\left| \frac{\lambda_2}{\lambda_1} \right|$ with each iteration of the method. We can also observe that when $\left| \frac{\lambda_2}{\lambda_1} \right|$ is close to 1, $\left( \frac{\lambda_2}{\lambda_1} \right)^m$ approaches zero relatively slowly, so the error in the approximation decreases slowly and the sequence converges slowly. Likewise, if $\left| \frac{\lambda_2}{\lambda_1} \right|$ is close to 0, then the error in each iterate is small, and thus the sequence converges rapidly.
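As a rough numerical check of this indicator (our own illustrative sketch, not part of the paper's development), one can estimate $\left| \frac{\lambda_2}{\lambda_1} \right|$ directly from the Power method iterates: since the error shrinks by roughly this factor each step, the ratio of successive difference norms $\|x_{k+1} - x_k\| / \|x_k - x_{k-1}\|$ tends to settle near $\left| \frac{\lambda_2}{\lambda_1} \right|$.

import numpy as np

def estimate_rate(A, x0, iters=30):
    """Heuristic estimate of |lambda_2 / lambda_1| from the ratio of
    successive Power-method difference norms."""
    x = np.asarray(x0, dtype=float)
    prev_diff, ratio = None, None
    for _ in range(iters):
        y = A @ x
        y = y / y[np.argmax(np.abs(y))]    # same scaling as steps (2b)-(2c)
        diff = np.linalg.norm(y - x)
        if prev_diff:                      # skip the first step (no previous difference)
            ratio = diff / prev_diff
        prev_diff, x = diff, y
    return ratio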
2.5 Implementation of the Power Method
With the theoretical results above, we next consider an example implementation of the Power method. Let
$$ A = \begin{bmatrix} 124.76 & -171.63 & 21.139 & 17.160 & 38.059 \\ 42.946 & -54.063 & 5.8936 & 6.3312 & 12.409 \\ 62.779 & -89.405 & 16.559 & 13.493 & 16.420 \\ -184.56 & 271.52 & -35.151 & -22.380 & -60.979 \\ -102.06 & 153.41 & -24.098 & -15.986 & -29.873 \end{bmatrix} \quad \text{and} \quad x_0 = \begin{bmatrix} 94 \\ 12 \\ -2 \\ 50 \\ 10 \end{bmatrix}. $$
For the first iteration, we then have
$$ x_1 = \frac{1}{\mu_0} A x_0 = \begin{bmatrix} -1.23 \\ -0.433 \\ -0.640 \\ 1.789 \\ 1.0 \end{bmatrix}. $$
After a few more iterations, we see that the terms in $X$ are
$$ x_0 = \begin{bmatrix} 94 \\ 12 \\ -2 \\ 50 \\ 10 \end{bmatrix},\quad x_1 = \begin{bmatrix} -1.23 \\ -0.433 \\ -0.640 \\ 1.789 \\ 1.0 \end{bmatrix},\quad \ldots,\quad x_{14} = \begin{bmatrix} -0.930 \\ -0.436 \\ -0.207 \\ 1.0 \\ 0.379 \end{bmatrix},\quad x_{15} = \begin{bmatrix} -0.930 \\ -0.436 \\ -0.206 \\ 1.0 \\ 0.379 \end{bmatrix}. $$
And the terms in $Y$ are
$$ \mu_0 = -585.739,\quad \mu_1 = 10.231,\quad \ldots,\quad \mu_{14} = 15.007,\quad \mu_{15} = 15.000. $$
2.6 Extensions to the Power Method
We now present two extensions to the Power method. First, we consider the Inverse Power method.
Method 2 (The Inverse Power Method).
Let $A \in \mathbb{R}^{n \times n}$ be a simple, nonsingular matrix. The following are the steps in the Inverse Power method.
1. Choose an initial $x_0$ in $\mathbb{R}^n$.
2. For $k = 0, 1, \ldots$, end
   (a) Compute $A^{-1} x_k$.
   (b) Let $\mu_k$ be the largest entry in absolute value of $A^{-1} x_k$.
   (c) Compute $x_{k+1} = \frac{1}{\mu_k} A^{-1} x_k$.
Then the sequence
$$ X = \{x_0,\ x_1 = A^{-1} x_0,\ x_2 = A^{-1} x_1,\ x_3 = A^{-1} x_2,\ \ldots,\ x_m = A^{-1} x_{m-1},\ \ldots\} $$
generated by the Inverse Power method converges to the weakest eigenvector of $A$ (the eigenvector associated with the eigenvalue of smallest modulus), and the sequence
$$ Y = \{\mu_0, \mu_1, \mu_2, \mu_3, \ldots, \mu_m, \ldots\} $$
converges to the reciprocal of the weakest eigenvalue of $A$.
To better understand why the Inverse Power method converges, we first prove the following theorem.
Theorem 2. If $\lambda$ and $v$ are an eigenvalue and its corresponding eigenvector of a simple nonsingular matrix $A$, then $\frac{1}{\lambda}$ and $v$ are an eigenvalue and its corresponding eigenvector of $A^{-1}$.
Proof. We assume that $A$ is a simple nonsingular matrix and that $\lambda$ and $v$ are an eigenvalue and its corresponding eigenvector of $A$. Then
$$ A v = \lambda v. $$
Since $A$ is nonsingular,
$$ v = A^{-1} \lambda v. $$
Furthermore, because $A$ is nonsingular, $\lambda \neq 0$. So, we have
$$ \frac{1}{\lambda} v = A^{-1} v. $$
Thus, by definition, $\frac{1}{\lambda}$ and $v$ are an eigenvalue and its corresponding eigenvector of $A^{-1}$, as desired.
Now, with this established result, we could perform an analysis similar to that of the Power method to establish convergence. For brevity, we omit this discussion. Instead, we now turn our attention to another extension: phase shifts. To better understand what a phase shift $p$ does, consider the proof of the following theorem.
Theorem 3. If $\lambda$ and $v$ are an eigenvalue and its corresponding eigenvector of a simple matrix $A$, then $\lambda - p$ and $v$ are an eigenvalue and its corresponding eigenvector of $A - pI$.
Proof. We assume that $\lambda$ and $v$ are an eigenvalue and its corresponding eigenvector of a simple matrix $A$. Then
$$ (A - pI)v = Av - pIv. $$
Since $Av = \lambda v$, we have
$$ (A - pI)v = \lambda v - p v. $$
And, by the distributive property, we have
$$ (A - pI)v = (\lambda - p)v. $$
Thus, $\lambda - p$ and $v$ are an eigenvalue and its corresponding eigenvector of $A - pI$, as desired.
As demonstrated in the above result, a phase shift $p$ is a means for shifting the eigenvalues of $A$. A setting where such shifts become useful is the Inverse Power method. For that method, the eigenpair to which the method converges is predetermined. However, if we shift $A$ by $p$ and perform inverse iteration on $A - pI$, we can essentially control which eigenpair the method finds. This works by choosing $p$ to be approximately $\lambda_k$. Then $|\lambda_k - p|$ is significantly less than the moduli of the other shifted eigenvalues, and hence $\frac{1}{|\lambda_k - p|}$ is significantly greater than the multiplicative inverses of those moduli. Thus, inverse iteration on $A - pI$ produces sequences that converge to the eigenpair associated with $\lambda_k$, namely $\lambda_k$ and $v_k$.
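The following is a minimal sketch of shifted inverse iteration, combining Method 2 with the shift of Theorem 3, in Python. It is illustrative only; the function name, the use of a linear solve in place of an explicit inverse, and the stopping rule are our own choices.

import numpy as np

def shifted_inverse_power(A, p, x0, max_iter=100, tol=1e-10):
    """Inverse iteration on A - p*I: converges to the eigenpair of A whose
    eigenvalue lies closest to the shift p."""
    n = A.shape[0]
    M = A - p * np.eye(n)
    x = np.asarray(x0, dtype=float)
    mu = 1.0
    for _ in range(max_iter):
        y = np.linalg.solve(M, x)        # y = (A - pI)^{-1} x_k
        mu = y[np.argmax(np.abs(y))]     # largest entry in absolute value
        x_new = y / mu
        if np.linalg.norm(x_new - x) < tol:
            x = x_new
            break
        x = x_new
    # mu approximates 1/(lambda_k - p), so lambda_k is recovered as p + 1/mu
    return p + 1.0 / mu, x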
3 The QR Algorithm
The QR Algorithm is used for determining all of the eigenvalues of a square matrix. For our proof of convergence we stipulate only that $A$ is nonsingular, together with some additional assumptions introduced in the proof itself. The steps in the QR Algorithm are as follows:
Method 3 (The QR Algorithm).
1. Let $A_1 = A$.
2. For $i = 1, \ldots$, end
   (a) Compute the factorization $A_i = Q_i R_i$, where $Q_i$ is an orthogonal matrix and $R_i$ is upper triangular.
   (b) Compute $A_{i+1} = R_i Q_i$ (flip the order of the two factors).
The sequence $A_i$ then converges to an upper triangular matrix with the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $A$ in order on its diagonal.
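A minimal sketch of Method 3 in Python, using NumPy's built-in QR factorization; the fixed iteration count and the function name are our own illustrative choices.

import numpy as np

def qr_algorithm(A, iters=100):
    """Sketch of Method 3: repeatedly factor A_i = Q_i R_i and recombine the
    factors in reverse order.  The diagonal of the final iterate approximates
    the eigenvalues of A (under the assumptions used in the proof below)."""
    Ai = np.array(A, dtype=float)
    for _ in range(iters):
        Q, R = np.linalg.qr(Ai)   # step (a): QR factorization of A_i
        Ai = R @ Q                # step (b): A_{i+1} = R_i Q_i
    return Ai

Run on the matrix of Section 2.5, this sketch should reproduce the behavior shown in Section 3.3, with the diagonal approaching 15, 9, 7, 3, and 1.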
The basis of our proof will be constructing two QR factorizations of the powers $A_1^s$ and showing that these are equivalent. We will then show that as the iteration proceeds, $A_i$ converges to an upper triangular matrix with the eigenvalues of $A$ on its diagonal.
3.1 Proof of the Convergence of the QR Algorithm
Let $\bar{Q}_s = Q_1 Q_2 Q_3 \cdots Q_s$ and let $\bar{R}_s = R_s R_{s-1} \cdots R_2 R_1$. We will establish that the QR algorithm converges. Let $A$ be an arbitrary $n \times n$ matrix whose eigenvalues are distinct in modulus, and let $A_{i+1} = R_i Q_i$ be the sequence of matrices generated by the algorithm, where each $Q_i$ is an orthogonal matrix and each $R_i$ is an upper triangular matrix with positive entries on its diagonal. Recall that since $R_i$ has positive entries on its diagonal, this QR factorization is unique. Notice that every iteration is a similarity transformation, and hence these matrices all have the same eigenvalues as $A$. For the $i = 1$ case, since $Q_1$ is orthogonal, $A_1 = Q_1 R_1$, or
$$ Q_1^T A_1 = R_1. \tag{8} $$
Substituting this expression for $R_1$ into the next iteration of the algorithm, we obtain
$$ A_2 = R_1 Q_1 = Q_1^T A_1 Q_1. $$
Continuing this process, we see that in general,
$$ A_{s+1} = \bar{Q}_s^T A_1 \bar{Q}_s. \tag{9} $$
We can order the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $A$ such that $|\lambda_1| > |\lambda_2| > \ldots > |\lambda_n| > 0$. Notice this restricts our matrices to be nonsingular; hence $\lambda_i \neq 0$ for every $i$. Let $X = [X_1, X_2, \ldots, X_n]$ be the matrix of right eigenvectors of $A$. Recall that a right eigenvector satisfies $A X_1 = \lambda_1 X_1$. Now, if the matrix of left eigenvectors, $Y = X^{-1}$, has an LU factorization, then the QR algorithm converges to an upper triangular matrix.
Let $D$ be a diagonal matrix with the eigenvalues of $A$ on its diagonal in descending order. That is, let $D = \operatorname{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$. Notice that from our assumption we have
$$ X^{-1} A_1 X = D. \tag{10} $$
Observe that this is an eigenvalue decomposition of our matrix: as $X$ is a matrix of eigenvectors of $A_1$, we have $A_1 = X D X^{-1}$, which implies $X^{-1} A_1 X = D$, as desired.
Let $A_1^s = A_1 A_1 \cdots A_1$ ($s$ factors). We note that $A_1^s = \bar{Q}_s \bar{R}_s$. For brevity, we will only verify the case $s = 3$. Notice that $A_2 = Q_2 R_2 = R_1 Q_1$ and that $A_1 = Q_1 R_1$. So, we see that
$$ \begin{aligned}
\bar{Q}_3 \bar{R}_3 &= Q_1 Q_2 Q_3 R_3 R_2 R_1 \\
&= Q_1 Q_2 (Q_3 R_3) R_2 R_1 \\
&= Q_1 Q_2 (R_2 Q_2) R_2 R_1 \\
&= Q_1 (Q_2 R_2)(Q_2 R_2) R_1 \\
&= Q_1 (R_1 Q_1)(R_1 Q_1) R_1 \\
&= (Q_1 R_1)(Q_1 R_1)(Q_1 R_1) \\
&= A_1^3. 
\end{aligned} \tag{11} $$
From (10), we see that
$$ A_1^s = (XDY)^s. $$
As $X$ is the matrix of right eigenvectors and $Y = X^{-1}$, we have $YX = I$. So,
$$ A_1^s = (XDY)^s = (XDY)(XDY) \cdots (XDY) = X D^s Y. \tag{12} $$
From our assumptions, we are guaranteed that $Y$ has an LU factorization; that is, there exist matrices $L_Y$ and $R_Y$ such that $Y = L_Y R_Y$, where $L_Y$ is lower triangular and $R_Y$ is upper triangular. Furthermore, we can dictate that $L_Y$ be unit lower triangular, that is, that the entries along the diagonal of $L_Y$ are all 1. Replacing $Y$ in (12) with this LU factorization, we obtain
$$ A_1^s = X D^s L_Y R_Y = X D^s L_Y I R_Y = X D^s L_Y (D^{-s} D^s) R_Y = X (D^s L_Y D^{-s}) D^s R_Y. \tag{13} $$
Now we will show that as $s \to \infty$, $D^s L_Y D^{-s} \to I$. Notice that
$$ D^s L_Y D^{-s} = \begin{bmatrix}
1 & 0 & \cdots & & 0 \\
l_{2,1} \left( \frac{\lambda_2}{\lambda_1} \right)^s & 1 & 0 & \cdots & \\
l_{3,1} \left( \frac{\lambda_3}{\lambda_1} \right)^s & l_{3,2} \left( \frac{\lambda_3}{\lambda_2} \right)^s & 1 & & \\
\vdots & \vdots & & \ddots & \\
l_{n,1} \left( \frac{\lambda_n}{\lambda_1} \right)^s & \cdots & & & 1
\end{bmatrix}. $$
Since our eigenvalues were ordered so that $|\lambda_i| > |\lambda_j|$ whenever $i < j$, each ratio satisfies $\left( \frac{\lambda_j}{\lambda_i} \right)^s \to 0$ as $s \to \infty$ for $j > i$. So, we can see that $D^s L_Y D^{-s}$ is just the identity matrix plus an extra matrix that converges to zero as $s$ becomes large. That is, $D^s L_Y D^{-s} = I + \delta_s$, where $\delta_s$ consists of the off-diagonal entries.
Furthermore, note that $X$, our matrix of eigenvectors, has a QR factorization given by $X = Q_X R_X$. So, as $s \to \infty$, $I + \delta_s \to I$, and we obtain
$$ A_1^s = X (D^s L_Y D^{-s}) D^s R_Y = X (I + \delta_s) D^s R_Y \approx X D^s R_Y = Q_X R_X D^s R_Y \quad \text{for large } s. $$
Now, we know that if the diagonal elements of $R$ in a QR decomposition are all positive, then that QR decomposition is unique. So, let $F_s$ be an orthogonal diagonal matrix with entries of $\pm 1$ such that $F_s R_X D^s R_Y$ has positive diagonal entries. Then we see that $F_s F_s = I$, since each diagonal entry of $F_s F_s$ is the square of $\pm 1$ and hence equals 1.
Furthermore, we see that
$$ Q_X R_X D^s R_Y = Q_X I R_X D^s R_Y = Q_X (F_s F_s) R_X D^s R_Y = (Q_X F_s)(F_s R_X D^s R_Y). $$
Notice that we constructed $F_s$ so that $F_s R_X D^s R_Y$ has positive diagonal entries. Notice further that $Q_X F_s$ is orthogonal, as both $Q_X$ and $F_s$ are orthogonal. As $R_X$ and $R_Y$ are upper triangular, and both $F_s$ and $D^s$ are diagonal matrices, $F_s R_X D^s R_Y$ is also an upper triangular matrix with positive diagonal entries, which ensures that this QR factorization is in fact equivalent to the one obtained in (11).
Hence, as $s \to \infty$, $\bar{Q}_s - Q_X F_s \to 0$, where $0$ is the zero matrix. Similarly, $\bar{R}_s - F_s R_X D^s R_Y \to 0$. Therefore, for large $s$ we may write $\bar{Q}_s \approx Q_X F_s$ and $\bar{R}_s \approx F_s R_X D^s R_Y$. So, from (9), we have
$$ A_{s+1} = \bar{Q}_s^T A_1 \bar{Q}_s = (Q_X F_s)^T A_1 (Q_X F_s) = F_s^T Q_X^T A_1 (Q_X F_s). \tag{14} $$
Recall from (10) that $A_1 = X D X^{-1}$. So, replacing this expression for $A_1$ in (14), and noting that $F_s^T = F_s$ (as $F_s$ is diagonal with entries of $\pm 1$), we see that
$$ F_s^T Q_X^T A_1 (Q_X F_s) = F_s Q_X^T (X D X^{-1}) Q_X F_s = F_s (Q_X^T X) D (X^{-1} Q_X) F_s. \tag{15} $$
Now, recall that $X$ is our matrix of right eigenvectors and that $X$ has a QR factorization $X = Q_X R_X$. Notice that $Q_X$ is orthogonal, so $Q_X^T = Q_X^{-1}$. Hence,
$$ X = Q_X R_X \quad \Longrightarrow \quad Q_X^{-1} X = R_X \quad \Longrightarrow \quad Q_X^T X = R_X, \tag{16} $$
and
$$ X = Q_X R_X \quad \Longrightarrow \quad X R_X^{-1} = Q_X \quad \Longrightarrow \quad R_X^{-1} = X^{-1} Q_X. \tag{17} $$
So, replacing the expressions from (16) and (17) in (15), we see that
$$ F_s (Q_X^T X) D (X^{-1} Q_X) F_s = F_s (R_X) D (R_X^{-1}) F_s = F_s (R_X D R_X^{-1}) F_s. $$
Hence, as $s \to \infty$,
$$ A_{s+1} \to F_s (R_X D R_X^{-1}) F_s, $$
where $R_X D R_X^{-1}$ is an upper triangular matrix whose diagonal entries are ordered as $\lambda_1, \lambda_2, \ldots, \lambda_n$ (since $D$ was constructed as a diagonal matrix with the eigenvalues in this order, and conjugating a diagonal matrix by the upper triangular matrix $R_X$ preserves the diagonal). Notice that the $F_s$ matrices on the right and left do not change our eigenvalues, as they have diagonal entries of $\pm 1$ and consequently cancel each other out on the diagonal. Hence, the QR algorithm converges to an upper triangular matrix with the eigenvalues of $A$ on its diagonal. Notice that in our proof we assumed that the eigenvalues of $A$ are distinct in modulus and that the matrix of left eigenvectors $Y = X^{-1}$ has an LU factorization.
3.2 Rate of Convergence and Phase Shifts
Notice that in our proof we had $I + \delta_s \to I$, where $I + \delta_s = D^s L_Y D^{-s}$, and that the entries of $\delta_s$ involve the ratios $\frac{\lambda_j}{\lambda_i}$ (with $j > i$) raised to the power $s$. So, the speed of convergence depends upon the lower triangular matrix $\delta_s$ going to zero quickly. Obviously, if $|\lambda_i| \approx |\lambda_j|$, then $\delta_s$ goes to zero at a very slow pace.
One way to fix this problem is to introduce shifts, just as we did in the Power method. To do so, choose a shift $k_i$. Then factor $A_i - k_i I = Q_i R_i$ and compute $A_{i+1} = k_i I + R_i Q_i$. Note then that
$$ \begin{aligned}
A_{i+1} &= k_i I + R_i Q_i \\
&= k_i I + (I) R_i Q_i \\
&= k_i I + (Q_i^T Q_i) R_i Q_i \\
&= k_i I + Q_i^T (Q_i R_i) Q_i \\
&= k_i I + Q_i^T (A_i - k_i I) Q_i \\
&= k_i I + (Q_i^T A_i - Q_i^T k_i I) Q_i \\
&= k_i I + Q_i^T A_i Q_i - k_i Q_i^T Q_i \\
&= k_i I + Q_i^T A_i Q_i - k_i I \\
&= Q_i^T A_i Q_i.
\end{aligned} $$
Hence, after a phase shift, the next matrix in our iteration is still orthogonally similar to $A_i$ (and hence to $A$), so the eigenvalues are preserved.
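The shifted iteration just described can be sketched as follows in Python. This is an illustrative fragment only; in particular, taking the shift $k_i$ to be the bottom-right entry of $A_i$ is our own simple choice, not one prescribed by the text.

import numpy as np

def shifted_qr(A, iters=100):
    """Shifted QR iteration: factor A_i - k_i*I = Q_i R_i, then form
    A_{i+1} = k_i*I + R_i Q_i, which remains orthogonally similar to A."""
    Ai = np.array(A, dtype=float)
    I = np.eye(Ai.shape[0])
    for _ in range(iters):
        k = Ai[-1, -1]                    # illustrative shift: bottom-right entry of A_i
        Q, R = np.linalg.qr(Ai - k * I)   # factor the shifted matrix
        Ai = k * I + R @ Q                # add the shift back after recombining
    return Ai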
3.3 Example
Let us now see an example of the QR Algorithm in action. Let $A = A_1$ be the same matrix as given in Section 2.5. We first compute the QR factorization of $A$. We see that
$$ Q_1 = \begin{bmatrix} 0.486242 & 0.6120 & -0.09485 & 0.60122 & 0.1358 \\ 0.16738 & 0.56664 & 0.00119 & -0.58699 & -0.5534 \\ 0.24468 & 0.09242 & 0.8093 & -0.2667 & 0.4532 \\ -0.71933 & 0.34422 & 0.4277 & 0.35263 & -0.23820 \\ -0.39779 & 0.420999 & -0.3911 & -0.31382 & 0.64268 \end{bmatrix}, $$
$$ R_1 = \begin{bmatrix} 256.573 & -370.717 & 50.188 & 35.1633 & 80.347 \\ 0.0 & 14.098 & -4.4366 & 0.9044 & -1.7222 \\ 0.0 & 0.0 & 5.7933 & 5.97981 & -4.7062 \\ 0.0 & 0.0 & 0.0 & 0.12636 & -0.9100 \\ 0.0 & 0.0 & 0.0 & 0.0 & 1.0705 \end{bmatrix}. $$
Then, for the second iteration, we have
$$ A_2 = R_1 Q_1 = \begin{bmatrix} 17.729 & -2.4501 & -0.54741 & 345.68 & 306.06 \\ 1.3087 & 7.164 & -2.5134 & -6.2327 & -11.137 \\ -1.0119 & 0.6125 & 9.0875 & 2.0405 & -1.8233 \\ 0.27110 & -0.3396 & 0.4100 & 0.3301 & -0.6149 \\ -0.4258 & 0.4507 & -0.4187 & -0.3359 & 0.6880 \end{bmatrix}. $$
The fifth iteration of the algorithm gives us
$$ A_6 = R_5 Q_5 = \begin{bmatrix} 15.073 & 0.13209 & 2.8388 & 30.958 & -445.83 \\ 0.11263 & 6.1469 & -1.1321 & -9.8412 & 62.458 \\ -0.16755 & 2.1358 & 9.8091 & 6.6690 & -95.388 \\ -0.00094 & 0.0140 & 0.0045 & 3.0058 & -23.630 \\ 0.0 & 0.0 & 0.0 & 0.0029 & 0.96503 \end{bmatrix}. $$
At the 50th iteration, we have
$$ A_{50} = \begin{bmatrix} 15.0 & 2.7517 & -1.1777 & 29.955 & -442.21 \\ 0.0 & 9.0 & -3.3535 & 10.638 & -125.30 \\ 0.0 & 0.0 & 7.0 & 6.6985 & -23.605 \\ 0.0 & 0.0 & 0.0 & 3.0 & -23.784 \\ 0.0 & 0.0 & 0.0 & 0.0 & 1.0 \end{bmatrix}. $$
We see that after 50 iterations the algorithm has converged, and thus the eigenvalues of $A$ are 15, 9, 7, 3, and 1.
4 The Jacobi Method
The Jacobi Method finds all eigenpairs of any real symmetric matrix. The iterative technique of the Jacobi Method will be used to find the orthogonal decomposition of $A$, and we will show that this decomposition contains all the eigenvalues and eigenvectors of the symmetric matrix. We will be using a clever iterative technique involving similarity transformations to find an equivalent form of the orthogonal decomposition. More specifically, we will be using a special case of the similarity transformation involving special orthogonal matrices called plane rotation matrices. These plane rotations will be used to make our symmetric matrix converge to the decomposition we want.
Method 4 (The Jacobi Method).
1. Let $D_0 = A$.
2. For $k = 0, 1, \ldots$, end
   (a) Compute an orthogonal plane rotation matrix $R_k$.
   (b) Compute $D_{k+1} = R_k^T D_k R_k$.
The sequence $D_k$ then converges to a diagonal matrix with the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ on its diagonal, and the product $R_0 R_1 \cdots R_k$ converges to an orthogonal matrix containing the corresponding eigenvectors.
4.1 Discussion of the Convergence Underpinning the Jacobi Method
Let $A$ be a real symmetric matrix; hence $A$ is orthogonally diagonalizable. That is, there exists an orthogonal matrix $R$ and a diagonal matrix $D$ such that
$$ A = R D R^T. $$
Since $R$ is orthogonal, $R^T = R^{-1}$. Thus, we can multiply both sides of the equation on the left by $R^T$ and on the right by $R$ to obtain
$$ D = R^T A R. \tag{18} $$
We should consider what a similarity transformation is. A similarity transformation has the form $Y = U^{-1} X U$, where $X$ and $Y$ have the same eigenvalues and their eigenvectors are related through $U$. Since $R$ is orthogonal, (18) is a similarity transformation, and $A$ has the same eigenvalues as $D$. Equation (18) is the decomposition the Jacobi Method will strive to converge to. We are already given $A$, so the goal of finding the eigenvalues and eigenvectors is equivalent to finding the matrix $R$ such that $R^T A R$ is a diagonal matrix. However, we must first show that this decomposition contains the eigen-information.
If we define
$$ R = R_1 R_2 \cdots R_{k-1} R_k \cdots, \tag{19} $$
where $R_1, R_2, \ldots, R_{k-1}, R_k, \ldots$ are each orthogonal matrices, then $R$ is also an orthogonal matrix. We also see that
$$ R^T = \cdots R_k^T R_{k-1}^T \cdots R_2^T R_1^T. \tag{20} $$
By substituting the products in equations (19) and (20) into the right-hand side of equation (18), we obtain
$$ D = \cdots R_k^T R_{k-1}^T \cdots R_2^T R_1^T A R_1 R_2 \cdots R_{k-1} R_k \cdots, \tag{21} $$
$$ D = \cdots R_k^T ( R_{k-1}^T \cdots ( R_2^T ( R_1^T A R_1 ) R_2 ) \cdots R_{k-1} ) R_k \cdots. \tag{22} $$
We can use this equation to define a recursive relation that constructs a series of orthogonal matrices $R_1, R_2, \ldots$, where
$$ D_0 = A, \qquad D_j = R_j^T D_{j-1} R_j, \quad \text{for } j = 1, 2, \ldots. $$
Since each of these iterations is a similarity transformation, the eigenvalues and eigenvectors are preserved through each iteration. Recall that our goal is to make this sequence satisfy
$$ \lim_{j \to \infty} D_j = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n). $$
In practice, we will use a finite number of iterations $k$ to make $D_k \approx D$.
Carrying this process to the $k$th iteration produces
$$ D_k = R_k^T R_{k-1}^T \cdots R_1^T A R_1 R_2 \cdots R_{k-1} R_k. \tag{23} $$
If we define $\bar{R}$ to be the product of these orthogonal matrices,
$$ \bar{R} = R_1 R_2 \cdots R_{k-1} R_k, \tag{24} $$
then $\bar{R}$ is orthogonal, and by substitution
$$ D_k = \bar{R}^T A \bar{R}. \tag{25} $$
We are then able to obtain as good an estimate of the eigenvalues as we desire.
To show that this decomposition gives us the eigenvalues and eigenvectors, consider the following. From (25) and the orthogonality of $\bar{R}$,
$$ A \bar{R} = \bar{R} D_k. \tag{26} $$
Since $D_k \approx D$, this implies that
$$ \bar{R} D_k \approx \bar{R} D, \tag{27} $$
and combining (26) and (27) we see that
$$ A \bar{R} \approx \bar{R} D. \tag{28} $$
Recall that $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, and define the columns of $\bar{R}$ to be the vectors $x_1, x_2, \ldots, x_n$. Multiplying out each side of (28) gives us
$$ [\,A x_1 \;\; A x_2 \;\; \cdots \;\; A x_n\,] \approx [\,\lambda_1 x_1 \;\; \lambda_2 x_2 \;\; \cdots \;\; \lambda_n x_n\,], \tag{29} $$
which implies that for any $i$ with $1 \leq i \leq n$,
$$ A x_i \approx \lambda_i x_i. \tag{30} $$
Thus, the eigenvalues and eigenvectors can be found with this iterative technique.
4.2 Discussion of Plane Rotations Underpinning the Jacobi Method
The problem remains that we must make $D_k \approx D$. A way of accomplishing this is to zero out two off-diagonal elements in each iteration. The orthogonal matrices $R_1, R_2, \ldots, R_{k-1}, R_k$ will be special plane rotation matrices that accomplish this task of zeroing two off-diagonal entries.
For example, suppose we are in our first iteration and we want to eliminate $d_{pq}$ and $d_{qp}$. We take $R_1$, the first orthogonal matrix we use, to have the form
$$ R_1 = \begin{bmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & & \vdots & & \vdots & & \vdots \\
0 & \cdots & \cos\varphi & \cdots & \sin\varphi & \cdots & 0 \\
\vdots & & \vdots & & \vdots & & \vdots \\
0 & \cdots & -\sin\varphi & \cdots & \cos\varphi & \cdots & 0 \\
\vdots & & \vdots & & \vdots & & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{bmatrix}, \tag{31} $$
where the $\cos\varphi$ entries lie in the $(p,p)$ and $(q,q)$ positions and the $\sin\varphi$ and $-\sin\varphi$ entries lie in the $(p,q)$ and $(q,p)$ positions, respectively. This matrix has diagonal entries of value one except for the two entries of $\cos\varphi$, and its off-diagonal entries are zero except for $\sin\varphi$ and $-\sin\varphi$. The reason this matrix is called a plane rotation matrix is that, when multiplied with a vector, it rotates the vector by an angle $\varphi$ in the $pq$-plane.
Notice that if we form the product $B = A R_1$, the entries will be the following:
$$ b_{jk} = a_{jk}, \quad \text{when } k \neq p \text{ and } k \neq q, $$
$$ b_{jp} = a_{jp} \cos\varphi - a_{jq} \sin\varphi, \quad \text{for } j = 1, 2, \ldots, n, $$
$$ b_{jq} = a_{jp} \sin\varphi + a_{jq} \cos\varphi, \quad \text{for } j = 1, 2, \ldots, n. $$
Notice that any entry that is not in column $p$ or column $q$ is unchanged by this multiplication. We can use the same reasoning for the product $C = R_1^T A$, which changes only rows $p$ and $q$.
Since we only concern ourselves with zeroing out the two off-diagonal entries $a_{pq}$ and $a_{qp}$ of $A$, we now must compute exactly what kind of rotation matrix we need for each iteration.
Consider an arbitrary first iteration $D_1 = R_1^T A R_1$. The entries $d_{pq}$ and $d_{qp}$ become
$$ d_{pq} = d_{qp} = (\cos^2\varphi - \sin^2\varphi)\, a_{pq} + \cos\varphi \sin\varphi\, (a_{pp} - a_{qq}). \tag{32} $$
Our goal is to make the above expression zero at each iteration, so we separate the terms involving $\varphi$ from the entries of the matrix $A$ to get
$$ 0 = (\cos^2\varphi - \sin^2\varphi)\, a_{pq} + \cos\varphi \sin\varphi\, (a_{pp} - a_{qq}), \tag{33} $$
$$ \frac{\cos^2\varphi - \sin^2\varphi}{2 \cos\varphi \sin\varphi} = \frac{a_{qq} - a_{pp}}{2 a_{pq}}. \tag{34} $$
Recall the trigonometric identity
$$ \cot 2\varphi = \frac{\cos^2\varphi - \sin^2\varphi}{2 \cos\varphi \sin\varphi}. \tag{35} $$
Thus, by substitution,
$$ \cot 2\varphi = \frac{a_{qq} - a_{pp}}{2 a_{pq}}. \tag{36} $$
We now have our angle of rotation $\varphi$ and can compute the rotation matrix for our first iteration. Also, to find the matrix produced by the first iteration, we can deduce from the multiplications $B = A R_1$ and $C = R_1^T A$ that every entry that is not in row or column $p$ or $q$ is unchanged. In addition, the remaining entries can be calculated with these formulas:
$$ d_{jp} = d_{pj} = a_{jp} \cos\varphi - a_{jq} \sin\varphi, \quad \text{for } j \neq p \text{ and } j \neq q, \tag{37} $$
$$ d_{jq} = d_{qj} = a_{jp} \sin\varphi + a_{jq} \cos\varphi, \quad \text{for } j \neq p \text{ and } j \neq q, \tag{38} $$
$$ d_{pp} = a_{pp} \cos^2\varphi + a_{qq} \sin^2\varphi - 2 a_{pq} \cos\varphi \sin\varphi, \tag{39} $$
$$ d_{qq} = a_{pp} \sin^2\varphi + a_{qq} \cos^2\varphi + 2 a_{pq} \cos\varphi \sin\varphi. \tag{40} $$
This shortcut for calculating the product of each similarity transformation reduces the cost of our iterations immensely.
Thus, we perform iterations that reduce two off-diagonal entries to zero at a time, keeping track of each rotation matrix we construct. Once $D$ is sufficiently diagonal, we multiply the rotation matrices together to obtain the matrix of eigenvectors $R$, as sketched below.
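The pieces above — the rotation angle from (36), the plane rotation of the form (31), and the similarity transformation of Method 4 — fit together as in the following minimal Python sketch of the classical variant (which picks the largest off-diagonal entry, as described in the next subsection). The function name, tolerance, and iteration cap are our own illustrative choices, and for simplicity the sketch forms the full rotation matrix rather than using the entry-update shortcut (37)-(40).

import numpy as np

def classical_jacobi(A, tol=1e-6, max_iter=1000):
    """Classical Jacobi method for a real symmetric matrix A: at each step,
    zero the largest off-diagonal entry with a plane rotation.  Returns
    (approximate eigenvalues, matrix whose columns approximate the eigenvectors)."""
    D = np.array(A, dtype=float)
    n = D.shape[0]
    R = np.eye(n)
    for _ in range(max_iter):
        # pivot: largest off-diagonal entry in absolute value
        off = np.abs(D - np.diag(np.diag(D)))
        p, q = np.unravel_index(np.argmax(off), off.shape)
        if off[p, q] < tol:
            break
        # rotation angle from cot(2 phi) = (d_qq - d_pp) / (2 d_pq), as in (36)
        phi = 0.5 * np.arctan2(2.0 * D[p, q], D[q, q] - D[p, p])
        c, s = np.cos(phi), np.sin(phi)
        # plane rotation of the form (31): identity except in rows/columns p and q
        G = np.eye(n)
        G[p, p] = c; G[q, q] = c
        G[p, q] = s; G[q, p] = -s
        D = G.T @ D @ G            # similarity transformation D_{k+1} = R_k^T D_k R_k
        R = R @ G                  # accumulate the product of the rotations
    return np.diag(D), R

Applied to the symmetric matrix of Section 4.4, this sketch should reproduce the diagonal found there (approximately 3.2957, 8.4077, 11.7043, and 6.5923), possibly in a different order or with sign differences in the eigenvector columns.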
4.3 Variations of the Jacobi Method
We have some freedom when performing these iterations, as we can choose which off-diagonal entries to eliminate each time. For example, in the classical Jacobi method, we choose the off-diagonal entry of largest absolute value at each iteration. However, there is an extra cost if a program must search for the dominant off-diagonal entry at every iteration, so to reduce the computational effort we can use the cyclic method. The cyclic method is a process in which we go through the entries below the diagonal of the matrix row by row, from left to right, starting with the second row from the top and continuing down to the bottom row. One full pass through these entries is called a sweep. We repeat sweeps until all of the off-diagonal entries are within a previously defined tolerance of zero.
A way to further reduce cost in the Jacobi method is to apply more than one plane rotation at a time. If we can perform rotations in two or more different planes simultaneously, then we can reduce the number of iterations needed to solve the problem and cut computational cost by multiplying fewer rotation matrices. This can be done as long as the plane rotations combined in such a multiple-rotation matrix never share a row or column, so that they do not affect each other's entries.
4.4 Implementation of the Classical Jacobi Method
Suppose we have the symmetric matrix
$$ A = \begin{bmatrix} 8 & -1 & 3 & -1 \\ -1 & 6 & 2 & 0 \\ 3 & 2 & 9 & 1 \\ -1 & 0 & 1 & 7 \end{bmatrix}. \tag{42} $$
Using the classical Jacobi Method, we want our first iteration to zero out the entries $a_{13}$ and $a_{31}$; thus we will find a plane rotation matrix of the form
$$ R_1 = \begin{bmatrix} \cos\varphi & 0 & \sin\varphi & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\varphi & 0 & \cos\varphi & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{43} $$
Then, to make $a_{13} = a_{31} = 0$, we use equation (36) to obtain $\varphi$:
$$ \cot 2\varphi = \frac{a_{33} - a_{11}}{2 a_{13}}, \tag{44} $$
$$ \cot 2\varphi = \frac{9 - 8}{2(3)}, \tag{45} $$
$$ \cot 2\varphi = \frac{1}{6}, \tag{46} $$
$$ \varphi = \frac{1}{2} \cot^{-1}\!\left(\frac{1}{6}\right), \tag{47} $$
$$ \cos\varphi \approx 0.763020, \tag{48} $$
$$ \sin\varphi \approx 0.646375. \tag{49} $$
Thus our plane rotation matrix becomes
$$ R_1 = \begin{bmatrix} 0.763020 & 0 & 0.646375 & 0 \\ 0 & 1 & 0 & 0 \\ -0.646375 & 0 & 0.763020 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{50} $$
Thus our first iteration $D_1 = R_1^T A R_1$ becomes
$$ D_1 = \begin{bmatrix} 5.458619 & -2.055770 & 0 & -1.409395 \\ -2.055770 & 6.0 & 0.879665 & 0.0 \\ 0 & 0.879665 & 11.541381 & 0.116645 \\ -1.409395 & 0 & 0.116645 & 7 \end{bmatrix}. \tag{51} $$
Next, we choose to zero out the entries in positions $(1,2)$ and $(2,1)$ of $D_1$, which equal $-2.055770$, to make
$$ D_2 = \begin{bmatrix} 3.655795 & 0 & 0.579997 & -1.059649 \\ 0 & 7.802824 & 0.661373 & 0.929268 \\ 0.579997 & 0.661373 & 11.541381 & 0.116645 \\ -1.059649 & 0.929268 & 0.116645 & 7 \end{bmatrix}. \tag{52} $$
After ten iterations with the classical method we obtain
$$ D_{10} = \begin{bmatrix} 3.295870 & 0.002521 & 0.037859 & 0 \\ 0.002521 & 8.405210 & -0.004957 & 0.066758 \\ 0.037859 & -0.004957 & 11.704123 & -0.001430 \\ 0 & 0.066758 & -0.001430 & 6.594797 \end{bmatrix}. \tag{53} $$
We continue this process until the eighteenth iteration, where we converge to a diagonal matrix with an off-diagonal threshold of $10^{-6}$:
$$ D = \begin{bmatrix} 3.295699 & 0 & 0 & 0 \\ 0 & 8.407662 & 0 & 0 \\ 0 & 0 & 11.704301 & 0 \\ 0 & 0 & 0 & 6.592338 \end{bmatrix}. \tag{54} $$
Then, to obtain the corresponding eigenvectors, we multiply the plane rotation matrices, $R = R_1 R_2 \cdots R_{18}$, to obtain
$$ R = \begin{bmatrix} 0.528779 & -0.573042 & 0.582298 & 0.230097 \\ 0.591967 & 0.472301 & 0.175776 & -0.628975 \\ -0.536093 & 0.282050 & 0.792487 & -0.071235 \\ 0.287454 & 0.6074455 & 0.044680 & 0.739169 \end{bmatrix}. \tag{55} $$
As we can see in the example, previously zeroed entries do not remain zero. However, as the iterations proceed, the off-diagonal entries do not revert to or exceed the magnitudes they had before they were zeroed out. While there are many variations of the Jacobi method, all of them converge quadratically.
5 Conclusion
We have seen that for large matrices, no direct method exists for solving the characteristic polynomial, and hence none for finding the eigenvalues and eigenvectors. We can, however, use a variety of techniques to approximate the eigenvalues with great accuracy. The use of these processes usually requires powerful computers, but with the prevalence of modern-day computing, lack of computing power is a quickly diminishing problem.
In summary, the Power method uses an iterative technique of repeatedly multiplying an arbitrary vector by the matrix and continuously scaling, yet it can only find the largest or smallest eigenvalue. We can use the shifted Inverse Power method to adjust the original Power method to find other eigenvalues, but this requires experimentation to converge toward those eigenvalues. We can also use the QR method, which flips the order of the QR decomposition to converge to an upper triangular matrix and find all the eigenvalues, but it does not have a direct means of finding the eigenvectors of a matrix. A method that does find both the eigenvalues and the eigenvectors is the Jacobi method, which uses plane rotations to transform a symmetric matrix into a diagonal matrix.
Each of these methods has its flaws and its strengths; deciding which method to use depends upon a variety of factors: speed of computation, ease of coding, required accuracy, and what eigen-information is desired. Furthermore, in our paper we did not address the issue of complex eigenvalues. The interested reader will find among our sources several references that go deeper into the problem of complex eigenvalues and eigenvectors.
6 Works Cited
• Mathews, John H., and Kurt K. Fink. Numerical Methods Using Matlab. 4th ed. New Jersey: Prentice-Hall Inc., 2004. 612-21. Print.
• Cullen, Charles G. An Introduction to Numerical Linear Algebra. 4th ed. N.p.: PWS Pub. Co., 1993. 101-134; 137-155. Print.
• Wilkinson, John H. "Convergence of the LR, QR, and Related Algorithms." http://comjnl.oxfordjournals.org/content/8/1/77.full.pdf Web. 15 Nov. 2011.
• Press, William H. Numerical Recipes in C: The Art of Scientific Computing. 2nd ed. N.p.: Cambridge University Press, 1992. 463-469. Print.
• Watkins, David S. Fundamentals of Matrix Computations. New York: John Wiley & Sons, 1991. Print.