
Matrix Analysis and Applications

Chapter 8: QR and Interpolative Decomposition

Minghua Xia
(https://seit.sysu.edu.cn/teacher/XiaMinghua)

School of Electronics and Information Technology


Sun Yat-sen University

November 15, 2023

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 1 / 108
Recap of the Overview of the Course

I. Five basic mathematical problems:


1. Ax = b
2. Ax = λx
3. Av = σu
4. max ‖Ax‖₂ / ‖x‖₂
5. Factor the matrix A
II. Organization:
Chapter 1: Basic concepts
Chapters 2-4: Eigenvalue decomposition
Chapter 5: Singular value decomposition
Chapters 6-7: Linear systems (algebra vs. geometry)
Chapters 8-10: Matrix factorization and applications

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 2 / 108
Table of Contents
1 QR Decomposition
2 Gram-Schmidt Orthonormalization
3 Givens Rotations
4 Householder Transformations
5 Interpolative Decomposition: CZ and CMR Factorization
CZ Factorization
CMR Factorization
Three Bases for the Column Space
6 Computing Eigenvalues and Eigenvectors
Power Iteration
Jacobi Method
QR Algorithm
Krylov Subspace Methods
Generalized Eigenvalue Problems

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 3 / 108
QR Decomposition
Definition 1 (QR Decomposition)
Any A ∈ Cm×n can be decomposed as

A = QR, (1)

where Q ∈ Cm×m is unitary, and R ∈ Cm×n is upper triangular.

The QR decomposition is efficient to compute, a.k.a. the poor man's SVD.

It can be used to compute
LS solutions;
bases of R(A) and R(A)⊥;
decompositions such as the SVD.

Question 1
1) How do we know that QR decomposition exists?
2) How to compute the factors Q and R algorithmically?
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 5 / 108
An Illustrative Application: Solving LS Problem
Consider the LS problem for a full column-rank A ∈ Cm×n:

    min_{x ∈ Cn} ‖Ax − y‖₂².    (2)

Let

    A = QR = [Q1  Q2] [R1; 0] = Q1 R1,    (3)

where Q1 ∈ Cm×n, Q2 ∈ Cm×(m−n), and R1 ∈ Cn×n is upper triangular. Since

    ‖Ax − y‖₂² = ‖Qᴴ(Ax − y)‖₂²
               = ‖[Q1ᴴ(Ax − y); Q2ᴴ(Ax − y)]‖₂²    (4)
               = ‖R1 x − Q1ᴴ y‖₂² + ‖Q2ᴴ y‖₂²,    (5)

the LS problem is equivalent to solving the upper triangular system

    R1 x = Q1ᴴ y.    (6)

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 6 / 108
QR for Full Column-Rank Matrices

Theorem 2 (QR Decomposition)


Let A ∈ Cm×n, and suppose that A has full column rank. Then, A admits a decomposition

    A = QR = [Q1  Q2] [R1; 0] = Q1 R1,    (7)

where Q1 ∈ Cm×n is semi-unitary, and R1 ∈ Cn×n is upper triangular and nonsingular.

We call A = Q1 R1 a thin QR of A, and (Q1 , R1 ) a thin QR pair of A.

It follows directly that

Property 1
For the thin QR in Theorem 2, we have R(A) = span(Q1) and R(A)⊥ = span(Q2). Moreover, Q1 Q1ᴴ is the orthogonal projector onto R(A).

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 7 / 108
QR for Full Column-Rank Matrices (Cont’d)

Proof.
1) AᴴA is positive definite.

2) Let R1 be the (upper triangular and nonsingular) Cholesky factor of AᴴA, i.e.,

       AᴴA = R1ᴴ R1.

   Also, let Q1 = A R1⁻¹.

3) It can be verified that Q1ᴴ Q1 = I and Q1 R1 = A.
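The construction in this proof maps directly to code. Below is a minimal NumPy sketch (function name my own) that builds a thin QR pair from the Cholesky factor of AᴴA, assuming A has full column rank as in Theorem 2:

    import numpy as np

    def thin_qr_via_cholesky(A):
        """Thin QR pair (Q1, R1) following the proof of Theorem 2."""
        # R1 is the upper-triangular Cholesky factor of A^H A, i.e., A^H A = R1^H R1.
        R1 = np.linalg.cholesky(A.conj().T @ A).conj().T
        # Q1 = A R1^{-1}, obtained by solving the triangular system R1^H Q1^H = A^H.
        Q1 = np.linalg.solve(R1.conj().T, A.conj().T).conj().T
        return Q1, R1

    A = np.random.randn(6, 3) + 1j * np.random.randn(6, 3)
    Q1, R1 = thin_qr_via_cholesky(A)
    print(np.allclose(Q1 @ R1, A), np.allclose(Q1.conj().T @ Q1, np.eye(3)))

In practice this route is avoided for ill-conditioned A, since forming AᴴA squares the condition number (a point made again later in this chapter); Gram-Schmidt, Givens, or Householder QR are preferred.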

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 8 / 108
Application: A Geometrical Interpretation of Determinant
Question 2
When X ∈ Rm×n has linearly independent columns, explain why the volume of
the n-dimensional parallelepiped generated by the columns of X is
    Vn = (det(XᵀX))^{1/2}.    (8)

In particular, if X is square, then

Vn = |det (X)| . (9)

Figure 1: Geometric interpretation of determinant.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 9 / 108
Application: A Geometrical Interpretation of Determinant

Solution: Recalling the application ‘Determine the Volume of n-Dimensional


Parallelepiped’ in Chapter 7, we know that if X = QR is the QR factorization of
X, then the volume of the n-dimensional parallelepiped generated by the columns
of X is
Vn = v1 v2 · · · vn = det(R), (10)
where the vk ’s are the diagonal elements of the upper-triangular matrix R.

On the other hand, it is not hard to prove that

    det(XᵀX) = det(Rᵀ Qᵀ Q R) = (det(R))² = (v1 v2 · · · vn)² = Vn²,    (11)

which leads to (8). If X is square, we have

    det(XᵀX) = det(Xᵀ) det(X) = (det(X))² = (det(R))² = Vn²,    (12)

which implies (9).

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 10 / 108
Classical Gram-Schmidt for Thin QR

Question 3
Given a full column-rank A, how to compute its thin QR pair (Q1 , R1 )?

The proof of Theorem 2 already suggests a way to compute


(Q1 , R1 ), but there are more efficient ways.

Suppose that A = Q1 R1 exists. Then, the k th column of A = Q1 R1


equals
ak = r1k q1 + r2k q2 + · · · + rkk qk . (13)
Stage 1: since a1 = r11 q1 , we can obtain r11 and q1 via

r11 = ‖a1‖₂,   q1 = a1 / ‖a1‖₂.    (14)

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 12 / 108
Classical Gram-Schmidt for Thin QR (Cont’d)
Stage 2: since a2 = r12 q1 + r22 q2 and q1H q2 = 0, we can obtain r12 via

r12 = q1H a2 . (15)

Also, we can obtain r22 and q2 via a cancellation trick:

r22 = ‖y2‖₂,   q2 = y2 / ‖y2‖₂,    (16)

where y2 = a2 − r12 q1 .

Stage 3: we have a3 = r13 q1 + r23 q2 + r33 q3. Using the same trick as above
yields
r13 = q1H a3 , r23 = q2H a3 , (17)
and
q3 = y3 / ‖y3‖₂,   r33 = ‖y3‖₂,    (18)
where y3 = a3 − r13 q1 − r23 q2 .

The same idea applies to the subsequent stages.


Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 13 / 108
Classical Gram-Schmidt for Thin QR

Gram-Schmidt procedure: for k = 1, · · · , n, compute

    rik = qiᴴ ak,   i = 1, · · · , k − 1,    (19a)
    yk = ak − Σ_{i=1}^{k−1} rik qi,    (19b)
    qk = yk / ‖yk‖₂,    (19c)
    rkk = ‖yk‖₂.    (19d)

There are several variants of the Gram-Schmidt implementation (numerical stability is usually the main concern).
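As a concrete reference, here is a minimal NumPy sketch of the classical procedure (19a)-(19d) for a full column-rank A (function name my own):

    import numpy as np

    def classical_gram_schmidt(A):
        """Thin QR by classical Gram-Schmidt, Eqs. (19a)-(19d)."""
        m, n = A.shape
        Q = np.zeros((m, n), dtype=complex if np.iscomplexobj(A) else float)
        R = np.zeros((n, n), dtype=Q.dtype)
        for k in range(n):
            R[:k, k] = Q[:, :k].conj().T @ A[:, k]    # r_ik = q_i^H a_k
            y = A[:, k] - Q[:, :k] @ R[:k, k]         # y_k = a_k - sum_i r_ik q_i
            R[k, k] = np.linalg.norm(y)               # r_kk = ||y_k||_2
            Q[:, k] = y / R[k, k]                     # q_k = y_k / ||y_k||_2
        return Q, R

A quick check such as np.allclose(Q @ R, A) on a random tall matrix confirms the factorization; the loss of orthogonality mentioned on the next slide shows up in np.linalg.norm(Q.conj().T @ Q - np.eye(n)) when A is ill-conditioned.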

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 14 / 108
Classical Gram-Schmidt for Thin QR (Cont’d)

1) The classical Gram-Schmidt procedure often suffers from loss of orthogonality in finite precision.

2) Also, separate storage is required for A, Q, and R, since original ak


are needed in inner loop, so qk cannot overwrite columns of A.

3) Both deficiencies are improved by modified Gram-Schmidt procedure,


with each vector orthogonalized in turn against all subsequent
vectors, so qk can overwrite ak .

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 15 / 108
Modified Gram-Schmidt
Let riᵀ denote the ith row of R1 (with a slight abuse of notation). The following illustrates the modified Gram-Schmidt procedure, which is numerically more stable. It starts from the rank-one decomposition

    A = Q1 R1 = Σ_{i=1}^{n} qi riᵀ.    (Rank-one decomposition) (20)

Suppose that q1, · · · , qk−1 and r1, · · · , rk−1 are determined. By observing

    A − Σ_{i=1}^{k−1} qi riᵀ = Σ_{i=k}^{n} qi riᵀ = [0, · · · , 0, rkk qk, · · · ]    (with k − 1 leading zero columns),    (21)

we can obtain qk and rk, for k = 1, · · · , n, via

    yk = (A − Σ_{i=1}^{k−1} qi riᵀ) ek,    (22a)
    qk = yk / ‖yk‖₂,    (22b)
    rkᵀ = qkᵀ (A − Σ_{i=1}^{k−1} qi riᵀ).    (22c)
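For comparison, a minimal sketch of the modified procedure (22a)-(22c) in the real case, where qk overwrites the kth column of a working copy of A (names my own):

    import numpy as np

    def modified_gram_schmidt(A):
        """Thin QR by modified Gram-Schmidt, Eqs. (22a)-(22c), real case."""
        V = A.astype(float).copy()          # working copy; q_k overwrites column k
        m, n = V.shape
        R = np.zeros((n, n))
        for k in range(n):
            R[k, k] = np.linalg.norm(V[:, k])
            V[:, k] /= R[k, k]
            # Orthogonalize all subsequent columns against q_k immediately.
            R[k, k+1:] = V[:, k] @ V[:, k+1:]
            V[:, k+1:] -= np.outer(V[:, k], R[k, k+1:])
        return V, R

The only difference from the classical version is that each new qk is removed from the remaining columns right away, which is what restores numerical orthogonality.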

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 16 / 108
Givens Rotations
Consider yet another way for QR; again, assume the real-valued case.
Example 3
Given x ∈ R2, consider

    [y1; y2] = J [x1; x2] = [c  s; −s  c] [x1; x2] = [c x1 + s x2; −s x1 + c x2],    (23)

where c ≜ cos(θ) and s ≜ sin(θ), for some θ. It can be verified that

1) J is orthogonal.
2) y2 = 0 if θ = tan⁻¹(x2 / x1), or equivalently if

       c = x1 / √(x1² + x2²),   s = x2 / √(x1² + x2²).    (24)

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 18 / 108
Givens Rotations (Cont’d)

Givens rotations: J(i, k, θ) is the identity matrix with its (i, i), (i, k), (k, i), (k, k) entries replaced by c, s, −s, c, respectively,    (25)

where c ≜ cos(θ), s ≜ sin(θ).
1) J(i, k, θ) is orthogonal;
2) Let y = J(i, k, θ) x. We have

       yj = c xi + s xk,    if j = i;
       yj = −s xi + c xk,   if j = k;
       yj = xj,             if j ≠ i, k.    (26)

   Also, yk can be forced to zero by choosing θ = tan⁻¹(xk / xi).

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 19 / 108
Givens Rotations (Cont’d)

Example 4
Consider a 4 × 3 matrix A (× denotes a generally nonzero entry):

    A = [× × ×; × × ×; × × ×; × × ×]
      --J2,1--> [× × ×; 0 × ×; × × ×; × × ×]
      --J3,1--> [× × ×; 0 × ×; 0 × ×; × × ×]
      --J4,1--> [× × ×; 0 × ×; 0 × ×; 0 × ×]
      --J3,2--> [× × ×; 0 × ×; 0 0 ×; 0 × ×]
      --J4,2--> [× × ×; 0 × ×; 0 0 ×; 0 0 ×]
      --J4,3--> [× × ×; 0 × ×; 0 0 ×; 0 0 0] = R,

where B --J--> C means C = J B; Ji,k = J(i, k, θ), with θ chosen to zero out the (i, k)th entry of the matrix being transformed.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 20 / 108
Givens Rotations (Cont’d)

Givens QR: perform a sequence of Givens rotations to annihilate the lower triangular part of A, obtaining

    (Jn,n−1) · · · (Jn,2 · · · J3,2)(Jn,1 · · · J2,1) A = Qᵀ A = R,    (27)

where R is upper triangular, and Q is orthogonal (Qᵀ is the accumulated product of rotations).
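A minimal NumPy sketch of Givens QR for a real matrix, annihilating entries column by column as in Example 4 (function names my own):

    import numpy as np

    def givens(x1, x2):
        """Return (c, s) such that [c s; -s c] [x1; x2] = [r; 0], as in Eq. (24)."""
        r = np.hypot(x1, x2)
        return (1.0, 0.0) if r == 0.0 else (x1 / r, x2 / r)

    def givens_qr(A):
        """Full QR A = Q R by a sequence of Givens rotations (real case)."""
        m, n = A.shape
        R = A.astype(float).copy()
        Q = np.eye(m)
        for k in range(n):                   # column whose subdiagonal is annihilated
            for i in range(k + 1, m):        # zero out entry (i, k) against row k
                c, s = givens(R[k, k], R[i, k])
                G = np.array([[c, s], [-s, c]])
                R[[k, i], :] = G @ R[[k, i], :]
                Q[:, [k, i]] = Q[:, [k, i]] @ G.T   # accumulate Q, so that A = Q R
        return Q, R

Because each rotation touches only two rows, rotations whose target entry is already zero can simply be skipped, which is the advantage noted in Remark 1 below.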

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 21 / 108
Example 5 (Givens Rotation)
1) Let x = [4  3]ᵀ.

2) To annihilate the second entry, we compute the cosine and sine

       c = x1 / √(x1² + x2²) = 0.8   and   s = x2 / √(x1² + x2²) = 0.6.

3) The rotation is then given by

       J = [c  s; −s  c] = [0.8  0.6; −0.6  0.8].

4) Applying the rotation to x gives

       Jx = [0.8  0.6; −0.6  0.8] [4; 3] = [5; 0].

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 22 / 108
Characteristics of Givens QR Factorization

Remark 1
1) Straightforward implementation of the Givens method requires about
50% more work than the Householder method and requires more
storage since each rotation requires two numbers, c and s, to define
it.

2) These disadvantages can be overcome but require a more complicated implementation.

3) Givens can be advantageous for computing QR factorization when


many matrix entries are already zero since those annihilations can
then be skipped.


Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 23 / 108
Householder Transformations
Consider an alternative QR via the Householder transformations (to be
introduced). Assume the real-valued case for convenience.

Definition 6 (Reflection Matrix)


A matrix of the form
    H ≜ I − 2P,    (28)
where P ∈ Rm×m is an orthogonal projector, is called a reflection matrix.

By letting a = Pa + P⊥a, with P⊥ = I − P, it is seen that

    Ha = (I − 2P)a = −Pa + (I − P)a = −Pa + P⊥a,    (29)

where a is 'reflected' (its component in R(P) is negated).
A reflection matrix is both orthogonal and symmetric:

    H = Hᵀ = H⁻¹.    (30)

Proof: HᵀH = (I − 2P)ᵀ(I − 2P) = I − 2Pᵀ − 2P + 4PᵀP = I, since Pᵀ = P and PᵀP = P.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 25 / 108
Householder Transformations (Cont’d)
Question 4
Given a ∈ Rm , find a reflection matrix H ∈ Rm×m such that

Ha = βe1 , for some β ∈ R. (31)

Definition 7 (Householder Transformations)


Let
    H ≜ I − (2 / ‖v‖₂²) v vᵀ,    (32)
for some v ∈ Rm (compared with (28), P = v vᵀ / ‖v‖₂² is indeed the orthogonal projector onto R(v)). It can be verified that if

    v = a + sign(a1) ‖a‖₂ e1,    (33)

where sign(a1) is introduced to avoid cancellation, then we obtain

    Ha = −sign(a1) ‖a‖₂ e1.    (34)


Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 26 / 108
Householder Transformations (Cont’d)

Figure 2: Geometric interpretation of Householder transformation as reflection.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 27 / 108
Example 8 (Householder Transformation)
1) If a = [2  1  2]ᵀ, then we take

       v = a + sign(a1) ‖a‖₂ e1 = [2; 1; 2] + ‖a‖₂ [1; 0; 0] = [5; 1; 2].

2) To confirm that the transformation works,

       Ha = a − 2 (vᵀa / ‖v‖₂²) v = [2; 1; 2] − 2 × (15/30) × [5; 1; 2] = [−3; 0; 0] = −sign(a1) ‖a‖₂ e1.


Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 28 / 108
Householder QR Factorization

Stage 1: let H1 ∈ Rm×m be the Householder transformation w.r.t. a1, i.e., the 1st column of A. Then,

    A⁽¹⁾ = H1 A = [× × ··· ×; 0 × ··· ×; ⋮ ⋮ ⋱ ⋮; 0 × ··· ×].    (35)

Stage 2: let H̃2 ∈ R(m−1)×(m−1) be the Householder transformation w.r.t. the 1st column of A⁽¹⁾_{2:m, 2:n}. Then,

    A⁽²⁾ = [1  0; 0  H̃2] A⁽¹⁾ = H2 H1 A = [× × × ··· ×; 0 × × ··· ×; 0 0 × ··· ×; ⋮ ⋮ ⋮ ⋱ ⋮; 0 0 × ··· ×],    (36)

where H2 ≜ [1  0; 0  H̃2].

The same idea applies to stage k, 1 ≤ k ≤ n − 1.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 29 / 108
Householder QR Factorization (Cont’d)
Householder QR procedure: for k = 1, · · · , n − 1, compute

    A⁽ᵏ⁾ = Hk A⁽ᵏ⁻¹⁾,    (37)

where

    Hk = [Ik−1  0; 0  H̃k],    (38)

in which Ik−1 is the (k − 1) × (k − 1) identity matrix, and H̃k ∈ R(m−k+1)×(m−k+1) is the Householder transformation w.r.t. the 1st column of A⁽ᵏ⁻¹⁾_{k:m, k:n}.

The above procedure results in

    A⁽ⁿ⁻¹⁾ = Hn−1 · · · H2 H1 A = R.    (39)

Eq. (39) implies Q = H1ᵀ H2ᵀ · · · Hn−1ᵀ.
The above QR procedure already implies the existence of the full QR; also, A can be arbitrary rather than being of full column rank as in Gram-Schmidt.
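A compact NumPy sketch of the procedure (37)-(39) for a real matrix (function name my own; a production code would store the vectors vk rather than forming Q explicitly):

    import numpy as np

    def householder_qr(A):
        """Full QR A = Q R via Householder transformations, Eqs. (37)-(39)."""
        m, n = A.shape
        R = A.astype(float).copy()
        Q = np.eye(m)
        for k in range(min(n, m - 1)):
            a = R[k:, k]
            v = a.copy()
            v[0] += np.copysign(np.linalg.norm(a), a[0])   # v = a + sign(a1)||a||_2 e1, Eq. (33)
            nv = np.linalg.norm(v)
            if nv == 0.0:
                continue                                    # column already annihilated
            v /= nv
            R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])   # apply H_k = I - 2 v v^T
            Q[:, k:]  -= 2.0 * np.outer(Q[:, k:] @ v, v)    # accumulate Q = H_1^T H_2^T ...
        return Q, R

Running it on the 5 × 3 matrix of Example 9 below should reproduce the triangular factor obtained there, with Q R = A to machine precision.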

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 30 / 108
Example 9 (Householder QR Factorization)
1) For the polynomial data-fitting example given previously, with

       A = [1 −1.0 1.0; 1 −0.5 0.25; 1 0.0 0.0; 1 0.5 0.25; 1 1.0 1.0],   b = [1.0; 0.5; 0.0; 0.5; 2.0].

2) The Householder vector v1 for annihilating the subdiagonal entries of the first column of A is

       v1 = [1; 1; 1; 1; 1] + [2.236; 0; 0; 0; 0] = [3.236; 1; 1; 1; 1].

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 31 / 108
Example 9 (Householder QR Factorization (Cont’d))
3) Applying the resulting Householder transformation H1 yields the transformed matrix and right-hand side

       H1 A = [−2.236 0 −1.118; 0 −0.191 −0.405; 0 0.309 −0.655; 0 0.809 −0.405; 0 1.309 0.345],
       H1 b = [−1.789; −0.362; −0.862; −0.362; 1.138].

4) The Householder vector v2 for annihilating the subdiagonal entries of the second column of H1 A is

       v2 = [0; −0.191; 0.309; 0.809; 1.309] + [0; −1.581; 0; 0; 0] = [0; −1.772; 0.309; 0.809; 1.309].

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 32 / 108
Example 9 (Householder QR Factorization (Cont’d))
5) Applying the resulting Householder transformation H2 yields

       H2 H1 A = [−2.236 0 −1.118; 0 1.581 0; 0 0 −0.725; 0 0 −0.589; 0 0 0.047],
       H2 H1 b = [−1.789; 0.632; −1.035; 0.816; 0.404].

6) The Householder vector v3 for annihilating the subdiagonal entries of the third column of H2 H1 A is

       v3 = [0; 0; −0.725; −0.589; 0.047] + [0; 0; −0.935; 0; 0] = [0; 0; −1.660; −0.589; 0.047].

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 33 / 108
Example 9 (Householder QR Factorization (Cont’d))
7) Applying the resulting Householder transformation H3 yields

       H3 H2 H1 A = [−2.236 0 −1.118; 0 1.581 0; 0 0 0.935; 0 0 0; 0 0 0],
       H3 H2 H1 b = [−1.789; 0.632; 1.336; 0.026; 0.337].

8) Now solve the upper triangular system R x = c1 by back-substitution to obtain

       x = [0.086  0.400  1.429]ᵀ.


Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 34 / 108
Further Study: Hilbert Transform
Question 5
The Householder transform is able to change the amplitudes of the elements of a vector. But how can we change their phases?

Answer: the Hilbert transform,

    x̂(t) = H{x(t)} = (1/π) ∫_{−∞}^{∞} x(τ) / (t − τ) dτ,    (40)
    x(t) = H⁻¹{x̂(t)} = −(1/π) ∫_{−∞}^{∞} x̂(τ) / (t − τ) dτ.    (41)

Figure 3: The Hilbert transform in the time domain, which is, in essence, a π/2 phase shifter in the frequency domain.

    H(ω) = F{h(t)} = −j, if ω ≥ 0;  +j, if ω < 0.    (42)
An application in signal processing is to construct complex signals.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 35 / 108
Interpolative Decomposition

Definition 10 (Interpolative Decomposition)


Given A ∈ Rm×n , we have

A = CZ or A = CM R, (43)

where C (resp., R) are actual columns (resp. actual rows) of A – thoughtfully


chosen.

A = CZ takes less computing time and less storage than A = U ΣV T and


A = QR;
When A is sparse or nonnegative or both, so is C;
When A comes from discretizing a differential or integral equation, the
columns in C have more meanings than the orthogonal bases in U and Q;
When A is a matrix of data, the columns kept in C have simple
interpretations.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 37 / 108
Interpolative Decomposition (Cont’d)

Comparison with SVD and Nonnegative Matrix Decomposition (NMF):


SVD: The singular vectors are algebraically and geometrically perfect but
they are “humanly” hard to know;
NMF: When the entries in A are intrinsically positive, the goal may be an
NMF. Then, both factors in A ≈ M N have nonnegative entries. This is
important for moderate sizes, but it is asking a lot in the world of big data;
A = CZ is right for large matrices when Z has small entries (say |zij | ≤ 2).

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 38 / 108
CZ Factorization
1 When C contains a basis for the column space of A, the factor Z is
completely determined by
aj = Czj . (44)
2 When the columns of C are not very independent, we first look at the row
space of C:
ciᵀ = yiᵀ B,    (45)
where B is an r × r invertible matrix. (45) implies C = Y B. Then,
combining it with A = CZ yields
Am×n = Cm×r Zr×n = Ym×r Br×r Zr×n , (46)
where all matrices have rank r.
Remark 2 (On the values of zj and yiT )
By recalling the fact that the columns of C came directly from A and the rows of B came directly from C, for those columns and rows, the vectors zj in (44) and yiᵀ in (45) came directly from the identity matrix I.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 40 / 108
CZ Factorization (Cont’d)
3  According to Remark 2, if we suppose that zj and yiᵀ are the first r columns and rows in A and C, then A = Y B Z has the special form

    Y = [Ir; Cm−r B⁻¹],   B = r × r submatrix of A,   Z = [Ir   B⁻¹ Zn−r].    (47)

Example 11

    A = [1 2 4 2; 0 1 2 1; 1 3 6 3] = Y B Z,  with  Y = [1 0; 0 1; 1 1],  B = [1 2; 0 1],  Z = [1 0 0 0; 0 1 2 1],    (48)

    and hence A = C Z,  with  C = Y B = [1 2; 0 1; 1 3].    (49)

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 41 / 108
CZ Factorization (Cont’d)

Remark 3 (On the choice of B)

1) We are just supposing that the upper left r × r corner B of A is invertible.


But every matrix A of rank r has an invertible r × r submatrix B
somewhere, and even maybe many such Bs.
2) If Cm−r or Zn−r or both happen to contain large numbers, the factors in
(47) are bad choices. We are happiest if those other entries of Y and Z are
small. The best is |yij | ≤ 1 and |zij | ≤ 1. And we can find a submatrix B
that makes this true!

4 According to Remark 3, if we choose B as the r × r submatrix of A with


the largest determinant, then all entries of Y and Z have |yij | ≤ 1 and
|zij | ≤ 1. Accordingly, we have

A = Y Bmax Z (50)
= CZ. (51)

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 42 / 108
CZ Factorization (Cont’d)

Example 12

    A = [1 2 4 2; 0 1 2 1; 1 3 6 3] = Y Bmax Z,  with  Y = [1 0; 0 1; 1 1],  Bmax = [1 4; 0 2],  Z = [1 0 0 0; 0 0.5 1 0.5],    (52)

    and hence A = C Z,  with  C = Y Bmax = [1 4; 0 2; 1 6].    (53)

Note: In the applications of big data analysis, A is generally rank-deficient and


A = CZ holds for sure. However, in case A is of full column rank, it is obvious
that one must choose C = A such that A = CZ, where Z = I. If C is chosen
with a lower rank than A, then one can only get the approximation A ≈ CZ.
For more details on the approximation error, see the following Definition 13.
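A quick numerical check of Examples 11 and 12 (the matrices are copied from the examples above; this is only a verification sketch):

    import numpy as np

    A = np.array([[1., 2., 4., 2.],
                  [0., 1., 2., 1.],
                  [1., 3., 6., 3.]])

    # Example 11: B is the upper-left 2 x 2 corner of A.
    C1 = np.array([[1., 2.], [0., 1.], [1., 3.]])
    Z1 = np.array([[1., 0., 0., 0.], [0., 1., 2., 1.]])
    print(np.allclose(A, C1 @ Z1))     # True

    # Example 12: Bmax uses columns 1 and 3, which keeps |z_ij| <= 1.
    C2 = np.array([[1., 4.], [0., 2.], [1., 6.]])
    Z2 = np.array([[1., 0., 0., 0.], [0., 0.5, 1., 0.5]])
    print(np.allclose(A, C2 @ Z2))     # True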
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 43 / 108
CMR Factorization

Definition 13 (CMR Factorization)


Given A ∈ Rm×n , if rank(A) = r < min{m, n}, we have

A = CM R, (54)

where C (resp., R) are actual columns (resp. actual rows) of A – thoughtfully chosen,
and M is a mixing matrix. Otherwise, we have the approximation

A ≈ CM R, (55)

with

    ‖A − CMR‖ ≤ (‖Ur⁻¹‖ + ‖Vr⁻¹‖) σr+1.    (56)

The basic idea for implementing the CMR factorization is the discrete empirical
interpolation method (DEIM). Specifically, based on the SVD decomposition
A = U ΣV T , the DEIM procedure uses U and V to select the columns and rows of A
that form C and R, respectively. For more details, please refer to the work1 .
1
D. C. Sorensen and M. Embree, A DEIM Induced CUR Factorization, SIAM Journal on Scientific Computing, vol. 38,
no. 3, pp. A1454-A1482, 2016.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 45 / 108
Three Bases for the Column Space

Motivation: As the matrices get large, their rank is also large if there is random
noise. But the effective rank, when the noise is removed, may be considerably
smaller than m and n. So,
1) Don’t use AT A and AAT when you can operate directly on A;
2) Don’t assume that the original order of the rows and columns is necessarily
the best.
The first warning would apply to the least squares and SVD problems because by
forming AT A, we are squaring the condition number σ1 /σn . This measures the
sensitivity and vulnerability of A. Also, for really large matrices, the cost of
computing and storing AT A is just unthinkable.

Question 6
What do we really need from A?

Answer: We need a good basis for the column space! From that starting point,
we can do anything!

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 47 / 108
Three Bases for the Column Space (Cont’d)
1  Singular vectors u1, · · · , ur from the SVD, with σ1 ≥ σ2 ≥ · · · ≥ σr:

       A = Ur × (Σr Vrᵀ).    (57)

   The column basis u1, · · · , ur in U is orthonormal. It has the extra property that the rows σk vkᵀ of the second factor Σr Vrᵀ are also orthogonal.
2 Orthogonal vectors q1 , · · · , qr from Gram-Schmidt. Using column pivoting!

A = Qm×r × Rr×n . (58)

The column basis q1 , · · · , qr in Q is orthonormal. The rows of the second part R


are not orthogonal but they are well-conditioned (safely independent).
3 Independent columns c1 , · · · , cr taken directly from A after column
exchanges:
A = Cm×r × Zr×n . (59)
The column basis c1 , · · · , cr in C is not orthogonal. But both C and the second
factor Z can be well conditioned. This is achieved by allowing column exchanges.
(Not just allowing but insisting.) C contains r “good columns” from A.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 48 / 108
Three Bases for the Column Space (Cont’d)

Remark 4 (How to Compute the Three Bases?)


1) We start from A = QR with column pivoting, i.e.,
AP = QR, (60)
where P is a permutation matrix that rearranges the first k columns of AP (and thus Q)
to be the important columns.
2) Make a low-rank approximation to A by keeping the first r columns of Q and the first r rows of R:

       AP = QR ≈ Qr Rr   and   A ≈ Qr Rr Pᵀ,    (61)

   i.e., A = Qr Rr Pᵀ + E ≈ Qr Rr Pᵀ.
3) Find the SVD of the matrix Rr P T with only r rows: Rr P T = Ur ΣV T .
4) Multiply Qr times Ur to find U = Qr Ur such that
A = Qr Ur ΣV T + E = U ΣV T + E. (62)
5) With U and V, the CMR decomposition can be derived via the DEIM procedure (footnote 1).
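A sketch of steps 1-4 above using SciPy's column-pivoted QR (scipy.linalg.qr with pivoting=True returns the permutation as an index vector; the truncation rank r is the user's choice):

    import numpy as np
    from scipy.linalg import qr, svd

    def low_rank_svd_via_pivoted_qr(A, r):
        """Approximate rank-r SVD of A from a column-pivoted QR (Remark 4)."""
        Q, R, piv = qr(A, mode='economic', pivoting=True)   # A[:, piv] = Q R
        P = np.eye(A.shape[1])[:, piv]                       # permutation matrix: A P = Q R
        Qr, Rr = Q[:, :r], R[:r, :]                          # keep r columns of Q, r rows of R
        Ur, s, Vt = svd(Rr @ P.T, full_matrices=False)       # SVD of the r-row matrix Rr P^T
        return Qr @ Ur, s, Vt                                # A is approximately (Qr Ur) diag(s) V^T

From U = Qr Ur and V, the DEIM-based column/row selection of the CMR factorization can then proceed as in Definition 13.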

Further reading: Please read carefully Sections II.3-II.4 of the book2 .

2
Gilbert Strang, Linear Algebra and Learning from Data, Wellesley-Cambridge University Press, 2019, pp. 138-155.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 49 / 108
Power Iteration (λmax )

1) Simplest method for computing one eigenvalue-eigenvector pair is


power iteration, which repeatedly multiplies matrix times initial
starting vector.

2) Assume A has unique eigenvalue of maximum modulus, say λ1 , with


corresponding eigenvector v1 .

3) Then, starting from nonzero vector x0 , iteration scheme

xk = Axk−1 (63)

converges to multiple of eigenvector v1 corresponding to dominant


eigenvalue λ1 .

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 52 / 108
Convergence of Power Iteration
1) To see why power iteration converges to the dominant eigenvector, express the starting vector x0 as a linear combination

       x0 = Σ_{i=1}^{n} αi vi,    (64)

   where the vi are eigenvectors of A.

2) Then

       xk = A xk−1 = A² xk−2 = · · · = Aᵏ x0
          = Σ_{i=1}^{n} λiᵏ αi vi = λ1ᵏ (α1 v1 + Σ_{i=2}^{n} (λi/λ1)ᵏ αi vi)
          → λ1ᵏ α1 v1,   as k → ∞.    (65)
3) Since |λi /λ1 | < 1 for i > 1, successively higher powers go to zero,
leaving only component corresponding to v1 .
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 53 / 108
Example 14 (Power Iteration)
1) Ratio of values of given component of xk from one iteration to next
converges to the dominant eigenvalue λ1.

2) For example, if A = [1.5  0.5; 0.5  1.5] and x0 = [0; 1], we obtain
k xTk ratio
0 0.0 1.0
1 0.5 1.5 1.500
2 1.5 2.5 1.667
3 3.5 4.5 1.800
4 7.5 8.5 1.889
5 15.5 16.5 1.941
6 31.5 32.5 1.970
7 63.5 64.5 1.985
8 127.5 128.5 1.992
3) Ratio converges to the dominant eigenvalue, which is 2.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 54 / 108
Limitations of Power Iteration

Power iteration can fail for various reasons:


1) Starting vector may have no component in dominant eigenvector v1 ,
i.e., α1 = 0. Nevertheless, this problem does not occur in practice
because rounding error usually introduces such component in any
case.

2) There may be more than one eigenvalue having the same (maximum)
modulus, in which case iteration may converge to a linear
combination of corresponding eigenvectors.

3) For real matrix and starting vector, iteration can never converge to a
complex vector.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 55 / 108
Normalized Power Iteration

1) Geometric growth of components at each iteration risks eventual overflow (or underflow if |λ1| < 1).

2) Approximate eigenvector should be normalized at each iteration, say,


by requiring its largest component to be unity in modulus, giving
iteration scheme

       yk = A xk−1,    (66)
       xk = yk / ‖yk‖∞.    (67)

3) With normalization, ‖yk‖∞ → |λ1| and xk → v1 / ‖v1‖∞.
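A minimal sketch of the normalized scheme (66)-(67); the function name and iteration count are my own:

    import numpy as np

    def power_iteration(A, x0, iters=8):
        """Normalized power iteration, Eqs. (66)-(67)."""
        x = np.asarray(x0, dtype=float)
        lam = 0.0
        for _ in range(iters):
            y = A @ x
            lam = np.linalg.norm(y, np.inf)    # ||y_k||_inf -> |lambda_1|
            x = y / lam
        return lam, x

    A = np.array([[1.5, 0.5], [0.5, 1.5]])
    print(power_iteration(A, [0.0, 1.0]))      # lam approaches 2, x approaches [1, 1]

Eight iterations reproduce the values 1.992 and [0.992, 1.0] from Example 15 below.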

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 56 / 108
Example 15 (Normalized Power Iteration)
Repeating the previous example with the normalized scheme,
k xTk kyk k∞
0 0.000 1.0
1 0.333 1.0 1.500
2 0.600 1.0 1.667
3 0.778 1.0 1.800
4 0.882 1.0 1.889
5 0.939 1.0 1.941
6 0.969 1.0 1.970
7 0.984 1.0 1.985
8 0.992 1.0 1.992


Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 57 / 108
Geometric Interpretation

1) Behavior of power iteration depicted geometrically:

2) Initial vector x0 = v1 + v2 contains equal components in eigenvectors


v1 and v2 (dashed arrows).
3) Repeated multiplication by A causes component in v1 (corresponding
to larger eigenvalue, 2) to dominate, so sequence of vectors xk
converges to v1 .

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 58 / 108
Power Iteration with Shift (to Accelerate Convergence)
1) Convergence rate of power iteration depends on ratio |λ2 /λ1 |, where
λ2 is eigenvalue having second largest modulus.
2) It may be possible to choose a shift σ, and work with A − σI, such that

       |λ2 − σ| / |λ1 − σ| < |λ2| / |λ1|,    (68)

   so convergence is accelerated.

so convergence is accelerated.
3) Shift must then be added to the result to obtain the eigenvalue of the
original matrix.
4) In the earlier example, for instance, if we pick the shift of σ = 1
(which is equal to another eigenvalue), then the ratio becomes zero,
and the method converges in one iteration.
5) In general, we would not be able to make such a fortuitous choice,
but shifts can still be extremely useful in some contexts, as we will see
later.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 59 / 108
Inverse Iteration (λmin )
1) If the smallest eigenvalue of the matrix is required rather than the largest, we can make use of the fact that the eigenvalues of A⁻¹ are the reciprocals of those of A, so the smallest eigenvalue of A is the reciprocal of the largest eigenvalue of A⁻¹.
2) This leads to inverse iteration scheme

Ayk = xk−1 , (69)


xk = yk /kyk k∞ , (70)

which is equivalent to power iteration applied to A−1 .


3) Inverse of A not computed explicitly, but factorization of A used to
solve a system of linear equations at each iteration.
4) Inverse iteration converges to eigenvector corresponding to smallest
eigenvalue of A.
5) Eigenvalue obtained is dominant eigenvalue of A−1 , and hence its
reciprocal is smallest eigenvalue of A in modulus.
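A sketch of the scheme (69)-(70); lu_factor/lu_solve from SciPy factor A (or a shifted A, anticipating the next slides) once and reuse the factorization at every step:

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def inverse_iteration(A, x0, sigma=0.0, iters=8):
        """Inverse iteration on A - sigma*I, Eqs. (69)-(70)."""
        lu = lu_factor(A - sigma * np.eye(A.shape[0]))   # factor once, reuse each iteration
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            y = lu_solve(lu, x)                          # solve (A - sigma I) y_k = x_{k-1}
            x = y / np.linalg.norm(y, np.inf)
        return x

    A = np.array([[1.5, 0.5], [0.5, 1.5]])
    print(inverse_iteration(A, [0.0, 1.0]))              # tends to the eigenvector [-1, 1] of Example 16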
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 60 / 108
Example 16 (Inverse Iteration)
Applying inverse iteration to the previous example to compute the smallest
eigenvalue yields sequence
k xTk kyk k∞
0 0.000 1.0
1 -0.333 1.0 0.750
2 -0.600 1.0 0.833
3 -0.778 1.0 0.900
4 -0.882 1.0 0.944
5 -0.939 1.0 0.971
6 -0.969 1.0 0.985

which is indeed converging to 1 (which is its own reciprocal in this case).


Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 61 / 108
Inverse Iteration with Shift (to Accelerate Convergence)

1) As before, shifting strategy, working with A − σI for some scalar σ,


can greatly improve convergence.

2) Inverse iteration is particularly useful for computing eigenvector


corresponding to approximate eigenvalue since it converges rapidly
when applied to shifted matrix A − λI, where λ is an approximate
eigenvalue.

3) Inverse iteration is also useful for computing the eigenvalue closest to


the given value β, since if β is used as shift, then desired eigenvalue
corresponds to the smallest eigenvalue of the shifted matrix.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 62 / 108
Rayleigh Quotient

1) Given approximate eigenvector x for real matrix A, determining best


estimate for corresponding eigenvalue λ can be considered as n × 1
linear least squares approximation problem

xλ ∼
= Ax. (71)

2) From normal equation xT xλ = xT Ax, least squares solution is given


by
xT Ax
λ= T . (72)
x x
3) This quantity, known as Rayleigh quotient, has many useful
properties.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 63 / 108
Example 17 (Rayleigh Quotient)
1) Rayleigh quotient can accelerate the convergence of iterative methods
such as power iteration since Rayleigh quotient xTk Axk /xTk xk gives a
better approximation to eigenvalue at iteration k than does basic
method alone.
2) For the previous example using power iteration, the value of the
Rayleigh quotient at each iteration is shown below.
k xTk kyk k∞ xTk Ax/xTk xk
0 0.000 1.0
1 -0.333 1.0 1.500 1.500
2 -0.600 1.0 1.667 1.800
3 -0.778 1.0 1.800 1.941
4 -0.882 1.0 1.889 1.985
5 -0.939 1.0 1.941 1.996
6 -0.969 1.0 1.970 1.999

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 64 / 108
Rayleigh Quotient Iteration

1) Given an approximate eigenvector, the Rayleigh quotient yields a


good estimate for the corresponding eigenvalue.
2) Conversely, inverse iteration converges rapidly to eigenvector if the
approximate eigenvalue is used as shift, with one iteration often
sufficing.
3) These two ideas combined in Rayleigh quotient iteration

σk = xTk Axk /xTk xk , (73a)


(A − σk I)yk+1 = xk , (73b)
xk+1 = yk+1 /kyk+1 k∞ , (73c)

starting from given nonzero vector x0 .
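A minimal sketch of (73a)-(73c) for a real symmetric A (names mine); the shift changes every step, so the matrix is refactored at each iteration, which is exactly the cost issue noted on the next slide:

    import numpy as np

    def rayleigh_quotient_iteration(A, x0, iters=3):
        """Rayleigh quotient iteration, Eqs. (73a)-(73c)."""
        x = np.asarray(x0, dtype=float)
        n = A.shape[0]
        sigma = 0.0
        for _ in range(iters):
            sigma = (x @ A @ x) / (x @ x)                      # Eq. (73a)
            try:
                y = np.linalg.solve(A - sigma * np.eye(n), x)  # Eq. (73b)
            except np.linalg.LinAlgError:
                break                                          # shift hit an eigenvalue exactly
            x = y / np.linalg.norm(y, np.inf)                  # Eq. (73c)
        return sigma, x

    A = np.array([[1.5, 0.5], [0.5, 1.5]])
    print(rayleigh_quotient_iteration(A, [0.807, 0.397]))      # converges to (2.0, [1, 1]) in a few steps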

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 65 / 108
Rayleigh Quotient Iteration (Cont’d)
4) Rayleigh quotient iteration is especially effective for symmetric
matrices and usually converges rapidly.
5) Using different shifts at each iteration means the matrix must be
refactored each time to solve the linear system. Hence, the cost per
iteration is high unless the matrix has a special form that makes
factorization easy.
6) The same idea also works for complex matrices, for which transpose is
replaced by conjugate transpose, so Rayleigh quotient becomes
xH Ax/xH x.
7) Using the same matrix as previous examples and randomly chosen
starting vector x0 , Rayleigh quotient iteration converges in two
iterations.
k xTk σk
0 0.807 0.397 1.896
1 0.924 1.000 1.998
2 1.000 1.000 2.000
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 66 / 108
Deflation

1) After eigenvalue λ1 and corresponding eigenvector x1 have been


computed, then additional eigenvalues λ2 , · · · , λn of A can be
computed by deflation, which effectively removes known eigenvalue.

2) Let H be any nonsingular matrix such that Hx1 = αe1 , scalar


multiple of the first column of identity matrix (Householder
transformation is a good choice for H).

3) Then similarity transformation determined by H transforms A into


form
       H A H⁻¹ = [λ1  bᵀ; 0  B],    (74)
where B is matrix of order n − 1 having eigenvalues λ2 , · · · , λn .

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 67 / 108
Deflation (Cont’d)

4) Thus, we can work with B to compute next eigenvalue λ2 .

5) Moreover, if y2 is eigenvector of B corresponding to λ2 , then


 
−1 α
x2 = H , (75)
y2
T
where α = λb2 −λ
y2
1
, is eigenvector corresponding to λ2 for original
matrix A, provided λ1 6= λ2 .

6) Process can be repeated to find additional eigenvalues and


eigenvectors.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 68 / 108
Deflation (Cont’d)

7) Alternative approach lets u1 be any vector such that uT1 x1 = λ1 .

8) Then A − x1 uT1 has eigenvalues 0, λ2 , · · · , λn .

9) Possible choices for u1 include

   u1 = λ1 x1, if A is symmetric and x1 is normalized so that ‖x1‖₂ = 1;

   u1 = λ1 y1, where y1 is the corresponding left eigenvector (i.e., Aᵀ y1 = λ1 y1) normalized so that y1ᵀ x1 = 1;

   u1 = Aᵀ ek, if x1 is normalized so that ‖x1‖∞ = 1 and the kth component of x1 is 1.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 69 / 108
Jacobi Method
1) One of oldest methods for computing eigenvalues is Jacobi method,
which uses similarity transformation based on plane rotations.

2) Sequence of plane rotations chosen to annihilate symmetric pairs of


matrix entries, eventually converging to diagonal form.

3) Choice of plane rotation is slightly more complicated than in Givens’


method for QR factorization.

4) To annihilate a given off-diagonal pair, choose c and s so that

       Jᵀ A J = [c  −s; s  c] [a  b; b  d] [c  s; −s  c]    (76)
              = [c²a − 2csb + s²d,   c²b + cs(a − d) − s²b;
                 c²b + cs(a − d) − s²b,   c²d + 2csb + s²a]    (77)

   is diagonal.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 71 / 108
Jacobi Method (Cont’d)

5) The transformed matrix is diagonal if

       c²b + cs(a − d) − s²b = 0.    (78)

6) Dividing both sides by c²b, we obtain

       1 + (s/c)(a − d)/b − s²/c² = 0.    (79)

7) Making the substitution t = s/c, we get the quadratic equation

       1 + t(a − d)/b − t² = 0    (80)

   for the tangent t of the angle of rotation, from which we can recover c = 1/√(1 + t²) and s = c·t.
8) Advantageous numerically to use the root of smaller magnitude.
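A minimal sketch of one cyclic Jacobi sweep for a real symmetric matrix, with c and s obtained from the quadratic (80) using the smaller-magnitude root (names mine; a complete solver repeats sweeps until the off-diagonal mass falls below a tolerance):

    import numpy as np

    def jacobi_cs(a, b, d):
        """(c, s) annihilating the off-diagonal entry b of [[a, b], [b, d]]."""
        if b == 0.0:
            return 1.0, 0.0
        q = (a - d) / (2.0 * b)
        # Eq. (80) is t^2 - 2*q*t - 1 = 0; take the root of smaller magnitude.
        t = 1.0 if q == 0.0 else -np.sign(q) / (abs(q) + np.sqrt(1.0 + q * q))
        c = 1.0 / np.sqrt(1.0 + t * t)
        return c, c * t

    def jacobi_sweep(A):
        """One cyclic sweep A <- J^T A J over all off-diagonal pairs."""
        A = A.astype(float).copy()
        n = A.shape[0]
        for i in range(n):
            for k in range(i + 1, n):
                c, s = jacobi_cs(A[i, i], A[i, k], A[k, k])
                J = np.eye(n)
                J[i, i] = J[k, k] = c
                J[i, k], J[k, i] = s, -s
                A = J.T @ A @ J          # (wasteful: only rows/columns i and k change)
        return A

A few sweeps applied to the matrix A0 of Example 19 below already drive the off-diagonal entries close to zero.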

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 72 / 108
Example 18 (Plane Rotation)
Consider the 2 × 2 matrix

    A = [1  2; 2  1].

The quadratic equation for the tangent reduces to t² = 1, so t = ±1.

The two roots have the same magnitude, so we arbitrarily choose t = −1, which yields c = 1/√2 and s = −1/√2.

The resulting plane rotation J gives

    Jᵀ A J = [1/√2  1/√2; −1/√2  1/√2] [1  2; 2  1] [1/√2  −1/√2; 1/√2  1/√2] = [3  0; 0  −1].

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 73 / 108
Jacobi Method (Cont’d)

9) Starting with symmetric matrix A0 = A, each iteration has form

Ak+1 = JkT Ak Jk , (81)

where Jk is plane rotation chosen to annihilate a symmetric pair of


entries in Ak .

10) Plane rotations are repeatedly applied from both sides in systematic
sweeps through the matrix until the off-diagonal mass of the matrix is
reduced to within some tolerance of zero.

11) Resulting diagonal matrix orthogonally similar to the original matrix,


so diagonal entries are eigenvalues, and the product of plane rotations
gives eigenvectors.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 74 / 108
Jacobi Method (Cont’d)

12) Jacobi method is reliable, simple to program, and capable of high


accuracy, but converges rather slowly and is difficult to generalize
beyond symmetric matrices.

13) Except for small problems, more modern methods usually require 5 to
10 times less work than Jacobi.

14) One source of inefficiency is that previously annihilated entries can


subsequently become nonzero again, thereby requiring repeated
annihilation.

15) Newer methods, such as QR iteration, preserve zero entries


introduced into the matrix.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 75 / 108
Example 19 (Jacobi Method)

1) Let A0 = [1 0 2; 0 2 1; 2 1 1].

2) First annihilate (1,3) and (3,1) entries using rotation


 
0.707 0 −0.707
J0 =  0 1 0 
0.707 0 0.707

to obtain  
3 0.707 0
A1 = J0T A0 J0 = 0.707 2 0.707 .
0 0.707 −1

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 76 / 108
Example 19 (Jacobi Method (Cont’d))
3) Next, annihilate (1,2) and (2,1) entries using rotation
 
0.888 −0.460 0
J1 = 0.460 0.888 0
0 0 1

to obtain  
3.366 0 0.325
A2 = J1T A1 J1 = 0 1.634 0.628 .
0.325 0.628 −1

4) Next, annihilate (2,3) and (3,2) entries using rotation


 
1 0 0
J2 = 0 0.975 −0.221
0 0.221 0.975

to obtain  
3.366 0.072 0.317
A3 = J2T A2 J2 = 0.072 1.776 0 .
0.317 0 −1.142

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 77 / 108
Example 19 (Jacobi Method (Cont’d))
5) Beginning a new sweep, again annihilate (1,3) and (3,1) entries using rotation
 
0.998 0 −0.070
J3 =  0 1 0 
0.070 0 0.998

to obtain  
3.388 0.072 0
A4 = J3T A3 J3 = 0.072 1.776 −0.005 .
0 −0.005 −1.164

6) Process continues until off-diagonal entries are reduced to as small as desired.

7) Result is a diagonal matrix orthogonally similar to the original matrix, with the orthogonal
similarity transformation given by the product of plane rotations.


Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 78 / 108
Simultaneous Iteration

1) Simplest method for computing many eigenvalue-eigenvector pairs is


simultaneous iteration, which repeatedly multiplies matrix times
matrix of initial starting vectors.

2) Starting from n × p matrix X0 of rank p, iteration scheme is

Xk = AXk−1 . (82)

3) span(Xk ) converges to invariant subspace determined by p largest


eigenvalues of A, provided |λp | > |λp+1 |.

4) Also called subspace iteration.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 80 / 108
Orthogonal Iteration
1) As with power iteration, normalization is needed with simultaneous
iteration.

2) Each column of Xk converges to dominant eigenvector, so columns


of Xk become increasingly ill-conditioned basis for span(Xk ).

3) Both issues can be addressed by computing QR factorization at each


iteration

Q̂k Rk = Xk−1 (83a)


Xk = AQ̂k (83b)

where Q̂k Rk is reduced QR factorization of Xk−1 .


4) This orthogonal iteration converges to block triangular form, and the
leading block is triangular if moduli of consecutive eigenvalues are
distinct.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 81 / 108
QR Iteration (proposed by Heinz Rutishauser in 1958 and refined by J. F. G.
Francis of Ferranti Ltd., London in 1961-1962)

1) For p = n and X0 = I, matrices


       Ak = Q̂kᴴ A Q̂k    (84)
generated by orthogonal iteration converge to triangular or block
triangular form, yielding all eigenvalues of A.

2) QR iteration computes successive matrices Ak without forming above


product explicitly.

3) Starting with A0 = A, at iteration k compute QR factorization


Qk Rk = Ak−1 (85)
and form reverse product
Ak = Rk Qk . (86)
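A minimal sketch of the unshifted iteration (85)-(86) (names mine; a practical code would monitor the subdiagonal entries and use shifts, as discussed below):

    import numpy as np

    def qr_iteration(A, iters=10):
        """Unshifted QR iteration, Eqs. (85)-(86)."""
        Ak = A.astype(float).copy()
        Qprod = np.eye(A.shape[0])
        for _ in range(iters):
            Q, R = np.linalg.qr(Ak)   # A_{k-1} = Q_k R_k
            Ak = R @ Q                # A_k = R_k Q_k
            Qprod = Qprod @ Q         # product of the Q_k: eigenvector estimates
        return Ak, Qprod

    A = np.array([[7.0, 2.0], [2.0, 4.0]])
    Ak, V = qr_iteration(A)
    print(np.diag(Ak).round(4))       # approaches the eigenvalues 8 and 3 of Example 20 below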
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 82 / 108
QR Iteration (Cont’d)

4) Successive matrices Ak are unitarily similar to each other

       Ak = Rk Qk = Qkᴴ Ak−1 Qk.    (87)

5) Diagonal entries (or eigenvalues of diagonal blocks) of Ak converge


to eigenvalues of A.

6) Product of orthogonal matrices Qk converges to matrix of


corresponding eigenvectors.

7) If A is symmetric, then symmetry is preserved by QR iteration, so Ak


converge to the matrix that is both triangular and symmetric, hence
diagonal.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 83 / 108
Example 20 (QR Iteration)

1) Let A0 = [7  2; 2  4].

2) Compute the QR factorization

       A0 = Q1 R1 = [0.962  −0.275; 0.275  0.962] [7.28  3.02; 0  3.30],

   and form the reverse product

       A1 = R1 Q1 = [7.83  0.906; 0.906  3.17].

3) Off-diagonal entries are now smaller, and diagonal entries are closer
to eigenvalues, 8 and 3.

4) Process continues until the matrix is within a tolerance of being


diagonal, and diagonal entries then closely approximate eigenvalues.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 84 / 108
QR Iteration with Shifts (to Accelerate Convergence)

1) Convergence rate of QR iteration can be accelerated by incorporating


shifts

Qk Rk = Ak−1 − σk I,
Ak = Rk Qk + σk I,

where σk is a rough approximation to the eigenvalue.

2) Good shift can be determined by computing eigenvalues of 2 × 2


submatrix in the lower right corner of the matrix.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 85 / 108
Example 21 (QR Iteration with Shifts)
1) Repeat previous example, but with shift of σ1 = 4, which is lower
right corner entry of matrix.

2) We compute the QR factorization

       A0 − σ1 I = Q1 R1 = [0.832  0.555; 0.555  −0.832] [3.61  1.66; 0  1.11],

   and form the reverse product, adding back the shift, to obtain

       A1 = R1 Q1 + σ1 I = [7.92  0.615; 0.615  3.08].

3) After one iteration, the off-diagonal entries are smaller than with the unshifted algorithm, and the diagonal entries are closer approximations to the eigenvalues.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 86 / 108
Preliminary Reduction

1) Efficiency of QR iteration can be enhanced by first transforming the


matrix as close to triangular form as possible before beginning
iterations.

2) Hessenberg matrix is triangular except for one additional nonzero


diagonal immediately adjacent to the main diagonal.

3) Any matrix can be reduced to Hessenberg form in a finite number of


steps by an orthogonal similarity transformation, for example, using
Householder transformations.

4) Symmetric Hessenberg matrix is tridiagonal.

5) Hessenberg or tridiagonal form is preserved during successive QR


iterations.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 87 / 108
Preliminary Reduction (Cont’d)

Advantages of initial reduction to upper Hessenberg or tridiagonal form:


6) Work per QR iteration is reduced from O(n3 ) to O(n2 ) for general
matrix or O(n) for symmetric matrix.

7) Fewer QR iterations are required because the matrix is nearly


triangular (or diagonal) already.

8) If any zero entries on the first subdiagonal, then the matrix is block
triangular, and the problem can be broken into two or more smaller
subproblems.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 88 / 108
Preliminary Reduction (Cont’d)

9) QR iteration is implemented in two stages:


Case 1: symmetric matrix −→ tridiagonal −→ diagonal
or
Case 2: general matrix −→ Hessenberg −→ triangular

10) Preliminary reduction requires a definite number of steps, whereas the


subsequent iterative stage continues until convergence.

11) In practice, only a modest number of iterations are usually required,


so much of the work is in preliminary reduction.

12) Cost of accumulating eigenvectors, if needed, dominates total cost.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 89 / 108
Cost of QR Iteration

The approximate overall cost of preliminary reduction and QR iteration,


counting both additions and multiplications:
1) Symmetric matrices
   (4/3)n³ for eigenvalues only;
   9n³ for eigenvalues and eigenvectors.

2) General matrices
   10n³ for eigenvalues only;
   25n³ for eigenvalues and eigenvectors.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 90 / 108
Krylov Subspace Methods (Russian applied mathematician,
1863–1945)
1) Krylov subspace methods reduce the matrix to Hessenberg (or tridiagonal)
form using only matrix-vector multiplication, developed by Krylov in 1931.

2) For an arbitrary starting vector x0, if

       Kk = [x0   Ax0   · · ·   Aᵏ⁻¹x0] = [x0   x1   · · ·   xk−1],    (89)

   then, for k = n, we have

       Kn⁻¹ A Kn = Cn,    (90)

   where Cn = [e2, · · · , en, a], with a = Kn⁻¹ xn, is upper Hessenberg.

3) To obtain a better conditioned basis for span(Kn), compute the QR factorization

       Qn Rn = Kn,    (91)

   then, substituting (91) into (90) yields

       Qnᴴ A Qn = Rn Cn Rn⁻¹ ≡ H,    (92)

   with H upper Hessenberg.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 92 / 108
Krylov Subspace Methods (Cont’d)

4) Equating k th columns on each side of equation AQn = Qn H yields


recurrence

Aqk = h1k q1 + · · · + hkk qk + hk+1,k qk+1 , (93)

relating qk+1 to preceding vectors q1 , · · · , qk .

5) Premultiplying by qjH and using orthonormality,

qjH Aqk = hjk , j = 1, · · · , k. (94)

6) These relationships yield Arnoldi iteration, which produces unitary


matrix Qn and upper Hessenberg matrix Hn column by column using
only matrix-vector multiplication by A and inner products of vectors.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 93 / 108
Arnoldi Iteration (American engineer, 1917–1995)

1) The Arnoldi iteration algorithm was developed by Arnoldi in 1951.


1: x0 = arbitrary nonzero starting vector
2: q1 = x0 / ‖x0‖₂
3: for k = 1, 2, · · · do
4: uk = Aqk
5: for j = 1 to k do
6: hjk = qjH uk
7: uk = uk − hjk qj
8: end for
9: hk+1,k = ‖uk‖₂
10: if hk+1,k = 0 then
11: stop
12: end if
13: qk+1 = uk /hk+1,k
14: end for
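The listing above translates almost line for line into NumPy; a sketch for real A (names mine) that returns Qk and the k × k upper Hessenberg Hk, whose eigenvalues are the Ritz values discussed next:

    import numpy as np

    def arnoldi(A, x0, k):
        """k steps of Arnoldi iteration; returns Q (n x k) and Hessenberg H (k x k)."""
        n = A.shape[0]
        Q = np.zeros((n, k + 1))
        H = np.zeros((k + 1, k))
        Q[:, 0] = x0 / np.linalg.norm(x0)
        for j in range(k):
            u = A @ Q[:, j]
            for i in range(j + 1):               # orthogonalize against q_1, ..., q_{j+1}
                H[i, j] = Q[:, i] @ u
                u -= H[i, j] * Q[:, i]
            H[j + 1, j] = np.linalg.norm(u)
            if H[j + 1, j] == 0.0:               # breakdown: invariant subspace found
                return Q[:, :j + 1], H[:j + 1, :j + 1]
            Q[:, j + 1] = u / H[j + 1, j]
        return Q[:, :k], H[:k, :k]

np.linalg.eigvals(H) then gives the Ritz values; Q @ y gives the Ritz vector for an eigenvector y of H.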

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 94 / 108
Arnoldi Iteration (Cont’d)

2) If
       Qk = [q1  · · ·  qk],    (95)
   then
       Hk = Qkᴴ A Qk    (96)
   is an upper Hessenberg matrix.

3) Eigenvalues of Hk , called Ritz values, are approximate eigenvalues of


A, and Ritz vectors given by Qk y, where y is eigenvector of Hk , are
corresponding approximate eigenvectors of A.

4) Eigenvalues of Hk must be computed by another method, such as


QR iteration, but this is an easier problem if k ≪ n.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 95 / 108
Arnoldi Iteration (Cont’d)

5) Arnoldi iteration is fairly expensive in work and storage because each


new vector qk must be orthogonalized against all previous columns of
Qk , and all must be stored for that purpose.

6) The Arnoldi process is usually restarted periodically with a carefully


chosen starting vector.

7) Ritz values and vectors produced are often good approximations to


eigenvalues and eigenvectors of A after relatively few iterations.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 96 / 108
Lanczos Iteration (Hungary, 1893–1974, Einstein’s
assistant in Berlin in 1928)

1) Work and storage costs drop dramatically if the matrix is symmetric


or Hermitian since recurrence then has only three terms and Hk is
tridiagonal (so usually denoted Tk )
1: q0 = 0, β0 = 0
2: x0 = arbitrary nonzero starting vector
3: q1 = x0 / ‖x0‖₂
4: for k = 1, 2, · · · do
5: uk = Aqk
6: αk = qkH uk
7: uk = uk − βk−1 qk−1 − αk qk
8: βk = ‖uk‖₂
9: if βk = 0 then
10: stop
11: end if
12: qk+1 = uk /βk
13: end for
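A corresponding sketch for real symmetric A (names mine); the αk and βk fill the tridiagonal Tk whose eigenvalues are the Ritz values:

    import numpy as np

    def lanczos(A, x0, k):
        """k steps of Lanczos iteration; returns tridiagonal T_k and the basis Q."""
        n = A.shape[0]
        Q = np.zeros((n, k + 1))
        alpha = np.zeros(k)
        beta = np.zeros(k)
        Q[:, 0] = x0 / np.linalg.norm(x0)
        for j in range(k):
            u = A @ Q[:, j]
            alpha[j] = Q[:, j] @ u
            u -= alpha[j] * Q[:, j]
            if j > 0:
                u -= beta[j - 1] * Q[:, j - 1]
            beta[j] = np.linalg.norm(u)
            if beta[j] == 0.0:                   # invariant subspace: Ritz values are exact
                k = j + 1
                break
            Q[:, j + 1] = u / beta[j]
        T = np.diag(alpha[:k]) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
        return T, Q[:, :k]

In floating point the columns of Q gradually lose orthogonality, which is exactly the reorthogonalization issue discussed on the following slides.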
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 97 / 108
Lanczos Iteration (Cont’d)

2) αk and βk are diagonal and subdiagonal entries of symmetric


tridiagonal matrix Tk .

3) As with Arnoldi, Lanczos iteration does not produce eigenvalues and


eigenvectors directly, but only tridiagonal matrix Tk , whose
eigenvalues and eigenvectors must be computed by another method
to obtain Ritz values and vectors.

4) If βk = 0, then the algorithm appears to break down, but in that case,


invariant subspace has already been identified (i.e., Ritz values and
vectors are already exact at that point).

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 98 / 108
Lanczos Iteration (Cont’d)

5) In principle, if Lanczos algorithm were run until k = n, resulting


tridiagonal matrix would be orthogonally similar to A.

6) In practice, rounding error causes loss of orthogonality, invalidating


this expectation.

7) Problem can be overcome by reorthogonalizing vectors as needed, but


the expense can be substantial.

8) Alternatively, we can ignore the problem, in which case the algorithm


still produces good eigenvalue approximations, but multiple copies of
some eigenvalues may be generated.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 99 / 108
Lanczos Iteration (Cont’d)

9) Great virtue of Arnoldi and Lanczos iterations is their ability to


produce good approximations to extreme eigenvalues for k ≪ n.

10) Moreover, they require only one matrix-vector multiplication by A per


step and little auxiliary storage, so they are ideally suited to large
sparse matrices.

11) If eigenvalues are needed in the middle of the spectrum, say near σ,
then the algorithm can be applied to matrix (A − σI)−1 , assuming it
is practical to solve systems of form (A − σI)x = y.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 100 / 108
Example 22 (Lanczos Iteration)
For a 29 × 29 symmetric matrix with eigenvalues 1, · · · , 29, the behavior
of Lanczos iteration is shown below.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 101 / 108
Generalized Eigenvalue Problems

1) Generalized eigenvalue problem has form

Ax = λBx, (97)

where A and B are given n × n matrices.


2) If either A or B is nonsingular, then the generalized eigenvalue
problem can be converted to a standard eigenvalue problem, either

(B −1 A)x = λx or (A−1 B)x = (1/λ)x. (98)

3) This is not recommended since it may cause


loss of accuracy due to rounding error;
loss of symmetry if A and B are symmetric.
4) Better alternative for generalized eigenvalue problems is the QZ
algorithm.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 103 / 108
Generalized Eigenvalue Problems: QZ Algorithm
1) If A and B are triangular, then the eigenvalues are given by λi = aii / bii, for bii ≠ 0.
2) QZ algorithm reduces A and B simultaneously to upper triangular
form by orthogonal transformations.
3) First, B is reduced to upper triangular form by orthogonal
transformation from left, which is also applied to A.
4) Next, transformed A is reduced to upper Hessenberg form by
orthogonal transformation from left while maintaining the triangular
form of B, which requires additional transformations from the right.
5) Finally, analogous to QR iteration, A is reduced to triangular form
while still maintaining the triangular form of B, which again requires
transformations from both sides.
6) Eigenvalues can now be determined from mutually triangular form,
and eigenvectors can be recovered from products of left and right
transformations, denoted by Q and Z.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 104 / 108
Generalized Eigenvalue Problems: Matlab Application

In Matlab, the built-in function

[V , D] = eig(A, B, algorithm) (99)

solves the generalized eigenvalue problem AV = BV D, where B must


be the same size as A.
If algorithm is ‘chol’, eig uses the Cholesky factorization of B to compute the generalized eigenvalues.
The default for the algorithm depends on the properties of A and B,
but is generally ‘qz,’ which uses the QZ algorithm.
If A is Hermitian and B is Hermitian positive definite, then the
default for the algorithm is ‘chol.’
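For completeness, the analogous calls in Python's SciPy (a sketch; scipy.linalg.eig solves Ax = λBx when a second matrix is passed, and scipy.linalg.eigh handles the symmetric/Hermitian positive-definite case via a Cholesky-based reduction):

    import numpy as np
    from scipy.linalg import eig, eigh

    A = np.array([[2.0, 1.0], [1.0, 3.0]])
    B = np.array([[4.0, 0.0], [0.0, 1.0]])

    w, V = eig(A, B)                  # generalized eigenpairs of A x = lambda B x
    print(np.real_if_close(w))

    w_sym, V_sym = eigh(A, B)         # A symmetric, B symmetric positive definite
    print(w_sym)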

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 105 / 108
Extension: Computing SVD

1) Singular values of A are nonnegative square roots of eigenvalues of


AT A, and columns of U and V are orthonormal eigenvectors of
AAT and AT A, respectively.
2) Algorithms for computing SVD work directly with A, without forming
AAT or AT A, to avoid loss of information associated with forming
these matrix products explicitly.
3) SVD is usually computed by a variant of QR iteration, with A first
reduced to bidiagonal form by orthogonal transformations, then
remaining off-diagonal entries are annihilated iteratively. (See also
Sections II.3-II.4 of the book3 , where the computations of QR, SVD,
and CMR are addressed.)
4) SVD can also be computed by a variant of the Jacobi method, which
can be useful on parallel computers or if the matrix has a special
structure.
3
Gilbert Strang, Linear Algebra and Learning from Data, Wellesley-Cambridge University Press, 2019, pp. 138-155.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 106 / 108
Further Reading

Barry A. Cipra, “The Best of the 20th Century: Editors Name Top 10
Algorithms”, SIAM News, vol. 33, no. 4.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 107 / 108
Thank you very much for your attention!

Minghua Xia
Email: xiamingh@mail.sysu.edu.cn
Web: https://seit.sysu.edu.cn/teacher/XiaMinghua

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 108 / 108
