
Matrix Analysis and Applications

Chapter 8: QR and Interpolative Decomposition

Minghua Xia
(https://seit.sysu.edu.cn/teacher/XiaMinghua)

School of Electronics and Information Technology


Sun Yat-sen University

November 15, 2023

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 1 / 108
Recap of the Overview of the Course

I. Five basic mathematical problems:


1. Ax = b
2. Ax = λx
3. Av = σu
4. max ‖Ax‖₂ / ‖x‖₂
5. Factor the matrix A
II. Organization:
Chapter 1: Basic concepts
Chapters 2-4: Eigenvalue decomposition
Chapter 5: Singular value decomposition
Chapters 6-7: Linear systems (algebra vs. geometry)
Chapters 8-10: Matrix factorization and applications

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 2 / 108
Table of Contents
1 QR Decomposition
2 Gram-Schmidt Orthonormalization
3 Givens Rotations
4 Householder Transformations
5 Interpolative Decomposition: CZ and CMR Factorization
CZ Factorization
CMR Factorization
Three Bases for the Column Space
6 Computing Eigenvalues and Eigenvectors
Power Iteration
Jacobi Method
QR Algorithm
Krylov Subspace Methods
Generalized Eigenvalue Problems

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 3 / 108
QR Decomposition
Definition 1 (QR Decomposition)
Any A ∈ Cm×n can be decomposed as

A = QR, (1)

where Q ∈ Cm×m is unitary, and R ∈ Cm×n is upper triangular.

The QR decomposition is efficient to compute, a.k.a. the poor man's SVD.

It can be used to compute
LS solutions;
bases of R(A) and R(A)⊥;
decompositions such as the SVD.

Question 1
1) How do we know that QR decomposition exists?
2) How to compute the factors Q and R algorithmically?
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 5 / 108
An Illustrative Application: Solving LS Problem
Consider the LS problem for a full column-rank A ∈ Cm×n:

    min_{x ∈ Cn} ‖Ax − y‖₂².    (2)

Let

    A = QR = [Q1  Q2] [R1; 0] = Q1 R1,    (3)

where Q1 ∈ Cm×n, Q2 ∈ Cm×(m−n), and R1 ∈ Cn×n is upper triangular. Since

    ‖Ax − y‖₂² = ‖Qᴴ(Ax − y)‖₂²
               = ‖[Q1ᴴ(Ax − y); Q2ᴴ(Ax − y)]‖₂²    (4)
               = ‖R1 x − Q1ᴴ y‖₂² + ‖Q2ᴴ y‖₂²,    (5)

the LS problem is equivalent to solving the upper triangular system

    R1 x = Q1ᴴ y.    (6)

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 6 / 108
QR for Full Column-Rank Matrices

Theorem 2 (QR Decomposition)


Let A ∈ Cm×n, and suppose that A has full column rank. Then, A admits a decomposition

    A = QR = [Q1  Q2] [R1; 0] = Q1 R1,    (7)

where Q1 ∈ Cm×n is semi-unitary, and R1 ∈ Cn×n is upper triangular and nonsingular.

We call A = Q1 R1 a thin QR of A, and (Q1 , R1 ) a thin QR pair of A.

It follows directly that

Property 1
For the thin QR in Theorem 2, we have R(A) = span(Q1) and R(A)⊥ = span(Q2). Moreover, Q1 Q1ᴴ is the orthogonal projector onto R(A).

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 7 / 108
QR for Full Column-Rank Matrices (Cont’d)

Proof.
1) AᴴA is positive definite.

2) Let R1 be the (upper triangular and nonsingular) Cholesky factor of AᴴA, i.e.,

       AᴴA = R1ᴴ R1.

   Also, let Q1 = A R1⁻¹.

3) It can be verified that Q1ᴴ Q1 = I and Q1 R1 = A.
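The construction in this proof maps directly to code. Below is a minimal NumPy sketch (function name my own) that builds a thin QR pair from the Cholesky factor of AᴴA, assuming A has full column rank as in Theorem 2:

    import numpy as np

    def thin_qr_via_cholesky(A):
        """Thin QR pair (Q1, R1) following the proof of Theorem 2."""
        # R1 is the upper-triangular Cholesky factor of A^H A, i.e., A^H A = R1^H R1.
        R1 = np.linalg.cholesky(A.conj().T @ A).conj().T
        # Q1 = A R1^{-1}, obtained by solving the triangular system R1^H Q1^H = A^H.
        Q1 = np.linalg.solve(R1.conj().T, A.conj().T).conj().T
        return Q1, R1

    A = np.random.randn(6, 3) + 1j * np.random.randn(6, 3)
    Q1, R1 = thin_qr_via_cholesky(A)
    print(np.allclose(Q1 @ R1, A), np.allclose(Q1.conj().T @ Q1, np.eye(3)))

In practice this route is avoided for ill-conditioned A, since forming AᴴA squares the condition number (a point made again later in this chapter); Gram-Schmidt, Givens, or Householder QR are preferred.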

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 8 / 108
Application: A Geometrical Interpretation of Determinant
Question 2
When X ∈ Rm×n has linearly independent columns, explain why the volume of
the n-dimensional parallelepiped generated by the columns of X is
    Vn = (det(XᵀX))^{1/2}.    (8)

In particular, if X is square, then

Vn = |det (X)| . (9)

Figure 1: Geometric interpretation of determinant.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 9 / 108
Application: A Geometrical Interpretation of Determinant

Solution: Recalling the application ‘Determine the Volume of n-Dimensional


Parallelepiped’ in Chapter 7, we know that if X = QR is the QR factorization of
X, then the volume of the n-dimensional parallelepiped generated by the columns
of X is
Vn = v1 v2 · · · vn = det(R), (10)
where the vk ’s are the diagonal elements of the upper-triangular matrix R.

On the other hand, it is not hard to prove that

    det(XᵀX) = det(Rᵀ Qᵀ Q R) = (det(R))² = (v1 v2 · · · vn)² = Vn²,    (11)

which leads to (8). If X is square, we have

    det(XᵀX) = det(Xᵀ) det(X) = (det(X))² = (det(R))² = Vn²,    (12)

which implies (9).

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 10 / 108
Classical Gram-Schmidt for Thin QR

Question 3
Given a full column-rank A, how to compute its thin QR pair (Q1 , R1 )?

The proof of Theorem 2 already suggests a way to compute


(Q1 , R1 ), but there are more efficient ways.

Suppose that A = Q1 R1 exists. Then, the k th column of A = Q1 R1


equals
ak = r1k q1 + r2k q2 + · · · + rkk qk . (13)
Stage 1: since a1 = r11 q1 , we can obtain r11 and q1 via

r11 = ‖a1‖₂,   q1 = a1 / ‖a1‖₂.    (14)

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 12 / 108
Classical Gram-Schmidt for Thin QR (Cont’d)
Stage 2: since a2 = r12 q1 + r22 q2 and q1H q2 = 0, we can obtain r12 via

r12 = q1H a2 . (15)

Also, we can obtain r22 and q2 via a cancellation trick:

r22 = ‖y2‖₂,   q2 = y2 / ‖y2‖₂,    (16)

where y2 = a2 − r12 q1 .

Stage 3: we have a3 = r13 q1 + r23 q2 + r33 q3. Using the same trick as above
yields
r13 = q1H a3 , r23 = q2H a3 , (17)
and
q3 = y3 / ‖y3‖₂,   r33 = ‖y3‖₂,    (18)
where y3 = a3 − r13 q1 − r23 q2 .

The same idea applies to the subsequent stages.


Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 13 / 108
Classical Gram-Schmidt for Thin QR

Gram-Schmidt procedure: for k = 1, · · · , n, compute

    rik = qiᴴ ak,   i = 1, · · · , k − 1,    (19a)
    yk = ak − Σ_{i=1}^{k−1} rik qi,    (19b)
    qk = yk / ‖yk‖₂,    (19c)
    rkk = ‖yk‖₂.    (19d)

There are several variants of the Gram-Schmidt implementation (numerical stability is usually the main concern).
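As a concrete reference, here is a minimal NumPy sketch of the classical procedure (19a)-(19d) for a full column-rank A (function name my own):

    import numpy as np

    def classical_gram_schmidt(A):
        """Thin QR by classical Gram-Schmidt, Eqs. (19a)-(19d)."""
        m, n = A.shape
        Q = np.zeros((m, n), dtype=complex if np.iscomplexobj(A) else float)
        R = np.zeros((n, n), dtype=Q.dtype)
        for k in range(n):
            R[:k, k] = Q[:, :k].conj().T @ A[:, k]    # r_ik = q_i^H a_k
            y = A[:, k] - Q[:, :k] @ R[:k, k]         # y_k = a_k - sum_i r_ik q_i
            R[k, k] = np.linalg.norm(y)               # r_kk = ||y_k||_2
            Q[:, k] = y / R[k, k]                     # q_k = y_k / ||y_k||_2
        return Q, R

A quick check such as np.allclose(Q @ R, A) on a random tall matrix confirms the factorization; the loss of orthogonality mentioned on the next slide shows up in np.linalg.norm(Q.conj().T @ Q - np.eye(n)) when A is ill-conditioned.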

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 14 / 108
Classical Gram-Schmidt for Thin QR (Cont’d)

1) The classical Gram-Schmidt procedure often suffers from loss of orthogonality in finite precision.

2) Also, separate storage is required for A, Q, and R, since original ak


are needed in inner loop, so qk cannot overwrite columns of A.

3) Both deficiencies are improved by modified Gram-Schmidt procedure,


with each vector orthogonalized in turn against all subsequent
vectors, so qk can overwrite ak .

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 15 / 108
Modified Gram-Schmidt
Let riᵀ denote the ith row of R1 (with a slight abuse of notation). The following illustrates the modified Gram-Schmidt procedure, which is numerically more stable. It starts from the rank-one decomposition

    A = Q1 R1 = Σ_{i=1}^{n} qi riᵀ.    (Rank-one decomposition) (20)

Suppose that q1, · · · , qk−1 and r1, · · · , rk−1 are determined. By observing

    A − Σ_{i=1}^{k−1} qi riᵀ = Σ_{i=k}^{n} qi riᵀ = [0, · · · , 0, rkk qk, · · · ]    (with k − 1 leading zero columns),    (21)

we can obtain qk and rk, for k = 1, · · · , n, via

    yk = (A − Σ_{i=1}^{k−1} qi riᵀ) ek,    (22a)
    qk = yk / ‖yk‖₂,    (22b)
    rkᵀ = qkᵀ (A − Σ_{i=1}^{k−1} qi riᵀ).    (22c)
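For comparison, a minimal sketch of the modified procedure (22a)-(22c) in the real case, where qk overwrites the kth column of a working copy of A (names my own):

    import numpy as np

    def modified_gram_schmidt(A):
        """Thin QR by modified Gram-Schmidt, Eqs. (22a)-(22c), real case."""
        V = A.astype(float).copy()          # working copy; q_k overwrites column k
        m, n = V.shape
        R = np.zeros((n, n))
        for k in range(n):
            R[k, k] = np.linalg.norm(V[:, k])
            V[:, k] /= R[k, k]
            # Orthogonalize all subsequent columns against q_k immediately.
            R[k, k+1:] = V[:, k] @ V[:, k+1:]
            V[:, k+1:] -= np.outer(V[:, k], R[k, k+1:])
        return V, R

The only difference from the classical version is that each new qk is removed from the remaining columns right away, which is what restores numerical orthogonality.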

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 16 / 108
Givens Rotations
Consider yet another way for QR; again, assume the real-valued case.
Example 3
Given x ∈ R2, consider

    [y1; y2] = J [x1; x2] = [c  s; −s  c] [x1; x2] = [c x1 + s x2; −s x1 + c x2],    (23)

where c ≜ cos(θ) and s ≜ sin(θ), for some θ. It can be verified that

1) J is orthogonal.
2) y2 = 0 if θ = tan⁻¹(x2 / x1), or equivalently if

       c = x1 / √(x1² + x2²),   s = x2 / √(x1² + x2²).    (24)

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 18 / 108
Givens Rotations (Cont’d)

Givens rotations: J(i, k, θ) is the identity matrix with its (i, i), (i, k), (k, i), (k, k) entries replaced by c, s, −s, c, respectively,    (25)

where c ≜ cos(θ), s ≜ sin(θ).
1) J(i, k, θ) is orthogonal;
2) Let y = J(i, k, θ) x. We have

       yj = c xi + s xk,    if j = i;
       yj = −s xi + c xk,   if j = k;
       yj = xj,             if j ≠ i, k.    (26)

   Also, yk can be forced to zero by choosing θ = tan⁻¹(xk / xi).

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 19 / 108
Givens Rotations (Cont’d)

Example 4
Consider a 4 × 3 matrix A (× denotes a generally nonzero entry):

    A = [× × ×; × × ×; × × ×; × × ×]
      --J2,1--> [× × ×; 0 × ×; × × ×; × × ×]
      --J3,1--> [× × ×; 0 × ×; 0 × ×; × × ×]
      --J4,1--> [× × ×; 0 × ×; 0 × ×; 0 × ×]
      --J3,2--> [× × ×; 0 × ×; 0 0 ×; 0 × ×]
      --J4,2--> [× × ×; 0 × ×; 0 0 ×; 0 0 ×]
      --J4,3--> [× × ×; 0 × ×; 0 0 ×; 0 0 0] = R,

where B --J--> C means C = J B; Ji,k = J(i, k, θ), with θ chosen to zero out the (i, k)th entry of the matrix being transformed.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 20 / 108
Givens Rotations (Cont’d)

Givens QR: perform a sequence of Givens rotations to annihilate the lower triangular part of A, obtaining

    (Jn,n−1) · · · (Jn,2 · · · J3,2)(Jn,1 · · · J2,1) A = Qᵀ A = R,    (27)

where R is upper triangular, and Q is orthogonal (Qᵀ is the accumulated product of rotations).
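A minimal NumPy sketch of Givens QR for a real matrix, annihilating entries column by column as in Example 4 (function names my own):

    import numpy as np

    def givens(x1, x2):
        """Return (c, s) such that [c s; -s c] [x1; x2] = [r; 0], as in Eq. (24)."""
        r = np.hypot(x1, x2)
        return (1.0, 0.0) if r == 0.0 else (x1 / r, x2 / r)

    def givens_qr(A):
        """Full QR A = Q R by a sequence of Givens rotations (real case)."""
        m, n = A.shape
        R = A.astype(float).copy()
        Q = np.eye(m)
        for k in range(n):                   # column whose subdiagonal is annihilated
            for i in range(k + 1, m):        # zero out entry (i, k) against row k
                c, s = givens(R[k, k], R[i, k])
                G = np.array([[c, s], [-s, c]])
                R[[k, i], :] = G @ R[[k, i], :]
                Q[:, [k, i]] = Q[:, [k, i]] @ G.T   # accumulate Q, so that A = Q R
        return Q, R

Because each rotation touches only two rows, rotations whose target entry is already zero can simply be skipped, which is the advantage noted in Remark 1 below.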

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 21 / 108
Example 5 (Givens Rotation)
1) Let x = [4  3]ᵀ.

2) To annihilate the second entry, we compute the cosine and sine

       c = x1 / √(x1² + x2²) = 0.8   and   s = x2 / √(x1² + x2²) = 0.6.

3) The rotation is then given by

       J = [c  s; −s  c] = [0.8  0.6; −0.6  0.8].

4) Applying the rotation to x gives

       Jx = [0.8  0.6; −0.6  0.8] [4; 3] = [5; 0].

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 22 / 108
Characteristics of Givens QR Factorization

Remark 1
1) Straightforward implementation of the Givens method requires about
50% more work than the Householder method and requires more
storage since each rotation requires two numbers, c and s, to define
it.

2) These disadvantages can be overcome but require a more complicated implementation.

3) Givens can be advantageous for computing QR factorization when


many matrix entries are already zero since those annihilations can
then be skipped.


Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 23 / 108
Householder Transformations
Consider an alternative QR via the Householder transformations (to be
introduced). Assume the real-valued case for convenience.

Definition 6 (Reflection Matrix)


A matrix of the form
    H ≜ I − 2P,    (28)
where P ∈ Rm×m is an orthogonal projector, is called a reflection matrix.

By letting a = Pa + P⊥a, with P⊥ = I − P, it is seen that

    Ha = (I − 2P)a = −Pa + (I − P)a = −Pa + P⊥a,    (29)

where a is 'reflected' (its component in R(P) is negated).
A reflection matrix is both orthogonal and symmetric:

    H = Hᵀ = H⁻¹.    (30)

Proof: HᵀH = (I − 2P)ᵀ(I − 2P) = I − 2Pᵀ − 2P + 4PᵀP = I, since Pᵀ = P and PᵀP = P.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 25 / 108
Householder Transformations (Cont’d)
Question 4
Given a ∈ Rm , find a reflection matrix H ∈ Rm×m such that

Ha = βe1 , for some β ∈ R. (31)

Definition 7 (Householder Transformations)


Let
    H ≜ I − (2 / ‖v‖₂²) v vᵀ,    (32)
for some v ∈ Rm (compared with (28), P = v vᵀ / ‖v‖₂² is indeed the orthogonal projector onto R(v)). It can be verified that if

    v = a + sign(a1) ‖a‖₂ e1,    (33)

where sign(a1) is introduced to avoid cancellation, then we obtain

    Ha = −sign(a1) ‖a‖₂ e1.    (34)


Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 26 / 108
Householder Transformations (Cont’d)

Figure 2: Geometric interpretation of Householder transformation as reflection.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 27 / 108
Example 8 (Householder Transformation)
1) If a = [2  1  2]ᵀ, then we take

       v = a + sign(a1) ‖a‖₂ e1 = [2; 1; 2] + ‖a‖₂ [1; 0; 0] = [5; 1; 2].

2) To confirm that the transformation works,

       Ha = a − 2 (vᵀa / ‖v‖₂²) v = [2; 1; 2] − 2 × (15/30) × [5; 1; 2] = [−3; 0; 0] = −sign(a1) ‖a‖₂ e1.


Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 28 / 108
Householder QR Factorization

Stage 1: let H1 ∈ Rm×m be the Householder transformation w.r.t. a1, i.e., the 1st column of A. Then,

    A⁽¹⁾ = H1 A = [× × ··· ×; 0 × ··· ×; ⋮ ⋮ ⋱ ⋮; 0 × ··· ×].    (35)

Stage 2: let H̃2 ∈ R(m−1)×(m−1) be the Householder transformation w.r.t. the 1st column of A⁽¹⁾_{2:m, 2:n}. Then,

    A⁽²⁾ = [1  0; 0  H̃2] A⁽¹⁾ = H2 H1 A = [× × × ··· ×; 0 × × ··· ×; 0 0 × ··· ×; ⋮ ⋮ ⋮ ⋱ ⋮; 0 0 × ··· ×],    (36)

where H2 ≜ [1  0; 0  H̃2].

The same idea applies to stage k, 1 ≤ k ≤ n − 1.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 29 / 108
Householder QR Factorization (Cont’d)
Householder QR procedure: for k = 1, · · · , n − 1, compute

    A⁽ᵏ⁾ = Hk A⁽ᵏ⁻¹⁾,    (37)

where

    Hk = [Ik−1  0; 0  H̃k],    (38)

in which Ik−1 is the (k − 1) × (k − 1) identity matrix, and H̃k ∈ R(m−k+1)×(m−k+1) is the Householder transformation w.r.t. the 1st column of A⁽ᵏ⁻¹⁾_{k:m, k:n}.

The above procedure results in

    A⁽ⁿ⁻¹⁾ = Hn−1 · · · H2 H1 A = R.    (39)

Eq. (39) implies Q = H1ᵀ H2ᵀ · · · Hn−1ᵀ.
The above QR procedure already implies the existence of the full QR; also, A can be arbitrary rather than being of full column rank as in Gram-Schmidt.
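A compact NumPy sketch of the procedure (37)-(39) for a real matrix (function name my own; a production code would store the vectors vk rather than forming Q explicitly):

    import numpy as np

    def householder_qr(A):
        """Full QR A = Q R via Householder transformations, Eqs. (37)-(39)."""
        m, n = A.shape
        R = A.astype(float).copy()
        Q = np.eye(m)
        for k in range(min(n, m - 1)):
            a = R[k:, k]
            v = a.copy()
            v[0] += np.copysign(np.linalg.norm(a), a[0])   # v = a + sign(a1)||a||_2 e1, Eq. (33)
            nv = np.linalg.norm(v)
            if nv == 0.0:
                continue                                    # column already annihilated
            v /= nv
            R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])   # apply H_k = I - 2 v v^T
            Q[:, k:]  -= 2.0 * np.outer(Q[:, k:] @ v, v)    # accumulate Q = H_1^T H_2^T ...
        return Q, R

Running it on the 5 × 3 matrix of Example 9 below should reproduce the triangular factor obtained there, with Q R = A to machine precision.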

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 30 / 108
Example 9 (Householder QR Factorization)
1) For the polynomial data-fitting example given previously, with

       A = [1 −1.0 1.0; 1 −0.5 0.25; 1 0.0 0.0; 1 0.5 0.25; 1 1.0 1.0],   b = [1.0; 0.5; 0.0; 0.5; 2.0].

2) The Householder vector v1 for annihilating the subdiagonal entries of the first column of A is

       v1 = [1; 1; 1; 1; 1] + [2.236; 0; 0; 0; 0] = [3.236; 1; 1; 1; 1].

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 31 / 108
Example 9 (Householder QR Factorization (Cont’d))
3) Applying the resulting Householder transformation H1 yields the transformed matrix and right-hand side

       H1 A = [−2.236 0 −1.118; 0 −0.191 −0.405; 0 0.309 −0.655; 0 0.809 −0.405; 0 1.309 0.345],
       H1 b = [−1.789; −0.362; −0.862; −0.362; 1.138].

4) The Householder vector v2 for annihilating the subdiagonal entries of the second column of H1 A is

       v2 = [0; −0.191; 0.309; 0.809; 1.309] + [0; −1.581; 0; 0; 0] = [0; −1.772; 0.309; 0.809; 1.309].

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 32 / 108
Example 9 (Householder QR Factorization (Cont’d))
5) Applying the resulting Householder transformation H2 yields

       H2 H1 A = [−2.236 0 −1.118; 0 1.581 0; 0 0 −0.725; 0 0 −0.589; 0 0 0.047],
       H2 H1 b = [−1.789; 0.632; −1.035; 0.816; 0.404].

6) The Householder vector v3 for annihilating the subdiagonal entries of the third column of H2 H1 A is

       v3 = [0; 0; −0.725; −0.589; 0.047] + [0; 0; −0.935; 0; 0] = [0; 0; −1.660; −0.589; 0.047].

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 33 / 108
Example 9 (Householder QR Factorization (Cont’d))
7) Applying the resulting Householder transformation H3 yields

       H3 H2 H1 A = [−2.236 0 −1.118; 0 1.581 0; 0 0 0.935; 0 0 0; 0 0 0],
       H3 H2 H1 b = [−1.789; 0.632; 1.336; 0.026; 0.337].

8) Now solve the upper triangular system R x = c1 by back-substitution to obtain

       x = [0.086  0.400  1.429]ᵀ.


Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 34 / 108
Further Study: Hilbert Transform
Question 5
The Householder transform is able to change the amplitudes of the elements of a vector. But how can we change their phases?

Answer: the Hilbert transform,

    x̂(t) = H{x(t)} = (1/π) ∫_{−∞}^{∞} x(τ) / (t − τ) dτ,    (40)
    x(t) = H⁻¹{x̂(t)} = −(1/π) ∫_{−∞}^{∞} x̂(τ) / (t − τ) dτ.    (41)

Figure 3: The Hilbert transform in the time domain, which is, in essence, a π/2 phase shifter in the frequency domain.

    H(ω) = F{h(t)} = −j, if ω ≥ 0;  +j, if ω < 0.    (42)
An application in signal processing is to construct complex signals.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 35 / 108
Interpolative Decomposition

Definition 10 (Interpolative Decomposition)


Given A ∈ Rm×n , we have

A = CZ or A = CM R, (43)

where C (resp., R) are actual columns (resp. actual rows) of A – thoughtfully


chosen.

A = CZ takes less computing time and less storage than A = U ΣV T and


A = QR;
When A is sparse or nonnegative or both, so is C;
When A comes from discretizing a differential or integral equation, the
columns in C have more meanings than the orthogonal bases in U and Q;
When A is a matrix of data, the columns kept in C have simple
interpretations.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 37 / 108
Interpolative Decomposition (Cont’d)

Comparison with SVD and Nonnegative Matrix Decomposition (NMF):


SVD: The singular vectors are algebraically and geometrically perfect but
they are “humanly” hard to know;
NMF: When the entries in A are intrinsically positive, the goal may be an
NMF. Then, both factors in A ≈ M N have nonnegative entries. This is
important for moderate sizes, but it is asking a lot in the world of big data;
A = CZ is right for large matrices when Z has small entries (say |zij | ≤ 2).

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 38 / 108
CZ Factorization
1 When C contains a basis for the column space of A, the factor Z is
completely determined by
aj = Czj . (44)
2 When the columns of C are not very independent, we first look at the row
space of C:
ciᵀ = yiᵀ B,    (45)
where B is an r × r invertible matrix. (45) implies C = Y B. Then,
combining it with A = CZ yields
Am×n = Cm×r Zr×n = Ym×r Br×r Zr×n , (46)
where all matrices have rank r.
Remark 2 (On the values of zj and yiT )
By recalling the fact that the columns of C came directly from A and the rows of B came directly from C, for those columns and rows, the vectors zj in (44) and yiᵀ in (45) came directly from the identity matrix I.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 40 / 108
CZ Factorization (Cont’d)
3  According to Remark 2, if we suppose that zj and yiᵀ are the first r columns and rows in A and C, then A = Y B Z has the special form

    Y = [Ir; Cm−r B⁻¹],   B = r × r submatrix of A,   Z = [Ir   B⁻¹ Zn−r].    (47)

Example 11

    A = [1 2 4 2; 0 1 2 1; 1 3 6 3] = Y B Z,  with  Y = [1 0; 0 1; 1 1],  B = [1 2; 0 1],  Z = [1 0 0 0; 0 1 2 1],    (48)

    and hence A = C Z,  with  C = Y B = [1 2; 0 1; 1 3].    (49)

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 41 / 108
CZ Factorization (Cont’d)

Remark 3 (On the choice of B)

1) We are just supposing that the upper left r × r corner B of A is invertible.


But every matrix A of rank r has an invertible r × r submatrix B
somewhere, and even maybe many such Bs.
2) If Cm−r or Zn−r or both happen to contain large numbers, the factors in
(47) are bad choices. We are happiest if those other entries of Y and Z are
small. The best is |yij | ≤ 1 and |zij | ≤ 1. And we can find a submatrix B
that makes this true!

4 According to Remark 3, if we choose B as the r × r submatrix of A with


the largest determinant, then all entries of Y and Z have |yij | ≤ 1 and
|zij | ≤ 1. Accordingly, we have

A = Y Bmax Z (50)
= CZ. (51)

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 42 / 108
CZ Factorization (Cont’d)

Example 12

    A = [1 2 4 2; 0 1 2 1; 1 3 6 3] = Y Bmax Z,  with  Y = [1 0; 0 1; 1 1],  Bmax = [1 4; 0 2],  Z = [1 0 0 0; 0 0.5 1 0.5],    (52)

    and hence A = C Z,  with  C = Y Bmax = [1 4; 0 2; 1 6].    (53)

Note: In the applications of big data analysis, A is generally rank-deficient and


A = CZ holds for sure. However, in case A is of full column rank, it is obvious
that one must choose C = A such that A = CZ, where Z = I. If C is chosen
with a lower rank than A, then one can only get the approximation A ≈ CZ.
For more details on the approximation error, see the following Definition 13.
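A quick numerical check of Examples 11 and 12 (the matrices are copied from the examples above; this is only a verification sketch):

    import numpy as np

    A = np.array([[1., 2., 4., 2.],
                  [0., 1., 2., 1.],
                  [1., 3., 6., 3.]])

    # Example 11: B is the upper-left 2 x 2 corner of A.
    C1 = np.array([[1., 2.], [0., 1.], [1., 3.]])
    Z1 = np.array([[1., 0., 0., 0.], [0., 1., 2., 1.]])
    print(np.allclose(A, C1 @ Z1))     # True

    # Example 12: Bmax uses columns 1 and 3, which keeps |z_ij| <= 1.
    C2 = np.array([[1., 4.], [0., 2.], [1., 6.]])
    Z2 = np.array([[1., 0., 0., 0.], [0., 0.5, 1., 0.5]])
    print(np.allclose(A, C2 @ Z2))     # True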
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 43 / 108
CMR Factorization

Definition 13 (CMR Factorization)


Given A ∈ Rm×n , if rank(A) = r < min{m, n}, we have

A = CM R, (54)

where C (resp., R) are actual columns (resp. actual rows) of A – thoughtfully chosen,
and M is a mixing matrix. Otherwise, we have the approximation

A ≈ CM R, (55)

with

    ‖A − CMR‖ ≤ (‖Ur⁻¹‖ + ‖Vr⁻¹‖) σr+1.    (56)

The basic idea for implementing the CMR factorization is the discrete empirical
interpolation method (DEIM). Specifically, based on the SVD decomposition
A = U ΣV T , the DEIM procedure uses U and V to select the columns and rows of A
that form C and R, respectively. For more details, please refer to the work1 .
1
D. C. Sorensen and M. Embree, A DEIM Induced CUR Factorization, SIAM Journal on Scientific Computing, vol. 38,
no. 3, pp. A1454-A1482, 2016.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 45 / 108
Three Bases for the Column Space

Motivation: As the matrices get large, their rank is also large if there is random
noise. But the effective rank, when the noise is removed, may be considerably
smaller than m and n. So,
1) Don’t use AT A and AAT when you can operate directly on A;
2) Don’t assume that the original order of the rows and columns is necessarily
the best.
The first warning would apply to the least squares and SVD problems because by
forming AT A, we are squaring the condition number σ1 /σn . This measures the
sensitivity and vulnerability of A. Also, for really large matrices, the cost of
computing and storing AT A is just unthinkable.

Question 6
What do we really need from A?

Answer: We need a good basis for the column space! From that starting point,
we can do anything!

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 47 / 108
Three Bases for the Column Space (Cont’d)
1  Singular vectors u1, · · · , ur from the SVD, with σ1 ≥ σ2 ≥ · · · ≥ σr:

       A = Ur × (Σr Vrᵀ).    (57)

   The column basis u1, · · · , ur in U is orthonormal. It has the extra property that the rows σk vkᵀ of the second factor Σr Vrᵀ are also orthogonal.
2 Orthogonal vectors q1 , · · · , qr from Gram-Schmidt. Using column pivoting!

A = Qm×r × Rr×n . (58)

The column basis q1 , · · · , qr in Q is orthonormal. The rows of the second part R


are not orthogonal but they are well-conditioned (safely independent).
3 Independent columns c1 , · · · , cr taken directly from A after column
exchanges:
A = Cm×r × Zr×n . (59)
The column basis c1 , · · · , cr in C is not orthogonal. But both C and the second
factor Z can be well conditioned. This is achieved by allowing column exchanges.
(Not just allowing but insisting.) C contains r “good columns” from A.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 48 / 108
Three Bases for the Column Space (Cont’d)

Remark 4 (How to Compute the Three Bases?)


1) We start from A = QR with column pivoting, i.e.,
AP = QR, (60)
where P is a permutation matrix that rearranges the first k columns of AP (and thus Q)
to be the important columns.
2) Make a low-rank approximation to A by keeping the first r columns of Q and the first r rows of R:

       AP = QR ≈ Qr Rr   and   A ≈ Qr Rr Pᵀ,    (61)

   i.e., A = Qr Rr Pᵀ + E ≈ Qr Rr Pᵀ.
3) Find the SVD of the matrix Rr P T with only r rows: Rr P T = Ur ΣV T .
4) Multiply Qr times Ur to find U = Qr Ur such that
A = Qr Ur ΣV T + E = U ΣV T + E. (62)
5) With U and V, the CMR decomposition can be derived via the DEIM procedure (footnote 1).
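A sketch of steps 1-4 above using SciPy's column-pivoted QR (scipy.linalg.qr with pivoting=True returns the permutation as an index vector; the truncation rank r is the user's choice):

    import numpy as np
    from scipy.linalg import qr, svd

    def low_rank_svd_via_pivoted_qr(A, r):
        """Approximate rank-r SVD of A from a column-pivoted QR (Remark 4)."""
        Q, R, piv = qr(A, mode='economic', pivoting=True)   # A[:, piv] = Q R
        P = np.eye(A.shape[1])[:, piv]                       # permutation matrix: A P = Q R
        Qr, Rr = Q[:, :r], R[:r, :]                          # keep r columns of Q, r rows of R
        Ur, s, Vt = svd(Rr @ P.T, full_matrices=False)       # SVD of the r-row matrix Rr P^T
        return Qr @ Ur, s, Vt                                # A is approximately (Qr Ur) diag(s) V^T

From U = Qr Ur and V, the DEIM-based column/row selection of the CMR factorization can then proceed as in Definition 13.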

Further reading: Please read carefully Sections II.3-II.4 of the book2 .

2
Gilbert Strang, Linear Algebra and Learning from Data, Wellesley-Cambridge University Press, 2019, pp. 138-155.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 49 / 108
Power Iteration (λmax )

1) Simplest method for computing one eigenvalue-eigenvector pair is


power iteration, which repeatedly multiplies matrix times initial
starting vector.

2) Assume A has unique eigenvalue of maximum modulus, say λ1 , with


corresponding eigenvector v1 .

3) Then, starting from nonzero vector x0 , iteration scheme

xk = Axk−1 (63)

converges to multiple of eigenvector v1 corresponding to dominant


eigenvalue λ1 .

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 52 / 108
Convergence of Power Iteration
1) To see why power iteration converges to the dominant eigenvector, express the starting vector x0 as a linear combination

       x0 = Σ_{i=1}^{n} αi vi,    (64)

   where the vi are eigenvectors of A.

2) Then

       xk = A xk−1 = A² xk−2 = · · · = Aᵏ x0
          = Σ_{i=1}^{n} λiᵏ αi vi = λ1ᵏ (α1 v1 + Σ_{i=2}^{n} (λi/λ1)ᵏ αi vi)
          → λ1ᵏ α1 v1,   as k → ∞.    (65)
3) Since |λi /λ1 | < 1 for i > 1, successively higher powers go to zero,
leaving only component corresponding to v1 .
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 53 / 108
Example 14 (Power Iteration)
1) Ratio of values of given component of xk from one iteration to next
converges to the dominant eigenvalue λ1.

2) For example, if A = [1.5  0.5; 0.5  1.5] and x0 = [0; 1], we obtain
k xTk ratio
0 0.0 1.0
1 0.5 1.5 1.500
2 1.5 2.5 1.667
3 3.5 4.5 1.800
4 7.5 8.5 1.889
5 15.5 16.5 1.941
6 31.5 32.5 1.970
7 63.5 64.5 1.985
8 127.5 128.5 1.992
3) Ratio converges to the dominant eigenvalue, which is 2.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 54 / 108
Limitations of Power Iteration

Power iteration can fail for various reasons:


1) Starting vector may have no component in dominant eigenvector v1 ,
i.e., α1 = 0. Nevertheless, this problem does not occur in practice
because rounding error usually introduces such component in any
case.

2) There may be more than one eigenvalue having the same (maximum)
modulus, in which case iteration may converge to a linear
combination of corresponding eigenvectors.

3) For real matrix and starting vector, iteration can never converge to a
complex vector.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 55 / 108
Normalized Power Iteration

1) Geometric growth of components at each iteration risks eventual overflow (or underflow if |λ1| < 1).

2) Approximate eigenvector should be normalized at each iteration, say,


by requiring its largest component to be unity in modulus, giving
iteration scheme

       yk = A xk−1,    (66)
       xk = yk / ‖yk‖∞.    (67)

3) With normalization, ‖yk‖∞ → |λ1| and xk → v1 / ‖v1‖∞.
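A minimal sketch of the normalized scheme (66)-(67); the function name and iteration count are my own:

    import numpy as np

    def power_iteration(A, x0, iters=8):
        """Normalized power iteration, Eqs. (66)-(67)."""
        x = np.asarray(x0, dtype=float)
        lam = 0.0
        for _ in range(iters):
            y = A @ x
            lam = np.linalg.norm(y, np.inf)    # ||y_k||_inf -> |lambda_1|
            x = y / lam
        return lam, x

    A = np.array([[1.5, 0.5], [0.5, 1.5]])
    print(power_iteration(A, [0.0, 1.0]))      # lam approaches 2, x approaches [1, 1]

Eight iterations reproduce the values 1.992 and [0.992, 1.0] from Example 15 below.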

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 56 / 108
Example 15 (Normalized Power Iteration)
Repeating the previous example with the normalized scheme,
k xTk kyk k∞
0 0.000 1.0
1 0.333 1.0 1.500
2 0.600 1.0 1.667
3 0.778 1.0 1.800
4 0.882 1.0 1.889
5 0.939 1.0 1.941
6 0.969 1.0 1.970
7 0.984 1.0 1.985
8 0.992 1.0 1.992


Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 57 / 108
Geometric Interpretation

1) Behavior of power iteration depicted geometrically:

2) Initial vector x0 = v1 + v2 contains equal components in eigenvectors


v1 and v2 (dashed arrows).
3) Repeated multiplication by A causes component in v1 (corresponding
to larger eigenvalue, 2) to dominate, so sequence of vectors xk
converges to v1 .

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 58 / 108
Power Iteration with Shift (to Accelerate Convergence)
1) Convergence rate of power iteration depends on ratio |λ2 /λ1 |, where
λ2 is eigenvalue having second largest modulus.
2) It may be possible to choose a shift σ, and work with A − σI, such that

       |λ2 − σ| / |λ1 − σ| < |λ2| / |λ1|,    (68)

   so convergence is accelerated.

so convergence is accelerated.
3) Shift must then be added to the result to obtain the eigenvalue of the
original matrix.
4) In the earlier example, for instance, if we pick the shift of σ = 1
(which is equal to another eigenvalue), then the ratio becomes zero,
and the method converges in one iteration.
5) In general, we would not be able to make such a fortuitous choice,
but shifts can still be extremely useful in some contexts, as we will see
later.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 59 / 108
Inverse Iteration (λmin )
1) If the smallest eigenvalue of the matrix is required rather than the largest, we can make use of the fact that the eigenvalues of A⁻¹ are the reciprocals of those of A, so the smallest eigenvalue of A is the reciprocal of the largest eigenvalue of A⁻¹.
2) This leads to inverse iteration scheme

Ayk = xk−1 , (69)


xk = yk /kyk k∞ , (70)

which is equivalent to power iteration applied to A−1 .


3) Inverse of A not computed explicitly, but factorization of A used to
solve a system of linear equations at each iteration.
4) Inverse iteration converges to eigenvector corresponding to smallest
eigenvalue of A.
5) Eigenvalue obtained is dominant eigenvalue of A−1 , and hence its
reciprocal is smallest eigenvalue of A in modulus.
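A sketch of the scheme (69)-(70); lu_factor/lu_solve from SciPy factor A (or a shifted A, anticipating the next slides) once and reuse the factorization at every step:

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def inverse_iteration(A, x0, sigma=0.0, iters=8):
        """Inverse iteration on A - sigma*I, Eqs. (69)-(70)."""
        lu = lu_factor(A - sigma * np.eye(A.shape[0]))   # factor once, reuse each iteration
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            y = lu_solve(lu, x)                          # solve (A - sigma I) y_k = x_{k-1}
            x = y / np.linalg.norm(y, np.inf)
        return x

    A = np.array([[1.5, 0.5], [0.5, 1.5]])
    print(inverse_iteration(A, [0.0, 1.0]))              # tends to the eigenvector [-1, 1] of Example 16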
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 60 / 108
Example 16 (Inverse Iteration)
Applying inverse iteration to the previous example to compute the smallest
eigenvalue yields sequence
k xTk kyk k∞
0 0.000 1.0
1 -0.333 1.0 0.750
2 -0.600 1.0 0.833
3 -0.778 1.0 0.900
4 -0.882 1.0 0.944
5 -0.939 1.0 0.971
6 -0.969 1.0 0.985

which is indeed converging to 1 (which is its own reciprocal in this case).


Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 61 / 108
Inverse Iteration with Shift (to Accelerate Convergence)

1) As before, shifting strategy, working with A − σI for some scalar σ,


can greatly improve convergence.

2) Inverse iteration is particularly useful for computing eigenvector


corresponding to approximate eigenvalue since it converges rapidly
when applied to shifted matrix A − λI, where λ is an approximate
eigenvalue.

3) Inverse iteration is also useful for computing the eigenvalue closest to


the given value β, since if β is used as shift, then desired eigenvalue
corresponds to the smallest eigenvalue of the shifted matrix.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 62 / 108
Rayleigh Quotient

1) Given approximate eigenvector x for real matrix A, determining best


estimate for corresponding eigenvalue λ can be considered as n × 1
linear least squares approximation problem

xλ ∼
= Ax. (71)

2) From normal equation xT xλ = xT Ax, least squares solution is given


by
xT Ax
λ= T . (72)
x x
3) This quantity, known as Rayleigh quotient, has many useful
properties.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 63 / 108
Example 17 (Rayleigh Quotient)
1) Rayleigh quotient can accelerate the convergence of iterative methods
such as power iteration since Rayleigh quotient xTk Axk /xTk xk gives a
better approximation to eigenvalue at iteration k than does basic
method alone.
2) For the previous example using power iteration, the value of the
Rayleigh quotient at each iteration is shown below.
k xTk kyk k∞ xTk Ax/xTk xk
0 0.000 1.0
1 -0.333 1.0 1.500 1.500
2 -0.600 1.0 1.667 1.800
3 -0.778 1.0 1.800 1.941
4 -0.882 1.0 1.889 1.985
5 -0.939 1.0 1.941 1.996
6 -0.969 1.0 1.970 1.999

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 64 / 108
Rayleigh Quotient Iteration

1) Given an approximate eigenvector, the Rayleigh quotient yields a


good estimate for the corresponding eigenvalue.
2) Conversely, inverse iteration converges rapidly to eigenvector if the
approximate eigenvalue is used as shift, with one iteration often
sufficing.
3) These two ideas combined in Rayleigh quotient iteration

σk = xTk Axk /xTk xk , (73a)


(A − σk I)yk+1 = xk , (73b)
xk+1 = yk+1 /kyk+1 k∞ , (73c)

starting from given nonzero vector x0 .
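A minimal sketch of (73a)-(73c) for a real symmetric A (names mine); the shift changes every step, so the matrix is refactored at each iteration, which is exactly the cost issue noted on the next slide:

    import numpy as np

    def rayleigh_quotient_iteration(A, x0, iters=3):
        """Rayleigh quotient iteration, Eqs. (73a)-(73c)."""
        x = np.asarray(x0, dtype=float)
        n = A.shape[0]
        sigma = 0.0
        for _ in range(iters):
            sigma = (x @ A @ x) / (x @ x)                      # Eq. (73a)
            try:
                y = np.linalg.solve(A - sigma * np.eye(n), x)  # Eq. (73b)
            except np.linalg.LinAlgError:
                break                                          # shift hit an eigenvalue exactly
            x = y / np.linalg.norm(y, np.inf)                  # Eq. (73c)
        return sigma, x

    A = np.array([[1.5, 0.5], [0.5, 1.5]])
    print(rayleigh_quotient_iteration(A, [0.807, 0.397]))      # converges to (2.0, [1, 1]) in a few steps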

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 65 / 108
Rayleigh Quotient Iteration (Cont’d)
4) Rayleigh quotient iteration is especially effective for symmetric
matrices and usually converges rapidly.
5) Using different shifts at each iteration means the matrix must be
refactored each time to solve the linear system. Hence, the cost per
iteration is high unless the matrix has a special form that makes
factorization easy.
6) The same idea also works for complex matrices, for which transpose is
replaced by conjugate transpose, so Rayleigh quotient becomes
xH Ax/xH x.
7) Using the same matrix as previous examples and randomly chosen
starting vector x0 , Rayleigh quotient iteration converges in two
iterations.
k xTk σk
0 0.807 0.397 1.896
1 0.924 1.000 1.998
2 1.000 1.000 2.000
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 66 / 108
Deflation

1) After eigenvalue λ1 and corresponding eigenvector x1 have been


computed, then additional eigenvalues λ2 , · · · , λn of A can be
computed by deflation, which effectively removes known eigenvalue.

2) Let H be any nonsingular matrix such that Hx1 = αe1 , scalar


multiple of the first column of identity matrix (Householder
transformation is a good choice for H).

3) Then similarity transformation determined by H transforms A into


form
       H A H⁻¹ = [λ1  bᵀ; 0  B],    (74)
where B is matrix of order n − 1 having eigenvalues λ2 , · · · , λn .

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 67 / 108
Deflation (Cont’d)

4) Thus, we can work with B to compute next eigenvalue λ2 .

5) Moreover, if y2 is eigenvector of B corresponding to λ2 , then


 
−1 α
x2 = H , (75)
y2
T
where α = λb2 −λ
y2
1
, is eigenvector corresponding to λ2 for original
matrix A, provided λ1 6= λ2 .

6) Process can be repeated to find additional eigenvalues and


eigenvectors.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 68 / 108
Deflation (Cont’d)

7) Alternative approach lets u1 be any vector such that uT1 x1 = λ1 .

8) Then A − x1 uT1 has eigenvalues 0, λ2 , · · · , λn .

9) Possible choices for u1 include

   u1 = λ1 x1, if A is symmetric and x1 is normalized so that ‖x1‖₂ = 1;

   u1 = λ1 y1, where y1 is the corresponding left eigenvector (i.e., Aᵀ y1 = λ1 y1) normalized so that y1ᵀ x1 = 1;

   u1 = Aᵀ ek, if x1 is normalized so that ‖x1‖∞ = 1 and the kth component of x1 is 1.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 69 / 108
Jacobi Method
1) One of oldest methods for computing eigenvalues is Jacobi method,
which uses similarity transformation based on plane rotations.

2) Sequence of plane rotations chosen to annihilate symmetric pairs of


matrix entries, eventually converging to diagonal form.

3) Choice of plane rotation is slightly more complicated than in Givens’


method for QR factorization.

4) To annihilate a given off-diagonal pair, choose c and s so that

       Jᵀ A J = [c  −s; s  c] [a  b; b  d] [c  s; −s  c]    (76)
              = [c²a − 2csb + s²d,   c²b + cs(a − d) − s²b;
                 c²b + cs(a − d) − s²b,   c²d + 2csb + s²a]    (77)

   is diagonal.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 71 / 108
Jacobi Method (Cont’d)

5) The transformed matrix is diagonal if

       c²b + cs(a − d) − s²b = 0.    (78)

6) Dividing both sides by c²b, we obtain

       1 + (s/c)(a − d)/b − s²/c² = 0.    (79)

7) Making the substitution t = s/c, we get the quadratic equation

       1 + t(a − d)/b − t² = 0    (80)

   for the tangent t of the angle of rotation, from which we can recover c = 1/√(1 + t²) and s = c·t.
8) Advantageous numerically to use the root of smaller magnitude.
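A minimal sketch of one cyclic Jacobi sweep for a real symmetric matrix, with c and s obtained from the quadratic (80) using the smaller-magnitude root (names mine; a complete solver repeats sweeps until the off-diagonal mass falls below a tolerance):

    import numpy as np

    def jacobi_cs(a, b, d):
        """(c, s) annihilating the off-diagonal entry b of [[a, b], [b, d]]."""
        if b == 0.0:
            return 1.0, 0.0
        q = (a - d) / (2.0 * b)
        # Eq. (80) is t^2 - 2*q*t - 1 = 0; take the root of smaller magnitude.
        t = 1.0 if q == 0.0 else -np.sign(q) / (abs(q) + np.sqrt(1.0 + q * q))
        c = 1.0 / np.sqrt(1.0 + t * t)
        return c, c * t

    def jacobi_sweep(A):
        """One cyclic sweep A <- J^T A J over all off-diagonal pairs."""
        A = A.astype(float).copy()
        n = A.shape[0]
        for i in range(n):
            for k in range(i + 1, n):
                c, s = jacobi_cs(A[i, i], A[i, k], A[k, k])
                J = np.eye(n)
                J[i, i] = J[k, k] = c
                J[i, k], J[k, i] = s, -s
                A = J.T @ A @ J          # (wasteful: only rows/columns i and k change)
        return A

A few sweeps applied to the matrix A0 of Example 19 below already drive the off-diagonal entries close to zero.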

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 72 / 108
Example 18 (Plane Rotation)
Consider the 2 × 2 matrix

    A = [1  2; 2  1].

The quadratic equation for the tangent reduces to t² = 1, so t = ±1.

The two roots have the same magnitude, so we arbitrarily choose t = −1, which yields c = 1/√2 and s = −1/√2.

The resulting plane rotation J gives

    Jᵀ A J = [1/√2  1/√2; −1/√2  1/√2] [1  2; 2  1] [1/√2  −1/√2; 1/√2  1/√2] = [3  0; 0  −1].

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 73 / 108
Jacobi Method (Cont’d)

9) Starting with symmetric matrix A0 = A, each iteration has form

Ak+1 = JkT Ak Jk , (81)

where Jk is plane rotation chosen to annihilate a symmetric pair of


entries in Ak .

10) Plane rotations are repeatedly applied from both sides in systematic
sweeps through the matrix until the off-diagonal mass of the matrix is
reduced to within some tolerance of zero.

11) Resulting diagonal matrix orthogonally similar to the original matrix,


so diagonal entries are eigenvalues, and the product of plane rotations
gives eigenvectors.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 74 / 108
Jacobi Method (Cont’d)

12) Jacobi method is reliable, simple to program, and capable of high


accuracy, but converges rather slowly and is difficult to generalize
beyond symmetric matrices.

13) Except for small problems, more modern methods usually require 5 to
10 times less work than Jacobi.

14) One source of inefficiency is that previously annihilated entries can


subsequently become nonzero again, thereby requiring repeated
annihilation.

15) Newer methods, such as QR iteration, preserve zero entries


introduced into the matrix.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 75 / 108
Example 19 (Jacobi Method)

1) Let A0 = [1 0 2; 0 2 1; 2 1 1].

2) First annihilate (1,3) and (3,1) entries using rotation


 
0.707 0 −0.707
J0 =  0 1 0 
0.707 0 0.707

to obtain  
3 0.707 0
A1 = J0T A0 J0 = 0.707 2 0.707 .
0 0.707 −1

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 76 / 108
Example 19 (Jacobi Method (Cont’d))
3) Next, annihilate (1,2) and (2,1) entries using rotation
 
0.888 −0.460 0
J1 = 0.460 0.888 0
0 0 1

to obtain  
3.366 0 0.325
A2 = J1T A1 J1 = 0 1.634 0.628 .
0.325 0.628 −1

4) Next, annihilate (2,3) and (3,2) entries using rotation


 
1 0 0
J2 = 0 0.975 −0.221
0 0.221 0.975

to obtain  
3.366 0.072 0.317
A3 = J2T A2 J2 = 0.072 1.776 0 .
0.317 0 −1.142

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 77 / 108
Example 19 (Jacobi Method (Cont’d))
5) Beginning a new sweep, again annihilate (1,3) and (3,1) entries using rotation
 
0.998 0 −0.070
J3 =  0 1 0 
0.070 0 0.998

to obtain  
3.388 0.072 0
A4 = J3T A3 J3 = 0.072 1.776 −0.005 .
0 −0.005 −1.164

6) Process continues until off-diagonal entries are reduced to as small as desired.

7) Result is a diagonal matrix orthogonally similar to the original matrix, with the orthogonal
similarity transformation given by the product of plane rotations.


Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 78 / 108
Simultaneous Iteration

1) Simplest method for computing many eigenvalue-eigenvector pairs is


simultaneous iteration, which repeatedly multiplies matrix times
matrix of initial starting vectors.

2) Starting from n × p matrix X0 of rank p, iteration scheme is

Xk = AXk−1 . (82)

3) span(Xk ) converges to invariant subspace determined by p largest


eigenvalues of A, provided |λp | > |λp+1 |.

4) Also called subspace iteration.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 80 / 108
Orthogonal Iteration
1) As with power iteration, normalization is needed with simultaneous
iteration.

2) Each column of Xk converges to dominant eigenvector, so columns


of Xk become increasingly ill-conditioned basis for span(Xk ).

3) Both issues can be addressed by computing QR factorization at each


iteration

Q̂k Rk = Xk−1 (83a)


Xk = AQ̂k (83b)

where Q̂k Rk is reduced QR factorization of Xk−1 .


4) This orthogonal iteration converges to block triangular form, and the
leading block is triangular if moduli of consecutive eigenvalues are
distinct.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 81 / 108
QR Iteration (proposed by Heinz Rutishauser in 1958 and refined by J. F. G.
Francis of Ferranti Ltd., London in 1961-1962)

1) For p = n and X0 = I, matrices


       Ak = Q̂kᴴ A Q̂k    (84)
generated by orthogonal iteration converge to triangular or block
triangular form, yielding all eigenvalues of A.

2) QR iteration computes successive matrices Ak without forming above


product explicitly.

3) Starting with A0 = A, at iteration k compute QR factorization


Qk Rk = Ak−1 (85)
and form reverse product
Ak = Rk Qk . (86)
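A minimal sketch of the unshifted iteration (85)-(86) (names mine; a practical code would monitor the subdiagonal entries and use shifts, as discussed below):

    import numpy as np

    def qr_iteration(A, iters=10):
        """Unshifted QR iteration, Eqs. (85)-(86)."""
        Ak = A.astype(float).copy()
        Qprod = np.eye(A.shape[0])
        for _ in range(iters):
            Q, R = np.linalg.qr(Ak)   # A_{k-1} = Q_k R_k
            Ak = R @ Q                # A_k = R_k Q_k
            Qprod = Qprod @ Q         # product of the Q_k: eigenvector estimates
        return Ak, Qprod

    A = np.array([[7.0, 2.0], [2.0, 4.0]])
    Ak, V = qr_iteration(A)
    print(np.diag(Ak).round(4))       # approaches the eigenvalues 8 and 3 of Example 20 below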
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 82 / 108
QR Iteration (Cont’d)

4) Successive matrices Ak are unitarily similar to each other

       Ak = Rk Qk = Qkᴴ Ak−1 Qk.    (87)

5) Diagonal entries (or eigenvalues of diagonal blocks) of Ak converge


to eigenvalues of A.

6) Product of orthogonal matrices Qk converges to matrix of


corresponding eigenvectors.

7) If A is symmetric, then symmetry is preserved by QR iteration, so Ak


converge to the matrix that is both triangular and symmetric, hence
diagonal.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 83 / 108
Example 20 (QR Iteration)

1) Let A0 = [7  2; 2  4].

2) Compute the QR factorization

       A0 = Q1 R1 = [0.962  −0.275; 0.275  0.962] [7.28  3.02; 0  3.30],

   and form the reverse product

       A1 = R1 Q1 = [7.83  0.906; 0.906  3.17].

3) Off-diagonal entries are now smaller, and diagonal entries are closer
to eigenvalues, 8 and 3.

4) Process continues until the matrix is within a tolerance of being


diagonal, and diagonal entries then closely approximate eigenvalues.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 84 / 108
QR Iteration with Shifts (to Accelerate Convergence)

1) Convergence rate of QR iteration can be accelerated by incorporating


shifts

Qk Rk = Ak−1 − σk I,
Ak = Rk Qk + σk I,

where σk is a rough approximation to the eigenvalue.

2) Good shift can be determined by computing eigenvalues of 2 × 2


submatrix in the lower right corner of the matrix.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 85 / 108
Example 21 (QR Iteration with Shifts)
1) Repeat previous example, but with shift of σ1 = 4, which is lower
right corner entry of matrix.

2) We compute the QR factorization

       A0 − σ1 I = Q1 R1 = [0.832  0.555; 0.555  −0.832] [3.61  1.66; 0  1.11],

   and form the reverse product, adding back the shift, to obtain

       A1 = R1 Q1 + σ1 I = [7.92  0.615; 0.615  3.08].

3) After one iteration, the off-diagonal entries are smaller than with the unshifted algorithm, and the diagonal entries are closer approximations to the eigenvalues.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 86 / 108
Preliminary Reduction

1) Efficiency of QR iteration can be enhanced by first transforming the


matrix as close to triangular form as possible before beginning
iterations.

2) Hessenberg matrix is triangular except for one additional nonzero


diagonal immediately adjacent to the main diagonal.

3) Any matrix can be reduced to Hessenberg form in a finite number of


steps by an orthogonal similarity transformation, for example, using
Householder transformations.

4) Symmetric Hessenberg matrix is tridiagonal.

5) Hessenberg or tridiagonal form is preserved during successive QR


iterations.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 87 / 108
Preliminary Reduction (Cont’d)

Advantages of initial reduction to upper Hessenberg or tridiagonal form:


6) Work per QR iteration is reduced from O(n3 ) to O(n2 ) for general
matrix or O(n) for symmetric matrix.

7) Fewer QR iterations are required because the matrix is nearly


triangular (or diagonal) already.

8) If any zero entries on the first subdiagonal, then the matrix is block
triangular, and the problem can be broken into two or more smaller
subproblems.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 88 / 108
Preliminary Reduction (Cont’d)

9) QR iteration is implemented in two stages:


Case 1: symmetric matrix −→ tridiagonal −→ diagonal
or
Case 2: general matrix −→ Hessenberg −→ triangular

10) Preliminary reduction requires a definite number of steps, whereas the


subsequent iterative stage continues until convergence.

11) In practice, only a modest number of iterations are usually required,


so much of the work is in preliminary reduction.

12) Cost of accumulating eigenvectors, if needed, dominates total cost.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 89 / 108
Cost of QR Iteration

The approximate overall cost of preliminary reduction and QR iteration,


counting both additions and multiplications:
1) Symmetric matrices
   (4/3)n³ for eigenvalues only;
   9n³ for eigenvalues and eigenvectors.

2) General matrices
   10n³ for eigenvalues only;
   25n³ for eigenvalues and eigenvectors.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 90 / 108
Krylov Subspace Methods (Russian applied mathematician,
1863–1945)
1) Krylov subspace methods reduce the matrix to Hessenberg (or tridiagonal)
form using only matrix-vector multiplication, developed by Krylov in 1931.

2) For an arbitrary starting vector x0, if

       Kk = [x0   Ax0   · · ·   Aᵏ⁻¹x0] = [x0   x1   · · ·   xk−1],    (89)

   then, for k = n, we have

       Kn⁻¹ A Kn = Cn,    (90)

   where Cn = [e2, · · · , en, a], with a = Kn⁻¹ xn, is upper Hessenberg.

3) To obtain a better conditioned basis for span(Kn), compute the QR factorization

       Qn Rn = Kn,    (91)

   then, substituting (91) into (90) yields

       Qnᴴ A Qn = Rn Cn Rn⁻¹ ≡ H,    (92)

   with H upper Hessenberg.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 92 / 108
Krylov Subspace Methods (Cont’d)

4) Equating k th columns on each side of equation AQn = Qn H yields


recurrence

Aqk = h1k q1 + · · · + hkk qk + hk+1,k qk+1 , (93)

relating qk+1 to preceding vectors q1 , · · · , qk .

5) Premultiplying by qjH and using orthonormality,

qjH Aqk = hjk , j = 1, · · · , k. (94)

6) These relationships yield Arnoldi iteration, which produces unitary


matrix Qn and upper Hessenberg matrix Hn column by column using
only matrix-vector multiplication by A and inner products of vectors.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 93 / 108
Arnoldi Iteration (American engineer, 1917–1995)

1) The Arnoldi iteration algorithm was developed by Arnoldi in 1951.


1: x0 = arbitrary nonzero starting vector
2: q1 = x0 / ‖x0‖₂
3: for k = 1, 2, · · · do
4: uk = Aqk
5: for j = 1 to k do
6: hjk = qjH uk
7: uk = uk − hjk qj
8: end for
9: hk+1,k = ‖uk‖₂
10: if hk+1,k = 0 then
11: stop
12: end if
13: qk+1 = uk /hk+1,k
14: end for
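The listing above translates almost line for line into NumPy; a sketch for real A (names mine) that returns Qk and the k × k upper Hessenberg Hk, whose eigenvalues are the Ritz values discussed next:

    import numpy as np

    def arnoldi(A, x0, k):
        """k steps of Arnoldi iteration; returns Q (n x k) and Hessenberg H (k x k)."""
        n = A.shape[0]
        Q = np.zeros((n, k + 1))
        H = np.zeros((k + 1, k))
        Q[:, 0] = x0 / np.linalg.norm(x0)
        for j in range(k):
            u = A @ Q[:, j]
            for i in range(j + 1):               # orthogonalize against q_1, ..., q_{j+1}
                H[i, j] = Q[:, i] @ u
                u -= H[i, j] * Q[:, i]
            H[j + 1, j] = np.linalg.norm(u)
            if H[j + 1, j] == 0.0:               # breakdown: invariant subspace found
                return Q[:, :j + 1], H[:j + 1, :j + 1]
            Q[:, j + 1] = u / H[j + 1, j]
        return Q[:, :k], H[:k, :k]

np.linalg.eigvals(H) then gives the Ritz values; Q @ y gives the Ritz vector for an eigenvector y of H.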

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 94 / 108
Arnoldi Iteration (Cont’d)

2) If
       Qk = [q1  · · ·  qk],    (95)
   then
       Hk = Qkᴴ A Qk    (96)
   is an upper Hessenberg matrix.

3) Eigenvalues of Hk , called Ritz values, are approximate eigenvalues of


A, and Ritz vectors given by Qk y, where y is eigenvector of Hk , are
corresponding approximate eigenvectors of A.

4) Eigenvalues of Hk must be computed by another method, such as


QR iteration, but this is an easier problem if k ≪ n.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 95 / 108
Arnoldi Iteration (Cont’d)

5) Arnoldi iteration is fairly expensive in work and storage because each


new vector qk must be orthogonalized against all previous columns of
Qk , and all must be stored for that purpose.

6) The Arnoldi process is usually restarted periodically with a carefully


chosen starting vector.

7) Ritz values and vectors produced are often good approximations to


eigenvalues and eigenvectors of A after relatively few iterations.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 96 / 108
Lanczos Iteration (Hungary, 1893–1974, Einstein’s
assistant in Berlin in 1928)

1) Work and storage costs drop dramatically if the matrix is symmetric


or Hermitian since recurrence then has only three terms and Hk is
tridiagonal (so usually denoted Tk )
1: q0 = 0, β0 = 0
2: x0 = arbitrary nonzero starting vector
3: q1 = x0 / ‖x0‖₂
4: for k = 1, 2, · · · do
5: uk = Aqk
6: αk = qkH uk
7: uk = uk − βk−1 qk−1 − αk qk
8: βk = ‖uk‖₂
9: if βk = 0 then
10: stop
11: end if
12: qk+1 = uk /βk
13: end for
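A corresponding sketch for real symmetric A (names mine); the αk and βk fill the tridiagonal Tk whose eigenvalues are the Ritz values:

    import numpy as np

    def lanczos(A, x0, k):
        """k steps of Lanczos iteration; returns tridiagonal T_k and the basis Q."""
        n = A.shape[0]
        Q = np.zeros((n, k + 1))
        alpha = np.zeros(k)
        beta = np.zeros(k)
        Q[:, 0] = x0 / np.linalg.norm(x0)
        for j in range(k):
            u = A @ Q[:, j]
            alpha[j] = Q[:, j] @ u
            u -= alpha[j] * Q[:, j]
            if j > 0:
                u -= beta[j - 1] * Q[:, j - 1]
            beta[j] = np.linalg.norm(u)
            if beta[j] == 0.0:                   # invariant subspace: Ritz values are exact
                k = j + 1
                break
            Q[:, j + 1] = u / beta[j]
        T = np.diag(alpha[:k]) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
        return T, Q[:, :k]

In floating point the columns of Q gradually lose orthogonality, which is exactly the reorthogonalization issue discussed on the following slides.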
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 97 / 108
Lanczos Iteration (Cont’d)

2) αk and βk are diagonal and subdiagonal entries of symmetric


tridiagonal matrix Tk .

3) As with Arnoldi, Lanczos iteration does not produce eigenvalues and


eigenvectors directly, but only tridiagonal matrix Tk , whose
eigenvalues and eigenvectors must be computed by another method
to obtain Ritz values and vectors.

4) If βk = 0, then the algorithm appears to break down, but in that case,


invariant subspace has already been identified (i.e., Ritz values and
vectors are already exact at that point).

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 98 / 108
Lanczos Iteration (Cont’d)

5) In principle, if Lanczos algorithm were run until k = n, resulting


tridiagonal matrix would be orthogonally similar to A.

6) In practice, rounding error causes loss of orthogonality, invalidating


this expectation.

7) Problem can be overcome by reorthogonalizing vectors as needed, but


the expense can be substantial.

8) Alternatively, we can ignore the problem, in which case the algorithm


still produces good eigenvalue approximations, but multiple copies of
some eigenvalues may be generated.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 99 / 108
Lanczos Iteration (Cont’d)

9) Great virtue of Arnoldi and Lanczos iterations is their ability to


produce good approximations to extreme eigenvalues for k ≪ n.

10) Moreover, they require only one matrix-vector multiplication by A per


step and little auxiliary storage, so they are ideally suited to large
sparse matrices.

11) If eigenvalues are needed in the middle of the spectrum, say near σ,
then the algorithm can be applied to matrix (A − σI)−1 , assuming it
is practical to solve systems of form (A − σI)x = y.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 100 / 108
Example 22 (Lanczos Iteration)
For a 29 × 29 symmetric matrix with eigenvalues 1, · · · , 29, the behavior
of Lanczos iteration is shown below.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 101 / 108
Generalized Eigenvalue Problems

1) Generalized eigenvalue problem has form

Ax = λBx, (97)

where A and B are given n × n matrices.


2) If either A or B is nonsingular, then the generalized eigenvalue
problem can be converted to a standard eigenvalue problem, either

(B −1 A)x = λx or (A−1 B)x = (1/λ)x. (98)

3) This is not recommended since it may cause


loss of accuracy due to rounding error;
loss of symmetry if A and B are symmetric.
4) Better alternative for generalized eigenvalue problems is the QZ
algorithm.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 103 / 108
Generalized Eigenvalue Problems: QZ Algorithm
1) If A and B are triangular, then the eigenvalues are given by λi = aii / bii, for bii ≠ 0.
2) QZ algorithm reduces A and B simultaneously to upper triangular
form by orthogonal transformations.
3) First, B is reduced to upper triangular form by orthogonal
transformation from left, which is also applied to A.
4) Next, transformed A is reduced to upper Hessenberg form by
orthogonal transformation from left while maintaining the triangular
form of B, which requires additional transformations from the right.
5) Finally, analogous to QR iteration, A is reduced to triangular form
while still maintaining the triangular form of B, which again requires
transformations from both sides.
6) Eigenvalues can now be determined from mutually triangular form,
and eigenvectors can be recovered from products of left and right
transformations, denoted by Q and Z.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 104 / 108
Generalized Eigenvalue Problems: Matlab Application

In Matlab, the built-in function

[V , D] = eig(A, B, algorithm) (99)

solves the generalized eigenvalue problem AV = BV D, where B must


be the same size as A.
If algorithm is ‘chol’, eig uses the Cholesky factorization of B to compute the generalized eigenvalues.
The default for the algorithm depends on the properties of A and B,
but is generally ‘qz,’ which uses the QZ algorithm.
If A is Hermitian and B is Hermitian positive definite, then the
default for the algorithm is ‘chol.’
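For completeness, the analogous calls in Python's SciPy (a sketch; scipy.linalg.eig solves Ax = λBx when a second matrix is passed, and scipy.linalg.eigh handles the symmetric/Hermitian positive-definite case via a Cholesky-based reduction):

    import numpy as np
    from scipy.linalg import eig, eigh

    A = np.array([[2.0, 1.0], [1.0, 3.0]])
    B = np.array([[4.0, 0.0], [0.0, 1.0]])

    w, V = eig(A, B)                  # generalized eigenpairs of A x = lambda B x
    print(np.real_if_close(w))

    w_sym, V_sym = eigh(A, B)         # A symmetric, B symmetric positive definite
    print(w_sym)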

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 105 / 108
Extension: Computing SVD

1) Singular values of A are nonnegative square roots of eigenvalues of


AT A, and columns of U and V are orthonormal eigenvectors of
AAT and AT A, respectively.
2) Algorithms for computing SVD work directly with A, without forming
AAT or AT A, to avoid loss of information associated with forming
these matrix products explicitly.
3) SVD is usually computed by a variant of QR iteration, with A first
reduced to bidiagonal form by orthogonal transformations, then
remaining off-diagonal entries are annihilated iteratively. (See also
Sections II.3-II.4 of the book3 , where the computations of QR, SVD,
and CMR are addressed.)
4) SVD can also be computed by a variant of the Jacobi method, which
can be useful on parallel computers or if the matrix has a special
structure.
3
Gilbert Strang, Linear Algebra and Learning from Data, Wellesley-Cambridge University Press, 2019, pp. 138-155.
Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 106 / 108
Further Reading

Barry A. Cipra, “The Best of the 20th Century: Editors Name Top 10
Algorithms”, SIAM News, vol. 33, no. 4.

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 107 / 108
Thank you very much for your attention!

Minghua Xia
Email: xiamingh@mail.sysu.edu.cn
Web: https://seit.sysu.edu.cn/teacher/XiaMinghua

Minghua Xia (Sun Yat-sen University) Matrix Analysis and Applications November 15, 2023 108 / 108
