A Collection of Notes
on Numerical Linear Algebra
10 9 8 7 6 5 4 3 2 1
All rights reserved. No part of this book may be reproduced, stored, or transmitted in
any manner without the written permission of the publisher. For information, contact
any of the authors.
No warranties, express or implied, are made by the publisher, authors, and their em-
ployers that the programs contained in this volume are free of error. They should not
be relied on as the sole basis to solve a problem whose incorrect solution could result
in injury to person or property. If the programs are employed in such a manner, it
is at the user’s own risk and the publisher, authors, and their employers disclaim all
liability for such misuse.
Trademarked names may be used in this book without the inclusion of a trademark
symbol. These names are used in an editorial context only; no infringement of trade-
mark is intended.
Preface xi
0 Notes on Setting Up 1
0.1 Opening Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
0.1.1 Launch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
0.1.2 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
0.1.3 What You Will Learn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
0.2 Setting Up to Learn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
0.2.1 How to Navigate These Materials . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
0.2.2 Setting Up Your Computer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
0.3 MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
0.3.1 Why MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
0.3.2 Installing MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
0.3.3 MATLAB Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
0.4 Wrapup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
0.4.1 Additional Homework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
0.4.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
I Basics 9
1 Notes on Simple Vector and Matrix Operations 11
1.1 Opening Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1.1 Launch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1.2 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.1.3 What You Will Learn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3 (Hermitian) Transposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.1 Conjugating a complex scalar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.2 Conjugate of a vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.3 Conjugate of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.4 Transpose of a vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.5 Hermitian transpose of a vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.3.6 Transpose of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.3.7 Hermitian transpose (adjoint) of a matrix . . . . . . . . . . . . . . . . . . . . . . . 16
1.3.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4 Vector-vector Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4.1 Scaling a vector (scal) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4.2 Scaled vector addition (axpy) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.4.3 Dot (inner) product (dot) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5 Matrix-vector Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5.1 Matrix-vector multiplication (product) . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5.2 Rank-1 update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.6 Matrix-matrix multiplication (product) . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.6.1 Element-by-element computation . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.6.2 Via matrix-vector multiplications . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.6.3 Via row-vector times matrix multiplications . . . . . . . . . . . . . . . . . . . . . . 31
1.6.4 Via rank-1 updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.7 Enrichments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.7.1 The Basic Linear Algebra Subprograms (BLAS) . . . . . . . . . . . . . . . . . . . . 33
1.8 Wrapup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.8.1 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.8.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Answers 357
1. Notes on Simple Vector and Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
2. Notes on Vector and Matrix Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
3. Notes on Orthogonality and the SVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
4. Notes on Gram-Schmidt QR Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
6. Notes on Householder QR Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
7. Notes on Solving Linear Least-squares Problems (Answers) . . . . . . . . . . . . . . . . . . . . 395
8. Notes on the Condition of a Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
9. Notes on the Stability of an Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
10. Notes on Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
11. Notes on Gaussian Elimination and LU Factorization . . . . . . . . . . . . . . . . . . . . . . . 407
12. Notes on Cholesky Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416
13. Notes on Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
14. Notes on the Power Method and Related Methods . . . . . . . . . . . . . . . . . . . . . . . . . 426
16. Notes on the Symmetric QR Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
17. Notes on the Method of Relatively Robust Representations . . . . . . . . . . . . . . . . . . . . 431
18. Notes on Computing the SVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441
This document was created over the course of many years, as I periodically taught an introductory course
titled “Numerical Analysis: Linear Algebra,” cross-listed in the departments of Computer Science, Math,
and Statistics and Data Sciences (SDS), as well as the Computational Science, Engineering, and
Mathematics (CSEM) graduate program.
Over the years, my colleagues and I have used many different books for this course: Matrix Computations
by Golub and Van Loan [22], Fundamentals of Matrix Computation by Watkins [47], Numerical Linear
Algebra by Trefethen and Bau [38], and Applied Numerical Linear Algebra by Demmel [12]. All are books
with tremendous strengths and depth. Nonetheless, I found myself writing materials for my students that
add insights that are often drawn from our own research experiences in the field. These became a series of
notes that are meant to supplement rather than replace any of the mentioned books.
Fundamental to our exposition is the FLAME notation [25, 40], which we use to present algorithms hand-
in-hand with theory. For example, in Figure 1 (left), we present a commonly encountered LU factorization
algorithm, which we will see performs exactly the same computations as does Gaussian elimination. By
abstracting away from the detailed indexing that is required to implement an algorithm in, for example,
Matlab’s M-script language, the reader can focus on the mathematics that justifies the algorithm rather
than the indices used to express the algorithm. The algorithms can be easily translated into code with
the help of a FLAME Application Programming Interface (API). Such interfaces have been created for
C, M-script, and Python, to name a few [25, 6, 30]. In Figure 1 (right), we show the LU factorization
algorithm implemented with the FLAME@lab API for M-script. The C API is used extensively in our
implementation of the libflame dense linear algebra library [41, 42] for sequential and shared-memory
architectures and the notation also inspired the API used to implement the Elemental dense linear algebra
library [31] that targets distributed memory architectures. Thus, the reader is exposed to the abstractions
that experts use to translate algorithms to high-performance software.
These notes may at times appear to be a vanity project, since I often point the reader to our research papers
for a glimpse at the cutting edge of the field. The fact is that over the last two decades, we have helped
further the understanding of many of the topics discussed in a typical introductory course on numerical
linear algebra. Since we use the FLAME notation in many of our papers, they should be relatively easy to
read once one familiarizes oneself with these notes. Let’s be blunt: these notes do not do the field justice
when it comes to also giving a historical perspective. For that, we recommend any of the above-mentioned
texts, or the wonderful books by G.W. Stewart [36, 37]. This is yet another reason why they should be
used to supplement other texts.
Algorithm: A := L\U = LU(A)
  Partition A → ( ATL ATR ; ABL ABR )
    where ATL is 0 × 0
  while n(ATL) < n(A) do
    Repartition ( ATL ATR ; ABL ABR ) → ( A00 a01 A02 ; a10T α11 a12T ; A20 a21 A22 )
    a21 := a21/α11
    A22 := A22 − a21 a12T
    Continue with ( ATL ATR ; ABL ABR ) ← ( A00 a01 A02 ; a10T α11 a12T ; A20 a21 A22 )
  endwhile

function [ A_out ] = LU_unb_var4( A )
  [ ATL, ATR, ...
    ABL, ABR ] = FLA_Part_2x2( A, ...
                               0, 0, 'FLA_TL' );
  while ( size( ATL, 1 ) < size( A, 1 ) )
    [ A00, a01, A02, ...
      a10t, alpha11, a12t, ...
      A20, a21, A22 ] = ...
        FLA_Repart_2x2_to_3x3( ATL, ATR, ...
                               ABL, ABR, 1, 1, 'FLA_BR' );
    %------------------------------------------%
    a21 = a21 / alpha11;
    A22 = A22 - a21 * a12t;
    %------------------------------------------%
    [ ATL, ATR, ...
      ABL, ABR ] = ...
      FLA_Cont_with_3x3_to_2x2( A00, a01, A02, ...
                                a10t, alpha11, a12t, ...
                                A20, a21, A22, 'FLA_TL' );
  end
  A_out = [ ATL, ATR
            ABL, ABR ];
return

Figure 1: LU factorization represented with the FLAME notation (top) and implemented with the FLAME@lab API for M-script (bottom). Here ( ATL ATR ; ABL ABR ) denotes a matrix partitioned into quadrants.
The order of the chapters should not be taken too seriously. They were initially written as separate,
relatively self-contained notes. The sequence on which I settled roughly follows the order that the topics
are encountered in Numerical Linear Algebra by Trefethen and Bau [38]. The reason is the same as the
one they give: It introduces orthogonality and the Singular Value Decomposition early on, leading with
important material that many students will not yet have seen in the undergraduate linear algebra classes
they took. However, one could just as easily rearrange the chapters so that one starts with a more traditional
topic: solving dense linear systems.
The notes frequently refer the reader to another resource of ours titled Linear Algebra: Foundations to
Frontiers - Notes to LAFF With (LAFF Notes) [30]. This is a 900+ page document with more than 270
videos that was created for the Massive Open Online Course (MOOC) Linear Algebra: Foundations
to Frontiers (LAFF), offered by the edX platform. That course provides an appropriate undergraduate
background for these notes.
I (tried to) videotape my lectures during Fall 2014. Unlike the many short videos that we created for the
Massive Open Online Course (MOOC) titled “Linear Algebra: Foundations to Frontiers” that are now part
of the notes for that course, I simply set up a camera, taped the entire lecture, spent minimal time editing,
and uploaded the result for the world to see. Worse, I did not prepare particularly well for the lectures,
other than feverishly writing these notes in the days prior to the presentation. Sometimes, I forgot to turn
on the microphone and/or the camera. Sometimes the memory of the camcorder was full. Sometimes I
forgot to shave. Often I forgot to comb my hair. You are welcome to watch, but don’t expect too much!
One should consider this a living document. As time goes on, I will be adding more material. Ideally,
people with more expertise than I have on, for example, solving sparse linear systems will contribute
notes of their own.
Acknowledgments
These notes use notation and tools developed by the FLAME project at The University of Texas at Austin
(USA), Universidad Jaume I (Spain), and RWTH Aachen University (Germany). This project involves
a large and ever expanding number of very talented people, to whom I am indebted. Over the years, it
has been supported by a number of grants from the National Science Foundation and industry. The most
recent and most relevant funding came from NSF Award ACI-1148125 titled “SI2-SSI: A Linear Algebra
Software Infrastructure for Sustained Innovation in Computational Chemistry and other Sciences.”1
In Texas, behind every successful man there is a woman who really pulls the strings. For more than thirty
years, my research, teaching, and other pedagogical activities have been greatly influenced by my wife,
Dr. Maggie Myers. For parts of these notes that are particularly successful, the credit goes to her. Where
they fall short, the blame is all mine!
1 Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do
not necessarily reflect the views of the National Science Foundation (NSF).
Chapter 0
Notes on Setting Up
0.1.1 Launch
0.1.2 Outline
• Download the file LAFF-NLA.zip to a directory on your computer. (I chose to place it in my home
directory, rvdg.)
The above steps create a directory structure with various files as illustrated in Figure 1.
You will want to put this PDF in that directory in the indicated place! Opening it with Acrobat
Reader will ensure that hyperlinks work properly.
This is a work in progress. When instructed, you will want to load new versions of this PDF,
replacing the existing one. There will also be other files you will be asked to place in various
subdirectories.
0.3 MATLAB
Users
   rvdg
      LAFF-NLA
         LAFF-NLA.pdf ............. The PDF for the document you are reading should be placed here.
         Programming
            laff .................. Subdirectory in which a small library that we will use resides.
               matvec
               matmat
               util
               vecvec
            chapter20 answers ..... Subdirectory with answers for the coding assignments for Chapter 20.
         Videos
            Videos-Chapter00.zip .. Place this file here before unzipping (if you choose to download videos).
            Videos-Chapter00 ...... Subdirectory created when you unzip Videos-Chapter00.zip.
            ...
            Videos-Chapter20.zip .. Place this file here before unzipping (if you choose to download videos).
            Videos-Chapter20 ...... Subdirectory created when you unzip Videos-Chapter20.zip.
         Spark ................... Web tool with which you will translate algorithms into code using the FLAME@lab API.

Figure 1: Directory structure for each of the steps (illustrated for Step 1 in subdirectory step1).
these tutorials altogether, and come back to them if you find you want to know more about MATLAB and
its programming language (M-script).
What is MATLAB?
* YouTube   * Downloaded Video

MATLAB Variables
* YouTube   * Downloaded Video

MATLAB as a Calculator
* YouTube   * Downloaded Video

* MathWorks.com Video
0.4 Wrapup
0.4.2 Summary
You will see that we develop a lot of the theory behind the various topics in linear algebra via a sequence
of homework exercises. At the end of each week, we summarize theorems and insights for easy reference.
Chapter 1
Notes on Simple Vector and Matrix Operations
1.1.1 Launch
We assume that the reader is quite familiar with vectors, linear transformations, and matrices. If not, we
suggest the reader review the first five weeks of Linear Algebra: Foundations to Frontiers - Notes to
LAFF With (LAFF Notes) [30].
Since undergraduate courses tend to focus on real-valued matrices and vectors, as we review we mostly
focus on the case where they are complex-valued.
Read disclaimer regarding the videos in the preface!
The first video below is actually of the first lecture of the semester.
* YouTube
* Download from UT Box
* View After Local Download
1.1.2 Outline
1.2 Notation
Throughout our notes we will adopt notation popularized by Alston Householder. As a rule, we will use
lower case Greek letters (α, β, etc.) to denote scalars. For vectors, we will use lower case Roman letters
(a, b, etc.). Matrices are denoted by upper case Roman letters (A, B, etc.).
If x ∈ Cn and A ∈ Cm×n, then we expose the elements of x and A as

x = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}
\quad\mbox{and}\quad
A = \begin{pmatrix}
\alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,n-1} \\
\alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,n-1} \\
\vdots & \vdots & & \vdots \\
\alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,n-1}
\end{pmatrix}.
It is possible for a subvector to be of size zero (no elements) or one (a scalar). If we want to emphasize
that a specific subvector is a scalar, then we may choose to use a lower case Greek letter for that scalar, as
in
x = \begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix}.
We will see frequent examples where a matrix, A ∈ Cm×n , is partitioned into columns or rows:
A = \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}
  = \begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix}.
Here we add the hat (ˆ) to be able to distinguish between the identifiers for the columns and rows. When
it is obvious from context whether we refer to a row or a column, we may choose to omit the ˆ. Sometimes, we
partition matrices into submatrices:
A = \begin{pmatrix} A_0 & A_1 & \cdots & A_{N-1} \end{pmatrix}
  = \begin{pmatrix} \widehat A_0 \\ \widehat A_1 \\ \vdots \\ \widehat A_{M-1} \end{pmatrix}
  = \begin{pmatrix}
A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\
A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\
\vdots & \vdots & & \vdots \\
A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1}
\end{pmatrix}.
If α = αr + iαc, with αr, αc ∈ R, then the (complex) conjugate of α is ᾱ = αr − iαc.
Notice that transposing a (column) vector rearranges its elements to make a row vector.
1.3.8 Exercises
Homework 1.1 Partition A

A = \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}
  = \begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix}.
Convince yourself that the following hold:
• \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}^T
  = \begin{pmatrix} a_0^T \\ a_1^T \\ \vdots \\ a_{n-1}^T \end{pmatrix}.

• \begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix}^T
  = \begin{pmatrix} \widehat a_0 & \widehat a_1 & \cdots & \widehat a_{m-1} \end{pmatrix}.

• \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}^H
  = \begin{pmatrix} a_0^H \\ a_1^H \\ \vdots \\ a_{n-1}^H \end{pmatrix}.

• \begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix}^H
  = \begin{pmatrix} \overline{\widehat a_0} & \overline{\widehat a_1} & \cdots & \overline{\widehat a_{m-1}} \end{pmatrix}.
* SEE ANSWER
• x^H = \begin{pmatrix} x_0^H & x_1^H & \cdots & x_{N-1}^H \end{pmatrix}.
* SEE ANSWER
Homework 1.3 Partition A

A = \begin{pmatrix}
A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\
A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\
\vdots & \vdots & & \vdots \\
A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1}
\end{pmatrix},
If y := αx with
y = \begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{n-1} \end{pmatrix},
then the following loop computes y:
for i := 0, . . . , n − 1
ψi := αχi
endfor
• (αx)T = αxT .
• (αx)H = αxH .
• \alpha \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix}
  = \begin{pmatrix} \alpha x_0 \\ \alpha x_1 \\ \vdots \\ \alpha x_{N-1} \end{pmatrix}.
* SEE ANSWER
Cost
Scaling a vector of size n requires, approximately, n multiplications. Each of these becomes a floating
point operation (flop) when the computation is performed as part of an algorithm executed on a computer
that performs floating point computation. We will thus say that scaling a vector costs n flops.
It should be noted that arithmetic with complex numbers is roughly 4 times as expensive as is arithmetic
with real numbers. In the chapter “Notes on Performance” (page 211) we also discuss that the cost of
moving data impacts the cost of a flop. Thus, not all flops are created equal!
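The scal loop given earlier maps directly onto code. Below is an illustrative sketch in Python rather than the M-script used elsewhere in these notes; the function name scal follows the BLAS naming convention, but this particular routine is not part of the laff library.

```python
def scal(alpha, chi):
    """Return the vector with entries psi_i := alpha * chi_i, i = 0, ..., n-1.

    Mirrors the scal loop in the text: n multiplications, i.e. n flops.
    """
    psi = [0] * len(chi)
    for i in range(len(chi)):
        psi[i] = alpha * chi[i]
    return psi

# Works for real and complex scalars alike:
y = scal(2.0, [1.0, -0.5, 3.0])   # [2.0, -1.0, 6.0]
z = scal(1j, [1 + 1j, 2])         # [-1 + 1j, 2j]
```

Note that Python's built-in complex arithmetic makes the real and complex cases identical in code, even though, as discussed above, a complex flop is roughly four times as expensive as a real one.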
This operation is known as the axpy operation: alpha times x plus y. Typically, the vector y is overwritten
with the result:
y := \alpha x + y
  = \alpha \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}
  + \begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{n-1} \end{pmatrix}
  = \begin{pmatrix} \alpha\chi_0 + \psi_0 \\ \alpha\chi_1 + \psi_1 \\ \vdots \\ \alpha\chi_{n-1} + \psi_{n-1} \end{pmatrix}.
for i := 0, . . . , n − 1
ψi := αχi + ψi
endfor
Cost
The axpy with two vectors of size n requires, approximately, n multiplies and n additions or 2n flops.
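As with scal, the axpy loop translates directly into code. The following is an illustrative Python sketch (the name axpy is the BLAS one; the in-place update of y mirrors the loop ψi := αχi + ψi above):

```python
def axpy(alpha, x, y):
    """Overwrite y with alpha * x + y: n multiplies plus n adds, i.e. 2n flops."""
    assert len(x) == len(y)
    for i in range(len(x)):
        y[i] = alpha * x[i] + y[i]
    return y

y = axpy(2.0, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0])   # y becomes [3.0, 5.0, 7.0]
```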
We will sometimes refer to this operation as a “dot” product since it is like the dot product xH y, but without
conjugation. In the case of real valued vectors, it is the dot product.
Cost
The dot product of two vectors of size n requires, approximately, n multiplies and n additions. Thus, a dot
product costs, approximately, 2n flops.
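In code, the distinction between x^T y (no conjugation, the “dot” used in these notes) and the true inner product x^H y matters only for complex data. An illustrative Python sketch (the names dots and dot are chosen here for exposition; they are not laff routines):

```python
def dots(x, y):
    """Compute x^T y: sum of products of corresponding elements, no conjugation."""
    kappa = 0
    for chi, psi in zip(x, y):
        kappa += chi * psi
    return kappa

def dot(x, y):
    """Compute x^H y: conjugate the elements of x before multiplying."""
    kappa = 0
    for chi, psi in zip(x, y):
        kappa += chi.conjugate() * psi
    return kappa

dots([1j, 1], [1j, 1])   # 1j*1j + 1*1 = 0
dot([1j, 1], [1j, 1])    # (-1j)*1j + 1*1 = 2
```

For real-valued vectors the two functions coincide, as noted in the text.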
Partitioning A by columns, y = Ax = χ0 a0 + χ1 a1 + · · · + χn−1 an−1 + 0, where 0 denotes the zero vector of size m. This suggests the following loop for computing y := Ax:
y := 0
for j := 0, . . . , n − 1
y := χ j a j + y (axpy)
endfor
In Figure 1.1 (left), we present this algorithm using the FLAME notation (LAFF Notes Week 3 [30]).
Focusing on how A can be partitioned by rows, we find that
y = \begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{pmatrix}
  = Ax
  = \begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix} x
  = \begin{pmatrix} \widehat a_0^T x \\ \widehat a_1^T x \\ \vdots \\ \widehat a_{m-1}^T x \end{pmatrix}.
y := 0
for i := 0, . . . , m − 1
ψi := abTi x + ψi (‘‘dot’’)
endfor
Here we use the term “dot” because for complex valued matrices it is not really a dot product. In Figure 1.1
(right), we present this algorithm using the FLAME notation (LAFF Notes Week 3 [30]).
It is important to notice that this first “matrix-vector” operation (matrix-vector multiplication) can be
“layered” upon vector-vector operations (axpy or ‘‘dot’’).
Cost
Matrix-vector multiplication with an m × n matrix costs, approximately, 2mn flops. This can be easily
argued in three different ways:
• The computation requires a multiply and an add with each element of the matrix. There are mn such
elements.
Algorithm: [y] := MVMULT_UNB_VAR1(A, x, y)
  Partition A → ( AL AR ), x → ( xT ; xB )
    where AL has 0 columns, xT has 0 elements
  while n(AL) < n(A) do
    Repartition ( AL AR ) → ( A0 a1 A2 ), ( xT ; xB ) → ( x0 ; χ1 ; x2 )
    y := χ1 a1 + y                                        (axpy)
    Continue with ( AL AR ) ← ( A0 a1 A2 ), ( xT ; xB ) ← ( x0 ; χ1 ; x2 )
  endwhile

Algorithm: [y] := MVMULT_UNB_VAR2(A, x, y)
  Partition A → ( AT ; AB ), y → ( yT ; yB )
    where AT has 0 rows, yT has 0 elements
  while m(AT) < m(A) do
    Repartition ( AT ; AB ) → ( A0 ; a1T ; A2 ), ( yT ; yB ) → ( y0 ; ψ1 ; y2 )
    ψ1 := a1T x + ψ1                                      (‘‘dot’’)
    Continue with ( AT ; AB ) ← ( A0 ; a1T ; A2 ), ( yT ; yB ) ← ( y0 ; ψ1 ; y2 )
  endwhile

Figure 1.1: Matrix-vector multiplication algorithms for y := Ax + y. Top: via axpy operations (by
columns). Bottom: via “dot products” (by rows). Here ( xT ; xB ) denotes a vector partitioned into a top
and a bottom part.
• The operation can be computed via n axpy operations, each of which requires 2m flops.
• The operation can be computed via m dot operations, each of which requires 2n flops.
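Both variants can be sketched in a few lines of Python (illustrative, not the notes' M-script; matrices are stored as lists of rows, and the function names are chosen here to echo Figure 1.1). Variant 1 performs one axpy per column of A; variant 2 performs one “dot” per row. Both compute y := Ax + y and produce identical results.

```python
def mvmult_var1(A, x, y):
    """y := A x + y via axpy operations with the columns of A."""
    m, n = len(A), len(A[0])
    for j in range(n):            # y := chi_j * a_j + y (axpy)
        for i in range(m):
            y[i] = x[j] * A[i][j] + y[i]
    return y

def mvmult_var2(A, x, y):
    """y := A x + y via one ``dot'' per row of A."""
    for i in range(len(A)):       # psi_i := a_i^T x + psi_i (``dot'')
        y[i] = sum(alpha * chi for alpha, chi in zip(A[i], x)) + y[i]
    return y

A = [[1, 2], [3, 4], [5, 6]]
x = [1, -1]
mvmult_var1(A, x, [0, 0, 0])   # [-1, -1, -1]
mvmult_var2(A, x, [0, 0, 0])   # [-1, -1, -1]
```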
Homework 1.8 Follow the instructions in the video below to implement the two algorithms for computing
y := Ax + y, using MATLAB and Spark. You will want to use the laff routines summarized in Appendix B.
You can visualize the algorithm with PictureFLAME.
* YouTube
* Downloaded
Video
* SEE ANSWER
Also,

yxT = y ( χ0 χ1 · · · χn−1 ) = ( χ0 y χ1 y · · · χn−1 y ),

which shows that each column of yxT is a multiple of the vector y. This motivates the observation that the
matrix yxT has rank at most equal to one (LAFF Notes Week 10 [30]).
The operation A := yxT + A is called a rank-1 update to matrix A and is often referred to as ger
(general rank-1 update):
\begin{pmatrix}
\alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,n-1} \\
\alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,n-1} \\
\vdots & \vdots & & \vdots \\
\alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,n-1}
\end{pmatrix}
:=
\begin{pmatrix}
\psi_0\chi_0 + \alpha_{0,0} & \psi_0\chi_1 + \alpha_{0,1} & \cdots & \psi_0\chi_{n-1} + \alpha_{0,n-1} \\
\psi_1\chi_0 + \alpha_{1,0} & \psi_1\chi_1 + \alpha_{1,1} & \cdots & \psi_1\chi_{n-1} + \alpha_{1,n-1} \\
\vdots & \vdots & & \vdots \\
\psi_{m-1}\chi_0 + \alpha_{m-1,0} & \psi_{m-1}\chi_1 + \alpha_{m-1,1} & \cdots & \psi_{m-1}\chi_{n-1} + \alpha_{m-1,n-1}
\end{pmatrix}.
Now, partition

A = \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}
  = \begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix}.
Focusing on how A can be partitioned by columns, we find that
yxT + A = ( χ0 y χ1 y · · · χn−1 y ) + ( a0 a1 · · · an−1 )
        = ( χ0 y + a0 χ1 y + a1 · · · χn−1 y + an−1 ).
Notice that each column is updated with an axpy operation. This suggests the following loop for comput-
ing A := yxT + A:
for j := 0, . . . , n − 1
a j := χ j y + a j (axpy)
endfor
In Figure 1.2 (left), we present this algorithm using the FLAME notation (LAFF Notes Week 3).
Focusing on how A can be partitioned by rows, we find that
yx^T + A
  = \begin{pmatrix} \psi_0 x^T \\ \psi_1 x^T \\ \vdots \\ \psi_{m-1} x^T \end{pmatrix}
  + \begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix}
  = \begin{pmatrix} \psi_0 x^T + \widehat a_0^T \\ \psi_1 x^T + \widehat a_1^T \\ \vdots \\ \psi_{m-1} x^T + \widehat a_{m-1}^T \end{pmatrix}.
Notice that each row is updated with an axpy operation. This suggests the following loop for computing
A := yxT + A:
for i := 0, . . . , m − 1
  abTi := ψi xT + abTi (axpy)
endfor
Algorithm: [A] := RANK1_UNB_VAR1(y, x, A)
  Partition x → ( xT ; xB ), A → ( AL AR )
    where xT has 0 elements, AL has 0 columns
  while m(xT) < m(x) do
    Repartition ( xT ; xB ) → ( x0 ; χ1 ; x2 ), ( AL AR ) → ( A0 a1 A2 )
    a1 := χ1 y + a1                                       (axpy)
    Continue with ( xT ; xB ) ← ( x0 ; χ1 ; x2 ), ( AL AR ) ← ( A0 a1 A2 )
  endwhile

Algorithm: [A] := RANK1_UNB_VAR2(y, x, A)
  Partition y → ( yT ; yB ), A → ( AT ; AB )
    where yT has 0 elements, AT has 0 rows
  while m(yT) < m(y) do
    Repartition ( yT ; yB ) → ( y0 ; ψ1 ; y2 ), ( AT ; AB ) → ( A0 ; a1T ; A2 )
    a1T := ψ1 xT + a1T                                    (axpy)
    Continue with ( yT ; yB ) ← ( y0 ; ψ1 ; y2 ), ( AT ; AB ) ← ( A0 ; a1T ; A2 )
  endwhile

Figure 1.2: Rank-1 update algorithms for computing A := yxT + A. Top: by columns. Bottom: by rows.
In Figure 1.2 (right), we present this algorithm using the FLAME notation (LAFF Notes Week 3).
Again, it is important to notice that this “matrix-vector” operation (rank-1 update) can be “layered”
upon the axpy vector-vector operation.
Cost
A rank-1 update of an m × n matrix costs, approximately, 2mn flops. This can be easily argued in three
different ways:
• The computation requires a multiply and an add with each element of the matrix. There are mn such
elements.
• The operation can be computed one column at a time via n axpy operations, each of which requires
2m flops.
• The operation can be computed one row at a time via m axpy operations, each of which requires 2n
flops.
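Both rank-1 update variants can be sketched in Python (illustrative; A is stored as a list of rows, and the function names are chosen here for exposition). Either way the cost is 2mn flops, only the loop order differs.

```python
def ger_by_columns(y, x, A):
    """A := y x^T + A, one axpy per column of A (variant 1)."""
    for j in range(len(x)):
        for i in range(len(y)):
            A[i][j] = y[i] * x[j] + A[i][j]
    return A

def ger_by_rows(y, x, A):
    """A := y x^T + A, one axpy per row of A (variant 2)."""
    for i in range(len(y)):
        for j in range(len(x)):
            A[i][j] = y[i] * x[j] + A[i][j]
    return A

ger_by_columns([1, 2], [3, 4], [[0, 0], [0, 0]])   # [[3, 4], [6, 8]]
ger_by_rows([1, 2], [3, 4], [[0, 0], [0, 0]])      # [[3, 4], [6, 8]]
```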
Homework 1.9 Implement the two algorithms for computing A := yxT + A, using MATLAB and Spark.
You will want to use the laff routines summarized in Appendix B. You can visualize the algorithm with
PictureFLAME.
* SEE ANSWER
Homework 1.10 Prove that the matrix xyT where x and y are vectors has rank at most one, thus explaining
the name “rank-1 update”.
* SEE ANSWER
C := AB + C =
\begin{pmatrix}
\sum_{p=0}^{k-1} \alpha_{0,p}\beta_{p,0} + \gamma_{0,0} & \sum_{p=0}^{k-1} \alpha_{0,p}\beta_{p,1} + \gamma_{0,1} & \cdots & \sum_{p=0}^{k-1} \alpha_{0,p}\beta_{p,n-1} + \gamma_{0,n-1} \\
\sum_{p=0}^{k-1} \alpha_{1,p}\beta_{p,0} + \gamma_{1,0} & \sum_{p=0}^{k-1} \alpha_{1,p}\beta_{p,1} + \gamma_{1,1} & \cdots & \sum_{p=0}^{k-1} \alpha_{1,p}\beta_{p,n-1} + \gamma_{1,n-1} \\
\vdots & \vdots & & \vdots \\
\sum_{p=0}^{k-1} \alpha_{m-1,p}\beta_{p,0} + \gamma_{m-1,0} & \sum_{p=0}^{k-1} \alpha_{m-1,p}\beta_{p,1} + \gamma_{m-1,1} & \cdots & \sum_{p=0}^{k-1} \alpha_{m-1,p}\beta_{p,n-1} + \gamma_{m-1,n-1}
\end{pmatrix},

which can be computed with the triple loop
for i := 0, . . . , m − 1
  for j := 0, . . . , n − 1
    for p := 0, . . . , k − 1
      γi,j := αi,p βp,j + γi,j (dot)
    endfor
  endfor
endfor
Then

C := AB + C
  = \begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix}
    \begin{pmatrix} b_0 & b_1 & \cdots & b_{n-1} \end{pmatrix}
  + \begin{pmatrix}
\gamma_{0,0} & \gamma_{0,1} & \cdots & \gamma_{0,n-1} \\
\gamma_{1,0} & \gamma_{1,1} & \cdots & \gamma_{1,n-1} \\
\vdots & \vdots & & \vdots \\
\gamma_{m-1,0} & \gamma_{m-1,1} & \cdots & \gamma_{m-1,n-1}
\end{pmatrix}
  = \begin{pmatrix}
\widehat a_0^T b_0 + \gamma_{0,0} & \widehat a_0^T b_1 + \gamma_{0,1} & \cdots & \widehat a_0^T b_{n-1} + \gamma_{0,n-1} \\
\widehat a_1^T b_0 + \gamma_{1,0} & \widehat a_1^T b_1 + \gamma_{1,1} & \cdots & \widehat a_1^T b_{n-1} + \gamma_{1,n-1} \\
\vdots & \vdots & & \vdots \\
\widehat a_{m-1}^T b_0 + \gamma_{m-1,0} & \widehat a_{m-1}^T b_1 + \gamma_{m-1,1} & \cdots & \widehat a_{m-1}^T b_{n-1} + \gamma_{m-1,n-1}
\end{pmatrix}.
for i := 0, . . . , m − 1                    for j := 0, . . . , n − 1
  for j := 0, . . . , n − 1          or        for i := 0, . . . , m − 1
    γi,j := abTi bj + γi,j (dot)                 γi,j := abTi bj + γi,j (dot)
  endfor                                       endfor
endfor                                       endfor
The cost of any of these algorithms is 2mnk flops, which can be explained by the fact that mn elements
of C must be updated with a dot product of length k.
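The triple loop, written out in Python (an illustrative sketch; matrices stored as lists of rows), makes the 2mnk flop count visible: the innermost statement performs one multiply and one add and executes m·n·k times.

```python
def gemm(A, B, C):
    """C := A B + C for an m x k matrix A, k x n matrix B, and m x n matrix C."""
    m, k, n = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(n):
            for p in range(k):          # 1 multiply + 1 add per iteration
                C[i][j] = A[i][p] * B[p][j] + C[i][j]
    return C

gemm([[1, 2]], [[3], [4]], [[1]])   # [[1*3 + 2*4 + 1]] = [[12]]
```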
Algorithm: [C] := GEMM_UNB_VAR1(A, B, C)
  Partition B → ( BL BR ), C → ( CL CR )
    where BL and CL have 0 columns
  while n(BL) < n(B) do
    Repartition ( BL BR ) → ( B0 b1 B2 ), ( CL CR ) → ( C0 c1 C2 )
    c1 := A b1 + c1                       (matrix-vector multiplication)
    Continue with ( BL BR ) ← ( B0 b1 B2 ), ( CL CR ) ← ( C0 c1 C2 )
  endwhile

Figure 1.3: Algorithm for computing C := AB + C one column at a time, via matrix-vector multiplications.
Then
C := AB + C = A ( b0 b1 · · · bn−1 ) + ( c0 c1 · · · cn−1 )
            = ( Ab0 + c0 Ab1 + c1 · · · Abn−1 + cn−1 ).
for j := 0, . . . , n − 1
c j := Ab j + c j (matrix-vector multiplication)
endfor
In Figure 1.3, we present this algorithm using the FLAME notation (LAFF Notes, Week 3 and 4).
The given matrix-matrix multiplication algorithm with m×n matrix C, m×k matrix A, and k ×n matrix
B costs, approximately, 2mnk flops. This can be easily argued by noting that each update of a column of
matrix C costs, approximately, 2mk flops. There are n such columns to be computed.
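A sketch of this variant in Python (illustrative; the function name is chosen here for exposition): each pass updates one column c_j of C with a matrix-vector multiplication costing roughly 2mk flops, and there are n columns.

```python
def gemm_by_columns(A, B, C):
    """C := A B + C, one column of C at a time: c_j := A b_j + c_j."""
    m, k, n = len(A), len(B), len(B[0])
    for j in range(n):
        for i in range(m):              # c_j := A b_j + c_j (gemv)
            C[i][j] += sum(A[i][p] * B[p][j] for p in range(k))
    return C

# Multiplying by the identity leaves C equal to B:
gemm_by_columns([[1, 0], [0, 1]], [[5, 6], [7, 8]], [[0, 0], [0, 0]])
# [[5, 6], [7, 8]]
```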
Homework 1.11 Implement C := AB + C via matrix-vector multiplications, using MATLAB and Spark.
You will want to use the laff routines summarized in Appendix B. You can visualize the algorithm with
PictureFLAME.
* SEE ANSWER
Algorithm: [C] := GEMM_UNB_VAR2(A, B, C)
  Partition A → ( AT ; AB ), C → ( CT ; CB )
    where AT and CT have 0 rows
  while m(AT) < m(A) do
    Repartition ( AT ; AB ) → ( A0 ; a1T ; A2 ), ( CT ; CB ) → ( C0 ; c1T ; C2 )
    c1T := a1T B + c1T                    (row vector times matrix multiplication)
    Continue with ( AT ; AB ) ← ( A0 ; a1T ; A2 ), ( CT ; CB ) ← ( C0 ; c1T ; C2 )
  endwhile

Figure 1.4: Algorithm for computing C := AB + C one row at a time.
for i := 0, . . . , m − 1
  cbTi := abTi B + cbTi (row vector times matrix multiplication)
endfor
In Figure 1.4, we present this algorithm using the FLAME notation (LAFF Notes, Week 3 and 4).
Homework 1.12 Argue that the given matrix-matrix multiplication algorithm with m × n matrix C, m × k
matrix A, and k × n matrix B costs, approximately, 2mnk flops.
* SEE ANSWER
Homework 1.13 Implement C := AB + C via row vector-matrix multiplications, using MATLAB and
Spark. You will want to use the laff routines summarized in Appendix B. You can visualize the algorithm
with PictureFLAME. Hint: yT := xT A + yT is not supported by a laff routine. You can use laff gemv
instead.
An implementation can be found in
* SEE ANSWER
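The corresponding Python sketch (illustrative; matrices as lists of rows) computes one row of C per iteration by multiplying the row a_i^T times B:

```python
def gemm_by_rows(A, B, C):
    """C := A B + C, one row of C at a time: c_i^T := a_i^T B + c_i^T."""
    k, n = len(B), len(B[0])
    for i in range(len(A)):
        for j in range(n):              # c_i^T := a_i^T B + c_i^T
            C[i][j] += sum(A[i][p] * B[p][j] for p in range(k))
    return C

gemm_by_rows([[1, 2]], [[1, 0], [0, 1]], [[0, 0]])   # [[1, 2]]
```

Each of the m row updates costs roughly 2nk flops, consistent with the 2mnk total argued in Homework 1.12.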
C := AB + C = a_0 \widehat b_0^T + a_1 \widehat b_1^T + \cdots + a_{k-1} \widehat b_{k-1}^T + C
            = a_{k-1} \widehat b_{k-1}^T + ( \cdots ( a_1 \widehat b_1^T + ( a_0 \widehat b_0^T + C ) ) \cdots ),
which shows that C can be updated with a sequence of rank-1 updates, suggesting the loop
for p := 0, . . . , k − 1
C := a pbbTp +C (rank-1 update)
endfor
In Figure 1.5, we present this algorithm using the FLAME notation (LAFF Notes, Week 3 and 4).
Algorithm: [C] := GEMM_UNB_VAR3(A, B, C)
  Partition A → ( AL AR ), B → ( BT ; BB )
    where AL has 0 columns, BT has 0 rows
  while n(AL) < n(A) do
    Repartition ( AL AR ) → ( A0 a1 A2 ), ( BT ; BB ) → ( B0 ; b1T ; B2 )
    C := a1 b1T + C                       (rank-1 update)
    Continue with ( AL AR ) ← ( A0 a1 A2 ), ( BT ; BB ) ← ( B0 ; b1T ; B2 )
  endwhile

Figure 1.5: Algorithm for computing C := AB + C via rank-1 updates.
Homework 1.14 Argue that the given matrix-matrix multiplication algorithm with m × n matrix C, m × k
matrix A, and k × n matrix B costs, approximately, 2mnk flops.
* SEE ANSWER
Homework 1.15 Implement C := AB + C via rank-1 updates, using MATLAB and Spark. You will want
to use the laff routines summarized in Appendix B. You can visualize the algorithm with PictureFLAME.
* SEE ANSWER
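A Python sketch (illustrative) of the rank-1 update variant: each of the k passes adds the outer product of column a_p of A and row b_p^T of B to C, at a cost of roughly 2mn flops per pass.

```python
def gemm_by_rank1(A, B, C):
    """C := A B + C as a sequence of k rank-1 updates: C := a_p b_p^T + C."""
    m, k, n = len(A), len(B), len(B[0])
    for p in range(k):
        for i in range(m):              # C := a_p b_p^T + C (ger)
            for j in range(n):
                C[i][j] += A[i][p] * B[p][j]
    return C

# A single column times a single row is itself a rank-1 update:
gemm_by_rank1([[1], [2]], [[3, 4]], [[0, 0], [0, 0]])   # [[3, 4], [6, 8]]
```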
1.7 Enrichments
For my graduate class I have posted this article on Canvas. Others may have access to this article via
their employer or university.
1.8 Wrapup
1.8.1 Additional exercises
Homework 1.17 Implement MGS unb var1 using MATLAB and Spark. You will want to use the laff
routines summarized in Appendix B. (I’m not sure if you can visualize the algorithm with PictureFLAME.
Try it!)
* SEE ANSWER
1.8.2 Summary
It is important to realize that almost all operations that we discussed in this chapter are special cases of
matrix-matrix multiplication. This is summarized in Figure 1.6. A few notes:
Observations made about operations with partitioned matrices and vectors can be summarized by par-
titioning matrices into blocks: Let
C = \begin{pmatrix}
C_{0,0} & C_{0,1} & \cdots & C_{0,N-1} \\
C_{1,0} & C_{1,1} & \cdots & C_{1,N-1} \\
\vdots & \vdots & & \vdots \\
C_{M-1,0} & C_{M-1,1} & \cdots & C_{M-1,N-1}
\end{pmatrix}, \quad
A = \begin{pmatrix}
A_{0,0} & A_{0,1} & \cdots & A_{0,K-1} \\
A_{1,0} & A_{1,1} & \cdots & A_{1,K-1} \\
\vdots & \vdots & & \vdots \\
A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,K-1}
\end{pmatrix}, \quad\mbox{and}\quad
B = \begin{pmatrix}
B_{0,0} & B_{0,1} & \cdots & B_{0,N-1} \\
B_{1,0} & B_{1,1} & \cdots & B_{1,N-1} \\
\vdots & \vdots & & \vdots \\
B_{K-1,0} & B_{K-1,1} & \cdots & B_{K-1,N-1}
\end{pmatrix}.
m      n      k      operation   comment
large  large  large  gemm        General matrix-matrix multiplication
large  large  1      ger         General rank-1 update (outer product if initially C = 0)
large  1      large  gemv        General matrix-vector multiplication
1      large  large  gemv        General matrix-vector multiplication (row vector times matrix)
1      large  1      axpy
large  1      1      axpy
1      1      large  dot
1      1      1      MAC         Multiply accumulate

Figure 1.6: Special shapes of gemm C := AB + C. Here C, A, and B are m × n, m × k, and k × n matrices,
respectively.
Then

C := AB + C = \begin{pmatrix}
\sum_{p=0}^{K-1} A_{0,p}B_{p,0} + C_{0,0} & \sum_{p=0}^{K-1} A_{0,p}B_{p,1} + C_{0,1} & \cdots & \sum_{p=0}^{K-1} A_{0,p}B_{p,N-1} + C_{0,N-1} \\
\sum_{p=0}^{K-1} A_{1,p}B_{p,0} + C_{1,0} & \sum_{p=0}^{K-1} A_{1,p}B_{p,1} + C_{1,1} & \cdots & \sum_{p=0}^{K-1} A_{1,p}B_{p,N-1} + C_{1,N-1} \\
\vdots & \vdots & & \vdots \\
\sum_{p=0}^{K-1} A_{M-1,p}B_{p,0} + C_{M-1,0} & \sum_{p=0}^{K-1} A_{M-1,p}B_{p,1} + C_{M-1,1} & \cdots & \sum_{p=0}^{K-1} A_{M-1,p}B_{p,N-1} + C_{M-1,N-1}
\end{pmatrix}.
Notice that multiplication with partitioned matrices is exactly like regular matrix-matrix multiplication
with scalar elements, except that multiplication of two blocks does not necessarily commute.
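This can be checked numerically. The Python sketch below (illustrative; all names are chosen here for exposition) forms C := AB with a 2 × 2 blocking of 4 × 4 matrices, computing each block C_{I,J} as the sum of block products A_{I,p} B_{p,J}, and confirms that the result agrees with the unblocked product.

```python
def matmul(A, B):
    """Unblocked C = A B (matrices as lists of rows)."""
    m, k, n = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

def add(X, Y):
    """Elementwise matrix sum."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def block(M, i0, i1, j0, j1):
    """Extract the submatrix M[i0:i1, j0:j1]."""
    return [row[j0:j1] for row in M[i0:i1]]

A = [[i * 4 + j for j in range(4)] for i in range(4)]
B = [[(i + j) % 5 for j in range(4)] for i in range(4)]

# 2 x 2 blocking with split point s = 2: C_IJ = A_I0 B_0J + A_I1 B_1J.
s = 2
C_blocks = {}
for I in (0, 1):
    for J in (0, 1):
        AI0 = block(A, I * s, I * s + s, 0, s)
        AI1 = block(A, I * s, I * s + s, s, 4)
        B0J = block(B, 0, s, J * s, J * s + s)
        B1J = block(B, s, 4, J * s, J * s + s)
        C_blocks[I, J] = add(matmul(AI0, B0J), matmul(AI1, B1J))

# Reassemble the blocks and compare with the unblocked product.
C = [C_blocks[I, 0][r] + C_blocks[I, 1][r] for I in (0, 1) for r in range(s)]
assert C == matmul(A, B)
```

Note that the order of the two factors in each block product matters, which is exactly the non-commutativity caveat above.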
Chapter 2
Notes on Vector and Matrix Norms
* YouTube
* Download from UT Box
* View After Local Download
2.1.1 Launch
2.1.2 Outline
Definition 2.1 Let ν : Cn → R. Then ν is a (vector) norm if for all x, y ∈ Cn and all α ∈ C
• x ≠ 0 ⇒ ν(x) > 0 (ν is positive definite),
• ν(αx) = |α|ν(x) (ν is homogeneous), and
• ν(x + y) ≤ ν(x) + ν(y) (ν obeys the triangle inequality).
Homework 2.2 Prove that if ν : Cn → R is a norm, then ν(0) = 0 (where the first 0 denotes the zero vector
in Cn ).
* SEE ANSWER
In an undergraduate course, you should have learned that, given a real-valued vector x and a unit-length
real-valued vector y, the component of x in the direction of y is given by (xT y)y:
The length of x is kxk2 and the length of (xT y)y is |xT y|. Thus |xT y| ≤ kxk2 . Now, if y is not of unit length,
then |xT (y/kyk2 )| ≤ kxk2 or, equivalently, |xT y| ≤ kxk2 kyk2 . This is known as the Cauchy-Schwarz
inequality for real-valued vectors. Here is its generalization for complex-valued vectors.
To show that the vector 2-norm is a norm, we will need the following theorem:
Proof: Assume that x ≠ 0 and y ≠ 0, since otherwise the inequality is trivially true. We can then choose
x̂ = x/kxk2 and ŷ = y/kyk2 . This leaves us to prove that |x̂H ŷ| ≤ 1, since kx̂k2 = kŷk2 = 1.
Pick α ∈ C with |α| = 1 so that αx̂H ŷ is real and nonnegative. Another way of saying this is that
αx̂H ŷ = |x̂H ŷ|. Note that since it is real we also know that αx̂H ŷ = ᾱŷH x̂ (a real number equals its own conjugate).
Now,

0 ≤ kx̂ − αŷk22
  = (x̂ − αŷ)H (x̂ − αŷ)                 (kzk22 = zH z)
  = x̂H x̂ − ᾱŷH x̂ − αx̂H ŷ + ᾱαŷH ŷ      (multiplying out)
  = 1 − 2αx̂H ŷ + |α|2                   (kx̂k2 = kŷk2 = 1 and αx̂H ŷ = ᾱŷH x̂)
  = 2 − 2αx̂H ŷ                          (|α| = 1)
  = 2 − 2|x̂H ŷ|                         (αx̂H ŷ = |x̂H ŷ|),

so that |x̂H ŷ| ≤ 1, as desired.
Proof: To prove this, we merely check whether the three conditions are met:
Let x, y ∈ Cn and α ∈ C be arbitrarily chosen. Then
Notice that x ≠ 0 means that at least one of its components is nonzero. Let’s assume that
χ j ≠ 0. Then

kxk2 = √( |χ0 |2 + · · · + |χn−1 |2 ) ≥ √( |χ j |2 ) = |χ j | > 0.
kx + yk22 = (x + y)H (x + y)
= xH x + yH x + xH y + yH y
≤ kxk22 + 2kxk2 kyk2 + kyk22
= (kxk2 + kyk2 )2 .
Taking the square root of both sides yields the desired result.
QED
The vector 1-norm is sometimes referred to as the “taxi-cab norm”. It is the distance that a taxi travels
along the streets of a city that has square city blocks.
Proving that the p-norm is a norm is a little tricky and not particularly relevant to this course. To prove
the triangle inequality requires the following classical result:
Theorem 2.11 (Hölder inequality) Let x, y ∈ Cn and 1/p + 1/q = 1 with 1 ≤ p, q ≤ ∞. Then |xH y| ≤ kxk p kykq .
Clearly, the 1-norm and 2-norm are special cases of the p-norm. Also, kxk∞ = lim p→∞ kxk p .
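To make the p-norm and the limit kxk∞ = lim p→∞ kxk p concrete, here is a small Python sketch (the helper p_norm is a name chosen for this example, not a routine from these notes):

```python
import math

def p_norm(x, p):
    """Vector p-norm: (sum_i |chi_i|^p)^(1/p)."""
    return sum(abs(chi) ** p for chi in x) ** (1.0 / p)

x = [3.0, -4.0, 1.0]
print(p_norm(x, 1))                # the 1-norm: 8.0
print(p_norm(x, 2))                # the 2-norm: sqrt(26)
print(max(abs(chi) for chi in x))  # the inf-norm: 4.0

# As p grows, ||x||_p approaches ||x||_inf = 4:
print([round(p_norm(x, p), 4) for p in (4, 16, 64)])
```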
Notice that one can think of the Frobenius norm as taking the columns of the matrix, stacking them on
top of each other to create a vector of size mn, and then taking the vector 2-norm of the result.
Similarly, other matrix norms can be created from vector norms by viewing the matrix as a vector. It
turns out that other than the Frobenius norm, these aren’t particularly interesting in practice.
kAkµ,ν = sup_{x ∈ Cn , x ≠ 0} kAxkµ / kxkν .
Let us start by interpreting this. How “big” A is, as measured by kAkµ,ν , is defined as the most that A
magnifies the length of nonzero vectors, where the length of a vector (x) is measured with norm k · kν and
the length of a transformed vector (Ax) is measured with norm k · kµ .
Two comments are in order. First,

sup_{x ∈ Cn , x ≠ 0} kAxkµ / kxkν = sup_{kxkν =1} kAxkµ .
Second, the “sup” (which stands for supremum) is used because we can’t claim yet that there is a vector x
with kxkν = 1 for which
kAkµ,ν = kAxkµ .
In other words, it is not immediately obvious that there is a vector for which the supremum is attained. The
fact is that there is always such a vector x. The proof depends on a result from real analysis (sometimes
called “advanced calculus”) that states that supx∈S f (x) is attained for some vector x ∈ S as long as f is
continuous and S is a compact set. Since real analysis is not a prerequisite for this course, the reader may
have to take this on faith! From real analysis we also learn that if the supremum is attained by an element
in S, then supx∈S f (x) = maxx∈S f (x). Thus, we replace sup by max from here on in our discussion.
We conclude that the following two definitions are equivalent definitions to the one we already gave:
Definition 2.15 Let k · kµ : Cm → R and k · kν : Cn → R be vector norms. Define k · kµ,ν : Cm×n → R by

kAkµ,ν = max_{x ∈ Cn , x ≠ 0} kAxkµ / kxkν

and, equivalently,

kAkµ,ν = max_{kxkν =1} kAxkµ .
In this course, we will often encounter proofs involving norms. Such proofs are often much cleaner if one
starts by strategically picking the most convenient of these two definitions.
Proof: To prove this, we merely check whether the three conditions are met:
Let A, B ∈ Cm×n and α ∈ C be arbitrarily chosen. Then
• A ≠ 0 ⇒ kAkµ,ν > 0 (k · kµ,ν is positive definite):
Notice that A ≠ 0 means that at least one of its columns is not a zero vector (since at least
one element of A is nonzero). Let us assume it is the jth column, a j , that is nonzero. Then

kAkµ,ν = max_{x ∈ Cn , x ≠ 0} kAxkµ / kxkν ≥ kAe j kµ / ke j kν = ka j kµ / ke j kν > 0.
QED
The matrix p-norms with p ∈ {1, 2, ∞} will play an important role in our course, as will the Frobenius
norm. As the course unfolds, we will realize that the matrix 2-norm is difficult to compute in practice
while the 1-norm, ∞-norm, and Frobenius norms are straightforward and relatively cheap to compute.
The following theorem shows how to practically compute the matrix 1-norm:
Theorem 2.19 Let A ∈ Cm×n and partition A = ( a0 a1 · · · an−1 ). Then

kAk1 = max_{0≤ j<n} ka j k1 .
Hence

kaJ k1 ≤ max_{kxk1 =1} kAxk1 ≤ kaJ k1 ,

which implies that

max_{kxk1 =1} kAxk1 = kaJ k1 = max_{0≤ j<n} ka j k1 .
QED
Similarly, the following exercise shows how to practically compute the matrix ∞-norm:
Homework 2.20 Let A ∈ Cm×n and partition A by rows,

    ⎛ âT0    ⎞
A = ⎜ âT1    ⎟ .
    ⎜  ..    ⎟
    ⎝ âTm−1  ⎠

Show that

kAk∞ = max_{0≤i<m} kâi k1 = max_{0≤i<m} ( |αi,0 | + |αi,1 | + · · · + |αi,n−1 | ).
* SEE ANSWER
Notice that in the above exercise âi is really (âTi )T , since âTi is the label for the ith row of matrix A.
Homework 2.21 Let y ∈ Cm and x ∈ Cn . Show that kyxH k2 = kyk2 kxk2 .
* SEE ANSWER
2.4.4 Discussion
While k · k2 is a very important matrix norm, it is in practice often difficult to compute. The matrix norms,
k · kF , k · k1 , and k · k∞ are more easily computed and hence more practical in many instances.
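The practical formulas — maximum absolute column sum for the 1-norm, maximum absolute row sum for the ∞-norm, and the square root of the sum of squares for the Frobenius norm — take only a few lines of NumPy (an illustrative translation; the notes use MATLAB, and the matrix here is an arbitrary example):

```python
import numpy as np

A = np.array([[1.0, -2.0,  3.0],
              [4.0,  5.0, -6.0]])

one_norm = np.abs(A).sum(axis=0).max()    # max absolute column sum
inf_norm = np.abs(A).sum(axis=1).max()    # max absolute row sum
fro_norm = np.sqrt((np.abs(A) ** 2).sum())

# These agree with NumPy's built-in matrix norms:
print(one_norm == np.linalg.norm(A, 1))                # → True (here: 9.0)
print(inf_norm == np.linalg.norm(A, np.inf))           # → True (here: 15.0)
print(np.isclose(fro_norm, np.linalg.norm(A, 'fro')))  # → True
```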
Theorem 2.23 Let k · kν : Cn → R be a vector norm and given any matrix C ∈ Cm×n define the corre-
sponding induced matrix norm as
kCkν = max_{x ≠ 0} kCxkν / kxkν = max_{kxkν =1} kCxkν .
Then for any A ∈ Cm×k and B ∈ Ck×n the inequality kABkν ≤ kAkν kBkν holds.
In other words, induced matrix norms are submultiplicative. To prove this theorem, it helps to first prove
a simpler result:
Lemma 2.24 Let k · kν : Cn → R be a vector norm and given any matrix C ∈ Cm×n define the correspond-
ing induced matrix norm as
kCkν = max_{x ≠ 0} kCxkν / kxkν = max_{kxkν =1} kCxkν .
Then for any A ∈ Cm×n and y ∈ Cn the inequality kAykν ≤ kAkν kykν holds.
Proof: If y = 0, the result obviously holds since then kAykν = 0 and kykν = 0. Let y 6= 0. Then
kAkν = max_{x ≠ 0} kAxkν / kxkν ≥ kAykν / kykν .
Rearranging this yields kAykν ≤ kAkν kykν . QED
We can now prove the theorem:
Proof:
kABkν = max_{kxkν =1} kABxkν = max_{kxkν =1} kA(Bx)kν ≤ max_{kxkν =1} kAkν kBxkν ≤ max_{kxkν =1} kAkν kBkν kxkν = kAkν kBkν .
QED
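Submultiplicativity is easy to spot-check numerically. A NumPy sketch (illustrative only; the random matrices and the small tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))

# ||AB|| <= ||A|| ||B|| for the induced 1-, 2-, and inf-norms:
for p in (1, 2, np.inf):
    lhs = np.linalg.norm(A @ B, p)
    rhs = np.linalg.norm(A, p) * np.linalg.norm(B, p)
    assert lhs <= rhs + 1e-12
print("submultiplicativity holds for p = 1, 2, inf")
```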
Ax = b Exact equation
A(x + δx) = b + δb Perturbed equation
We would like to determine a formula, κ(A, b, δb), that tells us how much a relative error in b is potentially
amplified into a relative error in the solution x:

kδxk / kxk ≤ κ(A, b, δb) kδbk / kbk.
We will assume that A has an inverse. To find an expression for κ(A, b, δb), we notice that subtracting

Ax = b

from

Ax + Aδx = b + δb

yields

Aδx = δb,  so that  δx = A−1 δb.

Now pick the worst case:

• There is an x̂ for which

  kAk = max_{x ≠ 0} kAxk / kxk = kAx̂k / kx̂k,

  namely the x for which the maximum is attained. Pick b̂ = Ax̂.

• There is a δb̂ for which

  kA−1 k = max_{x ≠ 0} kA−1 xk / kxk = kA−1 δb̂k / kδb̂k,

  again, the x for which the maximum is attained.

Now consider the perturbed equation A(x̂ + δx) = b̂ + δb̂. For these choices,

kδxk = kA−1 δb̂k = kA−1 k kδb̂k  and  kb̂k = kAx̂k = kAk kx̂k,

so that

kδxk / kx̂k = kAk kA−1 k kδb̂k / kb̂k.

In general kδxk/kxk ≤ kAk kA−1 k kδbk/kbk, and the bound is attained for b̂ and δb̂.
Homework 2.28 Let k·k be a matrix norm induced by the k·k vector norm. Show that κ(A) = kAkkA−1 k ≥
1.
* SEE ANSWER
This last exercise shows that κ(A) ≥ 1: even in the best case (when κ(A) = 1), there are choices of b and δb
for which the relative error in b is translated into an equal relative error in the solution; it is never damped
for all right-hand sides.
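The amplification bound can be illustrated with a nearly singular matrix. A NumPy sketch (the matrix, right-hand side, and perturbation are arbitrary illustrative choices):

```python
import numpy as np

# A nearly singular 2x2 matrix: small perturbations in b can be
# amplified by up to kappa(A) in the solution x.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

b  = np.array([2.0, 2.0])
db = 1e-6 * np.array([1.0, -1.0])   # a tiny perturbation of b

x  = np.linalg.solve(A, b)
dx = np.linalg.solve(A, b + db) - x

rel_err_b = np.linalg.norm(db) / np.linalg.norm(b)
rel_err_x = np.linalg.norm(dx) / np.linalg.norm(x)

# The bound ||dx||/||x|| <= kappa(A) ||db||/||b|| holds,
# and here the amplification is dramatic:
print(kappa, rel_err_b, rel_err_x)
print(rel_err_x <= kappa * rel_err_b)  # → True
```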
Theorem 2.29 Let k · kµ : Cn → R and k · kν : Cn → R be vector norms. Then there exist constants αµ,ν
and βµ,ν such that for all x ∈ Cn
αµ,ν kxkµ ≤ kxkν ≤ βµ,ν kxkµ .
The proof of this result again uses the fact that the supremum of a function on a compact set is attained.
A similar result holds for matrix norms:
Theorem 2.30 Let k · kµ : Cm×n → R and k · kν : Cm×n → R be matrix norms. Then there exist constants
αµ,ν and βµ,ν such that for all A ∈ Cm×n

αµ,ν kAkµ ≤ kAkν ≤ βµ,ν kAkµ .
2.7 Enrichments
instead. A careful analysis shows that if γ does not overflow, neither do any of the intermediate values
encountered during its computation. While one of the terms (α/µ)2 or (β/µ)2 may underflow, the other one
equals one and hence the overall result does not underflow. A complete discussion of all the intricacies goes
beyond this note.
This insight generalizes to the computation of kxk2 where x ∈ Cn . Rather than computing it as

kxk2 = √( |χ0 |2 + |χ1 |2 + · · · + |χn−1 |2 ),

one computes µ = kxk∞ and then

kxk2 = µ √( (|χ0 |/µ)2 + (|χ1 |/µ)2 + · · · + (|χn−1 |/µ)2 ).
2.8 Wrapup
* SEE ANSWER
1. kAk1 =
2. kAk∞ =
3. kAkF =
* SEE ANSWER
Prove that
1. kAkF = kAT kF .
2. kAkF = √( ka0 k22 + ka1 k22 + · · · + kan−1 k22 ).
3. kAkF = √( kã0 k22 + kã1 k22 + · · · + kãm−1 k22 ).
(Note that here ãi = (ãTi )T , where ãTi is the label for the ith row of A.)
* SEE ANSWER
• ke j k2 =
• ke j k1 =
• ke j k∞ =
• ke j k p =
* SEE ANSWER
• kIkF =
• kIk1 =
• kIk∞ =
* SEE ANSWER
Homework 2.38 Let k · k be a vector norm defined for a vector of any size (arbitrary n). Let k · k be the
induced matrix norm. Prove that kIk = 1 (where I equals the identity matrix).
Conclude that kIk p = 1 for any p-norm.
* SEE ANSWER
Homework 2.39 Let D = diag(δ0 , δ1 , . . . , δn−1 ) (a diagonal matrix). Compute
• kDk1 =
• kDk∞ =
* SEE ANSWER
Homework 2.40 Let D = diag(δ0 , δ1 , . . . , δn−1 ) (a diagonal matrix). Then
kDk p =
2.8.2 Summary
Chapter 3
Notes on Orthogonality and the Singular Value
Decomposition
If you need to review the basics of orthogonal vectors, orthogonal spaces, and related topics, you may
want to consult Weeks 9–11 of the related course materials.
The following videos were recorded for the Fall 2014 offering of “Numerical Analysis: Linear Algebra”.
Read disclaimer regarding the videos in the preface!
[Figure: the vector b, its component z = Ax in C (A), and the component w = b − z orthogonal to C (A).]
Here we consider
• A, a matrix in Rm×n .
• C (A), the space spanned by the columns of A (the column space of A).
• b, a vector in Rm .
• z, the component of b in C (A) which is also the vector in C (A) closest to the vector b. We know that
since this vector is in the column space of A it equals z = Ax for some vector x ∈ Rn .
The vectors b, z, and w all lie in the same planar subspace, since b = z + w; this plane is the page on which
these vectors are drawn in the above picture.
Thus,
b = z + w,
where
• z = Ax with x ∈ Rn ; and
• AT w = 0 since w is orthogonal to the column space of A and hence in N (AT ) (the left null space of
A).
0 = AT w = AT (b − z) = AT (b − Ax)
or, equivalently,
AT Ax = AT b.
This is known as the normal equation for finding the vector x that “best” solves Ax = b in the linear
least-squares sense. More on this later in our course.
Then, provided (AT A)−1 exists (which happens when A has linearly independent columns),
x = (AT A)−1 AT b.
Thus, the component of b in C (A) is given by
z = Ax = A(AT A)−1 AT b
while the component of b orthogonal (perpendicular) to C (A) is given by

w = b − z = b − A(AT A)−1 AT b = Ib − A(AT A)−1 AT b = (I − A(AT A)−1 AT )b.

Summarizing:

z = A(AT A)−1 AT b
w = (I − A(AT A)−1 AT )b.
Now, we say that, given matrix A with linearly independent columns, the matrix that projects (orthog-
onally) a given vector b onto the column space of A is given by
A(AT A)−1 AT
since A(AT A)−1 AT b is the component of b in C (A). Similarly, given matrix A with linearly independent
columns, the matrix that projects a given vector b onto the space orthogonal to the column space of A
(which, recall, is the left null space of A) is given by
I − A(AT A)−1 AT

since (I − A(AT A)−1 AT )b is the component of b in C (A)⊥ = N (AT ).
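These projectors can be formed explicitly for a small example. A NumPy sketch (illustrative; forming (AT A)−1 explicitly is fine for a demonstration, though one would not do so in production code):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2))   # full column rank (with probability 1)
b = rng.standard_normal(5)

# Component of b in C(A) and the component orthogonal to C(A):
P = A @ np.linalg.inv(A.T @ A) @ A.T   # projects onto the column space of A
z = P @ b
w = b - z

# w is orthogonal to every column of A, and b = z + w:
print(np.allclose(A.T @ w, 0), np.allclose(z + w, b))  # → True True
```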
This picture can be thought of as a matrix B ∈ Rm×n where each element in the matrix encodes a pixel in
the picture. The jth column of B then encodes the jth column of pixels in the picture.
Now, let’s focus on the first few columns. Notice that there is a lot of similarity in those columns.
This can be illustrated by plotting the values in the column as a function of the index of the element in the
column:
We plot βi, j , the value of the (i, j) pixel, for j = 0, 1, 2, 3 in different colors. The green line corresponds to
j = 3 and you notice that it is starting to deviate some for i near 250.
If we now instead look at columns j = 0, 1, 2, 100, where the green line corresponds to j = 100, we
see that the curve corresponding to that column is dramatically different:
Changing this to plotting j = 100, 101, 102, 103 and we notice a lot of similarity again:
Now, let’s think about this from the point of view of taking one vector, say the first column of B, and project-
ing the other columns onto the span of that vector. The hope is that this vector is representative, and that
the projections of the columns onto its span are therefore a good approximation for each of
the columns. What does this mean?
• Partition B into columns: B = ( b0 b1 · · · bn−1 ).
• Pick a = b0 .
• If we let T = a(aT a)−1 aT (the matrix that projects onto Span({a})), then projecting each column of B yields

  T ( b0 b1 b2 · · · ) = a(aT a)−1 aT B = a wT ,  where wT = (aT a)−1 aT B.
• Finally, we recognize awT as an outer product (a column vector times a row vector) and hence a
matrix of rank one.
• If we do this for our picture, we get the approximation on the left:
Notice how it seems like each column is the same, except with some constant change in the gray-
scale. The same is true for rows. Why is this? If you focus on the left-most columns in the picture,
they almost look correct (comparing to the left-most columns in the picture on the right). Why is
this?
• The benefit of the approximation on the left is that it can be described with two vectors: a and w (n
+ m floating point numbers) while the original matrix on the right required an entire matrix (m × n
floating point numbers).
The disadvantage of the approximation on the left is that it is hard to recognize the original picture...
What if we take two vectors instead, say columns j = 0 and j = n/2, and project each of the columns
onto the subspace spanned by those two vectors?
• Partition B into columns: B = ( b0 b1 · · · bn−1 ).
• Pick A = ( a0 a1 ) = ( b0 bn/2 ).
• Projecting each column of B onto Span({a0 , a1 }) yields

  A(AT A)−1 AT B = A ( (AT A)−1 AT B ) = A W T ,  where W T = (AT A)−1 AT B.
• Notice that A and W each have two columns and AW T is the sum of two outer products:

  AW T = ( a0 a1 ) ( w0 w1 )T = ( a0 a1 ) ( wT0 ; wT1 ) = a0 wT0 + a1 wT1 .
It can be easily shown that this matrix has rank at most two, which is why AW T is called a rank-2
approximation of B.
Rank-k approximations
We can continue the above by picking progressively more columns for A. The progression of pictures in
Figure 3.1 shows the improvement as more and more columns are used, where k indicates the number of
columns.
[Figure 3.1: rank-k approximations of the picture for k = 1, 2, 10, 25, and 50, together with the original.]
Picking vectors for matrix A from the columns of the original picture does not usually yield an optimal
approximation. How then does one compute the best rank-k approximation of a matrix so that the amount
of data that must be stored best captures the matrix? This is where the Singular Value Decomposition
(SVD) comes in.
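Although the SVD is only introduced below, the idea of a best rank-k approximation can already be previewed numerically. A NumPy sketch using a synthetic low-rank "picture" (the matrix B here is an arbitrary construction, not the photo from these notes):

```python
import numpy as np

# A synthetic "picture": a matrix built from two outer products, so its
# columns are highly correlated and its rank is (at most) two.
m, n = 64, 64
i = np.arange(m, dtype=float)
j = np.arange(n, dtype=float)
B = np.outer(np.sin(i / 10.0), np.cos(j / 10.0)) \
    + 0.1 * np.outer(np.cos(i / 5.0), np.ones(n))

U, s, Vt = np.linalg.svd(B, full_matrices=False)

def rank_k(k):
    """Rank-k approximation: keep the k largest singular triplets."""
    return U[:, :k] * s[:k] @ Vt[:k, :]

# The 2-norm error drops as k grows (to essentially zero at k = 2,
# since rank(B) = 2):
for k in (1, 2, 5):
    print(k, np.linalg.norm(B - rank_k(k), 2))
```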
3.1.2 Outline
Video . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.1 Opening Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.1.1 Launch: Orthogonal projection and its application . . . . . . . . . . . . . . . . . 57
3.1.2 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.1.3 What you will learn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.2 Orthogonality and Unitary Matrices . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.3 Toward the SVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.4 The Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.5 Geometric Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.6 Consequences of the SVD Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.7 Projection onto the Column Space . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.8 Low-rank Approximation of a Matrix . . . . . . . . . . . . . . . . . . . . . . . 83
3.9 An Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.10 SVD and the Condition Number of a Matrix . . . . . . . . . . . . . . . . . . . . 86
3.11 An Algorithm for Computing the SVD? . . . . . . . . . . . . . . . . . . . . . . 87
3.12 Wrapup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.12.1 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.12.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Definition 3.2 Let q0 , q1 , . . . , qn−1 ∈ Cm . These vectors are said to be mutually orthonormal if for all
0 ≤ i, j < n
qHi q j = 1 if i = j, and qHi q j = 0 otherwise.

The definition implies that kqi k2 = √( qHi qi ) = 1 and hence each of the vectors is of unit length in addition
to being orthogonal to each other.
For n vectors of length m to be mutually orthonormal, n must be less than or equal to m. This is
because n mutually orthonormal vectors are linearly independent and there can be at most m linearly
independent vectors of length m.
A very concise way of indicating that a set of vectors are mutually orthonormal is to view them as the
columns of a matrix, which then has a very special property:
Definition 3.3 Let Q ∈ Cm×n (with n ≤ m). Then Q is said to be an orthonormal matrix if QH Q = I.
The subsequent exercise makes the connection between mutually orthonormal vectors and an orthogonal
matrix.
Homework 3.4 Let Q ∈ Cm×n (with n ≤ m). Partition Q = ( q0 q1 · · · qn−1 ). Show that Q is an
orthonormal matrix if and only if q0 , q1 , . . . , qn−1 are mutually orthonormal.
* SEE ANSWER
Definition 3.5 Let Q ∈ Cm×m . Then Q is said to be a unitary matrix if QH Q = I (the identity).
Unitary matrices are always square and only square matrices can be unitary. Sometimes the term
orthogonal matrix is used instead of unitary matrix, especially if the matrix is real valued.
Unitary matrices have some very nice properties, as captured by the following exercise.
Homework 3.6 Let Q ∈ Cm×m . Show that if Q is unitary then Q−1 = QH and QQH = I.
* SEE ANSWER
Homework 3.7 Let Q0 , Q1 ∈ Cm×m both be unitary. Show that their product, Q0 Q1 , is unitary.
* SEE ANSWER
Homework 3.8 Let Q0 , Q1 , . . . , Qk−1 ∈ Cm×m all be unitary. Show that their product, Q0 Q1 · · · Qk−1 , is
unitary.
* SEE ANSWER
Let x ∈ Cm . Then

x = QQH x = ( q0 q1 · · · qm−1 )( q0 q1 · · · qm−1 )H x
  = ( q0 q1 · · · qm−1 ) ( qH0 ; qH1 ; · · · ; qHm−1 ) x
  = ( q0 q1 · · · qm−1 ) ( qH0 x ; qH1 x ; · · · ; qHm−1 x )
  = (qH0 x)q0 + (qH1 x)q1 + · · · + (qHm−1 x)qm−1 .
• The vector

  QH x = ( qH0 x ; qH1 x ; · · · ; qHm−1 x )

  gives the coefficients when the vector x is written as a linear combination of the orthonormal vectors
  q0 , q1 , . . . , qm−1 :

  x = (qH0 x)q0 + (qH1 x)q1 + · · · + (qHm−1 x)qm−1 .
Another way of looking at this is that if q0 , q1 , . . . , qm−1 is an orthonormal basis for Cm , then any x ∈ Cm
can be written as a linear combination of these vectors:
x = α0 q0 + α1 q1 + · · · + αm−1 qm−1 .
Now,

qHi x = qHi ( α0 q0 + α1 q1 + · · · + αi−1 qi−1 + αi qi + αi+1 qi+1 + · · · + αm−1 qm−1 )
      = α0 qHi q0 + α1 qHi q1 + · · · + αi−1 qHi qi−1 + αi qHi qi + αi+1 qHi qi+1 + · · · + αm−1 qHi qm−1
      = αi ,

since qHi qi = 1 while all the other terms involve qHi q j = 0 (i ≠ j). Thus qHi x = αi , the coefficient that multiplies qi .
The point is that given a vector x and a unitary matrix Q, QH x computes the coefficients for the orthonor-
mal basis consisting of the columns of matrix Q. Unitary matrices allow one to elegantly change
between orthonormal bases.
Multiplication by a unitary matrix preserves length (since changing the basis for a vector does not
change its length when the basis is orthonormal).
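A NumPy sketch of this change of basis (illustrative; the unitary Q is obtained here from a QR factorization of a random complex matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
# An orthonormal basis for C^4, obtained from the QR factorization
# of a random complex matrix:
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(Z)          # Q is unitary: Q^H Q = I

x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
alpha = Q.conj().T @ x          # coefficients of x in the basis q_0, ..., q_3
x_rebuilt = Q @ alpha           # x = sum_i alpha_i q_i

# Rebuilding x works, and the coefficient vector has the same length as x:
print(np.allclose(x_rebuilt, x),
      np.isclose(np.linalg.norm(alpha), np.linalg.norm(x)))  # → True True
```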
Homework 3.10 Let U ∈ Cm×m and V ∈ Cn×n be unitary matrices and A ∈ Cm×n . Then
* SEE ANSWER
Homework 3.11 Let U ∈ Cm×m and V ∈ Cn×n be unitary matrices and A ∈ Cm×n . Then
* SEE ANSWER
In this section, we lay the foundation for the Singular Value Decomposition. We come very close: the
following lemma differs from the Singular Value Decomposition Theorem (discussed later in this chapter)
only in that it doesn’t yet guarantee that the diagonal elements of D are ordered from largest to smallest.

Lemma 3.12 Let A ∈ Cm×n with m ≥ n. Then there exist unitary matrices U ∈ Cm×m and V ∈ Cn×n
such that A = UDV H , where D = ( DT L 0 ; 0 0 ) with DT L = diag(δ0 , · · · , δr−1 ) and δi > 0 for 0 ≤ i < r.
• Inductive step: Assume the result is true for all matrices with 1 ≤ k < n columns. Show that it is
true for matrices with n columns.
Let A ∈ Cm×n with n ≥ 2. W.l.o.g. assume A ≠ 0, so that kAk2 ≠ 0. Let δ0 and v0 ∈ Cn have the prop-
erty that kv0 k2 = 1 and δ0 = kAv0 k2 = kAk2 . (In other words, v0 is the vector that maximizes
maxkxk2 =1 kAxk2 .) Let u0 = Av0 /δ0 . Note that ku0 k2 = 1. Choose U1 ∈ Cm×(m−1) and V1 ∈ Cn×(n−1)
so that Ũ = ( u0 U1 ) and Ṽ = ( v0 V1 ) are unitary. Then

Ũ H AṼ = ( u0 U1 )H A ( v0 V1 ) = ( uH0 Av0  uH0 AV1 ; U1H Av0  U1H AV1 )
       = ( δ0 uH0 u0  uH0 AV1 ; δ0 U1H u0  U1H AV1 ) = ( δ0  wH ; 0  B ),

where w = V1H AH u0 and B = U1H AV1 .
where w = V1H AH u0 and B = U1H AV1 . Now, we will argue that w = 0, the zero vector of appropriate
size:
δ02 = kAk22 = kŨ H AṼ k22 = max_{x ≠ 0} k( δ0 wH ; 0 B ) xk22 / kxk22

    ≥ k( δ0 wH ; 0 B )( δ0 ; w )k22 / k( δ0 ; w )k22

    = k( δ02 + wH w ; Bw )k22 / ( δ02 + wH w )

    ≥ ( δ02 + wH w )2 / ( δ02 + wH w ) = δ02 + wH w .

Thus δ02 ≥ δ02 + wH w, which means that w = 0 and

Ũ H AṼ = ( δ0 0 ; 0 B ).
By the inductive hypothesis, there exist unitary Ǔ and V̌ and a matrix Ď ∈ R(m−1)×(n−1) such that
B = Ǔ ĎV̌ H , where Ď = ( ĎT L 0 ; 0 0 ) with ĎT L = diag(δ1 , · · · , δr−1 ).
Now, let

U = Ũ ( 1 0 ; 0 Ǔ ),  V = Ṽ ( 1 0 ; 0 V̌ ),  and  D = ( δ0 0 ; 0 Ď ).

(There are some really tough to see “checks” in the definition of U, V , and D!!) Then A = UDV H
where U, V , and D have the desired properties.
• By the Principle of Mathematical Induction the result holds for all matrices A ∈ Cm×n with m ≥ n.
Homework 3.14 Assume that U ∈ Cm×m and V ∈ Cn×n are unitary matrices. Let A, B ∈ Cm×n with
B = UAV H . Show that the singular values of A equal the singular values of B.
* SEE ANSWER
Homework 3.15 Let A ∈ Cm×n with A = ( σ0 0 ; 0 B ) and assume that kAk2 = σ0 . Show that kBk2 ≤
kAk2 . (Hint: Use the SVD of B.)
* SEE ANSWER
Proof: Notice that the proof of the above theorem is identical to that of Lemma 3.12. However, thanks
to the above exercises, we can conclude that kBk2 ≤ σ0 in the proof, which then can be used to show that
the singular values are found in order.
Proof: (Alternative) An alternative proof uses Lemma 3.12 to conclude that A = UDV H . If the entries on
the diagonal of D are not ordered from largest to smallest, then this can be fixed by permuting the rows
and columns of D, and correspondingly permuting the columns of U and V .
We will now quickly illustrate what the SVD Theorem tells us about matrix-vector multiplication
(linear transformations) by examining the case where A ∈ R2×2 . Let A = UΣV T be its SVD. (Notice that
all matrices are now real valued, and hence V H = V T .) Partition
A = ( u0 u1 ) ( σ0 0 ; 0 σ1 ) ( v0 v1 )T .
Since U and V are unitary matrices, {u0 , u1 } and {v0 , v1 } form orthonormal bases for the range and domain
of A, respectively:
Now let us look at how A transforms any vector with (Euclidean) unit length. Notice that x = ( χ0 ; χ1 )
means that
x = χ0 e0 + χ1 e1 ,
where e0 and e1 are the unit basis vectors. Thus, χ0 and χ1 are the coefficients when x is expressed using
e0 and e1 as basis. However, we can also express x in the basis given by v0 and v1 :
x = VV T x = ( v0 v1 )( v0 v1 )T x = ( v0 v1 ) ( vT0 x ; vT1 x )
  = (vT0 x)v0 + (vT1 x)v1 = α0 v0 + α1 v1 = ( v0 v1 ) ( α0 ; α1 ).
Thus, in the basis formed by v0 and v1 , its coefficients are α0 and α1 . Now,
Ax = UΣV T x = ( σ0 u0 σ1 u1 ) ( v0 v1 )T x = ( σ0 u0 σ1 u1 ) ( α0 ; α1 ) = α0 σ0 u0 + α1 σ1 u1 .
This is illustrated by the following picture, which also captures the fact that the unit ball is mapped to an
“ellipse”1 with major axis equal to σ0 = kAk2 and minor axis equal to σ1 :
1 It is not clear that it is actually an ellipse and this is not important to our observations.
Finally, we show the same insights for general vector x (not necessarily of unit length).
Another observation is that if one picks the right basis for the domain and codomain, then the com-
putation Ax simplifies to a matrix multiplication with a diagonal matrix. Let us again illustrate this for
nonsingular A ∈ R2×2 with
A = ( u0 u1 ) ( σ0 0 ; 0 σ1 ) ( v0 v1 )T = U Σ V T ,

with U = ( u0 u1 ), Σ = diag(σ0 , σ1 ), and V = ( v0 v1 ).
Now, if we choose to express y using u0 and u1 as the basis and express x using v0 and v1 as the basis, then

y = UU T y = U(U T y) = (uT0 y)u0 + (uT1 y)u1 = U ( uT0 y ; uT1 y ) = U ( ψ̂0 ; ψ̂1 ) = U ŷ ,
x = VV T x = V(V T x) = (vT0 x)v0 + (vT1 x)v1 = V ( vT0 x ; vT1 x ) = V ( χ̂0 ; χ̂1 ) = V x̂ .
If y = Ax then

U(U T y) = UΣV T x = UΣx̂ ,

so that ŷ = Σx̂ and

( ψ̂0 ; ψ̂1 ) = ( σ0 χ̂0 ; σ1 χ̂1 ).
We first generalize the observations we made for A ∈ R2×2 . Let us track what the effect of Ax = UΣV H x
is on vector x. We assume that m ≥ n.
• Let U = ( u0 · · · um−1 ) and V = ( v0 · · · vn−1 ).
• Let

  x = VV H x = ( v0 · · · vn−1 )( v0 · · · vn−1 )H x = ( v0 · · · vn−1 ) ( vH0 x ; · · · ; vHn−1 x )
    = (vH0 x)v0 + · · · + (vHn−1 x)vn−1 .
This can be interpreted as follows: vector x can be written in terms of the usual basis of Cn as χ0 e0 +
· · · + χn−1 en−1 or in the orthonormal basis formed by the columns of V as (vH0 x)v0 + · · · + (vHn−1 x)vn−1 .
Ax = (vH0 x)Av0 + · · · + (vHn−1 x)Avn−1
   = (vH0 x)σ0 u0 + · · · + (vHn−1 x)σn−1 un−1
   = σ0 u0 vH0 x + · · · + σn−1 un−1 vHn−1 x
   = ( σ0 u0 vH0 + · · · + σn−1 un−1 vHn−1 ) x .
Proof:

A = UΣV H = ( UL UR ) ( ΣT L 0 ; 0 0 ) ( VL VR )H = UL ΣT LVLH .
Homework 3.19 Let A ∈ Cm×n . Use the SVD of A to show that kAk2 = kAT k2 .
* SEE ANSWER
Any matrix with rank r can be written as the sum of r rank-one matrices:
Then

A = σ0 u0 vH0 + σ1 u1 vH1 + · · · + σr−1 ur−1 vHr−1 .

(Each term is a nonzero matrix and an outer product, and hence a rank-1 matrix.)
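This rank-one expansion is easy to verify numerically. A NumPy sketch (illustrative; the matrix is an arbitrary random example):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A as a sum of rank-one matrices sigma_i u_i v_i^H:
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print(np.allclose(A_sum, A))  # → True
```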
Proof:
• Let y ∈ C (A). Then there exists x ∈ Cn such that y = Ax (by the definition of y ∈ C (A)). But then

  y = Ax = UL ΣT LVLH x = UL z ,  where z = ΣT LVLH x ,

  so that y ∈ C (UL ).
Corollary 3.22 Let A = UL ΣT LVLH be the reduced SVD of A where UL and VL have r columns. Then the
rank of A is r.
Proof: The rank of A equals the dimension of C (A) = C (UL ). But the dimension of C (UL ) is clearly r.
Proof:
• Let x ∈ N (A). Then

  x = VV H x = ( VL VR )( VL VR )H x = ( VL VR ) ( VLH x ; VRH x ) = VLVLH x + VRVRH x .
If we can show that VLH x = 0 then x = VR z where z = VRH x. Assume that VLH x 6= 0. Then ΣT L (VLH x) 6=
0 (since ΣT L is nonsingular) and UL (ΣT L (VLH x)) 6= 0 (since UL has linearly independent columns).
But that contradicts the fact that Ax = UL ΣT LVLH x = 0.
Corollary 3.24 For all x ∈ Cn there exists z ∈ C (VL ) such that Ax = Az.
Proof:

Ax = A VV H x = A ( VL VR )( VL VR )H x = A ( VLVLH x + VRVRH x ) = AVLVLH x + AVRVRH x = AVLVLH x ,

since the columns of VR lie in N (A), so that AVRVRH x = 0. Thus we can take z = VLVLH x ∈ C (VL ).
[Figure: the fundamental spaces of A. In the domain Cn : N (A) = C (VR ), of dimension n − r. In the
codomain Cm : N (AH ) = C (UR ), of dimension m − r.]
in x. (Note: for this permutation matrix, PT = P. In general, this is not the case. What is the case
for all permutation matrices P is that PT P = PPT = I.)
4. kA−1 k2 = 1/σn−1 .
Proof: The only item that is less than totally obvious is (3). Clearly A−1 = V Σ−1U H . The problem is that
in Σ−1 the diagonal entries are not ordered from largest to smallest. The permutation fixes this.
Corollary 3.28 If A ∈ Cm×n has linearly independent columns then AH A is invertible (nonsingular) and
−1 H
(AH A)−1 = VL Σ2T L VL .
Proof: Since A has linearly independent columns, A = UL ΣT LVLH is the reduced SVD where UL has n
columns and VL is unitary. Hence
AH A = ( UL ΣT LVLH )H UL ΣT LVLH = VL ΣHT L ULH UL ΣT LVLH = VL ΣHT L ΣT LVLH = VL Σ2T L VLH .

Since VL is unitary and ΣT L is diagonal with nonzero diagonal entries, they are both nonsingular. Thus

( VL Σ2T L VLH ) ( VL (Σ2T L )−1 VLH ) = I.
Theorem 3.30 Let UL ∈ Cm×k have orthonormal columns. The projection of y onto C (UL ) is given by
ULULH y.
Proof: The component of y in C (UL ) is the vector UL w, where w ∈ Ck minimizes kUL w − yk2 . Choose
UR ∈ Cm×(m−k) so that ( UL UR ) is unitary. Then, writing ( u ; v ) for the vector that stacks u on top of v,

min_{w∈Ck} kUL w − yk22
  = min_{w∈Ck} k( UL UR )H (UL w − y)k22           (multiplication by a unitary matrix preserves length)
  = min_{w∈Ck} k( ULH UL w ; URH UL w ) − ( ULH y ; URH y )k22
  = min_{w∈Ck} k( w ; 0 ) − ( ULH y ; URH y )k22
  = min_{w∈Ck} ( kw − ULH yk22 + k−URH yk22 )      (since k( u ; v )k22 = kuk22 + kvk22 )
  = ( min_{w∈Ck} kw − ULH yk22 ) + kURH yk22 .

This is minimized when w = ULH y. Thus, the vector that is closest to y in the space spanned by the columns
of UL is given by UL w = ULULH y.
Corollary 3.31 Let A ∈ Cm×n and A = UL ΣT LVLH be its reduced SVD. Then the projection of y ∈ Cm onto
C (A) is given by ULULH y.
Proof: This follows immediately from the fact that C (A) = C (UL ).
Corollary 3.32 Let A ∈ Cm×n have linearly independent columns. Then the projection of y ∈ Cm onto
C (A) is given by A(AH A)−1 AH y.
Proof: From Corollary 3.28, we know that AH A is nonsingular and that (AH A)−1 = VL (Σ2T L )−1 VLH . Now,

A(AH A)−1 AH y = ( UL ΣT LVLH )( VL (Σ2T L )−1 VLH )( UL ΣT LVLH )H y
              = UL ΣT L VLH VL Σ−1T L Σ−1T L VLH VL ΣT L ULH y = ULULH y.
Definition 3.33 Let A have linearly independent columns. Then (AH A)−1 AH is called the pseudo-inverse
or Moore-Penrose generalized inverse of matrix A.
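A NumPy sketch relating this formula to the SVD (illustrative; np.linalg.pinv computes the Moore-Penrose pseudo-inverse via the SVD):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3))   # linearly independent columns (w.p. 1)
y = rng.standard_normal(6)

# The pseudo-inverse via the normal-equations formula...
pinv_formula = np.linalg.inv(A.conj().T @ A) @ A.conj().T
# ...agrees with the SVD-based pseudo-inverse:
print(np.allclose(pinv_formula, np.linalg.pinv(A)))  # → True

# And A (A^H A)^{-1} A^H y is the projection of y onto C(A) = C(U_L):
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # U plays the role of U_L
print(np.allclose(A @ pinv_formula @ y, U @ U.conj().T @ y))  # → True
```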
Theorem 3.34 Let A ∈ Cm×n have SVD A = UΣV H and assume A has rank r. Partition
U = ( UL UR ), V = ( VL VR ), and Σ = ( ΣT L 0 ; 0 ΣBR ),
where UL ∈ Cm×k , VL ∈ Cn×k , and ΣT L ∈ Rk×k with k ≤ r. Then B = UL ΣT LVLH is the matrix in Cm×n
closest to A in the following sense:
Proof:

kA − Bk2 = kU H (A − B)V k2 = kU H AV −U H BV k2
  = k ( ΣT L 0 ; 0 ΣBR ) − ( UL UR )H B ( VL VR ) k2
  = k ( ΣT L 0 ; 0 ΣBR ) − ( ΣT L 0 ; 0 0 ) k2
  = k ( 0 0 ; 0 ΣBR ) k2 = kΣBR k2 = σk .
Next, assume that C has rank t ≤ k and kA − Ck2 < kA − Bk2 . We will show that this leads to a
contradiction.
• The null space of C has dimension at least n − k since dim(N (C)) + rank(C) = n.
• If x ∈ N (C) then
kAxk2 = k(A −C)xk2 ≤ kA −Ck2 kxk2 < σk kxk2 .
• Partition U = ( u0 · · · um−1 ) and V = ( v0 · · · vn−1 ). Then kAv j k2 = kσ j u j k2 = σ j ≥ σk
for j = 0, . . . , k. Now, let x be any linear combination of v0 , . . . , vk : x = α0 v0 + · · · + αk vk . Notice
that

kxk22 = kα0 v0 + · · · + αk vk k22 = |α0 |2 + · · · + |αk |2 .
Then

kAxk22 = kα0 σ0 u0 + · · · + αk σk uk k22 = |α0 |2 σ02 + · · · + |αk |2 σk2 ≥ σk2 ( |α0 |2 + · · · + |αk |2 ) = σk2 kxk22 ,

so that kAxk2 ≥ σk kxk2 . In other words, vectors in the subspace of all linear combinations of
{v0 , . . . , vk } satisfy kAxk2 ≥ σk kxk2 . The dimension of this subspace is k + 1 (since {v0 , · · · , vk }
form an orthonormal basis).
• Both these subspaces are subspaces of Cn . Since their dimensions add up to more than n there must
be at least one nonzero vector z that satisfies both kAzk2 < σk kzk2 and kAzk2 ≥ σk kzk2 , which is a
contradiction.
The above theorem tells us how to pick the best approximation to a given matrix of a given desired
rank. In Section 3.1.1 we discussed how a low rank matrix can be used to compress data. The SVD thus
gives the best such rank-k approximation. In the next section, we revisit this.
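The theorem is easy to check for a diagonal matrix, whose SVD is immediate: keeping the k largest diagonal entries and zeroing the rest gives the best rank-k approximation, and the 2-norm error is the first discarded singular value. A small pure-Python sketch (the function name is ours):

```python
def best_rank_k_diag(sigmas, k):
    """Best rank-k approximation of diag(sigmas), with sigmas sorted
    in non-increasing order: keep the k largest singular values."""
    approx = sigmas[:k] + [0.0] * (len(sigmas) - k)
    # For diagonal matrices, the 2-norm of the difference is the largest
    # remaining diagonal entry, i.e. sigma_k (0-indexed), as Theorem 3.34 states.
    error = max(abs(s - t) for s, t in zip(sigmas, approx))
    return approx, error

approx, err = best_rank_k_diag([3.0, 2.0, 1.0], 2)
```

Here the best rank-2 approximation of diag(3, 2, 1) is diag(3, 2, 0) and the error equals σ2 = 1.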
3.9 An Application
Let us revisit data compression. Let Y ∈ Rm×n be a matrix that, for example, stores a picture, where the (i, j) entry of Y is a number that represents the grayscale value of pixel (i, j). The following instructions, executed in Octave or MATLAB, generate the picture of Mexican artist Frida Kahlo in Figure 3.3 (top-left). The file FridaPNG.png can be found in Programming/chapter03/.
octave> IMG = imread( 'FridaPNG.png' ); % this reads the image
octave> Y = IMG( :,:,1 );
octave> imshow( Y ) % this displays the image
Although the picture is black and white, it is read as if it were a color image, which means an m × n × 3 array of pixel information is stored. Setting Y = IMG( :,:,1 ) extracts a single matrix of pixel information. (If you start with a color picture, you will want to approximate IMG( :,:,1 ), IMG( :,:,2 ), and IMG( :,:,3 ) separately.)
Now, let Y = UΣV^T be the SVD of matrix Y. Partition, conformally,
\[
U = \begin{pmatrix} U_L & U_R \end{pmatrix},\quad V = \begin{pmatrix} V_L & V_R \end{pmatrix},\quad\text{and}\quad \Sigma = \begin{pmatrix} \Sigma_{TL} & 0 \\ 0 & \Sigma_{BR} \end{pmatrix}.
\]
[Figure 3.3: The picture and its rank-k approximations for k = 2, 5, 10, 25.]
where κ2 (A) = kAk2 kA−1 k2 is the condition number of A, using the 2-norm.
If we go back to the example of A ∈ R2×2, recall the picture that shows how A transforms the unit circle, mapping the domain of A (R2, left) to the range (codomain) of A (R2, right).
In this case, the ratio σ0 /σn−1 represents the ratio between the major and minor axes of the “ellipse” on
the right. Notice that the more elongated the “ellipse” on the right is, the worse (larger) the condition
number.
3.12 Wrapup
Homework 3.37 Let A ∈ Cn×n have singular values σ0 ≥ σ1 ≥ · · · ≥ σn−1 (where σn−1 may equal zero).
Prove that σn−1 ≤ kAxk2 /kxk2 ≤ σ0 (assuming x 6= 0).
* SEE ANSWER
Homework 3.38 Let A ∈ Cn×n have the property that for all vectors x ∈ Cn it holds that kAxk2 = kxk2 .
Use the SVD to prove that A is unitary.
* SEE ANSWER
3.12.2 Summary
Chapter 4
Notes on Gram-Schmidt QR Factorization
A classic problem in linear algebra is the computation of an orthonormal basis for the space spanned by a given set of linearly independent vectors: Given a linearly independent set of vectors {a0, . . . , an−1} ⊂ Cm, we would like to find a set of mutually orthonormal vectors {q0, . . . , qn−1} ⊂ Cm so that Span({a0, . . . , an−1}) = Span({q0, . . . , qn−1}).
* YouTube
* Download from UT Box
* View After Local Download
4.1.1 Launch
4.1.2 Outline
The process proceeds as described in Figure 4.1 and in the algorithms in Figure 4.2.
Homework 4.1 What happens in the Gram-Schmidt algorithm if the columns of A are NOT linearly in-
dependent? How might one fix this? How can the Gram-Schmidt algorithm be used to identify which
columns of A are linearly independent?
* SEE ANSWER
Homework 4.2 Convince yourself that the relation between the vectors {a_j} and {q_j} in the algorithms in Figure 4.2 is given by
\[
\begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix} = \begin{pmatrix} q_0 & q_1 & \cdots & q_{n-1} \end{pmatrix}
\begin{pmatrix} \rho_{0,0} & \rho_{0,1} & \cdots & \rho_{0,n-1} \\ 0 & \rho_{1,1} & \cdots & \rho_{1,n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \rho_{n-1,n-1} \end{pmatrix},
\]
where
\[
q_i^Hq_j = \begin{cases} 1 & \text{for } i = j \\ 0 & \text{otherwise} \end{cases}
\quad\text{and}\quad
\rho_{i,j} = \begin{cases} q_i^Ha_j & \text{for } i < j \\ \left\|a_j - \sum_{k=0}^{j-1}\rho_{k,j}q_k\right\|_2 & \text{for } i = j \\ 0 & \text{otherwise.} \end{cases}
\]
* SEE ANSWER
Thus, this relationship between the linearly independent vectors {a j } and the orthonormal vectors {q j }
can be concisely stated as
A = QR,
where A and Q are m × n matrices (m ≥ n), QH Q = I, and R is an n × n upper triangular matrix.
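The relation A = QR with Q^H Q = I can be checked numerically with a short pure-Python sketch of the algorithm (real case; a matrix is represented as a list of its columns, and the helper names are ours):

```python
import math

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def cgs(A):
    """Classical Gram-Schmidt on the columns of A (a list of columns).
    Returns Q (orthonormal columns) and upper triangular R with A = Q R."""
    n = len(A)
    Q, R = [], [[0.0] * n for _ in range(n)]
    for j, a in enumerate(A):
        v = list(a)
        for k in range(j):
            R[k][j] = dot(Q[k], a)          # rho_{k,j} := q_k^T a_j
            v = [vi - R[k][j] * qi for vi, qi in zip(v, Q[k])]
        R[j][j] = math.sqrt(dot(v, v))      # rho_{j,j} := ||a_j perp||_2
        Q.append([vi / R[j][j] for vi in v])
    return Q, R

Q, R = cgs([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
```

For this input, q0 = a0 and q1 is the second unit basis vector, with R recording the coefficients that reconstruct a1 = q0 ρ01 + q1 ρ11.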
Theorem 4.3 Let A have linearly independent columns, A = QR where A, Q ∈ Cm×n with n ≤ m, R ∈ Cn×n ,
QH Q = I, and R is an upper triangular matrix with nonzero diagonal entries. Then, for 0 < k < n, the first
k columns of A span the same space as the first k columns of Q.
Proof: Partition
\[
A \rightarrow \begin{pmatrix} A_L & A_R \end{pmatrix},\quad Q \rightarrow \begin{pmatrix} Q_L & Q_R \end{pmatrix},\quad\text{and}\quad R \rightarrow \begin{pmatrix} R_{TL} & R_{TR} \\ 0 & R_{BR} \end{pmatrix},
\]
where AL, QL ∈ Cm×k and RTL ∈ Ck×k. Then RTL is nonsingular (since it is upper triangular and has no zero on its diagonal), QL^H QL = I, and AL = QL RTL. We want to show that C(AL) = C(QL):
4.2 Classical Gram-Schmidt (CGS) Process
[Figure 4.1: The first steps of Gram-Schmidt orthogonalization, with comments.]
• ρ0,0 := ‖a0‖2. (Compute the length of vector a0.)
• q0 := a0/ρ0,0. (Create a unit vector in the direction of a0. Clearly, Span({a0}) = Span({q0}). Why?)
• ρ0,1 := q0^H a1; a1⊥ := a1 − ρ0,1 q0. (Compute a1⊥, the component of vector a1 orthogonal to q0.)
• ρ1,1 := ‖a1⊥‖2. (Compute ρ1,1, the length of a1⊥.)
• q1 := a1⊥/ρ1,1. (Create a unit vector in the direction of a1⊥. Now q0 and q1 are mutually orthonormal and Span({a0, a1}) = Span({q0, q1}). Why?)
• ρ0,2 := q0^H a2; ρ1,2 := q1^H a2; a2⊥ := a2 − ρ0,2 q0 − ρ1,2 q1. (Compute a2⊥, the component of vector a2 orthogonal to q0 and q1.)
• ρ2,2 := ‖a2⊥‖2. (Compute ρ2,2, the length of a2⊥.)
• q2 := a2⊥/ρ2,2. (Create a unit vector in the direction of a2⊥.)
[Figure 4.2: The (classical) Gram-Schmidt algorithm, presented with explicit inner loops (left) and with the inner loops expressed as matrix-vector operations (right).]

for j = 0, . . . , n − 1
    for k = 0, . . . , j − 1
        ρk,j := qk^H aj
    end
    a⊥j := aj
    for k = 0, . . . , j − 1
        a⊥j := a⊥j − ρk,j qk
    end
    ρj,j := ‖a⊥j‖2
    qj := a⊥j / ρj,j
end

for j = 0, . . . , n − 1
    (ρ0,j, . . . , ρj−1,j)^T := (q0 · · · qj−1)^H aj
    a⊥j := aj − (q0 · · · qj−1)(ρ0,j, . . . , ρj−1,j)^T
    ρj,j := ‖a⊥j‖2
    qj := a⊥j / ρj,j
end
• We first show that C(AL) ⊆ C(QL). Let y ∈ C(AL). Then there exists x ∈ Ck such that y = AL x. But then y = QL z, where z = RTL x, which means that y ∈ C(QL). Hence C(AL) ⊆ C(QL).
• We next show that C(QL) ⊆ C(AL). Let y ∈ C(QL). Then there exists z ∈ Ck such that y = QL z. But then y = AL x, where x = RTL^{-1} z, from which we conclude that y ∈ C(AL). Hence C(QL) ⊆ C(AL).
Since C(AL) ⊆ C(QL) and C(QL) ⊆ C(AL), we conclude that C(QL) = C(AL).
Theorem 4.4 Let A ∈ Cm×n have linearly independent columns. Then there exist Q ∈ Cm×n with QH Q = I
and upper triangular R with no zeroes on the diagonal such that A = QR. This is known as the QR
factorization. If the diagonal elements of R are chosen to be real and positive, the QR factorization is
unique.
Proof: (By induction). Note that n ≤ m since A has linearly independent columns.
• Base case: n = 1. In this case A = (a0), where a0 is its only column. Since A has linearly independent columns, a0 ≠ 0. Then A = (a0) = (q0)(ρ00), where ρ00 = ‖a0‖2 and q0 = a0/ρ00, so that Q = (q0) and R = (ρ00).
• Inductive step: Assume that the result is true for all A with n − 1 linearly independent columns. We
will show it is true for A ∈ Cm×n with linearly independent columns.
Let A ∈ Cm×n. Partition A → (A0 a1). By the induction hypothesis, there exist Q0 and R00 such that Q0^H Q0 = I, R00 is upper triangular with nonzero diagonal entries, and A0 = Q0 R00. Now,
[Figure 4.3: (Classical) Gram-Schmidt algorithm for computing the QR factorization of a matrix A.]
compute r01 = Q0^H a1 and a1⊥ = a1 − Q0 r01, the component of a1 orthogonal to C(Q0). Because the columns of A are linearly independent, a1⊥ ≠ 0. Let ρ11 = ‖a1⊥‖2 and q1 = a1⊥/ρ11. Then
\[
\begin{pmatrix} Q_0 & q_1 \end{pmatrix}\begin{pmatrix} R_{00} & r_{01} \\ 0 & \rho_{11} \end{pmatrix}
= \begin{pmatrix} Q_0R_{00} & Q_0r_{01} + q_1\rho_{11} \end{pmatrix}
= \begin{pmatrix} A_0 & Q_0r_{01} + a_1^{\perp} \end{pmatrix}
= \begin{pmatrix} A_0 & a_1 \end{pmatrix} = A.
\]
Hence
\[
Q = \begin{pmatrix} Q_0 & q_1 \end{pmatrix}\quad\text{and}\quad R = \begin{pmatrix} R_{00} & r_{01} \\ 0 & \rho_{11} \end{pmatrix}.
\]
• By the Principle of Mathematical Induction the result holds for all matrices A ∈ Cm×n with m ≥ n.
The proof motivates the algorithm in Figure 4.3 (left) in FLAME notation.¹
An alternative way of motivating that algorithm is as follows: Consider A = QR. Partition A, Q, and R to yield
\[
\begin{pmatrix} A_0 & a_1 & A_2 \end{pmatrix} = \begin{pmatrix} Q_0 & q_1 & Q_2 \end{pmatrix}
\begin{pmatrix} R_{00} & r_{01} & R_{02} \\ 0 & \rho_{11} & r_{12}^T \\ 0 & 0 & R_{22} \end{pmatrix}.
\]
Assume that Q0 and R00 have already been computed. Since corresponding columns of both sides must be
equal, we find that
a1 = Q0 r01 + q1 ρ11 . (4.1)
Also, Q0^H Q0 = I and Q0^H q1 = 0, since the columns of Q are mutually orthonormal. Hence
\[
Q_0^Ha_1 = Q_0^HQ_0r_{01} + Q_0^Hq_1\rho_{11} = r_{01}.
\]
This shows how r01 can be computed from Q0 and a1, which are already known. Next, a1⊥ = a1 − Q0 r01 is computed from (4.1). This is the component of a1 that is perpendicular to the columns of Q0. We know it is nonzero since the columns of A are linearly independent. Since ρ11 q1 = a1⊥ and we know that q1 has unit length, we now compute ρ11 = ‖a1⊥‖2 and q1 = a1⊥/ρ11, which completes a derivation of the algorithm in Figure 4.3.
Homework 4.5 Let A have linearly independent columns and let A = QR be a QR factorization of A. Partition
\[
A \rightarrow \begin{pmatrix} A_L & A_R \end{pmatrix},\quad Q \rightarrow \begin{pmatrix} Q_L & Q_R \end{pmatrix},\quad\text{and}\quad R \rightarrow \begin{pmatrix} R_{TL} & R_{TR} \\ 0 & R_{BR} \end{pmatrix},
\]
where AL and QL have k columns and RTL is k × k. Show that
1. AL = QL RTL,
2. C(AL) = C(QL): the first k columns of Q form an orthonormal basis for the space spanned by the first k columns of A,
3. RTR = QL^H AR,
4. (AR − QL RTR)^H QL = 0,
5. AR − QL RTR = QR RBR, and
6. C(AR − QL RTR) = C(QR).
* SEE ANSWER
¹ The FLAME notation should be intuitively obvious. If it is not, you may want to review the earlier weeks in Linear Algebra: Foundations to Frontiers - Notes to LAFF With.
4.3 Modified Gram-Schmidt (MGS) Process
[y⊥, r] = Proj orthog to QCGS(Q, y) (used by classical Gram-Schmidt):
    y⊥ = y
    for i = 0, . . . , k − 1
        ρi := qi^H y
        y⊥ := y⊥ − ρi qi
    endfor

[y⊥, r] = Proj orthog to QMGS(Q, y) (used by modified Gram-Schmidt):
    y⊥ = y
    for i = 0, . . . , k − 1
        ρi := qi^H y⊥
        y⊥ := y⊥ − ρi qi
    endfor

Figure 4.4: Two different ways of computing y⊥ = (I − QQ^H)y, the component of y orthogonal to C(Q), where Q has k orthonormal columns.
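The two loops of Figure 4.4 differ only in which vector the inner product uses. A direct pure-Python transcription (real case; function names follow the figure but the representation, columns as lists, is ours):

```python
def proj_orthog_to_Q_cgs(Q, y):
    """CGS variant: inner products use the original y."""
    yp, r = list(y), []
    for q in Q:
        rho = sum(qi * yi for qi, yi in zip(q, y))      # rho_i := q_i^T y
        r.append(rho)
        yp = [ypi - rho * qi for ypi, qi in zip(yp, q)]
    return yp, r

def proj_orthog_to_Q_mgs(Q, y):
    """MGS variant: inner products use the updated y_perp."""
    yp, r = list(y), []
    for q in Q:
        rho = sum(qi * ypi for qi, ypi in zip(q, yp))   # rho_i := q_i^T y_perp
        r.append(rho)
        yp = [ypi - rho * qi for ypi, qi in zip(yp, q)]
    return yp, r

# With exactly orthonormal columns both yield the same component of y
# orthogonal to C(Q); they differ only once rounding enters.
Q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y = [1.0, 2.0, 3.0]
yp_c, r_c = proj_orthog_to_Q_cgs(Q, y)
yp_m, r_m = proj_orthog_to_Q_mgs(Q, y)
```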
This can be computed by the algorithm in Figure 4.4 (left) and is used by what is often called the Classical Gram-Schmidt (CGS) algorithm given in Figure 4.3. An alternative algorithm for computing y⊥ is given in Figure 4.4 (right) and is used by the Modified Gram-Schmidt (MGS) algorithm, also given in Figure 4.5. This approach is mathematically equivalent to the one on the left: in exact arithmetic, qi^H y⊥ equals qi^H y, as shown later in this chapter.
[Figure 4.5: Left: Classical Gram-Schmidt algorithm. Middle: Modified Gram-Schmidt algorithm. Right: Modified Gram-Schmidt algorithm in which, every time a new column of Q, q1, is computed, the components of all future columns in the direction of this new vector are subtracted out.]
(a) MGS algorithm that computes Q and R from A:
for j = 0, . . . , n − 1
    a⊥j := aj
    for k = 0, . . . , j − 1
        ρk,j := qk^H a⊥j
        a⊥j := a⊥j − ρk,j qk
    end
    ρj,j := ‖a⊥j‖2
    qj := a⊥j / ρj,j
end

(b) MGS algorithm that computes Q and R from A, overwriting A with Q:
for j = 0, . . . , n − 1
    for k = 0, . . . , j − 1
        ρk,j := ak^H aj
        aj := aj − ρk,j ak
    end
    ρj,j := ‖aj‖2
    aj := aj / ρj,j
end

(c) MGS algorithm that normalizes the jth column to have unit length to compute qj (overwriting aj with the result) and then subtracts the component in the direction of qj off the rest of the columns (aj+1, . . . , an−1):
for j = 0, . . . , n − 1
    ρj,j := ‖aj‖2
    aj := aj / ρj,j
    for k = j + 1, . . . , n − 1
        ρj,k := aj^H ak
        ak := ak − ρj,k aj
    end
end

(d) Slight modification of the algorithm in (c) that computes ρj,k in a separate loop:
for j = 0, . . . , n − 1
    ρj,j := ‖aj‖2
    aj := aj / ρj,j
    for k = j + 1, . . . , n − 1
        ρj,k := aj^H ak
    end
    for k = j + 1, . . . , n − 1
        ak := ak − ρj,k aj
    end
end

(e) Algorithm in (d) rewritten without loops:
for j = 0, . . . , n − 1
    ρj,j := ‖aj‖2
    aj := aj / ρj,j
    (ρj,j+1 · · · ρj,n−1) := aj^H (aj+1 · · · an−1)
    (aj+1 · · · an−1) := (aj+1 − ρj,j+1 aj · · · an−1 − ρj,n−1 aj)
end

(f) Algorithm in (e) rewritten to expose the row-vector-times-matrix multiplication aj^H (aj+1 · · · an−1) and the rank-1 update (aj+1 · · · an−1) − aj (ρj,j+1 · · · ρj,n−1):
for j = 0, . . . , n − 1
    ρj,j := ‖aj‖2
    aj := aj / ρj,j
    (ρj,j+1 · · · ρj,n−1) := aj^H (aj+1 · · · an−1)
    (aj+1 · · · an−1) := (aj+1 · · · an−1) − aj (ρj,j+1 · · · ρj,n−1)
end

Figure 4.7: Modified Gram-Schmidt algorithm for computing the QR factorization of a matrix A.
\[
y^{\perp} = y - (q_0^Hy)q_0 - \cdots - (q_{i-1}^Hy)q_{i-1},
\]
leaving us with
\[
y^{\perp} = y - (q_0^Hy)q_0 - \cdots - (q_{i-1}^Hy)q_{i-1} - (q_i^Hy)q_i.
\]
Now, notice that
\[
q_i^H\left(y - (q_0^Hy)q_0 - \cdots - (q_{i-1}^Hy)q_{i-1}\right)
= q_i^Hy - (q_0^Hy)\underbrace{q_i^Hq_0}_{0} - \cdots - (q_{i-1}^Hy)\underbrace{q_i^Hq_{i-1}}_{0}
= q_i^Hy.
\]
4.4 In Practice, MGS is More Accurate
If we now ask the question "Are the columns of Q orthonormal?" we can check this by computing Q^H Q, which should equal I, the identity. But
\[
Q^HQ = \begin{pmatrix} 1 & 0 & 0 \\ \epsilon & -\frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \\ 0 & \frac{\sqrt{2}}{2} & 0 \\ 0 & 0 & \frac{\sqrt{2}}{2} \end{pmatrix}^H
\begin{pmatrix} 1 & 0 & 0 \\ \epsilon & -\frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \\ 0 & \frac{\sqrt{2}}{2} & 0 \\ 0 & 0 & \frac{\sqrt{2}}{2} \end{pmatrix}
= \begin{pmatrix} 1+\epsilon^2 & -\frac{\sqrt{2}}{2}\epsilon & -\frac{\sqrt{2}}{2}\epsilon \\ -\frac{\sqrt{2}}{2}\epsilon & 1 & \frac{1}{2} \\ -\frac{\sqrt{2}}{2}\epsilon & \frac{1}{2} & 1 \end{pmatrix}.
\]
Clearly, the second and third columns of Q are not mutually orthonormal. What is going on? The answer lies with how a2⊥ is computed in the last step of each of the algorithms.
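The loss of orthogonality is easy to reproduce by running both algorithms on a matrix of this form with ε = √εmach, so that 1 + ε² rounds to 1. A pure-Python sketch (columns stored as lists; the function names and the parameter modified are ours):

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(A, modified):
    """Orthonormalize the columns of A. The inner products use the
    updated vector (modified=True, MGS) or the original column
    (modified=False, CGS)."""
    Q = []
    for a in A:
        v = list(a)
        for q in Q:
            rho = dot(q, v) if modified else dot(q, a)
            v = [vi - rho * qi for vi, qi in zip(v, q)]
        nrm = math.sqrt(dot(v, v))
        Q.append([vi / nrm for vi in v])
    return Q

eps = math.sqrt(2.0 ** -52)           # 1 + eps**2 rounds to 1 in double precision
A = [[1.0, eps, 0.0, 0.0],            # 4 x 3 matrix: first row all ones,
     [1.0, 0.0, eps, 0.0],            # eps on the "diagonal" below
     [1.0, 0.0, 0.0, eps]]
Q_cgs = gram_schmidt(A, modified=False)
Q_mgs = gram_schmidt(A, modified=True)
# |q1^T q2| comes out near 1/2 for CGS but is tiny for MGS.
```

This reproduces, in a few lines, the Q^H Q computed above: CGS loses orthogonality between the second and third columns entirely, while MGS keeps them orthogonal to roughly machine precision.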
Figure 4.8: Execution of the CGS algorithm (left) and MGS algorithm (right) on the example in Eqn. (4.2).
4.5 Cost
Let us examine the cost of computing the QR factorization of an m × n matrix A. We will count a multiply and an add each as one floating point operation.
We start by reviewing the cost, in floating point operations (flops), of various vector-vector and matrix-
vector operations:
Now, consider the algorithms in Figure 4.5. Notice that the columns of A are of size m. During the kth
iteration (0 ≤ k < n), A0 has k columns and A2 has n − k − 1 columns.
For the CGS algorithm (Figure 4.5, left), the cost is
\[
\sum_{k=0}^{n-1}\left[2mk + 2mk + 2m + m\right] = \sum_{k=0}^{n-1}\left[3m + 4mk\right] = 3mn + 4m\sum_{k=0}^{n-1}k \approx 3mn + 4m\frac{n^2}{2} = 3mn + 2mn^2 \approx 2mn^2\ \text{flops},
\]
since ∑_{k=0}^{n−1} k = n(n − 1)/2 ≈ n²/2 and 3mn is of lower order. For the MGS algorithm (Figure 4.5, right), the cost is
\[
\sum_{k=0}^{n-1}\left[2m + m + 2m(n-k-1) + 2m(n-k-1)\right] = \sum_{k=0}^{n-1}\left[3m + 4m(n-k-1)\right] = 3mn + 4m\sum_{i=0}^{n-1}i \approx 3mn + 2mn^2 \approx 2mn^2\ \text{flops},
\]
using the change of variable i = n − k − 1. Thus the two algorithms have (approximately) the same cost.
4.6 Wrapup
Homework 4.7 Implement MGS unb var1 using MATLAB and Spark. You will want to use the laff
routines summarized in Appendix B. (I’m not sure if you can visualize the algorithm with PictureFLAME.
Try it!)
* SEE ANSWER
4.6.2 Summary
Chapter 5
Notes on the FLAME APIs
Video
Read disclaimer regarding the videos in the preface!
No video.
5.0.1 Outline
Video . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.0.1 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.2 Install FLAME@lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.3 An Example: Gram-Schmidt Orthogonalization . . . . . . . . . . . . . . . . . . 108
5.3.1 The Spark Webpage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.3.2 Implementing CGS with FLAME@lab . . . . . . . . . . . . . . . . . . . . . . . 109
5.3.3 Editing the code skeleton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.3.4 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.4 Implementing the Other Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.1 Motivation
In the course so far, we have frequently used the “FLAME Notation” to express linear algebra algorithms.
In this note we show how to translate such algorithms into code, using various QR factorization algorithms
as examples.
[y⊥ , r] = Proj orthog to QCGS (Q, y) [y⊥ , r] = Proj orthog to QMGS (Q, y)
(used by classical Gram-Schmidt) (used by modified Gram-Schmidt)
y⊥ = y y⊥ = y
for i = 0, . . . , k − 1 for i = 0, . . . , k − 1
ρi := qH ρi := qH ⊥
i y i y
y⊥ := y⊥ − ρi qi y⊥ := y⊥ − ρi qi
endfor endfor
Figure 5.1: Two different ways of computing y⊥ = (I − QQH )y, the component of y orthogonal to C (Q),
where Q has k orthonormal columns.
careful spacing that is required. For this reason, we created a webpage that creates a "code skeleton". We call this page the "Spark" page:
http://www.cs.utexas.edu/users/flame/Spark/.
When you open the link, you will get a page that looks something like the picture in Figure 5.3.
[Figure 5.2: Left: Classical Gram-Schmidt algorithm. Middle: Modified Gram-Schmidt algorithm. Right: Modified Gram-Schmidt algorithm in which, every time a new column of Q, q1, is computed, the components of all future columns in the direction of this new vector are subtracted out.]
To the left of the menu, you now find what we call a code skeleton for the implementation, as shown in
Figure 5.5. In Figure 5.6 we show the algorithm and generated code skeleton side-by-side.
5.3 An Example: Gram-Schmidt Orthogonalization
which happens to be correct. (When you implement the Householder QR factorization, you may not
be so lucky...)
The "update" statements: The Spark webpage can't guess what the actual updates to the various parts of matrices A and R should be. It fills in
% update line 1 %
%      :        %
% update line n %
which one replaces with appropriate M-script code:
r01 := A0^H a1        r01 = A0' * a1;
a1 := a1 − A0 r01     a1 = a1 - A0 * r01;
ρ11 := ‖a1‖2          rho11 = norm( a1 );
a1 := a1/ρ11          a1 = a1 / rho11;
[Figure 5.4: Left: Classical Gram-Schmidt algorithm, whose update steps are
r01 := A0^H a1
a1 := a1 − A0 r01
ρ11 := ‖a1‖2
a1 := a1/ρ11.
Right: Generated code skeleton for CGS.]
(Notice: if one forgets the ";", the result of each assignment will be printed by Matlab/Octave when the code is executed.)
At this point, one saves the resulting code in the file CGS unb var1.m. The “.m” ending is important
since the name of the file is used to find the routine when using Matlab/Octave.
5.3.4 Testing
To now test the routine, one starts octave and, for example, executes the commands
> A = rand( 5, 4 )
> R = zeros( 4, 4 )
> [ Q, R ] = CGS_unb_var1( A, R )
> A - Q * triu( R )
[Figure 5.5: The Spark webpage filled out for CGS Variant 1.]
(The first time you execute the above, you may get a bunch of warnings from Octave. Just ignore
those.)
• Modified Gram Schmidt algorithm, (MGS unb var1, corresponding to the right-most algorithm in
Figure 5.2), respectively.
• The Householder QR factorization algorithm and algorithm to form Q from “Notes on Householder
QR Factorization”.
The routine for computing a Householder transformation (similar to Figure 5.1) can be found at
http://www.cs.utexas.edu/users/flame/Notes/FLAMEatlab/Housev.m
That routine implements the algorithm on the left in Figure 5.1. Try it, and see what happens if you replace it with the algorithm to its right.
Note: For the Householder QR factorization and “form Q” algorithm how to start the algorithm when the
matrix is not square is a bit tricky. Thus, you may assume that the matrix is square.
[Figure 5.6: Left: Classical Gram-Schmidt algorithm. Right: Generated code skeleton for CGS.]
Chapter 6
Notes on Householder QR Factorization
Video
Read disclaimer regarding the videos in the preface! This is the video recorded in Fall 2014.
* YouTube
* Download from UT Box
* View After Local Download
(For help on viewing, see Appendix A.)
6.1.1 Launch
A fundamental problem to avoid in numerical codes is the situation where one starts with large values and
one ends up with small values with large relative errors in them. This is known as catastrophic cancellation.
The Gram-Schmidt algorithms can inherently fall victim to this: column a j is successively reduced in
length as components in the directions of {q0 , . . . , q j−1 } are subtracted, leaving a small vector if a j was
almost in the span of the first j columns of A. Application of a unitary transformation to a matrix or vector inherently preserves length. Thus, it would be beneficial if the QR factorization could be implemented as the successive application of unitary transformations. The Householder QR factorization accomplishes this.
The first fundamental insight is that the product of unitary matrices is itself unitary. If, given A ∈ Cm×n
(with m ≥ n), one could find a sequence of unitary matrices, {H0 , . . . , Hn−1 }, such that
\[
H_{n-1}\cdots H_0A = \begin{pmatrix} R \\ 0 \end{pmatrix},
\]
where R ∈ Cn×n is upper triangular, then
\[
A = \underbrace{H_0^H\cdots H_{n-1}^H}_{Q}\begin{pmatrix} R \\ 0 \end{pmatrix} = Q\begin{pmatrix} R \\ 0 \end{pmatrix} = \begin{pmatrix} Q_L & Q_R \end{pmatrix}\begin{pmatrix} R \\ 0 \end{pmatrix} = Q_LR,
\]
where QL equals the first n columns of Q. Then A = QL R is the QR factorization of A. The second
fundamental insight will be that the desired unitary transformations {H0 , . . . , Hn−1 } can be computed and
applied cheaply.
6.1.2 Outline
[Figure 6.1: Left: Illustration that shows how, given vector x and unit length vector u, the subspace orthogonal to u becomes a mirror for reflecting x, represented by the transformation (I − 2uu^H). Right: Illustration that shows how to compute u given vectors x and y with ‖x‖2 = ‖y‖2.]
Definition 6.1 Let u ∈ Cn be a vector of unit length (kuk2 = 1). Then H = I −2uuH is said to be a reflector
or Householder transformation.
We observe: Let z be any vector orthogonal to u (so that u^H z = 0). Then
\[
(I-2uu^H)z = z - 2u(u^Hz) = z.
\]
Next, write an arbitrary x as x = z + (u^Hx)u, where z = x − (u^Hx)u is orthogonal to u. Then
\[
(I-2uu^H)x = z + (u^Hx)u - 2(u^Hx)\underbrace{u^Hu}_{1}\,u = z - (u^Hx)u.
\]
This can be interpreted as follows: The space perpendicular to u acts as a “mirror”: any vector in that space
(along the mirror) is not reflected, while any other vector has the component that is orthogonal to the space
(the component outside, orthogonal to, the mirror) reversed in direction, as illustrated in Figure 6.1. Notice
that a reflection preserves the length of the vector.
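These properties are easy to verify numerically: a vector in the mirror is left unchanged, the component along u is reversed, and length is preserved. A pure-Python sketch of the real case (the helper name reflect is ours); note that Hx is computed without ever forming the matrix H:

```python
def reflect(u, x):
    """Apply the Householder reflector H = I - 2 u u^T to x, without
    forming H: H x = x - 2 (u^T x) u.  u must have unit length."""
    utx = sum(ui * xi for ui, xi in zip(u, x))
    return [xi - 2.0 * utx * ui for xi, ui in zip(x, u)]

u = [1.0, 0.0]      # unit vector; the mirror is its orthogonal complement
z = [0.0, 3.0]      # lies in the mirror: unchanged
x = [2.0, 3.0]      # general vector: component along u is reversed
Hz = reflect(u, z)
Hx = reflect(u, x)
```

Here Hz = z while Hx = (−2, 3), and ‖Hx‖2 = ‖x‖2, illustrating that the reflection preserves length.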
Let
\[
x = \begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix},
\]
where χ1 equals the first element of x and x2 is the rest of x. Then we will wish to find a Householder vector u = (1; u2) so that
\[
\left(I - \frac{1}{\tau}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}^T\right)\begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \pm\|x\|_2 \\ 0 \end{pmatrix}.
\]
6.2 Householder Transformations (Reflectors)
Notice that y in the previous discussion equals the vector (±‖x‖2; 0), so the direction of u is given by
\[
v = \begin{pmatrix} \chi_1 \mp \|x\|_2 \\ x_2 \end{pmatrix}.
\]
We now wish to normalize this vector so its first entry equals "1":
\[
u = \frac{v}{\nu_1} = \frac{1}{\chi_1 \mp \|x\|_2}\begin{pmatrix} \chi_1 \mp \|x\|_2 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ x_2/\nu_1 \end{pmatrix},
\]
where ν1 = χ1 ∓ ‖x‖2 equals the first element of v. (Note that if ν1 = 0 then u2 can be set to 0.)
In the complex case, again let x = (χ1; x2), where χ1 equals the first element of x and x2 is the rest of x. Then we will wish to find a Householder vector u = (1; u2) so that
\[
\left(I - \frac{1}{\tau}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}^H\right)\begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \pm\|x\|_2 \\ 0 \end{pmatrix}.
\]
Here ± denotes a complex scalar on the complex unit circle. By the same argument as before,
\[
v = \begin{pmatrix} \chi_1 - \pm\|x\|_2 \\ x_2 \end{pmatrix}.
\]
We now wish to normalize this vector so its first entry equals "1":
\[
u = \frac{v}{\nu_1} = \frac{1}{\chi_1 - \pm\|x\|_2}\begin{pmatrix} \chi_1 - \pm\|x\|_2 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ x_2/\nu_1 \end{pmatrix},
\]
where ν1 = χ1 − ±‖x‖2. (If ν1 = 0 then we set u2 to 0.)
where τ = u^H u/2 = (1 + u2^H u2)/2 and ρ = ±‖x‖2.
Hint: ρ ρ̄ = |ρ|² = ‖x‖2², since H preserves the norm. Also, ‖x‖2² = |χ1|² + ‖x2‖2², and sign(z) = z/|z|.
* SEE ANSWER
Again, the choice of ± is important. For the complex case we choose ± = −sign(χ1) = −χ1/|χ1|. We introduce the notation
\[
\left[\begin{pmatrix} \rho \\ u_2 \end{pmatrix}, \tau\right] := \text{Housev}\begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix}
\]
for the computation of the above mentioned vector u2, and scalars ρ and τ, from vector x. We will use the notation H(x) for the transformation I − (1/τ)uu^H where u and τ are computed by Housev(x).
The function
function [ rho, u2, tau ] = Housev( chi1, x2 )
Homework 6.6 Function Housev.m implements the steps in Figure 6.2 (left). Update this implementation
with the equivalent steps in Figure 6.2 (right), which is closer to how it is implemented in practice.
* SEE ANSWER
6.3 Householder QR Factorization
Algorithm: [(ρ; u2), τ] = Housev((χ1; x2))

Simple formulation:                  Efficient computation:
                                     χ2 := ‖x2‖2
                                     α := ‖(χ1; χ2)‖2 (= ‖x‖2)
ρ = −sign(χ1)‖x‖2                    ρ := −sign(χ1)α
ν1 = χ1 + sign(χ1)‖x‖2               ν1 := χ1 − ρ
u2 = x2/ν1                           u2 := x2/ν1
                                     χ2 := χ2/|ν1| (= ‖u2‖2)
τ = (1 + u2^H u2)/2                  τ := (1 + χ2²)/2

Figure 6.2: Computing the Householder transformation. Left: simple formulation. Right: efficient computation. Note: I have not completely double-checked these formulas for the complex case. They work for the real case.
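The left column of Figure 6.2 translates directly into a short pure-Python sketch (real case only; the sign convention ρ = −sign(χ1)‖x‖2 ensures ν1 = χ1 − ρ is computed without cancellation):

```python
import math

def housev(chi1, x2):
    """Compute rho, u2, tau so that (I - (1/tau) u u^T) x = (rho; 0)
    with u = (1; u2) and x = (chi1; x2).  Real case, simple formulation."""
    alpha = math.sqrt(chi1 * chi1 + sum(xi * xi for xi in x2))  # ||x||_2
    rho = -math.copysign(alpha, chi1)       # rho = -sign(chi1) ||x||_2
    nu1 = chi1 - rho                        # first element of v; no cancellation
    u2 = [xi / nu1 for xi in x2]
    tau = (1.0 + sum(ui * ui for ui in u2)) / 2.0
    return rho, u2, tau

# x = (3, 4): ||x||_2 = 5, so the reflector maps x to (-5, 0).
rho, u2, tau = housev(3.0, [4.0])
```

Applying I − (1/τ)uu^T with u = (1; u2) to (3, 4) indeed yields (ρ, 0), which is a quick sanity check on the computed τ.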
In each iteration, we compute
\[
\left[\begin{pmatrix} \rho_{11} \\ u_{21} \end{pmatrix}, \tau_1\right] := \text{Housev}\begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix}
\]
and update
\[
\begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix} := \begin{pmatrix} \rho_{11} & a_{12}^T - w_{12}^T \\ 0 & A_{22} - u_{21}w_{12}^T \end{pmatrix}.
\]
[Illustration: the states of the matrix ("original matrix" and "move forward") as successive Householder transformations zero the entries below the diagonal, one column at a time.]
Now let us assume that after k iterations of the algorithm matrix A contains
\[
A \rightarrow \begin{pmatrix} R_{TL} & R_{TR} \\ 0 & A_{BR} \end{pmatrix} = \begin{pmatrix} R_{00} & r_{01} & R_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix},
\]
and update
\[
A := \begin{pmatrix} I & 0 \\ 0 & I - \frac{1}{\tau_1}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}^H \end{pmatrix}
\begin{pmatrix} R_{00} & r_{01} & R_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
= \begin{pmatrix} R_{00} & r_{01} & R_{02} \\ 0 & \rho_{11} & a_{12}^T - w_{12}^T \\ 0 & 0 & A_{22} - u_{21}w_{12}^T \end{pmatrix},
\]
where Â denotes the original contents of A and RTL is an upper triangular matrix. Rearranging this we find that
\[
\hat{A} = H_0H_1\cdots H_{n-1}R,
\]
which shows that if Q = H0 H1 · · · Hn−1 then Â = QR.
* SEE ANSWER
Typically, the algorithm overwrites the original matrix A with the upper triangular matrix, and at each
step u21 is stored over the elements that become zero, thus overwriting a21 . (It is for this reason that the
first element of u was normalized to equal “1”.) In this case Q is usually not explicitly formed as it can be
stored as the separate Householder vectors below the diagonal of the overwritten matrix. The algorithm
that overwrites A in this manner is given in Fig. 6.4.
We will let
[{U\R},t] = HQR(A)
denote the operation that computes the QR factorization of m × n matrix A, with m ≥ n, via Householder
transformations. It returns the Householder vectors and matrix R in the first argument and the vector of
scalars “τi ” that are computed as part of the Householder transformations in t.
Homework 6.8 Given A ∈ Rm×n show that the cost of the algorithm in Figure 6.4 is given by
\[
C_{HQR}(m,n) \approx 2mn^2 - \frac{2}{3}n^3\ \text{flops}.
\]
* SEE ANSWER
Homework 6.9 Implement the algorithm in Figure 6.4 as
function [ Aout, tout ] = HQR unb var1( A, t )
Input is an m×n matrix A and vector t of size n. Output is the overwritten matrix A and the vector of scalars
that define the Householder transformations. You may want to use Programming/chapter06/test HQR unb var1.m
to check your implementation.
* SEE ANSWER
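For readers not working in MATLAB, the overall factorization can also be sketched in pure Python (real case; the matrix is a list of rows, u21 overwrites a21 as described above, and the name hqr is ours, not the FLAME routine):

```python
import math

def hqr(A):
    """Unblocked Householder QR of an m x n matrix A (list of rows), m >= n.
    Overwrites A with R on/above the diagonal and the Householder vectors
    (first element 1, implicit) below it; returns the list of tau's."""
    m, n = len(A), len(A[0])
    t = []
    for j in range(n):
        # [rho, u21, tau] := Housev(A[j:, j]) -- simple formulation
        alpha = math.sqrt(sum(A[i][j] ** 2 for i in range(j, m)))
        rho = -math.copysign(alpha, A[j][j])
        nu1 = A[j][j] - rho
        for i in range(j + 1, m):
            A[i][j] /= nu1                  # store u21 over a21
        tau = (1.0 + sum(A[i][j] ** 2 for i in range(j + 1, m))) / 2.0
        A[j][j] = rho
        t.append(tau)
        # w12 := (a12 + u21^T A22)/tau, then subtract (1; u21) w12
        for k in range(j + 1, n):
            w = (A[j][k] + sum(A[i][j] * A[i][k] for i in range(j + 1, m))) / tau
            A[j][k] -= w
            for i in range(j + 1, m):
                A[i][k] -= A[i][j] * w
    return t

A = [[3.0, 1.0],
     [4.0, 2.0]]
t = hqr(A)
```

For this 2 × 2 example the first reflector maps the first column (3, 4) to (−5, 0), so R = [−5, −2.2; 0, −0.4] ends up on and above the diagonal, with u21 = (0.5) stored below it.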
6.4 Forming Q
Given A ∈ Cm×n , let [A,t] = HQR(A) yield the matrix A with the Householder vectors stored below the
diagonal, R stored on and above the diagonal, and the τi stored in vector t. We now discuss how to form
the first n columns of Q = H0 H1 · · · Hn−1 . The computation is illustrated in Figure 6.5.
Notice that to pick out the first n columns we must form
\[
Q\begin{pmatrix} I_{n\times n} \\ 0 \end{pmatrix} = H_0\cdots H_{n-1}\begin{pmatrix} I_{n\times n} \\ 0 \end{pmatrix} = H_0\cdots H_{k-1}\underbrace{H_k\cdots H_{n-1}\begin{pmatrix} I_{n\times n} \\ 0 \end{pmatrix}}_{B_k},
\]
where Bk is defined as indicated.
Repartition
\[
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} t_T \\ t_B \end{pmatrix} \rightarrow \begin{pmatrix} t_0 \\ \tau_1 \\ t_2 \end{pmatrix},
\]
where α11 and τ1 are scalars.
\[
\left[\begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix}, \tau_1\right] := \left[\begin{pmatrix} \rho_{11} \\ u_{21} \end{pmatrix}, \tau_1\right] = \text{Housev}\begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix}
\]
Update
\[
\begin{pmatrix} a_{12}^T \\ A_{22} \end{pmatrix} := \left(I - \frac{1}{\tau_1}\begin{pmatrix} 1 \\ u_{21} \end{pmatrix}\begin{pmatrix} 1 \\ u_{21} \end{pmatrix}^H\right)\begin{pmatrix} a_{12}^T \\ A_{22} \end{pmatrix}
\]
via the steps
• w12^T := (a12^T + a21^H A22)/τ1
• (a12^T; A22) := (a12^T − w12^T; A22 − a21 w12^T)
Continue with the repartitioning moved forward, and endwhile.
[Figure 6.4: Householder QR factorization algorithm that overwrites A with the Householder vectors below the diagonal and R on and above the diagonal, storing the scalars τi in t.]
endwhile
α11 aT12
:=
a21 A22
Original matrix “Move forward”
1 − 1/τ1 −(uH
21 A22 )/τ1
−u21 /τ1 A22 + u21 aT12
1 0 0 0 1 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0 0 1 0 0
0 0 1 0 0 0 1 0 0 0 1 0
0 0 0 1 0 0 0 × 0 0 0 ×
0 0 0 0 0 0 0 × 0 0 0 ×
1 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0
0 0 × × 0 0 × ×
0 0 × × 0 0 × ×
0 0 × × 0 0 × ×
1 0 0 0 1 0 0 0
0 × × × 0 × × ×
0 × × × 0 × × ×
0 × × × 0 × × ×
0 × × × 0 × × ×
× × × × × × × ×
× × × × × × × ×
× × × × × × × ×
× × × × × × × ×
× × × × × × × ×
• Inductive step: Assume the result is true for Bk. We show it is true for Bk−1:
\[
B_{k-1} = H_{k-1}\left(H_k\cdots H_{n-1}\begin{pmatrix} I_{n\times n} \\ 0 \end{pmatrix}\right) = H_{k-1}B_k = H_{k-1}\begin{pmatrix} I_{(k-1)\times(k-1)} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \tilde{B}_k \end{pmatrix}
\]
\[
= \begin{pmatrix} I_{(k-1)\times(k-1)} & 0 \\ 0 & I - \frac{1}{\tau_k}\begin{pmatrix} 1 \\ u_k \end{pmatrix}\begin{pmatrix} 1 \\ u_k \end{pmatrix}^H \end{pmatrix}
\begin{pmatrix} I_{(k-1)\times(k-1)} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \tilde{B}_k \end{pmatrix}
= \begin{pmatrix} I_{(k-1)\times(k-1)} & 0 & 0 \\ 0 & 1 - 1/\tau_k & -y_k^T \\ 0 & -u_k/\tau_k & \tilde{B}_k - u_ky_k^T \end{pmatrix}
= \begin{pmatrix} I_{(k-1)\times(k-1)} & 0 \\ 0 & \tilde{B}_{k-1} \end{pmatrix},
\]
where y_k^T = u_k^H \tilde{B}_k / τ_k.
Theorem 6.11 Given [A,t] = HQR(A,t) from Figure 6.4, the algorithm in Figure 6.6 overwrites A with
the first n = n(A) columns of Q as defined by the Householder transformations stored below the diagonal
of A and in the vector t.
Repartition
\[
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} t_T \\ t_B \end{pmatrix} \rightarrow \begin{pmatrix} t_0 \\ \tau_1 \\ t_2 \end{pmatrix},
\]
where α11 and τ1 are scalars.
Update
\[
\begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix} := \left(I - \frac{1}{\tau_1}\begin{pmatrix} 1 \\ u_{21} \end{pmatrix}\begin{pmatrix} 1 \\ u_{21} \end{pmatrix}^H\right)\begin{pmatrix} 1 & 0 \\ 0 & A_{22} \end{pmatrix}
\]
via the steps
• α11 := 1 − 1/τ1
• a12^T := −(a21^H A22)/τ1
• A22 := A22 + a21 a12^T
• a21 := −a21/τ1
Continue with the repartitioning moved forward, and endwhile.
Figure 6.6: Algorithm for overwriting A with Q from the Householder transformations stored as Householder vectors below the diagonal of A (as produced by [A,t] = HQR UNB VAR 1(A,t)).
You may want to use Programming/chapter06/test HQR unb var1.m to check your implementation.
* SEE ANSWER
Homework 6.13 Given A ∈ Cm×n, the cost of the algorithm in Figure 6.6 is given by
\[
C_{FormQ}(m,n) \approx 2mn^2 - \frac{2}{3}n^3\ \text{flops}.
\]
* SEE ANSWER
6.5 Applying QH
In a future Note, we will see that the QR factorization is used to solve the linear least-squares problem. To
do so, we need to be able to compute ŷ = QH y where QH = Hn−1 · · · H0 .
Let us start by computing H0 y:
\[
\left(I - \frac{1}{\tau_1}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}^H\right)\begin{pmatrix} \psi_1 \\ y_2 \end{pmatrix}
= \begin{pmatrix} \psi_1 \\ y_2 \end{pmatrix} - \underbrace{\frac{\psi_1 + u_2^Hy_2}{\tau_1}}_{\omega_1}\begin{pmatrix} 1 \\ u_2 \end{pmatrix}
= \begin{pmatrix} \psi_1 - \omega_1 \\ y_2 - \omega_1u_2 \end{pmatrix},
\]
where ω1 = (ψ1 + u2^H y2)/τ1. This motivates the algorithm in Figure 6.7 for computing y := Hn−1 · · · H0 y given the output matrix A and vector t from routine HQR.
The cost of this algorithm can be analyzed as follows: When yT is of length k, the bulk of the computation is in an inner product with vectors of length m − k (to compute ω1) and an axpy operation with vectors of length m − k to subsequently update ψ1 and y2. Thus, the cost is approximately given by
\[
\sum_{k=0}^{n-1}4(m-k) \approx 4mn - 2n^2\ \text{flops}.
\]
Notice that this is much cheaper than forming Q and then multiplying.
Repartition
\[
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix},\quad
\begin{pmatrix} t_T \\ t_B \end{pmatrix} \rightarrow \begin{pmatrix} t_0 \\ \tau_1 \\ t_2 \end{pmatrix},\quad\text{and}\quad
\begin{pmatrix} y_T \\ y_B \end{pmatrix} \rightarrow \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix},
\]
where α11, τ1, and ψ1 are scalars.
Update
\[
\begin{pmatrix} \psi_1 \\ y_2 \end{pmatrix} := \left(I - \frac{1}{\tau_1}\begin{pmatrix} 1 \\ u_{21} \end{pmatrix}\begin{pmatrix} 1 \\ u_{21} \end{pmatrix}^H\right)\begin{pmatrix} \psi_1 \\ y_2 \end{pmatrix}
\]
via the steps
• ω1 := (ψ1 + a21^H y2)/τ1
• (ψ1; y2) := (ψ1 − ω1; y2 − ω1 a21)
Continue with the repartitioning moved forward, and endwhile.
Figure 6.7: Algorithm for computing y := Hn−1 · · · H0 y given the output from the algorithm HQR UNB VAR 1.
6.6. Blocked Householder QR Factorization 129
where t01 = QH
0 u1 .
* SEE ANSWER
Homework 6.16 Consider ui ∈ Cm with ui ≠ 0 (the zero vector). Define τi = (ui^H ui)/2, so that
\[
H_i = I - \frac{1}{\tau_i}u_iu_i^H
\]
equals a Householder transformation, and let
\[
U = \begin{pmatrix} u_0 & u_1 & \cdots & u_{k-1} \end{pmatrix}.
\]
Show that
\[
H_0H_1\cdots H_{k-1} = I - UT^{-1}U^H,
\]
where T is an upper triangular matrix.
* SEE ANSWER
The above exercises can be summarized in the algorithm for computing T from U in Figure 6.8.
Homework 6.17 Implement the algorithm in Figure 6.8 as
function T = FormT_unb_var1( U, t, T )
* SEE ANSWER
In [28] we call the transformation I − U T^{−1} U^H that equals the accumulated Householder transformations the UT transform and prove that T can instead be computed as
T = triu(U^H U)
(the upper triangular part of U^H U) followed by either dividing the diagonal elements by two or setting them to τ0, . . . , τk−1 (in order). In that paper, we point out similar published results [10, 35, 46, 32].
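As a quick numerical check of this claim (a sketch with our own variable names, in real arithmetic for simplicity), one can build T as triu(U^T U) with its diagonal replaced by τ0, . . . , τk−1 and compare I − U T^{−1} U^T against the explicit product of the transformations:

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 6, 3
U = rng.standard_normal((m, k))
taus = np.sum(U * U, axis=0) / 2.0   # tau_i = (u_i^T u_i)/2

# T = triu(U^T U) with the diagonal divided by two
# (equivalently: set it to tau_0, ..., tau_{k-1}).
T = np.triu(U.T @ U)
np.fill_diagonal(T, taus)

# Explicit product H_0 H_1 ... H_{k-1}.
H = np.eye(m)
for i in range(k):
    H = H @ (np.eye(m) - np.outer(U[:, i], U[:, i]) / taus[i])

UT = np.eye(m) - U @ np.linalg.solve(T, U.T)   # I - U T^{-1} U^T
```

The two m × m matrices H and UT agree to rounding error.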
Repartition
[UTL UTR; UBL UBR] → [U00 u01 U02; u10^T υ11 u12^T; U20 u21 U22], [tT; tB] → [t0; τ1; t2],
[TTL TTR; TBL TBR] → [T00 t01 T02; t10^T τ11 t12^T; T20 t21 T22]
where υ11 is 1 × 1, τ1 has 1 row, τ11 is 1 × 1

t01 := (u10^T)^H + U20^H u21
τ11 := τ1

Continue with
[UTL UTR; UBL UBR] ← [U00 u01 U02; u10^T υ11 u12^T; U20 u21 U22], [tT; tB] ← [t0; τ1; t2],
[TTL TTR; TBL TBR] ← [T00 t01 T02; t10^T τ11 t12^T; T20 t21 T22]
endwhile
Figure 6.8: Algorithm that computes T from U and the vector t so that I − U T^{−1} U^H equals the UT transform. Here U is assumed to be the output of, for example, HouseQR_unb_var1. This means that it is a lower trapezoidal matrix, with ones on the diagonal.
I − β v v^T,
where β = 2/(v^T v) (= 1/τ, where τ is as discussed before). This leads to an alternative accumulation of Householder transforms known as the compact WY transform [35] (which is itself a refinement of the WY representation of products of Householder transformations [10]):
I − U S U^H,
where the upper triangular matrix S relates to the matrix T in the UT transform via S = T^{−1}. Obviously, T can be computed first and then inverted via the insights in the next exercise. Alternatively, the inversion of matrix T can be incorporated into the algorithm that computes T (which is what is done in the implementation in LAPACK), yielding the algorithm in Figure 6.10.
An algorithm that computes the inverse of an upper triangular matrix T based on the above exercise is given in Figure 6.9.
* SEE ANSWER
Repartition
[TTL TTR; TBL TBR] → [T00 t01 T02; t10^T τ11 t12^T; T20 t21 T22]
where τ11 is 1 × 1

t01 := −T00 t01 / τ11    (T00 already holds T00^{−1})
τ11 := 1/τ11

Continue with
[TTL TTR; TBL TBR] ← [T00 t01 T02; t10^T τ11 t12^T; T20 t21 T22]
endwhile
Figure 6.9: Unblocked algorithm for inverting an upper triangular matrix. The algorithm assumes that T00 has already been inverted, and computes the next column of T in the current iteration.
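The update steps of Figure 6.9 can be reconstructed from the 2 × 2 blocked inverse [T00 t01; 0 τ11]^{−1} = [T00^{−1} −T00^{−1} t01/τ11; 0 1/τ11]. A hedged Python sketch (the routine name is ours):

```python
import numpy as np

def trinv_unb(T):
    # Invert an upper triangular matrix column by column. At the start of
    # iteration k, the leading k x k block already holds its own inverse;
    # the next column of the inverse is
    #     t01 := -inv(T00) t01 / tau11,   tau11 := 1/tau11.
    R = np.array(T, dtype=float)
    n = R.shape[0]
    for k in range(n):
        R[:k, k] = -(R[:k, :k] @ R[:k, k]) / R[k, k]
        R[k, k] = 1.0 / R[k, k]
    return R
```

In the loop, R[:k, :k] already holds T00^{−1}, so computing the new column costs only a triangular matrix-vector multiply.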
Repartition
[UTL UTR; UBL UBR] → [U00 u01 U02; u10^T υ11 u12^T; U20 u21 U22], [tT; tB] → [t0; τ1; t2],
[STL STR; SBL SBR] → [S00 s01 S02; s10^T σ11 s12^T; S20 s21 S22]
where υ11 is 1 × 1, τ1 has 1 row, σ11 is 1 × 1

σ11 := 1/τ1
s01 := −σ11 S00 ( (u10^T)^H + U20^H u21 )

Continue with
[UTL UTR; UBL UBR] ← [U00 u01 U02; u10^T υ11 u12^T; U20 u21 U22], [tT; tB] ← [t0; τ1; t2],
[STL STR; SBL SBR] ← [S00 s01 S02; s10^T σ11 s12^T; S20 s21 S22]
endwhile
Figure 6.10: Algorithm that computes S from U and the vector t so that I − U S U^H equals the compact WY transform. Here U is assumed to be the output of, for example, HouseQR_unb_var1. This means that it is a lower trapezoidal matrix, with ones on the diagonal.
[ [A11; A21], t1 ] := HQR([A11; A21])
T11 := FormT([A11; A21], t1)
W12 := T11^{−H} ( U11^H A12 + U21^H A22 )
[A12; A22] := [A12 − U11 W12; A22 − U21 W12]

Continue with
[ATL ATR; ABL ABR] ← [A00 A01 A02; A10 A11 A12; A20 A21 A22], [tT; tB] ← [t0; t1; t2]
endwhile

• Form T11 from the Householder vectors using the procedure described in Section 6.6.1:
T11 := FormT([A11; A21], t1)
• Now we need to also apply the Householder transformations to the rest of the columns:
[A12; A22] := ( I − [U11; U21] T11^{−1} [U11; U21]^H )^H [A12; A22]
= [A12; A22] − [U11; U21] W12
= [A12 − U11 W12; A22 − U21 W12],
where
W12 = T11^{−H} ( U11^H A12 + U21^H A22 ).
There are many possible algorithms for computing the QR factorization. For example, the unblocked
algorithm from Figure 6.4 can be merged with the unblocked algorithm for forming T in Figure 6.8 to
yield the algorithm in Figure 6.12.
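To make the structure of such a blocked algorithm concrete, here is a hedged, self-contained Python/NumPy sketch (real arithmetic; all names are ours; it repeats a small unblocked routine so that it stands alone, and it forms T via the triu(U^T U) trick rather than the merged algorithm of Figure 6.12):

```python
import numpy as np

def house(x):
    # Householder vector with implicit unit first entry: (rho, u2, tau).
    alpha = -np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
    u2 = x[1:] / (x[0] - alpha)
    return alpha, u2, (1.0 + u2 @ u2) / 2.0

def hqr_unb(A):
    # Unblocked Householder QR: R in the upper triangle, Householder
    # vectors below the diagonal, scalars tau in t.
    A = A.copy(); m, n = A.shape; t = np.zeros(n)
    for k in range(n):
        rho, u2, tau = house(A[k:, k])
        A[k, k] = rho; A[k+1:, k] = u2; t[k] = tau
        w = (A[k, k+1:] + u2 @ A[k+1:, k+1:]) / tau
        A[k, k+1:] -= w
        A[k+1:, k+1:] -= np.outer(u2, w)
    return A, t

def hqr_blk(A, nb=2):
    # Blocked variant: factor the panel, form T for the UT transform,
    # then apply (I - U inv(T) U^T)^T to the trailing columns.
    A = A.copy(); m, n = A.shape; t = np.zeros(n)
    for k in range(0, n, nb):
        b = min(nb, n - k)
        A[k:, k:k+b], t[k:k+b] = hqr_unb(A[k:, k:k+b])
        U = np.tril(A[k:, k:k+b], -1)
        U[np.arange(b), np.arange(b)] = 1.0     # unit "diagonal" of the panel
        T = np.triu(U.T @ U)
        np.fill_diagonal(T, t[k:k+b])           # diagonal holds the taus
        W = np.linalg.solve(T.T, U.T @ A[k:, k+b:])   # W12 = T^{-T}(U^T B)
        A[k:, k+b:] -= U @ W
    return A, t
```

Since the blocked and unblocked variants apply exactly the same Householder transformations, their outputs agree to rounding error.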
Let us now again compute the QR factorization of A simultaneously with the forming of T, but now taking advantage of the fact that T is partially computed to change the algorithm into what some would consider a “left-looking” algorithm.
Partition
A → [A00 a01 A02; a10^T α11 a12^T; A20 a21 A22] and T → [T00 t01 T02; 0 τ11 t12^T; 0 0 T22].
Assume that [A00; a10^T; A20] has been factored and overwritten with [U00; u10^T; U20] and R00 while also computing T00. In the next step, we need to apply previous Householder transformations to the next column of A and then update that column with the next column of R and U. In addition, the next column of T must be computed. This means:
Repartition
[ATL ATR; ABL ABR] → [A00 a01 A02; a10^T α11 a12^T; A20 a21 A22], [TTL TTR; TBL TBR] → [T00 t01 T02; t10^T τ11 t12^T; T20 t21 T22]
where α11 is 1 × 1, τ11 is 1 × 1

[ [α11; a21], τ11 ] := Housev([α11; a21])    (overwriting [α11; a21] with [ρ11; u21])
Update [a12^T; A22] := ( I − (1/τ11) [1; u21] [1 u21^H] ) [a12^T; A22]
via the steps
• w12^T := (a12^T + a21^H A22)/τ11
• [a12^T; A22] := [a12^T − w12^T; A22 − a21 w12^T]
t01 := (a10^T)^H + A20^H a21

Continue with
[ATL ATR; ABL ABR] ← [A00 a01 A02; a10^T α11 a12^T; A20 a21 A22], [TTL TTR; TBL TBR] ← [T00 t01 T02; t10^T τ11 t12^T; T20 t21 T22]
endwhile
Figure 6.12: Unblocked Householder transformation based QR factorization merged with the computation of T for the UT transform.
Repartition
[ATL ATR; ABL ABR] → [A00 a01 A02; a10^T α11 a12^T; A20 a21 A22], [TTL TTR; TBL TBR] → [T00 t01 T02; t10^T τ11 t12^T; T20 t21 T22]
where α11 is 1 × 1, τ11 is 1 × 1

Update [a01; α11; a21] := ( I − [U00; u10^T; U20] T00^{−1} [U00; u10^T; U20]^H )^H [a01; α11; a21]
via the steps
• w01 := T00^{−H} ( U00^H a01 + α11 (u10^T)^H + U20^H a21 )
• [a01; α11; a21] := [a01 − U00 w01; α11 − u10^T w01; a21 − U20 w01]
[ [α11; a21], τ11 ] := Housev([α11; a21])
t01 := (a10^T)^H + A20^H a21

Continue with
[ATL ATR; ABL ABR] ← [A00 a01 A02; a10^T α11 a12^T; A20 a21 A22], [TTL TTR; TBL TBR] ← [T00 t01 T02; t10^T τ11 t12^T; T20 t21 T22]
endwhile
Figure 6.13: Alternative unblocked Householder transformation based QR factorization merged with the computation of T for the UT transform.
• Update
[a01; α11; a21] := ( I − [U00; u10^T; U20] T00^{−1} [U00; u10^T; U20]^H )^H [a01; α11; a21]
• t01 := A20^H a21
An alternative blocked variant that uses either of the unblocked factorization routines that merges the
formation of T is given in Figure 6.14.
6.7 Enrichments
6.8 Wrapup
[ [A11; A21], T11 ] := HouseQR_FormT_unb_varx([A11; A21])
W12 := T11^{−H} ( U11^H A12 + U21^H A22 )
[A12; A22] := [A12 − U11 W12; A22 − U21 W12]

Continue with
[ATL ATR; ABL ABR] ← [A00 A01 A02; A10 A11 A12; A20 A21 A22], [tT; tB] ← [t0; t1; t2]
endwhile
A = [1 1 1; eps 0 0; 0 eps 0; 0 0 eps]
• With the various routines you implemented for Chapter 4 and the current chapter, compute the QR factorization of this matrix, yielding Q_CGS, Q_MGS, and Q_HQR.
• Finally, check whether the columns of the various computed matrices Q are mutually orthonormal:
Q_CGS’ * Q_CGS
Q_MGS’ * Q_MGS
Q_HQR’ * Q_HQR
What do you notice? When we discuss numerical stability of algorithms we will gain insight into
why HQR produces high quality mutually orthogonal columns.
Later, we will see how the QR factorization can be used to solve Ax = b and linear least-squares
problems. At that time we will examine how accurate the solution is depending on which method for QR
factorization is used to compute Q and R.
* SEE ANSWER
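For readers without the MATLAB routines at hand, the experiment can be reproduced in Python/NumPy (an illustration only: numpy.linalg.qr plays the role of HQR, since LAPACK's QR is Householder-based, and the value eps = 10^−8, on the order of the square root of the machine epsilon, is our assumption):

```python
import numpy as np

def cgs(A):
    # Classical Gram-Schmidt QR factorization.
    m, n = A.shape
    Q = np.zeros((m, n)); R = np.zeros((n, n))
    for j in range(n):
        R[:j, j] = Q[:, :j].T @ A[:, j]
        v = A[:, j] - Q[:, :j] @ R[:j, j]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

eps = 1.0e-8                       # roughly sqrt(machine epsilon)
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])
Q_cgs, _ = cgs(A)
Q_hqr, _ = np.linalg.qr(A)         # Householder-based (LAPACK)
loss_cgs = np.linalg.norm(Q_cgs.T @ Q_cgs - np.eye(3))
loss_hqr = np.linalg.norm(Q_hqr.T @ Q_hqr - np.eye(3))
# loss_cgs is O(1), while loss_hqr stays at the level of machine epsilon
```

The dramatic difference between the two measures of orthogonality loss is exactly what the homework asks you to observe.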
Homework 6.20 Consider the matrix [A; B] where A has linearly independent columns. Let
• A = QA RA be the QR factorization of A.
• [RA; B] = QB RB be the QR factorization of [RA; B].
• [A; B] = QR be the QR factorization of [A; B].
Repartition
[RTL RTR; RBL RBR] → [R00 r01 R02; r10^T ρ11 r12^T; R20 r21 R22],
[BL BR] → [B0 b1 B2], [tT; tB] → [t0; τ1; t2]
where ρ11 is 1 × 1, b1 has 1 column, τ1 has 1 row

Continue with
[RTL RTR; RBL RBR] ← [R00 r01 R02; r10^T ρ11 r12^T; R20 r21 R22],
[BL BR] ← [B0 b1 B2], [tT; tB] ← [t0; τ1; t2]
endwhile
Figure 6.15: Outline for an algorithm that computes the QR factorization of a triangular matrix, R, appended with a matrix B. Upon completion, the Householder vectors are stored in B and the scalars associated with the Householder transformations in t.
Assume that the diagonal entries of RA , RB , and R are all positive. Show that R = RB .
* SEE ANSWER
Homework 6.21 Consider the matrix [R; B] where R is an upper triangular matrix. Propose a modification of HQR UNB VAR 1 that overwrites R and B with the Householder vectors and the updated matrix R. Importantly, the algorithm should take advantage of the zeros in R (in other words, it should avoid computing with the zeroes below its diagonal). An outline for the algorithm is given in Figure 6.15.
* SEE ANSWER
6.8.2 Summary
Chapter 7
Notes on Rank Revealing Householder QR
Factorization
7.1.1 Launch
It may be good at this point to skip forward to Section 12.4.1 and learn about permutation matrices.
Given A ∈ Cm×n , with m ≥ n, the reduced QR factorization A = QR yields Q ∈ Cm×n , the columns of
which form an orthonormal basis for the column space of A. If the matrix has rank r with r < n, things
get a bit more complicated. Let P be a permutation matrix so that the first r columns of AP^T are linearly independent. Specifically, if we partition
AP^T → [AL AR],
where AL has r columns, we can find an orthonormal basis for the column space of AL by computing its QR factorization AL = QL RTL, after which
AP^T = QL [RTL RTR],
where RTR = QL^H AR.
Now, if A is merely almost of rank r (in other words, A = B + E where B has rank r and kEk2 is small), then
AP^T = [AL AR] = [QL QR] [RTL RTR; 0 RBR] ≈ QL [RTL RTR],
where kRBRk2 is small. This is known as the rank revealing QR factorization (RRQR) and it yields a rank-r approximation to A given by
A ≈ QL [RTL RTR] P.
The magnitudes of the diagonal elements of R are typically chosen to be in decreasing order and some criterion is used to declare the matrix of rank r, at which point RBR can be set to zero. Computing the RRQR is commonly called the QR factorization with column pivoting (QRP), for reasons that will become apparent.
Chapter 7. Notes on Rank Revealing Householder QR Factorization 144
7.1.2 Outline
Building on the last exercise, we make an important observation that greatly reduces the cost of determining the column that is longest. Let us start by computing v as the vector such that the ith entry in v equals the square of the length of the ith column of A. In other words, the ith entry of v equals the dot product of the ith column of A with itself. In the above outline for the MGS with column pivoting, we can then also partition
v → [ν1; v2].
The question becomes how v2 before the update A2 := A2 − q1 r12^T compares to v2 after that update. The answer is that the ith entry of v2 must be updated by subtracting off the square of the ith entry of r12^T.
Let us introduce the functions v = ComputeWeights(A) and v = UpdateWeights(v, r) to compute the described weight vector v and to update a weight vector v by subtracting from its elements the squares of the corresponding entries of r. Also, the function DeterminePivot returns the index of the largest entry in the vector, and swaps that entry with the first entry. An MGS algorithm with column pivoting, MGSP, is then given in Figure 7.1. In that algorithm, A is overwritten with Q.
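A hedged Python/NumPy sketch of MGSP (the function name is ours; ComputeWeights, DeterminePivot, and UpdateWeights appear inline as commented steps):

```python
import numpy as np

def mgsp(A):
    # MGS with column pivoting: v holds the squared column lengths of the
    # not-yet-factored part and is downdated by the squares of r12.
    A = np.array(A, dtype=float); m, n = A.shape
    Q = np.zeros((m, n)); R = np.zeros((n, n)); p = np.arange(n)
    v = np.sum(A * A, axis=0)                  # ComputeWeights
    for k in range(n):
        j = k + int(np.argmax(v[k:]))          # DeterminePivot
        A[:, [k, j]] = A[:, [j, k]]            # swap columns of A ...
        R[:k, [k, j]] = R[:k, [j, k]]          # ... and of the computed R
        v[[k, j]] = v[[j, k]]
        p[[k, j]] = p[[j, k]]
        R[k, k] = np.linalg.norm(A[:, k])      # rho11 := ||a1||_2
        Q[:, k] = A[:, k] / R[k, k]            # q1 := a1 / rho11
        R[k, k+1:] = Q[:, k] @ A[:, k+1:]      # r12^T := q1^T A2
        A[:, k+1:] -= np.outer(Q[:, k], R[k, k+1:])
        v[k+1:] -= R[k, k+1:] ** 2             # UpdateWeights
    return Q, R, p
```

Note that only the squared norms are downdated; no column norm is ever recomputed from scratch, which is the cost saving the text describes.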
(I − U T^{−H} U^H) Â = R,
where I − U T^{−H} U^H = Q^H, Â represents the original contents of A, U is the matrix of Householder vectors, T is computed from U as described in Section 6.6.1, and R is upper triangular.
Now, partially into the computation we find ourselves in the state where
( I − [UTL; UBL] TTL^{−H} [UTL; UBL]^H ) [ÂTL ÂTR; ÂBL ÂBR] = [RTL RTR; 0 ÃBR].
Repartition
[AL AR] → [A0 a1 A2], [RTL RTR; RBL RBR] → [R00 r01 R02; r10^T ρ11 r12^T; R20 r21 R22],
[pT; pB] → [p0; π1; p2], [vT; vB] → [v0; ν1; v2]
where a1 has 1 column, ρ11 is 1 × 1, π1 has 1 row, ν1 has 1 row

[ [ν1; v2], π1 ] = DeterminePivot([ν1; v2])
[A0 a1 A2] := [A0 a1 A2] P(π1)^T
ρ11 := ka1 k2
q1 := a1 /ρ11
r12^T := q1^H A2
A2 := A2 − q1 r12^T
v2 := UpdateWeights(v2, r12)

Continue with
· · ·
endwhile
Repartition
[ATL ATR; ABL ABR] → [A00 a01 A02; a10^T α11 a12^T; A20 a21 A22], [tT; tB] → [t0; τ1; t2],
[pT; pB] → [p0; π1; p2], [vT; vB] → [v0; ν1; v2]
where α11 and τ1 are scalars

[ [ν1; v2], π1 ] = DeterminePivot([ν1; v2])
[a01 A02; α11 a12^T; a21 A22] := [a01 A02; α11 a12^T; a21 A22] P(π1)^T
[ [α11; a21], τ1 ] := Housev([α11; a21])
w12^T := (a12^T + a21^H A22)/τ1
[a12^T; A22] := [a12^T − w12^T; A22 − a21 w12^T]
v2 := UpdateWeights(v2, a12)

Continue with
· · ·
endwhile
Figure 7.2: Simple unblocked rank revealing QR factorization via Householder transformations (HQRP, unblocked Variant 1).
by which we mean that the Householder vectors computed so far are stored below the diagonal of RTL.
Let us manipulate this a bit:
[ÂTL ÂTR; ÂBL ÂBR] − [UTL; UBL] TTL^{−H} [UTL; UBL]^H [ÂTL ÂTR; ÂBL ÂBR] = [RTL RTR; 0 ÃBR].
Naming the middle factor [WTL WTR] = TTL^{−H} [UTL; UBL]^H [ÂTL ÂTR; ÂBL ÂBR], this becomes
[ÂTL ÂTR; ÂBL ÂBR] − [UTL WTL UTL WTR; UBL WTL UBL WTR] = [RTL RTR; 0 ÃBR],
from which we find that ÃBR = ÂBR − UBL WTR.
Now, what if we do not bother to update ABR, so that instead at this point in the computation A contains
[U\RTL RTR; UBL ÂBR],
but we also have computed and stored WTR? How would we be able to compute another Householder vector, row of R, and row of W?
Repartition
[ÂTL ÂTR; ÂBL ÂBR] − [UTL; UBL] [WTL WTR] = [RTL RTR; 0 ÃBR]
so that
[Â00 â01 Â02; â10^T α̂11 â12^T; Â20 â21 Â22] − [U00; u10^T; U20] [W00 w01 W02] = [R00 r01 R02; 0 α̃11 ã12^T; 0 ã21 Ã22].
What does this mean? To update the next rows of A and R according to the QR factorization via Householder transformations:
• The Householder vector can then be computed, updating α11 and a21;
• w12^T := (a12^T + u21^H A22 − (u21^H U20) W02)/τ (= (a12^T + u21^H (A22 − U20 W02))/τ = (a12^T + u21^H Ã22)/τ). This computes the next row of W (or, more precisely, the part of that row that will be used in future iterations); and
• a12^T := a12^T − w12^T. This computes the remainder of the next row of R, overwriting A.
The resulting algorithm, to which column pivoting is added, is presented in Figure 7.3. In that figure
HQRP UNB VAR 1 is also given for comparison. An extra parameter, r, is added to stop the process after
r columns of A have been completely computed. This will allow the algorithm to be used to compute only
the first r columns of Q (or rather the Householder vectors from which those columns can be computed)
and will also allow it to be incorporated into a blocked algorithm, as we will discuss next.
7.5 Computing Q
Given the Householder vectors computed as part of any of the HQRP algorithms, any number of desired
columns of matrix Q can be computed using the techniques described in Section 6.4.
7.6 Enrichments
Repartition
[ATL ATR; ABL ABR] → [A00 a01 A02; a10^T α11 a12^T; A20 a21 A22], [tT; tB] → [t0; τ1; t2], [pT; pB] → [p0; π1; p2],
[vT; vB] → [v0; ν1; v2], [WTL WTR; WBL WBR] → [W00 w01 W02; w10^T ω11 w12^T; W20 w21 W22]
where α11 and τ1 are scalars

Continue with
· · ·
endwhile
Figure 7.3: HQRP unblocked Variant 3 that updates r rows and columns of A, but leaves the trailing matrix ABR pristine. Here v has already been initialized before calling the routine so that it can be called from a blocked algorithm. The HQRP unblocked Variant 1 from Figure 7.2 is also given for reference.
[ [A11 A12; A21 A22], t1, p1, [v1; v2], [W1 W2] ] :=
HQRP_unb_var3( [A11 A12; A21 A22], t1, p1, [v1; v2], [W1 W2], b )
A22 := A22 − U2 W2

Continue with
[ATL ATR; ABL ABR] ← [A00 A01 A02; A10 A11 A12; A20 A21 A22], [tT; tB] ← [t0; t1; t2],
[pT; pB] ← [p0; p1; p2], [vT; vB] ← [v0; v1; v2], [WL WR] ← [W0 W1 W2]
endwhile
Figure 7.4: Blocked HQRP algorithm. Note: W starts as a b × n(A) matrix. If a uniform block size is not used throughout the computation, resizing may be necessary.
7.7 Wrapup
7.7.2 Summary
Chapter 8
Notes on Solving Linear Least-Squares
Problems
For a motivation of the linear least-squares problem, read Week 10 (Sections 10.3-10.5) of Linear Algebra: Foundations to Frontiers - Notes to LAFF With.
Video
Read disclaimer regarding the videos in the preface!
8.0.1 Launch
Chapter 8. Notes on Solving Linear Least-Squares Problems 156
8.0.2 Outline
In other words, x is the vector that minimizes the expression kAx − yk2. Equivalently, we can solve: Find x ∈ Cn such that kAx − yk2 = min_z kAz − yk2.
If x solves the linear least-squares problem, then Ax is the vector in C (A) (the column space of A) closest
to the vector y.
If A has linearly independent columns then AT A is nonsingular. Hence, the x that minimizes kAx − yk2
solves AT Ax = AT y. This is known as the method of normal equations. Notice that then
x = (A^T A)^{−1} A^T y,
where (A^T A)^{−1} A^T is often denoted by A†, the (left) pseudo-inverse of A.
• Form B = A^T A and compute its Cholesky factorization B = LL^T.
Cost: approximately mn^2 flops to form B (taking advantage of symmetry) and (1/3)n^3 flops for the factorization.
• Compute ŷ = A^T y.
Cost: 2mn flops.
• Solve Lz = ŷ and L^T x = z.
Cost: n^2 flops each.
8.3. Solving the LLS Problem Via the QR Factorization 159
Thus, the total cost of solving the LLS problem via normal equations is approximately mn^2 + (1/3)n^3 flops.
Remark 8.1 We will later discuss that if A is not well-conditioned (its columns are nearly linearly depen-
dent), the Method of Normal Equations is numerically unstable because AT A is ill-conditioned.
The above discussion can be generalized to the case where A ∈ Cm×n . In that case, x must solve AH Ax =
AH y.
A geometric explanation of the method of normal equations (for the case where A is real valued) can
be found in Week 10 (Sections 10.3-10.5) of Linear Algebra: Foundations to Frontiers - Notes to LAFF
With.
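The steps of the Method of Normal Equations can be sketched in a few lines of Python/NumPy (real case; the data is random and for illustration only):

```python
import numpy as np

# Solve min_x ||Ax - y||_2 via the normal equations:
# form A^T A, Cholesky-factor it, then two triangular solves.
rng = np.random.default_rng(4)
m, n = 8, 3
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

L = np.linalg.cholesky(A.T @ A)   # A^T A = L L^T
yhat = A.T @ y
z = np.linalg.solve(L, yhat)      # forward substitution in practice
x = np.linalg.solve(L.T, z)       # back substitution in practice
```

In a production code the two triangular systems would be solved by forward and back substitution rather than a general solver, matching the n^2-flops-each cost quoted above.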
Assume A ∈ Cm×n has linearly independent columns and let A = QL RT L be its QR factorization. We wish
to compute the solution to the LLS problem: Find x ∈ Cn such that
Notice that we know that, if A has linearly independent columns, the solution is given by x = (AH A)−1 AH y
(the solution to the normal equations). Now,
Thus, the desired x that minimizes the linear least-squares problem solves RTL x = QL^H y. The solution is unique because RTL is nonsingular (because A has linearly independent columns).
In practice, one performs the following steps:
• Compute the QR factorization A = QL RT L .
If Gram-Schmidt or Modified Gram-Schmidt are used, this costs 2mn2 flops.
• Form ŷ = QL^H y.
Cost: 2mn flops.
• Solve RT L x = ŷ (triangular solve).
Cost: n2 flops.
Thus, the total cost of solving the LLS problem via (Modified) Gram-Schmidt QR factorization is approx-
imately 2mn2 flops.
Notice that the solution computed by the Method of Normal Equations (generalized to the complex case) is given by
(A^H A)^{−1} A^H y = ((QL RTL)^H (QL RTL))^{−1} (QL RTL)^H y = (RTL^H QL^H QL RTL)^{−1} RTL^H QL^H y
= (RTL^H RTL)^{−1} RTL^H QL^H y = RTL^{−1} RTL^{−H} RTL^H QL^H y = RTL^{−1} QL^H y = RTL^{−1} ŷ = x,
where RT L x = ŷ. This shows that the two approaches compute the same solution, generalizes the Method of
Normal Equations to complex valued problems, and shows that the Method of Normal Equations computes
the desired result without requiring multivariate calculus.
8.4. Via Householder QR Factorization 161
Given A ∈ Cm×n with linearly independent columns, the Householder QR factorization yields n Householder transformations, H0, . . . , Hn−1, so that
Hn−1 · · · H0 A = [RTL; 0],
where Hn−1 · · · H0 = Q^H. Partitioning Q = [QL QR], we find
ŷ = QL^H y = [I 0] [QL^H; QR^H] y = [I 0] Q^H y = [I 0] (Hn−1 · · · H0) y = wT,
where w = [wT; wB]. In practice, one performs the following steps:
• Form w = Hn−1(· · · (H0 y) · · ·) (see “Notes on Householder QR Factorization”). Partition w = [wT; wB] where wT ∈ Cn. Then ŷ = wT.
Cost: 4mn − 2n^2 flops. (See “Notes on Householder QR Factorization” regarding this.)
• Solve RTL x = ŷ.
Cost: n^2 flops.
Thus, the total cost of solving the LLS problem via Householder QR factorization is approximately 2mn^2 − (2/3)n^3 flops. This is cheaper than using (Modified) Gram-Schmidt QR factorization, and hence preferred (because it is also numerically more stable, as we will discuss later in the course).
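A minimal Python/NumPy sketch of this approach (numpy.linalg.qr stands in for the Householder factorization, since that is what LAPACK uses underneath):

```python
import numpy as np

# Solve min_x ||Ax - y||_2 via Householder QR: w = Q^T y, then a
# triangular solve with the leading n x n block R_TL of R.
rng = np.random.default_rng(5)
m, n = 8, 3
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

Q, R = np.linalg.qr(A, mode='complete')  # Householder-based in LAPACK
w = Q.T @ y                              # w_T = first n entries = yhat
x = np.linalg.solve(R[:n, :n], w[:n])    # R_TL x = yhat
```

A practical code would apply the stored Householder transformations to y directly (as in Chapter 6) instead of forming Q explicitly.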
Given A ∈ Cm×n with linearly independent columns, let A = UΣV^H be its SVD. Partition
U = [UL UR] and Σ = [ΣTL; 0],
so that
A = [UL UR] [ΣTL; 0] V^H = UL ΣTL V^H.
We wish to compute the solution to the LLS problem: Find x ∈ Cn such that
Notice that we know that, if A has linearly independent columns, the solution is given by x = (AH A)−1 AH y
(the solution to the normal equations). Now,
x = (A^H A)^{−1} A^H y = (V ΣTL UL^H UL ΣTL V^H)^{−1} V ΣTL UL^H y
= V ΣTL^{−1} ΣTL^{−1} V^H V ΣTL UL^H y    (V^{−1} = V^H and (BCD)^{−1} = D^{−1} C^{−1} B^{−1})
= V ΣTL^{−1} UL^H y    (V^H V = I and ΣTL^{−1} ΣTL = I)
8.6. What If A Does Not Have Linearly Independent Columns? 163
The x that solves ΣTL V^H x = UL^H y minimizes the expression. That x is given by x = V ΣTL^{−1} UL^H y.
This suggests the following approach:
• Form ŷ = ΣTL^{−1} UL^H y.
Cost: approximately 2mn flops.
• Compute x = V ŷ.
Cost: approximately 2n^2 flops.
Now,
x = VL ΣTL^{−1} UL^H y + VR wB
characterizes all solutions to the linear least-squares problem, where wB can be chosen to be any vector of size n − r. By choosing wB = 0, and hence x = VL ΣTL^{−1} UL^H y, we choose the vector x that itself has minimal 2-norm.
The sequence of pictures on the following pages reasons through the insights that we gained so far
(in “Notes on the Singular Value Decomposition” and this note). These pictures can be downloaded as a
PowerPoint presentation from
http://www.cs.utexas.edu/users/flame/Notes/Spaces.pptx
Chapter 8. Notes on Solving Linear Least-Squares Problems 166
[Figure: the four fundamental subspaces of A. The row space C(VL) (dim = r) and null space C(VR) (dim = n − r) partition the domain; the column space C(UL) (dim = r) and left null space C(UR) (dim = m − r) partition the codomain. A vector x = xr + xn is mapped to Ax = ŷ.]
Here A ∈ Cm×n with
A = [UL UR] [ΣTL 0; 0 0] [VL VR]^H = UL ΣTL VL^H.
Also, given a vector x ∈ Cn, the matrix A maps x to ŷ = Ax, which must be in C(A) = C(UL).
[Figure: the same picture, now for an arbitrary y ∈ Cm that does not lie in the column space C(UL).]
Given an arbitrary y ∈ Cm not in C(A) = C(UL), there cannot exist a vector x ∈ Cn such that Ax = y.
[Figure: the projection ŷ = UL UL^H y of y onto the column space; xr = VL ΣTL^{−1} UL^H y lies in the row space and satisfies Axr = UL ΣTL VL^H xr = ŷ; xn = VR wB lies in the null space, so that x = xr + xn solves Ax = ŷ.]
The solution to the Linear Least-Squares problem, x, equals any vector that is mapped by A to the projection of y onto the column space of A: ŷ = A(A^H A)^{−1} A^H y = QQ^H y. This solution can be written as the sum of a (unique) vector in the row space of A and any vector in the null space of A. The vector in the row space of A is given by
xr = VL ΣTL^{−1} UL^H y = VL ΣTL^{−1} UL^H ŷ.
The sequence of pictures, and their explanations, suggests a much simpler path towards the formula for solving the LLS problem.
• We know that the projection of y onto the column space of A is ŷ = UL UL^H y, so we must solve
Ax = UL UL^H y.
• We know that there must be a solution xr in the row space of A. It suffices to find wT such that xr = VL wT. Substituting this in yields
AVL wT = UL UL^H y.
• Since there is a one-to-one mapping by A from the row space of A to the column space of A, we know that wT is unique. Thus, if we find a solution to the above, we have found the solution.
• Since A = UL ΣTL VL^H, we can rewrite the above equation as
ΣTL wT = UL^H y
so that wT = ΣTL^{−1} UL^H y.
• Hence
xr = VL ΣTL^{−1} UL^H y.
• Adding any vector in the null space of A to xr also yields a solution. Hence all solutions to the LLS problem can be characterized by
x = VL ΣTL^{−1} UL^H y + VR wR.
• Thus, by expressing x in the right basis (the columns of VL) and the projection of y in the right basis (the columns of UL), the problem became trivial: the system ΣTL wT = UL^H y that relates the solution to the right-hand side is diagonal.
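The minimum-norm solution xr = VL ΣTL^{−1} UL^H y is easy to form once the reduced SVD is available; the following Python/NumPy sketch builds a rank-deficient A for illustration (all names are ours):

```python
import numpy as np

# Minimum-norm LLS solution for a rank-deficient A via the reduced SVD:
# x_r = V_L Sigma_TL^{-1} U_L^T y (the solution with w_B = 0).
rng = np.random.default_rng(6)
m, n, r = 8, 5, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank r
y = rng.standard_normal(m)

U, s, Vh = np.linalg.svd(A)
UL, sTL, VL = U[:, :r], s[:r], Vh[:r, :].T
xr = VL @ ((UL.T @ y) / sTL)
```

Since A maps xr to the projection of y onto C(A), the result matches the minimum-norm solution returned by a library least-squares solver.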
You may want to review “Notes on the QR Factorization” as you do this exercise.
Homework 8.2 Let A ∈ Cm×n with m < n have linearly independent rows. Show that there exist a lower triangular matrix LL ∈ Cm×m and a matrix QT ∈ Cm×n with orthonormal rows such that A = LL QT, noting that LL does not have any zeroes on the diagonal. Letting L = [LL 0] ∈ Cm×n and unitary Q = [QT; QB], reason that A = LQ.
Don’t overthink the problem: use results you have seen before.
* SEE ANSWER
Homework 8.3 Let A ∈ Cm×n with m < n have linearly independent rows. Consider
kAx − yk2 = min_z kAz − yk2.
Use the fact that A = LL QT, where LL ∈ Cm×m is lower triangular and QT has orthonormal rows, to argue that any vector of the form QT^H LL^{−1} y + QB^H wB (where wB is any vector in Cn−m) is a solution to the LLS problem. Here Q = [QT; QB].
* SEE ANSWER
Homework 8.4 Continuing Exercise 8.2, use Figure 8.1 to give a Classical Gram-Schmidt inspired algorithm for computing LL and QT. (The best way to check you got the algorithm right is to implement it!)
* SEE ANSWER
Homework 8.5 Continuing Exercise 8.2, use Figure 8.2 to give a Householder QR factorization inspired
algorithm for computing L and Q, leaving L in the lower triangular part of A and Q stored as Householder
vectors above the diagonal of A. (The best way to check you got the algorithm right is to implement it!)
* SEE ANSWER
8.8 Wrapup
8.8.2 Summary
Repartition
[AT; AB] → [A0; a1^T; A2], [LTL LTR; LBL LBR] → [L00 l01 L02; l10^T λ11 l12^T; L20 l21 L22], [QT; QB] → [Q0; q1^T; Q2]
where a1 has 1 row, λ11 is 1 × 1, q1 has 1 row

Continue with
[AT; AB] ← [A0; a1^T; A2], [LTL LTR; LBL LBR] ← [L00 l01 L02; l10^T λ11 l12^T; L20 l21 L22], [QT; QB] ← [Q0; q1^T; Q2]
endwhile
Repartition
[ATL ATR; ABL ABR] → [A00 a01 A02; a10^T α11 a12^T; A20 a21 A22], [tT; tB] → [t0; τ1; t2]
where α11 is 1 × 1, τ1 has 1 row

Continue with
[ATL ATR; ABL ABR] ← [A00 a01 A02; a10^T α11 a12^T; A20 a21 A22], [tT; tB] ← [t0; τ1; t2]
endwhile
9.1.1 Launch
Correctness in the presence of error (e.g., when floating point computations are performed) takes on a
different meaning. For many problems for which computers are used, there is one correct answer, and
we expect that answer to be computed by our program. The problem is that, as we will see later, most
real numbers cannot be stored exactly in a computer memory. They are stored as approximations, floating
point numbers, instead. Hence storing them and/or computing with them inherently incurs error.
Naively, we would like to be able to define a program that computes with floating point numbers as
being “correct” if it computes an answer that is close to the exact answer. Unfortunately, some problems
that are computed this way have the property that a small change in the input yields a large change in the
output. Surely we can’t blame the program for not computing an answer close to the exact answer in this
case. The mere act of storing the input data as a floating point number may cause a completely different
output, even if all computation is exact. We will later define stability to be a property of a program. It
is what takes the place of correctness. In this note, we instead will focus on when a problem is a “good” problem, meaning that in exact arithmetic a “small” change in the input will always cause at most a “small” change in the output, or a “bad” problem, for which a “small” change may yield a “large” change. A good problem will be called well-conditioned. A bad problem will be called ill-conditioned.
Notice that “small” and “large” are vague. To some degree, norms help us measure size. To some
degree, “small” and “large” will be in the eyes of the beholder (in other words, situation dependent).
Video
Read disclaimer regarding the videos in the preface!
Chapter 9. Notes on the Condition of a Problem 174
9.1.2 Outline
9.2 Notation
Throughout this note, we will talk about small changes (perturbations) to scalars, vectors, and matrices.
To denote these, we attach a “delta” to the symbol for a scalar, vector, or matrix.
Notice that the “delta” touches the α, x, and A, so that, for example, δx is not mistaken for δ · x.
If Ax = y, then
kyk = kAxk ≤ kAk kxk or, equivalently, 1/kxk ≤ kAk (1/kyk).    (9.1)
Also,
A(x + δx) = y + δy
together with Ax = y implies that A δx = δy, so that δx = A^{−1} δy. Hence
kδxk = kA^{−1} δyk ≤ kA^{−1}k kδyk.    (9.2)
Combining (9.1) and (9.2) we conclude that
kδxk kδyk
≤ kAkkA−1 k .
kxk kyk
What does this mean? It means that the relative error in the solution is at worst the relative error in the right-hand side, amplified by kAk kA^{−1}k. So, if that quantity is “small” and the relative error in the right-hand side is “small” and exact arithmetic is used, then one is guaranteed a solution with a relatively “small” error.
The quantity κk·k (A) = kAkkA−1 k is called the condition number of nonsingular matrix A (associated
with norm k · k).
9.3. The Prototypical Example: Solving a Linear System 177
Are we overestimating by how much the relative error can be amplified? The answer to this is
no. For every nonsingular matrix A, there exists a right-hand side y and perturbation δy such that, if
A(x + δx) = y + δy,
kδxk kδyk
= kAkkA−1 k .
kxk kyk
In order for this equality to hold, we need to find y and δy such that
kyk = kAxk = kAk kxk or, equivalently, kAk = kAxk / kxk
and
kδxk = kA^{−1} δyk = kA^{−1}k kδyk or, equivalently, kA^{−1}k = kA^{−1} δyk / kδyk.
In other words, x can be chosen as a vector that maximizes kAxk/kxk and δy should maximize kA−1 δyk/kδyk.
The vector y is then chosen as y = Ax.
What if we use the 2-norm? For this norm, κ2 (A) = kAk2 kA−1 k2 = σ0 /σn−1 . So, the ratio between the
largest and smallest singular value determines whether a matrix is well-conditioned or ill-conditioned.
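The worst case is easy to exhibit numerically: choose x along the right singular vector v0 (so that kAxk2 = σ0 kxk2) and perturb y along un−1 (the direction most magnified by A^{−1}). A small Python/NumPy sketch:

```python
import numpy as np

# The bound ||dx||/||x|| <= kappa_2(A) ||dy||/||y|| is attained when x
# maximizes ||Ax||/||x|| and dy maximizes ||A^{-1} dy||/||dy||.
rng = np.random.default_rng(7)
A = rng.standard_normal((4, 4))
U, s, Vt = np.linalg.svd(A)

x = Vt[0]                  # right singular vector v0: ||Ax|| = sigma_0 ||x||
y = A @ x
dy = 1e-8 * U[:, -1]       # direction most magnified by A^{-1}
dx = np.linalg.solve(A, dy)

amplification = (np.linalg.norm(dx) / np.linalg.norm(x)) / \
                (np.linalg.norm(dy) / np.linalg.norm(y))
# amplification equals kappa_2(A) = sigma_0 / sigma_{n-1}
```

Up to rounding error, the measured amplification matches σ0/σn−1 exactly, confirming that the bound is sharp.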
To show for what vectors the maximal magnification is attained, consider the SVD
A = UΣV^H = [u0 u1 · · · un−1] diag(σ0, σ1, . . . , σn−1) [v0 v1 · · · vn−1]^H.
Recall that
• kAk2 = σ0, v0 is the vector that maximizes max_{kzk2=1} kAzk2, and Av0 = σ0 u0; and
• kA^{−1}k2 = 1/σn−1, un−1 is the vector that maximizes max_{kzk2=1} kA^{−1}zk2, and Avn−1 = σn−1 un−1.
[Figure: on the left, the unit circle and the vectors x and δx in R2, the domain of A; on the right, its image, an ellipse, with y and δy in R2, the codomain of A.]
Figure 9.1: Illustration for choices of vectors y and δy that result in kδxk2/kxk2 = (σ0/σn−1) (kδyk2/kyk2). Because of the eccentricity of the ellipse, the relatively small change δy relative to y is amplified into a relatively large change δx relative to x. On the right, we see that kδyk2/kyk2 = βσ1/σ0 (since ku0 k2 = ku1 k2 = 1). On the left, we see that kδxk2/kxk2 = β (since kv0 k2 = kv1 k2 = 1).
Homework 9.1 Show that, if A is a nonsingular matrix, for a consistent matrix norm, κ(A) ≥ 1.
* SEE ANSWER
We conclude from this that we can generally only expect as much relative accuracy in the solution as
we had in the right-hand side.
Alternative exposition Note: the below links conditioning of matrices to the relative condition number
of a more general function. For a more thorough treatment, you may want to read Lecture 12 of “Trefethen
and Bau”. That book discusses the subject in much more generality than is needed for our discussion of
linear algebra. Thus, if this alternative exposition baffles you, just skip it!
Let f : Rn → Rm be a continuous function such that f(y) = x. Let k · k be a vector norm. Consider for y ≠ 0
κf(y) = lim_{δ→0} sup_{kδyk ≤ δ} ( k f(y + δy) − f(y)k / k f(y)k ) / ( kδyk / kyk ).
(Obviously, if δy = 0 or y = 0 or f(y) = 0, things get a bit hairy, so let’s not allow that.)
Roughly speaking, κf(y) equals the maximum by which a(n infinitesimally) small relative error in y is magnified into a relative error in f(y). This can be considered the relative condition number of function f. A large relative condition number means a small relative error in the input (y) can be magnified into a large relative error in the output (x = f(y)). This is bad, since small errors will invariably occur.
Now, if f (y) = x is the function that returns x where Ax = y for a nonsingular matrix A ∈ Cn×n , then
via an argument similar to what we did earlier in this section we find that κ f (y) ≤ κ(A) = kAkkA−1 k, the
condition number of matrix A:
    κ_f(y) = lim_{δ→0} sup_{‖δy‖≤δ} ( ‖f(y + δy) − f(y)‖ / ‖f(y)‖ ) / ( ‖δy‖ / ‖y‖ )

           = lim_{δ→0} sup_{‖δy‖≤δ} ( ‖A⁻¹(y + δy) − A⁻¹y‖ / ‖A⁻¹y‖ ) / ( ‖δy‖ / ‖y‖ )

           = lim_{δ→0} max_{‖z‖=1, δy=δ·z} ( ‖A⁻¹δy‖ / ‖A⁻¹y‖ ) / ( ‖δy‖ / ‖y‖ )

           = lim_{δ→0} max_{‖z‖=1, δy=δ·z} ( ‖A⁻¹δy‖ / ‖δy‖ ) ( ‖y‖ / ‖A⁻¹y‖ )
Chapter 9. Notes on the Condition of a Problem 180
           = lim_{δ→0} max_{‖z‖=1} ( ‖A⁻¹(δ·z)‖ / ‖δ·z‖ ) ( ‖y‖ / ‖A⁻¹y‖ )

           = max_{‖z‖=1} ( ‖A⁻¹z‖ / ‖z‖ ) ( ‖y‖ / ‖A⁻¹y‖ )

           = max_{‖z‖=1} ( ‖A⁻¹z‖ / ‖z‖ ) ( ‖Ax‖ / ‖x‖ )

           ≤ max_{‖z‖=1} ( ‖A⁻¹z‖ / ‖z‖ ) max_{x≠0} ( ‖Ax‖ / ‖x‖ )

           = ‖A‖ ‖A⁻¹‖,

where ‖·‖ is the matrix norm induced by the vector norm ‖·‖.
Figure 9.2: Linear least-squares problem ‖Ax − y‖₂ = min_v ‖Av − y‖₂. Vector z is the projection of y onto C(A).
and hence

    1/‖x‖₂ ≤ ‖A‖₂ / ( cos θ ‖y‖₂ ).

We conclude that

    ‖δx‖₂/‖x‖₂ ≤ ( ‖A‖₂ ‖(Aᴴ A)⁻¹Aᴴ‖₂ / cos θ ) ( ‖δy‖₂/‖y‖₂ ) = ( 1 / cos θ )( σ₀ / σ_{n−1} ) ( ‖δy‖₂/‖y‖₂ ),

where σ₀ and σ_{n−1} are (respectively) the largest and smallest singular values of A, because of the following
result:
Homework 9.2 If A has linearly independent columns, show that k(AH A)−1 AH k2 = 1/σn−1 , where σn−1
equals the smallest singular value of A. Hint: Use the SVD of A.
* SEE ANSWER
The condition number of A ∈ Cm×n with linearly independent columns is κ2 (A) = σ0 /σn−1 .
Notice the effect of the cos θ. When y is almost perpendicular to C(A), then its projection z is small
and cos θ is small. Hence a small relative change in y can be greatly amplified. This makes sense: if y is
almost perpendicular to C(A), then x ≈ 0, and any small δy ∈ C(A) can yield a relatively large change δx.
• Reason that using the method of normal equations to solve Ax = y has a condition number of κ2 (A)2 .
* SEE ANSWER
Let A ∈ Cm×n have linearly independent columns. If one uses the Method of Normal Equations to solve
the linear least-squares problem minx kAx−yk2 , one ends up solving the square linear system AH Ax = AH y.
Now, κ2 (AH A) = κ2 (A)2 . Hence, using this method squares the condition number of the matrix being used.
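The squaring is easiest to see for a diagonal matrix, whose singular values are the magnitudes of its diagonal entries. A minimal sketch (the example values are ours):

```python
# Singular values of a diagonal matrix A can be read off its diagonal,
# and the singular values of A^T A are their squares.
sigma = [1.0, 1e-3]                  # singular values of A
kappa_A = max(sigma) / min(sigma)    # kappa_2(A)
gram = [s * s for s in sigma]        # singular values of A^T A
kappa_AtA = max(gram) / min(gram)    # kappa_2(A^T A) = kappa_2(A)^2
print(kappa_A, kappa_AtA)
```

Forming the normal equations thus turns a mildly ill-conditioned problem (κ₂ = 10³) into a much worse one (κ₂ = 10⁶).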
Thus,
    ‖ΔC‖₂ = ‖A ΔB‖₂ ≤ ‖A‖₂ ‖ΔB‖₂.

Also, B = A⁻¹C so that

    ‖B‖₂ = ‖A⁻¹C‖₂ ≤ ‖A⁻¹‖₂ ‖C‖₂

and hence

    1/‖C‖₂ ≤ ‖A⁻¹‖₂ / ‖B‖₂.

Thus,

    ‖ΔC‖₂/‖C‖₂ ≤ ‖A‖₂ ‖A⁻¹‖₂ ‖ΔB‖₂/‖B‖₂ = κ₂(A) ‖ΔB‖₂/‖B‖₂.

This means that the relative error in matrix C = AB is at most κ₂(A) times greater than the relative error in B.
The following exercise gives us a hint as to why algorithms that cast computation in terms of multipli-
cation by unitary matrices avoid the buildup of error:
This means that the relative error in matrix C = UB is no greater than the relative error in B when U is
unitary.
Homework 9.6 Characterize the set of all square matrices A with κ2 (A) = 1.
* SEE ANSWER
κ2 (A) = 5.
Now, let us change the problem to
κ2 (B) = 2600.
So what is going on?
One way to look at this is that matrix B is close to singular: If only three significant digits are stored,
then compared to 3000 and 2000 the values 2 and 3 are not that different from 0. So in that case
$$
\begin{pmatrix} 2 & 3000 \\ 3 & 2000 \end{pmatrix} \approx \begin{pmatrix} 0 & 3000 \\ 0 & 2000 \end{pmatrix},
$$
Also
Ax = b
can be changed to
$$
\underbrace{A D_R}_{B}\,\underbrace{D_R^{-1} x}_{z} = b.
$$
What this in turn shows is that sometimes a poorly conditioned matrix can be transformed into a well-
conditioned matrix (or, at least, a better conditioned matrix). When presented with Ax = b, a procedure
for solving for x may want to start by balancing the norms of rows and columns of A:
$$
\underbrace{D_L A D_R^{-1}}_{B}\,\underbrace{D_R x}_{z} = \underbrace{D_L b}_{\hat b},
$$

where D_L and D_R are diagonal matrices chosen so that κ(B) is smaller than κ(A). This is known as
balancing the matrix. The example also shows how this can be thought of as picking the units in terms of
which the entries of x and b are expressed carefully so that large values are not artificially introduced into
the matrix.
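The effect of balancing can be checked directly on the example above. A small Python sketch (ours; it uses the ∞-norm condition number, computed from the explicit 2×2 inverse):

```python
def cond_inf(a, b, c, d):
    """Infinity-norm condition number of the 2x2 matrix [[a, b], [c, d]]:
    ||A||_inf * ||A^{-1}||_inf, with A^{-1} = [[d, -b], [-c, a]] / det."""
    det = a * d - b * c
    norm_a = max(abs(a) + abs(b), abs(c) + abs(d))
    norm_ainv = max(abs(d) + abs(b), abs(c) + abs(a)) / abs(det)
    return norm_a * norm_ainv

# The matrix from the example, before and after scaling its second column
# by 1/1000 (i.e., B = A * D_R with D_R = diag(1, 1/1000)):
print(cond_inf(2.0, 3000.0, 3.0, 2000.0))  # 3002.0
print(cond_inf(2.0, 3.0, 3.0, 2.0))        # 5.0
```

Rescaling one column shrinks the (∞-norm) condition number from a few thousand to 5, consistent with the κ₂ values quoted above.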
9.9 Wrapup
9.9.2 Summary
Chapter 10
Notes on the Stability of an Algorithm
Based on “Goal-Oriented and Modular Stability Analysis” [7, 8] by Paolo Bientinesi and Robert van de
Geijn.
Video
Read disclaimer regarding the videos in the preface!
* YouTube
* Download from UT Box
* View After Local Download
10.0.1 Launch
10.0.2 Outline
Figure 10.1: In this illustration, f : D → R is a function to be evaluated. The function fˇ represents the
implementation of the function that uses floating point arithmetic, thus incurring errors. The fact that
for a nearby value x̌ the computed value equals the exact function applied to that slightly perturbed input,
f(x̌) = fˇ(x), means that the error in the computation can be attributed to a small change in the input. If
this is true, then fˇ is said to be a (numerically) stable implementation of f for input x.
10.1 Motivation
Correctness in the presence of error (e.g., when floating point computations are performed) takes on a
different meaning. For many problems for which computers are used, there is one correct answer and we
expect that answer to be computed by our program. The problem is that most real numbers cannot be stored
exactly in a computer memory. They are stored as approximations, floating point numbers, instead. Hence
storing them and/or computing with them inherently incurs error. The question thus becomes “When is a
program correct in the presence of such errors?”
Let us assume that we wish to evaluate the mapping f : D → R where D ⊂ Rn is the domain and
R ⊂ Rm is the range (codomain). Now, we will let fˆ : D → R denote a computer implementation of this
function. Generally, for x ∈ D it is the case that f (x) 6= fˆ(x). Thus, the computed value is not “correct”.
From the Notes on Conditioning, we know that it may not be the case that fˆ(x) is “close to” f (x). After
all, even if fˆ is an exact implementation of f , the mere act of storing x may introduce a small error δx and
f (x + δx) may be far from f (x) if f is ill-conditioned.
The following defines a property that captures correctness in the presence of the kinds of errors that
are introduced by computer arithmetic:
Definition 10.1 Given the mapping f : D → R, where D ⊂ Rⁿ is the domain and R ⊂ Rᵐ is the range
(codomain), let fˆ : D → R be a computer implementation of this function. We will call fˆ a (numerically)
stable implementation of f on domain D if for all x ∈ D there exists an x̂ “close” to x such that fˆ(x) = f(x̂).
10.2 Floating Point Numbers
In other words, fˆ is a stable implementation if the error that is introduced is similar to that introduced
when f is evaluated with a slightly changed input. This is illustrated in Figure 10.1 for a specific input x.
If an implementation is not stable, it is numerically unstable.
• There is a largest number (in absolute value) that can be stored. Any number that is larger “over-
flows”. Typically, this causes a value that denotes a NaN (Not-a-Number) to be stored.
• There is a smallest number (in absolute value) that can be stored. Any number that is smaller
“underflows”. Typically, this causes a zero to be stored.
Notice that

    ‖x‖₂ ≤ √n · max_{i=0}^{n−1} |χᵢ|

and hence, unless some χᵢ is close to overflowing, the result will not overflow. The problem is that if
some element χᵢ has the property that χᵢ² overflows, intermediate results in the computation in (10.1) will
overflow. The solution is to determine k such that

    |χₖ| = max_{i=0}^{n−1} |χᵢ|

and to then compute ‖x‖₂ = |χₖ| ‖(1/χₖ)x‖₂, so that all intermediate values are bounded in magnitude by √n.
It can be argued that the same approach also avoids underflow if underflow can be avoided.
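In code, the scaling trick looks as follows (a minimal Python sketch; the helper name is ours):

```python
import math

def norm2_safe(x):
    """Two-norm computed by first scaling by the entry of largest magnitude,
    so that no intermediate chi_i**2 can overflow."""
    scale = max(abs(chi) for chi in x)
    if scale == 0.0:
        return 0.0
    return scale * math.sqrt(sum((chi / scale) ** 2 for chi in x))

# Each square here would overflow on its own, but the scaled sum does not:
print(norm2_safe([3.0 * 2.0**600, 4.0 * 2.0**600]))
```

Squaring either entry directly would exceed the largest representable double, while the scaled computation returns 5·2⁶⁰⁰ without incident.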
Now, in practice any base β can be used and floating point computation uses rounding rather than
truncating. A similar analysis can be used to show that then
    |δχ| ≤ u |χ|

where u = (1/2)β^{1−t} is known as the machine epsilon or unit roundoff. When using single precision or
double precision real arithmetic, u ≈ 10−8 or 10−16 , respectively. The quantity u is machine dependent; it
is a function of the parameters characterizing the machine arithmetic.
The unit roundoff is often alternatively defined as the largest positive floating point number u which
can be added to the number stored as 1 without changing the number stored as 1. In the notation
introduced below, fl(1 + u) = 1.
Homework 10.3 Assume a floating point number system with β = 2 and a mantissa with t digits so that a
typical positive number is written as .d0 d1 . . . dt−1 × 2e , with di ∈ {0, 1}.
• What is the largest positive real number u (represented as a binary fraction) such that the float-
ing point representation of 1 + u equals the floating point representation of 1? (Assume rounded
arithmetic.)
* SEE ANSWER
10.3 Notation
When discussing error analyses, we will distinguish between exact and computed quantities. The function
fl(expression) returns the result of the evaluation of expression, where every operation is executed in
floating point arithmetic. For example, assuming that the expressions are evaluated from left to right,
fl(χ + ψ + ζ/ω) is equivalent to fl(fl(fl(χ) + fl(ψ)) + fl(fl(ζ)/fl(ω))). Equality between the quantities lhs
and rhs is denoted by lhs = rhs. Assignment of rhs to lhs is denoted by lhs := rhs (lhs becomes rhs). In
the context of a program, the statements lhs := rhs and lhs := fl(rhs) are equivalent. Given an assignment
κ := expression, we use the notation κ̌ (pronounced “check kappa”) to denote the quantity resulting from
fl(expression), which is actually stored in the variable κ.
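The fl(·) notation can be simulated in software. A hedged Python sketch (the helper fl and the choice t = 8 are ours; β = 2 with rounding to nearest):

```python
import math

def fl(x, t=8):
    """Round x to the nearest floating point number with a t-bit mantissa,
    simulating the fl(.) notation for beta = 2 with rounding."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                       # x = m * 2**e, 0.5 <= |m| < 1
    return math.ldexp(round(m * 2.0**t), e - t)

# Evaluating fl(chi + psi) operation by operation:
chi, psi = 1.0, 1.0 / 3.0
print(fl(fl(chi) + fl(psi)))
```

Each application of fl introduces a relative error bounded by u = 2⁻ᵗ, consistent with |δχ| ≤ u|χ| above.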
The fact that the bounds that we establish can be easily converted into bounds involving norms is a
consequence of the following theorem, where k kF indicates the Frobenius matrix norm.
Theorem 10.11 Let A, B ∈ Rm×n . If |A| ≤ |B| then kAk1 ≤ kBk1 , kAk∞ ≤ kBk∞ , and kAkF ≤ kBkF .
κ := χ₀ψ₀ + χ₁ψ₁.
Then, under the computational model given in Section 10.4, if κ := χ₀ψ₀ + χ₁ψ₁ is executed, the computed
result, κ̌, satisfies

$$
\begin{aligned}
\check\kappa &= \left[ \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix}^T \begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix} \right]
= [\chi_0\psi_0 + \chi_1\psi_1] = [[\chi_0\psi_0] + [\chi_1\psi_1]] \\
&= [\chi_0\psi_0(1+\epsilon_*^{(0)}) + \chi_1\psi_1(1+\epsilon_*^{(1)})] \\
&= \big(\chi_0\psi_0(1+\epsilon_*^{(0)}) + \chi_1\psi_1(1+\epsilon_*^{(1)})\big)(1+\epsilon_+^{(1)}) \\
&= \chi_0\psi_0(1+\epsilon_*^{(0)})(1+\epsilon_+^{(1)}) + \chi_1\psi_1(1+\epsilon_*^{(1)})(1+\epsilon_+^{(1)})
\end{aligned}
$$
Algorithm: DOT: κ := xᵀy (Figure 10.2)

    Partition x → ( x_T ; x_B ), y → ( y_T ; y_B )
    while m(x_T) < m(x) do
        Repartition ( x_T ; x_B ) → ( x₀ ; χ₁ ; x₂ ),  ( y_T ; y_B ) → ( y₀ ; ψ₁ ; y₂ )
        κ := κ + χ₁ψ₁
        Continue with ( x_T ; x_B ) ← ( x₀ ; χ₁ ; x₂ ),  ( y_T ; y_B ) ← ( y₀ ; ψ₁ ; y₂ )
    endwhile
$$
\begin{aligned}
&= \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix}^T
\begin{pmatrix} (1+\epsilon_*^{(0)})(1+\epsilon_+^{(1)}) & 0 \\ 0 & (1+\epsilon_*^{(1)})(1+\epsilon_+^{(1)}) \end{pmatrix}
\begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix} \\
&= \begin{pmatrix} \chi_0(1+\epsilon_*^{(0)})(1+\epsilon_+^{(1)}) \\ \chi_1(1+\epsilon_*^{(1)})(1+\epsilon_+^{(1)}) \end{pmatrix}^T
\begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix},
\end{aligned}
$$
    κ := ( ··· ((χ₀ψ₀ + χ₁ψ₁) + χ₂ψ₂) + ··· ) + χ_{n−1}ψ_{n−1},        (10.2)
10.5.3 Preparation
Under the computational model given in Section 10.4, the computed result of (10.2), κ̌, satisfies

$$
\check\kappa = \Big( \cdots \big( \chi_0\psi_0(1+\epsilon_*^{(0)}) + \chi_1\psi_1(1+\epsilon_*^{(1)}) \big)(1+\epsilon_+^{(1)}) + \cdots + \chi_{n-1}\psi_{n-1}(1+\epsilon_*^{(n-1)}) \Big)(1+\epsilon_+^{(n-1)})
= \sum_{i=0}^{n-1} \left( \chi_i\psi_i (1+\epsilon_*^{(i)}) \prod_{j=i}^{n-1} (1+\epsilon_+^{(j)}) \right), \qquad (10.3)
$$

where ε₊⁽⁰⁾ = 0 and |ε∗⁽⁰⁾|, |ε∗⁽ʲ⁾|, |ε₊⁽ʲ⁾| ≤ u for j = 1, …, n − 1.
Clearly, a notation to keep expressions from becoming unreadable is desirable. For this reason we
introduce the symbol θ j :
Lemma 10.14 Let εᵢ ∈ R, 0 ≤ i ≤ n − 1, nu < 1, and |εᵢ| ≤ u. Then ∃ θₙ ∈ R such that

$$
\prod_{i=0}^{n-1} (1+\epsilon_i)^{\pm 1} = 1 + \theta_n, \quad \text{with } |\theta_n| \le \frac{nu}{1-nu}.
$$

We will show that if εᵢ ∈ R, 0 ≤ i ≤ n, (n + 1)u < 1, and |εᵢ| ≤ u, then there exists a θ_{n+1} ∈ R such
that

$$
\prod_{i=0}^{n} (1+\epsilon_i)^{\pm 1} = 1 + \theta_{n+1}, \quad \text{with } |\theta_{n+1}| \le \frac{(n+1)u}{1-(n+1)u}.
$$
Consider the case where the new factor appears inverted (the other case is similar):

$$
\frac{\prod_{i=0}^{n-1}(1+\epsilon_i)^{\pm 1}}{1+\epsilon_n} = \frac{1+\theta_n}{1+\epsilon_n} = 1 + \underbrace{\frac{\theta_n - \epsilon_n}{1+\epsilon_n}}_{\theta_{n+1}},
$$

which tells us how to pick θ_{n+1}. Now

$$
|\theta_{n+1}| = \left| \frac{\theta_n - \epsilon_n}{1+\epsilon_n} \right| \le \frac{|\theta_n| + u}{1-u} \le \frac{\frac{nu}{1-nu} + u}{1-u} = \frac{nu + (1-nu)u}{(1-nu)(1-u)} = \frac{(n+1)u - nu^2}{1-(n+1)u+nu^2} \le \frac{(n+1)u}{1-(n+1)u}.
$$
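Lemma 10.14 can be sanity-checked numerically. A Python sketch (the sampling scheme is ours; it probes random sign patterns rather than the adversarial worst case, so observed values sit well inside the bound):

```python
import random

def max_theta(n, u, trials=200):
    """Largest |theta_n| observed over random products of factors
    (1 + eps_i)^{+-1} with |eps_i| <= u (cf. Lemma 10.14)."""
    worst = 0.0
    for _ in range(trials):
        prod = 1.0
        for _ in range(n):
            eps = random.uniform(-u, u)
            prod = prod * (1.0 + eps) if random.random() < 0.5 else prod / (1.0 + eps)
        worst = max(worst, abs(prod - 1.0))
    return worst

u = 2.0**-24                       # unit roundoff for single precision
n = 100
gamma_n = n * u / (1.0 - n * u)    # the bound from Lemma 10.14
print(max_theta(n, u) <= gamma_n)  # True
```

Random ε's tend to cancel, so the observed |θₙ| grows like √n·u while the guaranteed bound grows like n·u.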
The quantity θn will be used throughout this note. It is not intended to be a specific number. Instead,
it is an order of magnitude identified by the subscript n, which indicates the number of error factors of
the form (1 + εi ) and/or (1 + εi )−1 that are grouped together to form (1 + θn ).
Since the bound on |θₙ| occurs often, we assign it the symbol γₙ = nu/(1 − nu), so that |θₙ| ≤ γₙ and, in
particular, |θⱼ| ≤ γⱼ, j = 2, …, n.
Two instances of the symbol θₙ, appearing even in the same expression, typically do not represent the
same number. For example, in (10.4) a (1 + θₙ) multiplies each of the terms χ₀ψ₀ and χ₁ψ₁, but these
two instances of θₙ, as a rule, do not denote the same quantity. In particular, one should be careful when
factoring out such quantities.
As part of the analyses the following bounds will be useful to bound error that accumulates:
The first two are backward error results (error is accumulated onto input parameters, showing that the
algorithm is numerically stable since it yields the exact output for a slightly perturbed input) while the last
one is a forward error result (error is accumulated onto the answer). We will see that in different situations,
a different error result may be needed by analyses of operations that require a dot product.
Let us focus on the second result. Ideally one would show that each of the entries of y is slightly
perturbed relative to that entry:
$$
\delta y = \begin{pmatrix} \sigma_0 \psi_0 \\ \vdots \\ \sigma_{n-1}\psi_{n-1} \end{pmatrix}
= \begin{pmatrix} \sigma_0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_{n-1} \end{pmatrix}
\begin{pmatrix} \psi_0 \\ \vdots \\ \psi_{n-1} \end{pmatrix} = \Sigma y,
$$

where each σᵢ is “small” and Σ = diag(σ₀, …, σ_{n−1}). The following special structure of Σ, inspired
by (10.5), will be used in the remainder of this note:

$$
\Sigma^{(n)} = \begin{cases} 0 \times 0 \text{ matrix} & \text{if } n = 0 \\ \theta_1 & \text{if } n = 1 \\ \operatorname{diag}(\theta_n, \theta_n, \theta_{n-1}, \ldots, \theta_2) & \text{otherwise.} \end{cases} \qquad (10.6)
$$
Homework 10.19 Let k ≥ 0 and assume that |ε₁|, |ε₂| ≤ u, with ε₁ = 0 if k = 0. Show that

$$
\begin{pmatrix} I + \Sigma^{(k)} & 0 \\ 0 & 1+\epsilon_1 \end{pmatrix} (1+\epsilon_2) = I + \Sigma^{(k+1)}.
$$

Hint: reason the case where k = 0 separately from the case where k > 0.
* SEE ANSWER
Theorem 10.20 Let x, y ∈ Rn and let κ := xT y be computed by executing the algorithm in Figure 10.2.
Then
κ̌ = [xT y] = xT (I + Σ(n) )y.
10.5.6 A weapon of math induction for the war on (backward) error (optional)
We focus the reader’s attention on Figure 10.3 in which we present a framework, which we will call the
error worksheet, for presenting the inductive proof of Theorem 10.20 side-by-side with the algorithm for
D OT. This framework, in a slightly different form, was first introduced in [3]. The expressions enclosed
by { } (in the grey boxes) are predicates describing the state of the variables used in the algorithms and
in their analysis. In the worksheet, we use superscripts to indicate the iteration number; thus, the symbols
v⁽ⁱ⁾ and v⁽ⁱ⁺¹⁾ do not denote two different variables, but two different states of variable v.
The proof presented in Figure 10.3 goes hand in hand with the algorithm, as it shows that before and
after each iteration of the loop that computes κ := xT y, the variables κ̌, xT , yT , ΣT are such that the predicate
    { κ̌ = x_Tᵀ (I + Σ_T) y_T ∧ k = m(x_T) ∧ Σ_T = Σ⁽ᵏ⁾ }        (10.7)
The central computation in the worksheet, showing that the predicate is maintained from one iteration to
the next, is

$$
\begin{aligned}
\check\kappa^{(i+1)} &= \big( \check\kappa^{(i)} + \chi_1\psi_1(1+\epsilon_*) \big)(1+\epsilon_+)
&& \text{SCM, twice } (\epsilon_+ = 0 \text{ if } k = 0) \\
&= \big( x_0^T (I+\Sigma_0^{(k)}) y_0 + \chi_1\psi_1(1+\epsilon_*) \big)(1+\epsilon_+)
&& \text{Step 6: I.H.} \\
&= \begin{pmatrix} x_0 \\ \chi_1 \end{pmatrix}^T
\begin{pmatrix} I+\Sigma_0^{(k)} & 0 \\ 0 & 1+\epsilon_* \end{pmatrix}
\begin{pmatrix} y_0 \\ \psi_1 \end{pmatrix} (1+\epsilon_+)
&& \text{Step 8: rearrange} \\
&= \begin{pmatrix} x_0 \\ \chi_1 \end{pmatrix}^T \big( I + \Sigma^{(k+1)} \big) \begin{pmatrix} y_0 \\ \psi_1 \end{pmatrix}
&& \text{Exercise 10.19,}
\end{aligned}
$$

after which moving the partitioning (so that x_T absorbs χ₁ and Σ_T becomes Σ⁽ᵏ⁺¹⁾, with m(x_T) = k + 1)
re-establishes the predicate κ̌ = x_Tᵀ(I + Σ_T)y_T ∧ Σ_T = Σ⁽ᵏ⁾ ∧ m(x_T) = k at the bottom of the loop.
Upon completion, m(x_T) = m(x) = n, so that κ̌ = xᵀ(I + Σ⁽ⁿ⁾)y ∧ m(x) = n.
Figure 10.3: Error worksheet completed to establish the backward error result for the given algorithm that
computes the DOT operation.
holds true. This relation is satisfied at each iteration of the loop, so it is also satisfied when the loop
completes. Upon completion, the loop guard is m(xT ) = m(x) = n, which implies that κ̌ = xT (I + Σ(n) )y,
i.e., the thesis of the theorem, is satisfied too.
In detail, the inductive proof of Theorem 10.20 is captured by the error worksheet as follows:
Base case. In Step 2a, i.e. before the execution of the loop, predicate (10.7) is satisfied, as k = m(xT ) = 0.
Inductive step. Assume that the predicate (10.7) holds true at Step 2b, i.e., at the top of the loop. Then
Steps 6, 7, and 8 in Figure 10.3 prove that the predicate is satisfied again at Step 2c, i.e., the bottom
of the loop. Specifically,
By the Principle of Mathematical Induction, the predicate (10.7) holds for all iterations. In particular,
when the loop terminates, the predicate becomes
κ̌ = xT (I + Σ(n) )y ∧ n = m(xT ).
10.5.7 Results
A number of useful consequences of Theorem 10.20 follow. These will be used later as an inventory
(library) of error results from which to draw when analyzing operations and algorithms that utilize D OT.
Corollary 10.22 Under the assumptions of Theorem 10.20 the following relations hold:
R1-B: (Backward analysis) κ̌ = (x + δx)T y, where |δx| ≤ γn |x|, and κ̌ = xT (y + δy), where |δy| ≤ γn |y|;
Proof: We leave the proof of R1-B as an exercise. For R1-F, let δκ = xT Σ(n) y, where Σ(n) is as in
Theorem 10.20. Then
10.6.2 Analysis
Assume A ∈ R^{m×n} and partition

$$
A = \begin{pmatrix} a_0^T \\ a_1^T \\ \vdots \\ a_{m-1}^T \end{pmatrix}
\quad\text{and}\quad
y = \begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{pmatrix}.
$$

Then

$$
\begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{pmatrix} :=
\begin{pmatrix} a_0^T x \\ a_1^T x \\ \vdots \\ a_{m-1}^T x \end{pmatrix}.
$$
Algorithm: GEMV: y := Ax

    Partition A → ( A_T ; A_B ), y → ( y_T ; y_B )
    while m(A_T) < m(A) do
        Repartition ( A_T ; A_B ) → ( A₀ ; a₁ᵀ ; A₂ ),  ( y_T ; y_B ) → ( y₀ ; ψ₁ ; y₂ )
        ψ₁ := a₁ᵀx
        Continue with ( A_T ; A_B ) ← ( A₀ ; a₁ᵀ ; A₂ ),  ( y_T ; y_B ) ← ( y₀ ; ψ₁ ; y₂ )
    endwhile
From Corollary 10.22 R1-B regarding the dot product we know that

$$
\check y = \begin{pmatrix} \check\psi_0 \\ \check\psi_1 \\ \vdots \\ \check\psi_{m-1} \end{pmatrix}
= \begin{pmatrix} (a_0+\delta a_0)^T x \\ (a_1+\delta a_1)^T x \\ \vdots \\ (a_{m-1}+\delta a_{m-1})^T x \end{pmatrix}
= \left( \begin{pmatrix} a_0^T \\ a_1^T \\ \vdots \\ a_{m-1}^T \end{pmatrix}
+ \begin{pmatrix} \delta a_0^T \\ \delta a_1^T \\ \vdots \\ \delta a_{m-1}^T \end{pmatrix} \right) x
= (A + \Delta A)x,
$$
Algorithm: GEMM: C := AB (Figure 10.5)

    Partition C → ( C_L | C_R ), B → ( B_L | B_R )
    while n(C_L) < n(C) do
        Repartition ( C_L | C_R ) → ( C₀ | c₁ | C₂ ),  ( B_L | B_R ) → ( B₀ | b₁ | B₂ )
        c₁ := Ab₁
        Continue with ( C_L | C_R ) ← ( C₀ | c₁ | C₂ ),  ( B_L | B_R ) ← ( B₀ | b₁ | B₂ )
    endwhile
Homework 10.25 In the above theorem, could one instead prove the result
y̌ = A(x + δx),
where δx is “small”?
* SEE ANSWER
10.7.2 Analysis
Partition

    C = ( c₀ | c₁ | ··· | c_{n−1} )  and  B = ( b₀ | b₁ | ··· | b_{n−1} ).

Then

    ( c₀ | c₁ | ··· | c_{n−1} ) := ( Ab₀ | Ab₁ | ··· | Ab_{n−1} ).
From Corollary 10.22 R1-B regarding the dot product we know that

    ( č₀ | č₁ | ··· | č_{n−1} ) = ( Ab₀ + δc₀ | Ab₁ + δc₁ | ··· | Ab_{n−1} + δc_{n−1} )
                                = ( Ab₀ | Ab₁ | ··· | Ab_{n−1} ) + ( δc₀ | δc₁ | ··· | δc_{n−1} )
                                = AB + ∆C.
Theorem 10.26 (Forward) error results for matrix-matrix multiplication. Let C ∈ Rm×n , A ∈ Rm×k ,
and B ∈ Rk×n and consider the assignment C := AB implemented via the algorithm in Figure 10.5. Then
the following equality holds:
Homework 10.27 In the above theorem, could one instead prove the result
Č = (A + ∆A)(B + ∆B),
10.7.3 An application
A collaborator of ours recently implemented a matrix-matrix multiplication algorithm and wanted to check
if it gave the correct answer. To do so, he followed the following steps:
• He created random matrices A ∈ R^{m×k} and B ∈ R^{k×n}, with positive entries in the range (0, 1).
• He computed C = AB with an implementation that was known to be “correct” and assumed it yields
the exact solution. (Of course, it has error in it as well. We discuss how he compensated for that,
below.)
• He computed ∆C = Č − C and checked that each of its entries satisfied |δγ_{i,j}| ≤ 2ku γ_{i,j}.
• In the above, he took advantage of the fact that A and B had positive entries so that |A||B| = AB = C.
He also approximated γ_k = ku/(1 − ku) with ku, and introduced the factor 2 to compensate for the fact that
C itself was inexactly computed.
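The check described above is easy to reproduce. A Python sketch (ours) that sidesteps the factor 2 by computing the reference product in exact rational arithmetic, so the reference has no error at all:

```python
from fractions import Fraction
import random

def matmul(A, B):
    """Naive triple-loop matrix-matrix multiply (works for float or Fraction)."""
    m, k, n = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

random.seed(42)
m, k, n = 4, 50, 3
A = [[random.random() for _ in range(k)] for _ in range(m)]
B = [[random.random() for _ in range(n)] for _ in range(k)]

C_float = matmul(A, B)
# Exact reference: the same product in rational arithmetic.
C_exact = matmul([[Fraction(a) for a in row] for row in A],
                 [[Fraction(b) for b in row] for row in B])

u = 2.0**-53   # unit roundoff, double precision
for i in range(m):
    for j in range(n):
        # Entries are positive, so |A||B| = AB = C and k*u*C (the gamma_k
        # approximation) bounds the forward error of each length-k dot product:
        assert abs(C_float[i][j] - float(C_exact[i][j])) <= k * u * float(C_exact[i][j])
print("forward error bound satisfied")
```

With an exact reference the bound ku·γᵢⱼ suffices; the collaborator's factor 2 only compensated for the reference itself being computed in floating point.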
10.8 Wrapup
10.8.2 Summary
Chapter 11
Notes on Performance
How to attain high performance on modern architectures is of importance: linear algebra is fundamental
to scientific computing. Scientific computing often involves very large problems that require the fastest
computers in the world to be employed. One wants to use such computers efficiently.
For now, we suggest the reader become familiar with the following resources:
A similar exercise:
Michael Lehn
GEMM: From Pure C to SSE Optimized Micro Kernels
http://apfel.mathematik.uni-ulm.de/~lehn/sghpc/gemm/index.html.
Chapter 12
Notes on Gaussian Elimination and LU
Factorization
12.1.1 Launch
The LU factorization is also known as the LU decomposition and the operations it performs are equivalent
to those performed by Gaussian elimination. For details, we recommend that the reader consult Weeks 6
and 7 of
Video
Read disclaimer regarding the videos in the preface!
Lecture on Gaussian Elimination and LU factorization:
* YouTube
* Download from UT Box
* View After Local Download
* YouTube
* Download from UT Box
* View After Local Download
* Slides
12.1.2 Outline
The first question we will ask is when the LU factorization exists. For this, we need a definition.
Theorem 12.3 (Existence) Let A ∈ C^{m×n} with m ≥ n have linearly independent columns. Then A has a
unique LU factorization if and only if all its principal leading submatrices are nonsingular.
The proof of this theorem is a bit involved and can be found in Section 12.5.
12.3 LU Factorization
We are going to present two different ways of deriving the most commonly known algorithm. The first is
a straightforward derivation. The second presents the operation as the application of a sequence of Gauss
transforms.
• a21 = 0.
Example 12.5 Gauss transforms can be used to take multiples of a row and subtract these multiples from
other rows:

$$
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -\lambda_{21} & 1 & 0 \\ 0 & -\lambda_{31} & 0 & 1 \end{pmatrix}
\begin{pmatrix} \hat a_0^T \\ \hat a_1^T \\ \hat a_2^T \\ \hat a_3^T \end{pmatrix}
= \begin{pmatrix} \hat a_0^T \\ \hat a_1^T \\ \hat a_2^T - \lambda_{21}\hat a_1^T \\ \hat a_3^T - \lambda_{31}\hat a_1^T \end{pmatrix}.
$$
Notice the similarity with what one does in Gaussian elimination: take multiples of one row and subtract
these from other rows.
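In code, applying such a Gauss transform amounts to two AXPY-like row updates. A small Python sketch (ours):

```python
def apply_gauss_transform(rows, lam21, lam31):
    """Apply the 4x4 Gauss transform of Example 12.5 to a list of rows:
    subtract lam21 and lam31 times row 1 from rows 2 and 3."""
    a0, a1, a2, a3 = rows
    return [a0,
            a1,
            [x - lam21 * y for x, y in zip(a2, a1)],
            [x - lam31 * y for x, y in zip(a3, a1)]]

rows = [[1.0, 2.0], [3.0, 4.0], [6.0, 8.0], [9.0, 12.0]]
print(apply_gauss_transform(rows, 2.0, 3.0))
# rows 2 and 3 are multiples of row 1, so they are zeroed out:
# [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]
```

Choosing λ₂₁ and λ₃₁ as the ratios of leading entries is exactly the elimination step of Gaussian elimination.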
Now assume that the LU factorization in the previous subsection has proceeded to where A contains

$$
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
$$
Repartition

$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix},\quad
\begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} L_{00} & 0 & 0 \\ l_{10}^T & \lambda_{11} & 0 \\ L_{20} & l_{21} & L_{22} \end{pmatrix}
$$

where α₁₁, λ₁₁ are 1 × 1

    l₂₁ := a₂₁/α₁₁
    A₂₂ := A₂₂ − l₂₁a₁₂ᵀ
    (a₂₁ := 0)

or, alternatively,

    l₂₁ := a₂₁/α₁₁

$$
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix} :=
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{pmatrix}
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
= \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & 0 & A_{22} - l_{21}a_{12}^T \end{pmatrix}
$$

Continue with

$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \leftarrow
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix},\quad
\begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix} \leftarrow
\begin{pmatrix} L_{00} & 0 & 0 \\ l_{10}^T & \lambda_{11} & 0 \\ L_{20} & l_{21} & L_{22} \end{pmatrix}
$$

endwhile
Figure 12.1: Most commonly known algorithm for overwriting a matrix with its LU factorization.
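The algorithm of Figure 12.1 translates directly into code. A minimal Python sketch (ours, using lists of lists rather than the MATLAB matrices used elsewhere in these notes):

```python
def lu_unblocked(A):
    """Right-looking LU factorization, no pivoting: overwrite A with L
    strictly below the diagonal (unit diagonal implicit) and with U on
    and above the diagonal."""
    n = len(A)
    for k in range(n):
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]                # l21 := a21 / alpha11
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]  # A22 := A22 - l21 * a12^T
    return A

A = [[4.0, 3.0], [6.0, 3.0]]
print(lu_unblocked(A))
# [[4.0, 3.0], [1.5, -1.5]]: L = [[1, 0], [1.5, 1]], U = [[4, 3], [0, -1.5]]
```

Multiplying the unpacked L and U back together recovers the original matrix, which is a quick way to test such an implementation.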
Chapter 12. Notes on Gaussian Elimination and LU Factorization 216
The resulting algorithm is summarized in Figure 12.1 under “or, alternatively,”. Notice that this algorithm
is identical to the algorithm for computing LU factorization discussed before!
How can this be? The following set of exercises explains it.
* SEE ANSWER
Now, clearly, what the algorithm does is to compute a sequence of n Gauss transforms L̂₀, …, L̂_{n−1}
such that L̂_{n−1}L̂_{n−2}···L̂₁L̂₀A = U. Or, equivalently, A = L₀L₁···L_{n−2}L_{n−1}U, where Lₖ = L̂ₖ⁻¹. What we
will show next is that L = L₀L₁···L_{n−2}L_{n−1} is the unit lower triangular matrix computed by the LU
factorization.
Homework 12.7 Let L̃ₖ = L₀L₁···Lₖ. Assume that L̃_{k−1} has the form

$$
\tilde L_{k-1} = \begin{pmatrix} L_{00} & 0 & 0 \\ l_{10}^T & 1 & 0 \\ L_{20} & 0 & I \end{pmatrix},
$$

where L₀₀ is k × k. Show that L̃ₖ is given by

$$
\tilde L_k = \begin{pmatrix} L_{00} & 0 & 0 \\ l_{10}^T & 1 & 0 \\ L_{20} & l_{21} & I \end{pmatrix}.
\qquad \left( \text{Recall: } \hat L_k = \begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{pmatrix}. \right)
$$

* SEE ANSWER
What this exercise shows is that L = L0 L1 · · · Ln−2 Ln−1 is the triangular matrix that is created by simply
placing the computed vectors l21 below the diagonal of a unit lower triangular matrix.
• Assume A is n × n.
• Computing l21 := a21 /α11 is typically implemented as β := 1/α11 and then the scaling l21 := βa21 .
The reason is that divisions are expensive relative to multiplications. We will ignore the cost of the
division (which will be insignificant if n is large). Thus, we count this as n − k − 1 multiplies.
• The rank-1 update of A22 requires (n − k − 1)2 multiplications and (n − k − 1)2 additions.
$$
\sum_{k=0}^{n-1} \Big( (n-k-1) + 2(n-k-1)^2 \Big) = \sum_{j=0}^{n-1} \big( j + 2j^2 \big) \qquad \text{(change of variable: } j = n-k-1\text{)}
$$
$$
= \sum_{j=0}^{n-1} j + 2\sum_{j=0}^{n-1} j^2 \approx \frac{n(n-1)}{2} + 2\int_0^n x^2\,dx = \frac{n(n-1)}{2} + \frac{2}{3}n^3 \approx \frac{2}{3}n^3.
$$
Notice that this involves roughly half the number of floating point operations as are required for a House-
holder transformation based QR factorization.
(This is the backward error result for the Crout variant for LU factorization, discussed later in this note.
Some of the other variants have an error result of (A + ∆A) = ĽǓ where |∆A| ≤ γₙ(|A| + |Ľ||Ǔ|).) Now, if
α₁₁ is small in magnitude compared to the entries of a₂₁ then not only will l₂₁ have large entries, but the
update A₂₂ − l₂₁a₁₂ᵀ will potentially introduce large entries in the updated A₂₂ (in other words, the part of
matrix A from which the future matrix U will be computed), a phenomenon referred to as element growth.
To overcome this, we will swap rows in A as the factorization proceeds, resulting in an algorithm
known as LU factorization with partial pivoting.
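A sketch of the resulting algorithm in Python (ours; pivot indices are stored as offsets relative to the current row, matching the convention used below):

```python
def lu_piv(A):
    """LU factorization with partial pivoting: overwrite A with L and U;
    return the pivot vector p such that P(p) A = L U."""
    n = len(A)
    p = []
    for k in range(n):
        # pi_1: offset (relative to row k) of the entry of largest
        # magnitude in the current column (alpha11; a21)
        pi = max(range(k, n), key=lambda i: abs(A[i][k])) - k
        p.append(pi)
        A[k], A[k + pi] = A[k + pi], A[k]     # swap rows k and k + pi
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]                # a21 := a21 / alpha11
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]  # A22 := A22 - a21 * a12^T

    return p

# A tiny pivot in the (0,0) position is avoided by swapping rows first,
# so the multiplier stays small and no element growth occurs:
A = [[2.0**-10, 1.0], [1.0, 1.0]]
print(lu_piv(A), A)
```

Without the swap the multiplier would be 2¹⁰ and the update would wipe out the information in the second row; with it, all multipliers are bounded by 1 in magnitude.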
Repartition

$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix},\quad
\begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} L_{00} & 0 & 0 \\ l_{10}^T & \lambda_{11} & 0 \\ L_{20} & l_{21} & L_{22} \end{pmatrix},\quad
\begin{pmatrix} p_T \\ p_B \end{pmatrix} \rightarrow \begin{pmatrix} p_0 \\ \pi_1 \\ p_2 \end{pmatrix}
$$

where α₁₁, λ₁₁, π₁ are 1 × 1

    π₁ := maxi( (α₁₁ ; a₂₁) )   (maxi(·) returns the index of the entry of largest magnitude)

$$
\begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix} := P(\pi_1)\begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix}
$$

    l₂₁ := a₂₁/α₁₁
    A₂₂ := A₂₂ − l₂₁a₁₂ᵀ
    (a₂₁ := 0)

Continue with (reversing the repartitioning of A, L, and p)

endwhile
If P is a permutation matrix then PA rearranges the rows of A exactly as the elements of x are rearranged
by Px.
We will see that when discussing the LU factorization with partial pivoting, a permutation matrix that
swaps the first element of a vector with the π-th element of that vector is a fundamental tool. We will
denote that matrix by
$$
P(\pi) = \begin{cases} I_n & \text{if } \pi = 0 \\[6pt]
\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & I_{\pi-1} & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & I_{n-\pi-1} \end{pmatrix} & \text{otherwise}, \end{cases}
$$
where n is the dimension of the permutation matrix. In the following we will use the notation Pn to indicate
that the matrix P is of size n. Let p be a vector of integers satisfying the conditions
Remark 12.9 In the algorithms, the subscript that indicates the matrix dimensions is omitted.
Example 12.10 Let aT0 , aT1 , . . . , aTn−1 be the rows of a matrix A. The application of P(p) to A yields a
matrix that results from swapping row aT0 with aTπ0 , then swapping aT1 with aTπ1 +1 , aT2 with aTπ2 +2 , until
finally aTk−1 is swapped with aTπk−1 +k−1 .
Remark 12.11 For those familiar with how pivot information is stored in LINPACK and LAPACK, notice
that those packages store the vector of pivot information (π0 + 1, π1 + 2, . . . , πk−1 + k)T .
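Applying P(p) row by row, as in Example 12.10, can be sketched as follows (the helper is ours; each p[k] is an offset relative to row k):

```python
def apply_pivots(p, rows):
    """Apply P(p) to a list of rows: for k = 0, 1, ..., swap row k with
    row k + p[k], as described in Example 12.10."""
    rows = list(rows)
    for k, pi in enumerate(p):
        rows[k], rows[k + pi] = rows[k + pi], rows[k]
    return rows

print(apply_pivots([2, 0, 1], ["a0", "a1", "a2", "a3"]))
# ['a2', 'a1', 'a3', 'a0']
```

Note the swaps are applied in order, so later swaps act on rows already moved by earlier ones; this matches how the pivot vector is accumulated during the factorization.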
[A, p] := LUpiv(A),
• Partition A, L as follows:

$$
A \rightarrow \begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix}, \quad\text{and}\quad
L \rightarrow \begin{pmatrix} 1 & 0 \\ l_{21} & L_{22} \end{pmatrix}.
$$

• Compute π₁ = maxi( (α₁₁ ; a₂₁) ).

• Permute the rows:

$$
\begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix} := P(\pi_1)\begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix}.
$$
or, equivalently,

$$
A = P_0^T L_0 \cdots P_{n-1}^T L_{n-1} U,
$$

where Lₖ = L̂ₖ⁻¹. What we will finally show is that there are Gauss transforms L̄₀, …, L̄_{n−1} (here the “bar”
does NOT mean conjugation; it is just a symbol) such that

$$
A = P_0^T \cdots P_{n-1}^T \underbrace{\bar L_0 \cdots \bar L_{n-1}}_{L}\, U
$$

or, equivalently,

$$
P(p)A = P_{n-1} \cdots P_0 A = \underbrace{\bar L_0 \cdots \bar L_{n-1}}_{L}\, U,
$$
which is what we set out to compute.
Here is the insight. Assume that after k steps of LU factorization we have computed p_T, L_TL, L_BL, etc.,
so that

$$
P(p_T)A = \begin{pmatrix} L_{TL} & 0 \\ L_{BL} & I \end{pmatrix}\begin{pmatrix} A_{TL} & A_{TR} \\ 0 & A_{BR} \end{pmatrix},
$$

where A_TL is upper triangular and k × k.
Now compute the next step of LU factorization with partial pivoting with

$$
\begin{pmatrix} A_{TL} & A_{TR} \\ 0 & A_{BR} \end{pmatrix}:
$$

• Partition

$$
\begin{pmatrix} A_{TL} & A_{TR} \\ 0 & A_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
$$

• Compute π₁ = maxi( (α₁₁ ; a₂₁) )

• Permute

$$
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix} :=
\begin{pmatrix} I & 0 \\ 0 & P(\pi_1) \end{pmatrix}
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
$$
Repartition

$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix},\quad
\begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} L_{00} & 0 & 0 \\ l_{10}^T & \lambda_{11} & 0 \\ L_{20} & l_{21} & L_{22} \end{pmatrix},\quad
\begin{pmatrix} p_T \\ p_B \end{pmatrix} \rightarrow \begin{pmatrix} p_0 \\ \pi_1 \\ p_2 \end{pmatrix}
$$

where α₁₁, λ₁₁, π₁ are 1 × 1

    π₁ = maxi( (α₁₁ ; a₂₁) )

$$
\begin{pmatrix} l_{10}^T & \alpha_{11} & a_{12}^T \\ L_{20} & a_{21} & A_{22} \end{pmatrix} := P(\pi_1)
\begin{pmatrix} l_{10}^T & \alpha_{11} & a_{12}^T \\ L_{20} & a_{21} & A_{22} \end{pmatrix}
$$

    l₂₁ := a₂₁/α₁₁

$$
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix} :=
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{pmatrix}
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}
= \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & 0 & A_{22} - l_{21}a_{12}^T \end{pmatrix}
$$

Continue with (reversing the repartitioning of A, L, and p)

endwhile
Repartition

$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix},\quad
\begin{pmatrix} p_T \\ p_B \end{pmatrix} \rightarrow \begin{pmatrix} p_0 \\ \pi_1 \\ p_2 \end{pmatrix}
$$

where α₁₁, π₁ are 1 × 1

    π₁ = maxi( (α₁₁ ; a₂₁) )

$$
\begin{pmatrix} a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix} := P(\pi_1)
\begin{pmatrix} a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}
$$

    a₂₁ := a₂₁/α₁₁
    A₂₂ := A₂₂ − a₂₁a₁₂ᵀ

Continue with (reversing the repartitioning of A and p)

endwhile
Figure 12.4: LU factorization with partial pivoting, overwriting A with the factors.
• Update

$$
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix} :=
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & 0 & A_{22} - l_{21}a_{12}^T \end{pmatrix}
$$
After this,

$$
P(p_T)A = \begin{pmatrix} L_{TL} & 0 \\ L_{BL} & I \end{pmatrix}
\begin{pmatrix} I & 0 \\ 0 & P(\pi_1) \end{pmatrix}^{\!T}
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{pmatrix}
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & 0 & A_{22} - l_{21}a_{12}^T \end{pmatrix}. \qquad (12.2)
$$

But

$$
\begin{pmatrix} L_{TL} & 0 \\ L_{BL} & I \end{pmatrix}
\begin{pmatrix} I & 0 \\ 0 & P(\pi_1) \end{pmatrix}^{\!T}
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{pmatrix}
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & 0 & A_{22} - l_{21}a_{12}^T \end{pmatrix}
= \begin{pmatrix} I & 0 \\ 0 & P(\pi_1) \end{pmatrix}^{\!T}
\begin{pmatrix} L_{TL} & 0 \\ \bar L_{BL} & I \end{pmatrix}
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{pmatrix}
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & 0 & A_{22} - l_{21}a_{12}^T \end{pmatrix},
$$

where L̄_BL = P(π₁)L_BL (the “bar” above some of the L's and l's does not denote conjugation; it merely
marks blocks whose rows have been reordered by P(π₁)). Here we use the fact that P(π₁) = P(π₁)ᵀ because
of its very special structure. Bringing the permutation to the left of (12.2) and “repartitioning” we get

$$
\underbrace{\begin{pmatrix} I & 0 \\ 0 & P(\pi_1) \end{pmatrix} P(p_0)}_{P\begin{pmatrix} p_0 \\ \pi_1 \end{pmatrix}} A =
\underbrace{\begin{pmatrix} L_{00} & 0 & 0 \\ \bar l_{10}^T & 1 & 0 \\ \bar L_{20} & 0 & I \end{pmatrix}
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{pmatrix}}_{\begin{pmatrix} L_{00} & 0 & 0 \\ \bar l_{10}^T & 1 & 0 \\ \bar L_{20} & l_{21} & I \end{pmatrix}}
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & 0 & A_{22} - l_{21}a_{12}^T \end{pmatrix}.
$$
This explains how the algorithm in Figure 12.3 computes p, L, and U (overwriting A with U) so that
P(p)A = LU.
Finally, we recognize that L can overwrite the entries of A below its diagonal, yielding the algorithm
in Figure 12.4.
be the LU factorization of A, where A_TL, L_TL, and U_TL are k × k. Notice that U cannot have a zero on
the diagonal since then A would not have linearly independent columns. Now, the k × k principal leading
submatrix A_TL equals A_TL = L_TL U_TL, which is nonsingular since L_TL has a unit diagonal and U_TL has no
zeroes on its diagonal. Since k was chosen arbitrarily, this means that all principal leading submatrices
are nonsingular.
Inductive Step: Assume the result is true for all matrices with n = k. Show it is true for matrices with
n = k + 1.
Let A of size n = k + 1 have nonsingular principal leading submatrices. Now, if an LU factorization
of A exists, A = LU, then it would have the form

$$
\underbrace{\begin{pmatrix} A_{00} & a_{01} \\ a_{10}^T & \alpha_{11} \\ A_{20} & a_{21} \end{pmatrix}}_{A}
= \underbrace{\begin{pmatrix} L_{00} & 0 \\ l_{10}^T & 1 \\ L_{20} & l_{21} \end{pmatrix}}_{L}
\underbrace{\begin{pmatrix} U_{00} & u_{01} \\ 0 & \upsilon_{11} \end{pmatrix}}_{U}. \qquad (12.3)
$$
If we can show that the different parts of L and U exist and are unique, we are done. Equation (12.3)
can be rewritten as
$$
\begin{pmatrix} A_{00} \\ a_{10}^T \\ A_{20} \end{pmatrix}
= \begin{pmatrix} L_{00} \\ l_{10}^T \\ L_{20} \end{pmatrix} U_{00}
\quad\text{and}\quad
\begin{pmatrix} a_{01} \\ \alpha_{11} \\ a_{21} \end{pmatrix}
= \begin{pmatrix} L_{00}u_{01} \\ l_{10}^T u_{01} + \upsilon_{11} \\ L_{20}u_{01} + l_{21}\upsilon_{11} \end{pmatrix}.
$$
Now, by the Induction Hypothesis L₀₀, l₁₀ᵀ, and L₂₀ exist and are unique. So the question is whether
u₀₁, υ₁₁, and l₂₁ exist and are unique:
• u01 exists and is unique. Since L00 is nonsingular (it has ones on its diagonal) L00 u01 = a01
has a solution that is unique.
• υ₁₁ exists, is unique, and is nonzero. Since l₁₀ᵀ and u₀₁ exist and are unique, υ₁₁ = α₁₁ − l₁₀ᵀu₀₁
exists and is unique. It is also nonzero since the principal leading submatrix of A given by

$$
\begin{pmatrix} A_{00} & a_{01} \\ a_{10}^T & \alpha_{11} \end{pmatrix}
= \begin{pmatrix} L_{00} & 0 \\ l_{10}^T & 1 \end{pmatrix}
\begin{pmatrix} U_{00} & u_{01} \\ 0 & \upsilon_{11} \end{pmatrix},
$$

is nonsingular by assumption and therefore υ₁₁ must be nonzero.
• l21 exists and is unique. Since υ11 exists and is nonzero, l21 = a21 /υ11 exists and is uniquely
determined.
Thus the m × (k + 1) matrix A has a unique LU factorization.
By the Principle of Mathematical Induction the result holds.
Homework 12.12 Implement LU factorization with partial pivoting with the FLAME@lab API, in M-
script.
* SEE ANSWER
Thus, to solve $Ax = y$ given the factorization $PA = LU$, we solve
$$
PAx = \underbrace{Py}_{\hat y}, \qquad L z = \hat y, \qquad U x = z.
$$
12.8.1 Lz = y
First, we discuss solving Lz = y where L is a unit lower triangular matrix.
Variant 1
Partition L, z, and y to expose the first row:
$$
\underbrace{\begin{pmatrix} 1 & 0 \\ l_{21} & L_{22} \end{pmatrix}}_{L}
\underbrace{\begin{pmatrix} \zeta_1 \\ z_2 \end{pmatrix}}_{z}
=
\underbrace{\begin{pmatrix} \psi_1 \\ y_2 \end{pmatrix}}_{y}.
$$
Multiplying out the left-hand side yields $\zeta_1 = \psi_1$ and $l_{21} \zeta_1 + L_{22} z_2 = y_2$, so that $z_2$ solves $L_{22} z_2 = y_2 - \psi_1 l_{21}$.
Variant 1 update: $y_2 := y_2 - \psi_1 l_{21}$. Variant 2 update: $\psi_1 := \psi_1 - l_{10}^T y_0$.
Figure 12.5: Algorithms for the solution of a unit lower triangular system Lz = y that overwrite y with z.
Variant 2
Partition L, z, and y to expose the last row:
$$
\underbrace{\begin{pmatrix} L_{00} & 0 \\ l_{10}^T & 1 \end{pmatrix}}_{L}
\underbrace{\begin{pmatrix} z_0 \\ \zeta_1 \end{pmatrix}}_{z}
=
\underbrace{\begin{pmatrix} y_0 \\ \psi_1 \end{pmatrix}}_{y}.
$$
Multiplying out the left-hand side yields
$$
\begin{pmatrix} L_{00} z_0 \\ l_{10}^T z_0 + \zeta_1 \end{pmatrix}
=
\begin{pmatrix} y_0 \\ \psi_1 \end{pmatrix}.
$$
Discussion
Notice that Variant 1 casts the computation in terms of an AXPY operation while Variant 2 casts it in terms
of DOT products.
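The two variants can be sketched in executable form. The following Python functions are our own rough illustration (the notes themselves use M-script and the FLAME@lab API; the function names are ours):

```python
def forward_subst_axpy(L, y):
    # Variant 1 (AXPY-based): once zeta_1 is known, update the remaining
    # right-hand side: y2 := y2 - zeta_1 * l21. L is unit lower triangular,
    # stored as a list of rows.
    n = len(y)
    z = list(y)
    for j in range(n):
        for i in range(j + 1, n):
            z[i] -= z[j] * L[i][j]
    return z

def forward_subst_dot(L, y):
    # Variant 2 (DOT-based): psi_1 := psi_1 - l10^T z0.
    n = len(y)
    z = []
    for i in range(n):
        z.append(y[i] - sum(L[i][j] * z[j] for j in range(i)))
    return z
```

Both compute the same z; Variant 1 sweeps down columns of L (AXPY), Variant 2 sweeps across rows (DOT).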
12.8.2 Ux = z
Next, we discuss solving Ux = y where U is an upper triangular matrix (with no assumptions about its
diagonal entries).
Homework 12.13 Derive an algorithm for solving Ux = y, overwriting y with the solution, that casts
most computation in terms of DOT products. Hint: Partition
$$
U \rightarrow \begin{pmatrix} \upsilon_{11} & u_{12}^T \\ 0 & U_{22} \end{pmatrix}.
$$
Call this Variant 1 and use Figure 12.6 to state the algorithm.
* SEE ANSWER
Homework 12.14 Derive an algorithm for solving Ux = y, overwriting y with the solution, that casts most
computation in terms of AXPY operations. Call this Variant 2 and use Figure 12.6 to state the algorithm.
* SEE ANSWER
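For reference, the two back-substitution variants the homeworks ask for might look as follows in Python (a sketch with our own naming, not the FLAME notation of Figure 12.6):

```python
def back_subst_dot(U, y):
    # DOT-based variant: x_i := (psi_i - u12^T x_2) / upsilon_11,
    # marching from the last row upward. U is upper triangular.
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

def back_subst_axpy(U, y):
    # AXPY-based variant: once x_j is known, update y0 := y0 - x_j * u01.
    n = len(y)
    x = list(y)
    for j in range(n - 1, -1, -1):
        x[j] /= U[j][j]
        for i in range(j):
            x[i] -= U[i][j] * x[j]
    return x
```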
(A is overwritten by L below the diagonal and U on and above the diagonal, where multiplying L
and U yields the original matrix A.)
• All the algorithms will march through the matrices from top-left to bottom-right. Thus, at a repre-
sentative point in the algorithm, the matrices are viewed as quadrants:
$$
A \rightarrow \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}, \quad
L \rightarrow \begin{pmatrix} L_{TL} & 0 \\ L_{BL} & L_{BR} \end{pmatrix}, \quad\mbox{and}\quad
U \rightarrow \begin{pmatrix} U_{TL} & U_{TR} \\ 0 & U_{BR} \end{pmatrix},
$$
where $A_{TL}$, $L_{TL}$, and $U_{TL}$ are all square and equally sized.
1 For a thorough discussion of the different LU factorization algorithms that also gives a historical perspective, we recommend
“Matrix Algorithms Volume 1” by G.W. Stewart [36].
12.9 Other LU Factorization Algorithms
Figure 12.6: Algorithms for the solution of an upper triangular system Ux = y that overwrite y with x.
• In terms of these exposed quadrants, in the end we wish for matrix A to contain
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}
=
\begin{pmatrix} L\backslash U_{TL} & U_{TR} \\ L_{BL} & L\backslash U_{BR} \end{pmatrix}
\quad\mbox{where}\quad
\begin{pmatrix} L_{TL} & 0 \\ L_{BL} & L_{BR} \end{pmatrix}
\begin{pmatrix} U_{TL} & U_{TR} \\ 0 & U_{BR} \end{pmatrix}
=
\begin{pmatrix} \hat A_{TL} & \hat A_{TR} \\ \hat A_{BL} & \hat A_{BR} \end{pmatrix}.
$$
• Manipulating this yields what we call the Partitioned Matrix Expression (PME), which can be
viewed as a recursive definition of the LU factorization:
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}
=
\begin{pmatrix} L\backslash U_{TL} & U_{TR} \\ L_{BL} & L\backslash U_{BR} \end{pmatrix}
\;\wedge\;
\begin{array}{l l}
L_{TL} U_{TL} = \hat A_{TL} & L_{TL} U_{TR} = \hat A_{TR} \\
L_{BL} U_{TL} = \hat A_{BL} & L_{BL} U_{TR} + L_{BR} U_{BR} = \hat A_{BR}.
\end{array}
$$
Now, consider the code skeleton for the LU factorization in Figure 12.7. At the top of the loop
(right after the while), we want to maintain certain contents in matrix A. Since we are in a loop, we
haven't yet overwritten A with the final result. Instead, some progress toward this final result has
been made. The way we can find the state of A that we would like to maintain is to take the
PME and delete subexpressions. For example, consider the following condition on the contents of A:
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}
=
\begin{pmatrix} L\backslash U_{TL} & U_{TR} \\ L_{BL} & \hat A_{BR} - L_{BL} U_{TR} \end{pmatrix}
\;\wedge\;
\begin{array}{l l}
L_{TL} U_{TL} = \hat A_{TL} & L_{TL} U_{TR} = \hat A_{TR} \\
L_{BL} U_{TL} = \hat A_{BL} & \mbox{($L_{BL} U_{TR} + L_{BR} U_{BR} = \hat A_{BR}$ deleted)}.
\end{array}
$$
What we are saying is that $A_{TL}$, $A_{TR}$, and $A_{BL}$ have been completely updated with the corresponding
parts of L and U, and $A_{BR}$ has been partially updated. This is exactly the state that the algorithm
that we discussed previously in this document maintains! What is left is to factor $A_{BR}$, since it
contains $\hat A_{BR} - L_{BL} U_{TR}$, and $\hat A_{BR} - L_{BL} U_{TR} = L_{BR} U_{BR}$.
• By carefully analyzing the order in which computation must occur (in compiler lingo: by performing
a dependence analysis), we can identify five states that can be maintained at the top of the loop, by
deleting subexpressions from the PME. These are called loop invariants and are listed in Figure 12.8.
• Key to figuring out what updates must occur in the loop for each of the variants is to look at how the
matrices are repartitioned at the top and bottom of the loop body.
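The loop invariant singled out above is the one maintained by the right-looking algorithm (Variant 5 in the discussion that follows). As a minimal Python sketch of that unblocked algorithm (our own naming; the notes use FLAME notation):

```python
def lu_right_looking(A):
    # Classical right-looking LU (Variant 5), no pivoting:
    # a21 := a21 / alpha11, then the rank-1 update A22 := A22 - a21 * a12^T.
    # Overwrites A with L below the diagonal (unit diagonal implied)
    # and U on and above the diagonal.
    n = len(A)
    for k in range(n):
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return A
```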
Algorithm: $A := LU(A)$
Partition
$$
A \rightarrow \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}, \quad
L \rightarrow \begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix}, \quad
U \rightarrow \begin{pmatrix} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{pmatrix},
$$
where $A_{TL}$, $L_{TL}$, and $U_{TL}$ are $0 \times 0$.
while $m(A_{TL}) < m(A)$ do
Repartition
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}, \quad
\begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} L_{00} & l_{01} & L_{02} \\ l_{10}^T & \lambda_{11} & l_{12}^T \\ L_{20} & l_{21} & L_{22} \end{pmatrix}, \quad
\begin{pmatrix} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} U_{00} & u_{01} & U_{02} \\ u_{10}^T & \upsilon_{11} & u_{12}^T \\ U_{20} & u_{21} & U_{22} \end{pmatrix},
$$
where $\alpha_{11}$, $\lambda_{11}$, and $\upsilon_{11}$ are $1 \times 1$.
• [updates for the chosen variant]
Continue with the repartitioned quadrants, moving the partitioning past the exposed row and column.
endwhile
At the top of the loop, A then contains
$$
\begin{pmatrix} L\backslash U_{00} & \hat a_{01} & \hat A_{02} \\ \hat a_{10}^T & \hat\alpha_{11} & \hat a_{12}^T \\ \hat A_{20} & \hat a_{21} & \hat A_{22} \end{pmatrix},
$$
and the conditions
$$
\begin{array}{l l l}
L_{00} U_{00} = \hat A_{00} & L_{00} u_{01} = \hat a_{01} & L_{00} U_{02} = \hat A_{02} \\
l_{10}^T U_{00} = \hat a_{10}^T & l_{10}^T u_{01} + \upsilon_{11} = \hat\alpha_{11} & l_{10}^T U_{02} + u_{12}^T = \hat a_{12}^T \\
L_{20} U_{00} = \hat A_{20} & L_{20} u_{01} + \upsilon_{11} l_{21} = \hat a_{21} & L_{20} U_{02} + l_{21} u_{12}^T + L_{22} U_{22} = \hat A_{22}
\end{array}
$$
hold, where the entries in red are already known. The equalities in yellow can be used to compute the desired
parts of L and U:
• Solve $L_{00} u_{01} = a_{01}$ for $u_{01}$, overwriting $a_{01}$ with the result.
• Solve $l_{10}^T U_{00} = a_{10}^T$ (or, equivalently, $U_{00}^T (l_{10}^T)^T = (a_{10}^T)^T$ for $l_{10}^T$), overwriting $a_{10}^T$ with the result.
• Compute $\upsilon_{11} = \alpha_{11} - l_{10}^T u_{01}$, overwriting $\alpha_{11}$ with the result.
Homework 12.15 If A is an $n \times n$ matrix, show that the cost of Variant 1 is approximately $\frac{2}{3} n^3$ flops.
* SEE ANSWER
For the next variant, A contains
$$
\begin{pmatrix} L\backslash U_{00} & \hat a_{01} & \hat A_{02} \\ l_{10}^T & \hat\alpha_{11} & \hat a_{12}^T \\ L_{20} & \hat a_{21} & \hat A_{22} \end{pmatrix},
$$
with the conditions
$$
\begin{array}{l l l}
L_{00} U_{00} = \hat A_{00} & L_{00} u_{01} = \hat a_{01} & L_{00} U_{02} = \hat A_{02} \\
l_{10}^T U_{00} = \hat a_{10}^T & l_{10}^T u_{01} + \upsilon_{11} = \hat\alpha_{11} & l_{10}^T U_{02} + u_{12}^T = \hat a_{12}^T \\
L_{20} U_{00} = \hat A_{20} & L_{20} u_{01} + \upsilon_{11} l_{21} = \hat a_{21} & L_{20} U_{02} + l_{21} u_{12}^T + L_{22} U_{22} = \hat A_{22}.
\end{array}
$$
The equalities in yellow can be used to compute the desired parts of L and U:
• Solve $L_{00} u_{01} = a_{01}$ for $u_{01}$, overwriting $a_{01}$ with the result.
• Compute $\upsilon_{11} = \alpha_{11} - l_{10}^T u_{01}$, overwriting $\alpha_{11}$ with the result.
• Compute $l_{21} := (a_{21} - L_{20} u_{01})/\upsilon_{11}$, overwriting $a_{21}$ with the result.
Similarly, for yet another variant the same conditions
$$
\begin{array}{l l l}
L_{00} U_{00} = \hat A_{00} & L_{00} u_{01} = \hat a_{01} & L_{00} U_{02} = \hat A_{02} \\
l_{10}^T U_{00} = \hat a_{10}^T & l_{10}^T u_{01} + \upsilon_{11} = \hat\alpha_{11} & l_{10}^T U_{02} + u_{12}^T = \hat a_{12}^T \\
L_{20} U_{00} = \hat A_{20} & L_{20} u_{01} + \upsilon_{11} l_{21} = \hat a_{21} & L_{20} U_{02} + l_{21} u_{12}^T + L_{22} U_{22} = \hat A_{22}
\end{array}
$$
hold, and the equalities in yellow can be used to compute the desired parts of L and U:
• Compute $l_{21} := (a_{21} - L_{20} u_{01})/\upsilon_{11}$, overwriting $a_{21}$ with the result.
Homework 12.17 Implement all five LU factorization algorithms with the FLAME@lab API, in M-script.
* SEE ANSWER
Homework 12.18 Which of the five variants can be modified to incorporate partial pivoting?
* SEE ANSWER
Repartition
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix},
$$
where $\alpha_{11}$ is $1 \times 1$. ($A_{00}$ contains $L_{00}$ and $U_{00}$ in its strictly lower and upper triangular part, respectively.)
Updates for the five unblocked variants:
• Variant 1 (Bordered): $a_{01} := L_{00}^{-1} a_{01}$; $a_{10}^T := a_{10}^T U_{00}^{-1}$; $\alpha_{11} := \alpha_{11} - a_{10}^T a_{01}$.
• Variant 2 (Left-looking): $a_{01} := L_{00}^{-1} a_{01}$; $\alpha_{11} := \alpha_{11} - a_{10}^T a_{01}$; $a_{21} := a_{21} - A_{20} a_{01}$; $a_{21} := a_{21}/\alpha_{11}$.
• Variant 3 (Up-looking): Exercise in 12.16.
• Variant 4 (Crout variant): $\alpha_{11} := \alpha_{11} - a_{10}^T a_{01}$; $a_{12}^T := a_{12}^T - a_{10}^T A_{02}$; $a_{21} := a_{21} - A_{20} a_{01}$; $a_{21} := a_{21}/\alpha_{11}$.
• Variant 5 (Classical LU): $a_{21} := a_{21}/\alpha_{11}$; $A_{22} := A_{22} - a_{21} a_{12}^T$.
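As one concrete instance, the Crout variant (Variant 4) can be sketched in Python as follows (our own naming; no pivoting, unit lower triangular L assumed, as in this note):

```python
def lu_crout(A):
    # Crout variant (Variant 4) sketch: at step k, update alpha11 and a12^T
    # with DOT products against the already-computed parts of L and U,
    # then form column k of L. Overwrites A with L\U.
    n = len(A)
    for k in range(n):
        # alpha11 := alpha11 - a10^T a01
        A[k][k] -= sum(A[k][p] * A[p][k] for p in range(k))
        # a12^T := a12^T - a10^T A02
        for j in range(k + 1, n):
            A[k][j] -= sum(A[k][p] * A[p][j] for p in range(k))
        # a21 := (a21 - A20 a01) / alpha11
        for i in range(k + 1, n):
            A[i][k] = (A[i][k] - sum(A[i][p] * A[p][k] for p in range(k))) / A[k][k]
    return A
```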
Continue with
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \leftarrow
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}
$$
endwhile
Theorem 12.19 Let A ∈ Rn×n and let the LU factorization of A be computed via the Crout variant (Variant
4), yielding approximate factors Ľ and Ǔ. Then
Theorem 12.20 Let L ∈ Rn×n be lower triangular and y, z ∈ Rn with Lz = y. Let ž be the approximate
solution that is computed. Then
Theorem 12.21 Let U ∈ Rn×n be upper triangular and x, z ∈ Rn with Ux = z. Let x̌ be the approximate
solution that is computed. Then
Theorem 12.22 Let A ∈ Rn×n and x, y ∈ Rn with Ax = y. Let x̌ be the approximate solution computed via
the following steps:
The analysis of LU factorization without partial pivoting is related to that of LU factorization with partial
pivoting. We have shown that LU factorization with partial pivoting is equivalent to LU factorization without partial
pivoting on a pre-permuted matrix: PA = LU, where P is a permutation matrix. The permutation doesn't
involve any floating point operations and therefore does not generate error. It can therefore be argued that,
as a result, the error that is accumulated is equivalent with or without partial pivoting.
From this exercise we conclude that even LU factorization with partial pivoting can yield large (exponential)
element growth in U. You may enjoy the collection of problems for which Gaussian elimination with
partial pivoting is unstable by Stephen Wright [48].
In practice, such growth does not seem to happen, and LU factorization with partial pivoting is considered to be stable.
Algorithm: $A := LU(A)$
Partition
$$
A \rightarrow \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix},
$$
where $A_{TL}$ is $0 \times 0$.
while $m(A_{TL}) < m(A)$ do
($A_{00}$ contains $L_{00}$ and $U_{00}$ in its strictly lower and upper triangular part, respectively.)
Updates for the five blocked variants:
• Variant 1: $A_{01} := L_{00}^{-1} A_{01}$; $A_{10} := A_{10} U_{00}^{-1}$; $A_{11} := A_{11} - A_{10} A_{01}$; $A_{11} := LU(A_{11})$.
• Variant 2: $A_{01} := L_{00}^{-1} A_{01}$; $A_{11} := A_{11} - A_{10} A_{01}$; $A_{11} := LU(A_{11})$; $A_{21} := (A_{21} - A_{20} A_{01}) U_{11}^{-1}$.
• Variant 3: $A_{10} := A_{10} U_{00}^{-1}$; $A_{11} := A_{11} - A_{10} A_{01}$; $A_{11} := LU(A_{11})$; $A_{12} := L_{11}^{-1}(A_{12} - A_{10} A_{02})$.
• Variant 4: $A_{11} := A_{11} - A_{10} A_{01}$; $A_{11} := LU(A_{11})$; $A_{21} := (A_{21} - A_{20} A_{01}) U_{11}^{-1}$; $A_{12} := L_{11}^{-1}(A_{12} - A_{10} A_{02})$.
• Variant 5: $A_{11} := LU(A_{11})$; $A_{21} := A_{21} U_{11}^{-1}$; $A_{12} := L_{11}^{-1} A_{12}$; $A_{22} := A_{22} - A_{21} A_{12}$.
Continue with the repartitioned quadrants.
endwhile
Continue with
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \leftarrow
\begin{pmatrix} A_{00} & A_{01} & A_{02} \\ A_{10} & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{pmatrix}, \quad
\begin{pmatrix} p_T \\ p_B \end{pmatrix} \leftarrow
\begin{pmatrix} p_0 \\ p_1 \\ p_2 \end{pmatrix}
$$
endwhile
Figure 12.11: Blocked algorithms for computing the LU factorization with partial pivoting.
or, equivalently,
$$
\begin{array}{l l}
A_{11} = L_{11} U_{11} & A_{12} = L_{11} U_{12} \\
A_{21} = L_{21} U_{11} & A_{22} - L_{21} U_{12} = L_{22} U_{22}.
\end{array}
$$
If we let L and U overwrite the original matrix A, this suggests the algorithm
• Compute the LU factorization $A_{11} = L_{11} U_{11}$, overwriting $A_{11}$ with $L\backslash U_{11}$. Notice that any of the
“unblocked” algorithms previously discussed in this note can be used for this factorization.
• Solve $L_{11} U_{12} = A_{12}$, overwriting $A_{12}$ with $U_{12}$. (This can also be expressed as $A_{12} := L_{11}^{-1} A_{12}$.)
• Solve $L_{21} U_{11} = A_{21}$, overwriting $A_{21}$ with $L_{21}$. (This can also be expressed as $A_{21} := A_{21} U_{11}^{-1}$.)
• Update $A_{22} := A_{22} - A_{21} A_{12}$ (that is, $A_{22} - L_{21} U_{12}$), and continue by factoring the updated $A_{22}$.
If b is small relative to n, then most computation is in the last step, which is a matrix-matrix multiplication.
Similarly, blocked algorithms for the other variants can be derived. All are given in Figure 12.10.
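The blocked right-looking steps just listed can be sketched in Python as follows (our own naming; no pivoting, block size `b` a hypothetical parameter; real codes would call optimized BLAS for each step):

```python
def lu_blocked(A, b=2):
    # Blocked right-looking LU sketch: factor the diagonal block, solve
    # for the row and column panels, then update the trailing matrix.
    n = len(A)
    for k in range(0, n, b):
        e = min(k + b, n)
        # A11 := LU(A11): unblocked factorization of the diagonal block
        for j in range(k, e):
            for i in range(j + 1, e):
                A[i][j] /= A[j][j]
                for p in range(j + 1, e):
                    A[i][p] -= A[i][j] * A[j][p]
        # A12 := L11^{-1} A12: forward solves with unit lower triangular L11
        for j in range(e, n):
            for i in range(k, e):
                A[i][j] -= sum(A[i][p] * A[p][j] for p in range(k, i))
        # A21 := A21 U11^{-1}: solves with upper triangular U11
        for i in range(e, n):
            for j in range(k, e):
                A[i][j] = (A[i][j] - sum(A[i][p] * A[p][j] for p in range(k, j))) / A[j][j]
        # A22 := A22 - A21 A12: the matrix-matrix multiply, the bulk of the work
        for i in range(e, n):
            for j in range(e, n):
                A[i][j] -= sum(A[i][p] * A[p][j] for p in range(k, e))
    return A
```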
where $A_{00}$, $L_{00}$, and $U_{00}$ are $k \times k$, and $A_{11}$, $L_{11}$, and $U_{11}$ are $b \times b$.
Assume that the computation has proceeded to the point where A contains
$$
\begin{pmatrix} A_{00} & A_{01} & A_{02} \\ A_{10} & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} L\backslash U_{00} & U_{01} & U_{02} \\ L_{10} & \hat A_{11} - L_{10} U_{01} & \hat A_{12} - L_{10} U_{02} \\ L_{20} & \hat A_{21} - L_{20} U_{01} & \hat A_{22} - L_{20} U_{02} \end{pmatrix},
$$
where, as before, $\hat A$ denotes the original contents of A, and
$$
P(p_0)
\begin{pmatrix} \hat A_{00} & \hat A_{01} & \hat A_{02} \\ \hat A_{10} & \hat A_{11} & \hat A_{12} \\ \hat A_{20} & \hat A_{21} & \hat A_{22} \end{pmatrix}
=
\begin{pmatrix} L_{00} & 0 & 0 \\ L_{10} & I & 0 \\ L_{20} & 0 & I \end{pmatrix}
\begin{pmatrix} U_{00} & U_{01} & U_{02} \\ 0 & A_{11} & A_{12} \\ 0 & A_{21} & A_{22} \end{pmatrix}.
$$
• Compute the LU factorization with pivoting of the “current panel” $\begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix}$:
$$
P(p_1) \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix} = \begin{pmatrix} L_{11} \\ L_{21} \end{pmatrix} U_{11},
$$
• Solve $L_{11} U_{12} = A_{12}$, overwriting $A_{12}$ with $U_{12}$. (This can also be more concisely written as $A_{12} := L_{11}^{-1} A_{12}$.)
Careful consideration shows that this puts the matrix A in the state
$$
\begin{pmatrix} A_{00} & A_{01} & A_{02} \\ A_{10} & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} L\backslash U_{00} & U_{01} & U_{02} \\ L_{10} & L\backslash U_{11} & U_{12} \\ L_{20} & L_{21} & \hat A_{22} - L_{20} U_{02} - L_{21} U_{12} \end{pmatrix},
$$
where
$$
P\!\begin{pmatrix} p_0 \\ p_1 \end{pmatrix}
\begin{pmatrix} \hat A_{00} & \hat A_{01} & \hat A_{02} \\ \hat A_{10} & \hat A_{11} & \hat A_{12} \\ \hat A_{20} & \hat A_{21} & \hat A_{22} \end{pmatrix}
=
\begin{pmatrix} L_{00} & 0 & 0 \\ L_{10} & L_{11} & 0 \\ L_{20} & L_{21} & I \end{pmatrix}
\begin{pmatrix} U_{00} & U_{01} & U_{02} \\ 0 & U_{11} & U_{12} \\ 0 & 0 & A_{22} \end{pmatrix}.
$$
All LU factorization algorithms presented in this note perform exactly the same floating point operations
(with some rearrangement of data thrown in for the algorithms that perform pivoting) as does the triple-
nested loop that implements Gaussian elimination:
12.14 Inverting a Matrix
where $e_j$ equals the jth standard basis vector. Thus $A x_j = e_j$, and hence any method for solving Ax = y
can be used to solve each of these problems, of which there are n.
• Solve $UY = L^{-1}$.
Homework exercises will show that this can be done at a cost of, approximately, $n^3$ flops. (I need
to check this!)
• Permute the columns of the result: $X = YP$.
This incurs an $O(n^2)$ cost.
The total cost now is $2 n^3$ flops.
Homework 12.24 Show that
$$
L^{-1} = \begin{pmatrix} L_{00}^{-1} & 0 \\ -l_{10}^T L_{00}^{-1} & 1 \end{pmatrix}.
$$
* SEE ANSWER
Homework 12.25 Use Homework 12.24 and Figure 12.12 to propose an algorithm for overwriting a unit
lower triangular matrix L with its inverse.
* SEE ANSWER
Homework 12.26 Show that the cost of the algorithm in Homework 12.25 is, approximately, $\frac{1}{3} n^3$ flops, where
L is $n \times n$.
* SEE ANSWER
Homework 12.27 Given n × n upper triangular matrix U and n × n unit lower triangular matrix L, pro-
pose an algorithm for computing Y where UY = L. Be sure to take advantage of zeros in U and L. The
cost of the resulting algorithm should be, approximately, n3 flops.
There is a way of, given that L and U have overwritten matrix A, overwriting this matrix with Y . Try
to find that algorithm. It is not easy...
* SEE ANSWER
Repartition
$$
\begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} L_{00} & l_{01} & L_{02} \\ l_{10}^T & \lambda_{11} & l_{12}^T \\ L_{20} & l_{21} & L_{22} \end{pmatrix},
$$
where $\lambda_{11}$ is $1 \times 1$.
Continue with
$$
\begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix} \leftarrow
\begin{pmatrix} L_{00} & l_{01} & L_{02} \\ l_{10}^T & \lambda_{11} & l_{12}^T \\ L_{20} & l_{21} & L_{22} \end{pmatrix}
$$
endwhile
Figure 12.12: Algorithm skeleton for inverting a unit lower triangular matrix.
There are very few applications for which explicit inversion of a matrix is needed. When talking about
solving Ax = y, the term “inverting the matrix” is often used, since $x = A^{-1} y$. However, notice that
inverting the matrix and then multiplying it times vector y requires $2 n^3$ flops for the inversion and then $2 n^2$
flops for the multiplication. In contrast, computing the LU factorization requires $\frac{2}{3} n^3$ flops, after which the
solves with L and U require an additional $2 n^2$ flops. Thus, it is essentially always beneficial to instead
compute the LU factorization and to use it to solve for x.
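The factor-then-solve approach can be sketched in Python as follows (our own naming; no pivoting, purely for illustration):

```python
def solve_via_lu(A, y):
    # Factor A = LU (right-looking, no pivoting), then solve Lz = y and Ux = z.
    n = len(A)
    LU = [row[:] for row in A]          # do not destroy the input
    for k in range(n):
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    z = list(y)
    for i in range(n):                   # L z = y (unit lower triangular)
        for j in range(i):
            z[i] -= LU[i][j] * z[j]
    x = z
    for i in range(n - 1, -1, -1):       # U x = z
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x
```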
There are also numerical stability benefits to this, which we will not discuss now.
The problem is to estimate $\kappa(A) = \|A\| \|A^{-1}\|$ for some matrix norm induced by a vector norm. If A is a
dense matrix, then inverting the matrix requires approximately $2 n^3$ computations, which is more expensive
than solving Ax = y via, for example, LU factorization with partial pivoting. A second problem is that
computing the 2-norms $\|A\|_2$ and $\|A^{-1}\|_2$ is less than straightforward and computationally expensive.
12.15.2 Insights
The second problem can be solved by choosing either the 1-norm or the ∞-norm. The first problem can
be solved by settling for an estimate of the condition number. For this we will try to find a lower bound
close to the actual condition number: then the user gains insight into how many digits of accuracy
might be lost, and knows the problem is poorly conditioned if the estimate is large. If the estimate is
small, the jury is out on whether the matrix is actually well-conditioned.
Notice that
$$
\|A^{-1}\|_\infty = \max_{\|y\|_\infty = 1} \|A^{-1} y\|_\infty \geq \|A^{-1} d\|_\infty = \|x\|_\infty,
$$
where $Ax = d$ and $\|d\|_\infty = 1$. So, we want to find a vector (direction) d with
$\|d\|_\infty = 1$ that has the property that $\|x\|_\infty$ is nearly maximally magnified. Remember from the proof of the
fact that $\|A\|_\infty$ equals the maximum absolute row sum that the maximizing vector y has entries $\pm 1$,
which has the property that $\|y\|_\infty = 1$. So, it would be good to look for a vector d with the special property
that
$$
d = \begin{pmatrix} \pm 1 \\ \pm 1 \\ \vdots \\ \pm 1 \end{pmatrix}.
$$
So, we are looking for x that satisfies $U^T L^T x = d$ where d has $\pm 1$ for each of its elements. Importantly,
we get to choose d.
A simple approach proceeds by picking d in an effort to make $\|z\|_\infty$ large, where
$U^T z = d$. After this, $L^T x = z$ is solved and the estimate for $\|A^{-T}\|_\infty$ is $\|x\|_\infty$.
Partition
$$
U \rightarrow \begin{pmatrix} \upsilon_{11} & u_{12}^T \\ 0 & U_{22} \end{pmatrix}, \quad
z \rightarrow \begin{pmatrix} \zeta_1 \\ z_2 \end{pmatrix}, \quad\mbox{and}\quad
d \rightarrow \begin{pmatrix} \delta_1 \\ d_2 \end{pmatrix}.
$$
Then $U^T z = d$ means
$$
\begin{pmatrix} \upsilon_{11} & u_{12}^T \\ 0 & U_{22} \end{pmatrix}^T
\begin{pmatrix} \zeta_1 \\ z_2 \end{pmatrix}
=
\begin{pmatrix} \delta_1 \\ d_2 \end{pmatrix}
$$
and hence
$$
\upsilon_{11} \zeta_1 = \delta_1, \qquad \zeta_1 u_{12} + U_{22}^T z_2 = d_2.
$$
Now, $\delta_1$ can be chosen as $\pm 1$. (The choice does not make a difference; might as well make it $\delta_1 = 1$.)
Then $\zeta_1 = 1/\upsilon_{11}$.
Now assume that the process has proceeded to the point where
$$
U \rightarrow \begin{pmatrix} U_{00} & u_{01} & U_{02} \\ 0 & \upsilon_{11} & u_{12}^T \\ 0 & 0 & U_{22} \end{pmatrix}, \quad
z \rightarrow \begin{pmatrix} z_0 \\ \zeta_1 \\ z_2 \end{pmatrix}, \quad\mbox{and}\quad
d \rightarrow \begin{pmatrix} d_0 \\ \delta_1 \\ d_2 \end{pmatrix},
$$
Figure 12.13 presents the two resulting algorithms side by side, $z := \mathrm{ApproxNormUinvT\_unb\_var1}(U, z)$ (DOT-based, left) and $z := \mathrm{ApproxNormUinvT\_unb\_var2}(U, z)$ (AXPY-based, right, which starts by setting $z := 0$); both march through U and z by repartitioning them as above,
so that $U^T z = d$ means
$$
\begin{array}{r c l}
U_{00}^T z_0 & = & d_0 \\
u_{01}^T z_0 + \upsilon_{11} \zeta_1 & = & \delta_1 \\
U_{02}^T z_0 + \zeta_1 u_{12} + U_{22}^T z_2 & = & d_2.
\end{array}
$$
Assume that $d_0$ and $z_0$ have already been computed, and that $\delta_1$ and $\zeta_1$ are now to be computed.
Constrained by the fact that $\delta_1 = \pm 1$, the magnitude of $\zeta_1$ can be (locally) maximized by choosing
$$\delta_1 := -\mathrm{sign}(u_{01}^T z_0)$$
and then
$$\zeta_1 := (\delta_1 - u_{01}^T z_0)/\upsilon_{11}.$$
This suggests the algorithm in Figure 12.13 (left). By then taking the output vector z and solving the unit
upper triangular system $L^T x = z$, we compute $\|x\|_\infty \leq \|A^{-T}\|_\infty = \|A^{-1}\|_1$, so that $\|A\|_1 \|x\|_\infty$ gives a lower-bound estimate of $\kappa_1(A)$.
An alternative algorithm that casts computation in terms of AXPY operations instead of DOT products, but computes the
exact same z as the last algorithm, can be derived as follows: The first (top) entry of d can again be chosen
to equal 1. Assume we have proceeded to the point where
$$
U \rightarrow \begin{pmatrix} U_{00} & u_{01} & U_{02} \\ 0 & \upsilon_{11} & u_{12}^T \\ 0 & 0 & U_{22} \end{pmatrix}, \quad
z \rightarrow \begin{pmatrix} z_0 \\ \zeta_1 \\ z_2 \end{pmatrix}, \quad\mbox{and}\quad
d \rightarrow \begin{pmatrix} d_0 \\ \delta_1 \\ d_2 \end{pmatrix}.
$$
Assume that $d_0$ and $z_0$ have already been computed, and that $\zeta_1$ contains $-u_{01}^T z_0$ while $z_2$ contains $-U_{02}^T z_0$.
Then
$$\zeta_1 := (\delta_1 - u_{01}^T z_0)/\upsilon_{11} = (\delta_1 + \zeta_1)/\upsilon_{11}.$$
Constrained by the fact that $\delta_1 = \pm 1$, the magnitude of $\zeta_1$ can be (locally) maximized by choosing
$$\delta_1 := -\mathrm{sign}(u_{01}^T z_0) = \mathrm{sign}(\zeta_1)$$
and then
$$\zeta_1 := (\delta_1 + \zeta_1)/\upsilon_{11}.$$
After this, $z_2$ is updated with
$$z_2 := z_2 - \zeta_1 u_{12}.$$
This suggests the algorithm in Figure 12.13 (right). It computes the same z as does the algorithm in Figure 12.13 (left).
By then taking the output vector z and solving the unit upper triangular system $L^T x = z$, we again
compute $\|x\|_\infty \leq \|A^{-T}\|_\infty = \|A^{-1}\|_1$, so that $\|A\|_1 \|x\|_\infty$ again gives a lower-bound estimate of $\kappa_1(A)$.
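A rough Python sketch of the DOT-based heuristic (our own naming; `L` and `U` come from some factorization $A = LU$ with unit lower triangular L):

```python
def est_inv_norm1(L, U):
    # Greedily choose d with entries +-1 while solving U^T z = d so as to
    # make |zeta_1| large at each step, then solve L^T x = z.
    # max|x_i| is a lower bound on ||A^{-T}||_inf = ||A^{-1}||_1.
    n = len(U)
    z = [0.0] * n
    for i in range(n):
        s = sum(U[j][i] * z[j] for j in range(i))   # u01^T z0
        d_i = -1.0 if s > 0 else 1.0                # delta_1 := -sign(u01^T z0)
        z[i] = (d_i - s) / U[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                  # L^T x = z (unit diagonal)
        x[i] = z[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))
    return max(abs(v) for v in x)
```

Multiplying the returned value by $\|A\|_1$ then yields the condition-number estimate.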
12.15.4 Discussion
There is a subtle reason why we estimate $\|A^{-1}\|_1$ by estimating $\|A^{-T}\|_\infty$ instead. The reason is that
when the LU factorization with pivoting, PA = LU, is computed, poor conditioning of the problem tends
to translate more into poor conditioning of matrix U than of L, since L is chosen to be unit lower triangular
and the magnitudes of its strictly lower triangular elements are bounded by one. So, by choosing d based
on U, we use information about the triangular matrix that is more important.
A secondary reason is that if element growth happens during the factorization, that element growth
exhibits itself in artificially large elements of U, which then translates to a less well-conditioned matrix U.
In this case the condition number of A may not be all that bad, but when its factorization is used to solve
Ax = y the effective conditioning is worse and that effective conditioning is due to U.
12.16 Wrapup
12.16.2 Summary
Chapter 13
Notes on Cholesky Factorization
Video
Read disclaimer regarding the videos in the preface!
* YouTube
* Download from UT Box
* View After Local Download
13.1.1 Launch
253
Chapter 13. Notes on Cholesky Factorization 254
13.1.2 Outline
Definition 13.1 A matrix A ∈ Cm×m is Hermitian positive definite (HPD) if and only if it is Hermitian
(AH = A) and for all nonzero vectors x ∈ Cm it is the case that xH Ax > 0. If in addition A ∈ Rm×m then A
is said to be symmetric positive definite (SPD).
(If you feel uncomfortable with complex arithmetic, just replace the word “Hermitian” with “symmetric”
in this document and the Hermitian transpose operation, $^H$, with the transpose operation, $^T$.)
Example 13.2 Consider the case where m = 1 so that A is a real scalar, α. Notice that then A is SPD if
and only if α > 0. This is because then for all nonzero χ ∈ R it is the case that αχ2 > 0.
Homework 13.3 Let B ∈ Cm×n have linearly independent columns. Prove that A = BH B is HPD.
* SEE ANSWER
Homework 13.4 Let A ∈ Cm×m be HPD. Show that its diagonal elements are real and positive.
* SEE ANSWER
Theorem 13.5 (Cholesky Factorization Theorem) Given a HPD matrix A there exists a lower triangular
matrix L such that A = LLH .
Obviously, there similarly exists an upper triangular matrix U such that A = U H U since we can choose
U H = L.
The lower triangular matrix L is known as the Cholesky factor and LLH is known as the Cholesky
factorization of A. It is unique if the diagonal elements of L are restricted to be positive.
The operation that overwrites the lower triangular part of matrix A with its Cholesky factor will be
denoted by $A := \mathrm{Chol}(A)$, which should be read as “A becomes its Cholesky factor.” Typically, only the
lower (or upper) triangular part of A is stored, and it is that part that is then overwritten with the result. In
this discussion, we will assume that the lower triangular part of A is stored and overwritten.
13.3 Application
The Cholesky factorization is used to solve the linear system Ax = y when A is HPD: Substituting the
factors into the equation yields $L L^H x = y$. Letting $z = L^H x$,
$$
Ax = L \underbrace{(L^H x)}_{z} = L z = y.
$$
Thus, z can be computed by solving the triangular system of equations Lz = y and subsequently the desired
solution x can be computed by solving the triangular linear system LH x = z.
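Given the Cholesky factor, the two triangular solves can be sketched in Python (real SPD case; our own naming):

```python
def solve_with_cholesky_factor(L, y):
    # Given lower triangular L with A = L L^T (real case), solve A x = y
    # via the forward solve L z = y followed by the back solve L^T x = z.
    n = len(y)
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - sum(L[i][j] * z[j] for j in range(i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))) / L[i][i]
    return x
```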
13.4 An Algorithm
The most common algorithm for computing $A := \mathrm{Chol}(A)$ can be derived as follows: Consider $A = LL^H$.
Partition
$$
A = \begin{pmatrix} \alpha_{11} & \star \\ a_{21} & A_{22} \end{pmatrix}
\quad\mbox{and}\quad
L = \begin{pmatrix} \lambda_{11} & 0 \\ l_{21} & L_{22} \end{pmatrix}.
\qquad (13.1)
$$
Remark 13.6 We adopt the commonly used notation where Greek lower case letters refer to scalars, lower
case letters refer to (column) vectors, and upper case letters refer to matrices. (This convention is often
attributed to Alston Householder.) The $\star$ refers to a part of A that is neither stored nor updated.
Remark 13.7 Similar to the tril function in Matlab, we use tril (B) to denote the lower triangular part
of matrix B.
Proof: Since A is Hermitian, so are $A_{22}$ and $A_{22} - l_{21} l_{21}^H$.
Let $x_2 \in \mathbb{C}^{n-1}$ be an arbitrary nonzero vector. Define $x = \begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix}$, where $\chi_1 = -a_{21}^H x_2 / \alpha_{11}$.
Then, since $x \neq 0$,
$$
\begin{array}{r c l}
0 < x^H A x & = &
\begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix}^H
\begin{pmatrix} \alpha_{11} & a_{21}^H \\ a_{21} & A_{22} \end{pmatrix}
\begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix}^H
\begin{pmatrix} \alpha_{11} \chi_1 + a_{21}^H x_2 \\ a_{21} \chi_1 + A_{22} x_2 \end{pmatrix} \\[1ex]
& = & \alpha_{11} |\chi_1|^2 + \bar\chi_1 a_{21}^H x_2 + x_2^H a_{21} \chi_1 + x_2^H A_{22} x_2 \\[1ex]
& = & \displaystyle \alpha_{11} \frac{a_{21}^H x_2}{\alpha_{11}} \frac{x_2^H a_{21}}{\alpha_{11}}
- \frac{x_2^H a_{21}}{\alpha_{11}} a_{21}^H x_2
- x_2^H a_{21} \frac{a_{21}^H x_2}{\alpha_{11}} + x_2^H A_{22} x_2 \\[1ex]
& = & \displaystyle x_2^H \left( A_{22} - \frac{a_{21} a_{21}^H}{\alpha_{11}} \right) x_2
\quad \mbox{(since $x_2^H a_{21} a_{21}^H x_2$ is real and hence equals $a_{21}^H x_2 \, x_2^H a_{21}$)} \\[1ex]
& = & x_2^H ( A_{22} - l_{21} l_{21}^H ) x_2.
\end{array}
$$
We conclude that $A_{22} - l_{21} l_{21}^H$ is HPD.
Proof: (of the Cholesky Factorization Theorem) Proof by induction.
Base case: n = 1. Clearly the result is true for a $1 \times 1$ matrix $A = \alpha_{11}$: In this case, the fact that A is
HPD means that $\alpha_{11}$ is real and positive, and a Cholesky factor is then given by $\lambda_{11} = \sqrt{\alpha_{11}}$, with
uniqueness if we insist that $\lambda_{11}$ is positive.
Inductive step: Assume the result is true for HPD matrices in $\mathbb{C}^{(n-1) \times (n-1)}$. We will show that it holds for
$A \in \mathbb{C}^{n \times n}$. Let $A \in \mathbb{C}^{n \times n}$ be HPD. Partition A and L as in (13.1) and let $\lambda_{11} = \sqrt{\alpha_{11}}$ (which is well-defined
by Lemma 13.8), $l_{21} = a_{21}/\lambda_{11}$, and $L_{22} = \mathrm{Chol}(A_{22} - l_{21} l_{21}^H)$ (which exists as a consequence
of the Inductive Hypothesis and Lemma 13.9). Then L is the desired Cholesky factor of A.
By the principle of mathematical induction, the theorem holds.
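The inductive construction translates directly into an unblocked algorithm. A minimal Python sketch for the real SPD case (our own naming; the notes present this in FLAME notation and M-script):

```python
import math

def chol_unblocked(A):
    # Right-looking unblocked Cholesky, mirroring the inductive construction:
    # lambda11 = sqrt(alpha11), l21 = a21/lambda11, then continue on
    # A22 - l21 l21^T. Overwrites the lower triangle of A with L.
    n = len(A)
    for k in range(n):
        A[k][k] = math.sqrt(A[k][k])
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
        for j in range(k + 1, n):
            for i in range(j, n):       # only the lower triangle is updated
                A[i][j] -= A[i][k] * A[j][k]
    return A
```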
13.6 Blocked Algorithm
where $A_{11}$ and $L_{11}$ are $b \times b$. By substituting these partitioned matrices into $A = LL^H$ we find that
$$
\begin{pmatrix} A_{11} & \star \\ A_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}
\begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}^H
=
\begin{pmatrix} L_{11} L_{11}^H & \star \\ L_{21} L_{11}^H & L_{21} L_{21}^H + L_{22} L_{22}^H \end{pmatrix},
$$
so that
$$
\begin{array}{l l}
L_{11} = \mathrm{Chol}(A_{11}) & \star \\
L_{21} = A_{21} L_{11}^{-H} & L_{22} = \mathrm{Chol}(A_{22} - L_{21} L_{21}^H).
\end{array}
$$
Remark 13.10 The Cholesky factorization $A_{11} := L_{11} = \mathrm{Chol}(A_{11})$ can be computed with the unblocked
algorithm or by calling the blocked Cholesky factorization algorithm recursively.
Remark 13.11 Operations like $L_{21} = A_{21} L_{11}^{-H}$ are computed by solving the equivalent linear system with
multiple right-hand sides $L_{11} L_{21}^H = A_{21}^H$.
[Figure 13.1 consists of a progression of pictures: in each iteration, the “done” part of the matrix grows; the repartitioning exposes $\alpha_{11}$, $a_{21}$, and $A_{22}$; the updates $\alpha_{11} := \sqrt{\alpha_{11}}$, $a_{21} := a_{21}/\alpha_{11}$, and $A_{22} := A_{22} - a_{21} a_{21}^T$ are performed; and the thick lines then move past the updated row and column, leaving $A_{BR}$ partially updated.]
Figure 13.1: Left: Progression of pictures that explain the Cholesky factorization algorithm. Right: Same
pictures, annotated with labels and updates.
13.7 Alternative Representation
Unblocked: $\alpha_{11} := \sqrt{\alpha_{11}}$; $a_{21} := a_{21}/\alpha_{11}$; $A_{22} := A_{22} - \mathrm{tril}\,(a_{21} a_{21}^H)$.
Blocked: $A_{11} := \mathrm{Chol}(A_{11})$; $A_{21} := A_{21}\, \mathrm{tril}\,(A_{11})^{-H}$; $A_{22} := A_{22} - \mathrm{tril}\,(A_{21} A_{21}^H)$.
Figure 13.2: Unblocked and blocked algorithms for computing the Cholesky factorization in FLAME
notation.
Beginning of iteration: At some stage of the algorithm (Top of the loop), the computation has moved
through the matrix to the point indicated by the thick lines. Notice that we have finished with the
parts of the matrix that are in the top-left, top-right (which is not to be touched), and bottom-left
quadrants. The bottom-right quadrant has been updated to the point where we only need to perform
a Cholesky factorization of it.
Repartition: We now repartition the bottom-right submatrix to expose α11 , a21 , and A22 .
End of iteration: The thick lines are moved, since we now have completed more of the computation, and
only a factorization of A22 (which becomes the new bottom-right quadrant) remains to be performed.
Continue: The above steps are repeated until the submatrix ABR is empty.
To motivate our notation, we annotate this progression of pictures as in Figure 13.1 (right). In those
pictures, “T”, “B”, “L”, and “R” stand for “Top”, “Bottom”, “Left”, and “Right”, respectively. This then
motivates the format of the algorithm in Figure 13.2 (left). It uses what we call the FLAME notation for
representing algorithms [25, 24, 40]. A similar explanation can be given for the blocked algorithm, which
is given in Figure 13.2 (right). In the algorithms, m(A) indicates the number of rows of matrix A.
Remark 13.12 The indices in our more stylized presentation of the algorithms are subscripts rather than
indices in the conventional sense.
Remark 13.13 The notation in Figures 13.1 and 13.2 allows the contents of matrix A at the beginning of the
iteration to be formally stated:
$$
A = \begin{pmatrix} A_{TL} & \star \\ A_{BL} & A_{BR} \end{pmatrix}
=
\begin{pmatrix} L_{TL} & \star \\ L_{BL} & \hat A_{BR} - \mathrm{tril}\,(L_{BL} L_{BL}^H) \end{pmatrix},
$$
13.8 Cost
The cost of the Cholesky factorization of $A \in \mathbb{C}^{m \times m}$ can be analyzed as follows: In Figure 13.2 (left),
during the kth iteration (starting k at zero) $A_{00}$ is $k \times k$. Thus, the operations in that iteration cost
• $\alpha_{11} := \sqrt{\alpha_{11}}$: negligible when k is large.
• $a_{21} := a_{21}/\alpha_{11}$: approximately $(m - k - 1)$ flops.
• $A_{22} := A_{22} - \mathrm{tril}\,(a_{21} a_{21}^H)$: approximately $(m - k - 1)^2$ flops.
Summing over the iterations,
$$
C_{\mathrm{Chol}}(m) \approx
\underbrace{\sum_{k=0}^{m-1} (m - k - 1)^2}_{\mbox{(due to update of $A_{22}$)}}
+ \underbrace{\sum_{k=0}^{m-1} (m - k - 1)}_{\mbox{(due to update of $a_{21}$)}}
= \sum_{j=0}^{m-1} j^2 + \sum_{j=0}^{m-1} j \approx \frac{1}{3} m^3 + \frac{1}{2} m^2 \approx \frac{1}{3} m^3,
$$
which allows us to state that (obviously) most computation is in the update of $A_{22}$. It can be shown that the
blocked Cholesky factorization algorithm performs exactly the same number of floating point operations.
Comparing the cost of the Cholesky factorization to that of the LU factorization from “Notes on LU
Factorization” we see that taking advantage of symmetry cuts the cost approximately in half.
Recall that if $B \in \mathbb{C}^{m \times n}$ has linearly independent columns, then $A = B^H B$ is HPD. Also, recall from “Notes
on Linear Least-Squares” that the solution $x \in \mathbb{C}^n$ to the linear least-squares (LLS) problem satisfies the normal equations
$$
\underbrace{B^H B}_{A} x = \underbrace{B^H y}_{\hat y}.
$$
This makes it obvious how the Cholesky factorization can be (and often is) used to solve the LLS problem.
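For illustration, the normal-equations approach can be sketched end to end in Python for the real case (our own naming; forming $B^T B$ explicitly squares the condition number, so this is a sketch, not a recommendation over QR):

```python
import math

def lls_via_cholesky(B, y):
    # Solve min ||Bx - y||_2 via B^T B x = B^T y using an unblocked
    # Cholesky factorization followed by two triangular solves.
    m, n = len(B), len(B[0])
    A = [[sum(B[k][i] * B[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    b = [sum(B[k][i] * y[k] for k in range(m)) for i in range(n)]
    # Cholesky: overwrite the lower triangle of A with L
    for k in range(n):
        A[k][k] = math.sqrt(A[k][k])
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
        for j in range(k + 1, n):
            for i in range(j, n):
                A[i][j] -= A[i][k] * A[j][k]
    # L z = b, then L^T x = z
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(A[i][j] * z[j] for j in range(i))) / A[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - sum(A[j][i] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x
```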
Homework 13.15 Consider B ∈ Cm×n with linearly independent columns. Recall that B has a QR fac-
torization, B = QR where Q has orthonormal columns and R is an upper triangular matrix with positive
diagonal elements. How are the Cholesky factorization of BH B and the QR factorization of B related?
* SEE ANSWER
Repartition
$$
\begin{pmatrix} A_{TL} & \star \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} A_{00} & \star & \star \\ a_{10}^T & \alpha_{11} & \star \\ A_{20} & a_{21} & A_{22} \end{pmatrix},
$$
where $\alpha_{11}$ is $1 \times 1$.
Continue with
$$
\begin{pmatrix} A_{TL} & \star \\ A_{BL} & A_{BR} \end{pmatrix} \leftarrow
\begin{pmatrix} A_{00} & \star & \star \\ a_{10}^T & \alpha_{11} & \star \\ A_{20} & a_{21} & A_{22} \end{pmatrix}
$$
endwhile
Figure 13.3: Unblocked Cholesky factorization, bordered variant, for Homework 13.17.
13.11. Implementing the Cholesky Factorization with the (Traditional) BLAS 265
(Hint: For this exercise, use techniques similar to those in Section 13.5.)
2. Assuming that $A_{00} = L_{00} L_{00}^T$, where $L_{00}$ is lower triangular and nonsingular, argue that the assignment
$l_{10}^T := a_{10}^T L_{00}^{-T}$ is well-defined.
3. Assuming that $A_{00}$ is SPD, $A_{00} = L_{00} L_{00}^T$ where $L_{00}$ is lower triangular and nonsingular, and $l_{10}^T = a_{10}^T L_{00}^{-T}$,
show that $\alpha_{11} - l_{10}^T l_{10} > 0$, so that $\lambda_{11} := \sqrt{\alpha_{11} - l_{10}^T l_{10}}$ is well-defined.
4. Show that
$$
\begin{pmatrix} A_{00} & a_{10} \\ a_{10}^T & \alpha_{11} \end{pmatrix}
=
\begin{pmatrix} L_{00} & 0 \\ l_{10}^T & \lambda_{11} \end{pmatrix}
\begin{pmatrix} L_{00} & 0 \\ l_{10}^T & \lambda_{11} \end{pmatrix}^T.
$$
* SEE ANSWER
Homework 13.17 Use the results in the last exercise to give an alternative proof by induction of the
Cholesky Factorization Theorem and to give an algorithm for computing it by filling in Figure 13.3. This
algorithm is often referred to as the bordered Cholesky factorization algorithm.
* SEE ANSWER
Homework 13.18 Show that the cost of the bordered algorithm is, approximately, $\frac{1}{3} n^3$ flops.
* SEE ANSWER
Simple (triple-nested loops, $\alpha_{j,j} := \sqrt{\alpha_{j,j}}$, $\alpha_{i,j} := \alpha_{i,j}/\alpha_{j,j}$, $\alpha_{i,k} := \alpha_{i,k} - \alpha_{i,j}\alpha_{k,j}$):

    do j=1, n
       A(j,j) = sqrt( A(j,j) )
       do i=j+1, n
          A(i,j) = A(i,j) / A(j,j)
       enddo
       do k=j+1, n
          do i=k, n
             A(i,k) = A(i,k) - A(i,j) * A(k,j)
          enddo
       enddo
    enddo

Vector-vector ($\alpha_{j,j} := \sqrt{\alpha_{j,j}}$; $\alpha_{j+1:n,j} := \alpha_{j+1:n,j}/\alpha_{j,j}$; $\alpha_{k:n,k} := -\alpha_{k,j}\alpha_{k:n,j} + \alpha_{k:n,k}$):

    do j=1, n
       A(j,j) = sqrt( A(j,j) )
       call dscal( n-j, 1.0d00 / A(j,j), A(j+1,j), 1 )
       do k=j+1, n
          call daxpy( n-k+1, -A(k,j), A(k,j), 1, A(k,k), 1 )
       enddo
    enddo

Figure 13.4: Simple and vector-vector (level-1 BLAS) based representations of the right-looking algorithm.
13.11. Implementing the Cholesky Factorization with the (Traditional) BLAS 267
Matrix-vector: as in the level-1 version, $\alpha_{j,j} := \sqrt{\alpha_{j,j}}$ (A(j,j) = sqrt(A(j,j))) and the scaling of $a_{21}$ are followed by the symmetric rank-1 update $A_{22} := A_{22} - a_{21} a_{21}^T$, cast as a single call to dsyr.
Figure 13.5: Matrix-vector (level-2 BLAS) based representation of the right-looking algorithm.
We start with a simple implementation in Fortran. A simple algorithm that does not use the BLAS and the
corresponding code are given in the row labeled “Simple” in Figure 13.4. This sets the stage for our
explanation of how the algorithm and code can be represented with vector-vector, matrix-vector, and
matrix-matrix operations, and the corresponding calls to BLAS routines.
The first BLAS interface [29] was proposed in the 1970s when vector supercomputers like the early Cray
architectures reigned. On such computers, it sufficed to cast computation in terms of vector operations.
As long as memory was accessed mostly contiguously, near-peak performance could be achieved. This
interface is now referred to as the Level-1 BLAS. It was used for the implementation of the first widely
used dense linear algebra package, LINPACK [17].
Let x and y be vectors of appropriate length and let α be a scalar. In this and other notes we encounter vector-vector
operations such as scaling of a vector (x := αx), inner (dot) product (α := xᵀy), and scaled vector
addition (y := αx + y). This last operation is known as an axpy, which stands for alpha times x plus y.
Our Cholesky factorization algorithm expressed in terms of such vector-vector operations and the cor-
responding code are given in Figure 13.4 in the row labeled “Vector-vector”. If the operations supported
by dscal and daxpy achieve high performance on a target architecture (as was the case in the days of vector
supercomputers) then so will the implementation of the Cholesky factorization, since it casts most compu-
tation in terms of those operations. Unfortunately, vector-vector operations perform O(n) computation on
O(n) data, meaning that these days the bandwidth to memory typically limits performance, since retriev-
ing a data item from memory is often more than an order of magnitude more costly than a floating point
operation with that data item.
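As a concrete sketch of the vector-vector formulation, here is the unblocked right-looking algorithm in plain Python rather than the Fortran of Figure 13.4. The comments name the BLAS routine each loop plays the role of; the function name and list-of-lists representation are ours, for illustration only.

```python
import math

def chol_vector_vector(A):
    # Unblocked right-looking Cholesky with every update phrased as a
    # level-1 (vector-vector) operation: one scal and a sequence of axpys.
    # A: list of lists, symmetric positive definite; the lower triangle
    # is overwritten with L such that A = L L^T.
    n = len(A)
    for j in range(n):
        A[j][j] = math.sqrt(A[j][j])      # alpha11 := sqrt(alpha11)
        for i in range(j + 1, n):         # a21 := a21 / alpha11  (dscal)
            A[i][j] /= A[j][j]
        for k in range(j + 1, n):         # A22 := A22 - a21 a21^T,
            for i in range(k, n):         # one daxpy per column of A22
                A[i][k] -= A[i][j] * A[k][j]
    return A
```

For the 2 × 2 matrix with rows (4, 2) and (2, 5) this yields the factor L with rows (2, 0) and (1, 2).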
We summarize information about level-1 BLAS in Figure 13.6.
The next level of BLAS supports operations with matrices and vectors. The simplest example of such
an operation is the matrix-vector product: y := Ax, where x and y are vectors and A is a matrix. Another
example is the computation A22 := −a21 a21^T + A22 (symmetric rank-1 update) in the Cholesky factorization.
Here only the lower (or upper) triangular part of the matrix is updated, taking advantage of symmetry.
The use of the symmetric rank-1 update is illustrated in Figure 13.5, in the row labeled “Matrix-vector”.
There dsyr is the routine that implements a double precision symmetric rank-1 update. Readability of the
code is improved by casting computation in terms of routines that implement the operations that appear in
the algorithm: dscal for a21 := a21/α11 and dsyr for A22 := −a21 a21^T + A22.
If the operation supported by dsyr achieves high performance on a target architecture (as was the case
in the days of vector supercomputers) then so will this implementation of the Cholesky factorization, since
it casts most computation in terms of that operation. Unfortunately, matrix-vector operations perform
O(n2) computation on O(n2) data, meaning that these days the bandwidth to memory typically limits
performance.
We summarize information about level-2 BLAS in Figure 13.7.
13.11. Implementing the Cholesky Factorization with the (Traditional) BLAS 269
A representative call to a level-1 BLAS routine is given by

call daxpy( n, alpha, x, incx, y, incy )

which implements the operation y := αx + y. Here

• The “d” indicates the data type. The choices for this first letter are

  s  single precision
  d  double precision
  c  single precision complex
  z  double precision complex

• x and y indicate the memory locations where the first elements of x and y are stored,
respectively.

• incx and incy equal the increment by which one has to stride through memory for the
elements of vectors x and y, respectively.

The following table gives the most commonly used level-1 BLAS operations:

routine/
function   operation
swap       x ↔ y
scal       x ← αx
copy       y ← x
axpy       y ← αx + y
dot        xT y
nrm2       ‖x‖2
asum       ‖re(x)‖1 + ‖im(x)‖1
imax       min(k) : |re(xk)| + |im(xk)| = maxi (|re(xi)| + |im(xi)|)

A representative call to a level-2 BLAS routine is given by

call dsyr( uplo, n, alpha, x, incx, A, lda )

which implements the operation A := αxxT + A, updating the lower or upper triangular part of
A by choosing uplo as ‘Lower triangular’ or ‘Upper triangular’, respectively. The
parameter lda (the leading dimension of matrix A) indicates the increment by which memory
has to be traversed in order to address successive elements in a row of matrix A. In addition,
operations with banded matrices are supported, which we do not discuss here.

The following table gives the most commonly used level-2 BLAS operations:

routine/
function   operation
gemv       general matrix-vector multiplication
symv       symmetric matrix-vector multiplication
trmv       triangular matrix-vector multiplication
trsv       triangular solve vector
ger        general rank-1 update
syr        symmetric rank-1 update
syr2       symmetric rank-2 update
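To make the semantics of the symmetric rank-1 update concrete, here is a minimal Python sketch (the function name is ours) of what dsyr computes when uplo selects the lower triangle:

```python
def syr_lower(alpha, x, A):
    # Symmetric rank-1 update, as performed by dsyr with
    # uplo = 'Lower triangular': A := alpha * x * x^T + A,
    # where only the lower triangular part of A is stored and updated.
    n = len(x)
    for j in range(n):
        for i in range(j, n):
            A[i][j] += alpha * x[i] * x[j]
    return A
```

Note that the strictly upper triangular part of A is never read or written, which is exactly how the Cholesky update A22 := −a21 a21^T + A22 exploits symmetry.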
Algorithm:

for j = 1 : n in steps of b
  A j:j+b−1, j:j+b−1 := Chol( A j:j+b−1, j:j+b−1 )
  A j+b:n, j:j+b−1 := A j+b:n, j:j+b−1 tril( A j:j+b−1, j:j+b−1 )−H
  A j+b:n, j+b:n := A j+b:n, j+b:n − tril( A j+b:n, j:j+b−1 AH j+b:n, j:j+b−1 )
endfor

Code:

do j=1, n, nb
   jb = min( nb, n-j+1 )
   call dpotf2( ‘Lower triangular’, jb, A( j, j ), lda, info )
   call dtrsm( ‘Right’, ‘Lower triangular’, ‘Transpose’, ‘Nonunit diag’,
               n-j-jb+1, jb, 1.0d00, A( j, j ), lda,
               A( j+jb, j ), lda )
   call dsyrk( ‘Lower triangular’, ‘No transpose’,
               n-j-jb+1, jb, -1.0d00, A( j+jb, j ), lda,
               1.0d00, A( j+jb, j+jb ), lda )
enddo
FLAME Notation:

Partition A → [ ATL   ⋆
                ABL  ABR ]
where ATL is 0 × 0
while m(ATL) < m(A) do
  Determine block size b
  Repartition
    [ ATL   ⋆        [ A00   ⋆    ⋆
      ABL  ABR ] →     A10  A11   ⋆
                       A20  A21  A22 ]
  where A11 is b × b

  A11 := Chol(A11)
  A21 := A21 tril(A11)−H
  A22 := A22 − tril(A21 A21H)

  Continue with
    [ ATL   ⋆        [ A00   ⋆    ⋆
      ABL  ABR ] ←     A10  A11   ⋆
                       A20  A21  A22 ]
endwhile

FLAME/C (here the unblocked variant, in which b = 1 so that A11 = α11, A21 = a21):

int Chol_unb_var3( FLA_Obj A )
{
  FLA_Obj ATL, ATR,    A00,  a01,     A02,
          ABL, ABR,    a10t, alpha11, a12t,
                       A20,  a21,     A22;

  FLA_Part_2x2( A, &ATL, &ATR,
                   &ABL, &ABR, 0, 0, FLA_TL );

  while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) ){
    FLA_Repart_2x2_to_3x3(
        ATL, /**/ ATR,       &A00,  /**/ &a01,     &A02,
      /* ************* */  /* ************************** */
                             &a10t, /**/ &alpha11, &a12t,
        ABL, /**/ ABR,       &A20,  /**/ &a21,     &A22,
        1, 1, FLA_BR );
    /*-------------------------------------------------*/
    FLA_Sqrt( alpha11 );
    FLA_Invscal( alpha11, a21 );
    FLA_Syr( FLA_LOWER_TRIANGULAR, FLA_MINUS_ONE,
             a21, A22 );
    /*-------------------------------------------------*/
    FLA_Cont_with_3x3_to_2x2(
        &ATL, /**/ &ATR,       A00,  a01,     /**/ A02,
                               a10t, alpha11, /**/ a12t,
      /* ************** */   /* *********************** */
        &ABL, /**/ &ABR,       A20,  a21,     /**/ A22,
        FLA_TL );
  }
  return FLA_SUCCESS;
}
The naming convention for level-3 BLAS routines is similar to that for the level-2 BLAS.
A representative call to a level-3 BLAS operation is given by

call dsyrk( uplo, trans, n, k, alpha, A, lda, beta, C, ldc )

which implements the operation C := αAAT + βC or C := αAT A + βC, depending on the choice of
trans, updating the lower or upper triangular part of C as indicated by uplo. The following table
gives the most commonly used level-3 BLAS operations:

routine/
function   operation
gemm       general matrix-matrix multiplication
symm       symmetric matrix-matrix multiplication
trmm       triangular matrix-matrix multiplication
trsm       triangular solve with multiple right-hand sides
syrk       symmetric rank-k update
syr2k      symmetric rank-2k update
• The call to dtrsm implements A21 := L21, where L21 L11T = A21.
• The call to dsyrk implements A22 := −L21 L21T + A22.
The bulk of the computation is now cast in terms of matrix-matrix operations which can achieve high
performance.
We summarize information about level-3 BLAS in Figure 13.9.
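The blocked algorithm can be sketched in Python as follows (the function name and list-of-lists representation are ours); the comments indicate which level-3 BLAS routine each phase plays the role of in the Fortran code:

```python
import math

def chol_blocked(A, nb=2):
    # Blocked right-looking Cholesky: factor a diagonal block, then a
    # triangular solve with multiple right-hand sides (the role of dtrsm),
    # then a symmetric rank-nb update (the role of dsyrk).
    # A: list of lists, SPD; the lower triangle is overwritten with L.
    n = len(A)
    for j in range(0, n, nb):
        jb = min(nb, n - j)
        # A11 := Chol(A11): unblocked factorization of the diagonal block
        for k in range(j, j + jb):
            A[k][k] = math.sqrt(A[k][k] - sum(A[k][p] ** 2 for p in range(j, k)))
            for i in range(k + 1, j + jb):
                A[i][k] = (A[i][k] - sum(A[i][p] * A[k][p] for p in range(j, k))) / A[k][k]
        # A21 := A21 tril(A11)^{-T}: forward substitution, row by row (dtrsm)
        for i in range(j + jb, n):
            for k in range(j, j + jb):
                A[i][k] = (A[i][k] - sum(A[i][p] * A[k][p] for p in range(j, k))) / A[k][k]
        # A22 := A22 - A21 A21^T, lower triangle only (dsyrk)
        for r in range(j + jb, n):
            for c in range(j + jb, r + 1):
                A[r][c] -= sum(A[r][p] * A[c][p] for p in range(j, j + jb))
    return A
```

With nb chosen so that the three blocks fit in cache, almost all flops land in the rank-nb update, which is why casting the bulk of the work in dsyrk pays off.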
[Plot: performance in GFlops versus matrix size n, for one thread, of the hand-optimized, BLAS3, BLAS2, BLAS1, and triple-loop implementations.]
Figure 13.10: Performance of the different implementations of Cholesky factorization that use different
levels of BLAS. The target processor has a peak of 11.2 Gflops (billions of floating point operations per
second). BLAS1, BLAS2, and BLAS3 indicate that the bulk of computation was cast in terms of level-1,
-2, or -3 BLAS, respectively.
In a number of places in these notes we presented algorithms in FLAME notation. Clearly, there is a
disconnect between this notation and how the algorithms are then encoded with the BLAS interface. In
Figures 13.4, 13.5, and 13.8 we also show how the FLAME API for the C programming language [6]
allows the algorithms to be more naturally translated into code. While the traditional BLAS interface
underlies the implementation of Cholesky factorization and other algorithms in the widely used LAPACK
library [1], the FLAME/C API is used in our libflame library [25, 41, 42].
13.12.2 BLIS
The implementations that call BLAS in this paper are coded in Fortran. More recently, the languages of
choice for scientific computing have become C and C++. While there is a C interface to the traditional
BLAS called the CBLAS [11], we believe a more elegant such interface is the BLAS-like Library Instan-
tiation Software (BLIS) interface [43]. BLIS is not only a framework for rapid implementation of the
traditional BLAS, but also presents an alternative interface for C and C++ users.
13.13 Wrapup
13.13.2 Summary
Chapter 14
Notes on Eigenvalues and Eigenvectors
If you have forgotten how to find the eigenvalues and eigenvectors of 2 × 2 and 3 × 3 matrices, you may
want to review
Video
Read disclaimer regarding the videos in the preface!
* YouTube
* Download from UT Box
* View After Local Download
14.0.1 Outline
Video . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
14.0.1 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
14.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
14.2 The Schur and Spectral Factorizations . . . . . . . . . . . . . . . . . . . . . . . 283
14.3 Relation Between the SVD and the Spectral Decomposition . . . . . . . . . . . . 285
14.1 Definition
Definition 14.1 Let A ∈ Cm×m . Then λ ∈ C and nonzero x ∈ Cm are said to be an eigenvalue and corre-
sponding eigenvector if Ax = λx. The tuple (λ, x) is said to be an eigenpair. The set of all eigenvalues of A
is denoted by Λ(A) and is called the spectrum of A. The spectral radius of A, ρ(A), equals the magnitude
of the largest eigenvalue, in magnitude:
The action of A on an eigenvector x is as if it were multiplied by a scalar. The direction does not
change, only its length is scaled:
Ax = λx.
The following exercises expose some other basic properties of eigenvalues and eigenvectors:
Homework 14.4 Let λ be an eigenvalue of A and let Eλ (A) = {x ∈ Cm |Ax = λx} denote the set of all
eigenvectors of A associated with λ (including the zero vector, which is not really considered an eigenvec-
tor). Show that this set is a (nontrivial) subspace of Cm .
* SEE ANSWER
Definition 14.5 Given A ∈ Cm×m , the function pm (λ) = det(λI − A) is a polynomial of degree at most m.
This polynomial is called the characteristic polynomial of A.
The definition of pm (λ) and the fact that it is a polynomial of degree at most m is a consequence of the
definition of the determinant of an arbitrary square matrix. This definition is not particularly enlightening
other than that it allows one to succinctly relate eigenvalues to the roots of the characteristic polynomial.
Remark 14.6 The relation between eigenvalues and the roots of the characteristic polynomial yields a
disconcerting insight: A general formula for the eigenvalues of an m × m matrix with m > 4 does not exist.
The reason is that there is no general formula for the roots of a polynomial of degree m > 4. Given any
polynomial pm (χ) of degree m, an m × m matrix can be constructed such that its characteristic polynomial
is pm . If
pm (χ) = α0 + α1 χ + · · · + αm−1 χm−1 + χm
and
    [ −αm−1  −αm−2  −αm−3  · · ·  −α1  −α0
      1       0      0     · · ·   0    0
A =   0       1      0     · · ·   0    0
      0       0      1     · · ·   0    0
      ..      ..     ..     . .    ..   ..
      0       0      0     · · ·   1    0 ]
then
pm (λ) = det(λI − A)
Hence, we conclude that no general formula can be found for the eigenvalues of m × m matrices when
m > 4. What we will see in future “Notes on ...” is that we will instead create algorithms that converge to
the eigenvalues and/or eigenvectors of matrices.
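The construction can be checked numerically. The following Python sketch (the helper names companion and det are ours) builds the matrix above for p(x) = (x − 1)(x − 2)(x − 3) = x³ − 6x² + 11x − 6 and verifies that det(λI − A) vanishes at each root:

```python
def companion(coeffs):
    # Companion matrix of p(x) = c0 + c1 x + ... + c_{m-1} x^{m-1} + x^m,
    # in the layout used above: negated coefficients across the first row,
    # ones on the first subdiagonal.
    m = len(coeffs)
    A = [[0.0] * m for _ in range(m)]
    for j in range(m):
        A[0][j] = -coeffs[m - 1 - j]
    for i in range(1, m):
        A[i][i - 1] = 1.0
    return A

def det(M):
    # Determinant by cofactor expansion (fine for tiny matrices).
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))
```

Evaluating det(λI − A) at λ = 1, 2, 3 yields zero in each case, confirming that the roots of the polynomial are exactly the eigenvalues of its companion matrix.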
Theorem 14.7 Let A ∈ Cm×m and pm (λ) be its characteristic polynomial. Then λ ∈ Λ(A) if and only if
pm (λ) = 0.
Homework 14.8 The eigenvalues of a diagonal matrix equal the values on its diagonal. The eigenvalues
of a triangular matrix equal the values on its diagonal.
* SEE ANSWER
Corollary 14.9 If A ∈ Rm×m is real valued then some or all of its eigenvalues may be complex valued. In
this case, if λ ∈ Λ(A) then so is its conjugate, λ̄.
Proof: It can be shown that if A is real valued, then the coefficients of its characteristic polynomial are
all real valued. Complex roots of a polynomial with real coefficients come in conjugate pairs.
It is not hard to see that an eigenvalue that is a root of multiplicity k has at most k linearly independent
eigenvectors. It is, however, not necessarily the case that an eigenvalue that is a root of multiplicity k also
has k linearly independent eigenvectors. In other words, the null space of λI − A may have dimension less
than the algebraic multiplicity of λ. The prototypical counterexample is the k × k matrix
       [ µ  1  0  · · ·  0  0
         0  µ  1  · · ·  0  0
J(µ) =   ..     ..    ..  . .   ..   ..
         0  0  0  · · ·  µ  1
         0  0  0  · · ·  0  µ ]
where k > 1. Observe that λI − J(µ) is singular if and only if λ = µ. Since µI − J(µ) has k − 1 linearly
independent columns its null-space has dimension one: all eigenvectors are scalar multiples of each other.
This matrix is known as a Jordan block.
Definition 14.10 A matrix A ∈ Cm×m that has fewer than m linearly independent eigenvectors is said to
be defective. A matrix that does have m linearly independent eigenvectors is said to be nondefective.
Theorem 14.11 Let A ∈ Cm×m . There exist nonsingular matrix X and diagonal matrix Λ such that A =
XΛX −1 if and only if A is nondefective.
Proof:
(⇒). Assume there exist nonsingular matrix X and diagonal matrix Λ so that A = XΛX −1 . Then, equiva-
lently, AX = XΛ. Partition X by columns so that
A ( x0  x1  · · ·  xm−1 ) = ( x0  x1  · · ·  xm−1 ) [ λ0  0   · · ·  0
                                                      0   λ1  · · ·  0
                                                      ..  ..   . .   ..
                                                      0   0   · · ·  λm−1 ]
                          = ( λ0 x0  λ1 x1  · · ·  λm−1 xm−1 ).
Then, clearly, Ax j = λ j x j so that A has m linearly independent eigenvectors and is thus nondefective.
(⇐). Assume that A is nondefective. Let {x0 , · · · , xm−1 } equal m linearly independent eigenvectors cor-
responding to eigenvalues {λ0 , · · · , λm−1 }. If X = x0 x1 · · · xm−1 then AX = XΛ where Λ =
diag(λ0 , . . . , λm−1 ). Hence A = XΛX −1 .
Definition 14.12 Let µ ∈ Λ(A) and pm (λ) be the characteristic polynomial of A. Then the algebraic
multiplicity of µ is defined as the multiplicity of µ as a root of pm (λ).
Definition 14.13 Let µ ∈ Λ(A). Then the geometric multiplicity of µ is defined to be the dimension of
Eµ (A). In other words, the geometric multiplicity of µ equals the number of linearly independent eigen-
vectors that are associated with µ.
Theorem 14.14 Let A ∈ Cm×m . Let the eigenvalues of A be given by λ0 , λ1 , · · · , λk−1 , where an eigenvalue
is listed exactly n times if it has geometric multiplicity n. There exists a nonsingular matrix X such that

A = X [ J(λ0)   0     · · ·   0
        0       J(λ1) · · ·   0
        ..      ..     . .    ..
        0       0     · · ·   J(λk−1) ] X −1 .
For our discussion, the sizes of the Jordan blocks J(λi ) are not particularly important. Indeed, this decom-
position, known as the Jordan Canonical Form of matrix A, is not particularly interesting in practice. For
this reason, we do not discuss it further, nor do we give its proof.
Proof: Let λ ∈ Λ(A) and x be an associated eigenvector. Then Ax = λx if and only if Y −1 AYY −1 x = Y −1 λx
if and only if B(Y −1 x) = λ(Y −1 x).
Definition 14.16 Matrices A and B are said to be similar if there exists a nonsingular matrix Y such that
B = Y −1 AY .
Theorem 14.17 Schur Decomposition Theorem Let A ∈ Cm×m . Then there exist a unitary matrix Q and
upper triangular matrix U such that A = QUQH . This decomposition is called the Schur decomposition
of matrix A.
In the above theorem, Λ(A) = Λ(U) and hence the eigenvalues of A can be found on the diagonal of U.
Proof: We will outline how to construct Q so that QH AQ = U, an upper triangular matrix.
Since a polynomial of degree m has at least one root, matrix A has at least one eigenvalue, λ1 , and
corresponding eigenvector q1 , where we normalize this eigenvector to have length one. Thus Aq1 = λ1 q1 .
Choose Q2 so that Q = ( q1  Q2 ) is unitary. Then

QH AQ = ( q1  Q2 )H A ( q1  Q2 ) = [ q1H A q1   q1H A Q2
                                     Q2H A q1   Q2H A Q2 ]
      = [ λ1          q1H A Q2        [ λ1  wT
          λ1 Q2H q1   Q2H A Q2 ]  =     0   B ],

where wT = q1H A Q2 and B = Q2H A Q2 . This insight can be used to construct an inductive proof.
One should not mistake the above theorem and its proof as a constructive way to compute the Schur
decomposition: finding an eigenvalue and/or the eigenvalue associated with it is difficult.
Lemma 14.18 Let A ∈ Cm×m be of form

A = [ ATL  ATR
      0    ABR ].

Assume that QTL and QBR are unitary “of appropriate size”. Then

A = [ QTL  0      H  [ QTL ATL QTLH   QTL ATR QBRH     [ QTL  0
      0    QBR ]       0              QBR ABR QBRH ]     0    QBR ].
Homework 14.19 Prove Lemma 14.18. Then generalize it to a result for block upper triangular matrices:

A = [ A0,0  A0,1  · · ·  A0,N−1
      0     A1,1  · · ·  A1,N−1
      ..           . .   ..
      0     0     · · ·  AN−1,N−1 ]
* SEE ANSWER
Corollary 14.20 Let A ∈ Cm×m be of form

A = [ ATL  ATR
      0    ABR ].

Then Λ(A) = Λ(ATL) ∪ Λ(ABR).
Homework 14.21 Prove Corollary 14.20. Then generalize it to a result for block upper triangular matri-
ces.
* SEE ANSWER
A theorem that will later allow the eigenvalues and vectors of a real matrix to be computed (mostly)
without requiring complex arithmetic is given by
Theorem 14.22 Let A ∈ Rm×m . Then there exist a unitary matrix Q ∈ Rm×m and quasi upper triangular
matrix U ∈ Rm×m such that A = QUQT .
A quasi upper triangular matrix is a block upper triangular matrix where the blocks on the diagonal are
1 × 1 or 2 × 2. Complex eigenvalues of A are found as the complex eigenvalues of those 2 × 2 blocks on
the diagonal.
Theorem 14.23 Spectral Decomposition Theorem Let A ∈ Cm×m be Hermitian. Then there exist a
unitary matrix Q and diagonal matrix Λ ∈ Rm×m such that A = QΛQH . This decomposition is called the
Spectral decomposition of matrix A.
Proof: From the Schur Decomposition Theorem we know that there exist a unitary matrix Q and upper
triangular matrix U such that A = QUQH . Since A = AH we know that QUQH = QU H QH and hence
U = U H . But a Hermitian triangular matrix is diagonal with real valued diagonal entries.
What we conclude is that a Hermitian matrix is nondefective and its eigenvectors can be chosen to
form an orthogonal basis.
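For a 2 × 2 real symmetric matrix the Spectral Decomposition can be written down directly with a single rotation; the following Python sketch (function name ours) diagonalizes [[a, b], [b, c]] with Q = [[cos θ, −sin θ], [sin θ, cos θ]]:

```python
import math

def spectral_2x2(a, b, c):
    # Spectral decomposition of the symmetric matrix [[a, b], [b, c]]:
    # choose theta so that the off-diagonal of Q^T A Q vanishes
    # (tan 2*theta = 2b / (a - c)), giving Q^T A Q = diag(l0, l1).
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    cs, sn = math.cos(theta), math.sin(theta)
    l0 = a * cs * cs + 2.0 * b * cs * sn + c * sn * sn   # q0^T A q0
    l1 = a * sn * sn - 2.0 * b * cs * sn + c * cs * cs   # q1^T A q1
    return (l0, l1), ((cs, -sn), (sn, cs))
```

For a = c = 2, b = 1 this yields the real eigenvalues 3 and 1 with orthonormal eigenvectors, in line with the theorem. This single rotation is also the building block of Jacobi's method, discussed in a later chapter.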
Homework 14.24 Let A be Hermitian and λ and µ be distinct eigenvalues with eigenvectors xλ and xµ ,
respectively. Then xλH xµ = 0. (In other words, the eigenvectors of a Hermitian matrix corresponding to
distinct eigenvalues are orthogonal.)
* SEE ANSWER
Homework 14.26 Let A ∈ Cm×m and A = UΣV H its SVD. Relate the Spectral decompositions of AH A and
AAH to U, V , and Σ.
* SEE ANSWER
Chapter 15
Notes on the Power Method and Related
Methods
in which the Power Method and Inverse Power Methods are discussed at a more rudimentary level.
Video
Read disclaimer regarding the videos in the preface!
15.0.1 Outline
Video . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
15.0.1 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
15.1 The Power Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
15.1.1 First attempt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
15.1.2 Second attempt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
15.1.3 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
15.1.4 Practical Power Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
15.1.5 The Rayleigh quotient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
15.1.6 What if |λ0 | ≥ |λ1 |? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
15.2 The Inverse Power Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
15.3 Rayleigh-quotient Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
From “Notes on Eigenvalues and Eigenvectors” we know then that the columns of X equal eigenvectors
of A and the elements on the diagonal of Λ equal the eigenvalues:

X = ( x0  x1  · · ·  xm−1 )  and  Λ = diag(λ0 , λ1 , . . . , λm−1 )

so that

Axi = λi xi for i = 0, . . . , m − 1.
For most of this section we will assume that |λ0 | > |λ1 | ≥ · · · ≥ |λm−1 |.
Now, as long as ψ0 ≠ 0, clearly ψ0 λ0k x0 will eventually dominate, which means that v(k) will start pointing
in the direction of x0 . In other words, it will start pointing in the direction of an eigenvector corresponding
to λ0 . The problem is that it will become infinitely long if |λ0 | > 1 or infinitesimally short if |λ0 | < 1. All
is good if |λ0 | = 1.
so that

v(k) = X diag( 1, (λ1 /λ0 )k , . . . , (λm−1 /λ0 )k ) y.

Now, since |λ j /λ0 | < 1 for j ≥ 1, we can argue that

lim k→∞ v(k) = lim k→∞ X diag( 1, (λ1 /λ0 )k , . . . , (λm−1 /λ0 )k ) y
             = X diag( 1, 0, . . . , 0 ) y = X ψ0 e0 = ψ0 Xe0 = ψ0 x0 .
Thus, as long as ψ0 6= 0 (which means v must have a component in the direction of x0 ) this method will
eventually yield a vector in the direction of x0 . However, this time the problem is that we don’t know λ0
when we start.
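The remedy used in the practical Power Method is to normalize v(k) in each step instead of dividing by the unknown λ0. A sketch in Python (names and the choice of the infinity norm are ours), with the Rayleigh quotient discussed below as the eigenvalue estimate:

```python
def power_method(matvec, v, iters=100):
    # Practical Power Method sketch: repeatedly apply A and normalize.
    # matvec: function computing A v for a vector v (a list of floats).
    for _ in range(iters):
        w = matvec(v)
        nrm = max(abs(x) for x in w)      # infinity norm avoids over/underflow
        v = [x / nrm for x in w]
    # Rayleigh quotient v^T A v / v^T v as the eigenvalue estimate
    w = matvec(v)
    lam = sum(vi * wi for vi, wi in zip(v, w)) / sum(vi * vi for vi in v)
    return lam, v
```

For A = [[2, 1], [1, 2]] (eigenvalues 3 and 1) and starting vector (1, 0), the iteration converges to the dominant eigenpair with eigenvalue 3.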
15.1.3 Convergence
Before we make the algorithm practical, let us examine how fast the iteration converges. This requires a
few definitions regarding rates of convergence.
Notice that because of the equivalence of norms, if the sequence converges in one norm, it converges in
all norms.
|αk+1 − α| ≤ 0.99000
|αk+2 − α| ≤ 0.970299
|αk+3 − α| ≤ 0.932065
|αk+4 − α| ≤ 0.860058
|αk+5 − α| ≤ 0.732303
|αk+6 − α| ≤ 0.530905
|αk+7 − α| ≤ 0.279042
|αk+8 − α| ≤ 0.077085
|αk+9 − α| ≤ 0.005882
|αk+10 − α| ≤ 0.000034
If we consider α the correct result then, eventually, the number of correct digits roughly doubles in each
iteration. This can be explained as follows: If |αk − α| < 1, then the number of correct decimal digits is
given by
− log10 |αk − α|.
Since log10 is a monotonically increasing function,

log10 |αk+1 − α| ≤ log10 ( C|αk − α|2 ) = log10 (C) + 2 log10 |αk − α| ≤ 2 log10 |αk − α|

and hence

− log10 |αk+1 − α| ≥ 2 ( − log10 |αk − α| ),

where − log10 |αk+1 − α| is the number of correct digits in αk+1 and − log10 |αk − α| the number of
correct digits in αk .
Cubic convergence is dizzyingly fast: Eventually the number of correct digits triples from one iteration
to the next.
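Quadratic convergence can be observed with any Newton iteration; for instance (our example, not one from these notes), Newton's method for √2 squares the error in each step, so the number of correct digits roughly doubles:

```python
# Newton's iteration x := (x + 2/x)/2 converges quadratically to sqrt(2):
# the error is roughly squared from one iteration to the next.
x = 1.0
errs = []
for _ in range(5):
    x = 0.5 * (x + 2.0 / x)
    errs.append(abs(x - 2.0 ** 0.5))
```

The recorded errors shrink from about 8.6e-2 to 2.5e-3 to 2.1e-6 to 1.6e-12: each error is bounded by (roughly) the square of the previous one.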
We now define a convenient norm.
Lemma 15.3 Let X ∈ Cm×m be nonsingular. Define k · kX : Cm → R by kykX = kXyk for some given norm
k · k : Cm → R. Then k · kX is a norm.
Now,

v(k) − ψ0 x0 = X diag( 1, (λ1 /λ0 )k , . . . , (λm−1 /λ0 )k ) y − ψ0 x0
             = X [ diag( 1, (λ1 /λ0 )k , . . . , (λm−1 /λ0 )k ) y − ψ0 e0 ]
             = X diag( 0, (λ1 /λ0 )k , . . . , (λm−1 /λ0 )k ) y.

Hence

X −1 (v(k) − ψ0 x0 ) = diag( 0, (λ1 /λ0 )k , . . . , (λm−1 /λ0 )k ) y

and

X −1 (v(k+1) − ψ0 x0 ) = diag( 0, λ1 /λ0 , . . . , λm−1 /λ0 ) X −1 (v(k) − ψ0 x0 ).

Now, let ‖ · ‖ be a p-norm and its induced matrix norm and ‖ · ‖X −1 as defined in Lemma 15.3. Then

‖v(k+1) − ψ0 x0 ‖X −1 = ‖X −1 (v(k+1) − ψ0 x0 )‖
                      = ‖ diag( 0, λ1 /λ0 , . . . , λm−1 /λ0 ) X −1 (v(k) − ψ0 x0 ) ‖
                      ≤ |λ1 /λ0 | ‖X −1 (v(k) − ψ0 x0 )‖ = |λ1 /λ0 | ‖v(k) − ψ0 x0 ‖X −1 .
This shows that, in this norm, the convergence of v(k) to ψ0 x0 is linear: the difference between the current
approximation, v(k) , and the solution, ψ0 x0 , is reduced by at least a constant factor in each iteration.
Proof: Let x be an eigenvector of A and λ the associated eigenvalue. Then Ax = λx. Multiplying on the
left by xH yields xH Ax = λxH x which, since x ≠ 0, means that λ = xH Ax/(xH x).
Clearly this ratio, as a function of x, is continuous, and hence plugging an approximation to x0 into this
formula yields an approximation to λ0 .
Theorem 15.6 Let A ∈ Cm×m be nonsingular. Then λ and x are an eigenvalue and associated eigenvector
of A if and only if 1/λ and x are an eigenvalue and associated eigenvector of A−1 .
Show that

1/|λm−1 | > 1/|λm−2 | ≥ 1/|λm−3 | ≥ · · · ≥ 1/|λ0 |.
* SEE ANSWER
Thus, an eigenvector associated with the smallest (in magnitude) eigenvalue of A is an eigenvector associ-
ated with the largest (in magnitude) eigenvalue of A−1 . This suggest the following naive iteration:
for k = 0, . . .
v(k+1) = A−1 v(k)
v(k+1) = λm−1 v(k+1)
endfor
Of course, we would want to factor A = LU once and solve L(Uv(k+1) ) = v(k) rather than multiplying
with A−1 . From the analysis of the convergence of the “second attempt” for a Power Method algorithm
we conclude that now

‖v(k+1) − ψm−1 xm−1 ‖X −1 ≤ |λm−1 /λm−2 | ‖v(k) − ψm−1 xm−1 ‖X −1 .

A more practical algorithm normalizes instead:
for k = 0, . . .
v(k+1) = A−1 v(k)
v(k+1) = v(k+1) /kv(k+1) k
endfor
Often, we would expect the Inverse Power Method to converge faster than the Power Method. For
example, take the case where the |λk | are equally spaced between 0 and m: |λk | = m − k. Then

|λ1 /λ0 | = (m − 1)/m  and  |λm−1 /λm−2 | = 1/2,

which means that the Power Method converges much more slowly than the Inverse Power Method.
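A tiny Python illustration (ours) of the normalized inverse iteration on a diagonal matrix, where applying A−1 amounts to componentwise division; the iterate aligns with the eigenvector of the smallest-magnitude eigenvalue:

```python
# Inverse Power Method sketch on A = diag(4, 3, 1): solving A w = v is just
# componentwise division, and the normalized iterate converges to the
# eigenvector (e2) of the smallest-magnitude eigenvalue (lambda = 1).
d = [4.0, 3.0, 1.0]
v = [1.0, 1.0, 1.0]
for _ in range(60):
    w = [vi / di for vi, di in zip(v, d)]   # w := A^{-1} v
    nrm = max(abs(x) for x in w)
    v = [x / nrm for x in w]                # normalize
```

The components along the other eigenvectors decay like (1/4)^k and (1/3)^k, so after 60 iterations they are negligible.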
Lemma 15.8 Let A ∈ Cm×m and µ ∈ C. Then (λ, x) is an eigenpair of A if and only if (λ − µ, x) is an
eigenpair of (A − µI).
The matrix A − µI is referred to as the matrix A that has been “shifted” by µ. What the lemma says is that
shifting A by µ shifts the spectrum of A by µ:
This suggests the following (naive) iteration: Pick a value µ close to λm−1 . Iterate
for k = 0, . . .
v(k+1) = (A − µI)−1 v(k)
v(k+1) = (λm−1 − µ)v(k+1)
endfor
Of course one would solve (A − µI)v(k+1) = v(k) rather than computing and applying the inverse of A.
If we index the eigenvalues so that |λ0 − µ| ≥ · · · ≥ |λm−2 − µ| > |λm−1 − µ| then

‖v(k+1) − ψm−1 xm−1 ‖X −1 ≤ |(λm−1 − µ)/(λm−2 − µ)| ‖v(k) − ψm−1 xm−1 ‖X −1 .
The closer to λm−1 the “shift” (so named because it shifts the spectrum of A) is chosen, the more favorable
the ratio that dictates convergence.
A more practical algorithm is given by
for k = 0, . . .
v(k+1) = (A − µI)−1 v(k)
v(k+1) = v(k+1) /kv(k+1) k
endfor
The question now becomes how to chose µ so that it is a good guess for λm−1 . Often an application
inherently supplies a reasonable approximation for the smallest eigenvalue or an eigenvalue of particular
interest. However, we know that eventually v(k) becomes a good approximation for xm−1 and therefore
the Rayleigh quotient gives us a way to find a good approximation for λm−1 . This suggests the (naive)
Rayleigh-quotient iteration:
for k = 0, . . .
µk = v(k) H Av(k) /(v(k) H v(k) )
v(k+1) = (A − µk I)−1 v(k)
v(k+1) = (λm−1 − µk )v(k+1)
endfor
Now²

‖v(k+1) − ψm−1 xm−1 ‖X −1 ≤ |(λm−1 − µk )/(λm−2 − µk )| ‖v(k) − ψm−1 xm−1 ‖X −1

with

lim k→∞ (λm−1 − µk ) = 0
which means superlinear convergence is observed. In fact, it can be shown that once k is large enough,

‖v(k+1) − ψm−1 xm−1 ‖X −1 ≤ C ‖v(k) − ψm−1 xm−1 ‖2X −1 ,

which is known as quadratic convergence. Roughly speaking this means that every iteration doubles
the number of correct digits in the current approximation. To prove this, one shows that |λm−1 − µk | ≤
C‖v(k) − ψm−1 xm−1 ‖X −1 .
2 I think... I have not checked this thoroughly. But the general idea holds. λm−1 has to be defined as the eigenvalue to which
the method eventually converges.
Better yet, it can be shown that if A is Hermitian, then, once k is large enough,

‖v(k+1) − ψm−1 xm−1 ‖X −1 ≤ C ‖v(k) − ψm−1 xm−1 ‖3X −1 ,

which is known as cubic convergence. Roughly speaking this means that every iteration triples the number
of correct digits in the current approximation. This is mind-bogglingly fast convergence!
A practical Rayleigh quotient iteration is given by
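A sketch of such a practical Rayleigh-quotient iteration in Python for a small dense matrix (all names ours); each step solves (A − µk I)w = v(k) by Gaussian elimination with partial pivoting rather than forming an inverse, and an exactly singular shifted system signals that µk is an eigenvalue:

```python
def rayleigh_quotient_iteration(A, v, iters=12):
    # Rayleigh-quotient iteration sketch for a small dense matrix A
    # (list of lists): compute mu_k as the Rayleigh quotient of v,
    # solve (A - mu_k I) w = v, and normalize.
    n = len(A)

    def solve(M, b):
        # Gaussian elimination with partial pivoting on an augmented copy.
        M = [row[:] + [bi] for row, bi in zip(M, b)]
        for k in range(n):
            p = max(range(k, n), key=lambda i: abs(M[i][k]))
            M[k], M[p] = M[p], M[k]
            for i in range(k + 1, n):
                f = M[i][k] / M[k][k]
                for j in range(k, n + 1):
                    M[i][j] -= f * M[k][j]
        x = [0.0] * n
        for i in reversed(range(n)):
            x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
        return x

    mu = 0.0
    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        mu = sum(vi * wi for vi, wi in zip(v, Av)) / sum(vi * vi for vi in v)
        shifted = [[A[i][j] - (mu if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
        try:
            w = solve(shifted, v)
        except ZeroDivisionError:   # mu is (numerically) an eigenvalue
            break
        nrm = max(abs(x) for x in w)
        v = [x / nrm for x in w]
    return mu, v
```

For the Hermitian matrix [[2, 1], [1, 3]] with starting vector (1, 0), the iteration rapidly converges to the eigenvalue (5 − √5)/2 ≈ 1.382, illustrating the fast convergence discussed above.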
Video
Read disclaimer regarding the videos in the preface!
Tragically, the camera ran out of memory for the first lecture... Here is the second lecture, which discusses
the implicit QR algorithm
* YouTube
* Download from UT Box
* View After Local Download
Chapter 16. Notes on the QR Algorithm and other Dense Eigensolvers 296
16.0.1 Outline
Video . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
16.0.1 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
16.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
16.2 Subspace Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
16.3 The QR Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
16.3.1 A basic (unshifted) QR algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 306
16.3.2 A basic shifted QR algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
16.4 Reduction to Tridiagonal Form . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
16.4.1 Householder transformations (reflectors) . . . . . . . . . . . . . . . . . . . . . . 308
16.4.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
16.5 The QR algorithm with a Tridiagonal Matrix . . . . . . . . . . . . . . . . . . . . 311
16.5.1 Givens’ rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
16.6 QR Factorization of a Tridiagonal Matrix . . . . . . . . . . . . . . . . . . . . . . 312
16.7 The Implicitly Shifted QR Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 314
16.7.1 Upper Hessenberg and tridiagonal matrices . . . . . . . . . . . . . . . . . . . . . 314
16.7.2 The Implicit Q Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
16.7.3 The Francis QR Step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
16.7.4 A complete algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
16.8 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
16.8.1 More on reduction to tridiagonal form . . . . . . . . . . . . . . . . . . . . . . . 322
16.8.2 Optimizing the tridiagonal QR algorithm . . . . . . . . . . . . . . . . . . . . . . 322
16.9 Other Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
16.9.1 Jacobi’s method for the symmetric eigenvalue problem . . . . . . . . . . . . . . 322
16.9.2 Cuppen’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
16.9.3 The Method of Multiple Relatively Robust Representations (MRRR) . . . . . . . 325
16.10 The Nonsymmetric QR Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 325
16.10.1 A variant of the Schur decomposition . . . . . . . . . . . . . . . . . . . . . . . . 325
16.10.2 Reduction to upperHessenberg form . . . . . . . . . . . . . . . . . . . . . . . . 327
16.10.3 The implicitly double-shifted QR algorithm . . . . . . . . . . . . . . . . . . . . 329
16.1 Preliminaries
The QR algorithm is a standard method for computing all eigenvalues and eigenvectors of a matrix. In this
note, we focus on the real valued symmetric eigenvalue problem (the case where A ∈ Rn×n is symmetric).
For this case, recall the Spectral Decomposition Theorem:
Theorem 16.1 If A ∈ Rn×n is symmetric then there exists unitary matrix Q and diagonal matrix Λ such
that A = QΛQT .
We will partition Q = ( q0 · · · qn−1 ) and assume that Λ = diag(λ0 , · · · , λn−1 ), so that throughout
this note qi and λi refer to the ith column of Q and the ith diagonal element of Λ, which means that each
tuple (λi , qi ) is an eigenpair.
We start with a matrix V ∈ Rn×r with normalized columns and iterate something like
V (0) = V
for k = 0, . . . until convergence
V (k+1) = AV (k)
Normalize the columns to be of unit length.
end for
The problem with this approach is that all columns will (likely) converge to an eigenvector associated with
the dominant eigenvalue, since the Power Method is being applied to all columns simultaneously. We will
now lead the reader through a succession of insights towards a practical algorithm.
Let us examine what V̂ = AV looks like, for the simple case where V = ( v0  v1  v2 ) (three columns).
We know that

v j = Q ( QT v j ) = Q y j ,  where y j = QT v j .
Hence
n−1
v0 = ∑ ψ0, j q j ,
j=0
n−1
v1 = ∑ ψ1, j q j , and
j=0
n−1
v2 = ∑ ψ2, j q j ,
j=0
• If we happened to know λ0 , λ1 , and λ2 then we could divide the columns by these, respectively, and
get new vectors

( v̂0  v̂1  v̂2 ) = ( ∑ j=0..n−1 ψ0, j (λ j /λ0 ) q j    ∑ j=0..n−1 ψ1, j (λ j /λ1 ) q j    ∑ j=0..n−1 ψ2, j (λ j /λ2 ) q j )

               = ( ψ0,0 q0 + ∑ j=1..n−1 ψ0, j (λ j /λ0 ) q j
                   ψ1,0 (λ0 /λ1 ) q0 + ψ1,1 q1 + ∑ j=2..n−1 ψ1, j (λ j /λ1 ) q j
                   ψ2,0 (λ0 /λ2 ) q0 + ψ2,1 (λ1 /λ2 ) q1 + ψ2,2 q2 + ∑ j=3..n−1 ψ2, j (λ j /λ2 ) q j )   (16.1)
• Assume that |λ0 | > |λ1 | > |λ2 | > |λ3 | ≥ · · · ≥ |λn−1 |. Then, similar as for the power method,
– The first column will see components in the direction of {q1 , . . . , qn−1 } shrink relative to the
component in the direction of q0 .
– The second column will see components in the direction of {q2 , . . . , qn−1 } shrink relative to the
component in the direction of q1 , but the component in the direction of q0 increases, relatively,
since |λ0 /λ1 | > 1.
– The third column will see components in the direction of {q3 , . . . , qn−1 } shrink relative to the
component in the direction of q2 , but the components in the directions of q0 and q1 increase,
relatively, since |λ0 /λ2 | > 1 and |λ1 /λ2 | > 1.
• Similarly, if we happened to know q0 , we could subtract out the component of v̂1 in the direction of q0 :

v̂1 − (qT0 v̂1 ) q0 = ψ1,1 q1 + ∑ j=2..n−1 ψ1, j (λ j /λ1 ) q j

so that we are left with the component in the direction of q1 and components in directions of
q2 , . . . , qn−1 that are suppressed every time through the loop.
• Similarly, if we also know q1 , the components of vb2 in the direction of q0 and q1 can be subtracted
from that vector.
• We do not know λ0 , λ1 , and λ2 but from the discussion about the Power Method we remember that
we can just normalize the so updated vb0 , vb1 , and vb2 to have unit length.
• We do not know q0 , q1 , and q2 , but we can informally argue that if we keep iterating,
– The vector v̂0 , normalized in each step, will eventually point in the direction of q0 .
– S (v̂0 , v̂1 ) will eventually equal S (q0 , q1 ).
– In each iteration, we can subtract the component of v̂1 in the direction of v̂0 from v̂1 , and then
normalize v̂1 , so that it eventually results in a vector that points in the direction of q1 .
– S (v̂0 , v̂1 , v̂2 ) will eventually equal S (q0 , q1 , q2 ).
– In each iteration, we can subtract the components of v̂2 in the directions of v̂0 and v̂1 from v̂2 ,
and then normalize the result, to make v̂2 eventually point in the direction of q2 .
What we recognize is that normalizing vb0 , subtracting out the component of vb1 in the direction of vb0 , and
then normalizing vb1 , etc., is exactly what the Gram-Schmidt process does. And thus, we can use any
convenient (and stable) QR factorization method. This also shows how the method can be generalized to
work with more than three columns and even all columns simultaneously.
The algorithm now becomes:
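The loop just described — multiply by A, then reorthonormalize the columns — can be sketched as follows (a minimal Python/NumPy illustration; the function name and the test matrix are assumptions, not from these notes):

```python
import numpy as np

def subspace_iteration(A, p, num_iters=100):
    """Iterate V := qr(A V): multiply by A, then reorthonormalize the
    columns via a (stable) QR factorization, as Gram-Schmidt would."""
    n = A.shape[0]
    V = np.eye(n, p)                    # start with p unit basis vectors
    for _ in range(num_iters):
        V, _ = np.linalg.qr(A @ V)      # AV = V R with V^T V = I
    return V

# Symmetric test matrix with well-separated eigenvalues 5 > 4 > 3 > 2 > 1.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q @ np.diag([5.0, 4.0, 3.0, 2.0, 1.0]) @ Q.T

V = subspace_iteration(A, 3)
# The columns approximate the eigenvectors of the 3 largest eigenvalues,
# so the Rayleigh quotients approach 5, 4, and 3.
print(np.diag(V.T @ A @ V))
```

Here p = 3 mirrors the three-column discussion above; taking p = n iterates with all columns simultaneously.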
This shows that, if the components in the directions of q0 and q1 are subtracted out, it is the component
in the direction of q3 that is diminished in length the most slowly, dictated by the ratio λ3/λ2. This, of
course, generalizes: the jth column of V^{(k)}, v_j^{(k)}, will have a component in the direction of q_{j+1}, of length
|q_{j+1}^T v_j^{(k)}|, that can be expected to shrink most slowly.
We demonstrate this in Figure 16.1, which shows the execution of the algorithm with p = n for a 5 × 5
matrix, and shows how |q_{j+1}^T v_j^{(k)}| converges to zero as a function of k.
Figure 16.1: Convergence of the subspace iteration for a 5 × 5 matrix. This graph is mislabeled: x should
be labeled with v. The (linear) convergence of v_j to a vector in the direction of q_j is dictated by how
quickly the component in the direction q_{j+1} converges to zero. The line labeled |q_{j+1}^T x_j| plots the length
of the component in the direction q_{j+1} as a function of the iteration number.
Next, we observe that if V ∈ R^{n×n} in the above iteration (which means we are iterating with n vectors
at a time), then AV yields a next-to-last column of the form
\[
\sum_{j=0}^{n-3} \psi_{n-2,j} \tfrac{\lambda_j}{\lambda_{n-2}}\, q_j
+ \psi_{n-2,n-2}\, q_{n-2}
+ \psi_{n-2,n-1} \tfrac{\lambda_{n-1}}{\lambda_{n-2}}\, q_{n-1} ,
\]
where ψ_{i,j} = q_j^T v_i. Thus, given that the components in the direction of q_j, j = 0, . . . , n − 2, can be expected
in later iterations to be greatly reduced by the QR factorization that subsequently happens with AV, we
notice that it is λ_{n-1}/λ_{n-2} that dictates how fast the component in the direction of q_{n-1} disappears from v_{n-2}^{(k)}.
This is a ratio we also saw in the Inverse Power Method and that we noticed we could accelerate in the
Rayleigh Quotient Iteration: at each iteration we should shift the matrix to (A − µ_k I) where µ_k ≈ λ_{n-1}.
Since the last column of V^{(k)} is supposed to be converging to q_{n-1}, it seems reasonable to use µ_k =
v_{n-1}^{(k)T} A v_{n-1}^{(k)} (recall that v_{n-1}^{(k)} has unit length, so this is the Rayleigh quotient).
The above discussion motivates the iteration
[Figure 16.2 appears here: a semilog plot of |q_0^T x_4^{(k)}|, |q_1^T x_4^{(k)}|, |q_2^T x_4^{(k)}|, and |q_3^T x_4^{(k)}| — the lengths of the components of x_4^{(k)} in each direction — as a function of the iteration number k.]
Figure 16.2: Convergence of the shifted subspace iteration for a 5 × 5 matrix. This graph is mislabeled: x
should be labeled with v. What this graph shows is that the components of v_4 in the directions q_0 through
q_3 disappear very quickly. The vector v_4 quickly points in the direction of the eigenvector associated
with the smallest (in magnitude) eigenvalue. Just like the Rayleigh-quotient iteration is not guaranteed to
converge to the eigenvector associated with the smallest (in magnitude) eigenvalue, the shifted subspace
iteration may home in on a different eigenvector than the one associated with the smallest (in magnitude)
eigenvalue. Something is wrong in this graph: all curves should quickly drop to (near) zero!
Notice that this does not require one to solve with (A − µ_k I), unlike in the Rayleigh Quotient Iteration.
However, it does require a QR factorization, which requires more computation than an LU factorization
(approximately \frac{4}{3} n^3 flops).
We demonstrate the convergence in Figure 16.2, which shows the execution of the algorithm with a
5 × 5 matrix and illustrates how |q_j^T v_{n-1}^{(k)}| converges to zero as a function of k.
We have informally argued that the columns of the orthogonal matrices V^{(k)} ∈ R^{n×n} generated by the
(unshifted) subspace iteration converge to eigenvectors of matrix A. (The exact conditions under which
this happens have not been fully discussed.) In Figure 16.3 (left), we restate the subspace iteration. In
it, we denote matrices V^{(k)} and R^{(k)} from the subspace iteration by V̂^{(k)} and R̂^{(k)} to distinguish them from
the ones computed by the algorithm on the right. The algorithm on the left also computes the matrix
Â^{(k)} = V̂^{(k)T} A V̂^{(k)}, a matrix that hopefully converges to Λ, the diagonal matrix with the eigenvalues of A
on its diagonal. To the right is the QR algorithm. The claim is that the two algorithms compute the same
quantities.
We conclude that if V̂^{(k)} converges to the matrix of orthonormal eigenvectors when the subspace iteration
is applied to V^{(0)} = I, then A^{(k)} converges to the diagonal matrix with eigenvalues along the diagonal.
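The unshifted QR algorithm itself is short. The following Python/NumPy sketch (the test matrix is an illustrative assumption) also accumulates V = Q^{(0)} Q^{(1)} · · · so the claimed identity A^{(k)} = V^T A V can be checked directly:

```python
import numpy as np

def basic_qr_algorithm(A, num_iters=200):
    """Unshifted QR algorithm: factor A^{(k)} = Q^{(k)} R^{(k)}, then set
    A^{(k+1)} := R^{(k)} Q^{(k)}, accumulating V so that A^{(k)} = V^T A V."""
    Ak = A.copy()
    V = np.eye(A.shape[0])
    for _ in range(num_iters):
        Qk, Rk = np.linalg.qr(Ak)
        Ak = Rk @ Qk                 # = Qk^T A^{(k)} Qk: a similarity transformation
        V = V @ Qk
    return Ak, V

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag([4.0, 3.0, 2.0, 1.0]) @ Q.T
Ak, V = basic_qr_algorithm(A)
print(np.round(Ak, 6))   # ≈ diag(4, 3, 2, 1)
```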
In Figure 16.4 (left), we restate the subspace iteration with shifting. In it, we denote matrices V^{(k)} and R^{(k)}
from the subspace iteration by V̂^{(k)} and R̂^{(k)} to distinguish them from the ones computed by the algorithm
on the right. The algorithm on the left also computes the matrix Â^{(k)} = V̂^{(k)T} A V̂^{(k)}, a matrix that hopefully
converges to Λ, the diagonal matrix with the eigenvalues of A on its diagonal. To the right is the shifted
QR algorithm. The claim is that the two algorithms compute the same quantities.
16.3. The QR Algorithm 303
Figure 16.4: Basic shifted subspace iteration and basic shifted QR algorithm.
We conclude that if V̂^{(k)} converges to the matrix of orthonormal eigenvectors when the shifted subspace
iteration is applied to V^{(0)} = I, then A^{(k)} converges to the diagonal matrix with eigenvalues along the
diagonal.
The convergence of the basic shifted QR algorithm is illustrated below. Pay particular attention to the
convergence of the last row and column.
\[
A^{(0)} = \begin{pmatrix}
2.01131953448 & 0.05992695085 & 0.14820940917 \\
0.05992695085 & 2.30708673171 & 0.93623515213 \\
0.14820940917 & 0.93623515213 & 1.68159373379
\end{pmatrix}
\qquad
A^{(1)} = \begin{pmatrix}
2.21466116574 & 0.34213192482 & 0.31816754245 \\
0.34213192482 & 2.54202325042 & 0.57052186467 \\
0.31816754245 & 0.57052186467 & 1.24331558383
\end{pmatrix}
\]
\[
A^{(2)} = \begin{pmatrix}
2.63492207667 & 0.47798481637 & 0.07654607908 \\
0.47798481637 & 2.35970859985 & 0.06905042811 \\
0.07654607908 & 0.06905042811 & 1.00536932347
\end{pmatrix}
\qquad
A^{(3)} = \begin{pmatrix}
2.87588550968 & 0.32971207176 & 0.00024210487 \\
0.32971207176 & 2.12411444949 & 0.00014361630 \\
0.00024210487 & 0.00014361630 & 1.00000004082
\end{pmatrix}
\]
\[
A^{(4)} = \begin{pmatrix}
2.96578660126 & 0.18177690194 & 0.00000000000 \\
0.18177690194 & 2.03421339873 & 0.00000000000 \\
0.00000000000 & 0.00000000000 & 1.00000000000
\end{pmatrix}
\qquad
A^{(5)} = \begin{pmatrix}
2.9912213907 & 0.0932820735 & 0.0000000000 \\
0.0932820735 & 2.0087786092 & 0.0000000000 \\
0.0000000000 & 0.0000000000 & 1.0000000000
\end{pmatrix}
\]
Once the off-diagonal elements of the last row and column have converged (are sufficiently small), the
problem can be deflated by applying the following theorem:
Theorem 16.4 Let
\[
A = \begin{pmatrix}
A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\
0 & A_{1,1} & \cdots & A_{1,N-1} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & A_{N-1,N-1}
\end{pmatrix},
\]
where the A_{k,k} are all square. Then Λ(A) = ∪_{k=0}^{N-1} Λ(A_{k,k}).
In other words, once the last row and column have converged, the algorithm can continue with the subma-
trix that consists of the first n − 1 rows and columns.
The problem with the QR algorithm, as stated, is that each iteration requires O(n3 ) operations, which
is too expensive given that many iterations are required to find all eigenvalues and eigenvectors.
In the next section, we will see that if A(0) is a tridiagonal matrix, then so are all A(k) . This reduces the cost
of each iteration from O(n3 ) to O(n). We first show how unitary similarity transformations can be used to
reduce a matrix to tridiagonal form.
Definition 16.6 Let u ∈ R^n, τ ∈ R. Then H = H(u) = I − uu^T/τ, where τ = \frac{1}{2} u^T u, is said to be a reflector
or Householder transformation.
We observe:
• Let z be any vector that is perpendicular to u. Applying a Householder transform H(u) to z leaves
the vector unchanged: H(u)z = z.
• Let any vector x be written as x = z + \frac{u^T x}{u^T u} u, where z is perpendicular to u and \frac{u^T x}{u^T u} u is the component
of x in the direction of u. Then H(u)x = z − \frac{u^T x}{u^T u} u.
This can be interpreted as follows: The space perpendicular to u acts as a “mirror”: any vector in that
space (along the mirror) is not reflected, while any other vector has the component that is orthogonal
to the space (the component outside and orthogonal to the mirror) reversed in direction. Notice that a
reflection preserves the length of the vector. Also, it is easy to verify that:
1. HH = I (reflecting the reflection of a vector results in the original vector);
2. H = H T , and so H T H = HH T = I (a reflection is an orthogonal matrix and thus preserves the norm);
and
3. if H0 , · · · , Hk−1 are Householder transformations and Q = H0 H1 · · · Hk−1 , then QT Q = QQT = I (an
accumulation of reflectors is an orthogonal matrix).
As part of the reduction to condensed form operations, given a vector x we will wish to find a
Householder transformation, H(u), such that H(u)x equals a vector with zeroes below the first element:
H(u)x = ∓‖x‖_2 e_0, where e_0 equals the first column of the identity matrix. It can be easily checked that
choosing u = x ± ‖x‖_2 e_0 yields the desired H(u). Notice that any nonzero scaling of u has the same property,
and the convention is to scale u so that its first element equals one. Let us define [u, τ, h] = HouseV(x)
to be the function that returns u with first element equal to one, τ = \frac{1}{2} u^T u, and h = H(u)x.
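A sketch of HouseV in Python/NumPy (the function name and the sign choice — picked to avoid cancellation — are assumptions consistent with the description above):

```python
import numpy as np

def house_v(x):
    """[u, tau, h] = HouseV(x): u has first element 1, tau = u^T u / 2,
    and h = H(u) x = (I - u u^T / tau) x = -+ ||x||_2 e_0."""
    normx = np.linalg.norm(x)
    sign = 1.0 if x[0] >= 0 else -1.0
    rho = -sign * normx              # h = rho e_0; the sign avoids cancellation
    u = x.astype(float).copy()
    u[0] = x[0] - rho
    u = u / u[0]                     # scale so the first element equals one
    tau = (u @ u) / 2.0
    h = np.zeros_like(u)
    h[0] = rho
    return u, tau, h

x = np.array([3.0, 1.0, 2.0])
u, tau, h = house_v(x)
Hx = x - ((u @ x) / tau) * u         # apply H(u) without forming the matrix
print(Hx)                            # ≈ (-||x||_2, 0, 0)
```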
16.4. Reduction to Tridiagonal Form 305
16.4.2 Algorithm
The first step towards computing the eigenvalue decomposition of a symmetric matrix is to reduce the
matrix to tridiagonal form.
The basic algorithm for reducing a symmetric matrix to tridiagonal form, overwriting the original
matrix with the result, can be explained as follows. We assume that the symmetric matrix A is stored only
in its lower triangular part and that only the diagonal and subdiagonal of the symmetric tridiagonal
matrix are computed, overwriting those parts of A. Finally, the Householder vectors used to zero out parts
of A overwrite the entries that they annihilate (set to zero).
• Partition
\[
A → \begin{pmatrix} α_{11} & a_{21}^T \\ a_{21} & A_{22} \end{pmatrix},
\]
where H = H(u_{21}). Note that a_{21} := H a_{21} need not be executed, since this update was performed by
the instance of HouseV above. Also, a_{12}^T is not stored nor updated due to symmetry. Finally, only
the lower triangular part of H A_{22} H is computed, overwriting A_{22}. The update of A_{22} warrants closer
scrutiny:
\begin{align*}
A_{22} &:= \left(I - \tfrac{1}{τ} u_{21} u_{21}^T\right) A_{22} \left(I - \tfrac{1}{τ} u_{21} u_{21}^T\right) \\
&= \left(A_{22} - \tfrac{1}{τ} u_{21} \underbrace{u_{21}^T A_{22}}_{y_{21}^T}\right)\left(I - \tfrac{1}{τ} u_{21} u_{21}^T\right) \\
&= A_{22} - \tfrac{1}{τ} u_{21} y_{21}^T - \tfrac{1}{τ} \underbrace{A_{22} u_{21}}_{y_{21}} u_{21}^T + \tfrac{1}{τ^2} u_{21} \underbrace{y_{21}^T u_{21}}_{2β} u_{21}^T \\
&= A_{22} - \left(\tfrac{1}{τ} u_{21} y_{21}^T - \tfrac{β}{τ^2} u_{21} u_{21}^T\right) - \left(\tfrac{1}{τ} y_{21} u_{21}^T - \tfrac{β}{τ^2} u_{21} u_{21}^T\right) \\
&= A_{22} - u_{21} \underbrace{\tfrac{1}{τ}\left(y_{21} - \tfrac{β}{τ} u_{21}\right)^T}_{w_{21}^T} - \underbrace{\tfrac{1}{τ}\left(y_{21} - \tfrac{β}{τ} u_{21}\right)}_{w_{21}} u_{21}^T
\end{align*}
Repartition
\[
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} →
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & α_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix},
\qquad
\begin{pmatrix} t_T \\ t_B \end{pmatrix} →
\begin{pmatrix} t_0 \\ τ_1 \\ t_2 \end{pmatrix},
\]
where α_{11} and τ_1 are scalars
(update as described above)
Continue with
\[
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} ←
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & α_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix},
\qquad
\begin{pmatrix} t_T \\ t_B \end{pmatrix} ←
\begin{pmatrix} t_0 \\ τ_1 \\ t_2 \end{pmatrix}
\]
endwhile
Figure 16.5: Basic algorithm for reduction of a symmetric matrix to tridiagonal form.
16.5. The QR algorithm with a Tridiagonal Matrix 307
× × × × × × × 0 0 0 × × 0 0 0
× × × × × × × × × × × × × 0 0
× × × × × −→ 0 × × × × −→ 0 × × × × −→
× × × × × 0 × × × × 0 0 × × ×
× × × × × 0 × × × × 0 0 × × ×
Original matrix First iteration Second iteration
× × 0 0 0 × × 0 0 0
× × × 0 0 × × × 0 0
0 × × × 0 −→ 0 × × × 0 −→
0 0 × × × 0 0 × × ×
0 0 0 × × 0 0 0 × ×
Third iteration
Figure 16.6: Illustration of reduction of a symmetric matrix to tridiagonal form. The ×s denote nonzero
elements in the matrix. The gray entries above the diagonal are not actually updated.
This is captured in the algorithm in Figure 16.5. It is also illustrated in Figure 16.6.
The total cost for reducing A ∈ R^{n×n} is approximately
\[
\sum_{k=0}^{n-1} 4(n-k-1)^2 \mbox{ flops} ≈ \frac{4}{3} n^3 \mbox{ flops}.
\]
where γ² + σ² = 1. (Notice that γ and σ can be thought of as the cosine and sine of an angle.) Then
\[
G^T G =
\begin{pmatrix} γ & -σ \\ σ & γ \end{pmatrix}^T
\begin{pmatrix} γ & -σ \\ σ & γ \end{pmatrix}
=
\begin{pmatrix} γ & σ \\ -σ & γ \end{pmatrix}
\begin{pmatrix} γ & -σ \\ σ & γ \end{pmatrix}
=
\begin{pmatrix} γ^2 + σ^2 & -γσ + γσ \\ γσ - γσ & γ^2 + σ^2 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\]
which means that a Givens' rotation is a unitary matrix.
Now, if γ = χ_1/‖x‖_2 and σ = χ_2/‖x‖_2, then γ² + σ² = (χ_1² + χ_2²)/‖x‖_2² = 1 and
\[
\begin{pmatrix} γ & -σ \\ σ & γ \end{pmatrix}^T
\begin{pmatrix} χ_1 \\ χ_2 \end{pmatrix}
=
\begin{pmatrix} γ & σ \\ -σ & γ \end{pmatrix}
\begin{pmatrix} χ_1 \\ χ_2 \end{pmatrix}
=
\begin{pmatrix} (χ_1^2 + χ_2^2)/‖x‖_2 \\ (χ_2 χ_1 - χ_1 χ_2)/‖x‖_2 \end{pmatrix}
=
\begin{pmatrix} ‖x‖_2 \\ 0 \end{pmatrix}.
\]
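In code, computing and applying such a rotation takes only a few lines (Python/NumPy; `givens` is an illustrative name):

```python
import numpy as np

def givens(chi1, chi2):
    """Return gamma, sigma with gamma^2 + sigma^2 = 1 such that
    [gamma, -sigma; sigma, gamma]^T [chi1; chi2] = [||x||_2; 0]."""
    norm = np.hypot(chi1, chi2)       # sqrt(chi1^2 + chi2^2) without overflow
    if norm == 0.0:
        return 1.0, 0.0               # x = 0: the identity rotation will do
    return chi1 / norm, chi2 / norm

gamma, sigma = givens(3.0, 4.0)
G = np.array([[gamma, -sigma],
              [sigma,  gamma]])
print(G.T @ np.array([3.0, 4.0]))     # ≈ [5, 0]
```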
Then
\[
\begin{pmatrix}
\hat α_{0,0} & \hat α_{0,1} & \hat α_{0,2} & 0 \\
0 & \hat{\hat α}_{1,1} & \hat{\hat α}_{1,2} & \hat{\hat α}_{1,3} \\
0 & 0 & \hat α_{2,2} & \hat α_{2,3} \\
0 & 0 & α_{3,2} & α_{3,3}
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & γ_{2,1} & σ_{2,1} & 0 \\
0 & -σ_{2,1} & γ_{2,1} & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
\hat α_{0,0} & \hat α_{0,1} & \hat α_{0,2} & 0 \\
0 & \hat α_{1,1} & \hat α_{1,2} & 0 \\
0 & α_{2,1} & α_{2,2} & α_{2,3} \\
0 & 0 & α_{3,2} & α_{3,3}
\end{pmatrix}.
\]
Finally, from \begin{pmatrix} \hat α_{2,2} \\ α_{3,2} \end{pmatrix} one can compute γ_{3,2} and σ_{3,2} so that
\begin{pmatrix} γ_{3,2} & -σ_{3,2} \\ σ_{3,2} & γ_{3,2} \end{pmatrix}^T
\begin{pmatrix} \hat α_{2,2} \\ α_{3,2} \end{pmatrix}
=
\begin{pmatrix} \hat{\hat α}_{2,2} \\ 0 \end{pmatrix}.
Then
\[
\begin{pmatrix}
\hat α_{0,0} & \hat α_{0,1} & \hat α_{0,2} & 0 \\
0 & \hat{\hat α}_{1,1} & \hat{\hat α}_{1,2} & \hat{\hat α}_{1,3} \\
0 & 0 & \hat{\hat α}_{2,2} & \hat{\hat α}_{2,3} \\
0 & 0 & 0 & \hat α_{3,3}
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & γ_{3,2} & σ_{3,2} \\
0 & 0 & -σ_{3,2} & γ_{3,2}
\end{pmatrix}
\begin{pmatrix}
\hat α_{0,0} & \hat α_{0,1} & \hat α_{0,2} & 0 \\
0 & \hat{\hat α}_{1,1} & \hat{\hat α}_{1,2} & \hat{\hat α}_{1,3} \\
0 & 0 & \hat α_{2,2} & \hat α_{2,3} \\
0 & 0 & α_{3,2} & α_{3,3}
\end{pmatrix}.
\]
The matrix Q is the orthogonal matrix that results from multiplying the different Givens' rotations together:
\[
Q =
\begin{pmatrix}
γ_{1,0} & -σ_{1,0} & 0 & 0 \\
σ_{1,0} & γ_{1,0} & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & γ_{2,1} & -σ_{2,1} & 0 \\
0 & σ_{2,1} & γ_{2,1} & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & γ_{3,2} & -σ_{3,2} \\
0 & 0 & σ_{3,2} & γ_{3,2}
\end{pmatrix}.
\tag{16.2}
\]
The next question is how to compute RQ given the QR factorization of the tridiagonal matrix. Applying
the Givens' rotations to R one at a time,
\[
\begin{pmatrix}
\hat α_{0,0} & \hat α_{0,1} & \hat α_{0,2} & 0 \\
0 & \hat{\hat α}_{1,1} & \hat{\hat α}_{1,2} & \hat{\hat α}_{1,3} \\
0 & 0 & \hat{\hat α}_{2,2} & \hat{\hat α}_{2,3} \\
0 & 0 & 0 & \hat α_{3,3}
\end{pmatrix}
\begin{pmatrix}
γ_{1,0} & -σ_{1,0} & 0 & 0 \\
σ_{1,0} & γ_{1,0} & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & γ_{2,1} & -σ_{2,1} & 0 \\
0 & σ_{2,1} & γ_{2,1} & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & γ_{3,2} & -σ_{3,2} \\
0 & 0 & σ_{3,2} & γ_{3,2}
\end{pmatrix}
\]
yields successively
\[
\begin{pmatrix}
\tilde α_{0,0} & \tilde α_{0,1} & \hat α_{0,2} & 0 \\
\tilde α_{1,0} & \tilde{\hat α}_{1,1} & \hat{\hat α}_{1,2} & \hat{\hat α}_{1,3} \\
0 & 0 & \hat{\hat α}_{2,2} & \hat{\hat α}_{2,3} \\
0 & 0 & 0 & \hat α_{3,3}
\end{pmatrix},
\quad
\begin{pmatrix}
\tilde α_{0,0} & \tilde{\tilde α}_{0,1} & \tilde α_{0,2} & 0 \\
\tilde α_{1,0} & \tilde{\tilde α}_{1,1} & \tilde α_{1,2} & \hat{\hat α}_{1,3} \\
0 & \tilde α_{2,1} & \tilde α_{2,2} & \hat{\hat α}_{2,3} \\
0 & 0 & 0 & \hat α_{3,3}
\end{pmatrix},
\quad
\begin{pmatrix}
\tilde α_{0,0} & \tilde{\tilde α}_{0,1} & \tilde α_{0,2} & 0 \\
\tilde α_{1,0} & \tilde{\tilde α}_{1,1} & \tilde{\tilde α}_{1,2} & \tilde α_{1,3} \\
0 & \tilde α_{2,1} & \tilde{\tilde α}_{2,2} & \tilde α_{2,3} \\
0 & 0 & \tilde α_{3,2} & \tilde α_{3,3}
\end{pmatrix}.
\]
Dear Colleagues,
John Francis was born in 1934 in London and currently lives in Hove, near
Brighton. His residence is about a quarter mile from the sea; he is a
widower. In 1954, he worked at the National Research Development Corp
(NRDC) and attended some lectures given by Christopher Strachey.
In 1955,’56 he was a student at Cambridge but did not complete a degree.
He then went back to NRDC as an assistant to Strachey where he got
involved in flutter computations and this led to his work on QR.
After leaving NRDC in 1961, he worked at the Ferranti Corp and then at the
University of Sussex. Subsequently, he had positions with various
industrial organizations and consultancies. He is now retired. His
interests were quite general and included Artificial Intelligence,
computer languages, systems engineering. He has not returned to numerical
computation.
He was surprised to learn there are many references to his work and
that the QR method is considered one of the ten most important
algorithms of the 20th century. He was unaware of such developments as
TeX and Math Lab. Currently he is working on a degree at the Open
University.
John Francis did remarkable work and we are all in his debt. Along with
the conjugate gradient method, it provided us with one of the basic tools
of numerical analysis.
Gene Golub
Figure 16.7: Posting by the late Gene Golub in NA Digest Sunday, August 19, 2007 Volume 07 : Issue
34. An article on the ten most important algorithms of the 20th century, published in SIAM News, can be
found at http://www.uta.edu/faculty/rcli/TopTen/topten.pdf.
again preserves eigenvalues. Finally, from \begin{pmatrix} \hat{\hat α}_{2,1} \\ \hat α_{3,1} \end{pmatrix} one can compute γ_{3,1} and σ_{3,1} so that
\[
\begin{pmatrix} γ_{3,1} & -σ_{3,1} \\ σ_{3,1} & γ_{3,1} \end{pmatrix}^T
\begin{pmatrix} \hat{\hat α}_{2,1} \\ \hat α_{3,1} \end{pmatrix}
=
\begin{pmatrix} \tilde α_{2,1} \\ 0 \end{pmatrix}.
\]
Then
\[
\begin{pmatrix}
\tilde α_{0,0} & \tilde α_{1,0} & 0 & 0 \\
\tilde α_{1,0} & \tilde α_{1,1} & \tilde α_{2,1} & 0 \\
0 & \tilde α_{2,1} & \tilde α_{2,2} & \tilde α_{2,3} \\
0 & 0 & \tilde α_{3,2} & \tilde α_{3,3}
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & γ_{3,1} & -σ_{3,1} \\
0 & 0 & σ_{3,1} & γ_{3,1}
\end{pmatrix}^T
\begin{pmatrix}
\tilde α_{0,0} & \tilde α_{1,0} & 0 & 0 \\
\tilde α_{1,0} & \tilde α_{1,1} & \hat{\hat α}_{2,1} & \hat α_{3,1} \\
0 & \hat{\hat α}_{2,1} & \hat α_{2,2} & \hat α_{3,2} \\
0 & \hat α_{3,1} & \hat α_{3,2} & α_{3,3}
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & γ_{3,1} & -σ_{3,1} \\
0 & 0 & σ_{3,1} & γ_{3,1}
\end{pmatrix}.
\]
The matrix Q is the orthogonal matrix that results from multiplying the different Givens' rotations together:
\[
Q =
\begin{pmatrix}
γ_{1,0} & -σ_{1,0} & 0 & 0 \\
σ_{1,0} & γ_{1,0} & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & γ_{2,0} & -σ_{2,0} & 0 \\
0 & σ_{2,0} & γ_{2,0} & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & γ_{3,1} & -σ_{3,1} \\
0 & 0 & σ_{3,1} & γ_{3,1}
\end{pmatrix}.
\]
It is important to note that the first column of Q is given by
\[
\begin{pmatrix} γ_{1,0} \\ σ_{1,0} \\ 0 \\ 0 \end{pmatrix},
\]
which is exactly the same first column had Q been computed as in Section 16.6 (Equation 16.2). Thus, by the
Implicit Q Theorem, the tridiagonal matrix that results from this approach is equal to the tridiagonal matrix
that would be computed by applying the QR factorization from Section 16.6 to A − µI, A − µI → QR, followed
by the formation of RQ + µI using the algorithm for computing RQ in Section 16.6.
The successive elimination of elements α b i+1,i is often referred to as chasing the bulge while the entire
process that introduces the bulge and then chases it is known as a Francis Implicit QR Step. Obviously,
the method generalizes to matrices of arbitrary size, as illustrated in Figure 16.8. An algorithm for the
chasing of the bulge is given in Figure 18.4. (Note that in those figures T is used for A, something that
needs to be made consistent in these notes, eventually.) In practice, the tridiagonal matrix is not stored as
a matrix. Instead, its diagonal and subdiagonal are stored as vectors.
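The effect of a Francis step can be mimicked with an explicitly shifted QR step (a Python/NumPy sketch using the simple shift µ = last diagonal entry rather than the more robust Wilkinson shift; by the Implicit Q Theorem, the implicit bulge-chasing step produces essentially the same iterates at only O(n) cost per step):

```python
import numpy as np

def shifted_qr_step(T):
    """One explicitly shifted QR step on a symmetric tridiagonal T:
    T - mu I =: Q R, then T := R Q + mu I (a unitary similarity)."""
    n = T.shape[0]
    mu = T[-1, -1]                    # simple shift; a Francis step uses Wilkinson's
    Q, R = np.linalg.qr(T - mu * np.eye(n))
    return R @ Q + mu * np.eye(n)

T = (np.diag([4.0, 3.0, 2.0, 1.0])
     + np.diag([0.5, 0.4, 0.3], 1)
     + np.diag([0.5, 0.4, 0.3], -1))
for _ in range(6):
    T = shifted_qr_step(T)
print(abs(T[3, 2]))                   # last subdiagonal entry: shrinking very fast
```

Once |T[3, 2]| is negligible, T[3, 3] is accepted as an eigenvalue and the problem deflates to the leading 3 × 3 block.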
[Figure 16.8 appears here: at each stage the ×'s mark the nonzero entries of the tridiagonal matrix and a + marks the bulge; the partitioned matrix (T_{TL}, T_{MM}, T_{BR}, etc.) is repartitioned, updated, and continued so that the bulge moves one row and one column down the matrix, from the beginning of the iteration to its end.]
Figure 16.8: One step of “chasing the bulge” in the implicitly shifted symmetric QR algorithm.
• If an element of the subdiagonal (and the corresponding element on the superdiagonal) becomes small
enough, it can be considered to be zero and the problem deflates (decouples) into two smaller tridi-
agonal matrices. Small is often taken to mean that |α_{i+1,i}| ≤ ε(|α_{i,i}| + |α_{i+1,i+1}|), where ε is some
quantity close to the machine epsilon (unit roundoff).
• If A = QTQ^T is the unitary similarity transformation that reduced A to the tridiagonal matrix T before
the QR algorithm commenced, then the Givens' rotations encountered as part of the implicitly shifted
QR algorithm can be applied from the right to the appropriate columns of Q so that upon completion
Q is overwritten with the eigenvectors of A. Notice that applying a Givens' rotation to a pair of
columns of Q requires O(n) computation
Repartition
\[
\begin{pmatrix}
T_{TL} & \star & 0 \\
T_{ML} & T_{MM} & \star \\
0 & T_{BM} & T_{BR}
\end{pmatrix}
→
\begin{pmatrix}
T_{00} & \star & 0 & 0 & 0 \\
t_{10}^T & τ_{11} & \star & 0 & 0 \\
0 & t_{21} & T_{22} & \star & 0 \\
0 & 0 & t_{32}^T & τ_{33} & \star \\
0 & 0 & 0 & t_{43} & T_{44}
\end{pmatrix},
\]
where τ_{11} and τ_{33} are scalars
(during the final step, τ_{33} is 0 × 0)
Compute (γ, σ) s.t. G_{γ,σ}^T t_{21} = \begin{pmatrix} τ_{21} \\ 0 \end{pmatrix}, and assign t_{21} := \begin{pmatrix} τ_{21} \\ 0 \end{pmatrix}
T_{22} := G_{γ,σ}^T T_{22} G_{γ,σ}
t_{32}^T := t_{32}^T G_{γ,σ} (not performed during final step)
Continue with
\[
\begin{pmatrix}
T_{TL} & \star & 0 \\
T_{ML} & T_{MM} & \star \\
0 & T_{BM} & T_{BR}
\end{pmatrix}
←
\begin{pmatrix}
T_{00} & \star & 0 & 0 & 0 \\
t_{10}^T & τ_{11} & \star & 0 & 0 \\
0 & t_{21} & T_{22} & \star & 0 \\
0 & 0 & t_{32}^T & τ_{33} & \star \\
0 & 0 & 0 & t_{43} & T_{44}
\end{pmatrix}
\]
endwhile
per Givens' rotation. For each Francis implicit QR step, O(n) Givens' rotations are computed, mak-
ing the application of Givens' rotations to Q of cost O(n²) per iteration of the implicitly shifted QR
algorithm. Typically a few (2-3) iterations are needed per eigenvalue that is uncovered (by defla-
tion), meaning that O(n) iterations are needed. Thus, the QR algorithm is roughly of cost O(n³)
if the eigenvectors are accumulated (in addition to the cost of forming Q from the reduction to
tridiagonal form, which takes another O(n³) operations).
• If an element on the subdiagonal becomes zero (or very small), and hence the corresponding element
of the superdiagonal, then the problem can be deflated: If
16.7. The Implicitly Shifted QR Algorithm 317
\[
T = \begin{pmatrix} T_{00} & 0 \\ 0 & T_{11} \end{pmatrix} =
\begin{pmatrix}
× & × &   &   &   &   &   &   \\
× & × & × &   &   &   &   &   \\
  & × & × & × &   &   &   &   \\
  &   & × & × & 0 &   &   &   \\
  &   &   & 0 & × & × &   &   \\
  &   &   &   & × & × & × &   \\
  &   &   &   &   & × & × & × \\
  &   &   &   &   &   & × & ×
\end{pmatrix},
\]
then Λ(T) = Λ(T_{00}) ∪ Λ(T_{11}), and the algorithm can continue separately with T_{00} and with T_{11}.
Because of the connection between the QR algorithm and the Inverse Power Method, subdiagonal
entries near the bottom-right of T are more likely to converge to a zero, so most deflation will happen
there.
• A question becomes when an element on the subdiagonal, τ_{i+1,i}, can be considered to be zero. The
answer is when |τ_{i+1,i}| is small relative to |τ_{i,i}| and |τ_{i+1,i+1}|. A typical condition that is used is
\[
|τ_{i+1,i}| ≤ u \sqrt{|τ_{i,i}\, τ_{i+1,i+1}|},
\]
where u is the machine epsilon (unit roundoff).
• If A ∈ Cn×n is Hermitian, then its Spectral Decomposition is also computed via the following steps,
which mirror those for the real symmetric case:
For details, see some of our papers mentioned in the next section.
Field G. Van Zee, Robert A. van de Geijn, Gregorio Quintana-Ortı́, G. Joseph Elizondo.
Families of Algorithms for Reducing a Matrix to Condensed Form.
ACM Transactions on Mathematical Software (TOMS) , Vol. 39, No. 1, 2012
(Reduction to tridiagonal form is one case of what is more generally referred to as “condensed form”.)
Sweep 1
× 0 × × × × 0 × × × × 0
0 × × × × × × × × × × ×
→ → →
× × × × 0 × × × × × × ×
× × × × × × × × 0 × × ×
zero (1, 0) zero (2, 0) zero (3, 0)
× × × 0 × × × × × × × ×
× × 0 × × × × 0 × × × ×
→ → →
× 0 × × × × × × × × × 0
0 × × × × 0 × × × × 0 ×
zero (2, 1) zero (3, 1) zero (3, 2)
Sweep 2
× 0 × × × × 0 × × × × 0
0 × × × × × × × × × × ×
→ → →
× × × 0 0 × × × × × × ×
× × 0 × × × × × 0 × × ×
zero (1, 0) zero (2, 0) zero (3, 0)
× × × 0 × × × × × × × ×
× × 0 × × × × 0 × × × ×
→ → →
× 0 × × × × × × × × × 0
0 × × × × 0 × × × × 0 ×
zero (2, 1) zero (3, 1) zero (3, 2)
such that
\[
J_{31}^T A_{31} J_{31} =
\begin{pmatrix} γ_{11} & -σ_{31} \\ σ_{31} & γ_{33} \end{pmatrix}^T
\begin{pmatrix} α_{11} & α_{31} \\ α_{31} & α_{33} \end{pmatrix}
\begin{pmatrix} γ_{11} & -σ_{31} \\ σ_{31} & γ_{33} \end{pmatrix}
=
\begin{pmatrix} \hat α_{11} & 0 \\ 0 & \hat α_{33} \end{pmatrix}.
\]
We know this exists since the Spectral Decomposition of the 2 × 2 matrix exists. Such a rotation is called
a Jacobi rotation. (Notice that it is different from a Givens’ rotation because it diagonalizes a 2 × 2 matrix
Chapter 16. Notes on the QR Algorithm and other Dense Eigensolvers 320
when used as a unitary similarity transformation. By contrast, a Givens’ rotation zeroes an element when
applied from one side of a matrix.)
Homework 16.11 In the above discussion, show that α_{11}^2 + 2α_{31}^2 + α_{33}^2 = \hat α_{11}^2 + \hat α_{33}^2.
* SEE ANSWER
Jacobi rotations can be used to selectively zero off-diagonal elements by observing the following:
\[
J^T A J =
\begin{pmatrix}
I & 0 & 0 & 0 & 0 \\
0 & γ_{11} & 0 & -σ_{31} & 0 \\
0 & 0 & I & 0 & 0 \\
0 & σ_{31} & 0 & γ_{33} & 0 \\
0 & 0 & 0 & 0 & I
\end{pmatrix}^T
\begin{pmatrix}
A_{00} & a_{10} & A_{20}^T & a_{30} & A_{40}^T \\
a_{10}^T & α_{11} & a_{21}^T & α_{31} & a_{41}^T \\
A_{20} & a_{21} & A_{22} & a_{32} & A_{42}^T \\
a_{30}^T & α_{31} & a_{32}^T & α_{33} & a_{43}^T \\
A_{40} & a_{41} & A_{42} & a_{43} & A_{44}
\end{pmatrix}
\begin{pmatrix}
I & 0 & 0 & 0 & 0 \\
0 & γ_{11} & 0 & -σ_{31} & 0 \\
0 & 0 & I & 0 & 0 \\
0 & σ_{31} & 0 & γ_{33} & 0 \\
0 & 0 & 0 & 0 & I
\end{pmatrix}
\]
\[
=
\begin{pmatrix}
A_{00} & \hat a_{10} & A_{20}^T & \hat a_{30} & A_{40}^T \\
\hat a_{10}^T & \hat α_{11} & \hat a_{21}^T & 0 & \hat a_{41}^T \\
A_{20} & \hat a_{21} & A_{22} & \hat a_{32} & A_{42}^T \\
\hat a_{30}^T & 0 & \hat a_{32}^T & \hat α_{33} & \hat a_{43}^T \\
A_{40} & \hat a_{41} & A_{42} & \hat a_{43} & A_{44}
\end{pmatrix}
= \hat A,
\]
where
\[
\begin{pmatrix} γ_{11} & -σ_{31} \\ σ_{31} & γ_{33} \end{pmatrix}^T
\begin{pmatrix} a_{10}^T & a_{21}^T & a_{41}^T \\ a_{30}^T & a_{32}^T & a_{43}^T \end{pmatrix}
=
\begin{pmatrix} \hat a_{10}^T & \hat a_{21}^T & \hat a_{41}^T \\ \hat a_{30}^T & \hat a_{32}^T & \hat a_{43}^T \end{pmatrix}.
\]
Importantly, α_{11}^2 + 2α_{31}^2 + α_{33}^2 = \hat α_{11}^2 + \hat α_{33}^2 (Homework 16.11).
What this means is that if one defines off(A) as the square of the Frobenius norm of the off-diagonal
elements of A,
\[
\operatorname{off}(A) = ‖A‖_F^2 − ‖\operatorname{diag}(A)‖_F^2,
\]
then off(Â) = off(A) − 2α_{31}^2.
• The good news: every time a Jacobi rotation is used to zero an off-diagonal element, off(A) de-
creases by twice the square of that element.
• The bad news: a previously introduced zero may become nonzero in the process.
16.10. The Nonsymmetric QR Algorithm 321
The original algorithm developed by Jacobi searched for the largest (in absolute value) off-diagonal
element and zeroed it, repeating this process until all off-diagonal elements were small. The algorithm
was applied by hand by one of his students, Seidel (of Gauss-Seidel fame). The problem with this is that
searching for the largest off-diagonal element requires O(n2 ) comparisons. Computing and applying one
Jacobi rotation as a similarity transformation requires O(n) flops. Thus, for large n this is not practical.
Instead, it can be shown that zeroing the off-diagonal elements by columns (or rows) also converges to a
diagonal matrix. This is known as the column-cyclic Jacobi algorithm. We illustrate this in Figure 16.10.
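A column-cyclic sweep can be sketched as follows (Python/NumPy; the rotation angle comes from the spectral decomposition of the 2 × 2 symmetric submatrix, and the names and test matrix are illustrative assumptions):

```python
import numpy as np

def cyclic_jacobi(A, sweeps=6):
    """Column-cyclic Jacobi: visit each off-diagonal element in turn and apply
    the Jacobi rotation that zeroes it; off(A) drops by 2 A[i,j]^2 each time."""
    A = A.astype(float).copy()
    n = A.shape[0]
    for _ in range(sweeps):
        for j in range(n - 1):                    # visit elements by columns
            for i in range(j + 1, n):
                if A[i, j] == 0.0:
                    continue
                # angle that diagonalizes [[A[j,j], A[i,j]], [A[i,j], A[i,i]]]
                theta = 0.5 * np.arctan2(2.0 * A[i, j], A[j, j] - A[i, i])
                c, s = np.cos(theta), np.sin(theta)
                J = np.eye(n)
                J[j, j] = J[i, i] = c
                J[i, j], J[j, i] = s, -s
                A = J.T @ A @ J                   # unitary similarity transformation
    return A

rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2.0
D = cyclic_jacobi(A)
print(np.max(np.abs(D - np.diag(np.diag(D)))))    # ≈ 0: numerically diagonal
```

A previously zeroed element does become nonzero again within a sweep, but the total off(A) keeps decreasing and the iteration converges.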
• There exists a unitary matrix Q ∈ Cn×n and upper triangular matrix R ∈ Cn×n such that A = QRQH .
Importantly: even if A is real valued, the eigenvalues and eigenvectors may be complex valued.
Theorem 16.12 Let A ∈ Rn×n . Then there exist unitary Q ∈ Rn×n and quasi upper triangular matrix
R ∈ Rn×n such that A = QRQT .
Here, quasi upper triangular matrix means that the matrix is block upper triangular with blocks on the
diagonal that are 1 × 1 or 2 × 2.
Remark 16.13 The important thing is that this alternative to the Schur decomposition can be computed
using only real arithmetic.
Repartition
\[
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} →
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & α_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix},
\qquad
\begin{pmatrix} t_T \\ t_B \end{pmatrix} →
\begin{pmatrix} t_0 \\ τ_1 \\ t_2 \end{pmatrix},
\]
where α_{11} is a scalar
Continue with
\[
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} ←
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & α_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix},
\qquad
\begin{pmatrix} t_T \\ t_B \end{pmatrix} ←
\begin{pmatrix} t_0 \\ τ_1 \\ t_2 \end{pmatrix}
\]
endwhile
Figure 16.11: Basic algorithm for reduction of a nonsymmetric matrix to upper Hessenberg form.
× × × × × × × × × × × × × × ×
× × × × × × × × × × × × × × ×
× × × × × −→ 0 × × × × −→ 0 × × × × −→
× × × × × 0 × × × × 0 0 × × ×
× × × × × 0 × × × × 0 0 × × ×
Original matrix First iteration Second iteration
× × × × × × × × × ×
× × × × × × × × × ×
0 × × × × −→ 0 × × × × −→
0 0 × × × 0 0 × × ×
0 0 0 × × 0 0 0 × ×
Third iteration
Figure 16.12: Illustration of reduction of a nonsymmetric matrix to upper Hessenberg form. The ×'s denote
nonzero elements in the matrix.
• Update
\[
\begin{pmatrix}
A_{00} & a_{01} & A_{02} \\
a_{10}^T & α_{11} & a_{12}^T \\
0 & a_{21} & A_{22}
\end{pmatrix}
:=
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & H \end{pmatrix}
\begin{pmatrix}
A_{00} & a_{01} & A_{02} \\
a_{10}^T & α_{11} & a_{12}^T \\
0 & a_{21} & A_{22}
\end{pmatrix}
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & H \end{pmatrix}
=
\begin{pmatrix}
A_{00} & a_{01} & A_{02} H \\
a_{10}^T & α_{11} & a_{12}^T H \\
0 & H a_{21} & H A_{22} H
\end{pmatrix},
\]
where H = H(u_{21}).
– Note that a_{21} := H a_{21} need not be executed, since this update was performed by the instance
of HouseV above.
– The update A_{02} H requires
\[
A_{02} := A_{02} \left(I - \tfrac{1}{τ} u_{21} u_{21}^T\right)
= A_{02} - \tfrac{1}{τ} \underbrace{A_{02} u_{21}}_{y_{01}} u_{21}^T
= A_{02} - \tfrac{1}{τ} y_{01} u_{21}^T \quad (\mbox{ger}).
\]
– The update of A_{22} with H A_{22} H similarly becomes a rank-2 update,
\[
A_{22} := A_{22} - \tfrac{1}{τ} u_{21} w_{21}^T - \tfrac{1}{τ} z_{21} u_{21}^T \quad (\mbox{rank-2 update}),
\]
for appropriately computed vectors w_{21} and z_{21}.
This is captured in the algorithm in Figure 16.11. It is also illustrated in Figure 16.12.
Homework 16.14 Give the approximate total cost for reducing a nonsymmetric A ∈ Rn×n to upper-
Hessenberg form.
* SEE ANSWER
For a detailed discussion of the blocked algorithm that uses FLAME notation, we recommend [34]
and [45]:
Field G. Van Zee, Robert A. van de Geijn, Gregorio Quintana-Ortı́, G. Joseph Elizondo.
Families of Algorithms for Reducing a Matrix to Condensed Form.
ACM Transactions on Mathematical Software (TOMS) , Vol. 39, No. 1, 2012
The purpose of this note is to give some high level idea of how the Method of Relatively Robust Repre-
sentations (MRRR) works.
Chapter 17. Notes on the Method of Relatively Robust Representations (MRRR) 328
17.0.1 Outline
• It can compute all eigenvalues and eigenvectors of a tridiagonal matrix in O(n²) time (versus O(n³)
time for the symmetric QR algorithm).
The benefit when computing all eigenvalues and eigenvectors of a dense matrix is considerably less be-
cause transforming the eigenvectors of the tridiagonal matrix back into the eigenvectors of the dense matrix
requires an extra O(n3 ) computation, as discussed in [44]. In addition, making the method totally robust
has been tricky, since the method does not rely exclusively on unitary similarity transformations.
The fundamental idea is that of a twisted factorization of the tridiagonal matrix. This note builds up
to what that factorization is, how to compute it, and how to then compute an eigenvector with it. We
start by reminding the reader of what the Cholesky factorization of a symmetric positive definite (SPD)
matrix is. Then we discuss the Cholesky factorization of a tridiagonal SPD matrix. This then leads to
the LDL^T factorization of an indefinite matrix and of an indefinite tridiagonal matrix. Next follows a
discussion of the UDU^T factorization of an indefinite matrix, which then finally yields the twisted
factorization. When the matrix is nearly singular, an approximation of the twisted factorization can then
be used to compute an approximate eigenvector.
Notice that the devil is in the details of the MRRR algorithm. We will not tackle those details.
17.1. MRRR, from 35,000 Feet 329
Algorithm: A := Chol(A)
Partition A → \begin{pmatrix} A_{TL} & \star \\ A_{BL} & A_{BR} \end{pmatrix}, where A_{TL} is 0 × 0
while m(A_{TL}) < m(A) do
Repartition
\[
\begin{pmatrix} A_{TL} & \star \\ A_{BL} & A_{BR} \end{pmatrix} →
\begin{pmatrix} A_{00} & \star & \star \\ a_{10}^T & α_{11} & \star \\ A_{20} & a_{21} & A_{22} \end{pmatrix}
\]
α_{11} := \sqrt{α_{11}}
a_{21} := a_{21}/α_{11}
A_{22} := A_{22} − a_{21} a_{21}^T
Continue with
\[
\begin{pmatrix} A_{TL} & \star \\ A_{BL} & A_{BR} \end{pmatrix} ←
\begin{pmatrix} A_{00} & \star & \star \\ a_{10}^T & α_{11} & \star \\ A_{20} & a_{21} & A_{22} \end{pmatrix}
\]
endwhile
Figure 17.1: Unblocked algorithm for computing the Cholesky factorization. Updates to A22 affect only
the lower triangular part.
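A direct transliteration of Figure 17.1 into Python/NumPy, working only with the lower triangular part as the notes do (the test matrices are illustrative assumptions):

```python
import numpy as np

def chol_unblocked(A):
    """Unblocked Cholesky per Figure 17.1: alpha11 := sqrt(alpha11),
    a21 := a21/alpha11, A22 := A22 - a21 a21^T (lower part only)."""
    A = np.tril(A.astype(float).copy())
    n = A.shape[0]
    for k in range(n):
        A[k, k] = np.sqrt(A[k, k])
        A[k+1:, k] /= A[k, k]
        A[k+1:, k+1:] -= np.tril(np.outer(A[k+1:, k], A[k+1:, k]))
    return A                           # the lower triangular factor L, A = L L^T

B = np.array([[4.0, 2.0],
              [2.0, 5.0]])
L = chol_unblocked(B)
print(L)                               # ≈ [[2, 0], [1, 2]]
```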
Continue with
\[
\begin{pmatrix}
A_{FF} & \star & \star \\
α_{MF} e_L^T & α_{MM} & \star \\
0 & α_{LM} e_F & A_{LL}
\end{pmatrix}
←
\begin{pmatrix}
A_{00} & \star & \star & \star \\
α_{10} e_L^T & α_{11} & \star & \star \\
0 & α_{21} & α_{22} & \star \\
0 & 0 & α_{32} e_F & A_{33}
\end{pmatrix}
\]
endwhile
Figure 17.2: Algorithm for computing the Cholesky factorization of a tridiagonal matrix.
17.2. Cholesky Factorization, Again 331
• Update A22 := A22 − a21 aT21 (updating only the lower triangular part).
The resulting algorithm is given in Figure 17.2. In that figure, it helps to interpret F, M, and L as First,
Middle, and Last. There, e_F and e_L are the unit basis vectors with a "1" as first and last element,
respectively. Notice that the Cholesky factor of a tridiagonal matrix is a lower bidiagonal matrix that
overwrites the lower triangular part of the tridiagonal matrix A.
Naturally, the whole matrix need not be stored. But that detail is not important for our discussion.
Algorithm: A := LDLT(A)
Partition A → \begin{pmatrix} A_{TL} & \star \\ A_{BL} & A_{BR} \end{pmatrix}, where A_{TL} is 0 × 0
while m(A_{TL}) < m(A) do
Repartition
\[
\begin{pmatrix} A_{TL} & \star \\ A_{BL} & A_{BR} \end{pmatrix} →
\begin{pmatrix} A_{00} & \star & \star \\ a_{10}^T & α_{11} & \star \\ A_{20} & a_{21} & A_{22} \end{pmatrix},
\]
where α_{11} is 1 × 1
Continue with
\[
\begin{pmatrix} A_{TL} & \star \\ A_{BL} & A_{BR} \end{pmatrix} ←
\begin{pmatrix} A_{00} & \star & \star \\ a_{10}^T & α_{11} & \star \\ A_{20} & a_{21} & A_{22} \end{pmatrix}
\]
endwhile
(nonsingular) symmetric matrix. (Notice: the L so computed is not the same as the L computed by the
Cholesky factorization.) Partition
\[
A → \begin{pmatrix} α_{11} & a_{21}^T \\ a_{21} & A_{22} \end{pmatrix}, \qquad
L → \begin{pmatrix} 1 & 0 \\ l_{21} & L_{22} \end{pmatrix}, \qquad
D → \begin{pmatrix} δ_{11} & 0 \\ 0 & D_{22} \end{pmatrix}.
\]
Then
\begin{align*}
\begin{pmatrix} α_{11} & a_{21}^T \\ a_{21} & A_{22} \end{pmatrix}
&=
\begin{pmatrix} 1 & 0 \\ l_{21} & L_{22} \end{pmatrix}
\begin{pmatrix} δ_{11} & 0 \\ 0 & D_{22} \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ l_{21} & L_{22} \end{pmatrix}^T
=
\begin{pmatrix} δ_{11} & 0 \\ δ_{11} l_{21} & L_{22} D_{22} \end{pmatrix}
\begin{pmatrix} 1 & l_{21}^T \\ 0 & L_{22}^T \end{pmatrix} \\
&=
\begin{pmatrix} δ_{11} & \star \\ δ_{11} l_{21} & δ_{11} l_{21} l_{21}^T + L_{22} D_{22} L_{22}^T \end{pmatrix}.
\end{align*}
This suggests the following algorithm for overwriting the strictly lower triangular part of A with the strictly
lower triangular part of L and the diagonal of A with D:
• Partition A → \begin{pmatrix} α_{11} & a_{21}^T \\ a_{21} & A_{22} \end{pmatrix}.
Homework 17.1 Modify the algorithm in Figure 17.1 so that it computes the LDLT factorization. (Fill in
Figure 17.3.)
* SEE ANSWER
Homework 17.2 Modify the algorithm in Figure 17.2 so that it computes the LDLT factorization of a
tridiagonal matrix. (Fill in Figure 17.4.) What is the approximate cost, in floating point operations, of
computing the LDLT factorization of a tridiagonal matrix? Count a divide, multiply, and add/subtract as
a floating point operation each. Show how you came up with the algorithm, similar to how we derived the
algorithm for the tridiagonal Cholesky factorization.
* SEE ANSWER
Notice that computing the LDLT factorization of an indefinite matrix is not a good idea when A is
nearly singular. This would lead to some δ11 being nearly zero, meaning that dividing by it leads to very
large entries in l21 and corresponding element growth in A22 . (In other words, strange things will happen
“down stream” from the small δ11 .) Not a good thing. Unfortunately, we are going to need something like
this factorization specifically for the case where A is nearly singular.
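For a dense matrix, the update suggested by the partitioned derivation above (δ11 = α11, l21 := a21/δ11, A22 := A22 − δ11 l21 l21^T) can be sketched as follows; treat it as one possible answer to Homework 17.1 rather than the official one (Python/NumPy, illustrative names and test matrix):

```python
import numpy as np

def ldlt(A):
    """Unblocked LDL^T: overwrite the strictly lower triangular part of A
    with L (unit diagonal, not stored) and the diagonal of A with D."""
    A = np.tril(A.astype(float).copy())
    n = A.shape[0]
    for k in range(n):
        delta = A[k, k]                       # delta11 (no square root taken)
        A[k+1:, k] /= delta                   # l21 := a21 / delta11
        A[k+1:, k+1:] -= delta * np.tril(np.outer(A[k+1:, k], A[k+1:, k]))
    return A

# An indefinite symmetric matrix; only its lower triangle is referenced.
B = np.array([[2.0,  6.0, 2.0],
              [6.0, -2.0, 1.0],
              [2.0,  1.0, 3.0]])
F = ldlt(B)
L = np.tril(F, -1) + np.eye(3)
D = np.diag(np.diag(F))
print(np.diag(D))                             # a negative entry appears: B is indefinite
print(L @ D @ L.T)                            # reconstructs B
```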
Repartition
\[
\begin{pmatrix}
A_{FF} & \star & \star \\
α_{MF} e_L^T & α_{MM} & \star \\
0 & α_{LM} e_F & A_{LL}
\end{pmatrix}
→
\begin{pmatrix}
A_{00} & \star & \star & \star \\
α_{10} e_L^T & α_{11} & \star & \star \\
0 & α_{21} & α_{22} & \star \\
0 & 0 & α_{32} e_F & A_{33}
\end{pmatrix},
\]
where α_{22} is a scalar
Continue with
\[
\begin{pmatrix}
A_{FF} & \star & \star \\
α_{MF} e_L^T & α_{MM} & \star \\
0 & α_{LM} e_F & A_{LL}
\end{pmatrix}
←
\begin{pmatrix}
A_{00} & \star & \star & \star \\
α_{10} e_L^T & α_{11} & \star & \star \\
0 & α_{21} & α_{22} & \star \\
0 & 0 & α_{32} e_F & A_{33}
\end{pmatrix}
\]
endwhile
Figure 17.4: Algorithm for computing the the LDLT factorization of a tridiagonal matrix.
17.4. The UDU^T Factorization
Repartition
\[
\begin{pmatrix} A_{TL} & A_{TR} \\ \star & A_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ \star & \alpha_{11} & a_{12}^T \\ \star & \star & A_{22} \end{pmatrix}
\text{ where } \alpha_{11} \text{ is } 1 \times 1
\]
(updates to be filled in)
Continue with
\[
\begin{pmatrix} A_{TL} & A_{TR} \\ \star & A_{BR} \end{pmatrix} \leftarrow
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ \star & \alpha_{11} & a_{12}^T \\ \star & \star & A_{22} \end{pmatrix}
\]
endwhile
Figure 17.5: Unblocked algorithm for computing A = UDU T . Updates to A00 affect only the upper
triangular part.
Chapter 17. Notes on the Method of Relatively Robust Representations (MRRR)
Homework 17.3 Derive an algorithm that, given an indefinite matrix A, computes A = UDU T . Overwrite
only the upper triangular part of A. (Fill in Figure 17.5.) Show how you came up with the algorithm,
similar to how we derived the algorithm for LDLT .
* SEE ANSWER
Notice that the UDU T factorization has the exact same shortcomings as does LDLT when applied
to a singular matrix. If D has a small element on the diagonal, it is again “down stream” that this can
cause large elements in the factored matrix. But now “down stream” means towards the top-left since the
algorithm moves in the opposite direction.
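For intuition, the analogous sweep can be sketched in pure Python, proceeding from the bottom-right corner toward the top-left (our own illustration; the helper name is ours):

```python
def udut(A):
    # U D U^T of a symmetric A: U unit upper triangular, d the diagonal of D.
    # The loop runs "backwards", mirroring the LDL^T sweep.
    n = len(A)
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    W = [row[:] for row in A]
    for k in range(n - 1, -1, -1):
        d[k] = W[k][k]                 # trailing diagonal element
        for i in range(k):
            U[i][k] = W[i][k] / d[k]   # u_01 := a_01 / alpha_11
        for i in range(k):             # A_00 := A_00 - u_01 a_01^T
            for j in range(k):
                W[i][j] -= U[i][k] * W[j][k]
    return U, d
```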
• Find a column that has no pivot in it. This identifies an independent (free) variable.
• It is inherently unstable, since you are working with a matrix, B = A − λI, that inherently has a bad condition number. (Indeed, the condition number is infinite if λ is an exact eigenvalue.)
• λ is not an exact eigenvalue when it is computed by a computer, and hence B is not exactly singular. So, no independent (free) variables will ever be found.
These are only some of the problems. To overcome this, one computes something called a twisted factor-
ization.
Again, let B be a tridiagonal matrix. Assume that λ is an approximate eigenvalue of B and let A =
B − λI. We are going to compute an approximate eigenvector of B associated with λ by computing a vector
that is in the null space of a singular matrix that is close to A. We will assume that λ is an eigenvalue of
17.6. The Twisted Factorization
Repartition
\[
\begin{pmatrix} A_{FF} & \alpha_{FM} e_L & 0 \\ \star & \alpha_{MM} & \alpha_{ML} e_F^T \\ \star & \star & A_{LL} \end{pmatrix} \rightarrow
\begin{pmatrix} A_{00} & \alpha_{01} e_L & 0 & 0 \\ \star & \alpha_{11} & \alpha_{12} & 0 \\ \star & \star & \alpha_{22} & \alpha_{23} e_F^T \\ \star & \star & \star & A_{33} \end{pmatrix}
\]
(updates to be filled in)
Continue with the corresponding \(\leftarrow\) step.
endwhile
Figure 17.6: Algorithm for computing the UDU^T factorization of a tridiagonal matrix.
B that has multiplicity one and is well separated from other eigenvalues. Thus, the singular matrix close to A has a null space of dimension one, and we would expect A = LDL^T to have one element on the diagonal of D that is essentially zero. Ditto for A = UEU^T, where E is diagonal; we use this letter to distinguish the two diagonal matrices.
Thus, we have
• A = B − λI is indefinite.
• A = LDLT . L is bidiagonal, unit lower triangular, and D is diagonal with one small element on the
diagonal.
• A = UEU T . U is bidiagonal, unit upper triangular, and E is diagonal with one small element on the
diagonal.
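A tiny numerical illustration of these bullets (our own example): for B = [[2, 1], [1, 2]] the exact eigenvalues are 1 and 3, so A = B − 1·I is exactly singular, and the LDL^T factorization produces a zero trailing element of D. With a λ that is merely a computed approximation, that element would instead be tiny.

```python
def ldlt2(a00, a10, a11):
    # LDL^T of the symmetric 2x2 matrix [[a00, a10], [a10, a11]]:
    # returns delta_0, l_10, delta_1.
    d0 = a00
    l10 = a10 / d0
    d1 = a11 - l10 * a10
    return d0, l10, d1

lam = 1.0  # exact eigenvalue of B = [[2, 1], [1, 2]]
d0, l10, d1 = ldlt2(2.0 - lam, 1.0, 2.0 - lam)
# d1 comes out exactly zero here; dividing by such a near-zero pivot is
# what causes the element growth discussed earlier.
```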
Let
\[
A = \begin{pmatrix} A_{00} & \alpha_{10} e_L & 0 \\ \alpha_{10} e_L^T & \alpha_{11} & \alpha_{21} e_F^T \\ 0 & \alpha_{21} e_F & A_{22} \end{pmatrix}, \quad
L = \begin{pmatrix} L_{00} & 0 & 0 \\ \lambda_{10} e_L^T & 1 & 0 \\ 0 & \lambda_{21} e_F & L_{22} \end{pmatrix}, \quad
U = \begin{pmatrix} U_{00} & \upsilon_{01} e_L & 0 \\ 0 & 1 & \upsilon_{21} e_F^T \\ 0 & 0 & U_{22} \end{pmatrix},
\]
\[
D = \begin{pmatrix} D_{00} & 0 & 0 \\ 0 & \delta_1 & 0 \\ 0 & 0 & D_{22} \end{pmatrix}, \quad\text{and}\quad
E = \begin{pmatrix} E_{00} & 0 & 0 \\ 0 & \epsilon_1 & 0 \\ 0 & 0 & E_{22} \end{pmatrix},
\]
where all the partitioning is “conformal”.
(Hint: multiply out A = LDLT and A = UEU T with the partitioned matrices first. Then multiply out the
above. Compare and match...) How should φ1 be chosen? What is the cost of computing the twisted factorization given that you have already computed the LDL^T and UEU^T factorizations? A "Big O" estimate is sufficient. Be sure to take into account what \(e_L^T D_{00} e_L\) and \(e_F^T E_{22} e_F\) equal in your cost estimate.
* SEE ANSWER
17.7. Computing an Eigenvector from the Twisted Factorization
Because λ was assumed to have multiplicity one and to be well separated, the resulting twisted factorization
has “nice” properties. We won’t get into what that exactly means.
For simplicity we will focus on the case where A ∈ Rn×n . We will assume that the reader has read Chap-
ter 16.
Chapter 18. Notes on Computing the SVD of a Matrix
18.0.1 Outline
18.1 Background
Recall:
Theorem 18.1 If A ∈ R^{m×n} then there exist unitary matrices U ∈ R^{m×m} and V ∈ R^{n×n}, and a diagonal matrix Σ ∈ R^{m×n}, such that A = UΣV^T. This is known as the Singular Value Decomposition.
There is a relation between the SVD of a matrix A and the Spectral Decomposition of AT A:
Homework 18.2 If A = UΣV T is the SVD of A then AT A = V Σ2V T is the Spectral Decomposition of AT A.
* SEE ANSWER
The above exercise suggests steps for computing the Reduced SVD:
• Form C = AT A.
• Compute unitary V and diagonal Λ such that C = V ΛV^T, ordering the eigenvalues from largest to smallest on the diagonal of Λ.
• Form W = AV . Then W = UΣ so that U and Σ can be computed from W by choosing the diago-
nal elements of Σ to equal the lengths of the corresponding columns of W and normalizing those
columns by those lengths.
The problem with this approach is that forming AT A squares the condition number of the problem. We
will show how to avoid this.
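For intuition, the steps can be sketched for just the leading singular triple, using power iteration on C = A^T A (pure Python; the helper name is ours). The sketch deliberately inherits the flaw noted above: it works with C, whose condition number is the square of A's.

```python
def top_singular_triple(A, iters=100):
    m, n = len(A), len(A[0])
    # Form C = A^T A (this is the step that squares the condition number).
    C = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]
    v = [1.0] * n
    for _ in range(iters):  # power iteration: dominant eigenvector of C
        w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = sum(x * x for x in Av) ** 0.5   # sigma_0 = ||A v||_2
    u = [x / sigma for x in Av]             # normalize the column of W = A V
    return u, sigma, v
```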
Repartition
\[
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}, \quad
\begin{pmatrix} t_T \\ t_B \end{pmatrix} \rightarrow \begin{pmatrix} t_0 \\ \tau_1 \\ t_2 \end{pmatrix}, \quad
\begin{pmatrix} r_T \\ r_B \end{pmatrix} \rightarrow \begin{pmatrix} r_0 \\ \rho_1 \\ r_2 \end{pmatrix},
\]
where \(\alpha_{11}\), \(\tau_1\), and \(\rho_1\) are scalars
\[
\left[ \begin{pmatrix} 1 \\ u_{21} \end{pmatrix}, \tau_1, \begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix} \right] := \text{HouseV}\left( \begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix} \right)
\]
\(w_{12}^T := (a_{12}^T + u_{21}^T A_{22})/\tau_1\)
\(a_{12}^T := a_{12}^T - w_{12}^T\)
\(A_{22} := A_{22} - u_{21} w_{12}^T\) (rank-1 update)
\([v_{12}, \rho_1, a_{12}] := \text{HouseV}(a_{12})\)
\(w_{21} := A_{22} v_{12}/\rho_1\)
\(A_{22} := A_{22} - w_{21} v_{12}^T\) (rank-1 update)
Continue with the corresponding \(\leftarrow\) steps for A, t, and r.
endwhile
Figure 18.1: Basic algorithm for reduction of a nonsymmetric matrix to bidiagonal form.
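The process can be condensed into a pure-Python sketch (our own illustration, not the notes' FLAME implementation; for brevity the Householder vectors are not stored and U, V are not accumulated):

```python
def house(x):
    # Householder vector u (u[0] = 1) and tau = u^T u / 2 such that
    # (I - u u^T / tau) x = alpha e_0 with |alpha| = ||x||_2.
    normx = sum(xi * xi for xi in x) ** 0.5
    if normx == 0.0:
        # degenerate case: the reflector reduces to the identity
        return [1.0] + [0.0] * (len(x) - 1), float("inf"), 0.0
    alpha = -normx if x[0] >= 0 else normx   # sign chosen to avoid cancellation
    u = [xi / (x[0] - alpha) for xi in x]
    u[0] = 1.0
    tau = sum(ui * ui for ui in u) / 2.0
    return u, tau, alpha

def bidiagonalize(A):
    # Returns the upper bidiagonal matrix obtained by applying Householder
    # reflectors to A from the left and right (assumes m >= n).
    m, n = len(A), len(A[0])
    B = [row[:] for row in A]
    for k in range(n):
        # Zero the k-th column below the diagonal: B := H B.
        u, tau, _ = house([B[i][k] for i in range(k, m)])
        for j in range(k, n):
            s = sum(u[i - k] * B[i][j] for i in range(k, m)) / tau
            for i in range(k, m):
                B[i][j] -= s * u[i - k]
        if k < n - 2:
            # Zero the k-th row right of the superdiagonal: B := B H.
            u, tau, _ = house([B[k][j] for j in range(k + 1, n)])
            for i in range(k, m):
                s = sum(u[j - k - 1] * B[i][j] for j in range(k + 1, n)) / tau
                for j in range(k + 1, n):
                    B[i][j] -= s * u[j - k - 1]
    return B
```

Since only orthogonal transformations are applied, the Frobenius norm of the result equals that of the input, which makes for a convenient sanity check.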
× × × × × × 0 0 × × 0 0
× × × × 0 × × × 0 × × 0
× × × × −→ 0 × × × −→ 0 0 × × −→
× × × × 0 × × × 0 0 × ×
× × × × 0 × × × 0 0 × ×
Original matrix First iteration Second iteration
× × 0 0 × × 0 0
0 × × 0 0 × × 0
0 0 × × −→ 0 0 × ×
0 0 0 × 0 0 0 ×
0 0 0 × 0 0 0 0
Third iteration Fourth iteration
× × × ×
× × × ×
× × × ×
× × × ×
× × × ×
• We introduce zeroes below the “diagonal” in the first column by computing a Householder transfor-
mation and applying this from the left:
1 α11 α11
– Let [ , τ1 , ] := HouseV( ). This not only computes the House-
u21 a21 a21
holder vector, but also zeroes the entries in a21 and updates α11 .
– Update
\[
\begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix} :=
\left( I - \frac{1}{\tau_1} \begin{pmatrix} 1 \\ u_{21} \end{pmatrix} \begin{pmatrix} 1 \\ u_{21} \end{pmatrix}^T \right)
\begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix} =
\begin{pmatrix} \tilde\alpha_{11} & \tilde a_{12}^T \\ 0 & \tilde A_{22} \end{pmatrix}
\]
where
* \(w_{12}^T := (a_{12}^T + u_{21}^T A_{22})/\tau_1\);
* \(\tilde a_{12}^T := a_{12}^T - w_{12}^T\);
* \(\tilde A_{22} := A_{22} - u_{21} w_{12}^T\).
18.2. Reduction to Bidiagonal Form
× × × × × × × ×
× × × × 0 × × ×
× × × × −→ 0 × × ×
× × × × 0 × × ×
× × × × 0 × × ×
The zeroes below α11 are not actually written. Instead, the implementation overwrites these
elements with the corresponding elements of the vector uT21 .
• Next, we introduce zeroes in the first row to the right of the element on the superdiagonal by com-
puting a Householder transformation from ãT12 :
– Let [v12 , ρ1 , a12 ] := HouseV(a12 ). This not only computes the Householder vector, but also
zeroes all but the first entry in ãT12 , updating that first entry.
– Update
\[
\begin{pmatrix} a_{12}^T \\ A_{22} \end{pmatrix} :=
\begin{pmatrix} a_{12}^T \\ A_{22} \end{pmatrix}
\left( I - \frac{1}{\rho_1} v_{12} v_{12}^T \right)
= \begin{pmatrix} \tilde\alpha_{12} e_0^T \\ \tilde A_{22} \end{pmatrix}
\]
where
* \(\tilde\alpha_{12}\) equals the first element of the updated \(a_{12}^T\);
* \(w_{21} := A_{22} v_{12}/\rho_1\);
* \(\tilde A_{22} := A_{22} - w_{21} v_{12}^T\).
This introduces zeroes to the right of the ”superdiagonal” in the first row:
× × × × × × 0 0
0 × × × 0 × × ×
0 × × × −→ 0 × × ×
0 × × × 0 × × ×
0 × × × 0 × × ×
– The zeros to the right of the first element of aT12 are not actually written. Instead, the imple-
mentation overwrites these elements with the corresponding elements of the vector vT12 .
Homework 18.4 Give the approximate total cost for reducing A ∈ R^{n×n} to bidiagonal form.
* SEE ANSWER
For a detailed discussion on the blocked algorithm that uses FLAME notation, we recommend [45]
Field G. Van Zee, Robert A. van de Geijn, Gregorio Quintana-Ortı́, G. Joseph Elizondo.
Families of Algorithms for Reducing a Matrix to Condensed Form.
ACM Transactions on Mathematical Software (TOMS) , Vol. 39, No. 1, 2012
where β10 BT00 el is (clearly) a vector with only a nonzero in the last entry.
The above proof also shows that if B has no zeroes on its superdiagonal, then neither does BT B.
This means that one can apply the QR algorithm to matrix BT B. The problem, again, is that this squares
the condition number of the problem.
The following extension of the Implicit Q Theorem comes to the rescue:
Theorem 18.6 Let C, B ∈ Rm×n . If there exist unitary matrices U ∈ Rm×m and V ∈ Rn×n so that B =
U T CV is upper bidiagonal and has positive values on its superdiagonal then B and V are uniquely deter-
mined by C and the first column of V .
The above theorem supports an algorithm that, starting with an upper bidiagonal matrix B, implicitly
performs a shifted QR algorithm with T = BT B.
18.3. The QR Algorithm with a Bidiagonal Matrix

Consider the 5 × 5 bidiagonal matrix
\[
B = \begin{pmatrix}
\beta_{0,0} & \beta_{0,1} & 0 & 0 & 0 \\
0 & \beta_{1,1} & \beta_{1,2} & 0 & 0 \\
0 & 0 & \beta_{2,2} & \beta_{2,3} & 0 \\
0 & 0 & 0 & \beta_{3,3} & \beta_{3,4} \\
0 & 0 & 0 & 0 & \beta_{4,4}
\end{pmatrix}.
\]
Then T = B^T B equals
\[
T = \begin{pmatrix}
\beta_{0,0}^2 & ? & 0 & 0 & 0 \\
\beta_{0,1}\beta_{0,0} & ? & ? & 0 & 0 \\
0 & ? & ? & ? & 0 \\
0 & 0 & ? & ? & ? \\
0 & 0 & 0 & ? & \beta_{3,4}^2 + \beta_{4,4}^2
\end{pmatrix},
\]
where the ?s indicate “don’t care” entries. Now, if an iteration of the implicitly shifted QR algorithm were executed with this matrix, then the shift would be \(\mu_k = \beta_{3,4}^2 + \beta_{4,4}^2\) (actually, it is usually computed from the bottom-right 2 × 2 submatrix, a minor detail) and the first Givens’ rotation would be computed so that
\[
\begin{pmatrix} \gamma_{0,1} & \sigma_{0,1} \\ -\sigma_{0,1} & \gamma_{0,1} \end{pmatrix}
\begin{pmatrix} \beta_{0,0}^2 - \mu_k \\ \beta_{0,1}\beta_{0,0} \end{pmatrix}
\]
has a zero second entry. Let us call the n × n matrix with this as its top-left submatrix \(G_0\). Then applying this from the left and right of T means forming \(G_0 T G_0^T = G_0 B^T B G_0^T = (B G_0^T)^T (B G_0^T)\). What if we only apply it to B? Then
\[
\begin{pmatrix}
\tilde\beta_{0,0} & \tilde\beta_{0,1} & 0 & 0 & 0 \\
\tilde\beta_{1,0} & \tilde\beta_{1,1} & \beta_{1,2} & 0 & 0 \\
0 & 0 & \beta_{2,2} & \beta_{2,3} & 0 \\
0 & 0 & 0 & \beta_{3,3} & \beta_{3,4} \\
0 & 0 & 0 & 0 & \beta_{4,4}
\end{pmatrix}
=
\begin{pmatrix}
\beta_{0,0} & \beta_{0,1} & 0 & 0 & 0 \\
0 & \beta_{1,1} & \beta_{1,2} & 0 & 0 \\
0 & 0 & \beta_{2,2} & \beta_{2,3} & 0 \\
0 & 0 & 0 & \beta_{3,3} & \beta_{3,4} \\
0 & 0 & 0 & 0 & \beta_{4,4}
\end{pmatrix}
\begin{pmatrix}
\gamma_{0,1} & -\sigma_{0,1} & 0 & 0 & 0 \\
\sigma_{0,1} & \gamma_{0,1} & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}.
\]
The idea now is to apply Givens’ rotations from the left and right to “chase the bulge,” except that now the rotations applied from the two sides need not be the same. Thus, next, from \(\begin{pmatrix} \tilde\beta_{0,0} \\ \tilde\beta_{1,0} \end{pmatrix}\) one can compute \(\gamma_{1,0}\) and \(\sigma_{1,0}\) so that
\[
\begin{pmatrix} \gamma_{1,0} & \sigma_{1,0} \\ -\sigma_{1,0} & \gamma_{1,0} \end{pmatrix}
\begin{pmatrix} \tilde\beta_{0,0} \\ \tilde\beta_{1,0} \end{pmatrix}
=
\begin{pmatrix} \tilde{\tilde\beta}_{0,0} \\ 0 \end{pmatrix}.
\]
Then
\[
\begin{pmatrix}
\tilde{\tilde\beta}_{0,0} & \tilde{\tilde\beta}_{0,1} & \tilde\beta_{0,2} & 0 & 0 \\
0 & \tilde{\tilde\beta}_{1,1} & \tilde\beta_{1,2} & 0 & 0 \\
0 & 0 & \beta_{2,2} & \beta_{2,3} & 0 \\
0 & 0 & 0 & \beta_{3,3} & \beta_{3,4} \\
0 & 0 & 0 & 0 & \beta_{4,4}
\end{pmatrix}
=
\begin{pmatrix}
\gamma_{1,0} & \sigma_{1,0} & 0 & 0 & 0 \\
-\sigma_{1,0} & \gamma_{1,0} & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
\tilde\beta_{0,0} & \tilde\beta_{0,1} & 0 & 0 & 0 \\
\tilde\beta_{1,0} & \tilde\beta_{1,1} & \beta_{1,2} & 0 & 0 \\
0 & 0 & \beta_{2,2} & \beta_{2,3} & 0 \\
0 & 0 & 0 & \beta_{3,3} & \beta_{3,4} \\
0 & 0 & 0 & 0 & \beta_{4,4}
\end{pmatrix},
\]
where \(\tilde\beta_{0,2}\) is the bulge. Continuing, from \(\begin{pmatrix} \tilde{\tilde\beta}_{0,1} \\ \tilde\beta_{0,2} \end{pmatrix}\) one can compute \(\gamma_{0,2}\) and \(\sigma_{0,2}\) so that
\[
\begin{pmatrix} \gamma_{0,2} & \sigma_{0,2} \\ -\sigma_{0,2} & \gamma_{0,2} \end{pmatrix}
\begin{pmatrix} \tilde{\tilde\beta}_{0,1} \\ \tilde\beta_{0,2} \end{pmatrix}
=
\begin{pmatrix} \tilde{\tilde{\tilde\beta}}_{0,1} \\ 0 \end{pmatrix}.
\]
Then
\[
\begin{pmatrix}
\tilde{\tilde\beta}_{0,0} & \tilde{\tilde{\tilde\beta}}_{0,1} & 0 & 0 & 0 \\
0 & \tilde{\tilde{\tilde\beta}}_{1,1} & \tilde{\tilde\beta}_{1,2} & 0 & 0 \\
0 & \tilde\beta_{2,1} & \tilde\beta_{2,2} & \beta_{2,3} & 0 \\
0 & 0 & 0 & \beta_{3,3} & \beta_{3,4} \\
0 & 0 & 0 & 0 & \beta_{4,4}
\end{pmatrix}
=
\begin{pmatrix}
\tilde{\tilde\beta}_{0,0} & \tilde{\tilde\beta}_{0,1} & \tilde\beta_{0,2} & 0 & 0 \\
0 & \tilde{\tilde\beta}_{1,1} & \tilde\beta_{1,2} & 0 & 0 \\
0 & 0 & \beta_{2,2} & \beta_{2,3} & 0 \\
0 & 0 & 0 & \beta_{3,3} & \beta_{3,4} \\
0 & 0 & 0 & 0 & \beta_{4,4}
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & \gamma_{0,2} & -\sigma_{0,2} & 0 & 0 \\
0 & \sigma_{0,2} & \gamma_{0,2} & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix},
\]
so that the bulge moves to \(\tilde\beta_{2,1}\). The process continues until the bulge is chased off the end of the matrix, as illustrated in Figure 18.3.
× × 0 0 0 × × × 0 0 × × 0 0 0
× × × 0 0 0 × × 0 0 0 × × 0 0
0 0 × × 0 −→ 0 0 × × 0 −→ 0 × × × 0 −→
0 0 0 × × 0 0 0 × × 0 0 0 × ×
0 0 0 0 × 0 0 0 0 × 0 0 0 0 ×
× × 0 0 0 × × 0 0 0
0 × × × 0 0 × × 0 0
0 0 × × 0 −→ 0 0 × × 0 −→
0 0 0 × × 0 0 × × ×
0 0 0 0 × 0 0 0 0 ×
× × 0 0 0 × × 0 0 0
0 × × 0 0 0 × × 0 0
0 0 × × × −→ 0 0 × × 0 −→
0 0 0 × × 0 0 0 × ×
0 0 0 0 × 0 0 0 × ×
× × 0 0 0
0 × × 0 0
0 0 × × 0
0 0 0 × ×
0 0 0 0 ×
Figure 18.3: Illustration of one sweep of an implicit QR algorithm with a bidiagonal matrix.
Repartition
\[
\begin{pmatrix} B_{TL} & B_{TM} & 0 \\ 0 & B_{MM} & B_{MR} \\ 0 & 0 & B_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix}
B_{00} & \beta_{01} e_l & 0 & 0 & 0 \\
0 & \beta_{11} & \beta_{12} & 0 & 0 \\
0 & \beta_{21} & \beta_{22} & \beta_{23} & 0 \\
0 & 0 & 0 & \beta_{33} & \beta_{34} e_0^T \\
0 & 0 & 0 & 0 & B_{44}
\end{pmatrix}
\]
where \(\beta_{11}\) and \(\beta_{33}\) are scalars (during the final step, \(\beta_{33}\) is 0 × 0)
Compute \((\gamma_{L1}, \sigma_{L1})\) s.t.
\[
\begin{pmatrix} \gamma_{L1} & \sigma_{L1} \\ -\sigma_{L1} & \gamma_{L1} \end{pmatrix}
\begin{pmatrix} \beta_{1,1} \\ \beta_{2,1} \end{pmatrix}
=
\begin{pmatrix} \tilde\beta_{1,1} \\ 0 \end{pmatrix},
\]
overwriting \(\beta_{1,1}\) with \(\tilde\beta_{1,1}\)
\[
\begin{pmatrix} \beta_{1,2} & \beta_{1,3} \\ \beta_{2,2} & \beta_{2,3} \end{pmatrix} :=
\begin{pmatrix} \gamma_{L1} & \sigma_{L1} \\ -\sigma_{L1} & \gamma_{L1} \end{pmatrix}
\begin{pmatrix} \beta_{1,2} & 0 \\ \beta_{2,2} & \beta_{2,3} \end{pmatrix}
\]
if \(m(B_{BR}) \ne 0\)
Compute \((\gamma_{R1}, \sigma_{R1})\) s.t.
\[
\begin{pmatrix} \gamma_{R1} & \sigma_{R1} \\ -\sigma_{R1} & \gamma_{R1} \end{pmatrix}
\begin{pmatrix} \beta_{1,2} \\ \beta_{1,3} \end{pmatrix}
=
\begin{pmatrix} \tilde\beta_{1,2} \\ 0 \end{pmatrix},
\]
overwriting \(\beta_{1,2}\) with \(\tilde\beta_{1,2}\)
\[
\begin{pmatrix} \beta_{1,2} & 0 \\ \beta_{2,2} & \beta_{2,3} \\ \beta_{3,2} & \beta_{3,3} \end{pmatrix} :=
\begin{pmatrix} \beta_{1,2} & \beta_{1,3} \\ \beta_{2,2} & \beta_{2,3} \\ 0 & \beta_{3,3} \end{pmatrix}
\begin{pmatrix} \gamma_{R1} & -\sigma_{R1} \\ \sigma_{R1} & \gamma_{R1} \end{pmatrix}
\]
Continue with the corresponding \(\leftarrow\) step.
endwhile
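The basic building block of the sweep is the 2 × 2 Givens' rotation. A minimal sketch (the helper name is ours):

```python
def givens(a, b):
    # gamma, sigma such that [[gamma, sigma], [-sigma, gamma]] [a; b] = [r; 0],
    # where r = sqrt(a^2 + b^2). (A robust implementation would guard against
    # overflow in a*a + b*b; this sketch does not.)
    r = (a * a + b * b) ** 0.5
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

gamma, sigma = givens(3.0, 4.0)
```

Applying the rotation to the pair (3, 4) rotates it onto (5, 0), which is exactly what each "Compute (γ, σ) s.t. ..." step in the sweep requires.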
Answers
•
\[
\begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix}^T
= \begin{pmatrix} \widehat a_0 & \widehat a_1 & \cdots & \widehat a_{m-1} \end{pmatrix}.
\]
•
\[
\begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}^H
= \begin{pmatrix} a_0^H \\ a_1^H \\ \vdots \\ a_{n-1}^H \end{pmatrix}.
\]
•
\[
\begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix}^H
= \begin{pmatrix} \overline{\widehat a_0} & \overline{\widehat a_1} & \cdots & \overline{\widehat a_{m-1}} \end{pmatrix}.
\]
* BACK TO TEXT
•
\[
\begin{pmatrix}
A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\
A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\
\vdots & \vdots & & \vdots \\
A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1}
\end{pmatrix}^T
=
\begin{pmatrix}
A_{0,0}^T & A_{1,0}^T & \cdots & A_{M-1,0}^T \\
A_{0,1}^T & A_{1,1}^T & \cdots & A_{M-1,1}^T \\
\vdots & \vdots & & \vdots \\
A_{0,N-1}^T & A_{1,N-1}^T & \cdots & A_{M-1,N-1}^T
\end{pmatrix}.
\]
•
\[
\overline{\begin{pmatrix}
A_{0,0} & \cdots & A_{0,N-1} \\
\vdots & & \vdots \\
A_{M-1,0} & \cdots & A_{M-1,N-1}
\end{pmatrix}}
=
\begin{pmatrix}
\overline{A_{0,0}} & \cdots & \overline{A_{0,N-1}} \\
\vdots & & \vdots \\
\overline{A_{M-1,0}} & \cdots & \overline{A_{M-1,N-1}}
\end{pmatrix}.
\]
•
\[
\begin{pmatrix}
A_{0,0} & \cdots & A_{0,N-1} \\
\vdots & & \vdots \\
A_{M-1,0} & \cdots & A_{M-1,N-1}
\end{pmatrix}^H
=
\begin{pmatrix}
A_{0,0}^H & \cdots & A_{M-1,0}^H \\
\vdots & & \vdots \\
A_{0,N-1}^H & \cdots & A_{M-1,N-1}^H
\end{pmatrix}.
\]
* BACK TO TEXT
• \((\alpha x)^T = \alpha x^T\).
• \((\alpha x)^H = \bar\alpha x^H\).
•
\[
\alpha \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{N-1} \end{pmatrix}
= \begin{pmatrix} \alpha\chi_0 \\ \alpha\chi_1 \\ \vdots \\ \alpha\chi_{N-1} \end{pmatrix}.
\]
* BACK TO TEXT
* BACK TO TEXT
* BACK TO TEXT
Homework 1.8 Follow the instructions in the below video to implement the two algorithms for com-
puting y := Ax + y, using MATLAB and Spark. You will want to use the laff routines summarized in
Appendix B. You can visualize the algorithm with PictureFLAME.
Answer: Implementations are given in Figure 1.5. They can also be found in
Homework 1.9 Implement the two algorithms for computing A := yxT + A, using MATLAB and Spark.
You will want to use the laff routines summarized in Appendix B. You can visualize the algorithm with
PictureFLAME.
Answer: Implementations can be found in
function [ y_out ] = Mvmult_unb_var1( A, x, y )

  [ AL, AR ] = FLA_Part_1x2( A, 0, 'FLA_LEFT' );

  [ xT, ...
    xB ] = FLA_Part_2x1( x, 0, 'FLA_TOP' );

  while ( size( AL, 2 ) < size( A, 2 ) )

    [ A0, a1, A2 ] = FLA_Repart_1x2_to_1x3( AL, AR, 1, 'FLA_RIGHT' );

    [ x0, ...
      chi1, ...
      x2 ] = FLA_Repart_2x1_to_3x1( xT, xB, 1, 'FLA_BOTTOM' );

    %------------------------------------------------%
    % y = chi1 * a1 + y;
    y = laff_axpy( chi1, a1, y );
    %------------------------------------------------%

    [ AL, AR ] = FLA_Cont_with_1x3_to_1x2( A0, a1, A2, 'FLA_LEFT' );

    [ xT, ...
      xB ] = FLA_Cont_with_3x1_to_2x1( x0, chi1, x2, 'FLA_TOP' );

  end

  y_out = y;

return

function [ y_out ] = Mvmult_unb_var2( A, x, y )

  [ AT, ...
    AB ] = FLA_Part_2x1( A, 0, 'FLA_TOP' );

  [ yT, ...
    yB ] = FLA_Part_2x1( y, 0, 'FLA_TOP' );

  while ( size( AT, 1 ) < size( A, 1 ) )

    [ A0, ...
      a1t, ...
      A2 ] = FLA_Repart_2x1_to_3x1( AT, AB, 1, 'FLA_BOTTOM' );

    [ y0, ...
      psi1, ...
      y2 ] = FLA_Repart_2x1_to_3x1( yT, yB, 1, 'FLA_BOTTOM' );

    %------------------------------------------------%
    % psi1 = a1t * x + psi1;
    psi1 = laff_dots( a1t, x, psi1 );
    %------------------------------------------------%

    [ AT, ...
      AB ] = FLA_Cont_with_3x1_to_2x1( A0, a1t, A2, 'FLA_TOP' );

    [ yT, ...
      yB ] = FLA_Cont_with_3x1_to_2x1( y0, psi1, y2, 'FLA_TOP' );

  end

  y_out = [ yT
            yB ];

return
Programming/chapter01_answers/Rank1_unb_var1.m (see file only) (view in MATLAB)
and
* BACK TO TEXT
Homework 1.10 Prove that the matrix xyT where x and y are vectors has rank at most one, thus explaining
the name “rank-1 update”.
Answer: There are a number of ways of proving this:
• The rank of a matrix equals the number of linearly independent columns it has. Each column of xy^T is a multiple of the vector x, and hence there is at most one linearly independent column.
• The rank of a matrix equals the number of linearly independent rows it has. Each row of xy^T is a multiple of the row vector y^T, and hence there is at most one linearly independent row.
• The rank of a matrix equals the dimension of its column space. If z is in the column space of xy^T then there must be a vector w such that z = (xy^T)w. But then z = (y^T w)x, and hence z is a multiple of x. This means the column space is at most one dimensional and hence the rank is at most one.
There are probably a half dozen other arguments... For now, do not use an argument that utilizes the SVD,
since we have not yet learned about it.
* BACK TO TEXT
Homework 1.11 Implement C := AB +C via matrix-vector multiplications, using MATLAB and Spark.
You will want to use the laff routines summarized in Appendix B. You can visualize the algorithm with
PictureFLAME.
Answer: An implementation can be found in
* BACK TO TEXT
Homework 1.12 Argue that the given matrix-matrix multiplication algorithm with m × n matrix C, m × k matrix A, and k × n matrix B costs, approximately, 2mnk flops.
Answer: This can easily be argued by noting that each update of a row of matrix C costs, approximately, 2nk flops. There are m such rows to be computed.
* BACK TO TEXT
Homework 1.13 Implement C := AB + C via row vector-matrix multiplications, using MATLAB and
Spark. You will want to use the laff routines summarized in Appendix B. You can visualize the algorithm
with PictureFLAME. Hint: yT := xT A + yT is not supported by a laff routine. You can use laff gemv
instead.
Answer: An implementation can be found in
* BACK TO TEXT
Homework 1.9 Argue that the given matrix-matrix multiplication algorithm with m × n matrix C, m × k matrix A, and k × n matrix B costs, approximately, 2mnk flops.
Answer: This can easily be argued by noting that each rank-1 update of matrix C costs, approximately, 2mn flops. There are k such rank-1 updates to be computed.
* BACK TO TEXT
Homework 1.10 Implement C := AB +C via rank-1 updates, using MATLAB and Spark. You will want
to use the laff routines summarized in Appendix B. You can visualize the algorithm with PictureFLAME.
Answer: An implementation can be found in
* BACK TO TEXT
Chapter 2. Notes on Vector and Matrix Norms (Answers)
Homework 2.2 Prove that if ν : Cn → R is a norm, then ν(0) = 0 (where the first 0 denotes the zero vector
in Cn ).
Answer: Let x ∈ C^n, let ~0 denote the zero vector of size n, and let 0 denote the scalar zero. Then
ν(~0) = ν(0 · x)    since 0 · x = ~0
      = |0| ν(x)    since ν(·) is homogeneous
      = 0           algebra
* BACK TO TEXT
Notice that x 6= 0 means that at least one of its components is nonzero. Let’s assume that
χ j 6= 0. Then
kxk1 = |χ0 | + · · · + |χn−1 | ≥ |χ j | > 0.
* BACK TO TEXT
• x 6= 0 ⇒ kxk∞ > 0 (k · k∞ is positive definite):
Notice that x 6= 0 means that at least one of its components is nonzero. Let’s assume that
χ j 6= 0. Then
kxk∞ = max |χi | ≥ |χ j | > 0.
i
* BACK TO TEXT
In other words, it equals the vector 2-norm of the vector that is created by stacking the columns of A on
top of each other. The fact that the Frobenius norm is a norm then comes from realizing this connection
and exploiting it.
Alternatively, just grind through the three conditions!
* BACK TO TEXT
Homework 2.20 Let A ∈ C^{m×n} and partition A by rows,
\[
A = \begin{pmatrix} \widehat a_0^T \\ \widehat a_1^T \\ \vdots \\ \widehat a_{m-1}^T \end{pmatrix}.
\]
Show that
\[
\|A\|_\infty = \max_{0\le i<m} \|\widehat a_i\|_1 = \max_{0\le i<m} \left( |\alpha_{i,0}| + |\alpha_{i,1}| + \cdots + |\alpha_{i,n-1}| \right).
\]
Answer: With this partitioning,
\[
\|A\|_\infty = \max_{\|x\|_\infty=1} \|Ax\|_\infty
= \max_{\|x\|_\infty=1} \left\| \begin{pmatrix} \widehat a_0^T x \\ \vdots \\ \widehat a_{m-1}^T x \end{pmatrix} \right\|_\infty
= \max_{\|x\|_\infty=1} \max_i |\widehat a_i^T x|
= \max_{\|x\|_\infty=1} \max_i \Bigl| \sum_{p=0}^{n-1} \alpha_{i,p}\chi_p \Bigr|
\le \max_{\|x\|_\infty=1} \max_i \sum_{p=0}^{n-1} |\alpha_{i,p}|\,|\chi_p|
\le \max_i \sum_{p=0}^{n-1} |\alpha_{i,p}| = \max_{0\le i<m} \|\widehat a_i\|_1 .
\]
The bound is attained: if row j attains the last maximum, choose \(\chi_p = \overline{\alpha_{j,p}}/|\alpha_{j,p}|\) when \(\alpha_{j,p} \ne 0\) and \(\chi_p = 1\) otherwise; then \(\|x\|_\infty = 1\) and \(|\widehat a_j^T x| = \|\widehat a_j\|_1\).
* BACK TO TEXT
Homework 2.21 Let y ∈ Cm and x ∈ Cn . Show that kyxH k2 = kyk2 kxk2 .
Answer: W.l.o.g. assume that x 6= 0.
We know by the Hölder inequality that |xH z| ≤ kxk2 kzk2 . Hence
kyxH k2 = max kyxH zk2 = max |xH z|kyk2 ≤ max kxk2 kzk2 kyk2 = kxk2 kyk2 .
kzk2 =1 kzk2 =1 kzk2 =1
But also
kyxH k2 = max kyxH zk2 /kzk2 ≥ kyxH xk2 /kxk2 = kyk2 kxk2 .
z6=0
Hence
kyxH k2 = kyk2 kxk2 .
* BACK TO TEXT
* BACK TO TEXT
so that \(\|AB\|_F^2 \le \|A\|_2^2 \|B\|_F^2\). Taking the square root of both sides establishes the desired result.
* BACK TO TEXT
Homework 2.28 Let k·k be a matrix norm induced by the k·k vector norm. Show that κ(A) = kAkkA−1 k ≥
1.
Answer:
kIk = kAA−1 k ≤ kAkkA−1 k.
But
\[
\|I\| = \max_{\|x\|=1} \|Ix\| = \max_{\|x\|=1} \|x\| = 1.
\]
Hence 1 ≤ kAkkA−1 k.
* BACK TO TEXT
Homework 2.29 A vector x ∈ R² can be represented by the point to which it points when rooted at the origin. For example, the vector x = (2, 1)^T can be represented by the point (2, 1). With this in mind, plot …
[The answer was given as a plot; only unrecoverable figure residue remains in this copy.]
* BACK TO TEXT
Homework 2.30 Consider
1 2 −1
A= .
−1 1 0
1. Compute the 1-norm of each column and pick the largest of these:
kAk1 = max(|1| + | − 1|, |2| + |1|, | − 1| + |0|) = max(2, 3, 1) = 3
2. Compute the 1-norm of each row and pick the largest of these:
kAk∞ = max(|1| + |2| + | − 1|, | − 1| + |1| + |0|) = max(4, 2) = 4
p √ √
3. kAkF = 12 + 22 + (−1)2 + (−1)2 + 12 + 02 = 8 = 2 2
* BACK TO TEXT
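These three computations are mechanical enough to sanity-check in a few lines of Python (our own snippet; helper names are ours):

```python
def one_norm(A):
    # maximum absolute column sum
    return max(sum(abs(A[i][j]) for i in range(len(A))) for j in range(len(A[0])))

def inf_norm(A):
    # maximum absolute row sum
    return max(sum(abs(x) for x in row) for row in A)

def fro_norm(A):
    # square root of the sum of squares of all entries
    return sum(x * x for row in A for x in row) ** 0.5

A = [[1, 2, -1], [-1, 1, 0]]
```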
\(\|x\|_1 \le n\|x\|_\infty\):
Let k be the index such that \(|\chi_k| = \|x\|_\infty\). Then \(\|x\|_1 = \sum_i |\chi_i| \le \sum_i |\chi_k| = n|\chi_k| = n\|x\|_\infty\).
3. \((1/\sqrt n)\|x\|_2 \le \|x\|_\infty \le \sqrt n \|x\|_2\).
Answer:
\((1/\sqrt n)\|x\|_2 \le \|x\|_\infty\): Let k be the index such that \(|\chi_k| = \|x\|_\infty\). Then
\[
\|x\|_\infty = |\chi_k| = \sqrt{|\chi_k|^2} \ge \sqrt{\frac{\sum_i |\chi_i|^2}{n}} = \frac{\|x\|_2}{\sqrt n}.
\]
\(\|x\|_\infty \le \sqrt n \|x\|_2\):
This follows from the fact that \(\|x\|_\infty \le \|x\|_1 \le \sqrt n \|x\|_2\).
* BACK TO TEXT
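The bounds in this sequence of exercises are easy to sanity-check numerically (our own snippet):

```python
def norms(x):
    n1 = sum(abs(xi) for xi in x)          # 1-norm
    n2 = sum(xi * xi for xi in x) ** 0.5   # 2-norm
    ninf = max(abs(xi) for xi in x)        # infinity-norm
    return n1, n2, ninf

x = [3.0, -4.0, 1.0, 0.0]
n = len(x)
n1, n2, ninf = norms(x)
```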
* BACK TO TEXT
Prove that
1. \(\|A\|_F = \|A^T\|_F\).
Answer:
\[
\|A\|_F^2 = \sum_{i=0}^{m-1}\sum_{j=0}^{n-1} |\alpha_{i,j}|^2 = \sum_{j=0}^{n-1}\sum_{i=0}^{m-1} |\alpha_{i,j}|^2 = \|A^T\|_F^2 .
\]
2. \(\|A\|_F = \sqrt{\|a_0\|_2^2 + \|a_1\|_2^2 + \cdots + \|a_{n-1}\|_2^2}\).
Answer:
\[
\|A\|_F^2 = \sum_{i=0}^{m-1}\sum_{j=0}^{n-1} |\alpha_{i,j}|^2 = \sum_{j=0}^{n-1}\Bigl(\sum_{i=0}^{m-1} |\alpha_{i,j}|^2\Bigr) = \sum_{j=0}^{n-1} \|a_j\|_2^2 .
\]
3. \(\|A\|_F = \sqrt{\|\widetilde a_0\|_2^2 + \|\widetilde a_1\|_2^2 + \cdots + \|\widetilde a_{m-1}\|_2^2}\).
Answer:
\[
\|A\|_F^2 = \|A^T\|_F^2
= \left\| \begin{pmatrix} \widetilde a_0 & \widetilde a_1 & \cdots & \widetilde a_{m-1} \end{pmatrix} \right\|_F^2
= \|\widetilde a_0\|_2^2 + \|\widetilde a_1\|_2^2 + \cdots + \|\widetilde a_{m-1}\|_2^2 .
\]
(Note that here \(\widetilde a_i = (\widetilde a_i^T)^T\).)
* BACK TO TEXT
• ke j k2 = 1
• ke j k1 = 1
• ke j k∞ = 1
• ke j k p = 1
* BACK TO TEXT
• kIk1 = 1
• kIk∞ = 1
* BACK TO TEXT
Homework 2.36 Let k · k be a vector norm defined for a vector of any size (arbitrary n). Let k · k be the
induced matrix norm. Prove that kIk = 1 (where I equals the identity matrix).
Answer:
Homework 2.37 Let
\[
D = \begin{pmatrix}
\delta_0 & 0 & \cdots & 0 \\
0 & \delta_1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \delta_{n-1}
\end{pmatrix}
\]
(a diagonal matrix). Compute …
Homework 2.38 Let D be the diagonal matrix above. Then …
Also,
\[
\|D\|_p = \max_{\|x\|_p = 1} \|Dx\|_p \ge \|D e_J\|_p = \|\delta_J e_J\|_p = |\delta_J| \|e_J\|_p = |\delta_J| = \max_k |\delta_k| ,
\]
where J is chosen to be the index so that \(|\delta_J| = \max_k |\delta_k|\). Thus …
Also, …
Chapter 3. Notes on Orthogonality and the SVD (Answers)
Homework 3.4 Let Q ∈ C^{m×n} (with n ≤ m). Partition \(Q = \begin{pmatrix} q_0 & q_1 & \cdots & q_{n-1} \end{pmatrix}\). Show that Q is an orthonormal matrix if and only if \(q_0, q_1, \ldots, q_{n-1}\) are mutually orthonormal.
Answer: Let Q ∈ C^{m×n} (with n ≤ m) and partition \(Q = \begin{pmatrix} q_0 & q_1 & \cdots & q_{n-1} \end{pmatrix}\). Then
\[
Q^H Q =
\begin{pmatrix} q_0 & q_1 & \cdots & q_{n-1} \end{pmatrix}^H
\begin{pmatrix} q_0 & q_1 & \cdots & q_{n-1} \end{pmatrix}
=
\begin{pmatrix} q_0^H \\ q_1^H \\ \vdots \\ q_{n-1}^H \end{pmatrix}
\begin{pmatrix} q_0 & q_1 & \cdots & q_{n-1} \end{pmatrix}
=
\begin{pmatrix}
q_0^H q_0 & q_0^H q_1 & \cdots & q_0^H q_{n-1} \\
q_1^H q_0 & q_1^H q_1 & \cdots & q_1^H q_{n-1} \\
\vdots & \vdots & \ddots & \vdots \\
q_{n-1}^H q_0 & q_{n-1}^H q_1 & \cdots & q_{n-1}^H q_{n-1}
\end{pmatrix}.
\]
Now consider \(Q^H Q = I\):
\[
\begin{pmatrix}
q_0^H q_0 & q_0^H q_1 & \cdots & q_0^H q_{n-1} \\
q_1^H q_0 & q_1^H q_1 & \cdots & q_1^H q_{n-1} \\
\vdots & \vdots & \ddots & \vdots \\
q_{n-1}^H q_0 & q_{n-1}^H q_1 & \cdots & q_{n-1}^H q_{n-1}
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}.
\]
This holds if and only if \(q_i^H q_j = 1\) when i = j and \(q_i^H q_j = 0\) when i ≠ j, i.e., if and only if \(q_0, q_1, \ldots, q_{n-1}\) are mutually orthonormal.
Homework 3.6 Let Q ∈ Cm×m . Show that if Q is unitary then Q−1 = QH and QQH = I.
Answer: If Q is unitary, then QH Q = I. If A, B ∈ Cm×m , the matrix B such that BA = I is the inverse of
A. Hence Q−1 = QH . Also, if BA = I then AB = I and hence QQH = I.
* BACK TO TEXT
Homework 3.7 Let Q0 , Q1 ∈ C^{m×m} both be unitary. Show that their product, Q0 Q1 , is unitary.
Answer: Obviously, Q0 Q1 is a square matrix. Moreover,
\[
(Q_0 Q_1)^H (Q_0 Q_1) = Q_1^H \underbrace{Q_0^H Q_0}_{I} Q_1 = Q_1^H Q_1 = I .
\]
Hence Q0 Q1 is unitary.
* BACK TO TEXT
Homework 3.8 Let Q0 , Q1 , . . . , Qk−1 ∈ C^{m×m} all be unitary. Show that their product, Q0 Q1 · · · Qk−1 , is unitary.
Answer: Strictly speaking, we should do a proof by induction. But instead we will make the more informal argument that
\[
(Q_0 Q_1 \cdots Q_{k-1})^H (Q_0 Q_1 \cdots Q_{k-1})
= Q_{k-1}^H \cdots Q_1^H \underbrace{Q_0^H Q_0}_{I} Q_1 \cdots Q_{k-1} = \cdots = I ,
\]
collapsing the product from the inside out.
* BACK TO TEXT
Homework 3.10 Let U ∈ C^{m×m} and V ∈ C^{n×n} be unitary matrices and A ∈ C^{m×n}. Then \(\|UA\|_2 = \|A\|_2\) and \(\|AV\|_2 = \|A\|_2\).
Answer:
• kUAk2 = maxkxk2 =1 kUAxk2 = maxkxk2 =1 kAxk2 = kAk2 .
• kAV k2 = maxkxk2 =1 kAV xk2 = maxkV xk2 =1 kA(V x)k2 = maxkyk2 =1 kAyk2 = kAk2 .
* BACK TO TEXT
Homework 3.11 Let U ∈ C^{m×m} and V ∈ C^{n×n} be unitary matrices and A ∈ C^{m×n}. Then \(\|UA\|_F = \|A\|_F\) and \(\|AV\|_F = \|A\|_F\).
Answer:
• Partition \(A = \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}\). Then it is easy to show that \(\|A\|_F^2 = \sum_{j=0}^{n-1} \|a_j\|_2^2\). Thus
\[
\|UA\|_F^2 = \left\| \begin{pmatrix} U a_0 & U a_1 & \cdots & U a_{n-1} \end{pmatrix} \right\|_F^2
= \sum_{j=0}^{n-1} \|U a_j\|_2^2 = \sum_{j=0}^{n-1} \|a_j\|_2^2 = \|A\|_F^2 .
\]
• Partition A by rows, so that AV has rows \(\widehat a_i^T V\). Then
\[
\|AV\|_F^2 = \sum_{i=0}^{m-1} \|(\widehat a_i^T V)^H\|_2^2
= \sum_{i=0}^{m-1} \|V^H (\widehat a_i^T)^H\|_2^2
= \sum_{i=0}^{m-1} \|(\widehat a_i^T)^H\|_2^2 = \|A\|_F^2 .
\]
Hence \(\|AV\|_F = \|A\|_F\).
so that \(\|D\|_2 \le \max_{0\le i<n} |\delta_i|\).
Also, choose j so that \(|\delta_j| = \max_{0\le i<n} |\delta_i|\). Then
\[
\|D\|_2 = \max_{\|x\|_2=1} \|Dx\|_2 \ge \|D e_j\|_2 = \|\delta_j e_j\|_2 = |\delta_j| \|e_j\|_2 = |\delta_j| = \max_{0\le i<n} |\delta_i| .
\]
Homework 3.14 Let \(A = \begin{pmatrix} A_T \\ 0 \end{pmatrix}\). Use the SVD of \(A_T\) to show that \(\|A\|_2 = \|A_T\|_2\).
Answer: Let \(A_T = U_T \Sigma_T V_T^H\) be the SVD of \(A_T\). Then
\[
A = \begin{pmatrix} A_T \\ 0 \end{pmatrix} = \begin{pmatrix} U_T \Sigma_T V_T^H \\ 0 \end{pmatrix} = \begin{pmatrix} U_T \\ 0 \end{pmatrix} \Sigma_T V_T^H ,
\]
which is the (reduced) SVD of A. As a result, the largest singular value of \(A_T\) clearly equals the largest singular value of A, and hence \(\|A\|_2 = \|A_T\|_2\).
* BACK TO TEXT
Homework 3.15 Assume that U ∈ Cm×m and V ∈ Cn×n are unitary matrices. Let A, B ∈ Cm×n with
B = UAV H . Show that the singular values of A equal the singular values of B.
Answer: Let A = UA ΣAVAH be the SVD of A. Then B = UUA ΣAVAH V H = (UUA )ΣA (VVA )H where both
UUA and VVA are unitary. This gives us the SVD for B and it shows that the singular values of B equal the
singular values of A.
* BACK TO TEXT
Homework 3.16 Let A ∈ C^{m×n} with \(A = \begin{pmatrix} \sigma_0 & 0 \\ 0 & B \end{pmatrix}\) and assume that \(\|A\|_2 = \sigma_0\). Show that \(\|B\|_2 \le \|A\|_2\). (Hint: Use the SVD of B.)
Answer: Let \(B = U_B \Sigma_B V_B^H\) be the SVD of B. Then
\[
A = \begin{pmatrix} \sigma_0 & 0 \\ 0 & U_B \Sigma_B V_B^H \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & U_B \end{pmatrix}
\begin{pmatrix} \sigma_0 & 0 \\ 0 & \Sigma_B \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & V_B \end{pmatrix}^H ,
\]
which shows the relationship between the SVD of A and that of B. Since \(\|A\|_2 = \sigma_0\), it must be the case that the diagonal entries of \(\Sigma_B\) are less than or equal to \(\sigma_0\) in magnitude, which means that \(\|B\|_2 \le \|A\|_2\).
* BACK TO TEXT
Homework 3.17 Prove Lemma 3.12 for m ≤ n.
Answer: You can use the following as an outline for your proof:
Proof: First, let us observe that if A = 0 (the zero matrix) then the theorem trivially holds: A = UDV^H with U and V arbitrary unitary matrices and D = 0.
• Base case: A consists of a single nonzero row, \(A = \widehat a_0^T\). Then
\[
A = \widehat a_0^T = 1 \cdot \begin{pmatrix} \|\widehat a_0^T\|_2 & 0 \end{pmatrix} \begin{pmatrix} v_0 & V_1 \end{pmatrix}^H = U D V^H ,
\]
where \(D_{TL} = \delta_0 = \|\widehat a_0^T\|_2\), \(U = (1)\), and \(v_0 = (\widehat a_0^T)^H / \|\widehat a_0^T\|_2\).
• Inductive step: Similarly modify the inductive step of the proof of the theorem.
• By the Principle of Mathematical Induction the result holds for all matrices A ∈ C^{m×n} with m ≤ n.
• κ2 (A) = σ0 /σm−1 .
Homework 3.36 Let A ∈ Cn×n have singular values σ0 ≥ σ1 ≥ · · · ≥ σn−1 (where σn−1 may equal zero).
Prove that σn−1 ≤ kAxk2 /kxk2 ≤ σ0 (assuming x 6= 0).
Answer:
\[
\frac{\|Ax\|_2}{\|x\|_2} \le \max_{y\ne 0} \frac{\|Ay\|_2}{\|y\|_2} = \sigma_0
\]
(for example, as a consequence of how \(\sigma_0\) shows up in the derivation of the SVD). Also,
\[
\frac{\|Ax\|_2}{\|x\|_2} \ge \min_{y\ne 0} \frac{\|Ay\|_2}{\|y\|_2}
= \min_{y\ne 0} \frac{\|U\Sigma V^H y\|_2}{\|y\|_2}
= \min_{y\ne 0} \frac{\|\Sigma V^H y\|_2}{\|V^H y\|_2}
= \min_{z\ne 0} \frac{\|\Sigma z\|_2}{\|z\|_2} = \sigma_{n-1}
\]
(through a simple argument about the smallest element (in magnitude) on the diagonal of a diagonal matrix).
* BACK TO TEXT
Homework 3.37 Let D ∈ R^{n×n} be a diagonal matrix with entries \(\delta_0, \delta_1, \ldots, \delta_{n-1}\) on its diagonal. Show that
1. \(\max_{x\ne 0} \|Dx\|_2/\|x\|_2 = \max_{0\le i<n} |\delta_i|\).
Answer: It is usually simpler to square both sides: \(\max_{x\ne 0} \|Dx\|_2^2/\|x\|_2^2 = \max_{0\le i<n} \delta_i^2\). Now,
\[
\max_{x\ne 0} \|Dx\|_2^2/\|x\|_2^2 = \max_{\|x\|_2=1} \|Dx\|_2^2
= \max_{\|x\|_2=1} \left( \delta_0^2\chi_0^2 + \cdots + \delta_{n-1}^2\chi_{n-1}^2 \right)
\le \left( \max_{0\le i<n}\delta_i^2 \right) \max_{\|x\|_2=1} \left( \chi_0^2 + \cdots + \chi_{n-1}^2 \right)
= \max_{0\le i<n}\delta_i^2 ,
\]
with equality attained for \(x = e_j\), where j is the index for which \(|\delta_j|\) is maximal.
2. \(\min_{x\ne 0} \|Dx\|_2/\|x\|_2 = \min_{0\le i<n} |\delta_i|\).
Answer: Again, it is usually simpler to square both sides: \(\min_{x\ne 0} \|Dx\|_2^2/\|x\|_2^2 = \min_{0\le i<n} \delta_i^2\). Now,
\[
\min_{x\ne 0} \|Dx\|_2^2/\|x\|_2^2 = \min_{\|x\|_2=1} \|Dx\|_2^2
= \min_{\|x\|_2=1} \left( \delta_0^2\chi_0^2 + \cdots + \delta_{n-1}^2\chi_{n-1}^2 \right)
\ge \left( \min_{0\le i<n}\delta_i^2 \right) \min_{\|x\|_2=1} \left( \chi_0^2 + \cdots + \chi_{n-1}^2 \right)
= \min_{0\le i<n}\delta_i^2 ,
\]
with equality attained for \(x = e_j\), where j is the index for which \(|\delta_j|\) is minimal.
* BACK TO TEXT
Homework 3.38 Let A ∈ Cn×n have the property that for all vectors x ∈ Cn it holds that kAxk2 = kxk2 .
Use the SVD to prove that A is unitary.
Answer: Let A = UΣV^H be the singular value decomposition. It suffices to show that Σ = I or, equivalently, that σ0 = · · · = σn−1 = 1, since then A = UV^H is a product of unitary matrices and hence unitary itself.
Answer 1: Partition
U= u0 · · · un−1 and V = v0 · · · vn−1
Answer 2: σ0 ≥ σ1 ≥ · · · ≥ σn−1 .
kAxk2
σ0 = max = 1.
x6=0 kxk2
kAxk2
σn−1 = min = 1.
x6=0 kxk2
Hence σi = 1 for 0 ≤ i < n.
Answer 3: (Does not use the SVD) \(\|Ax\|_2 = \|x\|_2\) is equivalent to \(\|Ax\|_2^2 = \|x\|_2^2\) and to \(x^H A^H A x = x^H x\). Let \(a_i\) denote the ith column of A. Pick \(x = e_i\). Then we see that \(e_i^H A^H A e_i = (A e_i)^H (A e_i) = a_i^H a_i = 1\), and hence the columns of A all have length one. Now, consider \(x = e_i + e_j\) where i ≠ j:
* BACK TO TEXT
* BACK TO TEXT
Homework 3.40 Compute the SVD of \(\begin{pmatrix} 1 \\ -2 \end{pmatrix}\begin{pmatrix} -1 & 1 \end{pmatrix}\).
Answer: The trick is to recognize that you need to normalize each of the vectors:
\[
\begin{pmatrix} 1 \\ -2 \end{pmatrix} \begin{pmatrix} -1 & 1 \end{pmatrix}
= \underbrace{\frac{1}{\sqrt 5}\begin{pmatrix} 1 \\ -2 \end{pmatrix}}_{U}
\underbrace{\left(\sqrt 5\,\sqrt 2\right)}_{\Sigma}
\underbrace{\frac{1}{\sqrt 2}\begin{pmatrix} -1 & 1 \end{pmatrix}}_{V^H} .
\]
* BACK TO TEXT
Chapter 4. Notes on Gram-Schmidt QR Factorization (Answers)
Homework 4.1 What happens in the Gram-Schmidt algorithm if the columns of A are NOT linearly
independent? Answer: If a j is the first column such that {a0 , . . . , a j } are linearly dependent, then a⊥j will
equal the zero vector and the process breaks down. How might one fix this? Answer: When a vector with a⊥j = 0 is encountered, the columns can be rearranged so that that column (or those columns) come last.
How can the Gram-Schmidt algorithm be used to identify which columns of A are linearly independent?
Answer: Again, if a⊥j = 0 for some j, then the columns are linearly dependent.
* BACK TO TEXT
Homework 4.2 Convince yourself that the relation between the vectors {a_j} and {q_j} in the algorithms in Figure 4.2 is given by
\[
\begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix} =
\begin{pmatrix} q_0 & q_1 & \cdots & q_{n-1} \end{pmatrix}
\begin{pmatrix}
\rho_{0,0} & \rho_{0,1} & \cdots & \rho_{0,n-1} \\
0 & \rho_{1,1} & \cdots & \rho_{1,n-1} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \rho_{n-1,n-1}
\end{pmatrix},
\]
where
\[
q_i^H q_j = \begin{cases} 1 & \text{for } i = j \\ 0 & \text{otherwise} \end{cases}
\quad\text{and}\quad
\rho_{i,j} = \begin{cases}
q_i^H a_j & \text{for } i < j \\
\| a_j - \sum_{k=0}^{j-1} \rho_{k,j} q_k \|_2 & \text{for } i = j \\
0 & \text{otherwise.}
\end{cases}
\]
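The relation can be brought to life with a small modified Gram-Schmidt sketch in pure Python (our own illustration, assuming linearly independent columns; the notes' own implementations are MATLAB based):

```python
def mgs_qr(A):
    # Modified Gram-Schmidt: A = Q R with Q having orthonormal columns and
    # R upper triangular. Assumes the columns of A are linearly independent.
    m, n = len(A), len(A[0])
    Q = [[A[i][j] for j in range(n)] for i in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = sum(Q[i][j] ** 2 for i in range(m)) ** 0.5  # rho_{j,j}
        for i in range(m):
            Q[i][j] /= R[j][j]                                 # normalize q_j
        for k in range(j + 1, n):
            R[j][k] = sum(Q[i][j] * Q[i][k] for i in range(m))  # rho_{j,k}
            for i in range(m):
                Q[i][k] -= R[j][k] * Q[i][j]   # subtract component along q_j
    return Q, R
```

Reconstructing QR recovers A, and Q's columns come out mutually orthonormal, exactly the relation stated above.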
Homework 4.5 Let A have linearly independent columns and let A = QR be a QR factorization of A.
Partition
RT L RT R
A → AL AR , Q → QL QR , and R → ,
0 RBR
where AL and QL have k columns and RT L is k × k. Show that
1. AL = QL RT L : QL RT L equals the QR factorization of AL ,
2. C (AL ) = C (QL ): the first k columns of Q form an orthonormal basis for the space spanned by the
first k columns of A.
3. RT R = QH
L AR ,
4. (AR − QL RT R )H QL = 0,
5. AR − QL RT R = QR RBR , and
6. C (AR − QL RT R ) = C (QR ).
Hence
\[
A_L = Q_L R_{TL} \quad\text{and}\quad A_R = Q_L R_{TR} + Q_R R_{BR} .
\]
• Taking \(Q_L^H\) of both sides of \(A_R - Q_L R_{TR} = Q_R R_{BR}\),
\[
Q_L^H (A_R - Q_L R_{TR}) = Q_L^H Q_R R_{BR}
\]
is equivalent to
\[
Q_L^H A_R - \underbrace{Q_L^H Q_L}_{I} R_{TR} = \underbrace{Q_L^H Q_R}_{0} R_{BR} = 0 .
\]
Rearranging yields 3.
• Since \(A_R - Q_L R_{TR} = Q_R R_{BR}\) we find that \((A_R - Q_L R_{TR})^H Q_L = (Q_R R_{BR})^H Q_L\) and
\[
(A_R - Q_L R_{TR})^H Q_L = R_{BR}^H Q_R^H Q_L = 0 .
\]
* BACK TO TEXT
Homework 4.6 Implement GS unb var1 using MATLAB and Spark. You will want to use the laff
routines summarized in Appendix B. (I’m not sure if you can visualize the algorithm with PictureFLAME.
Try it!)
Answer: Implementations can be found in Programming/chapter04_answers.
* BACK TO TEXT
Homework 4.7 Implement MGS unb var1 using MATLAB and Spark. You will want to use the laff
routines summarized in Appendix B. (I’m not sure if you can visualize the algorithm with PictureFLAME.
Try it!)
Answer: Implementations can be found in Programming/chapter04_answers.
* BACK TO TEXT
Chapter 6. Notes on Householder QR Factorization (Answers)
• \(H = H^H\).
Answer:
\[
(I - 2uu^H)^H = I - 2(u^H)^H u^H = I - 2uu^H .
\]
• \(H^H H = I\) (a reflector is unitary).
Answer: Using the previous part and \(u^H u = 1\),
\[
H^H H = HH = (I - 2uu^H)(I - 2uu^H) = I - 4uu^H + 4u\underbrace{(u^H u)}_{1}u^H = I .
\]
Homework 6.4 Show that if x ∈ Rn , v = x ∓ kxk2 e0 , and τ = vT v/2 then (I − 1τ vvT )x = ±kxk2 e0 .
Answer: This is surprisingly messy...
* BACK TO TEXT
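Although the algebra is messy, the conclusion is easy to check numerically. The following Python/NumPy sketch (an assumption of this illustration: real arithmetic and the "$-$" choice of sign, so the result is $+\|x\|_2 e_0$) builds $H = I - \frac{1}{\tau} v v^T$ and verifies both the reflector properties above and the zeroing effect:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)

# Choose v = x - ||x||_2 e_0 (the "-" branch) and tau = v^T v / 2.
v = x.copy()
v[0] -= np.linalg.norm(x)
tau = (v @ v) / 2.0
H = np.eye(5) - np.outer(v, v) / tau   # H = I - (1/tau) v v^T

Hx = H @ x                   # should equal +||x||_2 e_0 for this sign choice
e0 = np.zeros(5)
e0[0] = np.linalg.norm(x)
```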
where $\tau = u^H u / 2 = (1 + u_2^H u_2)/2$ and $\rho = \pm \| x \|_2$.
Hint: $\rho \bar{\rho} = |\rho|^2 = \| x \|_2^2$ since $H$ preserves the norm. Also, $\| x \|_2^2 = |\chi_1|^2 + \| x_2 \|_2^2$ and $z = |z| \frac{z}{|z|}$.
Answer: Again, surprisingly messy...
* BACK TO TEXT
Homework 6.6 Function Housev.m implements the steps in Figure 6.2 (left). Update this implementation with the equivalent steps in Figure 6.2 (right), which is closer to how it is implemented in practice.
Answer: See Programming/chapter06_answers/Housev_alt.m
* BACK TO TEXT
Answer:
\[
\begin{pmatrix}
I & 0 \\
0 & I - \frac{1}{\tau_1} \begin{pmatrix} 1 \\ u_2 \end{pmatrix} \begin{pmatrix} 1 \\ u_2 \end{pmatrix}^H
\end{pmatrix}
= I - \frac{1}{\tau_1}
\begin{pmatrix} 0 & 0 \\ 0 & \begin{pmatrix} 1 \\ u_2 \end{pmatrix} \begin{pmatrix} 1 \\ u_2 \end{pmatrix}^H \end{pmatrix}
= I - \frac{1}{\tau_1}
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & u_2^H \\ 0 & u_2 & u_2 u_2^H \end{pmatrix}
= I - \frac{1}{\tau_1}
\begin{pmatrix} 0 \\ 1 \\ u_2 \end{pmatrix}
\begin{pmatrix} 0 \\ 1 \\ u_2 \end{pmatrix}^H .
\]
* BACK TO TEXT
Homework 6.8 Given $A \in \mathbb{R}^{m \times n}$ show that the cost of the algorithm in Figure 6.4 is given by
\[
C_{HQR}(m, n) \approx 2 m n^2 - \frac{2}{3} n^3 \text{ flops.}
\]
Answer: ... the update is applied to the matrix $A_{22}$, which is of size approximately $(m-k) \times (n-k)$, for a cost of $4(m-k)(n-k)$ flops. Thus the total cost is approximately
\[
\sum_{k=0}^{n-1} 4 (m-k)(n-k) \approx 2 m n^2 - \frac{2}{3} n^3 \text{ flops.}
\]
* BACK TO TEXT
Input is an $m \times n$ matrix $A$ and vector $t$ of size $n$. Output is the overwritten matrix $A$ and the vector of scalars that define the Householder transformations. You may want to use Programming/chapter06/test_HQR_unb_var1.m to check your implementation.
Answer: See Programming/chapter06_answers/HQR_unb_var1.m.
* BACK TO TEXT
You may want to use Programming/chapter06/test_HQR_unb_var1.m to check your implementation.
Answer: See Programming/chapter06_answers/FormQ_unb_var1.m.
* BACK TO TEXT
Homework 6.13 Given $A \in \mathbb{C}^{m \times n}$ the cost of the algorithm in Figure 6.6 is given by
\[
C_{FormQ}(m, n) \approx 2 m n^2 - \frac{2}{3} n^3 \text{ flops.}
\]
Answer: The answer for Homework 6.8 can be easily modified to establish this result.
* BACK TO TEXT
Homework 6.14 If m = n then Q could be accumulated by the sequence
Q = (· · · ((IH0 )H1 ) · · · Hn−1 ).
Give a high-level reason why this would be (much) more expensive than the algorithm in Figure 6.6.
Answer: The benefit of accumulating Q in the order used by Figure 6.6 is that many zeroes are preserved, thus reducing the necessary computations. In contrast, IH0 immediately creates a dense matrix, after which the application of subsequent Householder transformations benefits less from zeroes.
* BACK TO TEXT
Homework 6.15 Consider $u_1 \in \mathbb{C}^m$ with $u_1 \neq 0$ (the zero vector), $U_0 \in \mathbb{C}^{m \times k}$, and nonsingular $T_{00} \in \mathbb{C}^{k \times k}$. Define $\tau_1 = (u_1^H u_1)/2$, so that
\[ H_1 = I - \frac{1}{\tau_1} u_1 u_1^H \]
equals a Householder transformation, and let
\[ Q_0 = I - U_0 T_{00}^{-1} U_0^H . \]
Show that
\[
Q_0 H_1 = (I - U_0 T_{00}^{-1} U_0^H)\left(I - \frac{1}{\tau_1} u_1 u_1^H\right)
= I - \left( U_0 \mid u_1 \right)
\begin{pmatrix} T_{00} & t_{01} \\ 0 & \tau_1 \end{pmatrix}^{-1}
\left( U_0 \mid u_1 \right)^H,
\]
where $t_{01} = U_0^H u_1$.
Answer:
\[
Q_0 H_1 = (I - U_0 T_{00}^{-1} U_0^H)\left(I - \frac{1}{\tau_1} u_1 u_1^H\right)
= I - U_0 T_{00}^{-1} U_0^H - \frac{1}{\tau_1} u_1 u_1^H + \frac{1}{\tau_1} U_0 T_{00}^{-1} \underbrace{U_0^H u_1}_{t_{01}} u_1^H .
\]
Also
\[
I - \left( U_0 \mid u_1 \right)
\begin{pmatrix} T_{00} & t_{01} \\ 0 & \tau_1 \end{pmatrix}^{-1}
\left( U_0 \mid u_1 \right)^H
= I - \left( U_0 \mid u_1 \right)
\begin{pmatrix} T_{00}^{-1} & -T_{00}^{-1} t_{01} / \tau_1 \\ 0 & 1/\tau_1 \end{pmatrix}
\begin{pmatrix} U_0^H \\ u_1^H \end{pmatrix}
\]
\[
= I - \left( U_0 \mid u_1 \right)
\begin{pmatrix} T_{00}^{-1} U_0^H - T_{00}^{-1} t_{01} u_1^H / \tau_1 \\ u_1^H / \tau_1 \end{pmatrix}
= I - U_0 \left( T_{00}^{-1} U_0^H - T_{00}^{-1} t_{01} u_1^H / \tau_1 \right) - \frac{1}{\tau_1} u_1 u_1^H
\]
\[
= I - U_0 T_{00}^{-1} U_0^H - \frac{1}{\tau_1} u_1 u_1^H + \frac{1}{\tau_1} U_0 T_{00}^{-1} t_{01} u_1^H,
\]
so the two expressions are equal.
384
* BACK TO TEXT
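The identity in Homework 6.15 holds for any nonsingular upper triangular $T_{00}$, which makes it easy to spot-check numerically. A Python/NumPy sketch (random data and real arithmetic, both assumptions of this illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 6, 3
U0 = rng.standard_normal((m, k))
T00 = np.triu(rng.standard_normal((k, k))) + 3 * np.eye(k)  # nonsingular upper triangular
u1 = rng.standard_normal(m)
tau1 = (u1 @ u1) / 2.0

I = np.eye(m)
Q0 = I - U0 @ np.linalg.inv(T00) @ U0.T        # Q0 = I - U0 T00^{-1} U0^T
H1 = I - np.outer(u1, u1) / tau1               # Householder transformation

t01 = U0.T @ u1                                # t01 = U0^T u1
U = np.column_stack([U0, u1])                  # append the new Householder vector
T = np.block([[T00, t01[:, None]],
              [np.zeros((1, k)), np.array([[tau1]])]])

lhs = Q0 @ H1
rhs = I - U @ np.linalg.inv(T) @ U.T           # accumulated form
```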
Let
\[ H_i = I - \frac{1}{\tau_i} u_i u_i^H . \]
Show that
\[ H_0 H_1 \cdots H_{k-1} = I - U T^{-1} U^H , \]
where $T$ is an upper triangular matrix.
Answer: The result follows from Homework 6.15 via a proof by induction.
* BACK TO TEXT
Answer:
\[
\begin{pmatrix} T_{00} & t_{01} \\ 0 & \tau_1 \end{pmatrix}
\begin{pmatrix} T_{00}^{-1} & -T_{00}^{-1} t_{01} / \tau_1 \\ 0 & 1/\tau_1 \end{pmatrix}
= \begin{pmatrix} T_{00} T_{00}^{-1} & -T_{00} T_{00}^{-1} t_{01} / \tau_1 + t_{01} / \tau_1 \\ 0 & \tau_1 / \tau_1 \end{pmatrix}
= \begin{pmatrix} I & 0 \\ 0 & 1 \end{pmatrix} .
\]
* BACK TO TEXT
Homework 6.19 Let $A = \left( a_0 \mid a_1 \mid \cdots \mid a_{n-1} \right)$,
\[
v = \begin{pmatrix} \nu_0 \\ \nu_1 \\ \vdots \\ \nu_{n-1} \end{pmatrix}
= \begin{pmatrix} \| a_0 \|_2^2 \\ \| a_1 \|_2^2 \\ \vdots \\ \| a_{n-1} \|_2^2 \end{pmatrix},
\]
$q^T q = 1$ ($q$ of the same size as the columns of $A$), and $r = A^T q = \left( \rho_0, \rho_1, \ldots, \rho_{n-1} \right)^T$. Compute $B := A - q r^T$ with $B = \left( b_0 \mid b_1 \mid \cdots \mid b_{n-1} \right)$. Then
\[
\begin{pmatrix} \| b_0 \|_2^2 \\ \| b_1 \|_2^2 \\ \vdots \\ \| b_{n-1} \|_2^2 \end{pmatrix}
= \begin{pmatrix} \nu_0 - \rho_0^2 \\ \nu_1 - \rho_1^2 \\ \vdots \\ \nu_{n-1} - \rho_{n-1}^2 \end{pmatrix}.
\]
Answer:
\[ a_i = (a_i - a_i^T q\, q) + a_i^T q\, q \]
and
\[ (a_i - a_i^T q\, q)^T q = a_i^T q - a_i^T q\, q^T q = a_i^T q - a_i^T q = 0. \]
This means that
\[
\| a_i \|_2^2 = \| (a_i - a_i^T q\, q) + a_i^T q\, q \|_2^2 = \| a_i - a_i^T q\, q \|_2^2 + \| a_i^T q\, q \|_2^2
= \| a_i - \rho_i q \|_2^2 + \| \rho_i q \|_2^2 = \| b_i \|_2^2 + \rho_i^2,
\]
so that
\[ \| b_i \|_2^2 = \| a_i \|_2^2 - \rho_i^2 = \nu_i - \rho_i^2 . \]
* BACK TO TEXT
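This "norm downdating" is cheap to verify numerically; a Python/NumPy sketch (random real data, an assumption of this illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 7, 4
A = rng.standard_normal((m, n))
q = rng.standard_normal(m)
q /= np.linalg.norm(q)          # q^T q = 1

v = np.sum(A * A, axis=0)       # nu_j = ||a_j||_2^2
r = A.T @ q                     # rho_j = a_j^T q
B = A - np.outer(q, r)          # B := A - q r^T

new_norms = np.sum(B * B, axis=0)   # should equal nu_j - rho_j^2
```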
Homework 6.20 In Section 4.4 we discuss how MGS yields a higher quality solution than does CGS in terms of the orthogonality of the computed columns. The classic example, already mentioned in Chapter 4, that illustrates this is
\[
A = \begin{pmatrix} 1 & 1 & 1 \\ \epsilon & 0 & 0 \\ 0 & \epsilon & 0 \\ 0 & 0 & \epsilon \end{pmatrix},
\]
where $\epsilon = \sqrt{\epsilon_{\mathrm{mach}}}$. In this exercise, you will compare and contrast the quality of the computed matrix $Q$ for CGS, MGS, and Householder QR factorization.
format long % print out 16 digits
eps = 1.0e-8 % roughly the square root of the machine epsilon
A = [
1 1 1
eps 0 0
0 eps 0
0 0 eps
]
• With the various routines you implemented for Chapter 4 and the current chapter compute
• Finally, check whether the columns of the various computed matices Q are mutually orthonormal:
Q_CGS’ * Q_CGS
Q_MGS’ * Q_MGS
Q_HQR’ * Q_HQR
What do you notice? When we discuss numerical stability of algorithms we will gain insight into
why HQR produces high quality mutually orthogonal columns.
Later, we will see how the QR factorization can be used to solve Ax = b and linear least-squares
problems. At that time we will examine how accurate the solution is depending on which method for QR
factorization is used to compute Q and R.
Answer:
>> Q_CGS' * Q_CGS
ans =
...
>> Q_MGS' * Q_MGS
ans =
...
>> Q_HQR' * Q_HQR
ans =
1.000000000000000 0 0
0 1.000000000000000 0
0 0 1.000000000000000
Clearly, the orthogonality of the matrix Q computed via Householder QR factorization is much more
accurate.
* BACK TO TEXT
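The experiment can also be reproduced outside MATLAB. The Python/NumPy sketch below implements CGS and MGS directly (`np.linalg.qr` plays the role of Householder QR) and measures $\| I - Q^T Q \|$ for each:

```python
import numpy as np

def cgs(A):
    """Classical Gram-Schmidt: all projections use the original column."""
    m, n = A.shape
    Q = np.zeros((m, n)); R = np.zeros((n, n))
    for j in range(n):
        R[:j, j] = Q[:, :j].T @ A[:, j]
        v = A[:, j] - Q[:, :j] @ R[:j, j]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q

def mgs(A):
    """Modified Gram-Schmidt: projections use the updated columns."""
    m, n = A.shape
    Q = A.copy(); R = np.zeros((n, n))
    for j in range(n):
        R[j, j] = np.linalg.norm(Q[:, j])
        Q[:, j] /= R[j, j]
        for k in range(j + 1, n):
            R[j, k] = Q[:, j] @ Q[:, k]
            Q[:, k] -= R[j, k] * Q[:, j]
    return Q

e = 1.0e-8                       # roughly sqrt(machine epsilon)
A = np.array([[1, 1, 1], [e, 0, 0], [0, e, 0], [0, 0, e]])
I3 = np.eye(3)
Q_cgs, Q_mgs = cgs(A), mgs(A)
Q_hqr, _ = np.linalg.qr(A)       # LAPACK Householder QR
err_cgs = np.linalg.norm(I3 - Q_cgs.T @ Q_cgs)
err_mgs = np.linalg.norm(I3 - Q_mgs.T @ Q_mgs)
err_hqr = np.linalg.norm(I3 - Q_hqr.T @ Q_hqr)
```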
Homework 6.21 Consider the matrix $\begin{pmatrix} A \\ B \end{pmatrix}$ where $A$ has linearly independent columns. Let
• $A = Q_A R_A$ be the QR factorization of $A$.
• $\begin{pmatrix} R_A \\ B \end{pmatrix} = Q_B R_B$ be the QR factorization of $\begin{pmatrix} R_A \\ B \end{pmatrix}$.
• $\begin{pmatrix} A \\ B \end{pmatrix} = Q R$ be the QR factorization of $\begin{pmatrix} A \\ B \end{pmatrix}$.
Assume that the diagonal entries of $R_A$, $R_B$, and $R$ are all positive. Show that $R = R_B$.
Answer:
\[
\begin{pmatrix} A \\ B \end{pmatrix}
= \begin{pmatrix} Q_A & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} R_A \\ B \end{pmatrix}
= \begin{pmatrix} Q_A & 0 \\ 0 & I \end{pmatrix} Q_B R_B .
\]
Also, $\begin{pmatrix} A \\ B \end{pmatrix} = Q R$. Since $\begin{pmatrix} Q_A & 0 \\ 0 & I \end{pmatrix} Q_B$ has orthonormal columns, by the uniqueness of the QR factorization $Q = \begin{pmatrix} Q_A & 0 \\ 0 & I \end{pmatrix} Q_B$ and $R = R_B$.
* BACK TO TEXT
Homework 6.22 Consider the matrix $\begin{pmatrix} R \\ B \end{pmatrix}$ where $R$ is an upper triangular matrix. Propose a modification of HQR_UNB_VAR1 that overwrites $R$ and $B$ with the Householder vectors and the updated matrix $R$. Importantly, the algorithm should take advantage of the zeroes in $R$ (in other words, it should avoid computing with the zeroes below its diagonal). An outline for the algorithm is given in Figure 6.15.
Answer: See Figure 6.6.
* BACK TO TEXT
Algorithm: $[R, B, t] :=$ HQR_UPDATE_UNB_VAR1$(R, B, t)$
Partition $R \rightarrow \begin{pmatrix} R_{TL} & R_{TR} \\ R_{BL} & R_{BR} \end{pmatrix}$, $B \rightarrow \left( B_L \mid B_R \right)$, $t \rightarrow \begin{pmatrix} t_T \\ t_B \end{pmatrix}$
where $R_{TL}$ is $0 \times 0$, $B_L$ has 0 columns, $t_T$ has 0 rows
while $m(R_{TL}) < m(R)$ do
Repartition
$\begin{pmatrix} R_{TL} & R_{TR} \\ R_{BL} & R_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} R_{00} & r_{01} & R_{02} \\ r_{10}^T & \rho_{11} & r_{12}^T \\ R_{20} & r_{21} & R_{22} \end{pmatrix}$, $\left( B_L \mid B_R \right) \rightarrow \left( B_0 \mid b_1 \mid B_2 \right)$, $\begin{pmatrix} t_T \\ t_B \end{pmatrix} \rightarrow \begin{pmatrix} t_0 \\ \tau_1 \\ t_2 \end{pmatrix}$
where $\rho_{11}$ is $1 \times 1$, $b_1$ has 1 column, $\tau_1$ has 1 row

$\left[ \begin{pmatrix} \rho_{11} \\ b_1 \end{pmatrix}, \tau_1 \right] :=$ Housev$\begin{pmatrix} \rho_{11} \\ b_1 \end{pmatrix}$
Update $\begin{pmatrix} r_{12}^T \\ B_2 \end{pmatrix} := \left( I - \frac{1}{\tau_1} \begin{pmatrix} 1 \\ b_1 \end{pmatrix} \begin{pmatrix} 1 \\ b_1 \end{pmatrix}^H \right) \begin{pmatrix} r_{12}^T \\ B_2 \end{pmatrix}$
via the steps
• $w_{12}^T := (r_{12}^T + b_1^H B_2)/\tau_1$
• $\begin{pmatrix} r_{12}^T \\ B_2 \end{pmatrix} := \begin{pmatrix} r_{12}^T - w_{12}^T \\ B_2 - b_1 w_{12}^T \end{pmatrix}$

Continue with
$\begin{pmatrix} R_{TL} & R_{TR} \\ R_{BL} & R_{BR} \end{pmatrix} \leftarrow \begin{pmatrix} R_{00} & r_{01} & R_{02} \\ r_{10}^T & \rho_{11} & r_{12}^T \\ R_{20} & r_{21} & R_{22} \end{pmatrix}$, $\left( B_L \mid B_R \right) \leftarrow \left( B_0 \mid b_1 \mid B_2 \right)$, $\begin{pmatrix} t_T \\ t_B \end{pmatrix} \leftarrow \begin{pmatrix} t_0 \\ \tau_1 \\ t_2 \end{pmatrix}$
endwhile

Figure 6.6: Algorithm that computes the QR factorization of a triangular matrix, $R$, appended with a matrix $B$.
Chapter 8. Notes on Solving Linear Least-squares Problems (An-
swers)
Homework 8.2 Let $A \in \mathbb{C}^{m \times n}$ with $m < n$ have linearly independent rows. Show that there exist a lower triangular matrix $L_L \in \mathbb{C}^{m \times m}$ and a matrix $Q_T \in \mathbb{C}^{m \times n}$ with orthonormal rows such that $A = L_L Q_T$, noting that $L_L$ does not have any zeroes on the diagonal. Letting $L = \left( L_L \mid 0 \right)$ be $m \times n$ and unitary $Q = \begin{pmatrix} Q_T \\ Q_B \end{pmatrix}$, reason that $A = L Q$.
Don't overthink the problem: use results you have seen before.
Answer: We know that $A^H \in \mathbb{C}^{n \times m}$ has linearly independent columns (and $m \leq n$). Hence there exist a unitary matrix $\check{Q} = \left( \check{Q}_L \mid \check{Q}_R \right)$ and $R = \begin{pmatrix} R_T \\ 0 \end{pmatrix}$, with upper triangular matrix $R_T$ that has no zeroes on its diagonal, such that $A^H = \check{Q}_L R_T$. It is easy to see that the desired $Q_T$ equals $Q_T = \check{Q}_L^H$ and the desired $L_L$ equals $R_T^H$.
* BACK TO TEXT
Homework 8.3 Let $A \in \mathbb{C}^{m \times n}$ with $m < n$ have linearly independent rows. Consider
\[ \| A x - y \|_2 = \min_z \| A z - y \|_2 . \]
Use the fact that $A = L_L Q_T$, where $L_L \in \mathbb{C}^{m \times m}$ is lower triangular and $Q_T$ has orthonormal rows, to argue that any vector of the form $Q_T^H L_L^{-1} y + Q_B^H w_B$ (where $w_B$ is any vector in $\mathbb{C}^{n-m}$) is a solution to the LLS problem. Here $Q = \begin{pmatrix} Q_T \\ Q_B \end{pmatrix}$.
Answer:
\[
\min_z \| A z - y \|_2 = \min_z \| L Q z - y \|_2
= \min_w \| L w - y \|_2 \quad (z = Q^H w)
= \min_w \left\| \left( L_L \mid 0 \right) \begin{pmatrix} w_T \\ w_B \end{pmatrix} - y \right\|_2
= \min_w \| L_L w_T - y \|_2 .
\]
Thus $w_T = L_L^{-1} y$ while $w_B$ can be chosen arbitrarily, so that
\[
z = Q^H w = \begin{pmatrix} Q_T \\ Q_B \end{pmatrix}^H \begin{pmatrix} L_L^{-1} y \\ w_B \end{pmatrix} = Q_T^H L_L^{-1} y + Q_B^H w_B
\]
describes all solutions to the LLS problem.
* BACK TO TEXT
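The characterization above can be checked numerically: an LQ factorization of $A$ is obtained from the QR factorization of $A^H$, and $Q_T^H L_L^{-1} y$ (the $w_B = 0$ member of the family) is the minimum-norm solution. A Python/NumPy sketch, assuming real arithmetic:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 6                      # m < n; random rows are linearly independent
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# LQ factorization of A from the QR factorization of A^T:
# A^T = Qc R_T  =>  A = L_L Q_T with L_L = R_T^T and Q_T = Qc^T.
Qc, RT = np.linalg.qr(A.T)       # reduced QR: Qc is n x m, RT is m x m
LL = RT.T                        # lower triangular, m x m
QT = Qc.T                        # m x n with orthonormal rows

# Minimum-norm solution Q_T^T L_L^{-1} y (w_B = 0).
x = QT.T @ np.linalg.solve(LL, y)
```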
Homework 8.4 Continuing Exercise 8.2, use Figure 8.1 to give a Classical Gram-Schmidt inspired al-
gorithm for computing LL and QT . (The best way to check you got the algorithm right is to implement
it!)
Answer:
* BACK TO TEXT
Homework 8.5 Continuing Exercise 8.2, use Figure 8.2 to give a Householder QR factorization inspired
algorithm for computing L and Q, leaving L in the lower triangular part of A and Q stored as Householder
vectors above the diagonal of A. (The best way to check you got the algorithm right is to implement it!)
Answer:
* BACK TO TEXT
Chapter 9. Notes on the Condition of a Problem (Answers)
Homework 9.1 Show that, if $A$ is a nonsingular matrix, for a consistent matrix norm, $\kappa(A) \geq 1$.
Answer: Hmmm, I need to go and check what the exact definition of a consistent matrix norm is here... What I mean is a matrix norm induced by a vector norm. The reason is that then $\| I \| = 1$.
Let $\| \cdot \|$ be the norm that is used to define $\kappa(A) = \| A \| \| A^{-1} \|$. Then
\[ 1 = \| I \| = \| A A^{-1} \| \leq \| A \| \| A^{-1} \| = \kappa(A) . \]
* BACK TO TEXT
Homework 9.2 If $A$ has linearly independent columns, show that $\| (A^H A)^{-1} A^H \|_2 = 1/\sigma_{n-1}$, where $\sigma_{n-1}$ equals the smallest singular value of $A$. Hint: Use the SVD of $A$.
Answer: Let $A = U \Sigma V^H$ be the reduced SVD of $A$. Then
\[
(A^H A)^{-1} A^H = (V \Sigma^2 V^H)^{-1} V \Sigma U^H = V \Sigma^{-2} V^H V \Sigma U^H = V \Sigma^{-1} U^H,
\]
so that $\| (A^H A)^{-1} A^H \|_2 = \| \Sigma^{-1} \|_2 = 1/\sigma_{n-1}$ (since the two norm of a diagonal matrix equals its largest diagonal element in absolute value).
* BACK TO TEXT
Homework 9.3 Let $A$ have linearly independent columns. Show that $\kappa_2(A^H A) = \kappa_2(A)^2$.
Answer: Let $A = U \Sigma V^H$ be the reduced SVD of $A$. Then $A^H A = V \Sigma^2 V^H$, so that
\[ \kappa_2(A^H A) = \sigma_0^2 / \sigma_{n-1}^2 = \left( \sigma_0 / \sigma_{n-1} \right)^2 = \kappa_2(A)^2 . \]
* BACK TO TEXT
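The last three homeworks are easy to check numerically; a Python/NumPy sketch (random real data, an assumption of this illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))   # random: columns linearly independent

s = np.linalg.svd(A, compute_uv=False)      # singular values, descending
kappa_A = s[0] / s[-1]                      # kappa_2(A) = sigma_0 / sigma_{n-1}
kappa_AtA = np.linalg.cond(A.T @ A)         # kappa_2(A^H A)
# ||(A^H A)^{-1} A^H||_2, which should equal 1/sigma_{n-1}:
pinv_norm = np.linalg.norm(np.linalg.inv(A.T @ A) @ A.T, 2)
```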
• Show that Ax = y if and only if AH Ax = AH y.
Answer: Since A has linearly independent columns and is square, we know that A−1 and A−H exist.
If Ax = y, then multiplying both sides by AH yields AH Ax = AH y. If AH Ax = AH y then multiplying
both sides by A−H yields Ax = y.
• Reason that using the method of normal equations to solve Ax = y has a condition number of κ2 (A)2 .
* BACK TO TEXT
Homework 9.6 Characterize the set of all square matrices A with κ2 (A) = 1.
Answer: If κ2 (A) = 1 then σ0 /σn−1 = 1 which means that σ0 = σn−1 and hence σ0 = σ1 = · · · = σn−1 .
Hence the singular value decomposition of A must equal A = UΣV H = σ0UV H . But UV H is unitary since
U and V H are, and hence we conclude that A must be a nonzero multiple of a unitary matrix.
* BACK TO TEXT
Homework 9.7 Let $\| \cdot \|$ be a vector norm with corresponding induced matrix norm. If $A \in \mathbb{C}^{n \times n}$ is nonsingular and
\[ \frac{\| \Delta A \|}{\| A \|} < \frac{1}{\kappa(A)} \tag{9.1} \]
then $A + \Delta A$ is nonsingular.
Answer: Equation (9.1) can be rewritten as k∆AkkA−1 k < 1. We will prove that if A + ∆A is singular,
then k∆AkkA−1 k ≥ 1.
A + ∆A is singular
⇒ < equivalent condition >
(A + ∆A)y = 0 for some y 6= 0
⇒ < linear algebra >
y = −A−1 ∆Ay
⇒ < take norms >
kyk = kA−1 ∆Ayk
⇒ < property of induced matrix norms >
kyk ≤ kA−1 kk∆Akkyk
⇒ < kyk > 0 >
1 ≤ kA−1 kk∆Ak.
* BACK TO TEXT
Homework 9.8 Let $A \in \mathbb{C}^{n \times n}$ be nonsingular, $A x = y$, and $(A + \Delta A)(x + \delta x) = y$. Show that (for induced norm)
\[
\frac{\| \delta x \|}{\| x + \delta x \|} \leq \kappa(A) \frac{\| \Delta A \|}{\| A \|} .
\]
Answer: $(A + \Delta A)(x + \delta x) = y$ can be rewritten as $A x + A \delta x + \Delta A (x + \delta x) = y$. Since $A x = y$ we find that $\delta x = -A^{-1} \Delta A (x + \delta x)$ and hence
\[ \| \delta x \| \leq \| A^{-1} \| \| \Delta A \| \| x + \delta x \| . \]
Thus
\[
\frac{\| \delta x \|}{\| x + \delta x \|} \leq \| A^{-1} \| \| \Delta A \|
= \| A \| \| A^{-1} \| \frac{\| \Delta A \|}{\| A \|} = \kappa(A) \frac{\| \Delta A \|}{\| A \|} .
\]
* BACK TO TEXT
Homework 9.9 Let $A \in \mathbb{C}^{n \times n}$ be nonsingular, $\| \Delta A \| / \| A \| < 1/\kappa(A)$, $y \neq 0$, $A x = y$, and $(A + \Delta A)(x + \delta x) = y$. Show that (for induced norm)
\[
\frac{\| \delta x \|}{\| x \|} \leq \frac{\kappa(A) \frac{\| \Delta A \|}{\| A \|}}{1 - \kappa(A) \frac{\| \Delta A \|}{\| A \|}} .
\]
Answer:
* BACK TO TEXT
Homework 9.10 Let $A \in \mathbb{C}^{n \times n}$ be nonsingular, $\| \Delta A \| / \| A \| < 1/\kappa(A)$, $y \neq 0$, $A x = y$, and $(A + \Delta A)(x + \delta x) = y + \delta y$. Show that (for induced norm)
\[
\frac{\| \delta x \|}{\| x \|} \leq \frac{\kappa(A) \left( \frac{\| \Delta A \|}{\| A \|} + \frac{\| \delta y \|}{\| y \|} \right)}{1 - \kappa(A) \frac{\| \Delta A \|}{\| A \|}} .
\]
Answer:
$(A + \Delta A)(x + \delta x) = y + \delta y$ is equivalent to
\[ \delta x = A^{-1} \left( \delta y - \Delta A x - \Delta A \delta x \right) . \]
Taking the norm on both sides yields
\[ \| \delta x \| \leq \| A^{-1} \| \left( \| \delta y \| + \| \Delta A \| \| x \| + \| \Delta A \| \| \delta x \| \right) . \]
Dividing by $\| x \|$ and using the fact that $\| \delta y \| / \| x \| \leq \| A \| \| \delta y \| / \| y \|$, since $\| y \| = \| A x \| \leq \| A \| \| x \|$, gives
\[
\frac{\| \delta x \|}{\| x \|} \leq \kappa(A) \left( \frac{\| \delta y \|}{\| y \|} + \frac{\| \Delta A \|}{\| A \|} \right) + \kappa(A) \frac{\| \Delta A \|}{\| A \|} \frac{\| \delta x \|}{\| x \|} .
\]
Bringing the terms that involve $\delta x$ to the left and rearranging yields the desired result.
* BACK TO TEXT
Chapter 10. Notes on the Stability of an Algorithm (Answers)
Homework 10.3 Assume a floating point number system with $\beta = 2$ and a mantissa with $t$ digits, so that a typical positive number is written as $.d_0 d_1 \ldots d_{t-1} \times 2^e$, with $d_i \in \{0, 1\}$.
• How is the number 1 represented in this system?
Answer: $.\underbrace{10 \cdots 0}_{t\ \mathrm{digits}} \times 2^1$.
• What is the largest positive real number $u$ (represented as a binary fraction) such that the floating point representation of $1 + u$ equals the floating point representation of 1? (Assume rounded arithmetic.)
Answer:
\[
.\underbrace{10 \cdots 0}_{t\ \mathrm{digits}} \times 2^1 + .\underbrace{00 \cdots 0}_{t\ \mathrm{digits}} 1 \times 2^1
= .\underbrace{10 \cdots 0}_{t\ \mathrm{digits}} 1 \times 2^1,
\]
which rounds to $.\underbrace{10 \cdots 0}_{t\ \mathrm{digits}} \times 2^1 = 1$. It is not hard to see that any larger number rounds to $.\underbrace{10 \cdots 01}_{t\ \mathrm{digits}} \times 2^1 > 1$.
* BACK TO TEXT
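In IEEE double precision ($\beta = 2$, $t = 53$) the same reasoning gives $u = 2^{-53}$; a short Python sketch:

```python
import numpy as np

# For float64, t = 53 bits, so u = 2^-53 still rounds 1 + u back to 1
# (round-to-nearest-even), while anything slightly larger does not.
u = 2.0 ** -53

one_plus_u = 1.0 + u                          # rounds back to exactly 1.0
slightly_more = 1.0 + u * (1 + 2.0 ** -10)    # rounds up to the next float
eps_machine = np.finfo(np.float64).eps        # machine epsilon = 2^-52 = 2u
```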
Homework 10.12 Prove Theorem 10.11.
Answer:
Show that if $|A| \leq |B|$ then $\| A \|_1 \leq \| B \|_1$:
Let
\[ A = \left( a_0 \mid \cdots \mid a_{n-1} \right) \quad \text{and} \quad B = \left( b_0 \mid \cdots \mid b_{n-1} \right) . \]
Then
\[
\| A \|_1 = \max_{0 \leq j < n} \| a_j \|_1 = \max_{0 \leq j < n} \left( \sum_{i=0}^{m-1} |\alpha_{i,j}| \right)
= \sum_{i=0}^{m-1} |\alpha_{i,k}| \quad \text{(where $k$ is the index that maximizes)}
\]
\[
\leq \sum_{i=0}^{m-1} |\beta_{i,k}| \quad \text{(since $|A| \leq |B|$)}
\leq \max_{0 \leq j < n} \left( \sum_{i=0}^{m-1} |\beta_{i,j}| \right) = \max_{0 \leq j < n} \| b_j \|_1 = \| B \|_1 .
\]
$\kappa := ((\chi_0 \psi_0 + \chi_1 \psi_1) + \chi_2 \psi_2),$
Case 1: $\prod_{i=0}^{n} (1 + \epsilon_i)^{\pm 1} = \left( \prod_{i=0}^{n-1} (1 + \epsilon_i)^{\pm 1} \right) (1 + \epsilon_n)$. By the I.H. there exists a $\theta_n$ such that $(1 + \theta_n) = \prod_{i=0}^{n-1} (1 + \epsilon_i)^{\pm 1}$ and $|\theta_n| \leq n u / (1 - n u)$. Then
\[
\left( \prod_{i=0}^{n-1} (1 + \epsilon_i)^{\pm 1} \right) (1 + \epsilon_n) = (1 + \theta_n)(1 + \epsilon_n)
= 1 + \underbrace{\theta_n + \epsilon_n + \theta_n \epsilon_n}_{\theta_{n+1}},
\]
* BACK TO TEXT
\[
\gamma_n + \gamma_b + \gamma_n \gamma_b
= \frac{n u}{1 - n u} + \frac{b u}{1 - b u} + \frac{n u}{(1 - n u)} \frac{b u}{(1 - b u)}
= \frac{n u (1 - b u) + (1 - n u) b u + b n u^2}{(1 - n u)(1 - b u)}
\]
\[
= \frac{n u - b n u^2 + b u - b n u^2 + b n u^2}{1 - (n + b) u + b n u^2}
= \frac{(n + b) u - b n u^2}{1 - (n + b) u + b n u^2}
\leq \frac{(n + b) u}{1 - (n + b) u + b n u^2}
\leq \frac{(n + b) u}{1 - (n + b) u} = \gamma_{n+b} .
\]
* BACK TO TEXT
Homework 10.19 Let $k \geq 0$ and assume that $|\epsilon_1|, |\epsilon_2| \leq u$, with $\epsilon_1 = 0$ if $k = 0$. Show that
\[
\begin{pmatrix} I + \Sigma^{(k)} & 0 \\ 0 & (1 + \epsilon_1) \end{pmatrix} (1 + \epsilon_2) = (I + \Sigma^{(k+1)}) .
\]
Hint: reason the case where $k = 0$ separately from the case where $k > 0$.
Answer:
Case: $k = 0$. Then
\[
\begin{pmatrix} I + \Sigma^{(k)} & 0 \\ 0 & (1 + \epsilon_1) \end{pmatrix} (1 + \epsilon_2)
= (1 + 0)(1 + \epsilon_2) = (1 + \epsilon_2) = (1 + \theta_1) = (I + \Sigma^{(1)}) .
\]
Case: $k = 1$. Then
\[
\begin{pmatrix} I + \Sigma^{(k)} & 0 \\ 0 & (1 + \epsilon_1) \end{pmatrix} (1 + \epsilon_2)
= \begin{pmatrix} 1 + \theta_1 & 0 \\ 0 & (1 + \epsilon_1) \end{pmatrix} (1 + \epsilon_2)
= \begin{pmatrix} (1 + \theta_1)(1 + \epsilon_2) & 0 \\ 0 & (1 + \epsilon_1)(1 + \epsilon_2) \end{pmatrix}
= \begin{pmatrix} (1 + \theta_2) & 0 \\ 0 & (1 + \theta_2) \end{pmatrix} = (I + \Sigma^{(2)}) .
\]
Case: $k > 1$. Notice that
\[
(I + \Sigma^{(k)})(1 + \epsilon_2)
= \mathrm{diag}((1 + \theta_k), (1 + \theta_k), (1 + \theta_{k-1}), \cdots, (1 + \theta_2)) (1 + \epsilon_2)
= \mathrm{diag}((1 + \theta_{k+1}), (1 + \theta_{k+1}), (1 + \theta_k), \cdots, (1 + \theta_3)) .
\]
Then
\[
\begin{pmatrix} I + \Sigma^{(k)} & 0 \\ 0 & (1 + \epsilon_1) \end{pmatrix} (1 + \epsilon_2)
= \begin{pmatrix} (I + \Sigma^{(k)})(1 + \epsilon_2) & 0 \\ 0 & (1 + \epsilon_1)(1 + \epsilon_2) \end{pmatrix}
= \begin{pmatrix} (I + \Sigma^{(k)})(1 + \epsilon_2) & 0 \\ 0 & (1 + \theta_2) \end{pmatrix}
= (I + \Sigma^{(k+1)}) .
\]
* BACK TO TEXT
(Note: strictly speaking, one should probably treat the case $n = 1$ separately.)
* BACK TO TEXT
Homework 10.25 In the above theorem, could one instead prove the result
y̌ = A(x + δx),
where δx is “small”?
Answer: The answer is "no". The reason is that for each individual element of $y$ a perturbation can be found: $\check{\psi}_i$ equals the exact result for a slightly perturbed $x$. However, the $\delta x$ for each entry $\check{\psi}_i$ is different, meaning that we cannot factor out a single $x + \delta x$ to find that $\check{y} = A (x + \delta x)$.
* BACK TO TEXT
Homework 10.27 In the above theorem, could one instead prove the result
Č = (A + ∆A)(B + ∆B),
Chapter 11. Notes on Performance (Answers)
No exercises to answer yet.
Chapter 12. Notes on Gaussian Elimination and LU Factorization (Answers)
Answer:
\[
\begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{pmatrix}
\begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{pmatrix}
= \begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} + l_{21} & I \end{pmatrix}
= \begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & I \end{pmatrix} .
\]
* BACK TO TEXT
Homework 12.7 Let $\tilde{L}_k = L_0 L_1 \ldots L_k$. Assume that $\tilde{L}_{k-1}$ has the form
\[
\tilde{L}_{k-1} = \begin{pmatrix} L_{00} & 0 & 0 \\ l_{10}^T & 1 & 0 \\ L_{20} & 0 & I \end{pmatrix},
\]
where $L_{00}$ is $k \times k$. Show that $\tilde{L}_k$ is given by
\[
\tilde{L}_k = \begin{pmatrix} L_{00} & 0 & 0 \\ l_{10}^T & 1 & 0 \\ L_{20} & l_{21} & I \end{pmatrix} .
\quad \left( \text{Recall: } \hat{L}_k = \begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{pmatrix} . \right)
\]
Answer:
\[
\tilde{L}_k = \tilde{L}_{k-1} L_k = \tilde{L}_{k-1} \hat{L}_k^{-1}
= \begin{pmatrix} L_{00} & 0 & 0 \\ l_{10}^T & 1 & 0 \\ L_{20} & 0 & I \end{pmatrix}
\begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{pmatrix}
= \begin{pmatrix} L_{00} & 0 & 0 \\ l_{10}^T & 1 & 0 \\ L_{20} & l_{21} & I \end{pmatrix} .
\]
* BACK TO TEXT
function [ A_out ] = LU_unb_var5( A )
%------------------------------------------------------------%
% Right-looking LU (Variant 5); body reconstructed (the
% original listing was lost in extraction).
%------------------------------------------------------------%
n = size( A, 1 );
for k = 1:n-1
A( k+1:end, k ) = A( k+1:end, k ) / A( k, k );
A( k+1:end, k+1:end ) = A( k+1:end, k+1:end ) - A( k+1:end, k ) * A( k, k+1:end );
end
A_out = A;
end
Homework 12.12 Implement LU factorization with partial pivoting with the FLAME@lab API, in M-
script.
Answer: See Figure 12.7.
* BACK TO TEXT
Homework 12.13 Derive an algorithm for solving Ux = y, overwriting y with the solution, that casts most
computation in terms of DOT products. Hint: Partition
\[ U \rightarrow \begin{pmatrix} \upsilon_{11} & u_{12}^T \\ 0 & U_{22} \end{pmatrix} . \]
Call this Variant 1 and use Figure 12.6 to state the algorithm.
Algorithm: Solve $U z = y$, overwriting $y$ (Variant 1):
Partition $U \rightarrow \begin{pmatrix} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{pmatrix}$, $y \rightarrow \begin{pmatrix} y_T \\ y_B \end{pmatrix}$, where $U_{BR}$ is $0 \times 0$ and $y_B$ has 0 rows.
while $m(U_{BR}) < m(U)$ do
Repartition $\begin{pmatrix} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} U_{00} & u_{01} & U_{02} \\ u_{10}^T & \upsilon_{11} & u_{12}^T \\ U_{20} & u_{21} & U_{22} \end{pmatrix}$, $\begin{pmatrix} y_T \\ y_B \end{pmatrix} \rightarrow \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix}$
$\psi_1 := \psi_1 - u_{12}^T y_2$
$\psi_1 := \psi_1 / \upsilon_{11}$
Continue with the corresponding $\leftarrow$ repartitioning
endwhile

Algorithm: Solve $U z = y$, overwriting $y$ (Variant 2):
(same partitioning and repartitioning)
$\psi_1 := \psi_1 / \upsilon_{11}$
$y_0 := y_0 - \psi_1 u_{01}$

Figure 12.6 (Answer): Algorithms for the solution of upper triangular system $U x = y$ that overwrite $y$ with $x$.
Answer:
Partition
\[
\begin{pmatrix} \upsilon_{11} & u_{12}^T \\ 0 & U_{22} \end{pmatrix}
\begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix}
= \begin{pmatrix} \psi_1 \\ y_2 \end{pmatrix} .
\]
Multiplying this out yields
\[
\begin{pmatrix} \upsilon_{11} \chi_1 + u_{12}^T x_2 \\ U_{22} x_2 \end{pmatrix}
= \begin{pmatrix} \psi_1 \\ y_2 \end{pmatrix} .
\]
So, if we assume that $x_2$ has already been computed and has overwritten $y_2$, then $\chi_1$ can be computed as
\[ \chi_1 = (\psi_1 - u_{12}^T x_2)/\upsilon_{11}, \]
which can then overwrite $\psi_1$. The resulting algorithm is given in Figure 12.6 (Answer) (left).
* BACK TO TEXT
Homework 12.14 Derive an algorithm for solving Ux = y, overwriting y with the solution, that casts most
computation in terms of AXPY operations. Call this Variant 2 and use Figure 12.6 to state the algorithm.
Answer: Partition
\[
\begin{pmatrix} U_{00} & u_{01} \\ 0 & \upsilon_{11} \end{pmatrix}
\begin{pmatrix} x_0 \\ \chi_1 \end{pmatrix}
= \begin{pmatrix} y_0 \\ \psi_1 \end{pmatrix} .
\]
Multiplying this out yields
\[
\begin{pmatrix} U_{00} x_0 + u_{01} \chi_1 \\ \upsilon_{11} \chi_1 \end{pmatrix}
= \begin{pmatrix} y_0 \\ \psi_1 \end{pmatrix} .
\]
So, $\chi_1 = \psi_1 / \upsilon_{11}$, after which $x_0$ can be computed by solving $U_{00} x_0 = y_0 - \chi_1 u_{01}$. The resulting algorithm is given in Figure 12.6 (Answer) (right).
* BACK TO TEXT
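Both variants can be sketched in Python/NumPy (loop-based for clarity; the notes state them in FLAME notation):

```python
import numpy as np

def back_subst_dot(U, y):
    """Variant 1: each chi_1 comes from a dot product with the
    already-computed entries below it."""
    n = y.shape[0]
    x = y.copy()
    for i in range(n - 1, -1, -1):
        # psi1 := (psi1 - u12^T x2) / upsilon11
        x[i] = (x[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

def back_subst_axpy(U, y):
    """Variant 2: scale, then subtract chi_1 * u01 from the part above (AXPY)."""
    n = y.shape[0]
    x = y.copy()
    for i in range(n - 1, -1, -1):
        x[i] /= U[i, i]              # chi1 := psi1 / upsilon11
        x[:i] -= x[i] * U[:i, i]     # y0 := y0 - chi1 * u01
    return x

rng = np.random.default_rng(5)
n = 6
U = np.triu(rng.standard_normal((n, n))) + 4 * np.eye(n)  # well-conditioned
y = rng.standard_normal(n)
x1 = back_subst_dot(U, y)
x2 = back_subst_axpy(U, y)
```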
Homework 12.15 If $A$ is an $n \times n$ matrix, show that the cost of Variant 1 is approximately $\frac{2}{3} n^3$ flops.
Answer: During the $k$th iteration, $L_{00}$ is $k \times k$, for $k = 0, \ldots, n-1$. Then the (approximate) costs of the steps are given by
• Solve $L_{00} u_{01} = a_{01}$ for $u_{01}$, overwriting $a_{01}$ with the result. Cost: $k^2$ flops.
• Solve $l_{10}^T U_{00} = a_{10}^T$ (or, equivalently, $U_{00}^T (l_{10}^T)^T = (a_{10}^T)^T$ for $l_{10}^T$), overwriting $a_{10}^T$ with the result. Cost: $k^2$ flops.
• Compute $\upsilon_{11} = \alpha_{11} - l_{10}^T u_{01}$, overwriting $\alpha_{11}$ with the result. Cost: $2k$ flops.
Thus the total cost is approximately
\[ \sum_{k=0}^{n-1} (2 k^2 + 2 k) \approx 2 \int_0^n x^2 dx = \frac{2}{3} n^3 \text{ flops.} \]
* BACK TO TEXT
Homework 12.16 Derive the up-looking variant for computing the LU factorization.
Answer: Consider the loop invariant:
\[
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}
= \begin{pmatrix} L \backslash U_{TL} & U_{TR} \\ \hat{A}_{BL} & \hat{A}_{BR} \end{pmatrix}
\wedge \; L_{TL} U_{TL} = \hat{A}_{TL} \; \wedge \; L_{TL} U_{TR} = \hat{A}_{TR} .
\]
Partitioning to expose the next row yields the conditions
\[
\begin{array}{lll}
L_{00} U_{00} = \hat{A}_{00} & L_{00} u_{01} = \hat{a}_{01} & L_{00} U_{02} = \hat{A}_{02} \\
l_{10}^T U_{00} = \hat{a}_{10}^T & l_{10}^T u_{01} + \upsilon_{11} = \hat{\alpha}_{11} & l_{10}^T U_{02} + u_{12}^T = \hat{a}_{12}^T \\
L_{20} U_{00} = \hat{A}_{20} & L_{20} u_{01} + \upsilon_{11} l_{21} = \hat{a}_{21} & L_{20} U_{02} + l_{21} u_{12}^T + L_{22} U_{22} = \hat{A}_{22}
\end{array}
\]
The equalities in the middle row can be used to compute the desired parts of $L$ and $U$:
• Solve $l_{10}^T U_{00} = a_{10}^T$ for $l_{10}^T$, overwriting $a_{10}^T$ with the result.
• Compute $\upsilon_{11} = \alpha_{11} - l_{10}^T u_{01}$, overwriting $\alpha_{11}$ with the result.
• Compute $u_{12}^T = a_{12}^T - l_{10}^T U_{02}$, overwriting $a_{12}^T$ with the result.
* BACK TO TEXT
Homework 12.17 Implement all five LU factorization algorithms with the FLAME@lab API, in M-
script.
* BACK TO TEXT
Homework 12.18 Which of the five variants can be modified to incorporate partial pivoting?
Answer: Variants 2, 4, and 5.
* BACK TO TEXT
\[
A = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 & 1 \\
-1 & 1 & 0 & \cdots & 0 & 1 \\
-1 & -1 & 1 & \cdots & 0 & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
-1 & -1 & \cdots & & 1 & 1 \\
-1 & -1 & \cdots & & -1 & 1
\end{pmatrix} .
\]
Eliminating the entries below the diagonal in the second column yields:
\[
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 & 1 \\
0 & 1 & 0 & \cdots & 0 & 2 \\
0 & 0 & 1 & \cdots & 0 & 4 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & & 1 & 4 \\
0 & 0 & \cdots & & -1 & 4
\end{pmatrix} .
\]
Eliminating the entries below the diagonal in the $(n-1)$st column yields:
\[
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 & 1 \\
0 & 1 & 0 & \cdots & 0 & 2 \\
0 & 0 & 1 & \cdots & 0 & 4 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & & 1 & 2^{n-2} \\
0 & 0 & \cdots & & 0 & 2^{n-1}
\end{pmatrix} .
\]
* BACK TO TEXT
Show that
\[
L^{-1} = \begin{pmatrix} L_{00}^{-1} & 0 \\ -l_{10}^T L_{00}^{-1} & 1 \end{pmatrix} .
\]
Answer:
Answer 1:
\[
\begin{pmatrix} L_{00} & 0 \\ l_{10}^T & 1 \end{pmatrix}
\begin{pmatrix} L_{00}^{-1} & 0 \\ -l_{10}^T L_{00}^{-1} & 1 \end{pmatrix}
= \begin{pmatrix} L_{00} L_{00}^{-1} & 0 \\ l_{10}^T L_{00}^{-1} - l_{10}^T L_{00}^{-1} & 1 \end{pmatrix}
= \begin{pmatrix} I & 0 \\ 0 & 1 \end{pmatrix} .
\]
Answer 2: We know that the inverse of a unit lower triangular matrix is unit lower triangular. Let's call the inverse matrix $X$. Then $L X = I$ and hence
\[
\begin{pmatrix} L_{00} & 0 \\ l_{10}^T & 1 \end{pmatrix}
\begin{pmatrix} X_{00} & 0 \\ x_{10}^T & 1 \end{pmatrix}
= \begin{pmatrix} I & 0 \\ 0 & 1 \end{pmatrix} .
\]
* BACK TO TEXT
Algorithm: $[L] :=$ LINV_UNB_VAR1$(L)$
Partition $L \rightarrow \begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix}$ where $L_{TL}$ is $0 \times 0$
while $m(L_{TL}) < m(L)$ do
Repartition $\begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} L_{00} & l_{01} & L_{02} \\ l_{10}^T & \lambda_{11} & l_{12}^T \\ L_{20} & l_{21} & L_{22} \end{pmatrix}$ where $\lambda_{11}$ is $1 \times 1$

$l_{10}^T := -l_{10}^T L_{00}$

Continue with $\begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix} \leftarrow \begin{pmatrix} L_{00} & l_{01} & L_{02} \\ l_{10}^T & \lambda_{11} & l_{12}^T \\ L_{20} & l_{21} & L_{22} \end{pmatrix}$
endwhile

Figure 12.8: Algorithm for inverting a unit lower triangular matrix. The algorithm assumes that in previous iterations $L_{00}$ has been overwritten with $L_{00}^{-1}$, so that the update $l_{10}^T := -l_{10}^T L_{00}$ computes $-l_{10}^T L_{00}^{-1}$.
Homework 12.25 Use Homework 12.24 and Figure 12.12 to propose an algorithm for overwriting a unit
lower triangular matrix L with its inverse.
Answer: See Figure 12.8
* BACK TO TEXT
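A Python/NumPy sketch of the resulting algorithm (row-by-row, using the already-inverted leading block; the function name `linv_unb` is made up for this illustration):

```python
import numpy as np

def linv_unb(L):
    """Invert a unit lower triangular matrix, sweeping forward: when the
    leading k x k block of the work array already holds its inverse, the
    next row is updated as l10^T := -l10^T * (L00 inverse)."""
    A = L.copy()
    n = A.shape[0]
    for k in range(n):
        # A[:k, :k] holds inv(L[:k, :k]); A[k, :k] still holds l10^T.
        A[k, :k] = -A[k, :k] @ A[:k, :k]
    return A

rng = np.random.default_rng(6)
n = 5
L = np.tril(rng.standard_normal((n, n)), -1) + np.eye(n)  # unit lower triangular
Linv = linv_unb(L)
```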
Homework 12.26 Show that the cost of the algorithm in Homework 12.25 is, approximately, $\frac{1}{3} n^3$ flops, where $L$ is $n \times n$.
Answer: Assume that $L_{00}$ is $k \times k$. Then the cost of the current iteration is, approximately, $k^2$ flops (the cost of multiplying a lower triangular matrix times a vector). Thus, the total cost is, approximately,
\[ \sum_{k=0}^{n-1} k^2 \approx \int_0^n x^2 dx = \frac{1}{3} n^3 . \]
* BACK TO TEXT
Homework 12.27 Given $n \times n$ upper triangular matrix $U$ and $n \times n$ unit lower triangular matrix $L$, propose an algorithm for computing $Y$ where $U Y = L$. Be sure to take advantage of zeroes in $U$ and $L$. The cost of the resulting algorithm should be, approximately, $n^3$ flops.
There is a way of, given that $L$ and $U$ have overwritten matrix $A$, overwriting this matrix with $Y$. Try to find that algorithm. It is not easy...
Answer:
Variant 1: Partition
\[
U \rightarrow \begin{pmatrix} \upsilon_{11} & u_{12}^T \\ 0 & U_{22} \end{pmatrix}, \quad
L \rightarrow \begin{pmatrix} \lambda_{11} & 0 \\ l_{21} & L_{22} \end{pmatrix}, \quad \text{and} \quad
Y \rightarrow \begin{pmatrix} \psi_{11} & y_{12}^T \\ y_{21} & Y_{22} \end{pmatrix} .
\]
Then, since $U Y = L$,
\[
\begin{pmatrix} \upsilon_{11} & u_{12}^T \\ 0 & U_{22} \end{pmatrix}
\begin{pmatrix} \psi_{11} & y_{12}^T \\ y_{21} & Y_{22} \end{pmatrix}
= \begin{pmatrix} \lambda_{11} & 0 \\ l_{21} & L_{22} \end{pmatrix}
\]
and hence
\[
\begin{pmatrix} \upsilon_{11} \psi_{11} + u_{12}^T y_{21} & \upsilon_{11} y_{12}^T + u_{12}^T Y_{22} \\ U_{22} y_{21} & U_{22} Y_{22} \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ l_{21} & L_{22} \end{pmatrix} .
\]
This suggests the following steps. We also give the approximate cost of the step when $U_{22}$ is $k \times k$.
Algorithm: $[Y] :=$ SOLVE_UY_EQ_L_UNB_VAR1$(U, L, Y)$
Partition $U \rightarrow \begin{pmatrix} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{pmatrix}$, $L \rightarrow \begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix}$, $Y \rightarrow \begin{pmatrix} Y_{TL} & Y_{TR} \\ Y_{BL} & Y_{BR} \end{pmatrix}$
where $U_{BR}$, $L_{BR}$, and $Y_{BR}$ are $0 \times 0$
while $m(U_{BR}) < m(U)$ do
Repartition
$\begin{pmatrix} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} U_{00} & u_{01} & U_{02} \\ u_{10}^T & \upsilon_{11} & u_{12}^T \\ U_{20} & u_{21} & U_{22} \end{pmatrix}$,
$\begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} L_{00} & l_{01} & L_{02} \\ l_{10}^T & \lambda_{11} & l_{12}^T \\ L_{20} & l_{21} & L_{22} \end{pmatrix}$,
$\begin{pmatrix} Y_{TL} & Y_{TR} \\ Y_{BL} & Y_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} Y_{00} & y_{01} & Y_{02} \\ y_{10}^T & \psi_{11} & y_{12}^T \\ Y_{20} & y_{21} & Y_{22} \end{pmatrix}$
where $\upsilon_{11}$, $\lambda_{11}$, and $\psi_{11}$ are $1 \times 1$

(update steps to be filled in)

Continue with the corresponding $\leftarrow$ repartitionings
endwhile
Homework 13.3 Let $B \in \mathbb{C}^{m \times n}$ have linearly independent columns. Prove that $A = B^H B$ is HPD.
Answer: Let $x \in \mathbb{C}^n$ be a nonzero vector. Then $x^H B^H B x = (B x)^H (B x)$. Since $B$ has linearly independent columns we know that $B x \neq 0$. Hence $x^H A x = (B x)^H (B x) = \| B x \|_2^2 > 0$.
* BACK TO TEXT
Homework 13.4 Let $A \in \mathbb{C}^{m \times m}$ be HPD. Show that its diagonal elements are real and positive.
Answer: Let $e_j$ be the $j$th unit basis vector. Then $0 < e_j^H A e_j = \alpha_{j,j}$.
* BACK TO TEXT
Homework 13.15 Consider B ∈ Cm×n with linearly independent columns. Recall that B has a QR fac-
torization, B = QR where Q has orthonormal columns and R is an upper triangular matrix with positive
diagonal elements. How are the Cholesky factorization of BH B and the QR factorization of B related?
Answer:
\[
B^H B = (Q R)^H Q R = R^H \underbrace{Q^H Q}_{I} R = \underbrace{R^H}_{L} \underbrace{R}_{L^H} .
\]
* BACK TO TEXT
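In other words, the Cholesky factor of $B^H B$ is $R^H$. A Python/NumPy sketch (real arithmetic; the signs of $R$'s diagonal are flipped to match the positive-diagonal convention assumed here, since NumPy's QR does not guarantee it):

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((7, 4))      # random: columns linearly independent

Q, R = np.linalg.qr(B)
# Flip signs so R has a positive diagonal: B = (Q S)(S R) with S = S^{-1}.
S = np.diag(np.sign(np.diag(R)))
Q, R = Q @ S, S @ R

Lchol = np.linalg.cholesky(B.T @ B)  # lower Cholesky factor of B^T B
```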
(Hint: For this exercise, use techniques similar to those in Section 13.5.)
2. Assuming that $A_{00} = L_{00} L_{00}^T$, where $L_{00}$ is lower triangular and nonsingular, argue that the assignment $l_{10}^T := a_{10}^T L_{00}^{-T}$ is well-defined.
Answer: The computation is well-defined because $L_{00}^{-1}$ exists and hence $l_{10}^T := a_{10}^T L_{00}^{-T}$ uniquely computes $l_{10}^T$.
3. Assuming that $A_{00}$ is SPD, $A_{00} = L_{00} L_{00}^T$ where $L_{00}$ is lower triangular and nonsingular, and $l_{10}^T = a_{10}^T L_{00}^{-T}$, show that $\alpha_{11} - l_{10}^T l_{10} > 0$ so that $\lambda_{11} := \sqrt{\alpha_{11} - l_{10}^T l_{10}}$ is well-defined.
Answer: We want to show that $\alpha_{11} - l_{10}^T l_{10} > 0$. To do so, we are going to construct a nonzero vector $x$ so that $x^T A x = \alpha_{11} - l_{10}^T l_{10}$, at which point we can invoke the fact that $A$ is SPD.
Rather than just giving the answer, we go through the steps that will give insight into the thought process leading up to the answer as well. Consider
\[
\begin{pmatrix} x_0 \\ \chi_1 \end{pmatrix}^T
\begin{pmatrix} A_{00} & a_{10} \\ a_{10}^T & \alpha_{11} \end{pmatrix}
\begin{pmatrix} x_0 \\ \chi_1 \end{pmatrix}
= \begin{pmatrix} x_0 \\ \chi_1 \end{pmatrix}^T
\begin{pmatrix} A_{00} x_0 + \chi_1 a_{10} \\ a_{10}^T x_0 + \alpha_{11} \chi_1 \end{pmatrix}
= x_0^T A_{00} x_0 + \chi_1 x_0^T a_{10} + \chi_1 a_{10}^T x_0 + \alpha_{11} \chi_1^2 .
\]
Since we are trying to get to $\alpha_{11} - l_{10}^T l_{10}$, perhaps we should pick $\chi_1 = 1$. Then
\[
\begin{pmatrix} x_0 \\ 1 \end{pmatrix}^T
\begin{pmatrix} A_{00} & a_{10} \\ a_{10}^T & \alpha_{11} \end{pmatrix}
\begin{pmatrix} x_0 \\ 1 \end{pmatrix}
= x_0^T A_{00} x_0 + x_0^T a_{10} + a_{10}^T x_0 + \alpha_{11} .
\]
Algorithm: $A :=$ CHOL_UNB$(A)$ (bordered algorithm)
Partition $A \rightarrow \begin{pmatrix} A_{TL} & \star \\ A_{BL} & A_{BR} \end{pmatrix}$ where $A_{TL}$ is $0 \times 0$
while $m(A_{TL}) < m(A)$ do
Repartition $\begin{pmatrix} A_{TL} & \star \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} A_{00} & \star & \star \\ a_{10}^T & \alpha_{11} & \star \\ A_{20} & a_{21} & A_{22} \end{pmatrix}$ where $\alpha_{11}$ is $1 \times 1$

(update steps to be filled in)

Continue with $\begin{pmatrix} A_{TL} & \star \\ A_{BL} & A_{BR} \end{pmatrix} \leftarrow \begin{pmatrix} A_{00} & \star & \star \\ a_{10}^T & \alpha_{11} & \star \\ A_{20} & a_{21} & A_{22} \end{pmatrix}$
endwhile
4. Show that
\[
\begin{pmatrix} A_{00} & a_{10} \\ a_{10}^T & \alpha_{11} \end{pmatrix}
= \begin{pmatrix} L_{00} & 0 \\ l_{10}^T & \lambda_{11} \end{pmatrix}
\begin{pmatrix} L_{00} & 0 \\ l_{10}^T & \lambda_{11} \end{pmatrix}^T
\]
Homework 13.17 Use the results in the last exercise to give an alternative proof by induction of the
Cholesky Factorization Theorem and to give an algorithm for computing it by filling in Figure 13.3. This
algorithm is often referred to as the bordered Cholesky factorization algorithm.
Answer:
Proof by induction.
Base case: $n = 1$. Clearly the result is true for a $1 \times 1$ matrix $A = \alpha_{11}$: In this case, the fact that $A$ is SPD means that $\alpha_{11}$ is real and positive and a Cholesky factor is then given by $\lambda_{11} = \sqrt{\alpha_{11}}$, with uniqueness if we insist that $\lambda_{11}$ is positive.
Inductive step: Inductive Hypothesis (I.H.): Assume the result is true for SPD matrix $A \in \mathbb{C}^{(n-1) \times (n-1)}$. We will show that it holds for $A \in \mathbb{C}^{n \times n}$. Let $A \in \mathbb{C}^{n \times n}$ be SPD. Partition $A$ and $L$ like
\[
A = \begin{pmatrix} A_{00} & \star \\ a_{10}^T & \alpha_{11} \end{pmatrix}
\quad \text{and} \quad
L = \begin{pmatrix} L_{00} & 0 \\ l_{10}^T & \lambda_{11} \end{pmatrix} .
\]
Assume that $A_{00} = L_{00} L_{00}^T$ is the Cholesky factorization of $A_{00}$. By the I.H., we know this exists since $A_{00}$ is $(n-1) \times (n-1)$, and that $L_{00}$ is nonsingular. By the previous homework, we then know that
• $l_{10}^T = a_{10}^T L_{00}^{-T}$ is well-defined,
• $\lambda_{11} = \sqrt{\alpha_{11} - l_{10}^T l_{10}}$ is well-defined, and
• $A = L L^T$.
Homework 13.18 Show that the cost of the bordered algorithm is, approximately, $\frac{1}{3} n^3$ flops.
Answer: During the $k$th iteration, $k = 0, \ldots, n-1$, assume that $A_{00}$ is $k \times k$. Then
• $l_{10}^T := a_{10}^T L_{00}^{-T}$ (implemented as the lower triangular solve $L_{00} l_{10} = a_{10}$) requires, approximately, $k^2$ flops.
* BACK TO TEXT
Chapter 14. Notes on Eigenvalues and Eigenvectors (Answers)
Homework 14.4 Let λ be an eigenvalue of A and let Eλ (A) = {x ∈ Cm |Ax = λx} denote the set of all
eigenvectors of A associated with λ (including the zero vector, which is not really considered an eigenvec-
tor). Show that this set is a (nontrivial) subspace of Cm .
Answer:
• 0 ∈ Eλ (A): (since we explicitly include it).
* BACK TO TEXT
Homework 14.8 The eigenvalues of a diagonal matrix equal the values on its diagonal. The eigenvalues
of a triangular matrix equal the values on its diagonal.
Answer: Since a diagonal matrix is a special case of a triangular matrix, it suffices to prove that the
eigenvalues of a triangular matrix are the values on its diagonal.
If A is triangular, so is A − λI. By definition, λ is an eigenvalue of A if and only if A − λI is singular.
But a triangular matrix is singular if and only if it has a zero on its diagonal. The triangular matrix A − λI
has a zero on its diagonal if and only if λ equals one of the diagonal elements of A.
* BACK TO TEXT
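A quick numerical sanity check of this fact in Python/NumPy:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5
T = np.triu(rng.standard_normal((n, n)))   # triangular: eigenvalues on the diagonal

eigs = np.sort(np.linalg.eigvals(T).real)  # real triangular => real eigenvalues
diag = np.sort(np.diag(T))
```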
Homework 14.19 Prove Lemma 14.18. Then generalize it to a result for block upper triangular matrices:
\[
A = \begin{pmatrix}
A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\
0 & A_{1,1} & \cdots & A_{1,N-1} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & A_{N-1,N-1}
\end{pmatrix} .
\]
Answer:
Lemma 14.18 Let $A \in \mathbb{C}^{m \times m}$ be of form $A = \begin{pmatrix} A_{TL} & A_{TR} \\ 0 & A_{BR} \end{pmatrix}$. Assume that $Q_{TL}$ and $Q_{BR}$ are unitary "of appropriate size". Then
\[
A = \begin{pmatrix} Q_{TL} & 0 \\ 0 & Q_{BR} \end{pmatrix}
\begin{pmatrix} Q_{TL}^H A_{TL} Q_{TL} & Q_{TL}^H A_{TR} Q_{BR} \\ 0 & Q_{BR}^H A_{BR} Q_{BR} \end{pmatrix}
\begin{pmatrix} Q_{TL} & 0 \\ 0 & Q_{BR} \end{pmatrix}^H .
\]
Proof:
\[
\begin{pmatrix} Q_{TL} & 0 \\ 0 & Q_{BR} \end{pmatrix}
\begin{pmatrix} Q_{TL}^H A_{TL} Q_{TL} & Q_{TL}^H A_{TR} Q_{BR} \\ 0 & Q_{BR}^H A_{BR} Q_{BR} \end{pmatrix}
\begin{pmatrix} Q_{TL} & 0 \\ 0 & Q_{BR} \end{pmatrix}^H
= \begin{pmatrix}
Q_{TL} Q_{TL}^H A_{TL} Q_{TL} Q_{TL}^H & Q_{TL} Q_{TL}^H A_{TR} Q_{BR} Q_{BR}^H \\
0 & Q_{BR} Q_{BR}^H A_{BR} Q_{BR} Q_{BR}^H
\end{pmatrix}
= \begin{pmatrix} A_{TL} & A_{TR} \\ 0 & A_{BR} \end{pmatrix} = A .
\]
* BACK TO TEXT
Homework 14.21 Prove Corollary 14.20. Then generalize it to a result for block upper triangular matrices.
Answer:
Corollary 14.20 Let $A \in \mathbb{C}^{m \times m}$ be of form $A = \begin{pmatrix} A_{TL} & A_{TR} \\ 0 & A_{BR} \end{pmatrix}$. Then $\Lambda(A) = \Lambda(A_{TL}) \cup \Lambda(A_{BR})$.
Proof: This follows immediately from Lemma 14.18. Let $A_{TL} = Q_{TL} T_{TL} Q_{TL}^H$ and $A_{BR} = Q_{BR} T_{BR} Q_{BR}^H$ be the Schur decompositions of the diagonal blocks. Then $\Lambda(A_{TL})$ equals the diagonal elements of $T_{TL}$ and $\Lambda(A_{BR})$ equals the diagonal elements of $T_{BR}$. Now, by Lemma 14.18, the Schur decomposition of $A$ is given by
\[
A = \begin{pmatrix} Q_{TL} & 0 \\ 0 & Q_{BR} \end{pmatrix}
\underbrace{\begin{pmatrix} T_{TL} & Q_{TL}^H A_{TR} Q_{BR} \\ 0 & T_{BR} \end{pmatrix}}_{T_A}
\begin{pmatrix} Q_{TL} & 0 \\ 0 & Q_{BR} \end{pmatrix}^H .
\]
Hence the diagonal elements of $T_A$ (which is upper triangular) equal the elements of $\Lambda(A)$. The set of those elements is clearly $\Lambda(A_{TL}) \cup \Lambda(A_{BR})$.
The generalization is that if
\[
A = \begin{pmatrix}
A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\
0 & A_{1,1} & \cdots & A_{1,N-1} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & A_{N-1,N-1}
\end{pmatrix},
\]
where the blocks on the diagonal are square, then
\[
\Lambda(A) = \Lambda(A_{0,0}) \cup \Lambda(A_{1,1}) \cup \cdots \cup \Lambda(A_{N-1,N-1}) .
\]
* BACK TO TEXT
Homework 14.24 Let A be Hermitian and λ and µ be distinct eigenvalues with eigenvectors xλ and xµ ,
respectively. Then xλH xµ = 0. (In other words, the eigenvectors of a Hermitian matrix corresponding to
distinct eigenvalues are orthogonal.)
Answer: Assume that Axλ = λxλ and Axµ = µxµ , for nonzero vectors xλ and xµ and λ 6= µ. Then
xµH Axλ = λxµH xλ
and
xλH Axµ = µxλH xµ .
Because A is Hermitian and λ is real
µxλH xµ = xλH Axµ = (xµH Axλ )H = (λxµH xλ )H = λxλH xµ .
Hence
µxλH xµ = λxλH xµ .
If xλH xµ 6= 0 then µ = λ, which is a contradiction.
* BACK TO TEXT
Homework 14.25 Let $A \in \mathbb{C}^{m \times m}$ be a Hermitian matrix, $A = Q \Lambda Q^H$ its Spectral Decomposition, and $A = U \Sigma V^H$ its SVD. Relate $Q$, $U$, $V$, $\Lambda$, and $\Sigma$.
Answer: I am going to answer this by showing how to take the Spectral decomposition of $A$ and turn it into the SVD of $A$. Observations:
• We will assume all eigenvalues are nonzero. It should be pretty obvious how to fix the below if some of them are zero.
• $Q$ is unitary and $\Lambda$ is diagonal. Thus, $A = Q \Lambda Q^H$ is the SVD except that the diagonal elements of $\Lambda$ are not necessarily nonnegative and they are not ordered from largest to smallest in magnitude.
• We can fix the fact that they are not nonnegative with the following observation. Let
\[ S = \mathrm{diag}(\mathrm{sign}(\lambda_0), \mathrm{sign}(\lambda_1), \ldots, \mathrm{sign}(\lambda_{n-1})), \]
so that $S S = I$. Then
\[
A = Q \Lambda Q^H = Q \Lambda \underbrace{S S}_{I} Q^H
= Q \underbrace{(\Lambda S)}_{\tilde{\Lambda}} \underbrace{(Q S)^H}_{\tilde{Q}^H} = Q \tilde{\Lambda} \tilde{Q}^H,
\quad \text{where} \quad
\tilde{\Lambda} = \Lambda S = \mathrm{diag}(|\lambda_0|, |\lambda_1|, \ldots, |\lambda_{n-1}|) .
\]
Now $\tilde{\Lambda}$ has nonnegative diagonal entries and $\tilde{Q} = Q S$ is unitary since it is the product of two unitary matrices.
• Next, we fix the fact that the entries of $\tilde{\Lambda}$ are not ordered from largest to smallest. We do so by noting that there is a permutation matrix $P$ such that $\Sigma = P \tilde{\Lambda} P^H$ equals the diagonal matrix with the values ordered from largest to smallest. Then
\[
A = Q \tilde{\Lambda} \tilde{Q}^H
= Q \underbrace{P^H P}_{I} \tilde{\Lambda} \underbrace{P^H P}_{I} \tilde{Q}^H
= \underbrace{(Q P^H)}_{U} \underbrace{(P \tilde{\Lambda} P^H)}_{\Sigma} \underbrace{(\tilde{Q} P^H)^H}_{V^H},
\]
so that $U = Q P^H$, $\Sigma = P \tilde{\Lambda} P^H$, and $V = \tilde{Q} P^H$.
* BACK TO TEXT
Homework 14.26 Let A ∈ Cm×m and A = UΣV H its SVD. Relate the Spectral decompositions of AH A
and AAH to U, V , and Σ.
Answer:
A^H A = (UΣV^H)^H (UΣV^H) = VΣU^H UΣV^H = VΣ^2 V^H,
so that A^H A = VΣ^2 V^H is a Spectral Decomposition of A^H A: the columns of V are eigenvectors of A^H A,
and the eigenvalues of A^H A equal the squares of the singular values of A. Similarly,
A A^H = (UΣV^H)(UΣV^H)^H = UΣV^H VΣU^H = UΣ^2 U^H,
so that the columns of U are eigenvectors of A A^H, and the eigenvalues of A A^H also equal the squares of
the singular values of A.
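A quick numeric check of these relations (real case, so transposes replace conjugate-transposes) in pure Python; the rotation matrices and helper names are ours, chosen only to build a generic orthogonal U and V:

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def rot(t):  # 2x2 rotation: a generic orthogonal matrix
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

U, V = rot(0.4), rot(1.1)
Sigma = [[5.0, 0.0], [0.0, 2.0]]
A = matmul(matmul(U, Sigma), transpose(V))      # A = U Sigma V^T

AtA = matmul(transpose(A), A)                   # should equal V Sigma^2 V^T
AAt = matmul(A, transpose(A))                   # should equal U Sigma^2 U^T
Sigma2 = [[25.0, 0.0], [0.0, 4.0]]
VS2Vt = matmul(matmul(V, Sigma2), transpose(V))
US2Ut = matmul(matmul(U, Sigma2), transpose(U))
eps = 1e-12
assert all(abs(AtA[i][j] - VS2Vt[i][j]) < eps for i in range(2) for j in range(2))
assert all(abs(AAt[i][j] - US2Ut[i][j]) < eps for i in range(2) for j in range(2))
```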
* BACK TO TEXT
Chapter 14. Notes on the Power Method and Related Methods (Answers)
* BACK TO TEXT
Answer: This follows immediately from the fact that if α > 0 and β > 0 then
• α > β implies that 1/β > 1/α and
• α ≥ β implies that 1/β ≥ 1/α.
* BACK TO TEXT
* BACK TO TEXT
Chapter 16. Notes on the QR Algorithm and other Dense Eigensolvers (Answers)
so that
Â^(k) = V̂^(k)T A V̂^(k) = V̂^(k)T V̂^(k+1) R̂^(k+1).
Since V̂^(k)T V̂^(k+1) is unitary and R̂^(k+1) is upper triangular, this means that
Â^(k) = (V̂^(k)T V̂^(k+1)) R̂^(k+1) is a QR factorization of Â^(k).
Now, by the I.H., A^(k) = Â^(k), so that A^(k) = Q^(k+1) R^(k+1) and Â^(k) = (V̂^(k)T V̂^(k+1)) R̂^(k+1)
are QR factorizations of the same matrix. By the (essential) uniqueness of the QR factorization,
Q^(k+1) = V̂^(k)T V̂^(k+1) and R^(k+1) = R̂^(k+1).
But then
V^(k+1) = V^(k) Q^(k+1) = V̂^(k) Q^(k+1) = V̂^(k) V̂^(k)T V̂^(k+1) = V̂^(k+1),
since V̂^(k) V̂^(k)T = I. Finally,
Â^(k+1) = V̂^(k+1)T A V̂^(k+1) = V^(k+1)T A V^(k+1) = (V^(k) Q^(k+1))^T A (V^(k) Q^(k+1))
= Q^(k+1)T V^(k)T A V^(k) Q^(k+1) = Q^(k+1)T A^(k) Q^(k+1) = R^(k+1) Q^(k+1) = A^(k+1),
where the last step uses A^(k) = Q^(k+1) R^(k+1), so that
Q^(k+1)T A^(k) Q^(k+1) = Q^(k+1)T Q^(k+1) R^(k+1) Q^(k+1) = R^(k+1) Q^(k+1).
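The iteration A^(k+1) := R^(k+1) Q^(k+1) that this argument analyzes can be sketched in a few lines. This is a pure-Python toy (classical Gram-Schmidt QR, no shifts or deflation), meant only to show the diagonal converging to the eigenvalues:

```python
import math

def qr(A):
    """Classical Gram-Schmidt QR of a small full-rank matrix."""
    n = len(A)
    a = [[A[i][j] for i in range(n)] for j in range(n)]  # columns of A
    q, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = a[j][:]
        for i in range(j):
            R[i][j] = sum(q[i][k] * a[j][k] for k in range(n))
            v = [v[k] - R[i][j] * q[i][k] for k in range(n)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        q.append([x / R[j][j] for x in v])
    Q = [[q[j][i] for j in range(n)] for i in range(n)]  # q[j] is column j
    return Q, R

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2.0, 1.0], [1.0, 2.0]]        # symmetric, eigenvalues 3 and 1
for _ in range(50):
    Q, R = qr(A)
    A = matmul(R, Q)                # A^(k+1) = R^(k+1) Q^(k+1)
assert abs(A[1][0]) < 1e-10         # off-diagonal converges to zero
assert abs(A[0][0] - 3.0) < 1e-10 and abs(A[1][1] - 1.0) < 1e-10
```

The off-diagonal entry shrinks like (λ_1/λ_0)^k = (1/3)^k, so 50 iterations are far more than enough here.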
* BACK TO TEXT
• Base case: k = 0. We need to show that V̂^(0) = V^(0) and Â^(0) = A^(0). This is trivially true.
* BACK TO TEXT
for j = 0, . . . , k. Then
β_{k+1,k} = ‖q̃_{k+1}‖_2
and then
q_{k+1} = q̃_{k+1}/β_{k+1,k}.
This way, the columns of Q and B can be determined, one-by-one.
* BACK TO TEXT
Homework 16.11 In the above discussion, show that α_{11}^2 + 2α_{31}^2 + α_{33}^2 = α̂_{11}^2 + α̂_{33}^2.
Answer: If Â_{31} = J_{31} A_{31} J_{31}^T then ‖Â_{31}‖_F = ‖A_{31}‖_F, because multiplying on the left and/or the right by a
unitary matrix preserves the Frobenius norm of a matrix. Hence
\[
\left\| \begin{pmatrix} \alpha_{11} & \alpha_{31} \\ \alpha_{31} & \alpha_{33} \end{pmatrix} \right\|_F^2
=
\left\| \begin{pmatrix} \hat\alpha_{11} & 0 \\ 0 & \hat\alpha_{33} \end{pmatrix} \right\|_F^2 .
\]
But
\[
\left\| \begin{pmatrix} \alpha_{11} & \alpha_{31} \\ \alpha_{31} & \alpha_{33} \end{pmatrix} \right\|_F^2
= \alpha_{11}^2 + 2\alpha_{31}^2 + \alpha_{33}^2
\quad\mbox{and}\quad
\left\| \begin{pmatrix} \hat\alpha_{11} & 0 \\ 0 & \hat\alpha_{33} \end{pmatrix} \right\|_F^2
= \hat\alpha_{11}^2 + \hat\alpha_{33}^2 ,
\]
which establishes the result.
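The claim is easy to verify numerically on a 2x2 example: a rotation applied on the left with its transpose on the right preserves the Frobenius norm, and the Jacobi choice of angle zeroes the off-diagonal entry. Pure Python; the angle formula is the standard Jacobi rotation, not taken from the notes:

```python
import math

a11, a31, a33 = 2.0, 1.0, 3.0
theta = 0.5 * math.atan2(2.0 * a31, a33 - a11)   # Jacobi angle: zeroes b31
c, s = math.cos(theta), math.sin(theta)
# B = J A J^T with J = [[c, -s], [s, c]], written out entrywise:
b11 = c * c * a11 - 2 * c * s * a31 + s * s * a33
b31 = (a11 - a33) * s * c + a31 * (c * c - s * s)
b33 = s * s * a11 + 2 * c * s * a31 + c * c * a33
assert abs(b31) < 1e-12                           # off-diagonal eliminated
lhs = a11**2 + 2 * a31**2 + a33**2                # ||A||_F^2
rhs = b11**2 + b33**2                             # ||B||_F^2 (since b31 = 0)
assert abs(lhs - rhs) < 1e-9                      # Frobenius norm preserved
```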
Homework 16.14 Give the approximate total cost for reducing a nonsymmetric A ∈ Rn×n to upper-
Hessenberg form.
Answer: To prove this, assume that in the current iteration of the algorithm A00 is k × k. The update in
the current iteration is then
Thus, the total cost in flops is, approximately,
\[
\sum_{k=0}^{n-1} \left[ 2k(n-k-1) + 2k(n-k-1) + 2(n-k-1)^2 + 2(n-k-1)^2 + 2 \times 2(n-k-1)^2 \right]
= \sum_{k=0}^{n-1} \left[ 4k(n-k-1) + 4(n-k-1)^2 \right] + \sum_{k=0}^{n-1} 4(n-k-1)^2 .
\]
Now,
\[
4k(n-k-1) + 4(n-k-1)^2 = 4\left[ k + (n-k-1) \right](n-k-1) = 4(n-1)(n-k-1) \approx 4n(n-k-1),
\]
so that, substituting j = n - k - 1 in the sums, the total is approximately
\[
4n \sum_{j=0}^{n-1} j + 4 \sum_{j=0}^{n-1} j^2
\approx 4n\,\frac{n(n-1)}{2} + 4 \int_0^n j^2 \, dj
\approx 2n^3 + \frac{4}{3} n^3 = \frac{10}{3} n^3 .
\]
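The approximation can be sanity-checked by evaluating the exact sum and comparing against (10/3) n^3; the error term is O(n^2), so the ratio approaches 10/3 as n grows:

```python
def hessenberg_flops(n):
    """Exact value of the flop-count sum above."""
    return sum(4 * k * (n - k - 1) + 8 * (n - k - 1) ** 2 for k in range(n))

n = 2000
ratio = hessenberg_flops(n) / n**3
assert abs(ratio - 10.0 / 3.0) < 0.01   # ratio -> 10/3 with an O(1/n) error
```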
* BACK TO TEXT
Algorithm: A := LDLT(A)
Partition A → [ A_TL, ⋆ ; A_BL, A_BR ]
  where A_TL is 0 × 0
while m(A_TL) < m(A) do
  Repartition
    [ A_TL, ⋆ ; A_BL, A_BR ] → [ A_00, ⋆, ⋆ ; a_10^T, α_11, ⋆ ; A_20, a_21, A_22 ]
    where α_11 is 1 × 1

  Continue with
    [ A_TL, ⋆ ; A_BL, A_BR ] ← [ A_00, ⋆, ⋆ ; a_10^T, α_11, ⋆ ; A_20, a_21, A_22 ]
endwhile
Homework 17.1 Modify the algorithm in Figure 17.1 so that it computes the LDLT factorization. (Fill in
Figure 17.3.)
Answer:
Three possible answers are given in Figure 17.11.
* BACK TO TEXT
Homework 17.2 Modify the algorithm in Figure 17.2 so that it computes the LDLT factorization of a
tridiagonal matrix. (Fill in Figure 17.4.) What is the approximate cost, in floating point operations, of
computing the LDLT factorization of a tridiagonal matrix? Count a divide, multiply, and add/subtract as a
floating point operation each. Show how you came up with the algorithm, similar to how we derived the
algorithm for the tridiagonal Cholesky factorization.
Answer: The algorithm is given in Figure 17.12.

Algorithm: A := LDLT_TRI(A)
Partition A → [ A_FF, ⋆, 0 ; α_MF e_L^T, α_MM, ⋆ ; 0, α_LM e_F, A_LL ]
  where A_FF is 0 × 0
while m(A_FF) < m(A) do
  Repartition
    [ A_FF, ⋆, 0 ; α_MF e_L^T, α_MM, ⋆ ; 0, α_LM e_F, A_LL ] →
      [ A_00, ⋆, ⋆, ⋆ ; α_10 e_L^T, α_11, ⋆, ⋆ ; 0, α_21, α_22, ⋆ ; 0, 0, α_32 e_F, A_33 ]
    where α_22 is a scalar

  Continue with
    [ A_FF, ⋆, 0 ; α_MF e_L^T, α_MM, ⋆ ; 0, α_LM e_F, A_LL ] ←
      [ A_00, ⋆, ⋆, ⋆ ; α_10 e_L^T, α_11, ⋆, ⋆ ; 0, α_21, α_22, ⋆ ; 0, 0, α_32 e_F, A_33 ]
endwhile

Figure 17.12: Algorithm for computing the LDLT factorization of a tridiagonal matrix.
The key insight is to recognize that, relative to the algorithm for a dense matrix, a_21 = (α_21, 0, . . . , 0)^T,
so that, for example, a_21 := a_21/α_11 becomes
\[
\begin{pmatrix} \alpha_{21} \\ 0 \end{pmatrix}
:=
\begin{pmatrix} \alpha_{21} \\ 0 \end{pmatrix} / \alpha_{11}
=
\begin{pmatrix} \alpha_{21}/\alpha_{11} \\ 0 \end{pmatrix} .
\]
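Because each step touches only one subdiagonal entry, the tridiagonal LDL^T costs one divide, one multiply, and one subtract per step, O(n) in total. A pure-Python sketch with our own storage convention (diagonal d, subdiagonal e), verified by rebuilding A:

```python
def ldlt_tri(d, e):
    """Return (delta, l) with A = L D L^T, where D = diag(delta) and L is
    unit lower bidiagonal with subdiagonal l. About 3(n-1) flops total."""
    n = len(d)
    delta = [0.0] * n
    l = [0.0] * (n - 1)
    delta[0] = d[0]
    for k in range(n - 1):
        l[k] = e[k] / delta[k]                 # lambda := alpha_{k+1,k} / delta_k
        delta[k + 1] = d[k + 1] - l[k] * e[k]  # Schur complement update
    return delta, l

# Verify by rebuilding A = L D L^T for a small example.
d, e = [4.0, 5.0, 6.0], [2.0, 1.0]
delta, l = ldlt_tri(d, e)
d_rec = [delta[0]] + [delta[k] + l[k - 1] ** 2 * delta[k - 1]
                      for k in range(1, len(d))]
e_rec = [l[k] * delta[k] for k in range(len(e))]
assert all(abs(x - y) < 1e-12 for x, y in zip(d_rec, d))
assert all(abs(x - y) < 1e-12 for x, y in zip(e_rec, e))
```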
* BACK TO TEXT
Homework 17.3 Derive an algorithm that, given an indefinite matrix A, computes A = UDU T . Overwrite
only the upper triangular part of A. (Fill in Figure 17.5.) Show how you came up with the algorithm,
similar to how we derived the algorithm for LDLT .
Answer: Three possible algorithms are given in Figure 17.13.
Partition
A → [ A_00, a_01 ; a_01^T, α_11 ],  U → [ U_00, u_01 ; 0, 1 ],  and  D → [ D_00, 0 ; 0, δ_1 ].
Then
\[
\begin{pmatrix} A_{00} & a_{01} \\ a_{01}^T & \alpha_{11} \end{pmatrix}
=
\begin{pmatrix} U_{00} & u_{01} \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} D_{00} & 0 \\ 0 & \delta_1 \end{pmatrix}
\begin{pmatrix} U_{00} & u_{01} \\ 0 & 1 \end{pmatrix}^T
=
\begin{pmatrix} U_{00} D_{00} & \delta_1 u_{01} \\ 0 & \delta_1 \end{pmatrix}
\begin{pmatrix} U_{00}^T & 0 \\ u_{01}^T & 1 \end{pmatrix}
=
\begin{pmatrix} U_{00} D_{00} U_{00}^T + \delta_1 u_{01} u_{01}^T & \delta_1 u_{01} \\ \star & \delta_1 \end{pmatrix} .
\]
This suggests the following algorithm for overwriting the strictly upper triangular part of A with the strictly
upper triangular part of U and the diagonal of A with D:
• Partition A → [ A_00, a_01 ; a_01^T, α_11 ].
Algorithm: A := UDUT(A)
Partition A → [ A_TL, A_TR ; ⋆, A_BR ]
  where A_BR is 0 × 0
while m(A_BR) < m(A) do
  Repartition
    [ A_TL, A_TR ; ⋆, A_BR ] → [ A_00, a_01, A_02 ; ⋆, α_11, a_12^T ; ⋆, ⋆, A_22 ]
    where α_11 is 1 × 1

  Continue with
    [ A_TL, A_TR ; ⋆, A_BR ] ← [ A_00, a_01, A_02 ; ⋆, α_11, a_12^T ; ⋆, ⋆, A_22 ]
endwhile

Figure 17.13: Algorithm for computing the UDU^T factorization of a matrix.
• Compute u_01 := a_01/α_11 (note that δ_1 = α_11 is left in place as the diagonal element of D).
• Update A_00 := A_00 − u_01 a_01^T (updating only the upper triangular part).
• a_01 := u_01.
• Continue with computing A_00 → U_00 D_00 U_00^T.
This algorithm will complete as long as δ_1 ≠ 0 at every step.
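The steps above can be sketched in pure Python on a small dense symmetric matrix. The function and variable names are ours; only the upper triangle (and diagonal) of A is read and overwritten, mirroring the derivation:

```python
def udut(A):
    """Overwrite a copy of A with U (strictly upper part) and D (diagonal),
    working from the bottom-right corner toward the top-left."""
    n = len(A)
    A = [row[:] for row in A]
    for j in range(n - 1, -1, -1):
        delta = A[j][j]                            # delta_1 = alpha_11
        a01 = [A[i][j] for i in range(j)]          # column above the diagonal
        u01 = [x / delta for x in a01]             # u01 := a01 / alpha_11
        for i in range(j):                         # A00 := A00 - u01 a01^T
            for k in range(i + 1):                 # (upper triangular part only)
                A[k][i] -= u01[k] * a01[i]
            A[i][j] = u01[i]                       # a01 := u01
    return A

def rebuild(F):
    """Form U diag(D) U^T from the overwritten factor, to check the result."""
    n = len(F)
    U = [[1.0 if i == j else (F[i][j] if j > i else 0.0) for j in range(n)]
         for i in range(n)]
    D = [F[i][i] for i in range(n)]
    return [[sum(U[r][k] * D[k] * U[c][k] for k in range(n))
             for c in range(n)] for r in range(n)]

A = [[10.0, 2.0, 1.0], [2.0, 8.0, 3.0], [1.0, 3.0, 6.0]]
B = rebuild(udut(A))
assert all(abs(A[r][c] - B[r][c]) < 1e-9 for r in range(3) for c in range(3))
```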
* BACK TO TEXT
Homework 17.4 Derive an algorithm that, given an indefinite tridiagonal matrix A, computes A = UDU T .
Overwrite only the upper triangular part of A. (Fill in Figure 17.6.) Show how you came up with the
algorithm, similar to how we derived the algorithm for LDLT .
Answer: Three possible algorithms are given in Figure 17.14.
The key insight is to recognize that, relative to the algorithm in Figure 17.13, α_11 = α_22 and
a_01 = (0, . . . , 0, α_12)^T, so that, for example, a_01 := a_01/α_11 becomes
\[
\begin{pmatrix} 0 \\ \alpha_{12} \end{pmatrix}
:=
\begin{pmatrix} 0 \\ \alpha_{12} \end{pmatrix} / \alpha_{22}
=
\begin{pmatrix} 0 \\ \alpha_{12}/\alpha_{22} \end{pmatrix} .
\]
* BACK TO TEXT
Algorithm: A := UDUT_TRI(A)
Partition A → [ A_FF, α_FM e_L, 0 ; ⋆, α_MM, α_ML e_F^T ; ⋆, ⋆, A_LL ]
  where A_LL is 0 × 0
while m(A_LL) < m(A) do
  Repartition
    [ A_FF, α_FM e_L, 0 ; ⋆, α_MM, α_ML e_F^T ; ⋆, ⋆, A_LL ] →
      [ A_00, α_01 e_L, 0, 0 ; ⋆, α_11, α_12, 0 ; ⋆, ⋆, α_22, α_23 e_F^T ; ⋆, ⋆, ⋆, A_33 ]
    where α_11 is a scalar

  Continue with
    [ A_FF, α_FM e_L, 0 ; ⋆, α_MM, α_ML e_F^T ; ⋆, ⋆, A_LL ] ←
      [ A_00, α_01 e_L, 0, 0 ; ⋆, α_11, α_12, 0 ; ⋆, ⋆, α_22, α_23 e_F^T ; ⋆, ⋆, ⋆, A_33 ]
endwhile

Figure 17.14: Algorithm for computing the UDU^T factorization of a tridiagonal matrix.
(Hint: multiply out A = LDLT and A = UEU T with the partitioned matrices first. Then multiply out the
above. Compare and match...) How should φ1 be chosen? What is the cost of computing the twisted
factorization given that you have already computed the LDLT and UDU T factorizations? A ”Big O”
estimate is sufficient. Be sure to take into account what eTL D00 eL and eTF E22 eF equal in your cost estimate.
Answer:
\[
\begin{pmatrix} L_{00} & 0 & 0 \\ \lambda_{10} e_L^T & 1 & 0 \\ 0 & \lambda_{21} e_F & L_{22} \end{pmatrix}
\begin{pmatrix} D_{00} & 0 & 0 \\ 0 & \delta_1 & 0 \\ 0 & 0 & D_{22} \end{pmatrix}
\begin{pmatrix} L_{00} & 0 & 0 \\ \lambda_{10} e_L^T & 1 & 0 \\ 0 & \lambda_{21} e_F & L_{22} \end{pmatrix}^T
\]
\[
=
\begin{pmatrix}
L_{00} D_{00} L_{00}^T & \lambda_{10} L_{00} D_{00} e_L & 0 \\
\lambda_{10} e_L^T D_{00} L_{00}^T & \lambda_{10}^2 e_L^T D_{00} e_L + \delta_1 & \lambda_{21} \delta_1 e_F^T \\
0 & \lambda_{21} \delta_1 e_F & \lambda_{21}^2 \delta_1 e_F e_F^T + L_{22} D_{22} L_{22}^T
\end{pmatrix}
=
\begin{pmatrix} A_{00} & \alpha_{01} e_L & 0 \\ \alpha_{01} e_L^T & \alpha_{11} & \alpha_{21} e_F^T \\ 0 & \alpha_{21} e_F & A_{22} \end{pmatrix}
\tag{17.2}
\]
and
\[
\begin{pmatrix} U_{00} & \upsilon_{01} e_L & 0 \\ 0 & 1 & \upsilon_{21} e_F^T \\ 0 & 0 & U_{22} \end{pmatrix}
\begin{pmatrix} E_{00} & 0 & 0 \\ 0 & \varepsilon_1 & 0 \\ 0 & 0 & E_{22} \end{pmatrix}
\begin{pmatrix} U_{00} & \upsilon_{01} e_L & 0 \\ 0 & 1 & \upsilon_{21} e_F^T \\ 0 & 0 & U_{22} \end{pmatrix}^T
\]
\[
=
\begin{pmatrix}
U_{00} E_{00} U_{00}^T + \upsilon_{01}^2 \varepsilon_1 e_L e_L^T & \upsilon_{01} \varepsilon_1 e_L & 0 \\
\upsilon_{01} \varepsilon_1 e_L^T & \varepsilon_1 + \upsilon_{21}^2 e_F^T E_{22} e_F & \upsilon_{21} e_F^T E_{22} U_{22}^T \\
0 & \upsilon_{21} U_{22} E_{22} e_F & U_{22} E_{22} U_{22}^T
\end{pmatrix}
=
\begin{pmatrix} A_{00} & \alpha_{01} e_L & 0 \\ \alpha_{01} e_L^T & \alpha_{11} & \alpha_{21} e_F^T \\ 0 & \alpha_{21} e_F & A_{22} \end{pmatrix} .
\tag{17.3}
\]
Finally,
\[
\begin{pmatrix} L_{00} & 0 & 0 \\ \lambda_{10} e_L^T & 1 & \upsilon_{21} e_F^T \\ 0 & 0 & U_{22} \end{pmatrix}
\begin{pmatrix} D_{00} & 0 & 0 \\ 0 & \varphi_1 & 0 \\ 0 & 0 & E_{22} \end{pmatrix}
\begin{pmatrix} L_{00} & 0 & 0 \\ \lambda_{10} e_L^T & 1 & \upsilon_{21} e_F^T \\ 0 & 0 & U_{22} \end{pmatrix}^T
\]
\[
=
\begin{pmatrix}
L_{00} D_{00} L_{00}^T & \lambda_{10} L_{00} D_{00} e_L & 0 \\
\lambda_{10} e_L^T D_{00} L_{00}^T & \lambda_{10}^2 e_L^T D_{00} e_L + \varphi_1 + \upsilon_{21}^2 e_F^T E_{22} e_F & \upsilon_{21} e_F^T E_{22} U_{22}^T \\
0 & \upsilon_{21} U_{22} E_{22} e_F & U_{22} E_{22} U_{22}^T
\end{pmatrix}
=
\begin{pmatrix} A_{00} & \alpha_{01} e_L & 0 \\ \alpha_{01} e_L^T & \alpha_{11} & \alpha_{21} e_F^T \\ 0 & \alpha_{21} e_F & A_{22} \end{pmatrix} .
\tag{17.4}
\]
The equivalence of the corresponding off-diagonal submatrices in (17.2) and (17.3) justifies the
off-diagonal submatrices in (17.4). Equating the middle elements of (17.2), (17.3), and (17.4) to α_11
tells us that
\[
\lambda_{10}^2 e_L^T D_{00} e_L + \delta_1 = \alpha_{11}
\quad\mbox{and}\quad
\varepsilon_1 + \upsilon_{21}^2 e_F^T E_{22} e_F = \alpha_{11},
\]
or
\[
\lambda_{10}^2 e_L^T D_{00} e_L = \alpha_{11} - \delta_1
\quad\mbox{and}\quad
\upsilon_{21}^2 e_F^T E_{22} e_F = \alpha_{11} - \varepsilon_1,
\]
so that
\[
\alpha_{11} = \lambda_{10}^2 e_L^T D_{00} e_L + \varphi_1 + \upsilon_{21}^2 e_F^T E_{22} e_F
= (\alpha_{11} - \delta_1) + \varphi_1 + (\alpha_{11} - \varepsilon_1).
\]
Solving this for φ_1 yields
\[
\varphi_1 = \delta_1 + \varepsilon_1 - \alpha_{11} .
\]
Notice that, given the factorizations A = LDL^T and A = UEU^T, the cost of computing the twisted
factorization is O(1).
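The formula φ_1 = δ_1 + ε_1 − α_11 is easy to check numerically: run the forward (LDL^T) and backward (UEU^T) sweeps on a small tridiagonal matrix and confirm that the twisted factorization reproduces the middle row and column of A. Pure Python, with our own variable names:

```python
a = [2.0, 2.0, 2.0]          # diagonal of a 3x3 tridiagonal matrix
b = [1.0, 1.0]               # off-diagonal
# Forward sweep (LDL^T):
delta0 = a[0]
delta1 = a[1] - b[0] ** 2 / delta0
# Backward sweep (UEU^T):
eps2 = a[2]
eps1 = a[1] - b[1] ** 2 / eps2
# Twisted factorization joined at the middle element (O(1) extra work):
phi1 = delta1 + eps1 - a[1]
lam10 = b[0] / delta0        # multiplier from the L part
ups21 = b[1] / eps2          # multiplier from the U part
# The twisted factorization reproduces A's middle row/column:
assert abs(lam10 * delta0 - b[0]) < 1e-12
assert abs(ups21 * eps2 - b[1]) < 1e-12
assert abs(lam10**2 * delta0 + phi1 + ups21**2 * eps2 - a[1]) < 1e-12
```

For this matrix δ_1 = ε_1 = 1.5, so φ_1 = 1.5 + 1.5 − 2 = 1.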
* BACK TO TEXT
where x = ( x_0 ; χ_1 ; x_2 ) is not a zero vector. What is the cost of this computation, given that L_00 and
U_22 have special structure?
Answer: Choose x_0, χ_1, and x_2 so that
\[
\begin{pmatrix} L_{00} & 0 & 0 \\ \lambda_{10} e_L^T & 1 & \upsilon_{21} e_F^T \\ 0 & 0 & U_{22} \end{pmatrix}^T
\begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
\]
or, equivalently,
\[
\begin{pmatrix} L_{00}^T & \lambda_{10} e_L & 0 \\ 0 & 1 & 0 \\ 0 & \upsilon_{21} e_F & U_{22}^T \end{pmatrix}
\begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} .
\]
Then χ_1 = 1 and
\[
L_{00}^T x_0 = -\lambda_{10} e_L,
\qquad
U_{22}^T x_2 = -\upsilon_{21} e_F,
\]
so that x_0 and x_2 can be computed via solves with a bidiagonal upper triangular matrix (L_00^T) and a
bidiagonal lower triangular matrix (U_22^T), respectively.
Then
\[
\begin{pmatrix} L_{00} & 0 & 0 \\ \lambda_{10} e_L^T & 1 & \upsilon_{21} e_F^T \\ 0 & 0 & U_{22} \end{pmatrix}
\begin{pmatrix} D_{00} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & E_{22} \end{pmatrix}
\begin{pmatrix} L_{00} & 0 & 0 \\ \lambda_{10} e_L^T & 1 & \upsilon_{21} e_F^T \\ 0 & 0 & U_{22} \end{pmatrix}^T
\begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} L_{00} & 0 & 0 \\ \lambda_{10} e_L^T & 1 & \upsilon_{21} e_F^T \\ 0 & 0 & U_{22} \end{pmatrix}
\begin{pmatrix} D_{00} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & E_{22} \end{pmatrix}
\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
=
\begin{pmatrix} L_{00} & 0 & 0 \\ \lambda_{10} e_L^T & 1 & \upsilon_{21} e_F^T \\ 0 & 0 & U_{22} \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} .
\]
Now, consider solving L_00^T x_0 = -λ_10 e_L:
\[
\begin{pmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
0 & \times & \times & 0 & 0 & 0 \\
0 & 0 & \times & \times & 0 & 0 \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times
\end{pmatrix}
x_0
=
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ -\lambda_{10} \end{pmatrix} .
\]
A moment of reflection shows that if L_00 is k × k, then solving L_00^T x_0 = -λ_10 e_L requires O(k) flops.
Similarly, since U_22 is (n - k - 1) × (n - k - 1), solving U_22^T x_2 = -υ_21 e_F requires O(n - k - 1) flops.
Thus, computing x requires O(n) flops.
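The O(k) claim is visible in code: since L_00 is unit lower bidiagonal, L_00^T is unit upper bidiagonal, and back substitution touches one superdiagonal entry per row. A pure-Python sketch with our own naming, where l holds the subdiagonal of L_00:

```python
def solve_upper_bidiagonal(l, lam10):
    """Solve L00^T x0 = -lam10 * e_L, with L00 unit lower bidiagonal
    (subdiagonal l) and e_L the last unit vector. O(k) flops."""
    k = len(l) + 1                  # L00 is k x k
    x = [0.0] * k
    x[k - 1] = -lam10               # last equation: x_{k-1} = -lam10
    for i in range(k - 2, -1, -1):
        x[i] = -l[i] * x[i + 1]     # row i reads: x_i + l_i * x_{i+1} = 0
    return x

x0 = solve_upper_bidiagonal([0.5, 0.25], 2.0)
# Check that L00^T x0 = (0, 0, -2)^T for L00 with subdiagonal (0.5, 0.25):
assert abs(x0[2] + 2.0) < 1e-12
assert abs(x0[1] + 0.25 * x0[2]) < 1e-12
assert abs(x0[0] + 0.5 * x0[1]) < 1e-12
```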
Thus,
• Given a tridiagonal matrix Â, computing all eigenvalues requires O(n^2) computation. (We have not
discussed this in detail...)
• Given an eigenvalue, the corresponding eigenvector can be computed via a twisted factorization in
O(n) flops, so that all n eigenvectors can be computed in O(n^2) flops.
Thus, computing all eigenvalues and eigenvectors of a tridiagonal matrix via this method requires
O(n^2) computation. This is much better than computing these via the tridiagonal QR algorithm.
* BACK TO TEXT
Chapter 18. Notes on Computing the SVD (Answers)
Homework 18.2 If A = UΣV T is the SVD of A then AT A = V Σ2V T is the Spectral Decomposition of
AT A.
* BACK TO TEXT
Homework 18.3 Homework 18.4 Give the approximate total cost for reducing A ∈ Rn×n to bidiagonal
form.
* BACK TO TEXT
Appendix A
How to Download
Videos associated with these notes can be viewed in one of three ways:
• * Download from UT Box links to the video uploaded to UT-Austin's "UT Box" file sharing service.
• * View After Local Download links to the video downloaded to your own computer, in the directory
Video within the same directory in which you stored this document. You can download videos by
first clicking on * Download from UT Box. Alternatively, visit the UT Box directory in which the
videos are stored and download some or all.
Appendix B
LAFF Routines (FLAME@lab)
Figure B summarizes the most important routines that are part of the laff FLAME@lab (MATLAB)
library used in these materials.
Vector-vector operations:
• Copy (COPY): y := x. Function: y = laff_copy( x, y ). Cost: 0 flops, 2n memops.
• Vector scaling (SCAL): x := αx. Function: x = laff_scal( alpha, x ). Cost: n flops, 2n memops.
• Vector scaling (INVSCAL): x := x/α. Function: x = laff_invscal( alpha, x ). Cost: n flops, 2n memops.
• Scaled addition (AXPY): y := αx + y. Function: y = laff_axpy( alpha, x, y ). Cost: 2n flops, 3n memops.
• Dot product (DOT): α := x^T y. Function: alpha = laff_dot( x, y ). Cost: 2n flops, 2n memops.
• Dot product (DOTS): α := x^T y + α. Function: alpha = laff_dots( x, y, alpha ). Cost: 2n flops, 2n memops.
• Length (NORM2): α := ‖x‖_2. Function: alpha = laff_norm2( x ). Cost: 2n flops, n memops.

Matrix-vector operations:
• General matrix-vector multiplication (GEMV): y := αAx + βy or y := αA^T x + βy.
  Function: y = laff_gemv( 'No transpose', alpha, A, x, beta, y ) or
  y = laff_gemv( 'Transpose', alpha, A, x, beta, y ). Cost: 2mn flops, mn memops.
• Rank-1 update (GER): A := αxy^T + A. Function: A = laff_ger( alpha, x, y, A ).
  Cost: 2mn flops, mn memops.
• Triangular matrix solve (TRSV): b := L^{-1}b, b := U^{-1}b, b := L^{-T}b, or b := U^{-T}b.
  Example: b = laff_trsv( 'Upper triangular', 'No transpose', 'Nonunit diagonal', U, b ).
  Cost: n^2 flops, n^2/2 memops.
• Triangular matrix-vector multiply (TRMV): x := Lx, x := Ux, x := L^T x, or x := U^T x.
  Example: x = laff_trmv( 'Upper triangular', 'No transpose', 'Nonunit diagonal', U, x ).
  Cost: n^2 flops, n^2/2 memops.

Matrix-matrix operations:
• General matrix-matrix multiplication (GEMM): C := αAB + βC, C := αA^T B + βC,
  C := αAB^T + βC, or C := αA^T B^T + βC.
  Example: C = laff_gemm( 'Transpose', 'No transpose', alpha, A, B, beta, C ).
  Cost: 2mnk flops, 2mn + mk + nk memops.
• Triangular solve with multiple right-hand sides (TRSM): B := αL^{-1}B, B := αU^{-T}B,
  B := αBL^{-1}, or B := αBU^{-T}.
  Example: B = laff_trsm( 'Left', 'Lower triangular', 'No transpose', 'Nonunit diagonal', alpha, U, B ).
  Cost: m^2 n flops, m^2 + mn memops.