Zheng Jinchang
January 25, 2014
This is a solutions manual for Applied Numerical Linear Algebra by James W. Demmel. It covers the majority of the questions from Chapter I to Chapter VI, except the programming exercises and a few other questions. As it was written by myself alone, it surely contains errors. If you find any, or have other comments, you can email me at jczheng1234@hotmail.com. I hope this solution is helpful :)
1 Solutions for Chapter I: Introduction
Question 1.1. Let A be an orthogonal matrix. Show that det(A) = ±1. Show that if B is also
orthogonal and det(A) = − det(B ), then A + B is singular.
Proof. Since A is orthogonal, AA^T = I. Taking determinants, det(A) det(A^T) = det(A)^2 = 1. Consequently,

    det(A) = ±1.

To prove that A + B is singular, use the orthogonality of A and B to write A + B = A(A^T + B^T)B. Taking determinants and using det(A^T + B^T) = det((A + B)^T) = det(A + B) yields

    det(A + B) = det(A) det(B) det(A + B) = −det(A + B),

so det(A + B) = 0, i.e., A + B is singular.
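A quick numerical check of both claims, outside the original solution, using a rotation (det = +1) and a reflection (det = −1) as the orthogonal matrices:

```python
import numpy as np

t = 0.3
A = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])   # rotation: orthogonal, det = +1
B = np.array([[1.0,  0.0],
              [0.0, -1.0]])               # reflection: orthogonal, det = -1

assert abs(abs(np.linalg.det(A)) - 1) < 1e-12        # det(A) = +-1
assert abs(np.linalg.det(A) + np.linalg.det(B)) < 1e-12  # det(A) = -det(B)
assert abs(np.linalg.det(A + B)) < 1e-12             # A + B is singular
```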
Question 1.2. The rank of a matrix is the dimension of the space spanned by its columns.
Show that A has rank one if and only if A = ab T for some column vectors a and b.
Remark. For this question to hold, we have to assume that both a and b are nonzero.
Proof. If A = ab^T, let b = (b_1, b_2, ..., b_n)^T and partition A by columns as A = (α_1, α_2, ..., α_n). We have

    α_i = b_i a.

Therefore, every column of A is a multiple of a. As a, b ≠ 0, we have A ≠ 0. Then dim(span(α_1, α_2, ..., α_n)) = 1, i.e., rank(A) = 1.

Conversely, suppose rank(A) = 1. Without losing any generality, suppose α_1 ≠ 0. Because rank(A) = 1, there is only one linearly independent column of A, which means

    α_i = b_i α_1

for some scalars b_i (with b_1 = 1). Setting a = α_1 and b = (b_1, b_2, ..., b_n)^T gives A = ab^T.
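Both directions can be sanity-checked numerically; here is a minimal sketch (vectors a, b chosen arbitrarily):

```python
import numpy as np

a = np.array([1.0, -2.0, 3.0])
b = np.array([4.0, 5.0, 6.0, 7.0])
A = np.outer(a, b)        # A = a b^T, a 3-by-4 matrix

# every column of A is b_j * a, so the column space is span{a}:
assert np.linalg.matrix_rank(A) == 1
```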
Question 1.3. Show that if a matrix is orthogonal and triangular, then it is diagonal. What
are its diagonal elements?
Proof. Without losing any generality, suppose A = (a_ij) is orthogonal and upper triangular. Since A is upper triangular, its first column is (a_11, 0, ..., 0)^T. Because A is orthogonal, its columns have unit length and are mutually orthogonal; hence a_11^2 = 1, and for i > 1 the orthogonality of the first and i-th columns gives a_11 a_1i = 0. Therefore

    a_1i = ±δ_1i,

so the first row and first column of A vanish except for a_11 = ±1, which means A(2 : n, 2 : n) is again orthogonal and upper triangular. Thus, the result follows by induction, and the diagonal entries are ±1.
Question 1.4. A matrix is strictly upper triangular if it is upper triangular with zero diagonal elements. Show that if A is strictly upper triangular and n-by-n, then A^n = 0.
Proof. Suppose B is an n-by-n matrix, and partition B by its columns as B = (b_1, b_2, ..., b_n). Consider the i-th column of BA, which is

    (BA)(:, i) = Σ_{j=1}^{i−1} A(j, i) b_j,

because only the first i − 1 entries of the i-th column of A are nonzero. Thus the i-th column of BA is a linear combination of the first i − 1 columns of B. Substituting B = A shows that the diagonal and the first superdiagonal of A^2 are zero; substituting B = A^2 shows in addition that the second superdiagonal of A^3 is zero. Continuing in this way, after n steps the diagonal and all n − 1 superdiagonals of A^n vanish, i.e., A^n = 0.
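The nilpotency proved above is easy to check numerically for a random strictly upper triangular matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = np.triu(rng.random((n, n)), k=1)   # strictly upper triangular

# every path of length n through strictly increasing indices dies out,
# so A^n is exactly zero:
P = np.linalg.matrix_power(A, n)
assert np.allclose(P, 0)
```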
Question 1.5. Let ‖·‖ be a vector norm on R^m and assume that C ∈ R^{m×n}. Show that if rank(C) = n, then ‖x‖_C ≡ ‖Cx‖ is a vector norm.
Proof.
1. Positive definiteness.
For any x ∈ R^n, since ‖·‖ is a vector norm, we have ‖Cx‖ ≥ 0, with ‖Cx‖ = 0 if and only if Cx = 0. Consider the linear system

    Cx = 0.

Because C has full column rank, this system has only the zero solution. This proves the positive definiteness property.
2. Homogeneity.
For any x ∈ R^n and α ∈ R, ‖αx‖_C = ‖C(αx)‖ = |α| ‖Cx‖ = |α| ‖x‖_C.

3. Triangle inequality.
For any x, y ∈ R^n, ‖x + y‖_C = ‖C(x + y)‖ ≤ ‖Cx‖ + ‖Cy‖ = ‖x‖_C + ‖y‖_C.

Question 1.6. Let E be a matrix and s a nonzero vector. Show that

    ‖E(I − ss^T/(s^T s))‖_F^2 = ‖E‖_F^2 − ‖Es‖_2^2/(s^T s).
Proof. Since

    (E(I − ss^T/(s^T s))) (E(I − ss^T/(s^T s)))^T = E(I − ss^T/(s^T s))E^T = EE^T − Es(Es)^T/(s^T s),

it follows that

    ‖E(I − ss^T/(s^T s))‖_F^2 = tr(EE^T) − tr(Es(Es)^T)/(s^T s) = ‖E‖_F^2 − ‖Es‖_2^2/(s^T s).

Therefore, the question is proved.
Remark. It could also be proved by the projection property: for any u ∈ R^n, P = uu^T/(u^T u) is the orthogonal projection onto the subspace spanned by u, and I − P is its complementary projection.
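The identity of Question 1.6 can be verified numerically for random data (a sketch, not part of the original proof):

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.standard_normal((4, 6))
s = rng.standard_normal(6)

P = np.eye(6) - np.outer(s, s) / (s @ s)   # orthogonal projector onto s-perp
lhs = np.linalg.norm(E @ P, 'fro')**2
rhs = np.linalg.norm(E, 'fro')**2 - np.linalg.norm(E @ s)**2 / (s @ s)
assert abs(lhs - rhs) < 1e-10
```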
Question 1.7. Let x and y be vectors. Show that ‖xy^*‖_F = ‖xy^*‖_2 = ‖x‖_2 ‖y‖_2.

Proof. Because the Frobenius norm can be evaluated as ‖A‖_F^2 = tr(AA^*), it follows that

    ‖xy^*‖_F^2 = tr(xy^* y x^*) = ‖y‖_2^2 tr(xx^*) = ‖x‖_2^2 ‖y‖_2^2.

For the two-norm, since ‖A‖_2^2 = λ_max(AA^*), substitute A by xy^*: then ‖xy^*‖_2^2 = ‖y‖_2^2 λ_max(xx^*), and

    λ_max(xx^*) = max_{v≠0} (v^* x x^* v)/(v^* v) = max_{‖v‖_2=1} |v^* x|^2.

According to the Cauchy–Schwarz inequality, |v^* x| attains its greatest value exactly when v is proportional to x. Therefore λ_max(xx^*) ≤ ‖x‖_2^2, with equality at v = x/‖x‖_2. Consequently, ‖xy^*‖_2 = ‖x‖_2 ‖y‖_2.
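A numerical check of the rank-one norm identities (real case, random vectors):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
y = rng.standard_normal(7)
A = np.outer(x, y)                         # A = x y^T

nx, ny = np.linalg.norm(x), np.linalg.norm(y)
assert abs(np.linalg.norm(A, 2) - nx * ny) < 1e-10      # two-norm
assert abs(np.linalg.norm(A, 'fro') - nx * ny) < 1e-10  # Frobenius norm
```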
Question 1.8. One can identify the degree-d polynomials p(x) = Σ_{i=0}^{d} a_i x^i with R^{d+1} via the vector of coefficients. Let x be fixed. Let S_x be the set of polynomials with an infinite relative condition number with respect to evaluating them at x (i.e., they are zero at x). In a few words, describe S_x geometrically as a subset of R^{d+1}. Let S_x(κ) be the set of polynomials whose relative condition number is κ or greater. Describe S_x(κ) geometrically in a few words. Describe how S_x(κ) changes geometrically as κ → ∞.
Solution. Denote the coefficients of a polynomial of degree d by a = (a_1, a_2, ..., a_{d+1}), and the fixed vector by x = (1, x, ..., x^d). Since S_x consists of the polynomials that are zero at x, we have

    a^T x = 0.

Therefore, the set S_x is the hyperplane through the origin perpendicular to the fixed x (i.e., with normal vector x).
As for S_x(κ), set |a| = (|a_1|, |a_2|, ..., |a_{d+1}|) and |x| = (|x_1|, |x_2|, ..., |x_{d+1}|). By the definition of the relative condition number, we have

    |a|^T |x| / |a^T x| ≥ κ.

By Cauchy–Schwarz, ‖a‖_2 ‖x‖_2 ≥ |a|^T |x|, hence also

    ‖a‖_2 ‖x‖_2 / |a^T x| ≥ κ,

i.e., |cos ∠(a, x)| = |a^T x|/(‖a‖_2 ‖x‖_2) ≤ 1/κ. Since arccos is decreasing, this yields

    ∠(a, x) ≥ arccos(1/κ).

So the set S_x(κ) consists of the coefficient vectors a whose angle with the fixed x is at least arccos(1/κ). Thus, as κ → ∞, arccos(1/κ) → π/2, i.e., the set closes in on the hyperplane perpendicular to the fixed x.
Question 1.9. This question is too long, so I omit it. For details, please refer to the textbook.
Proof. For the first algorithm, insert a roundoff term (1 + δ_i) at each floating point operation. We get

    ŷ = log((1 + x)(1 + δ_1))/x · (1 + δ_2) = (log(1 + x) + log(1 + δ_1))/x · (1 + δ_2).

Since x, δ_i ≈ 0, the relative error is

    |(ŷ − y)/y| = |log(1 + δ_1)/log(1 + x) · (1 + δ_2) + δ_2| ≈ |δ_1/x|.

As δ_1 is bounded only by the machine precision, the relative error blows up as x → 0.
Using the same technique to analyse the second algorithm, with d = fl(1 + x) = (1 + x)(1 + δ_1) and ŷ = log(d)(1 + δ_2)/((d − 1)(1 + δ_1)), and noting that log(d)/(d − 1) ≈ log(1 + x)/x even though d ≠ 1 + x, its relative error is

    |(ŷ − y)/y| = |(1 + δ_2)/(1 + δ_1) − 1| ≈ |δ|, |δ| ≤ 2ε.

Thus the second algorithm remains accurate even as x → 0.
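The contrast between the two formulas is visible in a short experiment; this sketch assumes the second algorithm takes the form d = 1 + x, returning 1 when d == 1 and log(d)/(d − 1) otherwise:

```python
import math

def naive(x):
    # first algorithm: log(1 + x)/x
    return math.log(1 + x) / x

def stable(x):
    # second algorithm: reuse the rounded d = fl(1 + x) in both places
    d = 1.0 + x
    if d == 1.0:
        return 1.0
    return math.log(d) / (d - 1.0)

x = 1e-15
exact = 1 - x/2 + x*x/3          # Taylor series of log(1+x)/x, accurate here
err_naive = abs(naive(x) - exact)
err_stable = abs(stable(x) - exact)
assert err_stable < 1e-12 < err_naive   # naive loses ~10% accuracy at x = 1e-15
```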
where |δ_i| ≤ dε. Use this to prove the following fact. Let A_{m×n} and B_{n×p} be matrices, and compute their product in the usual way. Barring overflow or underflow, show that |fl(A·B) − A·B| ≤ n · ε · |A| · |B|. Here the absolute value of a matrix |A| means the matrix with entries (|A|)_{ij} = |a_{ij}|, and the inequality is meant componentwise.
Proof. Denote by δ_i the multiplication roundoff errors and by δ̂_j the addition roundoff errors. Expanding the formula with the roundoff terms (summing in order), we get

    fl(Σ_{i=1}^{d} x_i y_i) = Σ_{i=2}^{d} (1 + δ_i)(Π_{j=i−1}^{d−1}(1 + δ̂_j)) x_i y_i + (1 + δ_1)(Π_{j=1}^{d−1}(1 + δ̂_j)) x_1 y_1.

As

    1 − (d − i + 1)ε + O(ε^2) ≤ (1 − ε)^{d−i+1} ≤ Π_{j=i−1}^{d−1}(1 + δ̂_j) ≤ (1 + ε)^{d−i+1} ≤ 1 + (d − i + 1)ε + O(ε^2),

each x_i y_i is multiplied by a total factor 1 + δ̃_i with |δ̃_i| ≤ dε, as claimed.

Now, turn to the matrix inequality. Since the absolute value of a matrix is defined componentwise, we just need to prove it componentwise. Suppose c_{ij} = (A·B)(i, j) = Σ_{k=1}^{n} A(i, k)B(k, j). By the result above,

    fl(c_{ij}) = Σ_{k=1}^{n} A(i, k)B(k, j)(1 + δ_k), |δ_k| ≤ nε,

so |fl(c_{ij}) − c_{ij}| ≤ Σ_{k=1}^{n} |A(i, k)||B(k, j)| nε = nε (|A|·|B|)_{ij}.
Question 1.11. Let L be a lower triangular matrix and solve Lx = b by forward substitution. Show that, barring overflow or underflow, the computed solution x̂ satisfies (L + δL)x̂ = b, where |δl_ij| ≤ nε|l_ij| and ε is the machine precision. This means that forward substitution is backward stable. Argue that back substitution for solving upper triangular systems satisfies the same bound.
Proof. Insert a roundoff term (1 + δ) at each floating point operation in the forward substitution formula, and use the results of the last question. We get

    x_i = (b_i − Σ_{k=1}^{i−1} l_ik x_k (1 + δ_k)) / l_ii · (1 + δ̂_i)(1 + δ̄_i), where |δ_k| ≤ (i − 1)ε, |δ̂_i| ≤ ε, |δ̄_i| ≤ ε.

Rearranging and absorbing the roundoff terms into the coefficients yields

    b_i = Σ_{k=1}^{i} l_ik x_k (1 + δ_ik), where |δ_ik| ≤ max{2, i − 1}ε ≤ nε,

which is exactly (L + δL)x̂ = b with |δl_ik| ≤ nε|l_ik|. Back substitution is the same argument with the order of the unknowns reversed.

Remark. The bound |δ_k| ≤ (i − 1)ε above corresponds to evaluating

    b_i − (Σ_{k=1}^{i−1} l_ik x_k),

i.e., performing the subtraction once at the end. The question may instead intend the evaluation sequence

    (((b_i − l_i1 x_1) − l_i2 x_2) − l_i3 x_3) · · · ;

for this question, there is no major difference between the two.
Question 1.12. The question is too long, so I omit it. For details, please refer to the textbook.

Remark. In my view, the question intends to prove that complex arithmetic implemented in real arithmetic is backward stable. Thus, it may be sufficient to prove backward stability for each of the four basic complex operations.
Solving for δ_1, δ_2 from the above equations, we get

    δ_1 = ((a_1 + a_2)^2 δ̂ + (b_1 + b_2)^2 δ̄) / ((a_1 + a_2)^2 + (b_1 + b_2)^2),
    δ_2 = (a_1 + a_2)(b_1 + b_2)(δ̄ − δ̂) / ((a_1 + a_2)^2 + (b_1 + b_2)^2).

Since |δ_1|^2 + |δ_2|^2 ≤ |δ̂ + δ̄|^2 + |δ̄ − δ̂|^2 ≤ 4ε^2, we have proved the claim for complex addition. As subtraction is essentially the same as addition, the same holds for complex subtraction.
As for multiplication, use the same technique and the above result to obtain the corresponding equations. For simplicity, we ignore the higher-order epsilons, as they are not the major part of the error. Then, solving for δ_1 and δ_2, we get
with D = (a_1 a_2 − b_1 b_2)^2 + (a_1 b_2 + a_2 b_1)^2:

    δ_1 = (1/D) ( (a_1^2 a_2^2 − a_1 a_2 b_1 b_2)δ̄_1 + (a_2^2 b_1^2 + a_1 a_2 b_1 b_2)δ̄_2 + (a_1^2 b_2^2 + a_1 a_2 b_1 b_2)δ̄_3
                  − (a_1 a_2 b_1 b_2 + b_1^2 b_2^2)δ̂_4 + (a_1^2 a_2^2 − b_1^2 b_2^2 + a_2^2 b_1^2 + a_1^2 b_2^2)δ̂_1 − (a_1 b_1 b_2^2 + a_1 a_2^2 b_1)δ̂_2 ),

    δ_2 = (1/D) ( −(a_1^2 a_2 b_2 + a_1 a_2^2 b_1)δ̄_1 + (a_1 a_2^2 b_1 − a_2 b_1^2 b_2)δ̄_2 + (a_1^2 a_2 b_2 − a_1 b_1 b_2^2)δ̄_3
                  + (a_1 b_1 b_2^2 + a_2 b_1^2 b_2)δ̄_4 + ((a_1 a_2 − b_1 b_2)^2 + (a_1 b_2 + a_2 b_1)^2)δ̂_2 ).
We can claim that all the coefficients of the δ's are bounded, because of the mean value inequality. Therefore, the claim holds for complex multiplication. Now, we can check whether both the real and imaginary parts of the complex product are always computed accurately. Since fl((a_1 + b_1 i)(a_2 + b_2 i)) = (a_1 + b_1 i)(a_2 + b_2 i)(1 + δ_1 + δ_2 i), separate the relative errors of the real and imaginary parts as
    Error_r = |((a_1 a_2 − b_1 b_2)δ_1 − (a_1 b_2 + a_2 b_1)δ_2) / (a_1 a_2 − b_1 b_2)|,
    Error_i = |((a_1 a_2 − b_1 b_2)δ_2 − (a_1 b_2 + a_2 b_1)δ_1) / (a_1 b_2 + a_2 b_1)|.
Thus, if a_1 a_2 − b_1 b_2 ≈ 0 the real part may go wrong, and if a_1 b_2 + a_2 b_1 ≈ 0 the imaginary part may; so the two parts of the complex product are not always computed accurately, even though the product as a whole is.
At last, we check complex division using our knowledge of complex multiplication. My algorithm for complex division is the usual formula, except that the denominator a_2^2 + b_2^2 is real, so no actual complex multiplication is needed there.¹

¹ In reality, there are two division δ's. For simplicity I suppose they are the same, because there is no difference after the following merging of δ's. Otherwise, the procedure remains the same, but will be slightly more complicated.
Ignoring the higher-order epsilons, merge (1 + δ̄_1 + δ̄_2 i)(1 + δ̂_4), (1 + δ̂_1)(1 + δ̂_3), and (1 + δ̂_2)(1 + δ̂_3), and continue to use the old notation. We can solve for δ_1, δ_2 as

    δ_1 = δ̄_1 − (a_2^2 δ̂_1 + b_2^2 δ̂_2)/(a_2^2 + b_2^2),
    δ_2 = δ̄_2.
As all the coefficients of the δ's are at most one in absolute value, this proves the result for complex division. What's more, no matter whether |a| is large or small, a/a approximates 1 by the above formulas.

Remark. Since there is no major difference between the real and complex situations, for simplicity I prove the real case.
Proof. On one side, if A is s.p.d., then for any x, y ∈ R^n we may define an inner product as ⟨x, y⟩ = x^T Ay; verifying the four criteria of an inner product is straightforward. Conversely, an inner product determines A by a_ij = ⟨e_i, e_j⟩; this A is symmetric positive definite and is uniquely determined by the inner product (for the basis is fixed). Thus, the question is proved.
Therefore, we get the first inequality ‖x‖_2 ≤ ‖x‖_1 ≤ √n ‖x‖_2.

For the inequalities involving the infinity-norm, since it is obvious that

    max_i |x_i| ≤ √(Σ_i x_i^2) ≤ Σ_i |x_i|,

and

    n max_i |x_i|^2 ≥ Σ_{i=1}^{n} x_i^2, n max_i |x_i| ≥ Σ_{i=1}^{n} |x_i|,

the second inequality ‖x‖_∞ ≤ ‖x‖_2 ≤ √n ‖x‖_∞ and the third inequality ‖x‖_∞ ≤ ‖x‖_1 ≤ n ‖x‖_∞ follow.
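The three equivalences can be spot-checked on any vector:

```python
import numpy as np

x = np.array([3.0, -4.0, 12.0])
n = len(x)
n1 = np.abs(x).sum()          # one-norm
n2 = np.linalg.norm(x)        # two-norm
ninf = np.abs(x).max()        # infinity-norm

assert n2 <= n1 <= np.sqrt(n) * n2
assert ninf <= n2 <= np.sqrt(n) * ninf
assert ninf <= n1 <= n * ninf
```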
Question 1.15. Prove Lemma 1.6.

Proof. Let A ∈ R^{m×n}, ‖·‖_{mn} the operator norm, and ‖·‖_m, ‖·‖_n the corresponding vector norms. To prove the question, we just need to verify the three criteria.

1. Positive definiteness.
Since ‖A‖_{mn} = max_{‖x‖_n=1} ‖Ax‖_m, certainly ‖A‖_{mn} ≥ 0. If ‖A‖_{mn} = 0, then Ax = 0 for all x ∈ R^n, i.e., all of R^n solves the linear system Ax = 0. As a result, A = 0.

2. Homogeneity.
By the homogeneity of the vector norm, ‖αA‖_{mn} = max_{‖x‖_n=1} ‖αAx‖_m = |α| ‖A‖_{mn}.

3. Triangle inequality.

    ‖A + B‖_{mn} = max_{‖x‖_n=1} ‖(A + B)x‖_m ≤ max_{‖x‖_n=1} ‖Ax‖_m + max_{‖x‖_n=1} ‖Bx‖_m = ‖A‖_{mn} + ‖B‖_{mn}.
Question 1.16. (Parts of Lemma 1.7.)

1. ‖A‖_2 ≤ ‖A‖_F.
Partition A by its rows as A = (α_1^T, α_2^T, ..., α_m^T)^T. By Cauchy–Schwarz,

    ‖Ax‖_2^2 = Σ_{i=1}^{m} (α_i^T x)^2 ≤ Σ_{i=1}^{m} ‖α_i‖_2^2 ‖x‖_2^2 = ‖x‖_2^2 Σ_{i=1}^{m} ‖α_i‖_2^2 = ‖x‖_2^2 ‖A‖_F^2.
2. ‖AB‖ ≤ ‖A‖ · ‖B‖ for any operator norm or for the Frobenius norm.
For any operator norm, using ‖Ax‖ ≤ ‖A‖ ‖x‖ twice gives ‖ABx‖ ≤ ‖A‖ ‖Bx‖ ≤ ‖A‖ ‖B‖ ‖x‖, so ‖AB‖ ≤ ‖A‖ ‖B‖. For the Frobenius norm, partition A by its rows as A = (α_1^T, ..., α_m^T)^T and B by its columns as B = (β_1, β_2, ..., β_s). We get

    ‖AB‖_F^2 = Σ_{i=1}^{m} Σ_{j=1}^{s} (α_i^T β_j)^2 ≤ Σ_{i,j} ‖α_i‖_2^2 ‖β_j‖_2^2 = (Σ_{i=1}^{m} ‖α_i‖_2^2)(Σ_{j=1}^{s} ‖β_j‖_2^2) = ‖A‖_F^2 ‖B‖_F^2.
3. The max norm and Frobenius norm are not operator norms.
Consider A = αα^T with α = (1, ..., 1)^T ∈ R^n, n > 1, so ‖A‖_max = 1. Since A·A = nA, we have ‖A·A‖_max = n; but every operator norm is submultiplicative (part 2), which would force n = ‖A·A‖_max ≤ ‖A‖_max^2 = 1. Such a conflict means that ‖·‖_max is not an operator norm.

For the Frobenius norm, consider any operator norm of the identity matrix:

    ‖I‖ = max_{x≠0} ‖Ix‖/‖x‖ = max_{x≠0} ‖x‖/‖x‖ = 1.

While the Frobenius norm of I is √n > 1, it means that the Frobenius norm is not an operator norm.
4. ‖QAZ‖ = ‖A‖ if Q and Z are orthogonal or unitary, for the Frobenius norm and for the operator norm induced by ‖·‖_2.²
Separate the target equation into two: ‖QA‖ = ‖A‖ and ‖AZ‖ = ‖A‖. Since ‖A^T‖ = ‖A‖ holds for the Frobenius and two-norms (it is obvious for the Frobenius norm, and later in this question we prove it for the two-norm), we just need to prove the first equation. For the two-norm, since ‖A‖_2 = √(λ_max(A^T A)), we get

    ‖QA‖_2 = √(λ_max(A^T Q^T Q A)) = √(λ_max(A^T A)) = ‖A‖_2.
Next, for the Frobenius norm, note that for any vector x, ‖Qx‖_2 = ‖x‖_2, because ‖Qx‖_2^2 = x^T Q^T Q x = x^T x. Then, partition A by its columns as A = (α_1, α_2, ..., α_n). We have

    ‖QA‖_F^2 = Σ_{i=1}^{n} ‖Qα_i‖_2^2 = Σ_{i=1}^{n} ‖α_i‖_2^2 = ‖A‖_F^2.
2 This result can be extended into the case of the rectangular matrix. The proof is nearly the same, except that we
5. ‖A‖_∞ ≡ max_{x≠0} ‖Ax‖_∞/‖x‖_∞ = max_i Σ_j |a_ij| = maximum absolute row sum.
Partition A by its rows as A = (α_1^T, α_2^T, ..., α_n^T)^T. We have

    ‖A‖_∞ ≡ max_{x≠0} ‖Ax‖_∞/‖x‖_∞ = max_{‖x‖_∞=1, i} |α_i^T x| ≤ max_i Σ_{j=1}^{n} |a_ij|.

Denote by i the index of the row with maximum absolute row sum, and set y = (sign(a_i1), sign(a_i2), ..., sign(a_in))^T. The upper bound is attainable, as ‖y‖_∞ = 1 and ‖Ay‖_∞ = max_i Σ_j |a_ij|.
6. ‖A‖_1 = maximum absolute column sum.
Partition A by its columns as A = (α_1, ..., α_n), and denote by j the index of the column with maximum absolute column sum. Since ‖Ax‖_1 ≤ Σ_{i=1}^{n} |x_i| ‖α_i‖_1, we have

    max_{‖x‖_1=1} Σ_{i=1}^{n} |x_i| ‖α_i‖_1 = max_{‖x‖_1=1} ( Σ_{i≠j} |x_i| ‖α_i‖_1 + (1 − Σ_{i≠j} |x_i|) ‖α_j‖_1 )
                                            = max_{‖x‖_1=1} ( ‖α_j‖_1 − Σ_{i≠j} (‖α_j‖_1 − ‖α_i‖_1)|x_i| )
                                            ≤ ‖α_j‖_1,

and the bound is attained at x = e_j.
7. ‖A‖_2 = ‖A^T‖_2.³
Since ‖A‖_2 = √(λ_max(A^T A)), it suffices to prove that λ_max(A^T A) = λ_max(AA^T). Denote by λ a nonzero eigenvalue of A^T A and by x its corresponding eigenvector. Then AA^T(Ax) = A(A^T Ax) = λ(Ax) with Ax ≠ 0, so λ is also an eigenvalue of AA^T; by symmetry the nonzero eigenvalues of A^T A and AA^T coincide.

³ One can also use det(λI − AB) = det(λI − BA); we prove the version where A and B are square here. First assume A is invertible; then det(λI − AB) = det(λI − BA) is easy to prove. For singular A, perturb A as Ã = A + tI; Ã is invertible when t is small enough, and det(·) is continuous, so we have the result by taking the limit. This method was inspired by my friend, Zhou Zeren. There is another method in Chapter IV.
9. If A is n-by-n, then n^{−1/2} ‖A‖_2 ≤ ‖A‖_1 ≤ n^{1/2} ‖A‖_2.
Use the inequalities ‖x‖_2 ≤ ‖x‖_1 ≤ √n ‖x‖_2. For x ≠ 0, invert them:

    1/(√n ‖x‖_2) ≤ 1/‖x‖_1 ≤ 1/‖x‖_2.

As ‖A‖_p ≡ max_{x≠0} ‖Ax‖_p/‖x‖_p, combining the two chains of inequalities gives

    ‖Ax‖_2/(√n ‖x‖_2) ≤ ‖Ax‖_1/‖x‖_1 ≤ √n ‖Ax‖_2/‖x‖_2,

i.e., n^{−1/2} ‖A‖_2 ≤ ‖A‖_1 ≤ n^{1/2} ‖A‖_2. (A similar argument gives ‖A‖_2 ≤ ‖A‖_F ≤ n^{1/2} ‖A‖_2.)
Question 1.17. We mentioned that on a Cray machine the expression arccos(x/√(x^2 + y^2)) caused an error, because roundoff caused x/√(x^2 + y^2) to exceed 1. Show that this is impossible using IEEE arithmetic, barring overflow or underflow. Extra credit: Prove the same result using correctly rounded decimal arithmetic.
Remark. I am not sure about the correctness of my proof, since there is no major difference between decimal and binary arithmetic in it. That may be because I use a different method from the one the author intended. However, whether or not it is what was intended, my proof can serve as a reference.
Proof. If we can prove that fl(√(x^2)) = x exactly on a machine using IEEE arithmetic, we will get

    fl(x/√(x^2 + y^2)) ≤ fl(x/√(x^2)) = fl(x/x) = 1,

since x^2 + y^2 ≥ x^2 and rounding is monotonic; hence arccos(x/√(x^2 + y^2)) cannot cause an error. Thus, we only need to prove that fl(√(x^2)) = x. Denote fl(x^2) = x^2(1 + δ) with |δ| ≤ ε, where x is already a floating point number. Then, to compute fl(√(x^2)), the computer evaluates √(x^2(1 + δ)) ≈ x(1 + δ/2) and rounds it. However, as (√(fl(x^2)) − x)/x ≈ δ/2 and |δ|/2 ≤ ε/2, the δ/2 term is rounded away, resulting in fl(√(x^2)) = x.
Question 1.18. Suppose that a and b are normalized IEEE double precision floating point numbers, and consider the following algorithm, running with IEEE arithmetic:

    s_1 = a + b,
    s_2 = (a − s_1) + b.

Prove that:
1. Barring overflow or underflow, the only roundoff error committed in running the algorithm is computing s_1 = fl(a + b). In other words, both operations a − s_1 and (a − s_1) + b are computed exactly.
2. s_1 + s_2 = a + b, exactly. This means that s_2 is actually the roundoff error committed when rounding the exact value of a + b to get s_1.
Proof. To do addition or subtraction, a computer first equalizes the exponents of the two operands. For instance, suppose

    a = (1.a_1 a_2 a_3 ... a_52)_2 × 2^{e_1},
    b = (1.b_1 b_2 b_3 ... b_52)_2 × 2^{e_2}.

To add, the computer shifts both fractions to a common exponent and then adds them.

Therefore, if a is much larger than b — that is, the lowest-order bit of a still has greater weight than the highest-order bit of b — we have s_1 = fl(a + b) = a, and a − s_1 = a − a = 0 is exact, since a and b are already floating point numbers; then (a − s_1) + b = 0 + b = b is also exact.

If a is not much larger than b, suppose without losing any generality that the leading bit of b aligns with one of the low-order bits of a, say the 50th. Then s_1 = fl(a + b) has the same leading bit position as a, so rounding discards only bits below the last place of s_1. Since a and s_1 are close and share their high-order bits, a − s_1 is computed exactly, and a − s_1 has its leading bit no higher than that of b, so (a − s_1) + b is also exact.

The situation a < b is symmetric. In all cases, s_1 and s_2 together retain all the bits of a and b; therefore s_1 + s_2 = a + b exactly.
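The exactness of s_1 + s_2 = a + b can be verified with exact rational arithmetic; the sketch below swaps the operands so that |a| ≥ |b| (the case the proof treats first), which is the usual safe form of this two-sum trick:

```python
from fractions import Fraction

def two_sum(a, b):
    # the algorithm from the question; swap so that |a| >= |b|
    if abs(a) < abs(b):
        a, b = b, a
    s1 = a + b          # the only rounded operation
    s2 = (a - s1) + b   # both operations here are exact
    return s1, s2

a, b = 1e16, 1.2345
s1, s2 = two_sum(a, b)
# s1 + s2 equals a + b exactly as rational numbers:
assert Fraction(s1) + Fraction(s2) == Fraction(a) + Fraction(b)
assert s1 == a + b      # s1 is the rounded sum, s2 the rounding error
```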
Question 2.2. Consider solving AX = B for X , where A is n-by-n, and X and B are n-by-m.
There are two obvious algorithms. The first algorithm factorizes A = P LU using Gaussian elim-
ination and then solves for each column of X by forward and back substitution. The second al-
gorithm computes A −1 using Gaussian elimination and then multiplies X = A −1 B . Count the
number of flops required by each algorithm, and show that the first one requires fewer flops.
Solution. For the first algorithm, since permutation does not change the flop count, we can suppose matrix A is already permuted. Then write

    A = ( a_11 a_12 ... a_1n
          a_21 a_22 ... a_2n
          ...
          a_n1 a_n2 ... a_nn ).

To determine the first column of L requires n − 1 divisions. To update the rest of the matrix costs n − 1 multiplications and n − 1 subtractions per row, which totally costs 2(n − 1)^2 flops. Therefore, the total number of flops Gaussian elimination requires is

    Σ_{i=1}^{n} (2(i − 1)^2 + (i − 1)) = (2/3) n^3 + O(n^2).
Forward substitution for one right-hand side costs n^2 + O(n) flops, and it is the same for back substitution. Thus, the first method costs one LU decomposition plus m forward and back substitutions, in total

    (2/3) n^3 + 2mn^2 + O(n^2) + O(mn)

flops.
For the second algorithm, using Σ_{i=1}^{n} (2(i − 1)i + 2(n − i + 1)(i − 1)) = n^3 + O(n^2) flops, we reduce the augmented system from

    ( a_11 a_12 ... a_1n | 1           )
    ( a_21 a_22 ... a_2n |    1        )
    ( ...                |      ...    )
    ( a_n1 a_n2 ... a_nn |          1  )

to

    ( a_11 a'_12 ... a'_1n | 1                 )
    (      a'_22 ... a'_2n | b_21 1            )
    (            ...       | ...        ...    )
    (             a'_nn    | b_n1 b_n2 ... 1   ),

i.e., an upper triangular left half with the multipliers recorded on the right. Costing another Σ_{i=1}^{n} ((i − 1) + 2n(i − 1)) = n^3 + O(n^2) flops, we finish finding the inverse of A. As a result, forming A^{-1} costs 2n^3 + O(n^2) flops, and the product A^{-1}B costs another 2mn^2 flops, for a total of

    2n^3 + 2mn^2 + O(n^2) + O(mn)

flops. Therefore, when n and m are large enough, the first algorithm costs fewer flops.
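The two approaches can be compared directly; np.linalg.solve plays the role of "factor once, then substitute", and the leading-order flop counts derived above are checked symbolically:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 50, 10
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))

X1 = np.linalg.solve(A, B)      # algorithm 1: one factorization, m substitutions
X2 = np.linalg.inv(A) @ B       # algorithm 2: explicit inverse, then multiply
assert np.allclose(X1, X2, atol=1e-8)

# leading-order flop counts from the solution above:
flops_solve = 2/3 * n**3 + 2 * m * n**2
flops_inv = 2 * n**3 + 2 * m * n**2
assert flops_solve < flops_inv
```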
Question 2.3. Let ‖·‖ be the two-norm. Given a nonsingular matrix A and a vector b, show that for sufficiently small ‖δA‖ there are nonzero δA and δb such that the perturbation bound is attained with equality.
Proof. To find suitable δA and δb, let us glance at how the inequality arises. The original equation gives

    δx = A^{-1}(−δA · x̂ + δb).

Using the inequality ‖Bx‖ ≤ ‖B‖ · ‖x‖ twice, and ‖δA x̂ + δb‖ ≤ ‖δA x̂‖ + ‖δb‖, we get the target inequality. If we denote by y a vector which satisfies ‖A^{-1}‖ · ‖y‖ = ‖A^{-1} y‖, choose δA to satisfy ‖δA‖ · ‖x̂‖ = ‖δA x̂‖, and make δA x̂ proportional to δb, then all three inequalities become equalities. This suggests how to construct δA and δb.
Assume δA has the form δA = c ab^T, where c is a positive constant and a, b are vectors to be determined. According to Question 1.7, ‖δA‖ = c‖a‖ · ‖b‖. Then, as δA must satisfy ‖δA‖ · ‖x̂‖ = ‖δA x̂‖, we get

    c‖a‖ · ‖b‖ · ‖x̂‖ = c‖a b^T x̂‖ = c|b^T x̂| · ‖a‖ ≤ c‖a‖ · ‖b‖ · ‖x̂‖.

Therefore, for equality to hold, b must be proportional to x̂, while a can be chosen freely. Then, for δA x̂ to be proportional to δb, we can set b = x̂ and a = d δb, where d is another constant to be determined. As the final condition requires −δA x̂ + δb to be proportional to y, we take

    δA = d δb x̂^T, δb = (cd x̂^T x̂ + 1)^{-1} y.

We can see that both δA and δb are determined by known values, and due to the conditions we used, the target inequality becomes an equality under this choice of δA and δb, whose norms can be adjusted through c, d, and ‖y‖.
are attainable. [The full statement is omitted; please refer to the textbook.]

Remark. I could only prove that the bounds are attainable for certain A, whose inverse B = A^{-1} satisfies the following condition:

    (sgn(b_i1), sgn(b_i2), ..., sgn(b_in)) = ±(sgn(b_j1), sgn(b_j2), ..., sgn(b_jn)) for any i, j.

Proof. Consider the bound

    ‖δx‖/‖x̂‖ ≤ ε ‖ |A^{-1}| · |A| ‖.

Since

    |δx| = |A^{-1}(−δA x̂ + δb)| ≤ |A^{-1}|(|δA| · |x̂| + |δb|) ≤ |A^{-1}|(ε|A| · |x̂| + ε|b|) = ε |A^{-1}|(|A| · |x̂| + |b|),

for every inequality to be attained it is necessary that

    |δA| = ε|A|, |δb| = ε|b|,
which means we can only choose the signs of the entries of δA and δb. Working from the inside out, to make |δA x̂| = |δA| · |x̂| an equality, for each row, δA(i, j) and x̂_j should have the same sign. Therefore,

    (sgn(δA(i, 1)), sgn(δA(i, 2)), ..., sgn(δA(i, n))) = ±(sgn(x̂_1), sgn(x̂_2), ..., sgn(x̂_n)) for any i,

which reduces the degrees of freedom of δA to n (one sign per row). Next, to make |−δA x̂ + δb| = |δA x̂| + |δb|, we need, componentwise,

    sgn((−δA x̂)_i) = sgn(δb_i),
which fixes the signs of δb completely. For simplicity, denote z = −δA x̂ + δb. Using the remaining n degrees of freedom of δA, we can choose the sign of each z_i. According to the condition on A^{-1}, set

    sgn(z_i) = sgn(A^{-1}(j, i)) for every row j,

which is possible because the rows of A^{-1} share a common sign pattern up to a global sign. We then get |A^{-1} z| = |A^{-1}||z|. In all, equality holds throughout; take norms on both sides, and the result follows. For the second inequality, the procedure is the same: under the assumption δb = 0, the only difference is dividing by x̂ on both sides.
Question 2.5. Prove Theorem 2.3. Given the residual r = Ax̂ − b, use Theorem 2.3 to show that bound (2.9) is no larger than bound (2.7). This explains why LAPACK computes a bound based on (2.9), as described in section 2.4.4.

Proof. To prove Theorem 2.3, rearrange the equation (A + δA)x̂ = b + δb, yielding

    |Ax̂ − b| = |r| = |δb − δA x̂| ≤ |δb| + |δA x̂| ≤ |δb| + |δA| · |x̂| ≤ ε(|b| + |A| · |x̂|).

For the comparison, since all the quantities involved are nonnegative, we just need to prove that |r_i| ≤ ε(|A| · |x̂| + |b|)_i; as ε in (2.9) is the smallest value for which this holds, the targeted inequality follows. Therefore, bound (2.9) is no larger than bound (2.7).
Question 2.6. Prove Lemma 2.2.
Proof. First, consider a single-swap permutation matrix, i.e., a permutation matrix with only one pair of rows, say the i-th and j-th, exchanged. It agrees with the identity matrix except that

    P(i, i) = P(j, j) = 0, P(i, j) = P(j, i) = 1.
According to this form of the single-swap permutation matrix, det(P) = ±1. Given a matrix X = (α_1, α_2, ..., α_n) = (β_1^T, β_2^T, ..., β_n^T)^T (columns α_i, rows β_i^T), the products PX and XP have the form

    PX = (β_1^T, β_2^T, ..., β_j^T, ..., β_i^T, ..., β_n^T)^T,
    XP = (α_1, α_2, ..., α_j, ..., α_i, ..., α_n).

Mark this as property one. Then, to transform P back into the identity matrix I, one just needs to swap the i-th and j-th rows again; consequently P^{-1} = P for a single-swap permutation matrix.

For any permutation matrix P̂, decompose it into a series of single-swap matrices P_i, i = 1, 2, ..., m, i.e.,

    P̂ = P_m P_{m−1} · · · P_1.

According to what we have proved for single-swap matrices, it is only necessary to verify the property about P̂^{-1}. Since P̂^T = P_1 P_2 · · · P_m and each P_i satisfies P_i^2 = I, we have

    P̂ P̂^T = P̂^T P̂ = I.
Question 2.7. (Statement abridged: if a nonsingular symmetric A = LDM^T with L, M unit lower triangular and D diagonal, show that L = M.)

Proof. Write

    A = LDM^T = Σ_{i=1}^{n} D_ii α_i β_i^T,

where the α_i are the columns of L and the β_i are the columns of M. Recall that both L and M are unit lower triangular. Therefore D_ii α_i β_i^T has nonzero entries only in the trailing (i : n, i : n) submatrix, and α_i(i) = β_i(i) = 1.

As A is nonsingular and symmetric, we know D_11 ≠ 0, and comparing the first column and first row of A gives D_11 α_1 = D_11 β_1, which means α_1 = β_1. Then denote A_1 = A − D_11 α_1 β_1^T and apply the same technique to A_1(2 : n, 2 : n), which is also nonsingular and symmetric; since D_ii α_i β_i^T has nonzero entries only in the (i : n, i : n) submatrix, we can prove that α_2 = β_2. Continuing in this way proves L = M.
Question 2.8. Consider the following two ways of solving the 2-by-2 linear system of equations

    Ax = ( a_11 a_12 ; a_21 a_22 ) (x_1 ; x_2) = (b_1 ; b_2) = b:

Gaussian elimination with partial pivoting, and Cramer's rule:

    det = a_11 · a_22 − a_12 · a_21,
    x_1 = (a_22 · b_1 − a_12 · b_2) / det,
    x_2 = (−a_21 · b_1 + a_11 · b_2) / det.

Show by means of a numerical example that Cramer's rule is not backward stable.
Proof. To verify whether these two algorithms are backward stable, it is enough to check whether

    ‖δx‖/‖x̂‖ = O(ε).

Consider the following example (computed in low-precision decimal arithmetic):

    ( 1.112 1.999 ; 3.398 6.123 ) · x = ( 2.003 ; 6.122 ).

Computing the solution by each algorithm and comparing with the exact solution gives the errors

    δx_1 = (0.19688, −0.1091134)^T, δx_2 = (−0.23712, −0.0341134)^T.

It follows that ‖δx_2‖_1/‖x‖_1 ≈ 0.156894 ≈ 156ε = O(1), which suggests that the second algorithm, Cramer's rule, is not backward stable.
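The same effect shows up in double precision with a different, classical ill-conditioned 2-by-2 system (my example, not the one above; its exact solution is (2, −2)). A backward-stable solver leaves a residual of order machine epsilon; Cramer's rule need not:

```python
import numpy as np

def cramer(A, b):
    det = A[0, 0]*A[1, 1] - A[0, 1]*A[1, 0]
    x1 = ( A[1, 1]*b[0] - A[0, 1]*b[1]) / det
    x2 = (-A[1, 0]*b[0] + A[0, 0]*b[1]) / det
    return np.array([x1, x2])

A = np.array([[0.2161, 0.1441],
              [1.2969, 0.8648]])     # det is about -1e-8: severe cancellation
b = np.array([0.1440, 0.8642])

x_c = cramer(A, b)
x_g = np.linalg.solve(A, b)          # backward-stable GEPP
res = lambda x: np.linalg.norm(A @ x - b) / (np.linalg.norm(A) * np.linalg.norm(x))

assert np.allclose(x_g, [2.0, -2.0], atol=1e-3)   # both are forward-accurate here
assert np.allclose(x_c, [2.0, -2.0], atol=1e-3)
assert res(x_g) < 1e-12                           # tiny relative residual from GEPP
assert res(x_c) >= res(x_g)                       # Cramer's residual is no better
```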
Question 2.9. Let B be an n-by-n upper bidiagonal matrix, i.e., nonzero only on the main diagonal and first superdiagonal. Derive an algorithm for computing κ_∞(B) ≡ ‖B‖_∞ ‖B^{-1}‖_∞ exactly (ignoring roundoff). In other words, you should not use an iterative algorithm such as Hager's estimator. Your algorithm should be as cheap as possible; it should be possible to do it using no more than 2n − 2 additions, n multiplications, n divisions, 4n − 2 absolute values, and 2n − 2 comparisons.
Solution. For a bidiagonal matrix to have an inverse, all the diagonal entries of B must be nonzero. Write

    B = ( b_1,1 b_1,2
                b_2,2 b_2,3
                      ...
                      b_{n−1,n−1} b_{n−1,n}
                                  b_{n,n} ).

Then ‖B‖_∞ = max_i (|b_i,i| + |b_i,i+1|), with b_{n,n+1} = 0. For ‖B^{-1}‖_∞, the absolute row sum s_i of the i-th row of B^{-1} can be computed from the (i+1)-th by the recursion

    s_n = |1/b_n,n|, s_i = s_{i+1} |b_i,i+1 / b_i,i| + |1/b_i,i|.
Assume that the 2n − 1 absolute value operations have been done. The condition number can then be computed as:

1: MAX1 = |B(n, n)|, MAX2 = 1/|B(n, n)|, LASTSUM = MAX2
2: for i = n − 1 downto 1 do
3:   TEMP = |B(i, i)| + |B(i, i + 1)|
4:   if TEMP > MAX1 then
5:     MAX1 = TEMP
6:   end if
7:   LASTSUM = (LASTSUM · |B(i, i + 1)| + 1)/|B(i, i)|
8:   if LASTSUM > MAX2 then
9:     MAX2 = LASTSUM
10:   end if
11: end for
12: return MAX1 · MAX2

In total, this uses 2n − 2 additions, n − 1 multiplications, n divisions, 2n − 1 absolute values, and 2n − 2 comparisons, which meets the requirement.
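A sketch of this O(n) algorithm in Python, checked against the condition number computed from the explicit inverse:

```python
import numpy as np

def cond_inf_bidiag(d, e):
    """kappa_inf(B) for upper bidiagonal B with diagonal d (length n)
    and superdiagonal e (length n-1)."""
    n = len(d)
    max1 = abs(d[-1])                 # running max of absolute row sums of B
    s = 1.0 / abs(d[-1])              # absolute row sum of row n of B^{-1}
    max2 = s
    for i in range(n - 2, -1, -1):    # recursion s_i = (s_{i+1}|e_i| + 1)/|d_i|
        max1 = max(max1, abs(d[i]) + abs(e[i]))
        s = (s * abs(e[i]) + 1.0) / abs(d[i])
        max2 = max(max2, s)
    return max1 * max2

d = np.array([2.0, -3.0, 1.5, 4.0])
e = np.array([0.5, 1.0, -2.0])
B = np.diag(d) + np.diag(e, 1)
kappa_ref = np.linalg.norm(B, np.inf) * np.linalg.norm(np.linalg.inv(B), np.inf)
assert abs(cond_inf_bidiag(d, e) - kappa_ref) < 1e-9 * kappa_ref
```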
Question 2.10. Let A be n-by-m with n ≥ m. Show that ‖A^T A‖_2 = ‖A‖_2^2 and κ_2(A^T A) = κ_2(A)^2. Let M be n-by-n and positive definite and L be its Cholesky factor, so that M = LL^T. Show that ‖M‖_2 = ‖L‖_2^2 and κ_2(M) = κ_2(L)^2.
Proof. As A may be rectangular, let A = UΣV^T be the SVD of A. Since A^T A = VΣ^2 V^T, we get ‖A^T A‖_2 = σ_max(A)^2 = ‖A‖_2^2. The equality κ_2(A^T A) = κ_2(A)^2 also follows from A^T A = VΣ^2 V^T, which implies σ_min(A^T A) = σ_min(A)^2. Since the Cholesky factor L is square and invertible, applying the above result with A = L^T proves the statements about M.
Question 2.11. Let A be symmetric and positive definite. Show that |a_ij| ≤ (a_ii a_jj)^{1/2}.

Proof. Since A is s.p.d., all its diagonal entries are positive. For i ≠ j, consider the vectors (nonzero only in positions i and j)

    x_ij = (0, ..., √a_jj, ..., −√a_ii, ..., 0)^T,
    y_ij = (0, ..., √a_jj, ...,  √a_ii, ..., 0)^T,

yielding

    x_ij^T A x_ij = a_jj a_ii − 2√(a_ii a_jj) a_ij + a_ii a_jj = 2a_ii a_jj − 2√(a_ii a_jj) a_ij ≥ 0,
    y_ij^T A y_ij = a_jj a_ii + 2√(a_ii a_jj) a_ij + a_ii a_jj = 2a_ii a_jj + 2√(a_ii a_jj) a_ij ≥ 0.

Together these give |a_ij| ≤ √(a_ii a_jj).
Question 2.13. In this question we will ask how to solve By = c given a fast way to solve Ax = b, where A − B is "small" in some sense.

1. Prove the Sherman–Morrison formula and the Sherman–Morrison–Woodbury formula (statement abridged; see the textbook).

2. If you have a fast algorithm to solve Ax = b, show how to build a fast solver for By = c, where B = A + uv^T.

3. Suppose that ‖A − B‖ is "small" and you have a fast algorithm for solving Ax = b. Describe an iterative scheme for solving By = c. How fast do you expect your algorithm to converge?

Proof.
1. First, we prove that det(A + uv^T) = det(A)(1 + v^T A^{-1} u). Consider the two factorizations

    ( I 0 ; v^T A^{-1} 1 )( A −u ; 0 1 + v^T A^{-1} u ) = ( A −u ; v^T 1 ) = ( I −u ; 0 1 )( A + uv^T 0 ; v^T 1 ).
Taking determinants on both sides, the result follows. Then, we can verify the Sherman–Morrison formula as follows:

    (A + uv^T)(A^{-1} − (A^{-1}uv^T A^{-1})/(1 + v^T A^{-1}u))
      = I + uv^T A^{-1} − (uv^T A^{-1} + uv^T A^{-1}uv^T A^{-1})/(1 + v^T A^{-1}u)
      = I + uv^T A^{-1} − (uv^T A^{-1})(1 + v^T A^{-1}u)/(1 + v^T A^{-1}u)
      = I,

    (A^{-1} − (A^{-1}uv^T A^{-1})/(1 + v^T A^{-1}u))(A + uv^T)
      = I + A^{-1}uv^T − (A^{-1}uv^T + A^{-1}uv^T A^{-1}uv^T)/(1 + v^T A^{-1}u)
      = I + A^{-1}uv^T − (A^{-1}uv^T)(1 + v^T A^{-1}u)/(1 + v^T A^{-1}u)
      = I.

Thus, we prove the first part of part 1.
For the Sherman–Morrison–Woodbury formula, we first prove that det(A + UV^T) = det(A) det(I + V^T A^{-1}U). Consider the two factorizations

    ( I 0 ; V^T A^{-1} I )( A −U ; 0 I + V^T A^{-1}U ) = ( A −U ; V^T I ) = ( I −U ; 0 I )( A + UV^T 0 ; V^T I ).

Taking determinants on both sides, the result follows; as A is nonsingular, the invertibility of A + UV^T and of T = I + V^T A^{-1}U are mutually necessary and sufficient. Then, verify the matrix equation as follows:

    (A + UV^T)(A^{-1} − A^{-1}U T^{-1}V^T A^{-1})
      = I − U T^{-1}V^T A^{-1} + UV^T A^{-1} − UV^T A^{-1}U T^{-1}V^T A^{-1}
      = I + UV^T A^{-1} − (U + UV^T A^{-1}U) T^{-1}V^T A^{-1}
      = I + UV^T A^{-1} − U(I + V^T A^{-1}U) T^{-1}V^T A^{-1}
      = I,

    (A^{-1} − A^{-1}U T^{-1}V^T A^{-1})(A + UV^T)
      = I − A^{-1}U T^{-1}V^T + A^{-1}UV^T − A^{-1}U T^{-1}V^T A^{-1}UV^T
      = I + A^{-1}UV^T − A^{-1}U T^{-1}(I + V^T A^{-1}U)V^T
      = I.
2. Denote by x_1, x_2 the solutions of Ax_1 = u and Ax_2 = c. According to what we have proved, to solve By = c we have

    y = B^{-1}c = (A^{-1} − (A^{-1}uv^T A^{-1})/(1 + v^T A^{-1}u))c = x_2 − ((v^T x_2) x_1)/(1 + v^T x_1).

Therefore, using the above formula, the solution can be computed from two solves with A plus a few inner products, all of which are fast.
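A sketch of this rank-one update solver, with np.linalg.solve standing in for the "fast" solver for A:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well conditioned
u = rng.standard_normal(n)
v = rng.standard_normal(n)
c = rng.standard_normal(n)

x1 = np.linalg.solve(A, u)                        # A x1 = u
x2 = np.linalg.solve(A, c)                        # A x2 = c
y = x2 - (v @ x2) * x1 / (1.0 + v @ x1)           # Sherman-Morrison combination

assert np.allclose((A + np.outer(u, v)) @ y, c)   # y solves (A + u v^T) y = c
```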
3. Since
\[
B y = (A + B - A) y = A y - (A - B) y = c,
\]
we get the equation
\[
y = A^{-1} c + A^{-1}(A - B) y.
\]
What we need to do is find its fixed point, so we use the fixed point iteration
\[
y_{i+1} = A^{-1} c + A^{-1}(A - B) y_i,
\]
or in pseudocode:
1: while ‖r‖ > tol do
2:   solve Ax = r
3:   y = y + x
4:   r = c − By
5: end while
For the fixed point iteration to converge, the norm of the iteration matrix A^{-1}(A − B) must be smaller than one. Since ‖A − B‖ is small, this holds, and the error shrinks by a factor of roughly ‖A^{-1}(A − B)‖ at every step, i.e., the iteration converges linearly and quickly. The same conclusion can also be reached from the iterative refinement perspective.
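The pseudocode loop above can be sketched directly (my own illustration; again `solve_a` stands in for the assumed fast solver for A):

```python
import numpy as np

def solve_nearby(solve_a, B, c, tol=1e-12, maxit=100):
    """Solve B y = c by fixed point iteration, using a solver for the nearby matrix A."""
    y = np.zeros_like(c)
    r = c - B @ y
    it = 0
    while np.linalg.norm(r) > tol and it < maxit:
        x = solve_a(r)        # solve A x = r
        y = y + x
        r = c - B @ y
        it += 1
    return y

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n)) + 4 * n * np.eye(n)
B = A + 1e-3 * rng.standard_normal((n, n))   # ||A - B|| is small
c = rng.standard_normal(n)

y = solve_nearby(lambda rhs: np.linalg.solve(A, rhs), B, c)
print(np.allclose(B @ y, c))  # True
```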
Question 2.14. Programming question. Skipped.
Question 2.16. Show how to reorganize the Cholesky algorithm (Algorithm 2.11) to do most of its operations using Level 3 BLAS. Mimic Algorithm 2.10.
Solution. Since Cholesky factorization needs no pivoting, we first modify Algorithm 2.11 to handle rectangular panels. In detail, for an m-by-b rectangular panel, use the original Algorithm 2.11 on the top b-by-b submatrix, and use Level 2 BLAS to update the lower (m − b)-by-b part, yielding
1: for i = 1 to b do
2:   A(i : m, i) = A(i : m, i)/√A(i, i)
3:   for j = i + 1 to b do
4:     A(j : b, j) = A(j : b, j) − A(j, i) A(j : b, i)
5:   end for
6:   A(b + 1 : m, i + 1 : b) = A(b + 1 : m, i + 1 : b) − A(b + 1 : m, i) · A(i + 1 : b, i)^T
7: end for
Then, by delaying the update of the trailing submatrix by b steps, we get a Level 3 BLAS Cholesky. In detail, at each stage factor the current m-by-b panel with the routine above, then update the trailing submatrix with a single matrix-matrix product (a Level 3 BLAS operation), and continue with the trailing submatrix.
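The blocked scheme just described can be sketched as follows (my own illustration, with 0-based indexing rather than the 1-based pseudocode):

```python
import numpy as np

def chol_unblocked(P):
    """In-place Cholesky of an m-by-b panel: factors the top b-by-b block and
    computes the b columns of L below it (Level 1/2 BLAS operations only)."""
    m, b = P.shape
    for i in range(b):
        P[i, i] = np.sqrt(P[i, i])
        P[i+1:, i] /= P[i, i]
        P[i+1:, i+1:b] -= np.outer(P[i+1:, i], P[i+1:b, i])
    return P

def chol_blocked(S, b=2):
    """Right-looking blocked Cholesky: panel factorization plus one delayed
    matrix-matrix (Level 3 BLAS) update of the trailing submatrix per block step."""
    A = np.tril(S).astype(float)         # only the lower triangle is referenced
    n = A.shape[0]
    for k in range(0, n, b):
        kb = min(k + b, n)
        chol_unblocked(A[k:, k:kb])      # factor the current panel in place
        L21 = A[kb:, k:kb]
        A[kb:, kb:] -= L21 @ L21.T       # delayed Level 3 BLAS update
    return np.tril(A)

rng = np.random.default_rng(3)
M = rng.standard_normal((7, 7))
S = M @ M.T + 7 * np.eye(7)              # symmetric positive definite
L = chol_blocked(S, b=3)
print(np.allclose(L @ L.T, S))           # True
```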
Question 2.17. Suppose that, in Matlab, you have an n-by-n matrix A and an n-by-1 matrix b. What do A\b, b'/A, and A/b mean in Matlab? How does A\b differ from inv(A)*b?⁴
Question 2.18.
1. Show that after k steps of Gaussian elimination without pivoting, A₂₂ has been overwritten by S.
Proof.
⁴ I don't use Matlab much, so this answer may need correction.
1. As we know, one step of Gaussian elimination overwrites the trailing (n − 1)-by-(n − 1) submatrix with its Schur complement. Assuming this holds after k − 1 steps of Gaussian elimination, it suffices to prove it is also true after k steps: partition
\[
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
\]
with A_{11} of size k-by-k; carrying out the kth elimination step shows that the trailing block is again overwritten by the Schur complement S = A_{22} − A_{21} A_{11}^{-1} A_{12}. Moreover, when A is symmetric positive semidefinite (so A_{21} = A_{12}^T), choosing y = −A_{11}^{-1} A_{12} x gives, for every x,
\[
0 \le \begin{pmatrix} y \\ x \end{pmatrix}^T A \begin{pmatrix} y \\ x \end{pmatrix}
= x^T A_{22} x - (A_{12} x)^T A_{11}^{-1} (A_{12} x)
= x^T (A_{22} - A_{21} A_{11}^{-1} A_{12})\, x,
\]
so the Schur complement is again positive semidefinite.
Question 2.19. Matrix A is called strictly column diagonally dominant, or diagonally dominant for short, if
\[
|a_{ii}| > \sum_{j=1,\, j \ne i}^{n} |a_{ji}|.
\]
2. Show that Gaussian elimination with partial pivoting does not actually permute any
rows, i.e., that it is identical to Gaussian elimination without pivoting.
Proof.
1. By a Gershgorin-type argument applied to the columns, every real eigenvalue λ of A satisfies, for some i,
\[
a_{ii} - \sum_{j \ne i} |a_{ji}| \le \lambda \le a_{ii} + \sum_{j \ne i} |a_{ji}|.
\]
When a_{ii} > 0 we have λ > 0; otherwise a_{ii} < 0 results in λ < 0. Since A is strictly column diagonally dominant, a_{ii} ≠ 0, and the bounds above exclude zero. Thus we prove λ ≠ 0, which means det(A) = det(A^T) = ∏_{i=1}^{n} λ_i ≠ 0, i.e., A is nonsingular.
2. Since |a_{11}| > ∑_{j≠1} |a_{j1}|, the first pivot has the largest magnitude in its column, so the claim holds for the base case. If we can prove that the updated submatrix is still strictly column diagonally dominant, the question follows by induction. Denote
\[
a'_{ij} = a_{ij} - a_{1j}\, a_{i1}/a_{11}.
\]
Summing over column j, we get
\[
\sum_{i=2,\, i \ne j}^{n} |a'_{ij}|
\le \sum_{i=2,\, i \ne j}^{n} \bigl( |a_{ij}| + |a_{1j} a_{i1}/a_{11}| \bigr)
< \bigl( |a_{jj}| - |a_{1j}| \bigr) + |a_{1j}/a_{11}| \bigl( |a_{11}| - |a_{j1}| \bigr)
= |a_{jj}| - |a_{1j} a_{j1}/a_{11}|
\le |a'_{jj}|,
\]
using the strict column dominance of columns j and 1 in the middle step and the reverse triangle inequality in the last. Hence each updated column is again strictly diagonally dominant; in particular |a'_{jj}| is the largest magnitude in its column, so partial pivoting never permutes rows.
Question 2.20. Given an n-by-n nonsingular matrix A, how do you efficiently solve the follow-
ing problems, using Gaussian elimination with partial pivoting?
(b) Compute α = c^T A^{-1} b.
You should (1) describe your algorithms, (2) present them in pseudocode, and (3) give the required flops.
Solution.
(a) Factor A once with Gaussian elimination with partial pivoting, then reuse the factors for each of the k triangular solves:
1: for i = 1 to k do
2:   solve Ax = b
3:   b = x
4: end for
(b) First solve Ax = b, then compute the inner product. The pseudocode is shown below.
1: solve Ax = b
2: α = c^T x
In total, it requires \frac{2}{3} n^3 + O(n^2) flops, dominated by the single LU factorization.
Question 2.21. Prove that Strassen’s algorithm (Algorithm 2.8) correctly multiplies n-by-n ma-
trices, where n is a power of 2.
Proof. Since n is a power of 2, all the matrix partitions and multiplications are well defined. To prove the result, it suffices to prove the correctness of the recursive step. Expanding the products in Strassen's algorithm, we have
\[
\begin{aligned}
P_1 &= A_{12}B_{21} + A_{12}B_{22} - A_{22}B_{21} - A_{22}B_{22}, & P_4 &= A_{11}B_{22} + A_{12}B_{22},\\
P_2 &= A_{11}B_{11} + A_{11}B_{22} + A_{22}B_{11} + A_{22}B_{22}, & P_5 &= A_{11}B_{12} - A_{11}B_{22},\\
P_3 &= A_{11}B_{11} + A_{11}B_{12} - A_{21}B_{11} - A_{21}B_{12}, & P_6 &= A_{22}B_{21} - A_{22}B_{11},\\
P_7 &= A_{21}B_{11} + A_{22}B_{11},
\end{aligned}
\]
so that the combinations in the algorithm simplify to
\[
C_{11} = A_{11}B_{11} + A_{12}B_{21}, \quad C_{12} = A_{11}B_{12} + A_{12}B_{22}, \quad
C_{21} = A_{21}B_{11} + A_{22}B_{21}, \quad C_{22} = A_{21}B_{12} + A_{22}B_{22}.
\]
Therefore,
\[
AB = \begin{pmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{pmatrix}
= \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}.
\]
So the recursion yields the correct answer, and the base case of Strassen's algorithm is just scalar arithmetic. So Strassen's algorithm correctly multiplies n-by-n matrices, where n is a power of 2.
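The recursion can also be checked numerically with a short implementation (my own sketch, using the seven products and combination formulas expanded above):

```python
import numpy as np

def strassen(A, B):
    """Strassen multiplication for n-by-n matrices, n a power of 2."""
    n = A.shape[0]
    if n == 1:
        return A * B  # base case: scalar arithmetic
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    P1 = strassen(A12 - A22, B21 + B22)
    P2 = strassen(A11 + A22, B11 + B22)
    P3 = strassen(A11 - A21, B11 + B12)
    P4 = strassen(A11 + A12, B22)
    P5 = strassen(A11, B12 - B22)
    P6 = strassen(A22, B21 - B11)
    P7 = strassen(A21 + A22, B11)
    C11 = P1 + P2 - P4 + P6
    C12 = P4 + P5
    C21 = P6 + P7
    C22 = P2 - P3 + P5 - P7
    return np.block([[C11, C12], [C21, C22]])

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 8))
B = rng.standard_normal((8, 8))
print(np.allclose(strassen(A, B), A @ B))  # True
```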
Proof. The difference between CGS and MGS lies in how the coefficients r_{ij} are computed. CGS keeps using the original vectors a_i, while MGS uses the updated vectors, which after j steps of updates become
\[
q_i = a_i - \sum_{k=1}^{j} r_{ki}\, q_k.
\]
Hence, in exact arithmetic,
\[
r_{(j+1)i} = q_{j+1}^T q_i = q_{j+1}^T \Bigl( a_i - \sum_{k=1}^{j} r_{ki}\, q_k \Bigr) = q_{j+1}^T a_i,
\]
since q_{j+1}^T q_k = 0 for k ≤ j; so the two algorithms compute the same coefficients in exact arithmetic.
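The two variants agree in exact arithmetic but behave very differently in floating point; a small sketch of both (my own illustration, on a test matrix with controlled singular values) makes the loss of orthogonality visible:

```python
import numpy as np

def gram_schmidt(A, modified):
    """QR via classical (modified=False) or modified (modified=True) Gram-Schmidt."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        q = A[:, i].copy()
        for j in range(i):
            # CGS projects out the *original* column; MGS the *running* vector
            R[j, i] = Q[:, j] @ (q if modified else A[:, i])
            q = q - R[j, i] * Q[:, j]
        R[i, i] = np.linalg.norm(q)
        Q[:, i] = q / R[i, i]
    return Q, R

# ill-conditioned test matrix: orthogonal factors times graded singular values
rng = np.random.default_rng(5)
U0, _ = np.linalg.qr(rng.standard_normal((60, 30)))
V0, _ = np.linalg.qr(rng.standard_normal((30, 30)))
A = U0 @ np.diag(10.0 ** (-np.arange(30) / 4.0)) @ V0.T

Q_c, _ = gram_schmidt(A, modified=False)
Q_m, _ = gram_schmidt(A, modified=True)
err_cgs = np.linalg.norm(Q_c.T @ Q_c - np.eye(30))
err_mgs = np.linalg.norm(Q_m.T @ Q_m - np.eye(30))
print(err_cgs, err_mgs)   # CGS loses orthogonality far faster than MGS
```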
2. What is the condition number of the coefficient matrix in terms of the singular values of A?
3. Give an explicit expression for the inverse of the coefficient matrix, as a block 2-by-2 ma-
trix.
Solution.
1. We can decompose the equation
\[
\begin{pmatrix} I & A \\ A^T & 0 \end{pmatrix} \begin{pmatrix} r \\ x \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix}
\]
into the following two equations:
\[
r + Ax = b, \qquad A^T r = 0.
\]
Multiplying the first equation by A^T and using the second, we get
\[
A^T r + A^T A x = A^T A x = A^T b,
\]
which means x satisfies the normal equations. Therefore, x minimizes ‖Ax − b‖₂.
2. Since A has full column rank, let A = UΣV^T be the full SVD of A, where U and V are m-by-m and n-by-n orthogonal matrices and Σ is an m-by-n diagonal matrix with nonzero diagonal entries. Applying this orthogonal similarity transformation to the coefficient matrix, we have
\[
\begin{pmatrix} U^T & \\ & V^T \end{pmatrix}
\begin{pmatrix} I_m & A \\ A^T & \end{pmatrix}
\begin{pmatrix} U & \\ & V \end{pmatrix}
= \begin{pmatrix} I_m & \Sigma \\ \Sigma^T & \end{pmatrix}.
\]
Since Σ has full column rank, a symmetric permutation decouples this matrix into the 2-by-2 blocks \begin{pmatrix} 1 & \sigma_i \\ \sigma_i & 0 \end{pmatrix}, one per singular value σ_i, plus m − n diagonal entries equal to 1. The eigenvalues are therefore (1 ± \sqrt{1 + 4\sigma_i^2})/2 together with 1, and the condition number of the coefficient matrix is the ratio of the largest to the smallest of these in absolute value.
3. To find its inverse, we use invertible matrices to transform the coefficient matrix into the identity matrix as follows:
\[
\begin{pmatrix} I_m & \\ & -(A^T A)^{-1} \end{pmatrix}
\begin{pmatrix} I_m & \\ -A^T & I_n \end{pmatrix}
\begin{pmatrix} I_m & A \\ A^T & \end{pmatrix}
\begin{pmatrix} I_m & -A \\ & I_n \end{pmatrix}
= \begin{pmatrix} I_m & \\ & I_n \end{pmatrix}.
\]
Thus, we have the candidate matrix
\[
B = \begin{pmatrix} I_m & -A \\ & I_n \end{pmatrix}
\begin{pmatrix} I_m & \\ & -(A^T A)^{-1} \end{pmatrix}
\begin{pmatrix} I_m & \\ -A^T & I_n \end{pmatrix}
= \begin{pmatrix} I_m - A(A^T A)^{-1} A^T & A(A^T A)^{-1} \\ (A^T A)^{-1} A^T & -(A^T A)^{-1} \end{pmatrix}.
\]
It is easy to check that
\[
\begin{pmatrix} I & A \\ A^T & 0 \end{pmatrix} B = B \begin{pmatrix} I & A \\ A^T & 0 \end{pmatrix} = I_{m+n}.
\]
Consequently, B is the inverse of the coefficient matrix.
\[
A^T C A x = A^T C b.
\]
To derive a formulation like that of the previous question, consider the following symmetric nonsingular linear system of equations:
\[
\begin{pmatrix} I & C^{1/2} A \\ A^T C^{1/2} & 0 \end{pmatrix} \begin{pmatrix} r \\ x \end{pmatrix}
= \begin{pmatrix} C^{1/2} b \\ 0 \end{pmatrix}.
\]
Thus, if we take C^{1/2}A as Ã and C^{1/2}b as b̃, we have the same equation as before, and the previous results also apply.
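The augmented-system formulation of the (unweighted) problem can be verified against a standard least squares solver (my own sketch in NumPy):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# augmented system [[I, A], [A^T, 0]] [r; x] = [b; 0]
K = np.block([[np.eye(m), A], [A.T, np.zeros((n, n))]])
rx = np.linalg.solve(K, np.concatenate([b, np.zeros(n)]))
r, x = rx[:m], rx[m:]

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ls))        # x solves the least squares problem
print(np.allclose(r, b - A @ x))   # r is the residual
print(np.allclose(A.T @ r, 0))     # the residual is orthogonal to range(A)
```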
Question 3.5. Let A ∈ R^{n×n} be positive definite. Two vectors u₁ and u₂ are called A-orthogonal if u₁^T A u₂ = 0. If U ∈ R^{n×r} and U^T A U = I, then the columns of U are said to be A-orthogonal. Show that every subspace has an A-orthogonal basis.
Proof. Given any subspace, there exists a set of linearly independent vectors (α₁, α₂, ..., α_n) that spans it. Then we use the given vectors to create an A-orthonormal basis (β₁, β₂, ..., β_n) as follows:
1: for i = 1 to n do
2:   β_i = α_i − ∑_{j=1}^{i−1} (β_j^T A α_i) β_j
3:   β_i = β_i / √(β_i^T A β_i)
4: end for
Since the α_i span the whole subspace, the above process terminates at the nth step. It is easy to verify the A-orthonormal property; for example, before normalizing β₂,
\[
\beta_1^T A \beta_2 = \beta_1^T A \alpha_2 - (\beta_1^T A \alpha_2)(\beta_1^T A \beta_1)
= \beta_1^T A \alpha_2 - \beta_1^T A \alpha_2 = 0,
\]
using β₁^T A β₁ = 1. To check linear independence, suppose
\[
c_1 \beta_{i_1} + c_2 \beta_{i_2} + \ldots + c_s \beta_{i_s} = 0.
\]
Applying β_{i_k}^T A to both sides gives c_k = 0, which implies the β_i are linearly independent. As the α_i span the whole subspace, the vectors β_i also form a basis of the subspace and are mutually A-orthonormal.
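The loop above can be sketched directly (my own illustration; names are mine):

```python
import numpy as np

def a_orthonormalize(alphas, A):
    """Gram-Schmidt in the A inner product <u, v> = u^T A v (A positive definite)."""
    betas = []
    for a in alphas:
        b = a.copy()
        for q in betas:
            b = b - (q @ A @ a) * q          # subtract the A-projection onto q
        betas.append(b / np.sqrt(b @ A @ b)) # A-normalize
    return np.column_stack(betas)

rng = np.random.default_rng(7)
n, r = 6, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                  # positive definite
U = a_orthonormalize([rng.standard_normal(n) for _ in range(r)], A)
print(np.allclose(U.T @ A @ U, np.eye(r)))   # True: columns are A-orthonormal
```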
\[
\hat P = \begin{pmatrix} P(1,1) & 0 & P(1,\, 2{:}m{+}1) \\ 0 & I_{n-1} & 0 \\ P(2{:}m{+}1,\, 1) & 0 & P(2{:}m{+}1,\, 2{:}m{+}1) \end{pmatrix}.
\]
Applying P̂ to A, we have
\[
\hat P A = A_1 = \begin{pmatrix} \|x_1\|_2 & * \\ 0 & R(2{:}n,\, 2{:}n) \\ 0 & * \end{pmatrix}.
\]
Thus, we reduce the problem to a smaller subproblem. Continuing, we reduce A to upper triangular form.
Question 3.7. If A = R + uv^T, where R is an upper triangular matrix and u and v are column vectors, describe an efficient algorithm to compute the QR decomposition of A.
Solution. When either u or v is 0, the answer is trivial. Suppose u_n ≠ 0; otherwise the last row of uv^T is 0 and we can start from the last nonzero row of uv^T. As the rank of uv^T is one and, by assumption, its last row is nonzero, every other row of uv^T is a multiple of the last row. Thus, there exist n − 1 Givens rotations G₁, G₂, ..., G_{n−1}, where G_i rotates the ith and nth rows so as to zero out the ith row of uv^T. Since R is upper triangular and each G_i only mixes row i with row n, the resulting matrix Ā = G_{n−1} ··· G₁ A
is upper triangular except for its last row. Then, we proceed to eliminate the last row: if Ā(1,1) is not zero, we use the first row of Ā to cancel Ā(n,1); otherwise, we swap the first and last rows of Ā. Continuing like this, we reduce Ā to upper triangular form, and the product of all the orthogonal matrices applied to A forms the orthogonal factor Q^T.
Question 3.8. Let x ∈ R^n and let P be a Householder matrix such that Px = ±‖x‖₂e₁. Let Q_{1,2}, ..., Q_{n−1,n} be Givens rotations, and let Q = Q_{1,2} ··· Q_{n−1,n}. Suppose Qx = ±‖x‖₂e₁. Must P equal Q?
Solution. P does not need to equal Q. Consider the two-dimensional counterexample with x = (3, 4)^T:
\[
P x = \frac{1}{5}\begin{pmatrix} -3 & -4 \\ -4 & 3 \end{pmatrix}\begin{pmatrix} 3 \\ 4 \end{pmatrix}
= \begin{pmatrix} -5 \\ 0 \end{pmatrix} = -\|x\|_2 e_1.
\]
The above matrix is a Householder matrix P = I − 2uu^T with generating vector u = \frac{1}{\sqrt 5}\begin{pmatrix} 2 \\ 1 \end{pmatrix}. At the same time, the following matrix is a Givens rotation which satisfies the condition but does not equal P:
\[
Q x = \frac{1}{5}\begin{pmatrix} -3 & -4 \\ 4 & -3 \end{pmatrix}\begin{pmatrix} 3 \\ 4 \end{pmatrix}
= \begin{pmatrix} -5 \\ 0 \end{pmatrix} = -\|x\|_2 e_1.
\]
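Both matrices of the counterexample can be checked in a few lines (my own sketch):

```python
import numpy as np

x = np.array([3.0, 4.0])
u = np.array([2.0, 1.0]) / np.sqrt(5.0)
P = np.eye(2) - 2.0 * np.outer(u, u)              # Householder reflection (det = -1)
Q = np.array([[-3.0, -4.0], [4.0, -3.0]]) / 5.0   # Givens rotation (det = +1)

print(np.allclose(P @ x, [-5.0, 0.0]))  # True
print(np.allclose(Q @ x, [-5.0, 0.0]))  # True
print(np.allclose(P, Q))                # False: same action on x, different matrices
```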
Question 3.9. Let A be m-by-n, with SVD A = U ΣV T . Compute the SVDs of the following
matrices in terms of U , Σ, and V :
1. (A T A)−1 ,
2. (A T A)−1 A T ,
3. A(A T A)−1 ,
4. A(A T A)−1 A T .
Remark. For the formulas to hold, we have to assume that A has full column rank. For simplicity, from now on we assume m ≥ n unless the contrary is stated explicitly, since the case n ≥ m can be treated similarly.
Proof.
Question 3.10. Let A k be a best rank-k approximation of the matrix A, as defined in Part 9 of
Theorem 3.3. Let σi be the i th singular value of A. Show that A k is unique if σk > σk+1 .
Proof. I believe the claim as stated is not correct. According to the textbook, the so-called best rank-k approximation of a matrix A under the two-norm is a rank-k matrix Ã which satisfies
\[
\|A - \tilde A\|_2 = \sigma_{k+1}.
\]
Thus, I present a counterexample to illustrate that it may not be unique under the assumption of the question. Consider the matrix
\[
A = \operatorname{diag}(4, 3, 2, 1)
\]
with k = 3, so σ₃ = 2 > σ₄ = 1. Then every matrix of the form
\[
\tilde A = \operatorname{diag}(\epsilon, 3, 2, 0), \qquad 3 < \epsilon < 4,
\]
satisfies ‖A − Ã‖₂ = max(4 − ε, 1) = 1 and has rank 3. Therefore, the best rank-3 approximation of A is not unique. However, the result holds if we measure in the Frobenius norm, and if the best rank-k approximation is required to have the form ∑_{i=1}^{k} σ_i u_i v_i^T, the proof is trivial.
Remark. Under the assumption of the question, I can only prove that any matrix B satisfying ‖A − B‖₂ = σ_{k+1} has the form
\[
U^T B V = \begin{pmatrix}
u_1^T B v_1 & u_2^T B v_1 & \cdots & u_k^T B v_1 & \\
u_1^T B v_2 & u_2^T B v_2 & \cdots & u_k^T B v_2 & \\
\vdots & \vdots & \ddots & \vdots & \\
u_1^T B v_k & u_2^T B v_k & \cdots & u_k^T B v_k & \\
 & & & & 0
\end{pmatrix},
\]
where U and V are the orthogonal matrices of the SVD of A. To some degree, the two-norm equality ‖A − B‖₂ = σ_{k+1} does not provide enough information about the range of B, which results in its nonuniqueness.
Question 3.11. Let A be m-by-n. Show that X = A + (the Moore-Penrose pseudoinverse) mini-
mizes kAX − I kF over all n-by-m matrices X . What is the value of this minimum?
Write A = UΣV^T and X̂ = V^T X U, so that ‖AX − I‖_F = ‖ΣX̂ − I‖_F; minimizing forces
\[
\hat X(i, i) = \sigma_i^{-1}, \qquad \hat X(i, j) = 0, \qquad i \in \{1, 2, \ldots, r\},\ j \ne i,\ j \in \{1, 2, \ldots, m\},
\]
while the remaining rows of X̂ do not affect ‖ΣX̂ − I‖_F at all. So X̂ may have many choices, and X̂ = Σ⁺ always satisfies the conditions. As a result, X = A⁺ is a minimizer, and the minimum is √(m − r), coming from the m − r rows of I that ΣX̂ cannot reach.
Question 3.12. Let A, B , and C be matrices with dimensions such that the product A T C B T is
well defined. Let X be the set of matrices X minimizing kAX B −C kF , and let X 0 be the unique
member of X minimizing kX kF . Show that X 0 = A +C B + .
Remark. This solution is unique only under the assumption that A and B have full rank.
Proof. Assume matrix A is m-by-s, B is t-by-n, and C is m-by-n. Denote the SVD of A by A = U₁Σ₁V₁^T and that of B by B = U₂Σ₂V₂^T. As X ↦ X̂ = V₁^T X U₂ is a one-to-one correspondence that preserves the Frobenius norm, we can substitute X̂ for X. Thus, we have
\[
\|A X B - C\|_F = \|\Sigma_1 \hat X \Sigma_2 - U_1^T C V_2\|_F,
\]
where r₁ is the rank of A and r₂ is the rank of B. As the matrix U₁^T C V₂ is fixed once A, B, and C are given, the minimizer must, and only has to, match the leading r₁-by-r₂ block of it; the remaining entries of X̂ are free, and setting them to zero minimizes ‖X̂‖_F = ‖X‖_F, which yields X₀ = A⁺CB⁺.
Question 3.13. Show that the Moore-Penrose pseudoinverse of A satisfies the following identi-
ties:
A A + A = A, A + A A + = A + , A + A = (A + A)T , A A + = (A A + )T .
Proof. Write the full SVD A = UΣV^T with rank r, so that A⁺ = VΣ⁺U^T, and note
\[
\Sigma \Sigma^+ = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \ \text{(m-by-m)}, \qquad
\Sigma^+ \Sigma = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \ \text{(n-by-n)}.
\]
1.
\[
A A^+ A = U\Sigma V^T\, V\Sigma^+ U^T\, U\Sigma V^T = U\,\Sigma\Sigma^+\Sigma\, V^T = U\Sigma V^T = A.
\]
2.
\[
A^+ A A^+ = V\Sigma^+ U^T\, U\Sigma V^T\, V\Sigma^+ U^T = V\,\Sigma^+\Sigma\Sigma^+\, U^T = V\Sigma^+ U^T = A^+.
\]
3.
\[
A^+ A = V\Sigma^+ U^T\, U\Sigma V^T = V \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} V^T
= \Bigl( V \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} V^T \Bigr)^T = (A^+ A)^T.
\]
4.
\[
A A^+ = U\Sigma V^T\, V\Sigma^+ U^T = U \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} U^T
= \Bigl( U \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} U^T \Bigr)^T = (A A^+)^T.
\]
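All four identities can be spot-checked with NumPy's pseudoinverse (my own sketch, on a deliberately rank-deficient rectangular matrix):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 5))  # rectangular, rank <= 4
Ap = np.linalg.pinv(A)

print(np.allclose(A @ Ap @ A, A))         # A A+ A = A
print(np.allclose(Ap @ A @ Ap, Ap))       # A+ A A+ = A+
print(np.allclose(Ap @ A, (Ap @ A).T))    # A+ A is symmetric
print(np.allclose(A @ Ap, (A @ Ap).T))    # A A+ is symmetric
```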
Question 3.14. Prove part 4 of Theorem 3.3: Let H = \begin{pmatrix} 0 & A^T \\ A & 0 \end{pmatrix}, where A is square and A = UΣV^T is its SVD. Let Σ = diag(σ₁, σ₂, ..., σ_n), U = [u₁, u₂, ..., u_n], and V = [v₁, v₂, ..., v_n]. Prove that the 2n eigenvalues of H are ±σ_i, with corresponding unit eigenvectors \frac{1}{\sqrt 2}\begin{pmatrix} v_i \\ \pm u_i \end{pmatrix}. Extend to the case of rectangular A.
Proof. We directly prove the case of rectangular m-by-n A with m ≥ n; the square case then needs no special treatment. First, write the SVD of A as A = UΣV^T = ∑_{i=1}^{n} σ_i u_i v_i^T. A direct computation gives
\[
H \begin{pmatrix} v_i \\ \pm u_i \end{pmatrix}
= \begin{pmatrix} \pm A^T u_i \\ A v_i \end{pmatrix}
= \begin{pmatrix} \pm\sigma_i v_i \\ \sigma_i u_i \end{pmatrix}
= \pm\sigma_i \begin{pmatrix} v_i \\ \pm u_i \end{pmatrix}.
\]
Then, as
\[
\left\| \begin{pmatrix} v_i \\ \pm u_i \end{pmatrix} \right\|_2 = \sqrt 2,
\]
and all the vectors \frac{1}{\sqrt 2}(v_i, ±u_i)^T are linearly independent, ±σ_i are 2n eigenvalues of H with corresponding unit eigenvectors \frac{1}{\sqrt 2}(v_i, ±u_i)^T. As for the remaining m − n eigenvalues, let Ũ = [u_{n+1}, ..., u_m] span the orthogonal complement of the range of U. We claim the remaining m − n eigenvalues of H are zeros, with corresponding unit eigenvectors (0, u_i)^T for i > n, since
\[
H \begin{pmatrix} 0 \\ u_i \end{pmatrix}
= \begin{pmatrix} A^T u_i \\ 0 \end{pmatrix}
= \begin{pmatrix} \sum_{j=1}^{n} \sigma_j v_j u_j^T u_i \\ 0 \end{pmatrix} = 0.
\]
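This eigenvalue structure can be confirmed numerically for a rectangular A (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(9)
m, n = 5, 3
A = rng.standard_normal((m, n))
sigma = np.linalg.svd(A, compute_uv=False)

H = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((m, m))]])
eigs = np.sort(np.linalg.eigvalsh(H))

expected = np.sort(np.concatenate([sigma, -sigma, np.zeros(m - n)]))
print(np.allclose(eigs, expected))  # True
```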
Question 3.15. Let A be m-by-n, m < n, and of full rank. Then min kAx − bk2 is called an
underdetermined least squares problem. Show that the solution is an (n − m)-dimensional
set. Show how to compute the unique minimum norm solution using appropriately modified
normal equations, QR decomposition, and SVD.
Solution. We explain these three techniques one by one, and prove the claimed properties of the solution set within the first technique.
1. Any minimizer satisfies Ax = b, since A has full row rank. Write x = A^T y + x₂ with x₂ ∈ null(A); because AA^T is nonsingular, the modified normal equations
\[
A A^T y = b
\]
have a unique solution. Every vector of the form A^T y + x₂ with x₂ ∈ null(A) is a minimizer, which forms an (n − m)-dimensional set since dim null(A) = n − m. As A^T y ⊥ x₂, the unique minimum norm solution is x = A^T(AA^T)^{-1}b.
2. For the QR decomposition, since A^T has the QR decomposition A^T = QR, where Q is n-by-m and R is m-by-m, we have A = R^T Q^T, and the least squares problem is equivalent to the equation
\[
R^T Q^T x = b.
\]
Let x = Q(R^T)^{-1}b + x̃, where x̃ can be any vector satisfying
\[
Q^T \tilde x = 0.
\]
Thus, x̃ belongs to the null space of Q^T. Evaluating the norm of our solution, we have
\[
\|x\|_2^2 = \|Q(R^T)^{-1} b + \tilde x\|_2^2 = \|Q(R^T)^{-1} b\|_2^2 + \|\tilde x\|_2^2 \ge \|Q(R^T)^{-1} b\|_2^2,
\]
so the unique minimum norm solution is x = Q(R^T)^{-1}b.
3. For the SVD, decompose A^T = UΣV^T with U n-by-m; then the general solution is x = UΣ^{-1}V^T b + x̃ with
\[
U^T \tilde x = 0.
\]
Thus, x̃ belongs to the null space of U^T. Evaluating the norm as before, ‖x‖₂² = ‖UΣ^{-1}V^T b‖₂² + ‖x̃‖₂², so the minimum norm solution is x = UΣ^{-1}V^T b.
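All three formulas produce the same minimum norm solution, which can be checked against NumPy's `lstsq` (my own sketch; note the QR and SVD factors here are taken of A^T and A respectively, matching the formulas above up to transposition):

```python
import numpy as np

rng = np.random.default_rng(10)
m, n = 3, 7                                   # underdetermined, full row rank
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x_ne = A.T @ np.linalg.solve(A @ A.T, b)      # modified normal equations

Q, R = np.linalg.qr(A.T)                      # A^T = QR, Q n-by-m, R m-by-m
x_qr = Q @ np.linalg.solve(R.T, b)            # x = Q (R^T)^{-1} b

U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / s)                # x = V Sigma^{-1} U^T b

x_np = np.linalg.lstsq(A, b, rcond=None)[0]   # NumPy's minimum norm solution
print(np.allclose(x_ne, x_np), np.allclose(x_qr, x_np), np.allclose(x_svd, x_np))
```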
Question 3.16. Prove Lemma 3.1: Let P be an exact Householder (or Givens) transformation, and P̃ its floating point approximation. Then
\[
\mathrm{fl}(\tilde P A) = P(A + E), \qquad \|E\|_2 = O(\epsilon)\|A\|_2,
\]
and
\[
\mathrm{fl}(A \tilde P) = (A + F)P, \qquad \|F\|_2 = O(\epsilon)\|A\|_2.
\]
Proof. If fl(a ⊙ b) = (a ⊙ b)(1 + ε) holds, then, as in Question 1.10, we can prove that
\[
\mathrm{fl}(\tilde P A) = \tilde P A + G, \qquad \|G\|_2 = O(\epsilon)\|A\|_2.
\]
Setting E = P^{-1}fl(P̃A) − A = P^{-1}(P̃ − P)A + P^{-1}G and using ‖P̃ − P‖₂ = O(ε), we get
\[
\|E\|_2 \le O(\epsilon)\|P^{-1}\|_2\|P\|_2\|A\|_2 = O(\epsilon)\|A\|_2.
\]
The second identity is proved in the same way.
Question 3.17. The question is too long to reproduce; for details, please refer to the textbook.
Solution.
1. For the base case,
\[
P = P_2 P_1 = (I - 2u_2 u_2^T)(I - 2u_1 u_1^T)
= I - 2u_1 u_1^T - 2u_2 u_2^T + 4(u_2^T u_1)\, u_2 u_1^T
= I - [u_1, u_2] \begin{pmatrix} 2 & 0 \\ -4u_2^T u_1 & 2 \end{pmatrix} \begin{pmatrix} u_1^T \\ u_2^T \end{pmatrix}.
\]
Thus, the base case holds. Then, assume P' = P_k P_{k−1} ··· P₁ = I − U_k T_k U_k^T; multiplying by P_{k+1} = I − 2u_{k+1}u_{k+1}^T and collecting terms gives P_{k+1}P' = I − U_{k+1} T_{k+1} U_{k+1}^T with U_{k+1} = [U_k, u_{k+1}]. Therefore, by induction, we have proved part 1 of the question. As for the algorithm, we can use the recursive formula obtained in the induction:
\[
T_{k+1} = \begin{pmatrix} T_k & 0 \\ -2u_{k+1}^T U_k T_k & 2 \end{pmatrix}.
\]
3. Denote by b a small integer block size. As with Level 3 BLAS Gaussian elimination, we use the technique of delayed updates. In detail, assume we have already processed the first k − 1 columns, yielding
\[
A = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix}
\begin{pmatrix} R_{11} & R_{12} & R_{13} \\ & \tilde A_{22} & \tilde A_{23} \\ & \tilde A_{32} & \tilde A_{33} \end{pmatrix},
\]
where R_{11} is (k−1)-by-(k−1) and the middle block column has width b. Then, apply the Level 2 BLAS QR implementation to the panel \begin{pmatrix} \tilde A_{22} \\ \tilde A_{32} \end{pmatrix} and get
\[
\begin{pmatrix} \tilde A_{22} & \tilde A_{23} \\ \tilde A_{32} & \tilde A_{33} \end{pmatrix}
= Q \begin{pmatrix} R_{22} & R_{23} \\ & \bar A_{33} \end{pmatrix},
\]
where Q \begin{pmatrix} R_{22} \\ 0 \end{pmatrix} = \begin{pmatrix} \tilde A_{22} \\ \tilde A_{32} \end{pmatrix} and Q \begin{pmatrix} R_{23} \\ \bar A_{33} \end{pmatrix} = \begin{pmatrix} \tilde A_{23} \\ \tilde A_{33} \end{pmatrix}. Thus,
\[
A = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix}
\begin{pmatrix} I & \\ & Q \end{pmatrix}
\begin{pmatrix} R_{11} & R_{12} & R_{13} \\ & R_{22} & R_{23} \\ & & \bar A_{33} \end{pmatrix}
= \begin{pmatrix} Q_{11} & Q_{12} Q \\ Q_{21} & Q_{22} Q \end{pmatrix}
\begin{pmatrix} R_{11} & R_{12} & R_{13} \\ & R_{22} & R_{23} \\ & & \bar A_{33} \end{pmatrix}.
\]
The Q can be represented compactly using part 1, and R_{23} and Ā_{33} can then be computed with matrix products, which are Level 3 BLAS operations.
Question 3.18. It is often of interest to solve constrained least squares problems, where the
solution x must satisfy a linear or nonlinear constraint in addition to minimizing kAx − bk2 .
We consider one such problem here. Suppose that we want to choose x to minimize kAx − bk2
subject to the linear constraint C x = d . Suppose also that A is m-by-n, C is p-by-n, and C has
full rank. We also assume that p ≤ n (so Cx = d is guaranteed to be consistent) and n ≤ m + p (so the system is not underdetermined). Show that there is a unique solution under the assumption that \begin{pmatrix} A \\ C \end{pmatrix} has full column rank. Show how to compute x using two QR decompositions and some matrix-vector multiplications and solving some triangular systems of equations.
Like Question 3.15, the solutions of the above linear system are x = Q₁L^{-1}d + Q₂y, where y is any vector in R^{n−p}. Therefore, the original least squares problem is equivalent to minimizing ‖(AQ₂)y − (b − AQ₁L^{-1}d)‖₂; computing the QR decomposition AQ₂ = Q̂R̂ gives
\[
y = \hat R^{-1} \hat Q^T (b - A Q_1 L^{-1} d),
\]
and finally
\[
x = Q_2 y + Q_1 L^{-1} d.
\]
\[
\tilde x = \sum_{i=0}^{\infty} (-1)^i \bigl( (A^TA)^{-1} A^T \delta A + (A^TA)^{-1} (\delta A)^T A + (A^TA)^{-1} (\delta A)^T (\delta A) \bigr)^i (A^TA)^{-1} (A + \delta A)^T (b + \delta b)
\]
\[
\approx \bigl( I - (A^TA)^{-1} (\delta A)^T A - (A^TA)^{-1} A^T \delta A \bigr) (A^TA)^{-1} \bigl( A^T b + (\delta A)^T b + A^T \delta b \bigr).
\]
It follows that b₁q₁ + b₂q₂ + ... + b_{n−p}q_{n−p} is in the null space of A, which is also in the null space of C.
Also, since sin θ = ‖r‖₂/‖b‖₂ and θ is acute, we can evaluate cos θ and tan θ as follows:
\[
\cos\theta = \sqrt{1 - \sin^2\theta} = \frac{\sqrt{\|b\|_2^2 - \|r\|_2^2}}{\|b\|_2} = \frac{\|Ax\|_2}{\|b\|_2},
\qquad
\tan\theta = \frac{\sin\theta}{\cos\theta} = \frac{\|r\|_2/\|b\|_2}{\|Ax\|_2/\|b\|_2} = \frac{\|r\|_2}{\|Ax\|_2},
\]
using ‖b‖₂² = ‖Ax‖₂² + ‖r‖₂². Thus,
\[
\|\tilde x - x\|_2 \le \epsilon \|(A^TA)^{-1}A^T\|_2 \|A\|_2 \|x\|_2 + \epsilon \|(A^TA)^{-1}\|_2 \|A\|_2 \|r\|_2 + \epsilon \|(A^TA)^{-1}A^T\|_2 \|b\|_2
\]
\[
\le \epsilon \kappa_2(A)\|x\|_2 + \epsilon \kappa_2^2(A)\|x\|_2 \tan\theta + \epsilon \kappa_2(A)\|x\|_2/\cos\theta
\le \epsilon \|x\|_2 \Bigl( \frac{2\kappa_2(A)}{\cos\theta} + \tan\theta \cdot \kappa_2^2(A) \Bigr).
\]
Question 4.1. Let A be block upper triangular with square diagonal blocks A_{11}, ..., A_{bb}. Show that det(A − λI) = ∏_{i=1}^{b} det(A_{ii} − λI). Conclude that the set of eigenvalues of A is the union of the sets of eigenvalues of the diagonal blocks A_{ii}.
Proof. Denote A_{ii} as m_i-by-m_i and the whole matrix A as n-by-n. From the definition of the determinant, we have
\[
\det(A) = \sum_{\sigma} \operatorname{sgn}(\sigma)\, a_{\sigma(1),1}\, a_{\sigma(2),2} \cdots a_{\sigma(n),n},
\]
the sum running over all permutations σ. For the first m₁ columns, a_{σ(j),j} = 0 whenever σ(j) > m₁, so we can delete those terms from the sum; the surviving permutations map {1, ..., m₁} into itself, and the sum factors as det(A_{11}) times the analogous sum over the remaining columns. Therefore, we reduce our problem to a smaller subproblem. Continuing, we have
\[
\det(A) = \det(A_{11}) \det(A_{22}) \cdots \det(A_{bb}) = \prod_{i=1}^{b} \det(A_{ii}).
\]
Since A − λI is again block upper triangular with diagonal blocks A_{ii} − λI, consequently
\[
\det(A - \lambda I) = \prod_{i=1}^{b} \det(A_{ii} - \lambda I).
\]
Thus λ is an eigenvalue of A if and only if it is an eigenvalue of some A_{ii}.
Question 4.2. Suppose that A is normal; i.e., A A ∗ = A ∗ A. Show that if A is also triangular,
it must be diagonal. Use this to show that an n-by-n matrix is normal if and only if it has n
orthonormal eigenvectors.
Proof. Without loss of generality, suppose A is normal and upper triangular. Comparing the (1,1) entries of AA* = A*A gives
\[
\sum_{i=1}^{n} |a_{1i}|^2 = |a_{11}|^2.
\]
Thus, for i ≠ 1 we have |a_{1i}| = 0. This reduces the problem to the trailing (n−1)-by-(n−1) submatrix; continuing, we conclude A is diagonal. For any n-by-n normal matrix A, there is a unitary matrix U such that U*AU = T, where T is triangular (the Schur form). Then, we claim that T is normal, because
\[
T^* T = U^* A^* U U^* A U = U^* A^* A U = U^* A A^* U = U^* A U U^* A^* U = T T^*.
\]
By the first part, T is therefore diagonal, say T = diag(λ₁, ..., λ_n), so
\[
A U = U \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n),
\]
i.e., each column of U is a right eigenvector of A, and these columns are orthonormal. On the other hand, if A has n orthonormal eigenvectors, collect them as the unitary matrix U. We have A = U diag(λ₁, λ₂, ..., λ_n) U*, and it is easy to verify that A is normal.
Question 4.3. Let λ and µ be distinct eigenvalues of A, let x be a right eigenvector for λ, and
let y be a left eigenvector for µ. Show that x and y are orthogonal.
Proof. On one hand, y*Ax = λy*x, since x is a right eigenvector for λ; on the other, y*A = μy*, so y*Ax = μy*x. Hence (λ − μ)y*x = 0, and since λ ≠ μ, we get y*x = 0, i.e., x and y are orthogonal.
2. Show that ( f (T ))i i = f (Ti i ) so that the diagonal of f (T ) can be computed from the diag-
onal of T .
3. Show that T f (T ) = f (T )T .
4. From the last result, show that the ith superdiagonal of f(T) can be computed from the (i−1)st and earlier superdiagonals. Thus, starting at the diagonal of f(T), we can compute the first superdiagonal, second superdiagonal, and so on.
Proof.
1. Let A = QTQ* be the Schur form of A. Then
\[
A^i = QTQ^*\, QTQ^* \cdots QTQ^* = QT^iQ^*.
\]
Thus,
\[
f(A) = \sum_{i=-\infty}^{\infty} a_i A^i = \sum_{i=-\infty}^{\infty} a_i\, QT^iQ^* = Q f(T) Q^*.
\]
2. Since T is upper triangular, so is every power T^i, and (T^i)_{jj} = T_{jj}^i. Thus,
\[
(f(T))_{jj} = \sum_{i=-\infty}^{\infty} a_i (T^i)_{jj} = \sum_{i=-\infty}^{\infty} a_i T_{jj}^i = f(T_{jj}).
\]
4. It suffices to deduce the result for the powers of T, which can be proved by mathematical induction. First, for T², we have
\[
T^2(i, i+k) = \sum_{t=i}^{i+k} T(i, t)\, T(t, i+k),
\]
so the kth superdiagonal of T² is determined by the entries of T on the 0th through kth superdiagonals. Then, suppose the claim holds for T^l, so that T^l(i, j) is a fixed polynomial in the entries of T lying on the 0th through (j − i)th superdiagonals. For T^{l+1},
\[
T^{l+1}(i, i+k) = \sum_{s=i}^{i+k} T^l(i, s)\, T(s, i+k),
\]
which again involves only entries on the 0th through kth superdiagonals. Thus, the kth superdiagonal of f(T) can be computed once the diagonal and the first k − 1 superdiagonals are known, working outward from the diagonal.
Question 4.5. Let A be a square matrix. Apply either Question 4.4 to the Schur form of A or
equation (4.6) to the Jordan form of A to conclude that the eigenvalues of f (A) are f (λi ), where
the λi are the eigenvalues of A. This result is called the spectral mapping theorem.
Proof. Let A = UTU* be the Schur form of A. According to part 1 of the previous question, f(A) = U f(T) U*, so f(A) and f(T) have the same eigenvalues. Since f(T) is upper triangular with diagonal entries (f(T))_{ii} = f(T_{ii}) = f(λ_i), the eigenvalues of f(T) are the f(λ_i). Thus, the eigenvalues of f(A) are f(λ_i), with the same multiplicities as the λ_i.
Question 4.6. In this problem we will show how to solve the Sylvester or Lyapunov equation
AX − X B = C , where X and C are m-by-n, A is m-by-m, and B is n-by-n. This is a system of
mn linear equations for the entries of X .
1. Given the Schur decompositions of A and B , show how AX − X B = C can be transformed
into a similar system A 0 Y − Y B 0 = C 0 , where A 0 and B 0 are upper triangular.
2. Show how to solve for the entries of Y one at a time by a process analogous to back substitution. What condition on the eigenvalues of A and B guarantees that the system of equations is nonsingular?
Thus, if A'(m, m) ≠ B'(i, i), we can solve the last row of Y one entry at a time. Then continue like this; under the requirement that no eigenvalue of A coincides with an eigenvalue of B, we can solve for all of Y.
3. As we have deduced, Y = U₁*XU₂, where U₁, U₂ are the unitary matrices which transform A and B into Schur form, respectively. We can get
X = U1 Y U2∗ .
Question 4.7. Suppose that T = \begin{pmatrix} A & C \\ 0 & B \end{pmatrix} is in Schur form. We want to find a matrix S so that S^{-1}TS = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}. It turns out we can choose S of the form \begin{pmatrix} I & R \\ 0 & I \end{pmatrix}. Show how to solve for R.
Solution. As the question does not give the dimension of each block, we suppose that all the block multiplications below are compatible. Since S^{-1} = \begin{pmatrix} I & -R \\ 0 & I \end{pmatrix}, we have
\[
S^{-1} T S = \begin{pmatrix} I & -R \\ 0 & I \end{pmatrix}
\begin{pmatrix} A & C \\ 0 & B \end{pmatrix}
\begin{pmatrix} I & R \\ 0 & I \end{pmatrix}
= \begin{pmatrix} A & AR + C - RB \\ 0 & B \end{pmatrix},
\]
so R must satisfy the Sylvester equation
\[
AR - RB = -C,
\]
which can be solved by the method of Question 4.6.
Question 4.8. Show that \begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} and \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix} are similar. Conclude that the nonzero eigenvalues of AB are the same as those of BA.
Proof. Consider the nonsingular matrix S = \begin{pmatrix} I & A \\ 0 & I \end{pmatrix}, with S^{-1} = \begin{pmatrix} I & -A \\ 0 & I \end{pmatrix}. We have
\[
S^{-1} \begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} S
= \begin{pmatrix} I & -A \\ 0 & I \end{pmatrix}
\begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix}
\begin{pmatrix} I & A \\ 0 & I \end{pmatrix}
= \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix}.
\]
Thus, \begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} and \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix} are similar. Consequently, they have the same eigenvalues, and, applying Question 4.1 to both block triangular matrices, we have
\[
\lambda^n \det(\lambda I - AB) = \lambda^m \det(\lambda I - BA),
\]
so the nonzero eigenvalues of AB and BA coincide.
Question 4.9. Let A be n-by-n with eigenvalues λ₁, ..., λ_n. Show that
\[
\sum_{i=1}^{n} |\lambda_i|^2 = \min_{\det(S) \ne 0} \|S^{-1} A S\|_F^2.
\]
Remark. The question is not correct unless we change the operation from "min" into "inf". I present a counterexample below to illustrate why. Consider A as
\[
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.
\]
For any nonsingular matrix \begin{pmatrix} a & b \\ c & d \end{pmatrix} (so ad − bc ≠ 0), take S to be its inverse; this ranges over all nonsingular matrices as well, and we have
\[
\|S^{-1} A S\|_F^2
= \frac{1}{(ad-bc)^2} \left\| \begin{pmatrix} ad - ac - bc & a^2 \\ -c^2 & ac + ad - bc \end{pmatrix} \right\|_F^2
= \frac{(a^2 + c^2)^2}{(ad-bc)^2} + 2.
\]
Since a and c cannot both equal zero, we always have
\[
\|S^{-1} A S\|_F^2 > 2 = \sum_{i=1}^{n} |\lambda_i|^2,
\]
so the minimum is never attained.
For the lower bound, note that for any nonsingular S the matrix S^{-1}AS is similar to A, so its own Schur form is upper triangular with the λ_i on the diagonal; since unitary transformations do not change the Frobenius norm,
\[
\|S^{-1} A S\|_F^2 \ge \sum_{i=1}^{n} |\lambda_i|^2,
\qquad \text{hence} \qquad
\sum_{i=1}^{n} |\lambda_i|^2 \le \inf_{\det(S) \ne 0} \|S^{-1} A S\|_F^2.
\]
Moreover, writing S = QR with Q taken from the Schur decomposition A = QTQ* and R upper triangular, we have ‖S^{-1}AS‖_F = ‖R^{-1}TR‖_F, so for the upper bound it suffices to consider upper triangular similarities of the Schur form T.
Next, we show that the lower bound can be approached to any precision. Denote c = max_{i≠j} |T(i, j)|. For any ε < 1, set a = ε/max(c, 1) and consider R = diag(1, a, a², ..., a^{n−1}). Then (R^{-1}TR)(i, j) = a^{j−i}T(i, j), so every off-diagonal entry (j > i) has magnitude at most |a T(i, j)| ≤ ε. Hence the squared sum of the off-diagonal entries is smaller than \frac{n(n-1)}{2}\epsilon^2, which proves the question.
Solution.
1. Write
\[
H = \frac{1}{2}(A + A^*), \qquad S = \frac{1}{2}(A - A^*);
\]
then A = H + S, where H is Hermitian and S is skew-Hermitian.
4. If A is normal then, according to Question 4.2, there exists a unitary matrix U such that UAU* = diag(λ₁, ..., λ_n). Thus, we have
\[
\|A\|_F^2 = \|U A U^*\|_F^2 = \sum_{i=1}^{n} |\lambda_i|^2.
\]
On the other hand, if A is not normal, denote its Schur form by UAU* = T. We can claim that T is not diagonal; otherwise T would be normal and consequently A would be normal, contradicting our assumption. So, since T then has a nonzero off-diagonal entry, we get
\[
\|A\|_F^2 = \|U A U^*\|_F^2 = \|T\|_F^2 > \sum_{i=1}^{n} |\lambda_i|^2.
\]
Thus, if ∑_{i=1}^{n} |λ_i|² = ‖A‖_F², A must be normal.
Question 4.11. Let λ be a simple eigenvalue, and let x and y be right and left eigenvectors.
We define the spectral projection P corresponding to λ as P = x y ∗ /(y ∗ x). Prove that P has the
following properties.
1. P is uniquely defined, even though we could use any nonzero scalar multiples of x and y
in its definition.
Remark. Part 4 of the question follows directly from Question 1.7; however, I present a similar proof below to remind us of the result.
Proof.
1. Since λ is a simple eigenvalue, any other right and left eigenvectors can be expressed
as x̂ = c x and ŷ = d y. Thus, for the spectral projection defined by x̂ and ŷ, we have
P̂ = x̂ ŷ ∗ /( ŷ ∗ x̂) = cd x y ∗ /(cd y ∗ x) = P,
4. According to the result of part 1, we can assume that ‖x‖₂ = ‖y‖₂ = 1. Any nonzero vector α ∈ C^n can be decomposed as α = cy + dz, where y*z = 0 and ‖z‖₂ = 1. Then Pα = xy*(cy + dz)/(y*x) = cx/(y*x), so
\[
\|P\|_2 = \max_{\alpha \ne 0} \frac{\|P\alpha\|_2}{\|\alpha\|_2}
= \max_{|c| + |d| \ne 0} \frac{|c|}{|y^* x| \sqrt{|c|^2 + |d|^2}}
= \frac{1}{|y^* x|},
\]
the maximum being attained at d = 0, i.e., α = y.
Question 4.12. Let A = \begin{pmatrix} a & c \\ 0 & b \end{pmatrix}. Show that the condition numbers of the eigenvalues of A are both equal to \bigl(1 + (\frac{c}{a-b})^2\bigr)^{1/2}. Thus, the condition number is large if the difference a − b between the eigenvalues is small compared to c, the offdiagonal part of the matrix.
Proof. As the question suggests, we assume all the variables are real, with a ≠ b and c ≠ 0. It is easy to verify that
\[
x_1 = (1, 0)^T, \qquad y_1 = \Bigl(1, \frac{c}{a-b}\Bigr)^T
\]
are right and left eigenvectors for the eigenvalue λ₁ = a, and
\[
x_2 = \Bigl(1, \frac{b-a}{c}\Bigr)^T, \qquad y_2 = (0, 1)^T
\]
are right and left eigenvectors for the eigenvalue λ₂ = b. Thus, the condition number of λ₁ is
\[
\frac{\|x_1\|_2 \|y_1\|_2}{|y_1^T x_1|} = \Bigl(1 + \Bigl(\frac{c}{a-b}\Bigr)^2\Bigr)^{1/2},
\]
and that of λ₂ is
\[
\frac{\|x_2\|_2 \|y_2\|_2}{|y_2^T x_2|}
= \frac{\bigl(1 + (\frac{b-a}{c})^2\bigr)^{1/2}}{|\frac{b-a}{c}|}
= \Bigl(1 + \Bigl(\frac{c}{a-b}\Bigr)^2\Bigr)^{1/2}.
\]
Thus, the result is proved.
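A quick numerical confirmation of both the eigenvector formulas and the condition numbers (my own sketch):

```python
import numpy as np

a, b, c = 1.0, 1.5, 10.0
A = np.array([[a, c], [0.0, b]])

x1, y1 = np.array([1.0, 0.0]), np.array([1.0, c / (a - b)])   # pair for lambda_1 = a
x2, y2 = np.array([1.0, (b - a) / c]), np.array([0.0, 1.0])   # pair for lambda_2 = b

# sanity: the vectors really are right/left eigenvectors
assert np.allclose(A @ x1, a * x1) and np.allclose(y1 @ A, a * y1)
assert np.allclose(A @ x2, b * x2) and np.allclose(y2 @ A, b * y2)

cond1 = np.linalg.norm(x1) * np.linalg.norm(y1) / abs(y1 @ x1)
cond2 = np.linalg.norm(x2) * np.linalg.norm(y2) / abs(y2 @ x2)
predicted = np.sqrt(1.0 + (c / (a - b)) ** 2)
print(np.isclose(cond1, predicted), np.isclose(cond2, predicted))  # True True
```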
Question 4.13. Let A be a matrix, x be a unit vector (kxk2 = 1), µ be a scalar, and r = Ax −
µx. Show that there is a matrix E with kE kF = kr k2 such that A + E has eigenvalue µ and
eigenvector x.
Proof. Suppose such an E exists. Since we require the condition
\[
(A + E)x = \mu x,
\]
we need Ex = μx − Ax = −r. The choice E = −rx* satisfies this (because ‖x‖₂ = 1 gives Ex = −r x*x = −r) and has ‖E‖_F = ‖r‖₂‖x‖₂ = ‖r‖₂, as desired.
Remark. I believe the next question is correct under the assumption that both B and C are real; that is what my proof will assume.
Solution. Denote by a*_{ij} the (i, j) entry of the matrix A*. As B and C are both real and A = B + iC,
\[
a^*_{ij} = \overline{a_{ji}} = \overline{b_{ji} + i c_{ji}} = b_{ji} - i c_{ji}.
\]
Separating Ax = λx with x = β + iγ into real and imaginary parts gives
\[
B\beta - C\gamma = \lambda\beta, \qquad B\gamma + C\beta = \lambda\gamma.
\]
Then, considering
\[
\begin{pmatrix} B & -C \\ C & B \end{pmatrix}\begin{pmatrix} \beta \\ \gamma \end{pmatrix}
= \begin{pmatrix} B\beta - C\gamma \\ C\beta + B\gamma \end{pmatrix}
= \begin{pmatrix} \lambda\beta \\ \lambda\gamma \end{pmatrix}
= \lambda \begin{pmatrix} \beta \\ \gamma \end{pmatrix},
\]
\[
\begin{pmatrix} B & -C \\ C & B \end{pmatrix}\begin{pmatrix} -\gamma \\ \beta \end{pmatrix}
= \begin{pmatrix} -B\gamma - C\beta \\ -C\gamma + B\beta \end{pmatrix}
= \begin{pmatrix} -\lambda\gamma \\ \lambda\beta \end{pmatrix}
= \lambda \begin{pmatrix} -\gamma \\ \beta \end{pmatrix},
\]
we see that (β, γ)^T and (−γ, β)^T are eigenvectors of M with corresponding eigenvalue λ.
Question 5.2. Prove Corollary 5.1, using Weyl’s theorem and part 4 of Theorem 3.3.
Proof. According to the hint provided, denote the matrices H₁, H₂, H as
\[
H_1 = \begin{pmatrix} 0 & G^T \\ G & 0 \end{pmatrix}, \qquad
H_2 = \begin{pmatrix} 0 & G^T + F^T \\ G + F & 0 \end{pmatrix}, \qquad
H = H_2 - H_1 = \begin{pmatrix} 0 & F^T \\ F & 0 \end{pmatrix}.
\]
By part 4 of Theorem 3.3, σ₁ ≥ σ₂ ≥ ... ≥ σ_n ≥ −σ_n ≥ ... ≥ −σ₁ are the eigenvalues of H₁, and σ'₁ ≥ σ'₂ ≥ ... ≥ σ'_n ≥ −σ'_n ≥ ... ≥ −σ'₁ are the eigenvalues of H₂. Using Weyl's theorem, we have
\[
|\sigma_i - \sigma'_i| \le \|H\|_2.
\]
Thus, what remains to prove is ‖H‖₂ = ‖F‖₂. To do this, consider
\[
H^T H = \begin{pmatrix} 0 & F^T \\ F & 0 \end{pmatrix}^T \begin{pmatrix} 0 & F^T \\ F & 0 \end{pmatrix}
= \begin{pmatrix} F^T F & 0 \\ 0 & F F^T \end{pmatrix}.
\]
This block diagonal matrix has the eigenvalues of F^TF together with those of FF^T, which share the same nonzero eigenvalues. Thus λ_max(H^TH) = λ_max(F^TF), i.e., ‖H‖₂ = ‖F‖₂.
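The resulting bound |σ_i − σ'_i| ≤ ‖F‖₂ is easy to confirm numerically (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(11)
G = rng.standard_normal((6, 4))
F = 1e-3 * rng.standard_normal((6, 4))

s = np.linalg.svd(G, compute_uv=False)
s2 = np.linalg.svd(G + F, compute_uv=False)
print(np.max(np.abs(s - s2)) <= np.linalg.norm(F, 2))  # True, as Corollary 5.1 predicts
```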
Question 5.3. Consider Figure 5.1, and consider the corresponding contour plot for an arbitrary 3-by-3 matrix A with eigenvalues α₃ ≤ α₂ ≤ α₁. Let C₁ and C₂ be the two great circles along which ρ(u, A) = α₂. At what angle do they intersect?
Remark. Since A is arbitrary, we assume A is real and symmetric; otherwise x*Ax may not be a real number. What's more, assume the eigenvalues of A are distinct; if not, I don't think it behaves as the textbook describes.
Solution. Suppose x₁, x₂, x₃ are three orthonormal eigenvectors of A with corresponding eigenvalues λ₁ > λ₂ > λ₃, respectively. Every vector y ∈ R³ can be decomposed as y = b₁x₁ + b₂x₂ + b₃x₃. If a unit vector y satisfies ρ(y, A) = λ₂, we have
\[
\rho(y, A) = \lambda_1 b_1^2 + \lambda_2 b_2^2 + \lambda_3 b_3^2 = \lambda_2, \qquad b_1^2 + b_2^2 + b_3^2 = 1.
\]
Substituting b₂² = 1 − b₁² − b₃² gives (λ₁ − λ₂)b₁² = (λ₂ − λ₃)b₃². Denote c = \sqrt{(\lambda_1 - \lambda_2)/(\lambda_2 - \lambda_3)}. Thus, we can express the two great "circles" by the parametric equations
\[
b_1 = \sin(\theta)/\sqrt{1 + c^2}, \qquad b_2 = \cos(\theta), \qquad b_3 = \pm c \sin(\theta)/\sqrt{1 + c^2}.
\]
Their intersection points are the points with θ = 0 and θ = π. For θ = 0, the tangent vectors are
\[
t_1 = \frac{1}{\sqrt{1+c^2}}\, x_1 + \frac{c}{\sqrt{1+c^2}}\, x_3, \qquad
t_2 = \frac{1}{\sqrt{1+c^2}}\, x_1 - \frac{c}{\sqrt{1+c^2}}\, x_3.
\]
Consequently, the corresponding intersection angle is
\[
\arccos\Bigl( \frac{t_1^T t_2}{\|t_1\|_2 \|t_2\|_2} \Bigr) = \arccos\Bigl( \frac{1 - c^2}{1 + c^2} \Bigr).
\]
For θ = π, the same computation gives the same intersection angle.
Question 5.4. Use the Courant-Fischer minimax theorem (Theorem 5.2) to prove the Cauchy
interlace theorem:
1. Suppose that A = \begin{pmatrix} H & b \\ b^T & u \end{pmatrix} is an n-by-n symmetric matrix and H is (n−1)-by-(n−1). Let α_n ≤ ... ≤ α₁ be the eigenvalues of A and θ_{n−1} ≤ ... ≤ θ₁ be the eigenvalues of H. Show that these two sets of eigenvalues interlace:
\[
\alpha_n \le \theta_{n-1} \le \ldots \le \theta_i \le \alpha_i \le \theta_{i-1} \le \alpha_{i-1} \le \ldots \le \theta_1 \le \alpha_1.
\]
2. Let A = \begin{pmatrix} H & B \\ B^T & U \end{pmatrix} be n-by-n and H be m-by-m, with eigenvalues θ_m ≤ ... ≤ θ₁. Show that the eigenvalues of A and H interlace in the sense that α_{j+(n−m)} ≤ θ_j ≤ α_j (or equivalently α_j ≤ θ_{j−(n−m)} ≤ α_{j−(n−m)}).
Proof.

1. Denote the n-by-(n−1) matrix P as P = [I_{n−1}; 0]. Thus H = PᵀAP, and for any vector x ∈ R^{n−1}, xᵀx = (Px)ᵀ(Px) holds because PᵀP = I_{n−1}. Note that for linearly independent vectors x_1, …, x_{n−1} ∈ R^{n−1}, the vectors Px_1, …, Px_{n−1} ∈ R^n are also linearly independent; therefore, if S is an i-dimensional subspace of R^{n−1}, then PS is an i-dimensional subspace of R^n. Since the minimum over a subset of subspaces is no less than the minimum over all of them, we have

θ_i = min_{S^{n−i}} max_{0≠x∈S^{n−i}} xᵀHx/xᵀx
    = min_{S^{n−i}} max_{0≠x∈S^{n−i}} (Px)ᵀA(Px)/((Px)ᵀ(Px))
    ≥ min_{S^{n−i}} max_{0≠y∈S^{n−i}} yᵀAy/yᵀy = α_{i+1}.

Similarly,

θ_i = max_{S^i} min_{0≠x∈S^i} xᵀHx/xᵀx
    = max_{S^i} min_{0≠x∈S^i} (Px)ᵀA(Px)/((Px)ᵀ(Px))
    ≤ max_{S^i} min_{0≠y∈S^i} yᵀAy/yᵀy = α_i.
2. It is similar to what we have proved. Denote the n-by-m matrix P as P = [I_m; 0], so that H = PᵀAP. Using the Courant-Fischer minimax theorem, we have

θ_i = min_{S^{m+1−i}} max_{0≠x∈S^{m+1−i}} xᵀHx/xᵀx
    = min_{S^{m+1−i}} max_{0≠x∈S^{m+1−i}} (Px)ᵀA(Px)/((Px)ᵀ(Px))
    ≥ min_{S^{m+1−i}} max_{0≠y∈S^{m+1−i}} yᵀAy/yᵀy = α_{i+n−m}.

On the other hand, we have

θ_i = max_{S^i} min_{0≠x∈S^i} xᵀHx/xᵀx
    = max_{S^i} min_{0≠x∈S^i} (Px)ᵀA(Px)/((Px)ᵀ(Px))
    ≤ max_{S^i} min_{0≠y∈S^i} yᵀAy/yᵀy = α_i.

Combining the results, we have α_{i+n−m} ≤ θ_i ≤ α_i, which is equivalent to what we should prove.
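The interlacing in part 1 is easy to sanity-check numerically. A small sketch with NumPy (the random matrix, seed, and tolerance are my own choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                      # random symmetric matrix
H = A[:n-1, :n-1]                      # leading (n-1)-by-(n-1) principal submatrix

alpha = np.sort(np.linalg.eigvalsh(A))[::-1]   # alpha_1 >= ... >= alpha_n
theta = np.sort(np.linalg.eigvalsh(H))[::-1]   # theta_1 >= ... >= theta_{n-1}

# Cauchy interlace: alpha_{i+1} <= theta_i <= alpha_i for i = 1, ..., n-1
interlaces = all(alpha[i+1] <= theta[i] + 1e-12 and theta[i] <= alpha[i] + 1e-12
                 for i in range(n - 1))
print(interlaces)
```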
Question 5.5. Let A = Aᵀ with eigenvalues α_1 ≥ … ≥ α_n. Let H = Hᵀ with eigenvalues θ_1 ≥ … ≥ θ_n. Let A + H have eigenvalues λ_1 ≥ … ≥ λ_n. Use the Courant-Fischer minimax theorem (Theorem 5.2) to show that α_j + θ_n ≤ λ_j ≤ α_j + θ_1. If H is positive definite, conclude that λ_j > α_j. In other words, adding a symmetric positive definite matrix H to another symmetric matrix A can only increase its eigenvalues.
Proof. According to the Courant-Fischer minimax theorem, we have

λ_i = min_{S^{n+1−i}} max_{0≠x∈S^{n+1−i}} xᵀ(A+H)x/xᵀx
    ≤ min_{S^{n+1−i}} ( max_{0≠x∈S^{n+1−i}} xᵀAx/xᵀx + max_{0≠x∈S^{n+1−i}} xᵀHx/xᵀx )
    ≤ min_{S^{n+1−i}} ( max_{0≠x∈S^{n+1−i}} xᵀAx/xᵀx + θ_1 ) = α_i + θ_1.

And,

λ_i = max_{S^i} min_{0≠x∈S^i} xᵀ(A+H)x/xᵀx
    ≥ max_{S^i} ( min_{0≠x∈S^i} xᵀAx/xᵀx + min_{0≠x∈S^i} xᵀHx/xᵀx )
    ≥ max_{S^i} ( min_{0≠x∈S^i} xᵀAx/xᵀx + θ_n ) = α_i + θ_n.
Combining the results, we have α_j + θ_n ≤ λ_j ≤ α_j + θ_1. As for positive definite H, all its eigenvalues are positive. Thus,

α_j < α_j + θ_n ≤ λ_j.
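The bound α_j + θ_n ≤ λ_j ≤ α_j + θ_1 can be checked on random symmetric matrices; a sketch (the test data and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # random symmetric A
H = rng.standard_normal((n, n)); H = (H + H.T) / 2   # random symmetric H

alpha = np.sort(np.linalg.eigvalsh(A))[::-1]         # descending eigenvalues
theta = np.sort(np.linalg.eigvalsh(H))[::-1]
lam   = np.sort(np.linalg.eigvalsh(A + H))[::-1]

# alpha_j + theta_n <= lambda_j <= alpha_j + theta_1 for every j
ok = bool(np.all(alpha + theta[-1] <= lam + 1e-12)
          and np.all(lam <= alpha + theta[0] + 1e-12))
print(ok)
```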
Question 5.6. Let A = [A_1, A_2] be n-by-n, where A_1 is n-by-m and A_2 is n-by-(n−m). Let σ_1 ≥ σ_2 ≥ … ≥ σ_n be the singular values of A and τ_1 ≥ τ_2 ≥ … ≥ τ_m be the singular values of A_1. Use the Cauchy interlace theorem from Question 5.4 and part 4 of Theorem 3.3 to prove that σ_j ≥ τ_j ≥ σ_{j+n−m}.

Proof. Denote the matrix H as

H = [ 0,    A_1,  A_2 ]
    [ A_1ᵀ, 0,    0   ]
    [ A_2ᵀ, 0,    0   ].

According to part 4 of Theorem 3.3, the absolute values of the eigenvalues of H are the singular values of Aᵀ, which equal those of A. Thus, we can list the eigenvalues of H as σ_1 ≥ σ_2 ≥ … ≥ σ_n ≥ −σ_n ≥ … ≥ −σ_1. Consider the (n + m)-by-(n + m) principal submatrix

H_1 = [ 0,    A_1 ]
      [ A_1ᵀ, 0   ],

whose eigenvalues are ±τ_i together with n − m zeros. Applying the Cauchy interlace theorem n − m times yields

σ_{i+(n−m)} ≤ τ_i ≤ σ_i.
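A quick numerical check of σ_j ≥ τ_j ≥ σ_{j+n−m} (the sizes and seed below are arbitrary test choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 6, 4
A = rng.standard_normal((n, n))
A1 = A[:, :m]                                 # first m columns of A

sigma = np.linalg.svd(A, compute_uv=False)    # sigma_1 >= ... >= sigma_n
tau   = np.linalg.svd(A1, compute_uv=False)   # tau_1 >= ... >= tau_m

# sigma_j >= tau_j >= sigma_{j+n-m} for j = 1, ..., m
ok = all(tau[j] <= sigma[j] + 1e-12 and sigma[j + n - m] <= tau[j] + 1e-12
         for j in range(m))
print(ok)
```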
Question 5.7. Let q be a unit vector and d a vector orthogonal to q. Show that

‖(q + d)qᵀ − I‖₂ = ‖q + d‖₂.

Remark. My proof supposes the dimension of the vector space is greater than 2; if the dimension is only 2, the proof is similar.
Proof. Denote by z a unit vector orthogonal to both q and d. Any relevant vector x can be expressed as x = aq + bd + cz, and we have

‖(q + d)qᵀ − I‖₂² = max_{x≠0} ‖((q + d)qᵀ − I)x‖₂² / ‖x‖₂²
= max_{a²+b²+c²≠0} ( (a² − 2ab + b²)‖d‖₂² + c² ) / ( a² + b²‖d‖₂² + c² )
= max_{a²+b²+c²≠0} ( 1 + ( (‖d‖₂² − 1)a² − 2ab‖d‖₂² ) / ( a² + b²‖d‖₂² + c² ) )
≤ max_{a²+b²≠0} ( 1 + ( (‖d‖₂² − 1)a² − 2ab‖d‖₂² ) / ( a² + b²‖d‖₂² ) ).
To determine the maximum of the above expression, we consider whether a = 0. If a = 0, the expression yields 1; if not, set s = b/a and c = ‖d‖₂², and define

f(s) = (c − 1 − 2cs)/(1 + cs²),  so that  f′(s) = 2c²(s² − (1 − 1/c)s − 1/c)/(1 + cs²)².

Thus f attains its maximum at s = −1/c (and f → 0 as s → ±∞). Since f(−1/c) = (c² + c)/(c + 1) = c, we conclude

‖(q + d)qᵀ − I‖₂² = 1 + c = 1 + ‖d‖₂² = ‖q + d‖₂².
Question 5.8. Formulate and prove a theorem for singular vectors analogous to Theorem 5.4.

Remark. My theorem is stated below, and I assume that the target matrix A is m-by-n with m ≥ n. However, I am not sure whether it is the one that the question requests.

Theorem 5.4′. Let A = UΣVᵀ be the reduced SVD of an m-by-n matrix A. Let A + E = Â = ÛΣ̂V̂ᵀ be the perturbed SVD. Write U = [u_1, …, u_n], V = [v_1, …, v_n], Û = [û_1, …, û_n], and V̂ = [v̂_1, …, v̂_n]. Let θ, φ denote the acute angles between u_i, û_i and between v_i, v̂_i respectively. Define gap(i, A) as min_{j≠i} |σ_i − σ_j| if i ≠ n, and gap(n, A) as min(σ_n, min_{j≠n} |σ_n − σ_j|) when i = n. Then we have

max(sin 2θ, sin 2φ) ≤ 2√2 ‖E‖₂ / gap(i, A).
Proof. Denote ω_i = (1/√2)[v_i; u_i], ω̂_i = (1/√2)[v̂_i; û_i], and let θ̂ be the acute angle between ω_i and ω̂_i. Then consider the matrices

H = [ 0, Aᵀ ]      Ĥ = [ 0,     Aᵀ + Eᵀ ]
    [ A, 0  ],         [ A + E, 0       ],

and δH = Ĥ − H. By Theorem 3.3 and Question 5.2, we know that ‖δH‖₂ = ‖E‖₂, that σ_1 ≥ … ≥ σ_n ≥ 0 ≥ … ≥ 0 ≥ −σ_n ≥ … ≥ −σ_1 are the eigenvalues of H, and that σ̂_1 ≥ … ≥ σ̂_n ≥ 0 ≥ … ≥ 0 ≥ −σ̂_n ≥ … ≥ −σ̂_1 are the eigenvalues of Ĥ. Thus, according to Theorem 5.4, we know that

sin 2θ̂ ≤ 2‖E‖₂ / gap(i, A).
Since

cos θ̂ = ω_iᵀω̂_i = ½(u_iᵀû_i + v_iᵀv̂_i) = ½(cos θ + cos φ),

θ̂ lies between θ and φ, so it suffices to prove the bound for the angle that is larger than θ̂. (For the smaller angle, consider ‖u_i + û_i‖₂² instead of ‖u_i − û_i‖₂².) Without loss of generality, assume θ is the larger one. Since

2 sin²(θ/2) = 1 − cos θ = ½‖u_i − û_i‖₂² ≤ ‖ω_i − ω̂_i‖₂² = 4 sin²(θ̂/2),
and θ ≥ θ̂, we have

sin θ = 2 sin(θ/2) cos(θ/2) ≤ 2·√2 sin(θ̂/2) cos(θ̂/2) = √2 sin θ̂.

Then,

sin 2θ = 2 sin θ cos θ ≤ 2·√2 sin θ̂ cos θ̂ = √2 sin 2θ̂ ≤ 2√2 ‖E‖₂ / gap(i, A).
Question 5.9. Prove bound (5.6) from Theorem 5.5.

Proof. Since A is symmetric, it has a set of orthonormal eigenvectors q_1, …, q_n which span the whole space; say Aq_j = α_j q_j, and write the unit vector x = Σ_{j=1}^n b_j q_j. Subtracting (A − ρ(x, A)I)b_i q_i = (α_i − ρ(x, A))b_i q_i from (A − ρ(x, A)I)x = r, we have

(A − ρ(x, A)I) Σ_{j≠i} b_j q_j = r + (ρ(x, A) − α_i) b_i q_i.

As the left side has no q_i component, neither does the right side: the q_i component of r is cancelled by (ρ(x, A) − α_i)b_i q_i, and hence ‖r + (ρ(x, A) − α_i)b_i q_i‖₂ ≤ ‖r‖₂. Denote the right side by Σ_{j≠i} c_j q_j. Comparing coefficients gives b_j = c_j/(α_j − ρ(x, A)) for j ≠ i, as we have assumed that gap′ is not zero. Then

sin θ = ‖Σ_{j≠i} b_j q_j‖₂ = ( Σ_{j≠i} c_j²/(α_j − ρ(x, A))² )^{1/2} ≤ (1/gap′)( Σ_{j≠i} c_j² )^{1/2} ≤ ‖r‖₂ / gap′.
Question 5.10. A harder question. Skipped.
Question 5.11. Suppose θ = θ_1 + θ_2, where all three angles lie between 0 and π/2. Prove that ½ sin 2θ ≤ ½ sin 2θ_1 + ½ sin 2θ_2.

Proof. Consider the function

f(θ_1) = sin 2θ − sin 2θ_1 − sin 2(θ − θ_1).

Its derivative is

f′(θ_1) = −2 cos 2θ_1 + 2 cos 2(θ − θ_1).

As 2θ − 2θ_1, 2θ_1 ∈ [0, π], an interval on which the cosine is monotone, f′(θ_1) has only one root, θ_1 = θ/2. Thus f(θ_1) decreases on [0, θ/2] and increases on [θ/2, θ], so it can reach its maximum only at the endpoints θ_1 = 0 and θ_1 = θ, i.e.,

f(θ_1) ≤ max(f(0), f(θ)) = 0.

Therefore, ½ sin 2θ ≤ ½ sin 2θ_1 + ½ sin 2θ_2.
Question 5.12. Prove Corollary 5.2.

Proof. Denote the matrices H and Ĥ as

H = [ 0, Gᵀ ]      Ĥ = [ 0, Ĝᵀ ]
    [ G, 0  ],         [ Ĝ, 0  ].

According to Theorem 3.3, σ_1 ≥ … ≥ σ_n ≥ 0 ≥ … ≥ 0 ≥ −σ_n ≥ … ≥ −σ_1 are the eigenvalues of H, and σ̂_1 ≥ … ≥ σ̂_n ≥ 0 ≥ … ≥ 0 ≥ −σ̂_n ≥ … ≥ −σ̂_1 are the eigenvalues of Ĥ. Since

Ĥ = [ Xᵀ, 0  ] H [ X, 0 ]
    [ 0,  Yᵀ ]   [ 0, Y ],

if we can prove ‖[Xᵀ, 0; 0, Yᵀ][X, 0; 0, Y] − I‖₂ = ε, then applying Theorem 5.6 proves the result. The target two-norm equals ‖[XᵀX − I, 0; 0, YᵀY − I]‖₂, and

‖[XᵀX − I, 0; 0, YᵀY − I]‖₂ = max( ‖XᵀX − I‖₂, ‖YᵀY − I‖₂ ).
P k−1 R kT = (A − σk )P k .
Question 5.14. Prove Lemma 5.1.

Proof. The claim is trivial if x or y is the zero vector, so we may assume both are nonzero and, permuting rows and columns if necessary, that x_1 ≠ 0. Then

det(I + xyᵀ) = det [ 1 + x_1y_1,  x_1y_2,      …,  x_1y_n     ]
                  [ x_2y_1,      1 + x_2y_2,  …,  x_2y_n     ]
                  [ ⋮            ⋮                ⋮          ]
                  [ x_ny_1,      x_ny_2,      …,  1 + x_ny_n ]

(subtracting (x_i/x_1) times the first row from row i, for i = 2, …, n)

            = det [ 1 + x_1y_1,  x_1y_2,  …,  x_1y_n ]
                  [ −x_2/x_1,    1,       …,  0      ]
                  [ ⋮            ⋮            ⋮      ]
                  [ −x_n/x_1,    0,       …,  1      ]

(adding (x_i/x_1) times column i to the first column, for i = 2, …, n)

            = det [ 1 + Σ_{i=1}^n x_iy_i,  x_1y_2,  …,  x_1y_n ]
                  [ 0,                     1,       …,  0      ]
                  [ ⋮                      ⋮            ⋮      ]
                  [ 0,                     0,       …,  1      ]

            = 1 + Σ_{i=1}^n x_iy_i = 1 + xᵀy.
Remark. There is a simpler way to prove the question. First, observe that xyᵀx = (yᵀx)x, which means x is an eigenvector of xyᵀ with eigenvalue yᵀx. As the rank of xyᵀ is one, xyᵀ has only one nonzero eigenvalue. Then by Question 4.5, the eigenvalues of I + xyᵀ are 1 (with multiplicity n − 1) and 1 + yᵀx, and the result follows.
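Lemma 5.1 itself is easy to test numerically; a sketch with random vectors (my own test data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
x = rng.standard_normal(n)
y = rng.standard_normal(n)

lhs = np.linalg.det(np.eye(n) + np.outer(x, y))   # det(I + x y^T)
rhs = 1.0 + y @ x                                 # 1 + y^T x
print(abs(lhs - rhs) < 1e-8 * (1 + abs(rhs)))
```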
Question 5.15. Prove that if t(n) = 2t(n/2) + cn³ + O(n²), then t(n) ≈ (4/3)cn³. This justifies the complexity analysis of the divide-and-conquer algorithm (Algorithm 5.2).

Proof. Unrolling the recurrence, level i of the recursion costs 2^i · c(n/2^i)³ = cn³/4^i, plus lower-order terms. Hence

t(n) = cn³ Σ_{i=0}^{log₂ n} 4^{−i} + O(n² log n) ≈ cn³ · 1/(1 − 1/4) = (4/3)cn³.
Question 5.16. Let A = D + ρuuᵀ, where D = diag(d_1, …, d_n) and u = [u_1, …, u_n]ᵀ. Show that if d_i = d_{i+1} or u_i = 0, then d_i is an eigenvalue of A. If u_i = 0, show that the eigenvector corresponding to d_i is e_i, the ith column of the identity matrix. Derive a similarly simple expression when d_i = d_{i+1}. This shows how to handle deflation in the divide-and-conquer algorithm, Algorithm 5.2.
Proof. When u_i = 0, the ith column of uuᵀ is the zero vector. Thus,

(D + ρuuᵀ)e_i = d_i e_i.

Remark. My proof is based on solving (A − λI)x = 0. There is another way, based on deflation. Since a Givens rotation can zero out an entry of a vector, if d_i = d_{i+1}, denote by G the Givens rotation in coordinates i and i+1 which satisfies

G [u_1, …, u_i, u_{i+1}, …, u_n]ᵀ = [u_1, …, û_i, 0, …, u_n]ᵀ.

Then,

GAGᵀ = GDGᵀ + ρGuuᵀGᵀ = D + ρûûᵀ,

where û_{i+1} = 0 (G commutes with D because d_i = d_{i+1}), so d_i is an eigenvalue by the first case.
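Both deflation cases can be observed numerically; in the sketch below the entries of d, u and the value of ρ are hypothetical test values chosen so that u_2 = 0 and d_2 = d_3:

```python
import numpy as np

d = np.array([3.0, 1.0, 1.0, -2.0])     # note d_2 = d_3 = 1
u = np.array([0.5, 0.0, 0.7, 0.3])      # note u_2 = 0
rho = 2.0
A = np.diag(d) + rho * np.outer(u, u)

eigs = np.linalg.eigvalsh(A)
e2 = np.eye(4)[:, 1]

# u_2 = 0, so d_2 = 1 is an eigenvalue with eigenvector e_2
print(np.min(np.abs(eigs - 1.0)) < 1e-8)
print(np.allclose(A @ e2, d[1] * e2))
```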
Question 5.17. Let ψ and ψ′ be given scalars. Show how to compute scalars c and ĉ in the function definition h(λ) = ĉ + c/(d − λ) so that at λ = ξ, h(ξ) = ψ and h′(ξ) = ψ′. This result is needed to derive the secular equation solver in section 5.3.3.

Solution. As

h′(λ) = c/(d − λ)²,

the conditions h(ξ) = ψ and h′(ξ) = ψ′ become

ψ = ĉ + c/(d − ξ),  ψ′ = c/(d − ξ)²,

whose solution is

c = ψ′(d − ξ)²,  ĉ = ψ − ψ′(d − ξ).
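The two formulas can be verified directly; the numbers below are arbitrary test values:

```python
d, xi = 2.0, 0.5
psi, psi1 = 3.0, 4.0                  # target values h(xi) = psi, h'(xi) = psi1

c    = psi1 * (d - xi) ** 2           # c = psi' (d - xi)^2
chat = psi - psi1 * (d - xi)          # c-hat = psi - psi' (d - xi)

h  = lambda lam: chat + c / (d - lam)
dh = lambda lam: c / (d - lam) ** 2   # derivative of h

print(abs(h(xi) - psi) < 1e-12, abs(dh(xi) - psi1) < 1e-12)
```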
Question 5.18. Use the SVD to show that if A is an m-by-n real matrix with m ≥ n, then there exists an m-by-n matrix Q with orthonormal columns (QᵀQ = I) and an n-by-n positive semidefinite matrix P such that A = QP. This decomposition is called the polar decomposition of A, because it is analogous to the polar form of a complex number z = e^{i arg(z)}·|z|. Show that if A is nonsingular, then the polar decomposition is unique.

Proof. Suppose A = UΣVᵀ is the reduced SVD of A. For existence, write A = (UVᵀ)(VΣVᵀ) = QP: the matrix Q = UVᵀ has orthonormal columns and P = VΣVᵀ is positive semidefinite. For uniqueness, suppose A = QP = Q̂P̂ with P = WᵀΛW and P̂ = SᵀΛS; both P and P̂ square to AᵀA, hence carry the same eigenvalue matrix Λ. Since Z = WSᵀ commutes with Λ² and the eigenvalues are positive, Z commutes with Λ, i.e.,

SWᵀΛWSᵀ = Λ.

It follows that

P̂ = SᵀΛS = Sᵀ(SWᵀΛWSᵀ)S = WᵀΛW = P,

and AP⁻¹ = Q = Q̂ = AP̂⁻¹.

Remark. In fact, the uniqueness follows from the uniqueness of √(AᵀA). Standard linear algebra textbooks often define √(AᵀA) = WᵀΛW without checking whether it is well defined, as the orthogonal matrix W may not be unique. Thus, I present the proof here.
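The construction Q = UVᵀ, P = VΣVᵀ can be sketched with NumPy (random A; this is my own sketch, not the textbook's code):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 6, 4
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # reduced SVD: A = U diag(s) Vt
Q = U @ Vt                                         # m-by-n, orthonormal columns
P = Vt.T @ np.diag(s) @ Vt                         # symmetric positive semidefinite

print(np.allclose(Q @ P, A),
      np.allclose(Q.T @ Q, np.eye(n)),
      bool(np.all(np.linalg.eigvalsh(P) >= -1e-12)))
```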
Proof.

1. Denote by x any vector in R^{2n}. Then

Pᵀx = [x_1, x_{n+1}, x_2, x_{n+2}, …, x_n, x_{2n}]ᵀ,  xᵀP = [x_1, x_{n+1}, x_2, x_{n+2}, …, x_n, x_{2n}],

i.e.,

PᵀAP = [ 0    a_1                        ]
       [ a_1  0    b_1                   ]
       [      b_1  0    a_2              ]
       [           a_2  0    ⋱           ]
       [                 ⋱   0    a_n    ]
       [                      a_n  0     ],

a tridiagonal matrix with zero diagonal and off-diagonal entries a_1, b_1, a_2, b_2, …, a_n.
Question 5.20. Prove Lemma 5.7.

Proof. Write B for the bidiagonal matrix with diagonal a_1, …, a_n and superdiagonal b_1, …, b_{n−1}, and B̂ for the perturbed matrix with diagonal entries χ_1a_1, …, χ_na_n and superdiagonal entries ζ_1b_1, …, ζ_{n−1}b_{n−1}. Seeking diagonal matrices D_1 = diag(δ_1, …, δ_n) and D_2 = diag(γ_1, …, γ_n) with B̂ = D_1BD_2 amounts to the entrywise conditions δ_iγ_i = χ_i and δ_iγ_{i+1} = ζ_i, which are satisfied by choosing δ_1 = 1, γ_1 = χ_1, and then γ_{i+1} = ζ_i/δ_i, δ_{i+1} = χ_{i+1}/γ_{i+1}; the resulting diagonal entries are products of the ratios ζ_j/χ_j and their reciprocals, as required.
Question 5.21. Prove Theorem 5.13. Also, reduce the exponent 4n −2 in Theorem 5.13 to 2n −1.
To improve the bound, it is necessary to improve max(‖D_1D_1 − I‖₂, ‖D_2D_2 − I‖₂). Consider

D̂_1 = ( (1/√χ_n) ∏_{j=1}^{[n/2]} ζ_j/χ_j ) D_1,  D̂_2 = ( √χ_n ∏_{j=1}^{[n/2]} χ_j/ζ_j ) D_2.⁸

Then

D̂_1 = diag( (χ_1/√χ_n) ∏_{j=1}^{[n/2]} ζ_j/χ_j, (χ_2/√χ_n) ∏_{j=2}^{[n/2]} ζ_j/χ_j, …, √χ_n ∏_{j=[n/2]+1}^{n−1} χ_j/ζ_j ),

D̂_2 = diag( √χ_n ∏_{j=1}^{[n/2]} χ_j/ζ_j, √χ_n ∏_{j=2}^{[n/2]} χ_j/ζ_j, …, √χ_n ∏_{j=[n/2]+1}^{n−1} ζ_j/χ_j ),

so that

‖D̂_1D̂_1 − I‖₂ ≤ max( max_{1≤i≤[n/2]} | (χ_i²/χ_n) ∏_{j=i}^{[n/2]} ζ_j²/χ_j² − 1 |, max_{[n/2]+1≤i≤n} | (χ_i²/χ_n) ∏_{j=[n/2]+1}^{i−1} χ_j²/ζ_j² − 1 | ) ≤ τ^{2n−1} − 1,

‖D̂_2D̂_2 − I‖₂ ≤ max( max_{1≤i≤[n/2]} | χ_n ∏_{j=i}^{[n/2]} χ_j²/ζ_j² − 1 |, max_{[n/2]+1≤i≤n} | χ_n ∏_{j=[n/2]+1}^{i−1} ζ_j²/χ_j² − 1 | ) ≤ τ^{2n−1} − 1.
Question 5.22. Prove that Algorithm 5.13 computes the SVD of G, assuming that GᵀG converges to a diagonal matrix.

Remark. I am not sure whether the σ_i are ordered by magnitude, and it seems that G must have full rank; otherwise U = [G(:,1)/σ_1, G(:,2)/σ_2, …, G(:,n)/σ_n] cannot be orthonormal, as the G(:,i) would be linearly dependent or some σ_i = 0.

Proof. Denote by Ĝ the resulting matrix, so that ĜᵀĜ = diag(σ_1², σ_2², …, σ_n²) = Λ. Since ĜᵀĜ is diagonal, it follows that

σ_j² = Σ_{i=1}^n Ĝ(i, j)² = ‖Ĝ(:, j)‖₂².

Then we claim that U = [Ĝ(:,1)/σ_1, Ĝ(:,2)/σ_2, …, Ĝ(:,n)/σ_n] is orthonormal, since its columns are unit vectors and, for i ≠ j, Ĝ(:,i)ᵀĜ(:,j) = (ĜᵀĜ)(i, j) = 0.
⁸ [a] denotes the largest integer not exceeding a. Another feasible constant is (1/√χ_1) ∏_{j=1}^{(n/2)} ζ_j/χ_j, where (a) means the smallest integer not less than a.
Finally, as

UΣVᵀ = [Ĝ(:,1)/σ_1, Ĝ(:,2)/σ_2, …, Ĝ(:,n)/σ_n] √Λ Jᵀ = Ĝ Jᵀ = Ĝ J⁻¹ = G,

the algorithm indeed computes the SVD of G.
Thus, all the principal submatrices of C have nonnegative determinants, which implies that C is positive semidefinite.
Question 5.27. Let

A = [ I,  B ]
    [ B*, I ]

with ‖B‖₂ < 1. Show that

‖A‖₂ ‖A⁻¹‖₂ = (1 + ‖B‖₂)/(1 − ‖B‖₂).
Proof. To prove the equation, we compute ‖A‖₂ and ‖A⁻¹‖₂ exactly. Let B be an n-by-m matrix, and partition any vector x ∈ C^{n+m} as x = [y; z] with y ∈ C^n and z ∈ C^m. Then the extreme eigenvalues of A are

λ_max = max_{‖x‖₂=1} x*Ax = max_{‖y‖₂²+‖z‖₂²=1} (y*, z*) [I, B; B*, I] (y; z) = 1 + 2 max_{‖y‖₂²+‖z‖₂²=1} Re(y*Bz),

and similarly λ_min = 1 + 2 min Re(y*Bz). As 2|Re(y*Bz)| = |y*Bz + z*B*y| ≤ 2‖y‖₂‖B‖₂‖z‖₂ ≤ (‖y‖₂² + ‖z‖₂²)‖B‖₂ = ‖B‖₂, we obtain λ_max ≤ 1 + ‖B‖₂ and λ_min ≥ 1 − ‖B‖₂. To see that both bounds are attainable, let ẑ be a unit vector with ‖Bẑ‖₂ = ‖B‖₂ and set y = ±Bẑ/(√2‖B‖₂), z = ẑ/√2, so that ‖y‖₂² + ‖z‖₂² = 1 and

Re(y*Bz) = ±(1/(2‖B‖₂)) ẑ*B*Bẑ = ±(1/(2‖B‖₂)) ‖Bẑ‖₂² = ±½‖B‖₂.

Moreover, since ‖B‖₂ < 1, the matrix A is positive definite, so the largest eigenvalue of A⁻¹ is the inverse of the smallest eigenvalue of A. Therefore,

‖A‖₂ ‖A⁻¹‖₂ = λ_max/λ_min = (1 + ‖B‖₂)/(1 − ‖B‖₂).
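A numerical check of the identity (the block sizes and the target norm 0.9 are arbitrary test choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 4, 3
B = rng.standard_normal((n, m))
B *= 0.9 / np.linalg.norm(B, 2)        # rescale so that ||B||_2 = 0.9 < 1

A = np.block([[np.eye(n), B], [B.T, np.eye(m)]])
kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

b = np.linalg.norm(B, 2)
print(abs(kappa - (1 + b) / (1 - b)) < 1e-8)
```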
Question 5.28. A square matrix A is said to be skew Hermitian if A* = −A. Prove that

1. the eigenvalues of a skew Hermitian matrix are purely imaginary;
2. I − A is nonsingular;
3. the Cayley transform C = (I + A)(I − A)⁻¹ is unitary.

Proof.

1. Let Ax = λx with ‖x‖₂ = 1. Then

x*Ax = λx*x = λ,  (x*Ax)* = x*A*x = −x*Ax = λ̄.

Adding them together, we have λ + λ̄ = 0, which means λ is purely imaginary.

2. According to Question 4.5, the eigenvalues of I − A are 1 − λ_i, where λ_i are the eigenvalues of A. Since all the eigenvalues of A are purely imaginary,

|1 − λ_i| = √(1 + |λ_i|²) ≥ 1,

so det(I − A) = ∏_i (1 − λ_i) ≠ 0 and I − A is nonsingular.

3. Since A* = −A gives (I − A)* = I + A and (I + A)* = I − A, and I − A commutes with I + A,

C*C = (I + A)⁻¹(I − A)(I + A)(I − A)⁻¹ = (I + A)⁻¹(I + A)(I − A)(I − A)⁻¹ = I.

Thus, C is unitary.
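Both eigenvalue claims, together with the unitarity of the Cayley transform C = (I + A)(I − A)⁻¹, can be checked on a random skew-Hermitian test matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M - M.conj().T) / 2               # skew-Hermitian: A* = -A

eigs = np.linalg.eigvals(A)
C = (np.eye(n) + A) @ np.linalg.inv(np.eye(n) - A)   # Cayley transform

print(np.allclose(eigs.real, 0),                     # purely imaginary spectrum
      np.allclose(C.conj().T @ C, np.eye(n)))        # C is unitary
```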
The claimed eigenvectors z_j = [sin(jπ/(N+1)), sin(2jπ/(N+1)), …, sin(Njπ/(N+1))]ᵀ can be verified entrywise: for 1 ≤ k ≤ N,

(T_N z_j)_k = 2 sin(kjπ/(N+1)) − sin((k+1)jπ/(N+1)) − sin((k−1)jπ/(N+1))
            = 2 sin(kjπ/(N+1)) − 2 sin(kjπ/(N+1)) cos(jπ/(N+1)),

using sin((k+1)jπ/(N+1)) + sin((k−1)jπ/(N+1)) = 2 sin(kjπ/(N+1)) cos(jπ/(N+1)); hence T_N z_j = (2 − 2cos(jπ/(N+1))) z_j. As for the first and last entries, the identity also holds because sin(0·jπ/(N+1)) = sin((N+1)jπ/(N+1)) = 0.
1. The Cholesky factorization T_N = B_NᵀB_N has an upper bidiagonal Cholesky factor B_N with

B_N(i, i) = √((i+1)/i),  B_N(i, i+1) = −√(i/(i+1)),

and Gaussian elimination without pivoting gives T_N = L_N U_N with bidiagonal factors

L_N(i, i) = 1,  L_N(i+1, i) = −i/(i+1),
U_N(i, i) = (i+1)/i,  U_N(i, i+1) = −1.

Proof.

1. It suffices to verify that T_N = B_NᵀB_N. The diagonal entries of B_NᵀB_N are (i+1)/i + (i−1)/i = 2, and the offdiagonal entries are −√((i+1)/i)·√(i/(i+1)) = −1. Thus T_N = B_NᵀB_N.
2. As T_N is a band matrix, the entry T_N(i, i) is updated only at the (i−1)th step of elimination. For the first step, L_N(1,1) = 1, L_N(2,1) = −1/2 and U_N(1,1) = 2, U_N(1,2) = −1. Suppose the claim holds through the (k−1)th step without requiring row swaps. Then, at the kth step, the pivot is U_N(k, k) = (k+1)/k, and eliminating the subdiagonal entry −1 updates the (k+1, k+1) entry to

2 − (−1)·(−1)/((k+1)/k) = 2 − k/(k+1) = (k+2)/(k+1),

so that L_N(k+1, k) = −k/(k+1) and U_N(k+1, k+1) = (k+2)/(k+1). Therefore the claim also holds for the kth step, and no row swapping is required.
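Both factorizations can be verified numerically; a sketch assuming T_N = tridiag(−1, 2, −1), with N = 7 an arbitrary choice:

```python
import numpy as np

N = 7
T = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # T_N = tridiag(-1, 2, -1)

i = np.arange(1, N + 1)
# upper bidiagonal factor with the claimed entries
B = np.diag(np.sqrt((i + 1) / i)) - np.diag(np.sqrt(i[:-1] / (i[:-1] + 1)), k=1)

L = np.eye(N) - np.diag(i[:-1] / (i[:-1] + 1), k=-1)   # unit lower bidiagonal
U = np.diag((i + 1) / i) - np.eye(N, k=1)              # upper bidiagonal

print(np.allclose(B.T @ B, T), np.allclose(L @ U, T))
```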
Proof. It suffices to verify that (−∂²/∂x² − ∂²/∂y²) sin(iπx) sin(jπy) = (i²π² + j²π²) sin(iπx) sin(jπy), as follows:

(−∂²/∂x² − ∂²/∂y²) sin(iπx) sin(jπy) = −(∂²/∂x²) sin(iπx) sin(jπy) − (∂²/∂y²) sin(iπx) sin(jπy)
= i²π² sin(iπx) sin(jπy) + j²π² sin(iπx) sin(jπy).
Question 6.4. 1. Prove Lemma 6.2.

Proof.

1. As the textbook has already proved part III of Lemma 6.2, we only need to prove parts I and II. For part I, partition X by columns as X = [x_1, …, x_n]. Then

vec(AX) = [Ax_1; …; Ax_n] = diag(A, …, A)·vec(X) = (I_n ⊗ A)·vec(X).

The proof of part II is similar; it suffices to verify that (XB)(i, j) = Σ_{k=1}^n B(k, j)X(i, k), which says vec(XB) = (Bᵀ ⊗ I)·vec(X).

2. Since the (i, j) block of (A ⊗ B)(C ⊗ D) is Σ_k A(i, k)C(k, j)·BD, which is the (i, j) block of (AC) ⊗ (BD), we have (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).

3. Consequently,

(A ⊗ B)(A⁻¹ ⊗ B⁻¹) = I_m ⊗ I_n = I_{mn}.

4. It can be verified entrywise in the same way.
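Parts I and II of Lemma 6.2 are easy to check with NumPy's kron; note that vec stacks columns, i.e., Fortran order (the test matrices are random):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))

vec = lambda M: M.flatten(order='F')    # stack the columns of M

print(np.allclose(vec(A @ X), np.kron(np.eye(n), A) @ vec(X)),     # part I
      np.allclose(vec(X @ B), np.kron(B.T, np.eye(n)) @ vec(X)))   # part II
```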
Question 6.5. Suppose that the n-by-n matrix A is diagonalizable, so A has n independent eigenvectors: Ax_i = α_i x_i, or AX = XΛ_A, where X = [x_1, x_2, …, x_n] and Λ_A = diag(α_i). Similarly, suppose that the m-by-m matrix B is diagonalizable, so B has m independent eigenvectors: By_i = β_i y_i, or BY = YΛ_B, where Y = [y_1, y_2, …, y_m] and Λ_B = diag(β_j). Prove the following results:

(I_m ⊗ A + B ⊗ I_n)(Y ⊗ X) = (Y ⊗ X)(I_m ⊗ Λ_A + Λ_B ⊗ I_n),
(B ⊗ A)(Y ⊗ X) = (Y ⊗ X)(Λ_B ⊗ Λ_A).

Remark. It seems that part II of the question is wrong as stated: the sums α_i + β_j must not equal zero, as was proved in Question 4.6. Also, I do not know why the author swapped A and B in the matrix form, although the two forms are equivalent.
Proof.

1. Using Lemma 6.2,

(I_m ⊗ A + B ⊗ I_n)(Y ⊗ X) = Y ⊗ AX + BY ⊗ X = Y ⊗ XΛ_A + YΛ_B ⊗ X
= (Y ⊗ X)(I_m ⊗ Λ_A) + (Y ⊗ X)(Λ_B ⊗ I_n)
= (Y ⊗ X)(I_m ⊗ Λ_A + Λ_B ⊗ I_n).

2. The Sylvester equation AX + XB = C is equivalent to

(I_m ⊗ A + Bᵀ ⊗ I_n) vec(X) = vec(C),

which is a linear system; it is solvable for every C if and only if det(I_m ⊗ A + Bᵀ ⊗ I_n) ≠ 0. It suffices to note that the eigenvalues of I_m ⊗ A + Bᵀ ⊗ I_n are also the α_i + β_j, which follows from part 1 with B replaced by Bᵀ; thus the condition is that α_i + β_j ≠ 0 for all i, j.

3. (B ⊗ A)(Y ⊗ X) = (BY) ⊗ (AX) = (YΛ_B) ⊗ (XΛ_A) = (Y ⊗ X)(Λ_B ⊗ Λ_A).
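The vec trick in part 2 also gives a (dense, O(n³m³)) way to solve the Sylvester equation AX + XB = C; a sketch with random test data:

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 3, 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
C = rng.standard_normal((n, m))

# AX + XB = C  <=>  (I_m (x) A + B^T (x) I_n) vec(X) = vec(C)
K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
x = np.linalg.solve(K, C.flatten(order='F'))
X = x.reshape((n, m), order='F')

print(np.allclose(A @ X + X @ B, C))
```

(For real work one would use a Bartels-Stewart style solver such as `scipy.linalg.solve_sylvester` instead of forming the Kronecker sum.)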
Question 6.6. Programming question. Skipped.

Question 6.7. The question lists several properties of the Chebyshev polynomials; for the full statement, please refer to the textbook.

Proof.

1. Since the mth Chebyshev polynomial is defined by T_m(x) = 2xT_{m−1}(x) − T_{m−2}(x) with initial values T_0(x) = 1, T_1(x) = x, we can prove it by induction. As T_0(1) = 1 and T_1(1) = 1, assuming the claim holds up to the (m−1)th Chebyshev polynomial, it follows that T_m(1) = 2·1·T_{m−1}(1) − T_{m−2}(1) = 2 − 1 = 1.
3. The recursive formula has a unique solution once the initial values are given, so it suffices to verify that the stated T_m satisfies the recursion, since it holds in the base cases. When |x| ≤ 1, write a = arccos x, so that cos a = x.¹⁰ The identity

cos(ma) + cos((m−2)a) = 2 cos((m−1)a) cos a

then gives

T_m = cos(ma) = 2x cos((m−1)a) − cos((m−2)a) = 2xT_{m−1} − T_{m−2}.
5. As cosh t > 0 for all t, the zeros of T_m(x) must lie in the segment [−1, 1]. When |x| ≤ 1, T_m(x) = cos(m arccos x), which vanishes at x_i = cos((2i − 1)π/(2m)), i = 1, …, m. Since T_m(x) is a polynomial of degree m, these are all of its zeros.
⁹ Although T_0 does not satisfy the recursion, this does not matter.
¹⁰ cos(arccos x) = x holds because |x| ≤ 1.
¹¹ cosh(arccosh x) = x holds because |x| ≥ 1.
6. Since arccosh x = ln(x ± √(x² − 1)) when |x| > 1, it follows that¹¹

T_m = ½( exp(m ln(x ± √(x² − 1))) + exp(−m ln(x ± √(x² − 1))) )
    = ½( (x ± √(x² − 1))^m + (x ± √(x² − 1))^{−m} ).

As x + √(x² − 1) = 1/(x − √(x² − 1)), the two sign choices can be united into one formula,

T_m = ½( (x + √(x² − 1))^m + (x + √(x² − 1))^{−m} ).

In particular, for x = 1 + ε,

T_m(1 + ε) = ½( (1 + ε + √(ε² + 2ε))^m + (1 + ε + √(ε² + 2ε))^{−m} ) ≥ ½(1 + √(2ε))^m ≥ ½(1 + m√(2ε)).
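The recurrence, the closed form cos(m arccos x), and the zeros from part 5 can all be checked numerically (the grid and degrees below are arbitrary choices):

```python
import numpy as np

x = np.linspace(-0.99, 0.99, 101)
T = [np.ones_like(x), x.copy()]          # T_0 = 1, T_1 = x
for m in range(2, 8):
    T.append(2 * x * T[-1] - T[-2])      # three-term recurrence

# cos(m arccos x) matches the recurrence on (-1, 1)
ok = all(np.allclose(T[m], np.cos(m * np.arccos(x))) for m in range(8))

# the points cos((2i-1)pi/(2m)) really are zeros of T_m
m = 6
zeros = np.cos((2 * np.arange(1, m + 1) - 1) * np.pi / (2 * m))
vals = np.cos(m * np.arccos(zeros))
print(ok, np.allclose(vals, 0))
```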
Question 6.9. Confirm that evaluating the formula in (6.47) by performing the matrix-vector multiplications from right to left is mathematically the same as Algorithm 6.13.

Proof. Since

I ⊗ Λ + Λ ⊗ I = diag(λ_1I + Λ, …, λ_nI + Λ),

it follows that

(I ⊗ Λ + Λ ⊗ I)⁻¹ = diag( 1/(2λ_1), 1/(λ_1 + λ_2), …, 1/(λ_2 + λ_1), …, 1/(2λ_n) ).

Consequently,

( (I ⊗ Λ + Λ ⊗ I)⁻¹ vec(h²F′) )(jn + i) = h² f′_{i,j+1} / (λ_{j+1} + λ_i),

so the second step is correct. Finally, use part IV of Question 6.4 again, yielding the result.
Question 6.10. The question is too long, so I omit it. For details, please refer to the textbook.
Proof.
1. First, we prove that when two matrices commute they have at least one common eigenvector. Suppose λ is an eigenvalue of A with corresponding eigenvectors x_1, x_2, …, x_s (a basis of the eigenspace). Since ABx_i = BAx_i = λBx_i, each Bx_i is in the eigenspace of A for λ. Consequently, Bx_i = Σ_{j=1}^s a_{ji} x_j, or, in matrix form,

(Bx_1, …, Bx_s) = (x_1, …, x_s) S,  where S = (a_{ji}) is s-by-s.

The matrix S has at least one eigenvector y, corresponding to some µ. Thus,

B[x_1, …, x_s]y = [x_1, …, x_s]Sy = µ[x_1, …, x_s]y.

As the x_i are linearly independent, [x_1, …, x_s]y ≠ 0. Therefore [x_1, …, x_s]y is a common eigenvector. Then, we prove the question by induction. The base case holds because it involves only scalar operations. Suppose it holds for (n−1)-by-(n−1) matrices. Then, for n-by-n matrices, since A and B have one common unit eigenvector x, expand it into an orthogonal matrix Q. By symmetry we have

QᵀAQ = [ λ, 0; 0, A_{n−1} ],  QᵀBQ = [ µ, 0; 0, B_{n−1} ].

Since QᵀAQ·QᵀBQ = QᵀABQ = QᵀBAQ = QᵀBQ·QᵀAQ, it follows that A_{n−1} and B_{n−1} commute. Thus, by induction, there exists an orthogonal Q̃ such that Q̃ᵀA_{n−1}Q̃ = Λ_{A_{n−1}} and Q̃ᵀB_{n−1}Q̃ = Λ_{B_{n−1}}. Then

[1, 0; 0, Q̃]ᵀ QᵀAQ [1, 0; 0, Q̃] = [ λ, 0; 0, Λ_{A_{n−1}} ],
[1, 0; 0, Q̃]ᵀ QᵀBQ [1, 0; 0, Q̃] = [ µ, 0; 0, Λ_{B_{n−1}} ].
2. Since T̃ = 2I − T_N, by Question 4.5 the eigenvalues of T̃ are 2cos(jπ/(N+1)), with corresponding eigenvectors z_j = [sin(jπ/(N+1)), …, sin(Njπ/(N+1))]ᵀ. Then, as T̂ = αI − θT̃, it follows that the eigenvalues of T̂ are α − 2θcos(jπ/(N+1)), with the same corresponding eigenvectors z_j.
3. Since T = I ⊗ A + T̃ ⊗ H, it follows that

(I ⊗ Λ_A)x_{ij} = (I ⊗ Λ_A)(z_i ⊗ e_j) = z_i ⊗ (α_j e_j) = α_j (z_i ⊗ e_j) = α_j x_{ij}.

Denote X = [x_{11}, x_{12}, …, x_{nn}] = Z ⊗ I; it follows that X is the desired eigenvector matrix.

5. If A and B are symmetric tridiagonal Toeplitz matrices, Q = Z, and the cost of the matrix-matrix products can be reduced to O(n² log n) by the fast sine transform (or FFT). Thus the total cost reduces to O(n² log n).
Question 6.11. Suppose that R is upper triangular and nonsingular and that C is upper Hessenberg. Confirm that RCR⁻¹ is upper Hessenberg.

Proof. As the inverse of an upper triangular matrix is upper triangular, it suffices to verify that multiplication by an upper triangular matrix, on either side, preserves upper Hessenberg form. Since C(i, j) = 0 when i > j + 1 and R(i, j) = 0 when i > j, it follows that, when i > j + 1,

(RC)(i, j) = Σ_{k=1}^n R(i, k)C(k, j) = Σ_{k=1}^{i−1} R(i, k)C(k, j) + Σ_{k=i}^n R(i, k)C(k, j) = 0,
(CR)(i, j) = Σ_{k=1}^n C(i, k)R(k, j) = Σ_{k=1}^{i−2} C(i, k)R(k, j) + Σ_{k=i−1}^n C(i, k)R(k, j) = 0.
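A quick numerical confirmation that RCR⁻¹ stays upper Hessenberg (random R and C; the diagonal shift keeps R safely nonsingular):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 6
R = np.triu(rng.standard_normal((n, n))) + n * np.eye(n)   # nonsingular upper triangular
C = np.triu(rng.standard_normal((n, n)), k=-1)             # upper Hessenberg

M = R @ C @ np.linalg.inv(R)
below = np.tril(M, k=-2)                                   # entries with i > j + 1
print(np.allclose(below, 0))
```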
Question 6.12. Confirm that the Krylov subspace K_k(A, y_1) has dimension k if and only if the Arnoldi algorithm (Algorithm 6.9) or the Lanczos algorithm (Algorithm 6.10) can compute q_k without quitting first.

Proof. The proofs for the Arnoldi and Lanczos algorithms are similar, so it suffices to treat the Arnoldi algorithm. We claim that at the ith step of the Arnoldi algorithm, z_i = Aq_i lies in span(A[y_1, …, y_i]). Granting the claim, the question is proved, because the algorithm quits exactly when h_{j+1,j} = 0 occurs for some j, which is equivalent to span(A[y_1, …, y_j]) ⊂ span[y_1, …, y_j], i.e., to the Krylov subspace K_k(A, y_1) having dimension less than k.

We prove by induction that q_i ∈ span[y_1, …, y_i] for each i (which gives the claim, since then Aq_i ∈ span(A[y_1, …, y_i])). The base case holds because q_1 = y_1/‖y_1‖₂. If the statement holds through step j − 1, then z_{j−1} = Aq_{j−1} ∈ span(A[y_1, …, y_{j−1}]) = span[y_2, …, y_j], and since

q_j = ( z_{j−1} − Σ_{k=1}^{j−1} h_{k,j−1} q_k ) / h_{j,j−1},

we conclude q_j ∈ span[y_1, …, y_j].
Question 6.13. Confirm that when the n-by-n matrix A is symmetric positive definite and the n-by-k matrix Q has full column rank, then T = QᵀAQ is also symmetric positive definite.

Proof. T is clearly symmetric, and for any vector x ∈ R^k,

xᵀQᵀAQx = (Qx)ᵀA(Qx) ≥ 0.

As A is s.p.d., equality holds only when Qx = 0. Since Q has full column rank, the only solution of Qx = 0 is x = 0. Thus T is also s.p.d.
Question 6.14. Prove Theorem 6.9.

Remark. The theorem contains a small error: it writes Â = M^{−1/2}AM^{1/2}, whereas the correct statement is Â = M^{−1/2}AM^{−1/2}.

Proof. It suffices to prove the stated relationships between the variables, which we do by induction. For the base case, we verify as follows:

ẑ = Â·p̂_1 = M^{−1/2}AM^{−1}b = M^{−1/2}Ap_1 = M^{−1/2}z,
ν̂_1 = (r̂_0ᵀr̂_0)/(p̂_1ᵀẑ) = (bᵀM^{−1}b)/(bᵀM^{−1}z) = ν_1,
x̂_1 = x̂_0 + ν̂_1p̂_1 = ν_1M^{1/2}p_1 = M^{1/2}x_1,
r̂_1 = r̂_0 − ν̂_1ẑ = M^{−1/2}r_0 − ν_1M^{−1/2}z = M^{−1/2}r_1,
µ̂_2 = (r̂_1ᵀr̂_1)/(r̂_0ᵀr̂_0) = (r_1ᵀM^{−1}r_1)/(r_0ᵀM^{−1}r_0) = µ_2,
p̂_2 = r̂_1 + µ̂_2p̂_1 = M^{−1/2}r_1 + µ_2M^{1/2}p_1 = M^{1/2}p_2.

Suppose the relationships hold at the (k−1)th step. Then, at the kth step,

ẑ = Â·p̂_k = M^{−1/2}Ap_k = M^{−1/2}z,
ν̂_k = (r̂_{k−1}ᵀr̂_{k−1})/(p̂_kᵀẑ) = (r_{k−1}ᵀM^{−1}r_{k−1})/(p_kᵀz) = ν_k,
x̂_k = x̂_{k−1} + ν̂_kp̂_k = M^{1/2}x_{k−1} + ν_kM^{1/2}p_k = M^{1/2}x_k,
r̂_k = r̂_{k−1} − ν̂_kẑ = M^{−1/2}r_{k−1} − ν_kM^{−1/2}z = M^{−1/2}r_k,
µ̂_{k+1} = (r̂_kᵀr̂_k)/(r̂_{k−1}ᵀr̂_{k−1}) = (r_kᵀM^{−1}r_k)/(r_{k−1}ᵀM^{−1}r_{k−1}) = µ_{k+1},
p̂_{k+1} = r̂_k + µ̂_{k+1}p̂_k = M^{−1/2}r_k + µ_{k+1}M^{1/2}p_k = M^{1/2}p_{k+1}.