
Solutions for Applied Numerical Linear Algebra

Zheng Jinchang
January 25, 2014

This is a solutions manual for Applied Numerical Linear Algebra by James W. Demmel. It covers
most of the questions from Chapter I to Chapter VI, except the programming exercises and a few
other questions. Since I wrote it on my own, it will certainly contain errors; if you find any,
or have other comments, you can email me at jczheng1234@hotmail.com. I hope this solution
is helpful :)

CONTENTS

1 Solutions for Chapter I: Introduction 2

2 Solutions for Chapter II: Linear Equation Solving 15

3 Solutions for Chapter III: Linear Least Squares Problems 29

4 Solutions for Chapter IV: Nonsymmetric Eigenvalue Problems 42

5 Solutions for Chapter V: The Symmetric Eigenproblem and Singular Value Decomposition 51

6 Solutions for Chapter VI: Iterative Methods for Linear Systems 67

1 SOLUTIONS FOR CHAPTER I: INTRODUCTION
Question 1.1. Let A be an orthogonal matrix. Show that det(A) = ±1. Show that if B is also
orthogonal and det(A) = − det(B ), then A + B is singular.

Solution. According to the definition of orthogonal matrix, we have

A AT = I .

Take determinants on both sides. Since det(A) = det(A^T), this gives

(det(A))^2 = 1.

Consequently,

det(A) = ±1.
To prove that A + B is singular, consider det(A + B) = det(A^T + B^T), which yields

det(I + BA^T) det(A) = det(A + B) = det(A^T + B^T) = det(B^T) det(BA^T + I).

Since det(A) = −det(B) = −det(B^T), the two outer expressions give 2 det(A) det(BA^T + I) = 0; as det(A) ≠ 0 it follows that det(BA^T + I) = 0. Thus

det(A + B) = 0,

i.e., A + B is singular.
Question 1.2. The rank of a matrix is the dimension of the space spanned by its columns.
Show that A has rank one if and only if A = ab T for some column vectors a and b.

Remark. For this question to hold, we have to assume that both a and b are nonzero.
Proof. If A = ab^T, write b = (b_1, b_2, ..., b_n)^T and partition A by columns as A = (α_1, α_2, ..., α_n). We have

α_i = b_i a.

Therefore, every column of A is a scalar multiple of a. As a, b ≠ 0, we have A ≠ 0. Then dim(span(α_1, α_2, ..., α_n)) = 1, i.e., rank(A) = 1.

On the other hand, if rank(A) = 1, partition A by columns as A = (α_1, α_2, ..., α_n). Without loss of generality, suppose α_1 ≠ 0. Because rank(A) = 1, there is only one linearly independent column of A, so each column is a multiple of α_1, say

α_i = b_i α_1,  with b_1 = 1.

Denote a = α_1 and b = (b_1, b_2, ..., b_n)^T. Then a, b ≠ 0, and

A = ab^T.


Question 1.3. Show that if a matrix is orthogonal and triangular, then it is diagonal. What
are its diagonal elements?

Proof. Without loss of generality, suppose A = (a_ij) is orthogonal and upper triangular. Because A is orthogonal, A^T A = I. The first column of A is (a_11, 0, ..., 0)^T, so equating the first row of A^T A with the first row of I yields a_11^2 = 1 and a_11 a_1j = 0 for j > 1, i.e.

a_1i = ±δ_1i.

The condition A^T A = I then reduces to the same statement for the trailing submatrix: A(2 : n, 2 : n) is orthogonal and upper triangular. Thus the result follows by induction, and the diagonal entries are ±1.
Question 1.4. A matrix is strictly upper triangular if it is upper triangular with zero diagonal elements. Show that if A is strictly upper triangular and n-by-n, then A^n = 0.

Proof. Suppose B is an n-by-n matrix, and partition B by its columns as B = (b_1, b_2, ..., b_n). The i-th column of BA is

(BA)(:, i) = Σ_{j=1}^{i−1} A(j, i) b_j,

because only the first i − 1 entries of the i-th column of A are nonzero. Thus the i-th column of BA is a linear combination of the first i − 1 columns of B. Substituting B = A shows that the diagonal and the first superdiagonal of A^2 are zero; multiplying by A again shows that the second superdiagonal of A^3 is zero as well. Continuing in this way, after n factors all the superdiagonals of A^n up to the (n − 1)-st have become zero, i.e. A^n = 0.
Question 1.5. Let k · k be a vector norm on Rm and assume that C ∈ Rm×n . Show that if
rank(C ) = n, then kxkC ≡ kC xk is a vector norm.

Proof.

1. Positive definiteness.
For any x ∈ Rn , since k · k is a vector norm, we have kC xk ≥ 0, and if and only if C x = 0,
kC xk = 0. Consider the linear system of equations

C x = 0.

Because C has full column rank, the above equations have only zero solution. This
proves the positive definiteness property.

2. Homogeneity.
For any x ∈ Rn , α ∈ R,

kαxkC = kαC xk = |α|kC xk = |α|kxkC .

This proves the homogeneity property.

3. The triangle inequality.


For any x, y ∈ Rn ,

kx + ykC = kC x +C yk ≤ kC xk + kC yk = kxkC + kykC .

This proves the triangluar inequality.

Consequently, the kxkC ≡ kC xk is a vector norm. 

Question 1.6. Show that if 0 ≠ s ∈ R^n and E ∈ R^{m×n}, then

‖E(I − ss^T/(s^T s))‖_F^2 = ‖E‖_F^2 − ‖Es‖_2^2/(s^T s).

Proof. Since

E(I − ss^T/(s^T s)) (E(I − ss^T/(s^T s)))^T = E(I − ss^T/(s^T s)) E^T = EE^T − Es(Es)^T/(s^T s),

it follows that

‖E(I − ss^T/(s^T s))‖_F^2 = tr(EE^T) − tr(Es(Es)^T)/(s^T s) = ‖E‖_F^2 − ‖Es‖_2^2/(s^T s).

Therefore, the question is proved.
Remark. It could also be proved by the projection property: for any u ∈ R^n, P = uu^T/(u^T u) is the orthogonal projection onto the subspace spanned by u, and I − P is its complementary projection.

Question 1.7. Verify that ‖xy*‖_F = ‖xy*‖_2 = ‖x‖_2 ‖y‖_2 for any x, y ∈ C^n.

Proof. Because the Frobenius norm can be evaluated as ‖A‖_F^2 = tr(AA*), it follows that

‖xy*‖_F^2 = tr(xy*yx*) = (y*y) tr(xx*) = ‖y‖_2^2 ‖x‖_2^2.

For the two-norm, since ‖A‖_2^2 = λ_max(AA*), substituting A = xy* gives

‖xy*‖_2^2 = λ_max(xy*yx*) = (y*y) λ_max(xx*) = ‖y‖_2^2 λ_max(xx*).

What remains is to determine λ_max(xx*). Because xx* is Hermitian,

λ_max(xx*) = max_{v≠0} (v*xx*v)/(v*v) = max_{v*v=1} |v*x|^2.

By the Cauchy–Schwarz inequality, |v*x| ≤ ‖v‖_2 ‖x‖_2 = ‖x‖_2, with equality when v is proportional to x (e.g. v = x/‖x‖_2). Therefore λ_max(xx*) = ‖x‖_2^2, and consequently

‖xy*‖_2^2 = ‖y‖_2^2 λ_max(xx*) = ‖x‖_2^2 ‖y‖_2^2.

Combining the two equations proves the question.

Remark. If we regard the vectors x and y as n-by-1 matrices, then by the consistency of the two-norm ‖xy*‖_2 ≤ ‖x‖_2 ‖y‖_2 holds, and this upper bound is attained by applying xy* to the vector y. Since we have not yet proved consistency, I use the method above instead.

Question 1.8. One can identify the degree-d polynomials p(x) = Σ_{i=0}^{d} a_i x^i with R^{d+1} via the vector of coefficients. Let x be fixed. Let S_x be the set of polynomials with an infinite relative condition number with respect to evaluating them at x (i.e., they are zero at x). In a few words, describe S_x geometrically as a subset of R^{d+1}. Let S_x(κ) be the set of polynomials whose relative condition number is κ or greater. Describe S_x(κ) geometrically in a few words. Describe how S_x(κ) changes geometrically as κ → ∞.

Solution. Write the coefficients of a polynomial of degree d as a = (a_1, a_2, ..., a_{d+1})^T, and let x = (1, x, ..., x^d)^T be fixed. Since S_x consists of the polynomials that are zero at x, we have

a^T x = 0.

Therefore the set S_x is the hyperplane through the origin that is perpendicular to the fixed x (its normal vector is x).

As for S_x(κ), set |a| = (|a_1|, |a_2|, ..., |a_{d+1}|)^T and |x| = (|x_1|, |x_2|, ..., |x_{d+1}|)^T. By the definition of the relative condition number,

|a|^T |x| / |a^T x| ≥ κ.

Using the Cauchy–Schwarz inequality |a|^T |x| ≤ ‖a‖_2 ‖x‖_2, the left-hand side can only increase, so

‖a‖_2 ‖x‖_2 / |a^T x| ≥ κ.

Taking reciprocals and then arccos (both reverse the inequality, so its direction is restored) gives

∠(a, x) ≥ arccos(1/κ).

So the set S_x(κ) consists of the coefficient vectors whose angle with the fixed x is at least arccos(1/κ). Thus, as κ → ∞, arccos(1/κ) → π/2, i.e., these vectors become perpendicular to the fixed x and S_x(κ) closes down on the hyperplane S_x.

Question 1.9. This question is too long, so I omit it. For details, please refer to the textbook.

Proof. For the first algorithm (the direct formula log(1 + x)/x), insert a roundoff term (1 + δ_i) at each floating point operation; we get

ŷ = (log((1 + x)(1 + δ_1)) / x) (1 + δ_2) = ((log(1 + x) + log(1 + δ_1)) / x) (1 + δ_2).

Since x, δ_i ≈ 0, the relative error is

|(ŷ − y)/y| ≈ |(log(1 + δ_1)/x) / (log(1 + x)/x) · (1 + δ_2)| ≈ |(δ_1/x)(1 + δ_2)| ≈ |δ_1/x|.

As δ_1 is only bounded by ε, the relative error grows without bound as x → 0.

Using the same technique to analyse the second algorithm (which computes d = 1 + x and returns log(d)/(d − 1) unless d = 1), its relative error is

|(ŷ − y)/y| ≈ |((d − 1)/log d) · (log d / ((d − 1)(1 + δ_1))) (1 + δ_2) − 1| = |(1 + δ_2)/(1 + δ_1) − 1| ≈ |δ|,  |δ| ≤ 2ε.

Therefore the second algorithm remains accurate near x = 0.
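Not part of the original solution: a small NumPy experiment contrasting the two algorithms near x = 0. It assumes the second algorithm of the exercise is the usual variant that computes d = 1 + x and returns log(d)/(d − 1) (returning 1 when d = 1), which is what the analysis above uses; the function names are mine.

import numpy as np

def naive(x):
    # Algorithm 1: log(1 + x)/x evaluated directly
    return np.log(1.0 + x) / x

def corrected(x):
    # Algorithm 2: d = 1 + x; return 1 if d == 1, else log(d)/(d - 1)
    d = 1.0 + x
    return 1.0 if d == 1.0 else np.log(d) / (d - 1.0)

for x in [1e-4, 1e-8, 1e-12, 1e-15]:
    exact = np.log1p(x) / x          # accurate reference value
    print(x, naive(x) - exact, corrected(x) - exact)

The errors of the first formula grow roughly like ε/x as x shrinks, while the second stays at roundoff level.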


Question 1.10. Show that, barring overflow or underflow, fl(Σ_{i=1}^{d} x_i y_i) = Σ_{i=1}^{d} x_i y_i (1 + δ_i), where |δ_i| ≤ dε. Use this to prove the following fact. Let A_{m×n} and B_{n×p} be matrices, and compute their product in the usual way. Barring overflow or underflow, show that |fl(A·B) − A·B| ≤ n · ε · |A| · |B|. Here the absolute value of a matrix |A| means the matrix with entries (|A|)_ij = |a_ij|, and the inequality is meant componentwise.
Proof. Denote by δ_i the multiplication roundoff errors and by δ̂_j the addition roundoff errors. Expanding the formula with the roundoff terms, we get

fl(Σ_{i=1}^{d} x_i y_i) = Σ_{i=2}^{d} (1 + δ_i) Π_{j=i−1}^{d−1} (1 + δ̂_j) x_i y_i + (1 + δ_1) Π_{j=1}^{d−1} (1 + δ̂_j) x_1 y_1.

As

1 − (d − i + 2)ε + O(ε^2) ≤ (1 − ε)^{d−i+2} ≤ (1 + δ_i) Π_{j=i−1}^{d−1} (1 + δ̂_j) ≤ (1 + ε)^{d−i+2} ≤ 1 + (d − i + 2)ε + O(ε^2),

each accumulated factor can be written as 1 + δ̄_i with |δ̄_i| ≤ (d − i + 2)ε ≤ dε (ignoring the O(ε^2) terms), and likewise the factor multiplying x_1 y_1 is 1 + δ̄_1 with |δ̄_1| ≤ dε. Therefore fl(Σ_{i=1}^{d} x_i y_i) = Σ_{i=1}^{d} x_i y_i (1 + δ̄_i), where |δ̄_i| ≤ dε.

Now turn to the matrix inequality. Since the absolute value of a matrix is defined componentwise, we just need to prove the bound componentwise. Suppose c_ij = (A·B)(i, j) = Σ_{k=1}^{n} A(i, k) B(k, j). Applying what we have just proved,

|fl(c_ij) − c_ij| = |Σ_{k=1}^{n} A(i, k) B(k, j) δ_k| ≤ nε Σ_{k=1}^{n} |A(i, k)| |B(k, j)| = nε (|A| |B|)(i, j).

This proves the question.
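Not from the book: a quick NumPy check of the componentwise bound |fl(A·B) − A·B| ≤ n·ε·|A|·|B|, computing the product in single precision and comparing it with a double precision reference; the variable names are mine.

import numpy as np

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)).astype(np.float32)
B = rng.standard_normal((n, n)).astype(np.float32)

C32 = A @ B                                             # product rounded in single precision
C64 = A.astype(np.float64) @ B.astype(np.float64)       # (nearly) exact reference
eps = np.finfo(np.float32).eps
bound = n * eps * (np.abs(A).astype(np.float64) @ np.abs(B).astype(np.float64))
print(np.all(np.abs(C32 - C64) <= bound))               # expect True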

Question 1.11. Let L be a lower triangular matrix and solve Lx = b by forward substitution. Show that, barring overflow or underflow, the computed solution x̂ satisfies (L + δL)x̂ = b, where |δl_ij| ≤ nε|l_ij| and ε is the machine precision. This means that forward substitution is backward stable. Argue that back substitution for solving upper triangular systems satisfies the same bound.

Proof. Insert a roundoff term (1 + δ) at each floating point operation in the forward substitution formula, and use the result of the last question. We get

x_i = ((b_i − Σ_{k=1}^{i−1} l_ik x_k (1 + δ_k)) / l_ii) (1 + δ̂_i)(1 + δ̄_i),  where |δ_k| ≤ (i − 1)ε, |δ̂_i| ≤ ε, |δ̄_i| ≤ ε.

Solving this equation for b_i, and using

1/((1 + δ̂_i)(1 + δ̄_i)) = 1 + δ_i,  where |δ_i| ≤ 2ε (to first order),

we can absorb the x_i term into the sum as follows:

b_i = Σ_{k=1}^{i} l_ik x_k (1 + δ_ik),  where |δ_ik| ≤ max{2, i − 1} ε.

Therefore, setting (δL)_ij = l_ij δ_ij, we have proved (L + δL)x̂ = b, where |δl_ij| ≤ nε|l_ij|. The argument for back substitution with an upper triangular matrix is identical, working from the last equation upwards.

Remark. The result would be slightly different if we evaluated b_i − Σ_{k=1}^{i−1} l_ik x_k as

(((b_i − l_i1 x_1) − l_i2 x_2) − l_i3 x_3) − ...,

which may in fact be the evaluation order the question intended. My result is based on the order

b_i − (Σ_{k=1}^{i−1} l_ik x_k).

For this question there is no major difference between the two.

Question 1.12. The question is too long, so I omit it. For details, please refer to the textbook.

Remark. In my view, the question intends us to prove that complex arithmetic implemented in real arithmetic is backward stable. Thus it may be sufficient to prove

|fl(a ⊙ b) − a ⊙ b| ≤ O(ε) |a ⊙ b|

for each complex operation ⊙. My proof tries to find explicit expressions for the δ's.

Solution. First, consider complex addition. Suppose there is such a δ = δ_1 + δ_2 i. Since a complex addition is two real additions, the floating point result can be written in two ways:

fl(a_1 + b_1 i + a_2 + b_2 i) = (a_1 + a_2)(1 + δ̂) + (b_1 + b_2)(1 + δ̄) i = (a_1 + a_2 + b_1 i + b_2 i)(1 + δ_1 + δ_2 i).

Solving for δ_1, δ_2 from the above equation, we get

δ_1 = ((a_1 + a_2)^2 δ̂ + (b_1 + b_2)^2 δ̄) / ((a_1 + a_2)^2 + (b_1 + b_2)^2),
δ_2 = ((a_1 + a_2)(b_1 + b_2)(δ̄ − δ̂)) / ((a_1 + a_2)^2 + (b_1 + b_2)^2).

Since |δ_1| ≤ max{|δ̂|, |δ̄|} ≤ ε and |δ_2| ≤ |δ̄ − δ̂|/2 ≤ ε, we have |δ_1|^2 + |δ_2|^2 ≤ 2ε^2, which proves the claim for complex addition. As subtraction is essentially the same as addition, the same holds for complex subtraction.
As for multiplication, using the same technique and the above result, we have

fl((a_1 + b_1 i)(a_2 + b_2 i)) = (a_1 + b_1 i)(a_2 + b_2 i)(1 + δ_1 + δ_2 i)
= (a_1 a_2 (1 + δ̄_1) − b_1 b_2 (1 + δ̄_4) + a_2 b_1 (1 + δ̄_2) i + a_1 b_2 (1 + δ̄_3) i)(1 + δ̂_1 + δ̂_2 i).

For simplicity we ignore the higher order epsilons, as they are not the major part of the error. Solving for δ_1, δ_2 from the above equation, we get

δ_1 = ((a_1^2 a_2^2 − a_1 a_2 b_1 b_2) δ̄_1 + (a_2^2 b_1^2 + a_1 a_2 b_1 b_2) δ̄_2 + (a_1^2 b_2^2 + a_1 a_2 b_1 b_2) δ̄_3 − (a_1 a_2 b_1 b_2 + b_1^2 b_2^2) δ̄_4 + (a_1^2 a_2^2 − b_1^2 b_2^2 + a_2^2 b_1^2 + a_1^2 b_2^2) δ̂_1 − (a_1 b_1 b_2^2 + a_1 a_2^2 b_1) δ̂_2) / ((a_1 a_2 − b_1 b_2)^2 + (a_1 b_2 + a_2 b_1)^2),

δ_2 = (−(a_1^2 a_2 b_2 + a_1 a_2^2 b_1) δ̄_1 + (a_1 a_2^2 b_1 − a_2 b_1^2 b_2) δ̄_2 + (a_1^2 a_2 b_2 − a_1 b_1 b_2^2) δ̄_3 + (a_1 b_1 b_2^2 + a_2 b_1^2 b_2) δ̄_4 + ((a_1 a_2 − b_1 b_2)^2 + (a_1 b_2 + a_2 b_1)^2) δ̂_2) / ((a_1 a_2 − b_1 b_2)^2 + (a_1 b_2 + a_2 b_1)^2).

All the coefficients of the δ's are bounded by one in magnitude, because of the mean value inequality. Therefore the claim holds for complex multiplication. Now we can check whether the real and imaginary parts of the complex product are each always computed accurately. Since fl((a_1 + b_1 i)(a_2 + b_2 i)) = (a_1 + b_1 i)(a_2 + b_2 i)(1 + δ_1 + δ_2 i), separating the real and imaginary parts gives

Error_r = |((a_1 a_2 − b_1 b_2) δ_1 − (a_1 b_2 + a_2 b_1) δ_2) / (a_1 a_2 − b_1 b_2)|,
Error_i = |((a_1 a_2 − b_1 b_2) δ_2 − (a_1 b_2 + a_2 b_1) δ_1) / (a_1 b_2 + a_2 b_1)|.

Thus, when a_1 a_2 − b_1 b_2 is the result of heavy cancellation (for instance a_1, b_1 → 0 with a_1 a_2 ≈ b_1 b_2), the real part may lose all relative accuracy, and similarly the imaginary part may be inaccurate when a_1 b_2 + a_2 b_1 ≈ 0. So the two parts of the complex product are not always computed accurately individually.
Finally, we check complex division using what we know about complex multiplication; my algorithm for complex division is the usual formula, except that the denominator a_2^2 + b_2^2 is formed directly rather than by an actual complex multiplication. Therefore, we have

fl((a_1 + b_1 i)/(a_2 + b_2 i)) = ((a_1 + b_1 i)/(a_2 + b_2 i))(1 + δ_1 + δ_2 i)
= ((a_1 + b_1 i)(a_2 − b_2 i)(1 + δ̄_1 + δ̄_2 i)) / ((a_2^2 (1 + δ̂_1) + b_2^2 (1 + δ̂_2))(1 + δ̂_3)) · (1 + δ̂_4).¹

Ignoring the higher order epsilons, merging (1 + δ̄_1 + δ̄_2 i)(1 + δ̂_4), (1 + δ̂_1)(1 + δ̂_3) and (1 + δ̂_2)(1 + δ̂_3), and keeping the old names, we can solve for δ_1, δ_2 as

δ_1 = δ̄_1 − (a_2^2 δ̂_1 + b_2^2 δ̂_2) / (a_2^2 + b_2^2),
δ_2 = δ̄_2.

As all the coefficients of the δ's are at most one in absolute value, this proves the result for complex division. What is more, no matter whether |a| is large or small, a/a will approximate 1 by the above formulas.

¹ In reality there are two division δ's (one for the real and one for the imaginary part). For simplicity I assume they are the same, because it makes no difference after the merging of δ's above; otherwise the procedure remains the same but is slightly more complicated.

Question 1.13. Prove lemma 1.3.

Remark. Since there is no major difference between the real and complex cases, for simplicity I prove the real case.
Proof. In one direction, if A is s.p.d., then for any x, y ∈ R^n define 〈x, y〉 = x^T A y. Verifying the four criteria of an inner product:

1. 〈x, y〉 = x^T A y = (x^T A y)^T = y^T A^T x = y^T A x = 〈y, x〉, using the symmetry of A.

2. 〈x, y + z〉 = x^T A(y + z) = x^T A y + x^T A z = 〈x, y〉 + 〈x, z〉.

3. 〈αx, y〉 = α x^T A y = α〈x, y〉.

4. 〈x, x〉 = x^T A x ≥ 0, and 〈x, x〉 = 0 if and only if x = 0, since A is positive definite.

Therefore it is an inner product.

In the other direction, if 〈·, ·〉 is an inner product, consider its values on the canonical basis vectors e_i of R^n. Set A(i, j) = 〈e_i, e_j〉; then A is symmetric because of the symmetry of the inner product. For any x, y ∈ R^n, expand them in the canonical basis as x = Σ_{i=1}^{n} α_i e_i, y = Σ_{i=1}^{n} β_i e_i. By bilinearity,

〈x, y〉 = Σ_{i=1}^{n} Σ_{j=1}^{n} α_i β_j 〈e_i, e_j〉 = x^T A y.

Since 〈x, x〉 = x^T A x > 0 for x ≠ 0, A is symmetric positive definite, and it is uniquely determined by the inner product (because the basis is fixed). Thus the question is proved.

Question 1.14. Prove Lemma 1.5.


p
Proof. First, since a 2 + b 2 ≤ |a| + |b|, we have
s s s
n
X n
X n
X n
X
kxk2 = x i2 ≤ |x 1 | + x i2 ≤ |x 1 | + |x 2 | + x i2 ≤ . . . ≤ |x i | = kxk1 .
i =1 i =2 i =3 i =1

on the other hand,


n n n n n
kxk21 = ( |x i |)2 = x i2 + 2 x i2 + (x i2 + x 2j ) = n x i2 = nkxk22 .
X X X X X X
|x i x j | ≤
i =1 i =1 i =1 i =1 i 6= j i =1
j =i +1

p
Therefore, we get the first inequality kxk2 ≤ kxk1 ≤ nkxk2 .
For the inequality about infinity-norm, since it is obvious that
r X X
max(|x i |) ≤ max(x i 2 ) + x i2 ≤ max(|x i |) + |x i |,
else else

and
n n
n max(|x i |)2 ≥ x i2 , n max(|x i |) ≥
X X
|x i |,
i =1 i =1
p
the second inequality kxk∞ ≤ kxk2 ≤ nkxk∞ and third inequality kxk∞ ≤ kxk1 ≤ nkxk∞
follows. 
Question 1.15. Prove Lemma 1.6.
Proof. Let A ∈ Rm×n , k · kmn the operator norm and k · km , k · kn the corresponding vector
norms. To prove the question, we just need to verify the three criteria.
1. Positive definiteness
Since kAkmn = maxkxkn =1 (kAxkm ), definitly kAkmn ≥ 0. If kAkmn = 0, which means
Ax = 0 for all x ∈ Rn . i.e., Rn is the solution set of linear system of equations Ax = 0. As
a result, we have A = 0.

2. Homogeneity
Due to the homogeneity of corresponding vector norm, we have

kαAkmn = max (kαAxkm ) = |α| max (kAxkm ) = |α|kAkmn .


kxkn =1 kxkn =1

3. The triangluar inequality


Because the triangular inequality of corresponding vector norm, we have

kA + B kmn = max (k(A + B )xkm ) ≤ max (kAxkm ) + max (kB xkm ) = kAkmn + kB kmn .
kxkn =1 kxkn =1 kxkn =1

In all, an operator norm is a matrix norm. 


Question 1.16. Prove all parts except 7 of lemma 1.7.
Remark. Part 14 of lemma 1.7 seems wrong. Thus, I do some modification there.
Proof.
1. kAxk ≤ kAk · kxk for a vector norm and its corresponding operator norm, or the vector
two-norm and matrix Frobenius norm.
For a vector norm and its corresponding operator norm, the inequality is obvious, be-
cause operator norm is defined as smallest upbound for which kAk · kxk ≥ kAxk holds.
We just need to prove it for the two-norm and Frobenius norm.
¢T
Partition matrix A by its rows as A = αT1 , αT2 , . . . , αTm . We have
¡

m m m
kAxk22 = (αTi x)2 ≤ kαi k22 kxk22 = kxk22 kαi k22 = kxk22 kAk2F .
X X X
i =1 i =1 i =1

Thus, it also holds for the two-norm and Frobenius norm.

2. kAB k ≤ kAk · kB k for any operator norm or for the Frobenius norm.
Using the above result, for any operator norm, we have

kA(B x)k kAk · kB xk


kAB k = max ≤ max ≤ kAk · kB k.
x6=0 kxk x6=0 kxk
¢T
For Frobenius norm, parition A by its rows as A = αT1 , αT2 , . . . , αTm , and split B by its
¡

columns as B = β1 , β2 , . . . , βs . We get
¡ ¢

m,s m,s m s
kAB k2F = (αTi β j )2 ≤ (kαi k22 · kβ j k22 ) = kαi k22 · kβi k22 = kAk2F kB k2F .
X X X X
i =1 i =1 i =1 j =1
j =1 j =1

Thus, it holds also for the two-norm and Frobenius norm.

3. The max norm and Frobenius norm are not operator norms.
Consider a matrix A = ααT . We have a lower bound of kAk as follows,

kAxk kAαk kααT αk kαk


kAk = max ≥ = = kαk22 = kαk22 .
x6=0 kxk kαk kαk kαk

Let α = 1, 1, . . . , 1 , then kAkmax = 1. However, according to the above inequality, if


¡ ¢

kAkmax is an operator norm, we have kAkmax ≥ n. Such conflict means that k · kmax is
not an operator norm.
For Frobenius norm, consider any operator norm of identity matrix, we have

kI xk kxk
kI k = max = max = 1.
x6=0 kxk x6=0 kxk

While the Frobenius norm of I is n, it means that Frobenius norm is not an operator
norm.

4. kQ AZ k = kAk if Q and Z are orthogonal or unitary for the Frobenius norm and for the
operator norm induced by k · k2 .2
Separate the target equation as two equations: kQ Ak = kAk, kAZ k = kAk. Since kA T k =
kAk holds for Frobenius and two-norm (it is obvious for Frobenius norm and later in
this question we will prove this
p for two-norm), we just need to prove the first equation.
For two-norm, since kAk2 = λmax (A T A), we get
q q
kQ Ak2 = λmax (A T Q T Q A) = λmax (A T A) = kAk2 .

Next, for the Frobenius norm, we claim that for any vector x, kQxk2 = kxk2 , because
kQxk22 = x T Q T Qx. Then, parition A by its columns as A = α1 , α2 , . . . , αn . We have
¡ ¢

n n
kQ Ak2F = kQαi k22 = kαi k22 = kAk2F .
X X
i =1 i =1

2 This result can be extended into the case of the rectangular matrix. The proof is nearly the same, except that we

use the equality kAk2 = σmax (A).

5. kAk∞ ≡ maxx6=0 kAxk ∞
P
kxk∞ = maxi j |a i j | = maximum absolute row sum.
¢T
Partition A by its rows as A = α1 , α2 , . . . , αTn . We have
¡ T T

kAxk∞ n ¯
= max |αTi x| ≤ max
X ¯
kAk∞ ≡ max ¯a i j ¯ .
x6=0 kxk∞ kxk∞ =1,i i j =1
¡
Denote i the number of row of the maximum absolute row sum, and set y = sign(a i 1 ),
¢ P
sign(a i 2 ), . . . , sign(a i n ) . The up bound is attainable, as kyk∞ = 1 and kAyk = maxi j |a i j |.

6. kAk1 ≡ maxx6=0 kAxk 1 T P


kxk 1 = kA k∞ = ¡max j i |a i j | = maximum absolute column sum.
Partition A by its columns as A = α1 , α2 , . . . , αn . We have
¢
° °
°X n ° n
kAk1 = max kAxk1 = max ° α1 x 1 ° ≤ max
X
|x i | kαi k1 .
° °
kxk1 =1 kxk1 =1 °i =1 ° kxk1 =1 i =1
1

Denote j the number of column of the maximum absolute column sum. We have
à à ! !
Xn X X
max |x i | kαi k1 = max |x i | kαi k1 + 1 − |x i | kα j k1
kxk1 =1 i =1 kxk1 =1 i 6= j i 6= j
à !
X¡ ¢
= max kα j k1 − kα j k1 − kαi k1 |x i |
kxk1 =1 i 6= j

≤ kα j k1 .

Set x = e j , and the up bound is attainable.

7. kAk2 = kA T k2p
.3
Since kAk2 = λmax (A T A), it suffice to prove that λmax (A T A) = λmax (A A T ). Denote λ
is as a nonzero eigenvalue of A T A and x its corresponding eigenvector. We have

(A A T )Ax = A(A T A)x = λAx.

Since A T Ax = λx, and λ 6= 0, we have Ax 6= 0. Therefore, λ is an eigenvalue of A A T and


Ax is its corresponding eigenvector. As both A T A and A A T are nonnegative symmetric
matrices, their eigenvalues must at least as large as zero. If one of the matrices has all
its eigenvalues zero, it follows A = 0. As a result the other will has all its eigenvalues
zero, too. Thus, λmax (A T A) = λmax (A A T ).

8. kAk2 = maxi |λi (A)| if A is normal, i.e., A A T = A T A.


Since A is normal, there exist an orthgonal matrix P and a diagonal matrix Λ = diag λ1 , λ2 , . . . , λn
¡ ¢

such that P −1 AP = Λ. Thus,


q
kAk2 = kP −1 AP k2 = kΛk2 = λmax (ΛT Λ) = max |λi (Λ)| = max |λi (A)|.
i i
3 There are several ways to prove that AB and B A have the same eigenvalues. Among them, I present an analytic

version when A and B are square here. First assume A is invertable, then det(λI − AB ) = det(λI −B A) is easy to
prove. For singular A, we perturb A as à = A + t I . à is invertable when t is small enough, and operator det() is
continuous. Thus, we have the result by taking limit. This method is inspired by my friend, Zhou Zeren. There
is another method in Chapter IV.

9. If A is n-by-n, then n −1/2 kAk2 ≤ kAk1 ≤ n 1/2 kAk2 .
p
Use the inequalities kxk2 ≤ kxk1 ≤ nkxk2 . Suppose kxk 6= 0. Invert the inequalities as
1 1 1
p ≤ ≤ .
nkxk2 kxk1 kxk2

kAxkp
As kAkp ≡ maxx6=0 kxkp , combine the two inequalities. We have
p
kAxk2 kAxk1 nkAxk2
p ≤ ≤ .
nkxk2 kxk1 kxk2

Take max, it follows


n −1/2 kAk2 ≤ kAk1 ≤ n 1/2 kAk2 .

10. If A is n-by-n, then n −1/2 kAk2 ≤ kAk∞ ≤ n 1/2 kAk2 .


Like the previous question, we use the inequalities n −1/2 kxk2 ≤ kxk∞ ≤ kxk2 . Invert the
inequalities as
1 1 n 1/2
≤ ≤ .
kxk2 kxk∞ kxk2
Combine the two inequalities. We have
p
kAxk2 kAxk∞ nkAxk2
p ≤ ≤ .
nkxk2 kxk∞ kxk2

Take max, we get


n −1/2 kAk2 ≤ kAk∞ ≤ n 1/2 kAk2 .

11. If A is n-by-n, then n −1 kAk∞ ≤ kAk1 ≤ nkAk∞ .


Like the previous question, we use the inequalities kxk∞ ≤ kxk1 ≤ nkxk∞ . Still, invert
the inequalities as
1 1 1
≤ ≤ .
nkxk∞ kxk1 kxk∞
Combine the two inequalities. We have
kAxk2 kAxk1 nkAxk∞
≤ ≤ .
nkxk∞ kxk1 kxk∞
Take max, we get
n −1 kAk∞ ≤ kAk1 ≤ nkAk∞ .

12. If A is n-by-n, then kAk2 ≤ kAkF ≤ n 1/2 kAk2 .


As A T A is symmetric nonnegative. Therefore, the eigenvalues of A T A are all nonnega-
tive. As λ(A T A) = tr(A T A), we get
P

λmax (A T A) ≤ tr(A T A) ≤ nλmax (A T A).

i.e.,
kAk2 ≤ kAkF ≤ n 1/2 kAk2


Question 1.17. We mentioned that on a Cray machine the expression arccos(x/√(x^2 + y^2)) caused an error, because roundoff caused x/√(x^2 + y^2) to exceed 1. Show that this is impossible using IEEE arithmetic, barring overflow or underflow. Extra credit: Prove the same result using correctly rounded decimal arithmetic.

Remark. I am not sure about the correctness of my proof, since in it there is no major difference between decimal and binary arithmetic. That may be because I use a different method from the one the author intended. Whether or not it is what was intended, my proof can serve as a reference.

Proof. If we have proved that fl(√(x^2)) = x exactly on a machine using IEEE arithmetic, then by monotonicity of the correctly rounded operations

fl(x/√(x^2 + y^2)) ≤ fl(x/√(x^2)) = fl(x/x) = 1,

which means it is impossible for arccos(x/√(x^2 + y^2)) to cause an error. Thus we only need to prove that fl(√(x^2)) = x. Denote fl(x^2) = x^2 (1 + δ) with |δ| ≤ ε, where x is already a floating point number. Then, to compute fl(√(x^2)), the computer evaluates √(fl(x^2)) ≈ x(1 + δ/2) and rounds it. However, as (√(fl(x^2)) − x)/x ≈ δ/2 and |δ|/2 ≤ ε/2, the δ/2 term is rounded off, resulting in fl(√(x^2)) = x.
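Not part of the original proof: a small NumPy experiment consistent with the two claims above (x/√(x^2 + y^2) never exceeds 1, and fl(√(x^2)) = x), run over random doubles in IEEE double precision.

import numpy as np

rng = np.random.default_rng(1)
x = np.abs(rng.standard_normal(10**6))
y = rng.standard_normal(10**6)
r = x / np.sqrt(x * x + y * y)
print(r.max() <= 1.0)              # expect True: the argument of arccos never exceeds 1
print(np.all(np.sqrt(x * x) == x)) # expect True: fl(sqrt(x^2)) = x, barring over/underflow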

Question 1.18. Suppose that a and b are normalized IEEE double precision floating point
numbers, and consider the following algorithm, running with IEEE arithmetic:

if(|a| < |b|), swap a and b


s1 = a + b
s 2 = (a − s 1 ) + b

Prove the following facts:

1. Barring overflow or underflow, the only roundoff error committed in running the algo-
rithm is computing s 1 = f l (a +b). In other words, both subtractions a −s 1 and (a −s 1 )+b
are computed exactly.

2. s 1 +s 2 = a +b, exactly. This means that s 2 is actually the roundoff error committed when
rounding the exact value of a + b to get s 1 .

Proof. To add or subtract, a computer first aligns the exponents of the two operands. For instance, suppose

a = (1, a_1, a_2, a_3, ..., a_50, a_51)_2 × 2^{e_1},
b = (1, b_1, b_2, b_3, ..., b_50, b_51)_2 × 2^{e_2},

with e_1 ≥ e_2 after the swap. To add, the computer shifts one operand so that both share a common exponent and then adds the fraction parts.

Therefore, if a is much larger than b, that is, if the lowest bit of a still has higher weight than the highest bit of b, we will have s_1 = fl(a + b) = a, and a − s_1 = a − a = 0 is exact, since a and b are already floating point numbers. Then s_2 = (a − s_1) + b = 0 + b = b is also exact.
If a is not too much larger than b, suppose without loss of generality that the leading bit of b lines up with the 50th bit of a. Keeping the notation above, we have

s_1 = fl(a + b) = (1, a_1, a_2, ..., a_50 + 1, a_51 + b_1)_2 × 2^{e_1}.

Then, computing a − s_1,

a − s_1 = (1 − 1, a_1 − a_1, ..., a_50 − a_50 − 1, a_51 − a_51 − b_1)_2 × 2^{e_1} = (−1, −b_1)_2 × 2^{e_1−50} = (−1, −b_1)_2 × 2^{e_2}.

This is exact, because the leading bit of s_1 is in the same position as the leading bit of a, so the subtraction cancels massively and the result is representable. Moreover, since a − s_1 has its leading bit in the same position as b, the sum (a − s_1) + b is exact as well.
The situation a < b is the same after the swap. In all cases we see that s_1 + s_2 together retain all the bits of a and b; therefore s_1 + s_2 = a + b exactly.
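Not from the original: a short Python check of both claims of Question 1.18, using exact rational arithmetic to verify that s_1 + s_2 equals a + b exactly; the helper name two_sum is mine.

from fractions import Fraction

def two_sum(a, b):
    # the algorithm from Question 1.18
    if abs(a) < abs(b):
        a, b = b, a
    s1 = a + b
    s2 = (a - s1) + b
    return s1, s2

a, b = 0.1, 0.2
s1, s2 = two_sum(a, b)
# exact check with rationals: s1 + s2 must equal a + b exactly as real numbers
print(Fraction(s1) + Fraction(s2) == Fraction(a) + Fraction(b))   # True
print(s2)   # the roundoff error committed when rounding a + b to get s1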

2 SOLUTIONS FOR CHAPTER II: LINEAR EQUATION SOLVING


Question 2.1. Programming question. Skipped.

Question 2.2. Consider solving AX = B for X , where A is n-by-n, and X and B are n-by-m.
There are two obvious algorithms. The first algorithm factorizes A = P LU using Gaussian elim-
ination and then solves for each column of X by forward and back substitution. The second al-
gorithm computes A −1 using Gaussian elimination and then multiplies X = A −1 B . Count the
number of flops required by each algorithm, and show that the first one requires fewer flops.

Solution. For the first algorithm, since permutation does not increase flops, we can suppose
matrix A is already permuted. Then, denote
 
A = (a_ij), i, j = 1, ..., n.

To determine the first column of L requires n − 1 divisions. To update the rest of the matrix costs n − 1 multiplications and n − 1 subtractions for each of the remaining n − 1 rows, i.e. 2(n − 1)^2 flops. Therefore, Gaussian elimination requires

Σ_{i=1}^{n} (2(i − 1)^2 + (i − 1)) = (2/3)n^3 + O(n^2)

flops in total. A single forward substitution requires

Σ_{i=1}^{n} (2i − 1) = n^2 + O(n)

flops, and the same holds for back substitution. Thus the first method costs one LU decomposition plus m forward and back substitutions, in total

(2/3)n^3 + 2mn^2 + O(n^2) + O(mn)

flops.
For the second algorithm, using Σ_{i=1}^{n} (2(i − 1)i + 2(n − i + 1)(i − 1)) = n^3 + O(n^2) flops, we reduce the augmented system

[ A | I ]

to the form

[ U | M ],

where U is upper triangular and M is unit lower triangular (the elimination applied to the identity on the right). Costing another Σ_{i=1}^{n} ((i − 1) + 2n(i − 1)) = n^3 + O(n^2) flops, we finish computing the inverse of A. As the final matrix multiplication X = A^{-1}B also costs nm(2n − 1) flops, in total the second algorithm costs

2n^3 + 2mn^2 + O(n^2) + O(mn)

flops. Therefore, when n and m are large enough, the first algorithm costs fewer flops.
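Not part of the original solution: a small SciPy sketch of the two algorithms. The flop comparison above is the real point; this only shows how the first method is organized around a single factorization and that both produce the same X up to roundoff. lu_factor/lu_solve are standard SciPy routines; the rest of the names are mine.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n, m = 300, 50
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))

# Algorithm 1: factor A = PLU once, then two triangular solves per column of B
lu, piv = lu_factor(A)
X1 = lu_solve((lu, piv), B)

# Algorithm 2: form A^{-1} explicitly, then multiply
X2 = np.linalg.inv(A) @ B

print(np.linalg.norm(X1 - X2) / np.linalg.norm(X1))   # small: same answer, more work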

Question 2.3. Let k · k be the two-norm. Given a nonsingular matrix A and a vector b, show
that for sufficiently small kδAk, there are nonzero δA and δb such that inequality

kδxk ≤ kA −1 k(kδAk · kx̂k + kδbk)

is an equality. This justifies calling κ(A) = kA −1 k · kAk the condition number of A.

Proof. To find the suitable δx and δA, let us have a glance at how the inequality comes. The
original equation is
δx = A −1 (−δA · x̂ + δb).
Using the inequality kB xk ≤ kB k·kxk twice, and kδA x̂ +δbk ≤ kδA x̂k+kδbk, we get the target
inequality. If we denote y as the vector which satisfy kA −1 k · kyk = kA −1 yk, set δA satisfy
k−δAk·kx̂k = k−δA x̂k, and δA x̂ proportional to δb. The three inequalities will both become
equalities. This suggests how we can construct δA and δb.
Assume the δA has the form δA = c ab T , where c is a positive constant and a, b are vectors to
be determined. According to Question 1.7., k − δAk = ckbk · kak. Then, as the δA must satisfy
k − δAk · kx̂k = k − δA x̂k, we get

ckbk · kak · kx̂k = ck − ab T x̂k = ckb T x̂k · kak ≤ ckbk · kak · kx̂k.

Therefore, for the equation to hold, b must be proportional to x̂, while a can be free of choice.
Then, for δA x̂ to be proportional to δb, we can set b = x̂ and a = d δb, where d is another
constant to be determined. As the final condistion requires

−δA · x̂ + δb = (cd x̂ T x̂ + 1)δb = y,

we can choose c, d such that cd x̂ T x̂ + 1 6= 0. Therefore, we have

δA = d δb x̂ T , δb = (cd x̂ T x̂ + 1)−1 y.

We can see that both δA and δb are determined by known value, and due to the conditions
we use, the target inequality becomes equality under our choice of δA and δb, whose norm
can be modified by the choice of c, d and kyk. 

Question 2.4. Show that bounds

kδxk ≤ ²k|A −1 (|A| · |x̂| + |b|)k,


kδxk
≤ ²k|A −1 | · |A|k.
kxk

are attainable.

Remark. I could only prove that the bounds can be attainable for certain A, whose inverse
satisfy the following condition:
¡ ¢ ¡ ¢
sgn(b i 1 ), sgn(b i 2 ), . . . , sgn(b i n ) = ± sgn(b j 1 ), sgn(b j 2 ), . . . , sgn(b j n ) , for any i , j.

Also, I believe that second inequality should be

kδxk
≤ ²k|A −1 | · |A|k.
kx̂k

Proof. For the first inequality, since it comes from

|δx| = |A −1 (−δA x̂ + δb)| ≤ |A −1 |(|δA| · |x̂| + |δb|) ≤ |A −1 |(²|A| · |x̂| + ²|b|) = ²(|A −1 |(|A| · |x̂| + |b|)),

it is inevitable that
|δA| = ²|A|, |δb| = ²|b|,
which means we could only choose the signs of entries of δA and δb. From inside, we have
|δA x̂| = |δA||x̂|. To make the inequality become equality, for each row, δA(i , j ) and x̂ j should
have the same sign. Therefore,
¡ ¢ ¡ ¢
sgn(δA(i , 1)), sgn(δA(i , 2)), . . . , sgn(δA(i , n)) = ± sgn(x̂ 1 ), sgn(x̂ 2 ), . . . , sgn(x̂ n ) , for any i ,

which reduces the degree of freedom of δA to n. Next, to make | − δA x̂ + δb| = | − δA x̂| + |δb|,
we have
n
X
sgn( −δA x̂) = sgn(δb),
i =1

which reduces the degree of freedom of δb to 0. For simplicity, denote z = −δA x̂ + δb. Use
the last n degree of freedom of δA, we could determine the sign of each z i . According to the
condition of A −1 , set
sgn(z i ) = sgn(A −1 (i , j )), for any i , j.
We will get |A −1 z| = |A −1 ||z|. In all, we get

|δx| = ²|A −1 |(|A| · |x̂| + |b|).

Take norm on both sides, and the result follows. For the second inequality, the procedure is
the same. Since under the assumption of δb = 0, the only difference is to divide x̂ in both
sides. 
Question 2.5. Prove Theorem 2.3. Given the residual r = A x̂ − b, use Theorem 2.3 to show that
bound (2.9) is no larger than bound (2.7). This explains why LAPACK computes a bound based
on (2.9), as described in section 2.4.4.

Proof. To prove Theorem 2.3. Rearrange the equation (A + δA)x̂ = b + δb, yielding

|A x̂ − b| = |r | = |δb − δA x̂| ≤ |δb| + |δA x̂| ≤ |δb| + |δA| · |x̂| ≤ ²(|b| + |A| · |x̂|).

As the above inequality is componentwise, it follows


|r i |
² ≥ max .
i (|A| · |x̂| + |b|)i
Use the same technique describe in the previous question, it can be proved that for certain
δA and δb
|A x̂ − b| = |δb − δA x̂| = |δb| + |δA| · |x̂| ≤ ²(|b| + |A| · |x̂|),
|r i |
which implies that ² cannot be smaller than maxi (|A|·|x̂|+|b|)i . There proves the question. Next,
to prove k|A | · |r |k ≤ ²k|A |(|A| · |x̂| + |b|)k. Because all entries of |A −1 |, |r |, and |A| · |x̂| + |b|
−1 −1

are nonnegative, we just need to prove that |r i | ≤ ²(|A| · |x̂| + |b|)i . According to the smallest
value of ², the targeted inequality holds. Therefore, bound (2.9) is no larger than bound (2.7).

Question 2.6. Prove Lemma 2.2.

Proof. First, consider the single permuted permutation matrix. That is, the permutation
matrix with only one pair of rows permuted. It can be written as
 
1
 .. 

 . 


 0 ··· 1 


P = .. 
 , where i th, j th rows are permuted.
 . 
 

 1 ··· 0 

 .. 
 . 
1

According to the form of the single permutation matrix, it follows det(P ) = ±1. Give a matrix
¢T
X = α1 , α2 , . . . , αn = βT1 , βT2 , . . . , βTn . The matrix product P X and X P have the form
¡ ¢ ¡

¢T
P X = βT1 , βT2 , . . . , βTj , . . . , βTi , . . . , βTn ,
¡

X P = α1 , α2 , . . . , α j , . . . , αi , . . . , αn .
¡ ¢

Mark this as property-one. Then, to transform P into identity matrix I , it just need to per-
mute the permuated i th and j th rows again. Consequently, P −1 = P for single permuted
permutation matrix.
For any permutation matrix P̂ , decompose it into a series of single permutation matrices as
P i , i = 1, 2, . . . , m. i.e.,
P̂ = P m P m−1 · · · P 1 .
According to what we have proved for the single permutation matrix, it is only necessary to
verify the property about P −1 . Since P̂ T = P 1 P 2 · · · P m , we have

P̂ P̂ T = P̂ T P̂ = I .

Thus, the question is proved. 


Question 2.7. If A is a nonsingular symmetric matrix and has the facotization A = LD M T ,
where L and M are unit lower triangluar matrices and D is a diagonal matrix, show that L =
M.

Proof. To prove the question, partition L by its columns and M by its rows. We have

L = α1 , α2 , . . . , αn ,
¡ ¢
¢T
M = βT1 , βT2 , . . . , βTn .
¡

Then, we can decompose A as


n
A = LD M T = D i i αi βTi .
X
i =1

Recall that both L and M is unit lower triangular matrices. Therefore, D i i αi βTi only has
nonzero entries in i : n submatrix, and αi (i ) = βi (i ) = 1. i.e., D i i αi βTi should have form
as  
0
 . 
 . 
¢  . 
0 . . . 0 D i i αi ∗ = 
¡
 0 .

D i i βi 
 


As A is nonsingular symmetric, we know D i i 6= 0 and D 11 α1 = D 11 β1 , which means α1 = β1 .
Then, denote A 1 = A − D 11 α1 β1 . Apply the same technique to A(2 : n, 2 : n), which is also
nonsingular and symmetric. As D i i αi βTi only has nonzero entries in i : n submatrix. We can
prove that α2 = β2 . In this analogy, it proves L = M . 

Question 2.8. Consider the following two ways of solving a 2-by-2 linear system of equations:

Ax = [a_11, a_12; a_21, a_22] · [x_1; x_2] = [b_1; b_2] = b.

1. Algorithm 1. Gaussian elimination with partial pivoting (GEPP).

2. Algorithm 2. Cramer's rule:

det = a_11 * a_22 − a_12 * a_21,
x_1 = (a_22 * b_1 − a_12 * b_2) / det,
x_2 = (−a_21 * b_1 + a_11 * b_2) / det.

Show by means of a numerical example that Cramer's rule is not backward stable.

Proof. To decide whether these two algorithms are backward stable, we check whether

‖δx‖/‖x̂‖ = O(ε).

Consider the following example:

[ 1.112  1.999 ] x = [ 2.003 ]
[ 3.398  6.123 ]     [ 6.122 ].

The accurate answer is x^T = (1.63788, 0.0908866). Under four-decimal-digit floating point arithmetic, the first algorithm (GEPP) produces the factored system

[ 1       0 ] [ 3.398   6.123 ] x = [ 6.122 ]
[ 0.3273  1 ] [ 0      −0.005 ]     [ 2.003 ].

Solving it, we get

x = (1.441, 0.2)^T.

For the second algorithm (Cramer's rule), we get

fl(det) = fl(fl(1.112 * 6.123) − fl(1.999 * 3.398)) = fl(6.809 − 6.793) = 0.016,
fl(x_1) = fl(fl(fl(6.123 * 2.003) − fl(1.999 * 6.122)) / det) = 1.875,
fl(x_2) = fl(fl(fl(1.112 * 6.122) − fl(3.398 * 2.003)) / det) = 0.125.

Thus

δx_1 = (0.19688, −0.1091134)^T,   δx_2 = (−0.23712, −0.0341134)^T.

It follows that ‖δx_2‖_1/‖x‖_1 ≈ 0.156894 ≈ 156 ε = O(1), which suggests that the second algorithm is not backward stable.
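Not from the original: a NumPy experiment in double precision in the same spirit. For ill-conditioned 2-by-2 systems the normwise relative residual of GEPP stays at roundoff level (backward stability), while Cramer's rule typically leaves a residual many orders of magnitude larger; the helper names are mine.

import numpy as np

rng = np.random.default_rng(0)

def cramer(A, b):
    # Algorithm 2 from the question
    det = A[0, 0]*A[1, 1] - A[0, 1]*A[1, 0]
    x1 = ( A[1, 1]*b[0] - A[0, 1]*b[1]) / det
    x2 = (-A[1, 0]*b[0] + A[0, 0]*b[1]) / det
    return np.array([x1, x2])

def rel_residual(A, x, b):
    return np.linalg.norm(A @ x - b) / (np.linalg.norm(A) * np.linalg.norm(x))

worst = [0.0, 0.0]
for _ in range(1000):
    U, _ = np.linalg.qr(rng.standard_normal((2, 2)))
    V, _ = np.linalg.qr(rng.standard_normal((2, 2)))
    A = U @ np.diag([1.0, 1e-10]) @ V          # condition number about 1e10
    b = rng.standard_normal(2)
    worst[0] = max(worst[0], rel_residual(A, np.linalg.solve(A, b), b))   # GEPP
    worst[1] = max(worst[1], rel_residual(A, cramer(A, b), b))            # Cramer
print(worst)   # Cramer's worst relative residual is typically far above machine epsilon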

Question 2.9. Let B be an n-by-n upper bidiagonal matrix, i.e., nonzero only on the main diagonal and first superdiagonal. Derive an algorithm for computing κ_∞(B) ≡ ‖B‖_∞ ‖B^{-1}‖_∞ exactly (ignoring roundoff). In other words, you should not use an iterative algorithm such as Hager's estimator. Your algorithm should be as cheap as possible; it should be possible to do using no more than 2n − 2 additions, n multiplications, n divisions, 4n − 2 absolute values, and 2n − 2 comparisons.

Proof. For a bidiagonal matrix to be invertible, all the diagonal entries of B must be nonzero. B has entries b_{i,i} on the diagonal and b_{i,i+1} on the first superdiagonal, and zeros elsewhere; its inverse is upper triangular with entries

(B^{-1})_{i,j} = (−1)^{j−i} (Π_{k=i}^{j−1} b_{k,k+1}) / (Π_{k=i}^{j} b_{k,k})  for j ≥ i,

so in particular (B^{-1})_{i,i} = 1/b_{i,i} and (B^{-1})_{i,i+1} = −b_{i,i+1}/(b_{i,i} b_{i+1,i+1}).

‖B‖_∞ is simply the maximum of the absolute row sums |b_{i,i}| + |b_{i,i+1}| (with the last row sum just |b_{n,n}|). For ‖B^{-1}‖_∞, the absolute row sum s_i of the i-th row of B^{-1} can be computed from that of the (i + 1)-th row by the recursion

s_i = s_{i+1} |b_{i,i+1}/b_{i,i}| + |1/b_{i,i}|,   s_n = |1/b_{n,n}|.

Assume the 2n − 1 absolute values have already been taken, so that B(i, j) below denotes |b_{i,j}|. The condition number can then be computed as follows (MAX1 accumulates ‖B‖_∞, MAX2 accumulates ‖B^{-1}‖_∞, sweeping the rows from the bottom up):
1: MAX1 = B(n, n), MAX2 = 1/B(n, n), LASTSUM = MAX2
2: for i = n − 1 downto 1 do
3:   TEMP = B(i, i) + B(i, i + 1)
4:   if TEMP > MAX1 then
5:     MAX1 = TEMP
6:   end if
7:   LASTSUM = (LASTSUM * B(i, i + 1) + 1)/B(i, i)
8:   if LASTSUM > MAX2 then
9:     MAX2 = LASTSUM
10:  end if
11: end for
In total this uses 2n − 2 additions, n − 1 multiplications (n counting the final product MAX1 · MAX2 = κ_∞(B)), n divisions, 2n − 1 absolute values and 2n − 2 comparisons, which meets the requirement.
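Not part of the original: a Python sketch of the O(n) algorithm just described, checked against an explicit inverse on a small random example; the function name and the (d, e) input convention for the diagonal and superdiagonal are mine.

import numpy as np

def bidiag_cond_inf(d, e):
    # d: diagonal entries b[i,i] (length n); e: superdiagonal entries b[i,i+1] (length n-1)
    d = np.abs(d); e = np.abs(e)
    n = len(d)
    max1 = d[-1]            # running max absolute row sum of B
    last = 1.0 / d[-1]      # s_n
    max2 = last             # running max absolute row sum of B^{-1}
    for i in range(n - 2, -1, -1):
        max1 = max(max1, d[i] + e[i])
        last = (last * e[i] + 1.0) / d[i]
        max2 = max(max2, last)
    return max1 * max2

rng = np.random.default_rng(0)
n = 6
d = rng.uniform(1, 2, n); e = rng.uniform(-1, 1, n - 1)
B = np.diag(d) + np.diag(e, 1)
ref = np.linalg.norm(B, np.inf) * np.linalg.norm(np.linalg.inv(B), np.inf)
print(bidiag_cond_inf(d, e), ref)   # the two values agree up to roundoff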

Question 2.10. Let A be n-by-m with n ≥ m. Show that kA T Ak2 = kAk22 and κ2 (A T A) = κ2 (A)2 .
Let M be n-by-n and positive definite and L be its Cholesky factor so that M = LL T . Show that
kM k2 = kLk22 and κ2 (M ) = κ2 (L)2 .

Proof. As A may be rectangular, denote A = U ΣV T as the SVD decomposition of A. Since

kA T Ak2 = kV Σ2V T k2 = kΣ2 k2 = kΣk22 = kU ΣV T k22 = kAk22 ,

kA T Ak2 = kAk22 is proved. For equality κ2 (A T A) = κ2 (A)2 , it also follows from A T A = V Σ2V T ,
which implies σmin (A T A) = σmin (A)2 . Since the Cholesky factor matrix L is square and in-
vertible, apply the above result and we prove the question. 
Question 2.11. Let A be symmetic and positive definite. Show that |a i j | ≤ (a i i a j j )1/2 .

Proof. Since A is s.p.d., all its diagonal entries are positive. Consider
p p
x i j = {0, 0, . . . , a j j , . . . , − a i i , . . . , 0},
p p
y i j = {0, 0, . . . , a j j , . . . , a i i , . . . , 0},

yielding

x Tij Ax i j = a j j a i i −
p p p
ai i a j j ai j − a i i a j j a i j + a i i a j j = 2a i i a j j − 2 a i i a j j a i j ≥ 0,
y Tij Ay i j = a j j a i i +
p p p
ai i a j j ai j + a i i a j j a i j + a i i a j j = 2a i i a j j + 2 a i i a j j a i j ≥ 0,

Combine them, and it follows


(a i i a j j )1/2 ≥ |a i j |.

Question 2.12. Show that if
µ ¶
I Z
Y = ,
0 I
where I is an n-by-n identity matrix, then κF (Y ) = kY kF kY −1 kF = (2n + kZ k2F ).

Proof. Consider the following matrix


µ ¶
I −Z
X= ,
0 I

which satisfies X Y = Y X = I . Thus,


q q
κF (Y ) = kY kF kY −1 kF = (2n + kZ k2F ) (2n + kZ k2F ) = (2n + kZ k2F ).


Question 2.13. In this question we will ask how to solve B y = c given a fast way to solve Ax = b,
where A − B is "small" in some sense.

1. Prove the Sherman-Morrison formula: Let A be nonsingular, u and v be column vectors,


and A + uv T be nonsingular. Then (A + uv T )−1 = A −1 − (A −1 uv T A −1 )/(1 + v T A −1 u).
More generally, prove the Sherman-Morrison-Woodbury formula: Let U and V be n-by-
k rectangular matrices, where k ≤ n and A is n-by-n. Then T = I + V T A −1U is non-
singular if and only if A + UV T is nonsingular, in which case (A + UV T )−1 = A −1 −
A −1U T −1V T A −1 .

2. If you have a fast algorithm to solve Ax = b, show how to build a fast solver for B y = c,
where B = A + uv T .

3. Suppose that kA − B k is "small" and you have a fast algorithm for solving Ax = b. De-
scribe an iterative scheme for solving B y = c. How fast do you expect your algorithm to
converge?
Proof.

1. First we prove that det(A + uv^T) = det(A) det(1 + v^T A^{-1}u). Consider the block matrix M = [A, u; −v^T, 1]. On one hand,

M = [A, 0; −v^T, 1] · [I, A^{-1}u; 0, 1 + v^T A^{-1}u],

so det(M) = det(A)(1 + v^T A^{-1}u); on the other hand,

M = [I, u; 0, 1] · [A + uv^T, 0; −v^T, 1],

so det(M) = det(A + uv^T). Taking determinants of both factorizations, the result follows. Then we can verify the Sherman–Morrison formula directly:

(A + uv^T)(A^{-1} − (A^{-1}uv^T A^{-1})/(1 + v^T A^{-1}u))
= I + uv^T A^{-1} − (uv^T A^{-1} + uv^T A^{-1}uv^T A^{-1})/(1 + v^T A^{-1}u)
= I + uv^T A^{-1} − (uv^T A^{-1})(1 + v^T A^{-1}u)/(1 + v^T A^{-1}u)
= I,

(A^{-1} − (A^{-1}uv^T A^{-1})/(1 + v^T A^{-1}u))(A + uv^T)
= I + A^{-1}uv^T − (A^{-1}uv^T + A^{-1}uv^T A^{-1}uv^T)/(1 + v^T A^{-1}u)
= I + A^{-1}uv^T − (A^{-1}uv^T)(1 + v^T A^{-1}u)/(1 + v^T A^{-1}u)
= I.

This proves the first half of part 1.
For the Sherman–Morrison–Woodbury formula, we first prove that det(A + UV^T) = det(A) det(I + V^T A^{-1}U). Exactly as above, the block matrix M = [A, U; −V^T, I] factors as

M = [A, 0; −V^T, I] · [I, A^{-1}U; 0, I + V^T A^{-1}U] = [I, U; 0, I] · [A + UV^T, 0; −V^T, I].

Taking determinants gives det(A + UV^T) = det(A) det(I + V^T A^{-1}U). Since A is nonsingular, this shows that T = I + V^T A^{-1}U is nonsingular if and only if A + UV^T is. Then we verify the formula:

(A + UV^T)(A^{-1} − A^{-1}U T^{-1}V^T A^{-1}) = I − U T^{-1}V^T A^{-1} + UV^T A^{-1} − UV^T A^{-1}U T^{-1}V^T A^{-1}
= I + UV^T A^{-1} − (U + UV^T A^{-1}U) T^{-1}V^T A^{-1}
= I + UV^T A^{-1} − U(I + V^T A^{-1}U) T^{-1}V^T A^{-1}
= I,

(A^{-1} − A^{-1}U T^{-1}V^T A^{-1})(A + UV^T) = I − A^{-1}U T^{-1}V^T + A^{-1}UV^T − A^{-1}U T^{-1}V^T A^{-1}UV^T
= I + A^{-1}UV^T − A^{-1}U T^{-1}(V^T + V^T A^{-1}UV^T)
= I + A^{-1}UV^T − A^{-1}U T^{-1}(I + V^T A^{-1}U)V^T
= I.

In all, this proves part 1.

2. Denote x 1 , x 2 , which satisfy Ax 1 = u, Ax 2 = c. According to what we have prove, to
solve B y = c, we have

y = B −1 c = (A −1 − (A −1 uv T A −1 )/(1 + v T A −1 u))c
= x 2 − ((v T x 2 )x 1 )/(1 + v T x 1 )

Therefore, using the above formula, the solution can be computed from two solutions
of Ax = b in addition to inner product, all of which are fast.

3. For the last question, since we have

B y = (A + B − A)y = Ay − (A − B )y = c,

we get an equation as
y = A −1 c + A −1 (A − B )y.
What we need to do is to find its root, thus we can use the fixed point iteration as

y i +1 = A −1 c + A −1 (A − B )y n ,

or in pseudocode as:
1: while kr k > t ol do
2: solve Ax = r
3: y = y +x
4: r = c −By
5: end while

For the fixed point iteration to converge, the norm of the coefficient matrix A −1 (A − B )
must smaller than one. Or, we can also get the same result from the iterative refinement
prespective as:

kr i +1 k = kr i − B δyk = k(A − B )δyk ≤ kA − B k · kδyk ≤ kA − B k · kA −1 k · kr i k.

Therefore, if kA − B k · kA −1 k < 1, the algorithm will converge linearly.


Question 2.14. Programming question. Skipped.

Question 2.15. Programming question. Skipped.

Question 2.16. Show how to reorganize the Cholesky algorithm (Algorithm 2.11.) to do most
of its operations using Level 3 BLAS. Mimic Algorithm 2.10.

Solution. As Cholesky factorization does not need pivoting, we first modify Algorithm 2.11 to handle tall rectangular panels. In detail, for an m-by-b panel, use the original Algorithm 2.11 on the upper b-by-b submatrix, and use Level 2 BLAS to update the lower (m − b)-by-b part:
1: for i = 1 to b do
2:   A(i : m, i) = A(i : m, i) / sqrt(A(i, i))
3:   for j = i + 1 to b do
4:     A(j : b, j) = A(j : b, j) − A(j, i) A(j : b, i)
5:   end for
6:   A(b + 1 : m, i + 1 : b) = A(b + 1 : m, i + 1 : b) − A(b + 1 : m, i) · A(i + 1 : b, i)^T
7: end for
Then, by delaying the update of the trailing submatrix by b steps, we get a Level 3 BLAS implementation of Cholesky. In detail, write

A = [A_11, A_12; A_21, A_22] = [L_11, 0; L_21, I] [I, 0; 0, Ā] [L_11^T, L_21^T; 0, I] = [L_11 L_11^T, L_11 L_21^T; L_21 L_11^T, Ā + L_21 L_21^T].

Thus Ā = A_22 − L_21 L_21^T, a symmetric rank-b update, which is Level 3 BLAS. (A sketch in code is given after this solution.)

1: for i = 1 to n − 1 step b do
2:   use the panel algorithm above to factorize A(i : n, i : i + b − 1) = [L_11; L_21]
3:   A(i + b : n, i + b : n) = A(i + b : n, i + b : n) − L_21 L_21^T
4: end for
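Not part of the original solution: a NumPy sketch of the blocked (Level 3 BLAS style) Cholesky described above, with each panel factored column by column and the trailing submatrix updated by a delayed rank-b correction; function and variable names are mine.

import numpy as np

def blocked_cholesky(A, b=64):
    # Returns lower-triangular L with A = L L^T, for symmetric positive definite A.
    A = A.copy()
    n = A.shape[0]
    for i in range(0, n, b):
        k = min(b, n - i)
        # factor the diagonal block and the panel below it (Level 2 work)
        for j in range(i, i + k):
            A[j, j] = np.sqrt(A[j, j] - A[j, i:j] @ A[j, i:j])
            A[j+1:, j] = (A[j+1:, j] - A[j+1:, i:j] @ A[j, i:j]) / A[j, j]
        # delayed rank-k update of the trailing submatrix (Level 3 work)
        L21 = A[i+k:, i:i+k]
        A[i+k:, i+k:] -= L21 @ L21.T
    return np.tril(A)

rng = np.random.default_rng(0)
M = rng.standard_normal((300, 300)); M = M @ M.T + 300 * np.eye(300)
L = blocked_cholesky(M)
print(np.linalg.norm(L @ L.T - M) / np.linalg.norm(M))   # small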


Question 2.17. Suppose that, in Matlab, you have an n-by-n matrix A and an n-by-1 matrix
b. What do A\b, b’/A, and A/b mean in Matlab? How does A\b differ from inv(A)*b? 4

Solution. A\b solves the linear system of equations Ax = b;
b’/A solves x^T A = b^T, i.e., x^T = b^T A^{-1};
A/b attempts to solve X b = A; since b has one column while A has n columns, the dimensions do not agree (Matlab requires the two operands of / to have the same number of columns), so for n > 1 this is an error.
Although both A\b and inv(A)*b produce the solution of the linear system Ax = b, solving Ax = b does not require computing the inverse of A: A\b is based on a matrix factorization such as LU (or Cholesky, LL^T, for symmetric positive definite A), which is cheaper and more accurate than forming inv(A) explicitly.

Question 2.18. Let · ¸


A 11 A 12
A=
A 21 A 22

where A 11 is k-by-k and nonsingular. Then S = A 22 − A 21 A −1


11 A 12 is called the Schur comple-
ment of A 11 in A, or just Schur complement for short.

1. Show that after k steps of Gaussian elimination without pivoting, A 22 has been overwrit-
ten by S.

2. Suppose A = A T , A 11 is positive definite, and A 22 is negative definite. Show that A is


nonsingular, that Gaussian elimination without pivoting will work in exact arithmetic,
but that Gaussian elimination without pivoting may be numerically unstable.

Proof.
4 I don’t use Matlab much, so this answer may need to be amended.

1. As we know, one step of Gaussian elimination overwrite the n − 1 lower submatrix by
Schur complement. If it holds for (k − 1)th step of Gaussian elimination, it suffices to
prove it is also true for the k steps of Gaussian elimination. Denote
A 11 v k−1 A 12
 

A = a Tk−1 a kk b Tn−k  , where A 11 is (k − 1)-by-(k − 1).


A 21 u n−k A 22
As kth step of Gaussian elimination equals to (k − 1)th step of Gaussian elimination
followed by one more Gaussian elimination, and the first k − 1 steps will update A(k :
n, k : n) as
T
a kk b Tn−k
¸ · T ¸
a kk − a Tk−1 A −1 T −1
· · ¸
a 11 v k−1 b n−k − a k−1 A 11 A 12 ,
− k−1 A −1
£ ¤
11 v k−1 A 12 =
u n−k A 22 A 21 u n−k − A 21 A −111 v k−1 A 22 − A 21 A −1
11 A 12

the next step will update A 22 as


T T T
A 022 = A 22 −A 21 A −1 −1 −1 −1
11 A 12 −(u n−k −A 21 A 11 v k−1 )(b n−k −a k−1 A 11 A 12 )/(a kk −a k−1 A 11 v k−1 ).
µ ¶
A 11 v k−1
On the other hand, since the inverse of Ā = T is
a k−1 a kk

(a kk − a Tk−1 A −1 v k−1 )A −1 + A −1 v k−1 a Tk−1 A −1 −A −1


· ¸
−1 1 11 11 11 11 11 v k−1
Ā = ,
a kk − a Tk−1 A −1 −a Tk−1 A −1 1
11 v k−1 11

the corresponding kth step Schur complement is


· ¸
¤ 0 A 12 T
= A 22 − A 21 A −1 −1 −1
£
A 22 − A 21 u n−k A 11 A 12 − (A 21 A 11 v k−1 a k−1 A 11 A 12 −
b Tn−k
u n−k a Tk−1 A −1 A 12 − A 21 A −1 T T T −1 0
11 v k−1 b n−k + u n−k b n−k )/(a kk − a k−1 A 11 v k−1 ) = A 22 .

Therefore, it also holds for kth step of Gaussian elimination.

2. According to the condition, A 11 , A 22 is invertible. Thus, to show that A is nonsingular,


consider its determinant as
¯ ¯ ¯ ¯
¯ A 11 A 12 ¯ ¯ A 11 A 12 ¯ = det(A 11 ) det(A 22 − A 21 A −1 A 12 ).
¯
det(A) = ¯¯ ¯ = ¯
11
A 21 A 22 ¯ ¯ 0 A 22 − A 21 A −1 A
11 12
¯

For any vector x ∈ Rn−k , we have

x T (A 22 − A 21 A −1 T T
11 A 12 )x = x A 22 x − (A 12 x) A 11 (A 12 x) ≥ 0.

The equality holds if and only if x = 0. So, A 22 − A 21 A −1 11 A 12 is s.p.d., which means


−1
det(A 22 − A 21 A 11 A 12 ) > 0. As a result, A is nonsingular.
According to Question 2.11., all the diagonal entries of A are nonzero. Consequently the
Gaussian elimination will work in exact arthimetic. To show it is numerical unstable,
consider the following counterexample:
1010 1010
· ¸ · ¸· ¸
1 1 1
A= = .
1010 −1 1010 1 −1 − 1020



Question 2.19. Matrix A is called strictly column diagonally dominant, or diagonally domi-
nant for short, if
n
X
|a i i | > |a j i |.
j =1, j 6=i

1. Show that A is nonsingular.

2. Show that Gaussian elimination with partial pivoting does not actually permute any
rows, i.e., that it is identical to Gaussian elimination without pivoting.

Proof.

1. According to Gershgorin’s theorem, the eigenvalues λ of A T satisfy


X
|λ − a i i | ≤ |a j i |.
j 6=i

Thus, we have
|a j i | ≤ λ ≤ a i i +
X X
ai i − |a j i |.
j 6=i j 6=i

When a i i > 0, we have λ > 0. Otherwise, a i i < 0 will result in λ < 0. Since A is strictly
column diagonally dominant, a i i 6= 0. Thus, we prove λ 6= 0, which means det(A) =
det(A T ) = ni=1 λ 6= 0. i.e., A is nonsingular.
Q

2. Since |a 11 | > |a j 1 |, it holds for the base case. If we could prove that after updating the
submatrix still has the diagonally dominant property, we would prove the question.
Denote
a i0 j = a i j − a 1 j ∗ a i 1 /a 11 .
Sum them up, we get
n n
|a i0 j | ≤
X X
(|a i j | + |a 1 j ∗ a i 1 /a 11 |)
i =2,i 6= j i =2,i 6= j
n
X
≤ |a i i | − |a 1i | − |a 1 j | + |a 1 j /a 11 | |a i 1 |
i =1,i 6= j

≤ |a i i | − |a 1i | ≤ |a i i | − |a 1i ∗ a i 1 /a 11 | ≤ |a i0 i |.

Therefore, the diagonally dominant property holds. By mathematic induction, we prove


the quesion.

Question 2.20. Given an n-by-n nonsingular matrix A, how do you efficiently solve the follow-
ing problems, using Gaussian elimination with partial pivoting?

(a) Solve the linear system A k x = b, where k is a positive integer.

(b) Compute α = c T A −1 b.

(c) Solve the matrix equation AX = B , where B is n-by-m.

You should (1) describe your algorithms, (2) present them in pseudocode, and (3) give the required flops.

Solution.

(a) Since A k x = b equals A k−1 x = A −1 b, set A −1 b = b 1 , and we have A k−1 x = b 1 , which


is the same kind of equation with smaller exponent. Thus, we have the algorithm

1: for i = k to 1 do
2: solve Ax = b
3: b=x
4: end for

In total, it requires (2/3)n^3 + 2kn^2 + O(n^2) flops.

(b) First solve Ax = b, then do the inner product. The pseudocode is shown below.
1: solve Ax = b
2: α = cT x
In total, it requires (2/3)n^3 + O(n^2) flops.

(c) Solve AX = B equals to solve m vectors.


1: for i = 1 to m do
2: solve Ax = b i
3: end for

In total, it requires (2/3)n^3 + 2mn^2 + O(n^2) flops.
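Not from the original: a short SciPy sketch of parts (a)–(c) above, reusing a single LU factorization of A for all subsequent solves; lu_factor/lu_solve are standard SciPy routines, the remaining names are mine.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n, m, k = 100, 5, 3
A = rng.standard_normal((n, n)); b = rng.standard_normal(n)
c = rng.standard_normal(n); B = rng.standard_normal((n, m))
lu, piv = lu_factor(A)            # one (2/3)n^3 factorization, reused below

# (a) solve A^k x = b by k repeated solves
x = b.copy()
for _ in range(k):
    x = lu_solve((lu, piv), x)
print(np.linalg.norm(np.linalg.matrix_power(A, k) @ x - b))

# (b) alpha = c^T A^{-1} b: one solve plus an inner product
alpha = c @ lu_solve((lu, piv), b)

# (c) AX = B: one pair of triangular solves per column of B
X = lu_solve((lu, piv), B)
print(np.linalg.norm(A @ X - B))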

Question 2.21. Prove that Strassen’s algorithm (Algorithm 2.8) correctly multiplies n-by-n ma-
trices, where n is a power of 2.

Proof. Since n is a power of 2, all the block partitions and block multiplications are well defined. To prove the result, it suffices to prove the correctness of the recursive step. Expanding the seven products of Strassen's algorithm, we have

P_1 = (A_12 − A_22)(B_21 + B_22) = A_12 B_21 + A_12 B_22 − A_22 B_21 − A_22 B_22,
P_2 = (A_11 + A_22)(B_11 + B_22) = A_11 B_11 + A_11 B_22 + A_22 B_11 + A_22 B_22,
P_3 = (A_11 − A_21)(B_11 + B_12) = A_11 B_11 + A_11 B_12 − A_21 B_11 − A_21 B_12,
P_4 = (A_11 + A_12) B_22 = A_11 B_22 + A_12 B_22,
P_5 = A_11 (B_12 − B_22) = A_11 B_12 − A_11 B_22,
P_6 = A_22 (B_21 − B_11) = A_22 B_21 − A_22 B_11,
P_7 = (A_21 + A_22) B_11 = A_21 B_11 + A_22 B_11,

and therefore

C_11 = P_1 + P_2 − P_4 + P_6 = A_12 B_21 + A_11 B_11,
C_12 = P_4 + P_5 = A_12 B_22 + A_11 B_12,
C_21 = P_6 + P_7 = A_22 B_21 + A_21 B_11,
C_22 = P_2 − P_3 + P_5 − P_7 = A_22 B_22 + A_21 B_12.

Therefore,

AB = [A_11 B_11 + A_12 B_21, A_11 B_12 + A_12 B_22; A_21 B_11 + A_22 B_21, A_21 B_12 + A_22 B_22] = [C_11, C_12; C_21, C_22],

so the recursive step yields the correct answer, and the base case of Strassen's algorithm is just scalar arithmetic. Hence Strassen's algorithm correctly multiplies n-by-n matrices, where n is a power of 2.
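Not part of the original proof: a small NumPy sketch of the recursion just verified, using the seven products P_1, ..., P_7 and the combinations above; it assumes n is a power of 2 and falls back to scalar multiplication at the base case.

import numpy as np

def strassen(A, B):
    n = A.shape[0]
    if n == 1:
        return A * B                       # base case: scalar multiplication
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    P1 = strassen(A12 - A22, B21 + B22)
    P2 = strassen(A11 + A22, B11 + B22)
    P3 = strassen(A11 - A21, B11 + B12)
    P4 = strassen(A11 + A12, B22)
    P5 = strassen(A11, B12 - B22)
    P6 = strassen(A22, B21 - B11)
    P7 = strassen(A21 + A22, B11)
    C11 = P1 + P2 - P4 + P6
    C12 = P4 + P5
    C21 = P6 + P7
    C22 = P2 - P3 + P5 - P7
    return np.block([[C11, C12], [C21, C22]])

rng = np.random.default_rng(0)
A = rng.standard_normal((16, 16)); B = rng.standard_normal((16, 16))
print(np.linalg.norm(strassen(A, B) - A @ B))   # roundoff-level difference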

3 SOLUTIONS FOR CHAPTER III: LINEAR LEAST SQUARES PROBLEMS


Question 3.1. Show that the two variations of Algorithm 3.1, CGS and MGS, are mathematically equivalent by showing that the two formulas for r_ij yield the same results in exact arithmetic.

Proof. The difference between CGS and MGS is the way the coefficients r_ij are calculated. CGS keeps using the original vectors a_i, while MGS uses the updated vectors q_i, which after j steps of updates become

q_i = a_i − Σ_{k=1}^{j} r_ki q_k.

Thus, since the already computed q_1, ..., q_j are mutually orthogonal, in exact arithmetic

r_{(j+1)i} = q_{j+1}^T q_i = q_{j+1}^T (a_i − Σ_{k=1}^{j} r_ki q_k) = q_{j+1}^T a_i,

which is exactly the CGS formula. So, by induction, CGS and MGS are mathematically equivalent.
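Not from the original: a NumPy sketch of the two variants, confirming that the computed R factors agree (they are mathematically equivalent) while, on a nearly rank-deficient matrix, MGS typically keeps Q much closer to orthogonal than CGS; the function qr_gs and its modified flag are mine.

import numpy as np

def qr_gs(A, modified=True):
    # Gram-Schmidt QR: modified=False gives CGS, modified=True gives MGS.
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.zeros((m, n)); R = np.zeros((n, n))
    for i in range(n):
        v = A[:, i].copy()
        for j in range(i):
            R[j, i] = Q[:, j] @ (v if modified else A[:, i])
            v -= R[j, i] * Q[:, j]
        R[i, i] = np.linalg.norm(v)
        Q[:, i] = v / R[i, i]
    return Q, R

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
Qc, Rc = qr_gs(A, modified=False)
Qm, Rm = qr_gs(A, modified=True)
print(np.max(np.abs(Rc - Rm)))                 # tiny: the two formulas agree
# nearly dependent columns: MGS usually loses far less orthogonality than CGS
A = np.hstack([A[:, :1], A[:, :1] + 1e-9 * rng.standard_normal((8, 4))])
for mod in (False, True):
    Q, _ = qr_gs(A, modified=mod)
    print(np.linalg.norm(Q.T @ Q - np.eye(5)))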

Question 3.2. Programming question. Skipped.

Question 3.3. Let A be m-by-n, m ≥ n, and have full rank.


· ¸ · ¸ · ¸
I A r b
1. Show that T · = has a solution where x minimizes kAx − bk2 .
A 0 x 0

2. What is the conition number of the coefficient matrix in terms of the singular values of
A?

3. Give an explicit expression for the inverse of the coefficient matrix, as a block 2-by-2 ma-
trix.

4. Show how to use the QR decomposition of A to implement an iterative refinement algo-


rithm to improve the accuracy of x.

Solution.

1. We can decompose the equation [I, A; A^T, 0] · [r; x] = [b; 0] into the following two equations:

r + Ax = b,
A^T r = 0.

Thus, if x is part of such a solution, multiplying the first equation by A^T and using the second gives

A^T r + A^T A x = A^T A x = A^T b,

which means x satisfies the normal equations. Therefore, x minimizes ‖Ax − b‖_2.

2. Since A has full column rank, let U T AV = Σ the full SVD of A, where U ,V are m-by-
m, n-by-n orthgonal matirces and Σ is m-by-n diagonal matrix with nonzero entries.
Applying orthgonal similarity transformation onto the coefficient matrix, we have
· T
Σ
¸· ¸· ¸ · ¸
U Im A U I
= .
V T AT V ΣT

Since Σ has full colum rank, we can compute the eigenvalues of the above matrix by
applying elementary transformations with unit determinant as

(1 − λ)I m Σ (1 − λ)I m Σ I m λΣ(ΣT Σ)−1


· ¸ · ¸ · ¸
det = det det
ΣT −λI n ΣT −λI n In
(1 − λ)I m λ(1 − λ)Σ(ΣT Σ)−1 + Σ
· ¸ · ¸
Im
= det det
−ΣT I n ΣT
I m Σ(ΣT Σ)−1 (1 − λ)I m λ(1 − λ)Σ(ΣT Σ)−1 + Σ
· ¸ · ¸
= det det
In λΣT λ(λ − 1)I n − ΣT Σ
(1 − λ)I m + λΣ(ΣT Σ)−1 ΣT
· ¸
= det
λΣT λ(λ − 1)I n − ΣT Σ
n
= (1 − λ)m−n (λ2 − λ − σ2i ).
Y
i =1
q q
Thus, the eigenvalues of the coefficient matrix are 1 and 12 (1± 1 + 4σ2i ). As 1 + 4σ2i >
q
1, we have 1 + 1 + 4σ2i > 2. Consequently, the condition number is
 µ q ¶ n ³q ´ o
1
1 + 1 + 4σ2max / min 12 1 + 4σ2min − 1 , 1 , when m > n;


 2
|λmax /λmin | = µ q ¶ ³q ´
 1 + 1 + 4σ2max / 1 + 4σ2min − 1 , when m = n;

3. To find its inverse, we use invertible matrices to transform the coefficient matrix into
indentity matrix as follows

I m A(A T A)−1
· ¸· ¸· ¸· ¸ · ¸
Im Im Im A Im
= .
−(A T A)−1 In −A T I n A T In

Thus, we have the candidate matrix as
I m A(A T A)−1 I m − A(A T A)−1 A T A(A T A)−1
· ¸· ¸· ¸ · ¸
Im Im
= = B.
−(A T A)−1 In −A T I n (A T A)−1 A T −(A T A)−1
· ¸ · ¸
I A I A
It is easy to check that B T = T B = I m+n . Consequently, B is the inverse
A 0 A 0
of coefficient matrix.

4. Let A = QR be the QR factorization of A, where Q is m-by-m orthogonal and R is m-by-n upper triangular with positive diagonal entries. Since

[I, A; A^T, 0] = [I, QR; R^T Q^T, 0] = [Q, 0; 0, I] [I, R; R^T, 0] [Q^T, 0; 0, I],

the main work in solving the original linear equations lies in solving

[I, R; R^T, 0] [x_1; x_2] = [x_1 + R x_2; R^T x_1] = [b_1; b_2],

which can be solved easily using the triangular structure of R. Thus, we can apply iterative refinement to this equation to improve the answer; a small sketch is given below.
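The following NumPy sketch (illustrative only; the names refine_ls and solve_aug are made up) solves the augmented system with one "complete" QR factorization of A and then performs a few steps of iterative refinement on it, along the lines of part 4.

    import numpy as np

    def refine_ls(A, b, iters=2):
        """Solve min ||Ax - b||_2 via the augmented system [I, A; A^T, 0][r; x] = [b; 0],
        reusing one QR factorization of A, with a few refinement steps (a sketch)."""
        m, n = A.shape
        Q, R = np.linalg.qr(A, mode='complete')    # A = Q [R1; 0], Q is m-by-m
        R1 = R[:n, :]

        def solve_aug(f, g):
            # Solve dr + A dx = f, A^T dr = g using the factorization above.
            w = Q.T @ f
            w1 = np.linalg.solve(R1.T, g)          # R1^T w1 = g
            dx = np.linalg.solve(R1, w[:n] - w1)
            dr = Q @ np.concatenate([w1, w[n:]])
            return dr, dx

        r, x = solve_aug(b, np.zeros(n))           # initial solve
        for _ in range(iters):
            f = b - r - A @ x                      # residuals of the augmented system
            g = -A.T @ r
            dr, dx = solve_aug(f, g)
            r, x = r + dr, x + dx
        return x, r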

Question 3.4. Weighted least squares: If some components of Ax − b are more important than others, we can weight them with a scale factor d_i and solve the weighted least squares problem min ||D(Ax − b)||_2 instead, where D has diagonal entries d_i. More generally, recall that if C is symmetric positive definite, then ||x||_C ≡ (x^T C x)^{1/2} is a norm, and we can consider minimizing ||Ax − b||_C. Derive the normal equations for this problem, as well as the formulation corresponding to the previous question.
Solution. To derive the normal equations, we look for the x at which the gradient of ||Ax − b||_C^2 = (Ax − b)^T C (Ax − b) vanishes, i.e., for every direction e,

0 = lim_{e→0} ( (A(x + e) − b)^T C (A(x + e) − b) − (Ax − b)^T C (Ax − b) ) / (e^T C e)
  = lim_{e→0} ( 2 e^T (A^T C A x − A^T C b) + e^T A^T C A e ) / (e^T C e)
  = lim_{e→0} 2 e^T (A^T C A x − A^T C b) / (e^T C e).

Thus, we get the normal equations

A^T C A x = A^T C b.

To derive a formulation like that of the previous question, consider the symmetric full-rank linear system

[I, C^{1/2} A; A^T C^{1/2}, 0] · [r; x] = [C^{1/2} b; 0].

Thus, if we take C^{1/2} A as Ã and C^{1/2} b as b̃, we have the same equation as before and the previous results also apply. 
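A small NumPy sketch of the weighted problem (the name is illustrative). It uses a Cholesky factor L of C, so that ||v||_C = ||L^T v||_2, in place of C^{1/2}; this is an assumption of convenience, not the only possible square root.

    import numpy as np

    def weighted_ls(A, b, C):
        """Minimize (Ax - b)^T C (Ax - b) for symmetric positive definite C (a sketch)."""
        L = np.linalg.cholesky(C)                  # C = L L^T
        x, *_ = np.linalg.lstsq(L.T @ A, L.T @ b, rcond=None)
        return x                                   # satisfies A^T C A x = A^T C b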

Question 3.5. Let A ∈ R^{n×n} be positive definite. Two vectors u_1 and u_2 are called A-orthogonal if u_1^T A u_2 = 0. If U ∈ R^{n×r} and U^T A U = I, then the columns of U are said to be A-orthogonal. Show that every subspace has an A-orthogonal basis.

Proof. Given any subspace, there exists a set of linearly independent vectors (α_1, α_2, ..., α_n) which span it. Then, we use the given vectors to create an A-orthonormal basis (β_1, β_2, ..., β_n) as follows:

1: for i = 1 to n do
2:     β_i = α_i − sum_{j=1}^{i−1} (β_j^T A α_i) β_j
3:     β_i = β_i / sqrt(β_i^T A β_i)
4: end for
Since the α_i span the whole space, the above process terminates at the nth step. It is easy to verify that the first step already gives A-orthogonality: since β_1^T A β_1 = 1 after normalization,

β_1^T A β_2 = β_1^T A α_2 − (β_1^T A α_2)(β_1^T A β_1) = 0

(the final normalization of β_2 does not affect this). Thus, by mathematical induction, for any i > j we have

β_j^T A β_i = β_j^T A α_i − (β_j^T A α_i)(β_j^T A β_j) = 0,

which implies (β_1, β_2, ..., β_n) is an A-orthogonal set. If some β_{i_1}, β_{i_2}, ..., β_{i_s} were linearly dependent, there would exist c_1, c_2, ..., c_s, not all zero, such that

c_1 β_{i_1} + c_2 β_{i_2} + ... + c_s β_{i_s} = 0.

Applying β_{i_k}^T A on both sides gives c_k = 0 for every k, a contradiction, so the β_i are linearly independent. As the α_i span the whole space, the vectors β_i also form a basis of the space and are mutually A-orthonormal. A small numerical sketch of this process is given below. 
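Here is a small NumPy sketch of the process above (illustrative; it assumes A is symmetric positive definite and the columns of V are linearly independent).

    import numpy as np

    def a_orthogonal_basis(A, V):
        """A-orthonormalize the columns of V by the Gram-Schmidt-like loop above."""
        B = np.array(V, dtype=float)
        for i in range(B.shape[1]):
            for j in range(i):
                B[:, i] -= (B[:, j] @ A @ B[:, i]) * B[:, j]   # remove A-projections
            B[:, i] /= np.sqrt(B[:, i] @ A @ B[:, i])          # A-normalize
        return B                                               # satisfies B^T A B = I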

Question 3.6. Let A have the form

A = [R; S],

where R is n-by-n and upper triangular, and S is m-by-n and dense. Describe an algorithm using Householder transformations for reducing A to upper triangular form. Your algorithm should not "fill in" the zeros in R and thus require fewer operations than would Algorithm 3.2 applied to A.
Solution. Stacking R(1, 1) on top of the first column of S, we get x_1 = (R(1,1), S(1,1), S(2,1), ..., S(m,1))^T. If ||x_1||_2 = 0, we are done with this column and move on to the submatrix A(2 : m + n, 2 : n). Otherwise, there is an (m + 1)-by-(m + 1) Householder transformation P which satisfies P x_1 = (||x_1||_2, 0, ..., 0)^T. Thus, we can construct a symmetric orthogonal matrix P̂, acting only on row 1 of R and the rows of S, as

P̂ = [P(1,1), 0, P(1, 2:m+1); 0, I_{n−1}, 0; P(2:m+1, 1), 0, P(2:m+1, 2:m+1)].

Applying P̂ to A, we have

P̂ A = A_1 = [||x_1||_2, *; 0, R(2:n, 2:n); 0, *].

Thus, we have reduced the problem to a smaller subproblem of the same form. Continuing, we reduce A to upper triangular form; a small sketch is given below. 
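A small NumPy sketch of this algorithm (illustrative names and conventions): each Householder vector is built from R(i, i) and column i of S and is applied only to row i of R and to the rows of S, so the zeros of R are never filled in.

    import numpy as np

    def qr_triangular_plus_dense(R, S):
        """Reduce [R; S] (R upper triangular, S dense) to upper triangular form."""
        R = np.array(R, dtype=float); S = np.array(S, dtype=float)
        n = R.shape[1]
        for i in range(n):
            x = np.concatenate(([R[i, i]], S[:, i]))
            nx = np.linalg.norm(x)
            if nx == 0.0:
                continue
            v = x.copy()
            s = np.sign(x[0]) if x[0] != 0 else 1.0
            v[0] += s * nx                          # Householder vector for x
            v /= np.linalg.norm(v)
            block = np.vstack([R[i, i:], S[:, i:]]) # the only rows the reflection touches
            block -= 2.0 * np.outer(v, v @ block)
            R[i, i:] = block[0]
            S[:, i:] = block[1:]
        return R, S                                 # S is now (numerically) zero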

Question 3.7. If A = R + uv^T, where R is an upper triangular matrix, and u and v are column vectors, describe an efficient algorithm to compute the QR decomposition of A.

Solution. When either u or v is 0, the answer is trivial, so suppose u_n ≠ 0 (otherwise the last row of uv^T is 0 and we can work with the last nonzero row of uv^T instead). There exist n − 1 Givens rotations G_1, G_2, ..., G_{n−1}, where G_i rotates rows i and n, which zero the entries u_1, ..., u_{n−1} by rotating them into u_n. Each G_i mixes row i only with row n, so

Ā = G_{n−1} G_{n−2} ... G_1 A = G_{n−1} ... G_1 R + (G_{n−1} ... G_1 u) v^T

is upper triangular except for its last row: the rotations applied to R create fill only in the last row, and G_{n−1} ... G_1 u is a multiple of e_n, so the rank-one term also affects only the last row. Then, we proceed to eliminate the last row: for k = 1, ..., n − 1, a Givens rotation of rows k and n zeroes Ā(n, k) without destroying the zeros already created. Continuing like this reduces Ā to upper triangular form, and the product of all rotations applied to A forms the orthogonal matrix Q^T. A sketch of a closely related variant is given below. 
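Below is a small NumPy sketch of a closely related standard variant (illustrative, not exactly the e_n-based sweep described above): it first rotates u into a multiple of e_1, which turns R into an upper Hessenberg matrix, and then restores triangular form with a second sweep of Givens rotations.

    import numpy as np

    def givens(a, b):
        """Return (c, s) with [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
        if b == 0.0:
            return 1.0, 0.0
        r = np.hypot(a, b)
        return a / r, b / r

    def qr_rank_one_update(R, u, v):
        """QR factorization of R + u v^T, with R upper triangular (a sketch)."""
        n = R.shape[0]
        H = np.array(R, dtype=float); u = np.array(u, dtype=float)
        Q = np.eye(n)
        for k in range(n - 2, -1, -1):              # rotate u into ||u|| e_1; H becomes Hessenberg
            c, s = givens(u[k], u[k + 1])
            G = np.array([[c, s], [-s, c]])
            u[k:k + 2] = G @ u[k:k + 2]
            H[k:k + 2, :] = G @ H[k:k + 2, :]
            Q[:, k:k + 2] = Q[:, k:k + 2] @ G.T
        H += np.outer(u, v)                         # u = (||u||, 0, ..., 0) at this point
        for k in range(n - 1):                      # restore upper triangular form
            c, s = givens(H[k, k], H[k + 1, k])
            G = np.array([[c, s], [-s, c]])
            H[k:k + 2, k:] = G @ H[k:k + 2, k:]
            Q[:, k:k + 2] = Q[:, k:k + 2] @ G.T
        return Q, H                                 # R + u v^T = Q @ H, H upper triangular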

Question 3.8. Let x ∈ R^n and let P be a Householder matrix such that P x = ±||x||_2 e_1. Let G_{1,2}, ..., G_{n−1,n} be Givens rotations, and let Q = G_{1,2} ... G_{n−1,n}. Suppose Q x = ±||x||_2 e_1. Must P equal Q?

Solution. P does not need to equal Q. Consider the following two-dimensional counterexample with x = (3, 4)^T:

P x = (1/5) [−3, −4; −4, 3] (3, 4)^T = (−5, 0)^T = −||x||_2 e_1.

The above matrix is a Householder matrix with generating vector u = (1/sqrt(5)) (2, 1)^T. At the same time, the following matrix is a Givens rotation which satisfies the condition but does not equal P:

Q x = (1/5) [−3, −4; 4, −3] (3, 4)^T = (−5, 0)^T = −||x||_2 e_1.


Question 3.9. Let A be m-by-n, with SVD A = U ΣV T . Compute the SVDs of the following
matrices in terms of U , Σ, and V :

1. (A T A)−1 ,

2. (A T A)−1 A T ,

3. A(A T A)−1 ,

4. A(A T A)−1 A T .

Remark. For the formulas to hold, we have to assume that A has full column rank. And for
simplicity, later we will assume that m ≥ n if the contrary situation is not stated explicitly,
since for the case n ≥ m can be treated similarly.
Proof.

1. for first formula, we have

(A T A)−1 = (V ΣT U T U ΣV T )−1 = (V Σ2V T )−1 = V Σ−2V T .

2. For second formula, we have

(A T A)−1 A T = V Σ−2V T V ΣU T = V Σ−1U T .

3. For third formula, we have

A(A T A)−1 = U ΣV T V Σ−2V T = U Σ−1V T .

4. For the last formula, we have

A(A^T A)^{-1} A^T = U Σ^{-1} V^T V Σ U^T = U U^T = U I_n U^T,

which is its SVD; here U U^T is the orthogonal projector onto the range of A (it equals I_m only when m = n).


Question 3.10. Let A k be a best rank-k approximation of the matrix A, as defined in Part 9 of
Theorem 3.3. Let σi be the i th singular value of A. Show that A k is unique if σk > σk+1 .

Proof. I believe the claim in the question is not correct as stated. According to the textbook, a best rank-k approximation of a matrix A in the two-norm is a rank-k matrix Ã which satisfies

||A − Ã||_2 = σ_{k+1}.

I present a counterexample to illustrate that it may not be unique under the assumption of the question. Consider the matrix A = diag(4, 3, 2, 1). Then for any 0 < ε ≤ 1, the matrix

Ã = diag(4 − ε, 3, 2, 0)

has rank 3 and satisfies ||A − Ã||_2 = max(ε, 1) = 1 = σ_4. Therefore, the best rank-3 approximation of A is not unique. However, the result holds if we measure in the Frobenius norm, and if the best rank-k approximation is required to have the form sum_{i=1}^{k} σ_i u_i v_i^T, the proof is trivial.

Remark. Under the assumption of the question, I can only prove that any matrix B satisfies
kA − B k2 = σk+1 has the condition
 T
u 1 B v 1 u T2 B v 1 . . . u Tk B v 1

u T B v T T
2 u2 B v 2 . . . uk B v 2

 1 
.. .. .. ..
U T BV = 
 
. ,
 . . . 
 T T T
u 1 B v k u 2 B v k . . . u k B v k


0

where U and V are the orthgonoal matrices of the SVD of A. In some degree, the two norm
equality kA−B k2 = σk+1 does not provide enough information of the range of B , which results
in its nonuniqueness. 

Question 3.11. Let A be m-by-n. Show that X = A + (the Moore-Penrose pseudoinverse) mini-
mizes kAX − I kF over all n-by-m matrices X . What is the value of this minimum?

Proof. Denote the full SVD of A as A = U Σ V^T (U m-by-m, V n-by-n). As X ↦ X̂ = V^T X U is a one-to-one correspondence on n-by-m matrices, we can substitute X = V X̂ U^T; if X̂ is a minimizer we regain X as V X̂ U^T and vice versa. Consequently, the problem can be transformed into

||AX − I||_F^2 = ||U Σ V^T V X̂ U^T − I||_F^2 = ||Σ X̂ − I||_F^2
             = sum_{i=1}^{r} [ (σ_i X̂(i,i) − 1)^2 + sum_{j ≠ i} σ_i^2 X̂(i,j)^2 ] + (m − r),

where r is the rank of A (rows r+1, ..., m of Σ X̂ are zero and contribute the constant m − r). Thus, to minimize the above formula, we need

X̂(i,i) = σ_i^{-1},  X̂(i,j) = 0,  for i ∈ {1, 2, ..., r} and j ≠ i, j ∈ {1, 2, ..., m}.

So X̂ may have many choices (its rows r+1, ..., n are arbitrary), and X̂ = Σ^+ always satisfies the condition. As a result, X = A^+ is a minimizer, and the minimum is sqrt(m − r). 

Question 3.12. Let A, B , and C be matrices with dimensions such that the product A T C B T is
well defined. Let X be the set of matrices X minimizing kAX B −C kF , and let X 0 be the unique
member of X minimizing kX kF . Show that X 0 = A +C B + .

Remark. This solution is unique only under the assumption that A and B have full rank.
Proof. Assume matrix A as m-by-s, B as t -by-n, C as m-by-n. Denote the SVD decomposi-
tion of A as A = U1 Σ1V1T , B as B = U2 Σ2V2T . As there is an one-to-one correspondence from
X to X̂ = V1T X U2 , we can substitue X as X̂ . Thus, we have

kAX B −C k2F = kU1 Σ1V1T V1 X̂ U2T U2 Σ2V2T −C k2F


= kΣ1 X̂ Σ2 −U1T CV2 k2F
= k(Σ1 X̂ Σ2 −U1T CV2 )(1 : r 1 , 1 : r 2 )k2F + k(U1T CV2 ) (((r 1 + 1) : m, 1 : n) + (1 : r 1 , (r 2 + 1) : n)) k2F ,

where r 1 is the rank of A and r 2 is the rank of B . As the matrix U1T CV2 are set once A, B and C
are given, the minimizer must and only have to minimize

k(Σ1 X̂ Σ2 −U1T CV2 )(1 : r 1 , 1 : r 2 )k2F .

Thus, any matrix X̂ satisfying (Σ_1 X̂ Σ_2)(1 : r_1, 1 : r_2) = (U_1^T C V_2)(1 : r_1, 1 : r_2), or equivalently any matrix X with (Σ_1 V_1^T X U_2 Σ_2)(1 : r_1, 1 : r_2) = (U_1^T C V_2)(1 : r_1, 1 : r_2), is a minimizer. We claim that X_0 = A^+ C B^+ is such a minimizer, since

Σ1V1T X 0U2 Σ2 (1 : r 1 , 1 : r 2 ) = Σ1V1T V1 Σ+ T + T


1 U1 CV2 Σ2 U2 U2 Σ2 (1 : r 1 , 1 : r 2 )
· ¸ · ¸
I r1 0 T I r2 0
= U CV2 (1 : r 1 , 1 : r 2 )
0 0 1 0 0
= (U1T CV2 )(1 : r 1 , 1 : r 2 ).


Question 3.13. Show that the Moore-Penrose pseudoinverse of A satisfies the following identi-
ties:

A A + A = A, A + A A + = A + , A + A = (A + A)T , A A + = (A A + )T .

Proof. Denote the full SVD of A by U Σ V^T and let r be the rank of A. Note that Σ Σ^+ Σ = Σ, Σ^+ Σ Σ^+ = Σ^+, and that Σ^+ Σ = [I_r, 0; 0, 0] (n-by-n) while Σ Σ^+ = [I_r, 0; 0, 0] (m-by-m). We prove the identities one by one.

1. A A^+ A = U Σ V^T V Σ^+ U^T U Σ V^T = U (Σ Σ^+ Σ) V^T = U Σ V^T = A.

2. A^+ A A^+ = V Σ^+ U^T U Σ V^T V Σ^+ U^T = V (Σ^+ Σ Σ^+) U^T = V Σ^+ U^T = A^+.

3. A^+ A = V Σ^+ U^T U Σ V^T = V [I_r, 0; 0, 0] V^T = (V [I_r, 0; 0, 0] V^T)^T = (A^+ A)^T.

4. A A^+ = U Σ V^T V Σ^+ U^T = U [I_r, 0; 0, 0] U^T = (U [I_r, 0; 0, 0] U^T)^T = (A A^+)^T.

Question 3.14. Prove part 4 of Theorem 3.3: Let H = [0, A^T; A, 0], where A is square and A = U Σ V^T is its SVD. Let Σ = diag(σ_1, σ_2, ..., σ_n), U = [u_1, u_2, ..., u_n], and V = [v_1, v_2, ..., v_n]. Prove that the 2n eigenvalues of H are ±σ_i, with corresponding unit eigenvectors (1/sqrt(2)) [v_i; ±u_i]. Extend to the case of rectangular A.

Proof. We directly prove the case of rectangular A (m-by-n, m ≥ n), which contains the square case. First, write the thin SVD of A as A = U Σ V^T = sum_{i=1}^{n} σ_i u_i v_i^T, where U = [u_1, ..., u_n], V = [v_1, ..., v_n], and Σ = diag(σ_1, ..., σ_n). We verify that ±σ_i are 2n eigenvalues of H with corresponding unit eigenvectors (1/sqrt(2)) [v_i; ±u_i] as follows:

H [v_i; u_i] = [0, A^T; A, 0] [v_i; u_i] = [A^T u_i; A v_i] = [σ_i v_i; σ_i u_i] = σ_i [v_i; u_i],
H [v_i; −u_i] = [0, A^T; A, 0] [v_i; −u_i] = [−A^T u_i; A v_i] = [−σ_i v_i; σ_i u_i] = −σ_i [v_i; −u_i].

Then, as ||[v_i; ±u_i]||_2 = sqrt(2) and the 2n vectors (1/sqrt(2)) [v_i; ±u_i] are linearly independent, the ±σ_i account for 2n eigenvalues of H. As for the remaining m − n eigenvalues, let Ũ = [u_{n+1}, ..., u_m] be an orthonormal basis of the orthogonal complement of the range of U. The remaining m − n eigenvalues of H are zero, with corresponding unit eigenvectors [0; u_i] for i = n + 1, ..., m, since

H [0; u_i] = [0, A^T; A, 0] [0; u_i] = [A^T u_i; 0] = [0; 0].


Question 3.15. Let A be m-by-n, m < n, and of full rank. Then min kAx − bk2 is called an
underdetermined least squares problem. Show that the solution is an (n − m)-dimensional
set. Show how to compute the unique minimum norm solution using appropriately modified
normal equations, QR decomposition, and SVD.

Solution. We explain these three techniques one by one, and prove the properties of the
solution in the first technique.

1. For the modified normal equations, split any vector x ∈ R^n as x = x_1 + x_2, where x_2 is in the null space of A and x_1 is in the range of A^T, say x_1 = A^T y. Thus,

||Ax − b||_2 = ||A x_1 − b||_2 = ||A A^T y − b||_2.

As A A^T has full rank, the modified normal equations

A A^T y = b

have a unique solution. Every vector of the form A^T y + x_2 is a minimizer, and these form an (n − m)-dimensional set. The unique minimum norm solution is x = A^T (A A^T)^{-1} b.

2. For the QR decomposition, since A^T has the (thin) QR decomposition A^T = QR, where
Q is n-by-m and R is m-by-m, we have that A = R T Q T , and the least square problem
equals the equation
R T Q T x = b.
Let x = Q(R T )−1 b + x̃, where x̃ can be any vector. We can transform the above equation
into

Q T x̃ = 0.

Thus, x̃ belongs to the null space of Q^T. Evaluating the norm of our solution, we have

kxk22 = kQ(R T )−1 b + x̃k22 = kQ(R T )−1 bk22 + kx̃k22 ≥ kQ(R T )−1 bk22 .

Consequently, the unique minimum norm solution is Q(R T )−1 b.

3. For SVD, we can have the SVD decomposition of A as A = V ΣU T , where V is m-by-m,


Σ is m-by-m full rank diagonal matrix and U T is m-by-n. Like QR decomposition, the
least square problem equals to
U T x = Σ−1V T b.
Let x = U Σ−1V T b + x̃ and we have the solution of above linear equations as

U T x̃ = 0.

Thus, x̃ belongs to the null space of U^T. Evaluating the norm of our solution, we have

kxk22 = kU Σ−1V T b + x̃k22 = kU Σ−1V T bk22 + kx̃k22 ≥ kU Σ−1V T bk22 .

Consequently, the unique minimum norm solution is U Σ−1V T b.


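The three techniques above can be compared numerically. The following NumPy sketch (illustrative name) assumes A is m-by-n with m < n and full row rank; note that NumPy returns the SVD as A = U diag(s) V^T, whereas the solution above writes it as A = V Σ U^T.

    import numpy as np

    def min_norm_solution(A, b):
        """Minimum norm solution of the underdetermined problem Ax = b, three ways."""
        x_ne = A.T @ np.linalg.solve(A @ A.T, b)        # modified normal equations
        Q, R = np.linalg.qr(A.T)                        # A^T = QR, so x = Q R^{-T} b
        x_qr = Q @ np.linalg.solve(R.T, b)
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        x_svd = Vt.T @ ((U.T @ b) / s)                  # SVD-based formula
        return x_ne, x_qr, x_svd                        # all three agree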
Question 3.16. Prove Lemma 3.1:
Let P be an exact Householder (or Givens) transformation, and P̃ be its floating point approxi-
mation. Then
fl(P̃ A) = P (A + E ) kE k2 = O(²)kAk2
and
fl(A P̃ ) = (A + F )P kF k2 = O(²)kAk2 .

Proof. If fl(a ¯ b) = (a ¯ b)(1 + ²) holds, like Question 1.10., we can prove that

fl(AB ) − AB = O(²)AB.

Thus, for the first formula, we have

kE k2 = kP −1 fl(P̃ A) − P A k2 = kP −1 (fl((1 + O(²))P A) − P A) k2


¡ ¢

≤ O(²)kP −1 k2 kP k2 kAk2
= O(²)kAk2 .

The second one can be deduced similarly. 

Question 3.17. The question is too long, so I omit it. For details, please refer to the textbook.

Solution.

1. We prove it by induction. First, consider the base case P = P_2 P_1. We have

P = P_2 P_1 = (I − 2 u_2 u_2^T)(I − 2 u_1 u_1^T)
  = I − 2 u_1 u_1^T − 2 u_2 u_2^T + 4 (u_2^T u_1) u_2 u_1^T
  = I − [u_1, u_2] [2, 0; −4 u_2^T u_1, 2] [u_1^T; u_2^T].

Thus, the base case holds with T_2 = [2, 0; −4 u_2^T u_1, 2]. Then, assume that P' = P_k P_{k−1} ... P_1 = I − U_k T_k U_k^T; we deduce

P = P_{k+1} P' = (I − 2 u_{k+1} u_{k+1}^T)(I − U_k T_k U_k^T)
  = I − U_k T_k U_k^T − 2 u_{k+1} u_{k+1}^T + 2 u_{k+1} u_{k+1}^T U_k T_k U_k^T
  = I − [U_k, u_{k+1}] [T_k, 0; −2 u_{k+1}^T U_k T_k, 2] [U_k^T; u_{k+1}^T]
  = I − U_{k+1} T_{k+1} U_{k+1}^T.

Therefore, by induction, we have proved part 1 of the question. As for the algorithm, we can use the recursive formula obtained in the induction,

T_{k+1} = [T_k, 0; −2 u_{k+1}^T U_k T_k, 2],

with base case T_1 = 2. The pseudocode is shown below:

1: T_1 = 2
2: for i = 2 to k do
3:     β_i = −2 u_i^T U_{i−1} T_{i−1}
4:     T_i = [T_{i−1}, 0; β_i, 2]
5: end for

2. The algorithm which the textbook introduces is just sufficient:


1: for i = 1 to min(m − 1, n) do
2: u i = House(A(i : m, i ))
3: v = u Ti A(i : m, i : n)
4: A(i : m, i : n) = A(i : m, i : n) − 2u i v T
5: end for

Here, u_i^T A(i : m, i : n) is a matrix-vector multiplication and A(i : m, i : n) = A(i : m, i : n) − 2 u_i v^T is a rank-1 update. Since House(A(i : m, i)) costs about 2(m − i) flops, the total flop count is

sum_{i=1}^{min(m−1,n)} [ (2m − 2i − 1)(n − i) + 2(m − i)(n − i) + 2(m − i) ] ≈ 2 m n^2 − (2/3) n^3.

3. Denote b as a small integer. Like Level 3 BLAS Gaussian elimination, we use the tech-
nique of delaying update. In details, assume we have already done the first k−1 columns,
yielding

k −1 b n −k −b +1
k −1 n −k +1  
à ! R 11 R 12 R 13
k −1 Q 11 Q 12  
A = ·  Ã 22 Ã 23 
m −k −1 Q 21 Q 22
 
à 32 à 33
¸ ·
à 22
Then, apply the Level 2 BLAS QR implement to the submatrix and get
à 23
· ¸ · ¸
à 22 à 23 R 22 R 23
=Q ,
à 32 à 33 Ā 33
· ¸ · ¸ · ¸ · ¸
R 22 Ã 22 R 23 Ã 23
where Q = and Q = . Thus,
0 Ã 32 Ā 33 Ã 33

¸ R 11 R 12 R 13
 
· ¸·
Q 11 Q 12 I
A=  R 22 R 23 
Q 21 Q 22 Q
Ā 33
¸ R 11 R 12 R 13
 
·
Q 11 Q 12Q 
= R 22 R 23 
Q 21 Q 22Q
Ā 33

The Q can be computed by part I, and R 23 , Ā 33 can be computed from matrix product,
which is level 3 BLAS.

Question 3.18. It is often of interest to solve constrained least squares problems, where the
solution x must satisfy a linear or nonlinear constraint in addition to minimizing kAx − bk2 .
We consider one such problem here. Suppose that we want to choose x to minimize kAx − bk2
subject to the linear constraint C x = d . Suppose also that A is m-by-n, C is p-by-n, and C has
full rank. We also assume that p ≤ n (so C x = d is guaranteed to be consistent) and n ≤ m + p
(so the system is not underdetermined). Show that there is a unique solution under the assump-
tion that the stacked matrix [A; C] has full column rank. Show how to compute x using two QR decompositions and
some matrix-vector multiplications and solving some triangular systems of equations.

Solution. First, compute a full QR decomposition of C^T, C^T = Q R, so that C = R^T Q^T, where Q = [Q_1, Q_2] is n-by-n and R^T = [L, 0] is p-by-n with L lower triangular. Then the constraint C x = d can be transformed into

R^T Q^T x = [L, 0] [Q_1^T; Q_2^T] x = L Q_1^T x = d.

Like Question 3.15., the solutions of the above linear system are Q 1 L −1 d +Q 2 y, where y is any
vector in Rn−p . Therefore, the original least square problem equals to

kAx − bk2 = kA(Q 1 L −1 d +Q 2 y) − bk2 = kAQ 2 y − (b − AQ 1 L −1 d )k2 .

As A Q_2 is m-by-(n − p) and m + p ≥ n, we have m ≥ n − p. If A Q_2 has full column rank, we get a unique solution. Since the stacked matrix [A; C] has full column rank, we claim that the null spaces of A and C intersect only in the zero vector: if some x ≠ 0 lay in both, then [A; C] x = 0, which would mean the columns of [A; C] are linearly dependent. Thus, the null spaces of A and C intersect trivially. Consequently, A Q_2 has full column rank (footnote 5). If we denote
the QR decomposition of AQ 2 as AQ 2 = Q̂ R̂, we have the unique solution of y as

y = R̂ −1Q̂ T (b − AQ 1 L −1 d ).

Then the final answer should be

x = Q_2 y + Q_1 L^{-1} d.

A small numerical sketch of this procedure is given below.
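The sketch below (illustrative names; it assumes the rank conditions stated above) uses NumPy for the two QR decompositions.

    import numpy as np

    def constrained_ls(A, b, C, d):
        """Minimize ||Ax - b||_2 subject to Cx = d using two QR decompositions (a sketch)."""
        n, p = A.shape[1], C.shape[0]
        Q, R = np.linalg.qr(C.T, mode='complete')   # C^T = Q R, so C = [L, 0] Q^T
        L = R[:p, :].T                              # p-by-p lower triangular
        Q1, Q2 = Q[:, :p], Q[:, p:]
        x_part = Q1 @ np.linalg.solve(L, d)         # particular solution of Cx = d
        Qh, Rh = np.linalg.qr(A @ Q2)               # second QR, for min ||A Q2 y - (b - A x_part)||
        y = np.linalg.solve(Rh, Qh.T @ (b - A @ x_part))
        return x_part + Q2 @ y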

Question 3.19. Programming question. Skipped.

Question 3.20. Prove Theorem 3.4.

Proof. First, since

(A^T A)^{-1} A^T (A + δA) = I + (A^T A)^{-1} A^T δA

and ||(A^T A)^{-1} A^T δA||_2 ≤ ||δA||_2 / σ_n < 1 under the assumption of Theorem 3.4, the matrix I + (A^T A)^{-1} A^T δA is invertible according to Lemma 2.1. Consequently, (A + δA) has full rank. As x̃ minimizes ||(A + δA) x̃ − (b + δb)||_2, it can be
expressed as
¢−1
x̃ = (A + δA)T (A + δA) (A + δA)T (b + δb)
¡
¢−1
= A T A(I + (A T A)−1 A T δA + (A T A)−1 (δA)T A + (A T A)−1 (δA)T (δA) (A + δA)T (b + δb)
¡

∞ ¢i
= (−1)i (A T A)−1 A T δA + (A T A)−1 (δA)T A + (A T A)−1 (δA)T (δA) (A T A)−1 (A + δA)T (b + δb)
X ¡
i =0
≈ I − (A T A)−1 (δA)T A − (A T A)−1 A T δA (A T A)−1 (A T b + (δA)T b + A T δb)
¡ ¢

≈ (I − (A T A)−1 (δA)T A − (A T A)−1 A T δA)x + (A T A)−1 (δA)T b + (A T A)−1 A T δb


= (I − (A T A)−1 A T δA)x + (A T A)−1 (δA)T r + (A T A)−1 A T δb.
5 Denote Q_2 = (q_1, q_2, ..., q_{n−p}). If there exist b_1, b_2, ..., b_{n−p} such that

b_1 A q_1 + b_2 A q_2 + ... + b_{n−p} A q_{n−p} = A (b_1 q_1 + b_2 q_2 + ... + b_{n−p} q_{n−p}) = 0,

it follows that b_1 q_1 + ... + b_{n−p} q_{n−p} is in the null space of A; it also lies in the null space of C, hence it is zero, and since the q_i are orthonormal, all b_i = 0.

Also, since sin θ = ||r||_2 / ||b||_2 and θ is acute, we can evaluate cos θ and tan θ as follows:

cos θ = sqrt(1 − sin^2 θ) = sqrt(||b||_2^2 − ||r||_2^2) / ||b||_2 = ||Ax||_2 / ||b||_2,
tan θ = sin θ / cos θ = (||r||_2 / ||b||_2) / (||Ax||_2 / ||b||_2) = ||r||_2 / ||Ax||_2.

Thus,

||x̃ − x||_2 ≤ ε ||(A^T A)^{-1} A^T||_2 ||A||_2 ||x||_2 + ε ||(A^T A)^{-1}||_2 ||A||_2 ||r||_2 + ε ||(A^T A)^{-1} A^T||_2 ||b||_2
          ≤ ε κ_2(A) ||x||_2 + ε κ_2^2(A) ||x||_2 tan θ + ε κ_2(A) ||x||_2 / cos θ
          ≤ ε ( 2 κ_2(A) / cos θ + tan θ · κ_2^2(A) ) ||x||_2,

which is the bound of Theorem 3.4 for the relative error ||x̃ − x||_2 / ||x||_2.


4 S OLUTIONS FOR C HAPTER IV: N ONSYMMETRIC E IGENVALUE


P ROBLEMS
Question 4.1. Let A be defined as in equation (4.1). Show that det(A) = prod_{i=1}^{b} det(A_{ii}) and then that det(A − λI) = prod_{i=1}^{b} det(A_{ii} − λI). Conclude that the set of eigenvalues of A is the union of the sets of eigenvalues of A_{11} through A_{bb}.

Proof. Denote A i i as m i -by-m i and the whole matrix A as n-by-n. From the definition of
determinant, we have det(A) = (−1)σi a i 1 ,1 a i 2 ,2 , . . . , a i n ,n , where σi is the inverse number of
P

the permutation (i 1 , i 2 , . . . , i n ). Then we can group them as

det(A) = (−1)σi (a i 1 ,1 , . . . , a i m1 ,m1 ) . . . (a i mb−1 +1 ,mb−1 +1 , . . . , a i mb ,mb ).


X

For the first group, a i j , j = 0 when i j > m 1 . Thus, we can delete them from the sum, and get

det(A) = (−1)σ1 (a i 1 ,1 , . . . , a i m1 ,m1 )(−1)σi 0 (a i m1 +1 ,m1 +1 , . . . , a i m2 ,m2 ) . . . (a i mb−1 +1 ,mb−1 +1 , . . . , a i mb ,mb )


X

= det(A 11 ) (−1)σi 0 (a i m1 +1 ,m1 +1 , . . . , a i m2 ,m2 ) . . . (a i mb−1 +1 ,mb−1 +1 , . . . , a i mb ,mb ).


X

Therefore, we reduce our problem into smaller subproblem. Continue, and we have
b
Y
det(A) = det(A 11 ) det(A 22 ) . . . det(A bb ) = det(A i i ).
i =1

Consequently,
b
det(A − λI ) = det(A i i − λI ).
Y
i =1
Thus λ is an eigenvalue of A if and only if it is an eigenvalue of some A i i . 
Question 4.2. Suppose that A is normal; i.e., A A ∗ = A ∗ A. Show that if A is also triangular,
it must be diagonal. Use this to show that an n-by-n matrix is normal if and only if it has n
orthonormal eigenvectors.

Proof. Since A is normal and triangular, we have
     
a 11 a 12 . . . a 1n ā 11 ā 11 a 11 a 12 ... a 1n
a 22 . . . a 2n   ā 12 ā 22   ā 12 ā 22 a 22 ... a 2n 
     
 
.
..  = .
..   .. .. .. .. .. .. .. 
  
  .
. .  . . .   . . . . . 
 
 
a nn ā 1n ā 2n . . . ā nn ā 1n ā 2n ... ā nn a nn

Equating the (1, 1) entries, we have

|a_{11}|^2 = sum_{i=1}^{n} |a_{1i}|^2.

Thus, for i ≠ 1 we have |a_{1i}| = 0. Then we reduce the problem to a smaller subproblem.
Continue, and we will have A is diagonal. For any n-by-n normal matrix A, we have a unitary
matrix U such that U ∗ AU = T , where T is triangular. Then, we claim that T is normal because

T ∗ T = U ∗ A ∗UU ∗ AU = U ∗ A ∗ AU = U ∗ A A ∗U = U ∗ AUU ∗ A ∗U = T T ∗ .

Thus, T is diagonal and denote it as diag(λ1 , λ2 , . . . , λn ), we have

AU = U diag(λ1 , λ2 , . . . , λn ).

i.e., each column of U is the right eigenvector of A. On the other hand, if A has n orthonormal
eigenvectors, denote them as U . We have A = U diag(λ1 , λ2 , . . . , λn )U ∗ , and it is easy to verify
that A is normal. 
Question 4.3. Let λ and µ be distinct eigenvalues of A, let x be a right eigenvector for λ, and
let y be a left eigenvector for µ. Show that x and y are orthogonal.

Proof. Consider y ∗ Ax, and we have

y ∗ Ax = µy ∗ x = λy ∗ x.

Thus, (λ − µ)y ∗ x = 0. As λ 6= µ, we get y ∗ x = 0, i.e., x and y is orthogonal. 


Question 4.4. Suppose A has distinct eigenvalues. Let f(z) = sum_{i=−∞}^{∞} a_i z^i be a function which is defined at the eigenvalues of A. Let Q^* A Q = T be the Schur form of A (so Q is unitary and T upper triangular).

1. show that f (A) = Q f (T )Q ∗ . Thus to compute f (A) it suffices to be able to compute f (T ).


In the rest of the problem you will derive a simple recurrence formula for f (T ).

2. Show that ( f (T ))i i = f (Ti i ) so that the diagonal of f (T ) can be computed from the diag-
onal of T .

3. Show that T f (T ) = f (T )T .

4. From the last result, show that the i th superdiagonal of f (T ) can be computed from the
(i-1)st and earlier subdiagonals. Thus, starting at the diagonal of f (T ), we can compute
the first superdiagonal, second superdiagonal, and so on.

Proof.

1. It is suffices to prove that A i = QT i Q ∗ , and it can be proved as

A i = QT Q ∗QT Q ∗ . . .QT Q ∗ = QT i Q ∗ .

Thus,
∞ ∞
ai A i = a i QT i Q ∗ = Q f (T )Q ∗ .
X X
f (A) =
i =−∞ i =−∞

2. It is suffices to prove that (T i )i i = (Ti i )i , and it can be proved by mathematic induction.


As the multiplication of upper triangular is also an upper triangular matrix, the diago-
nal entries are the products of the corresponding diagonal entries. For the base case,
we have (T 2 )i i = Ti2i . Then, suppose it is correct for k, we can deduce for k + 1 as

(T k+1 )i i = (T k T )i i = Tiki Ti i = Tik+1


i .

Thus,
∞ ∞
a i (T i ) j j = a i T ji j = f (T j j ).
X X
( f (T )) j j =
i =−∞ i =−∞

3. It is self evident that any matrix is commutative with itself. Thus


∞ ∞
ai T T i = a i T i T = f (T )T.
X X
T f (T ) =
i =−∞ i =−∞

4. It is suffices to deduce the result for T i , and it can be prove by mathematic induction.
First, for T 2 , we have
iX
+k
T 2 (i , i + k) = T (i , t )T (t , i + k).
t =i

Thus, the kth superdiagonal can be computed from 0th to (k − 1)th superdiagonals.
P j −i +1
Then, suppose it is correct for T n , so we can express T n (i , j ) = t =1 c i , j ,t T (i , i + t −1),
where c i , j ,t are consts. Therefore, we can prove for T n+1 as

iX
+k iX
+k s−i
X+1
T k+1 (i , i + k) = T k (i , s)T (s, i + k) = T (s, i + k) c i ,s,t T (i , i + t − 1).
s=i s=i t =1

Thus, the kth superdiagonal of T n+1 can still be computed from 0th to (k − 1)th super-
diagonals.
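Parts 2-4 give a recurrence (Parlett's recurrence) that fills in f(T) one superdiagonal at a time; a small NumPy sketch follows (illustrative name; it assumes T is upper triangular with distinct diagonal entries). By part 1, f(A) = Q f(T) Q^*.

    import numpy as np

    def fun_of_triangular(T, f):
        """Compute f(T) for upper triangular T with distinct diagonal entries (a sketch)."""
        n = T.shape[0]
        F = np.zeros_like(T, dtype=complex)
        for i in range(n):
            F[i, i] = f(T[i, i])                       # part 2: diagonal of f(T)
        for d in range(1, n):                          # part 4: d-th superdiagonal
            for i in range(n - d):
                j = i + d
                num = T[i, j] * (F[i, i] - F[j, j])
                num += F[i, i+1:j] @ T[i+1:j, j] - T[i, i+1:j] @ F[i+1:j, j]
                F[i, j] = num / (T[i, i] - T[j, j])    # from T f(T) = f(T) T (part 3)
        return F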

Question 4.5. Let A be a square matrix. Apply either Question 4.4 to the Schur form of A or
equation (4.6) to the Jordan form of A to conclude that the eigenvalues of f (A) are f (λi ), where
the λi are the eigenvalues of A. This result is called the spectral mapping theorem.

Proof. Let U T U ∗ be the Schur form of A. According to part II of the previous question, we
know that the eigenvalues of f (T ) and f (A) are the same. Since ( f (T ))i i = f (Ti i ) = f (λi ),
the eigenvalues of f (T ) are f (λi ). Thus, the eigenvalues of f (A) are f (λi ) with the same
multiplicity as λi . 
Question 4.6. In this problem we will show how to solve the Sylvester or Lyapunov equation
AX − X B = C , where X and C are m-by-n, A is m-by-m, and B is n-by-n. This is a system of
mn linear equations for the entries of X .
1. Given the Schur decompositions of A and B , show how AX − X B = C can be transformed
into a similar system A 0 Y − Y B 0 = C 0 , where A 0 and B 0 are upper triangular.

2. Show how to solve for the entries of Y one at a time by a process analogous to back sub-
stitution. what condition on the eigenvalues of A and B guarantees that the system of
equations is nonsingular?

3. Show how to transform Y to get the solution X .


Solution.
1. Suppose U1 T1U1∗ and U2 T2U2∗ are the Schur form of A and B repectively. Then the
equation can be expressed as
T1U1∗ X U2 −U1∗ X U2 T2 = U1∗CU2 .
Denote Y = U1∗ X U2 and C 0 = U1∗CU2 , we have
T1 Y − Y T2 = C 0 ,
which satisfies the condition.

2. For illustration, we rewrite the left side of equation A 0 Y − Y B 0 = C 0 in matrix form as


     
a 11 a 12 . . . a 1n y 11 y 12 . . . y 1n y 11 y 12 . . . y 1n b 11 b 12 . . . b 1n
a 22 . . . a 2n   y 21 y 22 . . . y 2n   y 21 y 22 . . . y 2n   b 22 . . . b 2n 
     

.. .. 
  .. .. .. − ..
..  .. .. ..  .. .
.. 
   
. .  . . . .   . . . .  . . 
 

a nn y n1 y n2 . . . y nn y n1 y n2 . . . y nn b nn
Equating its corresponding entry of C 0 (n, :) one by one, we have
C 0 (n, 1) = a nn y n1 − b 11 y n1 ,
2
C 0 (n, 2) = a nn y n2 −
X
b i 2 y ni ,
i =1
......
k
C 0 (n, k) = a nn y nk −
X
b i k y ni ,
i =1
......
n
C 0 (n, n) = a nn y nk −
X
b i n y ni .
i =1

Thus, if a nn 6= b i i , we can solve the last row of Y one at a time. Then continue like this
with the requisition that all the eigenvalues of A and B do not coincide. We can solve
Y.

3. As what we have deduced, Y = U1∗ X U2 , where U1 ,U2 are the unitary matrices which
transform A and B into Schur form separately. We can get

X = U1 Y U2∗ .
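Parts 1-3 together are the Bartels-Stewart algorithm. The sketch below (illustrative; it assumes SciPy is available and that A and B have no common eigenvalues) uses the complex Schur form and solves for Y one column at a time by back substitution.

    import numpy as np
    from scipy.linalg import schur, solve_triangular

    def sylvester(A, B, C):
        """Solve A X - X B = C via Schur forms of A and B (a sketch)."""
        T1, U1 = schur(A, output='complex')         # A = U1 T1 U1^*
        T2, U2 = schur(B, output='complex')         # B = U2 T2 U2^*
        Cp = U1.conj().T @ C @ U2
        m, n = A.shape[0], B.shape[0]
        Y = np.zeros((m, n), dtype=complex)
        for j in range(n):
            # (T1 - T2[j, j] I) y_j = c'_j + sum_{i<j} T2[i, j] y_i
            rhs = Cp[:, j] + Y[:, :j] @ T2[:j, j]
            Y[:, j] = solve_triangular(T1 - T2[j, j] * np.eye(m), rhs)
        return U1 @ Y @ U2.conj().T                 # X = U1 Y U2^*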


· ¸
A C
Question 4.7. Suppose that T = is in Schur form. We want to find a matrix S so that
0 B
· ¸ · ¸
−1 A 0 I R
S TS = . It turns out we can choose S of the form . Show how to solve for R.
0 B 0 I

Solution. As the question does not tell us the dimension of each block and matrix, we can
suppose that the matrix multiplication which the question requires is compatible. Since
· ¸
−1 I −R
S = , we have
0 I
· ¸· ¸· ¸ · ¸
I −R A C I R A AR +C − RB
S −1 T S = =
0 I 0 B 0 I 0 B

Thus, the question is equal to solve

AR − RB = −C ,

which is the type of equation discussed in the previous question. 


Question 4.8. Let A be m-by-n and B be n-by-m. show that the matrices
µ ¶ µ ¶
AB 0 0 0
and
B 0 B BA

are similar. conclude that the nonzero eigenvalues of AB are the same as those of B A.
· ¸
I −A
Proof. Consider the nonsingular matrix S = . We have
0 I
· ¸ · ¸· ¸· ¸ · ¸
AB 0 −1 I −A AB 0 I A 0 0
S S = = .
B 0 0 I B 0 0 I B BA
µ ¶ µ ¶
AB 0 0 0
Thus, and are similar. Consequently, they have the same eigenvalues, and
B 0 B BA
according to Question 4.1., we have

λn det(λI − AB ) = λm det(λI − B A)

Thus, AB and B A have the same nonzero eigenvalues. 

Question 4.9. Let A be n-by-n with eigenvalues λ1 , . . . , λn . Show that
n
|λi |2 = min kS −1 ASk2F .
X
i =1 det(S)6=0

Remark. The question is not correct unless we change the operation from "min" into "inf".
I present a counterexample below to illustrate why. Consider A as
A = [1, 1; 0, 1].

For any nonsingular matrix S = [a, b; c, d], we have ad − bc ≠ 0 and

S^{-1} A S = (1 / (ad − bc)) [ad − bc + cd, d^2; −c^2, ad − bc − cd],

so that

||S^{-1} A S||_F^2 = 2 + (c^2 + d^2)^2 / (ad − bc)^2.

Since c and d cannot both be zero (otherwise S would be singular), we have

||S^{-1} A S||_F^2 > sum_{i=1}^{n} |λ_i|^2 = 2.

Thus, the lower bound cannot be attained in this situation.


Proof. First, denote the Schur form of A by U^* A U = T. As S ↦ U S is a one-to-one correspondence on nonsingular matrices, the feasible set is unchanged. Thus, we can transform the minimization problem into

inf kS −1U ∗ AU Sk2F = inf kS −1 T Sk2F .


det(U S)6=0 det(S)6=0

Then, we can transform the set from nonsingular matrices to nonsingular upper triangular
matrices, because every nonsingular matrix S −1 has a QR decomposition as S −1 = QR. What’s
more, since unitary matrices do not change Frobenius norm, we have

inf kS −1 T Sk2F = inf kQRT R −1Q −1 k2F = inf kRT R −1 k2F .


det(S)6=0 det(QR)6=0 det(R)6=0

As the inverse of an upper triangular matrix is also upper triangular, we have


     
r 11 λ1 −1
r 11 λ1
* *  * *
r 22 λ2 −1
r 22 λ2
    
RT R −1 = 
       
=
.. .. .. ..
  
. . . .
     
     
r nn λn −1
r nn λn

Thus,
n
|λi |2 ≤ kRT R −1 k2F = kS −1 ASk2F .
X
inf inf
i =1 det(R)6=0 det(S)6=0

Then, we show that the lower bound can be approached to any precision. Denote c = max_{i≠j} |T(i, j)|. For any ε < 1, set a = ε / max(c, 1) and consider S = diag(a^{n−1}, a^{n−2}, ..., a, 1). Then

S T S^{-1} = [λ_1, a T_{12}, a^2 T_{13}, ..., a^{n−1} T_{1n};
              0,   λ_2,      a T_{23},  ..., a^{n−2} T_{2n};
              ...
              0,   0,        0,         ..., λ_n].

Since a < 1, every off-diagonal entry satisfies |a^{j−i} T_{ij}| ≤ a c ≤ ε, so the sum of squares of the off-diagonal entries is smaller than (n(n−1)/2) ε^2, which proves the question. 

Question 4.10. Let A be an n-by-n matrix with eigenvalues λ1 , . . . , λn .

1. Show that A can be written A = H + S, where H = H ∗ is Hermitian and S = −S ∗ is skew-


Hermitian. Give explicit formulas for H and S in terms of A.

2. Show that sum_{i=1}^{n} |Re λ_i|^2 ≤ ||H||_F^2.

3. Show that sum_{i=1}^{n} |Im λ_i|^2 ≤ ||S||_F^2.

4. Show that A is normal (A A^* = A^* A) if and only if sum_{i=1}^{n} |λ_i|^2 = ||A||_F^2.

Solution.

1. Suppose A = H + S, then we have A ∗ = H − S. Thus, we can solve H and S as

1
H = (A + A ∗ ),
2
1
S = (A − A ∗ ),
2
where A = H + S, H is Hermitian and S is skew-Hermitian.

2. Let U ∗ AU = T be the Schur form of A. Then, for H , we have


° °2
° Rλ °
1
° * °
2 2 1 ∗ 2
°
Rλ2 ° n

|Rλi |2 .
° ° X
kH kF = kU HU kF = k (T + T )kF = ° 
..
° ≥
2 °
° .
°
° i =1
*
Rλn °F
° °
°

3. Again, let U ∗ AU = T be the Schur form of A. Then, for S, we have


° °2
° Jλ °
1
° * °
1
°
Jλ2 ° n
kSk2F = kU ∗ SU k2F = k (T − T ∗ )k2F = ° |Jλi |2 .
° ° X
° ≥
..

2 °
° .
°
° i =1
*

Jλn °F
° °
°

4. If A is normal, according to Question 4.2., there exists unitary matrix U such that U AU ∗ =
diag(λ1 , . . . , λn ). Thus, we have
n
kAk2F = kU AU ∗ k2F = |λi |2 .
X
i =1

On the other hand, if A is not normal, denote its Schur form by U^* A U = T. We claim that T is not diagonal; otherwise T would be normal and consequently A would be normal, which contradicts our assumption. So we get

||A||_F^2 = ||U^* A U||_F^2 = ||T||_F^2 > sum_{i=1}^{n} |λ_i|^2.

Thus, if sum_{i=1}^{n} |λ_i|^2 = ||A||_F^2, then A must be normal.

Question 4.11. Let λ be a simple eigenvalue, and let x and y be right and left eigenvectors.
We define the spectral projection P corresponding to λ as P = x y ∗ /(y ∗ x). Prove that P has the
following properties.

1. P is uniquely defined, even though we could use any nonzero scalar multiples of x and y
in its definition.

2. P 2 = P. (Any matrix satisfying P 2 = P is called a projection matrix.)

3. AP = P A = λP . (These properties motivate the name spectral projection, since P "con-


tains" the left and right invariant subspaces of λ.)

4. kP k2 is the condition number of λ.

Remark. Part IV of the question follows directly from Question 1.7. However, I present a
similar proof as shown below to remain us of the result.
Proof.

1. Since λ is a simple eigenvalue, any other right and left eigenvectors can be expressed
as x̂ = c x and ŷ = d y. Thus, for the spectral projection defined by x̂ and ŷ, we have

P̂ = x̂ ŷ ∗ /( ŷ ∗ x̂) = cd x y ∗ /(cd y ∗ x) = P,

which suggests that P is uniquely defined.

2. We can verify P 2 = P as follows:

P 2 = x y ∗ x y ∗ /(y ∗ x)2 = x y ∗ /(y ∗ x) = P.

3. We can verify AP = P A = λP as follows:

AP = Ax y ∗ /(y ∗ x) = λx y ∗ /(y ∗ x) = λP = x y ∗ A/(y ∗ x) = P A.

4. According to the result of part 1, we can assume that ||x||_2 = ||y||_2 = 1. Any nonzero vector α ∈ C^n can be decomposed as α = c y + d z, where y^* z = 0 and ||z||_2 = 1, so that P α = c x / (y^* x) and ||α||_2^2 = |c|^2 + |d|^2. Then we have

||P||_2 = max_{α ≠ 0} ||P α||_2 / ||α||_2 = max_{|c| + |d| ≠ 0} |c| / ( |y^* x| sqrt(|c|^2 + |d|^2) ) ≤ 1 / |y^* x|,

with equality when d = 0. Consequently, ||P||_2 = 1 / |y^* x|, which is the condition number of λ.


Question 4.12. Let A = [a, c; 0, b]. Show that the condition numbers of the eigenvalues of A are both equal to (1 + (c/(a − b))^2)^{1/2}. Thus, the condition number is large if the difference a − b between the eigenvalues is small compared to c, the offdiagonal part of the matrix.

Proof. As what the question suggests, we assume that all the variables appearing are real. It
is easy to verify that

x 1 = [1, 0]T ,
c T
y 1 = [1, a−b ]

are the right and left eigenvectors of eigenvalue λ1 = a, and

x 2 = [1, a−b T
c ] ,
y 2 = [0, 1]T

are the right and left eigenvectors of eigenvalue λ2 = b. Thus, the condition number of λ1 is

kx 1 k2 ky 1 k2 ³ c ´2
= (1 + )1/2 ,
|y T1 x 1 | a −b

and that of λ2 is
kx 2 k2 ky 2 k2 ³ c ´2
= (1 + )1/2 .
|y T2 x 2 | a −b
Thus, the result is proved. 

Question 4.13. Let A be a matrix, x be a unit vector (kxk2 = 1), µ be a scalar, and r = Ax −
µx. Show that there is a matrix E with kE kF = kr k2 such that A + E has eigenvalue µ and
eigenvector x.

Proof. Suppose such E exists. Since we have the condition

(A + E )x = µx,

we have the equation for E as


−r = E x.
Set E = −r x ∗ , which satisfies the condition. What’s more, we can evaluate the Frobenius
norm of E as
kE k2F = tr(xr ∗ r x ∗ ) = kr k22 tr(x x ∗ ) = kr k22 .
So, there is a matrix E satisfies the conditions of the question. 

5 S OLUTIONS FOR C HAPTER V: T HE S YMMETRIC E IGENPROBLEM AND


S INGULAR VALUE D ECOMPOSITION
Question 5.1. Show that A = B + iC is hermitian if and only if
· ¸
B −C
M=
C B

is symmetric. Express the eigenvalues and eigenvectors of M in terms of those of A.

Remark. I believe the question is correct under the assumption of both B and C are real.
Thus, it is what my proof will assume.
Solution. Denote by a*_{ij} the (i, j) entry of the matrix A^*. As B and C are both real,

a*_{ij} = conj(a_{ji}) = conj(b_{ji} + i c_{ji}) = b_{ji} − i c_{ji},

so A^* = B^T − i C^T. Hence A is Hermitian (A^* = A) if and only if B^T = B and C^T = −C, which is exactly the condition for M to be symmetric.
As for eigenvalues and eigenvectors, suppose α = β+i γ is the eigenvector with corresponding
eigenvalue λ ∈ R of A. Because of the definition of eigenvalues and eigenvectors, Aα = (B +
iC )(β + i γ) = λ(β + i γ). Thus, we have the equation

B β −C γ = λβ,
B γ +C β = λγ.

Then, considering

−C β B β −C γ λβ β
· ¸· ¸ · ¸ · ¸ · ¸
B
= = =λ ,
C B γ Cβ + Bγ λγ γ
−B γ −C β
· ¸· ¸ · ¸ · ¸ · ¸
B −C −γ −λγ −γ
= = =λ .
C B β −C γ + B β λβ β

Thus, as we can see, from the above equation, we know that [β, γ]T , [−γ, β]T are the eigenvec-
tors with corresponding eigenvalue λ of M . 

Question 5.2. Prove Corollary 5.1, using Weyl’s theorem and part 4 of Theorem 3.3.
Proof. According to the hint provided, denote matrices H1 , H2 , H as

0 GT GT + F T 0 FT
· ¸ · ¸ · ¸
0
H1 = , H2 = , H= .
G 0 G +F 0 F 0
We know that σ1 ≥ σ2 . . . ≥ σn ≥ −σn ≥ −σn−1 ≥ . . . ≥ −σ1 are the eigenvalues of H1 , and
σ01 ≥ σ02 . . . ≥ σ0n ≥ −σ0n ≥ −σ0n−1 ≥ . . . ≥ −σ01 are the eigenvalues of H2 . Using Weyl’s theorem,
we have
|σi − σ0i | ≤ kH k2 .
Thus, what remains to prove is kH k2 = kF k2 . To do this, consider
0 FT 0 FT
· ¸ · ¸ · T ¸
F F 0
HT H = · = .
F 0 F 0 0 FFT

This matrix has the eigenvalues as F F T and F T F , which have the same nonzero eigenvalues.
Thus λmax (H T H ) = λmax (F T F ). i.e., kH k2 = kF k2 . 
Question 5.3. Consider Figure 5.1. Consider the corresponding contour plot for an arbitrary
3-by-3 matrix A with eigenvalues α3 ≤ α2 ≤ α1 . Let C 1 and C 2 be the two great cirles along
which ρ(u, A) = α2 . At what angle do they intersect?
Remark. Since A is arbitrary, we may assume A is real; otherwise x^* A x may not be a real number. What's more, assume A is diagonalizable and the eigenvalues of A are not all identical; if not, I don't think it will behave the same as what the textbook describes.
Solution. Suppose x 1 , x 2 , x 3 are three unit orthogonal eigenvectors of A with corresponding
eigenvalues as λ1 , λ2 , λ3 repectively. Thus, all vectors y ∈ R3 can be decomposed as y = b 1 x 1 +
b 2 x 2 + b 3 x 3 . If a unit vector y satisfies ρ(y, A) = α2 , we have

ρ(y, A) = λ1 b 12 + λ2 b 22 + λ3 b 32 = λ2 .

Rearranging the above equation and using b_1^2 + b_2^2 + b_3^2 = 1 yields

sqrt(λ_1 − λ_2) b_1 = ± sqrt(λ_2 − λ_3) b_3.

As we have assumed that not all eigenvalues are equal, we may suppose λ_2 − λ_3 ≠ 0 and denote c = sqrt(λ_1 − λ_2) / sqrt(λ_2 − λ_3) without loss of generality. Thus, we express the two great 'circles' by parametric equations as

b_1 = sin(θ) / sqrt(1 + c^2),
b_2 = cos(θ),
b_3 = ± c sin(θ) / sqrt(1 + c^2).

Then, their intersection points are the points with θ = 0 and θ = π. For θ = 0, their tangent
vectors are
p p
t 1 = 1/ 1 + c 2 x 1 + c/ 1 + c 2 x 3 ,
p p
t 2 = 1/ 1 + c 2 x 1 − c/ 1 + c 2 x 3 .

Consequently, the corresponding intersection angle is arccos(t_1^T t_2 / (||t_1||_2 ||t_2||_2)) = arccos((1 − c^2)/(1 + c^2)). For θ = π, by the same computation, the intersection angle is the same as for θ = 0. 

Question 5.4. Use the Courant-Fischer minimax theorem (Theorem 5.2) to prove the Cauchy
interlace theorem:
µ ¶
H b
1. Suppose that A = T is an n-by-n symmetric matrix and H is (n −1)-by-(n −1). Let
b u
αn ≤ . . . ≤ α1 be the eigenvalues of A and θn−1 ≤ . . . ≤ θ1 be the eigenvalues of H . Show
that these two sets of eigenvalues interlace:

αn ≤ θn−1 ≤ . . . ≤ θi ≤ αi ≤ θi −1 ≤ αi −1 ≤ . . . ≤ θ1 ≤ α1 .
µ ¶
H B
2. Let A = be n-by-n and H be m-by-m, with eigenvalues θm ≤ . . . ≤ θ1 . Show that
BT U
the eigenvalues of A and H interlace in the sense that α j +(n−m) ≤ θ j ≤ α j (or equivalently
α j ≤ θ j −(n−m) ≤ α j −(n−m) ).

Proof.
· ¸
I n−1
1. Denote the n-by-(n − 1) matrix P as P = . Thus, H = P T AP and for any vector
0
x ∈ Rn−1 , x T x = (P x)T (P x) holds because P T P = I n−1 . Noting that for linear inde-
pendent vectors x 1 , . . . .x n−1 ∈ Rn−1 , P x 1 , . . . .P x n−1 ∈ Rn are also linear independent.
Therefore, if S is i -dimensional space in Rn−1 , P S is i -dimensional space in Rn . Since
the minimum of the subset is not less than the minimum of whole set, we have

xT H x
θi = min max
S n−i 06=x∈S n−i xT x
(P x)T A(P x)
= min max
S n−i 06=x∈S n−i (P x)T (P x)
y T Ay
≥ min max = αi +1 .
S n−i 06= y∈S n−i yT y

On the other hand, we have

xT H x
θi = max min
Si 06=x∈S i xT x
(P x)T A(P x)
= max min
Si 06=x∈S i (P x)T (P x)
y T Ay
≤ max min = αi .
Si 06= y∈S i yT y

Combine the results, and we have αi +1 ≤ θi ≤ αi , which is equivalent to what we should


prove.

¸ ·
Im
2. It is similar to what we have proved. Denote the n-by-m matrix P as P = , and we
0
have H = P T AP . Using Courant-Fischer minimax theorem, we have
xT H x
θi = min max
S m+1−i 06=x∈S m+1−i xT x
(P x)T A(P x)
= min max
S m+1−i 06=x∈S m+1−i (P x)T (P x)
y T Ay
≥ min max = αi +n−m .
S m+1−i 06= y∈S m+1−i yT y
On the other hand, we have
xT H x
θi = max min
Si 06=x∈S i xT x
(P x)T A(P x)
= max min
Si 06=x∈S i (P x)T (P x)
y T Ay
≤ max min = αi .
Si 06= y∈S i yT y
Combine the results, and we have αi +n−m ≤ θi ≤ αi , which is equivalent to what we
should prove.

Question 5.5. Let A = A T with eigenvalues α1 ≥ . . . ≥ αn . Let H = H T with eigenvalues
θ1 ≥ . . . ≥ θn . Let A + H have eigenvalues λ1 ≥ . . . λn . Use Courant-Fischer minimax theorem
(Theorem) to show that α j + θn ≤ λ j ≤ α j + θ1 . If H is positive definite, conclude that λ j > α j .
In other words, adding a symmetric positive definite matrix H to another symmetric matrix A
can only increase its eigenvalues.
Proof. According to Courant-Fischer minimax theorem, we have
x T (A + H )x
λi = min max
S n+1−i 06=x∈S n+1−i xT x
T
x Ax xT H x
≤ min ( max + max )
S n+1−i 06=x∈S n+1−i xT x 06=x∈S n+1−i xT x
x T Ax
≤ min ( max + θ1 ) = αi + θ1 .
S n+1−i 06=x∈S n+1−i xT x
And,
x T (A + H )x
λi = max min
Si 06=x∈S i xT x
x T Ax xT H x
≥ max( min + min )
Si 06=x∈S i xT x 06=x∈S i xT x
x T Ax
≥ max( min + θn ) = αi + θn .
Si 06=x∈S i xT x

Combine the results, and we have a j + θn ≤ λ j ≤ α j + θ1 . As for positive definite H , we know
that all its eigenvalues are positive. Thus,

α j < α j + θn ≤ λ j .


Question 5.6. Let A = [A 1 , A 2 ] be n-by-n, where A 1 is n-by-m and A 2 is n-by-(n − m). Let
σ1 ≥ σ2 ≥ . . . ≥ σn be the singular values of A and τ1 ≥ τ2 . . . ≥ τm be the singular values of A 1 .
Use the Cauchy interlace theorem from Question 5.4 and part 4 of Theorem 3.3 to prove that
σ j ≥ τ j ≥ σ j +n−m .
Proof. Denote the matrix H as
0 A1 A2
 

H =  A T1 0 0 .
A T2 0 0

According to part 4 of Theorem 3.3, the absolute values of eigenvalues of H are the singular
values of A T , which are equal to those of A. Thus, we can express the eigenvalues of H as
σ1 ≥ σ2 ≥ . . . ≥ σn ≥ −σn ≥ . . . ≥ −σ1 . Consider the (n + m)-by-(n + m) principal submatrix
· ¸
0 A1
H1 = T .
A1 0

As in Question 3.14, we have already know its eigenvalues are τ1 ≥ . . . τm ≥ 0 = . . . = 0 ≥ −τm ≥


. . . ≥ −τ1 . Thus, by Cauchy interlace theorem, we have

σi +(n−m) ≤ τi ≤ σi ,

which is what we need to prove. 


Question 5.7. Let q be a unit vector and d be any vector orthogonal to q. Show that

k(q + d )q T − I k2 = kq + d k2 .

Remark. My proof suppose the dimension of the vector space is greater than 2. If the di-
mension is only 2, the proof is similar.
Proof. Denote z as a unit vector which is orthogonal to both q and d . Therefore, any vector
x can be express as x = aq + bd + c z, and we have

k (q + d )q T − I xk22
¡ ¢
T 2
k(q + d )q − I k2 = max
x6=0 kxk22
(a 2 − 2ab + b 2 )kd k22 + c 2
= max
a 2 +b 2 +c 2 6= a 2 + b 2 kd k22 + c 2
(kd k22 − 1)a 2 − 2abkd k22
à !
= max 1+
a 2 +b 2 +c 2 6=0 a 2 + b 2 kd k22 + c 2
(kd k22 − 1)a 2 − 2abkd k22
à !
≤ max 1+ .
a 2 +b 2 6=0 a 2 + b 2 kd k22

To determine the maximum of the above formula, we need to consider whether a = 0. If a = 0,
the above formula yields 1; If not, set s = b/a, c = kd k22 , and define

c − 1 − 2c s 0 2
s 2 − (1 − 1c )s − 1c
f (s) = , and f (s) = 2c .
1 + cs 2 (1 + c s 2 )2
c 2 +c
Thus, f (s) has two local maximum at s = − 1c , s = +∞. Since f (− 1c ) = c+1 , f (+∞) = 0, we have

k(q + d )q T − I k22 = c + 1 = kq + d k22 .

Question 5.8. Formulate and prove a theorem for singular vectors analogous to Theorem 5.4.

Remark. My theorem has been shown below, and I assume that the target matrix A is m-by-
n, where m ≥ n. However, I am not sure whether it is the one that the question requests.

Theorem 5.4’. Let A = U ΣV T , which is the reduced SVD of an m-by-n matrix A. Let A + E =
 = Û Σ̂V̂ T be the perturbed SVD. Write U = [u 1 . . . , u n ],V = [v 1 . . . , v n ], Û = [û 1 . . . , û n ], and
V̂ = [v̂ 1 . . . , v̂ n ]. Let θ, ϕ denote the acute angles between u i , û i and v i , v̂ i repectively. Define
gap(i , A) as min j 6=i |σi − σ j | if i 6= n; and gap(n, A) as min j 6=n (σn , |σn − σ j |) when i = n. Then
we have p
2 2kE k2
max(sin 2θ, sin 2ϕ) ≤ .
gap(i , A)
µ ¶ µ ¶
vi v̂ i
Proof. Denote ωi = p1 , ω̂i = p1 , and θ̂ the acute angle between ωi and ω̂i . Then,
2 ui 2 û i
consider the matrix
AT AT + E T
· ¸ · ¸
0 0
H= , Ĥ = ,
A 0 A +E 0

and δH = Ĥ − H . By Theorem 3.3 and Question 5.2, we know that kδH k2 = kE k2 and σ1 ≥
. . . σn ≥ 0 ≥ . . . ≥ 0 ≥ −σn ≥ . . . ≥ −σ1 are the eigenvalues of H, σ̂1 ≥ . . . σ̂n ≥ 0 ≥ . . . ≥ 0 ≥ −σ̂n ≥
. . . ≥ −σ̂1 are the eigenvalues of Ĥ . Thus, According to and Theorem 5.4, we know that

2kE k2
sin 2θ̂ ≤ .
gap(i , A)

As all angles are acute and

1 1
cos θ̂ = ωTi ω̂i = (u T û + v T v̂ ) = (θ + ϕ),
2 2

we just prove for the angle, which is larger than θ̂. For the smaller angle, consider ku i + û i k22
instead of ku − û i k2 . Without loss of generality, we may assume that θ is the larger one. Since

θ 1 θ̂
2 sin2 = 1 − cos θi = ku i − û i k22 ≤ kω − ω̂i k22 = 4 sin2 ,
2 2 2

and θ ≥ θ̂, we have
θ θ p θ̂ θ̂ p
sin θ = 2 sin cos ≤ 2 2 sin cos = 2 sin θ̂.
2 2 2 2
Then,
p p 2kE k2
sin 2θ = 2 sin θ cos θ ≤ 2 2 sin θ̂ cos θ̂ = 2 sin 2θ̂ ≤ .
gap(i , A)

Question 5.9. Prove bound (5.6) from Theorem 5.5.
Proof. Since A is symmetric, it has a set of orthonormal vectors [q 1 , . . . , q n ], which span the
whole space. Thus, suppose the unit vector x = nj=1 b j q j . Since
P

Ax − ρ(x, A)x = r ,
b i Aq i = b i αi q i ,
subtract the second from the first, and we have
(A − ρ(x, A)I ) b j q j = r + (ρ(x, A) − αi )b i q i .
X
j 6=i

As we can see that the left side does not contain q i , the right side does not contain q i , either.
Thus, the part of q i of r cancelled by (ρ(x, A) − αi )b i q i , and kr + (ρ(x, A) − αi )b i q i k2 ≤ kr k2 .
P
Denote the right side as j 6=i c j q j . We have

(A − ρ(x, A)I ) b j q j = (αi − ρ(x, A))


X X X
bj q j = cj q j.
j 6=i j 6=i j 6=i

Thus, b j = c j /(αi − ρ(x, A)) as we have assume the gap0 is not zero. Then
¢2 1 1 X 2 1 kr k2
sin θ = k c j /(αi − ρ(x, A)) ) 2 ≤
X X¡
b j q j k2 = ( ( c )2 ≤ .
j 6=i j 6=i gap0 j 6=i j gap0


Question 5.10. A harder question. Skipped.
Question 5.11. Suppose θ = θ1 + θ2 , where all three angles lie between 0 and π/2. Prove that
1 1 1
2 sin θ ≤ 2 sin 2θ1 + 2 sin 2θ2 .
Proof. Consider the function
f (θ1 ) = sin 2θ − sin 2θ1 − sin 2(θ − θ1 ).
Its derivative is
f 0 (θ1 ) = −2 cos 2θ1 + 2 cos 2(θ − θ1 ).
As 2θ − 2θ1 , 2θ1 ∈ [0, π], between which the function cos is monotonus, f 0 (θ1 ) has only one
root as θ1 = θ/2. Thus, f (θ1 ) decreases in [0, θ/2], and increases in [θ/2, θ]. It can only reach
its maximum at the extremity θ1 = 0 and θ1 = θ. i.e.,
f (θ1 ) ≤ max( f (0), f (θ)) = 0.
Therefore, 21 sin θ ≤ 12 sin 2θ1 + 12 sin 2θ2 . 

Question 5.12. Prove Corollary 5.2.
0 GT 0 Ĝ T
· ¸ · ¸
Proof. Denote the matrix H as H = , and Ĥ as Ĥ = . According to Theo-
G 0 Ĝ 0
rem 3.3, σ1 ≥ . . . σn ≥ 0 ≥ . . . ≥ 0 ≥ −σn ≥ . . . ≥ −σ1 are the eigenvalues of H, σ̂1 ≥ . . . σ̂n ≥ 0 ≥
. . . ≥ 0 ≥ −σ̂n ≥ . . . ≥ −σ̂1 are the eigenvalues of Ĥ . Since
· T ¸ · ¸
X 0 X 0
Ĥ = H ,
0 YT 0 Y
°· T ¸· ¸ °
° X 0 X 0
° 0 Y T 0 Y − I ° = ², by applying Theorem 5.6, we will prove the
°
if we can prove ° °
°· 2T ¸°
° X X −I 0 °
result. As the target two-norm equals ° T
° and
° 0 Y Y − I °2
°· T ¸°
° X X −I 0 ° = max kX T X − I k2 , kY T Y − I k2 , 6
° ¡ ¢
°
T
° 0 Y Y −I 2 °

the equation holds. Thus, we prove the result. 


Question 5.13. Let A be a symmetric matrix. Consider running shifted QR iteration (Algorithm
4.5) with a Rayleigh quotient shift (σi = a nn ) at every iteration, yielding a sequence σ1 , σ2 , . . .
of shifts. Also run Rayleigh quotient iteration (Algorithm 5.1), starting with x 0 = [0, 0, . . . , 1]T ,
yielding a sequence of Rayleigh quotients ρ 1 , ρ 2 , . . .. show that these sequences are identical:
σi = ρ i for all i . This justifies the claim in section 5.3.2 that shifted QR iteration enjoys local
cubic convergence.
Proof. We prove it by induction. At the begining, for the shifted QR iteration, σ1 = a nn . For
Rayleigh quotient iteration start with x 0 = [0, 0, . . . , 1], ρ 1 = x T0 Ax 0 = a nn . Thus, σ1 = ρ 1 . Then,
suppose it holds until the kth step. For (k + 1)th step, denote P k = Q k Q k−1 . . .Q 1 . Since A is
symmetric, the orthogonal matrix Q i are symmetric, too7 . We kown that for the shifted QR
iteration,
P k (A k+1 − σk I ) = (A − σk )P k .
What’s more,

P k R k = P k−1Q k R k = P k−1 (A k − σk I ) = (A − σk )P k−1 .

Transpose on both sides, premultiply by P k−1 and postmultiply by P k , yielding

P k−1 R kT = (A − σk )P k .

Equate the last column, and we have

P k−1 e n = (A − σk )P k e n /r k , or equivalently P k e n = (A − σk )−1 P k−1 e n /r k .

where r k represents R k (n, n). As P 0 e n = e n and kP i e n k2 = 1, it is a inverse iteration sequence


with the same start and same shifts as Rayleigh quotient iteration, by assumption. Thus,
P k e n = x k , which suggest σk = A(n, n) = e Tn A k e n = e Tn P k AP k e n = x Tk Ax k = ρ k . 
6 It is because that such a block diagonal matrix has the eigenvalues as the union of those of its diagonal blocks.
7 It is because (QR)T = R T Q T = QR, yielding Q = R T Q T R the right side of which is symmetric.

Question 5.14. Prove Lemma 5.1.
Proof. The prove will be trivial if x or y are zero vector. Thus, we may assume that both of
them are nonzero. Then, there exists a row of x y T that can linearly represent the other rows.
Without loss of generality, suppose it is the first row. Otherwise, we may change it to the first
row by exchange rows and columns. Denote the i th entry of x and y as x 1 and y 1 repectively.
Then, by assumption, x 1 6= 0. We have
 
1 + x1 y 1 x1 y 2 ... x1 y n
 x2 y 1 1 + x2 y 2 . . . x2 y n 
 
T
det(I + x y ) = det 
 .. .. .. .. 
. . . .

 
xn y 1 xn y 2 . . . 1 + xn y n
 
1 + x1 y 1 x1 y 2 . . . x1 y n
 −x 2 /x 1 1 ... 0 
 
= det 
 .. .. .. .. 
. . . . 


−x n /x 1 0 ... 1
 Pn 
1 + i =1 x i y i x 1 y 2 . . . x 1 y n
0 1 ... 0 
 

= det 
 .. .. .. .. 
. . . . 


0 0 ... 1
n
x i y i = 1 + x T y.
X
= 1+
i =1


T T
Remark. There is a simpler way to prove the question. First, observe that x y x = (y x)x,
which means x is an eigenvector of x y T corresponding to y T x. As the rank of x y T is one,
x y T has only one nonzero eigenvalue. Then by Question 4.5, the eigenvalues of I + x y T are
1’s and 1 + y T x. The result follows.
Question 5.15. Prove that if t(n) = 2 t(n/2) + c n^3 + O(n^2), then t(n) ≈ (4/3) c n^3. This justifies the complexity analysis of the divide-and-conquer algorithm (Algorithm 5.2).

Proof. Since we have

t(n) = 2 t(n/2) + c n^3 + O(n^2),
2 t(n/2) = 4 t(n/4) + (1/4) c n^3 + O(n^2),
.........
2^{log_2 n − 1} t(2) = 2^{log_2 n} t(1) + (1 / 2^{2 log_2 n − 2}) c n^3 + O(n^2),

by adding them together, we get

t(n) = 2^{log_2 n} t(1) + (4/3) c (1 − n^{−2}) n^3 + O(n^2) ≈ (4/3) c n^3.


Question 5.16. Let A = D + ρuu T , where D = diag(d 1 , . . . , d n ) and u = [u 1 , . . . , u n ]T . Show
that if d i = d i +1 or u i = 0, then d i is an eigenvalues of A. If u i = 0, show that the eigenvector
corresponding to d i is e i , the i th column of the identity matrix. Derive a similarly simple
expression when d i = d i +1 . This shows how to handle deflation in the divide-and-conquer
algorithm, Algorithm 5.2.
Proof. When u i = 0, then the i th column of uu T is zero vector. Thus,

(D + ρuu T )e i = d i e i .

As for d i = d i +1 , without loss of generality, we can assume that u i 6= 0 and d i −1 6= d i , d i +1 6=


d i +2 ; otherwise, such case will either be the previous one or similar to the following. We claim
that e i +1 − u i +1 /u i e i is the eigenvector corresponding to d i , because
 2  
u1 + d1 − di . . . u1 ui u 1 u i +1 . . . u1 un 0
 .. .. .. .. .. ..  .. 

 . . . . . . 
 . 

2
 u1 ui ... ui u i u i +1 . . . ui un  −u i +1 /u i 
  
(A − d i I )(e i +1 − u i +1 /u i e i ) = 

 u 1 u i +1 2
. . . u i u i +1 u i +1 ... u i +1 u n   1
 

.. .. .. .. .. .. ..
  
. .
  
 . . . .  . 
2
u1 un . . . u i u n u i +1 u n . . . u n + d n − d i 0
   
1 . . . u1 0 . . . 0 d1 − di . . . 0 0 ... 0 0
. . .. .. . . .
 .. .. . ..   .. .. .
. ..
.. .. ..  ..
 
 . .  . . . .  
 . 

0 . . . ui 0 . . . 0  u 1   . . . u i u i +1 . . . u n  −u i +1 /u i 
 
=
 
0 . . . u i +1 1 . . . 0  0 ... 0 0 ... 0 1
  
  
 .. . . .. .. . . ..   .. .. . . .. . .
   
. . . . . .. .. . ..   ..
  
. . . 
0 . . . un 0 . . . 1 0 ... 0 0 . . . dn − di 0
= 0.


Remark. My proof is based on solving (A − λI )x = 0. There is another way based on defla-
tion. First, as we can use a Givens rotation to cancel an entry of a vector, if d i = d i +1 , denote
G as the Givens rotation which satisfies
   
u1 u1
 .   . 
 ..   .. 
   
 u  u 
 i   i
G  =  .
u i +1   0 
 ..   .. 
   
 .   . 
un un

Then,

G AG T = GDG T + ρGuu T G T = D + ρ û û T ,

where û i = 0.

Question 5.17. Let ψ and ψ' be given scalars. Show how to compute scalars c and ĉ in the function definition h(λ) = ĉ + c/(d − λ) so that at λ = ξ, h(ξ) = ψ and h'(ξ) = ψ'. This result is needed to derive the secular equation solver in section 5.3.3.

Solution. As

h'(λ) = c / (d − λ)^2,

the conditions h(ξ) = ψ and h'(ξ) = ψ' become

ψ = ĉ + c / (d − ξ),   ψ' = c / (d − ξ)^2,

whose solution is

c = ψ'(d − ξ)^2,   ĉ = ψ − ψ'(d − ξ).


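These formulas translate directly into code; a tiny Python sketch (illustrative name):

    def local_model(d, xi, psi, dpsi):
        """Coefficients of h(lam) = chat + c/(d - lam) with h(xi) = psi, h'(xi) = dpsi."""
        c = dpsi * (d - xi) ** 2
        chat = psi - dpsi * (d - xi)
        return chat, c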
Question 5.18. Use the SVD to show that if A is an m-by-n real matrix with m ≥ n, then
there exists an m-by-n matrix Q with orthonormal columns (Q T Q = I ) and an n-by-n positive
semidefinite matrix P such that A = QP . This decomposition is called the polar decomposition
of A, because it is analogous to the polar form of a complex number z = e i arg(z) · |z|. Show that
if A is nonsingular, then the polar decomposition is unique.

Proof. Suppose A = U ΣV T , which is the reduced SVD of A. To show how to do the polar
decomposition, consider

A = U ΣV T = UV T V Σ1/2 Σ1/2V T = (UV T )(Σ1/2V T )T (Σ1/2V T ).

The UV T is m-by-n orthonormal and (Σ1/2V T )T (Σ1/2V T ) is a positive semidefinite matrix,


which satisfy the condition of polar decomposition.
For the uniqueness, suppose there are two polar decomposition of A as A = QP = Q̂ P̂ . Con-
sider A T A = P 2 = P̂ 2 . As P and P̂ are positive definite matrices when A has full rank, P and P̂
have the same eigenvalues. Suppose P = W T ΛW , P̂ = S T ΛS. We get

SW T ΛW S T = Λ.

It follows that
P̂ = S T ΛS = S T SW T ΛW S T S = W T ΛW = P,
and A P^{-1} = Q = Q̂ = A P̂^{-1}. 

Remark. In fact, the uniqueness follows from the uniqueness of sqrt(A^T A). Standard textbooks about linear algebra often define sqrt(A^T A) = W^T Λ^{1/2} W without checking whether it is well defined, as the orthogonal matrix W may not be unique.
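As a small illustration of the construction in the proof (illustrative name), the polar factors can be computed from the SVD as follows:

    import numpy as np

    def polar(A):
        """Polar decomposition A = Q P via the SVD (A m-by-n with m >= n, a sketch)."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) V^T
        Q = U @ Vt                                          # orthonormal columns
        P = Vt.T @ np.diag(s) @ Vt                          # symmetric positive semidefinite
        return Q, P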

Question 5.19. Prove Lemma 5.5

Proof.

1. Let x be any vector in R^{2n}. With P = [e_1, e_{n+1}, e_2, e_{n+2}, ..., e_n, e_{2n}],
$$P^Tx=[x_1,\,x_{n+1},\,x_2,\,x_{n+2},\,\ldots,\,x_n,\,x_{2n}]^T,\qquad
x^TP=[x_1,\,x_{n+1},\,x_2,\,x_{n+2},\,\ldots,\,x_n,\,x_{2n}].$$
Thus, writing a_{i,j} for the (i,j) entry of A and p = (1, n+1, 2, n+2, ..., n, 2n),
$$P^TAP=\begin{pmatrix}
a_{1,1} & a_{1,n+1} & a_{1,2} & \cdots & a_{1,n} & a_{1,2n}\\
a_{n+1,1} & a_{n+1,n+1} & a_{n+1,2} & \cdots & a_{n+1,n} & a_{n+1,2n}\\
a_{2,1} & a_{2,n+1} & a_{2,2} & \cdots & a_{2,n} & a_{2,2n}\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
a_{2n,1} & a_{2n,n+1} & a_{2n,2} & \cdots & a_{2n,n} & a_{2n,2n}
\end{pmatrix},$$
i.e., (P^TAP)_{k,l} = a_{p(k),p(l)}. Applying this to the matrix
$$A=\begin{pmatrix}0 & B^T\\ B & 0\end{pmatrix}$$
of the lemma, for which a_{j,n+j} = a_{n+j,j} = B(j,j) = a_j and a_{n+j,j+1} = a_{j+1,n+j} = B(j,j+1) = b_j, gives
$$P^TAP=\begin{pmatrix}
0 & a_1 & & & &\\
a_1 & 0 & b_1 & & &\\
& b_1 & 0 & a_2 & &\\
& & a_2 & 0 & \ddots &\\
& & & \ddots & \ddots & a_n\\
& & & & a_n & 0
\end{pmatrix}.$$

2. We can verify it by direct computation:
$$BB^T=\begin{pmatrix}
a_1 & b_1 & & &\\
& a_2 & b_2 & &\\
& & \ddots & \ddots &\\
& & & a_{n-1} & b_{n-1}\\
& & & & a_n
\end{pmatrix}
\begin{pmatrix}
a_1 & & & &\\
b_1 & a_2 & & &\\
& b_2 & \ddots & &\\
& & \ddots & a_{n-1} &\\
& & & b_{n-1} & a_n
\end{pmatrix}
=\begin{pmatrix}
a_1^2+b_1^2 & a_2b_1 & &\\
a_2b_1 & a_2^2+b_2^2 & a_3b_2 &\\
& \ddots & \ddots & \ddots\\
& & a_nb_{n-1} & a_n^2
\end{pmatrix}.$$
As for the remaining claim, let B = UΣV^T be the SVD of B. Then BB^T = UΣ²U^T. Therefore the singular values of B are the square roots of the eigenvalues of T_{BB^T}, and the left singular vectors of B are the eigenvectors of T_{BB^T}.

3. Similarly,
$$B^TB=\begin{pmatrix}
a_1 & & &\\
b_1 & a_2 & &\\
& \ddots & \ddots &\\
& & b_{n-1} & a_n
\end{pmatrix}
\begin{pmatrix}
a_1 & b_1 & &\\
& a_2 & b_2 &\\
& & \ddots & \ddots\\
& & & a_n
\end{pmatrix}
=\begin{pmatrix}
a_1^2 & a_1b_1 & &\\
a_1b_1 & a_2^2+b_1^2 & a_2b_2 &\\
& \ddots & \ddots & \ddots\\
& & a_{n-1}b_{n-1} & a_n^2+b_{n-1}^2
\end{pmatrix}.$$
As for the remaining claim, with B = UΣV^T we have B^TB = VΣ²V^T. Therefore the singular values of B are the square roots of the eigenvalues of T_{B^TB}, and the right singular vectors of B are the eigenvectors of T_{B^TB}.
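Parts 2 and 3 are easy to illustrate numerically. The sketch below (NumPy, with a random bidiagonal B as test data) checks the tridiagonal entries of BB^T and B^TB and the relation between the singular values of B and the eigenvalues of T_{BB^T}.

```python
import numpy as np

n = 5
rng = np.random.default_rng(1)
a, b = rng.standard_normal(n), rng.standard_normal(n - 1)
B = np.diag(a) + np.diag(b, 1)            # upper bidiagonal test matrix

T1 = B @ B.T                              # diag a_i^2 + b_i^2, offdiag a_{i+1} b_i
T2 = B.T @ B                              # diag a_i^2 + b_{i-1}^2, offdiag a_i b_i
assert np.allclose(np.diag(T1), np.append(a[:-1] ** 2 + b ** 2, a[-1] ** 2))
assert np.allclose(np.diag(T1, 1), a[1:] * b)
assert np.allclose(np.diag(T2), np.append(a[0] ** 2, a[1:] ** 2 + b ** 2))
assert np.allclose(np.diag(T2, 1), a[:-1] * b)

# Singular values of B are the square roots of the eigenvalues of B B^T.
sv = np.linalg.svd(B, compute_uv=False)
assert np.allclose(np.sort(sv ** 2), np.sort(np.linalg.eigvalsh(T1)))
```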


Question 5.20. Prove Lemma 5.7

Proof. We can verify it by direct computation. With
$$D_1=\operatorname{diag}\Bigl(\chi_1,\ \frac{\chi_2\chi_1}{\zeta_1},\ \frac{\chi_3\chi_2\chi_1}{\zeta_2\zeta_1},\ \ldots,\ \frac{\chi_n\cdots\chi_1}{\zeta_{n-1}\cdots\zeta_1}\Bigr),\qquad
D_2=\operatorname{diag}\Bigl(1,\ \frac{\zeta_1}{\chi_1},\ \frac{\zeta_2\zeta_1}{\chi_2\chi_1},\ \ldots,\ \frac{\zeta_{n-1}\cdots\zeta_1}{\chi_{n-1}\cdots\chi_1}\Bigr),$$
we have D_1(i,i)·a_i·D_2(i,i) = χ_i a_i and D_1(i,i)·b_i·D_2(i+1,i+1) = ζ_i b_i, so
$$D_1BD_2=D_1\begin{pmatrix}a_1 & b_1 & &\\ & a_2 & b_2 &\\ & & \ddots & \ddots\\ & & & a_n\end{pmatrix}D_2
=\begin{pmatrix}\chi_1a_1 & \zeta_1b_1 & &\\ & \chi_2a_2 & \zeta_2b_2 &\\ & & \ddots & \ddots\\ & & & \chi_na_n\end{pmatrix}.$$


Question 5.21. Prove Theorem 5.13. Also, reduce the exponent 4n −2 in Theorem 5.13 to 2n −1.

Proof. First, we can evaluate ‖D_1D_1 − I‖₂ as follows:
$$\|D_1D_1-I\|_2=\Bigl\|\operatorname{diag}\Bigl\{\chi_1^2-1,\ \frac{\chi_2^2\chi_1^2}{\zeta_1^2}-1,\ \ldots,\ \frac{\chi_n^2\cdots\chi_1^2}{\zeta_{n-1}^2\cdots\zeta_1^2}-1\Bigr\}\Bigr\|_2
=\max_{1\le i\le n}\Bigl|\frac{\prod_{j=1}^i\chi_j^2-\prod_{j=1}^{i-1}\zeta_j^2}{\prod_{j=1}^{i-1}\zeta_j^2}\Bigr|$$
$$\le\max_{1\le i\le n}\tau^{2i-2}\Bigl|\prod_{j=1}^i\chi_j^2-\prod_{j=1}^{i-1}\zeta_j^2\Bigr|
\le\max_{1\le i\le n}\tau^{2i-2}\bigl(\tau^{2i}-\tau^{-2i+2}\bigr)
=\max_{1\le i\le n}\bigl(\tau^{4i-2}-1\bigr)=\tau^{4n-2}-1.$$
For ‖D_2D_2 − I‖₂, the same technique gives ‖D_2D_2 − I‖₂ ≤ τ^{4n−4} − 1. Thus τ^{4n−2} − 1 ≥ max(‖D_1D_1 − I‖₂, ‖D_2D_2 − I‖₂), and by Corollary 5.2 we have
$$|\hat\sigma_i-\sigma_i|\le(\tau^{4n-2}-1)\sigma_i=\bigl((1+\varepsilon)^{4n-2}-1\bigr)\sigma_i=(4n-2)\varepsilon\sigma_i+O(\varepsilon^2).$$

To improve the bound, it suffices to improve max(‖D_1D_1 − I‖₂, ‖D_2D_2 − I‖₂). Consider
$$\hat D_1=\Bigl(\frac{1}{\sqrt{\chi_n}}\prod_{j=1}^{[n/2]}\zeta_j/\chi_j\Bigr)D_1,\qquad
\hat D_2=\Bigl(\sqrt{\chi_n}\prod_{j=1}^{[n/2]}\chi_j/\zeta_j\Bigr)D_2.^8$$
Then
$$\hat D_1=\operatorname{diag}\Bigl(\frac{\chi_1}{\sqrt{\chi_n}}\prod_{j=1}^{[n/2]}\frac{\zeta_j}{\chi_j},\ \frac{\chi_2}{\sqrt{\chi_n}}\prod_{j=2}^{[n/2]}\frac{\zeta_j}{\chi_j},\ \ldots,\ \sqrt{\chi_n}\prod_{j=[n/2]+1}^{n-1}\frac{\chi_j}{\zeta_j}\Bigr),$$
$$\hat D_2=\operatorname{diag}\Bigl(\sqrt{\chi_n}\prod_{j=1}^{[n/2]}\frac{\chi_j}{\zeta_j},\ \sqrt{\chi_n}\prod_{j=2}^{[n/2]}\frac{\chi_j}{\zeta_j},\ \ldots,\ \sqrt{\chi_n}\prod_{j=[n/2]+1}^{n-1}\frac{\zeta_j}{\chi_j}\Bigr).$$
It is easy to verify that B̂ = D̂_1BD̂_2 = D_1BD_2, and
$$\|\hat D_1\hat D_1-I\|_2\le\max\Bigl(\max_{1\le i\le[n/2]}\Bigl|\frac{\chi_i^2}{\chi_n}\prod_{j=i}^{[n/2]}\frac{\zeta_j^2}{\chi_j^2}-1\Bigr|,\ \max_{[n/2]+1\le i\le n}\Bigl|\frac{\chi_i^2}{\chi_n}\prod_{j=[n/2]+1}^{i-1}\frac{\chi_j^2}{\zeta_j^2}-1\Bigr|\Bigr)\le\tau^{2n-1}-1,$$
$$\|\hat D_2\hat D_2-I\|_2\le\max\Bigl(\max_{1\le i\le[n/2]}\Bigl|\chi_n\prod_{j=i}^{[n/2]}\frac{\chi_j^2}{\zeta_j^2}-1\Bigr|,\ \max_{[n/2]+1\le i\le n}\Bigl|\chi_n\prod_{j=[n/2]+1}^{i-1}\frac{\zeta_j^2}{\chi_j^2}-1\Bigr|\Bigr)\le\tau^{2n-1}-1.$$
Thus the exponent in the bound improves from 4n − 2 to 2n − 1.

Question 5.22. Prove that Algorithm 5.13 computes the SVD of G, assuming that G T G con-
verges to a diagonal matrix.

Remark. I am not sure whether the σ_i are ordered by magnitude, and it seems that G must have full rank. Otherwise U = [G(:,1)/σ_1, G(:,2)/σ_2, ..., G(:,n)/σ_n] cannot be orthonormal, as the columns G(:,i) would be linearly dependent or some σ_i = 0.
Proof. Denote by Ĝ the resulting matrix, so that Ĝ^TĜ = diag(σ_1², σ_2², ..., σ_n²) = Λ. Since Ĝ^TĜ is diagonal, it follows that
$$\sigma_j^2=\sum_{i=1}^n\hat G(i,j)^2=\|\hat G(:,j)\|_2^2.$$
Then we claim that U = [Ĝ(:,1)/σ_1, Ĝ(:,2)/σ_2, ..., Ĝ(:,n)/σ_n] is orthogonal, since
$$U^TU=\begin{pmatrix}\hat G(:,1)^T/\sigma_1\\ \hat G(:,2)^T/\sigma_2\\ \vdots\\ \hat G(:,n)^T/\sigma_n\end{pmatrix}
\bigl[\hat G(:,1)/\sigma_1,\ \hat G(:,2)/\sigma_2,\ \ldots,\ \hat G(:,n)/\sigma_n\bigr]=\Lambda^{-1/2}\hat G^T\hat G\,\Lambda^{-1/2}=I,$$
$$UU^T=\sum_{i=1}^n\hat G(:,i)\hat G(:,i)^T/\sigma_i^2=\hat G\Lambda^{-1}\hat G^T=\hat G(\Lambda^{-1}\hat G^T\hat G)\hat G^{-1}=I.$$

8 [a] denotes the largest integer not exceeding a. Another feasible constant is √χ_1 ∏_{j=1}^{(n/2)} ζ_j/χ_j, where (a) denotes the smallest integer not less than a.

Finally, as
$$U\Sigma V^T=[\hat G(:,1)/\sigma_1,\ \hat G(:,2)/\sigma_2,\ \ldots,\ \hat G(:,n)/\sigma_n]\,\sqrt{\Lambda}\,J^T=\hat GJ^T=\hat GJ^{-1}=G,$$
it follows that Algorithm 5.13 computes the SVD of G.
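For illustration, here is a minimal one-sided Jacobi sketch in the spirit of Algorithm 5.13 (it is not the textbook's exact pseudocode; the helper one_sided_jacobi and the test matrix are my own, and full column rank is assumed, as in the remark): pairs of columns of G are rotated until G^TG is diagonal, and then U, Σ and V = J are read off exactly as in the proof.

```python
import numpy as np

def one_sided_jacobi(G, tol=1e-12, max_sweeps=30):
    G = G.copy()
    n = G.shape[1]
    J = np.eye(n)                          # accumulates the right rotations
    for _ in range(max_sweeps):
        off = 0.0
        for j in range(n - 1):
            for k in range(j + 1, n):
                a = G[:, j] @ G[:, j]
                b = G[:, k] @ G[:, k]
                c = G[:, j] @ G[:, k]
                off = max(off, abs(c) / np.sqrt(a * b))
                if abs(c) <= tol * np.sqrt(a * b):
                    continue
                zeta = (b - a) / (2 * c)
                t = 1.0 if zeta == 0 else np.sign(zeta) / (abs(zeta) + np.sqrt(1 + zeta ** 2))
                cs, sn = 1 / np.sqrt(1 + t ** 2), t / np.sqrt(1 + t ** 2)
                R = np.array([[cs, sn], [-sn, cs]])   # makes columns j and k orthogonal
                G[:, [j, k]] = G[:, [j, k]] @ R
                J[:, [j, k]] = J[:, [j, k]] @ R
        if off <= tol:
            break
    sigma = np.linalg.norm(G, axis=0)
    U = G / sigma                          # requires full rank, as noted in the remark
    return U, sigma, J                     # A = U diag(sigma) J^T

A = np.random.default_rng(2).standard_normal((6, 4))
U, sigma, V = one_sided_jacobi(A)
assert np.allclose(U @ np.diag(sigma) @ V.T, A)
assert np.allclose(U.T @ U, np.eye(4), atol=1e-8)
```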


Question 5.23. A harder question. Skipped.
Question 5.24. A harder question. Skipped.
Question 5.25. A harder question. Skipped.
Question 5.26. Suppose that x is an n-vector. Define the matrix C by c i j = |x i |+|x j |−|x i − x j |.
Show that C (x) is positive semidefinite.
Proof. As
$$c_{ij}=|x_i|+|x_j|-|x_i-x_j|=\begin{cases}2\min(|x_i|,|x_j|), & x_ix_j\ge0,\\ 0, & x_ix_j<0,\end{cases}$$
the entry c_{ij} depends only on |x_i| and |x_j|, and it vanishes whenever x_i and x_j have opposite signs. A symmetric permutation P^TCP does not affect positive semidefiniteness, so we may reorder the entries of x so that the nonnegative ones come first and, within each sign group, the absolute values are nonincreasing. Then C is block diagonal,
$$C=2\begin{pmatrix}C_1 & 0\\ 0 & C_2\end{pmatrix},$$
where C_1 corresponds to the nonnegative entries and C_2 to the negative ones, and both blocks have entries min(|x_i|, |x_j|). Thus it suffices to prove that each block is positive semidefinite.
Each block has the form
$$\begin{pmatrix}y_1 & y_2 & \cdots & y_k\\ y_2 & y_2 & \cdots & y_k\\ \vdots & \vdots & \ddots & \vdots\\ y_k & y_k & \cdots & y_k\end{pmatrix},\qquad y_1\ge y_2\ge\cdots\ge y_k\ge0,$$
where the y_i are the absolute values of the entries of x in the corresponding group. For simplicity, call matrices of this form A-matrices. The determinant of a k-by-k A-matrix can be computed by subtracting row i+1 from row i for i = 1, ..., k−1: row i becomes (y_i − y_{i+1}) times the row with ones in its first i positions and zeros elsewhere, so the resulting matrix is lower triangular and
$$\det\begin{pmatrix}y_1 & y_2 & \cdots & y_k\\ y_2 & y_2 & \cdots & y_k\\ \vdots & \vdots & \ddots & \vdots\\ y_k & y_k & \cdots & y_k\end{pmatrix}=y_k\prod_{i=1}^{k-1}(y_i-y_{i+1})\ge0.$$
Moreover, deleting a set of rows and the corresponding columns of an A-matrix yields another A-matrix, so every principal minor (not just the leading ones) of an A-matrix is nonnegative. A symmetric matrix all of whose principal minors are nonnegative is positive semidefinite, so C_1 and C_2, and hence C, are positive semidefinite.
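A quick numerical spot-check of the claim (x below is arbitrary test data with mixed signs):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(8)
C = np.abs(x)[:, None] + np.abs(x)[None, :] - np.abs(x[:, None] - x[None, :])
assert np.all(np.linalg.eigvalsh(C) >= -1e-12)   # all eigenvalues nonnegative
```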
Question 5.27. Let
$$A=\begin{pmatrix}I & B\\ B^* & I\end{pmatrix}$$
with ‖B‖₂ < 1. Show that
$$\|A\|_2\|A^{-1}\|_2=\frac{1+\|B\|_2}{1-\|B\|_2}.$$
Proof. To prove the equation, we compute ‖A‖₂ and ‖A^{-1}‖₂ exactly. Let B be n-by-m, and partition any vector x of length n+m as x = (y; z) with y of length n and z of length m. Then we can evaluate the largest and smallest eigenvalues of A as follows:
$$\lambda_{\max}=\max_{\|x\|_2=1}x^*Ax=\max_{\|y\|_2^2+\|z\|_2^2=1}\begin{pmatrix}y^* & z^*\end{pmatrix}\begin{pmatrix}I & B\\ B^* & I\end{pmatrix}\begin{pmatrix}y\\ z\end{pmatrix}=1+2\max_{\|y\|_2^2+\|z\|_2^2=1}\operatorname{Re}(y^*Bz),$$
$$\lambda_{\min}=1+2\min_{\|y\|_2^2+\|z\|_2^2=1}\operatorname{Re}(y^*Bz).$$
Since 2|Re(y^*Bz)| = |y^*Bz + z^*B^*y| ≤ 2‖y‖₂‖B‖₂‖z‖₂ ≤ (‖y‖₂² + ‖z‖₂²)‖B‖₂ = ‖B‖₂, we get λ_max ≤ 1 + ‖B‖₂ and λ_min ≥ 1 − ‖B‖₂. To see that both bounds are attained, let ẑ be a unit vector satisfying ‖Bẑ‖₂ = ‖B‖₂, and let y = ±Bẑ/‖B‖₂, which is a unit vector. Then
$$\frac{y^*B\hat z}{\|y\|_2^2+\|\hat z\|_2^2}=\frac{\pm1}{2\|B\|_2}\hat z^*B^*B\hat z=\frac{\pm1}{2\|B\|_2}\|B\hat z\|_2^2=\pm\frac12\|B\|_2,$$
so λ_max = 1 + ‖B‖₂ and λ_min = 1 − ‖B‖₂. Moreover, since ‖B‖₂ < 1, the matrix A is positive definite, so ‖A‖₂ = λ_max and ‖A^{-1}‖₂ = 1/λ_min. Therefore
$$\|A\|_2\|A^{-1}\|_2=\frac{1+\|B\|_2}{1-\|B\|_2}.$$
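A numerical spot-check of the identity (B below is a random test matrix, rescaled so that ‖B‖₂ < 1):

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 3))
B *= 0.9 / np.linalg.norm(B, 2)          # enforce ||B||_2 = 0.9 < 1

A = np.block([[np.eye(4), B], [B.T, np.eye(3)]])
nb = np.linalg.norm(B, 2)
assert np.isclose(np.linalg.cond(A, 2), (1 + nb) / (1 - nb))
```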

Question 5.28. A square matrix A is said to be skew Hermitian if A ∗ = −A. Prove that
1. the eigenvalues of a skew Hermitian are purely imaginary.

2. I − A is nonsingular.

3. C = (I − A)−1 (I + A) is unitary, C is called the Cayley transform of A.


Proof.
1. Denote by λ an eigenvalue of A and by x a corresponding unit eigenvector. Thus Ax = λx and, taking conjugate transposes and using A^* = −A, −x^*A = λ̄x^*. Consequently,
$$x^*Ax=\lambda x^*x=\lambda,\qquad -x^*Ax=\bar\lambda x^*x=\bar\lambda.$$
Adding these together gives λ + λ̄ = 0, which means λ is purely imaginary.

2. According to Question 4.5, the eigenvalues of I − A are 1 − λ_i, where λ_i are the eigenvalues of A. Since all the eigenvalues of A are purely imaginary,
$$|1-\lambda_i|=\sqrt{1+|\lambda_i|^2}\ge1>0,$$
so det(I − A) = ∏(1 − λ_i) ≠ 0 and I − A is invertible.

3. Since C^* = (I + A)^*((I − A)^{-1})^* = (I − A)(I + A)^{-1}, we have
$$C^*C=(I-A)(I+A)^{-1}(I-A)^{-1}(I+A)=(I-A)\bigl((I-A)(I+A)\bigr)^{-1}(I+A)=(I-A)(I-A)^{-1}(I+A)^{-1}(I+A)=I,$$
$$CC^*=(I-A)^{-1}(I+A)(I-A)(I+A)^{-1}=(I-A)^{-1}(I-A)(I+A)(I+A)^{-1}=I,$$
where we used that I − A and I + A commute. Thus, C is unitary.

6 Solutions for Chapter VI: Iterative Methods for Linear Systems
Question 6.1. Prove Lemma 6.1.

Proof. It suffices to verify T_Nz_j = λ_jz_j. The kth entry of T_Nz_j is
$$\bigl(T_Nz_j\bigr)_k=2\sin\Bigl(\frac{kj\pi}{N+1}\Bigr)-\sin\Bigl(\frac{(k+1)j\pi}{N+1}\Bigr)-\sin\Bigl(\frac{(k-1)j\pi}{N+1}\Bigr).$$
As sin A + sin B = 2 sin((A+B)/2) cos((A−B)/2), it follows that
$$2\sin\Bigl(\frac{kj\pi}{N+1}\Bigr)-\sin\Bigl(\frac{(k+1)j\pi}{N+1}\Bigr)-\sin\Bigl(\frac{(k-1)j\pi}{N+1}\Bigr)
=2\sin\Bigl(\frac{kj\pi}{N+1}\Bigr)-2\sin\Bigl(\frac{kj\pi}{N+1}\Bigr)\cos\Bigl(\frac{j\pi}{N+1}\Bigr)
=\lambda_j\sin\Bigl(\frac{kj\pi}{N+1}\Bigr).$$
For the first and last entries (k = 1 and k = N) the same formula applies, because sin(0·jπ/(N+1)) = sin((N+1)jπ/(N+1)) = 0.
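The lemma is easy to check numerically for a small N (the script below builds T_N directly and tests every eigenpair):

```python
import numpy as np

N = 7
T = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
k = np.arange(1, N + 1)
for j in range(1, N + 1):
    lam = 2 - 2 * np.cos(j * np.pi / (N + 1))
    z = np.sin(j * k * np.pi / (N + 1))
    assert np.allclose(T @ z, lam * z)
```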

Question 6.2. Prove the following formulas for triangular factorizations of T_N.

1. The Cholesky factorization T_N = B_N^TB_N has an upper bidiagonal Cholesky factor B_N with
$$B_N(i,i)=\sqrt{\frac{i+1}{i}},\qquad B_N(i,i+1)=-\sqrt{\frac{i}{i+1}}.$$

2. The result of Gaussian elimination with partial pivoting on T_N is T_N = L_NU_N, where the triangular factors are bidiagonal:
$$L_N(i,i)=1,\quad L_N(i+1,i)=-\frac{i}{i+1},\qquad U_N(i,i)=\frac{i+1}{i},\quad U_N(i,i+1)=-1.$$

3. T_N = D_ND_N^T, where D_N is the N-by-(N+1) upper bidiagonal matrix with 1 on the main diagonal and −1 on the superdiagonal.

Proof.

1. It suffices to verify that T_N = B_N^TB_N. According to Question 5.19, the diagonal entries of B_N^TB_N are
$$B_N(i,i)^2+B_N(i-1,i)^2=\frac{i+1}{i}+\frac{i-1}{i}=2,$$
and the offdiagonal entries are
$$B_N(i,i)B_N(i,i+1)=-\sqrt{\frac{i+1}{i}}\sqrt{\frac{i}{i+1}}=-1.$$
Thus T_N = B_N^TB_N.

2. As T_N is a banded matrix, the (i,i) entry is updated only at the (i−1)st elimination step. At the first step the pivot is T_N(1,1) = 2 = 2/1 > |−1|, so no row interchange is needed, and L_N(2,1) = −1/2, U_N(1,1) = 2, U_N(1,2) = −1, while the (2,2) entry becomes 2 − 1/2 = 3/2. Suppose that after k−1 steps no row interchange has been required and the trailing submatrix is
$$\begin{pmatrix}\frac{k+1}{k} & -1 & &\\ -1 & 2 & -1 &\\ & -1 & 2 & \ddots\\ & & \ddots & \ddots\end{pmatrix}.$$
Since (k+1)/k > 1 = |−1|, partial pivoting performs no interchange at step k either; the elimination step gives L_N(k+1,k) = −k/(k+1), U_N(k,k) = (k+1)/k, U_N(k,k+1) = −1, and the next diagonal entry becomes 2 − k/(k+1) = (k+2)/(k+1). The claim follows by induction.

3. It suffices to verify that T_N = D_ND_N^T:
$$D_ND_N^T=\begin{pmatrix}1 & -1 & &\\ & \ddots & \ddots &\\ & & 1 & -1\end{pmatrix}
\begin{pmatrix}1 & &\\ -1 & \ddots &\\ & \ddots & 1\\ & & -1\end{pmatrix}
=\begin{pmatrix}2 & -1 & &\\ -1 & \ddots & \ddots &\\ & \ddots & \ddots & -1\\ & & -1 & 2\end{pmatrix}=T_N.$$
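The three factorizations can be checked numerically; note that the sketch below uses the sign convention stated above, with −√(i/(i+1)) on the superdiagonal of B_N (the sign is my reading of the factorization — only the magnitudes survived in the extracted statement):

```python
import numpy as np

N = 6
T = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
i = np.arange(1, N + 1)

B = np.diag(np.sqrt((i + 1) / i)) - np.diag(np.sqrt(i[:-1] / (i[:-1] + 1)), 1)
L = np.eye(N) - np.diag(i[:-1] / (i[:-1] + 1), -1)
U = np.diag((i + 1) / i) - np.eye(N, k=1)
D = np.eye(N, N + 1) - np.eye(N, N + 1, k=1)   # N-by-(N+1) bidiagonal

assert np.allclose(B.T @ B, T)
assert np.allclose(L @ U, T)
assert np.allclose(D @ D.T, T)
```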

Question 6.3. Confirm equation (6.13).

Proof. It suffices to verify that (−∂²/∂x² − ∂²/∂y²) sin(iπx) sin(jπy) = (i²π² + j²π²) sin(iπx) sin(jπy), as follows:
$$\Bigl(-\frac{\partial^2}{\partial x^2}-\frac{\partial^2}{\partial y^2}\Bigr)\sin(i\pi x)\sin(j\pi y)
=-\frac{\partial^2}{\partial x^2}\sin(i\pi x)\sin(j\pi y)-\frac{\partial^2}{\partial y^2}\sin(i\pi x)\sin(j\pi y)
=i^2\pi^2\sin(i\pi x)\sin(j\pi y)+j^2\pi^2\sin(i\pi x)\sin(j\pi y).$$


Question 6.4. 1. Prove Lemma 6.2.

2. Prove Lemma 6.3.

3. Prove that the Sylvester equation AX − X B = C is equivalent to (I n ⊗ A −B T ⊗I m )vec(X ) =


vec(C ).

4. Prove that vec(AX B ) = (B T ⊗ A) · vec(X ).

Proof.

1. As the textbook has already proved part III of Lemma 6.2, we only need to prove parts I and II. For part I, partition X by columns as X = [x_1, ..., x_n]. Then
$$\operatorname{vec}(AX)=\begin{pmatrix}Ax_1\\ \vdots\\ Ax_n\end{pmatrix}=\begin{pmatrix}A & &\\ & \ddots &\\ & & A\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}=(I_n\otimes A)\cdot\operatorname{vec}(X).$$
The proof for part II is similar; it suffices to note that (XB)(i,j) = Σ_{k=1}^n B(k,j)X(i,k), which holds by the definition of the matrix product.

2. Since the (i,j) block of (A⊗B)(C⊗D) is Σ_{k=1}^n A(i,k)C(k,j)BD, which is the (i,j) block of (AC)⊗(BD), part I follows. For part II, using part I,
$$(A\otimes B)(A^{-1}\otimes B^{-1})=(AA^{-1})\otimes(BB^{-1})=I_m\otimes I_n=I_{mn},$$
so (A⊗B)^{-1} = A^{-1}⊗B^{-1}. As for part III, we can verify it as follows:
$$(A\otimes B)^T=\begin{pmatrix}A(1,1)B^T & A(2,1)B^T & \cdots & A(n,1)B^T\\ A(1,2)B^T & A(2,2)B^T & \cdots & A(n,2)B^T\\ \vdots & \vdots & \ddots & \vdots\\ A(1,n)B^T & A(2,n)B^T & \cdots & A(n,n)B^T\end{pmatrix}=A^T\otimes B^T.$$

3. Applying the operator "vec" to both sides of the Sylvester equation yields
$$\operatorname{vec}(AX-XB)=\operatorname{vec}(AX)-\operatorname{vec}(XB)=(I\otimes A)\operatorname{vec}(X)-(B^T\otimes I)\operatorname{vec}(X)=\operatorname{vec}(C).$$

4. It can be verified as
$$\operatorname{vec}(AXB)=(I\otimes A)\operatorname{vec}(XB)=(I\otimes A)(B^T\otimes I)\operatorname{vec}(X)=(B^T\otimes A)\operatorname{vec}(X).$$
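These vec identities are easy to confirm numerically; note that vec stacks columns, which in NumPy corresponds to reshape with order='F' (A, B, X below are random test data):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 4, 3
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
X = rng.standard_normal((m, n))

vec = lambda M: M.reshape(-1, order="F")   # column-stacking vec operator

# vec(AXB) = (B^T kron A) vec(X)
assert np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X))

# Sylvester equation AX - XB = C  <=>  (I kron A - B^T kron I) vec(X) = vec(C)
C = A @ X - X @ B
M_sylv = np.kron(np.eye(n), A) - np.kron(B.T, np.eye(m))
assert np.allclose(M_sylv @ vec(X), vec(C))
```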


Question 6.5. Suppose that A n×n is diagonalizable, so A has n independent eigenvectors:
Ax i = αi x i , or AX = X Λ A , where X = [x 1 , x 2 , . . . , x n ] and Λ A = diag(αi ). Similarly, suppose
that B m×m is diagonalizable, so B has m independent eigenvectors: B y i = βi y i , or B Y = Y ΛB ,
where Y = [y 1 , y 2 , . . . , y m ] and ΛB = diag(β j ). Prove the following results.

1. The mn eigenvalues of I m ⊗ A + B ⊗ I n are λi j = αi + β j , i.e., all possible sums of pairs


of eigenvalues of A and B . The corresponding eigenvectors are z i j , where z i j = x i ⊗ y j ,
whose (km + l )th entry is x i (k)y j (l ). Written another way,

(I m ⊗ A + B ⊗ I n )(Y ⊗ X ) = (Y ⊗ X )(I m ⊗ Λ A + ΛB ⊗ I n ).

2. The Sylvester equation AX + X B T = C is nonsingular if and only if the sum αi + β j = 0


for all eigenvalues αi of A and β j of B . The same is true for the slightly different Sylvester
equation AX + X B = C .

3. The mn eigenvalues of A ⊗ B are λi j = αi β j , i.e., all possible products of pairs of eigen-


values of A and B . The corresponding eigenvectors are z i j , where z i j = x i ⊗ y j , whose
(km + l )th entry is x i (k)y j (l ). Written another way,

(B ⊗ A)(Y ⊗ X ) = (Y ⊗ X )(ΛB ⊗ Λ A ).

Remark. It seems that part II of the question is misstated: the condition should be that the sum α_i + β_j is nonzero for all i, j (this was proved in Question 4.6). Also, I do not know why the author swapped A and B in the matrix form, although the two forms are equivalent.
Proof.

1. It suffices to verify the equation as follows:
$$(I_m\otimes A+B\otimes I_n)(Y\otimes X)=Y\otimes AX+BY\otimes X=Y\otimes X\Lambda_A+Y\Lambda_B\otimes X$$
$$=(Y\otimes X)(I_m\otimes\Lambda_A)+(Y\otimes X)(\Lambda_B\otimes I_n)=(Y\otimes X)(I_m\otimes\Lambda_A+\Lambda_B\otimes I_n).$$

2. According to Question 6.4, the first Sylvester equation is equivalent to the linear system
$$(I_m\otimes A+B\otimes I_n)\operatorname{vec}(X)=\operatorname{vec}(C),$$
which is nonsingular if and only if det(I_m⊗A + B⊗I_n) ≠ 0. By part 1, the eigenvalues of this matrix are α_i + β_j, so it is nonsingular if and only if α_i + β_j ≠ 0 for all i, j. For the second Sylvester equation it suffices to note that the eigenvalues of I_m⊗A + B^T⊗I_n are also α_i + β_j, which follows from part 1 with B replaced by B^T (B and B^T have the same eigenvalues).

3. It suffices to verify the equation as follows:

(B ⊗ A)(Y ⊗ X ) = (B Y ⊗ AX ) = (Y ΛB ) ⊗ (X Λ A ) = (Y ⊗ X )(ΛB ⊗ Λ A ).

Question 6.6. Programming question. Skipped.

Question 6.7. Prove Lemma 6.7.

Proof.

1. Since the mth Chebyshev polynomial is defined by T_m(x) = 2xT_{m−1}(x) − T_{m−2}(x) with initial values T_0(x) = 1, T_1(x) = x, we can argue by induction. As T_0(1) = 1 and T_1(1) = 1, and assuming the claim for T_{m−2} and T_{m−1}, it follows that
$$T_m(1)=2T_{m-1}(1)-T_{m-2}(1)=1.$$

2. As T_1(x) = x = 2^{1−1}x and T_0(x) = 1,^9 suppose the claim holds for T_{m−2} and T_{m−1}. It follows that
$$T_m(x)=2xT_{m-1}(x)-T_{m-2}(x)=2x\bigl(2^{m-2}x^{m-1}+O(x^{m-2})\bigr)+O(x^{m-2})=2^{m-1}x^m+O(x^{m-1}).$$

3. The recurrence with given initial values has a unique solution, so it suffices to verify (by induction, the base cases being immediate) that the stated formulas satisfy the recurrence. When |x| ≤ 1, write θ = arccos x, so that cos θ = x.^10 Then
$$T_m=\cos(m\theta)=\cos\bigl((m-1)\theta+\theta\bigr)=x\cos\bigl((m-1)\theta\bigr)-\sin\bigl((m-2)\theta+\theta\bigr)\sin\theta
=xT_{m-1}-x\sin\bigl((m-2)\theta\bigr)\sin\theta-T_{m-2}\sin^2\theta$$
$$=xT_{m-1}-T_{m-2}+T_{m-2}\cos^2\theta+\frac{x}{2}\bigl(T_{m-1}-T_{m-3}\bigr)
=\frac{3x}{2}T_{m-1}-T_{m-2}+\frac{x}{2}\bigl(2xT_{m-2}-T_{m-3}\bigr)=2xT_{m-1}-T_{m-2},$$
where the last equality uses the induction hypothesis 2xT_{m−2} − T_{m−3} = T_{m−1}. When |x| ≥ 1, write φ = arccosh x, so that cosh φ = x.^11 Then
$$T_m=\cosh(m\varphi)=x\cosh\bigl((m-1)\varphi\bigr)+\sinh\bigl((m-2)\varphi+\varphi\bigr)\sinh\varphi
=xT_{m-1}+x\sinh\bigl((m-2)\varphi\bigr)\sinh\varphi+T_{m-2}\sinh^2\varphi$$
$$=xT_{m-1}-T_{m-2}+x^2T_{m-2}+\frac{x}{2}\bigl(T_{m-1}-T_{m-3}\bigr)
=\frac{3x}{2}T_{m-1}-T_{m-2}+\frac{x}{2}\bigl(2xT_{m-2}-T_{m-3}\bigr)=2xT_{m-1}-T_{m-2}.$$

4. |T_m(x)| ≤ 1 for |x| ≤ 1 because there |T_m(x)| = |cos(m arccos x)| ≤ 1.

5. Since cosh y ≥ 1 > 0 for real y, part 3 shows T_m(x) ≠ 0 when |x| ≥ 1, so all zeros of T_m lie in (−1, 1). There T_m(x) = cos(m arccos x), which vanishes at x_i = cos((2i − 1)π/(2m)), i = 1, ..., m. Since T_m is a polynomial of degree m, these are all of its zeros.
9 Although T_0 does not satisfy the formula, this does not matter.
10 cos(arccos x) = x holds because |x| ≤ 1.
11 cosh(arccosh x) = x holds because |x| ≥ 1.

6. Since arccosh x = ln(x ± √(x² − 1)) when |x| ≥ 1, it follows that
$$T_m=\tfrac12\Bigl(\exp\bigl(m\ln(x\pm\sqrt{x^2-1})\bigr)+\exp\bigl(-m\ln(x\pm\sqrt{x^2-1})\bigr)\Bigr)
=\tfrac12\Bigl((x\pm\sqrt{x^2-1})^m+(x\pm\sqrt{x^2-1})^{-m}\Bigr).$$
As x + √(x² − 1) = 1/(x − √(x² − 1)), both sign choices give the same value, so the two cases can be united as
$$T_m(x)=\tfrac12\Bigl((x+\sqrt{x^2-1})^m+(x+\sqrt{x^2-1})^{-m}\Bigr).$$

7. Substituting x = 1 + ε into the above formula yields^12
$$T_m(1+\varepsilon)=\tfrac12\Bigl((1+\varepsilon+\sqrt{\varepsilon^2+2\varepsilon})^m+(1+\varepsilon+\sqrt{\varepsilon^2+2\varepsilon})^{-m}\Bigr)
\ge\tfrac12(1+\sqrt{2\varepsilon})^m\ge\tfrac12\bigl(1+m\sqrt{2\varepsilon}\bigr).$$
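A small numerical check of the equivalence between the three-term recurrence and the closed forms in part 3 (the helper cheb below is my own):

```python
import numpy as np

def cheb(m, x):
    """Evaluate T_m(x) by the three-term recurrence."""
    t0, t1 = np.ones_like(x), x
    for _ in range(m - 1):
        t0, t1 = t1, 2 * x * t1 - t0
    return t1 if m >= 1 else t0

x_in = np.linspace(-1, 1, 101)            # |x| <= 1: T_m(x) = cos(m arccos x)
x_out = np.linspace(1.01, 3, 50)          # |x| >= 1: T_m(x) = cosh(m arccosh x)
for m in range(1, 8):
    assert np.allclose(cheb(m, x_in), np.cos(m * np.arccos(x_in)))
    assert np.allclose(cheb(m, x_out), np.cosh(m * np.arccosh(x_out)))
```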

Question 6.8. Programming question. Skipped.

Question 6.9. Confirm that evaluating the formula in (6.47) by performing the matrix-vector
multiplications from right to left is mathematically the same as Algorithm 6.13.

Proof. We prove it step by step. First, according to Question 6.4,
$$(Z^T\otimes Z^T)\cdot\operatorname{vec}(h^2F)=h^2\operatorname{vec}(Z^TFZ)=h^2\operatorname{vec}(ZFZ^T),^{13}$$
so the first step is correct. Since
$$I\otimes\Lambda+\Lambda\otimes I=\operatorname{diag}(\lambda_1I+\Lambda,\ \ldots,\ \lambda_nI+\Lambda),$$
it follows that
$$(I\otimes\Lambda+\Lambda\otimes I)^{-1}=\operatorname{diag}\Bigl(\frac{1}{2\lambda_1},\ \frac{1}{\lambda_1+\lambda_2},\ \ldots,\ \frac{1}{\lambda_2+\lambda_1},\ \ldots,\ \frac{1}{2\lambda_n}\Bigr).$$
Consequently,
$$\bigl((I\otimes\Lambda+\Lambda\otimes I)^{-1}\operatorname{vec}(h^2F')\bigr)(jn+i)=\frac{h^2f'_{i,j+1}}{\lambda_{j+1}+\lambda_i},$$
so the second step is correct. Finally, use part IV of Question 6.4 again, yielding
$$\operatorname{vec}(V)=(Z\otimes Z)\operatorname{vec}(V')=\operatorname{vec}(ZV'Z)=\operatorname{vec}(ZV'Z^T).$$
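Assuming (6.47) refers to the model problem T_NV + VT_N = h²F, the right-to-left evaluation reads as follows in NumPy (Z and the λ_j are the exact eigenpairs of T_N from Lemma 6.1; F is random test data):

```python
import numpy as np

N = 8
h = 1.0 / (N + 1)
T = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
F = np.random.default_rng(6).standard_normal((N, N))

j = np.arange(1, N + 1)
lam = 2 - 2 * np.cos(j * np.pi / (N + 1))
Z = np.sqrt(2 / (N + 1)) * np.sin(np.outer(j, j) * np.pi / (N + 1))  # orthogonal, Z = Z^T

Fp = Z.T @ (h ** 2 * F) @ Z                   # step 1: (Z^T x Z^T) vec(h^2 F)
Vp = Fp / (lam[:, None] + lam[None, :])       # step 2: divide by lambda_i + lambda_j
V = Z @ Vp @ Z.T                              # step 3: (Z x Z) vec(V')

assert np.allclose(T @ V + V @ T, h ** 2 * F)
```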

Question 6.10. The question is too long, so I omit it. For details, please refer to the textbook.

Proof.

12 The last one holds by Bernoulli inequality.


13 Z = Z T .

1. First, we prove that two commuting matrices have at least one common eigenvector. Suppose λ is an eigenvalue of A with linearly independent eigenvectors x_1, x_2, ..., x_s spanning the corresponding eigenspace. Since ABx_i = BAx_i = λBx_i, each Bx_i lies in the eigenspace of A for λ. Consequently Bx_i = Σ_{j=1}^s a_{ji}x_j, or in matrix form,
$$\bigl(Bx_1,\ \ldots,\ Bx_s\bigr)=\bigl(x_1,\ \ldots,\ x_s\bigr)\begin{pmatrix}a_{11} & a_{12} & \cdots & a_{1s}\\ \vdots & \vdots & \ddots & \vdots\\ a_{s1} & a_{s2} & \cdots & a_{ss}\end{pmatrix}.$$
Denote the matrix on the right by S. It has at least one eigenvector y, say with eigenvalue µ. Thus
$$B[x_1,\ldots,x_s]y=[x_1,\ldots,x_s]Sy=\mu[x_1,\ldots,x_s]y.$$
As the x_i are linearly independent, [x_1, ..., x_s]y ≠ 0, so [x_1, ..., x_s]y is a common eigenvector of A and B. Now we prove the claim by induction on n. The base case holds because it involves only scalar operations. Suppose the claim holds for (n−1)-by-(n−1) matrices. For n-by-n matrices, since A and B have a common unit eigenvector x, expand it into an orthogonal matrix Q = [x, ...]. Using the symmetry of A and B,
$$Q^TAQ=\begin{pmatrix}\lambda & \\ & A_{n-1}\end{pmatrix},\qquad Q^TBQ=\begin{pmatrix}\mu & \\ & B_{n-1}\end{pmatrix}.$$
Since Q^TAQ·Q^TBQ = Q^TABQ = Q^TBAQ = Q^TBQ·Q^TAQ, the blocks A_{n−1} and B_{n−1} commute. By induction there exists an orthogonal Q̃ with Q̃^TA_{n−1}Q̃ = Λ_{A_{n−1}} and Q̃^TB_{n−1}Q̃ = Λ_{B_{n−1}}. Then
$$\begin{pmatrix}1&\\&\tilde Q\end{pmatrix}^TQ^TAQ\begin{pmatrix}1&\\&\tilde Q\end{pmatrix}=\begin{pmatrix}\lambda&\\&\Lambda_{A_{n-1}}\end{pmatrix},\qquad
\begin{pmatrix}1&\\&\tilde Q\end{pmatrix}^TQ^TBQ\begin{pmatrix}1&\\&\tilde Q\end{pmatrix}=\begin{pmatrix}\mu&\\&\Lambda_{B_{n-1}}\end{pmatrix}.$$

2. First, consider the simplest case, the matrix
$$\tilde T=\begin{pmatrix}0 & -1 & &\\ -1 & 0 & \ddots &\\ & \ddots & \ddots & -1\\ & & -1 & 0\end{pmatrix}.$$
Since T̃ = T_N − 2I, Question 4.5 together with Lemma 6.1 gives T̃z_j = −2cos(jπ/(N+1))z_j, where z_j = [sin(jπ/(N+1)), ..., sin(jNπ/(N+1))]^T; equivalently, the eigenvalues of T̃ are 2cos(jπ/(N+1)), j = 1, ..., N, with corresponding eigenvectors z_{N+1−j}. Then, as T̂ = αI − θT̃, the eigenvalues of T̂ are α − 2θcos(jπ/(N+1)), with the same corresponding eigenvectors z_{N+1−j}.

3. Since T = I⊗A + T̃⊗H, it follows that
$$(I\otimes Q)T(I\otimes Q^T)=(I\otimes Q)(I\otimes A)(I\otimes Q^T)+(I\otimes Q)(\tilde T\otimes H)(I\otimes Q^T)=I\otimes QAQ^T+\tilde T\otimes QHQ^T=I\otimes\Lambda_A+\tilde T\otimes\Lambda_H.$$
By Question 6.5, the eigenvalues of T̃⊗Λ_H are λ_{ij} = −2cos(iπ/(N+1))θ_j with corresponding eigenvectors x_{ij} = z_i⊗e_j. Then
$$(I\otimes\Lambda_A)x_{ij}=(I\otimes\Lambda_A)(z_i\otimes e_j)=z_i\otimes\alpha_je_j=\alpha_j(z_i\otimes e_j)=\alpha_jx_{ij}.$$
Hence (I⊗Λ_A + T̃⊗Λ_H)x_{ij} = (α_j + λ_{ij})x_{ij}, so the eigenvalues of T are α_j + λ_{ij} = α_j − 2θ_jcos(iπ/(N+1)), and collecting the columns X = [x_{11}, x_{12}, ..., x_{NN}] = Z⊗I, the corresponding eigenvectors of T form the matrix (I⊗Q^T)(Z⊗I) = Z⊗Q^T.

4. Denote the eigendecomposition of T by STS^{-1} = Λ, so that solving Tx = b amounts to computing x = S^{-1}Λ^{-1}Sb. As we have the explicit form of Λ and S, we claim that this method solves Tx = b in O(n³) operations, while dense LU costs O(n⁶) and band LU costs O(n⁴) (T is n²-by-n²). Since Z is symmetric and orthogonal, and computing Q (and Q^{-1} = Q^T) costs O(n³) time, the claim holds provided the matrix-vector products (Z⊗Q)b and (Z^{-1}⊗Q^{-1})c can also be done in O(n³) time. The two products are analogous, so we use (Z⊗Q)b as an illustration. Partition b into n blocks of n entries and form the n-by-n matrix B whose columns are these blocks. Then the matrix-vector product becomes a matrix-matrix product,
$$(Z\otimes Q)b=(Z\otimes Q)\operatorname{vec}(B)=\operatorname{vec}(QBZ^T),$$
which costs O(n³) time.

5. If A and H are symmetric tridiagonal Toeplitz matrices, then Q = Z, and the matrix-matrix products with Z can be carried out in O(n² log n) time by the fast sine transform (or FFT). Thus the total cost reduces to O(n² log n).


Question 6.11. Suppose that R is upper triangular and nonsingular and that C is upper Hessenberg. Confirm that RCR^{-1} is upper Hessenberg.

Proof. As the inverse of an upper triangular matrix is upper triangular, it suffices to verify that multiplying an upper Hessenberg matrix by an upper triangular matrix, on either side, preserves the upper Hessenberg form. Since C(i,j) = 0 when i > j + 1 and R(i,j) = 0 when i > j, it follows that, when i > j + 1,
$$RC(i,j)=\sum_{k=1}^nR(i,k)C(k,j)=\sum_{k=1}^{i-1}R(i,k)C(k,j)+\sum_{k=i}^nR(i,k)C(k,j)=0,$$
$$CR(i,j)=\sum_{k=1}^nC(i,k)R(k,j)=\sum_{k=1}^{i-2}C(i,k)R(k,j)+\sum_{k=i-1}^nC(i,k)R(k,j)=0.$$

Question 6.12. Confirm that the Krylov subspace Kk (A, y 1 ) has dimension k if and only if the
Arnoldi algorithm (Algorithm 6.9) or the Lanczos algorithm (Algorithm 6.10) can compute q k
without quitting first.
Proof. The proofs for the Arnoldi and the Lanczos algorithm are similar, so it suffices to treat the Arnoldi algorithm. Write y_j = A^{j−1}y_1, so that K_j(A, y_1) = span(y_1, ..., y_j). We show by induction that, as long as the algorithm has not quit, span(q_1, ..., q_j) = K_j(A, y_1) and dim K_j = j. The base case holds because q_1 = y_1/‖y_1‖_2. Assume the claim through step j. At step j the algorithm forms z = Aq_j ∈ A·K_j ⊂ K_{j+1}, orthogonalizes z against q_1, ..., q_j, and quits if and only if h_{j+1,j} = 0, i.e., if and only if Aq_j ∈ span(q_1, ..., q_j) = K_j. If it does not quit, then q_{j+1} is a unit vector in K_{j+1} orthogonal to K_j, so dim K_{j+1} = j + 1 and span(q_1, ..., q_{j+1}) = K_{j+1}. Conversely, if it quits at some step j < k, then AK_j ⊂ K_j, hence K_m = K_j for all m ≥ j and dim K_k = j < k. Therefore the algorithm can compute q_k without quitting first if and only if K_k(A, y_1) has dimension k.
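For concreteness, here is a minimal Arnoldi sketch (modified Gram–Schmidt variant; not necessarily identical to Algorithm 6.9, and the helper arnoldi is my own). It quits exactly when h_{j+1,j} = 0, and otherwise returns an orthonormal basis of the Krylov subspace:

```python
import numpy as np

def arnoldi(A, b, k, tol=1e-12):
    n = len(b)
    Q = np.zeros((n, k))
    H = np.zeros((k, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k - 1):
        z = A @ Q[:, j]
        for i in range(j + 1):             # orthogonalize against q_1, ..., q_{j+1}
            H[i, j] = Q[:, i] @ z
            z -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(z)
        if H[j + 1, j] <= tol:             # breakdown: Krylov subspace stops growing
            return Q[:, : j + 1], H[: j + 1, : j + 1]
        Q[:, j + 1] = z / H[j + 1, j]
    return Q, H

rng = np.random.default_rng(7)
A = rng.standard_normal((10, 10))
b = rng.standard_normal(10)
Q, H = arnoldi(A, b, 6)
K = np.column_stack([np.linalg.matrix_power(A, i) @ b for i in range(6)])
assert np.allclose(Q.T @ Q, np.eye(Q.shape[1]))       # orthonormal basis
assert np.linalg.matrix_rank(K) == Q.shape[1]         # dim K_6 = number of q_i computed
```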
Question 6.13. Confirm that when A n×n is symmetric positive definite and Q n×k has full col-
umn rank, then T = Q T AQ is also symmetric positive definite.
Proof. T is symmetric since T^T = Q^TA^TQ = Q^TAQ = T. For any vector x ∈ R^k,
$$x^TQ^TAQx=(Qx)^TA(Qx)\ge0,$$
with equality only when Qx = 0, since A is s.p.d. As Q has full column rank, the only solution of Qx = 0 is x = 0. Thus T is also s.p.d.
Question 6.14. Prove Theorem 6.9.
Remark. The theorem has a small error: it states Â = M^{-1/2}AM^{1/2}, whereas the correct matrix is Â = M^{-1/2}AM^{-1/2}.

Proof. It suffices to prove the stated relationship between the variables of the two algorithms, which we do by induction. For the base case (with x̂_0 = x_0 = 0, r̂_0 = M^{-1/2}b, p̂_1 = r̂_0, and p_1 = M^{-1}b):
$$\hat z=\hat A\hat p_1=M^{-1/2}AM^{-1}b=M^{-1/2}Ap_1=M^{-1/2}z,$$
$$\hat\nu_1=(\hat r_0^T\hat r_0)/(\hat p_1^T\hat z)=(b^TM^{-1}b)/(b^TM^{-1}z)=\nu_1,$$
$$\hat x_1=\hat x_0+\hat\nu_1\hat p_1=\nu_1M^{1/2}p_1=M^{1/2}x_1,$$
$$\hat r_1=\hat r_0-\hat\nu_1\hat z=M^{-1/2}r_0-\nu_1M^{-1/2}z=M^{-1/2}r_1,$$
$$\hat\mu_2=(\hat r_1^T\hat r_1)/(\hat r_0^T\hat r_0)=(r_1^TM^{-1}r_1)/(r_0^TM^{-1}r_0)=\mu_2,$$
$$\hat p_2=\hat r_1+\hat\mu_2\hat p_1=M^{-1/2}r_1+\mu_2M^{1/2}p_1=M^{1/2}(M^{-1}r_1+\mu_2p_1)=M^{1/2}p_2.$$
Suppose the correspondence holds through step k − 1. Then, for step k,
$$\hat z=\hat A\hat p_k=M^{-1/2}Ap_k=M^{-1/2}z,$$
$$\hat\nu_k=(\hat r_{k-1}^T\hat r_{k-1})/(\hat p_k^T\hat z)=(r_{k-1}^TM^{-1}r_{k-1})/(p_k^Tz)=\nu_k,$$
$$\hat x_k=\hat x_{k-1}+\hat\nu_k\hat p_k=M^{1/2}x_{k-1}+\nu_kM^{1/2}p_k=M^{1/2}x_k,$$
$$\hat r_k=\hat r_{k-1}-\hat\nu_k\hat z=M^{-1/2}r_{k-1}-\nu_kM^{-1/2}z=M^{-1/2}r_k,$$
$$\hat\mu_{k+1}=(\hat r_k^T\hat r_k)/(\hat r_{k-1}^T\hat r_{k-1})=(r_k^TM^{-1}r_k)/(r_{k-1}^TM^{-1}r_{k-1})=\mu_{k+1},$$
$$\hat p_{k+1}=\hat r_k+\hat\mu_{k+1}\hat p_k=M^{-1/2}r_k+\mu_{k+1}M^{1/2}p_k=M^{1/2}(M^{-1}r_k+\mu_{k+1}p_k)=M^{1/2}p_{k+1}.$$
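The correspondence can also be checked numerically. The sketch below implements plain CG and preconditioned CG directly from the update formulas used in the proof (these are my own straightforward implementations, not the textbook's pseudocode) and verifies x̂_k = M^{1/2}x_k for a few steps on random s.p.d. test data:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 6
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)   # s.p.d. test matrix
M = rng.standard_normal((n, n)); M = M @ M.T + n * np.eye(n)   # s.p.d. preconditioner
b = rng.standard_normal(n)

w, V = np.linalg.eigh(M)
M_half = V @ np.diag(np.sqrt(w)) @ V.T
M_inv_half = V @ np.diag(1 / np.sqrt(w)) @ V.T

def cg(A, b, steps):
    x, r = np.zeros_like(b), b.copy()
    p, xs = b.copy(), []
    for _ in range(steps):
        z = A @ p
        nu = (r @ r) / (p @ z)
        x = x + nu * p
        r_new = r - nu * z
        mu = (r_new @ r_new) / (r @ r)
        p, r = r_new + mu * p, r_new
        xs.append(x)
    return xs

def pcg(A, b, Minv, steps):
    x, r = np.zeros_like(b), b.copy()
    y = Minv @ r
    p, xs = y.copy(), []
    for _ in range(steps):
        z = A @ p
        nu = (y @ r) / (p @ z)
        x = x + nu * p
        r_new = r - nu * z
        y_new = Minv @ r_new
        mu = (y_new @ r_new) / (y @ r)
        p, r, y = y_new + mu * p, r_new, y_new
        xs.append(x)
    return xs

A_hat, b_hat = M_inv_half @ A @ M_inv_half, M_inv_half @ b
for x_hat, x in zip(cg(A_hat, b_hat, 4), pcg(A, b, np.linalg.inv(M), 4)):
    assert np.allclose(x_hat, M_half @ x)        # x_hat_k = M^{1/2} x_k
```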

