

3.38. Explain how QR factorization with column pivoting can be used to determine the rank of a matrix.

3.39. Explain why column pivoting can be used with the modified Gram-Schmidt orthogonalization procedure but not with the classical Gram-Schmidt procedure.

3.40. In terms of the condition number of the matrix A, compare the range of applicability of the normal equations method and the Householder QR method for solving the linear least squares problem Ax ≈ b [i.e., for what values of cond(A) can each method be expected to break down?].

3.41. Let A be an m × n matrix.
(a) What is the maximum number of nonzero singular values that A can have?
(b) If rank(A) = k, how many nonzero singular values does A have?

3.42. Let a be a nonzero column vector. Considered as an n × 1 matrix, a has only one positive singular value. What is its value?

3.43. Express the Euclidean condition number of a matrix in terms of its singular values.

3.44. List two reliable methods for determining the rank of a rectangular matrix numerically.

3.45. If A is a 2n × n matrix, rank the following methods according to the amount of work required to solve the linear least squares problem Ax ≈ b.
(a) QR factorization by Householder transformations
(b) Normal equations
(c) Singular value decomposition

3.46. List at least two applications for the singular value decomposition (SVD) of a matrix other than solving least squares problems.
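Review Questions 3.43 and 3.44 are easy to confirm numerically. A minimal sketch, assuming NumPy; the test matrix is an arbitrary choice, not one from the text:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

    # Singular values are returned in descending order.
    sigma = np.linalg.svd(A, compute_uv=False)

    # Review Question 3.43: cond_2(A) = sigma_max / sigma_min.
    print(sigma[0] / sigma[-1])
    print(np.linalg.cond(A))   # default is the 2-norm; agrees with the ratio

    # Review Question 3.44: numerical rank = number of singular
    # values above a tolerance scaled to the problem.
    tol = max(A.shape) * np.finfo(float).eps * sigma[0]
    print(int(np.sum(sigma > tol)))   # 2, i.e., full column rank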

Exercises
3.1. If a vertical beam has a downward force applied at its lower end, the amount by which it stretches will be proportional to the magnitude of the force. Thus, the total length y of the beam is given by the equation

y = x1 + x2 t,

where x1 is its original length, t is the force applied, and x2 is the proportionality constant. Suppose that the following measurements are taken:

t    10     15     20
y    11.60  11.85  12.25

(a) Set up the overdetermined 3 × 2 system of linear equations corresponding to the data collected.
(b) Is this system consistent? If not, compute each possible pair of values for (x1, x2) obtained by selecting any two of the equations from the system. Is there any reason to prefer any one of these results?
(c) Set up the system of normal equations and solve it to obtain the least squares solution to the overdetermined system. Compare your result with those obtained in part b.

3.2. Suppose you are fitting a straight line to the three data points (0,1), (1,2), (3,3).
(a) Set up the overdetermined linear system for the least squares problem.
(b) Set up the corresponding normal equations.
(c) Compute the least squares solution by Cholesky factorization.

3.3. Set up the linear least squares system Ax ≈ b for fitting the model function f(t, x) = x1 t + x2 e^t to the three data points (1,2), (2,3), (3,5).

3.4. In fitting a straight line y = x0 + x1 t to the three data points (ti, yi) = (0,0), (1,0), (1,1), is the least squares solution unique? Why?

3.5. Let x be the solution to the linear least squares problem Ax ≈ b, where

A = [ 1  0 ]
    [ 1  1 ]
    [ 1  2 ]
    [ 1  3 ]

Let r = b − Ax be the corresponding residual vector. Which of the following three vectors is a possible value for r? Why?

(a) [ 1  1  1  1 ]^T   (b) [ −1  −1  1  1 ]^T   (c) [ −1  1  1  −1 ]^T
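For checking answers to Exercises 3.1 and 3.2, a sketch of the normal-equations approach, assuming NumPy and SciPy are available; the data are those of Exercise 3.1:

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    # Beam data from Exercise 3.1: model y = x1 + x2*t
    t = np.array([10.0, 15.0, 20.0])
    y = np.array([11.60, 11.85, 12.25])

    # Overdetermined 3x2 system: each row is [1, t_i]
    A = np.column_stack([np.ones_like(t), t])

    # Normal equations A^T A x = A^T b, solved via Cholesky
    c, low = cho_factor(A.T @ A)
    x = cho_solve((c, low), A.T @ y)
    print(x)   # least squares estimates of [x1, x2]

    # Cross-check against a library least squares routine
    print(np.linalg.lstsq(A, y, rcond=None)[0])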

3.6. (a) What is the Euclidean norm of the minimum residual vector for the following linear least squares problem?

[ 1  1 ] [ x1 ]   [ 2 ]
[ 0  1 ] [ x2 ] ≈ [ 1 ]
[ 0  0 ]          [ 1 ]

(b) What is the solution vector x for this problem?

3.7. Let A be an m × n matrix and b an m-vector.
(a) Prove that a solution to the least squares problem Ax ≈ b always exists.
(b) Prove that such a solution is unique if, and only if, rank(A) = n.

3.8. Suppose that A is an m × n matrix of rank n. Prove that the matrix A^T A is positive definite.

3.9. Prove that the augmented system matrix in Section 3.4.2 cannot be positive definite.

3.10. Let B be an n × n matrix, and assume that B is both orthogonal and triangular.
(a) Prove that B must be diagonal.
(b) What are the diagonal entries of B?
(c) Let A be n × n and nonsingular. Use parts a and b to prove that the QR factorization of A is unique up to the signs of the diagonal entries of R. In particular, there exist unique matrices Q and R such that Q is orthogonal, R is upper triangular with positive entries on its main diagonal, and A = QR.

3.11. Suppose that the partitioned matrix

[ A  B ]
[ O  C ]

is orthogonal, where the submatrices A and C are square. Prove that A and C must be orthogonal, and B = O.

3.12. (a) Let A be an n × n matrix. Show that any two of the following conditions imply the other:
1. A^T = A
2. A^T A = I
3. A^2 = I
(b) Give a specific example, other than the identity matrix I or a permutation of it, of a 3 × 3 matrix that has all three of these properties.
(c) Name a nontrivial class of matrices that have all three of these properties.

3.13. If A is both an orthogonal matrix and an orthogonal projector, what can you conclude about A?

3.14. Show that if the vector v ≠ 0, then the matrix

H = I − 2 vv^T / (v^T v)

is orthogonal and symmetric.

3.15. Let a be any nonzero vector. If v = a − α e1, where α = ±‖a‖2, and

H = I − 2 vv^T / (v^T v),

show that Ha = α e1.

3.16. Consider the vector a as an n × 1 matrix.
(a) Write out its reduced QR factorization, showing the matrices Q and R explicitly.
(b) What is the solution to the linear least squares problem ax ≈ b, where b is a given n-vector?

3.17. Determine the Householder transformation that annihilates all but the first entry of the vector [ 1  1  1  1 ]^T. Specifically, if

( I − 2 vv^T / (v^T v) ) [ 1  1  1  1 ]^T = [ α  0  0  0 ]^T,

what are the values of α and v?

3.18. Suppose that you are computing the QR factorization of the matrix

A = [ 1  1   1 ]
    [ 1  2   4 ]
    [ 1  3   9 ]
    [ 1  4  16 ]

by Householder transformations.
(a) How many Householder transformations are required?
(b) What does the first column of A become as a result of applying the first Householder transformation?
(c) What does the first column then become as a result of applying the second Householder transformation?
(d) How many Givens rotations would be required to compute the QR factorization of A?
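A sketch of the Householder construction of Exercises 3.15 and 3.17, assuming NumPy; the sign choice for α is the standard one for avoiding cancellation, not something the exercises prescribe:

    import numpy as np

    a = np.array([1.0, 1.0, 1.0, 1.0])   # vector from Exercise 3.17

    # alpha = -sign(a1) * ||a||_2 avoids cancellation in v = a - alpha*e1
    alpha = -np.sign(a[0]) * np.linalg.norm(a)   # here alpha = -2
    v = a.copy()
    v[0] -= alpha

    # Householder transformation H = I - 2 vv^T / (v^T v)
    H = np.eye(len(a)) - 2.0 * np.outer(v, v) / (v @ v)
    print(H @ a)   # approximately [alpha, 0, 0, 0]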

3.19. Consider the vector

a = [ 2  3  4 ]^T.

(a) Specify an elementary elimination matrix that annihilates the third component of a.
(b) Specify a Householder transformation that annihilates the third component of a.
(c) Specify a Givens rotation that annihilates the third component of a.
(d) When annihilating a given nonzero component of any vector, is it ever possible for the corresponding elementary elimination matrix and Householder transformation to be the same? Why?
(e) When annihilating a given nonzero component of any vector, is it ever possible for the corresponding Householder transformation and Givens rotation to be the same? Why?

3.20. Suppose you want to annihilate the second component of a vector

a = [ a1  a2 ]^T

using a Givens rotation, but a1 is already zero.
(a) Is it still possible to annihilate a2 with a Givens rotation? If so, specify an appropriate Givens rotation; if not, explain why.
(b) Under these circumstances, can a2 be annihilated with an elementary elimination matrix? If so, how? If not, why?

3.21. A Givens rotation is defined by two parameters, c and s, and therefore would appear to require two storage locations in a computer implementation. The two parameters depend on a single angle of rotation, however, so in principle it should be possible to record the rotation by storing only one number. Devise a scheme for storing and recovering Givens rotations using only one storage location per rotation.

3.22. Let A be an m × n matrix of rank n. Let

A = Q [ R ]
      [ O ]

be the QR factorization of A, with Q orthogonal and R an n × n upper triangular matrix. Let A^T A = LL^T be the Cholesky factorization of A^T A.
(a) Show that R^T R = LL^T.
(b) Can one conclude that R = L^T? Why?

3.23. In Section 3.4.1 we observed that the cross-product matrix A^T A is exactly singular in floating-point arithmetic if

A = [ 1  1 ]
    [ ε  0 ]
    [ 0  ε ],

where ε is a positive number smaller than √ε_mach in a given floating-point system. Show that if A = QR is the reduced QR factorization for this matrix A, then R is not singular, even in floating-point arithmetic.

3.24. Verify that the dominant term in the operation count (number of multiplications or number of additions) for solving an m × n linear least squares problem using the normal equations and Cholesky factorization is mn^2/2 + n^3/6.

3.25. Verify that the dominant term in the operation count (number of multiplications or number of additions) for QR factorization of an m × n matrix using Householder transformations is mn^2 − n^3/3.

3.26. Let c = cos(θ) and s = sin(θ) for some angle θ. Give a detailed geometric description of the effects on vectors in the Euclidean plane R^2 of each of the following 2 × 2 orthogonal matrices.

(a) G = [  c  s ]    (b) H = [ −c  s ]
        [ −s  c ]            [  s  c ]

3.27. (a) Suppose that Q is an n × k matrix whose columns form an orthonormal basis for a subspace S of R^n. Show that QQ^T is an orthogonal projector onto S.
(b) If A is a matrix with linearly independent columns, show that A(A^T A)^−1 A^T is an orthogonal projector onto span(A). How does this result relate to the linear least squares problem?
(c) If P is an orthogonal projector onto a subspace S, show that I − P is an orthogonal projector onto the orthogonal complement of S.
(d) Let v be any nonzero n-vector. What is the orthogonal projector onto the subspace spanned by v?
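For Exercises 3.19(c) and 3.20, a sketch of how a Givens rotation is computed and applied, assuming NumPy; the helper function givens is illustrative, not a library routine:

    import numpy as np

    def givens(a1, a2):
        """Return c, s so that [[c, s], [-s, c]] @ [a1, a2] = [r, 0]."""
        r = np.hypot(a1, a2)   # sqrt(a1^2 + a2^2), computed robustly
        if r == 0.0:
            return 1.0, 0.0    # zero vector: the identity rotation suffices
        return a1 / r, a2 / r

    # Annihilate the third component of a = [2, 3, 4] against the second
    # (Exercise 3.19(c)): rotate in the (2,3) plane.
    c, s = givens(3.0, 4.0)
    G = np.array([[1, 0, 0], [0, c, s], [0, -s, c]])
    print(G @ np.array([2.0, 3.0, 4.0]))   # [2, 5, 0]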

3.28. (a) In the Gram-Schmidt procedure of Section 3.5.3, if we define the orthogonal projectors Pk = qk qk^T, k = 1, ..., n, where qk is the kth column of Q in the resulting QR factorization, show that

(I − Pk)(I − Pk−1) ··· (I − P1) = I − Pk − Pk−1 − ··· − P1.

(b) Show that the classical Gram-Schmidt procedure is equivalent to

qk = (I − (P1 + ··· + Pk−1)) ak.

(c) Show that the modified Gram-Schmidt procedure is equivalent to

qk = (I − Pk−1) ··· (I − P1) ak.

(d) An alternative way to stabilize the classical procedure is to apply it more than once (i.e., iterative refinement), which is equivalent to taking

qk = (I − (P1 + ··· + Pk−1))^m ak,

where m = 2 is typically sufficient. Show that all three of these variations are mathematically equivalent (though they may differ markedly in finite-precision arithmetic).

3.29. Let v be a nonzero n-vector. The hyperplane normal to v is the (n − 1)-dimensional subspace of all vectors z such that v^T z = 0. A reflector is a linear transformation R such that Rx = −x if x is a scalar multiple of v, and Rx = x if v^T x = 0. Thus, the hyperplane acts as a mirror: for any vector, its component within the hyperplane is invariant, whereas its component orthogonal to the hyperplane is reversed.
(a) Show that R = 2P − I, where P is the orthogonal projector onto the hyperplane normal to v. Draw a picture to illustrate this result.
(b) Show that R is symmetric and orthogonal.
(c) Show that the Householder transformation

H = I − 2 vv^T / (v^T v)

is a reflector.
(d) Show that for any two vectors s and t such that s ≠ t and ‖s‖2 = ‖t‖2, there is a reflector R such that Rs = t.
(e) Show that any orthogonal matrix Q is a product of reflectors.
(f) Illustrate the previous result by expressing the plane rotation

[  c  s ]
[ −s  c ],

where c^2 + s^2 = 1, as a product of two reflectors. For some specific angle of rotation, draw a picture to show the mirrors.

3.30. (a) Consider the column vector a as an n × 1 matrix. Write out its reduced singular value decomposition, showing the matrices U, Σ, and V explicitly.
(b) Consider the row vector a^T as a 1 × n matrix. Write out its reduced SVD, showing the matrices U, Σ, and V explicitly.

3.31. If A is an m × n matrix and b is an m-vector, prove that the solution x of minimum Euclidean norm to the least squares problem Ax ≈ b is given by

x = Σ_{σi ≠ 0} (ui^T b / σi) vi,

where the σi, ui, and vi are the singular values and corresponding singular vectors of A.

3.32. Prove that the pseudoinverse A^+ of an m × n matrix A, as defined using the SVD in Section 3.6.1, satisfies the following four properties, known as the Moore-Penrose conditions.
(a) AA^+ A = A.
(b) A^+ AA^+ = A^+.
(c) (AA^+)^T = AA^+.
(d) (A^+ A)^T = A^+ A.

3.33. Prove that the pseudoinverse A^+ of an m × n matrix A, as defined using the SVD in Section 3.6.1, has the value indicated for each of the following special cases.
(a) If m = n and A is nonsingular, then A^+ = A^−1.
(b) If m > n and A has rank n, then A^+ = (A^T A)^−1 A^T.
(c) If m < n and A has rank m, then A^+ = A^T (AA^T)^−1.

3.34. (a) What is the pseudoinverse of the following matrix?

A0 = [ 1  0 ]
     [ 0  0 ]
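A sketch contrasting the classical and modified procedures of Exercise 3.28(b) and (c), assuming NumPy; the Hilbert-segment test matrix is an arbitrary ill-conditioned choice that makes the finite-precision difference visible:

    import numpy as np

    def gram_schmidt(A, modified=True):
        """Reduced QR of A by Gram-Schmidt; columns of Q are orthonormal."""
        m, n = A.shape
        Q = np.zeros((m, n))
        R = np.zeros((n, n))
        for k in range(n):
            q = A[:, k].copy()
            for j in range(k):
                # Classical projects against the original column a_k;
                # modified projects against the partially reduced q.
                r = Q[:, j] @ (q if modified else A[:, k])
                R[j, k] = r
                q = q - r * Q[:, j]
            R[k, k] = np.linalg.norm(q)
            Q[:, k] = q / R[k, k]
        return Q, R

    # Ill-conditioned test matrix (segment of the Hilbert matrix)
    A = np.array([[1.0 / (i + j + 1) for j in range(4)] for i in range(6)])
    for flag in (False, True):
        Q, R = gram_schmidt(A, modified=flag)
        # Loss of orthogonality is much worse for the classical variant.
        print(flag, np.linalg.norm(Q.T @ Q - np.eye(4)))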

(b) If ε > 0, what is the pseudoinverse of the following matrix?

Aε = [ 1  0 ]
     [ 0  ε ]

(c) What do these results imply about the conditioning of the problem of computing the pseudoinverse of a given matrix?
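Exercise 3.34 can also be explored numerically. A sketch, assuming NumPy, that forms the pseudoinverse from the SVD as in Section 3.6.1 and illustrates part (c); the tolerance is an arbitrary cutoff:

    import numpy as np

    def pinv_svd(A, tol=1e-12):
        """Pseudoinverse via the SVD: invert only singular values > tol."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        s_inv = np.array([1.0 / si if si > tol else 0.0 for si in s])
        return Vt.T @ np.diag(s_inv) @ U.T

    A0 = np.array([[1.0, 0.0],
                   [0.0, 0.0]])
    print(pinv_svd(A0))   # equals A0 itself: [[1, 0], [0, 0]]

    for eps in (1e-2, 1e-6, 1e-10):
        Aeps = np.array([[1.0, 0.0],
                         [0.0, eps]])
        # ||Aeps^+||_2 = 1/eps, unbounded as eps -> 0, while A0^+
        # stays small: computing the pseudoinverse is discontinuous.
        print(eps, np.linalg.norm(pinv_svd(Aeps), 2))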

Computer Problems

3.1. For n = 0, 1, ..., 5, fit a polynomial of degree n by least squares to the following data:

t    0.0  1.0  2.0  3.0  4.0  5.0
y    1.0  2.7  5.8  6.6  7.5  9.9

Make a plot of the original data points along with each resulting polynomial curve (you may make separate graphs for each curve or a single graph containing all of the curves). Which polynomial would you say captures the general trend of the data better? Obviously, this is a subjective question, and its answer depends on both the nature of the given data (e.g., the uncertainty of the data values) and the purpose of the fit. Explain your assumptions in answering.

3.2. A common problem in surveying is to determine the altitudes of a series of points with respect to some reference point. The measurements are subject to error, so more observations are taken than are strictly necessary to determine the altitudes, and the resulting overdetermined system is solved in the least squares sense to smooth out errors. Suppose that there are four points whose altitudes x1, x2, x3, x4 are to be determined. In addition to direct measurements of each xi with respect to the reference point, measurements are also taken of each point with respect to all of the others. The resulting measurements are:

x1 = 2.95,  x2 = 1.74,  x3 = −1.45,  x4 = 1.32,
x1 − x2 = 1.23,  x1 − x3 = 4.45,  x1 − x4 = 1.61,
x2 − x3 = 3.21,  x2 − x4 = 0.45,  x3 − x4 = −2.75.

Set up the corresponding least squares system Ax ≈ b and use a library routine, or one of your own design, to solve it for the best values of the altitudes. How do your computed values compare with the direct measurements?

3.3. (a) For a series of matrices A of order n, record the execution times for a library routine to compute the LU factorization of A. Using a linear least squares routine, or one of your own design, fit a cubic polynomial to the execution times as a function of n. To obtain reliable results, use a fairly wide range of values for n, say, in increments of 100 from 100 up to several hundred, depending on the speed and available memory of the computer you use. You may obtain more accurate timings by averaging several runs for a given matrix size. The resulting cubic polynomial could be used to predict the execution time for other values of n not tried, such as very large values for n. What is the predicted execution time for a matrix of order 10,000?
(b) Try to determine the basic execution rate (in floating-point operations per second, or flops) for your computer by timing a known computation, such as matrix multiplication. You can then use this information to determine the complexity of LU factorization, based on the polynomial fit to the execution times. After converting to floating-point operations, how does the dominant term compare with the theoretically expected value of 4n^3/3 (counting both additions and multiplications)? Try to explain any discrepancy. If you use a system that provides operation counts automatically, try this same experiment fitting the operation counts directly.

3.4. (a) Solve the following least squares problem using any method you like:

[ 0.16  0.10 ] [ x1 ]   [ 0.26 ]
[ 0.17  0.11 ] [ x2 ] ≈ [ 0.28 ]
[ 2.02  1.29 ]          [ 3.31 ]

(b) Now solve the same least squares problem again, but this time use the slightly perturbed
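As a starting point for Computer Problems 3.1 and 3.2, a sketch assuming NumPy; the plotting requested in Problem 3.1 is omitted, and only residual norms are printed:

    import numpy as np

    # Computer Problem 3.1: least squares polynomial fits of degree 0..5
    t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.0, 2.7, 5.8, 6.6, 7.5, 9.9])
    for n in range(6):
        coeffs = np.polyfit(t, y, n)      # degree-n least squares fit
        resid = y - np.polyval(coeffs, t)
        print(n, np.linalg.norm(resid))   # residual shrinks as n grows

    # Computer Problem 3.2: the 10x4 surveying system
    # (4 direct measurements, then the 6 pairwise differences)
    A = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
                  [1, -1, 0, 0], [1, 0, -1, 0], [1, 0, 0, -1],
                  [0, 1, -1, 0], [0, 1, 0, -1], [0, 0, 1, -1]], float)
    b = np.array([2.95, 1.74, -1.45, 1.32,
                  1.23, 4.45, 1.61, 3.21, 0.45, -2.75])
    x, res, rank, sv = np.linalg.lstsq(A, b, rcond=None)
    print(x)   # compare with the four direct measurements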
