CHAPTER 7 LEAST SQUARES SOLUTIONS TO LINEAR SYSTEMS

7. LEAST SQUARES SOLUTIONS TO LINEAR SYSTEMS
Objectives
The major objectives of this chapter are to study least squares solutions to the linear system problem Ax = b. Here are the highlights of the chapter.

Some applications giving rise to least squares problems (Sections 7.2 and 7.5).
A result on the existence and uniqueness of the solution of least squares problems (Theorem 7.3.1 of Section 7.3).

Computational methods for the least squares problem: the normal equations (Section 7.8.1) and the QR factorization (Section 7.8.2) methods for the full-rank overdetermined problem; the QR factorization with column pivoting algorithm for the rank-deficient overdetermined problem (Section 7.8.3); and the normal equations (Section 7.9.1) and QR factorization (Section 7.9.2) methods for the minimum-norm solution of the full-rank underdetermined problem.

Sensitivity analyses of least squares and generalized inverse problems (Section 7.7).

Iterative refinement procedure for least-squares problems (Section 7.10).

Background Material Needed for this Chapter
The following background material and the tools developed in previous chapters will be needed for a smooth reading of this chapter.
1. The Cholesky factorization algorithm for solving a symmetric positive definite system (Section 6.4.7).
2. The QR factorization using the Householder and Givens methods (Algorithms 5.4.2 and 5.5.2); the QR factorization of a non-square matrix.
3. Orthogonal projections (Section 5.6).
4. The QR factorization with column pivoting (Section 5.7).
5. The iterative refinement algorithm from Chapter 6 (Algorithm 6.9.1).
6. Perturbation analysis of linear systems (Section 6.6).

7.1 Introduction
In Chapter 6 we discussed several methods for solving the linear system

Ax = b
where A was assumed to be square and nonsingular. However, in several practical situations, such as statistical applications, geometric modeling, signal processing, etc., one needs to solve a system where the matrix A is non-square and/or singular. In such cases, solutions may not exist at all; in cases where there are solutions, there may be infinitely many. For example, when A is m × n and m > n, we have an overdetermined system (that is, the number of equations is greater than the number of unknowns), and an overdetermined system typically has no solution. In contrast, an underdetermined system (m < n) typically has an infinite number of solutions. In these cases, the best one can hope for is to find a vector x which makes Ax as close as possible to the vector b. In other words, we seek a vector x such that r(x) = ||Ax - b|| is minimized. When the Euclidean norm || · ||_2 is used, this solution is referred to as a least squares solution to the system Ax = b. The term "least squares solution" is justified, because such a solution minimizes the Euclidean norm of the residual vector and, by definition, the square of the Euclidean norm of a vector is just the sum of squares of its components. The problem of finding least squares solutions to the linear system Ax = b is known as the linear least squares problem (LSP). The linear least squares problem is formally defined as follows:

Statement of the Least Squares Problem
Given a real m × n matrix A of rank k ≤ min(m, n) and a real m-vector b, find a real n-vector x such that the function r(x) = ||Ax - b||_2 is minimized.

If the least squares problem has more than one solution, the one having the minimum Euclidean norm is called the minimum length solution or the minimum norm solution. This chapter is devoted to the study of such problems. The organization of the chapter is as follows. In Section 7.2 we show how a very simple business application leads to an overdetermined least squares problem. In that section we simply formulate the problem as a least squares problem; later, in Section 7.5, we present a solution of the problem using normal equations.

In Section 7.3 we prove a theorem on the existence and uniqueness of the solution of an overdetermined least squares problem. In Section 7.7 we analyze the sensitivity of least squares problems to perturbations in the data. We prove only a simple result here and state other results without proofs. Section 7.8 deals with computational methods for both full-rank and rank-deficient overdetermined problems. We discuss the normal equations method and the QR factorization methods using Householder transformations and the modified and classical Gram-Schmidt orthogonalizations. Underdetermined least squares problems are considered in Section 7.9. We again discuss the normal equations and the QR methods for an underdetermined problem. In Section 7.10 an iterative improvement procedure for refining approximate solutions is presented. In Section 7.11 we describe an efficient way of computing the variance-covariance matrix of a least squares solution, which is (A^T A)^{-1}.

7.2 A Simple Application Leading to an Overdetermined System
Suppose that the number of units bi of a product sold by a company in district i of a town depends upon the population ai1 (in thousands) of the district and the per capita income ai2 (in thousands of dollars). The table below (taken from Neter, Wasserman and Kunter (1983)), compiled by the company, shows the sales in five districts, as well as the corresponding population and per capita income.

    District (i)   Sales (bi)   Population (ai1)   Per Capita Income (ai2)
        1             162            274                  2450
        2             120            180                  3254
        3             223            375                  3802
        4             131            205                  2838
        5              67             86                  2347

Suppose the company wants to use the above table to predict future sales and believes (from past experience) that the following relationship between bi, ai1, and ai2 is appropriate:

    bi = x1 + ai1 x2 + ai2 x3.

If the data in the table satisfied the above relation exactly, we would have

    162 = x1 + 274 x2 + 2450 x3
    120 = x1 + 180 x2 + 3254 x3
    223 = x1 + 375 x2 + 3802 x3
    131 = x1 + 205 x2 + 2838 x3
     67 = x1 +  86 x2 + 2347 x3,

or

    Ax = b,

where

        | 1   274   2450 |          | 162 |          | x1 |
        | 1   180   3254 |          | 120 |          | x2 |
    A = | 1   375   3802 |,     b = | 223 |,     x = | x3 |.
        | 1   205   2838 |          | 131 |
        | 1    86   2347 |          |  67 |

The above is an overdetermined system of five equations in three unknowns.
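This system can be set up and solved numerically as a least squares problem. The following is a minimal sketch in Python with NumPy (the variable names and the use of numpy.linalg.lstsq are illustrative choices, not part of the original text); it uses the data of the table above and computes the least squares solution studied in the rest of this chapter.

```python
import numpy as np

# Data from the sales table: each row is (1, population, per capita income).
A = np.array([
    [1.0, 274.0, 2450.0],
    [1.0, 180.0, 3254.0],
    [1.0, 375.0, 3802.0],
    [1.0, 205.0, 2838.0],
    [1.0,  86.0, 2347.0],
])
b = np.array([162.0, 120.0, 223.0, 131.0, 67.0])  # observed sales

# Least squares solution of the overdetermined system A x ~ b.
x, residual_ss, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)

print("least squares solution x =", x)
print("residual norm ||Ax - b||_2 =", np.linalg.norm(A @ x - b))
```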

7.3 Existence and Uniqueness
As in the case of solving a linear system, questions naturally arise:

1. Does a least squares solution to Ax = b always exist?
2. If the solution exists, is it unique?
3. How do we obtain such solutions?

The following theorem answers the questions of existence and uniqueness. Assume that the system Ax = b is overdetermined or square; that is, A is of order m × n with m ≥ n. An overdetermined system Ax = b can be represented schematically as a tall m × n matrix A (m > n) multiplying the n-vector x to match the longer m-vector b.

The underdetermined case will be discussed later in this chapter.

Theorem 7.3.1 (Least Squares Existence and Uniqueness Theorem)

There always exists a solution to the linear least squares problem. This solution is unique if and only if A has full rank, that is, rank(A) = n. If A is rank deficient, then the least squares problem has infinitely many solutions.

We present a proof here in the full-rank case, that is, in the case when A has full rank. The rank-deficient case will be treated later in this chapter and in the chapter on the Singular Value Decomposition (SVD) (Chapter 10). First, we observe the following.

Lemma 7.3.1 x is a least squares solution to an overdetermined system Ax = b if and only if x satisfies

    A^T A x = A^T b.                                      (7.3.1)

Proof. We denote the residual r = b - Ax by r(x) to emphasize that, given A and b, r is a function of x. Let y be an n-vector. Then r(y) = b - Ay = r(x) + Ax - Ay = r(x) + A(x - y). So

    ||r(y)||_2^2 = ||r(x)||_2^2 + 2 (x - y)^T A^T r(x) + ||A(x - y)||_2^2.

First assume that x satisfies

    A^T A x = A^T b,

that is, A^T r(x) = 0. Then from the above we have

    ||r(y)||_2^2 = ||r(x)||_2^2 + ||A(x - y)||_2^2 ≥ ||r(x)||_2^2,

implying that x is a least squares solution. Next assume that A^T r(x) ≠ 0. Set A^T r(x) = z ≠ 0. Define now a vector y such that

    y = x + δz,   δ > 0.

Then

    r(y) = r(x) + A(x - y) = r(x) - δAz,

so that

    ||r(y)||_2^2 = ||r(x)||_2^2 + δ^2 ||Az||_2^2 - 2δ (A^T r(x))^T z
                 = ||r(x)||_2^2 + δ^2 ||Az||_2^2 - 2δ ||z||_2^2
                 < ||r(x)||_2^2

for any δ > 0 if Az = 0, and for 0 < δ < 2||z||_2^2 / ||Az||_2^2 if Az ≠ 0. This implies that x is not a least squares solution.

Proof (of Theorem 7.3.1). Since A has full rank, A^T A is symmetric and positive definite and is thus, in particular, a nonsingular matrix. The theorem, in the full-rank case, now follows from the fact that the linear system A^T A x = A^T b has a unique solution if and only if A^T A is nonsingular.
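As a quick numerical illustration of Theorem 7.3.1 and Lemma 7.3.1, here is a minimal Python/NumPy sketch (the matrices are random and purely illustrative, not from the text). It compares a full-rank problem, whose least squares solution is unique, with a rank-deficient one, where adding any null-space vector to a solution leaves the residual norm unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Full-rank case: rank(A) = n, so A^T A is nonsingular and the solution is unique.
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # unique solution of A^T A x = A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_normal, x_lstsq)

# Rank-deficient case: duplicate a column so rank(A2) = 2 < 3.
A2 = np.column_stack([A[:, 0], A[:, 1], A[:, 0]])
x_min, *_ = np.linalg.lstsq(A2, b, rcond=None)    # minimum-norm least squares solution
n_vec = np.array([1.0, 0.0, -1.0])                # a null-space vector of A2
for t in (0.0, 1.0, 5.0):
    r = b - A2 @ (x_min + t * n_vec)
    print(f"t = {t}: residual norm = {np.linalg.norm(r):.6f}")  # same value for every t
```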

7.4 Geometric Interpretation of the Least Squares Problem
Let A be an m × n matrix with m > n. Then A is a linear mapping from R^n into R^m, and R(A) is a subspace of R^m. Every vector u ∈ R(A) can be written as u = Ax for some x ∈ R^n. Let b ∈ R^m. Because || · ||_2 is the Euclidean norm, ||b - Ax||_2 is the distance between the end points of b and Ax. It is clear that this distance is minimal if and only if b - Ax is perpendicular to R(A) (see Figure 7.1). In that case, ||b - Ax||_2 is the distance from the end point of b to the "plane" R(A).

Figure 7.1 (schematic): the vector b, its projection Ax onto the "plane" R(A), and the residual b - Ax perpendicular to R(A).

From this interpretation it is easy to understand that a solution of the least squares problem for the linear system Ax = b always exists: one can project b onto the "plane" R(A) to obtain a vector u ∈ R(A), and there is an x ∈ R^n such that u = Ax. This x is a solution. Because b - Ax is perpendicular to R(A), and every vector in R(A) is a linear combination of the columns of A, b - Ax is orthogonal to every column of A. That is,

    A^T (b - Ax) = 0,

or

    A^T A x = A^T b.
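The orthogonal projection onto R(A) can be checked numerically. The short sketch below (Python with NumPy; it assumes a full-rank A and uses the projector P_A = A (A^T A)^{-1} A^T, the form of orthogonal projection discussed in Chapter 5) splits b into its component b_R in R(A) and its component b_N orthogonal to R(A), and verifies that the least squares residual is exactly b_N.

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 3.0], [4.0, 5.0]])   # full rank, m > n
b = np.array([3.0, 5.0, 9.0])

P_A = A @ np.linalg.inv(A.T @ A) @ A.T   # orthogonal projector onto R(A)
b_R = P_A @ b                            # component of b in R(A)
b_N = b - b_R                            # component orthogonal to R(A)

x, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x

print(np.allclose(A.T @ r, 0))   # True: residual is orthogonal to the columns of A
print(np.allclose(r, b_N))       # True: residual equals the part of b outside R(A)
```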

7.5 Normal Equations and Polynomial Fitting

Definition 7.5.1 The system of equations A^T A x = A^T b is called the system of normal equations.

A well-known example of how normal equations arise in practical applications is the fitting of a polynomial to a set of experimental data. Engineers and scientists gather data from experiments, and a meaningful representation of the collected data is needed to make meaningful decisions for the future. Let (x1, y1), (x2, y2), ..., (xn, yn) be a set of paired observations. Suppose that the mth-degree (m ≤ n) polynomial

    y(x) = a0 + a1 x + a2 x^2 + ... + am x^m                          (7.5.1)

is the "best fit" for this set of data. One strategy for "best fit" is to minimize the sum of the squares of the residuals

    E = Σ_{i=1}^{n} (yi - a0 - a1 xi - a2 xi^2 - ... - am xi^m)^2.    (7.5.2)

We must then have

    ∂E/∂ai = 0,   i = 0, 1, ..., m.                                   (7.5.3)

Now,

    ∂E/∂a0 = -2 Σ (yi - a0 - a1 xi - ... - am xi^m),
    ∂E/∂a1 = -2 Σ xi (yi - a0 - a1 xi - ... - am xi^m),
    ...
    ∂E/∂am = -2 Σ xi^m (yi - a0 - a1 xi - ... - am xi^m)              (7.5.4)

(where Σ denotes the summation from i = 1 to n). Setting these derivatives to zero, we obtain

    a0 n       + a1 Σ xi       + ... + am Σ xi^m     = Σ yi
    a0 Σ xi    + a1 Σ xi^2     + ... + am Σ xi^{m+1} = Σ xi yi
    ...
    a0 Σ xi^m  + a1 Σ xi^{m+1} + ... + am Σ xi^{2m}  = Σ xi^m yi.     (7.5.5)

Setting Σ xi^k = Sk, k = 0, 1, ..., 2m, and denoting the entries of the right-hand side by b0, b1, ..., bm, respectively, this system can be written as

    | S0    S1    ...  Sm   | | a0 |   | b0 |
    | S1    S2    ...  Sm+1 | | a1 | = | b1 |
    | ...                   | | .. |   | .. |
    | Sm    Sm+1  ...  S2m  | | am |   | bm |.                        (7.5.6)

(Note that S0 = n.) This is a system of (m + 1) equations in the (m + 1) unknowns a0, a1, ..., am, and it is really a system of normal equations. To see this, define

        | 1   x1   ...   x1^m |
    V = | 1   x2   ...   x2^m |
        | ...                 |
        | 1   xn   ...   xn^m |                                       (7.5.7)

and let a = (a0, a1, ..., am)^T and b = (b0, b1, ..., bm)^T. Then the above system becomes

    V^T V a = b.                                                      (7.5.8)

The matrix V is known as the Vandermonde matrix; if the xi's are all distinct, then V has full rank. From our discussion in the previous section, we see that a is the least squares solution of the system (7.5.8).

Example 7.5.1 Suppose that an electrical engineer has gathered the following experimental data, consisting of measurements of the current in an electric wire for various voltages.

    x = voltage:   0    2     5     7     9     13     24
    y = current:   0    6     7.9   8.5   12    21.5   35

We would like to derive the normal equations for the above data corresponding to the best fit of the data to (a) a straight line and (b) a quadratic, and to compare the predicted results with the actual result when v = 5.

Case 1. Straight-Line Fit

        | 1    0 |            |  0   |
        | 1    2 |            |  6   |
        | 1    5 |            |  7.9 |
    V = | 1    7 |,      y =  |  8.5 |
        | 1    9 |            | 12.0 |
        | 1   13 |            | 21.5 |
        | 1   24 |            | 35.0 |

The normal equations V^T V a = V^T y = b are

    |  7    60 |       |   90.9 |
    | 60   904 | a  =  | 1338.5 |.

The solution of these equations is a0 = 0.6831, a1 = 1.4353. The value of a0 + a1 x at x = 5 is 0.6831 + 1.4353 × 5 = 7.8596.

Case 2. Quadratic Fit

        | 1    0    0  |
        | 1    2    4  |
        | 1    5   25  |
    V = | 1    7   49  |
        | 1    9   81  |
        | 1   13  169  |
        | 1   24  576  |

The normal equations V^T V a = V^T y = b are

    |   7      60     904  |            | 0.0091 |
    |  60     904   17226  | a  =  10^4 | 0.1338 |.
    | 904   17226  369940  |            | 2.5404 |

The solution of these normal equations is

    a = (a0, a1, a2)^T = (0.8977, 1.3695, 0.0027)^T.

The value of a0 + a1 x + a2 x^2 at x = 5 is 7.8127. The matrix of the normal equations in this case is ill-conditioned: Cond2(V^T V) = 2.3260 × 10^5. Indeed, it is well known that Vandermonde matrices become progressively more ill-conditioned as their order increases.

Note: The use of a higher degree polynomial may not necessarily give the best result.

7.6 Pseudoinverse and the Least Squares Problem
Denote (A^T A)^{-1} A^T = A^†.

Definition 7.6.1 The matrix A^† = (A^T A)^{-1} A^T is called the pseudoinverse or the Moore-Penrose generalized inverse of A.

Note that when A is square and invertible,

    A^† = (A^T A)^{-1} A^T = A^{-1} (A^T)^{-1} A^T = A^{-1}.

Thus the above definition of the pseudoinverse generalizes the ordinary definition of the inverse of a square matrix A.

Least Squares Solution Using the Pseudoinverse
From (7.3.1) it follows that the unique least squares solution x of the full-rank least squares problem Ax = b is given by

    x = (A^T A)^{-1} A^T b = A^† b.
7. We consider two cases: perturbation in the vector b and perturbation in the matrix A.1 01 21 B C A = B2 3C @ A Thus.7 Sensitivity of the Least Squares Problem In this section we study the sensitivity of a least squares solution to perturbations in data. K. 4 5 rank(A) = 2: ! y = (AT A). then Cond(A) = kAk kAyk. and the matrix (AT A). AT = . 2 2 Example 7. For component-wise perturbation results.11. That is.1:2857 . Having de ned the generalized inverse of a rectangular matrix.6.We shall discuss this important concept in some more detail in Chapter 10. we investigate how a least squares solution changes with respect to small changes in the data. we now de ne the condition number of such a matrix as Cond(A) = kAk kAyk.6. since it measures the information contained in the experiment. Mitra (1971). An excellent reference on the subject is the book Generalized Inverses by C.0:5000 Cond (A) = kAk kAky = 7:6656 2:0487 = 15:7047 : 1 2 2 2 Variance-Covariance Matrix In certain applications. Cond(A) = kAk kAyk . and Chandrasekaran and Ipsen (1994). This study is important in understanding the di erent behaviors of di erent methods for solving the least squares problem that will be discussed in the next section. and Cond(A) is the condition number with respect to 2-norm.2 If an m n matrix A has full rank. Rao and S.0:5714 0:8577 A 1 0:5000 . De nition 7. that is. 398 . all the norms used in the rest of this chapter are 2-norms. Note: If not explicitly stated. An algorithm for computing the variance-covariance matrix with- out explicitly computing the inverse is given in Section 7. see Bjorck (1992). The results in this section are norm-wise perturbation results. the matrix AT A is called the information matrix.1 is known as the variancecovariance matrix. R. A has full rank.

Case 1: Perturbation in the vector b

Here we assume that the vector b has been perturbed to b̂ = b + δb, but A has remained unchanged.

Theorem 7.7.1 (Least Squares Right Perturbation Theorem) Let x and x̂ be the unique least squares solutions to the original and the perturbed problems, respectively. Then, if ||bR|| ≠ 0,

    ||x̂ - x|| / ||x|| ≤ Cond(A) ||δbR|| / ||bR||.

Here Cond(A) = ||A|| ||A^†||, and bR and δbR are, respectively, the projections of the vectors b and δb onto R(A).

Proof. Since x and x̂ are the unique least squares solutions to the original and the perturbed problems, we have

    x = A^† b,      x̂ = A^† (b + δb).

Thus

    x̂ - x = A^† δb.

Let δbN denote the projection of δb onto the orthogonal complement of R(A); that is,

    δb = δbR + δbN.

Since δbN lies in the orthogonal complement of R(A), which is N(A^T), we have A^T (δbN) = 0. So

    x̂ - x = A^† δb = A^† (δbR + δbN) = A^† δbR + (A^T A)^{-1} A^T δbN = A^† δbR.    (7.7.1)

Again, since x is the unique least squares solution, we have Ax = bR, from which (taking norms on both sides) we get

    ||x|| ≥ ||bR|| / ||A||.    (7.7.2)

Combining (7.7.1) and (7.7.2), we have the theorem.

Interpretation of Theorem 7.7.1

Theorem 7.7.1 tells us that if only the vector b is perturbed, then, as in the case of linear systems, Cond(A) = ||A|| ||A^†|| serves as the condition number in the sensitivity analysis of the unique least squares solution. If this number is small and the relative error in the projection of b onto R(A) is also small, then the least squares solution will not change much. On the other hand, if this number is large, then even with a small relative error in the projection of b onto R(A) we might have a drastic change in the least squares solution. Note that it is the smallness of the relative error in the projection of b onto R(A), namely ||δbR|| / ||bR||, not merely the smallness of ||δbR||, that plays the role here.

Example 7.7.1 An Insensitive Problem

        | 1  2 |           | 1 |
    A = | 0  1 |,      b = | 1 |,      δb = 10^{-4} (1, 1, 1)^T,
        | 1  0 |           | 1 |

    PA = projection onto R(A) = | 0.8333   0.3333   0.1667 |
                                | 0.3333   0.3333  -0.3333 |
                                | 0.1667  -0.3333   0.8333 |

(see the orthogonal projection examples of Section 5.6). Then

    bR = PA b = (1.3333, 0.3333, 0.6667)^T,
    δbR = PA δb = 10^{-3} (0.13333, 0.03333, 0.06667)^T,
    Cond(A) = 2.4495.

Since ||δbR|| / ||bR|| = 10^{-4}, a small number, and A is well conditioned, we expect that the least squares solution will not be perturbed much. The following computations

show this is indeed the case. kx .4 1 C C A Cond(A) = O(104) : : Using results of Example 5.4 10 0 Since the product of the condition number and kkbbRkk is 7.0888. :6667 x = Ay b = :3333 ! y(b + b) = :6667 x=A ^ :3334 kx .7. Let the perturbation E of the matrix be small enough so that rank(A) = rank(A + E ): 401 . Indeed this is true.3 from Chapter 5. Perturbation in the matrix A The analysis here is much more complicated than the previous case.4 C bN = B 0 C @ A @ A . ^ kxk 4 ! Example 7. B :1 C @ A 0 1 C 0 C A . we should expect a substantial R change in the solution.4 10 1 0 2 B b = B 10. we have 0 2 1 001 B C B C bR = PAb = B 10.2 A Sensitive Problem 0 1 B A = B 10. xk = 2:4495 10. We will state the result here (without proof) and the major consequences of the result. @ 4 10. xk = 0:5000 (Large!) ^ kxk Case 2. @ 0 3 4 x= ! 1 1 x= ^ 0:5005 ! 1:5005 011 B C b = 10.6.

7.2 (Least-Squares Left Perturbation Theorem) Let x and x be ^ kx .0:0001 1 B C bN = B 0:9999 C @ A 0:9999 402 kbN k = 1: kbRk .2 tells us that in the case where only the matrix A is perturbed. In the rst example. xk 2Cond(A) kEAk + 4(Cond(A)) kEN k kbN k + O kEN k ^ kxk kAk kAk kbRk kAk 2 2 the unique least-squares solutions to Ax = b and (A + E )^ = b. in general. respectively. (Cond(A))2 serves as the condition number of the problem in the second example. Example 7. but with di erent b. Then if bR 6= 0. Note that the residual r = b .Let x and x denote the unique least squares solutions. to illustrate the di erent sensitivities of the least squares problem in di erent cases. Theorem 7.2 sensitivity of the unique least squares solution.7. then the sensitivity will depend only on the Cond(A). we have the following theorem due to Stewart (1969). Let EA and EN denote the projections of E onto R(A) and onto the orthogonal complement of R(A).3 Sensitivity Depending Upon the Square of the Condition Number 0 1 B A = B 0:0001 @ 0 1:0001 1 B C bR = PA b = B 0:0001 C @ A 0:0001 0 1 C 0 C A 0:0001 1 011 B C b = B1C @ A 1 0 . respectively. depends upon the squares of the condition number of A. Theorem 7. to the original and to the ^ perturbed problem. Cond(A) serves as the condition number. the Two Examples with Di erent Sensititivities We now present two examples with the same matrix A. if kEN k or kbN k is zero or small. However. and let rank(A + E ) x be the same as rank(A). Then Interpretation of Theorem 7. Ax is zero if bN = 0.7.7.

B 0 0:9999 C (same as in Example 7.6. B 0 0:9999 C @ A 4 4 0 0:9999 kEN k = kE k = 9:999 10.) 0 10. since (Cond(A)) = 2 10 is large.(Using PA from Example 5.3 of Chapter 5. xk = 9:999 10 (Large!) : ^ kxk 3 kEN k kbN k (Cond(A)) = 9:999 10. 1 B C EN = 10. B 0 0:9999 C @ A 0 0:9999 0 1 1 1 B C A + E = B 0:0001 0:0001 C @ A 0 0:0002 0 0 10.4 C 0:5000 .6. @ @ A 1 C 0 C A 0:0001 1 0 2 1 B C b = B 0:0001 C : @ A 0:0001 4 Example 5. bR = b and bN = B 0 C. 2 10 = 1:9998 10 : kAk kbRk Example 7.7. we kbR 5 2 8 should expect a drastic departure of the computed solution from the true solution.0:0001 1 B C E = 10.3) @ A 4 0 0:9999 0 1 B A = B 0:0001 @ 0 0 1 001 B B C In this case. as the following computations show: ! .3 of Chapter 5.4:999 ! 0:5 3 x = 10 ^ x= 5 0:5 kx . (See A .0:5000 C. This is indeed true.) Let 4 0 0 .0:5000 0:5000 1 403 . : kAk kAk EN k Though the product of kkAkk and kbN k is rather small.0:0001 1 B C E = 10.4 10.4 Sensitivity Depending Upon the Condition Number 2 5 8 4 Note that Let 0 0 . Note that PA = B 10.7.4 10.

(A + E )(^): ^ x Then kr . the square of Cond(A) does not have any e ect the least squares solution is a ected only by Cond(A).7. MC (1983. rk kEN k + 2Cond(A) kEN k + O kEN k : ^ kbk kAk kAk kAk 2 Interpretation of Theorem 7. We verify this as follows: ! yb = 1 x=A 1 ! 1:4999 x == ^ 0:5000 Cond(A) = 1:4142 10 0 1 1 1 B C EA = 10. for the original and the perturbed least squares problems that is. 141{144) for a precise statement and proof of a result on residual sensitivity. Ax r = b .4 : Residual Sensitivity. We have just seen that the sensitivities of the least squares solutions due to perturbations in the matrix A are di erent for di erent solutions however.7. Theorem 7. according to Theorem 7.3 404 .3 (Least-Squares Residual Sensitivity Theorem) Let r and r ^ denote the residuals.Thus. 0 C @ A 4 4 4 kEAk = 10. xk = 0:5000 ^ kxk 4 0 10. See Golub and Van Loan. We state the result in somewhat simpli ed and crude form. r = b . kAk kx .2. respectively.7. B 10. pp. the following theorem shows that the residual sensitivity always depends upon the condition number of the matrix A.

0:0001 1 B C r = b .4 1 kr . (A + E )^ = B 0:9998 C ^ x @ A 0:9999 0 0 .1 1 B C r = b .7. Example 7.The above result tells us that the sensitivity of the residual depends at most on the condition number of A.10. shows that it is 405 . due to Wedin (1973).7.3 is now easily veri ed. Sensitivity of the Pseudoinverse.0:0001 1 B C E = 10. The following result. rk = 0:5 E = P E = 10.4 B 0 0:9999 C @ A 0 0:9999 ! 0:5 .4 1 0 0 .5 1 1 C b = B1C 4 B C 0 C A @ A 0 10.4:9999 ! 3 x= x = 10 ^ 0:5 5 0 . Ax = B 0:9999 C @ A 0:9999 0 .4 B 0 0:9999 C ^ B C N N A @ A kbk 0 0:9999 Cond(A) = 1:4142 104 and 0 1 B A = B 10. Cond(A) again that plays a role in the sensitivity analysis of the pseudoinverses of a matrix. @ 1 0 1 EN Cond(A) kkAkk = 1:4142: The inequality in Theorem 7.

1:2844 . Ay = 10.4 kE k : kAk ~y A Cond(A) = 15:7047: 0 0:0010 0:0020 1 B C E = 10.4 (Pseudoinverse Sensitivity Theorem) Let A be m n. where ~ ~ m n.0:5709 0:8563 A 0:9990 0:499995 . provided that rank(A) = rank(A). we have ~ Ay . Let Ay and Ay be. Ay ~ Ay Theorem 7.6 01 21 B C A = B2 3C @ A 4 5 0:0040 0:0050 ! y = . A = B 0:0020 0:0030 C @ A 4 Note that 7. the pseudoinverse A and of A = A + E .~ Then.0:5000 0 1:001 2:002 1 C ~ B A + E = A = B 2:002 3:003 C @ A 4:004 5:005 ! ~y = .0:4995 ~ Ay . respectively.1:2857 .8 Computational Methods for Overdetermined Least Squares Problems 7.1 The Normal Equations Method One of the most widely used approaches (especially in statistics) for computing the least squares solution is the normal equations method.8. p k 2Cond(A) kEk : kA Example 7. This method is based upon the solution of the system of normal equations AT Ax = AT b: 406 .0:5714 0:8571 A 1 0:5000 .7.7.

the method is quite e cient.1 Normal Equation Solution of the Case-Study Problem 01 B1 B B B A = B1 B B B1 @ 274 2450 1 C 180 3254 C C C 375 3802 C C C 205 2838 C A 86 2347 1 14691 C 3466402 C A 44608873 1 0 5 1120 B 1120 297522 (1) Form AT A = B @ 14691 3466402 0 703 1 B C (2) Form AT b = B 182230 C @ A 2164253 0 112 1 B 120 C B C B C B C b = B 223 C B C B C B 131 C @ A 67 407 . (1) Form c = AT b. 2 3 2 3 2 Flop-Count. the following algorithm computes the least squares solution x from normal equations using Cholesky factorization.1 Least Squares Solution Using Normal Equations Given an m n (m > n) matrix A with rank(A) = n. the normal equations approach for solving the least squares problem can be stated as follows: Algorithm 7.We assume that A is m n (m > n) and that it has full rank. and n op to solve two triangular systems. (2) Find the Cholesky factorization of AT A = HH T . Since in this case AT A is symmetric and positive de nite it admits the Cholesky decomposition AT A = HH T : Therefore. Thus. (3) Solve the triangular systems in the sequence Hy = c H T x = y: mn + n op: mn for computing AT A and AT b. The above method for solving the full rank least squares problem requires about Example 7.8. n for computing the Cholesky factorization 2 6 2 6 of AT A.8.

the sales in a district with the population 220. using the above results.000 and per capita income of $2. Then the best prediction using the 408 .500. we compute 0x B ( 1 274 2450 ) B x @ x 1 2 3 1 C = 162:4043 C A 1 C C = 120:6153 A 1 C C = 222:8193 A 1 C = 130:3140 C A 1 C C = 66:847 A 0x B ( 1 180 3254 ) B x @ x (True value = 162) 1 2 3 0x B ( 1 375 3802 ) B x @ x (True value = 120) 1 2 3 0x B ( 1 205 2838 ) B x @ x (True value = 223) 1 2 3 0x B ( 1 81 2347 ) B x @ x (True value = 131) 1 2 3 (True value = 67): Suppose that the company would like to predict.(3) Solve the normal equations: AT Ax = AT b 1 0 7:0325 B C x = B 0:5044 C : @ A 0:0070 Note that the system is very ill conditioned: Cond(AT A) = 3:0891 108: To see how the computed least squares solution agrees with the data of the table.

40 (1) c = AT b = 57 (2) Cholesky factorization of AT A: 3:7417 0 H= 5:3452 0:6547 (3) Solutions of the triangular systems: 10:6904 . .2 01 21 B C A = B2 3C @ A 031 B C b = B5C @ A 3 4 9 rank(A) = 2 rank(A b) = 3.0:2182 ! 3:3333 : . We therefore calculate the least squares solution.1:8333 .0:3333 ! Numerical di culties with the normal equations method 409 .given model is 0 7:0325 1 B C ( 1 220 2500 ) B 0:5044 C = 135:5005 @ A 0:0070 Note: In this example. the residuals are small in spite of the fact that the data matrix A is ill-conditioned.8.1AT = 1:3333 0:3333 .0:6667 and ! y b = 3:3333 : A . Example 7.0:3333 1:1667 ! Ay = (AT A).0:3333 Note that . Thus the system Ax = b does not have a solution.0:3333 ! ! y = x = ! 3:3333 The unique least squares solution is x = .

8 Since t = 9. Now. the normal equations approach may. The columns of A are linearly independent.4 0 10 with t = 9. the sensitivity of the problem depends only on the condition number of A (see Theorems 7. Thus. @ 1 kxk = (Cond(A))2: (Exercise): Thus. 225{226) that. unless Cond(A) is less than 10 2 . if x is the computed least ^ squares solution obtained by the normal equations method. in certain cases. Indeed.7. we will get ! TA = 1 1 A 1 1 which is singular. compute ! .1 and 7. though easy to understand and implement. where it is assumed that AT A has been computed exactly and then rounded to t signi cant digits. pp. introduce more errors than those which are inherent in the problem. From the perturbation analysis done in Chapter 6. This is seen as follows. then AT A may fail to be positive de nite or even may not be nonsingular. we have just seen in the section on perturbation analysis of the least squares problem that in certain cases such as when the residual is zero. First.8. Note that Cond(A) = 1:4142 104 > 10 2 = 104. However. may give rise to numerical di culties in certain cases.2). the normal equations method will introduce more errors in the solution than what is warranted by the data.3 1 C 4 0 C A . 410 .7. then kx . we easily see that. t Example 7. it may even be singular.8 1 T A = 1 + 10 A 1 1 + 10.The normal equations method. xk ^ Cond(AT A) t Consider 0 1 B A = B 10. The following simple example illustrates that fact. Second. we might lose some signi cant digits during the explicit formation of AT A and the computed matrix AT A may be far from positive de nite computationally. it has been shown by Stewart (IMC. in these cases. the accuracy of the least squares solution using normal equations will depend upon the square of the condition number of the matrix A.

bk will be minimized if x is chosen so that d R x . bk = QT Ax . we have 2 2 c where QT b = . Thus. kAx . despite ill-conditioning as the following computations show: ! 1:00000001 1 AT A = 1 1:00000001 ! T b = 2:00000001 C = A 2:00000001 ! 0 T A = 1:000000005 H = the Cholesky factor of A : 0:999999995 0:00014142135581 Solution of Hy = c: ! 2 y= : 0:00014142135651 Solution of H T x = y : ! 1! 0:999999999500183 x= : 1:0000000499817 1 (The exact solution. c = 0: 1 ! 411 .8. In fact. the computed matrix AT A could be obtained as a symmetric positive de nite matrix and the normal equations method would yield an accurate answer. we must stress that the method is still regarded as a useful tool for solving the least squares problem. if we would use an extended precision in our computations. ck + kdk 2 2 2 2 1 2 2 2 2 Let QT A = R = 1 be the QR decomposition of the matrix A. at least in the case where the matrix A is well conditioned.) 7. Then.2 The QR Factorization Methods for the Full-Rank Problem ! R kAx .A Special Note on the Normal Equations Method In spite of the drawbacks mentioned above about the normal equations method. since the length of a 0 vector is preserved by an orthogonal matrix multiplication. Note that in the example above. QT b = kR x . it is routinely used in many practical applications and seems to be quite popular with practicing engineers and statisticians.

The corresponding residual then is given by krk = kdk 2 2 This observation immediately suggests the following QR approach for solving the least squares problem: Least Squares Using QR (1) Decompose Am n = Qm m Rm n. the solution x is obtained by back substitution. Suppose that the Householder method is used to compute the QR decomposition of A. where Hk k = 1 : : : n are the Householder matrices such that QT = HnHn. we see that the product QT b can be formed as: For k = 1 2 : : : n b = Hk b. 1 HH: 2 1 Thus. (2) Form QT m bm n 1 c = . following the notations of Chapter 5. Then. The idea of solving the least squares problem based on the QR factorization of A using the Householder transformation was rst suggested by Golub (1965). the matrix Q does not have to be formed explicitly. 412 . d R x=c 1 ! (3) Solve the upper triangular system where R= R 0 1 ! : Use of Householder Matrices Once A is decomposed into QR and the vector QT b is formed.

That is. The method is stable. Speci cally. (b + b)k x 2 where E and b are small. the following algorithm computes the least squares solution x using Householder matrices H1 through Hn. d 2 1 ! of A. Since the cost of the algorithm is dominated by the cost of the QR decomposition n m . c H H b= . Example 7.Algorithm 7. Apply the Householder method to A. kE kF and k bk 2 c n kAkF + O( ) c kbk + O( ) 2 2 2 where c (6m .4 01 21 B C A = B2 3C @ A 3 4 031 B C b = B5C @ A 9 413 . 3n + 41)n and is machine precision. the overall op-count for the full-rank least squares solution using Householder's method is Flop-count. Solve R1x = c.8. the method is about twice as expensive as the normal equations method. Hanson (SLP. Obtain R1 and the Householder matrices H1 H2 : : : Hn. p. 1. 90) that the computed solution x is such that it minimizes ^ Round-o Error and Stability. n (Exercise): 3 2 Thus. 2. It has been shown in Lawson and k(A + E )^ . Form HnHn.1 3.2 The Householder-Golub method for the full-rank least squares problem Given an m n matrix A (m n) with rank(A) = n and an m-vector b. Note that the 2 3 normal equations method requires about n 2m + n 6 op. where c is an n-vector. the computed solution is the exact least squares solution of a nearby problem.8.

1 1 0:0001 .5:3452 1 C R! 0:6547 C C= 1 : C C 0 A 0 1 (2) QT b c = d 0 .0:7071 0:7071 C @ A 0 .0:3333 Norm of the residual = krk2 = kdk = 0:8115 Example 7.0:4364 0:4082 .3:7417 B 0 B R = B B B @ 0 0:8724 0:4082 C 0:2182 .0:0001 B C Q = B .8.1 .0:8018 0 .4.0:2182 C : @ A 0:8165 ! (3) Solution of the system R1x = c: .3 from Chapter 5.10:6904 1 B C = B .1 1 ! B B 0 0:0001 C = R C R = @ A 1 0 0:7071 0:7071 0 (See Example 5.) 0 0 414 .5:3452 ! .(1) A = QR: 0 .3:7417 .0:8165 C A .0:0001 .0:5345 @ .5 0 1 B A = B 0:0001 @ (1) A = QR 0 1 C 0 C A 0:0001 1 0 2 1 B C b = B 0:0001 C @ A 0:0001 0 .0:2673 B Q = B .0:2182 The least squares solution is ! 3:3332 x= : .10:6904 ! x= 0 0:6547 .

Note that 0:0005 5 . known as the Gram-Schmidt process.5 5 1 Ay b = : 1 Norm of the residual = krk2 = kdk = 0.2 ! = 0:0001 . introduced by M. Gentleman (1973). use the Givens rotations to decompose A into QR and then use this decomposition to solve the least squares problem.2 1 ! T b = B 0:0001 C = c B C Q @ A 0 d (3) Solution of the system: R1 x = c . However. 156-157). For details. can be used to decompose A into QR. We rst state these algorithms for QR factorization and then show how the modi ed Gram-Schmidt can be used to solve the least squares problem.1 ! 0 0:0001 x x ! 1 1 1 = 1 2 = 1: x x 1 2 ! . 415 . there are \square-root free" Givens rotations. which can be used to solve the least squares problem. when used to solve the linear least squares problem. as we have seen before. is just slightly more expensive than the Householder method. see Golub and Van Loan MC 1984 (pp.(2) 0 . A properly implemented Gram-Schmidt process. Use of Gram-Schmidt and Modi ed Gram-Schmidt Algorithms Another technique.1 The unique least squares solution is . but seems to be as numerically as e ective as the Householder method for solving the least squares problem. The square-root free Givens rotations are also known as fast Givens rotations. ! Use of Givens Rotations We can. Recall that computations of Givens rotations require evaluations of square roots however.5 Ay = 103 ! 0:0005 . of course. the use of Givens rotations will be more expensive than the use of Householder matrices.

3 Classical Gram-Schmidt (CGS) for QR Factorization Given A = (a1 a2 : : : an) of order m n with rank(A) = n. (See later in this section for details. During the rkk = kqk k qk rqk : The algorithm. however. satis es QT Q = I + E . The two versions are numerically equivalent.1 X i=1 2 qk ak . where kE k Cond(A). 1 do rik qiT ak k.4 Modi ed Gram-Schmidt (MGS) for QR Factorization qk For j = k + 1 : : : n do The above is the row-oriented modi ed Gram-Schmidt method. the following algorithm computes a matrix Qm n = (q1 : : : qn) with orthonormal columns and an upper triangular matrix R = (rij )n n such that A = Qm n Rn n. the computed qk 's can be far from orthogonal. The column-oriented version can similarly be developed (Exercise #17). as outlined above. For k = 1 2 : : : n do For i = 1 2 : : : k . computes the QR factorization of A in which. at the kth step. is known to have serious numerical di culties. Givens method for computing the QR factorization of A.8. (For more details. The following algorithm. as a result. the kth column of Q and the kth row of R are computed (note that the Gram-Schmidt algorithm computes the kth columns of Q and R at the kth step).) 416 T rkj qk qj qj qj . kk rikqi computations of the qk's. see the discussions in the next section. For k = 1 2 : : : n do rkk = kqkk2 Algorithm 7.) The algorithm. denoted by Q. It can be shown (Bjorck (1992)) that the ~ ~ ~ computed Q by MGS. Set qk = ak k = 1 2 : : : n. rkjqk : qk rkk Remarks: The modi ed Gram-Schmidt algorithm is not as satisfactory as the Householder or . can be modi ed to have better numerical properties. known as the modi ed Gram-Schmidt algorithm.8. cancellation can take place and. whereas the ~ ~ Householder method produces QT Q = I + E where kE k .Algorithm 7.

5 Form Q and R: 0 0 1 B C q = B .0:7071 C @ A 2 1 0 1 1 0 B C Q = (q q ) = B 0:0001 .4 417 .6 0 1 B A = B 0:0001 @ 0 1 C 0 C A 0:0001 1 0 2 1 B C b = B 0:0001 C @ A 0:0001 Gram-Schmidt k=1: 0 1 1 B C q = a = B 0:0001 C @ A 1 1 0 1 1 B C q = rq = B 0:0001 C @ A 1 1 11 0 r = kq k = 1 11 1 2 0 k=2: r =1 12 001 B C q = a . r q = 10.1 C @ A 2 2 12 1 1 0:7071 T q1 q2 = . B . compared to mn .Flop-count. Example 7. All computations are performed with 4-digit arithmetic.) 2 2 3 3 Although in the 2 2 case the CGS and MGS algorithms produce the same results.7:0711 10. n needed by the Householder method.8. The op-count for the MGS is mn . we use a 2 2 example to illustrate here how the computational arrangements di er with the same matrix. (Note that MGS works with the full length column vector at each step.0:7071 C @ A 1 2 R= r 0 r22 11 ! r 12 0 0:7071 ! 1 1 = 0 1:414 10. whereas the Householder method deals with successively shorter columns.

0:7071 C @ A 2 2 2 22 4 0:7071 Form Q and R: 0 1 1 0 B C Q = B 0:0001 .8. (Note that for the same problem. 0 ! 2 c = QT b = 0 ! Modi ed Gram-Schmidt q =a 1 1 q =a 2 2 k=1: 0 1 1 B C q = B 0:0001 C @ A 1 r = kq k = 1 11 1 2 r = qT q 12 1 2 0 0 1 B C q = q . the Householder0 ! 1 QR method produced x = .) 1 418 ! . 22 0 0 1 B C q = rq = B .0:0001 C @ A 2 2 12 1 0 0:0001 k=2: r = kq k = 1:4142 10. r q = B .Form c: 2 The least squares solution is x = .5). which is correct (Example 7.0:7071 C @ A 0 0:7071 1 1 R= 0 1:4142 10.4 ! 2 The least squares solution is x = .

and the Householder methods (which is known to be very stable) and tabulated the results of ^ ^ kI .r q 2 12 1 T where r12 = q1 a2. Speci cally. one could use column pivoting to maintain orthogonality as much as possible. However. For example. With CGS the orthogonality of Q can be completely lost with MGS. QT Qk in extended precision.0:0001 1 ^ Note that in the above 2 2 example.1c Cond2(A()A) 2 Cond 2 assuming that c2 Cond2(A) < 1. the CGS and MGS algorithms are equivalent. it follows that jqT (q )j < (1:06)(2m + 3) ka k : 1 2 2 2 This shows that in CGS two computed vectors. the computed Q (in four-digit arithmetic) is such that ! ^ ^ On the other hand. the orthogonality of Q may not be acceptable when A is ill-conditioned. To compare the computations with the di erent Gram-Schmidt processes. can be far from being orthogonal. the modi ed Gram-Schmidt. their numerical properties are di erent. for the same problem using the Householder method. q k < (1:06)(2m + 3) ka k : 2 2 2 2 T Since q1 q2 = 0. it has been shown that c ^ ^ I . consider the computation of q2 by the CGS method.0:0001 : . as remarked earlier. On the other hand. q and q . Since in MGS the loss of orthogonality depends upon the condition number. where c1 and c2 are small constants. QT Q = I (in four-digit arithmetic). QT Q 2 1 . Thus. 419 . The results are displayed in the following table. given q1 with kq1 k2 = 1. it can be shown Bjorck (1992) that in MGS the loss of orthogonality 1 2 depends upon the condition number of the matrix A.Modi ed Gram-Schmidt versus Classical Gram-Schmidt Algorithms Mathematically. Then it can be shown (Bjorck (1992)) that k (q ) . ^ ^ QT Q = 1 . neither algorithm can be recommended over the Householder or the Givens method. as far as nding the QR factorization of A is concerned. We have q 2 a . we also computed QR factorizations of the 5 5 Hilbert matrix using the Gram-Schmidt.

and this c is used to solve Rx = c. but it is still not fully satisfactory.1 Chris Paige is a Australian-Canadian numerical linear algebraist. However. as far as the least squares problem is concerned. However. qn+1: ! x! . especially for solving large and sparse problems. while using MGS in the solution of a least squares problem. QT Qk Gram-Schmidt 1:178648780488241 10. z ) . He is a professor of computer science at McGill University.TABLE 7. There seems to be a growing tendency to use MGS in solutions of least squares problems even over the Householder method. it is a di erent story. special care has to be taken. then the computed least squares solution may not be accurate. Montreal. Use of the MGS method in the least squares solution We have just seen that the MGS is superior to the CGS for QR factorization. Canada.1 = (Q1 qn+1) ! R z 0 = Q1 (Rx .7 Modi ed Gram-Schmidt 4:504305729523455 10. Since Q may not be computationally orthogonal. b = (A b) . Bjorck and Paige (1992) have recently shown that the MGS is numerically equivalent to Householder QR factorization applied to A augmented with a square matrix of zero elements on top. the MGS is clearly preferred over the CGS.1 Comparison of QR Factorization with Di erent Methods ^ ^ Method kI .12 Householder 4:841449989971538 10. 420 .16 Remark: The table clearly shows the superiority of the Householder method over both the Gram-Schmidt and modi ed Gram-Schmidt methods of the latter two methods. who rejuvenated the use of the Lanczos algorithm in matrix computations by detailed study of the break-down of the algorithm. if c = QT b is computed using Q obtained from MGS. The MGS should be applied to the augmented matrix (A b) to compute the factorization: (A b) = (Q1 qn+1) From this it follows that R z 0 ! : x Ax .

Apply MGS to Am n to obtain Q = (q1 : : : qn ) and R.7 Consider solving the least squares problem using the MGS with 0 1 1 0 2 1 1 B C B C A = B :0001 0 C b = B :0001 C : @ A @ A 0 :0001 :0001 ! 1 The exact solution is x = . 2. HnHn.8. Details can be found in Bjorck (1990). The above discussion leads to the following least squares algorithm. 1 0 1 1 0 ! 1 1 B :0001 . Solve Rx = ( : : : n)T . For k = 1 : : : n do T k = qk b 1 b b . we get ( 1 2) = (2 0:0001).5 Least Squares Solution by MGS 1. Algorithm 7. bk2 will be a minimum when Rx = z .8. Example 7. 1 applied to Round-o property and op-count. 1 HH 2 1 R c = : A b 0 c 0 0 1 2 ! ! 421 . then kAx . and the solution of Rx = ( 1 2)T is the 1 x . It can be shown (see Bjorck and Paige (1992)) that the MGS for the QR factorization method is numerically equivalent to the Householder method ! 0 0 A b that is.If qn+1 is orthogonal to Q1. if we obtain If we now form c = Q 0 x using ! algorithm above. k qk 3. On the other hand.:7071 C C Q=B R= : @ A 0 :0001 0 :7071 ! T b and solve Rx = c. we obtain x = 2 . Thus. the least squares solution can be obtained by solving Rx = z and the residual r will be given by r = qn+1 .

From this equivalence, it follows that the MGS method is backward stable for the least squares problem. The method is slightly more expensive than the Householder method. It

requires about mn op, compared to the mn ; n op needed by the Householder method.
2 2

3

3

7.8.3 The QR Factorization Method for the Rank-Deficient Case
In this section we consider the rank-de cient overdetermined problem. As stated in Theorem 7.3.1, there are in nitely many solutions in this case. There are instances where, the rank-de ciency is actually desirable, because it provides a rich family of solutions which might be used for optimizing some other aspects of the original problem. In case the m n matrix A, m n, has rank r < n, the R matrix in a QR factorization of A is singular. However, we have seen that the use of the QR factorization with column pivoting can theoretically reveal the rank of A. Recall that Householder's method with column pivoting yields 0 where P is a permutation matrix, R11 is an r r nonsingular upper triangular matrix and R12 is r (n ; r). This factorization can obviously be used to solve the rank-de cient least squares problem as follows. Let

QT AP

=

R

11

0

R

12

!

PTx = y
~ AP = A and Then

QT b =

c : d

!

kAx ; bk

2 2

= QT APP T x ; QT b ~ = QT Ay ; QT b =
2 2

2 2

(2-norm remains invariant under orthogonal matrix multiplication)

R

0

11

! y! R
0
12

y

1 2

c ; d
422

!

2

2

= (R11y1 + R12y2 ; c)2 2 + d2 2 2

Thus, kAx ; bk2 will be minimized if y is chosen so that 2

R y = c;R y :
11 1 12 2

Moreover, the norm of the residual in this case will be given by:

krk = kb ; Axk = kck :
2 2 2

This observation suggests the following QR factorization algorithm for rank-de cient least squares solutions.

Algorithm 7.8.6: Least Squares Solutions for the Rank-De cient Problem Using QR Step 1: Decompose Am n P = Qm nRn m using column pivoting. ! Tb = c . Step 2: Form Q Step 3: Choose an arbitrary vector y .
2

d

Step 4: Solve the r r nonsingular upper triangular system:
R y = c;R y :
11 1 12 2

Step 5: Recover x from

x = Py:

Minimum Norm Solution
Since y2 is arbitrary, the above method will yield in nitely many least squares solutions, as Theorem 7.3.1 states. The solution obtained by setting y2 = 0 is called the basic solution. In case R12 = 0, the basic solution is the minimum norm least squares solution, i.e., among the in nite number of least squares solutions, it has the minimum norm. In case R12 6= 0, the basic solution can not be minimum norm solution (Exercise 20). In such a case the complete QR factorization with column pivoting can be used. Recall from Chapter 5 that the complete QR factorization of an m n matrix A is given by ! T AW = T 0 : Q 0 0 If pivoting is used, then ! T 0 T AP = Q W 0 0 423

where P is a permutation matrix. The minimum norm solution will then be given by

x = PW

T; c
1

!

0

PW

T; c
1

!

0

where c is a vector consists of the rst r elements of QT b. (Exercise #21(a))

Remarks:
1. A note on the use of column pivoting: We have shown that the column pivoting is useful for the rank-de cient least squares problem. However, even in the full rank case, the use of column pivoting is suggested (see Golub and Van Loan MC 1984). 2. For the rank-de cient least squares problem, the most reliable approach is the use of singular value decomposition (see Section 7.8.4 and Chapter 10).

Round-o property. It can be shown (Lawson and Hanson, SLP p. 95) that for the minimum length least squares problem, the computed vector x is close to the exact solution of a perturbed ^
problem. That is, there exist a matrix E and vectors x and b such that x is the minimum length ^ ^ least squares solution of (A + E )^ b + b, where x

kE kF (6m + 6n ; 6k ; 3s + 84)s kAkF + O( )
2

and k bk (6m ; 3k + 40)k kbk + O( 2 ), where k is the rank of A and s = min(m n). Moreover,

kx ; xk (6n ; 6k + 43)s kxk + O( ): ^
2

Note: In the above result we have assumed that R in R =
22

is zero. But in 0 22 ^ practical computations it will not be identically zero. In that case, if R22 is the computed version of R22, then we have
11 12

R

R R

!

kE kF

^ R

22

F

+ (6m + 6n ; 6k ; 3s + 84)s kAkF + O( 2):

Example 7.8.8 A Consistent Rank-De cient Problem
01 01 B C A = B2 0C @ A
0 0

031 B C b = B6C @ A
0 424

rank(A) = 1:

1 0 Step 1: AP = QR P = . 0 1

!

0 ;:4472 :8940 0 1 B C Q = B ;:8944 ;:4472 0 C @ A 0 ;2:2361 0 1 B C R = B 0 0C: @ A
0 0 1

0 ;6:7082 1 B C Step 2: QT b = B 0 C c = (;6:7082). @ A
0 Step 3: Choose y2 = 0.
11 1

0

0

Step 4: Solve R y = c ; R y :
12 2

The minimum norm least squares solution is

;6 y = Rc = ;2::7082 = 3: 2361
1 11

3 x=y= : 0

!

Example 7.8.9 An Inconsistent Rank-De cient-Problem
01 01 B C A = B2 0C @ A
0 0

011 B C b = B2C: @ A
0

0 ;0:4472 ;0:8944 0 1 0 ;2:2361 0 1 B C B C , Q = B ;0:8944 ;0:4472 0 C R = B 0 Step 1: PA = QR P = 0C: @ A @ A 0 1 0 0 1 0 0 ! c Step 2: QT b = c = ;2:2361. ! 1 0

Step 3: Choose y = 0.
2

d

1 Step 4: The minimum norm least squares solution is x = y = . 0 (Note that R11y1 = c1 ; R12y2 gives y1 = 1) 425

!

7.8.4 Least Squares Solution Using the SVD
Acting on the remark in the previous section, for the sake of curious readers and those who know how to compute the Singular Value Decomposition (SVD) of a matrix (for instance, using some software package such as MATLAB, LAPACK or LINPACK), we just state the following results which show how the SVD can be used to solve a least squares problem. A full treatment will be given in Chapter 10. Let A = U VT be the SVD of A, where A is m n with m n and = diag(

0 b0 B b0 = U T b = B ... @

1

b0m

1 C: C A

1

:::

n).

Let

Least Squares Solutions Using the SVD
A family of the least-squares solutions is given by

x = Vy
where

0y B y = B ... @

1

In the rank-de cient case, an expression for the minimum norm solution is

yn

1 C C A

8 b0 > i < yi = > i : arbitrary,
x=
r X uT b0 i i=1 i

i 6= 0 i = 0:

vi

where r = rank(A), and ui and vi are the columns of U and V , respectively. (Note that rank(A) = the number of nonzero singular values of A.)

426

Example 7.8.10

0 1 B A = B 0:0001 @

021 B C b0 = U T b = B 0 C @ A
0

1 2 C B 0:0001 C C 0 C b=B A @ A 0 0:0001 0:0001 A = U V T gives 01 0 ;0:0001 1 B C U = B 0 ;0:7071 0:7071 C @ A 0 0:7071 0:7071 0 1:4142 0 1 B C =B 0 0:0001 C @ A 0 0 ! 0:7071 ;0:7071 V= 0:7071 0:7071

1

0

1

y y= y

1 2

!

1:4142 = 0

!

1 x = Vy = : 1

!

7.9 Underdetermined Systems
Let A be m n and m < n. Then the system

Ax = b
has more equations than unknowns. Such a system is called an underdetermined system. An underdetermined system can be illustrated graphically as follows:

m<n

=

A x

b

Underdetermined systems arise in a variety of practical applications. Unfortunately, underdetermined systems are not widely discussed in the literature. An excellent source is the survey paper by Cline and Plemmons (1976). An underdetermined system has either no solution or an in nite number of solutions. This can be seen from the following theorem: 427

Theorem 7.9.1 Let Ax = b be an underdetermined system. Then every solution
x of the system can be represented as
where xR is any solution that is,

x = xR + xN AxR = b
and xN is in the null space of A, that is,

AxN = 0:

this case, the complete solution set is given by

Note. A full-rank underdetermined system has an in nite number of solutions. In

An Expression for the Solution Set of the Full-Rank Underdetermined System x = AT (AAT ); b + (I ; AT (AAT ); A)y , where y is arbitrary.
1 1

In the following, we describe two approaches for computing the minimum norm solution assuming that A has full rank, i.e., rank(A) = m.

428

1) 1 x = AT (AAT ). x: Then kyk = kx + zk 2 2 2 2 = kxk2 + kz k2 + 2(xT z ) 2 2 = kxk2 + kz k2 + 2(bT (AAT ).2 Let A be m n (m < n) and have full rank.1) given by (7.9.1A(y .9.9.7.1 Minimum Norm Solution for the Full-rank Undetermined Problem Using Normal Equations Step 1: Solve AAT y = b: 429 . b = 0).9. Since A has full rank. x) = Ay .9. b: (7.9. Algorithm 7. This shows that kyk 2 kxk : 2 2 Thus. The above discussion suggests immediately the following algorithm. AAT is nonsingular and clearly x is the unique solution of the system (7. To see that x is indeed the minimum norm solution. x)) 2 2 = kxk2 + kz k2 2 2 (since A(y . let's assume that y is another solution and de ne z = y .2).9.2) Proof.1 The Minimum Norm Solution of the Full-rank Underdetermined Problem Using Normal Equations Theorem 7. Then the unique minimum norm solution x to the underdetermined system Ax = b is given by (7. Ax = b . x is the minimum norm solution.

1 The equations AAT y = b are called the normal equations for the underdetermined system (7.1 1 2 3 A= 2 3 4 ! 6 b= : 9 14 20 20 29 .9. Example 7. 2 . 2 1 1+ ! 2 430 . For example. when ! 1 0 A= : 1 0 If is such that 10.Step 2: Form the minimum norm solution x: x = AT y: De nition 7. consider the explicit formation of AAT . then t 1+ AAT = 1 is singular. may arise in the underdetermined case as well.1 ! : 1 ! Step 1: AAT = y = ! Step 2: 011 B C x = AT y = B 1 C : @ A 1 Remarks: The same di culties with the normal equations. with t-digits.t < < 10.9.1). as pointed out in the overdetermined case.9.
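The two steps of Algorithm 7.9.1 are easy to carry out numerically. Below is a minimal NumPy sketch (illustrative, not from the text) applied to the data of Example 7.9.1; it recovers the minimum-norm solution x = (1, 1, 1)^T and agrees with the solution returned by numpy.linalg.lstsq.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0]])      # m = 2 < n = 3, full rank
b = np.array([6.0, 9.0])

# Step 1: solve A A^T y = b.
y = np.linalg.solve(A @ A.T, b)
# Step 2: form the minimum-norm solution x = A^T y.
x = A.T @ y

print(x)                                         # approximately [1, 1, 1]
print(np.linalg.lstsq(A, b, rcond=None)[0])      # the same minimum-norm solution
```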

2 The QR Approach for the Full-Rank Underdetermined Problem A QR factorization of A can. we have ! R QT AT = : 0 (Note that AT is n m n m. Thus. Then we have Partitioning y conformably. in particular. be used to solve an underdetermined system and.9. of course.7. to compute the minimum norm solution. the system Ax = b becomes (RT 0T )QT x = b: Denote y = QT x.) So. 1 . we get (RT or 0T ) (RT 0T )y = b: yR =b yN ! RT yR = b: yN = 0: The unique minimum norm least squares solution is obtained by setting So. the minimum norm solution is given by 0 The above discussion suggests the following algorithm. Step 3: Solve 1 2 RT yR = b: 431 Step 4: Form x = Q yR.2 Minimum Norm Solution for the Full-rank Underdetermined Problem Using QR Step 1: Find the QR factorization of AT : ! R QT AT = 0 : Step 2: Partition Q = (Q Q ). Since A in this case has more rows than columns. 1 2 x = Qy = (Q Q ) yR ! = Q1yR : Algorithm 7.9. we decompose AT into QR instead of A.

Note that if we use the Householder method to compute the Flop-count. the product Q1yR can be computed from the factored form of Q as the product of Householder matrices. there exist a matrix E and a vector x such that x is the minimum length solution of ^ ^ (A + E )^ b x where Round-o property.:2673 B Q = B . 93)) that the computed vector x is close to the exact minimum length least squares solution of a perturbed problem.QR factorization of A. It has been shown (Lawson and Hanson SLP (pp.3:7417 B R = B .9. m ops will be required to imple2 3 3 That is. A note on implementation.2 1 2 3 A= 2 3 4 ! 6 b= 9 ! : Step 2: 0 . 3m + 41)m kAkF + O( ): 2 Example 7. ment the above algorithm. Using Householder orthogonalization.1:6036 ! yR = : 0:6547 011 B C x = Q yR = B 1 C : @ A 1 1 432 .:4364 0 C :6547 C : A 0 1 Step 3: Step 4: The minimum norm solution: 0 . ^ kE kF (6n . m n .:5345 @ .5:3452 @ 1 :8729 1 C :2182 C A .:8018 0 .

The method is based upon an interesting observation made by Golub that the least squares solution x and the corresponding residual vector r satisfy the linear system Note that the above system is equivalent to and I A AT 0 ! r! x = 0 b ! : Ax + r = b AT r = 0 AT Ax = AT b: Thus one can apply the iterative re nement procedure described in Chapter 6 to the above augmented linear system. (2) Compute c(k) such that Ac(k) . which means that x is the solution of the normal equation 433 . do until convergence occurs: (1) r(k) = b . Ax(k) (compute the residual). Then For k = 1 2 : : :. A natural analog of the iterative re nement procedure for the linear system problem described in a section of Chapter 6 will be the following.10 Iterative Re nement It is natural to wonder if a computed least squares solution x can be improved cheaply in an iterative manner. which is due to Bjorck (1967a 1968). The scheme was proposed by Golub (1965).1 Linear System Analog Iterative Re nement For Least Squares Solution Let x(1) be an approximate solution of the least squares problem. r(k) 2 is minimum. well known for his outstanding contributions to solutions of least squares problems. He is a professor of mathematics at Linkoping University in Sweden. Algorithm 7.7. A successful procedure used widely in practice now follows. yielding the following scheme. (3) Correct the solution: xk ( +1) = x(k) + c(k): An analysis of the above re nement procedure by Golub and Wilkinson (1966) reveals that the method is satisfactory only when the residual vector r = b . Ake Bjorck is a Swedish numerical linear algebraist.10. Ax is su ciently small. as was done in the case of a linear system.

c02: c0 (4) Form c = Q 0 . one can implement the scheme rather cheaply.10. where accumulation of inner products is not necessary.2 Iterative Re nement for Least Squares Solutions rk (1) Compute k r Set r(0) = 0 x(0) = 0. Fortunately.Algorithm 7. T 0 A 0 b ! ! rk ! ( ) xk ( ) . ck r ! rk ! ck ! rk (3) Update the solution and the residual: = k + k . the above scheme would be quite expensive A 0 when m is large. r0 (2) Solve for c0 : RT c0 = r : Form QT r1 = 2 1 2 1 2 2 0 (3) Solve for c2: R1c2 = r1 . step 1 must be performed Implementation of Step 2 Since the matrix T is of order m + n.6. xk x c I A (2) Solve the system T A 0 ( ) 1 ( ) 2 ( ) 1 ( ) 2 ( +1) ( ) ( ) 1 ( ) 2 ( +1) ( ) in double precision. For k = 1 2 : : : do ( ) 1 ( ) 2 ! rk = k . (Using accumulation of inner products ! ck ! ! using double precision which is in contrast with the iterative re nement procedure for the linear system problem given in Section 6. then the system Q 0 ! I A can be transformed into I A AT 0 1 ! c! c 1 2 r = r 1 2 ! QT c 0 2 (RT 0)QT c1 = r2 : 1 + R 1 ! c = QT r 1 This shows that the above augmented system can be solved by solving two triangular systems and two matrix-vector multiplications as follows: r0 ! (1) . r 2 1 2 ! 434 .) I A = . Observe that if ! T A = R1 is the QR decomposition of A. Remark: Note that for satisfactory performance of the algorithm.

Flop-count. With the above formulation, each iteration will require only about 4mn - n^2 flops, assuming that the Householder method has been used and that Q has not been formed explicitly. Note that for the matrix-vector multiplications in steps 1 and 3, Q does not need to be formed explicitly; these products can be obtained when Q is known only in implicit (factored) form.

Round-off error. It can be shown (Bjorck (1992)) that the solution x^(s) obtained at iteration step s satisfies

    ||x^(s) - x||_2  <=  c μ Cond(A) ||x^(s-1) - x||_2,    s = 2, 3, ...,

where c is an error constant.

An Interpretation of the Result and Remarks

The above result tells us that the iterative refinement procedure is quite satisfactory: the error at an iterative refinement step depends upon the condition number of A. It is even more satisfactory for least squares problems with large residuals; note that for these problems (Cond(A))^2 serves as the condition number. For a well-conditioned matrix, convergence may occur even in one iteration. The procedure "may give solutions to full single precision accuracy even when the initial solution may have no correct significant figures" (Bjorck (1992)). On the other hand, Bjorck and Golub (1967) have shown that with an 8 x 8 ill-conditioned Hilbert matrix, three digits of accuracy per step, both for the solution and the residual, can be obtained.

Example 7.10.1

    A = ( 1  2 ),     b = ( 3 ),     r^(0) = 0,     x^(0) = 0.
        ( 2  3 )          ( 5 )
        ( 3  4 )          ( 9 )

k = 0:

(1)  ( r1^(0) )   ( b )
     ( r2^(0) ) = ( 0 ) = ( 3, 5, 9, 0, 0 )^T.

(2) Solve the system

        ( I    A ) ( c1^(0) )   ( r1^(0) )
        ( A^T  0 ) ( c2^(0) ) = ( r2^(0) ).

The computations of c1^(0) and c2^(0) using the QR factorization of A are shown below. Here r1^(0) = b and r2^(0) = 0, and

    Q^T r1^(0) = Q^T b = ( -10.6904 ),    so   r1' = ( -10.6904 ),    r2' = ( 0.8165 ).
                         (  -0.2182 )              (  -0.2182 )
                         (   0.8165 )

Solving R1^T c1' = r2^(0) = 0 gives c1' = 0, and solving R1 c2^(0) = r1' - c1' gives

    c2^(0) = (  3.3333 ),       c1^(0) = Q ( c1' ) = (  0.3333 ).
             ( -0.3333 )                   ( r2' )   ( -0.6667 )
                                                     (  0.3333 )

(3) Update the solution and the residual:

    r^(1) = r^(0) + c1^(0) = (  0.3333 ),      x^(1) = x^(0) + c2^(0) = (  3.3333 ).
                             ( -0.6667 )                                ( -0.3333 )
                             (  0.3333 )

Note that x^(1) = ( 3.3333, -0.3333 )^T is the same least squares solution as that obtained by the QR and the normal equations methods.

7.11 Computing the Variance-Covariance Matrix (A^T A)^{-1}

In statistical and image processing applications one very often needs to compute the variance-covariance matrix X = (A^T A)^{-1}. In order to see how this matrix arises in statistical analysis, consider the classical linear regression model

    b = A x + ε,

where A is m x n, with the properties

    1.  E(ε) = 0, and
    2.  cov(ε) = E(ε ε^T) = σ^2 I.

Here b is the response vector, A is the design matrix, and ε is the error term. Regression analysis is concerned with prediction of one or more (response) dependent variables from the values of a group of predictor (independent) variables. The parameters x and σ^2 are unknown. Assuming that A has full rank, the least squares estimate x̂ of x in the above model is given by

    x̂ = (A^T A)^{-1} A^T b.

Also, the residual is

    b - A x̂ = (I - A (A^T A)^{-1} A^T) b,

and a common estimate of σ^2 is

    ||b - A x̂||_2^2 / (m - n)

(note that A^T (b - A x̂) = 0, i.e., the residual is orthogonal to the columns of A). For details, see Johnson and Wichern (1988), and Bjorck (1992).

Since vital information might be lost in computing A^T A explicitly, one wonders if it is possible to compute X from the QR factorization of A. We show below how this can be done. Consider the QR factorization of A:

    Q^T A = ( R )
            ( 0 ),

where R is n x n upper triangular. Then

    A^T A = R^T R,     and thus     X = (A^T A)^{-1} = R^{-1} R^{-T}.

One can then easily compute R^{-1} and R^{-T}. However, computing (A^T A)^{-1} this way requires the explicit multiplication of R^{-1} by R^{-T}. This explicit multiplication can easily be avoided if the computations are reorganized. Thus, if x1 through xn are the successive columns of X, then from X = R^{-1} R^{-T} we have

    R (x1, ..., xn) = R^{-T}.

Since the last column of R^{-T} is just (1/r_nn) e_n, the last column xn is easily computed by solving the upper triangular system

    R xn = (1/r_nn) e_n.                                               (7.11.1)

By symmetry we also get the last row of X. Suppose now that we have already determined x_ij = x_ji, j = n, ..., k+1, i <= j. We now determine x_ik, i <= k. For x_kk we have

    x_kk r_kk + sum_{j=k+1}^{n} r_kj x_jk = 1/r_kk,

from which we obtain

    x_kk = (1/r_kk) ( 1/r_kk - sum_{j=k+1}^{n} r_kj x_kj ).            (7.11.2)

For i = k-1, ..., 1 we have

    x_ik = -(1/r_ii) ( sum_{j=i+1}^{k} r_ij x_jk + sum_{j=k+1}^{n} r_ij x_kj ).   (7.11.3)

(Note that x_kj = x_jk, j = k+1, ..., n, have already been computed.) Thus (7.11.1)-(7.11.3) determine all the entries of X. Note that since X is symmetric, only one-half of its entries need to be computed, and X can overwrite R.

Algorithm 7.11.1 Computing the Variance-Covariance Matrix X = (A^T A)^{-1}

1. Compute the last column xn by solving the system (7.11.1).
2. For k = n-1, n-2, ..., 2, 1 do
   2.1  Compute x_kk from (7.11.2).
   2.2  Compute x_ik from (7.11.3), i = k-1, ..., 1.
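A direct MATLAB transcription of Algorithm 7.11.1 is sketched below (our own illustrative code, not a MATCOM routine). For the 3 x 3 matrix used in Example 7.11.1 below it reproduces the matrix X and checks it against the explicitly inverted A^T A:

    % Minimal sketch of Algorithm 7.11.1: X = (A'*A)^(-1) = R^(-1)*R^(-T)
    % computed from the triangular QR factor R, without forming any inverse.
    A = [1 5 7; 2 3 4; 1 1 1];
    [~, R] = qr(A, 0);                 % economy-size QR: R is n x n upper triangular
    n = size(R, 1);
    X = zeros(n);

    % Last column: solve R*x_n = (1/r_nn)*e_n                       (7.11.1)
    en = zeros(n, 1);  en(n) = 1/R(n,n);
    X(:, n) = R \ en;
    X(n, :) = X(:, n)';                % symmetry gives the last row

    for k = n-1:-1:1
        % Diagonal entry                                             (7.11.2)
        X(k,k) = (1/R(k,k)) * (1/R(k,k) - R(k,k+1:n)*X(k+1:n,k));
        % Remaining entries of column k                              (7.11.3)
        for i = k-1:-1:1
            X(i,k) = -(1/R(i,i)) * (R(i,i+1:k)*X(i+1:k,k) + R(i,k+1:n)*X(k,k+1:n)');
            X(k,i) = X(i,k);           % symmetry
        end
    end

    norm(X - inv(A'*A))                % tiny; X = [1.5 -6 4; -6 35 -24; 4 -24 16.5]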

Flop-count. The computation of X = (A^T A)^{-1} using the above formulas requires about n^3/3 flops.

Example 7.11.1

    A = ( 1  5  7 ).
        ( 2  3  4 )
        ( 1  1  1 )

A = QR gives

    Q = ( -0.4082   0.9045   0.1231 ),      R = ( -2.4495  -4.8990  -6.5320 )
        ( -0.8165  -0.3015  -0.4924 )           (  0        3.3166   4.8242 )
        ( -0.4082  -0.3015   0.8616 )           (  0        0       -0.2462 ).

The last column (using (7.11.1)):  x3 = ( 4, -24, 16.5 )^T.
Using (7.11.2):  x22 = 35.   Using (7.11.3):  x12 = -6.   Using (7.11.2) again:  x11 = 1.5.
Thus

    X = (A^T A)^{-1} = (  1.5   -6      4   )
                       ( -6     35    -24   )
                       (  4    -24     16.5 ).

7.12 Review and Summary, and Table of Comparisons

1. Existence and Uniqueness. The least squares solution to the problem Ax = b always exists. In the overdetermined case, it is unique iff A has full rank (Theorem 7.3.1).

2. Overdetermined Problems. We discussed two methods: the normal equations method and the QR factorization method. The normal equations method is simple to implement, but there are numerical difficulties with this method in certain cases. The QR factorization methods are more expensive than the normal equations method, but are more reliable numerically. If one insists on using the normal equations, the use of extended precision is recommended.

One can use Householder or Givens transformations, or the modified Gram-Schmidt (MGS) method, to achieve the QR factorization of A needed for the solution of the least squares problem. The Householder QR factorization is the most efficient and numerically viable; the modified Gram-Schmidt method is only slightly more expensive than the Householder method, and is not as reliable as the Householder or the Givens method for QR factorization. However, with some reorganization it can be made competitive with the Householder method for computing the least squares solution. Indeed, the popularity of using the MGS method in solving the least-squares problem is growing. Rice (1966) first noted the superiority of this algorithm, and some numerical experiments even suggest the superiority of this method over other QR methods.

3. Rank-Deficient Overdetermined Problems. For the rank-deficient overdetermined problem there are infinitely many solutions. Here we have discussed only the QR factorization method with column pivoting; the most numerically viable approach for dealing with rank deficiency is the SVD approach, to be discussed in Chapter 10.

4. Underdetermined Problems. For the underdetermined problem there are infinitely many solutions in the full-rank case. We have discussed the normal equations method and the QR factorization method for the minimum-norm solution. The normal equations here are

    A A^T y = b,

whereas those of the overdetermined problem are

    A^T A x = A^T b.

The numerical difficulties with the normal equations are the same as those of the overdetermined problem.

The round-off error analyses for the underdetermined and the rank-deficient overdetermined problems are somewhat complicated, because solutions in these cases are not unique unless one imposes the requirement of having the minimum-norm solution. The backward round-off analyses of the QR factorization methods using Householder transformations in these cases show that the computed minimum-norm solution in each case is close to the minimum-norm solution of a perturbed problem. This is in contrast with the results obtained by the Householder-QR factorization method for the full-rank overdetermined problem, where it is shown that the computed solution is the exact solution of a perturbed problem. See Bjorck (1992).

5. Perturbation Analysis. The results of the perturbation analyses are different for the different cases of perturbations in the data.

If only b is perturbed, then Cond(A) = ||A|| ||A†|| serves as the condition number for the unique least squares solution. If A is perturbed, then the sensitivity of the unique least squares solution, in general, depends upon the square of the condition number (see the theorems of Section 7.7). In certain cases, such as when the residual is zero, the sensitivity depends only on the condition number of A.

6. Iterative Refinement. As in the case of the linear system problem, it is possible to improve the accuracy of a computed least squares solution in an iterative fashion. An algorithm which is a natural analog of the one for the linear system (Section 6.9) is satisfactory only when the residual vector r = b - Ax is sufficiently small. A widely used algorithm due to Bjorck is presented in Section 7.10. This algorithm requires the solution of an augmented system of order m + n (where A is m x n); it is shown how to solve this system in a rather inexpensive way using the QR factorization of A. The solution obtained by this iterative refinement algorithm is quite satisfactory.

7. Computing the Covariance Matrix. An efficient computation of the matrix (A^T A)^{-1} is important in statistical and image processing applications. In Section 7.11 we show how to compute this matrix using the QR factorization of A, without finding the inverse explicitly.

TABLE 7.2
COMPARISONS FOR DIFFERENT LEAST SQUARES METHODS

Problem            Method                  Flop-Count                   Numerical Properties
--------------------------------------------------------------------------------------------------------
Overdetermined     Normal Equations        mn^2/2 + n^3/6               (1) Difficulties with formation of A^T A.
Full-Rank                                                               (2) Produces more errors in the solution than
                                                                            are warranted by the data, in certain cases.

Overdetermined     Householder-QR          mn^2 - n^3/3                 Stable: the computed solution is the exact
Full-Rank                                                               solution of a nearby problem.

Overdetermined     MGS-QR                  mn^2                         Almost as stable as the Householder-QR.
Full-Rank

Overdetermined     Householder-QR          2mnr - r^2(m + n) + 2r^3/3,  Mildly stable: the computed minimum-norm
Rank-Deficient     with column pivoting    where r = rank(A)            solution is close to the minimum-norm
                                                                        solution of a perturbed problem.

Underdetermined    Normal Equations        m^2 n + m^3/6                Same difficulties as in the case of the
Full-Rank                                                               full-rank overdetermined problem.

Underdetermined    Householder-QR          m^2 n - m^3/3                Same as the rank-deficient
Full-Rank                                                               overdetermined problem.
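The contrast summarized in Table 7.2 is easy to observe numerically. The following sketch (our own, using an arbitrarily chosen polynomial design matrix) compares the two full-rank methods; the exact error values will vary with the data, but the normal equations solution typically loses roughly twice as many digits as the QR solution:

    % Minimal sketch contrasting the normal equations method and the
    % Householder-QR method on an ill-conditioned full-rank LS problem.
    m = 20;  n = 8;
    t = linspace(0, 1, m)';
    V = fliplr(vander(t));          % columns are t.^0, t.^1, ..., t.^(m-1)
    A = V(:, 1:n);                  % 20 x 8 polynomial design matrix (ill-conditioned)
    x_true = ones(n, 1);
    b = A * x_true;

    x_ne = (A'*A) \ (A'*b);         % normal equations: works with (Cond(A))^2
    [Q, R] = qr(A, 0);              % economy-size Householder QR
    x_qr = R \ (Q'*b);              % QR method: works with Cond(A)

    [cond(A)  norm(x_ne - x_true)  norm(x_qr - x_true)]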

7.13 Suggestions for Further Reading

The techniques of least squares solutions are covered in any numerical linear algebra text, and in some numerical analysis texts. The emphasis in most books is on the overdetermined problems. For a thorough treatment of the subject we refer the readers to the book by Golub and Van Loan (MC). A complete book devoted to the subject is the (almost classical) book by C. L. Lawson and R. J. Hanson (SLP). There is also a nice recent monograph by Bjorck (1992) that contains a thorough discussion of least squares problems. These two books are a "must" for readers interested in further study on the subject. The SLP by Lawson and Hanson, in particular, gives the proofs of the round-off error analyses of the various algorithms described in the present book.

A classical survey paper of Golub (1969) contains an excellent exposition of the numerical linear algebra techniques for the least squares and singular value decomposition problems arising in statistics and elsewhere. The books by Stewart (IMC), by Watkins (FMC), and by Gill, Murray and Wright (Numerical Linear Algebra and Optimization) also contain detailed discussions on perturbation analyses; the books MC and FMC contain a very thorough treatment of the perturbation analysis. A paper by Stewart (1987) is also interesting to read. For more on underdetermined problems, see the paper by Cline and Plemmons (1976). See, in particular, the references given in these books and papers and the list of references appearing in the recent monograph of Bjorck (1992), representing the most fundamental contributions in this area. The readers are highly recommended to read these papers along with other papers in the area by Golub, Stewart, Bjorck, etc.

Any book on regression analysis in statistics will contain applications of the least squares problem in statistics. We have, in particular, used the book Applied Linear Regression Models, by John Neter, William Wasserman and Michael Kutner, Richard D. Irwin, Inc., Illinois (1983), for details.

AND 7. Prove Theorem 7.7 5.6. Using least squares. Prove that x is a least squares solution i Ax = bR and b . 0 1 1 1 ! 0:0001 1 B C A= A = B 0:0001 0 C @ A 1 1 0 0:0001 01 2 31 ! 7 6:990 B C A= A = B3 4 5C: @ A 4 4 0 7 8 444 .5. Find the generalized inverse and the condition number of each of the following matrices. Ax = bN where bR and bN are.3. 7. Let A be m n m n. 2. t a straight line and a quadratic to x 0 1 3 5 7 9 12 y 10 12 18 15 20 25 36 Compare your results. Show that the residual vector r = b . 3. 4.3 1. 6. respectively. both in the full-rank and rank-de cient cases. Ax is orthogonal to all vectors in R(A). 7. Compute the condition number of the associated Vandermonde matrix in each case. Let A have linearly independent columns. based on QR factorization of A.1. PROBLEMS ON SECTIONS 7. Then prove that (Cond2 (A))2 = Cond2 (AT A). the range-space and column-space components of the vector b. Prove that the matrix AT A is positive de nite i A has rank n.Exercises on Chapter 7 PROBLEMS ON SECTION 7.

2. the unique least squares solutions ~ to the problems Ax = b and ~ where A = A + E . 445 .7.3A: @ A 0 0 11. Let A and A have full rank.7. respectively.) 9. Verify the inequality of Theorem 7.4 in each of the following cases.4A @ A 05 1 6 1 1 B C (b) A = B 10.~ 8. Work out a proof of Theorem 7.4 01 11 B C (c) A = B 0 1 C E = 10. 01 21 B C (a) A = B 3 4 C E = 10. Then prove that ~~ Ax = b k k +(Cond(A))2 kEk 1 + kEk : kA kA (Hint: Apply the perturbation analysis of the linear systems with normal equations. A : 10. xk ~ kxk ~ k kb Cond(A) kEk 1 + kxk kA ~k 01 21 B C A = B3 4C @ A 5 6 4 031 B C b=B 7 C @ A 11 E = 10. Verify the inequality of problem 8 with the following data: kx . Let x and x be.4A @ A 0 10.4 0 C E = 10.

Let A = B 2 3 C @ A (a) (b) (c) (d) (e) 0 1 1 Find the unique least squares solution x using (i) x = Ay b.PROBLEMS ON SECTIONS 7. Find r and r and verify the inequality of Theorem 7. Develop an algorithm to solve the normal equations whose system matrix is positive semide nite and singular. Find Cond(A). ~ ~ Let A = A = 10. (iii) the Householder and the Givens QR factorization methods. based on complete pivoting of the system matrix. (ii) the normal equations method. @ A 14.4A and let x = Ay b where A = A + E . Apply your algorithm to the normal equations with ! 1 1 A= 1 1 and ! 2 b= : 2 01 11 B C 13. Consider the following well-known ill-conditioned matrix 01 1 11 B 0 0C C B C jj A=B C B B0 0C A @ 0 0 (Bjorck (1990)).7. (Show all your work. Show that for this problem. Find kxk. 446 1 . when only A is perturbed. depends upon Cond(A). Construct your own example where the sensitivity of the least squares problem will depend upon the square of the condition number of the matrix.7 AND 7.8 12.3.) 15.kxk and verify the ~ ~ x inequality of Theorem 7.7. the sensitivity of the least squares problem. (iv) the CGS and MGS methods. ^ 001 B C b = B 5 C.2 for this problem.

(Square-root free Cholesky) Given a symmetric positive de nite matrix A. Find the maximum departure from orthogonality of the computed columns of the Q matrix using the CGS and MGS methods. Keep A unchanged. Change A to A1 = A + A where A = 10. Keep b unchanged.B B (b) Find the least squares solution to Ax = B B B @ (a) Choose an small. 031 B C Change b to b1 = B 0 C. Then compute Cond2 (A) to check that A is ill-conditioned. (b) Show that the op-count for both the Gram-Schmidt and the modi ed Gram-Schmidt are mn2. (a) Construct an example of the least squares problem where Gram-Schmidt will yield a poor result but modi ed Gram-Schmidt will do fairly well. 3 18.3 A. Find an upper bound for the relative error @ A 0 in the least squares solution. 17. (a) Derive a column-oriented version of the modi ed Gram-Schmidt process. develop an algorithm for nding the Cholesky decomposition of A without any square-roots: A = LDLT where L is a unit lower triangular matrix and D is a diagonal matrix with positive diagonal entries. 031 C C C using C C A (c) (d) (e) (f) (i) the normal equations method. so that rank(A) = 3. n . (c) Show that the op-count for the Householder method is n2 m . Find an upper bound for the relative error in the least squares solution. CGS and MGS methods. Compute the least squares solution of the problem in (b) using the SVD. 16. Apply your algorithm to solve the full-rank least squares problem based on solving normal equations. (ii) the QR factorization method with the Householder. 447 . where A is m n (m n).

(a) Show that the minimum-norm solution to Ax = b. Show that the op-count for QR factorization with column pivoting using the Householder method is 2mr . Apply your algorithm to problem #22. Give a op-count for the algorithm. Show that C @ A 19. Develop an algorithm based on the QR factorization with column pivoting to compute the minimum norm solution to the underdetermined system Ax = b where A is m n m < n.2 for computing the minimum norm solution of the full-rank underdetermined problem. with rank(A) = r.8. obtained by complete QR factorization with column pivoting is given by 34 39 T in both cases the loss of orthogonality between q1 and q2 is unacceptable: q1 q2 = 0:09065. c 1 ! 0 where c is a vector consisting of the rst r elements of QT b. r2(m + n) + 2r3=3. 20. Apply your algorithm to 0 !Bx 1 2 3 B x 4 5 6 @ x 1 2 3 1 ! C C= 6 : A 15 23. (b) Find the minimum norm solution to the least squares problem with 01 1 11 B C A = B1 1 1C @ A 1 1 1 031 B C b = B3C: @ A 3 PROBLEMS ON SECTION 7. using t = 4.0 8 21 1 B 13 34 C B C C (b) Apply the CGS and MGS methods to the matrix A = B B B 21 35 C. the basic solution cannot be the minimum-norm solution unless R12 is zero. x = PW T.9 22. Show that for the QR method with column pivoting in the rank-de cient case. 21. 448 . where r = rank(A). Develop an algorithm similar to the one given in section 7.

2 Solve the least squares problem: Find c(k) such that Ac k . Using the above formula. 2.4 C @ A @ A 0 10. C : B.3 Correct the solution: xk ( +1) = x(k) + c(k): (a) Apply three iterations of the algorithm to each of the following problems: 011 031 B C B C (i) A = B 2 C b = B5C @ A @ A 03 1 1 9 0 2 1 1 B C B C (ii) A = B 10.4 @ x3 A 1 0 0 10 1 x 4 25.1 r(k) = b .C @ A 0 2. For k = 1 2 : : : do 2. if PA is the orthogonal projection onto R(Ay ).) 2. That is.4 0 C B x2 C = B 1 C : B CB C B C @ AB C @ A B C . 001 B0C B C (1) 1. then the minimum norm solution x is given by y x = PA y where y is any solution.24.C B. Ax(k) .1. x = B . (Compute the residual. PROBLEMS ON SECTION 7.9. Consider the natural algorithm for iterative re nement for improving a computed least squares solution described in the beginning of section 7.10 26.10 with x(1) = (0 : : : 0)T .9. compute the minimum norm solution to the system 0 1 0 1 10. Using Theorem 7.4 0 C b = B 10. prove that the minimum norm solution to an underdetermined system y can be obtained by projecting any solution to the system onto R(Ay ).4 0 1 B x1 C 0 1 1 0 B 1 0 10.4 10. r k ( ) ( ) 2 is minimum.1.4 449 . Give a geometrical interpretation of Theorem 7.

2 3 4 6 7 7 Hilbert matrix and 0 1 B (iii) A = B 10. PROBLEM ON SECTION 7.2 to each of the problems of exercise #26. and compare the results.10. 1 b = 1 1 1 1 5 1 1 T. Show that these diagonal entries are just the squares of the 2-norms of the rows of R.1 are needed.1 1 C B C 4 0 C b = B1C A @ A .2 to the least-squares problem with the 7 .4 0 10 1 (b) What is the relationship of this algorithm with the iterative algorithm using the augmented system? (Algorithm 7. Apply Algorithm 7. 28. In many applications.10. Apply Algorithm 7.11 29. with and without explicitly solving the augmented system in step 2.2) 27. @ 1 0 1 Tell how many digits of accuracy per iteration step both in the solution and the residual were obtained. and that these can be computed only with 2 n3 ops.1. only the diagonal entries of the Variance-Covariance matrix X = (AT A).10. 3 450 .

etc. 3. 1. C C B1C @ A @ A 0 0 1 0 0 . lsudqrh.8 12 20.MATLAB PROGRAMS AND PROBLEMS ON CHAPTER 7 You will need the programs housqr. lsfrqrh.7. norm. null. write a MATLAB program to implement the QR algorithm using Givens Rotations for the full rank overdetermined least-square problem: 0 1 1 011 1 B 10. 451 . Plot the original data point and the least-squares ts using the MATLAB command plot and polyval. lsitrn2.9 51.b). orth. lsfrmgs.9 42. B B B 0 0:0003 C C B :0001 C C @ A @ A 3 3 4 4 x] = lsfrqrg (A. B :0001 C : B C EA = 10.4 on di erent sensitivities of least-squares problems. 2. Let 0 0:0001 :0001 Using MATLAB commands pinv. (Implementation of the Least-squares QR Algorithm using Givens Rotations).9 6. reganal from MATCOM. 0 C B C B C B C b = B1C: B C A=B B C B 0 10. and compare the results.7. clgrsch.5 73 90. lsitrn1. lsudnme. lsfrnme.. givqr. verify the inequalities of Theorems 7.1-7. Using givqr and bcksub.5 Using the MATLAB Command vander and the operation `n' compute the least-square t of the data to polynomials of degree 1 through 4. cond.0:0001 1 0 :0001 1 B B C B 0 0:0009 C C B C b = a0.5 30. mdgrsch. Consider the following set of data points: x 0 1 2 3 4 5 6 7 8 9 y 1 2.

) a. 452 .3 . Using the results of (a). Q R] = mdgrsch(A).) 4. A randomly generated matrix of order 10.3 0 C B 0 C B C 3. 2. b = B . Q R] = givqr (A) (Givens QR) iii.3 0 C. C @ A . Hilbert matrix of order 10. make the following table for each matrix A. Q R] = qr(A) from MATLAB or Q R] = housqr (A) from Chapter 5. Q R]= clgrsch(A) from MATCOM (Classical Grass-Schmidt) iv. ii. from MATCOM (Modi ed Gram-Schmidt) ^ ^ b. 1. B B 0 10. (Householder QR).g.3 C B C case has all entries equal to 1. (The Purpose of this exercise is to compare the accuracy and op-count and timing for di erent methods for QR factorization of a matrix. 4 and 5 For problems 4 and 5 use the following sets of test-data. 0 1 1 1 1 B 10.Test-Data for Problems 3. generate b so that the least-squares solution x0 each in 1 3 B 10. for data matrix in #3. Q and R stand for the computed Q and R.3 C B C B 10 C @ A 10. (e.3 0 0 10 For problem #5. Compute the QR factorization for each matrix A of the data set using i.

b) (Least-squares using Givens QR) iv. op-count and timing ^^ kA . ^ ^ METHOD k(Q)T Q . b. x] = lsfrqrh (A. (The purpose of this exercise is to compare the accuracy. pinv is a MATLAB command for computing the generalized inverse of a matrix. x] = lsfrqrg (A. lsfrqrh and lsfrnme are all available in MATCOM.) a.(COMPARISON OF DIFFERENT QR FACTORIZATION METHODS). Using the results of (a) make one table for each data set x. x] = lsfrmgs(A. x] = pinv (A) * b (Least-squares using generalized inverse) Note: lsfrmgs. 453 . QRk kAk 2 2 Table Flop-Count Elapsed Time for di erent least-squares methods for full-rank.b) (Least-squares using Normal Equations) v. Write your conclusions. Compute the least-squares solution x for each data set using ^ i. overdetermined problems. I k2 housqr givqr clgrsch mdgrsch c.b) (Least-squares using Householder QR) iii. 5.b) (Least-squares using modi ed Gram-Schmidt) ii. ^ Note also that the vector x has all entries equal to 1. x] = lsfrnme (A.

6 of the book. Write your conclusions. 7.8. and the corresponding residual r.V] = svd (A) to compute the singular value decomposition of A. using Householder QR factorization of A: x r] = lsrdqrh (A.b) Use the same test data as in problem #6 and compare the results with respect to accuracy.b) to compute the minimum norm leastsquares solution x to the rank-de cient overdetermined problem Ax = b. op-count and elapsed time. 6.b) (This program should implement Algorithm 7. xk =kxk ^ 2 2 kAx . 454 . Using the MATLAB function U. write a MATLAB program. called lsrdqrh(A. and backsub from Chapter 6. called lsrdsvd to compute the minimum norm least-squares solution x to the rank-de cient overdetermined system Ax = b: x] = lsrdsvd (A.(COMPARISON OF DIFFERENT METHODS FOR FULL-RANK OVERDETERMINED LEAST-SQUARES PROBLEM) METHOD Table kx .) Test Data: A 20 2 matrix with all entries equal to 1 and b is vector with all entries equal to 2.S. Using housqr from MATCOM or Q R] = qr(A) from MATLAB. bk ^ 2 Flop-Count Elapsed Time lsfrmgs lsfrqrh lsfrqrg lsfrnme generalizedinverse c. write a MATLAB program.

and compare the results with respect to accuracy.10. a. elapsed time. @ 3 0 C C C 3 0 C A . Run the program reganal from MATCOM. ! ! 1 2 3 4 5 6 1 1 1 1 1 1 1 1 A= A= 0 1 2 3 4 5 6 7 0 1 0 0 0 0 0 Construct b for each A so that the minimum norm solution x has all its entries equal to 1. 0 10 10 10 10 10 10 10 1 B C A = B 0 1 0 0 0 0 0 C: @ A Test Data: A = The 8 8 Hilbert matrix 0 1 1 1 1 B 10.2) from MATCOM on the 8 8 Hilbert matrix A and constructing b randomly. Compare the algorithms on this data with respect to number of iterations. Compute (AT A). and elapsed time.1 using MATLAB command inv.10. Compute explicitly (AT A). op-count. b. ii. and elapsed time. and op-count. both for the solution and the residual.1) and lsitrn2 (based on Algorithm 7. Run the programs lsitrn1 (based on Algorithm 7.3 0 0 10 A = A 8 8 randomly generated matrix C 455 . 10. Run the programs lsudnme (least-squares solution for the underdetermined full-rank problem using normal equations) and lsudqrh (least-squares solution for the underdetermined full rank problem using Housholder QR factorization) from MATCOM on the following sets of data to compute the minimum norm solution x to the full rank underdetermined problem Ax = b.1 for each of the following matrices A as follows: i. 0 B A=B B B 0 10. Compare the results with respect to accuracy. 9.8. op-count.

8.1  Introduction                                                                 457
8.2  Some Basic Results on Eigenvalues and Eigenvectors                           458
     8.2.1  Eigenvalues and Eigenvectors                                          458
     8.2.2  The Schur Triangularization Theorem and its Applications              460
     8.2.3  Diagonalization of a Hermitian Matrix                                 463
     8.2.4  The Cayley-Hamilton Theorem                                           466
8.3  The Eigenvalue Problems Arising in Practical Applications                    466
     8.3.1  Stability Problems for Differential and Difference Equations          467
     8.3.2  Vibration Problem, Buckling Problem and Simulating Transient
            Current of an Electrical Circuit                                      473
     8.3.3  An Example of the Eigenvalue Problem Arising in Statistics:
            Principal Components Analysis                                         488
8.4  Localization of Eigenvalues                                                  490
     8.4.1  The Gersgorin Disk Theorems                                           490
     8.4.2  Eigenvalue Bounds and Matrix Norms                                    494
8.5  Computing Selected Eigenvalues and Eigenvectors                              495
     8.5.1  Discussions on the Importance of the Largest and Smallest Eigenvalues 495
     8.5.2  The Role of Dominant Eigenvalues and Eigenvectors in Dynamical
            Systems                                                               496
     8.5.3  The Power Method, the Inverse Iteration and the Rayleigh Quotient
            Iteration                                                             497
     8.5.4  Computing the Subdominant Eigenvalues and Eigenvectors: Deflation     508
8.6  Similarity Transformations and Eigenvalue Computations                       515
     8.6.1  Eigenvalue Computations Using the Characteristic Polynomial           515
     8.6.2  Eigenvalue Computations via Jordan-Canonical Form                     520
     8.6.3  Hyman's Method                                                        520
8.7  Eigenvalue Sensitivity                                                       521
     8.7.1  The Bauer-Fike Theorem                                                521
     8.7.2  Sensitivity of the Individual Eigenvalues                             524
8.8  Eigenvector Sensitivity                                                      527
8.9  The Real Schur Form and QR Iterations                                        529
     8.9.1  The Basic QR Iteration                                                532
     8.9.2  The Hessenberg QR Iteration                                           535
     8.9.3  Convergence of the QR Iterations and the Shift of Origin              536
     8.9.4  The Single-Shift QR Iteration                                         537
     8.9.5  The Double-Shift QR Iteration                                         539
     8.9.6  Implicit QR Iteration                                                 543
     8.9.7  Obtaining the Real Schur Form of A                                    547
     8.9.8  The Real Schur Form and Invariant Subspaces                           550
8.10 Computing the Eigenvectors                                                   552
     8.10.1 The Hessenberg-Inverse Iteration                                      552
     8.10.2 Calculating the Eigenvectors from the Real Schur Form                 553
8.11 The Symmetric Eigenvalue Problem                                             556
     8.11.1 The Sturm Sequence and the Bisection Method                           558
     8.11.2 The Symmetric QR Iteration Method                                     564
8.12 The Lanczos Algorithm for Symmetric Matrices                                 566
8.13 Review and Summary                                                           573
8.14 Suggestions for Further Reading                                              578

CHAPTER 8 NUMERICAL MATRIX EIGENVALUE PROBLEMS .

1. NUMERICAL MATRIX EIGENVALUE PROBLEMS Objectives The major objectives of this chapter are to study numerical methods for the matrix eigenvalue problem.7) 2.7) 5.9).3) giving rise to the eigenvalue problem. Special methods for the symmetric eigenvalue problem: symmetric QR iteration with Wilkinson shift and the bisection method (Section 8.1 and 5. Eigenvalue and Eigenvector sensitivity (Sections 8.11).5.7 and 8.4.8. The Power method and the Inverse power methods for nding the selected eigenvalues and eigenvectors (Section 8. Required Background 456 . Some practical applications (Section 8.3 and 5. Reduction to Hessenberg and tridiagonal form (sections 5.3).1) 3. The QR iteration algorithm for nding the eigenvalues of a matrix of moderate size (Section 8.4) 4. Inverse Iteration and Real Schur methods for nding the eigenvectors (Section 8. Linear system solutions with arbitrary.8). Here are some of the highlights of the chapter. Norm Properties of Matrices (Section 1. The condition number and its properties (Section 6. The following concepts and tools developed earlier in this book will be required for smooth reading and understanding of the material of this chapter.12).5. and triangular matrices (Section 6. Hessenberg. The Lanczos methods for nding the extremal eigenvalues of a large and sparse symmetric matrix (Section 8. The QR factorization of an arbitrary and a Hessenberg matrix using Householder and Givens transformations (Sections 5.4.5).10).

An almost classical method by Lanczos has been rejuvenated by numerical linear algebraists recently. Unfortunately. The problem is a very important practical problem and arises in a variety of applications areas including engineering. buckling problem. In Section 8. The symmetric eigenvalue problem enjoys certain remarkable special properties one such property is that the eigenvalues of a symmetric matrix are well conditioned. the Rayleigh-Quotient iteration.8.2. unfortunately. Section 8. transient behaviour of an electrical circuit. su ce. the inverse power method.1 Introduction This chapter is devoted to the study of numerical matrix eigenvalue problem. Some of the basic theoretical results on eigenvalues and eigenvectors are stated (and proved sometimes) in Section 8. The organization of this chapter is as follows. A method based on the exploitation of some of the special properties of the symmetric eigenvalue problem. known as the power method is useful for this purpose. known as the Sturm Sequence-Bisection method. statistics.5 describes the power method. are stated and proved. cannot be used to compute the eigenvalues and eigenvectors of a very large and sparse matrix. The symmetric Lanczos method is useful in particular to extract a few extremal eigenvalues and the associated eigenvectors of a very large and sparse matrix.6 the di culties of computing the eigenvalues of a matrix via the characteristic 457 . Section 8. A standard practical algorithm for nding the eigenvalues of a matrix is the QR iteration method with a single or double shift. etc. Several applications do not need the knowledge of the whole spectrum. this is not a practical approach.3 is devoted to the discussions of how the eigenvalue problem arises in some practical applications such as stability analyses of a system of di erential and di erence equations.nding method. Since the eigenvalues of a matrix A are the zeros of the characteristic polynomial det(A . The QR iteration method. one would naively think of computing the eigenvalues of A by nding its characteristic polynomial and then computing its zeros by a standard root. In Section 8. usually a few largest or smallest ones. A few selected eigenvalues.4 some classical results on eigenvalues locations such as the Gersgorin's disk theorem etc. for nding a selected number of eigenvalues and the corresponding eigenvectors. The sparsity gets lost and the storage becomes an issue. etc. based on implicit powering of A. etc. A classical method. economics. I ). vibration analysis. is useful for nding eigenvalues of a symmetric matrix in an interval (note that the eigenvalues of a symmetric matrix are all real). principal component analysis in statistics.

polynomial and the Jordan canonical form are highlighted. The eigenvalue and eigenvector sensitivity are discussed in Sections 8.7 and 8.8; the most important result there is the Bauer-Fike theorem. Section 8.9 is the most important section of this chapter: the QR iteration method with and without shifts and their implementations are described in this section. The Hessenberg-inverse iteration and the computation of eigenvectors from the real Schur form are described in Section 8.10. The methods based on specific properties of symmetric matrices are discussed in Section 8.11; these include the Sturm sequence-bisection method and the symmetric QR iteration method with the Wilkinson shift. The chapter concludes with a brief introduction to the symmetric Lanczos algorithm for finding the extremal eigenvalues of a large and sparse symmetric matrix.

8.2 Some Basic Results on Eigenvalues and Eigenvectors

8.2.1 Eigenvalues and Eigenvectors

Let A be an n x n matrix. Then λ is an eigenvalue of A if there exists a nonzero vector x such that

    A x = λ x,    or    (A - λI) x = 0.

The vector x is a right eigenvector (or just an eigenvector) of A associated with the eigenvalue λ; it is customary to call a right eigenvector simply an eigenvector. The vector y satisfying

    y* A = λ y*

is called a left eigenvector of A associated with the eigenvalue λ.

The homogeneous system (A - λI)x = 0 has a nontrivial solution iff

    det(A - λI) = 0.

P_A(λ) = det(A - λI) is a polynomial in λ of degree n and is called the characteristic polynomial of A. Thus the n eigenvalues of A are the n roots of the characteristic polynomial. The sum of the eigenvalues of a matrix A is called the trace of A; it is denoted by trace(A) or Tr(A).
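These definitions translate directly into a few lines of MATLAB. The matrix below is an arbitrary example of ours, not one from the text:

    % Minimal sketch illustrating the definitions of Section 8.2.1.
    A = [2 1 0; 1 3 1; 0 1 4];

    lambda = eig(A)                 % eigenvalues = roots of det(A - lambda*I)
    p = poly(A)                     % coefficients of the characteristic polynomial
    roots(p)                        % the same eigenvalues (up to rounding)

    [V, D] = eig(A);                % right eigenvectors: A*V = V*D
    norm(A*V - V*D)

    trace(A) - sum(lambda)          % trace(A) equals the sum of the eigenvalues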

I ) det(A .2. det(TAT . Theorem 8. ij +a = nn De nition 8. then trace(A) = a11 + sum of the diagonal entries of A. I ): Thus. Proof.Recall also from Chapter 1.1 The eigenvalue with the largest modulus is called spectral radius of A and is denoted by (A) : (A) = max(j ij). as its diagonal entries. I ) = = = = = det(T (A .1) det T det(A . therefore. I ) det(T T . 459 . have the same eigenvalues. I ) det(T . TAT .1) det T det(T . the eigenvalues of two similar matrices are the same. I )T .2 The set of eigenvalues of a matrix A is called the spectrum of A. I ) is also triangular with a . that if A = (a ). ) 11 22 i=1 showing that the eigenvalues of A are a11 a22 : : : ann. n Y det(A . If A = (aij) is triangular. We now present some basic results on eigenvalues and eigenvectors.2. : : : ann . 1 Proof.2 The matrices A and TAT . In other words. I ) = (aii .2.1 ) det(A . i De nition 8. a .1 .1 and A have the same characteristic polynomial and. Theorem 8.1) det(A .2. have the same eigenvalues. So.1 The eigenvalues of a triangular matrix are its diagonal entries. then the matrix (A .
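The two facts just established (the eigenvalues of a triangular matrix are its diagonal entries, and similar matrices have the same eigenvalues) can be checked quickly on arbitrary examples of our own choosing:

    % Minimal sketch checking Theorems 8.2.1 and 8.2.2.
    T = [1 5 7; 0 2 4; 0 0 3];       % triangular: eigenvalues are the diagonal entries
    eig(T)                           % returns the diagonal entries 1, 2, 3

    A = [4 1; 2 3];
    S = [1 2; 3 7];                  % any nonsingular matrix
    sort(eig(A)) - sort(eig(S*A/S))  % similar matrices: same eigenvalues (up to rounding)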

2 The Schur Triangularization Theorem and its Applications Theorem 8. The theorem is of signi cant theoretical importance. ! ! 1 1 1 0 B=I 2 2 = 0 1 . Proof. 1 2 n In general. if p(x) is a polynomial of degree n. 0 1 Theorem 8. several important results on eigenvalues and eigenvectors can be derived using this theorem. A and B have the same eigenvalues. but they cannot 8.2. then the eigenvalues of Am p(x) = c xn + c xn. Here is a simple example. m 1 m n as its diagonal entries.3 For any complex matrix A there exists a unitary matrix U such that UAU = T is a triangular matrix. Since Am and T m are 460 . Two matrices having the same eigenvalues are not necessarily similar.4 Let 1 ::: n be the eigenvalues of A. Note: The converse is not true. they have the same eigenvalues.2. From U AU = T we have T m = (U AU )(U AU ) (U AU ) (m times) = U Am U: But T m is also a triangular matrix with similar. We shall give a proof of this theorem later in the chapter (Theorem 8. are m m : : : m.9. As we shall see below. + 1 + cn then the eigenvalues of p(A) are p( 1) p( 2) : : : p( n). The eigenvalues of A are the diagonal entries of T .1).Take A = 0 1 be similar.2.

2. + + cnI )U = c U An U + c U An. The eigenvec- Proof. 0 1 1 0 0 1 1 1 1 1 The diagonal entries of the triangular matrix T1 are p( 1) p( 2) : : : p( n): Since p(A) and T1 have the same eigenvalues. then A = A implies that A is symmetric (AT = A). Ax = and To prove the second part. The eigenvalues of a real symmetric matrix are also real. it follows that the eigenvalues of A are real.In the general case. diagonal. and let x and y be the associated eigenvectors. Theorem 8.5 The eigenvalues of a Hermitian matrix A are real. the theorem is proved. T is also Hermitian and. U + + cn U U = c T n + c T n. Then by de nition. Since the eigenvalues of A are the diagonal entries of T . let 1 and 2 be two distinct eigenvalues of A. Remark: Note that if A is real. + + cnI = T = a triangular matrix. and the diagonal entries of a Hermitian matrix must be real. U p(A)U = U (c An + c An. tors corresponding to the distinct eigenvalues are pairwise orthogonal. we have 1 x x 6= 0 y y 6= 0 1 Ay = from which we obtain 2 y Ax = y x 461 . Since U AU we (U AU ) or U A U or U AU or T = have = = = = T T T T T: Thus. therefore.

Let be an eigenvalue of A and x be the corresponding eigenvector. Then from Ax = x we obtain x Ax = x x: x x > 0: Since A is positive de nite.2. Theorem 8. Thus.2. y and x are orthogonal. since A is Hermitian. so. we have U AU = diag( i) = diag( : : : 2 De ne now x = Uy .3). P 1 n): trix are linearly independent. then x Ax = y U AUy = iyi . Since the i's are real and positive. we then have But. Again. for every nonzero x. we have x Ax > 0. (Theorem 8. Since A is Hermitian. by the Schur Triangularization theorem.6 The eigenvalues of a Hermitian positive de nite matrix are positive Proof.2. so > 0. every nonzero y corresponds to a nonzero x. To prove the converse.and yAx = y (since Since A = A. )y x = 0: 1 2 2 1 But. y x = 0 thus. x Ax > 0 and is real. x Ax > 0. Theorem 8. x x is always positive. if a Hermitian matrix has all its eigenvalues positive. we note that. proving that A is positive de nite. 1 6= 2. we get 2 2 2 x= y x= y x 2 2 = ): y x= y x or ( . and conversely. it must be positive de nite.7 The eigenvectors associated with the distinct eigenvalues of a ma- 462 .

. )x + 2 1 2 2 + cm ( 1 . there exists an unitary matrix U such that U AU = D is a diagonal matrix. m )xm Multiplying now to the left of the last equation by ( 2I . cx + 1 1 + cm xm = 0 + cm ( 1I . s)xm = 0: i Continuing this way (m . because can show that cm.Proof.2. s)( 2 . A) we get c ( . Let Consider 1 ::: m be the distinct eigenvalues and x1 : : : xm be the corresponding eigenvectors. It is an immediate consequence of the Schur Triangularization Theorem. )x + 3 1 3 2 3 3 + cm ( 1 . 6= j . Proof. T = T implies that it is diagonal. Since A =A from we have U AU = T T = T: 463 Since T is triangular. we conclude that cm = 0. The diagonal entries of D are eigenvalues of A. A)xm = 0 = 0: (since Ax1 = 1x1): where ci i = 1 : : : m are constants. the vectors x1 : : : xm are linearly independent. we get c ( I . )( .1 = cm. A)x + 1 1 1 2 1 2 or c ( . Analogously. 1) times. A) to the left. A)x + c ( I .8 For any Hermitian matrix A.2 = = c2 = c1 = 0: Thus. Multiplying by ( 1I . we 8.3 Diagonalization of a Hermitian Matrix Theorem 8.2.

So. it can always be reduced to a block diagonal matrix as the following result shows. However. It is clear that an arbitrary matrix can not be reduced to a diagonal matrix by similarity. where D = diag( 1 : : : n). (Compare this de nition with that given in Chapter 1 (De nition 1. AX = A(x : : : xn0 = ( x : : : nxn) ) 1 0 1 1 1 B B = (x : : : xn) B B B @ 1 1 0 n C C C = XD C C A ::: n.1)) Thus any complex Hermitian matrix.3. 464 .9 An arbitrary matrix A is similar to a diagonal matrix if its eigen::: Proof. 1 Theorem 8. AX = D: 1 De nition 8. Otherwise. if it is similar to a diagonal matrix. Since x1 : : : xn are the eigenvectors corresponding to the distinct eigenvalues are linearly independent (by Theorem 8. Therefore.2.2. Let x : : : xn be the eigenvectors corresponding to the eigenvalues X = (x : : : xn): 1 1 n. De ne Then. any real symmetric matrix and a matrix A having distinct eigenvalues are nondefective.2.3 An n n matrix A is said to be nondefective if it is diagonalizable that is.7). 1 they X . the matrix is defective.values are distinct. X is nonsingular.

If Ji is of degree vi . )v are called the elementary divisors of A.10 (The Jordan Theorem) For an n n matrix A. 465 .4 The polynomials det(Ji . each Jordan block has order 1 and therefore elementary divisors are linear. there are m linearly independent eigenvectors. The matrices J : : : Jm are Jordan matrices. If v1 = v2 = are called linear elementary divisors. pp.2. then 1 v + 1 + vm = n: i De nition 8. 121{129). Jordan Canonical Form The diagonal block matrix above is called the Jordan Canonical Form of A. there exists a nonsingular matrix T such that 0J B B B B B T .Theorem 8. = vm = 1 then the elementary divisors Notes: We note the following: (1) When all the eigenvalues of A are distinct. In fact.2. Proof. if there are m Jordan blocks. See Horn and Johnson (1985. AT = B B B B B B 0 @ 1 1 0 Jm 1 1 C C C C C C C C C C C A where 0 B B B B B Ji = B B B B B B @ i i 1 i 0 1 C 0 C C C C C C C C C 1C A i where the are eigenvalues of A. counting the multiplicities of the same block. I ) = ( i .

if A = (aij ) is an n n matrix and Pn ( ) is the characteristic polynomial of A.3 The Eigenvalue Problems Arising in Practical Applications The problem of nding the eigenvalues and eigenvectors arise in a wide variety of practical applications. the mathematical models of many engineering problems are systems of di erential and di erence equations and the solutions of these equations are often expressed in terms of the eigenvalues and eigenvectors of the matrices of these systems. then Theorem 8.2. Furthermore. Example 8. often can be determined only by knowing the nature and location of the eigenvalues. the elementary divisors are linear. As we have seen before.2.2.11 A square matrix A satis es its own characteristic equation that Pn (A) is a ZERO matrix. 466 . See Matrix Theory by Franklin. such as stability. It arises in almost all branches of science and engineering.4 The Cayley-Hamilton Theorem is.(2) If there are some nonlinear elementary divisors. many important characteristics of physical and engineering systems.. 8. then A has eigenvalues whose multiplicities are higher than or equal to the number of independent eigenvectors associated with them. 113{114. 2 5 1 2 0 1 1 0 0A : 0 0 8. We will give a few representative examples in this section. etc.1 0 1 0 1A Let A = @ : P( ) = 0 P (A) = @ 2 2 2 0 = @ 1 2 .2 @ . 1: 0 1 1 0 1 1 2A 0 1A @1 0A . (3) For a real symmetric or complex Hermitian matrix. 2 . pp. Proof.
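Before turning to the applications, note that the Cayley-Hamilton theorem stated above is easy to verify numerically for any particular matrix (the matrix here is an arbitrary illustration of ours):

    % Minimal sketch verifying the Cayley-Hamilton theorem.
    A = [1 2; 3 4];
    p = poly(A);                    % characteristic polynomial coefficients
    Z = polyvalm(p, A);             % evaluate the polynomial at the matrix A
    norm(Z)                         % zero up to rounding: A satisfies P_n(A) = 0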

2) that is Av = v (8. C _ @ A xn(t) arises in a wide variety of physical and engineering systems. when the eigenvalue of A are all distinct).3.1) or in matrix form where and x(t) = Ax(t) _ A = (aij )n 1 n 0 x (t) 1 dB . The solution of this system is intimately related to the eigenvalue problem for the matrix A. .1) with x(0) = x0 is given by x(t) = eAtx 467 0 (8.8.3. If A has n linearly independent eigenvectors (which will happen.3. where v is not dependent on t. then the general solution of the system can be written as x(t) = c v e 1t + c v e 2t + 1 1 2 2 + c n vn e n t (8. Then we must have ve t = Ave t (8.1).3) showing that if is an eigenvalue of A and v is corresponding eigenvector. C x(t) = dt B . In the general case.1 Stability Problems for Di erential and Di erence Equations A homogeneous linear system of di erential equations with constant coe cients of the form: dx1(t) = a x (t) + a x (t) + + a x (t) ..3.3.4) where 1 : : : n are the eigenvalues of A and v1 v2 : : : vn are the corresponding eigenvectors. To see this.3. dxn(t) = a x (t) + a x (t) + n1 1 n2 2 dt dt dx (t) = a x (t) + a x (t) + dt . Thus the eigenpair ( v ) of A can be used to compute a solution x(t) of (8. the general solution of (8. as we have seen in the last section.1) has a solution x(t) = ve t.5) . 11 21 1 1 12 22 2 2 2 1 n n + a2nxn (t) + annxn (t) (8.3.3.3. assume that (8.

if A has the Jordan Canonical Form eAt = I + At + A2!t + : : : 2 2 V . Furthermore. .. ..7) B i . the system of di erential equations (8. 2 1 (8. The stability is de ned with respect to an equilibrium solution.8) 1 Thus.. ...1 An equilibrium solution of the system x(t) = Ax(t) x(0) = x _ is the vector xe satisfying 0 (8.6) (8. C B C C Ji = B . many interesting and desirable properties of physical and engineering systems can be studied just by knowing the location or the nature of the eigenvalues of the system matrix A.3. B B . B C B C . 468 .3.9) De nition 8. xe = 0 is an equilibrium solution and it is the unique equilibrium solution i A is nonsingular. 0 1 1 t t =2 tp =p! C B B 1 t B C tp. AV0= diag(J J : : : 1 k) J 1 0 1 1 2 (8.3.. .. =(p .1) is completely solved by knowing its eigenvalues and eigenvectors of the system matrix A. as said before.3. B C B C t @ A 1 i i (8.10) Axe = 0: Clearly. . A mathematical de nition of stability is now in order.3. 1C C @ A 0 i ( ) k then where eAt = V diag(eJ1t eJ2 t : : : eJ t)V .. .where The eigenvalues i and the corresponding eigenvectors now appear in the computation of eAt : For example. .. . .3.. J t = e tB . C: e .3. 1)! C B C B C . Stability is one such property..

there is a such that 0 kx(t) . xe k < wherever kx . In other words.3.3.De nition 8. but not conversely.1) is asymptotically stable if the equilibrium solution xe = 0 is asymptotically stable. for any t and > 0. there exists a real number ( t ) > 0 such that kx(t) . if.1) is stable if the equilibration solution xe = 0 is stable. for any t .4 The system (8. one needs more than that. De nition 8.3. De nition 8.3 An equilibrium solution is asymptotically stable if it is stable and if. In other words. xe k : 0 0 0 The system (8.6 A system that is not stable is called unstable. the stability guarantees that the system remains close to the origin if the initial position is chosen close to it. Mathematical Criteria for Asymptotic Stability 469 .3.3. the asymptotic stability not only guarantees the stability but ensures that every solution x(t) converges to the origin whenever the initial position is chosen close to it. but not asymptotically stable. De nition 8.3. xek < : 0 whenever De nition 8. the following convention is normally adopted. Since an asymptotically stable system is necessarily stable.3. In many situations. xek ! 0 as t ! 1 kx .5 A system is called marginally stable if it is stable. stability is not enough.2 An equilibrium solution xe is said to be stable.

Theorem 8.3.1 (Stability Theorem for a Homogeneous System of Differential Equations) A necessary and sufficient condition for the equilibrium solution xe = 0 of the homogeneous system (8.3.1) to be asymptotically stable is that the eigenvalues of the matrix A all have negative real parts.

Proof. Since the general solution of the system ẋ(t) = Ax(t) is given by x(t) = e^{At} x(0), it is enough to prove that x(t) → 0 as t → ∞. From the expression of e^{At} in terms of the eigenvalues of A given above, note that if λ_j = α_j + iβ_j, j = 1, 2, ..., n, then e^{λ_j t} = e^{α_j t} e^{iβ_j t}, and e^{λ_j t} → 0 as t → ∞ iff α_j < 0.

Stability of a Nonhomogeneous System

Many practical situations give rise to a mathematical model of the form

    ẋ(t) = A x(t) + b                                                  (8.3.11)

where b is a constant vector. The stability of such a system is also governed by the eigenvalues of A. This can be seen as follows. Let x̄(t) be an equilibrium solution of (8.3.11). Define z(t) = x(t) - x̄(t). Then

    ż(t) = ẋ(t) - x̄̇(t) = Ax(t) + b - Ax̄(t) - b = A(x(t) - x̄(t)) = Az(t).

Thus x(t) → x̄(t) if and only if z(t) → 0. It therefore follows from Theorem 8.3.1 that:

Theorem 8.3.2 (Stability Theorem for a Nonhomogeneous System of Differential Equations)
(i)  An equilibrium solution of (8.3.11) is asymptotically stable iff all the eigenvalues of A have negative real parts.
(ii) An equilibrium solution is unstable if at least one eigenvalue has a positive real part.
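In practice these criteria are applied simply by examining the computed eigenvalues, as the following sketch (with an arbitrary matrix of ours) illustrates; the analogous test for difference equations, stated later in this section, is included as a comment:

    % Minimal sketch of the stability checks of Theorems 8.3.1 and 8.3.2:
    % only the location of the eigenvalues of A is needed.
    A = [-1  2;  0 -3];                                 % arbitrary system matrix
    is_asymptotically_stable = all(real(eig(A)) < 0)    % continuous-time test

    % For the difference equation x_{k+1} = A x_k + b (see below), the
    % corresponding test is that all eigenvalues lie inside the unit circle:
    is_discrete_stable = all(abs(eig(A)) < 1)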

3.Remarks: Since the stability of an equilibrium solution depends upon the eigenvalues of the matrix A of the system.3. or just the stability of the matrix A. Let's try to explain this historical fact through the notion of stability. Alliance 2: Germany and Austria-Hungary. Alliance 1: France and Russia.1 A European Arms Race Consider the arms race of 1909-1914 between two European alliances. many practical systems are modeled by a system of di erence equations: xk+1 = Axk + b. xk = Axk + b +1 (8. a system modeled by a system of rst order di erential or di erence equations.3. 471 . An equilibrium solution of Theorem 8. We leave the proof to the readers. to determine the stability and asymptotic stability of Example 8. A well-known mathematical criterion for asymptotic stability of such a system is given in the following theorem.3 (Stability Theorem for a System of Di erence Equations).11). The equilibrium solution is unstable if at least one eigenvalue has magnitude greater than 1. it is usual to refer to stability of the system itself.12) is asymptotically stable i all the eigenvalues of A are inside the unit circle. respectively. Explicit knowledge of the eigenvalues is not needed. all we need to know is if the eigenvalues of A are in the left half plane or inside the unit circle.3. The two alliances went to war against each other. Summarizing. Stability of a System of Di erence Equations Like the system of di erential equations (8.

F. Note that this simple model is realistic in the sense that the rate of change of the war potential of one country depends upon the war potential of the other country. i = 1 2 gi(t) = the grievances that country i has against the other. i and ki. 2)2 + 4k1k2 : 2 Thus the equilibrium solution x(t) is asymptotically stable if 1 2 . this model can be written as: x(t) = Ax(t) + g _ where 0 . k1 k2 > 0. Richardson. and the cost of the armaments the country can a ord. A=@ k 1 2 The eigenvalues of A are k A x (t) A x(t) = @ . i = 1 2 are all positive constants. one eigenvalue will have positive real parts. k1 k2 < 0. x (t) 0 1 g g = @ A: g 1 1 2 2 1 2 2 1 0 1 ( 1 . i = 1 2: gi . While the rst two factors cause the rate to increase. and is known as the Richardson model. the estimates of 1 2 and k1 k2 were made under some realistic assumptions and were found to be as follows: 1 1 = .First consider the following crude (but simple) mathematical model of war between two countries: where dx = k x .( + ) p = 2 = 0:2 472 . k1 k2 > 0. In matrix form. both the eigenvalues will have negative real parts if it is negative. sign associated with that term). and unstable if 1 2 . and on the grievances that one country has against its enemy-country. x + g dt dx = k x . x + g dt 1 1 2 1 1 2 2 1 2 2 1 2 xi(t) = war potential of the country i. For the above European arms race. the last factor has certainly a slowing e ect (that is why we have a . ixi denotes the cost of armaments of the country i. This is because when 1 2 . This mathematical model is due to L.

3..13) for solving the linear system Ax = b converges to the solution x for an arbitrary choice of the initial approximation x1 i the spectral radius (B ) < 1.) With these values of 1 2 and k1 k2.3. Convergence of Iterative Schemes for Linear Systems In Chapter 6 (Theorem 6. We thus see again that only an implicit knowledge of the eigenvalues of B is needed to see if an iterative scheme based upon (8. we have 1 2 .14) 0 B B B y=B B B @ y (t) y (t) 1 2 yn(t) 473 1 C C C . 1979 (pp.:7700: 2 1 Thus the equilibrium is unstable.2:2000. C . 209-214). the two eigenvalues are: 1.3. Luenberger. C A (8. Buckling Problem and Simulating Transient Current of an Electrical Circuit Analysis of vibration and buckling of structures. The main assumptions are that both the alliances have roughly the same strength and 1 and 2 are the same as Great Britain which is usually taken to be the reciprocal of the life-time of the British Parliament ( ve years). k = . C: .15) . often give rise to a system of second-order di erential equations of the form: B y + Ay = 0 where (8.3. For a general model of Richardson's theory of arm races and the role of eigenvalues there. Braun (1978)). In fact. see the book by M.3.13) is convergent.10.1) we have seen that the iteration: xk = Bxk + d +1 (8. 8. John Wiley and Sons. etc.2 Vibration Problem. see the book Introduction to Dynamic Systems by David G. New York. simulation of transient current of electrical circuits.4000 and .k k = 1 2 2 1 .k = k = 0:9: 1 2 (For details of how these estimates were obtained.

34{35. an explicit solution of the system can be written down in terms of the generalized eigenvalues and eigenvectors. AEP pp.17) Such an eigenvalue problem is called a generalized eigenvalue problem.14) in this case can be written down in terms of the eigenvalues and the eigenvectors. respectively.14). A solution of (8.16) w Bx = Ax: 2 Writing = w2.3. The number is called a generalized eigenvalue and x is the corresponding generalized eigenvector. this becomes Ax = Bx: (8. then y= n X i=1 ci cos( it + di)xi where ci and di are arbitrary constants. Then from (8. In such a case.3. called the mass and sti ness matrices.3.) In vibration problems the matrices B and A are.3. of (8.The solution of such a system leads to the solution of an eigenvalue problem of the type: Ax = Bx: This can be seen as follows: Let y = xeiwt be a solution of the system (8.3. giving rise to the symmetric generalized eigenvalue problem: Kx = Mx: 474 .14) we must have (8. (See Wilkinson.17). Very often in practice the matrices A and B are symmetric and positive de nite.3. and are denoted by M and K . The eigenvalues of the generalized eigenvalue problem in this case are all real and positive. Eigenvalue-Eigenvector Solution of By + Ay = 0 with A B Positive De nite Let x1 x2 : : : xn be the n independent eigenvectors corresponding to the generalized eigenvalues 1 2 : : : n.

7 The quantities wi = p i i = 1 : : : n are called natural frequencies. When such excitation is oscillatory the system is also forced to vibrate at the excitation frequency. 1 The frequencies can be used to determine the periods Tp for the vibrations. An entire chapter (Chapter 9) will be devoted to the generalized eigenvalue problem later. aircrafts. For forced vibration. We will also illustrate this in this section . Under free vibration such systems will vibrate at one or more of its natural frequencies. We will give a simple example below to illustrate this. which are properties of the dynamical system and depends on the associated mass and sti ness distribution. etc. Free vibration takes place when a system oscillates due to the forces inherent in the system. possessing mass and elasticity experience vibration to some degree. We now describe below how the frequencies and amplitudes can be used to predict the phenomenon of resonance in vibration engineering. Thus T =2 pi wi is the period of vibration for the ith mode. 475 . the behavior of a vibrating system can be analyzed by knowing the natural frequencies and the amplitudes . and x : : : xn are called the amplitudes of vibration of the masses. The solutions of such problems using nite di erences lead to also eigenvalue problems. and without any external forces. buildings.De nition 8.3. and their design require consideration of their oscillatory behavior. As we will see. systems oscillate under the excitation of external forces. Other vibration problems such as the buckling problem of a beam gives rise to boundary value problems for the second-order di erential equations.. All machines and structures such as bridges.

Phenomenon of Resonance

If the excitation frequency coincides with, or becomes close to, one of the natural frequencies of the system, dangerously large oscillations may result, and a condition of resonance is encountered. This is the kind of situation an engineer would very much like to avoid. The collapse of the Tacoma Narrows Bridge (also known as "Galloping Gerty") at Puget Sound in the state of Washington in 1940, and that of the Broughton Suspension Bridge in England, are attributed to such a phenomenon. In the case of the Tacoma Bridge it was the wind that generated a periodic force of very large amplitude; in the case of the Broughton Bridge the large force was set up by soldiers marching in cadence over the bridge. In both cases the frequency of this force was equal to one of the natural frequencies of the structure at the time of collapse. Because of what happened in Broughton, soldiers are no longer permitted to march in cadence over a bridge. Buildings can be constructed with active devices so that the force due to wind can be controlled; the famous Sears Tower in Chicago, the windy city, has such a device. (For a complete story of the collapse of the Tacoma bridge, see the book Differential Equations and Their Applications by M. Braun, Springer-Verlag, 1978 (pp. 167-169).)

Another important property of dynamical systems is damping, which is present in all systems due to energy dissipation by friction and other resistances. Damping becomes important in limiting the amplitude of oscillation at resonance. However, for small values of damping it has very little effect on the natural frequencies and is normally not included in their estimation; the major effect of damping is to reduce the amplitude with time.

For a continuous elastic body the number of independent coordinates, or degrees of freedom, needed to describe the motion is infinite. However, under many situations parts of such bodies may be assumed to be rigid, and the system may be treated as dynamically equivalent to one having a finite number of degrees of freedom. In summary, the behavior of a vibrating system can be analyzed by knowing the frequencies and the amplitudes of the masses, and the eigenvalues and eigenvectors of the matrix of the mathematical model of the system are related to these quantities.

Example 8.3.2 (Vibration of a Building) Consider a two-story building (Figure 8.1(a)) with rigid floors, as shown in Figure 8.1(b). It is assumed that the weight distribution of the building can be represented as concentrated weights m_i at each floor level, and that the stiffness of the supporting columns is represented by the spring constants k_i.

Figure 8.1 ((a) the two-story building with masses m_1 = m, m_2 = m and column stiffnesses k_1 = k, k_2 = k; (b) the equivalent spring-mass model with displacements y_1, y_2)

The equations of motion for this system can be written as

    m_1 ÿ_1 + (k_1 + k_2) y_1 − k_2 y_2 = 0,
    m_2 ÿ_2 − k_2 y_1 + k_2 y_2 = 0,

or

    [ m_1   0  ] [ ÿ_1 ]   [ k_1 + k_2   −k_2 ] [ y_1 ]   [ 0 ]
    [  0   m_2 ] [ ÿ_2 ] + [   −k_2       k_2 ] [ y_2 ] = [ 0 ].      (8.3.18)

Taking m_1 = m_2 = m and k_1 = k_2 = k, and defining the mass matrix M and the stiffness matrix K by

    M = [ m  0 ],        K = [ 2k  −k ],
        [ 0  m ]             [ −k   k ]

the above equation becomes

    M ÿ + K y = 0,    where  y = (y_1, y_2)^T.                        (8.3.19)

Assuming that a solution y is of the form

    y = x e^{iwt},                                                    (8.3.20)

multiplying the equation by M^{-1}, and setting A = M^{-1}K and λ = w^2, the equations of motion give rise to the eigenvalue problem

    (1/m) [ 2k  −k ] [ x_1 ]  =  λ [ x_1 ].                           (8.3.21)
          [ −k   k ] [ x_2 ]       [ x_2 ]

The eigenvalues λ_1 and λ_2 and the corresponding eigenvectors, representing the two normal modes for this 2 x 2 problem, are easily calculated. The eigenvalues are

    λ_1 = (k/m)(0.3820),    λ_2 = (k/m)(2.6180),

and the corresponding eigenvectors are

    ( 0.5257 )    and    (  0.8507 ).
    ( 0.8507 )           ( −0.5257 )

The vibration of the building at these two normal modes is shown in Figure 8.2.

Figure 8.2 (the two normal modes of the building: mode 1 with w_1 = 0.618 √(k/m) and relative amplitudes (0.526, 0.851); mode 2 with w_2 = 1.618 √(k/m) and relative amplitudes (0.851, −0.526))
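The numbers above are easy to check numerically. The following sketch takes k = m = 1 (the eigenvalues simply scale with k/m, and the eigenvectors do not change):

```python
import numpy as np

# Check of the two-story building example: eigenvalues of A = M^{-1} K with k = m = 1.
k = m = 1.0
A = (k / m) * np.array([[ 2.0, -1.0],
                        [-1.0,  1.0]])

lam, X = np.linalg.eigh(A)      # symmetric, eigenvalues returned in ascending order
print(lam)                      # approx [0.3820, 2.6180]; multiply by k/m in general
print(X)                        # columns approx +/-(0.5257, 0.8507) and +/-(0.8507, -0.5257)

w = np.sqrt(lam)
print(w)                        # approx [0.618, 1.618], times sqrt(k/m) in general
```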

Buckling Problem (A Boundary Value Problem)

Consider a thin, uniform beam of length l. An axial load P is applied to the beam at one of its ends.

Figure 8.3 (a simply supported beam of length l under an axial load P; y denotes the deflection at a distance x from the left support)

We are interested in the stability of the beam, that is, how and when the beam buckles.

We will show below how this problem gives rise to an eigenvalue problem and what role the eigenvalues play.

Let y denote the vertical displacement (deflection) of a point of the beam at a distance x from the left support. Suppose that both ends of the beam are simply supported, that is,

    y(0) = y(l) = 0.                                                  (8.3.23)

From beam theory, the basic relationship between the curvature d²y/dx² and the internal moment M for the axes is

    d²y/dx² = M / (EI),

where E is the modulus of elasticity and I is the area moment of inertia of the column cross-section; the constant EI is called the flexural rigidity of the beam. Writing the bending moment at any section as M = −Py, this relationship gives the governing differential equation, the so-called bending moment equation:

    EI d²y/dx² = −P y.                                                (8.3.22)

The bending moment equation (8.3.22) together with the boundary conditions (8.3.23) constitutes a boundary value problem. We will solve this boundary value problem by approximating d²y/dx² with an appropriate finite difference scheme, as was done in Chapter 6. Let the interval [0, l] be partitioned into n + 1 subintervals of equal length h, with points of division

    0 = x_0 < x_1 < x_2 < ... < x_j < ... < x_n < x_{n+1} = l,    h = l/(n + 1).

Taking

    (d²y/dx²) at x = x_i  ≈  (y_{i+1} − 2 y_i + y_{i−1}) / h²,        (8.3.24)

substituting this approximation into the bending moment equation (8.3.22), and taking into account the boundary conditions (8.3.23), we obtain the following symmetric tridiagonal matrix eigenvalue problem:

    [  2  −1              ] [ y_1 ]       [ y_1 ]
    [ −1   2  −1          ] [ y_2 ]       [ y_2 ]
    [      .    .    .    ] [  .  ]  = λ  [  .  ],                    (8.3.25)
    [          −1   2  −1 ] [  .  ]       [  .  ]
    [              −1   2 ] [ y_n ]       [ y_n ]

where

    λ = P h² / (EI).                                                  (8.3.26)

Each value of λ determines a load

    P = EI λ / h²,                                                    (8.3.27)

which is called a critical load. These critical loads are the ones of practical interest, because they determine the possible onset of buckling of the beam. In general, the smallest value of P is of primary importance, since the bending associated with larger values of P may not be obtained without failure occurring under the action of the lowest critical load P.
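As a quick numerical sketch (the values of n, l and EI below are assumed only for illustration), the smallest eigenvalue of the tridiagonal matrix in (8.3.25) gives the lowest critical load, which should be close to the classical Euler value π²EI/l²:

```python
import numpy as np

n, l, EI = 50, 1.0, 1.0
h = l / (n + 1)

# tridiag(-1, 2, -1) of order n, as in (8.3.25)
T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lam_min = np.linalg.eigvalsh(T)[0]          # smallest eigenvalue

P_crit = EI * lam_min / h**2                # lowest critical load, P = EI*lambda/h^2
print("discrete critical load  :", P_crit)
print("Euler load pi^2*EI/l^2  :", np.pi**2 * EI / l**2)   # the two agree closely
```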

Simulating Transient Current for an Electric Circuit (Chapra and Canale (1988))

Given an electric circuit consisting of several loops, suppose we are interested in the transient behavior of the circuit; in particular, we want to know the oscillation of each loop with respect to the others. First consider a single loop containing an inductor L, a resistor R and a capacitor C.

Figure 8.4 (a single loop containing an inductor L, a resistor R and a capacitor C, carrying the current i)

Recall the voltage drops across the three elements:

    across a capacitor:   V_C = q / C       (q = charge on the capacitor, C = capacitance),
    across an inductor:   V_L = L di/dt     (L = inductance),
    across a resistor:    V_R = i R         (R = resistance).

Kirchhoff's voltage law states that the algebraic sum of the voltage drops around a closed loop circuit is zero. For the single loop above we then have

    L di/dt + iR + q/C = 0,

or, since V_C = q/C = (1/C) ∫ i dt,

    L di/dt + iR + (1/C) ∫_0^t i dt = 0.

Now consider the network with four loops, with inductors L_1, ..., L_4, capacitors C_1, ..., C_4 and loop currents i_1, ..., i_4.

Figure 8.5 (a four-loop network: loop j contains the inductor L_j and is coupled to the next loop through the capacitor C_j; the loop currents are i_1, i_2, i_3, i_4)

Kirchhoff's voltage law applied to each loop gives:

    Loop 1:  L_1 di_1/dt + (1/C_1) ∫_{−∞}^{t} (i_1 − i_2) dt = 0,                                          (8.3.28)
    Loop 2:  L_2 di_2/dt + (1/C_2) ∫_{−∞}^{t} (i_2 − i_3) dt + (1/C_1) ∫_{−∞}^{t} (i_2 − i_1) dt = 0,      (8.3.29)
    Loop 3:  L_3 di_3/dt + (1/C_3) ∫_{−∞}^{t} (i_3 − i_4) dt + (1/C_2) ∫_{−∞}^{t} (i_3 − i_2) dt = 0,      (8.3.30)
    Loop 4:  L_4 di_4/dt + (1/C_4) ∫_{−∞}^{t} i_4 dt + (1/C_3) ∫_{−∞}^{t} (i_4 − i_3) dt = 0.              (8.3.31)

The system of ODEs given by (8.3.28)-(8.3.31) can be differentiated and rearranged to give

    L_1 d²i_1/dt² + (1/C_1)(i_1 − i_2) = 0,                                  (8.3.32)
    L_2 d²i_2/dt² + (1/C_2)(i_2 − i_3) − (1/C_1)(i_1 − i_2) = 0,             (8.3.33)
    L_3 d²i_3/dt² + (1/C_3)(i_3 − i_4) − (1/C_2)(i_2 − i_3) = 0,             (8.3.34)
    L_4 d²i_4/dt² + (1/C_4) i_4 − (1/C_3)(i_3 − i_4) = 0.                    (8.3.35)

Assume

    i_j = A_j sin(wt),    j = 1, 2, 3, 4                                     (8.3.36)

(recall that i_j is the current in the jth loop). Then from (8.3.32),

    −L_1 A_1 w² sin wt + (1/C_1) A_1 sin wt − (1/C_1) A_2 sin wt = 0,

or

    (1/C_1 − L_1 w²) A_1 − (1/C_1) A_2 = 0.                                  (8.3.37)

Similarly, from (8.3.33),

    −(1/C_1) A_1 + (1/C_1 + 1/C_2 − L_2 w²) A_2 − (1/C_2) A_3 = 0;           (8.3.38)

from (8.3.34),

    −(1/C_2) A_2 + (1/C_2 + 1/C_3 − L_3 w²) A_3 − (1/C_3) A_4 = 0;           (8.3.39)

and from (8.3.35),

    −(1/C_3) A_3 + (1/C_3 + 1/C_4 − L_4 w²) A_4 = 0.                         (8.3.40)

Gathering the equations (8.3.37)-(8.3.40) together, we have

    (1/C_1 − L_1 w²) A_1 − (1/C_1) A_2 = 0,                                  (8.3.41)
    −(1/C_1) A_1 + (1/C_1 + 1/C_2 − L_2 w²) A_2 − (1/C_2) A_3 = 0,           (8.3.42)
    −(1/C_2) A_2 + (1/C_2 + 1/C_3 − L_3 w²) A_3 − (1/C_3) A_4 = 0,           (8.3.43)
    −(1/C_3) A_3 + (1/C_3 + 1/C_4 − L_4 w²) A_4 = 0,                         (8.3.44)

or, multiplying the jth equation by C_j,

    A_1 − A_2 = L_1 C_1 w² A_1,                                              (8.3.45)
    −(C_2/C_1) A_1 + (C_2/C_1 + 1) A_2 − A_3 = L_2 C_2 w² A_2,               (8.3.46)
    −(C_3/C_2) A_2 + (C_3/C_2 + 1) A_3 − A_4 = L_3 C_3 w² A_3,               (8.3.47)
    −(C_4/C_3) A_3 + (C_4/C_3 + 1) A_4 = L_4 C_4 w² A_4.                     (8.3.48)

In matrix form, (8.3.45)-(8.3.48) read

    [     1           −1            0             0      ] [A_1]        [L_1C_1    0       0       0   ] [A_1]
    [ −C_2/C_1    C_2/C_1 + 1      −1             0      ] [A_2]  = w²  [   0    L_2C_2    0       0   ] [A_2]      (8.3.49)-(8.3.50)
    [     0        −C_3/C_2    C_3/C_2 + 1       −1      ] [A_3]        [   0      0     L_3C_3    0   ] [A_3]
    [     0            0        −C_4/C_3    C_4/C_3 + 1  ] [A_4]        [   0      0       0    L_4C_4 ] [A_4].

The above is an eigenvalue problem. To see this more clearly, consider the special case

    C_1 = C_2 = C_3 = C_4 = C

and

    L_1 = L_2 = L_3 = L_4 = L,

and set

    λ = L C w².

Then the above problem reduces to

    (1 − λ) A_1 − A_2 = 0,
    −A_1 + (2 − λ) A_2 − A_3 = 0,
    −A_2 + (2 − λ) A_3 − A_4 = 0,
    −A_3 + (2 − λ) A_4 = 0,

or

    [ (1 − λ)    −1                    ] [ A_1 ]
    [   −1     (2 − λ)    −1           ] [ A_2 ]  =  0,                     (8.3.51)
    [            −1     (2 − λ)   −1   ] [ A_3 ]
    [                      −1   (2 − λ)] [ A_4 ]

or, since i_j = A_j sin wt, j = 1, ..., 4,

    [ (1 − λ)    −1                    ] [ i_1 ]
    [   −1     (2 − λ)    −1           ] [ i_2 ]  =  0,                     (8.3.52)
    [            −1     (2 − λ)   −1   ] [ i_3 ]
    [                      −1   (2 − λ)] [ i_4 ]

or

    [  1  −1   0   0 ] [ i_1 ]       [ i_1 ]
    [ −1   2  −1   0 ] [ i_2 ]  = λ  [ i_2 ].                               (8.3.53)
    [  0  −1   2  −1 ] [ i_3 ]       [ i_3 ]
    [  0   0  −1   2 ] [ i_4 ]       [ i_4 ]

The solution of this eigenvalue problem will give us the natural frequencies (w_i² = λ_i /(LC)). Moreover, the knowledge of the eigenvectors can be used to study the circuit's physical behavior, such as the natural modes of oscillation. The eigenvalues (in four-digit arithmetic) are

    λ_1 = 0.1206,    λ_2 = 1,    λ_3 = 2.3473,    λ_4 = 3.5321.

The corresponding normalized eigenvectors are

    v_1 = ( 0.6565,  0.5774,  0.4285,  0.2280)^T,
    v_2 = ( 0.5774, −0.0000, −0.5774, −0.5774)^T,
    v_3 = (−0.4285,  0.5774,  0.2280, −0.6565)^T,
    v_4 = (−0.2280,  0.5774, −0.6565,  0.4285)^T.

From the directions of the eigenvectors we conclude that for λ_1 all the loops oscillate in the same direction; for λ_3 the second and third loops oscillate in the opposite direction from the first and fourth; and so on. This is shown in the following diagram:

Figure 8.6 (the four natural modes of oscillation of the circuit, corresponding to λ_1 = 0.1206, λ_2 = 1, λ_3 = 2.3473 and λ_4 = 3.5321; the + and − signs indicate the relative directions of the loop currents in each mode)
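The eigenvalues and mode patterns above can be reproduced with a few lines of code; the values of L and C below are assumed only so that the frequencies w_i can be printed.

```python
import numpy as np

# Check of the circuit eigenvalue problem (8.3.53).
A = np.array([[ 1, -1,  0,  0],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [ 0,  0, -1,  2]], dtype=float)

lam, V = np.linalg.eigh(A)              # symmetric, eigenvalues in ascending order
print(np.round(lam, 4))                 # [0.1206, 1.0, 2.3473, 3.5321]

L_val, C_val = 1.0, 1.0                 # illustrative values only
print(np.sqrt(lam / (L_val * C_val)))   # natural frequencies w_i = sqrt(lambda_i/(L*C))

# Columns of V are the normalized eigenvectors (up to sign); the sign pattern of each
# column shows which loops oscillate together in that mode.
print(np.round(V, 4))
```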

8.3.3 An Example of the Eigenvalue Problem Arising in Statistics: Principal Components Analysis
Many real-life applications involving statistical analysis (e.g., stock market analysis, weather prediction, etc.) involve a huge amount of data. The volume and complexity of the data in these cases can make the computations required for the analysis practically infeasible. In order to handle and analyze such a voluminous amount of data in practice, it is therefore necessary to reduce the data. The basic idea is to choose judiciously k components from a data set consisting of n measurements on p (p > k) original variables, in such a way that much of the information (if not most) in the original p variables is contained in the k chosen components. Such k components are called the first k "principal components" in statistics. The knowledge of the eigenvalues and eigenvectors of the covariance matrix is needed to find these principal components. Specifically, if Σ is the covariance matrix corresponding to the random vector

X = (X_1, X_2, ..., X_p), and λ_1 ≥ λ_2 ≥ ... ≥ λ_p ≥ 0 are the eigenvalues and x_1 through x_p are the corresponding eigenvectors of the matrix Σ, then the ith principal component is given by

    Y_i = x_i^T X,    i = 1, 2, ..., p.

Furthermore, the proportion of total population variance due to the ith principal component is given by the ratio

    λ_i / (λ_1 + λ_2 + ... + λ_p) = λ_i / trace(Σ),    i = 1, ..., p.        (8.3.54)

Note: The covariance matrix Σ is symmetric positive semidefinite and, therefore, its eigenvalues are all nonnegative.

If the first k ratios account for most of the total population variance, then the first k principal components can be used in the statistical analysis. Note that in computing the kth ratio we need to know only the kth eigenvalue of the covariance matrix; the entire spectrum does not need to be computed. To end this section, we remark that many real-life practices, such as computing the Dow Jones Industrial Average index, can now be better understood and explained through principal components analysis. This is shown in the example below.

488

A Stock-Market Example (Taken from Johnson and Wichern (1992))
Suppose that the covariance matrix for the weekly rates of return for stocks of five major companies (Allied Chemical, DuPont, Union Carbide, Exxon, and Texaco) in a given period of time is given by

        [ 1.000  0.577  0.509  0.387  0.462 ]
        [ 0.577  1.000  0.599  0.389  0.322 ]
    R = [ 0.509  0.599  1.000  0.436  0.426 ]
        [ 0.387  0.389  0.436  1.000  0.523 ]
        [ 0.462  0.322  0.426  0.523  1.000 ].

The first two eigenvalues of R are

    λ_1 = 2.857,                                                             (8.3.55)
    λ_2 = 0.809.                                                             (8.3.56)

The proportion of total population variance due to the first component is

    2.857 / 5 ≈ 57%,                                                         (8.3.57)

and the proportion of total population variance due to the second component is

    0.809 / 5 ≈ 16%.                                                         (8.3.58)

Thus the first two principal components account for about 73% of the total population variance. The eigenvectors corresponding to these principal components are

    x_1^T = (0.464, 0.457, 0.470, 0.421, 0.421)                              (8.3.59)

and

    x_2^T = (0.240, 0.509, 0.260, −0.526, −0.582).                           (8.3.60)

These eigenvectors have interesting interpretations. From the expression for x_1 we see that the first component is (roughly) an equally weighted sum of the five stocks; this component is generally called the market component. The expression for x_2, on the other hand, tells us that the second component represents a contrast between the chemical stocks and the oil-industry stocks; this component is generally called an industry component. Thus, we conclude that about 57% of the total variation in these stock returns is due to market activity and about 16% is due to industry activity. The eigenvalue problem also arises in many other important statistical analyses, for example, in computing canonical correlations. Interested readers are referred to the book by Johnson and Wichern (1992) for further reading.
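The decomposition above is easy to reproduce; the following sketch computes the eigenvalues of R and the variance proportions (the values should match those reported above, about 2.857 and 0.809, i.e. roughly 57% and 16%).

```python
import numpy as np

R = np.array([[1.000, 0.577, 0.509, 0.387, 0.462],
              [0.577, 1.000, 0.599, 0.389, 0.322],
              [0.509, 0.599, 1.000, 0.436, 0.426],
              [0.387, 0.389, 0.436, 1.000, 0.523],
              [0.462, 0.322, 0.426, 0.523, 1.000]])

lam, X = np.linalg.eigh(R)            # symmetric: ascending eigenvalues
lam, X = lam[::-1], X[:, ::-1]        # reorder so that lambda_1 >= lambda_2 >= ...

print(np.round(lam, 3))               # leading eigenvalues
print(np.round(lam / np.trace(R), 2)) # proportions of total variance, as in (8.3.54)
print(np.round(X[:, :2], 3))          # first two principal component directions (up to sign)
```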

A final comment: Most eigenvalue problems arising in statistics, such as in principal components analysis, canonical correlations, etc., are actually singular value decomposition problems, and should be handled computationally using the singular value decomposition, to be described in Chapter 10.

8.4 Localization of Eigenvalues
As we have seen in several practical applications, explicit knowledge of the eigenvalues may not be required; all that is required is a knowledge of the distribution of the eigenvalues in some given regions of the complex plane, or estimates of some specific eigenvalues. There are ways such information can be acquired without actually computing the eigenvalues of the matrix. In this section we shall discuss some of the well-known approaches. We start with a well-known result of Gersgorin (1931).

8.4.1 The Gersgorin Disk Theorems

Theorem 8.4.1 (Gersgorin's First Theorem) Let A = (a_ij) be an n x n matrix. Define

    r_i = Σ_{j=1, j≠i}^{n} |a_ij|,    i = 1, ..., n.

Then each eigenvalue λ of A satisfies at least one of the following inequalities:

    |λ − a_ii| ≤ r_i,    i = 1, 2, ..., n.

In other words, all the eigenvalues of A can be found in the union of the disks {z : |z − a_ii| ≤ r_i}, i = 1, ..., n.

Proof. Let λ be an eigenvalue of A and x an eigenvector associated with λ. Then from Ax = λx, that is, (λI − A)x = 0, we have

    (λ − a_ii) x_i = Σ_{j=1, j≠i}^{n} a_ij x_j,    i = 1, ..., n,

where x_i is the ith component of the vector x. Let x_k be the largest component of x (in absolute value). Then, since |x_j| / |x_k| ≤ 1 for j ≠ k, we have from the above

    |λ − a_kk| ≤ Σ_{j=1, j≠k}^{n} |a_kj| (|x_j| / |x_k|) ≤ Σ_{j=1, j≠k}^{n} |a_kj| = r_k.

Thus λ is contained in the disk |λ − a_kk| ≤ r_k.

Definition 8.4.1 The disks R_i = {z : |z − a_ii| ≤ r_i}, i = 1, ..., n, are called the Gersgorin disks in the complex plane.

Example 8.4.1

        [ 1  2  3 ]
    A = [ 3  4  9 ],        r_1 = 5,  r_2 = 12,  r_3 = 2.
        [ 1  1  1 ]

The Gersgorin disks are

    R_1: |z − 1| ≤ 5,    R_2: |z − 4| ≤ 12,    R_3: |z − 1| ≤ 2.

(The eigenvalues of A are 7.3067 and −0.6533 ± 0.3473i.)

Remark: It is clear from the above example that Gersgorin's first theorem gives only very crude estimates of the eigenvalues.
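The disks are cheap to compute: centers are the diagonal entries and radii are the off-diagonal absolute row sums. A small sketch for the matrix of Example 8.4.1:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [3.0, 4.0, 9.0],
              [1.0, 1.0, 1.0]])

centers = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)   # r_i = sum_{j != i} |a_ij|
print("centers:", centers)            # [1. 4. 1.]
print("radii  :", radii)              # [ 5. 12.  2.]

eigs = np.linalg.eigvals(A)
print("eigenvalues:", np.round(eigs, 4))
# Every eigenvalue lies in at least one disk |z - a_ii| <= r_i:
for lam in eigs:
    print(lam, any(abs(lam - c) <= r for c, r in zip(centers, radii)))
```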

Figure 8.7 (the Gersgorin disks R_1, R_2, R_3 of Example 8.4.1 in the complex plane)

While the above theorem only tells us that the eigenvalues of A lie in the union of the n Gersgorin disks, the following theorem gives some more specific information. We state the theorem without proof.

Theorem 8.4.2 (Gersgorin's Second Theorem) Suppose that r Gersgorin disks are disjoint from the rest. Then exactly r eigenvalues of A lie in the union of these r disks.

Proof. See Horn and Johnson (1985, pp. 344-345).

Note: For generalizations of Gersgorin's theorems, see Horn and Johnson (1985) and Brualdi (1993).

Example 8.4.2

        [ 1    0.1  0.2 ]
    A = [ 0.2  4    0.3 ].
        [ 0.4  0.5  8   ]

The Gersgorin disks are

    R_1: |z − 1| ≤ 0.3,    R_2: |z − 4| ≤ 0.5,    R_3: |z − 8| ≤ 0.9.

All three disks are disjoint from each other. Therefore, by Theorem 8.4.2, each disk must contain exactly one eigenvalue of A. This is indeed true: the eigenvalues of A are 0.9834, 3.9671 and 8.0495.

Figure 8.8 (the Gersgorin disks R_1, R_2, R_3 of Example 8.4.2, centered at 1, 4 and 8 on the real axis)

8.4.2 Eigenvalue Bounds and Matrix Norms

Simple matrix norms can sometimes be used to obtain useful bounds for the eigenvalues. Here are two examples.

Theorem 8.4.3 For any consistent pair of matrix-vector norms we have |λ| ≤ ||A||, where λ is an eigenvalue of A. In particular, the spectral radius ρ(A) of A is bounded by ||A||: ρ(A) ≤ ||A||.

Proof. The proof follows immediately by taking norms in Ax = λx:

    |λ| ||x|| = ||λx|| = ||Ax|| ≤ ||A|| ||x||,

so that |λ| ≤ ||A||.

Note: For the matrix of Example 8.4.1, ||A||_1 = 13 while ρ(A) = 7.3067.

Since the eigenvalues of A^T are the same as those of A, we also have ρ(A) ≤ ||A^T||. Combining these two results and taking the infinity norm in particular, we obtain

Corollary 8.4.4

    ρ(A) ≤ min { max_i Σ_{j=1}^{n} |a_ij|,  max_j Σ_{i=1}^{n} |a_ij| }.

Theorem 8.4.5 Let λ_1, λ_2, ..., λ_n be the eigenvalues of A. Then

    Σ_{i=1}^{n} |λ_i|² ≤ ||A||_F².

Proof. The Schur Triangularization Theorem tells us that there exists a unitary matrix U such that U*AU = T, an upper triangular matrix. Thus T*T = U*A*AU, so A*A is unitarily similar to T*T. Since similar matrices have the same traces,

    Tr(T*T) = Tr(A*A) = ||A||_F².

Again, Tr(T*T) = Σ_i Σ_j |t_ij|². Thus

    Σ_{i=1}^{n} |λ_i|² = Σ_{i=1}^{n} |t_ii|² ≤ Σ_i Σ_j |t_ij|² = ||A||_F².

8.5 Computing Selected Eigenvalues and Eigenvectors

8.5.1 Discussions on the Importance of the Largest and Smallest Eigenvalues

We have seen before that in several applications all one needs to compute is a few of the largest or smallest eigenvalues and the corresponding eigenvectors. For example, recall that in the buckling problem it is the smallest eigenvalue that is the most important one. In vibration analysis of structures, a common engineering practice is to compute just the first few smallest eigenvalues (frequencies) and the corresponding eigenvectors (modes), because it has been seen in practice that the larger eigenvalues and eigenvectors contribute very little to the total response of the system. The same remarks hold in the case of control problems modeled by a system of second-order differential equations arising in the finite-element generated reduced-order models of large flexible space structures (see the book by Inman (1989)).

In statistical applications, such as principal components analysis, only the first few largest eigenvalues are computed. There are other applications where only the dominant and subdominant eigenvalues and the corresponding eigenvectors play an important role.

8.5.2 The Role of Dominant Eigenvalues and Eigenvectors in Dynamical Systems

Let's discuss briefly the role of the dominant eigenvalues and eigenvectors in the context of dynamical systems. Consider the homogeneous discrete-time system

    x_{k+1} = A x_k,    k = 0, 1, 2, ....

Let λ_1 be the dominant eigenvalue of A, that is,

    |λ_1| > |λ_2| ≥ |λ_3| ≥ ... ≥ |λ_n|,

where λ_1, ..., λ_n are the eigenvalues of A. Suppose that A has a set of independent eigenvectors v_1, v_2, ..., v_n. Then the state x_k at any time k > 0 is given by

    x_k = α_1 λ_1^k v_1 + α_2 λ_2^k v_2 + ... + α_n λ_n^k v_n,

where x_0 = α_1 v_1 + ... + α_n v_n. Since |λ_1|^k > |λ_i|^k, i = 2, 3, ..., n, it follows that for large values of k

    |λ_i^k| / |λ_1^k| → 0,    i = 2, 3, ..., n.

This means that, provided α_1 ≠ 0, for large values of k the state vector x_k approaches the direction of the vector v_1 corresponding to the dominant eigenvalue λ_1. Moreover, the rate at which the state vector approaches v_1 is determined by the ratio of the second dominant eigenvalue to the first, |λ_2 / λ_1|. (A proof will be presented later.) In the case α_1 = 0, the second dominant eigenvalue λ_2 and the corresponding eigenvector assume the role of the first dominant eigenpair. Similar conclusions hold for the continuous-time system ẋ(t) = A x(t); for details, see the book by Luenberger (1979).

In summary, the long-term behavior of a homogeneous dynamical system can essentially be predicted from just the first and second dominant eigenvalues of the system matrix and the corresponding eigenvectors.

It is well known (see Luenberger (1979). Let max(g ) denote the element of maximum modulus of the vector g . then over the long term there is neither growth nor decay in the population. 1 = 0: In this case. If the dominant eigenvalue (that is. then it follows from 1 of the matrix A is less than 1 pk = 1 kv 1 1 +:::+ k n n vn that the population decreases to zero as k becomes large.5. because they rely on matrix vector multiplications only (and therefore. Moreover. Finally. the zero entries in a sparse matrix do not get lled in during the process). 170) that such a system can be modeled by pk = Apk k = 0 1 2 : : : +1 where pk is the population-vector. 1 is the dominant eigenvalue of A.The second dominant eigenpair is particularly important in the case long term behavior of the system is determined by this pair. Let be the tolerance and N be the maximum number of iterations. if j 1j > 1 then there is long term growth in the population. Similarly. The Inverse Iteration and the Rayleigh Quotient Iteration In this section we will brie y describe two well-known classical methods for nding the dominant eigenvalues and the corresponding eigenvectors of a matrix. Let the eigenvalues 1 2 : : : n of A be such that j j > j j j j : : : j nj 1 2 3 that is. p. Let v1 be the corresponding eigenvector. it is the second dominant eigenvalue of A that determines how fast the original population distribution is approaching the nal distribution. It is so named because it is based on implicit construction of the powers of A. if j 1j < 1). the An Example on the Population Study Let's take the case of a population system to illustrate this. In the latter case the original population approaches a nal distribution that is de ned by the eigenvector of the dominant eigenvalue. if the dominant eigenvalue is 1. 8. The methods are particularly suitable for sparse matrices. The Power Method The power method is frequently used to nd the dominant eigenvalue and the corresponding eigenvector of a matrix. 497 .3 The Power Method.

1 6= 0 is the only dominant eigenvalue. For k = 1 2 3 : : : do 1 xk = Axk. a multiple of v . as k ! 1. we have Ak x xk = max(Akx ) : 0 0 Let the eigenvectors v1 through vn associated with write x0 = 1 v1 + 2 v2 + : : : + n vn. 498 1 . max(^k. )) < or if k > N . 2. x Proof.Algorithm 8. 1 6= 0: So. We can then Ak x = Ak ( v + v + : : : + nvn ) kv + kv + : : : + k = n nvn k = k v + v +:::+ n 1 1 1 2 2 2 1 1 1 2 2 1 2 n 1 k vn]: Since 1 is the dominant eigenvalue. ^ x Stop if (max(^k ) .5. From above. and fxkg ! w . x x 1 1 1 1 Theorem 8. . and fmax(^k)g ! : x 1 Remarks: We have derived the power method under two constraints: 1. i 1 k ! 0 as k ! 1 i = 2 3 : : : n: Ak x xk = max(Ak x ) ! cv 0 0 1 Thus. ^ xk = xk = max(^k). Choose x 0 Step 2.1 Power Method Step 1. 0 1 1 2 2 1 ::: n be linearly independent.5.1 max(^k) ! .

This shows that in this case the power method converges to some vector in the subspace spanned by v1 : : : vn. because after a few iterations.6235. and 9. Then we have Ak x 0 = k 1 r X k 1 i=1 r X 1 ivi + i vi n X i=r+1 i ( i= 1 )k v ! i (since ( i= 1)k is small for large values of k). The normalized eigenvector corresponding to the largest eigenvalue 9. we note that the method still converges when the matrix A has more than one dominant eigenvalue.5.0:6235.1 3 4 5 x0 = (1 1 1)T : 01 2 31 B C A = B2 3 4C @ A The eigenvalues of A are 0. For example. Example 8.The rst constraint ( 1 6= 0) is not really a serious practical constraint.6233 is (:3851 :5595 :7339)T : k=1: 061 B C x = Ax = B 9 C ^ @ A 1 0 2 k=2: 0:50 x1 = B 3 C = B 0:75 C ^ x1 = max(^ ) B 4 C B C x1 @ A @ A 1 1 12 max(^10 =1 0 x ) 12 1 1 0 5:00 B x = Ax = B 7:25 ^ B @ 2 1 9:50 max(^2 ) = 9:50 x 499 1 C C C A 1 0 0:5263 C B x = B 0:7632 C C B A @ 2 1:0000 . let 1 = 2 = : : : = r and j 1j > j r+1j > : : : > j nj and let there be independent eigenvectors associated with 1. . As far as the second constraint is concerned. round-o errors will almost always make it happen.

2 1 2 1 2 The absolute value of the error at each step decreases by the ratio ( ). (Note that the normalized dominant of 1 0:3851 B C eigenvector B 0:5595 C is a scalar multiple of x3 .) @ A 0:7339 Convergence of the Power Method The rate of convergence of the power method is determined by the ratio from the following.6235 and fxk g is converging towards x the direction0 the eigenvector associated with this eigenvalue. that is. if is close to . the convergence will be fast.k=3: 0 5:0526 1 B C x = Ax = B 7:3421 C ^ @ A 9:6316 x = x = max(^ ) ^ x 0 0:5246 1 B C = B 0:7623 C @ A 1:000 max(^ ) = 9:6316: x 3 3 3 3 3 2 Thus fmax(^3)g is converging towards the largest eigenvalue 9. as is easily seen kxk . 2 1 . v k = k 1 1 2 2 1 2 k k v +:::+ 2 2 n n 1 k n 1 vn k k j j 2 2 1 k 1 kv k + : : : + j n j k vn k (j 2jkv2k + : : : + j nj kvnk) : (Since j i j j 2 j i = 3 4 : : : n). v k 1 1 2 1 k k = 1 2 3 ::: = (j 2j kv2 k + : : : + j nj kvnk): 1 1 This shows that the rate at which xk approaches 1 v depends upon how fast j jk goes to zero. 500 . then the convergence will be very slow if this ratio is small. 1 1 Thus we have where kxk .
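The power method described above takes only a few lines to implement. The following is a minimal sketch (scaling by the entry of maximum modulus, as in the algorithm), applied to the matrix of Example 8.5.1:

```python
import numpy as np

def power_method(A, x0, tol=1e-10, max_iter=200):
    """Power method: x_k = A x_{k-1}, scaled by its entry of maximum modulus."""
    x = x0.astype(float)
    mu = 0.0
    for _ in range(max_iter):
        y = A @ x
        mu_new = y[np.argmax(np.abs(y))]   # max(y): the entry of maximum modulus
        x = y / mu_new
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu, x

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0],
              [3.0, 4.0, 5.0]])
mu, x = power_method(A, np.array([1.0, 1.0, 1.0]))
print(mu)                       # about 9.6235, the dominant eigenvalue
print(x / np.linalg.norm(x))    # about (0.3851, 0.5595, 0.7339), the dominant eigenvector
```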

is the dominant eigenvalue of A . but the eigenvectors remain unaltered. Thus. be slow. (Note that by shifting the matrix A by . the above choice of is not useful in practice. is an e ective method for Algorithm 8.) By choosing appropriately.5. then the rate of convergence will be determined by the ratio 2 . The choice of = 1 + 20 = 2 10:5 yields the ratio 2 . I )^k = xk.2 The sequence fxkg converges to the direction of the eigenvector corresponding to 1. and if the power method is applied to the shifted matrix A . the ratio . The Inverse Power Method/Inverse Iteration The following iterative method. I . Step 1. known as the inverse iteration. Let be the tolerance and N be the maximum number of iterations. 1 assuming that i are all real. An optimal choice (Wilkinson AEP. This simple choice of sometimes indeed yields very fast convergence: but there are many common examples where the convergence can still be slow with this choice of . For k = 1 2 3 : : : do (A . 2 . is 2 ( 1 + n). Therefore. 1 1 Theorem 8. Choose x : Step 2. rather than . p. if is a suitable shift so that 1 . j(i 6= 1) that is.The Power Method with Shift In some cases. j j i . computing an eigenvector when a reasonably good approximation to an eigenvalue is known. thus yielding the faster convergence. because the eigenvalues are not known a priori. in some cases. . is much closer to 1 than to the other eigenvalues. still close to one.2 Inverse Iteration Let be an approximation to a real eigenvalue 1 such that j 1 . can be made signi cantly smaller . 572) of .5. = 8:5 . 501 . than 1 . Consider a 20 20 matrix A with the eigenvalues 20 19 : : : 2 1. k=kxk k) < or if k > N . convergence can be signi cantly improved by using a suitable shift. xk. I . the eigenvalues get 1 1 2 1 2 shifted by . Furthermore. the rate of convergence will still 9:5 1 . x 0 xk = xk = max(^k): ^ x Stop if (kxk .
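A minimal sketch of inverse iteration follows; it uses the same matrix with an assumed shift sigma = 9 (a rough approximation of the dominant eigenvalue 9.6235) and reuses a single LU factorization of A − sigma*I for every solve, rather than forming an inverse.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_iteration(A, sigma, x0, n_iter=5):
    """Inverse iteration: solve (A - sigma*I) xhat_k = x_{k-1}, then normalize."""
    lu, piv = lu_factor(A - sigma * np.eye(A.shape[0]))   # factor once, reuse
    x = x0 / np.linalg.norm(x0)
    for _ in range(n_iter):
        y = lu_solve((lu, piv), x)
        x = y / np.linalg.norm(y)                         # keep only the direction
    return x

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0],
              [3.0, 4.0, 5.0]])
x = inverse_iteration(A, sigma=9.0, x0=np.array([1.0, 1.0, 1.0]))
print(np.round(np.abs(x), 4))   # about (0.3851, 0.5595, 0.7339), the eigenvector for 9.6235
```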

Numerical Stability of the Inverse Iteration At rst sight inverse iteration seems to be a dangerous procedure. ). c v + : : : + ( n . : : : ( n .1 where F is small. ( . ). 1 are the same as those of A. I + F )xk = xk. Suppose that x0 = c1 v1 + c2v2 + 1 1 0 1 1 1 1 2 1 2 2 + cnvn . is the dominant one (it is the largest). and it is the direction of the eigenvector that we are interested in. v : : : + cn 1 1 1 2 2 1 1 1 2 1 2 2 . Thus. the coe cient of the rst term in the expansion. ). n. An Illustration Let us illustrate the above with k = 1. )k c v + c . I ) is exactly what we want. namely ( 1 ) . ^ 1. I ). cnvn: ^ Since 1 is closer to than any other eigenvalue. a a \The iterated vectors do indeed converge eventually to the eigenvectors of A + F . in practice the ill-conditioning of the matrix (A . That is why it is also known as the inverse power method. 620{621). I ). this ill-conditioning might a ect the computed approximations of the eigenvector. n. " k 1 . The eigenvalues of (A . Then 1 x = (A . xk converges to the direction of v1. . x1 is roughly a multiple of v1. Fortunately. as in the case of the power method.Remark: Note that inverse iteration is simply the power method applied to (A . Wilkinson (AEP pp. ). because if is near 1. Thus. I ). 1 k vn : # Since 1 is closer to than any other eigenvalue. which is what we desire. It is the direction of v1 which we are trying to compute. x = ( . Consequently. The error at each iteration grows towards the direction of the eigenvector. the rst term on the right hand side is the dominating one and. c v + ( . are ( . ). the matrix (A . we can write Proof. = ( ." 502 . therefore. ). 620{621) has remarked that in practice xk is remarkably close ^ to the solution of (A . I ) is obviously ill-conditioned. For details see Wilkinson (AEP pp. )k v + ( c )k : : : + ( cn )k vn ^ . and the eigenvectors 1 1 1 2 1 1 c xk = ( .

p.Example 8. 2 Choosing the Initial Vector x 0 To choose the initial vector x0 we can run a few iterations of the power method and then switch to the inverse iteration with the last vector generated by the power method as the initial vector x0 in the inverse iteration. Wilkinson (AEP.2 3 4 5 x0 = (1 1 1)T = 9: 01 2 31 B C A = B2 3 4C @ A k = 1: 1 1 x = (1 1:5 2)T ^ 1 x = x =kx k = (:3714 :5571 :7428)T : ^ ^ 1 2 k = 2: 2 x = (:619 :8975 1:1761)T ^ 2 x = x =kx k = (:3860 :5597 :7334)T : ^ ^ 2 2 2 k = 3: 3 x = (:6176 :8974 1:1772)T ^ 3 x = x =kx k = (:3850 :5595 :7340): ^ ^ 3 3 2 k = 4: 4 x = (:6176 :8974 1:1772)T ^ 4 x = x =kx k = (:3850 :5595 :7340)T ^ ^ 4 4 2 k = 5: 5 x = (:6177 :8974 1:1772)T ^ 5 x = x =k(^ )k = (:3851 :5595 :7339): ^ x 5 5 2 Remark: In the above example we have used norm k k as a scaling for the vector xi to emphasize that the scaling is immaterial since we are working towards the direction of the eigenvector.5. 627) has stated that if x0 is assumed to be such that Le = Px 503 0 .

4:1 0 1 1 0 .8:1 1 0 0 2 3 B C B C L = B .1Le = B :7531 C : @ A .:8456 1 0 0 1:0200 01 0 01 B C P = B0 1 0C: @ A 0 0 1 0 1:000 1 B C x0 = P . I ) = LU and e is the vector of unit elements.:2160 1 1 0 k=1: 0 :4003 1 B C x = (A .where P is the permutation matrix satisfying P (A . I ).3 01 2 31 B C A = B2 3 4C: @ A 3 4 5 0 The eigenvalues of A are: Choose = 9:1 . x = B :6507 C ^ @ A 0 :4083 1 C B x = x = max(^ ) = B :6637 C : ^ x @ A 1 1 1 :9804 1:000 504 .6:1 4:0 C @ A 3:0 4:0 .5:6062 4:7407 C @ A @ A . then only \two iterations are usually adequate". I = B 2:0 . Note: If x0 is chosen as above.:2469 1 0 C U = B 0 .0:6235 and 9:6235: 0 . provided that is a good approximation to the eigenvalue.:3704 . then the computations of x1 involves only solution of the ^ triangular system: U x1 = e1 : ^ Example 8.8:1 2:0 3:0 1 B C A .5.

Therefore x = c v + : : : + cnvn : Assume that vi i = 1 : : : n are normalized. and noting that viT vj = 0 i 6= j .k=2: 0 :9367 1 B C x = (1 .3 times the normalized eigenvector. Then the quotient T Rq = = xxTAx x is a good approximation to the eigenvalue for which x is the corresponding eigenvector. Since A is symmetric there exists a set of orthogonal eigenvectors v v : : : vn. we have T T = x TAx = (c v + : : : + cnvn ) TA(c v + : : : + cn vn) xx (c v + : : : + cnvn ) (c v + : : : + cnvn ) c Tc = (c v + : : : + c nvn ) (+ : : :v++ : : : + cn nvn ) +c cn + + = c c ++ cc + : :: :: :+ c ncn n 3 2 c + : : : + n cn 61 + c c 7: 7 = 6 6 7 4 5 c + : : : + cn 1+ c c 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 1 1 2 1 2 2 2 2 2 2 2 2 2 1 2 2 1 1 2 1 2 1 2 1 1 we can write 1 2 505 . since Avi = ivi i = 1 : : : n. Proof. @ A :7339 The Rayleigh Quotient Theorem 8. that is viT vi = 1: Then.0 0 :5315 1 B C x = B :7683 C @ A 2 1:7624 :3851 1 B C The normalized eigenvector correct to four digits is B :5595 C.3 Let A be a symmetric matrix and let x be a reasonably good approximation to an eigenvector. I ). x = B 1:3540 C ^ @ A 2 1 1 1:000 which is about 1.5.

Solve for xk+1 : ^ 3. Compute 4. Let N be the maximum number of iterations to be performed. Thus.5. which means that is close to 1.:2 x Note : It can be shown (exercise) that for a symmetric matrix A: n Rq 1 where n and 1 are the smallest and the largest eigenvalue of A respectively. the expression within ] is close to 1. k I )^k+1 = xk x xk = xk = max(^k ): ^ x +1 +1 +1 xk ) is an acceptable eigenvalue-eigenvector pair or if k > N . Rayleigh-Quotient Iteration The above idea of approximating an eigenvalue can be combined with inverse iteration to compute successive approximations of an eigenvalue and the corresponding eigenvector in an iterative fashion.3 Rayleigh-Quotient Iteration For k = 0 1 2 : : : do 1. c1 is larger than other ci i = 2 : : : n. = xxTAx = . Example 8.:2361. Stop if the pair ( k (A .5. x T 1 2 A= : 2 3 ! x= T Then the Rayleigh Quotient .1 The quotient Rq = xxTAx is called the Rayleigh Quotient.:5 1 ! : is a good approximation to the eigenvalue .4 Let De nition 8.5. known as Rayleigh-quotient iteration. 506 .Because of our assumption that x is a good approximation to v1 . Compute k = xT Axk =xT xk k k 2. described as follows. Algorithm 8.
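For the symmetric case, Rayleigh-quotient iteration can be sketched as follows; the starting vector is taken from a few power-method steps, as suggested in the text, and the shift at each step is the Rayleigh quotient of the current iterate.

```python
import numpy as np

def rayleigh_quotient_iteration(A, x0, n_iter=5):
    """Rayleigh-quotient iteration for a symmetric A (shift = current Rayleigh quotient)."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(n_iter):
        sigma = x @ A @ x                    # Rayleigh quotient (x has unit norm)
        try:
            y = np.linalg.solve(A - sigma * np.eye(A.shape[0]), x)
        except np.linalg.LinAlgError:        # sigma is (numerically) an exact eigenvalue
            break
        x = y / np.linalg.norm(y)
    return x @ A @ x, x

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0],
              [3.0, 4.0, 5.0]])
lam, x = rayleigh_quotient_iteration(A, np.array([0.5246, 0.7622, 1.0]))
print(lam)   # about 9.6235 after very few steps (the convergence is cubic)
```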

3851 times x2 is this eigenvector to three digits.5. Convergence: It can be shown (Wilkinson. 636. perhaps the best thing to do is to use the 0 0 direct power method itself a few times and then use the last approximation as x0 . Remark: Rayleigh Quotient iteration can also be de ned in the nonsymmetric case. Example 8. p. p. See also Parlett (1974).method is cubic. AEP. where one nds both left and right eigenvectors at each step.5 01 2 31 B C A = B2 3 4C @ A 3 4 5 Let us take 1:000 which is obtained after 3 iterations of the power method. We omit the discussion of the nonsymmetric case here and refer the reader to Wilkinson AEP. Then 0 :5246 1 B C x = B :7622 C @ A 0 k=0: 0 = xT Ax0 =(xT x0) = 9:6235 0 0 0 :5247 1 B C x = B :7623 C @ A 1 k=1: 1:000 = xT Ax1 =(xT x1) = 9:6235 1 0 11000 1 : B C x2 = B 1:4529 C @ A 1:9059 0 :3851 1 B C The normalized eigenvector associated with 9.6255 is B :5595 C. 507 . Thus two iterations were su cient. @ A :7339 1 Note that . 630) that the rate of convergence of the Choice of x : As for choosing an initial vector x .

x xT xi i = 1 2 : : : n: Since x is orthogonal to the other eigenvectors. Then de ne T A2 = A1 . Case 2: A is Nonsymmetric The idea above can be easily generalized to a nonsymmetric matrix however. x xT 1 where xT x1 = 1: Then 1 1 A xi = A xi . Case 1: A is Symmetric First suppose that A = A1 is symmetric. and x through xn are the corresponding eigenvec2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 3 1 tors. The basic idea behind de ation is to replace the original matrix by another matrix of the same or lesser dimension using a computed eigenpair such that the de ated matrix has the same eigenvalues as the original one except the one which is used to de ate. 1x1 y1 where y T x = 1: 1 1 508 . De ne 1 1 1 1 A = A . Let ( 2 1 1 x ) be an eigenpair of A . Let (x1 y1) be the pair of right and left eigenvector of A = A1 corresponding to 1. and assuming xT x = 1 we have For i = 1 : A x = A x . 0 = ixi: Thus the eigenvalues of A are 0 : : : n.5.8. we need both left and right eigenvectors here. Hotelling De ation The Hotelling de ation is a process which replaces the original matrix A = A1 by a matrix A2 of the same order such that all the eigenvalues of A2 are the same as those of A1 except the one which is used to construct A2 from A1 . x = 0 = 0: For i = 1: 6 A xi = A xi . the next dominant eigenvalue 2 can be computed by using de ation.4 Computing the Subdominant Eigenvalues and Eigenvectors: De ation Once the dominant eigenvalue 1 and the corresponding eigenvector v1 have been computed. x xT x = x .

p. then 1 1 1 0 B0 B A = HAH = B . Of course. eigenvectors x1 through xn. we see that 1 = 0 and. if j 1j > j 2j j 3j : : : j nj. 1). through n are the eigenvalues of A2 corresponding to the Remarks: Though the Hotelling de ation is commonly recom- mended in the engineering literature as a practical process for computing a selected number of eigenvalues and eigenvectors and is wellsuited for large and sparse matrices. @ 1 1 ::: 0 A2 1 1 C 0 C bT A C=@ C C 0 A A 1 2 where A2 is (n . the technique will work with any similarity transformation however. 509 . x y T xi = ixi . 0 = ixi : 2 1 1 1 1 2 Again. we will use Householder matrices for reasons of numerical stability.5. and the eigenvalues of A2 are the same as those of A except for 1 in particular.Then using the bi-orthogonality conditions of the eigenvectors xi and yi we have For i = 1 : T A2x1 = A1x1 . the method both in the symmetric and nonsymmetric cases is numerically unstable (see Wilkinson AEP. 585). The method is based upon the following result: Theorem 8. B B. 1x1 = 0 For i 6= 1 : A xi = A xi . 1) (n . Householder De ation We will now construct a de ation using similarity transformation on A1 with Householder matrices.. then dominant eigenvalue of A2 is 2. which is the second dominant (subdominant) eigenvalue of A.4 Let ( v ) be an eigenpair of A and H be a Householder matrix such that Hv is a multiple of e1 . 1x1y1 x1 = 1x1 .

Example 8. Discard the rst row and the rst column of HAH and nd the dominant eigenvalue of the (n . C : B C B. B B. 4. if B B0 HAH = B . HAH (ke1) = 1ke1 (since Hv1 = ke1 ) or HAHe1 = 1e1..C @ A 1 0 3.. (This means that the rst column of HAH is 1 times the rst column of the identity matrix. -0. @ 1 ::: A2 C C C: C C A 1 plus (n .Proof. 2. Algorithm 8. From Av = v we have 1 1 1 HAHHv = Hv (since H = I ): 1 1 1 2 That is. Compute the dominant eigenvalue 1 and the corresponding eigenvector v1 using the power method and the inverse power method.0018.5. Moreover.4 Householder De ation to Compute the Subdominant Eigenvalue 1.3083. 1.4947. 1) (n . 1) matrix thus obtained. I ). Find a Householder matrix H such that 0 1 B C B0C Hv = B . which is the second dominant eigenvalue of A. I ) = 1 det(A2 .) Thus HAH must have the form 0 1 0 Since det(HAH . 1) j j > j j > j j j j : : : j nj 1 2 3 4 the dominant eigenvalue of A2 is 2. Compute HAH . it follows that the eigenvalues of HAH are eigenvalues of A2 .5. 510 .6 0 :2190 :6793 :5194 1 B C A = B :0470 :9347 :8310 C : @ A :6789 :3835 :0346 The eigenvalues of A are: .

show below that this eigenvector can be computed directly from A2 without invoking inverse iteration.0:7039 C @ A .:3223 .0:4430 1 1 1 1 2 1 H=I.:3223 .:5552 . the subdominant eigenvector v2 of A corresponding to 2 is obtained from v = Hv : 2 (2) 1 (2) (2) (Note that HAHv1 = 2v1 ).:4430 . Computing the Subdominant Eigenvector 2 2 0 Once the subdominant eigenvalue 2 has been computed using the above procedure. We.:3331 C B C B C: 3.:7039 :6814 .:8738 011 B C Hv = B 0 C : @ A 1 1 0 1:4977 .:4430 0 . the corresponding eigenvector v2 can be found from the inverse iteration. = 1:4947 v = B . kv k e = B .:4430 1 B C = B . however.:5052 0 4.:3331 1 0 1:4977 . The dominant eigenvalue of A is . Then it can be shown that the eigenvector (2) v1 of A1 corresponding to 1 is given by v = (2) 1 ! where is determined from v (2) 2 (2) ( 1 . u = v .:7039 C : @ A .:2005 .1:5552 1 B C 2. 2 ) + bT v2 = 0: (2) Once v1 is found.0:3083 which the subdominant eigenvalue of A. 2uuT uT u 0 .:2005 C @ A . HAH = B 0 :1987 :2672 C = B 0 C A @ A B @ A 0 :3736 .:7039 . (2) Let v2 be the eigenvector of A2 corresponding to 2 .0 .:5552 1 B C 1. Thus to compute the subdominant eigenvector v2 of A: Given 0 HAH = A = @ 1 bT A : 0 A 1 2 1 511 .

Algorithm 8.5.5 Computing the Subdominant Eigenvector
(2) 1. Compute the eigenvector v2 of A2 corresponding to 2.

2. Compute given by
(2) 3. Compute the eigenvector v1 of A1 :

T v (2) = b ;2 : 2 1

v =
(2) 1

!
v
(2) 2

:

4. Compute the vector v2 of A:

v = Hv :
2 (2) 1

Example 8.5.7

0 :2190 :6793 :5194 1 B C A = B :0470 :9347 :8310 C @ A
:6789 :3835 :0341
1

0 1:4947 ;:3223 ;:3333 1 B C HAH = B 0 :1987 :2672 C @ A 0 :3736 ;:5052

= 1:4947

2

= ;:3083

bT = (;:3223 ; :3331) :1987 :2672 A = : ;:3736 ;:5052
2

!

1. 2. 3. 4.

;:4662 ! v = :8847 Tv = b ; = :0801 0 :0801 1 B C v = B ;:4662 C @ A ;:8847 0 ;:1082 1 B C v = Hv = B ;:5514 C : @ A 0:8314 Computing the other largest Eigenvalues and Eigenvectors
(2) 2 (2) 2 2 1 (2) 1 2 (2) 1

512

Once the pair ( 2 v2) is computed, the matrix A2 can be de ated using this pair to compute the pair ( 3 v3). From the vector pair ( 3 v3), we then compute the pair ( 4 v4) and so on. Thus by repeated applications of the process we can compute successively all the n eigenvalues and eigenvectors of A.
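A minimal sketch of one deflation step with a Householder matrix (the idea behind the Householder deflation algorithm above) is shown below. For the illustration the dominant eigenpair is taken from numpy rather than from the power method, and the matrix is the one used in Example 8.5.6 above.

```python
import numpy as np

def householder_deflate(A, v1):
    """H maps v1 to a multiple of e1, so H A H has the corresponding eigenvalue
    in position (1,1) and the trailing block A2 carries the remaining eigenvalues."""
    v1 = v1 / np.linalg.norm(v1)
    e1 = np.zeros_like(v1)
    e1[0] = 1.0
    sign = 1.0 if v1[0] >= 0 else -1.0
    u = v1 + sign * e1                                   # Householder direction
    H = np.eye(len(v1)) - 2.0 * np.outer(u, u) / (u @ u)
    B = H @ A @ H
    return B[0, 0], B[1:, 1:]                            # (lambda_1, deflated block A2)

A = np.array([[0.2190, 0.6793, 0.5194],
              [0.0470, 0.9347, 0.8310],
              [0.6789, 0.3835, 0.0346]])
lams, V = np.linalg.eig(A)
i = int(np.argmax(np.abs(lams)))
lam1, A2 = householder_deflate(A, np.real(V[:, i]))
print(lam1)                    # the dominant eigenvalue of A
print(np.linalg.eigvals(A2))   # the remaining (subdominant) eigenvalues of A
```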

Remarks: However, if more than a few eigenvalues are needed, the QR iteration method to be described a little later should be used, because in that case, the QR iteration will be more cost-e ective, and it is a stable method. Computing the Smallest Eigenvalues
It is easy to see that the power method applied to A;1 gives us the smallest eigenvalue in magnitude (the least dominant one) of A. Let A be nonsingular and let the eigenvalues of A be ordered such that

j j>j j j j
1 2 3

j

n;1j > j nj > 0:

Then the eigenvalues of A;1 (which are the reciprocals of the eigenvalues of A) are arranged as: 1 > 1 1 1 > 0:
n n;1 n;2

j j
1

That is, 1 is the dominant eigenvalue of A;1 . This suggests that the reciprocal of the smallest n eigenvalue can be computed by applying the power method to A;1.

Algorithm 8.5.6 Computing the Smallest Eigenvalue in Magnitude
1. Apply the power method to A;1 . 2. Take the reciprocal of the eigenvalue obtained in step 1.

Note: Since the power method is implemented by matrix-vector multiplication only, the inverse of A does not have to be computed explicitly. This is because computing A; x,
1

where x is a vector, is equivalent to solving the linear system: Ay = x. To compute the next least dominant eigenvalue (the second smallest eigenvalue in magnitude), we compute the smallest eigenvalue and the corresponding eigenvector and then apply de ation. The inverse power method applied to the (n ; 1) (n ; 1) matrix at bottom right hand 513

corner will yield the reciprocal of the second smallest eigenvalue of A. Once the required eigenvalues are computed, we can always use the inverse power method to compute the corresponding eigenvectors or use de ation as shown earlier.

Remark: To accelerate the convergence, a suitable shift should be made. Example 8.5.8
01 4 51 B C A = B2 3 3C: @ A

1 1 1 The power method (without shift) applied to A;1 with the starting vector x0 = (1 ;1 1)T gave = 9:5145. Thus the smallest eigenvalue of A is: 1 = :1051:
(Note the eigenvalues of A are 6.3850, 01.4901 and .1051.)

Summary of the Process for Finding the Selected Eigenvalues and Eigenvectors
To compute a selected number of eigenvalues (mainly the rst few largest or smallest eigenvalues and the corresponding eigenvectors), the following combination of the power method, inverse iteration and de ation is recommended to be used in the following sequences: 1. Use the power method (the power method applied to A;1) to compute a reasonably good approximation of the largest (smallest) eigenvalue in magnitude and of the corresponding eigenvectors. 2. Use the inverse iteration with the approximate eigenvalue (keeping it xed) and the approximate eigenvector obtained in step 1 as the starting vector. 3. Apply now de ation to compute the next set of eigenpairs.

514

8.6 Similarity Transformations and Eigenvalue Computations
Recall (Theorem 8.2.2) that two similar matrices A and B have the same eigenvalues, that is, if X is a nonsingular matrix such that X ;1AX = B then the eigenvalues of A are the same as those of B . One obvious approach to compute the eigenvalues of A, therefore, will be to reduce A to a suitable \simpler" form B by similarity so that the eigenvalues of B can be more easily computed. However, extreme caution must be taken here. It can be shown (Golub and Van Loan MC 1984, p. 198) that

Theorem 8.6.1 (X ; AX ) = X ; AX + E where
1 1

kE k

2

kX k kX ; k kAk :
2 1 2 2 2

Remark: Since the error matrix E clearly depends upon Cond (X ), the above theorem tells us that in computing the eigenvalues and eigenvectors we should avoid ill-conditioned transforming matrices to transform A by similarity to a \simpler" form. Primarily because of this the eigenvalues of a matrix A are not computed using the characteristic polynomial of A or by transforming A to the Jordan Canonical form. Below we will discuss them in
some detail.

8.6.1 Eigenvalue Computations Using the Characteristic Polynomial Why should eigenvalues not be computed via the characteristic polynomial?

Since the eigenvalues of a matrix are the zeros of the characteristic polynomial, it is natural to think of computing the eigenvalues of A by nding the zeros of its characteristic polynomial. However, this approach is NOT numerically e ective.

515

Di culties with Eigenvalue Computations Using the Characteristic Polynomial
First, the process of explicitly computing the coe cients of the characteristic polynomial may be numerically unstable. Second, the zeros of the characteristic polynomial may be very sensitive to perturbations on the coe cients of the characteristic polynomial. Thus if the coe cients of the characteristic polynomial are not computed accurately, there will be errors in the computed eigenvalues. In Chapter 3 we illustrated the sensitivity of the root- nding problem by means of the Wilkinsonpolynomial and other examples. We will now discuss the di culty of computing the characteristic polynomial in some detail here. Computing the characteristic polynomial explicitly amounts to transforming the matrix to a block-companion (or Frobenius) form. Every matrix A can be reduced by similarity to

0 Ck where each Ci is a companion matrix. The matrix C is said to be in Frobenius form. If k = 1, the matrix A is nonderogatory. Assume that A is nonderogatory and let's see how A can be reduced to a companion matrix by similarity. This can be achieved in two stages:

0C 1 0 B ... C C C=B @ A
1

Reduction of a Matrix to a Companion Matrix Stage 1: The matrix A is transformed to an unreduced Hessenberg
matrix H by orthogonal similarity using the Householder or Givens method. ther reduced to a companion matrix by similarity.

Stage 2: The transformed unreduced Hessenberg matrix H is fur-

516

We have already seen in Chapter 5 how a nonderogatory matrix A can be transformed to an unreduced Hessenberg matrix by orthogonal similarity in a numerically stable way using the Householder or Given method. Consider now stage 2, that is, the transformation of the unreduced Hessenberg matrix H to a companion matrix C . Let X be the nonsingular transforming matrix, that is HX = XC where 00 0 : 1 0 c1 B1 0 : C B 0 c2 C B C B0 1 0 C B C=B 0 c3 C : B .. .. .. . . . .. .. C C B. . . . .C @ A 0 0 : 0 cn If x1 x2 : : : xn are the n successive columns of X , then from

HX = XC
we have and

Hxi = xi
1

+1

i = 1 ::: n; 1
2 2

Hxn = c x + c x +
1

+ cn xn:

Eliminating x2 through xn we have

(H )x1 = 0

where (x) is the characteristic polynomial of C . Since the two similar matrices have the same characteristic polynomial, we have by the CayleyHamilton Theorem (H ) = 0: This means that x1 can be chosen arbitrarily. Once x1 is chosen, x2 through xn can be determined by the recursion: xi+1 = Hxi i = 1 : : : n ; 1: Thus it follows that if x1 is chosen so that

X = (x Hx : : : Hxn; )
1 1 1

is nonsingular, then X will transform H to C by similarity. 517

So, if one or more subdiagonal entries hi i of H are signi cantly small, then the inverse of X will have large entries and consequently X will be ill-conditioned. Thus in
+1

Choose x1 = (1 0 : : : 0), then the matrix 01 1 B0 h C B 21 C B C B0 0 h h C C X = (x1 Hx1 : : : Hxn;1) = B B . . 21 32 . C B . . ... .. C . . B. . C . @ A 0 0 0 0 h21h32 hn n;1 is nonsingular because hi+1 i 6= 0 i = 1 : : : n ; 1.

such a case the transformation of an unreduced Hessenberg matrix H to a companion matrix will be unstable

Thus, the rst stage, in which A is transformed to H using the Householder or the Givens method, is numerically stable, while the second stage, in which H is further reduced to C , can be highly unstable. Example.
0 1 2 31 B C H = B 0:0001 1 1 C @ A
0

2 3 x1 = (1 0 0)T x2 = Hx1 = (1 0:0001 0)T

x = Hx = (1:0002 0:0002 0:0002)T 0 1 1 1:0002 1 B C X = B 0 0:0001 0:0002 C @ A 0 0 0:0002
3 2

00 0 1 1 B C X ; AX = C = B 1 0 ;4:9998 C @ A
1

0 1 5 Cond2(X ) = 3:1326 104 :

(Note that the existence of a small subdiagonal entry of H namely h21 , made the transforming matrix X ill-conditioned.) 518

let's point out some remarks of Wilkinson about Frobenius forms of matrices arising in certain applications such as mechanical and electrical systems. 434). we have found the program based on its use surprisingly satisfactory in general for matrices arising from damped mechanical or electrical systems.1 n. (AEP. For example.There are also other equivalent methods for reducing H to C . I ) = are given by n+c n. of the characteristic polynomial det(A . Here Wilkinson has shown that in LeVerrier's method. When this is true methods based on the use of the explicit characteristic polynomial are both fast and accurate. The subdiagonal entries are used as pivots and we have seen before that small pivots can be dangerous. The Newton's sums determining the coe cients ci i = 0 : : : n. see (Wilkinson AEP.(trace(Ak ) + cn. Wilkinson.trace(A) kcn. p. which also shows that small subdiagonal entries can make the method highly unstable. AEP pp. For example. ) + k = 2 : : : n: 1 1 1 + cn." Quotation from Wilkinson AEP p.1 + + c1 + c0 cn. = . 406) describes a pivoting method for transforming an unreduced Hessenberg matrix H to a companion matrix C using Gaussian elimination. LeVerrier's method (Wilkinson. \Although we have made it clear that we regard the use of the Frobenius form as dangerous. in that it may well be catastrophically worse-conditioned than the original matrix. severe cancellation can take place while computing the coe cients from the traces using Newton's sums. 482 519 .k+1trace(A)) For details. 434{435) computes the coe cients of the characteristic polynomial using the traces of the various powers of A. Note that there are other approaches for nding the characteristic polynomial of a matrix.k = . It is common for the corresponding characteristic polynomial to be well-conditioned. p. Having emphasized the danger of using the Frobenius form in the eigenvalue-computations of a matrix. trace(Ak.

6. known as Hyman's method. Let H = (hij ) be an unreduced upper Hessenberg matrix. Thus. this computation can also be highly unstable.. of course. we wish to point out that there is a method for implicitly evaluating the characteristic polynomial of a Hessenberg matrix and its derivative at a certain given point. 196{197).nding procedure (such as Newton's method) to compute a zero of the characteristic polynomial.. 520 . the eigenvalues of A are displayed as soon as the JCF is computed. It then follows from 8.. and hence an eigenvalue of the associated matrix. Recall (Theorem 8. i 's are the eigenvalues. the Jordan-Canonical Form is the one that comes to mind rst.10) that given an n n matrix A. In this connection. AX = diag(J : : : Jk) 1 1 where 0 B B Ji = B B B @ 1 i If Ji is of order i .2 Eigenvalue Computations via Jordan-Canonical Form Let us now discuss the use of some other suitable canonical forms in eigenvalue computations.3 Hyman's Method Before we leave this section. can in turn be used in a root. Set pn( ) = 1. Theorem 8. pp. .6.. 1 C : C A i k 1 + 2 + :::+ = n: The matrix on the right hand side is the Jordan Canonical Form (JCF) of A. Unfortunately. then 0 1 0 C . 8. there exists a nonsingular matrix X such that X . without explicitly computing the coe cients of the characteristic polynomial.. This method. C C .1 that the computed JCF in that case will be inaccurate. the transforming matrix X will very ill-conditioned (see Golub and Van Loan MC 1984.Remarks: The above remarks of Wilkinson clearly support a long tradition by engineers of computing the eigenvalues via the characteristic polynomial..2.6. Whenever A is close to a nondiagonalizable matrix.

However. 429) has shown that Hyman's method has favorable numerical properties. We have illustrated the eigenvalue-sensitivity phenomenon by means of some examples in Chapter 3. known as the Bauer-Fike Theorem.1k. 8. Proof.1( ) through p1 ( ) using the recurrence: +1 +1 = +1 13 20 n X 6 @ hi j pj ( ) . j kX k kX .7. the Jordan Canonical Form of A is a diagonal matrix D. we have min j i . In this section we will see now what speci c role the condition number of the transforming matrix: Cond(X ) = kX k kX . and 1 2 ::: n are the eigenvalues of A. Then for an eigenvalue of A + E . on the eigenvalue sensitivity of a diagonalizable matrix.Compute pn. 8. if all the eigenvalues are needed. 1 0n 1 X @ hij pj ( ) .7. Thus Hyman's method can be combined with Newton's method for nding the zeros of a polynomial to obtain an isolated eigenvalue of a matrix.1k kE k where k k is a subordinate matrix norm. Consider two cases: 521 . 1 : : : 1: pi ( ) = 6 7 hi i 6 7 4 5 +1 Then the characteristic polynomial det(A .7 Eigenvalue Sensitivity In the previous two sections we have cautioned the readers about the danger of computing the eigenvalue via Jordan Canonical Form or the Frobenius form of a matrix. that is. plays in the eigenvalue-sensitivity. p. one should use the QR iteration method to be described later.1 The Bauer-Fike Theorem Theorem 8. The danger was mainly the possibility of the transforming matrix X being ill-conditioned.1 Let A be diagonalizable. pi ( )A 7 6 ji 7 6 7 6 7 i = n . Here is a more speci c result. I ) = h21 h32 hn n. I ) of A is given by det(H . p ( )A : j =1 1 Wilkinson (AEP.

we get 1 k( I .1k kX k kE k: 522 .Case 1: Case 2: = i for some i. Now from (A + E )x = x we have Ex = ( I . EXyk k( I . Then the diagonal entries of the diagonal matrix I . A)x = ( I . 1 .1 to the left ( I . D). )x = X ( I . XDX . D). D are di erent from zero. X . EXy (note that x = Xy ). we have kyk = k( I . k kX .1 min( . k( I . by multiplying the equation by X . 1 i 1 min( . i ) kX . k = max i 1 = min( . the matrix ( I .1k kX . Then from the last equation. we have.1x = y . D) is nonsingular.1Ex or y = ( I . D)X . ) : i i So. D)y = X . D). D). or.1:) Set X . D). X . Since the determinant of a matrix is equal to the product of its eigenvalues.1kkE k kX k: Now for a subordinate norm. x: 1 1 (Note that A = XDX . The theorem is trivially true. kkE k kX k kyk: 1 1 1 1 Dividing both sides by y . i) kX k kX k kE k: i 1 . 6= i for any i. 1 1 Taking a subordinate norm on both sides.

0:0756 C @ A 0 0 .) Example 8. a similar result also holds.Implications of the Theorem The above theorem tells us that if Cond(X ) = kX k.0:2113 0:0957 1 B C A+E = B 0 2:0001 .0:2113 0:0957 1 B C A = B0 2 . Remark: In case A is not diagonalizable.1:0636 C @ A 5 and Van Loan. (For details. AX = diag(1 2 3): 1 The eigenvalues of A are: 1. 2.0:8756 C : @ A 0 0 3:0001 The eigenvalues of A + E are: 1:0001 2:0001 3:001 Cond2(X ) = 1:8294 105 523 01 0 01 B C E = 10. the more ill-conditioned the eigenvector matrix X is.1 0 0 3 0 0 1:2147 X . 209. then an eigenvalue of the perturbed matrix A + E can be signi cantly di erent from an eigenvalue i of A. MC 1984 p. see Golub 0 1 . the more ill-conditioned will be the eigenproblem for A. In general. B 0 1 0 C @ A 4 .7.0:2113 :1705 1 B C X = 10 B 0 1 . kX . 3 0 0 1 0 1:0011 . k 1 1 is large.
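The Bauer-Fike bound is easy to check numerically. The short Python/NumPy script below is a sketch of such a check on a randomly generated matrix (the matrix, the seed, and the perturbation size are our own illustrative choices, not data from the example above): every eigenvalue of A + E should lie within Cond2(X)*||E||2 of some eigenvalue of A.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))            # (almost surely) a diagonalizable matrix
E = 1e-6 * rng.standard_normal((4, 4))     # a small perturbation

lam, X = np.linalg.eig(A)                  # A = X diag(lam) X^{-1}
mu = np.linalg.eigvals(A + E)

bound = np.linalg.cond(X) * np.linalg.norm(E, 2)   # Cond_2(X) * ||E||_2
worst = max(min(abs(lam - m)) for m in mu)         # max over mu of min_i |lam_i - mu|
print(worst <= bound, worst, bound)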

i Example 8.1))T eeik : i 2 i 2 De nition 8.2 01 2 31 B C A = B 0 0:999 1 C @ A 0 0 2 524 . 183). This typically happens when A is close to a matrix having some nonlinear elementary divisors (see Wilkinson. AEP p. However. There are n condition numbers associated with the n eigenvalues of A. this can be done for any diagonalizable matrix. If xi and yi are real. some may be very well-conditioned while others are ill-conditioned.kE k = 10. If this number is large then is an ill-conditioned eigenvalue. Notes: 1. as we have seen from the examples in Chapter 3. 2.1 The number 1 . Similarly. Then the normalized right-hand and left-hand eigenvectors corresponding to an eigenvalue i are given by Xe X .7.1AX = diag( 1 : : : n). k gives an overall assessment of the changes in eigenvalues with 1 respect to changes in the coe cient of the matrix. where s is de ned by si i si = yiT xi is called the condition number of the eigenvalue i . In fact. rather than conditioning of the eigenvalue problem.7. Recall that in Chapter 3 an analysis of the illconditioning of the individual eigenvalues of the slightly perturbed Wilkinson-matrix was given in terms of the numbers si . It is therefore more appropriate to talk about conditioning of the individual eigenvalues. some eigenvalues of A may be more sensitive than the others.7. some eigenvectors may be well-conditioned while others are not. In general.2 Sensitivity of the Individual Eigenvalues The condition number kX k kX . then si is the cosine of the angle between xi and yi . Cond (X ) kE k = 18:2936: 2 2 4 2 8. Let X .1 T xi = kXe ik yi = k((X .

while 3 =2 Indeed. s = yT x = :1925: 2 3 Thus. i si = jyiT xi j = kXejeki X(X XeTje k k . The computed eigenvalues of the perturbed matrix were (to three digits): 0:999 + 0:001i 0:999 . s = yT x = 3:5373 10. when a(3 1) was perturbed to 0.999 of the original matrix) become complex. )T eik 1 2 kX k keik = kX k 2 2 1 1 2 2 2 2 k(X . 0:001i and 2: For yet another nontrivial example.0:7075 :7068)T (0 0 1)T 1 2 3 1 2 3 4 4 s = yT x = 3:5329 10. A Relationship Between s and Cond(X ) i It is easy to see that the condition numbers si and Cond2(X ) with respect to the 2-norm are related. T . k : 2 1 2 525 . the above computations clearly show that is well-conditioned.x x x y y y 1 2 3 1 2 3 = = = = = = 1 (1 0 0)T (1 .0:005 0)T (:9623 :1923 :1925)T (0:0004 :7066 . see exercise #23.000001 and the eigenvalues of the perturbed matrix were computed.) i i 1 2 1 Now and = kXe k k(X . 1 =1 2 = :999 are ill-conditioned.0:7076)T (0 . )T k = kX .1)T e k : i 2 i 2 1 2 kXeik k(X . the rst two eigenvalues of the perturbed matrix (those corresponding to 1 and of the . )T k keik = k(X .

3 01 2 31 B C A = B 0 0:999 1 C @ A 0 0 2 1 = 2:8305 103 s1 1 = 2:8270 103 s2 1 = 5:1940: Cond2(X ) = 6:9603 103 : s 3 Thus. 0:0000i 0:0000 + 0:000i 0:1925 Note the linear dependence of the rst two eigenvectors. 2 For the above matrix A.0:0005 + 0:0005i 0:1923 C : @ A 0:0000 .7.So. :8840i 0:9623 1 B C X = B .0:0005 . the matrix of eigenvectors is: 0 . k = Cond (X ): 2 1 2 2 si 1 Cond2 (X ): Example 8.0:4675 . 0:0005i . The Condition Numbers and Linear Dependence of Eigenvectors Since for a diagonalizable matrix the columns of the matrix X are the eigenvectors of A. 1 kX k kX .0:4675 + :8840i . for each i. Cond2(X ) = 2:5965 103 : 526 . Cond2 (X ) gives us an indication of how linearly independent the eigenvectors are: si 1 < Cond (X ) 2 i = 1 2 3: If Cond (X ) is large it means that the eigenvectors are nearly dependent. si Thus.
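The numbers s_i are easy to compute once the right and left eigenvectors are available. The following Python/NumPy sketch (the function name and the use of numpy.linalg.eig are our own choices) computes 1/s_i for each eigenvalue of the matrix of the example above; the eigenvalues 1 and 0.999 come out with 1/s_i of order 10^3, while the eigenvalue 2 has a modest condition number, in agreement with the values quoted above.

import numpy as np

def eigenvalue_condition_numbers(A):
    # 1/s_i for each eigenvalue, where s_i = |y_i^T x_i| and x_i, y_i are the
    # normalized right and left eigenvectors (A assumed diagonalizable).
    lam, X = np.linalg.eig(A)
    Xinv = np.linalg.inv(X)                # row i of Xinv is a left eigenvector for lam_i
    inv_s = np.empty(len(lam))
    for i in range(len(lam)):
        x = X[:, i] / np.linalg.norm(X[:, i])
        y = Xinv[i, :] / np.linalg.norm(Xinv[i, :])
        inv_s[i] = 1.0 / abs(y @ x)
    return lam, inv_s

A = np.array([[1., 2., 3.],
              [0., 0.999, 1.],
              [0., 0., 2.]])
lam, inv_s = eigenvalue_condition_numbers(A)
for l, c in zip(lam, inv_s):
    print(f"eigenvalue {l: .4f}   1/s = {c:.4e}")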

The Eigenvalue Sensitivity of a Normal Matrix

A matrix $A$ is called normal if $AA^* = A^*A$, where $A^* = (\bar A)^T$. A Hermitian matrix is normal. Normal matrices are diagonalizable. A remarkable property of a normal matrix $A$ is that the transforming matrix $X$ that transforms $A$ to a diagonal matrix can be taken to be unitary; thus $\mathrm{Cond}_2(X) = 1$. Thus an immediate consequence of the Bauer-Fike Theorem is:

Corollary to the Bauer-Fike Theorem (Eigenvalue Sensitivity of a Normal Matrix). Let $A$ be a normal matrix, and let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $A$. Then for an eigenvalue $\mu$ of $A + E$ we have
$$\min_i |\lambda_i - \mu| \le \|E\|_2.$$

In other words, the eigenvalues of a normal matrix are perfectly well-conditioned.

Remark: The normal matrices most commonly found in practical applications are symmetric (or Hermitian, if complex) matrices. Thus, by the Corollary above, the eigenvalues of a symmetric (or Hermitian) matrix are well-conditioned. We will discuss the symmetric eigenvalue problem in more detail in Section 8.11.

8.8 Eigenvector Sensitivity

We shall not go into details in this discussion on the sensitivity of eigenvectors. We will just state a theorem (in somewhat crude form) that will highlight the main differences between eigenvalue and eigenvector sensitivities. For an exact statement and proof, see Watkins (FMC, pp. 332-333).

j )sj jk jk xj + 0(k Ak ) 2 = yj ( A)xj : Implications of the theorem The above theorem tells us that if A is perturbed by a small amount.1 C and B 0 C while those @ A@ A @ A 0 0 1 528 01 0 01 B C A = B 0 :99 0 C @ A .0:01 C. assuming that the xk + xk = xk + where X j 6=k ( k . This is signi cant especially for a Hermitian or a symmetric matrix. Then.) However. An immediate consequence of this theorem is that if there is a multiple eigenvalue or Example 8. then there are some ill-conditioned eigenvectors.1 1 0 :01 1 eigenvalues are 0 B CB C B C well-conditioned.of A be perturbed by k that is k + xk + xk be the eigenvector corresponding to eigenvalues of A are all distinct. we have k Theorem 8.8. condition numbers of all the eigenvalues other than k . since A is symmetric the 0 1 0 . 2.8. then the eigenvectors are well-conditioned. . an eigenvalue near another eigenvalue. the distance of k from the other eigenvalues. B .99. the eigenvectors of A0 are B . If the eigenvalues are well-separated and well-conditioned. because we know that the eigenvalues of a Hermitian matrix are all well-conditioned.1 Let A be a very small perturbation of A and let the eigenvalue k is an eigenvalue of A + A: Let k + k . then the amount of perturbation an eigenvector x experiences is determined by k 1. (No change.1 0 0 2 0 1 :0001 0 1 B C A0 = A + A = B :0001 :99 0 C : @ A 0 0 2 The eigenvalues of A+ A are 1. but the eigenvectors could be ill-conditioned. and 2.

I 0 1:00001 2 B A =A+ A= B 3 4:00001 @ 1 6 7 3 C 5 C A 8:00001 1 529 . for example.1 0 .:5774 @ @ A .:6340 B B = B . then there may be large errors in the computed canonical form and this in turn will introduce large errors in the eigenvalues.9.2:3660 1 C 0 C A 0 3 A = 10.:5774 6 7 8 0 13 . A question therefore arises as to whether we can obtain a similarity reduction of A to a suitable canonical form using a well-conditioned transforming matrix. if it is complex) the condition number (with respect to 2-norm and F-norm) of such a matrix.011 B C of A are B 0 C @ A 0 001 001 B C B C B 1 C and B 0 C.9 The Real Schur Form and QR Iterations In the preceding discussions we have seen that computing eigenvalues of A via reduction of A to the Frobenius or to the Jordan Canonical Form is not numerically e ective.:2113 C A .:5774 1 C :7887 .99.:5774 . if a matrix A is transformed to a matrix B using unitary similarity trans- formation.:2113 :7887 .:9019 0 @ . is an orthogonal matrix (or unitary. If the transforming matrix is ill-conditioned. then a perturbation in A will result in a perturbation in B of the same magnitude. being 1.6:0981 0 5 3 . Indeed. That is.:5774 01 2 31 B B C A = B 3 4 5 C U = B . Note that the eigenvector corresponding to @ A @ A 0 1 3 = 2 has not changed while the other two eigenvectors have changed because of the proximity of the eigenvalues 1 and . A perfectly well-conditioned matrix. 8. if B = U AU and U (A + A)U = B + B then k Bk 2 k Ak : 2 Example 8.

0 13:00001 . De ne U = (u V ) 1 where V is k k . . : 5 A perfect canonical form displaying the eigenvalues is a triangular form (the diagonal entries are the eigenvalues). Next assume the theorem is true for n = k . We will prove the theorem using induction on n. then there exists a unitary matrix U such that U AU = T 1 2 where T is a triangular matrix with the eigenvalues entries. Then U1 is unitary and. 1) (k . We restate this important result below and give a proof. If n = 1 the theorem is trivially true. .6:0981 0 0:00001 B = B .3).2. 0 B B B A = U AU = B B B @ 1 1 1 1 0 .:633974 . Theorem 8.2.9. then we will show that it is also true for n = k: Let u be a normalized eigenvector of A associated with an eigenvalue 1. 1). B = 10.2:3660 1 B C B = U (A + A)U = B .3) If A is an n n matrix.:9019 :00001 0 C @ A .1 (The Schur Triangularization Theorem restatement of Theorem 8. 0 ^ A 1 C C C C C C A ^ where A is (k . 1 and is unitary. In this context we now recall a classical result due to Schur (Theorem 8. ::: n as the diagonal Proof. By our hypothesis there exists unitary matrix V1 of order (k . 1) such that ^ ^ T = V1 (A)V1 530 . I 1 1 5 3 3 k Ak = k Bk = 10. 1.

we are done.. de ning 0 we see that U2 is unitary (because V1 is so). .9. . Theorem 8. known as the Real Schur Form of A (RSF). and 2 1 2 0 1 1 0 0C B B0 C B C C U = B . even for a real matrix A. 0 B B B U AU = B B B @ 1 0 . 531 @ . However. so is U AU: Since the eigenvalues of a triangular matrix appear on the diagonal. . 0 1 C C C C: ^ ^C V AV = T C A 1 2 ^ Since T is triangular. The scalars diagonal entries correspond to real eigenvalues and 2 2 matrices on the diagonal correspond to complex conjugate eigenvalues. C B C 11 12 22 1 2 0 0 Rkk where each Rii is either a scalar or a 2 2 matrix. .is triangular. U and T in the Schur Theorem above can be complex. . Then there exists an n n orthogonal matrix Q such that 0R R R k1 B C B R kC T AQ = R = B 0 R Q B . Since a real matrix can have complex eigenvalues (occurring in complex conjugate pairs).2 (The Real Schur Triangularization Theorem) Let A be an n n real matrix. A . we can choose U to be real orthogonal if T is replaced by a quasi-triangular matrix R. C B C V @ A 2 1 U A U = U U AU U = U AU 2 1 1 2 So.. . Then. C . B..

Proof. The proof is similar to that of Theorem 8.9.1.

Remark: Since the proofs of both theorems are based on prior knowledge of the eigenvalues and eigenvectors of the matrix $A$, they cannot be considered constructive. They do not help us in computing the eigenvalues and eigenvectors.

Definition 8.9.1 The matrix $R$ in Theorem 8.9.2 is known as the Real Schur Form (RSF) of $A$. The $2 \times 2$ matrices on the diagonal are usually referred to as "bumps."

Notes:
1. The columns of $Q$ are called Schur vectors.
2. For each $k$ ($1 \le k \le n$), the first $k$ columns of $Q$ form an orthonormal basis for the invariant subspace corresponding to the first $k$ eigenvalues of $A$.

Note: Since the eigenvalues of a matrix $A$ are the $n$ zeros of its characteristic polynomial, and it is well known (from the classical work of Abel and Galois more than a century ago) that the roots of a general polynomial equation of degree higher than four cannot be found in a finite number of steps, any numerical eigenvalue method for an arbitrary matrix has to be iterative in nature.

We present below a method, known as the QR iteration method, for computing the Real Schur Form of $A$. A properly implemented QR method is widely used nowadays for computing the eigenvalues of an arbitrary matrix. The QR iteration method was proposed in algorithmic form by J. G. Francis (1961), though its roots can be traced to a work of Rutishauser (1958). The method was also independently discovered by the Russian mathematician Kublanovskaya (1961). As the name suggests, the method is based on the QR factorization and is iterative in nature.

8.9.1 The Basic QR Iteration

We first present the basic QR iteration method. Set $A_0 = A$.

Compute now a sequence of matrices $\{A_k\}$ defined by
$$A_0 = A = Q_0 R_0,\quad A_1 = R_0 Q_0 = Q_1 R_1,\quad A_2 = R_1 Q_1 = Q_2 R_2,\ \ldots,$$
and in general
$$A_k = Q_k R_k,\qquad A_{k+1} = R_k Q_k,\qquad k = 0, 1, 2, \ldots.$$
The matrices in the sequence $\{A_k\}$ have a very interesting property: each matrix in the sequence is orthogonally similar to the previous one and is therefore orthogonally similar to the original matrix. It is easy to see this. For example,
$$A_1 = R_0 Q_0 = Q_0^T A_0 Q_0 \quad (\text{since } Q_0^T A_0 = R_0),$$
so $A_1$ is orthogonally similar to $A_0 = A$. Similarly, $A_2 = R_1 Q_1 = Q_1^T A_1 Q_1$, so $A_2$ is orthogonally similar to $A_1$, and therefore to $A$, as the following computation shows:
$$A_2 = Q_1^T A_1 Q_1 = Q_1^T (Q_0^T A_0 Q_0) Q_1 = (Q_0 Q_1)^T A_0 (Q_0 Q_1).$$
Since each matrix is orthogonally similar to the original matrix $A$, it has the same eigenvalues as $A$. Thus, if the sequence $\{A_k\}$ converges to a triangular or quasi-triangular matrix, we will be done. The following result shows that under certain conditions this indeed happens (see Wilkinson, AEP, pp. 518-519).

A Condition for Convergence

Theorem 8.9.3 (A Convergence Theorem for the Basic QR Iteration) Let the eigenvalues $\lambda_1, \ldots, \lambda_n$ be such that $|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n|$, and let the matrix $X$ of the left eigenvectors (that is, $X^{-1}$) be such that its leading principal minors are nonzero. Then $\{A_k\}$ converges to an upper triangular matrix or to the Real Schur Form. In fact, it can be shown that under the above conditions the first column of $A_k$ approaches a multiple of $e_1$, so that for sufficiently large $k$ we get
$$A_k = \begin{pmatrix} \lambda_1 & * \\ 0 & \hat A_k \end{pmatrix}.$$
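The basic iteration is only a few lines of code. The following Python/NumPy sketch is an illustration only (the function name and the fixed iteration count are our own choices); applied to the 2 by 2 matrix of the example that follows, the diagonal of the final iterate approaches the eigenvalues 5.3723 and -0.3723.

import numpy as np

def basic_qr_iteration(A, num_iter=50):
    # A_{k+1} = R_k Q_k, where A_k = Q_k R_k; every iterate is orthogonally
    # similar to A, so the eigenvalues are preserved.
    Ak = np.array(A, dtype=float)
    for _ in range(num_iter):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return Ak

A = np.array([[1., 2.],
              [3., 4.]])
print(np.round(basic_qr_iteration(A), 4))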

1:5665 R1 = : 0 .0:3718 1 .0:3796 ! ! k=3: (Note that we have already made some progress towards obtaining the eigenvalues.0:0006 ! Q3 = .5:3718 .0:9934 .2 ! k=0: A = A=Q R .0:0081 1 .) .:2 ! 1 1 .0:3718 2 2 5:3718 1:0030 A3 = R2Q2 = = Q3 R3 0:0030 .We can apply the QR iteration again to Ak and the process can be continued to see that the sequence converges to an upper triangular matrix.0:0082 ! Q = .3723.0:3821 5:3796 .3:1623 .0:9487 0:3162 ! .0:3723 534 ! .5:3797 0:9593 ! R = : 0 .5:2345 .0:6325 0 0 0 0 0 k=1: k=2: 5:2 1:6 A1 = R0 Q0 = =Q R :6 .:9934 ! . 1 2 A= : 3 4 Eigenvalues of A are: 5.0:0006 1 .0:1146 Q1 = .0:0438 .0:9487 ! Q = . Example 8.0:1146 . j 1j > j 2j.1:0028 ! R3 = : 0 .9.0:9562 A2 = R1Q1 = = Q2R2: .0:3162 .3723.4:4272 R = : 0 .1 . and -0.

0:0000 0:8872 0:8466 0 .0:5651 .2 The Hessenberg QR Iteration The QR iteration method as presented above is not e cient if the matrix A is full and dense. making the method e cient.3 0 0:2190 . We have seen before that the QR factorization of such a matrix A requires 0(n3) ops and thus n iterations of QR method will require 0(n4 ) ops. making the method impractical. then each member of the sequence fAk g is also upper Hessenberg. and Givens rotations are used to factorize Ak into QkRk.0:0000 0 0:0011 0 0 0 535 . Thus.0:6418 1 B C A = A = B .k=4: 5:3723 .0:7149 0:2898 0:6152 1 B C B C Q = B 0:9519 .9.0:0002 . This can be seen as follows: Suppose Ak is an unreduced upper Hessenberg matrix.0:1505 0:2668 C R = B 0:0000 1:0186 0:9714 C : @ A @ A 0:0000 0:8710 0:4913 .0:3063 . Example 8. something simple can be done: Reduce the matrix A to a Hessenberg matrix by orthogonal similarity before starting the QR iterations. Since the QR factorization of a Hessenberg matrix requires only 0(n2 ) ops. 1 ) is also upper Hessenberg. An interesting practical consequence of this is that if A = A0 is initially reduced to an upper Hessenberg matrix and is assumed to be unreduced.9.0:4676 0:8291 1 0 . Fortunately. since Rk is upper triangular.0:6805 0:1226 0:4398 C @ A . Then Qk = J (2 1 )J (3 2 ) J (n n . Ak+1 = Rk Qk is also upper Hessenberg.0:9998 A4 = R3Q3 = : .0:3723 ! 8. the QR iteration method with the initial reduction of A to Hessenberg will be bounded by 0(n3).

Fortunately.k=1: 0:0000 0:0010 0:0006 0 0:4546 .0:4213 0:5303 C : @ A @ A 0:0000 .0:8907 . . MC 1984 p. it is natural to wonder about the rate of convergence of the iterations. ^ i I: 536 .0:0000 0 0:9649 0:2625 .0:2132 1 B C A = R Q = B 0:9697 0:6928 0:7490 C @ A 1 0 0 k=2: 0 1:3792 B A = R Q = B .0:0000 1 B C Q = B .1 i k : i are very close to each other. Let the QR iteration be applied to the matrix ^ H = H .3 Convergence of the QR Iterations and the Shift of Origin Once it is known that the QR iteration method can be made e cient by initial reduction of A to a Hessenberg matrix.0:3752 @ .0:0023 1:0000 0:0000 . Then it can be shown (Golub and Van Loan.0:2625 0:9649 .9.0:0000 C @ A .1 and Let ^ i be an approximation of an eigenvalue i of H . the rate of convergence will be very slow if the moduli of two eigenvalues i i.0:0000 0:0000 1:0000 2 1 1 2 .0:0000 0:0018 0 1:4293 . the rate of convergence can signi cantly be improved by shifting the origin.) i i 8.0:5197 0:5690 1 C . By rate of convergence we mean how fast the subdiagonal entries of the transformed Hessenberg matrix converge to zero.0:1927 0:5299 C A .0:0021 1 0 1:0886 0:9928 0:5702 1 B C B C Q1 = B 0:8907 0:4546 0:0011 C R1 = B 0:0000 .0:0000 0:0018 0 0:4949 0:8265 .0:4509 0:4099 1 B C R = B 0 .0:3224 0:6607 C : @ A 0 0 0:0018 2 (Note that each Q and each A is upper Hessenberg. Let A be initially reduced to an upper Hessenberg matrix H and the QR iteration is performed on H to obtain the sequence Hk .1 of Hk converges to zero at a rate determined by the ratio i Thus. 228) that the subdiagonal entry h(ki).

The eigenvalues of $\hat H$ are $\lambda_1 - \hat\lambda_i,\ \lambda_2 - \hat\lambda_i,\ \ldots,\ \lambda_n - \hat\lambda_i$. Let these eigenvalues be ordered so that
$$|\lambda_1 - \hat\lambda_i| \ge |\lambda_2 - \hat\lambda_i| \ge \cdots \ge |\lambda_n - \hat\lambda_i|.$$
Then the $i$th subdiagonal entry of $\hat H_k$ will converge to zero at a rate determined by the ratio
$$\left|\frac{\lambda_i - \hat\lambda_i}{\lambda_{i-1} - \hat\lambda_i}\right|^k \quad\text{rather than by the ratio}\quad \left|\frac{\lambda_i}{\lambda_{i-1}}\right|^k.$$
The former is usually smaller than the latter. Consider the convincing example from Ortega and Poole (INMD, p. 227): let $\lambda_i = 0.99$, $\lambda_{i-1} = 1.1$, and $\hat\lambda_i = 1$. Then
$$\left|\frac{\lambda_i - \hat\lambda_i}{\lambda_{i-1} - \hat\lambda_i}\right| = 0.1, \quad\text{while}\quad \frac{\lambda_i}{\lambda_{i-1}} = 0.9.$$
This observation tells us that if we apply the QR iteration to the shifted matrix $\hat H$ rather than to the original matrix $H$, the rate of convergence will be faster. Of course, once an eigenvalue of $\hat H$ is found, the corresponding eigenvalue of $H$ can be computed just by adding the shift back.

8.9.4 The Single-Shift QR Iteration

The above argument suggests the following modified QR iteration method, known as the single-shift QR iteration method.

1. Transform $A$ to an upper Hessenberg matrix $H$. Set $H_0 = H$.
2. For $k = 0, 1, 2, \ldots$ do until $h^{(k)}_{n,n-1}$ becomes computationally zero:
$$H_k - h^{(k)}_{nn} I = Q_k R_k,\qquad H_{k+1} = R_k Q_k + h^{(k)}_{nn} I.$$

In the above, $h^{(k)}_{ij}$ denotes the $(i,j)$th entry of $H_k$. Each of the matrices $\{H_k\}$ is upper Hessenberg, and $H_{k+1}$ can overwrite $H_k$. Of course, to implement the single-shift QR iteration we need a reasonable approximation to an eigenvalue to serve as the shift. Experimentally it has been observed that if we let the unshifted QR iteration (the

2:0616 2:7440 1:3720 1 B C R =B 0 1:3117 0:5606 C @ A 0 0 0:2311 1 (1) 33 1 1 1 1 538 .2:8284 .0:2615 1 B C Q = B 0:6860 . where one does not subtract the shifts but implicitly constructs the matrix Hk+1.0:2774 C @ A 0 .1 0 0 C @ A 0 (0) 33 0 0 0 k=1: 0 . we can continue the iterations using the (n n)th element of the current matrix as the next shift.3:000 1 B C R =B 0 0:4142 0:7071 C @ A 0 0 .1:4142 1 B C H = R Q + h I = B .h I = Q R 0 .1:0000 . This is explained in the exercise #29. pp.0:7276 .0:6727 .1:0000 . h I = B1 1 3C = Q R @ A 0 1 0 0 0 0:7071 . 548{549).0:7071 0 2:000 . Example 8. Thus starting with hnn as a shift. Convergence: The rate of convergence of the single-shift QR iteration method is ultimately Remark: An implicit version of the single shift iteration known as the implicit QR iteration.0:5000 0:5000 0 1 0 0 (0) 33 0 0:7071 0:7071 H . then h(s) can be taken as a reasonably good approximation nn (s) to an eigenvalue.0:7071 1 B C Q = B .4 Single-shift QR Iteration 0 can be worked out.9.1:4142 1:5000 0:5000 C : @ A 0 .Basic QR) run for a few iterations (say s). cubic to the eigenvalue of minimum modulus (see Wilkinson AEP.0:3812 0:9245 0 . 01 1 11 B C H = H = B1 2 3C: @ A 0 1 1 k=0: 00 1 11 B C H .0:6342 .

9.1:0488 1 B C R =B 0 0:9740 0:1367 C @ A 0 0 0:0124 0 3:5057 .0:0881 0:7137 2 1 1 (1) 33 k=3: H . are: :7261 3:6511 . if there are any.0:5960 0:1544 C : @ A 0 .0:9955 .0:0010 1:000 0 .0:2732 .0:0905 0:9959 0 .h I = Q R 0 .3:2940 1:3787 . a real matrix can have complex eigenvalues.:0514 C : @ A 0 .0:9620 0:2721 0:0247 1 B C Q = B .0:4244 0:0664 C : @ A 0 0:0000 0:7261 3 (3) 33 3 3 3 3 4 3 3 (3) 33 The iteration is clearly converging towards the eigenvalue . In such a case: i) at some stage of iteration.0:3772:) 8.0:2318 .5 The Double-Shift QR Iteration If the desired eigenvalue is real.1:7472 1:2434 1 B C H = R Q + h I = B 0:1101 .0:0001 1 B C Q = B 0:0953 . the above single-shift QR iteration works well.0:0870 C @ A 0 .7261. we could encounter a trailing 2 2 submatrix on the bottom right-hand corner corresponding to a complex conjugate pair of eigenvalues.1:0613 1:0464 1 B C H = R Q + h I = B 0:8998 . 539 . in four digit arithmetic. However.h I = Q R 0 .2:7924 2:0212 1:2451 1 B C R =B 0 1:1556 0:0675 C @ A 0 0 0:0001 0 3:6983 .0:0010 C @ A 0 .k=2: 0 3:8824 .0:2661 .0:0011 0:7260 2 (2) 33 2 2 2 2 3 2 2 (2) 33 H .0:9955 .1:2459 1 B C H = R Q + h I = B .0:0953 . (The eigenvalues of H .0:9580 .2:1221 .

2 .i 0 @ A 1 0 0 1 0 H . which is real.9.1 0 1 0 B C Q = B 0 . 1:4142i 1 B C C R =B 0 0 :5434 @ A 1 0 :2717i :9624 0 1 1:7453 .1:4142 + 1:4142i 1:4142 . Then one iteration-step of the double-shift QR iteration is: Hs .:5230 :8523i 2 1 1 2 0 0 1:9248 540 .:8523i :5230 @ A 0 .:2717i C @ A 1 0 0 i 0 .1 + i .1:4142 C @ A 0 0 :7071i 0 :7071 0 0 1 1:4142 .1 .1 0 k = i k = .:9124 . i . :9717i 1 B C C: H = R Q + k I = B0 . k I = Qs Rs +1 2 +1 +1 Hs = Rs Qs + k I: +2 +1 +1 2 Example 8. :9717i 1:7453 . 2 matrix as shift parameters yielding the One Iteration-step of the Double-Shift QR (Complex) Let the eigenvalues of the 2 2 bottom right hand corner of the Hessenberg matrix Hs be k1 and k2 = k1 .ii) the (n n)th entry of the trailing 2 2 matrix. 1:4142i .1 0 1 0 B C Q = B 0 .:7071 .2 1 B C R = B 0 1:4142i .0:7071i C @ A 0 0 .k I = Q R : 1 2 1 1 0 .i: 0 1 2 H . kiI = Q R : 0 0 0 0 . will not be a good approximation.1:4142 + 1:4142i 1 B C C H = R Q + k I = B0 . k I = QsRs Hs = Rs Qs + k I 1 +1 1 Hs . iii) it is natural to use the eigenvalues of that 2 double-shift QR iteration.5 01 2 21 B C H = H = B0 0 1C: @ A 0 .

(k + k )Hs + k k I: 2 1 2 1 2 1 2 Since k2 = k1 the matrix N is real.:8523i :5230 ! Note that the eigenvalues of the 2 2 bottom right hand corner matrix . k I ) = Hs . Finally. Hs+2 is also real. with a little manipulation complex arithmetic can be avoided. From the above discussions we conclude that it is possible to obtain Hs+2 directly from Hs through real orthogonal transformations. the matrix Qs Qs+1 can be chosen to be real..i and i. k )I ]Qs + k I Qs Qs HsQs Qs (Qs Qs ) Hs (QsQs ): +1 +1 2 +1 +1 +1 +1 +1 2 +1 2 2 1 +1 2 1 1 2 +1 2 +1 +1 +1 Since QsQs+1 and Hs are real. k I )QsRs = Qs (Hs . Hs +2 = = = = = = Rs Qs + k I Qs (Hs . 541 . k I )QsRs = Qs Qs (Hs .i. 2 Avoiding Complex Arithmetic in Double-Shift QR Iteration Since k1 and k2 are complex. even though the starting matrix Hs is real. k I )(Hs . k I )Qs + (k . k I )(Hs . k )I )Qs + k I Qs Qs (Hs . k I )Qs + k I Qs (RsQs + (k .:5230 :8523i are . we show that Hs+2 is orthogonally similar to Hs through this real transforming matrix Qs Qs+1. However. Next. s s We will show that the matrix H +2 is orthogonally similar to H through a real transforming matrix. N = (Hs . k I ) = (Hs . the above double-shift QR iteration step will require complex arithmetic for implementation. We will discuss this aspect now. and can be formed directly from H without computing H +1. Thus the eigenvalues of H are 1 i and . we show that (QsQs+1 )(Rs+1Rs) is the QR factorization of N . k I )Rs = Qs Qs Rs Rs: 2 2 1 2 +1 2 +1 +1 Since N is real and (Qs Qs+1)(Rs+1Rs) is the QR factorization of N . Consider the matrix s s N = (Hs .

hn.1hnn .8 5 1 B C N = H . we note that k1 and k2 need not be computed explicitly.1 n This allows us to write the one-step of double-shift QR iteration in real arithmetic as follows: One Step of Double Shift QR Iteration (Real Arithmetic) 1. 3.2 0 0 2 t=2 d = 2: 542 . k I ) = Hs . Example 8. Let ! be the 2 2 right hand corner matrix of the current matrix Hs: Then hn.9.1hn. hnn 1 1 1 1 t = k + k = sum of the eigenvalues = trace = hn.1 2 3 C : @ A . Form Hs+2 = QT Hs Q. n hn n. k I )(Hs .Eigenvalues k and k need not be computed explicitly 1 2 Though computing the eigenvalues of a 2 2 matrix is almost a trivial job.1 + hnn is real is real: d = k k = product of the eigenvalues = determinant =hn. n.6 01 2 31 B C H = H = B1 0 1C @ A 0 .2 2 0 0 3 . Find the QR factorization of N : N = QR. tH + dI = B . We will call the above computation Explicit Double Shift QR iteration for reasons to be stated in the next section. 2. Form the matrix N = Hs2 . hn n. tHs + dI . (k + k )Hs + k k I 1 2 2 1 2 1 2 all we need to compute is the trace and the determinant of the 2 2 matrix. To form the matrix N = (Hs . 1 2 1 n. 1 2 1 n.

1:1867 3:0455 . Compute the 1st column n1 of the matrix N .Find the QR Factorization of N : 0 .:5470 . They can be written down explicitly.6 Implicit QR Iteration After all this. a little trick again allows us to implement the step in 0(n2) ops.2 P T P T )Hs0 (P P 2 1 1 2 Pn.0:8289 C : @ A 2 0 0:0000 1:8437 0:8116 8.9. First. that the above double-shift (explicit) QR iteration is not practical. we note. Transform Hs0 to an upper Hessenberg matrix by orthogonal similarity using Householder matrices: T (Pn.:8571 1:1007 2:5740 1 B C H = QT H Q = B . 2.3) we can show that the matrix H +2 of the explicit QR and H 0+2 of the implicit QR are both unreduced upper Hessenberg and are essentially the same matrix.:2408 1 B C Q = B :2673 :0322 . ) = Hs0 : 2 +2 stand how these four steps can be implemented in 0(n2) ops.0:8365 0:1204 0 . since Hs is Hessenberg. The above four steps constitute one iteration of the double-shift implicit QR. k1 I )(Hs . Form Hs0 = P0T HsP0 . Fortunately.4. we must give a close look into the structures of the above computations. 543 Using the implicit Q-theorem (Theorem 5.0:9631 C @ A :5345 . computing the 1st column of N = (Hs . It contains only three nonzero elements. with utter disappointment. 4. Find a Householder matrix P0 such that P0n1 is a multiple of e1 . 3. To unders s . The reason for this is that forming the matrix N itself in step 1 requires 0(n3) ops. One Iteration of the Double-Shift Implicit QR 1.:8018 . k2 I ) is almost trivial.

when n = 6. 0 ! 2 : 544 . Each Pk k = 1 2 : : : n . For example. Second. The last Householder matrix Pn.3 ^ where Pk is a 3 3 Householder matrix.2 has the form B0 B B B B 0 = P Hs P = B 0 Hs B B0 B B B0 0 0 @ 0 0 C C C C C C: C C C C C A Pn.Let 0n Bn B B Bn B Ne = n = B B B0 B . B . = 2 In. the Householder matrix P0 has the form ! ^ P0 0 P0 = 0 In.k. 3 has the form 0I 1 0 k B ^ C C Pk = B Pk @ A 0 In. th11 + d + h12h21 11 = h21 (h11 + h22 . because only three elements of n1 are nonzero.2 amount to chasing these bulges systematically. It is a Hessenberg matrix with a bulge. t) = h21 h32: Here hij refers to the (ij )th entry of Hs . Because of this form of P0 . @ 1 1 11 21 31 then 0 1 C C C C C C C C C C C A n n n 11 21 31 = h2 . 0 2 ^ Pn. and Hs being Hessenberg. the matrix Hs0 = P0 Hs P0 is not a full matrix.3 ^ where P0 is a 3 3 Householder matrix. we have 0 1 0 0 0 0 A bulge will be created at each step of the reduction of Hs to Hessenberg form and the constructions of Householder matrices P1 through Pn. B .

1 n: 2. 1 1 n. ^ (a) Find a 3 3 Householder matrix Pk such that 0x1 0 1 ^B C B C Pk B y C = B 0 C : @ A @ A z 0 0 Form 0I Bk P Pk = B @ ^k 0 545 In. produces orthogonal matrices P0 P1 : : : Pn. t) z = h =h h : 11 21 2 11 21 11 12 11 12 32 21 32 21 3. pp. d = hn. The algorithm overwrites H with QT HQ. 1.1hnn .Taking into consideration the above structures of computations.1 + hnn n.2 such that the nal matrix is upper Hessenberg. th + d + h h y = n = h ( h + h . Compute the Householder matrices P0P1 : : :Pn. 0 1 2 is an unreduced upper Hessenberg matrix.2 such that QT HQ where Q = P P : : : Pn. Algorithm 8.1 One Iteration-Step of the Double-Shift Implicit QR Let H be an n n unreduced upper Hessenberg matrix.9. Compute the shifts: t = hn. which constitutes one step of the double-shift implicit QR iteration. For k = 0 1 2 : : : n . Then the following algorithm. it can be shown that one step of the implicit QR iteration requires only o(n ) ops. see the book by Stewart (IMC pp.k. 375{378) and the recent book by Watkins (FMC. Compute the 1st three nonzero entries of the 1st column of N = H 2 . hn n. tH + dI : x = n = h . 2 For details of this 0(n2) computations of one iteration of the double-shifted implicit QR.1hn. 3 do. 277{278). 1 C C: A 3 .

0 ! 2 : 2 transferring matrix Q is needed and accumulated.1 z = n = .2:6248 .0:9733 1 B C H = P T HP = B :0581 0:8666 1:9505 C : @ A 1:1852 .Form PkT HPk and overwrite with H : H PkT HPk : Update x y and z : x y z hk hk hk +2 +3 +4 k+1 k+1 k+1 (if k < n .2 of order 2 such that ^ Pn. Flop-count. 234).2 0 . 0 2 ! ! Pn. If the 01 2 31 B C Example 8.:8571 .0:0793 C A @ :5345 . then another 6n2 ops will be needed (see Golub and Van Loan MC 1983. Form 2 x = : y 0 In.1 C @ A .2 11 21 31 3. k = 0: 031 B C uuT P = I .9. p.7 1. 2uT u where u = B . t = 2 d = 2 H = B 1 0 1 C @ A 0 . x = n = 3 y = n = . 3): ^ (b) Find a Householder matrix Pn.2 2 2. = 2 ^ Pn.:0793 0:8414 0 . One step of the double-shift implicit QR iteration takes about 6n ops.:8018 :2673 :5345 1 C B P = B :2673 :9604 .0:7221 2:9906 0 0 0 0 Update x and y : 546 .

5 by the explicit QR. Note that the eigenvalues of the de ated submatrix are also the eigenvalues of the original matrix. Hk ) = det( I . the matrix has the form: where B 0 is the 2 2 trailing submatrix or is a 1 1 matrix. A0) det( I . Once a real or a pair of complex conjugate eigenvalues is computed. can be deleted and computation of the other eigenvalues can be continued with the submatrix. This process is also known as de ation. one or two (and sometime more) subdiagonal entries from the bottom of the Hessenberg matrix converge to zero. suppose immediately before de ation.1:1866 ^ P1 x = y 0 1 0 C . 2.7 Obtaining the Real Schur Form A 1.0:9988 .9. For. Iterate with the implicit double step QR method. (Example 8.0:8571 B H = P1T HP1 = B .0:9988 C A 0:0490 1 1:1008 2:5739 C 3:0456 .0:9988 0:0490 01 0 B 0 0:0490 P1 = B @ 0 . This then will give us a real or pair of complex conjugate eigenvalues. Typically.9. Then the characteristic equation of Hk : det( I .1:1867 @ 0 ! .x = h = :0581 y = h = 1:1852 21 31 Find P1 : ^ P1 = Note that the matrix H obtained by the implicit QR is the same as H2 obtained earlier in section 8. or the last two rows and the last two columns in the second case. B 0 ): 547 A0 C 0 Hk = 0 B0 ! . Transform the matrix A to Hessenberg form.0:8290 C : A 1:8436 0:8116 ! ! 8. the last row and the last column in the rst case. after two to three steps of the doubly-shift implicit QR iteration.0:9988 0 .9.6) 0:0490 .

2 2 3 The RSF is The eigenvalues of the 2 2 right-hand lower corner submatrix are 2:0832 1:5874i.3543 0.9. see the book \The Symmetric Eigenvalue Problem" by B. For a good discussion on this matter.1 i. His book \The Symmetric Eigenvalue Problem" is an authoritative book in this area.2:0531 7 6 6 0 6 7 1:2384 1:6659 7 : 4 5 0 . Beresford Parlett is a professor of mathematics at the University of California at Berkeley. It seems that there are no clear-cut conventions here however.Thus. N. Parlett (1980).1:1867 21 0.0129 0. He has made some outstanding contributions in the area of numerical matrix eigenvalue problem. we have given a commonly used criterion above." 2 3 .1:1663 .8 Find the Real Schur Form of 2 61 H=6 1 6 4 2 37 7 0 1 7: 5 0 .1:3326 .1 to be zero if jhi i. When to Accept a Subdiagonal Entry as Zero A major decision that we have to make during the iteration procedure is when to accept a subdiagonal entry as zero so that the matrix can be de ated. especially for large and sparse problems. Accept a subdiagonal entry hi i. j 1 (jhii j + jhi.0000 548 .1j): Example 8. the eigenvalues of Hk are the eigenvalues of A0 together with those of B 0 .1:9409 2:9279 Iteration 1 2 3 4 h . But Hk is orthogonally similar to the original matrix A and therefore has the same eigenvalues as A.

7632 .0:3822 0:4526 .0:1069 . Note that no round-o error is involved in this computation and it takes only 0(n2) ops. The RSF is h h .0494 7 6 7 h=6 6 0 .0:0763 2 3 0:1922 0:5792 5 The eigenvalues of 4 are 0:1082 0:4681i.8394 7 6 6 0.1922 0.0:4089 7 : 7 6 7 4 5 0 0 0 . op-count for this method.0:0011 5 0.0:0252 Iteration h21 1 0. if they vary widely.0001 .0:1996 0.1AD.3860 2 . The EISPACK routine BALANC that balances the entries of the matrix A is applied to A before the other routines are used to compute the eigenvalues. it is hard to give an exact . before starting the QR iterations.0:3773 0. Since QR iteration method is an iterative method.Example 8. .0:3905 0. where the diagonal matrix D is chosen so that a norm of each row is approximately equal to the norm of the corresponding column. The balancing is equivalent to transforming the matrix A to D. preprocessing the matrix by balancing improves the accuracy of the QR iteration method.0:0641 7 4 5 0 0 .5792 0. it is advisable to balance the entries of the original matrix A. empirical observations have established that it takes about two 549 Flop-count.0:3905 0 32 43 2 3 1.0243 .0:0084 .0:3590 0 .0:0672 3 0.0001 0. However.0:5084 .0:6391 7 6 6 .0:9615 0:9032 .0001 . In general.0089 4 .9 Find the Real Schur Form of 2 3 0:2190 .0:3905 0:0243 Balancing As in the process of solving linear system problems.0:4571 0:8804 7 6 7 7 h=6 6 0 6 7 .9.0:0756 0:6787 .0:3673 0 .4095 0.

then the cost will be about 15n3 ops. it will require about 8n3 ops to compute all the eigenvalues (Golub and Van Loan. forms a basis for the invariant subspace associated with the eigenvalues of R11. where kE kF (n) kAkF (n) is a slowly growing function of n. Thus. The QR iteration method is quite stable. since Ax = x for each eigenvalue . Then S will be called an invariant subspace (with respect to premultiplication by A) if x 2 S implies that Ax 2 S . one needs to compute the orthonormal bases of an invariant subspace associated with a selected 550 . each eigenvector is an invariant subspace of dimension 1 associated with the corresponding eigenvalue. The computed orthogonal matrix Q is also orthogonal. such as in the solution of algebraic Riccati equations (see Laub (1979)). Basis of an Invariant Subspace from RSF Let 0 22 and let us assume that R11 and R22 do not have eigenvalues in common. Round-o Property algorithm shows that the computed real Schur form (RSF) is orthogonally similar to a nearby matrix A + E . Then the rst p columns of Q. Thus. The Real Schur Form of A displays information on the invariant subspaces. In many applications. If the transforming matrix Q and the nal quasitriangular matrix T are also needed. MC 1984 p. 235).9.9. An analysis of the round-o property of the 8.QR iterations per eigenvalue.8 The Real Schur Form and Invariant Subspaces De nition 8. where p is the order QT AQ = R = R 11 R R 12 ! of R11.2 Let S be a subspace of the complex plane C n.

Unfortunately. The process is quite inexpensive. 241).9. one wonders if some extra work can be done to bring them into that order.1 J (1 2 ) = .number of eigenvalues.0:2361 0:0000 ! QT AQ1 = 1 0:0000 4:2361 ! 0 . where k is the number of interchanges required to achieve the desired order.0:8507 0:5257 ! 0:00 T AQ = 4:2361 Q : 0:00 . 1 Then Q = Q1 J (1 2 )T is such that QT AQ = 2 Example 8. if the eigenvalues are not in a desired order. It requires only k(8n) ops.4:4722 0 ! T = .:5257 :8507 . For details see (Golub and Van Loan. 551 ! . Let ! r12 1 T AQ = Q1 1 1 6= 2: 0 2 If 1 and 2 are not in right order. Stewart (1976) has provided useful Fortran routines for such an ordering of the eigenvalues.0:8507 Q = Q1J (1 2 ) . Let A be 2 2. MC 1984 p. the transformed Real Schur Form obtained by QR iteration may not give the eigenvalues in some desired order. all we need to do to reverse the order is to form a Givens rotation J (1 2 ) such that ! ! r12 J (1 2 ) = : 0 2 . That this can indeed be done is seen from the following simple discussion.1 0 ! 4:4722 ! 0 J (1 2 ) = . Thus.10 0 r 12 1 ! : 1 2 A= 2 3 :8507 :5257 ! Q1 = .0:5257 .0:2361 The above simple process can be easily extended to achieve any desired ordering of the eigenvalues in the Real Schur Form.
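A minimal sketch of this swapping step in Python/NumPy is given below (the function name and the 2 by 2 test matrix are our own illustrative choices; the two eigenvalues are assumed distinct). Given a 2 by 2 upper triangular block T = [[l1, r12], [0, l2]], the Givens rotation that annihilates the second component of the vector (r12, l2 - l1) produces an orthogonal G with G T G^T upper triangular and the eigenvalues interchanged.

import numpy as np

def swap_2x2_schur(T):
    # Return an orthogonal G such that G @ T @ G.T has the two diagonal
    # entries of the 2x2 upper triangular T in reversed order.
    l1, l2 = T[0, 0], T[1, 1]
    v = np.array([T[0, 1], l2 - l1])       # eigenvector of T for the eigenvalue l2
    c, s = v / np.linalg.norm(v)           # Givens rotation sending v to a multiple of e1
    return np.array([[c, s], [-s, c]])

T = np.array([[1., 5.],
              [0., 4.]])
G = swap_2x2_schur(T)
print(np.round(G @ T @ G.T, 4))            # [[4., 5.], [0., 1.]]: order reversed

If Q1 is such that Q1^T A Q1 = T, then Q = Q1 G^T gives Q^T A Q = G T G^T, so the same rotation also updates the Schur vectors.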

8.10 Computing the Eigenvectors

8.10.1 The Hessenberg-Inverse Iteration

As soon as an eigenvalue $\lambda$ is computed by the QR iteration, we can invoke inverse iteration (Algorithm 8.5.2) to compute the corresponding eigenvector. However, since $A$ is initially reduced to a Hessenberg matrix $H$ for the QR iteration, it is natural to take advantage of the structure of the Hessenberg matrix $H$ in the solution of the linear systems that need to be solved in the process of inverse iteration. Thus the Hessenberg-inverse iteration can be stated as follows:

Algorithm 8.10.1 The Hessenberg-Inverse Iteration

1. Reduce the matrix $A$ to an upper Hessenberg matrix $H$: $P^T A P = H$.
2. Compute an eigenvalue $\lambda$, whose eigenvector $x$ is sought, using the implicit QR iteration.
3. Apply the inverse iteration: for $k = 1, 2, \ldots$ do
   Solve $(H - \lambda I) z^{(k)} = y^{(k-1)}$;
   set $y^{(k)} = z^{(k)} / \max(z^{(k)})$.
   Stop if $\|(y^{(k)} - y^{(k-1)})/y^{(k)}\| < \epsilon$ or if $k > N$, the maximum number of iterations.
4. Recover the eigenvector $x$: $x = P y^{(k)}$, where $y^{(k)}$ is the approximation of the eigenvector $y$ obtained from Step 3.

Note: From $Ax = \lambda x$ we have $P^T A P (P^T x) = \lambda (P^T x)$, that is, $Hy = \lambda y$ with $y = P^T x$; thus $x = Py$.
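A rough Python/SciPy sketch of this algorithm follows (the function name, the use of scipy.linalg.hessenberg, the small shift perturbation, and the test matrix are our own choices; for simplicity a general LU solver is used where a specialized O(n^2) Hessenberg solver would be used in practice).

import numpy as np
from scipy.linalg import hessenberg, lu_factor, lu_solve

def hessenberg_inverse_iteration(A, lam, num_iter=20):
    # Step 1: P.T @ A @ P = H (SciPy returns A = P @ H @ P.T).
    H, P = hessenberg(A, calc_q=True)
    n = A.shape[0]
    # Step 3: inverse iteration with the (nearly singular) matrix H - lam*I;
    # a tiny perturbation of lam guards against exact singularity.
    lu = lu_factor(H - (lam + 1e-10) * np.eye(n))
    y = np.ones(n)
    for _ in range(num_iter):
        z = lu_solve(lu, y)
        y = z / np.abs(z).max()
    # Step 4: recover the eigenvector of A.
    x = P @ y
    return x / np.linalg.norm(x)

A = np.array([[1., 1., 1.],
              [2., 3., 4.],
              [0., 1., 1.]])
lam = np.linalg.eigvals(A).real.max()       # one computed eigenvalue of A
x = hessenberg_inverse_iteration(A, lam)
print(np.linalg.norm(A @ x - lam * x))      # small residual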

tkk I )y = 0: Write T . Solve the homogeneous triangular system Ty = y . 1 2 ! (T .8. 2. tkkI = T where T11 is k k. we have Q AQQ x = Q x: Ty = y: Thus. Compute x = Qy: We now show how the solution of Ty = y can be simpli ed assuming that T is triangular and that all the eigenvalues of A (that is.2 Calculating the Eigenvectors from the Real Schur Form The eigenvectors can also be calculated directly from the Real Schur Form without invoking the inverse iteration. Then gives that is. writing Q x = y . the diagonal entries of T ) are distinct.10. The process is described as follows: Let A be transformed to the RSF T by the implicit QR iteration: Q AQ = T: Then Ax = x can be written as That is. Partition y accordingly: T 0 T 11 12 22 ! y y= y where y1 has k-entries. an eigenvector x corresponding to an eigenvalue can be computed as follows: 1. tkk I )y = 0 T T 0 T 11 12 12 22 ! y! y 1 2 =0 2 T y + T y = 0 T y = 0: 11 1 2 22 553 . after A has been transformed to the RSF T . Let = tkk : That is we are trying to nd a y such that (T .

) 2.sz . the homogeneous system T y =0 22 2 has only the trivial solution y2 = 0. tkk I ) = T ^ T 11 T ! : 0 T 11 12 22 T = 11 11 1 0 0 s ! : ^ ^ T y = . Thus T11y1 = 0 reduces to ^ ^ T y + sz = 0 11 1 y ^ y = z ^ z can be chosen to be any nonzero number.2 Computing an Eigenvector Directly from the RSF 1. So.10. Solve by back substitution: choosing z as a nonzero number. because its diagonal entries are tjj . T11 is singular therefore. y1 6= 0: Again. tkk j = k + 1 : : : n which are di erent from zero. 1) (k . Since T is upper triangular. 554 (T . Algorithm 8. Transform A to RSF by the implicit QR iteration: Q AQ = T (Assume that T is triangular and that the diagonal entries of T are di erent. 1) and nonsingular. Partition 5. From above. note that T11 has the form 0 1 ^ T11 s A T11 = @ 0 0 ^ where T11 is (k .Now. Select the eigenvalue = tkk whose eigenvector x is to be computed: 3. y can be computed by ^ 1 1 11 1 where ! back substitution. we have T y = 0: 11 1 Since the ith diagonal entry of T11 is zero. T22 is nonsingular. Partition 4.

Compute x = Qy: 1 Example 8.:4747 :2778 .:2385 y = = : z 1 1 ! ! 1 0 ! B .:8696 .:9606 .(T11).:8352 C : @ A . Then y ^ .0:9283 1 B C x = Qy = B 0:3910 C : @ A 1 2 0:2058 555 .:1358 .10.:2423 1 B C Q = B .1sz = .:0016 :4937 4 6 7 Suppose we want to compute the eigenvector corresponding to = t22 = :3571: Then ! 10:5428 2:5145 T11 = : 0 0 = s = Choose z = y1 = ^ 11 1 ^ T 10:5428 2:5145 1 ^ .0:2569 0 .1 01 1 11 B C A = B2 3 4C @ A 0 10:8998 2:5145 2:7563 1 B C T = B 0 0:3571 :2829 C @ A 0 0 .y ^ 6. Form y = : z ! y 7.:2385 C y y = =B 1 C A @ y 0 0 .:2385 Choose y2 = 1. Form y = : 1 1 ! 0 8.
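The back-substitution idea of this section can be sketched in a few lines of Python/SciPy (the helper name, the use of scipy.linalg.schur and solve_triangular, and the test matrix are our own choices; the sketch assumes the Schur form is genuinely triangular with distinct diagonal entries, as in the discussion above).

import numpy as np
from scipy.linalg import schur, solve_triangular

def eigvec_from_schur(A, k):
    # Eigenvector of A for the eigenvalue T[k, k] of the Schur form Q.T A Q = T.
    T, Q = schur(A)
    y = np.zeros(A.shape[0])
    y[k] = 1.0                              # the free component z
    if k > 0:
        # Back substitution: (T11 - t_kk I) y1 = -s*z with s = T[:k, k].
        y[:k] = solve_triangular(T[:k, :k] - T[k, k] * np.eye(k), -T[:k, k])
    x = Q @ y
    return x / np.linalg.norm(x), T[k, k]

A = np.array([[1., 1., 1.],
              [2., 3., 4.],
              [0., 2., 5.]])                # has three distinct real eigenvalues
x, lam = eigvec_from_schur(A, 1)
print(np.linalg.norm(A @ x - lam * x))      # small residual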

Since a real symmetric matrix cannot have a complex eigenvalue.2. be applied to nd the eigenvalues of a symmetric matrix.11 The Symmetric Eigenvalue Problem The QR iteration can. that is.2) we have QT AQ = R where R is in Real Schur Form. each 2 2 matrix on the diagonal corresponds to a pair of complex conjugate eigenvalues. 556 .0:3315 1 B C Ax = B 0:1396 C @ A 0 . One such method is based on the Sturm sequence property of the characteristic polynomial of the matrix. The Real Schur form of a real symmetric matrix is a diagonal matrix. We will discuss below the symmetric QR iteration brie y. However. Indeed. of course. Some Special Properties A. it follows that R can not have a 2 2 matrix on the diagonal therefore R is a diagonal matrix.0:3315 1 B C x = B 0:1396 C : @ A 0:0734 0:0734 and 2 8. in the symmetric case the method simpli es to a large extent.5). B. >From the Real Schur Triangularization Theorem (Theorem 8. The eigenvalues of a real symmetric matrix are real.9. since the eigenvalues and eigenvectors of a symmetric matrix enjoy certain special remarkable properties over those of the nonsymmetric case.It is easily veri ed that 0 . Now. Proof. R is a triangular matrix with each diagonal entry as either a scalar or a 2 2 matrix. The eigenvectors associated with the distinct eigenvalues are orthogonal (Theorem 8. General Perturbation Property. C. some special methods exploiting these properties can be developed for the symmetric problem.

2336. respectively. Note that kE k2 = 10. kE k2 i + kE k2 i = 1 2 : : : n: i This result is remarkable. . 0.1 01 2 31 B C A = B2 3 4C @ A 3 4 6 The eigenvalues of A are . Perturbation Result on the Eigenvalues of a Symmetric Matrix The eigenvalues of a real symmetric matrix are wellconditioned.) (Section 8.Let A be a n n real symmetric matrix.7. D.0:4203.1) that 0 i .2337. In fact. 557 E = 10.4.11. Rank-one Perturbation Property. (See also the corollary of the Bauer-Fike Theorem given earlier. Let A0 = A + E where E is a real symmetric 0 0 be the perturbation of the matrix A. small changes in the elements of A cause only small changes in the eigenvalues of A. that is.2) Example 8.1868. In this section we state a theorem that shows how are the eigenvalues shifted if E is a rank-one perturbation. and let 1 2 n and 0 1 2 n 0 . The eigenvalues of A + E are .7. and 10. 10. since kE k = maxflargest and smallest of the eigenvalues of E g 2 then the eigenvalues of the perturbed matrix A0 cannot di er from the eigenvalues of the original matrix A by more than the largest eigenvalue of the perturbated matrix E . The result plays an important role in the divide and conquer algorithm (Dongarra and Sorensen (1987)) for the symmetric eigenvalue problem.0:4203. Then it follows from the Bauer-Fike Theorem (Theorem eigenvalues of A and A 8.1867. 4 I : 3 3 .

.Eigenvalues of a Rank-One Perturbed Matrix Theorem 8. pp. if all the eigenvalues are required explicitly. the symmetric matrix A is transformed to a symmetric tridiagonal matrix T using Householder's method described earlier: 0 B B B B B T =T =B PAP B B B B B @ 1 1 1 0 .11.. The method is particularly useful if eigenvalues are required in an interval.1 Suppose B = A + bbT .1] i] n be the eigenvalues of A i = 2 ::: n if 0 i = 1 ::: n . 2 2 .. Example 8.11.1 The Sturm Sequence and the Bisection Method In this section we describe a method to nd the eigenvalues of a symmetric matrix.. where A is an n n symmetric matrix. Then 0 0 i2 i2 i i+1 i. However.. It is easily veri ed that 2 < 1 2 8. .0:5157 2 = 0:1709 1 = 11:3443.1 n 1 C C C C C C: C C C C C A 558 ...2 01 2 31 B C A = B2 4 5C @ A = . See Wilkinson APE. in practice..1 b = (1 2 3)T : 3 5 6 The eigenvalues of B are: 03 = ... .. . First. the symmetric QR iteration method is preferred over the method of this section for matrices of moderate sizes. it can be used to nd all eigenvalues. n.1 n. 1 if <0 Proof. n.2 0 .1 n. In principle. 97-98.11. is a scalar and b is an n-vector.3:3028 02 = 0 01 = 0:3023. Let 1 2 ::: and 01 : : : 0n be the eigenvalues of B .. The eigenvalues of A are: 0 < 1 and 3 < 0 < 2: 3 = .

1 i. Thus pk ( ) is positive if is negative with large magnitude.Let pi( ) denote the characteristic polynomial of the i i principal submatrix of T .1pi. ( ) 1 k k ) ( ) ( k k ( ) 1): pk ( ) is positive in the rst interval and takes alternative signs in consecutive intervals. 559 . then T is a p ( ) = 1 and p ( ) = 0 1 block diagonal and its eigenproblem can thus be reduced to that of its submatrices. the following interlacing property is very important. we may assume that i 6= 0 i = 1 2 : : : n .1 ( ) 1 k) ( ( ) 1 k ( ) 2 k) ::: ( k k.2( 2 ) i = 2 3 ::: n 1 with . 438. Theorem 8. For unreduced symmetric tridiagonal matrix T .1)k k + . It is clear that pk ( ) = (. Then these polynomials satisfy a three term recursion: pi( ) = ( i . ( ) . The interlacing property and the sign changes of pk ( ) can be illustrated in the following gure.) We shall show that this interlacing property leads to an interesting result on the number of eigenvalues of T. )pi. Then ( +1) 1 ( ) i k the ith smallest eigenvalue of its k k principle sub( +1) k < ( ) 1 k < ( +1) 2 k < < i k < ( ) i k < ( +1) +1 i k < < k k ( +1) < k k ( ) < k k ( +1) +1 : (See Golub and Van Loan. MC 1989 p. If a subdiagonal entry of T is zero. : Without loss of generality.11.2 (Interlacing property) Let T be an unreduced symmetric tridiagonal matrix and matrix. 1: Recall that the matrix T with this property is called unreduced. The zeros (k) < (k) < < (k) of pk separate the real line into k + 1 intervals 1 2 k (.

as well as one eigenvalue of each k k submatrix. a sign agreement between p0( ) and p1 ( ) \pushes" property. it \pushes" one eigenvalue (n) of A (or T ). Suppose we have counted the sign agreements between consecutive two terms of p0( ) p1( ) : : : pk ( ). this implies to the right of .+ + + p (λ) 5 Let be a real number.p (λ) 0 1 + + + + + + + + + + + + + + + + + - + + + + + + + + + + + + - + + + - + + + + + - + + + - p (λ) p (λ) 2 p (λ) 3 p (λ) 4 . By interlacing < (1) 1 < (2) 2 < < n n : ( ) Thus. n to the right of . There are two cases for the signs of p0( ) and p1 ( ): µ p (λ) 0 1 + + + + + + + + + µ + + + + + + λ (1) 1 + - + - + - p (λ) λ (1) 1 - - Case 1: no sign agreement Case 2: signs agree (1) 1 Clearly. Let's consider the signs of pk ( ) and pk+1 ( ) and interlacing property: 560 .

+1 Example 8.+ λ (k) i + p (λ) k+1 + λ (k+1) i-1 λ (k+1) i + + λ (k+1) i+1 p (λ) k+1 + + λ (k+1) i-1 - λ + + (k+1) i (k+1) i+1 λ Case 1: no sign agreement Case 2: signs agree k Let be between (. the number eigenvalues of T . when we know the number of zeros of pk in 1). then we know the number of zeros of pk+1 in 1) by checking the sign agreement between pk ( ) and pk+1( ).11. Generally.. It is clear from the above gure that a sign agreement between i i k+1 to the right of .+ λ (k) i - + p (λ) k + + λ (k) i-1 - .e. The rule is clear: The number of sign agreements between consecutive terms of p ( ) p ( ) : : : pn( ) equals to the number of eigenvalues of T larger than . This recursion can be continued until we know the number of zeros of pn.3 Let 02 1 01 B C T = B1 2 1C: @ A 0 1 2 561 . or it means that p pk ( ) and pk+1( ) \pushes" i k+1 has one more zeros than pk in 1). There is no zero of p0 in 1).µ p (λ) k + + λ + (k) i-1 µ . i.)1 and (k). 0 1 Note : It is also clear that pk ( ) = 0 should be considered an sign agreement with pk( ) because one more zero is \pushed" into 1). We know the number of zeros of p1 in 1) by the sign agreement or disagreement of p0 ( ) and p1( ).

For this matrix the Sturm sequence is
$$p_0(\lambda) = 1,\quad p_1(\lambda) = 2 - \lambda,\quad p_2(\lambda) = (2-\lambda)^2 - 1,\quad p_3(\lambda) = (2-\lambda)\bigl[(2-\lambda)^2 - 1\bigr] - (2-\lambda) = (2-\lambda)\bigl[(2-\lambda)^2 - 2\bigr].$$
Let $\mu = 0$. Then the sequence $p_0(\mu), p_1(\mu), p_2(\mu), p_3(\mu)$ is $1, 2, 3, 4$. There are three agreements in sign. Thus all the eigenvalues of $T$ are greater than or equal to zero; in fact, since $p_3(0) = 4 \ne 0$, it follows that all the eigenvalues of $T$ are positive. Let $\mu = 2$. Then the sequence $p_0(\mu), p_1(\mu), p_2(\mu), p_3(\mu)$ is $1, 0, -1, 0$. We count the agreements in sign of $+, +, -, -$: there are two agreements, confirming that $T$ has two eigenvalues greater than or equal to 2. (The eigenvalues of $T$ are easily seen to be $2,\ 2 + \sqrt{2},\ 2 - \sqrt{2}$.)

The Sturm sequence and bisection method for computing a specified zero of $p_n(\lambda)$ can now be stated as follows.

Algorithm 8.11.1 The Sturm Sequence-Bisection Algorithm

Let $\lambda_1 < \lambda_2 < \cdots < \lambda_n$ be the eigenvalues of $T$, that is, the zeros of $p_n(\lambda)$. Suppose the desired eigenvalue is $\lambda_{n-m+1}$ for a given $m \le n$. Let $\epsilon > 0$ be a preassigned small positive number.

I. Find an interval $[s_1, s_2]$ containing $\lambda_{n-m+1}$. Since $|\lambda_i| \le \|T\|_\infty$, we can take $s_1 = -\|T\|_\infty$ and $s_2 = \|T\|_\infty$ initially.

II. Compute $s_3 = \dfrac{s_1 + s_2}{2}$. Test if $|s_2 - s_1| < \epsilon$. If so, accept $s_3$ as an approximate value of $\lambda_{n-m+1}$. Otherwise go to III.

III. Compute $N(s_3)$ = the number of agreements in sign in the sequence $1, p_1(s_3), p_2(s_3), \ldots, p_n(s_3)$. If $N(s_3) < m$, set $s_2 = s_3$; otherwise, set $s_1 = s_3$. Go to II.
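The algorithm translates directly into code, as in the Python/NumPy sketch below (the function names and the convergence tolerance are our own choices; a zero value of some p_k(mu) is given the sign of its predecessor, following the convention stated earlier).

import numpy as np

def sturm_count(alpha, beta, mu):
    # N(mu): number of sign agreements in 1, p_1(mu), ..., p_n(mu), i.e. the
    # number of eigenvalues of T = tridiag(beta; alpha; beta) that are >= mu.
    count = 0
    sign_prev = 1                                   # sign of p_0 = 1
    p_prev, p = 1.0, alpha[0] - mu                  # p_0(mu), p_1(mu)
    for i in range(len(alpha)):
        if i > 0:
            p_prev, p = p, (alpha[i] - mu) * p - beta[i - 1] ** 2 * p_prev
        sign = sign_prev if p == 0 else (1 if p > 0 else -1)
        if sign == sign_prev:
            count += 1
        sign_prev = sign
    return count

def bisection_eigenvalue(alpha, beta, m, tol=1e-10):
    # Approximate lambda_{n-m+1}, the m-th largest eigenvalue of T, by bisection.
    b = np.r_[0.0, np.abs(beta), 0.0]
    s2 = max(b[i] + abs(alpha[i]) + b[i + 1] for i in range(len(alpha)))  # ||T||_inf
    s1 = -s2
    while s2 - s1 > tol:
        s3 = 0.5 * (s1 + s2)
        if sturm_count(alpha, beta, s3) < m:
            s2 = s3
        else:
            s1 = s3
    return 0.5 * (s1 + s2)

alpha = np.array([2., 2., 2.])        # the tridiagonal matrix of the example above
beta = np.array([1., 1.])
print(bisection_eigenvalue(alpha, beta, m=3))      # approx 2 - sqrt(2) = 0.5858...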

11. Example 8. s = .1 0 . the desired zero is located in an interval of width (s 2k s ) . s =0 s =4 s = 0+4 =2 2 N (s ) = N (2) = 2 < 3: Set s = s 1 2 3 3 2 3 Iteration 2.4 s = 4 s =0 N (s ) = N (0) = 3: Set s = s : 1 2 3 3 1 3 Iteration 1. s = 0 s = 1 s = :5 N (s ) = 3: Set s = s : 1 2 3 3 1 3 563 .. Note: After k steps. 3 s =0 s =2 s = 0+2 = 1 2 1 2 3 N (s ) = number of agreements of sign in the sequence: 1 . Initially. .) N (s ) = 2 < 3 Set s = s : 3 2 3 Iteration 3. 2 2 1 1 m = 3: Iteration 0.1: (+ .4 Let 02 1 01 B C A = B1 2 1C: @ A 0 1 2 p Suppose we want to approximate = 2 ..

in fact. Also. So.1 . it is symmetric tridiagonal.2 Symmetric QR Iteration I.11. ( ) 1 ( ) 1 1 tnkn.The eigenvalue 1 is clearly in the interval :5 1]. the case.11. the double-shift strategy discussed for the general eigenvalue problem is not needed in this case. if the starting matrix is a symmetric tridiagonal matrix T . which is.2 The Symmetric QR Iteration Method When A is symmetric. sign(r) ( ) r r + tnkn. However. known as the Wilkinson-shift. is normally used.1 n. we need only 0(n) ops to generate each Tk (note that the QR factorization of a symmetric tridiagonal matrix requires only 0(n) ops). in this case we have to deal with the eigenvalue problem of a symmetric tridiagonal matrix rather than of a Hessenberg matrix. the eigenvalue of the trailing 2 2 matrix that is closer to the (n n)th entry. the symmetric QR iteration is much more e cient than the nonsymmetric case. ( ) 2 1 ) (t(k. s1 j < : 8. since the eigenvalues of a symmetric matrix are all real and the Real Schur Form of a symmetric matrix is a diagonal rather than triangular matrix. Transform A to a symmetric tridiagonal matrix T using orthogonal similarity transformations: PAP T = T: 564 . To apply the QR iteration method to a symmetric tridiagonal matrix. the transformed Hessenberg matrix PAP T = H is also symmetric therefore. This is known as the Wilkinson-shift. then so is each matrix Tk in the sequence Tk . in this case a popular shift. n. k tnn ( ) ( ) 2 1 ! then the Wilkinson-shift is k = tnn + r . tnkn. t(k) ) n nn : r= 2 Thus the symmetric method comes in two stages: where Algorithm 8. Thus. we note that. k I = Qk Rk and. is usually chosen as the shift. Instead of taking the (n n)th entry at every iteration as the shift. furthermore. Thus if a trailing 2 2 submatrix of Tk is given by tnk. We can continue our iterations until the length of the interval js2 .

As in the general nonsymmetric case. j : ( +1) 1 ( ) 1 2 Remark: In practice. pp. the symmetric QR QT AQ = D + E algorithm with implicit shift generates an orthogonal matrix Q and a diagonal matrix D such that kE kF and (n) is a slowly growing function of n. the symmetric QR with implicit shift is stable. k I . However if the 3 3 Round-o error property. 278{281. pp. j ejtnkn. Convergence of the Symmetric QR Iteration ) It can be shown (see Lawson and Hanson. see Golub and Van Loan. where (n)kAkF 565 . see Lawson and Hanson SLP. For details. transforming orthogonal matrix Q is accumulated the ops count becomes 5n3. there exists e > 0 depending upon A such that for all k jtnkn. 240{244). This is known as Implicit Symmetric QR.1 to n zero is quadratic. +1 Also see exercise #28 in this chapter. given a symmetric matrix A. Apply single-shift QR iteration to T with the Wilkinson-shift : Set T = T1 For k = 1 2 : : : do until convergence occurs T k . it has been seen that the convergence is almost always cubic but the quadratic convergence is all that has been proved (for a proof. that is. Flop-Count: The symmetric QR algorithm requires only about 2 n ops. SLP p. 109) that the convergence of t(kn.II. I = Q k Rk Tk+1 = Rk Qk + I . Remark: It is possible to compute Tk from Tk without explicitly forming the matrix Tk . It can be shown that.

at least the extremal eigenvalues of very large and sparse symmetric matrices.1j p(n) kAk2 : Thus.12 The Lanczos Algorithm For Symmetric Matrices There are areas of applications such as power systems. The Symmetric Lanczos Algorithm Given an n n symmetric matrix A and a unit vector v1 . space science. etc. A method originally devised by Lanczos has received considerable attention in this context. quantum physics and chemistry. The symmetric Lanczos algorithm has served as a \seed" algorithm for further research in this area. then the eigenvalues and the eigenvectors can be computed much more accurately. There was a conference at Northern Carolina State University honoring Lanczos 100th birthday in December. Most large problems arising in applications are sparse. i i i Note: If the starting matrix itself is a symmetric tridiagonal matrix. 566 . The symmetric problem is better understood than the nonsymmetric problem. nuclear studies. 8. the Lanczos algorithm constructs simultaneously a symmetric tridiagonal matrix T and an orthonormal matrix V such that T = V T AV: The algorithm can be deduced easily as follows: (8. is one of the rst algorithms for large and sparse eigenvalue problems. where the eigenvalue problems for matrices of very large order are commonly found. Research in this area is very active. each computed eigenvalue ^ satis es the inequality: j . The algorithm. a Hungarian mathematician and physicist. .1) Cornelius Lanczos (1893-1974).12. 1993. The QR iteration method described in the last section is not practical for large and sparse eigenvalue problem. which is now known as the symmetric Lanczos algorithm. There are now wellestablished techniques to compute the spectrum.. the eigenvalues with modulus close to kAk2 are computed accurately the other ones may not be. We will discuss this method and its applications to eigenvalue computations in this section. made signi cant contributions to problems in physics and mathematics that are considered to be of vital interest in current scienti c computing. The sparsity gets destroyed and the storage becomes an issue.Furthermore.

.. j vj .. .1 n gives 0 n.4) Also. B .2) and (8. B B .. .12. and observing that the orthonormality condition gives vjT vj = 1 vjT vk = 0 j 6= k we obtain j = vjT Avj j = 1 2 : : : n: j . ..Let 0 and V = (v1 v2 : : : vn). 1 (where we assume that 0v0 = 0).1 + j vj +1 j = 1 2 ::: n . @ 1 1 1 2 0 n. Then the equation 0 B .3) (8. Multiplying both sides of the above equation by vjT to the left. . T =B . B B.12. vj = rj = +1 then from (8. The above discussion gives us the following basic Lanczos algorithm: j 567 .. .. .1vj . if we write rj = Avj . we get j provided that j 6= 0: 2 The nonvanishing of can be assured if we take j = krj k . @ 1 1 1 1 2 1 2 0 n. .1 (8.. . B A(v v : : : vn) = (v v : : : vn) B .12.12.4). .1vj .2) Avj = j vj + j ..1 1 C C C C C A (8. .1 1 C C C C C A V T AV = T or or AV = V T 0 B .12.1 n n. .

Algorithm 8.12.1 The Basic Symmetric Lanczos

Given an $n \times n$ symmetric matrix $A$ and a unit vector $v_1$, the following algorithm constructs simultaneously the entries of a symmetric tridiagonal matrix $T$ and an orthonormal matrix $V = (v_1, \ldots, v_n)$ such that $V^T A V = T$.

Set $v_0 = 0$, $\beta_0 = 1$, $r_0 = v_1$.
For $j = 1, 2, \ldots, n$ do
   $v_j = r_{j-1}/\beta_{j-1}$
   $\alpha_j = v_j^T A v_j$
   $r_j = (A - \alpha_j I)v_j - \beta_{j-1} v_{j-1}$
   $\beta_j = \|r_j\|_2$

Notes:
1. The whole algorithm can be implemented just by using a subroutine that can perform matrix-vector multiplication. In fact, in contrast with Householder's method or Givens' method, the matrix $A$ is never altered during the whole procedure, and thus the sparsity of the original matrix $A$ can be maintained. This is important when $n$ is large. Such a feature makes the Lanczos algorithm very attractive for sparse computations.
2. The vectors $v_1, v_2, \ldots, v_n$ are called Lanczos vectors.
3. Each Lanczos vector $v_{j+1}$ is orthogonal to all the previous ones, provided $\beta_j \ne 0$.
4. The vectors $\{v_1, \ldots, v_j\}$ form an orthonormal basis of the Krylov subspace spanned by $\{v_1, Av_1, \ldots, A^{j-1}v_1\}$.
5. The Arnoldi method described in Chapter 6, applied to a symmetric matrix, is the same as the symmetric Lanczos method described above.

We now show that the Lanczos algorithm can be reformulated using only three $n$-vectors.

Algorithm 8.12.2 The Reformulated Lanczos

Set $v_0 = 0$, $\beta_0 = 1$, $r_0 = v_1$.
For $j = 1, 2, \ldots, n$ do

6. Avj = vjT uj = krj k: 3. v = B :8321 C @ A .1 2. j vj j Example 8. rj = rj .1 01 2 31 B C A = B2 3 4C @ A 3 4 5 =1 011 B C r = v = B0C: @ A 0 1 0 j=1: 1 1 011 B C = 3:6056 v = B 0 C : @ A 1 0 j=2: 2 T = 2 0 0 1 B C = 8:0769 = :6154 v = B :5547 C @ A :8321 ! ! 1 3:6056 2 2 1 1 1 2 = 3:6056 8:0769 j=3: 3 0 0 1 B C = 0:0769 = 1:5466 10.1 j 5. uj 4. vj = rj . :5547 :8321 C : @ A 0 :8321 . rj uj .1= j .1vj .0:0769 01 0 1 0 B C V = B .12.1.0:5547 3 14 3 3 Note that V T AV = T: 569 .01 :5547 0 1 3:6056 0 B 3:6056 :0769 :6154 C C T =T =B @ A 0 :6154 . j .
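The reformulated algorithm translates almost line for line into code. The following MATLAB sketch is ours (the function name lanczos_sym is not from the text); it implements Algorithm 8.12.2 using only matrix-vector products with A, and running it on the matrix of Example 8.12.1 reproduces the T and V shown above up to roundoff.

function [V, T] = lanczos_sym(A, v1, k)
% A minimal sketch of the reformulated symmetric Lanczos algorithm
% (Algorithm 8.12.2).  A is symmetric, v1 is a unit vector, and k <= n
% is the number of Lanczos steps.  Returns V (orthonormal columns in
% exact arithmetic) and the symmetric tridiagonal matrix T = V'*A*V.
  n = length(v1);
  V = zeros(n, k);  alpha = zeros(k, 1);  beta = zeros(k, 1);
  vold = zeros(n, 1);                 % v_0 = 0
  r = v1;  betaold = 1;               % r_0 = v_1, beta_0 = 1
  for j = 1:k
      v = r / betaold;                % v_j = r_{j-1} / beta_{j-1}
      u = A * v;                      % u_j = A v_j  (the only use of A)
      alpha(j) = v' * u;              % alpha_j = v_j' u_j
      r = u - alpha(j) * v - betaold * vold;    % r_j
      beta(j) = norm(r);              % beta_j = ||r_j||_2
      V(:, j) = v;
      vold = v;  betaold = beta(j);
      if beta(j) == 0, break, end     % exact breakdown: invariant subspace found
  end
  V = V(:, 1:j);
  T = diag(alpha(1:j)) + diag(beta(1:j-1), 1) + diag(beta(1:j-1), -1);
end

% Example 8.12.1:
%   A = [1 2 3; 2 3 4; 3 4 5];
%   [V, T] = lanczos_sym(A, [1; 0; 0], 3);
%   norm(V'*A*V - T) is then of the order of roundoff.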

Round-off Properties of the Symmetric Lanczos Algorithm

It is clear from (8.12.2) that if the symmetric Lanczos algorithm is run from j = 1 to k (k < n), then we have

    A V_j = V_j T_j + r_j e_j^T,

where V_j = (v_1, ..., v_j) and T_j is the j x j principal submatrix of T. If \tilde{V}_j, \tilde{T}_j and \tilde{r}_j denote the respective computed quantities, then it can be shown (see Paige (1980)) that

    A \tilde{V}_j = \tilde{V}_j \tilde{T}_j + \tilde{r}_j e_j^T + E_j,   where \|E_j\|_2 \approx \mu \|A\|_2,

which shows that the Lanczos algorithm has very favorable numerical properties as far as the equation A V_j = V_j T_j + r_j e_j^T is concerned. However, the loss of orthogonality among the Lanczos vectors is the real concern, as explained in the following.

Loss of Orthogonality

The Lanczos algorithm clearly breaks down when any of the \beta_j = 0. However, computationally this is a blessing in disguise: we immediately obtain an invariant subspace, as we will see later. This, however, seldom happens in practice. The real difficulty is that the computed \tilde{\beta}_j can be very small due to the cancellation that takes place during the computation of r_j, and a small \tilde{\beta}_j can cause a severe loss of orthogonality among the computed vectors \tilde{v}_j, as can be seen from the following result (Golub and Van Loan, MC 1984, p. 333):

    |\tilde{v}_{j+1}^T \tilde{v}_j|  \approx  ( |\tilde{r}_j^T \tilde{v}_j| + \mu \|A\|_2 ) / |\tilde{\beta}_j|.

Thus, if \tilde{\beta}_j is small, then \tilde{v}_j and \tilde{v}_{j+1} will be far from being orthogonal.

Since the Lanczos algorithm plays an important role in sparse matrix computations, a significant amount of research has been devoted to making the algorithm a practical one. There now exist procedures, such as Lanczos with complete orthogonalization, which produces Lanczos vectors that are orthogonal to working precision, or Lanczos with selective orthogonalization, which is used to enforce orthogonality only in selected vectors, whenever needed. Procedures for complete and selective orthogonalization have been described in some detail in Golub and Van Loan (MC 1984, pp. 334-337). For details, we refer the readers to the well-known books in the area by Parlett (SEP 1980) and by Cullum and Willoughby (1985a), (1985b). Several papers of Paige (1970, 1971, 1972, etc.), whose pioneering work in the early 1970's rejuvenated the interests of the researchers in this area, are also very useful reading.
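The loss of orthogonality is easy to observe numerically. The following small experiment is ours (the test matrix from MATLAB's gallery and the parameter choices are arbitrary): it runs the Lanczos recurrence once as stated and once with full reorthogonalization against all previously computed Lanczos vectors, and compares ||V^T V - I|| in the two cases. The reorthogonalized variant is the simplest, and most expensive, member of the family of complete-orthogonalization procedures mentioned above.

% Illustration (ours) of the loss of orthogonality among computed Lanczos
% vectors, and its cure by complete reorthogonalization.
n = 200;
A = gallery('minij', n);      % a symmetric test matrix
k = 60;                       % number of Lanczos steps
v1 = randn(n, 1);  v1 = v1 / norm(v1);

for reorth = [false true]
    V = zeros(n, k);  r = v1;  b = 1;  vold = zeros(n, 1);
    for j = 1:k
        v = r / b;
        u = A * v;
        a = v' * u;
        r = u - a * v - b * vold;
        if reorth
            % complete reorthogonalization: remove the components of r
            % along all previously computed Lanczos vectors
            r = r - V(:, 1:j-1) * (V(:, 1:j-1)' * r);
        end
        b = norm(r);
        V(:, j) = v;  vold = v;
    end
    fprintf('reorthogonalization = %d:  ||V''*V - I|| = %.2e\n', ...
            reorth, norm(V' * V - eye(k)));
end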

Computing the Eigenvalues of A

The sole purpose of presenting the Lanczos algorithm here was to show that the Lanczos matrices T_j can be used to compute certain eigenvalues of the matrix A. We remarked earlier that when a \beta_j is exactly equal to zero, we have an invariant subspace. This is indeed good news. Unfortunately, this happens very rarely in practice. In practice, for large enough values of j, the eigenvalues of T_j provide very good approximations to the extremal eigenvalues of A. The question arises: can we give a posteriori bounds? To this end, we introduce the following definitions.

Definition 8.12.1 Let (\theta_j, z_j) be an eigenpair of T_j. Then the pair (\theta_j, y_j), where y_j is defined by y_j = V_j z_j, is called a Ritz pair; the \theta_j's are known as the Ritz values and the y_j's are called the Ritz vectors.

Now, returning to the question of how well a Ritz pair approximates an eigenpair of A, we state the following result (for a proof, see Golub and Van Loan, MC 1984, p. 327).

Theorem 8.12.1 Let R_i = A y_i - \theta_i y_i, i = 1, ..., j. Then in each interval [\theta_i - \|R_i\|, \theta_i + \|R_i\|] there is an eigenvalue of A.

It follows from the above theorem that \|R_i\|_2 is a good measure of how accurate the Ritz pair (\theta_i, y_i) is. How do we compute \|R_i\|? Fortunately, we can compute \|R_i\| from T_j without computing \theta_i and y_i at every step. Let

    S_j^T T_j S_j = diag(\theta_1, \theta_2, ..., \theta_j)

and

    Y_j = (y_1, y_2, ..., y_j) = V_j S_j = V_j (s_1, s_2, ..., s_j).

Then

    \|R_i\| = \|A y_i - \theta_i y_i\| = \|A V_j s_i - V_j s_i \theta_i\|
            = \|(A V_j - V_j T_j) s_i\|          (because T_j s_i = \theta_i s_i)
            = \|(\beta_j v_{j+1} e_j^T) s_i\|    (note that A V_j - V_j T_j = \beta_j v_{j+1} e_j^T)
            = |\beta_j| \|e_j^T s_i\|            (since \|v_{j+1}\| = 1)
            = |\beta_j| |s_{ji}|,

where s_{ji} is the (j, i)th entry of S_j.
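In code, the bound \|R_i\| = |\beta_j| |s_{ji}| costs almost nothing once the eigenvalues and eigenvectors of the small matrix T_j are available. The following sketch is ours; it assumes the function lanczos_sym from the sketch given after Example 8.12.1, and it prints, for each Ritz value, the interval guaranteed by the result above to contain an eigenvalue of A.

% Ritz values and a posteriori residual bounds from T_j (our illustration).
A = [1 2 3; 2 3 4; 3 4 5];
j = 2;                                    % number of Lanczos steps
[V, T] = lanczos_sym(A, [1; 0; 0], j);

[S, Theta] = eig(T);                      % T = S*Theta*S', S orthogonal
theta = diag(Theta);                      % Ritz values
betaj = norm(A*V(:, j) - V*T(:, j));      % = |beta_j|, since A*V_j - V_j*T_j = beta_j*v_{j+1}*e_j'
for i = 1:j
    Ri = betaj * abs(S(j, i));            % ||R_i|| = |beta_j| |s_{ji}|
    fprintf('theta_%d = %8.4f   ||R_%d|| = %.4f   interval [%8.4f, %8.4f]\n', ...
            i, theta(i), i, Ri, theta(i) - Ri, theta(i) + Ri);
end
% For j = 2 this reproduces ||R_1|| = 0.2382 and ||R_2|| = 0.5674 of
% Example 8.12.2 below, and each printed interval indeed contains one of
% the eigenvalues 0, -0.6235, 9.6235 of A.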

R = j j js j = :6154 :3870 = :2382 1 2 21 R = j j js j = :5674: 2 2 22 572 . 9.6235 of A). iyik = j j j jsjij where sji = eT si si is the ith column of Sj : j Example 8. 1 ::: j are the eigenvalues of Tj .2 (Residual Theorem for Ritz Pair) Let Tj denote the j j symmetric tridiagonal matrix obtained after j steps of the Lanczos algorithm.6235. The above discussion can be summarized in the following theorem: Theorem 8. -. we have kRik = kAyi .2 3 4 5 j=2: The eigenvalues of A are: 0.6235. Let Yj = (y y : : : yj ) = Vj Sj : 1 2 Then for each i from 1 to j .:5133 1 2 = 9:5903: (Note that 2 01 2 31 B C A = B2 3 4C: @ A = 9:5903 is a good approximation of the largest eigenvalue 9.where sij is the (j i)th entry of Sj .:3970 :9211 = .12. Let SjT Tj Sj denote the Real Schur Form (RSF) of Tj SjT Tj Sj = diag( : : : j ) 1 that is.12. ! 1 3:6056 T2 = 2 = :6154 3:6056 8:0769 ! :9221 :3870 S2 = .

R = j j js j = 1:4009 10. in the study of vibrations of structures. In several applications explicit knowledge of the eigenvalues is not required all that is needed is a knowledge of the distribution of the eigenvalues in a region of the complex plane or estimates of some speci c eigenvalues. and the eigenvalue problem arises principally in the analysis of stability of these equations. Applications of the Eigenvalues and Eigenvectors. The eigenvalue problem arises in a wide variety of practical applications. the problem of computing the eigenvalues and eigenvectors of a matrix: Ax = x. and R3 in this case.j=3: 0 1 3:6056 B T = B 3:6056 8:0769 @ 0 C :6154 C 3 A 0 :6154 . We have attempted just to show how important the eigenvalue problem is in practical applications. This is a situation engineers would like to avoid. For example.:4082 :3851 1 B C S3 = B . Here are the highlights of the chapter. resonance occurs. Mathematical models of many of the problems arising in engineering applications are systems of di erential and di erence equations.) 8. and if any of the natural frequencies becomes equal or close to a frequency of the imposed periodic force on the structure. Localization of Eigenvalues.0:0769 0 :8277 . : (Note the smallness of R1 R2. and principal components analysis in statistics with a reference to a stock market analysis. In this chapter we have included examples on the European arms race. Maintaining the stability of a system is a real concern for engineers.13 Review and Summary This chapter has been devoted to the study of the eigenvalue problem. R = j j js j = 9:0367 10. 2. 1. simulating transient current of an electric circuit. the eigenvalues and eigenvectors are related to the natural frequencies and amplitude of the masses.:2727 :1132 :9210 C @ A :4196 :9058 :0584 1 2 3 3 3 3 31 32 33 15 14 16 1 R = j j js j = 6:487 10. vibration of a building. buckling of a beam. 573 .

then the condition number of the transforming matrix X . etc. The Power Method and The Inverse Iteration. a number of the eigenvalues in a region. The Rayleigh Quotient Iteration. j j kAk. where only the largest and the smallest (in magnitudes) eigenvalues or only the rst or last few eigenvalues and their corresponding eigenvectors are needed. There are applications such as analysis of dynamical systems. Cond(X ) = kX k kX . principal component analysis in statistics. the power method should be used with a suitable shift. plays 574 .4. gives an estimate of the eigenvalue of a symmetric matrix A for which x is the corresponding eigenvector. The inverse power method is simply the power method applied to (A .1 where is a suitable shift. buckling of a beam. 4.1k. I ). however. very crude. Also.4. 5.1) tells us that if A is a diagonalizable matrix.2) can be used to obtain a region of the complex plane containing all the eigenvalues. The power method is extremely simple to implement and is suitable for large and sparse matrices. Sensitivity of Eigenvalues and Eigenvectors. The estimates are.3) This result says that the upper bound of any eigenvalue of A can be found by computing its norm. The quotient T Rq = xxTAx x known as the Rayleigh quotient. Recall that this result is important in convergence analysis of iterative methods for linear systems. This idea.. The process is known as the Rayleigh quotient iteration. can be used to compute an approximation to an eigenvalue and the corresponding eigenvector. but there are certain numerical limitations. vibration analysis of structures.7. The Bauer-Fike Theorem (Theorem 8.4. It is widely used to compute an eigenvector when a reasonably good approximation to an eigenvalue is known.The Gersgorin disk theorems (Theorems 8. 3. (Theorem 8. or in some cases. The power method and the inverse power method based on implicit construction of powers of A can be used to compute these eigenvalues and the eigenvectors. In practice.1 and 8. when combined with the inverse iteration method.

Eigenvalue Computation via the Characteristic Polynomial and the Jordan Canonical Form. if there is a multiple eigen- value or there is an eigenvalue close to another eigenvalue. Unfortunately. The eigenvalues of these condensed forms are rather easily computed. On the other hand. The eigenvalues of a symmetric matrix are well-conditioned. which are the zeros of the characteristic polynomial. to measure the sensitivity of an individual eigenvalue. Thus. However. The JCF displays the eigenvalues explicitly. and yi are. then a small change in A can cause signi cant changes in the eigenvalues. the normalized right and left eigenvectors corresponding to i . if the eigenvalues are well-separated and well-conditioned.the role of the condition number of the eigenvalue problem. and with the companion or Frobenius form. The condition number of the eigenvalue i is the reciprocal of the number jyiT xi j. but the eigenvectors can be quite ill-conditioned.8. It is thus important to know the sensitivity of the individual eigenvalues. A similarity transformation preserves the eigenvalues and it is well known that a matrix A can be transformed by similarity to the Jordan Canonical Form (JCF) and to the Frobenius form (or a comparison form if A is nonderogatory). The sensitivity of an eigenvector xk corresponding to an eigenvalue k depends upon (i) the condition number of all the eigenvalues other than k and (ii) the distance of k from the other eigenvalues (Theorem 8. then there are some illconditioned eigenvectors. computation of eigenvalues via the characteristic polynomial or the JCF is not recommended in practice. If an eigenvalue problem is ill-conditioned. it immediately follows from the Bauer-Fike Theorem that the eigenvalues of a symmetric matrix are insensitive to small perturbations. the characteristic polynomial of A is trivially computed and then a root.1). then it might happen that some eigenvalues are more sensitive than the others. If this number is large. one needs the knowledge of both left and right eigenvectors corresponding to that eigenvalue. then the eigenvectors are well-conditioned. 6.nding method can be applied to the characteristic polynomial to obtain the eigenvalues. where xi. Since a symmetric matrix A can be transformed to a diagonal matrix by orthogonal similarity and the condition number of an orthogonal matrix (with respect to 2-norm) is 1. This is especially signi cant for a symmetric matrix. Obtaining these forms may require a very ill-conditioned transforming 575 . respectively.

The eigenvalues appearing in RSF obtained by the QR iteration algorithm do not appear in the desired order. the shifts are the eigenvalues of the 2 2 submatrix at the bottom right hand corner. At each iteration. Since the eigenvalues of a real matrix can be complex. complex arithmetic is usually required.matrix. therefore. With double shifts. However. and the process is applied to the de ated matrix. Also. however. double shifts are used. In general. although there are some applications which need 576 . The double shift implicit QR iteration seems to be the most practical algorithm for computing the eigenvalues of a dense matrix of modest size. if implemented naively (which we call the basic QR iteration). Ordering the Eigenvalues. The convergence can be accelerated by using suitable shifts. will require 0(n4) ops. The matrix A is. In practice. can be quite slow in the presence of a near-multiple or a multiple eigenvalue. For a real matrix A. the matrix is de ated. such as orthogonal matrices. the n steps of QR iteration algorithm. The QR Iteration Algorithm. ill-conditioned similarity transformation should be avoided in eigenvalue computation. 7. Since the algorithm is based on repeated QR factorizations and each QR factorization of an n n matrix requires 0(n3 ) ops. The most widely used algorithm for nding the eigenvalues of a matrix is the QR iteration algorithm. The use of well-conditioned transforming matrices. is desirable. The convergence of the Hessenberg-QR iteration algorithm. initially reduced to a Hessenberg matrix H by orthogonal similarly before the start of the QR iteration. making the algorithm impractical. the eigenvalues of the 2 2 bottom right hand corner matrix at each iteration do not need to be computed explicitly. and the Hessenberg form is preserved at each iteration. the eigenvalues are computed two at a time. Once two eigenvalues are computed. The process is known as the double shift implicit QR iteration. 8. computations can be arranged so that complex arithmetic can be avoided. and the sensitivity of the eigenvalue problem depends upon the condition number of this transforming matrix. The key observations here are: the reduction of A to H has to be made once for all. the algorithm basically constructs iteratively the Real Schur Form (RSF) of A by orthogonal similarity.

10. the eigenvalues can be put in the desired order. can be used to compute the eigenvalues of a symmetric matrix. However. Since the matrix A is initially reduced to a Hessenberg matrix for practical implementation of the QR iteration algorithm. The given symmetric matrix A is rst transformed to a symmetric tridiagonal matrix T using orthogonal similarity and then the well-known bisection algorithm for the rootnding problem is applied to the characteristic polynomial of T . Alternatively. This recursion not only gives the characteristic polynomial of T . We have not discussed these methods in this chapter. The bisection method is especially useful for nding eigenvalues of a symmetric matrix in a given interval of the real line. The QR iteration algorithm. but also gives the characteristic polynomials of all the leading principal submatrices. of course. Large and Sparse Eigenvalue Problem. A shift called the Wilkinson shift is normally used here. The convergence of the symmetric QR iteration algorithm with the Wilkinson-shift has been proven to be quadratic however. 9. Computing the Eigenvectors. etc. and there are methods for the symmetric problem that can exploit these properties. The eigenvalues and eigenvectors of a symmetric matrix A enjoy some remarkable special properties. which is used in the implementation of the bisection method. inverse iteration can be invoked to compute the corresponding eigenvector.this. Once an approximation to an eigenvalue is obtained for the QR iteration. such as the divide and conquer method. designed by Stewart.. These methods are important primarily for parallel computations.1). for the symmetric eigenvalue problem. obtained by a simple recursion (Section 8. the Jacobi method. 10. one can compute the eigenvectors directly from the RSF of A. 11. There is an excellent Fortran routine. advantage can be taken of the structure of a Hessenberg matrix in computing an eigenvector using inverse iteration. A remarkable fact is that these polynomials have the Sturm sequence property. One such method is the bisection method. with a little extra work. The Symmetric Eigenvalue Problem. There are other methods. in practice very often it is cubic. which will accomplish this. The eigenvalue problem for large and sparse matrices is an active area of research. The state-of-the-art techniques using Lanczos or Arnoldi 577 .

almost all eigenvalue problems here are generalized eigenvalue problems as a matter of fact. The techniques for symmetric eigenvalue problems are more well-established and betterunderstood than those for the nonsymmetric problem. and by Peter O'Neil. 1991. such as solutions of the Lyapunov matrix equation (Bartels and Stewart (1972)). the Sylvester Matrix Equation (Golub. Nash. Some references of how eigenvalue problems (especially large and sparse eigenvalue problems) arise in other areas of sciences and engineering such as power systems. they are symmetric de nite problems. see the books on numerical methods in engineering by Chapra and Canale. see section 9. The books by Inman and by Thompson are. structural mechanics. Volume by Patel et al. the IEEE Reprint. mechanical. 1985. and Van Loan (1979)). Van Dooren (1982)). and the book Numerical Methods in Control Theory.12). chemistry. oceanography.. very useful and important books in this area. physics. see the recent paper by Brualdi (1993). etc. Birkhauser. However. see the book Computational Methods for Linear Control Systems. 1.10 in the next chapter. by B. Boston.14 Suggestions for Further Reading Most books on vibration discuss eigenvalue problems arising in vibration of structures. For some generalizations of the Gersgorin disk theorems. For references of the well-known books in vibration. Byers (1984). There are other engineering books (too numerous to list here). (1994). Prentice-Hall. containing discussions on eigenvalue problems in engineering. are given in the book Lanczos Algorithms for Large Symmetric Eigenvalue Computations. we have included only a very brief description of the symmetric Lanczos method (Section 8. For the sake of completeness and to give the readers an idea of how the Lanczos-type methods are used in eigenvalue computations. especially in the areas of electrical. Datta (in preparation). in particular. For learning more about how the eigenvalue problem arises in other areas of engineering. etc. referenced in Chapter 6. The Real Schur Form (RSF) of a matrix is an important tool in numerically e ective solutions of many important control problems. by Petkov. Algebraic Riccati equations (Laub (1979). civil and chemical engineering. 8. by Jane Cullum and Ralph Willoughby. For details. vol. Christov and Konstantinov.type of methods with some sort of reorthogonalization and proper preconditioning can compute only a few extremal eigenvalues. This paper contains results giving a region of the complex plane for each eigenvalue for a 578 . N.

1979. Cambridge University Press. N. Again.|can be found in all numerical linear algebra books: Golub and Van Loan (MC. etc. Models. the most authoritative book in this area. Chapter 9). 1980. Prentice-Hall. see Matrix Analysis by Roger Horn and Charles Johnson. A basic discussion on Lanczos methods with and without reorthogonalizations. see Dongarra and Sorensen (1987) and the original paper of Cuppen (1981). Stewart (IMC).full description of the Gersgorin disk theorems and applications. For results on eigenvalue bounds. volumes I and II. Willoughby. the book by Golub and Van Loan (MC 1984 and 1989) is a rich source of references. and this is an active area of research. For a description of the divide and conquer method. Prentice-Hall. Kagstrom and Ruhe (1980a and 1980b). 1985 (Chapter 6). John Wiley & Sons. the Rayleigh-Quotient iteration method. Birkhauser. K. NJ. An important book in the area of symmetric eigenvalue problem is the book The Symmetric Eigenvalue Problem by Beresford Parlett. of course. and Peters and Wilkinson (1979) are important in the context of inverse iteration. Another recent book in this area is Eigenvalue Problem for Large Matrices. by J. 1985. many papers in this area have been published. Saad (1993). A nice description of stability theory in dynamic systems is given in Introduction to Dynamic Systems: Theory. Englewood Cli s. Chapter 9). etc. 1983 and 1989). the inverse power method. see the papers by Golub and Wilkinson (1976). and Applications by David Luenberger. Parlett. Watkins (FMC). Hager (ANLA). A. 240{247. A Fortran program for ordering the eigenvalues of a real upper Hessenberg matrix appears in Stewart (1976). For computation of Jordan Canonical Form. The papers by Varah (1968). pp. and Demmel (1983). and The Symmetric Eigenvalue Problem by B. Two authoritative books in this area are: Lanczos Algorithms for Large Symmetric Eigenvalue Computations. is contained in the book by Golub and Van Loan (MC 1984. For a proof of the global convergence of the symmetric QR iteration with Wilkinson-shifts. (1970). New York. see the paper by Varah (1968). Cullum and R. The Wilkinson AEP is. Descriptions of the usual techniques for eigenvalue and eigenvector computations|the power method. Wilkinson (AEP). See the list of references given in Golub and Van Loan (MC 1989. by Y. see the book SLP by Lawson and Hanson. Since the pioneering work by Paige in the 1970's. 579 . and their applications to solutions of positive de nite linear systems and eigenvalue computations. Cambridge. and the QR iteration method. Watkins (FMC) contains a nice description of the Jacobi method (Chapter 6).

Paige (1980). These are considered to be the \seed papers" for further recent active research in this area.) are also well worth reading. etc.The doctoral thesis of Chris Paige (1971) and several of his follow-up papers (Paige (1976). 580 .

Consider the following model for the vertical vibration of a motor car: car body mass m1 y d (suspension) 1 2 k1 m2 k2 d (tire) 2 y 1 Road (a) Formulate the equation of motion of the car.k 1 Determine the stability of motion when 1 .Exercises on Chapter 8 (Use MATLAB whenever needed and appropriate) PROBLEMS ON SECTION 8.k ! k +k ! y y= : y 1 1 1 2 1 2 2 m = m = 1200kg N k = k = 300 cm : 1 2 581 .3 1. neglecting the damping constants d1 of the shock absorber and d2 of the tire: M y + Ky = 0 where M = diag(m1 m2) and k K= .

Write the solution of the equation M y + Ky = 0 with M and K as given in #1(a). Develop an eigenvalue problem for an LC network similar to the case study given in section 8.M .M . using initial conditions y (0) = 0 and y (0) = (1 1 : : : 1)T . and d D= .2. _ 3.d ! : d 1 1 Investigate the stability of motion in this case when d = 4500 Ns : m 1 Hints: Show that the system M y + Dy + Ky = 0 _ is equivalent to the rst order system x(t) = Ax(t) _ where and A= . K 1 0 I . D 1 ! y (t) x(t) = : y (t) _ ! 2. 582 . Show that in this case the natural frequencies are given by 8 > :4451jpLC > < p w = > 1:2470j LC > 1:8019jpLC : Find the modes and illustrate how the currents oscillate in these modes.(b) Formulate the equation of motion when just the damping d2 of the tire is neglected: M y + Dy + Ky = 0 _ M . and K are the same as in part (a).d 1 1 . but with only three loops.2.

Apply the Gersgorin disk theorems to obtain bounds of the eigenvalues for the following matrices: 0 10 1 1 1 B C (a) A = B 2 10 1 C @ A 2 2 10 01 0 01 B C (b) B 2 5 0 C @ A 1 1 6 0 2 .PROBLEMS ON SECTION 8.1 0 C C C (c) B B B 0 .1 2 .1 2 1 1 0 0 B .1 0 0 1 B B . Let x be an eigenvector corresponding to an isolated eigenvalue in the Gersgorin disk Rk .4 4.1 2 . 583 .1 2 .i 0 B 1 C (f) B 1 1 C: @ A 0 1. Using a Gersgorin disk theorem.1 C C @ A . 2 0 10000 0 :5771 :509 :387 :462 1 : B :577 1:000 :599 :389 :322 C B C B B :509 :599 1:000 :436 :426 C C C (e) B B C B B :387 :389 :436 1:000 :523 C C @ A 426 0 1:452 :322 :1 1 :523 1:000 . ri aij + ri].1 C C @ A 0 0 0 . prove that a diagonally dominant matrix is nonsingular. Find an interval where all the eigenvalues of A must lie. Prove that jxk j > jxij for i 6= k.1 .1 2 . 6.i 1+i 5. 7. Let A = (aij be an n n symmetric matrix. Then using a Gersgorin disk theorem prove that each eigenvalue of A will lie in one of the intervals: aij .1 0 C B C C (d) B B B 0 .

5 8. Applying the power method and inverse power method nd the dominant eigenvalue and the corresponding eigenvector for each of the matrices in the exercise #4. and the corresponding eigenvectors 10. (Inverse Orthogonal Iteration) The following iteration. Explain the slow rate of convergence of the power method with the following matrices: 03 2 31 B C (a) A = B 0 2:9 1 C @ A 0 00 0 1 1 1 0 B B 1 10 0 C C (b) A = @ A 1 1 9:8 Choose a suitable shift and then apply the shifted power method to each of the matrices and observe the improvement of the rates of convergence. 584 . then 1 . Then For k = 2 3 : : : do 1) Compute Bk = AQk. The process can be used to compute p(p > 1) largest eigenvalues and the corresponding eigenvectors. 11.PROBLEMS ON SECTION 8. I . n are the eigenvalues of A and v1 : : : vn are the corresponding eigenvecn . (Orthogonal Iteration) The following iterative procedure generalizes the power method and is known as the orthogonal iteration process. 12. can be used to compute the p smallest eigenvalues.1 2) Factorize into QR: Bk = Qk Rk : Apply the above method to compute the rst two dominant eigenvalues and eigenvectors for each of the matrices in Exercise #4. Let Q1 be an n p orthonormal matrix. Prove that if 1 : : : tors. 9. : : : are v1 : : : vn. called the Inverse Orthogonal iteration. generalizes the inverse power method and. are the eigenvalues of A . Let Q1 be an n p orthonormal matrix.

where qn is the last column of Q in (T . Compute the smallest eigenvalue of each of the matrices A in Exercise #4 by applying the power method to A.For k = 2 3 : : : 1) Solve for Bk : ABk = Qk. Construct a simple example to show that an arbitrary similarity transformation can worsen the conditioning of the eigenvalues of the transformed matrix.6{8. Then compute the corresponding eigenvector without invoking the inverse iteration. then prove that x1 = qn. A A = 0 A 1 2 3 ! PROBLEMS ON SECTIONS 8. 15. without explicitly computing A. 2) Factorize into QR: Bk = Qk Rk : Apply the Inverse Orthogonal Iteration to compute the 2 smallest (least dominant) eigenvalues of each of the matrices in Exercise #4. 16. De ation Using Invariant Subspace. where U is unitary. Suppose that we have an n m matrix X with independent columns and an m m matrix M such that AX = XM: Consider the QR factorization of X : QT X = Then show that (a) R 0 ! : QAQT (b) the eigenvalues of A are those of A1 and A3 . Let T be a symmetric tridiagonal matrix. (c) the eigenvalues of A are those of M .1. 0I ) = QR: 14.1. Let the Rayleigh-Quotient Iteration be applied to T with x0 = en . and U (A + A)U = B + B then show that k B k2 = k Ak2. Compute the subdominant eigenvalue of each of the matrices in Exercise #4 using Householder de ation.1 .8 17. If B = U AU . 585 . 13. 18.

(a) Given 1 1 A= 0 1+ nd the eigenvector-matrix X such that X . (b) Prove that a matrix A has a set of n orthonormal eigenvectors i A is normal. (d) How does the Real Schur Form of a normal matrix look? (e) Using the eigenvector-sensitivity theorem (Theorem 8. 21. where D is diagonal.8. 20.19. Verify the ill-conditioning of the eigenvalues of A computationally by constructing a small perturbation to A and nding the eigenvalues of the perturbed matrix. are ill-conditioned.1). by computing the eigenvalues using MATLAB. then the eigenvector xk corresponding to the eigenvalue k is well-conditioned if k is well-separated from the other eigenvalues. Prove the Bauer-Fike Theorem using Gersgorin's rst theorem. Show that the unitary similarity transformation preserves the condition number of an eigenvalue. However. that is. for each eigenvalue of A. 2) where is a very small positive number. the eigenvalues are well-conditioned.1AX is diagonal hence show that the eigenvalues of A are ill-conditioned. we have j j = 1. 22. (a) Prove that A is normal if there exists an unitary matrix U such that U AU = D. (c) Prove that a normal matrix A is unitary i its eigenvalues are on the unit circle. 586 ! B B @ C C A . show that if A is normal. Show both theoretically and experimentally that the eigenvectors of the matrix A = diag(1 + 1 . (b) Consider 0 12 11 10 2 11 B 11 11 10 C B 2 1C B C B 0 10 10 C B 2 1C B C: A=B C B 2 1C B C 0 0 0 0 11 1 Show that the largest eigenvalues of A are well-conditioned while the smallest ones very ill-conditioned. 23.

Consider one step of the single-shift QR iteration: k k Hk . then RQ is upper Hessenberg. the Q-matrix in H = QR need not be an upper Hessenberg. 26. Prove that (a) the QR factorization of an n n Hessenberg matrix requires 0(n2) ops. is always upper Hessenberg.10 24. (a) 01 2 01 B C A = B2 3 4C @ A 0 4 1 (b) 04 5 61 B C A = B1 0 1C @ A 0 . I = QR H = RQ + I 587 is real: . hnn I = Qk Rk Hk = RkQk + hnn I ( ) +1 ( ) or (simply) H . Apply 3 iterations of the single-shift QR iteration to each of the following matrices and observe the convergence or nonconvergence of the subdiagonal entries. Perform one step of the single-shift QR iteration the matrix 01 1 11 B C H = B 2 10 6 C @ A 0 :1 1 and observe the quadratic convergence. Implicit single-shift QR. (b) the QR factorization of a symmetric tridiagonal matrix requires only 0(n) ops. then Q in H = QR. Construct an example to show that for an arbitrary upper Hessenberg matrix H . 29.2 2 28. if R has nonzero diagonal entries. Prove that if R is upper triangular and Q is upper Hessenberg.PROBLEMS ON SECTIONS 8.9 AND 8. 27. 25. However.

C: B . . Conclude nally 0 from the Implicit Q Theorem that the Hessenberg matrix H1 is essentially the same as H 0: The steps (a) to (c) constitute one step of the implicit single-shift QR iteration. In analogy with QR iteration algorithm.1 . (b) Denote the rst column of H . . Show that the matrix ~ Q = P J : : : Jn n. I by h1 = (h11 . C B . I . C B C B . . . Set A = A1 2) Compute Ak = Lk Rk Ak+1 = Rk Lk k = 1 2 : : : Why is this algorithm not to be preferred over the QR iteration algorithm? 588 . except possibly for the sign. ) 1 32 43 1 32 1 is upper Hessenberg. . Apply one step of the implicit single-shift QR iteration to the symmetric tridiagonal matrix 0 2 . making the necessary assumptions. Construct one step of the explicit double-shift QR iteration (real arithmetic) and one step of the implicit double-shift QR iteration for the matrix 01 2 3 41 B3 4 5 6C B C C A=B B B0 1 0 1C C @ A 0 0 . LR Iteration. )T H 0(J : : : Jn n. Show that the rst column of P0 is the same as the rst column of Q.1 1 0 B . and therefore contains only two nonzero entries. h21 0 : : : 0)T : Find a Givens rotation P0 such that P0 h1 is a multiple of e1 .1 2 30. (c) Form H 0 = P0T HP0 .. develop a LR iteration algorithm.(a) Prove that the rst column of Q is a multiple of the rst column of H .. .1 C @ A 0 . 31. based on LU decomposition of A. Find Givens rotations J32 J43 : : : Jn n.2 2 and show that the nal Hessenberg matrices are (essentially) the same. 0 32 1 has the same rst column as P0 and hence the same rst column as Q.1 such that H 0 = (J J : : : Jn n. .

33.1 0 0 1 B . 36. then at least (k . (a) Prove that the eigenvalues of an unreduced real symmetric tridiagonal matrix are real and distinct.1 2 . Show . Prove the following: Let H = H0 .11 AND 8. 37.6). 1) subdiagonal entries of T must be zero.12 35.1 2 . Considering the structures of the matrices Pi i = 0 1 : : : n . Apply Sturm Sequence -Bisection Algorithm (Algorithm 8. Generate the sequence fHk g: Hk . (b) Prove that if is an eigenvalue of multiplicity k of an unreduced symmetric tridiagonal matrix T . (Section 6.1 0 C B C C A=B B B 0 . (c) Compute the eigenvalues of A by applying the symmetric QR iteration with the Wilkinsonshift. k kI = Qk Rk Hk l = Rk Qk + k I: + Then n (H .1 2 Without computing the eigenvalues show that j j < 4 for each eigenvalue of A. (a) Let that there are exactly two eigenvalues greater than 2 and two less than 2.1 C : C @ A 0 0 .4. (a) Develop a QR-type algorithm to compute the eigenvalues of a symmetric positive de nite matrix A. Show that the matrices Hs and Hs+2 in the double-shift QR iteration have the same eigenvalues.11.32. (b) Apply the inverse Hessenberg iteration algorithm to A to compute the eigenvector associated with the eigenvalue close to 2.1). based upon the Cholesky decomposition. 34. 2 in the implicit double-shift QR iteration step (Exercise #29). I ) = (Q i 1 i=1 : : : Qn)(Rn : : : R ): 1 PROBLEMS ON SECTIONS 8. show that it requires only 4n2 ops to implement this step. 589 0 1 . to compute the eigenvalue close to 2.

40. Let A = A + iA be a Hermitian matrix. sign ( 2) 2 j j+ where = . Take the 50 50 symmetric tridiagonal matrix T with 2 along the diagonal and -1 along the sub diagonal and super diagonal.12.3 to T with j = 1 2 : : : 20. . Apply Algorithm 8.1 and 8. Let A= ! : Prove that the eigenvalue of A closest to is given by 2 p + = .(b) Test your algorithm with the matrix A of problem #35.12. Find pproximations to a few extreme eigenvalues of T using Theorems 8. 2 A . 38.A 39.12. 590 . Then prove that B = A A How are the eigenvalues and eigenvectors of A related to those of B ? 1 2 1 2 1 2 ! is symmetric.2.

sigma. A = The Wilkinson bidiagonal matrix of order 20.MATLAB PROGRAMS AND PROBLEMS ON CHAPTER 8 1.x0.x0. 2.n) 03 2 3 1 B C A = B 0 0:99 1 C @ A Test Data and Experiment: (a) Take the 20 20 symmetric tridiagonal matrix appearing in the buckling problem of section 8. Write a MATLAB program to compute the dominant eigenvalue of a matrix using the power method.n) (a) Modify the program power to incorporate a shift sigma lambda1] = powershift(A. 591 .x0.epsilon.n) (b) Apply power and powershift to the following matrices and compare the speed of convergence Test Data: 0 0 2:9 A = A randomly generated matrix of order 5.epsilon.epsilon. lambda1] = power(A.x0.epsilon. (b) Now compute the smallest eigenvalue in magnitude. lambdan.3. write a MATLAB program called powersmall to compute the smallest eigenvalue (in magnitude) of a matrix A lambdan = powersmall(A. Apply power to compute the dominant eigenvalue lambda1 by choosing x0 arbitrarily. (a) Using linsyspp from Chapter 6 or the MATLAB command `n'.n) (b) Using linsyspp from Chapter 6 or the MATLAB command `n'. write a MATLAB program called invitr to implement the inverse iteration algorithm x = invitr(A.sigma.2. by using (i) powershift with sigma = lambda1 (ii) powersmall with the same x0 as used to implement power.

Consult the example on population study in Section 8. housmat (from Chapter 5) and housmul (from Chapter 4). Using power.4. determine by using power. (d) Find the smallest critical load that will buckle the beam (e) Taking sigma = lambdan. invitr. chohort population model can be represented by the following system of di erence equations : p (k + 1) = p (k) + p (k) + ::: npn (k) 0 0 0 1 1 pi (k + 1) = i pi(k) i = 0 1 ::: n . invitr. what is the nal population distribution and how fast is the original distribution approaching the nal distribution. write a MATLAB program to implement the Householder de ation algorithm that computes the subdominant eigenvalue of a matrix: lambda2 = housde t(a) Test data: A single sex. 1. if there is a long term growth in the population if this is so.2. Taking 0 = 0 1 = 2 = ::: = n = 1. nd the eigenvector corresponding to the smallest eigenvalue n using invitr. 4.) (a) For each of the following matrices construct a matrix X of appropriate order which is upper triangular with all the entries equal to 1 except for a few very small diagonal 592 . and i = 1 i = 0 1 ::: n . housde t.(c) Compare the speed of (i) and (ii) in (b). (The purpose of this exercise is to study how are the eigenvalues of a matrix A are a ected by conditioning of the transforming matrix. p 170). 3. 1 +1 or in matrix form pk = Apk : +1 Here i is the birth rate of the ith age group and i is the rate of survival of that group (see Luenberger (1979).

(b) Repeat part (a) by taking X as a Householder matrix of appropriate order. (d) Perturb the (n. (The purpose of this exercise is to study the sensitivities of the eigenvalues of 00 0 2 1 01 1 B C B A = B 1 0 . (c) Compare the results of (a) and (b). Perform the following on each of the matrices in the test-data: (a) Using the MATLAB command V D] = eig(A). 5. Then nd the matrix of left eigenvectors W as follows W = (inv(V))' /norm(inv(V)') (b) Compute si = wiT vi i = 1 ::: n: where Wi and Vi are the ith columns of W and V . (e) Make the following table for each matrix. Then compute the eigenvalues of A and those of (X ).10. 1 C 1 C A 0 1 4 0 0 0:9999 A = The Wilkinson bidiagonal matrix of order 20.1)th entry of A by = 10.entries. 2) MATLAB command eig. (b) Compare your results of (1) and (2) for each matrix. 593 . 6. Then compute the eigenvalues ^ i i = 1::: n of the perturbed matrix using the MATLAB command eig.5 C A = B 0 :19 @ A @ 1 some well-known matrices with ill conditioned eigenvalues). (a) Compute the eigenvalues of the following matrices using: 1) MATLAB commands poly and roots. nd the eigenvalues and the matrix of right eigenvectors.1AX using MATLAB commands eig and inv : 01 1 0 01 00 0 2 1 B0 1 0 0C B C C A = B 1 0 . i = 1 2 ::: n.5 C B C A=B B @ A B 0 0 0:9 1 C C @ A 0 1 4 0 0 0 1 A = The Wilkinson bidiagonal matrix of order 20. (c) Compute ci = the condition number of the ith eigenvalue = 1=si.

V D] = eig(A). Test-Data: (1) A = The Wilkinson bidiagonal matrix of order 20 (2) A = The transpose of the Wilkinson Bidiagonal matrix of order 20. where A is the matrix obtained from A by perturbing (n 1)th entry by = 10. ^ ij Cond (V) ci (f) Write your conclusions. Study the sensitivites of the eigenvectors of the following matrices by actually computing the the eigenvectors of the original and perturbed matrices using the MATLAB command ^ ^ V D] = eig(A).5.i ^i j i . (3) 0 12 11 10 3 2 11 C B 11 11 10 B 3 2 1C C B 0 0 0 1 1 7. B B A=B B B B B B @ C C C C C C C 2 2 1C A 594 .

Test your program with a randomly generated matrix with di erent values of k. hprime = de at(H k). Using Qritrdsi and de at. 9. 1. Test: Generate randomly a 20 20 Hessenberg matrix and make the following table using rsf. (b) Modify the program now to implement the Hessenberg QR iteration A] = qritrh(A num) num is the number of iterations (c) Compare the speed of the programs by actually computing the op-count and elapsed time . 11. Write a MATLAB program to de ate the last k rows and k columns of a Hessenberg matrix. for k = 2 hprime will be of order n . and so on. hprime will be of order n . write a MATLAB program to determine the Real Schur form of a Hessenberg matrix A h] = rsf(h eps) where eps is the tolerance. 595 . 10. 0 0 0 C 0 C C C 2 C A 0 0:0005 1 8. Note that for k = 1.01 0 0 B 0 :999 0 B A=B B B0 0 0 @ 1) A = diag(1 :9999 1 :9999 1) 2) A randomly generated matrix of order 5. Write a MATLAB program called qritrb to implement the basic QR iteration using givqr from Chapter 5 : (a) A] = qritrb(A num) num is the maximum number of iterations. 2. (a) Write a MATLAB program to compute one step of the explicit double shift QR iteration: A] = qritrdse(A) (b) Write a MATLAB program to compute one step of the implicit QR iteration with double shift: A] = qritrdsi(A). (c) Compare your results of (a) and (b) and conclude that they are essentially the same.

11.1: number] = signagree (T meu). qritrh. called lanczossym. invitr. are in MATCOM.3. (To generate a symmetric matrix B. called polysymtri to compute the characteristic polynomial pn( ) of an unreduced symmetric tridiagonal matrix T .1: lambda] = bisection (T m n). based on Theorem 8. Write a MATLAB program. and then compare your results with those obtained by using eig(T). to implement the reformulated symmetric Lanczos algorithm of Section 8.11. with n = 20. Compute n. 13. then take B = A + AT ). (c) Using polysmtri and signagree. But it is a good idea to write your own codes). (a) Write a MATLAB program. write a MATLAB program called signagree that nds the number of eigenvalues of T greater than a given real number . using bisection. (Note : The programs power.11.11. implement the Bisection algorithm of Section 8.1 valpoly] = polysymtri(A lambda): (b) Using polysymtri. qritrdsi. based on the recursion in Section 8. Test Data: A = The dymmetric tridiagonal matrix arising in Buckling Problem in Section 8. qritrb.2.m+1 for m = 1 2 3 ::: .Iteration h21 h32 h43 h54 h20 1 TABLE 12. 596 . Using lanczossym nd the rst ten eigenvalues of a randomly generated symmetric matrix of order 50. etc. generate A rst.3.

5. THE GENERALIZED EIGENVALUE PROBLEM 9.9.12 Suggestions for Further Reading : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 655 .9.8 The Quadratic Eigenvalue Problem : : : : : : : : : : : : : : : : : : : : : : : : : : : : 643 9.6.3.2 Generalized Schur Decomposition : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 599 9.2 A Case Study on the Vibration of a Building : : : : : : : : : : : : : : : : : : 622 9.9.1 Introduction : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 597 9.7.9.6.1 The QZ Method for the Symmetric-De nite Pencil : : : : : : : : : : : : : : : 614 9.6.3.1 The Sturm Sequence Method for Tridiagonal A and B : : : : : : : : : : : : : 646 9.7.3 Estimating the Generalized Eigenvalues : : : : : : : : : : : : : : : : : : : : : 649 9.3 The QZ algorithm : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 601 9.7.5 The Symmetric-De nite Generalized Eigenvalue Problem : : : : : : : : : : : : : : : 612 9.1 Decoupling of a Second-Order System : : : : : : : : : : : : : : : : : : : : : : 630 9. B : : : : : : : : : : : : : : : : : 647 9.11 Review and Summary : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 652 9.5.10 The Lanczos Method for Generalized Eigenvalues in an Interval : : : : : : : : : : : : 651 9.4 Computations of the Generalized Eigenvectors : : : : : : : : : : : : : : : : : : : : : 611 9.3 A Case Study on the Potential Damage of a Building Due to an Earthquake : 639 9.6 Symmetric-De nite Generalized Eigenvalue Problems Arising in Vibrations of Structures : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 619 9.2 The Lanczos Algorithm for the Pencil A .2 Reduction to the Generalized Schur Form : : : : : : : : : : : : : : : : : : : : 604 9.9 The Generalized Symmetric Eigenvalue Problems for Large and Structured Matrices 646 9.1 A Case Study on the Vibration of a Spring-Mass System : : : : : : : : : : : : 619 9.2 The Cholesky-QR Algorithm : : : : : : : : : : : : : : : : : : : : : : : : : : : 614 9.1 Reduction to Hessenberg-Triangular Form : : : : : : : : : : : : : : : : : : : : 602 9.3 Diagonalization of the Symmetric-De nite Pencil : : : : : : : : : : : : : : : : 616 9.3 Forced Harmonic Vibration : : : : : : : : : : : : : : : : : : : : : : : : : : : : 625 9.7 Applications to Decoupling and Model Reduction : : : : : : : : : : : : : : : : : : : : 629 9.2 The Reduction of a Large Model : : : : : : : : : : : : : : : : : : : : : : : : : 637 9.5.

CHAPTER 9
THE GENERALIZED EIGENVALUE PROBLEM

9.4. reduction (Section 9. Engineering vibration problems giving rise to generalized symmetric de nite eigenvalue problems (Section 9.9).4.4. The Cholesky algorithm for solving a symmetric positive de nite system (Section 6.4. Applications of simultaneous diagonalization techniques to decoupling and model The Lanczos algorithm for large and sparse symmetric de nite problems (Section The Lanczos algorithm for generalized eigenvalues in an interval (Section 9. 9.5). Background Material Needed for this Chapter We need the following major tools developed earlier to understand the material in this chapter.5).9.2. Section 8. 1. THE GENERALIZED EIGENVALUE PROBLEM Objectives The objectives of this chapter is to study engineering applications and numerical methods for the generalized eigenvalue problem: Ax = Bx. 5.1.3).1). The Cholesky-QR algorithm and simultaneous diagonalization techniques for a symmetric de nite pencil (Section 9.6).10). 2. 4. 5. 3. The Householder and Givens methods to create zeros in a vector and the corresponding QR factorization algorithms (Sections 5.5.5). 5. 5. The Inverse iteration algorithm (Chapter 8.7). Section 8.1 Introduction In this chapter we consider the following eigenvalue problems known as the generalized eigenvalue problem. Some of the highlights of this chapter are: The QZ algorithm for generalized Schur form (Section 9.5.7). The Rayleigh quotient iteration algorithm (Chapter 8. 597 .

" (Parlett.1 In the problem AX = BX . If B is singular. B ) = 0: De nition 9. the term came to be used for any one parameter family of curves. It is easy to see that is a root of the characteristic equation det(A .2 The matrix A . nd scalars and nonzero vectors x such that Ax = Bx: Note that the standard eigenvalue problem for the matrix A is a special case of this problem (take B = I ). The Symmetric Eigenvalue Problem (1980). Many engineering applications give rise to generalized eigenvalue problems. by a natural extension. De nition 9. the ma- jority of eigenvalue problems arising from real physical systems are generalized eigenvalue problems. p. A Note on the Use of the Word `Pencil' \The rather strange use of the word \pencil" comes from optics and geometry. B is called a matrix pencil. matrices. 302). In fact. a generalized eigenvalue is also an eigenvalue of B .1. For example. in that case the corresponding generalized eigenvalue is 1.1 A. then the pencil is called a regular pencil. spaces. the eigenproblems for vibrations of structures such as buildings 598 . is called a generalized eigenvalue and the vector x is a generalized eigenvector associated with .Statement of the Generalized Eigenvalue Problem Given n n matrices A and B .1. Note that when B is nonsingular. or other mathematical objects. If B is nonsingular. then zero is an eigenvalue of B. An aggregate of (light) rays converging to a point does suggest the sharp end of a pencil and.

Finally. can be extracted once the matrices A and B are reduced to triangular forms.1.5. Theorem 9.7 are dedicated to the study of the symmetric de nite generalized eigenvalue problems.6.3 If A and B are real symmetric matrices and furthermore. De nition 9. we present the Sturm sequence and the Lanczos methods for large and sparse generalized eigenvalue problems. and 9.and bridges are the so-called symmetric de nite generalized eigenvalue problems for the mass and sti ness matrices: Kx = Mx. the simultaneous diagonalization algorithm is presented and several engineering applications of this technique are discussed. while Theorem 9.2 Generalized Schur Decomposition As a generalization of the procedure for the standard eigenvalue problem. In Section 9. Several case studies on the problems arising in vibration of structures are presented.3 we describe the QZ algorithm for the generalized eigenvalue problem. Sections 9. then the generalized eigenvalue problem Ax = Bx is called the symmetric de nite generalized eigenvalue problem. it is natural to think of solving the generalized eigenvalue problem by simultaneously reducing the matrices A and B to some convenient forms from which generalized eigenvalues can be easily extracted.2. and eigenvectors.1. in Section 9. where M is the mass matrix and K is the sti ness matrix.4 we show how to compute a generalized eigenvector when a reasonable approximation of a generalized eigenvalue is known using the inverse iteration. M is symmetric positive de nite. In Section 9. It is a natural generalization of the QR iteration algorithm described in Chapter 8. A popular algorithm widely used in engineering practice|namely. M and K are usually real symmetric. illustrates the extraction of the generalized eigenvalues once the matrices A and B have been reduced to such forms. if B is positive de nite.3.1 shows the existence of such convenient forms. and furthermore.8.2 we present a result that shows how the generalized eigenvalues. 599 . 9. 9. This chapter is devoted to the study of the generalized eigenvalue problem with particular attention to the symmetric de nite problem. The chapter is organized in the following manner. In Section 9.

Theorem 9.2.1 Let A and B be n x n matrices, and let B be nonsingular. Let U_1 and U_2 be unitary matrices such that

    U_1 A U_2 = T_1 = \begin{pmatrix} t_{11} & \cdots & * \\ & \ddots & \vdots \\ 0 & & t_{nn} \end{pmatrix},
    U_1 B U_2 = T_2 = \begin{pmatrix} t'_{11} & \cdots & * \\ & \ddots & \vdots \\ 0 & & t'_{nn} \end{pmatrix}

are upper triangular. Then the generalized eigenvalues \lambda_i, i = 1, ..., n, of the regular pencil det(A - \lambda B) = 0 are given by

    \lambda_i = t_{ii} / t'_{ii},   i = 1, ..., n.

Proof. From Ax = \lambda Bx, we have

    U_1 A U_2 U_2^* x = \lambda U_1 B U_2 U_2^* x.

Define y = U_2^* x. Then from the above we have

    U_1 A U_2 y = \lambda U_1 B U_2 y,

that is,

    T_1 y = \lambda T_2 y.

Since B is nonsingular, so is T_2. Thus we have

    T_2^{-1} T_1 y = \lambda y,

which shows that \lambda is an eigenvalue of T_2^{-1} T_1. Since T_1 and T_2^{-1} are both upper triangular matrices, so is their product T_2^{-1} T_1. Thus the eigenvalues of T_2^{-1} T_1 are t_{ii} / t'_{ii}, i = 1, ..., n.

Remarks:

1. Note that y = U_2^* x, so that x = U_2 y is an eigenvector associated with \lambda.

2. If A and B are real matrices, then U_1 and U_2 can be chosen such that T_1 = U_1 A U_2 is in Real Schur Form.
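Such a pair of triangular forms can be computed in MATLAB with the built-in qz function, so the theorem is easy to check numerically. The sketch below is our own illustration (it uses the matrices that appear in Example 9.3.1 later in this section); the generalized eigenvalues obtained as ratios of diagonal entries are compared with those returned by eig(A,B).

% Illustration (ours) of Theorem 9.2.1 using MATLAB's qz function.
A = [1 2 3; 1 3 4; 1 3 3];
B = [1 1 1; 0 1 2; 0 0 2];       % B is nonsingular, so the pencil is regular

% Complex QZ: T1 and T2 are upper triangular, Q and Z unitary, and
% (in MATLAB's convention) Q*A*Z = T1, Q*B*Z = T2.
[T1, T2, Q, Z] = qz(A, B, 'complex');
fprintf('||Q*A*Z - T1|| = %.1e,  ||Q*B*Z - T2|| = %.1e\n', ...
        norm(Q*A*Z - T1), norm(Q*B*Z - T2));

lambda_ratio = diag(T1) ./ diag(T2);   % generalized eigenvalues t_ii / t'_ii
lambda_eig   = eig(A, B);              % computed directly, for comparison

% For this particular pencil the generalized eigenvalues are real, so we
% can sort both sets and compare them entry by entry; the two columns
% should agree up to roundoff.
disp([sort(real(lambda_ratio)), sort(lambda_eig)])

% A generalized eigenvector for lambda_i can be recovered as x = Z*y,
% where y is an eigenvector of the triangular pencil T1 - lambda_i*T2.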

Cleve Moler is the developer of the popular software package MATLAB. Then the basic idea is to apply the QR iteration algorithm to the matrix C = B .9. Stewart. The simultaneous reduction of A and B to triangular forms by equivalence is guaranteed by the following theorem: Theorem 9. without explicitly forming the matrix C .1 .1 = B (B .1 A are the same as those of AB . He was also one of the developers of another popular mathematical software \LINPACK". In this case the elements of C will be much larger than those of A and B .1 A (or to AB . then the next best alternative.Assume that B is nonsingular. 601 . because AB . B0 = an upper triangular matrix. We shall now give a constructive proof of this theorem. If AB . known as the QZ iteration algorithm for computing the generalized eigenvalues. The pair (A0 B 0 ) is said to be in generalized Real Schur Form. developed by Cleve Moler and G.3 The QZ algorithm We will describe an analog of the QR iteration. Inc. He is currently with MathWorks. W. by simultaneous orthogonal equivalence: Stage I. they are similar. A and B are reduced to an upper Hessenberg and an upper triangular matrix. The reduction to the generalized Real Schur Form is achieved in two stages. of course. Thus. respecA B QT AZ : an upper Hessenberg matrix QT BZ : an upper triangular matrix Dr.1 or B . because if B is nearly singular.3.1 .1 A)B . Note that the eigenvalues of B. tively. there exist orthogonal matrices Q and Z such that QT AZ is an upper Real Schur matrix and QT BZ is upper triangular: A0 = QT AZ : B0 = QT BZ : A0 = an upper Real Schur matrix. and the eigenvalues of C will be computed inaccurately. is to transform A and B to some reduced forms and then extract the generalized eigenvalues i from these reduced forms.1 ). then it is not desirable to form B .1 A is not to be computed explicitly. He was a professor and head of the Computer Science Department at the University of New Mexico.1 .1 (The Generalized Real Schur Decomposition) Given two n n real matrices A and B .

1 nA = B .. we have A UTA 0 B B B TA = B . B.1 n in the (n . C . Reduce A to Hessenberg form while preserving the triangular structure of B ...C B. the (n 1)th entry of A is made zero by applying a rotation Qn..1 Reduction to Hessenberg-Triangular Form Let A and B be two n n matrices. TB = B 0 0 C: B U C B.. We will now brie y sketch these two stages in the sequel. The Hessenberg-Triangular pair (A B) is reduced further to the generalized Real Schur Form by applying implicit QR iteration to AB. Then 1. B .1 . . 1 n) plane: 0 B B B B A Qn. A U B B B @ 1 C C C C C C C C A 1 0 C B0 C B C B C B . A will be full). This process is known as the QZ Algorithm. . .A @ 0 0 0 0 First.Stage II.3. B B B @ 602 0 1 C C C C C: C C C A . 9.. 2. Find an orthogonal matrix U such that B U T B : upper triangular: Form (in general.. This step is achieved as follows: To start with.

while B remains upper triangular: 1 C C C C C .C .This transformation. Fortunately. when applied to the right of A.. B. C . B. .1 nB = B 0 0 B. B B B0 @ 0 0 0 1 C C C C C C C C A A 0 The entries (n .. . B B. . 1) entry of B zero. .. when applied to B to the left.. we have B 0 B0 B B B B Zn.. will give a ll-in in the (n n . . . C C A ... . @ 0 0 0 1 C C C C: C C . The process is continued until the matrix A is an upper Hessenberg matrix and B is upper triangular. B A=B B.. At the end of the rst step... . the matrix A is Hessenberg in its rst column. . this rotation. C: .C C . B. . . B .1 n = B .. . followed by another appropriate rotation to the right of B to recover the undesirable ll-in in B .. .. B. .. each time applying an appropriate rotation to the left of A.C .. .C C C C A 0 0 0 The zeros are now produced on the second column of A in the appropriate places while retaining the triangular structure of B in the analogous manner. ..A The rotation Zn. B. 603 0 1 0 B C B B C B B C B B C B0 C B = B 0 .. 1 n ) is now applied to the right of B to make the (n n . 1) position: 0 B0 B B B B Qn. . does not destroy the zero produced earlier. 2 1) : : : (3 1) of A are now successively made zero. B.1 n = B 0 0 B. @ 0 B B B B B AZn.C @ A @ 1 C C . . 1 1) (n .C ..1 n = J (n . C C B .. C: ... Schematically.

Update A: A is in upper Hessenberg and B is in upper triangular form.3.2 1 B C A A(1) Z23 = Q23AZ23 = B 1:4142 4:9497 . analogous to an implicit QR step. Form Q23 to make a31 zero: 01 0 1 0 B C Q23 = B 0 :7071 :7071 C @ A 0 . Thus a QZ step.:7071 :7071 0 1 1 2 3 B C A A(1) = Q23A = B 1:4142 4:2426 4:9497 C : @ A 0 0 . we have A and B as an upper Hessenberg and an upper triangular matrix. Update B : 01 1 1 1 B C B B (1) = Q23B = B 0 0:7071 2:8284 C @ A 0 .3. 0 1 3 .0:7071 C : @ A 0 0 0:7071 4.0:7071 0 3. The basic idea now is to apply an implicit QR step to AB.1 without ever completely forming this matrix explicitly. obtained from stage 1.4:2426 C : @ A 0 .1 01 2 31 B C A = B1 3 4C @ A 1 3 3 01 1 11 B C B = B0 1 2C: @ A 0 0 2 1.1 C @ A 0 1 0 01 1 .0:7071 2.1 1 B C B B(1) Z23 = Q23 BZ23 = B 0 2:8284 .0:7071 0 9.2 Reduction to the Generalized Schur Form At the beginning of this process. will be as follows: 604 . Form Z23 to make b32 zero: 01 0 0 1 B C Z23 = B 0 0 . We can assume without loss of generality that the matrix A is an unreduced upper Hessenberg matrix.Example 9. respectively.

1. Compute the first column of N = (C - \sigma_1 I)(C - \sigma_2 I), where C = AB^{-1} and \sigma_1, \sigma_2 are suitably chosen shifts.

2. Find a Householder matrix Q_1 such that Q_1 N e_1 is a multiple of e_1.

3. Form Q_1 A and Q_1 B.

4. Transform simultaneously Q_1 A to an upper Hessenberg matrix A_1 and Q_1 B to an upper triangular matrix B_1:

    A_1 = Q_0 (Q_1 A) Z : an upper Hessenberg matrix,
    B_1 = Q_0 (Q_1 B) Z : an upper triangular matrix.

The above four steps constitute one single step of the QZ algorithm. Using the Implicit Q Theorem we can show (Exercise) that the matrix A_1 B_1^{-1} is essentially the same as the matrix we would have obtained by applying an implicit QR step directly to AB^{-1}. Application of a few QZ steps sequentially will then yield a quasi-triangular matrix R = Q^T A Z and an upper triangular matrix T = Q^T B Z, from which the generalized eigenvalues can be easily computed. We now show how to implement these steps.

Computation of the first column of (C - \sigma_1 I)(C - \sigma_2 I)

The real bottleneck in implementing the whole algorithm is in computing the first column of N = (C - \sigma_1 I)(C - \sigma_2 I), where C = AB^{-1}. Fortunately, this can be done without forming C = AB^{-1} explicitly. First, note that because A is upper Hessenberg and B is upper triangular, this first column contains at most three nonzero entries, in the first three places:

    n_1 = N e_1 = (C - \sigma_1 I)(C - \sigma_2 I) e_1 = (x, y, z, 0, ..., 0)^T.

To compute x, y, and z, all we need to know is the first two columns of C, and these can be obtained just by inverting the 2 x 2 leading principal submatrix of B; the whole of B^{-1} does not need to be computed. If c_1 and c_2 are the first two columns of C = AB^{-1}, then

    (c_1, c_2) = (a_1, a_2) \begin{pmatrix} b_{11} & b_{12} \\ 0 & b_{22} \end{pmatrix}^{-1},

where a_i, i = 1, 2, are the first two columns of A. Let

    c_1 = (c_{11}, c_{21}, 0, ..., 0)^T,   c_2 = (c_{12}, c_{22}, c_{32}, 0, ..., 0)^T;

note that c_1 has at most two nonzero entries and c_2 has at most three. Then it is easy to see that

    \begin{pmatrix} x \\ y \\ z \end{pmatrix} =
    \begin{pmatrix} (c_{11} - \sigma_1)(c_{11} - \sigma_2) + c_{12} c_{21} \\
                    c_{21}(c_{11} - \sigma_2) + c_{21}(c_{22} - \sigma_1) \\
                    c_{21} c_{32} \end{pmatrix}.

Example 9.3.2

    A = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & 4 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix},
    B = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 3 \end{pmatrix}.

The 2 x 2 leading principal submatrix of B is \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, which gives

    c_1 = (1, 2, 0, 0)^T,   c_2 = (-1, -3, 1, 0)^T.

Choose \sigma_1 = 1 and \sigma_2 = 1; then

    x = -2,   y = -8,   z = 2.

Computation of A_1 and B_1

Since the first column n_1 of N = (C - \sigma_1 I)(C - \sigma_2 I) has at most three nonzero entries, the Householder matrix Q_1 that transforms n_1 to a multiple of e_1 has the form

    Q_1 = \begin{pmatrix} \hat{Q}_1 & 0 \\ 0 & I_{n-3} \end{pmatrix},

Computation of A1 and B1

Since the first column n1 of N = (C - α1 I)(C - α2 I) has at most three nonzero entries, the Householder matrix Q1 that transforms n1 to a multiple of e1 has the form

Q1 = [ Q̂1   0
        0   I_{n-3} ],

where Q̂1 is a 3 x 3 Householder matrix.

Form Q1 A and Q1 B. Because Q1 mixes only the first three rows, Q1 A is upper Hessenberg except for an unwanted nonzero in the (3,1) position, and Q1 B is upper triangular except for unwanted nonzeros in the (2,1), (3,1) and (3,2) positions. That is, both the Hessenberg form of A and the triangular form of B are now lost. The job now will be to cleverly chase the unwanted nonzero entries and make them zero using orthogonal transformations. To do this, Householder matrices Z1 and Z2 are first constructed to make B triangular again, that is, to make the nonzero entries b31, b32 and b21 zero.

After applying Z1 and Z2 on the right, A Z1 Z2 is upper Hessenberg except for unwanted nonzero entries in the (3,1), (4,1) and (4,2) positions, while B has been restored to upper triangular form. Next, a Householder matrix Q2 is created to introduce zeros in the (3,1) and (4,1) positions of A; B is then updated. In Q2 A and Q2 B the unwanted nonzero entries have been pushed one position toward the bottom right, into submatrices A' and B' that have the same structures as A and B to start with. The process is now repeated on A' and B'. The continuation of this process will finally yield A1 and B1 in the desired forms.

In view of the above discussion, let's now summarize one step of the QZ iteration.

Algorithm 9.3.1  One Step of the QZ Algorithm

Given A, an unreduced upper Hessenberg matrix, and B, an upper triangular matrix, the following steps constitute one step of the QZ iteration; that is, by taking advantage of the special structures of the matrices, these steps construct orthogonal matrices Q and Z such that QAZ is upper Hessenberg and QBZ is upper triangular.

1. Choose the shifts α1 and α2.
2. Compute the first column of N = (C - α1 I)(C - α2 I), where C = AB^{-1}, without explicitly forming B^{-1}:
   Let (c1 c2) be the first two columns of C. Then
   (c1  c2) = (a1  a2) [ b11  b12 ; 0  b22 ]^{-1}.
   The three nonzero entries of the first column of N are given by
   x = (c11 - α1)(c11 - α2) + c12 c21,
   y = c21(c11 - α2) + c21(c22 - α1),
   z = c21 c32,
   so that the first column of N is n1 = (x, y, z, 0, ..., 0)^T.
3. Find a Householder matrix Q1 such that Q1 n1 = (*, 0, ..., 0)^T.
4. Form Q1 A and Q1 B.
5. Transform Q1 A and Q1 B to an upper Hessenberg matrix A1 and a triangular matrix B1 by orthogonal equivalence in the way shown above, creating orthogonal matrices Q2 through Q_{n-2} and Z1 through Z_{n-2}, respectively.

Obtaining the Transforming Matrices

The transforming matrices Q and Z are obtained as follows:
The matrix Q:  Q = Q1 Q2 Q3 ... Q_{n-2}.  Note that Q has the same first row as Q1.
The matrix Z:  Z = Z1 Z2 ... Z_{n-2}.

Choosing the Shifts

The double shifts α1 and α2 at a QZ step can be taken as the eigenvalues of the lower 2 x 2 submatrix of C = AB^{-1}. The 2 x 2 lower submatrix of C can be computed without explicitly forming B^{-1}.

Flop Count:  One QZ step requires 13n^2 flops. If Q and Z are to be accumulated, then an additional 5n^2 and 8n^2 flops, respectively, will be required.

Algorithm 9.3.2  The QZ Algorithm

Given A and B, n x n real matrices:

I. Transform (A, B) to a Hessenberg-triangular pair by orthogonal equivalence:
   A ← Q^T A Z : an upper Hessenberg,
   B ← Q^T B Z : an upper triangular.
II. Apply a sequence of QZ steps, with properly chosen shifts, to the Hessenberg-triangular pair (A, B) to produce {A_k} and {B_k}.
III. Watch the convergence of the sequences {A_k} and {B_k}:
   {A_k} → R, quasi-triangular;
   {B_k} → T, triangular.

Flop Count:  The implementation of I to III requires about 15n^3 flops. The formation of Q and Z, if required, needs, respectively, another 8n^3 and 10n^3 flops (from experience it is known that about two QZ steps per eigenvalue are adequate).

Round-off Properties:  The QZ iteration algorithm is as stable as the QR iteration algorithm. It can be shown that the computed R̂ and Ŝ satisfy

Q0^T (A + E) Z0 = R̂,    Q0^T (B + F) Z0 = Ŝ,

where Q0 and Z0 are orthogonal, ‖E‖ ≈ μ‖A‖ and ‖F‖ ≈ μ‖B‖, and μ is the machine precision.
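In practice the QZ iteration is rarely coded by hand; LAPACK-based library routines implement the steps of Algorithm 9.3.2. A minimal sketch (my illustration, assuming NumPy and SciPy; scipy.linalg.qz computes the generalized real Schur form):

```python
import numpy as np
from scipy.linalg import qz, eig

A = np.array([[1., 2., 3.],
              [1., 3., 4.],
              [1., 3., 3.]])
B = np.array([[1., 1., 1.],
              [0., 1., 2.],
              [0., 0., 2.]])

# Generalized real Schur form: R = Q^T A Z (quasi-triangular), T = Q^T B Z (triangular)
R, T, Q, Z = qz(A, B, output='real')

# Where R has 1x1 diagonal blocks the generalized eigenvalues are R[i,i] / T[i,i];
# scipy.linalg.eig applied to the pencil serves as an independent check.
print(np.diag(R) / np.diag(T))
print(eig(A, B, right=False))
```

Note that for complex-conjugate eigenvalue pairs R contains 2 x 2 diagonal blocks, and the element-wise ratio of diagonals no longer gives those eigenvalues directly; that is why the pencil eigenvalues are printed as a cross-check.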

9.4 Computations of the Generalized Eigenvectors

Once an approximate generalized eigenvalue σ is computed, the corresponding generalized eigenvector v can be computed using the inverse iteration, as before.

Algorithm 9.4.1  Computations of the Generalized Eigenvectors

1. Choose an initial eigenvector v0.
2. For k = 1, 2, ... do
   Solve (A - σB) v̂k = B v_{k-1},
   vk = v̂k / ‖v̂k‖.
3. Stop if ‖vk - v_{k-1}‖ / ‖vk‖ ≤ ε, for a prescribed small ε.

A Remark on Solving (A - σB) v̂k = B v_{k-1}

In solving (A - σB) v̂k = B v_{k-1}, substantial savings can be made by exploiting the Hessenberg-triangular structure to which the pair (A, B) is reduced as a part of the QZ algorithm. Note that for a given σ, the matrix A - σB is then a Hessenberg matrix. Thus, at each iteration only a Hessenberg system needs to be solved.
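A minimal sketch of this inverse iteration (my illustration, assuming NumPy; a dense solve stands in for the Hessenberg solver recommended above, and the matrices are those of Example 9.4.1 below):

```python
import numpy as np

def generalized_inverse_iteration(A, B, sigma, v0, tol=1e-10, maxit=20):
    """Inverse iteration for the pencil (A, B) at an approximate eigenvalue sigma."""
    v = v0 / np.linalg.norm(v0)
    for _ in range(maxit):
        v_new = np.linalg.solve(A - sigma * B, B @ v)   # (A - sigma B) v_hat = B v_{k-1}
        v_new = v_new / np.linalg.norm(v_new)
        if np.linalg.norm(v_new - np.sign(v_new @ v) * v) < tol:
            return v_new
        v = v_new
    return v

A = 1e9 * np.array([[3., -1.5, 0.], [-1.5, 3., -1.5], [0., -1.5, 1.5]])
B = 1e3 * np.diag([2., 3., 4.])
print(generalized_inverse_iteration(A, B, sigma=1950800.0, v0=np.ones(3)))
```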

Example 9.4.1  Let

A = 10^9 [  3    -1.5   0        B = 10^3 [ 2  0  0
           -1.5   3    -1.5                 0  3  0
            0    -1.5   1.5 ],              0  0  4 ].

σ = a generalized eigenvalue of (A, B) = 1950800.

v0 = (1, 1, 1)^T.

k = 1: Solve (A - σB) v̂1 = B v0:

v̂1 = (0.0170, -0.0102, 0.0024)^T,
v1 = v̂1 / ‖v̂1‖ = (0.8507, -0.5114, 0.1217)^T.

9.5 The Symmetric-Definite Generalized Eigenvalue Problem

In this section we study the symmetric definite eigenvalue problem

Ax = λBx,

where A and B are symmetric and B is positive definite. As said before, the symmetric definite generalized eigenvalue problem arises in a wide variety of engineering applications. It is routinely solved in vibration analysis of structures.

Frequencies, Modes, and Mode Shapes

The eigenvalues are related to the natural frequencies, and the eigenvectors are referred to as mode shapes or simply as modes. "The language of modes, mode shapes, and natural frequencies form the basis for discussing vibration phenomena of complex systems. An entire industry has been formed around the concept of modes" (Inman (1994)), and "the size and sign of each element of an eigenvector determines the shape of the vibration at any instant of time."

We start with an important (but not surprising) property of the symmetric definite pencil.

Theorem 9.5.1  The symmetric-definite pencil (A - λB) has real eigenvalues and linearly independent eigenvectors.

Proof.  Since B is symmetric positive definite, it admits the Cholesky decomposition

B = LL^T.

So, from Ax = λBx we have Ax = λLL^T x, whence

L^{-1}A(L^T)^{-1} L^T x = λ L^T x,

or

Cy = λy,   where y = L^T x.

The matrix C = L^{-1}A(L^T)^{-1} is symmetric; therefore λ is real. The assertion about the eigenvectors is obvious, since a symmetric matrix has a set of n linearly independent eigenvectors.

An Interval Containing the Eigenvalues of a Symmetric Definite Pencil

The eigenvalues of the symmetric definite pencil A - λB lie in the interval [-‖B^{-1}A‖, ‖B^{-1}A‖]. (Exercise #26.)

Computational Algorithms

The QZ Method for the Symmetric-Definite Pencil

The QZ algorithm described in the previous section for the regular pencil A - λB can, of course, be applied to a symmetric-definite pencil. However, the drawback here is that both the symmetry and the definiteness of the problem will be lost in general. We describe now a specialized algorithm for a symmetric-definite pencil.

9.5.2 The Cholesky-QR Algorithm

The proof of Theorem 9.5.1 immediately reveals a computational method for finding the generalized eigenvalues of a symmetric-definite pencil:

Algorithm 9.5.1  The Cholesky-QR Algorithm for the Symmetric-Definite Pencil

1. Find the Cholesky factorization of B: B = LL^T.
2. Form C = L^{-1}A(L^T)^{-1}, by taking advantage of the symmetry of A.
3. Compute the eigenvalues λi and the eigenvectors yi, i = 1, ..., n, of the symmetric matrix C, using the symmetric QR iteration.
4. Compute the generalized eigenvectors xi by solving L^T xi = yi, i = 1, ..., n.

Stability of the Algorithm

When B is well-conditioned, there is nothing objectionable in the algorithm. However, if B is ill-conditioned or nearly singular, so will be L^{-1}, and then the matrix C cannot be computed accurately; thus the eigenvalues and the eigenvectors will be inaccurate. Specifically, it can be shown (Golub and Van Loan, MC 1984, p. 310) that a computed eigenvalue λ̃ obtained by the algorithm is an exact eigenvalue of the matrix (L^{-1}A(L^{-1})^T + E), where ‖E‖2 ≈ μ‖A‖2‖B^{-1}‖2 and μ is the machine precision. Thus, ill-conditioning of B will severely affect the computed eigenvalues, even if they are themselves well-conditioned.

As we will see in the next section, in many practical applications only a few of the smallest generalized eigenvalues are of interest. These eigenvalues can sometimes be computed reasonably accurately, even when B is ill-conditioned, by using the ordered real Schur form (RSF) of B in place of the Cholesky decomposition.

Algorithm 9.5.2  Computing the Smallest Eigenvalues with Accuracy

1. Compute the Real Schur Form of B and order the eigenvalues of B from smallest to largest:
   U^T B U = D = diag(d1, ..., dn).
2. Form L = U D^{1/2} = U diag(√d1, ..., √dn).
3. Form C = L^{-1}A(L^T)^{-1}.
4. Compute the eigenvalues of C.
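A compact sketch of Algorithm 9.5.1 (my illustration, assuming NumPy/SciPy; the matrices are those of Example 9.5.2 later in this section, and scipy.linalg.eigh(A, B) is printed as an independent reference, since LAPACK performs an equivalent reduction internally):

```python
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

def cholesky_qr(A, B):
    """Generalized eigenvalues/eigenvectors of the symmetric-definite pencil (A, B)."""
    L = cholesky(B, lower=True)                  # step 1: B = L L^T
    # step 2: C = L^{-1} A L^{-T}, using triangular solves instead of explicit inverses
    C = solve_triangular(L, solve_triangular(L, A, lower=True).T, lower=True).T
    lam, Y = eigh((C + C.T) / 2)                 # step 3: symmetric eigensolver on C
    X = solve_triangular(L.T, Y, lower=False)    # step 4: L^T x_i = y_i
    return lam, X

A = np.array([[1., 2., 3.], [2., 3., 4.], [3., 4., 5.]])
B = np.array([[10., 1., 1.], [1., 10., 1.], [1., 1., 10.]])
lam, X = cholesky_qr(A, B)
print(lam)
print(eigh(A, B, eigvals_only=True))             # LAPACK reference for comparison
```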

Example 9.0:6055 6:6055. Remark: Two smallest generalized eigenvalues have been computed by the procedure rather 9. the largest one is completely wrong. accurately. 0 0 1 1 0 B C L = UD.5.1 01 2 31 B C A = B2 3 4C @ A 3 4 5 01 0 01 B C B = B 0 :00001 0 C : @ A 0 0 1 Cond(B ) = 105: 1.1 = B 0:0063 1 3 C: @ A 3 5 5. 0 0:0003 0:0063 0:0126 1 B C C = L. The two smallest generalized eigenvalues are: 0 . = B 316:2278 0 0 C : @ A 1 2 0 0 1 3. This can be seen as follows: Let Q be an orthogonal matrix such that QT CQ = diag(c1 : : : cn): 616 . The eigenvalues of C are: 0 .0:6665. 0 :00001 0 0 1 B C U T BU = D = B 0 1 0 C @ A 0 0 1 00 1 01 B C U = B1 0 0C: @ A 0 0 1 2. However. 0:0126 4.5. This example shows how ill-conditioning of B can impair the occuracy of the computed eigenvalues.1 AT (LT ).3 Diagonalization of the Symmetric-De nite Pencil Simultaneous Diagonalization of A and B The Cholesky-QR iteration algorithm of the symmetric-de nite pencil gives us a method to nd a nonsingular matrix P that transforms A and B simultaneously to diagonal forms by congruence.

Set P = (L^{-1})^T Q. Then

P^T A P = Q^T L^{-1} A (L^{-1})^T Q = Q^T C Q = diag(c1, c2, ..., cn)

and

P^T B P = Q^T L^{-1} B (L^{-1})^T Q = Q^T L^{-1} L L^T (L^{-1})^T Q = I.

Algorithm 9.5.3  Simultaneous Diagonalization of a Symmetric Definite Pencil

Given a symmetric matrix A and a symmetric positive definite matrix B, the following algorithm computes a nonsingular matrix P such that P^T B P is the identity matrix and P^T A P is a diagonal matrix.

1. Compute the Cholesky factorization of B: B = LL^T.
2. Form C = L^{-1}A(L^T)^{-1}, by taking advantage of the symmetry of A (C is symmetric).
3. Applying the symmetric QR iteration algorithm to C, find an orthogonal matrix Q such that Q^T C Q is a diagonal matrix.
4. Form P = (L^{-1})^T Q.

Flop Count:  The algorithm for simultaneous diagonalization requires about 7n^3 flops.

Example 9.5.2  Consider

A = [ 1  2  3       B = [ 10   1   1
      2  3  4              1  10   1
      3  4  5 ],           1   1  10 ].

A is symmetric and B is symmetric positive definite.

1. The Cholesky decomposition of B = LL^T:

L = [ 3.1623  0       0
      0.3162  3.1464  0
      0.3162  0.2860  3.1334 ].

2. Form C = L^{-1}A(L^{-1})^T:

C = [ 0.1000  0.1910  0.2752
      0.1910  0.2636  0.3320
      0.2752  0.3320  0.3864 ].

3. Find an orthogonal Q such that Q^T C Q = diag(c1, ..., cn):

Q = [ 0.4220  -0.8197  -0.3873
      0.5684  -0.0936   0.8174
      0.7063   0.5651  -0.4262 ].

4. Form P = (L^{-1})^T Q:

P = [ 0.0949  -0.2726  -0.1361
      0.1601  -0.0462   0.2722
      0.2254   0.1803  -0.1361 ].

5. Verify:

P^T A P = diag(0.8179, -0.0679, 0),
P^T B P = identity = diag(1, 1, 1).

Modal Matrix

In vibration problems involving the mass and stiffness matrices, when K is the stiffness matrix and M is the mass matrix in the symmetric-definite generalized eigenvalue problem

Kx = λMx,

the matrix P that simultaneously diagonalizes M and K, that is, transforms them to the identity and a diagonal matrix, respectively, is called the modal matrix, and the columns of the matrix P are called the normal modes.

Orthogonality of the Eigenvectors

Note that if p1, ..., pn are the columns of P, then it is easy to see that

pi^T B pi = 1,  i = 1, ..., n;    pi^T B pj = 0,  i ≠ j;

and

pi^T A pi = ci,  i = 1, ..., n;    pi^T A pj = 0,  i ≠ j.
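A sketch of Algorithm 9.5.3 on the data of Example 9.5.2 (my illustration, assuming NumPy/SciPy); it forms P and checks the two congruences numerically:

```python
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

A = np.array([[1., 2., 3.], [2., 3., 4.], [3., 4., 5.]])
B = np.array([[10., 1., 1.], [1., 10., 1.], [1., 1., 10.]])

L = cholesky(B, lower=True)                                   # step 1: B = L L^T
C = solve_triangular(L, solve_triangular(L, A, lower=True).T, lower=True).T
c, Q = eigh((C + C.T) / 2)                                    # steps 2-3: Q^T C Q diagonal
P = solve_triangular(L.T, Q, lower=False)                     # step 4: P = (L^{-1})^T Q

print(np.round(P.T @ A @ P, 4))    # diagonal matrix diag(c_1, ..., c_n)
print(np.round(P.T @ B @ P, 4))    # identity matrix
```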

Generalized Rayleigh Quotient

The Rayleigh quotient defined for a symmetric matrix A in Chapter 8 can easily be generalized for the pair (A, B).

Definition 9.5.1  The number

ρ = x^T A x / (x^T B x)    (‖x‖2 = 1)

is called the generalized Rayleigh quotient.

It can be used to compute approximations to generalized eigenvalues λk and eigenvectors xk for the symmetric definite generalized eigenvalue problem, as shown in the following algorithm.

Algorithm 9.5.4  Generalized Rayleigh Quotient Iteration

Choose x0 such that ‖x0‖ = 1. For k = 0, 1, 2, ... do until convergence occurs:
1. Compute σk = xk^T A xk / (xk^T B xk).
2. Solve for x̂_{k+1}:  (A - σk B) x̂_{k+1} = B xk.
3. Normalize x̂_{k+1}:  x_{k+1} = x̂_{k+1} / ‖x̂_{k+1}‖2.

9.6 Symmetric-Definite Generalized Eigenvalue Problems Arising in Vibrations of Structures

As we stated earlier, almost all eigenvalue problems arising in vibrations of structures are symmetric-definite generalized eigenvalue problems. We will now give some specific examples.

9.6.1 A Case Study on the Vibration of a Spring-Mass System

Consider the system of three spring masses shown in Figure 9.1, with spring constants k1, k2 and k3. We would like to study the configuration of the system corresponding to the different modes of vibration.

Figure 9.1  [A spring-mass system with three degrees of freedom: masses m1, m2, m3 with displacements y1, y2, y3, connected by springs with constants k1, k2, k3.]

1. Formulate the problem as a Symmetric Definite Generalized Eigenvalue Problem

Let the displacements of the masses from their static equilibrium positions be defined by the generalized coordinates y1, y2 and y3, respectively. The stiffness matrices are very often determined using physical characteristics of the systems. The generalized eigenvalues and eigenvectors are then used to study the behavior of the system. The equations of motion for the system can be written as:

m1 ÿ1 + (k1 + k2) y1 - k2 y2 = 0,

m2 ÿ2 - k2 y1 + (k2 + k3) y2 - k3 y3 = 0,
m3 ÿ3 - k3 y2 + k3 y3 = 0,

or, in matrix form,

M ÿ + K y = 0,

that is,

[ m1  0   0  ] [ ÿ1 ]   [ k1+k2  -k2     0   ] [ y1 ]   [ 0 ]
[ 0   m2  0  ] [ ÿ2 ] + [ -k2     k2+k3  -k3 ] [ y2 ] = [ 0 ],
[ 0   0   m3 ] [ ÿ3 ]   [ 0      -k3      k3 ] [ y3 ]   [ 0 ]

where M = diag(m1, m2, m3) and

K = [ k1+k2  -k2     0
      -k2     k2+k3  -k3
      0      -k3      k3 ].

Assuming harmonic motion, we can write

y1 = x1 e^{iωt},   y2 = x2 e^{iωt},   y3 = x3 e^{iωt},

where x1, x2 and x3 are, respectively, the amplitudes of the masses m1, m2 and m3, and ω denotes the natural angular frequency. Substituting these expressions for y1, y2 and y3 into the equations of motion, and noting that ÿk = -ω^2 xk e^{iωt}, k = 1, 2, 3, we have

-m1 x1 ω^2 + (k1 + k2) x1 - k2 x2 = 0,
-m2 x2 ω^2 - k2 x1 + (k2 + k3) x2 - k3 x3 = 0,
-m3 x3 ω^2 - k3 x2 + k3 x3 = 0,

which can be written in matrix form as

[ k1+k2  -k2     0   ] [ x1 ]         [ m1  0   0  ] [ x1 ]
[ -k2     k2+k3  -k3 ] [ x2 ] = ω^2   [ 0   m2  0  ] [ x2 ].
[ 0      -k3      k3 ] [ x3 ]         [ 0   0   m3 ] [ x3 ]
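Once M and K are assembled, the natural frequencies follow from the symmetric-definite problem Kx = ω^2 Mx. A minimal sketch (my illustration, assuming NumPy/SciPy; the masses and stiffnesses used below are hypothetical placeholders, not the data of this case study):

```python
import numpy as np
from scipy.linalg import eigh

def natural_frequencies(masses, stiffnesses):
    """Frequencies and mode shapes of the 3-mass chain: K x = omega^2 M x."""
    m1, m2, m3 = masses
    k1, k2, k3 = stiffnesses
    M = np.diag([m1, m2, m3])
    K = np.array([[k1 + k2, -k2,      0.0],
                  [-k2,      k2 + k3, -k3],
                  [0.0,     -k3,       k3]])
    lam, X = eigh(K, M)            # generalized eigenvalues lambda = omega^2
    return np.sqrt(lam), X         # natural frequencies and mode shapes

# hypothetical data, for illustration only
omega, modes = natural_frequencies((2.0, 3.0, 4.0), (1.0, 1.0, 1.0))
print(omega)
```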

The eigenvalues 2 i = !i Kx = Mx i = 1 2 3 are the squares of the natural frequencies of the rst. The natural frequencies are: 102 (4:4168 2:8951 0:9273).0:0035 0:0031 C : B C @ A 0:0008 0:0028 0:0040 These eigenvectors can be used to determine di erent con gurations of the system for di erent modes. 622 . 9. and k1 to k4 are equivalent spring constants of columns that act as springs in parallel. which are fairly rigid are represented by lumped masses m1 to m4 having a horizontal motion caused by shear deformation of columns. respectively.0:0040 0:0017 1 B .:6124 1 .6. The oors and roofs. That is. Then M = diag(20000 30000 40000)1 0 3 . and third modes of vibration.:4330 C @ A 0 .1:5 3 .1:5 0 B C K = 109 B .0:0034 . second.:6124 0 1 B C C = 105 B . Solve the Generalized Eigenvalue Problem Using the Cholesky-QR Algorithm Let's take m1 = 20000 m2 = 30000 m3 = 40000 k1 = k2 = k3 = 109 1:5.or where = ! 2 . 2. they can be used to see how the structures vibrates under di erent modes.:4330 :3750 The generalized eigenvalues = the eigenvalues of C are: 105(1:9508 :8382 :0860). The generalized eigenvectors are: 0 0:0056 .1:5 C @ A 0 .1:5 1:5 0 141:4214 1 0 0 B C L = B 0 173:2051 0 C @ A 0 0 200:000 0 1:5 .2 A Case Study on the Vibration of a Building Consider a four-story reinforced concrete building as shown in the gure below.

k 1 0 0 1 2 2 B . and k1 = 10 1014 k2 = 623 .k k + k . 1.k4 C @ A 0 0 . Formulate the problem as a symmetric-de nite generalized eigenvalue problem in terms of mass and sti ness matrices: Kx = Mx: As in the previous example.k3 k3 + k4 . M is diagonal and K is tridiagonal: M = diag(m1 m2 m3 m4) 0 k + k .m 4 y 4 k 4 m 3 y 3 k 3 m 2 y 2 k 2 m 1 y 1 k 1 Figure 9.k C B 2 2 3 0 C 3 B C: K = B C B 0 .2 Schematic of 4-story building We would like to study the con guration when the building is vibrating in its rst two modes (corresponding to two smallest eigenvalues).k4 k4 Taking m1 = 5 107 m2 = 4 107 m3 = 3 107 m4 = 2 107.

6 0 C B C C K = 1014 B B B 0 .0:1085 C C @ A @ A @ A @ A .3 B .3 B .0:0858 C 10. we have 0 18 .0:0472 0:1403 0:1318 0:1036 The eigenvectors corresponding to the two smallest eigenvalues 107(:3435)and 107(1:8516) are: 0 0:0370 1 0 . Find the generalized eigenvalues and eigenvectors using the Cholesky-QR algorithm.8 1014.1:7321 C = 10 B C B 0 .1:7889 1 0 0 B C B 0 C C: 7 B .3 B . respectively.1:7321 3:3333 .3 B .3 B 0:0753 C 10.1:6330 2:000 The eigenvalues of C which are the generalized eigenvalues.3 and 9.3 B 0:0753 C .0:0785 1 B C B C B C B C .4 4 and M = 107diag(5 4 3 2): 2.6 10 .4 C C @ A 0 0 .0:0785 1 0 0:0370 1 0 0:0896 1 B C B C B C B C B C B C B C B C .8 14 . 0 7:0711 0 1 0 0 B 0 6:3246 0 C B 0 C B C L = 103 B C B 0 0 5:4772 0 C @ A 0 0 0 4:4721 0 3:6 .4.0:02777 C : 10 B C B C B C B B 0:0977 C B 0:0104 C B 0:1091 C B . are: 0 6:1432 1 B C B C 7 B 1:8516 C : 10 B B 0:3435 C C @ A 4:0950 The corresponding generalized eigenvectors are: 0 :0666 1 0 .8 0 0 1 B . 624 .0:1058 C 10.0:0858 C : 10 B 10 B C B 0:1091 C B 0:0104 C C @ A @ A 0:1318 0:1403 The rst two modes of vibration are shown in Figures 9. k3 = 6 1014 k4 = 4 1014.1:6330 C @ A 0 0 .1:7889 3:5000 .

as shown in Figure 9.4 Second Mode of Vibration of 4-Story Building 9.x m 4 4 x m 4 4 x m 3 3 k 4 m x 3 3 k 4 x 2 m2 k 3 m 2 k 3 x x1 m k 1 2 2 m 1 k 2 k 1 x 1 k 1 Figure 9.3 Forced Harmonic Vibration In the previous two examples.6.5. Consider now a system with two degrees of freedom excited by a harmonic force F1 sin !t.3 First Mode of Vibration of 4-Story Building Figure 9. 625 . we considered vibration of systems without any external force.

m1! 2 )x1 . y2 ) . m2 ! 2) x2 0 626 .k 1 These equations can be easily solved to obtain explicit expressions for the amplitudes x1 and x2 : 2) k x1 = m m(2! 2.6.kx1 + (2k . m22! !F1.k(y1 . ky1 + F1 sin !t m2y2 = k(y1 .2) 2 1 2 ( 1 .6. !2) (9. " (2k .1) and substituting into the equations we get y2 1 = x2 1 sin !t (2k . ky2 or # ( y ) " 2k .k 2k y 2 0 Assuming the solution to be ! x! y 1 "m 0 (9. ! )( 2 kF (9. !2) 1 2 1 2 .k m F1 sin ω t 1 y 1 k m2 k y2 Figure 9.3) x2 = m m (! 2 .6. y2 ) . m2!2)x2 = 0: In matrix form.k # ( x1 ) ( F1 ) = : (2k . !12)(!2 . m !2) .5 Forced vibration of a 2 degree of freedom system Then the di erential equations of motion for the system become m1y1 = .k # ( y ) ( F ) 1 1 + = 1 sin !t: 0 m2 y2 . kx2 = F1 .

In other words. !m!!)2F. ! 2) : 1 2 From above. a situation which is quite alarming. Note that in this case.where !1 and !2 are the modal frequencies. The fuselage and engine are assumed to have a combined mass m1. This situation is very alarming to an engineer. if the frequency of the imposed periodic force is equal or nearly equal to one of the natural frequencies of a system.6. For the special case when m1 = m2 = m. ! 2)(! 2 .4) kF1 2 x2 = m2(!2 . signaling the occurrence of resonance. !1 and !2 are given by s s and x1 and x2 are given by k !1 = m and k !2 = 3 m (2 . and sti ness k2 and k3 k1 represents the combined sti ness of the landing gear and tires. The wings are modeled by lumped masses m2 and m3 . let us consider an airplane landing on a rough runway. 627 . the amplitude 2 becomes arbitrarily large. it follows immediately that whenever ! is equal to or close to !1 or !2 . resonance results. ! 2) 2 )( 1 (9. Example 9. the denominator is zero or close to it.6.1 To demonstrate the consequences of excitation at or near resonance. The masses of the wheels and landing gear are assumed negligible compared to the mass of the fuselage and the wings. 1 x1 = m2(!2k.6. The system is modeled in Figure 9.

where m1 represents the mass of the fuselage.k . and k2 and k3 are the sti nesses of the wings.k3 0 k3 : y3 The airplane is shown schematically in Figure 9. y1 . 628 .7. Let the contour be described by y = y0 sin !t.6 Airplane landing on a runway The runway is modeled by a sinusoidal curve as shown in the Figure 9. m2 and m3 represent the lumped masses of the wings. Finally.y v y0 l = 20 m Figure 9. The combined sti ness of the landing gear and tires is denoted by k1.k2 k2 0 7 > y2 > = > 0 > : 4 5> > 4 5 : 0 0 0 m3 : y3 . where f1 = k1 y0. and y3 represent motion relative to the ground. and let the airplane be subjected to a forcing input of f1 sin !t.k 3 8 y 9 8 f sin !t 9 > > > > > > = 6 1 7< 1 = 6 1 2 3 2 37< 1= < 1 6 0 m2 0 7 y2 + 6 .6. y2 . The equations of motion for the 3 degree of freedom system so described are given by: 2 m 0 0 3 8 y 9 2 k + k + k .

So if ! = !1 . there is danger of excitation at or near the resonance.7 Applications to Decoupling and Model Reduction We have already seen how generalized eigenvalue problems arise in applications.7 Model of the airplane Let k1 k3 = k2 m1 m2 = = = = 1:7 105 N/m 6 105 N/m 1300kg m3 = 300kg: The natural frequencies obtained by solving the generalized eigenvalue problem: Kx = Mx = ! 2 Mx are given by: !1 = 9:39 rad/sec !2 = 44:72 rad/sec !3 = 54:46 rad/sec: The forcing frequency ! is related to the landing velocity v by v = !`=2 . These include decoupling of a second-order system of di erential equations. if the landing velocity is 107 km/hr. In this section we will mention a few more engineering applications. 9. v = !1` = 9:39 220m/sec = 29:8m/sec = 107km/hr: 2 Thus. 629 . and model reduction.y 1 y3 m 3 y k3 k m1 2 m2 2 k 1 y Figure 9. or close to it.

9.7.1 Decoupling of a Second-Order System

Case 1. The Undamped System

As we have seen before, vibration problems in structural analysis give rise to a homogeneous system of differential equations of second order of the form

M ÿ + K y = 0,    (9.7.1)

where ÿ = d^2 y / dt^2 and y = (y1, y2, ..., yn)^T is an n-vector. The matrices M and K are, as usual, the mass and stiffness matrices. Assuming that these matrices are symmetric and M is positive definite, we will now show how the simultaneous diagonalization technique described earlier can be profitably employed to solve this system of second-order equations. The idea is to decouple the system into n uncoupled equations, so that each of these uncoupled equations can be solved using a standard technique.

Let P be the modal matrix such that

P^T M P = I,    P^T K P = Λ = diag(ω1^2, ..., ωn^2).    (9.7.2)

Let y = Pz. Then the homogeneous system (9.7.1) becomes

M P z̈ + K P z = 0.

Premultiplying by P^T,

P^T M P z̈ + P^T K P z = 0,

or

z̈ + Λ z = 0.    (9.7.3)

The homogeneous system therefore decouples:

z̈i + ωi^2 zi = 0,   i = 1, 2, ..., n,

or

z̈1 + ω1^2 z1 = 0,
z̈2 + ω2^2 z2 = 0,
   ...
z̈n + ωn^2 zn = 0,    (9.7.4)

where z = (z1, z2, ..., zn)^T and Λ = diag(ω1^2, ω2^2, ..., ωn^2).

The solution of the original system can now be obtained by solving these n decoupled equations using standard techniques and then recovering the original solution y from y = Pz. Thus, if the solutions of the transformed system (9.7.4) are given by

zi = Ai cos ωi t + Bi sin ωi t,   i = 1, 2, ..., n,

then the solutions of the original system (9.7.1) are given by

(y1, y2, ..., yn)^T = P (A1 cos ω1 t + B1 sin ω1 t, ..., An cos ωn t + Bn sin ωn t)^T.    (9.7.5)

The constants Ai and Bi are to be determined from the initial conditions, such as

yi at t = 0: the displacement at time t = 0,
ẏi at t = 0: the initial velocity.

Example 9.7.1  Consider the system with three masses in the example of Section 9.6.1. Suppose that the system, when released from rest at t = 0, is subjected to the displacement given below. We would like to find the undamped time response of the system. The initial conditions are:

y1 = 1,  y2 = 2,  y3 = 3;   ẏ1 = 0,  ẏ2 = 0,  ẏ3 = 0.
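Before carrying out this computation by hand, here is a sketch of the modal-superposition procedure just described (my illustration, assuming NumPy/SciPy; the matrices below are hypothetical, not those of Example 9.7.1, and zero initial velocities are assumed so that all Bi = 0):

```python
import numpy as np
from scipy.linalg import eigh

def undamped_response(M, K, y0, t):
    """Response of M y'' + K y = 0 with y(0) = y0 and y'(0) = 0, by decoupling."""
    lam, P = eigh(K, M)            # P^T M P = I, P^T K P = diag(lam), lam = omega^2
    omega = np.sqrt(lam)
    Amp = np.linalg.solve(P, y0)   # y(0) = P z(0), so z_i(0) = A_i (cosine amplitudes)
    Z = Amp[:, None] * np.cos(omega[:, None] * t[None, :])   # z_i(t) = A_i cos(omega_i t)
    return P @ Z                   # y(t) = P z(t), one column per time point

t = np.linspace(0.0, 1.0, 5)
M = np.diag([2.0, 3.0, 4.0])                                   # hypothetical data
K = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
print(undamped_response(M, K, np.array([1., 2., 3.]), t))
```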

632 .Since the initial velocities are zeros. Again.0:0040 0:0017 1 B C P = B .0:0035 0:0031 C : @ A 0:0028 0:0040 0A 1 011 B 1C B C P B A2 C = B 2 C @ A @ A A3 3 is A1 = 2:5154 A2 = 55:5358 A3 = 710:6218: Substituting these values of A1 A2 A3 and the values of !1 !2 and !3 obtained earlier in y = Pz or 0 y 1 0 z 1 0 A cos ! t 1 B y1 C = P B z1 C = P B A1 cos !1t C B 2C B 2C B 2 2 C @ A @ A @ A y3 z3 A3 cos !3 t we obtain the values of y1 y2 and y3 that give the undamped time response of the systems subject to the given initial conditions.0:0034 . we obtain y1 = PB1 !1 = 0 _ y2 = PB2 !2 = 0 _ y3 = PB3 !3 = 0: _ These give B1 = B2 = B3 = 0. we have 0y 1 0A 1 011 B 1C B 1C B C B y2 C = P B A2 C = B 2 C : @ A @ A @ A y3 A3 3 Recall that for this problem 0:0008 Then the solution of the linear system 0 0:0056 . at t = 0.

Case 2. The Damped System

Some damping, such as that due to air resistance and fluid and solid friction, is present in all real systems. Let us now consider damped homogeneous systems. Let D be the damping matrix. Then the equations of motion of the damped system become

M ÿ + D ẏ + K y = 0.    (9.7.6)

Assume that D is a linear combination of M and K, that is,

D = αM + βK,    (9.7.7)

where α and β are constants. Damping of this type is called proportional or Rayleigh damping. Let P be the modal matrix. Then

P^T D P = α P^T M P + β P^T K P = αI + βΛ.

Let y = Pz. Then, premultiplying by P^T, the homogeneous damped equations are transformed to n decoupled equations:

z̈i + (α + βωi^2) żi + ωi^2 zi = 0,   i = 1, 2, ..., n.    (9.7.8)

In engineering practice it is customary to assume modal damping. In modal damping, the damping is proportional to mass or stiffness, or to a combination of them in a certain special manner. Let α and β be chosen so that

α + βωi^2 = 2ζi ωi.

ζi is called the modal damping ratio of the ith mode. ζi is usually taken as a small number between 0 and 1. The most common values are 0 ≤ ζi ≤ 0.05. However, in some applications, such as the design of flexible space structures, the ζi are taken to be as low as 0.005; on the other hand, for an automatic shock absorber, a value as high as ζ = 0.5 is possible. (See Inman (1994), p. 196.)

Assuming modal damping, the decoupled equations (9.7.8) become

z̈i + 2ζi ωi żi + ωi^2 zi = 0,   i = 1, 2, ..., n.

The solutions of these equations are then given by

zi = e^{-ζi ωi t} ( Ai cos ωi √(1 - ζi^2) t + Bi sin ωi √(1 - ζi^2) t ),   i = 1, 2, ..., n,

where the constants Ai and Bi are to be determined from the given initial conditions. The original system can now be solved by solving these n uncoupled equations separately, and then recovering the original solution y from y = Pz.
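The relation α + βωi^2 = 2ζiωi is easy to exercise numerically. A small sketch (my illustration, assuming NumPy; α, β and the frequencies are hypothetical values):

```python
import numpy as np

alpha, beta = 0.1, 0.002             # hypothetical Rayleigh damping coefficients
omega = np.array([5.0, 12.0, 31.0])  # hypothetical undamped natural frequencies

zeta = (alpha + beta * omega**2) / (2.0 * omega)   # modal damping ratios
omega_d = omega * np.sqrt(1.0 - zeta**2)           # damped frequencies appearing in z_i(t)
print(zeta)
print(omega_d)
```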

d2y2 + 2ky1 .d1y1 + d2(y2 . y1 ) ) - my1 The equation of motion for this mass is: . y_1) . ky2 = 0: _ _ 634 . For mass m: d1y1 _ ky1 m - d2(y2 .7. k m Consider the following system with two degrees of freedom (DOF). We will show here why y 1 y k 2m 2 2k d 1 d 2 d 3 The equations of motion of the system are developed by considering a free body diagram for each mass.2 Usefulness of Proportional Damping proportional damping is useful.Example 9. y1) = my1 _ _ or my1 + (d1 + d2)y1 . ky1 + k(y2 . y1 ) _ _ k(y2 .

11) Equations (9. For this 2 DOF system. ky1 + 3ky2 = 0: _ _ Thus.1 = .9) (9.d2 d2 + d3 y_2 .d2(y2 .10) (9.2 3 = 10 + 6 (9.2 # K= = .2 # = + : .k 3k .9). d3y_2 . So. 635 .7.7.k # " 4 . (9.k # ( y ) ( 0 ) 1 2 1 1 + 1 2 + = : 0 2m y2 . y1) .10) and (9.2 6 2 = 5 +4 .11) are uniquely satis ed with = 1 and = 0.7.1 # " 5 0 # " 4 . for the whole system we have " m 0 # ( y ) " d + d .k 3k y2 0 Let k = 2 and m = 5. and assume that D = M + K: Consider now two cases.For mass 2m: d2(y2 . k(y2 . the mass matrix "m 0 # "5 0 # M= = 0 2m 0 10 the sti ness matrix " 2k . this is a 2 case of proportional damping.d2 d2 + d3 Case 1: d1 = 1 d2 = 1 d3 = 2 " 2 .d # 2 D= 1 2 : . d2y1 + (d2 + d3 )y2 . y1 ) _ _ k(y2 . y1 ) 2m d3y2 _ 2ky2 ) - 2my2 The equation of motion for this mass is: . y1) . 2ky2 = 2my2 _ _ or 2my2 .2 6 and the damping matrix " d + d .7.7.d # ( y_ ) " 2k .1 3 0 10 .7.

the equations of motion can be decoupled using y = Pz.12). .7. C CB C: pni C B Fi C . @ pn1 1 0 F1 1 CB C pn2 C B F2 C CB C p1n p2n pnn . . B .13) (9.7. CB . CB . C A@ A Fn The decoupled equations will then be given by zi + 2 i!i z_i + !i2 zi = p1iF1 + p2iF2 + 636 + pniFn .4 = . P B B p1i p2i B . K is symmetric.2 5 = 10 + 6 : (9. This type of damping will lead to complex natural frequencies and mode shapes. in the second case.12) (9.13) and (9. CB . ..14) cannot all be satis ed with any set of values of and . C . .7.4 5 0 10 . > : _ (9.4 # " 5 0 # " 4 . the equations of motion are given by 8 F (t) 9 > 1 > > > F2(t) > > < = M y + Dy + Ky = F (t) = > . CB . This is a case of nonproportional damping.7.15) > . C . and damping is proportional.7. such decoupling is not possible. C .7.Case 2: d1 = 2 d2 = 4 d3 = 1 " 6 . (9. Damped Systems Under Force Excitation When a damped system is subject to an external force F . obtaining real mode shapes and real natural frequencies. CB . . Let P = (pij ) be the modal matrix. C . B TF = B . B . Then 0p p B p11 p21 B 12 22 B . > > : Fn(t) > Assuming that M is symmetric positive de nite. it is easy to see from our previous discussion that the above equations can be decoupled using simultaneous diagonalization.2 6 6 = 5 +4 . However.7.. In the rst case.2 # = + . B .14) Now equations (9. CB .

16) The function Ei (t) is called the exciting or forcing function of the ith mode. C: B.17) .7. one is normally interested in maximum responses. Once the decoupled equations are solved for zi we can recover the solutions of the original equations from: y = Pz or 0y 1 0z 1 B y1 C B z1 C B 2C B 2C B . etc.or where Ei(t) = n X j =1 zi + 2 i!i z_i + !i2zi = Ei(t) pji Fj i = 1 2 : : : n: (9.7. when the force is a shock type force.2 The Reduction of a Large Model Many applications give rise to a very large system of second order di erential equations: M y + Dy + Ky = F (y t): _ 637 (9.C B. If each force Fi is written as Fi = fis(t) then Ei(t) = s(t) n X j =1 n X j =i pjifj : De nition 9..C B. and the maximum values of z1 z2 : : : zn can be obtained from the responses of a single equation of one degree of freedom (see the example below). C = P B . such as an earthquake. For example.7. 9.7.C B.C @ A @ A yn zn zi + 2 i !i zi + !i2zi = Ei(t) _ The solutions of the decoupled equations depend upon the nature of the force F (t).1 The expression pjifi is called the mode participation factor for the ith mode.

perhaps m of them where m n. Let the matrix of these normal modes be Pn m : Then techniques for computing a large number of generalized eigenvalues and eigenvectors are virtually nonexistent and not very well developed (see the section on the Generalized P T MP = Im m 2 2 P T KP = m m = diag(!1 : : : !m ): Setting y = Pz and assuming that the damping is proportional to mass or sti ness. but of very large dimension in practice.For example.7. Such a thought is based on an assumption that in many instances the response of the structure depends mainly on a rst couple of eigenvalues (low frequencies) usually the higher modes do not get excited. Suppose that.17) then reduces to m equations: zi + 2 i!i z_i + !i2zi = Ei(t) i = 1 2 : : : m where Ei is the ith coordinate of the vector P T F . the large space structure (LSS) is a distributed parameter system. Once this small number of equations is solved. 638 . e ective numerical Symmetric Eigenvalue Problem for Large and Structure Matrices). Several vibration groups in industry and the military use the following approximation to obtain the maximum value of yi (see Thompson (1988)): vm uX t jyijmax = jp1z1(max)j + u jpj zj(max)j2 j =2 where pi is the ith column of P evaluated at zi . we were able to compute only the rst few normal modes. It is therefore in nite dimensional in theory. the system of n di erential equations (9. It is therefore natural to think of solving the problem by constructing a reduced-order model with the help of a few eigenvalues and eigenvectors which are feasible to compute. Naturally. under the usual assumption that M and K are symmetric and of order n and M is positive de nite. the displacement of any masses under the external force can be computed from: yi = Pzi i = 1 : : : n: Sometimes only the maximum value of the displacement is of interest. Unfortunately. The dimension of the problem in such an application can be several thousand. We will now show how the computations can be simpli ed by using only the knowledge of a few eigenvalues and eigenvectors. the solution of a large system will lead to a solution of a very large generalized eigenvalue problem. A nite element generated reduced order model can be a large second order system.

9. y k 4 the rst two modes (the modes corresponding to the two lowest frequencies) in our calculations. when the building will be subjected to a strong earthquake. 4 y k 3 3 y k 2 2 y k 1 1 y 0 Figure 9. We will use only In the gure.3 A Case Study on the Potential Damage of a Building Due to an Earthquake Suppose we are interested in nding the absolute maximum responses of the four-story building considered in the example of section 9.6. the yi i = 1 : : : 4 denote the displacement of each oor relative to the moving support.7.2. 639 . whose motion is denoted by y0 .8 Building subject to an earthquake The decoupled normal mode equations in modal form in this case can be written as: zi + 2 i!i z_i + !i2zi = . pjimj = mode participation factor of the chosen mode pi due to support existence. using the known response of the building due to a previous earthquake.Eiy0 i = 1 2 where 4 X Ei = j =1 y0 = absolute acceleration of the moving support.

6. 20 p 1 0 p 13 6B p11 C B p12 C7 6B C B C7 P = (p1 p2) = 6B 21 C B 22 C7 6B C B C7 6B p31 C B p32 C7 4@ A @ A5 p41 p42 where p1 and p2 are the two chosen participating modes. Then we can take z1(max) = E1R1 z2(max) = E2R2: This observation immediately gives: 0 B y1 B y2 B B B y3 B @ y4 1 C C C C C C A 0 1 0 p11 C B C B p12 B p21 C B B C B C + E2R2 B p22 B = E1R1 B B p32 B p31 C B @ A @ p41 p42 1 C C C C: C C A max Using now the data of the example in section 9. that is. we have m1 = 107(5) m2 = 107(4) m3 = 107(3) m4 = 107(2): p2 = the eigenvector corresponding to the second smallest eigenvalue 0 .pji are the coordinates of the participating mode Pi.0:0785 1 B C B C . Let R1 and R2 denote the maximum relative responses of z1(max) and z2(max) obtained from previous experience.2.3 B 0:0858 C = 10 B B 0:0104 C C @ A 0:1403 E1 = m1p11 + m2p21 + m3 p31 + m4p41 = 104(1:0772) E2 = m1p12 + m2 p22 + m3p32 + m4 p42 = 103(.3 B B B 0:1091 C C @ A 0:1318 .4:2417) 640 p1 = the eigenvector corresponding to the smallest eigenvalue 0 0:0370 1 B B 0:0753 C C C = 10.

We obtain 0y 1 B y1 C B 2C B C B C B y3 C @ A y4 = max + = = 1:9805 in. etc. 1 1 By C B 1:3079 in. C C @ A @ A y4 (abs.0:0785 1 B C B C . 1 . C B C B C B B 1:7526 in.Assume that R1 = 1:5 inches.0:1488 0 6812 B 1:3079 in.3:103(.: Thus. that of the second oor is 1. max.3079 in.. The absolute maximum relative displacements are obtained by adding the terms using their absolute values: 0y 1 0 0:6812 in.3:104(1:0772)(1:5) B B B 0:1091 C C @ A 0 1318 0:.) 2:2781 in. C B 2C B C B C C =B B C B B y3 C B 1:7747 in.4:2417)(:25) B 0:0858 C 10 B B 0:0104 C C @ A 0:1403 0 0:5979 1 0 0:0833 1 B 1:2169 C B 0:0910 C B C B C B C+B C B C B B 1:7636 C B . C C @ A 641 . the absolute maximum displacement (relative to the moving support) of the rst oor is 0. 0 0:0370 1 B 0:0753 C B C C 10.6812 in..0:0110 C C @ A @ A 0 2::1293 in. R2 = :25 inches.

and thus help us to choose design parameters.y4 m 4 y3 m 3 k 4 y2 m2 k 3 y1 m k 1 2 k 1 Figure 9. Another practice for such a measure in engineering literature has been to use the root sum square of the same terms. k = 2 and the average maximum relative displacements are given by q q (E1R1p11)2 + (E2R2 p12)2 = :9975 inches (E1R1p21)2 + (E2R2 p22)2 = 1:5610 inches 642 . giving the \average" maximum relative displacement values: (yi)average max = (E1R1pi1 )2 + (E2R2pi2 )2 + (y1 )average max = (y2 )average max = and so on.9 Absolute maximum displacement Note: The contribution to the second participating mode to responses is small in comparison with the contribution of the rst mode. q + (Ek Rk Pik )2 For the above example. Average Maximum Relative Displacement of the Masses The absolute maximum relative displacement of the masses provides us with an upper bound for the largest relative displacements the masses can have.

1 Ky = 0: _ Write Then (9.8. of course.3) (9.1) under the assumption of proportional or modal damping.8.M .M .8.8. In case of general damping.1 Dz2 .5.8.8.8. M is symmetric positive de nite.1.1 Kz1 (from 9.1D Assuming a solution of the equation (9.7) .3) and (9.8.4) y = z_1 = z2 _ y = z_2 = . Multiplying both sides of the equation (9.1 Dy .8. as before.8 The Quadratic Eigenvalue Problem So far we have considered the second-order system: M y + Dy + Ky = 0 _ (9.1D z2 z_2 That is z = Az _ where (9.2) _ = . however. which.1 Ky (from 9. amounts to solving a generalized eigenvalue problem. This assumption let us use the simultaneous diagonalization on M K and D (see Section 9.1 we have y + M . by M .1 Dy + M .5) (9. we have the standard eigenvalue 0 problem: z z= 1 z2 ! ! Ax = x 643 (9. We assume.7. we will.1K .1). M .8. Approach 1: Reduction to a standard Eigenvalue Problem. M .M .8.4) can now be combined into the single matrix equation: 0 I ! z1 ! z_1 ! = .9.6) I and A = : (9. We describe two existing approaches to solve this quadratic eigenvalue problem and show how to extract frequencies and from these eigenvalues and eigenvectors. as we have seen in Section 9.M . have the quadratic eigenvalue problem: ( 2M + D + K )x = 0: This problem has 2n eigenvalues and there are 2n eigenvectors corresponding to them.8.3) The equations (9.8.M .1K .6) of the form z = xe t .2) (9.8.M .8) . Case 2).

13) ! ! . The modal damping ratios are given by: k = p.10) Approach 2: Reduction to a Symmetric Generalized Eigenvalue Problem Let's substitute z1 = y and z2 = y into _ M y + Dy + Ky = 0 _ then Again. the real and imaginary part of the complex eigenvalue k .K z1 = 0 M z2 _ .11) and (9.8. Note that the eigenvalues conjugate pairs. then the natural frequencies and the modal damping ratios can be computed from the eigenvalues and eigenvectors as follows.11) . Computing Frequencies and Damping Ratios Let k = k + i k . k and k are.D z2 644 .12) (9. If the second-order system is the model of a vibrating structure.Dz2 .7. that is.8.where A is 2n 2n as given by 9. 462).9) occur in complex- (See Inman (1994).K 0 ! z_1 ! 0 .K .12) we get M z_2 = .8.8. K z_1 = .8.8. Kz1: z1 = y = z2 _ _ (9. There are 2n eigenvalues i i = 1 ::: 2n of the above problem and 2n corresponding eigenvectors. 2 k+ k k 2: (9. since we can write Combining (9. pp.Kz2 (9. respectively.8. Then the natural frequencies !k are given !k = q k+ k 2 2 k = 1 2 ::: n: k (9.8.

M 0 ! Ax = Bx where Once the eigenvalues and eigenvectors are computed.8.15). A will be computed inaccurately and so will be the eigenvalues and eigenvectors. The natural frequencies and modal damping ratios can be extracted from the eigenvalue as in Approach 1. the explicit formation of the matrix A given by (9. respectively.8. A Word of Caution About Using Approach 1 If M is ill-conditioned.D 0 M 0 . Also note that A is nonsymmetric.8.K .8.16) where A and B are both symmetric and are given by (9.M .K 0 ! A= B= : .That is where 0 B z_ = Az (9.1) can be computed using (9.1 will be computationally disastrous.8.1 D .7) by actually computing M .1 K . the solution of (9.K ! .8.8.16) will give 2n eigenvalues and eigenvectors. Since A and B are both 2n 2n.8.K 0 ! z1 A= B= z= (9.9) and (9.8. Summary: The quadratic eigenvalue problem ( 2M + D + K )x = 0 can be either reduced to the standard nonsymmetric eigenvalue problem: Ax = x where or to the generalized symmetric eigenvalue problem: I A= . 645 .8.14) ! .K .K ! . the equation (9.8. the natural frequencies and the modal damping ratios of a vibrating structure whose mathematical model is the second-order system (9.10).14) of the form z = xe t.D 0 M z2 Assuming again a solution of (9.15) .14) yields the generalized eigenvalue problem: Ax = Bx (9.

Let 0 1 0 0 0 1 0 0 1 1 1 1 B . The zeros can be found by bisection or any other suitable root. n.1 The Sturm Sequence Method for Tridiagonal A and B We now present a method when A and B are both symmetric tridiagonal and B is positive de nite. .. see Wilkinson AEP.1 C : @ A @ A 0 0 0 0 n. .. Even though A and B are banded.. and takes advantage of the tridiagonal forms of A and B .1( 1 0 ).9. C B 1 B 1 C 0 C B. 646 . The generalized eigenvalues of the pencil (A . The method is a generalization of the Sturm-sequence algorithm for the symmetric eigenvalue problem described earlier in Chapter 8. Thus.nding method for polynomials. B=B .1 01 0 01 01 1 0 1 B C B C A = B0 2 0C B = B1 4 1 C: @ A @ A 0 0 3 0 1 5 p0( ) = 1 p1( ) = 1 .1) (9. ... simultaneous diagonalization techniques is not practical for large and sparse matrices. 340-341. Example 9. C 0 C B ..9. . pp. .2 ( 0 2 ) (9.9. the matrix C = L. . .9 The Generalized Symmetric Eigenvalue Problems for Large and Structured Matrices As we have seen in several previous case studies.9. r = 2 3 : : : n: r ) pr. the Cholesky algorithm when applied to such structured problems. pr ( ) = ( r ..( r . .1 n n. C A=B . . For a proof of the algorithm. will very often destroy the sparsity. . 9. C B..1 n De ne the sequence of polynomials fp( )g given by: p0( ) = 1 p1( ) = 1 .1 C B . Unfortunately. .3) Then it can be shown (exercise #22) that these polynomials form a Sturm sequence. n.1 will in general be full.9. . C B 0 .2) (9. .9. in many practical situations the matrices A and B are structured: tridiagonal and banded cases are quite common. .1 A(LT ). B ) are then given by the zeros of pn ( ). 0 r r )pr.

Then For i = 0 1 2 : : : do until convergence occurs. Computing a Generalized Eigenvector Once a generalized eigenvalue is computed. Note that the system (A .6708.3964. ) = . About two iterations per eigenvector are adequate. B In this section. they are sparse. Solve for xi+1 : (A . and 0. B )xi+1 = yi can be solved by taking advantage of the tridiagonal forms of A and B . The Lanczos method. 5 )(3 2 . )2(1 . 9. 0. 1. the corresponding eigenvector can be computed using the inverse iteration by taking full advantage of the tridiagonal forms of A and B as follows: Let y0 be the initial approximation of an eigenvector corresponding to a computed eigenvalue .9. which relies on matrix-vector multiplications only. is suitable to preserve sparsity. 6 + 2) . It can be easily checked that these are the generalized eigenvalues of the pencil (A . 4 )(1 . 2) . 2.14 3 + 38 2 . 647 The Lanczos method to be described here is useful for estimating a few extremal eigenvalues of a large and sparse symmetric de nite generalized eigenvalue problem. (.6 +2 p3( ) = (3 . .p2( ) = (2 . Let's repeat that large symmetric de nite generalized eigenvalue problems are common in practice and like most practical problems. B ). B )xi+1 = yi 2. 28 + 6: The roots of p3( ) = 0 are: 1. we shall show how the symmetric Lanczos method described in Chapter 8 for a matrix A can be extended to the symmetric de nite generalized eigenvalue problem Ax = Bx.2 The Lanczos Algorithm for the Pencil A .6472. (0 . Form yi+1 = Bxi+1 : Remarks: 1. Let B = LLT be the Cholesky factorization of B . )2 1 = 3 2.

i. Set 3.. 0 = 1 r0 = v1 v0 = 0: For i = 1 2 : : : j do Solve: Bvi = ri. .1Bvi.1 1 C C C C j ... .1 B B. 648 .Algorithm 9. Choose v1 such that v1 Bv1 = 1. the following generalized Lanczos algorithm constructs a symmetric tridiagonal matrix 0 B 1 B Tj = B . . i Bvi . If ( s) is an eigenpair of Tj .1 The Generalized Symmetric Lanczos Algorithm Given a symmetric matrix A and a symmetric positive de nite matrix B . j .1 ) ri = Avi . 1 . .1 = viT (Avi . @ .1 C A j and an orthonormal matrix Vj = fv1 v2 : : : vj g such that VjT AVj = Tj VjT BVj = Ij j : T 1.. 3.9. For j = n: V T BV = In n V T AV = Tn n a symmetric tridiagonal matrix. .1 i = kL rik2 : i NOTES: 1. .. 2. then ( Vj s) is the corresponding Ritz pair for the pencil (A B ).1 . The Lanczos vectors in this case are B-orthonormal: viT Bvi = 1 viT Bvj = 0 i 6= j: 2. Bvi.1= i.

.1ri can be computed by solving the lower triangular system: Lyi = ri : Also.9. is equivalent to Bvi = ri. . Example 9. we need (i) a Cholesky-factorization routine for a sparse matrix. B B . C B C C Tj = B .2 ..1 since B = LLT ..9.1 (ri. . jC C @ A j j will approximate the generalized eigenvalues of the pencil (A .1= LT vi = L... (ii) routines for sparse triangular systems. 9. to implement the scheme. Thus. .2 01 2 31 C B A = B2 3 4C A @ L = diag(1 1:4142 1:7321) 0 3 4 5 B = diag(1 2 3) = 1 v1 = (1 0 0)T r0 = v1 : T (Note that v1 Bv1 = 1).1) i..1= i. B ).Computational Remarks Note that L. . and (iii) a routine for matrixvector multiplication.3 Estimating the Generalized Eigenvalues Fact: For large enough j the extreme eigenvalues of the tridiagonal matrices 0 1 1 2 B . . 649 .

(Note that the largest eigenvalue of T2. The generalized eigenvalues of (A B ) are: 4.0:2363 650 0 0 0:5285 11 B4:5919 B 0:3796 CC : CC B B AA @ @ 0:3796 . Bv2 ) = . 4.08489 :5285 The Ritz pairs are: 0 B T3 = T = B @ 1 1 0 1 0 1 2:2361 1 0 C B C 2 C = B 2:2361 3:2000 0:2449 C : A @ A 0 0:2449 .:5477 @ 3 = 0:3651 T = v3 (Av3 .6013.0:4347.i=1: T = v1 (Av1 . Bv0 ) = 1 001 B C r1 = B 2 C @ A 3 = 2:2361: 1 1 i=2: 0:4472 T 2 = v2 (Av2 .0:3920 .0:4347 0 4:6013. B).0:0333: 1 2 1 C C A Thus.0:3920 4:5919. is a reasonable approximation of the largest generalized eigenvalue 4.0:2363 CC CC B B AA @ @ .) 0 0 1 B C v2 = B 0:4472 C @ A i=3: 0 0 B v3 = B .5919.0:0333 3 !! and 4:5919 0:5285 0:8489 !! : 0 0 0:8489 11 B. The eigenpairs of T2 are: 0: . 0 2 The eigenvalues of T are: . Bv1 ) = 3:2000 ! ! 1 2:2361 1 1 T2 = = : 2:2361 3:2000 1 2 The eigenvalues of T2 are . 0.6013 of the pencil (A .0:3920 B . .

9.1 The Lanczos Method for Generalized Eigenvalues in an Interval Given (A B ).10 The Lanczos Method for Generalized Eigenvalues in an Interval In several important applications. 4. Choose a random unit vector v1. We give a brief outline of a Lanczos algorithm for the generalized symmetric eigenvalue problem to do this. Set that count equal to jS ( 2. For i = 1 2 : : : do until jC ( 1. Determine a new shift 651 . r B = WW T be the converged eigenvalues from Step 3. Then add i+1 . The algorithm was devised by Ericsson and Ruhe (1980).10. Algorithm 9. 3. i + 1 to C s = s 5. i B ) : )j = jS ( A . such as in vibration and structural analysis. Count the number of negative diagonal entries of D. Factorize (A . Denote S ( ) = the set of eigenvalues in ( ) C ( ) = the set of converged eigenvalues in ( ): Set 1 = )j C= (null set). Let 1 2 : : : 1 : : : r. Apply the symmetric Lanczos algorithm to the matrix W T (A . the following algorithm computes the generalized eigenvalues k of the pencil (A . B ) in a speci ed interval ]. i B = LDLT : i)j. iB ). one is frequently interested in computing a speci ed number of eigenvalues in certain parts of the spectrum. A symmetric and B symmetric positive de nite.1 W where with v1 as the starting Lanczos vector.

The algorithm requires about 15n3 ops and is backward stable. namely.1 and 9. Solve for p: (A . Compute y = W T p. 3. the pair (A B ) is reduced to a Hessenberg-triangular pair.1 is never formed explicitly. In stage II. which constructs the generalized Real Schur Form of (A B ). There exist Schur and Real Schur analogs of the ordinary eigenvalue problem for the generalized eigenvalue problem (Theorems 9. 2.1 W we need a matrix-vector multiply of the form y = W T (A . 9. see the paper by Ericsson and Ruhe (1980). Compute z = Wx 2. In stage I.Remarks: Note that in applying the symmetric Lanczos to W T (A . Existence Results. the Hessenberg-triangular pair is further reduced to the generalized Real Schur Form using implicit QR iteration to AB . B .1 Wx which can be obtained as follows: 1. For details of this algorithm.1).1 . its implementation and proof. the generalized eigenvalue problem involving two matrices: Ax = Bx: We now review and summarize the most important results: 1. i B )p = z which can be done by making use of the factorization in Step 1 of the algorithm. i B). The QZ Algorithm. The algorithm comes in two stages. iB ). The most widely used algorithm for the generalized eigenvalue problem is the QZ algorithm.11 Review and Summary This chapter has been devoted to the study of the most commonly arising eigenvalue problem in engineering. 652 .2.3.

Several case studies from vibration engineering have been presented in section 9. (See Chapter 8. The method constructs an ordered real Schur form of B rather than the Cholesky factorization. and the situation is quite alarming. and K is symmetric positive semide nite.3. If the frequency of the imposed periodic force becomes equal or nearly equal to one of the natural frequencies of the system. and of Boughton Bridge in England are related to such a phenomenon. These include: (i) vibration of a free spring-mass system. in the USA. to the generalized eigenvalues and eigenvectors. The accuracy obtained by this algorithm can be severely impaired if the matrix B in Ax = Bx is ill-conditioned.) The QZ method can. respectively. The Cholesky-QR algorithm for the symmetric de nite problem Ax = Bx 653 . This has been described in section 9. A variation of this method due to Wilkinson that computes the smallest eigenvalues with reasonable accuracy has been described in this section. (iii) and forced harmonic vibration of a spring-mass system. The fall of Tacoma Bridge in the state of Washington. then resonance occurs. it has been studied in some depth here. of course. Simultaneous Diagonalization and Applications. This is called the symmetric-de nite generalized eigenvalue problem. 4.1. The natural frequencies and amplitudes of a vibrating system are related. (ii) vibration of a building. Because of the importance of this problem. be used to solve a symmetric de nite generalized eigenvalue problem. A symmetry-preserving method is the Cholesky-QR algorithm. both symmetry and de niteness will be lost in general. The Generalized Symmetric Eigenvalue Problem. However.5. Almost all eigenvalue problems arising in structural and vibration engineering are of the form Mx = Kx where M is symmetric positive de nite.6 to show how this problem arises in important practical applications.

such as the design of large space structures.basically constructs a nonsingular matrix P that transforms A and B simultaneously to diagonal forms by congruence: P T AP = a diagonal matrix P T BP = I: This is called simultaneous diagonalization of A and B . Unfortunately.16).9 and 9. such as sparsity.8. the simultaneous diagonalization technique is not practical for large and sparse problems.10).8. Its applications include (i) decoupling of a second order system of di erential equations M y + Dy + ky = 0 _ to n independent equations zi + ( + !i2 )z_i + !i2 zi = 0 i = 1 2 ::: n where D = M + K .8. The simultaneous diagonalization technique preserves symmetry. It is shown that the problem can be reduced either to a standard 2n 2n nonsymmetric problem (9. The Quadratic Eigenvalue Problem: The eigenvalue problem ( 2M + D + K )x = 0 is discussed in Section 9. give rise to very large and sparse symmetric de nite eigenvalue problems.8. this decomposition is called modal decomposition and the matrix P is called a modal matrix. however. but destroys the other exploitable properties o ered by the data of the problem. Decoupling and model reduction are certainly very useful approaches for handling a large second-order system of di erential equations. The technique of simultaneous diagonalization is a very useful technique in engineering practice. etc. In vibration and other engineering applications. (Note that most practical large problems are sparse and maintaining sparsity is a major concern to the algorithm developers to economize the storage requirements.8.) 5. It is also shown how to extract the frequencies and modal damping ratios of a vibrating system governed by a second-order system once the eigenvalues are obtained (Equations 9. and (ii) making of a reduced order model from a very large system of second-order systems..8) or to a 2n 2n symmetric generalized eigenvalue problem (9. On the other hand. many practical problems. 654 .

see Golub and Van Loan (MC 1984 Chapter 8) and the references therein. NJ 07632. by Daniel J.) For further reading on simultaneous diagonalization technique. W. G. Prentice Hall. C. (The original paper of Moler and Stewart (1973) is worth reading in this context. and Stability. Schiehlen. Vibration of Mechanical and Structural Systems. The Sturm-sequence and Lanczos Methods. We have given an outline of the Sturmsequence method for the generalized eigenvalue problem with tridiagonal matrices A and B . Martinus Nijho Publications. Linear Vibrations. Smith. The QZ iteration algorithm has been discussed in detail in the books by Golub and Van Loan (MC 1984 Chapter 7). Wolford.9). 4. Vibration with Control. Inman. Englewood Cli s. New York. Stewart (IMC). C. Prentice Hall. 1979. W. Muller and W. Engineering Vibration. 5. T. The symmetric de nite tridiagonal problems arise in several applications. 9. An Introduction to Mechanical Vibrations. 655 . Jr. Theory of Vibrations with Applications. Thomson. and Watkins (FMC Chapter 5). We have also given a very brief description of the symmetric Lanczos method for a generalized eigenvalue problem. Some of the well-known books in the literature of vibration are: 1. (Third Edition) by W. by Robert F. 1989. New York.12 Suggestions for Further Reading Almost all books in vibration and structural engineering discuss implicitly or explicitly how the generalized eigenvalue problems arise in these applications. Steidel. James. 1985. New York. For results on perturbation analysis of the generalized eigenvalue problem. by M. and P. 6. Harper & Row. 3. Inman. Measurement. The chapter concludes with a Lanczos-algorithm due to Ericsson and Ruhe for nding a speci ed number of generalized eigenvalues in an interval of a symmetric de nite generalized eigenvalue problem (section 9. Englewood Cli s. by Daniel J. 1994. J. see Stewart (1978). L. 2. Prentice Hall. O.. by P. John Wiley & Sons. Dordrecht. NJ 1988. (Second Edition).6. Whaley.

(Second Edition). _ Chapter 15 of Parlett's book \The Symmetric Eigenvalue Problem" (1980) is a rich source of knowledge in this area. For a look-ahead Lanczos algorithm for the quadratic eigenvalue problem see Parlett and Chen (1991). 656 . Okamoto. Laub and Williams (1992) have considered the simultaneous triangularizations of matrices M D and K of the second-order system M y + Dy + Ky = 0. A technique more e cient than the Cholesky-QR iteration method for computing the generalized eigenvalues of a symmetric de nite pencil for banded matrices has been proposed by Crawford (1973).For applications of the symmetric de nite generalized eigenvalue problem to earthquake engineering. by S. see the book Introduction to Earthquake Engineering. University of Tokyo Press. See also the recent work of Wang and Zhao (1991) and Kaufman (1993). 1984.

What is the op-count? 8. Show that the matrix Q1 in the initial QZ step can be computed just by inverting the 2 2 leading principal submatrix of the triangular matrix B . How are the generalized eigenvectors of the pair (A B ) related to the orthogonally equivalent ~ ~ pair (A B ). Show that the shifts 1 and 2 in a QZ step.1 . (a) Prove that if A and B are n n matrices.Exercises on Chapter 9 PROBLEMS ON SECTION 9. B ) is a polynomial of degree at most n.1 (Hint: Computation depends only on the lower right 3 3 submatrix of B . Give op-count for (a) Hessenberg-Triangular reduction with and without accumulations of the transforming matrices. which are the eigenvalues of the lower 2 2 principal submatrix of C = AB . Show that when A and B have a common null vector. 6. (c) Reduction of (A B ) to the generalized Schur form. (b) The degree of det(A . PROBLEMS ON SECTIONS 9. B ) = 0: 2. Work out an algorithm based on Gaussian elimination with partial pivoting for the Hessenbergtriangular reduction.1 ). 4.3 AND 9. show that the matrix Q in the QZ step has the same rst row as Q1 . 5. where ~ A = UAV ~ B = UBV: 657 .2 1.4 3. B ) is equal to n i A is nonsingular. can be computed without forming the complete B . Using the Implicit Q Theorem. 7. then det(A . (b) One step of the QZ iteration. then the generalized characteristic equation is identically zero: det(A .

if the dimension of the null space AB is k. Show that the algorithm for simultaneous diagonalization requires about 7n3 ops. Rework the Cholesky-QR algorithm (Algorithm 9. A22 is upper Hessenberg. Given 1 2 3 4 1 10 C B1 3 4C C B=B B C B C B1 4 5A @ 5 6 1 1 0 1 10 1 1 1 1 10 1 1 C 1C C: C 1C A 10 1 01 1 11 0 10 1 0 1 B C B C (a) A = B 1 1 1 C B = B 1 10 1 C : @ A @ A 1 1 1 0 1 10 658 . k) (n . Show that in this case. Apply the above algorithm to 01 B1 B A=B B B1 @ 1 14. 0 (b) For k = 1 2 : : : do i. 12.PROBLEMS ON SECTION 9. Find QR factorization of Zk : Zk = Qk Rk . then the Hessenberg-Triangular structure takes the form: ! A11 A12 ! 0 B12 A= B= 0 A22 0 B22 where A11 is a k k upper triangular matrix. Solve for Zk : BZk = AQk. Let A and B be positive de nite.1) of the symmetric de nite pencil by using the Schur decomposition of B : V T BV = D: 10. Work out the op-count for the Cholesky-QR algorithm of the symmetric de nite pencil. and B22 is (n .1. Consider the Hessenberg-triangular reduction of (A B ) with B singular. ii.5 9. 13. Generalized Orthogonal Iteration.5. Consider the following iterative algorithm known as the generalized orthogonal iteration: (a) Choose an n m orthonormal matrix Q0 such that QT Q0 = Im m . k) upper triangular and nonsingular. How does this help in the reduction process of the generalized-Schur form? 11.

Suppose that it de ects about 2mm at midspan under a vehicle of 90000 kg. the generalized Rayleigh-Quotient iteration. ii. techniques of simultaneous diagonalization. Consider the following spring-mass problem: 659 . What are the natural frequencies of the bridge and the vehicle? 16. Suppose that a bridge trestle has a natural frequency of 5 Hz (known from an earlier test). the QZ iteration followed by inverse iteration. For the equation of motion under the force F (t) = F1 ei!t: my + ky = F1 ei!t nd the amplitude and determine the situation which can give rise to resonance. iii. 17.6 AND 9.0 10 1 1 1 01 B C B (b) A = B 1 10 1 C B = B 1 @ A @2 1 1 1 1 3 Find the generalized eigenvalues and eigenvectors for each of the above pair using i.7 15. 1 2 1 3 1 4 1 3 1 4 1 5 1 C C: A PROBLEMS ON SECTIONS 9.

[Figure: a spring-mass chain with masses 2m, m, and 2m (displacements y1, y2, y3), coupled by springs of stiffness 3k, k, k, and 3k.]

(a) Determine the equations of motion.
(b) Set up the generalized eigenvalue problem in the form Kx = λMx and then determine the natural frequencies and modes of vibration.
(c) Find the configuration of the system in each mode.

18. Consider the four-story building as depicted by the following diagram:

[Figure: a four-story shear building with floor masses m4, m3, m2, m1 from top to bottom.]

Given
m1 = 1.0 x 10^5 kg,  m2 = 0.8 x 10^5 kg,  m3 = 0.5 x 10^5 kg,  m4 = 0.6 x 10^5 kg,
k1 = 15 x 10^8 N/m,  k2 = 12 x 10^8 N/m,  k3 = 15 x 10^8 N/m,  k4 = 10 x 10^8 N/m,

find the maximum amplitude of each floor for a horizontal displacement of 3 mm with a period of 0.25 second.

19. Consider the following diagram of a motor car suspension with vibration absorber (taken from the book Linear Vibrations by P. C. Muller and W. O. Schiehlen, p. 226).

[Figure: motor car suspension with vibration absorber -- car body (mass m1, displacement y1(t)) on spring k1 and damper d1; absorber (mass m3, displacement y3(t)) attached through spring k3 and damper d3; axle and wheel suspension (mass m2, displacement y2(t)) riding on the tire stiffness k2 above the guide way ye(t).]

Given
m1 = 1200 kg,  m2 = 80 kg,  m3 = 20 kg,
k1 = 300 N/cm,  k2 = 3200 N/cm,  k3 = 600 N/cm,

find the various amplitude frequency responses with different damping values of the absorber, d3 = 0, 300, 600, 1000 Ns/m. (The response of a system is measured by the amplitude ratios.)

22. Estimate the generalized eigenvalues of the pair (A B ) of problem #14 using the generalized symmetric Lanczos algorithm.9) and (9.1 C D = B 1 1 1 C : @ A @ A 0 .9.kB .8. Deduce the equations (9. 26. Find the generalized eigenvalues for the pair (A B ). Find the generalized eigenvalues of the pair (A B ) of problem #23 in the interval (0 0:3).9. where 0 1 1 0 1 10 using the Sturm sequence method described in section 9. in exercise #23.3) form a Sturm sequence. 01 1 01 B C A = B1 1 1C @ A 0 10 1 0 1 B C B = B 1 10 1 C @ A 24. Show that the polynomials de ned by (9.1 Ak].1 0 1 01 1 11 B C B C M = I3 3 K = B . 23. Find the eigenvalues of the quadratic pencil: (M 2 + K + D)x = 0 using both approach 1 and approach 2 of Section 9. Prove that the eigenvalues of the symmetric de nite pencil A. B lie in the interval . 25. 663 .1){(9. and compare the results.10) for frequencies and the mode shapes.9 20.1 Ak kB .8.1 2 1 1 1 21. 0 2 .8.8 AND 9.PROBLEMS ON SECTIONS 9.1 2 .1.9.

(d) Compare C and D to verify that they are essentially the same. 2. 1. (a) Write a MATLAB program. write a MATLAB program. A = A 15 15 randomly generated unreduced upper Hessenberg matrix.1 . Using lynsyspp from Chapter 6 or the MATLAB Command `n' and hesstri. 664 . (The purpose of this exercise is to compare the accuracy of di erent ways to nd the generalized eigenvalues and eigenvectors of a symmetric de nite pencil (K . called eigenvecgen to compute a generalized eigenvector u corresponding to a given approximation to a generalized eigenvalue : Test-Data: u] = eigvecgen(A B ): Test your program using randomly generated matrices A and B . called qzitri. M )).1 : C ] = qritrdsi (C ): (c) Compute D = A1 B 1. Write a MATLAB program. to implement one step of QZ iteration algorithm: A1 B1] = qzitri (A B): (b) Now apply one step of qritrdsi. 3.5. each of order 15. upper triangular matrix with all entries equal to 1. and then compare the result with that obtained by running the MATLAB command : U D] = eig(A B): 4. where (A1 B 1) is the Hessenberg-triangular pair obtained in step (a). (Double-shift implicit QR iteration from Chapter 8) to C = AB. to reduce a pair of matrices (A B ) to a (Hessenbergtriangular) pair: H T ] = hesstri(A B): Test your program by randomly generating A and B .MATLAB PROGRAMS AND PROBLEMS ON CHAPTER 9 You will need backsub from MATCOM. B = A 15 15. each of order 15. except two diagonal entries each equal to 10. called hesstri.

. . write a MATLAB program.k 0 0 B .2. . B . called diagsimul.k 2k . and `n' (or bascksub from Chapter 3). 665 . V 1 D1] = geneigsymdf (K M ) where V 1 is a matrix containing the generalized eigenvectors as its columns and D1 is a diagonal matrix containing the generalized eigenvalues of the symmetric-de nite pencil K .5. . . . B. to simultaneously diagonalize a pair of matrices (M K ). called geneigsymdf.. K = B . . B @ . M ) (Algorithm 9. eig.6. M. (b). B . inv. .5. and (c). .k 0 .. Using chol. B 0 m = 100 k = 108 : 0 0 .2): (I D) = diagsimul (M K ): Test-data: Use M and K from the case-study on the vibration of a building in Section 9.(a) Using the MATLAB commands chol.k 2k 1 C C C C C C C C C C A 25 25 5. .k 0 B B. . . . . to implement the Cholesky algorithm for the symmetric-de nite pencil (K . . inv. . . Test-data: M = diag(m m m)25 25 0 B k . eig from MATLAB.. (b) Run eig(K M ) from MATLAB to compute (V 2 D2): V 2 D2] = eig(K M ): (c) Run eig(inv(M ) K ) from MATLAB to compute (V 3 D3) : V 3 D3] = eig (inv (M ) K ): (d) Compare the results of (a). . write a MATLAB program.. . where M is symmetric positive de nite and K is symmetric (Algorithm 9.1). .

Find the generalized eigenvalues using the MATLAB 666 . 20.M . Then compute the eigenvalues and (iii) Use the MATLAB command V 3 D3] = eig (A B ). where TJ is a j j symmetric tridiagonal matrix and IJ is the j j identity matrix. 8. eig. Find the natural frequencies and modal ratios using the results (9. (alpha beta) = langsymgen (A B ) where alpha is a vector containing the diagonal entries of TJ and beta is a vector containing the o -diagonal entries of TJ . solve example 9. and 50. Write a MATLAB program called lansymgen to transform the symmetric de nite pair (A B ) to (TJ IJ ). B ). Tell how the extreme eigenvalues of TJ for each value of j approximate the generalized eigenvalues of the pencil (A . (The eigenvalues and the eigenvectors obtained by (iii) can be taken as the accurate ones).6.10) of the book with the same M K and D as in problem #7. where the matrices A and B are given as in part (ii).8. Form the matrix A = @ by .1 from the book completely. Test-data and the experiment: Take B = M A = K where M and K are the same as given in exercise #7. 30. and elapsed time. Compare the results of the above three approaches with respect to accuracy. and zeros.5 M . j n.K 0 A : where A = @ B=@ .) eigenvectors of A using MATLAB command: V1 D1] = eig (A): (ii) (Approach 2 of Section 9.8. Run lansymgen using j = 10.K A . 0 1 0 1 0 .8.1D using MATLAB commands inv. Using diagsimul.1K . 7.M .8): Use the program geneigsymdf to compute the eigenvalues: V 2 D2] = geneigsymdf (A B ) 0 1 0 I A (i) Approach 1 of Section 9.9) and (9. (The purpose of this exercise is to compare di erent approaches for solving the quadratic eigenvalue problem: ( 2M + D + K )x = 0. op-count.7. 40. 9.K D 0 M Test-Data: Use the same M and K as in problem 4 and D = 10.

command [U, D] = eig(A, B).

THE SINGULAR VALUE DECOMPOSITION (SVD)

10.1  Introduction ............................................................... 669
10.2  The Singular Value Decomposition Theorem ................................... 670
10.3  A Relationship between the Singular Values and the Eigenvalues ............. 673
10.4  Applications of the SVD .................................................... 675
10.5  Sensitivity of the Singular Values ......................................... 678
10.6  The Singular Value Decomposition and the Structure of a Matrix ............. 679
      10.6.1  The Norms and the Condition Number ................................. 680
      10.6.2  Orthonormal Bases for the Null Space and the Range of a Matrix ..... 681
      10.6.3  The Rank and the Rank-Deficiency of a Matrix ....................... 682
      10.6.4  An Outer Product Expansion of A and Its Consequences ............... 686
      10.6.5  Numerical Rank ..................................................... 687
10.7  Computing the Variance-Covariance Matrix ................................... 688
10.8  The Singular Value Decomposition, the Least Squares Problem, and the Pseudo-Inverse 689
      10.8.1  The SVD and Least Squares Problem .................................. 689
      10.8.2  Solving the Linear System Using the Singular Value Decomposition ... 692
      10.8.3  The SVD and the Pseudoinverse ...................................... 692
10.9  Computing the Singular Value Decomposition ................................. 695
      10.9.1  Computing the SVD from the Eigenvalue Decomposition of A^T A ....... 695
      10.9.2  The Golub-Kahan-Reinsch Algorithm .................................. 696
      10.9.3  The Chan SVD ....................................................... 703
      10.9.4  Computing the Singular Values with High Accuracy: The Demmel-Kahan Algorithm 705
      10.9.5  The Differential QD Algorithm ...................................... 709
      10.9.6  The Bisection Method ............................................... 709
10.10 Generalized SVD ............................................................ 710
10.11 Review and Summary ......................................................... 711
10.12 Some Suggestions for Further Reading ....................................... 712

CHAPTER 10
THE SINGULAR VALUE DECOMPOSITION (SVD)

THE SINGULAR VALUE DECOMPOSITION (SVD)

Objectives

The major objectives of this chapter are to study theory, methods, and applications of the Singular Value Decomposition (SVD). Here are the highlights of the chapter.

The SVD existence theorem (Section 10.2)
A relationship between the singular values and the eigenvalues (Section 10.3)
An application of the SVD to the fetal ECG (Section 10.4)
Sensitivity of the singular values (Section 10.5)
Applications of the SVD to rank-related problems, computing orthonormal bases, matrix approximations, etc. (Section 10.6)
Use of the SVD in least squares and pseudoinverse problems (Section 10.8)
The Golub-Kahan-Reinsch algorithm (Section 10.9)
The Demmel-Kahan algorithm (Section 10.9)

Background Material Needed for this Chapter

The following background material will be needed for smooth reading of this chapter:
1. Norm concepts and related results (Chapter 1)
2. The rank concept (Chapter 1)
3. Orthonormal bases (Chapters 1 and 5)
4. Creating zeros with Householder and Givens matrices (Sections 5.4 and 5.5)
5. QR factorization of a nonsquare matrix (Chapter 5)
6. Statements of, and methods for, least-squares problems and the pseudoinverse (Chapter 7)
7. The Bauer-Fike Theorem (Theorem 8.7.1)

some applications. Some details of the contributions of these mathematicians to the SVD can be found in an interesting recent paper by G. etc.5 we discuss the sensitivity of the singular values. Weyl (1885-1955)|can be associated with the development of the theory of the SVD. taken from the work of Callaerts.3.4 we give a real-life application of the SVD on separating out the ElectroCardiogram (ECG) of a fetus from that of the mother. however. The organization of this chapter is as follows. Stewart (1993). Golub. Sylvester (1814-1897). W. Jordan (1838-1921). Schmidt (1876-1959). Pan and Sigmon (1992). and this is remarkable. De Moor. W. the LU and the QR decompositions. Beltrami (1835-1899). U and V are real orthogonal.10. In this chapter we will study another very important decomposition of a complex matrix A: the singular value decomposition or in short. this has only been possible due to the pioneering work of some contemporary numerical analysts such as: G.6 we outline several important applications of the SVD relating to the structure of a matrix: nding the numerical rank. approximation of A by another matrix of lower rank. orthonormal bases for the range and null space of A. If A is real. p. The singular values are insensitive to perturbations. etc. In recent years the SVD has become a computationally viable tool for solving a wide variety of problems arising in many practical applications.2 we prove the existence theorem on the SVD. In section 10. C. Van Dewalle and Sansen (1991). H. E. We will. and the existing computational methods for the SVD. and H.1 Introduction So far we have studied two important decompositions of a matrix A. Also the book by Horn and Johnson (1991) contains a nice history of the SVD. The names of at least ve classical and celebrated mathematicians|E. In section 10. 17) Horn and Johnson (1991). In section 10. J. In section 10. . give a more traditional and constructive proof that relies on the relationship between the singular values and singular vectors of A with the eigenvalues and eigenvectors of AT A: The later aspect is treated fully in Section 10. However. the SVD de ned by A=U V where U and V are unitary and The SVD has a long and fascinating history.. is a diagonal matrix. etc. There are several proofs available in the literature: Golub and Van Loan (MC. The ability of the SVD to perform the above computations in a numerically reliable way makes the use of the SVD so 669 We will assume that A is real throughout this chapter. In this chapter we will discuss the important properties. who showed us how to compute the SVD in a numerically e cient and stable way. Kahan. 1984. namely.

section 10. The number of nonzero diagonal entries equals the algebraic rank of A.9 is devoted to computing the SVD. Denote the eigenvalues of AT A by 1 = 1 .2 The Singular Value Decomposition Theorem matrices U and V such that Theorem 10.8 we discuss the use of the SVD in least squares solutions and computation of the pseudoinverse. Denote the set of orthonormal eigenvectors of AT A corresponding to 1 through n by v1 : : : vn that is. has the form 0 1 B B . It is an n n symmetric positive semide nite matrix therefore 2 2 2 its eigenvalues are nonnegative. 2 = 2 : : : n = n.1) where is an m n \diagonal" matrix. we brie y outline the recent improvements to the method by Demmel and Kahan (1990). besides describing the widely-used Golub-Reinsch method.n) n Proof. Here. The diagonal entries of are all nonnegative and can be arranged in nonincreasing order.. 10.2. 0(m. In section 10. A brief mention of the very recent method of Fernando and Parlett (1992) will be mentioned too.2.1 Let A be a real m n matrix.. B B =B B @ n 1 C C C: C C C A = r+2 = = n = 0: We have seen before that a symmetric matrix has a set of orthonormal eigenvectors. v1 through vn are orthonormal and satisfy: AT Avi = i2vi i = 1 : : : n: 670 . The SVD is the most numerically reliable approach to handle these problems.attractive in practical applications. Then there exist orthogonal A = Um T m m n Vn n (10. Assume that these have been ordered such that 1 2 r > 0 and r+1 Remark: Notice that when m n. Consider the matrix AT A. Finally.

j The set fu1 : : : ur ur+1 ::: um g forms an orthonormal basis of the m-space C m .3)).7) 671 . and choose U2 = (ur+1 : : : um ) such that uT A = 0 j = r + 1 : : : m.2.2.2.2) (10. i = 1 : : : r form an orthonormal set. De ne now V = (V1 V2) and U = (U1 U2). C .2.4) (10. De ne U1 = (u1 : : : ur ).2. C A uT m (Using 10. De ne now a set of vectors fui g by u = 1 Av i = 1 : : : r: (10. V1 = (v1 : : : vr ) V2 = (vr+1 : : : vn) where v1 through vr are the eigenvectors associated with the nonzero eigenvalues 1 through r .3) viT AT Avj = 0 if i 6= j . C B @ A B T B um B B @ 1 v T AT 1 1 1 v T AT 2 2 1 C C C . and 0 B B B 0 uT 1 B 1 B B B uT C B B 2C T AV = B . and vr+1 : : : vn correspond to the zero eigenvalues.2. C . C .2. C C 1 v T AT C A(v1 : : : vn) C r r C C uT+1 C r .5) (by (10.6) (10.2) and (10. C .2.Then.4) i i i The ui 's. Then U and V are orthonormal. C 1 n B B . and Write viT AT Avi = i 2 (10.2. C A(v : : : v ) = B U B B . because uT u = 1 (Av )T 1 (Av ) i j i 8i j < 0 when i 6= j = : 1 when i = j = 1 j j (viT AT Avj ) i (10.

0 0 C 1 2 0 0C 0 2 C 2 C . We will follow the convention that the singular values are ordered as above.6) is known as the Singular Value Decomposition (SVD) of A. The singular values of A are uniquely determined while the matrices U and V are not unique (Why?) 672 .. C .. Note that the other (n . A Convention ality. This is also n For the remainder of this chapter we will assume.. . . 1 rank (A) = rank (U V T ) = rank ( ) and that the rank of a diagonal matrix is equal to the number of nonzero diagonal entires. When m n.1 The scalars 1 2 : : : r are called the nonzero-Singular Values of A. we consider the SVD of AT and if the SVD of AT is U V T .2.2. Notes: 1..2. because if m < n. without any loss of gener- referred to as the singular spectrum.2 The columns of U are called the left singular vectors and those of V are called the right singular vectors. 2. by (A). C= C 1 2 0 0 0C C r r C C 0 0 . The decomposition (10. De nition 10. De nition 10.. we will denote the set of singular values of A. . then the SVD of A is V U T . . Also. r) singular values are zero singular values. . 0 C A 0 0 The statement about the rank follows immediately from the fact that the rank is invariant under unitary matrix multiplications: 1 0 B B B B B B = B B B B B B @ 1 2 1 0 0 0 . we have n singular values. that m n.. Thus = min the smallest singular 1 = max is the largest singular value and value.
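The conventions above are easy to check numerically. The following MATLAB sketch (MATLAB is the language used in the book's computer exercises; the 3-by-2 test matrix here is arbitrary, not taken from the text) computes a full SVD, verifies that A = U Sigma V^T with U and V orthogonal, and counts the nonzero singular values to recover the rank. It is an illustration only.

% Illustration (not from the text): verify the SVD theorem numerically.
A = [1 2; 3 4; 6 7];           % an arbitrary 3-by-2 test matrix
[U, S, V] = svd(A);            % full SVD: A = U*S*V', U is 3x3, V is 2x2
resid = norm(A - U*S*V');      % should be of the order of roundoff
orthU = norm(U'*U - eye(3));   % U has orthonormal columns
orthV = norm(V'*V - eye(2));   % V has orthonormal columns
sigma = diag(S);               % singular values, ordered sigma(1) >= sigma(2) >= 0
r = sum(sigma > max(size(A))*eps*max(sigma));   % number of nonzero singular values = rank(A)
disp([resid orthU orthV]); disp(sigma'); disp(r)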

2.3. Corollary 10. (10. Theorem 10.1 The singular values of an m n matrix A are the nonnegative square roots of the eigenvalues of AT A. Proof. 10.1 Let A be a symmetric matrix with its eigenvalues Then the singular values of A are j ij i = 1 : : : n.0:5009 0:4082 3 3 0:5696 .3742.3. AT A = (V T U T )(U V T ) = V T VT = V 0V T (10.3.3.3 A Relationship between the Singular Values and the Eigenvalues In the proof of Theorem 10.2) (10. 0. Since A = AT AT A = A2: 673 1 ::: n.3.3) .0:8219 V= 0:8299 0:5696 ! 2 2 There are two singular values they are 6.1 01 21 B C A = B2 3C @ A 3 4 0 6:5468 0 1 B C =B 0 0:3742 C @ A 0 0 3 2 0 0:3381 0:8480 0:4082 1 B C U = B 0:5506 0:1735 .Example 10. that's how they have been de ned). We put this important fact in the form of a theorem and give an independent proof. Proof.2.1) 2 2 where 0 is an n n diagonal matrix with 1 : : : n as its diagonal entries.0:8165 C @ A 0:7632 .1 we have used the fact that the singular values of A are the nonnegative square roots of the eigenvalues of AT A (in fact. the theorem is proven.5458. V T AT AV = 0 = 2 2 2 diag( 1 2 : : : n): Since the singular values are nonnegative numbers. Thus.

n) 1(n n) ! De ne where and Then it is easy to see that n : ~ ~ U .2 An n n matrix A is nonsingular i all its singular values are di erent from zero.1 above we have that the n singular values of A are the square roots of the n eigenvalues of A2 . n) 0(m. Theorem 10.3.2 The nonzero singular values of an m n (m n) matrix A are positive eigenvalues of ! 0 A C= m m AT 0n n Proof. Let Write A = U V T: U = = ! U1 U2 m n m (m . 1 n Corollary 10. proof is now immediate from Theorem 10.3. Since a matrix is nonsingular i its eigenvalues are all di erent from zero. Proof.n) 1 ~ U 1 = p U1 2 ~ 1 V = p V: 2 ! 0 1 0 0 1 B C P T CP = B 0 . We know that det(AT A) = (det(A))2: Thus A is nonsingular if and only if AT A is non- singular.Therefore. from Theorem 10.1. we have the result.3. 1 0 C @ A 0 0 0 674 . Since the eigenvalues of A2 are 2 : : : 2 .3.U U2 P = ~1 ~ 1 V V 0n (m.

the SVD is the most e ective tool in solving least squares and the generalized least-square problems. realization of state space models. In image processing applications the SVD is routinely used for image compression.4 Applications of the SVD The SVD has become an e ective tool in handling various important problems arising in a wide variety of applications areas. (See Golub and Van Loan MC. electrical network. 1 : : : .. speech is absent about 50% of the time. etc. when a person speaks. k. such as control theory. biomedial engineering. then these disturbances dominate in the microphone signal.which shows that the nonzero eigenvalues of C are are the nonzero singular values of A. For example. time series analysis. The aspects of control theory and identi cation problems requiring use of the SVD include problems on controllability and observability. speech synthesis. The crux of using the SVD in these applications is in the fact that these applications require: determination of the \numerical" rank of a matrix. where 1 through k 10. In most cases these computations need to be done under the presence of impurities in the data (called \noises"). to remove noise in a picture. and as we will see in later sections in this chapter. nding an approximation of a matrix using matrices of lower ranks. see Van Dewalle and DeMoor (1988). 1 ::: k. model reduction. For details see Jain (1989). etc.) We now present in this section a real-life application of the SVD in biomedial engineering. pattern recognition. In signal and speech processing the SVD can be regarded as a lter that produces an estimate of a signal from noisy data. signal and image processing identi- cation and estimation. the H-in nity control. say. In such a situation the ratio of speech signal to the background noise has to be enhanced and the SVD can be used e ectively to do so. a fan. 1984 and 1989. robust feedback stablization. The maternal ECG (MECG) clearly 675 . Fetal ECG (Taken from (Van Dewalle and DeMoor (1988). . a vibrating machine. etc. In biomedial engineering the SVD plays an important role in obtaining a meaningful fetal ECG from that of the mother. etc. when speech is absent. if there are background noises coming from. etc.) Consider the problem of taking the electrocardiogram of a fetus by placing the cutaneous electrodes at the heart and the abdomen of the pregnant mother. balancing. the SVD is very e ective for these computations. Furthermore. computations of the orthonormal bases for the row and column spaces of a matrix as well as the bases for their orthogonal complements. Thus.

each measurement signal mi (t) can be written as a linear combination of r source signals si (t) and additive noise signal ni (t).4.4. The problem now is to get an estimate of the source signals s(t) knowing only m(t). This leads to the following equations: m1(t) = t11s1(t) + t12 s2 (t) + m2(t) = t21s1(t) + t22 s2 (t) + . the measurement signals are corrupted by an additive noise signal. 676 (10.4) (10.5) n(t) = (n1(t) n2(t) : : : nr (t))T : The matrix T is called the transfer matrix and depends upon the geometry of the body.disturbs the observation of the fetal ECG (FECG). The objective then will be to detect the FECG while simultaneously suppressing the MECG respective to noise. separate out the estimate of fetal source signals.6) .3) m(t) = T s(t) + n(t) T = (tij ) and (10.4. It can be assumed (see Van Dewalle and de Moor (1988)) that this relationship is linear. and the conductivities of the body tissues. Let each measurement consist of q samples. .4. the positions of the electrodes and sources. Then the measurements can be stored in a matrix M of order p q. Let these measurements be arranged in a vector called the measurement vector m(t): m(t) = (m1(t) m2(t) : : : mp(t))T : s(t) = (s1(t) s2(t) : : : sr (t))T : (10. Suppose there are p measurements signals m1 (t) : : : mp (t). and. indeed. We now show that the SVD of M can be used to get estimates of the source signals. and there exists a relationship between the measurement signals and the source signals.1) (10. mp(t) = tp1 s1(t) + tp2 s2(t) + or where + t1r sr (t) + n1 (t) + t2r sr (t) + n2 (t) + tpr sr (t) + nr (t): (10.2) Let there be r source signals s1 (t) s2(t) : : : sr (t) arranged in the source signal vector s(t): Obviously. because the contributions of the material heart signals are much stronger than those of the fetal heart. Let M = U VT be the SVD of M . and from that estimate.4. .4.

8) F 0 C @ A 0 0 0 where M contains rm large singular values.7) will contain p estimates of the source signals. Next. obviously. Note that if the SVD of M is given by M = U VT = (U1 U2) ^ then S can be estimated by S = U1T M: 0 677 1 0 2 ! VT ! 1 V2T .^ Then the p q matrix S de ned by ^ S = UTM (10.9) T ^ Thus SF = UF M . F contains rf singular values. The signals ^ in SF are called the principal fetal signals. The above method has been automated and an on-line adaptive algorithm to compute the U matrix has been designed. we have 0 UT 1 M ^ = U T M = B UF C M B TC S @ A 0 U T M 1 U0T0 S 1 ^ B M C B ^M C = B UF M C = B SF C @ T A @ A U0T M ^ S0 (10. For the details of the method and test results. DeMoor. Let U = (UM UF U0) be a conformable partitioning of U . as follows: ^ F = UF SF = rm +rf X i=rm +1 T i ui vi where ui and vi are the ith column of U and V . and 0 contains the remaining singular values associated with noise.4. those smaller ones associated with the fetal heart. we need to extract the estimates of the fetal ^ ^ source signals from S let this be called SF : Partition the matrix of singular values of M as follows: 0 1 0 m 0 B C =B 0 (10. and i is the ith singular value of M . see the paper by Callaerts. ^ Once SF is determined. associated with the maternal heart.4. Van Dewalle and Sansen (1990). we can also construct a matrix F containing only the contributions of fetus in each measured signal.4. Then. etc.
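The separation scheme just outlined is easy to prototype. The MATLAB sketch below is only a schematic illustration of the idea: the synthetic signals, the transfer matrix, and the chosen values of rm and rf are assumptions of ours, not the data of Callaerts et al. The measurements are projected onto the left singular vectors associated with the intermediate ("fetal") singular values.

% Schematic illustration of SVD-based separation (synthetic data, assumed ranks).
q = 500;  t = (1:q)/100;
sm = sin(2*pi*1.2*t);                  % stand-in for the strong maternal source
sf = 0.2*sin(2*pi*2.1*t + 0.3);        % stand-in for the weak fetal source
T = randn(8, 2);                       % unknown transfer matrix (8 electrodes assumed)
M = T*[sm; sf] + 0.01*randn(8, q);     % measurements m(t) = T s(t) + n(t)

[U, S, V] = svd(M, 'econ');
rm = 1;  rf = 1;                       % assumed numbers of maternal and fetal singular values
UF = U(:, rm+1:rm+rf);                 % columns of U associated with the fetal part
SFhat = UF' * M;                       % estimate of the principal fetal signals
Fhat = UF * S(rm+1:rm+rf, rm+1:rm+rf) * V(:, rm+1:rm+rf)';  % fetal contribution in each channel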

of course.1) (10. the following important result holds. the singular values of A and A + E . Indeed. Theorem 10. Proof.5 Sensitivity of the Singular Values Since the squares of the singular values of A are just the eigenvalues of the real symmetric matrix AT A. to the singular values of B and E in the ~ same way the eigenvalues of A are related to the singular values of A.1 The singular values of A: 01 2 31 B C A = B3 4 5C @ A 6 7 8 1 2 3 00 0 B E = B0 0 @ = 14:5576 = 1:0372 = 0:0000 678 0 C 0 C: A 0 0 :0002 1 (10.10. The result basically states that. like the eigenvalues of a symmetric matrix. De ne now B = E= T we have BT 0 E 0 ~ ~ ~ B . where 1 through k are the nonzero singular values of A. zero.5. the singular values of a matrix are well-conditioned.7.5. and we have seen before that the eigenvalues of a symmetric matrix are insensitive to small perturbations. A = E. The proof is based on the interesting relationship between the singular values of the matrix A and the eigenvalues of the symmetric matrix ~ A= ~ displayed in Theorem 10. Let i i = 1 : : : n and ~i. k . Then j~i . the same thing should be expected of the singular values as well. The result is analogous to a result on the sensitivity of the eigenvalues of a symmetric matrix stated in Chapter 8. The remaining ! ! 0 B 0 E ~ ~ ~ eigenvalues of A are. The result of Theorem ?? ~ ~ ~ now follows immediately by applying the Bauer-Fike Theorem (Theorem 8.5.2) .2. ij kE k2 for each i. i = 1 : : : n be.1) to B A and E . 1 : : : . It has been shown there that the nonzero eigenvalues of A are 1 : : : k .3. ~ ~ The eigenvalues of B and E are related respectively. Let A and B = A + E be two m n matrices (m n). AT 0 0 A ! Example 10. in decreasing order.1 (Perturbation Theorem for Singular Values).5. respectively.

The result of Theorem 10.6. i)2 kE kF : i=1 i and ~i .5.6 The Singular Value Decomposition and the Structure of a Matrix The SVD can be very e ectively used to compute certain important properties relating to the structure of a matrix.) Theorem 10. etc. Euclidean norm.2 will be used later in deriving a result on the closeness of A to a number of lower rank. nearness to rank-de ciency. 1j = 1:1373 10. condition number and orthonormal basis for the null space and range. which is an important practical issue). 2j = 0 j~3 .5. (Theorem 10.3) We now present a result (without proof) on perturbation of the singular values that uses Frobenius norm instead of 2-norm. We will discuss these applications in this section.The singular values of A + E : ~1 = 14:5577 ~2 = 1:0372 ~3 = 0:0000 kE k2 = :0002 j~1 . such as its rank (in fact. Then vn uX u t (~i . 679 .1. 3j = 0: (10. i = 1 ::: n be the same as in Theorem 10.5.2 Let A E 10.5.4 j ~2 .2.

4. 2 n 4. n 6= 0). Example 10.1k = 1 . To prove 3 we note that the largest singular value of A.1 min = 0.) 2. 2 2 2 1 2. 01 2 31 B C A = B3 4 5C @ A 1 6 7 8 = 14:5576 2 = 1:0372 3 = 0:0000 680 .1 is 1 . Cond2(A) = kAk2 kA. Remark: When A is rank-de cient. kAkF = ( 1 + 2 + + n ) 2 . min Proof. kA. (Note that when A is n invertible.1 The Norms and the Condition Number matrix A(m n). and we say that Cond(A) is in nite.10.6. Follows from the de nition of Cond2 (A) and 1 and 3.1 Let = 1 2 n be the n singular values of an m n max . kAk2 = 1 Theorem 10.6. 1. 3. when A is n n and nonsingular. kAkF = kU V T kF = k kF 2 2 =( 1+ 2+ 2 2 + n) 1 : 3. Then 1.1k2 = 1 n = max . kAk2 = kU V T k2 = k k2 = max( i ) i = 1: (Note that the 2-norm and F -norm are invariant under unitary matrix products.6. Then the result follows from 1.

In this section we show how the singular vectors can be used to construct these bases. Projection onto N (A) = V2V2T . Projection onto R(A) = U1U1T . This is because. then Computing Projections Using the SVD 1.6. Let r be the rank of A. Then the set of columns vj corresponding to the zero singular values of A form an orthonormal basis for the nullspace of A. that is. 4. Orthogonal Projections Once the orthonormal bases for the range R(A) and the null-space N (A) of A are obtained. 681 . Thus if we partition U and V as U = (U1 U2) V = (V1 V2) where U1 and V1 consist of the rst r columns of U and V .1) (10.1. the orthogonal projections can be easily computed. Similarly. Projection onto the orthogonal complement of R(A) = U2U2T . the set of columns uj corresponding to the nonzero singular values is an orthonormal basis for the range of A. when j = 0 vj satis es Avj = 0 and is therefore in the null-space of A. Projection onto the orthogonal complement of N (A) = V1 V1T .6. kAkF = p 1 = 14:5576 2 2 2 1+ 2+ 3 = 14:5945 10. kAk2 = 2.2 Orthonormal Bases for the Null Space and the Range of a Matrix In Chapter 5 we have shown how to construct orthonormal bases for the range and the null-space of a matrix A using QR decomposition. 3. 2.2) Let uj and vj be the columns of U and V in the SVD of A. 1 2 r+1 = >0 = n = 0: r (10.6.
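These four projectors can be formed directly from the computed singular vectors. The MATLAB sketch below is a minimal illustration; the tolerance used to decide which singular values count as zero is our own choice, not a prescription of the text.

% Illustration: orthonormal bases and orthogonal projectors from the SVD.
A = [1 2 3; 3 4 5; 6 7 8];            % a rank-deficient 3x3 matrix
[U, S, V] = svd(A);
s = diag(S);
r = sum(s > max(size(A))*eps*s(1));   % number of nonzero singular values = rank
U1 = U(:, 1:r);   U2 = U(:, r+1:end);
V1 = V(:, 1:r);   V2 = V(:, r+1:end);
P_range      = U1*U1';                % projection onto R(A)
P_range_perp = U2*U2';                % projection onto the orthogonal complement of R(A)
P_null       = V2*V2';                % projection onto N(A)
P_null_perp  = V1*V1';                % projection onto the orthogonal complement of N(A)
disp(norm(A*P_null))                  % A annihilates the null-space projector (up to roundoff)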

(Compute now the four @ A 0:8370 .Example 10. in particular. it is more important. to determine if the given matrix is near a matrix of a certain rank and in particular. of course.0:4379 0:3244 0 0:4625 . @ A 0: 0 0:25004082:8371 1 0 B C An orthonormal basis for the range of A = B 0:4852 0:3267 C. is the QR factorization with column pivoting. to know if it is near a rank-de cient matrix.6. which is more reliable than Gaussian elimination. nding the rank of a triangular matrix is trivial one can just read it o from the diagonal.2 6 7 8 1 = 14:5576 2= 3 0 0:2500 0:8371 1:0372 1 = 0: 0:4867 B C U = B 0:4852 0:3267 . may transform a rank-de cient matrix rank into one having full rank.6. as we will see below. Another approach.0:8111 C @ A 0:8378 . however. The most obvious and the least expensive way of determining the rank of a matrix is. Theoretically.0:8165 C @ A 0:6786 . to triangularize the matrix using Gaussian elimination and then to nd the rank of the reduced upper triangular matrix. determining the nonsingularity of a square matrix are very important and frequently arising tasks in linear algebra and many important applications. to determine the rank and rank-de ciency. The Gaussian elimination method which uses elementary transformations.0:4379 orthogonal projections yourself. As we have seen before in 682 .0:8165 C.3 The Rank and the Rank-De ciency of a Matrix Finding the rank of an m n matrix and.) 10. Unfortunately. In practice.0:0882 . this is not a very reliable approach in oating point computations.0:7870 0:4082 1 B C V = B 0:5706 .0:6106 0:4082 01 2 31 B C A = B3 4 5C @ A 0 0:4082 1 B C An orthonormal basis for the null-space of A = B . due to numerical round-o error.

3) 0 0 R1 is upper triangular and nonsingular. Since the number of nonzero singular values determines the rank of a matrix. Theorem 10. rather than knowing what the rank is. De ne where Ak = U k V T 0 0 Then (i) rank(Ak ) = k. Then r > 0 and r+1 = the question is how far A is from a matrix of rank k < r. and the dimension r of R1 is the rank of A. 1 0 0 1 B C B . Theorem 10. B C 0 C: B C k=B C B 0 C k @ A Proof.2 is generally known as the Eckart-Young Theorem (see Eckart and Young (1939)). 166-167) and our discussions on rank-revealing QR factorization in Chapter 5. that is. 1 2 = n = 0. and orthogonal matrix Q such that (10. does not determine reliably the nearness of A to a rank-de cient matrix. 1984. This result above. it is closest to A in Frobenius norm. rank(Ak ) = rank(U k V T ) = rank( k ) = k: 683 . Suppose that A has rank r. pp. and let rank(A) be r > 0. therefore.6. more meaningful to know if a matrix is near a matrix of a certain rank. because. The statement about the rank is obvious.. see again Golub and Van Loan (MC.2 Let A = U V T be the SVD of A. Let k r. The singular value decomposition exactly answers this question. The following theorems can be used to answer the question.. For details. however. = QT AP R1 R2 ! The most reliable way to determine the rank and nearness to rank-de ciency is to use the SVD. we can say that a matrix A is arbitrarily near a matrix of full rank: just change each zero singular value by a small number . It is.Chapter 5 this method constructs a permutation matrix P .6. (ii) out of all the matrices of rank k.6.

we rst prove that if A has (n . Next. kB .2.6.5. k) singular values k+1 : : : n are small. Ak2 = Proof. pp. then it is close to Ak . closest to A is given by where B = U kV T 0 1 0 0C 1 B B C . Then since B has rank k. kAk . B C C: =B k B C B 0 C k @ A 0 0 k+1: Furthermore. k) small singular values. we then have n X i=1 n. Ak2 = 1 : kAk2 Cond2 (A) 684 . then B must have rank at least r. Ak2 F j i . we show that if B is any other matrix of rank k..3 Out of all the matrices of rank k(k < r). the matrix B. let's denote the singular values of B by 1 2 k+1 = k+2 = = n = 0. the distance of B from A : kB . 2 kB .To prove the second part. See Watkins (FMC. Theorem 10.6. k2 = k+1 + + n: Thus if the F F F (n . That is. 413-414). That is. )V T k2 = k k . ij2 k+1 + 2 + n = kAk . 2 kB . then rank(B ) = k r: Corollary 10. if B is such that kB .2 The relative distance of a nonsingular matrix A to the nearest singular matrix 1 B is Cond (A) . AkF : 2 We now present a similar result (without proof) using 2-norm. Indeed. Ak2 = kU ( k . Ak2 : F F To see this. then A is 2 2 close to Ak .6.1 If the distance of the matrix B from A is less than the smallest singular value of A. Ak2 kAk . Corollary 10.. By Theorem 10. Ak2 < r .
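The truncated matrix A_k of Theorem 10.6.2 is one line of MATLAB. The sketch below (an illustration with an arbitrary matrix) forms A_k from the first k singular triplets and checks that its distance from A in the 2-norm is exactly the (k+1)st singular value.

% Illustration: best rank-k approximation from the SVD.
A = randn(6, 4);
[U, S, V] = svd(A);
k = 2;
Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';    % sum of the first k terms sigma_i * u_i * v_i'
s = diag(S);
disp([norm(A - Ak), s(k+1)])                   % ||A - A_k||_2 = sigma_(k+1)
disp([norm(A - Ak, 'fro'), norm(s(k+1:end))])  % ||A - A_k||_F = sqrt(sum of the discarded sigma_i^2)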

Implication of the above results Distance of A Matrix From the Nearest Matrix of Lower Rank The above result states that the smallest nonzero singular value gives the distance from A to the nearest matrix of lower rank. one such T perturbation is ur r vr . In fact.6. for a nonsingular n n matrix A. u3 rank(A0) = 2: 0 C 0 C A 0:0000004 3 = 0:0000004 1 0 C 0C A 1 1 0 C 0C A 1 01 0 01 C T B 3v3 = B 0 2 0 C : @ A 0 0 0 1 685 . 1. look into the smallest nonzero singular value r . then the matrix is very close to a matrix of rank r . in order to know if a matrix A of rank r is close enough to a matrix of lower rank. If this is very small. 1. Thus. n gives the measures of the distance of A to the nearest singular matrix. In particular. because there exists a perturbation of size small as j r j which will produce a matrix of rank r . Example 10.3 01 0 B A = B0 2 @ 0 0 rank(A) = 3 00 1 B U = B1 0 @ 0 0 00 1 B V = B1 0 @ 0 0 A0 = A .

T The required perturbation u3 3 v3 to make A singular is very small: 00 0 0 1 B C 10. and 1 : : : r be the nonzero singular values of A. this will result in a very substantial savings.6 B 0 0 0 C : @ A 0 0 :4 10.4 01 2 31 B C A = B3 4 5C @ A 6 7 8 686 .6.4 A= r X j =1 T j uj vj : Proof. if r is substantially less than n.6. Let A be m n (m n).1 The representation of A = expansion of A. r X j =1 T j uj vj De nition 10. and u1 : : : ur .4 An Outer Product Expansion of A and Its Consequences In many applications the matrix size is so large that the storage of A may require a major part of total available memory space. The knowledge of nonzero singular values and the associated left and right singular vectors can help store the matrix with a substantial amount of savings in storage space. Let the associated right and left singular vectors be v1 : : : vr . Since a rank-one matrix of order m n can be stored using only (m + n) storage locations rather than mn locations needed to store an arbitrary matrix.6. is called the dyadic or the outerproduct Example 10. The proof follows immediately from the relation A= U VT noting that r+1 through n are zero. respectively. Then the following theorem shows that A can be written as the sum of r matrices each of rank 1.6. Theorem 10.

:4389 0 . k) small singular T T T values of A which can be neglected. in digital image processing.5 Numerical Rank As we have just seen in the last section in practical applications that need singular values we have to know when to accept a computed singular value to be \zero". If A is n n.tkAk1 where the entries of A are correct to t digits. even when k is chosen much less than n. the storage of Ak will require 2nk locations compared to n2 locations needed to store A.4 has an important practical consequences. if the data shares large relative error.0 :2500 1 B C u1 = B :4852 C @ A 8379 0 ::4625 1 B C v1 = B :5706 C @ A :6786 = 14:5576 2 = 1:0372 3 = 0 r = 2: 1 0 :8371 1 B C u2 = B :3267 C @ A . Then the matrix Ak = 1u1 v1 + a2 u2v2 + + k uk vk is a very good approximation of A. resulting in a large amount of savings. For example. it should also be taken into consideration. 10.tkAk1 for a zero singular value. Of course. 247): 687 .6. For details see Andrews and Hunt (1977). In practical applications. Having de ned a tolerance = 10.:07870 1 B C v2 = B . p. if it is of an order of \round-o zeros" (machine epsilon). However. the matrix A generally has a large number of small singular values. we can have the following convention (see Golub and Van Loan (MC (1989)). and such an approximation can be adequate in applications. the digital image corresponding to Ak can be very close to the original image.:0882 C @ A 0:6106 01 2 31 C T T B 1 u1 v1 + 2 u2v2 = B 3 4 5 C : @ A 6 7 8 An Important Consequence Theorem 10. Suppose there are (n .6. A practical criterion would be Accept a computed singular value to be zero if it is less than or equal to 10. we can declare it to be zero.

where n = rank(A).1 = (cij ) are given by n X vikvjk T .3:3333 2 2 1 2 688 .1 01 21 B C A = B2 3C: @ A 3 4 2 2 c11 = v11 + v12 = 4:833 2 2 1 2 2 c12 = v11v21 + v12v22 = . We note here this matrix can also be computed immediately once the singular values and the left singular vectors of A have been computed.1 .1 using the QR factorization of A.1 Using the SVD Let A = U VT k be the SVD of A. (Note that C = (A A) = V k=1 Example 10. then A has numerical rank r. count the \large" singular values only.6. If this number is r. Remark: Note that nding the numerical rank of a matrix will be \tricky" if there is no 10. suitable gap between a set of singular values.2V T :) cij = 2 . then the entries of the matrix C = (AT A).7.satisfy A has \numerical rank" r if the computed singular values ~1 ~2 : : : ~ n ~1 ~2 ~r > ~r+1 ~n (10.4) Thus to determine the numerical rank of a matrix A. Computing (AT A).7 Computing the Variance-Covariance Matrix In Chapter 7 we have described a method to compute the variance-covariance matrix (AT A).

b0k2 689 . The reduced problem is trivial to solve.8.8 The Singular Value Decomposition. We have just seen that the singular values of A are needed to determine the reliably nearness to rank de ciency. Consider the least squares problem once more: Find x such that krk2 = kAx . and the Pseudo-Inverse In Chapter 7 we have seen that the full-rank linear least squares problem can be e ectively solved using the QR decomposition. the use of the SVD of A reduces the least squares problem for a full matrix A to one with a diagonal matrix : Find y such that is minimum.1 The SVD and Least Squares Problem We have brie y discussed how to solve a least-squares problem using the SVD in Chapter 7.c22 = v21 + v22 = 2:3333: 2 2 2 2 1 2 10. Thus. Then we have krk2 = k(U V T x . but pivoting will be needed to handle the rank-de cient case. Here we give a full treatment. 10. Let A = U V T be the SVD of A. U T b)k2 = k y . the Least Squares Problem. We will now see that the SVD is also an e ective tool to solve the least squares problem. both in the full rank and rank-de cient cases. b0k2 where V T x = y and U T b = b0 . b)k2 = kU ( V T x . bk2 is minimum. We have k m X X 02 k y . b0k2 = j iyi . b0ij2 + jbij i=1 i=k+1 k y .

Form b0 = U T b = B . of course. in the rank- that minimizes k y . unique. There are instances where this rank de ciency is actually desirable because it provides a rich family of solutions which might be used for optimizing some other aspect of the original problem. Algorithm 10. Find the SVD of A: 0 b0 1 B b1 C B 0C 2. Compute y = B .) 690 . b0k2 is given by: de cient case. In the full rank case the least squares solution is.C @ A yn 0 yi = bi when i 6= 0 i yi = arbitrary when i = 0.) Of course. Then the family of least squares solutions is (Note that in the full-rank case. The above discussion can be summarized in the following algorithm. the solution can be recovered from x = V y: Since corresponding to each \zero" singular value i yi can be set arbitrarily. Thus the vector 0y 1 B y1 C B C y = B .1 C 3. the family has just one number. C choosing @ A yn b0m A = U VT: 8 b0 1 > i < when i 6= 0 C yi = > i A: : arbitrary..1 Least Squares Solutions Using the SVD 1.8. we will have in nitely many solutions to the least squares problem.C @ A 0y 1 B .. when i = 0: x = V y: 4.where k is the number of nonzero singular values of A.2 C. once y is computed.C B. B. (Note that when k < n yk+1 through ym do not appear in the above expression and therefore do not have any e ect on the residual.2 C B C B.

= 7:5358 2 = 0:4597 3 = 0: 1 A is rank-de cient.1 01 2 31 B C A = B2 3 4C @ A 1 2 3 061 B C b = B9C: @ A 6 1. from above. it takes about An Expression for the Minimum Norm Least Squares Solution It is clear from Step 3 above that in the rank-de cient case.0:8546 0:4082 1 B C V = B 0:5470 .1) Example 10. 0 :4956 :5044 :7071 1 B C U = B :7133 .0:7071 2. we have the following expression for the minimum 2-norm solution: Minimum Norm Least-Squares Solution Using the SVD x= where r = rank(A).0:2547 0)T : 3. (In deriving this op-count.2mn2 + 4n3 ops to solve the least squares problem. Thus. the minimum 2-norm least squares solution is the one that is obtained by setting yi = 0.:7008 0:0000 C @ A 0:4956 0:5044 .8.8.0:5541 0): 0 :3208 .0:1847 .) Flop-count: Using the Golub-Kahan-Reinsch method to be described later. when A is m n and m n. The minimum 2-norm least-squares solution is V y = (1 1 1)T : 691 . it is noted that the complete vector b does not need to be computed then only the columns of U that correspond to the nonzero singular values are needed in computation.0:8165 C : @ A 0:7732 0:4853 0:4082 4. r X uT b0i i i=1 i vi (10. whenever i = 0. y = (1:6411 . b0 = U T b = (12:3667 .
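A hedged MATLAB sketch of Algorithm 10.8.1 follows; it reproduces the minimum 2-norm solution of Example 10.8.1. The tolerance used to decide which singular values are treated as zero is our own choice.

% Illustration of Algorithm 10.8.1: least squares via the SVD (rank-deficient case).
A = [1 2 3; 2 3 4; 1 2 3];   b = [6; 9; 6];
[U, S, V] = svd(A);
s = diag(S);
tol = max(size(A)) * eps * s(1);
r = sum(s > tol);                     % number of nonzero singular values
bp = U' * b;                          % b' = U^T b
y = zeros(size(A,2), 1);
y(1:r) = bp(1:r) ./ s(1:r);           % y_i = b'_i / sigma_i; y_i = 0 where sigma_i = 0 (minimum norm)
x = V * y;                            % minimum 2-norm least squares solution
disp(x')                              % close to [1 1 1], as in Example 10.8.1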

2 Solving the Linear System Using the Singular Value Decomposition Note that the idea of using the SVD in the solution of the least squares problem can be easily applicable to determine if a linear system Ax = b has a solution.8. and.8. and a solution of Ax = b can be computed by solving the diagonal system y = b0 rst. the system Ax = b is consistent i the diagonal system y = b0 is consistent (which is trivial to check). the A formal de nition of the pseudoinverse of any matrix A (whether it has full rank or not) can be given as follows: 692 .10. in general. this approach will be much more expensive than the Gaussian elimination and QR methods. However.3 The SVD and the Pseudoinverse In Chapter 7 we have seen that when A is an m n (m pseudoinverse of A is given by: Ay = (AT A). if so.1AT : n) matrix having full rank. how to compute it. Thus if A= U VT then is equivalent to Ax = b y = b0 where y = V T x and b0 = U T b: So. to solve a linear system. 10. and then recovering x from x = vy . That is why the SVD is not used in practice.

XAX = X: 3. and therefore is the pseudoinverse of A.1V T U T = V . use 1 = 0) j V yU T where y = diag( 1 ) j (10. (XA)T = XA: m matrix X The pseudoinverse of a matrix always exists and is unique.1 ): The process for computing the pseudoinverse Ay of A using the SVD of A can be summarized as follows. Algorithm 10.3) (Note that in this case. (AX )T = AX: 4. Note that this expression for the pseudoinverse coincides with A. Let A = U V T be the SVD of A then it is easy to verify that the matrix (if j = 0.8.1( T ).1 when A is nonsingular.1AT = (V T U T U V T ). We now show that the SVD provides a nice expression for the pseudoinverse. Find the SVD of A: A = U VT: 693 .2) satis es all the four conditions. y = .1 = (AT A). AXA = X . 2.8.2 Computing the Pseudoinverse Using the SVD 1. because A.Four Properties of the Pseudoinverse The pseudoinverse of an m n matrix A is an n satisfying the following conditions: 1.8.1V T V T U T = V .1U T (10.

1) and (10. 1 0 0 : : : r are the r nonzero singular values of A.6 3 0 0 0 . 1 1 3 3 B 3 9 9 CB CB C B C Ay = B .1 0 0 C B 0 2 0 C B . 6 1 .6 . 1 C @ 9 3 9 A@ 2 A@ A @ 3 6A 1 .9 1 0 . 6 .1 C = B 0 2 . 1 .2.2).36 19 .1 0 0 V T = V: Thus 0 0 0 0 1 ..8. 1 r 1 C C C C C 0C C C C A Example 10.8.1 0 0 0 2 1 9 9 3 3 The Pseudoinverse and the Least Squares Problem >From (10. 6 .1 0 1 0 0 . 6 C C B CB CB 9 3 A=@ 9A A@ A@ 6 . 6 1 9 B ..1 0 1 B C U T = B 0 0 . it follows that the minimum 2-norm solution of the linear least squares problem obtained through the SVD in the previous section is Ayb where Ay is the pseudoinverse of A. 0 B B B B y = diag B B B B B @ 1 1 1 2 .1 1 0 1 0 0 1 0 1 .6 . 6 C B 0 1 0 C B 0 0 .1 C @ A . Compute where 3.8. 6 1 0 1 0 0 1 0 0 . 694 .2 Find the pseudoinverse of 0 0 0 .1 0 0 0 0 9 3 A = U VT Ay = V yU T 01 0 01 y = B0 2 0C B 1 C A @ 0 0 . Compute Ay = V y U T .
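Algorithm 10.8.2 is equally short in MATLAB. The sketch below is an illustration only (the test matrix and the truncation tolerance are our own choices); it forms V Sigma^dagger U^T and compares the result with MATLAB's built-in pinv.

% Illustration of Algorithm 10.8.2: the pseudoinverse from the SVD.
A = [0 0 -6; 0 1 0; 9 0 0; 0 0 0];    % any m-by-n matrix will do
[U, S, V] = svd(A);
s = diag(S);
tol = max(size(A)) * eps * max(s);
sdag = zeros(size(A'));               % Sigma^dagger is n-by-m
for i = 1:sum(s > tol)
    sdag(i, i) = 1 / s(i);            % invert only the nonzero singular values
end
Adag = V * sdag * U';                 % A^dagger = V * Sigma^dagger * U^T
disp(norm(Adag - pinv(A)))            % agrees with MATLAB's pinv up to roundoff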

8.9 Computing the Singular Value Decomposition As mentioned before.9.Example 10. it is important to know how the SVD can be computed in a numerically e ective way. as we have seen in Chapter 7. etc.1) .0:9167 C 0:3333 . However.. the ability of the singular values and the singular vectors to compute e ectively the numerical rank. is what makes the singular value decomposition so attractive in many practical applications.0:9167 B Ay = B .0:1667 C A . Therefore.1 Computing the SVD from the Eigenvalue Decomposition of AT A We have seen (Theorem 10.3 1 2 0 .0001. some vital information may be lost due to round-o error in the process of computing AT A. It is thus natural to think of computing the singular values by nding the eigenvalues of the symmetric matrix AT A.1 1:0001 1:0000 A= 1:0000 1:0001 The singular values of A are 2. The following simple example will illustrate the phenomenon: Example 10.1) that the singular values of a matrix A are just the nonnegative square roots of the eigenvalues of the symmetric matrix AT A. the orthonormal bases of the range and the null spaces. the condition number. this is not a numerically e ective approach because.0010 and 0. Now ! T A = 2:0002 2:0002 A 2:0002 2:0002 695 ! (10.9.3.0:6667 0:5833 061 B C b = B8C @ A 011 B C Ay b = B 1 C : @ A 1 10. In this section we will discuss this most important aspect of the SVD. 10.9. the distance of a matrix A from a matrix of lower rank.0:1667 @ 0:5833 The minimum 2-norm least squares solution is 01 2 31 B C A = B2 3 3C @ A 3 6 1 1:3333 .

The algorithm comes in two stages: Stage I: The m n matrix A(m n) is transformed to an upper bidiagonal matrix by orthogonal equivalence: ! B U0Tm m Am n V0n n = where B is an n n bidiagonal matrix given by 0 (10... C .. C 1 matrix : Stage II: The bidiagonal matrix B is further reduced by orthogonal equivalence to a diagonal U1T BV1 = = diag( 1 : : : n): (10. Stage I is known as the Golub-Kahan bidiagonal procedure. We will call the combined two-stage procedure by the Golub-Kahan-Reinsch method. .(to four signi cant digits). High relative accuracy of the singular values of bidiagonal matrices.. The eigenvalues of AT A are 0..0004. C . The singular vector matrices U and V are given U = U0U1 V = V0V1 (10.1 n C A 0 0 bnn 0 . 696 .3) by Of course.9. and Stage II is known as the Golub-Reinsch algorithm.5) Remark: In the numerical linear algebra literature. The LINPACK SVD routine is based on this algorithm. B .0000 and 4.9..9. A Fortran program implementing the algorithm appears in Businger and Golub (1969). @ 0 b12 .2) 0b 11 B0 B B .4) (10.9. 10. . is the matrix of singular values. C: .2 The Golub-Kahan-Reinsch Algorithm The following algorithm is nowadays a standard computational algorithm for computing the singular values and the singular vectors... B . b n. The following result due to Demmel and Kahan (1990) shows that the singular values of a bidiagonal matrix can be computed with very high accuracy. The ALGOL codes for this procedure has been described in a paper by Golub and Reinsch (1970).9.

and bi i+1 + bi i+1 = 2i bi i+1 j = 0.1 max(j j j . 697 . a Householder matrix V01 is constructed such that 0 B0 B B (1) = U A = B 0 B A 01 B B B0 @ 1 C C C C C C C C A 0 B0 B B B A(2) = A(1) V01 = B 0 B B B0 @ 0 0 01 B C B 0 C B0 C B C B C=B 0 C B C B C B0 A @ 0 0 0C C 1 C C C C C C C A A0 The process is now repeated with A(2) that is. 6 2n.1j): Let Let = i=1 i 1 n be the singular values of B and i 0 let 1 n 0 be the singular values of B + B .9.1bii.7) and V0 = V01V02 V0 n. Suppose that bii + bii = 2i. Let B = ( bij ).8) Let's illustrate the construction of U01 V01 and U02 V02.2: (10.9.6) Reduction to Bidiagonal Form The matrices U0 and V0 in Stage I are constructed as the product of Householder matrices as follows: U0 = U01U02 U0n (10.1 Let B = (bij) be an n n bidiagonal matrix. Then i i 0 i i = 1 : : : n: (10. and their role in the bidiagonalization process with m = 5 and n = 4: First. a Householder matrix U01 is constructed such that 0 Next.9.9.Theorem 10.

The process is continued until the bidiagonal matrix B is obtained.0:3410 C @ A .0:8847 1 B C = B .1:8154 01 1 0 0 B C = B 0 .0:3410 0:3180 0 .0:6470 0:7625 C @ A 0 .0:9077 . U01 A(1) 0 .0:1474 . Example 10. V01 A(2) .8:2567 .Householder matrices U02 and V02 are constructed so that 0 0 01 B0 C B 0C B C C (2) V = B 0 0 B C U02A 02 B C B C B0 0 C @ A 0 0 Of course.2 01 2 31 B C A = B3 4 5C: @ A 6 7 8 Step 1. Thus.9. that is.1:0002 0:0245 C A @ 0 1:9716 .0:5571 0:6470 1 0 .6:7823 . in this step we will work with the 4 3 matrix A0 rather than the matrix A(2) .9:7312 1 B C = U01A = B 0 0:0461 0:0923 C @ A 0 .0:8847 .0:4824 698 Step 2.0:4423 0:8295 . by embedding them in identity matrices of appropriate orders. 0 0 the orthogonal matrices U02 and V02 will be constructed rst such that 0 1 0 B C B0 C B C 0 0 U02 A0V02 = B C B0 C @ A 0 0 0 then U02 and V02 will be constructed from U02 and V02 in the usual way.0:4423 .6:7823 12:7620 0 C B = A(1) V01 = B 0 .

The details of the process can be found in the book by Golub and Van Loan (MC 1984 pp.9) B C .6:7823 12:7620 1 0 B C B = U02A(2) = U021A(1)V01 = U02U01AV01 = B 0 . The ith iteration is equivalent to applying the implicit symmetric QR. Then the Wilkinson-shift for the symmetric matrix B T B is the eigenvalue of 2 2 right-hand corner submatrix ! 2 + 2 n. To simplify the notation. it successively constructs a sequence of bidiagonal matrices fBk g such that each Bi has possibly smaller o -diagonal entries than the previous one. if bk k+1 = 0. then B can be written as the direct sum of two bidiagonal matrices B1 and B2 and (B ) = (B1 ) (B2).0:0508 0:9987 C @ A 0 0:9987 0:0508 0 . C B C C B=B (10.9.1 n.Step 3.. We outline the process brie y in the following.. Finding the SVD of the Bidiagonal Matrix The process is a variant of the QR iteration. let's write 0 1 1 2 B .1:0081 . U02 01 1 0 0 B C = B 0 . described in Chapter 8..10) 2+ 2 n n..9. Starting from the n n bidiagonal matrix B obtained in Stage 1.1:8178 C @ A 0 0 0 Note that from the above expression of B . The process has guaranteed convergence and the rate of convergence is quite fast..1 (10. In the following just one iteration step of the method is described. The e ective tridiagonal matrices are assumed to be unreduced (note that the implicit symmetric QR works with unreduced matrices) otherwise we would work with decoupled SVD problems.1 n n 2 that is closer to 2 + n: n 699 n .1 n n.. it immediately follows that zero is a singular value of A. B nC @ A be an n n bidiagonal matrix. forming the product BiT Bi explicitly. of course. with Wilkinson shift to the symmetric tridiagonal matrix BiT Bi without. For example. 292-293). .

... That is. B J2 B C B . that is form J10 ! 1 such that J 0 1 2.. ....2 2... B C B C B .) The idea is now to chase the nonzero entry `+' down the subdiagonal to the end of the matrix by applying the Givens rotations in an appropriate order as indicated by the following: 0 1 + B C . .. (10.. ..... C B C @ A where + indicates a ll-in. C TB = B C 3. Apply J1 to the right of B ..11) (The ll-in is at the (2...1) position of B.. 1 C C C C C C C C A 700 (Fill-in at (2. C B C B BJ1 = B C B . B C B C B ... 1 1 2 ! = ! 0 : : B BJ1 This will give a ll-in at the (2. C B C @ A (Fill-in at (3...2) position) 0 B B B TB = B B 4. .3) position) 1 the B C .... C B C @ A (Fill-in at 0 (1.. B J4 B B B @ + . . . we will have 0 1 B + .4) position) ..1.9. C B C B C B . C B + C B BJ3 = B C B .1) position.. Compute a Givens Form J1 = rotation J 0 0 0 In....

:6480 1 B C B J2B = B 0 1:5361 .:9764 0 C @ A 0 0 1 0 1:3020 2:2238 . The Wilkinson shift = 15:0828 0 .. 0 .3) position.3 01 2 01 B C B = B0 2 3C @ A 0 0 1 1.:9901 :1406 0 1 B C J1 = B .:2160 0 1 B C J2 = B 0:2160 .1:2713 .:1406 .:9764 .) 0 0 1 3. 1 C C C C C C C C A At the end of one iteration we will have a new bidiagonal matrix B orthogonally equivalent to the original bidiagonal matrix B : (Fill-in at the (4.1:8395 0 1 B C B BJ1 = B .:9901 0 C @ A 0 0 0 2...2 J4T J2T )B(J1J3 Jn.9.) And so on.3) position) 0 0 1 01 0 1 0 B C J3 = B 0 0:9601 :2797 C @ A 0 .1) position.:2797 :9601 701 . + .1 ): Example 10.. B = (J2Tn. 0 .1:9801 3 C @ A (The ll-in at the (2.2:9292 C @ A (The ll-in at (1.0 B B B B B BJ5 = B B B B @ .0:2812 .

. that . The estimated op-count is: 2m2n + 4mn2 + 9 n3 (m n). A criterion for o -diagonal negligibility given in Golub and Van Loan (MC 1984 p. There are applications (e.2:4812 C @ A 0 :1210 :9926 0 0 Stopping Criterion :6646 The algorithm typically requires a few iteration steps before the o -diagonal entry n becomes negligible.g. Stage II is iterative and is quite cheap. least squares) where all three matrices are not explicitly required. ^ ^ ^ ^ A + E (U + U ) ^ (V + V ) 702 ^ ^ U ^ (V )T .:2797 :9601 01 0 1 0 B C J4 = B 0 :9926 .1j) where is a small multiple of the machine precision . Flop-Count: The cost of the method is determined by the cost of Stage I.(Fill-in at the (3. This count includes 2 the cost of U and V . and V appears in Golub and Van Loan (MC 1984. is. Round-o Property: It can be shown (Demmel and Kahan (1990)) that the computed SVD.2) position) 4. 434) is: A Criterion for Neglecting an O -Diagonal Entry Accept an o -diagonal i to be zero if j ij (j i j + j i. 0 1:3020 2:3163 1 0 B C B BJ3 = B 0 2:2942 . 175).:1210 C @ A 0 1:3020 2:3163 1 0 B C B J4B = B 0 2:3112 .2:3825 C @ A 0 . p. A nice table of di erent op-count of the Golub-Kahan-Reinsch SVD and the Chan SVD (to be described in the next section) for di erent requirements of U . produced by the Golub-Kahan-Reinsch algorithm is nearly the exact SVD of A + E .

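As a small illustration of the negligibility criterion stated above, the following MATLAB fragment deflates a bidiagonal matrix by setting its negligible superdiagonal entries to zero. This is only a sketch (not the book's or MATCOM's code); the function name and the default tolerance 10*eps are our own illustrative choices.

function B = deflate_bidiag(B, tol)
% Set to zero every superdiagonal entry beta_i of the upper bidiagonal B with
%   |beta_i| <= tol*(|alpha_(i-1)| + |alpha_i|)
% (the Golub-Van Loan negligibility criterion).
if nargin < 2, tol = 10*eps; end       % tol: small multiple of machine precision
n = size(B,1);
for i = 2:n
    if abs(B(i-1,i)) <= tol*(abs(B(i-1,i-1)) + abs(B(i,i)))
        B(i-1,i) = 0;   % B then decouples into smaller bidiagonal SVD problems
    end
end
end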
Entrywise Errors of the Singular Values

Furthermore, let σ̃_i be a computed singular value. Then

        |σ̃_i − σ_i| ≤ p(n) ||A||_2 μ = p(n) σ_max μ,

where p(n) is a slowly growing function of n. The result says that a computed singular value cannot differ from the true one by an amount larger than μ p(n) σ_max. Thus, the singular values which are not much smaller than σ_max will be computed by the algorithm quite accurately; the small ones may be inaccurate.

10.9.3 The Chan SVD

T. Chan (1982) observed that the Golub-Kahan-Reinsch algorithm for computing the SVD described in the last section can be improved, in the case m > 13n/8, if the matrix A is first factored into QR and the bidiagonalization is then performed on the upper triangular matrix R. The improvement naturally comes from the fact that the work required to bidiagonalize the triangular matrix R is much less than that required to bidiagonalize the matrix A directly. Of course, once the SVD of R is obtained, one can easily retrieve the SVD of A. Thus the Chan-SVD can be described as follows:

1. Find the QR factorization of A:

        Q^T A = [ R;  0 ].                                              (10.9.12)

2. Find the SVD of R using the Golub-Kahan-Reinsch algorithm:

        R = X Σ Y^T.                                                    (10.9.13)

Then the singular values of A are just the singular values of R, and the singular vector matrices U and V are given by

        U = Q [ X  0;  0  I_{m-n} ],     V = Y.                          (10.9.14)

Flop-Count: The Chan-SVD requires about 2m²n + 11n³ flops to compute U, Σ and V, compared to the 2m²n + 4mn² + (9/2)n³ flops required by the Golub-Kahan-Reinsch SVD algorithm. Clearly, there will be savings with the Chan-SVD when m is sufficiently larger than n.

Remark: The Chan-SVD should play an important role in several applications, such as in control theory, where typically m >> n.

(Tony Chan, a Chinese-American mathematician, is a professor of mathematics at the University of California, Los Angeles. He developed this improved algorithm for the SVD when he was a graduate student at Stanford University.)

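The two steps of the Chan-SVD translate almost directly into MATLAB. The following fragment is a minimal sketch, not the MATCOM routine chansvd; the function name and the handling of the block sizes are our own choices.

function [U, S, V] = chan_svd_sketch(A)
% Chan SVD sketch: QR factorization of A first, then the SVD of the small
% triangular factor R.  A is m x n with m >= n.
[m, n] = size(A);
[Q, R] = qr(A);                  % Q is m x m, R is m x n (zero below row n)
[X, S1, V] = svd(R(1:n, :));     % SVD of the n x n triangular part
U = Q * blkdiag(X, eye(m - n));  % U = Q * diag(X, I_(m-n))
S = [S1; zeros(m - n, n)];       % m x n matrix of singular values
end

Applied to the 3 x 2 matrix of Example 10.9.4 below, this sketch reproduces the singular values 7.6656 and 0.4881 (the singular vectors may differ in sign).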
Example 10.9.4

        A = [ 1  2;  2  3;  4  5 ].

1. The QR factorization of A (Q^T A = R):

        Q = [ -0.2182  -0.8165  -0.5345;  -0.4364  -0.4082  0.8018;  -0.8729  0.4082  -0.2673 ],
        R = [ -4.5826  -6.1101;  0  -0.8165;  0  0 ].

2. The SVD of R:  R = X Σ Y^T, with

        X = [ -0.9963  0.0856  0;  -0.0856  -0.9963  0;  0  0  1.0000 ],
        Σ = [ 7.6656  0;  0  0.4881;  0  0 ],
        Y = [ 0.5956  -0.8033;  0.8033  0.5956 ].

3. The singular vector matrices of A:

        U = Q X = [ 0.2873  0.7948  -0.5345;  0.4698  0.3694  0.8018;  0.8347  -0.4814  -0.2673 ],
        V = Y.

The singular value decomposition of A is A = U Σ V^T; the singular values of A are 7.6656 and 0.4881.

Flop-Count for the Least-Squares Problem Using the SVD

The number of flops required to solve the linear least squares problem (m ≥ n) using the SVD computed by the Golub-Kahan-Reinsch algorithm is about 2mn² + 4n³. If the Chan-SVD is used, the flop-count is about mn² + (17/3)n³. Recall that the flop-count for the normal equations approach to the least-squares problem is mn²/2 + n³/6, and that using QR factorization (with Householder matrices) it is mn² − n³/3. Again, a nice table comparing the efficiency of the different least squares methods appears in Golub and Van Loan (MC 1984, p. 177).

10.9.4 Computing the Singular Values with High Accuracy: The Demmel-Kahan Algorithm

The round-off analysis of the Golub-Kahan-Reinsch algorithm tells us that the algorithm computes the singular values that are not much smaller than ||A||_2 quite accurately; however, the smaller singular values may not be computed with high relative accuracy. Demmel and Kahan (1990), in an award-winning paper, presented a new algorithm that computes all the singular values, including the smaller ones, with very high relative accuracy. The new algorithm corresponds to the Golub-Kahan-Reinsch algorithm when the shift is zero; they call it "QR iteration with a zero shift". It is based on the remarkable observation that when the shift is zero, no cancellation due to subtraction occurs, and therefore the very small singular values can be found (almost) as accurately as the data permits.

(James Demmel is a professor of Computer Science at the University of California, Berkeley. He has won several awards, including the prestigious Householder Award (jointly with Ralph Byers of the University of Kansas) in 1984, the Presidential Young Investigator Award in 1986, the SIAM (Society for Industrial and Applied Mathematics) award for the best linear algebra paper in 1988 and (jointly with William Kahan) in 1991, and the Wilkinson Prize in 1993.)

Indeed, the effect of the zero shift is remarkable. This is demonstrated in the following. Let J1' be the Givens rotation such that

        ( α1  β2 ) J1' = ( *  0 ),

and compute J1 = diag(J1', I_{n-2}). Then B J1 has a fill-in at the (2,1) position as before, but note that its (1,2) entry now is zero instead of being nonzero. Let us now apply J2 to the left of B J1 to zero the nonzero entry at the (2,1) position, as before. There is now a fill-in at the (1,3) position, as in the Golub-Kahan-Reinsch algorithm; but because rows 1 and 2 of J2 B J1, restricted to columns 2 and 3, are multiples of one another, the 2 x 2 matrix

        [ b12  b13;  b22  b23 ]

has rank 1. It follows that when J3 is applied to J2 B J1 on the right to zero out the (1,3) entry, it will zero out the (2,3) entry as well. Thus we now have one extra zero on the superdiagonal compared with the same stage of the Golub-Kahan algorithm. This phenomenon continues: the rotation J4 zeros out the (3,2) fill-in, the next right rotation zeros out the resulting (2,4) fill-in together with the (3,4) entry, and so on. This zero propagates through the rest of the algorithm and is indeed the key to the effectiveness of the method.

Example 10.9.5

        B = [ 1  2  0;  0  2  3;  0  0  1 ].

1.      J1 = [ 0.4472  -0.8944  0;  0.8944  0.4472  0;  0  0  1 ],
        B J1 = [ 2.2361  0  0;  1.7889  0.8944  3;  0  0  1 ]
   (there is a fill-in at the (2,1) position, but the (1,2) entry is zero).

2.      J2 = [ 0.7809  0.6247  0;  -0.6247  0.7809  0;  0  0  1 ],
        J2 (B J1) = [ 2.8636  0.5588  1.8741;  0  0.6984  2.3426;  0  0  1 ].
   Note that the rank of the 2 x 2 matrix [ 0.5588  1.8741;  0.6984  2.3426 ] is one.

3.      J3 = [ 1  0  0;  0  0.2857  -0.9583;  0  0.9583  0.2857 ],
        (J2 B J1) J3 = [ 2.8636  1.9557  0;  0  2.4445  0;  0  0.9583  0.2857 ]
   (note that both the (1,3) and (2,3) entries are zero).

4.      J4 = [ 1  0  0;  0  0.9310  0.3650;  0  -0.3650  0.9310 ],
        J4 (J2 B J1) J3 = [ 2.8636  1.9557  0;  0  2.6256  0.1043;  0  0  0.2660 ].

Note that the singular values of B are 3.8990, 1.9306 and 0.2657; after just one zero-shift step the trailing diagonal entry 0.2660 already approximates the smallest singular value quite well.

Stopping Criterion

Let the nonzero diagonal entries of B be denoted by α1, ..., α_n, and the nonzero off-diagonal entries by β2, ..., β_n. Define the two sequences {μ_j} and {λ_j} as follows:

1st sequence:   μ_1 = |α_1|;   for j = 1 to n-1:   μ_{j+1} = |α_{j+1}| ( μ_j / (μ_j + |β_{j+1}|) ).

2nd sequence:   λ_n = |α_n|;   for j = n-1 down to 1:   λ_j = |α_j| ( λ_{j+1} / (λ_{j+1} + |β_{j+1}|) ).

A Criterion for Setting an Off-Diagonal Entry to Zero. Set β_{j+1} = 0 if

        |β_{j+1}| / μ_j ≤ ε     or     |β_{j+1}| / λ_{j+1} ≤ ε,

where ε < 1 is the desired relative accuracy.

Remark: For details of how to apply these two different criteria in different situations in practical implementations, see Demmel and Kahan (1990).

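The two recurrences and the test translate directly into MATLAB. The fragment below is only a sketch of this test (it is not the MATCOM code); alpha and beta hold the diagonal and superdiagonal of B, with beta(j+1) sitting between alpha(j) and alpha(j+1), and tol plays the role of ε.

function drop = dk_negligible(alpha, beta, tol)
% Demmel-Kahan style negligibility test for the superdiagonal of a bidiagonal
% matrix: drop(j+1) is true when beta(j+1) may be set to zero.
n = length(alpha);
mu = zeros(n,1);  lam = zeros(n,1);
mu(1) = abs(alpha(1));
for j = 1:n-1                               % forward recurrence
    mu(j+1) = abs(alpha(j+1)) * (mu(j) / (mu(j) + abs(beta(j+1))));
end
lam(n) = abs(alpha(n));
for j = n-1:-1:1                            % backward recurrence
    lam(j) = abs(alpha(j)) * (lam(j+1) / (lam(j+1) + abs(beta(j+1))));
end
drop = false(n,1);
for j = 1:n-1
    drop(j+1) = abs(beta(j+1))/mu(j) <= tol || abs(beta(j+1))/lam(j+1) <= tol;
end
end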
Rate of convergence: The convergence of the last off-diagonal element to zero is linear, with constant factor σ_n² / σ_{n-1}². If there is a cluster of singular values of multiplicity m different from the remaining ones, then the convergence will be linear with constant factor σ_{n-m+1}² / σ_{n-m}².

Round-off error: Let σ_1 ≥ σ_2 ≥ ... ≥ σ_n be the singular values of B and let σ̂_i, i = 1, ..., n, be the singular values of the computed bidiagonal matrix B̂ obtained by applying one implicit zero-shift QR step to B. It has been shown by Demmel and Kahan (1990) that, if w = 69 n² μ < 1, then

        |σ_i − σ̂_i| ≤ ( w / (1 − w) ) σ_i,     i = 1, ..., n.

Moreover, if after k iterations the singular values of the current bidiagonal matrix B_k are σ_{k1}, ..., σ_{kn}, then, when w = 69 n² μ < 1,

        |σ_i − σ_{ki}| ≤ ( (1 − w)^{-k} − 1 ) σ_i ≈ 69 k n² μ σ_i,     i = 1, ..., n.

The above result basically states that "the relative error in the computed singular values can only grow with the square of the dimension." This is a rather pessimistic result. The authors have given another round-off result which states that, as the algorithm approaches convergence, "errors do not accumulate at all," and the relative error in the computed σ_i is bounded by c₀ n μ / (1 − c₀ n μ), where c₀ is another modest constant. For details and the proofs of these results, the reader is referred to the paper by Demmel and Kahan (1990).

10.9.5 The Differential QD Algorithm

Recently Fernando and Parlett (1992) have reported an algorithm, based on the differential form of the quotient-difference algorithm, that computes all the singular values with maximal relative accuracy. This algorithm is claimed to be at least four times faster than the Golub-Kahan-Reinsch algorithm. Unfortunately, we cannot describe the algorithm here, as the machinery needed to explain it has not been developed in this book.

10.9.6 The Bisection Method

To conclude the section, we remark that the bisection method for the symmetric eigenvalue problem discussed in Chapter 8 can be adapted to compute the singular values of a matrix. Note that the singular values of the upper bidiagonal matrix

        B = [ b11  b12;  b22  b23;  ...;  b_{n-1,n-1}  b_{n-1,n};  b_nn ]

are the positive eigenvalues of

        [ 0    B^T
          B    0   ].

The latter matrix can be permuted to obtain a symmetric tridiagonal matrix T with zero diagonal entries and off-diagonal entries b11, b12, b22, b23, ..., b_{n-1,n}, b_nn. Thus the bisection method described earlier in Chapter 8 can now be used to find the positive eigenvalues of T, that is, the singular values of B. The advantages of the bisection method are its simplicity, accuracy and inherent parallelism; however, it is inefficient in sequential computation compared to QR-based algorithms. For details, see a recent paper by Li, Rhee and Zeng (1993).

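The equivalence just described is easy to check numerically. The following MATLAB fragment is a small illustration only: it uses eig on the full augmented matrix rather than a genuine bisection routine, so it demonstrates the relationship, not the bisection algorithm itself.

% Singular values of a bidiagonal B as the positive eigenvalues of [0 B'; B 0].
B = [1 2 0; 0 2 3; 0 0 1];          % the bidiagonal matrix of Example 10.9.5
n = size(B,1);
C = [zeros(n) B'; B zeros(n)];      % symmetric augmented matrix
ev = sort(eig(C), 'descend');
sv = ev(1:n)                        % 3.8990, 1.9306, 0.2657  (same as svd(B))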
10.10 Generalized SVD

The SVD Theorem (Theorem 10.2.1) can be generalized for a pair of matrices A and B, and this generalized SVD is useful in certain applications such as constrained least squares problems. The generalization was first obtained by Van Loan (1976). We merely state the theorem here without proof.

Theorem 10.10.1 (Generalized Singular Value Decomposition Theorem) Let A and B be, respectively, matrices of order m x n and p x n (m ≥ n). Then there exist orthogonal matrices U and V and a nonsingular matrix W such that

        U^T A W = C = diag(c_1, ..., c_n),   c_i ≥ 0,
        V^T B W = D = diag(d_1, ..., d_q),   d_i ≥ 0,

where q = min(p, n), r = rank(B), and d_1 ≥ ... ≥ d_r > d_{r+1} = ... = d_q = 0. The elements c_1/d_1, c_2/d_2, ..., c_r/d_r are called the generalized singular values of A and B.

For a proof, see Golub and Van Loan (MC 1984, pp. 318-319).

(Charles F. Van Loan is a professor of Computer Science at Cornell University. He is the co-author of the celebrated book "Matrix Computations".)

10.11 Review and Summary

In this section we summarize, for easy reference, the most important results discussed in this chapter.

1. Existence and uniqueness of the SVD: The singular value decomposition of a matrix A always exists (Theorem 10.2.1):

        A = U Σ V^T.

The singular values (the diagonal entries of Σ) are unique, but U and V are not unique.

2. Relationship of the singular values and singular vectors with the eigenvalues: The singular values of A are the nonnegative square roots of the eigenvalues of A^T A.

3. Sensitivity of the singular values: The singular values are insensitive to small perturbations (Theorems 10.5.1 and 10.5.2).

4. Applications of the SVD: The singular values and singular vectors of a matrix A are the most reliable tools for determining the (numerical) rank and the rank-deficiency of A; finding orthonormal bases for the range and the null space of A; finding the distance of A from another matrix of lower rank (in particular, the nearness to singularity of a nonsingular matrix); solving both full-rank and rank-deficient least squares problems; and finding the pseudoinverse of A. These remarkable abilities, and the fact that the singular values are insensitive to small perturbations, have made the SVD an indispensable tool in a wide variety of application areas such as control and systems theory, statistics, signal processing, image processing, biomedical engineering, etc.

5. Computing the SVD: The most widely used approach for computing the SVD of A is the Golub-Kahan-Reinsch algorithm. The algorithm comes in two phases: in phase 1 the matrix A is reduced to a bidiagonal matrix by orthogonal equivalence, and in phase 2 the bidiagonal matrix is further reduced to a diagonal matrix by implicit QR iteration with the Wilkinson shift. Unfortunately, very tiny singular values may not be computed with very high relative accuracy by this method. A modification of this method, known as the zero-shift QR iteration or the QR iteration with a zero shift, was proposed by Demmel and Kahan in 1990; the Demmel-Kahan method computes all the singular values with high relative accuracy.

10.12 Some Suggestions for Further Reading

As mentioned before, applications of the SVD are varied, and there are books in each area of application where the SVD plays an important role. Discussions on the mathematical applications of the SVD, such as finding the numerical rank, the nearness to singularity, and orthonormal bases for the range and null space, are contained in all numerical linear algebra books; the books by Golub and Van Loan (MC) and Watkins (FMC), in particular, treat these aspects very well.

For some applications of the SVD to system identification and signal processing problems, see the interesting survey paper on applications of the singular value decomposition in identification and signal processing by Joos Vandewalle and Bart De Moor (1988). Two important books in the area of "SVD and signal processing" are: (1) SVD and Signal Processing: Algorithms, Applications and Architecture, edited by Ed. F. Deprettere, Elsevier Science Publishers (North-Holland), Amsterdam, 1988; and (2) SVD and Signal Processing II: Algorithms, Analysis and Applications, edited by R. Vaccaro, Elsevier, Amsterdam, 1991. See also the book Linear Algebra in Signals, Systems, and Control, edited by B. N. Datta et al., SIAM, 1988, and the books in the Prentice-Hall Information and System Sciences Series edited by Thomas Kailath.

For applications to image processing, see the book Fundamentals of Digital Image Processing by A. K. Jain, Prentice-Hall, 1989, and Digital Image Processing by H. C. Andrews and B. R. Hunt, Prentice-Hall, 1977.

For applications of the SVD to classical control problems, see the earlier survey of Klema and Laub [25]. For applications of the SVD to the robust pole assignment problem, see Kautsky, Nichols and Van Dooren (1985) and Hammarling (1985); see also the recent paper of Van Dooren (1990). The SVD also plays an important role in modern control theory, especially in H-infinity and robust control theory; curious readers are referred to a growing number of interesting papers in these areas. For some statistical applications, see the excellent survey paper by Golub (1969). For discussions on the sensitivity of singular values, see Stewart (1979) and (1984).

Many matrices arising in signal processing, control, and systems theory applications are structured (Toeplitz, Hankel, etc.). Finding efficient and reliable algorithms for different types of computations that can exploit the structures of these matrices is a challenging problem for researchers; much work has been done, and it is presently an active area of research.

For computational aspects of singular values and singular vectors, the original paper by Golub and Kahan (1965) and the recent papers by Demmel and Kahan (1990) and Fernando and Parlett (1992) are valuable. Codes for the Golub-Kahan-Reinsch method appear in Golub and Reinsch (1970) and in Businger and Golub (1969); Fortran codes for SVD computations also appear in Forsythe, Malcolm and Moler (CMMC) and in Lawson and Hanson (SLS).

The SVD Theorem (Theorem 10.2.1) has been generalized to a pair of matrices (A, B) by Van Loan (1976); this is known as the generalized singular value decomposition theorem. For the statement and a proof of this theorem, see Golub and Van Loan (MC 1984, pp. 318-319). For other papers in this area see Kagstrom (1985) and Paige and Saunders (1981). For computational algorithms for the generalized SVD, see Stewart (1983), Van Loan (1985), and Paige (1986). For further generalizations of the SVD, see an excellent paper by De Moor and Zha (1991). For the SVD of the product of two matrices, see Heath, Laub, Paige and Ward (1986).

Let rank(A) = r n.1 from Theorem 10. Then construct U and V such that U (V )T is also a singular value decomposition. Let A be m n (m n). (a) Derive Theorem 10. Why is the version called the Economic Version? 3.3. 2.3 1.Exercises on Chapter 10 PROBLEMS ON SECTIONS 10. An economic version of the SVD. S is a nonsingular r r diagonal matrix. Let be a singular value of A with multiplicity ` that is. (b) How are the singular vectors of A related with those of U T AV ? (c) If m = n and A is symmetric positive de nite.1. Then prove that A = V SU where V is an m n matrix with orthonormal columns. then prove that the eigenvalues of A are the same as its singular values.2. 4.2 AND 10. 1 . and U is an r n matrix with orthonormal rows.3. Let ~ ~ ~ ~ U V T be the singular value decomposition of A. Then from the de nitions of singular values prove that the singular values of A and V T AV are the same. i = i+1 = = i+`. 2 0) 714 . and U and V be orthogonal. (a) Let A be m n. Then nd the orthogonal matrix P such that 03 where S = AT 02 3 A ! 2 P T SP = diag( : 1 2 . (b) Given 01 21 B C A = B3 4C @ A 5 6 Find the singular values 1 and 2 of A by computing the eigenvalues of AT A.

5.5. (b) Let U be an orthogonal matrix. then Au1 and Au2 are orthogonal. kAT Ak2 = kAk2 2 iii. and the orthonormal bases for the null-space and the range of each of the matrices in problem #5.1.5) @ A 01 11 1 B 0 C where is small: A=B @ C A 0 PROBLEMS ON SECTIONS 10. Using the SVD of A. Cond(A). Find the orthogonal projections onto the range and the null-space of A and of their orthogonal complements. Then prove that kAU k2 = kAk2 and kAU kF = kAkF (c) Let A be an m n matrix. 10. 7.7 6. where U and V are orthogonal. (a) Find the rank. prove that i. rank(AT A) = rank(AAT ) = rank(A) = rank(AT ) ii. (b) Prove that for an mxn matrix A i. Cond2(AT A) = (Cond2 (A))2 ii.2.6. iii. Then show that kAk2 = 1 if and only if A is a multiple of an orthogonal matrix. and 10. 715 . k k2 k kF . (a) Let A be an invertible matrix. AT A and AAT have the same nonzero eigenvalues. If the eigenvectors u1 and u2 of AT A are orthogonal. Cond2(A) = Cond2 (U T AV ). nd the SVD of the following matrices: 01 21 B C A = (1 2 3) A = B3 4C @ A 0 5 16 1 B C A = B1C A = diag(1 0 2 0 . Using the constructive proof of Theorem 10.

9. 01 21 B C 12. r) columns from A.3. where BT A = U V T . QkF kA . the matrix 0 . Let Q be an orthogonal matrix such that kA .1 using the SVD of A. Ak2 ? 10. @ A 1 4 (b) Compute (AT A). Prove that if A is an m n matrix with rank r. (a) Give a proof of Theorem 10. X kF : Find kA . and let Bm r be a matrix obtained by deleting (n . Bk2 < r . BQkF kA . (a) Let A = B 1 3 C. What is kB . 01 2 31 B C A = B2 3 4C @ A 716 . QkF by computing the singular values of A .:3905 :8912 1 B C Q = B .:2310 . and if B is another m n matrix satisfying kA . Find the outer product expansion of A using the SVD of A. BX kF for any orthogonal matrix X .6. Let A and B be n n real matrices. show that out of all the orthogonal matrices X . Then prove that Q = UV T . (b) Given 5 6 nd a rank-one matrix B closest to A in 2-norm. Given 01 21 B C A = B3 4C @ A 5 6 7 and using the result of problem #10.(d) Let rank(Am n ) = n. then prove that Cond2(B ) Cond2 (A): 8. Q. then B has at least rank r.:4824 :8414 :2436 C @ A :8449 :3736 :3827 is such that kA . 11.

then (b) If A has full row rank... Verify that the matrix Ay = V y U T where y = diag( 1 ) (with the convention that if j = 0. Ay A P3 = AAy and P4 = I . then Ay = (AT A).. Ayx = 0 for any x in N (AT )... 0 C C C C C C: C C C C C A AAyv = v for any vector v in R(A). (AT )y = (Ay )T (Ay )y = A: 16.1AT Ay = AT (AAT ). Show that (a) If A has full column rank. Let A be an m n matrix.) 15.. j 1 = 0). 0 B B B B B Dy = B B B B B B @ 0 C C C C C C C C C C C A i>0 i = 1 : : : r: 1 1 .. r 01 0 Then show that 0 . (Check all four conditions for the de nition of the we use j pseudoinverse. show that (a) (b) (c) (d) 0 . Also verify that each of these is a projection matrix. AAy . 01 r 1 0 14.. For any nonzero matrix A. Let 0 B B B B B D=B B B B B B @ 1 .PROBLEMS ON SECTION 10. is the pseudoinverse of A.8 13.1: 17. >From the SVD of A. compute the singular value decompositions of the projection matrices: P1 = AyA P2 = I . 717 .

18. Let

        A = [ 1  2;  2  4;  3  6 ],     b = [ 3;  6;  9 ].

Find the minimum 2-norm solution x to the least squares problem min_x ||Ax − b||_2. What is ||x||_2? Obtain x and ||x||_2 using both the SVD of A and the pseudoinverse of A.

19. Given u = (1, 2, 3)^T and v = (1, 1, 1)^T, find A† where

        A = u v^T.

20. Let

        A = [ 1  1  1  0;  0  1  1  1;  2  3  4  5 ],     b = [ 3;  3;  14 ].

(a) Find a least squares solution x: min_x ||Ax − b||_2. What is ||x||_2?
(b) Find the minimum-norm solution x, and ||x||_2.
PROBLEMS ON SECTION 10.9
21. Let B be an upper bidiagonal matrix having a multiple singular value. Then prove that B must have a zero either on its diagonal or on its superdiagonal.

22. Consider the family of bidiagonal matrices B(η): the n x n upper bidiagonal matrix whose diagonal entries are all 1 − η and whose superdiagonal entries are all β(1 + η). It can be shown (Demmel and Kahan 1990) that the smallest singular value of B(η) is approximately β^{1−n} (1 − (2n − 1) η). Taking β = 10^6 and using η = 0, verify the above result.

23. Let

        A = [ 1  0.9999;  2  1.9999;  3  2.9999 ].

Find the SVD of A

(a) using the Golub-Kahan-Reinsch algorithm;
(b) using the Demmel-Kahan algorithm;
(c) compare your results of (a) and (b).

24. Show that approximate flop-counts for the Golub-Kahan-Reinsch SVD and the Chan SVD are, respectively, 2m²n + 4mn² + (9/2)n³ and 2m²n + 11n³ for an m x n matrix A. Compute also the flop-count for the Demmel-Kahan algorithm.


MATLAB PROGRAMS AND PROBLEMS ON CHAPTER 10
You will need housqr, parpiv, reganl from MATCOM. 1. (The purpose of this exercise is to study the sensitivities (insensitivities) of the singular values of a matrix). Using MATLAB commands svd and norm, verify the inequalities in Theorems 10.5.1 and 10.5.2, respectively.

Test-Data
(i)

0 1 1 1 1 1 C B B 0 0:99 1 1 C B C A=B B 0 0 0:99 1 C C B C @ A
0 0 0 0:99

(ii) A = The Wilkinson Bidiagonal matrix of order 20. In each case, construct a suitable E so that (A + E ) di ers from A in the (n 1)th element only by an = 10;5. (Note that the eigenvalues of both matrices are ill-conditioned). 2. Generate randomly a 15 5 matrix using MATLAB command: A = rand (15,5). Find s = svd (A). Compute now jjAjj2 jjAjjF the condition number of A with respect to 2norm using the entries of s and then compare your results with those obtained by MATLAB commands norm(a), norm(a, 'fro'), cond(a), respectively. 3. (a) Compute the orthonormal bases for the range and the null space of a matrix A as follows: (i) Use U S V ] = svd (A) from MATLAB. (ii) Use housqr or housqrp from Chapter 5, as appropriate. (iii) Use orth(A) and null(A) from MATLAB. (b) Compare the results of (i) and (ii) and (iii) and op-count for each algorithm.

Test-Data:


(i)

1 1 1 0 (ii) A = A randomly generated matrix of order 10.

0 1 0 0 1 1C B B0 0 0 0C B C A=B B1 1 1 0C C B C @ A

4. Compute the rank of each of the following matrices using (i) MATLAB command rank (that uses the singular values of A) and (ii) the program housqrp (Householder QR factorization with pivoting) and parpiv from Chapter 5, compare the results.

Test-Data and the Experiment
(a) (Kahan Matrix)

A = diag (1 s

0 1 1 ;c ;c ;c C B B 0 1 ;c B C ;c C B. C B. . C ... n;1) B . . C s . C B B .. . . . ;c C B. C B C @ A
0 0 0 1

where, c2 + s2 = 1 c s > 0 c = 0:2 s = 0:6 n = 100. (b) A 15 10 matrix A created as follows: A = xy T where x = round (10 rand (15,1)) y = round (10 rand (10,1)). 5. (a) Generate randomly a matrix A of order 6 4 by using MATLAB command rand (6,4). Now run the following MATLAB command:

U S V ] = svd (A):
Set S (4 4) = 0 compute B = U S V 0. What is rank (B )? (b) Construct a matrix C of rank 3 as follows:

C = qr (rand (3)):
Verify that jjC ; Ajj2 jjB ; Ajj2 using MATLAB command norm for computing the F F Frobenius norm. (c) What is the distance of B from A? 721

(d) Find a matrix D of rank 2 that is closest to A. (This exercise is based on Theorems 10.6.2 and 10.6.3). 6. Let

0 0 0 1 Find the distance of A from the nearest singular matrix. Find a perturbation which will make A singular. Compare the size of this perturbation with j r j. 7. Let A = U V T be the singular value decomposition (SV D) of a randomly generated 15 10 matrix A = rand (15,10), obtained by using MATLAB command U S V ] = svd(A): Set S (8 8) = S (9 9) = S (10 10) = 0 all equal to zero. Compute B = U S V 0. Find the best approximation of the matrix B in the form

0 1 1 1 1 1C B B 0 :0001 1 1 C B C A=B B 0 0 0:0001 1 C C B C @ A

B

r X i=1

P such that jjB ; r=1 xi yiT jj2 = minimum, where xi and yi are vectors, and r is the rank of B . i
8. For matrices A and B in problem #7, nd an unitary matrix Q such that jjA ; BQjj is minimized. (Hint: Q = V U T , where AT B = U V T ). (Use MATLAB command svd to solve this problem). 9. (a) Write a MATLAB program, called covsvd to compute (AT A);1 using the singular value decomposition. Use it to nd (AT A);1 for the 8 8 Hilbert matrix and compare your results and op-count with those obtained by running reganl from MATCOM. (b) Using covsvd nd the Pseudoinverse of A and compare your result with that obtained by running MATLAB command pinv. (A is same as in part (a)). 10. Let A be a 10 10 Hilbert matrix and b be a vector generated such that all entries of the vector x of Ax = b are equal to 1. Solve Ax = b using the SVD of A, and compare the accuracy, op-count and elapsed time with those obtained by linsyspp from Chapter 6. 722

xi yiT

11. Let A = rand (10,3), and

X = pinv(A):

Verify that X satis es all the four conditions of the Pseudoinverse using MATLAB: AXA = X XAX = X (AX )T = AX (XA)T = XA. 12. Write a MATLAB program, called chansvd to implement the improved SVD algorithm of Chan given in Section 10.9.3, using MATLAB commands qr and svd.

U S V ] = chansvd(A):
Run your program with a randomly generated 30 5 matrix A = rand (30,5) and compare the op-count and elapsed time with those obtained by using svd(A). 13. Write a MATLAB program, called bidiag to bidiagonalize a matrix A (Section 10.9.2) :

B] = bidiag(A tol)
where B is a bidiagonal matrix and tol is the tolerance. Test your program using A = rand(15,10). 14. (The purpose of this exercise is to compare the three related approaches for nding the small singular values of a bidiagonal matrix.) Write a MATLAB program to implement the Demmel-Kahan algorithm for computing the singular values of a bidiagonal matrix: s] = dksvd(A) where s is the vector containing the singular values of A.

Experiment:
Set A = rand (15,10). Compute U S V ] = svd(A). Set S (10 10) = S (9 9) = S (8 8) = 10;5. Compute B = U S V 0 . Run the program bidiag on B :

C = bidiag(B):

Compute now the singular values of C using (i) svd, (ii) dksvd, and (iii) chansvd and compare the results with respect to accuracy (especially for the smallest singular values), op-count and elapsed time.

723

11.A TASTE OF ROUND-OFF ERROR ANALYSIS
11.1 11.2 11.3 11.4 11.5 11.6

Basic Laws of Floating Point Arithmetic : : : : : : : : : : : : : : : : : : : Backward Error Analysis for Forward Elimination and Back Substitutions Backward Error Analysis for Triangularization by Gaussian Elimination : Backward Error Analysis for Solving Ax = b : : : : : : : : : : : : : : : : : Review and Summary : : : : : : : : : : : : : : : : : : : : : : : : : : : : : Suggestions for Further Reading : : : : : : : : : : : : : : : : : : : : : : :

: : : : : :

: : : : : :

: : : : : :

: : : : : :

: : : : : :

: 724 : 724 : 727 : 734 : 736 : 737

CHAPTER 11 A TASTE OF ROUND-OFF ERROR ANALYSIS .

1) T L = (l ) b = ( b : : : b ) ij 1 n (11. ^ 724 .2. These laws were obtained in Chapter 2. LU factorization using Gaussian elimination and solution of a linear system.2) y = (y : : : y ) 1 n T using the forward elimination Scheme.2) (11.1 Basic Laws of Floating Point Arithmetic We rst remind the readers of the basic laws of oating point arithmetic which will be used in the sequel. A TASTE OF ROUND-OFF ERROR ANALYSIS Here we give the readers a taste of round-o error analysis in matrix computations by giving backward analysis of some basic computations such as solutions of triangular systems.1. 11.2.1. /.1.1) (11. Then 1: (x y ) = (x y )(1 + ) 2: (xy ) = xy (1 + ) 3: If y 6= 0. Let's recall that by backward error analysis we mean an analysis that shows that the computed solution by the algorithm is an exact solution of a perturbed problem. We will use s to denote a computed quantity of s..1. . then ( x ) = ( x )(1 + ): y y Occasionally. When the perturbed problem is close to the original problem. . (11. Let j j < where is the machine precision. we say that the algorithm is backward stable.4) 11.3) (11. Lower Triangular System Consider solving the lower triangular system Ly = b where and (11.2 Backward Error Analysis for Forward Elimination and Back Substitutions Case 1. we will also use x 4: (x y ) = 1 + y where `*' denotes any of the arithmetic operations +.11.

where y = ( lb ) = l (1b+ ) ^ 1 1 1 11 11 1 (using law 4).3) (11. Similarly. (l y ) ^ l l (b . (11.2).2. = 12 + 22 22 22 ^ ^ 22 . where (neglecting 11 l (1 + )^ + l (1 + )^ = b y y 21 21 1 22 22 2 21 2 . and (11. Equation (11.1.Step 1.2. 2 y = ^ ^ b . Thus. This gives that is.3) can be rewritten as l (1 + )(1 + )^ + l (1 + )^ = b (1 + ) y y 21 11 22 1 22 2 2 2 22 That is.5) where and ^21 = l21(1 + l ^22 = l22(1 + l 725 ) 22 ): 21 .2. where j j 1 l (1 + )^ = b y 11 1 1 1 ^ y =b l ^ 11 1 1 ^11 = l11(1 + 1 ): l This shows that y1 is the exact solution of an equation line whose coe cient is a number close to ^ l11. l y = b . l y (1 + ))(1 + ) = l (1 + ) 2 21 1 2 21 1 22 22 2 21 1 11 22 22 2 (using both (11. which is small). Step 2.1.4)).2. we can say that y1 and y2 satisfy ^21y1 + ^22y2 = b2 l ^ l ^ = 1 +11 22 (11.4) where j 11j j 21j and j 2j are all less than or equal to .

The above can be easily generalized and we can say that at the kth step. j + 2) (see Chapter 2. (11.1 The computed solution y to the n n lower triangular system ^ Ly = b obtained by forward elimination.8) kj ij ij The above can be summarized in the following theorem. . satis es a perturbed triangular system: (L + L)^ = b y 1 where the entries of L are bounded by (11. then j j 1:06(k . where ^ = l (1 + ) k = 1 : : : n j = 1 : : : k. Theorem 11. we see that the computed y1 through y satisfy the following perturbed triangular system: ^ ^ ^11y1 = b1 l ^ ^21y1 + ^22y2 = b2 l ^ l ^ . .2. Case 2.2. We can state 726 .6) where ^ = l (1 + ) j = 1 : : : k. j + 2) jl j: (11.2.2.2. Knowing the bounds for . Section 2.8) assuming that n < 100 . The process can be continued until k = n. Note that l These equations can be written in matrix form ^ Ly = (L + L)y = b kj kj kj ij ij 11 = 1. For example.7) ij ij ij where L is a lower triangular matrix whose (i j )th element ( L) = l . the unknowns y1 through y satisfy ^ 1y1 + ^ 2y2 + + ^ y = b l ^ l ^ l ^ (11. l Thus. Upper Triangular System The round-o error analysis of solving an upper triangular system Ux = c using back substitution is similar to case 1. if n is 1 . the bounds for can be easily computed.3). Then small enough so that n < 100 j( L) j 1:06(i . ^ 1y1 + ^ 2y2 + + ^ y = b l ^ l ^ l ^ k k k kk k k kj kj kj n n n nn n n Step k.

.3 Backward Error Analysis for Triangularization by Gaussian Elimination The treatment here follows very closely to the one given in Ortega (Numerical Analysis: A Second Course).9) (11..2 Let U be an n n upper triangular matrix and c be a vector. Ux = c using back substitution satis es (U + U )^ = c x where 1 assuming n < 100 . . (11. j + 2) ju j ij ij 11. 1) steps.. Recall that the process of triangularization using Gaussian elimination consists of (n . and in Forsythe. A (11. C .. C . C . a 1 (k ) ln a a (k ) kk . At step k. C . C .3.Then the computed solution x to the system ^ Theorem 11. First.2) 727 .3.2. C a C: C (k ) kn . Let the computed multipliers be m 1 i = 2 3 : : : n: ^ n k ij i Then where a a m = ( a ) = a (1 + ) ^ i1 i1 i1 11 11 i1 j j i1 : (11. the matrix A( ) is constructed which is triangular in the rst k columns that is k 0a B B B B A =B B B B @ (k ) (k ) 11 .2.1) (k ) nk a (k ) nn The nal matrix A( .1) is triangular.2. We shall assume that the quantities a( ) are the computed numbers. Malcom and Moler (CSLAS)..10) j( U ) j 1:06(i . . let's analyze the computations of the entries of A(1) from A in the rst step.

L A+E (1) (0) (0) (11.a1+( 1 11 a11 = 1a 1 i i i i i i i i ij (11.3) Let us now compute the errors in computing the other elements a(1) of A(1).6) where 0 0 Analogously.3.3.3. we will have n1 0 0 Bm B^ L = B . the error e(0) in setting a(1) to zero is given by 1 1 i i e (0) i1 = a(1) .3) can be rewritten as j (1) ij ij (1) ij j 1j j ij (1) ij j : (11.5).5) a = (a .. C: E =B . we have ij e = (0) (1) ij a . 728 . 0 C 0C C 1 C C A 0 0 0 1 0 B C Be e C B . C @ A (0) (0) 21 (0) 2n e (0) n1 e (0) nn A =A .3.3. @ (0) 21 0 0 m ^ . (m 1a )) ^ = (a . m a ) + e ^ i1 (0) i j = 2 : : : n: (1) ij where 1+ From (11. a 1 + m 1a11 ^ 1 a 1 )(1 + )a = 0. m1 a1 (1 + (1))](1 + ^ ij i ij ij i j ij ij j j ij (1) ij ) i j = 2 ::: n where (11.3.. C .3. (m 1a1 ))(1 + (1) ) ^ = a .L A +E (2) (1) (1) (1) (1) (11.m a ^ (1) ij (1) ij i1 1j i j = 2 : : : n: A = A.4) and (11. B .7) where L(1) and E (1) are similarly de ned.4) (11. B B .. at the end of step 2. The computed elements a(1) i j = 2 : : : n are given by ij a (1) ij = (a .Thus..3..

3.9). B.8) Continuing in this way. we can write A Since (n .1) + n n = A + E (0) + E (1) + That is.10) n n (I + L(0) + L(1) + + L( . ^ .1) = A + E (0) + E (1) + + E ( . @ 0 k +1 k m ^ C C C C .C A 0 (11. .1) + L(0)A + L(1)A(1) + + L( .C .11) (11. 2: Thus from (11.2)A( .3. .3.3. 0 B =B. C .12) 729 .2) n n n (11.1) k = 0 1 2 : : : n .3.1) + L(0) A( .L A+E .1) (11.2)A( . + E ( .2) n L we have (k .2) n + L( . m B . we obtain A (n .Substituting (11.1) 00 0 B B B.L A +E (2) (1) (1) (1) (1) (0) (0) (1) (1) (1) (11. B.L A +E = A.9) 01 n k L A =L A (k ) (k ) (k ) (n .3.3.8) and (11.. .3.C .2))A( . we have A = A .6) in (11.3. .1) + L(1)A( .7).2) = A + E (0) + E (1) + n n + E ( .

14) e (k n n .m a ^ (k ) ij ik (k kj .3. .1) e e and and (k ij (k i k .1) = (k ) 1+ ij (k ) ij a .Noting now .1) = a( . and denoting E (0) + E (1) + ^^ A + E = LU + E ( . 1 (11. @ m ^ 21 31 n1 0 1 m32 ^ .16) (11. .3.15) . . C . ..1) (k ) ij i j = k +1 ::: n (11..1) 0 0 e k k k k k n k n . @ ( ..3. B .. e( +11) B. . .17) (11. .3.. . .. B. n n 0 C 0C C C ^ 0 C = L (Computed L) C . . A 0 0 (k 1) k +1 n k = 1 2 ::: n . C .18) j j ik j j (k ) ij (k ) ij j j : We formalize the above discussions in the following theorem: 730 .. C C . we have from (11.. m ^ .3. B.1 n 1 C C A 1 A (n . B .3. .1) k i k i k (11. B .1) = B . . .11): (11. .2) are given by: 1 C C C C e . I+L +L + (0) (1) +L (n 2) 0 1 Bm B^ B B^ =Bm B .3..2) by E . .13) where the entries of the matrices E (0) : : : E ( 00 0 0 B0 0 0 B B.1) = U ^ (computed U ). . m2 ^ n 0 0 1 . E ( .

:11 . .C @ A m1 m2 ^ ^ m . . C C B . :52 :11 = :16 a = :22 .1 The computed upper and lower triangular matrices L and U produced by Gaussian elimination satisfy ^^ A + E = LU ^ ^ where U = A( ) and L is lower triangular matrix of computed multipliers: 0 1 0 1 0 Bm C B ^ 21 1 0 0C B . . .1 1 ^ n n n n n Example 11.1 Let t = 2.:0008 = :63 . . . :22 .3.. .3. . 1:57 :21] = . .. :52 :11] = . . 1:57 :35 = . . .. . :52 :35] = :0020 = :10 . B ...^ ^ Theorem 11. C B . .:33 :22 (1) e e (0) 21 (0) 22 (0) 23 (0) 31 = 0 .:0003 731 e e . .. 0 :21 :35 :11 1 B C A = B :11 :81 :22 C @ A :33 :22 :39 (1) 22 (1) 23 Step 1. m = ::11 = :52 ^ 21 :33 = 1:57 m = :21 ^ 21 31 a = :81 . . :52 :21] = .. :81 . :33 . 1:57 :11 = :22 (1) 32 (1) 33 0 :21 :35 :11 1 B C A = B 0 :63 :16 C @ A 0 .:0020 = 0 .:33 a = :39 . :52 :35 = :63 a = :22 .C B L=B . . C .

:0008 :0020 . 1:57 :35] = .:52 63 (2) 0 :21 :35 :11 1 B C ^ A = B 0 :63 :16 C = U: @ A 0 0 :30 a = :22 + :52 :16 = :30 (2) 33 e e E Thus (1) 32 (1) 33 = 0 . 0 0 1 0 0 B C E = E + E = B .e e (0) 32 (0) 33 = .:33 + :52 :63] = :0024 = :30 .:0028 C @ A .:003 :0019 . .:0005 0 1 1 0 0 C ^ B L = B :52 1 0 C @ A 1:57 .:0005 Step 2.:0032 (1) (1) Since ^^ we can easily verify that LU = A + E . :22 + :52 :16] = . :39 . 1:57 :11] = :0027 m = .:0008 :0020 .:0028 C @ A .:0003 .:33 . For this purpose we assume that pivoting has been used in Gaussian elimination so that jm j 1. ::33 = .:0005 :0027 (0) = :22 . Recall that the growth ^ factor is de ned by max ja( )j = : max ja j ik k i j k ij i j ij 732 . :22 . 32 0 0 1 0 0 B C E = B .:52 1 (0) Bounds for the elements of E We now assess how large the entries of the error matrix E can be.:0032 00 0 1 0 B C = B0 0 0 C @ A 0 :0024 .

kE k1 a (1 + 3 + = an : 2 + (2n .3. C @ A 1 3 5 2n .18) hold element-wise. jE j = jE + + E . 1)) (11. a : (since m ^ ik 1) Then (0) (0) Denote 1 . Then from (11.Let a = max ja j.14) and (11. Thus.19) Remark: The inequalities (11. We can immediately obtain a bound in terms of norms.3. B B B0 @ 00 0 10 0 B1 2 2 C B 2 C B C B C B1 3 4 a B 4 C C B .3. C B. j 0 0 20 0 01 B 6B 1 2 C B0 6B 2C B 6B C B0 1 6B C B C +B a 6B 6B C B 6B C B 6B C B 4@ A B B @ 1 2 2 0 1 00 B0 B B.3.1)j and.. by . 1 i = k + 1 ::: n .15).1)j a k = 1 2 ::: n . j jE j + + jE .20) 733 . 2 (n 2) (n 2) 2 2 01 C 0C C C 2C C 2 13 0 C7 0 C7 C7 1 2 C C C C C A 0 0 0 0 C7 C7 C7 C7 0 C7 A5 (11. we have i j ij je (k ) ij (k ik . for i j = k + 1 : : : n je j 1 . ja j + ja (k ) ij (k ij 2 1. + +B. B.3.

. we know that computed solution y and x of the ^ ^ above two triangular systems satisfy: ^ (L + L)^ = b y and ^ (U + U )^ = y : x ^ From these equations. First.. C . from Theorem 11.2) Bounds for F kF k1 kE k1 + k Lk1 kU k1 + kLk1k U k1 + k Lk1k U k1: We now obtain expressions for k Lk1 k U k1 kLk1 and kU k1 . 0 C B^ C B .4. we know that triangularization of A using Gaussian elimination ^ ^ ^^ yields L and U such that A + E = LU .2) we have (U + U )^ = (L + L). Since.1 and Theorem 11..3. we have or or where ^^ (Note that A + E = LU ).. 1 0 1 C B m .2.1 b x ^ ^ (L + L)(U + U )^ = b x (A + F )^ = b x ^ ^ F = E + ( L)U + L( U ) + ( L)( U ) (11. A @ m ^ m . ^ ^ These L and U will then be used to solve: ^ Ly = b ^ Ux = y: From Theorem 11. 1 ^ 21 n1 n n 1 734 . C B .2.11.4.1) (11.4. ^ L=B . From (11. followed by forward elimination and back substitution.1.4 Backward Error Analysis for Solving Ax = b We are now ready to give a backward round-o error analysis for solving Ax = b using triangularization by Gaussian elimination.2.

then kE k1 n a .. from (11. 1 : 2 2 (11.4. j 2 ^ Assuming partial pivoting. 1 + 1:06n (n + 1)a + n a 2 2 2 (11.8).4.4. 1 l and a kAk1.6) (note that U = A( . jm j 1 k = 1 2 : : : n .4){(11. we have kF k1 ^ ^ kE k1 + k Lk1kU k1 + kLk1k U k1 + k Lk1 k U k1 n a .4.7) Also recall that Assume that n2 1 (which is a very reasonable assumption in practice).11) The above discussion leads to the following theorem.4.4. ij (11.3) 1 ik ^ kLk1 k Lk1 Similarly.4.. 735 .4) (11.2). ..from (11.4.8) (11. B C .8) in (11.1) ) n and k U k1 n(n2+ 1) 1:06a : (note that max ju j a ).4.4. we have ^ 21 21 n n (11.2.4. i.10) Since .4.4. @ A (n + 1)jm j ^ 3jm . 1 i = k + 1 : : : n.10) we can write kF k1 1:06(n + 3n ) kAk1 : 3 2 (11.e. we obtain 0 1 2 B 3jm j C B C ^ 2 B C j Lj 1:06 B C .5) (11. n n(n + 1) (1:06 ) 2 ^ kU k1 na (11.9) k Lk1 k U k1 n a Using (11. .

Theorem 11.4. The results of this chapter are already known to the readers. 736 . Linear systems problem Ax = b using Gaussian elimination with partial pivoting followed by forward elimination and back substitution (Theorem 11. We have tried to give formal proofs here.2.3. Wilkinson (AEP. pp ) states that in practice kF k1 is always less than or equal to n kAk1 .2. 11.4.10.17). as the title of the chapter suggests. and (11.1).2) and kF k1 1:06(n3 + 3n2 ) kAk1 . (Exercise) 11. In practice.5 Review and Summary In this chapter. The above bound for F is grossly overestimated.8.8). we can also obtain an element-wise bound for F . Remarks: 1. this bound for F is very rarely attained.10).1 The computed solution x to the linear system ^ Ax = b using Gaussian elimination with partial pivoting satis es a perturbed system (A + F )^ = b x where F is de ned by (11.2. 2. Lower and upper triangular systems using forward elimination and back substitution (Theorems 11.4. 11.2.1). 11.2. 3.3.11). Bounds for the error matrix E in each case has been derived (11.1 and 11.2). They have been stated earlier in the book without proofs.3.2. (11.19. 2. LU factorization using Gaussian elimination with partial pivoting and (Theorem 11. We have merely attempted here to give the readers a taste of round-o error analysis. we have presented backward error analyses for: 1. Making use of (11.4.

The books A Second Course in Numerical Analysis.6 Suggestions for Further Reading The most authoritative book for learning details of backward error analyses of algorithms for matrix computations is the Wilkinson's Algebraic Eigenvalue Problem (AEP). for the linear system problem Ax = b using the process. is also a good source of knowledge. depend upon the size of the growth factor. whereas. the stability of the Gaussian elimination process for LU factorization and therefore. 737 . by James Ortega. by Forsythe and Moler.To repeat. 11. Rounding Errors in Algebraic Process. and Computer Solutions of Linear Algebraic Systems. have also given in-depth treatment of the material of this chapter. Wilkinson's other book in this area. these results say that the forward elimination and back substitution methods for triangular systems are backward stable.

3.8). (11. Using = 10.2. and t = 2 compute the LU decomposition using Gaussian elimination (without pivoting) for the following matrices. 6. Suppose now that partial pivoting has been used in computing the LU factorization of each of the above matrices of problem #1.3.2.10). Find F in each case such that the computed solution x 1 satis es (A + F )x = b: Compare the bounds predicated by (11.4. are backward stable. 738 . show that the process of forward elimination and back substitution for lower and upper triangular systems. 4. From (11.11) with actual errors.2. nd an element-wise bound for F in Theorem 11. Find again the error matrix E in each case. conclude that the backward stability of Gaussian elimination is essentially determined by the size of the growth factor .18).Exercises on Chapter 11 1.13{(11. and (11.1. Consider the problems of solving linear systems: Ax = b using Gaussian elimination with partial pivoting with each of the matrices from problem #1 ! 1 and taking b = in each case. Making use of (11.3. respectively. From Theorems 11.3.17). and compare the bounds of the entries in E predicted by (11.2.12) with the actual errors.4. 3. (A + F )x = b: 5.1 and 11.2. and nd the error matrix E in each case such that (b) (c) ! 3 4 (a) A = 5 6 :25 :79 ! :01 :12 ! A + E = LU: 10 9 8 5 ! 4 1 (d) 1 2 ! :01 :05 (e) : :03 :01 2.

Consider the problem of evaluating the polynomial p( ) = a by synthetic division: n n + a . = i 1 ( p + a .1 + n n + a0 p = a n n p. 1 : : : 1: ) .1 ) i i i = n n .1 + n Then p0 = p( ): Show that p = a (1 + ) + a .7. (1 + 0 n n n n 1 n+1 + a0 (1 + 0 ): Find a bound for each i = 0 1 : : : n: What can you say about the backward stability of the algorithm from your bounds? i 739 .1 .

2.2.1 Some Relational Operators in MATLAB : : : : : : : : : : : : : A.7 Getting a Hard Copy : : : : : : : : : : : : : : : : : : : : : : : : : : A.1.2.2 Writing Your Own Programs Using MATLAB Commands : : : : : : A.2.9 Use of `diary' Command and Printing The Output : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 740 740 740 740 741 743 746 746 747 747 748 748 748 748 749 750 .2.2.1.2.3 Colon Notation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : A.5 Computing Flop-Count and Elapsed Time of An Algorithm.2 Entering and Exiting MATLAB : : : : : : : : : : : : : : : : : : : A.1 What is MATLAB ? : : : : : : : : : : : : : : : : : : : : : : : : : : A.1.2 Some Matrix Building Functions : : : : : : : : : : : : : : : : : : A.5 Numerical Examples : : : : : : : : : : : : : : : : : : : : : : : : : : A.1. A BRIEF INTRODUCTION TO MATLAB A.8 Examples of Some Simple MATLAB Programs : : : : : : : : : A.6 Saving a MATLAB Program : : : : : : : : : : : : : : : : : : : : : A.A.1. while.2.2. if commands : : : : : : : : : : : : : : : : : : : : : : : : A.4 Most frequently used MATLAB operations and functions : : A.4 for. : A.1 Some Basic Information on MATLAB : : : : : : : : : : : : : : : : : : : A.3 Two most important commands: HELP and DEMO : : : : : : A.

APPENDIX A A BRIEF INTRODUCTION TO MATLAB .

. It was developed by Cleve Moler. It is an interactive software package for solving problems arising in scienti c and engineering computations.1.3 Two most important commands: HELP and DEMO The two most important commands are help and demo. max(svd(X)). A BRIEF INTRODUCTION TO MATLAB A.2) is the same as NORM(X).2 Entering and Exiting MATLAB On most systems the command matlab will let you enter into MATLAB. Typing 'help' followed by a matlab function name from the list will give you more speci c information about that particular function.'fro') is the F-norm. sqrt(sum(diag(X'*X))). Example A. The most current version contains programs for many other types of computations including 2-D and 3-D graphic capabilitiies.1. the largest column sum.'inf') is same as NORM(X.1) is the 1-norm of X.inf) is the infinity norm of X. For matrices.. Give the command exit to quit MATLAB. A. NORM(X. eigenvalues and eigenvector computations.1 What is MATLAB ? MATLAB stands for MATrix LABoratory.1 >> help norm NORM Matrix or vector norm. NORM(X. NORM(X) is the largest singular value of X. NORM(X. MATLAB contains programs for all fundmental matrix computations such as solutions of linear systems. NORM(X. Typing 'help' you will get the listing of all the matlab functions and other matlab capabilities. A. singular values and singular vectors computations. the largest row sum.A. = max(sum(abs((X')))). = max(sum(abs((X)))). when you are all done. solutions of least squares problems . etc. NORM(X.1. various matrix factorizations.1.1 Some Basic Information on MATLAB A.inf).

Matrix condition number. Demo teaches you how to use the matlab functions such as how to enter matrix values into a A. not abs(real(z)) + abs(imag(z)).Matrix or vector norm. 2.inf) = max(abs(V)).Numerical linear algebra. See also COND. if X has complex components. which was used in some earlier versions of MATLAB. NORMEST. CONDEST.4 Most frequently used MATLAB operations and functions Some Basic Matrix Operations + * .P) = sum(abs(V)^P)^(1/P). inf or 'fro'. cond norm .2).-inf) = min(abs(V)). In MATLAB 4. NORM(V.. NORM(V. For vectors. RCOND. matrix. how to nd the rank of a matrix. z.P) is available for matrix X only if P is 1. how to nd its transpose.* ^ \ / & | ~ Plus Minus Matrix multiplication Array multiplication power Backslash or left division Slash or right division Logical AND Logical OR Logical NOT Matrix functions .NORM(X. then abs(z) = sqrt(z*conj(z)). 741 . . Matrix analysis. etc. NORM(V.0.1. NORM(V) = norm(V.

. Using help you can get information on any of the above routines. Eigenvalues and singular values. . .Orthogonalization.Characteristic polynomial.Linear equation solution .Least squares in the presence of known covariance. .Number of linearly independent rows or columns. . . . . \ and / chol lu inv qr qrdelete qrinsert pinv lscov .1.Orthogonal-triangular decomposition. .2 >> help lu 742 . . .LINPACK reciprocal condition estimator. . Linear equations. .Real block diagonal form to complex diagonal form. .Diagonal scaling to improve eigenvalue accuracy. Here is an exapmle.Pseudoinverse.Insert a column in the QR factorization.Schur decomposition.rcond rank det trace null orth rref . eig poly hess svd qz rsf2csf cdf2rdf schur balance .Factors from Gaussian elimination. use "help slash".Matrix inverse.Cholesky factorization. .Hessenberg form.Determinant.Generalized eigenvalues. .Complex diagonal form to real block diagonal form.Sum of diagonal elements.Reduced row echelon form. . Example A. .Null space.Singular value decomposition. .Delete a column from the QR factorization.Eigenvalues and eigenvectors. . .

1. A.U] = LU(X) stores a upper triangular matrix in U and a "psychologically lower triangular matrix". in L . i.e. By itself.P] = LU(X) returns lower triangular matrix L. upper triangular matrix U. LU(X) returns the output from LINPACK'S ZGEFA routine. so that X = L*U. L. a product of lower triangular and permutation matrices. and permutation matrix P so that P*X = L*U.LU Factors from Gaussian elimination.U. L.5 Numerical Examples To enter a matrix and a vector in MATLAB >> A= 1 3 5 A = 1 2 1 3 4 3 5 6 9 2 4 6 1 3 9] >> b= 1 1 1]' b = 1 1 1 To solve the Linear system Ax = b >> x=A\b x = 743 .

3246 12.2000 3.2500 To compute the eigenvalues of A >> eig(A) ans = -0.-0.5000 0.4000 To nd the rank of A 744 .3246 2.2500 1.9193 8.2361 0 -4.5000 0.5000 -0.5000 0 0.0000 To reduce A to an upper hessenberg matrix >> hess(A) ans = 1.2000 4.2500 1.5000 -0.5000 0 To compute the inverse of A >> inv(A) ans = -2.2500 -0.0000 -2.6000 -3.1305 -6.

1539 To compute the QR factorization of A using Householder matrices.5774 0.>> rank(A) ans = 3 To compute the 2-norm of A >> norm(A) ans = 13.4082 -0.5774 -0.4082 0.6188 2. >> q = q.7071 r = -2.r] = qr(A) will give the qr factorization of A -0.1547 0 -10.7071 -0.0000 0.8284 745 .5774 -0.4495 0 0 -5.7155 1.8165 -0.6145 4.3538 To compute the condition number of A (with respect to 2-norm) >>cond(A) ans = 42.

0000 0.1 Some Relational Operators in MATLAB < > <= less than greater than less than or equal to 746 .0000 0 0 1.0000 A.ufl.2 Writing Your Own Programs Using MATLAB Commands It is easy to write your own programs using MATLAB commands.u . We list rst some most common uses and then give examples.tex. >> l.5000 1.0000 0 1.edu Deirectory: pub/matlab Files: primer.edu containing the single line send matlab/primer.5000 u = 2 0 0 4 1 0 6 2 4 1.tex A.2.u] = lu(A) l = 0.ps or Send e-mail to listserv@math. primer. This is available via anonymous ftp from Address: math. We urge students to get hold of a copy of Matlab Primer written by Professor Kermit Sigmon of University of Florida.To compute the lu factorization of A using partial pivoting.

Examples A.>= == greater than or equal to equal A. 4.2) denotes the entire 2nd column of A. 2. fth and seventh columns of A The statement A(:.2 Some Matrix Building Functions eye zeros rand identity matrix matrix of zeros randomly generated matrix with entries between zero and one.:) denotes the rst 3 rows of A.1:3) will replace the columns 2. hilb(5) will create the 5 x 5 Hilbert matrix.7]) = B(:. A(:.3) will create a 5 x 3 randomly generated matrix 2. 2. If x is a vector diag(x) is the diagonal matrix with entries of x on the diagonal.2.5. 1. lower triangular trianluar part of a matrix Hilbert matrix Toeplitz matrix gives the dimension of a matrix absolute value of the elements of ammatrix or a vector.2.7]) denotes the second. diag(A) is the vector consisting ot the diagonal of A 3. A(1:3. rand(5.5. max diag triu tril hilb toeplitz size abs max entry create or extract a diagonal upper triangular part of a matrix.7 of A with the rst three columns of B Examples 747 . max(max(A)) will give the maximum entry of the whole matrix A.3 Colon Notation The 'colon' notation (:) is used to conveniently indicate a whole row or column A(:.2)) will give the maximum value of the second column of A. max(A(:.5.

If the version of matlab that you use contains the cputime function then you can delete the cputime. MATCOM contains a program cputime.2. The uses of these commands will be illustrated in the following examples.7 Getting a Hard Copy diary< filename > will cause a copy of all subsequent terminal input and most of the resulting output to be writen on the named le.2. Example = a\b flops(0) \\ x flops will give the total ops to solve a linear system with a given matrix a and a vector b. For example : t = cputime your operation cputime . A. diary o suspends it. A.mat'. To compute the op count of an algorithm set ops(0) immediately before executing the algorithm and type ops immediately after completion. A.A.4 for.m. t returns the cputime to run your operation. while. if commands These commands are most useful in writing MATLAB programs for matrix algorithms. The command load < filename > will restore all the variables from the le named ' lename.mat'.2.m program from MATCOM.5 Computing Flop-Count and Elapsed Time of An Algorithm. 748 .6 Saving a MATLAB Program The command save < filename > will store all variables in the lename ' lename. The function cputime returns the CPU time in seconds that has been used by the MATLAB process since the MATLAB process started.2. Since the PC version of MATLAB does not have a cputime function.

m % Matrix-Matrix product with upper triangular matrices % input U and V two upper triangular matrices of order n % output C = U * V % function C = matmat(U.2 The following code will create a matrix A such that the (i j )-th entry of the matrix A = a(i j ) is (i + j ).3 The following MATLAB program computes the matrix-matrix product of two upper triangular matrices.j) zero.j) = 0 end end end Example A.2. a= zeros(4.A.4) for i = 1:4 for j = 1:4 if i > j a(i.j) = i+j end end Example A.2. This program will be called matmat.4) for i = 1:4 for j = 1:4 a(i.8 Examples of Some Simple MATLAB Programs Example A.2.2. a = rand(4.V) 749 .1 The following code will make the elements below the diagonal of the 4 x 4 matrix A = a(i.

function C = matmat(U. The 750 .j) = C(i.V) n.5 end A.2.n) for i = 1:n for j = i:n for k = i:j C(i.k) * V(k.1) % Computing % input x % output nrm.2. Sometimes you may want a listing of your matlab program step by step as it was executed. the two norm of the vector x nrm = twonorm(x) nrm = twonorm(x) the two norm of a vector x % function function n.m] = size(U) C = zeros(n.j) end end end end Example A.j) + U(i.m] = size(x) r = max(abs(x)) y=x/r s = 0 for i=1:n s = s + y(i)^2 end nrm = r * s^0.4 (Matlab Implementation of Algorithm 3.1.9 Use of `diary' Command and Printing The Output .

set the diary off The above command will store all the commands that are executed by the program matmat.m in the le diary8. This le can then be printed. 751 .diary command can be used to create a listing.sets the diary on <----. Example: >>diary B:diary8 >> C = matmat(U. Examples of some M les are given in appendix B. Only that data which is printed on the screen is stored in diary8. In case you do not want the output to be printed on the screen place a semi colon at the end of the line.V) >> diary off <----. To write a comment-line put the % sign in the rst column.execute your program <----.

1 What is MATCOM ? : : : : : : : : : : : : : : : : : : : : B.2 How To Use MATCOM ? : : : : : : : : : : : : : : : : : : : : : B. MATLAB AND SELECTED MATLAB PROGRAMS B.B.4 Some Selected MATCOM Programs : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 752 752 752 755 759 .1 MATCOM and Some Selected Programs From MATCOM B.3 Chapter-wise Listing Of MATCOM Programs : : : : : : : : B.1.

APPENDIX B MATCOM AND SELECTED MATLAB PROGRAMS FROM MATCOM .

MATCOM has been written by Conrad Fernandes.m 5.1 / / \ \ \ is the algorithm or the section number The program name is / To use this program in matlab.1 housmat. In particular. etc.3. To nd out these variables type >> help housmat create a householder matrix 5.3. and elapsed time . Biswa Nath Datta.2 How To Use MATCOM ? .m H = I . By using the programs in MATCOM the students will be able to compare di erent alogrithms for the same problem with respect to accuracy . op-count.1. The students will be able to verify the statements about goodness and badness of di erent algorithms.2 *u*u'/(u'*u) input output vector x u. In section B3 you have a chapter wise listing of the programs in MATCOM Example create a householder matrix housmat. they will be able to distinguish between a good and a bad algorithm.1 What is MATCOM ? MATCOM is a MATLAB based interactive software package containing the implementation of all the major algorithms of chapter 3 through chapter 8 of the book ' Numerical Linear Algebra and Applications ' by Prof. MATLAB AND SELECTED MATLAB PROGRAMS B.B.1 MATCOM and Some Selected Programs From MATCOM B. you need to know the input and output variables. For each problem considered in this book. there are more than one (in some cases several) algorithms.H] . B. a graduate student of Professor Datta.

To execute the program you have to do the following:

1. create the input vector x
2. then type [u,H] = housmat(x)

The input is a vector x, and the outputs are a matrix H and a vector u. As output you will get the Householder matrix H such that Hx is a multiple of e1, and the vector u out of which the Householder matrix H has been formed.

>> x = rand(4,1)

x =
    0.2190
    0.0470
    0.6789
    0.6793

>> [u,H] = housmat(x)

The entries of u and of the 4 x 4 Householder matrix H are then displayed on the screen.
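As a quick numerical sanity check (added here; it is not part of the original MATCOM documentation), one can verify in the same session that H behaves as claimed; the index 4 simply matches the size of the example vector, and the quantities compared should be of the order of machine precision:

>> y = H*x;
>> max(abs(y(2:4)))      % Hx is a multiple of e1, so these entries should be essentially zero
>> norm(H'*H - eye(4))   % H is orthogonal
>> abs(y(1)) - norm(x)   % |(Hx)(1)| equals the two norm of x, since H is orthogonal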

B.3 Chapter-wise Listing of MATCOM Programs

CHAPTER 3

Title                                                Program Name
Back substitution                                    backsub.m
The Inverse of an Upper Triangular Matrix            invuptr.m
Basic Gaussian Elimination                           gauss.m

CHAPTER 4

Title                                                Program Name
Computing (I - 2uu^T/(u^Tu)) A                       housmul.m
Computing A (I - 2uu^T/(u^Tu))                       houspostmul.m

CHAPTER 5

Title                                                                        Program Name
Triangularization Using Gaussian Elimination Without Pivoting                lugsel.m
Triangularization Using Gaussian Elimination With Partial Pivoting           parpiv.m
Triangularization Using Gaussian Elimination With Complete Pivoting          compiv.m
Creating Zeros in a Vector With a Householder Matrix                         housmat.m
Householder QR Factorization                                                 housqr.m
Householder QR Factorization for a Nonsquare Matrix                          housqrnon.m
Householder Hessenberg Reduction                                             houshess.m
Creating Zeros in a Vector Using Givens Rotations                            givcs.m and givrot.m
Creating Zeros in a Specified Position of a Matrix Using Givens Rotations    givrota.m
QR Factorization Using Givens Rotations                                      givqr.m
Givens Hessenberg Reduction                                                  givhs.m

CHAPTER 6

Title                                                                 Program Name
Forward Elimination                                                   forelm.m
Solving Ax = b with Partial Pivoting without Explicit Factorization   gausswf.m
Cholesky Algorithm                                                    choles.m
Sherman-Morrison Formula                                              shermor.m
Inverse by LU Factorization without Pivoting                          inlu.m
Inverse by Partial Pivoting                                           inparpiv.m
Inverse by Complete Pivoting                                          incompiv.m
Hager's norm-1 Condition Number Estimator                             hagnormin1.m
Iterative Refinement                                                  iterref.m
The Jacobi Method                                                     jacobi.m
The Gauss-Seidel Method                                               gaused.m
The Successive Overrelaxation Method                                  sucov.m
The Basic Conjugate Gradient Algorithm                                congrd.m
Incomplete Cholesky Factorization                                     icholes.m
No-Fill Incomplete LDL^T                                              nichol.m

CHAPTER 7

Title                                                                                    Program Name
Least Squares Solution Using Normal Equations                                            lsfrnme.m
The Householder-Golub Method for the Full-Rank Least Squares Problem                     lsfrqrh.m
Classical Gram-Schmidt for QR Factorization                                              clgrsch.m
Modified Gram-Schmidt for QR Factorization                                               mdgrsch.m
Least Squares Solution by MGS                                                            lsfrmgs.m
Least Squares Solution for the Rank-Deficient Problem Using QR                           lsrdqrh.m
Minimum Norm Solution for the Full-Rank Underdetermined Problem Using Normal Equations   lsudnme.m
Minimum Norm Solution for the Full-Rank Underdetermined Problem Using QR                 lsudqrh.m
Linear Systems Analog Least Squares Iterative Refinement                                 lsitrn1.m
Iterative Refinement for Least Squares Solution                                          lsitrn2.m
Computing the Variance-Covariance Matrix                                                 reganal.m

CHAPTER 8

Title                                                Program Name
Power Method                                         power.m
Inverse Iteration                                    invitr.m
Rayleigh-Quotient Iteration                          rayqot.m
Sensitivities of Eigenvalues                         senseig.m
The Basic QR Iteration                               qritrb.m
The Hessenberg QR Iteration                          qritrh.m
The Explicit Single Shift QR Iteration               qritrsse.m
The Explicit Double Shift QR Iteration               qritrdse.m
One Iteration-Step of the Implicit Double Shift QR   qritrdsi.m

NO CHAPTER

Title                                  Program Name
The Absolute Maximum of a Vector       absmax.m
Interchange Two Vectors                inter.m
Computing the CPU Time                 cputime.m

B.4 Some Selected MATCOM Programs

% Back substitution
% backsub.m
% input   upper triangular T and vector b
% output  [y], the solution to Ty = b by back substitution
% [y] = backsub(T,b)
function [y] = backsub(T,b)
!rm diary8
diary diary8
[m,n] = size(T)
if m~=n
   disp('matrix T is not square')
   return
end
y = zeros(n,1)
for i = n:-1:1
    sum = 0
    for j = i+1:n
        sum = sum + T(i,j)*y(j)
    end
    y(i) = (b(i) - sum) / T(i,i)
end
end
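A possible test of backsub (added here for illustration; the triangular system below is made up) is to solve a small system and compare with MATLAB's backslash operator:

>> T = [2 1 3; 0 4 1; 0 0 5];
>> b = [1; 2; 3];
>> y = backsub(T,b)
>> T\b                   % should give the same result up to roundoff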

% inverse of an upper triangular matrix
% invuptr.m
% the matrix T is overwritten by its inverse
% [T] = invuptr(T)
function [T] = invuptr(T)
!rm diary8
diary diary8
[m,n] = size(T)
if m~=n
   disp('matrix T is not square')
   return
end
for k = n:-1:1
    T(k,k) = 1/T(k,k)
    for i = k-1:-1:1
        sum = 0
        for j = i+1:k
            sum = sum + T(i,j)*T(j,k)
        end
        T(i,k) = -sum/T(i,i)
    end
end
end

% pre multiply a matrix by a Householder matrix
% housmul.m
% compute  [I - 2uu'/(u'*u)]*A
% input    matrix A and vector u
% output   [A]
% [A] = housmul(A,u)
function [A] = housmul(A,u)
!rm diary8
diary diary8
[m1,n] = size(A)
beta = 2/(u'*u)
for j = 1 : n
    alpha = 0
    for i = 1 : m1
        alpha = alpha + u(i) * A(i,j)
    end
    alpha = beta * alpha
    for i = 1:m1
        A(i,j) = A(i,j) - alpha * u(i)
    end
end
end
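The following hypothetical checks (added here, not part of MATCOM; the matrices and the vector are invented) exercise the two programs above: invuptr should reproduce the built-in inv for a well-conditioned upper triangular matrix, and housmul should agree with forming the Householder matrix explicitly.

>> T = triu(rand(4,4)) + 4*eye(4);    % well-conditioned upper triangular matrix
>> norm(invuptr(T) - inv(T))          % should be of the order of roundoff
>> A = rand(4,3); u = rand(4,1);
>> H = eye(4) - 2*(u*u')/(u'*u);      % the Householder matrix formed explicitly
>> norm(housmul(A,u) - H*A)           % should also be tiny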

% LU factorization using Gaussian Elimination Without Pivoting
% lugsel.m
% input   matrix A
% output  [l,u]
% [l,u] = lugsel(A)
function [l,u] = lugsel(A)
!rm diary8
diary diary8
[m1,n] = size(A)
for k = 1 : n-1
    for i = k+1:n
        A(i,k) = A(i,k)/ A(k,k)
    end
    for i = k+1:n
        for j = k+1 :n
            A(i,j) = A(i,j) - A(i,k) * A(k,j)
        end
    end
end
u = triu(A)
l = tril(A,-1)
for i = 1:n
    l(i,i) = 1
end
end
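A hedged check of lugsel (added here; the test matrix is made up and is taken diagonally dominant so that all pivots are nonzero and no pivoting is needed):

>> A = rand(4,4) + 4*eye(4);
>> [l,u] = lugsel(A);
>> norm(l*u - A)          % should be of the order of roundoff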

% Creating Zeros in a Specified Position of a Matrix A using Givens Rotations
% givrota.m
% this program calls the MATCOM program givcs.m
% input   i, j and matrix A
% output  [A], A = J * A
% [A] = givrota(i,j,A)
function [A] = givrota(i,j,A)
!rm diary89
diary diary89
[m,n] = size(A)
x = zeros(2,1)
x(1) = A(i,i)
x(2) = A(j,i)
[c,s] = givcs(x)
J = eye(n,n)
J(i,i) = c
J(i,j) = s
J(j,i) = -s
J(j,j) = c
A = J*A
end
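As a hedged illustration (added here; the matrix is invented, and the companion MATCOM program givcs.m is assumed to be available and to return the cosine-sine pair that annihilates the second component of its input), givrota(1,3,A) should zero out the (3,1) entry of A:

>> A = rand(4,4);
>> B = givrota(1,3,A);
>> B(3,1)                 % should be of the order of machine precision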

% Solving Ax = b with partial pivoting without Explicit Factorization
% gausswf.m
% input   matrix A and vector b
% output  [A,b]; the matrix A is overwritten with the upper triangular
%         part A^(n-1) and the multipliers are stored in the lower
%         triangular part of A. The vector b is overwritten by b^(n-1).
% [A,b] = gausswf(A,b)
function [A,b] = gausswf(A,b)
!rm diary8
diary diary8
[m1,n] = size(A)
for k = 1 : n-1
    for i = k+1:n
        A(i,k) = A(i,k)/ A(k,k)
        b(i) = b(i) - A(i,k) * b(k)
        for j = k+1 :n
            A(i,j) = A(i,j) - A(i,k) * A(k,j)
        end
    end
end
end
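A hedged usage sketch (added here; the data is made up and the matrix is taken diagonally dominant so that no row interchanges are required), assuming gausswf behaves as its header comment describes: the upper triangular part of the overwritten A together with the overwritten b can be passed to backsub to recover the solution of Ax = b.

>> A = rand(4,4) + 4*eye(4); b = rand(4,1);
>> [A1,b1] = gausswf(A,b);
>> x = backsub(triu(A1),b1);
>> norm(A*x - b)           % should be small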

% Jacobi method
% jacobi.m
% input   matrix A, x, b and numitr
% output  [xold]
% [xold] = jacobi(A,x,b,numitr)
function [xold] = jacobi(A,x,b,numitr)
!rm diary8
diary diary8
[m,n] = size(A)
if m~=n
   disp('matrix A is not square')
   return
end
xold = x
Bsub = zeros(n,n)
for i = 1 : n
    for j = 1 : n
        if i ~= j
            Bsub(i,j) = -A(i,j) / A(i,i)
        end
    end
end
bsub = zeros(n,1)
for i = 1 : n
    bsub(i) = b(i) / A(i,i)
end
for i = 1 : numitr
    disp('the iteration number')
    i
    xnew = Bsub * xold + bsub
    xold = xnew
end
end

% Householder-Golub method for the least squares problem
% lsfrqrh.m
% input   matrix A and vector b
% output  [x]
% [x] = lsfrqrh(A,b)
function [x] = lsfrqrh(A,b)
!rm diary8
diary diary8
[m,n] = size(A)
y = zeros(n,1)
[q,r1] = qr(A)
c = q'*b
ran = rank(r1)
r = r1(1:ran,:)
[x] = backsub(r,c)
end
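Two quick hedged checks for the programs above (added here; the test data is invented, and the programs print the iteration count as they run): jacobi should converge for a diagonally dominant matrix, and lsfrqrh should agree with MATLAB's backslash on a full-rank overdetermined system.

>> A = 4*eye(3) + rand(3,3); b = rand(3,1);
>> x = jacobi(A, zeros(3,1), b, 50);    % 50 Jacobi sweeps from the zero vector
>> norm(A*x - b)                        % small if the iteration has converged
>> C = rand(6,3); d = rand(6,1);
>> norm(lsfrqrh(C,d) - C\d)             % the two least squares solutions should agree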

% The Basic QR Iteration
% qritrb.m
function [xold] = qritrb(A,numitr)
[m1,n] = size(A)
if m1~=n
   disp('matrix A is not square')
   return
end
[q,r] = qr(A)
for k = 1 : numitr
    disp('the iteration number')
    k
    Anew = r*q
    [q,r] = qr(Anew)
end
xold = diag(Anew)
end
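As a final hedged illustration (added here, and assuming the calling sequence [xold] = qritrb(A,numitr) reconstructed above), the basic QR iteration applied to a symmetric matrix with well separated eigenvalues should produce the eigenvalues on the diagonal of the iterates; the test matrix is made up and eig is used only for comparison.

>> A = [2 1 0; 1 3 1; 0 1 4];
>> d = qritrb(A,50);           % diagonal of the 50th QR iterate (each iteration number is printed)
>> sort(d) - sort(eig(A))      % the differences should be small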
