Biswa Nath Datta Department of Mathematical Sciences Northern Illinois University DeKalb, IL 60115
email: dattab@math.niu.edu
The book is dedicated to my Parents and my Father-in-law and Mother-in-law, whose endless blessings have made the writing of this book possible.
PREFACE
Numerical Linear Algebra is no longer just a subtopic of Numerical Analysis; it has grown into an independent topic for research over the past few years. Because of its crucial role in scientific computing, which is a major component of modern applied and engineering research, numerical linear algebra has become an integral component of undergraduate and graduate curricula in mathematics and computer science, and is increasingly becoming so in other curricula as well, especially in engineering.
The currently available books completely devoted to the subject of numerical linear algebra are Introduction to Matrix Computations by G. W. Stewart, Matrix Computations by G. H. Golub and Charles Van Loan, Fundamentals of Matrix Computations by David Watkins, and Applied Numerical Linear Algebra by William Hager. These books, along with the most celebrated book, The Algebraic Eigenvalue Problem by J. H. Wilkinson, are sources of knowledge in the subject. I personally salute the books by Stewart and by Golub and Van Loan because I have learned "my numerical linear algebra" from them. Wilkinson's book is a major reference, and the books by Stewart and by Golub and Van Loan are considered mostly to be "graduate texts" and reference books for researchers in scientific computing.
I have taught numerical linear algebra and numerical analysis at Northern Illinois University, the University of Illinois, Pennsylvania State University, the University of California, San Diego, and the State University of Campinas, Brazil. I have used with great success the books by Golub and Van Loan and by Stewart in teaching courses at the graduate level. As for introductory undergraduate numerical linear algebra courses, I, like many other instructors, have taught topics of numerical linear algebra from the popular "numerical analysis" books.
These texts typically treat numerical linear algebra merely as a subtopic, so I have found they do not adequately cover all that needs to be taught in a numerical linear algebra course. In some undergraduate books on numerical analysis, numerical linear algebra is barely touched upon. Therefore, in frustration I have occasionally prescribed the books by Stewart and by Golub and Van Loan as texts at the introductory level, although only selected portions of these books have been used in the classroom, and frequently, supplementary class notes had to be provided. When I have used these two books as "texts" in introductory courses, a major criticism (or compliment, in the view of some) coming from students on these campuses has been that they are "too rich" and "too vast" for students new to the subject.
As an instructor, I have always felt the need for a book that is geared toward the undergraduate, and which can be used as an independent text for an undergraduate course in Numerical Linear Algebra. In writing this book, I hope to fulfill such a need. The more recent books Fundamentals of Matrix Computations, by David Watkins, and Applied Numerical Linear Algebra, by William Hager, address this need to some extent.
This book, Numerical Linear Algebra and Applications, is more elementary than most existing books on the subject. It is an outgrowth of the lecture notes I have compiled over the years for use in undergraduate courses in numerical linear algebra, and which have been "class-tested" at Northern Illinois University and at the University of California, San Diego. I have deliberately chosen only those topics which I consider essential to a study of numerical linear algebra. The book is intended for use as a textbook at the undergraduate and beginning graduate levels in mathematics, computer science and engineering. It can also serve as a reference book for scientists and engineers.
However, it is primarily written for use in a first course in numerical linear algebra, and my hope is that it will bring numerical linear algebra to the undergraduate classroom.
Here the principal topics of numerical linear algebra, such as Linear Systems, the Matrix Eigenvalue Problem, Singular Value Decomposition, Least Squares Methods, etc., have been covered at a basic level. The book focuses on development of the basic tools and concepts of numerical linear algebra and their effective use in algorithm development. The algorithms are explained in "step-by-step" fashion. Wherever necessary, I have referred the reader to the exact locations of advanced treatment of the more difficult concepts, relying primarily on the aforementioned books by Stewart and by Golub and Van Loan, and occasionally on that of Wilkinson.
I have also drawn heavily on applications from different areas of science and engineering, such as electrical, mechanical and chemical engineering; physics and chemistry; statistics; control theory; and signal and image processing. At the beginning of each chapter, some illustrative case studies from applications of practical interest are provided to motivate the student. The algorithms are then outlined, followed by implementational details. MATLAB codes are provided in the appendix for some selected algorithms. A MATLAB toolkit, called MATCOM, implementing the major algorithms in Chapters 4 through 8 of the book, is included with the book.
I will consider myself successful and my efforts rewarded if the students taking a first course in numerical linear algebra and applications, using this book as a text, develop a firm grasp of the basic concepts of roundoff errors, stability, condition and accuracy, and leave with a knowledge and appreciation of the core numerical linear algebra algorithms, their basic properties and implementations. I truly believe that the book will serve as the right text for most of the existing undergraduate and first-year graduate courses in numerical linear algebra.
Furthermore, it will provide enough incentive for educators to introduce numerical linear algebra courses in their curricula, if such courses are not in existence already. Prerequisites are a first course in linear algebra and a good knowledge of scientific programming.
Following is a suggested format for instruction using Numerical Linear Algebra and Applications as a text. These guidelines have been drawn on the basis of my own teaching and that of several colleagues with whom I have had an opportunity to discuss the matter.
1. A First Course in Numerical Linear Algebra (Undergraduate, one semester)
   Chapter 1: 1.4, 1.6, 1.7, 1.8
   Chapter 2
   Chapter 3 (except possibly Section 3.8)
   Chapter 4
   Chapter 5: 5.1–5.4, 5.5.1, 5.6
   Chapter 6: 6.2, (some selected topics from Section 6.3), 6.4, 6.5.1, 6.5.3, 6.5.4, 6.6, 6.7, 6.9, 6.10.1–6.10.4
   Chapter 7: 7.2, 7.3, 7.4, 7.5, 7.6, 7.8.1, 7.8.2
   Chapter 8: 8.2, (some selected topics of Section 8.3), 8.4, 8.5.1, 8.5.3, 8.6.1, 8.6.2, 8.7, 8.9.1, 8.9.2, 8.9.3, 8.9.4
   Possibly also some very selected portions of Chapter 9 and Chapter 10, depending upon the availability of time and the interests of the students and instructors
2. A Second Course in Numerical Linear Algebra (Advanced Undergraduate, First Year Graduate, one semester)
   Chapter 1: 1.3.5, 1.3.6
   Chapter 5: 5.5, 5.6, 5.7, 5.8
   Chapter 6: 6.3, 6.5.2, 6.5.5, 6.8, 6.10.5, 6.10.6
   Chapter 7: 7.7, 7.8, 7.9, 7.10, 7.11
   Chapter 8: 8.3, 8.8, 8.9, 8.10, 8.11, 8.12
   Chapter 9: 9.2, 9.3, 9.4, 9.5, 9.6.1, 9.8, 9.9, 9.10
   Chapter 10
   Chapter 11
3. A Course on Numerical Linear Algebra for Engineers (Graduate, one semester)
   Chapter 1
   Chapter 2
   Chapter 3 (except possibly Section 3.8)
   Chapter 4
   Chapter 5: 5.1, 5.2, 5.3, 5.4
   Chapter 6: 6.2, 6.3, 6.4, 6.5.1, 6.5.3, 6.6.3 (only the statement and implication of Theorem 6.6.3), 6.7.1, 6.7.2, 6.7.3, 6.7.8, 6.8, 6.9, 6.10.1, 6.10.2, 6.10.3, 6.10.4, 6.10.5
   Chapter 7: 7.3, 7.5, 7.8.1, 7.8.2
   Chapter 8: 8.2, 8.3, 8.4, 8.5, 8.6.1, 8.6.2, 8.7.1, 8.9.1, 8.9.2, 8.9.3, 8.9.4, 8.9.6, 8.12
   Chapter 9
   Chapter 10: 10.2, 10.3, 10.4, 10.5, 10.6.1, 10.6.3, 10.6.4, 10.8.1, 10.9.1, 10.9.2
CHAPTER-WISE BREAKDOWN
Chapter 1, Some Required Concepts from Core Linear Algebra, describes some important results from theoretical linear algebra. Of special importance here are vector and matrix norms, special matrices and convergence of the sequence of matrix powers, etc., which are essential to the understanding of numerical linear algebra, and which are not usually covered in an introductory linear algebra course.
Chapter 2 is on Floating Point Numbers and Errors in Computations. Here the concepts of floating point number systems and rounding errors have been introduced, and it has been shown through examples how roundoff errors due to cancellation and recursive computations can "pop up", even in simple calculations, and how these errors can be reduced in certain cases. The IEEE floating point standard has been discussed.
Chapter 3 deals with Stability of Algorithms and Conditioning of Problems. The basic concepts of conditioning and stability, including strong and weak stability, have been introduced, and examples have been given of unstable and stable algorithms and ill-conditioned and well-conditioned problems. It has been my experience, as an instructor, that many students, even after taking a few courses on numerical analysis, do not clearly understand that "conditioning" is a property of the problem, stability is a property of the algorithm, and both have effects on the accuracy of the solution. Attempts have been made to make this as clear as possible.
It is important to understand the distinction between a "bad" algorithm and a numerically effective algorithm and the fact that popular mathematical software is based only on numerically effective algorithms. This is done in Chapter 4, Numerically Effective Algorithms and Mathematical Software. The important properties such as efficiency, numerical stability, storage economy, etc., that make an algorithm and the associated software "numerically effective" are explained with examples.
In addition, a brief statement regarding important matrix software such as LINPACK, EISPACK, IMSL, MATLAB, NAG, LAPACK, etc., is given in this chapter.
Chapter 5 is on Some Useful Transformations in Numerical Linear Algebra and Their Applications. Transformations such as elementary transformations, Householder reflections, and Givens rotations form the principal tools of most algorithms of numerical linear algebra. These important tools are introduced in this chapter, and it is shown how they are applied to achieve important decompositions such as LU and QR, and reduction to Hessenberg form. This chapter is a sort of "preparatory" chapter for the rest of the topics treated in the book.
Chapter 6 deals with the most important topic of numerical linear algebra, Numerical Solutions of Linear Systems. The direct methods, such as Gaussian elimination with and without pivoting, QR factorization methods, the method based on the Cholesky decomposition, and the methods that take advantage of special structure of matrices; the standard iterative methods, such as the Jacobi, Gauss-Seidel, and successive overrelaxation methods; iterative refinement; the perturbation analysis of the linear system; and computations of determinants, inverses and leading principal minors are discussed in this chapter. Some motivating examples from applications areas are given before the techniques are discussed.
The Least Squares Solutions to Linear Systems, discussed in Chapter 7, are so important in applications that the techniques for finding them should be discussed as much as possible, even in an introductory course in numerical linear algebra. There are users who still routinely use the normal equations method for computing least squares solutions; the numerical difficulties associated with this approach are described in some detail, and then a better method for least squares problems, based on the QR decomposition, is discussed.
The most reliable general-purpose method, based on the singular value decomposition, is mentioned in this chapter and treated in full in Chapter 10. The QR methods for the rank-deficient least squares problem and for the underdetermined problem, and iterative refinement procedures, are also discussed in this chapter. Some discussion on perturbation analysis is also included.
Chapter 8 picks up another important topic, probably the second most important topic, Numerical Matrix Eigenvalue Problems. There are users who still believe that eigenvalues should be computed by finding the zeros of the characteristic polynomial. It is clearly explained why this is not a good general rule. The standard and most widely used techniques for eigenvalue computations, the QR iteration with and without shifts, are then discussed in some detail. The popular techniques for eigenvector computations, such as the inverse power method and the Rayleigh Quotient Iteration, are described, along with techniques for locating eigenvalues. The most common methods for the symmetric eigenvalue problem and the symmetric Lanczos method are described very briefly at the end. Discussion of the stability of differential and difference equations, engineering applications to the vibration of structures, and a stock market example from statistics are included and will serve as motivating examples for the students.
Chapter 9 deals with The Generalized Eigenvalue Problem (GEP). The GEP arises in many practical applications such as mechanical vibrations, design of structures, etc. In fact, in these applications, almost all eigenvalue problems are generalized eigenvalue problems, and most of them are symmetric definite problems. We first present a generalized QR iteration for the pair (A, B), commonly known as the QZ iteration for the GEP. Then we discuss in detail techniques of simultaneous diagonalization for generalized symmetric definite problems. Some applications of simultaneous diagonalization techniques, such as decoupling of a system of second-order differential equations, are described in some detail. Since several practical applications, e.g., the design of large sparse structures, give rise to very large-scale generalized definite eigenvalue problems, a brief discussion of Lanczos-based algorithms for such problems is also included. In addition, several case studies from vibration and structural engineering are presented. A brief mention is made of how to reduce a quadratic eigenvalue problem to a standard eigenvalue problem, or to a generalized eigenvalue problem.
The Singular Value Decomposition (SVD) and singular values play important roles in a wide variety of applications. In Chapter 10, we first show how the SVD can be used effectively to solve computational linear algebra problems arising in applications, such as finding the structure of a matrix (rank, nearness to rank-deficiency, orthonormal bases for the range and the null space of a matrix, etc.), finding least squares solutions to linear systems, computing the pseudoinverse, etc. We then describe the most widely used method for computing the SVD, the Golub-Kahan-Reinsch method, and its modification by Chan. The chapter concludes with the description of a very recent method, due to Demmel and Kahan, for computing the smallest singular values of a bidiagonal matrix with high accuracy. A real-life example on separating the fetal ECG from the maternal ECG is provided in this chapter as a motivating example.
The stability (or instability) of an algorithm is usually established by means of backward roundoff error analysis, introduced and made popular by James Wilkinson. Working out the details of roundoff error analysis of an algorithm can be quite tedious, and presenting such analysis for every algorithm is certainly beyond the scope of this book. At the same time, I feel that every student of numerical linear algebra should have some familiarity with the way rounding analysis of an algorithm is performed.
We have given the readers A Taste of Roundoff Error Analysis in Chapter 11 of the book by presenting such analyses of two popular algorithms: solution of a triangular system and Gaussian elimination for triangularization. For other algorithms, we just present the results (without proof) in the appropriate places in the book, and refer the readers to the classic text The Algebraic Eigenvalue Problem by James H. Wilkinson and occasionally to the book of Golub and Van Loan for more details and proofs.
The appendix contains MATLAB codes for a selected number of basic algorithms. Students will be able to use these codes as templates for writing codes for more advanced algorithms. A MATLAB toolkit containing implementations of some of the most important algorithms has been included in the book as well. Students can use this toolkit to compare different algorithms for the same problem with respect to efficiency, accuracy, and stability. Finally, some discussion of how to write MATLAB programs will be included.
NUMERICAL LINEAR ALGEBRA AND APPLICATIONS
Some Basic Features of the Book
Clear explanation of the basic concepts. The two most fundamental concepts of Numerical Linear Algebra, namely, the conditioning of the problem and the stability of an algorithm via backward roundoff error analysis, are introduced at a very early stage of the book with simple motivating examples. Specific results on these concepts are then stated with respect to each algorithm and problem in the appropriate places, and their influence on the accuracy of the computed results is clearly demonstrated. The concepts of weak and strong stability, recently introduced by James Bunch, will appear for the first time in this book. Most undergraduate numerical analysis textbooks are somewhat vague in explaining these concepts which, I believe, are fundamental to numerical linear algebra.
Discussion of fundamental tools in a separate chapter. Elementary, Householder and Givens matrices are the three most basic tools in numerical linear algebra. Most computationally effective numerical linear algebra algorithms have been developed using these basic tools as principal ingredients. A separate chapter (Chapter 5) has been devoted to the introduction and discussion of these basic tools. It has been clearly demonstrated how a simple but very powerful property of these matrices, namely, the ability to introduce zeros in specific positions of a vector or of a matrix, can be exploited to develop algorithms for useful matrix factorizations such as LU and QR and for reduction of a matrix to a simple form such as Hessenberg. In my experience as a teacher, I have seen that once students have been made familiar with these basic tools and have learned some of their most immediate applications, the remainder of the course goes very smoothly and quite fast. Throughout the text, soon after describing a basic algorithm, it has been shown how the algorithm can be made cost-effective and storage-efficient using the rich structures of these matrices.
Step-by-step explanation of the algorithms. The following approach has been adopted in the book for describing an algorithm: the first few steps of the algorithm are described in detail and in an elementary way, and then it is shown how the general kth step can be written following the pattern of these first few steps. This is particularly helpful to the understanding of an algorithm at an undergraduate level. Before presenting an algorithm, the basic ideas, the underlying principles and a clear goal of the algorithm have been discussed. This approach appeals to the student's creativity and stimulates his interest. I have seen from my own experience that once the basic ideas, the mechanics of the development and goals of the algorithm have been laid out for the student, he may then be able to reproduce some of the well-known algorithms himself, even before learning them in the class.
Clear discussion of numerically effective algorithms and high-quality mathematical software. Along with mathematical software, a clear and concise definition of a "numerically effective" algorithm has been introduced in Chapter 3, and the important properties such as efficiency, numerical stability, storage economy, etc., that make an algorithm and associated software numerically effective have been explained with ample simple examples. This will help students not only to understand the distinction between a "bad" algorithm and a numerically effective one, but also to learn how to transform a bad algorithm into a good one, whenever possible. These ideas are not clearly spelled out in undergraduate texts and, as a result, I have seen students who, despite having taken a few basic courses in numerical analysis, remain confused about these issues. For example, an algorithm which is only efficient is often mistaken by students for a "good" algorithm, without understanding the fact that an efficient algorithm can be highly unstable (e.g., Gaussian elimination without pivoting).
Applications. A major strength of the book is applications. As a teacher, I have often been faced with questions such as: "Why is it important to study such-and-such problems?", "Why do such-and-such problems need to be solved numerically?", or "What is the physical significance of the computed quantities?" Therefore, I felt it important to include real-life examples as often as possible, for each computational problem discussed in the book. I have done so at the outset of each chapter where numerical solutions of a computational problem have been discussed. The motivating examples have been drawn from applications areas, mainly from engineering; however, some examples from statistics, business, bioscience, and control theory have also been given. I believe these examples will provide sufficient motivation to the curious student to study numerical linear algebra. After a physical problem has been posed, the physical and engineering significance of its solution has been explained to some extent. The currently available numerical linear algebra and numerical analysis books do not provide sufficiently motivating examples.
MATLAB codes and the MATLAB toolkit. The use of MATLAB is becoming increasingly popular in all areas of scientific and engineering computing. I feel that numerical linear algebra courses should be taught using MATLAB wherever possible. Of course, this does not mean that the students should not learn to write FORTRAN codes for their favorite algorithms; knowledge of FORTRAN is a great asset to a numerical linear algebra student. MATLAB codes for some selected basic algorithms have therefore been provided to help the students use these codes as templates for writing codes for more advanced algorithms. Also, a MATLAB toolkit implementing the major algorithms presented in the book has been provided. The students will be able to compare different algorithms for the same problem with regard to efficiency, stability, and accuracy. For example, the students will be able to see instantly, through numerical examples, why Gaussian elimination is more efficient than the QR factorization method for linear systems problems, why the computed Q in QR factorization may be more accurate with the Householder or Givens method than with the Gram-Schmidt methods, etc.
Thorough discussions and the most up-to-date information. Each topic has been very thoroughly discussed, and the most current information on the state of the problem has been provided. The questions most frequently asked by the students have also been answered.
Solutions and answers to selected problems. Partial solutions for selected important problems and, in some cases, complete answers, have been provided. I feel this is important for our undergraduate students. In selecting the problems, emphasis has been placed on those problems that need proofs.
Above all, I have imparted to the book my enthusiasm and my unique style of presenting material in an undergraduate course at the level of the majority of students in the class, which have made me a popular teacher. My teaching evaluations at every school at which I have taught (e.g., State University of Campinas, Brazil; Pennsylvania State University; the University of Illinois at Urbana-Champaign; University of California, San Diego; Northern Illinois University, etc.) have been consistently "excellent" or "very good". As a matter of fact, the consistently excellent feedback that I receive from my students provided me with enough incentive to write this book.
0. LINEAR ALGEBRA PROBLEMS, THEIR IMPORTANCE AND COMPUTATIONAL DIFFICULTIES
0.1 Introduction
0.2 Fundamental Linear Algebra Problems and Their Importance
0.3 Computational Difficulties of Solving Linear Algebra Problems Using Obvious Approaches
CHAPTER 0 LINEAR ALGEBRA PROBLEMS, THEIR IMPORTANCE AND COMPUTATIONAL DIFFICULTIES
0.1 Introduction
The main objectives of this chapter are to state the fundamental linear algebra problems at the outset, make a brief mention of their importance, and point out the difficulties that one faces in a computational setting when trying to solve these problems using obvious approaches.
0.2 Fundamental Linear Algebra Problems and Their Importance
The fundamental linear algebra problems are:
A. The Linear System Problem: Given an $n \times n$ nonsingular matrix $A$ and an $n$-vector $b$, the problem is to find an $n$-vector $x$ such that $Ax = b$.
A practical variation of the problem requires solutions of several linear systems with the same matrix $A$ on the left-hand side. That is, the problem there is to find a matrix $X = [x_1, x_2, \ldots, x_m]$ such that $AX = B$, where $B = [b_1, b_2, \ldots, b_m]$ is an $n \times m$ matrix.
Associated with linear system problems are the problems of finding the inverse of a matrix, finding the rank, the determinant, the leading principal minors, an orthonormal basis for the range and the null space of $A$, and various projection matrices associated with $A$. Solutions of some of these latter problems require matrix factorizations, and the problems of matrix factorization and of solving linear systems are intimately related.
It is perhaps not an exaggeration to say that the linear system problem arises in almost all branches of science and engineering: applied mathematics, biology, chemistry, physics, electrical, mechanical, civil, and vibration engineering, etc. The most common source is the numerical solution of differential equations. Many mathematical models of physical and engineering systems are systems of differential equations, ordinary and partial. A system of differential equations is normally solved numerically by discretizing the system by means of finite difference or finite element methods. The process of discretization, in general, leads to a linear system, the solution of which is an approximate solution to the differential equations (see Chapter 6 for more details).
B. The Least Squares Problem: Given an $m \times n$ matrix $A$ and an $m$-vector $b$, the least squares problem is to find an $n$-vector $x$ such that the norm of the residual vector, $\|Ax - b\|_2$, is as small as possible.
Least squares problems arise in statistical and geometric applications that require fitting a polynomial or curve to experimental data, and in engineering applications such as signal and image processing. See Chapter 7 for some specific applications of least squares problems. It is worth mentioning here that methods for numerically solving least squares problems invariably lead to solutions of linear systems problems (see again Chapter 7 for details).
C. The Eigenvalue Problem: Given an $n \times n$ matrix $A$, the problem is to find $n$ numbers $\lambda_i$ and $n$-vectors $x_i$ such that
$A x_i = \lambda_i x_i, \quad i = 1, \ldots, n.$
The eigenvalue problem typically arises in the explicit solution and stability analysis of a homogeneous system of first-order differential equations. The stability analysis requires only implicit knowledge of eigenvalues, whereas the explicit solution requires the eigenvalues and eigenvectors explicitly.
Applications such as buckling problems, stock market analysis, study of behavior of dynamical systems, etc. require computations of only a few eigenvalues and eigenvectors, usually the few largest or smallest ones.
In many practical instances, the matrix $A$ is symmetric, and thus the eigenvalue problem becomes a symmetric eigenvalue problem. For details of some specific applications see Chapter 8. A great number of eigenvalue problems arising in engineering applications are, however, generalized eigenvalue problems, as stated below.
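As a small illustration of how eigenvalues and eigenvectors enter the explicit solution and the stability analysis of a first-order system, consider the following hand-worked case. The $2 \times 2$ matrix is an assumption chosen only because its eigenvalues are easy to find by hand; it is not taken from the text, and Section 0.3 explains why this hand method is not a numerical procedure for general matrices.

```latex
\[
x'(t) = A\,x(t), \qquad
A = \begin{pmatrix} 0 & 1 \\ -2 & -3 \end{pmatrix}, \qquad
\det(A - \lambda I) = \lambda^2 + 3\lambda + 2 = (\lambda + 1)(\lambda + 2).
\]
\[
\lambda_1 = -1,\quad x_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad
\lambda_2 = -2,\quad x_2 = \begin{pmatrix} 1 \\ -2 \end{pmatrix}, \qquad
A x_i = \lambda_i x_i .
\]
\[
x(t) = c_1 e^{-t} x_1 + c_2 e^{-2t} x_2 \;\longrightarrow\; 0
\quad \text{as } t \to \infty .
\]
```

The explicit solution needs both the eigenvalues and the eigenvectors, while the conclusion that every solution decays needs only the fact that both eigenvalues have negative real parts.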
D. The Generalized and Quadratic Eigenvalue Problems: Given the $n \times n$ matrices $A$, $B$, and $C$, the problem is to find $\lambda_i$ and $x_i$ such that
$(\lambda_i^2 A + \lambda_i C + B) x_i = 0, \quad i = 1, \ldots, n.$
This is known as the quadratic eigenvalue problem. In the special case when $C$ is a zero matrix, the problem reduces to a generalized eigenvalue problem. That is, if we are given $n \times n$ matrices $A$ and $B$, we must find $\lambda$ and $x$ such that
$Ax = \lambda Bx.$
The leading equations of vibration engineering (a branch of engineering dealing with vibrations of structures, etc.) are systems of homogeneous or nonhomogeneous second-order differential equations. A homogeneous second-order system has the form
$A\ddot{z} + C\dot{z} + Bz = 0,$
the solution and stability analysis of which lead to a quadratic eigenvalue problem. Vibration problems are usually solved by setting $C = 0$. Moreover, in many practical instances, the matrices $A$ and $B$ are symmetric and positive definite. This leads to a symmetric definite generalized eigenvalue problem.
See Chapter 9 for details of some specific applications of these problems.
E. The Singular Value Decomposition Problem: Given an $m \times n$ matrix $A$, the problem is to find unitary matrices $U$ and $V$, and a "diagonal" matrix $\Sigma$ such that
$A = U \Sigma V^{*}.$
The above decomposition is known as the Singular Value Decomposition of $A$. The diagonal entries of $\Sigma$ are the singular values. The column vectors of $U$ and $V$ are called the singular vectors.
Many areas of engineering, such as control and systems theory, biomedical engineering, signal and image processing, and statistical applications, give rise to the singular value decomposition problem. These applications typically require the rank of $A$, an orthonormal basis, projections, the distance of a matrix from another matrix of lower rank, etc., in the presence of certain impurities (known as noise) in the data. The singular values and singular vectors are the most numerically reliable tools for finding these entities. The singular value decomposition is also the most numerically effective approach to solving the least squares problem, especially in the rank-deficient case.
0.3 Computational Difficulties of Solving Linear Algebra Problems Using Obvious Approaches
In this section we would like to point out some computational difficulties one might face while attempting to solve some of the above-mentioned linear algebra problems in "obvious" ways.
Solving a Linear System by Cramer's Rule: Cramer's Rule, as taught in an undergraduate linear algebra course, is of significant theoretical and historical importance (for a statement of this rule, see Chapter 6). Unfortunately, it cannot be recommended as a practical computational procedure. Solving a $20 \times 20$ linear system with this rule, with the determinants evaluated by cofactor expansion, might take more than a million years even on a fast modern-day computer.
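The cost of Cramer's Rule can be made concrete with a rough operation count. The sketch below assumes that each determinant is evaluated by full cofactor (Laplace) expansion and counts multiplications only. The function names and the machine speed `ASSUMED_RATE` are hypothetical choices made for illustration, not figures from the text.

```python
def cofactor_multiplications(n):
    """Multiplications used by full cofactor expansion of an n x n determinant."""
    count = 0  # a 1 x 1 determinant needs no multiplication
    for k in range(2, n + 1):
        # Expanding along a row: k minors of order k - 1, each followed by
        # one multiplication with the corresponding matrix entry.
        count = k * (count + 1)
    return count


def cramer_years(n, mults_per_second):
    """Rough time in years for Cramer's Rule on an n x n system."""
    # Cramer's Rule needs n + 1 determinants; the n final divisions are ignored.
    total = (n + 1) * cofactor_multiplications(n)
    seconds_per_year = 365.25 * 24 * 3600
    return total / mults_per_second / seconds_per_year


ASSUMED_RATE = 1e6  # hypothetical speed: multiplications per second

for n in (5, 10, 20):
    print(n, cofactor_multiplications(n), cramer_years(n, ASSUMED_RATE))
```

The count grows roughly like n!, so under the assumed rate the estimate for n = 20 runs into millions of years. A faster assumed machine shrinks the figure proportionally, but the factorial growth remains.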
Computing the unique solution of a linear system by matrix inversion: The unique solution of a nonsingular linear system can be written explicitly as $x = A^{-1}b$. Unfortunately, computing a solution to a linear system by first explicitly computing the matrix inverse is not practical. The computation of the matrix inverse is about three times as expensive as solving the linear system problem itself using a standard elimination procedure (see Chapter 6), and often leads to more inaccuracies. Consider a trivial example: Solve $3x = 27$. An elimination procedure will give $x = 9$ and require only one division. On the other hand, solving the equation using matrix inversion will be cast as $x = (1/3) \times 27$, giving $x = 0.3333 \times 27 = 8.999$ (in four-digit arithmetic), and will require one division and one multiplication.
Note that computer time consumed by an algorithm is theoretically measured by the number of arithmetic operations needed to execute the algorithm.
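The four-digit example can be reproduced with Python's standard `decimal` module, which rounds every operation to a chosen number of significant digits. This is a minimal sketch; the helper names are hypothetical, and decimal rounding stands in for the four-digit arithmetic of the text.

```python
from decimal import Decimal, localcontext


def solve_by_division(a, b, digits=4):
    """Solve a*x = b with one division, rounded to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(b) / Decimal(a)


def solve_by_inverse(a, b, digits=4):
    """Solve a*x = b by first forming 1/a, then multiplying by b."""
    with localcontext() as ctx:
        ctx.prec = digits
        inverse = Decimal(1) / Decimal(a)  # rounded inverse, here 0.3333
        return inverse * Decimal(b)        # the rounding error is carried along


direct = solve_by_division(3, 27)
via_inverse = solve_by_inverse(3, 27)
print(direct, via_inverse)
```

The direct division returns 9, while the route through the rounded inverse returns 8.999, at the cost of one extra operation.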
Solving a least squares problem by normal equations: If the $m \times n$ matrix $A$ has full rank, and $m$ is greater than or equal to $n$, then the least squares problem has a unique solution and this solution is theoretically given by the solution $x$ to the linear system
\[ A^T A x = A^T b. \]
The above equations are known as the normal equations. Unfortunately, this procedure has some severe numerical limitations. First, in finite precision arithmetic, during an explicit formation of $A^TA$, some vital information might be lost. Second, the normal equations are more sensitive to perturbations than the ordinary linear system $Ax = b$, and this sensitivity, in certain instances, corrupts the accuracy of the computed least squares solution to an extent not warranted by the data. (See Chapter 7 for more details.)
Computing the eigenvalues of a matrix by finding the roots of its characteristic polynomial: The eigenvalues of a matrix $A$ are the zeros of its characteristic polynomial. Thus an "obvious" procedure for finding the eigenvalues would be to compute the characteristic polynomial of $A$ and then find its zeros by a standard, well-established root-finding procedure. Unfortunately, this is not a numerically viable approach. The round-off errors produced while computing the characteristic polynomial will very likely produce some small perturbations in the computed coefficients. These small errors in the coefficients can affect the computed zeros very drastically in certain cases. The zeros of certain polynomials are known to be extremely sensitive to small perturbations in the coefficients. A classic example of this is the Wilkinson polynomial (see Chapter 3). Wilkinson took a polynomial of degree 20 with the distinct roots 1 through 20, and perturbed the coefficient of $x^{19}$ by a very small amount. The zeros of this slightly perturbed polynomial were then computed by a well-established root-finding procedure, only to find that some zeros became totally different. Some even became complex.
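A rough feel for this sensitivity can be obtained without any root finder. The sketch below uses the first-order estimate $\delta\, r^{19} / p'(r)$ for the movement of the root $r$ when $\delta x^{19}$ is subtracted from $p(x) = (x-1)(x-2)\cdots(x-20)$. The value $\delta = 2^{-23}$ is an assumption (it is the perturbation usually quoted for Wilkinson's experiment, not a figure given in this section), and `first_order_shift` is a hypothetical helper name.

```python
from fractions import Fraction
from math import prod

# Wilkinson's polynomial p(x) = (x - 1)(x - 2)...(x - 20).
roots = range(1, 21)
delta = Fraction(1, 2**23)  # assumed size of the change in the x^19 coefficient

def first_order_shift(r):
    """Linearized movement of root r when delta * x^19 is subtracted from p."""
    dp = prod(r - k for k in roots if k != r)  # p'(r), an exact integer
    return float(delta * r**19 / dp)

shifts = {r: first_order_shift(r) for r in roots}
for r in (1, 10, 15, 20):
    print(r, shifts[r])
```

The estimates for the small roots are negligible, while those for the larger roots are far bigger than the spacing between neighbouring roots. A linearized estimate of that size only signals that the roots move drastically; it is not the actual displacement.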
Solving the Generalized Eigenvalue Problem and the Quadratic Eigenvalue Problems by Matrix Inversion: The generalized eigenvalue problem
\[ Ax = \lambda Bx, \]
in the case where $B$ is nonsingular, is theoretically equivalent to the ordinary eigenvalue problem
\[ B^{-1}Ax = \lambda x. \]
However, if the nonsingular matrix $B$ is sensitive to perturbations, then forming the matrix on the left-hand side by explicitly computing the inverse of $B$ will lead to inaccuracies that in turn will lead to inaccurate computed generalized eigenvalues. Similar results hold for the quadratic eigenvalue problem. In major engineering applications, such as in vibration engineering, the matrix $A$ is symmetric positive definite, and is thus nonsingular. In that case the quadratic eigenvalue problem is equivalent to the eigenvalue problem $Eu = \lambda u$, where
\[ E = \begin{pmatrix} 0 & I \\ -A^{-1}B & -A^{-1}C \end{pmatrix}. \]
But numerically it is not advisable to solve the quadratic eigenvalue problem by actually computing the matrix $E$ explicitly. If $A$ is sensitive to small perturbations, the matrix $E$ cannot be formed accurately, and the computed eigenvalues will then be inaccurate.
Finding the Singular Values by computing the eigenvalues of $A^TA$: Theoretically, the singular values of $A$ are the nonnegative square roots of the eigenvalues of $A^TA$. However, finding the singular values this way is not advisable. Again, explicit formation of the matrix might lead to the loss of significant relevant information. Consider a rather trivial example:
\[ A = \begin{pmatrix} 1 & 1 \\ \epsilon & 0 \\ 0 & 0 \end{pmatrix}, \]
where $\epsilon$ is such that in finite precision computation $1 + \epsilon^2 = 1$. Then computationally we have
\[ A^TA = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}. \]
The eigenvalues now are 2 and 0. So the computed singular values will now be given by $\sqrt{2}$ and 0. The exact singular values, however, are approximately $\sqrt{2}$ and $\epsilon/\sqrt{2}$. (See Chapter 10 for details.)
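The loss of information can be reproduced in ordinary double precision. The sketch below assumes the example matrix has rows $(1, 1)$, $(\epsilon, 0)$, $(0, 0)$ and picks $\epsilon = 10^{-9}$ as an illustrative value, small enough that $1 + \epsilon^2$ rounds to 1.

```python
import math

eps = 1e-9  # assumed value: eps**2 = 1e-18 is below double-precision resolution near 1
assert 1.0 + eps**2 == 1.0

A = [[1.0, 1.0],
     [eps, 0.0],
     [0.0, 0.0]]

# Form A^T A explicitly in floating point.
ata = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]

# Eigenvalues of the symmetric 2 x 2 matrix from its trace and determinant.
tr = ata[0][0] + ata[1][1]
det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
computed = [math.sqrt(max(tr / 2 + disc, 0.0)),
            math.sqrt(max(tr / 2 - disc, 0.0))]

print(ata)                 # every entry is 1.0: epsilon has disappeared
print(computed)            # the second computed singular value is 0.0
print(eps / math.sqrt(2))  # approximate size of the singular value that was lost
```

Once $A^TA$ has been rounded, no later computation can recover the small singular value, however accurate the eigenvalue routine is.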
Conclusion: Above we have merely pointed out how certain obvious theoretical approaches to linear algebra problems might lead to computational difficulties and inaccuracies in computed results. Numerical linear algebra deals with in-depth analysis of such difficulties, investigations of how these difficulties can be overcome in certain instances, and with formulation and implementation of viable numerical algorithms for scientific and engineering use.
1. A REVIEW OF SOME REQUIRED CONCEPTS FROM CORE LINEAR ALGEBRA
1.1 Introduction
1.2 Vectors
  1.2.1 Subspace and Basis
1.3 Matrices
  1.3.1 Range and Null Spaces
  1.3.2 Rank of a Matrix
  1.3.3 The Inverse of a Matrix
  1.3.4 Similar Matrices
  1.3.5 Orthogonality and Projections
  1.3.6 Projection of a Vector onto the Range and the Null Space of a Matrix
1.4 Some Special Matrices
1.5 The Cayley-Hamilton Theorem
1.6 Singular Values
1.7 Vector and Matrix Norms
  1.7.1 Vector Norms
  1.7.2 Matrix Norms
  1.7.3 Convergence of a Matrix Sequence and Convergent Matrices
  1.7.4 Norms and Inverses
1.8 Norm Invariant Properties of Orthogonal Matrices
1.9 Review and Summary
1.10 Suggestions for Further Reading
CHAPTER 1 A REVIEW OF SOME REQUIRED CONCEPTS FROM CORE LINEAR ALGEBRA
1.1 Introduction
Although a first course in linear algebra is a prerequisite for this book, for the sake of completeness we establish some notation and quickly review the basic definitions and concepts on matrices and vectors in this chapter, and then discuss in somewhat greater detail the concepts and fundamental results on vector and matrix norms and their applications to the study of convergent matrices. These results will be used frequently in the later chapters of the book.
1.2 Vectors
An ordered set of numbers is called a vector; the numbers themselves are called the components of the vector. A lower case italic letter is usually used to denote a vector. A vector $v$ having $n$ components has the form
\[ v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}. \]
A vector in this form is referred to as a column vector and its transpose is a row vector. The set of all $n$-vectors (that is, each vector having $n$ components) will be denoted by $\mathbb{R}^{n \times 1}$ or simply by $\mathbb{R}^n$. The transpose of a vector $v$ will be denoted by $v^T$. Unless otherwise stated, a column vector will simply be called a vector. If $u$ and $v$ are two vectors in $\mathbb{R}^n$, then their sum $u + v$ is defined by
\[ u + v = (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n)^T. \]
If $c$ is a scalar, then $cu = (cu_1, cu_2, \ldots, cu_n)^T$. The inner product of two vectors $u$ and $v$ is the scalar given by
\[ u^T v = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n. \]
The length of a vector $v$, denoted by $\|v\|$, is $\sqrt{v^T v}$; that is, the length of $v$ (or Euclidean length of $v$) is $\sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$. A set of vectors $\{m_1, \ldots, m_k\}$ in $\mathbb{R}^n$ is said to be linearly dependent if there exist scalars $c_1, \ldots, c_k$, not all zero, such that
\[ c_1 m_1 + \cdots + c_k m_k = 0 \quad \text{(zero vector)}. \]
Otherwise, the set is called linearly independent.
Example 1.2.1
The set of vectors
\[ e_i = (0, 0, \ldots, 0, 1, 0, \ldots, 0)^T, \quad i = 1, \ldots, n, \]
where the 1 is the $i$th component, is linearly independent, because
\[ c_1 e_1 + c_2 e_2 + \cdots + c_n e_n = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix} = 0 \]
is true if and only if
\[ c_1 = c_2 = \cdots = c_n = 0. \]
Example 1.2.2
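The vectors $\begin{pmatrix} 1 \\ -2 \end{pmatrix}$ and $\begin{pmatrix} -3 \\ 6 \end{pmatrix}$ are linearly dependent, because
\[ 3 \begin{pmatrix} 1 \\ -2 \end{pmatrix} + \begin{pmatrix} -3 \\ 6 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
Thus, $c_1 = 3$, $c_2 = 1$.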
1.2.1 Subspace and Basis
Orthogonality of two vectors: The angle $\theta$ between two vectors $u$ and $v$ is given by
\[ \cos(\theta) = \frac{u^T v}{\|u\| \|v\|}. \]
Two vectors $u$ and $v$ are orthogonal if $\theta = 90^\circ$, that is, $u^T v = 0$. The symbol $\perp$ is used to denote orthogonality.
Let $S$ be a set of vectors in $\mathbb{R}^n$. Then $S$ is called a subspace of $\mathbb{R}^n$ if $s_1, s_2 \in S$ implies $c_1 s_1 + c_2 s_2 \in S$, where $c_1$ and $c_2$ are any scalars. That is, $S$ is a subspace if any linear combination of two vectors in $S$ is also in $S$. Note that the space $\mathbb{R}^n$ itself is a subspace of $\mathbb{R}^n$. For every subspace there is a unique smallest positive integer $r$ such that every vector in the subspace can be expressed as a linear combination of at most $r$ vectors in the subspace; $r$ is called the dimension of the subspace and is denoted by $\dim[S]$. Any set of $r$ linearly independent vectors from $S$ with $\dim[S] = r$ forms a basis of the subspace.
Orthogonality of Two Subspaces. Two subspaces $S_1$ and $S_2$ of $\mathbb{R}^n$ are said to be orthogonal if $s_1^T s_2 = 0$ for every $s_1 \in S_1$ and every $s_2 \in S_2$. Two orthogonal subspaces $S_1$ and $S_2$ will be denoted by $S_1 \perp S_2$.
1.3 Matrices
A collection of $n$ vectors in $\mathbb{R}^m$ arranged in a rectangular array of $m$ rows and $n$ columns is called a matrix. A matrix $A$, therefore, has the form
\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. \]
It is denoted by $A = (a_{ij})_{m \times n}$, or simply by $A = (a_{ij})$, where it is understood that $i = 1, \ldots, m$ and $j = 1, \ldots, n$. $A$ is said to be of order $m \times n$. The set of all $m \times n$ matrices is denoted by $\mathbb{R}^{m \times n}$. A matrix $A$ having the same number of rows and columns is called a square matrix. The square matrix having 1's along the main diagonal and zeros everywhere else is called the identity matrix and is denoted by $I$. The sum of two matrices $A = (a_{ij})$ and $B = (b_{ij})$ in $\mathbb{R}^{m \times n}$ is a matrix of the same order as $A$ and $B$ and is given by
\[ A + B = (a_{ij} + b_{ij}). \]
If $c$ is a scalar, then $cA$ is a matrix given by
\[ cA = (c a_{ij}). \]
Let $A$ be $m \times n$ and $B$ be $n \times p$. Then their product $AB$ is an $m \times p$ matrix given by
\[ AB = \left( \sum_{k=1}^{n} a_{ik} b_{kj} \right), \quad i = 1, \ldots, m; \quad j = 1, \ldots, p. \]
Note that if $b$ is a column vector, then $Ab$ is a column vector. On the other hand, if $a$ is a column vector and $b^T$ is a row vector, then $ab^T$ is a matrix, known as the outer product of the two vectors $a$ and $b$. Thus
\[ ab^T = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} (b_1, b_2, \ldots, b_m) = \begin{pmatrix} a_1 b_1 & \cdots & a_1 b_m \\ a_2 b_1 & \cdots & a_2 b_m \\ \vdots & & \vdots \\ a_n b_1 & \cdots & a_n b_m \end{pmatrix}. \]
Example 1.3.1
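\[ a = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \qquad b = \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix}. \]
Outer product:
\[ ab^T = \begin{pmatrix} 2 & 3 & 4 \\ 4 & 6 & 8 \\ 6 & 9 & 12 \end{pmatrix} \quad \text{(a matrix)}. \]
Inner product:
\[ a^T b = (1, 2, 3) \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix} = 20 \quad \text{(a scalar)}. \]
The transpose of a matrix $A$ of order $m \times n$, denoted by $A^T$, is a matrix of order $n \times m$ with rows and columns interchanged:
\[ A^T = (a_{ji}), \quad i = 1, \ldots, n; \quad j = 1, \ldots, m. \]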
Note that the matrix product is not commutative; that is, in general
\[ AB \neq BA. \]
Also, $(AB)^T = B^T A^T$.
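The outer product, the inner product, and the two remarks on matrix products can be checked with a few lines of code. This is a minimal sketch with matrices stored as lists of rows; the helpers `matmul` and `transpose` and the $2 \times 2$ matrices `A` and `B` are illustrative choices, not taken from the text.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*X)]

a = [[1], [2], [3]]   # column vector a of Example 1.3.1
b = [[2], [3], [4]]   # column vector b of Example 1.3.1

outer = matmul(a, transpose(b))   # 3 x 3 matrix
inner = matmul(transpose(a), b)   # 1 x 1 matrix holding the scalar

A = [[1, 2], [3, 4]]   # arbitrary matrices chosen for the check
B = [[0, 1], [1, 0]]

print(outer)
print(inner)
print(matmul(A, B), matmul(B, A))   # the two products differ
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))
```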
Hermitian (Symmetric) Matrix: A complex matrix $A$ is called Hermitian if $A^* = (\bar{A})^T = A$, where $\bar{A}$ is the complex conjugate of $A$. A real matrix is symmetric if $A^T = A$.
An alternative way of writing the matrix product
Writing $B = (b_1, \ldots, b_p)$, where $b_i$ is the $i$th column of $B$, the matrix product $AB$ can be written as
\[ AB = (Ab_1, \ldots, Ab_p). \]
Similarly, if $a_i$ is the $i$th row of $A$, then
\[ AB = \begin{pmatrix} a_1 B \\ a_2 B \\ \vdots \\ a_m B \end{pmatrix}. \]
Block Matrices
If two matrices $A$ and $B$ can be partitioned as
\[ A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}, \]
then considering each block as an element of the matrix, we can perform addition, scalar multiplication and matrix multiplication in the usual way. Thus,
\[ A + B = \begin{pmatrix} A_{11} + B_{11} & A_{12} + B_{12} \\ A_{21} + B_{21} & A_{22} + B_{22} \end{pmatrix} \]
and
\[ AB = \begin{pmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{pmatrix}, \]
assuming that the partitioning has been done conformably so that the corresponding matrix multiplications are possible. The concept of two-block partitioning can be easily generalized. Thus, if $A = (A_{ij})$ and $B = (B_{ij})$ are two block matrices, then $C = AB$ is given by
\[ C = (C_{ij}) = \left( \sum_{k=1}^{n} A_{ik} B_{kj} \right), \]
where each $A_{ik}$, $B_{kj}$, and $C_{ij}$ is a block. A block diagonal matrix is a diagonal matrix where each diagonal element is a square matrix. That is, $A = \operatorname{diag}(A_{11}, \ldots, A_{nn})$, where the $A_{ii}$ are square matrices.
The Determinant of a Matrix
For every square matrix $A$, there is a unique number associated with the matrix called the determinant of $A$, which is denoted by $\det(A)$. For a $2 \times 2$ matrix $A$, $\det(A) = a_{11}a_{22} - a_{12}a_{21}$; for a $3 \times 3$ matrix $A = (a_{ij})$, $\det(A) = a_{11}\det(A_{11}) - a_{12}\det(A_{12}) + a_{13}\det(A_{13})$, where $A_{1i}$ is the $2 \times 2$ submatrix obtained by eliminating the first row and the $i$th column. This can be easily generalized. For an $n \times n$ matrix $A = (a_{ij})$ we have
\[ \det(A) = (-1)^{i+1} a_{i1} \det(A_{i1}) + (-1)^{i+2} a_{i2} \det(A_{i2}) + \cdots + (-1)^{i+n} a_{in} \det(A_{in}), \]
where $A_{ij}$ is the submatrix of $A$ of order $(n-1)$ obtained by eliminating the $i$th row and $j$th column.
Example 1.3.2
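\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}. \]
Set $i = 1$. Then
\[ \det(A) = 1 \cdot \det\begin{pmatrix} 5 & 6 \\ 8 & 9 \end{pmatrix} - 2 \cdot \det\begin{pmatrix} 4 & 6 \\ 7 & 9 \end{pmatrix} + 3 \cdot \det\begin{pmatrix} 4 & 5 \\ 7 & 8 \end{pmatrix} = 1(-3) - 2(-6) + 3(-3) = 0. \]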
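The expansion along the first row translates directly into a recursive routine. The sketch below is only an illustration of the definition; the function name `det` is an arbitrary choice.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Submatrix obtained by deleting the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(det(A))  # 0
```

Each call on an $n \times n$ matrix spawns $n$ calls on matrices of order $n - 1$, so the work grows like $n!$. This is fine for small examples and hopeless for matrices of even moderate size.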
Theorem 1.3.1 The following simple properties of $\det(A)$ hold:
1. $\det(A) = \det(A^T)$.
2. $\det(\alpha A) = \alpha^n \det(A)$, where $\alpha$ is a scalar.
3. $\det(AB) = \det(A)\det(B)$.
4. If two rows or two columns of $A$ are identical, then $\det(A) = 0$.
5. If $B$ is a matrix obtained from $A$ by interchanging two rows or two columns, then $\det(B) = -\det(A)$.
6. The determinant of a triangular matrix is the product of its diagonal entries. (A square matrix $A$ is triangular if its elements below or above the diagonal are all zero.)
The Characteristic Polynomial, the Eigenvalues and Eigenvectors of a Matrix
Let $A$ be an $n \times n$ matrix. Then the polynomial $p_n(\lambda) = \det(\lambda I - A)$ is called the characteristic polynomial. The zeros of the characteristic polynomial are called the eigenvalues of $A$. Note that this is equivalent to the following: $\lambda$ is an eigenvalue of $A$ if and only if there exists a nonzero vector $x$ such that $Ax = \lambda x$. The vector $x$ is called a right eigenvector (or just an eigenvector), and the vector $y$ satisfying $y^* A = \lambda y^*$ is called a left eigenvector associated with $\lambda$.
Definition 1.3.1 An $n \times n$ matrix $A$ having fewer than $n$ linearly independent eigenvectors is called a defective matrix.
Example 1.3.3
The matrix
\[ A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \]
is defective. The two eigenvectors $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ 0 \end{pmatrix}$ are linearly dependent.
The Determinant of a Block Matrix
Let
\[ A = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}, \]
where $A_{11}$ and $A_{22}$ are square matrices. Then $\det(A) = \det(A_{11}) \det(A_{22})$.
1.3.1 Range and Null Spaces
For every $m \times n$ matrix $A$, there are two important associated subspaces: the Range of $A$, denoted by $R(A)$, and the Null Space of $A$, denoted by $N(A)$:
\[ R(A) = \{ b \in \mathbb{R}^m \mid b = Ax \text{ for some } x \in \mathbb{R}^n \}, \]
\[ N(A) = \{ x \in \mathbb{R}^n \mid Ax = 0 \}. \]
Let $S$ be a subspace of $\mathbb{R}^m$. Then the subspace $S^\perp$ defined by
\[ S^\perp = \{ y \in \mathbb{R}^m \mid y^T x = 0 \text{ for all } x \in S \} \]
is called the orthogonal complement of $S$. It can be shown (Exercise) that
(i) $N(A) = R(A^T)^\perp$
(ii) $R(A)^\perp = N(A^T)$.
The dimension of $N(A)$ is called the nullity of $A$ and is denoted by null$(A)$.
1.3.2 Rank of a Matrix
Let $A$ be an $m \times n$ matrix. Then the subspace spanned by the row vectors of $A$ is called the row space of $A$. The subspace spanned by the columns of $A$ is called the column space of $A$. The rank of a matrix $A$ is the dimension of the column space of $A$. It is denoted by rank$(A)$. A square matrix $A \in \mathbb{R}^{n \times n}$ is called nonsingular if rank$(A) = n$. Otherwise it is singular.
An $m \times n$ matrix $A \in \mathbb{R}^{m \times n}$ is said to have full column rank if its columns are linearly independent. Full row rank is similarly defined. A matrix $A$ is said to have full rank if it has either full row rank or full column rank. If $A$ does not have full rank, it is rank deficient.
Example 1.3.4
\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix} \]
has full rank; rank$(A) = 2$ (it has full column rank); null$(A) = 0$.
Example 1.3.5
\[ A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \\ 0 & 0 \end{pmatrix} \]
is rank deficient; rank$(A) = 1$; null$(A) = 1$.
Some Rank Properties
Let $A$ be an $m \times n$ matrix. Then
1. rank$(A)$ = rank$(A^T)$.
2. rank$(A)$ + null$(A)$ = $n$.
3. rank$(AB) \ge$ rank$(A)$ + rank$(B) - n$, where $B$ is $n \times p$.
4. rank$(BA)$ = rank$(A)$ = rank$(AC)$, where $B$ and $C$ are nonsingular matrices of orders $m$ and $n$, respectively.
5. rank$(AB) \le \min\{$rank$(A)$, rank$(B)\}$.
1.3.3 The Inverse of a Matrix
Let $A$ be an $n \times n$ matrix. Then a matrix $B$ such that
\[ AB = BA = I, \]
where $I$ is the $n \times n$ identity matrix, is called the inverse of $A$. The inverse of $A$ is denoted by $A^{-1}$. The inverse is unique. An interesting property of the inverse of the product of two matrices is:
\[ (AB)^{-1} = B^{-1} A^{-1}. \]
Theorem 1.3.2 For an $n \times n$ matrix $A$, the following are equivalent:
1. $A$ is nonsingular.
2. $\det(A)$ is nonzero.
3. rank$(A)$ = rank$(A^T)$ = $n$.
4. $N(A) = \{0\}$.
5. $A^{-1}$ exists.
6. $A$ has linearly independent rows and columns.
7. The eigenvalues of $A$ are nonzero.
1.3.4 Similar Matrices
Two matrices $A$ and $B$ are called similar if there exists a nonsingular matrix $T$ such that
\[ T^{-1} A T = B. \]
An important property of similar matrices: Two similar matrices have the same eigenvalues. (For a proof, see Chapter 8, section 8.2.)
1.3.5 Orthogonality and Projections
A set of vectors $\{v_1, \ldots, v_m\}$ in $\mathbb{R}^n$ is orthogonal if
\[ v_i^T v_j = 0, \quad i \neq j. \]
If, in addition, $v_i^T v_i = 1$ for each $i$, then they are called orthonormal. A basis for a subspace that is also orthonormal is called an orthonormal basis for the subspace.
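Example 1.3.6
\[ A = \begin{pmatrix} 0 & 0 \\ 1 & 2 \\ 1 & 2 \end{pmatrix}. \]
The vector
\[ \begin{pmatrix} 0 \\ -\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{pmatrix} \]
forms an orthonormal basis for $R(A)$. (See section 5.6.1)
Example 1.3.7
\[ A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}. \]
The columns of the matrix
\[ V = \begin{pmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} \\ 0 & -\frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \end{pmatrix} \]
form an orthonormal basis for $R(A)$. (See section 5.6.1)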
Orthogonal Projection
Let $S$ be a subspace of $\mathbb{R}^n$. Then an $n \times n$ matrix $P$ having the properties:
(i) $R(P) = S$
(ii) $P^T = P$ ($P$ is symmetric)
(iii) $P^2 = P$ ($P$ is idempotent)
is called the orthogonal projection onto $S$ or simply the projection matrix. We denote the orthogonal projection $P$ onto $S$ by $P_S$. The orthogonal projection onto a subspace is unique. Let the columns of $V = (v_1, \ldots, v_k)$ form an orthonormal basis for a subspace $S$. Then
\[ P_S = V V^T \]
is the unique orthogonal projection onto $S$. Note that $V$ is not unique, but $P_S$ is.
A relationship between $P_S$ and $P_{S^\perp}$
If $P_S$ is the orthogonal projection onto $S$, then $I - P_S$, where $I$ is the identity matrix of the same order as $P_S$, is the orthogonal projection onto $S^\perp$. (Exercise 14(a))
The Orthogonal Projections onto $R(A)$ and $N(A^T)$
When the subspace $S$ is $R(A)$ or $N(A^T)$ associated with the matrix $A$, we will denote the unique orthogonal projections onto $R(A)$ and $N(A^T)$ by $P_A$ and $P_N$, respectively. It can be shown (exercise 14(b)) that if $A$ is $m \times n$ ($m \ge n$) and has full rank, then
\[ P_A = A (A^T A)^{-1} A^T, \qquad P_N = I - A (A^T A)^{-1} A^T. \]
Example 1.3.8
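\[ A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad A^T A = \begin{pmatrix} 2 & 2 \\ 2 & 5 \end{pmatrix}, \qquad (A^T A)^{-1} = \begin{pmatrix} \frac{5}{6} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{1}{3} \end{pmatrix}, \]
\[ P_A = \begin{pmatrix} \frac{5}{6} & \frac{1}{3} & \frac{1}{6} \\ \frac{1}{3} & \frac{1}{3} & -\frac{1}{3} \\ \frac{1}{6} & -\frac{1}{3} & \frac{5}{6} \end{pmatrix}. \]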
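The projection in this example can be verified in exact rational arithmetic. The sketch below assumes the matrix of the example has rows $(1, 2)$, $(0, 1)$, $(1, 0)$; the helper names are illustrative, and the explicit $2 \times 2$ inverse is used only because the example is tiny.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

A = [[Fraction(1), Fraction(2)],
     [Fraction(0), Fraction(1)],
     [Fraction(1), Fraction(0)]]

At = transpose(A)
G = matmul(At, A)                           # A^T A, a 2 x 2 matrix
d = G[0][0] * G[1][1] - G[0][1] * G[1][0]   # its determinant
G_inv = [[G[1][1] / d, -G[0][1] / d],
         [-G[1][0] / d, G[0][0] / d]]

P = matmul(matmul(A, G_inv), At)            # P_A = A (A^T A)^{-1} A^T

for row in P:
    print([str(entry) for entry in row])
print(P == transpose(P))    # symmetric
print(matmul(P, P) == P)    # idempotent
```

The two final checks are properties (ii) and (iii) of an orthogonal projection.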
1.3.6 Projection of a Vector onto the Range and the Null Space of a Matrix
Any vector $b$ can be written as
\[ b = b_S + b_{S^\perp}, \]
where $b_S \in S$ and $b_{S^\perp} \in S^\perp$. Let $S$ be the range $R(A)$ of a matrix $A$. Then $b_S \in R(A)$ and $b_{S^\perp} \in N(A^T)$. We will therefore denote $b_S$ by $b_R$ and $b_{S^\perp}$ by $b_N$, meaning that $b_R$ is in the range of $A$ and $b_N$ is in the null space of $A^T$. It can be shown (exercise 14(c)) that
\[ b_R = P_A b \quad \text{and} \quad b_N = P_N b. \]
$b_R$ and $b_N$ are called the orthogonal projection of $b$ onto $R(A)$ and the orthogonal projection of $b$ onto $N(A^T)$, respectively. From above, we easily see that
\[ b_R^T b_N = 0. \]
Example 1.3.9
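\[ A = \begin{pmatrix} 0 & 0 \\ 1 & 2 \\ 1 & 2 \end{pmatrix}, \qquad V = \text{an orthonormal basis} = \begin{pmatrix} 0 \\ -\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{pmatrix}, \]
\[ P_A = V V^T = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{1}{2} \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \]
\[ b_R = P_A b = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{1}{2} \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \]
\[ P_N = I - P_A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & -\frac{1}{2} \\ 0 & -\frac{1}{2} & \frac{1}{2} \end{pmatrix}, \qquad b_N = P_N b = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}. \]
Note that $b = b_R + b_N$.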
1.4 Some Special Matrices
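1. Diagonal Matrix: A square matrix $A = (a_{ij})$ is a diagonal matrix if $a_{ij} = 0$ for $i \neq j$. We write $A = \operatorname{diag}(a_{11}, \ldots, a_{nn})$.
2. Triangular Matrix: A square matrix $A = (a_{ij})$ is an upper triangular matrix if $a_{ij} = 0$ for $i > j$. The transpose of an upper triangular matrix is lower triangular; that is, $A = (a_{ij})$ is lower triangular if $a_{ij} = 0$ for $i < j$.
\[ \text{LOWER TRIANGULAR: } \begin{pmatrix} \times & & 0 \\ \vdots & \ddots & \\ \times & \cdots & \times \end{pmatrix}, \qquad \text{UPPER TRIANGULAR: } \begin{pmatrix} \times & \cdots & \times \\ & \ddots & \vdots \\ 0 & & \times \end{pmatrix}. \]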
Some Useful Properties of Triangular Matrices
The following properties of triangular matrices are useful.
1. The product of two upper (lower) triangular matrices is an upper (lower) triangular matrix. The diagonal entries of the product matrix are just the products of the diagonal entries of the individual matrices. (Exercise 19(a)).
2. The inverse of a nonsingular upper (lower) triangular matrix is an upper (lower) triangular matrix. The diagonal entries of the inverse are the reciprocals of the diagonal entries of the original matrix. (Exercise 19(b)).
3. The eigenvalues of a triangular matrix are its diagonal entries. (Exercise 19(d)).
4. The determinant of a triangular matrix is the product of its diagonal entries. (Exercise 19(c)).
Thus, a triangular matrix is nonsingular if and only if all of its diagonal entries are nonzero.
3. Unitary (Orthogonal) Matrix: A square complex matrix $U$ is unitary if
\[ U^* U = U U^* = I, \]
where $U^* = (\bar{U})^T$ and $\bar{U}$ is the complex conjugate of $U$. If $U$ is real, then $U$ is orthogonal if
\[ U^T U = U U^T = I. \]
Orthogonal matrices play a very important role in numerical matrix computations.
The following two important properties of orthogonal matrices make them so attractive for numerical computation:
1. The inverse of an orthogonal matrix $O$ is just its transpose: $O^{-1} = O^T$.
2. The product of two orthogonal matrices is an orthogonal matrix.
4. Permutation Matrix: A nonzero square matrix $P$ is called a permutation matrix if there is exactly one nonzero entry in each row and column, which is 1, and the rest are all zero. Thus, if $(\alpha_1, \ldots, \alpha_n)$ is a permutation of $(1, 2, \ldots, n)$, then
\[ P = \begin{pmatrix} e_{\alpha_1} \\ \vdots \\ e_{\alpha_n} \end{pmatrix}, \]
where $e_i$ is the $i$th row of the $n \times n$ identity matrix $I$, is a permutation matrix. Similarly,
\[ P = (e_{\alpha_1}, e_{\alpha_2}, \ldots, e_{\alpha_n}), \]
where $e_i$ is the $i$th column of $I$, is a permutation matrix.
Example 1.4.1
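\[ P_1 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \qquad P_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad P_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \]
are all permutation matrices.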
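Effects of Premultiplication and Postmultiplication by a permutation matrix.
If $P_1 = \begin{pmatrix} e_{\alpha_1} \\ \vdots \\ e_{\alpha_n} \end{pmatrix}$, where $e_i$ is the $i$th row of $I$, then
\[ P_1 A = \begin{pmatrix} \alpha_1\text{th row of } A \\ \alpha_2\text{th row of } A \\ \vdots \\ \alpha_n\text{th row of } A \end{pmatrix}. \]
Similarly, if $P_2 = (e_{\alpha_1}, e_{\alpha_2}, \ldots, e_{\alpha_n})$, where $e_i$ is the $i$th column of $I$, then
\[ A P_2 = (\alpha_1\text{th column of } A, \; \alpha_2\text{th column of } A, \; \ldots, \; \alpha_n\text{th column of } A). \]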
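Example 1.4.2
1.
\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \qquad P_1 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} e_2 \\ e_3 \\ e_1 \end{pmatrix}, \]
\[ P_1 A = \begin{pmatrix} a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{11} & a_{12} & a_{13} \end{pmatrix} = \begin{pmatrix} \text{2nd row of } A \\ \text{3rd row of } A \\ \text{1st row of } A \end{pmatrix}. \]
2.
\[ P_2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} = (e_3, e_1, e_2), \]
\[ A P_2 = \begin{pmatrix} a_{13} & a_{11} & a_{12} \\ a_{23} & a_{21} & a_{22} \\ a_{33} & a_{31} & a_{32} \end{pmatrix} = (\text{3rd column of } A, \; \text{1st column of } A, \; \text{2nd column of } A). \]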
An important property of a permutation matrix is that a permutation matrix is orthogonal. Thus: 1. The inverse of a permutation matrix P is its transpose and it is also a permutation matrix. 2. The product of two permutation matrices is a permutation matrix, and therefore is orthogonal.
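The row and column effects, and the orthogonality of a permutation matrix, can be confirmed with a short sketch. The helper `perm_rows` and the numeric test matrix (whose entry in position $(i, j)$ is the two-digit number $ij$) are illustrative choices, not part of the text.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def perm_rows(alpha):
    """Permutation matrix whose i-th row is row alpha[i] of I (1-based indices)."""
    n = len(alpha)
    return [[1 if j == a - 1 else 0 for j in range(n)] for a in alpha]

A = [[11, 12, 13],
     [21, 22, 23],
     [31, 32, 33]]

P1 = perm_rows((2, 3, 1))          # rows e_2, e_3, e_1; columns e_3, e_1, e_2
PA = matmul(P1, A)                 # rows of A reordered as 2nd, 3rd, 1st
AP = matmul(A, P1)                 # columns of A reordered as 3rd, 1st, 2nd
Pt = [list(c) for c in zip(*P1)]   # transpose of P1

print(PA)
print(AP)
print(matmul(P1, Pt))              # the identity matrix
```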
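5. Hessenberg Matrix (almost triangular): A square matrix $A$ is upper Hessenberg if $a_{ij} = 0$ for $i > j + 1$. The transpose of an upper Hessenberg matrix is a lower Hessenberg matrix; that is, a square matrix $A = (a_{ij})$ is a lower Hessenberg matrix if $a_{ij} = 0$ for $j > i + 1$. A square matrix $A$ that is both upper and lower Hessenberg is tridiagonal.
\[ \text{LOWER HESSENBERG: } \begin{pmatrix} \times & \times & & 0 \\ \vdots & & \ddots & \\ \vdots & & & \times \\ \times & \cdots & \cdots & \times \end{pmatrix}, \qquad \text{UPPER HESSENBERG: } \begin{pmatrix} \times & \cdots & \cdots & \times \\ \times & & & \vdots \\ & \ddots & & \vdots \\ 0 & & \times & \times \end{pmatrix}. \]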
An upper Hessenberg matrix $A = (a_{ij})$ is unreduced if
\[ a_{i,i-1} \neq 0 \quad \text{for } i = 2, 3, \ldots, n. \]
Similarly, a lower Hessenberg matrix $A = (a_{ij})$ is unreduced if
\[ a_{i,i+1} \neq 0 \quad \text{for } i = 1, 2, \ldots, n-1. \]
Example 1.4.3 01 2 01 B C A = B 2 3 4 C is an unreduced lower Hessenberg matrix. @ A
1 01 B A = B1 @ 0 1 1 1 2
1 1 1 C 1 C is an unreduced upper Hessenberg matrix. A 3
Some Useful Properties
1. Every square matrix $A$ can be transformed to an upper (or lower) Hessenberg matrix by means of a unitary similarity; that is, given a complex matrix $A$, there exists a unitary matrix $U$ such that $U A U^* = H$, where $H$ is a Hessenberg matrix.
Proof. (A constructive proof in the case where $A$ is real is given in Chapter 5.)
2. If $A$ is symmetric (or complex Hermitian), then the transformed Hessenberg matrix as obtained in 1 is tridiagonal.
3. An arbitrary Hessenberg matrix can always be partitioned into diagonal blocks such that each diagonal block is an unreduced Hessenberg matrix.
Example 1.4.4
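\[ A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix} = \begin{pmatrix} A_1 & A_2 \\ 0 & A_3 \end{pmatrix}, \]
\[ A_1 = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}, \qquad A_3 = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}. \]
Note that $A_1$ and $A_3$ are unreduced Hessenberg matrices.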
Companion Matrix: A normalized upper Hessenberg matrix of the form
\[ C = \begin{pmatrix} 0 & 0 & \cdots & 0 & a_1 \\ 1 & 0 & \cdots & 0 & a_2 \\ 0 & 1 & \cdots & 0 & a_3 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & a_n \end{pmatrix} \]
is called an upper companion matrix. The transpose of an upper companion matrix is a lower companion matrix. The characteristic polynomial of a companion matrix can be easily written down:
\[ \det(C - \lambda I) = \det(C^T - \lambda I) = (-1)^n (\lambda^n - a_n \lambda^{n-1} - a_{n-1} \lambda^{n-2} - \cdots - a_2 \lambda - a_1). \]
6. Nonderogatory Matrix: A matrix $A$ is nonderogatory if $A$ is similar to a companion matrix; that is, $A$ is nonderogatory if there exists a nonsingular $T$ such that $T A T^{-1}$ is a companion matrix. There are, of course, other equivalent characterizations of a nonderogatory matrix. For example, a matrix $A$ is nonderogatory if and only if there exists a vector $b$ such that
\[ \operatorname{rank}(b, Ab, \ldots, A^{n-1}b) = n. \]
The matrix $(b, Ab, \ldots, A^{n-1}b)$ is called the controllability matrix in control theory. If the rank condition is satisfied, then the pair $(A, b)$ is called controllable.
Remark: An unreduced Hessenberg matrix is nonderogatory, but the converse is not true.
Example 1.4.5
\[ A = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix} \]
is an upper Hessenberg matrix with $(2, 1)$ entry equal to zero, but $A$ is nonderogatory. Pick $b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$. Then
\[ (b, Ab) = \begin{pmatrix} 1 & 5 \\ 2 & 6 \end{pmatrix} \]
is nonsingular.
A matrix that is not nonderogatory is called derogatory. A derogatory matrix is similar to a direct sum of a number of companion matrices
\[ \begin{pmatrix} C_1 & & & 0 \\ & C_2 & & \\ & & \ddots & \\ 0 & & & C_k \end{pmatrix}, \]
where each $C_i$ is a companion matrix, $k > 1$, and the characteristic polynomial of each $C_i$ divides the characteristic polynomial of all the preceding $C_i$'s. The above form is also known as the Frobenius Canonical Form.
7. Diagonally Dominant Matrix: A matrix $A = (a_{ij})$ is row diagonally dominant if
\[ |a_{ii}| > \sum_{j \neq i} |a_{ij}| \quad \text{for all } i. \]
A column diagonally dominant matrix is similarly defined. The matrix
\[ A = \begin{pmatrix} 10 & 1 & 1 \\ 1 & 10 & 1 \\ 1 & 1 & 10 \end{pmatrix} \]
is both row and column diagonally dominant.
Note: Sometimes in the literature of linear algebra, a matrix $A$ having the above properties is called a strictly diagonally dominant matrix.
8. Positive Definite Matrix: A symmetric matrix $A$ is positive definite if for every nonzero vector $x$,
\[ x^T A x > 0. \]
Let $x = (x_1, x_2, \ldots, x_n)^T$. Then $x^T A x = \sum_{i,j=1}^{n} a_{ij} x_i x_j$ is called the quadratic form associated with $A$.
Example 1.4.6
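\[ A = \begin{pmatrix} 2 & 1 \\ 1 & 5 \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \]
\[ x^T A x = (x_1, x_2) \begin{pmatrix} 2 & 1 \\ 1 & 5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = (2x_1 + x_2, \; x_1 + 5x_2) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \]
\[ = 2x_1^2 + 2x_1 x_2 + 5x_2^2 = 2\left(x_1^2 + x_1 x_2 + \tfrac{1}{4} x_2^2\right) + \tfrac{9}{2} x_2^2 = 2\left(x_1 + \tfrac{1}{2} x_2\right)^2 + \tfrac{9}{2} x_2^2 > 0 \]
for every nonzero $x$.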
A positive semidefinite matrix is similarly defined. A symmetric matrix $A$ is positive semidefinite if $x^T A x \ge 0$ for every $x$. A commonly used notation for a symmetric positive definite (positive semidefinite) matrix is $A > 0$ ($\ge 0$).
Some Characterizations and Properties of Positive Definite Matrices
Here are some useful characterizations of positive definite matrices:
1. A symmetric matrix $A$ is positive definite if and only if all its eigenvalues are positive. Note that in the above example the eigenvalues are 1.6972 and 5.3028.
2. A symmetric matrix $A$ is positive definite if and only if all its leading principal minors are positive. There are $n$ leading principal minors of an $n \times n$ matrix $A$. The $i$th leading principal minor, denoted by
\[ \det A \left[ \begin{pmatrix} 1 & 2 & \cdots & i \\ 1 & 2 & \cdots & i \end{pmatrix} \right], \]
is the determinant of the submatrix of $A$ formed out of the first $i$ rows and $i$ columns.
Example:
\[ A = \begin{pmatrix} 10 & 1 & 1 \\ 1 & 10 & 1 \\ 1 & 1 & 10 \end{pmatrix}. \]
The first leading principal minor = 10.
The second leading principal minor = $\det \begin{pmatrix} 10 & 1 \\ 1 & 10 \end{pmatrix} = 99$.
The third leading principal minor = $\det A = 972$.
3. A symmetric strictly diagonally dominant matrix with positive diagonal entries is positive definite. Note that the matrix $A$, in the example above, is of this type.
4. If $A = (a_{ij})$ is positive definite, then $a_{ii} > 0$ for all $i$.
5. If $A = (a_{ij})$ is positive definite, then the largest element (in magnitude) of the whole matrix must lie on the diagonal.
6. The sum of two positive definite matrices is positive definite.
Remarks: Note that (4) and (5) are only necessary conditions for a symmetric matrix to be positive definite. They can serve only as initial tests for positive definiteness. For example, the matrices
\[ A = \begin{pmatrix} 4 & 1 & 1 & 1 \\ 1 & 0 & 1 & 2 \\ 1 & 1 & 2 & 3 \\ 1 & 2 & 3 & 4 \end{pmatrix}, \qquad B = \begin{pmatrix} 20 & 12 & 25 \\ 12 & 15 & 2 \\ 25 & 2 & 5 \end{pmatrix} \]
cannot be positive definite, since in the matrix $A$ there is a zero entry on the diagonal, and in $B$ the largest entry 25 is not on the diagonal.
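The leading principal minor test is easy to try on the matrices above. The sketch below is for small symmetric matrices with integer entries only; the helper names are hypothetical, and the determinant is computed by cofactor expansion, which would be far too slow for large matrices.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def leading_principal_minors(M):
    """Determinants of the top-left k x k submatrices, k = 1, ..., n."""
    return [det([row[:k] for row in M[:k]]) for k in range(1, len(M) + 1)]

def is_positive_definite(M):
    """Leading principal minor test for a symmetric matrix M."""
    return all(m > 0 for m in leading_principal_minors(M))

D = [[10, 1, 1], [1, 10, 1], [1, 1, 10]]
A = [[4, 1, 1, 1], [1, 0, 1, 2], [1, 1, 2, 3], [1, 2, 3, 4]]
B = [[20, 12, 25], [12, 15, 2], [25, 2, 5]]

print(leading_principal_minors(D))   # [10, 99, 972]
print(is_positive_definite(D), is_positive_definite(A), is_positive_definite(B))
```

The two matrices ruled out by the quick necessary conditions (4) and (5) also fail the full test, as expected.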
1.5 The Cayley-Hamilton Theorem
A square matrix $A$ satisfies its own characteristic equation; that is, if $A = (a_{ij})$ is an $n \times n$ matrix and $P_n(\lambda)$ is the characteristic polynomial of $A$, then
\[ P_n(A) \text{ is a ZERO matrix.} \]
Proof. (See Matrix Theory by Franklin, pp. 113-114.)
Example 1.5.1
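Let
\[ A = \begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix}. \]
Then
\[ P_2(\lambda) = \lambda^2 - 2\lambda - 1, \]
\[ P_2(A) = A^2 - 2A - I = \begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix} - 2 \begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix} - \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}. \]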
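The computation in this example can be repeated in a few lines, assuming the matrix has rows $(0, 1)$ and $(1, 2)$ so that $P_2(\lambda) = \lambda^2 - 2\lambda - 1$. The helper `matmul` is illustrative.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[0, 1],
     [1, 2]]
A2 = matmul(A, A)

# P_2(A) = A^2 - 2A - I, evaluated entry by entry.
P = [[A2[i][j] - 2 * A[i][j] - (1 if i == j else 0) for j in range(2)]
     for i in range(2)]

print(A2)  # [[1, 2], [2, 5]]
print(P)   # [[0, 0], [0, 0]]
```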
1.6 Singular Values
Let $A$ be $m \times n$. Then the eigenvalues of the $n \times n$ Hermitian matrix $A^* A$ are real and nonnegative. Let these eigenvalues be denoted by $\sigma_i^2$, where $\sigma_1^2 \ge \sigma_2^2 \ge \cdots \ge \sigma_n^2$. Then $\sigma_1, \sigma_2, \ldots, \sigma_n$ are called the singular values of $A$. Every $m \times n$ matrix $A$ can be decomposed into
\[ A = U \Sigma V^T \]
(with $V^T$ replaced by $V^*$ when $A$ is complex), where $U_{m \times m}$ and $V_{n \times n}$ are unitary and $\Sigma$ is an $m \times n$ "diagonal" matrix. This decomposition is called the Singular Value Decomposition or SVD. The singular values $\sigma_i$, $i = 1, \ldots, n$, are the diagonal entries of $\Sigma$. The number of nonzero singular values is equal to the rank of the matrix $A$. The singular values of a real matrix $A$ are the nonnegative square roots of the eigenvalues of $A^T A$ (see Chapter 10, section 10.3).
Example 1.6.1
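Let
\[ A = \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix}, \qquad A^T A = \begin{pmatrix} 1 & 2 \\ 2 & 8 \end{pmatrix}. \]
The eigenvalues of $A^T A$ are $\dfrac{9 \pm \sqrt{65}}{2}$, so that
\[ \sigma_1 = \sqrt{\frac{9 + \sqrt{65}}{2}}, \qquad \sigma_2 = \sqrt{\frac{9 - \sqrt{65}}{2}}. \]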
1.7 Vector and Matrix Norms
1.7.1 Vector Norms
Let
\[ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \]
be an $n$-vector and $V$ be a vector space. Then, a vector norm, denoted by the symbol $\|x\|$, is a real-valued continuous function of the components $x_1, x_2, \ldots, x_n$ of $x$, defined on $V$, that has the following properties:
1. $\|x\| > 0$ for every nonzero $x$; $\|x\| = 0$ if and only if $x$ is the zero vector.
2. $\|\alpha x\| = |\alpha| \|x\|$ for all $x$ in $V$ and for all scalars $\alpha$.
3. $\|x + y\| \le \|x\| + \|y\|$ for all $x$ and $y$ in $V$.
The property (3) is known as the Triangle Inequality.
Note:
\[ \|-x\| = \|x\|, \qquad \big| \|x\| - \|y\| \big| \le \|x - y\|. \]
It is simple to verify that the following are vector norms.
Some Easily Computed Vector Norms.
(a) $\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n|$ (sum norm or one norm)
(b) $\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$ (Euclidean norm or two norm)
(c) $\|x\|_\infty = \max_i |x_i|$ (infinity norm or maximum norm)
In general, if $p$ is a real number greater than or equal to 1, the $p$-norm or Hölder norm is defined by
\[ \|x\|_p = \left( |x_1|^p + \cdots + |x_n|^p \right)^{1/p}. \]
Example 1.7.1
Let $x = (1, 1, -2)$. Then
\[ \|x\|_1 = 4, \qquad \|x\|_2 = \sqrt{1^2 + 1^2 + (-2)^2} = \sqrt{6}, \qquad \|x\|_\infty = 2. \]
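The three norms of this example, and the general $p$-norm, take only a few lines to compute. The function names `norm_p` and `norm_inf` are illustrative.

```python
def norm_p(x, p):
    """Hölder p-norm of a vector, for a real number p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def norm_inf(x):
    """Infinity (maximum) norm of a vector."""
    return max(abs(t) for t in x)

x = [1, 1, -2]
print(norm_p(x, 1))   # 4.0
print(norm_p(x, 2))   # square root of 6
print(norm_inf(x))    # 2
```

As $p$ grows, `norm_p(x, p)` approaches `norm_inf(x)`, which is why the maximum norm carries the subscript $\infty$.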
An important property of the Hölder norm is the Hölder inequality
\[ |x^T y| \le \|x\|_p \|y\|_q, \quad \text{where } \frac{1}{p} + \frac{1}{q} = 1. \]
A special case of the Hölder inequality is the Cauchy-Schwarz inequality
\[ |x^T y| \le \|x\|_2 \|y\|_2, \]
that is,
\[ \left| \sum_{j=1}^{n} x_j y_j \right| \le \sqrt{\sum_{j=1}^{n} x_j^2} \; \sqrt{\sum_{j=1}^{n} y_j^2}. \]
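Equivalence Property of the Vector Norms
All vector norms are equivalent in the sense that, for any two vector norms $\|\cdot\|_\mu$ and $\|\cdot\|_\nu$, there exist positive constants $\alpha$ and $\beta$ such that
\[ \alpha \|x\|_\mu \le \|x\|_\nu \le \beta \|x\|_\mu \]
for all $x$. For the 2, 1, or $\infty$ norms, we can compute $\alpha$ and $\beta$ easily:
\[ \|x\|_2 \le \|x\|_1 \le \sqrt{n}\, \|x\|_2 \]
\[ \|x\|_\infty \le \|x\|_2 \le \sqrt{n}\, \|x\|_\infty \]
\[ \|x\|_\infty \le \|x\|_1 \le n\, \|x\|_\infty \]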
1.7.2 Matrix Norms
Let $A$ be an $m \times n$ matrix. Then, analogous to the vector norm, we define a matrix norm $\|A\|$ with the following properties:
1. $\|A\| > 0$ for every nonzero $A$; $\|A\| = 0$ if and only if $A$ is the zero matrix.
2. $\|\alpha A\| = |\alpha| \, \|A\|$ for any scalar $\alpha$.
3. $\|A + B\| \le \|A\| + \|B\|$.
4. $\|AB\| \le \|A\| \, \|B\|$ for all $A$ and $B$.
Subordinate Matrix Norms
Given a matrix $A$ and a vector norm $\|\cdot\|_p$, a nonnegative number defined by:
\[ \|A\|_p = \max_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p} \]
satisfies all the properties of a matrix norm. This norm is called the matrix norm subordinate to the vector norm. A very useful and frequently used property of a subordinate matrix norm (we shall sometimes call it the $p$-norm of a matrix $A$) is
\[ \|Ax\|_p \le \|A\|_p \, \|x\|_p. \]
This property easily follows from the definition of $p$-norms. Note:
\[ \|A\|_p \ge \frac{\|Ax\|_p}{\|x\|_p} \]
for any particular nonzero vector $x$. Multiplying both sides by $\|x\|_p$ gives the original inequality.
The two easily computable $p$-norms are:
\[ \|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{m} |a_{ij}| \quad \text{(maximum column-sum norm)} \]
\[ \|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^{n} |a_{ij}| \quad \text{(maximum row-sum norm)} \]
Example 1.7.2
\[ A = \begin{pmatrix} 1 & -2 \\ 3 & 4 \\ -5 & 6 \end{pmatrix}, \qquad \|A\|_1 = 12, \qquad \|A\|_\infty = 11. \]
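The column-sum and row-sum norms of Example 1.7.2 can be checked with a few lines of plain Python. This is a minimal sketch; the function names are illustrative, and the matrix is stored as a list of rows.

```python
def matrix_norm_1(a):
    """Maximum absolute column sum of a matrix stored as a list of rows."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))

def matrix_norm_inf(a):
    """Maximum absolute row sum of a matrix stored as a list of rows."""
    return max(sum(abs(v) for v in row) for row in a)

A = [[1, -2], [3, 4], [-5, 6]]
print(matrix_norm_1(A), matrix_norm_inf(A))
```

The column sums are 9 and 12 and the row sums are 3, 7, and 11, which gives the values 12 and 11 stated in the example.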
Another useful $p$-norm is the spectral norm:
Definition 1.7.1:
\[ \|A\|_2 = \sqrt{\text{maximum eigenvalue of } A^TA} \]
(Note that the eigenvalues of $A^TA$ are real and nonnegative.)
Example 1.7.3
\[ A = \begin{pmatrix} 2 & 5 \\ 1 & 3 \end{pmatrix}, \qquad A^TA = \begin{pmatrix} 5 & 13 \\ 13 & 34 \end{pmatrix}. \]
The eigenvalues of $A^TA$ are 0.0257 and 38.9743.
\[ \|A\|_2 = \sqrt{38.9743} = 6.2429 \]
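The value in Example 1.7.3 can be reproduced without an eigenvalue routine. The sketch below estimates $\|A\|_2$ by power iteration on $A^TA$, a technique that is not discussed in this section and is used here only for illustration. It assumes the matrix of the example is $A = \begin{pmatrix} 2 & 5 \\ 1 & 3 \end{pmatrix}$ (consistent with the stated $A^TA$) and that the starting vector of all ones is not orthogonal to the dominant eigenvector.

```python
import math

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def spectral_norm(a, iters=200):
    """Estimate ||A||_2 = sqrt(largest eigenvalue of A^T A) by power iteration."""
    at = transpose(a)
    x = [1.0] * len(a[0])                      # assumed generic starting vector
    lam = 0.0
    for _ in range(iters):
        y = matvec(at, matvec(a, x))           # y = (A^T A) x
        norm_y = math.sqrt(sum(v * v for v in y))
        if norm_y == 0.0:
            return 0.0                         # A is the zero matrix
        lam = norm_y / math.sqrt(sum(v * v for v in x))
        x = [v / norm_y for v in y]            # normalize for the next step
    return math.sqrt(lam)

A = [[2.0, 5.0], [1.0, 3.0]]                   # assumed matrix of Example 1.7.3
print(spectral_norm(A))
```

Because the two eigenvalues of $A^TA$ are so widely separated here, the iteration settles after only a few steps.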
The Frobenius Norm
An important matrix norm compatible with the vector norm $\|x\|_2$ is the Frobenius norm:
\[ \|A\|_F = \Bigl[ \sum_{j=1}^{n} \sum_{i=1}^{m} |a_{ij}|^2 \Bigr]^{1/2}. \]
A matrix norm $\|\cdot\|_M$ and a vector norm $\|\cdot\|_v$ are compatible if
\[ \|Ax\|_v \le \|A\|_M \, \|x\|_v. \]
Example 1.7.4
\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad \|A\|_F = \sqrt{30}. \]
Notes:
1. For the identity matrix $I$, $\|I\|_F = \sqrt{n}$, whereas $\|I\|_1 = \|I\|_2 = \|I\|_\infty = 1$.
2. $\|A\|_F^2 = \operatorname{trace}(A^TA)$, where $\operatorname{trace}(A)$ is defined as the sum of the diagonal entries of $A$; that is, if $A = (a_{ij})$, then $\operatorname{trace}(A) = a_{11} + a_{22} + \cdots + a_{nn}$.
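Equivalence Property of Matrix Norms
As in the case of vector norms, the matrix norms are also related. There exist scalars $\alpha$ and $\beta$ such that:
\[ \alpha \|A\|_\mu \le \|A\|_\nu \le \beta \|A\|_\mu. \]
In particular, the following inequalities relating various matrix norms are true and are used very frequently in practice.
Theorem 1.7.1 Let $A$ be $m \times n$. Then
(1) $\dfrac{1}{\sqrt{n}} \|A\|_\infty \le \|A\|_2 \le \sqrt{m}\, \|A\|_\infty$.
(2) $\|A\|_2 \le \|A\|_F \le \sqrt{n}\, \|A\|_2$.
(3) $\dfrac{1}{\sqrt{m}} \|A\|_1 \le \|A\|_2 \le \sqrt{n}\, \|A\|_1$.
(4) $\|A\|_2 \le \sqrt{\|A\|_1 \, \|A\|_\infty}$.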
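We prove here inequalities (1) and (2) and leave the rest as exercises.
Proof of (1)
By definition:
\[ \|A\|_\infty = \max_{x \ne 0} \frac{\|Ax\|_\infty}{\|x\|_\infty}. \]
Again, from the equivalence property of the vector norms, we have:
\[ \|Ax\|_\infty \le \|Ax\|_2 \]
and
\[ \|x\|_2 \le \sqrt{n}\, \|x\|_\infty. \]
From the second inequality we get
\[ \frac{1}{\|x\|_\infty} \le \frac{\sqrt{n}}{\|x\|_2}. \]
It therefore follows that
\[ \frac{\|Ax\|_\infty}{\|x\|_\infty} \le \sqrt{n}\, \frac{\|Ax\|_2}{\|x\|_2} \]
or
\[ \max_{x \ne 0} \frac{\|Ax\|_\infty}{\|x\|_\infty} \le \sqrt{n} \max_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_2} = \sqrt{n}\, \|A\|_2, \]
i.e.,
\[ \frac{1}{\sqrt{n}} \|A\|_\infty \le \|A\|_2. \]
The first part is proved. To prove the second part, we again use the definition of $\|A\|_2$ and the appropriate equivalence property of the vector norms:
\[ \|A\|_2 = \max_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_2}, \qquad \|Ax\|_2 \le \sqrt{m}\, \|Ax\|_\infty, \qquad \|x\|_\infty \le \|x\|_2. \]
Thus,
\[ \frac{\|Ax\|_2}{\|x\|_2} \le \sqrt{m}\, \frac{\|Ax\|_\infty}{\|x\|_\infty}. \]
So
\[ \max_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_2} \le \sqrt{m} \max_{x \ne 0} \frac{\|Ax\|_\infty}{\|x\|_\infty}, \]
or $\|A\|_2 \le \sqrt{m}\, \|A\|_\infty$. The proof of (1) is now complete.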
We prove (2) using a different technique. Recall that
\[ \|A\|_F^2 = \operatorname{trace}(A^TA). \]
Since $A^TA$ is symmetric, there exists an orthogonal matrix $O$ such that
\[ O^T (A^TA) O = D = \operatorname{diag}(d_1, \ldots, d_n). \]
(See Chapter 8.)
Now, the trace is invariant under similarity transformation (Exercise). We then have
\[ \operatorname{trace}(A^TA) = \operatorname{trace}(D) = d_1 + \cdots + d_n. \]
Let $d_k = \max_i (d_i)$. Then, since $d_1, \ldots, d_n$ are also the eigenvalues of $A^TA$, we have
\[ \|A\|_2^2 = d_k. \]
Thus, because every $d_i$ is nonnegative,
\[ \|A\|_F^2 = \operatorname{trace}(A^TA) = d_1 + \cdots + d_n \ge d_k = \|A\|_2^2. \]
To prove the other part, we note that
\[ \|A\|_F^2 = d_1 + \cdots + d_n \le d_k + d_k + \cdots + d_k = n d_k. \]
That is, $\|A\|_F^2 \le n d_k = n \|A\|_2^2$. So, $\|A\|_F \le \sqrt{n}\, \|A\|_2$.
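The four inequalities of Theorem 1.7.1 can be checked numerically on a small example. The sketch below uses the $3 \times 2$ matrix with rows $(1, -2)$, $(3, 4)$, $(-5, 6)$; since it has only two columns, $A^TA$ is $2 \times 2$ and its largest eigenvalue has a closed form, so no eigenvalue routine is needed. This is an illustrative check under those assumptions, not a proof.

```python
import math

A = [[1.0, -2.0], [3.0, 4.0], [-5.0, 6.0]]   # m = 3, n = 2
m, n = len(A), len(A[0])

norm_1 = max(sum(abs(row[j]) for row in A) for j in range(n))   # column sums
norm_inf = max(sum(abs(v) for v in row) for row in A)           # row sums
norm_f = math.sqrt(sum(v * v for row in A for v in row))        # Frobenius

# Entries of the 2x2 matrix G = A^T A and its largest eigenvalue.
g11 = sum(row[0] * row[0] for row in A)
g12 = sum(row[0] * row[1] for row in A)
g22 = sum(row[1] * row[1] for row in A)
lam_max = ((g11 + g22) + math.sqrt((g11 - g22) ** 2 + 4.0 * g12 * g12)) / 2.0
norm_2 = math.sqrt(lam_max)

checks = {
    "(1)": norm_inf / math.sqrt(n) <= norm_2 <= math.sqrt(m) * norm_inf,
    "(2)": norm_2 <= norm_f <= math.sqrt(n) * norm_2,
    "(3)": norm_1 / math.sqrt(m) <= norm_2 <= math.sqrt(n) * norm_1,
    "(4)": norm_2 <= math.sqrt(norm_1 * norm_inf),
}
print(norm_1, norm_inf, norm_f, norm_2)
print(checks)
```

The same data also illustrates the identity $\|A\|_F^2 = \operatorname{trace}(A^TA)$, since `g11 + g22` is the trace of $A^TA$.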
1.7.3 Convergence of a Matrix Sequence and Convergent Matrices
A sequence of vectors $v^{(1)}, v^{(2)}, \ldots$ is said to converge to the vector $v$ if
\[ \lim_{k \to \infty} v_i^{(k)} = v_i, \qquad i = 1, \ldots, n. \]
A sequence of matrices $A^{(1)}, A^{(2)}, \ldots$ is said to converge to the matrix $A = (a_{ij})$ if
\[ \lim_{k \to \infty} a_{ij}^{(k)} = a_{ij}, \qquad i, j = 1, 2, \ldots, n. \]
If the sequence $\{A^{(k)}\}$ converges to $A$, then we write
\[ \lim_{k \to \infty} A^{(k)} = A. \]
We now state, without proof, necessary and sufficient conditions for the convergence of vector and matrix sequences. The proofs can be easily worked out.
Theorem 1.7.2 The sequence $v^{(1)}, v^{(2)}, \ldots$ converges to $v$ if and only if for any vector norm
\[ \lim_{k \to \infty} \|v^{(k)} - v\| = 0. \]
A similar theorem holds for a matrix sequence.
Theorem 1.7.3 The sequence of matrices $A^{(1)}, A^{(2)}, \ldots$ converges to the matrix $A$ if and only if for every matrix norm
\[ \lim_{k \to \infty} \|A^{(k)} - A\| = 0. \]
We now state and prove a result on the convergence of the sequence of powers of a matrix to the zero matrix.
Theorem 1.7.4 The sequence $A, A^2, \ldots$ of the powers of the matrix $A$ converges to the zero matrix if and only if $|\lambda_i| < 1$ for each eigenvalue $\lambda_i$ of $A$.
Proof. For every $n \times n$ matrix $A$, there exists a nonsingular matrix $T$ such that
\[ T^{-1}AT = J = \begin{pmatrix} J_1 & & 0 \\ & \ddots & \\ 0 & & J_r \end{pmatrix} \]
where each $J_i$ has the form
\[ J_i = \begin{pmatrix} \lambda_i & 1 & & 0 \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_i \end{pmatrix}. \]
The above form is called the Jordan Canonical Form of $A$, and the diagonal block matrices are called Jordan matrices. It is an easy computation to see that
\[ J_i^k = \begin{pmatrix} \lambda_i^k & k\lambda_i^{k-1} & \binom{k}{2}\lambda_i^{k-2} & \cdots \\ 0 & \lambda_i^k & k\lambda_i^{k-1} & \ddots \\ \vdots & & \ddots & k\lambda_i^{k-1} \\ 0 & \cdots & 0 & \lambda_i^k \end{pmatrix} \]
from where we see that $J_i^k \to 0$ if and only if $|\lambda_i| < 1$. This means that
\[ \lim_{k \to \infty} A^k = 0 \quad \text{if and only if} \quad |\lambda_i| < 1 \text{ for each } i. \]
Definition 1.7.2 A matrix $A$ is called a convergent matrix if $A^k \to 0$ as $k \to \infty$.
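Theorem 1.7.4 says that convergence of the powers is decided by the eigenvalues, not by the size of the entries. The sketch below illustrates this with a hypothetical $2 \times 2$ Jordan-like matrix whose only eigenvalue is 0.5 but whose row-sum norm is 2.5: its powers still tend to zero, and the off-diagonal entry of $A^k$ follows the pattern $k \lambda^{k-1}$ (times the off-diagonal entry 2) seen in the formula for $J_i^k$.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def max_abs(a):
    return max(abs(v) for row in a for v in row)

A = [[0.5, 2.0], [0.0, 0.5]]     # eigenvalues 0.5, 0.5; row-sum norm 2.5

P = [[1.0, 0.0], [0.0, 1.0]]     # running power, starts at the identity
sizes = []
for k in range(1, 61):
    P = matmul(P, A)             # P now holds A^k
    sizes.append(max_abs(P))

print(sizes[0], sizes[9], sizes[59])
```

The largest entry does not shrink at first (it is 2 for both $k = 1$ and $k = 2$), but the factor $0.5^{k-1}$ eventually dominates the linear growth in $k$.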
We now prove a sufficient condition for a matrix $A$ to be a convergent matrix in terms of a norm of the matrix $A$. We first prove the following result.
A Relationship between Norms and Eigenvalues
Theorem 1.7.5 Let $\lambda$ be an eigenvalue of a matrix $A$. Then for any subordinate matrix norm,
\[ |\lambda| \le \|A\|. \]
Proof. By definition, there exists a nonzero vector $x$ such that
\[ Ax = \lambda x. \]
Taking the norm of each side, we have
\[ \|Ax\| = \|\lambda x\| = |\lambda| \, \|x\|. \]
However, $\|Ax\| \le \|A\| \, \|x\|$, so $|\lambda| \, \|x\| = \|Ax\| \le \|A\| \, \|x\|$, giving $|\lambda| \le \|A\|$.
Definition 1.7.3 The quantity $\rho(A)$ defined by
\[ \rho(A) = \max_i |\lambda_i| \]
is called the spectral radius of $A$.
As a particular case of Theorem 1.7.5, we have: $\rho(A) \le \|A\|$. In view of the above, we can now state:
Corollary 1.7.1 A matrix $A$ is convergent if $\|A\| < 1$, where $\|\cdot\|$ is a subordinate matrix norm.
Convergence of an Infinite Matrix Series
Theorem 1.7.6 The matrix series
\[ I + A + A^2 + \cdots \]
converges if and only if $A$ is a convergent matrix. When it converges, it converges to $(I - A)^{-1}$.
Proof. For the series to converge, $A^k$ must approach a zero matrix when $k$ approaches infinity. Thus, the necessity is obvious. Next, let $A$ be a convergent matrix; that is, $A^k \to 0$ as $k \to \infty$. Then from Theorem 1.7.4, we must have $|\lambda_i| < 1$ for each eigenvalue $\lambda_i$ of $A$. This means that the matrix $(I - A)$ is nonsingular: the eigenvalues of $I - A$ are $1 - \lambda_1, 1 - \lambda_2, \ldots, 1 - \lambda_n$, a matrix is nonsingular if and only if its eigenvalues are nonzero, and $|\lambda_i| < 1$ implies that none of them is zero. Thus from the identity
\[ (I - A)(I + A + A^2 + \cdots + A^k) = I - A^{k+1} \]
we have
\[ I + A + A^2 + \cdots + A^k = (I - A)^{-1} - (I - A)^{-1} A^{k+1}. \]
Since $A$ is a convergent matrix, $A^{k+1} \to 0$ as $k \to \infty$. Thus when $k \to \infty$,
\[ I + A + A^2 + \cdots + A^k \to (I - A)^{-1}. \]
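Theorem 1.7.6 can be observed numerically by forming partial sums of the series and multiplying by $I - A$. The sketch below uses a hypothetical $2 \times 2$ matrix with row-sum norm 0.75, so it is convergent; the helper names are illustrative.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def neumann_sum(a, terms):
    """Return the partial sum I + A + A^2 + ... + A^terms."""
    n = len(a)
    total = identity(n)
    power = identity(n)
    for _ in range(terms):
        power = matmul(power, a)     # next power of A
        total = add(total, power)
    return total

A = [[0.5, 0.25], [0.1, 0.3]]        # row sums 0.75 and 0.4, so ||A||_inf < 1
S = neumann_sum(A, 200)
I_minus_A = [[identity(2)[i][j] - A[i][j] for j in range(2)] for i in range(2)]
residual = matmul(I_minus_A, S)      # should be close to the identity
print(residual)
```

By the identity used in the proof, `residual` equals $I - A^{201}$, which is indistinguishable from $I$ in floating point arithmetic for this $A$.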
1.7.4 Norms and Inverses
While analyzing the errors in an algorithm, we sometimes need to know, given a nonsingular matrix A, how much it can be perturbed so that the perturbed matrix A + E is nonsingular, and how to estimate the error in the inverse of the perturbed matrix. We start with the identity matrix. In the following, k k is a matrix norm for which kI k = 1.
Theorem 1.7.7 Let $\|E\| < 1$. Then $(I - E)$ is nonsingular and
\[ \|(I - E)^{-1}\| \le (1 - \|E\|)^{-1}. \]
Proof. Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $E$. Then the eigenvalues of $I - E$ are $1 - \lambda_1, 1 - \lambda_2, \ldots, 1 - \lambda_n$. Since $\|E\| < 1$, $|\lambda_i| < 1$ for each $i$. Thus, none of the quantities $1 - \lambda_1, 1 - \lambda_2, \ldots, 1 - \lambda_n$ is zero. This proves that $I - E$ is nonsingular. (Note that a matrix $A$ is nonsingular if and only if all its eigenvalues are nonzero.)
To prove the second part, we write
\[ (I - E)^{-1} = I + E + E^2 + \cdots \]
Since $\|E\| < 1$,
\[ \lim_{k \to \infty} E^k = 0. \]
Thus, the series on the right side is convergent. Taking the norm on both sides, we have
\[ \|(I - E)^{-1}\| \le \|I\| + \|E\| + \|E\|^2 + \cdots = (1 - \|E\|)^{-1} \quad (\text{since } \|I\| = 1). \]
(Note that the infinite series $1 + x + x^2 + \cdots$ converges to $\dfrac{1}{1 - x}$ if and only if $|x| < 1$.)
Theorem 1.7.8 If $\|E\| < 1$, then
\[ \|(I - E)^{-1} - I\| \le \frac{\|E\|}{1 - \|E\|}. \]
Proof. For any two nonsingular matrices $A$ and $B$, we can write
\[ A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}. \]
In the above equation, substitute $B = I$ and $A = I - E$; then we have
\[ (I - E)^{-1} - I = (I - E)^{-1} E. \]
(Note that $I^{-1} = I$.) Taking the norm on both sides yields
\[ \|(I - E)^{-1} - I\| \le \|(I - E)^{-1}\| \, \|E\|. \]
Now from Theorem 1.7.7 we know
\[ \|(I - E)^{-1}\| \le (1 - \|E\|)^{-1}, \]
and thus we have the result.
Implication of the Result
If the matrix E is very small, then 1 ; kE k is close to unity. Thus the above result implies that if we invert a slightly perturbed identity matrix, then the error in the inverse of the perturbed matrix does not exceed the order of the perturbation.
Theorem 1.7.9 Let $A$ be nonsingular and let $\|A^{-1}E\| < 1$. Then $A - E$ is nonsingular, and
\[ \frac{\|A^{-1} - (A - E)^{-1}\|}{\|A^{-1}\|} \le \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|}. \]
Proof. We can write
\[ A - E = A(I - A^{-1}E). \]
Since $\|A^{-1}E\| < 1$, from Theorem 1.7.7, we have that $I - A^{-1}E$ is nonsingular. Thus, $A - E$, which is the product of the nonsingular matrices $A$ and $I - A^{-1}E$, is also nonsingular.
To prove the second part, we again recall the identity
\[ A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}. \]
Substituting $B = A - E$, we then have
\[ A^{-1} - (A - E)^{-1} = -A^{-1}E(A - E)^{-1}. \]
Taking the norm on each side yields
\[ \|A^{-1} - (A - E)^{-1}\| \le \|A^{-1}E\| \, \|(A - E)^{-1}\|. \]
Now, since
\[ B = A - (A - B) = A\,[I - A^{-1}(A - B)], \]
we have
\[ B^{-1} = [I - A^{-1}(A - B)]^{-1} A^{-1}. \]
(Note that $(XY)^{-1} = Y^{-1}X^{-1}$.) If we now substitute $B = A - E$, we then have
\[ (A - E)^{-1} = [I - A^{-1}E]^{-1} A^{-1}. \]
Taking norms, we get
\[ \|(A - E)^{-1}\| \le \|A^{-1}\| \, \|(I - A^{-1}E)^{-1}\|. \]
But from Theorem 1.7.7, we know that
\[ \|(I - A^{-1}E)^{-1}\| \le (1 - \|A^{-1}E\|)^{-1}. \]
So, we have
\[ \|(A - E)^{-1}\| \le \frac{\|A^{-1}\|}{1 - \|A^{-1}E\|}. \]
We, therefore, have
\[ \|A^{-1} - (A - E)^{-1}\| \le \frac{\|A^{-1}E\| \, \|A^{-1}\|}{1 - \|A^{-1}E\|} \]
or
\[ \frac{\|A^{-1} - (A - E)^{-1}\|}{\|A^{-1}\|} \le \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|}. \]
Implication of the Result
The above result states that if $\|A^{-1}E\|$ is small and is much less than unity, then the relative error in $(A - E)^{-1}$ is bounded by approximately $\|A^{-1}E\|$.
1.8 Norm Invariant Properties of Orthogonal Matrices
We conclude the chapter by listing some very useful norm properties of orthogonal matrices that are often used in practice.
Theorem 1.8.1 Let $O$ be an orthogonal matrix. Then
\[ \|O\|_2 = 1. \]
Proof. By definition,
\[ \|O\|_2 = \sqrt{\rho(O^TO)} = \sqrt{\rho(I)} = 1. \]
Theorem 1.8.2
\[ \|AO\|_2 = \|A\|_2 \]
Proof.
\[ \|AO\|_2 = \sqrt{\rho(O^TA^TAO)} = \sqrt{\rho(A^TA)} = \|A\|_2. \]
(Note that the spectral radius is invariant under similarity transformation.) (See Chapter 8.)
Theorem 1.8.3
\[ \|AO\|_F = \|A\|_F \]
Proof. $\|AO\|_F^2 = \operatorname{trace}(O^TA^TAO) = \operatorname{trace}(A^TA) = \|A\|_F^2$.
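The three theorems of this section can be spot-checked with a plane rotation, which is an orthogonal matrix. The sketch below is illustrative only: the angle and the matrix $A$ are arbitrary choices, and the spectral norm helper handles just matrices with two columns, for which the largest eigenvalue of $A^TA$ has a closed form.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def frobenius_norm(a):
    return math.sqrt(sum(v * v for row in a for v in row))

def spectral_norm_two_columns(a):
    """||A||_2 for a matrix with two columns, from the 2x2 matrix A^T A."""
    g11 = sum(row[0] * row[0] for row in a)
    g12 = sum(row[0] * row[1] for row in a)
    g22 = sum(row[1] * row[1] for row in a)
    lam_max = ((g11 + g22) + math.sqrt((g11 - g22) ** 2 + 4.0 * g12 * g12)) / 2.0
    return math.sqrt(lam_max)

theta = 0.7                                   # arbitrary rotation angle
c, s = math.cos(theta), math.sin(theta)
O = [[c, -s], [s, c]]                         # rotation: O^T O = I
A = [[1.0, 2.0], [3.0, 4.0]]
AO = matmul(A, O)

print(spectral_norm_two_columns(O))
print(spectral_norm_two_columns(A), spectral_norm_two_columns(AO))
print(frobenius_norm(A), frobenius_norm(AO))
```

Up to rounding, the first line prints 1 and each of the other two lines prints a pair of equal numbers.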
1.9 Review and Summary
The very basic concepts that will be required for smooth reading of the rest of the book have been briefly summarized in this chapter. The most important ones are:
1. Special Matrices: Diagonal, triangular, orthogonal, permutation, Hessenberg, tridiagonal, diagonally dominant, and positive definite matrices have been defined and important properties discussed.
2. Vector and Matrix Norms: Some important matrix norms are: row-sum norm, column-sum norm, Frobenius norm, and spectral norm. A result on the relationship between different matrix norms is stated and proved in Theorem 1.7.1. Of special importance is the norm property of orthogonal matrices. Three simple but important results have been stated and proved in Section 1.8 (Theorems 1.8.1, 1.8.2, and 1.8.3). These results say that (i) the spectral norm of an orthogonal matrix is 1, and (ii) the spectral and the Frobenius norms remain invariant under multiplication by an orthogonal matrix.
3. Convergence of a Matrix Sequence. The notion of the convergence of the sequence of matrix powers $\{A^k\}$ is important in the study of convergence of iterative methods for linear systems. The most important results in this context are:
(i) The sequence $\{A^k\}$ converges to the zero matrix if and only if $|\lambda_i| < 1$ for each eigenvalue $\lambda_i$ of $A$ (Theorem 1.7.4).
(ii) The sequence $\{A^k\}$ converges to the zero matrix if $\|A\| < 1$ for some subordinate matrix norm (Corollary to Theorem 1.7.5).
4. Norms and Inverses. If a nonsingular matrix $A$ is perturbed by a matrix $E$, it is sometimes of interest to know if the perturbed matrix $A + E$ remains nonsingular and how to estimate the error in the inverse of $A + E$. Three theorems (Theorems 1.7.7, 1.7.8, and 1.7.9) are proved in this context in Section 1.7.4. These results will play an important role in the perturbation analysis of linear systems (Chapter 6).
1.10 Suggestions for Further Reading
The material covered in this chapter can be found in any standard book on Linear Algebra and Matrix Theory. In particular, we suggest the following books for further reading:
1. Matrix Theory by Joel N. Franklin, Prentice Hall, Englewood Cliffs, NJ, 1968.
2. Linear Algebra With Applications by Steven J. Leon, Macmillan, New York, 1986.
3. Linear Algebra and its Applications (Second Edition) by Gilbert Strang, Academic Press, New York, 1980.
4. Introduction to Linear Algebra by Gilbert Strang, Wellesley-Cambridge Press, 1993.
5. The Algebraic Eigenvalue Problem by James H. Wilkinson, Clarendon Press, Oxford, 1965 (Chapter 1).
6. Matrix Analysis by Roger Horn and Charles Johnson, Cambridge University Press, 1985.
7. The Theory of Matrices by Peter Lancaster, Academic Press, New York, 1969.
8. The Theory of Matrices with Applications by Peter Lancaster and M. Tismenetsky, 2nd ed., Academic Press, Dover, New York, 1985.
9. The Theory of Matrices in Numerical Analysis by A. S. Householder, Dover Publications, Inc., New York, 1964.
10. Matrices and Linear Algebra by Hans Schneider and George Philip Barker, Dover Publications Inc., New York, 1989.
11. Linear Algebra and Its Applications by David C. Lay, Addison-Wesley, New York, 1994.
12. Elementary Linear Algebra with Applications by Richard O. Hill, Jr., Harcourt Brace Jovanovich, 1991.
Exercises on Chapter 1
PROBLEMS ON SECTIONS 1.2 AND 1.3
1. Prove that
(a) a set of $n$ linearly independent vectors in $R^n$ is a basis for $R^n$.
(b) the set $\{e_1, e_2, \ldots, e_n\}$ is a basis of $R^n$.
(c) a set of $m$ vectors in $R^n$, where $m > n$, is linearly dependent.
(d) any two bases in a vector space $V$ have the same number of vectors.
(e) $\dim(R^n) = n$.
(f) $\operatorname{span}\{v_1, \ldots, v_n\}$ is a subspace of $V$, where $\operatorname{span}\{v_1, \ldots, v_n\}$ is the set of linear combinations of the $n$ vectors $v_1, \ldots, v_n$ from a vector space $V$.
(g) $\operatorname{span}\{v_1, \ldots, v_n\}$ is the smallest subspace of $V$ containing $v_1, \ldots, v_n$.
2. Prove that if $S = \{s_1, \ldots, s_k\}$ is an orthogonal set of nonzero vectors, then $S$ is linearly independent.
3. Let $S$ be an $m$-dimensional subspace of $R^n$. Then prove that $S$ has an orthonormal basis. (Hint: Let $\{v_1, \ldots, v_m\}$ be a basis of $S$.) Define a set of vectors $\{u_k\}$ by:
\[ u_1 = \frac{v_1}{\|v_1\|_2}, \qquad u_{k+1} = \frac{v'_{k+1}}{\|v'_{k+1}\|_2}, \]
where
\[ v'_{k+1} = v_{k+1} - (v_{k+1}^T u_1)u_1 - (v_{k+1}^T u_2)u_2 - \cdots - (v_{k+1}^T u_k)u_k, \qquad k = 1, 2, \ldots, m - 1. \]
Then show that $\{u_1, \ldots, u_m\}$ is an orthonormal basis of $S$. This is the classical Gram-Schmidt process.
4. Using the Gram-Schmidt process construct an orthonormal basis of $R^3$.
5. Construct an orthonormal basis of $R(A)$, where
\[ A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \\ 4 & 5 \end{pmatrix}. \]
6. Let $S_1$ and $S_2$ be two subspaces of $R^n$. Then prove that
\[ \dim(S_1 + S_2) = \dim(S_1) + \dim(S_2) - \dim(S_1 \cap S_2). \]
7. Prove Theorem 1.3.1 on the properties of the determinant of a matrix.
8. Prove that
(a) $\operatorname{null}(A) = 0$ if and only if $A$ has linearly independent columns.
(b) $\operatorname{rank}(A) = \operatorname{rank}(A^T)$.
(c) $\operatorname{rank}(A) + \operatorname{null}(A) = n$.
(d) if $A$ is $m \times n$ and $m < n$, then $\operatorname{rank}(A) \le m$.
(e) if $A$ and $B$ are $m \times n$ and $n \times p$ matrices, then $\operatorname{rank}(AB) \le \min\{\operatorname{rank}(A), \operatorname{rank}(B)\}$.
(f) the rank of a matrix remains unchanged when the matrix is multiplied by an invertible matrix.
(g) if $B = UAV$, where $U$ and $V$ are invertible, then $\operatorname{rank}(B) = \operatorname{rank}(A)$.
(h) $N(A) = R(A^T)^\perp$ and $R(A)^\perp = N(A^T)$.
9. Let $A$ be $m \times n$. Then $A$ has rank 1 if and only if $A$ can be written as $A = ab^T$, where $a$ and $b$ are column vectors.
10. Prove the following basic facts on nonsingularity and the inverse of $A$:
(a) $(A^{-1})^{-1} = A$
(b) $(A^T)^{-1} = (A^{-1})^T$
(c) $(cA)^{-1} = \frac{1}{c} A^{-1}$, where $c$ is a nonzero scalar
(d) $(AB)^{-1} = B^{-1}A^{-1}$
11. Suppose a matrix $A$ can be written as
\[ A = LU \]
where $L$ is a lower triangular matrix with 1's along the diagonal and $U = (u_{ij})$ is an upper triangular matrix. Prove that
\[ \det A = \prod_{i=1}^{n} u_{ii}. \]
12. Let $A = \begin{pmatrix} A_1 & 0 \\ A_2 & A_3 \end{pmatrix}$, where $A_1$ and $A_3$ are square. Prove that $\det(A) = \det(A_1) \det(A_3)$.
13. Suppose that $A$ can be written as $A = LDL^T$, where $L$ is lower triangular with 1's as diagonal entries and $D = \operatorname{diag}(d_{11}, \ldots, d_{nn})$ is a diagonal matrix. Prove that the leading principal minors (determinants) of $A$ are $d_{11}$, $d_{11}d_{22}$, $\ldots$, $d_{11}d_{22} \cdots d_{nn}$.
14. (a) Show that if $P_S$ is an orthogonal projection onto $S$, then $I - P_S$ is the orthogonal projection onto $S^\perp$.
(b) Prove that
i. $P_A = A(A^TA)^{-1}A^T$.
ii. $P_N = I - A(A^TA)^{-1}A^T$.
iii. $\|P_A\|_2 = 1$.
(c) Prove that
i. $b_R = P_A b$
ii. $b_N = P_N b$
15. (a) Find $P_A$ and $P_N$ for the matrices
\[ A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \\ 0 & 0 \end{pmatrix}, \qquad A = \begin{pmatrix} 1 & 1 \\ 10^{-4} & 0 \\ 0 & 10^{-4} \end{pmatrix}. \]
(b) For the vector $b = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$, find $b_R$ and $b_N$ for each of the above matrices.
(c) Find an orthonormal basis for each of the above matrices using the Gram-Schmidt process and then find $P_A$, $P_N$, $b_R$, and $b_N$. For a description of the Gram-Schmidt algorithm, see Chapter 7 or Problem 3 of this chapter.
16. Let $A$ be an $m \times n$ matrix with rank $r$. Consider the Singular Value Decomposition of $A$:
\[ A = U \Sigma V^T = (U_r \;\; \hat{U}_r)\, \Sigma\, (V_r \;\; \hat{V}_r)^T. \]
Then prove that
(a) $V_r V_r^T$ is the orthogonal projection onto $N(A)^\perp = R(A^T)$.
(b) $U_r U_r^T$ is the orthogonal projection onto $R(A)$.
(c) $\hat{U}_r \hat{U}_r^T$ is the orthogonal projection onto $R(A)^\perp = N(A^T)$.
(d) $\hat{V}_r \hat{V}_r^T$ is the orthogonal projection onto $N(A)$.
17. (Distance between two subspaces). Let $S_1$ and $S_2$ be two subspaces of $R^n$ such that $\dim(S_1) = \dim(S_2)$. Let $P_1$ and $P_2$ be the orthogonal projections onto $S_1$ and $S_2$ respectively. Then $\|P_1 - P_2\|_2$ is defined to be the distance between $S_1$ and $S_2$. Prove that the distance between $S_1$ and $S_2$ is $\operatorname{dist}(S_1, S_2) = \sin(\theta)$, where $\theta$ is the angle between $S_1$ and $S_2$.
18. Prove that if $P_S$ is an orthogonal projection onto $S$, then $I - 2P_S$ is an orthogonal matrix.
PROBLEMS ON SECTIONS 1.4{1.6
19. Prove the following.
(a) The product of two upper (lower) triangular matrices is an upper (lower) triangular matrix. In general, if $A = (a_{ij})$ is an upper (lower) triangular matrix, then $p(A)$, where $p(x)$ is a polynomial, is an upper (lower) triangular matrix whose diagonal elements are $p(a_{ii})$, $i = 1, \ldots, n$.
(b) The inverse of a lower (upper) triangular matrix is another lower (upper) triangular matrix, whose diagonal entries are the reciprocals of the diagonal entries of the triangular matrix.
(c) The determinant of a triangular matrix is the product of its diagonal entries.
(d) The eigenvalues of a triangular matrix are its diagonal entries.
(e) If $A \in R^{n \times n}$ is strictly upper triangular, then $A^n = 0$. ($A = (a_{ij})$ is strictly upper triangular if $A$ is upper triangular and $a_{ii} = 0$ for each $i$.)
(f) The inverse of a nonsingular matrix $A$ can be written as a polynomial in $A$ (use the Cayley-Hamilton Theorem).
20. Prove that the product of an upper Hessenberg matrix and an upper triangular matrix is an upper Hessenberg matrix.
21. Prove that a symmetric Hessenberg matrix is symmetric tridiagonal.
22. A square matrix $A = (a_{ij})$ is a band matrix of bandwidth $2k + 1$ if $|i - j| > k$ implies that $a_{ij} = 0$. What are the bandwidths of tridiagonal and pentadiagonal matrices? Is the product of two banded matrices having the same bandwidth a banded matrix of the same bandwidth? Give reasons for your answer.
23. (a) Show that the matrix
\[ H = I - 2\,\frac{uu^T}{u^Tu}, \]
where $u$ is a column vector, is orthogonal. (The matrix $H$ is called a Householder matrix.)
(b) Show that the matrix
\[ J = \begin{pmatrix} c & s \\ -s & c \end{pmatrix}, \]
where $c^2 + s^2 = 1$, is orthogonal. (The matrix $J$ is called a Givens matrix.)
(c) Prove that the product of two orthogonal matrices is an orthogonal matrix.
(d) Prove that a triangular matrix that is orthogonal is diagonal.
24. Let $A$ and $B$ be two symmetric matrices.
(a) Prove that $(A + B)$ is symmetric.
(b) Prove that $AB$ is not necessarily symmetric. Derive a condition under which $AB$ is symmetric.
(c) If $A$ and $B$ are symmetric positive definite, prove that $(A + B)$ is positive definite. Is $AB$ also positive definite? Give reasons for your answer. When is $(A - B)$ symmetric positive definite?
25. Let $A = (a_{ij})$ be an $n \times n$ symmetric positive definite matrix. Prove the following.
(a) Each diagonal entry of $A$ must be positive.
(b) $A$ is nonsingular.
(c) $(a_{ij})^2 < a_{ii} a_{jj}$ for $i = 1, \ldots, n$, $j = 1, \ldots, n$, $i \ne j$.
(d) The largest element of the matrix must lie on the diagonal.
26. Let $A$ be a symmetric positive definite matrix and $x$ be a nonzero $n$-vector. Prove that $A + xx^T$ is positive definite.
27. Prove that a diagonally dominant matrix is nonsingular, and a diagonally dominant symmetric matrix with positive diagonal entries is positive definite.
28. Prove that if the eigenvalues of a matrix $A$ are all distinct, then $A$ is nonderogatory.
29. Prove that a symmetric matrix $A$ is positive definite if and only if $A^{-1}$ exists and is positive definite.
30. Let $A$ be an $m \times n$ matrix ($m \ge n$) having full rank. Then $A^TA$ is positive definite.
31. Prove the following basic facts on the eigenvalues and eigenvectors.
(a) A matrix $A$ is nonsingular if and only if $A$ does not have a zero eigenvalue. (Hint: $\det A = \lambda_1 \lambda_2 \cdots \lambda_n$.)
(b) The eigenvalues of $A^T$ and $A$ are the same.
(c) If two matrices have the same eigenvalues, they need not be similar (construct an example to show this).
(d) A symmetric matrix is positive definite if and only if all its eigenvalues are positive.
(e) The eigenvalues of a triangular matrix are its diagonal elements.
(f) The eigenvalues of a unitary (orthogonal) matrix have moduli 1.
(g) The eigenvectors of a symmetric matrix are orthogonal.
(h) Let $A$ be a symmetric matrix and let $Q$ be orthogonal such that $QAQ^T$ is diagonal. Then show that the columns of $Q$ are the eigenvectors of $A$.
32. Let
\[ C = \begin{pmatrix} c_{n-1} & c_{n-2} & \cdots & c_1 & c_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}. \]
Then show that
(a) the matrix $V$ defined by
\[ V = \begin{pmatrix} \lambda_1^{n-1} & \lambda_2^{n-1} & \cdots & \lambda_n^{n-1} \\ \lambda_1^{n-2} & \lambda_2^{n-2} & \cdots & \lambda_n^{n-2} \\ \vdots & \vdots & & \vdots \\ \lambda_1 & \lambda_2 & \cdots & \lambda_n \\ 1 & 1 & \cdots & 1 \end{pmatrix}, \]
where $\lambda_i \ne \lambda_j$ for $i \ne j$, is such that $V^{-1}CV = \operatorname{diag}(\lambda_i)$, $i = 1, \ldots, n$.
(b) The eigenvector $x_i$ corresponding to the eigenvalue $\lambda_i$ of $C$ is given by $x_i^T = (\lambda_i^{n-1}, \lambda_i^{n-2}, \ldots, \lambda_i, 1)$.
33. Let $H$ be an unreduced upper Hessenberg matrix. Let $X = (x_1, \ldots, x_n)$ be defined by
\[ x_1 = e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad x_{i+1} = Hx_i, \quad i = 1, 2, \ldots, n - 1. \]
Then prove that $X$ is nonsingular and $X^{-1}HX$ is a companion matrix (upper Hessenberg companion matrix).
34. What are the singular values of a symmetric matrix? What are the singular values of a symmetric positive definite matrix? Prove that a square matrix $A$ is nonsingular if and only if it has no zero singular value.
35. Prove that
(a) $\operatorname{trace}(AB) = \operatorname{trace}(BA)$.
(b) $\operatorname{trace}(AA^*) = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2$, where $A = (a_{ij})$ is $m \times n$.
(c) $\operatorname{trace}(A + B) = \operatorname{trace}(A) + \operatorname{trace}(B)$.
(d) $\operatorname{trace}(TAT^{-1}) = \operatorname{trace}(A)$.
PROBLEMS ON SECTIONS 1.7 AND 1.8
36. Show that $\|x\|_1$, $\|x\|_\infty$, $\|x\|_2$ (as defined in Section 1.7 of the book) are vector norms.
37. Show that, if $x$ and $y$ are two vectors, then
\[ \bigl| \|x\| - \|y\| \bigr| \le \|x - y\| \le \|x\| + \|y\|. \]
38. If $x$ and $y$ are two $n$-vectors, then prove that
(a) $|x^Ty| \le \|x\|_2 \, \|y\|_2$ (Cauchy-Schwarz inequality)
(b) $\|xy^T\|_2 \le \|x\|_2 \, \|y\|_2$ (Schwarz inequality)
39. Let $x$ and $y$ be two orthogonal vectors. Then prove that
\[ \|x + y\|_2^2 = \|x\|_2^2 + \|y\|_2^2. \]
40. Prove that for any vector $x$, we have
\[ \|x\|_\infty \le \|x\|_2 \le \|x\|_1. \]
41. Prove that $\|A\|_1$, $\|A\|_\infty$, $\|A\|_2$ are matrix norms.
42. Let $A = (a_{ij})$ be $m \times n$. Define $\|A\|_\ell = \max_{i,j} |a_{ij}|$. Is $\|A\|_\ell$ a matrix norm? Give reasons for your answer.
43. (a) Prove that the vector length is preserved by orthogonal matrix multiplication. That is, if $x \in R^n$ and $Q \in R^{n \times n}$ is orthogonal, then $\|Qx\|_2 = \|x\|_2$ (Isometry Lemma).
(b) Is the statement in part (a) true if the one and infinity norms are used? Give reasons. What if the Frobenius norm is used?
44. Prove that $\|I\|_2 = 1$. Prove that $\|I\|_F = \sqrt{n}$.
45. Prove that if $Q$ and $P$ are orthogonal matrices, then
(a) $\|QAP\|_F = \|A\|_F$
(b) $\|QAP\|_2 = \|A\|_2$
46. Prove that the spectral norm of a symmetric matrix is the same as its spectral radius.
47. Let $A \in R^{n \times n}$ and let $x$, $y$, and $z$ be $n$-vectors such that $Ax = b$ and $Ay = b + z$. Then prove that
\[ \frac{\|z\|_2}{\|A\|_2} \le \|x - y\|_2 \le \|A^{-1}\|_2 \, \|z\|_2 \]
(assuming that $A^{-1}$ exists).
48. Prove that $\|A\|_2 \le \|A\|_F$ (use the Cauchy-Schwarz inequality).
49. Prove that $\|A\|_2$ is just the largest singular value of $A$. How is $\|A^{-1}\|_2$ related to a singular value of $A$?
50. Prove that $\|A^TA\|_2 = \|A\|_2^2$.
51. Prove that
\[ \|AB\|_F \le \|A\|_F \, \|B\|_2, \qquad \|AB\|_F \le \|A\|_2 \, \|B\|_F. \]
52. Let $A = (a_1, \ldots, a_n)$, where $a_j$ is the $j$th column of $A$. Then prove that
\[ \|A\|_F^2 = \sum_{i=1}^{n} \|a_i\|_2^2. \]
53. Prove that if $A$ and $A + E$ are both nonsingular, then
\[ \|(A + E)^{-1} - A^{-1}\| \le \|E\| \, \|A^{-1}\| \, \|(A + E)^{-1}\| \]
(Banach Lemma). What is the implication of this result?
54. Let $A \in R^{m \times n}$ have full rank. Then prove that $A + E$ also has full rank if $E$ is such that
\[ \|E\|_2 < \frac{1}{\|A^\dagger\|_2}, \qquad \text{where } A^\dagger = (A^TA)^{-1}A^T. \]
55. Show that the matrices
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 4 & 5 & 6 \end{pmatrix}, \quad A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 5 & 4 \\ 0 & 0 & 1 \end{pmatrix}, \quad A = \begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{3} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \end{pmatrix}, \quad A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 2 & 3 \end{pmatrix} \]
are not convergent matrices.
56. Construct a simple example where the norm test for convergent matrices fails, but still the matrix is convergent.
57. Prove that the series $(I + A + A^2 + \cdots)$ converges if $\|B\| < 1$, where $B = PAP^{-1}$. What is the implication of this result? Construct a simple example to see the usefulness of the result in practical computations. (For details, see Wilkinson AEP, p. 60.)
2. FLOATING POINT NUMBERS AND ERRORS IN COMPUTATIONS
2.1 Floating Point Number Systems
2.2 Rounding Errors
2.3 Laws of Floating Point Arithmetic
2.4 Addition of n Floating Point Numbers
2.5 Multiplication of n Floating Point Numbers
2.6 Inner Product Computation
2.7 Error Bounds for Floating Point Matrix Computations
2.8 Roundoff Errors Due to Cancellation and Recursive Computations
2.9 Review and Summary
2.10 Suggestions for Further Reading
CHAPTER 2 FLOATING POINT NUMBERS AND ERRORS IN COMPUTATIONS
2.1 Floating Point Number Systems
Because of limited storage capacity, a real number may or may not be represented exactly on a computer. Thus, while using a computer, we have to deal with approximations of the real number system using finite computer representations. This chapter will be confined to the study of the arithmetic of such approximate numbers. In particular, we will examine the widely accepted IEEE standard for binary floating-point arithmetic (IEEE 1985). A nonzero normalized floating point number in base 2 has the form:
\[ x = (-1)^s \, d_1.d_2 d_3 \cdots d_t \times 2^e \]
or
\[ x = (-1)^s \, r \times 2^e \]
where $e$ is the exponent, $r$ is the significand, $d_2 d_3 \cdots d_t$ is called the fraction, $t$ is the precision, and $(-1)^s$ is the sign. (Note that $t$ is finite.) Here
\[ d_1 = 1, \qquad d_i = 0 \text{ or } 1, \quad 2 \le i \le t. \]
Three parameters specify all numerical values that can be represented. These are: the precision, $t$, and $L$ and $U$, the minimum and maximum exponents. The numbers $L$ and $U$ vary among computers, even those that adhere to the IEEE standard, since the standard recommends only minimums. As an example, the standard recommends, for single precision, that $t = 24$, $L = -126$, and $U = 127$. The recommendation for double precision is $t = 53$, $L = -1022$, and $U = 1023$. Consider the example for a 32-bit word:
| s (1 bit) | e (8 bits) | f (23 bits) |
Here $s$ is the sign of the number, $e$ is the field for the exponent, and $f$ is the fraction. Note that for normalized floating point numbers in base 2, it is known that $d_1 = 1$ and can thus be stored implicitly.
The actual storage of the exponent is accomplished by storing the true exponent plus an offset, or bias. The bias is chosen so that $e$ is always nonnegative. The IEEE standard also requires that the unbiased exponent have two unseen values of $L - 1$ and $U + 1$. $L - 1$ is used to encode 0 and denormalized numbers (i.e., those for which $d_1 \ne 1$). $U + 1$ is used to encode $\pm\infty$ and nonnumbers, such as $(+\infty) + (-\infty)$, which are denoted by NaN. Note that for the single precision example given above, the bias is 127. Thus, if the biased exponent is 255, then $\pm\infty$ or a NaN is inferred. Likewise, if the biased exponent is 0, then 0 or a denormalized number is inferred. The standard specifies how to determine the different cases for the special situations. It is not important here to go into such detail. Curious readers should consult the reference (IEEE 1985).
From the discussion above, one sees that the IEEE standard for single precision provides approximately 7 decimal digits of accuracy, since $2^{-23} \approx 1.2 \times 10^{-7}$. Similarly, double precision provides approximately 16 decimal digits of accuracy ($2^{-52} \approx 2.2 \times 10^{-16}$).
There is also an IEEE standard for floating point numbers which are not necessarily of base 2. By allowing one to choose the base, $\beta$, we see that the set of all floating point numbers, called the Floating Point Number System, is thus characterized by four parameters:
$\beta$: the number base.
$t$: the precision.
$L$, $U$: lower and upper limits of the exponent.
This set has exactly
\[ 2(\beta - 1)\beta^{t-1}(U - L + 1) + 1 \]
numbers in it. We denote the set of normalized floating point numbers of precision $t$ by $F_t$. The set $F_t$ is NOT closed under arithmetic operations; that is, the sum, difference, product, or quotient of two floating point numbers in $F_t$ is not necessarily a number in $F_t$. To see this, consider the simple example in the floating point system with $\beta = 10$, $t = 3$, $L = -1$, $U = 2$:
\[ a = 11.2 = .112 \times 10^2, \qquad b = 1.13 = .113 \times 10^1. \]
The product $c = a \times b = 12.656 = .12656 \times 10^2$ is not in $F_t$. The above example shows that during the course of a computation, a computed number may very well fall outside of $F_t$. There are, of course, two ways a computed number can fall outside the range of $F_t$: first, the exponent of the number may fall outside the interval $[L, U]$; second, the fractional part may contain more than $t$ digits (this is exactly what happened in the above example).
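The counting formula and the closure example can both be checked by brute-force enumeration. The sketch below builds the set of numbers $\pm .d_1 d_2 \cdots d_t \times \beta^e$ with $d_1 \ne 0$ and $L \le e \le U$, plus zero, using exact rational arithmetic; it assumes the fraction convention of the base-10 example (radix point before $d_1$), and the function name is illustrative.

```python
from fractions import Fraction

def floating_point_system(beta, t, L, U):
    """All numbers +/- .d1 d2 ... dt * beta^e with d1 != 0 and L <= e <= U, plus zero."""
    values = {Fraction(0)}
    for e in range(L, U + 1):
        # Integer mantissas with exactly t base-beta digits (leading digit nonzero).
        for mant in range(beta ** (t - 1), beta ** t):
            x = Fraction(mant, beta ** t) * Fraction(beta) ** e
            values.add(x)
            values.add(-x)
    return values

F = floating_point_system(beta=10, t=3, L=-1, U=2)
print(len(F))                         # size of the system

a, b = Fraction("11.2"), Fraction("1.13")
print(a in F, b in F, a * b in F)     # the product needs five digits
```

Exact fractions are used so that membership tests are not themselves disturbed by binary rounding.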
If the computations produce an exponent too large (too small) to fit in a given computer, then the situation is called overflow (underflow). Overflow is a serious problem; for most systems the result of an overflow is $\pm\infty$. Underflow is usually considerably less serious. The result of an underflow can be to set the value to zero, a denormalized number, or $\pm 2^L$.
Example 2.1.1 Overflow and Underflow
1. Let $\beta = 10$, $t = 3$, $L = -3$, $U = 3$.
\[ a = .111 \times 10^3, \qquad b = .120 \times 10^3, \qquad c = a \times b = .133 \times 10^5 \]
will result in an overflow, because the exponent 5 is too large.
2. Let $\beta = 10$, $t = 3$, $L = -2$, $U = 3$.
\[ a = .1 \times 10^{-1}, \qquad b = .2 \times 10^{-1}, \qquad c = ab = 2 \times 10^{-4} \]
will result in an underflow.
Simple mathematical computations such as finding a square root or exponent of a number, or computing factorials, can give overflow. For example, consider computing
\[ c = \sqrt{a^2 + b^2}. \]
If $a$ or $b$ is very large, then we will get overflow while computing $a^2 + b^2$. The IEEE standard also sets forth the results of operations with infinities and NaNs. All operations with infinities correspond to the limiting case in real analysis. Those ambiguous situations, such as $0 \times \infty$, result in NaNs, and all binary operations with one or two NaNs result in a NaN.
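These rules can be observed in IEEE double precision, which is what Python's `float` type uses on common platforms (an assumption about the platform, not something stated in the text). In the sketch below the true value of $\sqrt{a^2 + b^2}$ is representable, yet the intermediate squares overflow to infinity; the last lines show two ambiguous operations that produce NaN.

```python
import math

a, b = 3e200, 4e200            # true result 5e200 is well within range

squares = a * a + b * b        # each product overflows to +inf
naive = math.sqrt(squares)     # so the computed "length" is inf

inf = float("inf")
ambiguous_sum = inf - inf      # (+inf) + (-inf): NaN
ambiguous_product = 0.0 * inf  # 0 x inf: NaN
propagated = ambiguous_sum + 1.0   # any operation with a NaN gives a NaN

print(naive, ambiguous_sum, ambiguous_product, propagated)
```

Note that float multiplication in Python silently returns infinity on overflow, whereas the power operator `**` raises `OverflowError` for the same inputs.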
55
Computing the Length of a Vector
Overflow and underflow can sometimes be avoided just by organizing the computations differently. Consider, for example, the task of computing the length of an $n$-vector $x$ with components $x_1, \ldots, x_n$:
$$\|x\|_2^{2} = x_1^{2} + x_2^{2} + \cdots + x_n^{2}.$$
If some $x_i$ is too big or too small, then we can get overflow or underflow with the usual way of computing $\|x\|_2$. However, if we normalize each component of the vector by dividing it by $m = \max(|x_1|, \ldots, |x_n|)$ and then form the squares and the sum, then overflow problems can be avoided. Thus, a better way to compute $\|x\|_2$ will be:

1. $m = \max(|x_1|, \ldots, |x_n|)$
2. $y_i = x_i / m$, $i = 1, \ldots, n$
3. $\|x\|_2 = m \sqrt{y_1^{2} + y_2^{2} + \cdots + y_n^{2}}$
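The three steps can be tried out in a few lines of Python. This is only a sketch in IEEE double precision; the function names `naive_norm2` and `scaled_norm2` are made up for illustration, and the zero-vector guard is an added assumption not discussed in the text.

```python
import math

def naive_norm2(x):
    # Squares each component directly; x_i * x_i can overflow to infinity.
    return math.sqrt(sum(xi * xi for xi in x))

def scaled_norm2(x):
    # Step 1: the largest magnitude among the components.
    m = max(abs(xi) for xi in x)
    if m == 0.0:
        return 0.0  # assumption: return 0 for the zero vector to avoid 0/0
    # Step 2: scaled components y_i = x_i / m, so every |y_i| <= 1.
    y = [xi / m for xi in x]
    # Step 3: the squares are at most 1, so their sum cannot overflow.
    return m * math.sqrt(sum(yi * yi for yi in y))

x = [3e200, 4e200]
print(naive_norm2(x))   # overflows: the squares exceed the double range
print(scaled_norm2(x))  # close to 5e200
```

The scaled version costs one extra pass over the vector and $n$ divisions, which is usually a small price for avoiding overflow.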
2.2 Rounding Errors
If a computed result of a given real number is not machine representable, then there are two ways it can be represented in the machine. Consider
$$.d_1 d_2 \cdots d_t d_{t+1} \cdots \times \beta^{e}.$$
Then the first method, chopping, is the method in which the digits from $d_{t+1}$ on are simply chopped off. The second method is rounding, in which the digits $d_{t+1}$ through the rest are not only chopped off, but the digit $d_t$ is also rounded up or down depending on whether $d_{t+1} \geq \beta/2$ or $d_{t+1} < \beta/2$. Let $\mathrm{fl}(x)$ denote the floating point representation of a real number $x$.
Example 2.2.1 Rounding
Consider base 10. Let $x = 3.141596$. Then

$t = 2$: $\mathrm{fl}(x) = 3.1$
$t = 3$: $\mathrm{fl}(x) = 3.14$
$t = 4$: $\mathrm{fl}(x) = 3.142$
We now give an expression to measure the error made in representing a real number $x$ on the computer, and then show how this measure can be used to give bounds for errors in other floating point computations.

Definition 2.2.1 Let $\hat{x}$ denote an approximation of $x$. Then there are two ways we can measure the error:
$$\text{Absolute Error} = |\hat{x} - x|, \qquad \text{Relative Error} = \frac{|\hat{x} - x|}{|x|}, \quad x \neq 0.$$

Note that the relative error makes more sense than the absolute error. The following simple example shows this:
Example 2.2.2
Consider
$$x_1 = 1.31, \quad \hat{x}_1 = 1.30 \qquad \text{and} \qquad x_2 = 0.12, \quad \hat{x}_2 = 0.11.$$
The absolute errors in both cases are the same:
$$|\hat{x}_1 - x_1| = |\hat{x}_2 - x_2| = 0.01.$$
On the other hand, the relative error in the first case is
$$\frac{|\hat{x}_1 - x_1|}{|x_1|} = 0.0076335$$
and the relative error in the second case is
$$\frac{|\hat{x}_2 - x_2|}{|x_2|} = 0.0833333.$$
Thus, the relative errors show that $\hat{x}_1$ is closer to $x_1$ than $\hat{x}_2$ is to $x_2$, whereas the absolute errors give no indication of this at all.

The relative error gives an indication of the number of significant digits in an approximate answer. If the relative error is about $10^{-s}$, then $x$ and $\hat{x}$ agree to about $s$ significant digits. More specifically,
Definition 2.2.2 $\hat{x}$ is said to approximate $x$ to $s$ significant digits if $s$ is the largest nonnegative integer for which the relative error
$$\frac{|x - \hat{x}|}{|x|} < 5(10^{-s});$$
that is, $s$ is given by
$$s = \left\lfloor -\log \frac{|x - \hat{x}|}{|x|} + \frac{1}{2} \right\rfloor.$$

Thus, in the above examples, $x_1$ and $\hat{x}_1$ agree to two significant digits, while $x_2$ and $\hat{x}_2$ agree to about only one significant digit.

We now give an expression for the relative error in representing a real number $x$ by its floating point representation $\mathrm{fl}(x)$:
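As a quick check of Definitions 2.2.1 and 2.2.2 on the numbers of Example 2.2.2, here is a small Python sketch. It applies the inequality "relative error $< 5(10^{-s})$" directly rather than the logarithmic formula, and it uses exact rational arithmetic so that the check itself introduces no rounding. The function names are illustrative only.

```python
from fractions import Fraction

def errors(x, x_hat):
    """Return (absolute error, relative error) as exact fractions; x must be nonzero."""
    x, x_hat = Fraction(x), Fraction(x_hat)
    abs_err = abs(x_hat - x)
    return abs_err, abs_err / abs(x)

def significant_digits(x, x_hat):
    """Largest nonnegative integer s with relative error < 5 * 10**(-s)."""
    _, rel = errors(x, x_hat)
    if rel == 0 or rel >= 5:
        return None  # exact match, or not even s = 0 satisfies the inequality
    s = 0
    while rel < 5 * Fraction(1, 10 ** (s + 1)):
        s += 1
    return s

print(float(errors("1.31", "1.30")[1]), significant_digits("1.31", "1.30"))
print(float(errors("0.12", "0.11")[1]), significant_digits("0.12", "0.11"))
```

Passing the numbers as decimal strings keeps them exact; passing Python floats would first round them to binary.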
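Theorem 2.2.1 Let $\mathrm{fl}(x)$ denote the floating point representation of a real number $x$. Then
$$\frac{|\mathrm{fl}(x) - x|}{|x|} \leq \mu = \begin{cases} \frac{1}{2}\beta^{1-t} & \text{for rounding} \\ \beta^{1-t} & \text{for chopping.} \end{cases} \qquad (2.2.1)$$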
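Proof. We establish the bound for rounding and leave the other part as an exercise.

Let $x$ be written as
$$x = (.d_1 d_2 \cdots d_t d_{t+1} \cdots) \times \beta^{e},$$
where $d_1 \neq 0$ and $0 \leq d_i < \beta$. When we round off $x$ we obtain one of the following floating point numbers:
$$x' = (.d_1 d_2 \cdots d_t) \times \beta^{e}, \qquad x'' = \left[(.d_1 d_2 \cdots d_t) + \beta^{-t}\right] \times \beta^{e}.$$
Obviously we have $x \in (x', x'')$. Assume, without any loss of generality, that $x$ is closer to $x'$. We then have
$$|x - x'| \leq \frac{1}{2}|x' - x''| = \frac{1}{2}\beta^{e-t}.$$
Thus, the relative error satisfies
$$\frac{|x - x'|}{|x|} \leq \frac{1}{2} \cdot \frac{\beta^{-t}}{.d_1 d_2 \cdots d_t} \leq \frac{1}{2} \cdot \frac{\beta^{-t}}{1/\beta} = \frac{1}{2}\beta^{1-t},$$
where the second inequality holds because $d_1 \neq 0$ implies $.d_1 d_2 \cdots d_t \geq 1/\beta$.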
Example 2.2.3
Consider the three digit representation of the decimal number $x = 0.2346$ ($\beta = 10$, $t = 3$). Then, if rounding is used, we have:
$$\mathrm{fl}(x) = 0.235, \qquad \text{Relative Error} = 0.001705 < \frac{1}{2} \times 10^{-2}.$$
Similarly, if chopping is used, we have:
$$\mathrm{fl}(x) = 0.234, \qquad \text{Relative Error} = 0.0025575 < 10^{-2}.$$
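The numbers in Examples 2.2.1 and 2.2.3 can be reproduced with Python's standard `decimal` module, which supports base 10 arithmetic with a chosen number of digits. In this sketch the helper names `fl` and `relative_error` are illustrative, and `ROUND_HALF_UP` and `ROUND_DOWN` are assumed to be adequate models of the rounding and chopping described in the text.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_UP

def fl(x, t, mode="rounding"):
    """t-digit base-10 representation of x, by rounding or by chopping."""
    rounding = ROUND_HALF_UP if mode == "rounding" else ROUND_DOWN
    return Context(prec=t, rounding=rounding).create_decimal(x)

def relative_error(x, t, mode="rounding"):
    x = Decimal(x)
    return abs(fl(x, t, mode) - x) / abs(x)

for t in (2, 3, 4):
    print(t, fl("3.141596", t))                      # Example 2.2.1

print(fl("0.2346", 3), relative_error("0.2346", 3))  # rounding
print(fl("0.2346", 3, "chopping"),
      relative_error("0.2346", 3, "chopping"))       # chopping
```

Both relative errors stay below the bounds of Theorem 2.2.1, namely $\frac{1}{2} \times 10^{-2}$ for rounding and $10^{-2}$ for chopping.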
Definition: The number $\mu$ in (2.2.1) is called the machine precision or unit roundoff error. It is the smallest positive floating point number such that
$$\mathrm{fl}(1 + \mu) > 1.$$
$\mu$ is usually between $10^{-16}$ and $10^{-6}$ (on most machines), for double and single precision, respectively. For the IBM 360 and 370, $\beta = 16$, $t = 6$, $\mu = 4.77 \times 10^{-7}$.

The machine precision is very important in scientific computations. If the particulars $\beta$, $t$, $L$ and $U$ for a computer are not known, the following simple FORTRAN program can be run to estimate $\mu$ for that computer (Forsythe, Malcolm and Moler (1977), p. 14):

      REAL MEU, MEU1
      MEU = 1.0
   10 MEU = 0.5 * MEU
      MEU1 = MEU + 1.0
      IF (MEU1 .GT. 1.0) GOTO 10

The above FORTRAN program computes an approximation of $\mu$ which differs from $\mu$ by at most a factor of 2. This approximation is quite acceptable, since an exact value of $\mu$ is not that important and is seldom needed. The book by Forsythe, Malcolm and Moler (CMMC 1977) also contains an extensive list of $L$ and $U$ for various computers.
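For readers without a FORTRAN compiler, the same halving loop can be written in Python. This is a sketch that assumes Python floats are IEEE double precision numbers with round-to-nearest, which is the case on essentially all current platforms; `estimate_mu` is an illustrative name.

```python
import sys

def estimate_mu():
    """Halve mu until 1 + mu is no longer distinguishable from 1."""
    mu = 1.0
    while True:
        mu = 0.5 * mu
        if not (1.0 + mu > 1.0):
            return mu

mu = estimate_mu()
print(mu)                      # the estimate produced by the loop
print(sys.float_info.epsilon)  # spacing between 1.0 and the next larger float
```

Under the stated assumption the loop stops at $2^{-53}$, which is half of `sys.float_info.epsilon`. This agrees with the remark that the estimate is within a factor of 2 of the quantity sought.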
2.3 Laws of Floating Point Arithmetic
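The formula
$$\frac{|\mathrm{fl}(x) - x|}{|x|} \leq \mu = \begin{cases} \beta^{1-t} & \text{for chopping} \\ \frac{1}{2}\beta^{1-t} & \text{for rounding} \end{cases}$$
can be written as
$$\mathrm{fl}(x) = x(1 + \delta), \qquad \text{where } |\delta| \leq \mu.$$
Assuming that the IEEE standard holds, we can easily derive the following simple laws of floating point arithmetic.

Theorem 2.3.1 Let $x$ and $y$ be two floating point numbers, and let $\mathrm{fl}(x + y)$, $\mathrm{fl}(x - y)$, $\mathrm{fl}(xy)$, and $\mathrm{fl}(x/y)$ denote the computed sum, difference, product and quotient. Then

1. $\mathrm{fl}(x \pm y) = (x \pm y)(1 + \delta)$, where $|\delta| \leq \mu$.
2. $\mathrm{fl}(xy) = (xy)(1 + \delta)$, where $|\delta| \leq \mu$.
3. If $y \neq 0$, then $\mathrm{fl}(x/y) = (x/y)(1 + \delta)$, where $|\delta| \leq \mu$.

On computers that do not use the IEEE standard, the following floating point law of addition might hold:

4. $\mathrm{fl}(x + y) = x(1 + \delta_1) + y(1 + \delta_2)$, where $|\delta_1| \leq \mu$ and $|\delta_2| \leq \mu$.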
Example 2.3.1 Simple Floating Point Operations with Rounding
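Let $\beta = 10$, $t = 3$ in examples 1 through 3.

1. $x = .999 \times 10^{2}$, $y = .111 \times 10^{0}$.
$$x + y = 100.0110 = .100011 \times 10^{3}, \qquad \mathrm{fl}(x + y) = .100 \times 10^{3}.$$
Thus, $\mathrm{fl}(x + y) = (x + y)(1 + \delta)$, where
$$\delta = -1.0999 \times 10^{-4}, \qquad |\delta| < \frac{1}{2}(10^{-2}).$$

2. $x = .999 \times 10^{2}$, $y = .111 \times 10^{0}$.
$$xy = 11.0889, \qquad \mathrm{fl}(xy) = .111 \times 10^{2}.$$
Thus, $\mathrm{fl}(xy) = xy(1 + \delta)$, where
$$\delta = 1.00100 \times 10^{-3}, \qquad |\delta| \leq \frac{1}{2}(10^{1-3}).$$

3. $x = .999 \times 10^{2}$, $y = .111 \times 10^{0}$.
$$\frac{x}{y} = 900, \qquad \mathrm{fl}\left(\frac{x}{y}\right) = .900 \times 10^{3}, \qquad \delta = 0.$$

4. Let $\beta = 10$, $t = 4$.
$$x = 0.1112, \qquad y = .2245 \times 10^{5}, \qquad xy = .249644 \times 10^{4}, \qquad \mathrm{fl}(xy) = .2496 \times 10^{4}.$$
Thus, $|\mathrm{fl}(xy) - xy| = .44$ and
$$|\delta| = 1.7625 \times 10^{-4} < \frac{1}{2} \times 10^{-3}.$$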
Computing Without a Guard Digit

Theorem 2.3.1 and the examples following this theorem show that the relative errors in computing the sum, difference, product and quotient in floating point arithmetic are small. However, there are computers without guard digits, such as the Cybers and the current CRAYs (the CRAY arithmetic is changing), in which additions and subtractions may not be accurate. We describe this aspect in some detail below.

A guard digit is an extra digit on the lower end of the arithmetic register whose purpose is to catch the low order digit which would otherwise be pushed out of existence when the decimal points are aligned. The following examples show the difference between the two models.

Examples of floating point additions

Let $\beta = 10$, $t = 3$, $\mu = 0.001$.
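Example 2.3.2 Addition with a Guard Digit

$$x = 0.101 \times 10^{2}, \qquad y = -0.994 \times 10^{1}$$

Step 1. Align the two numbers (the fourth digit is the guard digit):
$$x = 0.101\,0 \times 10^{2}, \qquad y = -0.099\,4 \times 10^{2}$$

Step 2. Add (with an extra digit):
$$0.1010 \times 10^{2} - 0.0994 \times 10^{2} = 0.0016 \times 10^{2}$$

Step 3. Normalize:
$$\mathrm{fl}(x + y) = 0.160 \times 10^{0}$$

Result: $\mathrm{fl}(x + y) = (x + y)(1 + \delta)$ with $\delta = 0$.

Example 2.3.3 Addition without a Guard Digit

$$x = 0.101 \times 10^{2}, \qquad y = -0.994 \times 10^{1}$$

Step 1. Align the two numbers:
$$x = 0.101 \times 10^{2}, \qquad y = -0.099[4] \times 10^{2}$$
The low order digit [4] is pushed out!

Step 2. Add:
$$0.101 \times 10^{2} - 0.099 \times 10^{2} = 0.002 \times 10^{2}$$

Step 3. Normalize:
$$\mathrm{fl}(x + y) = 0.200 \times 10^{0}$$

Result: $\mathrm{fl}(x + y) = (x + y)(1 + \delta)$ with $\delta = 0.25 = 250\mu$.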
Thus, we repeat that for computers with a guard digit,
$$\mathrm{fl}(x \pm y) = (x \pm y)(1 + \delta), \qquad |\delta| \leq \mu.$$
However, for those without a guard digit,
$$\mathrm{fl}(x \pm y) = x(1 + \delta_1) \pm y(1 + \delta_2), \qquad |\delta_1| \leq \mu, \quad |\delta_2| \leq \mu.$$

A FINAL REMARK: Throughout this book, we will assume that the computations have been performed with a guard digit, as they are on almost all available machines.

We shall call results 1 through 3 of Theorem 2.3.1 along with (2.2.1) the fundamental laws of floating point arithmetic. These fundamental laws form the basis for establishing bounds for floating point computations.

For example, consider the floating point computation of $x(y + z)$:
$$\mathrm{fl}(x(y + z)) = [x \cdot \mathrm{fl}(y + z)](1 + \delta_1) = x(y + z)(1 + \delta_2)(1 + \delta_1) = x(y + z)(1 + \delta_1\delta_2 + \delta_1 + \delta_2) \approx x(y + z)(1 + \delta_3),$$
where $\delta_3 = \delta_1 + \delta_2$; since $\delta_1$ and $\delta_2$ are small, their product is neglected.

We can now easily establish the bound of $\delta_3$. Suppose $\beta = 10$, and that rounding is used. Then
$$|\delta_3| = |\delta_1 + \delta_2| \leq |\delta_1| + |\delta_2| \leq \frac{1}{2} \times 10^{1-t} + \frac{1}{2} \times 10^{1-t} = 10^{1-t}.$$
Thus, the relative error due to roundoff in computing $\mathrm{fl}(x(y + z))$ is about $10^{1-t}$ in the worst case.
2.4 Addition of n Floating Point Numbers
Consider adding $n$ floating point numbers $x_1, x_2, \ldots, x_n$ with rounding. Define $s_2 = \mathrm{fl}(x_1 + x_2)$. Then
$$s_2 = \mathrm{fl}(x_1 + x_2) = (x_1 + x_2)(1 + \delta_2),$$
where $|\delta_2| \leq \frac{1}{2}\beta^{1-t}$. That is, $s_2 - (x_1 + x_2) = \delta_2(x_1 + x_2)$. Define $s_3, s_4, \ldots, s_n$ recursively by
$$s_{i+1} = \mathrm{fl}(s_i + x_{i+1}), \qquad i = 2, 3, \ldots, n - 1.$$
Then $s_3 = \mathrm{fl}(s_2 + x_3) = (s_2 + x_3)(1 + \delta_3)$. That is,
$$s_3 - (x_1 + x_2 + x_3) = (x_1 + x_2)\delta_2 + (x_1 + x_2)(1 + \delta_2)\delta_3 + x_3\delta_3 \approx (x_1 + x_2)\delta_2 + (x_1 + x_2 + x_3)\delta_3$$
(neglecting the term $\delta_2\delta_3$, which is small), and so on. Thus, by induction we can show that
$$s_n - (x_1 + x_2 + \cdots + x_n) \approx (x_1 + x_2)\delta_2 + (x_1 + x_2 + x_3)\delta_3 + \cdots + (x_1 + x_2 + \cdots + x_n)\delta_n$$
(again neglecting the terms $\delta_i\delta_j$, which are small). The above can be written as
$$s_n - (x_1 + x_2 + \cdots + x_n) \approx x_1(\delta_2 + \delta_3 + \cdots + \delta_n) + x_2(\delta_2 + \cdots + \delta_n) + x_3(\delta_3 + \cdots + \delta_n) + \cdots + x_n\delta_n,$$
where each $|\delta_i| \leq \frac{1}{2}\beta^{1-t} = \mu$. Defining $\delta_1 = 0$, we can write:

Theorem 2.4.1 Let $x_1, x_2, \ldots, x_n$ be $n$ floating point numbers. Then
$$\mathrm{fl}(x_1 + x_2 + \cdots + x_n) - (x_1 + x_2 + \cdots + x_n) \approx x_1(\delta_1 + \delta_2 + \cdots + \delta_n) + x_2(\delta_2 + \cdots + \delta_n) + \cdots + x_n\delta_n,$$
where each $|\delta_i| \leq \mu$, $i = 1, 2, \ldots, n$.

Remark: From the above formula we see that we should expect smaller error in general when adding $n$ floating point numbers in ascending order of magnitude:
$$|x_1| \leq |x_2| \leq |x_3| \leq \cdots \leq |x_n|.$$
If the numbers are arranged in ascending order of magnitude, then the larger errors will be associated with the smaller numbers.
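The effect of the ordering is easy to observe in IEEE double precision. The following Python sketch sums one large number and many tiny ones in both orders and compares each result with `math.fsum`, which is used here as a reference for the correctly rounded sum. The data set is a contrived assumption chosen to make the effect visible.

```python
import math

def running_sum(values):
    """Left-to-right floating point summation, s_{i+1} = fl(s_i + x_{i+1})."""
    s = 0.0
    for v in values:
        s += v
    return s

# One large term and many small ones, each below half the spacing at 1.0.
values = [1.0] + [1e-16] * 100_000

exact = math.fsum(values)
ascending = running_sum(sorted(values, key=abs))
descending = running_sum(sorted(values, key=abs, reverse=True))

print(abs(ascending - exact))   # small: the tiny terms accumulate first
print(abs(descending - exact))  # larger: every tiny term is lost against 1.0
```

In descending order each addition of $10^{-16}$ to $1.0$ is rounded away, so the small terms never contribute; in ascending order they are summed among themselves before meeting the large term.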
2.5 Multiplication of n Floating Point Numbers
Proceeding as in the case of addition of $n$ floating point numbers in the last section, it can be shown that

Theorem 2.5.1
$$\mathrm{fl}(x_1 x_2 \cdots x_n) = (1 + \epsilon)\prod_{i=1}^{n} x_i,$$
where $1 + \epsilon = (1 + \delta_2)(1 + \delta_3) \cdots (1 + \delta_n)$, so that $|\epsilon| = |(1 + \delta_2)(1 + \delta_3) \cdots (1 + \delta_n) - 1|$, and $|\delta_i| \leq \mu$, $i = 1, 2, \ldots, n$.

A bound for $\epsilon$

Assuming that $(n - 1)\mu < .01$, we will prove that $|\epsilon| < 1.06(n - 1)\mu$. (This assumption is quite realistic; on most machines this assumption will hold for fairly large values of $n$.) Since $|\delta_i| \leq \mu$, we have
$$|\epsilon| \leq (1 + \mu)^{n-1} - 1.$$
Again, since
$$\ln(1 + \mu)^{n-1} = (n - 1)\ln(1 + \mu) < (n - 1)\mu,$$
we have
$$(1 + \mu)^{n-1} < e^{(n-1)\mu}.$$
Thus,
$$(1 + \mu)^{n-1} - 1 < e^{(n-1)\mu} - 1 = (n - 1)\mu + \frac{((n - 1)\mu)^{2}}{2} + \cdots = (n - 1)\mu\left[1 + \frac{(n - 1)\mu}{2} + \frac{((n - 1)\mu)^{2}}{6} + \cdots\right] < (n - 1)\mu\left[1 + \frac{0.05}{1 - .05}\right].$$
(Note that $(n - 1)\mu < .01$.) Thus,
$$(1 + \mu)^{n-1} - 1 < (n - 1)\mu\left[1 + \frac{0.05}{1 - .05}\right] < 1.06(n - 1)\mu. \qquad (2.5.1)$$
Thus, combining Theorem 2.5.1 and (2.5.1), we can write

Theorem 2.5.2 The relative error in computing the product of $n$ floating point numbers is at most $1.06(n - 1)\mu$, assuming that $(n - 1)\mu < .01$.
2.6 Inner Product Computation
A frequently arising computational task in numerical linear algebra is the computation of the inner product of two $n$-vectors $x$ and $y$:
$$x^{T}y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n, \qquad (2.6.1)$$
where $x_i$ and $y_i$, $i = 1, \ldots, n$, are the components of $x$ and $y$. Let $x_i$ and $y_i$, $i = 1, \ldots, n$, be floating point numbers. Define
$$S_1 = \mathrm{fl}(x_1 y_1) \qquad (2.6.2)$$
$$S_2 = \mathrm{fl}(S_1 + \mathrm{fl}(x_2 y_2)) \qquad (2.6.3)$$
$$\vdots$$
$$S_k = \mathrm{fl}(S_{k-1} + \mathrm{fl}(x_k y_k)), \qquad k = 3, 4, \ldots, n. \qquad (2.6.4)$$
We then have, using Theorem 2.3.1,
$$S_1 = x_1 y_1(1 + \delta_1) \qquad (2.6.5)$$
$$S_2 = [S_1 + x_2 y_2(1 + \delta_2)](1 + \epsilon_2) \qquad (2.6.6)$$
$$\vdots$$
$$S_n = [S_{n-1} + x_n y_n(1 + \delta_n)](1 + \epsilon_n) \qquad (2.6.7)$$
where each $|\delta_i| \leq \mu$ and $|\epsilon_i| \leq \mu$. Substituting the values of $S_1$ through $S_{n-1}$ in $S_n$ and making some rearrangements, we can write
$$S_n = \sum_{i=1}^{n} x_i y_i(1 + e_i), \qquad (2.6.8)$$
where
$$1 + e_i = (1 + \delta_i)(1 + \epsilon_i)(1 + \epsilon_{i+1}) \cdots (1 + \epsilon_n) \approx 1 + \delta_i + \epsilon_i + \epsilon_{i+1} + \cdots + \epsilon_n \qquad (\epsilon_1 = 0) \qquad (2.6.9)$$
(ignoring the products $\delta_i\epsilon_j$ and $\epsilon_j\epsilon_k$, which are small).

For example, when $n = 2$, it is easy to check that
$$S_2 = x_1 y_1(1 + e_1) + x_2 y_2(1 + e_2), \qquad (2.6.10)$$
where $1 + e_1 \approx 1 + \delta_1 + \epsilon_2$ and $1 + e_2 \approx 1 + \delta_2 + \epsilon_2$ (neglecting the products $\delta_1\epsilon_2$ and $\delta_2\epsilon_2$, which are small).

As in the last section, it can be shown (see Forsythe and Moler CSLAS, pp. 92-93) that if $n\mu < 0.01$, then
$$|e_i| \leq 1.01(n + 1 - i)\mu, \qquad i = 1, 2, \ldots, n. \qquad (2.6.11)$$
From (2.6.8) and (2.6.11), we have
$$|\mathrm{fl}(x^{T}y) - x^{T}y| \leq \sum_{i=1}^{n} |x_i|\,|y_i|\,|e_i| \leq 1.01\,n\mu\,|x|^{T}|y| \leq 1.01\,n\mu\,\|x\|_2\|y\|_2$$
(using the Cauchy-Schwarz inequality (Chapter 1, section 1.7)), where $|x| = (|x_1|, |x_2|, \ldots, |x_n|)^{T}$.

Theorem 2.6.1
$$|\mathrm{fl}(x^{T}y) - x^{T}y| \leq c\,n\mu\,|x|^{T}|y| \leq c\,n\mu\,\|x\|_2\|y\|_2,$$
where $c$ is a constant of order unity.
Computing Inner Product in Double Precision
While talking about inner product computation, let's mention that since most computers allow double precision computations, it is recommended that the inner product be computed in double precision (using $2t$ digits arithmetic) to retain greater accuracy. The rationale here is that if we use single precision to compute $x^{T}y$, then there will be $(2n - 1)$ single precision rounding errors (one for each multiplication and each addition). A better strategy is to convert each $x_i$ and $y_i$ to double precision by extending their mantissa with zeros, multiply them in double precision, add them in double precision and, finally, round the final result to single precision. This process is known as accumulation of inner product in double precision (or extended precision). We summarize the process in the following.

Accumulation of Inner Product in Double Precision

1. Convert each $x_i$ and $y_i$ to double precision.
2. Compute the individual products $x_i y_i$ in double precision.
3. Compute the sum $\sum_{i=1}^{n} x_i y_i$ in double precision.
4. Round the sum to single precision.

The process gives low roundoff error at little extra cost. It can be shown (Wilkinson AEP, pp. 117-118) that the error in this case is essentially independent of $n$. Specifically, it can be shown that if the inner product is accumulated in double precision and $\mathrm{fl}_2(x^{T}y)$ denotes the result of such computations, then

Theorem 2.6.2
$$|\mathrm{fl}_2(x^{T}y) - x^{T}y| \leq c\mu\,|x^{T}y|,$$
unless severe cancellation takes place in any of the terms of $x^{T}y$.

Remark: The last sentence in Theorem 2.6.2 is important. One can construct a very simple example (Exercise #6(b)) to see that if cancellation takes place, the conclusion of Theorem 2.6.2 does not hold. The phenomenon of catastrophic cancellation is discussed in the next section.
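The four steps of accumulation in double precision can be imitated in Python, whose floats are IEEE doubles. In this sketch single precision is simulated by the hypothetical helper `f32`, which rounds a double to the nearest IEEE single through the standard `struct` module; the test vectors are contrived so that the outcome of both strategies can be worked out by hand.

```python
import struct

def f32(v):
    """Round a double to the nearest IEEE single precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def dot_single(x, y):
    """Every product and every partial sum is rounded to single precision."""
    s = 0.0
    for xi, yi in zip(x, y):
        s = f32(s + f32(xi * yi))
    return s

def dot_accumulated(x, y):
    """Products and sums are kept in double; one final rounding to single."""
    s = 0.0
    for xi, yi in zip(x, y):
        s += xi * yi
    return f32(s)

# x^T y = 1 + 16 * 2**-26 = 1 + 2**-22 exactly.
x = [1.0] + [2.0 ** -13] * 16
y = list(x)

print(dot_single(x, y))       # each 2**-26 term is rounded away against 1.0
print(dot_accumulated(x, y))  # the small terms survive until the final rounding
```

Here the single precision loop loses every small product, while the accumulated version returns the exact inner product because $1 + 2^{-22}$ is representable in single precision.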
2.7 Error Bounds for Floating Point Matrix Computations
Theorem 2.7.1 Let $|M| = (|m_{ij}|)$. Let $A$ and $B$ be two floating point matrices and $c$ a floating point number. Then

1. $\mathrm{fl}(cA) = cA + E$, $|E| \leq \mu|cA|$.
2. $\mathrm{fl}(A + B) = (A + B) + E$, $|E| \leq \mu|A + B|$.

If $A$ and $B$ are two matrices compatible for matrix multiplication, then

3. $\mathrm{fl}(AB) = AB + E$, $|E| \leq n\mu|A|\,|B| + O(\mu^{2})$.

Proof. See Wilkinson AEP, pp. 114-115, Golub and Van Loan MC (1989, p. 66).

Meaning of $O(\mu^{2})$

In the above expression the notation $O(\mu^{2})$ stands for a complicated expression that is bounded by $c\mu^{2}$, where $c$ is a constant depending upon the problem. The expression $O(\mu^{2})$ will be used frequently in this book.

Remark: The last result shows that matrix multiplication in floating point arithmetic can be very inaccurate, since $|A|\,|B|$ may be much larger than $|AB|$ itself (Exercise #9). For this reason, whenever possible, while computing a matrix-matrix or matrix-vector product, accumulation of inner products in double precision should be used, because in this case the entries of the error matrix can be shown to be bounded predominantly by the entries of the matrix $|AB|$, rather than those of $|A|\,|B|$; see Wilkinson AEP, p. 118.

Error Bounds in Terms of Norms
Traditionally, for matrix computations the bounds for error matrices are given in terms of the norms of the matrices, rather than in terms of absolute values of the matrices as given above. Here we rewrite the bound for error matrices for matrix multiplications using norms, for easy reference later in the book. We must note, however, that entrywise error bounds are more meaningful than normwise errors (see remarks in Section 3.2).

Consider again the equation:
$$\mathrm{fl}(AB) = AB + E, \qquad |E| \leq n\mu|A|\,|B| + O(\mu^{2}).$$
Since $\|E\| \leq \|\,|E|\,\|$, we may rewrite the equation as:
$$\mathrm{fl}(AB) = AB + E,$$
where
$$\|E\| \leq \|\,|E|\,\| \leq n\mu\,\|\,|A|\,\|\,\|\,|B|\,\| + O(\mu^{2}).$$
In particular, for the $\|\cdot\|_2$ and $\|\cdot\|_1$ norms, we have
$$\|E\|_1 \leq n\mu\,\|A\|_1\|B\|_1 + O(\mu^{2}),$$
$$\|E\|_2 \leq n^{2}\mu\,\|A\|_2\|B\|_2 + O(\mu^{2}).$$

Theorem 2.7.2 $\mathrm{fl}(AB) = AB + E$, where $\|E\|_2 \leq n^{2}\mu\,\|A\|_2\|B\|_2 + O(\mu^{2})$.

Two Important Special Cases

A. Matrix-vector multiplication

Corollary 2.7.1 If $b$ is a vector, then from above we have
$$\mathrm{fl}(Ab) = Ab + e,$$
where
$$\|e\|_2 \leq n^{2}\mu\,\|A\|_2\|b\|_2 + O(\mu^{2}).$$
(See also Problem #11 and the remarks made there.)

B. Matrix multiplication by an orthogonal matrix

Corollary 2.7.2 Let $A \in R^{n \times n}$ and $Q \in R^{n \times n}$ be orthogonal. Then
$$\mathrm{fl}(QA) = Q(A + E),$$
where $\|E\|_2 \leq n^{2}\mu\,\|A\|_2 + O(\mu^{2})$.
Implication of the above result

The result says that, although matrix multiplication can be inaccurate in general, if one of the matrices is orthogonal then the floating point matrix multiplication gives only a small and acceptable error. As we will see in later chapters, this result forms the basis of many numerically viable algorithms discussed in this book.

For example, the following result, to be used very often in this book, forms the basis of the QR factorization of a matrix $A$ (see Chapter 5) and is a consequence of the above result.

Corollary 2.7.3 Let $P$ be an orthogonal matrix defined by
$$P = I - 2\,\frac{uu^{T}}{u^{T}u},$$
where $u$ is a column vector. Let $\hat{P}$ be the computed version of $P$ in floating point arithmetic. Then
$$\mathrm{fl}(\hat{P}A) = P(A + E),$$
where
$$\|E\|_2 \leq c\,n^{2}\mu\,\|A\|_2$$
and $c$ is a constant of order unity. Moreover, if the inner products are accumulated in double precision, then the bound will be independent of $n^{2}$.

Proof. See Wilkinson AEP, pp. 152-160.
2.8 Roundoff Errors Due to Cancellation and Recursive Computations

Intuitively, it is clear that if a large number of floating point computations is done, then the accumulated error can be quite large. However, roundoff error can be disastrous even at a single step of computation. For example, consider the subtraction of two numbers:
$$x = .54617, \qquad y = .54601.$$
The exact value is
$$d = x - y = .00016.$$
Suppose now we use four digit arithmetic with rounding. Then we have
$$\hat{x} = .5462 \ \text{(correct to four significant digits)}, \qquad \hat{y} = .5460 \ \text{(correct to five significant digits)}, \qquad \hat{d} = \hat{x} - \hat{y} = .0002.$$
How good is the approximation of $\hat{d}$ to $d$? The relative error is
$$\frac{|d - \hat{d}|}{|d|} = .25 \ \text{(quite large!)}$$
What happened above is the following. In four digit arithmetic, the numbers .5462 and .5460 are of almost the same size. So, when the second one was subtracted from the first, the most significant digits canceled and only the very least significant digit was left in the answer. This phenomenon, known as catastrophic cancellation, occurs when two numbers of approximately the same size are subtracted. Fortunately, in many cases catastrophic cancellation can be avoided. For example, consider the case of solving the quadratic equation:
$$ax^{2} + bx + c = 0, \qquad a \neq 0.$$
The usual way the two roots $x_1$ and $x_2$ are computed is:
$$x_1 = \frac{-b + \sqrt{b^{2} - 4ac}}{2a}, \qquad x_2 = \frac{-b - \sqrt{b^{2} - 4ac}}{2a}.$$
It is clear from above that if $a$, $b$, and $c$ are numbers such that $-b$ is about the same size as $\sqrt{b^{2} - 4ac}$ (with respect to the arithmetic used), then a catastrophic cancellation will occur in computing $x_2$ and, as a result, the computed value of $x_2$ can be completely erroneous.

Example 2.8.1

As an illustration, take $a = 1$, $b = -10^{5}$, $c = 1$ (Forsythe, Malcolm and Moler CMMC, pp. 20-22). Then using $\beta = 10$, $t = 8$, $L = -U = -50$, we see that
$$x_1 = \frac{10^{5} + \sqrt{10^{10} - 4}}{2} = 10^{5} \ \text{(true answer)},$$
$$x_2 = \frac{10^{5} - 10^{5}}{2} = 0 \ \text{(completely wrong)}.$$
The true $x_2 = 0.000010000000001$ (correctly rounded to 11 significant digits). The catastrophic cancellation took place in computing $x_2$, since $-b$ and $\sqrt{b^{2} - 4ac}$ are of the same order. Note that in 8-digit arithmetic, $\sqrt{10^{10} - 4} = 10^{5}$.
How Cancellation Can be Avoided
Cancellation can be avoided if an equivalent pair of formulas is used:
$$x_1 = -\frac{b + \mathrm{sign}(b)\sqrt{b^{2} - 4ac}}{2a}, \qquad x_2 = \frac{c}{a x_1},$$
where $\mathrm{sign}(b)$ is the sign of $b$. Using these formulas, we easily see that:
$$x_1 = 100000.00, \qquad x_2 = \frac{1.0000000}{100000.00} = 0.000010000.$$
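The two pairs of formulas can be compared in IEEE double precision with the short Python sketch below. Because doubles carry about 16 digits rather than 8, the coefficient $b = -10^{8}$ is used here in place of $-10^{5}$ to provoke the same cancellation; that choice, the function names, and the restriction to real roots with $x_1 \neq 0$ are assumptions of the sketch.

```python
import math

def roots_naive(a, b, c):
    """Textbook quadratic formula; the second root may suffer cancellation."""
    d = math.sqrt(b * b - 4.0 * a * c)
    return (-b + d) / (2.0 * a), (-b - d) / (2.0 * a)

def roots_stable(a, b, c):
    """x1 = -(b + sign(b) sqrt(b^2 - 4ac)) / (2a), then x2 = c / (a x1)."""
    d = math.sqrt(b * b - 4.0 * a * c)
    x1 = -(b + math.copysign(d, b)) / (2.0 * a)  # b and sign(b)*d never cancel
    x2 = c / (a * x1)
    return x1, x2

a, b, c = 1.0, -1e8, 1.0       # roots are close to 1e8 and 1e-8
print(roots_naive(a, b, c))    # small root has lost most of its digits
print(roots_stable(a, b, c))   # small root accurate to nearly full precision
```

The stable version adds two quantities of the same sign, so no subtraction of nearly equal numbers occurs, and the small root is then recovered from the product of the roots, $x_1 x_2 = c/a$.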
Example 2.8.2
For yet another example to see how cancellation can lead to inaccuracy, consider the problem of evaluating
$$f(x) = e^{x} - x - 1$$
at $x = .01$. Using five digit arithmetic, the correct answer is .000050167. If $f(x)$ is evaluated directly from the expression, we have
$$f(.01) = 1.0101 - (.01) - 1 = .0001,$$
$$\text{Relative Error} = \frac{.0001 - .000050167}{.000050167} = .99,$$
that is, about 100%, indicating that we cannot trust even the first significant digit.

Fortunately, cancellation can again be avoided using the convergent series
$$e^{x} = 1 + x + \frac{x^{2}}{2} + \frac{x^{3}}{3!} + \cdots$$
In this case we have
$$e^{x} - x - 1 = \left(1 + x + \frac{x^{2}}{2} + \frac{x^{3}}{3!} + \cdots\right) - x - 1 = \frac{x^{2}}{2} + \frac{x^{3}}{3!} + \frac{x^{4}}{4!} + \cdots$$
For $x = .01$, this formula gives
$$\frac{(.01)^{2}}{2} + \frac{(.01)^{3}}{3!} + \frac{(.01)^{4}}{4!} + \cdots = .00005 + .000000166666 + .0000000004166 + \cdots = .000050167$$
(correct up to five significant figures).

Remark: Note that if $x$ were negative, then use of the convergent series for $e^{x}$ would not have helped. For example, to compute $e^{x}$ for a negative value of $x$, cancellation can be avoided by using:
$$e^{-x} = \frac{1}{e^{x}} = \frac{1}{1 + x + \dfrac{x^{2}}{2!} + \dfrac{x^{3}}{3!} + \cdots}$$
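The comparison between direct evaluation and the series for $e^{x} - x - 1$ carries over to double precision if $x$ is taken much closer to zero. The Python sketch below uses $x = 10^{-8}$ to expose the cancellation; the function names and the fixed number of series terms are assumptions made for illustration.

```python
import math

def f_direct(x):
    """Evaluate e^x - x - 1 as written; cancels badly for small |x|."""
    return math.exp(x) - x - 1.0

def f_series(x, terms=20):
    """Sum x^2/2! + x^3/3! + ... using a fixed number of terms."""
    total = 0.0
    term = x * x / 2.0          # term holds x**k / k!, starting at k = 2
    for k in range(2, 2 + terms):
        total += term
        term *= x / (k + 1)     # advance from x**k / k! to x**(k+1) / (k+1)!
    return total

print(f_direct(0.01), f_series(0.01))   # both agree to many digits here
print(f_direct(1e-8), f_series(1e-8))   # the direct value is badly wrong here
```

For $x = 10^{-8}$ the true value is about $5 \times 10^{-17}$, which is below the spacing of doubles near 1, so the direct formula cannot resolve it, whereas the series adds only small positive terms.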
Recursive Computations
In the above examples, we saw how subtractive cancellations can give inaccurate answers. There are, however, other common sources of roundoff errors, e.g., recursive computations, which are computations performed recursively so that the computation of one step depends upon the results of previous steps. In such cases, even if the error made in the first step is negligible, due to the accumulation and magnification of error at every step, the final error can be quite large, giving a completely erroneous answer.

Certain recursions propagate errors in very unhealthy fashions. Consider a very nice example involving recursive computations, again from the book by Forsythe, Malcolm, and Moler [CMMC, pp. 16-17].
Example 2.8.3
Suppose we need to compute the integral
$$E_n = \int_0^1 x^{n} e^{x-1}\,dx$$
for different values of $n$. Integrating by parts gives
$$E_n = \int_0^1 x^{n} e^{x-1}\,dx = \left[x^{n} e^{x-1}\right]_0^1 - \int_0^1 n x^{n-1} e^{x-1}\,dx,$$
or
$$E_n = 1 - nE_{n-1}, \qquad n = 2, 3, \ldots$$
Thus, if $E_1$ is known, then for different values of $n$, $E_n$ can be computed using the above recursive formula.

Indeed, with $\beta = 10$ and $t = 6$, and starting with $E_1 = 0.367879$ as a six-digit approximation to $E_1 = 1/e$, we have from above:

$E_1 = 0.367879$
$E_2 = 0.264242$
$E_3 = 0.207274$
$E_4 = 0.170904$
...
$E_9 = -0.068480$

Although the integrand is positive throughout the interval $[0, 1]$, the computed value of $E_9$ is negative. This phenomenon can be explained as follows.

The error in computing $E_2$ was $-2$ times the error in computing $E_1$; the error in computing $E_3$ was $-3$ times the error in $E_2$ (therefore, the error at this step was exactly six times the error in $E_1$). Thus, the error in computing $E_9$ was $(-2)(-3)(-4) \cdots (-9) = 9!$ times the error in $E_1$. The error in $E_1$ was due to the rounding of $1/e$ using six significant digits, which is about $4.412 \times 10^{-7}$. However, this small error multiplied by $9!$ gave $9! \times 4.412 \times 10^{-7} = .1601$, which is quite large.
Rearranging the Recurrence
Again, for this example, it turned out that we could get a much better result by simply rearranging the recursion so that the error at every step, instead of being magnified, is reduced. Indeed, if we rewrite the recursion as
$$E_{n-1} = \frac{1 - E_n}{n}, \qquad n = \ldots, 3, 2,$$
then the error at each step will be reduced by a factor of $1/n$. Thus, starting with a large value of $n$ (say, $n = 20$) and working backward, we will see that $E_9$ will be accurate to full six-digit precision. To obtain a starting value, we note that
$$E_n = \int_0^1 x^{n} e^{x-1}\,dx \leq \int_0^1 x^{n}\,dx = \frac{1}{n + 1}.$$
With $n = 20$, $E_{20} \leq \frac{1}{21}$. Let's take $E_{20} = 0$. Then, starting with $E_{20} = 0$, it can be shown (Forsythe, Malcolm, and Moler CMMC, p. 17) that $E_9 = 0.0916123$, which is correct to full six-digit precision. The reason for obtaining this accuracy was that the error in $E_{20}$ was at most $\frac{1}{21}$; this error was multiplied by $\frac{1}{20}$ in computing $E_{19}$, giving an error of at most $\frac{1}{20} \times \frac{1}{21} = 0.0024$ in the computation of $E_{19}$, and so on.
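Both directions of the recurrence are easy to run. The Python sketch below uses double precision arithmetic but starts the forward recursion from the six-digit value $E_1 = 0.367879$ used in the example, so the initial error and its growth by $9!$ are the same as in the text; the function names and default arguments are illustrative.

```python
def forward(n, e1=0.367879):
    """E_k = 1 - k * E_{k-1}; the initial error is multiplied by k at each step."""
    e = e1
    for k in range(2, n + 1):
        e = 1.0 - k * e
    return e

def backward(n, start=20, e_start=0.0):
    """E_{k-1} = (1 - E_k) / k; the initial error is divided by k at each step."""
    e = e_start
    for k in range(start, n, -1):
        e = (1.0 - e) / k
    return e

print(forward(9))    # negative, although the integrand is positive on [0, 1]
print(backward(9))   # close to 0.0916123
```

Starting the backward recursion from the crude guess $E_{20} = 0$ is harmless, because its error of at most $1/21$ is divided by $20 \cdot 19 \cdots 10$ before it reaches $E_9$.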
2.9 Review and Summary
The concepts of floating point numbers and rounding errors have been introduced and discussed in this chapter.

1. Floating Point Numbers: A normalized floating point number has the form
$$x = \pm r \times \beta^{e},$$
where $e$ is called the exponent, $r$ is the significand, and $\beta$ is the base of the number system. The floating point number system is characterized by four parameters:

$\beta$: the base
$t$: the precision
$L$, $U$: the lower and upper limits of the exponent.

2. Errors: The error(s) in a computation is measured either by absolute error or relative error. The relative errors make more sense than absolute errors. The relative error gives an indication of the number of significant digits in an approximate answer. The relative error in representing a real number $x$ by its floating point representation $\mathrm{fl}(x)$ is bounded by a number $\mu$, called the machine precision (Theorem 2.2.1).

3. Laws of Floating Point Arithmetic.
$$\mathrm{fl}(x * y) = (x * y)(1 + \delta),$$
where $*$ indicates any of the four basic arithmetic operations $+$, $-$, $\times$, or $\div$, and $|\delta| \leq \mu$.

4. Addition, Multiplication, and Inner Product Computations. The results of addition and multiplication of $n$ floating point numbers are given in Theorems 2.4.1 and 2.5.1, respectively. While adding $n$ floating point numbers, it is advisable that they are added in ascending order of magnitude. While computing the inner product of two vectors, accumulation of inner product in double precision, whenever possible, is suggested.

5. Floating Point Matrix Multiplications. The entrywise and normwise error bounds for matrix multiplication of two floating point matrices are given in Theorems 2.7.1 and 2.7.2, respectively. Matrix multiplication in floating point arithmetic can be very inaccurate, unless one of the matrices is orthogonal (or unitary, if complex). Accumulation of inner product is suggested, whenever possible, in computing a matrix-matrix or a matrix-vector product. The high accuracy in a matrix product computation involving an orthogonal matrix makes the use of orthogonal matrices in matrix computations quite attractive.

6. Roundoff Errors Due to Cancellation and Recursive Computation. Two major sources of roundoff errors are subtractive cancellation and recursive computations. They have been discussed in some detail in section 2.8. Examples have been given to show how these errors come up in many basic computations. An encouraging message here is that in many instances, computations can be reorganized so that cancellation can be avoided, and the error in recursive computations can be diminished at each step of computation.
2.10 Suggestions for Further Reading
For details of the IEEE standard, see the monograph "An American National Standard: IEEE Standard for Binary Floating-Point Arithmetic," IEEE publication, 1985.

For results on error bounds for basic floating point matrix operations, the books by James H. Wilkinson (i) The Algebraic Eigenvalue Problem (AEP) and (ii) Rounding Errors in Algebraic Processes (Prentice-Hall, New Jersey, 1963) are extremely useful and valuable resources.

Discussions of basic floating point operations and of rounding errors due to cancellations and recursive computations are given nowadays in many elementary numerical analysis textbooks. We shall name a few here which we have used and found useful.

1. Elementary Numerical Analysis by Kendall Atkinson, John Wiley and Sons, 1993.
2. Numerical Mathematics and Computing by Ward Cheney and David Kincaid, Brooks/Cole Publishing Company, California, 1980.
3. Computer Methods for Mathematical Computations by George E. Forsythe, Michael A. Malcolm and Cleve B. Moler, Prentice Hall, Inc., 1977.
4. Numerical Methods: A Software Approach by R. L. Johnston, John Wiley and Sons, Toronto, 1982.
5. Numerical Methods and Software by D. Kahaner, C. B. Moler and S. Nash, Prentice Hall, Englewood Cliffs, NJ, 1988.
Exercises on Chapter 2
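1. (a) Show that
$$\frac{|\mathrm{fl}(x) - x|}{|x|} \leq \mu = \begin{cases} \frac{1}{2}\beta^{1-t} & \text{for rounding} \\ \beta^{1-t} & \text{for chopping.} \end{cases}$$
(b) Show that (a) can be written in the form
$$\mathrm{fl}(x) = x(1 + \delta), \qquad |\delta| \leq \mu.$$

2. Let $x$ be a floating point number and $k$ be a positive integer. Then
$$\mathrm{fl}\left(\frac{x^{k}}{k!}\right) = \frac{x^{k}}{k!}(1 + e_k),$$
where
$$|e_k| \leq 2k\mu + O(\mu^{2}).$$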
3. Construct examples to show that the distributive law for floating point addition and multiplication does not hold. What can you say about the commutativity and associativity for these operations? Give reasons for your answers.

4. Let $x_1, x_2, \ldots, x_n$ be $n$ floating point numbers. Define
$$s_2 = \mathrm{fl}(x_1 + x_2), \qquad s_k = \mathrm{fl}(s_{k-1} + x_k), \quad k = 3, \ldots, n.$$
Then from Theorem 2.4.1 show that
$$s_n = \mathrm{fl}(x_1 + x_2 + \cdots + x_n) = x_1(1 + \eta_1) + x_2(1 + \eta_2) + \cdots + x_n(1 + \eta_n).$$
Give a bound for each $\eta_i$, $i = 1, \ldots, n$.
5. (a) Construct an example to show that, when adding a list of floating point numbers, in general, the rounding error will be less if the numbers are added in order of increasing magnitude.
(b) Find another example to show that this is not always necessarily true.

6. (a) Prove that the error in computing an inner product with accumulation in double precision is essentially independent of $n$. That is, show that if $\mathrm{fl}_2(x^{T}y)$ denotes the computation of the inner product with accumulation in double precision, then unless severe cancellation takes place,
$$|\mathrm{fl}_2(x^{T}y) - x^{T}y| \leq c\mu\,|x^{T}y| + O(\mu^{2})$$
(Wilkinson AEP, pp. 116-117).
(b) Show by means of a simple example that if there is cancellation, then $\mathrm{fl}_2(x^{T}y)$ can differ significantly from $x^{T}y$ (take $t = 3$).
(c) If $s$ is a scalar, then prove that
$$\mathrm{fl}_2\left(\frac{x^{T}y}{s}\right) = \frac{x_1 y_1(1 + \epsilon_1) + \cdots + x_n y_n(1 + \epsilon_n)}{s/(1 + \epsilon)}.$$
Find bounds for $\epsilon$ and $\epsilon_i$, $i = 1, \ldots, n$. (See Wilkinson AEP, p. 118.)
7. Show that
(a) $\mathrm{fl}(cA) = cA + E$, $|E| \leq \mu|cA|$
(b) $\mathrm{fl}(A + B) = (A + B) + E$, $|E| \leq \mu(|A| + |B|)$
(c) $\mathrm{fl}(AB) = AB + E$, $|E| \leq n\mu|A|\,|B| + O(\mu^{2})$
(Wilkinson AEP, p. 115.)

8. Construct a simple example to show that matrix multiplication in floating point arithmetic need not be accurate. Rework your example using accumulation of inner product in double precision.

9. Let $A$ and $B$ be $n \times n$ matrices. Then show that $\mathrm{fl}(AB) = AB + E$, where
$$|e_{ij}| \leq n^{2}\mu\,|f_{ij}| + O(\mu^{2}),$$
and $f_{ij}$ = inner product of the $i$th row of $A$ and the $j$th column of $B$.

10. Prove that if $Q$ is orthogonal then
$$\mathrm{fl}(QA) = Q(A + E),$$
where $\|E\|_2 \leq n^{2}\mu\,\|A\|_2 + O(\mu^{2})$.

11. Let $b$ be a column vector and $x = Ab$. Let $\hat{x} = \mathrm{fl}(x)$. Then show that
$$\frac{\|\hat{x} - x\|}{\|x\|} \leq p(n)\mu\,\|A^{-1}\|\,\|A\|,$$
where $p(n)$ is a polynomial in $n$ of low degree. (The number $\|A^{-1}\|\,\|A\|$ is called the condition number of $A$. There are matrices for which this number can be very big. For those matrices we then conclude that the relative error in a matrix-vector product can be quite large.)
12. Using Theorem 2.7.1, prove that, if $B$ is nonsingular,
$$\frac{\|\mathrm{fl}(AB) - AB\|_F}{\|AB\|_F} \leq n\mu\,\|B\|_F\|B^{-1}\|_F + O(\mu^{2}).$$

13. Let $y_1, \ldots, y_n$ be $n$ column vectors defined recursively:
$$y_{i+1} = Ay_i, \qquad i = 1, 2, \ldots, n - 1.$$
Let $\hat{y}_i = \mathrm{fl}(y_i)$. Find a bound for the relative error in computing each $y_i$, $i = 1, \ldots, n$.

14. Let $\beta = 10$, $t = 4$. Compute $\mathrm{fl}(A^{T}A)$, where
$$A = \begin{pmatrix} 1 & 1 \\ 10^{-4} & 0 \\ 0 & 10^{-4} \end{pmatrix}.$$
Repeat your computation with $t = 9$. Compare the results.

15. Show how to arrange computation in each of the following so that the loss of significant digits can be avoided. Do one numerical example in each case to support your answer.
(a) $e^{x} - x - 1$, for negative values of $x$.
(b) $\sqrt{x^{4} + 1} - x^{2}$, for large values of $x$.
(c) $\dfrac{1}{x} - \dfrac{1}{x + 1}$, for large values of $x$.
(d) $x - \sin x$, for values of $x$ near zero.
(e) $1 - \cos x$, for values of $x$ near zero.
16. What are the relative and absolute errors in approximating
(a) $\pi$ by $\frac{22}{7}$?
(b) $\frac{1}{3}$ by .333?
(c) $\frac{1}{6}$ by .166?
How many significant digits are there in each computation?

17. Let $\beta = 10$, $t = 4$. Consider computing
$$a = \left(\frac{1}{6} - .1666\right) / .1666.$$
How many correct digits of the exact answer will you get?

18. Consider evaluating
$$e = \sqrt{a^{2} + b^{2}}.$$
How can the computation be organized so that overflow in computing $a^{2} + b^{2}$ for large values of $a$ or $b$ can be avoided?

19. What answers will you get if you compute the following numbers on your calculator or computer? (a) (b) (c)
p 10 ; 1, p ; 10 ; 1, 10 ; 50
8 20 16
Compute the absolute and relative errors in each case.

20. What problem do you foresee in solving the quadratic equations
(a) $x^{2} - 10^{6}x + 1 = 0$
(b) $10^{-10}x^{2} - 10^{10}x + 10^{10} = 0$
using the well-known formula
$$x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}?$$
What remedy do you suggest? Now solve the equations using your suggested remedy, using $t = 4$.

21. Show that the integral
$$y_i = \int_0^1 \frac{x^{i}}{x + 5}\,dx$$
can be computed by using the recursion formula:
$$y_i = \frac{1}{i} - 5y_{i-1}.$$
Compute $y_1, y_2, \ldots, y_{10}$ using this formula, taking
$$y_0 = \left[\ln(x + 5)\right]_{x=0}^{1} = \ln 6 - \ln 5 = \ln(1.2).$$
What abnormalities do you observe in this computation? Explain what happened. Now rearrange the recursion so that the values of $y_i$ can be computed more accurately.

22. Suppose that $x$ approximates 104, 50000, and 55596 to five significant figures. Find the largest interval in each case containing $x$.
3. STABILITY OF ALGORITHMS AND CONDITIONING OF PROBLEMS
3.1 Some Basic Algorithms
3.1.1 Computing the Norm of a Vector
3.1.2 Computing the Inner Product of Two Vectors
3.1.3 Solution of an Upper Triangular System
3.1.4 Computing the Inverse of an Upper Triangular Matrix
3.1.5 Gaussian Elimination for Solving Ax = b
3.2 Definitions and Concepts of Stability
3.3 Conditioning of the Problem and Perturbation Analysis
3.4 Conditioning of the Problem, Stability of the Algorithm, and Accuracy of the Solution
3.5 The Wilkinson Polynomial
3.6 An Ill-conditioned Linear System Problem
3.7 Examples of Ill-conditioned Eigenvalue Problems
3.8 Strong, Weak and Mild Stability
3.9 Review and Summary
3.10 Suggestions for Further Reading
CHAPTER 3 STABILITY OF ALGORITHMS AND CONDITIONING OF PROBLEMS
3.1 Some Basic Algorithms
Definition 3.1.1 An algorithm is an ordered set of operations, logical and arithmetic, which when applied to a computational problem defined by a given set of data, called the input data, produces a solution to the problem. A solution is comprised of a set of data called the output data.
In this book, for the sake of convenience and simplicity, we will very often describe algorithms by means of pseudocodes which can be translated into computer codes easily. Describing algorithms by pseudocodes has been made popular by Stewart through his book IMC (1973).
3.1.1 Computing the Norm of a Vector
Given $x = (x_1, \ldots, x_n)^T$, compute $\|x\|_2$.

Algorithm 3.1.1 Computing the Norm of a Vector
Input Data: $n$; $x_1, \ldots, x_n$.
Step 1: Compute $r = \max(|x_1|, \ldots, |x_n|)$.
Step 2: Compute $y_i = x_i / r$, $i = 1, \ldots, n$.
Step 3: Compute $s = \|x\|_2 = r\sqrt{y_1^2 + \cdots + y_n^2}$.
Output Data: $s$.

Pseudocodes
    $r = \max(|x_1|, \ldots, |x_n|)$
    $s = 0$
    For $i = 1$ to $n$ do
        $y_i = x_i / r$
        $s = s + y_i^2$
    $s = r\,(s)^{1/2}$
G. W. Stewart, a former student of the celebrated numerical analyst Alston Householder, is a professor of computer science at the University of Maryland. He is well known for his many outstanding contributions in numerical linear algebra and statistical computations. He is the author of the book Introduction to Matrix Computations.
An Algorithmic Note
In order to avoid overflow, each entry of $x$ has been normalized before using the formula
\[ \|x\|_2 = \sqrt{x_1^2 + \cdots + x_n^2}. \]
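The effect of the normalization can be seen in a short Python sketch. The function names `naive_norm2` and `scaled_norm2` are illustrative and not from the text, and returning 0 for the zero vector is an added assumption, since Algorithm 3.1.1 does not say what to do when $r = 0$. The vector is assumed to be nonempty.

```python
import math

def naive_norm2(x):
    """2-norm from the textbook formula; the squares may overflow."""
    return math.sqrt(sum(xi * xi for xi in x))

def scaled_norm2(x):
    """2-norm following Algorithm 3.1.1: scale by the largest entry first."""
    r = max(abs(xi) for xi in x)          # Step 1
    if r == 0.0:                          # assumption: norm of the zero vector is 0
        return 0.0
    s = 0.0
    for xi in x:
        y = xi / r                        # Step 2: |y| <= 1, so y*y cannot overflow
        s += y * y
    return r * math.sqrt(s)               # Step 3

x = [3e200, 4e200]
print(naive_norm2(x))    # the squares overflow, so the result is inf
print(scaled_norm2(x))   # close to 5e200
```

The scaled version costs one extra pass over the data, which is the price paid for avoiding overflow in the intermediate squares.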
3.1.2 Computing the Inner Product of Two Vectors
Given two $n$-vectors $x$ and $y$, compute the inner product
\[ x^T y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n. \]

Algorithm 3.1.2 Computing the Inner Product of Two Vectors
Input Data: $n$; $x_1, x_2, \ldots, x_n$; $y_1, y_2, \ldots, y_n$.
Step 1: Compute the partial products: $s_i = x_i y_i$, $i = 1, \ldots, n$.
Step 2: Add the partial products: $\mathrm{Sum} = \sum_{i=1}^{n} s_i$.
Output Data: Sum.

Pseudocodes
    Sum = 0
    For $i = 1, \ldots, n$ do
        Sum = Sum + $x_i y_i$

3.1.3 Solution of an Upper Triangular System
Consider the system
\[ Ty = b, \]
where $T = (t_{ij})$ is a nonsingular upper triangular matrix and $y = (y_1, y_2, \ldots, y_n)^T$. Specifically,
\[
\begin{aligned}
t_{11}y_1 + t_{12}y_2 + \cdots + t_{1n}y_n &= b_1 \\
t_{22}y_2 + \cdots + t_{2n}y_n &= b_2 \\
t_{33}y_3 + \cdots + t_{3n}y_n &= b_3 \\
&\;\;\vdots \\
t_{n-1,n-1}y_{n-1} + t_{n-1,n}y_n &= b_{n-1} \\
t_{nn}y_n &= b_n
\end{aligned}
\]
where each $t_{ii} \ne 0$ for $i = 1, 2, \ldots, n$. The last equation is solved first to obtain $y_n$, then this value is inserted in the next to the last equation to obtain $y_{n-1}$, and so on. This process is known as back substitution. The algorithm can easily be written down.
Algorithm 3.1.3 Back Substitution
Input Data: $T = (t_{ij})$, an $n \times n$ upper triangular matrix, and $b$, an $n$-vector.
Step 1: Compute $y_n = \dfrac{b_n}{t_{nn}}$.
Step 2: Compute $y_{n-1}$ through $y_1$ successively:
\[ y_i = \frac{1}{t_{ii}}\left(b_i - \sum_{j=i+1}^{n} t_{ij}y_j\right), \quad i = n-1, \ldots, 2, 1. \]
Output Data: $y = (y_1, \ldots, y_n)^T$.

Pseudocodes
    For $i = n, n-1, \ldots, 3, 2, 1$ do
        $y_i = \dfrac{1}{t_{ii}}\left(b_i - \sum_{j=i+1}^{n} t_{ij}y_j\right)$

Note: When $i = n$, the summation ($\sum$) is skipped.
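A minimal Python sketch of the back substitution pseudocode follows. The function name and the small test system are illustrative choices, not from the text; indices are 0-based, and the diagonal entries are assumed to be nonzero, as the algorithm requires.

```python
def back_substitution(T, b):
    """Solve T y = b for an upper triangular T with nonzero diagonal."""
    n = len(b)
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        # The sum is empty for the last row, matching the note for i = n.
        s = sum(T[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (b[i] - s) / T[i][i]
    return y

T = [[2.0, 1.0, 1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
b = [7.0, 12.0, 12.0]
print(back_substitution(T, b))   # [1.0, 2.0, 3.0]
```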
3.1.4 Computing the Inverse of an Upper Triangular Matrix
Finding the inverse of an $n \times n$ matrix $A$ is equivalent to finding a matrix $X$ such that
\[ AX = I. \]
Let $X = (x_1, \ldots, x_n)$ and $I = (e_1, \ldots, e_n)$, where $x_i$ is the $i$th column of $X$ and $e_i$ is the $i$th column of $I$. Then the matrix equation $AX = I$ amounts to solving $n$ linear systems:
\[ Ax_i = e_i, \quad i = 1, \ldots, n. \]
The job is then particularly simple when $A$ is a triangular matrix. Let $T$ be an upper triangular matrix. Then finding its inverse $S = (s_1, \ldots, s_n)$ amounts to solving $n$ upper triangular linear systems:
\[ Ts_i = e_i, \quad i = 1, \ldots, n. \]
Let $s_i = (s_{1i}, s_{2i}, \ldots, s_{ni})^T$.

For $i = 1$: $Ts_1 = e_1$ gives
\[ s_{11} = \frac{1}{t_{11}}. \]
(The entries $s_{21}$ through $s_{n1}$ are all zero.)

For $i = 2$: $Ts_2 = e_2$ gives
\[ s_{22} = \frac{1}{t_{22}}, \qquad s_{12} = -\frac{1}{t_{11}}(t_{12}s_{22}). \]
(The entries $s_{32}$ through $s_{n2}$ are all zero.)

For $i = k$: $Ts_k = e_k$ gives
\[ s_{kk} = \frac{1}{t_{kk}}, \qquad s_{ik} = -\frac{1}{t_{ii}}\left(t_{i,i+1}s_{i+1,k} + \cdots + t_{ik}s_{kk}\right), \quad i = k-1, k-2, \ldots, 1. \]
(The other entries of the column $s_k$ are all zero.) The pseudocodes of the algorithm can now be easily written down.
A Convention: From now onwards, we shall use the following format for algorithm descriptions.

Algorithm 3.1.4 The Inverse of an Upper Triangular Matrix
Let $T$ be an $n \times n$ nonsingular upper triangular matrix. The following algorithm computes $S$, the inverse of $T$.
    For $k = n, n-1, \ldots, 1$ do
        (1) $s_{kk} = \dfrac{1}{t_{kk}}$
        (2) $s_{ik} = -t_{ii}^{-1} \sum_{j=i+1}^{k} t_{ij}s_{jk}$ $(i = k-1, k-2, \ldots, 1)$.
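Example 3.1.1
\[ T = \begin{pmatrix} 5 & 2 & 3 \\ 0 & 2 & 1 \\ 0 & 0 & 4 \end{pmatrix} \]

$k = 3$:
\[ s_{33} = \frac{1}{4} \]
\[ s_{23} = -\frac{1}{t_{22}}(t_{23}s_{33}) = -\frac{1}{2}\left(1 \cdot \frac{1}{4}\right) = -\frac{1}{8} \]
\[ s_{13} = -\frac{1}{t_{11}}(t_{12}s_{23} + t_{13}s_{33}) = -\frac{1}{5}\left(2 \cdot \left(-\frac{1}{8}\right) + 3 \cdot \frac{1}{4}\right) = -\frac{1}{10} \]

$k = 2$:
\[ s_{22} = \frac{1}{t_{22}} = \frac{1}{2} \]
\[ s_{12} = -\frac{1}{t_{11}}(t_{12}s_{22}) = -\frac{1}{5}\left(2 \cdot \frac{1}{2}\right) = -\frac{1}{5} \]

$k = 1$:
\[ s_{11} = \frac{1}{t_{11}} = \frac{1}{5} \]

\[ T^{-1} = S = \begin{pmatrix} \frac{1}{5} & -\frac{1}{5} & -\frac{1}{10} \\ 0 & \frac{1}{2} & -\frac{1}{8} \\ 0 & 0 & \frac{1}{4} \end{pmatrix} \]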
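Algorithm 3.1.4 can be sketched in Python as follows. Exact rational arithmetic from the standard `fractions` module is used here only so that the result can be compared digit for digit with Example 3.1.1; that choice, the function name, and the 0-based indexing are illustrative and not part of the text.

```python
from fractions import Fraction

def upper_triangular_inverse(T):
    """Inverse S of a nonsingular upper triangular matrix T, column by column."""
    n = len(T)
    S = [[Fraction(0)] * n for _ in range(n)]
    for k in range(n - 1, -1, -1):               # k = n, n-1, ..., 1 in the text
        S[k][k] = 1 / Fraction(T[k][k])           # step (1)
        for i in range(k - 1, -1, -1):            # step (2)
            acc = sum(Fraction(T[i][j]) * S[j][k] for j in range(i + 1, k + 1))
            S[i][k] = -acc / Fraction(T[i][i])
    return S

T = [[5, 2, 3],
     [0, 2, 1],
     [0, 0, 4]]
for row in upper_triangular_inverse(T):
    print([str(v) for v in row])
# ['1/5', '-1/5', '-1/10']
# ['0', '1/2', '-1/8']
# ['0', '0', '1/4']
```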
3.1.5 Gaussian Elimination for Solving Ax = b
Consider the problem of solving the linear system of $n$ equations in $n$ unknowns:
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n
\end{aligned}
\]
or, in matrix notation,
\[ Ax = b, \]
where $A = (a_{ij})$ and $b = (b_1, \ldots, b_n)^T$. A well-known approach for solving the problem is the classical elimination scheme known as Gaussian elimination. A detailed description and mechanism of development of this historical algorithm and its important practical variations will appear in Chapters 5 and 6. However, for a better understanding of some of the material presented in this chapter, we just give a brief description of the basic Gaussian elimination scheme.

Basic idea. The basic idea is to reduce the system to an equivalent upper triangular system so that the reduced upper triangular system can be solved easily using the back substitution algorithm (Algorithm 3.1.3).
Reduction process. The reduction process consists of $(n-1)$ steps.

Step 1: At step 1, the unknown $x_1$ is eliminated from the second through the $n$th equations. This is done by multiplying the first equation by $-\dfrac{a_{21}}{a_{11}}, -\dfrac{a_{31}}{a_{11}}, \ldots, -\dfrac{a_{n1}}{a_{11}}$ and adding it, respectively, to the 2nd through $n$th equations. The quantities
\[ m_{i1} = -\frac{a_{i1}}{a_{11}}, \quad i = 2, \ldots, n \]
are called multipliers. At the end of step 1, the system $Ax = b$ becomes $A^{(1)}x = b^{(1)}$, where the entries of $A^{(1)} = (a_{ij}^{(1)})$ and those of $b^{(1)}$ are related to the entries of $A$ and $b$ as follows:
\[ a_{ij}^{(1)} = a_{ij} + m_{i1}a_{1j} \quad (i = 2, \ldots, n;\ j = 2, \ldots, n) \]
\[ b_i^{(1)} = b_i + m_{i1}b_1 \quad (i = 2, \ldots, n). \]
(Note that $a_{21}^{(1)}, a_{31}^{(1)}, \ldots, a_{n1}^{(1)}$ are all zero.)

Step 2: At step 2, $x_2$ is eliminated from the 3rd through the $n$th equations of $A^{(1)}x = b^{(1)}$ by multiplying the second equation of $A^{(1)}x = b^{(1)}$ by the multipliers
\[ m_{i2} = -\frac{a_{i2}^{(1)}}{a_{22}^{(1)}}, \quad i = 3, \ldots, n \]
and adding it, respectively, to the 3rd through $n$th equations. The system now becomes $A^{(2)}x = b^{(2)}$, whose entries are given as follows:
\[ a_{ij}^{(2)} = a_{ij}^{(1)} + m_{i2}a_{2j}^{(1)} \quad (i = 3, \ldots, n;\ j = 3, \ldots, n) \]
\[ b_i^{(2)} = b_i^{(1)} + m_{i2}b_2^{(1)} \quad (i = 3, \ldots, n) \]
and so on.
Step k: At step $k$, the $(n-k)$ multipliers $m_{ik} = -\dfrac{a_{ik}^{(k-1)}}{a_{kk}^{(k-1)}}$, $i = k+1, \ldots, n$, are formed and, using them, $x_k$ is eliminated from the $(k+1)$th through the $n$th equations of $A^{(k-1)}x = b^{(k-1)}$. The entries of $A^{(k)}$ and those of $b^{(k)}$ are given by
\[ a_{ij}^{(k)} = a_{ij}^{(k-1)} + m_{ik}a_{kj}^{(k-1)} \quad (i = k+1, \ldots, n;\ j = k+1, \ldots, n) \]
\[ b_i^{(k)} = b_i^{(k-1)} + m_{ik}b_k^{(k-1)} \quad (i = k+1, \ldots, n). \]

Step n-1: At the end of the $(n-1)$th step, the reduced matrix $A^{(n-1)}$ is upper triangular and the original vector $b$ is transformed to $b^{(n-1)}$.

We are now ready to write down the pseudocodes of the Gaussian elimination scheme. The following summarized observations will help write the pseudocodes:
1. There are $(n-1)$ steps $(k = 1, 2, \ldots, n-1)$.
2. For each value of $k$, there are $(n-k)$ multipliers: $m_{ik}$ $(i = k+1, \ldots, n)$.
3. For each value of $k$, only $(n-k)^2$ entries of $A^{(k)}$ are modified $(i = k+1, \ldots, n;\ j = k+1, \ldots, n)$. The $(n-k)$ entries below the $(k,k)$th entry of the $k$th column are zeros, and the remaining entries that are not modified remain the same as the corresponding entries of $A^{(k-1)}$.
Algorithm 3.1.5 Basic Gaussian Elimination
    For $k = 1, 2, \ldots, n-1$ do
        For $i = k+1, \ldots, n$ do
            $m_{ik} = -\dfrac{a_{ik}^{(k-1)}}{a_{kk}^{(k-1)}}$
            For $j = k+1, \ldots, n$ do
                $a_{ij}^{(k)} = a_{ij}^{(k-1)} + m_{ik}a_{kj}^{(k-1)}$
            $b_i^{(k)} = b_i^{(k-1)} + m_{ik}b_k^{(k-1)}$
(Note that $A^{(0)} = (a_{ij}^{(0)}) = (a_{ij})$, $b^{(0)} = b$.)
Remarks:
1. The above basic Gaussian elimination algorithm is commonly known as the Gaussian elimination algorithm without row interchanges or the Gaussian elimination algorithm without pivoting. The reason for having such a name will be clear from the discussion of this algorithm again in Chapter 5.
2. The basic Gaussian algorithm as presented above is not commonly used in practice. Two practical variations of this algorithm, known as Gaussian elimination with partial and complete pivoting, will be described in Chapters 5 and 6.
3. We have assumed that the quantities $a_{11}, a_{22}^{(1)}, \ldots, a_{nn}^{(n-1)}$ are different from zero. If any of them is computationally zero, the algorithm will stop.
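Example 3.1.2
\[
\begin{aligned}
5x_1 + x_2 + x_3 &= 7 \\
x_1 + x_2 + x_3 &= 3 \\
2x_1 + x_2 + 3x_3 &= 6
\end{aligned}
\]
or
\[ \begin{pmatrix} 5 & 1 & 1 \\ 1 & 1 & 1 \\ 2 & 1 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 7 \\ 3 \\ 6 \end{pmatrix}, \qquad Ax = b. \]

Step 1: $k = 1$.
$i = 2, 3$:
\[ m_{21} = -\frac{a_{21}}{a_{11}} = -\frac{1}{5}, \qquad m_{31} = -\frac{a_{31}}{a_{11}} = -\frac{2}{5}. \]
$j = 2, 3$:
\[ i = 2,\ j = 2: \quad a_{22}^{(1)} = a_{22} + m_{21}a_{12} = \frac{4}{5} \]
\[ i = 2,\ j = 3: \quad a_{23}^{(1)} = a_{23} + m_{21}a_{13} = \frac{4}{5} \]
\[ i = 3,\ j = 2: \quad a_{32}^{(1)} = a_{32} + m_{31}a_{12} = \frac{3}{5} \]
\[ i = 3,\ j = 3: \quad a_{33}^{(1)} = a_{33} + m_{31}a_{13} = \frac{13}{5} \]
\[ b_2^{(1)} = b_2 + m_{21}b_1 = \frac{8}{5}, \qquad b_3^{(1)} = b_3 + m_{31}b_1 = \frac{16}{5} \]
(Note: $b_1^{(1)} = b_1$, $a_{21}^{(1)} = a_{31}^{(1)} = 0$, $a_{11}^{(1)} = a_{11}$, $a_{12}^{(1)} = a_{12}$, $a_{13}^{(1)} = a_{13}$.)
\[ \begin{pmatrix} 5 & 1 & 1 \\ 0 & \frac{4}{5} & \frac{4}{5} \\ 0 & \frac{3}{5} & \frac{13}{5} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 7 \\ \frac{8}{5} \\ \frac{16}{5} \end{pmatrix}, \qquad A^{(1)}x = b^{(1)}. \]

Step 2: $k = 2$.
$i = 3$:
\[ m_{32} = -\frac{a_{32}^{(1)}}{a_{22}^{(1)}} = -\frac{3}{4}. \]
$i = 3,\ j = 3$:
\[ a_{33}^{(2)} = a_{33}^{(1)} + m_{32}a_{23}^{(1)} = 2, \qquad b_3^{(2)} = b_3^{(1)} + m_{32}b_2^{(1)} = 2. \]
\[ \begin{pmatrix} 5 & 1 & 1 \\ 0 & \frac{4}{5} & \frac{4}{5} \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 7 \\ \frac{8}{5} \\ 2 \end{pmatrix}, \qquad A^{(2)}x = b^{(2)}. \]
Note that $A^{(2)}$ is upper triangular.

Back Substitution: The above triangular system is easily solved using back substitution:
\[ 2x_3 = 2 \ \Rightarrow\ x_3 = 1 \]
\[ \tfrac{4}{5}x_2 + \tfrac{4}{5}x_3 = \tfrac{8}{5} \ \Rightarrow\ x_2 = 1 \]
\[ 5x_1 + x_2 + x_3 = 7 \ \Rightarrow\ x_1 = 1 \]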
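The reduction and back substitution of Example 3.1.2 can be reproduced with a short Python sketch of Algorithm 3.1.5. Exact rational arithmetic (`fractions.Fraction`) is an illustrative choice made here so that the fractions of the example appear unchanged; the function names are hypothetical, and a zero pivot simply raises an error, in the spirit of Remark 3.

```python
from fractions import Fraction

def gaussian_elimination(A, b):
    """Reduce Ax = b to upper triangular form without row interchanges."""
    n = len(b)
    A = [[Fraction(v) for v in row] for row in A]
    b = [Fraction(v) for v in b]
    for k in range(n - 1):
        if A[k][k] == 0:
            raise ZeroDivisionError("zero pivot: the basic algorithm stops")
        for i in range(k + 1, n):
            m = -A[i][k] / A[k][k]            # multiplier m_ik
            A[i][k] = Fraction(0)             # entry below the pivot becomes zero
            for j in range(k + 1, n):
                A[i][j] += m * A[k][j]
            b[i] += m * b[k]
    return A, b

def back_substitution(U, c):
    """Solve the upper triangular system U x = c."""
    n = len(c)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - s) / U[i][i]
    return x

U, c = gaussian_elimination([[5, 1, 1], [1, 1, 1], [2, 1, 3]], [7, 3, 6])
print([[str(v) for v in row] for row in U])   # rows of A^(2)
print([str(v) for v in c])                    # ['7', '8/5', '2']
print([str(v) for v in back_substitution(U, c)])   # ['1', '1', '1']
```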
3.2 Definitions and Concepts of Stability
The examples on catastrophic cancellations and recursive computations in the last chapter had one thing in common: the inaccuracy of the computed result in each case was due entirely to the algorithm used, because as soon as the algorithm was changed or rearranged and applied to the problem with the same data, the computed result became very satisfactory. Thus, we are talking about two different types of algorithms for a given problem. The algorithms of the first type, giving inaccurate results, are examples of unstable algorithms, while the ones of the second type, giving satisfactory results, are stable algorithms. The study of stability is very important. This is done by means of roundoff error analysis. There are two types: backward error analysis and forward error analysis. In forward analysis an attempt is made to see how the computed solution obtained by the algorithm differs from the exact solution based on the same data.

Definition 3.2.1 An algorithm will be called forward stable if the computed solution $\hat{x}$ is close to the exact solution, $x$, in some sense.

The roundoff error bounds obtained in Chapter 2 for various matrix operations are the result of forward error analyses. On the other hand, backward analysis relates the error to the data of the problem rather than to the problem's solution. Thus we define backward stability as follows:

Definition 3.2.2 An algorithm is called backward stable if it produces an exact solution to a nearby problem.

Backward error analysis, introduced in the literature by J. H. Wilkinson, is nowadays widely used in matrix computations and using this analysis, the stability (or instability) of many algorithms in numerical linear algebra has been established in recent years. In this book, by "stability" we will imply "backward stability", unless otherwise stated.
James H. Wilkinson, a British mathematician, is well known for his pioneering work on backward error analysis for matrix computations. He was affiliated with the National Physical Laboratory in Britain, and held visiting appointments at Argonne National Laboratory, Stanford University, etc. Wilkinson died an untimely death in 1986. A fellowship in his name has since been established at Argonne National Laboratory. Wilkinson's book The Algebraic Eigenvalue Problem is an extremely important and very useful book for any numerical analyst.
As a simple example of backward stability, consider the case of computing the sum of two floating point numbers $x$ and $y$. We have seen before that
\[ \mathrm{fl}(x + y) = (x + y)(1 + \delta) = x(1 + \delta) + y(1 + \delta) = x' + y'. \]
Thus, the computed sum of two floating point numbers $x$ and $y$ is the exact sum of two other numbers $x'$ and $y'$. Since
\[ |\delta| \le \mu, \]
both $x'$ and $y'$ are close to $x$ and $y$, respectively. Thus we conclude that the operation of adding two floating point numbers is backward stable. Similar statements, of course, hold for other floating point arithmetic operations. For yet another type of example, consider the problem of solving the linear system $Ax = b$:
Definition 3.2.3 An algorithm for solving $Ax = b$ will be called stable if the computed solution $\hat{x}$ is such that
\[ (A + E)\hat{x} = b + \delta b \]
with $E$ and $\delta b$ small.
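Returning to the floating point addition example, the quantity $\delta$ can be computed exactly in Python, because every binary floating point number is a rational number. The sketch below assumes IEEE double precision with round-to-nearest, for which the unit roundoff is $2^{-53}$; that value and the function name are assumptions of this illustration, not statements from the text. It also assumes $x + y \ne 0$.

```python
from fractions import Fraction

def addition_backward_error(x, y):
    """Return delta with fl(x + y) = (x + y)(1 + delta), computed exactly."""
    computed = Fraction(x + y)               # fl(x + y), converted exactly
    exact = Fraction(x) + Fraction(y)        # exact sum of the two stored numbers
    return (computed - exact) / exact

x, y = 0.1, 0.2
delta = addition_backward_error(x, y)
x_prime = Fraction(x) * (1 + delta)          # perturbed data x'
y_prime = Fraction(y) * (1 + delta)          # perturbed data y'

print(float(delta))                          # tiny relative perturbation
print(x_prime + y_prime == Fraction(x + y))  # True: exact sum of nearby data
```

The computed sum is therefore the exact sum of data perturbed by a relative amount no larger than the unit roundoff, which is the backward stability statement in concrete form.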
How Do We Measure Smallness?
The "smallness" of a matrix or a vector is measured either by looking into its entries or by computing its norm.

Normwise vs. Entrywise Errors
While measuring errors in computations using norms is traditional in matrix computations, componentwise measure of errors is becoming increasingly important. It really does make more sense. An $n \times n$ matrix $A$ has $n^2$ entries, but the norm of $A$ is a single number. Thus the smallness or largeness of the norm of an error matrix $E$ does not truly reflect the smallness or largeness of the individual entries of $E$. For example, if $E = (10, .00001, 1)^T$, then $\|E\| = 10.0499$. Thus the small entry .00001 was not reflected in the norm measure.
Examples of Stable and Unstable Algorithms by Backward Error Analysis

Example 3.2.1 A Stable Algorithm: Solution of an Upper Triangular System by Back Substitution
Consider Algorithm 3.1.3 (the back substitution method). Suppose the algorithm is implemented using accumulation of inner product in double precision. Then it can be shown (see Chapter 11) that the computed solution $\hat{x}$ satisfies
\[ (T + E)\hat{x} = b, \]
where the entries of the error matrix $E$ are quite small. In fact, if $E = (e_{ij})$ and $T = (t_{ij})$, then
\[ |e_{ij}| \le |t_{ij}|\,10^{-t}, \quad i, j = 1, \ldots, n, \]
showing that the error can be even smaller than the error made in rounding the entries of $T$. Thus, the back substitution process for solving an upper triangular system is stable.
Example 3.2.2 An Unstable Algorithm: Gaussian Elimination Without Pivoting
Consider the problem of solving the nonsingular linear system $Ax = b$ using Gaussian elimination (Algorithm 3.1.5). It has been shown by Wilkinson (see Chapter 11 of this book) that, when the process does not break down, the computed solution $\hat{x}$ satisfies
\[ (A + E)\hat{x} = b \]
with
\[ \|E\|_\infty \le c\,n^3 \rho\,\|A\|_\infty\,\mu + O(\mu^2), \]
where
\[ A^{(k)} = (a_{ij}^{(k)}) \]
are the reduced matrices in the elimination process and $\rho$, known as the growth factor, is given by
\[ \rho = \frac{\max_k \max_{i,j} |a_{ij}^{(k)}|}{\max_{i,j} |a_{ij}|}. \]
More specifically, if $\alpha = \max_{i,j} |a_{ij}|$ and $\alpha_k = \max_{i,j} |a_{ij}^{(k)}|$, then the growth factor is given by
\[ \rho = \frac{\max(\alpha, \alpha_1, \ldots, \alpha_{n-1})}{\alpha}. \]
Now for an arbitrary matrix $A$, $\rho$ can be quite large, because the entries of the reduced matrices $A^{(k)}$ can grow arbitrarily. To see this, consider the simple matrix
\[ A = \begin{pmatrix} 10^{-10} & 1 \\ 1 & 2 \end{pmatrix}. \]
One step of Gaussian elimination using 9 decimal digit floating point arithmetic will yield the reduced matrix
\[ A^{(1)} = \begin{pmatrix} 10^{-10} & 1 \\ 0 & 2 - 10^{10} \end{pmatrix} = \begin{pmatrix} 10^{-10} & 1 \\ 0 & -10^{10} \end{pmatrix}. \]
The growth factor for this problem is then
\[ \rho = \frac{\max(\alpha, \alpha_1)}{\alpha} = \frac{\max(2, 10^{10})}{2} = \frac{10^{10}}{2}, \]
which is quite large. Thus, if we now proceed to solve a linear system with this reduced upper triangular matrix, we cannot expect a small error matrix $E$. Indeed, if we wish to solve
\[ 10^{-10}x_1 + x_2 = 1, \qquad x_1 + 2x_2 = 3 \]
using the above $A^{(1)}$, then the computed solution will be $x_1 = 0$, $x_2 = 1$, whereas the exact solution is $x_1 \approx x_2 \approx 1$. This shows that Gaussian elimination is unstable for an arbitrary linear system. We shall discuss this special system in Chapter 6 in some detail.
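The 9-digit computation above can be imitated with Python's `decimal` module, which rounds every operation to a chosen number of significant decimal digits. The sketch below is an illustration under that assumption (the rounding rule of the machine in the text is not specified, and `decimal` rounds to nearest by default); the function name is hypothetical. The second call interchanges the two equations first, which is the idea behind the pivoting strategies described later in the book.

```python
from decimal import Decimal, localcontext

def solve_2x2_no_pivot(A, b, digits=9):
    """Gaussian elimination without pivoting in `digits`-digit decimal arithmetic."""
    with localcontext() as ctx:
        ctx.prec = digits
        (a11, a12), (a21, a22) = [[Decimal(v) for v in row] for row in A]
        b1, b2 = (Decimal(v) for v in b)
        m = -a21 / a11                 # multiplier m_21
        a22_1 = a22 + m * a12          # reduced entry a_22^(1)
        b2_1 = b2 + m * b1
        x2 = b2_1 / a22_1              # back substitution
        x1 = (b1 - a12 * x2) / a11
        return x1, x2, a22_1

x1, x2, a22_1 = solve_2x2_no_pivot([["1e-10", "1"], ["1", "2"]], ["1", "3"])
print(x1, x2)                # x1 is 0, x2 is 1: x1 is completely wrong
print(abs(a22_1) / 2)        # growth factor, about 5e9

y1, y2, _ = solve_2x2_no_pivot([["1", "2"], ["1e-10", "1"]], ["3", "1"])
print(y1, y2)                # both close to 1 after interchanging the rows
```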
Note: Gaussian elimination without pivoting is not unstable for all matrices. There are certain classes of matrices, such as symmetric positive definite matrices, for which Gaussian elimination is stable.

If an algorithm is stable for a given matrix $A$, then one would like to see that the algorithm is stable for every matrix $A$ in a given class. Thus, we may give a formal definition of stability as follows:

Definition 3.2.4 An algorithm is stable for a class of matrices $C$ if for every matrix $A$ in $C$, the computed solution by the algorithm is the exact solution of a nearby problem.

Thus, for the linear system problem
\[ Ax = b, \]
an algorithm is stable for a class of matrices $C$ if for every $A \in C$ and for each $b$, it produces a computed solution $\hat{x}$ that satisfies
\[ (A + E)\hat{x} = b + \delta b \]
for some $E$ and $\delta b$, where $(A + E)$ is close to $A$ and $b + \delta b$ is close to $b$.
3.3 Conditioning of the Problem and Perturbation Analysis
From the preceding discussion we should not form the opinion that if a stable algorithm is used to solve a problem then the computed solution will be accurate. A property of the problem called conditioning also contributes to the accuracy or inaccuracy of the computed result. The conditioning of a problem is a property of the problem itself. It is concerned with how the solution of the problem will change if the input data contains some impurities. This concern arises from the fact that in practical applications very often the data come from some experimental observations where the measurements can be subjected to disturbances (or "noise") in the data. There are other sources of error also, for example, roundoff errors (discussed in Chapter 11), discretization errors, etc. Thus, when a numerical analyst has a problem in hand to solve, he or she must frequently solve the problem not with the original data, but with data that has been perturbed. The question naturally arises: What effects do these perturbations have on the solution?

A theoretical study done by numerical analysts to investigate these effects, which is independent of the particular algorithm used to solve the problem, is called perturbation analysis. This study helps one detect whether a given problem is "bad" or "good" in the sense of whether small perturbations in the data will create a large or small change in the solution. Specifically we define:

Definition 3.3.1 A problem (with respect to a given set of data) is called an ill-conditioned or badly-conditioned problem if a small relative error in data causes a large relative error in the computed solution, regardless of the method of solution. Otherwise, it is called well-conditioned.

Suppose a problem $P$ is to be solved with an input $c$. Let $P(c)$ denote the computed value of the problem with the input $c$. Let $\delta c$ denote the perturbation in $c$. Then $P$ will be said to be ill-conditioned for the input data $c$ if the relative error in the answer,
\[ \frac{|P(c + \delta c) - P(c)|}{|P(c)|}, \]
is much larger than the relative error in the data,
\[ \frac{|\delta c|}{|c|}. \]

Note: The definition of conditioning is data-dependent. Thus, a problem which is ill-conditioned for one set of data could be well-conditioned for another set.
3.4 Conditioning of the Problem, Stability of the Algorithm, and Accuracy of the Solution
As stated in the previous section, the conditioning of a problem is a property of the problem itself, and has nothing to do with the algorithm used to solve the problem. To a user, of course, the accuracy of the computed solution is of primary importance. However, the accuracy of a computed solution by a given algorithm is directly connected with both the stability of the algorithm and the conditioning of the problem. If the problem is ill-conditioned, no matter how stable the algorithm is, the accuracy of the computed solution cannot be guaranteed.

Backward Stability and Accuracy
Note that the definition of backward stability does not say that the computed solution $\hat{x}$ by a backward stable algorithm will be close to the exact solution of the original problem. However, when a stable algorithm is applied to a well-conditioned problem, the computed solution should be near the exact solution.

The ill-conditioning of a problem contaminates the computed solution, even with the use of a stable algorithm, therefore yielding an unacceptable solution. When a computed solution is unsatisfactory, some users (who are not usually concerned with conditioning) tend to put the blame on the algorithm for the inaccuracy. To be fair, we should test an algorithm for stability only on well-conditioned matrices. If the algorithm passes the test of stability on well-conditioned matrices, then it should be declared a stable algorithm. However, if a "stable" algorithm is applied to an ill-conditioned problem, it should not introduce more error than what the data warrants. From the previous discussion, it is quite clear now that investigating the conditioning of a problem is very important.
The Condition Number of a Problem
Numerical analysts usually try to associate a number called the condition number with a problem. The condition number indicates whether the problem is ill- or well-conditioned. More specifically, the condition number gives a bound for the relative error in the solution when a small perturbation is applied to the input data. In numerical linear algebra condition numbers for many (but not all) problems have been identified. Unfortunately, computing the condition number is often more involved and time consuming than solving the problem itself. For example (as we shall see in Chapter 6), for the linear system problem $Ax = b$, the condition number is
\[ \mathrm{Cond}(A) = \|A\|\,\|A^{-1}\|. \]
Thus, computing the condition number in this case involves computing the inverse of $A$; it is more expensive to compute the inverse than to solve the system $Ax = b$. In Chapter 6 we shall discuss methods for estimating $\mathrm{Cond}(A)$ without explicitly computing $A^{-1}$. We shall discuss conditioning of each problem in detail in the relevant chapter. Before closing this section, however, let's mention several well-known examples of ill-conditioned problems.
An Ill-Conditioned Subtraction
Consider the subtraction $c = a - b$:
\[ a = 12354101, \qquad b = 12345678, \qquad c = a - b = 8423. \]
Now perturb $a$ in the sixth place:
\[ \hat{a} = 12354001, \qquad \hat{c} = c + \delta c = \hat{a} - b = 8323. \]
Thus, a perturbation in the sixth digit in the input value caused a change in the second digit in the answer. Note that the relative error in the data is
\[ \frac{a - \hat{a}}{a} = .000008, \]
while the relative error in the computed result is
\[ \frac{c - \hat{c}}{c} = .01187225. \]

An Ill-Conditioned Root-Finding Problem
Consider solving the simple quadratic equation
\[ f(x) = x^2 - 2x + 1 = 0. \]
The roots are $x = 1, 1$. Now perturb the coefficient 2 by 0.00001. The computed roots of the perturbed polynomial $\hat{f}(x) = x^2 - 2.00001x + 1$ are $x_1 = 1.0032$ and $x_2 = .9968$. The relative errors in $x_1$ and $x_2$ are .0032. The relative error in the data is $5 \times 10^{-6}$.
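Both small examples can be checked with a few lines of Python. This is only an arithmetic sketch: the variable names and the helper `quadratic_roots` are illustrative, and the helper assumes a monic quadratic with real roots.

```python
import math

# Ill-conditioned subtraction
a, b = 12354101, 12345678
a_hat = 12354001
c, c_hat = a - b, a_hat - b
rel_data = (a - a_hat) / a          # relative error in the data
rel_result = (c - c_hat) / c        # relative error in the result
print(c, c_hat)                     # 8423 8323
print(rel_data, rel_result)         # about 8.1e-06 and 0.0119
print(rel_result / rel_data)        # amplification, about 1467

# Ill-conditioned root finding
def quadratic_roots(p, q):
    """Real roots of x^2 - p*x + q = 0 by the quadratic formula."""
    d = math.sqrt(p * p - 4 * q)
    return (p + d) / 2, (p - d) / 2

x1, x2 = quadratic_roots(2.00001, 1.0)
print(x1, x2)                       # about 1.0032 and 0.9968
```

In the subtraction, the amplification factor is essentially $a/c$: the closer $a$ and $b$ are, the more a fixed relative perturbation of $a$ is magnified in $c$.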
3.5 The Wilkinson Polynomial
The above example involved multiple roots. Multiple roots or roots close to each other invariably make the root-finding problem ill-conditioned; however, the problem can be ill-conditioned even when the roots are very well separated. Consider the following well-known example by Wilkinson (see also Forsythe, Malcolm and Moler CMMC, pp. 18-19).
\[ P(x) = (x-1)(x-2)\cdots(x-20) = x^{20} - 210x^{19} + \cdots \]
The zeros of $P(x)$ are $1, 2, \ldots, 20$ and are distinct. Now perturb the coefficient of $x^{19}$ from $-210$ to $-210 + 2^{-20}$, leaving other coefficients unchanged. Wilkinson used a binary computer with $t = 30$. Therefore, this change signified a change in the 30th significant base 2 digit. The roots of the perturbed polynomial, carefully computed by Wilkinson, were found to be (reproduced from CMMC, p. 18):

     1.000000000        10.095266145 ± 0.643500904i
     2.000000000        11.793633881 ± 1.652329728i
     3.000000000        13.992358137 ± 2.518830070i
     4.000000000        16.730737466 ± 2.812624894i
     4.999999928        19.502439400 ± 1.940330347i
     6.000006944
     6.999697234
     8.007267603
     8.917250249
    20.846908101

The table shows that certain zeros are more sensitive to the perturbation than are others. The following analysis, due to Wilkinson (see also Forsythe, Malcolm and Moler CMMC, p. 19), attempts to explain this phenomenon. Let the perturbed polynomial be
\[ P(x, \alpha) = x^{20} - \alpha x^{19} + \cdots \]
Then the (condition) number
\[ \left.\frac{\partial x}{\partial \alpha}\right|_{x=i} \]
measures the sensitivity of the root $x = i$, $i = 1, 2, \ldots, 20$. To compute this number, differentiate the equation $P(x, \alpha) = 0$ with respect to $\alpha$:
\[ \frac{\partial x}{\partial \alpha} = -\frac{\partial P / \partial \alpha}{\partial P / \partial x} = \frac{x^{19}}{\sum_{i=1}^{20} \prod_{j=1,\, j \ne i}^{20} (x - j)}. \]
The values of $\left.\dfrac{\partial x}{\partial \alpha}\right|_{x=i}$ for $i = 1, \ldots, 20$ are listed below. (See also CMMC, p. 19.)

    Root   dx/dα at x = i          Root   dx/dα at x = i
      1    -8.2 × 10^{-18}          11    -4.6 × 10^{7}
      2     8.2 × 10^{-11}          12     2.0 × 10^{8}
      3    -1.6 × 10^{-6}           13    -6.1 × 10^{8}
      4     2.2 × 10^{-3}           14     1.3 × 10^{9}
      5    -6.1 × 10^{-1}           15    -2.1 × 10^{9}
      6     5.8 × 10^{1}            16     2.4 × 10^{9}
      7    -2.5 × 10^{3}            17    -1.9 × 10^{9}
      8     6.0 × 10^{4}            18     1.0 × 10^{9}
      9    -8.3 × 10^{5}            19    -3.1 × 10^{8}
     10     7.6 × 10^{6}            20     4.3 × 10^{7}
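The sensitivities in the table can be recomputed from the formula above. At the root $x = i$ only one term of the sum in the denominator survives, so the value reduces to $i^{19} / \prod_{j \ne i}(i - j)$. The Python sketch below uses exact integer arithmetic; the function name is an illustrative choice.

```python
from fractions import Fraction
from math import prod

def root_sensitivity(i, n=20):
    """dx/d(alpha) at the root x = i of P(x, alpha) = x^n - alpha*x^(n-1) + ..."""
    # P'(i) = product of (i - j) over all j != i; the other terms of the sum vanish.
    denom = prod(i - j for j in range(1, n + 1) if j != i)
    return Fraction(i ** (n - 1), denom)

for i in range(1, 21):
    print(f"{i:2d}  {float(root_sensitivity(i)): .1e}")
```

The printed values alternate in sign and peak in magnitude near $i = 16$, in agreement with the table.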
Root-finding and Eigenvalue Computation
The above examples teach us a very useful lesson: it is not a good idea to compute the eigenvalues of a matrix by explicitly finding the coefficients of the characteristic polynomial and evaluating its zeros, since the roundoff errors in computations will invariably put some small perturbations in the computed coefficients of the characteristic polynomial, and these small perturbations in the coefficients may cause large changes in the zeros. The eigenvalues will then be computed inaccurately.
3.6 An Ill-conditioned Linear System Problem
The matrix
\[ H = \begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{3} & \cdots & \frac{1}{n} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \cdots & \frac{1}{n+1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{1}{n} & \frac{1}{n+1} & \frac{1}{n+2} & \cdots & \frac{1}{2n-1} \end{pmatrix} \]
is called the Hilbert matrix after the celebrated mathematician David Hilbert. The linear system problem, even with a Hilbert matrix of moderate order, is extremely ill-conditioned. For example, take $n = 5$ and consider solving
\[ Hx = b, \]
where $b \approx (2.2833, 1.4500, 1.0929, 0.8845, 0.7456)^T$. The exact solution is $x = (1, 1, 1, 1, 1)^T$. Now perturb the $(5,1)$th element of $H$ in the fifth place to obtain .20001. The computed solution with this very slightly perturbed matrix is $(0.9937, 1.2857, -0.2855, 2.9997, 0.0001)^T$. Note that $\mathrm{Cond}(H) = O(10^5)$. For more examples of ill-conditioned linear system problems see Chapter 6.
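The size of the condition number and the effect of the perturbation can both be examined exactly with rational arithmetic. The Python sketch below makes several assumptions that are not in the text: it uses the infinity norm for $\mathrm{Cond}(H)$, it takes $b$ to be the exact row sums of $H$ rather than the four-decimal values quoted above, and all function names are illustrative. Its perturbed solution therefore need not coincide with the computed solution quoted above, although the sensitivity is of the same kind.

```python
from fractions import Fraction

def hilbert(n):
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]

def solve(A, b):
    """Solve Ax = b exactly by Gauss-Jordan elimination over the rationals."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for k in range(n):
        p = next(r for r in range(k, n) if M[r][k] != 0)   # first nonzero pivot
        M[k], M[p] = M[p], M[k]
        pivot = M[k][k]
        M[k] = [v / pivot for v in M[k]]
        for i in range(n):
            if i != k:
                f = M[i][k]
                M[i] = [vi - f * vk for vi, vk in zip(M[i], M[k])]
    return [M[i][n] for i in range(n)]

def inverse(A):
    n = len(A)
    cols = [solve(A, [Fraction(int(i == j)) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def norm_inf(A):
    return max(sum(abs(v) for v in row) for row in A)

H = hilbert(5)
cond = norm_inf(H) * norm_inf(inverse(H))
print(float(cond))                      # 943656.0, i.e. of order 10^5

b = [sum(row) for row in H]             # exact right-hand side for x = (1, ..., 1)
H_pert = [row[:] for row in H]
H_pert[4][0] = Fraction(20001, 100000)  # perturb the (5,1) entry to .20001
print([round(float(v), 4) for v in solve(H_pert, b)])
```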
3.7 Examples of Ill-conditioned Eigenvalue Problems
Example 3.7.1
Consider the $10 \times 10$ matrix:
\[ A = \begin{pmatrix} 1 & 1 & & & 0 \\ & 1 & 1 & & \\ & & \ddots & \ddots & \\ & & & \ddots & 1 \\ 0 & & & & 1 \end{pmatrix}. \]
The eigenvalues of $A$ are all 1. Now perturb the $(10,1)$ coefficient of $A$ by a small quantity $\epsilon = 10^{-10}$. Then the eigenvalues of the perturbed matrix, computed using the software MATLAB to be described in the next chapter (which uses a numerically effective eigenvalue-computation algorithm), were found to be:

    1.0184 + 0.0980i
    0.9506 + 0.0876i
    1.0764 + 0.0632i
    0.9051 + 0.0350i
    1.0999 + 0.00i
    1.0764 - 0.0632i
    0.9051 - 0.0350i
    1.0184 - 0.0980i
    0.9506 - 0.0876i

(Note the change in the eigenvalues.)
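The size of the change can also be understood without an eigenvalue routine. Expanding the determinant of the perturbed matrix along its first column gives the characteristic equation $(1 - \lambda)^{10} - \epsilon = 0$, so the exact eigenvalues lie on the circle $|\lambda - 1| = \epsilon^{1/10} = 0.1$. This expansion is a supplementary derivation, not taken from the text, and the Python sketch below checks it numerically with a small determinant routine written for the purpose (all names are illustrative).

```python
import cmath

N, EPS = 10, 1e-10

def shifted_matrix(lam):
    """A_eps - lam*I for the 10x10 matrix above with eps in position (10,1)."""
    B = [[0j] * N for _ in range(N)]
    for i in range(N):
        B[i][i] = 1 - lam
        if i + 1 < N:
            B[i][i + 1] = 1
    B[N - 1][0] = EPS
    return B

def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n, d = len(M), 1
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        if M[p][k] == 0:
            return 0
        if p != k:
            M[k], M[p] = M[p], M[k]
            d = -d
        d *= M[k][k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= m * M[k][j]
    return d

radius = EPS ** (1 / N)                                   # 0.1
eigs = [1 + radius * cmath.exp(2j * cmath.pi * k / N) for k in range(N)]
for lam in eigs:
    print(f"{lam:.4f}  |det| = {abs(det(shifted_matrix(lam))):.1e}")
```

A perturbation of size $10^{-10}$ thus moves every eigenvalue by $10^{-1}$. The values quoted from MATLAB differ slightly from these because the eigenvalue computation itself is subject to the same sensitivity.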
Example 3.7.2 The Wilkinson Bidiagonal Matrix
Again, it should not be thought that an eigenvalue problem can be ill-conditioned only when the eigenvalues are multiple or are close to each other. An eigenvalue problem with well-separated eigenvalues can be very ill-conditioned too. Consider the $20 \times 20$ triangular matrix (known as the Wilkinson bidiagonal matrix):
\[ A = \begin{pmatrix} 20 & 20 & & & 0 \\ & 19 & 20 & & \\ & & \ddots & \ddots & \\ & & & \ddots & 20 \\ 0 & & & & 1 \end{pmatrix}. \]
The eigenvalues of $A$ are $1, 2, \ldots, 20$. Now perturb the $(20,1)$ entry of $A$ by $\epsilon = 10^{-10}$. If the eigenvalues of this slightly perturbed matrix are computed using a stable algorithm (such as the QR iteration method to be described in Chapter 8), it will be seen that some of them will change drastically; they will even become complex. In this case also, certain eigenvalues are more ill-conditioned than others. To explain this, Wilkinson computed the condition number of each of the eigenvalues. The condition number of the eigenvalue $\lambda_i$ of $A$ is defined to be (see Chapter 8):
\[ \mathrm{Cond}(\lambda_i) = \frac{1}{|y_i^T x_i|}, \]
where $y_i$ and $x_i$ are, respectively, the normalized left and right eigenvectors of $A$ corresponding to the eigenvalue $\lambda_i$ (recall that $x$ is a right eigenvector of $A$ associated with an eigenvalue $\lambda$ if $Ax = \lambda x$, $x \ne 0$; similarly, $y$ is a left eigenvector associated with $\lambda$ if $y^T A = \lambda y^T$). In our case, the right eigenvector $x_r$ corresponding to $\lambda_r = r$ has components (see Wilkinson AEP, p. 91):
\[ 1,\ \frac{20-r}{-20},\ \frac{(20-r)(19-r)}{(-20)^2},\ \ldots,\ \frac{(20-r)!}{(-20)^{20-r}},\ 0,\ \ldots,\ 0, \]
while the components of $y_r$ are
\[ 0,\ 0,\ \ldots,\ 0,\ \frac{(r-1)!}{20^{r-1}},\ \ldots,\ \frac{(r-1)(r-2)}{20^2},\ \frac{r-1}{20},\ 1. \]
These vectors are not quite normalized, but still, the reciprocal of their products gives us an estimate of the condition numbers. In fact, $K_r$, the condition number of the eigenvalue $\lambda = r$, is
\[ K_r = \frac{1}{y_r^T x_r} = \frac{(-1)^r\,20^{19}}{(20-r)!\,(r-1)!}. \]
The number $K_r$ is large for all values of $r$. The smallest $K_i$ for the Wilkinson matrix are $K_1 = K_{20} \approx 4.31 \times 10^7$ and the largest ones are $K_{11} = K_{10} \approx 3.98 \times 10^{12}$.
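The quoted values of $K_r$ follow directly from the closed form $20^{19} / ((20-r)!\,(r-1)!)$. The Python sketch below evaluates its magnitude for $r = 1, \ldots, 20$; the function name is illustrative and the sign factor $(-1)^r$ is dropped, since only the size matters here.

```python
from math import factorial

def wilkinson_condition_number(r):
    """|K_r| = 20^19 / ((20 - r)! (r - 1)!) for the 20x20 Wilkinson bidiagonal matrix."""
    return 20 ** 19 / (factorial(20 - r) * factorial(r - 1))

K = {r: wilkinson_condition_number(r) for r in range(1, 21)}
for r in (1, 10, 11, 20):
    print(r, f"{K[r]:.2e}")
# 1 4.31e+07
# 10 3.98e+12
# 11 3.98e+12
# 20 4.31e+07
```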
Example 3.7.3 (Wilkinson AEP, p. 92)
\[ A = \begin{pmatrix}
n & n-1 & n-2 & \cdots & 3 & 2 & 1 \\
n-1 & n-1 & n-2 & \cdots & 3 & 2 & 1 \\
0 & n-2 & n-2 & \cdots & 3 & 2 & 1 \\
\vdots & \ddots & \ddots & \ddots & \vdots & \vdots & \vdots \\
\vdots & & \ddots & \ddots & \ddots & 2 & 1 \\
\vdots & & & \ddots & 2 & 2 & 1 \\
0 & \cdots & \cdots & \cdots & 0 & 1 & 1
\end{pmatrix} \]
As $n$ increases, the smallest eigenvalues become progressively ill-conditioned. For example, when $n = 12$, the condition numbers of the first few eigenvalues are of order unity while those of the last three are of order $10^7$.
3.8 Strong, Weak and Mild Stability
While establishing the stability of an algorithm by backward error analysis, we sometimes get much more than the above definition of stability calls for. For example, it can be shown that when Gaussian elimination with partial or complete pivoting (for discussions on partial and complete pivoting, see Chapter 5) is applied to solve a nonsingular system $Ax = b$, the computed solution $\hat{x}$ not only satisfies $(A + E)\hat{x} = b$ with a small error matrix $E$, but $A + E$ is also nonsingular. The standard definition of stability, of course, does not require that. On the other hand, if Gaussian elimination without pivoting is applied to a symmetric positive definite system, the computed solution $\hat{x}$ satisfies $(A + E)\hat{x} = b$ with a small error matrix $E$, but the standard backward error analysis does not show that $A + E$ is symmetric positive definite. Thus, we may talk about two types of stable algorithms: one type giving not only a small error matrix, but also a perturbed matrix $A + E$ belonging to the same class as the matrix $A$ itself, and the other type just giving a small error matrix without any restriction on $A + E$. To distinguish between these two types of stability, Bunch (1987) has introduced in the literature the concept of strong stability. Following Bunch, we define:

Definition 3.8.1 An algorithm for solving the linear system problem $Ax = b$ is strongly stable for a class of matrices $C$ if, for each $A$ in $C$, the computed solution is the exact solution of a nearby problem, and the matrix $A + E$ also belongs to $C$.
Examples (Bunch).

1. Gaussian elimination with pivoting is strongly stable on the class of nonsingular matrices.
2. The Cholesky algorithm for computing the factorization of a symmetric positive definite matrix of the form $A = HH^T$ is strongly stable on the class of symmetric positive definite matrices. (See Chapter 6 for a description of the Cholesky algorithm.)
3. Gaussian elimination without pivoting is strongly stable on the class of nonsingular diagonally dominant matrices.

James R. Bunch is a professor of mathematics at the University of California at San Diego. He is well known for his work on efficient factorization of symmetric matrices (popularly known as the Bunch-Kaufman and the Bunch-Parlett factorization procedures), and his work on stability and conditioning.

By analogy with this definition of strong stability, Bunch also introduced the concept of weak stability, which depends upon the conditioning of the problem.
Definition 3.8.2 An algorithm is weakly stable for a class of matrices $C$ if for each well-conditioned matrix in $C$ the algorithm produces an acceptably accurate solution.

Thus, an algorithm for solving the linear system $Ax = b$ is weakly stable for a class of matrices $C$ if for each well-conditioned matrix $A$ in $C$ and for each $b$, the computed solution $\hat{x}$ to $Ax = b$ is such that

$$\frac{\|x - \hat{x}\|}{\|x\|} \quad \text{is small.}$$

Bunch was motivated to introduce this definition to point out that the well-known (and frequently used by engineers) Levinson algorithm for solving linear systems involving Toeplitz matrices (a matrix $T = (t_{ij})$ is Toeplitz if the entries along each diagonal are the same) is weakly stable on the class of symmetric positive definite Toeplitz matrices. This very important and remarkable result was proved by Cybenko (1980). The result was important because the signal processing community had been using the Levinson algorithm routinely for years, without fully investigating the stability behavior of this important algorithm.
Remarks on Stability, Strong Stability and Weak Stability
1. If an algorithm is strongly stable, it is necessarily stable.
2. Note that stability implies weak stability. Weak stability is good enough for most users.
3. If a numerical analyst can prove that a certain algorithm is not weakly stable, then it follows that the algorithm is not stable, because "not weakly stable" implies "not stable".
George Cybenko is a professor of electrical engineering and computer science at Dartmouth College. He has made substantial contributions in numerical linear algebra and signal processing.

Mild Stability: We have defined an algorithm to be backward stable if the algorithm produces a solution that is an exact solution of a nearby problem. But it might very well happen that an algorithm produces a solution that is only close to the exact solution of a nearby problem. What should we then call such an algorithm? Van Dooren, following de Jong (1977), has called such an algorithm a mixed stable algorithm, and Stewart (IMC, 1973) has defined such an algorithm as just a stable algorithm, under the additional restriction that the data of the nearby problem and the original data belong to the same class. We believe that it is more appropriate to call such stability mild stability. After all, such an algorithm is stable in the mild sense. We thus define:
Definition 3.8.3 An algorithm is mildly stable if it produces a solution that is close to the exact solution of a nearby problem.
Example 3.8.1
1. The QR algorithm for the rank-deficient least squares problem is mildly stable (see Lawson and Hanson SLP, p. 95 and Chapter 7 of this book).
2. The QR algorithm for the full-rank underdetermined least squares problem is mildly stable (see Lawson and Hanson SLP, p. 93 and Chapter 7 of this book).
3.9 Review and Summary
In this chapter we have introduced two of the most important concepts in numerical linear algebra, namely, the conditioning of the problem and the stability of the algorithm, and have discussed how they affect the accuracy of the solution.

1. Conditioning of the Problem: The conditioning of the problem is a property of the problem. A problem is said to be ill-conditioned if a small change in the data causes a large change in the solution; otherwise it is well-conditioned. The conditioning of a problem is data dependent. A problem can be ill-conditioned with respect to one set of data while it may be quite well-conditioned with respect to another set. Ill-conditioning or well-conditioning of a matrix problem is generally measured by means of a number called the condition number. The condition number for the linear system problem $Ax = b$ is $\|A\|\,\|A^{-1}\|$. Well-known examples of ill-conditioned problems are: the Wilkinson polynomial for the root-finding problem, the Wilkinson bidiagonal matrix for the eigenvalue problem, the Hilbert matrix for the algebraic linear system problem, etc.
Paul Van Dooren is a professor of electrical engineering at University of Illinois at Urbana-Champaign. He has received several prestigious awards and fellowships including the Householder award and the Wilkinson fellowship for his important contributions to numerical linear algebra which turned out to be extremely valuable for solving computational problems arising in control and systems theory and signal processing.
2. Stability of an Algorithm: An algorithm is said to be a backward stable algorithm if it computes the exact solution of a nearby problem. Some examples of stable algorithms are: backward substitution and forward elimination for triangular systems, Gaussian elimination with pivoting for linear systems, QR factorization using Householder and Givens transformations, the QR iteration algorithm for eigenvalue computations, etc. The Gaussian elimination algorithm without row interchanges is unstable for arbitrary matrices.

3. Effects of conditioning and stability on the accuracy of the solution: The conditioning of the problem and the stability of the algorithm both affect the accuracy of the solution computed by the algorithm. If a stable algorithm is applied to a well-conditioned problem, it should compute an accurate solution. On the other hand, if a stable algorithm is applied to an ill-conditioned problem, there is no guarantee that the computed solution will be accurate. The definition of backward stability does not imply that. However, if a stable algorithm is applied to an ill-conditioned problem, it should not introduce more errors than what the data warrants.
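The remark that Gaussian elimination without row interchanges is unstable can be seen on a $2 \times 2$ system. The sketch below is an illustration only: the function name `solve_2x2` and the test system (pivot $10^{-20}$, chosen so that the effect is visible in IEEE double precision) are our assumptions, not data from the text. The exact solution of the system is very close to $(1, 1)^T$.

```python
def solve_2x2(a, b, pivot):
    """Solve a 2x2 system by Gaussian elimination.

    If pivot is True, the rows are interchanged when the second row has
    the larger entry (in magnitude) in the first column.
    """
    a = [row[:] for row in a]  # work on copies, leave the inputs intact
    b = b[:]
    if pivot and abs(a[1][0]) > abs(a[0][0]):
        a[0], a[1] = a[1], a[0]
        b[0], b[1] = b[1], b[0]
    m = a[1][0] / a[0][0]          # multiplier
    a[1][1] -= m * a[0][1]         # eliminate the (2,1) entry
    b[1] -= m * b[0]
    x2 = b[1] / a[1][1]            # back substitution
    x1 = (b[0] - a[0][1] * x2) / a[0][0]
    return [x1, x2]


if __name__ == "__main__":
    A = [[1e-20, 1.0], [1.0, 1.0]]
    rhs = [1.0, 2.0]
    print("without pivoting:", solve_2x2(A, rhs, pivot=False))
    print("with pivoting:   ", solve_2x2(A, rhs, pivot=True))
```

Without the interchange the huge multiplier wipes out the information in the second equation and the first component is lost entirely; with the interchange both components are recovered.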
3.10 Suggestions for Further Reading
For computer codes of standard matrix computations, see the book Handbook for Matrix Computations by T. Coleman and Charles Van Loan, SIAM, 1988. The concepts of stability and conditioning have been very thoroughly discussed in the book An Introduction to Matrix Computations by G. W. Stewart, Academic Press, New York, 1973 (Chapter 2). Note that Stewart's definition of backward stability is slightly different from the usual definition of backward stability introduced by Wilkinson. We also strongly suggest that readers read an illuminating paper by James R. Bunch in the area: The Weak and Strong Stability of Algorithms in Numerical Linear Algebra, Lin. Alg. Appl. (1987), volume 88/89, 49-66. Wilkinson's AEP is a rich source of knowledge for results on backward stability for matrix algorithms. An important paper discussing notions and concepts of different types of stability in general is a paper by L. S. de Jong (1977).
Exercises on Chapter 3 Note: Use MATLAB (see Chapter 4 and the Appendix), whenever appropriate.
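1. (a) Show that the floating point computations of the sum, product and division of two numbers are backward stable.
   (b) Are the floating point computations of the inner and outer product of two vectors backward stable? Give reasons for your answer.

2. Find the growth factor of Gaussian elimination without pivoting for the following matrices.

$$\begin{pmatrix} .00001 & 1 \\ 1 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 1 \\ .00001 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & -1 & \cdots & -1 \\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & -1 \\ 0 & \cdots & 0 & 1 \end{pmatrix}_{10 \times 10},$$

$$\begin{pmatrix} 1 & 1 \\ 1 & .9 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 1 \\ 1 & .99 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 1 & 1 \\ 1 & .9 & .81 \\ 1 & 1.9 & 3.61 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 1 & 1 \\ .9 & .9 & .9 \\ 1 & 1.9 & 3.61 \end{pmatrix},$$

$$\begin{pmatrix} 1 & 1 \\ 1 & .999 \end{pmatrix}, \quad
\begin{pmatrix} 1.0001 & 1 \\ 1 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{3} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \end{pmatrix}.$$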
3. Find the condition number of each of the matrices of problem 2.

4. Show that Cond$(cA)$ = Cond$(A)$ for all nonzero scalars $c$. Show that if $\|I\| \geq 1$, then Cond$(A) \geq 1$.

5. Prove that

$$\frac{\|fl(AB) - AB\|_F}{\|AB\|_F} \leq n\mu\, \mathrm{Cond}_F(B) + O(\mu^2),$$

where $A$ and $B$ are matrices and $B$ is nonsingular. ($\mathrm{Cond}_F(B) = \|B\|_F \|B^{-1}\|_F$.)
6. Are the following floating point computations backward stable? Give reasons for your answer in each case.
   (a) $fl(x(y + z))$
   (b) $fl(x_1 + x_2 + \cdots + x_n)$
   (c) $fl(x_1 x_2 \cdots x_n)$
   (d) $fl(x^T y / c)$, where $x$ and $y$ are vectors and $c$ is a scalar
   (e) $fl\left(\sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}\right)$

7. Find the growth factor of Gaussian elimination for each of the following matrices and hence conclude that Gaussian elimination for linear systems with these matrices is backward stable.

   (a) $\begin{pmatrix} 10 & 1 & 1 \\ 1 & 10 & 1 \\ 1 & 1 & 10 \end{pmatrix}$
   (b) $\begin{pmatrix} 4 & 0 & 2 \\ 0 & 4 & 0 \\ 2 & 0 & 5 \end{pmatrix}$
   (c) $\begin{pmatrix} 10 & 1 & 1 \\ 1 & 15 & 5 \\ 1 & 5 & 14 \end{pmatrix}$

8. Show that Gaussian elimination without pivoting for the matrix

$$\begin{pmatrix} 10 & 1 & 1 \\ 1 & 10 & 1 \\ 1 & 1 & 10 \end{pmatrix}$$

is strongly stable.

9. Let $H$ be an unreduced upper Hessenberg matrix. Find a diagonal matrix $D$ such that $D^{-1}HD$ is a normalized upper Hessenberg matrix (that is, all subdiagonal entries are 1). Show that the transforming matrix $D$ must be ill-conditioned if one or several subdiagonal entries of $H$ are very small. Do a numerical example of order 5 to verify this.

10. Show that the roots of the following polynomials are ill-conditioned:
   (a) $x^3 - 3x^2 + 3x + 1$
   (b) $(x - 1)^3 (x - 2)$
   (c) $(x - 1)(x - .99)(x - 2)$

11. Using the result of problem 5, show that matrix-vector multiplication with an ill-conditioned matrix may give rise to a large relative error in the computed result. Construct your own $2 \times 2$ example to see this.

12. Write the following small SUBROUTINES for future use:
   (1) MPRINT(A,n) to print a square matrix $A$ of order $n$.
   (2) TRANS(A,TRANS,n) to compute the transpose of a matrix.
   (3) TRANS(A,n) to compute the transpose of a matrix where the transpose is overwritten by A.
   (4) MMULT(A,B,C,m,n,p) to multiply $C = A_{m \times n} B_{n \times p}$.
   (5) SDOT(n,x,y,answer) to compute the inner product of two $n$-vectors $x$ and $y$ in single precision.
   (6) SAXPY(n,A,x,y) to compute $y \leftarrow ax + y$ in single precision, where $a$ is a scalar and $x$ and $y$ are vectors. (The symbol $y \leftarrow ax + y$ means that the computed result of $a$ times $x$ plus $y$ will be stored in $y$.)
   (7) IMAX(n,x,MAX) to find $i$ such that $|x_i| = \max\{|x_j| : j = 1, \ldots, n\}$.
   (8) SWAP(n,x,y) to swap two vectors $x$ and $y$.
   (9) COPY(n,x,y) to copy a vector $x$ to $y$.
   (10) NORM2(n,x,norm) to find the Euclidean length of a vector $x$.
   (11) SASUM(n,x,sum) to find sum $= \sum_{i=1}^{n} |x_i|$.
   (12) NRMI(x,n) to compute the infinity norm of an $n$-vector $x$.
   (13) Rewrite the above routines in double precision.
   (14) SNORM(m,n,A,LDA) to compute the 1-norm of a matrix $A_{m \times n}$. LDA is the leading dimension of the array A.
   (15) Write subroutines to compute the infinity and Frobenius norms of a matrix.
   (16) Write a subroutine to find the largest element in magnitude in a column vector.
   (17) Write a subroutine to find the largest element in magnitude in a matrix.

(Note: Some of these subroutines are a part of BLAS (LINPACK). See also the book Handbook for Matrix Computations by T. Coleman and Charles Van Loan, SIAM, 1988.)
4. NUMERICALLY EFFECTIVE ALGORITHMS AND MATHEMATICAL SOFTWARE
4.1 Definitions and Examples
4.2 Flop-Count and Storage Considerations for Some Basic Algorithms
4.3 Some Existing High-Quality Mathematical Software for Linear Algebra Problems
4.3.1 LINPACK
4.3.2 EISPACK
4.3.3 LAPACK
4.3.4 NETLIB
4.3.5 NAG
4.3.6 IMSL
4.3.7 MATLAB
4.3.8 ...odes and MATLAB Toolkit
4.3.9 The ACM Library
4.3.10 ITPACK (Iterative Software Package)
4.4 Review and Summary
4.5 Suggestions for Further Reading
CHAPTER 4 NUMERICALLY EFFECTIVE ALGORITHMS AND MATHEMATICAL SOFTWARE
4.1 Definitions and Examples
Solving a problem on a computer involves the following major steps performed in sequence:

1. Making a mathematical model of the problem, that is, translating the problem into the language of mathematics. For example, mathematical models of many engineering problems are sets of ordinary and partial differential equations.
2. Finding or developing constructive methods (theoretical numerical algorithms) for solving the mathematical model. This step usually consists of a literature search to find what methods are available for the problem.
3. Identifying the best method from a numerical point of view (the best one may be a combination of several others). We call it the numerically effective method.
4. Finally, implementing on the computer the numerically effective method identified in step 3. This amounts to writing and executing a reliable and efficient computer program based on the identified numerically effective method, and may also require exploitation of the target computer architecture.

The purpose of creating mathematical software is to provide a scientist or engineer with a computer program that can be used with confidence to solve the problem for which the software was designed. Thus, mathematical software should be of high quality. Let's be specific about what we mean when we call a piece of software high quality mathematical software. High quality mathematical software should have the following features. It should be

1. Powerful and flexible: It can be used to solve several different variations of the original problem and the closely associated problems. For example, closely associated with the linear system problem $Ax = b$ are:
   (a) Computing the inverse of $A$, i.e., finding an $X$ such that $AX = I$. Though finding the inverse of $A$ and solving $Ax = b$ are equivalent problems, solution of a linear system using the inverse of the system matrix is not advisable. Computing the inverse explicitly should be avoided, unless a specific application really calls for it.
   (b) Finding the determinant and rank of $A$.
   (c) Solving $AX = B$, where $B$ is a matrix, etc.
Also, a matrix problem may have some special structure. The matrix may be positive definite, banded, Toeplitz, dense, sparse, etc. The software should state clearly what variations of the problem it can handle and whether it is special-structure oriented.

2. Easy to read and modify: The software should be well documented. The documentation should be clear and easy to read, even for a nontechnical user, so that if some modifications are needed, they can be made easily. To quote from the cover page of Forsythe, Malcolm and Moler (CMMC): "... it is an order of magnitude easier to write two good subroutines than to decide which one is best. In choosing among the various subroutines available for a particular problem, we placed considerable emphasis on the clarity and style of programming. If several subroutines have comparable accuracy, reliability, and efficiency, we have chosen the one that is the least difficult to read and use."

3. Portable: It should be able to run on different computers with few or no changes.

4. Robust: It should be able to deal with an unexpected situation during execution.

5. Based on a numerically effective algorithm: It should be based on an algorithm that has attractive numerical properties.

We have used the expression "numerically effective" several times without qualification. This is the most important component of high quality mathematical software. We shall call a matrix algorithm numerically effective if it is:

(a) General Purpose: The algorithm should work for a wide class of matrices.

(b) Reliable: The algorithm should give warning whenever it is on the verge of breakdown due to excessive round-off errors or not being able to meet some specified criterion of convergence. There are algorithms which produce completely wrong answers without giving any warning at all. Gaussian elimination without pivoting (for the linear system or an equivalent problem) is one such algorithm. It is not reliable.

(c) Stable: Total rounding errors of the algorithm should not exceed the errors that are inherent in the original problem (see the earlier section on stability).

(d) Efficient: The efficiency of an algorithm is measured by the amount of computer time consumed in its implementation. Theoretically, the number of floating-point operations needed to implement the algorithm indicates its efficiency.
Definition 4.1.1 A floating-point operation, or flop, is the amount of computer time required to execute the Fortran statement
A(I,J) = A(I,J) + t * A(I,J)
A flop involves one multiplication, one addition, and some subscript manipulations. Similarly, one division coupled with an addition or subtraction will be counted as one flop. This definition of a flop has been used in the popular software package LINPACK (this package is briefly described in Section 4.3).
A note on the definition of flop-count

With the advent of supercomputing technology, there is a tendency to count an addition or subtraction as a flop as well. This definition of a flop has been adopted in the second edition of the book by Golub and Van Loan (MC 1989). However, we have decided to stick to the original LINPACK definition. Note that if an addition (or subtraction) is counted as a flop, then a flop-count expressed in "new flops" is roughly twice the count expressed in "old flops".
Definition 4.1.2 A matrix algorithm involving computation with matrices of order $n$ will be called an efficient algorithm if it takes no more than $O(n^3)$ flops. (The historical Cramer's rule for solving a linear system is therefore not efficient, since $O(n!)$ flops are required for its execution.) (See Chapter 6.)

One point is well worth mentioning here. An algorithm may be efficient, but still unstable. For example, Gaussian elimination without pivoting requires $\frac{n^3}{3}$ flops for an $n \times n$ matrix. Therefore, while it is efficient, it is unreliable and unstable for an arbitrary matrix.

(e) Economic in the use of storage: Usually, about $n^2$ storage locations are required to store a dense matrix of order $n$. Therefore, if an algorithm requires storage of several matrices during its execution, a large number of storage locations will be needed even when $n$ is moderate. Thus, it is important to give special attention to economy of storage while designing an algorithm.
By carefully rearranging an algorithm, one can greatly reduce its storage requirement (examples of this will be presented later). In general, if a matrix generated during execution of the algorithm is not needed for future use, it should be overwritten by another computed element.
Notation for Overwriting and Interchange
We will use the notation

$$a \leftarrow b$$

to denote that "$b$ overwrites $a$". Similarly, if two computed quantities $a$ and $b$ are interchanged, they will be written symbolically

$$a \leftrightarrow b.$$
4.2 Flop-Count and Storage Considerations for Some Basic Algorithms
Numerical linear algebra often deals with triangular matrices, which can be stored using only $\frac{n(n+1)}{2}$ locations rather than $n^2$ locations. This useful fact should be kept in mind while designing an algorithm in numerical linear algebra, so that the extra available space can be used for something else. Sparse matrices, in general, have lots of zero entries. A convenient scheme for storing a sparse matrix is one in which only the nonzero entries are stored. In the following, we illustrate the flop-count and storage scheme for some simple basic matrix computations with dense matrices.

In determining the flop-count of an algorithm, we note that the numbers of multiplications and additions in a matrix algorithm are roughly the same. Thus, a count of only the number of multiplications in a matrix algorithm gives us an idea of the total flop-count for that algorithm. Also, the counts involving zero elements can be omitted.
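One way to realize the $\frac{n(n+1)}{2}$ storage count for an upper triangular matrix is to keep only the entries on and above the diagonal in a one-dimensional array. The sketch below assumes a row-by-row packed layout with 0-based indices; the layout and the names `pack_upper` and `packed_index` are our illustrative choices, not a scheme prescribed by the text.

```python
def pack_upper(t):
    """Store the upper triangle of t row by row in a flat list."""
    n = len(t)
    return [t[i][j] for i in range(n) for j in range(i, n)]


def packed_index(i, j, n):
    """Position of entry (i, j), with i <= j, in the packed list (0-based)."""
    if i > j:
        raise IndexError("entry lies below the diagonal and is not stored")
    # Rows 0 .. i-1 contribute n, n-1, ..., n-i+1 entries.
    return i * n - i * (i - 1) // 2 + (j - i)


if __name__ == "__main__":
    T = [[1, 2, 3],
         [0, 4, 5],
         [0, 0, 6]]
    packed = pack_upper(T)
    print(packed, len(packed))             # 6 = 3 * 4 / 2 locations
    print(packed[packed_index(1, 2, 3)])   # entry t[1][2]
```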
Example 4.2.1 Inner Product Computation

Let $x = (x_1, x_2, \ldots, x_n)^T$ and $y = (y_1, y_2, \ldots, y_n)^T$ be two $n$-vectors. Then the inner product

$$z = x^T y = \sum_{i=1}^{n} x_i y_i$$

can be computed as (Algorithm 3.1.2):

For $i = 1, 2, \ldots, n$ do
    $z = z + x_i y_i$

Just one flop is needed for each $i$. Then a total of $n$ flops is needed to execute the algorithm.
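The loop above can be transcribed almost literally. The following sketch (the function name `inner_product` is ours) also counts flops in the LINPACK sense used in this chapter, that is, one flop per multiply-add pair.

```python
def inner_product(x, y):
    """Return (z, flops) where z = x^T y and flops counts multiply-add pairs."""
    z = 0.0
    flops = 0
    for xi, yi in zip(x, y):
        z = z + xi * yi   # one multiplication and one addition: one flop
        flops += 1
    return z, flops


if __name__ == "__main__":
    print(inner_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```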
Example 4.2.2 Outer Product Computation

The outer product $xy^T$ is an $n \times n$ matrix $Z$ as shown below. The $(i, j)$th component of the matrix is $x_i y_j$. Since there are $n^2$ components and each component requires one multiplication, the outer-product computation requires $n^2$ flops and $n^2$ storage locations. However, very often one does not require the matrix from the outer product explicitly.

$$Z = xy^T = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} (y_1\ \ y_2\ \ \cdots\ \ y_n) = \begin{pmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\ x_2 y_1 & x_2 y_2 & \cdots & x_2 y_n \\ \vdots & \vdots & & \vdots \\ x_n y_1 & x_n y_2 & \cdots & x_n y_n \end{pmatrix}.$$

Example 4.2.3 Matrix-Vector Product

Let $A = (a_{ij})$ be an $n \times n$ matrix and let

$$b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$

be an $n$-vector. Then

$$Ab = \begin{pmatrix} a_{11} b_1 + a_{12} b_2 + \cdots + a_{1n} b_n \\ a_{21} b_1 + a_{22} b_2 + \cdots + a_{2n} b_n \\ \vdots \\ a_{n1} b_1 + a_{n2} b_2 + \cdots + a_{nn} b_n \end{pmatrix}.$$

Flop-count. Each component of the vector $Ab$ requires $n$ multiplications and additions. Since there are $n$ components, the computation of $Ab$ requires $n^2$ flops.

Example 4.2.4 Matrix-Matrix Product With Upper Triangular Matrices
Let $U = (u_{ij})$ and $V = (v_{ij})$ be two upper triangular matrices of order $n$. Then the following algorithm computes the product $C = UV$. The algorithm overwrites $V$ with the product $UV$.

For $i = 1, 2, \ldots, n$ do
    For $j = i, i + 1, \ldots, n$ do
        $v_{ij} \leftarrow c_{ij} = \sum_{k=i}^{j} u_{ik} v_{kj}$
An Explanation of the Above Pseudocode
Note that in the above pseudocode $j$ represents the inner loop and $i$ represents the outer loop. For each value of $i$ from 1 to $n$, $j$ takes the values $i$ through $n$.
Flop-Count.

1. Computing $c_{ij}$ requires $j - i + 1$ multiplications.
2. Since $j$ runs from $i$ to $n$ and $i$ runs from 1 to $n$, the total number of multiplications is:

$$\sum_{i=1}^{n} \sum_{j=i}^{n} (j - i + 1) = \sum_{i=1}^{n} \bigl(1 + 2 + \cdots + (n - i + 1)\bigr) = \sum_{i=1}^{n} \frac{(n - i + 1)(n - i + 2)}{2} \approx \frac{n^3}{6} \quad \text{(for large } n\text{)}.$$

(Recall that $1 + 2 + 3 + \cdots + r = \frac{r(r+1)}{2}$ and $1^2 + 2^2 + 3^2 + \cdots + r^2 = \frac{r(r+1)(2r+1)}{6}$.)

Flop-count for the Product of Two Triangular Matrices

For large $n$, the product of two $n \times n$ triangular matrices requires about $\frac{n^3}{6}$ flops.
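The triangular product and its multiplication count can be checked with a short program. This is a sketch with 0-based indices; the function name `upper_triangular_product` is ours. Summing the series above in closed form gives exactly $\frac{n(n+1)(n+2)}{6}$ multiplications, which is the value the counter should return.

```python
def upper_triangular_product(u, v):
    """Overwrite v with the product u v of two upper triangular matrices.

    Only entries on and above the diagonal are touched. Returns the
    number of multiplications performed.
    """
    n = len(u)
    mults = 0
    for i in range(n):
        for j in range(i, n):
            s = 0.0
            for k in range(i, j + 1):   # u[i][k] = 0 for k < i, v[k][j] = 0 for k > j
                s += u[i][k] * v[k][j]
                mults += 1
            v[i][j] = s                 # rows below i of v are still untouched
    return mults


if __name__ == "__main__":
    U = [[1.0, 2.0, 3.0], [0.0, 4.0, 5.0], [0.0, 0.0, 6.0]]
    V = [[1.0, 1.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
    count = upper_triangular_product(U, V)
    print(V, count)
```

Overwriting is safe here because row $i$ of the product only needs rows $i, i+1, \ldots, n$ of $V$, and the rows below $i$ have not yet been modified when row $i$ is computed.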
Example 4.2.5 Matrix-Matrix Multiplication

Let $A$ be an $m \times n$ matrix and $B$ be an $n \times p$ matrix. Then the following algorithm computes the product $C = AB$.

For $i = 1, 2, \ldots, m$ do
    For $j = 1, 2, \ldots, p$ do
        $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$

Flop-count. There are $n$ multiplications in computing each $c_{ij}$. Since $j$ runs from 1 to $p$ and $i$ runs from 1 to $m$, the total number of multiplications is $mnp$. Thus, for two square matrices $A$ and $B$, each of order $n$, this count is $n^3$.

Flop-Count for the Product of Two Matrices

Let $A$ be $m \times n$ and $B$ be $n \times p$. Then computing $C = AB$ requires $mnp$ flops. In particular, it takes $n^3$ flops to compute the product of two $n \times n$ square matrices.

The algorithm above for matrix-matrix multiplication of two $n \times n$ matrices will obviously require $n^2$ storage locations for each matrix. However, it can be rewritten in such a way that there will be a substantial savings in storage, as illustrated below. The following algorithm overwrites $B$ with the product $AB$, assuming that an additional column has been annexed to $B$ (alternatively, one can have a work vector to hold values temporarily).
Example 4.2.6 Matrix-Matrix Product with Economy in Storage

For $j = 1, 2, \ldots, n$ do
    1. $h_i = \sum_{k=1}^{n} a_{ik} b_{kj}$ $(i = 1, 2, \ldots, n)$
    2. $b_{ij} \leftarrow h_i$ $(i = 1, 2, \ldots, n)$

($h$ is a temporary work vector.)
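A direct transcription of this storage-saving scheme might look as follows. It is a sketch only; the function name `multiply_in_place` is ours, and square matrices stored as lists of rows are assumed.

```python
def multiply_in_place(a, b):
    """Overwrite the square matrix b with the product a b.

    Column j of the product depends only on column j of b, so a single
    work vector h of length n is all the extra storage that is needed.
    """
    n = len(a)
    h = [0.0] * n
    for j in range(n):
        for i in range(n):
            h[i] = sum(a[i][k] * b[k][j] for k in range(n))
        for i in range(n):          # column j of b is no longer needed
            b[i][j] = h[i]


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    multiply_in_place(A, B)
    print(B)
```

The whole column must be computed into `h` before any entry of column $j$ of $B$ is overwritten, since every $h_i$ uses all of that column.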
Example 4.2.7 Computation of $\left(I - \frac{2uu^T}{u^T u}\right) A$

In numerical linear algebra very often we need to compute

$$\left(I - \frac{2uu^T}{u^T u}\right) A,$$

where $I$ is an $m \times m$ identity matrix, $u$ is an $m$-vector, and $A$ is $m \times n$. The matrix $I - \frac{2uu^T}{u^T u}$ is called a Householder matrix (see Chapter 5). Naively, one would form the matrix $I - \frac{2uu^T}{u^T u}$ from the vector $u$, then form the matrix product explicitly with $A$. This will require $O(n^3)$ flops. We show below that this matrix product can implicitly be performed with $O(n^2)$ flops. The key observation here is that we do not need to form the matrix $I - \frac{2uu^T}{u^T u}$ explicitly.
The following algorithm computes the product. The algorithm overwrites $A$ with the product. Let

$$\beta = \frac{2}{u^T u}.$$

Then

$$\left(I - \frac{2uu^T}{u^T u}\right) A$$

becomes $A - \beta u u^T A$. Let

$$u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{pmatrix}.$$

Then the $(i, j)$th entry of $(A - \beta u u^T A)$ is equal to $a_{ij} - \beta (u_1 a_{1j} + u_2 a_{2j} + \cdots + u_m a_{mj}) u_i$. Thus, we have the following algorithm.
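Algorithm 4.2.1 Computing $\left(I - \frac{2uu^T}{u^T u}\right) A$

1. Compute $\beta = \frac{2}{u^T u}$.
2. For $j = 1, 2, \ldots, n$ do
       $\alpha = u_1 a_{1j} + u_2 a_{2j} + \cdots + u_m a_{mj}$
       $\alpha \leftarrow \beta \alpha$
       For $i = 1, 2, \ldots, m$ do
           $a_{ij} \leftarrow a_{ij} - \alpha u_i$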
Flop-Count.

1. There are $(m + 1)$ flops to compute $\beta$ ($m$ flops to compute the inner product and 1 flop to divide 2 by the inner product).
2. There are $n$ $\alpha$'s, and each costs $(m + 1)$ flops. Thus, we need $n(m + 1)$ flops to compute the $\alpha$'s.
3. There are $mn$ entries $a_{ij}$ to compute. Each $a_{ij}$ costs just one flop, once the $\alpha$'s are computed.

Total flop-count: $(m + 1) + 2mn + n$.

We now summarize the above very important result (which will be used repeatedly in this book) in the following:

Flop-count for the Product $\left(I - \frac{2uu^T}{u^T u}\right) A$

Let $A$ be $m \times n$ $(m \geq n)$ and $u$ be an $m$-vector. Then the above product can be computed with only $(m + 1) + 2mn + n$ flops. In particular, if $m = n$, then it takes roughly $2n^2$ flops, compared to $n^3$ flops if done naively.
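A Numerical Example

Let

$$u = (1, 1, 1)^T, \qquad A = \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 0 & 0 \end{pmatrix}.$$

Then $\beta = \frac{2}{3}$.

$j = 1$: (Compute the first column of the product)

$$\alpha = u_1 a_{11} + u_2 a_{21} + u_3 a_{31} = 1 + 2 = 3, \qquad \alpha \leftarrow \beta \alpha = \frac{2}{3} \cdot 3 = 2$$

$$a_{11} \leftarrow a_{11} - \alpha u_1 = 1 - 2 = -1$$
$$a_{21} \leftarrow a_{21} - \alpha u_2 = 2 - 2 = 0$$
$$a_{31} \leftarrow a_{31} - \alpha u_3 = 0 - 2 = -2$$

$j = 2$: (Compute the second column of the product)

$$\alpha = u_1 a_{12} + u_2 a_{22} + u_3 a_{32} = 1 + 1 = 2, \qquad \alpha \leftarrow \beta \alpha = \frac{2}{3} \cdot 2 = \frac{4}{3}$$

$$a_{12} \leftarrow a_{12} - \alpha u_1 = 1 - \frac{4}{3} = -\frac{1}{3}$$
$$a_{22} \leftarrow a_{22} - \alpha u_2 = 1 - \frac{4}{3} = -\frac{1}{3}$$
$$a_{32} \leftarrow a_{32} - \alpha u_3 = 0 - \frac{4}{3} = -\frac{4}{3}$$

Thus,

$$A \leftarrow \left(I - \frac{2uu^T}{u^T u}\right) A = \begin{pmatrix} -1 & -\frac{1}{3} \\ 0 & -\frac{1}{3} \\ -2 & -\frac{4}{3} \end{pmatrix}.$$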
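Algorithm 4.2.1 and the numerical example can be reproduced with a few lines of code. The sketch below is ours (the function name `apply_householder` is an assumption); it uses exact rational arithmetic from Python's `fractions` module so that the thirds in the example come out exactly, whereas a practical code would use floating point.

```python
from fractions import Fraction


def apply_householder(u, a):
    """Overwrite a with (I - 2 u u^T / (u^T u)) a without forming the matrix."""
    m, n = len(a), len(a[0])
    beta = Fraction(2) / sum(ui * ui for ui in u)
    for j in range(n):
        # alpha = beta * (u^T times column j of a)
        alpha = beta * sum(u[i] * a[i][j] for i in range(m))
        for i in range(m):
            a[i][j] = a[i][j] - alpha * u[i]
    return a


if __name__ == "__main__":
    u = [1, 1, 1]
    A = [[1, 1], [2, 1], [0, 0]]
    for row in apply_householder(u, A):
        print([str(entry) for entry in row])
```

Each column costs one inner product and one vector update, which is where the roughly $2mn$ flops of the count above come from.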
Example 4.2.8 Flop-count for Algorithm 3.1.3 (Back Substitution Process)

From the pseudocode of this algorithm, we see that it takes one flop to compute $y_n$, two flops to compute $y_{n-1}$, and so on. Thus, to compute $y_1$ through $y_n$, we need

$$1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2} \approx \frac{n^2}{2} \text{ flops.}$$

Flop-count for the Back Substitution Process

It requires roughly $\frac{n^2}{2}$ flops to solve an upper triangular system using the back-substitution process.
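The count can be confirmed by instrumenting a back-substitution routine. The sketch below is our own transcription (Algorithm 3.1.3 itself is not reproduced in this section, so the loop structure and the name `back_substitution` are assumptions); each multiplication and each division is counted as one flop, in keeping with the convention of this chapter.

```python
def back_substitution(t, b):
    """Solve the upper triangular system t y = b; return (y, flops)."""
    n = len(b)
    y = [0.0] * n
    flops = 0
    for i in range(n - 1, -1, -1):
        s = b[i]
        for j in range(i + 1, n):
            s -= t[i][j] * y[j]     # multiply-subtract: one flop
            flops += 1
        y[i] = s / t[i][i]          # division: one flop
        flops += 1
    return y, flops


if __name__ == "__main__":
    T = [[2.0, 1.0, 1.0], [0.0, 3.0, 1.0], [0.0, 0.0, 4.0]]
    rhs = [7.0, 9.0, 12.0]
    print(back_substitution(T, rhs))   # n(n + 1)/2 = 6 flops for n = 3
```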
Example 4.2.9 Flop-count and Storage Considerations for Algorithm 3.1.4 (The Inverse of an Upper Triangular Matrix)

Let's state the algorithm once more here.

For $k = n, n - 1, \ldots, 2, 1$
    $s_{kk} = \frac{1}{t_{kk}}$
    For $i = k - 1, k - 2, \ldots, 1$
        $s_{ik} = -\frac{1}{t_{ii}} \left( \sum_{j=i+1}^{n} t_{ij} s_{jk} \right)$

Flop-count.

    $k = 1$: 1 flop
    $k = 2$: 3 flops
    $k = 3$: 6 flops
    ...
    $k = n$: $\frac{n(n+1)}{2}$ flops

Total flops (approximately):

$$1 + 3 + 6 + \cdots + \frac{n(n+1)}{2} = \sum_{r=1}^{n} \frac{r(r+1)}{2} = \sum_{r=1}^{n} \frac{r^2}{2} + \sum_{r=1}^{n} \frac{r}{2} \approx \frac{n^3}{6}.$$

Flop-count for Computing the Inverse of an Upper Triangular Matrix

It requires about $\frac{n^3}{6}$ flops to compute the inverse of an upper triangular matrix.
Storage Considerations. Since the inverse of an upper triangular matrix is an upper triangular matrix, and it is clear from the algorithm that we can overwrite $t_{ik}$ by $s_{ik}$, the algorithm can be rewritten so that it overwrites $T$ with the inverse $S$. Thus we can rewrite the algorithm as

Computing the Inverse of an Upper Triangular Matrix with Economy in Storage

For $k = n, n - 1, \ldots, 2, 1$
    $t_{kk} \leftarrow s_{kk} = \frac{1}{t_{kk}}$
    For $i = k - 1, k - 2, \ldots, 1$
        $t_{ik} \leftarrow s_{ik} = -\frac{1}{t_{ii}} \sum_{j=i+1}^{k} t_{ij} s_{jk}$
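The storage-saving version can be sketched as follows, with 0-based indices and a function name (`invert_upper_in_place`) of our own choosing. When column $k$ is processed, the entries $t_{jk}$ with $i < j \leq k$ already hold $s_{jk}$, while the entries $t_{ij}$ with $j < k$ and the diagonal entry $t_{ii}$ still hold their original values, which is exactly what the formula needs.

```python
def invert_upper_in_place(t):
    """Overwrite the nonsingular upper triangular matrix t with its inverse."""
    n = len(t)
    for k in range(n - 1, -1, -1):
        t[k][k] = 1.0 / t[k][k]
        for i in range(k - 1, -1, -1):
            # t[j][k] for i < j <= k already contains s_jk;
            # t[i][j] for j <= k and t[i][i] are still original entries.
            s = sum(t[i][j] * t[j][k] for j in range(i + 1, k + 1))
            t[i][k] = -s / t[i][i]
    return t


if __name__ == "__main__":
    T = [[2.0, 1.0], [0.0, 4.0]]
    print(invert_upper_in_place(T))
```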
Example 4.2.10 Flop-count and Storage Considerations for Algorithm 3.1.5 (Gaussian Elimination)

It will be shown in Chapter 5 that the Gaussian elimination algorithm takes about $\frac{n^3}{3}$ flops. The algorithm can overwrite $A$ with the upper triangular matrix $A^{(n-1)}$; in fact, it can overwrite $A$ with each $A^{(k)}$. The multipliers $m_{ik}$ can be stored in the lower half of $A$ as they are computed. Each $b^{(k)}$ can overwrite the vector $b$.

Gaussian Elimination Algorithm with Economy in Storage

For $k = 1, 2, \ldots, n - 1$ do
    For $i = k + 1, \ldots, n$ do
        $a_{ik} \leftarrow m_{ik} = -\frac{a_{ik}}{a_{kk}}$
        For $j = k + 1, \ldots, n$ do
            $a_{ij} \leftarrow a_{ij} + m_{ik} a_{kj}$
        $b_i \leftarrow b_i + m_{ik} b_k$
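A direct transcription of this storage scheme is sketched below, with 0-based indices; the function name `gaussian_elimination_in_place` is ours. As in the algorithm above, no pivoting is performed, so the sketch assumes that every pivot $a_{kk}$ is nonzero and it inherits the reliability caveats discussed earlier in this chapter.

```python
def gaussian_elimination_in_place(a, b):
    """Reduce a x = b to upper triangular form, overwriting a and b.

    On return the upper triangle of a holds the reduced matrix, the strict
    lower triangle holds the multipliers m_ik, and b holds the transformed
    right-hand side. No pivoting is done (nonzero pivots are assumed).
    """
    n = len(a)
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = -a[i][k] / a[k][k]
            a[i][k] = m                     # store the multiplier in place of the zero
            for j in range(k + 1, n):
                a[i][j] += m * a[k][j]
            b[i] += m * b[k]


if __name__ == "__main__":
    A = [[2.0, 1.0], [4.0, 5.0]]
    rhs = [3.0, 9.0]
    gaussian_elimination_in_place(A, rhs)
    print(A, rhs)
```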
4.3 Some Existing High-Quality Mathematical Software for Linear Algebra Problems
Several high quality mathematical software packages for various types of matrix computations are in existence. These are LINPACK, EISPACK, MATLAB, NETLIB, IMSL, NAG, and the most recently released LAPACK.
4.3.1 LINPACK
LINPACK is "a collection of Fortran subroutines which analyze and solve various systems of simultaneous linear algebraic equations. The subroutines are designed to be completely machine independent, fully portable, and to run at near optimum efficiency in most operating environments." (Quotation from LINPACK Users' Guide.) Though primarily intended for linear systems, the package also contains routines for the singular value decomposition (SVD) and problems associated with linear systems such as computing the inverse, the determinant, and the linear least squares problem. Most of the routines are for square matrices, but some handle rectangular coefficient matrices associated with overdetermined or underdetermined problems. The routines are meant for small and dense problems of order less than a few hundred and band matrices of order less than several thousand. There are no routines for iterative methods.
4.3.2 EISPACK
EISPACK is an eigensystem package. The package is primarily designed to compute the eigenvalues and eigenvectors of a matrix; however, it contains routines for the generalized eigenvalue problem of the form $Ax = \lambda Bx$ and for the singular value decomposition.

The eigenvalues of an arbitrary matrix $A$ are computed in several sequential phases. First, the matrix $A$ is balanced. If it is nonsymmetric, the balanced matrix is then reduced to an upper Hessenberg matrix by matrix similarities (if it is symmetric, it is reduced to a symmetric tridiagonal matrix). Finally, the eigenvalues of the transformed upper Hessenberg or symmetric tridiagonal matrix are computed using implicit QR iterations or the Sturm-sequence method. There are EISPACK routines to perform all these tasks.
4.3.3 LAPACK
The building blocks of numerical linear algebra algorithms are organized into three levels of BLAS (Basic Linear Algebra Subprograms). They are:

Level 1 BLAS: These are for vector-vector operations. A typical Level 1 BLAS is of the form $y \leftarrow \alpha x + y$, where $x$ and $y$ are vectors and $\alpha$ is a scalar.

Level 2 BLAS: These are for matrix-vector operations. A typical Level 2 BLAS is of the form $y \leftarrow Ax + y$.

Level 3 BLAS: These are for matrix-matrix operations. A typical Level 3 BLAS is of the form $C \leftarrow AB + C$.

Level 1 BLAS is used in LINPACK. Unfortunately, algorithms composed of Level 1 BLAS are not suitable for achieving high efficiency on most supercomputers of today, such as CRAY computers. While Level 2 BLAS can give good speed (sometimes almost peak speed) on many vector computers such as the CRAY X-MP or CRAY Y-MP, they are not suitable for high efficiency on other modern supercomputers (e.g., CRAY 2). The Level 3 BLAS are ideal for most of today's supercomputers. They can perform $O(n^3)$ floating-point operations on $O(n^2)$ data. Therefore, during the last several years, an intensive attempt was made by numerical linear algebraists to restructure the traditional BLAS-1 based algorithms into algorithms rich in BLAS-2 and BLAS-3 operations. As a result, there now exist algorithms of these types, called blocked algorithms, for most matrix computations. These algorithms have been implemented in a software package called LAPACK.

"LAPACK is a transportable library of Fortran 77 subroutines for solving the most common problems in numerical linear algebra: systems of linear equations, linear least squares problems, eigenvalue problems, and singular value problems. It has been designed to be efficient on a wide range of modern high-performance computers.

LAPACK is designed to supersede LINPACK and EISPACK, principally by restructuring the software to achieve much greater efficiency on vector processors, high-performance "superscalar" workstations, and shared memory multiprocessors. LAPACK also adds extra functionality, uses some new or improved algorithms, and integrates the two sets of algorithms into a unified package. The LAPACK Users' Guide gives an informal introduction to the design of the algorithms and software, summarizes the contents of the package, describes conventions used in the software and its documentation, and includes complete specifications for calling the routines." (Quotations from the cover page of LAPACK Users' Guide.)
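To make the three BLAS levels concrete, here is a plain-Python sketch of one typical operation at each level. The function names are ours and purely illustrative; they are not the Fortran BLAS interfaces, and the scalar-free forms of the Level 2 and Level 3 updates are a simplifying assumption. The point is the ratio of arithmetic to data: about $n$ flops on $2n$ numbers at Level 1, $n^2$ flops on about $n^2$ numbers at Level 2, and $n^3$ flops on about $3n^2$ numbers at Level 3.

```python
def axpy(alpha, x, y):
    """Level 1 (vector-vector): y <- alpha * x + y."""
    for i in range(len(x)):
        y[i] += alpha * x[i]


def matvec_update(a, x, y):
    """Level 2 (matrix-vector): y <- A x + y."""
    for i in range(len(a)):
        for k in range(len(x)):
            y[i] += a[i][k] * x[k]


def matmat_update(a, b, c):
    """Level 3 (matrix-matrix): C <- A B + C."""
    for i in range(len(a)):
        for j in range(len(b[0])):
            for k in range(len(b)):
                c[i][j] += a[i][k] * b[k][j]


if __name__ == "__main__":
    y = [1.0, 1.0]
    axpy(2.0, [1.0, 2.0], y)
    print(y)
    C = [[1.0, 1.0], [1.0, 1.0]]
    matmat_update([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]], C)
    print(C)
```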
4.3.4 NETLIB
Netlib stands for network library. LINPACK, EISPACK, and LAPACK subroutines are available electronically from this library, along with many other types of software for matrix computations. A user can obtain software from these packages by sending electronic mail to
netlib@netlib.ornl.gov
Also, files can be transferred to a local directory by anonymous ftp:
ftp research.att.com
ftp netlib2.cs.utk.edu
To find out how to use netlib, send email:
send index
Information on the subroutines available in a given package can be obtained by sending email:
send index from {library}
Thus, to obtain a description of subroutines for LINPACK, send the message:
send index from LINPACK
To obtain one or more pieces of software from a package, send the message:
send {routines} from {library}
Thus, to get a subroutine called SGECO from LINPACK, send the message:
send SGECO from LINPACK
(This message will send you SGECO and the other routines that SGECO calls.) Xnetlib, which uses an X Window interface for direct downloading, is also available (and more convenient) once installed. Further information about Netlib can be obtained by anonymous FTP to either of the following sites:
netlib2.cs.utk.edu
research.att.com
4.3.5 NAG
NAG stands for Numerical Algorithms Group. This group has developed a large software library (also called NAG) containing routines for most computational problems, including numerical linear algebra problems, numerical differential equations problems (both ordinary and partial), optimization problems, integral equations problems, statistical problems, etc.
4.3.6 IMSL
IMSL stands for International Mathematical and Statistical Libraries. As the title suggests, this library contains routines for almost all mathematical and statistical computations.
4.3.7 MATLAB
The name MATLAB stands for MATrix LABoratory. It is an interactive computing system designed for easy computations of various matrix-based scientific and engineering problems. MATLAB provides easy access to matrix software developed by the LINPACK and EISPACK software projects. MATLAB can be used to solve a linear system and associated problems (such as inverting a matrix or computing the rank and determinant of a matrix), to compute the eigenvalues and eigenvectors of a matrix, to find the singular value decomposition of a matrix, to compute the zeros of a polynomial, to compute generalized eigenvalues and eigenvectors, etc. MATLAB is an extremely useful and valuable package for testing algorithms for small problems and for use in the classroom. It has indeed become an indispensable tool for teaching applied and numerical linear algebra in the classroom. A remarkable feature of MATLAB is its graphic capabilities (see more about MATLAB in the appendix). There is a student edition of MATLAB, published by Prentice Hall, 1992. It is designed for easy use in the classroom.
4.3.8 MATLAB Codes and MATLAB Toolkit
MATLAB codes for selected algorithms described in this book are provided for beginning students in the APPENDIX. Furthermore, an interactive MATLAB Toolkit called MATCOM implementing all the major algorithms (to be taught in the first course) will be provided along with the book, so that students can compare different algorithms for the same problem with respect to numerical efficiency, stability, accuracy, etc.
4.3.9 The ACM Library
The library provided by the Association for Computing Machinery contains routines for basic matrix-vector operations, linear systems and associated problems, nonlinear systems, zeros of polynomials, etc. The journal TOMS (ACM Transactions on Mathematical Software) publishes these algorithms.
4.3.10 ITPACK (Iterative Software Package)
The package contains routines for the iterative solution of linear systems, mainly in the large and sparse cases.
4.4 Review and Summary
The purpose of this chapter has been to introduce concepts of numerically effective algorithms and the associated high quality mathematical software.
1. Writing code for a given algorithm is a rather trivial task. However, not all software is high quality software. We have defined high quality mathematical software as software which is (1) powerful and flexible, (2) easy to read and modify, (3) portable, (4) robust, and more importantly (5) based on a numerically effective algorithm.
2. Like software, there may exist many algorithms for a given problem. However, not all algorithms are numerically effective. We have defined a numerically effective algorithm as one which is (1) general purpose, (2) reliable, (3) stable, and (4) efficient in terms of both time and storage.
3. The efficiency of a matrix algorithm is measured by the computer time consumed by the algorithm. A theoretical measure is the number of flops required to execute the algorithm. Roughly, one multiplication (or a division) together with one addition (or a subtraction) has been defined to be one flop. A matrix algorithm involving matrices of order n requiring no more than $O(n^3)$ flops has been defined to be an efficient algorithm. The stability of an algorithm was defined in Chapter 2. An important point has been made: An algorithm may be efficient without being stable. Thus, an efficient algorithm may not necessarily be a numerically effective algorithm. There are algorithms which are fast but not stable.
4. Several examples (Section 4.2) have been provided to show how certain basic matrix operations can be reorganized rather easily to make them both storage and time efficient, without implementing them naively. Example 4.2.7 is the most important one in this context. Here we have shown how to organize the computation of the product of an $n \times n$ matrix A with the matrix $H = I - \frac{2uu^T}{u^Tu}$, known as the Householder matrix, so that the product can be computed with $O(n^2)$ flops rather than the $O(n^3)$ flops that would be needed if computed naively, ignoring the structure of H. This computation forms a basis for many other matrix computations described later in the book.
5. A statement for each of several high quality mathematical software packages, such as LINPACK, EISPACK, LAPACK, MATLAB, IMSL, NAG, etc., has been provided in the final section.
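The Householder product mentioned in item 4 can be sketched as follows. This is a minimal illustration, not the MATCOM program housmul: the function name is hypothetical, and it assumes the product is taken as AH (the product HA is handled analogously). It relies on the identity $AH = A - \beta (Au) u^T$ with $\beta = 2/(u^Tu)$, so H is never formed.

```python
def times_householder(A, u):
    """Return A*H for H = I - 2*u*u^T/(u^T u) in O(n^2) operations.

    Uses A*H = A - w*u^T with w = (2/(u^T u)) * (A*u); H is never formed.
    """
    beta = 2.0 / sum(ui * ui for ui in u)                     # 2/(u^T u)
    w = [beta * sum(aij * uj for aij, uj in zip(row, u))      # w = beta * A * u
         for row in A]
    return [[aij - wi * uj for aij, uj in zip(row, u)]        # A - w u^T
            for row, wi in zip(A, w)]
```

Forming H explicitly and multiplying would cost on the order of $n^3$ operations, while the version above needs one matrix-vector product and one rank-one update.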
4.5 Suggestions for Further Reading
A clear and nice description of how to develop high quality mathematical software packages is included in the book Matrix Computations and Mathematical Software by John Rice, McGraw-Hill Book Company, 1981. This book contains a chapter (Chapter 12) on software projects. The students (and the instructors) interested in comparative studies of various software packages may find them interesting. An excellent paper by Demmel (1984) on the reliability of numerical software is worth reading. Another recent book in the area is the book Handbook for Matrix Computations by T. Coleman and C. Van Loan, SIAM, 1988. These two books are a must for readers interested in software development for matrix problems. The books by Forsythe, Malcolm and Moler (CMMC) and by Hager (ANAL) contain some useful subroutines for matrix computations. See also the books by Johnston, and by Kahaner, Moler and Nash referenced in Chapter 2.
Stewart's book (IMC) is a valuable source for learning how to organize and develop algorithms for basic matrix operations in time-efficient and storage-economic ways. Each software package has its own users' manual that describes in detail the functions of the subroutines, how to use them, etc. We now list the most important ones.
1. The LINPACK Users' Guide by J. Dongarra, J. Bunch, C. Moler and G. W. Stewart, SIAM, Philadelphia, PA, 1979.
2. The EISPACK Users' Guide can be obtained from either NESC or IMSL. Matrix Eigensystem Routines EISPACK Guide by B. T. Smith, J. M. Boyle, J. J. Dongarra, B. S. Garbow, Y. Ikebe, V. C. Klema, and C. B. Moler, has been published by Springer-Verlag, Berlin, 1976, as volume 6 of Lecture Notes in Computer Science. A later version, prepared by Garbow, Boyle, Dongarra and Moler in 1977, is also available as a Springer-Verlag publication.
3. LAPACK Users' Guide, prepared by Chris Bischof, James Demmel, Jack Dongarra, Jeremy Du Croz, Anne Greenbaum, Sven Hammarling, and Danny Sorensen, is available from SIAM. SIAM's address and telephone number are: SIAM (Society for Industrial and Applied Mathematics), 3600 University City Science Center, Philadelphia, PA 19114-2688, Tel: (215) 382-9800.
4. The NAG Library and the associated users' manual can be obtained from: Numerical Algorithms Group, Inc., 1400 Opus Place, Suite 200, Downers Grove, IL 60151-5702.
5. IMSL: The IMSL software library and documentation are available from: IMSL, Houston, Texas.
6. MATLAB: The MATLAB software and Users' Guide are available from: The MathWorks, Inc., Cochituate Place, 24 Prime Park Way, Natick, MA 01760-1520, TEL: (508) 653-1415, FAX: (508) 653-2997, email: info@mathworks.com. The student edition of MATLAB has been published by Prentice Hall, Englewood Cliffs, NJ 07632. A 5 1/4 inch disk is included with the book for MS-DOS personal computers.
For more information on accessing mathematical software electronically, the paper "Distribution of Mathematical Software via Electronic Mail" by J. Dongarra and E. Grosse, Communications of the ACM, 30(5) (1987), 403-407, is worth reading. Finally, a nice survey of the blocked algorithms has been given by James Demmel (1993).
Exercises on Chapter 4
1. Develop an algorithm to compute the product C = AB in each of the following cases. Your algorithm should take advantage of the special structure of the matrices in each case. Give the flop-count and storage requirement in each case.
(a) A and B are both lower triangular matrices.
(b) A is arbitrary and B is lower triangular.
(c) A and B are both tridiagonal.
(d) A is arbitrary and B is upper Hessenberg.
(e) A is upper Hessenberg and B is tridiagonal.
(f) A is upper Hessenberg and B is upper triangular.
2. A square matrix $A = (a_{ij})$ is said to be a band matrix of bandwidth $2k + 1$ if
$$a_{ij} = 0 \quad \text{whenever } |i - j| > k.$$
Develop an algorithm to compute the product C = AB, where A is arbitrary and B is a band matrix of bandwidth 2, taking advantage of the structure of the matrix B. Overwrite A with AB and give the flop-count.
3. Using the ideas in Algorithm 4.2.1, develop an algorithm to compute the product $A(I + xy^T)$, where A is an $n \times n$ matrix and x and y are n-vectors. Your algorithm should require roughly $2n^2$ flops.
4. Rewrite the algorithm of problem #3 in the special cases when the matrix $I + xy^T$ is
(a) an elementary matrix: $I + me_k^T$, $m = (0, 0, \ldots, 0, m_{k+1,k}, \ldots, m_{n,k})^T$, and $e_k^T$ is the kth row of I.
(b) a Householder matrix: $I - \frac{2uu^T}{u^Tu}$, where u is an n-vector.
5. Let A and B be two symmetric matrices of the same order. Develop an algorithm to compute C = A + B, taking advantage of symmetry for each matrix. Your algorithm should overwrite B with C. What is the flop-count?
6. Let $A = (a_{ij})$ be an unreduced lower Hessenberg matrix of order n. Then, given the first row $r_1$, it can be shown (Datta and Datta [1976]) that the successive rows $r_2$ through $r_n$ of $A^k$ (k is a positive integer $\le n$) can be computed recursively as follows:
$$r_{i+1} = \frac{r_i B_i - \sum_{j=1}^{i-1} a_{ij} r_j}{a_{i,i+1}}, \quad i = 1, 2, \ldots, n-1, \quad \text{where } B_i = A - a_{ii} I.$$
Develop an algorithm to implement this. Give the flop-count for the algorithm.
7. Let $a_r$ denote the rth column of the matrix A and $b_r^T$ the rth row of the matrix B. Then develop an algorithm to compute the product AB from the formula
$$AB = \sum_{i=1}^{n} a_i b_i^T.$$
Give the flop-count and storage requirement of the algorithm.
8. Consider the matrix
$$A = \begin{pmatrix}
12 & 11 & 10 & \cdots & 3 & 2 & 1 \\
11 & 11 & 10 & \cdots & 3 & 2 & 1 \\
0 & 10 & 10 & \ddots & & & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & & \vdots \\
\vdots & & \ddots & \ddots & \ddots & 2 & \vdots \\
\vdots & & & \ddots & 2 & 2 & 1 \\
0 & \cdots & \cdots & \cdots & 0 & 1 & 1
\end{pmatrix}.$$
Find the eigenvalues of this matrix. Use MATLAB. Now perturb the (1,12) element to $10^{-9}$ and compute the eigenvalues of this perturbed matrix. What conclusion do you make about the conditioning of the eigenvalues?
MATLAB AND MATCOM PROGRAMS AND PROBLEMS ON CHAPTER 4
You will need the program housmul from MATCOM.
1. Using the MATLAB function `rand', create a $5 \times 5$ random matrix and then print out the following outputs: A(2,:), A(:,1), A(:,5), A(1, 1:2:5), A([1, 5]), A(4:-1:1, 5:-1:1).
2. Using the function `for', write a MATLAB program to find the inner product and outer product of two n-vectors u and v.
[s] = inpro(u,v)
[A] = outpro(u,v)
Test your program by creating two different vectors u and v using rand(4,1).
3. Learn how to use the following MATLAB commands to create special matrices:
compan      companion matrix
diag        diagonal matrix
ones        constant matrix
zeros       zero matrix
rand        random matrix
tri         triangular part
hankel      Hankel matrix
toeplitz    Toeplitz matrix
hilb        Hilbert matrix
triu        upper triangular
vander      Vandermonde matrix
4. Write MATLAB programs to create the following well-known matrices:
(a) [A] = wilk(n) to create the Wilkinson bidiagonal matrix $A = (a_{ij})$ of order n:
$$a_{ii} = n - i + 1, \quad i = 1, 2, \ldots, 20,$$
$$a_{i-1,i} = n, \quad i = 2, 3, \ldots, n,$$
$$a_{ij} = 0 \quad \text{otherwise}.$$
(b) [A] = pie(n) to create the Pie matrix $A = (a_{ij})$ of order n:
$$a_{ii} = \alpha, \text{ where } \alpha \text{ is a parameter near } 1 \text{ or } n - 1,$$
$$a_{ij} = 1 \quad \text{for } i \ne j.$$
5. Using "help" commands for "flops", "clock", "etime", etc., learn how to measure flop-count and timing for an algorithm.
6. Using MATLAB functions `for', `size', `zeros', write a MATLAB program to find the product of two upper triangular matrices A and B of order $m \times n$ and $n \times p$, respectively. Test your program using
A = triu(rand (4,3)), B = triu(rand (3,3)).
7. Run the MATLAB program housmul(A, u) from MATCOM by creating a random matrix A of order $6 \times 3$ and a random vector u with six elements. Print the output and the number of flops and elapsed time.
8. Modify the MATLAB program housmul to housxy(A, x, y) to compute the product $(I + xy^T)A$. Test your program by creating a $15 \times 15$ random matrix A and the vectors x and y of appropriate dimensions. Print the product and the number of flops and elapsed time.
5. SOME USEFUL TRANSFORMATIONS IN NUMERICAL LINEAR ALGEBRA AND THEIR APPLICATIONS
5.1 A Computational Methodology in Numerical Linear Algebra
5.2 Elementary Matrices and LU Factorization
5.2.1 Gaussian Elimination without Pivoting
5.2.2 Gaussian Elimination with Partial Pivoting
5.2.3 Gaussian Elimination with Complete Pivoting
5.3 Stability of Gaussian Elimination
5.4 Householder Transformations
5.4.1 Householder Matrices and QR Factorization
5.4.2 Householder QR Factorization of a Non-Square Matrix
5.4.3 Householder Matrices and Reduction to Hessenberg Form
5.5 Givens Matrices
5.5.1 Givens Rotations and QR Factorization
5.5.2 Uniqueness in QR Factorization
5.5.3 Givens Rotations and Reduction to Hessenberg Form
5.5.4 Uniqueness in Hessenberg Reduction
5.6 Orthonormal Bases and Orthogonal Projections
5.7 QR Factorization with Column Pivoting
5.8 Modifying a QR Factorization
5.9 Summary and Table of Comparisons
5.10 Suggestions for Further Reading
Objectives
The major objective of this chapter is to introduce fundamental tools such as elementary, Householder, and Givens matrices and their applications. Here are some of the highlights of the chapter.
- Various LU-type matrix factorizations: LU factorization using Gaussian elimination without pivoting (Section 5.2.1), MA = U factorization using Gaussian elimination with partial pivoting (Section 5.2.2), and MAQ = U factorization using Gaussian elimination with complete pivoting (Section 5.2.3).
- QR factorization using Householder and Givens matrices (Sections 5.4.1 and 5.5.1).
- Reduction to Hessenberg form by orthogonal similarity using Householder and Givens matrices (Sections 5.4.3 and 5.5.3).
- Computations of orthonormal bases and orthogonal projections using QR factorizations (Section 5.6).
Background Material Needed for this Chapter
The following background material and tools developed in earlier chapters will be needed for comprehension of this chapter.
1. Subspace and basis (Section 1.2.1)
2. Rank properties (Section 1.3.2)
3. Orthogonality and projections: Orthonormal basis and orthogonal projections (Sections 1.3.5 and 1.3.6)
4. Special matrices: Triangular, permutation, Hessenberg, orthogonal (Section 1.4)
5. Basic Gaussian elimination (Algorithm 3.1.5)
6. Stability concepts of algorithms (Section 3.2)
5.1 A Computational Methodology in Numerical Linear Algebra
Most computational algorithms to be presented in this book have a common basic structure that can be described as follows:
1. The problem is first transformed to a "reduced" problem.
2. The reduced problem is then solved exploiting the special structure exhibited by the problem.
3. Finally, the solution of the original problem is recovered from the solution of the reduced problem.
The reduced problem typically involves a "condensed" form or forms of the matrix A, such as triangular, Hessenberg (almost triangular), tridiagonal, Real Schur Form (quasi-triangular), or bidiagonal. It is the structures of these condensed forms which are exploited in the solution of the reduced problem. For example, the solution of the linear system Ax = b is usually obtained by first triangularizing the matrix A, and then solving an equivalent triangular system. In eigenvalue computations, the matrix A is transformed to a Hessenberg form before applying the QR iterations. To compute the singular value decomposition, the matrix A is first transformed to a bidiagonal matrix, and then the singular values of the bidiagonal matrix are computed. These condensed forms are normally achieved through a series of transformations known as elementary, Householder or Givens transformations. We will study these transformations here and show how they can be applied to achieve various condensed forms.
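As a small illustration of the "solve the reduced problem" stage for Ax = b, the sketch below solves the two triangular systems that remain once a factorization A = LU is known. The function names are hypothetical, exact rational arithmetic is used only to keep the illustration free of rounding, and the factors in the usage note are simply assumed to be given.

```python
from fractions import Fraction

def forward_substitution(L, b):
    """Solve L y = b for a lower triangular L with nonzero diagonal."""
    y = []
    for i, row in enumerate(L):
        s = sum(row[j] * y[j] for j in range(i))
        y.append((b[i] - s) / row[i])
    return y

def back_substitution(U, y):
    """Solve U x = y for an upper triangular U with nonzero diagonal."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

def solve_with_factors(L, U, b):
    """Recover the solution of (L U) x = b from the two triangular solves."""
    return back_substitution(U, forward_substitution(L, b))
```

For instance, with L = [[1, 0, 0], [2, 1, 0], [1/2, 1, 1]], U = [[2, 2, 3], [0, 1, 0], [0, 0, 5/2]] (entries given as Fraction values) and b = (7, 15, 7), the two solves return x = (1, 1, 1).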
5.2 Elementary Matrices and LU Factorization
In this section we show how to triangularize a matrix A using the classical elimination scheme known as the Gaussian elimination scheme. The tools of Gaussian elimination are elementary matrices.
Definition 5.2.1 An elementary lower triangular matrix of order n is a matrix of the form
$$E = \begin{pmatrix}
1 & 0 & \cdots & \cdots & \cdots & \cdots & 0 \\
0 & 1 & & & & & \vdots \\
\vdots & & \ddots & & & & \vdots \\
\vdots & & & 1 & & & \vdots \\
\vdots & & & m_{k+1,k} & 1 & & \vdots \\
\vdots & & & \vdots & & \ddots & 0 \\
0 & \cdots & 0 & m_{n,k} & 0 & \cdots & 1
\end{pmatrix}.$$
Thus, it is an identity matrix except possibly for a few nonzero elements below the diagonal of a single column. If the nonzero elements lie in the kth column, then E has the form:
$$E = I + m e_k^T,$$
where I is the identity matrix of order n, $m = (0, 0, \ldots, 0, m_{k+1,k}, \ldots, m_{n,k})^T$, and $e_k$ is the kth unit vector.
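Because $E = I + m e_k^T$ is a rank-one modification of the identity, it can be applied to a vector without ever being formed. The following sketch (function names are illustrative assumptions) uses $Ex = x + x_k m$, and also $E^{-1}x = x - x_k m$, which holds because the kth component of m is zero.

```python
def apply_elementary(m, k, x):
    """Return (I + m e_k^T) x = x + x_k * m.

    k is 1-based; the first k entries of m are assumed to be zero.
    """
    xk = x[k - 1]
    return [xi + xk * mi for xi, mi in zip(x, m)]

def apply_inverse(m, k, x):
    """Return (I - m e_k^T) x = x - x_k * m, which undoes apply_elementary."""
    xk = x[k - 1]
    return [xi - xk * mi for xi, mi in zip(x, m)]
```

Each application costs only as many operations as there are nonzero multipliers, rather than the $n^2$ operations of a general matrix-vector product.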
Elementary matrices can be very conveniently used to create zeros in a vector, as shown in the following lemma.
Lemma 5.2.1 Given
$$a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}, \quad a_1 \ne 0,$$
there is an elementary matrix E such that Ea is a multiple of $e_1$.
Proof. Define
$$E = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
-\frac{a_2}{a_1} & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{a_n}{a_1} & 0 & \cdots & 1
\end{pmatrix}.$$
Then E is an elementary lower triangular matrix and is such that
$$Ea = \begin{pmatrix} a_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
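The construction in the proof of Lemma 5.2.1 can be checked numerically. The sketch below is only an illustration (the function names are assumptions, and exact rational arithmetic is used so that the zeros come out exactly): it builds E from a vector a with nonzero first entry and applies it.

```python
from fractions import Fraction

def zeroing_matrix(a):
    """Build the elementary matrix E of Lemma 5.2.1, so that E a = a_1 e_1."""
    if a[0] == 0:
        raise ValueError("the first entry of a must be nonzero")
    n = len(a)
    E = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for i in range(1, n):
        E[i][0] = -Fraction(a[i]) / Fraction(a[0])   # entry -a_i/a_1 in column 1
    return E

def matvec(E, a):
    """Plain matrix-vector product E a."""
    return [sum(eij * aj for eij, aj in zip(row, a)) for row in E]
```

For a = (2, 4, 1) the first column of E below the diagonal is (-2, -1/2), and the product Ea is (2, 0, 0).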
Definition 5.2.2 The elements $m_i = -\frac{a_i}{a_1}$, $i = 2, \ldots, n$ are called multipliers.
5.2.1 Gaussian Elimination without Pivoting
We described the basic Gaussian elimination scheme (Gaussian elimination without pivoting) in Chapter 3 (Algorithm 3.1.5). We will see in this section that this process yields an LU factorization of A, A = LU, whenever the process can be carried to completion. The key observation is that the matrix $A^{(k)}$ is the result of premultiplication of $A^{(k-1)}$ by a suitable elementary lower triangular matrix.
Set $A = A^{(0)}$.
Step 1. Find an elementary matrix $E_1$ such that $A^{(1)} = E_1 A$ has zeros below the (1,1) entry in the first column. That is, $A^{(1)}$ has the form
$$A^{(1)} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22}^{(1)} & \cdots & a_{2n}^{(1)} \\
\vdots & \vdots & & \vdots \\
0 & a_{n2}^{(1)} & \cdots & a_{nn}^{(1)}
\end{pmatrix}.$$
Note that it is sufficient to find $E_1$ such that
$$E_1 \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{pmatrix} = \begin{pmatrix} a_{11} \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Then $A^{(1)} = E_1 A$ will have the above form and is the same as the matrix $A^{(1)}$ obtained at the end of step 1 of Algorithm 3.1.5.
Record the multipliers: $m_{21}, m_{31}, \ldots, m_{n1}$; $m_{i1} = -\frac{a_{i1}}{a_{11}}$, $i = 2, \ldots, n$.
Step 2. Find an elementary matrix $E_2$ such that $A^{(2)} = E_2 A^{(1)}$ has zeros below the (2,2) entry in the second column. The matrix $E_2$ can be constructed as follows: First, find an elementary matrix $\hat{E}_2$ of order (n - 1) such that
$$\hat{E}_2 \begin{pmatrix} a_{22}^{(1)} \\ a_{32}^{(1)} \\ \vdots \\ a_{n2}^{(1)} \end{pmatrix} = \begin{pmatrix} a_{22}^{(1)} \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Then define
$$E_2 = \begin{pmatrix} 1 & 0 \\ 0 & \hat{E}_2 \end{pmatrix}.$$
$A^{(2)} = E_2 A^{(1)}$ will then have zeros below the (2,2) entry in the second column:
$$A^{(2)} = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & a_{22}^{(1)} & a_{23}^{(1)} & \cdots & a_{2n}^{(1)} \\
0 & 0 & a_{33}^{(2)} & \cdots & a_{3n}^{(2)} \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & a_{n3}^{(2)} & \cdots & a_{nn}^{(2)}
\end{pmatrix}.$$
Record the multipliers: $m_{32}, \ldots, m_{n2}$; $m_{i2} = -\frac{a_{i2}^{(1)}}{a_{22}^{(1)}}$, $i = 3, \ldots, n$.
Note that premultiplication of $A^{(1)}$ by $E_2$ does not destroy zeros already created in $A^{(1)}$. This matrix $A^{(2)}$ is the same as the matrix $A^{(2)}$ of Algorithm 3.1.5.
Step k. In general, at the kth step, an elementary matrix $E_k$ is found such that $A^{(k)} = E_k A^{(k-1)}$ has zeros below the (k, k) entry in the kth column. $E_k$ is computed in two successive steps. First, an elementary matrix $\hat{E}_k$ of order n - k + 1 is constructed such that
$$\hat{E}_k \begin{pmatrix} a_{kk}^{(k-1)} \\ a_{k+1,k}^{(k-1)} \\ \vdots \\ a_{nk}^{(k-1)} \end{pmatrix} = \begin{pmatrix} a_{kk}^{(k-1)} \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$
and then $E_k$ is defined as
$$E_k = \begin{pmatrix} I_{k-1} & 0 \\ 0 & \hat{E}_k \end{pmatrix}.$$
Here $I_{k-1}$ is the matrix of the first (k - 1) rows and columns of the $n \times n$ identity matrix I. The matrix $A^{(k)} = E_k A^{(k-1)}$ is the same as the matrix $A^{(k)}$ of Algorithm 3.1.5.
Record the multipliers:
$$m_{k+1,k}, \ldots, m_{nk}; \quad m_{ik} = -\frac{a_{ik}^{(k-1)}}{a_{kk}^{(k-1)}}, \quad i = k + 1, \ldots, n.$$
Step n-1. At the end of the (n - 1)th step, the matrix $A^{(n-1)}$ is upper triangular and is the same as the matrix $A^{(n-1)}$ of Algorithm 3.1.5:
$$A^{(n-1)} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & \cdots & a_{1n} \\
0 & a_{22}^{(1)} & \cdots & \cdots & a_{2n}^{(1)} \\
0 & 0 & a_{33}^{(2)} & \cdots & a_{3n}^{(2)} \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & a_{nn}^{(n-1)}
\end{pmatrix}.$$
Obtaining L and U .
$$A^{(n-1)} = E_{n-1} A^{(n-2)} = E_{n-1} E_{n-2} A^{(n-3)} = \cdots = E_{n-1} E_{n-2} E_{n-3} \cdots E_2 E_1 A.$$
Set
$$U = A^{(n-1)}, \quad L_1 = E_{n-1} E_{n-2} \cdots E_2 E_1.$$
Then from above we have
$$U = L_1 A.$$
Since each $E_k$ is a unit lower triangular matrix (a lower triangular matrix having 1's along the diagonal), so is the matrix $L_1$ and, therefore, $L_1^{-1}$ exists. (Note that the product of two triangular matrices of one type is a triangular matrix of the same type.) Set $L = L_1^{-1}$. Then the equation $U = L_1 A$ becomes
$$A = LU.$$
This factorization of A is known as LU factorization.
Definition 5.2.3 The entries $a_{11}, a_{22}^{(1)}, \ldots, a_{nn}^{(n-1)}$ are called pivots, and the above process of obtaining LU factorization is known as Gaussian elimination without row interchanges. It is commonly known as Gaussian elimination without pivoting.
An Explicit Expression for L
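Since $L = L_1^{-1} = E_1^{-1} E_2^{-1} \cdots E_{n-1}^{-1}$ and $E_i^{-1} = I - m e_i^T$, where
$$m = (0, 0, \ldots, 0, m_{i+1,i}, \ldots, m_{n,i})^T,$$
we have
$$L = \begin{pmatrix}
1 & 0 & \cdots & \cdots & 0 \\
-m_{21} & 1 & 0 & \cdots & 0 \\
-m_{31} & -m_{32} & 1 & \ddots & \vdots \\
\vdots & \vdots & \ddots & \ddots & 0 \\
-m_{n1} & -m_{n2} & \cdots & -m_{n,n-1} & 1
\end{pmatrix}.$$
Thus, L can be formed without finding any matrix inverse.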
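The explicit expression for L can be verified on a small case. In the sketch below, the helper names and the dictionary layout for the multipliers are assumptions made for illustration, and the sample multipliers $m_{21} = -2$, $m_{31} = -1/2$, $m_{32} = -1$ are chosen only as test data. L is assembled by placing $-m_{ik}$ in position (i, k), and the code then checks that it really is the inverse of $L_1 = E_2 E_1$.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def elementary(n, k, mult):
    """E_k = I + m e_k^T; mult maps the 1-based row index i to m_ik."""
    E = identity(n)
    for i, m in mult.items():
        E[i - 1][k - 1] = Fraction(m)
    return E

def explicit_L(n, multipliers):
    """Place -m_ik in position (i, k); multipliers[k] maps i to m_ik."""
    L = identity(n)
    for k, mult in multipliers.items():
        for i, m in mult.items():
            L[i - 1][k - 1] = -Fraction(m)
    return L

multipliers = {1: {2: -2, 3: Fraction(-1, 2)}, 2: {3: -1}}
L = explicit_L(3, multipliers)
L1 = matmul(elementary(3, 2, multipliers[2]), elementary(3, 1, multipliers[1]))
```

Here the product of L and L1 is the identity, although no inverse was computed to obtain L.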
Existence and Uniqueness of LU Factorization
It is very important to note that for an LU factorization to exist, the pivots must be different from zero. Thus, LU factorization may not exist even for a very simple matrix. Take $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. It can be shown (exercise #3) that if $A_r$ is the submatrix of A consisting of the first r rows and columns, then
$$\det A_r = a_{11} a_{22}^{(1)} \cdots a_{rr}^{(r-1)}.$$
Karl Friedrich Gauss (1777-1855) was a German mathematician and astronomer, noted for development of many classical mathematical theories, and for his calculation of the orbits of the asteroids Ceres and Pallas. Gauss is still regarded as one of the greatest mathematicians the world has ever produced.
Thus, if $\det A_r$, $r = 1, 2, \ldots, n$, is nonzero, then an LU factorization always exists. Indeed, the LU factorization in this case is unique, for if
$$A = L_1 U_1 = L_2 U_2,$$
then, since $L_1$ and $L_2$ and the matrix A are nonsingular, from
$$\det A = \det(L_1 U_1) = \det L_1 \det U_1 \quad \text{and} \quad \det A = \det(L_2 U_2) = \det L_2 \det U_2$$
it follows that $U_1$ and $U_2$ are also nonsingular. Hence
$$L_2^{-1} L_1 = U_2 U_1^{-1},$$
where $L_2^{-1} L_1$ is the product of two unit lower triangular matrices and is therefore unit lower triangular; $U_2 U_1^{-1}$ is the product of two upper triangular matrices and is therefore upper triangular. Since a unit lower triangular matrix can be equal to an upper triangular matrix only if both are the identity, we have
$$L_1 = L_2, \quad U_1 = U_2.$$
Theorem 5.2.1 (LU Theorem) Let A be an $n \times n$ matrix with all nonzero leading principal minors. Then A can be decomposed uniquely in the form
A = LU
where L is unit lower triangular and U is upper triangular.
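The relation between pivots and leading principal minors that underlies the theorem can be checked on small matrices. The sketch below is an illustration under stated assumptions: the function names are hypothetical, determinants are computed by cofactor expansion (practical only for tiny matrices), and exact rational arithmetic is used.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row (tiny matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def leading_minors(A):
    """det(A_r) for r = 1, ..., n, where A_r is the leading r-by-r submatrix."""
    return [det([row[:r] for row in A[:r]]) for r in range(1, len(A) + 1)]

def pivots_without_pivoting(A):
    """Return the pivots a11, a22^(1), ..., or None as soon as a pivot is zero."""
    A = [[Fraction(x) for x in row] for row in A]
    n = len(A)
    pivots = []
    for k in range(n):
        if A[k][k] == 0:
            return None
        pivots.append(A[k][k])
        for i in range(k + 1, n):
            m = -A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] += m * A[k][j]
    return pivots

def running_products(values):
    """Partial products v1, v1*v2, v1*v2*v3, ..."""
    out, p = [], Fraction(1)
    for v in values:
        p *= v
        out.append(p)
    return out
```

For the matrix with rows (2, 2, 3), (4, 5, 6), (1, 2, 4) the running products of the pivots agree with the leading principal minors, while for the matrix with rows (0, 1), (1, 0) the first pivot is already zero.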
Remark: Note that in the above theorem, if the diagonal entries of L are not specified, then the factorization is not unique.
Algorithm 5.2.1 LU Factorization Using Elementary Matrices
Given an $n \times n$ matrix A, the following algorithm computes, whenever possible, elementary matrices
$$E_1, E_2, \ldots, E_{n-1}$$
and an upper triangular matrix U such that with
$$L = E_1^{-1} E_2^{-1} \cdots E_{n-1}^{-1},$$
A = LU. The algorithm overwrites A with U.
For k = 1, 2, ..., n - 1 do
1. If $a_{kk} = 0$, stop.
2. Find an elementary matrix $\hat{E}_k = I + m e_k^T$, of order (n - k + 1), where
$$m = (0, 0, \ldots, 0, m_{k+1,k}, \ldots, m_{n,k})^T,$$
such that
$$\hat{E}_k \begin{pmatrix} a_{kk} \\ \vdots \\ a_{nk} \end{pmatrix} = \begin{pmatrix} a_{kk} \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
3. Define $E_k = \begin{pmatrix} I_{k-1} & 0 \\ 0 & \hat{E}_k \end{pmatrix}$, where $I_{k-1}$ is the matrix of the first (k - 1) rows and columns of the $n \times n$ identity matrix I.
4. Save the multipliers
$$m_{k+1,k}, \ldots, m_{n,k}.$$
Overwrite $a_{ik}$ with $m_{ik}$, $i = k + 1, \ldots, n$.
5. Compute $A^{(k)} = E_k A$.
6. Overwrite A with $A^{(k)}$.
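Algorithm 5.2.1 can be sketched directly in code. The version below is a minimal illustration rather than the book's MATCOM implementation: the function names are assumptions, each $E_k$ is formed as a full $n \times n$ matrix for clarity (which is wasteful), L is assembled from the negated multipliers, and exact rational arithmetic is used.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def lu_by_elementary_matrices(A):
    """Return (L, U) with A = L U, premultiplying by E_1, ..., E_{n-1} in turn."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    L = identity(n)
    for k in range(n - 1):
        if A[k][k] == 0:
            raise ZeroDivisionError("zero pivot: elimination without pivoting stops")
        E = identity(n)
        for i in range(k + 1, n):
            m = -A[i][k] / A[k][k]     # multiplier m_ik
            E[i][k] = m                # E_k carries the multipliers in column k
            L[i][k] = -m               # L carries their negatives
        A = matmul(E, A)               # A^(k) = E_k A^(k-1)
    return L, A
```

Applied to the matrix with rows (2, 2, 3), (4, 5, 6), (1, 2, 4), the function returns a unit lower triangular L and an upper triangular U whose product reproduces the matrix.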
Example 5.2.1
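Find an LU factorization of
$$A = \begin{pmatrix} 2 & 2 & 3 \\ 4 & 5 & 6 \\ 1 & 2 & 4 \end{pmatrix}.$$
Step 1. Compute $E_1$. The multipliers are: $m_{21} = -2$, $m_{31} = -\frac{1}{2}$.
$$E_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -\frac{1}{2} & 0 & 1 \end{pmatrix}$$
$$A \equiv A^{(1)} = E_1 A = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -\frac{1}{2} & 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 2 & 3 \\ 4 & 5 & 6 \\ 1 & 2 & 4 \end{pmatrix} = \begin{pmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 1 & \frac{5}{2} \end{pmatrix}$$
Step 2. Compute $E_2$. The multiplier is: $m_{32} = -1$.
$$\hat{E}_2 = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}, \quad E_2 = \begin{pmatrix} 1 & 0 \\ 0 & \hat{E}_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix}$$
$$A \equiv A^{(2)} = E_2 A^{(1)} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix} \begin{pmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 1 & \frac{5}{2} \end{pmatrix} = \begin{pmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{5}{2} \end{pmatrix}$$
Thus
$$U = \begin{pmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{5}{2} \end{pmatrix}.$$
Compute L:
$$L_1 = E_2 E_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ \frac{3}{2} & -1 & 1 \end{pmatrix}$$
$$L = L_1^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ -m_{21} & 1 & 0 \\ -m_{31} & -m_{32} & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ \frac{1}{2} & 1 & 1 \end{pmatrix}$$
(Note that neither $L_1$ nor its inverse needs to be computed explicitly.)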
Forming the Matrix L and Other Computational Details
1. Each elementary matrix $E_k$ is uniquely determined by (n - k) multipliers
$$m_{k+1,k}, \ldots, m_{nk}.$$
Thus to construct $E_k$, it is sufficient to save these multipliers.
2. The premultiplication of $A^{(k-1)}$ by $E_k$ affects only the rows (k + 1) through n. The first k rows of the product remain the same as those of $A^{(k-1)}$ and only the last (n - k) rows are modified. Thus, if $E_k A^{(k-1)} = A^{(k)} = (a_{ij}^{(k)})$, then
(a) $a_{ij}^{(k)} = a_{ij}^{(k-1)}$ (i = 1, 2, ..., k; j = 1, 2, ..., n) (the first k rows).
(b) $a_{ij}^{(k)} = a_{ij}^{(k-1)} + m_{ik} a_{kj}^{(k-1)}$ (i = k + 1, ..., n; j = k + 1, ..., n) (the last (n - k) rows).
(Note that this is how the entries of $A^{(k)}$ were obtained from those of $A^{(k-1)}$ in Algorithm 3.1.5.) For example, let n = 3.
$$E_1 = \begin{pmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & 0 & 1 \end{pmatrix}, \quad E_1 A = A^{(1)} = (a_{ij}^{(1)}),$$
$$E_1 A = \begin{pmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22}^{(1)} & a_{23}^{(1)} \\ 0 & a_{32}^{(1)} & a_{33}^{(1)} \end{pmatrix},$$
where
$$a_{22}^{(1)} = a_{22} + m_{21} a_{12}, \quad a_{23}^{(1)} = a_{23} + m_{21} a_{13},$$
and so on.
3. As soon as $A^{(k)}$ is formed, it can overwrite A.
4. The vector $(m_{k+1,k}, \ldots, m_{nk})$ has (n - k) elements and at the kth step exactly (n - k) zeros are produced; an obvious scheme of storage will then be to store these (n - k) elements in the positions (k + 1, k), (k + 2, k), ..., (n, k) of A below the diagonal.
5. The entries of the upper triangular matrix U then can be stored in the upper half part of A including the diagonal. With this storage scheme, the matrix $A^{(n-1)}$ at the end of the (n - 1)th step will be
$$A \equiv A^{(n-1)} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & \cdots & a_{1n} \\
-m_{21} & a_{22}^{(1)} & \cdots & \cdots & a_{2n}^{(1)} \\
\vdots & \ddots & \ddots & & \vdots \\
\vdots & & \ddots & \ddots & \vdots \\
-m_{n1} & -m_{n2} & \cdots & -m_{n,n-1} & a_{nn}^{(n-1)}
\end{pmatrix}.$$
Thus a typical step of Gaussian elimination for LU factorization consists of (1) forming the multipliers and storing them in the appropriate places below diagonal, (2) updating the entries of the rows (k + 1) through n and saving them in the upper half of A. Based on the above discussion, we now present the following algorithm.
Algorithm 5.2.2 Triangularization Using Gaussian Elimination without Pivoting
Let A be an $n \times n$ matrix. The following algorithm computes triangularization of A, whenever it exists. The algorithm overwrites the upper triangular part of A including the diagonal with U, and the entries of A below the diagonal are overwritten with multipliers needed to compute L.
For k = 1, 2, ..., (n - 1) do
1. (Form the multipliers) $a_{ik} \leftarrow m_{ik} = -\frac{a_{ik}}{a_{kk}}$ (i = k + 1, k + 2, ..., n)
2. (Update the entries) $a_{ij} \leftarrow a_{ij} + m_{ik} a_{kj}$ (i = k + 1, ..., n; j = k + 1, ..., n).
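A direct transcription of Algorithm 5.2.2 is sketched below. The function names are illustrative assumptions, the input is expected as a list of rows of Fraction values so that no rounding occurs, and a zero pivot raises an error because the triangularization does not exist in that case. The multipliers $m_{ik}$ themselves are stored below the diagonal, as the algorithm states, so L is recovered by negating them.

```python
from fractions import Fraction

def lu_in_place(A):
    """Overwrite A: U on and above the diagonal, multipliers m_ik below it."""
    n = len(A)
    for k in range(n - 1):
        if A[k][k] == 0:
            raise ZeroDivisionError("zero pivot: triangularization does not exist")
        for i in range(k + 1, n):
            A[i][k] = -A[i][k] / A[k][k]            # 1. form the multipliers
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                A[i][j] += A[i][k] * A[k][j]        # 2. update the entries
    return A

def unpack(A):
    """Split the packed array into unit lower triangular L and upper triangular U."""
    n = len(A)
    L = [[Fraction(1) if i == j else (-A[i][j] if j in range(i) else Fraction(0))
          for j in range(n)] for i in range(n)]
    U = [[A[i][j] if j in range(i, n) else Fraction(0) for j in range(n)]
         for i in range(n)]
    return L, U
```

No storage beyond the array A itself is needed, which is the point of the storage scheme described above.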
Remark: The algorithm does not give the matrix L explicitly; however, it can be formed out of the multipliers saved at each step, as shown earlier (see the explicit expression for L).
Flop-Count. The algorithm requires roughly $\frac{n^3}{3}$ flops. This can be seen as follows: At step 1, we compute (n - 1) multipliers and update $(n-1)^2$ entries of A. Each multiplier requires one flop and updating each entry also requires 1 flop. Thus, step 1 requires $[(n-1)^2 + (n-1)]$ flops. Step 2, similarly, requires $[(n-2)^2 + (n-2)]$ flops, and so on. In general, step k requires $[(n-k)^2 + (n-k)]$ flops. Since there are (n - 1) steps, we have
$$\text{Total flops} = \sum_{k=1}^{n-1} (n-k)^2 + \sum_{k=1}^{n-1} (n-k) = \frac{n(n-1)(2n-1)}{6} + \frac{n(n-1)}{2} \simeq \frac{n^3}{3} + O(n^2).$$
Recall
1. $1^2 + 2^2 + \cdots + r^2 = \frac{r(r+1)(2r+1)}{6}$
2. $1 + 2 + \cdots + r = \frac{r(r+1)}{2}$
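The flop-count can be confirmed by simply tallying the operations step by step. The sketch below (function names are illustrative) counts one flop per multiplier and one per updated entry, exactly as in the argument above, and compares the total with the closed form.

```python
def elimination_flops(n):
    """Tally the flops of Gaussian elimination without pivoting, step by step."""
    flops = 0
    for k in range(1, n):
        flops += n - k            # one flop for each of the (n - k) multipliers
        flops += (n - k) ** 2     # one flop for each of the (n - k)^2 updated entries
    return flops

def closed_form(n):
    """n(n-1)(2n-1)/6 + n(n-1)/2, which simplifies to (n^3 - n)/3."""
    return n * (n - 1) * (2 * n - 1) // 6 + n * (n - 1) // 2
```

For n = 1000 the tally is 333333000, which differs from $n^3/3$ by a relative amount of about one part in a million.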
Gaussian Elimination for a Rectangular Matrix
The Gaussian elimination process described above for an $n \times n$ matrix A can be easily extended to an $m \times n$ matrix to compute its LU factorization, when it exists. The process is identical; only the number of steps in this case is $k = \min\{m - 1, n\}$. The following is an illustrative example. Let
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}, \quad m = 3, \quad n = 2.$$
The number of steps is k = min(2, 2) = 2.
Step 1. The multipliers are $m_{21} = -3$, $m_{31} = -5$.
$$a_{22} \equiv a_{22}^{(1)} = a_{22} + m_{21} a_{12} = -2, \quad a_{32} \equiv a_{32}^{(1)} = a_{32} + m_{31} a_{12} = -4$$
$$A \equiv A^{(1)} = \begin{pmatrix} 1 & 2 \\ 0 & -2 \\ 0 & -4 \end{pmatrix}.$$
Step 2. The multiplier is $m_{32} = -2$, and $a_{32} \equiv a_{32}^{(2)} = 0$.
$$A \equiv A^{(2)} = \begin{pmatrix} 1 & 2 \\ 0 & -2 \\ 0 & 0 \end{pmatrix}, \quad U = \begin{pmatrix} 1 & 2 \\ 0 & -2 \\ 0 & 0 \end{pmatrix}.$$
Note that U in this case is upper trapezoidal rather than an upper triangular matrix.
$$L = \begin{pmatrix} 1 & 0 & 0 \\ -m_{21} & 1 & 0 \\ -m_{31} & -m_{32} & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 5 & 2 & 1 \end{pmatrix}.$$
Verify that
$$LU = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 5 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 0 & -2 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix} = A.$$
Difficulties with Gaussian Elimination without Pivoting
As we have seen before, Gaussian elimination without pivoting fails if any of the pivots is zero. However, it is worse yet if any pivot becomes close to zero: in this case the method can be carried to completion, but the obtained results may be totally wrong. Consider the following celebrated example from Forsythe and Moler (CSLAS, p. 34). Let Gaussian elimination without pivoting be applied to
$$A = \begin{pmatrix} 0.0001 & 1 \\ 1 & 1 \end{pmatrix}$$
and use three-digit arithmetic. There is only one step. We have:
$$\text{multiplier } m_{21} = -\frac{1}{10^{-4}} = -10^4,$$
$$U = A^{(1)} = \begin{pmatrix} 0.0001 & 1 \\ 0 & -10^4 \end{pmatrix}, \qquad L = \begin{pmatrix} 1 & 0 \\ 10^4 & 1 \end{pmatrix}.$$
The product of the computed $L$ and $U$ gives
$$LU = \begin{pmatrix} 0.0001 & 1 \\ 1 & 0 \end{pmatrix},$$
which is different from $A$. Who is to blame? Note that the pivot $a_{11}^{(1)} = 0.0001$ is very close to zero (in three-digit arithmetic). This small pivot gave a large multiplier. The large multiplier, when used to update the entries, eliminated the small entries (e.g., $1 - 10^4$ gave $-10^4$). Fortunately, we can avoid this small pivot just by row interchanges. Consider the matrix with the first row written second and the second written first:
$$A' = \begin{pmatrix} 1 & 1 \\ 0.0001 & 1 \end{pmatrix}.$$
Gaussian elimination now gives
$$U = A^{(1)} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad L = \begin{pmatrix} 1 & 0 \\ 0.0001 & 1 \end{pmatrix}.$$
Note that the pivot in this case is $a_{11}^{(1)} = 1$. The product is
$$LU = \begin{pmatrix} 1 & 1 \\ 0.0001 & 1.0001 \end{pmatrix} = A'$$
to three-digit accuracy.
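The effect of the small pivot can be reproduced with a short Python sketch. It is an illustration only: three-digit arithmetic is simulated with the standard `decimal` module (a context with `prec=3` rounds every operation to three significant digits), and the helper names `eliminate_2x2` and `matmul_2x2` are hypothetical.

```python
from decimal import Decimal, Context

ctx = Context(prec=3)  # each operation is rounded to three significant digits


def eliminate_2x2(a):
    """One step of Gaussian elimination without pivoting on a 2 x 2 matrix."""
    m21 = ctx.divide(-a[1][0], a[0][0])                 # multiplier
    u22 = ctx.add(a[1][1], ctx.multiply(m21, a[0][1]))  # updated (2,2) entry
    L = [[Decimal(1), Decimal(0)], [-m21, Decimal(1)]]
    U = [[a[0][0], a[0][1]], [Decimal(0), u22]]
    return L, U


def matmul_2x2(L, U):
    """Product of two 2 x 2 matrices, again in three-digit arithmetic."""
    return [[ctx.add(ctx.multiply(L[i][0], U[0][j]),
                     ctx.multiply(L[i][1], U[1][j]))
             for j in range(2)] for i in range(2)]


A = [[Decimal("0.0001"), Decimal(1)], [Decimal(1), Decimal(1)]]
L, U = eliminate_2x2(A)
bad_product = matmul_2x2(L, U)      # (2,2) entry comes out as 0 instead of 1

A_swapped = [A[1], A[0]]            # rows interchanged
L2, U2 = eliminate_2x2(A_swapped)
good_product = matmul_2x2(L2, U2)   # agrees with A_swapped to three digits
```

With the rows in their original order, the update $1 - 10^4$ is rounded to $-10^4$ and the information in the $(2,2)$ entry is lost; after the interchange the multiplier is $-0.0001$ and nothing is wiped out.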
5.2.2 Gaussian Elimination with Partial Pivoting
In the example above, we have found a factorization of the matrix $A'$, which is a permuted version of $A$ in the sense that the rows have been swapped. A primary purpose of factorizing a matrix $A$ into $LU$ is to use this factorization to solve a linear system. It is easy to see that the solution of the system $Ax = b$ and that of the system $A'x = b'$, where $b'$ has been obtained in a manner similar to that used to generate $A'$, are the same. Thus, if row interchanges can help avoid a small pivot, it is certainly desirable to do so.

As the above example suggests, disaster in Gaussian elimination without pivoting can perhaps be avoided by identifying a "good pivot" (a pivot as large as possible) at each step, before the process of elimination is applied. The good pivot may be located among the entries in a column or among all the entries in a submatrix of the current matrix. In the former case, since the search is only partial, the method is called partial pivoting; in the latter case, the method is called complete pivoting. It is important to note that the purpose of pivoting is to prevent large growth in the reduced matrices, which can wipe out original data. One way to do this is to keep the multipliers less than one in magnitude, and this is exactly what is accomplished by pivoting. However, large multipliers do not necessarily mean instability (see our discussion of Gaussian elimination without pivoting for symmetric positive definite matrices in Chapter 6).

We first describe Gaussian elimination with partial pivoting. The process consists of $(n-1)$ steps.

Step 1. Scan the first column of $A$ to identify the largest element in magnitude in that column. Let it be $a_{r_1,1}$.
Form a permutation matrix $P_1$ by interchanging the rows 1 and $r_1$ of the identity matrix and leaving the other rows unchanged. Form $P_1A$ by interchanging the rows $r_1$ and 1 of $A$. Find an elementary lower triangular matrix $M_1$ such that $A^{(1)} = M_1P_1A$ has zeros below the (1,1) entry in the first column.
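It is sufficient to construct $M_1$ such that
$$M_1 \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{pmatrix} = \begin{pmatrix} * \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Note that
$$M_1 = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ m_{21} & 1 & 0 & \cdots & 0 \\ m_{31} & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ m_{n1} & 0 & 0 & \cdots & 1 \end{pmatrix},$$
where $m_{21} = -\dfrac{a_{21}}{a_{11}}$, $m_{31} = -\dfrac{a_{31}}{a_{11}}$, $\ldots$, $m_{n1} = -\dfrac{a_{n1}}{a_{11}}$. Note that $a_{ij}$ refers to the $(i,j)$th entry of the permuted matrix $P_1A$. Save the multipliers $m_{i1}$, $i = 2, \ldots, n$, and record the row interchanges. Then
$$A^{(1)} = \begin{pmatrix} * & * & \cdots & * \\ 0 & * & \cdots & * \\ 0 & * & \cdots & * \\ \vdots & \vdots & & \vdots \\ 0 & * & \cdots & * \end{pmatrix}.$$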
Step 2. Scan the second column of $A^{(1)}$ below the first row to identify the largest element in magnitude in that column. Let the element be $a^{(1)}_{r_2,2}$. Form the permutation matrix $P_2$ by interchanging the rows 2 and $r_2$ of the identity matrix and leaving the other rows unchanged. Form $P_2A^{(1)}$.
Next, find an elementary lower triangular matrix $M_2$ such that $A^{(2)} = M_2P_2A^{(1)}$ has zeros below the (2,2) entry. $M_2$ is constructed as follows. First, construct an elementary matrix $\widehat{M}_2$ of order $(n-1)$ such that
$$\widehat{M}_2 \begin{pmatrix} a_{22} \\ a_{32} \\ \vdots \\ a_{n2} \end{pmatrix} = \begin{pmatrix} * \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$
then define
$$M_2 = \begin{pmatrix} 1 & 0 \\ 0 & \widehat{M}_2 \end{pmatrix}.$$
Note that $a_{ij}$ refers to the $(i,j)$th entry of the current matrix $P_2A^{(1)}$. At the end of Step 2, we will have
$$A^{(2)} = M_2P_2A^{(1)} = \begin{pmatrix} * & * & * & \cdots & * \\ 0 & * & * & \cdots & * \\ 0 & 0 & * & \cdots & * \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & * & \cdots & * \end{pmatrix}, \qquad M_2 = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & m_{32} & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & m_{n2} & 0 & \cdots & 1 \end{pmatrix},$$
where $m_{i2} = -\dfrac{a_{i2}}{a_{22}}$, $i = 3, 4, \ldots, n$.
Save the multipliers $m_{i2}$ and record the row interchange.

Step k. In general, at the $k$th step, scan the entries of the $k$th column of the matrix $A^{(k-1)}$ below the row $(k-1)$ to identify the pivot $a_{r_k,k}$, form the permutation matrix $P_k$, and find an elementary lower triangular matrix $M_k$ such that $A^{(k)} = M_kP_kA^{(k-1)}$ has zeros below the $(k,k)$ entry. $M_k$ is constructed by first constructing $\widehat{M}_k$ of order $(n-k+1)$ such that
$$\widehat{M}_k \begin{pmatrix} a_{kk} \\ a_{k+1,k} \\ \vdots \\ a_{nk} \end{pmatrix} = \begin{pmatrix} * \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$
and then defining
$$M_k = \begin{pmatrix} I_{k-1} & 0 \\ 0 & \widehat{M}_k \end{pmatrix},$$
where `0' is a matrix of zeros. The elements $a_{ik}$ refer to the $(i,k)$th entries of the matrix $P_kA^{(k-1)}$.

Step n-1. At the end of the $(n-1)$th step, the matrix $A^{(n-1)}$ will be an upper triangular matrix.

Form U: Set
$$A^{(n-1)} = U. \qquad (5.2.1)$$
Then
$$U = A^{(n-1)} = M_{n-1}P_{n-1}A^{(n-2)} = M_{n-1}P_{n-1}M_{n-2}P_{n-2}A^{(n-3)} = \cdots = M_{n-1}P_{n-1}M_{n-2}P_{n-2}\cdots M_2P_2M_1P_1A.$$
Set
$$M_{n-1}P_{n-1}M_{n-2}P_{n-2}\cdots M_2P_2M_1P_1 = M. \qquad (5.2.2)$$
Then we have from above the following factorization of $A$:
$$U = MA.$$
Theorem 5.2.2 (Partial Pivoting Factorization Theorem) Given an $n \times n$ nonsingular matrix $A$, Gaussian elimination with partial pivoting gives an upper triangular matrix $U$ and a "permuted" lower triangular matrix $M$ such that
$$MA = U,$$
where $M$ and $U$ are given by (5.2.2) and (5.2.1), respectively.

From (5.2.2) it is easy to see that there exists a permutation matrix $P$ such that $PA = LU$. Define
$$P = P_{n-1}\cdots P_2P_1, \qquad (5.2.3)$$
$$L = P\,(M_{n-1}P_{n-1}\cdots M_1P_1)^{-1}. \qquad (5.2.4)$$
Then $PA = LU$.
Corollary 5.2.1 (Partial Pivoting LU Factorization Theorem). Given an $n \times n$ nonsingular matrix $A$, Gaussian elimination with partial pivoting yields an LU factorization of a permuted version of $A$:
$$PA = LU,$$
where $P$ is a permutation matrix given by (5.2.3), $L$ is a unit lower triangular matrix given by (5.2.4), and $U$ is an upper triangular matrix given by (5.2.1).
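Example 5.2.2
$$A = \begin{pmatrix} 0.0001 & 1 \\ 1 & 1 \end{pmatrix}.$$
Only one step. The pivot entry is 1, $r_1 = 2$.
$$P_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad P_1A = \begin{pmatrix} 1 & 1 \\ 0.0001 & 1 \end{pmatrix},$$
$$m_{21} = -\frac{0.0001}{1} = -10^{-4}, \qquad M_1 = \begin{pmatrix} 1 & 0 \\ m_{21} & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -10^{-4} & 1 \end{pmatrix},$$
$$M_1P_1A = \begin{pmatrix} 1 & 0 \\ -10^{-4} & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0.0001 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = U$$
(the (2,2) entry $0.9999$ rounds to 1 in three-digit arithmetic),
$$M = M_1P_1 = \begin{pmatrix} 1 & 0 \\ -10^{-4} & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -10^{-4} \end{pmatrix},$$
$$MA = \begin{pmatrix} 0 & 1 \\ 1 & -10^{-4} \end{pmatrix} \begin{pmatrix} 10^{-4} & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = U.$$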
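Example 5.2.3
Triangularize
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}$$
using partial pivoting. Express $A = MU$. Find also $P$ and $L$ such that $PA = LU$.

Step 1. The pivot entry is $a_{21} = 1$, $r_1 = 2$.
$$P_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad P_1A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix},$$
$$M_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}, \qquad A^{(1)} = M_1P_1A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 0 & -1 & -2 \end{pmatrix}.$$

Step 2. The pivot entry is $a^{(1)}_{22} = 1$, $r_2 = 2$, so $P_2 = I_3$ (no interchange is necessary) and
$$P_2A^{(1)} = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 0 & -1 & -2 \end{pmatrix},$$
$$\widehat{M}_2 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \qquad M_2 = \begin{pmatrix} 1 & 0 \\ 0 & \widehat{M}_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix},$$
$$U = A^{(2)} = M_2P_2A^{(1)} = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \end{pmatrix},$$
$$M = M_2P_2M_1P_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 1 & -1 & 1 \end{pmatrix}.$$
By construction $MA = U$; for this particular $M$ one checks that $M^2 = I$, so it is easily verified that $A = MU$ as well.

Form $L$ and $P$:
$$P = P_2P_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad L = P\,(M_2P_2M_1P_1)^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix}.$$
It is easy to verify that $PA = LU$.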
Forming the Matrix M and Other Computational Details
1. Each permutation matrix $P_k$ can be formed just by recording the index $r_k$, since $P_k$ is the permuted identity matrix in which rows $k$ and $r_k$ have been interchanged. However, neither the permutation matrix $P_k$ nor the product $P_kA^{(k-1)}$ needs to be formed explicitly. This is because the matrix $P_kA^{(k-1)}$ is just the permuted version of $A^{(k-1)}$ in which the rows $r_k$ and $k$ have been interchanged.
2. Each elementary matrix $M_k$ can be formed just by saving the $(n-k)$ multipliers $m_{k+1,k}, \ldots, m_{nk}$. The matrices $M_kP_kA^{(k-1)} = M_kB$ also do not need to be computed explicitly. Note that the elements in the first $k$ rows of the matrix $M_kB$ are the same as the elements of the first $k$ rows of the matrix $B$, and the elements in the remaining $(n-k)$ rows are given by
$$b_{ij} + m_{ik}b_{kj} \qquad (i = k+1, \ldots, n;\ j = k+1, \ldots, n).$$
3. The multipliers can be stored in the appropriate places of the lower triangular part of $A$ (below the diagonal) as they are computed.
4. The final upper triangular matrix $U = A^{(n-1)}$ is stored in the upper triangular part.
5. The pivot indices $r_k$ are stored in a separate singly subscripted integer array.
6. $A$ can be overwritten with each $A^{(k)}$ as soon as the latter is formed.
Again, the major programming requirement is a subroutine that computes an elementary matrix $M$ such that, given a vector $a$, $Ma$ is a multiple of the first column of the identity matrix. In view of our above discussion, we can now formulate the following practical algorithm for LU factorization with partial pivoting.
Algorithm 5.2.3 Triangularization Using Gaussian Elimination with Partial Pivoting
Let $A$ be an $n \times n$ nonsingular matrix. Then the following algorithm computes the triangularization of $A$ with rows permuted, using Gaussian elimination with partial pivoting. The upper triangular matrix $U$ is stored in the upper triangular part of $A$, including the diagonal. The multipliers needed to compute the permuted triangular matrix $M$ such that $MA = U$ are stored in the lower triangular part of $A$. The permutation indices $r_k$ are stored in a separate array.
For $k = 1, 2, \ldots, n-1$ do
1. Find $r_k$ so that $|a_{r_k,k}| = \max_{k \le i \le n} |a_{ik}|$. Save $r_k$. If $a_{r_k,k} = 0$, then stop. Otherwise, continue.
2. (Interchange the rows $r_k$ and $k$) $a_{kj} \leftrightarrow a_{r_k,j}$ $(j = k, k+1, \ldots, n)$.
3. (Form the multipliers) $a_{ik} \equiv m_{ik} = -\dfrac{a_{ik}}{a_{kk}}$ $(i = k+1, \ldots, n)$.
4. (Update the entries) $a_{ij} \equiv a_{ij} + m_{ik}a_{kj} = a_{ij} + a_{ik}a_{kj}$ $(i = k+1, \ldots, n;\ j = k+1, \ldots, n)$.

Flop-count. The algorithm requires about $\frac{n^3}{3}$ flops and $O(n^2)$ comparisons. (Note that the search for the pivot at step $k$ requires $(n-k)$ comparisons.)
Note: Algorithm 5.2.3 does not give the matrices $M$ and $P$ explicitly. However, they can be constructed easily, as explained above, from the multipliers and the permutation indices, respectively.
Remark: The above algorithm accesses the rows of $A$ in the innermost loop, and that is why it is known as the row-oriented Gaussian elimination (with partial pivoting) algorithm. It is also known as the $kij$ algorithm; note that $i$ and $j$ appear in the inner loops. The column-oriented algorithm can be similarly developed. Such a column-oriented algorithm has been used in LINPACK (LINPACK routine SGEFA).
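The four steps of Algorithm 5.2.3 translate almost line by line into code. The following Python sketch is illustrative only: the name `lu_partial_pivot` is hypothetical, indices are 0-based, and exact rational arithmetic (`fractions.Fraction`) is assumed so that the result can be compared with hand computation.

```python
from fractions import Fraction


def lu_partial_pivot(a):
    """Gaussian elimination with partial pivoting, in place (hypothetical helper).

    Returns (a, r): U sits in the upper triangle of `a`, the multipliers
    m_ik = -a_ik / a_kk sit below the diagonal, and r[k] is the (0-based)
    index of the pivot row chosen at step k.
    """
    n = len(a)
    r = []
    for k in range(n - 1):
        # step 1: pivot search in column k, rows k..n-1
        rk = max(range(k, n), key=lambda i: abs(a[i][k]))
        r.append(rk)
        if a[rk][k] == 0:
            raise ZeroDivisionError("all candidate pivots are zero")
        # step 2: interchange rows k and rk (columns k..n-1 only)
        for j in range(k, n):
            a[k][j], a[rk][j] = a[rk][j], a[k][j]
        for i in range(k + 1, n):
            a[i][k] = -a[i][k] / a[k][k]        # step 3: multiplier
            for j in range(k + 1, n):
                a[i][j] += a[i][k] * a[k][j]    # step 4: update
    return a, r


A = [[Fraction(x) for x in row] for row in [[1, 2, 4], [4, 5, 6], [7, 8, 9]]]
A, r = lu_partial_pivot(A)
```

As in the algorithm, the row interchange touches only columns $k$ through $n$, so the stored multipliers are those of the matrices $M_k$; forming $L$ for $PA = LU$ additionally requires applying the later interchanges to the earlier multipliers.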
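Example 5.2.4
$$A = \begin{pmatrix} 1 & 2 & 4 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}.$$

Step 1. $k = 1$.
1. The pivot entry is 7: $r_1 = 3$.
2. Interchange rows 3 and 1:
$$A \equiv \begin{pmatrix} 7 & 8 & 9 \\ 4 & 5 & 6 \\ 1 & 2 & 4 \end{pmatrix}.$$
3. Form the multipliers:
$$a_{21} \equiv m_{21} = -\frac{4}{7}, \qquad a_{31} \equiv m_{31} = -\frac{1}{7}.$$
4. Update:
$$A \equiv \begin{pmatrix} 7 & 8 & 9 \\ 0 & \frac{3}{7} & \frac{6}{7} \\ 0 & \frac{6}{7} & \frac{19}{7} \end{pmatrix}.$$

Step 2. $k = 2$.
1. The pivot entry is $\frac{6}{7}$.
2. Interchange rows 2 and 3:
$$A \equiv \begin{pmatrix} 7 & 8 & 9 \\ 0 & \frac{6}{7} & \frac{19}{7} \\ 0 & \frac{3}{7} & \frac{6}{7} \end{pmatrix}.$$
3. Form the multipliers: $m_{32} = -\frac{1}{2}$.
4. Update:
$$A \equiv \begin{pmatrix} 7 & 8 & 9 \\ 0 & \frac{6}{7} & \frac{19}{7} \\ 0 & 0 & -\frac{1}{2} \end{pmatrix}.$$

Form $M$:
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & -\frac{1}{7} \\ -\frac{1}{2} & 1 & -\frac{1}{2} \end{pmatrix}.$$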
5.2.3 Gaussian Elimination with Complete Pivoting
In Gaussian elimination with complete pivoting, at the $k$th step, the search for the pivot is made among all the entries of the submatrix obtained by deleting the first $(k-1)$ rows and columns. Thus, if the pivot is $a_{rs}$, to bring this pivot to the $(k,k)$ position, the interchange of the rows $r$ and $k$ has to be followed by the interchange of the columns $k$ and $s$. This is equivalent to premultiplying the matrix $A^{(k-1)}$ by a permutation matrix $P_k$ obtained by interchanging rows $k$ and $r$ of the identity matrix, and postmultiplying $P_kA^{(k-1)}$ by another permutation matrix $Q_k$ obtained by interchanging the columns $k$ and $s$ of the identity matrix $I$. The ordinary Gaussian elimination is then applied to the matrix $P_kA^{(k-1)}Q_k$; that is, an elementary lower triangular matrix $M_k$ is sought such that the matrix
$$A^{(k)} = M_kP_kA^{(k-1)}Q_k$$
has zeros in the $k$th column below the $(k,k)$ entry. The matrix $M_k$ can of course be computed in two smaller steps as before.
At the end of the $(n-1)$th step, the matrix $A^{(n-1)}$ is an upper triangular matrix. Set
$$A^{(n-1)} = U. \qquad (5.2.5)$$
Then
$$U = A^{(n-1)} = M_{n-1}P_{n-1}A^{(n-2)}Q_{n-1} = M_{n-1}P_{n-1}M_{n-2}P_{n-2}A^{(n-3)}Q_{n-2}Q_{n-1} = \cdots = M_{n-1}P_{n-1}M_{n-2}P_{n-2}\cdots M_1P_1AQ_1Q_2\cdots Q_{n-1}.$$
Set
$$M_{n-1}P_{n-1}\cdots M_1P_1 = M, \qquad (5.2.6)$$
$$Q_1\cdots Q_{n-1} = Q. \qquad (5.2.7)$$
Then we have
$$U = MAQ.$$
Theorem 5.2.3 (Complete Pivoting Factorization Theorem) Given an $n \times n$ matrix $A$, Gaussian elimination with complete pivoting yields an upper triangular matrix $U$, a permuted lower triangular matrix $M$, and a permutation matrix $Q$ such that
$$MAQ = U,$$
where $U$, $M$, and $Q$ are given by (5.2.5)-(5.2.7).

As in the case of partial pivoting, it is easy to see from (5.2.6) and (5.2.7) that the factorization $MAQ = U$ can be expressed in the form
$$PAQ = LU.$$
Corollary 5.2.2 (Complete Pivoting LU Factorization Theorem). Gaussian elimination with complete pivoting yields the factorization $PAQ = LU$, where $P$ and $Q$ are permutation matrices given by
$$P = P_{n-1}\cdots P_1, \qquad Q = Q_1\cdots Q_{n-1},$$
and $L$ is a unit lower triangular matrix given by
$$L = P\,(M_{n-1}P_{n-1}\cdots M_1P_1)^{-1}.$$
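Example 5.2.5
Triangularize
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}$$
using complete pivoting.

Step 1. The pivot entry is $a_{23} = 3$.
$$P_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad Q_1 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \qquad P_1AQ_1 = \begin{pmatrix} 3 & 2 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix},$$
$$M_1 = \begin{pmatrix} 1 & 0 & 0 \\ -\frac{1}{3} & 1 & 0 \\ -\frac{1}{3} & 0 & 1 \end{pmatrix}, \qquad A^{(1)} = M_1P_1AQ_1 = \begin{pmatrix} 3 & 2 & 1 \\ 0 & \frac{1}{3} & -\frac{1}{3} \\ 0 & \frac{1}{3} & \frac{2}{3} \end{pmatrix}.$$

Step 2. The pivot entry is $a^{(1)}_{33} = \frac{2}{3}$.
$$P_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad Q_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad P_2A^{(1)}Q_2 = \begin{pmatrix} 3 & 1 & 2 \\ 0 & \frac{2}{3} & \frac{1}{3} \\ 0 & -\frac{1}{3} & \frac{1}{3} \end{pmatrix},$$
$$\widehat{M}_2 = \begin{pmatrix} 1 & 0 \\ \frac{1}{2} & 1 \end{pmatrix}, \qquad M_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \frac{1}{2} & 1 \end{pmatrix},$$
$$U = A^{(2)} = M_2P_2A^{(1)}Q_2 = M_2P_2(M_1P_1AQ_1)Q_2 = \begin{pmatrix} 3 & 1 & 2 \\ 0 & \frac{2}{3} & \frac{1}{3} \\ 0 & 0 & \frac{1}{2} \end{pmatrix}.$$
(Using Corollary 5.2.2, find for yourself $P$, $Q$, $L$, and $U$ such that $PAQ = LU$.)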
Forming the Matrix M and other Computational Details
Remarks similar to those in the case of partial pivoting hold. The matrices $P_k$, $Q_k$, $P_kA^{(k-1)}Q_k$, $M_k$, and $M_kP_kA^{(k-1)}Q_k$ do not have to be formed explicitly, which would waste storage unnecessarily. It is enough to save the indices and the multipliers. In view of our discussion on forming the matrices $M_k$ and the permutation matrix $P_k$, we now present a practical Gaussian elimination algorithm with complete pivoting, which does not show the explicit formation of the matrices $P_k$, $Q_k$, $M_k$, $M_kA$, and $P_kAQ_k$. Note that partial pivoting is just a special case of complete pivoting.
Algorithm 5.2.4 Triangularization Using Gaussian Elimination with Complete Pivoting
Given an $n \times n$ matrix $A$, the following algorithm computes the triangularization of $A$ with rows and columns permuted, using Gaussian elimination with complete pivoting. The algorithm overwrites $A$ with $U$. $U$ is stored in the upper triangular part of $A$ (including the diagonal) and the multipliers $m_{ik}$ are stored in the lower triangular part. The permutation indices $r_k$ and $s_k$ are saved separately.
For $k = 1, 2, \ldots, n-1$ do
1. Find $r_k$ and $s_k$ such that $|a_{r_k,s_k}| = \max\{|a_{ij}| : i, j \ge k\}$ and save $r_k$ and $s_k$. If $a_{r_k,s_k} = 0$, then stop. Otherwise, continue.
2. (Interchange the rows $r_k$ and $k$) $a_{kj} \leftrightarrow a_{r_k,j}$ $(j = k, k+1, \ldots, n)$.
3. (Interchange the columns $s_k$ and $k$) $a_{ik} \leftrightarrow a_{i,s_k}$ $(i = 1, 2, \ldots, n)$.
4. (Form the multipliers) $a_{ik} \equiv m_{ik} = -\dfrac{a_{ik}}{a_{kk}}$ $(i = k+1, \ldots, n)$.
5. (Update the entries of $A$) $a_{ij} \equiv a_{ij} + m_{ik}a_{kj} = a_{ij} + a_{ik}a_{kj}$ $(i = k+1, \ldots, n;\ j = k+1, \ldots, n)$.

Note: Algorithm 5.2.4 does not give the matrices $M$, $P$, and $Q$ explicitly; they have to be formed, respectively, from the multipliers $m_{ik}$ and the permutation indices $r_k$ and $s_k$, as explained above.
Flop-count: The algorithm requires $\frac{n^3}{3}$ flops and $O(n^3)$ comparisons.
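A direct transcription of Algorithm 5.2.4 is sketched below in Python. As with the other sketches, the function name `lu_complete_pivot` is hypothetical, indices are 0-based, and exact rational arithmetic is assumed so that the output can be checked against Example 5.2.5.

```python
from fractions import Fraction


def lu_complete_pivot(a):
    """Gaussian elimination with complete pivoting, in place (hypothetical helper).

    Returns (a, r, s): U in the upper triangle, multipliers below the
    diagonal, and the 0-based pivot row and column indices for each step.
    """
    n = len(a)
    r, s = [], []
    for k in range(n - 1):
        # step 1: search the trailing submatrix (rows and columns k..n-1)
        rk, sk = max(((i, j) for i in range(k, n) for j in range(k, n)),
                     key=lambda ij: abs(a[ij[0]][ij[1]]))
        r.append(rk)
        s.append(sk)
        if a[rk][sk] == 0:
            raise ZeroDivisionError("trailing submatrix is zero")
        for j in range(k, n):                   # step 2: row interchange
            a[k][j], a[rk][j] = a[rk][j], a[k][j]
        for i in range(n):                      # step 3: column interchange
            a[i][k], a[i][sk] = a[i][sk], a[i][k]
        for i in range(k + 1, n):
            a[i][k] = -a[i][k] / a[k][k]        # step 4: multiplier
            for j in range(k + 1, n):
                a[i][j] += a[i][k] * a[k][j]    # step 5: update
    return a, r, s


A = [[Fraction(x) for x in row] for row in [[0, 1, 1], [1, 2, 3], [1, 1, 1]]]
A, r, s = lu_complete_pivot(A)
```

The pivot search visits $(n-k+1)^2$ entries at step $k$, which is the source of the $O(n^3)$ comparison count and the main reason complete pivoting is more expensive than partial pivoting.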
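Example 5.2.6
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.$$
Just one step is needed.
$k = 1$: The pivot entry is 4.
$$r_1 = 2, \qquad s_1 = 2.$$
First, the second and first rows are switched, and this is then followed by the switch of the second and first columns to obtain the pivot entry 4 in the (1,1) position:
$$A \equiv \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix} \quad \text{(after the interchange of the first and second rows)},$$
$$A \equiv \begin{pmatrix} 4 & 3 \\ 2 & 1 \end{pmatrix} \quad \text{(after the interchange of the first and second columns)}.$$
The multiplier is
$$a_{21} \equiv m_{21} = -\frac{a_{21}}{a_{11}} = -\frac{2}{4} = -\frac{1}{2},$$
$$A \equiv \begin{pmatrix} 4 & 3 \\ 0 & -\frac{1}{2} \end{pmatrix} \quad \text{(after updating the entries of } A\text{)}.$$
$$P_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad Q = Q_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$
$$M = M_1P_1 = \begin{pmatrix} 1 & 0 \\ -\frac{1}{2} & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -\frac{1}{2} \end{pmatrix}.$$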
5.3 Stability of Gaussian Elimination
The stability of Gaussian elimination algorithms is better understood by measuring the growth of the elements in the reduced matrices $A^{(k)}$. (Note that although pivoting keeps the multipliers bounded by unity, the elements in the reduced matrices can still grow considerably.) We remind the readers of the definition of the growth factor in this context, given in Chapter 3.
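Definition 5.3.1 The growth factor $\rho$ is the ratio of the largest element (in magnitude) of $A, A^{(1)}, \ldots, A^{(n-1)}$ to the largest element (in magnitude) of $A$:
$$\rho = \frac{\max(\alpha, \alpha_1, \alpha_2, \ldots, \alpha_{n-1})}{\alpha},$$
where $\alpha = \max_{i,j} |a_{ij}|$ and $\alpha_k = \max_{i,j} |a^{(k)}_{ij}|$.

Example 5.3.1
$$A = \begin{pmatrix} 0.0001 & 1 \\ 1 & 1 \end{pmatrix}.$$
1. Gaussian elimination without pivoting gives
$$A^{(1)} = U = \begin{pmatrix} 0.0001 & 1 \\ 0 & -10^4 \end{pmatrix}, \qquad \max |a^{(1)}_{ij}| = 10^4, \qquad \max |a_{ij}| = 1,$$
$$\rho = \text{the growth factor} = 10^4.$$
2. Gaussian elimination with partial pivoting yields
$$A^{(1)} = U = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad \max |a^{(1)}_{ij}| = 1, \qquad \max |a_{ij}| = 1,$$
$$\rho = \text{the growth factor} = 1.$$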
The question naturally arises: how large can the growth factor $\rho$ be for an arbitrary matrix? We answer this question in the following.
1. Growth Factor of Gaussian Elimination for Complete Pivoting
For Gaussian elimination with complete pivoting,
$$\rho \le \left\{ n \cdot 2^{1} \cdot 3^{1/2} \cdot 4^{1/3} \cdots n^{1/(n-1)} \right\}^{1/2}.$$
This is a slowly growing function of $n$. Furthermore, in practice this bound is never attained. Indeed, there was an unproven conjecture by Wilkinson (AEP, p. 213) that the growth factor for complete pivoting was bounded by $n$ for real $n \times n$ matrices. Unfortunately, this conjecture has recently been settled negatively by Gould (1991), who exhibited a $13 \times 13$ matrix for which Gaussian elimination with complete pivoting gave the growth factor $\rho = 13.0205$. In spite of Gould's recent result, Gaussian elimination with complete pivoting is a stable algorithm.

2. Growth Factor of Gaussian Elimination for Partial Pivoting
For Gaussian elimination with partial pivoting, $\rho \le 2^{n-1}$; that is, $\rho$ can be as big as $2^{n-1}$.
Unfortunately, one can construct matrices for which this bound is attained.
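Consider the following example:
$$A = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 1 \\ -1 & 1 & 0 & \cdots & 0 & 1 \\ \vdots & \ddots & \ddots & \ddots & \vdots & \vdots \\ \vdots & & \ddots & \ddots & 0 & \vdots \\ \vdots & & & \ddots & 1 & 1 \\ -1 & \cdots & \cdots & \cdots & -1 & 1 \end{pmatrix}.$$
That is,
$$a_{ij} = \begin{cases} 1 & \text{for } j = i \text{ or } j = n, \\ -1 & \text{for } j < i, \\ 0 & \text{otherwise.} \end{cases}$$
Wilkinson (AEP, p. 212) has shown that the growth factor $\rho$ for this matrix with partial pivoting is $2^{n-1}$. To see this, take the special case with $n = 4$.
$$A = \begin{pmatrix} 1 & 0 & 0 & 1 \\ -1 & 1 & 0 & 1 \\ -1 & -1 & 1 & 1 \\ -1 & -1 & -1 & 1 \end{pmatrix}, \qquad A^{(1)} = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & -1 & 1 & 2 \\ 0 & -1 & -1 & 2 \end{pmatrix},$$
$$A^{(2)} = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & -1 & 4 \end{pmatrix}, \qquad A^{(3)} = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 8 \end{pmatrix}.$$
Thus the growth factor is $\rho = \dfrac{8}{1} = 2^3 = 2^{4-1}$.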
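The doubling of the last column can be checked for other values of $n$ with a small Python sketch. The helper names `wilkinson_matrix` and `growth_factor_partial_pivoting` are hypothetical, and exact rational arithmetic is assumed so that the computed ratio is not affected by rounding.

```python
from fractions import Fraction


def wilkinson_matrix(n):
    """Matrix with 1 on the diagonal and in the last column, -1 below the diagonal."""
    return [[1 if (j == i or j == n - 1) else (-1 if j < i else 0)
             for j in range(n)] for i in range(n)]


def growth_factor_partial_pivoting(a):
    """Growth factor max(alpha, alpha_1, ..., alpha_{n-1}) / alpha for partial pivoting."""
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]
    alpha = max(abs(x) for row in a for x in row)
    largest = alpha
    for k in range(n - 1):
        rk = max(range(k, n), key=lambda i: abs(a[i][k]))   # pivot search
        a[k], a[rk] = a[rk], a[k]
        if a[k][k] == 0:
            continue                                        # column already zero
        for i in range(k + 1, n):
            m = -a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] += m * a[k][j]
        largest = max(largest, max(abs(x) for row in a for x in row))
    return largest / alpha


rho_4 = growth_factor_partial_pivoting(wilkinson_matrix(4))     # 2**3
rho_10 = growth_factor_partial_pivoting(wilkinson_matrix(10))   # 2**9
```

No row interchange ever occurs for this matrix, because every candidate pivot in a column has magnitude 1 and the diagonal entry is found first; the growth comes entirely from the repeated addition of the last column entries.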
Remarks: Note that this is not the only matrix for which $\rho = 2^{n-1}$. Higham and Higham (1987) have identified a set of matrices for which $\rho = 2^{n-1}$. The matrix
$$B = \begin{pmatrix} 0.7248 & 0.7510 & 0.5241 & 0.7510 \\ 0.7317 & 0.1889 & 0.0227 & -0.7510 \\ 0.7298 & -0.3756 & 0.1150 & 0.7511 \\ -0.6993 & -0.7444 & 0.6647 & -0.7500 \end{pmatrix}$$
is such a matrix. See Higham (1993).
Examples of the above type are rare. Indeed, in many practical examples, the elements of the matrices $A^{(k)}$ very often continue to decrease in size. Thus, Gaussian elimination with partial pivoting is not unconditionally stable in theory, but in practice it can be considered a stable algorithm.

3. Growth Factor and Stability of Gaussian Elimination without Pivoting
For Gaussian elimination without pivoting, $\rho$ can be arbitrarily large, except for a few special cases, as we shall see later, such as symmetric positive definite matrices. Thus Gaussian elimination without pivoting is, in general, a completely unstable algorithm.
5.4 Householder Transformations
Definition 5.4.1 A matrix of the form
$$H = I - \frac{2uu^T}{u^Tu},$$
where $u$ is a nonzero vector, is called a Householder matrix after the celebrated numerical analyst Alston Householder.

(Alston Householder, an American mathematician, was the former Director of the Mathematics and Computer Science Division of Oak Ridge National Laboratory at Oak Ridge, Tennessee and a former Professor of Mathematics at the University of Tennessee, Knoxville. A research conference on Linear and Numerical Linear Algebra dedicated to Dr. Householder, called the "HOUSEHOLDER SYMPOSIUM", is held every three years around the world. Householder died in 1993 at the age of 89. A complete biography of Householder appears in the SIAM Newsletter, October 1993.)

A Householder matrix is also known as an Elementary Reflector or a Householder transformation. We now give a geometric interpretation of a Householder transformation.
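[Figure 5.1: the vector $x$, the unit vector $u$, the hyperplane $P$ orthogonal to $u$, the projection $u(u^Tx)$ of $x$ onto $u$, the vector $-2u(u^Tx)$, and the reflected vector $Hx$.]
$$Hx = (I - 2uu^T)x = x - 2u(u^Tx).$$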
With this geometric interpretation the following results become clear:

- $\|Hx\|_2 = \|x\|_2$ for every $x \in R^n$. A reflection does not change the length of the vector.
- $H$ is an orthogonal matrix. $\|Hx\|_2 = \|x\|_2$ for every $x$ implies that $H$ is orthogonal.
- $H^2 = I$. $Hx$ reflects $x$ to the other side of $P$, but $H^2x = H(Hx)$ reflects it back to $x$.
- $Hy = y$ for every $y \in P$. Vectors in $P$ cannot be reflected away.
- $H$ has a simple eigenvalue $-1$ and an $(n-1)$-fold eigenvalue 1. $P = \{v \in R^n : v^Tu = 0\}$ has $n-1$ linearly independent vectors $y_1, \ldots, y_{n-1}$ and $Hy_i = y_i$, $i = 1, \ldots, n-1$. So 1 is an $(n-1)$-fold eigenvalue. Also, $H$ reflects $u$ to $-u$, i.e., $Hu = -u$. Thus $-1$ is an eigenvalue of $H$, which must be a simple eigenvalue because $H$ can have only $n$ eigenvalues.
- $\det(H) = -1$. Indeed, $\det(H) = (-1) \cdot 1 \cdots 1 = -1$.

Also from Figure 5.1, for given $x, y \in R^n$ with $\|x\|_2 = \|y\|_2$, if we choose $u$ to be a unit vector parallel to $x - y$, then $H = I - 2uu^T$ reflects $x$ to $y$.
The importance of Householder matrices lies in the fact that they can also be used to create zeros in a vector.
Lemma 5.4.1 Given a nonzero vector $x \ne e_1$, there always exists a Householder matrix $H$ such that $Hx$ is a multiple of $e_1$.

Proof. Define
$$H = I - \frac{2uu^T}{u^Tu}$$
with $u = x + \operatorname{sign}(x_1)\|x\|_2 e_1$; then it is easy to see that $Hx$ is a multiple of $e_1$.
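The claim in the proof can be checked in two lines. The following short derivation is added as a worked step; the abbreviation $\sigma = \operatorname{sign}(x_1)$ is introduced here only for brevity and is not notation from the text.

```latex
% With u = x + \sigma \|x\|_2 e_1 and \sigma = sign(x_1):
\begin{align*}
u^T x &= \|x\|_2^2 + \sigma \|x\|_2\, x_1 = \|x\|_2^2 + |x_1|\,\|x\|_2, \\
u^T u &= \|x\|_2^2 + 2\sigma \|x\|_2\, x_1 + \|x\|_2^2 = 2\left(\|x\|_2^2 + |x_1|\,\|x\|_2\right) = 2\,u^T x, \\
Hx &= x - \frac{2\,(u^T x)}{u^T u}\,u = x - u = -\sigma \|x\|_2\, e_1 .
\end{align*}
```

So $Hx$ has the single nonzero component $-\operatorname{sign}(x_1)\|x\|_2$, and choosing the sign of $x_1$ in $u$ means the first component of $u$ is formed by adding two numbers of the same sign, which avoids cancellation.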
Note: If $x_1$ is zero, its sign can be chosen either $+$ or $-$. Any possibility of overflow or underflow in the computation of $\|x\|_2$ can be avoided by scaling the vector $x$. Thus the vector $u$ should be determined from the vector $\dfrac{x}{\max_i\{|x_i|\}}$ rather than from the vector $x$ itself.

Algorithm 5.4.1 Creating zeros in a vector with a Householder matrix
Given an $n$-vector $x$, the following algorithm replaces $x$ by $Hx = (*, 0, \ldots, 0)^T$, where $H$ is a Householder matrix.
1. Scale the vector: $x \equiv \dfrac{x}{\max_i\{|x_i|\}}$.
2. Compute $u = x + \operatorname{sign}(x_1)\|x\|_2 e_1$.
3. Form $Hx$, where $H = I - \dfrac{2uu^T}{u^Tu}$.

Remark on Step 3: $Hx$ in step 3 should be formed by exploiting the structure of $H$, as shown in Example 7 in Chapter 4.
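A minimal Python sketch of Algorithm 5.4.1 follows. The function name `householder_apply` is hypothetical; the sketch assumes a nonzero input vector, takes the sign of a zero first component as $+$, and applies $H$ to the original (unscaled) $x$, which is legitimate because $H$ is unchanged when $u$ is multiplied by a nonzero scalar.

```python
import math


def householder_apply(x):
    """Return (u, Hx) with Hx a multiple of e_1, without forming H explicitly."""
    scale = max(abs(xi) for xi in x)
    if scale == 0:
        raise ValueError("x must be nonzero")
    y = [xi / scale for xi in x]                    # step 1: scale
    norm = math.sqrt(sum(yi * yi for yi in y))
    sign = 1.0 if y[0] >= 0 else -1.0               # sign(0) taken as +
    u = y[:]
    u[0] += sign * norm                             # step 2: u = y + sign * ||y|| e_1
    # step 3: Hx = x - beta * u with beta = 2 (u^T x) / (u^T u)
    beta = 2.0 * sum(ui * xi for ui, xi in zip(u, x)) / sum(ui * ui for ui in u)
    hx = [xi - beta * ui for xi, ui in zip(x, u)]
    return u, hx


u, hx = householder_apply([0.0, 4.0, 1.0])
```

Forming $Hx$ as $x - \beta u$ costs two inner products and one vector update, instead of the $n^2$ storage and work needed to build $H$ and multiply by it.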
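Example 5.4.1
$$x = \begin{pmatrix} 0 \\ 4 \\ 1 \end{pmatrix}, \qquad x \equiv \frac{x}{\max_i\{|x_i|\}} = \begin{pmatrix} 0 \\ 1 \\ \frac{1}{4} \end{pmatrix},$$
$$u = \begin{pmatrix} 0 \\ 1 \\ \frac{1}{4} \end{pmatrix} + \frac{\sqrt{17}}{4} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} \frac{\sqrt{17}}{4} \\ 1 \\ \frac{1}{4} \end{pmatrix},$$
$$H = I - \frac{2uu^T}{u^Tu} = \begin{pmatrix} 0 & -0.9701 & -0.2425 \\ -0.9701 & 0.0588 & -0.2353 \\ -0.2425 & -0.2353 & 0.9412 \end{pmatrix}$$
and
$$Hx = \begin{pmatrix} -4.1231 \\ 0 \\ 0 \end{pmatrix}.$$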
Flop-Count and Roundoff Property. Creating zeros in a vector by a Householder matrix is a cheap and numerically stable procedure. It takes only $2(n-2)$ flops to create zeros in the positions 2 through $n$ in a vector, and it can be shown (Wilkinson AEP, pp. 152-162) that if $\widehat{H}$ is the computed Householder matrix, then
$$\|H - \widehat{H}\| \le 10\mu.$$
Moreover,
b (Hx) = H (x + e)
jej cn kxk
2 2
where $c$ is a constant of order unity and $\mu$ is the machine precision.
5.4.1 Householder Matrices and QR Factorization

Theorem 5.4.1 (Householder QR Factorization Theorem) Given an $n \times n$ matrix $A$, there exists an orthogonal matrix $Q$ and an upper triangular matrix $R$ such that
$$A = QR.$$
The matrix $Q$ can be written as $Q = H_1H_2\cdots H_{n-1}$, where each $H_i$ is a Householder matrix.
As we will see later, the QR factorization plays a very significant role in numerical solutions of linear systems, least-squares problems, and eigenvalue and singular value computations. We now show how the QR factorization of $A$ can be obtained using Householder matrices, which will provide a constructive proof of Theorem 5.4.1. As in the process of LU factorization, this can be achieved in $(n-1)$ steps; however, unlike the Gaussian elimination process, the Householder process can always be carried out to completion.
Step 1. Construct a Householder matrix $H_1$ such that $H_1A$ has zeros below the (1,1) entry in the 1st column:
$$H_1A = \begin{pmatrix} * & * & \cdots & * \\ 0 & * & \cdots & * \\ 0 & * & \cdots & * \\ \vdots & \vdots & & \vdots \\ 0 & * & \cdots & * \end{pmatrix}.$$
Note that it is sufficient to construct $H_1 = I - 2u_nu_n^T/(u_n^Tu_n)$ such that
$$H_1 \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{pmatrix} = \begin{pmatrix} * \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$
for then $H_1A$ will have the above form.
Overwrite $A$ with $A^{(1)} = H_1A$ for use in the next step.
Since $A^{(1)}$ overwrites $A$, $A^{(1)}$ can be written as:
$$A \equiv A^{(1)} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & a_{n2} & \cdots & a_{nn} \end{pmatrix}.$$
Step 2. Construct a Householder matrix $H_2$ such that $H_2A^{(1)}$ has zeros below the (2,2) entry in the 2nd column and the zeros already created in the first column of $A^{(1)}$ in step 1 are not destroyed:
$$A^{(2)} = H_2A^{(1)} = \begin{pmatrix} * & * & * & \cdots & * \\ 0 & * & * & \cdots & * \\ 0 & 0 & * & \cdots & * \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & * & \cdots & * \end{pmatrix}.$$
$H_2$ can be constructed as follows. First, construct a Householder matrix $\widehat{H}_2 = I_{n-1} - 2u_{n-1}u_{n-1}^T/(u_{n-1}^Tu_{n-1})$ of order $n-1$ such that
$$\widehat{H}_2 \begin{pmatrix} a_{22} \\ a_{32} \\ \vdots \\ a_{n2} \end{pmatrix} = \begin{pmatrix} * \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$
and then define
$$H_2 = \begin{pmatrix} 1 & 0 \\ 0 & \widehat{H}_2 \end{pmatrix}.$$
$A^{(2)} = H_2A^{(1)}$ will then have the form above.
Overwrite $A$ with $A^{(2)}$.
Note: Since $H_2$ also has zeros below the diagonal in the 1st column, premultiplication of $A^{(1)}$ by $H_2$ preserves the zeros already created in step 1.
Step k. In general, at the $k$th step, first create a Householder matrix
$$\widehat{H}_k = I_{n-k+1} - \frac{2u_{n-k+1}u_{n-k+1}^T}{u_{n-k+1}^Tu_{n-k+1}}$$
of order $n-k+1$ such that
$$\widehat{H}_k \begin{pmatrix} a_{kk} \\ a_{k+1,k} \\ \vdots \\ a_{nk} \end{pmatrix} = \begin{pmatrix} * \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$
and then, defining
$$H_k = \begin{pmatrix} I_{k-1} & 0 \\ 0 & \widehat{H}_k \end{pmatrix},$$
compute $A^{(k)} = H_kA^{(k-1)}$.
Overwrite $A$ with $A^{(k)}$.
The matrix $A^{(k)}$ will have zeros in the $k$th column below the $(k,k)$th entry, and the zeros already created in previous columns will not be destroyed. At the end of the $(n-1)$th step, the resulting matrix $A^{(n-1)}$ will be an upper triangular matrix $R$. Now, since
$$A^{(k)} = H_kA^{(k-1)}, \qquad k = n-1, \ldots, 2,$$
we have
$$R = A^{(n-1)} = H_{n-1}A^{(n-2)} = H_{n-1}H_{n-2}A^{(n-3)} = \cdots = H_{n-1}H_{n-2}\cdots H_2H_1A. \qquad (5.4.1)$$
Set
$$Q^T = H_{n-1}H_{n-2}\cdots H_2H_1. \qquad (5.4.2)$$
Since each Householder matrix $H_k$ is orthogonal, so is $Q^T$. Therefore, from above, we have
$$R = Q^TA \quad \text{or} \quad A = QR. \qquad (5.4.3)$$
(Note that $Q = H_1^TH_2^T\cdots H_{n-1}^T$ is also orthogonal.)
Forming the Matrix Q and Other Computational Details
1. Since each Householder matrix $H_k$ is uniquely determined by the vector $u_{n-k+1}$, to construct $H_k$ it is sufficient just to save the vector $u_{n-k+1}$.
2. $A^{(k)} = H_kA$ can be constructed using the technique described in Chapter 4 (Example 4.7), which shows that $A^{(k)}$ can be constructed without forming the product explicitly.
3. The vector $u_{n-k+1}$ has $(n-k+1)$ elements, whereas only $(n-k)$ zeros are produced at the $k$th step. Thus, one possible scheme for storage will be to store the elements $(u_{n-k+1,1}, u_{n-k+1,2}, \ldots, u_{n-k+1,n-k})$ in positions $(k+1, k), \ldots, (n, k)$ of $A$. The last element $u_{n-k+1,n-k+1}$ has to be stored separately.
4. The matrix $Q$, if needed, can be constructed from the Householder matrices $H_1$ through $H_{n-1}$.

The major programming requirement is a subroutine for computing a Householder matrix $H$ such that, for a given vector $x$, $Hx$ is a multiple of $e_1$.
Algorithm 5.4.2 Householder QR Factorization
Given an $n \times n$ matrix $A$, the following algorithm computes Householder matrices $H_1$ through $H_{n-1}$ and an upper triangular matrix $R$ such that, with $Q = H_1\cdots H_{n-1}$, $A = QR$. The algorithm overwrites $A$ with $R$.
For $k = 1$ to $n-1$ do
1. Find a Householder matrix $\widehat{H}_k = I_{n-k+1} - 2u_{n-k+1}u_{n-k+1}^T/(u_{n-k+1}^Tu_{n-k+1})$ of order $n-k+1$ such that
$$\widehat{H}_k \begin{pmatrix} a_{kk} \\ a_{k+1,k} \\ \vdots \\ a_{nk} \end{pmatrix} = \begin{pmatrix} r_{kk} \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
2. Define
$$H_k = \begin{pmatrix} I_{k-1} & 0 \\ 0 & \widehat{H}_k \end{pmatrix}.$$
3. Save the vector $u_{n-k+1}$.
4. Compute $A^{(k)} = H_kA$.
5. Overwrite $A$ with $A^{(k)}$.
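Algorithm 5.4.2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than a production routine: the name `householder_qr` is hypothetical, the input is a list of rows, the scaling step of Algorithm 5.4.1 is omitted for brevity, and each $H_k$ is applied through its vector $u$ instead of being formed as a matrix.

```python
import math


def householder_qr(a):
    """Return (us, R): the Householder vectors and the triangular factor.

    `a` is an m x n list of rows; min(n, m - 1) reflections are applied.
    """
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    us = []
    for k in range(min(n, m - 1)):
        x = [r[i][k] for i in range(k, m)]          # column k, rows k..m-1
        norm = math.sqrt(sum(v * v for v in x))
        u = x[:]
        u[0] += math.copysign(norm, x[0])           # u = x + sign(x_1) ||x|| e_1
        us.append(u)
        utu = sum(v * v for v in u)
        if utu == 0.0:
            continue                                # column already zero
        for j in range(k, n):                       # apply H_k to columns k..n-1
            beta = 2.0 * sum(u[i - k] * r[i][j] for i in range(k, m)) / utu
            for i in range(k, m):
                r[i][j] -= beta * u[i - k]
    return us, r


us, R = householder_qr([[0, 1, 1], [1, 2, 3], [1, 1, 1]])
```

Only the vectors `us` are kept, which corresponds to having $Q$ in factored form; the entries of `R` below the diagonal are zero up to rounding error rather than exactly zero.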
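Example 5.4.2
Let
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}.$$

Step 1. $k = 1$.
Construct $H_1$ such that
$$H_1 \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} * \\ 0 \\ 0 \end{pmatrix}:$$
$$u_3 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} + \sqrt{2} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} \sqrt{2} \\ 1 \\ 1 \end{pmatrix},$$
$$H_1 = I - \frac{2u_3u_3^T}{u_3^Tu_3} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{\sqrt{2}} & \frac{1}{2} & \frac{1}{2} \end{pmatrix} = \begin{pmatrix} 0 & -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{\sqrt{2}} & -\frac{1}{2} & \frac{1}{2} \end{pmatrix}.$$
Form $A^{(1)}$:
$$A^{(1)} = H_1A = \begin{pmatrix} -\sqrt{2} & -\frac{3\sqrt{2}}{2} & -2\sqrt{2} \\ 0 & \frac{1-\sqrt{2}}{2} & \frac{2-\sqrt{2}}{2} \\ 0 & -\frac{1+\sqrt{2}}{2} & -\frac{2+\sqrt{2}}{2} \end{pmatrix}.$$
Overwrite:
$$A \equiv A^{(1)} = \begin{pmatrix} -1.4142 & -2.1213 & -2.8284 \\ 0 & -0.2071 & 0.2929 \\ 0 & -1.2071 & -1.7071 \end{pmatrix}.$$

Step 2. $k = 2$.
Construct $\widehat{H}_2$ such that
$$\widehat{H}_2 \begin{pmatrix} -0.2071 \\ -1.2071 \end{pmatrix} = \begin{pmatrix} * \\ 0 \end{pmatrix}:$$
$$u_2 = \begin{pmatrix} -0.2071 \\ -1.2071 \end{pmatrix} - 1.2247 \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -1.4318 \\ -1.2071 \end{pmatrix},$$
$$\widehat{H}_2 = \begin{pmatrix} -0.1691 & -0.9856 \\ -0.9856 & 0.1691 \end{pmatrix}.$$
Construct $H_2$:
$$H_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -0.1691 & -0.9856 \\ 0 & -0.9856 & 0.1691 \end{pmatrix}.$$
Form $A^{(2)}$:
$$A^{(2)} = H_2A^{(1)} = H_2H_1A = \begin{pmatrix} -1.4142 & -2.1213 & -2.8284 \\ 0 & 1.2247 & 1.6330 \\ 0 & 0 & -0.5774 \end{pmatrix} = R.$$
Form $Q$:
$$Q = H_1H_2 = \begin{pmatrix} 0 & 0.8165 & 0.5774 \\ -0.7071 & 0.4082 & -0.5774 \\ -0.7071 & -0.4082 & 0.5774 \end{pmatrix}.$$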
Flop-Count. The algorithm requires approximately $\frac{2}{3}n^3$ flops just to compute the triangular matrix $R$. This can be seen as follows.
The construction of $\widehat{H}_k$ (and therefore of $H_k$) requires about $2(n-k)$ flops, while that of $A^{(k)}$ from $A^{(k)} = H_kA$ (taking advantage of the special structure of $H_k$) requires roughly $2(n-k)^2$ flops. Thus
$$\text{Total number of flops} = 2\sum_{k=1}^{n-1} \left[(n-k)^2 + (n-k)\right] = 2\left[(n-1)^2 + (n-2)^2 + \cdots + 1^2\right] + 2\left[(n-1) + (n-2) + \cdots + 1\right]$$
$$= \frac{2n(n-1)(2n-1)}{6} + \frac{2n(n-1)}{2} \simeq \frac{2n^3}{3}.$$

Note: The above count does not take into account the explicit construction of $Q$. $Q$ is available only in factored form. It should be noted that in a majority of practical applications, it is sufficient to have $Q$ in this factored form and, in many applications, $Q$ is not needed at all. If $Q$ is needed explicitly, another $\frac{2}{3}n^3$ flops will be required. (Exercise #22)

Roundoff Property. In the presence of roundoff errors the algorithm computes the QR decomposition of a slightly perturbed matrix. Specifically, it can be shown (Wilkinson AEP, p. 236) that if $\widehat{R}$ denotes the computed $R$, then there exists an orthogonal $\widehat{Q}$ such that
$$A + E = \widehat{Q}\widehat{R}.$$
The error matrix $E$ satisfies
$$\|E\|_F \le \phi(n)\,\mu\,\|A\|_F,$$
where $\phi(n)$ is a slowly growing function of $n$ and $\mu$ is the machine precision. If the inner products are accumulated in double precision, then it can be shown (Golub and Wilkinson (1966)) that $\phi(n) = 12.5n$. The algorithm is thus stable.
5.4.2 Householder QR Factorization of a Non-Square Matrix

In many applications (such as in least squares problems, etc.), one requires the QR factorization of an $m \times n$ matrix $A$. The above Householder method can be applied to obtain the QR factorization of such an $A$ as well. The process consists of $s = \min\{n, m-1\}$ steps; the Householder matrices $H_1, H_2, \ldots, H_s$ are constructed successively so that
$$Q^TA = H_sH_{s-1}\cdots H_2H_1A = \begin{cases} \begin{pmatrix} R \\ 0 \end{pmatrix} & \text{if } m \ge n, \\[2ex] (R,\ S) & \text{if } m \le n. \end{cases}$$

Flop-Count and Roundoff Property
The Householder method in this case requires
1. $n^2\left(m - \frac{n}{3}\right)$ flops if $m \ge n$.
2. $m^2\left(n - \frac{m}{3}\right)$ flops if $m \le n$.
The roundoff property is the same as in the previous case. The QR factorization of a rectangular matrix using Householder transformations is stable.

Example 5.4.3
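$$A = \begin{pmatrix} 1 & 1 \\ 0.0001 & 0 \\ 0 & 0.0001 \end{pmatrix}, \qquad s = \min(2, 3) = 2.$$

Step 1. Form $H_1$:
$$u_2 = \begin{pmatrix} 1 \\ 0.0001 \\ 0 \end{pmatrix} + \sqrt{1 + (0.0001)^2} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ 0.0001 \\ 0 \end{pmatrix}$$
$$H_1 = I - \frac{2 u_2 u_2^T}{u_2^T u_2} = \begin{pmatrix} -1 & -0.0001 & 0 \\ -0.0001 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$A^{(1)} = H_1 A = \begin{pmatrix} -1 & -1 \\ 0 & -0.0001 \\ 0 & 0.0001 \end{pmatrix}$$

Step 2. Form $H_2$:
$$u_1 = \begin{pmatrix} -0.0001 \\ 0.0001 \end{pmatrix} - \sqrt{(-0.0001)^2 + (0.0001)^2} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 10^{-4} \begin{pmatrix} -2.4141 \\ 1.0000 \end{pmatrix}$$
$$\hat H_2 = I - \frac{2 u_1 u_1^T}{u_1^T u_1} = \begin{pmatrix} -0.7071 & 0.7071 \\ 0.7071 & 0.7071 \end{pmatrix}$$
$$H_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -0.7071 & 0.7071 \\ 0 & 0.7071 & 0.7071 \end{pmatrix}$$

Form $R$:
$$H_2 A^{(1)} = \begin{pmatrix} -1 & -1 \\ 0 & 0.0001 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} R \\ 0 \end{pmatrix}.$$

Form $Q$ and $R$:
$$Q = H_1 H_2 = \begin{pmatrix} -1 & 0.0001 & -0.0001 \\ -0.0001 & -0.7071 & 0.7071 \\ 0 & 0.7071 & 0.7071 \end{pmatrix}, \qquad R = \begin{pmatrix} -1 & -1 \\ 0 & 0.0001 \end{pmatrix}.$$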
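The Householder process for a rectangular matrix can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the book's algorithm statement: the function name `householder_qr`, the list-of-rows matrix representation, and the choice of sign $u = x + \mathrm{sign}(x_1)\|x\|_2 e_1$ are assumptions made here, and $Q$ is accumulated explicitly only so that $A = QR$ can be checked.

```python
import math

def householder_qr(A):
    """Return (Q, R) with A = Q R for an m-by-n list-of-rows matrix A.

    Q is m-by-m orthogonal, R is m-by-n with zeros below the diagonal.
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[float(i == j) for j in range(m)] for i in range(m)]
    for k in range(min(n, m - 1)):
        x = [R[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(v * v for v in x))
        if norm_x == 0.0:
            continue  # column already zero below the diagonal
        u = x[:]
        u[0] += math.copysign(norm_x, x[0])  # u = x + sign(x_1) ||x|| e_1
        utu = sum(v * v for v in u)
        # R <- H R, where H = I - 2 u u^T / (u^T u) acts on rows k..m-1
        for j in range(n):
            t = 2.0 * sum(u[i] * R[k + i][j] for i in range(len(u))) / utu
            for i in range(len(u)):
                R[k + i][j] -= t * u[i]
        # Q <- Q H, so that Q = H_1 H_2 ... H_s at the end
        for i in range(m):
            t = 2.0 * sum(Q[i][k + j] * u[j] for j in range(len(u))) / utu
            for j in range(len(u)):
                Q[i][k + j] -= t * u[j]
    return Q, R

# The 3-by-2 matrix of Example 5.4.3
A = [[1.0, 1.0], [0.0001, 0.0], [0.0, 0.0001]]
Q, R = householder_qr(A)
```

In practice one would keep only the vectors $u$ (the factored form of $Q$) rather than forming $Q$, as discussed in the flop-count note above.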
5.4.3 Householder Matrices and Reduction to Hessenberg Form

Theorem 5.4.2 (Hessenberg Reduction Theorem) An arbitrary $n \times n$ matrix can always be transformed to an upper Hessenberg matrix $H_u$ by orthogonal similarity:
$$PAP^T = H_u.$$

As noted before, reduction to Hessenberg form is very important in eigenvalue computations. The matrix $A$ is routinely transformed to a Hessenberg matrix before the process of eigenvalue computations (known as the QR iterations) starts. Hessenberg forms are also useful tools in many other applications such as in control theory, signal processing, etc.
The idea of orthogonal factorization using Householder matrices described in the previous section can be easily extended to obtain $P$ and $H_u$. The matrix $P$ is constructed as the product of $(n-2)$ Householder matrices $P_1$ through $P_{n-2}$: $P_1$ is constructed to create zeros in the first column of $A$ below the entry (2,1), $P_2$ is determined to create zeros below the entry (3,2) of the second column of the matrix $P_1 A P_1^T$, and so on. The process consists of $(n-2)$ steps. (Note that an $n \times n$ Hessenberg matrix contains at least $\frac{(n-2)(n-1)}{2}$ zeros.)

Step 1. Find a Householder matrix $\hat P_1$ of order $n-1$ such that
$$\hat P_1 \begin{pmatrix} a_{21} \\ a_{31} \\ \vdots \\ a_{n1} \end{pmatrix} = \begin{pmatrix} \times \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Define
$$P_1 = \begin{pmatrix} I_1 & 0 \\ 0 & \hat P_1 \end{pmatrix}$$
and compute $A^{(1)} = P_1 A P_1^T$. Overwrite $A$ with $A^{(1)}$. Then
$$A \equiv A^{(1)} = \begin{pmatrix} \times & \times & \cdots & \times \\ \times & \times & \cdots & \times \\ 0 & \times & \cdots & \times \\ \vdots & \vdots & & \vdots \\ 0 & \times & \cdots & \times \end{pmatrix}.$$

Step 2. Find a Householder matrix $\hat P_2$ of order $(n-2)$ such that
$$\hat P_2 \begin{pmatrix} a_{32} \\ \vdots \\ a_{n2} \end{pmatrix} = \begin{pmatrix} \times \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Define
$$P_2 = \begin{pmatrix} I_2 & 0 \\ 0 & \hat P_2 \end{pmatrix}$$
and compute $A^{(2)} = P_2 A^{(1)} P_2^T$. Overwrite $A$ with $A^{(2)}$. Then
$$A \equiv A^{(2)} = \begin{pmatrix} \times & \times & \times & \cdots & \times \\ \times & \times & \times & \cdots & \times \\ 0 & \times & \times & \cdots & \times \\ 0 & 0 & \times & \cdots & \times \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & \times & \cdots & \times \end{pmatrix}.$$

The general Step $k$ can now easily be written down. At the end of $(n-2)$ steps, the matrix $A^{(n-2)}$ is an upper Hessenberg matrix $H_u$. Now,
$$H_u = A^{(n-2)} = P_{n-2} A^{(n-3)} P_{n-2}^T = P_{n-2}\left(P_{n-3} A^{(n-4)} P_{n-3}^T\right) P_{n-2}^T = \cdots$$
$$= (P_{n-2} P_{n-3} \cdots P_1)\, A\, (P_1^T P_2^T \cdots P_{n-3}^T P_{n-2}^T). \qquad (5.4.4)$$
Set
$$P = P_{n-2} P_{n-3} \cdots P_1. \qquad (5.4.5)$$
We then have $H_u = PAP^T$. Since each Householder matrix $P_i$ is orthogonal, the matrix $P$, which is the product of $(n-2)$ Householder matrices, is also orthogonal.
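Note: It is important to note that since $P_k$ has the form
$$P_k = \begin{pmatrix} I_k & 0 \\ 0 & \hat P_k \end{pmatrix},$$
postmultiplication of $P_k A$ by $P_k^T$ does not destroy the zeros already created in $P_k A$. For example, let $n = 4$ and $k = 1$. Then
$$P_1 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \end{pmatrix}, \qquad P_1 A = \begin{pmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \end{pmatrix},$$
and
$$P_1 A P_1^T = \begin{pmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \end{pmatrix} = \begin{pmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \end{pmatrix}.$$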
Forming the Matrix P And Other Computational Details
1. Each Householder matrix $P_k$ is uniquely determined by the vector $u_{n-k}$. It is therefore sufficient to save the vector $u_{n-k}$ to recover $P_k$ later. If the matrix $P$ is needed explicitly, it can be computed from the Householder matrices $P_1$ through $P_{n-2}$.
2. The vector $u_{n-k}$ has $(n-k)$ elements, whereas the number of zeros produced at the $k$th step is $(n-k-1)$. Thus, the $(n-k)$ elements of $u_{n-k}$ can be stored in the appropriate lower triangular part of $A$ below the diagonal if the subdiagonal entry at that step is stored separately. This is indeed a good arrangement, since subdiagonal entries in a Hessenberg matrix are very special and play a special role in many applications. Thus, all the information needed to compute $P$ can be stored in the lower triangular part of $A$ below the diagonal, storing the subdiagonal entries separately in a linear array of $(n-1)$ elements. Other arrangements of storage are also possible.
Algorithm 5.4.3 Householder Hessenberg Reduction
Given an $n \times n$ matrix $A$, the following algorithm computes Householder matrices $P_1$ through $P_{n-2}$ such that, with $P^T = P_1 P_2 \cdots P_{n-2}$, $PAP^T$ is an upper Hessenberg matrix $H_u$. The algorithm overwrites $A$ with $H_u$.

For $k = 1, 2, \ldots, n-2$ do

1. Determine a Householder matrix $\hat P_k = I_{n-k} - \dfrac{2 u_{n-k} u_{n-k}^T}{u_{n-k}^T u_{n-k}}$, of order $n-k$, such that
$$\hat P_k \begin{pmatrix} a_{k+1,k} \\ \vdots \\ a_{nk} \end{pmatrix} = \begin{pmatrix} \times \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
2. Save the vector $u_{n-k}$.
3. Define $P_k = \begin{pmatrix} I_k & 0 \\ 0 & \hat P_k \end{pmatrix}$.
4. Compute $A^{(k)} = P_k A P_k^T$.
5. Overwrite $A$ with $A^{(k)}$.
Flop-Count. The algorithm requires $\frac{5}{3}n^3$ flops to compute $H_u$. This count does not include the explicit computation of $P$. $P$ can be stored in factored form. If $P$ is computed explicitly, another $\frac{2}{3}n^3$ flops will be required. However, when $n$ is large, the storage required to form $P$ is prohibitive.

Round-off Property. The algorithm is stable. It can be shown (Wilkinson AEP p. 351) that the computed $H_u$ is orthogonally similar to a nearby matrix $A + E$, where
$$\|E\|_F \le c\,n^2 \mu\,\|A\|_F.$$
Here $c$ is a constant of order unity and $\mu$ is the machine precision. If the inner products are accumulated in double precision at the appropriate places in the algorithm, then the term $n^2$ in the above bound can be replaced by $n$, so in this case
$$\|E\|_F \le c\,n\,\mu\,\|A\|_F,$$
which is very desirable.
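Algorithm 5.4.3 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a production routine: the function name `householder_hessenberg` and the list-of-rows representation are choices made here, the Householder vector is taken as $u = x + \mathrm{sign}(x_1)\|x\|_2 e_1$, and the vectors $u_{n-k}$ are not saved, so $P$ cannot be recovered from this sketch.

```python
import math

def householder_hessenberg(A):
    """Return an upper Hessenberg matrix orthogonally similar to the n-by-n matrix A."""
    n = len(A)
    H = [row[:] for row in A]
    for k in range(n - 2):
        # x holds the entries of column k below the diagonal
        x = [H[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(v * v for v in x))
        if norm_x == 0.0:
            continue
        u = x[:]
        u[0] += math.copysign(norm_x, x[0])
        utu = sum(v * v for v in u)
        # H <- P_k H: only rows k+1..n-1 change
        for j in range(n):
            t = 2.0 * sum(u[i] * H[k + 1 + i][j] for i in range(len(u))) / utu
            for i in range(len(u)):
                H[k + 1 + i][j] -= t * u[i]
        # H <- H P_k^T: only columns k+1..n-1 change, so the new zeros survive
        for i in range(n):
            t = 2.0 * sum(H[i][k + 1 + j] * u[j] for j in range(len(u))) / utu
            for j in range(len(u)):
                H[i][k + 1 + j] -= t * u[j]
    return H

# The 3-by-3 matrix of Example 5.4.4
Hu = householder_hessenberg([[0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]])
```

Because each $P_k$ leaves the first $k$ rows and columns untouched, the two update loops never need to touch more than the trailing part of the matrix.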
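Example 5.4.4

Let
$$A = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}.$$
Since $n = 3$, we have just one step to perform.

Form $\hat P_1$ such that
$$\hat P_1 \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} \times \\ 0 \end{pmatrix}:$$
$$u_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \sqrt{2}\, e_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \sqrt{2} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 + \sqrt{2} \\ 1 \end{pmatrix}$$
$$\hat P_1 = I_2 - \frac{2 u_2 u_2^T}{u_2^T u_2} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - 0.2929 \begin{pmatrix} 5.8284 & 2.4142 \\ 2.4142 & 1 \end{pmatrix} = \begin{pmatrix} -0.7071 & -0.7071 \\ -0.7071 & 0.7071 \end{pmatrix}.$$

Form $P_1$:
$$P_1 = \begin{pmatrix} 1 & 0 \\ 0 & \hat P_1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -0.7071 & -0.7071 \\ 0 & -0.7071 & 0.7071 \end{pmatrix}$$
$$A \equiv A^{(1)} = P_1 A P_1^T = \begin{pmatrix} 0 & -2.1213 & 0.7071 \\ -1.4142 & 3.5000 & -0.5000 \\ 0 & 1.5000 & -0.5000 \end{pmatrix} = H_u.$$
All computations are done using 4-digit arithmetic.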
Tridiagonal Reduction
If the matrix $A$ is symmetric, then from
$$PAP^T = H_u$$
it follows immediately that the upper Hessenberg matrix $H_u$ is also symmetric and, therefore, is tridiagonal. Thus, if the algorithm is applied to a symmetric matrix $A$, the resulting matrix $H_u$ will be a symmetric tridiagonal matrix $T$. Furthermore, one obviously can take advantage of the symmetry of $A$ to modify the algorithm. For example, a significant savings can be made in storage by taking advantage of the symmetry of each $A^{(k)}$.

The symmetric algorithm requires only $\frac{2}{3}n^3$ flops to compute $T$ compared to $\frac{5}{3}n^3$ flops needed to compute $H_u$. The round-off property is essentially the same as that of the nonsymmetric algorithm. The algorithm is stable.

Example 5.4.5
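Let
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 1 \end{pmatrix}.$$
Since $n = 3$, we have just one step to perform.

Form $\hat P_1$ such that
$$\hat P_1 \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} \times \\ 0 \end{pmatrix}:$$
$$u_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \sqrt{2}\, e_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \sqrt{2} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 + \sqrt{2} \\ 1 \end{pmatrix}$$
$$\hat P_1 = I_2 - \frac{2 u_2 u_2^T}{u_2^T u_2} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - 0.2929 \begin{pmatrix} 5.8284 & 2.4142 \\ 2.4142 & 1.0000 \end{pmatrix} = \begin{pmatrix} -0.7071 & -0.7071 \\ -0.7071 & 0.7071 \end{pmatrix}.$$

Form $P_1$:
$$P_1 = \begin{pmatrix} 1 & 0 \\ 0 & \hat P_1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -0.7071 & -0.7071 \\ 0 & -0.7071 & 0.7071 \end{pmatrix}.$$
Thus
$$H_u = P_1 A P_1^T = \begin{pmatrix} 0 & -1.4142 & 0 \\ -1.4142 & 2.5000 & 0.5000 \\ 0 & 0.5000 & 0.5000 \end{pmatrix}.$$
(Note that $H_u$ is symmetric tridiagonal.)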
5.5 Givens Matrices
Definition 5.5.1 A matrix of the form
$$J(i, j, c, s) = \begin{pmatrix} 1 & & & & & & \\ & \ddots & & & & & \\ & & c & \cdots & s & & \\ & & \vdots & \ddots & \vdots & & \\ & & -s & \cdots & c & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{pmatrix},$$
in which $c$, $s$, $-s$, $c$ occupy the positions $(i,i)$, $(i,j)$, $(j,i)$, $(j,j)$ (that is, the $i$th and $j$th rows and columns) and all remaining entries are those of the identity matrix, where $c^2 + s^2 = 1$, is called a Givens matrix, after the numerical analyst Wallace Givens.

Since one can choose $c = \cos\theta$ and $s = \sin\theta$ for some $\theta$, a Givens matrix as above can be conveniently denoted by $J(i, j, \theta)$. Geometrically, the matrix $J(i, j, \theta)$ rotates a pair of coordinate axes (the $i$th unit vector as its x-axis and the $j$th unit vector as its y-axis) through the given angle $\theta$ in the $(i, j)$ plane. That is why the Givens matrix $J(i, j, \theta)$ is commonly known as a Givens Rotation or Plane Rotation in the $(i, j)$ plane. This is illustrated in the following figure.
W. Givens was director of the Applied Mathematics Division at Argonne National Laboratory. His pioneering work done in 1950 on computing the eigenvalues of a symmetric matrix by reducing it to a symmetric tridiagonal form in a numerically stable way forms the basis of many numerically backward stable algorithms developed later. Givens held appointments at many prestigious institutes and research institutions (for a complete biography, see the July 1993 SIAM Newsletter). He died in March, 1993.
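[Figure: in the $(e_1, e_2)$ plane, the unit vector $u = (\cos\alpha, \sin\alpha)^T$ is rotated through the angle $\theta$ into $v = \begin{pmatrix} \cos(\theta + \alpha) \\ \sin(\theta + \alpha) \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \cos\alpha \\ \sin\alpha \end{pmatrix}$.]

Thus, when an $n$-vector
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
is premultiplied by the Givens rotation $J(i, j, \theta)$, only the $i$th and $j$th components of $x$ are affected; the other components remain unchanged. Note that since $c^2 + s^2 = 1$, $J(i, j, \theta)\, J(i, j, \theta)^T = I$; thus the rotation $J(i, j, \theta)$ is orthogonal.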
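If $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ is a 2-vector, then it is a matter of simple verification that, with
$$c = \frac{x_1}{\sqrt{x_1^2 + x_2^2}}, \qquad s = \frac{x_2}{\sqrt{x_1^2 + x_2^2}},$$
the Givens rotation $J(1, 2, \theta) = \begin{pmatrix} c & s \\ -s & c \end{pmatrix}$ is such that $J(1, 2, \theta)\, x = \begin{pmatrix} \times \\ 0 \end{pmatrix}$.

The above formula for computing $c$ and $s$ might cause some underflow or overflow. However, the following simple rearrangement of the formula might prevent that possibility.

If $|x_2| \ge |x_1|$, compute $t = \dfrac{x_1}{x_2}$, $s = \dfrac{1}{\sqrt{1 + t^2}}$, $c = st$.
Otherwise, compute $t = \dfrac{x_2}{x_1}$, $c = \dfrac{1}{\sqrt{1 + t^2}}$, $s = ct$.

(Note that the computations of $c$, $s$, and $t$ do not involve $\theta$.)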
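The rearranged formula translates directly into code. The sketch below is illustrative only: the function name `givens` is hypothetical, and the early return for $x_2 = 0$ is an added guard (not part of the text) that avoids a division by zero when both components vanish.

```python
import math

def givens(x1, x2):
    """Return (c, s) such that [[c, s], [-s, c]] maps (x1, x2) to (r, 0)."""
    if x2 == 0.0:
        return 1.0, 0.0  # added guard: nothing to zero out
    if abs(x2) >= abs(x1):
        t = x1 / x2
        s = 1.0 / math.sqrt(1.0 + t * t)
        c = s * t
    else:
        t = x2 / x1
        c = 1.0 / math.sqrt(1.0 + t * t)
        s = c * t
    return c, s

c, s = givens(1.0, 0.5)             # the vector of Example 5.5.1
c_big, s_big = givens(1e200, 1e200) # x1**2 + x2**2 would overflow here
```

Since only the ratio $t$ with $|t| \le 1$ is squared, the intermediate quantity $1 + t^2$ stays between 1 and 2, which is why very large or very small components cause no trouble.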
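Example 5.5.1

$$x = \begin{pmatrix} 1 \\ \frac{1}{2} \end{pmatrix}$$
Since $|x_1| > |x_2|$, we use $t = \frac{1}{2}$, $c = \dfrac{1}{\sqrt{1 + \frac{1}{4}}} = \dfrac{2}{\sqrt{5}}$, $s = \dfrac{1}{\sqrt{5}}$. Then
$$\begin{pmatrix} c & s \\ -s & c \end{pmatrix} x = \begin{pmatrix} \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \\ -\frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{pmatrix} \begin{pmatrix} 1 \\ \frac{1}{2} \end{pmatrix} = \begin{pmatrix} \frac{\sqrt{5}}{2} \\ 0 \end{pmatrix}.$$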
Zeroing Specified Entries in a Vector

Givens rotations are especially useful in creating zeros in a specified position in a vector. Thus, if
$$x = (x_1, x_2, \ldots, x_i, \ldots, x_k, \ldots, x_n)^T$$
and if we desire to zero $x_k$ only, we can construct the rotation $J(i, k, \theta)$ $(i < k)$ such that $J(i, k, \theta)\, x$ will have zero in the $k$th position.

To construct $J(i, k, \theta)$, first construct a $2 \times 2$ Givens rotation $\begin{pmatrix} c & s \\ -s & c \end{pmatrix}$ such that
$$\begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} x_i \\ x_k \end{pmatrix} = \begin{pmatrix} \times \\ 0 \end{pmatrix}$$
and then form the matrix $J(i, k, \theta)$ by inserting $c$ in the positions $(i, i)$ and $(k, k)$, $s$ and $-s$ respectively in the positions $(i, k)$ and $(k, i)$, and filling the rest of the matrix with entries of the identity matrix.
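Example 5.5.2

$$x = \begin{pmatrix} 1 \\ -1 \\ 3 \end{pmatrix}$$
Suppose we want to create a zero in the third position, that is, $k = 3$. Choose $i = 2$.

1. Form a $2 \times 2$ rotation such that
$$\begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} -1 \\ 3 \end{pmatrix} = \begin{pmatrix} \times \\ 0 \end{pmatrix}, \qquad c = \frac{-1}{\sqrt{10}}, \quad s = \frac{3}{\sqrt{10}}.$$

2. Form
$$J(2, 3, \theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{-1}{\sqrt{10}} & \frac{3}{\sqrt{10}} \\ 0 & \frac{-3}{\sqrt{10}} & \frac{-1}{\sqrt{10}} \end{pmatrix}.$$
Then
$$J(2, 3, \theta)\, x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{-1}{\sqrt{10}} & \frac{3}{\sqrt{10}} \\ 0 & \frac{-3}{\sqrt{10}} & \frac{-1}{\sqrt{10}} \end{pmatrix} \begin{pmatrix} 1 \\ -1 \\ 3 \end{pmatrix} = \begin{pmatrix} 1 \\ \sqrt{10} \\ 0 \end{pmatrix}.$$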
Creating Zeros in a Vector Except Possibly in the First Place
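Given an $n$-vector $x$, if we desire to zero all the entries of $x$ except possibly the first one, we can construct $J(1, 2, \theta)$, $x^{(1)} = J(1, 2, \theta)\, x$, $x^{(2)} = J(1, 3, \theta)\, x^{(1)}$, $x^{(3)} = J(1, 4, \theta)\, x^{(2)}$, etc., so that with $P = J(1, n, \theta) \cdots J(1, 3, \theta)\, J(1, 2, \theta)$ we will have $Px$ a multiple of $e_1$. (The angle $\theta$ is in general different for each rotation.) Since each rotation is orthogonal, so is $P$.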
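The successive rotations $J(1,2,\theta), J(1,3,\theta), \ldots, J(1,n,\theta)$ can be applied without ever forming a matrix, since each one touches only two components. The following sketch assumes a hypothetical helper name `rotate_to_e1` and uses `math.hypot` from the Python standard library to form $\sqrt{x_1^2 + x_k^2}$ in place of the rearranged formula given earlier.

```python
import math

def rotate_to_e1(x):
    """Apply J(1,n)...J(1,3)J(1,2) to x; return the result and the (k, c, s) triples."""
    y = list(x)
    rotations = []
    for k in range(1, len(y)):
        r = math.hypot(y[0], y[k])
        c, s = (1.0, 0.0) if r == 0.0 else (y[0] / r, y[k] / r)
        # Only the first and the (k+1)th components change
        y[0], y[k] = c * y[0] + s * y[k], -s * y[0] + c * y[k]
        rotations.append((k, c, s))
    return y, rotations

# The vector of Example 5.5.3
y, rotations = rotate_to_e1([1.0, -1.0, 2.0])
```

Saving the triples $(k, c, s)$ is enough to reapply $P$ to another vector later, which mirrors the remark that the rotations need not be formed explicitly.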
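Example 5.5.3

$$x = \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}$$
$$J(1, 2, \theta) = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad x^{(1)} = J(1, 2, \theta)\, x = \begin{pmatrix} \sqrt{2} \\ 0 \\ 2 \end{pmatrix}$$
$$J(1, 3, \theta) = \begin{pmatrix} \frac{\sqrt{2}}{\sqrt{6}} & 0 & \frac{2}{\sqrt{6}} \\ 0 & 1 & 0 \\ \frac{-2}{\sqrt{6}} & 0 & \frac{\sqrt{2}}{\sqrt{6}} \end{pmatrix}.$$
Then
$$J(1, 3, \theta)\, x^{(1)} = \begin{pmatrix} \sqrt{6} \\ 0 \\ 0 \end{pmatrix}$$
$$P = J(1, 3, \theta)\, J(1, 2, \theta) = \begin{pmatrix} \frac{1}{\sqrt{6}} & \frac{-1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\sqrt{\frac{2}{6}} & \sqrt{\frac{2}{6}} & \sqrt{\frac{2}{6}} \end{pmatrix}, \qquad Px = \begin{pmatrix} \sqrt{6} \\ 0 \\ 0 \end{pmatrix}.$$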
Flop-Count and Round-off Property. Creating zeros in a vector using Givens rotations is about twice as expensive as using Householder matrices. To be precise, the process requires only $1\frac{1}{2}$ times as many flops as Householder's, but it requires $O\!\left(\frac{n^2}{2}\right)$ square roots, whereas the Householder method requires $O(n)$ square roots.
The process is as stable as the Householder method.
Creating Zeros in Specified Positions of a Matrix

The idea of creating zeros in specified positions of a vector can be trivially extended to create zeros in specified positions of a matrix as well. Thus, if we wish to create a zero in the $(j, i)$ $(j > i)$ position of a matrix $A$, one way to do this is to construct the rotation $J(i, j, \theta)$ affecting the $i$th and $j$th rows only, such that $J(i, j, \theta) A$ will have a zero in the $(j, i)$ position. The procedure then is as follows.

Algorithm 5.5.1 Creating Zeros in a Specified Position of a Matrix Using Givens Rotations

Given an $n \times n$ matrix $A$, the following algorithm overwrites $A$ by $J(i, j, \theta) A$ such that the latter has a zero in the $(j, i)$ position.

1. Find $c = \cos\theta$ and $s = \sin\theta$ such that
$$\begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} a_{ii} \\ a_{ji} \end{pmatrix} = \begin{pmatrix} \times \\ 0 \end{pmatrix}.$$
2. Form $J(i, j, \theta)$ and overwrite $A$ with $J(i, j, \theta) A$.

Remark: Note that there are other ways to do this as well. For example, we can form $J(k, j, \theta)$ affecting the $k$th and $j$th rows, such that $J(k, j, \theta) A$ will have a zero in the $(j, i)$ position.
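Example 5.5.4

Let
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 3 & 4 \\ 4 & 5 & 6 \end{pmatrix}.$$
Create a zero in the (2,1) position of $A$ using $J(1, 2, \theta)$.

1. Find $c$ and $s$ such that
$$\begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} \times \\ 0 \end{pmatrix}, \qquad c = \frac{1}{\sqrt{10}}, \quad s = \frac{3}{\sqrt{10}}.$$

2. Form $J(1, 2, \theta)$:
$$J(1, 2, \theta) = \begin{pmatrix} \frac{1}{\sqrt{10}} & \frac{3}{\sqrt{10}} & 0 \\ \frac{-3}{\sqrt{10}} & \frac{1}{\sqrt{10}} & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$J(1, 2, \theta) A = \begin{pmatrix} \frac{10}{\sqrt{10}} & \frac{11}{\sqrt{10}} & \frac{15}{\sqrt{10}} \\ 0 & \frac{-3}{\sqrt{10}} & \frac{-5}{\sqrt{10}} \\ 4 & 5 & 6 \end{pmatrix}.$$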
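Example 5.5.5

Let
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 4 & 5 & 6 \end{pmatrix}.$$
Create a zero in the (3,1) position using $J(2, 3, \theta)$.

1. Find $c$ and $s$ such that
$$\begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} 2 \\ 4 \end{pmatrix} = \begin{pmatrix} \times \\ 0 \end{pmatrix}, \qquad c = \frac{2}{\sqrt{20}}, \quad s = \frac{4}{\sqrt{20}}.$$

2. Form
$$J(2, 3, \theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{2}{\sqrt{20}} & \frac{4}{\sqrt{20}} \\ 0 & \frac{-4}{\sqrt{20}} & \frac{2}{\sqrt{20}} \end{pmatrix}$$
$$A^{(1)} = J(2, 3, \theta) A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{2}{\sqrt{20}} & \frac{4}{\sqrt{20}} \\ 0 & \frac{-4}{\sqrt{20}} & \frac{2}{\sqrt{20}} \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 4 & 5 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ \frac{20}{\sqrt{20}} & \frac{26}{\sqrt{20}} & \frac{32}{\sqrt{20}} \\ 0 & \frac{-2}{\sqrt{20}} & \frac{-4}{\sqrt{20}} \end{pmatrix}.$$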
5.5.1 Givens Rotations and QR Factorization
It is clear from the foregoing discussion that, like Householder matrices, Givens rotations can also be applied to find the QR factorization of a matrix. The Givens method, however, is almost twice as expensive as the Householder method. In spite of this, QR factorization using Givens rotations seems to be particularly useful in QR iterations for eigenvalue computations and in the solution of linear systems with structured matrices, such as Toeplitz, etc. Givens rotations are also emerging as important tools in parallel computations of many important linear algebra problems.

Here we present the algorithm for QR factorization of an $m \times n$ matrix $A$, $m \ge n$, using Givens' method. The basic idea is just like Householder's: compute orthogonal matrices $Q_1, Q_2, \ldots, Q_k$, using Givens rotations, such that $A^{(1)} = Q_1 A$ has zeros below the (1,1) entry in the first column, $A^{(2)} = Q_2 A^{(1)}$ has zeros below the (2,2) entry in the second column, and so on. Each $Q_i$ is generated as a product of Givens rotations. One way to form $\{Q_i\}$ is:
$$Q_1 = J(1, m, \theta)\, J(1, m-1, \theta) \cdots J(1, 2, \theta)$$
$$Q_2 = J(2, m, \theta)\, J(2, m-1, \theta) \cdots J(2, 3, \theta)$$
and so on. Let $s = \min(m, n-1)$. Then
$$R = A^{(s-1)} = Q_{s-1} A^{(s-2)} = Q_{s-1} Q_{s-2} A^{(s-3)} = \cdots = Q_{s-1} Q_{s-2} \cdots Q_2 Q_1 A = Q^T A.$$
We now have $A = QR$ with $Q^T = Q_{s-1} Q_{s-2} \cdots Q_2 Q_1$.
Theorem 5.5.1 (Givens QR Factorization Theorem) Given an $m \times n$ matrix $A$, $m \ge n$, there exist $s = \min(m, n-1)$ orthogonal matrices $Q_1, Q_2, \ldots, Q_{s-1}$ defined by
$$Q_i = J(i, m, \theta)\, J(i, m-1, \theta) \cdots J(i, i+1, \theta)$$
such that if
$$Q = Q_1^T Q_2^T \cdots Q_{s-1}^T,$$
we have
$$A = QR, \qquad \text{where } R = \begin{pmatrix} R_1 \\ 0 \end{pmatrix}$$
and $R_1$ is upper triangular.
Forming the Matrix Q and Other Computational Details
In practice there is no need to form the $n \times n$ matrices $J(k, \ell, \theta)$ and $J(k, \ell, \theta) A$ explicitly. Note that $J(k, \ell, \theta)$ can be determined by knowing $c$ and $s$ only, and $J(k, \ell, \theta) A$ replaces the $k$th and $\ell$th rows of $A$ by their linear combinations. Specifically, the $k$th row of $J(k, \ell, \theta) A$ is $c$ times the $k$th row of $A$ plus $s$ times the $\ell$th row. Similarly, the $\ell$th row of $J(k, \ell, \theta) A$ is $-s$ times the $k$th row of $A$ plus $c$ times the $\ell$th row. All other rows of $A$ are unchanged.

If the orthogonal matrix $Q$ is needed explicitly, then it can be computed from the product
$$Q = Q_1^T Q_2^T \cdots Q_{s-1}^T,$$
where each $Q_i$ is the product of $(m - i)$ Givens rotations:
$$Q_i = J(i, m, \theta)\, J(i, m-1, \theta) \cdots J(i, i+1, \theta).$$
Algorithm 5.5.2 QR Factorization Using Givens Rotations
Given an $m \times n$ matrix $A$, the following algorithm computes an orthogonal matrix $Q$, using Givens rotations, such that $A = QR$. The algorithm overwrites $A$ with $R$.

For $k = 1, 2, \ldots, \min\{n, m-1\}$ do
For $\ell = k+1, \ldots, m$ do

1. Find $c$ and $s$ such that
$$\begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} a_{kk} \\ a_{\ell k} \end{pmatrix} = \begin{pmatrix} \times \\ 0 \end{pmatrix}.$$
2. Save the indices $k$ and $\ell$ and the numbers $c$ and $s$.
3. Form $J(k, \ell, \theta)$.
4. Overwrite $A$ with $J(k, \ell, \theta) A$.

Flop-Count. The algorithm requires $2n^2\left(m - \frac{n}{3}\right)$ flops. This count, of course, does not include computation of $Q$. Thus, the algorithm is almost twice as expensive as the Householder algorithm for QR factorization.

Round-off Property. The algorithm is quite stable. It can be shown that the computed $\hat Q$ and $\hat R$ satisfy
$$\hat R = \hat Q^T (A + E),$$
where
$$\|E\|_F \le c\,\mu\,\|A\|_F,$$
$c$ is a constant of order unity, and $\mu$ is the machine precision (Wilkinson AEP, p. 240).
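A compact sketch of Algorithm 5.5.2 follows. It is illustrative only: the function name `givens_qr` is hypothetical, $c$ and $s$ are formed with `math.hypot` instead of the rearranged formula, and $Q^T$ is accumulated explicitly (by applying each rotation to an identity matrix) purely so the factorization can be checked, whereas the algorithm above only saves $k$, $\ell$, $c$, and $s$.

```python
import math

def givens_qr(A):
    """Return (Q, R) with A = Q R, built from row rotations J(k, l)."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Qt = [[float(i == j) for j in range(m)] for i in range(m)]  # accumulates Q^T
    for k in range(min(n, m - 1)):
        for l in range(k + 1, m):
            r = math.hypot(R[k][k], R[l][k])
            if r == 0.0:
                continue  # entry already zero; no rotation needed
            c, s = R[k][k] / r, R[l][k] / r
            for M in (R, Qt):
                # Rows k and l are replaced by their linear combinations
                row_k = [c * a + s * b for a, b in zip(M[k], M[l])]
                row_l = [-s * a + c * b for a, b in zip(M[k], M[l])]
                M[k], M[l] = row_k, row_l
    Q = [[Qt[j][i] for j in range(m)] for i in range(m)]
    return Q, R

# The matrix used in Example 5.5.6
A = [[0.0, 1.0, 1.0], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]]
Q, R = givens_qr(A)
```

With this choice of $c$ and $s$ each pivot entry becomes $r \ge 0$, so the diagonal of $R$ is nonnegative except possibly in the last row when $m = n$.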
5.5.2 Uniqueness in QR Factorization
We have seen that QR factorization of a matrix $A$ always exists and that this factorization may be obtained in different ways. One therefore wonders if this factorization is unique. In the following we will see that if $A$ is nonsingular then the factorization is essentially unique. Moreover, if the diagonal entries of the upper triangular matrices of the factorizations obtained by two different methods are positive, then these two QR factorizations of $A$ are exactly the same. To see this, let
$$A = Q_1 R_1 = Q_2 R_2. \qquad (5.5.1)$$
Since $A$ is nonsingular, $R_1$ and $R_2$ are also nonsingular. Note that $\det A = \det(Q_1 R_1) = \det Q_1 \det R_1$ and $\det A = \det(Q_2 R_2) = \det Q_2 \det R_2$. Since $Q_1$ and $Q_2$ are orthogonal, their determinants are $\pm 1$. Thus $\det R_1$ and $\det R_2$ are different from zero. From (5.5.1), we have
$$Q_2^T Q_1 = R_2 R_1^{-1} = V,$$
so that $V^T V = Q_1^T Q_2 Q_2^T Q_1 = I$. Now $V$ is upper triangular (since $R_1^{-1}$ and $R_2$ are both upper triangular), so equating elements on both sides, we see that $V$ must be a diagonal matrix with $\pm 1$ as diagonal elements. Thus,
$$R_2 R_1^{-1} = V = \mathrm{diag}(d_1, d_2, \ldots, d_n) \quad \text{with } d_i = \pm 1, \; i = 1, 2, \ldots, n.$$
This means that $R_2$ and $R_1$ are the same except for the signs of their rows. Similarly, from
$$Q_2^T Q_1 = V$$
we see that $Q_1$ and $Q_2$ are the same except for the signs of their columns. (For a proof, see Stewart IMC, p. 214.) If the diagonal entries of $R_1$ and $R_2$ are positive, then $V$ is the identity matrix, so that
$$R_2 = R_1 \quad \text{and} \quad Q_2 = Q_1.$$
The above result can be easily generalized to the case where $A$ is $m \times n$ and has linearly independent columns.
Theorem 5.5.2 (QR Uniqueness Theorem) Let A have linearly independent columns. Then there exist a unique matrix Q with orthonormal columns and a unique upper triangular matrix R with positive diagonal entries such that
A = QR:
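Example 5.5.6

We find the QR factorization of
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}$$
using Givens rotations and verify the uniqueness of this factorization.

Step 1. Find $c$ and $s$ such that
$$\begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} a_{11} \\ a_{21} \end{pmatrix} = \begin{pmatrix} \times \\ 0 \end{pmatrix}, \qquad a_{11} = 0, \; a_{21} = 1, \qquad c = 0, \; s = 1.$$
$$J(1, 2, \theta) = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$A \equiv J(1, 2, \theta) A = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ 0 & -1 & -1 \\ 1 & 1 & 1 \end{pmatrix}.$$
Find $c$ and $s$ such that
$$\begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} a_{11} \\ a_{31} \end{pmatrix} = \begin{pmatrix} \times \\ 0 \end{pmatrix}, \qquad a_{11} = 1, \; a_{31} = 1, \qquad c = \frac{1}{\sqrt{2}}, \; s = \frac{1}{\sqrt{2}}.$$
$$J(1, 3, \theta) = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{pmatrix}$$
$$A \equiv J(1, 3, \theta) A = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 0 & -1 & -1 \\ 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} \sqrt{2} & \frac{3}{\sqrt{2}} & 2\sqrt{2} \\ 0 & -1 & -1 \\ 0 & -\frac{1}{\sqrt{2}} & -\sqrt{2} \end{pmatrix}.$$

Step 2. Find $c$ and $s$ such that
$$\begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} a_{22} \\ a_{32} \end{pmatrix} = \begin{pmatrix} \times \\ 0 \end{pmatrix}, \qquad a_{22} = -1, \; a_{32} = -\frac{1}{\sqrt{2}}, \qquad c = -\frac{\sqrt{2}}{\sqrt{3}}, \; s = -\frac{1}{\sqrt{3}}.$$
$$J(2, 3, \theta) A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\frac{\sqrt{2}}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \\ 0 & \frac{1}{\sqrt{3}} & -\frac{\sqrt{2}}{\sqrt{3}} \end{pmatrix} \begin{pmatrix} \sqrt{2} & \frac{3}{\sqrt{2}} & 2\sqrt{2} \\ 0 & -1 & -1 \\ 0 & -\frac{1}{\sqrt{2}} & -\sqrt{2} \end{pmatrix} = \begin{pmatrix} 1.4142 & 2.1213 & 2.8284 \\ 0 & 1.2247 & 1.6330 \\ 0 & 0 & 0.5774 \end{pmatrix} = R$$
(using four-digit computations).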
Remark: Note that the upper triangular matrix obtained here is essentially the same as the one given by the Householder method earlier (Example 5.4.2), differing from it only in the signs of the first and third rows.
5.5.3 Givens Rotations and Reduction to Hessenberg Form
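The Givens matrices can also be employed to transform an arbitrary $n \times n$ matrix $A$ to an upper Hessenberg matrix $H_u$ by orthogonal similarity: $PAP^T = H_u$. However, to do this, Givens rotations must be constructed in a certain special manner. For example, in the first step, Givens rotations $J(2, 3, \theta), J(2, 4, \theta), \ldots, J(2, n, \theta)$ are successively computed so that with $P_1 = J(2, n, \theta) \cdots J(2, 4, \theta)\, J(2, 3, \theta)$,
$$P_1 A P_1^T = A^{(1)} = \begin{pmatrix} \times & \times & \cdots & \times \\ \times & \times & \cdots & \times \\ 0 & \times & \cdots & \times \\ \vdots & \vdots & & \vdots \\ 0 & \times & \cdots & \times \end{pmatrix}.$$
In the second step, Givens rotations $J(3, 4, \theta), J(3, 5, \theta), \ldots, J(3, n, \theta)$ are successively computed so that with $P_2 = J(3, n, \theta)\, J(3, n-1, \theta) \cdots J(3, 4, \theta)$,
$$P_2 A^{(1)} P_2^T = A^{(2)} = \begin{pmatrix} \times & \times & \times & \cdots & \times \\ \times & \times & \times & \cdots & \times \\ 0 & \times & \times & \cdots & \times \\ 0 & 0 & \times & \cdots & \times \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & \times & \cdots & \times \end{pmatrix},$$
and so on. At the end of the $(n-2)$th step, the matrix $A^{(n-2)}$ is the upper Hessenberg matrix $H_u$. The transforming matrix $P$ is given by
$$P = P_{n-2} P_{n-3} \cdots P_2 P_1,$$
where $P_i = J(i+1, n, \theta)\, J(i+1, n-1, \theta) \cdots J(i+1, i+2, \theta)$.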
Algorithm 5.5.3 Givens Hessenberg Reduction
Given an $n \times n$ matrix $A$, the following algorithm overwrites $A$ with $PAP^T = H_u$, where $H_u$ is an upper Hessenberg matrix.

For $p = 1, 2, \ldots, n-2$ do
For $q = p+2, \ldots, n$ do

1. Find $c = \cos\theta$ and $s = \sin\theta$ such that
$$\begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} a_{p+1,p} \\ a_{q,p} \end{pmatrix} = \begin{pmatrix} \times \\ 0 \end{pmatrix}.$$
2. Save $c$ and $s$ and the indices $p$ and $q$.
3. Overwrite $A$ with $J(p+1, q, \theta)\, A\, J(p+1, q, \theta)^T$.
There is no need to form $J(p+1, q, \theta)$ and $J(p+1, q, \theta) A$ explicitly, since they are completely determined by $p$, $q$, $c$, and $s$ (see the section at the end of the QR factorization algorithm using Givens rotations). If $P$ is needed explicitly, it can be formed from
$$P = P_{n-2} \cdots P_2 P_1,$$
where $P_i = J(i+1, n, \theta)\, J(i+1, n-1, \theta) \cdots J(i+1, i+2, \theta)$.

Flop-Count. The algorithm requires about $\frac{10}{3}n^3$ flops to compute $H_u$, compared to $\frac{5}{3}n^3$ required by the Householder method. Thus, the Givens reduction to Hessenberg form is about twice as expensive as the Householder reduction. If the matrix $A$ is symmetric, then the algorithm requires about $\frac{4}{3}n^3$ flops to transform $A$ to a symmetric tridiagonal matrix $T$; again this is twice as much as required by the Householder method to do the same job.

Round-off Property. The round-off property is essentially the same as that of the Householder method. The method is numerically stable.

Example 5.5.7
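$$A = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}$$

Step 1. Find $c$ and $s$ such that
$$\begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} a_{21} \\ a_{31} \end{pmatrix} = \begin{pmatrix} \times \\ 0 \end{pmatrix}, \qquad a_{21} = a_{31} = 1, \qquad c = \frac{1}{\sqrt{2}}, \; s = \frac{1}{\sqrt{2}}.$$
$$J(2, 3, \theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 0 & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}$$
$$A \equiv J(2, 3, \theta)\, A\, J(2, 3, \theta)^T = \begin{pmatrix} 0 & 2.1213 & 0.7071 \\ 1.4142 & 3.5000 & 0.5000 \\ 0 & -1.5000 & -0.5000 \end{pmatrix} = \text{Upper Hessenberg}.$$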
Observation: Note that the upper Hessenberg matrix obtained here is essentially the same as that obtained by Householder's method (Example 5.4.4); the subdiagonal entries differ only in sign.
5.5.4 Uniqueness in Hessenberg Reduction
The above example and the observation made therein bring up the question of uniqueness in Hessenberg reduction. To this end, we state a simplified version of what is known as the Implicit Q Theorem. For a complete statement and proof, see Golub and Van Loan (MC 1983, pp. 223-234).

Theorem 5.5.3 (Implicit Q Theorem) Let $P$ and $Q$ be orthogonal matrices such that $P^T A P = H_1$ and $Q^T A Q = H_2$ are two unreduced upper Hessenberg matrices. Suppose that $P$ and $Q$ have the same first columns. Then $H_1$ and $H_2$ are essentially the same in the sense that $H_2 = D^{-1} H_1 D$, where
$$D = \mathrm{diag}(\pm 1, \ldots, \pm 1).$$
Example 5.5.8
Consider the matrix
$$A = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}$$
once more. The Householder method (Example 5.4.4) gave
$$H_1 = P_1 A P_1^T = \begin{pmatrix} 0 & -2.1213 & 0.7071 \\ -1.4142 & 3.5000 & -0.5000 \\ 0 & 1.5000 & -0.5000 \end{pmatrix}.$$
The Givens method (Example 5.5.7) gave
$$H_2 = J(2, 3, \theta)\, A\, J(2, 3, \theta)^T = \begin{pmatrix} 0 & 2.1213 & 0.7071 \\ 1.4142 & 3.5000 & 0.5000 \\ 0 & -1.5000 & -0.5000 \end{pmatrix}.$$
In the notation of Theorem 5.5.3 we have
$$P = P_1, \qquad Q^T = J(2, 3, \theta).$$
Both $P$ and $Q$ have the same first columns, namely, the first column of the identity. We verify that
$$H_2 = D^{-1} H_1 D \quad \text{with} \quad D = \mathrm{diag}(1, -1, 1).$$
5.6 Orthonormal Bases and Orthogonal Projections
Let $A$ be $m \times n$, where $m \ge n$. Consider the QR factorization of $A$:
$$Q^T A = \begin{pmatrix} R \\ 0 \end{pmatrix}.$$
Assume that $A$ has full rank $n$. Partition $Q = (Q_1, Q_2)$, where $Q_1$ has $n$ columns. Then the columns of $Q_1$ form an orthonormal basis for $R(A)$. Similarly, the columns of $Q_2$ form an orthonormal basis for the orthogonal complement of $R(A)$. Thus, the matrix
$$P_A = Q_1 Q_1^T$$
is the orthogonal projection onto $R(A)$ and the matrix $P_A^{\perp} = Q_2 Q_2^T$ is the projection onto the orthogonal complement of $R(A)$. Since the orthogonal complement of $R(A)$ is $R(A)^{\perp} = N(A^T)$, we shall denote $P_A^{\perp}$ by $P_N$.
Example 5.6.1

$$A = \begin{pmatrix} 1 & 1 \\ 0.0001 & 0 \\ 0 & 0.0001 \end{pmatrix}$$
Using the results of Example 5.4.3 we have
$$Q = \begin{pmatrix} -1 & 0.0001 & -0.0001 \\ -0.0001 & -0.7071 & 0.7071 \\ 0 & 0.7071 & 0.7071 \end{pmatrix}$$
$$Q_1 = \begin{pmatrix} -1 & 0.0001 \\ -0.0001 & -0.7071 \\ 0 & 0.7071 \end{pmatrix}, \qquad Q_2 = \begin{pmatrix} -0.0001 \\ 0.7071 \\ 0.7071 \end{pmatrix}$$
$$P_A = Q_1 Q_1^T = \begin{pmatrix} 1.000 & 0.0003 & 0.0007 \\ 0.0003 & 0.5000 & -0.5000 \\ 0.0007 & -0.5000 & 0.5000 \end{pmatrix}$$
$$P_N = P_A^{\perp} = Q_2 Q_2^T = \begin{pmatrix} -0.0001 \\ 0.7071 \\ 0.7071 \end{pmatrix} (-0.0001, \; 0.7071, \; 0.7071) = \begin{pmatrix} 0.00000001 & -0.0001 & -0.0001 \\ -0.0001 & 0.5000 & 0.5000 \\ -0.0001 & 0.5000 & 0.5000 \end{pmatrix}.$$
Projection of a vector
Given an $m$-vector $b$, the vector $b_R$, the projection of $b$ onto $R(A)$, is given by
$$b_R = P_A b.$$
Similarly $b_{\perp}$, the projection of $b$ onto the orthogonal complement of $R(A)$, is given by
$$b_{\perp} = P_A^{\perp} b = P_N b,$$
where we denote $P_A^{\perp}$ by $P_N$.

Note that $b = b_R + b_{\perp}$. Again, since the orthogonal complement of $R(A)$ is $N(A^T)$, the null space of $A^T$, we denote $b_{\perp}$ by $b_N$ for notational convenience.

Note: Since $b_N = b - b_R$, it is tempting to compute $b_N$ just by subtracting $b_R$ from $b$ once $b_R$ has been computed. This is not advisable, since in the computation of $b_N$ from $b_N = b - b_R$, cancellation can take place when $b_R \approx b$.
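The two projections can be computed directly from the orthonormal columns, without forming $P_A$ or $P_N$ and without the subtraction $b - b_R$. The sketch below is illustrative: the helper name `project` is hypothetical, and the basis vectors are exact (unrounded) orthonormal bases assumed here for $R(A)$ and its complement when $A$ has columns $(1, 0, 1)^T$ and $(2, 1, 0)^T$; their signs may differ from those a QR routine returns, which does not change the projections.

```python
import math

def project(columns, b):
    """Return the sum of (q^T b) q over the given orthonormal vectors q."""
    out = [0.0] * len(b)
    for q in columns:
        coeff = sum(qi * bi for qi, bi in zip(q, b))
        out = [o + coeff * qi for o, qi in zip(out, q)]
    return out

r2, r3, r6 = math.sqrt(2.0), math.sqrt(3.0), math.sqrt(6.0)
Q1 = [[1 / r2, 0.0, 1 / r2], [1 / r3, 1 / r3, -1 / r3]]  # basis of R(A)
Q2 = [[-1 / r6, 2 / r6, 1 / r6]]                         # basis of the complement
b = [1.0, 1.0, 1.0]

b_R = project(Q1, b)  # Q1 (Q1^T b)
b_N = project(Q2, b)  # Q2 (Q2^T b), computed without subtracting b_R from b
```

Computing $b_N$ from $Q_2$ in this way avoids the cancellation warned about in the note, at the cost of needing the columns of $Q_2$.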
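Example 5.6.2

$$A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$
$$Q = \begin{pmatrix} -0.7071 & -0.5774 & -0.4082 \\ 0 & -0.5774 & 0.8165 \\ -0.7071 & 0.5774 & 0.4082 \end{pmatrix}, \qquad Q_1 = \begin{pmatrix} -0.7071 & -0.5774 \\ 0 & -0.5774 \\ -0.7071 & 0.5774 \end{pmatrix}$$
$$P_A = Q_1 Q_1^T = \begin{pmatrix} 0.8334 & 0.3334 & 0.1666 \\ 0.3334 & 0.3334 & -0.3334 \\ 0.1666 & -0.3334 & 0.8334 \end{pmatrix}$$
$$b_R = P_A b = \begin{pmatrix} 1.3334 \\ 0.3334 \\ 0.6666 \end{pmatrix}$$
$$P_N = \begin{pmatrix} 0.1667 & -0.3333 & -0.1667 \\ -0.3333 & 0.6667 & 0.3333 \\ -0.1667 & 0.3333 & 0.1667 \end{pmatrix}$$
$$b_N = P_N b = \begin{pmatrix} -0.3334 \\ 0.6666 \\ 0.3334 \end{pmatrix}$$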
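Example 5.6.3

$$A = \begin{pmatrix} 1 & 1 \\ 0.0001 & 0 \\ 0 & 0.0001 \end{pmatrix}, \qquad b = \begin{pmatrix} 2 \\ 0.0001 \\ 0.0001 \end{pmatrix}$$
$$Q = \begin{pmatrix} -1 & 0.0001 & -0.0001 \\ -0.0001 & -0.7071 & 0.7071 \\ 0 & 0.7071 & 0.7071 \end{pmatrix}$$
$$P_A = Q_1 Q_1^T = \begin{pmatrix} 1 & 0.0001 & 0.0001 \\ 0.0001 & 0.5000 & -0.5000 \\ 0.0001 & -0.5000 & 0.5000 \end{pmatrix}$$
$$b_R = P_A b = \begin{pmatrix} 2 \\ 0.0001 \\ 0.0001 \end{pmatrix}, \qquad b_N = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$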
Projection of a Matrix Onto the Range of Another Matrix
Let $B = (b_1, \ldots, b_n)$ be an $m \times n$ matrix. We can then think of projecting each column of $B$ onto $R(A)$ and onto its orthogonal complement. Thus, the matrix
$$B_R = P_A (b_1, \ldots, b_n) = P_A B$$
is the orthogonal projection of $B$ onto $R(A)$. Similarly, the matrix
$$B_N = P_N B$$
is the projection of $B$ onto the orthogonal complement of $R(A)$.
Example 5.6.4

$$A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \\ 4 & 5 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \end{pmatrix}$$
$A = QR$ gives
$$Q = \begin{pmatrix} -0.2182 & -0.8165 & -0.5345 \\ -0.4364 & -0.4082 & 0.8018 \\ -0.8729 & 0.4082 & -0.2673 \end{pmatrix}.$$
Orthonormal Basis for $R(A)$:
$$Q_1 = \begin{pmatrix} -0.2182 & -0.8165 \\ -0.4364 & -0.4082 \\ -0.8729 & 0.4082 \end{pmatrix}$$
$$P_A = Q_1 Q_1^T = \begin{pmatrix} 0.7143 & 0.4286 & -0.1429 \\ 0.4286 & 0.3571 & 0.2143 \\ -0.1429 & 0.2143 & 0.9286 \end{pmatrix}$$
Orthogonal Projection of $B$ onto $R(A)$:
$$P_A B = \begin{pmatrix} 1.1429 & 2.1429 & 3.1429 \\ 1.7857 & 2.7857 & 3.7857 \\ 3.0714 & 4.0714 & 5.0714 \end{pmatrix}.$$
5.7 QR Factorization with Column Pivoting
If an $m \times n$ $(m \ge n)$ matrix $A$ has rank $r < n$, then the matrix $R$ is singular. In this case the QR factorization cannot be employed to produce an orthonormal basis of $R(A)$. To see this, just consider the following simple $2 \times 2$ example from Björck (1992, p. 31):
$$A = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} c & -s \\ s & c \end{pmatrix} \begin{pmatrix} 0 & s \\ 0 & c \end{pmatrix} = QR.$$
If $c$ and $s$ are chosen such that $c^2 + s^2 = 1$, then $\mathrm{rank}(A) = 1 < 2$, and the columns of $Q$ do not form an orthonormal basis of $R(A)$ nor of its complement. Fortunately, however, the process of QR factorization (for example, the Householder method) can be modified to yield an orthonormal basis. The idea here is to generate a permutation matrix $P$ such that
$$AP = QR, \qquad \text{where } R = \begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix}.$$
Here $R_{11}$ is $r \times r$ upper triangular, $r$ is the rank of $A$, and $Q$ is orthogonal. The first $r$ columns of $Q$ will then form an orthonormal basis of $R(A)$. The following theorem guarantees the existence of such a factorization:
Theorem 5.7.1 (QR Column Pivoting Theorem) Let $A$ be an $m \times n$ matrix with $\mathrm{rank}(A) = r \le \min(m, n)$. Then there exist an $n \times n$ permutation matrix $P$ and an $m \times m$ orthogonal matrix $Q$ such that
$$Q^T A P = \begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix},$$
where $R_{11}$ is an $r \times r$ upper triangular matrix with positive diagonal entries.
Proof. Since $\mathrm{rank}(A) = r$, there exists a permutation matrix $P$ such that
$$AP = (A_1, A_2),$$
where $A_1$ is $m \times r$ and has linearly independent columns. Consider now the QR factorization of $A_1$:
$$Q^T A_1 = \begin{pmatrix} R_{11} \\ 0 \end{pmatrix},$$
where by the uniqueness theorem (Theorem 5.5.2), $Q$ and $R_{11}$ are uniquely determined and $R_{11}$ has positive diagonal entries. Then
$$Q^T A P = (Q^T A_1, \; Q^T A_2) = \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix}.$$
Since $\mathrm{rank}(Q^T A P) = \mathrm{rank}(A) = r$ and $\mathrm{rank}(R_{11}) = r$, we must have $R_{22} = 0$.
There are several ways one can think of creating this permutation matrix $P$. We present here one such way, which is now known as QR factorization with column pivoting. The permutation matrix $P$ is formed as the product of $r$ permutation matrices $P_1$ through $P_r$. These permutation matrices are applied to $A$ one by one, before forming Householder matrices to create zeros in appropriate columns. More specifically, the following is done:

Step 1. Find the column of $A$ having the maximum norm. Permute now the columns of $A$ so that the column of maximum norm becomes the first column. This is equivalent to forming a permutation matrix $P_1$ such that the matrix $AP_1$ has the first column having the maximum norm. Form now a Householder matrix $H_1$ so that
$$A_1 = H_1 A P_1$$
has zeros in the first column below the (1,1) entry.

Step 2. Find the column with the maximum norm of the submatrix $\hat A_1$ obtained from $A_1$ by deleting the first row and the first column. Permute the columns of this submatrix so that the column of maximum norm becomes the first column. This is equivalent to constructing a permutation matrix $\hat P_2$ such that the first column of $\hat A_1 \hat P_2$ has the maximum norm. Form $P_2$ from $\hat P_2$ in the usual way, that is:
$$P_2 = \begin{pmatrix} 1 & 0 \\ 0 & \hat P_2 \end{pmatrix}.$$
Now construct a Householder matrix $H_2$ so that
$$A_2 = H_2 A_1 P_2 = H_2 H_1 A P_1 P_2$$
has zeros in the second column of $A_2$ below the (2,2) entry. As before, $H_2$ can be constructed in two steps as in Section 5.3.1.

The $k$th step can now easily be written down. The process is continued until the entries below the diagonal of the current matrix all become zero. Suppose $r$ steps are needed. Then at the end of the $r$th step, we have
$$A \equiv A^{(r)} = H_r \cdots H_1 A P_1 \cdots P_r = Q^T A P = R = \begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix}.$$

Flop-Count and Storage Consideration. The above method requires $2mnr - r^2(m+n) + \frac{2}{3}r^3$ flops. The matrix $Q$, as in the Householder factorization, is stored in factored form; it can be stored in the subdiagonal part of $A$, and $R$ can overwrite the upper triangular part of $A$.
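The pivoting steps above can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function name `qr_column_pivoting` and the threshold `tol` used to decide that the remaining columns are zero are hypothetical choices, column norms are recomputed from scratch at every step for simplicity, and $Q$ is not accumulated.

```python
import math

def qr_column_pivoting(A, tol=1e-12):
    """Return (R, perm, r): R = Q^T A P, the column order perm, and the rank found."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    perm = list(range(n))
    r = 0
    for k in range(min(m, n)):
        # Norms of the columns of the trailing submatrix (rows k.., columns k..)
        norms = [math.sqrt(sum(R[i][j] ** 2 for i in range(k, m))) for j in range(k, n)]
        p = k + max(range(len(norms)), key=norms.__getitem__)
        if norms[p - k] <= tol:
            break  # everything left is (numerically) zero
        for row in R:
            row[k], row[p] = row[p], row[k]  # bring the largest column forward
        perm[k], perm[p] = perm[p], perm[k]
        x = [R[i][k] for i in range(k, m)]
        u = x[:]
        u[0] += math.copysign(norms[p - k], x[0])
        utu = sum(v * v for v in u)
        # Apply the Householder matrix to rows k..m-1 of columns k..n-1
        for j in range(k, n):
            t = 2.0 * sum(u[i] * R[k + i][j] for i in range(len(u))) / utu
            for i in range(len(u)):
                R[k + i][j] -= t * u[i]
        r += 1
    return R, perm, r

# A rank-one 3-by-2 matrix with columns (0, 1/2, 1/2)^T and (0, 1, 1)^T
R, perm, rank = qr_column_pivoting([[0.0, 0.0], [0.5, 1.0], [0.5, 1.0]])
```

Library routines update the column norms cheaply from step to step instead of recomputing them; the recomputation here is kept only because it makes the selection rule easy to read.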
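Example 5.7.1

$$A = \begin{pmatrix} 0 & 0 \\ \frac{1}{2} & 1 \\ \frac{1}{2} & 1 \end{pmatrix} = (a_1, a_2).$$

Step 1. $a_2$ has the largest norm. Then
$$P_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad AP_1 = \begin{pmatrix} 0 & 0 \\ 1 & \frac{1}{2} \\ 1 & \frac{1}{2} \end{pmatrix}$$
$$H_1 = \begin{pmatrix} 0 & -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{\sqrt{2}} & -\frac{1}{2} & \frac{1}{2} \end{pmatrix}$$
$$A^{(1)} = H_1 A P_1 = \begin{pmatrix} 0 & -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{\sqrt{2}} & -\frac{1}{2} & \frac{1}{2} \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 1 & \frac{1}{2} \\ 1 & \frac{1}{2} \end{pmatrix} = \begin{pmatrix} -\sqrt{2} & -\frac{1}{\sqrt{2}} \\ 0 & 0 \\ 0 & 0 \end{pmatrix} = R = \begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix}.$$
Thus, for this example,
$$Q = H_1^T = H_1, \qquad P = P_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
The matrix $A$ has rank 1, since $R_{11} = -\sqrt{2}$ is $1 \times 1$. The column vector
$$\begin{pmatrix} 0 \\ -\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{pmatrix}$$
forms an orthonormal basis of $R(A)$.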
Complete Orthogonal Factorization
It is easy to see that the submatrix $(R_{11}, R_{12})$ can further be reduced by using orthogonal transformations, yielding
$$\begin{pmatrix} T & 0 \\ 0 & 0 \end{pmatrix}.$$

Theorem 5.7.2 (Complete Orthogonalization Theorem) Given $A_{m \times n}$ with $\mathrm{rank}(A) = r$, there exist orthogonal matrices $Q_{m \times m}$ and $W_{n \times n}$ such that
$$Q^T A W = \begin{pmatrix} T & 0 \\ 0 & 0 \end{pmatrix},$$
where $T$ is an $r \times r$ upper triangular matrix with positive diagonal entries.

Proof. The proof is left as an exercise (Exercise #38).

The above decomposition of $A$ is called the complete orthogonal decomposition.

Rank-Revealing QR
The above process, known as the QR factorization with column pivoting, was developed by Golub (1965). The factorization is known as the rank-revealing QR factorization, since in exact arithmetic it reveals the rank of the matrix A, which is the order of the nonsingular upper triangular matrix $R_{11}$. However, in the presence of rounding errors, we will actually have
\[ R = \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix}, \]
and if $R_{22}$ is "small" in some measure (say, $\|R_{22}\|$ is of $O(\mu)$, where $\mu$ is the machine precision), then the reduction will be terminated. Thus, from the above discussion, we note that, given an $m \times n$ matrix A ($m \ge n$), if there exists a permutation matrix P such that
\[ Q^T A P = R = \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix}, \]
where $R_{11}$ is $r \times r$, and $R_{22}$ is small in some measure, then we will say that A has numerical rank r. (For more on numerical rank, see Chapter 10, Section 10.5.5.) Unfortunately, the converse is not true. A celebrated counterexample due to Kahan (1966) shows that a matrix can be nearly rank-deficient without having $\|R_{22}\|$ small at all.
Gene H. Golub, an American mathematician and computer scientist, is well known for his outstanding contribution in numerical linear algebra, especially in the area of the singular value decomposition (SVD), least squares, and their applications in statistical computations. Golub is a professor of computer science at Stanford University and is the coauthor of the celebrated numerical linear algebra book "Matrix Computations". Golub is a member of the National Academy of Sciences and a past president of SIAM (Society for Industrial and Applied Mathematics).
Consider
\[ A = \mathrm{diag}(1, s, \ldots, s^{n-1}) \begin{pmatrix} 1 & -c & -c & \cdots & -c \\ 0 & 1 & -c & \cdots & -c \\ \vdots & & \ddots & \ddots & \vdots \\ \vdots & & & \ddots & -c \\ 0 & \cdots & \cdots & 0 & 1 \end{pmatrix} = R \]
with $c^2 + s^2 = 1$, $c, s > 0$. For n = 100, c = 0.2, it can be shown that A is nearly singular (the smallest singular value is $O(10^{-8})$). On the other hand, $r_{nn} = s^{n-1} = 0.133$, which is not small, so the triangular factor gives no indication that A is nearly singular. The question whether at any stage $R_{22}$ becomes really small for any matrix has been investigated by Chan (1987) and more recently by Hong and Pan (1992).
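The two numbers quoted for Kahan's example can be checked with a short plain-Python sketch. The helper names `kahan_matrix` and `back_substitute` are illustrative, not from the text. Instead of computing a singular value decomposition, the sketch uses the fact that for any nonzero vector x, the smallest singular value of R is at most $\|x\| / \|R^{-1}x\|$; taking $x = e_n$ gives an upper bound from one triangular solve.

```python
import math

def kahan_matrix(n, c):
    """Upper triangular R = diag(1, s, ..., s^(n-1)) * T, where T has
    ones on the diagonal and -c above it, and s = sqrt(1 - c^2)."""
    s = math.sqrt(1.0 - c * c)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        scale = s ** i
        R[i][i] = scale
        for j in range(i + 1, n):
            R[i][j] = -c * scale
    return R

def back_substitute(R, b):
    """Solve R x = b for upper triangular, nonsingular R."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = b[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / R[i][i]
    return x

n, c = 100, 0.2
R = kahan_matrix(n, c)
r_nn = R[n - 1][n - 1]

# Upper bound on the smallest singular value: 1 / ||R^{-1} e_n||_2.
e_n = [0.0] * (n - 1) + [1.0]
z = back_substitute(R, e_n)
sigma_min_bound = 1.0 / math.sqrt(sum(v * v for v in z))

print("r_nn =", r_nn)
print("smallest singular value is at most", sigma_min_bound)
```

The last diagonal entry stays above 0.1 while the bound on the smallest singular value is many orders of magnitude smaller, which is the point of the counterexample.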
5.8 Modifying a QR Factorization
Suppose the QR factorization of an $m \times k$ matrix $A = (a_1, \ldots, a_k)$ ($m \ge k$) has been obtained. A vector $a_{k+1}$ is now appended to obtain a new matrix:
\[ A' = (a_1, \ldots, a_k, a_{k+1}). \]
It is natural to wonder how the QR factorization of A' can be obtained from the given QR factorization of A, without finding it from scratch. The problem is called the updating QR factorization problem. The downdating QR factorization is similarly defined. The updating and downdating QR factorizations arise in a variety of practical applications, such as signal and image processing. We present below a simple algorithm using Householder matrices to solve the updating problem.
Algorithm 5.8.1 Updating QR Factorization Using Householder Matrices
The following algorithm computes the QR factorization of $A' = (a_1, \ldots, a_k, a_{k+1})$ given the Householder QR factorization of $A = (a_1, \ldots, a_k)$.
Step 1. Compute $b_{k+1} = H_k \cdots H_1 a_{k+1}$, where $H_1$ through $H_k$ are Householder matrices such that
\[ Q^T A = H_k \cdots H_1 A = \begin{pmatrix} R \\ 0 \end{pmatrix}. \]
Step 2. Compute a Householder matrix $H_{k+1}$ so that $r_{k+1} = H_{k+1} b_{k+1}$ is zero in entries $k+2, \ldots, m$.
Step 3. Form $R' = \left[ \begin{pmatrix} R \\ 0 \end{pmatrix}, \; r_{k+1} \right]$.
Step 4. Form $(Q')^T = H_{k+1} \cdots H_1$.
Theorem 5.8.1 R' and Q' defined above are such that $(Q')^T A' = R'$.
Example 5.8.1
\[ A = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \qquad H_1 = Q^T = \begin{pmatrix} -0.2673 & -0.5345 & -0.8018 \\ -0.5345 & 0.7745 & -0.3382 \\ -0.8018 & -0.3382 & 0.4927 \end{pmatrix}, \qquad \begin{pmatrix} R \\ 0 \end{pmatrix} = \begin{pmatrix} -3.7417 \\ 0 \\ 0 \end{pmatrix}, \]
\[ A' = \begin{pmatrix} 1 & 1 \\ 2 & 4 \\ 3 & 5 \end{pmatrix}. \]
Step 1.
\[ b_2 = H_1 a_2 = \begin{pmatrix} -6.4143 \\ 0.8727 \\ 0.3091 \end{pmatrix}. \]
Step 2.
\[ H_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -0.9426 & -0.3339 \\ 0 & -0.3339 & 0.9426 \end{pmatrix}, \qquad r_2 = H_2 b_2 = \begin{pmatrix} -6.4143 \\ -0.9258 \\ 0 \end{pmatrix}. \]
Step 3.
\[ R' = \left[ \begin{pmatrix} R \\ 0 \end{pmatrix}, \; r_2 \right] = \begin{pmatrix} -3.7417 & -6.4143 \\ 0 & -0.9258 \\ 0 & 0 \end{pmatrix}. \]
Step 4.
\[ (Q')^T = H_2 H_1 = \begin{pmatrix} -0.2673 & -0.5345 & -0.8018 \\ 0.7715 & -0.6172 & 0.1543 \\ -0.5773 & -0.5774 & 0.5774 \end{pmatrix}. \]
Verification:
\[ Q' R' = \begin{pmatrix} 1 & 1 \\ 2 & 4 \\ 3 & 5 \end{pmatrix} = A'. \]
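The updating steps of Algorithm 5.8.1 can be traced numerically with the plain-Python sketch below, run on the data of Example 5.8.1. The helper names (`householder_vector`, `apply_householder`) and the choice of sign in the Householder vector are assumptions made for illustration; the Householder matrices are never formed explicitly, only applied through their vectors u.

```python
import math

def householder_vector(x, start):
    """Return u such that H = I - 2 u u^T / (u^T u) maps x to a vector
    with zeros below position `start` and leaves entries above `start`
    untouched. Returns None if there is nothing to eliminate."""
    norm = math.sqrt(sum(v * v for v in x[start:]))
    if norm == 0.0:
        return None
    u = [0.0] * len(x)
    u[start:] = x[start:]
    u[start] += math.copysign(norm, x[start])
    return u

def apply_householder(u, x):
    """Compute H x without forming H."""
    if u is None:
        return x[:]
    factor = 2.0 * sum(a * b for a, b in zip(u, x)) / sum(a * a for a in u)
    return [xi - factor * ui for xi, ui in zip(x, u)]

# Data of Example 5.8.1: A = (a1), A' = (a1, a2).
a1 = [1.0, 2.0, 3.0]
a2 = [1.0, 4.0, 5.0]

# Existing factorization of A: one Householder vector u1, with H1 a1 = (R; 0).
u1 = householder_vector(a1, 0)
r1 = apply_householder(u1, a1)

# Step 1: b2 = H1 a2.
b2 = apply_householder(u1, a2)
# Step 2: H2 zeroes entries k+2..m of b2 (here k = 1, so only the last entry).
u2 = householder_vector(b2, 1)
r2 = apply_householder(u2, b2)
# Step 3: R' = [(R; 0), r2], stored by columns.
R_prime_columns = [r1, r2]

print("b2 =", [round(v, 4) for v in b2])
print("r2 =", [round(v, 4) for v in r2])
```

Only the new column is touched, which is why updating is cheaper than recomputing the factorization of A' from scratch.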
5.9 Summary and Table of Comparisons
For easy reference we now review the most important aspects of this chapter.
1. Three Important Matrices: Elementary, Householder and Givens.
Elementary Lower Triangular matrix: An $n \times n$ matrix E of the form $E = I + m e_k^T$, where $m = (0, 0, \ldots, 0, m_{k+1,k}, \ldots, m_{n,k})^T$, is called an elementary lower triangular matrix. If E is as given above, then $E^{-1} = I - m e_k^T$.
Householder matrix: An $n \times n$ matrix $H = I - \frac{2uu^T}{u^T u}$, where u is an n-vector, is called a Householder matrix. A Householder matrix is symmetric and orthogonal.
Givens matrix: A Givens matrix J(i, j, c, s) is an identity matrix except for the (i, i), (i, j), (j, i) and (j, j) entries, which are, respectively, c, s, -s and c, where $c^2 + s^2 = 1$. A Givens matrix is an orthogonal matrix.
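The three properties just listed are easy to confirm numerically on small instances. The sketch below uses plain Python lists; the particular vectors m and u, the angle given by c = 0.6 and s = 0.8, and the helper names are arbitrary choices made for illustration.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

n = 3
I = identity(n)

# Elementary lower triangular matrix E = I + m e_1^T and its claimed inverse.
m_vec = [0.0, 2.0, 3.0]
E = [row[:] for row in I]
E_inv = [row[:] for row in I]
for i in range(n):
    E[i][0] += m_vec[i]
    E_inv[i][0] -= m_vec[i]
err_E = max_abs_diff(matmul(E, E_inv), I)

# Householder matrix H = I - 2 u u^T / (u^T u): symmetric and orthogonal.
u = [1.0, 2.0, 2.0]
utu = sum(v * v for v in u)
H = [[I[i][j] - 2.0 * u[i] * u[j] / utu for j in range(n)] for i in range(n)]
err_H_sym = max_abs_diff(H, transpose(H))
err_H_orth = max_abs_diff(matmul(H, transpose(H)), I)

# Givens matrix J(i, j, c, s) with i = 1, j = 3 (indices 0 and 2 here).
c, s = 0.6, 0.8
J = [row[:] for row in I]
J[0][0], J[0][2], J[2][0], J[2][2] = c, s, -s, c
err_J_orth = max_abs_diff(matmul(J, transpose(J)), I)

print(err_E, err_H_sym, err_H_orth, err_J_orth)
```

All four printed residuals should be zero or at the level of rounding error.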
2. Two Important Matrix Factorizations: LU and QR.
LU factorization: A factorization of A in the form A = LU, where L is unit lower triangular and U is upper triangular, is called an LU factorization of A. An LU factorization of a matrix A does not always exist. If the leading principal minors of A are all different from zero, then the LU factorization of A exists and is unique (Theorem 5.2.1).
The LU factorization of a matrix A, when it exists, is achieved using elementary lower triangular matrices. The process is called Gaussian elimination without row interchanges. The process is efficient, requiring only $\frac{n^3}{3}$ flops, but is unstable for arbitrary matrices. Its use is not recommended in practice unless A is symmetric positive definite or column diagonally dominant. For decomposition of A into LU in a stable way, row interchanges (Gaussian elimination with partial pivoting) or both row and column interchanges (Gaussian elimination with complete pivoting) to identify an appropriate pivot will be needed. Gaussian elimination with partial and complete pivoting yield factorizations MA = U and MAQ = U, respectively.
QR Factorization. Every matrix A can always be written in the form A = QR, where Q is orthogonal and R is upper triangular. This is called the QR factorization of A. The QR factorization of a matrix A is unique if A has linearly independent columns (Theorem 5.4.2). The QR factorization can be achieved using either Householder or Givens matrices. Both methods have guaranteed numerical stability. The Householder method is more efficient than the Givens method ($\frac{2n^3}{3}$ flops versus $\frac{4n^3}{3}$ flops, approximately), but the Givens matrices are emerging as useful tools in parallel matrix computations and for computations with structured matrices. The Gram-Schmidt and modified Gram-Schmidt processes for QR factorization are described in Chapter 7.
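The instability of Gaussian elimination without row interchanges, noted in the LU summary above, shows up already on a 2 x 2 matrix with a tiny pivot. The sketch below is illustrative only: the matrix, the function names `lu_no_pivoting` and `matmul`, and the use of IEEE double precision (Python floats) are assumptions, not part of the text. Partial pivoting is imitated here simply by swapping the two rows before factoring.

```python
def lu_no_pivoting(A):
    """Gaussian elimination without row interchanges.
    Returns (L, U) with L unit lower triangular and U upper triangular.
    Assumes every pivot encountered is nonzero."""
    n = len(A)
    U = [row[:] for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]       # multiplier
            U[i][k] = 0.0
            for j in range(k + 1, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1e-20, 1.0], [1.0, 2.0]]

# Without pivoting: the huge multiplier wipes out the entry 2.
L, U = lu_no_pivoting(A)
LU = matmul(L, U)
err_no_pivot = abs(A[1][1] - LU[1][1])

# With the rows interchanged first (what partial pivoting would do here).
PA = [A[1][:], A[0][:]]
L2, U2 = lu_no_pivoting(PA)
L2U2 = matmul(L2, U2)
err_pivot = max(abs(PA[i][j] - L2U2[i][j]) for i in range(2) for j in range(2))

print("error in (2,2) entry without pivoting:", err_no_pivot)
print("largest entrywise error with row interchange:", err_pivot)
```

The computed factors without pivoting reproduce a matrix whose (2,2) entry is wrong by an amount as large as the entry itself, while the row-interchanged version reproduces PA to working accuracy.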
3. Hessenberg Reduction.
The Hessenberg form of a matrix is a very useful condensed form. We will see its use throughout the whole book. An arbitrary matrix A can always be transformed to an upper Hessenberg matrix by orthogonal similarity: Given an $n \times n$ matrix A, there always exists an orthogonal matrix P such that $PAP^T = H_u$, an upper Hessenberg matrix (Theorem 5.4.2). This reduction can be achieved using elementary, Householder or Givens matrices. We have described here methods based on Householder and Givens matrices (Algorithms 5.4.3 and 5.5.3). Both methods have guaranteed stability, but again, the Householder method is more efficient than the Givens method. For the aspect of uniqueness in Hessenberg reduction, see the statement of the Implicit Q Theorem (Theorem 5.5.3). This theorem basically says that if a matrix A is transformed by orthogonal similarity to two different unreduced upper Hessenberg matrices $H_1$ and $H_2$ using two transforming matrices P and Q, then $H_1$ and $H_2$ are essentially the same, provided that P and Q have the same first columns.
4. Orthogonal Bases and Orthogonal Projections.
If $Q^T A = \begin{pmatrix} R \\ 0 \end{pmatrix}$ is the QR factorization of an $m \times n$ matrix ($m \ge n$), then the columns of $Q_1$ form an orthonormal basis for R(A) and the columns of $Q_2$ form an orthonormal basis for the orthogonal complement of R(A), where $Q = (Q_1, Q_2)$ and $Q_1$ has n columns. If we let $B = (b_1, \ldots, b_n)$ be an $m \times n$ matrix, then the matrix $B_R = P_A (b_1, \ldots, b_n) = P_A B$, where $P_A = Q_1 Q_1^T$, is the orthogonal projection of B onto R(A). Similarly, the matrix $B_N = P_N B$ is the projection of B onto the orthogonal complement of R(A), where $P_N = P_A^{\perp} = Q_2 Q_2^T$.
5. QR Factorization with Column Pivoting, Rank-Revealing QR, and Modifying a QR Factorization.
If A is rank-deficient, then the ordinary QR factorization cannot produce an orthonormal basis of R(A). In such a case, the QR factorization process needs to be modified. A modification, called the QR factorization with column pivoting, has been discussed in this chapter. Such a factorization always exists (Theorem 5.7.1). A process to achieve such a factorization using Householder matrices, due to Golub, has been described briefly in Section 5.7. The QR factorization with column pivoting, in exact arithmetic, reveals the rank of A. That is why it is called the rank-revealing QR factorization. However, in the presence of rounding errors, such a rank-determination procedure is complicated and not reliable. Finally, we have presented a simple algorithm to modify the QR factorization of a matrix (updating QR).
6. Table of Comparisons.
We now summarize in the following table efficiency and stability properties of some of these major computations. We assume that A is $m \times n$ ($m \ge n$).

TABLE 5.1 TABLE OF COMPARISONS

PROBLEM                                   | METHOD                                       | FLOP-COUNT (APPROXIMATE)                                  | STABILITY
LU Factorization                          | Gaussian elimination without row interchange | $\frac{mn^2}{2} - \frac{n^3}{6}$                          | Unstable
Factorization: MA = U                     | Gaussian elimination with partial pivoting   | $\frac{mn^2}{2} - \frac{n^3}{6}$ (+ $O(n^2)$ comparisons) | Stable in practice
Factorization: MAQ = U                    | Gaussian elimination with complete pivoting  | $\frac{mn^2}{2} - \frac{n^3}{6}$ (+ $O(n^3)$ comparisons) | Stable
QR Factorization                          | Householder                                  | $n^2(m - \frac{n}{3})$                                    | Stable
QR Factorization                          | Givens                                       | $2n^2(m - \frac{n}{3})$                                   | Stable
Hessenberg Reduction of an n x n matrix   | Householder                                  | $\frac{5}{3}n^3$                                          | Stable
Hessenberg Reduction of an n x n matrix   | Givens                                       | $\frac{10}{3}n^3$                                         | Stable
QR Factorization with Column Pivoting     | Householder                                  | $2mnr - r^2(m+n) + \frac{2r^3}{3}$, r = rank(A)           | Stable
Concluding Remarks: Gaussian elimination without pivoting is unstable; Gaussian elimination with partial pivoting is stable in practice; Gaussian elimination with complete pivoting is stable. From the above table, we see that the process of QR factorization and that of reduction to a Hessenberg matrix using Householder transformations are most efficient. The methods are numerically stable. However, as remarked earlier, Givens transformations are useful in matrix computations with structured matrices, and they are emerging as important tools for parallel matrix algorithms. Also, it is worth noting that Gaussian elimination can be used to transform an arbitrary matrix to an upper Hessenberg matrix by similarity. For details, see Wilkinson AEP, pp. 353-355.
5.10 Suggestions for Further Reading
The topics covered in this chapter are standard and can be found in any numerical linear algebra text. The books by Golub and Van Loan (MC) and that by G. W. Stewart (IMC) are rich sources of further knowledge in this area. The book MC in particular contains a thorough discussion on QR factorization with column pivoting using Householder transformations (Golub and Van Loan MC, 1984, pp. 162-167). The book SLP by Lawson and Hanson contains in-depth discussion of triangularization using Householder and Givens transformations, and QR factorization with column pivoting (Chapters 10 and 15). The details of error analysis of the Householder and the Givens methods for QR factorization and reduction to Hessenberg forms are contained in AEP by Wilkinson. For error analyses of QR factorization using Givens transformations and variants of Givens transformations, see Gentleman (1975). A nice discussion on orthogonal projection is given in the book Numerical Linear Algebra and Optimization by Philip E. Gill, Walter Murray, and Margaret H. Wright, Addison Wesley, 1991.
Exercises on Chapter 5 (Use MATLAB, whenever appropriate and necessary)
PROBLEMS ON SECTIONS 5.2 and 5.3
1. (a) Show that an elementary lower triangular matrix has the form
\[ E = I + m e_k^T, \]
where $m = (0, 0, \ldots, 0, m_{k+1,k}, \ldots, m_{n,k})^T$.
(b) Show that the inverse of E in (a) is given by
\[ E^{-1} = I - m e_k^T. \]
2. (a) Given
\[ a = \begin{pmatrix} 0.00001 \\ 1 \end{pmatrix}, \]
using 3-digit arithmetic, find an elementary matrix E such that Ea is a multiple of $e_1$.
(b) Using your computations in (a), find the LU factorization of
\[ A = \begin{pmatrix} 0.00001 & 1 \\ 1 & 2 \end{pmatrix}. \]
(c) Let $\hat{L}$ and $\hat{U}$ be the computed L and U in part (b). Find
\[ \frac{\|A - \hat{L}\hat{U}\|_F}{\|A\|_F}, \]
where $\|\cdot\|_F$ is the Frobenius norm.
3. Show that the pivots $a_{11}, a_{22}^{(1)}, \ldots, a_{nn}^{(n-1)}$ are nonzero iff the leading principal minors of A are nonzero. Hint: Show that $\det A_r = a_{11} a_{22}^{(1)} \cdots a_{rr}^{(r-1)}$.
4. Let A be a symmetric positive definite matrix. At the end of the first step of the LU factorization of A, we have
\[ A^{(1)} = \begin{pmatrix} a_{11} & \begin{matrix} a_{12} & \cdots & a_{1n} \end{matrix} \\ \begin{matrix} 0 \\ \vdots \\ 0 \end{matrix} & A' \end{pmatrix}. \]
Prove that A' is also symmetric and positive definite. Hence show that LU factorization of a symmetric positive definite matrix using Gaussian elimination without pivoting always exists and is unique.
5. (a) Repeat exercise #4 when A is a diagonally dominant matrix; that is, show that LU factorization of a diagonally dominant matrix always exists and is unique.
(b) Using (a), show that a diagonally dominant matrix is nonsingular.
6. Assuming that LU factorization of A exists, prove that
(a) A can be written in the form
\[ A = L D U_1, \]
where D is diagonal and L and $U_1$ are unit lower and upper triangular matrices, respectively.
(b) If A is symmetric, then $A = LDL^T$.
(c) If A is symmetric and positive definite, then
\[ A = HH^T, \]
where H is a lower triangular matrix with positive diagonal entries. (This is known as the Cholesky decomposition.)
7. Assuming that LU factorization of A exists, develop an algorithm to compute U by rows and L by columns directly from the equation:
\[ A = LU. \]
This is known as Doolittle reduction.
8. Develop an algorithm to compute the factorization
\[ A = LU, \]
where U is unit upper triangular and L is lower triangular. This is known as Crout reduction. Hint: Derive the algorithm from the equation A = LU.
9. Compare the Doolittle and Crout reductions with Gaussian elimination with respect to flop-count, storage requirements and possibility of accumulating inner products in double precision.
10. A matrix G of the form
\[ G = I - g e_k^T \]
is called a Gauss-Jordan matrix. Show that, given a vector x with the property that $e_k^T x \ne 0$, there exists a Gauss-Jordan matrix G such that Gx is a multiple of $e_k$.
Develop an algorithm to construct Gauss-Jordan matrices $G_1, G_2, \ldots, G_n$ successively such that $(G_n G_{n-1} \cdots G_2 G_1) A$ is a diagonal matrix. This is known as Gauss-Jordan reduction. Derive conditions under which Gauss-Jordan reduction can be carried to completion. Give the flop-count for the algorithm and compare it with those of Gaussian elimination, Crout reduction and Doolittle reduction.
11. Given
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 5 & 4 \\ 3 & 4 & 5 \end{pmatrix}, \]
find
(a) LU factorization of A using Gaussian elimination and Doolittle reduction.
(b) LU factorization of A using Crout reduction (note that U is unit upper triangular and L is lower triangular).
12. Apply the Gauss-Jordan reduction to A of problem #11.
13. (a) Let A be $m \times n$ and let $r = \min\{m-1, n\}$. Develop an algorithm to construct elementary matrices $E_1, \ldots, E_r$ such that
\[ E_r E_{r-1} \cdots E_2 E_1 A \]
is an upper trapezoidal matrix U. The algorithm should overwrite A with U.
(b) Show that the algorithm requires about $\frac{r^3}{3}$ flops.
(c) Apply the algorithm to
\[ A = \begin{pmatrix} 1 & 2 \\ 4 & 5 \\ 6 & 7 \end{pmatrix}. \]
14. Given a tridiagonal matrix A with nonzero off-diagonal entries, write down a set of simple conditions on the entries of A that guarantees that Gaussian elimination can be carried to completion.
15. Assuming that LU decomposition exists, from
\[ A = LU \]
show that, when A is tridiagonal, L and U are both bidiagonal. Develop a scheme for computing L and U in this case and apply your scheme to find the LU factorization of
\[ A = \begin{pmatrix} 4 & 1 & 0 & 0 \\ 1 & 4 & 1 & 0 \\ 0 & 1 & 4 & 1 \\ 0 & 0 & 1 & 4 \end{pmatrix}. \]
16. Prove that the matrix L in each of the factorizations PA = LU and PAQ = LU, obtained by using Gaussian elimination with partial and complete pivoting, respectively, is unit lower triangular.
17. Given
\[ A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 2 & 3 & 4 & 5 \end{pmatrix}, \]
find a permutation matrix P, a unit lower triangular matrix L, and an upper triangular matrix U such that PA = LU.
18. For each of the following matrices find
(a) permutation matrices $P_1$ and $P_2$ and elementary matrices $M_1$ and $M_2$ such that $MA = M_2 P_2 M_1 P_1 A$ is an upper triangular matrix.
(b) permutation matrices $P_1, P_2, Q_1, Q_2$ and elementary matrices $M_1$ and $M_2$ such that $MAQ = M_2 (P_2 (M_1 P_1 A Q_1) Q_2)$ is an upper triangular matrix.
(c) Express each factorization in the form PAQ = LU (note that for Gaussian elimination without and with partial pivoting, Q = I).
(d) Compute the growth factor in each case.
1 01 C B C i. A = B A @ 0 100 99 98 1 B C ii. A = B 98 55 11 C @ A 0 1 0 1 0 111 B C iii. A = B ;1 1 1 C @ A ;1 ;1 1 0 0:00003 1:566 1:234 1 B C iv. A = B 1:5660 2:000 1:018 C @ A 0 1:23401 10:018 ;3:000 1 1 ; B C v. A = B ;1 2 0 C : @ A 0 ;1 2
1 2 1 3 1 4 1 5 1 2 1 3 1 3 1 4
PROBLEMS ON SECTIONS 5.4-5.6
19. Let x be an n-vector. Give an algorithm to compute a Householder matrix $H = I - \frac{2uu^T}{u^T u}$ such that Hx has zeros in the positions (r + 1) through n, r < n. How many flops will be required to implement this algorithm? Given $x = (1, 2, 3)^T$, apply your algorithm to construct H such that Hx has a zero in the 3rd position.
20. Let H be an unreduced upper Hessenberg matrix of order n. Develop an algorithm to triangularize H using (a) Gaussian elimination, (b) Householder transformations, (c) Givens rotations. Compute the flop-count in each case and compare.
21. Let
\[ H = \begin{pmatrix} 10 & 1 & 1 & 1 \\ 2 & 10 & 1 & 1 \\ 0 & 1 & 10 & 1 \\ 0 & 0 & 1 & 10 \end{pmatrix}. \]
Triangularize H using
(a) Gaussian elimination, (b) the Householder method, (c) the Givens method.
22. Let $\hat{H}_k = I - \frac{2uu^T}{u^T u}$, where u is an (n - k + 1)-vector. Define
\[ H_k = \begin{pmatrix} I_{k-1} & 0 \\ 0 & \hat{H}_k \end{pmatrix}. \]
How many flops will be required to compute $H_k A$, where A is arbitrary and $n \times n$? Your count should take into account the special structure of the matrix $\hat{H}_k$. Using this result, show that the Householder method requires about $\frac{2}{3}n^3$ flops to obtain R and another $\frac{2}{3}n^3$ flops to obtain Q in the QR factorization of A.
23. Show that it requires $n^2(m - \frac{n}{3})$ flops to compute R in the QR factorization of an $m \times n$ matrix A ($m \ge n$) using Householder's method. Given
\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}, \]
(a) find Householder matrices $H_1$ and $H_2$ such that
\[ H_2 H_1 A = \begin{pmatrix} R \\ 0 \end{pmatrix}, \]
where R is $2 \times 2$ upper triangular.
(b) find orthonormal bases for R(A) and for the orthogonal complement of R(A).
(c) find orthogonal projections of A onto R(A) and onto its orthogonal complement; that is, find $P_A$ and $P_A^{\perp}$.
24. Given
\[ b = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \]
and A as in Problem #23, find $b_R$ and $b_N$.
25. Given
\[ B = \begin{pmatrix} 1 & 3 \\ 2 & 4 \\ 3 & 5 \end{pmatrix} \]
and A as in Problem #23, find $B_R$ and $B_N$.
26. Let H be an $n \times n$ upper Hessenberg matrix and let
\[ H = QR, \]
where Q is orthogonal and R is upper triangular, obtained by Givens rotations. Prove that Q is also upper Hessenberg.
27. Develop an algorithm to compute AH, where A is $m \times n$ arbitrary and H is a Householder matrix. How many flops will the algorithm require? Your algorithm should exploit the structure of H.
28. Develop algorithms to compute AJ and JA, where A is $m \times n$ and J is a Givens rotation. (Your algorithms should exploit the structure of the matrix J.) How many flops are required in each case?
29. Show that the flop-count to compute R in the QR factorization of an $m \times n$ matrix A ($m \ge n$) using Givens rotations is about $2n^2(m - \frac{n}{3})$.
30. Give an algorithm to compute
\[ Q = H_1 H_2 \cdots H_n, \]
where $H_1$ through $H_n$ are Householder matrices in the QR factorization of an $m \times n$ matrix A, $m \ge n$. Show that the algorithm can be implemented with $2(m^2 n - mn^2 + n^3/3)$ flops.
31. Let A be $m \times n$. Show that the orthogonal matrix
\[ Q = Q_1^T Q_2^T \cdots Q_{s-1}^T, \]
where each $Q_i$ is the product of (m - i) Givens rotations, can be computed with $2n^2(m - \frac{n}{3})$ flops.
32. Let
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}. \]
Find the QR factorization of A using (a) the Householder method, (b) the Givens method. Compare the results.
33. Apply both the Householder and the Givens methods of reduction to the matrix
\[ A = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix} \]
to reduce it to a Hessenberg matrix by similarity. Compare the results.
34. (a) Show that it requires $\frac{5}{3}n^3$ flops to compute the upper Hessenberg matrix $H_u$ using the Householder method of reduction. (Hint: $\sum_{k=1}^{n-2} 2(n-k)^2 + \sum_{k=1}^{n-2} 2n(n-k) \approx \frac{2n^3}{3} + n^3 = \frac{5n^3}{3}$.)
(b) Show that if the transforming matrix P is required explicitly, another $\frac{2}{3}n^3$ flops will be needed.
(c) Work out the corresponding flop-count for reduction to Hessenberg form using Givens rotations.
(d) If A is symmetric, then show that the corresponding count in (a) is $\frac{2n^3}{3}$.
35. Given an unreduced upper Hessenberg matrix H, show that the matrix X defined by $X = (e_1, He_1, \ldots, H^{n-1}e_1)$ is nonsingular and is such that $X^{-1}HX$ is a companion matrix in upper Hessenberg form.
(a) What are the possible numerical difficulties with the above computations?
(b) Transform
0 1 1 2 3 4 B 2 10; C B 4 4 4C B C H=B ; 1 2C B 0 C 1 10 @ A
5 3
to a companion form. 36.
0
0
1 1
(a) Given the pair (A, b), where A is $n \times n$ and b is a column vector, develop an algorithm to compute an orthogonal matrix P such that $PAP^T$ is an upper Hessenberg matrix $H_u$ and Pb is a multiple of the vector $e_1$.
(b) Show that $H_u$ is unreduced and b is a nonzero multiple of $e_1$ iff $\mathrm{rank}(b, Ab, \ldots, A^{n-1}b) = n$.
(c) Apply your algorithm in (a) to
\[ A = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 2 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}. \]
PROBLEMS ON SECTION 5.7 AND SECTION 5.8
37. Given
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}, \]
find a permutation matrix P, an orthogonal matrix Q, and an upper triangular matrix R using the Householder method such that
\[ AP = QR. \]
The permutation matrix P is to be chosen according to the criteria given in the book.
38. Give a proof of the complete orthogonalization theorem (Theorem 5.7.2) starting from the QR factorization theorem (Theorem 5.4.1).
39. Work out an algorithm to modify the QR factorization of a matrix A from which a column has been removed.
MATLAB AND MATCOM PROGRAMS AND PROBLEMS ON CHAPTER 5
You will need the programs housmul, compiv, givqr and givhs from MATCOM.
1. (a) Write a MATLAB function called elm(v) that creates an elementary lower triangular matrix E so that Ev is a multiple of $e_1$, where v is an n-vector.
(b) Write a MATLAB function elmul(A, E) that computes the product EA, where E is an elementary lower triangular matrix and A is an arbitrary matrix. Your program should be able to take advantage of the structure of the matrix E.
(c) Using elm and elmul, write a MATLAB program elmlu that finds the LU factorization of a matrix, when it exists:
[L, U] = elmlu(A).
(This program should implement Algorithm 5.2.1 of the book.)
Using your program elmlu, find L and U for the following matrices:
1 1 A = 5 5 Hilbert matrix 219
1A 1 1A A=@ 1 0:00001 1 1 1 C C 1 C A = diag(1 2 3): A 10
1
0
1
Now compute (i) the product LU and (ii) $\|A - LU\|_F$ in each case and print your results.
2. Modify your program elmlu to incorporate partial pivoting:
[M, U] = elmlupp(A).
Test your program with each of the matrices of problem #1. Compare your results with those obtained by the MATLAB built-in function:
[L, U] = lu(A).
3. Write a MATLAB program, called parpiv, to compute M and U such that MA = U using partial pivoting:
[M, U] = parpiv(A).
(This program should implement Algorithm 5.2.3 of the book.)
Print M, U, $\|MA - U\|_F$ and $\|MA - U\|_2$ for each of the matrices A of problem #1.
4. Using the program compiv from MATCOM, print M, Q and U and
$\|MAQ - U\|_2$ and $\|MAQ - U\|_F$
for each of the matrices of problem #1.
5. (a) Write a MATLAB program called housmat that creates a Householder matrix H such that Ha is a multiple of $e_1$, where a is an n-vector.
(b) Using housmat and housmul (from the Appendix of the book), write a MATLAB program housqr that finds the QR factorization of A:
[Q, R] = housqr(A).
(This program should implement Algorithm 5.4.2 of the book.)
(c) Using your program housqr, find Q and R such that
A = QR
for each of the matrices in problem #1. Now compute
i. $\|I - Q^T Q\|_F$,
ii. $\|A - QR\|_F$,
and compare the results with those obtained using the MATLAB built-in function
[Q, R] = qr(A).
(d) Repeat (c) with the program givqr from MATCOM in place of housqr, which computes the QR factorization using Givens rotations.
6. Run the program givqr(A) from MATCOM with each of the matrices in problem #1. Then using [Q, R] = qr(A) from MATLAB on those matrices again, verify the uniqueness of QR factorization for each A.
7. Using givhs(A) from MATCOM and the MATLAB function hess(A) on each of the matrices from problem #1, verify the Implicit Q Theorem: Theorem 5.5.3 (Uniqueness of Hessenberg reduction).
8. Using the results of problem #5, find an orthonormal basis for R(A), an orthonormal basis for the orthogonal complement of R(A), the orthogonal projection onto R(A), and the projection onto the orthogonal complement of R(A) for each of the matrices of problem #1.
9. Incorporate "maximum norm column pivoting" in housqr to write a MATLAB program housqrp that computes the QR factorization with column pivoting of a matrix A. Test your program with each of the matrices of problem #1:
[Q, R, P] = housqrp(A).
Compare your results with those obtained by using the MATLAB function
[Q, R, P] = qr(A).
Note: Some of the programs you have been asked to write, such as parpiv, housmat, housqr, etc., are in MATCOM or in the Appendix. But it is a good idea to write your own programs.
6. NUMERICAL SOLUTIONS OF LINEAR SYSTEMS
6.1 Introduction
6.2 Basic Results on Existence and Uniqueness
6.3 Some Applications Giving Rise to Linear Systems Problems
  6.3.1 An Electric Circuit Problem
  6.3.2 Analysis of a Processing Plant Consisting of Interconnected Reactors
  6.3.3 Linear Systems Arising from Ordinary Differential Equations (Finite Difference Scheme)
  6.3.4 Linear Systems Arising from Partial Differential Equations: A Case Study on Temperature Distribution
  6.3.5 Special Linear Systems Arising in Applications
  6.3.6 Linear System Arising From Finite Element Methods
  6.3.7 Approximation of a Function by a Polynomial: Hilbert System
6.4 Direct Methods
  6.4.1 Solution of a Lower Triangular System
  6.4.2 Solution of the System Ax = b Using Gaussian Elimination without Pivoting
  6.4.3 Solution of Ax = b Using Pivoting Triangularization
  6.4.4 Solution of Ax = b without Explicit Factorization
  6.4.5 Solution of Ax = b Using QR Factorization
  6.4.6 Solving Linear Systems with Multiple Right-Hand Sides
  6.4.7 Special Systems
  6.4.8 Scaling
  6.4.9 LU Versus QR and Table of Comparisons
6.5 Inverses, Determinant and Leading Principal Minors
  6.5.1 Avoiding Explicit Computation of the Inverses
  6.5.2 The Sherman-Morrison and Woodbury Formulas
  6.5.3 Computing the Inverse of a Matrix
  6.5.4 Computing the Determinant of a Matrix
  6.5.5 Computing the Leading Principal Minors of a Matrix
6.6 Perturbation Analysis of the Linear System Problem
  6.6.1 Effect of Perturbation in the Right-Hand Side Vector b
  6.6.2 Effect of Perturbation in the Matrix A
  6.6.3 Effect of Perturbations in both the Matrix A and the Vector b
6.7 The Condition Number and Accuracy of Solution
  6.7.1 Some Well-known Ill-conditioned Matrices
  6.7.2 Effect of the Condition Number on Accuracy of the Computed Solution
  6.7.3 How Large Must the Condition Number be for Ill-Conditioning?
  6.7.4 The Condition Number and Nearness to Singularity
  6.7.5 Conditioning and Pivoting
  6.7.6 Conditioning and the Eigenvalue Problem
  6.7.7 Conditioning and Scaling
  6.7.8 Computing and Estimating the Condition Number
6.8 Componentwise Perturbations and the Errors
6.9 Iterative Refinement
6.10 Iterative Methods
  6.10.1 The Jacobi Method
  6.10.2 The Gauss-Seidel Method
  6.10.3 Convergence of Iterative Methods
  6.10.4 The Successive Overrelaxation (SOR) Method
  6.10.5 The Conjugate Gradient Method
  6.10.6 The Arnoldi Process and GMRES
6.11 Review and Summary
6.12 Some Suggestions for Further Reading
CHAPTER 6 NUMERICAL SOLUTIONS OF LINEAR SYSTEMS
Objectives
The major objectives of this chapter are to study numerical methods for solving linear systems and associated problems. Some of the highlights of this chapter are:
• Theoretical results on existence and uniqueness of the solution (Section 6.2).
• Some important engineering applications giving rise to linear systems problems (Section 6.3).
• Direct methods (Gaussian elimination with and without pivoting) for solving linear systems (Section 6.4).
• Special systems: Positive definite, Hessenberg, diagonally dominant, tridiagonal and block tridiagonal (Section 6.4.7).
• Methods for computing the determinant and the inverse of a matrix (Section 6.5).
• Sensitivity analysis of linear systems problems (Section 6.6).
• Iterative refinement procedure (Section 6.9).
• Iterative methods (Jacobi, Gauss-Seidel, Successive Overrelaxation, Conjugate Gradient) for linear systems (Section 6.10).
Required Background
The following major tools and concepts developed in earlier chapters will be needed for smooth learning of material of this chapter.
1. Special Matrices (Section 1.4), and concepts and results on matrix and vector norms (Section 1.7). Convergence of a matrix sequence and convergent matrices (Section 1.7.3).
2. LU factorization using Gaussian elimination without pivoting (Section 5.2.1, Algorithms 5.2.1 and 5.2.2).
3. MA = U factorization with partial pivoting (Section 5.2.2, Algorithm 5.2.3).
4. MAQ = U factorization with complete pivoting (Section 5.2.3, Algorithm 5.2.4).
5. The concept of the growth factor (Section 5.3).
6. QR factorization of a matrix (Section 5.4.1, Section 5.5.1).
7. Concepts of conditioning and stability (Section 3.3 and Section 3.4).
8. Basic knowledge of differential equations.
6.1 Introduction
In this chapter we will discuss methods for numerically solving the linear system
\[ Ax = b, \]
where $A$ is an $n \times n$ matrix and $x$ and $b$ are $n$-vectors. $A$ and $b$ are given and $x$ is unknown. The problem arises in a very wide variety of applications. As a matter of fact, it might be said that numerical solutions of almost all practical engineering and applied science problems routinely require solution of a linear system problem. (See Section 6.3.)

We shall discuss methods for nonsingular linear systems only in this chapter. The case where the matrix $A$ is not square or the system has more than one solution will be treated in Chapter 7.

A method called Cramer's Rule, taught in an elementary undergraduate linear algebra course, is of high significance from a theoretical point of view.

CRAMER'S RULE
Let $A$ be a nonsingular matrix of order $n$ and $b$ be an $n$-vector. The solution $x$ to the system $Ax = b$ is given by
\[ x_i = \frac{\det A_i}{\det A}, \qquad i = 1, \ldots, n, \]
where $A_i$ is the matrix obtained by replacing the $i$th column of $A$ by the vector $b$, and $x = (x_1, x_2, \ldots, x_n)^T$.

Remarks on Cramer's Rule: Cramer's Rule is, however, not at all practical from a computational viewpoint. For example, solving a linear system with 20 equations and 20 unknowns by Cramer's rule, using the usual definition of the determinant, would require more than a million years even on a fast computer (Forsythe, Malcolm and Moler (CMMC, p. 30)). For an $n \times n$ system, it will require about $O(n!)$ flops.

Two types of methods are normally used for numerical computations:
(1) Direct methods
(2) Iterative methods

The direct methods consist of a finite number of steps, and one needs to perform all the steps in a given method before the solution is obtained. On the other hand, iterative methods are based on computing a sequence of approximations to the solution $x$, and a user can stop whenever a certain desired accuracy is obtained or a certain number of iterations are completed. The iterative methods are used primarily for large and sparse systems.
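To make the earlier remark on Cramer's Rule concrete, the following is a minimal sketch, not a recommended solver. The function names `det_cofactor` and `cramer_solve` are illustrative, and the determinant is deliberately computed by cofactor expansion (the "usual definition"), which is what makes the cost grow like $n!$.

```python
from fractions import Fraction

def det_cofactor(M):
    """Determinant by cofactor expansion along the first row (about n! work)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det_cofactor(minor)
    return total

def cramer_solve(A, b):
    """Solve Ax = b by Cramer's Rule: x_i = det(A_i) / det(A)."""
    d = det_cofactor(A)
    if d == 0:
        raise ValueError("matrix is singular")
    x = []
    for i in range(len(A)):
        # A_i: replace the i-th column of A by b.
        A_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        x.append(Fraction(det_cofactor(A_i)) / Fraction(d))
    return x

# Small illustrative system: 2x + y = 3, x + 3y = 5.
solution = cramer_solve([[2, 1], [1, 3]], [3, 5])
```

The sketch is usable only for very small $n$; each additional unknown multiplies the work of every determinant by roughly $n$.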
The organization of this chapter is as follows: In Section 6.2 we state the basic theoretical results (without proofs) on the existence and uniqueness of solutions for linear systems. In Section 6.3 we discuss several engineering applications giving rise to linear systems problems. In Section 6.4 we describe direct methods for solving linear systems. In Section 6.5 we show how the LU and QR factorization methods can be used to compute the determinant, the inverse and the leading principal minors of a matrix. In Section 6.6, we study the sensitivity issues of the linear systems problems and their effects on the solutions. In Section 6.9 we briefly describe an iterative refinement procedure for improving the accuracy of a computed solution. In Section 6.10 we discuss iterative methods: the Jacobi, Gauss-Seidel, SOR, conjugate gradient, and GMRES methods.
6.2 Basic Results on Existence and Uniqueness
Consider the system of $m$ equations in $n$ unknowns:
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m.
\end{aligned}
\]
In matrix form, the system is written as
\[ Ax = b, \]
where
\[
A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \qquad
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad
b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}.
\]
Given an $m \times n$ matrix $A$ and an $m$-vector $b$, if there exists a vector $x$ satisfying $Ax = b$, then we say that the system is consistent. Otherwise, it is inconsistent. It is natural to ask whether a given system $Ax = b$ is consistent and, if it is consistent, how many solutions there are and when the solution is unique. To this end, we state the following theorem.
Theorem 6.2.1 (Existence and Uniqueness Theorem for A Nonhomogeneous System)
(i) The system $Ax = b$ is consistent if and only if $b \in R(A)$; in other words, $\mathrm{rank}(A) = \mathrm{rank}(A, b)$.
(ii) If the system is consistent and the columns of $A$ are linearly independent, then the solution is unique.
(iii) If the system is consistent and the columns of $A$ are linearly dependent, then the system has an infinite number of solutions.
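The three cases of Theorem 6.2.1 can be checked numerically by comparing ranks. The sketch below assumes NumPy is available; `classify_system` is an illustrative name, and the rank is decided by NumPy's default floating-point tolerance, so the result is only reliable for well-scaled data.

```python
import numpy as np

def classify_system(A, b):
    """Classify Ax = b using rank(A) and rank of the augmented matrix (A, b)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.hstack([A, b]))
    if rank_A != rank_Ab:
        return "inconsistent"               # b is not in R(A)
    if rank_A == A.shape[1]:
        return "unique solution"            # columns of A independent
    return "infinitely many solutions"      # columns of A dependent

case_unique = classify_system([[1, 1], [1, -1]], [2, 0])
case_many = classify_system([[1, 1], [2, 2]], [1, 2])
case_none = classify_system([[1, 1], [2, 2]], [1, 3])
```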
Homogeneous Systems
If the vector b = 0, then the system Ax = 0 is called a homogeneous system. A homogeneous system always has a solution, namely x = 0. This is the trivial solution.
Theorem 6.2.2 (Existence Theorem for a Homogeneous System)
(i) The system $Ax = 0$ has a nontrivial solution if and only if the columns of $A$ are linearly dependent. If $Ax = 0$ has a nontrivial solution, it has infinitely many solutions.
(ii) If $m = n$, then $Ax = 0$ has a nontrivial solution if and only if $A$ is singular.
Theorem 6.2.3 (Solution Invariance Theorem) A solution of a consistent system
\[ Ax = b \]
remains unchanged under any of the following operations:
(i) Any two equations are interchanged.
(ii) An equation is multiplied by a nonzero constant.
(iii) A nonzero multiple of one equation is added to another equation.

Two systems obtained from one another by applying any of the above operations are called equivalent systems. Theorem 6.2.3 then says that two equivalent systems have the same solution.
6.3 Some Applications Giving Rise to Linear Systems Problems
It is probably not an overstatement that linear systems problems arise in almost all practical applications. We will give examples here from electrical, mechanical, chemical and civil engineering. We start with a simple problem: an electric circuit.
6.3.1 An Electric Circuit Problem
Consider the following diagram of an electrical circuit:
[Figure 6.1: Electric circuit with nodes A1, A2, A3, A4, A5, A6; resistances R12 = 1, R23 = 2, R34 = 3, R45 = 4, R56 = 5, R25 = 10; voltages V1 = 100, V6 = 0; and currents I1, I2, I3, I4.]
We would like to determine the amount of current between the nodes A1, A2, A3, A4, A5, and A6. The famous Kirchhoff's Current Law tells us that the algebraic sum of all currents entering a node must be zero. Applying this law at node A2, we have
\[ I_1 - I_2 + I_4 = 0. \tag{6.3.1} \]
At node A5,
\[ I_2 - I_3 - I_4 = 0. \tag{6.3.2} \]
At node A3,
\[ I_2 - I_2 = 0. \tag{6.3.3} \]
At node A4,
\[ I_2 - I_2 = 0. \tag{6.3.4} \]
Now consider the voltage drop around each closed loop of the circuit: A1A2A3A4A5A6A1, A1A2A5A6A1, and A2A3A4A5A2. Kirchhoff's Voltage Law tells us that the net voltage drop around each closed loop is zero. Thus at the loop A1A2A3A4A5A6A1, substituting the values of resistances and voltages, we have
\[ I_1 + 9I_2 + 5I_3 = 100. \tag{6.3.5} \]
Similarly, at A1A2A5A6A1 and A2A3A4A5A2 we have, respectively,
\[ I_1 - 10I_4 + 5I_3 = 100, \tag{6.3.6} \]
\[ 9I_2 + 10I_4 = 0. \tag{6.3.7} \]
Note that (6.3.6) + (6.3.7) = (6.3.5). Thus we have four equations in four unknowns:
\[
\begin{aligned}
I_1 - I_2 + I_4 &= 0 & (6.3.8)\\
I_2 - I_3 - I_4 &= 0 & (6.3.9)\\
I_1 - 10I_4 + 5I_3 &= 100 & (6.3.10)\\
9I_2 + 10I_4 &= 0 & (6.3.11)
\end{aligned}
\]
The equations (6.3.8)-(6.3.11) can be written as
\[
\begin{pmatrix} 1 & -1 & 0 & 1 \\ 0 & 1 & -1 & -1 \\ 1 & 0 & 5 & -10 \\ 0 & 9 & 0 & 10 \end{pmatrix}
\begin{pmatrix} I_1 \\ I_2 \\ I_3 \\ I_4 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ 100 \\ 0 \end{pmatrix},
\]
the solution of which yields the current between the nodes.
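As a quick numerical check of the circuit system (6.3.8)-(6.3.11), the sketch below solves it with NumPy's general dense solver. The variable names are illustrative; the solver call stands in for the direct methods of Section 6.4.

```python
import numpy as np

# Coefficient matrix and right-hand side of (6.3.8)-(6.3.11);
# the unknowns are ordered (I1, I2, I3, I4).
A = np.array([[1.0, -1.0,  0.0,   1.0],
              [0.0,  1.0, -1.0,  -1.0],
              [1.0,  0.0,  5.0, -10.0],
              [0.0,  9.0,  0.0,  10.0]])
b = np.array([0.0, 0.0, 100.0, 0.0])

currents = np.linalg.solve(A, b)   # LU with partial pivoting under the hood
residual = A @ currents - b        # should be close to zero
```

Adding (6.3.8) and (6.3.9) gives I1 = I3, which the computed vector should reproduce.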
6.3.2 Analysis of a Processing Plant Consisting of Interconnected Reactors
Many mathematical models are based on conservation laws such as conservation of mass, conservation of momentum, and conservation of energy. In mathematical terms, these conservation laws lead to conservation or balance or continuity equations, which relate the behavior of a system or response of the quantity being modeled to the properties of the system and the external forcing functions or stimuli acting on the system. As an example, consider a chemical processing plant consisting of six interconnected chemical reactors (Figure 6.2), with different mass flow rates of a component of a mixture into and out of the reactors. We are interested in knowing the concentration of the mixture at different reactors. The example here is similar to that given in Chapra and Canale (1988), pp. 295-298.
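[Figure 6.2: Processing plant with six interconnected reactors having concentrations C1, ..., C6. Labels recovered from the figure: Q01 = 6, C01 = 12, Q03 = 8, C03 = 20, Q12 = 3, Q15 = 3, Q23 = 1, Q24 = 1, Q25 = 1, Q31 = 1, Q34 = 8, Q36 = 10, Q44 = 11, Q54 = 2, Q55 = 2, Q66 = 2.]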
Application of conservation of mass to all these reactors results in a linear system of equations as shown below, consisting of six equations in six unknowns. The solution of the system will tell us the concentration of the mixture at each of these reactors.

Steady-state, completely mixed reactor. Consider first a reactor with two flows coming in and one flow going out, as shown in Figure 6.3.

[Figure 6.3: A reactor with two inlet streams $(\dot m_1, Q_1, C_1)$ and $(\dot m_2, Q_2, C_2)$ and one outlet stream $(\dot m_3, Q_3, C_3)$.]

Application of the steady-state conservation of mass to the above reactor gives
\[ \dot m_1 + \dot m_2 = \dot m_3. \tag{6.3.12} \]
Noting that
\[ \dot m_i = Q_i C_i, \]
where
$\dot m_i$ = mass flow rate of the mixture at the inlet and outlet sections $i$, $i = 1, 2, 3$,
$Q_i$ = volumetric flow rate at the section $i$, $i = 1, 2, 3$,
$C_i$ = density or concentration at the section $i$, $i = 1, 2, 3$,
we get from (6.3.12)
\[ Q_1C_1 + Q_2C_2 = Q_3C_3. \tag{6.3.13} \]
For given inlet flow rates and concentrations, the outlet concentration $C_3$ can be found from equation (6.3.13). Under steady-state operation, this outlet concentration also represents the spatially uniform or homogeneous concentration inside the reactor. Such information is necessary for designing the reactor to yield mixtures of a specified concentration. For details, see Chapra and Canale (1988).

Referring now to Figure 6.2, where we consider the plant consisting of six reactors, we have the following equations. The derivation of each of these equations is similar to that of (6.3.13) and is based on the fact that the net mass flow rate into the reactor is equal to the net mass flow rate out of the reactor.
For reactor 1:
\[ 6C_1 - C_3 = 72. \tag{6.3.14} \]
(Note that for this reactor, the mass flow at the inlet is $72 + C_3$ and the mass flow at the outlet is $6C_1$.)

For reactor 2:
\[ 3C_1 - 3C_2 = 0. \tag{6.3.15} \]
For reactor 3:
\[ -C_2 + 11C_3 = 200. \tag{6.3.16} \]
For reactor 4:
\[ C_2 - 11C_4 + 2C_5 + 8C_6 = 0. \tag{6.3.17} \]
For reactor 5:
\[ 3C_1 + C_2 - 4C_5 = 0. \tag{6.3.18} \]
For reactor 6:
\[ 10C_3 - 10C_6 = 0. \tag{6.3.19} \]
Equations (6.3.14)-(6.3.19) can be rewritten in matrix form as
\[
\begin{pmatrix}
6 & 0 & -1 & 0 & 0 & 0 \\
3 & -3 & 0 & 0 & 0 & 0 \\
0 & -1 & 11 & 0 & 0 & 0 \\
0 & 1 & 0 & -11 & 2 & 8 \\
3 & 1 & 0 & 0 & -4 & 0 \\
0 & 0 & 10 & 0 & 0 & -10
\end{pmatrix}
\begin{pmatrix} C_1 \\ C_2 \\ C_3 \\ C_4 \\ C_5 \\ C_6 \end{pmatrix}
=
\begin{pmatrix} 72 \\ 0 \\ 200 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\tag{6.3.20}
\]
or
\[ AC = D. \]
The $i$th coordinate of the unknown vector $C$ represents the mixture concentration at reactor $i$ of the plant.
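The reactor balances can be checked the same way. In this sketch the rows of the matrix are read directly from equations (6.3.14)-(6.3.19), one row per reactor; the array names are illustrative and NumPy is assumed.

```python
import numpy as np

# One row per reactor, columns ordered (C1, ..., C6),
# coefficients taken from equations (6.3.14)-(6.3.19).
A = np.array([[6.0,  0.0, -1.0,   0.0,  0.0,   0.0],
              [3.0, -3.0,  0.0,   0.0,  0.0,   0.0],
              [0.0, -1.0, 11.0,   0.0,  0.0,   0.0],
              [0.0,  1.0,  0.0, -11.0,  2.0,   8.0],
              [3.0,  1.0,  0.0,   0.0, -4.0,   0.0],
              [0.0,  0.0, 10.0,   0.0,  0.0, -10.0]])
D = np.array([72.0, 0.0, 200.0, 0.0, 0.0, 0.0])

C = np.linalg.solve(A, D)   # concentrations at reactors 1..6
```

Equations (6.3.15) and (6.3.19) force C1 = C2 and C3 = C6, which gives a simple sanity check on the computed vector.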
6.3.3 Linear Systems Arising from Ordinary Differential Equations (Finite Difference Scheme): A Case Study on a Spring-Mass Problem
Consider a system of three masses suspended vertically by a series of springs, as shown below, where $k_1$, $k_2$, and $k_3$ are the spring constants, and $x_1$, $x_2$, and $x_3$ are the displacements of each spring from its equilibrium position.

[Figure: three masses $m_1$, $m_2$, $m_3$ suspended vertically in series by springs with constants $k_1$, $k_2$, $k_3$, with displacements $x_1$, $x_2$, $x_3$.]

The free-body diagram for these masses can be represented as follows:

[Figure: free-body diagrams. Mass $m_1$ is acted on by $k_1x_1$, $k_2(x_2 - x_1)$, and $m_1g$; mass $m_2$ by $k_2(x_2 - x_1)$, $k_3(x_3 - x_2)$, and $m_2g$; mass $m_3$ by $k_3(x_3 - x_2)$ and $m_3g$.]

Referring to the above diagram, the equations of motion, by Newton's second law, are:
\[
\begin{aligned}
m_1 \frac{d^2x_1}{dt^2} &= k_2(x_2 - x_1) + m_1g - k_1x_1 \\
m_2 \frac{d^2x_2}{dt^2} &= k_3(x_3 - x_2) + m_2g - k_2(x_2 - x_1) \\
m_3 \frac{d^2x_3}{dt^2} &= m_3g - k_3(x_3 - x_2)
\end{aligned}
\]
Suppose we are interested in knowing the displacement of these springs when the system eventually returns to the steady state, that is, when the system comes to rest. Then, by setting the second-order derivatives to zero, we obtain
\[
\begin{aligned}
k_1x_1 + k_2(x_1 - x_2) &= m_1g \\
k_2(x_2 - x_1) + k_3(x_2 - x_3) &= m_2g \\
k_3(x_3 - x_2) &= m_3g
\end{aligned}
\]
This system of equations in three unknowns, $x_1$, $x_2$, and $x_3$, can be rewritten in matrix form as
\[
\begin{pmatrix} k_1 + k_2 & -k_2 & 0 \\ -k_2 & k_2 + k_3 & -k_3 \\ 0 & -k_3 & k_3 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} m_1g \\ m_2g \\ m_3g \end{pmatrix}
\]
or
\[ Kx = w. \]
The matrix
\[
K = \begin{pmatrix} k_1 + k_2 & -k_2 & 0 \\ -k_2 & k_2 + k_3 & -k_3 \\ 0 & -k_3 & k_3 \end{pmatrix}
\]
is called the stiffness matrix.
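The steady-state system Kx = w can be assembled and solved for sample data. The spring constants, masses, and the value of g below are hypothetical numbers chosen only for illustration; the closed-form check follows from adding the three equilibrium equations.

```python
import numpy as np

# Hypothetical data (not from the text): spring constants, masses, gravity.
k1, k2, k3 = 10.0, 20.0, 30.0
m1, m2, m3 = 1.0, 2.0, 3.0
g = 9.81

# Stiffness matrix K and load vector w of the steady-state system K x = w.
K = np.array([[k1 + k2, -k2,      0.0],
              [-k2,      k2 + k3, -k3],
              [0.0,     -k3,       k3]])
w = g * np.array([m1, m2, m3])

x = np.linalg.solve(K, w)

# Adding the equations shows each spring carries the weight hanging below it.
x1_exact = (m1 + m2 + m3) * g / k1
x2_exact = x1_exact + (m2 + m3) * g / k2
x3_exact = x2_exact + m3 * g / k3
```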
6.3.4 Linear Systems Arising from Partial Differential Equations: A Case Study on Temperature Distribution
Many engineering problems are modeled by partial differential equations. Numerical approaches to these equations typically require discretization by means of difference equations, that is, partial derivatives in the equations are replaced by approximate differences. This process of discretization in turn gives rise to linear systems of many interesting types. We shall illustrate this with a problem in heat transfer theory.

A major objective in a heat transfer problem is to determine the temperature distribution $T(x, y, z, t)$ in a medium resulting from imposed boundary conditions on the surface of the medium. Once this temperature distribution is known, the heat transfer rate at any point in the medium or on its surface may be computed from Fourier's law, which is expressed as
\[ q_x = -K \frac{\partial T}{\partial x}, \qquad q_y = -K \frac{\partial T}{\partial y}, \qquad q_z = -K \frac{\partial T}{\partial z}, \]
where $q_x$ is the heat transfer rate in the $x$ direction, $\partial T/\partial x$ is the temperature gradient in the $x$ direction, and the positive constant $K$ is called the thermal conductivity of the material. Similarly for the $y$ and $z$ directions.

Consider a homogeneous medium in which temperature gradients exist and the temperature distribution $T(x, y, z, t)$ is expressed in Cartesian coordinates. The heat diffusion equation which governs this temperature distribution is obtained by applying conservation of energy over an infinitesimally small differential element, from which we obtain the relation
\[ \frac{\partial}{\partial x}\left(K \frac{\partial T}{\partial x}\right) + \frac{\partial}{\partial y}\left(K \frac{\partial T}{\partial y}\right) + \frac{\partial}{\partial z}\left(K \frac{\partial T}{\partial z}\right) + \dot q = \rho C_p \frac{\partial T}{\partial t}, \tag{6.3.21} \]
where $\rho$ is the density, $C_p$ is the specific heat, and $\dot q$ is the energy generated per unit volume.

This equation, usually known as the heat equation, provides the basic tool for solving heat conduction problems. It is often possible to work with a simplified form of equation (6.3.21). For example, if the thermal conductivity is a constant, the heat equation is
\[ \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2} + \frac{\dot q}{K} = \frac{1}{\alpha} \frac{\partial T}{\partial t}, \tag{6.3.22} \]
where $\alpha = K/(\rho C_p)$ is a thermophysical property known as the thermal diffusivity. Under steady-state conditions, there can be no changes of energy storage, i.e., the unsteady-state term $\partial T/\partial t$ can be dropped, and equation (6.3.22) reduces to the 3-D Poisson's equation
\[ \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2} + \frac{\dot q}{K} = 0. \tag{6.3.23} \]
If the heat transfer is two-dimensional (e.g., in the $x$ and $y$ directions) and there is no energy generation, then the heat equation reduces to the famous Laplace's equation
\[ \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} = 0. \tag{6.3.24} \]
If the heat transfer is unsteady and one-dimensional without energy generation, then the heat equation reduces to
\[ \frac{\partial^2 T}{\partial x^2} = \frac{1}{\alpha} \frac{\partial T}{\partial t}. \tag{6.3.25} \]
Analytical solutions to the heat equation can be obtained for simple geometry and boundary conditions. Very often there are practical situations where the geometry or boundary conditions are such that an analytical solution has not been obtained or, if it is obtained, it involves complex series solutions that require tedious numerical evaluation. In such cases, the best alternatives are finite difference or finite element methods, which are well suited for computers.
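Finite Difference Scheme
A well-known scheme for solving a partial differential equation is to use finite differences. The idea is to discretize the partial differential equation by replacing the partial derivatives with their approximations, i.e., finite differences. We will illustrate the scheme with Laplace's equation in the following. Let us divide a two-dimensional region into small regions with increments in the $x$ and $y$ directions given as $\Delta x$ and $\Delta y$, as shown in the figure below.

[Figure: a two-dimensional region divided into a grid of nodal points with spacings $\Delta x$ and $\Delta y$.]

Each nodal point is designated by a numbering scheme $i$ and $j$, where $i$ indicates the $x$ increment and $j$ indicates the $y$ increment:

[Figure: the nodal point $(i, j)$ and its four neighbors $(i+1, j)$, $(i-1, j)$, $(i, j+1)$, and $(i, j-1)$.]

The temperature distribution in the medium is assumed to be represented by the nodal point temperatures. The temperature at each nodal point $(x_i, y_j)$ (which is symbolically denoted by $(i, j)$ as in the diagram above) is the average temperature of the surrounding hatched region. As the number of nodal points increases, greater accuracy in representation of the temperature distribution is obtained. A finite difference equation suitable for the interior nodes of a steady two-dimensional system can be obtained by considering Laplace's equation at the nodal point $(i, j)$ as
\[ \left.\frac{\partial^2 T}{\partial x^2}\right|_{i,j} + \left.\frac{\partial^2 T}{\partial y^2}\right|_{i,j} = 0. \tag{6.3.26} \]
The second derivatives at the nodal point $(i, j)$ can be expressed as
\[ \left.\frac{\partial^2 T}{\partial x^2}\right|_{i,j} \approx \frac{\left.\dfrac{\partial T}{\partial x}\right|_{i+\frac12,j} - \left.\dfrac{\partial T}{\partial x}\right|_{i-\frac12,j}}{\Delta x}, \tag{6.3.27} \]
\[ \left.\frac{\partial^2 T}{\partial y^2}\right|_{i,j} \approx \frac{\left.\dfrac{\partial T}{\partial y}\right|_{i,j+\frac12} - \left.\dfrac{\partial T}{\partial y}\right|_{i,j-\frac12}}{\Delta y}. \tag{6.3.28} \]
As shown in the figure, the temperature gradients can be approximated (as derived from the Taylor series) as a linear function of the nodal temperatures as
\[ \left.\frac{\partial T}{\partial x}\right|_{i+\frac12,j} \approx \frac{T_{i+1,j} - T_{i,j}}{\Delta x}, \tag{6.3.29} \]
\[ \left.\frac{\partial T}{\partial x}\right|_{i-\frac12,j} \approx \frac{T_{i,j} - T_{i-1,j}}{\Delta x}, \tag{6.3.30} \]
\[ \left.\frac{\partial T}{\partial y}\right|_{i,j+\frac12} \approx \frac{T_{i,j+1} - T_{i,j}}{\Delta y}, \tag{6.3.31} \]
\[ \left.\frac{\partial T}{\partial y}\right|_{i,j-\frac12} \approx \frac{T_{i,j} - T_{i,j-1}}{\Delta y}, \tag{6.3.32} \]
where $T_{i,j} = T(x_i, y_j)$. Substituting (6.3.29)-(6.3.32) into (6.3.27)-(6.3.28), we get
\[ \left.\frac{\partial^2 T}{\partial x^2}\right|_{i,j} = \frac{T_{i+1,j} - 2T_{i,j} + T_{i-1,j}}{(\Delta x)^2}, \tag{6.3.33} \]
\[ \left.\frac{\partial^2 T}{\partial y^2}\right|_{i,j} = \frac{T_{i,j+1} - 2T_{i,j} + T_{i,j-1}}{(\Delta y)^2}. \tag{6.3.34} \]
The equation (6.3.26) then gives
\[ \frac{T_{i+1,j} - 2T_{i,j} + T_{i-1,j}}{(\Delta x)^2} + \frac{T_{i,j+1} - 2T_{i,j} + T_{i,j-1}}{(\Delta y)^2} = 0. \]
Assume $\Delta x = \Delta y$. Then the finite difference approximation of Laplace's equation for interior regions can be expressed as
\[ T_{i,j+1} + T_{i,j-1} + T_{i+1,j} + T_{i-1,j} - 4T_{i,j} = 0. \tag{6.3.35} \]
More accurate higher-order approximations for interior nodes and boundary nodes are also obtained in a similar manner.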
Example 6.3.1
A two-dimensional rectangular plate ($0 \le x \le 1$, $0 \le y \le 1$) is subjected to the uniform temperature boundary conditions (with the top surface maintained at 100°C and all other surfaces at 0°C) shown in the figure below, that is, $T(0, y) = 0$, $T(1, y) = 0$, $T(x, 0) = 0$, and $T(x, 1) = 100$°C. Suppose we are interested only in the values of the temperature at the nine interior nodal points $(x_i, y_j)$, where $x_i = i\,\Delta x$ and $y_j = j\,\Delta y$, $i, j = 1, 2, 3$, with $\Delta x = \Delta y = \frac14$.

[Figure: the plate with a 5 x 5 grid of nodal points (0,0) through (4,4); one side is held at 100°C and the other three sides at 0°C.]

However, we assume symmetry for simplifying the problem. That is, we assume that $T_{33} = T_{13}$, $T_{32} = T_{12}$, and $T_{31} = T_{11}$. We thus have only six unknowns: $(T_{11}, T_{12}, T_{13})$ and $(T_{21}, T_{22}, T_{23})$.
\[
\begin{aligned}
4T_{11} - 0 - 100 - T_{21} - T_{12} &= 0 \\
4T_{12} - 0 - T_{11} - T_{22} - T_{13} &= 0 \\
4T_{13} - 0 - T_{12} - T_{23} - 0 &= 0 \\
4T_{21} - T_{11} - 100 - T_{11} - T_{22} &= 0 \\
4T_{22} - T_{12} - T_{21} - T_{12} - T_{23} &= 0 \\
4T_{23} - T_{13} - T_{22} - T_{13} - 0 &= 0
\end{aligned}
\]
After suitable rearrangement, these equations can be written in the following form:
\[
\begin{pmatrix}
4 & -1 & -1 & 0 & 0 & 0 \\
-2 & 4 & 0 & -1 & 0 & 0 \\
-1 & 0 & 4 & -1 & -1 & 0 \\
0 & -1 & -2 & 4 & 0 & -1 \\
0 & 0 & -1 & 0 & 4 & -1 \\
0 & 0 & 0 & -1 & -2 & 4
\end{pmatrix}
\begin{pmatrix} T_{11} \\ T_{21} \\ T_{12} \\ T_{22} \\ T_{13} \\ T_{23} \end{pmatrix}
=
\begin{pmatrix} 100 \\ 100 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\]
The solution of this system will give us temperatures at the nodal points.
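The six nodal equations of Example 6.3.1 can be solved numerically as a check. The sketch below orders the unknowns as (T11, T21, T12, T22, T13, T23), which is an assumption about the rearranged system inferred from the six equations listed in the example; NumPy is assumed.

```python
import numpy as np

# Rows follow the six nodal equations of Example 6.3.1, with the
# unknowns ordered (T11, T21, T12, T22, T13, T23).
A = np.array([[ 4.0, -1.0, -1.0,  0.0,  0.0,  0.0],
              [-2.0,  4.0,  0.0, -1.0,  0.0,  0.0],
              [-1.0,  0.0,  4.0, -1.0, -1.0,  0.0],
              [ 0.0, -1.0, -2.0,  4.0,  0.0, -1.0],
              [ 0.0,  0.0, -1.0,  0.0,  4.0, -1.0],
              [ 0.0,  0.0,  0.0, -1.0, -2.0,  4.0]])
b = np.array([100.0, 100.0, 0.0, 0.0, 0.0, 0.0])

T = np.linalg.solve(A, b)
T11, T21, T12, T22, T13, T23 = T
```

A useful sanity check is the center node: by symmetry and superposition of the four rotated boundary-value problems, the center temperature equals the average of the four side temperatures, which is 25.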
6.3.5 Special Linear Systems Arising in Applications
Many practical applications give rise to linear systems having special properties and structures, such as tridiagonal, diagonally dominant, positive definite, and block tridiagonal systems. The solution methods for solving these special systems are described in Section 6.4.6. We first state a situation which gives rise to a tridiagonal system.
Tridiagonal Systems
Consider one-dimensional steady conduction of heat, such as heat conduction through a wire. In such a case, the temperature remains constant with respect to time. The equation here is:
\[ \frac{\partial^2 T}{\partial x^2} = 0. \]
The difference analog of this equation is:
\[ T(x + \Delta x) - 2T(x) + T(x - \Delta x) = 0, \]
where $\Delta x$ is the increment in $x$, as shown below.

[Figure: an interval divided into four segments of length $\Delta x$, with interior nodes 1, 2, 3 and end temperatures $T_0 = \alpha$ and $T_4 = \beta$.]

Using a similar numbering scheme as before, the temperature at any point is given by
\[ T_{i+1} - 2T_i + T_{i-1} = 0, \]
that is, the temperature at any point is just the average of the temperatures of the two nearest neighboring points. Suppose the domain of the problem is $0 \le x \le 1$. Divide now the domain into four segments of equal length, say $\Delta x$. Thus $\Delta x = 0.25$. Then $T$ at $x = i\,\Delta x$ will be denoted by $T_i$. Suppose that we know the temperature at the end points $x = 0$ and $x = 1$, that is,
\[ T_0 = \alpha, \qquad T_4 = \beta. \]
These are then the boundary conditions of the problem. From the equation, the temperature at each node, $x = 0$, $x = \Delta x$, $x = 2\Delta x$, $x = 3\Delta x$, $x = 1$, is calculated as follows:
At $x = 0$: $T_0 = \alpha$ (given)
At $x = \Delta x$: $T_0 - 2T_1 + T_2 = 0$
At $x = 2\Delta x$: $T_1 - 2T_2 + T_3 = 0$
At $x = 3\Delta x$: $T_2 - 2T_3 + T_4 = 0$
At $x = 1$: $T_4 = \beta$ (given)
In matrix form these equations can be written as:
\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
1 & -2 & 1 & 0 & 0 \\
0 & 1 & -2 & 1 & 0 \\
0 & 0 & 1 & -2 & 1 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} T_0 \\ T_1 \\ T_2 \\ T_3 \\ T_4 \end{pmatrix}
=
\begin{pmatrix} \alpha \\ 0 \\ 0 \\ 0 \\ \beta \end{pmatrix}
\]
The matrix of this system is tridiagonal.
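A short sketch of the five-node steady conduction system follows. The end temperatures `T_left` and `T_right` are hypothetical values for the two given boundary temperatures at x = 0 and x = 1; NumPy is assumed.

```python
import numpy as np

# Hypothetical boundary temperatures at x = 0 and x = 1.
T_left, T_right = 20.0, 100.0

# Tridiagonal system for (T0, T1, T2, T3, T4) with Delta x = 0.25.
A = np.array([[1.0,  0.0,  0.0,  0.0, 0.0],
              [1.0, -2.0,  1.0,  0.0, 0.0],
              [0.0,  1.0, -2.0,  1.0, 0.0],
              [0.0,  0.0,  1.0, -2.0, 1.0],
              [0.0,  0.0,  0.0,  0.0, 1.0]])
rhs = np.array([T_left, 0.0, 0.0, 0.0, T_right])

T = np.linalg.solve(A, rhs)

# Each interior value is the average of its neighbors, so the
# profile is a straight line between the two end temperatures.
T_linear = np.linspace(T_left, T_right, 5)
```

Because the matrix is tridiagonal, a specialized solver of the kind discussed later in Section 6.4 would need only O(n) work; the general dense solver is used here only to keep the sketch short.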
Symmetric Tridiagonal and Diagonally Dominant Systems
In order to see how such systems arise, consider now the unsteady conduction of heat. This condition implies that the temperature $T$ varies with the time $t$. The heat equation in this case is
\[ \frac{1}{\alpha} \frac{\partial T}{\partial t} = \frac{\partial^2 T}{\partial x^2}. \]
Let us divide the grid in the $(x, t)$ plane with spacing $\Delta x$ in the $x$-direction and $\Delta t$ in the $t$-direction.

[Figure: grid in the $(x, t)$ plane with nodes $x_1, x_2, x_3, \ldots, x_n$ between 0 and 1, time levels $t_1, t_2, \ldots$, and spacings $x_{i+1} - x_i = \Delta x$ and $t_{i+1} - t_i = \Delta t$.]

Let the temperature at the nodal point $x_i = i\,\Delta x$ and $t_j = j\,\Delta t$, as before, be denoted by $T_{i,j}$. Approximating $\partial T/\partial t$ and $\partial^2 T/\partial x^2$ by the finite differences
\[ \frac{\partial T}{\partial t} \approx \frac{1}{\Delta t}\left(T_{i,j+1} - T_{i,j}\right), \qquad \frac{\partial^2 T}{\partial x^2} \approx \frac{1}{(\Delta x)^2}\left(T_{i+1,j+1} - 2T_{i,j+1} + T_{i-1,j+1}\right), \]
we obtain the following difference analog of the heat equation:
\[ (1 + 2C)\,T_{i,j+1} - C\left(T_{i+1,j+1} + T_{i-1,j+1}\right) = T_{i,j}, \qquad i = 1, 2, \ldots, n, \]
where $C = \alpha\,\Delta t/(\Delta x)^2$. These equations enable us to determine the temperature at a time step $j = k + 1$, knowing the temperature at the previous time step $j = k$.

For $i = 1$, $j = k$: $(1 + 2C)\,T_{1,k+1} - C\,T_{2,k+1} = C\,T_{0,k+1} + T_{1,k}$
For $i = 2$, $j = k$: $(1 + 2C)\,T_{2,k+1} - C\,T_{3,k+1} - C\,T_{1,k+1} = T_{2,k}$
...
For $i = n$, $j = k$: $(1 + 2C)\,T_{n,k+1} - C\,T_{n-1,k+1} = T_{n,k} + C\,T_{n+1,k+1}$

Suppose now the temperatures at the two vertical sides are known, that is,
\[ T_{0,t} = T_{W_1}, \qquad T_{n+1,t} = T_{W_2}. \]
Then the above equations can be written in matrix notation as
\[
\begin{pmatrix}
1+2C & -C & & & 0 \\
-C & 1+2C & -C & & \\
 & \ddots & \ddots & \ddots & \\
 & & -C & 1+2C & -C \\
0 & & & -C & 1+2C
\end{pmatrix}
\begin{pmatrix} T_{1,k+1} \\ T_{2,k+1} \\ \vdots \\ \vdots \\ T_{n,k+1} \end{pmatrix}
=
\begin{pmatrix} T_{1,k} + C\,T_{W_1} \\ T_{2,k} \\ \vdots \\ \vdots \\ T_{n,k} + C\,T_{W_2} \end{pmatrix}
\]
The matrix of the above system is clearly symmetric, tridiagonal and diagonally dominant (note that $C > 0$). For example, when $C = 1$, we have
\[
\begin{pmatrix}
3 & -1 & & & 0 \\
-1 & 3 & -1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 3 & -1 \\
0 & & & -1 & 3
\end{pmatrix}
\begin{pmatrix} T_{1,k+1} \\ T_{2,k+1} \\ \vdots \\ \vdots \\ T_{n,k+1} \end{pmatrix}
=
\begin{pmatrix} T_{1,k} + T_{W_1} \\ T_{2,k} \\ \vdots \\ \vdots \\ T_{n,k} + T_{W_2} \end{pmatrix}
\]
or
\[ Ax = b. \]
The matrix $A$ is symmetric, tridiagonal, diagonally dominant and positive definite.
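One time step of the implicit scheme above amounts to one tridiagonal solve. The following sketch assumes NumPy; the function name `implicit_heat_step` and the sample wall temperatures are illustrative, and a dense solver is used for brevity rather than a tridiagonal one.

```python
import numpy as np

def implicit_heat_step(T_old, C, T_w1, T_w2):
    """Advance interior temperatures T_1..T_n by one time step.

    Solves (1+2C) T_i - C (T_{i+1} + T_{i-1}) = T_old_i with the wall
    temperatures T_w1 (left) and T_w2 (right) moved to the right-hand side.
    """
    n = len(T_old)
    off_diagonals = np.eye(n, k=1) + np.eye(n, k=-1)
    A = (1.0 + 2.0 * C) * np.eye(n) - C * off_diagonals
    rhs = np.array(T_old, dtype=float)
    rhs[0] += C * T_w1
    rhs[-1] += C * T_w2
    return np.linalg.solve(A, rhs)

# Hypothetical data: 4 interior nodes, walls at 0 and 100, C = 1.
T = np.zeros(4)
for _ in range(200):
    T = implicit_heat_step(T, 1.0, 0.0, 100.0)

# A linear profile between the walls is left unchanged by a step,
# so the time stepping settles on it.
steady = np.array([20.0, 40.0, 60.0, 80.0])
unchanged = implicit_heat_step(steady, 1.0, 0.0, 100.0)
```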
Block Tridiagonal Systems
To see how block tridiagonal systems arise in applications, consider the two-dimensional Poisson's equation:
\[ \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} = f(x, y), \qquad 0 \le x \le 1, \quad 0 \le y \le 1. \]
A discrete analog to this equation, similar to Laplace's equation derived earlier, is
\[ T_{i+1,j} + T_{i-1,j} + T_{i,j+1} + T_{i,j-1} - 4T_{i,j} = (\Delta x)^2 f_{ij}, \qquad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, n. \]
This will give rise to a linear system of $(n + 2)^2$ variables. Assume now the values of $T$ at the four sides of the unit square are known and we are interested in the values of $T$ at the interior grid points, that is,
\[ T_{0,j},\ T_{n+1,j} \quad \text{and} \quad T_{i,0},\ T_{i,n+1} \qquad (j = 0, 1, \ldots, n+1;\ i = 0, 1, \ldots, n+1) \]
are given and $T_{11}, \ldots, T_{n1}, T_{12}, \ldots, T_{n2}, \ldots, T_{1n}, \ldots, T_{nn}$ are to be found. Then we have an $n^2 \times n^2$ system with $n^2$ unknowns, which can be written after suitable rearrangement as
\[
\begin{pmatrix}
4 & -1 & 0 & \cdots & 0 & -1 & 0 & \cdots & 0 \\
-1 & 4 & -1 & \cdots & 0 & 0 & -1 & \cdots & 0 \\
\vdots & & \ddots & & & & & \ddots & \vdots \\
0 & \cdots & -1 & 0 & \cdots & 0 & -1 & 4 & -1 \\
0 & \cdots & 0 & -1 & 0 & \cdots & 0 & -1 & 4
\end{pmatrix}
\begin{pmatrix} T_{11} \\ T_{21} \\ \vdots \\ T_{n1} \\ T_{12} \\ \vdots \\ T_{n2} \\ \vdots \\ T_{nn} \end{pmatrix}
=
\begin{pmatrix}
T_{01} + T_{10} - (\Delta x)^2 f_{11} \\
T_{20} - (\Delta x)^2 f_{21} \\
\vdots \\
T_{n-1,0} - (\Delta x)^2 f_{n-1,1} \\
T_{n+1,1} + T_{n,0} - (\Delta x)^2 f_{n1} \\
T_{02} - (\Delta x)^2 f_{12} \\
\vdots
\end{pmatrix}
\]
or in matrix form,
\[
\begin{pmatrix}
A_n & -I_n & & & \\
-I_n & \ddots & \ddots & & \\
 & \ddots & \ddots & \ddots & \\
 & & \ddots & \ddots & -I_n \\
 & & & -I_n & A_n
\end{pmatrix}
\begin{pmatrix} T_{11} \\ T_{21} \\ \vdots \\ T_{n1} \\ T_{12} \\ \vdots \\ T_{n2} \\ \vdots \\ T_{nn} \end{pmatrix}
=
\begin{pmatrix}
T_{01} + T_{10} - (\Delta x)^2 f_{11} \\
T_{20} - (\Delta x)^2 f_{21} \\
\vdots \\
T_{n-1,0} - (\Delta x)^2 f_{n-1,1} \\
T_{n+1,1} + T_{n,0} - (\Delta x)^2 f_{n1} \\
T_{02} - (\Delta x)^2 f_{12} \\
\vdots
\end{pmatrix}
\tag{6.3.36}
\]
where
\[
A_n = \begin{pmatrix}
4 & -1 & & 0 \\
-1 & \ddots & \ddots & \\
 & \ddots & \ddots & -1 \\
0 & & -1 & 4
\end{pmatrix}.
\tag{6.3.37}
\]
The system matrix above is block tridiagonal and each diagonal block $A_n$ is symmetric, tridiagonal, and positive definite. For details, see Ortega and Poole (INMDE, pp. 268-272).
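The block tridiagonal structure of (6.3.36) can be generated compactly with Kronecker products. This is only one possible way to assemble the matrix, shown here as a sketch; the function name `poisson_matrix` is illustrative and NumPy is assumed.

```python
import numpy as np

def poisson_matrix(n):
    """n^2 x n^2 block tridiagonal matrix with A_n on the diagonal
    blocks and -I_n on the off-diagonal blocks."""
    I = np.eye(n)
    S = np.eye(n, k=1) + np.eye(n, k=-1)   # ones on the sub/superdiagonal
    A_n = 4.0 * I - S                       # tridiagonal block (6.3.37)
    return np.kron(I, A_n) - np.kron(S, I)

A = poisson_matrix(3)
eigenvalues = np.linalg.eigvalsh(A)         # A is symmetric
```

For n = 3 this gives a 9 x 9 matrix; the zero in position (3, 4) (counting from 1) marks the end of the first grid row, where the unknown T_31 has no unknown neighbor T_41.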
6.3.6 Linear System Arising From Finite Element Methods
We have seen in the last few sections how discretization of differential equations using finite differences gives rise to various types of linear systems problems. The finite element technique is another popular way to discretize differential equations, and this also results in linear systems problems.

Just to give a taste to the readers, we illustrate this below by means of a simple differential equation. The interested readers are referred to some of the well-known books on the subject: Ciarlet (1981), Strang and Fix (1973), Becker, Carey and Oden (1981), Reddy (1993).

1. Variational formulation of a two-point boundary value problem.

Let us consider the following two-point boundary value problem
\[ -u'' + u = f(x), \qquad 0 < x < 1, \tag{6.3.38} \]
\[ u(0) = u(1) = 0, \tag{6.3.39} \]
where $u' = \dfrac{du}{dx}$ and $f$ is a continuous function on $[0, 1]$. We further assume that $f$ is such that Problem (6.3.38)-(6.3.39) has a unique solution. We introduce the space
\[ V = \{ v : v \text{ is a continuous function on } [0,1],\ v' \text{ is piecewise continuous and bounded on } [0,1], \text{ and } v(0) = v(1) = 0 \}. \]
Now, if we multiply the equation $-u'' + u = f(x)$ by an arbitrary function $v \in V$ ($v$ is called a test function) and integrate the left-hand side by parts, we get
\[ \int_0^1 \left(-u''(x) + u(x)\right) v(x)\,dx = \int_0^1 f(x)\,v(x)\,dx, \]
that is,
\[ \int_0^1 (u'v' + uv)\,dx = \int_0^1 f(x)\,v(x)\,dx \qquad \text{for every } v \in V, \tag{6.3.40} \]
since $v \in V$ and $v(0) = v(1) = 0$, so that the boundary terms vanish. We write (6.3.40) as:
\[ a(u, v) = (f, v), \]
where
\[ a(u, v) = \int_0^1 (u'v' + uv)\,dx \]
and
\[ (f, v) = \int_0^1 f(x)\,v(x)\,dx. \]
(Notice that the form $a(\cdot,\cdot)$ is symmetric (i.e., $a(u, v) = a(v, u)$) and bilinear.) These two properties will be used later. It can be shown that $u$ is a solution of (6.3.40) if and only if $u$ is a solution to (6.3.38)-(6.3.39).
2. The Discrete Problem

We now discretize problem (6.3.40). We start by constructing a finite-dimensional subspace $V_n$ of the space $V$. Here, we will only consider the simple case where $V_n$ consists of continuous piecewise linear functions. For this purpose, we let $0 = x_0 < x_1 < x_2 < \cdots < x_n < x_{n+1} = 1$ be a partition of the interval $[0, 1]$ into subintervals $I_j = [x_{j-1}, x_j]$ of length $h_j = x_j - x_{j-1}$, $j = 1, 2, \ldots, n+1$. With this partition, we associate the set $V_n$ of all functions $v(x)$ that are continuous on the interval $[0, 1]$, linear in each subinterval $I_j$, $j = 1, \ldots, n+1$, and satisfy the boundary conditions $v(0) = v(1) = 0$. We now introduce the basis functions $\{\varphi_1, \varphi_2, \ldots, \varphi_n\}$ of $V_n$. We define $\varphi_j(x)$ by
(i) $\varphi_j(x_i) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$
(ii) $\varphi_j(x)$ is a continuous piecewise linear function.
$\varphi_j(x)$ can be computed explicitly to yield:
\[
\varphi_j(x) = \begin{cases}
\dfrac{x - x_{j-1}}{h_j} & \text{when } x_{j-1} \le x \le x_j, \\[2mm]
\dfrac{x_{j+1} - x}{h_{j+1}} & \text{when } x_j \le x \le x_{j+1}, \\[2mm]
0 & \text{otherwise.}
\end{cases}
\]
[Figure: the basis function $\varphi_j(x)$, which rises from 0 at $x_{j-1}$ to 1 at $x_j$ and falls back to 0 at $x_{j+1}$.]

Since $\varphi_1, \ldots, \varphi_n$ are the basis functions, any function $v \in V_n$ can be written uniquely as:
\[ v(x) = \sum_{i=1}^n v_i\,\varphi_i(x), \qquad \text{where } v_i = v(x_i). \]
We easily see that $V_n \subset V$.

The discrete analog of Problem (6.3.40) then reads: find $u_n \in V_n$ such that
\[ a(u_n, v) = (f, v) \qquad \forall v \in V_n. \tag{6.3.41} \]
Now, if we let $u_n = \sum_{i=1}^n u_i\,\varphi_i(x)$ and notice that equation (6.3.41) is particularly true for every function $\varphi_j(x)$, $j = 1, \ldots, n$, we get $n$ equations, namely
\[ a\!\left(\sum_{i=1}^n u_i \varphi_i,\ \varphi_j\right) = (f, \varphi_j) \qquad \forall j = 1, 2, \ldots, n. \]
Now using the linearity of $a(\cdot, \varphi_j)$ leads to $n$ linear equations in $n$ unknowns:
\[ \sum_{i=1}^n u_i\,a(\varphi_i, \varphi_j) = (f, \varphi_j) \qquad \forall j = 1, 2, \ldots, n, \tag{6.3.42} \]
which can be written in the matrix form as
\[ A u_n = f_n, \]
where $(f_n)_i = (f, \varphi_i)$ and $A = (a_{ij})$ is a symmetric matrix given by
\[ a_{ij} = a_{ji} = a(\varphi_i, \varphi_j) \]
and
\[ u_n = (u_1, \ldots, u_n)^T. \]
The entries of the matrix $A$ can be computed explicitly. We first notice that
\[ a_{ij} = a_{ji} = a(\varphi_i, \varphi_j) = 0 \qquad \text{if } |i - j| \ge 2. \]
(This is due to the local support of the function $\varphi_i(x)$.) A direct computation now leads to
\[
a_{jj} = a(\varphi_j, \varphi_j) = \int_{x_{j-1}}^{x_j} \left[\frac{1}{h_j^2} + \frac{(x - x_{j-1})^2}{h_j^2}\right] dx + \int_{x_j}^{x_{j+1}} \left[\frac{1}{h_{j+1}^2} + \frac{(x_{j+1} - x)^2}{h_{j+1}^2}\right] dx
= \frac{1}{h_j} + \frac{1}{h_{j+1}} + \frac13\left[h_j + h_{j+1}\right],
\]
\[
a_{j,j-1} = \int_{x_{j-1}}^{x_j} \left[-\frac{1}{h_j^2} + \frac{(x_j - x)(x - x_{j-1})}{h_j^2}\right] dx = -\frac{1}{h_j} + \frac{h_j}{6}.
\]
Hence, the system (6.3.42) can be written as:
\[
\begin{pmatrix}
a_1 & b_1 & & & 0 \\
b_1 & a_2 & \ddots & & \\
 & \ddots & \ddots & \ddots & \\
 & & \ddots & \ddots & b_{n-1} \\
0 & & & b_{n-1} & a_n
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ \vdots \\ u_n \end{pmatrix}
=
\begin{pmatrix} (f_n)_1 \\ (f_n)_2 \\ \vdots \\ \vdots \\ (f_n)_n \end{pmatrix}
\]
where $a_j = \dfrac{1}{h_j} + \dfrac{1}{h_{j+1}} + \dfrac13\left[h_j + h_{j+1}\right]$ and $b_j = a_{j+1,j} = -\dfrac{1}{h_{j+1}} + \dfrac{h_{j+1}}{6}$. In the special case of a uniform grid $h_j = h = \dfrac{1}{n+1}$, the matrix $A$ then takes the form
\[
A = \frac{1}{h}
\begin{pmatrix}
2 & -1 & & 0 \\
-1 & 2 & \ddots & \\
 & \ddots & \ddots & -1 \\
0 & & -1 & 2
\end{pmatrix}
+ \frac{h}{6}
\begin{pmatrix}
4 & 1 & & 0 \\
1 & 4 & \ddots & \\
 & \ddots & \ddots & 1 \\
0 & & 1 & 4
\end{pmatrix}.
\tag{6.3.43}
\]
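The uniform-grid matrix (6.3.43) is easy to assemble and test on a problem with a known solution. The sketch below makes two assumptions that are not in the text: the test load f(x) = 1, for which (f, phi_i) = h exactly because each hat function has area h, and the corresponding exact solution u(x) = 1 - cosh(x - 1/2)/cosh(1/2) of (6.3.38)-(6.3.39). NumPy is assumed and the names are illustrative.

```python
import numpy as np

def fem_matrix(n):
    """Matrix (6.3.43) on a uniform grid with n interior nodes."""
    h = 1.0 / (n + 1)
    I = np.eye(n)
    S = np.eye(n, k=1) + np.eye(n, k=-1)
    stiffness = (2.0 * I - S) / h        # from the u'v' term
    mass = h * (4.0 * I + S) / 6.0       # from the uv term
    return stiffness + mass, h

n = 49
A, h = fem_matrix(n)
x = np.linspace(h, 1.0 - h, n)           # interior nodes x_1..x_n

f_n = h * np.ones(n)                      # (f, phi_i) for f(x) = 1
u = np.linalg.solve(A, f_n)               # nodal values u_1..u_n

u_exact = 1.0 - np.cosh(x - 0.5) / np.cosh(0.5)
max_error = np.max(np.abs(u - u_exact))
```

The two terms of (6.3.43) are kept separate in the code to mirror the two integrals in a(u, v); refining the grid should reduce `max_error`.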
6.3.7 Approximation of a Function by a Polynomial: Hilbert System
In Chapter 3 (Section 3.6) we cited an ill-conditioned linear system with the Hilbert matrix. In this section we show how such a system arises. The discussion here has been taken from Forsythe and Moler (CSLAS, pp. 80-81).

Suppose a continuous function $f(x)$ defined on the interval $0 \le x \le 1$ is to be approximated by a polynomial of degree $n - 1$:
\[ \sum_{i=1}^n p_i\,x^{i-1}, \]
such that the error
\[ E = \int_0^1 \left[\sum_{i=1}^n p_i\,x^{i-1} - f(x)\right]^2 dx \]
is minimized. The coefficients $p_i$ of the polynomial are easily determined by setting
\[ \frac{\partial E}{\partial p_i} = 0, \qquad i = 1, \ldots, n. \]
(Note that the error is a differentiable function of the unknowns $p_i$ and that a minimum occurs when all the partial derivatives are zero.) Thus we have
\[ \frac{\partial E}{\partial p_i} = 2 \int_0^1 \left[\sum_{j=1}^n p_j\,x^{j-1} - f(x)\right] x^{i-1}\,dx = 0, \qquad i = 1, \ldots, n, \]
or
\[ \sum_{j=1}^n \left(\int_0^1 x^{i+j-2}\,dx\right) p_j = \int_0^1 f(x)\,x^{i-1}\,dx, \qquad i = 1, \ldots, n. \]
(To obtain the latter form we have interchanged the summation and integration.) Letting
\[ h_{ij} = \int_0^1 x^{i+j-2}\,dx \]
and
\[ b_i = \int_0^1 f(x)\,x^{i-1}\,dx \qquad (i = 1, 2, \ldots, n), \]
we have
\[ \sum_{j=1}^n h_{ij}\,p_j = b_i, \qquad i = 1, \ldots, n. \]
That is, we obtain the linear system
\[ Hp = b, \]
where $H = (h_{ij})$ and $b = (b_1, b_2, \ldots, b_n)^T$. The matrix $H$ is easily identified as the Hilbert matrix (see Chapter 3, Section 3.6), since
\[ h_{ij} = \int_0^1 x^{i+j-2}\,dx = \frac{1}{i + j - 1}. \]
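To see the ill-conditioning mentioned above in numbers, the following sketch builds the Hilbert matrix from the formula for h_ij and prints its 2-norm condition number for a few orders. NumPy is assumed; `hilbert` is an illustrative helper, not a library call.

```python
import numpy as np

def hilbert(n):
    """Hilbert matrix of order n: h_ij = 1 / (i + j - 1), i, j = 1..n."""
    i = np.arange(1, n + 1)
    return 1.0 / (i[:, None] + i[None, :] - 1)

H3 = hilbert(3)
condition_numbers = {n: np.linalg.cond(hilbert(n)) for n in (3, 5, 10)}

for n, kappa in condition_numbers.items():
    print(f"n = {n:2d}   cond_2(H) = {kappa:.3e}")
```

The rapid growth of the condition number with n is what makes the least-squares polynomial fit in the monomial basis numerically delicate even for modest degrees.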
6.4 Direct Methods
In this section we will study direct methods for solving the problem Ax = b. These methods include
- Gaussian elimination without pivoting, based on LU factorization of A (Section 6.4.2).
- Gaussian elimination with partial pivoting, based on MA = U factorization of A (Section 6.4.3).
- Gaussian elimination with complete pivoting, based on MAQ = U factorization of A (Section 6.4.3).
- The method based on the QR factorization of A (Section 6.4.5).
- The method based on the Cholesky decomposition of a symmetric positive definite matrix (Section 6.4.7).
- Gaussian elimination for special systems: Hessenberg, positive definite, tridiagonal, diagonally dominant (Section 6.4.7).

For a comparison of these methods and their relative merits and demerits, see Section 6.4.9 and the accompanying Tables of Comparison. These methods are primarily used to solve small and dense linear system problems of order up to 200 or so.

The basic idea behind all the direct methods is to first reduce the linear system Ax = b to equivalent triangular system(s) by finding triangular factors of the matrix A, and then to solve the triangular system(s), which is (are) much easier to solve than the original problem. In Chapter 5 we described various methods for computing triangular factors of A. In Chapter 3 (Section 3.1) we described the back substitution method for solving an upper triangular system. We now describe an analogous method, called forward elimination, for solving a lower triangular system.
6.4.1 Solution of a Lower Triangular System
The solution of the nonsingular lower triangular system
Ly = b
can be obtained analogously as in the upper triangular system. Here $y_1$ is obtained from the first equation and then, inserting its value in the second equation, $y_2$ is obtained, and so on. This process is called forward elimination.

Algorithm 6.4.1 Forward Elimination

For $i = 1, 2, \dots, n$ do
\[ y_i = \frac{1}{l_{ii}} \Big( b_i - \sum_{j=1}^{i-1} l_{ij}\,y_j \Big) \]

Note: When $i = 1$, the summation ($\sum$) is skipped.

Flop-count and stability. The algorithm requires about $\dfrac{n^2}{2}$ flops (1 flop to compute $y_1$, 2 flops to compute $y_2$, 3 flops to compute $y_3$, and so on). The algorithm is as stable as the back substitution process (Algorithm 3.1.3).
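Algorithm 6.4.1 translates almost line for line into code. The following is a minimal sketch; the function name `forward_elimination` and the list-of-lists matrix layout are assumptions made for illustration, and indices are 0-based.

```python
def forward_elimination(L, b):
    """Solve the nonsingular lower triangular system L y = b.

    L is a list of rows (only the lower triangle is read); b is a list.
    """
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        if L[i][i] == 0:
            raise ZeroDivisionError("L is singular: zero diagonal entry")
        # For i = 0 the sum below is empty, matching the Note in the algorithm.
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y

# Lower triangular system taken from Example 6.4.8 of this section.
y = forward_elimination([[1, 0, 0], [1, 2, 0], [1, 2, 3]], [3, 11, 20])
```

Row $i$ costs $i$ multiplications, which is where the $n^2/2$ flop count comes from.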
6.4.2 Solution of the System Ax = b Using Gaussian Elimination without Pivoting
The Gaussian elimination method for the linear system Ax = b is based on LU factorization of the matrix A. Recall from Chapter 5 (Section 5.2.1) that triangularization using Gaussian elimination without pivoting, when carried out to completion, yields an LU factorization of A. Once we have this factorization of A, the system Ax = b becomes equivalent to two triangular systems:
\[ Ly = b, \qquad Ux = y. \]

Solving Ax = b using Gaussian Elimination Without Pivoting

The solution of the system $Ax = b$ using Gaussian elimination (without pivoting) can be achieved in two stages: First, find an LU factorization of $A$. Second, solve two triangular systems: the lower triangular system $Ly = b$ first, followed by the upper triangular system $Ux = y$.

Flop-count. Since an LU factorization requires about $\dfrac{n^3}{3}$ flops and the solution of each triangular system needs only $\dfrac{n^2}{2}$ flops, the total flop count for solving the system $Ax = b$ using Gaussian elimination is about $\dfrac{n^3}{3} + n^2$.
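The two-stage process can be sketched as follows. This is an illustrative sketch only: the names `lu_no_pivot` and `solve_lu` are hypothetical, exact rational arithmetic is used so the result can be checked by hand, and the routine simply stops if a zero pivot is met (which is exactly the situation pivoting is designed to handle).

```python
from fractions import Fraction

def lu_no_pivot(A):
    """Return (L, U) with A = L U, L unit lower triangular, no pivoting."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        if U[k][k] == 0:
            raise ZeroDivisionError("zero pivot; pivoting is required")
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U

def solve_lu(A, b):
    """Solve A x = b by L y = b (forward) followed by U x = y (backward)."""
    L, U = lu_no_pivot(A)
    n = len(b)
    y = []
    for i in range(n):
        y.append(Fraction(b[i]) - sum(L[i][j] * y[j] for j in range(i)))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / U[i][i]
    return x

x = solve_lu([[2, 3], [3, 5]], [5, 8])
```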
6.4.3 Solution of Ax = b Using Pivoting Triangularization

A. Solution with Partial Pivoting

If Gaussian elimination with partial pivoting is used to triangularize $A$, then as we have seen in Chapter 5 (Section 5.2.2), this process yields a factorization:
\[ MA = U. \]
In this case, the system $Ax = b$ is equivalent to the triangular system
\[ Ux = Mb = b'. \]

Solving Ax = b Using Gaussian Elimination With Partial Pivoting

To solve $Ax = b$ using Gaussian elimination with partial pivoting:
Step 1. Find the factorization $MA = U$ by the triangularization algorithm using partial pivoting (Algorithm 5.2.3).
Step 2. Solve the triangular system by back substitution (Algorithm 3.1.3):
\[ Ux = Mb = b'. \]

Implementation of Step 2

The vector
\[ b' = Mb = M_{n-1}P_{n-1}M_{n-2}P_{n-2} \cdots M_1P_1\,b \]
can be computed as follows:
(1) $s_1 = b$
(2) For $k = 1, 2, \dots, n-1$ do $s_{k+1} = M_kP_ks_k$
(3) $b' = s_n$.
Computational Remarks

The practical Gaussian elimination (with partial pivoting) algorithm does not give $M_k$ and $P_k$ explicitly. But we really do not need them. The vector $s_{k+1}$ can be computed immediately from $s_k$ once the index $r_k$ of row interchange and the multipliers $m_{ik}$ have been saved at the $k$th step. This is illustrated with the $3 \times 3$ example as follows:

Example 6.4.1

Let $n = 3$ and let
\[
P_1 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{pmatrix},
\qquad
M_1 = \begin{pmatrix} 1 & 0 & 0\\ m_{21} & 1 & 0\\ m_{31} & 0 & 1 \end{pmatrix}.
\]
Then, denoting the entries of the vector $s_1$ by $s_1, s_2, s_3$, we have
\[
P_1s_1 = \begin{pmatrix} s_1\\ s_3\\ s_2 \end{pmatrix},
\qquad
s_2 = M_1P_1s_1 = \begin{pmatrix} s_1^{(2)}\\ s_2^{(2)}\\ s_3^{(2)} \end{pmatrix},
\]
and the entries of $s_2$ are then given by
\[
s_1^{(2)} = s_1, \qquad
s_2^{(2)} = m_{21}s_1 + s_3, \qquad
s_3^{(2)} = m_{31}s_1 + s_2.
\]
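The remark that only the interchange indices $r_k$ and the multipliers $m_{ik}$ are needed can be sketched in code. The function name `apply_elimination_to_b` and the storage layout (a list `pivots` of 0-based row indices and a square array `mult` holding $m_{ik}$ in position `[i][k]`) are assumptions made for this illustration.

```python
def apply_elimination_to_b(b, pivots, mult):
    """Compute b' = M_{n-1} P_{n-1} ... M_1 P_1 b without forming M_k or P_k.

    pivots[k] is the (0-based) row interchanged with row k at step k;
    mult[i][k] is the multiplier m_ik for i > k.
    """
    s = list(b)
    n = len(s)
    for k in range(n - 1):
        r = pivots[k]
        s[k], s[r] = s[r], s[k]            # effect of P_k
        for i in range(k + 1, n):
            s[i] += mult[i][k] * s[k]      # effect of M_k
    return s

# Data of Example 6.4.2(a): rows 1 and 2 are interchanged at step 1,
# no interchange at step 2; m21 = 0, m31 = -1, m32 = 1.
b_prime = apply_elimination_to_b([2, 6, 3],
                                 pivots=[1, 1],
                                 mult=[[0, 0, 0], [0, 0, 0], [-1, 1, 0]])
```

Each step touches only the entries below position $k$, so the whole update costs about $n^2/2$ flops.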
B. Solution with Complete Pivoting
Gaussian elimination with complete pivoting (Section 5.2.3) gives
MAQ = U:
Using this factorization, the system Ax = b can be written as
\[ Uy = Mb = b' \]
where
\[ y = Q^Tx. \]
Thus, we have the following.

Solving Ax = b Using Gaussian Elimination With Complete Pivoting

To solve $Ax = b$ using complete pivoting:

Step 1. Find the factorization $MAQ = U$ by the factorization algorithm using complete pivoting (Algorithm 5.2.4).
Step 2. Solve the triangular system (for $y$) (Algorithm 3.1.3):
\[ Uy = b', \]
computing $b'$ as shown above.
Step 3. Finally, recover $x$ from $y$: $x = Qy$.

Implementation of Step 3

Since $x = Qy = Q_1Q_2 \cdots Q_{n-1}y$, the following scheme can be adopted to compute $x$ from $y$ in Step 3.
Set $w_n = y$.
For $k = n-1, \dots, 2, 1$ do $w_k = Q_kw_{k+1}$.
Then $x = w_1$.

Note: Since $Q_k$ is a permutation matrix, the entries of $w_k$ are simply those of $w_{k+1}$ reordered according to the permutation index.

Example 6.4.2
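Solve
\[ Ax = b \]
with
\[
A = \begin{pmatrix} 0 & 1 & 1\\ 1 & 2 & 3\\ 1 & 1 & 1 \end{pmatrix},
\qquad
b = \begin{pmatrix} 2\\ 6\\ 3 \end{pmatrix}
\]
(a) using partial pivoting,
(b) using complete pivoting.

(a) Partial Pivoting

With the results obtained earlier (Section 5.2.2, Example 5.2.3), we compute
\[
P_1 = \begin{pmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix},
\qquad
P_1b = \begin{pmatrix} 6\\ 2\\ 3 \end{pmatrix},
\]
\[
M_1 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ -1 & 0 & 1 \end{pmatrix},
\qquad
M_1P_1b = \begin{pmatrix} 6\\ 2\\ -3 \end{pmatrix},
\]
\[
P_2 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix},
\qquad
P_2M_1P_1b = \begin{pmatrix} 6\\ 2\\ -3 \end{pmatrix},
\]
\[
M_2 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 1 & 1 \end{pmatrix},
\qquad
b' = M_2P_2M_1P_1b = \begin{pmatrix} 6\\ 2\\ -1 \end{pmatrix}.
\]
The solution of the system $Ux = b'$,
\[
\begin{pmatrix} 1 & 2 & 3\\ 0 & 1 & 1\\ 0 & 0 & -1 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}
=
\begin{pmatrix} 6\\ 2\\ -1 \end{pmatrix},
\]
is $x_1 = x_2 = x_3 = 1$.

(b) Complete Pivoting

Using the results obtained in the example in Section 5.2.4, we have
\[
P_1 = \begin{pmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix},
\qquad
P_1b = \begin{pmatrix} 6\\ 2\\ 3 \end{pmatrix},
\]
\[
M_1P_1b = \begin{pmatrix} 6\\ 0\\ 1 \end{pmatrix},
\qquad
P_2M_1P_1b = \begin{pmatrix} 6\\ 1\\ 0 \end{pmatrix},
\qquad
b' = M_2P_2M_1P_1b = \begin{pmatrix} 6\\ 1\\ \tfrac12 \end{pmatrix}.
\]
The solution of the system $Uy = b'$,
\[
\begin{pmatrix} 3 & 1 & 2\\ 0 & \tfrac23 & \tfrac13\\ 0 & 0 & \tfrac12 \end{pmatrix}
\begin{pmatrix} y_1\\ y_2\\ y_3 \end{pmatrix}
=
\begin{pmatrix} 6\\ 1\\ \tfrac12 \end{pmatrix},
\]
is $y_1 = y_2 = y_3 = 1$. Since $\{x_k\}$, $k = 1, 2, 3$, is simply a rearrangement of $\{y_k\}$, we have $x_1 = x_2 = x_3 = 1$.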
Some Details of Implementation

Note that it is not necessary to store the vectors $s_i$ and $w_i$ separately, because all we need is the vector $s_n$ for partial pivoting and $w_1$ for complete pivoting. So starting with $x = b$, each new vector can be stored in $x$ as it is computed. Also note that if we use the practical algorithms, the matrices $P_k$, $Q_k$ and $M_k$ are not available explicitly; they have to be formed, respectively, out of the indices $r_k$, $s_k$ and the multipliers $m_{ik}$. In this case, the statements for computing the $s_k$'s and $w_k$'s are to be modified accordingly.

Flop-count. We have seen in Chapter 5 (Section 5.2.1) that the triangularization process using elementary matrices requires $\dfrac{n^3}{3}$ flops. The triangular system $Ux = b'$ or $Uy = b'$ can be solved using back substitution with $\dfrac{n^2}{2}$ flops, and the vector $b'$ can be computed with $\dfrac{n^2}{2}$ flops, taking into account the special structures of the matrices $M_k$ and $P_k$. Recovering $x$ from $y$ in Step 3 of the complete pivoting process does not need any flops; note that $x$ is obtained from $y$ just by reshuffling the entries of $y$. Thus, the solution of the linear system $Ax = b$ using Gaussian elimination with complete or partial pivoting requires $\dfrac{n^3}{3} + O(n^2)$ flops. However, Gaussian elimination with complete pivoting requires about $\dfrac{n^3}{3}$ comparisons to identify the $(n-1)$ pivots, compared to only $O(n^2)$ comparisons needed by the partial pivoting method.

Round-off Property

In Chapter 5 (Section 5.3) we discussed the round-off property of Gaussian elimination for triangularization of $A$. We have seen that the growth factor $\rho$ determines the stability of the triangularization procedure. The next question is how the growth factor affects the solution procedure of $Ax = b$ using such a triangularization. The answer is given in the following:

Round-off Error Result for Linear System Problem with Gaussian Elimination

It can be shown that the computed solution $\hat{x}$ of the linear system $Ax = b$, using Gaussian elimination, satisfies
\[ (A + E)\hat{x} = b \]
where $\|E\|_\infty \le c\,(n^3 + 3n^2)\,\rho\,\|A\|_\infty\,\mu$, $\mu$ is the machine precision, and $c$ is a small constant. For a proof see Chapter 11, Section 11.4.

Remark: The size of the above bound is really determined by $\rho$, since when $n$ is not too large, $n^3$ can be considerably small compared to $\rho$ and can therefore be neglected. Thus the growth factor $\rho$ is again the deciding factor.
6.4.4 Solution of Ax = b without Explicit Factorization
As we have just seen, the Gaussian elimination method for solving Ax = b comes in two stages. First, the matrix A is explicitly factorized:
\[
\begin{aligned}
A &= LU && \text{(without pivoting)}\\
MA &= U && \text{(with partial pivoting)}\\
MAQ &= U && \text{(with complete pivoting).}
\end{aligned}
\]
Second, the factorization of $A$ is used to solve $Ax = b$. However, it is easy to see that these two stages can be combined so that the solution can be obtained by solving an upper triangular system by processing the matrix $A$ and the vector $b$ simultaneously. In this case, the augmented matrix $(A, b)$ is triangularized and the solution is then obtained by back substitution. We illustrate this implicit process for Gaussian elimination with partial pivoting.

Algorithm 6.4.2 Solving Ax = b With Partial Pivoting Without Explicit Factorization

Given an $n \times n$ matrix $A$ and a vector $b$, the following algorithm computes the triangular factorization of the augmented matrix $(A, b)$ using Gaussian elimination with partial pivoting. $A$ is overwritten by the transformed triangular matrix and $b$ is overwritten by the transformed vector. The multipliers are stored in the lower-half part of $A$.

For $k = 1, 2, \dots, n-1$ do
(1) Choose the largest element in magnitude in column $k$ on or below the $(k, k)$ entry; call it $a_{r_k,k}$:
\[ |a_{r_k,k}| = \max\{\,|a_{ik}| : i \ge k\,\}. \]
If $a_{r_k,k} = 0$, stop.
(2) Otherwise, interchange the rows $k$ and $r_k$ of $A$ and the $k$th and $r_k$th entries of $b$:
\[ a_{r_k,j} \leftrightarrow a_{kj} \quad (j = k, k+1, \dots, n), \qquad b_{r_k} \leftrightarrow b_k. \]
(3) Form the multipliers:
\[ a_{ik} \equiv m_{ik} = -\frac{a_{ik}}{a_{kk}} \quad (i = k+1, \dots, n). \]
(4) Update the entries of $A$:
\[ a_{ij} \equiv a_{ij} + m_{ik}a_{kj} \quad (i = k+1, \dots, n;\ j = k+1, \dots, n). \]
(5) Update the entries of $b$:
\[ b_i \equiv b_i + m_{ik}b_k \quad (i = k+1, \dots, n). \]
Example 6.4.3
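\[
A = \begin{pmatrix} 0 & 1 & 1\\ 2 & 2 & 3\\ 4 & 1 & 1 \end{pmatrix},
\qquad
b = \begin{pmatrix} 2\\ 6\\ 3 \end{pmatrix}.
\]

Step 1. The pivot entry is $a_{31} = 4$, $r_1 = 3$. Interchange the rows 3 and 1 of $A$ and the 3rd and 1st entries of $b$:
\[
A \equiv \begin{pmatrix} 4 & 1 & 1\\ 2 & 2 & 3\\ 0 & 1 & 1 \end{pmatrix},
\qquad
b \equiv \begin{pmatrix} 3\\ 6\\ 2 \end{pmatrix}.
\]
\[ m_{21} = -\frac{a_{21}}{a_{11}} = -\frac12, \qquad m_{31} = 0, \]
\[
A \equiv A^{(1)} = \begin{pmatrix} 4 & 1 & 1\\ 0 & \tfrac32 & \tfrac52\\ 0 & 1 & 1 \end{pmatrix},
\qquad
b \equiv b^{(1)} = \begin{pmatrix} 3\\ \tfrac92\\ 2 \end{pmatrix}.
\]

Step 2. The pivot entry is $a_{22} = \tfrac32$ (no interchange is needed).
\[ m_{32} = -\frac{a_{32}}{a_{22}} = -\frac23, \]
\[
A \equiv A^{(2)} = \begin{pmatrix} 4 & 1 & 1\\ 0 & \tfrac32 & \tfrac52\\ 0 & 0 & -\tfrac23 \end{pmatrix},
\qquad
b \equiv b^{(2)} = \begin{pmatrix} 3\\ \tfrac92\\ -1 \end{pmatrix}.
\]

The reduced triangular system $A^{(2)}x = b^{(2)}$ is:
\[
\begin{aligned}
4x_1 + x_2 + x_3 &= 3\\
\tfrac32 x_2 + \tfrac52 x_3 &= \tfrac92\\
-\tfrac23 x_3 &= -1.
\end{aligned}
\]
The solution is:
\[ x_3 = \frac32, \qquad x_2 = \frac12, \qquad x_1 = \frac14. \]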
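Algorithm 6.4.2 followed by back substitution can be sketched as below. The function name `solve_partial_pivoting` is hypothetical, indices are 0-based, and exact rational arithmetic (Python's `fractions` module) is an assumption made so that the output can be compared with the hand computation of Example 6.4.3.

```python
from fractions import Fraction

def solve_partial_pivoting(A, b):
    """Triangularize the augmented matrix (A, b) with partial pivoting,
    then back-substitute. Multipliers are stored below the diagonal."""
    n = len(b)
    A = [[Fraction(v) for v in row] for row in A]
    b = [Fraction(v) for v in b]
    for k in range(n - 1):
        # (1) largest entry in magnitude in column k, on or below the diagonal
        r = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[r][k] == 0:
            raise ZeroDivisionError("matrix is singular")
        # (2) interchange rows k and r (columns k..n-1 only) and entries of b
        for j in range(k, n):
            A[k][j], A[r][j] = A[r][j], A[k][j]
        b[k], b[r] = b[r], b[k]
        for i in range(k + 1, n):
            m = -A[i][k] / A[k][k]          # (3) multiplier m_ik
            A[i][k] = m                     #     stored in the lower half
            for j in range(k + 1, n):
                A[i][j] += m * A[k][j]      # (4) update A
            b[i] += m * b[k]                # (5) update b
    # back substitution on the upper triangle
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / A[i][i]
    return x

x = solve_partial_pivoting([[0, 1, 1], [2, 2, 3], [4, 1, 1]], [2, 6, 3])
```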
6.4.5 Solution of Ax = b Using QR Factorization
If we have
\[ A = QR, \]
then the system $Ax = b$ becomes
\[ QRx = b \]
or
\[ Rx = Q^Tb = b'. \]
Thus, once we have the QR factorization of $A$, the solution of the system $Ax = b$ can be obtained just by solving the equivalent upper triangular system:
\[ Rx = b' = Q^Tb. \]

Solving Ax = b using QR

To solve $Ax = b$ using QR factorization,
1. Find the QR factorization of $A$: $Q^TA = R$ (Sections 5.4 and 5.5).
2. Form $b' = Q^Tb$.
3. Solve $Rx = b'$.

Forming b'

To compute $b'$ we do not need $Q$ explicitly. It can be computed from the factored form of $Q$. For example, if the QR factorization is obtained using the Householder method (Chapter 5, Section 5.4.1), then $Q^T = H_{n-1}H_{n-2} \cdots H_2H_1$ and $b' = Q^Tb$ can be computed as
(1) $y_1 = b$
(2) For $k = 1, 2, \dots, n-1$ do $y_{k+1} = H_ky_k$
(3) $b' = y_n$.
Example 6.4.4
Consider
\[
A = \begin{pmatrix} 0 & 1 & 1\\ 1 & 2 & 3\\ 1 & 1 & 1 \end{pmatrix},
\qquad
b = \begin{pmatrix} 2\\ 6\\ 3 \end{pmatrix}.
\]
From the example of Section 5.4.1, we know that the Householder method gives us
\[
R = \begin{pmatrix} -1.4142 & -2.1213 & -2.8284\\ 0 & 1.2247 & 1.6330\\ 0 & 0 & -0.5774 \end{pmatrix},
\]
\[
H_1 = \begin{pmatrix} 0 & -\tfrac{1}{\sqrt2} & -\tfrac{1}{\sqrt2}\\ -\tfrac{1}{\sqrt2} & \tfrac12 & -\tfrac12\\ -\tfrac{1}{\sqrt2} & -\tfrac12 & \tfrac12 \end{pmatrix}
= \begin{pmatrix} 0 & -0.7071 & -0.7071\\ -0.7071 & 0.5000 & -0.5000\\ -0.7071 & -0.5000 & 0.5000 \end{pmatrix},
\]
\[
H_2 = \begin{pmatrix} 1 & 0 & 0\\ 0 & -0.1691 & -0.9856\\ 0 & -0.9856 & 0.1691 \end{pmatrix}.
\]

Compute $b'$:
\[
y_1 = b = \begin{pmatrix} 2\\ 6\\ 3 \end{pmatrix},
\qquad
y_2 = H_1y_1 = \begin{pmatrix} -\tfrac{9}{\sqrt2}\\ \tfrac32 - \sqrt2\\ -\tfrac32 - \sqrt2 \end{pmatrix}
= \begin{pmatrix} -6.3640\\ 0.0858\\ -2.9142 \end{pmatrix},
\]
\[
b' = y_3 = H_2y_2 = \begin{pmatrix} -6.3640\\ 2.8577\\ -0.5774 \end{pmatrix}.
\]
(Note that $b'$ above has been computed without explicitly forming the matrix $Q$.)

Solve $Rx = b'$:
\[ x = \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix}. \]
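The whole QR-based solution, with each Householder reflection applied to $b$ as soon as it is formed so that $Q$ is never built, can be sketched as follows. The function name `householder_qr_solve` is hypothetical, and the sign convention for the Householder vector (chosen to avoid cancellation) is an assumption; a different convention changes the signs of rows of $R$ and $b'$ but not the solution $x$.

```python
import math

def householder_qr_solve(A, b):
    """Solve A x = b via Householder QR, applying each H_k to b directly."""
    n = len(A)
    R = [[float(v) for v in row] for row in A]
    y = [float(v) for v in b]
    for k in range(n - 1):
        x = [R[i][k] for i in range(k, n)]
        norm_x = math.sqrt(sum(v * v for v in x))
        if norm_x == 0.0:
            continue                      # column already zero below diagonal
        # u = x - alpha * e_1 with alpha = -sign(x_1) * ||x||
        alpha = -math.copysign(norm_x, x[0])
        u = x[:]
        u[0] -= alpha
        utu = sum(v * v for v in u)
        # Apply H = I - 2 u u^T / (u^T u) to the trailing columns of R ...
        for j in range(k, n):
            f = 2.0 * sum(u[i] * R[k + i][j] for i in range(n - k)) / utu
            for i in range(n - k):
                R[k + i][j] -= f * u[i]
        # ... and to the right-hand side (this accumulates b' = Q^T b).
        f = 2.0 * sum(u[i] * y[k + i] for i in range(n - k)) / utu
        for i in range(n - k):
            y[k + i] -= f * u[i]
    # Back substitution for R x = b'
    sol = [0.0] * n
    for i in reversed(range(n)):
        s = y[i] - sum(R[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = s / R[i][i]
    return sol

x = householder_qr_solve([[0, 1, 1], [1, 2, 3], [1, 1, 1]], [2, 6, 3])
```

In floating point the computed `x` agrees with $(1, 1, 1)^T$ only up to round-off, so comparisons should use a tolerance.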
Flop-count. If the Householder method is used to factor $A$ into $QR$, the solution of $Ax = b$ requires $\dfrac23 n^3 + O(n^2)$ flops (Chapter 5, Section 5.4.1); on the other hand, the Givens rotations technique requires about twice as many (Chapter 5, Section 5.5.1).

Round-off Property

We know from Chapter 5 that both the Householder and the Givens methods for QR factorization of $A$ are numerically stable. The back-substitution process for solving an upper triangular system is also numerically stable (Section 3.1.3). Thus, the method for solving $Ax = b$ using QR factorization is likely to be stable. Indeed, this is so.

Round-off Error Result for Solving Ax = b using QR

It can be shown (Lawson and Hanson (SLP p. 92)) that the computed solution $\hat{x}$ is the exact solution of
\[ (A + E)\hat{x} = b + \delta b \]
where $\|E\|_F \le (3n^2 + 41n)\,\mu\,\|A\|_F + O(\mu^2)$ and $\|\delta b\| \le (3n^2 + 40n)\,\mu\,\|b\| + O(\mu^2)$, with $\mu$ the machine precision.

Remark: There is no "growth factor" in the above expression.

6.4.6 Solving Linear Systems with Multiple Right-Hand Sides
Consider the problem
\[ AX = B \]
where $B = (b_1, \dots, b_m)$ is an $n \times m$ matrix ($m \le n$) and $X = (x_1, x_2, \dots, x_m)$. Here $b_i$ and $x_i$, $i = 1, \dots, m$, are $n$-vectors. Problems of this type arise in many practical applications (one such application is considered in Section 6.4.7: Computing the Frequency Response Matrix). Once the matrix $A$ has been factorized using any of the methods described in Chapter 5, the factorization can be used to solve the $m$ linear systems above. We state the procedure only with partial pivoting.

Solving AX = B (Linear System with Multiple Right-hand Sides)

Step 1. Factorize: $MA = U$, using Gaussian elimination with partial pivoting.
Step 2. Solve the $m$ upper triangular systems:
\[ Ux_i = b_i' = Mb_i, \qquad i = 1, \dots, m. \]

Flop-count: The algorithm will require about $\dfrac{n^3}{3} + mn^2$ flops.

Example: Solve $AX = B$ where
\[
A = \begin{pmatrix} 1 & 2 & 4\\ 4 & 5 & 6\\ 7 & 8 & 9 \end{pmatrix},
\qquad
B = \begin{pmatrix} 1 & 2\\ 3 & 4\\ 5 & 6 \end{pmatrix}.
\]
\[
U = \begin{pmatrix} 7 & 8 & 9\\ 0 & \tfrac67 & \tfrac{19}{7}\\ 0 & 0 & -\tfrac12 \end{pmatrix},
\qquad
M = \begin{pmatrix} 0 & 0 & 1\\ 1 & 0 & -\tfrac17\\ -\tfrac12 & 1 & -\tfrac12 \end{pmatrix}.
\]
(See Example 5.2.4 in Chapter 5.)

Solve:
\[
Ux_1 = Mb_1 = \begin{pmatrix} 5\\ 0.2857\\ 0 \end{pmatrix},
\qquad
x_1 = \begin{pmatrix} 0.3333\\ 0.3333\\ 0 \end{pmatrix}.
\]
Solve:
\[
Ux_2 = Mb_2 = \begin{pmatrix} 6\\ 1.1429\\ 0 \end{pmatrix},
\qquad
x_2 = \begin{pmatrix} -0.6667\\ 1.3333\\ 0 \end{pmatrix}.
\]
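The point of the procedure is that the $O(n^3)$ factorization is done once and only $O(n^2)$ work is repeated per right-hand side. A minimal sketch follows; the names `factor_partial_pivoting` and `solve_with_factor` are hypothetical, and exact rational arithmetic is assumed so that the result matches the worked example (where $0.3333 = 1/3$, and so on).

```python
from fractions import Fraction

def factor_partial_pivoting(A):
    """Return (F, pivots): U in the upper triangle of F, multipliers m_ik
    below the diagonal, and the row-interchange index of each step."""
    n = len(A)
    F = [[Fraction(v) for v in row] for row in A]
    pivots = []
    for k in range(n - 1):
        r = max(range(k, n), key=lambda i: abs(F[i][k]))
        if F[r][k] == 0:
            raise ZeroDivisionError("matrix is singular")
        pivots.append(r)
        for j in range(k, n):               # swap columns k..n-1 only, so the
            F[k][j], F[r][j] = F[r][j], F[k][j]   # stored multipliers stay put
        for i in range(k + 1, n):
            m = -F[i][k] / F[k][k]
            F[i][k] = m
            for j in range(k + 1, n):
                F[i][j] += m * F[k][j]
    return F, pivots

def solve_with_factor(F, pivots, b):
    """Form b' = M b from the stored data, then solve U x = b'."""
    n = len(b)
    s = [Fraction(v) for v in b]
    for k in range(n - 1):
        s[k], s[pivots[k]] = s[pivots[k]], s[k]
        for i in range(k + 1, n):
            s[i] += F[i][k] * s[k]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (s[i] - sum(F[i][j] * x[j] for j in range(i + 1, n))) / F[i][i]
    return x

F, pivots = factor_partial_pivoting([[1, 2, 4], [4, 5, 6], [7, 8, 9]])
X = [solve_with_factor(F, pivots, col) for col in ([1, 3, 5], [2, 4, 6])]
```

Here `X` holds the solution vectors $x_1$ and $x_2$ as its two entries.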
6.4.7 Special Systems
In this subsection we will study some special systems. They are
(1) Symmetric positive definite systems.
(2) Hessenberg systems.
(3) Diagonally dominant systems.
(4) Tridiagonal and block tridiagonal systems.

We have seen in Section 6.3 that these systems occur very often in practical applications, such as in the numerical solution of partial differential equations. Indeed, it is very often said by practicing engineers that there are hardly any systems in practical applications which are not one of the above types. These systems therefore deserve a special treatment.

Symmetric Positive Definite Systems

First, we show that for a symmetric positive definite matrix $A$ there exists a unique factorization
\[ A = HH^T \]
where $H$ is a lower triangular matrix with positive diagonal entries. The factorization is called the Cholesky Factorization, after the French engineer Cholesky.*

* André-Louis Cholesky (1875-1918) served as an officer in the French military. His work there involved geodesy and surveying.

The existence of the Cholesky factorization for a symmetric positive definite matrix $A$ can be seen either via LU factorization of $A$ or by computing the matrix $H$ directly from the above relation. To see this via LU factorization, we note that since $A$ is positive definite and therefore has positive leading principal minors, the factorization
\[ A = LU \]
is unique. The upper triangular matrix $U$ can be written as
\[ U = DU_1 \]
where
\[ D = \mathrm{diag}(u_{11}, u_{22}, \dots, u_{nn}) = \mathrm{diag}\big(a_{11}, a_{22}^{(1)}, \dots, a_{nn}^{(n-1)}\big) \]
and $U_1$ is a unit upper triangular matrix. Thus
\[ A = LDU_1. \]
Since $A$ is symmetric, from above, we have
\[ U_1^TDL^T = LDU_1 \]
or
\[ D = (U_1^T)^{-1}LDU_1(L^T)^{-1}. \]
The matrix $(U_1^T)^{-1}L$ is a unit lower triangular matrix and the matrix $U_1(L^T)^{-1}$ is unit upper triangular. It therefore follows from above that
\[ (U_1^T)^{-1}L = U_1(L^T)^{-1} = I, \]
that is,
\[ U_1 = L^T, \]
so $A$ can be written as
\[ A = LDL^T \]
where $L$ is unit lower triangular. Since the leading principal minors of $A$ are $a_{11},\ a_{11}a_{22}^{(1)},\ \dots,\ a_{11}a_{22}^{(1)} \cdots a_{nn}^{(n-1)}$, $A$ is positive definite if and only if the pivots $a_{11}, a_{22}^{(1)}, \dots, a_{nn}^{(n-1)}$ are positive. This means that when $A$ is positive definite the diagonal entries of $D$ are positive and, therefore, we can write
\[ D = D^{1/2}D^{1/2} \]
where
\[ D^{1/2} = \mathrm{diag}\Big(\sqrt{a_{11}}, \sqrt{a_{22}^{(1)}}, \dots, \sqrt{a_{nn}^{(n-1)}}\Big). \]
So,
\[ A = LDL^T = LD^{1/2}D^{1/2}L^T = HH^T. \]
Note that the diagonal entries of $H = LD^{1/2}$ are positive. The above discussion can be summarized in the following theorem:
Theorem 6.4.1 (The Cholesky Factorization Theorem) Let $A$ be a symmetric positive definite matrix. Then $A$ can be written uniquely in the form
\[ A = HH^T \]
where $H$ is a lower triangular matrix with positive diagonal entries. An explicit expression for $H$ is given by
\[ H = LD^{1/2} \]
where $L$ is the unit lower triangular matrix in the LU factorization of $A$ obtained by Gaussian elimination without pivoting and
\[ D^{1/2} = \mathrm{diag}\big(u_{11}^{1/2}, \dots, u_{nn}^{1/2}\big). \]

The above constructive procedure suggests the following algorithm to compute the Cholesky factorization of a symmetric positive definite matrix $A$:

Algorithm 6.4.3 Gaussian Elimination for the Cholesky Factorization

Step 1. Compute the LU factorization of $A$ using Gaussian elimination without pivoting.
Step 2. Form the diagonal matrix $D$ from the diagonal entries of $U$:
\[ D = \mathrm{diag}(u_{11}, u_{22}, \dots, u_{nn}). \]
Step 3. Form $H = LD^{1/2}$.

Example 6.4.5
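Find the Cholesky factorization of
\[ A = \begin{pmatrix} 2 & 3\\ 3 & 5 \end{pmatrix} \]
using Gaussian elimination without pivoting.
\[
M_1 = \begin{pmatrix} 1 & 0\\ -\tfrac32 & 1 \end{pmatrix},
\qquad
A^{(1)} = M_1A = \begin{pmatrix} 1 & 0\\ -\tfrac32 & 1 \end{pmatrix}\begin{pmatrix} 2 & 3\\ 3 & 5 \end{pmatrix} = \begin{pmatrix} 2 & 3\\ 0 & \tfrac12 \end{pmatrix},
\]
\[
U = \begin{pmatrix} 2 & 3\\ 0 & \tfrac12 \end{pmatrix},
\qquad
L = M_1^{-1} = \begin{pmatrix} 1 & 0\\ \tfrac32 & 1 \end{pmatrix},
\qquad
D = \mathrm{diag}\big(2, \tfrac12\big),
\]
\[
H = LD^{1/2} = \begin{pmatrix} 1 & 0\\ \tfrac32 & 1 \end{pmatrix}\begin{pmatrix} \sqrt2 & 0\\ 0 & \tfrac{1}{\sqrt2} \end{pmatrix} = \begin{pmatrix} \sqrt2 & 0\\ \tfrac{3}{\sqrt2} & \tfrac{1}{\sqrt2} \end{pmatrix}.
\]
Verify:
\[
HH^T = \begin{pmatrix} \sqrt2 & 0\\ \tfrac{3}{\sqrt2} & \tfrac{1}{\sqrt2} \end{pmatrix}\begin{pmatrix} \sqrt2 & \tfrac{3}{\sqrt2}\\ 0 & \tfrac{1}{\sqrt2} \end{pmatrix} = \begin{pmatrix} 2 & 3\\ 3 & 5 \end{pmatrix} = A.
\]

Stability of Gaussian Elimination for the Cholesky Factorization

We now show that Gaussian elimination without pivoting is stable for symmetric positive definite matrices by exhibiting some remarkable invariant properties of symmetric positive definite matrices. The following example illustrates that even when there is a small pivot, Gaussian elimination without pivoting does not give rise to growth in the entries of the matrices $A^{(k)}$. Let
\[ A = \begin{pmatrix} 0.00003 & 0.00500\\ 0.00500 & 1.0000 \end{pmatrix}. \]
There is only one step. The pivot entry is 0.00003. It is small. The multiplier $m_{21}$ is large:
\[ m_{21} = -\frac{a_{21}}{a_{11}} = -\frac{0.00500}{0.00003} = -\frac{500}{3}. \]
But
\[ A^{(1)} = \begin{pmatrix} 0.00003 & 0.00500\\ 0 & 0.166667 \end{pmatrix}. \]
The entries of $A^{(1)}$ did not grow. In fact, $\max(a_{ij}^{(1)}) = 0.166667 < \max(a_{ij}) = 1$. This interesting phenomenon of Gaussian elimination without pivoting applied to the simple $2 \times 2$ (positive definite) example above can be explained through the following result.

Theorem 6.4.2 Let $A = (a_{ij})$ be an $n \times n$ symmetric positive definite matrix and let $A^{(k)} = (a_{ij}^{(k)})$ be the reduced matrices obtained by applying Gaussian elimination without pivoting to $A$. Then
1. each $A^{(k)}$, $k = 1, \dots, n-1$, is symmetric positive definite,
2. $\max |a_{ij}^{(k)}| \le \max |a_{ij}^{(k-1)}|$, $k = 1, 2, \dots, n-1$.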
Proof. We prove the results just for the first step of elimination, because the results for the other steps then follow inductively. After the first step of elimination, we have
\[
A^{(1)} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
0 & a_{22}^{(1)} & \cdots & a_{2n}^{(1)}\\
\vdots & \vdots & & \vdots\\
0 & a_{n2}^{(1)} & \cdots & a_{nn}^{(1)}
\end{pmatrix}
= \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
0 & & & \\
\vdots & & B & \\
0 & & &
\end{pmatrix}.
\]
To prove that $A^{(1)}$ is positive definite (that is, that the reduced block $B$ is positive definite), consider the quadratic form:
\[
x^TBx = \sum_{i=2}^{n}\sum_{j=2}^{n} a_{ij}^{(1)}x_ix_j
= \sum_{i=2}^{n}\sum_{j=2}^{n} \Big( a_{ij} - \frac{a_{i1}a_{1j}}{a_{11}} \Big) x_ix_j
= \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_ix_j - a_{11}\Big( x_1 + \sum_{i=2}^{n} \frac{a_{i1}}{a_{11}}x_i \Big)^2.
\]
If $A^{(1)}$ is not positive definite, then there will exist $x_2, \dots, x_n$, not all zero, such that
\[ \sum_{i=2}^{n}\sum_{j=2}^{n} a_{ij}^{(1)}x_ix_j \le 0. \]
With these values of $x_2$ through $x_n$, if we define
\[ x_1 = -\sum_{i=2}^{n} \frac{a_{i1}}{a_{11}}x_i, \]
then the quadratic form
\[ \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_ix_j \le 0, \]
which contradicts the fact that $A$ is positive definite. Thus $A^{(1)}$ is positive definite. Also, we note that
\[ a_{ii}^{(1)} \le a_{ii}, \qquad i = 2, \dots, n, \]
for
\[ 0 < a_{ii}^{(1)} = a_{ii} - \frac{a_{i1}^2}{a_{11}} \le a_{ii} \quad (\text{because } a_{11} > 0). \]
Thus, each diagonal entry of $A^{(1)}$ is less than or equal to the corresponding diagonal entry of $A$. Since the largest element (in magnitude) of a symmetric positive definite matrix lies on the diagonal, we have $\max |a_{ij}^{(1)}| \le \max |a_{ij}|$.

A Consequence of Theorem 6.4.2

From Theorem 6.4.2, we immediately conclude that if $|a_{ij}| \le 1$ then $|a_{ij}^{(k)}| \le 1$. This means that the growth factor $\rho$ in this case is 1.
The Growth Factor and Stability of Gaussian Elimination for a Positive Definite Matrix

The growth factor $\rho$ of Gaussian elimination without pivoting for a symmetric positive definite matrix is 1. Thus, Gaussian elimination without pivoting for a symmetric positive definite matrix is stable.

Example 6.4.6
\[
A = \begin{pmatrix} 5 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 5 \end{pmatrix},
\qquad
A^{(1)} = \begin{pmatrix} 5 & 1 & 1\\ 0 & \tfrac45 & \tfrac45\\ 0 & \tfrac45 & \tfrac{24}{5} \end{pmatrix}.
\]
The leading principal minors of $A^{(1)}$ are 5, 4, 16. Thus $A^{(1)}$ is symmetric positive definite. Also, the largest entry produced by the first step is $a_{33}^{(1)} = \tfrac{24}{5} < \max(a_{ij}) = 5$.
\[
A^{(2)} = \begin{pmatrix} 5 & 1 & 1\\ 0 & \tfrac45 & \tfrac45\\ 0 & 0 & 4 \end{pmatrix}.
\]
The leading principal minors of $A^{(2)}$ are 5, 4, 16. Thus $A^{(2)}$ is also positive definite. Furthermore, the largest entry of $A^{(2)}$ is still $a_{11} = 5 = \max(a_{ij})$. The growth factor is $\rho = \tfrac55 = 1$.
Solution of a Symmetric Positive Definite System Using LDL^T Decomposition

We have seen in the beginning of this section that a symmetric positive definite matrix $A$ can be written uniquely in the form
\[ A = LDL^T \]
where $L$ is unit lower triangular and $D$ is diagonal with positive diagonal entries. Furthermore, this decomposition can be obtained in a numerically stable way using Gaussian elimination without pivoting.

In several circumstances one prefers to solve the symmetric positive definite system $Ax = b$ directly from the factorization $A = LDL^T$, without computing the Cholesky factorization. The advantage is that the process does not require the computation of any square roots. The process then is as follows:

Gaussian Elimination for the Symmetric Positive Definite System Ax = b

Step 1. Compute the $LDL^T$ factorization of $A$: $A = LDL^T$.
Step 2. Solve $Lz = b$.
Step 3. Solve $Dy = z$.
Step 4. Solve $L^Tx = y$.

Example 6.4.7
\[
A = \begin{pmatrix} 2 & 3\\ 3 & 5 \end{pmatrix},
\qquad
b = \begin{pmatrix} 5\\ 8 \end{pmatrix}.
\]
Step 1.
\[
L = \begin{pmatrix} 1 & 0\\ \tfrac32 & 1 \end{pmatrix},
\qquad
D = \begin{pmatrix} 2 & 0\\ 0 & \tfrac12 \end{pmatrix}.
\]
Step 2. Solve $Lz = b$:
\[
\begin{pmatrix} 1 & 0\\ \tfrac32 & 1 \end{pmatrix}\begin{pmatrix} z_1\\ z_2 \end{pmatrix} = \begin{pmatrix} 5\\ 8 \end{pmatrix},
\qquad z_1 = 5, \quad z_2 = \tfrac12.
\]
Step 3. Solve $Dy = z$:
\[
\begin{pmatrix} 2 & 0\\ 0 & \tfrac12 \end{pmatrix}\begin{pmatrix} y_1\\ y_2 \end{pmatrix} = \begin{pmatrix} 5\\ \tfrac12 \end{pmatrix},
\qquad y_1 = \tfrac52, \quad y_2 = 1.
\]
Step 4. Solve $L^Tx = y$:
\[
\begin{pmatrix} 1 & \tfrac32\\ 0 & 1 \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = \begin{pmatrix} \tfrac52\\ 1 \end{pmatrix},
\qquad x_2 = 1, \quad x_1 = 1.
\]
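The four steps can be sketched in code as follows. The names `ldlt` and `solve_ldlt` are hypothetical, and the factorization is computed column by column directly from $A = LDL^T$ rather than by literally running Gaussian elimination; for a symmetric positive definite matrix both routes give the same $L$ and $D$. Note that no square roots appear.

```python
from fractions import Fraction

def ldlt(A):
    """Return (L, d) with A = L diag(d) L^T, L unit lower triangular.
    Only the lower triangle of the symmetric matrix A is read."""
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    d = [Fraction(0)] * n
    for j in range(n):
        d[j] = Fraction(A[j][j]) - sum(L[j][k] ** 2 * d[k] for k in range(j))
        if d[j] <= 0:
            raise ValueError("matrix is not positive definite")
        for i in range(j + 1, n):
            s = Fraction(A[i][j]) - sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = s / d[j]
    return L, d

def solve_ldlt(A, b):
    """Solve A x = b through L z = b, D y = z, L^T x = y."""
    L, d = ldlt(A)
    n = len(b)
    z = []
    for i in range(n):                                   # Step 2: L z = b
        z.append(Fraction(b[i]) - sum(L[i][j] * z[j] for j in range(i)))
    y = [z[i] / d[i] for i in range(n)]                  # Step 3: D y = z
    x = [Fraction(0)] * n
    for i in reversed(range(n)):                         # Step 4: L^T x = y
        x[i] = y[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))
    return x

L, d = ldlt([[2, 3], [3, 5]])
x = solve_ldlt([[2, 3], [3, 5]], [5, 8])
```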
The Cholesky Algorithm

We now show how the Cholesky factorization can be computed directly from $A = HH^T$. From
\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
=
\begin{pmatrix}
h_{11} & 0 & \cdots & 0\\
h_{21} & h_{22} & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
h_{n1} & h_{n2} & \cdots & h_{nn}
\end{pmatrix}
\begin{pmatrix}
h_{11} & h_{21} & \cdots & h_{n1}\\
0 & h_{22} & \cdots & h_{n2}\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & h_{nn}
\end{pmatrix}
\]
we have
\[
h_{11} = \sqrt{a_{11}}, \qquad h_{i1} = \frac{a_{i1}}{h_{11}}, \quad i = 2, \dots, n,
\]
\[
\sum_{k=1}^{i} h_{ik}^2 = a_{ii}, \qquad a_{ij} = \sum_{k=1}^{j} h_{ik}h_{jk}, \quad j < i.
\]
This leads to the following algorithm, known as the Cholesky Algorithm.*

* This algorithm in some applications (such as in statistics) is known as the square-root algorithm. A square-root free algorithm, however, can be developed.

Algorithm 6.4.4 Cholesky Algorithm

Given an $n \times n$ symmetric positive definite matrix $A$, the following algorithm computes the Cholesky factor $H$. The matrix $H$ is computed row by row and is stored in the lower triangular part of $A$.

For $k = 1, 2, \dots, n$ do
    For $i = 1, 2, \dots, k-1$ do
    \[ a_{ki} \equiv h_{ki} = \frac{1}{h_{ii}} \Big( a_{ki} - \sum_{j=1}^{i-1} h_{ij}h_{kj} \Big) \]
    \[ a_{kk} \equiv h_{kk} = \sqrt{a_{kk} - \sum_{j=1}^{k-1} h_{kj}^2} \]

Remarks:
1. In the above pseudocode, $\sum_{j=1}^{0}(\ ) \equiv 0$. Also, when $k = 1$, the inner loop is skipped.
2. The Cholesky factor $H$ is computed row by row.
3. The positive definiteness of $A$ will make the quantities under the square-root sign positive.

Round-off property. Let the computed Cholesky factor be denoted by $\hat{H}$. Then, it can be shown (Demmel 1989) that
\[ A + E = \hat{H}(\hat{H})^T \]
where $E = (e_{ij})$ and
\[ |e_{ij}| \le \frac{(n+1)\mu}{1 - (n+1)\mu}\,(a_{ii}a_{jj})^{1/2}. \]
Thus, the Cholesky factorization algorithm is stable.

Solution of Ax = b using the Cholesky Factorization
Having the Cholesky factorization $A = HH^T$ at hand, the positive definite linear system $Ax = b$ can now be solved by solving the lower triangular system $Hy = b$ first, followed by the upper triangular system $H^Tx = y$.

Algorithm 6.4.5 The Cholesky Algorithm for the Positive Definite System Ax = b

Step 1. Find the Cholesky factorization of $A$, $A = HH^T$, using Algorithm 6.4.4.
Step 2. Solve the lower triangular system for $y$: $Hy = b$.
Step 3. Solve the upper triangular system for $x$: $H^Tx = y$.

Example 6.4.8

Let
\[
A = \begin{pmatrix} 1 & 1 & 1\\ 1 & 5 & 5\\ 1 & 5 & 14 \end{pmatrix},
\qquad
b = \begin{pmatrix} 3\\ 11\\ 20 \end{pmatrix}.
\]

A. The Cholesky Factorization

1st row ($k = 1$):
\[ h_{11} = 1. \]
2nd row ($k = 2$):
\[ h_{21} = \frac{a_{21}}{h_{11}} = 1, \qquad h_{22} = \sqrt{a_{22} - h_{21}^2} = \sqrt{5 - 1} = 2. \]
(Since the diagonal entries of $H$ have to be positive, we take the + sign.)

3rd row ($k = 3$):
\[ h_{31} = \frac{a_{31}}{h_{11}} = 1, \qquad h_{32} = \frac{1}{h_{22}}(a_{32} - h_{21}h_{31}) = \frac12(5 - 1) = 2, \]
\[ h_{33} = \sqrt{a_{33} - (h_{31}^2 + h_{32}^2)} = \sqrt{14 - 5} = \sqrt9, \qquad h_{33} = +3 \]
(we take the + sign).
\[ H = \begin{pmatrix} 1 & 0 & 0\\ 1 & 2 & 0\\ 1 & 2 & 3 \end{pmatrix}. \]

B. Solution of the Linear System Ax = b

(1) Solution of $Hy = b$:
\[
\begin{pmatrix} 1 & 0 & 0\\ 1 & 2 & 0\\ 1 & 2 & 3 \end{pmatrix}\begin{pmatrix} y_1\\ y_2\\ y_3 \end{pmatrix} = \begin{pmatrix} 3\\ 11\\ 20 \end{pmatrix},
\qquad y_1 = 3, \quad y_2 = 4, \quad y_3 = 3.
\]
(2) Solution of $H^Tx = y$:
\[
\begin{pmatrix} 1 & 1 & 1\\ 0 & 2 & 2\\ 0 & 0 & 3 \end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix} = \begin{pmatrix} 3\\ 4\\ 3 \end{pmatrix},
\qquad x_3 = 1, \quad x_2 = 1, \quad x_1 = 1.
\]
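Algorithms 6.4.4 and 6.4.5 can be sketched together as below. The names `cholesky` and `cholesky_solve` are hypothetical; for readability the factor is returned in a separate array `H` instead of overwriting the lower triangle of $A$, which is a departure from the storage scheme described in the algorithm.

```python
import math

def cholesky(A):
    """Row-by-row Cholesky factor H (lower triangular, positive diagonal)
    of a symmetric positive definite matrix A, so that A = H H^T."""
    n = len(A)
    H = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for i in range(k):            # skipped when k = 0 (first row)
            s = sum(H[i][j] * H[k][j] for j in range(i))
            H[k][i] = (A[k][i] - s) / H[i][i]
        d = A[k][k] - sum(H[k][j] ** 2 for j in range(k))
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        H[k][k] = math.sqrt(d)        # positive root, as in Example 6.4.8
    return H

def cholesky_solve(A, b):
    """Solve A x = b via H y = b followed by H^T x = y."""
    H = cholesky(A)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(H[i][j] * y[j] for j in range(i))) / H[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(H[j][i] * x[j] for j in range(i + 1, n))) / H[i][i]
    return x

A = [[1, 1, 1], [1, 5, 5], [1, 5, 14]]
H = cholesky(A)
x = cholesky_solve(A, [3, 11, 20])
```

The `ValueError` branch doubles as a practical test of positive definiteness: the factorization breaks down exactly when a quantity under the square root fails to be positive.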
Flop-count. The Cholesky algorithm requires $\dfrac{n^3}{6}$ flops to compute $H$, one half of the number of flops required to do the same job using LU factorization. Note that the process will also require $n$ square roots. The solution of each triangular system $Hy = b$ and $H^Tx = y$ requires $\dfrac{n^2}{2}$ flops. Thus the solution of the positive definite system $Ax = b$ using the Cholesky algorithm requires $\dfrac{n^3}{6} + n^2$ flops and $n$ square roots.

Round-off property. If $\hat{x}$ is the computed solution of the system $Ax = b$ using the Cholesky algorithm, then it can be shown that $\hat{x}$ satisfies
\[ (A + E)\hat{x} = b \]
where $\|E\|_2 \le c\,\mu\,\|A\|_2$ and $c$ is a small constant depending upon $n$. Thus the Cholesky algorithm for solving a symmetric positive definite system is quite stable.

Relative Error in the Solution by the Cholesky Algorithm

Let $\hat{x}$ be the computed solution of the symmetric positive definite system $Ax = b$ using the Cholesky algorithm followed by triangular system solutions as described above. Then it can be shown that
\[ \frac{\|x - \hat{x}\|_2}{\|\hat{x}\|_2} \le \mu\,\mathrm{Cond}(A). \]
(Recall that $\mathrm{Cond}(A) = \|A\|\,\|A^{-1}\|$.)

Remark: Demmel (1989) has shown that the above bound can be replaced by $O(\mu)\,\mathrm{Cond}(\tilde{A})$, where $\tilde{A} = D^{-1}AD^{-1}$, $D = \mathrm{diag}(\sqrt{a_{11}}, \dots, \sqrt{a_{nn}})$. The latter may be much better than the previous one, since $\mathrm{Cond}(\tilde{A})$ may be much smaller than $\mathrm{Cond}(A)$. (See the discussions on conditioning and scaling in Section 6.5.)
Hessenberg System
Consider the linear system
Ax = b
where $A$ is an upper Hessenberg matrix of order $n$. If Gaussian elimination with partial pivoting is used to triangularize $A$ and if $|a_{ij}| \le 1$, then $|a_{ij}^{(k)}| \le k + 1$ (Wilkinson AEP p. 218). Thus we can state the following:

Growth Factor and Stability of Gaussian Elimination for a Hessenberg System

The growth factor $\rho$ for a Hessenberg matrix using Gaussian elimination with partial pivoting is bounded by $n$. Thus a Hessenberg system can be safely solved using partial pivoting.

Flop-count: It requires only $n^2$ flops to solve a Hessenberg system, significantly less than the $\dfrac{n^3}{3}$ flops required to solve a system with an arbitrary matrix. This is because at each step of elimination during the triangularization process only one element needs to be eliminated, and since there are $(n-1)$ steps, the triangularization process requires about $\dfrac{n^2}{2}$ flops. Once we have the factorization $MA = U$, the upper triangular system
\[ Ux = Mb = b' \]
can be solved in $\dfrac{n^2}{2}$ flops. Thus a Hessenberg system can be solved with only $n^2$ flops in a stable way using Gaussian elimination with partial pivoting.

Example 6.4.9
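Triangularize
\[ A = \begin{pmatrix} 1 & 2 & 3\\ 2 & 3 & 4\\ 0 & 5 & 6 \end{pmatrix} \]
using partial pivoting.

Step 1.
\[
P_1 = \begin{pmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix},
\qquad
P_1A = \begin{pmatrix} 2 & 3 & 4\\ 1 & 2 & 3\\ 0 & 5 & 6 \end{pmatrix},
\]
\[
\widehat{M}_1 = \begin{pmatrix} 1 & 0\\ -\tfrac12 & 1 \end{pmatrix},
\qquad
M_1 = \begin{pmatrix} 1 & 0 & 0\\ -\tfrac12 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix},
\]
\[
A^{(1)} = M_1P_1A = \begin{pmatrix} 1 & 0 & 0\\ -\tfrac12 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 2 & 3 & 4\\ 1 & 2 & 3\\ 0 & 5 & 6 \end{pmatrix} = \begin{pmatrix} 2 & 3 & 4\\ 0 & \tfrac12 & 1\\ 0 & 5 & 6 \end{pmatrix}.
\]

Step 2.
\[
P_2 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{pmatrix},
\qquad
P_2A^{(1)} = \begin{pmatrix} 2 & 3 & 4\\ 0 & 5 & 6\\ 0 & \tfrac12 & 1 \end{pmatrix},
\]
\[
\widehat{M}_2 = \begin{pmatrix} 1 & 0\\ -\tfrac{1}{10} & 1 \end{pmatrix},
\qquad
M_2 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & -\tfrac{1}{10} & 1 \end{pmatrix},
\]
\[
U = A^{(2)} = M_2P_2A^{(1)} = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & -\tfrac{1}{10} & 1 \end{pmatrix}\begin{pmatrix} 2 & 3 & 4\\ 0 & 5 & 6\\ 0 & \tfrac12 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 3 & 4\\ 0 & 5 & 6\\ 0 & 0 & \tfrac25 \end{pmatrix},
\]
\[
M = M_2P_2M_1P_1 = \begin{pmatrix} 0 & 1 & 0\\ 0 & 0 & 1\\ 1 & -\tfrac12 & -\tfrac{1}{10} \end{pmatrix}.
\]

Computation of the Growth Factor: $\rho = \dfrac{\max(6, 6, 6)}{6} = 1$.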
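The reason a Hessenberg system costs only $O(n^2)$ flops is visible in code: at step $k$ the pivot search compares just two candidates and only one row is updated. The sketch below is illustrative; the name `solve_hessenberg` is hypothetical, and the right-hand side $b = (6, 9, 11)^T$ is an assumed vector (chosen so that the solution is all ones), since the example above only triangularizes $A$.

```python
from fractions import Fraction

def solve_hessenberg(A, b):
    """Solve A x = b for an upper Hessenberg A by Gaussian elimination
    with partial pivoting. Returns (x, U)."""
    n = len(b)
    A = [[Fraction(v) for v in row] for row in A]
    b = [Fraction(v) for v in b]
    for k in range(n - 1):
        # Only A[k+1][k] can be nonzero below the diagonal in column k.
        if abs(A[k + 1][k]) > abs(A[k][k]):
            A[k], A[k + 1] = A[k + 1], A[k]
            b[k], b[k + 1] = b[k + 1], b[k]
        if A[k][k] == 0:
            raise ZeroDivisionError("matrix is singular")
        m = -A[k + 1][k] / A[k][k]
        for j in range(k, n):               # a single row update per step
            A[k + 1][j] += m * A[k][j]
        b[k + 1] += m * b[k]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x, A

x, U = solve_hessenberg([[1, 2, 3], [2, 3, 4], [0, 5, 6]], [6, 9, 11])
```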
An Application of Hessenberg System: Computing the Frequency Response Matrix
In control theory it is often required to compute the matrix
\[ G(j\omega) = C(j\omega I - A)^{-1}B \]
for many different values of $\omega$, to study the response of a control system. The matrices $A$, $B$, $C$ here are matrices of a control system and are of order $n \times n$, $n \times m$, and $r \times n$, respectively ($m \le n$). The matrix $G(j\omega)$ is called the frequency response matrix. Since computing $(j\omega I - A)^{-1}B$ is equivalent to solving $m$ linear systems (see Section 6.5.1):
\[ (j\omega I - A)x_i = b_i, \qquad i = 1, \dots, m, \]
where $b_i$ is the $i$th column of $B$, for each $\omega$ we need about $\dfrac{n^3}{3} + mn^2 + rnm$ flops to compute $G(j\omega)$ (using Gaussian elimination with partial pivoting). Typically, $G(j\omega)$ needs to be computed for a very large number of values of $\omega$, and thus such a computation will be formidable. On the other hand, if $A$ is transformed initially to a Hessenberg matrix,
\[ A = PHP^T \]
with $P$ orthogonal, then $j\omega I - A = P(j\omega I - H)P^T$ and
\[ G(j\omega) = CP(j\omega I - H)^{-1}P^TB. \]
The computation of $(j\omega I - H)^{-1}P^TB$ will now require solutions of $m$ Hessenberg systems, each of which will require only $n^2$ flops. Then, for computing $G(j\omega) = CP(j\omega I - H)^{-1}P^TB$, there will be a saving of order $O(n)$ for each $\omega$. This count does not include the reduction to Hessenberg form. Note that the matrix $A$ is transformed to a Hessenberg matrix once, and the same Hessenberg matrix is used to compute $G(j\omega)$ for each $\omega$. Thus, the computation that uses an initial reduction of $A$ to a Hessenberg matrix is much more efficient. Moreover, as we have seen before, reduction to Hessenberg form and solutions of Hessenberg systems using partial pivoting are both stable computations. This approach was suggested by Laub (1981).

Remarks: Reduction of $A$ to a Hessenberg matrix serves as a frontier to major computations in control theory. Numerically viable algorithms for important control problems such as controllability and observability, and matrix equations (Lyapunov, Sylvester, Observer-Sylvester, Riccati, etc.), routinely transform the matrix $A$ to a Hessenberg matrix before actual computations start. Readers familiar with and interested in these problems might want to look into the book "Numerical Methods in Control Theory" by B. N. Datta for an account of these methods. For references on the individual papers, see Chapter 8. The recent reprint book "Numerical Linear Algebra Techniques for Systems and Control," edited by R. V. Patel, Alan Laub and Paul van Dooren, IEEE Press, 1994, also contains all the relevant papers.
Diagonally Dominant Systems
Recall that a matrix A = (aij ) is column diagonally dominant if
ja11j > ja21j + ja31j + ja22j > ja12j + ja32j +
. . .
+ jan1j + jan2j
jannj > ja1nj + ja2nj + + jan;1 nj A column diagonally dominant matrix, like a symmetric positive de nite matrix, possesses the attractive property that no rowinterchanges are necessary at any step during the triangularization procedure for Gaussian elimination with partial pivoting. The pivot element is already there in the right place. Thus, to begin with, at the rst step, a11 being
the largest in magnitude of all the elements in the rst column, no rowinterchange is necessary. At the end of the rst step, we then have
$$
A^{(1)} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22}^{(1)} & \cdots & a_{2n}^{(1)} \\
\vdots & \vdots & & \vdots \\
0 & a_{n2}^{(1)} & \cdots & a_{nn}^{(1)}
\end{pmatrix}
= \left(\begin{array}{c|c}
a_{11} & a_{12} \;\cdots\; a_{1n} \\ \hline
0 & A'
\end{array}\right)
$$

and it can be shown [exercise] that $A'$ is again column diagonally dominant and therefore $a_{22}^{(1)}$ is the pivot for the second step. This process can obviously be continued, showing that pivoting is not needed for column diagonally dominant matrices. Furthermore, the following can be easily proved.
Growth Factor and Stability of Gaussian Elimination for Diagonally Dominant Systems
The growth factor $\rho$ for a column diagonally dominant matrix with partial pivoting is bounded by 2 (Exercise #16): $\rho \le 2$. Thus, for column diagonally dominant systems, Gaussian elimination with partial pivoting is stable.
Example 6.4.10
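$$
A = \begin{pmatrix} 5 & 1 & 1 \\ 1 & 5 & 1 \\ 1 & 1 & 5 \end{pmatrix}.
$$

Step 1.

$$
A^{(1)} = \begin{pmatrix} 5 & 1 & 1 \\ 0 & \frac{24}{5} & \frac{4}{5} \\ 0 & \frac{4}{5} & \frac{24}{5} \end{pmatrix}
$$

Step 2.

$$
A^{(2)} = \begin{pmatrix} 5 & 1 & 1 \\ 0 & \frac{24}{5} & \frac{4}{5} \\ 0 & 0 & \frac{14}{3} \end{pmatrix}.
$$

The growth factor $\rho = 1$. (Note that for this example, the matrix $A$ is column diagonally dominant and positive definite; thus $\rho = 1$.)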
The next example shows that the growth factor of Gaussian elimination for a column diagonally dominant matrix can be greater than 1, but is always less than 2.

Example 6.4.11

$$
A = \begin{pmatrix} 5 & -8 \\ 1 & 10 \end{pmatrix}, \qquad
A^{(1)} = \begin{pmatrix} 5 & -8 \\ 0 & 11.6 \end{pmatrix}.
$$

The growth factor $\rho = \dfrac{\max(10,\, 11.6)}{10} = \dfrac{11.6}{10} = 1.16$.
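The growth factors in Examples 6.4.10 and 6.4.11 can be reproduced with a small Python sketch. It takes the growth factor to be the largest magnitude appearing in any reduced matrix divided by the largest magnitude in $A$, which is the definition used in the two examples; the function name `growth_factor` is hypothetical.

```python
def growth_factor(A):
    """Growth factor of Gaussian elimination with partial pivoting:
    max |entry| over A and all reduced matrices, divided by max |entry| of A."""
    n = len(A)
    W = [[float(v) for v in row] for row in A]
    max0 = max(abs(v) for row in W for v in row)
    biggest = max0
    for k in range(n - 1):
        # Partial pivoting: bring the largest entry of column k to the pivot row.
        p = max(range(k, n), key=lambda i: abs(W[i][k]))
        W[k], W[p] = W[p], W[k]
        for i in range(k + 1, n):
            m = W[i][k] / W[k][k]
            for j in range(k, n):
                W[i][j] -= m * W[k][j]
        biggest = max(biggest, max(abs(v) for row in W for v in row))
    return biggest / max0


rho_1 = growth_factor([[5, 1, 1], [1, 5, 1], [1, 1, 5]])   # Example 6.4.10
rho_2 = growth_factor([[5, -8], [1, 10]])                  # Example 6.4.11
```

For a column diagonally dominant input the pivot search never triggers a row interchange, so the sketch behaves like elimination without pivoting on these two matrices.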
Tridiagonal Systems
The LU factorization of a tridiagonal matrix $T$, when it exists, yields factors $L$ and $U$ with very simple structures: both are bidiagonal, $L$ has 1's along the main diagonal, and the superdiagonal entries of $U$ are the same as those of $T$. Specifically, if we write

$$
T = \begin{pmatrix}
a_1 & b_1 & & 0 \\
c_2 & a_2 & \ddots & \\
 & \ddots & \ddots & b_{n-1} \\
0 & & c_n & a_n
\end{pmatrix}
= \begin{pmatrix}
1 & & & 0 \\
\ell_2 & 1 & & \\
 & \ddots & \ddots & \\
0 & & \ell_n & 1
\end{pmatrix}
\begin{pmatrix}
u_1 & b_1 & & 0 \\
 & u_2 & \ddots & \\
 & & \ddots & b_{n-1} \\
0 & & & u_n
\end{pmatrix},
$$

then, by equating the corresponding elements of the matrices on both sides, we see that

$$
\begin{aligned}
a_1 &= u_1 \\
c_i &= \ell_i u_{i-1}, & i &= 2, \ldots, n \\
a_i &= u_i + \ell_i b_{i-1}, & i &= 2, \ldots, n
\end{aligned}
$$

from which $\{\ell_i\}$ and $\{u_i\}$ can be easily computed:
Computing LU Factorization of a Tridiagonal Matrix
Set $u_1 = a_1$.
For $i = 2, \ldots, n$ do
    $\ell_i = c_i / u_{i-1}$
    $u_i = a_i - \ell_i b_{i-1}$.

Flop-count: The above procedure takes only $(2n - 2)$ flops.

Solving a Tridiagonal System
Once we have the above simple factorization of $T$, the solution of the tridiagonal system $Tx = b$ can be found by solving the two special bidiagonal systems

$$
Ly = b \qquad \text{and} \qquad Ux = y.
$$

Flop-count: The solutions of these two bidiagonal systems also require $(2n - 2)$ flops. Thus, a tridiagonal system can be solved by the above procedure in only $4n - 4$ flops, a very cheap procedure indeed.
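The factorization and the two bidiagonal solves above can be sketched in Python as follows. The storage convention is an assumption made for this illustration: `a` holds the diagonal, `b` the superdiagonal and `c` the subdiagonal, each as a zero-based list, so that `c[i-1]` plays the role of $c_{i+1}$ in the text. The function names are hypothetical.

```python
def tridiag_lu(a, b, c):
    """a: diagonal (length n), b: superdiagonal (n-1), c: subdiagonal (n-1).
    Returns (l, u): l is the subdiagonal of the unit lower bidiagonal L,
    u is the diagonal of U; the superdiagonal of U is b itself."""
    n = len(a)
    u = [a[0]] + [0.0] * (n - 1)
    l = [0.0] * (n - 1)
    for i in range(1, n):
        if u[i - 1] == 0:
            raise ZeroDivisionError("zero pivot: the factorization breaks down")
        l[i - 1] = c[i - 1] / u[i - 1]
        u[i] = a[i] - l[i - 1] * b[i - 1]
    return l, u


def tridiag_solve(a, b, c, rhs):
    """Solve T x = rhs using the bidiagonal factors of T (no pivoting)."""
    l, u = tridiag_lu(a, b, c)
    n = len(a)
    y = [rhs[0]] + [0.0] * (n - 1)
    for i in range(1, n):                  # forward elimination: L y = rhs
        y[i] = rhs[i] - l[i - 1] * y[i - 1]
    x = [0.0] * n
    x[n - 1] = y[n - 1] / u[n - 1]
    for i in range(n - 2, -1, -1):         # back substitution: U x = y
        x[i] = (y[i] - b[i] * x[i + 1]) / u[i]
    return x
```

Because no pivoting is done, this sketch inherits the caveat that the procedure breaks down when some $u_i$ vanishes; it is meant for matrices, such as symmetric positive definite or diagonally dominant ones, for which the pivots are safely nonzero.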
Stability of the Process: Unfortunately, the above factorization procedure breaks down if any $u_i$ is zero. Even if all $u_i$ are theoretically nonzero, the stability of the process in general cannot be guaranteed. However, in many practical situations, such as in discretizing Poisson's equation, the tridiagonal matrices are symmetric positive definite, in which case, as we have seen before, the above procedure is quite stable. In fact, in the symmetric positive definite case, this procedure should be preferred over the Cholesky-factorization technique, as it does not involve computation of any square roots. It is true that the Cholesky factorization of a symmetric positive definite tridiagonal matrix can also be computed in $O(n)$ flops; however, an additional $n$ square roots have to be computed (see Golub and Van Loan, MC 1984, p. 97).

In the general case, to maintain stability, Gaussian elimination with partial pivoting should be used. If $|a_i|, |b_i|, |c_i| \le 1$ $(i = 1, \ldots, n)$, then it can be shown (Wilkinson, AEP, p. 219) that the entries of $A^{(k)}$ at each step will be bounded by 2.
Growth Factor and Stability of Gaussian Elimination for a Tridiagonal System
The growth factor $\rho$ for Gaussian elimination with partial pivoting for a tridiagonal matrix is bounded by 2: $\rho \le 2$.

Thus, Gaussian elimination with partial pivoting for a tridiagonal system is very stable.

The flop-count in this case is a little higher: it takes about $7n$ flops to solve the system $Tx = b$ ($3n$ flops for decomposition and $4n$ for solving two triangular systems), but it is still an $O(n)$ procedure. If $T$ is symmetric, one naturally wants to take advantage of the symmetry; however, Gaussian elimination with partial pivoting does not preserve symmetry. Bunch (1971, 1974) and Bunch and Kaufman (1977) have proposed symmetry-preserving algorithms. These algorithms can be arranged to have a flop-count comparable to that of Gaussian elimination with partial pivoting and require less storage than the latter. For details see the papers by Bunch and by Bunch and Kaufman.
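Example 6.4.12 Triangularize

$$
A = \begin{pmatrix} 0.9 & 0.1 & 0 \\ 0.8 & 0.5 & 0.1 \\ 0 & 0.1 & 0.5 \end{pmatrix}
$$

using (i) the formula $A = LU$ and (ii) Gaussian elimination.

(i) From $A = LU$

$u_1 = 0.9$

$i = 2$:
$$
\ell_2 = \frac{c_2}{u_1} = \frac{0.8}{0.9} = \frac{8}{9} = 0.8889, \qquad
u_2 = a_2 - \ell_2 b_1 = 0.5 - \frac{8}{9} \times 0.1 = 0.4111
$$

$i = 3$:
$$
\ell_3 = \frac{c_3}{u_2} = \frac{0.1}{0.4111} = 0.2432, \qquad
u_3 = a_3 - \ell_3 b_2 = 0.5 - 0.2432 \times 0.1 = 0.4757
$$

$$
L = \begin{pmatrix} 1 & 0 & 0 \\ 0.8889 & 1 & 0 \\ 0 & 0.2432 & 1 \end{pmatrix}, \qquad
U = \begin{pmatrix} 0.9 & 0.1 & 0 \\ 0 & 0.4111 & 0.1 \\ 0 & 0 & 0.4757 \end{pmatrix}
$$

(ii) Using Gaussian Elimination with Partial Pivoting

Step 1. Multiplier $m_{21} = -\dfrac{0.8}{0.9} = -0.8889$

$$
A^{(1)} = \begin{pmatrix} 0.9 & 0.1 & 0 \\ 0 & 0.4111 & 0.1 \\ 0 & 0.1 & 0.5 \end{pmatrix}
$$

Step 2. Multiplier $m_{32} = -\dfrac{0.1}{0.4111} = -0.2432$

$$
A^{(2)} = \begin{pmatrix} 0.9 & 0.1 & 0 \\ 0 & 0.4111 & 0.1 \\ 0 & 0 & 0.4757 \end{pmatrix} = U
$$

$$
L = \begin{pmatrix} 1 & 0 & 0 \\ -m_{21} & 1 & 0 \\ 0 & -m_{32} & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0.8889 & 1 & 0 \\ 0 & 0.2432 & 1 \end{pmatrix}.
$$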
Block Tridiagonal Systems
In this section we consider solving the block tridiagonal system:
Tx = b
where $T$ is a block tridiagonal matrix and $b = (b_1, b_2, \ldots, b_N)^T$ is a block vector. The number of components of the block vector $b_i$ is the same as the dimension of the $i$th diagonal block matrix in $T$.
A. Block LU Factorization
The factorization procedure given in the beginning of this section may be easily extended to the case of the block tridiagonal matrix

$$
T = \begin{pmatrix}
A_1 & B_1 & & 0 \\
C_2 & A_2 & \ddots & \\
 & \ddots & \ddots & B_{N-1} \\
0 & & C_N & A_N
\end{pmatrix}.
$$

Thus, if $T$ has the block LU factorization

$$
T = \begin{pmatrix}
I & & & 0 \\
L_2 & I & & \\
 & \ddots & \ddots & \\
0 & & L_N & I
\end{pmatrix}
\begin{pmatrix}
U_1 & B_1 & & 0 \\
 & U_2 & \ddots & \\
 & & \ddots & B_{N-1} \\
0 & & & U_N
\end{pmatrix} = LU,
$$

then the matrices $L_i$, $i = 2, \ldots, N$, and $U_i$, $i = 1, \ldots, N$, can be computed as follows:

Algorithm 6.4.6 Block LU Factorization

Set $U_1 = A_1$.
For $i = 2, \ldots, N$ do
    (1) Solve for $L_i$: $L_i U_{i-1} = C_i$ (equivalently, $U_{i-1}^T L_i^T = C_i^T$).
    (2) Compute $U_i$: $U_i = A_i - L_i B_{i-1}$.
B. Solution of Block Systems
Once we have the above factorization, we can find the solution $x$ of the system

$$
Tx = b
$$

by solving $Ly = b$ and $Ux = y$ successively. The solution of $Ly = b$ can be achieved by Block Forward Elimination, and that of $Ux = y$ by Block Back Substitution.

Algorithm 6.4.7 Block Forward Elimination

Set $L_1 y_0 = 0$.
For $i = 1, \ldots, N$ do
    $y_i = b_i - L_i y_{i-1}$.

Algorithm 6.4.8 Block Back Substitution

Set $B_N x_{N+1} = 0$.
For $i = N, \ldots, 1$ do
    Solve $U_i x_i = y_i - B_i x_{i+1}$ for $x_i$.
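Example 6.4.13

$$
A = \begin{pmatrix} 4 & -1 & 1 & 0 \\ -1 & 4 & 0 & 1 \\ 1 & 0 & 2 & -1 \\ 0 & 1 & -1 & 2 \end{pmatrix}, \qquad
b = \begin{pmatrix} 4 \\ 4 \\ 2 \\ 2 \end{pmatrix}
$$

$$
A_1 = \begin{pmatrix} 4 & -1 \\ -1 & 4 \end{pmatrix}, \quad
A_2 = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}, \quad
B_1 = C_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
b_1 = \begin{pmatrix} 4 \\ 4 \end{pmatrix}, \quad
b_2 = \begin{pmatrix} 2 \\ 2 \end{pmatrix}
$$

Block LU Factorization

Set
$$
U_1 = A_1 = \begin{pmatrix} 4 & -1 \\ -1 & 4 \end{pmatrix}.
$$

$i = 2$:

(1) Solve for $L_2$:
$$
L_2 U_1 = C_2 = I_2 \;\Longrightarrow\; L_2 = U_1^{-1} = \begin{pmatrix} 0.2667 & 0.0667 \\ 0.0667 & 0.2667 \end{pmatrix}
$$

(2) Compute $U_2$ from
$$
U_2 = A_2 - L_2 B_1 = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} - \begin{pmatrix} 0.2667 & 0.0667 \\ 0.0667 & 0.2667 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1.7333 & -1.0667 \\ -1.0667 & 1.7333 \end{pmatrix}
$$

Block Forward Elimination

$$
y_1 = b_1 - L_1 y_0 = b_1 = \begin{pmatrix} 4 \\ 4 \end{pmatrix}, \qquad
y_2 = b_2 - L_2 y_1 = \begin{pmatrix} 0.6667 \\ 0.6667 \end{pmatrix}
$$

Block Back Substitution

$$
U_2 x_2 = y_2 - B_2 x_3 = y_2 = \begin{pmatrix} 0.6667 \\ 0.6667 \end{pmatrix} \qquad (B_2 x_3 = 0)
$$

$$
x_2 = U_2^{-1}\begin{pmatrix} 0.6667 \\ 0.6667 \end{pmatrix}
= \begin{pmatrix} 0.9286 & 0.5714 \\ 0.5714 & 0.9286 \end{pmatrix}\begin{pmatrix} 0.6667 \\ 0.6667 \end{pmatrix}
= \begin{pmatrix} 1 \\ 1 \end{pmatrix}
$$

$$
U_1 x_1 = y_1 - B_1 x_2 = \begin{pmatrix} 4 \\ 4 \end{pmatrix} - \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \end{pmatrix}
$$

$$
x_1 = U_1^{-1}\begin{pmatrix} 3 \\ 3 \end{pmatrix}
= \begin{pmatrix} 0.2667 & 0.0667 \\ 0.0667 & 0.2667 \end{pmatrix}\begin{pmatrix} 3 \\ 3 \end{pmatrix}
= \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
$$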
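Algorithms 6.4.6 through 6.4.8 can be combined into one routine, sketched below in plain Python with nested lists as matrices. The helper names (`matmul`, `matsub`, `transpose`, `solve`, `block_tridiag_solve`) and the zero-based block indexing are assumptions of this illustration, and the small dense solver inside it uses partial pivoting on each block.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matsub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def transpose(X):
    return [list(r) for r in zip(*X)]

def solve(M, R):
    """Solve M Z = R (R a matrix of right-hand sides) by Gaussian
    elimination with partial pivoting followed by back substitution."""
    n = len(M)
    W = [list(map(float, M[i])) + list(map(float, R[i])) for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(W[i][k]))
        W[k], W[p] = W[p], W[k]
        for i in range(k + 1, n):
            m = W[i][k] / W[k][k]
            W[i] = [wi - m * wk for wi, wk in zip(W[i], W[k])]
    Z = [None] * n
    for i in range(n - 1, -1, -1):
        Z[i] = [(W[i][n + j] - sum(W[i][k] * Z[k][j] for k in range(i + 1, n)))
                / W[i][i] for j in range(len(R[0]))]
    return Z

def block_tridiag_solve(A, B, C, b):
    """A[i]: diagonal blocks; B[i]: superdiagonal blocks; C[i]: subdiagonal
    block in block row i+1; b[i]: right-hand-side blocks (column matrices)."""
    N = len(A)
    U, L = [A[0]], []
    for i in range(1, N):
        # L_i U_{i-1} = C_i  is solved as  U_{i-1}^T L_i^T = C_i^T.
        Li = transpose(solve(transpose(U[i - 1]), transpose(C[i - 1])))
        L.append(Li)
        U.append(matsub(A[i], matmul(Li, B[i - 1])))
    y = [b[0]]
    for i in range(1, N):                       # block forward elimination
        y.append(matsub(b[i], matmul(L[i - 1], y[i - 1])))
    x = [None] * N
    x[N - 1] = solve(U[N - 1], y[N - 1])
    for i in range(N - 2, -1, -1):              # block back substitution
        x[i] = solve(U[i], matsub(y[i], matmul(B[i], x[i + 1])))
    return x
```

Applied to the data of Example 6.4.13 (with `A = [A1, A2]`, `B = [B1]`, `C = [C2]`, `b = [b1, b2]`), the routine should return both solution blocks close to $(1, 1)^T$.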
Block Cyclic Reduction
Frequently in practice, the block tridiagonal matrix of a system may possess some special properties that can be exploited to reduce the system to a single lower-order system by using a technique called Block Cyclic Reduction. For details see the book by Golub and Van Loan (MC 1983, pp. 110-117) and the references therein.
6.4.8 Scaling
If the entries of the matrix $A$ vary widely, then there is a possibility that a very small number needs to be added to a very large number during the process of elimination. This can influence the accuracy greatly, because "the big one can kill the small one." To circumvent this difficulty, it is often suggested that the rows of $A$ be properly scaled before the elimination process begins. The following simple example illustrates this. Consider the system

$$
\begin{pmatrix} 10 & 10^6 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 10^6 \\ 2 \end{pmatrix}.
$$

Apply now Gaussian elimination with pivoting. Since 10 is the largest entry in the first column, no interchange is needed. We have, after the first step of elimination,

$$
\begin{pmatrix} 10 & 10^6 \\ 0 & -10^5 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 10^6 \\ -10^5 \end{pmatrix},
$$

which gives $x_2 = 1$, $x_1 = 0$. The exact solution, however, is approximately $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Note that the above system is exactly the system in Section 6.3.4 with the first equation multiplied by $10^6$. Therefore, even choosing the false pivot 10 did not help us. However, if we scale the entries of the first row of the matrix by dividing it by $10^6$ and then solve the system (after modifying the first entry of $b$ appropriately) using partial pivoting, we will then have the accurate solution, as we have seen before.

Scaling of the rows of a matrix $A$ is equivalent to finding an invertible diagonal matrix $D_1$ so that the largest element (in magnitude) in each row of $D_1^{-1}A$ is about the same size. Once such $D_1$ is found, the solution of the system $Ax = b$ is found by solving the scaled system

$$
\tilde{A}x = \tilde{b}, \qquad \text{where} \qquad \tilde{A} = D_1^{-1}A, \quad \tilde{b} = D_1^{-1}b.
$$
The process can be easily extended to scale both the rows and columns of $A$. Mathematically, this is equivalent to finding diagonal matrices $D_1$ and $D_2$ such that the largest (in magnitude) element in each row and column of $D_1^{-1}AD_2$ lies between two fixed numbers, say, $1/\beta$ and 1, where $\beta$ is the base of the number system. Once such $D_1$ and $D_2$ are found, the solution of the system

$$
Ax = b
$$

is obtained by solving the equivalent system

$$
\tilde{A}y = \tilde{b}
$$

and then computing

$$
x = D_2 y,
$$

where

$$
\tilde{A} = D_1^{-1}AD_2, \qquad \tilde{b} = D_1^{-1}b.
$$

The above process is known as equilibration (Forsythe and Moler, CSLAS, pp. 44-45). In conclusion, we note that scaling or equilibration is recommended in general, but should be applied only on an ad hoc basis, depending upon the data of the problem. "The round-off error analysis for Gaussian elimination gives the most effective results when a matrix is equilibrated." (Forsythe and Moler, CSLAS, p. )
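The $2 \times 2$ scaling example of this section can be replayed in Python with the standard `decimal` module. The sketch assumes a working precision of 4 significant decimal digits (the text does not state the precision, so this is only one choice under which the effect appears), and the function names `solve2_partial_pivoting` and `row_scale` are hypothetical.

```python
from decimal import Decimal, getcontext


def solve2_partial_pivoting(A, b):
    """Solve a 2x2 system by Gaussian elimination with partial pivoting;
    every operation is rounded to the current decimal precision."""
    (a11, a12), (a21, a22) = A
    b1, b2 = b
    if abs(a21) > abs(a11):                       # row interchange
        a11, a12, b1, a21, a22, b2 = a21, a22, b2, a11, a12, b1
    m = a21 / a11
    a22 = a22 - m * a12
    b2 = b2 - m * b1
    x2 = b2 / a22
    x1 = (b1 - a12 * x2) / a11
    return x1, x2


def row_scale(A, b):
    """Form D1^{-1} A and D1^{-1} b with D1 = diag(largest |entry| per row)."""
    d = [max(abs(v) for v in row) for row in A]
    return ([[v / di for v in row] for row, di in zip(A, d)],
            [bi / di for bi, di in zip(b, d)])


getcontext().prec = 4        # assumed precision: 4 significant digits
A = [[Decimal(10), Decimal(10) ** 6], [Decimal(1), Decimal(1)]]
b = [Decimal(10) ** 6, Decimal(2)]

unscaled = solve2_partial_pivoting(A, b)              # pivot 10 is a false pivot
scaled = solve2_partial_pivoting(*row_scale(A, b))    # rows scaled first
```

Under this assumed precision the unscaled run loses $x_1$ entirely, while the row-scaled run triggers the row interchange and recovers both components.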
6.4.9 LU Versus QR and Table of Comparisons
We have just seen that the method for solving $Ax = b$ using the QR factorization is about twice as expensive as Gaussian elimination with partial pivoting if the Householder method is used to factor $A$, and about four times as expensive if Givens rotations are used. On the other hand, the QR factorization technique is unconditionally stable, whereas, from a theoretical point of view, with Gaussian elimination with partial or complete pivoting there is always some risk involved. Thus, if stability is the main concern and the cost is not a major factor, one can certainly use the QR factorization technique. However, considering both efficiency and stability from a practical point of view, it is currently agreed that Gaussian elimination with partial pivoting is the most practical approach for the solution of $Ax = b$. If one really insists on using an orthogonalization technique, certainly the Householder method is to be preferred over the Givens method. Furthermore, Gaussian elimination without pivoting should not be used unless the matrix $A$ is symmetric positive definite or diagonally dominant. We summarize the above discussion in the following two tables.
TABLE 6.1

(COMPARISON OF DIFFERENT METHODS FOR LINEAR SYSTEM PROBLEM WITH ARBITRARY MATRICES)

| METHOD | FLOP-COUNT (APPROX.) | GROWTH FACTOR | STABILITY |
|---|---|---|---|
| Gaussian Elimination Without Pivoting | $\frac{n^3}{3}$ | Arbitrary | Unstable |
| Gaussian Elimination With Partial Pivoting | $\frac{n^3}{3}$ (+ $O(n^2)$ comparisons) | $\rho \le 2^{n-1}$ | Stable in practice |
| Gaussian Elimination With Complete Pivoting | $\frac{n^3}{3}$ (+ $O(n^3)$ comparisons) | $\rho \le \{n \cdot 2^{1} \cdot 3^{1/2} \cdots n^{1/(n-1)}\}^{1/2}$ | Stable |
| QR Factorization Using Householder Transformations | $\frac{2n^3}{3}$ (+ $n$ square roots) | None | Stable |
| QR Factorization Using Givens Rotations | $\frac{4n^3}{3}$ (+ $\frac{n^2}{2}$ square roots) | None | Stable |
TABLE 6.2

(COMPARISON OF DIFFERENT METHODS FOR LINEAR SYSTEM PROBLEM WITH SPECIAL MATRICES)

| MATRIX TYPE | METHOD | FLOP-COUNT (APPROX.) | GROWTH FACTOR | STABILITY |
|---|---|---|---|---|
| Symmetric Positive Definite | 1) Gaussian Elimination without Pivoting | $\frac{n^3}{3}$ | $\rho = 1$ | Stable |
| Symmetric Positive Definite | 2) Cholesky | $\frac{n^3}{6}$ (+ $n$ square roots) | None | Stable |
| Diagonally Dominant | Gaussian Elimination with Partial Pivoting | $\frac{n^3}{3}$ | $\rho \le 2$ | Stable |
| Hessenberg | Gaussian Elimination with Partial Pivoting | $n^2$ | $\rho \le n$ | Stable |
| Tridiagonal | Gaussian Elimination with Partial Pivoting | $O(n)$ | $\rho \le 2$ | Stable |
6.5 Inverses, Determinant and Leading Principal Minors
Associated with the problem of solving the linear system $Ax = b$ are the problems of finding the determinant, the inverse and the leading principal minors of the matrix $A$. In this section we will see how these problems can be solved using the various factorizations developed earlier.
6.5.1 Avoiding Explicit Computation of the Inverses

The inverse of a matrix $A$ very seldom needs to be computed explicitly. Most computational problems involving inverses can be reformulated in terms of solution of linear systems. For example, consider

1. $A^{-1}b$ (inverse times a vector)
2. $A^{-1}B$ (inverse times a matrix)
3. $b^T A^{-1} c$ (vector times inverse times a vector).

The first problem, the computation of $A^{-1}b$, is equivalent to solving the linear system

$$
Ax = b.
$$

Similarly, the second problem can be formulated in terms of solving sets of linear equations. Thus, if $A$ is of order $n \times n$ and $B$ is of order $n \times m$, then writing $C = A^{-1}B = (c_1, c_2, \ldots, c_m)$, we see that the columns $c_1$ through $c_m$ of $C$ can be found by solving the systems

$$
Ac_i = b_i, \qquad i = 1, \ldots, m,
$$

where $b_i$, $i = 1, \ldots, m$, are the successive columns of the matrix $B$. The computation of $b^T A^{-1} c$ can be done in two steps:

1. Find $A^{-1}c$; that is, solve the linear system $Ax = c$.
2. Compute $b^T x$.

As we will see later in this section, computing $A^{-1}$ is three times as expensive as solving the linear system $Ax = b$. Thus, all such problems mentioned above can be solved much more efficiently by formulating them in terms of linear systems rather than naively solving them using matrix inversion.
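The three reformulations can be sketched in Python as follows. The sketch factors $A$ once by Gaussian elimination with partial pivoting and reuses the factors for each right-hand side; the function names (`lu_factor`, `lu_solve`, `inv_times_matrix`, `bilinear`) are hypothetical names for this illustration and are not tied to any particular library.

```python
def lu_factor(A):
    """Gaussian elimination with partial pivoting. Returns the packed
    factors (multipliers below the diagonal, U on and above it) and the
    row permutation."""
    n = len(A)
    LU = [[float(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]                   # store the multiplier
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, perm


def lu_solve(LU, perm, b):
    """Solve A x = b from the factors: this is A^{-1} b without forming A^{-1}."""
    n = len(LU)
    y = [b[p] for p in perm]                       # apply the interchanges
    for i in range(n):                             # forward substitution
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                 # back substitution
        x[i] = (y[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x


def inv_times_matrix(A, B):
    """Columns of C = A^{-1} B from the systems A c_i = b_i, one factorization."""
    LU, perm = lu_factor(A)
    m = len(B[0])
    cols = [lu_solve(LU, perm, [row[i] for row in B]) for i in range(m)]
    return [[cols[i][r] for i in range(m)] for r in range(len(A))]


def bilinear(b, A, c):
    """b^T A^{-1} c: solve A x = c, then take the inner product b^T x."""
    LU, perm = lu_factor(A)
    x = lu_solve(LU, perm, c)
    return sum(bi * xi for bi, xi in zip(b, x))
```

The design point is that the $O(n^3)$ factorization is done once, and each additional right-hand side costs only the two $O(n^2)$ triangular solves.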
The explicit computation of the inverse should be avoided whenever possible. A linear system should never be solved by explicit computation of the inverse of the system matrix.

Significance of the Inverse of a Matrix in Practical Applications

Having said that most computational problems involving inverses can be reformulated in terms of linear systems, let us remark that there are, however, certain practical applications where the inverse of a matrix needs to be computed explicitly and, in fact, the entries of the inverse matrices in these applications have some physical significance.
Example 6.5.1
Consider once again the spring-mass problem discussed in Section 6.3.3. The $(i, j)$th entry of the inverse of the stiffness matrix $K$ tells us what the displacement of mass $i$ will be if a unit external force is imposed on mass $j$. Thus the entries of $K^{-1}$ tell us how the system's components will respond to external forces.

Let's take a specific instance when the spring constants are all equal:

$$
k_1 = k_2 = k_3 = 1.
$$

Then

$$
K = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}, \qquad
K^{-1} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{pmatrix}.
$$

Since the entries of the first column of $K^{-1}$ are all 1's, a downward unit load on the first mass will displace all the masses by 1 inch downward. Similar interpretations can be given for the elements of the other columns of $K^{-1}$.
Some Easily Computed Inverses
Before we discuss the computation of $A^{-1}$ for an arbitrary matrix $A$, we note that the inverses of some well-known matrices can be trivially computed.

(1) The inverse of the elementary lower triangular matrix $M = I - m e_k^T$ is given by $M^{-1} = I + m e_k^T$.
(2) The inverse of an orthogonal matrix $Q$ is its transpose $Q^T$ (note that a Householder matrix and a permutation matrix are orthogonal).
(3) The inverse of a triangular matrix $T$ of one type is again a triangular matrix of the same type, the diagonal entries being the reciprocals of the diagonal entries of the matrix $T$.
6.5.2 The Sherman-Morrison and Woodbury Formulas

In many applications, once the inverse of a matrix $A$ is computed, it is required to find the inverse of another matrix $B$ which differs from $A$ only by a rank-one perturbation. The question naturally arises whether the inverse of $B$ can be computed without starting all over again, that is, whether the inverse of $B$ can be found using the inverse of $A$, which has already been computed. The Sherman-Morrison formula shows us how to do this.
The Sherman-Morrison Formula

If $u$ and $v$ are two $n$-vectors and $A$ is a nonsingular matrix, then

$$
(A - uv^T)^{-1} = A^{-1} + \alpha\,(A^{-1}uv^TA^{-1}), \qquad \text{where} \quad \alpha = \frac{1}{1 - v^TA^{-1}u},
$$

if $v^TA^{-1}u \ne 1$.

Remarks: a) Item (1) in the list of easily computed inverses above is a special case of this formula. b) The Sherman-Morrison formula shows how to compute the inverse of the matrix obtained from a matrix $A$ by a rank-one change, once the inverse of $A$ has been computed, without explicitly computing the inverse of the new matrix.

The Sherman-Morrison formula can be extended to the case where $U$ and $V$ are matrices. This generalization is known as the Woodbury formula:

The Woodbury Formula

$$
(A - UV^T)^{-1} = A^{-1} + A^{-1}U(I - V^TA^{-1}U)^{-1}V^TA^{-1},
$$

if $I - V^TA^{-1}U$ is nonsingular.
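Example 6.5.2

Given

$$
A = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 4 & 5 \\ 6 & 7 & 8 \end{pmatrix}, \qquad
A^{-1} = \begin{pmatrix} -3 & -1 & 1 \\ 14 & 2 & -3 \\ -10 & -1 & 2 \end{pmatrix},
$$

find $(A - uv^T)^{-1}$, where $u = v = (1, 0, 0)^T$.

$$
\alpha = \frac{1}{1 - v^TA^{-1}u} = \frac{1}{4}
$$

$$
A^{-1} + \alpha A^{-1}uv^TA^{-1} = \begin{pmatrix} -\frac{3}{4} & -\frac{1}{4} & \frac{1}{4} \\ \frac{7}{2} & -\frac{3}{2} & \frac{1}{2} \\ -\frac{5}{2} & \frac{3}{2} & -\frac{1}{2} \end{pmatrix}.
$$

Thus

$$
(A - uv^T)^{-1} = \begin{pmatrix} -\frac{3}{4} & -\frac{1}{4} & \frac{1}{4} \\ \frac{7}{2} & -\frac{3}{2} & \frac{1}{2} \\ -\frac{5}{2} & \frac{3}{2} & -\frac{1}{2} \end{pmatrix}.
$$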
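The rank-one update of Example 6.5.2 can be checked with a short Python sketch that applies the Sherman-Morrison formula in exact rational arithmetic. The function name `sherman_morrison` is hypothetical, and the use of `fractions.Fraction` is a choice made here so that the result can be compared with the fractions in the example.

```python
from fractions import Fraction


def sherman_morrison(Ainv, u, v):
    """Return (A - u v^T)^{-1}, given Ainv = A^{-1}, via
    A^{-1} + alpha * (A^{-1} u)(v^T A^{-1}),  alpha = 1 / (1 - v^T A^{-1} u)."""
    n = len(Ainv)
    Au = [sum(Ainv[i][k] * u[k] for k in range(n)) for i in range(n)]   # A^{-1} u
    vA = [sum(v[k] * Ainv[k][j] for k in range(n)) for j in range(n)]   # v^T A^{-1}
    denom = 1 - sum(v[k] * Au[k] for k in range(n))
    if denom == 0:
        raise ValueError("v^T A^{-1} u = 1: A - u v^T is singular")
    alpha = Fraction(1) / denom
    return [[Ainv[i][j] + alpha * Au[i] * vA[j] for j in range(n)]
            for i in range(n)]


Ainv = [[-3, -1, 1], [14, 2, -3], [-10, -1, 2]]      # A^{-1} from Example 6.5.2
updated = sherman_morrison(Ainv, [1, 0, 0], [1, 0, 0])
```

Only matrix-vector products with the known inverse are needed, so the update costs $O(n^2)$ operations rather than the $O(n^3)$ of a fresh inversion.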
6.5.3 Computing the Inverse of a Matrix
Computing the inverse of a matrix $A$ is equivalent to solving the sets of linear systems

$$
Ax_i = e_i, \qquad i = 1, \ldots, n.
$$

These $n$ linear systems can now be solved using any of the techniques discussed earlier. However, Gaussian elimination with partial pivoting will be used in practice. Since the matrix $A$ is the same for all the systems, $A$ has to be factorized only once. Taking into account this fact and taking advantage of the special form of the right-hand side of each system, the computation of the inverse of $A$ requires about $n^3$ flops. We may also compute the inverse of $A$ directly from any of the triangular factorizations (note that this process is completely equivalent to the process of finding $A^{-1}$ by solving the $n$ linear systems $Ax_i = e_i$, $i = 1, \ldots, n$, differing only in the arrangement of computations).

A. If Gaussian elimination with partial pivoting is used, we have
$$
MA = U;
$$

then

$$
A^{-1} = U^{-1}M.
$$

Recall from Chapter 5 that

$$
M = M_{n-1}P_{n-1} \cdots M_2P_2M_1P_1,
$$

so that we have

Computing the Inverse From Partial Pivoting Factorization

$$
A^{-1} = U^{-1}(M_{n-1}P_{n-1} \cdots M_2P_2M_1P_1).
$$
B. If complete pivoting is used,

$$
MAQ = U.
$$

Then

$$
A = M^{-1}UQ^T.
$$

(Note that $Q^{-1} = Q^T$.) Thus,

Computing the Inverse From Complete Pivoting Factorization

$$
A^{-1} = QU^{-1}M = (Q_1Q_2 \cdots Q_{n-1})\,U^{-1}(M_{n-1}P_{n-1} \cdots M_2P_2M_1P_1).
$$

C. If orthogonal factorization is used,

$$
A = QR
$$

and

Computing the Inverse From QR Factorization

$$
A^{-1} = R^{-1}Q^T.
$$
For reasons stated earlier, Gaussian elimination with partial pivoting should be used in practice.

Remark: In practical computations, the structure of the elementary lower triangular matrices $M_i$ and the fact that the $P_i$ are permutation matrices should be taken into consideration in forming the product $M$.

D. If $A$ is a symmetric positive definite matrix, then
$$
A = HH^T \quad \text{(the Cholesky factorization)}, \qquad
A^{-1} = (H^T)^{-1}H^{-1} = (H^{-1})^TH^{-1}.
$$

Computing the inverse of a symmetric positive definite matrix $A$:

Step 1. Compute the Cholesky factorization $A = HH^T$.
Step 2. Compute the inverse of the lower triangular matrix $H$: $H^{-1}$.
Step 3. Compute $(H^{-1})^TH^{-1}$.
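The three steps above can be sketched in Python as follows; the function names `cholesky`, `lower_inverse` and `spd_inverse` are hypothetical, and the matrix is assumed to be symmetric positive definite (the sketch raises an error when a nonpositive pivot shows otherwise).

```python
import math


def cholesky(A):
    """Step 1: lower triangular H with A = H H^T."""
    n = len(A)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(H[j][k] ** 2 for k in range(j))
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        H[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            H[i][j] = (A[i][j] - sum(H[i][k] * H[j][k] for k in range(j))) / H[j][j]
    return H


def lower_inverse(H):
    """Step 2: inverse of a lower triangular matrix, column by column."""
    n = len(H)
    X = [[0.0] * n for _ in range(n)]
    for j in range(n):                        # column j solves H x = e_j
        X[j][j] = 1.0 / H[j][j]
        for i in range(j + 1, n):
            X[i][j] = -sum(H[i][k] * X[k][j] for k in range(j, i)) / H[i][i]
    return X


def spd_inverse(A):
    """Step 3: A^{-1} = (H^{-1})^T H^{-1}."""
    Hinv = lower_inverse(cholesky(A))
    n = len(A)
    # Entry (i, j) of (H^{-1})^T H^{-1} is sum_k Hinv[k][i] * Hinv[k][j].
    return [[sum(Hinv[k][i] * Hinv[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]
```

As a usage example, applying `spd_inverse` to the stiffness matrix $K$ of Example 6.5.1 should reproduce the $K^{-1}$ given there up to rounding.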
E. $A$ is a tridiagonal matrix

$$
A = \begin{pmatrix}
a_1 & b_1 & & 0 \\
c_1 & a_2 & \ddots & \\
 & \ddots & \ddots & b_{n-1} \\
0 & & c_{n-1} & a_n
\end{pmatrix}
$$

with

$$
|a_1| > |b_1|, \qquad |a_2| > |b_2| + |c_1|, \qquad \ldots, \qquad |a_n| > |c_{n-1}|.
$$

Then $A$ has the bidiagonal factorization

$$
A = LU,
$$

where $L$ is lower bidiagonal and $U$ is upper bidiagonal, and

$$
A^{-1} = U^{-1}L^{-1}.
$$
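Example 6.5.3

Let

$$
A = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}.
$$

Then

$$
L = \begin{pmatrix} 1 & 0 & 0 \\ -\frac{1}{2} & 1 & 0 \\ 0 & -\frac{2}{3} & 1 \end{pmatrix}, \qquad
U = \begin{pmatrix} 2 & -1 & 0 \\ 0 & \frac{3}{2} & -1 \\ 0 & 0 & \frac{1}{3} \end{pmatrix},
$$

$$
A^{-1} = U^{-1}L^{-1} = \begin{pmatrix} \frac{1}{2} & \frac{1}{3} & 1 \\ 0 & \frac{2}{3} & 2 \\ 0 & 0 & 3 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ \frac{1}{2} & 1 & 0 \\ \frac{1}{3} & \frac{2}{3} & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{pmatrix}.
$$

$A$ is tridiagonal and $L$ and $U$ are bidiagonal.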
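Example 6.5.4

Compute $A^{-1}$ using partial pivoting when

$$
A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}.
$$

$$
A^{-1} = U^{-1}M = U^{-1}M_2P_2M_1P_1.
$$

Using the results of Example 5.2.4, we have

$$
M = M_2P_2M_1P_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 1 & -1 & 1 \end{pmatrix}, \qquad
U^{-1} = \begin{pmatrix} 1 & -2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \end{pmatrix}.
$$

So,

$$
A^{-1} = \begin{pmatrix} 1 & -2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \end{pmatrix}
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 1 & -1 & 1 \end{pmatrix}
= \begin{pmatrix} -1 & 0 & 1 \\ 2 & -1 & 1 \\ -1 & 1 & -1 \end{pmatrix}.
$$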
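The relation $A^{-1} = U^{-1}M$ used in Example 6.5.4 can be traced with a Python sketch that eliminates on the augmented matrix $[A \mid I]$, so that the right block accumulates $M$. This arrangement, the exact rational arithmetic via `fractions.Fraction`, and the function name `inverse_partial_pivoting` are choices made for this illustration.

```python
from fractions import Fraction


def inverse_partial_pivoting(A):
    """Return (U, M, Ainv) with M A = U and Ainv = U^{-1} M, using Gaussian
    elimination with partial pivoting in exact rational arithmetic."""
    n = len(A)
    # Augment A with the identity; elimination turns [A | I] into [U | M].
    W = [[Fraction(v) for v in A[i]] + [Fraction(int(i == j)) for j in range(n)]
         for i in range(n)]
    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(W[i][k]))
        W[k], W[p] = W[p], W[k]                    # row interchange P_k
        for i in range(k + 1, n):
            m = W[i][k] / W[k][k]
            W[i] = [wi - m * wk for wi, wk in zip(W[i], W[k])]
    U = [row[:n] for row in W]
    M = [row[n:] for row in W]
    # A^{-1} = U^{-1} M: back substitution on each column of M.
    Ainv = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        for i in range(n - 1, -1, -1):
            s = sum(U[i][k] * Ainv[k][j] for k in range(i + 1, n))
            Ainv[i][j] = (M[i][j] - s) / U[i][i]
    return U, M, Ainv


U, M, Ainv = inverse_partial_pivoting([[0, 1, 1], [1, 2, 3], [1, 1, 1]])
```

The sketch never forms $U^{-1}$ explicitly; it obtains the columns of $A^{-1}$ by back substitution, which is the arrangement of computations the text describes as equivalent.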
6.5.4 Computing the Determinant of a Matrix
The determinant of a matrix $A$, denoted by $\det(A)$, is its $n$th leading principal minor $p_n$. Thus, the method for computing the leading principal minors, to be described in the next section, can in particular be used to compute the determinant of $A$. Furthermore, ordinary Gaussian elimination (with partial pivoting, with complete pivoting, or without pivoting) and the Householder triangularization method can also be applied to compute $\det(A)$.
Note: The determinant of a matrix is seldom needed in practice.
A. If Gaussian elimination without pivoting is used, we have

$$
A = LU, \qquad \det(A) = \det(L)\det(U).
$$

$U$ is an upper triangular matrix, so $\det(U) = u_{11}u_{22}\cdots u_{nn} = a_{11}a_{22}^{(1)}\cdots a_{nn}^{(n-1)}$; $L$ is a unit lower triangular matrix, so $\det(L) = 1$. Thus,

Computing det(A) from LU Factorization

$$
\det(A) = a_{11}a_{22}^{(1)}\cdots a_{nn}^{(n-1)} = \text{the product of the pivots.}
$$

B. If Gaussian elimination with partial pivoting is used, we have

$$
MA = U.
$$

Then $\det(M)\det(A) = \det(U)$. Now $M = M_{n-1}P_{n-1}\cdots M_2P_2M_1P_1$. Since the determinant of each of the elementary lower triangular matrices is 1 and the determinant of each of the permutation matrices is $\pm 1$, we have $\det(M) = (-1)^r$, where $r$ is the number of row interchanges made during the pivoting process. So, we have

$$
\det(A) = \frac{\det(U)}{\det(M)} = (-1)^r u_{11}u_{22}\cdots u_{nn} = (-1)^r a_{11}a_{22}^{(1)}\cdots a_{nn}^{(n-1)}.
$$

Computing det(A) from MA = U factorization

$$
\det(A) = (-1)^r a_{11}a_{22}^{(1)}\cdots a_{nn}^{(n-1)},
$$

where $r$ is the number of interchanges.

C. If Gaussian elimination with complete pivoting is used, we have

$$
MAQ = U.
$$

So, $\det(M)\det(A)\det(Q) = \det(U)$. Let $r$ and $s$ be, respectively, the number of row and column interchanges. Then

$$
\det(M) = (-1)^r, \qquad \det(Q) = (-1)^s.
$$

Thus we state the following:

Computing det(A) from MAQ = U factorization

$$
\det(A) = (-1)^{r+s}\det(U) = (-1)^{r+s} a_{11}a_{22}^{(1)}\cdots a_{nn}^{(n-1)},
$$

where $r$ and $s$ are the number of row and column interchanges.
Example 6.5.5

$$
A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}
$$

A. Gaussian elimination with partial pivoting.

$$
U = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \end{pmatrix}
$$

There was only one interchange; therefore $r = 1$, and $\det(A) = (-1)\det(U) = (-1)(-1) = 1$.

B. Gaussian elimination with complete pivoting.

$$
U = \begin{pmatrix} 3 & 2 & 1 \\ 0 & \frac{2}{3} & \frac{1}{3} \\ 0 & 0 & \frac{1}{2} \end{pmatrix}
$$

In the first step, there were one row interchange and one column interchange. In the second step, there were also one row interchange and one column interchange. Thus $r = 2$, $s = 2$, and

$$
\det(A) = (-1)^{r+s}\det(U) = (-1)^4 \cdot 3 \cdot \frac{2}{3} \cdot \frac{1}{2} = 1.
$$
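Part A of the example can be reproduced with a short Python sketch that forms $(-1)^r$ times the product of the pivots. Exact rational arithmetic (`fractions.Fraction`) is used here only so that the result can be compared exactly; the function name `det_partial_pivoting` is hypothetical.

```python
from fractions import Fraction


def det_partial_pivoting(A):
    """Return (det(A), r): the determinant as (-1)^r times the product of
    the pivots, where r counts the row interchanges."""
    n = len(A)
    W = [[Fraction(v) for v in row] for row in A]
    r = 0
    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(W[i][k]))
        if W[p][k] == 0:
            return Fraction(0), r                 # singular matrix
        if p != k:
            W[k], W[p] = W[p], W[k]
            r += 1
        for i in range(k + 1, n):
            m = W[i][k] / W[k][k]
            W[i] = [wi - m * wk for wi, wk in zip(W[i], W[k])]
    det = Fraction((-1) ** r)
    for k in range(n):
        det *= W[k][k]                            # product of the pivots
    return det, r


det_A, interchanges = det_partial_pivoting([[0, 1, 1], [1, 2, 3], [1, 1, 1]])
```

In floating-point arithmetic the same arrangement applies, though the product of pivots may overflow or underflow for large matrices, which is one practical reason the determinant is seldom computed.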
6.5.5 Computing The Leading Principal Minors of a Matrix
There are applications, such as finding the eigenvalue distribution in a given region of the complex plane (see Datta and Datta (1986), Datta (1987), etc.), that require knowledge of the leading principal minors of a matrix. Also, as we will see in Chapter 8, the leading principal minors are important to the eigenvalue computations of a symmetric matrix, where the signs of the leading principal minors of a symmetric tridiagonal matrix are needed. We will discuss here a numerical method for determining the leading principal minors of a matrix.

Gaussian elimination with partial and complete pivoting and Householder triangularization, in general, cannot be used to obtain the leading principal minors (unless, of course, one finds the factor matrices explicitly, computes all the minors of each of the factor matrices and then uses the Binet-Cauchy theorem (Gantmacher, Theory of Matrices, Vol. I) to compute the leading principal minors, which is certainly a very expensive procedure). However, Wilkinson (AEP, pp. 237-296) has shown that the steps of the partial pivoting method and those of the Givens triangularization method can be rearranged so that the modified procedures yield the $k$th leading principal minor at the end of the $k$th step. We will describe here only the Givens triangularization method.
The Givens Method for the Leading Principal Minors
The zeros are created in the positions (2,1); (3,1), (3,2); (4,1), (4,2), (4,3); etc., in order, by applying successively the Givens rotations $J(1, 2, \theta)$ to $A$, $J(1, 3, \theta)$ to $A^{(1)} = J(1, 2, \theta)A$, $J(2, 3, \theta)$ to $A^{(2)} = J(1, 3, \theta)J(1, 2, \theta)A$, and so on. The leading principal minor of order $(k + 1)$ is obtained at the end of the $k$th step:

$$
P_{k+1} = a_{11}a_{22}^{(1)}\cdots a_{k+1,k+1}^{(k)}.
$$

Of course, each $A^{(k)}$ can overwrite $A$. In this case we have

$$
P_{k+1} = a_{11}a_{22}\cdots a_{k+1,k+1}.
$$

Algorithm 6.5.1 Givens Method for the Leading Principal Minors

Given an $n \times n$ matrix $A$, the following algorithm computes the leading principal minors $P_2, \ldots, P_n$; the minor $P_i$ is available once row $i$ has been processed, that is, at the end of step $i - 1$.

For $i = 2, \ldots, n$ do
    For $j = 1, 2, \ldots, i - 1$ do
        Find $c$ and $s$ such that
        $$
        \begin{pmatrix} c & s \\ -s & c \end{pmatrix}\begin{pmatrix} a_{jj} \\ a_{ij} \end{pmatrix} = \begin{pmatrix} * \\ 0 \end{pmatrix}
        $$
        Overwrite $A$ with $J(j, i, \theta)A$
    $P_i = a_{11}a_{22}\cdots a_{ii}$

Flop-count and stability. The algorithm requires $\frac{4n^3}{3}$ flops, four times the expense of the Gaussian elimination method. The algorithm has guaranteed stability (Wilkinson, AEP, p. 246).
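Example 6.5.6

$$
A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
$$

Only one step is needed. With $c = \dfrac{1}{\sqrt{2}}$ and $s = \dfrac{1}{\sqrt{2}}$,

$$
\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1.4142 \\ 0 \end{pmatrix}, \qquad
J(1, 2, \theta) = \begin{pmatrix} 0.7071 & 0.7071 & 0 \\ -0.7071 & 0.7071 & 0 \\ 0 & 0 & 1 \end{pmatrix},
$$

$$
A \equiv J(1, 2, \theta)A = \begin{pmatrix} 0.7071 & 0.7071 & 0 \\ -0.7071 & 0.7071 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1.4142 & 0.7071 & 0 \\ 0 & 0.7071 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
$$

1st leading principal minor: $p_1 = a_{11} = 1$ (the (1,1) entry of the original $A$)
2nd leading principal minor: $p_2 = a_{11}a_{22} = 1.4142 \times 0.7071 = 1.0000$
3rd leading principal minor: $p_3 = a_{11}a_{22}a_{33} = 1$.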
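Algorithm 6.5.1 can be sketched in Python as follows. The sketch uses zero-based indices, takes $p_1$ directly from the original $a_{11}$, and records each further minor as soon as its row has been processed, since later rotations change the earlier rows. The function name `leading_principal_minors` is hypothetical.

```python
import math


def leading_principal_minors(A):
    """Leading principal minors p_1, ..., p_n of A by Givens rotations.
    After row i has been processed, the leading block of order i+1 is upper
    triangular with unchanged determinant, so p_{i+1} is the product of
    its diagonal entries."""
    n = len(A)
    W = [[float(v) for v in row] for row in A]
    minors = [W[0][0]]                            # p_1 = a_11
    for i in range(1, n):
        for j in range(i):
            r = math.hypot(W[j][j], W[i][j])
            if r == 0.0:
                continue                          # nothing to annihilate
            c, s = W[j][j] / r, W[i][j] / r
            for k in range(n):                    # apply J(j, i, theta) to rows j and i
                wj, wi = W[j][k], W[i][k]
                W[j][k] = c * wj + s * wi
                W[i][k] = -s * wj + c * wi
        p = 1.0
        for k in range(i + 1):
            p *= W[k][k]
        minors.append(p)
    return minors
```

Each rotation has determinant 1 and only mixes rows that lie inside the leading block being completed, which is why the product of the current diagonal entries equals the corresponding minor of the original matrix.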
6.6 Perturbation Analysis of the Linear System Problem
In practice the input data $A$ and $b$ may be contaminated by error. This error may be experimental, may come from the process of discretization, etc. In order to estimate the accuracy of the computed solution, the error in the data should be taken into account. As we have seen in Chapter 3, there are problems whose solutions may change drastically even with small changes in the input data, and this phenomenon of ill-conditioning is independent of the algorithms used to solve these problems. We have discussed ill-conditioning of several problems in Chapter 3. Let's take another simple example of an ill-conditioned linear system. Consider the following linear system:
$$
\begin{aligned}
x_1 + 2x_2 &= 3 \\
2x_1 + 3.999x_2 &= 5.999
\end{aligned}
$$

The exact solution is $x_1 = x_2 = 1$. Now make a small perturbation in the right-hand side, obtaining the system:

$$
\begin{aligned}
x_1 + 2x_2 &= 3 \\
2x_1 + 3.999x_2 &= 6
\end{aligned}
$$

The solution of the perturbed system, obtained by Gaussian elimination with pivoting (considered to be a stable method in practice), is:

$$
x_1 = 3, \qquad x_2 = 0.
$$

Thus, a very small change in the right-hand side changed the solution altogether.
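The two systems above can be solved exactly in Python with rational arithmetic, which makes it plain that the change in the solution is a property of the problem and not of rounding. The helper `solve2` (Cramer's rule for a $2 \times 2$ system) is a hypothetical name introduced only for this check.

```python
from fractions import Fraction


def solve2(A, b):
    """Exact solution of a 2x2 system by Cramer's rule (rational arithmetic)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return ((b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det)


A = [[Fraction(1), Fraction(2)], [Fraction(2), Fraction("3.999")]]
x = solve2(A, [Fraction(3), Fraction("5.999")])     # original right-hand side
x_perturbed = solve2(A, [Fraction(3), Fraction(6)])  # perturbed right-hand side
```

The determinant of the coefficient matrix is $-0.001$, so a change of $0.001$ in one right-hand-side entry is magnified by a factor of order $10^3$ in the solution.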
In this section we study the effect of small perturbations of the input data $A$ and $b$ on the computed solution $x$ of the system $Ax = b$. This study is very useful. Not only will it help us in assessing the amount of error in the computed solution of the perturbed system, regardless of the algorithm used, but also, when the result of a perturbation analysis is combined with that of the backward error analysis of a particular algorithm, an error bound for the solution computed by the algorithm can be obtained.

Since in the linear system problem $Ax = b$ the input data are $A$ and $b$, there could be impurities either in $b$ or in $A$ or in both. We will therefore consider the effect of perturbations on the solution $x$ in each of these cases separately.

6.6.1 Effect of Perturbation in the Right-Hand Side Vector b
We assume here that there are impurities in b but the matrix A is exact.
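Theorem 6.6.1 (Right Perturbation Theorem) If $\delta b$ and $\delta x$ are, respectively, the perturbations of $b$ and $x$ in the linear system $Ax = b$, $A$ is assumed to be nonsingular, and $b \ne 0$, then

$$
\frac{1}{\mathrm{Cond}(A)}\frac{\|\delta b\|}{\|b\|} \;\le\; \frac{\|\delta x\|}{\|x\|} \;\le\; \mathrm{Cond}(A)\frac{\|\delta b\|}{\|b\|}.
$$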
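Proof. We have

$$
Ax = b \qquad \text{and} \qquad A(x + \delta x) = b + \delta b.
$$

The last equation can be written as

$$
Ax + A\,\delta x = b + \delta b
$$

or

$$
A\,\delta x = \delta b \quad (\text{since } Ax = b),
$$

that is,

$$
\delta x = A^{-1}\delta b.
$$

Taking a subordinate matrix-vector norm we get

$$
\|\delta x\| \le \|A^{-1}\|\,\|\delta b\|. \tag{6.6.1}
$$

Again, taking the same norm on both sides of $Ax = b$ we get $\|Ax\| = \|b\|$, or

$$
\|b\| = \|Ax\| \le \|A\|\,\|x\|. \tag{6.6.2}
$$

Combining (6.6.1) and (6.6.2), we have

$$
\frac{\|\delta x\|}{\|x\|} \le \|A\|\,\|A^{-1}\|\frac{\|\delta b\|}{\|b\|}. \tag{6.6.3}
$$

On the other hand, $A\,\delta x = \delta b$ gives

$$
\|\delta x\| \ge \frac{\|\delta b\|}{\|A\|}. \tag{6.6.4}
$$

Also, from $Ax = b$, we have

$$
\frac{1}{\|x\|} \ge \frac{1}{\|A^{-1}\|\,\|b\|}. \tag{6.6.5}
$$

Combining (6.6.4) and (6.6.5), we have

$$
\frac{\|\delta x\|}{\|x\|} \ge \frac{\|\delta b\|}{\|A\|\,\|A^{-1}\|\,\|b\|}.
$$

Recall from Chapter 3 that $\|A\|\,\|A^{-1}\|$ is the condition number of $A$ and is denoted by $\mathrm{Cond}(A)$. The theorem is therefore proved.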
Interpretation of Theorem 6.6.1
It is important to understand the implication of Theorem 6.6.1 quite well. Theorem 6.6.1 says that a relative change in the solution can be as large as $\mathrm{Cond}(A)$ multiplied by the relative change in the vector $b$. Thus, if the condition number is not too large, then a small perturbation in the vector $b$ will have very little effect on the solution. On the other hand, if the condition number is large, then even a small perturbation in $b$ might change the solution drastically.
Example 6.6.1 An ill-conditioned problem

$$
A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 4.0001 & 2.002 \\ 1 & 2.002 & 2.004 \end{pmatrix}, \qquad
b = \begin{pmatrix} 4 \\ 8.0021 \\ 5.006 \end{pmatrix}
$$

The exact solution is $x = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$. Change $b$ to $b' = \begin{pmatrix} 4 \\ 8.0020 \\ 5.0061 \end{pmatrix}$.

Then the relative change in $b$ is

$$
\frac{\|b' - b\|}{\|b\|} = \frac{\|\delta b\|}{\|b\|} = 1.3795 \times 10^{-5} \quad \text{(small)}.
$$

If we solve the system $Ax' = b'$, we get

$$
x' = x + \delta x = \begin{pmatrix} 3.0850 \\ -0.0436 \\ 1.0022 \end{pmatrix}
$$

($x'$ is completely different from $x$).

Note: $\dfrac{\|\delta x\|}{\|x\|} = 1.3461$.

It is easily verified that the inequality in Theorem 6.6.1 is satisfied:

$$
\mathrm{Cond}(A)\frac{\|\delta b\|}{\|b\|} = 4.4434.
$$

However, the predicted change is an overestimate.
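Example 6.6.2 A well-conditioned problem

$$
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad b = \begin{pmatrix} 3 \\ 7 \end{pmatrix}
$$

The exact solution is $x = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Let $b' = b + \delta b = \begin{pmatrix} 3.0001 \\ 7.0001 \end{pmatrix}$.

The relative change in $b$:

$$
\frac{\|b' - b\|}{\|b\|} = 1.857 \times 10^{-5} \quad \text{(small)}
$$

$$
\mathrm{Cond}(A) = 14.9330 \quad \text{(small)}
$$

Thus a drastic change in the solution $x$ is not expected. In fact $x'$ satisfying

$$
Ax' = b'
$$

is

$$
x' = \begin{pmatrix} 0.9999 \\ 1.0001 \end{pmatrix} \approx x = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
$$

Note:

$$
\frac{\|\delta x\|}{\|x\|} = 10^{-4}.
$$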
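The two-sided bound of Theorem 6.6.1 can be checked on the data of this example with a short Python sketch. The sketch uses the $\infty$-norm and exact rational arithmetic, which is an assumption made only to keep the code free of external libraries; the condition number it computes is therefore the $\infty$-norm one and differs from the 2-norm value quoted in the example. All helper names are hypothetical.

```python
from fractions import Fraction as F


def inf_norm_vec(v):
    return max(abs(t) for t in v)


def inf_norm_mat(M):
    return max(sum(abs(t) for t in row) for row in M)   # maximum absolute row sum


def inverse2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(M, v):
    return [sum(m * t for m, t in zip(row, v)) for row in M]


A = [[F(1), F(2)], [F(3), F(4)]]
Ainv = inverse2(A)
cond = inf_norm_mat(A) * inf_norm_mat(Ainv)   # condition number in the infinity norm

b = [F(3), F(7)]
db = [F(1, 10000), F(1, 10000)]               # perturbation of b from the example
x = matvec(Ainv, b)
dx = matvec(Ainv, db)                         # delta x = A^{-1} delta b

rel_b = inf_norm_vec(db) / inf_norm_vec(b)
rel_x = inf_norm_vec(dx) / inf_norm_vec(x)
lower, upper = rel_b / cond, cond * rel_b     # the two bounds of Theorem 6.6.1
```

Since the theorem holds for any subordinate norm, `lower <= rel_x <= upper` is expected to hold here as well, with the relative change in $x$ well inside the predicted interval.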
6.6.2 Effect of Perturbation in the Matrix A

Here we assume that there are impurities in $A$ only and, as a result, we have $A + \delta A$ in hand, but $b$ is exact.

Theorem 6.6.2 (Left Perturbation Theorem) Assume $A$ is nonsingular and $b \ne 0$. Suppose that $\delta A$ and $\delta x$ are, respectively, the perturbations of $A$ and $x$ in the linear system

$$
Ax = b.
$$

Furthermore, assume that $\delta A$ is such that $\|\delta A\| < \dfrac{1}{\|A^{-1}\|}$. Then

$$
\frac{\|\delta x\|}{\|x\|} \le \mathrm{Cond}(A)\frac{\|\delta A\|}{\|A\|} \Big/ \left(1 - \mathrm{Cond}(A)\frac{\|\delta A\|}{\|A\|}\right).
$$
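Proof. We have

$$
(A + \delta A)(x + \delta x) = b
$$

or

$$
(A + \delta A)x + (A + \delta A)\,\delta x = b. \tag{6.6.6}
$$

Since $Ax = b$, we have from (6.6.6)

$$
(A + \delta A)\,\delta x = -\delta A\,x \tag{6.6.7}
$$

or

$$
\delta x = -A^{-1}\delta A\,(x + \delta x). \tag{6.6.8}
$$

Taking the norm on both sides, we have

$$
\|\delta x\| \le \|A^{-1}\|\,\|\delta A\|\,(\|x\| + \|\delta x\|) = \frac{\|A^{-1}\|\,\|A\|}{\|A\|}\|\delta A\|\,(\|x\| + \|\delta x\|), \tag{6.6.9}
$$

that is,

$$
\left(1 - \frac{\|A\|\,\|A^{-1}\|\,\|\delta A\|}{\|A\|}\right)\|\delta x\| \le \frac{\|A\|\,\|A^{-1}\|\,\|\delta A\|}{\|A\|}\|x\|. \tag{6.6.10}
$$

Since

$$
\|A^{-1}\|\,\|\delta A\| < 1,
$$

the expression in parentheses on the left-hand side is positive. We can thus divide both sides of the inequality by this number without changing the inequality. After this, if we also divide by $\|x\|$, we obtain

$$
\frac{\|\delta x\|}{\|x\|} \le \frac{\|A\|\,\|A^{-1}\|\dfrac{\|\delta A\|}{\|A\|}}{1 - \|A\|\,\|A^{-1}\|\dfrac{\|\delta A\|}{\|A\|}}
= \mathrm{Cond}(A)\frac{\|\delta A\|}{\|A\|} \Big/ \left(1 - \mathrm{Cond}(A)\frac{\|\delta A\|}{\|A\|}\right), \tag{6.6.11}
$$

which proves the theorem.

Remarks: Because of the assumption that $\|\delta A\| < \dfrac{1}{\|A^{-1}\|}$ (which is quite reasonable to assume), the denominator on the right-hand side of the inequality in Theorem 6.6.2 is less than one. Thus even if $\dfrac{\|\delta A\|}{\|A\|}$ is small, there could be a drastic change in the solution if $\mathrm{Cond}(A)$ is large.

Example 6.6.3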
Consider the previous example once more. Change $a_{23}$ to 2.0001; keep $b$ fixed. Thus

$$
\delta A = -10^{-4}\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{(small)}.
$$

Now solve the system $(A + \delta A)x' = b$:

$$
x' = \begin{pmatrix} -1.0002 \\ 2.0002 \\ 0.9998 \end{pmatrix}, \qquad
\delta x = x' - x = \begin{pmatrix} -2.0002 \\ 1.0002 \\ -0.0003 \end{pmatrix}
$$

Relative Error $= \dfrac{\|\delta x\|}{\|x\|} = 1.2911$ (quite large).
6.6.3 Effect of Perturbations in both the matrix A and the vector b
Finally, we assume now that both the input data A and b have impurities. As a result we have the system with $A + \delta A$ as the matrix and $b + \delta b$ as the right-hand side vector.
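Theorem 6.6.3 (General Perturbation Theorem) Assume that A is nonsingular, $b \neq 0$, and $\|\delta A\| < \frac{1}{\|A^{-1}\|}$. Then
\[ \frac{\|\delta x\|}{\|x\|} \le \left( \frac{\mathrm{Cond}(A)}{1 - \mathrm{Cond}(A)\,\frac{\|\delta A\|}{\|A\|}} \right) \left( \frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|} \right). \]
Proof. Subtracting
\[ Ax = b \]
from
\[ (A + \delta A)(x + \delta x) = b + \delta b \]
we have
\[ (A + \delta A)(x + \delta x) - Ax = \delta b \]
or
\[ (A + \delta A)(x + \delta x) - (A + \delta A)x + (A + \delta A)x - Ax = \delta b \]
or
\[ (A + \delta A)\,\delta x + \delta A\, x = \delta b \]
or
\[ A\bigl(I - A^{-1}(-\delta A)\bigr)\,\delta x = \delta b - \delta A\, x. \tag{6.6.12} \]
Let $A^{-1}(-\delta A) = F$. Then
\[ \|F\| = \|A^{-1}(-\delta A)\| \le \|A^{-1}\|\,\|\delta A\| < 1 \quad \text{(by assumption)}. \]
Since $\|F\| < 1$, $I - F$ is invertible, and
\[ \|(I - F)^{-1}\| \le \frac{1}{1 - \|F\|} \]
(see Chapter 1, Section 1.7, Theorem 1.7.7). From (6.6.12), we then have
\[ \delta x = (I - F)^{-1} A^{-1} (\delta b - \delta A\, x) \tag{6.6.13} \]
or
\[ \|\delta x\| \le \frac{\|A^{-1}\|}{1 - \|F\|}\,\bigl(\|\delta b\| + \|\delta A\|\,\|x\|\bigr) \]
or
\[ \frac{\|\delta x\|}{\|x\|} \le \frac{\|A^{-1}\|}{1 - \|F\|} \left( \frac{\|\delta b\|}{\|x\|} + \|\delta A\| \right) \le \frac{\|A^{-1}\|}{1 - \|F\|} \left( \frac{\|\delta b\|\,\|A\|}{\|b\|} + \|\delta A\| \right). \tag{6.6.14} \]
(Note that $\frac{1}{\|x\|} \le \frac{\|A\|}{\|b\|}$, since $\|b\| = \|Ax\| \le \|A\|\,\|x\|$.) That is,
\[ \frac{\|\delta x\|}{\|x\|} \le \frac{\|A^{-1}\|\,\|A\|}{1 - \|F\|} \left( \frac{\|\delta b\|}{\|b\|} + \frac{\|\delta A\|}{\|A\|} \right). \tag{6.6.15} \]
Again,
\[ \|F\| = \|A^{-1}(-\delta A)\| \le \|A^{-1}\|\,\|\delta A\| = \frac{\|A^{-1}\|\,\|A\|}{\|A\|}\,\|\delta A\|. \tag{6.6.16} \]
Since $\|F\| < 1$, we can write from (6.6.15) and (6.6.16)
\[ \frac{\|\delta x\|}{\|x\|} \le \left( \frac{\|A^{-1}\|\,\|A\|}{1 - \frac{\|A^{-1}\|\,\|A\|}{\|A\|}\,\|\delta A\|} \right) \left( \frac{\|\delta b\|}{\|b\|} + \frac{\|\delta A\|}{\|A\|} \right) = \left( \frac{\mathrm{Cond}(A)}{1 - \mathrm{Cond}(A)\,\frac{\|\delta A\|}{\|A\|}} \right) \left( \frac{\|\delta b\|}{\|b\|} + \frac{\|\delta A\|}{\|A\|} \right). \tag{6.6.17} \]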
Remarks: We again see from (6.6.17) that even if the perturbations $\frac{\|\delta b\|}{\|b\|}$ and $\frac{\|\delta A\|}{\|A\|}$ are small, there might be a drastic change in the solution if Cond(A) is large. Thus, Cond(A) plays the crucial role in the sensitivity of the solution.
Definition 6.6.1 Let A be a nonsingular matrix. Then $\mathrm{Cond}(A) = \|A\|\,\|A^{-1}\|$.
A Convention
Unless otherwise stated, when we write Cond(A), we will mean $\mathrm{Cond}_2(A)$, that is, the condition number with respect to the 2-norm. The condition number of a matrix A with respect to a subordinate p-norm ($p = 1, 2, \infty$) will be denoted by $\mathrm{Cond}_p(A)$; that is, $\mathrm{Cond}_1(A)$ will stand for the condition number of A with respect to the 1-norm, etc.
6.7 The Condition Number and Accuracy of Solution
The following are some important (but easy to prove) properties of the condition number of a matrix.
I. Cond(A) with respect to any p-norm is at least 1.
II. If A is an orthogonal matrix, then Cond(A) with respect to the 2-norm is 1. (Note that this property of an orthogonal matrix A makes the matrix so attractive for its use in numerical computations.)
III. $\mathrm{Cond}(A^T A) = (\mathrm{Cond}(A))^2$.
IV. $\mathrm{Cond}(A) = \mathrm{Cond}(A^T)$.
V. $\mathrm{Cond}(AB) \le \mathrm{Cond}(A)\,\mathrm{Cond}(B)$.
VI. $\mathrm{Cond}(\alpha A) = \mathrm{Cond}(A)$, where $\alpha$ is a nonzero scalar.
VII. $\mathrm{Cond}(A) \ge |\lambda_1| / |\lambda_n|$, where $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$ and $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of A.
VIII. $\mathrm{Cond}_2(A) = \sigma_1 / \sigma_n$, where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ are the singular values of A.
We now formally define ill-conditioning and well-conditioning in terms of the condition number.
Definition 6.7.1 The system Ax = b is ill-conditioned if Cond(A) is quite large. Otherwise, it is well-conditioned.
Remarks: Though the condition number, as defined above, is norm-dependent, the condition numbers with respect to two different norms are related (see Golub and Van Loan MC, 1984, p. 26). (For example, it can be shown that if A is an $n \times n$ matrix, then $\frac{1}{n} \le \frac{\mathrm{Cond}_2(A)}{\mathrm{Cond}_1(A)} \le n$.) In general, if a matrix is well-conditioned or ill-conditioned with respect to one norm, it is also well-conditioned or ill-conditioned with respect to other norms.
Example 6.7.1
(a) Consider
\[ A = \begin{pmatrix} 1 & 0.9999 \\ 0.9999 & 1 \end{pmatrix}, \qquad A^{-1} = 10^3 \begin{pmatrix} 5.0003 & -4.9997 \\ -4.9997 & 5.0003 \end{pmatrix}. \]
1. The condition numbers with respect to the infinity norm and the 1-norm:
\[ \|A\|_\infty = \|A\|_1 = 1.9999, \qquad \|A^{-1}\|_\infty = \|A^{-1}\|_1 = 10^4, \qquad \mathrm{Cond}_\infty(A) = \mathrm{Cond}_1(A) = 1.9999 \times 10^4. \]
2. The condition number with respect to the 2-norm:
\[ \|A\|_2 = \sqrt{\rho(A^T A)} = 1.9999, \qquad \|A^{-1}\|_2 = \sqrt{\rho\bigl((A^{-1})^T A^{-1}\bigr)} = 10^4, \qquad \mathrm{Cond}_2(A) = 1.9999 \times 10^4. \]
Remark: For the above example, the condition numbers with respect to the 1-norm, the 2-norm and the infinity norm turned out to be the same. This is not always the case; in general, however, they are closely related. (See below the condition number of the Hilbert matrix with respect to different norms.)
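The norms in Example 6.7.1 are easy to recompute. The sketch below, in plain Python, uses the explicit 2x2 inverse formula; the helper names `inverse2`, `norm_inf` and `norm_1` are illustrative assumptions.

```python
def inverse2(A):
    """Explicit inverse of a nonsingular 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def norm_inf(A):
    """Infinity norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in A)

def norm_1(A):
    """1-norm: maximum absolute column sum."""
    n = len(A)
    return max(sum(abs(A[i][j]) for i in range(n)) for j in range(len(A[0])))

A = [[1.0, 0.9999], [0.9999, 1.0]]
A_inv = inverse2(A)

cond_inf = norm_inf(A) * norm_inf(A_inv)
cond_1 = norm_1(A) * norm_1(A_inv)
print(cond_inf, cond_1)
```

Because A is symmetric, its row sums and column sums coincide, which is why the two condition numbers agree here.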
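6.7.1 Some Well-known Ill-conditioned Matrices
1. The Hilbert Matrix
\[ A = \begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{3} & \cdots & \frac{1}{n} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \cdots & \frac{1}{n+1} \\ \vdots & \vdots & \vdots & & \vdots \\ \frac{1}{n} & \frac{1}{n+1} & \cdots & \cdots & \frac{1}{2n-1} \end{pmatrix} \]
For n = 10,
\[ \mathrm{Cond}_2(A) = 1.6025 \times 10^{13}, \qquad \mathrm{Cond}_\infty(A) = 3.5353 \times 10^{13}, \qquad \mathrm{Cond}_1(A) = 3.5353 \times 10^{13}. \]
2. The Pei matrix A with $a_{ii} = \alpha$, $a_{ij} = 1$ for $i \neq j$. The matrix becomes ill-conditioned when $\alpha$ is close to 1 or to $1 - n$ (its eigenvalues are $\alpha - 1$ and $\alpha + n - 1$). For example, when $\alpha = 0.9999$ and n = 5, $\mathrm{Cond}(A) = 5 \times 10^4$.
3. The Wilkinson bidiagonal matrix of order 20 (see Chapter 3):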
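The growth of the condition number of the Hilbert matrix can be observed directly. The following sketch computes $\mathrm{Cond}_1$ in exact rational arithmetic, so that rounding error does not contaminate the result; the function names are illustrative assumptions, and exact Gauss-Jordan inversion is used only because the matrices are small.

```python
from fractions import Fraction

def hilbert(n):
    """n x n Hilbert matrix with exact rational entries 1/(i+j-1)."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]

def inverse_exact(A):
    """Gauss-Jordan inversion in exact rational arithmetic."""
    n = len(A)
    M = [row[:] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [row[n:] for row in M]

def norm_1(A):
    """1-norm: maximum absolute column sum."""
    n = len(A)
    return max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))

def cond_1(A):
    return norm_1(A) * norm_1(inverse_exact(A))

for n in (3, 5, 10):
    print(n, float(cond_1(hilbert(n))))
```

Even for n = 10 the condition number is of the order $10^{13}$, so a solution computed in double precision can be expected to retain only a few correct digits.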
6.7.2 Effect of the Condition Number on Accuracy of the Computed Solution
Once a solution $\hat{x}$ of the system Ax = b has been computed, it is natural to test how accurate the computed solution $\hat{x}$ is. If the exact solution x is known, then one could of course compute the relative error $\frac{\|x - \hat{x}\|}{\|x\|}$ to test $\hat{x}$. However, in most practical situations, the exact solution is not known. In such cases, the most obvious thing to do is to compute the residual $r = b - A\hat{x}$ and see how small the relative residual $\frac{\|r\|}{\|b\|}$ is. Interestingly, we should note that the solution obtained by the Gaussian elimination process in general produces a small residual. (WHY?) Unfortunately, a small relative residual does not guarantee the accuracy of the solution. The following example illustrates this fact.
Example 6.7.2
Let
\[ A = \begin{pmatrix} 1.0001 & 1 \\ 1 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 2.0001 \\ 2 \end{pmatrix}. \]
Let
\[ \hat{x} = \begin{pmatrix} 0 \\ 2 \end{pmatrix}. \]
Then
\[ r = b - A\hat{x} = \begin{pmatrix} 0.0001 \\ 0 \end{pmatrix}. \]
Note that r is small. However, the vector $\hat{x}$ is nowhere close to the exact solution $x = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
The above phenomenon can be explained mathematically from the following theorem. The proof can be easily worked out.
Theorem 6.7.1 (Residual Theorem)
\[ \frac{\|x - \hat{x}\|}{\|x\|} \le \mathrm{Cond}(A)\,\frac{\|r\|}{\|b\|}. \]
Interpretation of Theorem 6.7.1
Theorem 6.7.1 tells us that the relative error in $\hat{x}$ depends not only on the relative residual but also on the condition number of the matrix A. A computed solution can be guaranteed to be accurate only when the product of Cond(A) and the relative residual is small. Note that in the above example, $\mathrm{Cond}(A) = 4.0002 \times 10^4$ (large!).
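The numbers in Example 6.7.2 can be checked against the Residual Theorem. The sketch below is a plain-Python illustration in the 2-norm; since the matrix of the example is symmetric positive definite, its 2-norm condition number is computed as the ratio of its two eigenvalues, which is an assumption specific to this example and not a general recipe.

```python
import math

def norm2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(t * t for t in v))

A = [[1.0001, 1.0], [1.0, 1.0]]
b = [2.0001, 2.0]
x_exact = [1.0, 1.0]
x_hat = [0.0, 2.0]                 # the "computed" solution of the example

# Residual r = b - A x_hat and the two relative quantities.
r = [b[i] - sum(A[i][j] * x_hat[j] for j in range(2)) for i in range(2)]
rel_residual = norm2(r) / norm2(b)
rel_error = norm2([p - q for p, q in zip(x_exact, x_hat)]) / norm2(x_exact)

# A is symmetric positive definite, so Cond_2(A) = lambda_max / lambda_min.
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
lam_max = (trace + math.sqrt(trace * trace - 4.0 * det)) / 2.0
lam_min = det / lam_max            # avoids cancellation in the small root
cond2 = lam_max / lam_min

print(rel_residual, rel_error, cond2 * rel_residual)
```

The relative residual is tiny while the relative error is about 1; the bound is still respected only because Cond(A) is large.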
6.7.3 How Large Must the Condition Number be for Ill-Conditioning?
A frequently asked question is: how large does Cond(A) have to be before the system Ax = b is considered to be ill-conditioned? We restate Theorem 6.6.3 to answer the question.
Theorem 6.6.3 (Restatement of Theorem 6.6.3)
\[ \frac{\|\delta x\|}{\|x\|} \le \left( \frac{\mathrm{Cond}(A)}{1 - \mathrm{Cond}(A)\,\frac{\|\delta A\|}{\|A\|}} \right) \left( \frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|} \right). \]
Suppose for simplicity
\[ \frac{\|\delta A\|}{\|A\|} = \frac{\|\delta b\|}{\|b\|} = 10^{-d}. \]
Then $\frac{\|\delta x\|}{\|x\|}$ is approximately less than or equal to $2 \times \mathrm{Cond}(A) \times 10^{-d}$.
This says that if the data have a relative error of $10^{-d}$ and if the relative error in the solution has to be guaranteed to be less than or equal to $10^{-t}$, then Cond(A) has to be less than or equal to $\frac{1}{2} \times 10^{d-t}$. Thus, whether a system is ill-conditioned or well-conditioned depends on the accuracy of the data and how much error in the solution can be tolerated.
For example, suppose that the data have a relative error of about $10^{-5}$ and an accuracy of about $10^{-3}$ is sought; then $\mathrm{Cond}(A) \le \frac{1}{2} \times 10^2 = 50$. On the other hand, if an accuracy of about $10^{-2}$ is sought, then $\mathrm{Cond}(A) \le \frac{1}{2} \times 10^3 = 500$. Thus, in the first case the system will be well-conditioned if Cond(A) is less than or equal to 50, while in the second case the system will be well-conditioned if Cond(A) is less than or equal to 500.
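The rule of thumb above amounts to a one-line computation. The following sketch encodes it; the function name `max_condition_number` is an illustrative assumption, and the result is only the approximate threshold derived above, not a sharp bound.

```python
def max_condition_number(d, t):
    """Largest Cond(A) for which data accurate to 10**-d can still be
    expected to give a solution accurate to 10**-t, using the
    approximation  ||dx||/||x|| <= 2 * Cond(A) * 10**-d."""
    return 0.5 * 10 ** (d - t)

print(max_condition_number(5, 3))  # data error 1e-5, target accuracy 1e-3
print(max_condition_number(5, 2))  # data error 1e-5, target accuracy 1e-2
```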
Estimating Accuracy from the Condition Number
In general, if the data are approximately accurate and if $\mathrm{Cond}(A) = 10^s$, then there will be only about $t - s$ significant digits of accuracy in the computed solution when the solution is computed in t-digit arithmetic. For a better understanding of conditioning, stability and accuracy, we again refer the readers to the paper of Bunch (1987).
6.7.4 The Condition Number and Nearness to Singularity
The condition number also gives an indication of when a matrix A is computationally close to a singular matrix: if Cond(A) is large, A is close to singular. This measure of nearness to singularity is a more accurate measure than the determinant of A. For example, consider the well-known $n \times n$ triangular matrix
\[ A = \begin{pmatrix} 1 & -1 & -1 & \cdots & -1 \\ 0 & 1 & -1 & \cdots & -1 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & -1 \\ 0 & \cdots & \cdots & 0 & 1 \end{pmatrix}. \]
The matrix has determinant equal to 1; however, it is nearly singular for large n:
\[ \mathrm{Cond}_\infty(A) = n\,2^{n-1}. \]
Similarly, the smallness of the determinant of a matrix does not necessarily mean that A is close to a singular matrix. Consider $A = \mathrm{diag}(0.1, 0.1, \ldots, 0.1)$ of order 1000. Then $\det(A) = 10^{-1000}$, which is a small number. However, A is considered to be perfectly nonsingular, because $\mathrm{Cond}_2(A) = 1$.
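The formula $\mathrm{Cond}_\infty(A) = n\,2^{n-1}$ for the triangular matrix above can be confirmed exactly. The sketch below builds the matrix, inverts it by back substitution in rational arithmetic, and compares; the function names are illustrative assumptions.

```python
from fractions import Fraction

def unit_upper_minus_ones(n):
    """Upper triangular matrix with 1 on the diagonal and -1 above it."""
    return [[1 if i == j else (-1 if j > i else 0) for j in range(n)]
            for i in range(n)]

def inverse_upper(T):
    """Invert an upper triangular matrix column by column (back substitution)."""
    n = len(T)
    cols = []
    for j in range(n):
        x = [Fraction(0)] * n
        for i in range(n - 1, -1, -1):
            rhs = Fraction(int(i == j)) - sum(T[i][k] * x[k]
                                              for k in range(i + 1, n))
            x[i] = rhs / T[i][i]
        cols.append(x)
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def norm_inf(A):
    """Infinity norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in A)

n = 10
A = unit_upper_minus_ones(n)
cond_inf = norm_inf(A) * norm_inf(inverse_upper(A))
print(cond_inf, n * 2 ** (n - 1))
```

The determinant is 1 for every n, yet the condition number doubles (roughly) with each added row, which is the point of the example.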
6.7.5 Conditioning and Pivoting
It is natural to wonder if ill-conditioning can be detected during the triangularization process using Gaussian elimination with partial pivoting. By a normalized matrix here we mean that $\|A\|_2 = 1$. Suppose that A and b have been normalized. Then there are certain symptoms for ill-conditioning.
Symptoms for Ill-Conditioning
1. A small pivot,
2. A large computed solution,
3. A large residual vector, etc.
Justification: Suppose there is a small pivot; then M in the triangular factorization of A will be large (see Algorithm 5.2.3 or Algorithm 5.2.4), and this large M will make $A^{-1}$ large. (Note that if partial pivoting is used, then $A^{-1} = U^{-1}M$.) Similarly, if the computed solution $\hat{x}$ is large, then from $A\hat{x} = \hat{b}$ we have $\|\hat{x}\| = \|A^{-1}\hat{b}\| \le \|A^{-1}\|\,\|\hat{b}\|$, showing that $\|A^{-1}\|$ is possibly large. A large $\|A^{-1}\|$, of course, means ill-conditioning, because $\mathrm{Cond}(A) = \|A\|\,\|A^{-1}\|$ will then be large.
Remark: There are matrices which do not have any of these symptoms, but are still ill-conditioned (see Wilkinson AEP, pp. 254-255).
6.7.6 Conditioning and the Eigenvalue Problem
If a normalized matrix A has a small eigenvalue, it must be ill-conditioned. For, if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A, then it can be shown that
\[ \|A^{-1}\|_2 \ge \max |\text{eigenvalue of } A^{-1}| = \frac{1}{\min_i |\lambda_i|}. \]
Thus, a normalized matrix A is ill-conditioned if and only if $\|A^{-1}\|_2$ is large. (See Wilkinson AEP, p. 195.)
Example 6.7.3
Consider the linear system
\[ Ax = b \]
with
\[ A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.00001 & 0 \\ 0 & 0 & 0.00001 \end{pmatrix}, \qquad b = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.1 \end{pmatrix}. \]
The solution is
\[ x = 10^4 \begin{pmatrix} 0.00001 \\ 1 \\ 1 \end{pmatrix}, \]
which is quite large. The eigenvalues of A are 1, 0.00001 and 0.00001. Thus, A has a small eigenvalue.
\[ A^{-1} = 10^5 \begin{pmatrix} 0.00001 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \]
which is large. Thus, for this example (i) the computed solution is large, (ii) A has a small eigenvalue, and (iii) $A^{-1}$ is large. A is, therefore, likely to be ill-conditioned. It is indeed true: $\mathrm{Cond}(A) = 10^5$.
6.7.7 Conditioning and Scaling
In Section 6.4.8 we discussed scaling, and the message there was "scaling is in general recommended if the entries of the matrix A vary widely". Scaling followed by a strategy of pivoting is helpful. One thing that we did not make clear is that scaling has some effect on the condition number of the matrix. For example, consider again the example of Section 6.4.8:
\[ A = \begin{pmatrix} 10 & 10^6 \\ 1 & 1 \end{pmatrix}, \qquad \mathrm{Cond}(A) = 10^6. \]
However, if the first row of A is scaled to obtain
\[ \tilde{A} = \begin{pmatrix} 0.00001 & 1 \\ 1 & 1 \end{pmatrix}, \]
then
\[ \mathrm{Cond}(\tilde{A}) = 2. \]
The question naturally arises, "Given a matrix A, how can one choose the diagonal matrices $D_1$ and $D_2$ such that $\mathrm{Cond}(D_1^{-1} A D_2)$ will be as small as possible?"
There is an (almost) classical solution due to Bauer (1963) for the above problem. Unfortunately the solution is not practical. For example, the infinity-norm solution requires knowing the eigenvalue of maximum modulus and the corresponding eigenvector of the nonnegative matrix $C = |A|\,|A^{-1}|$, and when solving Ax = b we will not know $A^{-1}$ in advance. For details of the method, see Forsythe and Moler (CSLAS, pp. 43-44).
Remark: Demmel (1989) has shown that scaling to improve the condition number is not necessary when solving a symmetric positive definite system using the Cholesky algorithm. The error bound obtained for the solution by the algorithm for the unscaled system Ax = b is almost the same as that of the scaled system with $\tilde{A} = D^{-1} A D^{-1}$, $D = \mathrm{diag}(\sqrt{a_{11}}, \ldots, \sqrt{a_{nn}})$.
6.7.8 Computing and Estimating the Condition Number
The obvious way to compute the condition number will be to compute it from its definition:
1. Compute $A^{-1}$.
2. Compute $\|A\|$ and $\|A^{-1}\|$ and multiply them.
We have seen that computing the inverse of A requires $n^3$ flops. Thus, this approach is three times the expense of finding the solution of Ax = b itself. On the other hand, to compute Cond(A), we only need to know $\|A^{-1}\|$, not the inverse itself. Furthermore, the exact value of Cond(A) itself is seldom needed; an estimate is sufficient. The question, therefore, arises whether we can get a reasonable estimate of $\|A^{-1}\|$ without computing the inverse of A explicitly. In this context, we note that if y is any nonzero n-vector, then from
\[ Az = y \]
we have
\[ \|z\| = \|A^{-1}y\| \le \|A^{-1}\|\,\|y\|. \]
Then,
\[ \|A^{-1}\| \ge \frac{\|z\|}{\|y\|}, \qquad y \neq 0. \]
Thus, if we choose y such that $\frac{\|z\|}{\|y\|}$ is quite large, we could have a reasonably good estimate of $\|A^{-1}\|$. Rice (MCMS, p. 93) remarks: There is a heuristic argument which says that if y is picked at random, then the expected value of $\frac{\|z\|}{\|y\|}$ is about $\frac{1}{2}\|A^{-1}\|$.
A systematic way to choose y has been given by the LINPACK Condition Number Estimator (LINPACK (1979)). It is based on an algorithm by Cline, Moler, Stewart and Wilkinson (1979). The process involves solving two systems of equations
\[ A^T y = e \]
and
\[ Az = y, \]
where e is a scalar multiple of a vector with components $\pm 1$ chosen in such a way that the possible growth is maximum. To avoid overflow, the LINPACK condition estimator routine SGECO actually computes an estimate of $\frac{1}{\mathrm{Cond}(A)}$, called RCOND. The procedure for finding RCOND, therefore, can be stated as follows:
1. Compute $\|A\|$.
2. Solve $A^T y = e$ and $Az = y$, choosing e such that the growth is maximum (see LINPACK (1979) for the details of how to choose e).
3. $\mathrm{RCOND} = \frac{\|y\|}{\|A\|\,\|z\|}$.
Flop-count. Once A has been triangularized to solve a linear system involving A, the actual cost of estimating Cond(A) using the above procedure is quite cheap. The same triangularization can be used to solve both the systems in step 2. Also, the $\ell_1$ vector norm can be used so that the subordinate matrix norm can be computed from the columns of the matrix A. The process of estimating Cond(A) in this way requires only $O(n^2)$ flops.
Round-off error. According to LINPACK (1979), ignoring the effects of round-off error, it can be proved that
\[ \frac{1}{\mathrm{RCOND}} \le \mathrm{Cond}(A). \]
In the presence of round-off error, if the computed RCOND is not zero, $\frac{1}{\mathrm{RCOND}}$ is almost always a lower bound for the true condition number.
An Optimization Technique for Estimating $\|A^{-1}\|_1$
Hager (1984) has proposed a method for estimating $\|A^{-1}\|$ based on an optimization technique. This technique seems to be quite suitable for randomly generated matrices. Let $A^{-1} = B = (b_{ij})$. Define a function f(x):
\[ f(x) = \|Bx\|_1 = \sum_{i=1}^{n} \Bigl| \sum_{j=1}^{n} b_{ij} x_j \Bigr|. \]
Then
\[ \|B\|_1 = \|A^{-1}\|_1 = \max\{ f(x) : \|x\|_1 = 1 \}. \]
Thus, the problem is to find the maximum of the convex function f over the convex set
\[ S = \{ x \in \mathbb{R}^n : \|x\|_1 \le 1 \}. \]
It is well known that the maximum of a convex function is attained at an extreme point. Hager's method consists in finding this maximum systematically. We present the algorithm below (for details see Hager (1984)). Hager remarks that the algorithm usually stops after two iterations. An excellent survey of different condition number estimators, including Hager's, and their performances has been given by Higham (1987).
Algorithm 6.7.1 Hager's norm-1 condition number estimator
Set $\rho = \|A^{-1}\|_1 = 0$.
Set $b = \left( \frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n} \right)^T$.
1. Solve Ax = b.
2. Test if $\|x\|_1 \le \rho$. If so, go to step 6. Otherwise set $\rho = \|x\|_1$ and go to step 3.
3. Solve $A^T z = y$, where
\[ y_i = 1 \ \text{if } x_i \ge 0, \qquad y_i = -1 \ \text{if } x_i < 0. \]
4. Set $j = \arg\max \{ |z_i|, \ i = 1 \text{ to } n \}$.
5. If $|z_j| > z^T b$, update
\[ b \equiv e_j = (0, \ldots, 0, 1, 0, \ldots, 0)^T \quad \text{(1 in the jth entry)} \]
and return to step 1. Else go to step 6.
6. Set $\|A^{-1}\|_1 \approx \rho$. Then $\mathrm{Cond}_1(A) \approx \rho\,\|A\|_1$.
William Hager is a professor of mathematics at University of Florida. He is the author of the book Applied Numerical Linear Algebra.
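The steps of Algorithm 6.7.1 translate almost line by line into code. The following is a minimal sketch in plain Python, not the implementation from Hager (1984): the dense solver `solve`, the function name `hager_inv_norm1`, and the iteration cap `max_iter` are illustrative assumptions. A practical version would factor A once and reuse the factors for both A and its transpose.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting on copies of A and b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def hager_inv_norm1(A, max_iter=10):
    """Estimate ||A^{-1}||_1 following the steps of Algorithm 6.7.1."""
    n = len(A)
    At = [[A[j][i] for j in range(n)] for i in range(n)]
    rho = 0.0
    b = [1.0 / n] * n
    for _ in range(max_iter):              # safety cap (an assumption)
        x = solve(A, b)                    # step 1
        x_norm = sum(abs(v) for v in x)
        if x_norm <= rho:                  # step 2
            break
        rho = x_norm
        y = [1.0 if v >= 0 else -1.0 for v in x]          # step 3
        z = solve(At, y)
        j = max(range(n), key=lambda i: abs(z[i]))        # step 4
        if abs(z[j]) <= sum(zi * bi for zi, bi in zip(z, b)):  # step 5
            break
        b = [1.0 if i == j else 0.0 for i in range(n)]
    return rho                             # step 6

A = [[1.0, 2.0], [3.0, 4.0]]
estimate = hager_inv_norm1(A)
norm1_A = max(sum(abs(A[i][j]) for i in range(2)) for j in range(2))
print(estimate, estimate * norm1_A)
```

For this small matrix the inverse is known in closed form, with 1-norm 3.5, so the estimate can be compared with the exact value.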
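Example 6.7.4
We illustrate Hager's method by means of a very ill-conditioned matrix.
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 4 & 5 \\ 6 & 7 & 8 \end{pmatrix}, \qquad \mathrm{Cond}(A) = 3.3819 \times 10^{16}. \]
Iteration 1:
\[ b = \begin{pmatrix} \frac{1}{3} \\ \frac{1}{3} \\ \frac{1}{3} \end{pmatrix}, \qquad x = \begin{pmatrix} 1.0895 \\ -2.5123 \\ 1.4228 \end{pmatrix}, \qquad \rho = 5.0245, \]
\[ y = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}, \qquad z = 10^{16} \begin{pmatrix} 2.0271 \\ -3.3785 \\ 1.3514 \end{pmatrix}, \qquad j = 2, \]
\[ |z_2| = 10^{16}(3.3785) > z^T b = -1.3340. \]
Update
\[ b \equiv \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}. \]
Iteration 2:
\[ b = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \qquad x = 10^{17} \begin{pmatrix} -1.3564 \\ 2.7128 \\ -1.3564 \end{pmatrix}, \qquad \|x\|_1 = 5.4255 \times 10^{17}. \]
Since $\|x\|_1 > \rho$, we set $\rho = 5.4255 \times 10^{17}$.
Comment. It turns out that this current value of $\rho$ is an excellent estimate of $\|A^{-1}\|_1$.
Condition Estimation from Triangularization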
If one uses the QR factorization to solve a linear system or the Cholesky factorization to solve a symmetric positive definite system, then as a byproduct of the triangularization, one can obtain an upper bound of the condition number with just a little additional cost. If QR factorization with column pivoting is used, then from
\[ Q^T A P = \begin{pmatrix} R \\ 0 \end{pmatrix} \]
we have $\mathrm{Cond}_2(A) = \mathrm{Cond}_2(R)$. If the Cholesky factorization is used, then from
\[ A = HH^T \]
we have $\mathrm{Cond}_2(A) = (\mathrm{Cond}_2(H))^2$. Thus, $\mathrm{Cond}_2(A)$ can be determined if $\mathrm{Cond}_2(R)$ or $\mathrm{Cond}_2(H)$ is known. Since $\|R\|_2$ or $\|H\|_2$ is easily computed, all that is needed is an algorithm to estimate $\|R^{-1}\|_2$ or $\|H^{-1}\|_2$. There are several algorithms for estimating the norms of the inverses of triangular matrices. We just state one from a paper of Higham (1987). For details, see Higham (1987).
Algorithm 6.7.2 Condition Estimation of an Upper Triangular Matrix
Given a nonsingular upper triangular matrix $T = (t_{ij})$ of order n, the following algorithm computes CE such that $\|T^{-1}\|_\infty \le CE$.
1. Set $z_n = \frac{1}{|t_{nn}|}$.
2. For $i = n-1$ to 1 do
   $s \equiv 1$
   $s \equiv s + |t_{ij}|\,z_j$ $(j = i+1, \ldots, n)$
   $z_i = \frac{s}{|t_{ii}|}$.
3. Compute $CE = \|z\|_\infty$, where $z = (z_1, z_2, \ldots, z_n)^T$.
Flop-count. The algorithm requires $\frac{n^2}{2}$ flops.
Remark: Once $\|T^{-1}\|_\infty$ is estimated by the above algorithm, $\|T^{-1}\|_2$ can be estimated from the relation
\[ \|T^{-1}\|_2 \le \bigl( \|M(T)^{-1}\|_1 \, CE \bigr)^{1/2}, \]
where $M(T) = (m_{ij})$ is defined by
\[ m_{ij} = \begin{cases} |t_{ii}|, & i = j, \\ -|t_{ij}|, & i \neq j. \end{cases} \]
$\|M(T)^{-1}\|_1$ can be estimated by using Hager's algorithm described in the last section.
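The recurrence of Algorithm 6.7.2 is short enough to sketch directly. The code below is an illustration in plain Python; the function name `triangular_inverse_bound` is an assumption, and absolute values are taken of the diagonal entries so that the result is an upper bound for $\|T^{-1}\|_\infty$.

```python
def triangular_inverse_bound(T):
    """Upper bound CE >= ||T^{-1}||_inf for a nonsingular upper
    triangular matrix T, via the recurrence of Algorithm 6.7.2."""
    n = len(T)
    z = [0.0] * n
    z[n - 1] = 1.0 / abs(T[n - 1][n - 1])            # step 1
    for i in range(n - 2, -1, -1):                   # step 2
        s = 1.0 + sum(abs(T[i][j]) * z[j] for j in range(i + 1, n))
        z[i] = s / abs(T[i][i])
    return max(z)                                    # step 3: ||z||_inf

T = [[2.0, 1.0],
     [0.0, 4.0]]
CE = triangular_inverse_bound(T)
print(CE)
```

For this T the exact inverse is [[0.5, -0.125], [0, 0.25]], whose infinity norm is 0.625, so the bound happens to be attained; in general it can overestimate. The double loop touches each entry above the diagonal once, which is where the $n^2/2$ flop-count comes from.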
6.8 Componentwise Perturbations and the Errors
If the componentwise bounds of the perturbations are known, then the following perturbation result obtained by Skeel (1979) holds.
Theorem 6.8.1 (Skeel (1979)) Let Ax = b and $(A + \delta A)(x + \delta x) = b + \delta b$. Let $|\delta A| \le \epsilon |A|$ and $|\delta b| \le \epsilon |b|$. Then,
\[ \frac{\|\delta x\|}{\|x\|} \le \epsilon\, \frac{\bigl\|\, |A^{-1}|\,|A|\,|x| + |A^{-1}|\,|b| \,\bigr\|}{\bigl(1 - \epsilon\, \|\, |A^{-1}|\,|A| \,\|\bigr)\,\|x\|}. \]
Robert Skeel is a professor of Computer Science at the University of Illinois at Urbana-Champaign.
Definition 6.8.1 We shall call the number $\mathrm{Cond}(A, x) = \frac{\|\, |A^{-1}|\,|A|\,|x| \,\|}{\|x\|}$ Skeel's condition number, and $\mathrm{Cond}_s(A) = \|\, |A^{-1}|\,|A| \,\|$ the upper bound of Skeel's condition number.
An important property of Cond(A, x): Skeel's condition number is invariant under row-scaling. It can, therefore, be much smaller than the usual condition number Cond(A).
Cond(A, x) is useful when the column norms of $A^{-1}$ vary widely. Chandrasekaran and Ipsen (1994) have recently given an analysis of how the individual components of the solution vector x are affected when the data are perturbed. Their analysis can effectively be combined with Skeel's result above when the componentwise perturbations of the data are known.
Theorem 6.8.2 (Componentwise Perturbation Theorem) Let $(A + \delta A)(x + \delta x) = b$ and $|\delta A| \le \epsilon |A|$. Then,
\[ \frac{|\delta x_i|}{|x_i|} \le \epsilon\, \frac{|r_i^T|\,|A|\,|x + \delta x|}{|x_i|}, \]
where $r_i^T = e_i^T A^{-1}$.
Thus, the componentwise perturbations in the error expressions have led to the componentwise version of Skeel's condition number. Similar results also hold for right-hand side perturbation. For details see Chandrasekaran and Ipsen (1994).
6.9 Iterative Refinement
Suppose a computed solution $\hat{x}$ of the system Ax = b is not acceptable. It is then natural to wonder if $\hat{x}$ can be refined cheaply by making use of the triangularization of the matrix A already available at hand. The following process, known as iterative refinement, can be used to refine $\hat{x}$ iteratively up to some desirable accuracy.
Iterative Refinement Algorithm
The process is based on the following simple idea: Let $\hat{x}$ be a computed solution of the system
\[ Ax = b. \]
If $\hat{x}$ were the exact solution, then
\[ r = b - A\hat{x} \]
would be zero. But in practice we shall not expect that. Let us now try to solve the system again with the computed residual $r \ (\neq 0)$, that is, let c satisfy
\[ Ac = r. \]
Then, $y = \hat{x} + c$ is the exact solution of Ax = b, provided that c is the exact solution of Ac = r, because
\[ Ay = A(\hat{x} + c) = A\hat{x} + Ac = b - r + r = b. \]
It is true that c again will not be an exact solution of Ac = r in practice; however, the above discussion suggests that y might be a better approximation than $\hat{x}$. If so, we can continue the process until a desired accuracy is achieved.
Algorithm 6.9.1 Iterative Refinement
Set $x^{(0)} = \hat{x}$.
For k = 0, 1, 2, ... do
1. Compute the residual vector $r^{(k)}$:
\[ r^{(k)} = b - Ax^{(k)}. \]
2. Calculate the correction vector $c^{(k)}$ by solving the system
\[ Ac^{(k)} = r^{(k)} \]
using the triangularization of A obtained to get the computed solution $\hat{x}$.
3. Form $x^{(k+1)} = x^{(k)} + c^{(k)}$.
4. If $\frac{\|x^{(k+1)} - x^{(k)}\|_2}{\|x^{(k)}\|_2}$ is less than a prescribed tolerance $\epsilon$, stop.
Remark: If the system is well-conditioned, then iterative refinement using Gaussian elimination with pivoting will ultimately produce a very accurate solution.
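Algorithm 6.9.1 can be sketched as follows. To make the effect visible in double precision, the sketch uses a deliberately inaccurate solver, `sloppy_back_substitute`, which rounds every computed component to three significant digits; this solver, the function names, and the small upper triangular test system are all illustrative assumptions. In practice the solver would reuse the LU factors of A.

```python
import math

def round_sig(v, digits=3):
    """Round v to the given number of significant digits."""
    return float(f"{v:.{digits}g}")

def sloppy_back_substitute(U, rhs):
    """Back substitution for upper triangular U, rounding each component
    to 3 significant digits to mimic a low-precision solve."""
    n = len(rhs)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = round_sig((rhs[i] - s) / U[i][i])
    return x

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

def refine(A, b, x0, solve, tol=1e-12, max_iter=20):
    """Iterative refinement: repeat r = b - A x, solve A c = r, x = x + c."""
    n = len(b)
    x = list(x0)
    for _ in range(max_iter):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        c = solve(A, r)
        x_new = [xi + ci for xi, ci in zip(x, c)]
        change = norm2([p - q for p, q in zip(x_new, x)]) / norm2(x)
        x = x_new
        if change < tol:
            break
    return x

A = [[3.0, 1.0], [0.0, 7.0]]
b = [1.0, 1.0]
x0 = sloppy_back_substitute(A, b)      # about 3 correct digits
x = refine(A, b, x0, sloppy_back_substitute)
print(x0, x)                           # exact solution is (2/7, 1/7)
```

Each pass through the loop recovers roughly as many additional digits as the solver delivers, which matches the remark that a well-conditioned system is refined to high accuracy.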
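Example 6.9.1
\[ A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{pmatrix}, \qquad b = \begin{pmatrix} 0.0001 \\ 0.0001 \\ -1.6666 \end{pmatrix}. \]
The exact solution is
\[ x = \begin{pmatrix} -0.2777 \\ 0.2778 \\ -0.5555 \end{pmatrix} \quad \text{(correct up to four figures)}. \]
Let
\[ x^{(0)} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \]
k = 0:
\[ r^{(0)} = b - Ax^{(0)} = \begin{pmatrix} -1.9999 \\ -2.9999 \\ -4.6666 \end{pmatrix}. \]
The solution of $Ac^{(0)} = r^{(0)}$ is
\[ c^{(0)} = \begin{pmatrix} -1.2777 \\ -0.7222 \\ -1.5555 \end{pmatrix}, \qquad x^{(1)} = x^{(0)} + c^{(0)} = \begin{pmatrix} -0.2777 \\ 0.2778 \\ -0.5555 \end{pmatrix}. \]
Note that Cond(A) = 3.8078. A is well-conditioned.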
Accuracy Obtained by Iterative Refinement
Suppose that the iteration converges. Then the error at the (k+1)th step will be less than the error at the kth step.
Relative Accuracy from Iterative Refinement
Let
\[ \frac{\|x - x^{(k+1)}\|}{\|x\|} \le c\, \frac{\|x - x^{(k)}\|}{\|x\|}. \]
Then if $c \approx 10^{-s}$, there will be a gain of approximately s figures per iteration.
Flop-count. The procedure is quite cheap. Since A has already been triangularized to solve the original system Ax = b, each iteration requires only $O(n^2)$ flops.
Remarks: Iterative refinement is a very useful technique. Gaussian elimination with partial pivoting followed by iterative refinement is the most practical approach for solving a linear system accurately. Skeel (1979) has shown that in most cases even one step of iterative refinement is sufficient.
Example 6.9.2 (Stewart IMC, p. 205)
\[ A = \begin{pmatrix} 7 & 6.990 \\ 4 & 4 \end{pmatrix}, \qquad b = \begin{pmatrix} 34.97 \\ 20.00 \end{pmatrix}, \qquad \mathrm{Cond}_2(A) = 3.2465 \times 10^3. \]
The exact solution is $x = \begin{pmatrix} 2 \\ 3 \end{pmatrix}$. Let $x^{(0)}$ be
\[ x^{(0)} = \begin{pmatrix} 1.667 \\ 3.333 \end{pmatrix} \]
(obtained by Gaussian elimination without pivoting).
k = 0:
\[ r^{(0)} = b - Ax^{(0)} = \begin{pmatrix} 0.333 \times 10^{-2} \\ 0 \end{pmatrix}. \]
The solution of $Ac^{(0)} = r^{(0)}$ is
\[ c^{(0)} = \begin{pmatrix} 0.3167 \\ -0.3167 \end{pmatrix}, \qquad x^{(1)} = x^{(0)} + c^{(0)} = \begin{pmatrix} 1.9837 \\ 3.0163 \end{pmatrix}. \]
k = 1:
\[ r^{(1)} = b - Ax^{(1)} = \begin{pmatrix} -0.0292 \\ -0.0168 \end{pmatrix}. \]
The solution of $Ac^{(1)} = r^{(1)}$ is
\[ c^{(1)} = \begin{pmatrix} 0.0108 \\ -0.0150 \end{pmatrix}, \qquad x^{(2)} = x^{(1)} + c^{(1)} = \begin{pmatrix} 1.9992 \\ 3.0008 \end{pmatrix}. \]
Iterative Refinement of the Computed Inverse
As in the procedure of refining the computed solution of the system Ax = b, a computed inverse of A can also be refined iteratively. Let $X^{(0)}$ be a computed approximation to the inverse of A. Then the matrices $X^{(k)}$ defined by the iterative procedure
\[ X^{(k+1)} = X^{(k)} + X^{(k)}(I - AX^{(k)}), \qquad k = 0, 1, 2, \ldots \tag{6.9.1} \]
converge to a limit (under certain conditions), and the limit, when it exists, is a better inverse of A. Note the resemblance of the above iteration to the Newton-Raphson method for finding a zero of f(x):
\[ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}. \]
Like the Newton-Raphson method, the iteration (6.9.1) has convergence of order 2. This can be seen as follows:
\[ I - AX^{(k+1)} = I - A\bigl(X^{(k)} + X^{(k)}(I - AX^{(k)})\bigr) = I - AX^{(k)} - AX^{(k)} + (AX^{(k)})^2 = (I - AX^{(k)})^2. \]
Continuing k times, we get
\[ I - AX_{k+1} = (I - AX_0)^{2^{k+1}}, \]
from where we conclude that if $\|I - AX_0\| = \epsilon < 1$, then the iteration converges to a limit, because in this case $\|I - AX_{k+1}\| \le \epsilon^{2^{k+1}}$. A necessary and sufficient condition is that $\rho(I - AX_0) < 1$.
We summarize the above discussion as follows:
Theorem 6.9.1 The sequence of matrices $\{X_k\}$ defined by
\[ X_{k+1} = X_k + X_k(I - AX_k), \qquad k = 0, 1, 2, \ldots \]
converges to the inverse of the matrix A for an initial approximation $X_0$ of the inverse if and only if
\[ \rho(I - AX_0) < 1. \]
A sufficient condition for the convergence is that
\[ \|I - AX_0\| = \epsilon < 1. \]
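The iteration $X_{k+1} = X_k + X_k(I - AX_k)$ is easy to try on a small matrix. The sketch below is a plain-Python illustration; the helper names and the test matrix are assumptions, and the starting guess is chosen so that $\rho(I - AX_0) = 0.15 < 1$.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def residual(A, X):
    """Return I - A X."""
    n = len(A)
    AX = matmul(A, X)
    return [[(1.0 if i == j else 0.0) - AX[i][j] for j in range(n)]
            for i in range(n)]

def refine_inverse(A, X0, steps):
    """Apply X <- X + X (I - A X) the given number of times."""
    n = len(A)
    X = [row[:] for row in X0]
    for _ in range(steps):
        XR = matmul(X, residual(A, X))
        X = [[X[i][j] + XR[i][j] for j in range(n)] for i in range(n)]
    return X

A = [[4.0, 1.0], [2.0, 3.0]]
X0 = [[0.3, -0.1], [-0.2, 0.35]]       # exact inverse has 0.4 in place of 0.35

for steps in range(5):
    R = residual(A, refine_inverse(A, X0, steps))
    print(steps, max(abs(v) for row in R for v in row))
```

The printed residuals shrink roughly like the square of the previous value, which is the order-2 convergence derived above.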
Example 6.9.3
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 7 & 6 & 8 \end{pmatrix}, \qquad A^{-1} = \begin{pmatrix} 0 & -0.6667 & 0.3333 \\ -4 & 4.3333 & -0.6667 \\ 3 & -2.6667 & 0.3333 \end{pmatrix} \quad \text{(in four-digit arithmetic)}. \]
Let us take
\[ X_0 = \begin{pmatrix} 0 & -0.6660 & 0.3333 \\ -3.996 & 4.3290 & -0.6660 \\ 2.9970 & -2.6640 & 0.3330 \end{pmatrix}; \]
then
\[ \rho(I - AX_0) = 0.001 < 1. \]
(Note that the eigenvalues of $I - AX_0$ are 0.001, 0.001, 0.001.)
\[ X_1 = X_0 + X_0(I - AX_0) = \begin{pmatrix} 0 & -0.6667 & 0.3333 \\ -4 & 4.3333 & -0.6667 \\ 3 & -2.6667 & 0.3333 \end{pmatrix} \quad \text{(exact up to four significant figures)}. \]
Estimating Cond(A) from Iterative Refinement
A very crude estimate of Cond(A) may be obtained from the iterative refinement procedure (Algorithm 6.9.1). Let k be the number of iterations required for the refinement procedure to converge, and let t be the number of digits used in the arithmetic; then (Rice MCMS, p. 98) a rough estimate of Cond(A) is $10^{t(1 - \frac{1}{k})}$.
Thus, if the iterative refinement procedure converges very slowly, then A is ill-conditioned.
6.10 Iterative Methods
In this section, we study iterative methods, primarily used to solve a very large and sparse linear system Ax = b arising from engineering applications. These include
1. The Jacobi method (Section 6.10.1).
2. The Gauss-Seidel method (Section 6.10.2).
3. The Successive Overrelaxation method (Section 6.10.4).
4. The Conjugate Gradient method with and without preconditioner (Section 6.10.5).
5. The GMRES method (Section 6.10.6).
The Gauss-Seidel method is a modification of the Jacobi method, and is a special case of the Successive Overrelaxation method (SOR). The Conjugate Gradient method is primarily used to solve a symmetric positive definite system. The Jacobi and the Gauss-Seidel methods converge for diagonally dominant matrices and, in addition, the Gauss-Seidel method converges if A is symmetric positive definite. Note that the diagonally dominant and the symmetric positive definite matrices are among the most important classes of matrices arising in practical applications.
The direct methods based on triangularization of the matrix A become prohibitive in terms of computer time and storage if the matrix A is quite large. On the other hand, there are practical situations, such as the discretization of partial differential equations, where the matrix size can be as large as several hundred thousand. For such problems, the direct methods become impractical. For example, if A is of order 10,000 x 10,000, it may take as long as 2-3 days for an IBM 370 to solve the system Ax = b using Gaussian elimination or the orthogonalization techniques of Householder and Givens. Furthermore, most large problems are sparse, and the sparsity gets lost to a considerable extent during the triangularization procedure, so that at the end we have to deal with a very large matrix with too many nonzero entries, and the storage becomes a crucial issue. For such problems, it is advisable to use a class of methods called ITERATIVE METHODS that never alter the matrix A and require the storage of only a few vectors of length n at a time.
Basic Idea
The basic idea behind an iterative method is to first write the system Ax = b in an equivalent form:
\[ x = Bx + d \tag{6.10.1} \]
and then, starting with an initial approximation $x^{(1)}$ of the solution vector x, to generate a sequence of approximations $\{x^{(k)}\}$ iteratively defined by
\[ x^{(k+1)} = Bx^{(k)} + d, \qquad k = 1, 2, \ldots \tag{6.10.2} \]
with a hope that under certain mild conditions, the sequence $\{x^{(k)}\}$ converges to the solution as $k \to \infty$.
To solve the linear system Ax = b iteratively using this idea, we therefore need to know
(a) how to write the system Ax = b in the form (6.10.1), and
(b) how $x^{(1)}$ should be chosen so that the iteration (6.10.2) converges to the limit, or under what sort of assumptions the iteration converges to the limit with any arbitrary choice of $x^{(1)}$.
Stopping Criteria for the Iteration (6.10.2)
It is natural to wonder when the iteration (6.10.2) can be terminated. Since, when convergence occurs, $x^{(k+1)}$ is a better approximation than $x^{(k)}$, a natural stopping criterion will be:
Stopping Criterion 1
I. Stop the iteration (6.10.2) if
\[ \frac{\|x^{(k+1)} - x^{(k)}\|}{\|x^{(k)}\|} < \epsilon \]
for a prescribed small positive number $\epsilon$. ($\epsilon$ should be chosen according to the accuracy desired.)
In cases where the iteration does not seem to converge or the convergence is too slow, one might wish to terminate the iteration after a number of steps. In that case the stopping criterion will be:
Stopping Criterion 2
II. Stop the iteration (6.10.2) as soon as the number of iterations exceeds a prescribed number, say N.
6.10.1 The Jacobi Method
The system Ax = b, or
\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\ \,\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n \end{aligned} \]
can be rewritten (under the assumption that $a_{ii} \neq 0$, $i = 1, \ldots, n$) as:
\[ \begin{aligned} x_1 &= \frac{1}{a_{11}}\,(b_1 - a_{12}x_2 - \cdots - a_{1n}x_n) \\ x_2 &= \frac{1}{a_{22}}\,(b_2 - a_{21}x_1 - a_{23}x_3 - \cdots - a_{2n}x_n) \\ &\ \,\vdots \\ x_n &= \frac{1}{a_{nn}}\,(b_n - a_{n1}x_1 - \cdots - a_{n,n-1}x_{n-1}). \end{aligned} \]
In matrix notation,
\[ \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} 0 & -\frac{a_{12}}{a_{11}} & \cdots & -\frac{a_{1n}}{a_{11}} \\ -\frac{a_{21}}{a_{22}} & 0 & \cdots & -\frac{a_{2n}}{a_{22}} \\ \vdots & & \ddots & \vdots \\ -\frac{a_{n1}}{a_{nn}} & \cdots & -\frac{a_{n,n-1}}{a_{nn}} & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} \frac{b_1}{a_{11}} \\ \frac{b_2}{a_{22}} \\ \vdots \\ \frac{b_n}{a_{nn}} \end{pmatrix} \]
or
\[ x = Bx + d. \]
If we write the matrix A in the form
\[ A = L + D + U, \]
where
\[ L = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ a_{21} & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ a_{n1} & \cdots & a_{n,n-1} & 0 \end{pmatrix}, \qquad D = \mathrm{diag}(a_{11}, \ldots, a_{nn}), \]
and
\[ U = \begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ 0 & 0 & \ddots & \vdots \\ \vdots & & \ddots & a_{n-1,n} \\ 0 & 0 & \cdots & 0 \end{pmatrix}, \]
then it is easy to see that
\[ B = -D^{-1}(L + U) = (I - D^{-1}A), \qquad d = D^{-1}b. \]
(Note that, because of our assumption that $a_{ii} \neq 0$, $i = 1, \ldots, n$, D is nonsingular.)
We shall call the matrix
\[ B = -D^{-1}(L + U) = (I - D^{-1}A) \]
the Jacobi Iteration Matrix and denote it by $B_J$. Similarly, we shall denote the vector $D^{-1}b$ by $b_J$, which we call the Jacobi vector.
The Jacobi Iteration Matrix and the Jacobi Vector
Let A = L + D + U. Then
\[ B_J = -D^{-1}(L + U), \qquad b_J = D^{-1}b. \]
With the Jacobi iteration matrix and the Jacobi vector as defined above, the iteration (6.10.2) becomes:
The Jacobi Iteration
\[ x_i^{(k+1)} = \frac{1}{a_{ii}} \Bigl( b_i - \sum_{\substack{j=1 \\ j \neq i}}^{n} a_{ij}\,x_j^{(k)} \Bigr), \qquad i = 1, 2, \ldots, n. \]
We thus have the following iterative procedure, called the Jacobi Method.
Algorithm 6.10.1 The Jacobi Method
(1) Choose $x^{(1)} = \bigl( x_1^{(1)}, x_2^{(1)}, \ldots, x_n^{(1)} \bigr)^T$.
(2) For k = 1, 2, ... do until a stopping criterion is satisfied
\[ x^{(k+1)} = B_J x^{(k)} + b_J \tag{6.10.3} \]
or
\[ x_i^{(k+1)} = \frac{1}{a_{ii}} \Bigl( b_i - \sum_{\substack{j=1 \\ j \neq i}}^{n} a_{ij}\,x_j^{(k)} \Bigr), \qquad i = 1, \ldots, n. \tag{6.10.4} \]
Example 6.10.1
\[ A = \begin{pmatrix} 5 & 1 & 1 \\ 1 & 5 & 1 \\ 1 & 1 & 5 \end{pmatrix}, \qquad b = \begin{pmatrix} 7 \\ 7 \\ 7 \end{pmatrix}, \qquad x^{(1)} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \]
\[ B_J = \begin{pmatrix} 0 & -0.2000 & -0.2000 \\ -0.2000 & 0 & -0.2000 \\ -0.2000 & -0.2000 & 0 \end{pmatrix}, \qquad b_J = \begin{pmatrix} 1.4000 \\ 1.4000 \\ 1.4000 \end{pmatrix}. \]
k = 1: $x^{(2)} = B_J x^{(1)} + b_J = (1.4000, 1.4000, 1.4000)^T$
k = 2: $x^{(3)} = B_J x^{(2)} + b_J = (0.8400, 0.8400, 0.8400)^T$
k = 3: $x^{(4)} = B_J x^{(3)} + b_J = (1.0640, 1.0640, 1.0640)^T$
k = 4: $x^{(5)} = B_J x^{(4)} + b_J = (0.9744, 0.9744, 0.9744)^T$
k = 5: $x^{(6)} = B_J x^{(5)} + b_J = (1.0102, 1.0102, 1.0102)^T$
k = 6: $x^{(7)} = B_J x^{(6)} + b_J = (0.9959, 0.9959, 0.9959)^T$
k = 7: $x^{(8)} = B_J x^{(7)} + b_J = (1.0016, 1.0016, 1.0016)^T$
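The Jacobi iterates of Example 6.10.1 can be regenerated with the componentwise formula (6.10.4). The following is a minimal sketch in plain Python; the function name `jacobi` and the choice to return the whole history of iterates are illustrative assumptions.

```python
def jacobi(A, b, x0, num_steps):
    """Run num_steps Jacobi sweeps using the componentwise formula (6.10.4).
    Returns the list of iterates, starting with x0."""
    n = len(b)
    x = list(x0)
    history = [x]
    for _ in range(num_steps):
        # Every component of the new iterate uses only the old iterate x.
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
        history.append(x)
    return history

A = [[5.0, 1.0, 1.0],
     [1.0, 5.0, 1.0],
     [1.0, 1.0, 5.0]]
b = [7.0, 7.0, 7.0]

history = jacobi(A, b, [0.0, 0.0, 0.0], 7)   # history[k] corresponds to x^(k+1)
for k, x in enumerate(history, start=1):
    print(k, [round(v, 4) for v in x])
```

Only the matrix entries and two vectors are needed, which is why the componentwise form is preferred over forming $B_J$ explicitly.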
The Gauss-Seidel Method*
In the Jacobi method, only the components of the vector $x^{(k)}$ are used to compute the components of the vector $x^{(k+1)}$; however, note that to compute $x_i^{(k+1)}$, we could have used $x_1^{(k+1)}$ through $x_{i-1}^{(k+1)}$, which were already available to us. Thus, a natural modification of the Jacobi method will be to rewrite the Jacobi iteration (6.10.4) in the following form:
The Gauss-Seidel Iteration
\[ x_i^{(k+1)} = \frac{1}{a_{ii}} \Bigl( b_i - \sum_{j=1}^{i-1} a_{ij}\,x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}\,x_j^{(k)} \Bigr). \tag{6.10.5} \]
The idea is to use each new component, as soon as it is available, in the computation of the next component. The iteration (6.10.5) is known as the Gauss-Seidel iteration, and the iterative method based on this iteration is called the Gauss-Seidel Method.
In the notation used earlier, the Gauss-Seidel iteration is:
\[ x^{(k+1)} = -(D + L)^{-1}U x^{(k)} + (D + L)^{-1}b. \]
(Note that the matrix D + L is a lower triangular matrix with $a_{11}, \ldots, a_{nn}$ on the diagonal and, since we have assumed that these entries are nonzero, the matrix D + L is nonsingular.) We will call the matrix $B = -(D + L)^{-1}U$ the Gauss-Seidel matrix and denote it by the symbol $B_{GS}$. Similarly, the Gauss-Seidel vector $(D + L)^{-1}b$ will be denoted by $b_{GS}$.
* Association of Seidel's name with Gauss for this method does not seem to be well-documented in history.
The GaussSeidel Matrix and the GaussSeidel Vector
Let A = L + D + U . Then
BGS = ;(D + L);1U bGS = (D + L);1b:
Algorithm 6.10.2 The Gauss-Seidel Method

(1) Choose an initial approximation $x^{(1)}$.
(2) For $k = 1, 2, \ldots$ do until a stopping criterion is satisfied
$$x^{(k+1)} = B_{GS} x^{(k)} + b_{GS} \tag{6.10.6}$$
or
$$x_i^{(k+1)} = \frac{1}{a_{ii}}\Big(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k)}\Big), \quad i = 1, 2, \ldots, n. \tag{6.10.7}$$
Example 6.10.2

$$A = \begin{pmatrix} 5 & 1 & 1 \\ 1 & 5 & 1 \\ 1 & 1 & 5 \end{pmatrix}, \qquad b = \begin{pmatrix} 7 \\ 7 \\ 7 \end{pmatrix}$$
$$B_{GS} = \begin{pmatrix} 0 & -0.2 & -0.2 \\ 0 & 0.04 & -0.16 \\ 0 & 0.032 & 0.072 \end{pmatrix}, \qquad b_{GS} = \begin{pmatrix} 1.4000 \\ 1.1200 \\ 0.8960 \end{pmatrix}$$
k = 1: $x^{(2)} = B_{GS} x^{(1)} + b_{GS} = (1.4000, 1.1200, 0.8960)^T$
k = 2: $x^{(3)} = B_{GS} x^{(2)} + b_{GS} = (0.9968, 1.0214, 0.9964)^T$
k = 3: $x^{(4)} = B_{GS} x^{(3)} + b_{GS} = (0.9964, 1.0014, 1.0004)^T$
k = 4: $x^{(5)} = B_{GS} x^{(4)} + b_{GS} = (0.9996, 1.0000, 1.0001)^T$
k = 5: $x^{(6)} = B_{GS} x^{(5)} + b_{GS} = (1, 1, 1)^T$.
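The componentwise Gauss-Seidel iteration (6.10.7) differs from the Jacobi iteration only in that the vector is overwritten in place. A minimal Python sketch follows; the function name `gauss_seidel`, the zero starting vector and the fixed sweep count are illustrative assumptions rather than part of the text. It is run on the matrix and right-hand side of Example 6.10.2.

```python
def gauss_seidel(A, b, x0, num_iters):
    """Run num_iters Gauss-Seidel sweeps of (6.10.7); return the list of iterates."""
    n = len(b)
    x = list(x0)
    history = [list(x)]
    for _ in range(num_iters):
        for i in range(n):
            # x[0..i-1] already hold new values, x[i+1..n-1] still hold old ones.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
        history.append(list(x))
    return history


A = [[5.0, 1.0, 1.0], [1.0, 5.0, 1.0], [1.0, 1.0, 5.0]]
b = [7.0, 7.0, 7.0]
gs_iterates = gauss_seidel(A, b, [0.0, 0.0, 0.0], 5)
for k, x in enumerate(gs_iterates, start=1):
    print(k, [round(v, 4) for v in x])
```

Because each sweep overwrites `x`, a single vector suffices, whereas the Jacobi sweep needs both the old and the new vector.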
Computer Implementations
In actual computer implementations it is certainly economical to use the equations (6.10.4) and (6.10.7). The use of (6.10.3) and (6.10.6) will necessitate the storage of D, L and U, which will be a waste of storage.
6.10.3 Convergence of Iterative Methods
It is often hard to make a good guess of the initial approximation x(1). Thus, it will be nice to have conditions that will guarantee the convergence of the iteration (6.10.2) for any arbitrary choice of the initial approximation. In the following we derive such a condition.
Theorem 6.10.1 (Iteration Convergence Theorem) The iteration
$$x^{(k+1)} = Bx^{(k)} + c$$
converges to a limit with an arbitrary choice of the initial approximation $x^{(1)}$ iff the matrix $B^k \to 0$ as $k \to \infty$, that is, B is a convergent matrix.
Proof. From
$$x = Bx + c$$
and
$$x^{(k+1)} = Bx^{(k)} + c$$
we have
$$x - x^{(k+1)} = B(x - x^{(k)}). \tag{6.10.8}$$
Since this is true for any value of k, we can write
$$x - x^{(k)} = B(x - x^{(k-1)}). \tag{6.10.9}$$
Substituting (6.10.9) in (6.10.8), we have
$$x - x^{(k+1)} = B^2(x - x^{(k-1)}).$$
Continuing this process k times we can write
$$x - x^{(k+1)} = B^k(x - x^{(1)}). \tag{6.10.10}$$
This shows that $\{x^{(k)}\}$ converges to the solution x for any arbitrary choice of $x^{(1)}$ if and only if $B^k \to 0$ as $k \to \infty$.

Recall now from Chapter 1 (Section 1.7, Theorem 1.7.4) that B is a convergent matrix iff the spectral radius of B, $\rho(B)$, is less than 1. Now $\rho(B) = \max\{|\lambda_i|, \; i = 1, \ldots, n\}$, where $\lambda_1$ through $\lambda_n$ are the eigenvalues of B. Since $|\lambda_i| \le \|B\|$ for each i and for any matrix norm (see Chapter 8), we have, in particular, $\rho(B) \le \|B\|$. Thus, a good way to see if B is convergent is to compute $\|B\|$ with the row-sum or column-sum norm and see if this is less than one. (Note that the converse is not true.)
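The norm test just described is cheap to carry out. The sketch below is plain Python with hypothetical helper names (`jacobi_matrix`, `row_sum_norm`); it forms $B_J$ for two matrices and evaluates the row-sum norm. For the matrix of Example 6.10.1 the norm is below one, so convergence is guaranteed. For the second matrix, the one used in Example 6.10.3, the norm exceeds one even though the Jacobi method converges for it, which illustrates that the test is sufficient but not necessary.

```python
def jacobi_matrix(A):
    """Return B_J = -D^{-1}(L + U) as a list of rows."""
    n = len(A)
    return [[0.0 if i == j else -A[i][j] / A[i][i] for j in range(n)]
            for i in range(n)]


def row_sum_norm(B):
    """Maximum absolute row sum (the infinity norm)."""
    return max(sum(abs(v) for v in row) for row in B)


A_dominant = [[5.0, 1.0, 1.0], [1.0, 5.0, 1.0], [1.0, 1.0, 5.0]]
A_other = [[1.0, 2.0, -2.0], [1.0, 1.0, 1.0], [2.0, 2.0, 1.0]]

norm_dominant = row_sum_norm(jacobi_matrix(A_dominant))
norm_other = row_sum_norm(jacobi_matrix(A_other))
print(norm_dominant)   # below 1: the test guarantees convergence
print(norm_other)      # above 1: the test is inconclusive
```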
We combine the result of Theorem 6.10.1 with the observation just made in the following:

Conditions for Convergence of the Iteration (6.10.2)
A necessary and sufficient condition for the convergence of the iteration (6.10.2), for any arbitrary choice of $x^{(1)}$, is that $\rho(B) < 1$. A sufficient condition is that $\|B\| < 1$ for some norm.

We now apply the above result to identify classes of matrices for which the Jacobi and/or Gauss-Seidel methods converge for any choice of initial approximation $x^{(1)}$.
The Jacobi and Gauss-Seidel Methods for Diagonally Dominant Matrices

Corollary 6.10.1 If A is row diagonally dominant, then the Jacobi method converges for any arbitrary choice of the initial approximation $x^{(1)}$.
Proof. Since $A = (a_{ij})$ is row diagonally dominant, we have, by definition,
$$|a_{ii}| > \sum_{\substack{j=1 \\ j \neq i}}^{n} |a_{ij}|, \quad i = 1, 2, \ldots, n. \tag{6.10.11}$$
Recall that the Jacobi Iteration Matrix $B_J$ is given by
$$B_J = \begin{pmatrix}
0 & -\dfrac{a_{12}}{a_{11}} & \cdots & \cdots & -\dfrac{a_{1n}}{a_{11}} \\
-\dfrac{a_{21}}{a_{22}} & 0 & -\dfrac{a_{23}}{a_{22}} & \cdots & -\dfrac{a_{2n}}{a_{22}} \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
\vdots & & \ddots & 0 & -\dfrac{a_{n-1,n}}{a_{n-1,n-1}} \\
-\dfrac{a_{n1}}{a_{nn}} & \cdots & \cdots & -\dfrac{a_{n,n-1}}{a_{nn}} & 0
\end{pmatrix}.$$
From (6.10.11) we therefore have that the absolute row sum (that is, the row sum taking absolute values) of each row is less than 1, which means
$$\|B_J\|_\infty < 1.$$
Thus by Theorem 6.10.1, we have Corollary 6.10.1.
Corollary 6.10.2 If A is a row diagonally dominant matrix, then the Gauss-Seidel method converges for any arbitrary choice of $x^{(1)}$.
Proof. The Gauss-Seidel iteration matrix is given by
$$B_{GS} = -(D + L)^{-1}U.$$
Let $\lambda$ be an eigenvalue of this matrix and $x = (x_1, \ldots, x_n)^T$ be the corresponding eigenvector. Then from
$$B_{GS}x = \lambda x \quad \text{or} \quad -Ux = (D + L)\lambda x$$
we have
$$-\sum_{j=i+1}^{n} a_{ij}x_j = \lambda\sum_{j=1}^{i} a_{ij}x_j, \quad 1 \le i \le n,$$
which can be rewritten as
$$\lambda a_{ii}x_i = -\lambda\sum_{j=1}^{i-1} a_{ij}x_j - \sum_{j=i+1}^{n} a_{ij}x_j, \quad 1 \le i \le n.$$
Let $x_k$ be the largest component (having the magnitude 1) of the vector x. Then from the above equation with $i = k$, we have
$$|\lambda|\,|a_{kk}| \le |\lambda|\sum_{j=1}^{k-1}|a_{kj}| + \sum_{j=k+1}^{n}|a_{kj}| \tag{6.10.12}$$
that is,
$$|\lambda|\Big(|a_{kk}| - \sum_{j=1}^{k-1}|a_{kj}|\Big) \le \sum_{j=k+1}^{n}|a_{kj}| \tag{6.10.13}$$
or
$$|\lambda| \le \frac{\sum_{j=k+1}^{n}|a_{kj}|}{|a_{kk}| - \sum_{j=1}^{k-1}|a_{kj}|}. \tag{6.10.14}$$
Since A is row diagonally dominant,
$$|a_{kk}| > \sum_{\substack{j=1 \\ j \neq k}}^{n}|a_{kj}|$$
or
$$|a_{kk}| - \sum_{j=1}^{k-1}|a_{kj}| > \sum_{j=k+1}^{n}|a_{kj}|.$$
Thus from (6.10.14), we conclude that $|\lambda| < 1$, that is,
$$\rho(B_{GS}) < 1.$$
Thus from Theorem 6.10.1, we have Corollary 6.10.2.
A Remark on Corollary 6.10.1

It is usually true that the greater the diagonal dominance of A, the faster is the convergence of the Jacobi method. However, there are simple counterexamples that show that this does not always happen. The following simple 2 × 2 example in support of this statement appears in Golub and Van Loan MC (1989, p. 514). The example was supplied by Richard S. Varga.
$$A_1 = \begin{pmatrix} 1 & -\frac{1}{2} \\ -\frac{1}{2} & 1 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 1 & -\frac{3}{4} \\ -\frac{1}{12} & 1 \end{pmatrix}.$$
It is easy to verify that $\rho(B_J)$ of $A_1$ is greater than $\rho(B_J)$ of $A_2$ (the values are $\frac{1}{2}$ and $\frac{1}{4}$, respectively).

We now discuss the convergence of the Gauss-Seidel method for yet another very important class of matrices, namely, the symmetric positive definite matrices.

The Gauss-Seidel Method for a Symmetric Positive Definite Matrix

We show that the Gauss-Seidel method converges, with an arbitrary choice of $x^{(1)}$, for a symmetric positive definite matrix.

Theorem 6.10.2 Let A be a symmetric positive definite matrix. Then the Gauss-Seidel method converges for any arbitrary choice of the initial approximation $x^{(1)}$.
Proof. As A is symmetric, we have
$$A = L + D + L^T.$$
Thus $B_{GS} = -(D + L)^{-1}L^T$. We will now show that $\rho(B_{GS}) < 1$. Let $-\lambda$ be an eigenvalue of $B_{GS}$ and u be the corresponding eigenvector. Then
$$(D + L)^{-1}L^T u = \lambda u$$
or
$$L^T u = \lambda(D + L)u.$$
Thus
$$u^* L^T u = \lambda u^*(D + L)u$$
or
$$u^* A u - u^*(L + D)u = \lambda u^*(L + D)u$$
or
$$u^* A u = (1 + \lambda)u^*(L + D)u. \tag{6.10.15}$$
Taking conjugate transpose on both sides, we have
$$u^* A u = (1 + \bar\lambda)u^*(L^T + D^T)u. \tag{6.10.16}$$
From (6.10.15) and (6.10.16), we obtain
$$\Big(\frac{1}{1 + \lambda} + \frac{1}{1 + \bar\lambda}\Big)u^* A u = u^*(L + D)u + u^*(L^T + D^T)u \tag{6.10.17}$$
$$= u^*(L + D + L^T + D^T)u \tag{6.10.18}$$
$$= u^*(A + D^T)u \tag{6.10.19}$$
$$= u^*(A + D)u > u^* A u. \tag{6.10.20}$$
(Note that since A is positive definite, so is D and, therefore, $u^* D u > 0$.)

Dividing both sides of (6.10.20) by $u^* A u \; (> 0)$ we have
$$\frac{1}{1 + \lambda} + \frac{1}{1 + \bar\lambda} > 1$$
or
$$\frac{2 + \lambda + \bar\lambda}{(1 + \lambda)(1 + \bar\lambda)} > 1. \tag{6.10.21}$$
Let $\lambda = \alpha + i\beta$. Then $\bar\lambda = \alpha - i\beta$. From (6.10.21) we then have
$$\frac{2(1 + \alpha)}{(1 + \alpha)^2 + \beta^2} > 1$$
from where it follows that $\alpha^2 + \beta^2 < 1$. That is, $\rho(B_{GS}) < 1$, since $|\lambda| = \sqrt{\alpha^2 + \beta^2}$.
Rates of Convergence and a Comparison Between the Gauss-Seidel and the Jacobi Methods

We have just seen that for the row diagonally dominant matrices both the Jacobi and the Gauss-Seidel methods converge for an arbitrary $x^{(1)}$. The question naturally arises whether this is true for some other matrices as well. Also, when both methods converge, another question arises: which one converges faster?

From our discussion in the last section we know that it is the iteration matrix B that plays a crucial role in the convergence of an iterative method. More specifically, recall from the proof of Theorem 6.10.1 that $e_{k+1}$ = error at the (k + 1)th step = $x - x^{(k+1)}$ and $e_1$ = initial error = $x - x^{(1)}$ are related by
$$\|e_{k+1}\| \le \|B^k\|\,\|e_1\|, \quad k = 1, 2, 3, \ldots$$
Thus, $\|B^k\|$ gives us an upper bound of the ratio of the error at the (k + 1)th step to the initial error.

Definition 6.10.1 If $\|B^k\| < 1$, then the quantity
$$-\frac{\ln\|B^k\|}{k}$$
is called the Average Rate of Convergence for k iterations, and the quantity
$$-\ln\rho(B)$$
is called the Asymptotic Rate of Convergence.
If the asymptotic rate of convergence of one iterative method is greater than that of the other and both the methods are known to converge, then the one with the larger asymptotic rate of convergence converges asymptotically faster than the other.

The following theorem, known as the Stein-Rosenberg Theorem, identifies a class of matrices for which the Jacobi and the Gauss-Seidel methods are either both convergent or both divergent. We shall state the theorem below without proof. The proof involves the Perron-Frobenius Theorem from matrix theory and is beyond the scope of this book. The proof of the theorem and related discussions can be found in an excellent reference book on the subject (Varga MIA, Chapter 3):

Theorem 6.10.3 (Stein-Rosenberg) If the matrix A is such that its diagonal entries are all positive and the off-diagonal entries are nonpositive (so that $B_J$ is a nonnegative matrix), then one and only one of the following statements holds:
(a) $\rho(B_J) = \rho(B_{GS}) = 0$
(b) $0 < \rho(B_{GS}) < \rho(B_J) < 1$
(c) $\rho(B_J) = \rho(B_{GS}) = 1$
(d) $1 < \rho(B_J) < \rho(B_{GS})$.

Corollary 6.10.3 If $0 < \rho(B_J) < 1$, then the asymptotic rate of convergence of the Gauss-Seidel method is larger than that of the Jacobi method.

Two immediate consequences of the above results are noted below.

If the matrix A satisfies the hypothesis of Theorem 6.10.3, then
(i) The Jacobi and the Gauss-Seidel methods either both converge or both diverge.
(ii) When both the methods converge, the asymptotic rate of convergence of the Gauss-Seidel method is larger than that of the Jacobi method.

Remarks: Note that in (ii) we are talking about the asymptotic rate of convergence, not the average rate of convergence.

Unfortunately, in the general case no such statements about the convergence and the asymptotic rates of convergence of two iterative methods can be made. In fact, there are examples where one method converges but the other diverges (see the example below). However, when both the Gauss-Seidel and the Jacobi methods converge, because of the lower storage requirement and the asymptotic rates of convergence, the Gauss-Seidel method should be preferred over the Jacobi.

Richard Varga is a university professor at Kent State University. He is also the director of the Institute for Computational Mathematics at that university. He is well known for his outstanding contributions in iterative methods. He is the author of the celebrated book Matrix Iterative Analysis.

Example 6.10.3
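The following example shows that the Jacobi method can converge even when the Gauss-Seidel method does not.
$$A = \begin{pmatrix} 1 & 2 & -2 \\ 1 & 1 & 1 \\ 2 & 2 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 2 \\ 5 \end{pmatrix}$$
$$B_J = \begin{pmatrix} 0 & -2 & 2 \\ -1 & 0 & -1 \\ -2 & -2 & 0 \end{pmatrix}, \qquad B_{GS} = \begin{pmatrix} 0 & -2 & 2 \\ 0 & 2 & -3 \\ 0 & 0 & 2 \end{pmatrix}$$
$$\rho(B_{GS}) = 2, \qquad \rho(B_J) = 6.7815 \times 10^{-6}.$$
(The value of $\rho(B_J)$ is the computed one; in exact arithmetic $B_J^3 = 0$, so that $\rho(B_J) = 0$.)

A Few Iterations with Gauss-Seidel
$$x^{(1)} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad x^{(2)} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad x^{(3)} = \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix}, \quad x^{(4)} = \begin{pmatrix} 7 \\ -8 \\ 7 \end{pmatrix}, \quad x^{(5)} = \begin{pmatrix} 31 \\ -36 \\ 15 \end{pmatrix},$$
etc. This shows that the Gauss-Seidel method is clearly diverging. The exact solution is
$$x = \begin{pmatrix} 7 \\ -4 \\ -1 \end{pmatrix}.$$
On the other hand, with the Jacobi method we have convergence with only two iterations:
$$x^{(1)} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad x^{(2)} = \begin{pmatrix} 1 \\ 2 \\ 5 \end{pmatrix}, \quad x^{(3)} = \begin{pmatrix} 7 \\ -4 \\ -1 \end{pmatrix}.$$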
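The contrast between the two methods on this matrix can be reproduced with a few lines of Python. The sketch below is illustrative only; the single helper `sweep` and its boolean flag are assumptions chosen to keep the two iterations side by side, and the starting vector is zero.

```python
def sweep(A, b, x, use_new_values):
    """One sweep: Gauss-Seidel if use_new_values is True, Jacobi otherwise."""
    n = len(b)
    new = list(x)
    for i in range(n):
        # Gauss-Seidel reads the partially updated vector; Jacobi reads the old one.
        src = new if use_new_values else x
        s = sum(A[i][j] * src[j] for j in range(n) if j != i)
        new[i] = (b[i] - s) / A[i][i]
    return new


A = [[1.0, 2.0, -2.0], [1.0, 1.0, 1.0], [2.0, 2.0, 1.0]]
b = [1.0, 2.0, 5.0]

x_jacobi = [0.0, 0.0, 0.0]
x_gs = [0.0, 0.0, 0.0]
for k in range(4):
    x_jacobi = sweep(A, b, x_jacobi, use_new_values=False)
    x_gs = sweep(A, b, x_gs, use_new_values=True)
    print(k + 2, x_jacobi, x_gs)
```

After two sweeps the Jacobi vector stays fixed at the exact solution, while the Gauss-Seidel vector keeps growing.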
6.10.4 The Successive Overrelaxation (SOR) Method
The Gauss-Seidel Method is frustratingly slow when $\rho(B_{GS})$ is close to unity. However, the rate of convergence of the Gauss-Seidel iteration can, in certain cases, be improved by introducing a parameter $\omega$, known as the relaxation parameter. The following modified Gauss-Seidel iteration,

The SOR Iteration
$$x_i^{(k+1)} = \frac{\omega}{a_{ii}}\Big(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k)}\Big) + (1 - \omega)x_i^{(k)}, \quad i = 1, 2, \ldots \tag{6.10.22}$$
is known as the successive overrelaxation iteration or, in short, the SOR iteration, if $\omega > 1$. From (6.10.22) we note the following:
(1) When $\omega = 1$, the SOR iteration reduces to the Gauss-Seidel iteration.
(2) When $\omega > 1$, in computing the (k + 1)th iteration, more weight is placed on the most current value than when $\omega < 1$, with a hope that the convergence will be faster.

In matrix notation the SOR iteration is
$$x^{(k+1)} = (D + \omega L)^{-1}[(1 - \omega)D - \omega U]x^{(k)} + \omega(D + \omega L)^{-1}b, \quad k = 1, 2, \ldots \tag{6.10.23}$$
(Note that since $a_{ii} \neq 0$, $i = 1, \ldots, n$, the matrix $(D + \omega L)$ is nonsingular.) The matrix $(D + \omega L)^{-1}[(1 - \omega)D - \omega U]$ will be called the SOR matrix and will be denoted by $B_{SOR}$. Similarly, the vector $\omega(D + \omega L)^{-1}b$ will be denoted by $b_{SOR}$, that is,

The SOR Matrix and the SOR Vector
$$B_{SOR} = (D + \omega L)^{-1}[(1 - \omega)D - \omega U], \qquad b_{SOR} = \omega(D + \omega L)^{-1}b.$$
In matrix notation the SOR iteration algorithm will be:

Algorithm 6.10.3 The Successive Overrelaxation Method

(1) Choose $x^{(1)}$.
(2) For $k = 1, 2, \ldots$ do until a stopping criterion is satisfied
$$x^{(k+1)} = B_{SOR}x^{(k)} + b_{SOR}.$$
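In practice the componentwise form (6.10.22) is used rather than the matrix form. A minimal Python sketch is given below; the function name `sor`, the fixed sweep count, and the test system (the 3 × 3 matrix with 5 on the diagonal and 1 elsewhere, right-hand side 7, $\omega = 1.2$) are choices made here for illustration.

```python
def sor(A, b, x0, omega, num_iters):
    """Run num_iters SOR sweeps of (6.10.22); return the list of iterates."""
    n = len(b)
    x = list(x0)
    history = [list(x)]
    for _ in range(num_iters):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gs_value = (b[i] - s) / A[i][i]      # the Gauss-Seidel update
            # Weighted combination of the Gauss-Seidel update and the old value.
            x[i] = omega * gs_value + (1.0 - omega) * x[i]
        history.append(list(x))
    return history


A = [[5.0, 1.0, 1.0], [1.0, 5.0, 1.0], [1.0, 1.0, 5.0]]
b = [7.0, 7.0, 7.0]
sor_iterates = sor(A, b, [0.0, 0.0, 0.0], 1.2, 3)
for k, x in enumerate(sor_iterates, start=1):
    print(k, [round(v, 4) for v in x])
```

Setting `omega` to 1.0 gives the Gauss-Seidel sweep, in agreement with remark (1) above.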
Example 6.10.4

Let A and b be the same as in Example 6.10.2. Let
$$x_1^{(1)} = x_2^{(1)} = x_3^{(1)} = 0, \qquad \omega = 1.2.$$
Then
$$B_{SOR} = \begin{pmatrix} -0.2000 & -0.2400 & -0.2400 \\ 0.0480 & -0.1424 & -0.1824 \\ 0.0365 & 0.0918 & -0.0986 \end{pmatrix}, \qquad b_{SOR} = \begin{pmatrix} 1.6800 \\ 1.2768 \\ 0.9704 \end{pmatrix}$$
k = 1: $x^{(2)} = B_{SOR}x^{(1)} + b_{SOR} = (1.6800, 1.2768, 0.9704)^T$
k = 2: $x^{(3)} = B_{SOR}x^{(2)} + b_{SOR} = (0.8047, 0.9986, 1.0531)^T$
k = 3: $x^{(4)} = B_{SOR}x^{(3)} + b_{SOR} = (1.0266, 0.9811, 0.9875)^T$.

Choice of $\omega$ in the Convergent SOR Iteration
It is natural to wonder what is the range of $\omega$ for which the SOR iteration converges and what is the optimal choice of $\omega$. To this end, we first prove the following important result due to William Kahan (1958).

Theorem 6.10.4 (Kahan) For the SOR iteration to converge for every initial approximation $x^{(1)}$, $\omega$ must lie in the interval (0,2).

Proof. Recall that the SOR iteration matrix $B_{SOR}$ is given by
$$B_{SOR} = (D + \omega L)^{-1}[(1 - \omega)D - \omega U]$$
where $A = L + D + U$. The matrix $(D + \omega L)^{-1}$ is a lower triangular matrix with $\frac{1}{a_{ii}}$, $i = 1, \ldots, n$, as the diagonal entries, and the matrix $(1 - \omega)D - \omega U$ is an upper triangular matrix with $(1 - \omega)a_{ii}$, $i = 1, \ldots, n$, as the diagonal entries. So
$$\det(B_{SOR}) = (1 - \omega)^n.$$
Since the determinant of a matrix is equal to the product of its eigenvalues, we conclude that
$$\rho(B_{SOR}) \ge |1 - \omega|,$$
where $\rho(B_{SOR})$ is the spectral radius of the matrix $B_{SOR}$. Since by Theorem 6.10.1, for the convergence of any iterative method the spectral radius of the iteration matrix has to be less than 1, we must have $|1 - \omega| < 1$; that is, $\omega$ must lie in the interval (0,2).

The next theorem, known as the Ostrowski-Reich Theorem, shows that the above condition is also sufficient in case the matrix A is symmetric and positive definite. The theorem was proved by Reich for the Gauss-Seidel iteration ($\omega = 1$) in 1949 and subsequently extended by Ostrowski in 1954. We state the theorem without proof. For proof, see Varga MIA, Section 3.4 or Ortega, Numerical Analysis: A Second Course, p. 123. The Ostrowski-Reich Theorem is a generalization of Theorem 6.10.4 for symmetric positive definite matrices.

William Kahan is a professor of Mathematics and Computer Science at the University of California-Berkeley. He has made significant contributions in several areas of numerical linear algebra, including computer arithmetic. He received the prestigious "ACM Turing Award" in 1989.
Theorem 6.10.5 (Ostrowski-Reich) Let A be a symmetric positive definite matrix and let $0 < \omega < 2$. Then the SOR method will converge for any arbitrary choice of $x^{(1)}$.

Optimal Choice of $\omega$ in the Convergent SOR Iteration

We now turn to the question of optimal choice of $\omega$. Again, for an arbitrary matrix, no criterion has been developed so far. However, such a criterion is known for a very useful class of matrices that arises in many practical applications. The matrices in this class are called consistently ordered matrices.

Definition 6.10.2 A matrix $A = L + D + U$ is consistently ordered if the eigenvalues of the matrix
$$C(\alpha) = -D^{-1}\Big(\frac{1}{\alpha}L + \alpha U\Big)$$
do not depend on $\alpha$, $\alpha \neq 0$.

Young has defined a consistently ordered matrix differently (see Young (1971)). The latter does not depend on the eigenvalues.
Definition 6.10.3 The matrix A is 2-cyclic if there is a permutation matrix P such that
$$PAP^T = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$
where $A_{11}$ and $A_{22}$ are diagonal.

David Young has defined such a matrix as the matrix having "Property (A)". This definition can be generalized to block matrices where the diagonal matrices $A_{11}$ and $A_{22}$ are block diagonal matrices; we call such matrices block 2-cyclic matrices. A well-known example of a consistently ordered block 2-cyclic matrix is the block tridiagonal matrix
$$A = \begin{pmatrix} T & -I & & 0 \\ -I & \ddots & \ddots & \\ & \ddots & \ddots & -I \\ 0 & & -I & T \end{pmatrix}
\quad \text{where} \quad
T = \begin{pmatrix} 4 & -1 & & 0 \\ -1 & \ddots & \ddots & \\ & \ddots & \ddots & -1 \\ 0 & & -1 & 4 \end{pmatrix}.$$
Recall that this matrix arises in the discretization of the Poisson's equation
$$\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} = f$$
on the unit square. In fact, it can be shown (Exercise) that every block tridiagonal matrix with nonsingular diagonal blocks is consistently ordered and 2-cyclic.

The following important and very well-known theorem on the optimal choice of $\omega$ for consistently ordered matrices is due to David Young (Young (1971)).

David Young, Jr., is a professor of mathematics and Director of the Numerical Analysis Center at the University of Texas at Austin. He is widely known for his pioneering contributions in the area of iterative methods for linear systems. He is also one of the developers of the software package "ITPACK".
Theorem 6.10.6 (Young) Let A be consistently ordered and 2-cyclic with nonzero diagonal elements. Then
$$\rho(B_{GS}) = (\rho(B_J))^2.$$
Furthermore, if the eigenvalues of $B_J$ are real and $\rho(B_J) < 1$, then the optimal choice for $\omega$ in terms of producing the smallest spectral radius in SOR, denoted by $\omega_{opt}$, is given by
$$\omega_{opt} = \frac{2}{1 + \sqrt{1 - \rho(B_J)^2}}$$
and $\rho(B_{SOR}) = \omega_{opt} - 1$.

As an immediate consequence, we get:

Corollary 6.10.4 For consistently ordered 2-cyclic matrices, if the Jacobi method converges, so does the Gauss-Seidel method, and the Gauss-Seidel method converges twice as fast as the Jacobi method.
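The formulas of Theorem 6.10.6 are easy to evaluate once $\rho(B_J)$ is known. The short Python sketch below uses a hypothetical helper `omega_opt` and, as input, the value $\rho(B_J) = 0.6036$ that appears in Example 6.10.5; it also prints the three asymptotic rates $-\ln\rho$ for comparison.

```python
import math


def omega_opt(rho_jacobi):
    """Optimal relaxation parameter of Theorem 6.10.6 (requires 0 <= rho < 1)."""
    if not 0.0 <= rho_jacobi < 1.0:
        raise ValueError("the formula assumes 0 <= rho(B_J) < 1")
    return 2.0 / (1.0 + math.sqrt(1.0 - rho_jacobi ** 2))


rho_j = 0.6036
w_opt = omega_opt(rho_j)
rho_gs = rho_j ** 2          # rho(B_GS) = rho(B_J)^2
rho_sor = w_opt - 1.0        # rho(B_SOR) at the optimal parameter

print(round(w_opt, 4), round(rho_gs, 4), round(rho_sor, 4))
# Asymptotic rates of convergence: Jacobi, Gauss-Seidel, optimal SOR.
print(-math.log(rho_j), -math.log(rho_gs), -math.log(rho_sor))
```

The Gauss-Seidel rate is exactly twice the Jacobi rate, which is the content of Corollary 6.10.4.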
Example 6.10.5

$$A = \begin{pmatrix}
4 & -1 & 0 & -1 & 0 & 0 \\
-1 & 4 & -1 & 0 & -1 & 0 \\
0 & -1 & 4 & 0 & 0 & -1 \\
-1 & 0 & 0 & 4 & -1 & 0 \\
0 & -1 & 0 & -1 & 4 & -1 \\
0 & 0 & -1 & 0 & -1 & 4
\end{pmatrix}, \qquad
b = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$$
The eigenvalues of $B_J$ are: 0.1036, 0.2500, -0.1036, -0.2500, 0.6036, -0.6036.
$$\rho(B_J) = 0.6036, \qquad \omega_{opt} = \frac{2}{1 + \sqrt{1 - (0.6036)^2}} = 1.1128, \qquad \rho(B_{GS}) = 0.3643.$$
It took five iterations for the SOR method with $\omega_{opt}$ to converge to the exact solution (up to four significant figures), starting with $x_{SOR}^{(1)} = (0, 0, \ldots, 0)^T$:
$$x_{SOR}^{(5)} = \begin{pmatrix} 0.2939 \\ 0.0901 \\ 0.0184 \\ 0.0855 \\ 0.0480 \\ 0.0166 \end{pmatrix}.$$
With the same starting vector $x^{(1)}$, the Gauss-Seidel method required 12 iterations (Try it!). Also find out how many iterations will be required by Jacobi.
Comparison of Rates of Convergence Between the Gauss-Seidel and the SOR Methods

The following theorem, also due to D. Young (see Young (1971)), relates the rates of convergence between the Gauss-Seidel method and the SOR method with the optimum relaxation factor $\omega_{opt}$, for consistently ordered matrices.

Theorem 6.10.7 (Young) Let A be a consistently ordered matrix with nonzero diagonal elements. Assume that the eigenvalues of $B_J$ are all real and $\rho(B_J) < 1$. Let $R_{GS}$ and $R_{SOR}$ denote, respectively, the asymptotic rates of convergence of the Gauss-Seidel and the SOR methods with optimum relaxation factor $\omega_{opt}$. Then
$$2\rho(B_J)R_{GS}^{1/2} \le R_{SOR} \le R_{GS} + 2[R_{GS}]^{1/2};$$
the second inequality holds if $R_{GS} \le 3$.

Remarks: The above theorem basically states that the SOR method with the optimum relaxation factor converges much faster (which is expected) than the Gauss-Seidel method, when the asymptotic rate of convergence of the Gauss-Seidel method is small.
Example 6.10.6

Consider again the matrix arising in the process of discretization of the Poisson's equation with the mesh-size h (the matrix 6.3.36). For this matrix it is easily verified that
$$\rho(B_J) = \cos(\pi h).$$
We also know that $\rho(B_{GS}) = \rho^2(B_J)$. Thus
$$R_{GS} = 2R_J = 2(-\log\cos\pi h) = 2\Big[\frac{\pi^2 h^2}{2} + O(h^4)\Big] = \pi^2 h^2 + O(h^4).$$
For small h, $R_{GS}$ is small, and by the second inequality of Theorem 6.10.7 we have
$$R_{SOR} \approx 2R_{GS}^{1/2} \approx 2\pi h$$
and
$$\frac{R_{SOR}}{R_{GS}} \approx \frac{2}{\pi h}.$$
Thus, when h is small, the asymptotic rate of convergence of the SOR method with the optimum relaxation is much greater than that of the Gauss-Seidel method. For example, when $h = \frac{1}{50}$, then
$$\frac{R_{SOR}}{R_{GS}} \approx \frac{100}{\pi} = 31.8.$$
Thus, in this case the SOR method converges about 31 times faster than the Gauss-Seidel method. Furthermore, the improvement becomes greater as h decreases; it is really remarkable when h is very small.
6.10.5 The Conjugate Gradient Method

We describe here a method, known as the Conjugate Gradient Method (CG), for solving a symmetric positive definite linear system
$$Ax = b.$$
The method was originally devised by Hestenes and Stiefel (1956) and nowadays is widely used to solve large and sparse symmetric positive definite systems. It is direct in theory, but iterative in practice. The method is based on the following well-known result on optimization (see the book by Luenberger (1973)).
Theorem 6.10.8 If A is a real symmetric positive definite matrix, then solving
$$Ax = b$$
is equivalent to minimizing the quadratic function
$$\phi(x) = \frac{1}{2}x^T A x - x^T b.$$
Furthermore, the minimum value of $\phi(x)$ is $-\frac{1}{2}b^T A^{-1} b$ and is obtained by choosing
$$x = A^{-1}b.$$
There are a large number of iterative methods in the literature of optimization for solving this minimization problem (see Luenberger (1973)). In these iterative methods the successive approximations $x_k$ are computed recursively:
$$x_{k+1} = x_k + \alpha_k p_k$$
where the vectors $\{p_k\}$ are called the direction vectors and the scalars $\alpha_k$ are chosen to minimize $\phi$ in the direction of $p_k$; that is, $\alpha_k$ is chosen to minimize the function $\phi(x_k + \alpha p_k)$. It can be shown that this will happen if we choose
$$\alpha = \alpha_k = \frac{p_k^T(b - Ax_k)}{p_k^T A p_k} = \frac{p_k^T r_k}{p_k^T A p_k}$$
where $r_k = b - Ax_k$.
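The formula for $\alpha_k$ follows from a short calculation, spelled out here for completeness; it uses only the symmetry and positive definiteness of A. Expanding $\phi$ along the direction $p_k$ gives

$$\phi(x_k + \alpha p_k) = \phi(x_k) + \alpha\, p_k^T(Ax_k - b) + \frac{1}{2}\alpha^2 p_k^T A p_k = \phi(x_k) - \alpha\, p_k^T r_k + \frac{1}{2}\alpha^2 p_k^T A p_k.$$

This is a quadratic in $\alpha$ whose leading coefficient $\frac{1}{2}p_k^T A p_k$ is positive for $p_k \neq 0$, so its minimum is attained where the derivative vanishes:

$$\frac{d}{d\alpha}\phi(x_k + \alpha p_k) = -p_k^T r_k + \alpha\, p_k^T A p_k = 0 \quad \Longrightarrow \quad \alpha_k = \frac{p_k^T r_k}{p_k^T A p_k}.$$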
How to choose the direction vectors?

The next question, therefore, is how to choose the direction vectors $p_k$. The conjugate gradient method (denoted by CG) is a method that automatically generates the direction vectors. The direction vector needed at each step is generated at the previous step. Moreover, the direction vectors $p_k$ have the remarkable property that
$$p_i^T A p_j = 0, \quad i \neq j.$$
That is, these vectors are orthogonal with respect to the inner product $x^T A y$ defined by A. The direction vectors $p_k$ satisfying the above property are called conjugate vectors.

Algorithm 6.10.4 The Basic Conjugate Gradient Algorithm

The Classical Conjugate Gradient Method

1. Choose $x_0$ and $\epsilon$. Set $p_0 = r_0 = b - Ax_0$.
2. For $i = 0, 1, 2, 3, \ldots$ do
    $w = Ap_i$
    $\alpha_i = \|r_i\|_2^2 / p_i^T w$
    $x_{i+1} = x_i + \alpha_i p_i$
    $r_{i+1} = r_i - \alpha_i w$
    Test for convergence: If $\|r_{i+1}\|_2^2 \ge \epsilon$, continue.
    $\beta_i = \|r_{i+1}\|_2^2 / \|r_i\|_2^2$
    $p_{i+1} = r_{i+1} + \beta_i p_i$
Example 6.10.7

$$A = \begin{pmatrix} 5 & 1 & 1 \\ 1 & 5 & 1 \\ 1 & 1 & 5 \end{pmatrix}, \qquad b = \begin{pmatrix} 7 \\ 7 \\ 7 \end{pmatrix}, \qquad x_0 = (0, 0, 0)^T$$
$$p_0 = r_0 = b - Ax_0 = (7, 7, 7)^T$$
i = 0:
$$w = Ap_0 = (49, 49, 49)^T, \qquad \alpha_0 = \frac{\|r_0\|_2^2}{p_0^T w} = 0.1429$$
$$x_1 = x_0 + \alpha_0 p_0 = (1.0003, 1.0003, 1.0003)^T$$
$$r_1 = r_0 - \alpha_0 w = (-0.0021, -0.0021, -0.0021)^T$$
$$\beta_0 = 9 \times 10^{-8}$$
$$p_1 = r_1 + \beta_0 p_0 = (-0.0021, -0.0021, -0.0021)^T.$$
i = 1:
$$w = Ap_1 = (-0.0147, -0.0147, -0.0147)^T, \qquad \alpha_1 = 0.1429$$
$$x_2 = x_1 + \alpha_1 p_1 = (1.0000, 1.0000, 1.0000)^T.$$
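The steps of the basic conjugate gradient algorithm can be sketched in plain Python as follows. The helper names (`matvec`, `dot`, `conjugate_gradient`), the default tolerance and the cap of n iterations are assumptions made for this illustration; the convergence test is performed at the top of the loop, which is equivalent to testing $\|r_{i+1}\|_2^2$ after each update.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, x0, eps=1e-20, max_iters=None):
    """Basic CG for a symmetric positive definite A; returns (x, iterations)."""
    max_iters = len(b) if max_iters is None else max_iters
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    p = list(r)
    rr = dot(r, r)                       # squared 2-norm of the residual
    iterations = 0
    while rr >= eps and iterations < max_iters:
        w = matvec(A, p)
        alpha = rr / dot(p, w)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * wi for ri, wi in zip(r, w)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
        iterations += 1
    return x, iterations


A = [[5.0, 1.0, 1.0], [1.0, 5.0, 1.0], [1.0, 1.0, 5.0]]
b = [7.0, 7.0, 7.0]
x_cg, steps = conjugate_gradient(A, b, [0.0, 0.0, 0.0])
print(x_cg, steps)
```

In full double precision the first step already lands on the solution for this right-hand side, because b is an eigenvector of A; the small residual seen in the worked example comes from rounding $\alpha_0$ to four digits.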
Convergence

In the absence of roundoff errors the conjugate gradient method should converge in no more than n iterations. Thus in theory, the conjugate gradient method requires about $\frac{n}{2}$ iterations. In fact, it can be shown that the error at every step decreases. Specifically, it can be proved (see Ortega IPVL, p. 277) that:

Theorem 6.10.9
$$\|x - x_k\|_2 < \|x - x_{k-1}\|_2$$
where x is the exact solution, unless $x_{k-1} = x$.

However, the convergence is usually extremely slow due to the ill-conditioning of A. This can be seen from the following result.

Rate of Convergence of the Conjugate Gradient Method

Define $\|s\|_A = \sqrt{s^T A s}$. Then an estimate of the rate of convergence is:
$$\|x_k - x\|_A \le 2\alpha^k\|x_0 - x\|_A$$
where
$$\alpha = (\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1) \quad \text{and} \quad \kappa = \mathrm{Cond}(A) = \|A\|_2\|A^{-1}\|_2 = \lambda_n/\lambda_1;$$
here $\lambda_n$ and $\lambda_1$ are the largest and smallest eigenvalues of the symmetric positive definite matrix A (note that the eigenvalues of A are all positive).

Note. $\alpha = 0$ when $\mathrm{Cond}(A) = 1$. When $\alpha \to 1$, $\mathrm{Cond}(A) \to \infty$. Thus, the larger $\mathrm{Cond}(A)$ is, the slower is the rate of convergence.

(For a proof, see Luenberger (1973, p. 187).)
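To get a feeling for the estimate, one can tabulate the factor $\alpha$ and the number of steps the bound requires to reduce the A-norm of the error by a given factor. The Python sketch below does this; the function names and the target reduction $10^{-6}$ are illustrative assumptions, and the step count is what the bound guarantees, not what CG will actually need.

```python
import math


def cg_bound_factor(cond):
    """alpha = (sqrt(kappa) - 1) / (sqrt(kappa) + 1) for kappa = Cond(A) >= 1."""
    s = math.sqrt(cond)
    return (s - 1.0) / (s + 1.0)


def steps_from_bound(cond, reduction):
    """Smallest k >= 1 with 2 * alpha**k <= reduction, according to the bound."""
    alpha = cg_bound_factor(cond)
    k = 1
    while 2.0 * alpha ** k > reduction:
        k += 1
    return k


for cond in (1.0, 1.75, 100.0, 10000.0):
    print(cond, round(cg_bound_factor(cond), 4), steps_from_bound(cond, 1e-6))
```

The value 1.75 is the condition number of the matrix of Example 6.10.7, whose eigenvalues are 7, 4 and 4.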
Preconditioning
Since a large condition number of A slows down the convergence of the conjugate gradient method, it is natural to see if the condition number of A can be improved before the method is applied; in this case we will be able to apply the basic conjugate gradient method to a preconditioned system. Indeed, the use of a good preconditioner accelerates the rate of convergence of the method substantially.

A basic idea of preconditioning is to find a nonsingular S such that $\mathrm{Cond}(\tilde A) < \mathrm{Cond}(A)$, where $\tilde A = SAS^T$. Once such an S is found, then we can solve $\tilde A\tilde x = \tilde b$, where $\tilde x = (S^{-1})^T x$ and $\tilde b = Sb$, and then recover x from $\tilde x$ from
$$x = S^T\tilde x.$$
The matrix S is usually defined for simplicity by
$$(S^T S)^{-1} = M.$$
Note that M is symmetric positive definite and is called a preconditioner.
Algorithm 6.10.5 The Preconditioned Conjugate Gradient Method (PCG)

Find a preconditioner M. Choose $x_0$ and $\epsilon$. Set $r_0 = b - Ax_0$, $p_0 = y_0 = M^{-1}r_0$.
For $i = 0, 1, 2, 3, \ldots$ do
(a) $w = Ap_i$
(b) $\alpha_i = y_i^T r_i / p_i^T w$
(c) $x_{i+1} = x_i + \alpha_i p_i$
(d) $r_{i+1} = r_i - \alpha_i w$
(e) Test for convergence: If $\|r_{i+1}\|_2^2 \ge \epsilon$, continue.
(f) $y_{i+1} = M^{-1}r_{i+1}$
(g) $\beta_i = y_{i+1}^T r_{i+1} / y_i^T r_i$
(h) $p_{i+1} = y_{i+1} + \beta_i p_i$

Note: If M = I, then the preconditioned conjugate gradient method reduces to the basic conjugate gradient method.
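Steps (a) through (h) can be sketched in Python as below. The preconditioner enters only through a function that applies $M^{-1}$ to a vector; the name `apply_M_inverse`, the small test matrix, and the diagonal choice M = diag(A) are assumptions made purely for illustration and are not among the preconditioners discussed in the text.

```python
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, x0, apply_M_inverse, eps=1e-20, max_iters=50):
    """Preconditioned CG; apply_M_inverse(r) must return the solution y of My = r."""
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    y = apply_M_inverse(r)
    p = list(y)
    yr = dot(y, r)
    iterations = 0
    while dot(r, r) >= eps and iterations < max_iters:      # step (e)
        w = matvec(A, p)                                    # step (a)
        alpha = yr / dot(p, w)                              # step (b)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]       # step (c)
        r = [ri - alpha * wi for ri, wi in zip(r, w)]       # step (d)
        y = apply_M_inverse(r)                              # step (f)
        yr_new = dot(y, r)
        beta = yr_new / yr                                  # step (g)
        p = [yi + beta * pi for yi, pi in zip(y, p)]        # step (h)
        yr = yr_new
        iterations += 1
    return x, iterations


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]                     # chosen so that the solution is (1, 2, 3)


def diagonal_solve(r):
    """Hypothetical preconditioner M = diag(A)."""
    return [ri / A[i][i] for i, ri in enumerate(r)]


x_pc, its_pc = pcg(A, b, [0.0, 0.0, 0.0], diagonal_solve)
x_plain, its_plain = pcg(A, b, [0.0, 0.0, 0.0], lambda r: list(r))   # M = I
print(x_pc, its_pc)
print(x_plain, its_plain)
```

Passing the identity for `apply_M_inverse` gives the basic conjugate gradient iteration, as the note above states.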
The next question, therefore, is:

How to Find a Preconditioner M

Several possibilities have been explored in the literature. Among them are
1. Polynomial Preconditioning and
2. Incomplete Cholesky Factorization (ICF).
We shall describe ICF in the following. For a description of polynomial preconditioning, see Ortega IPVL, pp. 206-208.

Incomplete Cholesky Factorization

Since A is symmetric positive definite, it admits the factorization
$$A = LDL^T$$
where L is unit lower triangular and D is diagonal. If A is sparse, L is generally less sparse than A, because fill-in can occur. However, we can ignore the fill-in and obtain what is known as an Incomplete Cholesky Factorization. The basic principle of the Incomplete Cholesky Factorization of $A = (a_{ij})$ is:

If $a_{ij} \neq 0$, calculate $l_{ij}$.
If $a_{ij} = 0$, set $l_{ij} = 0$.
Algorithm 6.10.6 Incomplete Cholesky Factorization (Ortega IPVL, p. 212)

Set $\ell_{11} = \sqrt{a_{11}}$.
For $i = 1, 2, \ldots, n$ do
    For $j = 1, 2, \ldots, i - 1$ do
        If $a_{ij} = 0$, then $\ell_{ij} = 0$, else
        $$\ell_{ij} = \frac{1}{\ell_{jj}}\Big(a_{ij} - \sum_{k=1}^{j-1}\ell_{ik}\ell_{jk}\Big)$$
    $$\ell_{ii} = \sqrt{a_{ii} - \sum_{k=1}^{i-1}\ell_{ik}^2}$$

Remarks: The above algorithm requires computations of square roots. It may, therefore, not be carried out to completion, since the quantity under a square root can become nonpositive. However, we can obtain a no-fill, incomplete $LDL^T$ factorization of A which avoids square root computations.

Algorithm 6.10.7 No-Fill Incomplete $LDL^T$

Set $d_{11} = a_{11}$.
For $i = 2, \ldots, n$ do
    For $j = 1, 2, \ldots, i - 1$ do
        if $a_{ij} = 0$, $\ell_{ij} = 0$, else
        $$\ell_{ij} = \Big(a_{ij} - \sum_{k=1}^{j-1}\ell_{ik}d_{kk}\ell_{jk}\Big)/d_{jj}.$$
    $$d_{ii} = a_{ii} - \sum_{k=1}^{i-1}\ell_{ik}^2 d_{kk}.$$
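The incomplete Cholesky factorization with square roots (the first of the two algorithms above) can be sketched in Python as follows. The function name `incomplete_cholesky`, the dense list-of-rows storage and the error raised on a nonpositive pivot are assumptions made for illustration; a practical code would use sparse storage. The test matrix is the 6 × 6 matrix of Example 6.10.5.

```python
import math


def incomplete_cholesky(A):
    """No-fill incomplete Cholesky factor L (lower triangular, as a list of rows)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if A[i][j] != 0.0:           # entries outside the pattern of A stay zero
                s = sum(L[i][k] * L[j][k] for k in range(j))
                L[i][j] = (A[i][j] - s) / L[j][j]
        d = A[i][i] - sum(L[i][k] ** 2 for k in range(i))
        if d <= 0.0:
            raise ValueError("breakdown: nonpositive quantity under the square root")
        L[i][i] = math.sqrt(d)
    return L


A = [
    [4.0, -1.0, 0.0, -1.0, 0.0, 0.0],
    [-1.0, 4.0, -1.0, 0.0, -1.0, 0.0],
    [0.0, -1.0, 4.0, 0.0, 0.0, -1.0],
    [-1.0, 0.0, 0.0, 4.0, -1.0, 0.0],
    [0.0, -1.0, 0.0, -1.0, 4.0, -1.0],
    [0.0, 0.0, -1.0, 0.0, -1.0, 4.0],
]
L = incomplete_cholesky(A)
n = len(A)
# Product L L^T, to compare with A entry by entry.
M = [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
print(M[3][1], A[3][1])   # a position where fill-in was discarded
```

The product $LL^T$ agrees with A at every position where A is nonzero and differs only at positions where fill-in was dropped.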
Use of Incomplete Cholesky Factorization in Preconditioning

Note that the Incomplete Cholesky Factorization algorithm mathematically gives the factorization of A in the form
$$A = LL^T + R$$
where $R \neq 0$. Since the best choice for a preconditioner M is the matrix A itself, after the matrix L is obtained through incomplete Cholesky Factorization, the preconditioner M is taken as
$$M = LL^T.$$
In the Preconditioned Conjugate Gradient algorithm (Algorithm 6.10.5) (PCG), a symmetric positive definite system of the form
$$My = r$$
needs to be solved at each iteration with M as the coefficient matrix (Step f). Since $M = LL^T$, this is equivalent to solving
$$Lx = r, \qquad L^T y = x.$$
Since the coefficient matrix at each iteration is the same, the incomplete Cholesky Factorization will be done only once.

If the no-fill incomplete $LDL^T$ is used, then mathematically we have
$$A = LDL^T + R.$$
In this case we take the preconditioner M as:
$$M = LDL^T.$$
Then at each iteration of the PCG, one needs to solve a system of the form
$$My = r$$
which is equivalent to $Lx = r$, $Dz = x$ and $L^T y = z$. Again, L and D have to be computed once for all.
6.10.6 The Arnoldi Process and GMRES
In the last few years, a method called GMRES has received a considerable amount of attention from numerical linear algebraists in the context of solving large and sparse linear systems. The method is based on a classical scheme due to Arnoldi, which constructs an orthonormal basis of a space called the Krylov subspace $\{v_1, Av_1, \ldots, A^{n-1}v_1\}$, where A is n × n and $v_1$ is a vector of unit length. The Arnoldi method can be implemented just by using matrix-vector multiplications, and is, therefore, suitable for sparse matrices, because the zero entries are preserved.

The basic idea behind solving a large and sparse problem using the Arnoldi method is to project the problem onto the Krylov subspace of dimension m < n using the orthonormal basis constructed by the method, solve the m-dimensional problem using a standard approach, and then recover the solution of the original problem from the solution of the projected problem.

We now summarize the essentials of the Arnoldi method followed by an algorithmic description of GMRES.
Algorithm 6.10.8: The Arnoldi Method

(1) Start: Choose a vector $v_1$ of norm 1 and an integer $m \le n$.
(2) Iterate: For $j = 1, 2, \ldots, m$ do
$$h_{ij} = v_i^T A v_j, \quad i = 1, 2, \ldots, j$$
$$\hat v_{j+1} = Av_j - \sum_{i=1}^{j} h_{ij}v_i$$
$$h_{j+1,j} = \|\hat v_{j+1}\|_2$$
and
$$v_{j+1} = \hat v_{j+1}/h_{j+1,j}.$$

Notes:
(i) The scalars $h_{ij}$ have been chosen so that the vectors $v_j$ are orthonormal.
(ii) Let $V_m$ be the n × m matrix whose j-th column is the column vector $v_j$, i.e., $V_m = [v_1, v_2, \ldots, v_m]$. Then the columns of $V_m$ form an orthonormal basis of the Krylov subspace $\{v_1, Av_1, \ldots, A^{m-1}v_1\}$.
(iii) Define $\tilde H_m$ as the (m + 1) × m matrix whose nonzero entries are the coefficients $h_{ij}$, and let $H_m$ be the m × m matrix obtained from $\tilde H_m$ by deleting its last row. Then the matrix $H_m$ is such that
$$AV_m - V_m H_m = h_{m+1,m}[0, 0, \ldots, 0, v_{m+1}]. \tag{6.10.24}$$
(iv) A numerically viable way to implement the Arnoldi method is to use modified Gram-Schmidt or complete or partial reorthogonalization in step 2.
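The Arnoldi process can be sketched in plain Python as below, using the modified Gram-Schmidt variant mentioned in note (iv): each $h_{ij}$ is computed from the partially orthogonalized vector, which gives the same value as $v_i^T A v_j$ in exact arithmetic. The function name `arnoldi`, the storage of the basis vectors as rows of a list, and the early exit when $h_{j+1,j} = 0$ are assumptions made for illustration. The test data are a small 3 × 3 matrix with $v_1 = (1, 0, 0)^T$ and m = 2.

```python
import math


def arnoldi(A, v1, m):
    """Return (V, H): V is a list of the basis vectors v_1, v_2, ... (one per entry),
    H is the (m+1) x m matrix of coefficients h_ij as a list of rows."""
    n = len(v1)
    V = [list(v1)]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = [sum(A[r][c] * V[j][c] for c in range(n)) for r in range(n)]   # A v_j
        for i in range(j + 1):
            # Modified Gram-Schmidt: remove the component along v_i right away.
            H[i][j] = sum(V[i][c] * w[c] for c in range(n))
            w = [w[c] - H[i][j] * V[i][c] for c in range(n)]
        H[j + 1][j] = math.sqrt(sum(c * c for c in w))
        if H[j + 1][j] == 0.0:
            break                      # the Krylov subspace is invariant; stop
        V.append([c / H[j + 1][j] for c in w])
    return V, H


A = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]]
V, H = arnoldi(A, [1.0, 0.0, 0.0], 2)
for row in H:
    print([round(h, 4) for h in row])
```

The first two rows of `H` form the square matrix $H_2$, and the last row carries $h_{3,2}$, the coefficient that appears on the right-hand side of (6.10.24).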
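Example

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}, \qquad v_1 = (1, 0, 0)^T, \qquad m = 2.$$
j = 1:
i = 1: $h_{11} = 1$
$$\hat v_2 = Av_1 - h_{11}v_1 = (0, 1, 1)^T, \qquad h_{21} = 1.4142, \qquad v_2 = \hat v_2/h_{21} = (0, 0.7071, 0.7071)^T$$
j = 2:
i = 1: $h_{12} = 3.5355$; i = 2: $h_{22} = 3.5000$

Form
$$V_2 = \begin{pmatrix} 1 & 0 \\ 0 & 0.7071 \\ 0 & 0.7071 \end{pmatrix}, \qquad H_2 = \begin{pmatrix} 1 & 3.5355 \\ 1.4142 & 3.5000 \end{pmatrix}.$$
VERIFY:
$$AV_2 - V_2 H_2 = \begin{pmatrix} 0 & 0 \\ 0 & 1.0607 \\ 0 & -1.0607 \end{pmatrix}.$$

GMRES (Generalized Minimal Residual) Method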
The GMRES method is designed to minimize the norm of the residual vector $b - Ax$ over all vectors of the affine subspace $x_0 + K_m$, where $x_0$ is the initial vector and $K_m$ is the Krylov subspace of dimension $m$.
Algorithm 6.10.9: Generalized Minimal Residual Method (GMRES) (Saad^1 and Schultz^2 (1986))
(1) Start: Choose $x_0$ and a dimension $m$ of the Krylov subspace. Compute $r_0 = b - Ax_0$.
(2) Arnoldi process: Perform $m$ steps of the Arnoldi algorithm starting with $v_1 = r_0/\|r_0\|$, to generate the Hessenberg matrix $\tilde H_m$ and the matrix $V_m$ with orthonormal columns.
(3) Form the approximate solution: Find the vector $y_m$ that minimizes the function $J(y) = \|\beta e_1 - \tilde H_m y\|$, where $e_1 = [1, 0, \ldots, 0]^T$ and $\beta = \|r_0\|$, among all vectors $y$ of $R^m$. Compute $x_m = x_0 + V_m y_m$.
^1 Youcef Saad is a professor of computer science at the University of Minnesota. He is well-known for his contributions to large-scale matrix computations based on the Arnoldi method.
^2 Martin Schultz is a professor of computer science at Yale University.
Remark: Although not clear from the above description, the number $m$ of steps needed for the above algorithm to converge is not fixed beforehand but is determined as the Arnoldi algorithm is run. A formula that gives the residual norm without computing explicitly the residual vector makes this possible. For details see the paper by Saad and Schultz [1986].
Solving a Shifted System: An important observation is that the Arnoldi basis $V_m$ is invariant under a diagonal shift of $A$: if we were to use $A - \sigma I$ instead of $A$ in Arnoldi, we would obtain the same sequence $\{v_1, \ldots, v_m\}$. This is because the Krylov subspace $K_m$ is the same for $A$ and $A - \sigma I$, provided the initial vector $v_1$ is the same. Note that from (6.10.24) we have:
$$(A - \sigma I)V_m = V_m(H_m - \sigma I) + h_{m+1,m}\, v_{m+1} e_m^T,$$
which means that if we run the Arnoldi process with the matrix $A - \sigma I$, we would obtain the same matrix $V_m$, but the matrix $H_m$ will have its diagonal shifted by $\sigma$. This idea has been exploited in the context of ordinary differential equations methods by Gear and Saad (1983). Solving shifted systems arises in many applications, such as the computation of the Frequency Response matrix of a control system (see Section 6.4.7).
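The three steps of GMRES can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Saad and Schultz: the name `gmres` and the tolerance `tol` are hypothetical, $A$ is assumed nonsingular, no restarting or preconditioning is done, and the small least-squares problem in step (3) is solved here with Givens rotations, which is one common choice among several. The last entry of the rotated right-hand side gives the residual norm without forming the residual vector, which is the kind of formula the Remark alludes to.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gmres(A, b, x0, m, tol=1e-12):
    """Return (x_m, residual norm estimate) after at most m Arnoldi steps."""
    r0 = [bi - ti for bi, ti in zip(b, matvec(A, x0))]
    beta = math.sqrt(dot(r0, r0))
    if beta < tol:
        return list(x0), beta
    # Step (2): Arnoldi with modified Gram-Schmidt, started from r0/||r0||.
    V = [[ri / beta for ri in r0]]
    H = [[0.0] * m for _ in range(m + 1)]
    k = m
    for j in range(m):
        w = matvec(A, V[j])
        for i in range(j + 1):
            H[i][j] = dot(V[i], w)
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))
        if H[j + 1][j] < tol:          # breakdown: solution lies in K_{j+1}
            k = j + 1
            break
        V.append([wk / H[j + 1][j] for wk in w])
    # Step (3): minimize ||beta*e1 - H y|| by reducing H to triangular form.
    g = [beta] + [0.0] * k
    for j in range(k):
        r = math.hypot(H[j][j], H[j + 1][j])
        c, s = H[j][j] / r, H[j + 1][j] / r
        for col in range(j, k):
            top, bottom = H[j][col], H[j + 1][col]
            H[j][col] = c * top + s * bottom
            H[j + 1][col] = -s * top + c * bottom
        g[j], g[j + 1] = c * g[j] + s * g[j + 1], -s * g[j] + c * g[j + 1]
    y = [0.0] * k
    for i in reversed(range(k)):       # back substitution
        y[i] = (g[i] - sum(H[i][q] * y[q] for q in range(i + 1, k))) / H[i][i]
    x = [x0[i] + sum(y[j] * V[j][i] for j in range(k)) for i in range(len(b))]
    return x, abs(g[k])

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [5.0, 5.0, 3.0]                    # chosen so that the solution is (1, 1, 1)
x, res = gmres(A, b, [0.0, 0.0, 0.0], 3)
```

Because $m = n = 3$ in this small test, the Krylov subspace is the whole space and the computed `x` agrees with the exact solution up to rounding.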
6.11 Review and Summary
For easy reference, we now state the most important results discussed in this chapter.
1. Numerical Methods for Arbitrary Linear System Problems. Two types of methods, direct and iterative, have been discussed.
(a) Direct Methods: The Gaussian elimination and QR factorization methods.
Gaussian elimination without row interchanges, when it exists, gives a factorization of $A$: $A = LU$. The system $Ax = b$ is then solved by first solving the lower triangular system $Ly = b$, followed by solving the upper triangular system $Ux = y$. The method requires $n^3/3$ flops. It is unstable for arbitrary matrices, and is not recommended for practical use unless the matrix $A$ is symmetric positive definite. The growth factor can be arbitrarily large for an arbitrary matrix.
Gaussian elimination with partial pivoting gives a factorization of $A$:
$$MA = U.$$
Once having this factorization, $Ax = b$ can be solved by solving the upper triangular system $Ux = b'$, where $b' = Mb$. The process requires $n^3/3$ flops and $O(n^2)$ comparisons. In theory, there are some risks involved, but in practice, this is a stable algorithm. It is the most widely used practical algorithm for solving a dense linear system.
Gaussian elimination with complete pivoting gives
$$MAQ = U.$$
Once having this factorization, $Ax = b$ can be solved by first solving
$$Uy = b', \quad \text{where } y = Q^T x \text{ and } b' = Mb,$$
and then recovering $x$ from
$$x = Qy.$$
The process requires $n^3/3$ flops and $O(n^3)$ comparisons. Thus it is more expensive than Gaussian elimination with partial pivoting, but it is more stable (the growth factor in this case is bounded by a slowly growing function of $n$, whereas the growth factor with Gaussian elimination using partial pivoting can be as big as $2^{n-1}$).
The orthogonal triangularization methods are based on the QR factorization of $A$:
$$A = QR.$$
Once having this factorization, $Ax = b$ can be solved by solving the upper triangular system $Rx = b'$, where $b' = Q^T b$. One can use either the Householder method or the Givens method to achieve this factorization. The Householder method is more economical than the Givens method ($2n^3/3$ flops versus $4n^3/3$ flops). Both methods have guaranteed stability.
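As a concrete illustration of Gaussian elimination with partial pivoting followed by back substitution, here is a minimal Python sketch. The function name `solve_partial_pivoting` is hypothetical, the elimination is applied to the augmented matrix $[A \mid b]$ rather than by forming $M$ explicitly, and the $2 \times 2$ test system with a tiny leading entry is an assumed example chosen so that the row interchange matters.

```python
def solve_partial_pivoting(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on a copy of the augmented matrix [A | b].
    M = [[float(a) for a in row] + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n - 1):
        # Pick the row with the largest entry in magnitude in column k.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            mult = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= mult * M[k][j]
    # Back substitution on the upper triangular system Ux = b'.
    x = [0.0] * n
    for i in reversed(range(n)):
        if M[i][i] == 0.0:
            raise ValueError("matrix is singular")
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Assumed test system; its exact solution is x = (1, 1).
x = solve_partial_pivoting([[1e-5, 1.0], [1.0, 1.0]], [1.00001, 2.0])
```

All multipliers are at most 1 in magnitude because of the pivot choice, which is what keeps the growth of the entries under control in practice.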
(b) Iterative Methods: The Jacobi, Gauss-Seidel, and SOR methods have been discussed.
A generic formulation of an iterative method is:
$$x^{(k+1)} = Bx^{(k)} + d.$$
Different methods differ in the way $B$ and $d$ are chosen. Writing $A = L + D + U$ we have:
For the Jacobi method:
$$B = B_J = -D^{-1}(L + U), \quad d = b_J = D^{-1}b.$$
For the Gauss-Seidel method:
$$B = B_{GS} = -(D + L)^{-1}U, \quad d = b_{GS} = (D + L)^{-1}b.$$
For the SOR method:
$$B = B_{SOR} = (D + \omega L)^{-1}[(1 - \omega)D - \omega U], \quad d = b_{SOR} = \omega(D + \omega L)^{-1}b$$
($\omega$ is the relaxation parameter).
The iteration
$$x^{(k+1)} = Bx^{(k)} + d$$
converges, for any arbitrary choice of the initial approximation $x^{(1)}$, if and only if the spectral radius of $B$ is less than 1 (Theorem 6.10.1). A sufficient condition for convergence is $\|B\| < 1$ (Theorem 6.10.1).
Using this sufficient condition, it has been shown that both the Jacobi and Gauss-Seidel methods converge when $A$ is a diagonally dominant matrix (Corollary 1 and Corollary 2 of Theorem 6.10.1).
The Gauss-Seidel method also converges when $A$ is symmetric positive definite (Theorem 6.10.2).
For the SOR iteration to converge for any arbitrary choice of the initial approximation, the relaxation parameter $\omega$ has to lie in (0, 2) (Theorem 6.10.4).
If the matrix $A$ is symmetric positive definite, then the SOR iteration is guaranteed to converge for any arbitrary choice of $\omega$ in the interval (0, 2) (Theorem 6.10.5).
For a consistently ordered and 2-cyclic matrix $A$ with nonzero diagonal entries, the optimal choice of $\omega$, denoted by $\omega_{opt}$, is given by:
$$\omega_{opt} = \frac{2}{1 + \sqrt{1 - \rho(B_J)^2}},$$
assuming that the eigenvalues of $B_J$ are real and $\rho(B_J) < 1$ (Theorem 6.10.6). Here $\rho(A)$ stands for the spectral radius of $A$.
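The three iterations summarized above can be sketched componentwise, which avoids forming $B$ and $d$ explicitly. The following Python sketch is illustrative only: the function names `jacobi` and `sor`, the fixed iteration count, the zero starting vector, and the small diagonally dominant tridiagonal test matrix are all assumptions made for the example. Setting $\omega = 1$ in `sor` gives Gauss-Seidel. For this particular matrix $B_J = \frac{1}{4}\,\mathrm{tridiag}(1, 0, 1)$, so $\rho(B_J) = \sqrt{2}/4$, which is used to evaluate the formula for $\omega_{opt}$.

```python
import math

def jacobi(A, b, iters=50):
    """Jacobi iteration: every component uses the previous iterate."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

def sor(A, b, omega=1.0, iters=50):
    """SOR iteration; omega = 1 reduces to Gauss-Seidel."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            # x[j] for j < i already holds the new iterate (in-place update).
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - s) / A[i][i]
    return x

A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]                       # exact solution is (1, 1, 1)

rho_J = math.sqrt(2.0) / 4.0              # spectral radius of B_J for this A
omega_opt = 2.0 / (1.0 + math.sqrt(1.0 - rho_J ** 2))

x_jacobi = jacobi(A, b)
x_gs = sor(A, b, omega=1.0)
x_sor = sor(A, b, omega=omega_opt)
```

Here $\omega_{opt}$ is only slightly larger than 1 because $\rho(B_J)$ is small; the gain from SOR is much more pronounced when $\rho(B_J)$ is close to 1.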
2. Special Systems: Symmetric positive definite, diagonally dominant, Hessenberg, and tridiagonal systems have been discussed.
(a) Symmetric positive definite system. Two methods, Gaussian elimination without pivoting and Cholesky, are described.
Gaussian elimination without pivoting gives a factorization of $A$:
$$A = LDL^T.$$
The system $Ax = b$ is then solved by first solving the lower triangular system $LDy = b$, followed by solving the upper triangular system $L^T x = y$. The method requires $n^3/3$ flops and does not require any square root evaluations. It is stable (growth factor $\rho \le 1$).
The Cholesky factorization algorithm computes a factorization of $A$ in the form $A = HH^T$, where $H$ is lower triangular with positive diagonal entries. Once having this factorization, the system $Ax = b$ is solved by first solving the lower triangular system $Hy = b$, where $y = H^T x$, followed by solving the upper triangular system $H^T x = y$. The method requires $n^3/6$ flops and $n$ square root evaluations. It is stable.
(b) Diagonally dominant system. Gaussian elimination with partial pivoting is stable ($\rho \le 2$).
(c) Hessenberg system. Gaussian elimination with partial pivoting requires only $O(n^2)$ flops to solve an $n \times n$ Hessenberg system. It is stable ($\rho \le n$).
(d) Tridiagonal system. Gaussian elimination with partial pivoting requires only $O(n)$ flops. It is stable ($\rho \le 2$).
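The Cholesky route for a symmetric positive definite system can be sketched as below. This is an illustrative sketch only: the names `cholesky` and `solve_spd` are hypothetical, the factor is computed column by column, and the $2 \times 2$ test matrix is an assumed example whose factor has small integer entries.

```python
import math

def cholesky(A):
    """Return lower triangular H with positive diagonal such that A = H H^T."""
    n = len(A)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(H[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        H[j][j] = math.sqrt(d)          # one square root per column
        for i in range(j + 1, n):
            s = sum(H[i][k] * H[j][k] for k in range(j))
            H[i][j] = (A[i][j] - s) / H[j][j]
    return H

def solve_spd(A, b):
    """Solve Ax = b via Hy = b (forward) and H^T x = y (backward)."""
    H = cholesky(A)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(H[i][k] * y[k] for k in range(i))) / H[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(H[k][i] * x[k] for k in range(i + 1, n))) / H[i][i]
    return x

A = [[4.0, 2.0], [2.0, 5.0]]
H = cholesky(A)                  # [[2.0, 0.0], [1.0, 2.0]]
x = solve_spd(A, [6.0, 7.0])     # [1.0, 1.0]
```

The check `d <= 0.0` doubles as a practical test for positive definiteness: the factorization runs to completion exactly when every such quantity is positive.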
3. Inverse, Determinant and Leading Principal Minors. The inverse and the determinant of a matrix $A$ can be readily computed once a factorization of $A$ is available.
(a) Inverses.
If Gaussian elimination without pivoting is used, we have
$$A = LU.$$
Then
$$A^{-1} = U^{-1}L^{-1}.$$
If Gaussian elimination with partial pivoting is used, we have
$$MA = U.$$
Then
$$A^{-1} = (M^{-1}U)^{-1} = U^{-1}M.$$
If Gaussian elimination with complete pivoting is used, we have
$$MAQ = U.$$
Then
$$A^{-1} = (M^{-1}UQ^T)^{-1} = QU^{-1}M.$$
If an orthogonal factorization is used, we have
$$A = QR.$$
Then
$$A^{-1} = (QR)^{-1} = R^{-1}Q^T.$$
Note that most problems involving inverses can be recast so that the inverse does not have to be computed explicitly.
Furthermore, there are matrices (such as triangular, etc.) whose inverses are easily computed.
The inverse of a matrix $B$ which differs from a matrix $A$ by a rank-one perturbation only can be readily computed, once the inverse of $A$ has been found, by using the Sherman-Morrison Formula: Let $B = A - uv^T$. Then $B^{-1} = A^{-1} + \alpha(A^{-1}uv^TA^{-1})$, where $\alpha = \dfrac{1}{1 - v^TA^{-1}u}$. There is a generalization of this formula, known as the Woodbury Formula.
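The Sherman-Morrison update is easy to check on a small case. The sketch below is only an illustration: the function name `sherman_morrison` is hypothetical, exact rational arithmetic from Python's `fractions` module is used so that the result can be compared exactly, and the matrices $A = \mathrm{diag}(2, 4)$, $u = (1, 1)^T$, $v = (1, 0)^T$ are assumed test data.

```python
from fractions import Fraction

def sherman_morrison(Ainv, u, v):
    """Return the inverse of B = A - u v^T, given Ainv = A^{-1}."""
    n = len(u)
    Au = [sum(Ainv[i][j] * u[j] for j in range(n)) for i in range(n)]  # A^{-1} u
    vA = [sum(v[i] * Ainv[i][j] for i in range(n)) for j in range(n)]  # v^T A^{-1}
    denom = 1 - sum(v[i] * Au[i] for i in range(n))
    if denom == 0:
        raise ValueError("B is singular: 1 - v^T A^{-1} u = 0")
    alpha = Fraction(1) / denom
    return [[Ainv[i][j] + alpha * Au[i] * vA[j] for j in range(n)]
            for i in range(n)]

# A = diag(2, 4), so A^{-1} = diag(1/2, 1/4).
Ainv = [[Fraction(1, 2), Fraction(0)], [Fraction(0), Fraction(1, 4)]]
u = [Fraction(1), Fraction(1)]
v = [Fraction(1), Fraction(0)]

# B = A - u v^T = [[1, 0], [-1, 4]], whose inverse is [[1, 0], [1/4, 1/4]].
Binv = sherman_morrison(Ainv, u, v)
```

The update costs only $O(n^2)$ operations once $A^{-1}$ is known, compared with $O(n^3)$ for inverting $B$ from scratch.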
(b) Determinant. The determinant is rarely required in practical applications. However, the determinant of $A$ can be computed immediately, once a factorization of $A$ has been obtained. If we use Gaussian elimination with partial pivoting, we have
$$MA = U;$$
then
$$\det(A) = (-1)^r a_{11} a_{22}^{(1)} \cdots a_{nn}^{(n-1)},$$
where $r$ is the number of row interchanges made during the elimination process, and $a_{11}, a_{22}^{(1)}, \ldots, a_{nn}^{(n-1)}$ are the pivot entries, which appear as the diagonal entries of $U$. Similarly, other factorizations can be readily used to find the determinant.
(c) Leading Principal Minors. The Givens triangularization method has been described.
4. The Condition Number and Accuracy of Solution. In the linear system problem $Ax = b$,
the input data are $A$ and $b$. There may exist impurities either in $A$ or in $b$, or in both. We have presented perturbation analyses in all three cases. The results are contained in Theorems 6.6.1, 6.6.2, and 6.6.3. Theorem 6.6.3 is the most general theorem. In all three cases, it turns out that
$$\mathrm{Cond}(A) = \|A\|\,\|A^{-1}\|$$
is the deciding factor. If this number is large, then a small perturbation in the input data may cause a large relative error in the computed solution. In this case, the system is called an ill-conditioned system; otherwise it is well-conditioned. The matrix $A$ having a large condition number is called an ill-conditioned matrix. Some important properties of the condition number of a matrix have been listed. Some well-known ill-conditioned matrices are the Hilbert matrix, Pie matrix, Wilkinson bidiagonal matrix, etc. The condition number, of course, has a noticeable effect on the accuracy of the solution.
A computed solution can be considered accurate only when the product of Cond(A) and the relative residual is small (Theorem 6.7.1). Thus, a small relative residual alone does not guarantee the accuracy of the solution. A frequently asked question is: How large does Cond(A) have to be before the system $Ax = b$ is considered to be ill-conditioned?
The answer depends upon the accuracy of the input data and the level of tolerance of the error in the solution. In general, if the data are approximately accurate and if $\mathrm{Cond}(A) = 10^s$, then there are about $(t - s)$ significant digits of accuracy in the solution, if it is computed in $t$-digit arithmetic.
Computing the condition number from the definition is clearly expensive; it involves finding the norm of the inverse of $A$, and finding the inverse of $A$ is about three times the expense of solving the linear system itself. Two condition number estimators, the LINPACK condition number estimator and Hager's norm-1 condition number estimator, have been described. There are symptoms exhibited during Gaussian elimination with pivoting, such as a small pivot, a large computed solution, a large residual, etc., that merely indicate if a system is ill-conditioned, but these are not sure tests. When componentwise perturbations are known, Skeel's condition number can be useful, especially when the norms of the columns of the inverse matrix vary widely.
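To make the definition $\mathrm{Cond}(A) = \|A\|\,\|A^{-1}\|$ and the "$t - s$ digits" rule of thumb concrete, the following Python sketch computes the condition number of the $3 \times 3$ Hilbert matrix directly from the definition. It is a sketch under stated assumptions: the infinity norm is used, the inverse is formed explicitly by Gauss-Jordan elimination in exact rational arithmetic (which is exactly the expensive route the text warns against for large problems), and the helper names `inverse` and `norm_inf` are hypothetical.

```python
from fractions import Fraction

def inverse(A):
    """Exact inverse by Gauss-Jordan elimination on [A | I]."""
    n = len(A)
    M = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for k in range(n):
        # Any nonzero pivot will do in exact arithmetic.
        p = next((i for i in range(k, n) if M[i][k] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        piv = M[k][k]
        M[k] = [e / piv for e in M[k]]
        for i in range(n):
            if i != k:
                f = M[i][k]
                M[i] = [a - f * c for a, c in zip(M[i], M[k])]
    return [row[n:] for row in M]

def norm_inf(A):
    """Infinity norm: the largest absolute row sum."""
    return max(sum(abs(e) for e in row) for row in A)

# 3 x 3 Hilbert matrix with entries 1/(i + j - 1).
hilbert = [[Fraction(1, i + j + 1) for j in range(3)] for i in range(3)]
cond = norm_inf(hilbert) * norm_inf(inverse(hilbert))   # Fraction(748, 1)
```

Since $748 \approx 10^{2.9}$, the rule of thumb suggests that roughly three significant digits can be lost when a system with this matrix is solved, even though the matrix is only $3 \times 3$.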
5. Iterative Refinement. Once a solution has been computed, an inexpensive way to refine the solution iteratively, known as the iterative refinement procedure, has been described (Section 6.8). The iterative refinement technique is a very useful technique.
6. The Conjugate Gradient and GMRES Methods. The conjugate gradient method, when used with an appropriate preconditioner, is one of the most widely used methods for solving a large and sparse symmetric positive definite linear system.
Only one type of preconditioner, namely the Incomplete Cholesky Factorization (ICF), has been described in this book. The basic idea of ICF is to compute the Cholesky factorization of $A$ for the nonzero entries of $A$ only, leaving the zero entries as zeros.
We have described only basic Arnoldi and Arnoldi-based GMRES methods. In practice, the modified Gram-Schmidt process needs to be used in the implementation of the Arnoldi method, and GMRES needs to be used with a proper preconditioner.
The conjugate gradient method is direct in theory, but iterative in practice. It is extremely slow when $A$ is ill-conditioned, and a preconditioner is needed to accelerate the convergence.
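The statement that the conjugate gradient method is direct in theory can be seen on a tiny example: for an $n \times n$ symmetric positive definite system it terminates in at most $n$ steps in exact arithmetic. The Python sketch below is the basic unpreconditioned iteration; the function name `conjugate_gradient`, the stopping tolerance `tol`, the zero starting vector, and the $2 \times 2$ test system are assumptions made for illustration.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def conjugate_gradient(A, b, tol=1e-12, max_iter=None):
    """Basic conjugate gradient iteration for symmetric positive definite A."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                  # residual b - Ax for the zero starting vector
    p = list(r)                  # first direction vector
    rr = dot(r, r)
    for _ in range(max_iter or n):
        if math.sqrt(rr) < tol:
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)                      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        # New direction: residual plus a multiple of the old direction.
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)     # exact solution is (1/11, 7/11)
```

For large, ill-conditioned systems the iteration is stopped by the residual test long before $n$ steps, which is why it behaves as an iterative method in practice and why a preconditioner matters.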
6.12 Some Suggestions for Further Reading
The books on numerical methods in the engineering literature routinely discuss how various engineering applications give rise to linear systems problems. We have used the following two in our discussions and found them useful:
1. Numerical Methods for Engineers, by Steven C. Chapra and Raymond P. Canale (second edition), McGraw-Hill, Inc., New York, 1988.
2. Advanced Engineering Mathematics, by Peter V. O'Neil (third edition), Wadsworth Publishing Company, Belmont, California, 1991.
Direct methods (such as Gaussian elimination, QR factorization, etc.) for linear systems and related problems, discussions on perturbation analysis and conditioning of the linear systems problems, iterative refinement, etc. can be found in any standard numerical linear algebra text (in particular the books by Golub and Van Loan, MC, and by Stewart, IMC, are highly recommended). Most numerical analysis texts contain some discussion, but none of the existing books provides a thorough and in-depth treatment.
For discussion on solutions of linear systems with special matrices such as diagonally dominant, Hessenberg, positive definite, etc., see Wilkinson (AEP, pp. 218-220).
For proofs and backward error analyses of various algorithms, AEP by Wilkinson is the most authoritative book. Several recent papers by Nick Higham (1986, 1987) on condition-number estimators are interesting to read.
For iterative methods, two (by now) almost classical books on the subject:
1. Matrix Iterative Analysis, by Richard Varga, Prentice Hall, Englewood Cliffs, New Jersey, 1962, and
2. Iterative Solution of Large Linear Systems, by David Young, Academic Press, New York, 1971,
are a must. Another important book in the area is: Applied Iterative Methods, by L. A. Hageman and D. M. Young, Academic Press, New York, 1981.
The most recent book in the area is Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, by Richard Barrett, Mike Berry, Tony Chan, James Demmel, June Donato, Jack Dongarra, Victor Eijkhout, Roldan Pozo, Chuck Romine and Henk van der Vorst, SIAM, 1994. The book incorporates state-of-the-art computational methods for solving large and sparse nonsymmetric systems.
The books Introduction to Parallel and Vector Solutions of Linear Systems and Numerical Analysis: A Second Course, by James Ortega, also contain very clear expositions on the convergence of iterative methods.
The Conjugate Gradient Method, originally developed by M. R. Hestenes and E. Stiefel (1952), has received considerable attention in the context of solving large and sparse positive definite linear systems. A considerable amount of work has been done by numerical linear algebraists and researchers in applications areas (such as optimization). The following books contain some in-depth discussion:
1. Introduction to Linear and Nonlinear Programming, by David G. Luenberger, Addison-Wesley, New York, 1973.
2. Introduction to Parallel and Vector Solutions of Linear Systems, by James Ortega.
An excellent survey paper on the subject by Golub and O'Leary (1989) is also recommended for further reading. See also another survey paper by Axelsson (1985). The interesting survey paper by Young, Jea, and Mai (1988), in the book Linear Algebra in Signals, Systems, and Control, edited by B. N. Datta, et al., SIAM, 1988, is worth reading.
The development of conjugate gradient type methods for nonsymmetric and symmetric indefinite linear systems is an active area of research.
A nice discussion on "scaling" appears in Forsythe and Moler (CSLAS, Chapter 11). See also a paper by Skeel (1979) for the relationship between scaling and stability of Gaussian elimination. A recent paper of Chandrasekaran and Ipsen (1994) describes sensitivity of individual components of the solution vector when the data is subject to perturbations.
To learn more about the Arnoldi method and the Arnoldi-based GMRES method, see the papers of Saad and his coauthors (Saad (1981), Saad (1982), Saad and Schultz (1986)). Walker (1988) has proposed an implementation of the GMRES method using Householder matrices.
Exercises on Chapter 6 Use MATLAB Wherever Needed and Appropriate PROBLEMS ON SECTION 6.3
1. An engineer requires 5000, 5500, and 6000 yd^3 of sand, cement and gravel for a building project. He buys his material from three stores. A distribution of each material in these stores is given as follows:

Store   Sand (%)   Cement (%)   Gravel (%)
1       60         20           20
2       40         40           20
3       20         30           50

How many cubic yards of each material must the engineer take from each store to meet his needs?
2. If the input to reactor 1 in the "reactor" problem of Section 6.3.2 is decreased 10%, what is the percent change in the concentration of the other reactors?
3. Consider the following circuit diagram:
[Figure: circuit with nodes 1 through 6, resistors of 10 Ω, 5 Ω, 5 Ω, 10 Ω, 15 Ω, and 20 Ω, and node voltages V1 = 200 V and V6 = 50 V.]
Set up a linear system to determine the current between nodes.
4. Using the difference equation (6.3.35), set up a linear system for heat distribution at the following interior points of a heated plate whose boundary temperatures are held constant:
[Figure: heated plate with interior grid points (i, j), i, j = 1, 2, 3, and boundary temperatures of 50°C, 100°C, 75°C, and 0°C.]
5. Derive the linear system for the finite difference approximation of the elliptic equation
$$\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} = f(x, y).$$
The domain is the unit square, $\Delta x = 0.25$, and the boundary conditions are given by
$$T(x, 0) = 1 - x, \quad T(1, y) = y, \quad T(0, y) = 1, \quad T(x, 1) = 1.$$
6. For the previous problem, if
$$f(x, y) = -\pi^2 \sin(\pi x)\sin(\pi y),$$
then the analytic solution to the elliptic equation
$$\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} = f(x, y)$$
with the same boundary conditions as in problem #5, is given by
$$T(x, y) = 1 - x + xy + \tfrac{1}{2}\sin(\pi x)\sin(\pi y)$$
(Celia and Gray, Numerical Methods for Differential Equations, Prentice Hall, Inc., NJ, 1992, pp. 105-106).
(a) Use the finite difference scheme of section 6.3.4 to approximate the values of $T$ at the interior points with $\Delta x = \Delta y = \frac{1}{n}$, $n = 4, 8, 16$.
(b) Compare the values obtained in (a) with the exact solution.
(c) Write down the linear system arising from the finite element method of the solution of the two-point boundary value problem: $-2u'' + 3u = x^2$, $0 \le x \le 1$, $u(0) = u(1) = 0$, using the same basis functions $\phi_j(x)$ as in the book and a uniform grid.
PROBLEMS ON SECTION 6.4
7. Solve the linear system $Ax = b$, where $b$ is a vector with each component equal to 1 and with each $A$ from problem #18 of Chapter 5, using
(a) Gaussian elimination without pivoting,
(b) Gaussian elimination with partial and complete pivoting,
(c) the QR factorization.
8. (a) Solve each of the systems of Problem #7 using partial pivoting but without explicit factorization (Section 6.4.4).
(b) Compute the residual vector in each case.
9. Solve
$$\begin{pmatrix} 0.00001 & 1 & 1 \\ 3 & 1 & 1 \\ 1 & 2 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2.0001 \\ 3 \\ 3 \end{pmatrix}$$
using Gaussian elimination without and with partial pivoting and compare the answers.
10. Consider $m$ linear systems
$$Ax_i = b_i, \quad i = 1, 2, \ldots, m.$$
(a) Develop an algorithm to solve the above systems using Gaussian elimination with complete pivoting. Your algorithm should take advantage of the fact that all the $m$ systems have the same system matrix $A$.
(b) Determine the flop-count of your algorithm.
(c) Apply your algorithm in the special case where $b_i$ is the $i$th column of the identity matrix, $i = 1, \ldots, n$.
(d) Use the algorithm in (c) to show that the inverse of an $n \times n$ matrix $A$ can be computed using Gaussian elimination with complete pivoting in about $\frac{4}{3}n^3$ flops.
(e) Apply the algorithm in (c) to compute the inverse of
$$A = \begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{3} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \end{pmatrix}.$$
11. Consider the system $Ax = b$, where both $A$ and $b$ are complex. Show how the system can be solved using real arithmetic only. Compare the flop-count in this case with that needed to solve the system with Gaussian elimination using complex arithmetic.
12. (i) Compute the Cholesky factorization of
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1.001 & 1.001 \\ 1 & 1.001 & 2 \end{pmatrix}$$
using (a) Gaussian elimination without pivoting, (b) the Cholesky algorithm.
(ii) In part (a) verify that $\max |a_{ij}^{(k)}| \le \max |a_{ij}^{(k-1)}|$, $k = 1, 2$, and that $A^{(1)}$ and $A^{(2)}$ are also positive definite.
(iii) What is the growth factor?
13. Using the results of problem #12, solve the system $Ax = b$, where
$$b = \begin{pmatrix} 3 \\ 3.0020 \\ 4.0010 \end{pmatrix}$$
and $A$ is the same as in problem #12.
14. (a) Show that
$$A = \begin{pmatrix} 4 & -1 & -1 & 0 \\ -1 & 4 & 0 & -1 \\ -1 & 0 & 4 & -1 \\ 0 & -1 & -1 & 4 \end{pmatrix}$$
is positive definite with and without finding the Cholesky factorization.
(b) Solve the system $Ax = (2, 2, 2, 2)^T$.
15. (a) Develop an algorithm to solve a tridiagonal system using Gaussian elimination with partial pivoting.
(b) Show that the growth factor in this case is bounded by two (Hint: $\max |a_{ij}^{(1)}| \le 2 \max |a_{ij}|$).
16. (a) Show that Gaussian elimination applied to a column diagonally dominant matrix preserves diagonal dominance at each step of reduction; that is, if $A = (a_{ij})$ is column diagonally dominant, then so is $A^{(k)} = (a_{ij}^{(k)})$, $k = 1, 2, \ldots, n - 1$.
(b) Show that the growth factor using Gaussian elimination with partial pivoting for such a matrix is bounded by 2. (Hint: $\max_k \max_{i,j} |a_{ij}^{(k)}| \le 2 \max |a_{ii}|$.)
(c) Verify the statement of part (a) with the matrix of problem #14.
(d) Construct a $2 \times 2$ column diagonally dominant matrix whose growth factor with Gaussian elimination without pivoting is larger than 1 but less than or equal to 2.
17. Solve the tridiagonal system
$$\begin{pmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}$$
(a) using Gaussian elimination, (b) computing the LU factorization of $A$ directly from $A = LU$.
18. Solve the diagonally dominant system
$$\begin{pmatrix} 10 & 1 & 1 & 1 \\ 1 & 10 & 1 & 1 \\ -1 & 0 & 10 & 1 \\ -1 & -1 & -1 & 10 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 13 \\ 13 \\ 10 \\ 7 \end{pmatrix}$$
using Gaussian elimination without pivoting. Compute the growth factor.
19. (a) Develop efficient algorithms for triangularizing (i) an upper Hessenberg matrix, (ii) a lower Hessenberg matrix, using Gaussian elimination with partial pivoting.
(b) Show that when $A$ is upper Hessenberg, Gaussian elimination with partial pivoting gives
$$|a_{ij}^{(k)}| \le k + 1 \quad \text{if } |a_{ij}| \le 1;$$
hence deduce that the growth factor in this case is bounded by $n$.
(c) Apply your algorithms to solve the systems:
i. $$\begin{pmatrix} 1 & 2 & 2 & 3 \\ 3 & 4 & 5 & 7 \\ 0 & 0.1 & 2 & 3 \\ 0 & 0 & 0.1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 6 \\ 7 \\ 8 \\ 9 \end{pmatrix}$$
ii. $$\begin{pmatrix} 1 & 0.0001 & 0 \\ 0 & 2 & 0.0001 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1.0001 \\ 2.0001 \\ 3.0000 \end{pmatrix}$$
iii. $$\begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}.$$
(d) Compute the growth factor in each case.
(e) Suppose the data in the above problems are accurate to 4 digits and you seek an accuracy of three digits in your solution. Identify which problems are ill-conditioned.
20. (a) Find the QR factorization of
$$A = \begin{pmatrix} 1 & 1 \\ 10^{-5} & 0 \\ 0 & 10^{-5} \end{pmatrix}.$$
(b) Using the results of (a), solve $Ax = b$, where $b = (2, 10^{-5}, 10^{-5})^T$.
21. Find the LU factorization of
$$A = \begin{pmatrix} T & I & 0 \\ I & T & I \\ 0 & I & T \end{pmatrix}, \quad \text{where } T = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}.$$
Use this factorization to solve $Ax = b$ where each element of $b$ is 2.
PROBLEMS ON SECTION 6.5
22. Find the determinant and the inverse of each of the matrices of problem #18 of Chapter 5 using
(a) Gaussian elimination with partial and complete pivoting,
(b) the QR factorization.
23. (a) Let
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$$
Assume that $A_{11}$ and $A_{22}$ are square and that $A_{11}$ and $A_{22} - A_{21}A_{11}^{-1}A_{12}$ are nonsingular. Let
$$B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$$
be the inverse of $A$. Show that
$$B_{22} = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}, \quad B_{12} = -A_{11}^{-1}A_{12}B_{22},$$
$$B_{21} = -B_{22}A_{21}A_{11}^{-1}, \quad \text{and} \quad B_{11} = A_{11}^{-1} - B_{12}A_{21}A_{11}^{-1}.$$
(b) How many flops are needed to compute $A^{-1}$ using the results of (a) if $A_{11}$ and $A_{22}$ are, respectively, $m \times m$ and $p \times p$?
(c) Use your results above to compute $A^{-1}$, where
$$A = \begin{pmatrix} 4 & 0 & -1 & -1 \\ 0 & 4 & -1 & -1 \\ -1 & -1 & 4 & 0 \\ -1 & 1 & 0 & 4 \end{pmatrix}.$$
24. Let
$$A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 4.0001 & 2.0002 \\ 1 & 2.0002 & 2.0004 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 2 & 1 \\ 2 & 4.0001 & 2.0002 \\ 1 & 2.0002 & 2.0004 \end{pmatrix}.$$
Write $B$ in the form $B = A - uv^T$, then compute $B^{-1}$ using the Sherman-Morrison formula, knowing
$$A^{-1} = 10^4 \begin{pmatrix} 4.0010 & -2.0006 & 0.0003 \\ -2.0006 & 1.0004 & -0.0002 \\ 0.0003 & -0.0002 & 0.0001 \end{pmatrix}.$$
25. Suppose you have solved a linear system with $A$ as the system matrix. Then show how you can solve the augmented system
$$\begin{pmatrix} A & a \\ c^T & \alpha \end{pmatrix} \begin{pmatrix} x \\ x_{n+1} \end{pmatrix} = \begin{pmatrix} b \\ b_{n+1} \end{pmatrix},$$
where $A$ is nonsingular and $n \times n$, and $a$, $b$, and $c$ are vectors, using the solution you have already obtained. Apply your result to the solution of
$$\begin{pmatrix} 1 & 2 & 3 & 1 \\ 4 & 5 & 6 & 1 \\ 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix} y = (6, 15, 3, 1)^T.$$
26. Consider the symmetric system $Ax = b$, where
$$A = \begin{pmatrix} 0.4445 & 0.4444 & -0.2222 \\ 0.4444 & 0.4445 & -0.2222 \\ -0.2222 & -0.2222 & 0.1112 \end{pmatrix}, \quad b = \begin{pmatrix} 0.6667 \\ 0.6667 \\ -0.3332 \end{pmatrix}.$$
The exact solution of the system is $x = (1, 1, 1)^T$.
(a) Make a small perturbation $\delta b$ in $b$ keeping $A$ unchanged. Solve the system $Ax' = b + \delta b$. Compare $x'$ with $x$. Compute Cond(A) and verify the appropriate inequality in the text.
(b) Make a small perturbation $\Delta A$ in $A$ such that $\|\Delta A\| < \dfrac{1}{\|A^{-1}\|}$. Solve the system $(A + \Delta A)x' = b$. Compare $x'$ with $x$ and verify the appropriate inequality in the text. (Hint: $\|A^{-1}\|_2 = 10^4$.)
27. Prove the inequality
$$\frac{\|\delta x\|}{\|x + \delta x\|} \le \mathrm{Cond}(A)\,\frac{\|\Delta A\|}{\|A\|},$$
where $Ax = b$ and $(A + \Delta A)(x + \delta x) = b$. Verify the inequality for the system
$$\begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{3} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad \text{using } \Delta A = \begin{pmatrix} 0 & 0 & 0.00003 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
28. (a) How are Cond(A) and Cond($A^{-1}$) related?
(b) Show that
i. $\mathrm{Cond}(A) \ge 1$,
ii. $\mathrm{Cond}(A^T A) = (\mathrm{Cond}(A))^2$.
29. (a) Let $O$ be an orthogonal matrix. Then show that Cond(O) with respect to the 2-norm is one.
(b) Show that Cond(A) with respect to the 2-norm is one if and only if $A$ is a scalar multiple of an orthogonal matrix.
30. Let $U = (u_{ij})$ be a nonsingular upper triangular matrix. Then show that with respect to the infinity norm
$$\mathrm{Cond}(U) \ge \frac{\max(u_{ii})}{\min(u_{ii})}.$$
Hence construct a simple example of an ill-conditioned nondiagonal symmetric positive definite matrix.
31. Let $A = LDL^T$ be a symmetric positive definite matrix. Let $D = \mathrm{diag}(d_{ii})$. Then show that with respect to the 2-norm
$$\mathrm{Cond}(A) \ge \frac{\max(d_{ii})}{\min(d_{ii})}.$$
Hence construct an example of an ill-conditioned nondiagonal symmetric positive definite matrix.
32. (a) Show that for any matrix $A$, Cond(A) with respect to the 2-norm is given by
$$\mathrm{Cond}(A) = \frac{\sigma_{\max}}{\sigma_{\min}},$$
where $\sigma_{\max}$ and $\sigma_{\min}$ are, respectively, the largest and the smallest singular values of $A$.
(b) Use the above expression for Cond(A) to construct an example of an ill-conditioned matrix as follows: choose two nondiagonal orthogonal matrices $U$ and $V$ (in particular they can be chosen as Householder matrices) and a diagonal matrix $\Sigma$ with one or several small diagonal entries. Then the matrix
$$A = U\Sigma V^T$$
has the same condition number as $\Sigma$ and is ill-conditioned.
33. (a) Construct your own example to show that a small residual does not necessarily guarantee that the solution is accurate.
(b) Give a proof of Theorem 6.7.1 (Residual Theorem).
(c) Using the Residual Theorem prove that if an algorithm produces a small residual for every well-conditioned matrix, it is weakly stable.
34. (a) Find for what values of $a$ the matrix $A = \begin{pmatrix} 1 & a \\ a & 1 \end{pmatrix}$ is ill-conditioned.
(b) Let $a = 0.999$. Solve the system $Ax = (1, 1)^T$ using Gaussian elimination without pivoting.
(c) What is the condition number of $A$?
PROBLEMS ON SECTION 6.9
35. Apply iterative refinement to the system of problem #9 using $x^{(1)} = (0, 0, 0)^T$. Estimate Cond(A) from $x^{(1)}$ and $x^{(2)}$ and compare it with the actual $\mathrm{Cond}_2(A) = 16.2933$.
PROBLEMS ON SECTION 6.10
36. Consider the linear system of problem #14(a).
(a) Why do both the Jacobi and Gauss-Seidel methods converge with an arbitrary choice of initial approximation for this system?
(b) Carry out five iterations of both methods with the same initial approximation
$$x^{(1)} = (0, 0, 0, 0)^T$$
and compare the rates of convergence.
37. Construct an example to show that the convergence of the Jacobi method does not necessarily imply that the Gauss-Seidel method will converge.
38. Let the $n \times n$ matrix $A$ be partitioned into the form
$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1N} \\ A_{21} & A_{22} & \cdots & A_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ A_{N1} & A_{N2} & \cdots & A_{NN} \end{pmatrix},$$
where each diagonal block $A_{ii}$ is square and nonsingular. Consider the linear system
$$Ax = b$$
with $A$ as above and $x$ and $b$ partitioned commensurately.
(a) Write down the Block Jacobi, Block Gauss-Seidel, and Block SOR iterations for the linear system $Ax = b$. (Hint: Write $A = L + D + U$, where $D = \mathrm{diag}(A_{11}, \ldots, A_{NN})$, and $L$ and $U$ are strictly block lower and upper triangular matrices.)
(b) If $A$ is symmetric positive definite, then show that $U = L^T$ and $D$ is positive definite. In this case, from the corresponding results in the scalar cases, prove that, with an arbitrary choice of the initial approximation, Block Gauss-Seidel always converges and Block SOR converges if and only if $0 < \omega < 2$.
39. Consider the block system arising in the solution of the discrete Poisson equation $u_{xx} + u_{yy} = f$:
$$\begin{pmatrix} T & -I & & \\ -I & T & \ddots & \\ & \ddots & \ddots & -I \\ & & -I & T \end{pmatrix}, \quad \text{where} \quad T = \begin{pmatrix} 4 & -1 & & \\ -1 & 4 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 4 \end{pmatrix}.$$
Show that the Block Jacobi iteration in this case is
$$Tx_i^{(k+1)} = x_{i+1}^{(k)} + x_{i-1}^{(k)} + b_i, \quad i = 1, \ldots, N.$$
Write down the Block Gauss-Seidel and Block SOR iterations for this system.
40. (a) Prove that $\sqrt{\lambda_1}\,\|x\|_2 \le \|x\|_A \le \sqrt{\lambda_n}\,\|x\|_2$, where $A$ is a symmetric positive definite matrix with the eigenvalues $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$.
(b) Using the result in (a), prove that $\|x - x_k\|_2 \le 2\sqrt{\kappa}\,\alpha^k\,\|x - x_0\|_2$ for the conjugate gradient method.
41. Show that the Jacobi method converges for a $2 \times 2$ symmetric positive definite system.
42. For the system of problem #39, compute $\rho(B_J)$, $\rho(B_{GS})$, and $\omega_{opt}$ with $N = 50, 100, 1000$. Compare the rate of convergence of the SOR iteration using the optimal value $\omega_{opt}$ in each case with that of Gauss-Seidel, without actually performing the iterations.
43. Consider the $25 \times 25$ block diagonal system
$$\begin{pmatrix} A & 0 & \cdots & 0 \\ 0 & A & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & A \end{pmatrix} x = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}, \quad \text{where} \quad A = \begin{pmatrix} 2 & -1 & & & 0 \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & \ddots & \ddots & -1 \\ 0 & & & -1 & 2 \end{pmatrix}_{5 \times 5}.$$
Compute $\rho(B_J)$ and $\rho(B_{GS})$ and find how they are related. Solve the system using 5 iterations of Gauss-Seidel and SOR with the optimal value of $\omega$. Compare the rates of convergence.
44. Prove that the choice of minimizes the quadratic function
= pT (Ax ; b)=pT Ap ( ) = ( x ; p) = 1 (x ; p)T A(x ; p) ; bT (x ; p): 2
45. Show that the eigenvectors of A are the direction vectors. 46. (a) Apply the Incomplete Cholesky Factorization algorithm to an unreduced tridiagonal matrix T and show that the result is the usual Cholesky Factorization of T . Verify the above statement with 04 1 1 0 B 1 . . . . . . ... C B C T = B .. . . . . C : B B . . . 1C C @ A 0 1 4 5 5 (b) Apply the SOR iteration to the matrix T in (a) with w = 1:5 using x(0) = (0 0 0 0 0)T , and make a table with the results of the iterations. 47. Let p0 p1 : : : pn;1 be the direction vectors generated by the basic conjugate gradient algorithm. Let rk = b ; Axk k = 0 1 : : :n ; 1. Then prove that (a) rk 2 span (p0 : : : pk) k = 0 1 2 : : : n ; 1. (b) Span (p0 : : : pi) = span (r0 Ar0 : : : Air0 ), i = 0 1 : : : n ; 1. (See Ortega IPVL, pp. 271{273). (Read the proof from Ortega IPVL, pp. 271273 and reproduce the proof yourself using your own words.) (c) Prove that r0 : : : rn;1 are mutually orthogonal. 48. (Multisplitting) Consider the iteration
x(k+1) = Bx(k) + d
where B is given by
B=
and
k X i=1
DiBi;1 Ci i = 1 ::: k
379
d=(
k X i=1
DiBi;1 )b
A = Bi ; Ci
k X i=1
Di = I (Di 0):
Develop the Jacobi, the GaussSeidel and the SOR methods based on the multisplitting of A. This type of multisplitting has been considered by O'Leary and White (1985),
and Neuman and Plemmons (1987).
49. Apply the Jacobi, the GaussSeidel and the SOR (with optimal relaxation factor) methods to the system in the example following Theorem 6.10.6 (Example 6.10.5) and verify the statement about number of iterations made there about di erent methods. 50. Give a proof of Theorem 6.10.8. 51. Read the proof of Theorem 6.10.9 from Ortega, IPVL, p. 277, and then reproduce the proof yourself using your own words.
MATLAB AND MATCOM PROGRAMS AND PROBLEMS ON CHAPTER 6
You will need the programs lugsel, inlu, inparpiv, incompiv, givqr, compiv, invuptr, iterref, jacobi, gaused, sucov, and nichol from MATCOM.
1. (a) Write a MATLAB program called forelm, based on Algorithm 6.4.1:
[y] = forelm(L, b)
to solve a nonsingular lower triangular system Ly = b using forward elimination.
(b) Write a MATLAB program called backsub, based on Algorithm 3.1.3:
[x] = backsub(U, b)
to solve a nonsingular upper triangular system
Ux = b.
Use randomly generated test matrices, and also test matrices L and U constructed with one or more small diagonal entries. (Note: forelm and backsub are also in MATCOM or in the Appendix.)
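The two triangular solves in problem 1 can be sketched as follows. This is a row-oriented Python illustration, not the MATLAB program the exercise asks for, and the loop ordering in Algorithms 6.4.1 and 3.1.3 may differ. The function names mirror the exercise; the test matrices, each with one small diagonal entry, are hypothetical.

```python
def forelm(L, b):
    """Solve L y = b for a nonsingular lower triangular L (forward elimination)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y

def backsub(U, b):
    """Solve U x = b for a nonsingular upper triangular U (back substitution)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]
    return x

# Hypothetical test matrices with one small diagonal entry each.
L = [[2.0, 0.0, 0.0],
     [1.0, 1e-8, 0.0],
     [3.0, 2.0, 4.0]]
U = [[1e-8, 2.0, 1.0],
     [0.0, 3.0, 5.0],
     [0.0, 0.0, 2.0]]

# Right-hand sides are the row sums, so the exact solutions are all ones.
b_L = [sum(row) for row in L]
b_U = [sum(row) for row in U]

y = forelm(L, b_L)
x = backsub(U, b_U)
print(y)
print(x)
```

Comparing the printed components with 1 shows how a small diagonal entry amplifies rounding error in the components that depend on it.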
Test Matrices for Problems 2 Through 8
For problems 2 through 8, use the following matrices as test matrices. When the problem is a linear system problem Ax = b, create a vector b such that the solution vector x has all components equal to 1.
1. Hilbert matrix of order 10
2. Pie matrix of order 10
3. Hankel matrix of order 10
4. Randomly generated matrix of order 10
5. \[ A = \begin{pmatrix} 0.00001 & 1 & 1 \\ 0 & 0.00001 & 1 \\ 0 & 0 & 0.00001 \end{pmatrix} \]
6. Vandermonde matrix of order 10
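A right-hand side whose solution is the vector of ones is simply the vector of row sums of A. The Python sketch below illustrates this for the Hilbert matrix of order 10, using exact rational arithmetic so that b itself carries no rounding error until it is converted to floating point. The function names hilbert and rhs_for_ones are hypothetical.

```python
from fractions import Fraction

def hilbert(n):
    """Hilbert matrix of order n with exact entries h_ij = 1 / (i + j - 1), 1-based."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]

def rhs_for_ones(A):
    """Return b = A * (1, ..., 1)^T, that is, the vector of row sums of A."""
    return [sum(row) for row in A]

H = hilbert(10)
b = rhs_for_ones(H)                 # exact rational right-hand side
b_float = [float(v) for v in b]     # what a floating-point solver would receive
print(b[0], b_float[0])
```

In MATLAB the same construction is `b = A*ones(n,1)`, with `hilb(10)` supplying the Hilbert matrix.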
2. (a) Using lugsel from MATCOM, backsub, and forelm, write the MATLAB program
[x] = linsyswp(A, b)
to solve Ax = b using Gaussian elimination without pivoting. Compute the growth factor, elapsed time, and opcount for each system.
(b) Run the program inlu from MATCOM and multiply the result by the vector b to obtain the solution vector $x = A^{-1}b$. Compute opcount.
(c) Compare the computed solutions and opcounts of (a) and (b).
3. (a) Using parpiv and elmul from Chapter 5, and backsub, write a MATLAB program
[x] = linsyspp(A, b)
to solve Ax = b using Gaussian elimination with partial pivoting. Compute the growth factor, elapsed time, and opcount for each system.
(b) Run the program inparpiv from MATCOM and multiply the result by b to compute the solution vector $x = A^{-1}b$. Compute opcount for each system.
(c) Compare the computed solutions, opcounts, and elapsed times of (a) and (b).
4. (a) Using compiv from MATCOM, elmul from Chapter 5, and backsub, write the MATLAB program
[x] = linsyscp(A, b)
to solve Ax = b using Gaussian elimination with complete pivoting. Compute opcount, elapsed time, and the growth factor for each system.
(b) Run the program incompiv from MATCOM and multiply the result by b to compute the solution vector $x = A^{-1}b$. Compute opcount for each system.
(c) Compare the computed solutions, opcounts, and elapsed times of (a) and (b).
5. (a) Implement the algorithm in Section 6.4.4 to solve Ax = b without explicit factorization using partial pivoting:
[x] = linsyswf(A, b).
(b) Compute $A^{-1}$ using this explicit factorization.
6. (a) Using housqr from Chapter 5 (or the MATLAB function qr) and backsub, write the MATLAB program
[x] = linsysqrh(A, b)
to solve Ax = b using QR factorization with Householder matrices. Compute opcount for each system.
(b) Repeat (a) with givqr in place of housqr; that is, write a MATLAB program called linsysqrg to solve Ax = b using the Givens method for QR factorization.
7. (The purpose of this exercise is to make a comparative study of different methods for solving Ax = b with respect to accuracy, elapsed time, opcount, and growth factor.)
Tabulate the results of problems 2 through 6 in the following form. Make one table for each matrix; $\hat{x}$ stands for the computed solution.
TABLE 6.1 (Comparison of Different Methods for the Linear System Problems)
Method     | Computed solution $\hat{x}$ | Rel. error $\|x - \hat{x}\| / \|x\|$ | Residual $\|b - A\hat{x}\|$ | Growth factor | Elapsed time
linsyswp   |                             |                                      |                             |               |
linsyspp   |                             |                                      |                             |               |
linsyscp   |                             |                                      |                             |               |
linsyswf   |                             |                                      |                             |               |
linsysqrh  |                             |                                      |                             |               |
linsysqrg  |                             |                                      |                             |               |
$A^{-1}b$  |                             |                                      |                             |               |
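The quantities tabulated in Table 6.1 can be computed along the following lines. This Python sketch is an illustration under assumptions, not one of the MATCOM programs: gepp_solve is a hypothetical name, and the growth factor is taken to be the largest entry in absolute value over all reduced matrices divided by the largest entry of A in absolute value, which may differ in detail from the definition used in the text. The test system is the 3 x 3 upper triangular test matrix with 0.00001 on the diagonal.

```python
import math

def gepp_solve(A, b):
    """Gaussian elimination with partial pivoting on an augmented copy of [A | b].

    Returns the computed solution and the growth factor as defined in the lead-in.
    """
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    max_orig = max(abs(v) for row in A for v in row)
    max_seen = max_orig
    for k in range(n - 1):
        # Partial pivoting: bring the largest entry of column k (rows k..n-1) to row k.
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
        reduced_max = max(abs(M[i][j]) for i in range(n) for j in range(n))
        max_seen = max(max_seen, reduced_max)
    # Back substitution on the reduced upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x, max_seen / max_orig

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

A = [[1e-5, 1.0, 1.0],
     [0.0, 1e-5, 1.0],
     [0.0, 0.0, 1e-5]]
x_exact = [1.0, 1.0, 1.0]
b = [sum(a * t for a, t in zip(row, x_exact)) for row in A]

x_hat, growth = gepp_solve(A, b)
rel_error = norm2([xe - xh for xe, xh in zip(x_exact, x_hat)]) / norm2(x_exact)
residual = norm2([bi - sum(a * t for a, t in zip(row, x_hat))
                  for row, bi in zip(A, b)])
print(rel_error, residual, growth)
```

Keeping the relative error and the residual in separate columns is useful because a small residual does not by itself imply a small error when the matrix is ill-conditioned.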
8. (a) Write a MATLAB program to find the inverse of A using housqr (or the MATLAB function qr) and invuptr (from MATCOM):
[A] = invqrh(A).
Compute opcount for each matrix.
(b) Repeat (a) using givqr and invuptr:
[A] = invqrg(A).
Compute opcount for each matrix.
(c) Run inlu, inparpiv, and incompiv from MATCOM with each of the data matrices. Make a table for each matrix A to compare the different methods for finding the inverse with respect to accuracy and opcount. Denote the computed inverse by $\hat{A}$. Get $A^{-1}$ by using the MATLAB command inv(A).
TABLE 6.2 (Comparison of Different Methods for Computing the Inverse)
Method    | Relative error $\|A^{-1} - \hat{A}\| / \|A^{-1}\|$ | Flop count
inlu      |                                                    |
inparpiv  |                                                    |
incompiv  |                                                    |
invqrh    |                                                    |
invqrg    |                                                    |
9. (a) Modify the program elmlu to find the Cholesky factorization of a symmetric positive definite matrix A using Gaussian elimination without pivoting:
[H] = cholgauss(A).
Create a 15 × 15 lower triangular matrix L with positive diagonal entries, taking some of the diagonal entries small enough to be very close to zero, multiply it by $L^T$, and take $A = LL^T$ as your test matrix. Compute H. Compute opcount.
(b) Run the MATLAB program chol on the same matrix as in (a), and denote the transpose of the result by $\hat{H}$. Compute opcount.
(c) Compare the results of (a) and (b). (Note that chol(A) gives an upper triangular matrix H such that $A = H^T H$.)
10. Run the program linsyswp with the diagonally dominant, symmetric tridiagonal, and block tridiagonal matrices encountered in Section 6.3.5, choosing the right-hand side vector b so that the solution vector x is known a priori. Compare the exact solution x with the computed solution $\hat{x}$.
(The purpose of this exercise is to verify that, to solve a symmetric positive definite system, no pivoting is needed to ensure stability in Gaussian elimination.)
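For problem 9(a), one possible route from Gaussian elimination to the Cholesky factor is sketched below in Python. It assumes the standard fact that, for a symmetric positive definite A, elimination without pivoting yields an upper triangular U with positive pivots, and that dividing row k of U by the square root of its pivot gives the transpose of H in A = H H^T. This is an illustration only and need not match how elmlu is organized; the 3 x 3 test factor L0 is hypothetical.

```python
import math

def cholgauss(A):
    """Lower triangular H with A = H H^T, via Gaussian elimination without pivoting.

    Assumes A is symmetric positive definite, so every pivot is positive.
    """
    n = len(A)
    U = [list(row) for row in A]
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    # Row k of U divided by sqrt(u_kk) is row k of H^T, i.e. column k of H.
    H = [[0.0] * n for _ in range(n)]
    for k in range(n):
        d = math.sqrt(U[k][k])
        for j in range(k, n):
            H[j][k] = U[k][j] / d
    return H

# Hypothetical test: build A = L0 * L0^T from a known lower triangular factor.
L0 = [[2.0, 0.0, 0.0],
      [1.0, 3.0, 0.0],
      [4.0, 5.0, 6.0]]
n = len(L0)
A = [[sum(L0[i][k] * L0[j][k] for k in range(n)) for j in range(n)]
     for i in range(n)]

H = cholgauss(A)
print(H)
```

Because the Cholesky factor with positive diagonal is unique, H should agree with L0 up to rounding, which is the same comparison part (c) asks for against chol.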
11. (a) Write a MATLAB program to implement Algorithm 6.7.2, which finds an upper bound of the 2-norm of the inverse of an upper triangular matrix:
[CEBOUND] = norminvtr(U).
Test your result by randomly creating a 10 × 10 upper triangular matrix with several small diagonal entries, and then compare your result with that obtained by running the MATLAB command norm(inv(U)).
(b) Now compute the condition number of U as follows:
norm(U) * norminvtr(U).
Compare your result with that obtained by running the MATLAB command cond(U). Verify that
\[
\operatorname{Cond}(U) \ge \frac{\max_i |u_{ii}|}{\min_i |u_{ii}|}.
\]
(Use the same test matrix U as in part (a).)
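One way to see why the diagonal entries give a lower bound on the condition number of a triangular matrix is sketched below. It uses only two standard facts: the eigenvalues of a triangular matrix are its diagonal entries, and the 2-norm of a matrix is at least its spectral radius $\rho$.

```latex
\[
\|U\|_2 \ge \rho(U) = \max_i |u_{ii}|, \qquad
\|U^{-1}\|_2 \ge \rho(U^{-1}) = \frac{1}{\min_i |u_{ii}|},
\]
\[
\operatorname{Cond}_2(U) = \|U\|_2 \, \|U^{-1}\|_2
\ge \frac{\max_i |u_{ii}|}{\min_i |u_{ii}|}.
\]
```

The bound can be far from sharp, so a triangular matrix whose diagonal entries are all of similar size may still be badly conditioned.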
12. (a) (The purpose of this exercise is to compare different approaches for estimating the condition number of a matrix.) Compute and/or estimate the condition number of each of the following matrices A of order 10: Hilbert, Pie, randomly generated, Vandermonde, and Hankel, using the following approaches:
i. Find the QR factorization of A with column pivoting: $Q^T A P = R$. Estimate the 2-norm of the inverse of R by running norminvtr on R. Now compute norm(R) * norminvtr(R). Compute opcount.
ii. Compute norm(A) * norm(inv(A)). Compute opcount.
iii. Compute cond(A). Compute opcount.
(b) Now compare the results and opcounts.
13. (a) Run the iterative refinement program iterref from MATCOM on each of the 15 × 15 systems: Hilbert, Pie, Vandermonde, randomly generated, and Hankel, using the solution obtained from the program linsyspp (problem 3) as the initial approximation $x^{(0)}$.
(b) Estimate the condition number of each of the matrices above using the iterative refinement procedure.
(c) Compare your results on condition number estimation with those obtained in problem 12.
14. Run the programs jacobi, gaused, and sucov from MATCOM on the 6 × 6 matrix A of Example 6.10.5 with the same starting vector $x^{(0)} = (0, 0, \ldots, 0)^T$. Find how many iterations each method takes to converge. Verify the statement of the example that SOR with $\omega_{opt}$ takes five iterations to converge, compared to twelve iterations for Jacobi.
15. Run the programs jacobi, gaused, and sucov from MATCOM on Example 6.10.3 and verify the statement of the example.
16. Run the program nichol from MATCOM, which implements the "No-Fill Incomplete Cholesky Factorization", on the tridiagonal symmetric positive definite matrix T of order 20 arising in the discretization of Poisson's equation. Compare your result with that obtained by running chol(T).
17. Write a MATLAB program called arnoldi based on the Arnoldi method (Algorithm 6.10.8) using the modified Gram-Schmidt algorithm. (Modified Gram-Schmidt is implemented in the MATCOM program mdgrsch; see Chapter 7.)
18. Using arnoldi and a suitable least-squares routine from Chapter 7, write a MATLAB program called gmres to implement the GMRES algorithm (Algorithm 6.10.9).
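As a starting point for problem 17, the Arnoldi process with modified Gram-Schmidt orthogonalization can be sketched in Python as follows. This is a generic textbook form written for illustration; it is not taken from Algorithm 6.10.8, whose details may differ, and the 4 x 4 tridiagonal test matrix and starting vector are hypothetical. The function returns the orthonormal basis vectors and the (m+1) x m upper Hessenberg matrix.

```python
import math

def arnoldi(A, v, m):
    """Run m steps of Arnoldi with modified Gram-Schmidt.

    Returns Q (list of basis vectors, m + 1 of them unless breakdown occurs)
    and H (an (m + 1) x m upper Hessenberg matrix stored as a list of rows).
    """
    n = len(v)
    beta = math.sqrt(sum(t * t for t in v))
    Q = [[t / beta for t in v]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = [sum(A[i][k] * Q[j][k] for k in range(n)) for i in range(n)]
        # Modified Gram-Schmidt: orthogonalize against each q_i in turn,
        # always using the updated w.
        for i in range(j + 1):
            H[i][j] = sum(a * b for a, b in zip(Q[i], w))
            w = [wk - H[i][j] * qk for wk, qk in zip(w, Q[i])]
        H[j + 1][j] = math.sqrt(sum(t * t for t in w))
        if H[j + 1][j] == 0.0:
            break   # breakdown: the Krylov subspace is invariant
        Q.append([t / H[j + 1][j] for t in w])
    return Q, H

# Hypothetical test data.
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
Q, H = arnoldi(A, [1.0, 0.0, 0.0, 0.0], 3)
print(H)
```

For a symmetric A the computed H is tridiagonal up to rounding, which gives a quick consistency check on an implementation.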