Mate l a 4-110) in Action Harry Dym Graduate Studies in Mathematics Volume 78 js American Mathematical Society Linear Algebra in Action Harry Dym Graduate Studies in Mathematics Volume 78 s American Mathematical Society “Providence, Rhode Island Editorial Board David Cox Walter Craig Nikolai Ivanov Steven G. Krantz David Saltman (Chair) 2000 Mathematics Subject Classification. Primary 15-01, 30-01, 34-01, 39-01, 52-01, 93-01. For additional information and updates on this book, visit www.ams.org/bookpages/gsm-78 Library of Congress Cataloging-in-Publication Data Dym, H. (Harry), 1938-. Linear algebra in action / Harry Dym. p. cm. — (Graduate studies in mathematics, ISSN 1065-7339 ; v. 78) Includes bibliographical references and index. ISBN-13: 978-0-8218-3813-6 (alk. paper) ISBN-10: 0-8218-3813-X (alk. paper) 1. Algebras, Linear. I. Title. QA184.2.D96 2006 512/.5—de22 2006049906 Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy a chapter for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given. Republication, systematic copying, or multiple reproduction of any material in this publication is permitted only under license from the American Mathematical Society. Requests for such permission should be addressed to the Acquisitions Department, American Mathematical Society, 201 Charles Street, Providence, Rhode Island 02904-2294, USA. Requests can also be made by e-mail to reprint-permissionCams. org. © 2007 by the American Mathematical Society. All rights reserved. ‘The American Mathematical Society retains all rights except those granted to the United States Government. Printed in the United States of America. © The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability. Visit the AMS home page at http: //www.ams.org/ 10987654321 121110090807 Dedicated to the memory of our oldest son Jonathan Carroll Dym and our first granddaughter Avital Chana Dym, who were recalled prematurely for no apparent reason, he but 44 and she but 12. Yhi zichram baruch Contents Preface Chapter 1. §l. §1.2. §13. gd. §1.5. §1.6. §1.7. §1.8. Vector spaces Preview The abstract definition of a vector space Some definitions Mappings Triangular matrices Block triangular matrices Schur complements Other matrix products Chapter 2. Gaussian elimination §2.1. §2.2. §2.3. §2.4. §2.5. §2.6. §2.7. §2.8. Some preliminary observations Examples Upper echelon matrices The conservation of dimension Quotient spaces Conservation of dimension for matrices From U to A Square matrices Chapter 3. Additional applications of Gaussian elimination §3.1. Gaussian elimination redux xv anreer 13 16 17 19 21 22 24 30 36 38 38 40 41 45 45 vi Contents §3.2. §3.3. §3.4. §3.5. §3.6. §3.7. §3.8. Chapter 4. §4.1. §4.2. §4.3. §4.4. §4.5. §4.6. §4.7. §4.8. §4.9. §4.10. §4.11. §4.12. §4.13. §4.14. §4.15. Chapter 5. §5.1. §5.2. §5.3. §5.4. §5.5. §5.6. §5.7. $5.8. §5.9. §5.10. 
Properties of BA and AC Extracting a basis Computing the coefficients in a basis The Gauss-Seidel method Block Gaussian elimination {0, 1, co} Review Eigenvalues and eigenvectors Change of basis and similarity Invariant subspaces Existence of eigenvalues Eigenvalues for matrices Direct sums Diagonalizable matrices An algorithm for diagonalizing matrices Computing eigenvalues at this point Not all matrices are diagonalizable The Jordan decomposition theorem An instructive example The binomial formula More direct sum decompositions Verification of Theorem 4.12 Bibliographical notes Determinants Functionals Determinants Useful rules for calculating determinants Eigenvalues Exploiting block structure The Binet-Cauchy formula Minors Uses of determinants Companion matrices Circulants and Vandermonde matrices 48 50 51 52 55 56 57 61 62 64 64 66 69 7 73 74 76 78 79 82 82 84 87 89 89 90 93 97 99 102 104 108 108 109 Contents Chapter 6. Calculating Jordan forms §6.1. §6.2. §6.3. §6.4. §6.5. §6.6. §6.7. §6.8. §6.9. Overview Structure of the nullspaces Np; Chains and cells Computing J An algorithm for U An example Another example Jordan decompositions for real matrices Companion and generalized Vandermonde matrices Chapter 7. Normed linear spaces §7.1. | §7.3. §7.4. §7.5. §7.6. §7.7. §7.8. §7.9. §7.10. §7.11. Four inequalities Normed linear spaces Equivalence of norms Norms of linear transformations Multiplicative norms Evaluating some operator norms Small perturbations Another estimate Bounded linear functionals Extensions of bounded linear functionals Banach spaces Chapter 8. Inner product spaces and orthogonality §8.1. §8.2. §8.3. §8.4. §8.5. §8.6. §8.7. §8.8. §8.9. §8.10. §8.11. Inner product spaces A characterization of inner product spaces Orthogonality Gram matrices Adjoints The Riesz representation theorem Normal, selfadjoint and unitary transformations Projections and direct sum decompositions Orthogonal projections Orthogonal expansions The Gram-Schmidt method vii 111 112 112 115 116 7 120 122 126 128 133 183 138 140 142 143 145 147 149 150 152 155 157 157 160 161 163 163 166 168 170 172 174 77 Contents viii §8.12. Toeplitz and Hankel matrices 178 §8.13. Gaussian quadrature 180 §8.14. Bibliographical notes 183 Chapter 9. Symmetric, Hermitian and normal matrices 185 §9.1. Hermitian matrices are diagonalizable 186 §9.2. Commuting Hermitian matrices 188 §9.3. Real Hermitian matrices 190 §9.4. Projections and direct sums in F" 191 §9.5. Projections and rank 195 §9.6. Normal matrices 195 89.7. Schur’s theorem 198 §9.8. QR factorization 201 §9.9. Areas, volumes and determinants 202 §9.10. Bibliographical notes 206 Chapter 10. Singular values and related inequalities 207 §10.1. Singular value decompositions 207 §10.2. Complex symmetric matrices 212 §10.3. | Approximate solutions of linear equations 213 §10.4. The Courant-Fischer theorem 215 §10.5. Inequalities for singular values 218 §10.6. Bibliographical notes 225 Chapter 11. Pseudoinverses 227 §11.1. Pseudoinverses 227 §11.2. The Moore-Penrose inverse 234 §11.3. | Best approximation in terms of Moore-Penrose inverses 237 Chapter 12. Triangular factorization and positive definite matrices 239 §12.1. A detour on triangular factorization 240 §12.2. Definite and semidefinite matrices 242 §12.3. Characterizations of positive definite matrices 244 §12.4. An application of factorization 247 §12.5. Positive definite Toeplitz matrices 248 §12.6. | Detour on block Toeplitz matrices 254 §12.7. A maximum entropy matrix completion problem 258 §12.8. 
Schur complements for semidefinite matrices 262 Contents ix §12.9. Square roots 265 §12.10. Polar forms 267 §12.11. Matrix inequalities 268 §12.12. A minimal norm completion problem 271 §12.13. A description of all solutions to the minimal norm completion problem 273 §12.14. Bibliographical notes 274 Chapter 13. Difference equations and differential equations 275 §13.1. Systems of difference equations 276 §13.2. The exponential e'4 277 §13.3. Systems of differential equations 279 §13.4. | Uniqueness 281 §13.5. Isometric and isospectral flows 282 §13.6. Second-order differential systems 283 §13.7. Stability 284 §13.8. | Nonhomogeneous differential systems 285 §13.9. Strategy for equations 285 §13.10. Second-order difference equations 286 §13.11. Higher order difference equations 289 §13.12. Ordinary differential equations 290 §13.13. Wronskians 293 §13.14. Variation of parameters 295 Chapter 14. Vector valued functions 297 §14.1. Mean value theorems 298 §14.2. Taylor’s formula with remainder 299 §14.3. Application of Taylor’s formula with remainder 300 §14.4. Mean value theorem for functions of several variables 301 §14.5. Mean value theorems for vector valued functions of several variables 301 §14.6. Newton’s method 304 §14.7. A contractive fixed point theorem 306 §14.8. A refined contractive fixed point theorem 308 §14.9. Spectral radius 309 §14.10. The Brouwer fixed point theorem 313 §14.11. Bibliographical notes 316 Contents Chapter 15. The implicit function theorem §15.1. §15.2. §15.3. §15.4. §15.5. §15.6. §15.7. §15.8. §15.9. §15.10. §15.11. Preliminary discussion The main theorem A generalization of the implicit function theorem Continuous dependence of solutions The inverse function theorem Roots of polynomials An instructive example A more sophisticated approach Dynamical systems Lyapunov functions Bibliographical notes Chapter 16. Extremal problems §16.1. §16.2. §16.3. §16.4. §16.5. §16.6. §16.7. Classical extremal problems Extremal problems with constraints Examples Krylov subspaces The conjugate gradient method Dual extremal problems Bibliographical notes Chapter 17. Matrix valued holomorphic functions §17.1. §17.2. §17.3. §17.4. §17.5. §17.6. | §17.8. §17.9. Differentiation Contour integration Evaluating integrals by contour integration A short detour on Fourier analysis Contour integrals of matrix valued functions Continuous dependence of the eigenvalues More on small perturbations Spectral radius redux Fractional powers Chapter 18. Matrix equations §18.1. §18.2. §18.3. The equation X - AXB=C The Sylvester equation AX -XB=C Special classes of solutions 317 317 319 324 326 327 329 329 333 335 336 337 337 341 344 349 349 354 356 357 357 361 365 370 372 375 377 378 381 383 383 385 388 Contents xi §18.4. Riccati equations 390 §18.5. Two lemmas 396 §18.6. An LQR problem 398 §18.7. Bibliographical notes 400 Chapter 19. Realization theory 401 §19.1. Minimal realizations 408 §19.2. Stabilizable and detectable realizations 415 §19.3. Reproducing kernel Hilbert spaces 416 §19.4. de Branges spaces 418 §19.5. Ra invariance 420 §19.6. Factorization of @(A) 421 §19.7. Bibliographical notes 425 Chapter 20. Eigenvalue location problems 427 §20.1. Interlacing 427 §20.2. Sylvester’s law of inertia 430 §20.3. | Congruence 431 §20.4. Counting positive and negative eigenvalues 433 §20.5. Exploiting continuity 437 §20.6. GerSgorin disks 438 §20.7. The spectral mapping principle 439 §20.8. AX=XB 440 §20.9. Inertia theorems 441 §20.10. An eigenvalue assignment problem 443 §20.11. 
Bibliographical notes 446 Chapter 21. Zero location problems 447 §21.1. Bezoutians 447 §21.2. A derivation of the formula for Hy based on realization 452 §21.3. The Barnett identity 453 §21.4. The main theorem on Bezoutians 455 §21.5. Resultants 457 §21.6. Other directions 461 §21.7. Bezoutians for real polynomials 463 §21.8. Stable polynomials 464 §21.9. Kharitonov’s theorem 466 xii Contents §21.10. Bibliographical notes Chapter 22. Convexity §22.1. §22.2. §22.3. §22.4. §22.5. §22.6. §22.7. §22.8. §22.9. §22.10. §22.11. §22.12. §22.13, §22.14. §22.15. Preliminaries Convex functions Convex sets in R” Separation theorems in R” Hyperplanes Support hyperplanes Convex hulls Extreme points Brouwer’s theorem for compact convex sets The Minkowski functional The Gauss-Lucas theorem The numerical range Eigenvalues versus numerical range The Heinz inequality Bibliographical notes Chapter 23. Matrices with nonnegative entries §23.1. §23.2. §23.3, §23.4, §23.5. §23.6. Perron-Frobenius theory Stochastic matrices Doubly stochastic matrices An inequality of Ky Fan The Schur-Horn convexity theorem Bibliographical notes Appendix A. Some facts from analysis §A.1. §A.2. §A.3. §A4. §A.5. §A.6. §A7. Convergence of sequences of points Convergence of sequences of functions Convergence of sums Sups and infs Topology Compact sets Normed linear spaces Appendix B. More complex variables SB.1. Power series 467 469 469 471 473 475 ATT 479 480 482 485 485 488 489 491 492 494 495 496 503 504 507 509 513 515 515 516 516 517 518 518 518 521 521 Contents xiii §B.2. Isolated zeros 523 §B.3. The maximum modulus principle 525 §B.4. In(1— A) when |A| <1 525 §B.5. Rouché’s theorem 526 §B.6. Liouville’s theorem 528 §B.7. Laurent expansions 528 §B.8. Partial fraction expansions 529 Bibliography 531 Notation Index 535 Subject Index 537 Preface A foolish consistency is the hobgoblin of little minds,... Ralph Waldo Emerson, Self Reliance This book is based largely on courses that I have taught at the Fein- berg Graduate School of the Weizmann Institute of Science over the past 35 years to graduate students with widely varying levels of mathematical sophistication and interests. The objective of a number of these courses was to present a user-friendly introduction to linear algebra and its many ap- plications. Over the years I wrote and rewrote (and then, more often than not, rewrote some more) assorted sets of notes and learned many interesting things en route. This book is the current end product of that process. The emphasis is on developing a comfortable familiarity with the material. Many lemmas and theorems are made plausible by discussing an example that is chosen to make the underlying ideas transparent in lieu of a formal proof; ile., I have tried to present the material in the way that most of the mathe- maticians that I know work rather than in the way they write. The coverage is not intended to be exhaustive (or exhausting), but rather to indicate the rich terrain that is part of the domain of linear algebra and to present a decent sample of some of the tools of the trade of a working analyst that I have absorbed and have found useful and interesting in more than 40 years in the business. To put it another way, I wish someone had taught me this material when I was a graduate student. In those days, in the arrogance of youth, I thought that linear algebra was for boys and girls and that real men and women worked in functional analysis. However, this is but one of many opinions that did not stand the test of time. 
In my opinion, the material in this book can be (and has been) used on many levels. A core course in classical linear algebra topics can be based on the first six chapters, plus selected topics from Chapters 7-9 and 13. The latter treats difference equations, differential equations and systems thereof. Chapters 14-16 cover applications to vector calculus, including a proof of the implicit function theorem based on the contractive fixed point theorem, and extremal problems with constraints. Subsequent chapters deal with matrix valued holomorphic functions, matrix equations, realization theory, eigenvalue location problems, zero location problems, convexity, and matrices with nonnegative entries. I have taken the liberty of straying into areas that I consider significant, even though they are not usually viewed as part of the package associated with linear algebra. Thus, for example, I have added short sections on complex function theory, Fourier analysis, Lyapunov functions for dynamical systems, boundary value problems and more. A number of the applications are taken from control theory.

I have adapted material from many sources. But the one which was most significant for at least the starting point of a number of topics covered in this work is the wonderful book [45] by Lancaster and Tismenetsky.

A number of students read and commented on substantial sections of assorted drafts: Boris Ettinger, Ariel Ginis, Royi Lachmi, Mark Kozdoba, Evgeny Muzikantov, Simcha Rimler, Jonathan Ronen, Idith Segev and Amit Weinberg. I thank them all, and extend my appreciation to two senior readers, Aad Dijksma and Andrei Iacob, for their helpful, insightful remarks. A special note of thanks goes to Deborah Smith, my copy editor at AMS, for her sharp eye and expertise in the world of commas and semicolons. On the production side, I thank Jason Friedman for typing an early version, and our secretaries Diana Mandelik, Ruby Musrie, Linda Alman and Terry Debesh, all of whom typed selections, and Diana again for preparing all the figures and clarifying numerous mysterious intricacies of LaTeX. I also thank Barbara Beeton of AMS for helpful advice on AMS-LaTeX.

One of the difficulties in preparing a manuscript for a book is knowing when to let go. It is always possible to write it better.¹ Fortunately AMS maintains a web page, http://www.ams.org/bookpages/gsm-78, for sins of omission and commission (or just plain afterthoughts).

TAM, ACH TEREM NISHLAM, ....

October 18, 2006
Rehovot, Israel

¹ Israel Gohberg tells of a conversation with Lev Sakhnovich that took place in Odessa many years ago: Lev: Israel, how is your book with Mark Grigorievich (Krein) progressing? Israel: It's about 85% done. Lev: That's great! Why so sad? Israel: If you had asked me yesterday, I would have said 95%.

Chapter 1

Vector spaces

The road to wisdom? Well it's plain
and simple to express.
Err and err and err again,
but less and less and less.

Cited in [43]

1.1. Preview

One of the fundamental issues that we shall be concerned with is the solution of linear equations of the form

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1q}x_q &= b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2q}x_q &= b_2\\
&\;\;\vdots\\
a_{p1}x_1 + a_{p2}x_2 + \cdots + a_{pq}x_q &= b_p\,,
\end{aligned}
$$

where the $a_{ij}$ and the $b_i$ are given numbers (either real or complex) for $i = 1,\ldots,p$ and $j = 1,\ldots,q$, and we are looking for the $x_j$ for $j = 1,\ldots,q$. Such a system of equations is equivalent to the matrix equation

$$Ax = b, \quad\text{where}\quad A = \begin{bmatrix} a_{11} & \cdots & a_{1q}\\ \vdots & & \vdots\\ a_{p1} & \cdots & a_{pq}\end{bmatrix},\quad x = \begin{bmatrix} x_1\\ \vdots\\ x_q\end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} b_1\\ \vdots\\ b_p\end{bmatrix}.$$
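To make the correspondence between the system of scalar equations and the matrix equation concrete, here is a small numerical sketch; the particular 2 × 2 system and its entries are my own illustration, not an example from the text.

```python
import numpy as np

# A hypothetical 2 x 2 system:
#   1*x1 + 2*x2 = 5
#   3*x1 + 4*x2 = 6
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])      # the coefficients a_ij
b = np.array([5.0, 6.0])        # the right-hand sides b_i

# Solving the two scalar equations simultaneously is exactly solving Ax = b.
x = np.linalg.solve(A, b)
print(x)                        # [-4.   4.5]
print(np.allclose(A @ x, b))    # True: substituting x back reproduces b
```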
• RC Cola: The term $a_{ij}$ in the matrix $A$ sits in the $i$'th row and the $j$'th column of the matrix; i.e., the first index stands for the number of the row and the second for the number of the column. The order is RC, as in the popular drink by that name.

Given $A$ and $b$, the basic questions are:

1. When does there exist at least one solution $x$?
2. When does there exist at most one solution $x$?
3. How to calculate the solutions, when they exist?
4. How to find approximate solutions?

The answers to these questions are part and parcel of the theory of vector spaces.

1.2. The abstract definition of a vector space

This subsection is devoted to the abstract definition of a vector space. Even though the emphasis in this course is definitely computational, it seems advisable to start with a few abstract definitions which will be useful in future situations as well as in the present.

A vector space V over the real numbers is a nonempty collection of objects called vectors, together with an operation called vector addition, which assigns a new vector $u + v$ in V to every pair of vectors $u$ in V and $v$ in V, and an operation called scalar multiplication, which assigns a vector $\alpha v$ in V to every real number $\alpha$ and every vector $v$ in V, such that the following hold:

1. For every pair of vectors $u$ and $v$, $u + v = v + u$; i.e., vector addition is commutative.
2. For any three vectors $u$, $v$ and $w$, $u + (v + w) = (u + v) + w$; i.e., vector addition is associative.
3. There is a zero vector (or, in other terminology, additive identity) $0 \in V$ such that $0 + v = v + 0 = v$ for every vector $v$ in V.
4. For every vector $v$ there is a vector $w$ (an additive inverse of $v$) such that $v + w = 0$.
5. For every vector $v$, $1v = v$.
6. For every pair of real numbers $\alpha$ and $\beta$ and every vector $v$, $\alpha(\beta v) = (\alpha\beta)v$.
7. For every pair of real numbers $\alpha$ and $\beta$ and every vector $v$, $(\alpha + \beta)v = \alpha v + \beta v$.
8. For every real number $\alpha$ and every pair of vectors $u$ and $v$, $\alpha(u + v) = \alpha u + \alpha v$.

Because of Item 2, we can write $u + v + w$ without brackets; similarly, because of Item 6 we can write $\alpha\beta v$ without brackets. It is also easily checked that there is exactly one zero vector $0 \in V$: if $0'$ is a second zero vector, then $0' = 0' + 0 = 0$. A similar argument shows that each vector $v \in V$ has exactly one additive inverse, $-v = (-1)v$ in V. Correspondingly, we write $u + (-v) = u - v$.

From now on we shall use the symbol R to designate the real numbers, the symbol C to designate the complex numbers and the symbol F when the statement in question is valid for both R and C and there is no need to specify. Numbers in F are often referred to as scalars. A vector space V over C is defined in exactly the same way as a vector space V over R except that the numbers $\alpha$ and $\beta$ which appear in the definition above are allowed to be complex.

Exercise 1.1. Show that if V is a vector space over C, then $0v = 0$ for every vector $v \in V$.

Exercise 1.2. Let V be a vector space over F. Show that if $\alpha, \beta \in F$ and if $v$ is a nonzero vector in V, then $\alpha v = \beta v \iff \alpha = \beta$. [HINT: $\alpha - \beta \neq 0 \Longrightarrow v = (\alpha - \beta)^{-1}(\alpha v - \beta v)$.]

Example 1.1. The set of column vectors

$$F^p = \left\{ \begin{bmatrix} x_1\\ \vdots\\ x_p \end{bmatrix} : x_i \in F,\ i = 1,\ldots,p \right\}$$

of height $p$ with entries $x_i \in F$ that are subject to the natural rules of vector addition

$$\begin{bmatrix} x_1\\ \vdots\\ x_p \end{bmatrix} + \begin{bmatrix} y_1\\ \vdots\\ y_p \end{bmatrix} = \begin{bmatrix} x_1 + y_1\\ \vdots\\ x_p + y_p \end{bmatrix}$$

and multiplication

$$\alpha \begin{bmatrix} x_1\\ \vdots\\ x_p \end{bmatrix} = \begin{bmatrix} \alpha x_1\\ \vdots\\ \alpha x_p \end{bmatrix}$$

of the vector $x$ by a number $\alpha \in F$ is the most basic example of a vector space. Note the difference between the number $0$ and the vector $0 \in F^p$. The latter is a column vector of height $p$ with all $p$ entries equal to the number zero.
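A brief numerical sketch of Example 1.1 with F = R and p = 3; the particular vectors and scalars below are my own choices, used only to spot-check a few of the axioms for the componentwise operations.

```python
import numpy as np

# Two vectors in R^3 and two real scalars (arbitrary choices).
u = np.array([1.0, 2.0, 1.0])
v = np.array([0.0, -1.0, 3.0])
alpha, beta = 2.0, -0.5

zero = np.zeros(3)   # the zero vector 0 in R^3 (not the number 0)

print(np.array_equal(u + v, v + u))                           # axiom 1: u + v = v + u
print(np.array_equal(u + zero, u))                            # axiom 3: 0 + u = u
print(np.array_equal(u + (-1.0) * u, zero))                   # axiom 4: (-1)u is an additive inverse of u
print(np.allclose(alpha * (beta * v), (alpha * beta) * v))    # axiom 6: a(bv) = (ab)v
print(np.allclose((alpha + beta) * u, alpha * u + beta * u))  # axiom 7: (a+b)u = au + bu
```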
The set $F^{p \times q}$ of p × q matrices with entries in F is a vector space with respect to the rules of vector addition

$$\begin{bmatrix} x_{11} & \cdots & x_{1q}\\ \vdots & & \vdots\\ x_{p1} & \cdots & x_{pq}\end{bmatrix} + \begin{bmatrix} y_{11} & \cdots & y_{1q}\\ \vdots & & \vdots\\ y_{p1} & \cdots & y_{pq}\end{bmatrix} = \begin{bmatrix} x_{11}+y_{11} & \cdots & x_{1q}+y_{1q}\\ \vdots & & \vdots\\ x_{p1}+y_{p1} & \cdots & x_{pq}+y_{pq}\end{bmatrix}$$

and multiplication by a scalar $\alpha \in F$:

$$\alpha \begin{bmatrix} x_{11} & \cdots & x_{1q}\\ \vdots & & \vdots\\ x_{p1} & \cdots & x_{pq}\end{bmatrix} = \begin{bmatrix} \alpha x_{11} & \cdots & \alpha x_{1q}\\ \vdots & & \vdots\\ \alpha x_{p1} & \cdots & \alpha x_{pq}\end{bmatrix}.$$

Notice that the vector space $F^p$ dealt with a little earlier coincides with the vector space that is designated $F^{p \times 1}$ in the current example.

Exercise 1.3. Show that the space $R^3$ endowed with the rule

$$x \oplus y = \begin{bmatrix} \max(x_1, y_1)\\ \max(x_2, y_2)\\ \max(x_3, y_3)\end{bmatrix}$$

for vector addition and the usual rule for scalar multiplication is not a vector space over R. [HINT: Show that this "addition" rule does not admit a zero element; i.e., there is no vector $a \in R^3$ such that $a \oplus x = x \oplus a = x$ for every $x \in R^3$.]

Exercise 1.4. Let $C \subset R^3$ denote the set of vectors $a = \begin{bmatrix} a_1\\ a_2\\ a_3\end{bmatrix}$ such that the polynomial $a_1 + a_2 t + a_3 t^2 \geq 0$ for every $t \in R$. Show that C is closed under vector addition (i.e., $a, b \in C \Longrightarrow a + b \in C$) and under multiplication by positive numbers (i.e., $a \in C$ and $\alpha > 0 \Longrightarrow \alpha a \in C$), but that C is not a vector space over R. [REMARK: A set C with the indicated two properties is called a cone.]

Exercise 1.5. Show that for each positive integer n, the space of polynomials $p(\lambda) = \sum_{j=0}^{n} a_j \lambda^j$ of degree n with coefficients $a_j \in C$ is a vector space over C under the natural rules of addition and scalar multiplication. [REMARK: You may assume that $\sum_{j=0}^{n} a_j \lambda^j = 0$ for every $\lambda \in C$ if and only if $a_0 = a_1 = \cdots = a_n = 0$.]

Exercise 1.6. Let 𝓕 denote the set of continuous real-valued functions $f(x)$ on the interval $0 \leq x \leq 1$. Show that 𝓕 is a vector space over R with respect to the natural rules of vector addition ($(f_1 + f_2)(x) = f_1(x) + f_2(x)$) and scalar multiplication ($(\alpha f)(x) = \alpha f(x)$).

1.3. Some definitions

• Subspaces: A subspace M of a vector space V over F is a nonempty subset of V that is closed under vector addition and scalar multiplication. In other words, if x and y belong to M, then $x + y \in M$ and $\alpha x \in M$ for every scalar $\alpha \in F$. A subspace of a vector space is automatically a vector space in its own right.

Exercise 1.7. Let 𝓕₀ denote the set of continuous real-valued functions $f(x)$ on the interval $0 \leq x \leq 1$ that meet the auxiliary constraints $f(0) = 0$ and $f(1) = 0$. Show that 𝓕₀ is a vector space over R with respect to the natural rules of vector addition and scalar multiplication that were introduced in Exercise 1.6 and that 𝓕₀ is a subspace of the vector space 𝓕 that was considered there.

Exercise 1.8. Let 𝓕₁ denote the set of continuous real-valued functions $f(x)$ on the interval $0 \leq x \leq 1$ that meet the auxiliary constraints $f(0) = 0$ and $f(1) = 1$. Show that 𝓕₁ is not a vector space over R with respect to the natural rules of vector addition and scalar multiplication that were introduced in Exercise 1.6.

• Span: If $v_1, \ldots, v_k$ is a given set of vectors in a vector space V over F, then

$$\operatorname{span}\{v_1, \ldots, v_k\} = \left\{ \sum_{j=1}^{k} \alpha_j v_j : \alpha_1, \ldots, \alpha_k \in F \right\}.$$

In words, the span is the set of all linear combinations $\alpha_1 v_1 + \cdots + \alpha_k v_k$ of the indicated set of vectors, with coefficients $\alpha_1, \ldots, \alpha_k$ in F. It is important to keep in mind that $\operatorname{span}\{v_1, \ldots, v_k\}$ may be small in some sense. In fact, $\operatorname{span}\{v_1, \ldots, v_k\}$ is the smallest vector space that contains the vectors $v_1, \ldots, v_k$. The number of vectors k that were used to define the span is not a good indicator of the size of this space. Thus, for example, if

$$v_1 = \begin{bmatrix} 1\\ 2\\ 1\end{bmatrix},\quad v_2 = \begin{bmatrix} 2\\ 4\\ 2\end{bmatrix} \quad\text{and}\quad v_3 = \begin{bmatrix} 3\\ 6\\ 3\end{bmatrix},$$

then $\operatorname{span}\{v_1, v_2, v_3\} = \operatorname{span}\{v_1\}$.
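The span example above can also be checked numerically. In the sketch below, the rank of the matrix whose columns are $v_1, v_2, v_3$ is used as a stand-in for the dimension of the span (a notion made precise in the next few paragraphs); a rank of 1 confirms that the span equals $\operatorname{span}\{v_1\}$.

```python
import numpy as np

v1 = np.array([1.0, 2.0, 1.0])
v2 = np.array([2.0, 4.0, 2.0])
v3 = np.array([3.0, 6.0, 3.0])

# v2 and v3 are multiples of v1, so they add nothing to the span.
print(np.array_equal(v2, 2 * v1), np.array_equal(v3, 3 * v1))   # True True

# Stack the vectors as the columns of a 3 x 3 matrix; its rank is the
# dimension of span{v1, v2, v3}.  Here the rank is 1.
V = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(V))                                  # 1
```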
To clarify the notion of the size of the span we need the concept of linear dependence. e Linear dependence: A set of vectors vi,... , Vg in a vector space V over F is said to be linearly dependent over F if there exists a 6 1. Vector spaces set of scalars a1,... ,@% € F , not all of which are zero, such that avi +++: +aRvy, = 0. Notice that this permits you to express one or more of the given vectors in terms of the others. Thus, if a; # 0, then a2 Ok Vi = ——v2 — ++ — VE ay ay and hence span{vi,... , Vk} =span{v2,...,Vve}- Further reductions are possible if the vectors v2,... , Vx are still lin- early dependent. e Linear independence: A set of vectors vi,... , Vk in a vector space V over F is said to be linearly independent over F if the only scalars ay,...,@% € F for which Ovi +++-+aKVve, =0 are a] =... =a, = 0. This is just another way of saying that you cannot express one of these vectors in terms of the others. Moreover, if {vi,..., vx} is a set of linearly independent vectors in a vector space V over F and if (1.1) V=aQivt+---+agve and v=fivi +---+ GevE for some choice of constants a1,... ,@%,51,... , 8% € F, then a; = f; forj—l ke Exercise 1.9. Verify the last assertion; i.e., if (1.1) holds for a linearly independent set of vectors, {vi,...,vz}, then a; = §; for j = 1,... ,k. Show by example that this conclusion is false if the given set of k vectors is not linearly independent. e Basis: A set of vectors v1,... , vg is said to form a basis for a vector space V over F if (1) span{vi,..., vk} = V. (2) The vectors v;,... , vj, are linearly independent. Both of these conditions are essential. The first guarantees that the given set of k vectors is big enough to express every vector v € Vasa linear combination of v1,... , v4}; the second that you cannot achieve this with less than k vectors. A nontrivial vector space V has many bases. However, the number of elements in each basis for V is exactly the same and is referred to as the dimension of V and will be denoted dimV. A proof of this statement will be furnished later. The next example should make it plausible. 1.3. Some definitions 7 Example 1.2. It is readily checked that the vectors Blk form a basis for the vector space F 3 over the field F. It is also not hard to show that no smaller set of vectors will do. (Thus, dimF? = 3, and, of course, dimF* = k for every positive integer k.) In a similar vein, the p x q matrices Ej;,i =1,... ,p,j =1,...,4, that are defined by setting every entry in Ej; equal to zero except for the ij entry, which is set equal to one, form a basis for the vector space F?*9, Matrix multiplication: Let A = [a;;] be apxq matrix and B = [b.t] be aqxr matrix. Then the product AB is the p x r matrix C = [ci] with entries q cre = D> anjbje, kot pe 1 j=l Notice that cze is the matrix product of the the k’th row 4; of A with the @’th column by of B: be Cee = Abe = [ax +++ Akeq] bge Thus, for example, if 23 4 ug A= and B=|/10-1 1], 21 0 Ol 2 then 4 7 10 2 AB = : 34 5 9 Moreover, if A € F?*? and x € F4, then y = Ax is the vector in F? with components y; = Dia a2; fori =1,... ,p. e Identity matrix: We shall use the symbol J, to denote the n x n matrix A = [aij], i,j = 1,... ,n, with aj = 1 for i =1,...,n and aij = 0 for i ¢ j. Thus, 10 I3=|0 1 0 So 0 oO}. 1 8 1. Vector spaces The matrix J, is referred to as the n x n identity matrix, or just the identity matrix if the size is clear from the context. The name stems from the fact that J,x = x for every vector x € F". 
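The entrywise product rule $c_{k\ell} = \sum_j a_{kj} b_{j\ell}$ and its row-times-column reading are easy to check numerically. The small matrices below are my own choices; they are not the ones displayed in the text's example.

```python
import numpy as np

# A is 2 x 3 and B is 3 x 4 (my own entries), so AB is 2 x 4.
A = np.array([[1.0, 2.0, 0.0],
              [3.0, 1.0, 4.0]])
B = np.array([[1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 2.0],
              [1.0, 1.0, 0.0, 3.0]])

C = A @ B                               # the matrix product AB

# Each entry c_{kl} is the k'th row of A times the l'th column of B.
k, l = 1, 3                             # 0-based indices: row 2 of A, column 4 of B
print(np.isclose(C[k, l], A[k, :] @ B[:, l]))                          # True
print(np.isclose(C[k, l], sum(A[k, j] * B[j, l] for j in range(3))))   # True

# The identity matrix acts as its name suggests: I_n x = x.
x = np.array([5.0, -1.0, 2.0, 7.0])
print(np.array_equal(np.eye(4) @ x, x))                                # True
```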
« Zero matrix: We shall use the symbol Opxq for the matrix in F?*4 all of whose entries are equal to zero. The subscript p x q will be dropped if the size is clear from the context. The definition of matrix multiplication is such that: e Matrix multiplication is not commutative, i.e., even if A and B are both p x p matrices, in general AB # BA. In fact, if p > 1, then one can find A and B such that AB = Opxp, but BA # Opxp- Exercise 1.10. Find a pair of 2x2 matrices A and B such that AB = O22 but BA # Oox2. e Matrix multiplication is associative: If A € F?*7, B € F*" and CeF™s, then (AB)C = A(BC). ¢ Matrix multiplication is distributive: If A, Ai, Ao € F?*4 and B, By, Bg € F%*, then (Ai + 42)B = A,B+ A,B and A(B, + By) = AB, + AB. e If A € F?*4 is expressed both as an array of p row vectors of length q and as an array of g column vectors of height p: al a=|: ]=fa = ad, ap and if B € F%*" is expressed both as an array of q row vectors of length r and as an array of r column vectors of height q: bi B=| : |=[b - by], b, then the product AB can be expressed in the following three ways: aB 7 (1.2) AB=| : |=[Ab: --- Ab,] = oaib;. aB st Exercise 1.11. Show that if bi bie bis ba A= [ ae we as | and B= bar b22 beg baa | , oo b31 632 633 baa 1.3. Some definitions 9 then Din 0 0 0 ay 0 0 0 as ap=| 0 0|2+([o aoe 0 |2+[0 0 ay |? and hence that AB= [ an | (bu + ind [ 08 [ber ba2 bes bad +[ ] em s+ baa]. 413 23 Exercise 1.12. Verify the three ways of writing a matrix product in for- mula (1.2). [HINT: Let Exercise 1.11 serve as a guide.] e Block multiplication: It is often convenient to express a large ma- trix as an array of sub-matrices (i.e., blocks of numbers) rather than as an array of numbers. Then the rules of matrix multiplication still apply (block by block) provided that the block decompositions are compatible. Thus, for example, if An Ai fe | a Bu By Bi Bu Bo Bop Bo3 Bas Asi Aso with entries Aj; € F?*% and By, € F%*"*, then CAB (Gyo ia) where Ci = Au Bij + Ai2Ba;, is a pj x rj matrix. e Transposes: The transpose of a p x q matrix A is the g x p matrix whose k’th row is equal to the k’th column of A laid sideways, k = 1,...,q. In other words, the ij entry of A is equal to the ji entry of its transpose. The symbol A? is used to designate the transpose of A. Thus, for example, if iy 6) ao A= , then AT = | 3 2 4 2 6 5 6 It is readily checked that (1.3) (AT)? =A and (AB)? = BTA’. . Hermitian transposes: The Hermitian transpose A# ofa pxqma- trix A is the same as the transpose A” of A, except that all the entries 10 1. Vector spaces in the transposed matrix are replaced by their complex conjugates. Thus, for example, if 1 4 1 3 5+i | 7 a .rtrts~—CSCis 4 2-i 6i Bt | 6 It is readily checked that (1.4) (AM)¥ =A and (AB)! = BHAH, e Inverses: Let A € F?*4. Then: (1) A matrix C € F%*? is said to be a left inverse of A if CA = Iy. (2) A matrix B € F?*4 is said to be a right inverse of A if AB = Ip. In the first case A is said to be left invertible. In the second case A is said to be right invertible. It is readily checked that if a matrix A € F?*4 has both a left inverse C' and a right inverse B, then B = C: C = Cl, = C(AB) = (CA)B = 1,B = B. Notice that this implies that if A has both a left and a right inverse, then it has exactly one left inverse and exactly one right inverse and (as shown just above) the two are equal. In this instance, we shall say that A is invertible and refer to B = C as the inverse of A and denote it by A~}. 
In other words, a matrix A € F?*4 is invertible if and only if there exists a matrix B € F4? such that AB = I, and BA = I,. In fact, as we shall see later, we must also have g = p in this case. Exercise 1.13. Show that if A and B are invertible matrices of the same size, then AB is invertible and (AB)~! = B-1A7}. 101 Exercise 1.14. Show that the matrix A= |1 1 0] has no left inverses 10 and no right inverses. at) Exercise 1.15. Show that the matrix A = [ oft ] has at least two right inverses, but no left inverses. Exercise 1.16. Show that if a matrix A € C?*? has two right inverses By and Bo, then AB, + (1—A)Bz is also a right inverse for every choice of \ € C. Exercise 1.17. Show that a given matrix A € F?*@ has either 0, 1 or infinitely many right inverses and that the same conclusion prevails for left inverses. 1.4. Mappings 11 Exercise 1.18. Let Ai; € F?*?, Ajo € F?*4 and Ao € F2*?. Show that if Aj, is invertible, then [Au Aig] is right invertible and [ i ] is left invertible. 21 1.4. Mappings e Mappings: A mapping (or transformation) T from a subset Dr of a vector space / into a vector space V is a rule that assigns exactly one vector v € V to each u € Dr. The set Dr is called the domain of T. The following three examples give some idea of the possibilities: 3a? + 4 (a) T: [2] eR?4| m-m | ER. £2 @ +202+6 (b) T: {[2] €R?: 2 -aF of > [1/(21 — 22)] ER}. 321 + 22 (c) T: al ER? 4 | a, -m | eR. 3, 321 + ro The restriction on the domain in the second example is imposed in order to insure that the definition is meaningful. In the other two examples the domain is taken equal to the full vector space. In this framework we shall refer to the set Nr = {ue Dr: Tu=0y} as the nullspace (or kernel) of T and the set Rr = {Tu: ué Dr} as the range (or image) of T. The subscript V is added to the symbol 0 in the first definition to emphasize that it is the zero vector in V, not in U. e Linear mapping: A mapping T from a vector space U over F into a vector space V over the same number field F is said to be a linear mapping (or a linear transformation) if for every choice of u, v € U and a € F the following two conditions are met: (1) T(ut+v) =Tu+Tv. (2) T(au) = aTu. It is readily checked that if T is a linear mapping from a vector space U over F into a vector space V over F, then Np is a subspace of U and Rr is a subspace of V . Moreover, in the preceding set of three examples, T is linear only in case (c). 12 1. Vector spaces e The identity: Let U/ be a vector space over F. The special linear transformation from UY into UW that maps each vector u € U into itself is called the identity mapping. It is denoted by the symbol J, if U = F" and by Jy otherwise, though, more often than not, when the underlying space U is clear from the context, the subscript U/ will be dropped and J will be written in place of Jy. Thus, yu = Ju =u for every vector u € UW. Exercise 1.19. Compute Wr and Rr for each of the three cases (a), (b) and (c) considered above and say which are subspaces and which are not. Linear transformations are intimately connected with matrix multiplication: Exercise 1.20. Show that if T is a linear transformation from a vector space U/ over F with basis {u,,... , ug} into a vector space V over F with basis Avie 1Vphs then there exists a unique set of scalars aj; € F,i = 1,... ,p and j = 1,... ,q such that (1.5) Tu; = > and hence that q DP (1.6) TD) ayuj) = oyivi > Ax=y, j=l i=1 where x € F4 has components 21,... ,&q, y € F? has components y1,... 
, Yp and the entries ajj of A € F?*4 are determined by formula (1.5). e WARNING: If A € C?*4, then matrix multiplication defines a lin- ear map from x € C4 to Ax € C?. Correspondingly, the nullspace of this map, Na={x€C%:Ax=0}, isasubspace of C%, and the range of this map, Ra ={Ax:x€C%}, isasubspace of C?. However, if A € R?*4, then matrix multiplication also defines a linear map from x € R? to Ax € R?; and in this setting Na ={x€R!:Ax=0} isasubspaceof R®, and the range of this map, Ra ={Ax:xe€R%}, isasubspace of R?. In short, it is important to clarify the space on which A is acting, i.e., the domain of A. This will usually be clear from the context. 1.5. Triangular matrices 13 1.5. Triangular matrices Ann xn matrix A = [aj)] is said to be e upper triangular if all its nonzero entries sit either on or above the diagonal, i.e., if a;; = 0 when i > j. e lower triangular if all its nonzero entries sit either on or below the diagonal, i.e., if A? is upper triangular. e triangular if it is either upper triangular or lower triangular. e diagonal if a;;=0 when ij. Systems of equations based on a triangular matrix are particularly conve- nient to work with, even if the matrix is not invertible. Example 1.3. Let A € F4%4 be a 4 x 4 upper triangular matrix with nonzero diagonal entries and let b be any vector in F‘4. Then the vector x is a solution of the equation (1.7) Ax =b if and only if ait, + diet, + a1373+ ate = by a22%2 + 02303 + ar4qt4 = be 43303 +344 = b3 a4grg = 4. Therefore, since the diagonal entries of A are nonzero, it is readily seen that these equations admit a (unique) solution, by working from the bottom up: -1 La = agyb4 1 @3 = 33 (bs — a3ara) —1 Tz = Azy (bz — 4303 — a24t4) -1 2 = ayy (b — ay2e2 — a1gz3 — ay4za) « Thus, we have shown that for any right-hand side b, the equation (1.7) admits a (unique) solution x. Exploiting the freedom in the choice of b, let ej, j = 1,... ,4, denote the j’th column of the identity matrix I, and let x; denote the solution of the equation Ax; =e; for j = 1,... ,4. Then the 4 x 4 matrix X= [x1 x2 X3 xa] with columns xj,... ,x4 is a right inverse of A: AX = Afxr--+x4) = [Axi --- Axy] fer---edl=la. 14 1. Vector spaces Analogous examples can be built for pxp lower triangular matrices. The only difference is that now it is advantageous to work from the top down. The existence of a left inverse can also be obtained by writing down the requisite equations that must be solved. It is easier, however, to play with transposes. This works because A is a triangular matrix with nonzero diagonal entries if and only if A’ is a triangular matrix with nonzero diagonal entries and YA=],—> ATYT=1],. Exercise 1.21. Show that the right inverse X of the upper triangular ma- trix A that is constructed in the preceding example is also a left inverse and that it is upper triangular. Lemma 1.4. Let A be ap x p triangular matrix. Then (1) A is invertible if and only if all its diagonal entries are different from zero. Moreover, if A is an invertible triangular matriz, then (2) A is upper triangular <> A-! is upper triangular. (3) A is lower triangular <=> A-! is lower triangular. Proof. Suppose first that _— |} 1 412 a=| 0 a | is a 2 x 2 upper triangular matrix with nonzero diagonal entries a1; and az2. Then it is readily checked that the matrix equation zu t2|_][1 0 ale = l-[6 ae which is equivalent to the pair of equations a 22 |_| 0 lz ]=[o] [2 ]-[1]. 
has exactly one solution ott x= [z a2 l 7 oe rq 21 L22 0 az} and that this solution is also a left inverse of A: is + 7 Ay Ay] 12499 =1 0 39 1.5. Triangular matrices 15 Thus, every 2 x 2 upper triangular matrix A with nonzero diagonal entries is invertible and ayy ~ ayy arga3y! (1.8) At= 0 ay} is also upper triangular. Now let A and B be upper triangular k x k matrices such that AB = BA=I,. Then for every choice of a,b,c € C* and a,8 € C with a £0, [6 *\[ | _ [aetee on | Oalle B acl ap _ [I+ac™ Ab+ a 2 ac? a8 : Consequently, the product of these two matrices will be equal to J,41 if and only if c = 0, Ab + Ga = 0 and af = 1, that is, if and only if ¢ = 0, b = —6Ba and 8 = 1/a. Moreover, if c, b and ( are chosen to meet these conditions, then B b|[A a]_[BA Batab]_, Oo B O a\ =| 0 Ba 7 since Ba+ab = Ba+a(-—@Ba) =0. Thus, we have shown if k x k upper triangular matrices with nonzero entries on the diagonal are invertible, then the same holds true for (k +1) x (k+1) upper triangular matrices with nonzero entries on the diagonal. Therefore, since we already know that 2 x 2 upper triangular matrices with nonzero entries on the diagonal are invertible, it follows by induction that every upper triangular matrix with nonzero entries on the diagonal is invertible and that the inverse is upper triangular. Suppose next that A € C?*? is an invertible upper triangular matrix with inverse B ¢ C?*?. Then, upon expressing the identity AB = I, in block form as A, ai] [Bi bi] _ [Jp-1 O O a) lef Bl~ [oO 1 with diagonal blocks of size (p — 1) x (p— 1) and 1 x 1, respectively, it is readily seen that a3; = 1. Therefore, a; 4 0. The next step is to play the same game with A; to show that its bottom diagonal entry is nonzero and, continuing this way down the line, to conclude that the diagonal entries of A are nonzero and that the inverse matrix B is also automatically upper triangular. The details are left to the reader. 16 1. Vector spaces This completes the proof of the asserted statements for upper triangular matrices. The proof for lower triangular matrices may be carried out in much the same way or, what is simpler, by taking transposes. o Exercise 1.22. Show that if A € C"*” and A* = Onxn for some positive integer k, then J, — A is invertible. [HINT: It’s enough to show that (In-A)(UIn tA+ A? +++ AO!) = (I) tA+A? +--+ A*1)(T,—A) = In] Exercise 1.23. Show that even though all the diagonal entries of the ma- trix oil A=]101 1 070 are equal to zero, A is invertible, and find A}. Exercise 1.24. Use Exercise 1.22 to show that a triangular n x n matrix A with nonzero diagonal entries is invertible by writing A=D+(A-D)=D(I,+D-\(A—D)), where D is the diagonal matrix with dj; = aj; for j =1,... ,n. [HINT: The key observation is that (D~!(A — D))" =O1] 1.6. Block triangular matrices A matrix A € F"™” with block decomposition An ov Atk 2a. ce Ag ot Akg where Aj; € F?'*% for i,j =1,...,kand pp t+---+pp=qtetq=an is said to be « upper block triangular if p; = q; fori =1,... ,k and Ajj = O for i>j. ¢ lower block triangular if p; = q; fori = 1,... ,k and Aj; = O for i Eis invertible, and construct an example to show that the opposite implication is false. Exercise 1.31. Show that if the matrix E is defined by formula (1.10), then D and A-BD~'!C invertible => E is invertible, and show by example that the opposite implication is false. Exercise 1.32. Show that if the blocks A and D in the matrix FE defined by formula (1.10) are invertible, then Eis invertible <= D—CA™'B is invertible <= A-—BD"'C is invertible. 
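The role that the Schur complements play in the preceding exercises can be explored numerically. The sketch below checks, for randomly generated blocks (my own construction), one standard block factorization of the matrix E built from the blocks A, B, C, D as in formula (1.10); the symbol S for the Schur complement $D - CA^{-1}B$ is my own shorthand.

```python
import numpy as np

rng = np.random.default_rng(7)
p, q = 3, 2
A = rng.normal(size=(p, p))      # invertible with probability one for random entries
B = rng.normal(size=(p, q))
C = rng.normal(size=(q, p))
D = rng.normal(size=(q, q))

E = np.block([[A, B],
              [C, D]])
S = D - C @ np.linalg.inv(A) @ B          # the Schur complement of A in E

# When A is invertible, E factors as a block lower triangular matrix with
# identity diagonal blocks times a block upper triangular matrix:
#     E = [[I, 0], [C A^{-1}, I]] @ [[A, B], [0, S]].
L = np.block([[np.eye(p),            np.zeros((p, q))],
              [C @ np.linalg.inv(A), np.eye(q)       ]])
U = np.block([[A,                B],
              [np.zeros((q, p)), S]])
print(np.allclose(E, L @ U))              # True

# The left factor is always invertible, so E is invertible exactly when A and
# S are; in that case the lower right q x q block of E^{-1} is S^{-1}.
print(np.allclose(np.linalg.inv(E)[p:, p:], np.linalg.inv(S)))   # True
```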
Exercise 1.33. Show that if blocks A and D in the matrix E defined by formula (1.10) are invertible and A — BD~!C is invertible, then (1.15) (A- BDC) = A"! + A B(D-CA™B)"!CA7}. (HINT: Multiply both sides of the asserted identity by A - BD~!C\] Exercise 1.34. Show that if if blocks A and D in the matrix E defined by formula (1.10) are invertible and D — C-A~B is invertible, then (1.16) (D-CA™'B)"! = D7“! + D“'C(A- BD7'C)"BD". (HINT: Multiply both sides of the asserted identity by D — CAB] Exercise 1.35. Show that if A € C?*?, B € C?*7, C € C2? and the matrices A and A+ BC are both invertible, then the matrix I, + CAB is invertible and (Ig + CA~'B)-! = I, -C(A+ BC)"!B. 1.8. Other matrix products 19 Exercise 1.36. Show that if A ¢ C?*?, B € C4, C € C4? and the matrix A+ BC is invertible, then the matrix [ a8 | invertible, and Cc -I, find its inverse. Exercise 1.37. Let A € C?*?, u € C?, v € C? and assume that A is invertible. Show that [vi 7] is invertible <> A+uv" is invertible —+ 1+v7Au 40 and that if these conditions are met, then (yp tuv! At) y= ul + v7? Attu). Exercise 1.38. Show that if in the setting of Exercise 1.37 the condition 1+v"A-1u #0 is met, then the Sherman-Morrison formula Attuv# At 1.17 Atuvi)-} = at — — (1.17) (At+uv’) 1+v4 Alu holds. Exercise 1.39. Show that if A is a p x q matrix and C is a q x q invertible matrix, then Rac = Ra. Exercise 1.40. Show that the upper block triangular matrix An Aiz Ais A=] O Apo Agg O 0 As with entries Aj; of size p; x p; is invertible if the diagonal blocks A}1, A22 and Agg are invertible, and find a formula for A~!. [HINT: Look for a matrix B of the same form as A such that AB = Ip, +p.+p3-] 1.8. Other matrix products Two other product rules for matrices that arise in assorted applications are: ¢ The Schur product C = AoB of A = [aj] € C™*" with B = [bj] € C”*" is defined as the n x n matrix C = [cj] with entries oj; = aijbi; for i,j =1,...,n. ¢ The Kronecker product A@B of A = [a;;] € C?*? with B = [bj] € C"*™ is defined by the formula aiB ++ aigB A@B= : QB +++ apqgB 20 1. Vector spaces The Schur product of two square matrices of the same size is clearly commutative. It is also readily checked that the Kronecker product of real (or complex) matrices is associative: (A®B)@®C=A®@(BEC) and satisfies the rules (4@B)? = AT @BT, (A@B)(C@D)=AC@BD, when the indicated matrix multiplications are meaningful. If x € F*, u € F*, y e Fé and v € FS, then the last rule implies that (x"u)(y7v) = (x @y7)(u@v). Chapter 2 Gaussian elimination ... People can tell you... do it like this. But that ain’t the way to learn. You got to do it for yourself. Willie Mays, cited in Kahn [40], p. 163 Gaussian elimination is a way of passing from a given system of equations to a new system of equations that is easier to analyze. The passage from the given system to the new system is effected by multiplying both sides of the given equation, say Ax =b, successively on the left by appropriately chosen invertible matrices. The restriction to invertible multipliers is essential. Otherwise, the new system will not have the same set of solutions as the given one. In particular, the left multipliers will be either permutation matrices (which are defined below) or lower triangular matrices with ones on the diagonal. Both types are invertible. The first operation serves to interchange (i.e., permute) rows, whereas the second serves to add a multiple of one row to other rows. 
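Here is a small numerical sketch of the two kinds of left multipliers just described (the matrix entries are my own); the displayed identities that follow make the same point symbolically for a general 3 × n matrix.

```python
import numpy as np

# A hypothetical 3 x 4 matrix to operate on.
A = np.array([[1.,  2.,  3.,  4.],
              [5.,  6.,  7.,  8.],
              [9., 10., 11., 12.]])

# Left multiplication by a permutation matrix interchanges rows.
P = np.array([[0., 1., 0.],
              [1., 0., 0.],
              [0., 0., 1.]])
print(P @ A)     # rows 1 and 2 of A are interchanged

# Left multiplication by a lower triangular matrix with ones on the diagonal
# adds multiples of one row to other rows: here alpha times row 1 is added to
# row 2, and beta times row 1 is added to row 3.
alpha, beta = -5., -9.
E = np.array([[1.,    0., 0.],
              [alpha, 1., 0.],
              [beta,  0., 1.]])
print(E @ A)     # the entries of column 1 below the top entry become zero

# Both multipliers are invertible, so Ax = b and (EA)x = Eb (or (PA)x = Pb)
# have exactly the same solutions -- the whole point of the method.
```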
Thus, for example,

$$\begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1\end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ a_{31} & a_{32} & \cdots & a_{3n}\end{bmatrix} = \begin{bmatrix} a_{21} & a_{22} & \cdots & a_{2n}\\ a_{11} & a_{12} & \cdots & a_{1n}\\ a_{31} & a_{32} & \cdots & a_{3n}\end{bmatrix},$$

whereas

$$\begin{bmatrix} 1 & 0 & 0\\ \alpha & 1 & 0\\ \beta & 0 & 1\end{bmatrix} \begin{bmatrix} a_{11} & \cdots & a_{1n}\\ a_{21} & \cdots & a_{2n}\\ a_{31} & \cdots & a_{3n}\end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ \alpha a_{11} + a_{21} & \alpha a_{12} + a_{22} & \cdots & \alpha a_{1n} + a_{2n}\\ \beta a_{11} + a_{31} & \beta a_{12} + a_{32} & \cdots & \beta a_{1n} + a_{3n}\end{bmatrix}.$$

2.1. Some preliminary observations

The operation of adding (or subtracting) a constant multiple of one row of a p × q matrix from another row of that matrix can always be achieved by multiplying on the left by a p × p matrix with ones on the diagonal and one other nonzero entry. Every such matrix can be expressed in the form

(2.1)  $E_\alpha = I_p + \alpha\, e_i e_j^T$  with $i$ and $j$ fixed and $i \neq j$,

where the vectors $e_1, \ldots, e_p$ denote the standard basis for $F^p$ (i.e., the columns in the identity matrix $I_p$) and $\alpha \in F$. It is readily seen that the following conclusions hold for the class $\mathcal{E}_{ij}$ of matrices of the form (2.1):

(1) $\mathcal{E}_{ij}$ is closed under multiplication: $E_\alpha E_\beta = E_{\alpha+\beta}$.
(2) The identity belongs to $\mathcal{E}_{ij}$: $E_0 = I_p$.
(3) Every matrix in $\mathcal{E}_{ij}$ is invertible: $E_\alpha$ is invertible and $E_\alpha^{-1} = E_{-\alpha}$.
(4) Multiplication is commutative in $\mathcal{E}_{ij}$: $E_\alpha E_\beta = E_\beta E_\alpha$.

Thus, the class of matrices of the form (2.1) is a commutative group with respect to matrix multiplication. The same conclusion holds for the more general class of p × p matrices of the form

(2.2)  $E_u = I_p + u e_j^T$,  with $u \in F^p$ and $e_j^T u = 0$.

The trade secret is the identity which is considered in the next exercise, or, in less abstract terms, the observation that

$$\begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & a & 1 & 0\\ 0 & b & 0 & 1\end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & c & 1 & 0\\ 0 & d & 0 & 1\end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & a+c & 1 & 0\\ 0 & b+d & 0 & 1\end{bmatrix}$$

and the realization that there is nothing special about the size of this matrix or the second column.

Exercise 2.1. Let $u, v \in F^p$ be such that $e_j^T u = 0$ and $e_j^T v = 0$. Show that

$$(I_p + u e_j^T)(I_p + v e_j^T) = (I_p + v e_j^T)(I_p + u e_j^T) = I_p + (v + u) e_j^T.$$

• Permutation matrices: Every n × n permutation matrix P is obtained by taking the identity matrix $I_n$ and interchanging some of the rows. Consequently, P can be expressed in terms of the columns $e_j$, $j = 1, \ldots, n$, of $I_n$ and a one to one mapping $\sigma$ of the set of integers $\{1, \ldots, n\}$ onto itself by the formula

(2.3)  $P = P_\sigma = \sum_{j=1}^{n} e_j e_{\sigma(j)}^T$.

Thus, for example, if n = 4 and $\sigma(1) = 3$, $\sigma(2) = 2$, $\sigma(3) = 4$ and $\sigma(4) = 1$, then

$$P_\sigma = e_1 e_3^T + e_2 e_2^T + e_3 e_4^T + e_4 e_1^T = \begin{bmatrix} 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1\\ 1 & 0 & 0 & 0\end{bmatrix}.$$

The set of n × n permutation matrices also forms a group under multiplication, but this group is not commutative (i.e., conditions (1)-(3) in the list given above are satisfied, but not (4)).

• Orthogonal matrices: An n × n matrix V with real entries is said to be an orthogonal matrix if $V^T V = I_n$.

Exercise 2.2. Show that every permutation matrix is an orthogonal matrix. [HINT: Use formula (2.3).]

The following notions will prove useful:

• Upper echelon: A p × q matrix U is said to be an upper echelon matrix if the first nonzero entry in row $i$ lies to the left of the first nonzero entry in row $i + 1$. Thus, for example, the first of the following two matrices is an upper echelon matrix, while the second is not:

$$\begin{bmatrix} 3 & 6 & 2 & 4 & 1 & 0\\ 0 & 0 & 1 & 0 & 5 & 0\\ 0 & 0 & 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\end{bmatrix}, \qquad \begin{bmatrix} 4 & 2 & 3 & 1\\ 0 & 0 & 6 & 0\\ 0 & 5 & 0 & 5\\ 0 & 0 & 0 & 0\end{bmatrix}.$$

• Pivots: The first nonzero entry in each row of an upper echelon matrix is termed a pivot. The pivots in the matrix on the left just above are 3, 1 and 2.

• Pivot columns: A column in an upper echelon matrix U will be referred to as a pivot column if it contains a pivot. Thus, the first, third and fifth columns of the matrix considered in the preceding paragraph are pivot columns.
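The definitions of pivots and pivot columns can be turned into a short computation. The sketch below uses the 4 × 6 upper echelon matrix displayed above and the second (non-echelon) matrix for contrast; the two helper functions are my own, not part of any library.

```python
import numpy as np

def pivot_positions(U, tol=0.0):
    """Return (row, column) pairs for the first nonzero entry of each nonzero row."""
    positions = []
    for i, row in enumerate(U):
        nonzero = np.flatnonzero(np.abs(row) > tol)
        if nonzero.size:
            positions.append((i, nonzero[0]))
    return positions

def is_upper_echelon(U):
    """Zero rows must sit at the bottom and pivots must move strictly to the right."""
    pos = pivot_positions(U)
    rows = [i for i, _ in pos]
    cols = [j for _, j in pos]
    return rows == list(range(len(rows))) and all(a < b for a, b in zip(cols, cols[1:]))

U = np.array([[3., 6., 2., 4., 1., 0.],
              [0., 0., 1., 0., 5., 0.],
              [0., 0., 0., 0., 2., 0.],
              [0., 0., 0., 0., 0., 0.]])
print(is_upper_echelon(U))                              # True
print([(U[i, j], j + 1) for i, j in pivot_positions(U)])
# [(3.0, 1), (1.0, 3), (2.0, 5)] -> pivots 3, 1, 2 in columns 1, 3, 5

V = np.array([[4., 2., 3., 1.],
              [0., 0., 6., 0.],
              [0., 5., 0., 5.],
              [0., 0., 0., 0.]])
print(is_upper_echelon(V))   # False: the pivot in row 3 lies to the left of the one in row 2
```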
If GA = U, where G is invertible and U ¢€ F?*4 is in upper echelon form with k pivots, then the columns aj,,... ,@%, Of A that correspond in position to the pivot columns Uj,,--- , Uj, Of U will also be called pivot columns (even though the pivots are in U not in A) and the entries 2;,,... , 2%, in x € F? will be referred to as pivot variables. 24 2. Gaussian elimination 2.2. Examples Example 2.1. Consider the equation Ax = b, where 0 23 1 1 (2.4) A=]/1 5 3 4] andb= |2]. 26 3 2 al 1. Construct the augmented matrix _ foz231) (2.5) A=|15 342 26 3 2 1 that is formed by adding b as an extra column to the matrix A on the far right. The augmented matrix is introduced to insure that the row operations that are applied to the matrix A are also applied to the vector b. 2. Interchange the first two rows of A to get Te a3 412 = (2.6) a. 26321 0 P= 0 001 has been chosen to obtain a nonzero entry in the upper left-hand corner of the new matrix. where =o on 3. Subtract two times the top row of the matrix P,A from its bottom row to get (2.7) 1 5 3 4 2 - 1 0 0 O 2 1 ein. =where fi 0 10 0 -4 -3 -6 -3 2 1 is chosen to obtain all zeros below the pivot in the first column. 4. Add two times the second row of EPA to its third row to get 153 4 (2.8) ae al o =EE,PjA=([U c¢], 003-4 -1 where _ ll soo oor who roo 2.2, Examples 25 is chosen to obtain all zeros below the pivot in the second column, U = E2E,P,A is in upper echelon form and c = F2E\P,b. It was not nec- essary to permute the rows, since the upper left-hand corner of the block [ ao | was already nonzero. _ 5. Try to solve the new system of equations a 2 (2.9) Ux=|023 1 l= 1 003 -4 : -1 Ta. by solving for the pivot variables from the bottom row up: The bottom row equation is 3z3 — 424 = —1, and hence for the third pivot variable x3 we obtain the formula 323 = 4a4-1. The second row equation is 222+ 323+24=1, and hence for the second pivot variable x2 we obtain the formula 222 = —323 —-@4+1=—5244+2. Finally, the top row equation is ©, + 5x2 + 3x3 + 404 = 2, and hence for the first pivot variable x, we get a = —5ap — 323 — dag +2 = BFA) ey 1) — dng +2 =>a4-2. 2 Thus, we have expressed each of the pivot variables x), 72,23 in terms of the variable 4. In vector notation, 2 -2 9/2 = jo) | 2 —5/2 am ico lee 1 1/3| Ol 43 ry 0 7 is a solution of the system of equations (2.9), or equivalently, (2.10) E)E,P,Ax = EpE,Pib 26 2. Gaussian elimination (with A and b as in (2.4)) for every choice of x4. However, since the matrices Ep, E, and P, are invertible, x is a solution of (2.10) if and only if Ax = b, i.e., if and only if x is a solution of the original equation. 6. Check that the computed solution solves the original system of equa- tions. Strictly speaking, this step is superfluous, because the construction guarantees that every solution of the new system is a solution of the old sys- tem, and vice versa. Nevertheless, this is an extremely important step, because it gives you a way of verifying that your calculations are correct. Conclusions: Since U is a 3 x 4 matrix with 3 pivots, much the same sorts of calculations as those carried out above imply that for each choice of b € F3, the equation Ax = b considered in this example has at least one solution x € F4. Therefore, R4 = F%. Moreover, for any given b, there is a family of solutions of the form x = u + «4v for every choice of x4 € F. But this implies that Ax = Au + x4Av = Au for every choice of x4 € F, and hence that v € M4. In fact, 9/2 —5/2 4/3 1 Na = span and Ra =F. 
This, as we shall see shortly, is a consequence of the number of pivots and their positions. (In particular, anticipating a little, it is not an accident that the dimensions of these two spaces sum to the number of columns of A.) Example 2.2. Consider the equation Ax = b with 0043 by A=|1 2 4 1 andb= |b 1284 bg 1. Form the augmented matrix 0 _ 0 A=]1 2 12 one mE aicacal Ss 2. Interchange the first two rows to get 1241 2% : 0043 h|=RA 7. with P; as in Step 2 of the preceding example. 2.2. Examples 27 3. Subtract the top row of P;A from its bottom row to get, 12 411i : 0043 & |=HPA, 00 4 8 by~by 100 E,= O10). et 0 1 4. Subtract the second row of E, P;A from its third row to get where 1241 by _ 0043 by = EE, P,A=[U ec], 0 0 0 0 by~by— by where 1 00 1241 be E=|0 10], U=|0 0 4 3) and c= bt . 0 -11 0000 3 — be — by, 5. Try to solve the new system of equations 24 - by 004 a = by : 000 : bg — by — br working from the bottom up. To begin with, the bottom row yields the equation 0 = b3—b2—b;. Thus, it is clear that there are no solutions unless b3 = bj + bg. If this restriction is in force, then the second row gives us the equation own Arg + 324 = by and hence, the pivot variable, by — 324 7 Next, the first row gives us the equation 13 = © + 2x2 + 4x3 + v4 = bo and hence, the other pivot variable, © = bo — 2x2 — 423 — 24 = be — 2g — (b1 — 3x4) — 24 = ba — bi — 2x2 + 2x4. 28 2. Gaussian elimination Consequently, if bz = b; + b2, then zy be — by 2 2 _ fal _ | 0 1 0 x= |e b/4 +22 0 ie. -3/4 4 0 0 1 is a solution of the given system of equations for every choice of x2 and x4 in F. 6. Check that the computed solution solves the original system of equations. Conclusions: The preceding calculations imply that the equation Ax = b is solvable if and only if by 1 0 b= be ; Le. if and only if b€spanq |O],|1 : by + be, 1 1 Moreover, for each such b € F® there exists a solution of the form x = u+t2ov1 + ©4V2 for every x2,24 € F. In particular, r2Av; + 24Av2 = 0 for every choice of x2 and x4. But this is possible only if Av; = 0 and Av2 = 0. Exercise 2.3. Check that for the matrix A in Example 2.2, Rg is the span of the pivot columns of A: 0) 4 > ' Ra=spand |i}, [4] $, and Nq | -3/4 1 8 0 1 The next example is carried out more quickly. Example 2.3. Let 003477 by —\0 100 0 _ | & Slot ae 6\ ei. 006814 by Then a vector x € F® is a solution of the equation Ax = b if and only if 01000 n be 00347 a by 0.0021] ) %8 | ~ | bg- 2b. - br 00000 4 ba — 2b 2.2. Examples 29 The pivots of the upper echelon matrix on the left are in columns 2, 3 and 4, Therefore, upon solving for the pivot variables x2, x3 and x4 in terms of 1,25 and bj,... ,b4 from the bottom row up, we obtain the formulas 0 = b4- 2b 204 = b3—2bo—b) —25 323 = 6, — 424-725 = 3b, + 4b2 — 2b3 — 55 oS be But this is the same as zy zy 1 0 22 ba 0 0 zg | = | (—Sa5 + 3b) + 4b: —2b3)/3 | =a, | 0 | +25 | —5/3 rq (—a5 + bg — 2bz — b1)/2 0 -1/2 Ts &5 0 1 0 0 0 0 1 ) +b) 1 +b; | 4/3 | +b3 | —2/3 -1/2 -1 1/2 0 0 0 = 2U, + 25u2 + by ug + bou4 + bgus , where uj,... ,Us denote the five vectors in F® of the preceding line. Thus, we have shown that for each vector b € F4 with by = 2b;, the vector X = 2U) + F5Ug + byug + boug + bgus is a solution of the equation Ax = b for every choice of x; and z5. Therefore, £1U, + 25 Uz is a solution of the equation Ax = 0 for every choice of x1, 25 € F. Thus, uy, u2 € V4 and, as Ax = 1 Au; + 25 Auy + bi Aug + b2Aug + b3Augs = 6; Aug + b2Auy + bg Aus, the vectors v, = Aug = and v3 = Aus = Noor coro oroo belong to Ra. Exercise 2.4. 
Let aj, j = 1,...,5, denote the j’th column vector of the matrix A considered in the preceding example. Show that (1) span{vi, v2, v3} = span{ag, ag, a4} i.e., the span of the pivot columns of A. 30 2. Gaussian elimination (2) span{uy, u2} = V4 and span{vi, v2, v3} = Ra. 2.3. Upper echelon matrices The examples in the preceding section serve to illustrate the central role played by the number of pivots in an upper echelon matrix U and their positions when trying to solve systems of equations by Gaussian elimination. Our next main objective is to exploit the special structure of upper echelon matrices in order to draw some general conclusions for matrices in this class. Extensions to general matrices will then be made on the basis of the following lemma: Lemma 2.4. Let A € F?*? and assume that A # Opxq. Then there exists an invertible matrit G € F?*P such that (2.11) GA=U is in upper echelon form. Proof. By Gaussian elimination there exists a sequence Pj, Po,... , Ph of pXp permutation matrices and a sequence Ej, E2,... , Ej, of lower triangular matrices with ones on the diagonal such that Ey, Py- ++ E2P2E,P,A =U is in upper echelon form. Consequently the matrix G = E,P,--- E2P2E,P; fulfills the asserted conditions, since it is the product of invertible matrices. Oo Lemma 2.5. Let U € F?*4 be an upper echelon matrix with k pivots and let e; denote the j’th column of Ip for j =1,... ,p. Then: (1) & < min {p,q}. (2) The pivot columns of U are linearly independent. (3) The span of the pivot columns = span{ey,... ,e,} = Ru; i-e., (a) Ifk U is left invertible <> Ny = {0}. (3) k =p <=> U is right invertible —> Ryu = F?. Proof. The first assertion is established in Lemma 2.5 (and is repeated here for perspective). Suppose next that U has q pivots. Then v=| of ] if q

U is left invertible. Suppose next that U is left invertible with a left inverse V. Then x € Ny => Ux = 0 = 0 = V(Ux) = (VU)x = x, ie., U left invertible => Ny = {0}. To complete the proof of (2), observe that: The span of the pivot columns of U is equal to the span of all the columns of U, alias Ry. Therefore, every column of U can be expressed as a linear combination of the pivot columns. Thus, as Nu = {0} => the q columns of U are linearly independent , it follows that Nu = {0} => U_ has q pivots. Finally, even though the equivalence k = p <=> Ry = F? is known from Lemma 2.5, we shall present an independent proof of all of (3), because it is instructive and indicates how to construct right inverses, when they exist. We proceed in three steps: (a) k = p => U is right invertible: If k = q, then U is right (and left) invertible by Lemma 1.4. If k = p and q > p, then there exists a
