Tom M. Apostol

CALCULUS
VOLUME II
Multi Variable Calculus and Linear Algebra, with Applications to Differential Equations and Probability

SECOND EDITION

John Wiley & Sons
New York   London   Sydney   Toronto

CONSULTING EDITOR
George Springer, Indiana University

Copyright © 1969 by Xerox Corporation. All rights reserved. No part of the material covered by this copyright may be reproduced in any form, or by any means of reproduction.
Previous edition copyright © 1962 by Xerox Corporation.
Library of Congress Catalog Card Number: 61-14608
Printed in the United States of America
10 9 8 7 6 5 4 3 2

To Jane and Stephen

PREFACE

This book is a continuation of the author's Calculus, Volume I, Second Edition. The present volume has been written with the same underlying philosophy that prevailed in the first. Sound training in technique is combined with a strong theoretical development. Every effort has been made to convey the spirit of modern mathematics without undue emphasis on formalization. As in Volume I, historical remarks are included to give the student a sense of participation in the evolution of ideas.

The second volume is divided into three parts, entitled Linear Analysis, Nonlinear Analysis, and Special Topics. The last two chapters of Volume I have been repeated as the first two chapters of Volume II so that all the material on linear algebra will be complete in one volume.

Part 1 contains an introduction to linear algebra, including linear transformations, matrices, determinants, eigenvalues, and quadratic forms. Applications are given to analysis, in particular to the study of linear differential equations. Systems of differential equations are treated with the help of matrix calculus. Existence and uniqueness theorems are proved by Picard's method of successive approximations, which is also cast in the language of contraction operators.

Part 2 discusses the calculus of functions of several variables. Differential calculus is unified and simplified with the aid of linear algebra. It includes chain rules for scalar and vector fields, and applications to partial differential equations and extremum problems. Integral calculus includes line integrals, multiple integrals, and surface integrals, with applications to vector analysis. Here the treatment is along more or less classical lines and does not include a formal development of differential forms.

The special topics treated in Part 3 are Probability and Numerical Analysis. The material on probability is divided into two chapters, one dealing with finite or countably infinite sample spaces; the other with uncountable sample spaces, random variables, and distribution functions. The use of the calculus is illustrated in the study of both one- and two-dimensional random variables. The last chapter contains an introduction to numerical analysis, the chief emphasis being on different kinds of polynomial approximation. Here again the ideas are unified by the notation and terminology of linear algebra. The book concludes with a treatment of summation formulas.

There is ample material in this volume for a full year's course meeting three or four times per week. It presupposes a knowledge of one-variable calculus as covered in most first-year calculus courses. The author has taught this material in a course with two lectures and two recitation periods per week, allowing about ten weeks for each part and omitting the starred sections. This second volume has been planned so that many chapters can be omitted for a variety of shorter courses.
For example, the last chapter of each part can be skipped without disrupting the continuity of the presentation. Part 1 by itself provides material for a combined course in linear algebra and ordinary differential equations. The individual instructor can choose topics to suit his needs and preferences by consulting the diagram on the next page, which shows the logical interdependence of the chapters.

Once again I acknowledge with pleasure the assistance of many friends and colleagues. In preparing the second edition I received valuable help from Professors Herbert S. Zuckerman of the University of Washington, and Basil Gordon of the University of California, Los Angeles, each of whom suggested a number of improvements. Thanks are also due to the staff of Blaisdell Publishing Company for their assistance and cooperation.

As before, it gives me special pleasure to express my gratitude to my wife for the many ways in which she has contributed. In grateful acknowledgement I happily dedicate this book to her.

T. M. A.
Pasadena, California
September 16, 1968

Logical Interdependence of the Chapters
[Diagram omitted: a chart showing the logical interdependence of Chapters 1 through 15.]

CONTENTS

PART 1. LINEAR ANALYSIS

1. LINEAR SPACES
1.1 Introduction
1.2 The definition of a linear space
1.3 Examples of linear spaces
1.4 Elementary consequences of the axioms
1.5 Exercises
1.6 Subspaces of a linear space
1.7 Dependent and independent sets in a linear space
1.8 Bases and dimension
1.9 Components
1.10 Exercises
1.11 Inner products, Euclidean spaces. Norms
1.12 Orthogonality in a Euclidean space
1.13 Exercises
1.14 Construction of orthogonal sets. The Gram-Schmidt process
1.15 Orthogonal complements. Projections
1.16 Best approximation of elements in a Euclidean space by elements in a finite-dimensional subspace
1.17 Exercises

2. LINEAR TRANSFORMATIONS AND MATRICES
2.1 Linear transformations
2.2 Null space and range
2.3 Nullity and rank
2.4 Exercises
2.5 Algebraic operations on linear transformations
2.6 Inverses
2.7 One-to-one linear transformations
2.8 Exercises
2.9 Linear transformations with prescribed values
2.10 Matrix representations of linear transformations
2.11 Construction of a matrix representation in diagonal form
2.12 Exercises
2.13 Linear spaces of matrices
2.14 Isomorphism between linear transformations and matrices
2.15 Multiplication of matrices
2.16 Exercises
2.17 Systems of linear equations
2.18 Computation techniques
2.19 Inverses of square matrices
2.20 Exercises
2.21 Miscellaneous exercises on matrices

3. DETERMINANTS
3.1 Introduction
3.2 Motivation for the choice of axioms for a determinant function
3.3 A set of axioms for a determinant function
3.4 Computation of determinants
3.5 The uniqueness theorem
3.6 Exercises
3.7 The product formula for determinants
3.8 The determinant of the inverse of a nonsingular matrix
3.9 Determinants and independence of vectors
3.10 The determinant of a block-diagonal matrix
3.11 Exercises
3.12 Expansion formulas for determinants. Minors and cofactors
3.13 Existence of the determinant function
3.14 The determinant of a transpose
3.15 The cofactor matrix
3.16 Cramer's rule
3.17 Exercises

4. EIGENVALUES AND EIGENVECTORS
4.1 Linear transformations with diagonal matrix representations
4.2 Eigenvectors and eigenvalues of a linear transformation
4.3 Linear independence of eigenvectors corresponding to distinct eigenvalues
4.4 Exercises
4.5 The finite-dimensional case. Characteristic polynomials
4.6 Calculation of eigenvalues and eigenvectors in the finite-dimensional case
4.7 Trace of a matrix
4.8 Exercises
4.9 Matrices representing the same linear transformation. Similar matrices
4.10 Exercises

5. EIGENVALUES OF OPERATORS ACTING ON EUCLIDEAN SPACES
5.1 Eigenvalues and inner products
5.2 Hermitian and skew-Hermitian transformations
5.3 Eigenvalues and eigenvectors of Hermitian and skew-Hermitian operators
5.4 Orthogonality of eigenvectors corresponding to distinct eigenvalues
5.5 Exercises
5.6 Existence of an orthonormal set of eigenvectors for Hermitian and skew-Hermitian operators acting on finite-dimensional spaces
5.7 Matrix representations for Hermitian and skew-Hermitian operators
5.8 Hermitian and skew-Hermitian matrices. The adjoint of a matrix
5.9 Diagonalization of a Hermitian or skew-Hermitian matrix
5.10 Unitary matrices. Orthogonal matrices
5.11 Exercises
5.12 Quadratic forms
5.13 Reduction of a real quadratic form to a diagonal form
5.14 Applications to analytic geometry
5.15 Exercises
*5.16 Eigenvalues of a symmetric transformation obtained as values of its quadratic form
*5.17 Extremal properties of eigenvalues of a symmetric transformation
*5.18 The finite-dimensional case
5.19 Unitary transformations
5.20 Exercises

6. LINEAR DIFFERENTIAL EQUATIONS
6.1 Historical introduction
6.2 Review of results concerning linear equations of first and second orders
6.3 Exercises
6.4 Linear differential equations of order n
6.5 The existence-uniqueness theorem
6.6 The dimension of the solution space of a homogeneous linear equation
6.7 The algebra of constant-coefficient operators
6.8 Determination of a basis of solutions for linear equations with constant coefficients by factorization of operators
6.9 Exercises
6.10 The relation between the homogeneous and nonhomogeneous equations
6.11 Determination of a particular solution of the nonhomogeneous equation. The method of variation of parameters
6.12 Nonsingularity of the Wronskian matrix of n independent solutions of a homogeneous linear equation
6.13 Special methods for determining a particular solution of the nonhomogeneous equation. Reduction to a system of first-order linear equations
6.14 The annihilator method for determining a particular solution of the nonhomogeneous equation
6.15 Exercises
6.16 Miscellaneous exercises on linear differential equations
6.17 Linear equations of second order with analytic coefficients
6.18 The Legendre equation
6.19 The Legendre polynomials
6.20 Rodrigues' formula for the Legendre polynomials
6.21 Exercises
6.22 The method of Frobenius
6.23 The Bessel equation
6.24 Exercises
7. SYSTEMS OF DIFFERENTIAL EQUATIONS
7.1 Introduction
7.2 Calculus of matrix functions
7.3 Infinite series of matrices. Norms of matrices
7.4 Exercises
7.5 The exponential matrix
7.6 The differential equation satisfied by e^{tA}
7.7 Uniqueness theorem for the matrix differential equation F'(t) = AF(t)
7.8 The law of exponents for exponential matrices
7.9 Existence and uniqueness theorems for homogeneous linear systems with constant coefficients
7.10 The problem of calculating e^{tA}
7.11 The Cayley-Hamilton theorem
7.12 Exercises
7.13 Putzer's method for calculating e^{tA}
7.14 Alternate methods for calculating e^{tA} in special cases
7.15 Exercises
7.16 Nonhomogeneous linear systems with constant coefficients
7.17 Exercises
7.18 The general linear system Y'(t) = P(t)Y(t) + Q(t)
7.19 A power-series method for solving homogeneous linear systems
7.20 Exercises
7.21 Proof of the existence theorem by the method of successive approximations
7.22 The method of successive approximations applied to first-order nonlinear systems
7.23 Proof of an existence-uniqueness theorem for first-order nonlinear systems
7.24 Exercises
*7.25 Successive approximations and fixed points of operators
7.26 Normed linear spaces
7.27 Contraction operators
*7.28 Fixed-point theorem for contraction operators
7.29 Applications of the fixed-point theorem

PART 2. NONLINEAR ANALYSIS

8. DIFFERENTIAL CALCULUS OF SCALAR AND VECTOR FIELDS
8.1 Functions from R^n to R^m. Scalar and vector fields
8.2 Open balls and open sets
8.3 Exercises
8.4 Limits and continuity
8.5 Exercises
8.6 The derivative of a scalar field with respect to a vector
8.7 Directional derivatives and partial derivatives
8.8 Partial derivatives of higher order
8.9 Exercises
8.10 Directional derivatives and continuity
8.11 The total derivative
8.12 The gradient of a scalar field
8.13 A sufficient condition for differentiability
8.14 Exercises
8.15 A chain rule for derivatives of scalar fields
8.16 Applications to geometry. Level sets. Tangent planes
8.17 Exercises
8.18 Derivatives of vector fields
8.19 Differentiability implies continuity
8.20 The chain rule for derivatives of vector fields
8.21 Matrix form of the chain rule
8.22 Exercises
*8.23 Sufficient conditions for the equality of mixed partial derivatives
8.24 Miscellaneous exercises

9. APPLICATIONS OF THE DIFFERENTIAL CALCULUS
9.1 Partial differential equations
9.2 A first-order partial differential equation with constant coefficients
9.3 Exercises
9.4 The one-dimensional wave equation
9.5 Exercises
9.6 Derivatives of functions defined implicitly
9.8 Exercises
9.9 Maxima, minima, and saddle points
9.10 Second-order Taylor formula for scalar fields
9.11 The nature of a stationary point determined by the eigenvalues of the Hessian matrix
9.12 Second-derivative test for extrema of functions of two variables
9.13 Exercises
9.14 Extrema with constraints. Lagrange's multipliers
9.15 Exercises
9.16 The extreme-value theorem for continuous scalar fields
9.17 The small-span theorem for continuous scalar fields (uniform continuity)

10. LINE INTEGRALS
10.1 Introduction
10.2 Paths and line integrals
10.3 Other notations for line integrals
10.4 Basic properties of line integrals
10.5 Exercises
10.6 The concept of work as a line integral
10.7 Line integrals with respect to arc length
10.8 Further applications of line integrals
10.9 Exercises
10.10 Open connected sets. Independence of the path
10.11 The second fundamental theorem of calculus for line integrals
10.12 Applications to mechanics
10.13 Exercises
10.14 The first fundamental theorem of calculus for line integrals
10.15 Necessary and sufficient conditions for a vector field to be a gradient
10.16 Necessary conditions for a vector field to be a gradient
10.17 Special methods for constructing potential functions
10.18 Exercises
10.19 Applications to exact differential equations of first order
10.20 Exercises
10.21 Potential functions on convex sets

11. MULTIPLE INTEGRALS
11.1 Introduction
11.2 Partitions of rectangles. Step functions
11.3 The double integral of a step function
11.4 The definition of the double integral of a function defined and bounded on a rectangle
11.5 Upper and lower double integrals
11.6 Evaluation of a double integral by repeated one-dimensional integration
11.7 Geometric interpretation of the double integral as a volume
11.8 Worked examples
11.9 Exercises
11.10 Integrability of continuous functions
11.11 Integrability of bounded functions with discontinuities
11.12 Double integrals extended over more general regions
11.13 Applications to area and volume
11.14 Worked examples
11.15 Exercises
11.16 Further applications of double integrals
11.17 Two theorems of Pappus
11.18 Exercises
11.19 Green's theorem in the plane
11.20 Some applications of Green's theorem
11.21 A necessary and sufficient condition for a two-dimensional vector field to be a gradient
11.22 Exercises
11.23 Green's theorem for multiply connected regions
11.24 The winding number
11.25 Exercises
11.26 Change of variables in a double integral
11.27 Special cases of the transformation formula
11.28 Exercises
11.29 Proof of the transformation formula in a special case
11.30 Proof of the transformation formula in the general case
11.31 Extensions to higher dimensions
11.32 Change of variables in an n-fold integral
11.33 Worked examples
11.34 Exercises

12. SURFACE INTEGRALS
12.1 Parametric representation of a surface
12.2 The fundamental vector product
12.3 The fundamental vector product as a normal to the surface
12.4 Exercises
12.5 Area of a parametric surface
12.6 Exercises
12.7 Surface integrals
12.8 Change of parametric representation
12.9 Other notations for surface integrals
12.10 Exercises
12.11 The theorem of Stokes
12.12 The curl and divergence of a vector field
12.13 Exercises
12.14 Further properties of the curl and divergence
12.15 Exercises
12.16 Reconstruction of a vector field from its curl
12.17 Exercises
12.18 Extensions of Stokes' theorem
12.19 The divergence theorem (Gauss' theorem)
12.20 Applications of the divergence theorem
12.21 Exercises

PART 3. SPECIAL TOPICS

13. SET FUNCTIONS AND ELEMENTARY PROBABILITY
13.1 Historical introduction
13.2 Finitely additive set functions
13.3 Finitely additive measures
13.4 Exercises
13.5 The definition of probability for finite sample spaces
13.6 Special terminology peculiar to probability theory
13.7 Exercises
13.8 Worked examples
13.9 Exercises
13.10 Some basic principles of combinatorial analysis
13.11 Exercises
13.12 Conditional probability
13.13 Independence
13.14 Exercises
13.15 Compound experiments
13.16 Bernoulli trials
13.17 The most probable number of successes in Bernoulli trials
13.18 Exercises
13.19 Countable and uncountable sets
13.20 Exercises
13.21 The definition of probability for countably infinite sample spaces
13.22 Exercises
13.23 Miscellaneous exercises on probability

14. CALCULUS OF PROBABILITIES
14.1 The definition of probability for uncountable sample spaces
14.2 Countability of the set of points with positive probability
14.3 Random variables
14.4 Exercises
14.5 Distribution functions
14.6 Discontinuities of distribution functions
14.7 Discrete distributions. Probability mass functions
14.8 Exercises
14.9 Continuous distributions. Density functions
14.10 Uniform distribution over an interval
14.11 Cauchy's distribution
14.12 Exercises
14.13 Exponential distributions
14.14 Normal distributions
14.15 Remarks on more general distributions
14.16 Exercises
14.17 Distributions of functions of random variables
14.18 Exercises
14.19 Distributions of two-dimensional random variables
14.20 Two-dimensional discrete distributions
14.21 Two-dimensional continuous distributions. Density functions
14.22 Exercises
14.23 Distributions of functions of two random variables
14.24 Exercises
14.25 Expectation and variance
14.26 Expectation of a function of a random variable
14.27 Exercises
14.28 Chebyshev's inequality
14.29 Laws of large numbers
14.30 The central limit theorem of the calculus of probabilities
14.31 Exercises
Suggested References

15. INTRODUCTION TO NUMERICAL ANALYSIS
15.1 Historical introduction
15.2 Approximations by polynomials
15.3 Polynomial approximation and normed linear spaces
15.4 Fundamental problems in polynomial approximation
15.5 Exercises
15.6 Interpolating polynomials
15.7 Equally spaced interpolation points
15.8 Error analysis in polynomial interpolation
15.9 Exercises
15.10 Newton's interpolation formula
15.11 Equally spaced interpolation points. The forward difference operator
15.12 Factorial polynomials
15.13 Exercises
15.14 A minimum problem relative to the max norm
15.15 Chebyshev polynomials
15.16 A minimal property of Chebyshev polynomials
15.17 Application to the error formula for interpolation
15.18 Exercises
15.19 Approximate integration. The trapezoidal rule
15.20 Simpson's rule
15.21 Exercises
15.22 The Euler summation formula
15.23 Exercises
Suggested References

Answers to exercises
Index

Calculus

PART 1
LINEAR ANALYSIS

1
LINEAR SPACES

1.1 Introduction

Throughout mathematics we encounter many examples of mathematical objects that can be added to each other and multiplied by real numbers. First of all, the real numbers themselves are such objects.
Other examples are real-valued functions, the complex numbers, infinite series, vectors in n-space, and vector-valued functions, In this chapter we discuss a general mathematical concept, called a linear space, which includes all these examples and many others as special cases, Briefly, a linear space is a sct of elements of any kind on which certain operations (called addition and multiplication by numbers) can be performed. In defining a linear space, we do not specify the nature of the elements nor do we tell how the operations are t be performed on them, Instead, we require that the operations have certain properties which we take as axioms for a linear space. We tum now 1 a detailed description of these axioms 1.2 The definition of a linear space Lat V denote a nonempty set of objects, called elements, The set Vis called a linear space if it satisfies the following ten axioms which we list in three groups Closure axioms axrom 1. cxosunt unpex avpztzox. For every pair of elements x and y in V there corresponds a unique element in V called the sum of x and y, denoted by x + y. Rxr0M 2. crosomz omen wourrouzearron ay ata womans. For every x in V and every real number a there corresponds an element in V called the product of a and x, denoted Axioms for addition axrom 3, commurarive taw. For all x and y in V, we have x + y yer axrom 4. assocrativeraw. For all x,y, andzin V,wehave(x + y) +z =x+(y+z). a 4 Linear spaces acto 5. EXISTENCEOFZEROELEMENT. There is an element in V, denoted by Q such that x+0=x forallxinV axrom 6, EXISTENCEOFWEGATIVES. For every x in V, the element (— 1x has the property x+(-Dx Axioms for multiplication by numbers axrom 7, assoctativa taw. pr every x in V and all real numbers a and b, we have a(bx) = (ab)x. axrow 8, oreremmunive taw ron soorron mV, For all x andy in Vand all real a, we hare a(x + y) = ax + ay. AXIOM 9, DISTRIBUTIVE LAW FOR ADDITION oF NomBERS. For all x in V and all real a and b, we have (a + dix = ax + bx. axzon 10, extsrevce or rommmnr. For every xin V,we have Ix = Linear spaces, as defined above, are sometimes called real linear spaces to emphasize the fact that we are multiplying the elements of V by real numbers, If real number is replaced by complex number in Axioms 2, 7, 8, and 9, the resulting structure is called a complex linear space. Sometimes a linear space is referred to as a linear vector space ot simply a vector space; the numbers used as multipliers are also called scalars, A real linear space has real numbers as scalars; a complex linear space has complex numbers as. scalars. Although we shall deal primarily with examples of real linear spaces, all the theorems are valid for complex Ii spaces as well. When we use the term linear space without further designation, it is 1 be understood that the space can be real or complex. 1.3 Examples of linear spaces Hf we specify the sct Vo and ich how w add iis chimenis and how w mubiply them by numbers, we get a concrete example of a linear space. The reader can easily verify that each of the following examples satisfies all the axioms for a real linear space. sowie 1, Let V = R, the set of all real numbers, and let x + y and ax be ordinary addition and multiplication of real numbers. puwia 2. Let V = C, the set of all complex numbers, define x + y to be ordinary addition of complex numbers, and define ax to be multiplication of the complex number x Examples of linear spaces 5 by the real number a. 
Even though the elements of V are complex numbers, this is a real linear space because the scalars are real mows 3. Let V = V,, the vector space of all n-mples of real numbers, with addition and multiplication by scalars defined in the usual way in terms of components. waver 4. Let V be the set of all vectors in V,, orthogonal to a given nonzero vector N. If n = 2, this linear space is a line through 0 with N as a normal vector, If a = 3, it is a plane throngh O with Nas normal vector. ‘The following examples are called function spaces. The clements of V are real-valued functions, with addition of wo functions f and g defined in the usual way: F + eile) =f) + 80x) for every real x in the interseotion of the domains off and g. Multiplication of a funetion J by a real scalar a is defined as follows: af is that function whose value at each x in the domain off is af (x). The zero clement is the function whose values are everywhere zero, The reader can easily verily that each of the following sels is a function space suveie 5, The set of all functions defined on a given interval eaves 6. The set of all polynomials. sxuoiz 7. The set of all polynomials of degree < m, where m is fixed. (Whenever we consider this set it is understood that the zero polynomial is also included) The set of all polynomials of degree equal to ” is not a linear space because the closure axioms are not satisfied. For example, the sum of two polynomials of degree m need not have degree 1, avers 8. The set of all functions continuous on a given interval. If the interval is [a,b], we denote this space by Cia, b). nuweiz 9, The set of all functions differentiable at a given point. moverz 1), ‘Ihe set of all functions integrable on a given interval, watz 11, The set of all functions f defined at 1 with £1) = 0, The number 0 is essential in this example, If we replace 0 by a nonzero number ¢, we violate the closure axioms. sxaueiz 12, The set of all solutions of a homogeneous linear differential equation y” + ay’ + by = 0, where a and b are given constants. Here again 0 is essential. The set of solutions of a nonhomogeneous differential equation does not satisfy the closure axioms. These examples and many others illustrate how the linear space concept permeates algebra, geometry, and analysis. When a theorem is deduced from the axioms of a linear space, we obtain, in one stroke, a result valid for each concrete example, By unifying 6 Linear spaces diverse examples in this way we gain a deeper insight into each, Sometimes special knowl edge of one particular example helps to anticipate or interpret results valid for other examples aint reveais ictaiiuusiips which might viieiwive cocape unvice. 1.4 Elementary consequences of the axioms The following theorems are easily deduced from the axioms for a linear space. THEOREM 1.1, UNIQUENESS OF THE ZERO ELEMENT. In any linear space there is one and only one zero element. Proof. Axiom 5 tells us that there is at least one zero element, Suppose there were two, say 0, and 0,. Taking x = Q; and 0 = 0, in Axiom 5, we obtain O;+ O,= 01. Similarly, taking x = 02 and 0 = O,, we find 02 + 0, = 02. But O, + 02 = 02 + 0, because of the commutative law, so 0, = 02 THEOREM 1.2, UNIQUENESS OF NBGATIVE ELEMENTS In any linear space every element has exactly one negative. That is, for every x there is one and only one y such that x + y = 0. Proof. Axiom 6 tells us that each x has at least one negative. namely (- 10x, Suppose x has two negatives, say y, and y,. Then x + y, = 0 and x + y, = 0. 
Adding y2 to both members of the first equation and using Axioms 5, 4, and 3, we find that Jett H)=W+O and Jot + = (tne W=0+y = +0 Therefore yy = yz, so x has exactly one negative, the element - 1)x. Notation. The negative of x the sum y + (-x) denoted by —x. The difference y — x is defined to be The next theorem describes a number of properties wh manipulations in a linear space. ch govern elementary alget ‘THBOREM 1.3. Ina given linear space, let x and y denote arbitrary elements and let a and b denote arbitrary scalars. Then we'have the following properties: (a) Ox = 0. (b) a0 = 0. (ce) Ga)x = -(ax) = al-x) (d) Ifax=O,theneithera=Oorx=O (e) If ax = ay anda #0, thenx=y. (8) Max = bx and x # 0, then a = (2) -@+ Y= (x) + (Hy) = XY. (bt) x +x =2x,x +" +x = Sx, andingeneral, Sp, x = nx. Exercises 7 We shall prove (a), (b), and (6) and leave the proofs of the other properties as exercises. Proof of (a). Let z = Ox. We wish to prove that z = 0. Adding 7 to itself and using Axiom 9, we find that z42=0x 40x = 40x =0x=2 Now add —z to both members get z = 0. Proof oftb). Let z = a0, add z to itself, and use Axiom 8. Proof offe). Let ~ = Ca)x. Adding 10 ax and using Axiom 9, we find that zt ax = (-a)x + ax = (-a + a)x = 0x =0, so is the negative of ax, x = -(ax). Similarly, if we add a(—x) to ax and use Axiom 8 and property (b), we find that a(—x) = -(ax). 15 Exercises In Exercises 1 through 28, determine whether each of the given sets is a real linear space, if addition and moluplicauon by real scalars are defined in the usval way. Kor those that are not, tell which axioms fail t0 hold, The functions in Exercises 1 through 17 are real-valued, In Exer- cises 3, 4, and 5, each funetion has domain containing 0 and 1. In Exercises 7 through 12, each domain contains all real numbers. 1. All rational functions. 2. Ail rational functions fig, with the degree off < the degree of (including f= 0). 3. All fwith £0) =r01) 8. All even functions 4. Allfwith 2f(0) =14) 9, All odd. functions, 5. Allfwith /(1) = 1 4/10). 10. All bounded functions. 6. Alll step functions defined on [0, 1). 11, All increasing functions. 7. All fwith f(x) > 0.as x +00. 12. All funetions with period 2x. 13, 3. All f integrable on [0, 1] with [3 flx) dx = 0. 14, All F integrable on [0, 1] with {§ fx) dx = 0. 15. All _f satisfying f(x) = f(1 — x) for all x. 16. All Taylor polynomials of degree < x for a fixed m Gineluding the zero polynomial, 17. All solutions of a linear second-order homogeneous differential equation’ y” + P(x)y" + O(s)y — 0, where P and Q are given functions, continuous everywhere. 18, All bounded real sequences. 20. All convergent real series. 19, All convergent real sequences. 21, Alll absolutely convergent real series. 22, All vouiuns (a, y, 2) in Fy with 2 = 6, 23. All vectors (x, y, z) in Vy with x = 0 or y = 0. 2A. All vectors (x, y. z) in Vy with y = 5x. 25. All vectors (x, y, 2) in Vy with 3x 4y = 1, 2 = 0. 26, Al vectors (x, y, 2) in Vq which are scalar multiples of (1,2, 3). 27. All vectors (x, y, 2) in V, whose components satisfy a system of three linear equations of the form OyX + dyay + Anz yX + dV + Oz nX + yay + Oygz 8 Linear spaces 28, All vectors in V, that are linear combinations of two given vectors A and B. 29. Let V = R™, the set of positive real numbers. Define the “sum” of two elements x and y in ¥ w be dein prodiuct ay Gn die usu seine), aK define “nuitpiicaion™ oF an eicmen x in V by a scalar ¢ t0 be x*, Prove that V is a real linear space with 1 as the zero element, 30. 
(a) Prove that Axiom 10 can be deduced from the other axioms. (b) Prove that Axiom 10 cannot be deduced from the other axioms if Axiom 6 is replaced by Axiom 6°: For every x in V there is an element y in V such that x + y = 0, 3 1, Let S be the set of all ordered pairs (x,, x3) of real numbers, In each case determine whether or not $ is a linear space with the operations of addition and mutiplication by scalars defined as indicated. If the set is not a linear space, indicate which axioms are violated (@) (ys) + Ors Ve) = On + rte +2), a(x, X2) = (ax, 0), (b) Game) + Orde) - Gr + 1-0, (x,y) = (ary, axe). () Gi, 2) + isda) = Oa, 22 +d, (x1, 22) = (X,, ax,). (A) Ces) + Os Ped — Cites rela + yD. aGa. 22) Clan, axel) 32, Prove parts (d) through (h) of Theorem 1.3. 1.6 Subspaces of a linear space Given a linear space V, let $ be a nonempty subset of V. If § is also a linear space, with the same operations of addition and muldplication by scalars, then Sis called a subspace of V, The next theorem gives a simple criterion for determining whether or not a subset of a linear space is a subspace. razonan 1.4, Let S be a nonempty subset of a linear space V. Then $ is a subspace if and only if 8 satiafies the closure axioms. Proof. If S is a subspace, it satisfies all the axioms for a linear space, and hence, in particular, it es the closure axioms. Now we show that if S satisfies the closure axioms it satisfies the others as well. The commutative and associative laws for addition (Axioms 3 and 4) and the axioms for multiplication by scalars (Axioms 7 through 10) are automatically satisfied in S because they hold for all elements of V. It remains to verify Axioms 5 and 6, the existence of a zero element in S, and the existence of a negative for each element in S. Let x be any clement of $. (S has at least one element since $ is not empty.) By Axiom 2, ax is in S for every scalar a. Taking a = 0, it follows that Ox is in S. But Ox = O, by ‘Theorem 1.3(a), so U € S, and Axiom 5 1s satisfied. ‘laking a = — 1, we see that (-l)x is in S. But x + (~ 1)x = 0 since both x and (- 1x are in V, so Axiom 6 is satisfied in S. Therefore $ is a subspace of V. surtxon. Let Sbe a nonempty subset of a linear space V. An element xin V of the form where X1,..., x,areall in Sand e,..., ¢% ar scalars, is called a finite linear combination Of elements of S The set of all finite linear combinations of elements of S satisfies the closure axioms and hence is a subspace of V. We call this the subspace spanned by S, or the linear span of S and denote it by LS. If Sis empty, we define L(S) to be {0}, the set consisting Of the zero clement alone. Dependent and independent sets in a linear space 9 Different sets may span the same subspace. For example, the space V2 is spanned by each of the following sets of vectors: {i,j}, {i,j, i +4}, (0, i, i,j, —j, i+ j). The space of all polynomials p(t) of degree + co in Equation (1.3), exch term wilh kM tends w aro and we find that ey = 0. Deleting the Mai term fiom (1.2) and applying the induction hypothesis, we find that each of the remaining n — 1 coeflicients ¢, is zero. varorem 1.5. Let S = {x,,...,x,} be an independent set consisting of k elements in a linear space V and let L{S) be the subspace spanned by 8. Then every set of k + | elements in L{S) is dependent. Proof. ‘The proof is by induction on k, the number of elements in S. 
First suppose k = 1, Then, by hypothesis, S§ consists of one element x,, where x; # 0 since S is iniepentient, Now wike any owo distinct ciemems yy und ye in 265), Then each is a scalar Dependent and independent sets in a linear space u multiple of 2, say Jr = ¢1%, and Ye= ¢qXy, where ¢, and ¢y are not both 0, Multiplying Jr by Cy and yx by ¢ and subtracting, we find that Yi — Ge = 0 This is a nontrivial representation of 0 soy, and yz are dependent, This proves the theorem when k = 1 Now we assume the theorem is tre for k — 1 and prove that it is also tme for k, Take any set of k + 1 clements in L(S), say T= (1, Yer... Yes}. We wish to prove that Tis dependent. Since each y, is in L(S) we may write ey Yay, 2 foreachi= 1,2,...,k +1. We examine all the scalars aj, that multiply x; and split the proof ino ovo cases avcorling to whether all these scalars are O oF not CASE 1. ay = 0 for everyi = 1,2,..., k +1. In this case the sum in (1.4) does not involve x1, so each y, in T is in the linear span of the set S’ = {xp,..., Xx}. But Sis independent and consists of k — 1 elements, By the induction hypothesis, the theorem is truc for k — 1 so the set T is dependent. This proves the theorem in Case 1 CASE 2. Not all the scalars az, are zero. Let us assume that a,, # 0. (If necessary, we can renumber the y's to achieve this.) Taking i = 1 in Equation (1.4) and multiplying both members by c,, where ¢; = giyfan, we get ‘ C= aki + LC X; - 72 From this we subtract Equation (1.4) 10 get Cd == Blas = aX fori = 2,...,k +1. This equation expresses each of the k elements ¢,J, — Jy as a linear combination of the — 1 independent elements 2, + Xe . By the induction hypothesis, the k elements ¢y,— y; must be dependent, Hence, for some choice of scalars fz, : fei» not all zero, we have ky 2 He = : from which we find & & But this is a nontrivial linear combination of yy, . Ysa Which represents the zero ele- ment, so the elements J, » Jeex Must be dependent, This completes the proof. 1 Linear spaces 1.8 Bases and dimension pernmmron. A jinite set S of elements in a linear space V is called a finite basis for V if S is independent and spans V. The space V is called finite-dimensional if it has a jinite ranoom = 1.6. Let V be a_finite-dimensional linear space. Then every finite hasis for V has the same number of elements. Proof. Let $ and T be two finite bases for V. Suppose S consists of k elements and T consists of m elements, Since $ is independent and spans V, Theorem 1.5 tells us that every set of k +1 clements in Vis dependent, Therefore, every set of more thank elements in V is dependent, Since T is an independent set, we must have m < k The same argu- ment with $ and 7 interchanged shows that k < m . Therefore k =m. vernuzron. Ifa linear space V has a basis of n elements, the integer n is called the dimension of V. We write n=dim V. If V = {0}, we say V has dimension 0. wove 1. The space V, has dimension n, One basis is the set of # unit coordinate vectors, puweiz 2, The space of all polynomials p(t) of degre 0 Ff x#0 (positivity). Inner products, Euclidean spaces. Norms 15 A weal linear space with an inner product is called a real Buclidean space. Note: Taking c = 0 in (3), we find that (O, y) = 0 for all y. In a complex linear space, an inner product (x, y) is a complex number satisfying the same axioms as those for a real inner product, except that the symmetry axiom is replaced by the relation (ry (x, 9) = (9, x), (Hermitiant symmetry) where (y, x) denotes the complex conjugate of (y, x). 
In the homogeneity axiom, the scalar mulupher ¢ can be any complex number, From the homogeneity axiom and (I’), we get the companion relation By ( &) = y= ay, 9) = el, yD. A complex linear space with an inner product is called a complex Euclidean ‘space. (Sometimes the term unitary space is also used.) One example is complex vector space V,(C) discussed briefly in Section 12.16 of Volume 1. ‘Although we are interested primarily in examples of real Euclidean spaces, the theorems of this chapter are valid for complex Euclidean spaces as well. When we use the term Euclidean space without further designation, it is to be understood that the space can be real or complex. product. mevere 1. In Vy let (5 y) = x the usual dot product of x and y. wore 2.16 x = (xy, x3) and y = (, y) are any two vectors in Vz, define (x, y) by the formula (x. Y= Deen Xian xePet aye. This example shows that there may be more than one inner product in a given linear space. novere 3. Let C(a, 6) denote the linear space of all real-valued functions continuous on an interval fa, b]. Define an inner product of two funetions f and g by the fornmula ¢ fog) This formula is analogous to Equation (1.6) which defines the dot product of two vectors in V,. The function values f(t) and g(t) play the role of the components x; and y, , and integration takes the place of summation. 4 In honor of Charles Hermite (1822-1901), a French mathematician who made many contributions to algebra and analysis. 16 Linear spaces eavere 4. In the space C(a, b), define (hO=), wOFMs(o at, where w is a fixed positive fimetion in C(a, b). The function w is called a weigitfunction, In Example 3 we have w(t) = 1 for all t, waexz 5, In the linear space of all real polynomials, define a= eroeto ar. Because of the exponential factor, this improper integral converges for every choice of polynomials and g. swronm 1.8. In a Euclidean space V, every inner product satisfies the Cauchy-Schwarz inequality: 1, 99/7 $0, 9), y) for all x andy in V. Moreover, the equality sign holds if and only if x and y are dependent. Proof. If either x = 0 or y = 0 the result holds trivially, so we can assume that both xX and y are nonzero, Let x = ax + by, where a and b are scalars wo be specified later. We have the inequality ( z) > 0 for all a and b, When we express this inequality in terms of x and y with an appropriate choice of a and b we will obtain the Cauchy-Schwarz inequality. To express (2, 7) in terms of x and y we use properties (1’), (2) and (3°) to obtain ¢ =(ax+ by, @%+ by) = (aa, ax) + (ax, by) + (by, aa} + (by, by} = aa(x, x) + ab(x, y) + baly, x) + bb(y, 9) D0. Taking a = (y, y) and cancelling the positive factor (y, y) in the inequality we obtain O DO + OY) + bY, 4 bb20. Now we take 6 = —(x. ¥). Then 6 = — (y. x) and the last inequality simplifies to Os I x) 2 YO. ¥) = 1G, YI? This proves the Cauchy-Schwarz inequality. The equality sign holds throughout the proof if and only if z = 0. This holds, in tum, if and only if x and y are dependent. muverz. Applying Theorem 1.8 10 the space C(a, b) with the inner product (f, g) = PFMDg(2) dt. we find that the Cauchy-Schwarz inequality becomes (Pree a) < (P70 at)( Pew Inner products, Euclidean spaces. Norms "7 ‘The inner product can be used to introduce the metric concept of length in any Duclidean space, vernutzon. In a Euclidean space V, the nonnegative number \\x \ defined by the equation lx! = (x, x)* is cailed the norm of the element x. 
When the Cauchy-Schwarz inequality is expressed in terms of norms, it becomes 1G YS Ii My Since it_may be possible to define an inner product in many different ways. the norm of an element will depend on the choice of inner product, This lack of uniqueness is to be expected, It is analogous t0 the fact that we can assign different numbers to measure the length of a given line segment, depending on de choice of scale or unit of measurement, The next theorem gives fundamental properties of norms that do not depend on the choice of inner product, mazorm 1.9. In a Euclidean space. every norm has the following properties for all elements x and y and all scalars c. @) Ix}=0 if x=0. (D) jx] > 0 ff xO — (positivity). (©) Ilex] = lel lxt] (homogeneity). (@) ix + yl 0. Proof. vroperues (a), (0) and (C) follow at once trom tne axioms for an anner product To prove (d), we note that Ix + yl = HY FY) = x) + OF) + + O19) + IP + @ y+ & y)- ‘The sum (x, y) + (e, y) is mal, The Canchy-Schwarz inequality shows that [Cv (xl lplland 1(x, »)1 < lkel yh, so we have WS IIx + yll® < bl? + Dl? + 2ixll Lvl = (lal + Jy)? This proves (@). When y = ex , where ¢ > 0, we have la + y= lle + xl) = (0+ ©) fal = [x] + devil = lel + by. 18 Linear spaces sermon. In @ real Euclidean space V, the angle between two nonzero elements x and y is defined to be that number 0 in the interval 0 << 1 which satisfies the equation (7 ose= 2, ist yl Note: The Cauchy-Schwarz inequality shows that the quotient on the right of (1.7) lies in the interval [ — 1, 1], so there is exactly one 6 in (0, 1] whose cosine is equal to this quotient 1.12 Orthogonality in a Euclidean space version. In a Euclidean space V, two elements x and y are called orthogonal if their inner product is zero. A subset § of Vis called an orthogonal set if (x, y) = 0 for every pair of distinet elements x and y in S. An orthogonal set is called orthonormal if each of ius elements has norm 1 ‘The zero element is orthogonal to every element of V; it is the only element orthogonal to itself. The next theorem shows a relation between orthogonality and independence. mutonn 1.10. In a Buclidean space V, every orthogonal set of nonzero elements is independent. In particular, in a jinite-dimensional Huelidean space with dim V = n, every orthogonal set consisting of n nonzero elements is a basis for V. Proof. Let $ be an orthogonal set of nonzero elements in V, and suppose some finite linear combination of elements of Sis zero, say 2 Zor =O, A where each x, € S. Taking the inner product of each member with 2, and using the fact that (Xi, x) = 0 if t #1, we find thar ¢,(x,, 4) = 0. But (x1, 1) 0 since x ¥ 0 0 ¢, = 0. Repeating the argument with x, replaced by x), we find that each ¢ = 0, This proves that § is independent, If dim V = n and if S consists of n elements, Theorem 1.7(b) shows that S is a basis for V. maveus. In the real linear space C(0, 2m) with the inner produet (f, g) = §3*/(x)g(x) dx. let S be the set of trigonometric functions {uj, 1, #g,.. .} given by UX) = 1, Way al) = cos me, tyalX) = sin mx, for n= 1,2... If m # n, we have the orthogonality relations par | UCU, (X) de = 0, Orthogonality in a Euclidean space 19 so $ is an orthogonal set. Since no member of S is the zero element, $ is independent, The nom of each clement of 5 is easily calculated, We have (up, uo) = [jf dt = 2m and, for n> 1, we have [Most nx dex (sys eggs) =|, sin’ nx de = Therefore, |jug] = V2m and |u,|= V7 for n > 1 . 
Dividing each u, by its norm, we obtain an orthonormal set {79 Gis Gar- +} Where Gy = ty/liml| Thus, we have cos nx sin nx Gan alX) + Pan(X) ~ for a>1 In Section 1.14 we shall prove that every finite-dimensional Euclidean space has an ‘orthogonal basis, The next theorem shows how t compute the components of an element relative to such a basis, and muon I ,| 1. Let V he a _finite-dimensional Euclidean space with dimension n, 2 linear combination of the basis elements, say 1.8) x=) ce, ‘hen its components relative to the ordered basis (e,,..., e,) are given by the formulas 19) ee for jl ee (ee) n particular, if S is an orthonormal basis, each c; is given by 1.10) =e). Proof. Taking the inner product of each member of (1.8) with ¢;, we obtain (ede 3 aleie) = e(e,,0) since (e,, ¢;)=0 if ij. This implies (1.9), and when (e,, ¢,) = 1, we obtain (1.10) If {e;,....¢,) is an orthonormal basis, Equation (1.9) can be written in the form Lay x= Do, ene fe ‘The next theorem shows that in a finite-dimensional Buclidean space with an orthonormal vasis the inner product of two elements can be computed in terms of their components. 20 Linear spaces ruzonex 1.12. Let V be a finite-dimensional Euclidean space of dimension n, and assume fhat {e,,...,e) is an orthonormal basis for V. Then for every puir of elements x and y in V, we have aay (9) =SGe)0e) — Parsenal’s formula). In particular, when x = y , we have (1.13) Wx? = Six ea. a Proof, Taking the inner product of both members of Equation (1.11) withy and using the linearity property of the inner product, we obtain (1,12). When x = y, Equation (1.12) reduces to (1.13). Note: Equation (1.12) is named in honor of M.A. Parseval (circa 1776-1836), who obtained this type of formula in a special function space. Equation (1.13) is a generalization of the thennem of Pythagoras 1.13 Exercises 1 Letx = (x... .,%) andy =(),,..., yn) be arbitrary vectors in Vy. In each case, determine whether (x. y) is an inner product for V,, if (x, y) is defined hy the form (, y) is not an inner product, tell which axioms are not satistied 2 eo W @ & Y= > Hil My =( Sui)". 4 a &) &y) = Son. © w= So 1 vot ~ Sat Sy. © @ypadudy- a re Suppose we retain the first three axioms for a real inner product (symmetry, linearity, and homogeneity but replace the Jourth axiom by a new axiom (4°): (x, x) = 0 if and only if x =0. Prove that either (x, x) > 0 for all x 4 0 or else (x, x) < 0 for all x # 0. [Hint: Assume (x, x) > 0 for some x # 0 and (y y) <0 for some y ¥ 0. In the space spanned by {x, y), find an element z #0 with (2, ) = 0.) Prove: that each of the statements in Exercises 3 through 7 is valid for all elements x and y ina real Euclidean space. (x, y) = 0 if and only if |x + lx = yl &, > = 0 if and only if |x + yl* = jix!? + [lpl? . (x,y) = 0 if and only if |x + ey! > |x! for all real » (K een ote cas if |x! = lyl. . If x and y are nonzero elements making an angle @ with each other, then vii ix = yit = ixit +i = 2 xi lip cvs 6 Exercises a1 8. In the real linear space C(1, e), define an inner product by the equation (f.8)= fF dog. fCxets) de, (a) If f(x) = Vx, compute If (b) Find a linear polynomial g(x) = a + dx that is orthogonal to the constant funetion fx) = 1 9, In the real linear space C( — 1, 1), let (f, g) = Ug, tz, ty given hy 1 f(Og(®) de . Consider the three functions m0 , a) wy) = 14 8. Piove that wo of them are orthogonal, (wo make an angle {3 with each other, aud owo make an angle 2/6 with each other. 10. 
In the linear space P, of all real polynomials of degree < 1, define Ga => )+0), (@ Prove that (jg) is an inner product for Fy (b) Compute (f, g) when f(t) = t and g@ = at +h. (©) If f(o) = # find all linear polynomials g orthogonal to /: 11. In the linear space of all real polynomials, define (f, ¢) = [PF e'f(ng(®) dt (a) Prove that this improper integral converges absolutely Yor ali polynomials f and g. (b) If x,(0) = 0" for n = 0, 1,2...., prove that (x. Xm) = (m +)! (©) Compute (f,g) when fi) = (¢ +1) and g()= #41 (@ Find all linear polynomials g(¢) = a + bf orthogonal to f() = 1+ 1. 12. In the linear space of all real polynomials, determine whether or not (f, g) is an inner product if (f,g) is defined by the formula given, In case (f,g) is not an inner product, indicate which axioms are violated. In (c), f” and g ' denote derivatives. @ (fe) = fg). © a= [isos oa. OKs) =| [feos a | Fg) = ([/o ar)(fEsco a). 13. Let V consist of all infinite sequences (x,} of real numbers for which the series Ex2 converges. If x = {tq} and y= {yp} are two elements of Y, define (y= > Anne wt (a) Prove that this series converges absolutely, [Hint Use the Cauchy-Schwarz inequality © estimate the sum S24 |xyyal-d (b) Prove that V is a linear space with (x, y) as an inner product. (©) Compute (x, y) if x, = 1m and yy = 1](n+ 1) for 2 1 (@) Compute (x, 'y) if ty = 2" and yy, = Au! form = 1 \4, Let ¥ be the set’of all real functions f continuous on [0, + 2) and such that the integral JP otf") det converges. Define (f, g) = Jy e“P(Ng() dt 22 Linear spaces (a) Prove that the integral for (f, g) converges absolutely for each pair of functions f and g inv. [Hint Use the Cauchy-Schwarz inequality estimate the integral (2 # | (Qg(O] dt] (b) Prove that V is a linear space with (¢, g) as an inner product, (©) Compute (f,g) if f() = et and g(t) = ™, where n= 0, 1,2, . 15. In a complex Euclidean space, prove that the inner product has the following properties for all elements x, y and z, and all complex @ and b, (@) (ax, by) = abla, y). () (x, ay + bz) = aCe, y) + Ble, 2). 16. Prove that the following identities are valid in every Euclidean space. @) Ix +yP = ixi* + yl? + Gy) +O. 9). (b) ix + yi? = Ix — yl? = 20, y) + 2,0. (©) ix + ple + Ix = ple = 2 Ix + 2 17. Prove thai the space of all complex-valued functions continuous on an interval a, J becomes a unitary space if we define an inner product by the formula ha) ~ f mosorWar, where w is a fixed positive fimetion, continuous on fa, Bj, 1.14 Construction of orthogonal sets. The Gram-Selt process Every finite-dimensional linear space has a finite basis. If the space is Euclidean, we can always construct an orthogonal basis. This result will be deduced as a consequence of a ‘general theorem whose proof shows how { construct orthogonal sets in any Buclidean space, finite or infinite dimensional, The construction is called the Gram-Schmidt orthog- onalizationprocess, in honor of J. P. Gram (1850-1916) and E. Schmidt (18451921), aanonen 1,13, onruoaonauzsarszon runonen. 
Let x_1, x_2, ..., be a finite or infinite sequence of elements in a Euclidean space V, and let L(x_1, ..., x_k) denote the subspace spanned by the first k of these elements. Then there is a corresponding sequence of elements y_1, y_2, ..., in V which has the following properties for each integer k:
(a) The element y_k is orthogonal to every element in the subspace L(y_1, ..., y_{k-1}).
(b) The subspace spanned by y_1, ..., y_k is the same as that spanned by x_1, ..., x_k:
L(y_1, ..., y_k) = L(x_1, ..., x_k).
(c) The sequence y_1, y_2, ..., is unique, except for scalar factors. That is, if y_1', y_2', ..., is another sequence of elements in V satisfying properties (a) and (b) for all k, then for each k there is a scalar c_k such that y_k' = c_k y_k.

Proof. We construct the elements y_1, y_2, ..., by induction. To start the process, we take y_1 = x_1. Now assume we have constructed y_1, ..., y_r so that (a) and (b) are satisfied when k = r. Then we define y_{r+1} by the equation

(1.14)   y_{r+1} = x_{r+1} − Σ_{i=1}^{r} a_i y_i,

where the scalars a_1, ..., a_r are to be determined. For j ≤ r, the orthogonality of y_1, ..., y_r gives

(y_{r+1}, y_j) = (x_{r+1}, y_j) − a_j (y_j, y_j).

If y_j ≠ 0, we make y_{r+1} orthogonal to y_j by taking

(1.15)   a_j = (x_{r+1}, y_j)/(y_j, y_j);

if y_j = 0, then y_{r+1} is orthogonal to y_j for any choice of a_j, and we take a_j = 0. With the scalars chosen in this way, y_{r+1} is orthogonal to each of y_1, ..., y_r and hence to every element of L(y_1, ..., y_r), so (a) holds when k = r + 1. Equation (1.14) shows that y_{r+1} lies in L(x_1, ..., x_{r+1}) and that x_{r+1} lies in L(y_1, ..., y_{r+1}), so the subspaces L(y_1, ..., y_{r+1}) and L(x_1, ..., x_{r+1}) are equal, and (b) holds when k = r + 1.

To prove (c), suppose y_1', y_2', ..., also satisfies (a) and (b), and consider y_{r+1}'. Because of (b), this element is in L(y_1, ..., y_{r+1}), so we can write

y_{r+1}' = Σ_{i=1}^{r+1} c_i y_i = z_r + c_{r+1} y_{r+1},

where z_r ∈ L(y_1, ..., y_r). We wish to prove that z_r = 0. By property (a), both y_{r+1}' and c_{r+1} y_{r+1} are orthogonal to z_r. Therefore, their difference, z_r, is orthogonal to z_r. In other words, z_r is orthogonal to itself, so z_r = 0. This completes the proof of the orthogonalization theorem.

In the foregoing construction, suppose we have y_{r+1} = 0 for some r. Then (1.14) shows that x_{r+1} is a linear combination of y_1, ..., y_r, and hence of x_1, ..., x_r, so the elements x_1, ..., x_{r+1} are dependent. In other words, if the first k elements x_1, ..., x_k are independent, then the corresponding elements y_1, ..., y_k are nonzero. In this case the coefficients a_i in (1.14) are given by (1.15), and the formulas defining y_1, ..., y_k become

(1.16)   y_1 = x_1,   y_{r+1} = x_{r+1} − Σ_{i=1}^{r} [(x_{r+1}, y_i)/(y_i, y_i)] y_i   for r = 1, 2, ..., k − 1.

These formulas describe the Gram-Schmidt process for constructing an orthogonal set of nonzero elements y_1, ..., y_k which spans the same subspace as a given independent set x_1, ..., x_k. In particular, if x_1, ..., x_k is a basis for a finite-dimensional Euclidean space, then y_1, ..., y_k is an orthogonal basis for the same space. We can also convert this to an orthonormal basis by normalizing each element y_i, that is, by dividing it by its norm. Therefore, as a corollary of Theorem 1.13 we have the following.

THEOREM 1.14. Every finite-dimensional Euclidean space has an orthonormal basis.

If x and y are elements in a Euclidean space, with y ≠ 0, the element

[(x, y)/(y, y)] y

is called the projection of x along y. In the Gram-Schmidt process (1.16), we construct the element y_{r+1} by subtracting from x_{r+1} the projection of x_{r+1} along each of the earlier elements y_1, ..., y_r. Figure 1.1 illustrates the construction geometrically in the vector space V_3.

[Figure 1.1: The Gram-Schmidt process in V_3. An orthogonal set {y_1, y_2, y_3} is constructed from a given independent set {x_1, x_2, x_3}.]

EXAMPLE 1. In V_4, find an orthonormal basis for the subspace spanned by the three vectors x_1 = (1, −1, 1, −1), x_2 = (5, 1, 1, 1), and x_3 = (−3, −3, 1, −3).

Solution. Applying the Gram-Schmidt process, we find

y_1 = x_1 = (1, −1, 1, −1),
y_2 = x_2 − [(x_2, y_1)/(y_1, y_1)] y_1 = x_2 − y_1 = (4, 2, 0, 2),
y_3 = x_3 − [(x_3, y_1)/(y_1, y_1)] y_1 − [(x_3, y_2)/(y_2, y_2)] y_2 = x_3 − y_1 + y_2 = (0, 0, 0, 0).

Since y_3 = 0, the three vectors x_1, x_2, x_3 must be dependent. But since y_1 and y_2 are nonzero, the vectors x_1 and x_2 are independent. Therefore L(x_1, x_2, x_3) is a subspace of dimension 2. The set {y_1, y_2} is an orthogonal basis for this subspace. Dividing each of y_1 and y_2 by its norm we get an orthonormal basis consisting of the two vectors

y_1/||y_1|| = (1/2)(1, −1, 1, −1)   and   y_2/||y_2|| = (1/√6)(2, 1, 0, 1).
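The arithmetic in Example 1 can also be checked by machine. The following short Python sketch is an editorial illustration, not part of the original text; it applies formulas (1.16) directly to vectors in V_n with the ordinary dot product, and the function name gram_schmidt and the variable names are our own.

```python
# A minimal sketch of the Gram-Schmidt formulas (1.16) for vectors in V_n with
# the ordinary dot product.  Illustrative only; not from the original text.
def gram_schmidt(xs):
    """Return the orthogonal elements y_1, ..., y_k built from x_1, ..., x_k."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    ys = []
    for x in xs:
        y = list(x)
        for v in ys:
            if dot(v, v) > 0:               # take a_j = 0 when y_j = 0, as in the proof
                c = dot(x, v) / dot(v, v)   # a_j = (x_{r+1}, y_j)/(y_j, y_j)
                y = [yi - c * vi for yi, vi in zip(y, v)]
        ys.append(y)
    return ys

x1, x2, x3 = (1, -1, 1, -1), (5, 1, 1, 1), (-3, -3, 1, -3)
y1, y2, y3 = gram_schmidt([x1, x2, x3])
print(y1, y2, y3)   # [1, -1, 1, -1] [4.0, 2.0, 0.0, 2.0] [0.0, 0.0, 0.0, 0.0]
```

Since y_3 comes out to be the zero vector, the sketch confirms numerically that x_1, x_2, x_3 are dependent, as found above.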
Dividing each of ‘Ya and yz by its norm we get an orthonormal basis consisting of the two vectors dy Ya 1 A aKi—11-1) and 2 (2,1,0, 1). Uyak lel V6 ; sae 2, The Legendre polynomials. In the linear space of all polynomials, with the inner product (x, y) = Jt, x(f)y(t) dt, consider the infinite sequence xy. %1.%2,..., where x,(0) =. When the orthogonalization theorem is applied to this sequence it yields another sequence of polynomials Yo, Yis Yar...» first encountered by the French mathe- matician A, M, Legendre (1752-1833) in his work on potential theory. The first few polynomials are easily calculated by the Gram-Schmidt process. First of all, we have Jolt) = xo(f) = 1. Since 1 Oo w=[dr=2 and Os) =f rdr=o, we find that X1 > Yo) nil) = x(t) ~ 22 yA) 0 wi)» (o> sa) Next, we use the relations ' | (2, yo) = [itas Gom= [ea=0, Oow= Pi edes, to obtain (2, Yo) ay) ya(t) = xg(t) — 2220) yy) — Gerd) “ Oo ve of On Similarly, we find that dlt= Pa Bt, y(th= th Fte Pelt) = 18 AP + Se, 6 Linear spaces We shall encounter these polynomials again in Chapter 6 in our further study of differential equations, and we shall prove that yf) = & ge Gini ae aie ‘The polynomials P, given by Lan)! 2M(n!) a e Pian 7 P.O = sa Il) = 5 are known as the Legendrepolynomials. The polynomials in the corresponding orthonormal sequence Fp, Fis Fess +s given by ¢, = Yn/|lynl| are called the normalized Legendre poly- nomials. From the formulas for Jy » Ye given above, we find that a= Vt. o=Vit, ols WIGF-M al)= WEG -3y pat) = AVE B5t4 — 3008 + 3), galt) = AVE (6318 — 701° + 151) 1.15. Orthogonal complements. Projections Let V be a Buclidean space and let S be a finite-dimensional subspace. We wish to consider the following type of approximation problem: Given an element x in V, to deter- mine an element in S whose distance from x is as small as possible. The distance between two elements x and y is defined to be the norm x — y\l Before discussing this problem in its general form, we consider a special case, illustrated in Figure 1,2. Here V is the vector space Vz and $ is a two-dimensional subspace, a plane through the origin, Given x in V, the problem is to find, in the plane S, that point 5 nearest 10 x. If x € S, then clearly = x is the solution If x is not in S, then the nearest point s is obtained by dropping a perpendicular from x to the plane. This simple example suggests an approach to the general approximation problem and motivates the discussion that follows. verintmioy. Let S be a subset of a Buclidean space V. An element in V is said to be orthogonal to S if it is orthogonal to every element of S The set of all elements orthogonal to S is denoted by S+ and is called “S perpendicular.” it iy u sinmpie exercise w verily dna S! iy u subspace of V, winter or nut S its In case $ is a subspace, then Sis called the orthogonal complement of S. is on, EXAMPLE, If S is a plane through the origin, as shown in Figure 1.2, then Sis a line through the origin perpendicular to this plane, This example also gives a geometric inter- pretation for the next theorem. Orthogonal complements. Projections 27 xasast FIGURE 1.2 Goomeuic interpretation of the orthogonal devonyrsition theorem int Vg. ramonex 1,15, onrnecouar pecourosrrzou zuzonen. Let V be a Euclidean space and let S be a finite-dimensional subspace of V. Then every element x in V can be represented uniquely as a sum of two elements, one in § and one in $4, That is, we have (7) x=stst, wheres€S and s+eSt, Moreover, the norm of x is given by the Pythagorean formula (1.18) xl? 
= isl? + Isl? Proof. First. we prove that an orthogonal decomposition (1.17) actually exists. Since S is finite-dimensional, it has a finite orthonormal basis, say {e,,... , ¢,}. Given x, define the elements s and s+ as follows: (1,19) S= > G, ee, 5! A Note that each term (x, ¢,)e, is the projection of x along e, . The clement $ is the sum of the projections of x along each basis element, Since § is a linear combination of the basis elements, § lies in §. The definition of s+ shows that Equation (1.17) holds, To prove that s+ lies in S+, we consider the inner product of s+ and any basis element e; . We have (st, e) (x = 6,6) = (x, e) = (5, &). But from (1.19), we find that (5, ¢;) = (% ©), so s+ is orthogonal to e;, Therefore 5+ is orthogonal to every element in S, which means that s+ © S Next we prove that the orthogonal decomposition (1.17) is unique. Suppose that x has two such representations, say (1.20) xests! and oxattel, 28 Linear spaces where § and rare in S, and s+ and ¢! are in S+, We wish to prove that s= rand st = t+ From (1.20), we have s — t = ¢1 = s+, so we need only prove that s —¢ = 0. But s = 16 Sand t) — s! € S4 s0 § —£ is both orthogonal to f+ — s+ and equal to 4 — st Since the zero element is the only clement orthogonal to itself, we must have 5 — ‘This shows that the decomposition is unique. Finally, we prove that the norm of x is given by the Pythagorean formula, We have xP =O, x)= 645,545) =(5,9) +64, 84), the remaining terms heing zeny since g and gt are orthogonal. This proves (1.18), DEANITION. Let S be a finite-dimensional subspace of a Euclidean space V, and let fers... ¢ be an orthonormal basis for S. LE x € V, the element $ defined by the equation s= D& ee, ft is called the projection of x on the subspace S. We prove next that the projection of x on Sis the solution to the approximation problem stated at the beginning of this section. 1.16 Best approximation of elements in a Euclidean space by elements in a finite- dimensional subspace ‘THEOREM 1.16. APPROXIMATION THHOREM. Let S be a_finite-dimensional subspace of a Euclidean space V, and let x be any element of V. Then the projection of x on § is nearer to x than any other element of S. That is, if § is the projection of x on S, we have IIx = sl S|xae For allt in S; the equality sign holds if and only if t = s, Proof, By Theorem 1.15 we can write x = 5 + 54, where s € S and s+ € $+, Then, for any fin S, we have -6=Q-9+6-0. Since s = 4 € S and x = s = s+ € S!, this is an orthogonal decomposition of x — f, so its norm is given by the Pythagorean formula lle — 1? = Ux = sii? + [ls — el? But ||s = ¢\|? > 0, so we have |x — ¢\|? > |x —s|®, with equality holding if and only if Best approximation of elements in a Euclidean space 29 wawiz 1. Approximation of continuous functions on (0, 2x] by trigonometric polynomials, Let V = C(O, 2m), the linear space of all real functions continuous on the interval (0, 27], and define an inner product by the equation (f, g) = J3 fla)g(x) de . In Section 1.12 we exhibited an orthonormal set of wigonometric functions gy, g,, Pgs» » » + Where cos sin k: (1.21) gx) = PxaX) = 5 Pylx) = » for k>1. y v ‘The 2n +1 elements Go, G1++. . » Pan SPan a subspace S of dimension 2n + 1, The ele- ments of § are called trigonometric polynomials. If fe C(O, 27), letf, denote the projection off on the subspace S. Then we have 1.22) Se= Shoe where (F 9) = [edna dx. The numbers (f, qx) are called Fourier coefficients off. 
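The Fourier coefficients (f, φ_k) just defined are easy to approximate numerically. The following is a minimal sketch in Python (assuming NumPy and a crude Riemann-sum quadrature; the sample function f(x) = x and the helper names are our own choices); it computes the coefficients for n = 2 and assembles the projection f_n of (1.22).

```python
import numpy as np

N  = 200000
x  = np.linspace(0.0, 2*np.pi, N, endpoint=False)
dx = 2*np.pi / N
inner = lambda f, g: np.sum(f * g) * dx            # (f, g) = ∫ f(x) g(x) dx over [0, 2π], approximately

f = x                                              # sample choice: f(x) = x
n = 2
basis = [np.full_like(x, 1/np.sqrt(2*np.pi))]      # φ_0
for k in range(1, n + 1):
    basis.append(np.cos(k*x) / np.sqrt(np.pi))     # φ_{2k-1}
    basis.append(np.sin(k*x) / np.sqrt(np.pi))     # φ_{2k}

coeffs = [inner(f, phi) for phi in basis]          # the Fourier coefficients (f, φ_k)
f_n = sum(c * phi for c, phi in zip(coeffs, basis))    # the projection of f on S, as in (1.22)
print(np.round(coeffs, 2))   # ≈ [7.87, 0.0, -3.54, 0.0, -1.77], up to quadrature error
```

Since the subspace S for a given n is contained in the corresponding subspace for n + 1, the minimum distance ||f - f_n|| cannot increase as n grows.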
Using the formulas in (1,21), we can rewrite (1.22) in the form (1,23) SX) = day +d(a, cos kx + by sin kx), a where por 4 sin kx dx [10 cos kxdx, bh = 0 a, 7 7 for k =0,1,2,...,m. The approximation theorem tells us that the trigonometric poly- nomial in (1.23) approximates f better than any other trigonometric polynomial in S, in the sense that the norm |f = f,l| is as small as possible, woverz 2, Approximation of continuous functions on [— 1, 1] by polynomials of degree < n. Let V = C(- 1, 1), the space of real continuous functions on [- 1, 1}, and let Chg) = Py S@g@) dx. The n+ 1 nonuualized Legendse polynoniials gy. piss. + + Pas introduced in Section 1.14, span a subspace S of dimension n + 1 consisting of all poly- nomials of degre 1, form an orthonormal set spanning the same subspace a8 X95 X15 X25 4. In the linear space oF all real polynomials, with inner product (x, y) x(0) =f" for m= 0, 1, 2... .. Prove that the functions Sf ey dr, let yt) =1. yl = VIQ-D, 9,4) = V5 (6F - & + 1) form an orthonormal set spanning the same. subspace as {it , x1, Xz}. 5. Let V be the linear space of all real functions £ continuous on (0, + 0) and such that the integral f¢° e'f2(t) dt converges. Define (f,g) = [7 e*/(g(0) dt, and let Yo, yr »Yos ++. be the set obtained by applying the Gram-Schmidt process 0 x9, x15 Xp» . where x(t) = r* for > 0. Prove that y(t)= 1, y(Q= 1-1, y()= 2-4 +2 y= P= 9% + 18 = 6 this Tn dhe real linear space C(O, 2) with inner product (f g) = J3 f Goya) dx, let fla) = e* and show that the constant polynomial g nearest to f is g = 4(e = 1). Compute |g -f |? for this g. 8. In the real linear space C( — 1, 1) with inner product (f, g)= fi, f (ag(x) dx , let fix) = e* and find the linear polynomial g nearest to f. Compute |g — f !? for this g. 9. In the real linear space C(O, 27) with inner product (f,g) = J3* f(xg() dx, let fle) = x. In the subspace spanned by u(x) = 1 , m(X) = COS X, We(x) = sin x, Lind the tngonometnc polynomial nearest tof. 10, In the linear space V of Exercise 5, let f (x) tof. and find the linear polynomial that is nearest 2 LINEAR TRANSFORMATIONS AND MATRICES 2.1 Linear transformations One of the ultimate goals of analysis is a comprehensive study of functions whose domains and ranges are subsets of linear spaces. Such functions are called transformations, This the i © emples, - tions, which occur in all branches of mathematics. Properties of more general transforma- tions are often obtained by approximating them by linear transformations First’ we introduce some notation and terminology conceming arbitrary functions. Let V and W be two sets, The symbol T:V>W will be used to indicate that 7’ is a function whose domain is V and whose values are in W. For each x in V, the element 7Yx) in W is called the image of x under T, and we say that T maps x onto Tix). If A is any subset of V, the set of all images T(x) for x in A is called the image of A under T and is denoted by TYA). The image of the domain V, T(V), is the range of 7. Now we assume that V and Ware linear spaces having the same set of scalars, and we define a linear transformation as follows. parmrtrox. If Vand Ware linear spaces, a function T: V—> W is called a linear trans- formation of V into W if it has the following two properties: (a) T(x + y) = Tee) + T(y) forall x and yin V, (b) Tex) = eT) forall x in Vandall scalars c. These properties are verbalized by saying that T™ preserves addition and multiplication by scalars. 
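For maps of V_n into V_m the two defining properties can be spot-checked numerically on random inputs. A minimal sketch in Python (assuming NumPy; the helper name and the two sample maps are ours, and random trials can only refute linearity, not prove it):

```python
import numpy as np
rng = np.random.default_rng(0)

def looks_linear(T, dim, trials=100):
    """Spot-check properties (a) and (b) on random vectors; any failure disproves linearity."""
    for _ in range(trials):
        x, y, c = rng.normal(size=dim), rng.normal(size=dim), rng.normal()
        if not (np.allclose(T(x + y), T(x) + T(y)) and np.allclose(T(c*x), c*T(x))):
            return False
    return True

T1 = lambda v: np.array([v[0] - v[1], v[0] + v[1]])   # T(x, y) = (x - y, x + y): linear
T2 = lambda v: np.array([v[0] + 1.0, v[1]])           # T(x, y) = (x + 1, y): not linear
print(looks_linear(T1, 2), looks_linear(T2, 2))       # True False
```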
The two properties can be combined into one formula which states that for all x,y in V and all scalars a and 6, By induction, we also have the more general relation 1(Sax) = Zane for any n elements x1, ... ,x, in Vand any n scalars q,..., 31 22 Linear transformations and matrices The reader can easily verify that the following examples are linear transformations, EXAMPLE 1. The identity transformation, The transformation T: V — V, where Thx) = x for each x in V, is called the identity transformation and is denoted by Z or by Iy sowie 2. The zero transformation, The wansformation T: V—> V which maps each element of V onto 0 is called the zero transformation and is denoted by 0. seweie 3. Multiplication by a fixed scalar c. Here we have T: V > V, where Ttx) = ex for all x in V. When ¢ = 1, this is the identity transformation, When ¢ = 0, it is the zero transformation. muwerz 4, Linear equations. Let V = V, and W = Vy . Given mn real numbers aj, where i= 1,2,...,mandk =1,2,...,, define 7: V, > V,, as follows: Tmapseach vector x = (%1,.. x) in Vq onto the vector y = (Wi, Yor) in Voq according to the equations suvorz 5. Inner product with a fixed element. Let V be a real Euclidean space. For a fixed element z in V, define 7: V > R as follows: If x € V, then T(x) = (x, z), the inner product of x with 2. puez 6. Projection on a subspace. Let V be a Euclidean space and let $ be a finite- dimensional subspace of V. Define 7; V + S as follows: If x € V, then T(x) is the projection of x on S, wuwse 7 The differentiation operator. Let V be the linear space of all real funetions f differentiable on an open interval (a, 6). The linear transformation which maps each funetionfin V onto its derivativer” is called the differentiation operator and is denoted by D. Thus, we have D: V — W, where D (£) = f" for cach f in V. The space W consists of all derivatives. f. wowin 8 The integration operator. Let V be the linear space of all real functions continuous on an interval /a, b/. If fe V, define g = Vf) to be that function in V given by sof ode it a ei) =0 iF so the element x = Cgi@pea +1" + Ceesete iS in the mull space N(T). This means there Exercises 35 fare scalars Cy... , Gy Such that x = Ce, + +. 8+ Geq, 80 We have b ker ~ ox- Yea-Zaa- 0. But since the elements in (2.2) are independent, this implies that all the scalars ¢; are zero. Therefore, the elements in (2.3) are independent. Note: If V is infinite-dimensional, then at least one of N(T) or T(V) is infinite- dimensional. A proof of of this fact is outlined in Exercise 30 of Section 2.4. 2.4 Exercises Tn each of Exercises 1 through 10, a transformation T: Vp => Vz i8 defined by the formula given for T(x, y), where (x, y) is an arbitrary point in ¥,, In each case determine whether Tis linear. If T is lincar, deseribe iis mull space and range, and compute its nullity and rank, 1. T(x, y) = »). 6. Tix, = (7,0) Le, Y-& PD. 7. The, GD, 3 (x0). 8. TG, y) = + Ly + D. 4, (x, x). 9. Tx, y)= (x —y,x ty). Oy). 10. TG, y) {au —y,4 +7). same as above for each of Exercises 11 through 15 if the transformation 7: Vz > Vy bed us indica 11, T rotates every point through the same angle y about the origin. That is, 7” maps a point with polar coordinates (r, 8) onto the point with polar coordinates (r, 0 + y), where g is fixed. Also, T maps 0 onto itself. 12, T maps each point onto its reflection with respect to a fixed line through the origin, 13. T maps every point onto the point (1, 1). 14. 
T maps each point with polar coordinates (r, 6) onto the point with polar coordinates (2r, 6). Also, T maps 0 onto itself. 15, T maps cach point with polar coordinates (r, @) onto the point with polar coordinates (r, 26), Also, T maps 0 onto itself. Do the same as above in each of Exercises 16 through 23 if a transtormation T; ¥;—> V3 is elined by the formula given for T(x, y, 2), where (x, y, 2) is an arbitrary point of Vs « 16. T(x, y,2) = (2, ¥s)- 20. Tex,y,2) = (x4 yt z= D. 17. TC, ¥, 2) = (x,y, 0). 21, T(x, y,2) = (+ Lys 2,243) 18. Tix. y, z) = (x, 2y, 32). Thx, y, 2) = 4 Ys 2). 19. T(x, ys 2) = (ys I). |. T(x, = + ZA ty) In each of Exercises 24 through 27, a transformation T: V—> V is described as indica each case, determine whether Tis linear. If T is linear, describe its null space and range, and compute the nullity and rank when they are finite 24, Let V be the linear space of all real polynomials p(x) of degree < 2. ip €¥, = Tp) means that g(x) = p(x + 1) for all real x. 25. Let V be the linear space of all real functions differentiable on the open interval (— 1, 1). Ife V, g = T(f) means that g(x) = xf"(x) for all x in (= 1, 1) 36 Linear transformations and matrices 26. Let V be the linear space of all real functions continuous on fa, b]. If fe V, g = T() means that 2x) = [2/00 sin -)d forage ch 27. Let V be the space of all real functions twice differentiable on an open interval (a, 6). If y €V, define T(y) = y” + Py’ + Qy , where P and Q are fixed constants, 28. Let V be the linear space of all real convergent sequences {x,}. Define a transformation T: V— V as follows: If x = (x,} is a convergent sequence with limit a let T(x) = {Yn}, where Jn =a ay for m2 1. Prove that Tis linear and describe the null space and range of 7. 29. Let V" denote the linear space of all real functions continuous on the interval [—, x]. Let S be that subset of V consisting of all f satisfying the three equations i fat = [" focostat =0, [" fsinede =0. (a) Prove that S is a subspace of V. (b) Prove that $ contains the functions f (x) = cos mx and f(x) = sin me for each # = 2,3, (©) Prove that $ is. infinite-dimensional Let T: V— V be the linear transformation defined as follows: IEF © Vg T() means that 4g) = i {lt cos nif de. (d) Prove that 7(¥), the range of 7, is finite-dimensional and find a basis for 7(V). (@) Determine the null space of 7. (® Find all real © ¢ 0 and all nonzero f in V such dat 7( = of. (Note that suche an f lies in the range of 7.) 30. Let 7: V+ W be a linear transformation of a linear space V into a linear space W. If V is infimte-dimensional, prove that at least one of ‘I(V) or N(1) 18 intinte-cimensiona (Hint: Assume dim N(T) = & , dim 7(V) =r , let e ,.., ¢ be a basis for N¢T) and Jet e1) <-> Oks Cpt» + ~ + psn be independent elements in V, where n> 1. The clements T(eeaa)s- - +s Tl€een) are dependent since n > r. Use this fact to obtain a contradiction. 25 Algebraic operations on linear transformations Functions whose values lie in a given linear space W can be added to each other and can be multiplied by the scalars in W according to the following definition. perimtion. Let S: V—> W and T: V—> W be two functions with a common domain V es ina iinear space W. if is any scutar in W, we define the sum 8 + T and the product cT by the equations 04) (S+ T)a)= SQ) + TH), (TQ) = eT) for all x in V. Algebraic operations on linear transformations 37. We are especially interested in the case where V is also a linear space having the same scalars as W. 
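Because definition (2.4) is pointwise, transformations can be modelled in code simply as functions, and the two operations become one-line combinators. A minimal sketch in Python (assuming NumPy; the names and the two sample maps are ours, and composition, which is treated below, is included for comparison):

```python
import numpy as np

def add(S, T):     return lambda x: S(x) + T(x)    # (S + T)(x) = S(x) + T(x)
def scale(c, T):   return lambda x: c * T(x)       # (cT)(x) = c T(x)
def compose(S, T): return lambda x: S(T(x))        # (ST)(x) = S[T(x)], defined below

S = lambda v: np.array([v[0], 0.0])                # projection on the first coordinate axis
T = lambda v: np.array([v[1], v[0]])               # the reflection (x, y) -> (y, x)
x = np.array([2.0, 5.0])

print(add(S, T)(x), scale(3, T)(x), compose(S, T)(x))   # [7. 2.] [15.  6.] [5. 0.]
```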
In this case we denote by &( V, W) the set of all linear transformations of V imo W. If S and Tare two linear transformations in Z( V, W), it is an easy exercise to verify that S + T and eT are also linear transformations in Y(V, W). More than this is true. With the operations just defined, the set L(V, W) itself becomes a new linear space. The zero transformation serves as the zero element of this space, and the transformation (~1)T is the negative of T. I is a suaightforward matter to verify that all ten axioms for a linear space are satisfied, Therefore, we have the following. THEOREM 2.4. The set L(V, W) of all linear transformations of V into W is a linear space with the operations of addition and multiplication by scalars defined as in (2.4), A more interesting algebraic operation on linear transformations is composition or multiplication of transformations, This operation makes no use of the algebraic structure of a linear space and can be defined quite generally as follows. Tx) Frome 2.1 Illustrating the composition of two transformations. permurtion. Let U, V, W be sets. Let T: U-> V be a function with domain U and values in V, and let S: V—> W be another function with domain V and values in W. Then the composition ST is the function ST: U—> W defined by the equation (STVx) = S[T@)] for every x in U. Thus, to map x by the composition This is illustrated in Figue 2.1. Composition of real-valued functions has been encountered repeatedly in our study of iculus, and we nave seen that the operauon 1s, i general, not commutanve, However, as in the case of real-valued functions, composition does satisfy an associative law. we first map x by T and then map T(x) by S. THEOREM 2,5, If 1: U-> V, S: VW, and R: W > X are three functions, then we have RST) = (RS). 38 Linear transformations and matrices Proof, Both functions R(ST) and (RS)T have domain U and values in X. For each x in U, we have [R(STIIG) = RU(STI@) = RISIT@M] and [(RS)TI(x) = (RS)[T@)] = RISTO, which proves that R(ST) = (RST. vermox. Let T: V—> V be a function which maps V into itself. We define integral powers of T inductively as follows. P=1, T*=7T™1 for n>1. Here J is the identity transformation, The reader may verify that the associative law implies the law of exponents T"7" = 77 for all nonnegative integers m and n, The next theorem shows that dhe compesition of Zinear ausformations is again linear, ruxonre 2.6. If U,V, W are linear spaces with the same scalars, and if T: U + V and S: V —» W are linear transformations, then the composition ST: U — W is linear, Proof. For all x, y in U and all scalars a and b, we have (ST)(ax + by) = SIT(ax + by)} = SlaT(x) + 6T)] = aSTQX) + STO). Composition can be combined with the algebraic operations of addition and multiplica- tion of scalars in 4(V, W) to give us the following. ruzoreu 2.7. Let U,V, W be linear spaces with the same scalars, assume S$ and T are in LV,W), and let © be any scalar. (a) For any function R with values in V, we have (S+T)R=SR+TR and (cS)R= c(SR). (b) For any linear transformation R: W > U, we have R(S+ T)=RS+RT and — RCS) = (RS). The proof is a straightforward application of the definition of composition and is left as an. exercise 2.6 Inverses Jn our study of real-valued functions we leamed how t construct new functions by inversion of monotonic functions. 
Now we wish to extend the process of inversion to a more genemil class of fimetions Inverses 39 Given a function T, our goal is to find, if possible, another function $ whose composition with Tis the identity transformation. Since composition is in general not commutative, we have 10 distinguish between ST and 7S. Therefore we introduce two kinds of inverses which we call left and right inverses. perinrriox. Given two sets V and Wand a function T: V ~» W. A function S: T(V) — V is called a left inverse of T if S{T(x)] = x for all x in V, that is, if ST=Iy, where Ty is the identity transformation on V. A function R: T(V) ~» V is called a right inverse of T if TIR(y)] = y for ally in T(V), that is, if TR= Iniyy, where Ip (yy is the identity transformation on TV). sues. A function with no left inverse but with two right inverses. Let V = (1, 2) and let W = {0}. Define 7: V— Was follows: T(1) = T@) = 0. This function has two right inverses R: W—> V and R’ : W—> V given by R(O) = 1, RYO) = 2 It cannot have a left inverse S since this would require 1 = STO] = 80) and 2 = SIT) = SH). ‘This simple example shows that left inverses need not & be unique. st and that right inverses need not Every function 7; V—> W has at least one right inverse. In fact, each y in 7(V) has the form y = T(x) for at least one x in V. If we select one such x and define R(y) = x , then T{R(y)] = T(x) = y for each y in TYV), so R is a right inverse. Nonuniqueness may occur because there may be more than one x in V which maps onto a given y in 7YV). We shall prove presently (in Theorem 29) that if each y in T(V) is the image of exactly one x in V, then right inverses are unique. First we prove that if 2 left inverse exists it is unigue and, at the same time, is a right inverse. rmmonen 2.8. A function T: V—> W can have at most one left inverse. If T has a left inverse S, then § is also a right inverse. Proof. Assume T has two left inverses, $: T(V)+ Vand $': T(V)+ V. Choose any y in T{V). We shall prove that S(y) — S'(y) . Now y — T¥x) for some x in V, so we have S(T@))= x and S*/T13), 40 Linear transformations and matrices since both $ and S" are left inverses. Therefore S(y) = x and S’(y) = x , s0 Sty) = S'(y) for all y in T(V). Therefore S = S* which proves that left inverses are unique. Now we prove that every left inverse $ is also a right inverse, Choose any element y in T(V). We shall prove that T[S()] = y . Since y € T(V), we have y = T(x) for some x in V. But § is a left inverse, so. x = SIT@))= Si). Applying T, we get Tix) = TISQ)] . But y = Tix), so y = TIS(y)] , which completes the proof. ‘The next theorem characterizes all functions having left inverses. mugorem 2.9. A function T: V + W has a left inverse if and only if T maps distinct elements of V onto distinct elements of W: that is, if and only if, for all x and y in V, (2.5) x#Y — implies — T(x) A Ty). Note: Condition (2.5) is equivalent to the statement 2.6) T(x) =T(y) implies x =y. A function 7° satisfying (2.5) or (2.6) for all x and y in V is said to be one-to-one on V. Proof, Assume T has a Ieft inverse S, and assume that Thx) = Ty). We wish © prove that x = y . Applying S, we find S[7(x)] = S[TQ)] Since S{T(x)] = x and S{TO)I = y, this implies x = y. This proves that a function with a lef inverse is one-to-one on. its domain, Now we prove the converse. Assume Tis one-to-one on V. We shall exhibit a function S: T(V) —» V which is a left inverse of T. If y © T(V) , then y = T(x) for some x in V. By (2.6), there is exactly one x in V for which y = T(x). 
Define Sty) to be this x. That is, we define $ on T{V) as follows: sy) X means that = Tx) = y. Then we have S[7(x)] = x for each x in V, so ST = J. Therefore, the function $ so defined is a left inverse of . permrtron. Let T: V —> W be one-to-one on V. The unique left inverse of T (which we know is also a right inverse) is denoted by T“, We say that T is invertible, and we call T* the inverse of T. The results of this section refer to arbitrary functions. Now we apply these ideas to linear transformations One-to-one linear transformations 41 2.7 One-to-one linear transformations In this secuon, Y and W denote linear spaces with the same scalars, and 7: Vo W denotes a linear transformation in P( V, W). The linearity of T enables us to express the one-to-one property in several equivalent forms. auzonen 2.10, Let T: V —» W be a linear transformation in &( V, W). Then the following statements are equivalent. (a) T is one-to-one on V. (b) T is invertible and its inverse T-; T(V) — V is linear. (c) For all x in V, T(x) = 0 implies x = 0. That is, the null space N(T) contains only the zero element of V. Proof. We shall prove that (a) implies (6), (b) implies (©), and (c) implies (a). First assume (a) holds. Then T has an inverse (by Theorem 2.9), and we must show that 74 is linear. Take any two elements y and v in 7(V). Then y = T(x) and vy = T{y) for some x and y in V. For any scalars a and b, we have au. ,bv = aT(x) + BT) = Tax + by), since Tis linear. Hence, applying 7-1, we have T7(qu + by) = ax + by = aT(u) + DT), so T-1 is linear, Therefore (a) implies (b) Next assume that (b) holds, Take any x in V for which 7x) = 0. Applying 7-4, we find that x = 7-(Q) = Q, since T-1 is linear, Therefore, (b) implies (c). Finally, assume (c) holds. Take any two elements u and v in V with Tu) = Tv). By linearity, we have T(@ — v) = Tw) — Tiv) = 0, so w — v = 0, Therefore, Tis one-to-one on V, and the proof of the theorem is complete. When V is finite-dimensional, the one-to-one property can be formulated in terms of independence and dimensionality, as indicated by the next theorem. muon 2.1 |, Let T: V — W be a linear transformation in £( V, W) and assume that V is finite-dimensional, say dim V =n. Then the following statements are equivalent. (a) T és one-to-one on V. (d) If e,,...,€, are independent elements in V, then T(e,),..., T(e,) are independent elements in T{V). © dim TV) = 0. (d) If fe:,..., &/ is a basis for V, then {T(e,),...+ Tle,)} is a basis for T1V). Proof. We sball prove that (a) implies (b), (b) implies (c), (©) implies (d), and (@) implies (@). Assume (a) holds. Let ey, ... . €9 be independent clements of V and consider the a2 Linear transformations and matrices elements T(e,),. . . .T(e,) in TCV). Suppose that Sere) =0 2 for certain scalars cy, .. . , ¢ By linearity, we obtain (See) O, and hence = =Sce,=O a since T is one-to-one, But e), » @y are independent, so ¢ = ++ = 0, Therefore (a) implies (b). Now assume (b) holds. Let {ce . . . , e,] be a basis for V. By (b), the nm elements T(e,)s..., T(@,) in TCV) are independent. Therefore, dim T(V) > n . But, by Theorem 2.3, we have dim T(V) ¥ for which 7(V) — V . There are six altogether. Label them as T,,..., 7s and make a muliplication table showing the com- position of each pair, Indicate which functions are one-to-one on V, and give their inverses In each of Exercises 3 through 12, a function T: V¥,—* V, is defined by the formula given for T(x, y), where (x, y) is an arbitrary point in V,. 
In each case determine whether T' is one-to-one on Va. If it is. describe its range TY Vx): for each point (u. 2) in TC Vs), let (x. y) = TMu, v) and give fommalas for determining x andy in terms of w and v. 3. T(x, y) = (y, »)- 8. T(x, y) = (&, e”) 4. TG, y)= (%-)). 9 T(x, y) = (x, 1). 5. T(x, y) = (x, 0) 10. T(x,y) = @ + Ly + 1) 6. T(x, y) = (x, x) IL, Tx, y) = (& — yx +y)- 7. Te y= OY i2 7u,y) Exercises 43 In each of Exercises 13 through 20, a function T: V2—> V is defined by the formula given for T(x, y, 2), where (x, y, 2) is an arbitrary point in Vg. In each case, determine whether Tis one-to- one on Vg. If it is, describe its range T(V,); for each point (u,v, w) in T(V,), let (x, y, 2) Tu, v, w) and give formulas for determining x, y, and z in terms of u, v, and w, 13, Tt, w= yx) 17. Ty y,2) = + Ly + 1jz— 0. 14, THY, 2) = Oy, 18. TQ, y,2) = + Ly +2,2 +3). 15. Th. ¥. 2) = 2y. 32), 19. Tex, ¥. 2) = (x, X4 WX YE. 16. Tex, y,2) = (yx ty $2). 20. Tix,y,2) =@ +y,y +2,x42). 21, Let T: V-> V be a function which maps V into itself, Powers are defined inductively by the formulas T? = J, 7? = TT) for n> 1 Prove that the associative law for composition implies the law of exponents: T*7* = Ti". If Tis invertible, prove that 7” is also inverible and that (T*)1= (Ts, In Exercises, 22 through 25, S and 7 denote functions with domain V and values in ¥, In general, ST # TS. If ST = TS, we say that S and T commute. 22. If § and 7 commute, prove that (ST)* = $*7* for all integers n > 0. 23. If $ and Tare invertible, prove that ST is also invertible and that (ST)! = T-1S+1, In other words, the inverse of $7 is the composition of inverses, taken in reverse order. If $ and Tare invertible and commmute, prove that their inverses also commute. Let ¥ be a linear space. If § and 7 commute, prove that e (S+7=S42sr+ 1 and ($+ TP = + 3ST + 397+ T* Indicate how these formulas must be altered if ST # TS. 26. Let Sand 7 be the linear transformations of V, into Vy defined by the formulas S(x, y, 2) = (% yy x) and T(x, y, 2) = (x, x + yx + y+ 2), where (x, y, z) is an arbitrary point of V,. (a) Determine the image of (x, y, 2) under each of the following transformations : ST, TS, ST = TS. S*, T?, (ST), (T5)", (ST — TS)* (b) Prove that and Tare one-to-one on Vg and find the image of (u, v, w) under each of the following transformations : $1, T-!, (ST, (TS). (c) Find the image of (x, y, 2) under (1° = 1)" foreach n> 1 Let ¥ be the linear space of all real polynomials p(x), Let D denote the differentiation operator ‘and let 7 denote the integration operator which maps each polynomial p onto the polynomial 4 given by q(x) = Jf p(t) de , Prove that DT = fy but that TD ¥ Jy, Descnbe the null space and range of TD. 28, Let V be the linear space of all real polynomials p(x). Let D denote the differentiation operator and let 7 be the linear transformation that maps p(x) onto xp’ (x), (a) Let p(x) = 2 + 3x = a*+ 4x3 and determine the image ofp under each of the following transformations: D, T, DT, TD, DT — TD, T*D® — D*T? (b) Determine those p in V for which 7(p) = p . (© Determine thosep in V for which (DT — 2D)(p) = 0. (@) Determine thosep in V for which (DT — TDY%p)-~ Dr(p). . Let V and D be as in Exercise 28 but let 7 be the linear transformation that maps. p(x) onto xp(x). Prove tihat DT — TD = Zand that DT" ~ T"D = nT for n > 2. 30. Let Sand 7 be in ¥(V, ¥) and assume that ST — TS Prove that ST* — T"S = nT") for all m2 3.1. 
Let ¥ be the linear space of all real polynomials p(x), Let R, S, be the functions which map aut atbiiaay polynomial p(x} = cy 7 Ga TF enx™ i ¥ uly dhe pulyuvuniais uta}, 9a), “4 Linear transformations and matrices and ((x), respectively, where 1) = pO, = Soe, ae) = Dee, (a) Let p(x) = 2 + 3x = 27 + x* and determine the image of p under each of the following transformations: R, S, T, ST, Ts, (TS), T2S*, S*T*, TRS, RST. (b) Prove that , S, and Tare linear and determine the null space and range of each, (©) Prove that T' is one-to-one on V and determine its inverse. (@ If n> 1, express ( TS)" and S*7” in terms of J and R. 32, Refer to Exercise 28 of Section 2.4. Determine whether Tis one-to-one on V. If it is, describe its inverse. 2.9 Linear transformations with prescribed values. If V is finitedimensional, we can always construct a linear transformation > V-> W with prescribed values at the basis elements of V, as described in the next theorem. THEOREM 2.12, Let e,...,¢, be a basis for an n-dimensional linear space V. Let ty,+2+ My 66 1 arbitrary elements in a linear space W. Then there is one and only one linear transformation T: V —» W > such that 27D Te.) = uy, for k=1,2.....0 This T maps an arbitrary element x in V as follows: (2.8) If & =Dxeq, then Tee) = ¥ xylly. 7 a Proof. Every x in V can be expressed uniquely as a linear combination of €;, e, the multipliers x, .. . ,%_ being the components of x relative to the ordered basis (@,,...,€) If we define T by (2.8), it is a straightforward matter to verify that Tis linear. If x = , for some h, then all components of x are 0 except the kth, which is 1, so (28) gives T(e,) = m,, ate required, To prove that there is only one linear transformation satisfying (2.7), let 7” be another and compute Ta). We find that Since Tee = Th) for ali x in Vy we imve T= 7, wich compices tie provi, emere. Determine the linear transformation T: V,—> Vz which maps the basis elements (1,0) and j = ©, 1) as follows: Wit, TH=2%-j. Matrix representations of linear transformations 45 Solution, If x= xyi+ xq is an arbitrary clement of Vg, then T(x) is given by Tex) = Ti) + xT) = aE +) + Qi — p= Cy + 2+ ( — D 2.10 Matrix representations of lincar transformations Theorem 2.12 shows that a linear transformation 7: V-—> W of a finite-dimensional linear space V is completely determined by its action on a given set of basis elements €.4...4@q» Now, suppose the space W is also finite-dimensional, say dim W =m, and let Wyss +45 Wy be a basis for W. (The dimensions and m may or may not be equal.) Since T has values in H’, each element 7(e,) can be expressed uniquely as a linear combination of the basis elements Wy... Wins Say Te) = Stam, where fy... + fe are the components of T(e,) relative to the: ordered basis (Wy, ... , w,). We shall display the m-tuple (ty... - + tmx) vertically, as follows: fe (2.9) -). La! ‘This anray is called a column vector or a column matfix. We have such a column vector for each of the elements T(e,), . . . , T(e,). We place hem side by side and enclose them in one par of brackets to obtain the following rectan array: tm tma 6 fmm This array is called a matrix consisting of m rows and n columns. We call it an_m by ” matrix, or an m xm matrix, The first row is the 1x m matrix (ys fay. 
- + + Gig): The m x 1 matrix displayed in (2.9) is the kth column, The scalars f,, are indexed so the first subseript i indicates the row:, and the second subseript k indicates the column in which tie occurs We call ty the ik-entry or the ik-element of the matrix, The more compact notation (ty), oF is aio ued WW Geuvie die smauia wine iveuiry is fy 46 Linear transformations and matrices every linear transformation 7 of an n-dimensional space V into an m-dimensional space W gives rise to an m x # matrix (f,,) whose columns consist of the components of Te), , T(e,) relative w the basis (w,, .w,). We call this the matrix representation of T relative to the given choice of ordered bases (e, ..., €)) for Vand (i... ., w,) for W. Once we know the matrix (fz), the components of any element T(x) relative to the basis (w,,...,w,) can be determined as described in the next theorem. mmonm 2.13. Let T be a linear transformation in ( V, W), where dim V = n and dim W = m. Let (e;, » @,) and (wy,..., w,) be ordered bases for Vand W, respectively, and let (1x) be the m xn matrix whose entries are determined by the equations 2.10) Te) = Stam. for Then an arbitrary element in V with components (x1... . , x.) relative to (2, ¢.) is mapped by T onto the element 212) Tw = 3 yyy, 2.12) Te) = F 34H, a in W with components (y,,«. «+ ,) relative to (w,,..., w,) The y, are related to the components of x by the linear equations = (2.13) for i=1,2,. Proof. Applying T to each member of (2.11) and using (2.10), we obtain T(x) = ExT) Sa Sem = 3 (Sree) Sym, Fea SOS 2 & where each y, is given by (2.13), This completes the proof, Having chosen a pair of bases (e,, ,@) and (i,,..., W,) for V and W, respectively, every linear transformation T; V-> W has a matrix representation (t,.). Conversely, if we start with any mm scalars amanged as a rectangular matrix (J,,) and choose a pair of ordered bases for V and W, then it is easy to prove that there is exactly one linear trans- formation 7: V—> W having this matrix representation. We simply define Tar the basis elements of V by the equations in (2.10). Then, by Theorem 2.12, there is one and only cone linear transformation T: V—> W with these prescribed values. The image (x) of an arbiuary poim x in V is den given by Equations (2.12) Matrix. representations of linear transformations 47 EXAMPLE I, Construction of a linear transformation from a given matrix, Suppose we start with the 2 x 3. matrix 3001-2 1 0 4 Choose the usual bases of unit coordinate vectors for Vy and ¥;. Then the given mattix represents a linear transformation 1) Vy—+ Vz which maps an arbitrary vector (x1, X25 3) in Vg onto the vector (j,, 2) in Vz according to the linear equations Ji = 3x, + X, — 2x5, Dos M+ ox, + 4a. wxaneus 2, Construction of a matrix representation of a given linear transformation. Let V be the linear space of all real polynomials p(x) of degree <3. This space has dimen- sion 4, and we choose the basis (, x, x%, x9), Let D be the differentiation operator which maps each polynomial p(x) in V onto its desivative p(x). We can regard D as a linear transformation of V into W, where W is the 3-dimensional space of all real polynomials of degree <2’. In W we choose the basis (1, x, x®), To find the matrix representation of D relative to this (choice of bases, we tansform (differentiate) each basis element of ¥ and express it as a linear combination of the basis elements of W. Thus, we find that D(1) = 1 + Ox + Ox, =0+0r+0x, Di) D8) = 2x = 0. ax + Ox, DO) = 3x7 = 0 + Ox $ 32". 
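Collecting the coefficients of these images column by column is purely mechanical bookkeeping, and it is easy to automate. A minimal sketch in Python (assuming NumPy; polynomials are stored as coefficient lists relative to the chosen bases, and the helper names are ours):

```python
import numpy as np

def diff_coeffs(p):
    """p = [c0, c1, c2, c3] represents c0 + c1 x + c2 x^2 + c3 x^3; return the coefficients of p'."""
    return [k * p[k] for k in range(1, len(p))]    # components relative to the basis (1, x, x^2) of W

basis_V = np.eye(4, dtype=int)                     # coordinate vectors of the basis (1, x, x^2, x^3) of V
columns = [diff_coeffs(e) for e in basis_V]        # image of each basis element, expressed in W
matrix_of_D = np.array(columns).T                  # these component vectors are the columns of the matrix
print(matrix_of_D)
# [[0 1 0 0]
#  [0 0 2 0]
#  [0 0 0 3]]
```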
‘The coefficients of these polynomials determine the columns of the matrix representation of D. Therefore, the required representation is given by the following 3 x 4 matrix: 0100 0020). 0003 To emphasize that the matrix representation depends not only on the basis elements but also on their onder, let us reverse the order of the basis elements in Wand use, instead, the ordered basis (x8, x, 1). Then the basis elements of V are transformed into the same poly- nomials obtained above, but the components of these polynomials relative to the new basis (x, x, 1) appear in reversed order, Therefore, the mavix representation of D now becomes 0003 8 Linear transformations and matrices Let us compute a third matrix representation for D, usinglhebasis (I, 1 +x, 14x + 2x%, 1+ x + x4 x4) for V, and the basis (1, x, x%) for W. The basis elements of V are trans- formed as follows: DA) = 0, D(il+x) = 1, D(l +x4x°)-1 42x, DU +x4x24 xX) =1 +2x43x, so the matrix representation in this case is 01 1 S e wD 2.11 Construction of a matrix representation in diagonal form ince it is possible to obtain different matrix representations of a given linear tmnsfermm tion by different choices of bases, it is natural 10 uy to choose the bases so that the resulting matrix will have a particularly simple form. The next theorem shows that we can make all the enuies 0 except possibly along the diagonal suming from the upper feft-hand comer of the matrix, Along this diagonal there will be a sting of ones followed by zeros, the number of ones being equal to the rank of the transformation. A matrix (t,,) with all entries ti, = 0 when i # k is said to be a diagonal matrix. upon 2.14. Let V and W be finite-dimensional linear spaces, with dim V = n and dim W = m. Assume T € L(V, W) and let r =dim T(V) denote the rank of T. Then there exists a basis (e,...., €.) for V and a basis (w,,.... W,) for W such that 2.14) Te) =", for i=1,2,...47% and Te)=Of or i=rty..yn Therefore. the matrix (t,,) of T relative 10 these bases has all entries zero except for the r diagonal entries =t,=1 Proof. First we construct a basis for W. Since 7(V) is a subspace of W with dim 77V) = 1, the space T(V) has a basis of r elements in W, say w%,... , W,. By Theorem 1.7, these elements form a subset of some basis for W. Therefore we can adjoin elements Ways + « Wy, 80 that (2.16) (Wye es Wes Wrtnee 9 Wn) is a basis for W. Construction of a matrix representation in diagonal form 49 Now we construct a basis for V. Each of the first r elements w'; in (2.16) is the image of at least one element in ¥. Choose one such element in V and call ite; . Then T(e,) = w, for £=1,2,..., 7 so 2.14) is suisfied. Now let k be the dimension of the null space NCD, By Theorem 23 we have n = k +r, Since dim N(T) = &, the space N(T) has a basis consisting of k elements in V which we designate as e,.1 + ue For each of these elements, Equation (2.15) is satisfied. Therefore, to complete the proof, we must show that the ondered set (2.17) (ys Crs Crete. es Creed is a basis for V. Since dim V = n = r + k, we need only show that these elements are independent, Suppose that some linear combination of them is zero, say 2.18) Appiying 7 and using Byuation But wy,+..,W, are independent, and hence ¢ = + terms in (2.18) are zero, so (2.18) reduces to = 0. Therefore, the first But er, sss, @,f¥@ independent since they form a basis for NT), and hence ¢,,, = Crap = 0. Therefore, all the c, in (2,18) are zero, go the elements in (2.17) form a basis for V. This completes the proof. suweiz. 
We refer (0 Example 2 of Section 2.10, where D is the differentiation operator which maps the space V of polynomials of degree <3 into the space W of polynomials of degree <2. In this example, the range 7(V) = W, so T has rank 3. Applying the method used to prove Theorem 2.14, we choose any basis for W, for example the basis (1, x, x%), ‘A set of polynomials in V which map onto these elements is given by (% 4x2, x4), We extend this set to get a basis for V by adjoining the constant polynomial 1, which is a. basis for the null space of D. Therefore, if we use the basis (x, $x%, 4x°, 1) for V and the basis (1, x, x*) for W, the comesponding matrix representation for D has the diagonal form 1000 0 1 0.0 Loo i OY 50 Linear transformations and matrices 2.12 Exercises J all exercises involving dhe vector space Vy 5 the usual Dasis of unit coordina vectors is © be chosen unless another basis is specifically mentioned. In exercises concerned with the matrix of a linear transformation T: V+ W where V = W’, we take the same basis in both V and W uniess another choice is indicated. 1. Determine the matrix of each of the following linear transformations of ¥, into Vy + (@) the identity transformation, (b) the zero transformation, (©) multiplication by a fixed scalar c. Determine the matrix for each of the following projections. (@) Ts Vy Va, — where Ty, 25%) = Chr, Xa) - (b) Ts Var Vy, where T(x, x2, %3) = Cas %)- (0) T: Ve-¥ Vy, where Tey se, X55 40%) — Chay Myra) 3. A linear transformation 7: Vy > Vy maps the basis vectors # andj as follows: Tw th THR (a) Compute 7(34 = 4j) and T?Gi ~ 4)) in terms of i and j (b) Determine the matrix of T and of T?, (©) Solve part (b) if the basis G, j) is replaced by (e, , 2), where ¢, = 7 = j, ey 4, A linear transformation 7: Vy —> Vy is defined as follows: Each vector (x, 9) the y-axis and then doubled in length to yield 7(x, y). Determine the matrix of 7 5. Let 7: Vy Vq be a linear transformation such that 11k) mt ae, rie k) + T(itjek)=j-k. {a} Compute T+ 2) + 9%) and deictmine the nullity and rank of 7 (b) Determine the matrix of 7. 6. For the linear transformation in Exercise 5, choose both bases to be (@,, e2, 9), where & = 2, 3, 5), ep = (1, 0, 0), ey = ©, 1, =1), and determine the mawix of 1 relative | the new bases. 7. A linear transformation 7: Vz —> V, maps the basis vectors as follows: T(i) = (0, 0), TQ) = Gd, 1), TH) = (, =p. (a) Compute 7@4i -j + k) and determine the nullity and rank of 7. (b) Determine the matrix of 7. (©) Use the basis Gi, j, #9 in ¥y and the basis (wy, w,) in Vp, where w; (1,2). Determine the maitix of 7 relative 10 these bases. G@) Find bases (é . eg, €3) for Vy and (Wy , wa) for Ve relative to which the matrix of Twill be in diagonal. form, 8. A linear transformation 7: Vz > Vz maps the basis vectors as follows: 71) = (1, 0 1), TG) = (-1,0, D. (a) Compute T(2i ~ 3)) and determine the nullity and rank of 7. (b) Determine the matrix of 7. (©) Find bases (©) for ¥p aint (Wy, amg) for Fy achativ in diagonal form. 9, Solve Exercise 8 if Td = (1,0, 1) and 7g) = (1,1, 1). 10, Let Vand W be linear spaces, each with dimension 2 and each with basis (@; , é2). Let 1: VW be a linear transformation such that Tie, + €) = 3¢, + 9ey, Te, + 2e,) = Te + 23ey (a) Compute 7(e, ~ e,) and determine the nullity and rank of 7. (b) Determine the matrix of 7 relative to the given basis. 
GD, w, 3 EB z Linear spaces of matrices 51 (c) Use the basis (e, , ¢,) for V and find a new basis of the form (ey + aez, 2e, + be,) for W, relative (0 which the matrix of Twill be in diagonal form, In the linear space of all real-valued functions, each of the following sets is independent and spans a finite-dimensional subspace V. Use the given set as a basis for V and let D: ¥-> V be the differentiation operator. In cach case, find the matrix of D and of D® relative to this choice of basis, 11. Gin x, c0s x). 15. Geos x, sin x), 12.0, xe). 16. (Sin x, COS X, X sin X, X 00S x). 3. (1, 1+x,1+ x +e). 17. (€€ sin X, €* 00S X). 14. (e*, xe*), 18, (e** sin 3x, e* cos 3x). 19. Choose the basis (1, x, 2%, 2°) in the linear space V of all real polynomials of degree <3. Let D denote the differentiation operator and let T; V-> V be the linear transformation which maps p(x) onto xp’(x). Relative w the given basis, determine the matrix of each of the followingtransformations: (a) T; (b) DT; (c) TD; (d) TD — DT; (¢) T®; (f) T20? ~ D°T*. 20, Refer to Exercise 19, Let W be the image of Vunder ID. Find bases for Vand for W relative to which the matrix of 7) is in diagonal form. 2.13 Linear spaces of matrices We have seen how matrices arise in a natural way as representations of linear tans: formations. Matrices can also be considered as objects existing in their own right, without necessarily being connected to linear transformations. As such, they form another of mathematical objects on which algebraic operations can be defined. The connection with linear transformations serves as motivation for these definitions, but this connection will be ignored for the moment. Let m and 1 be two positive integers, and let Jy,,, be the set of all pairs of integers (i, ) such that 1 a for k= 1,2,...40. Sinee we have Souttom and CN@)= Flume (S + THe) m(S) + m(T) and m(cT) = (ct) = cm(T). ‘This proves we obtainm(S + 7) = (Sq + te that_m_ is linear To prove that m is one-to-one, suppose that m(S) = m(T), where $ = (5,) and T = (4) Equations (2.19) and (2.21) show that S(¢,) = Tle,) for each basis element e, 80 S(x) = T(x) for all x in V, and hence $ = T. Note: ‘The function m is called an isomorphism. For a given choice of bases, m establishes a one-to-one correspondence between the set of linear transformations L(V, W) and the set of m xn matrices My». The operations of addition and multipli- cation hy scalars are preserved under this correspondence, The linear spaces #(V, W) and M,,,,, are said 0 be isomorphic. Incidentally, Theorem 2.11 shows that the domain of a one-Lo-one linear transformation has the same dimension as its range, Therefore, dim (V, IV) =dim Myq = mn if V = Wand if we choose the same basis in both V and W, then the matrix m(Z) which corresponds to the identity transformation J: V—+ V is an m x n diagonal matrix with each 54 Linear transformations and matrices diagonal entry equal 101 and all others equal to 0. This is called the identity or unit matrix and is denoted by I or by L. 2.15 Multiplication of matrices Some linear transformations can be multiplied by means of composition, Now we shall define multiplication of matrices in such a way that the product of two matrices comesponds to the composition of the linear transformations they represent We recall that if T: U-> V and S: V-> Ware linear transformations, their composition ST: U— W is a linear transformation given by ST(x) = SIT) forallxin U. 
Suppose that U, V, and Ware finite-dimensional, say dmU=n, dimY¥=p, dim W= m Ch matrix, the matix 7 is a p x m matrix, and the mauix of ST is an n matix, The following definition of marix muliplication will enable us 10 deduce the relation m(ST) = m(S)m(T). ‘This extends the isomorphism property to products. pmnmnon. Lei A be any mx p mairix, and iet B be any px nmatrix, say A= (a,)M2. and B= (bi dita. Then the product AB is defined to be the m x1 matrix C = (cs) whose ij-entry is given by 2 Cy = 2 eubss- Note: ‘The product AB is not defined unless the number of columns of A is equal to the number of rows of B. JE we write A; for the ith row of A, and B* for thejth column of B, and think of these as p-dimensional vectors, then the sum in (2.22) is simply the dot product 4, » Bi. In other words, the ij-entry of AB is the dot product of the ith row of A with thejth column of B: AB = (A; - BYR ‘Thus, matrix multiplication can be regarled as a generalization of the dot product. 312 and B -110 4 6 S—1].SinceAis2 x 3andBis3x 2, EXAMPLE I. Let 4 ud Multiplication of matrices 55 the product AB is the 2 x 2 matrix Ay B A,B!) fT 20 AB= = Ay BY Ay: BY 1-7 The entries of AB are computed as follows: Ay B=3-441-542-0=17, A. B= 3-641-(-I 42-2221, A.) Bes (1)+441-54+ 0-051, Ay B= —1)-6+1-(-1)+0-2=-7. EXAMPLE 2, Let 2 2 1 -3 A= and B= 1 124 2 Here A is 2 x 3 and Bis 3 x 1,50 AB is the 2 x 1 matrix given by (al [a] AB= = : Ay BL 8 since A, + Bt = 2+(=2) + 1-1-4 (=3):2= -9and y- BE = 1-(-2) 442-144-228, wemets 3. If A and B are both square matrices of the same size, then both AB and BA are defined, For example, if we find that 1 10 Boa ‘This cxample shows that in general AB # BA . If AB = BA , we say A and B commute. wanpis 4. If I, is the p x p identity matrix, then J,4 = A for every p x m matrix A, and BI, = B for every m x p matrix B. For example, 123 qa - bs ob 1 qlee 0 1 ofj3}=]3 0 425.6 oo l4 4 0 Now we prove that the matrix of a composition ST is the product of the matrices m(S) and m(r). to onf2 2 56 Linear transformations and matrices rmsonen 2.16, Let T: U-» Vand S: V—> W be linear transformations, where U, V, W are finite-dimensional linear spaces. then, for a fixed choice of bases, the matrices of S, T, and ST are related by the equation m(ST) = m(Symn(T). Proof. Assume dim U =n, dim V = p, dim W = m. Let (j,.... uy) be a basis for U, (e+ ++; ) a basis for V, and (wy... ., w,) a basis for W. Relative to these bases, we have (5) = m2 Boj @ bl12 ; WS) = GE, where SQ)— Ssaw;, for R= 4.00.7, and m(T) = (ty)Pha, where Tu) = Dt for f= y2e...sm a Therefore, we have sru) = sittuy] =Sn8te) =F, ¥ same = ¥ mG & so we find that m(ST) = (38 to) {= mS)m(T) We have already noted that matrix multiplication does not always satisfy the com- mutative law, The next theorem shows that it does satisfy the associative and distributive laws. RIBUTIVE LAWS FOR MATRIX MUL’ IM 2.17, ASSOCIATIVE AND DIS Given matrices A, B, C. (@) If the products A(BC) and (AB)C are meaningful, we have A(BC) = (AB)C (associative lav). (b) Assume A and B are of the same size, If AC and BC are meaningful, we have (A+ B)C = AC + BC (right distributive law), whereas if CA and CB are meaningjill, we have C(A + B)=CA+ CB (left distributive law). Thesc properties can be deduced directly fiom tho dof of iain hi plication, but we prefer the following type of argument, Introduce finite-dimensional linear spaces U, V, W, X and linear transformations 7; U-> V, S: V> W, R: W> X such that, for a fixed choice of bases, we have Az=mR) B=m(S), C=m(T). 
Exercises 57 By Theorem 2.16, we have m(RS) = AB and m(ST) = BC From the associative law for composition, we find that R(ST) = (RS)T. Applying Theorem 2.16 once more to this equation, we obtain m(R)m(ST) = m(RS)m(T) or A(BC) = (AB)C, which proves (a). The proof of (b) can be given by a similar type of argument, pininnion. IFA Is a square mairix, we define integral powers of A inductively as fotiows. a” A" =AA™ for n>1. 2.16 Exercises 1 cal fo 4 pent 2 1 If A= 1 B=|-! 3], C= |1 =], compute B + C, AB, 5 2 es BA, AC, CA, AQB—3C). oO 2 Let A= F 1] Find all 2 x 2 matrices B such that (a) AB = 0 ; (b) BA = 0. 3. In cach ease find @, b, ©, d to satisfy the given equation. 00 1 O}fa) fr 1020 100 0j/5] Io ubepjoort 066 @) ={ ts () 010 olfel fe 149B}o100 984 oo 0 tla 5. joo 1 0 4, Calculate AB — BA in each case. 122 41. (a) 4=]2 1 2], B=|-4 2 oO}; 123 121 2.60 6 a 1 2 (b) A=] 1 1 2], B=] 3° -2 4 bbo2 ay foe 3 it 5.1f A is a square matrix, prove that A"A" = A” for all integers m > 0, n> 0. tot 12 6. vaa=[) } seem a =[f |eessnene L 1 ‘cos 20 —sin 20 sin20 cos 20, Lil 123 ‘cos 8 —sin 0 7. Let A= « Verify that 4? = sinf cos 9. | anc compute A", 8. Let A 0 1 4). Verify that 47 =]0 1 2]. Compute 4* and A‘, Guess a general 001 oon formula for A” and prove it by induction, 58 Linear transformations and matrices 10 ona [ + Prove that A? = 2A = Z and compute 4, liz 10, Find all 2 x 2 matrices A such that 48 = 0. 11 (@) Prowe that a? ¥ 2 matrix A commntes with every 2 x 9 mairiy if and only if A commutes with each of the four matrices 1 0 1 0 0 00 Oo| (00) iol (oi) (b) Find all such matrices A, 12 The equation A? = Z is satisfied by each of the 2 x 2 matrices [ob Eb bah wie 6 and © awe abiiay wal nunibers, Tin ait 2x 2 mnaices A suck dat Ae — 7 1 6 Bo A= | wan |: find 2 x 2 matrices C and D such that AC = B 98 andD B 14, (a) Verify that the algebraic identities (A+ BP =A? + 2AB+ B® and = (A+ BY(A=B) = 4°- Be k dl ] and B= : o 2 12 (b) Amend the right-hand members of these identities to obtain formulas valid for all. square matrices A and B. (©) For which matrices A and B are the identities valid as stated in (a)? do not hold for the 2 x 2 matrices A 27 Systems of linear equations Tet A = (a,,) be a given m xn matrix of numbers, and let ¢,, 1, be mm farther numbers. A. set of m equations of the form (2.23) Dagr, =e; fori sm, a is called a system of m linear equations in n unknowns. Here x1, ... , x, are regarded as unknown, A solution of the system is any n-tple of numbers (x,,... + xq) for which all the equations ar satisficd. The matrix A is called the coefficient-matrix of the system, Linear systems can be studied with the help of linear transformations. Choose the usual bases of unit coordinate vectors in V, and in V,,. The coefficientmatrix A determines a Systems of linear equations 59 linear transformation, 7: V, — Vy, which maps an arbitrary vector x = (x1, ... x) in V,, onto the vector y = (ys... + Ym) in Vm, given by the m linear equations Y= Laux, for i= 1,2,...,m Letc=(c,,...,¢)) be the vector in V,, whose components are the numbers appearing in system (2.23), This system can be written more simply as; x) =e. The system has a solution if and only if © is in the range of 7, If exactly one x in V,, maps onto ¢, the system has exactly one solution, IF more than one x maps onto ¢, the has more than one solution. stom sovee: 1, A system with no solution, The system x + y solution, “The sum of two numbers cannot be both I and 2. 1,x+ y= 2 has no swvexe 2. 
A system with exactly one solution, The system x + y exactly one solution: (x, y) = (}, 4). sx y =0 has suvwerz: 3. A system with more than one solution. The system x + y = 1 , consisting of one equation in two unknowns, has more than one solution, Any two numbers whose sum is | gives ii solution, With each linear system (2,23), we can associate another system Sayx,=0 for 7m, obtained by replacing each ¢; in (2.23) by 0. This is called the homogeneous system comre- sponding (© (2.23). If ¢ # O, system (2.23) is called a nonhomogeneous system. A vector x in V,, will satisfy the homogeneous system if and only if T(x) = 0, where Tis the linear transformation determined by the coefficient-matrix, The homogene ous system always has one solution, namely x = Q, but it may have others. The set of solutions of the homogeneous system is the null space of T, The next theorem describes the relation between solutions of the homogeneous system and those of the nonhomogeneous system, suronn 2.18. Assume the nonhomogeneous system (2.23) has a solution, say b. (a) fa vector x is a solution of the nonhomogeneous system, then the vector v is a solution of the corresponding homogeneous system. (b) Ifa vector v is a solution of the homogeneous system, then the vector x solution of the nonhomogeneous system. 60 Linear transformations and matrices Proof. Let T: V_ —> Vy be the linear transformation determined by the coefficient- matrix, as described above. Since b is a solution of the nonhomogeneous system we have Te) = T(x = b) = Tix) = Mb) = Nw) Therefore T(x) = c if and only if T(e) = 0. This proves both (a) and (b). This theorem shows that the problem of finding all solutions of a nonhomogeneous system splits naturally into wo parts: (1) Finding all solutions v of the homogeneous system, that is, determining the null space of 7; and (2) finding one particular solution b of the nonhomogeneous system, By adding b to each vector v in the mull space of 7, we thereby obtain all solutions x = v + b of the nonhomogeneous system. Let k denote the dimension of N(T) (the nullity of 7). If we can find k independent solutions 1... ,te of the homogeneous system, they will form a basis for N(T), and we can obain every vin NCD) by forming all possible linear combinations Ds Kyte t hy, where t,..- + t are arbitrary scalars, This linear combination is called the general solution of the homogencous system. If b is one particular solution of the nonhomogeneous system, then all solutions x are given by XS b+ Kot hee This linear combination is called the general solution of the nonhomogeneous system. ‘Theorem 2.18 can now be restated as follows. razon 2.19. Let T: V_ > V, be the linear transformation such that Tix) = 2 = Oye thy = Ute... Im) and Ye - Laur for i= 1,2,-.-,m Let k denote the nullity of T. If 15... , Uy, are k independent solutions of the homogeneous system T(x) = 0, and if b is one particular solution of the nonhomogencous system T(x) = then the general solution of the nonhomogeneous system is abt yt... + hes where ty... . » ty are arbitrary scalars, This theorem does not tell us how to decide if a nonhomogeneous system has a particular solution 8, nor does it tell us how tw determine soludons Ps + Oy OF the homogeneous system, It does tell us what (© expect when the nonhomogeneous system has a_ solution, The following example, although very simple, illustrates the theorem, Computation techniques 61 nuvere. The system x + y = 2 has for its associated homogeneous system the equation x + y = 0. Therefore. 
the null space consists of all vectors in V_2 of the form (t, -t), where t is arbitrary. Since (t, -t) = t(1, -1), this is a one-dimensional subspace of V_2 with basis (1, -1). A particular solution of the nonhomogeneous system is (0, 2). Therefore the general solution of the nonhomogeneous system is given by

    (x, y) = (0, 2) + t(1, -1),    or    x = t,  y = 2 - t,

where t is arbitrary.

2.18 Computation techniques

We turn now to the problem of actually computing the solutions of a nonhomogeneous linear system. Although many methods have been developed for attacking this problem, all of them require considerable computation if the system is large. For example, to solve a system of ten equations in as many unknowns can require several hours of hand computation, even with the aid of a desk calculator.

We shall discuss a widely-used method, known as the Gauss-Jordan elimination method, which is relatively simple and can be easily programmed for high-speed electronic computing machines. The method consists of applying three basic types of operations on the equations of a linear system:
(1) Interchanging two equations;
(2) Multiplying all the terms of an equation by a nonzero scalar;
(3) Adding to one equation a multiple of another.
Each time we perform one of these operations on the system we obtain a new system having exactly the same solutions. Two such systems are called equivalent. By performing these operations over and over again in a systematic fashion we finally arrive at an equivalent system which can be solved by inspection. We shall illustrate the method with some specific examples. It will then be clear how the method is to be applied in general.

EXAMPLE 1. A system with a unique solution. Consider the system

    2x - 5y + 4z = -3
     x - 2y +  z =  5
     x - 4y + 6z = 10.

This particular system has a unique solution, x = 124, y = 75, z = 31, which we shall obtain by the Gauss-Jordan elimination process. To save labor we do not bother to copy the letters x, y, z and the equals sign over and over again, but work instead with the augmented matrix

(2.24)    [ 2  -5  4 | -3 ]
          [ 1  -2  1 |  5 ]
          [ 1  -4  6 | 10 ]

obtained by adjoining the right-hand members of the system to the coefficient matrix. The three basic types of operation mentioned above are performed on the rows of the augmented matrix and are called row operations. At any stage of the process we can put the letters x, y, z back again and insert equals signs along the vertical line to obtain equations. Our ultimate goal is to arrive at the augmented matrix

(2.25)    [ 1  0  0 | 124 ]
          [ 0  1  0 |  75 ]
          [ 0  0  1 |  31 ]

after a succession of row operations. The corresponding system of equations is x = 124, y = 75, z = 31, which gives the desired solution.

The first step is to obtain a 1 in the upper left-hand corner of the matrix. We can do this by interchanging the first row of the given matrix (2.24) with either the second or third row. Or, we can multiply the first row by 1/2. Interchanging the first and second rows, we get

    [ 1  -2  1 |  5 ]
    [ 2  -5  4 | -3 ]
    [ 1  -4  6 | 10 ].

The next step is to make all the remaining entries in the first column equal to zero, leaving the first row intact. To do this we multiply the first row by -2 and add the result to the second row.
Then we multiply the first row by -1 and add the result to the third row, After these two operations, we obtain (2.26) 0 -1 2|-13 qg-32 fj N -12 Now we repeat the process on the smaller matrix [ =13 which appears -2s| 5 PP adjacent to the two zeros, We can obtain a 1 in its upper left-hand comer by multiplying the second row of (2.26) by — 1 [This gives us the matrix guest 251 | 35 Multiplying the second row by 2 aft adding the resul"] to the third, we get 2.27) 101 2/5 Computation techniques 63 At this stage, the comesponding system of equations is given by x-l+ z= 5 yours a3. These equations can be solved in succession, starting with the third one and working backwards, to give us 223, yeld+2z=13+62=75, x =S+ Yor=5 + 150-31 = 1%, Or, we can continue the Gauss-Jordan process by making all the entries zero above the diagonal elements in the second and third columns. Multiplying the second row of (2.27) by 2 and adding the result to the first row, we obtain MH OOL 321 | tb | Finally, we mmiltiply the thin row by 3 and add the result to the first mow, and then multiply the third row by 2 and add the result to the second row to get the matrix in (2.25), avez 2, A system with more than one solution. Consider the following system of 3 equations in 5 unknowns: Wx — Sp tae wens 2.28) xdte - uto = 4y + 62 + 2u—v = 10, The corresponding sgmented matrix is witaa — 46M2L dsl | Smt The coefficients of x, y, % and the right-hand members are the same as those in Example 1, If we perform the same row operations used in Example 1, we: finally arrive at the augmented matrix 1 0 0-16 19 | 124 o 1 0 9 ufo 1 © 1 3 4|; 64 Linear transformations and matrices The comesponding system of equations can be solved for x, y, and x in terms of w and v, giving us 124 + 16u = 190 = 7S} Oe the z= 31 + 3u— 40, If we let w= fy and o = fy, where fy and fy are arbitrary real numbers, and determine x, Ys Z by these equations, the vector (x, y, z uy ¥) in Vs given by (2, Yo 21 My 0) = (124 + 164, = 191, 75 + 94, — Ut, 31 + 34, — tes hy 4) is a solution, By separating the parts involving 4, and f,, we can rewrite this as follows: (9% 45 ty 8) = (124, 75, 31,0,0) + (16, 9,3, 1,0) + 4(—19, 11, —4, 0, 1) This equation gives the general solution of the system. The vector (124, 75, 31,0,0) is a particular solution of the nonhomogeneous system (2.28), The two vectors (16, 9, 3, 10) and (-19, —I1,-4 , 0, 1) are solutions of the corresponding homogeneous system. Since they are independent. they form a basis for the space of all solutions of the homogeneous system. puveie 3. A system with mu solution, Consider the system 2x —Sy+42=-3 (2.29) x-ytez = 5 x—4y 452-10 This system is almost identical to that of Example 1 except that the coefficient of z in the third equation has been changed fom 6 to 5, The comesponding augmented matrix is nase | ans Applying the same row operations ysed in Example 1 to transform (2.24) into (2,27), we arrive at the augmented matrix | (230) L ooze 201 Wl | When the bottom row is expressed as an equation, it states that 0 = 31. Therefore the original system has 0 solution since the (wo systems (2.29) and (2.30) are equivalent. Inverses of square matrices 65 In cach of the foregoing examples, the number of equations did not exceed the number of unknowns, If there are more equations than unknowns, the Gauss-Jordan process. is still applicable. For example, suppose we consider the system of Example 1, which has. the solution x = 124. y = 75, z = 31. 
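The elimination steps in Examples 1 through 3 are entirely mechanical, which is why, as remarked at the start of this section, the method is easily programmed. The following sketch is one possible implementation in Python; it is not part of the text, and the routine name gauss_jordan and the use of floating-point arithmetic are choices made here purely for illustration. Applied to the augmented matrix (2.24) of Example 1 it reproduces the matrix (2.25), and applied to the matrix of Example 3 it leads to a bottom row equivalent to the contradictory equation 0 = 31.

import numpy as np

def gauss_jordan(aug):
    # Reduce an augmented matrix [A | c] using the three basic row operations:
    # (1) interchange two rows, (2) scale a row by a nonzero factor,
    # (3) add a multiple of one row to another.
    M = np.array(aug, dtype=float)
    rows, cols = M.shape
    pivot_row = 0
    for col in range(cols - 1):          # the last column holds the right-hand members
        # operation (1): find a row with a nonzero entry in this column and bring it up
        pivot = next((r for r in range(pivot_row, rows) if M[r, col] != 0), None)
        if pivot is None:
            continue
        M[[pivot_row, pivot]] = M[[pivot, pivot_row]]
        # operation (2): scale so the pivot entry becomes 1
        M[pivot_row] /= M[pivot_row, col]
        # operation (3): clear every other entry of the column
        for r in range(rows):
            if r != pivot_row:
                M[r] -= M[r, col] * M[pivot_row]
        pivot_row += 1
        if pivot_row == rows:
            break
    return M

# The augmented matrix (2.24) of Example 1.
print(gauss_jordan([[2, -5, 4, -3],
                    [1, -2, 1,  5],
                    [1, -4, 6, 10]]))
# Final matrix (2.25): rows (1, 0, 0 | 124), (0, 1, 0 | 75), (0, 0, 1 | 31).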
If we adjoin a new equation 10 this system which is also satistied by the same tiple, tor example, the equauon Zx — sy + z = 94, then the elimination process leads to the agumented matrix, 1 0 0| 124] 0 10 15| O01 (51) 000 i with a row of zeros along the bottom, But if we adjoin a new equation which is not satisfied by the tiple (124, 75, 31), for example the equation x + y + z = 1, then the elimination process leads to an augmented matrix of the form et at 0 1 0] 75 00 1| 3) O00! a where a@ 4 0. The last row now gives a contradictory equation 0 = a@ which shows that the system has no solution 2.19 Inverses of square matrices Let A = (q,) be a square nx ft matrix. If there is another mx nm matix B such that BA = I, where Z. is the m x m identity matrix, then A is called nonsingular and B is called a left inverse of A. Choose the usual basis of unit coordinate vectors in V,, and let T:V, —> V, be the linear transformation with matix m(T) = A. Then we have the following, smoesx 2.20, The matrix A is nonsingular if and only if T is invertible. If BA then B= m(T* Proof, Assume that A is nonsingular and that BA = I, We shall prove that T(x) = 0 implies x = 0. Given x such that T(x) = 0, let X be the m x 1 column matrix formed from the components of x. Since T(x) = Q, the matrix product AX is an n x 1 column matrix consisting of zeros, so B(AX) is also acolumn matrixof zeros. But B(AX) = (BA)X = IX = X, so every component of x is 0. Therefore, Tis invertible, and the equation 7-1 = Z implies that m(T)m(T3) = Z or Am(T-2) = I. Multiplying on the left by B, we find m(T>) = B , Conversely, if T is invenible, then TT’ is the identity transformation so m(T)Ym(T}is the identity matrix. Therefore A is nonsingular and m(T)A = I. 66 Linear transformations and matrices All the properties of invertible linear transformations have their counterparts for non- singular matrices, In particular, lett inverses (it they exist) are unique, and every lett inverse is also a right inverse, In other words, if A’ is nonsingular and BA = I, then AB = I. We call B the inverse of A and denote it by A~4, The inverse AM! is also non- singular and its inverse is A. Now we show that the problem of actually determining the entries of the inverse of a nonsingular matia is cquivaient i solving separate nonbomogencous Uncar system Let A = (q,,) be nonsingular and let 4-1 = (bj) be its inverse. The entries of A and A are related by then? equations (231) > dubs = where 8, =1if § =j, and 4, = 0 if i # j. For cach fixed choice of j, we can regard this as a nonhomogeneous system of 7 linear equations in m unknowns by, by;,..., Bay. Since A. is nonsingular, each of these systems has a unique solution, the jth column of B. All these systems have the same coefficient-matrix A and differ only in their right members. For example, if A is a 3 x 3 matrix, there are 9 equations in (2.31) which can be expressed as 3 separate linear systems having the following augmented matrices : My ae a | 1 a a2 a] 0 4 42 a | 0 Ay ey | 0), My Grn ey | 1], Ce x, M32 Ass | 0. Ax 2 O59 | O. 4 My aq | 1 If we apply the Gauss-Jondan process, we arrive at the respective augmented matrices 10 Ola, 10 Of bs 10 0/4, Oi Olt) |0 1 O]s,), [0 1 0/5, 0 0 1 | Sz, 0 0 1] bye, 0 0 1) > and solve all three systems at once by working with the enlarged matrix % G2 a3] 1 0 0, dy a3 | 0 1 0 A yy dg | 0 0 1 The elimination process then leads to 10 Of dy by As O 1 O] by bee bos}. 
[0 0 1] bn Par bao} Exercises or The matrix on the right of the vertical line is the required inverse, The matrix on the left of the line is the 3 x 3 identity matrix. It is not necessary to know in advance whether A is nonsingular, If A is singular (not nonsingular), we can still apply the Gauss-Jordan method, but somewhere in the process one of the diagonal elements will become zero, and it wil] not be possible to tansform A to the identity matrix. A system of 7 linear equations in unknowns, say Saan =e, 1 can be written more simply as a matrix equation, AX= C, where A = (ay) is Ue coeflicient mauix, and X and C are column mauiees, aL coy x oe If A is nonsingular there is a unique solution of the system given by ¥ = AC, 2.20 Exercises Apply the Gauss-Jordan elimination process to cach of the following systems. If a solution exists, determine the general. solution RK dy + Set = K+ ype 32+ 20 6xt p—42 $3 23x +2 tz xty-3rt+ u Sx +3y Bz Qe-y+ r= xd+y- 2 Ix ty Te +3u=3. 3. 3x + 2y +221 7. x4 ye dee U4 do =0 Sx+3y Bz =2 xt Wot Te + Mustdv=o Tx + dy + Sz e+ 3y + 6 + Iu + 150 =O. 43x42 +221 xm +Zewe-2 5x +3y Bz RX s3yeZ- Sus 9 Tx dy + $2 4x-ytz- us 5 X+ Y= 250 MX ays 2eyue 3 9. Prove that the system x +y +22 =2, 2x —y +32 =2, 5x —y +az =6, has a unique solution if'a y 8. Find all solutions when a = 8. 10. (@) Dewuuiue aii suiutivus of ile sysiea Sx + 2y — 62 + 2u xo yt zo oue 68 Linear transformations and matrices (b) Determine all solutions of the system 5x +2y — 62 +2u = -1 xX-y+t zZ- u=-2 Reyes = Gt 1 LL, ‘Ihis exercise fells how to determine all nonsingular 2 x 2 matrices. Prove that CL ee Deduce that [ 1 is nonsingular if and only if ad = be ¥ 0, in which case its inverse is 1 a 6 Me ele gt: Determine the inverse of each of the matrices in Exercises 12 through 16. 234 234 sip 123 w| 211]. 15, 14 0 0 1 ~ ooot 12 2 010000) 13.}2 -1 1 roscoe 03010 13 2. 16, i 0010 20) P21 000301 )-2 5 4). 000020 i+ 6 2.21 Miscellaneous exercises on matrices 1. Tf a square matrix has a row of zeros or a column of zeros, prove that it is singular. 2. For each of the following statements about m x m matrices, give a proof or exhibit a counter example. (a) If AB + BA = 0, then 2B = BAA, (b) If A and B are nonsingular, then A + B is nonsingular. (©) If A and B are nonsingular, then AB is nonsingular. (@) If A, B, and A + B are nonsingular, then A — B is nonsingular, (©) If A? = 0, then A — Z is nonsingular. (0 If the product of & matrices A, ...A, i8 nonsingular, then each matnx A, 18 nonsingular. Miscellaneous exercises on matrices 69 12 6 0 3. 1fA = [ |: find a nonsingular matrix P such that PAP = h |: 54 0-1 ai : 4, The matrix A = s where # = -1,a =4(1 + ¥/5),andb = 4(1 — V5), has the prop- ib erty that A? = A. Describe completely all 2 x 2 matrices A with complex entries such that Aad. 5. If A? = A, prove that (A+ D* =Z + (2° = VA 6. The special theory of relativity makes use of a set of equations of the form x’ = a(x ~ vf), y= y.2'=2, t= alt ~ exje®). Here v represents the velocity of a moving object, ¢ the speed of light, and a — ¢f,/ — 0, where |u| < © . The Tinear transformation which maps the two-dimensional vector (x, #) onto (x', C) is called a Lorentz transformation. ts matrix relative to the usual bases is denoted by L(v) and is given by L(v) = hat L(0)— 7B (u + oe/(we + 2). In other words, the product of two Lorenz transformations is another Lorentz. transformation. 7. 
If we interchange the rows and columns of a rectangular mauix A, the new matix so obtained is called the transpose of A and is denoted by At, For example, if we have 14 123 then At=|2 5}. =) 5 of a> 3 6 Prove that transposes have the following properties (a) (Al = AL (b) (A + B)i= Ab+ Bt, (c) (cA)! = cA‘ (d) (4B) = BtAt, (e) (A) = (44) if Ais nonsingular. 8 A square matrix A is called an orthogonal matrix if AA! == 1. Verify that the 2 x 2 matrix sin@ co 0 that its rows, cOnsidered as vectors in V,, form an orthonormal set. 9. For cach of the following statements about n x m matrices, give a proof or else exhibit a counter example, (a) IF A and B are orthogonal, then A + B is orthogonal. (b) If A and B are orthogonal, then AB is orthogonal. (©) If A and B are orthogonal, then B is orthogonal. 10, Hadumard matrices, named for Jacques Hadamard (1865-1963), are those m x m matrices with the following properties: 1. Each entry is 1 or -1 IL, Bach row, considered as a vector in ¥, , has length «/n, IIL. The dot product of any two distinct rows is 0. Hadamard matrices arise in certain problems in geometry and the theory of numbers, and they have been applied recently to the construction of optimum code words in space com- unicaion, in spie of heir apparent simplicity, Gey present nuny unsvived probiens. The cos 6 sin 6 js orthogonal for cach real 6, If A is any m x 7 orthogonal matrix, prove 70 Linear transformations and matrices main unsolved problem at this time is 10 determine all » for whieh an xm Hadamard matrix exists, This exercise cuilines a partial solution (@) Determine all 2 x 2 Hadamard matrices (there are exactly 8). (b) This part of the exercise outlines a simple proof of the following theorem: If A is ann lemmas conceming vectors in n-space. Prove each of these lemmas and apply them to the rows of Hadamard matrix to prove the theorem, tema 1. If X,Y,Z are orthogonal vectors inV,,, then we have (K+ YH 42) = |x|. amma 2. Write X=, ...6s Xs Y= (a o--es Iy)s Z= (veces Ze)» Uf cach component X;,¥;,2;i8 either | or=1, then the product (x; + y,)(%;+ 2)) is either Oor 4. 3 DETERMINANTS 3.1 Introduction In many applications of linear algebra to geometry and analysis the concept of a determinant plays an important part, ‘This chapler studies the basic properties of determi- nants and some of their applications. Determinants of order two and three were intoduced in Volume Tas a useful notation for expressing certain formulas in a compact form, We recall ‘that a determinant of oder two was defined by the formula lan as] GD e922 Aida : a, a ‘ Despite similarity in notation, the determinant |(written with vertical bars) is 141 gz conceptually distinct from the matrix [ ** (yrttten with square brackets). The ay Aap, determinant is a number assigned (0 the mattix acconding to Fomula @.1), ‘To emphasize this connection we also write an as = a aa Gea, Determinants of order three were defined in Volume I in tems of second-order detemi- nants by the formula Mm ty as (3.2) det} ao dan aa Lan an ase a ae an aon This chapter teats the more general case, (he determinant of a square matrix of order n for any integer n > 1. Our point of view is to teat the: determinant as a function which aw 72 Determinants assigns to cach square matrix A a number called the determinant of A and denoted by det A. It is possible to define this function by an explicit formula generalizing (3.1) and (3.2). This formula is a sum containing n! products of entries of A. 
For large m the formula is unwieldy and is rarely used in practice, It seems preferable to study determinants from another point of view which emphasizes more clearly their essential properties. ‘These Properties, which are important in the applications, will be taken as axioms for a determi- nant function, Initially. our program will consist of three parts: (1) To motivate the choice of axioms, (2) To deduce further properties of determinants from the axioms, @) To prove that there is one and only one function which satisfies the axioms 3.2. Motivation for the choice of axioms for a determinant function In Volume I we proved that the scalar tiple product of three vectors A,, A,, A, in 3-space can be expressed as the determinant of a matrix whose rows are the given vectors. Thus we have Ny Tait dt doy ae Elon, dy, Oy, where Ay = (ayy, dies dy9)s Aa = (das, dons das), ANd Ay = (Ay, ans dss)» If the rows are linearly independent the scalar tiple product is nonzero; the absolute value of the product is equal (© the volume of the parallelepiped determined by the three vectors A, A, A, If the rows are dependent the scalar tiple product is zero, In this case the vectors A, A, A, are coplanar and the parallelepiped degenerates to a plane figure of zero volume. Some of the properties of the scalar triple product will serve as motivation for the choice of axioms for a determinant function in the higher-dimensional case. To state these properties in a form suitable for generalization, we consider the scalar triple product as a function of the three row-vectors A, de, 4g. We denote this function by d; thus, (Ay, Agy As) = Ay X Ags Ape We focus our attention on the following properties: (a) Homogeneity in each row. For example, homogeneity in the first row states that AIA, Ag, Ap) = 1 d(Ay, dy, Ag) for every scalar ¢ . (b) Additivity in each mw. For example, additivity in the second row: states that Ay, Aa 4 Cy Ag) = dlAy, Aa, As) + UAL, C, As) for every vector C. fo) The seal (d) Normalization: ple di, j,k) = 1, A set of axioms for a determinant function B Each of these properties can be easily verified from properties of the dot and cross product, Some of these properties are suggested by the geometric relation between the scalar triple product and the volume of the parallelepiped determined by the geometric vectors Ay» Aay Ay. The geometric meaning of the adkitive property (b) in a special case is of particular interest. If we take C = A, in (b) the second term on the right is (©), and. relation (6) becomes (3.3) Ay, Ap + Ay, An) = Ar, A, AD). This property is illustrated in Figure 3.1 which shows a parallelepiped determined by A, A,, A, and another parallelepiped determined by A,, A, + Az, A, tquation (3.3) merely states that these two parallelepipeds have equal volumes. This is evident geometri- cally because the parallelepipeds have equal altitudes and bases of equal area. Volame = (A... Nay A) Volume «dita, Ay + A, As) Provre 8.1 Geometric interpretation of the property d(A,,A,, 4g)= d(4y, A, + Ay Ay). The ‘wo parallelepipeds have equal volumes, 3.3 A set of axioms for a determinant function The properties of the scalar wiple product mentioned in the foregoing section can be suitably generalized and used as axioms for determinants of order m. If A = (a) is an ox aun wit wai UL CoUupica cuties, we Ueuuic its tows by Ay... Ay. Thus, tie ith row of A is a vector in n-space given by A = (ays Aes. 
+ + On) ‘We regard the determinant as a function of the ™ rows A,, ..., A,, and denote its values by d(Ay,...,A) or by det A. AXIOMATIC DEFINITION OF A DETERMINANT FUNCTION. A real- or complex-valuedfunction 4, defined for cach ordered n-tuple of vectors A,,..., A, in mspace, is called a determinant function of order nt if it satisfies the following axioms for ail choices of vectors Ay, , .., Ay and C in n-space: 74 Determinants AXIOM I, HOMOGENEITY IN EACH Row, Ifthe kth row A, is multiplied by a scalar 1, then the determinant is also multiplied by 1: d..., tay, (AS): axrom 2, appitrvity x sacn now. For each k we have dC, 4) -dd A tea, A). AXIOM 3, ‘THE DETERMINANT VANISHES IF ANY TWO ROWS ARE EQUAL: dA. Ag) =O if Ay = Ay for some i and j with i ¥ j. AXIOM 4, ‘THE DETERMINANT OF THE IDENTITY MATRIX IS EQUAL ‘TO 1; aa. 1) where 1, is the kth unit coordinate vector. ‘The first two axioms state that the determinant of a matrix is a linear function of each of its rows. This is often described by saying that the determinant is a multilinear function of its rows. By repeated application of linearity in the first row we can write (346s. day essa) =E tells ares Ads where t,,... fy are scalars and Cy... ,C, are any vectors in n-space. Sometimes a weaker version of Axiom 3 is used. AXIOM 3’, THE DETERMINANT VANISHES IF TWO ADJACENT ROWS ARE EQUAL: Ay A)=0 if Ay = Aga forsomek =1,2 ,..., 0-1 It is a remarkable fact that for a given n there is one and only one function d which satisfies Axioms 1, 2, 3° and 4. The proof of this fact, one of the principal results of this chapter, will be given later, The next theorem gives properties of determinants deduced from Axioms 1,2, and 3° alone. One of these properties is Axiom 3. It should be noted that Axiom 4 is not used in the proof of this theorem, This observation will be useful later when we prove uniqueness of the determinant function. qaworeM 3.1. A determinant function satisfying Axioms 1, 2, and 3° has the following further properties: (a) The determinant vanishes if some row is 0: dA. 4)=0 if A= O — forsomek A set of axioms for a determinant function (©) The determinan changes sign if two adjacent rows are (merchanged: Uo... Av Ansrer. J= dle... Agi An es) (c) The determinant changes sign if any two rows A, and A, with i 7 j are interchanged. (a) The determinant vanishes if any two rows are equal: dAy,.-.54,)=0 if Ay= Ay for some i and j with i #j. (e) The determinant vanishes if its rows are dependent. Proof. To prove (a) we simply take ¢ = 0 in Axiom 1, To prove (b), let B be a matrix having the same rows as A except for row k and row k + 11. Let both rows By and By be equal © Ay+ Agi. Then det B = 0 by Axiom 3°. Thus we may write a... Ag+ Agere Ae + Aga Applying the additive property to row & and to row k + 1 we can rewrite this equation as follows : AW. Ay Ags ee) Fes Aes Avsay oe) FC... Atay dese) +l... Ags Angas + The first and fourth terms are zero by Axiom 3°, Hence the second and third terms are negatives of each other, which proves (b). To prove (c) we can assume that i 2, the square matrix of order n — 1 obtained by deleting the kth row and the jth column of A is called the k, j minor of A and is denoted by soins. A mattix A = (q,,) of order 3 has nine minors. 
Three of them are [412 5) fe 7] [aes doa) Ane |; Ana |™ — ana |" | Gyn ys, Gy, ge, Equation (3.2) expresses the determinant of a3 x 3 matrix as a linear combination of determinants of these three minors, The formula can be written as follows: det A =a, det A,, = ayy det A, + ais det A,, . The next theorem extends this formula to the nxn case for any n> 2. THEOREM 3.9. EXPANSION BY KTHROW MINORS, For any a x mmatrix A, m2, the cofactor of dy, is related to the minor Ay, by the formula (G20) det Aj; = (—1)F' det Ay. Therefore the expansion of det A in terms of elements of the kth row is given by 621) det A= 3 (— 1 May, det As. TA Proof. We illustrate the idea of the proof by considering first the special case k = j = The matrix 4/, has the form 88 Determinants By applying clementary sow operations of ype (3) we can make every cnty below the Lin the first column equal to zero, leaving all the remaining entries intact. For example, if we multiply the first row of A;, by as and add the result to the second row, the new second row becomes (0, dyas daa... , dp,). By a succession of such elementary row operations we obiain a new matrix which we shall denote by A, and which has the form 1 0 0 0 a22 dan 0 ase ax O ays 7 “| Since row operations of type (3) leave the determinant unchanged we have det A= det Aja. det AY, = det Ayr, where A,, is the 1, 1 minor of A, tye 8" 4, Therefore det A’, =det 4;1, which proves (3.20) for k . ‘We consider next the special case k = 1 yf arbitrary, nd prove that (3.23) det Ai,= (= 1)" det As,. Once we prove (3.23) the more general formula (3.20) follows at once because matrix Al, can be transformed to a mattix of the form Bi, by k ~ 1 successive interchanges of adjacent rows, The determinant changes sign at each interchange so we have (3.24) det Aj; = (- 1) det Bi, where B is an nxn matrix whose first row is J, and whose 1, f minor By, is Ay). By (3.23), we have det Bi, = (— 1)" det By = (1) det Ay; so (3.24) gives us det Ay, = (— "(= 1)" det 4g; = (— I)*? det Ans. Therefore if we prove (3.23) we also prove (3.20). Expansion formulas for determinants. Minors and cofactors 89 We turn now to the proof of (3.23). The matrix A‘, has the form 6 0 Gy. ay ttt Aan 4] : : Lay Amy and By elementary row operations of type (3) we introduce a column of zeros below the 1 and transform this 1 0 O 1 0 ts 0 a la a, Ay = Gm 2 Aas O Ans. Ann As before, the determinant is unchanged so det AY, = det A,;. ‘The 1, j minor Ay, has the form Fat) doggy dagen | By 9" An ga Angee 8" Ann Now we regard det 48, as a function of the 2 — 1 rows of Ayj, say det A%, = f(Ayj) . The functionfsatisfies the first three axioms for a determinant function of order n — 1, There- fore, by the uniqueness theorem we can write 3.25) S(Ay) = £U) det Ais» where J is the identity matrix of onder m — 1, Therefore, 0 prove (3.23) we must show that fd) = (= 1) Now (J) is by definition the determinant of the matrix 0: 010-0 C=l0 100 0 |< jth row 0 00 Lr 0 0-000 1 jth column 90 Determinants The entries along the sloping lines are all 1, The remaining entries not shown are all 0, By interchanging the first row of C successively with rows 2, 3... ,f we arive at the nx n identity mavix after j = 1 interchanges. The determinant changes sign al each inte change, so det C = ¢ 1), Hence f(J) =~ 1M, which proves (3.23) and hence (3.20). 
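Formula (3.21) translates directly into a recursive procedure for evaluating determinants, although, because of the n! growth noted in Section 3.1, it is far less efficient than reduction by row operations. The sketch below is illustrative Python, not part of the text; the helper names minor and det_expand are invented here.

def minor(A, k, j):
    # The k, j minor: delete row k and column j (rows and columns numbered from 1).
    return [[A[r][c] for c in range(len(A)) if c != j - 1]
            for r in range(len(A)) if r != k - 1]

def det_expand(A, k=1):
    # Expansion in terms of elements of the k-th row, formula (3.21):
    #   det A = sum over j of (-1)**(k + j) * a_kj * det(A_kj).
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (k + j) * A[k - 1][j - 1] * det_expand(minor(A, k, j))
               for j in range(1, n + 1))

# Any choice of k gives the same value, as Theorem 3.9 asserts.
A = [[2, -5, 4], [1, -2, 1], [1, -4, 6]]
print(det_expand(A, 1), det_expand(A, 2), det_expand(A, 3))   # prints 1 three times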
3.13 Existence of the determinant function In this section we use induction on n, the size of a matrix, to prove that determinant functions of every onder exist, For n = 2 we have already shown that a determinant function exists. We can also dispense with the case n = 1 by defining det [a] = a., - Assuming a determinant function exists of order n — 1, a logical candidate for a deter- minant function of order m would be one of the expansion formulas of Theorem 3.9, for example, the expansion in terms of the fistrow minors. However, it is easier to verify the axioms if we use a different but analogous formula expressed in terms of the firs-column minors. THEOREM 3,10. Assume determinants of order n — 1 exist. For any n x n matrix A=(aj,), let f be the function defined by the formula 326) fa, 4 iC Lan det An. ‘then t satisfies all four axiomsfor a determinanifunction of order n, Therefore, by induction, determinants of onder n exist for every n. Proof. We regard each term of the sum in (3.26) as a function of the rows of A and we write SW Ars 22 Ad = (— Day det Ap - If we verify that cach f; satisfies Axioms 1 and 2 the same will be tuc for f. Consider the effect of multiplying the first row of A by a scalar t, The minor A,, is not affected since it does not involve the first row. The coefficient a,, is multiplied by 1, so we have AGA, Ags... Ad = ty det A, = tfi(di,.- +» Ad. If} > 1 the first row of cach minor Ay, gets muliptied by f and the coefficient ayy is not affected, so again we have Si(tArs Aas +s An) = HilArs Aas ves An)» ‘Therefore each f; is homogeneous in the first row. If the kth row of A is multiplied by t, where k > 1, the minor A,, is not affected but dy is multiplied by t, so f, is homogeneous in the kth row. If j x , the coefficient a), is not affected but some row of Ay, gels multiplied by #. Hence every f; is homogeneous in the ‘The determinant of a transpose 91 A similar argument shows that cach f, is additive in every row, so £ satisfies Axioms 1 and 2. We prove next that £ satisfies Axiom 3', the weak version of Axiom 3, From Theorem 3.1, it then follows that £ satisfics Axiom 3. ‘To verify that. £ satisfies Axiom 3’, assume two adjacent rows of A are equal, say A, = Aas. Then, except for minors Ag, and Axia, each minor Ay, has two equal rows so det Aj) =0. Therefore the sum in (3.26) consists only of the two terms comesponding to jokandj=kt1, 621) LUAyy 6) A) = (“DMs det Age C1 Paya, det A,,,., But Ayy = Aggaa and dy = duis since A, = A,,,.. Therefore the two terms in (3.27) differ only in sigh, sof (4... , A) = 0. Thus, £ satisfies Axiom 3° Finally, we verify that £ satisfies Axiom 4, When A = 1 we have a,, = 1 and a,, = 0 for > 1. Also, A., is the identity matrix of order n~ 1, so each term in (3.26) is zero except the first, which is equal to 1, Hence £ (y,..., J,)=1 sof satisfies Axiom 4, In the foregoing proof we could just as well have used a functionfdefined in terms of the kth-column minors 4,, instead of the first-column minors 4,,.. In fact, if we let 8.28) f(A. >A) = 3 (= "ay det Am, rs exacily the same type of proof shows that this £ satisfies all four axioms for a determinant function, Since determinant functions are unique, the expansion formulas in (3.28) and those in (3.21) are all equal to det A. The expansion formulas (3.28) not only establish the existence of determinant functions but also reveal a new aspect of the theory of determinants-a connection between raw. properties and column-properties. 
‘This connection is discussed further in the next section, 3.14 The determinant of a transpose Associated with each matrix A is another matrix called the transpose of A and denoted by 4‘. The rows of 4¢ are the columns of A. For example, if 14 123 Az » then Atm ]2 5], 4576 3 6 A formal definition may be given as follows. DEFINITION OF TRANSPOSE ‘The transpose af anim Xn matrix A = (a,)%", is the n Xm matrix A! whose i, j entry is ay, Although transposition can be applied to any rectangular matrix we shall be concerned primarily with square matrices. We prove next that transposition of a square matrix does fot alter its determinant, 92 Determinants mazonen 3. 1], For any nx n matrix A we have det A= det A! Proof. The proof is by induction on n. For m= 1 and n = 2 the result is easily verified, Assume, then, that the theorem is trie for matrices of order 1 — 1. Let A = (ay) and let B = A! = (6). Expanding det A by its firsteolumn minors and det B by its firstrow minors we have det A= S(-1) ay det Aj, det B= ¥ (1); det By a iat But from the definition of transpose we have by, - a, and By - (dy)‘ Since we are assuming the theorem is true for matrices of onder m = 1 we have det By = det Ay. Hence the foregoing sums are equal term by term, so det A = det B. 3.15 The cofactor matrix ‘Theorem 3.5 showed that if A is nonsingular then det A 3 0. The next theorem proves the converse. That is, if det A 3 0, then AW exists. Moreover, it gives an explicit formula In Theorem 3.9 we proved that the i, j cofactor of a,, is equal to (— I) det A,, , where Aj, is the i j minor of A. Let us denote this cofactor by cof ay. Thus, by definition, ie At cof ayy = (— 1 det Ay. DEFINITION OF THE COFACTOR MATRIX. The matrix whose i, j entry is cof ay is called the cofactor matrix? of A and is denoted by cof A. Thus. we have cof A = (cof a) 1 = (( = IY det Ay saa The next theorem shows that the product of A with the transpose of its cofactor matrix is, apart from a scalar factor, the identity matrix J, agorem 3.12, For any nx m matrix A with n > 2 we have (3.29) A(cof A) = (det AVI. In particular, if det A 0 the inverse of A exists and is given by At = (or Ay. +n much of the matrix literature the transpose of the cofactor matrix is called the adiygate of A. Some of the older literature cals it the adjoint of A. However, current nomenclarure reserves the term adjoint for an caiiicty diffescid object, discussed ia Section 5.8 Cramers’ rule 98 Proof. Using Theorem 3.9 we express det A in terms of its kthkrow cofactors by the formula (3.30) det A = > ay, cof ay;. Ea Keep fixed and apply this relation to a new matrix B whose ith row is equal to the kth row of A for some i # h, and whose remaining rows are the same as those of A. Then det B = 0 because the ith and kth rows of B are equal, Expressing det B in terms of its ith-row cofactors. we have (8.31) det B= ¥ by, cof by, = 0 But since the ith row of B is equal to the kth row of A we have Big= a, and — cof by = colay for every j. Hence (3.31) states. that (3.32) Eauations (3.30) and (3.32) together can be written as follows: S dt A if i=k (8.33) Dar, cof a, i verse. But the sum appearing on the left of (3.33) is the h, i entry of the product A(cof A)! There- fore (3.33) implies (3.29). As a direct corollary of Theorems 3.5 and 3.12 we have the following necessary and sufficient condition for a square matrix to be nonsingular, auronsn 3.13. A square matrix A is nonsingular if and only if det A #0. 
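Theorem 3.12 can be checked numerically. The following sketch is illustrative Python, not part of the text; the routine name inverse_via_cofactors is invented here, and NumPy's built-in determinant routine is used in place of the expansion formulas. It forms cof A entry by entry and returns (cof A)^t / det A, assuming det A is nonzero as Theorem 3.13 requires.

import numpy as np

def inverse_via_cofactors(A):
    # A^{-1} = (1 / det A) * (cof A)^t, as in Theorem 3.12; assumes det A != 0.
    A = np.array(A, dtype=float)
    n = A.shape[0]
    cof = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            # the i, j minor: delete row i and column j
            Mij = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(Mij)
    return cof.T / np.linalg.det(A)

A = [[2, -5, 4], [1, -2, 1], [1, -4, 6]]
print(inverse_via_cofactors(A))     # agrees with np.linalg.inv(A)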
3.16 Cramer’s rule Theorem 3.12 can also be used | give explicit formulas for the solutions of a system of linear equations with a nonsingular coeflicient matrix. The formulas are called Cramer's rule, in honor of the Swiss mathematician Gabriel Cramer (1704-1752). THEOREM 3,14. CRAMER'S RULE. If a system of n linear equations in n unknowns MisveeaXny Me (i 12,0... m) 94 Determinants has a nonsingular coefficient-matrix A = given by the formulas ,;) , then there is a unique solution for the system Dba, or iai). on (3.34) det a Proof. The system can be written as a matrix equation, x by where X and B are column matrices, ¥ =] +], B=] +]. Since A is nonsingular there is a unique solution X given by A 3.35 X = A"B=—— (cof A)'B. G39) ica” 7 The formulas in (3.34) follow by equating components in (3.35). It should be noted that the formula for x, in (3.34) can be expressed as the quotient of two determinants, where C; is the matrix obtained from A by replacing the jth column of A by the column matrix B. 3.17 Exercis s 1. Determine the cofactor matrix of cach of the following matrices 301 2 4 2 0 1 1 -2 6} 2 3 2 3 2. Determine the inverse of each of the nonsingular matrices in Exercise 1. 3. Find all values of the scalar 4 for which the matrix J = A is singular, if A is equal to 10 2 uu -2 8 Oo 3 «| > (b) Jo 2- 2 2. 0. 8 2 Exercises 95 4,1f Ais an mx m matrix with 1 > 2, prove each of the following, properties of its cofactor matrix: (a) cof (A4) = (cof A)! (b) (of A)'A = (det A). (©) Alcof A)! = (eof A)‘A (A commutes with the transpose of its cofactor matrix). 5. Use Cramer’s rule to solve each of the following systems: Qx-y +4257, yt2=l Bx-y-z=3, ke t+Sy43z=4, 6. (a) Explain why each of the following is a Cartesian equation for a straight line in the xy-plane passing through two distinet points (x, y,) and (x,, J) x yt | 7 | =0; det/x y, 1] <0. Xo Mh Jah a () State and prove comesponding relations for a plane in 3-space passing through three distinct points. (©) State and prove comesponding relations for a cirele in the xy-plane passing through three noncolinear points 7. Given 1% functions f,, each differentiable on an interval (a, 5), define F(x) — det [f,(x)] for each x in (@, b). Prove that the derivative F'(x) is a sum of 1 determinants, FOO = D det AW), (x) 2% (x), where A,(x) is the matrix obtained by differentiating the functions in the ith row of [f,(x)]. 8. An mx n matrix of functions of the form W(x) = {u/"'G)], in which each row after the first is the derivative of the previous row, is called a Wronskian matrix in honor of the Polish mathe- matician I. MH, Wronski (1778-1853). Prove that the derivative of the determinant of W(x) is the determinant of the matrix oblained by differentiating each entry in the last row of W(x), [Hing Use Exercise 7.] 4 EIGENVALUES AND EIGENVECTORS 4.1 Linear transformations with diagonal matrix representations Let 1; V— V be a linear transformation on a finite-dimensional linear space V. ‘Those properties of 7 which are independent of any coordinate system (basis) for V are called imrinsicpropenies of T. They awe shared vy ali the mauix represemuaions of 7. 
If a basis can be chosen so that the resulting matrix has a particularly simple form it may be possible to detect some of the intrinsic properties directly from the matrix representation, Among the simplest types of matrices are the diagonal matrices, Therefore we miight ask Whether every linear transformation has a diagonal matrix representation, In Chapter 2 we treated the problem of finding a diagonal matrix representation of a linear transfor mation V—» W, where dim V =” and dim W = m . In Theorem 2.14 we proved that there always exists a basis (e;,..., &) for V and a basis (Wy)... .. W,,) for W such that the matrix of T relative t© this pair of bases is a diagonal matrix, In particular, if W = V the matrix will be a square diagonal matrix. ‘The new feature now is that we want to use the same basis for both Vand W. With this restriction it is not always possible to find a diagonal matrix representation for 7. We tm, then, to the problem of determining which trans- formations do have a diagonal matrix representation Notation: If A= (aj,) is a diagonal matrix, we write A = diag (@3, 425+. +» a) It is easy to give a necessary and sufficient condition for a linear transformation to have a Giagonal matrix representation, vHborem 4,1, Given a linear transformation T; V— V, where dim V = n. If T hay a diagonal matrix representation, then there exists an independent set of elements uw, in V and a corresponding set of scalars Ay,..., dy such that (4) TG) 4a for ko 12, Conversely, if there is an independent set Uy,..., ty in Vand a corresponding set of scalars Ayy eves dy satisfying (4.1), then the matrix A = diag (A1,..-44,) is a representation of T relative to the basis (ty... +5 Us): 96 Bigenvectors and eigenvalues of a linear transformation 7 Proof, Assume first that 7 has a diagonal matrix representation A = (aj) relative 10 some basis (€,,.. . 63). The action of T on the basis cloments is given by the formula Tex) = % aati= Arner since ay = 0 for i A & . This proves (4.1) with up = ep and a, = a,, . Now suppose independent elements 1; , + My, and scalars A, » 2, exist satisfying (4.1). Since uj...» My ame independent they form a basis for V. If we define’ a,, = A, 8 dy, = 0 for i # k, then the matrix A = (a,) is a diagonal matrix which represents T relative to the basis (js... , uy)s Thus the problem of finding a diagonal matrix representation of a linear transformation hhas been transformed to another problem, that of finding independent elements 4, .. - . tly and scalars yy... 5 1, © satisfy (4.1), Elements uy and scalars Ay satisfying 4.1) are called eigenvectors and eigenvalues of T’, respectively. In the next section we study eigenvectors and eigenvalues in a more general setting. 4.2 Eigenvectors and eigenvalues of a linear transformation In this discussion V denotes a linear space and $ denotes a subspace of V. The spaces S' and V are not requited © be finite dimensional, vermurren. Let T:S— Vbe a linear transformation of Sinto V. A scalar Ais called an eigenvalue of Tif there is a nonzero element x in S such that (42) Tx) = Ax. ‘The element x is called an eigenvector of Tbelonging to 1. The scalar A is called an eigenvalue corresponding to x. ‘There is exactly one eigenvalne corresponding to a given eigenvector x. In fact, if we have T(x) = Ax and Tx) = px for some x # O, then Ax uxso =m. Note: Although Equation (4.2) always holds for x = 0 and any scalar 1, the definition excludes 0 as an eigenvector. 
One reason for this prejudice against 0 is to have exactly one cigenvalue 2 associated with a given eigenvector x. The following examples illustrate the meaning ol these concepts. naeir 1, Multiplication by a fixed scalar. Let T: S—* V be the linear transformation defined by the equation TY) = cx for each x in S, where ¢ is a fixed scalar, In this example every Monzero clement of S is an eigenvector belonging to the scalar c. + The words eigenvector und eigenvalue are partial translations of the German words Eigenvektor and Figenwert, respectively. Some authors use the (erms characteristic vector. OF prover vector as SvNOnVMS for eigenvector. Eigenvalues are also called characteristic values, proper values, OF latent roots. 08, Eigenvalues and eigenvectors mame 2. The eigenspace E(A) consisting of all x such that T(x) = Ax. Let T: S > V be a linear wansformation having an eigenvalue 2. Let B(A) be the set of all elements x in S such that Tix) = Ax. This set contains the zero element 0 and all eigenvectors belonging to il. It is easy to prove that E(/) is a subspace of S, because if x and y are in E(A) we have Tlax + by) = aT(x) + Ty) = aix + bly = ax + by) for all scalars a and b. Hence (ax + by) € E(/) so E(A) is a subspace. The space E(1) is called the eigenspace corresponding to 2. I may be finite- or infinite-dimensional. If E(A) is finite-dimensional then dim E() > 1 , since E(1) contains at least one nonzero element x corresponding 1 1 waexe 3. Existence of zero eigenvalues. If an eigenvector exists it cannot be zero, by Gefinition. However, the zero scalar can be an eigenvalue, In fact, if 0 is an eigenvalue for x then T(x) = Ox = 0, so x is in the null space of 7. Conversely, if the null space of T contains any nonzero elements then each of these is an eigenvector with eigenvalue 0, In general, £(A) is the null space of 1 — AL. exwpre 4. Rejection in the xy-plane. Let S = V the xy-plane, That is, let Tact on the basis vectors i, j, k as follows: (i) = i, TH) = 1(k) = -k. Every nonzero vector in the xy-plane is an eigenvector with eigenvalue 1 ‘The remaining eigenvectors are those of the form ck, where ¢ 0 ; each of them has eigenvalue — 1 VR) and let T be a reflection in saves 5. Rotation of the plane through afixed angle a. This example is of special interest because it shows that the existence of cigenyectors may depend on the underlying ficld of scalars. The plane can be regarded as a linear space in two different ways: (1) As a 2- dimensional real linear space, V = V,(R), with two basis elements (1, 0) and (0, 1), and with real numbers as scalars, or (2) as a dimensional complex linear space, V = V,(C), with one basis element 1, and complex numbers as scalars. Consider the second interpretation first. Each element zs 0 of V4(C) can be expressed in polar form, z = re, If T rotates z through an angle % then T(z) = rel@ts) = oltz, Thus, cach x 7 0 is an eigenvector with eigenvalue 4 = e'. Note that the eigenvalue e is not real unless iy an integer mulipie of Now consider the plane as a real linear space, V,(R). Since the scalars of V,(R) are real numbers the rotation T has real eigenvalues only if a is an integer multiple of 7, In other words, if @ is not an integer multiple of a then 7 has no real eigenvalues and hence no eigenvectors, Thus the existence of eigenvectors and eigenvalues may depend on the choice eV, ore muwiz 6. The differentiation gperator. 
Let V be the linear space of all real functions f having derivatives of every order on a given open interval, Let D be the linear transfor mation which maps each f onto its derivative, D(f) = f '. The eigenvectors of D are those nonzero functions f satisfying an equation of the form envectors and eigenvalues of a linear transformation 99 f= for some real 4, This is a first order linear differential equation, All its solutions are given by the formula f(x) = ce*, where © is un abitmy real constant. Thetefore the eigenvectors of D are all exponential functions f(x) == ce** with c # 0 . The cigenvalue corresponding to fix) = ce is 4. In examples like this one where V is a function space the eigenvectors ate called eigenfunctions. exmerx 7. The integration operator. Let V be the linear space of all real functions continuous on a finite interval fa, b]. Iffe V define g = 7() w be that function given by ay = [sod if asxsr. The eigenfunctions of 7 Gf any exist) awe those nonzerofsatisfying an equation of the form 43) fF 10 a= fo) for some real 4. If an eigenfunction exists we may differentiate this equation to obtain the relation f(x) = af"(x), from which we find f(x) = ce*!*, provided 7 # 0. In other word: the only candidates for eigenfunctions are those exponential functions of the form f(x) ce*!* with © #0 and 7 # 0. However, if we put x= a in (4.3) we obtain 0 = Af(a) = Kee”. Since e4!* is never zero we see that the equation 71) = Af cannot be satisfied with a non- ze f, so Thas no eigenfunetions and no eigenvalues. wowexz 8 The subspace spanned by an eigenvector. Let T: § > V be a linear trans- formation having an eigenvalue 1. Let x be an eigenvector belonging to 4 and let L(x) be the subspace spanned by x. That is, L(x) is the set of all scalar multiples of x. It is easy to show that 77 maps L(x) into itself. In fact, if y = ex we have Thy) = Tex) = cT(x) = c(Ax) = Hex) = dy. If ¢ #0 then y ¥ 0 so every nonzero clement y of L(x) is also an eigenvector belonging wo. A subspace U of $ is called invariant under Tif Tmaps each element of U onto an element of U, We have j ust shown that the subspace spanned by an eigenvector is invariant under 7. 100 Eigenvalues and eigenvectors 43 Linear independence of eigenvectors corresponding to di finet eigenvalues One of the most important properties of eigenvalues is deseribed in the next theorem. As before, S$ denotes a subspace ot a linear space V. rizoasn 4.2, Let ths... , Uy be eigenvectors of a linear transformation T: S —* V, and assume that the corresponding eigenvalues 4,,... , 4, are distinct. Then the eigenvectors Uy, sv yMy are independent. Proof. The proof is by induction on k. The result is trivial when k = 1. Assume, then, that it has been proved for every set of k — 1 eigenvectors. Let ts... , Mk be k eigen vectors belonging to distinct eigenvalues, and assume that scalars ¢ exist such that k (4.4) dew; =O. Applying Tt both members of (4.4) and using the fact that T(u,) = 2m; we ind (4.5) Multiplying @4) by J, and subtracting from (4.5) we obtain the equation Sed: — Ayu, =O. But since ,. «+ Mya are independent we must have ¢,(2; — 4,) =0 for each 7= 1, 2,202, k —1. Since the eigenvalues are distinct we have 4; ¥ A, for ik so c;= 0 fori= 1, 2k = 1, From (44) we see that ¢, is also 0 so the eigenvectors ,.... My are inde- pendent. Note that Theorem 4.2 would not be tue if the zero element were allowed to be an eigen- vector. This is another reason for excluding 0 as an cigenvector. 
Warning: ‘The converse of Theorem 4,2 does not hold, That is, if 7 has independent cigenvectors 4... ts then the corresponding eigenvalues 4,..., 4 need not be dis- tinct. For example, if Tis the identity transformation, T(x) = x for all x, then every x #0 is an eigenvector but there is only one eigenvalue, = 1 Theorem 4.2, has important consequences for the finite-dimensional case. uso = 4.3. If dim n, every linear transformation T: V— V has at most n distinct eigenvalues. If T has exactly n distinct eigenvalues, then the corresponding eigenvectors form a basis for V and the matrix of T relative 10 this basis is a diagonal matrix with the eigenvalues as diagonal entries. Proof. I there were n + 1 distinct eigenvalues then, by Theorem 4.2, V would contain n + 1 independent elements. This is not possible since dim V =n . The second assertion follows from Theorems 4.1 and 4.2 Exercises 101 Note: ‘Theorem 4.3 tells us that the existence of 1 distinct eigenvalues is a sufficient condition for Tio have a diagonal matrix representation. This condition is not necessary. There exist linear transformations with less than n distinct eigenvalues that can be represented by diagonal matrices. The identity transformation is an example, All its eigen- values are equal to 1 but it can be represented by the identity matrix. Theorem 4.1 tells us that the existence of 1 independent eigenvectors is necessary and sufficient for Tw have a diagonal matrix representation, 44 Exercises 1. (@) If T has an eigenvalue 4, prove that aT has the eigenvalue al. (b) If is an eigenvector for hoth T, and T, , prowe that x is also an eigenvector for aT, + BT, How are the eigenvalues related? . Assume “T: V > V has an eigenvector x belonging to an eigenvalue 1, Prove that x'is an eigen- vector of 7? belonging to # and, more generally, x is an eigenvector of 7" belonging to 2". Then use the result of Exercise 1 to show that if P is a polynomial, then x is an eigenvector of P(T) belonging to P(A). 5. Consider the plane as a real linear space, V = a(R) , and let 7° be a rotation ot ¥ through an angle of 2/2 radians. Although 7 has no eigenvectors, prove that every nonzero vector is an cigenvector for 72, 4. If T: V-» ¥ has the property that 7* has a nonnegative eigenvalue 4%, prove that at least one of 2 or ~-/ is an eigenvalue for 7. (Hint: Tt = #1 = (1 + A(T =A). 5. Let V be the linear space of all real functions differentiable on (0, 1). If f € V, define g = T() Wo mean that g(¢) = ef “(0 forall tin (0, 1). Prove that every real 4 is an eigenvalue for T, and determine the eigenfunctions corresponding to A. np T(p) to mean that g(t) = p(t + 1) for all real f, Prove that 7 has only the eigenvalue 1. What are the eigenfunctions belonging to this eigenvalue? 7. Let F be the linear space of all functions continuous ou (= co, + 20) and such that the integral fia £O dt exists for all real x. If f € V, let g = T() be defined by the equation g(x) = S.cf (dt. Prove that every positive 1 is an eigenvalue for Tand determine the eigenfunctions comresponding to 4. 8. Let V be the linear space of all functions continuous on ( ~ %, + c9) and such that the integral ( . ¢.f10) de exists for all real x. If f€ V let g= TM be defined by the equation g(x) = JE 2 tf (@ dt . Prove thal every negative 4 is aneigenvalue for 7 and determine the eigenfunc- tions comesponding to 1. 9. Let V ~ C(O, ») be the real linear space of all real functions continuous on the interval 0, x]. 
Let S be the subspace of all functions f which have a continuous second derivative in linear and which also satisfy the boundary conditions 10) = f(m) = 0. Let 7: S - V be the linear transformation which maps each f onlo its second derivative, TI) = f ". Prove that the eigenvalues of Tare the numbers of the form —n?, where n= 1, 2,..., and that the eigen- functions corresponding to —n? are fit) = cy sin nt , where cy, # 0. 10. Let V be the linear space of all real convergent sequences {x,}. Define T: V-> V as follows: If x = {x,} is a convergent sequence with limit a, let T(x) = {yq}, where y,= a — xq for n> 1. Prove that Thas only two eigenvalues. 2= 0 and A = -1, and deiermine the eigen- vectors belonging to cach such 4, 11. Assume that a linear wansformation T has two eigenvectors x and y belonging to distinct eigenvalues J.and 1 If ay + by is an eigenvector of T, prove that a= 0 or b= 0. 12. Let 7: § + be a linear transformation such that every nonzero clement of $ is an eigenvector. Prove that there exists a scalar c such that T(x) = cx. In other words, the only transformation with this property is a scalar times the identity. /#ine: Use Exercise 11] 102 Eigenvalues and eigenvectors 4.5 The finite-dimensional case. Characteristic polynomials if dim ¥ =a, te prodiem of finding de cigenvaiues of w iia uansfocmaion 7 > ¥ can be solved with the help of determinants, We wish to find those scalars A such that the equation T(x) = Ax has a solution x with x ¥ 0. The equation Tix) = Ax can be written in the form (I= TH) = 0 where Jis the identity transformation, If we let T, = Af — 7, then 2 is an eigenvalue if and only if the equation (46) T(x) = 0 has a nonzero solution x, in which case 7, is not invertible because of Theorem 2.10). Therefore, by Theorem 2.20, a nonzero solution of (4.6) exists if and only if the matrix of T, is singular. If A is a matrix representation for T, then AZ — A is a matrix representation for T,. By Theorem 3.13, the mawix A — A is singular if and only if det (AJ — A) = 0. ‘Thus, if A is an eigenvalue for Tit satisfies the equation an det (AI — A) Conversely, any 4 in the underlying field of scalars which satisties (4:7) 18 an eigenvalue. This suggests that we should study the determinant det (AZ — A) as a funetion of 4. rmore 44. If Ais any nx nmatrixand if [is the nx nidentity matrix, the function f defined by the equation f(2) = det (AZ = A) is a polynomial in i of degree n. Moreover, the term of highest degree is 4%, and the constant term is f(0) = det -A) = (- 1)" det A. Proof. The statement 1(0) = det (-A) follows at once trom the definition of f. We prove that? is a polynomial of degre m only for the case m <3. The proof in the general case can be given by induction and is left as an exercise, (See Exercise 9 in Section 4.8.) For n = 1 the determinant is the linear polynomial f(4) = 4 ~ a. For n = 2 we have Aa a det (A = AD “oe = a)G = dy) = dixie, ay A= ay B= (G+ dod + Guide = A201), a quadratic polynomial in 4, For n = 3 we have Ama, -ay ~4s det (i = A) =| dy T= dy ~day My yg A — gy Calculation of eigenvalues and eigenvectors in thejnite-dimensional case 108 dy =n ay oe. dma y =a 1 = dy Ihe last two terms are hnear polynomials in 2, ‘Ihe first term 1s a cubic polynomial, the term of highest degree being /*, vermox. IFA is an nxn matrix the determinant f(A) = det (= A) is called the characteristic polynomial of A. The roots of the characteristic polynomial of A are complex numbers, some of which may be real. 
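For a concrete matrix the characteristic polynomial is easy to obtain numerically, which makes the statements of Theorem 4.4 simple to check. The sketch below is illustrative Python, not part of the text; it relies on NumPy's poly routine, which returns the coefficients of det(lambda I - A) in decreasing powers of lambda, and the matrix A is an arbitrary choice made here.

import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])

# Coefficients of f(lambda) = det(lambda*I - A), highest power first.
coeffs = np.poly(A)
print(coeffs)        # approximately [1., -5., -2.], i.e. lambda**2 - 5*lambda - 2

# Checks suggested by Theorem 4.4: leading coefficient 1 (the term lambda**n),
# constant term f(0) = (-1)**n * det A.
print(np.isclose(coeffs[-1], (-1) ** A.shape[0] * np.linalg.det(A)))   # True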
If we let F denote either the real field R or the complex field C, we have the following theorem, razonsn 4.5. Let T: V—> V be a linear transformation, where V has scalars in F, and dim V =n. Let A be a matrix representation of T. Then the set of eigenvalues of T consists of those roots of the characteristic polynomial of A which lie in F. Proof. The discussion preceding Theorem 4.4 shows that every eigenvalue of 7’ satisfies the equation det (47 — A) = 0 and that any root of the characteristic polynomial of A which lies in F is an eigenvalue of T: ‘The matrix A depends on the choice of basis for V, but the eigenvalues of 7" were defined without reference to a basis. Therefore, the set of roots of the characteristic polynomial of ‘A must be independent of the choice of basis. More than this is ue. In a later section we shall prove that the characteristic polynomial itself is independent of the choice of basis, We tum now to the problem of actually calculating the eigenvalues and eigenvectors in the finite-dimensional case, 4.6 Calculation of” eigenvalues and eigenvectors in the finite-dimensional case In the finite-cimensional case the eigenvalues and eigenvectors of a linear transformation Tare also called eigenvalues and eigenvectors of each matix representation of T. Thus, the eigenvalues of a square matrix A are the roots of the characteristic polynomial (A) det (AJ — A). The eigenvectors corresponding to an cigenvalue 2 are those nonzero vectors X= (41,0. 048) regarded as ne x 1 column matrices saaisfying ie miki equation AX =21X, or (AI. AX = 0. This is a system of n linear equations for the components x, .. . , x, Once we know 2 we can obtain the eigenvectors by solving this system, We illustrate with three examples that exhibit different features 104 Eigenvalues and eigenvectors zoos I. A matrix with all its eigenvalues distinct. ‘The matix has the characteristic polynomial A-2 -1 0 -1 det I= A) = det] -2 4-3 -4 | =G@—-HNEA+DEG-39, 1 1 A+2 so there are three distinet eigenvalues: 1, and 3. To find the eigenvectors comesponding to 4 = 1 we solve the system AX = X, or 2 1 Tf xy -1 -1 -2\Lx} be This gives us Qt et Ha Qxy + 3xy + Ay = =x, - m2 = a, which can be rewritten as M+ Xt %y=O 2xy + 2xz + 4x = 0 ax - to 3x = 0. Adding the first and third equations we find x, = 0, and all three equations then reduce to X +X, = 0. Thus the eigenvectors comesponding to 4 = 1 are X = t(1,— 1, 0) , where 1 is any nonzero scalar. By similar calculations we find the eigenvectors ¥ = 4(0, 1 , -1) corresponding to 2 = 1, and X = ¢(2,3, = 1) corresponding wo A = 3, with # any nonzero scalar. Since the eigenvalues are distinct the correspondingeigenvectors (1, — 1 , 0), (0, 1, — 1), and (2,3, — 1) are independent. The results can be summarized in tabular form as follows. In the third column we have listed the dimension of the cigenspace E(1). Bigenvalue A Eigenvectors dim E(A) Il «1, 1,0), 140 1 -1 10,1,-1), 10 1 3 43,3, 1, t#0 1 Calculation of eigenvalues and eigenvectors in thejnite-dimensional case 105 woe 2. A matrix with repeated eigenvalues. ‘The matrix es A- 3 -B has the characteristic polynomial a-2 1 -1 ] det(@r—A)=det}] 0 2-3 1 J=Q-D~A-—29~-4). -2 -l1 4-3 The eigenvalues are 2, 2, and 4. (We list the eigenvalue 2 twice to emphasize that it is a double root of the characteristic polynomial.) 
To find the eigenvectors corresponding to 4 = 2 we solve the sysiem AX = 2X, which reduces to —%_+ x3=0 My — X= 0 xy + xe t ay = 02 This has the solution x, = x= —x, so the eigenvectors corresponding to 4 = 2 are tC1, 1, 1), where 1 0, Similarly we find the eigenvectors 4(1, — 1, 1) corresponding to the eigenvalue: 4 = 4. The results can be summarized as follows: Ligenvalue Kigenvectors dim Ba) 2,2 (-1,1,D, 140 4 11, -1,D, 140 1 nowir 3. Another matrix with repeated eigenvalues. ‘The matrix prt Aum is D2 Lo has the characteristic polynomial (4 — 1)(2 — 1)(4— 7). When 4 = 7 the system AY = TX becomes 0 0 —3x, = 3x; + 3x = 0. Sy mm -2x, + 4x, — Ine This has the solution x, = 2x, x3 = 3x,, so the eigenvectors corresponding to 1(1, 2, 3), where t ¥ 0. For the eigenvalue 4= 1, the system AX = X consi equation s of the 106 Eigenvalues and eigenvectors M+ X24 Xs = 0 repeated three times. To solve this equation we may take x= a, X;= . where a and b are arbitrary, and then take x3= -a — 6 . ‘Thus every eigenvector corresponding to 4 = 1 has the form (ab, -a — &) = a(l,0, -1) + 1, 1, -D. where a #0 or b 4 0. This means that the vectors (1, 0, -1) and (0, 1, -1) form a basis for E(1). Hence dim E(A) = 2 when 4= 1, The results can be summarized as follows: Eigenvalue dim Ea) 7 (1,2,3), t#0 1 it a(1,0,-1) + BO, 1, —1), a, b not both 0. 2 Note that in this example there are three independent eigenvectors but only two distinct cigenvalues. 4.7 Trace of a matrix Let /(2) be the characteristic polynomial of an n Xn matrix A, We denote the n roots of ‘{(A) by Ais...» 2ay With each root written as often as its multiplicity indicates, Then we have the factorization fA= AA) Gad We can also write /(A) in decreasing powers of I as follows, SU) = AM yak Oe Ot coe Comparing this with the factored form we find that the constant term ¢q and thecoefficient of 2°-* are given by the formulas (HVA, and Gy = (At HA). Since we also have ¢y = (-1)" det A. we see that Ayer d= det A. ‘That is, theproduct of the mots of the characteristicpolynomialof A is equal to the determinant cm sun of the roots ofl(A) is called the trace of A, denoted by tA. Thus, by definition, wAa=dk. 2 The coefficient of 4"-1 ig given by ¢,_, — -tr A. We can also compute this coefficient from the determinant form for f(a) and we find that Ca = Ut tt + Sand Exercises 107 (A. proof of this formula is requested in Exercise 12 of Section 4.8.) The two formulas for Car show that rA=Say. a That is, the trace of A ts also equal 10 the sum of the diagonal elements of A. Since the sum of the diagonal elements is easy to compute it can be used as a numerical check in calculations of eigenvalues. Further properties of the tace are described in the next set of exercises. 48 Exercises Determine the eigenvalues and eigenvectors of each of the matrices in Exercises 1 through 3 Also, for each eigenvalue 4 compute the dimension of the eigenspace E(2). 1 0 1d 1 °) 1 i 1. (a) » b) » © . . 01 o1 1 1| Li 1d@ cos 8 —sin 6 2. a>0,b>0. 3. . 3. sind cos 0 o 1 0 =i ro The matrices Py = » Ps : a) i 0. | occur in the quantum mechanical theory of electron spin and are called Pauli spin matrices, in honor of the physicist Wollgang Pauli (1900-1958). Verify that they all have eigenvalues 1 and — 1 . Then determine all 2 x 2 matrices with complex entries having the two eigenvalues 1 and -1 . 5. Determine all 2 x 2 mmauives with real entries whose eigenvalues are (a) real and distinct, (b) real and equal, (c) complex conjugates. 6. 
Determine a, b,c, d,e, £, given that the vectors (1, 1,1), (1, 0,1), and (1, — 1,0) are eigen- vectors of the matrix 111 o -1 abe def 7. Calculate the eigenvalues and eigenvectors of each of the following matrices, Also, compute the dimension of the eigenspace £(2) for each eigenvalue 4. 10 ¢ 2 13 5.5 6 @ [.s 1 a, (b) i 2 ‘ll ) . 4 T L 4-7 1] bb 3 20] Ls -6 -4L 8 Calculate the eigenvalues of each of the five matrices MOl 6 1 0 0 O 010 oo01 o 1:0 0 100 @ i) pe): , 1ooo 0 o-1 0 ooo! 0100 0 0 0 = 001 0 108 Eigenvalues and eigenvectors 0 0 1 0 0 Oo 10 0 o-1 0 0 @ » © . 00 -i o o 1 0 [0 0 7 0] lo 0 o -1) ‘These are called Dirnc matrices in honor of Paul A. M. Dirac (1902- ), the English physicist. They occur in the solution of the relativistic wave equation in quantum mechanics 9. If A and B are mx n matrices, with B a diagonal matrix, prove (by induction) that the deter- minant f(2) = det (2B = A) is a polynomial in 1 with £@) = (-1)” det A, and with the coefficient of 4” equal to the product of the diagonal entries of B. 10. Prove that a square matrix A and its transpose 4* have the sume characteristic polynomial 11. If A and B are mx m matrices, with A nonsingular, prove that AB and BA have the same set of eigenvalues. Note: It can be shown that AB and BA have the same characteristic poly- nomial, even if A is singular, but you are not required to prove this. 12. Let A be ann x m matrix with characteristic polynomial f(2). Prove (by induction) that the coefficient of 2 in f(A) is -tr A. 13 Ter A and R he mx m matrices with det = det Rand ir A= ir R Prove that A and R have, the same characteristic polynomial if m = 2 but that this need not be the case if m > 2. 14, Prove each of the following statements about the trace. (a)tr(A+B)=trA+trB (b) tt CA) = WA. (©) tr (AB) = tr (BA). @tAewa, 4.9 Matrices representing the same linear transformation, Similar matrices In this section we prove that two different matrix representations of a linear trans- formation have the same characteristic polynomial. To do this we investigate more closely the relation between matrices which represent the same transformation, Let us recall how matrix representations are defined. Suppose 7: V—> W is a linear mapping of an n-dimensional space V into an m-dimensional space W. Let (e,, =e) and (w,,...,W,) be ordered bases for V and W respectively. The matrix representation of T relative to this choice of bases is the m x m matrix whose columns consist of the com- ponents of T(@%), + - « 5TC@,) relative w the basis (wy, . «5 w)) Different matrix represen tations arise from different choices of the bases. We consider now the case in which V = W, and we assume that the same ordered basis (e1,-+ +50) is used for both V and W. Let A = (aq) be the matrix of T relative 10 this basis. This means that we have (48) Te)= Daye for k=1,2,...,0 a Now choose another ordered basis (tj, ... , #,) for both V and Wand let B = (b,;) be the matrix of T relative to this new basis. Then we have Matrices representing the same linear transformation. Similar matrices 109 (49) Tu) = 3 brite for: jel,2,.c an: Since each uy i> in the space spanned by ey, eq We call wile (4.10) Wy = Leeee for j = 1,2, ne fin sone set uF seats egy. Tine xm unaitin C= (643) determined by these sears is une singular because it represents a linear wansformation which maps a basis of V onto another basis of V. 
Applying T to both members of (4.10) we also have the equations

(4.11) T(u_j) = Σ_{k=1}^{n} c_kj T(e_k)   for j = 1, 2, ..., n.

The systems of equations in (4.8) through (4.11) can be written more simply in matrix form by introducing matrices with vector entries. Let

E = [e₁, ..., e_n]   and   U = [u₁, ..., u_n]

be 1 × n row matrices whose entries are the basis elements in question. Then the set of equations in (4.10) can be written as a single matrix equation,

(4.12) U = EC.

Similarly, if we introduce E′ = [T(e₁), ..., T(e_n)] and U′ = [T(u₁), ..., T(u_n)], Equations (4.8), (4.9), and (4.11) become, respectively,

(4.13) E′ = EA,   U′ = UB,   U′ = E′C.

From (4.12) we also have E = UC⁻¹. To find the relation between A and B we express U′ in two ways in terms of U. From (4.13) we have U′ = UB and

U′ = E′C = EAC = UC⁻¹AC.

Therefore UB = UC⁻¹AC. But each entry in this matrix equation is a linear combination of the basis vectors u₁, ..., u_n. Since the u_i are independent we must have

B = C⁻¹AC.

Thus, we have proved the following theorem.

THEOREM 4.6. If two n × n matrices A and B represent the same linear transformation T, then there is a nonsingular matrix C such that

B = C⁻¹AC.

Moreover, if A is the matrix of T relative to a basis E = [e₁, ..., e_n] and if B is the matrix of T relative to a basis U = [u₁, ..., u_n], then for C we can take the nonsingular matrix relating the two bases according to the matrix equation U = EC.

THEOREM 4.7. Let A and B be two n × n matrices related by an equation of the form B = C⁻¹AC, where C is a nonsingular n × n matrix. Then A and B represent the same linear transformation.

Proof. Choose a basis E = [e₁, ..., e_n] for an n-dimensional space V. Let u₁, ..., u_n be the vectors determined by the equations

(4.14) u_j = Σ_{k=1}^{n} c_kj e_k   for j = 1, 2, ..., n,

where the scalars c_kj are the entries of C. Since C is nonsingular it represents an invertible linear transformation, so U = [u₁, ..., u_n] is also a basis for V, and we have U = EC. Let T be the linear transformation having the matrix representation A relative to the basis E, and let S be the transformation having the matrix representation B relative to the basis U. Then we have

(4.15) T(e_k) = Σ_{i=1}^{n} a_ik e_i   for k = 1, 2, ..., n

and

(4.16) S(u_j) = Σ_{k=1}^{n} b_kj u_k   for j = 1, 2, ..., n.

We shall prove that S = T by showing that T(u_j) = S(u_j) for each j. Equations (4.15) and (4.16) can be written more simply in matrix form as follows:

[T(e₁), ..., T(e_n)] = EA,   [S(u₁), ..., S(u_n)] = UB.

Applying T to (4.14) we also obtain the relation T(u_j) = Σ_k c_kj T(e_k), or

[T(u₁), ..., T(u_n)] = EAC.

But we have

UB = ECB = ECC⁻¹AC = EAC,

which shows that T(u_j) = S(u_j) for each j. Therefore T(x) = S(x) for each x in V, so T = S. In other words, the matrices A and B represent the same linear transformation.

DEFINITION. Two n × n matrices A and B are called similar if there is a nonsingular matrix C such that B = C⁻¹AC.

Theorems 4.6 and 4.7 can be combined to give us

THEOREM 4.8. Two n × n matrices are similar if and only if they represent the same linear transformation.

Similar matrices share many properties. For example, they have the same determinant, since

det (C⁻¹AC) = det (C⁻¹)(det A)(det C) = det A.

This property gives us the following theorem.

THEOREM 4.9. Similar matrices have the same characteristic polynomial and therefore the same eigenvalues.

Proof. If A and B are similar there is a nonsingular matrix C such that B = C⁻¹AC. Therefore we have

λI − B = λI − C⁻¹AC = C⁻¹(λI)C − C⁻¹AC = C⁻¹(λI − A)C.

This shows that λI − B and λI − A are similar, so det (λI − B) = det (λI − A).
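The relation B = C⁻¹AC and its consequences are easy to illustrate numerically. The sketch below is a supplement, not part of the original text; it assumes NumPy, and the particular matrix C is chosen arbitrarily. It checks that a matrix and any matrix similar to it share the same characteristic polynomial, determinant, trace, and eigenvalues.

```python
import numpy as np

A = np.array([[ 2,  1,  1],
              [ 2,  3,  4],
              [-1, -1, -2]], dtype=float)   # matrix of Example 1, Section 4.6

C = np.array([[1, 2, 0],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)      # any nonsingular matrix will do
assert abs(np.linalg.det(C)) > 1e-12

B = np.linalg.inv(C) @ A @ C                # B is similar to A

# Same characteristic polynomial (coefficients in decreasing powers of lambda),
# hence the same determinant, trace, and eigenvalues.
assert np.allclose(np.poly(A), np.poly(B))
assert np.isclose(np.linalg.det(A), np.linalg.det(B))
assert np.isclose(np.trace(A), np.trace(B))
print("eigenvalues of A:", np.round(np.linalg.eigvals(A), 10))
print("eigenvalues of B:", np.round(np.linalg.eigvals(B), 10))
```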
Theorems 4.8 and 4.9 together show that all matrix representations of a given linear transformation T have the same characteristic polynomial. This polynomial is also called the characteristic polynomial of 7. ‘The next theorem is a combination of Theorems 45, 42, and 46. In Theorem 4.10, F denotes either the real field Ror the complex field C 112 Eigenvalues and eigenvectors smronx 4.10, Let T: V+ V be a linear transformation, where V has scalars in F, and dim V =n. Assume that the characteristic polynomial of T has n distinct roots I, ... , Ay in EF. Then we have (a) The corresponding eigenvectors t,..., U, form a basis for V. (b) The matrix of Trelative to the orrlered basis U =(u,..., uJ is the diagonal matrix: A having the eigenvalues as diagonal entries: A= diag (A,...,4,). (c) If A is the matrix of T relative to another basis E e1,....@), then A=COAC, where C is the nonsingular matrix relating the two bases by the equation U=EC. Proof. By Theorem 4.5 cach root 2; is an eigenvalue, Since there are n distinct roots, Theorem 4.2 tells us that the corresponding eigenvectors 4, . . . . Wy are independent, Hence they form a basis for V. This proves (a). Since T(,) 1, the matrix of T relative to U is the diagonal matrix A, which proves (b). To prove () we use Theorem 46, Tue nousinguiar unarix C in Theorem 410 is calied a diugonatizing mairix, (e,-+-+) is the basis of unit coordinate vectors (Ih, ..., Jy), then the equation U BC in Theorem 4.10 shows that the kth column of C consists of the components of the eigenvector u, relauve 10 (hy... he If the eigenvalues of A are distinct then A is similar to a diagonal matrix. If the eigen- values are not distinct then A still might be similar to a diagonal matrix. This will happen if and only if there are h independent eigenvectors corresponding to each eigenvalue of multipicity &. Examples occur in the next set of exercises. 4.10 Exercises 11 0 1. Prove that the matrices and [10 1 have the same eigenvalues but are not similar. o1 2, In cach case find a nonsingular matrix C such that C-14C isa diagonal matrix or explain why no such C exists 10 1 2 20 20 wa-[ |: a-[ |: o4-[ |: oa-[ |: 1 3. 54 -1 4 -1 0 3. Three bases in the plane are given, With respect to these bases a point has components (x, x4)s (a Ya): and (21, 22), respectively. Suppose that fy, ye] = (e+ 214, fay, 2] = [x 42]B, and [2,, Za] = Las yalC, where A, B,C are 2 x 2 matrices, Express C in terms of A and B. Exercises U3. Tn each case, show that the eigenvalies of A se nov distinct but that 4 has thee independent eigenvectors, Find a nonsingular matrix C such that CAC is a diagonal matrix. oor i 1 @aA=/01 0], ()A=] 1 3 1 100 ee: ‘Show that none of the following matrices is similar to a diz 20 ii ‘onal. matrix. but that each is similar to a triangular matrix of the ton] where is an eigenvalue, 2-1 27 of |: of |: One <1 4 Determine the eigenvalues and eigenvectors of the matrix] 0 0 — L}and thereby show that it is not similar to a diagonal matrix. O2ss1220 <1 3003 (@) Prove that a, square matrix A is nonsingular if and only if 0 is not an eigenvalue of A. (b) IF A is nonsingular, prove that the eigenvalues of 4”? are the reciprocals of the eigenvalues of A. Given an mx m matrix A with real entries such that 4% = J. Prove the following statements about A (@) 4 is nonsingular, (b) mis even, (©) 4 has no real eigenvalues. 
(d) det A» 5 EIGENVALUES OF OPERATORS ACTING ON EUCLIDEAN SPACES 5.1 Eigenvalues and inner products This chapter describes some properties of eigenvalues and eigenvectors of linear trans- formations that operate on Buclidean spaces, that is, on linear spaces having an inner product. We recall the fundamental properties of inner products In a real Euclidean space an inner product (x, ¥) of two elements x andy is a real number satisfying the following properties = () @& y)= Gs ¥) (symmetry) (2) (w+ 2, 7) = (x, 9) +») — Cinearity) 3) x,y) = e(x, y) (homogeneity) (4) (xx) >0 if xO (positivity) In a complex Euclidean space the inner product is a complex number satisfying the same properties, with the exception that symmetry is replaced by Hermitian symmetry, qa) @ Y= >), where the bar denotes the complex conjugate. In (3) the scalar ¢ is complex, From (1’) and (3) we obtain , which (ells us that scalars are conjugated when taken out of the second factor. Taking x = y in (1’) we sce that (x, x) is real so property (4) is meaningful if the space is complex. When we use the term Euclidean space without further designation it is to be understood that the space can be real or complex. Although most of our applications will be to finite- dimensional spaces, we do not require this restriction at the outset The first theorem shows that eigenvalues (if they exist) can be expressed in terms of the inner product. rmonx 5.1. Let Ebe a Buclidean space, let V be a subspace of E, and let T:V > Ebe a linear transformation having an eigenvalue 2 with a corresponding eigenvector x. Then we have T(x), x) (6.1) aa” Hermitian and skew-Hermitian transformations us Proof. Since Tix) = Ax we have (11x), x) = (Ax, x) = Ax, x) Since x # 0 we can divide by (x, x) to get (5.1). Several properties of eigenvalues are easily deduced from Equation (5.1). For example, from the Hermitian symmetry of the inner product we have the companion formula pa & TC) (5.2) (sx) for the complex conjugate of 2. From (5.1) and (5.2) we see that 4 is real (A = A) if and cooly if (Tix), x) is real, that is, if and only if (7), x) = (x, T(x) for the eigenvector x (This condition is wivially swtisied in a real Euclidean space.) Also, 2 is pure imaginary if and only if (T(x), 2) 18 pure imaginary, that is, if and only if T(x), x) =(x, T(3)) for the eigenvector x. 5.2 Hermitian and skew-Hermitian transformations In this section we introduce (wo important types of linear operators which act on Buclid- ean spaces. These operators have two categories of names, depending on whether the underlying Euclidean space has a real or complex inner product. In the real case the trans- formations awe called symmetric and skew-symmetric. In the complex case they are called Hermitian and skew-Hermitian, ‘These transformations occur in many different applications. For example, Hernia operaiors on infinite-vintet nal saves play an important wile in quantum mechanics. We shall discuss primarily the complex case since it presents no added difficulties. sarmnion. Let E be a Euclidean space and let V be a subspace of E. A linear trans- formation T: V— E is called Hermitian on V if (T(x), y) = (% Ty) forall x and y in V. Operator T is called skew-Hermitian on V if (TQ), 3} — (7) for all x and y in V. In other words, a Hermitian operator Tcan be shifted from one factor of an inner product to the other without changing the value of the product. Shifting a skew-Hemmitian operator changes the sign of the product. 
Note: As already mentioned, if Bis a real Euclidean space, Hermitian transformations, are also called symmetric: skew-Hermitian transformations are called skew-symmetric. 116 Eigenvalues of operators acting on Euclidean spaces wovere 1, Symmetry and skew-symmetry in the space Cla, b). Let C(a, b) denote the space of all real functions continuous on a closed interval /a, 6 with the real inner product (£9) = Proe dt. Let V be a subspace of C(a,b). If T: V > C(a, b) is a linear transformation then (f, ‘T(g)) = § /OTs(t) at, where we have written T(t) for Tigi). Therefore the conditions for sym- metry and skew-symmetry become (6.3) [2 yore =e THO} a = 0 fT is symmetric, and G) Pf OTe + OTLO} ad = 0 if T is. skew-symmetric. ‘Therefore, =P p is satisfied for all f and g in C(@, 4) since the integrand is zero, multiplication by a fixed function is a symmetric operator. subspace consisting of all functions £ which have a continuous derivative in the open interval (a, 8) and which also satisfy the boundary condition f (@) = f'bl. Let D:V—» Cla, b) be the differentiation operator given by D(f) = £" . It is easy to prove that D is skew-symmetric. In this case the integrand in (64)"is the derivative of the product fg, so the integral is equal to J Ger at= fe) ~ flaeta) , Since boihfand g satisfy the boundary condition, we have f (B)g(b) ~ £ (a)g(a) = 0. ‘Thus, the boundary condition implies skew-symmetry of D. The only eigenfunctions in the sub- space Vare the constant functions. They belong to the eigenvalue 0. EXAMPLE 4, Sturm-Liouville operators, This example is important in the theory of linear second-order differential equations, We use the space C(a, 6) of Example 1 once more and let V be the subspace consisting of all £ which have a continuous second derivative in /a, b/ and which also satisfy the two boundary conditions 65) PALS@=0, poHsH)= wherep is fixed function in C(a, b) with continuous derivative on fa, 61 Let q be another fixed function in C(a, 6) and let T; V-> C(a, b) be the operator defined by the equation TP) = (ef 'Y + 98. Orthogonality of eigenvectors corresponding to distinct eigenvalues 117 This is called a Sturm-Liowville operator. To test for symmetry we note that f7(g) — elf) = S08) — apf’). Using this in (53) and integrating both f& f + (pg? dé and fe (pf dt by pans, we find PP Ur@ = ern) at = fre’ LP per’ ae — eo fore a=, since bothfand g satisfy the boundary conditions (5.5). Hence 7’ is symmetric on V, The cigenfnoions of Tare those nonzero f which saisly, for some real 1, a differential equation of the form OY +f = on fa, b], and also satisfy the boundary conditions (5.5). 53 Ei igenvalues and eigenvectors of Hermitian and skew-Hermitian operators Regarding eigenvalues we have the following theorem; ramoren 5.2, Assume T has an eigenvalue il Then we have: a) If T is Hermitian, Ais reai: A= 1. (b) If T is skew-Hermitian, 4 is pure imaginary: Proof. Let x be an eigenvector corresponding to A. Then we have (TG), aan eee) (sx) (x) 7 If Tis Hermitian we have (Tx), x) = (%, Tix) so 4 = 1. If Tis skew-Hermitian we have (TQ), ¥) = —(x, Tee) so A= 1. te: IC Fis symmetric, Theorem 5.2 tells us nothing new about the eigenvalues of since all the eigenvalues must be real if the inner product is real.. If 7 is skew-symmetric, the eigenvalues of 7 must be both real and pure imaginary. Hence all the eigenvalues of a skew-symmetric operator must be zero (if any exist). 
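Theorem 5.2 can be illustrated numerically in the finite-dimensional case. The sketch below is a supplement, not part of the original text; it assumes NumPy, and the random matrix M is only a convenient way to manufacture examples. It splits an arbitrary complex matrix into a Hermitian part H and a skew-Hermitian part S and checks that the eigenvalues of H are real while those of S are pure imaginary.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

H = (M + M.conj().T) / 2      # Hermitian:       H equals its conjugate transpose
S = (M - M.conj().T) / 2      # skew-Hermitian:  S equals minus its conjugate transpose

eig_H = np.linalg.eigvals(H)
eig_S = np.linalg.eigvals(S)

assert np.allclose(eig_H.imag, 0)    # Hermitian  =>  real eigenvalues
assert np.allclose(eig_S.real, 0)    # skew-Hermitian  =>  pure imaginary eigenvalues
print("eigenvalues of H:", np.round(eig_H, 6))
print("eigenvalues of S:", np.round(eig_S, 6))
```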
5.4 Orthogonality of eigenvectors corresponding to distinct eigenvalues Distinct eigenvalues of any linear transformation correspond to independent eigen- vectors (by Theorem 4.2), For Hermitian and skew-Hermitian transformations more is true. sumorem 5.3. Let T be a Hermitian or skew-Hermitian transformation, and let 2 and jt be distinct eigenvalues of T with corresponding eigenvectors x and y. Then x and y are orthogonal; that is, (x,y) = Proof. We write Thx) = Lx, Ty) = yy and compare the two inner products (710), and (x, Tiy)). We have (Tix), yy = Ax. v= Ax. y) and (x, TQM = Gx. py) = Gas WD. 118 Eigenvalues of operators acting on Euclidean spaces If T is Hermitian this gives us A(x, y) = A(x, y) = u(x, y) since w = ji, Therefore (x, y) = 0 since 2 we. If T is skew-Hermitian we obtain A(x, y) = —A(x, y) = p(x, ¥) which again implies (x,y) - 0. nave. We apply Theorem 5.3 10 those nonzero functions which satisfy a differential equation of the form (5.6) OfY afm if (on an interval fa, bl, and which also satisfy the boundary conditions p(a) f(a) = p(b)f (6) 0, ‘The conclusion is that any two solutions £ and g comesponding to two distinet values of 4. are orthogonal, For example, consider the differential equation of simple harmonic mation f" + Rf=0 oa the interval (0, 7], where k 3 0. This has the form (5.6) with p = 1, g = 0, and 2 = —K2, All solutions ae given by f(1) = ¢, cos kt + cy sin ke. The boundary condition f ©) =0 implies ¢, = 0. The second boundary condition, £ (m)= 0, implies ¢, sin km = 0. Since ¢y 0 for a nonzero solution, we must have sin kr = 0, which means that k is an integer. In other words, nonzero solutions which satisfy the boundary conditions exist if onhogonality condition implied by Theorem 5.3 now becomes the familiar relation ol Gay ee i sl ae ft) = ad only fk is These solutions arf) i sin nt sin mt dt = 0 if m® and n® are distinct integers. 5.5 Exercises 1, Let £ be Buclidean space, let V be a subspace, and let T: V-> £ be a given linear trans- formation, Let 4 be a scalar and x a nonzero element of V. Prove that T is an eigenvalue of 7 with x as an eigenvector if and only if (TO), y) = Ax) forevery yin BE. w . Let T(x) = cx lor every x ina linear space V, where ¢ is a lixed scalar. Prove that Tis sym- metic if V is a real Euclidean space. 3. Assume T: VV is a Hermitian transformation, (a) Prove that 7” is Hermitian for every positive integer 7, and that 7? is Hermitian if Tis invertible. (b) What can you conclude about 7* and T! if T is skew-Hermitian? 4, Let Ty: V > E and Ty:V -+ E be two Hermitian transformations. (a) Prove that aT, + bT, is Hermitian for all real scalars a and 6. (b) Prove that the product (composition) 7,7, is Hermitian if T, and T, commute, that is, if Tih, = TT. Let V = V,(R) with the usual dot product as inner product. Let 7 be a reflection in the xy- plane, that is, let T@) = 7,7) = j,and T/k} = -A. Prove that T is symmewic. . Exercises 119 6. Let C(@, 1) be the real linear space of all real functions continuous on [0, 1] with inner product Ug) = fi f(g de. Let V be the subspace of all F such that fi T(t) dr = 0,LetT: V > (0, 1) be the integration operator defined by Tf (x) ={3 £ (t) de. Prove that Tis skew-sym- metic 7. Let V be the real Buclidean space of all real polynomials with the inner product (f,g) = FL, £ (glade. Determine which of the following transformations T! V+ V is symmetric ot skew-symmetric: (©) Tf) =/@) +f(-). @ Tf) =f(-9. ) Tf @ 170) =f@) -f(-9. (fa) =] forgtaw(o at. where w is a fixed positive function in Ca, b). 
Modify the Sturm-Liouville operator T by wong To Of Vt of Prove that the modified operator is symmetic on the subspace V. and define a scalar-valued function Q on V as follows : Q(x) = (Tix), x) for all x in V. (@) If 7 is Hermitian on T°, prove that Q(x) is real for all x. (b) If Tis skew-Hermitian, prove that Q(x) is pure imaginary for all x, (¢) Prove that Q(tx) = 17Q(x) for every scalar 1. @ Prove that Q(x + y) = Qa) + QL) + (NI. y) + (Th), »), and find a conesponding formula for Q(x + ty) (e) If Q&) = 0 for all x prove that Yay) = 0 for all x ( WF Qo) As real for all x prove that ‘I is Hermitian, [Hint: Use the fact that Q(x + ty) equals its conjugate for every scalar t] 10. This exercise shows that the Legendre polynomials (introduced in Section 1.14) are eigen- functions of a Sturm-Liowville operator. The Legendre polynomials are defined by the equation PA) = ESO, — whew, (t) = = (a) Verify tau 1) £, () = 2nt f(t). (©) Ditierentiate the equation in (a) m+ 1 times, using Leibniz’s formula (see p. 222 of Volume T) to obtain (EADS PO) + ele DG LMG) + ales DF PO = Amf HVE + Inn + YF MO. (© Show that the equation in (b) can be rewritten in the form [Ke = DPOF = n+ DPD. ‘This shows that P,(1) is an eigenfunction of the Sturm-Liouville operator 7 given on the 120 Eigenvalues of operators acting on Euclidean spaces interval [ -1 , 1] by T(f)= (pf), where p() = ® ~ 1 . The eigenfunction P,(f) belongs 10 the eigenvalue 4 = n(n + 1). In this example the boundary conditions for’ symmetry are automatically satisfied since p(1)= p( = 1) = 0. 5.6 Existence of an orthonormal set of eigenvectors for Hermitian and skew-Hermitian operators acting on finite-dimensional spaces Both Theorems 5.2 and 5.3 are based on the assumption that 7” has an eigenvalue. As we know, eigenvalues need not exist. However, if 7 acts on a finite-dimensional complex space, then eigenvalues slways exist since they am the roots of the characteristic polynomial If 7 is Hermitian, all the eigenvalues are teal. If 7 is skew-Hermitian, all the eigenvalues are pure imaginary, We also know that two distinct eigenvalues belong to orthogonal eigenvectors if T is Hermitian or skew-Hermitian. Using this property we can prove that 7” has an orthonormal set of eigenvectors which spans the whole space. (We recall that an orthogonal set is called orthonormal if each of its (elements has norm 1.) , Assume dim V = n and let V-> V be Hermitian or skew-Hermitian, Then there exist n eigenvectors ts... , %y af T which form an orthonormal basis for V. Hence the matrix of T relative to this basis is the diagonal matrix A = diag (Ay... . Andy where iy is the eigenvalue belonging 10 ty. Proof. We use inductian on the dimension n. If n =1, then 7’ has exactly one eigen- value. Any eigenvector % of norm 1 is an orthonormal basis for V. Now assume the theorem is true for every Buclidean space of dimension n = 1. To prove it is also ue for V we choose an eigenvalue 4, for T and a comesponding cigenvector 4, of norm 1. Then T(u,) = Ayu and fay) = 1. Let $ be the subspace spanned by uy. We shall apply the induction hypothesis t0 the subspace S* consisting of all elements in V which are orthogonal to 4, Si = {x|xeV, (x, m) =O}. To do this we need to know that dim $4 =m — 1 and that T maps $+ into itself From Theorem 1.7(a) we know that w is part of a basis for V, say the basis (u,, 0... t,). 
We can assume, without loss in generality, that this is an orthonormal basis, (If not, we apply the Gram-Schmidt process to convert it into an orthonormal basis, keeping u, as the first basis element) Now take any x in S4 and write ly ba RTE Gy. Then x, = (x, uy) = 0 since the basis is orthonomal, so x is in the space spanned by Yes... Hence dim S4= n = 1. Next we show that 7” maps $1 into itself, Assume Tis Hermitian. If x € $4 we have (7G, ies , Ch td = Gs Ma) = Ari) — AGs wi) = Matrix representations for Hermitian and skew-Hermitian operators 11 80 Tix) € $+. Since Tis Hermitian on $+ we can apply the induction hypothesis to find that has n — 1 eigenvectors ,. . . ,t, Which form an orthonormal basis for $1, Therefore the orthogonal set us. . : , up is an orthonormal basis for V. This proves the theorem if Tis Hermitian, A similar argument works if Tis skew-Hermitian. 5.7 Matrix representations for Hermitian and skew-Hermitian operators In this section we assume that V is a finite-dimensional Euclidean space. A Hermitian or skew-Hermitian transformation can be characterized in terms of its action on the elements of any basis. tuzoren 5.5. Let (e,,... . e,) be a basis for V and let T: V+ V be a linear transfor- mation, Then we have: (a) Tis Hermitian if and only if (T(e,), €1) = (e,, T(e)) for all i andj. (b) Tis skew-Hermitian if and only if (T(e;), e,) = = (e;,T(e,)) for all i andj. Proof. Take any two elements x and y in V and express each in terms of the basis elements, say x = 5 xe; and y = 5 yey. Then we have ree ye (Sy Tie) phate tied Sve) et Py artiehe} TOM H | BPed I] =F xi[ Meh Z wey = ZB KT €0- Similarly we find TO) = ES «anes TE). Ae Statements (a) and (b) following immediately from these equations. Now we characterize these concepts in terms of a matrix representation of T. maton 5.6. Let (e,,..., @,) be an orthonormal basis for V, and let A = (ay) be the matrix representation of a linear transformation T: V-+ V relative to this basis. Then we have: (a) Tis Hermitian if and only if a;;= Gy for all i andj. (b) Tis skew Hermitian if and only if a,,- —4,, for all i andj. Proof. Since A is the matrix of T we have T(e,) = D8, 4usex . Taking the inner product of 7(e,) with e, and using the linearity of the inner product we obtain (ey, 09 = (Zanerse) =Zarter e0- But (e, . ¢;) = 0 unless & =, $0 the last sum simplifies to a,,(¢;, ¢,) = a4; since (e, . €1) Hence we have fy = (Tle) e)) forall 122 Eigenvalues of operators acting on Euclidean spaces Imerchanging i and j, taking conjugates, and using the Hermitian symmetry of the inner produet, We find = (6 Tle) for all i, j. Now we apply Theorem 5.5 10 complete the proof. 5.8 Hermitian and skew-Hermitian matrices. The adjoint of a matrix The following definition is suggested by Theorem 5.6. preixiriox, A square matrix A = (ay) is called Hermitian if aj; = dy, for all i and j. Matrix A is called skew-Hermitian if a= —6,,for all i and j. ‘Theorem 5.6 states that a transformation 7 on a finite-dimensional space V is Hermitian or skew-Hermitian according as its matrix relative t© an orthonormal basis is Hermitian or skew-Hermitian These matrices can be described in another way. Let A denote the matrix obtained by replacing each entry of A by its complex conjugate. Matrix 4 is called the conjugate of A. Matrix A is Hermitian if and only if it is equal to the transpose of its conjugate, A = 4! IL is skew-Hemnitian if A = —At The transpose of the conjugate is given a special name, DEFINITION OP THE ADJOLNT OP AMATRIX. 
For any matrix A, the transpose of the conjugate, Al, is also called the adjoint of A and is denoted by A*. Thus, a squaw matrix A is Hermitian if A = A*, and skew-Hermitian if A = -A*, A Hermitian mauix is also called self-adjoint. Note: Much of the older matrix literature uses the term adjoint for the transpose of the cofactor matrix, an entirely different object. ‘The definition given here conforms to the ‘current nomenclature in the theory of linear operators. 5.9 Diagonalization of a Hermitian or skew-Hermitian matrix rasos 5.7. Every n X n Hermitian or skew-Hermitian matrix A is similar to the diagonal matrix A= diag (A,,..., 4,) of its eigenvalues. Moreover, we have A= CC, where Cis a nonsingular matrix whose inverse is its adjoint,C *=C*. Proof. Let V be the space of n-tuples of complex numbers, and let (@,,..., €) be the orthonormal basis of unit coordinate vectors. If x = Y xe; and y = } ye, let the inner product be given by (x, y) = 5 xis. For the given matrix A, let 7 be’ the transformation represented by A relative to the chosen basis, Then Theorem 5.4 tells us that V has an Unitary matrices. Orthogonal matrices 123 orthonormal basis of eigenvectors (x, + u,), Telative to which 7’ has the diagonal matrix representation A = diag (A,, . + A,). where 2, is the eigenvalue belonging to My. Since both a! and A represent T they are similar, so we have A = C-1AC, where C = (¢j,) is the nonsingular matrix relating the two bases: [sees da} = Ler... 5 eWlC. This equation shows that thejth column of C consists of the components of Wy relative to (e1s...,€). Therefore ¢,j is the ith component of u,, The inner product of uy and uy is given by (od = Se Since {uy,..., uy} iS an orthonormal set, this shows that CC* = I, so C= C* . Note: The proof of Theorem 5.7 also tells us how to determine the diagonalizing matrix C. We find an orthonormal set of eigenvectors a4, .. . , ay and then use the components of u; (relative to the basis of unit coordinate vectors) as the entries of theith column of C. 2 sxavexs 1, The real Hermitian matrix A = C1 has eigenvalues A, = 1 and 1, = 6. The eigenvectors belonging t© 1 are 1(2, —I), # 34 0, Those belonging to 6 aw f(1, 2), 1 #0. The two eigenvectors 4 = (2, -1) and w= 11, 2) with f= 1/5 form an orthonormal set. Therefore the matrix a C* since C is real. It is easily verified is a diagonalizing matrix for A. In this case Ct 1 that CAC = 0 : sxwous 2. If A is already @ diggonal matrix, then the diagonaizing mauix C of Theorem 57 either leaves A unchanged or mérely rearranges the diagonal ents 5.10 Unitary matrices. Orthogonal matrices perimrrron, A square matrix A is called unitary if AA* = 1. It is called orthogonal if AA = 1. Note: Every real unitary matrix is orthogonal since A* = At, Theorem 5.7 tells us that a Hermitian or skew-Hemmitian matrix can always be diago- alized by a unitary matrix. A real Hermitian matrix has real eigenvalues and the come- sponding eigenvectors can be taken real. Therefore a real Hermitian matrix can be 104 Eigenvalues of operators acting on Euclidean spaces diagonalized by a real orthogonal matrix, This is nor tue for real skew-Hermitian matrices, (See Exercise 11 in Section 5.11.) We also have the following slated o periwrzzon. A square matrix A with real or complex entries is called symmetric if A — At ; it is called skew-symmetric if A = —A‘, mover 3. If A is real, its adjoint is equal to its transpose, A* = A', Thus, every real Hermitian matrix is symmetric, but a symmetric matrix need not be Hermitian. 
pl +i2 | flat 2 1+i 3- EXAMPLE: 4. If A = » then 4 > Abe 3 i 4iy 3+i —4i 2 4i 1=i 348 2. 4) and A* = [ 2+4] acces 1 EXAMPLE: 5, Both matrices [ and E are Hermitian. The first is sym- =i BR metric, the second is not 0-2 i -2) EXAMPLE 6. Both matrices and | are skew-Hermitian. The first is 0 2 3) skew-symmetric, the second is not sexe 7. All the diagonal elements of a Hermitian matrix are real, All the diagonal elements of a skew-Hermitian matrix are pure imaginary, All the diagonal elements of a skew-symmetric matrix are zero. savers 8, For any square matrix A, the matrix B= 4(4 + A*) is Hermitian, and the matrix C = (A = A*) is skew-Hermitian. Their sum is A. Thus, every square matrix A can be expressed as a sum A = B + C, where B is Hermitian and C is skew-Hermitian, It is an easy exercise to verify that this decomposition is unique. Also every square matrix A can be expressed umiguely as the sum of a symmeme max, #(4 + A‘), and a skew- symmetric matrix, $(4 = 44), suvez 9. If A is orthogonal we have 1 = det (AA!) = (det A\det 4!) = (det A)*, 50 det 4 = £1 5.11 Exercises 1. Determine which of the following matrices are symmettic, skew-symmetric, Hermitian, skew- ‘Hermitian. O12 o i 2 o G2 o 1 2 @ji 03), WMl/i 0 3). @©l- 0 31, @j 1 0 3). 234 -2 -3 47 -2 -3 0, 2-3 0, Exercises 125 cos 0 —sin 0 (a) Verify that Ue 2x 2 matin A = is an orthogonal avin, sin cos 0 (b) Let T be the linear transformation” with the above matrix A relative 10 the usual basis {i,j}. Prove that T maps each point in the plane with polar coordinates (r, 2) onto the point with polar coordinates (r, «+ 6), Thus, 7 is a rotation of the plane about the origin, 8 being the angle of rotation. Let 7 be wal Sypace with the usuai basis vecwws & jy A. love thea each Uf the fuliowing matrices is orthogonal and represents the transformation indicated. 1 go @) | 0 1 9F0 —— Geffection in the xy-plane). 0 9 1 10 oO @) }O =1 0} — Geflection dough tine x-aaiy). -l 0 0 © [ ol ‘ (reflection through the origin) Lo 0 1] 10 0 (a) cos 8 —sin | (rotation about the x-axis), [o sino cose | “10 0 (©) | 0 cos @ -sin 6, (rotation about x-axis followed by reflection in the y 0 sin 6 cosy 4. A real orthogonal matrix A is called proper if det A = and improper if det A = 6050 —sin 0 (a) I A is a proper 2 x 2 matrix, prove that A =| _ | for some 6, This represents a rolation through an angle 8. [sin é cos d 10 -10 (b) Prove that [ Al and [ ° | are improper matrices. The first mawix represents a o =] 1 reflection of the xy-plane through the’ x-axis; the second represents a reflection through the y-axis, Find all improper 2x 2 mauice In cach of Exercises 5 through 8, find (a) an orthogonal set of eigenvectors for A, and (b) a unitary matix C such that CAC is a diagonal matrix, 9 12 0 2 5.A= . 6A = 12 16, rach a J <2 126 Eigenvalues of operators acting on Euclidean spaces 9. Determine which of the following matrices are unitary, and which are orthogonal (a, b, @ real). a fone 6 a0] (avi avi ave] @ | |: @| 0 10) ©] 0 4s 1% : sin 0 cos yd vd 3% 10, The special theory of relativity makes use of the equations = alt —vx/c*) wsae-w), yy, Here v is the velocity of a moving object, ¢ the speed of light, and a =o] Ve — a, The linear transformation which maps (x, y, 2 6) onto (x’, y’, 2’, is called a Lorente transformation. (a) Let (xy, Xa) Xa, x4) 0 = (X, V4 %, det) and ( %45 49%) = 0", YZ, fer”). 
Show that the four equations can be written as one matrix equation, a 00 —~iavje 0 10 0 cmll 9 O19 Liwle 0 0 a J (b) Prove that the 4 x 4 matrix in (@) is orthogonal but not unitary. 0a 11, Let a be a nanzera real number and let A be the skew-symmetric matrix A [ Al oa (a) Find an orthonormal set of eigenvectors for A. (b) Find a unitary matrix C such that CAC is a diagonal matrix. (©) Prove that there is no real orthogonal matrix C such that C1AC is a diagonal mauix. 12, If the eigenvalues of a Hemnitian or skew-Hemmitian matrix A are all equal to ¢, prove that A nel 13, If A is a real skew-symmetric matrix, prove that both Z = A and Z+ A are nonsingular and that (I = A)(Z + A) is orthogonal 14, Tor cach of the following statements about m x 1 matrices, give a proof or exhibit a counter example, (a) If A and B are unitary, then A + B is unitary, (b) Wf a and are unitary, then as is unitary. (©) IA and AB are unitary, then B is unitary. @ If A and B are unitary, then A + B is not unitary 5.12 Quadratic forms Lot V be a real Euclidean space and let T: ¥—> ¥ be a symmetric operator. This means that T can be shifted from one factor of an inner product to the other, (T(x), y) = (x, TE) forallxandyin V. Given 7, we define a real-valued function Q on V by the equation x) = (TQ), ) Quadratic forms 1 The function Q is called the quadratic form associated with T. The term “quadratic” is suggested by the following theorem which shows that in the finite-dimensional casc Q(x) is a quadratic polynomial in the components of x. muzoren 5.8. Let (e,... ,¢,) be an orthonormal basis for a real Euclidean space V. Let T: V — V be a symmetric transformation, and let A = (a;;) be the matrix of T relative to this basis. Then the quadratic form Q(x) = (T(x), x) és related to A as follows: 62) am => daxxy if x =D xe om Proof. By linearity we have T(x) = ¥ x;T(@) . Therefore 20) = (Sure, Sxe) 5 Sxx(T),e). 1 This proves (5.7) since a; = a: = (T(e,), &). The sum appearing in (5.7) is meaningful even if the matrix A is not symmetic. serrurrzon. Let V be any real Euclidean space with an orthonormal basis (e,,..., ¢), and let A = (a) by any n x n matrix of scalars. The scalar-valued function Q defined at each element x = ¥ xe; of V by the double sum (5.8) Q(x) => dann, is called the quadratic form associated with A. If A is a diagonal matrix, then a), = 0 if i # j so the sum in (5.8) contains only squared terms and can be written more simply as Qs) In this case the quadratie form is called a diagonalform, The double sum appearing in (5:8) can also be expressed as a product of three matrices, rugorem 5.9. Let X =[xi,..., %] be a 1 x n row matrix, and let A = (ay) be an nx nmatrix. Then XAX* isa 1 x 1 matrix with entry wor ams Proof. The product XA isa Lx n matrix, XA = [yi.... ys], where entry yy is the dot product of X with thejth column of A, ’ = 2 xy 1 128 Bigenvalues of operators acting on Euclidean spaces Therefore the product YAX' is a 1 x 1 matrix whose single entry is the dot product ANS eat Note: It is customary (0 identify the 1.x 1 matix XAX* with the sum in (5.9) and w call the product XAX" a, quadratic form, Equation (5:8) is written more simply as follows: Q(x) = XAX'. +X = by, x]. Then we have XA = bs 5 1 X XAX! = [xy -- 3x2, — + sell | = x} = 3xx, — xe + Sx, [xy — 3x2, —Xy + Sma), and hence 1-2 exawpie 2. Let B= ~ |X = Ee, x). Then we have L423 1-2] [x XBX! = bs I | = xf — Qxax, — xpxy + 5x2, -2 siLx, Jn both Examples 1 and 2 the two mixed product terms add up to —4x4x, so XAX' = XBX'. 
These examples show that different matrices can lead to the same quadratic form, Note that one of these matrices is symmetric, This illustrates the next theorem, rmoame 5.10, For any n x n matrix A and any |x nrow matrix X we have XAX* = XBX' where B is the symmetric matrix B= 3(A+ A’). Proof. Since XAX* is a1 x 1 matrix it is equal to its transpose, XAX' = (XAX‘!. But the transpose of a product is the product of transposes in reversed onder, so we have (XAX! = XAUX! . Therefore XAX' = |XAX'+ LXAIX' = BX! 5.13 Reduction of a real quadratic form to a diagonal form ‘A real symmetric matrix A is Hermitian, Therefore, by Theorem 5.7 it is similar to the onal matrix A = diag (Ay. . 4,) of its eigenvalues. Moreover, we have A = CtAC, where C is an orthogonal matrix, Now we show that C can be used to convert the quadratic xt Reduction of a real quadratic form to a diagonal form 129 razor 5.11. Let XAX' be the quadmitic form associated with a real symmetric matrix A, and let C be an orthogonal matrix that converts A to a diagonal matrix A= CtAC ‘Then we have MAX! = FAN'S S apt et where Y = [)1). +95] is the row matrix Y=XC, and J,,..., 2, are the eigenvalues of A. Proof. Since C is orthogonal we have C-1 = Ct. Therefore the equation Y = XC implies X = YC*, and we obtain VAX! =(YC)A( YC! = Y(CHC) Yt = YA Yt Note: Theorem 5.11 is described by saying that the linear transformation Y = XC reduces the quadratic form XAX¢ to a diagonal form YA ¥* XIXt = Sx? =X), a the square of the length of the vector ¥ = Qy,... x.) . A linear transformation Y = XC. where C is an orthogonal matrix, gives a new quadratic form YAY with A = CIC! = Ct = J, Since XIX' = YIY* we have |X|? = | Y\)?, so Y has the same length as X. A linear transformation which preserves the length of each vecior is called an isometry. These transformations are discussed in more detail in Section 5.19. exwere 2,. Determine an orthogonal matrix C which reduces the quadratic form Q(x) = 2x + dxyxa + 5x2 to a diagonal form. 2 Solution. We write Q(x) = XAXt, where A = [ 1. This symmetric matrix was diagonalized in Example 1 following Theorem 5.7. It has the eigenvalues 2, and an orthonormal set of eigenvectors w, 4, where = ¢(2, —1), ty 1/¥5. An orthogonal diagonalizing matrix is C = { The corresponding “l diagonal form is YAY! = Ayi+ ays= yt + Oy} ‘The result of Example 2 has a simple geometric interpretation, illustrated in Figure 5.1 ‘The linear transformation Y = XC can be regarded as a rotation which maps the basis #, J onto the new hasis uy, Me. A point with coordinates (x, x2) relative to the first basis. has new coordinates (y1, yz) relative to the second basis. Since YAX' = YAY", the set of points (x, x2) satisfying the equation YAX' = ¢ for some c is identical with the set of points (y1, Je) satisfying YA Y* = c . ‘The second equation, wntten as y? + 6yF = c, 8 130 Bygenvalues of operators acting on Buelidean spaces { (8 x2) relativeto basis jj | Gi. y2) relativeto basis uy. uy FIGURE 5.1 Rotation of axes by an orthogonal matrix. ‘The ellipse has Cartesian equation XAX' =9 in the x,xy-system, and equation YA Y' = 9 in the yrye-system. the Cartesian equation of an ellipse if ¢ > 0, Therefore the equation XAX' = ¢, written as 2x? + 4ayxy + Sx2 = c, represents the same cllipse in the original coordinate system. Figure 5.1 shows the ellipse corresponding to ¢ = 9. 
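The reduction carried out in Example 2 can also be verified numerically. The following sketch is a supplement, not part of the original text; it assumes NumPy. It checks that the matrix C built from the orthonormal eigenvectors is orthogonal, that CᵗAC = diag(1, 6), and that the rotation Y = XC turns XAXᵗ into y₁² + 6y₂² while preserving length, as the remark on isometries above indicates.

```python
import numpy as np

A = np.array([[2, 2],
              [2, 5]], dtype=float)       # Q(x) = 2 x1^2 + 4 x1 x2 + 5 x2^2

t = 1 / np.sqrt(5)
C = t * np.array([[2, 1],
                  [-1, 2]], dtype=float)  # columns are the orthonormal eigenvectors

assert np.allclose(C.T @ C, np.eye(2))                  # C is orthogonal
assert np.allclose(C.T @ A @ C, np.diag([1.0, 6.0]))    # C^t A C = diag(1, 6)

rng = np.random.default_rng(2)
for _ in range(3):
    X = rng.standard_normal(2)            # a row vector (x1, x2)
    Y = X @ C                             # the rotation Y = XC
    assert np.isclose(X @ A @ X, Y[0]**2 + 6 * Y[1]**2)  # X A X^t = y1^2 + 6 y2^2
    assert np.isclose(X @ X, Y @ Y)                      # the rotation preserves length
print("X A X^t = y1^2 + 6 y2^2 with Y = XC, and |X| = |Y|")
```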
5.14 Applications to analytic geometry The reduction of a quadratic form to a diagonal form can be used to identify the set of all points (x, y) in the plane which satisfy a Cartesian equation of the form 6.10) axt+ bxy + ott det yt f=o We shall find that this set is always a conic section, that is, an ellipse, hyperbola, parabola, ‘or one of the degenerate cases (the empty set, a single point, or one or two straight lines). The type of conic is governed by the second-degree terms, that is, by the quadratic form ax*+ bxy + cy®. To conform with the notation used earlier, we write x, for X, Xs for ys and express this quadratic form as a matrix product, XAX! = ax? + bere + ext, a bz (xi, x2] and A = . By a rotation Y bi ¢ a diagonal form Ayy? + Agy3, where Ay, Az are the eigenvalues of A. An orthonormal set of eigenvectors 41, tp determines a new set of coordinate axes, relative to which the Cartesian equation (5.10) becomes where X XC we reduce this form to (11) Ait Mayet dt ey, + f = 0, with new coefficients a and e* in the linear terms, In this equation there is no mixed product term y,¥2y SO the type of come is easily dentified by examining the eigenvalues 4, and Ap Applications 10 analytic geometry 131 If the conic is not degenerate, Equation (5.11) represents an ellipse if Jy, A, have the same sign, a hyperbola if 4,, 4, have opposite signs, and a parabola if either A, or A, is zero. The three cases correspond 10 Ady > 0, Ade < 0, and Aydy = 0. We illustrate with some specific examples, savers 1, 2x” + day + Sy? + 4x + 13y — J = 0. We rewrite this as (9.12) 2x} + 4xyx_ + Sx¥ + 4x, + 13x, - $= 0 ‘The quadratic form 2x? + 4x, + Sxf is the one treated in Example 2 of the foregoing section, Ils matrix has eigenvalues A; = 1 , dy = 6, and an orthonormal set of eigenvectors uy = (2, — 1), Me = t(1,2), where 1 = 1/5. An orthogonal diagonalizing matrix is 1 c= [ : This reduces the quadratic part of (5.12) to the form y+ 6y2. To -1 24 determine the effect on the linear part we write the equation of rotation Y = XC in the form X = YC! and obtain Git) == bn val 2y,). Vv 1 Therefore the linear part 4x, + 13x, is transformed 10 4 B Oy. Ye). S(Hyn + 2yy) - <5 y. 5 y "2 a! a v2) V3 + 6/5 ye. The wansformed Cartesian equation becomes Jit 6yE— Vo ri + 6/5 ¥2- = 0. By completing the squares in y, and y, we rewrite this as follows: On — 3V5)* + 60% + AVS) = This is the (equation of an ellipse with its center at the point (Jy'5, }V/5) in the yyy system. The positive directions of the y, and yy axes are determined by the eigenvectors 1, and ug, as indicated in Figure 5.2. We can simplify the equation further by writing -iW5, oy. t+ IVS Goomettically, this is the same as introducing a new system of coordinate axes parallel to the Yy¥e axes bot with the new origin at the center of the ellipse. In the 74Z3-system the equation of the ellipse is. simply or The vile and ail dnee Codinaic sysicus we simu in Figue 3.2. 132 Eigenvalues of operators acting on Buclidean spaces FicurE 5.2 Rotation and translation of Goordinate: axes. The rotation Y is followed by the translation 2, = yy — $V5, 2) = yp +3V9. xc PNAMPLE 2. 2x am dey — yt dx + 10y = 13 = 0. We rewrite this as 2x? — 4ex, — xt — 4x, + 10x, — 13 = 0. 2 The quadratic part is YAX', where A = [ : 1 This matrix has the eigenvalues Ay = 3. de An orthonormal set of eigenvectors is uy = 1(2, 1), wp = t(1, 2) where t = If JS. 
An orhogonal diagonalizing matix is C of rotation ¥ = ¥C* gves us 21 - The equation -1 2 x 1 3 (2¥1 Vad a= E(—I1 + 2d), Vv ‘Therefore the transformed equation becomes wi-vt-on +t BA Yi $2y.) — 13 v5 v5 or 18 By = WE- t Se v By completing the squares in y, and y, we obtain the equation 301 ~ V5)" = 204 — V5 = 12, Applications to analytic geometry 133 (a) Hyperbola: 323 — 224 = 12 (b) Parabola: yf + y = 0 Houre 5.3 The curves in Examples 2 and 3, which represents a hyperbola with its center at (V5, £V5) in the yyyy-system. The trans- lation 7, = yy — 4V5, z2= ya — 4V5 simplifies this equation further to Sep eae 1 ‘The hyperbola is shown in Figure 53). ‘The eigenvectors uy and x determine the direc- tions of the positive y, and yy axes. EXAMPLE 3. 9x? + 24xy + 16y? ~ 20x + 15y = 0. We rewrite this as Oxt + 24xyxy + 16x; — 20x, + 15xy= 0. 9 ‘The symmetric matix for the quadratic pat is A =| |, t Its cigenvalucs are 4; = 0. An orthonormal set of eigenvectors is u1 = 43,4), % = }(—4,3). An orthog- 3 onal diagonalizing matrix is C = on 1. The equation of rotation X = ¥C* gives us = 46-4), = B+ 3p). ‘Therefore the transformed Cartesian equation becomes 25yt — 223), — 4ye) + (491 + 3y2) = 0. ‘This simplifies toy? + ya = 0, the equation of a parabola with its vertex at the origin, ‘The parubola is Siow in Figue 530). 134 Bigenvalues of operators acting on Buclidean spaces ewe 4, Degenerate cases. A knowledge of the eigenvalues alone does not reveal whether the Cartesian equation represents a degenerate conic section. For example, the three equations x*+ 2y?= 1, x8+ 2p? = 0, and x? + 2y?= — | all have the same eigenvalues; the first represents a nondegenerate ellipse, the second is satisfied only by (x, y) = (0, 0), and the third represents the empty set. The last two can be regarded as degenerate cases of the ellipse. ‘Thw graph of die cquatiun y? = 0 iy ie avaniy. The equations y° = 1 — G sepresenis the two parallel lines y=1and y=— 1. These can be regarded as degenerate cases of the parabola, The equation x¢ = 4y* = 0 represents two intersecting lines since it is satisfied if either x— 2y =O or x+ 2p =0, This can be regarded as a degenerate case of the hyperbola. However, if the Cartesian equation ax® + hey + ey? + dex + ey + f = 0 represents a nondegenerate conic section, then the fype of conic can be determined quite easily. The characteristic polynomial of the matrix of the quadratic form ax? + bay + cy? is 2—a—b2)] | det [ | P= (a+ oh+ (ac — $= A— AMAA) —b)2 de. Therefore the product of the cigenvalues is Jydy = ac — 34 = 4(Aae — B*). Since the type of conic is determined by Uke algebraic sign of the praduct Ae, we see that the conie is an ellipse, hyperbola, ot parabola, according as dae — b* is positive, negative, or zero. The number dac — bf is called the discriminant of the quadratic form ax? + buy + ey. In Examples 1, 2 and 3 the discriminant has the values 34, -24, and 0, respectively. 5.15 Exercises In each of Bxereises 1 through 7, find (a) a symmetric matrix A for the quadratic form; (b) the eigenvalues of A; (c) an orthonormal set of eigenvectors; (d) an orthogonal diagonalizing matrix C. 1. 4x; + 4ay%2 +33. ss XE+ xyty + raXa + Hex. 2. aXe 6. 2x} + 4xyxa + x5 — xh 3. ad + xx, = xh 7. 3x; + Axpta + 8xpty + 4xpxy + 3x5. 4. 34x; = 24xpry + 4x3. In each of Exercises 8 dough 18, identify and the Cartesian equation. 8, y= 2xy +285 = 0. 14, 5x? + 6xy + 5y? -2 =O. 9, yt — Ixy + 5x = 0. 15, x* + Ixy + y* — 2x + 2p 43 10, y® = 2xy +22 - 5: 16. 2x? + dxy + Sy? — 2x —y = 11. 
5x2 — 4xy + 2y% - 6 =0, 17, x84 dxy — 2 = 1250. 12. 19x* + 4xy + 16y® = 212x + 104y = 356, 13, 9x2 + 2dxy + 16y® = 5:2x + Idy = 6, 18. xy + y -2x -2 =O. 19, For what value (or values) of © will the graph of the Cartesian equation 2xy — 4x + Ty + ¢ = 0 be a pair of lines ? 20. If the equation ax* + bxy + cy® represents an ellipse, prove that the area of the region it bounds is 2x] V4ac = 6%, This gives a geometric meaning to the discriminant 4ac — b*, whe @ sketch of the conic section represented by Higenvalues Of a symietne transformation obtained as values of its quadratic form 135 £5.16} Figenvalues of a symmetric transformation obtained as values of its quadratic form Now we drop the requirement that V be finite-dimensional and we find a relation between the eigenvalues of a symmetric operator and its quadratic for. Suppose x is an eigenvector with norm 1 belonging to an eigenvalue 4, Then T(x) so we have ax (3.13) Q&) = (TO), ) = (Ax, x) = Ale, = A, since (x, x) =: 1. The set of all x in V satisfying (x, x) = 1 is called the unit sphere in V. Equation (6.13) proves the following theorem, suzonen 5,12. Let T: V + V be a symmetric transformation on a real Euclidean space V, and let Q(x) «= (Tix), x). Then the eigenvalues of T (if uny exist) are to be found among the values that Q takes on the unit sphere in V. exaeiz. Let V = V,(R) with the usual basis (i,j) and the usual dot product as inner product. Let T be the symme quadratic fom of Tis given by 4 transformation with matrix A = [ 1 Then the 0 O(x) = Y Yayxxy = Axi + 8x}. ae ‘The eigenvalues of, 7 are Ay = 4, ay = 8. It is easy 10 see that these eigenvalues are, respectively, the minimum and maximum values which Q takes on the unit circle x? + xd = 1. In fact, on this circle we have Qx) = 4d + x2) t axta4t 4x9, whee -1 0 for all real t In other words, the quadratic polynomial p(t) = ar + bf has its minimum at 1 = 0. Hence p’(0) = 0. But p'(0) = a = 2T(x), y), 80 (Tix), y) = 0. Since y was arbitrary, we can in particular take y = T(x), getting (T(x), T(x)) = 0. This proves that Tix) = 0. If Q is nonpositive on V we get p/t) = at + bt? < 0 for all #, sop has its maximum at = 0, and hence p'(0) = 0 as before. 5:17 Extremal properties of eigenvalues of a symmetric transformation Now we shall prove that the extreme values of a quadratic form on the unit sphere are eigenvalues’. THEOREM 5.14. Let T:V—> V be a symmetric linear transformation on a real Euclidean space V, and let Q(x) = (Thx), x). Among all values that Q takes on the unit sphere, assume ‘The finite-dimensional case 137 there is an extremum} (maximum or minimum) at a point u with (yw = 1 . Then wisaneigen- vector for T: the corresponding eigenvalue is Qiu), the extreme value of Q on the unit sphere. Proof. Assume Q has a minimum atu Then we have 614) Q(x) > Olu) for all x with (x, x) = 1 Let A= Q(u). If (x, x) = 1 we have Q(u) = Ax, x) = (Ax, x) so inequality (5.14) can be writlen as 615) (Ta), x > Gx, 9) provided (x, x) = 1. Now we prove that (5.15) is valid for all x in V. Suppose [x||= @. Then x = ay, where jjyl| = 1. Hence (Te), x) = (T@), ay) = aT), 9) and (Ax, x)= ry, Y) But (70), ¥) > (4, ») since (y, y) = 1. Multiplying both members of this inequality by @ we get (5.15) for x = ay. Since (Tix), x) ~ (Ax, x) = (lx) — Ax, x), we can rewrite inequality (5.15) in the form (Tx) — Ax, x) > 0, or 6.16) (S@),x) 20, where $= 7-4. When x = u we have equality in (5.14) and hence also in (5.16). The linear transformation S is symmetric. 
Inequality (5.16) states that the quadratic form Q, given by Q,(x) = (S(X), x) is nonnegative on V, When x = u we have Q,(u) = 0. Therefore, by Theorem 5.13 we must have S(u) = 0. In other words, Tfu) = Au, so u is an eigenvector for T; and 4 = Qu) is the comesponding eigenvalue. This completes the proof if Q has a minimum atu. If there is a maximum at u all the inequalities in the foregoing proof are reversed and we apply ‘Theorem 5.13 to the nonpositive quadratic form Q, . 5.18 The finitedimensional case Suppose now that dim V =n. Then 7 has n real eigenvalues which can be amanged in increasing onder, say ASA SA, According to ‘Theorem 5.14, the smallest eigenvalue 4, 18 the minimum of Q on the unit sphere, and the largest eigenvalue is the maximum of Q on the unit sphere, Now we shall show that the intermediate eigenvalues also occur as extreme values of Q. restricted to certain subsets of the unit sphere. J If Vis infinite-dimensional, the quadratic form Q need be the case when T has no eigenvalues. In the finite- minimum somewhere on the unit sphere. This foll extreme values of continuous fimetions. For a special ea have an extremum on the unit sphere. ‘This Will mensional case, Q always has a maximum and a as a consequence of a more general theorem on of this theorem see Section 9.16. 138 Eigenvalues of opersiors acting on Fuclidean spaces Let uw be an eigenvector on the unit sphere which minimizes Q Then 4, = Qu) . I A is an eigenvalue different from 4, any eigenvector belonging 19 A must be onhogonal to 4. Therefore it is natural to search for such an eigenvector on the orthogonal complement of the subspace spanned by ws. Let S be the subspace spanned by #1. The orthogonal complement $+ consists of all elements in V orthogonal to 4. In particular, $+ contains all eigenvectors belonging to as T mape SL into itself.? Let $,_; denote the unit sphere in the (” — 1)-dimensional subspace S+, (The unit sphere $,,_, is a subset of the unit sphere in V.) Applying Theorem 5.14 to the subspace S+ we find that Ay = Q(u,) . where Wz is a point which minimizes Q on S,1- The next eigenvector Ag can be obtained in a similar way as the minimum value of Q on the unit sphere S,9 in the (n — 2)-dimensional space consisting of those elements orhog- onal to both #4 and Wp. Continuing in this manner we find that cach eigenvalue 4, is the minimum value which Q takes on a unit sphere S,,,41 in a subspace of dimension n= hk + 1. ‘The largest of these minima, 2,, is also the maaximum value which Q takes on each of the spheres S,,,1. The comesponding set of eigenvectors ty, .. . . 4, form an orthonormal basis for V. 5.19 Unitary transformations We conclude this chapter with a brief discussion of another important class of trans- formations known as unitary transformations, In the finite-dimensional case they are represented by unitary mutrices, vernunson. Let E be a Buclidean space and V a subspace of B. A linear transformation T: V— Bis called unitary on V if we have (5.17) (Tix), Ty) = (x, y) forall xandy in V. When is a real Buctidean space a unitary transformation is also called an orthogonal transformation. Equation (6.17) is described by saying that 7 preserves inner products. Therefore itis natural t0 expect that T' also preserves orthogonality and noms, since these are derived from the inner product rugorem 5.15. If 7: V— E is a unitary transformation on V, then for all x and y in V we have. (a) (& yy O implies (Tix), Ti) = 0 (Tpreserves orthogonality). () (7) = |x| (Tpreserves norms). 
(© IT) — TQ)| = Ix — yl (Tpreserves distances). . (@ Tis invertible, and Tis unitary on TV). T This was done in the proot of ‘Theorem 9.4, Section 9.6, Unitary transformations 139 Proof. Part (a) follows at once from Equation (5.17), Part (b) follows by taking x = y in (5.17). Part (©) follows from (b) because T(x) — Tiy) = T(x — y). To prove (d) we use (b) which shows that T(x) = 0 implies x = O, so If x © T{V) and y © 7(V) we can write x = Tlw), y = Tiv), so we have is invertible. (T'Q), T7Q) = &, 9) = (Tw), Tr) =). Therefore T-? is unitary on 7(V). Regarding eigenvalues and eigenvectors we have the following theorem. suzonmt 5.116, Let T: V > E be a unitary transformation on V. (@) If T has an eigenvalue A, then \A\ = (b) If x and y are eigenvectors belonging to distinct eigenvalues hand ft, then x and y are orthogonal. (c) 2 V = E and dim V = n, and if V is a complex space, then there exist eigenvectors ty, ..., Uy of T which form an orthonormal basis for V. The matrix of T relative to this basis is the diagonal matrix A = diag (Ay... ., A,), where dy is the eigenvalue belonging to Uy. Proof. To prove (a), let x be an eigenvector belonging to il, Then x # 0 and Tix) = Ax. Taking y = x in Equation (5.17) we get Gx, a=) or A(x, ) =, »). Since (x, x) > 0 and AZ = [Al?, this implies || To prove (b), write Tix) = Lx, Ty) = jay and compute the inner product (T(x), 11y)) in two ways. We have (Ti), TQ) = Gy) since T is unitary. We also. have (T(x), TO) = (Ax, wy) = Aa, 9) since x and y are eigenvectors, Therefore Afi(x, y) = (% y) « 90 (%, y) = 0 unless Aw = 1 But 47 = 1 by (a), so if we had 4 = 1 we would also have dai, i= fi, 2 = we, which contradicts the assumption that 4 and pe are distinct. Therefore 4ji # 1 and (x, y) = 0. Part (¢) is proved by induction on in much the same way that we proved Theorem 5.4, the corresponding result for Hermitian operators. The only change required is in that part of the proof which shows that 7 maps S into itself, where ={xlxeV, (x,m) = 0}. Here 4, is an eigenvector of 7 with eigenvalue J,, From the equation 7(u,) = Ay, we find w= APT) = AT) 140 Eigenvalues of operators acting on Euclidean spaces since 4,4, = [A |? = Now choose any x in S+ and note that (Tx), 1) = (TX), AT) = ATE), TO) = AO ms) = 0 Hence T(x) © S*it xe S*, so T maps S* imo itself. The rest oF the proof is identical with that of Theorem 5.4, so we shall not repeat the details, The next two theorems describe properties of unitary transformations on a finite- dimensional space. We give only a brief outline of the proofs. THEOREM 5.17. Assume dim V = n and let E Then a linear transformation T: V —» V is unitary (Cis. +++ @n) be a jixed basis for V. and only if (5.18) (Tle), T(e)) = (@, @) for all i and j In particular, if E is orthonormal then T is unitary if and only if T maps E onto an orthonormal basis. Sketch of proof. Write x= ¥ xe. y = Y Yes. Then we have Sse Eye) =3 Bante, A y= and (10), TO = (Sx70€),3 9,71) =¥ ZxT@), Tey Now compare (x, y) with (T(x), Thy). ‘icone 5.18. Assume dim V = n and let (e,,..., e,) be an orthonormal basis for V. Let A = (a;;) be the matrix representation of a linear transformation T: V —» V relative to this basis, Then T is unitary if and only if A is unitary, that is, if and only if (3.19) AA Sketch of proof. Since (e:, @;) is the implies ntry of the identity matrix, Equation (5.19) 6.20) (643.6) = ductas = & ands A a Since A ig the main of Twe have The) — Fe. utys Tle) - Ye. ayes oe Since A is the matrix of Pure have Te) — Epy tutes Me) — Thy suey oe (Te). 
Tey) = (Sanees 3 anes) = 3 ¥ adilees 6) = Sande Now compare this with (5.20) and use Theorem 5.17. Exercises 14 muzoreu 5.19. Every unitary matrix A has the following properties: (@ A is nonsingular and A! = A*. (b) Bach of A’, A, and A* is a unitary matrix, (0) The eigenvalues of A ar complex numbers of absolute value 1 (@) |det A| == 1; if A is real, then det A= +1 ‘The proof of Theorem 5.19 is left as an exercise for the reader, 5.20 Exercises 1. (a) Let 1: ¥ + ¥ be the transformation given by Tix) = ex, where ¢ is a fixed scalar. Prove that 1 is unitary if and only if lel = 1 (b) If V is one-dimensional, prove that the only unitary transformations on V are those des- cribed in (a). In particular, if Vis a real one-dimensional space, there ate only two onhogonal transformations, T(x) — x and Tx) = —x Prove each of the following statements about a real orthogonal # xm matrix A. (a) If 2 is a real eigenvalue of A, then 2 -1 ()H Ais ao x conus In other words, the nonreal eigenvalues of A occur in conjugate pai (© If mis odd, then A has at least one real eigenvalue. 3. Let ¥ be a real Euclidean space of dimension 7. An orthogonal transformation 4: ¥ — with determinant 1 is called a rotation, If n is odd, prove that 1 is an eigenvalue for T. This shows that every rotation of an odd-dimensional space has a fixed axis. [Hint: Use Exercise 2.] 4, Given a real orthogonal matrix A with -1 as an eigenvalue of multiplicity &, Prove that det A = CL 5. If 7 is linear and norm-preserving, prove that Tis unitary. 6. If T: ¥ ~+ V is both unitary and Hermitian, prove that T?= J. 7. Let (@,,...,¢,) and (yy... , ua) be two orthonormal bases for a Euclidean space ¥. Prove that there is a unitary transformation T which maps one of these bases onto the other. 8. Find a real a such that the following matrix is unitary: Rp Wien cigeavalc of A, th i is also an @ oH yaQi-ty ja 30 +i) jal -d |. a -} 4a(2 — i) 9. If 4 is a skew-Hermitian matrix, prove that both ZA and Z + 4 are nonsingular and that (Z = ANZ + AY is unitary. 10. If A is a unitary matrix and if Z + A is nonsingular, prove that (I — A)(Z + A)-) is skew- ‘Hermitian. 11, If A is Hermitian, prove that A — if is nonsingular and that (A ~ i)3(4 4 if) is unitary. 12. Prove that any unitary matrix can be diagonalized by a unitary matrix. 13. A square matrix is called normal if AA* = A*A . Determine which of the following types of matrices ae normal. (@) Hermitian mauiees, @ Skew-symmeuic: maces. (b) Skew-Hermitian matrices (©) Unitary matrices (©) Symmettic matrices. (Orthogonal matrices 14, I-A 18 a normal matnx (4A* = A*A) and if U 18 a unitary matrix, prove that U*AU is normal. 6 LINEAR DIFFERENTIAL EQUATIONS 6.1 Historical introduction The history of differential equations began in the 17th century when Newton, Leibniz, and the Bemoullis solved some simple differential equations of the first and second order arising from problems in geometry and mechanics. These early discoveries, beginning about 1690, seemed {© suggest that the solutions of all differential equations based on geometric and physical problems could he expressed in terms of the familiar elementary functions of calculus. 
Therefore, much of the early work was aimed at developing ingenious techniques for solving differential equations by clementary means, that is to say, by addition, sub- gration, applied only a finite action number of times to the familiar functions of calculus, Special methods such as separation of variables and the use of integrating factors were devised more or less haphazardly before the end of the 17th century. During the 18th century, more systematic procedures were developed, primarily by Euler, Lagrange, and Laplace. It soon became apparent that relatively few differential equations could be solved by elementary means. Little by little, mathematicians began 10 realize that it was hopeless to try t discover methods for solving all differential equations. Instead, they found it more fruitful to ask whether or not a given differential equation has any solution at all and, when it has, t ty to deduce properties of the solution from the differential equation itself. Within this framework. mathematicians began to think of differential equations as new sources of functions, ‘An important phase in the theory developed carly in the 19h century, paralleling the general uend foward a more rigowus approach to the calculus, In the 1820's, Cauchy obtained the first “existence theorem” for differential equations, He proved that every firstorder equation of the form multiplication, division, composition, and y= f(x,y) has a solution whenever the right member, f(x,y), satisfies certain general conditions One important example is the Ricatti equation y= Ploy + Qe)y + RO), where P, Q, and R are given functions, Cauchy’s work implies the existence of a solution of the Ricatti equation in any open interval (=r, 1) about the origin, provided P, Q, and 142 Review of results concerning linear equations of first and second orders 143 R have power-series expansions in (—r, r}. In 1841 Joseph Liouville (1809-1882) showed that in some cases this solution cannot be obtained by elementary means. Experience has shown that it is difficult to obtain results of much generality about solutions of differential equations, except for a few types. Among these are the so-called linear ifferential equations which occur in a great variety of scientific problems. Some simple types were discussed! in Volume T-linear equations of first order and linear equations of second order with constant coefficients. The next section gives a review of the principal results concemning these equations 6.2 Review of results concerning linear equations of first and second orders A linear differential equation of first order is one of the form (6.1) Y¥ + PQ = QQ), where P and Q are given functions. In Volume I we proved an existence-uniqueness theorem for this equation (Theorem 83) which we restate here. turorem 6.1. Assume P and Q are continuous on an open interval J. Choose any point a in J and let b be any real number. Then there is one and only one function y = f (x) which satisfies the differential equation (6.1) and the initial condition fla) = b . This function is given by the explicit formula 62) Fea) = bea + eat | neta, where A(x) = Jz P(t) dt. Linear equations of second order are those of the form Pry" + Proy’ + Paldy = RQ). If the coefficients Py, Py, P, and the right-hand member R are continuous on some interval J, and if Py is never zero on J, an existence theorem (discussed in Section 6.5) guarantees that solutions always exist over the interval J. 
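Formula (6.2) lends itself to a direct symbolic check. The short sketch below is my own illustration, not part of the text: it picks a particular P, Q, initial point a, and value b, builds f from (6.2) with a computer algebra system, and confirms that f satisfies both the differential equation and the initial condition. The specific choices P(x) = 1/x, Q(x) = e^x, a = 1, b = 2 are assumptions made only for the example.

```python
import sympy as sp

x, t, s = sp.symbols('x t s', positive=True)

# Illustrative data (my own choices, not from the text):
# solve y' + P(x) y = Q(x) with initial condition y(a) = b.
P = 1 / x
Q = sp.exp(x)
a, b = 1, 2

# A(x) = integral from a to x of P(t) dt   (here A(x) = log x)
A = sp.integrate(P.subs(x, s), (s, a, x))

# Formula (6.2): f(x) = b e^{-A(x)} + e^{-A(x)} * integral_a^x Q(t) e^{A(t)} dt
f = b*sp.exp(-A) + sp.exp(-A)*sp.integrate(Q.subs(x, t)*sp.exp(A.subs(x, t)), (t, a, x))

# f should satisfy the differential equation and the initial condition.
print(sp.simplify(sp.diff(f, x) + P*f - Q))   # expected output: 0
print(sp.simplify(f.subs(x, a)))              # expected output: 2
```

Any other P and Q continuous on an interval containing a could be substituted; the two printed expressions should again simplify to 0 and to b.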
Nevertheless, there is no general formula analogous to (6:2) for expressing these solutions in terms of Py, Py, Pes and R Thus, even in this relatively simple generalization of (6.1), the theory is far from complete. except in special cases. If the coefficients are constants and if R is zero, all the solutions can be determined explicitly in terms of polynomials, exponential and trigonometric functions by tie following theorem which was proved in Volume 1 (Theorem 8.7) razonm 6.2. Consider the differential equation (6.3) y" + ay! + by =0, 144 Linear differential equations where a and b are given real constants. Let d= a? = 4b, Then every solution of (63) on the interval (= 0, + 00) has the form raeBeuy(x) + Catta(x)), (6.4) a where cand ¢, are constants, and the functions u, and u, are determined according to the algebraic sign of das follows: (a) If d = 0, then u(x) = 1 and u(x) = x _ () If d> 0, then u(x) = e and u(x) =e, wherek=4Nd. (©) Fd < 0, then u(x) = cos kx and g(x) = sin kx, where k= 3V—d. ‘The number d= a? = 4b is the discriminant of the quadratie equation (6.5) rPtart+b ‘This is called the characteristic equation of the differential equation (6.3). Its roots are given by -a-v ‘The algebraic sign of d determines the nature of these roots. If d > 0 both roots are real y= ce + Cpe. If d <0, the roots r and 7, are conjugate complex numbers. Each of the complex exponential functions f(x) = ef and f,(x) = e"* is a complex solution of the differential equation (6.3). We obtain real solutions by examining the real and imaginary parts of fy and fy. Writing y= —4a + ik, rg = —}a —ik, where k = }/—d, we have Sl) =e = eC = 9-H! C05 kee + je! sin kx and masin kx. lx) = 8 = HCH The general solution appearing in Equation (6.4) is a linear combination of the real and imaginary parts of f(x) and f(x). 6.3. Exercises ‘These exercises have been selected from Chapter 8 in Volume I and are intended as a review of the intoductory material on linear differential eyuatious uf first aud second orders. Linear equations of first order. In Exercises 1,2,3, solve the initial-value problem on the specified interval 1 y! - 3y =e on (—©, +0), with y =Owhenx 2. xy’ — 2p = x5 on ©, +), with y = 1 when x = 1. 3. y' + ylanx = sin 2x on (—4n, #2), with y = 2 wnen x = Linear differential equations of order 145 4, If a strain of bacteria grows at a rate proportional to the amount present and iff the population doubles in one hour, by how much will it inerease at the end of two hours? 5, A enrve with Cartesian equation y = f(x) passes through the origin, Lines drawn parallel to the coordinate axes through an arbitary’ point of the curve form a rectangle with two sides on the axes. The curve divides every such rectangle into two regions A and B, one of which has an area equal tom times the other. Find the function f- 6. (a) Let u be a nonzero solution of the second-order equation y” + P(xdy’ + O(xdy Show that the substitution y = we converts the equation 0. RO) Jy" + POdy’ + Oo) imo a firstorder linear equation for v” (b) Obtain a nonzero solution of the equation. y use the method of part (a) to find a solution of 4y! + x3)" = 4y) = 0 by inspection and y= Ay + By — 4y) = Dre such that y = 0 and y’ = 4 when x =0. Linear equations of second order with constant coefficients, In cach of Exercises 7 through 10, find all solutions on ( = %, + 0). Ty! Ay =o. 9. yl" 2p 4 Sy = 8. y" + 4y = 0, 10. 
y+ 2y' +y =0 11, Find all values of the constant & such that the differential equation y" + ky = 0 has a nontrivial solution y = f,(x) for which&(O) = f,(1) = 0. For each permissible value of k, determine the conresponding solution y = f,(x) . Consider both positive and negative values of k. 12.10 (a, 6) is a given point in the plane and if m is & given real number, prove thal the differential equation y" + ky = 0 has exacly one solution whose graph passes through (a, 6) and has slope m there. Discuss separately the case k = 0. 13, In each case, find a linear differential equation of second order satisfied by mand up. (a) m@)=e* g(x) = e*. (b) m(x) = , g(x) = xe. © mlx) = e** cos x, ule) @) ma) = sin @x +1), ug) = sin 2x + 2). (e) u(x) = cosh x. u(x) = sinh x 14. A particle undergoes simple harmonic motion, Initially its displacement is 1, its velocity is 1 and its acceleration is — 12. Compute its displacement and acceleration when the velocity is V8. 6A Linear differential equations of order A linear differential equation of order n is one of the form (6.6) Poa + PAO + oo + Pa(xdy = ROD The functions Py. P,.... . P, multiplying the various derivatives of the unknown function y are called the coefficients of the equation, In our general discussion of the linear equation we shall assume that all the coefficients are continuous on some interval J. The word “interval” will refer either to a bounded or to an unbounded interval. 146 Linear differential equations In the differential equation (6:6) the leading coefficient P, plays a special role, since it etermines the order of the equation, Points at which P.(x) = 0 are called singular points of the equation, The presence of singular points sometimes introduces complications that require special investigation. To avoid these difficulties we assume that the function Pg iy never 2210 on of Then we can divide both sides of Equation (66) by Py ant wewsite the differential equation in a form with leading coefficient 1, Therefore, in our general dis- cussion we shall assume that the differential equation has the form 1) YPM 4 ot PaGy = RO). The discussion of linear equations can be simplified by the use of operator notation, Let €(J) denote the linear space consisting of all real-valued functions continuous on an. interval I Let 6*(J) denote the subspace consisting of all functions f° whose first m derivatives P’sf'y ++ .,f exist and ate continuous on ef Let Py, ... , Py be nm given functions in (J) and consider the operator L: €*(J) +> Vid given by Up =f PfODa vere Pf, The operator L itself is sometimes written as L=D*+ PDO +--+ Py, where D¥ denotes the kth derivative operator. In operator notation the differential equation in (67) is written simply as (6.8) Ly) = RB. A solution of this equation is any function y in @"(J) which satisfies (68) on the interval J. It is easy to verify that L(y, + y2)= L(y.) + Lz), and that L(ey) = eL(y) for every constant ¢. Therefore L is a linear operator. This is why the equation L(y) = Ris refered to as a linear equation. The operator L is called a linear differential operator ef order n With each linear equation L(y) = R we may associate the equation L(y) = 0, in which the righthand side bas been replaced by zero, This is called the homogeneous equation corresponding to L{y) = R. 
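As a concrete illustration of the operator notation (not an example from the text), the following sketch builds a second-order operator L = D^2 + P_1(x)D + P_2(x) for one particular choice of coefficients and checks the two linearity properties symbolically. The coefficients P_1(x) = sin x, P_2(x) = x^2 and the test functions are assumptions chosen only for this demonstration.

```python
import sympy as sp

x, c = sp.symbols('x c')

# One particular second-order operator L = D^2 + P1(x) D + P2(x);
# the coefficients below are illustrative assumptions, not from the text.
P1, P2 = sp.sin(x), x**2

def L(f):
    """Apply the differential operator L to the expression f(x)."""
    return sp.diff(f, x, 2) + P1*sp.diff(f, x) + P2*f

y1, y2 = sp.exp(x), sp.cos(x)   # two arbitrary smooth test functions

# The two linearity properties: L(y1 + y2) = L(y1) + L(y2) and L(c y1) = c L(y1).
print(sp.simplify(L(y1 + y2) - (L(y1) + L(y2))))   # expected: 0
print(sp.simplify(L(c*y1) - c*L(y1)))              # expected: 0
```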
When R is not identically zero, the equation L(y) = R is called a nonhomogeneous equation. We shall find that we can always solve the nonhomogeneous equation whenever we can solve the corresponding homogeneous equation. Therefore, we begin our study with the homogeneous case.

The set of solutions of the homogeneous equation is the null space N(L) of the operator L. This is also called the solution space of the equation. The solution space is a subspace of C^n(J). Although C^n(J) is infinite-dimensional, it turns out that the solution space N(L) is always finite-dimensional. In fact, we shall prove that

(6.9)  dim N(L) = n,

where n is the order of the operator L. Equation (6.9) is called the dimensionality theorem for linear differential operators. The dimensionality theorem will be deduced as a consequence of an existence-uniqueness theorem which we discuss next.

6.5 The existence-uniqueness theorem

THEOREM 6.3. EXISTENCE-UNIQUENESS THEOREM FOR LINEAR EQUATIONS OF ORDER n. Let P_1, ..., P_n be continuous functions on an open interval J, and let L be the linear differential operator

L = D^n + P_1 D^{n-1} + ... + P_n .

If x_0 ∈ J and if k_0, k_1, ..., k_{n-1} are n given real numbers, then there exists one and only one function y = f(x) which satisfies the homogeneous differential equation L(y) = 0 on J and which also satisfies the initial conditions

f(x_0) = k_0,  f'(x_0) = k_1, ...,  f^{(n-1)}(x_0) = k_{n-1} .

Note: The vector in n-space given by (f(x_0), f'(x_0), ..., f^{(n-1)}(x_0)) is called the initial-value vector of f at x_0. Theorem 6.3 tells us that if we choose a point x_0 in J and choose a vector in n-space, then the homogeneous equation L(y) = 0 has exactly one solution y = f(x) on J with this vector as initial-value vector at x_0. For example, when n = 2 there is exactly one solution with prescribed value f(x_0) and prescribed derivative f'(x_0) at a prescribed point x_0.

The proof of the existence-uniqueness theorem will be obtained as a corollary of more general existence-uniqueness theorems discussed in Chapter 7. An alternate proof for the case of equations with constant coefficients is given in Section 7.9.

6.6 The dimension of the solution space of a homogeneous linear equation

THEOREM 6.4. DIMENSIONALITY THEOREM. Let L: C^n(J) → C(J) be a linear differential operator of order n given by

(6.10)  L = D^n + P_1 D^{n-1} + ... + P_n .

Then the solution space of the equation L(y) = 0 has dimension n.

Proof. Let V_n denote the n-dimensional linear space of n-tuples of real numbers, and let T be the linear transformation that maps each function f in the solution space N(L) onto the initial-value vector of f at x_0,

T(f) = (f(x_0), f'(x_0), ..., f^{(n-1)}(x_0)),

where x_0 is a fixed point in J. The uniqueness theorem tells us that T(f) = 0 implies f = 0. Therefore, by Theorem 2.10, T is one-to-one on N(L). The existence theorem tells us that T maps N(L) onto V_n. Hence T^{-1} is also one-to-one and maps V_n onto N(L), and Theorem 2.11 shows that dim N(L) = dim V_n = n.

Now that we know that the solution space has dimension n, any set of n independent solutions will serve as a basis. Therefore, as a corollary of the dimensionality theorem we have

THEOREM 6.5. Let L: C^n(J) → C(J) be a linear differential operator of order n. If u_1, ..., u_n are n independent solutions of the homogeneous differential equation L(y) = 0 on J, then every solution y = f(x) on J can be expressed in the form

(6.11)  f(x) = Σ_{k=1}^{n} c_k u_k(x),

where c_1, ..., c_n are constants.
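The role of the initial-value vector in the proof above can be seen concretely in a small case. The sketch below is my own illustration, not drawn from this section: it uses the equation y'' + y = 0 with n = 2, takes the independent solutions u_1 = cos x and u_2 = sin x, and shows that each prescribed initial-value vector (k_0, k_1) at x_0 = 0 is matched by exactly one choice of the constants c_1, c_2, reflecting the fact that the map T is one-to-one and onto.

```python
import sympy as sp

x, c1, c2, k0, k1 = sp.symbols('x c1 c2 k0 k1')

# Illustrative case n = 2 (my own choice): the equation y'' + y = 0 has the
# independent solutions u1 = cos x and u2 = sin x.
u1, u2 = sp.cos(x), sp.sin(x)
y = c1*u1 + c2*u2

# The map T(f) = (f(x0), f'(x0)) used in the proof of Theorem 6.4, at x0 = 0.
x0 = 0
eqs = [sp.Eq(y.subs(x, x0), k0),
       sp.Eq(sp.diff(y, x).subs(x, x0), k1)]

# For each prescribed initial-value vector (k0, k1) there is exactly one
# solution of the system, i.e. exactly one pair (c1, c2):
print(sp.solve(eqs, (c1, c2)))   # expected: {c1: k0, c2: k1}
```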
Note: Since all solutions of the differential equation L(y) = 0 are contained in formula (6.1 D, the linear combination on the right, with arbitrary constants ¢, ty is sometimes called the general solution of the differential equation, The dimensionality theorem tells us that the solution space of a homogeneous linear differential equation of order 1 always has a basis of m solutions, but it does not tell us how to determine such a basis. In fact, no simple method is known for determining a basis of solutions for every linear equation, However, special methods have been devised for special equations. Among these are differential equations with constant coefficients to which we tum now, 67 The algebra of constant-coefficient operators A. constant-coefficient operator A is a linear operator of the form 6.12) A= aD" + D4 +614 a,.D + a, where D is the derivative operator and a,, 4,, + @, are real constants. If a, # 0 the operator is said to have order m, The operator A can be applied to any function y with derivatives on some interval, the result being a function Afy) given by Aly) = ay + ayyhY tree aay + ay In this section, we restrict our attention to functions having derivatives of every order on (= co, + co). ‘The set of all such functions will be denoted by @® and will be refered to as the class of infinitely differentiable functions. If y € @® then A(y) is also in €°, ‘The usual algebraic operations on linear transformations (addition, multiplication by scalars, and composition or multiplication) can be applicd, in particular, to constant- coefficient operators. Let A and B be wo constant-coeflicient operators (not necessatily of the same order). Since the sum A + B and all scalar multiples /A are also constant- coeficient operators, the set of all constant-coefficient operators is a linear space. The product of A and B (in either order) is also a constant-coeflicient operator. Therefore, sums, prodnets, and scalar multiples of constant-coefficient operators satisfy the usual commutative, associative, and distributive laws satisfied by all linear transformations. Also, since we have D'D* = D*D? for all positive integers r and s, any two constant- cocificiem operuors commu; AB = BA . The algebra of constant-coefficient operators 149 With each constant-coefficient operator A we associate a polynomial py called the characterise polynomial of A. If A is given by (6.12), p4 18 that polynomual which has the same coefficients as A. That is, for every real r we have ale) = agt" + apt ee + a, Conversely, given any real polynomial p, there is a corresponding operator A whose coefficients are the same as those ofp. The next theorem shows that this association between operators and poiynomiais is a one-to-one correspondence. Moreover, this correspondence associates with sums, products, and scalar multiples of operators the respective sums, products, and scalar multiples of their characteristic polynomials. ruonn 6.6. Let A and B denote constant-coefficient operators with characteristic polynomials py and pz, respectively, and let 2 be a real number. Then we have: (a) A= B if and only if py = prs (©) pare = Pat Par (©) Paw = Pa Pa» @) Papas hPa Proof. We consider part (a) first. Assumep, = py . We wish to prove that A(y) = Bly) for every y in @*, Since P4 = Py, doth polynomials have the same degree and the coefficients, Therefore A and B have the same order and the same coeflicients, so (3) B(y) for every y in €°. ‘Next we prove that A = B implies py = py. 
The relation A = B means that A(y) = B(y) for every y in C^∞. Take y = e^{rx}, where r is a constant. Since y^{(k)} = r^k e^{rx} for every k ≥ 0, we have

A(y) = p_A(r) e^{rx}   and   B(y) = p_B(r) e^{rx}.

The equation A(y) = B(y) implies p_A(r) = p_B(r). Since r is arbitrary we must have p_A = p_B. This completes the proof of part (a). Parts (b), (c), and (d) follow at once from the definition of the characteristic polynomial.

From Theorem 6.6 it follows that every algebraic relation involving sums, products, and scalar multiples of the polynomials p_A and p_B also holds for the operators A and B. In particular, if the characteristic polynomial p_A can be factored as a product of two or more polynomials, each factor must be the characteristic polynomial of some constant-coefficient operator, so, by Theorem 6.6, there is a corresponding factorization of the operator A. For example, if p_A(r) = p_B(r) p_C(r), then A = BC. If p_A(r) can be factored as a product of n linear factors, say

(6.13)  p_A(r) = a_0 (r − r_1)(r − r_2) ··· (r − r_n),

the corresponding factorization of A takes the form

A = a_0 (D − r_1)(D − r_2) ··· (D − r_n).

The fundamental theorem of algebra tells us that every polynomial p_A(r) of degree n ≥ 1 has a factorization of the form (6.13), where r_1, r_2, ..., r_n are the roots of the equation

p_A(r) = 0,

called the characteristic equation of A. Each root is written as often as its multiplicity indicates. The roots may be real or complex. Since p_A(r) has real coefficients, the complex roots occur in conjugate pairs, α + iβ, α − iβ, if β ≠ 0. The two linear factors corresponding to each such pair can be combined to give one quadratic factor r^2 − 2αr + α^2 + β^2 whose coefficients are real. Therefore, every polynomial p_A(r) can be factored as a product of linear and quadratic polynomials with real coefficients. This gives a corresponding factorization of the operator A as a product of first-order and second-order constant-coefficient operators with real coefficients.

EXAMPLE 1. Let A = D^2 − 5D + 6. Since the characteristic polynomial p_A(r) has the factorization r^2 − 5r + 6 = (r − 2)(r − 3), the operator A has the factorization

D^2 − 5D + 6 = (D − 2)(D − 3).

EXAMPLE 2. Let A = D^4 − 2D^3 + 2D^2 − 2D + 1. The characteristic polynomial p_A(r) has the factorization

r^4 − 2r^3 + 2r^2 − 2r + 1 = (r − 1)^2 (r^2 + 1),

so A has the factorization A = (D − 1)^2 (D^2 + 1).

6.8 Determination of a basis of solutions by factorization of operators for linear equations with constant coefficients

The next theorem shows how factorization of constant-coefficient operators helps us to solve linear differential equations with constant coefficients.

THEOREM 6.7. Let L be a constant-coefficient operator which can be factored as a product of constant-coefficient operators, say

L = A_1 A_2 ··· A_k.

Then the solution space of the linear differential equation L(y) = 0 contains the solution space of each differential equation A_i(y) = 0. In other words,

(6.14)  N(A_i) ⊆ N(L)   for each i = 1, 2, ..., k.

Proof. If u is in the null space of the last factor A_k we have A_k(u) = 0, so

L(u) = (A_1 ··· A_k)(u) = (A_1 ··· A_{k−1})(A_k(u)) = (A_1 ··· A_{k−1})(0) = 0.

Therefore the null space of L contains the null space of the last factor A_k. But since constant-coefficient operators commute, we can rearrange the factors so that any one of them is the last factor. This proves (6.14).

If L(u) = 0, the operator L is said to annihilate u. Theorem 6.7 tells us that if a factor A_i of L annihilates u, then L also annihilates u. We illustrate how the theorem can be used to solve homogeneous differential equations with constant coefficients.
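Before the worked examples, the correspondence between factoring p_A and factoring A can also be checked mechanically. The sketch below is mine, not the text's: it factors the two characteristic polynomials from Examples 1 and 2 with a computer algebra system, and then verifies on an arbitrary smooth function f that (D − 2)(D − 3) applied to f agrees with (D^2 − 5D + 6)f.

```python
import sympy as sp

r, x = sp.symbols('r x')
f = sp.Function('f')(x)

# Factor the characteristic polynomials of Examples 1 and 2.
print(sp.factor(r**2 - 5*r + 6))                     # (r - 2)*(r - 3)
print(sp.factor(r**4 - 2*r**3 + 2*r**2 - 2*r + 1))   # (r - 1)**2*(r**2 + 1)

# Check the operator factorization of Example 1 on an arbitrary smooth f:
# (D^2 - 5D + 6) f  versus  (D - 2) applied to (D - 3) f.
D = lambda g: sp.diff(g, x)
lhs = D(D(f)) - 5*D(f) + 6*f
inner = D(f) - 3*f
rhs = D(inner) - 2*inner
print(sp.simplify(lhs - rhs))                        # expected: 0
```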
We have chosen examples to illustrate different features, depending on the nature of the roots of the characteristic equation.

CASE I. REAL DISTINCT ROOTS.

EXAMPLE 1. Find a basis of solutions for the differential equation

(6.15)  (D^3 − 7D + 6)y = 0.

Solution. This has the form L(y) = 0 with

L = D^3 − 7D + 6 = (D − 1)(D − 2)(D + 3).

The null space of D − 1 contains u_1(x) = e^x, that of D − 2 contains u_2(x) = e^{2x}, and that of D + 3 contains u_3(x) = e^{−3x}. In Chapter 1 (p. 10) we proved that u_1, u_2, u_3 are independent. Since three independent solutions of a third-order equation form a basis for the solution space, the general solution of (6.15) is given by

y = c_1 e^x + c_2 e^{2x} + c_3 e^{−3x}.

The method used to solve Example 1 enables us to find a basis for the solution space of any constant-coefficient operator that can be factored into distinct linear factors.

THEOREM 6.8. Let L be a constant-coefficient operator whose characteristic equation p_L(r) = 0 has n distinct real roots r_1, r_2, ..., r_n. Then the general solution of the differential equation L(y) = 0 on the interval (−∞, +∞) is given by the formula

(6.16)  y = Σ_{k=1}^{n} c_k e^{r_k x}.

Proof. We have the factorization L = a_0(D − r_1)(D − r_2) ··· (D − r_n). Since the null space of (D − r_k) contains u_k(x) = e^{r_k x}, the null space of L contains the n functions

(6.17)  u_1(x) = e^{r_1 x},  u_2(x) = e^{r_2 x}, ...,  u_n(x) = e^{r_n x}.

In Chapter 1 (p. 10) we proved that these functions are independent. Therefore they form a basis for the solution space of the equation L(y) = 0, so the general solution is given by (6.16).

CASE II. REAL ROOTS, SOME OF WHICH ARE REPEATED. If all the roots are real but not distinct, the functions in (6.17) are not independent and therefore do not form a basis for the solution space. If a root r occurs with multiplicity m, then (D − r)^m is a factor of L. The next theorem shows how to obtain m independent solutions in the null space of this factor.

THEOREM 6.9. The m functions

u_1(x) = e^{rx},  u_2(x) = x e^{rx}, ...,  u_m(x) = x^{m−1} e^{rx}

are m independent elements annihilated by the operator (D − r)^m.

Proof. The independence of these functions follows from the independence of the polynomials 1, x, x^2, ..., x^{m−1}. To prove that u_1, u_2, ..., u_m are annihilated by (D − r)^m we use induction on m.

If m = 1 there is only one function, u_1(x) = e^{rx}, which is clearly annihilated by (D − r). Suppose, then, that the theorem is true for m − 1. This means that the functions u_1, ..., u_{m−1} are annihilated by (D − r)^{m−1}. Since

(D − r)^m = (D − r)^{m−1}(D − r),

the functions u_1, ..., u_{m−1} are also annihilated by (D − r)^m. To complete the proof we must show that (D − r)^m annihilates u_m. Therefore we consider

(D − r)^m u_m = (D − r)^{m−1}(D − r)(x^{m−1} e^{rx}).

We have

(D − r)(x^{m−1} e^{rx}) = D(x^{m−1} e^{rx}) − r x^{m−1} e^{rx} = (m − 1)x^{m−2} e^{rx} + r x^{m−1} e^{rx} − r x^{m−1} e^{rx} = (m − 1)x^{m−2} e^{rx} = (m − 1)u_{m−1}(x).

When we apply (D − r)^{m−1} to both members of this last equation we get 0 on the right since (D − r)^{m−1} annihilates u_{m−1}. Hence (D − r)^m u_m = 0, so u_m is annihilated by (D − r)^m. This completes the proof.

EXAMPLE 2. Find the general solution of the differential equation L(y) = 0, where

L = D^3 − D^2 − 8D + 12.

Solution. The operator L has the factorization

L = (D − 2)^2(D + 3).

By Theorem 6.9, the two functions u_1(x) = e^{2x}, u_2(x) = x e^{2x} are in the null space of (D − 2)^2. The function u_3(x) = e^{−3x} is in the null space of (D + 3).
Since uy, up, Hy are independent (see Exercise 17 of Section 69) they form a basis for the null space of L, so the general solution of the differential equation is y= ce + cxe™ + cen, Theorem 69 tells us how to find a basis of solutions for any nth order linear equation with constant coefficients whose characteristic equation has only real roots, some of which ‘are repeated. If the distinct roots are rh, rz, + + ++ Te and if they occur with respective multiplicities 7, mz, ... , My» that part of the basis corresponding tor, is given by the my functions po) = xe, where G= 1,2 y..45 My ‘Asp takes the values 1,2,...,k we get mn +--+ + my functions altogether, In Exercise 17 of Section 6.9 we outline a proof showing that all these functions are independent, Since the sum of the muliplicities m, + +++ + m, is equal 10 m, the order of the equation, the functions vy, form a basis tor the solution space of the equation, wxaeuz 3. Solve the equation (D* + 2D° — 2D3 = Diy = 0. Solution, We have D* + 2D* = 2D* — Dt = DXD —1)(D + 1°. The part of the basis comesponding wo the factor Dt is m(x) = 1, ug(x) = x i the part comesponding to the factor (D = 1) is ug(x) =e? ; and the part corresponding to the factor (D+ 1) is u(x) = e*, us(x) = xe, g(x) = x%e-*. The six functions uy, ... , ve are independent so the general solution of the equation is Het eg + cyt + (ey + Gx + Ogxtle®. CASE IL, Complex roots. If complex exponentials are used, there is no need to distinguish between real and complex rots of the characteristic equation of the differential equation L(y) = 0. If real-valued solutions are desired, we factor the operator [into linear and quadratic factors with real coefficients. Each pair of conjugate complex roots % + if, & — iB corresponds to a quadratic factor, (6.18) D? — 2xD + a2 + BR. The null space of this second-order operator contains the two independent functions u(x) = e** cos Bx and v(x) = e** sin Ax. If the pair of roots a + i8 occurs with multi- phicity m, the quadratic factor occurs to the mth power, ‘Ihe null space of the operator [DP wD sot pam 154 Linear differential equations contains 2m independent functions, u(x) = xe cos Ax, u(x) = xe sin Bx, gq =1,2,...,m. P by mt on me (Proofs are in B 28 of Section 69.) The following examples illustrate some of the possiblities EXAMPLE 4, yp" = 4p" + 13y’ = 0, ‘The characteristic equation, r?— 4r? + 13r = 0, has the roots 0, 2 + i; the general solution is Y= G+ eM(cy C08 3x + Cy sin 3x) EXAMPLE 5. y"—2y" + 4y! — 8y = 0. Tho characteristic equation is Ps e —8 =r — (P44 =0; its roots are 2, 2i, —2i, 90 the general solution of the differential equation is y= Qe" + Ce cos 2x + ¢y sin 2x. EXAMPLE 6, y) — 9y + 34y” — 66y" + 65y’ = 25y = 0. The characteristic equation canbe writen’ as GN 4s 5Pao; its roots are 1, 2 + i, 2 4, so the general solution of the differential equation is y= ce + &[ce + eax) cos x + (64 + Cx) sin x1. 6.9 Exercises Find the general solution of each of the differential equations in Exercises 1 through 12. 1 0. 1. yl + 16y = 0. 2 &y”-y=o 3. 0. 9. y+ Ay! + By" + By" + dy = 0. 4. y" = 3y" +37’ —y =0 10. y) + 2y"+y = 0. 5. yO + dy" + Oy" + dy’ + y = 0. U1, y+ dy@ + dy” = 6 ya) — [oy = 0. 12, yl 4 gy) + 16y" =O. 13. If m is a positive constant, find that particular solution y — f(x) of the differential equation = my" + my! = ny = 0 which satisfies the condition (0) = f'(0) = 0,f"@) = 1. 14. A linear differential equation with ‘constant coefficients has characteristic equation f(r) = 0. 
If all the roots of the characteristic equation are negative, prove that every solution of the differ- ential equation approaches zero as x + + 2. What can you conclude about the behavior of all solutions on the interval (0, + co) if all the roots of the characteristic equation are nonpasitive? Exercises 155 1.5. In each case, find a linear differential equation with constant coefficients satisfied by all the given functions. (@) ma)=e, mO)=e%, may = e, (b) 4) = 6, yx) = xe, CO) mG) 1, gx) — x, s(x) @ w=, mO)= e, w(x) © w@= 22, wQX)= 6, — w() = xe. () 4G) = e* cos Ix, myx) =e sin Ae, w(x) = em, a(x) = xe (2) w(x) = cosh x, ug(x) = sinh x, u(x) =x cosh, ug(x) = x sinh x (h) u(x) = cosh x sin x, g(x) = sinh x cos x, ug(X) = x. 16, Let ry, ++ fn be m distinct real numbers, and let Qyy.. . » Q, be 1 polynor is the zero polynomial. Prove that the 1 functions ils, none of which m= ie, .... uO = QnCde™* are independent. Outline of proof. Use induction on n, For = 1 and n = 2 the result is easily verified. Assume the statement is true for m =p and let ¢,.. . .¢p41 be p + 1 real scalars such that > Oxx)e™ = a Multiply both sides by e~'»+x* and differentiate the resulting equation. Then use the induction hypothesis to show that all the scalars ¢, are 0. An alternate proof can be given based on order of magnitude as x ~+ + cc, as was done in Example 7 of Section 1.7 (p. 10). 17, Let my, mig, «+ My be & positive integers, let ry, r2,--- fy be & distinet real numbers, and letn =m, +--+ + mg. For each pair of integersp, q satisfying 1 < pSk,1 1. Prove that each of the derivatives M’, M" + M1) also annihilates u. (©) Use part (b) and Exercise 19 to prove that M annihilates each of the functions u, xu... xy, (@) Use part (©) to show that the operator (D? — 24D + a* + 9%)" annihilates cach of the functions xte% sin fx and xtet* cos Bx for q = 1,2,...,m—1. 21, Let L be a constant-coeflicient operator of order n with characteristic polynomial p(r). If a is Gumiani aud if y has q Gciivaiives, prove thai Ecertuta)) = 6.10 The relation between the homogeneous and nonhomogeneous equations We return now to the general linear differential equation of order n with coefficients that are not necessarily constant. The next theorem describes the relation between solutions of a homogeneous equation L(y) = 0 and those of a nonhomogeneous equation Ly) = Rix). maton 6.10. Let L:€ (J) + 6(J) be a linear differential operator of order n. Let uy... u, be n independent solutions of the homogeneous equation L{y) = 0, and let yy be a particular solution of the nonhomogeneous equation L(y) = R, where R € Vid). Then every solution (x) of the nonhomogeneous equation has the form 6.19) $0) = 40) + Demo, 2% where C1y.. +, C, are constants. Proof. By linearity we have L(f — y,) =L(f) — LQ,) =R=—R Therefore Ff = xis in the solution space of the homogeneous equation L(y) = 0,80 f— y, is a linear couiination OF May... Bey say ya Guy TT Ggily «This proves (0.19). Since all solutions of L(y) = R are found in (6.19), the sum on the right of (6.19) (with arbitrary constants ¢,, ¢,,.-.+¢) is called the general solution of the nonhomogeneous Determination of a particular solution of the nonhomogeneous equation 187 equation, Theorem 6.10 states that the general solution of the nonhomogeneous equation is obtained hy adding toy, the general solution of the homogeneous equation, Note: Theorem 6.10 has a simple geometric analogy which helps give an insight into its meaning. 
To determine all points on a plane we find a particular point on the plane and add to it all points on the parallel plane through the origin, To find all solutions of Liy) = R we find a particular solution and add to it all solutions of the homogeneous eqmation Ty) = 0. The set of snintions af the nanhomogenenns erpatinn is analogens to a plane through a particular point, The solution space of the homogeneous equation is analogous to a parallel plane through the origin. To use Theorem 6.10 in practice we must solve two problems: (1) Find the general solution of the homogeneous equation L(y) = 0, and (2) find a particular solution of the nonhomogeneous equation L(y) = R, In the next section we show that we can always solve problem (2) if we can solve problem (1), 6.11 Determination of a particular solution of the nonhomogeneous equation. The method of variation of parameters We tum now to the problem of determining one particular solution y, of the nonhomo- geneous equation L(y) = R . We shall describe a method known as variation ofparameters which tells us how to determine y, if we know # independent solutions 41, .. . , uq of the homogeneous equation L(y) = 0. The method provides a particular solution of the form (6.20) Dr = Oy +--+ dyys wherey,, .. . , 0, are funetions that can be calculated in terms of uy... . 5 ty and the right hand member R. The method leads to a system of m linear algebraic equations satisfied by the derivatives v4, » vi,. This system can always be solved because it has a nonsingular coefficient matrix. Integration of the derivatives then gives the required functions v,, v,. The method was first used by Johann Bemoulli to solve linear equations and then by Lagrange in 1774 to solve linear equations of second order. For the mth order case the details can be simplified by using vector and matrix notation. The right-hand member of (6.20) can be written as an inner product, of first order, (6.21) Ja = (4), where v and w are n-dimensional vector functions given by SH (rere te Wa (Myers Me We ty to choose v so that the inner product defining y, will satisfy the nonhomogeneous equation L(y) = R, given that L(u) = 0, where Lfw) = (L(y)... +» L(u,)). We begin by calculating the first derivative of y, . We find (6.22) Y= u')+ (yu). 158 Linear differential equations We have 1 functions &,. . . ,2,, 10 determine, so we should be able to put 1 conditions on them, If we impose the condition that the second (erm on the right of 622) should vanish, the formula for y{ simplifies to Yi=(v, w’), provided that (v’, u) = 0. for yi, we We w+ yu). Tf we can choose v so that (v’, uw’) = 0 then the formula for y7 also simplifies and we get yL= (vw) provided that also (v’, uw’) = 0. If we continue in this manner for the first 1 — 1 derivatives of y, we find yo (v, a"), provided that also (Vv, uw!" 0. So far we have par m= 1 conditions on v Differentiating one mom we get Y= wl) +07, ul), This time we impose the condition (vu?) = RG), and the last equation becomes yi? =(v, uw!) + R(x), provided that also (v’, w™-Y) = RO). Suppose, for the moment, that we can satisfy the m conditions imposed on v, Let L = D? + P,(x)D" ++. . + P(x). When we apply L to y, we find LO) = iP + Poy? + + PAlx)y = (ul) + ROO} + PO, Ww) +. + PAO, w) = (, Lu) + R(x) = (% 0) + Rix) = Ria. Thus L(y) = R(x), soy; is a solution of the nonhomogeous equation, ‘The method will succeed if we can satisfy the conditions we have imposed on v, These conditions state that (v’, wu’) = 0 for k = 0, 1, on = 2, and that (v’, wl") = R(x). 
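Before these n conditions are assembled into a single matrix equation in the next paragraph, it may help to see them in the simplest case n = 2. The sketch below is my own illustration, not an example from the text: it uses the equation y'' + y = sec x with homogeneous solutions u_1 = cos x and u_2 = sin x (choices assumed only for the demonstration), solves the two conditions for v_1' and v_2', integrates, and checks that y_1 = v_1 u_1 + v_2 u_2 satisfies the nonhomogeneous equation.

```python
import sympy as sp

x = sp.symbols('x')

# Illustrative data for n = 2 (my own choices, not an example from the text):
# y'' + y = sec x, with homogeneous solutions u1 = cos x, u2 = sin x.
u1, u2 = sp.cos(x), sp.sin(x)
R = sp.sec(x)

v1p, v2p = sp.symbols('v1p v2p')   # the unknown derivatives v1', v2'

# The two conditions derived above:  v1' u1 + v2' u2 = 0,
#                                    v1' u1' + v2' u2' = R(x).
sol = sp.solve([sp.Eq(v1p*u1 + v2p*u2, 0),
                sp.Eq(v1p*sp.diff(u1, x) + v2p*sp.diff(u2, x), R)],
               (v1p, v2p))

v1 = sp.integrate(sol[v1p], x)          # v1(x) = log(cos x)
v2 = sp.integrate(sol[v2p], x)          # v2(x) = x
y1 = v1*u1 + v2*u2                      # particular solution y1 = v1 u1 + v2 u2

# y1 should satisfy the nonhomogeneous equation y'' + y = sec x.
print(sp.simplify(sp.diff(y1, x, 2) + y1 - R))   # expected: 0
```

The 2 x 2 system solved here is exactly the one that the matrix equation of the next paragraph expresses for general n.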
We can write these m equations as a single matrix equation, (6.23) Wx)o'(x) = RO] |, Ld Determination of a particular solution of the nonhomogeneous equation 159 where v'(x) is regarded as anf x 1 column matrix, and where W is the m xm matrix function whose rows consist of the components of # and its successive derivatives: My yy uy Me es wy w jue yg we The matrix W is called the Wronskian matrix of yy... , tas after J. M. H. Wronski (1778-1853). In the next section we shall prove that the Wronskian matrix is nonsingular, Therefore we can multiply both sides of ay W(x)? to obtain = 0 po : ROW - 1 0 1 Choose two points ¢ and x in the interval J under consideration and integrate this vector equation over the interval fom ¢ to x to obtain [°] u(x) = 06) + [Rowe i] a= no) +209, 1 where 0 263) = [rower al ae 1 The formula y: = (u, v) for the particular solution now becomes di &, v) ~ (He). 2) - (0) » (u, The first term (u, v{c)) satisfies the homogeneous equation since it is a linear combination of My, ee. sq» Therefore we can omit this term and use the second term (u, z) as a particular solution of the nonhomogeneous equation. In other words, a particular solution of 160 Linear differential equations Ly) = Ris given by the inner proxiuct Note that it is not necessary that the function FR be continuous on the interval 0, given that the homogeneous equation has a solution of the form y= e"™ . 6. Obtain one nonzero solution by inspection and then find the general solution of the differential equation Q" = 4y") + #0" - 4y) 7. Find the general solution of dhe differential equation Axty” + 4xy" — 0, given that there is a particular solution of the form y = x for x > 0. 8, Find a solution of the homogeneous equation by tril, and then find the general solution of the equation x(1 = x)y” = (1 = 2p’ + G2 = 3x + Dy = (1 = 8. 9, Find the general solution of the equation Ox = 3x5)" + 4) + Oxy = given that it has a solution that is a polynomial in x 10, Find the general solution of the equation HCL = Oy" + 2x(2 = xy’ + 2+ yet, given that the homogeneous equation has @ solution of the form y = 11, Let g(x) = fg e'/e dt if x > 0. (Do not attempt to evaluate this integral) Find all values of the constant a such that the function £ defined by poyatens Linear equations of second order with analytic coefficients 169 satisfies the linear differential equation xy" + Ox = yy FU aK Use this information to determine the general solution of the equation on the interval (0, + 0), 617 Lin A. funetionfis said to be analytic on an interval (xp — 7. Xo + 1) ifthas a power-series expansion in this interval, ear equations of second order with analytic coefficients S00) = Sax — x0)", convergent for [4 = Ag] 0, Mz > 0. Let A4 be the larger of My and 1M, . Then we have M. M bel S and leal < The recursion formula implies the inequality (+ 2H $Y Meal SD[K. loses ladzei} LX Dil A < S04 2) layyal 7 i = aD 1 (k +1) layl - ' + Dlenale, lalla t Those readers not familiar with multiplication of power series may consult Exercise 7 of Section 6.21 The Legendre equation im Now let A, = [a], A, = |@|, and define A, A, . formula vely by the recursion we (634) (n+ n+ DAgs = x Dik+ DA for n > 0. Then |a,| < A, for all 2 > 0, so the series ¥ a,(x — xq)" is dominated by the series Y A, [x = x9|". Now we use the ratio test to show that YA, |x = ql" converges if Ix = Xl < F. Replacing n by nm — 1 in (6.34) and subtracting ¢-* times the resulting equation fom (6.34) we find that (n + 2)(m + NAgio = n+ DnAws = M(n + 2)Agir . 
Therefore (n+ Ins (n+ DME A = : me Ae Mn + DN and we find Ang be sol (ns Dns (re DME fel Anya |X = xol"*? (n + 2Xn 4 Dt ° 1 as nN —> 0, This limit is less than 1 if |x — x9] < ¢. Hence ¥ a,(x — xo)" converges if |x = xql <¢. But since = |x; — Xol and since x, was an arbitrary point in the interval Oo — 1, Yo + 7), the series F a,(x — x0)" converges for all x in (xy rn, Xo + 7) The first two coefficients a, and a, represent the initial values ofy and its derivative at the point x9. If we let wy be the power-series solution with a, = 1 and a, = 0, so that (%) =1 and —wy(%9) = 0, and let) be the solution with @, = 0 and a, = 1, so that Tan ae then the solutions 1 and up will be independent. This completes the proof. 6.18 The Legendre equation In this section we find power-series solutions for the Legendre equation, (6.35) (1 — x*)p" = 2xy’ + a(n + Dy where a is any real constant, This equation oecurs in problems of traction and in heat- iow prubicns with spircival syuunciiy. equation has polynomial solutions called Legendre polynomials. These are the same polynomials we encountered earlier in connection with the Gram-Schmidt process (Chapter 1, page 26) The Legendre equation can be written as ua iy a janitive integer we sbait fad thai te Io" = DyT = aa + Dy, which has the form T(y) = dy, 172 Linear differential equations where Tis a Sturm-Liouville operator, 11) = (pf')', with p(x) = x* — 1 and A= a(2+ 1). Therefore the nonzero solutions of the Legendre equation are eigenfunctions of T belonging to the eigenvalue a(x + 1). Since p(x) satisfies the boundary conditions Puy = pt“) = 0, the operator Tis symmetric with respect to the inner product (ha) = [Foe a. ‘The general theory of symmetric operators tells us that eigenfunctions belonging to distinct eigenvalues are orthogonal (Theorem $3). In the differential equation treated in Theorem 6.13 the coefficient of y" is 1. The Legendre equation can be put in this form if we divide through by 1 — x®. From (6.35) we obtain y+ Pip’ + Paco - 0, where 5 ad PY teed PA) = = 5 = yt if x £1. Since 1/( = x4) = Ye, x for |x| < 1, both P, and P; have power-series expansions in the open interval (— [, 1) so Theorem 6.13 is applicable. ‘To find the recursion formula for the coefficients it is simpler 10 leave the equation in the form (6.35) and uy to find a power-series solution of the form y= San" =o valid in the open imerval (— 1, 1). Differentiating this series term by term we obmin =Sna,x and y"= F n(n = Ya, a ise Therefore we have Day’ = F 2na,x"= ¥ 2na,x", ay 0 and (1 = aby" = S nln = Yaya"? — Env = a,x” 2 Pa =S(n + (n+ Dayar” — Emm — 1a,x" ele + 20+ Dania = a(n Da, )x". If we substitute these series in the differential equation (6.35), we see that the equation will be satisfied if, and only if, the coefficients satisly the relation (n+ 2A(n+ Vag — n(n — La, = 2nd, + a(a + 1a, = 0 The Legendre equation 173 for all n > 0. This equation is the same as (n + n+ Daas (n= al(n+ 1+ aa, = or 635) dys -~ So Motos, (n + In + 2) This relation enables us to determine ap, d,, a,, . .. , successively in terms of a, Similarly, we can compute dg, 45d, , in terms of a,. For the coefficients with even subscripts we have pe Gee 2 ya -2 = 2 1 = a- 2a +9, 2 (yt Ma Mat 3) and, in general, ean Me = 2)+.. (@— In + 2) (a+ Wa+3)-.' +=) | Slag — (1 te (Qn)! This can be proved by induction, For the cveflicients with odd subscripss we find (a = 1a = 3). 1 @ = Int VD (a+ Wats) ss. (at 2n) Bane ee OeeeEeEeeuereeey (n + D! 
‘Therefore dhe series for y can be writen us: (6.37) y = oth (x) + ath(x), where (6.38) ug) = +>" (a — In 42) (+ Ilr + 3) (1+ 2n—1) nel Qin! and (639) : yp Ge Dea 30 @ =n $14 os G+ 2m) ons wie) = D(- SSRD ae a, The ratio test shows that each of these series converges for |x| < 1. Also, since the relation (6.36) is satisfied separately by the even and add coefficients, cach of u, and ty is a solution of the differential equation (6.35). These solutions satisiy the initial conditions uO) = 1, uit0) ud) - 0, uj(0) I74 Linear differential equations Since uy and u, are independent, the general solution of the Legendre equation (6.35) over the open interval (~ 1, 1) is given by the linear combination (6.37) with arbitrary constants a, and a, When a is 0 or a positive even integer, say % = 2m, the series for w(x) becomes a polynomial of degree 2m containing only even powers of x, Since we have la. (a — In 42) 2mm — 2)+++ Om — In 42) = Pw (m—n)! and (at 1(a+ 3). (a+ 2n m1) = (2m 4 12m + 3). ++ (2m + 2n — 1) (2m + 2n)!m! 2"(2m)! (m + n)! the formula for 14(x) in this case becomes (2m + 2k)! -1y (m = kil (m + KOK (6.40) For example, when a = 0, 2, 4, 6 (m =0, 1,2, 3) the corresponding polynomials are u(x) = 1 3x3, 1 10x + Ext, | — dx? + 63x4 — 284ye, ‘The series for u(x) is not a polynomial when a is even because the coefficient of x24 is never zero. When is an odd positive integer, the roles of 4 and tm are reversed; the series for u,(x) becomes a polynomial and the series for w(x) is not a polynomial. Specifically if, a= 2m + Lwe have yt Seo m+ 2k+ Yaa, 6.41) x)= oo oe array (m— Km + WQk + D! For example, when « = 1, 3, 5 (m = 0, 1,2), the corresponding polynomials are 14) kek, rout pay 6.19 The Legendre polynomials Some of the properties of the polynomial solutions of the Tegendre equation can he deduced directly from the differential equation or from the formulas in (640) and (641). Others are more easily deduced from an altemative formula for these polynomials which we shall now derive. First we shall obtain a single formula which contains (aside from constant factors) both the polynomials in (6.40) and (641), Let US (yn = 20! ee 2) - (642) PQ) = 5 Agrln ryt (n= 2)! The Legendre polynomials 175 where [nf] denotes the greatest integer 0 we have P,(1) = Exercises 177 Moreover, P(x) is the only polynomial which satisfies the Legendre equation (1 = x8)y" = 2xy’ + n(n + Dy = 0 and has the value 1 when x = 1 For each m > 0 we have Py(—x) = (—1)"P a(x). This shows that P, is an even function when 7 is even, and an odd funetion when nis odd. We have already mentioned the orhogonslity relation, PP Pa(o)P q(x) ae 0 if m#n. When m = n we have the norm relation [soar a= 25 P, : IP, an+1 Every polynomial of degree m can be expressed as a linear combination of the Legendre polynontials Py, Py, ...,P,. In fact, iff is a polynomial of degre n we have where From the orthogonality relation it follows. that YP) de = for every polynomial g of degree less than n, This property can be used to prove that the Legendre polynomial Phas m distinct red zeros and dat dey all Tie in the open interval C1). 6.21 Exercises 1. The Legendre equation (6.35) with = 0 has the polynomial solution (x) = 1 and a solution 4, not a polynomial, given by the series in Equation (6.41). (@) Show that the sum of the series for uy is given by 1 Le ug(x)= 5 log p— for a < 1 (b) Verify directly that the function uy in part (a) is a solution of the Legendre equation when 178 Linear differential equations 2. 
Show that the function f defined by the equation x ite 08) = 1 = Flog pS for |x| < | satisfies the Legendre equation (6.35) with «= 1 . Express this function as a linear combination of the solutions 1 and uz given in Equations (6.38) and (6.39). 3. ‘The Legendre equation (6.35) can be wnitten in the form [G2 - Dy'T - aa + Dy = 0. (@) Ifa, b, ¢ are constants with a > 6 and 4e + 1 > 0, show that a differential equation of the type {= a = Dy’ = cy = 0 can be transformed to a Legendre equation by a change of variable of the form x = At + B, with A > 0. Determine A and B in terms of a and 5. (b) Use the method suggested in part (a) to transform the equation Q# = xy" + Ox = Dy’ = 2y= 0 to a Legendre equation. 4, Find two independent power-series solutions of the Hermite equation 2xy" + ay = 0 ‘on an interval of the form (-r ,r). Show that one of these solutions is a polynomial when & is a nonnegative integer. 5. Find a power-series solution of the differential equation ay’ +3 + Ay! + 3x4 valid for all x. Find a second solution of the form y = x-® 6. Find a power-series solution of the differential equation a,x" valid for all x ¥ 0. ay! + aty! = (ax + Dy = 0 valid on an interval of the form (-7 , 7) . 7. Given two functions A and B analytic on an interval (%~ r, 9 +1), say AG) = $ aye — 9", Bla) = 3 ux - x)". It can be shown that the product C(x) = A()B(x) is also analytic on (x — 7,» + r). This exercise shows that C has the power-series expansion cl) Sae- xy") where cp -5 ab. Exercises 179 (@) Use Leihniz’s mule for the nth derivative of a prodnet to show that the nth derivative of C is given by cma =F ("arena"). (b) Now use the fact that A(x) =k! gyand BO*(x) = (1K)! bg to obtain CM) 0 1S alae Since C(x) =n! ¢y , this proves the required formula for ¢y . In Exercises 8 through 14, P,(x) denotes the Legendre polynomial of degree #, These exercises outline proofs of the properties of the Legendre polynomials described in Section 6.20, 8. (a) Use Rodrigues’ formula to show that aoe I" + ~ 10,0), where Q,,(x) is a polynomial. () Prove that P,( 1) = 1 and that Pa(— 1) =( = 1)". (©) Prove that P,(x) is the only polynomial solution of Legendre’s equation (with « = 1) having the value I when x = 1, 9, (a) Use the differential equations satisfied by P, and Py, to show that LU X9(P Pig = Pll = Ime + = mim + FP (b) If m_# 2, integrate the equation in (a) fom -1 to 1 to give an alternate proof of the orthogonality relation [i PaGdPn(2) de = 0, 10, (a) Let f(x) = (x* ~ 1)". Use integration by parts to show that Preyer ae = - [1 pomen-re) ar, Apply this formula repeatedly to deduce that the integral on the left is equal to 20m)! [d= x" de. Cn)! f= xi" de (b) Ihe substation x = cos F transforms the integral f# (1 the relation ay" ax wo f5/? sine™rty dt. Use 2nQn = 2)- +2 Qn FDQn= 1) and Rodrigues’ formula to obtain 2 n+l? [i e.cot a = 180 Linear differential equations 11. (@) Show that Qn)! PA) = arane x" + On(), where Q(x) is a polynomial of degree less than 7, (b) Express the polynomial f(x) = x4 as a linear combination of Pp, Py , Pp » Py, and Py. (©) Show that every polynomialf of degree n can be expressed as a linear combination of the Legendre polynomials Py, P,,... 5 Pas 12, (@) Iffis a polynomial of degree 1, write £0) = Serie. é [This is possible because of Exercise 11 (¢).] For a fixed m, 0S m 0, the constant C may or may not be zero. As in Case 1, both series converge in the interval |x = Xq| 0), so that |x|'= 2. 
Differentiation of (6.50) gives us Ysa ty ax" + x na io n=0 ‘The Bessel equation 183 Similarly, we obtain 3 (a +n +t= a,x". If L(y) = xy" + ay! + (8 = ay, we find Lo= Fins oinst — Nae" + #3 oo + a,x” = P +6 Z ax 23 wae = Zine of — allagr + 3 z a # Now we put L(j) = 0, cancel xf, and try to determine the a, so that the coefficient of cach power of x will vanish, For the constant term we need (t= a2)ay = 0. Since we seek a solution with a, % 0, this requires that 651) PF =0. This is the indicial equation. Its roots « and —« are the only possible values of f that can give us a solution of the desired ype. Consider first the choice =a, For this r the remaining equations for determining the coefficients become 652) [1 Fa ata = 0 and (nt a) wags dea 0 for n> 2. Since ¢ > 0, the first of these implies that a, = 0. The second formula can be written as 653) a, 2 — #2 (n+ a)? — n(n + 2a) 50 dy = a = a, = '*+=0, For the coefficients with even subscripts we have 7 =49 =a a 0. If x < 0 we can repeat the discussion with xt replaced by (~x)'. We again find that must satisfy the equation 1— a Taking 1 =a we then obtain the same solution, except that the outside factor x* is replaced by (—x)*. Therefore the function f, given by the equation . 2. 1" (654) Ful) = eb (: + adeae py Tees <5 er =a) is a solution of the Bessel equation valid for all real x 0, For those values of % for which JF,{0) and f;(V) exist the solution 1s also valid tor x = U. Now consider the root ¢ = —a of the indicial equation. We obtain, in place of (6.52), the equations [(—)'~aJa,=0 and [(n— a)? = otJa, + a,-2= 0, which become (1 = 20a, 0 and n(n 2a)a, + ays If 2a: is not an integer these equations give us a, = 0 and ans na — 22) a= for m2 2, Since iis tocursivn formuta is die samme as (6.53), wiilt a replaced by —a, we are led to the solution “(ana valid for all real x #0. The solution f_, was obtained under the hypothesis that 2x is not a positive integer. However, the scrics for f_. is meaningful cven if 2x is a positive integer, so long as « is not a positive integer. It can be verified that f_, satisfies the Bessel equation for such a. Therefore, for each «> 0 we have the series solution /,, given by Equation (6.54); and if @ is not a nonnegative integer we have found another solution /., given by Equation (6.55) The two solutions /, and f_, are independent, since one of them -+2 as x 0, and the other does not. Next we shall simplify the form of the solutions. To do this we need some properties of Euler’s gamma function, and we digress briefly to recall these properties. (655) fal) = ag [xl* (! _ For each real s > 0 we define I's) by the improper integral Toys [reat ‘The Bessel equation 185 This integral converges if s > 0 and diverges if s <0. Integration by parts leads to the functional equation (6.56) T(s + 1) = 5 Ts). ‘This implies that Te + 2) =6+DFG+N=6+ ps TO, Ts + 3) =(s + 2 (5 + 2) =(5 + 2) (s + Ds Ts), and, in general, (6.31) Ts + na (sen—l)-.. (s+ DsT(s) for every positive integer n, Since (1) = [2 et dé = 1, when we put s = 1 in (657) we find Tin +) = Anus, the gamma function 18 an extension of the tactonal tuncton from integers to positive real numbers. ‘The functional equation (6.56) can be used to extend the definition of 1%) to negative values of s that are not integers. We write (6.56) in the form (6.58) Ti) = Pot) 5 ‘The righthand member is meaningful ifs + 1 > 0 and 5 % 0. Therefore, we can use this equation to define Y(s) if — 1 < s < 0. 
The right-hand member of (6.58) is now meaning- fil if 6 +250, 6 4 = 1, 6 40, and we can use this equation to define 1968) for -2 < $< — 1. Continuing in this manner, we can extend the definition of 1%(s) by induction to every open interval of the form -n < § <-n +1, where n is a positive integer. The functional equation (6.56) and its extension in (6.57) are now valid for all real s for which both sides are meaningful. We return now to the discussion of the Bessel equation, The series for f, in Equation (6.54) contains the product (1 + a) (2 + a). .* (m+ a). We can express this product in terms of the gamma function by tking $= 1 + « in (657). This gives us (n+ 1+) Le aQea)err Fat ls « (Lt ae adore (ne a) Tata) Therefore, if we choose a, = 2-%{['(1 + a) in Equation (654) and denote the resulting function f(x) by J,(x) when x > 0, the solution for x > 0 can be written as ws 10 =) Sanaa 186 Linear differential equations The function J, defined by this equation for x > 0 and « > 0 is called the Bessel /unction Of the first kind of order «. When a is a nonnegative integer, say « = p , the Bessel function Jy is yiven by the power series Isle placer ce (p= 0, 1,2, Inis 1s also a solution of the Bessel equation for x <0, Extensive tables of Bessel functions have been constructed. The graphs of the two functions Jp and Jy are shown in Figure 6.2 Fiourn 6.2 Graphs of the Bessel functions Jy and Jy We can define a new function J_, by replacing @ by —a in Equation (6.59), if a is such that [(a + 1 = %) is meaningful; that is, if & is not a positive integer, Therefore, if x > 0 and a >0,a41,2,3,..., wedeline i al (@) 5 aterizal Taking 8 = 1 = @ in (6.57) we obtain T(n+ 1-X)=(-a)Q—9). (a — a) PC 1-4) and we see that the series for J_,(x) is the same as that for f_,(x) in Equation (6.55) with a, = XIT(L = a), x > 0. Therefore, if « is not a positive integer, J_, is a solution of the Bessel equation for x > 0. If & is not an integer, the two solutions J,(x) and J_,(x) are linearly independent on the positive real axis (since their ratio is not constant) and the general solution of the Bessel equation for x > 0 is eyS(*) + CyJ_q(X). If a is a nonnegative integer, say 2 = p , we have found only the solution J and its con stant multiples valid for x > 0, Another solution, independent of this one, can be found The Bessel equation 187 by the method described in Exercise 4 of Section 6.16. This states that if a is a solution of y’ + Pyy’ + Pay = 0 that never vanishes on an interval J, a second solution u, independent ‘of u, is given by the integral a(x) = u(x) [* 2 a, Je (uit) where Q(x) = e~/Fs'#lé2 For the Bessel equation we have P,(x) = x, so Q(x) = Ux and a second solution u, is given by the formula (6.60) u(x) 4,09) f Tr om t if © and x lie in an interval 7 in which J, does not vanish This second solution can be put in other forms, For example, from Equation (659) we may write where g(O) # 0. In the interval Z the function g, has a power-series expansion a= 54,0" 2 which could be determined by equating coefficients in the identity g(t) [2 = er IF we assume the existence of such an expansion, the integrand in (6.60) takes the form i (OF 1 sh Saye. Integrating this formula term by term from ¢ tw x we obuin a logarithmic term A, log x (from the power 1!) plus a series of the form x? Y B,x", Therefore Equation (6.60) takes the form u(x) = AayJ (3) log x + Joye? S Bas It can be shown that the coefficient A,, 4 0. 
If we multiply u(x) by 1/Ay, the resulting solution is denoted hy K(x) and has the form This is the form of the solution promised hy the second case of Frobenius’ theorem. Having arrived at this formula, we can verify that a solution of this form acwally exists by substituting the right-hand member in the Bessel equation and determining the co- efficients C,, so as 10 satisfy the equation. The details of this calculation are lengthy and 188 Linear differential equations will be omitted. The final result can be expressed as Ko) = spoons 3G) SPMD SEF Sion easy where hy =0 and hy = 1+ 4-4++**+ lin for n > 1. The series on the right converges for all real x. The function K, defined for x > 0 by this formnla is called the Resselfunction Of the second kind of orderp. Since K, is not a constant multiple of J, , the general solution of the Bessel equation in this case for x > 0 is y= Cy,(x) + 2K, (x) Further properties of the Bessel functions are discussed in the next set of exercises, 6.24 Exercises 1. (@) Let £ be any solution of the Bessel equation of order « and let g(x) = x'4f (x) for x > 0. Show that g satisfies the differential equation (b) When 422 = 1 the differential equation in (a) becomes y” + y = is y = A cos x + B sin x. Use this information and the equation? I'(d) = VF to show that, fuin 26, (2fton (©) Deduce the formulas in part (b) direetly from the series for Jy4(x) and s(x), Use the series representation for Bessel functions to show that 26 se = (2) sinx and Jy(x) = R d (@) | OG) = 0), S (eats) = - haul) 3 1,08) = 2I,(x) and Gx) = x-°J,(2) for x > 0. Note that each positive zew of J, is a zero of F, and is also a zero of G, . Use Rolle’s theorem and Exercise 2 10 prove that the posi- tive zeros of J, and Jq.s interlace. That is, there is a zero of J, between each pair of positive e105 Of Jzs1y and a 22¢0 OF Jyq1 between each pair of positive zeros of J,. (See Figure 6.2.) The change of variable = ives us r@= fe et dt {ew du = Va (See Exercise 16 of Section 11.28 for a proof that 2 {3° e-u* du =v.) Exercises 189 10. ML (a) From the relations in Exercise 2 deduce the recurrence relations 2A). KO) tat) md 21,6) = KO) - In. (b) Use the relations in part (a) to deduce the formulas Jor) + Juss = 02) td Sua) = Jala) « 2,00). . Use Exercise 1 (b) and a suitable recurrence formula to show that sg ie Jygo) = (3) (= ~ cos x). Find a similar formula for J_s,(x). Note: Jx) is an elementary function for every « which is half an odd integer. Prove that 1 5 5 gy VIO) + HE a wat 3) — —— ae) and E (abe) « 0%) = Bal). (a) Use the identities in Exercise 6 to show that Fi) + 2 3 0- 1 ond Sen + Wa@Wna@)= dx (b) From part (a), deduce that [Jo(x)| < 1 and |J,G0| < 4v2 for n= 1,2, 3, and all x20. . Let ga(x) = x4f,(ax*) for x > 0, where a and 6 are nonzero constants. Show that g, satisfies the “differential equation aty” + (abt + | — ofb%y = 0 if, and only if, f, is a solution of the Bessel equation of order a. Use Exercise 8 {0 express the general solution of each of the following differential equations in terms of Bessel functions for x > 0. (a) y" + xy =O (o)y” + xy =0. Generalize Exercise 8 when f, and g, are related hy the equation g(x) = x*f,(ax) for x > 0. Then find the general solution of each of the following equations in terms of Bessel functions forx >0. @) wy" +64 y © xy" + 6" + aty = 0. (b) xy" + 6y’ + xy = 0. (d) Ay" — xy + (K+ Dy = 0. A Bessel function identity exists of the form Ie) — Sg) = ae), where @ and ¢ are constants. Determine a and c. 190 Linear differential equations 12. 
Find a power series solution of the differential equation xy” + y’ + y = 0 convergent for --to 0 it can be expressed in terms of a Bessel function 13. Consider a linear second-order differential equation of the form AG" + xPOIy' + QQrdy 0, where A(x), P(x), and Q(x) have power series expansions, AW) = Fart, Pix) Spec. ow) with a, # 0, each convergent in an open interval (-7, 7). If the differential equation has a series solution of the farm valid for 0 < x 0. Determine these solutions 16, ‘The nonlinear differential equation y" + y + ay* = 0 is only nonzero constant. Assume there is a solution which can be expressed as a power series in a nonlinear if « is a small of the form DY up(x)a" — (valid in some interval 0 < a <1) mo and that this solution satisfies the initial conditions y = 1 and y’ = 0 when x = 0. To con form with these initial conditions, we ty 10 choose the coefficients y,(x)so that ug(0) = 1 u,(0) = 0 and u,(0)= w,(0) = 0 for > 1. Substiwwe this series in the differential equation, equate suitable powers of @ and thereby determine w(x) and u(x). 1 SYSTEMS OF DIFFERENTIAL EQUATIONS 7. Introduction Although the study of differential equations began in the 17h century, it was not until the 19th century that mathematicians realized that relatively few differential equations could be solved by elementary means. The work of Cauchy, Liouville, and others showed the importance of establishing general theorems to guarantee the existence of solutions to certain specific classes of differential equations. Chapter 6 illustrated the use of an existence. uniqueness theorem in the study of linear differential equations, This chapter is concerned with a proof of this theorem and related topics. Existence theory for differential equations of higher order can be reduced to the first order case by the introduction of systems of equations, For example, the second-order equation aay yt my 1 can be transformed to a system of two first-order equations by introducing two unknown funetions y; andy, , where Then we have Gy We cannot solve the equations separately by the methods of Chapter 6 because each of them involves two unknown functions. In this chapter we consider systems consisting of m linear differential equations of first ons Jay Vi = Puy + ProlDys ++. + PinlDyn + 10) (7.3) Yn = Paalt ae Pron #8 + Pann + Gnd) IL 192 Systems of differential equations The functions pj, and g, which appear in (7.3) are considered as given functions defined on a given interval J, The functions y,, ... , Jy are unknown functions to be determtined. Systems of this type are called first-order linear systems, In general, each equation inthe system involves more than one unknown function so the equations cannot be solved separately, ‘A linear differential equation of order m can always be transformed to a linear system, Suppose the given nth order equation is (7.4) VO + ayO + reed any = Rit), where the coefficients a, are given functions, To transform this to a system we write yy = y and introduce a new unknown function for each of the successive derivatives of y. That is, we put W=V, Yes Vy Ye= Ys. Y= Yay and rewrite (7.4) as the system dis Ye Yes Va (15) Yn-1= Jn Vn = Ant = Aya — 1 HY y + RD). The discussion of systems may be simplified considerably by the use of vector and matrix notation, Consider the general system (7.3) and introduce vector-valued functions Y = Ots++. Inds Q = Gis+..04n)s and a matrix-valued function P = [pj], defined by the equations YO = OOO), GO =O... 1900), POC) = [Ps] for each + in J. 
We regard the vectors as n x 1 column matrices and write the system (73) in the simpler form (16) P(t) Y + Q(t). For example, in system (7.2) we have P), vo= 11, om BL Dal | Calculus of matrix: functions 193 In system (7.5) we have - oo 0 0 0 y 0 0 \ 0 ° ys Y= > P= Q(t) = o 0 0 \ ° - RW), An initial-value problem for system (7.6) is to find a vector-valued function Y which satisfies (7.6) and which also satisfies an initial condition of the form Y(a) = B, where aeJand B= (b;,... , 6) is a given n-dimensional vector. Tn the case nm I (the scalar case) we know from Theorem 6.1 that, continuous on J, all solutions of (7.6) are given by the explicit formula if P and Q are an YE) = ea) + et | tg dy, where A(x) = fz P() dt , and a is any point in J. We will show that this formula can be suitably generalized for systems, that is, when P(t) is an m x m matrix function and Q(t) ig an nedimensional vector function, To do this we must assign a meaning to integrals of matrices and to exponentials of matrices, Therefore, we digress briefly to discuss the calculus of matrix functions, 72 Calculus of matrix functions The generalization of the concepts of integral and derivative for matrix functions is straightforward. If P(t) = [p,(f)], we define the integral (2 P(t) de by the equation [Pro a= [f 0) ai] That is, the integral of matrix P(t) is the matrix obtained by integrating each entry of P(1), rming of course, that ench entry is integmble on fa, hJ. The render can verify that the linearity properly for integrals generalizes to matrix functions, Continuity and differentiability of matrix functions are also defined in terms of the cauies, We say dnt a matrix function # = {py} is conimuous a é if each enuy py is continuous at t. The derivative P’ is defined by differentiating cach entry, P(t) = [rif], whenever all derivatives pi,(t) exist. Ik is easy to verify the basic differentiation rules for sums and products, For example, if P and Q are differentiable matrix functions, we have (P+ Oy'= P+ Q 194 Systems of differential equations if P and Q are of the same size, and we also have (PQ) = PQ’ + PQ if the product PQ is defined. The chain rule also holds. That is, if F(t) = Plg(t)] , where P is a differentiable matrix function and g is a differentiable scalar function, then F’(t) g'(P'[g(D]. The zero-derivative theorem, and the first and second fundamental theorems of calculus are also valid for matrix functions, Proofs of these properties are requested in the next set of exercises. The definition of the exponential of a matrix is not so simple and requires further preparation, This is discussed in the next section, 713 Infinite series of matrices. Norms of matrices Let A = [aj] be ann x n matrix of real or complex entries. We wish to define the exponential e in such a way that it possesses some of the fundamental properties of the ordinary real or complex-valued exponential Tn particnlar, we shall require the law of exponents in the form (78) ef4e'4 = e'4 for all real sand t, and the relation 19) I, where 0 and Z are the n x n zero and identity matrices, respectively. It might seem natural to define e4 to be the matrix [e%], However, this is unacceptable since it satisfies neither of properties (7.8) or (7.9). 
Instead, we shall define $e^A$ by means of a power series expansion,

$$e^A = \sum_{k=0}^{\infty}\frac{A^k}{k!}.$$

We know that this formula holds if $A$ is a real or complex number, and we will prove that it implies properties (7.8) and (7.9) if $A$ is a matrix. Before we can do this we need to explain what is meant by a convergent series of matrices.

DEFINITION OF CONVERGENT SERIES OF MATRICES. Given an infinite sequence of $m \times n$ matrices $\{C_k\}$ whose entries are real or complex numbers, denote the $ij$-entry of $C_k$ by $c_{ij}^{(k)}$. If all $mn$ series

(7.10)  $\displaystyle\sum_{k=1}^{\infty} c_{ij}^{(k)}$  $(i = 1, \ldots, m;\ j = 1, \ldots, n)$

are convergent, then we say the series of matrices $\sum_{k=1}^{\infty} C_k$ is convergent, and its sum is defined to be the $m \times n$ matrix whose $ij$-entry is the series in (7.10).

A simple and useful test for convergence of a series of matrices can be given in terms of the norm of a matrix, a generalization of the absolute value of a number.

DEFINITION OF NORM OF A MATRIX. If $A = [a_{ij}]$ is an $m \times n$ matrix of real or complex entries, the norm of $A$, denoted by $\|A\|$, is defined to be the nonnegative number given by the formula

(7.11)  $\|A\| = \displaystyle\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|.$

In other words, the norm of $A$ is the sum of the absolute values of all its entries. There are other definitions of norms that are sometimes used, but we have chosen this one because of the ease with which we can prove the following properties.

THEOREM 7.1. FUNDAMENTAL PROPERTIES OF NORMS. For rectangular matrices $A$ and $B$, and all real or complex scalars $c$, we have

$$\|A + B\| \le \|A\| + \|B\|, \qquad \|AB\| \le \|A\|\,\|B\|, \qquad \|cA\| = |c|\,\|A\|.$$

Proof. We prove only the result for $\|AB\|$, assuming that $A$ is $m \times n$ and $B$ is $n \times p$. The proofs of the others are simpler and are left as exercises. Writing $A = [a_{ik}]$, $B = [b_{kj}]$, we have $AB = \bigl[\sum_{k=1}^{n} a_{ik}b_{kj}\bigr]$, so from (7.11) we obtain

$$\|AB\| = \sum_{i=1}^{m}\sum_{j=1}^{p}\Bigl|\sum_{k=1}^{n}a_{ik}b_{kj}\Bigr| \le \sum_{i=1}^{m}\sum_{j=1}^{p}\sum_{k=1}^{n}|a_{ik}|\,|b_{kj}| \le \Bigl(\sum_{i=1}^{m}\sum_{k=1}^{n}|a_{ik}|\Bigr)\Bigl(\sum_{k=1}^{n}\sum_{j=1}^{p}|b_{kj}|\Bigr) = \|A\|\,\|B\|.$$

Note that in the special case $B = A$ the inequality for $\|AB\|$ becomes $\|A^2\| \le \|A\|^2$. By induction we also have

$$\|A^k\| \le \|A\|^k \qquad\text{for } k = 1, 2, 3, \ldots.$$

These inequalities will be useful in the discussion of the exponential matrix.

The next theorem gives a useful sufficient condition for convergence of a series of matrices.

THEOREM 7.2. TEST FOR CONVERGENCE OF A MATRIX SERIES. If $\{C_k\}$ is a sequence of $m \times n$ matrices such that $\sum_{k=1}^{\infty}\|C_k\|$ converges, then the matrix series $\sum_{k=1}^{\infty} C_k$ also converges.

Proof. Let the $ij$-entry of $C_k$ be denoted by $c_{ij}^{(k)}$. Since $|c_{ij}^{(k)}| \le \|C_k\|$, each of the series in (7.10) converges absolutely, by comparison with $\sum\|C_k\|$.

Term-by-term differentiation of the series defining $E(t) = e^{tA}$ (Theorem 7.3) gives

$$E'(t) = \Bigl(\sum_{k=1}^{\infty}\frac{t^{k-1}A^{k-1}}{(k-1)!}\Bigr)A = E(t)A.$$

In the last equation we used the property $A^{k} = A^{k-1}A$. Since $A$ commutes with $A^{k-1}$ we could also have written $A^{k} = A\,A^{k-1}$ to obtain the relation $E'(t) = AE(t)$. This completes the proof.

Note: The foregoing proof also shows that $A$ commutes with $e^{tA}$.

7.7 Uniqueness theorem for the matrix differential equation $F'(t) = AF(t)$

In this section we prove a uniqueness theorem which characterizes all solutions of the matrix differential equation $F'(t) = AF(t)$. The proof makes use of the following theorem.

THEOREM 7.4. NONSINGULARITY OF $e^{tA}$. For any $n \times n$ matrix $A$ and any scalar $t$ we have

(7.14)  $e^{tA}e^{-tA} = I.$

Hence $e^{tA}$ is nonsingular, and its inverse is $e^{-tA}$.

Proof. Let $F$ be the matrix function defined for all real $t$ by the equation $F(t) = e^{tA}e^{-tA}$. We shall prove that $F(t)$ is the identity matrix $I$ by showing that the derivative $F'(t)$ is the zero matrix. Differentiating $F$ as a product, using the result of Theorem 7.3, we find

$$F'(t) = e^{tA}(e^{-tA})' + (e^{tA})'e^{-tA} = e^{tA}(-Ae^{-tA}) + Ae^{tA}e^{-tA} = -Ae^{tA}e^{-tA} + Ae^{tA}e^{-tA} = O,$$

since $A$ commutes with $e^{tA}$. Therefore, by the zero-derivative theorem, $F$ is a constant matrix.
But $F(0) = e^{O}e^{O} = I$, so $F(t) = I$ for all $t$. This proves (7.14).

THEOREM 7.5. UNIQUENESS THEOREM. Let $A$ and $B$ be given $n \times n$ constant matrices. Then the only $n \times n$ matrix function $F$ satisfying the initial-value problem

$$F'(t) = AF(t), \qquad F(0) = B,$$

for $-\infty < t < +\infty$ is

(7.15)  $F(t) = e^{tA}B.$

Proof. First we note that $e^{tA}B$ is a solution. Now let $F$ be any solution and consider the matrix function $G(t) = e^{-tA}F(t)$. Differentiating this product we obtain

$$G'(t) = e^{-tA}F'(t) - Ae^{-tA}F(t) = e^{-tA}AF(t) - e^{-tA}AF(t) = O.$$

Therefore $G(t)$ is a constant matrix, $G(t) = G(0) = F(0) = B$. In other words, $e^{-tA}F(t) = B$. Multiplying by $e^{tA}$ and using (7.14) we obtain (7.15).

7.8 The law of exponents for exponential matrices

The law of exponents $e^Ae^B = e^{A+B}$ is not always true for matrix exponentials. A counter example is given in Exercise 13 of Section 7.12. However, it is not difficult to prove that the formula is true for matrices $A$ and $B$ which commute.

THEOREM 7.6. Let $A$ and $B$ be two $n \times n$ matrices which commute, $AB = BA$. Then we have

(7.16)  $e^{A+B} = e^{A}e^{B}.$

Proof. From the equation $AB = BA$ we find that $A^2B = A(BA) = (AB)A = (BA)A = BA^2$, so $B$ commutes with $A^2$. By induction, $B$ commutes with every power of $A$. By writing $e^{tA}$ as a power series we find that $B$ also commutes with $e^{tA}$ for every real $t$. Now let $F$ be the matrix function defined by the equation

$$F(t) = e^{t(A+B)} - e^{tA}e^{tB}.$$

Differentiating $F(t)$ and using the fact that $B$ commutes with $e^{tA}$ we find

$$F'(t) = (A+B)e^{t(A+B)} - Ae^{tA}e^{tB} - e^{tA}Be^{tB} = (A+B)e^{t(A+B)} - (A+B)e^{tA}e^{tB} = (A+B)F(t).$$

By the uniqueness theorem we have $F(t) = e^{t(A+B)}F(0)$. But $F(0) = O$, so $F(t) = O$ for all $t$. Hence

$$e^{t(A+B)} = e^{tA}e^{tB}.$$

When $t = 1$ we obtain (7.16).

EXAMPLE. The matrices $sA$ and $tA$ commute for all scalars $s$ and $t$. Hence we have $e^{sA}e^{tA} = e^{(s+t)A}$.

7.9 Existence and uniqueness theorems for homogeneous linear systems with constant coefficients

The vector differential equation $Y'(t) = AY(t)$, where $A$ is an $n \times n$ constant matrix and $Y$ is an $n$-dimensional vector function (regarded as an $n \times 1$ column matrix), is called a homogeneous linear system with constant coefficients. We shall use the exponential matrix to give an explicit formula for the solution of such a system.

THEOREM 7.7. Let $A$ be a given $n \times n$ constant matrix and let $B$ be a given $n$-dimensional vector. Then the initial-value problem

(7.17)  $Y'(t) = AY(t), \qquad Y(0) = B,$

has a unique solution on the interval $-\infty < t < +\infty$. This solution is given by the formula

(7.18)  $Y(t) = e^{tA}B.$

More generally, the unique solution of the initial-value problem $Y'(t) = AY(t)$, $Y(a) = B$, is $Y(t) = e^{(t-a)A}B$.

Proof. Differentiation of (7.18) gives us $Y'(t) = Ae^{tA}B = AY(t)$. Since $Y(0) = B$, this is a solution of the initial-value problem (7.17). To prove that it is the only solution we argue as in the proof of Theorem 7.5. Let $Z(t)$ be another vector function satisfying $Z'(t) = AZ(t)$ with $Z(0) = B$, and let $G(t) = e^{-tA}Z(t)$. Then we easily verify that $G'(t) = O$, so $G(t) = G(0) = Z(0) = B$. In other words, $e^{-tA}Z(t) = B$, so $Z(t) = e^{tA}B = Y(t)$. The more general case with initial value $Y(a) = B$ is treated in exactly the same way.

7.10 The problem of calculating $e^{tA}$

Although Theorem 7.7 gives an explicit formula for the solution of a homogeneous system with constant coefficients, there still remains the problem of actually computing the exponential matrix $e^{tA}$. If we were to calculate $e^{tA}$ directly from the series definition we would have to compute all the powers $A^k$ for $k = 0, 1, 2, \ldots$, and then compute the sum of each series $\sum_{k=0}^{\infty} t^k c_{ij}^{(k)}/k!$, where $c_{ij}^{(k)}$ is the $ij$-entry of $A^k$.
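For a concrete sense of what the direct approach involves, here is a minimal Python sketch (our own illustration; the sample matrix, tolerance, and term limit are arbitrary). It accumulates partial sums of the series for $e^{tA}$, using the norm (7.11) of the most recent term as a stopping criterion.

```python
import numpy as np

def expm_series(A, t, tol=1e-15, max_terms=200):
    """Partial sums of e^{tA} = sum_k (tA)^k / k!, stopping when the norm (7.11)
    of the latest term -- the sum of the absolute values of its entries -- is tiny."""
    A = np.asarray(A, dtype=float)
    term = np.eye(A.shape[0])          # k = 0 term: the identity matrix
    total = term.copy()
    for k in range(1, max_terms):
        term = term @ (t * A) / k      # (tA)^k / k! built from the previous term
        total += term
        if np.abs(term).sum() < tol:   # matrix norm of equation (7.11)
            break
    return total

A = np.array([[5.0, 4.0],
              [1.0, 2.0]])             # a sample 2 x 2 matrix
print(expm_series(A, 0.1))
```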
In general this is a hopeless task unless $A$ is a matrix whose powers may be readily calculated. For example, if $A$ is a diagonal matrix, say $A = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$, then every power of $A$ is also a diagonal matrix; in fact, $A^k = \operatorname{diag}(\lambda_1^k, \ldots, \lambda_n^k)$. Therefore in this case $e^{tA}$ is a diagonal matrix given by

$$e^{tA} = \operatorname{diag}\Bigl(\sum_{k=0}^{\infty}\frac{t^k\lambda_1^k}{k!}, \ldots, \sum_{k=0}^{\infty}\frac{t^k\lambda_n^k}{k!}\Bigr) = \operatorname{diag}(e^{t\lambda_1}, \ldots, e^{t\lambda_n}).$$

Another easy case to handle is when $A$ is a matrix which can be diagonalized. For example, if there is a nonsingular matrix $C$ such that $C^{-1}AC$ is a diagonal matrix, say $C^{-1}AC = D$, then we have $A = CDC^{-1}$, from which we find

$$A^2 = (CDC^{-1})(CDC^{-1}) = CD^2C^{-1},$$

and, more generally, $A^k = CD^kC^{-1}$. Therefore in this case we have

$$e^{tA} = \sum_{k=0}^{\infty}\frac{t^kA^k}{k!} = \sum_{k=0}^{\infty}\frac{t^kCD^kC^{-1}}{k!} = C\Bigl(\sum_{k=0}^{\infty}\frac{t^kD^k}{k!}\Bigr)C^{-1} = Ce^{tD}C^{-1}.$$

Here the difficulty lies in determining $C$ and its inverse. Once these are known, $e^{tA}$ is easily calculated. Of course, not every matrix can be diagonalized, so the usefulness of the foregoing remarks is limited.

EXAMPLE 1. Calculate $e^{tA}$ for the $2 \times 2$ matrix $A = \begin{pmatrix} 5 & 4 \\ 1 & 2 \end{pmatrix}$.

Solution. This matrix has distinct eigenvalues $6$ and $1$, so there is a nonsingular matrix $C = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ such that $C^{-1}AC = D$, where $D = \operatorname{diag}(6, 1)$. To determine $C$ we can write $AC = CD$, or

$$\begin{pmatrix} 5 & 4 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 6 & 0 \\ 0 & 1 \end{pmatrix}.$$

Multiplying the matrices, we find that this equation is satisfied for any scalars $a$, $b$, $c$, $d$ with $a = 4c$, $b = -d$. Taking $c = d = 1$ we choose

$$C = \begin{pmatrix} 4 & -1 \\ 1 & 1 \end{pmatrix}, \qquad C^{-1} = \frac{1}{5}\begin{pmatrix} 1 & 1 \\ -1 & 4 \end{pmatrix}.$$

Therefore

$$e^{tA} = Ce^{tD}C^{-1} = \begin{pmatrix} 4 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} e^{6t} & 0 \\ 0 & e^{t} \end{pmatrix}\frac{1}{5}\begin{pmatrix} 1 & 1 \\ -1 & 4 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 4e^{6t} + e^{t} & 4e^{6t} - 4e^{t} \\ e^{6t} - e^{t} & e^{6t} + 4e^{t} \end{pmatrix}.$$

EXAMPLE 2. Solve the linear system

$$y_1' = 5y_1 + 4y_2, \qquad y_2' = y_1 + 2y_2,$$

subject to the initial conditions $y_1(0) = 2$, $y_2(0) = 3$.

Solution. In matrix form the system can be written as

$$Y'(t) = AY(t), \qquad Y(0) = \begin{pmatrix} 2 \\ 3 \end{pmatrix}, \qquad\text{where}\quad A = \begin{pmatrix} 5 & 4 \\ 1 & 2 \end{pmatrix}.$$

By Theorem 7.7 the solution is $Y(t) = e^{tA}Y(0)$. Using the matrix $e^{tA}$ calculated in Example 1 we find

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 4e^{6t} + e^{t} & 4e^{6t} - 4e^{t} \\ e^{6t} - e^{t} & e^{6t} + 4e^{t} \end{pmatrix}\begin{pmatrix} 2 \\ 3 \end{pmatrix},$$

from which we obtain $y_1 = 4e^{6t} - 2e^{t}$, $y_2 = e^{6t} + 2e^{t}$.

There are many methods known for calculating $e^{tA}$ when $A$ cannot be diagonalized. Most of these methods are rather complicated and require preliminary matrix transformations, the nature of which depends on the multiplicities of the eigenvalues of $A$. In a later section we shall discuss a practical and straightforward method for calculating $e^{tA}$ which can be used whether or not $A$ can be diagonalized. It is valid for all matrices $A$ and requires no preliminary transformations of any kind. This method was developed by E. J. Putzer in a paper in the American Mathematical Monthly, Vol. 73 (1966), pp. 2-7. It is based on a famous theorem attributed to Arthur Cayley (1821-1895) and William Rowan Hamilton (1805-1865) which states that every square matrix satisfies its characteristic equation. First we shall prove the Cayley-Hamilton theorem and then we shall use it to obtain Putzer's formulas for calculating $e^{tA}$.

7.11 The Cayley-Hamilton theorem

THEOREM 7.8. CAYLEY-HAMILTON THEOREM. Let $A$ be an $n \times n$ matrix and let

(7.19)  $f(\lambda) = \det(\lambda I - A) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0$

be its characteristic polynomial. Then $f(A) = O$. In other words, $A$ satisfies the equation

(7.20)  $A^n + c_{n-1}A^{n-1} + \cdots + c_1A + c_0I = O.$

Proof. The proof makes use of Theorem 3.12, which states that for any square matrix $A$ we have

(7.21)  $A(\operatorname{cof} A)^t = (\det A)I.$

We apply this formula with $A$ replaced by $\lambda I - A$. Since $\det(\lambda I - A) = f(\lambda)$, Equation (7.21) becomes

(7.22)  $(\lambda I - A)\{\operatorname{cof}(\lambda I - A)\}^t = f(\lambda)I.$

This equation is valid for all real $\lambda$. The idea of the proof is to show that it is also valid when $\lambda$ is replaced by $A$.

The entries of the matrix $\operatorname{cof}(\lambda I - A)$ are the cofactors of $\lambda I - A$. Except for a factor $\pm 1$, each such cofactor is the determinant of a minor of $\lambda I - A$ of order $n - 1$.
Therefore each entry of $\operatorname{cof}(\lambda I - A)$, and hence of $\{\operatorname{cof}(\lambda I - A)\}^t$, is a polynomial in $\lambda$ of degree at most $n - 1$.

7.13 Putzer's method for calculating $e^{tA}$

By the Cayley-Hamilton theorem, $A^n$ and all higher powers of $A$ are linear combinations of $I, A, \ldots, A^{n-1}$, so each term $t^kA^k/k!$ of the series defining $e^{tA}$ is a linear combination of $t^kI, t^kA, t^kA^2, \ldots, t^kA^{n-1}$. Hence we can expect that $e^{tA}$ should be expressible as a polynomial in $A$ of the form

(7.27)  $e^{tA} = \displaystyle\sum_{k=0}^{n-1} q_k(t)A^k,$

where the scalar coefficients $q_k(t)$ depend on $t$. Putzer developed two useful methods for expressing $e^{tA}$ as a polynomial in $A$. The next theorem describes the simpler of the two methods.

THEOREM 7.9. Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of an $n \times n$ matrix $A$, and define a sequence of polynomials in $A$ as follows:

(7.28)  $P_0(A) = I, \qquad P_k(A) = \displaystyle\prod_{m=1}^{k}(A - \lambda_m I), \quad\text{for } k = 1, 2, \ldots, n.$

Then we have

(7.29)  $e^{tA} = \displaystyle\sum_{k=1}^{n} r_k(t)P_{k-1}(A),$

where the scalar coefficients $r_1(t), \ldots, r_n(t)$ are determined recursively from the system of linear differential equations

(7.30)  $r_1'(t) = \lambda_1 r_1(t), \quad r_1(0) = 1;$
        $r_{k+1}'(t) = \lambda_{k+1}r_{k+1}(t) + r_k(t), \quad r_{k+1}(0) = 0 \qquad (k = 1, 2, \ldots, n-1).$

Note: Equation (7.29) does not express $e^{tA}$ directly in powers of $A$ as indicated in (7.27), but as a linear combination of the polynomials $P_0(A), P_1(A), \ldots, P_{n-1}(A)$. These polynomials are easily calculated once the eigenvalues of $A$ are determined. Also the multipliers $r_1(t), \ldots, r_n(t)$ in (7.30) are easily calculated. Although this requires solving a system of linear differential equations, this particular system has a triangular matrix and the solutions can be determined in succession.

Proof. Let $r_1(t), \ldots, r_n(t)$ be the scalar functions determined by (7.30) and define a matrix function $F$ by the equation

(7.31)  $F(t) = \displaystyle\sum_{k=1}^{n} r_k(t)P_{k-1}(A).$

Note that $F(0) = r_1(0)P_0(A) = I$. We will prove that $F(t) = e^{tA}$ by showing that $F$ satisfies the same differential equation as $e^{tA}$, namely $F'(t) = AF(t)$. Differentiating (7.31) and using the recursion formulas (7.30) we obtain

$$F'(t) = \sum_{k=1}^{n} r_k'(t)P_{k-1}(A) = \sum_{k=1}^{n}\{\lambda_k r_k(t) + r_{k-1}(t)\}P_{k-1}(A),$$

where $r_0(t)$ is defined to be $0$. We rewrite this in the form

$$F'(t) = \sum_{k=1}^{n-1} r_k(t)P_k(A) + \sum_{k=1}^{n}\lambda_k r_k(t)P_{k-1}(A),$$

then subtract $\lambda_n F(t) = \sum_{k=1}^{n}\lambda_n r_k(t)P_{k-1}(A)$ to obtain the relation

(7.32)  $F'(t) - \lambda_n F(t) = \displaystyle\sum_{k=1}^{n-1} r_k(t)\bigl\{P_k(A) + (\lambda_k - \lambda_n)P_{k-1}(A)\bigr\}.$

But from (7.28) we see that $P_k(A) = (A - \lambda_k I)P_{k-1}(A)$, so

$$P_k(A) + (\lambda_k - \lambda_n)P_{k-1}(A) = (A - \lambda_k I)P_{k-1}(A) + (\lambda_k - \lambda_n)P_{k-1}(A) = (A - \lambda_n I)P_{k-1}(A).$$

Therefore Equation (7.32) becomes

$$F'(t) - \lambda_n F(t) = (A - \lambda_n I)\sum_{k=1}^{n-1} r_k(t)P_{k-1}(A) = (A - \lambda_n I)\{F(t) - r_n(t)P_{n-1}(A)\} = (A - \lambda_n I)F(t) - r_n(t)P_n(A).$$

The Cayley-Hamilton theorem implies that $P_n(A) = O$, so the last equation becomes

$$F'(t) - \lambda_n F(t) = (A - \lambda_n I)F(t) = AF(t) - \lambda_n F(t),$$

from which we find $F'(t) = AF(t)$. Since $F(0) = I$, the uniqueness theorem (Theorem 7.7) shows that $F(t) = e^{tA}$.

EXAMPLE 1. Express $e^{tA}$ as a linear combination of $I$ and $A$ if $A$ is a $2 \times 2$ matrix with both its eigenvalues equal to $\lambda$.

Solution. Writing $\lambda_1 = \lambda_2 = \lambda$, we are to solve the system of differential equations

$$r_1'(t) = \lambda r_1(t), \quad r_1(0) = 1; \qquad r_2'(t) = \lambda r_2(t) + r_1(t), \quad r_2(0) = 0.$$

Solving these first-order equations in succession we find

$$r_1(t) = e^{\lambda t}, \qquad r_2(t) = te^{\lambda t}.$$

Since $P_0(A) = I$ and $P_1(A) = A - \lambda I$, the required formula for $e^{tA}$ is

(7.33)  $e^{tA} = e^{\lambda t}I + te^{\lambda t}(A - \lambda I) = e^{\lambda t}\{(1 - \lambda t)I + tA\}.$

EXAMPLE 2. Solve Example 1 if the eigenvalues of $A$ are $\lambda$ and $\mu$, where $\lambda \ne \mu$.

Solution. In this case the system of differential equations is

$$r_1'(t) = \lambda r_1(t), \quad r_1(0) = 1; \qquad r_2'(t) = \mu r_2(t) + r_1(t), \quad r_2(0) = 0.$$

Its solutions are given by

$$r_1(t) = e^{\lambda t}, \qquad r_2(t) = \frac{e^{\lambda t} - e^{\mu t}}{\lambda - \mu}.$$

Since $P_0(A) = I$ and $P_1(A) = A - \lambda I$, the required formula for $e^{tA}$ is

(7.34)  $e^{tA} = e^{\lambda t}I + \dfrac{e^{\lambda t} - e^{\mu t}}{\lambda - \mu}(A - \lambda I).$
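Putzer's recursion is also easy to carry out numerically. The following Python sketch is our own illustration (it assumes the eigenvalues are real and are supplied by the caller): it builds the polynomials (7.28), integrates the triangular system (7.30) with classical Runge-Kutta steps, and assembles $e^{tA}$ from (7.29). For a $2 \times 2$ matrix the result can be compared with the closed forms (7.33) and (7.34).

```python
import numpy as np

def putzer_expm(A, t, eigenvalues, steps=2000):
    """e^{tA} via Putzer's formula (7.29), with the r_k of (7.30) obtained by
    numerically integrating the triangular system (eigenvalues assumed real)."""
    A = np.asarray(A, dtype=float)
    lam = np.asarray(eigenvalues, dtype=float)
    n = A.shape[0]

    # P_0(A) = I,  P_k(A) = (A - lam_k I) P_{k-1}(A)      -- equation (7.28)
    P = [np.eye(n)]
    for k in range(1, n):
        P.append((A - lam[k - 1] * np.eye(n)) @ P[-1])

    # Triangular system (7.30): r' = L r with lam_k on the diagonal,
    # ones on the subdiagonal, and r(0) = (1, 0, ..., 0).
    L = np.diag(lam) + np.diag(np.ones(n - 1), -1)
    r = np.zeros(n)
    r[0] = 1.0
    h = t / steps
    for _ in range(steps):                  # classical Runge-Kutta steps
        k1 = L @ r
        k2 = L @ (r + 0.5 * h * k1)
        k3 = L @ (r + 0.5 * h * k2)
        k4 = L @ (r + h * k3)
        r = r + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

    # Formula (7.29): e^{tA} = sum_k r_k(t) P_{k-1}(A).
    return sum(r[k] * P[k] for k in range(n))

A = np.array([[5.0, 4.0],
              [1.0, 2.0]])                  # eigenvalues 6 and 1 (Example 1 of Section 7.10)
print(putzer_expm(A, 0.5, [6.0, 1.0]))
```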
If the eigenvalues $\lambda$, $\mu$ are complex numbers, the exponentials $e^{\lambda t}$ and $e^{\mu t}$ will also be complex numbers. But if $\lambda$ and $\mu$ are complex conjugates, the scalars multiplying $I$ and $A$ in (7.34) will be real. For example, suppose

$$\lambda = \alpha + i\beta, \qquad \mu = \alpha - i\beta, \qquad \beta \ne 0.$$

Then $\lambda - \mu = 2i\beta$, so Equation (7.34) becomes

$$e^{tA} = e^{(\alpha+i\beta)t}I + \frac{e^{(\alpha+i\beta)t} - e^{(\alpha-i\beta)t}}{2i\beta}\bigl(A - (\alpha + i\beta)I\bigr)
= e^{\alpha t}\Bigl\{(\cos\beta t + i\sin\beta t)I + \frac{\sin\beta t}{\beta}(A - \alpha I - i\beta I)\Bigr\}.$$

The terms involving $i$ cancel and we get

(7.35)  $e^{tA} = \dfrac{e^{\alpha t}}{\beta}\bigl\{(\beta\cos\beta t - \alpha\sin\beta t)I + (\sin\beta t)A\bigr\}.$

7.14 Alternate methods for calculating $e^{tA}$ in special cases

Putzer's method for expressing $e^{tA}$ as a polynomial in $A$ is completely general because it is valid for all square matrices $A$. A general method is not always the simplest method to use in certain special cases. In this section we give simpler methods for computing $e^{tA}$ in three special cases: (a) when all the eigenvalues of $A$ are equal, (b) when all the eigenvalues of $A$ are distinct, and (c) when $A$ has two distinct eigenvalues, exactly one of which has multiplicity 1.

THEOREM 7.10. If $A$ is an $n \times n$ matrix with all its eigenvalues equal to $\lambda$, then we have

(7.36)  $e^{tA} = e^{\lambda t}\displaystyle\sum_{k=0}^{n-1}\frac{t^k}{k!}(A - \lambda I)^k.$

Proof. Since the matrices $\lambda tI$ and $t(A - \lambda I)$ commute we have

$$e^{tA} = e^{\lambda tI}e^{t(A-\lambda I)} = e^{\lambda t}\sum_{k=0}^{\infty}\frac{t^k}{k!}(A - \lambda I)^k.$$

The Cayley-Hamilton theorem implies that $(A - \lambda I)^k = O$ for every $k \ge n$, so the theorem is proved.

THEOREM 7.11. If $A$ is an $n \times n$ matrix with $n$ distinct eigenvalues $\lambda_1, \ldots, \lambda_n$, then we have

$$e^{tA} = \sum_{k=1}^{n} e^{\lambda_k t}L_k(A),$$

where $L_k(A)$ is a polynomial in $A$ of degree $n - 1$ given by the formula

$$L_k(A) = \prod_{\substack{j=1 \\ j \ne k}}^{n}\frac{A - \lambda_j I}{\lambda_k - \lambda_j} \qquad\text{for } k = 1, 2, \ldots, n.$$

Note: The polynomials $L_k(A)$ are called Lagrange interpolation coefficients.

Proof. We define a matrix function $F$ by the equation

(7.37)  $F(t) = \displaystyle\sum_{k=1}^{n} e^{\lambda_k t}L_k(A)$

and verify that $F$ satisfies the differential equation $F'(t) = AF(t)$ and the initial condition $F(0) = I$. From (7.37) we see that

$$AF(t) - F'(t) = \sum_{k=1}^{n} e^{\lambda_k t}(A - \lambda_k I)L_k(A).$$

By the Cayley-Hamilton theorem we have $(A - \lambda_k I)L_k(A) = O$ for each $k$, so $F$ satisfies the differential equation $F'(t) = AF(t)$. It remains to verify the initial condition $F(0) = I$, which becomes

(7.38)  $\displaystyle\sum_{k=1}^{n} L_k(A) = I.$

A proof of (7.38) is outlined in Exercise 16 of Section 7.15.

The next theorem treats the case when $A$ has two distinct eigenvalues, exactly one of which has multiplicity 1.

THEOREM 7.12. Let $A$ be an $n \times n$ matrix ($n \ge 3$) with two distinct eigenvalues $\lambda$ and $\mu$, where $\lambda$ has multiplicity $n - 1$ and $\mu$ has multiplicity 1. Then we have

$$e^{tA} = e^{\lambda t}\sum_{k=0}^{n-2}\frac{t^k}{k!}(A - \lambda I)^k + \frac{e^{\mu t} - e^{\lambda t}\sum_{k=0}^{n-2}\dfrac{t^k}{k!}(\mu - \lambda)^k}{(\mu - \lambda)^{n-1}}\,(A - \lambda I)^{n-1}.$$

Proof. As in the proof of Theorem 7.10 we begin by writing

$$e^{tA} = e^{\lambda t}\sum_{k=0}^{\infty}\frac{t^k}{k!}(A - \lambda I)^k = e^{\lambda t}\sum_{k=0}^{n-2}\frac{t^k}{k!}(A - \lambda I)^k + e^{\lambda t}\sum_{r=0}^{\infty}\frac{t^{n-1+r}}{(n-1+r)!}(A - \lambda I)^{n-1+r}.$$

Now we evaluate the series over $r$ in closed form by using the Cayley-Hamilton theorem. Since

$$A - \mu I = (A - \lambda I) - (\mu - \lambda)I$$

we find

$$(A - \lambda I)^{n-1}(A - \mu I) = (A - \lambda I)^n - (\mu - \lambda)(A - \lambda I)^{n-1}.$$

The left member is $O$ by the Cayley-Hamilton theorem, so

$$(A - \lambda I)^n = (\mu - \lambda)(A - \lambda I)^{n-1}.$$

Using this relation repeatedly we find

$$(A - \lambda I)^{n-1+r} = (\mu - \lambda)^r(A - \lambda I)^{n-1}.$$

Therefore the series over $r$ becomes

$$\sum_{r=0}^{\infty}\frac{t^{n-1+r}}{(n-1+r)!}(\mu - \lambda)^r(A - \lambda I)^{n-1} = \frac{(A - \lambda I)^{n-1}}{(\mu - \lambda)^{n-1}}\Bigl\{e^{(\mu-\lambda)t} - \sum_{k=0}^{n-2}\frac{t^k}{k!}(\mu - \lambda)^k\Bigr\}.$$

This completes the proof.

The explicit formula in Theorem 7.12 can also be deduced by applying Putzer's method, but the details are more complicated.

The explicit formulas in Theorems 7.10, 7.11 and 7.12 cover all matrices of order $n \le 3$. Since the $3 \times 3$ case often arises in practice, the formulas in this case are listed below for easy reference.

CASE 1. If a $3 \times 3$ matrix $A$ has eigenvalues $\lambda$, $\lambda$, $\lambda$, then

$$e^{tA} = e^{\lambda t}\bigl\{I + t(A - \lambda I) + \tfrac{1}{2}t^2(A - \lambda I)^2\bigr\}.$$

CASE 2. If a $3 \times 3$ matrix $A$ has distinct eigenvalues $\lambda$, $\mu$, $\nu$, then

$$e^{tA} = e^{\lambda t}\frac{(A - \mu I)(A - \nu I)}{(\lambda - \mu)(\lambda - \nu)} + e^{\mu t}\frac{(A - \lambda I)(A - \nu I)}{(\mu - \lambda)(\mu - \nu)} + e^{\nu t}\frac{(A - \lambda I)(A - \mu I)}{(\nu - \lambda)(\nu - \mu)}.$$

CASE 3. If a $3 \times 3$ matrix $A$ has eigenvalues $\lambda$, $\lambda$, $\mu$, with $\lambda \ne \mu$, then

$$e^{tA} = e^{\lambda t}\bigl\{I + t(A - \lambda I)\bigr\} + \frac{e^{\mu t} - e^{\lambda t}}{(\mu - \lambda)^2}(A - \lambda I)^2 - \frac{te^{\lambda t}}{\mu - \lambda}(A - \lambda I)^2.$$

EXAMPLE. Compute $e^{tA}$ for a $3 \times 3$ matrix $A$ whose eigenvalues are $1$, $1$, $2$.

Solution. The formula of Case 3 gives us

(7.39)  $e^{tA} = e^{t}\{I + t(A - I)\} + (e^{2t} - e^{t})(A - I)^2 - te^{t}(A - I)^2.$
By collectng powers of A we can also wnie this as follows, (7.40) et = (Meh + MT + (3t + Dot — 2A — f(t + Det — eA. At this stage we can calculate (A — 1)? or A? and perform the indicated operations in (7.39) or (7.40) to write the result as a 3 x 3 matrix, —2e! + et (3t+ Detm De (t+ Net + ott ef = |—2(r + ets Qe (3+ Set de® — (4 + Det + Delt 2b + Det + det — (31+ Bet — Be (r+ Adel + ett 7.15 Exercises For each of the matrices in Exercises 1 through 6, express et as a polynomial in A, fi 0 21 1 2 LAs . BA=]0 1 3), 2 -1 004 11 0 0 4 : ee 1 5.A=|2 ool 6.A= 3. 6 1 2 oo 3 - 7 og 0 7. (a) A3 x3 matix A is known © have all its eigenvalues equal 0 1. Prove that ett = felt (ttt — 2at + 2)Z + (—2Ar + 2A + PAY. (b) Find a comesponding formula if A is a4 x 4 matrix with all its eigenvalues equal to A. a2 Systems of differential equations In each of Exercises 8 through 15, solve the system Y’ =A Y subject to the given initial condition. : 1, vo =("]. van[ ; . ro -['] 2 -hy Lead J iy Sot It 200 a A 0 ¥@) =| -1}. WeA=l0 1 0}, YO=/al. m =lh-l y 2, O11 6 o 1 0 1 202-3 8 was} 0 0 1 1-6 yo) =| 0 : ll 6 =| 0 0 2 0 WAS iLL Li) 001 al 16, This exercise outlines a proof of Equation (7.38) used in the proof of Theorem 7.11, Let L(A) be the polynomial in 2 of degree m = 1 defined by the equation _ 42 Ea) = L hah’ it where Ay... 1, are m distinct scalars. (a) Prove that 0 fi # AQ Lh) = : 1 if Ay = Aye () Let yy)... Jn be m arbitrary scalars, and let 70)» Skit). Prove that p(J) is the only polynomial of degree < m — 1 which satisfies the 1 equations PU). ye fork =1,2,...,n. (©) Prove that D8 L,(A) = 1 for every 2, and deduce that for every square matrix A we have SL =1, a where Z is the identity matrix. Nonhomogeneous linear systems with constant coefficients 213 7.16 Nonhomogencous linear systems with constant coefficients We consider next the nonhomogeneous initial-value problem 41) YO =AYO+ QW, Ya = on an interval J, Here A is an mn x 7 constant matrix, Q is an n-dimensional vector function (regarded as an mx 1 column matrix) continuous on J, and @ is a given point in J. We can obiain an explicit formula for the solution of this problem by the same process used 1 treat the scalar case First. we multiply both members of (7.41) by the exponential matrix e~4 and rewrite the differential eywaion in the for (7.42) ety) = AY(D} = e4Q(N. The left member of (7.42) is the derivative of the product e-*4¥(f). Therefore, if we integrate both members of (742) fiom a to x, where x € J, we obtain ody(x) — e4y(ay— [* 400) at « Multiplying by e*4 we obtain the explicit formula (7.43) which appears in the following theorem. suonn 7.13. Let A be ann x n constant matrix and let Q be an n-dimensional vector function continuous on an interval J. Then the initial-value problem Y(QQ=A Y¥+ O00. Ya = B has a unique solution on J given by the explicit formula (7.43) V(x) = &= 4B + et I *4Q(t) dt. ‘As in the homogeneous case, the difficulty in applying this formula in practice lies in the calculation of the exponential matrices, Note that the first term, e 4B, is the solution of the homogeneous problem ¥%1) = A ¥(0, ¥(a) = B. The second term is the solution of the nonhomogeneous problem YI) =A ¥(N + O11), Ya) = 0. We illustrate Theorem 7.13. with an example, severe. Solve the initial-value problem. ¥@)= AYO+ OO), ¥(0) = 214 Systems of differential equations on the interval (© 00 , + co), where 2-1 | oe A=|0 3 -I}, AH=| 0 |, ooo) : a Solution. According to Theorem 7.13, the solution is given by i e4Q(1) de. 
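The formula of Theorem 7.11, and the identity $\sum_k L_k(A) = I$ of Exercise 16, are easy to check numerically. The following Python sketch is our own illustration (it assumes a matrix with distinct real eigenvalues supplied by the caller): it constructs the Lagrange interpolation coefficients $L_k(A)$ and evaluates $e^{tA} = \sum_k e^{\lambda_k t}L_k(A)$.

```python
import numpy as np

def lagrange_coefficients(A, eigenvalues):
    """The matrices L_k(A) of Theorem 7.11 (distinct eigenvalues assumed)."""
    A = np.asarray(A, dtype=float)
    I = np.eye(A.shape[0])
    L = []
    for k, lam_k in enumerate(eigenvalues):
        M = I.copy()
        for j, lam_j in enumerate(eigenvalues):
            if j != k:
                M = M @ (A - lam_j * I) / (lam_k - lam_j)
        L.append(M)
    return L

def expm_distinct(A, t, eigenvalues):
    """e^{tA} = sum_k e^{lambda_k t} L_k(A)   (Theorem 7.11)."""
    return sum(np.exp(lam * t) * Lk
               for lam, Lk in zip(eigenvalues, lagrange_coefficients(A, eigenvalues)))

A = np.array([[5.0, 4.0],
              [1.0, 2.0]])                 # distinct eigenvalues 6 and 1
L = lagrange_coefficients(A, [6.0, 1.0])
print(L[0] + L[1])                         # should print the identity matrix (Exercise 16)
print(expm_distinct(A, 0.5, [6.0, 1.0]))   # agrees with the formula found in Example 1
```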
(7.44) Yeu) = 4 |" e4Q(0 dt = | The eigenvalues of A are 2, 2, and 4, To calculate e*4 we use the formula of Case 3, Section 7.14, to obtain ett = (T+ x(A = I} + He — YA — 2? — Jre*(A — 21) =e (14 x(A — UW) + Ue — 2x ~ (A — 2}. We can replace x by x — ¢ in this formula to obtain e "4, Therefore the integrand in (7.44) is el 49(1) = POT + = IA = 22) + Ye? = Be — Y= 1](A = 23010) 1 x-t ee — 2x — 1-1 =e/0/4(A—-2Ne*| 0 | + ta — 21)e* 0 t a(x — 1) Lette! — 24x — 1) — Integrating, we find x bt Ya) =fer *40(1) dt = f 0 | +(A- ano 0 ] 7 List! Lis’ ye —f-x-x? + fa = 2c] 0 | je fb bt et Since we have =I 1 20 2 A-2=|0 1 -1| and (4-29%=|-2 0 2], 1 L 2 i) 2. we find x be fe? — be — tbs) ¥(x) =e] 0 | +e] ast | + et) te + Pt bet btt bt xt xt dt get — fpr pt — seb te B] =e tities pl]. ted bet bt The rows of this matrix are the required functions y,, Ya Ja. Exercises 215, 7.17 Exercises 1. Let Z be a solution of the nonhomogeneous system ZO = AZ) + AW), con an interval J with inital value Z(a). Prove that there is only one solution of the non- homogeneous system YO = AY) + QO con J with inital value Y(@) and that it is given by the foemnala Y(t) = Z(t) + e-4{ Ya) ~ Z(a)}. Special methods are often available for determining a particular solution Z(0) which resembles the given finnction Q(), Fxercises 2, 3, 5, and 7 indicate such methoxls for Q) = C. OG eC, OW = IC, and Q(t) = (cos at)C + (Sin af)D, where C and D are constant vectors. If the particular solution Z(0) so obtained does not have the required initial value, we modily Zt) as indicated in xerciso 1 to obtain another solution Y(q with the required initial value. . (a) Let A be a constant mx n matrix, B and C constant n-dimensional vectors. Prove that the solution of the system np YO=AVO +e, Y@=B on (~ 2, + ©) is given by the formula Yox) = oleae + (end da) (b) If A is nonsingular, show that the integral in part (a) has the value {e'#-=)4 — 7A, (© Compute Y(%) explicitly when [hej lh 3. Let A be an 1m X 1 constant mattix, let B and C be n-dimensional constant vect be a given scalar. (@) Prove that the nonhomogeneous system Z(t) = AZ(t) + e#C has a solution of the form Z(t) = eB if, and only if, (af = A)B = C. (b) Ifa is not an eigenvalue of A, prove that the vector B can always be chosen so that the system in (a) has a solution of the form Z(Q) = eB. (©) If = is not an eigenvalue of A, prove that every solution of the system Y(t) = A YG) + eC has the form ¥() = et4( ¥(0) = B) + eB, where B= (al ~ AIC, 4, Use the method suggested by Exercise 3 to find a solution of the nonhomogeneous system YW = AYO + eC, with a-[) }: e-[7} ro =[1). s, and let a 216 Systems. of differential equations 5. Let A be an mx # constant matrix, let Band C be n-dimensional constant vectors, and let m be a positive integer. (@) Prove that the nonhomogeneous system Y'() = A YW) + PC, YW) = B, has a particular solution of the form Yi) = By + (By + OBy 40° + By where By, B1,-.., By are constant vectors, it and only if =- sama Determine the coefficients By, By...» By for such a solution (b) If A is nonsingular, prove that the initial veetor B can always be chosen so that the system in (a) Iwas a solution of the specified fom, 6. Consider the nonhomogeneous system N= Int wte Vea Wht yee () Find a particular solution of the form Y(f) = By + 1B, + #By + #By. (b) Find a solution of the system with y,(0) = y,(0) = 1. 7. Let A be an # x n constant matrix, let B, D be n-dimensional constant vectors, and let «be a given nonzero real number. 
Prove that the nonhomogeneous system Yi) = A Yi) + (cos at)C + (Sin at)D, (0) = B, has a particular solution of the form Y(D = (Cos af)E + (sin a)F, where E and Fare constant vectors, at and only at (42 + oR = (AC + oD) Determine £ and Fin terms of A, B, C for such a solution, Note that if 4* + 7 is nonsingular, the initial veotor B can always be chosen so that the system has a solution of the specified form. 8. (@ Find a particular solution of the nonhomogeneous system Hee + Bye +4 sin 2 Ye= Wi Yar () Find a solution of the systen with y(0) = yy(0) = 1. The general linear system Y'(t) = P(t) Yt) + Q(t) 217 In each of Exercises 9 through 12, subject t0 the given initial condition, solve the nonhomogeneous system Y") = A YQ + QO 4. 0 1 aaa . ow =| A. yo =["]. Le Lz} ie] ale oo-| | yo) - | : 1-3" et’ oe} 5 + - —2ppr nal ° i} aw={ 777), =f), 2-3 Se! +12, ip -1 0 ° 6 12.4=| 0 -1], O@=|2], YO =| -2]. o 0 - ' 1 POLE) + Oo ‘Theorem 7.13 gives an explicit formula for the solution of the linear system YO=SAYO+O0, YO=8 where A is a constant n X a matrix and Qf), ¥() are m X 1 column matiices, We tm now to the more general case (745) Y= POYO)+Q0, Ya =B where the 1 x m matrix P(t) is not necessarily constant If P and Q are continuous on an open interval J, a general existence-uniqueness theorem which we shall prove in a later section tells us that for each a in J and each initial vector B there is exactly one solution to the initial-value problem (7.45). In this section we use this result to obtain a formula for the solution, generalizing Theorem 7.13. In the scalar case (a = 1) the differential equation (7.45) can be solved as follows. We let A(x) = JZ P(t) df, then multiply both members of (7.45) by e~4" to rewrite the differ- ential equation in the form (7.46) AOC Yr) — POY(D} = e4Q(). Now the left member is the derivative of the product e~ 4" ¥(1), ‘Therefore, we can integrate both members from a to x, where a and x are points in J, to obtain osely(y — eA y(a) = ff HOW at. Multiplying by e¢® we obtain the explicit formula (147) Yoo = e4@le4Y(q) * et fF eA Q(1) dt. 218 Systems of differential equations The only part of this argument that does not apply immediately to matrix functions is the statement that the left-hand member of (7.46) is the derivative of the product e~4! ¥(1) At this stage we used the fact that the derivative of e~4") is —P(t)e~4"#, In the scalar case this is a consequence of the following formula for differentiating exponential functions if E(t) = e#, then EG) = A(He*. Unfortunately, this differentiation formula is not always true when A is a matrix function. 1 For example, it is false for the 2 x 2 matrix function A(t) = [6 { (See Exercise 7 of Section 7.12.) Therefore a modified argument is needed to extend Equation (7.47) to the matrix case, Suppose we multiply each member of (745) by an unspecified n x nm matrix F(t), This gives us the relation FOV = FOP] YN + FOO Now we add F'(1)¥(1) to both members in order to transform the left member to the derivative of the product F(t) ¥(f. If we do this, the last equation gives us {FO OY = (FO) + FOPO} YO + FOO « If we can choose the matrix F(t) so that the sum (F(t) + F()P(1)} on the right is the zero matrix, the last equation simplifies to (FO) YOY = FNC) Integrating this from a to x we obtain FQ)¥(x) = F(a)¥(a) = i F()Q() at « If, in addition, the matrix F(x) is nonsingular, we obtain the explicit formula (748) Y(x) = FaFa) Ya) + FO? |* HOW) ar. 
This is a generalization of the scalar formula (7.47), The process will work if we can find a nm x ft matrix function F(t) which satisfies the matrix differential equation F(t) = -FOPW) and which is nonsingular, Note that this differential equation is very much like the original differential equation (7.45) with Q(1) = 0, except that the unknown function F(t) is a square matrix instead of a column matrix, Also, the unknown function is multiplied on the right by —P(#) instead of on the left by P(t), We shall prove next that the differential equation for F always has a nonsingular solution, we puut will depend ou tie folluwing cainicue Uimacm fix inwuegeuevus five sysicu, The general linear system Y"(t) = P(t) ¥(t) + Q(t) 219 ruonsn 7.14, Assume A(t) is ann x n matrix function continuous on an open interval J. Ifa CJ and if Bis a given n-dimensional vector, the homogeneous linear system Ya) = A(t) Y(t), Ya) = B, has an n-dimensional vector solution ¥ on J. A proof of Theorem 7.14 appears in Section 7.21. With the help of this theorem we can prove the following. ruonm 7.15. Given an n x n matrix function P, continuous on an open interval J, and given any point a in J, there exists an n x n matrix function F which satisfies the matrix differential equation (7.49) Fx) = —F()P(x) on J with initial value F(a) Moreover, F(x) is nonsingular for each x in J. Proof. Let Y,(x) be a vector solution of the differential equation YG) = —P(a)'¥(x) on J with initial vector Ya) = J, , where J, is the kth column of the m x 1 identity matrix 1 Here P(x)* denotes the transpose of P(x). Let Gtx) be the m x n matrix whose kth column is Y(x). Then G satisfies the matrix differential equation (7.50) G(x) = —P(x)'G(x) on J with initial value Gfa) = I. Now take the transpose of each member of (7.50). Since the transpose of a product is the product of transposes in reverse order, we obtain {G@!” = —GQ)'PQ). Also, the transpose of the derivative G’ is the derivative of the wanspose Gt, Therefore the matrix F(x) = G(x)! satisfies the differential equation (7.49) with initial value Fla) = 1 Now we prove that F(x) is nonsingular by exhibiting its inverse. Let Hobe the 1 x on matrix function whose kth column is the solution of the differential equation Ya) = Pix) Yo) with initial vector ¥(a) = 1, the kth column of J, Then H_ satisfies the initial-value problem Hix) = POH), — Hla) on J. The product F(x)H(x) has derivative FH (x) + FAQ) = FOOP@)AG) = FX)PQ)HQ) = 0 220 Systems of differential equations for each x in J. Therefore the product F(x)H(x) is constant, F(x)H(x) = Fla)H(a) = I, so H(x) is the inverse of F(x). This completes the proof. ‘The results of this section are summarized in the following theorem. muon 7.16. Given an n x n matrix function P and an n-dimensional vector function Q. both continuous on an open interval J, the solution of the initial-value problem (751) r(x) = Pix) x) + OG), Ma) = B, on J is given by the formula (752) Ya) = Fania) + FOO [* FW) at The Wx n matrtx F(x) 1s the transpose of the matrix whose kth column ts the solution of the initial-value problem (753) Ya) = —PQ)YG), Y@=h, where I, is the kth column of the identity matrix 1. Although Theorem 7.16 provides an explicit formula for the solution of the general linear system (7.51), the formula is not always a useful one for calculating the solution because of the difficulty involved in determining the matrix function F. ‘The determination of F requires the solution of » homogeneous linear systems (7.53). 
The next section describes a power-series method that is sometimes used to solve homogeneous linear systems. We remind the reader once more that the proof of ‘Theorem 7.16 was based on Theorem 7.14, the existence theorem for homogeneous linear systems, which we have not yet proved. 7.19 A power-series method for sob homogeneous linear systems Consider a homogeneous linear system (754) YO = AQ)YO, — YO) = B, in which the given nx m matrix A(x) has a power-series expansion in x convergent in some open interval containing the origin, say ii a } bl AG) — Ag A body too tA for [ot < where the coefficients 4g, A,, A,,... are given xm matrices. Let us try to find a powerseries solution of the form Y(x) = By + XB, + OB ee BR Exercises 221 with vector coefficients By, By, By, ..., Since Y(0) = Bp, the initial condition will be satisfied by taking By = B, the prescribed initial vector. To determine the remaining coefficients we substitute the power series for Y(x) in the differential equation and equate coefficients of like powers of x to obtain the following system of equations: x (1.55) By= AB, (k+DB = 2 Arba for k=1,2.... ‘These equations can be solved in succession for the vectors By, By, .... If the resulting power series for Y(x) converges in some interval [x] < 2. then Y(x) will be a solution of the initial-value problem (7.54) in the interval |x| 1, so the system of equations in (7.55) becomes AB, (kK +)By = 4B, for kD 1. Solving these equations in succession we find Therefore the series solution in this case becomes: So x* ad Yo) = B+ Dae =e B Ge This agrees with the result obtained earlier for homogeneous linear systems with constant coefficients. 7.20 Exercises 1. Let p be a real-valued function and Q an m x 1 matrix function, both continuous on an interval J. Let A be an nx m constant matrix. Prove that the initial-value problem Y'@)= p@AYG)+ OG), —Yu)= B, has the solution eltp + ole Yoo) e4Q(N) dt on J, where g(x) = [2 p@ dr. Consider the special case of Exercise 1 in which 4 is nonsingular, a = 0, p(x) = 2x, and Q(x) = xC, where C is a constant vector. Show that the solution becomes Y¥(x) = e"™4(B + $470) = bate, 3. Let A(t) be an m x n matrix function and let E@) = e4, Let QW, Y(r), and Bbe ax 1 column matrices. Assume that BO = AOEO Systems of differential equations oon an open interval J. Ifa € J and if A’ and Q are continuous on J, prove that the initial-value problem var = A(OYH+ QO, Yu) =B, has the following solution on I ¥(x) = ed(le-AledB + eleift AQ dt. da et, This exercise describes examples of matrix functions A(\) for which B() = A (EW). (a) Let A) = 7A, where A is ann xn constant matrix and r is a positive integer. Prove that EO = AEC) on (= @, «0, (b) Tet A() be 2 polynomial in « with matrix coefficients, say AW = ¥ PA... where the coefficients commute, 4,4, = A,A, for all rand s. Prove that E°() = 4’@E( on (-@, ©), (©) Solve the homogeneous linear system Y@ = (0+ ayy, (0) = on the interval ( — <0, c0), where A is an mx m constant matrix, 5. Assume that the m x nm matix function A(x) has a power-series expansion convergent for |x| > Sane)? +(« 143 ee pec a &! Gk +t convergent for = ao co we shall prove that the infinite series (7.60) X,{Ynanl) — Yona} converges for cach x in J. For this purpose it suffices to prove that the series (761) & WY u(x) = YC mano converges. In this series we use the matrix norm introduced in Section 7.3, the norm of a matrix is the sum of the absolute values of all its entries. Consider a closed and bounded subinterval J, of J’ containing a. 
We shall prove that for 2 in (1.61) is ef co every x in J, the pendent of x. This implies that the series converges uniformly on J. To estimate the size of the terms in (7.61) we use the recursion formula repeatedly. Initially, we have ae YX) = WoO) = |" AWB at. Proof af the existence theorem by the method af successive approximations 225 For simplicity, we assume that a a in Jy. Now we use the recursion formula once more to express the difference ¥, — Y; in terms of ¥,— Yo, and then use the estimate just obtained for Y, — Yy to obtain Le) = ¥,0)1 = I FF acatycn = van} at | < [vac 1B Ma = a) at < [Bi a? | @ =a) dt= [Bh Mix =a) for all x > a in Jy. By induction we find Ian) = Yuli < al ME for m=O, 1.2.05 (m + 1! and for all x >a in Jy. If x < @ a similar argument gives the same inequality with [xy — | appearing instead of (x ~ a). If we denote by L the length of the interval J,, then we have lx = al < L for all x in Jy s0 we obtain the estimate Mmipmen , Yana) ~ ¥u(x)L < 1B m+ DP for m=0,1,2,. and for all x in Jy. Therefore the series in (7.61) is dominated by the convergent series eae! 5) > = pe — 1. m+ 1! aay Gn. BD, ‘This proves that the series in (7.61) converges uniformly on J;. ‘The foregoing argument shows that the sequence of successive approximations always converges and the convergence is uniform on Jy. Let Y denote the limit function, That is, define Y(x) for each x in Jy by the equation ¥(x) = lim ¥,(x). In 226 Systems of differential equations all prove that Y has the following properties: (@) Y is continuous on Jy. (b) Y(x) = B+ fe A(t) ¥(t)dt for all x in Jy. (c) Ya) = Band YW) = AY) for all x in. Part (©) shows that Y is a solution of the initial-value problem on J,, Proof of (@) Fach fiction ¥, is a colimn matrix whose entries are scalar fhnetions. continuous on J,. Each entry of the limit function Y is the limit of a uniformly convergent sequence of continuous functions so, by Theorem 11.1 of Volume J, each entry of Y is also continuous on Jy, Therefore Y itgelf is continuous oa Jy, Proof offb). ‘The recursion formula (7.58) states that Yoals) = B+ [* AOD ar. Therefore AC)lim Y(t) at etic Y(x) = lim Yale) = 6 + im fr AGO dt = B+ kam ke va =e lays iF Av) a. ‘The interchange of the limit symbol with the integral sign is valid because of the uniform convergence of the sequence {¥,} on Jy. Proof of (©). The equation ¥(a) = B follows at once from (b). Because of (a), the integrand in (6) is continuous on J, so, by the first fundamental theorem of calculus, Y"(x) exists and equals A(x) Y@x) on Jy. The interval J; was any closed and bounded subinterval of J containing a. If J, is enlarged, the process for obtaining Y(x) doesn’t change because it only involves integration from a to x, Since for every x in J there ig a closed bounded subinterval of J containing a and x, a solution exists over the full interval J. THEOREM 7.17, UNIQUENESS THEOREM FOR HOMOGENEOUS LINEAR SYSTE continuous on an open interval J, the differential equation Ss. If A(t) is yy = Aw YQ) has at most one solution on J satisfying a given initial condition Ya) = Proof, Let Y and Z be wo solutions on J. Let J be any closed and bounded subinterval of J containing a. We will prove that Z(x) = Y(x) for every x in J,. This implies that Z = Y on the full interval J. Since both Y and Z are solutions we have ZW = YO = AMZ = wh. 
The method of successive approximations applied tojrst-order nonlinear systems 227 Choose x in Jy and integrate this equation from a to x to obtain ze) = vu) = [* AWIZO = Yo) ae ‘This implies the inequality - Yoh di, (7.63) |Z(x) where Mis an upper bound for |A(f)| onJ,. Let M, be an upper bound for the continuous function |Z(f) = Y(e)ll on J,. Then the inequality (763) gives ue (7.64) IZ) — Y@)I $ Mh |x — al. Using (7.64) in the right-hand member of (7.63) we obtain [x = al? Zs) = ¥@) < Mem, | [* ie al ar |= Mem, By induction we find (7.65) 126) = ¥(9| < ey, Bal" 7 a. . . . .. . . . . . proof. The results of this section can be summarized in the following existence-uniqueness theorem. sano 7.18. Let A be an n x n matrix function continuous on an open interval J. Ifa € J and if B is any n-dimensional vector, the homogeneous li Yi) = AW) Yo, Ya) has one and only one n-dimensional vector solution on J. 7.22 The method of successive approximations applied to first-order m near systems Consider a. first-order system of the form (7.66) y= FY), where Fis a given n-dimensional vector-valued. finetion, and Yis an unknown n-dimensional vector-valued function to be determined. We seek a solution Y which satisfies the equation YO = Fl, Yor 228 Systems of differential equations for each ¢ in some interval J and which also satisfies a given initial condition, say Ya) = B, where a € J and B is a given n-dimensional vector, In a manner parallel t0 the linear case, we construct a sequence of successive approxi- mations Yy, ¥,, Ya. . by taking Y,= B and defining Y,,, in terms of ¥, by the recursion formula (7.67) Yeu(x) = B+ i Fit, ¥(Q|dt for k=0, 1,2,.... Under cenain conditions on F, this sequence will converge w a limit function Y which will satisfy the given differential equation and the given initial condition, Before we investigate the convergence of the process we discuss some one-dimensional examples chosen to illustrate some of the difficulties that can arise in practice. wxmeuz 1. Consider the nonlinear initial-value problem y’ = a + y® withy = 0 when x = 0. We shall compute a few approximations to the solution, We choose Yo(x) = 0 and determine the next three approximations as follows: 7 pytt ts dt=—4+-—4+— . + 63 * 2079 * Sosas At is now apparent that a great deal of labor will be needed to compute further approxi- mations. For example, the next two approximations Y, and Y, will be polynomials of degrees 3 1 and 63, respectively. ‘The next example exhibits a further difficulty that can arise in the computation of the successive approximations mowers 2. Consider the nonlinear imbal-value problem y" = 2x +e" , withy = 0 when x = 0. We begin with the initial guess ¥,(x) = 0 and we find ¥@) = fr (t+ l)dtetex, -{* 1 gen yea [ltt J ats eM ae= x! fe a. Here further progress is impeded by the fact that the last integral cannot be evaluated in terms of elementary functions. However, for a given x it is possible to calculate a numerical approximation to the integral and whereby obtain an approximation 1 ¥,(x). Because of the difficulties displayed in the last two examples, the method of successive approximations is sometimes not very useful for the explicit determination of solutions in practice, The real value of the method is its use in establishing existence theorems. 
Proof of an existence-uniqueness theorem for first-order nonlinear systems 229 7.23 Proof of an existence-uniqueness theorem for first-order nonlinear systems We tum now to an existence-uniqueness theorem for firstorder nonlinear systems. By placing suitable restrictions on the function which appears in the righthand member of the differential equation Y'= F(x, Y), we can extend the method of proof used for the linear case in Section 7.21 Let Jdenote the open interval over which we seck a solution, Assume a € Sand let B be a given n-dimensional vector. Let $ denote a set in (n + 1)-space given by S={@,¥)| Iya Sh, |¥—Bi sk), where h > 0 and k > 0. [If n = 1 this is a rectangle with center at (a, B) and with base 2h and altitude 2k.] We assume that the domain of F includes a set S of this type and that F is bounded on S, say (7.68) IIF@, YS M for all (x, Y) in S, where M is a positive constant Next, we assume that the composite function G(x) = F(x, Y(x)) is continuous on the interval (a — h , a + h) foe every function Y which is continuous on (a = h . a + A) and which has the property that (x, Y(x)) € S for all x in (a= hk, @ + A). This assumption guarantees the existence of the integrals that occur in the method of successive approxi- mations, and it also. implies continuity of the functions so constructed. Finally, we assume that F satisfies a condition of the form IFC, Y) = F(x, Z)| $ ALY ~ Zi) for every pair of points (x, Y) and (x, Z) in 8, where A is a positive constant, This is called a Lipschitz condition in honor of Rudolph Lipschitz who first introduced it in 1876. A Lipeehi va fi nd the condition does not ms it enablee ue to as ion very seriouely and from the linear to the nonlinear case. proof of existence and uniguens THEOREM 7,19, EXISTENCE AND UNIQUENESS OF SOLUTIONS TO. FIRST-ORDER NONLINEAR systems. Assume F satisfies the boundedness, continuity, and Lipschitz conditions specified above on a set S. Let I denote the open interval (a = ¢, a +c), where c= min (h, k fit). ‘Then there is one and only one function Y defined on Z with Y(a) = B such that (x, Yor) €S and Y(a) = F(x, Ya) foreach xin 1. Proof. Since the proof is analogous to that for the linear case we sketch only the principal steps. We let Y,(x) = B and define vector-valued functions Y; , Ye, ... on Z by the recursion formula (7.69) Yuu) B+ [Fl Yg(Q] de for m=0,1,2,. 230 Systems of differential equations For the recursion formula to be meaningful we need to know that (x, Y\(x)) € S for each x in 1 This is easily proved by induction on m. When m = 0 we have (x, Ya(x)) = (% B), which is in S, Assume then that (x, ¥\x)) € S for some m and cach x in I. Using (7.69) and (7.68) we obtain Vaud — BL S| HELE YaCOIE at | < ar | ff ar = atte = at Since [x — al < ¢ for x in J, this implies that I Yin) — Bl < Me Sk, which shows that (x, Y(x)) © S for each x in I, Therefore the recursion formula is meaningfial for every m > 0 and every x in 7. The convergence of the sequence {Y(x)} is now established exactly as in Section 7.21 We write HO) We Waal YO} and prove that Y,(x) tends to a limit as k —> oo by proving that the infinite series 28 ¥an6) = YG converges on 1. This is deduced from the inequality MA™ |x — al™ t< MAnc™t (m+! (m+ 1! WTO = FOO S which is proved by induction, using the recursion formula and the Lipschitz condition Y@) = lim Y,x) mem for each x in J and verify that it satisfies the integral equation ¥(x) = B+ f FU, YO] dt, exactly as in the linear case. 
This proves the existence of a solution, The uniqueness may then be proved by the same method used to prove Theorem 7.17, 7.24 Exercises onsider the linear imtial-value problem yi+y =2¢, with y = 1 when x = 0. Frercises 231 (@) Find the exact solution Y of this problem. () Apply the method of successive approximations, starting with the initial guess ¥o(x) = 1 Determine Y,(x) explicitly and show that lim Y\(x) = Y(@) for all real x. Apply the method of successive approximations to the nonlinear initial-value problem with y WO=145+ 35+ Gt I? OM x oe ae 1 2,0) = 4 = Oa ate tig + ae t 366 + 256° 8. Consider the system of equations Qe tz, v= Sry 2, 7 with initial conditions y = 2 and z = 0 when x = 0. Start with the initial guesses Y(x) = Z,(x) = 0, use the method of successive approximations, and determine Y,(x) and Zg(x). 9. Consider the initial-value problem ys xty't dy, with y= 5 and 1 whenx = 0, Charge iis prubieun iy au eyuivaicut problem iuvuiving a system uf owe equations fae owe unknown functions y = Y(x) and z = Z(x), where z Then use the method of successive approximations, starting with initial guesses Y(x) = 5 and Z,(x) = 1, and determine ¥,(x) and Z,(x), 10. Letfbe defined on the rectangle R= [-1, 1} x[-1, 1] as follows: 0 if x=0, dix if x#0and ly a, -2x if x#0and y<-x. (a) Prove that |f(x, y)| < 2 for all (x,y) in R (b) Show thatfdoes not satisfy a Lipschitz’ condition on R. ‘© For each constant C satisfying |C| <1 , show that y = Cx? is a solution of the initial-value problem y’ =/(x, y), with y = 0 when x = 0. Show also that the graph of each of these solutions over (— 1°. 1) lies in R. @ Apply the method of successive approximations to &his initial-value problem, starting with initial guess Y\(x) = 0. Determine Y,(x) and show that the approximations converge to a solution of the problem on the interval (~ 1 , 1). (©) Repeat part (@), starting with initial guess Y,(x) = x. Determine Y.(x) and show that the approximating functions converge 1 a solution different from any of those in part (©). (D Repeat part (), starting with he initial guess Y,(x) = 2x? (g) Repeat part (), starting with the initial guess Y,(x) = x18 4 7.25 Successive approximations and fixed points of operators ‘The basic idea underlying the method of successive approximations can be used not only to establish existence theorems for differential equations but also for many other important problems in analysis. The rest of this chapter reformulates the method of successive Normed linear spaces 233 In the proof of Theorem 7.18 we constructed a sequence of functions {¥,} according 10 the recursion formula Yui) = B+ [AKO at The right-hand member of this formula can be regarded as an operator 7 which converts certain functions Y into new functions 1¥) according 10 the equation m = B+ [° ave ae In the proof of Theorem 7.18 we found that the solution Y of the initial-value problem Y@ = A Y(), Y(a) = B, satisfies the integral equation y= B+ [* avinat. In operator notation this states that Y = 77). In other words, the solution Y remains unaltered by the operator 7. Such a function Y is called a fixed point of the operator T. Many important problems in analysis can be formulated so their solution depends on the existence of a fixed point for some operator, Therefore it is worthwhile to try to discover properties of operators that guarantee the existence of a fixed point. 
We tum now a systematic treatment of this problem, 7.26 Normed Ii To formulate the method of successive approaimations in a general form itis couvenient to work within the framework of linear spaces. Let S be an arbitrary linear space. When wwe speak of approximating one element x in S by another element y in S, we consider the difference x — y , which we call the error of the approximation, To measure the size of this ermor we introduce a norm in the space. jear_ spaces DI INITION OF A NORM. Let S be any linear space. A real-valuedfunction N defined on S is called a norm if it has the following proverties: (@) NO) > 0 for each x in S (b) N(ex) = |e| NG@) for each x in S and each scalar c. (©) NQe+ 9) S No) + NG) for all x and y in & (d) Mex) = 0 implies x = 0. A linear space with a norm assigned 1 it is called a normed linear space. ‘The norm of x is sometimes written |jx\j instead of N(x). In this notation the fundamental properties become : (a) [lx] > 0 for all x in S. (b) lexl] = lel [lxl) for all x in S and all scalars c, (©) lx + yl < |Ix\| + [Lyi] for all x and y in S. 234 Systems of differential equations If the space $ is Buctidean, then it always has a norm which it inherits fom the inner product, namely, 1x] = (x, x). However, we shall be interested in a particular norm which does not arise from an inner product. sxameuz. The max norm. Let C(J) denote the lincar space of real-valued functions continuous on a closed and bounded interval J. If p € C(J), define Nell = max |p(x)|, ae where the symbol on the right stands for the maximum absolute value of g on J. The reader may verify that this norm has the four fundamental properties. The max norm is not derived from an inner product To prove this we show that the max norm violates some property possessed by all inner-product norms. For example, if a norm is derived from an inner product, then the “parallelogram law” ll yi + x — yl? = 2 xl? + 2 Jy holds for all x and y in $, (See Exercise 16 in Section 1.13,) The parallelogram law is not always satisfied by the max norm, For example, let x and y be the functions on the interval [0, 1] given by MN) =6 ye lat. Then we have {ixil= llyl|= lx + yh= lx — yll= 1, 90 the parallelogram law is. violated, 7.27 Contraction operators In this section we consider the normed linear space C(J) of all real functions continuous on a closed hounded interval J in which { gj is the max norm. Consider an operator T: CJ) > Cd) whose domain is C(J) and whose range is a subset of C(J). That is, if g is continuous on J, then T(g) is also continuous on J. The following formulas illustrate a few simple examples of such operators. In each case g is an arbitrary function in CU) and T(g)(x) is defined for each x in J by the formula given: T(g\(x) = 2g(x), where 4 is a fixed real number, TUpy(x) = f° gd) at, where is a given point in J, TN) = b+ [FF oO] at, where b is a constant and the composition /[t, @(t)] is continuous on J. We are interested in those operators T for which the distance | T(@) ~ T(y)| is less than a fixed constant multiple % <1 of |ip — pl . These are called contraction operators; they are defined as follows. Fixed-point theorem for contraction operators 235 DEFINITION OF A CONTRACTION OPERATOR. An operator T: C(J)- C(J) is called a contraction operator if there is a constant x satisfying 0 1. To prove (7.74) we note that [Peed = Pe) = (PIO) — Mr DO) S & Ne — Peal - ‘Therefore the inequality in (7.74) will be proved if we show that (775) Ve = Pall S Moe for every k > 1. 
We now prove (7.75) by induction. For k = 1 we have ||φ_1 − φ_0|| = M, which is the same as (7.75). To prove that (7.75) holds for k + 1 if it holds for k we note that

|φ_{k+1}(x) − φ_k(x)| = |T(φ_k)(x) − T(φ_{k−1})(x)| ≤ α ||φ_k − φ_{k−1}|| ≤ M α^k.

Since this is valid for each x in J we must also have

||φ_{k+1} − φ_k|| ≤ M α^k.

This proves (7.75) by induction. Therefore the series in (7.73) converges uniformly on J. If we let φ(x) denote its sum we have

(7.76)    φ(x) = lim_{n→∞} φ_n(x) = φ_0(x) + Σ_{k=0}^∞ {φ_{k+1}(x) − φ_k(x)}.

The function φ is continuous on J because it is the sum of a uniformly convergent series of continuous functions. To prove that φ is a fixed point of T we compare T(φ) with φ_{n+1} = T(φ_n). Using the contraction property of T we have

(7.77)    |T(φ)(x) − φ_{n+1}(x)| = |T(φ)(x) − T(φ_n)(x)| ≤ α ||φ − φ_n||.

But from (7.72) and (7.76) we find

|φ(x) − φ_n(x)| = |Σ_{k=n}^∞ {φ_{k+1}(x) − φ_k(x)}| ≤ M Σ_{k=n}^∞ α^k,
co the series on the right tends 10 0, 80 @41(X) + Thi). But since Paya) > g(x) as n> &, this proves that g(x) = 7(q)(x) for each x in J Therefore g = T(g), so @ is a fixed point. Finally we prove that the fixed point gy is unique, Let y be another function in C(J such that TYy) = y, Then we have lp -vl= IT) -THI Sale vl This gives us (1 — a) | ¢ — yl < 0. Since a < 1 we may divide by 1 — @ to obtain the inequality |p — ll < 0. But since we also have | ¢ — yl > 0 this means that |p — pl = 0, and hence y — y = 0. The proof of the fixed-point theorem is now complete. + 7.29 Applications of the fixed-point theorem To indicate the broad scope of applications of the fixed point theorem we use it to prove fe, ¥ wnt thenmms The first gives 0 to define y as a funetion of x. fon of the form saFicient condition for an eq THEOREM 7.21, AN IMPLICIT-FUNCTION THEOREM, Let f£ be defined on a rectangular strip R of the form R={((x,y)|agxgb,-w 0, then for each 4 such that 1 (7.85) 4) <——_ M(b — a) there is one and only one function @ in C that satisfies the integral equation (7.83). Proof. We shall prove that T is a contraction operator, Take any two funetions gy and in C and consider the difference Tod) ~ Tex) = 2 |? Kx, Dlg — gal at. Using the inequality (7.84) we may write IT@d(x) = T(p2)@)| < [2] M(B — @) los = Gall = @ Ng = gal, where g = || M(b — a). Because of (7.85) we have 0 < a < 1, so T is a contraction operator with contraction constant a, Therefore T has a unique fixed point g in C. This function g satisfies (7.83). PART 2 NONLINEAR ANALYSIS 8 DIFFERENTIAL CALCULUS OF SCALAR AND VECTOR FIELDS 8.1 Functions from R” to R”. Scalar and vector fields Pan 1 of this volume dealt primarily with Tinear transformations T:V>W from one linear space V into another linear space W. In Pat 2 we drop the requirement that T he linear hut restrict the spaces V and W to be finite-dimensional — shall consider functions with domain in n-space R" and with range in m-space R™, When both 7 and m are equal to 1, such a function is called a real-valued function of a real variable. When a = 1 and m > 1 it is called a vector-valued function of a real variable. Examples of such functions were studied extensively in Volume 1 In this chapter we assume that 2 > 1 and m >I. When m = 1 the function is called a real-valued function of a vector variable or, more briefly, a scalar field, When m > 1 it is called a vector-valued function of a vector variable, or simply a vector field. This chapter extends the concepts of limit, continuity, and derivative to scalar and vector fields. Chapters 10 and 11 extend the concept of the integral. ‘Really, we Notation; Scalars will be denoted by lightfaced type, and vectors by bold-faced type. If f is a scalar field defined at a point x = (x, ... , x,) in RY, the notations ffx) and Sle... 45x) are oth used to denote the value of fat that particular point. Iffis a vector field we write f(x) or fly, ... , x) for the funetion value at x. We shall use the inner product, y= Se and the corresponding norm |x| = (x + x)*, where x = (x,....x,) andy = Ots-++ 5 Dn)> Points in R® are usually denoted by (x, y) instead of (x1, x); points in R3 by (x,y, 9 insteada(4y, 0, %). Scalar and vector fields defined on subsets of R® and R® occur frequently in the appli- cations of mathematics to science and engineering. 
For example, if at each point x of the atmosphere we assign a real number /(x) which represems the temperawure at x, the function 243 244 Differential calculus of scalar and vector fields £ so defined is a scalar field, If we assign a vector which represents the wind velocity at that point, we obtain an example of a vector field. In physical problems dealing with either scalar or vector fields it is important to know how the field changes as we move from one point to another. In the one-dimensional case the derivative is the mathematical tool used to study such changes, Derivative theory in the one-dimensional case deals with fimetions defined on open intervals. To extend the theory ( R* we consider generalizations of open intervals called open sets. 8.2 Open balls and open sets Let @ be a given point in R and let_r be a given positive number. The set of all points x in R® such that |x -al 1. The boundary of $ consists of all x with |x] = 1. 83 Exercises 1, Letf be a scalar field defined on a set $ and let ¢ be a given real number. The set of all points x in S such that f(x) = ¢ is called a level set off. (Geometric and physical problems dealing with level sets will be discussed later in this chapter.) For each of the following scalar fields S is the whole space R". Make a sketch to describe the level sets comesponding to the given values of ¢, @) fix, N= ate © =0,1,4,9. (b) f(x, y= e, peters keane () fx, y) = 608 (x +s =1,0,4,3/2,1 ) f(x, y,2) = x+y +7, c= -1,0,1 (@) fx, , 2) = x8 + By + 32, ¢=0,6,12. © fl, 2) = sin OF + P+ 2), 246 Differential calculus of scalar and vector fields 2. In cach of the following cases, let S be the set of all points (x,y) in the plane satisfying the given inequalities, Make a sketch showing the set S and explain, by a geometric argument, whether or not $ is open. Indicate the boundary of $ on your sketch. @ e+ yc, h) 1 Sx <2and3 0. (©) Ixi< 1 and [yl <1 @) x2y. (@x >Oand y>0 ) x >y. (e) Ix| S$ 1 and lyl $1. (1) y > x* and [x] < 2, () x >and y <0. (im) (? + y? - NG = x8 - y%) > 0. @® xy <1. (o) Ox =a? = DQ + 9? — 2) >0. 3. In cach of the following. let S$ be the set of all points (x. y. 7) in 3-space satisfying the given inequalities and determine whether or not $ is open. (@) # = x2 - yF = 1 >0. (&) Isl 1, (yl <1, and [ej <1 Oxty4z<1 @ Mel SVs [yl <1, and lel <1 ©xtytz0,y>0,2>0. (xt + 4y* + 4c? — 2x + 16y +402 + 113 <0. 4. (a) If A is an open set in n-space and if x € A, show that the set A — {x}, obtained by removing Ue point x fiom A, is open (b) If A is an open interval on the real line and B is a closed subinterval of A, show that A =B is opent (©) If A and B are open intervals on the real line, show that 4 U B and A A B are open. (@ I A is a closed interval on the real line, show that its complement (relative to the whole real line) is open. 5. Prove the following properties of open sets in R: (a) The empty set J is open. (h) R® is open. (©) The union of any collection of open sets is open. (@) The intersection of a Finite collection of open sets is open. (©) Give an example to show that the intersection of an infinite collection of open sets is not necessarily open, Closed sets. A set Sin R® is called closed if its complement R* ~ S is open. The next three exercises discuss properties of closed sets, 6. 
In each of the following cases, let S be the set of all points (x, y) in R® satisfying the given conditions, Make a sketch showing the set Sand give a geometric argument to explain whether S is open, closed, both open and closed, or neither open nor closed. (@) 2 +3220. @1 x? and |x| < 2. (1 R” , where S is a subset of R”. Ifa € R" and bE R™ we write ity (8.1) lim f(x) (01, flx) +b as x a) to mean that (8.2) lim If) ~ 4) = 0. is-al—0 The limit symbol in equation (82) is the usual limit of elementary calculus. In this definition it is not required that? be defined at the point a itself If we write A= x =a, Equation (8.2) becomes lim | f(@+ A) ~ bl) = 0. For points in R® we write (x, y) for x and (a, 6) for a and express the limit relation (8.1) as follows : A function f is said to be continuous at a if f is defined at @ and if lim f(x) = f(a). We say f is continuous on a set S iff is continuous at each point of S. 248 Differential calculus of scalar and vector fields Since these definitions are straightforward extensions of those in the one-dimensional case, it is not suprising (0 leam that many familiar properties of limits and continuity can also be extended. For example, the usual theorems concerning limits and continuity of sums, products, and quotients also hold for scalar fields, For vector fields, quotients are not defined but we have the following theorem conceming sums, multiplication by scalars, inner products, and norms, ramon 81. If lim fix) = band lim g(x) =, then we also have: (a) lim fa) + g@)]= 5+ ¢. (b) lim A(x) = 4b forevery scalar 2. (0) Tim fii goo = hve (@) lim | fll = lol. Proof. We prove only parts (c) and (@); proofs of (a) and (b) are left as exercises for the reader To prove (c) we write f(x) G(X) = B+ = [f(x) = b] [gX) — el + # [glx) ~ el +e I(X) — 4). Now we use the triangle inequality and the Cauchy-Schwarz inequality to obtain 0 < Wflx) - ox) = B+ ell < lflx) = BI lex) — el + bl lig(x) = ell + ell fx) — 5). Since ||flx) — bl] > 0 and |lg(x) ~ c|/ > 0 as x > a, this shows that [[flx) + g(x) — b. |] > as x > a, which proves (©). Taking f(x) = g(x) in part (c) we find Tim f(x)? = Wot? , fiom which we obtain (@. exapze 1. Continuity and components of a vector field. If a vector field f bas values in R™, each function value x) has m components and we can write FR) = GOs Fal) + The fos... fy 200 called components of the vector field f We shall prove that f is continuous at a point if, and only if, each component fi, is continuous at that point Let ¢, denote the kth unit coordinate vector (all the components of e, are O except the kth, which 1s equal to 1). Then f,() 18 given by the dot product IX) = 08) * ee Limits and continuity 249 Therefore, part (c) of Theorem 8.1 shows that each point of continuity of fis also a point of continuity of f,. Moreover, since we have M2) =¥ fades, 2 repeated application of parts (a) and (b) of Theorem 8.1 shows that a point of continuity of all m components fy,. .- + fm is also a point of continuity off. exwee 2. Continuity of the tdenitty function. The identity function, f(x) = x, is continuous everywhere in RY. Therefore “its components are also continuous everywhere in R*, These are the scalar fields given by A) = X Ale) = Xp. a) = sxaete 3. Continuity of linear transformations, Let f: R" — R™ be a linear transforma- tion We will prow that fis contimons at each point a in R™ Ry: lin ry we have fa + Wi) = fla) + fii). Therefore, it suffices to prove that f(A) > 0 as h > 0. 
Writing A in terms of its components we have h= hye + 11+ + fgey- Using linearity again we find fh) = hyfley) + 0° + A,fle,). This shows that fth) > 0 as h > 0. woexs 4. Continuity of polynomials in n variables. A scalar field P defined on R® by a formula of the form PO) = SS tay age is called a polynomial in m variables x1,..., X,- A. polynomial is continuous everywhere in R" because it is a finite sum of products of scalar fields continuous everywhere in R"- For example, a polynomial in two variables x and y, given by aa PO, y) = % Deux’! 18 continuous at every point (x, y) in R% ENAMDLE 5, A coalae field f given hy f (4) = POXIQ() where P and Q are polynomials in the components of x, is called a rational function, A rational function is continuous at each point where Q(x) # 0. af rat Further examples of continuous function can be constructed with the help of the next theorem, which is concerned with continuity of composite functions. 250 Differential calculus of scalar and vector fields ruzonen 8.2. Let f and g be functions such that the composite function fog is defined at a, where (Fe (x) = fls@]- If g is continuous at a and if f is continuous at g(a), then the composition f © g is continuous at a. Proof. Let y= f(x) and © = gia). ‘hen we have Slg(x)| — fle(a)] = fly) — f(b). , P+ bas x—> a, so we have fim 14082] ~Sla@l = im 1/0) — FOI = 0. Therefore lim flg(a)] = fla(a)] , so fe g is continous at a. muvee 6, The foregoing theorem implies continuity of scalar fields h, where h(x, y) is given by formulas such as sin (x*y), log (P+ y'), Xe log [cos (x” + y*)]. xt These examples are continuous at all points at which the functions are defined, The first is continuous at all points in the plane, and the second at all points except the origin, The third is continuous at all points (x, p) at which x + y # 0, and the fourth at all points at which x? + y is not an odd muliple of 2. [The set of (x, y) such that x2 + y® = na/2, n=1,3,5, , is a family of circles cemered at the origin.) These examples show that the discontinuities of a function of two variables may consist of isolated points, entire curves, or families of curves. suweiz 7. A function of two variables may be continuous in each variable separately ad yet be discontinuous ay a fuuetion of the Ovo vatiables wyethet, This is illustialed by the following example: xy. F(x, y) aay if (4,0). f(0,0)= 0 For points (x, y) on the x-axis we have y= 0 and f(x, 9) = f(x, 0) = 0, so the function has the constant value 0 everywhere on the x-axis. Therefore, if we put p = 0 and think of f #8 a function of x alone, fis continuous at x = 0. Similarly, fhas the constant value 0 at all points on the y-axis, so if we put x = 0 and think off as a function of y alone, f is continuous at y = 0. However, as a function of two variables, fis not continuous at the origin. In fact, at each point of the line y = x (except at the origin) the function has the constant value } because ffx, x) = x*/(2x*) = }; since there are points on this line itmtity loss to tho 0 a / @O#h, Exercises 251 8.5. Exercises The exercises in this section are concerned with limits and continuity of scalar fields defined on subsets of the plane, 1. im cach oF the ioilowing exampies a scalar Heidils defined by te given equation for aii poins (x, y) in the plane for which the expression on the right is defined. In each example determine the set of points (x, y) at whichfis continuous. (a) f(x, y) = t+ yh = axty? fa y= (b) fx, y) = log G+ J). () f(x, y) = arctan ie. 
© flrs) =* cost ® fey= ©) fix, cos x, (, y) = ee yy 0° Tea # x % @ fy) = tans. (i) fG, yy = x0 (©) f(x,y) = arctan 2 @) SG, y) = areeos \/x]y 2 If lim f(x, y) = L, and if the one-dimensional limits (en), lim flx,y) and lim f(x, y) a “ both exist, prove that lim (lim f(x, 99] = tim flim f(s y)] = Le eat rb me ‘The two limits in this equation are called iterated limits; the exercise shows that the existence of the (wo-dimensional limit and of the two one-dimensional limits implies the existence and equality of the (wo iterated limits. (The converse is mot always tue. A counter example is given in Exercise 4.) 3. Let f(x, y) = (x — y)/(x + y) if x +y #0. Show that lim flim f(x, y)] = 1 but that lim [lim f(x, y)] e-0 wd u-0 a6 Use this result with Exercise 2 to deduce that f(x, y) does not tend (0 a limit as (x, y) -+ (0, 0) 4. Let whenever x2y? + (x — 9)? # 0. Show that f(x, y= lim [lim f(x, yy] = 0 et yt 0 0 but that f(x, y) does not tend to a limit as (x, y) > (0,0). (Hint: Examincfon the line y = x ,] ‘This example shows that the converse of Exercise 2 is not always cme 262 Differential calculus of scalar and vector fields 5. Let 1 faye fity hee o if ys Show that f(x, y) + 0 as (x, y) + (0, 0) but that iim Cline fs, yO tm ftom fx, 1 roe zd yd Explain why this does not contradict Exercise 2. 6. If (x, ») 4 ©, 0), let f(x, y) = @® — y*/G* + y%), Find the limit of f(x, y) as (x, y) + (0, 0) along the line y= mex. Is it possible to define f(0, 0) so as to makefcontinuous at (0, 0)? 7. Let f(x, y) = 0 ify <0 or ify > x and let f(x, y) = 1 if O< ye xt Show that f(x, y) +0 as (x,y) > (0,0) along any straight line through the origin. Find a curve through the origin along which (except at the origin) f(x, y) has the constant-value 1. Isfeontinuous at the origin’? 8 If f(x, y) = fin G24 YG? y2) when (x,y) % (0) how must f(0, 0) be defined so as to makefcontinuous at “the origin? 9, Letfbe a scalar field continuous at an interior point a of a set Sin R”. If f(a) ¥ 0, prove that dete is au a-ball B(q) in whichfhas dhe same sign as f(a). 8.6 The derivative of a scalar field with respect to a vector This section introduces derivatives of scalar fields. Derivatives of vector fields are dis- cussed in Section 8.18 Let f be a scalar field defined on a set $ in R”, and let @ be an interior point of 8. We wish t0 study how the field changes as we move from a to a nearby point, For example, suppose f(a) is the temperature at a point a in a heated room with an open window. If we move toward the window the temperature will decrease, but if we move toward the heater it will increase, In general, the manner in which a field changes will depend on the direction in which we move away trom @, Suppose we specify this direction by another vector y. That is, suppose we move from a toward another point a + y along the Tine segment joining a and a + y . Bach point on this segment has the form a + hy, where ft is a real number. An example is shown in Figure 83. The distance from a to a + Ay is (hyl| = {Al [yl Since a is an interior point of S, there is an mball B(@; r) lying entirely in S. If h is chosea so that |Al [|p| < 7, the segment from a to a + hy will lie in S. Gee Figure 8.4.) We keep ayhy . Ye - Naan 1 Ficure 8.3. The point a + Ay lies on the line FIGURE 8.4 The point a + Ay lies in the duough @ pooallel oy. ball Bla; 7) if Marl) 0 13. f(x,y) =tanG*)), — y #0. 17. f(x, y) = arecos Jx]y, -y #0. 18. 
Let v(r, 9 = ee", Find a value of the constant such that v satisfies the following equation : w 12 oe" a Bar’ ar)” 19, Given 2 = u¢x, ye" and Mu/( ax 8y) = 0. Find values of the constants a and & such that eo 4 Wry RTE 20, (a) Assume that f’(2c: v) = 0 for every x in some n-ball Bla) and for every vector y. Use the mean-value theorem 10 prove thatfis constant on B(a. (b) Suppose that f"(e; y) = 0 for a fixed vector y and for every x in B(a), What can you conclude about f in this case? 21. A set $ in R® is called convex if for every pair of points a and B in S the line segment from @ to bis also in $; in other words, ta + (1 — 1b € S for each fin the interval 0 <1 < 1 @ Pros xy ball is (b) If f(a; y) = 0 for every x in an open convex set § and for every y in R", prove thatfis constant on S. 22. (a) Prove that there is no scalar fieldfsuch that /"(a; y) > 0 for a fixed vector a and every nonzero vector y. (b) Give an example of a scalar field £ such that f’(x; y) > 0 for a fixed vector y and every vector x, Directional derivatives and continuity 8.10 Directional derivatives and continuity In the one-dimensional theory, existence of the derivative of a function f at a point implies continuity at that point. This is easily proved by choosing an A 3 0 and writing f(a + h)— f(a) h fath-sf@= As h — 0 the right side tends to the limit f(a)» 0 = 0 and hence f(a + h) --fla). This shows that the existence off(@) implies continuity off at a. Suppose we apply the same argument 19 a general scalar field, Assume the derivative f(a; y) exists for some y. Then if h x 0 we can write fla+ ty) ~s@) = fw =f As A — 0 the right side tends to the limit f(a; ¥) +0 = 0 ; hence the existence of f’(a; y) for a given y implies that lim f(a + hy) = fle) for the same y. This means that f(x) ~fla)as x > a along a straight line through a having the direction y. If f(a; y) exists for every vector y, then f(x) > f(a) as x + a along every line through a. This seems t suggest thatfis continuous at a. Surprisingly enough, this conclusion need not be true. The next example describes a scalar field which has a direc- tional derivative in every direction at 0 but which is not continuous at 0. suveiz. Lethe the scalar field defined on R® as follows: x+y fx. ) = if x40, f@y=0. Let a = (0,0) and let y = (a, ) be any vector. If a #0 and h #0 we have La + by) =f@ _ flby) = (0) _ fla, hb) _ _ab*_ h h ho ah + HBT Tetting h + O we find § ; y= ba If y = (0, 6) we find, in a similar way, that S(O; y) = 0. Therefore f"(O; y) exists for all directions y. Also, ix) > 0 as x + 0 along any straight line through the origin, However, at each point of the parabola x = y® (except al the origin) whe functionfhas the value $. Since such points exist arbitrarily close to the origin and since f (0) = 0, the function £ is not continuous at 0. The foregoing example shows that the existence of all directional derivatives at a point fails to imply continuity at that point. For this reason, directional derivatives are a some- ‘what unsatisfactory extension of the one-dimensional concept of derivative, A more suitable generalization exists which implies continuity and, at the same time, permits us to extend the principal theorems of one-dimensional derivative theory to the higher dimensional case, ‘This is called the total derivative. 258 Differential calculus of scalar and vector fields 8.11 The total derivative We recall that in the one-dimensional case a function f with a derivative at a can be approximated near a by a linear Taylor polynomial. 
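Numerically, that approximation property says the error f(a + h) − f(a) − f'(a)h shrinks faster than h itself. A minimal Python check (the particular function f(x) = sin x and the point a = 0.3 are hypothetical choices, not taken from the text):

import math

def f(x):
    return math.sin(x)          # a hypothetical smooth function of one variable

a = 0.3
fprime = math.cos(a)            # f'(a), known in closed form for this choice of f

for h in [0.1, 0.01, 0.001, 0.0001]:
    error = f(a + h) - f(a) - fprime * h    # first-order Taylor error
    print(h, error / h)                     # this ratio tends to 0 as h tends to 0

The printed ratios play the role of the quantity E(a, h) introduced in the formula that follows.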
Iff ‘(a) exists we let E(a, A) denote the difference (8.6) Ela, n= fOt DIO _ pe) if hyo, h and we define E@, 0) = 0. Front (6.6) we obain the formule fla +h) =fia)+ f@h + hE(a,h), an equation which holds also for h = 0. This is the firstorder Taylor formula for approxi mating f(a + h)— fla) by £ a). The error committed is hE(a, h). From (8.6) we see that Efa, h) + 0 as h — 0. Therefore the error hE(a, fh) is of smaller order than A for small h. ‘This propery of approximating a differentiable function by a linear function suggests a way of extending the concept of differentiability t the higher-dimensional case Let f SR be a scalar field defined on a set S in RK”. Let a be au interior point of S, and let Ba; r) be an n-ball lying in S. Let » be a vector with |Je] 0 as | e+ 0. The linear transformation T, is called the total derivative off at a. Note: The total derivative T, is linear transformation, not a number. The function value T,(v) is a real number; it i$ defined for every point » in R*. The total derivative was introduced by W. H. Young in 1908 and by M. Fréchet in 1911 in a more general context Equation (8.7), which holds for | »|| 0, where E(a, v) + 0 as jjelj > 0. In this formula we take v = hy, where h #0 and |h\ jy | O and since [fjJh == 1, the right-hand member of (11) tends to the limit 7) as h > 0. Therefore the left-hand member tends to the same limit, ‘This proves (8.8). Now we use the linearity of 7, to deduce (8.9). If y = (Juy... + Yq) We have y = Dia Yates hence TAY) 1,(Zne)= Snr Syed Svs. 8.12 The gradient of a scalar field Vhe formula in ‘Theorem 8.5, which expresses f’(a; y) as a linear combination of the components of y, can be writen as a dot product, f@y) Df@yx = Vi@ ys where Vf (a) is the vector whose components are the partial derivatives off at a, Vf @=(Df@,...Prfl@). ‘This is called the gradient off The gradient Wf is a vector fick! defined at cach point « where the partial derivatives D,f(a),..., D,f(a) exis. We also write grad f for Vf. The symbol V is pronounced “del.” 260 Differential calculus of scalar and vector fields The firstorder Taylor formula (8.10) can now be written in the form (8.12) fla + 0) = fla) + V/(a) 0411 el Ela, 0), where E(a, t) > 0 as ¥b|| -> 0. In this form it resembles the one-dimensional Taylor formula, with the gradient vector V"(u) playing the role of the derivative f’(a). ability implies comtinviy. Frm the Taylor formtlllla we can easily prove that THEOREM 86. Ifa scalar field f is differentiable at a, then £ is continuous at a. Proof. From (8.12) we have If(@ + 0) —f@| = IV/@)- v + lol] Ela, a). Applying the triangle inequality and the Cauchy-Schwarz inequality we find OS 1s + 2) = fla) SVP) el + Ll LE(@, 1. f(azy) is the component fof -¥f(a) along the unit vector y Ficure 8.5 Geometric relation of the directional derivative to the gradient vector. ‘When y is a unit vector the directional derivative f/(a; y) has a simple geometric relation to the gradient vector, Assume that W/(a) 3 0 and let 6 denote the angle between y and Vyla), Then we have F(a; y) = VI@ + y = Vf(@| ly] cos 6 = IVF(@)| cos 0. This shows that the directional derivative is simply the component of the gradient vector in the direction of y. Figure 85 shows the vectors Vf(a) and y attached to the point a, The Guivative is layest when cos 9 — 1, dat is, when y hay de sane discction ax V/tw). 
In other words, at a given point a, the scalar field undergoes its maximum rate of change in the direction of the gradient vector, moreover, this maximum is equal 0 the length of the gradient vector. When Vf@ 18 orthogonal to y, the directional derivative f"(a; y) is 0. A sufficient condition for differentiability 261 In 2-space the gradient vector is often written as Fl y) 5+ PCy) . ay Vi (x, Y) = In 3-space the comesponding formula is Ofte, Yo 2); + PH 2) + ez) = i 2 he Vex, y, fC Ys 2) Ox ay 8.13 A sufficient condition for differentiability Iffis differentiable at a, then all partial derivatives D,f(a), ... . D,f(a) exist. However, the existence of all these partials does not necessarily imply that f is differentiable at a. A counter example is provided by the function IG) = STH if x#0, f@ y= discussed in Section 8.10. For this function, both partial derivatives Dyf(O) and D,f(O) exist but £ is not continuous at 0, hence f cannot be differentiable at 0. The next theorem shows that the existence of continuous partial derivatives at a point implies differentiability at that point, twzonen = S.7. A SUPFICTEWT © conprTION «FoR = prrverewrranrirty. Assume that thepartial derivatives Dif,..., Daf exist in some n-ball B(a) and are continuous at a. Then f is dif- ferentiable at a. Note: A scalar field satisfying the hypothesis of Theorem 87 is said t0 be continuously differentiable at a. Proof. ‘The only candidate for 7,() is Vf (a) + v . We will show that f(a+)—f@=V/(@) 0+ II | E@,r), where £(@, v) > 0 as | o> 0. This will prove the theorem. Let A= Jol. Then v = Au, where |u| = 1. We keep 4 small enough so that a + v lies in the ball B(a) in which the partial derivatives D,f,..., D,fexist. Expressing # in terms of its components we have Smet + Uneasy where e,...,, are the unit coordinate vectors. Now we write the difference f (@ + ¥) — fla) 2s. telescoping sum, G13) fle +8) fa) = fla+ iw) ~ sto) =Z (sa + i) = flas 20}, 262 Differential calculus of scalar and vectorjelds where %, %1,..., 8, ame any vectors in R" such that % = 0 and v, =u. We choose these vectors $0 they satisfy the recurrence relation % = m1 + Wste . That is, we take » 0, mae, Me MEH Weer, ..., 2, ne + + Men Then the kth term of the sum in (8.13) becomes Sa + Ades + Ayes) — (a+ Adpa) = SOx + Auer) — fbd where , = a+ Av, . The two points b, and b, + Au,e, differ only in their kth component. Therefore we can apply the mean-value theorem of differential calculus t write (8.14) L (By + dues) — F(b:) = Aue Def (Cr) where ¢, lies on the line segment joining B, to b, + Juez. Note that b, > a and hence eas AO. Using (8.14) in (8.13) we obtain S@ t+ -fla= 2X Prflete- < But V/(a) ‘v= AV). w= 2 2, Deflan, so fla + ¥) ~ f) Vfla): v= AS {Dy fled) — De Sla} ny = (0) Ela), where E(a, 0) = 3 {Di S(c) — Dif (@)}ue- a Since ¢, —> a as | vl] > 0, and since each partial derivative D,f is continuous at a, we see that E(a, 8) 0 as | 9] + 0. This completes the proof 8.14 Exercises 1. Find the gradient vector at each point at which it exists for the scalar fields defined by the following equations (a) f(x, y) = 8 + y* sin Gy) (d) f(x, ¥, 2) = x8 = yet 227. (b) f(x, y) =e cos y (©) fl y, 2) = log (x2 + 2)? = 32%), © SQ, 2) = xyz (f) flxy, 2) = 2. Evaluate the directional derivatives of the following scalar fields for the points and directions given (a) f(x,y, 2) = x2 + 2y*+ 32% at (1, 1,0) in the direction of # | + 2k (>) f(x, y, 2) = Gly)” at CL, 1, 1) in the ditection of 2+ J =k 3. 
Find the points (x, y) and the directions for which the directional derivative of f(x, y) = lett yo 4. A differentiable scalar fieldfhas, at the point (1, 2), directional derivatives +2 in the direction toward (2,2) and -2 in the direction toward (1, 1). Determine the gradient vector at (1, 2) and compute the dirvetional derivative in the direction toward (4, 6). 5. Bind values of the constants a, b, and ¢ such that the directional derivative of f(x, ys 2) = axy® + byz + cz*x3 at the point (1,2, = 1) has a maximum value of 64 in a direction parallel 3x24 92 has its largest value, if (x, y) is restricted to. he on the A chain rule for derivatives of scalar fields 263 6, Given a scalar field differentiable at a point gin R®, Suppose that f’(a; y) = 1 andf’(a; z) = 2, where y = 27 + 3j and f(a; xi + yj) = 6 Als 7. Let f andg denote sca properties of the gradient: (a) grad f = 0 iff is constant on S. (b) grad (f + g) = gradf + gradg (©) grad (1) = gradfif ¢ is a constant. (d) grad (fg) = fyradg + g grad. ggradf —fgradg = EBC Sees i © gad z + j. Make a skeich showing the set of all points (x, y) for which calculate the gradient V’(a). jar fields that are differentiable on an open set S, Derive the following at points at which g # 0. 8. In R® let r(x, y, 2) = xi + yf + ah, and let r(x, y, 2)= r(x y, 2) - (a) Show that ‘Vet, (b) Show that V(r") 2) is a unit vector in the direction of r(x. ) ney if n is a positive integer. (Hint: Use Exercise 1(d).] ), (©) Is the formula of part (b) valid when n is a negative integer or zero? (@ Find a sealar field f- such that Uf =r. 9, Assume fis differentiable at each point of an r-ball B(@). IF f “Gx; y) = 0 for a independent veciors J1,... , Yq and for every x in B(a), prove that fis constant on B(a). 10, Assume fis diffcivntiatle at cach point of an a-ball Bia). (a) IF_Vf (%) = 0 for every x in BQu), prove that fis con ant on Bea). (©) If f(x) < £ (a) forall xin BCU), prove that Vfla) = 0. 11. Consider the following six stalements about a scalar field f: S -+ R, where S & R" and ais an interior point of S. @ fis continuous at a, (b) fis differentiable at a. (c) f “a;y) exists for every y in R", (@) All the first-order partial derivatives off exist in a neighborhood of a and are continuous ata. OY @=o0 ©) f(x) = [x = al for all xin R™ Ina table like the one shown here, mark T in the appropriate square if the statement in row (x) always implies the statement in column (y). For example, if (a) always implies e of the first row. The main diagonal has already been (b), mark T the second sq filled in for you, 815 A chain rule for derivatives of scalar fields In one-dimensional derivative theory, the chain rule enables us to compute the derivative of a composite function g(t) =ffr(t)] by the formula f wit vi 264 Differential calculus of scalar and vector fields This section provides an extension of the formula when f is replaced by a scalar field defined on a set in n-space and r is replaced by a vector-valued function of a real variable with values in the domain off. In a later section we further extend the formula to cover the case in which both f and r are vector fields, It is easy to conceive of examples in which the composition of a scalar field and a vector field might arise, For instance, suppose f(x) measures the temperature at a point x of a solid in 3-space, and suppose we wish to know how the temperature changes as the point x varies along a curve C lying in the solid. 
If the curve is described by a vector-valued function r defined on an interval fa, bj, we can introduce a new function g by means of the formula B= fi) if axrsd. This composite function g expresses the temperature as a function of the parameter 1, and its derivative g’(P) measures the rate of change of the temperature along the curve. The following extension of the chain mule enables us t compute the derivative 2%) without determining g(t) explicitly, THboREM 88. CHAIN RULE Let £ bea scalar field defined on an open set S in R", and let r be a vector-valuedfunction which maps an interval J from R into §. Define the composite function g = f © ron J by the equation g= flO] if ted. Let t be @ point in J at which (exists and assume that £ is differentiable at r(t). Then g(t) exists and is equal to the dot product (8.15) g()= Vilar), where a = rn). Proof. Let a = r(t), where ¢ is a point in J at which r'(t) exists. Since S is open there is an mball B(a) lying in S, We take ft 0 but small enough so that r(t + h) lies in Bla), and we let y =r(t +h) = (0). Note that y+ 0as hh 0. Now we have Be + A) — g(t) = Slee + NI-S FOl=S@ + )-S@ Applying the first-order Taylor formula for f we have fa + N-S@ Wi@-y + lyl| Eley), where Ela, y) + 0 as |ly|—> 0. Since y = r(t + h) = a(t) this gives us g(t + s =a) wa) meh part) _ Ea.y), Letting h —» 0 we obtain (8.15). A chain rule for derivatives of scalar fields 265 uveiz 1. Directional derivative along a curve. When the function r describes a curve C, the derivative +’ is the velocity vector (tangent to the curve) and the derivative g’ in Equation (8.15) is the derivative off with respect to the velocity vector, assuming that rv #0. If T(t) is a unit vector in the direction of r’() (Tis the unit tangent vector), the dot direction of C. For a plane curve we can write T(t) = cos a(t) i+ cos B()j, where (7) and 6(f) we the angles made by the vector T(t) and the positive x and y-axes; the directional derivative Off along C becomes Vf t2(y1 ‘TH =D, f[r(t)} cos a(t) + Daf {r(1)]c0s BE) This formula is often written more briefly as ¢ oF vf: T= cosa 5, 0s 8. Some authors write dflds for the directional derivative Vf + T. Since the directional derivative along C is defined in terms of 7, its value depends on the parametric representa- tion chosen for C. A change of the representation could reverse the direction of 7} this, in tum, would reverse the sign of the directional derivative. suet 2, Find the directional derivative of the scalar field f(x, y) = x? — 3xy along the parabola y = x? = x + 2 at the point (1,2). : Solution. At an arbitrary point (x, y) the gradient vector is We.» - 214 7- oe ays - a, ax! * ay At the point (1,2) we have V/(l, 2) = -4i — 3j. The parabola can be represented parametrically by the vector equation r(f) = fi + (18 — 1 + 2). Therefore, r(I) = i + 3j, r' =i + (2t = lj, and =*(1) =i +}. For this representation of C the unit tangent vector T(1) is (i + j)/V2_ and the required directional derivative is WI ,2) + TU) = —7/V2. mwez 3. Let £ be a nonconstant scalar field, differentiable everywhere in the plane, hhaving a tangent at each of its points. Prove that f has the’ following properties at each point of C: (@) The gradient vector Vf is normal 0 C (b) The directional derivative Off is zero along C. (©) The directional derivative off has its largest value in a direction normal 10 C. VG 266 Differential calculus of scalar and vector fields Solution, 1 T is a unit tangent vector to C, the directional derivative off along C is the dot product Vf- T. 
This product is zero if ∇f is perpendicular to T, and it has its largest value if ∇f is parallel to T. Therefore both statements (b) and (c) are consequences of (a). To prove (a), consider any plane curve Γ with a vector equation of the form r(t) = X(t)i + Y(t)j and introduce the function g(t) = f[r(t)]. When Γ = C, the function g has the constant value c, so g'(t) = 0 if r(t) ∈ C. Since g' = ∇f · r', this shows that ∇f is perpendicular to r' on C; hence ∇f is normal to C.

8.16 Applications to geometry. Level sets. Tangent planes

The chain rule can be used to deduce geometric properties of the gradient vector. Let f be a scalar field defined on a set S in R^n and consider those points x in S for which f(x) has a constant value, say f(x) = c. Denote this set by L(c), so that

L(c) = {x | x ∈ S and f(x) = c}.

The set L(c) is called a level set of f. In R^2, L(c) is called a level curve; in R^3 it is called a level surface.

FIGURE 8.6 The dotted curves are isothermals: f(x, y) = c. The gradient vector ∇f points in the direction of the lines of flow.
FIGURE 8.7 The gradient vector ∇f is normal to each curve Γ on the level surface f(x, y, z) = c.

Families of level sets occur in many physical applications. For example, if f(x, y) represents temperature at (x, y), the level curves of f (curves of constant temperature) are called isothermals. The flow of heat takes place in the direction of most rapid change in temperature. As was shown in Example 3 of the foregoing section, this direction is normal to the isothermals. Hence, in a thin flat sheet the flow of heat is along a family of curves
Therefore the tangent plane to the level surface L(o) at @ consists of all x in R® satisfying Vila): (x — a) =0 To obtain a Cartesian equation for this plane we express x, a, and V'(u) in terms of their components. Writing x = (x, ys 2), a= (modi, 2), and Va) = Di flai+ Dsfl@j+ Mof(@k , we obtain the Cartesian equation D:f@(x = x) + Diflaly = yx) + Dif@(E— 4) = 0. A similar discussion applies to scalar fields defined in R®, In Example 3 of the foregoing section we proved that the gradient vector V/(a) at a point a of a level curve is perpendicular to the tangent vector of the curve at a, Therefore the tangent line of the level curve L(e) at a point a = (x, ,y,) has the Cartesian equation Difax — 1) + Dif @Y= I) = 8.17 Exercis 1. In this exercise you may assume the existence and continuity of all derivatives under con- sideration, ‘The equations w= f(x, 9). x = X(W, y= Yo define w as a function of f, say u= F(t) (@) Use the chain rule 0 show that ry. Lea. Ly FO= 9) yO where f]@x and aly are to be evaluated at [X(), YO (b) In a similar way, express the second derivative F'() in terms of derivatives off, X, and ¥. Remember that the partial derivatives in the formula of part (a) are composite functions given by of a ZF Logue ven. 2 ~ ngpten. ven a 7 SOKO, ay ~ Pe bp Refer to Exercise 1 and compute F(t) and FQ) in terms of + for each of the following special cases @) fix,y)= 8+ FX = 4 Y=. (b) flx, y) = AY cos Cx"), XO = cos #. YO) (c) f (% y) = log (1+ eA +e"), XO Derivatives of vector fields 269 3. In each case, evaluate the directional derivative off for the points and directions specified: (a) fx, y, z) = 3x ~ 5y + 2z at (2,2, 1 in the direction of the outward normal to the sphere @ fo ea (b) f(x, y, 2) = a = y® ar a general point of the surface x? + y? + 28 = 4 in the direotion of the outward normal at that point (© fl, yr 2)= 22+ y= 22 a G4, 5) along the curve of intersection of the wo surfaces 2x! + 2y? — 2? = 25 and x? + y? =z? 4G) Find a vector V(x, y. 2) normal to the 3 Vee P+ e+ yy at a general point (x, y, z) of the surface, (x, y. 2) #0, 0,0). (b) Find the cosine of the angle 0 between V(x, ys 2) and the z-axis and determine the limit of cos 0 as (%, y, 2) -> (0, 0, 0). 5. The two equations e* cos v = x and e* sin v = y define w and v as functions of x and y, say = Ulx, y) and v = V(x, y). Find explicit formulas for U(x, y) and V(x, y), valid for x > 0, and show that the gradient vectors VU(x, y) and V V(x, y) are perpendicular at each point &,)). / . Lxifix, ye yixyl. (@) Verity that 8ff@x and af]y are both zero at the origin (b) Does the surface z = f(x, y) have a tangent plane at the origin? [Hit section of the surface made by the plane x = y ,] 7. If (Xp, Yoo 2) i8 & point on the surface z= xy , then the two lines z= Yor, ¥ = Yo and z= xy, X = %q intersect at (Xqy Yor Za) and lic on the surface. Verify that the tangent plane to this surface at the point (3p, Jo. Z9) contains these two lines. 8. Find a Cartesian equation for the tangent plane to the surface xyz = (Xo. Yo + Zo). Show that the volume of the tetrahedron bounded by coordinate plane is 94/2. 9. Find a pair of linear Canesian equations for the line which is tangent to both the surfaces xt 4)? 4222 = dand 7 = e*¥ al the point (1, 1, 1) 10. Find & constant ¢ such that at any point of intersection of the two spheres Consider the at a general point plane and the three (eeote ye t2t= 3 and HGR D+ 21 the coresnonding tangent planes will be perpendicular to each other. 
IL If ry and ry denote the distances from a point (x, y) on an ellipse co its foci, show that the equation 7 + ry = constant (satisfied by these distances) implies the relation T-Vi+ m4) =0, where T is the unit tangent 10 the curve. Interpret this result geometrically, thereby showing that the tangent makes equal angles with the lines joining (x, y) to the foci 12. I VfCx, y, 2) is always parallel to xi + yj + 2k, show thatfinust assume equal values at the points (0, 0, a) and (0, 0, 8.18 Derivatives of vector fields Derivative theory for vector fields is a straightforward extension of that for scalar fields Let f: S > R” be a vector field defined on a subset S of R® If a is an interior point of S o diffe: and if y is any vector in R" we define the derivative /"(a; y) by the formula mL +) = S@ noo » f(a; y) whenever the limit exists, The derivativef(a; y) is a vector in R” Let f, denole the ki component off We now unt the derivative f"(a; y) exists if and only if f,(a3 y) exists for each & = 1, 2,..., m, in which case we have N= (Haid fala Ma SMe, where é, is the kth unit coordinate vector. We say that f is differentiable at an interior point a if there is a linear transformation tT: R" such that (8.16) f(a + v) = f(a)+ Tv) + lem Ela, 2), where E(@, a) -» 0 as 0 > 0. The firstorder Taylor formula (8.16) is to hdd for all 8 with ll? | 0. The term E(a, ») is a vector in R" . The linear trans- formation 7, is calied the totai derivative off a a. For scalar fields we proved that 7,(y) is the dot product of the gradient vector V/(a) withy. For veetor fields we will prove that 7,(y) is a vector whose kth component is the dot product Vf(a) ..¥ ior 89. Assume f is differentiable at a with total derivative T,. Then the derivative Ff (a; y) exists for every ain R*, and we have (8.17) Ty) =1'(a39) Moreover, ifff=(fi y-.-y f)andify=(,..., %»), we have (18) Ty) -SVf ay eon (CHa... Vha(@)"9) Proof We argue as in the scalar case, Ify = 0, then f"(a;y) = Oand 7,(O) = 0. Therefore we can assume that y # 0, Taking ¢ = Ay in the Taylor formula (8.16) we have Slat hy) flu) = Talhy) + ihyl| Ela, 0) = AT Cy) + \hi Lyi Ela 0). Dividing by A and letting h -» 0 we obtain (8.17). To prove (8.18) we simply note that Fain =E fais) o=SVhlarver Differentiability implies continuity am Equation (8.18) can also be written more simply as a matrix product, Ty) = Dflay . where ffa) is the m xf matrix whose kth row is Vfi(@), and where y is regarded as an nx 1 column matrix. The matrix Df (a) is called the Jacobian matrix off ata. Its kj entry is the partial derivative D,f,(@). Thus, we have “Difi@) Def(a) + Dy f(a)” Di fila) Dz fala) A) Df(a) = . Pr fala) Pofn(@) °° Da fala). The Jacobian matrix Df(a) is defined at each point where the mn partial derivatives Difiha) exist. The total derivative 71, is also written as f ‘(a). The derivative f(a) is a linear trans- formation; the Jacobian Dffa) is a matrix representation for this uansformation, The first-order Taylor formula takes the form (8.19) fla + ») =f(a) +f ‘(a)(v) + |e] Ela, 2), where E(a, 2) — 0 as v -> 0. This resembles the one-dimensional Taylor formula, To compute the components of the vector f '(a)(x) we can use the matrix product Dfla)y or formula (8.18) of Theorem 8.9. 819 Differentiability implies continuity swsonn 8.10. If avector field f is differentiable at a, then fis continuous at a. Proof. As in the scalar case, we use the Taylor formula to prove this theorem, If we let p> 0 in (8.19) the emor term | v! Ea, 2) > 0. 
The linear pat f (au) also tends 10 0 because linear transformations are continuous at 0, This completes the proof. At this point it is convenient to derive an inequality which will be used in the proof of the chain rule in the next section, The inequality concems a vector field differentiable at & it states that (8.20) WFC) < Mfa) ol. where M,(a) = > IV/(a)! . a To prove this we use Equation (8.18) along with the tangle inequality and the Cauchy- Schwarz. inequality to obtain IPO = 272 Differential calculus of scalar and vector fields 8.20 The chain rule for derivatives of vector fields ‘THEOREM 8.11, CHAIN RULE. Let f and g be vector fields such that the composition hz fog is defined ina neighhorhond of a point a Assume that g is differentiable at a, with total derivative g'(a). Let b = gla) and assume that f is differentiable at b, with total derivative £ '(b). Then his differentiable at a, and the total derivative h'(a) is given by # (a) =f (b)°8'(@), the composition of the linear transformations f ‘(b) and g(a). Proof We consider the difference h(a + y) — h(a) for small \\y |}, and show that we have a firstorder Taylor formula. From the definition of t we have (8.21) Ka t+ y)— Ma) =flgat yl— £ le@l= fb +2) f), where » = g(a + y) — g(a). The Taylor formula for g(a + y) gives us (8.22) = g'@Q) +I Ea. y), where Ea, y)> 0 as y- 0. ‘The Taylor formula for £ (> + 0) gives us (8.23) £ b+ 0) — Flo) = £'(A(v) +I) e rr ES, dv), where E,(b, 2) > 0 as p> 0. Using (8.22) in (8.23) we obtain 824) f(b + 2) —2(m) =F’) +2 'ONIVLE (@, y)) + HH ]E(6) = fg’ (aly) + \ly!| E@ y), where E(a, 0) = 0 and (8.95) Bion it yao (8.95) Exo) if ye O. To complete the proof we need to show that E(a, y) > 0 as y + 0. ‘The first term on the right of (8.25) tends 10 0 as y — 0 because E,(a, y) +0 as y +0 and linear transformations are continuous at 0. In the second term on the right of (8.25) the factor £,(6, v) —> 0 because v > 0 as y—> 0. The quotient ell/ilyl| remains bounded because, by (8.22) and (8.20) we have lol < M@ lvl + tyil LEC, »)h- Therefore both terms on the right of (8.25) tend wo 0 as ¥ + O, so Ela, y) > 0, ‘Thus, from (8.24) and (8.21) we obtain the Taylor formula Ha + y)~ Wa) = £'@s'(@y) + ly! El@.y), Matrix form of the chain rule 213 where E(a, y) > 0 as y —> 0, This proves that A is differentiable at a and that the total derivative A’(a) is equal to the composition f’(b) > g°(u) 8.21 Matrix form of the chain rule Lot k= f > g., where g is differentiable at a andfis differentiable at 6 = g(u). ‘The chain rule states that h@) = FO) © gq. i DEa), DEB), and Dela) which represent the linear transformations h'(a), f(b), and g’(w), respectively. Since composition of linear transformations conesponds to multiplication of their matrices, we obtain (8.26) Dh(a) = Df(b) Dela). where b = g(w) This is called the matrix form of the chain rule. It can also be written as a set of scalar equations by expressing each mawix in terms of its entries. Suppose that a € R?, 6 = gu) € R”, and fb) ER", Then hia) € R” and we can write B= (Breads Le Sivee sofas = Oe oes ti Then Dh{a) is an m x p matrix, Df(b) is an m x n matrix, and Dg(a) is ann x p matrix, given by Dwg) = (Diba, Df() = [D.AOin Dgla) = [D;(@)Ikiia The matrix equation (8.26) is equivalent to mp scalar equations, D,hAa) = S DiS()V 90), for i=1,2,...,m and fj fi These equations express the partial derivatives of the components of A in tems of the partial derivatives of the components off and g. sxvetz 1. Extended chain rule for scalar fields. 
Suppose f is a scalar field (m = 1). ‘Then h is also a scalar field and there are p equations in the chain rule, one for each of the partial derivatives. of Ft D,ha)= SD. f(bD.2la), for §= 1.2.0... Be ‘The special case p = 1 was already considered in Section 8.15. In this case we get only one equation, ni(a) = 3 D.f(B)ei(a). a 274 Differential calculus of scalar and vector fields Now take p = 2 andn=2. Write a = (s, f) and b = (x, ¥) . Then the components x and y are related to s and ¢ by the equations $i(8, 1), Y= gals). ‘The chain rule gives a pair of equations for the partial derivatives of A: Dh(s, t) = Dy f(x, y) Digls, t) + Def(%, ¥) Digals, t) D4h(S, 1) = Dif, Y) Dagils, 1) + Daf(% ¥) Dagelss In the ¬ation, this pair of equations is usually written as th _ Hx , Yay Hh _ A ax | Hav 21 = : : 627 ds Oxds (ay as a dxdt dyat wmere 2 Polar coordinates. The temperature of a thin plate is described hy a. scalar field f, the temperature at (x, y) being f(x, y). Polar coordinates x = rcos 6, y=r sin 8 fare introduced, and the temperamare becomes a function of r and @ determined by the equation or, 9) = f(r cos 8, r sin 9), Express the partial derivatives @g/@r and Ag/26 in terms of the partial derivatives Off@x and f/ay. Solution. We use the chain rule as expressed in Equation (8.27), writing (r, 9) instead of (, 0, and g instead of A. The equations X =rcosé, y=r sind give ws (8.28) se =, 8 O+ fain 4, A sinb +r Zoos. These are the required formulas for Gp/@r and d¢/@6. nmeie 8. Second-order partial derivatives. Refer to Example 2 and express the second- order partial derivative @%%/26® in tcrms of partial derivatives off Solution. We begin with the formula for 6/89 in (8.28) and differentiate with respect to J, treating r as a constant, There are two terms on the right, each of which must be Exercises differentiated as a product, Thus we have Fe _ _, Hf Asin 8) _ ono (2) + Af Alcos ®) nso (2) 2 aolax) ty aolay = =r eos L rain 3 (2) =r nods rcos 03 (2). To compute the derivatives of f/x and A//@y with respect to 9 we must keep in mind that, as functions of r and 0, Af/@x and ay are composite functions given by | |. . ax ay Therefore, their derivatives with respect to @ must be determined by use of the chain rule We again use (8.27), withfreplaced by Dif, to obtain 6 (A) _ ADS) ax + AD,S) ey _ FF - +o aolan) = me fs oe aaa Similarly, using (827) withfeplaced by D,f, we find a (2) = Aah) Ox + ADsf) ay __ af solar ax 80 ay 00° ax ay When these formulas are used in (8.29) we obtain (- rin) + 2 (rc0s . ot = ~reosoZ «A ? sin? oe - ? as ay ay 6. buoy th 8 Oe This is the required formula for 6%¢/06?. Analogous formulas for the second-order partial derivatives 02¢/8, d%p/(@r 29), and G—/(89 @r) are requested in Exercise 5 of the next section. 8.22 Baercises Jn these exercises you may assume differentizbility of all functions under consideration. 1, The substitution t= g(x, y) converts F(t) into f (x, y), where f(x, y) = Flg(x, y)] (a) Show that Yo & of pl k's x Figs, Dl gz and ay 7 F BoM; (b) Consider the special case F(t) = e™ #, g(x, y) = cos (x* + 9°). Compute af/2x and aflay by use of the formulas in part (a). To check your result, deteimine f (x, y) explicitly in terme of x and y and compute af] ay and af) Ay dieoelly from f Differential calculus of scalar and vector fields 10, 1 ‘The substitution w= (x — y)/2,v = (+ y)/2 changes (u, v) into F(x, y). Use an appropriate hain rule to express the partial derivatives @F] @x and @F] ay in terms of the pastial s aff @u and af av. 
‘The equations W =fix, y), x = X(s, 0, y = Ys, 1) define uw as a function of sand t, say w= FG, 0. (a) Use an appropriate form of the chain rule to express the partial deriv aF/at interns of afjax, affay, eX/2s, aX/ér, a¥/as, O¥/ar. (oy 83f1( ax ay) = OMCy ax), Show that form of the deriv ives OF/@s and ar a 9s 2 ax oy a ae yey ee erieyy as 5 ax dy * ay OF +S) (©) Find similar formulas for the partial derivatives @F/( 8s at) and @*F) at? Solve Exervise 3 in (a) X(s,t) = $4 t, (b) X(s, t) = st, (s 7 © XG, N= (=/2, WG, 1) r cos Band The intmdnetion of polar coordinates changes f(x, y) into ofr, A), where x y =r sin 6. Express the second-order partial derivatives @g/r2, dy/( r 26), and @q/( 26 ar) {in terms of the partial derivatives off. You may use the formulas derived in Example 2 of ection 8.21. he equations «= f(x,y 2h x = XG,5, yy = Mn 80, and z= Zr, 5,1), define was a function off, s, and t, say w= F(r, s, #) . Use an appropriate form of the chain rule to express sie pans Gexivatives OF] Or, OF7 as, aah OF) OF in cans oF passa Solve Exercise 6 in each of the following special cases @) X48, )=r+s+t Wns Jar w+, Ln s0= 24 smb () X0,,02 74+ 8+, Y0,s02P-8-7, Zrsy92 hese ‘The equations w= f(x, y, 2) x= X(s, 0, y= Ys, , 2= Zs, #) define u as a fnetion of § and £, say U= F(s, 8. Use an appropriate form of the chain rule to express the partial de- rivatives 8F] @ and QF] at in terms of partial derivatives off, X, Y, and Z. Solve Exercise 8 in each of the following special cases: @Xs,y=84+0, ¥5)=F=0, Zl, 0 = 2 ) X65, H= s+ Yad=s—t 26,9 = st ‘The equations w= f (x, y),x= XO 8.0), y= Mrs 5, t) define was a function of rs, and t, say = Fir, s. 0. Use an appropriate form of the chain rule to express the partial derivatives @Fl@r, GF] as, and @F/@t in terms of partial derivatives off, X, and Y. Solve Exercise 10 in each of the following special cases: (@) X0ysth=r +s, Venn 0 = (b)X(r,s,th=rtstt, Yn d= re P+ OXGs, ters, Yr,so= st Let h(x) = flea}, where 8 (fi, ..., Sa) i @ vector field differentiable at a, and fis a scalar field differentiable at 6 = g(a). Use the chain rule to show that the gradient of It can be expressed as a linear combination of the gradient vectors of the components of $+ as follows Ta) - 3 Dyf @Vgnla), ) If f(x,y, 2) = xi + y+ zk, prove that the Jacobian matrix Df(x, y, 2) is the identity matrix of order 3. ‘Sufficient conditions for the equality of mixed partial derivatives 277 (b) Find all differentiable vector ficldsf: R? + R° for which the Jacobian matrix Df(x, yz) is the identity matrix of onder 3. (©) Find all differentiable vector fields; R® + R39 for which the Jacobian matrix is a diagonal matrix of the form diag (p(x), g(y), n~2)), wherep, g, and r are given continuous functions. 14, Let f2 R? > R® and g: R® + R? be two vector fields defined as follows: fix, y) = eH + sin (y+ Wj, glu, vw) = (u + 2ot + Bw) E+ Qn — WJ. (a) Compute each of the Jacobian matrices Df(x, y) and Dg(u, v, w). (b) Compute the composition A(u, vy w) = fIgti, ¥ Ww). (©) Compute the Jacobian matrix DA, ~1, 1). 15. Let f: R® + R® and g: R® + R® be two vector fields defined as follows: S (x,y,z) = GP ty + Zi + Ox ty + 270, But, 0, w) = uo Wet owe sin vj + werk, (@) Compute each of the Jacobian matrices Df(x, y, 2) and Deu, % w). (b) Compute the composition A(u, u, w) = flg(u, 8, w)] (©) Compute the Jacobian matrix DA(u, 0, w). 
8.23 Sufficient conditions for the equality of mixed partial derivatives

If f is a real-valued function of two variables, the two mixed partial derivatives D_{1,2}f and D_{2,1}f are not necessarily equal. By D_{1,2}f we mean D_1(D_2 f) = \partial^2 f/(\partial x\,\partial y), and by D_{2,1}f we mean D_2(D_1 f) = \partial^2 f/(\partial y\,\partial x). For example, if f is defined by the equations

    f(x, y) = \frac{xy(x^2 - y^2)}{x^2 + y^2}   for (x, y) \ne (0, 0),        f(0, 0) = 0,

it is easy to prove that D_{2,1}f(0, 0) = -1 and D_{1,2}f(0, 0) = 1. This may be seen as follows: The definition of D_{2,1}f(0, 0) states that

(8.30)    D_{2,1}f(0, 0) = \lim_{k \to 0} \frac{D_1 f(0, k) - D_1 f(0, 0)}{k}.

Now we have

    D_1 f(0, 0) = \lim_{h \to 0} \frac{f(h, 0) - f(0, 0)}{h} = 0

and, if (x, y) \ne (0, 0), we find

    D_1 f(x, y) = \frac{y(x^4 + 4x^2 y^2 - y^4)}{(x^2 + y^2)^2}.

Therefore, if k \ne 0 we have D_1 f(0, k) = -k^5/k^4 = -k and hence

    \frac{D_1 f(0, k) - D_1 f(0, 0)}{k} = -1.

Using this in (8.30) we find that D_{2,1}f(0, 0) = -1. A similar argument shows that D_{1,2}f(0, 0) = 1, and hence D_{2,1}f(0, 0) \ne D_{1,2}f(0, 0).

In the example just treated the two mixed partials D_{1,2}f and D_{2,1}f are not both continuous at the origin. It can be shown that the two mixed partials are equal at a point (a, b) if at least one of them is continuous in a neighborhood of the point. We shall prove first that they are equal if both are continuous. More precisely, we have the following theorem.

THEOREM 8.12. A SUFFICIENT CONDITION FOR EQUALITY OF MIXED PARTIAL DERIVATIVES. Assume f is a scalar field such that the partial derivatives D_1 f, D_2 f, D_{1,2}f, and D_{2,1}f exist on an open set S. If (a, b) is a point in S at which both D_{1,2}f and D_{2,1}f are continuous, we have

(8.31)    D_{1,2}f(a, b) = D_{2,1}f(a, b).

Proof. Choose nonzero h and k such that the rectangle R(h, k) with vertices (a, b), (a + h, b), (a + h, b + k), and (a, b + k) lies in S. (An example is shown in Figure 8.9.)

[Figure 8.9  The rectangle R(h, k); \Delta(h, k) is a combination of values of f at the four vertices (a, b), (a + h, b), (a + h, b + k), and (a, b + k).]

Consider the expression

    \Delta(h, k) = f(a + h, b + k) - f(a + h, b) - f(a, b + k) + f(a, b).

This is a combination of the values of f at the vertices of R(h, k), taken with the algebraic signs indicated in Figure 8.9. We shall express \Delta(h, k) in terms of D_{2,1}f and also in terms of D_{1,2}f. We consider a new function G of one variable defined by the equation

    G(x) = f(x, b + k) - f(x, b)

for all x between a and a + h. (Geometrically, we are considering the values of f at those points at which an arbitrary vertical line cuts the horizontal edges of R(h, k).) Then we have

(8.32)    \Delta(h, k) = G(a + h) - G(a).

Applying the one-dimensional mean-value theorem to the right-hand member of (8.32) we obtain G(a + h) - G(a) = h\,G'(x_1), where x_1 lies between a and a + h. Since G'(x) = D_1 f(x, b + k) - D_1 f(x, b), Equation (8.32) becomes

(8.33)    \Delta(h, k) = h[D_1 f(x_1, b + k) - D_1 f(x_1, b)].

Applying the mean-value theorem to the right-hand member of (8.33) we obtain

(8.34)    \Delta(h, k) = hk\, D_{2,1}f(x_1, y_1),

where y_1 lies between b and b + k. The point (x_1, y_1) lies somewhere in the rectangle R(h, k). Applying the same procedure to the function H(y) = f(a + h, y) - f(a, y) we find a second expression for \Delta(h, k), namely,

(8.35)    \Delta(h, k) = hk\, D_{1,2}f(x_2, y_2),

where (x_2, y_2) also lies in R(h, k). Equating the two expressions for \Delta(h, k) and cancelling hk we obtain

    D_{2,1}f(x_1, y_1) = D_{1,2}f(x_2, y_2).

Now we let (h, k) \to (0, 0) and use the continuity of D_{1,2}f and D_{2,1}f to obtain (8.31).

The foregoing argument can be modified to prove a stronger version of Theorem 8.12.

THEOREM 8.13. Let f be a scalar field such that the partial derivatives D_1 f, D_2 f, and D_{2,1}f exist on an open set S containing (a, b).
Assume further that D, yf is continuous on S. Then the derivative D»f (a, b) exists and we have Dish (a,b) = Daf (a, 8). Proof. We define A(h, k) as in the proof of Theorem 8.12. ‘The part of the proof leading to Equation (8.34) is still valid, giving us 8.36) for some (x, y;) in the rectangle R(h, k). The rest of the proof is not applicable since it requires the existence of the derivative D, 2 f(a, b), which we now wish to prove. ‘The definition of D, f(a, ) states that = tn Pof(a t hb) — Daf (a, b) (8.37) Dy f(a, b) = io . 280 Differential calculus of scalar and vector fields We are to prove that this limit exists and has the value Dz, (a, b). From the definition of Dif we D,flab) = tigi @eP +8 = fe. and 1 oy = tim Ze bikin far hy ‘Therefore the difference quotient in (8.37) can be written as Dof(a + hy b) ~ Def(a, b) _ 5... AUK) h eo hk Using (8.36) we can rewrite this as D, . (8:38) Pefla + 4 fla.) Him Daf). ‘To complete the proof we must show that 839) lima [tim DaafCss»¥)] = DasSlo, 8). aso Lise When > 0, the point y,—> B, but the behavior of 4 as a function of & is unknown. i ust the continsty af cay Zac b= O, we could Desf to deduce that lim Def Gs yi) = DasSG ). ko Since the limit # would have to lie in the interval a < 0, y > 0. Compute af ax in terms of x and y. 5. Assume that the equations w= f(x, J). X = Xi, y = Yu) define w as a function of 4, say u = F(t). Compute the third derivative F”yt in terms of derivatives of£, X, and Y. 6, ‘The change of variables x= u+o, y= wot tanstorms f(x, y) imo gu, v). Compute the value of @g/(2 au) at the point at which w= 1, v= 1, given that Gf _*f_ fet ay xt yt ax dy By ax ‘al tat point. 282 Differential calculus of scalar and vector fields 7. ‘The change of variables x = a, y = $(u# =v) transforms f(x, ¥) 00 g(u, 0). (a) Caleulate ag/au, ag], and @y)(2u 2) in terms of partial derivatives of f. (You may assume equality” of mixed panials.) (b) If UfCx, y)I? = 2 for all x and y, determine constants a and 6 such that ay (a (zi) - (2) = +o, 8. Two functions F and G of one variable and a function 2 of two variables are related by the equation IRs) + GNP ee") = 2F'LAG'Y) whenever Fx) + Gly) # 0. Show that the mixed pantial derivative Dz z(x, y) is never zero. (You may assume the existence and continuity of all derivatives encountered.) 9. A scalar field f is bounded and continuous on a rectangle R =a, b] = le, dl . A new scalar field g is defined on R as follows gu d= [LP yar] ay (a) It can be shown that for each fixed W in [a, bf the function A defined on fe, d] by the equation A(y) = f% f(x,y) dx is continuous on fe, dj. Use this fact to prove that ag/ av exists and is continuous on the open rectangle $= (a, b) x (c, d) (the imerior of R). (b) Assume that I [[trn va] ae fY[[se.nay]es for all (u, v) in R. Prove that g is differentiable on Sand that the mixed partial derivatives Dy 2g(u, 0) and Dp.rg(u, v) exist and are equal to f(u, ¥) at each point of S. 10. Refer to Exercise 9. Suppose u and v are expr v= BO) j and let g(t) = glA(), BON] (2) Detemnine qi) in terms off, A’, and B’ &) Compute pf in terms of ¢ when f(x, y) = eFY and aw = BW = jin the first quadrant.) LL. If f(x,y, 2) = (Fx A). (Px B), where r= xi + yj + 2k and A and B are constant vectors, show that V/(x,), 2) = Bx (rx A) +A x (r x B) 12. Let r =axi $ yj + 2k and let r= |r|). If A and B are constant vectors, show that sed parametrically ae follows: u — A), 2 = (Assume B lies (a) Av 0 1 34-7Ber AB (b) B (4 (; oo a a 13. 
Find the set of all points (a, b, ©) in 3-space for which the wo spheres (x ~ a)? + (y = ) + ( —%-1 and x? + yp? 422 — 1 intersect orthogonally. (Their tangent planes showld be perpendicular at each point of intersection.) 14. A cylinder whose equation is y z (x) is tangent to the surface z* + 2xz + y = 0 at all fo. polnis commion wv the tru sustaccs, 9 APPLICATIONS OF DIFFERENTIAL CALCULUS 9.1 Partial differential cquations The theorems of differential calculus developed in Chapter 8 have a wide variety of applications This chapter illustrates their axe in came examples mmlated to partial differen. tial equations, implicit functions, and extremum problems. We begin with some eleme tary remarks conceming partial differential equations An equation involving a scalar fieldfand its partial derivatives is called a partial differen- tial equation, Two simple examples in which f is a function of two variables are the first- order equation @.) FY _ 9, ax and the second-order equation + F(x y) (9,2 “ss = 02) a Each of these is a homogeneous linear partial differential equation. That is, each has the form L(f) = 0, where L is a linear differential operator involving one or more partial derivatives. Equation (9.2) is called the two-dimensional Laplace equation. Some of the theory of linear ordinary differential equations can be extended to partial differential equations. For example, it is easy to verify that for cach of Equations (9.1) and (9.2) the set of solutions is a linear space. However, there is an important difference between ordinary and partial linear differential equations that should be realized at the outset, We illustrate this difference by comparing the partial differential equation (9.1) with the ordinary differential equation v3) rs = The most general function satisfying (03) isf(x) = C, where C is an arbitrary constant: In other words, the solution-space of (9.3) is one-dimensional. But the most general fimetion satisfying 9.1) is Sx, Y= 8) 284 Applications of differential caleuus where g is any function of y. Since g is arbitrary we can easily obiain an infinite set of independent solutions. For example, we can take g(y) = e% and let ¢ vary over all real mumbers. Thus, the solutionspace of (0.1) is inznite-dimensional In some respects this example is typical of what happens in general. Somewhere in the process of solving a first-order partial differential equation, an integration is required to remove cach partial derivative. At this step an arbitrary function 1s introduced in the solution, This results in an infinite-dimensional solution space. In many problems involving partial differential equations it is necessary 10 select from the wealth of solutions a particular solution satisfying one or more auxiliary conditions. As might be expected, the nature of these conditions has a profound effect on the existence or uniqueness Of soiution, A systematic study oF sucir probieus wilt uot be aucmpied in diy book. Instead, we will treat some special cases to illustrate theideas introducedinChapter8, 92 A first-order partial differential equation with constant coefficients Consider the firstonder partial differential equation 9.4) jie ws De 2 a) All the solutions of this equation can be found by geometric considerations, We express the left member as a dot product, and write the equation in the form Gi+ 2p) - VF ») This tells us that the gradient vector Vf (x, y) is orthogonal to the vector di + 2j at each point (x, y). 
But we also know that V/(x, y) is orthogonal to the level curves of f, Hence these level curves must be straight lines parallel to 34 + 2j. In other words, the level curves off are the lines Be = By Therefore f(x, y) is constant when 2x — 3y is constant. This suggests that (9.5) S(x,y) = gx — 3y) for some function g. Now we verify that, for each differentiable function g, the ar fieldf defined by (9.5) does. indeed. satisfy (9.4). Using the chain rule to compute the partial derivatives off we find F _ 91 on : Sarwoxr~ay, L= -wox-a, 3x 7 Ox 39) ay 2°¢ ay. 3 oY. oyoy = ry) = 69x ~ ¥y) = 0 ax" ay Therefore, £ satisfies (9.4). A first-order partial differential equation with constant coefficients 285 Conversely, we can show that every differentiablefwhich satisfies (9.4) must necessarily have the form (9.5) for some g. To do this, we introduce a linear change of variables, (9.6) x= Au+ By, y=Cut Dy. This transforms f(x, y) into a function of u and y, say A(u, v) = f(Au + By, Cu + Dv) . We shall choose the constants A, B, C, D so that satisfies the simpler equation aku, v) (9.7) a! ‘Then we shall solve this equation and show that f has the required form, Using the chain nile we find ah Of ax Fay ay du x au aydu ax” ay Since J satisnes (YA) we nave Gf/éy = = (3/2){@f]@x), 0 the equation for Bijéu Decomes / 3c). au ax 3C, Taking A 3 and C = 2 we find ‘Therefore, hh will satisfy (7) if we choose A (9.8) x= du+ Be, y = Qut+ De For this choice of A and C, the function i satisfies (9.7), so h(u, 0) is a function of r alone, say hu, 0 g(r) for some function g. To express r in terms of x and y we eliminate y from (9.8) and obtain 2x — 3y = (2B — 3D)r. Now we choose B and D to make 2B — 3D = 1, say B D = 1. For this choice the tansformation (9.6) is nonsingular; we have y = 2x — 3y, and hence S(x,y) = hu, 0) = gle) = gx = 3y). This shows that every differentiable solutionfof (9.4) has the form (95) Exactly the same type of argument proves the following theorem for first-order equations with — coustanl coefficients. mupopey 0.1 the equation ag he differentiable on let f be the scalar field dejined on B® hy (9.9) IG B= g(bx = ay), 286 Applications of differential calculus where a and 6 are constants, not both zero, Then f satisfies thejrst-order partial differential equation (9.10) everywhere in R?. Conversely, every differentiable solution of (9.10) necessarily has the form (9.9) for some g- 9.3 Exercises In this set of 3 cises you may assume differentiability of all functions under consideration. 1. Determine that solution of the partial differential equation f(x,y) af(x, LO). AE _y ax yo which satisfies the condition f(x, 0) = sin x for all x. 2. Determine that solution of the partial differential equation POD yf 4 ax ay which satisfies the conditions f(0, 0) = 0 and Dyf(x, 0) = e* for all x 3. (a) If ux, y) = f(xy), prove that u satisfies the partial differential equation au au ax 7 Find a solution such that u(x, x) = x4e** for all x. (b) H tx, 3) = fUaly) tor y # 0, prove that P satisties the partial-ditterential equation ey “Ry Find a solution such that o(1, 1) = 2 and Dyo(x, Vx) = Ux for all x # 0. 4. If gu, v) satisfies the partial differential equation prove that g(«, ¥) = @lu) + 9y(e), where gy(w) is @ function of x alone and y(e) is @ function . 5. Assume £ satisfies the partial differential equation a x 4 oe Introduce the linear change of variables, x = Au + Bu, y = Cu + Dv, where A, B, C, D are constant, and let g(it, v) = f(An + Bo, Cut Dv). Compute nonzero integer valves. 
of A first-order partial differential equation with constant coefficients 287 A, B, ©, D such thag satisfies 9g/( @u dx) = 0. Solve this equation forg and thereby deter- mine fi (Assume eq 6. A function # is defined by an equation of the form lity of the mixed partials) Show that 1 satis! and find G(x, 9). 7. The substiution x =e, y = ef convertsf(x, y) into g(s, #), where g(s, 1) = fle, e'). If fis Known (© satisfy the partial differenial equation of a BevBahe show that g satisfies the partial-differential eq a oe are =0 8. Let fbe a scalar field that is differentiable on an open set Sin R", We say that f is homogeneous of degree p over Sif fie) = af) for every ¢ > O and every x in $ for which fx €S. For a homogencous scalar field of degreep show that we have x.) =p f(y for each xin S ‘This is known as Euler’s theorem for homogeneous functions. If x = (x, +) it can be expressed as f . a Fe Hing = PL Osta [Hint; For fixed x, define g(1) = f(rx) and compute g'(1).) 9. Prove the converse of Euler's theorem. That is, if f satisfies x Vf (x) = pf (x) forall x in an open set S, then f must be homogeneous of degree p over S$, (Hint: For fixed x, defir w(t) = fx) = fx) and compute ¢'(0.] 10. Prove the following extension of Euler's theorem for homogeneous functions of degree p in the 2 imensional case. (Assume equality of the mixed partials.) #f 4+ 24 ” Ox By af pip - Df. 288 Applications of differential calculus 9.4 The one-dimensional wave equation Imagine a string of infinite length stretched along the x-axis and allowed to vibrate in the xy-plane. We denote by y = f(x, 1) the vertical displacement of the string at the point x at time ¢, We assume that, at time ¢ = 0, the string is displaced along a prescribed curve, y= Fis). Au example iy shown in Figure 9,1(a), Figures 9.1(b) aid (© show possible displacement curves for later values oft. We regard the displacement f(x, ¢) as an unknown function of x and 1 to be determined, A mathematical model for this problem (suggested by physical considerations which we shall not discuss here) is the partial differential equation 22 ax” where ¢ is a positive constant depending on the physical characteristics of the string, This equation is called the one-dimensional wave equation. We will solve this equation subject 1 certain auxiliary conditions Y @r=0 ()t=1 Fiounx 9.1 The displacement curve y= f(x, 0) shown for various values of ¢ Since the imitial displacement 1s the prescribed curve y = F(x), we seek a solution satistying the condition f(x, 0) — F(x). We also assume that p//, the velocity of the vertical displacement, is prescribed at time 1 = 0, say Df (x, 0) = GX), where G is a given function, It seems reasonable to expect that this information should suffice to determine the subsequent motion of the string. We will show that, indeed, this is tue by determining the finctionfin terms of F and G. The solution is expressed in a form given by Jean d'Alembert (1717-1783), a French mathematician and_ philosopher. mone 9,2, D'ALEMBERT’S soumson of maz wave nguarron. Let Fand G be given functions such that G is differentiable and F is twice differentiable on R'. Then the function f given by the jormuia Fox + ot) + He = ot) 2 2e pret (9.11) S05 Gs) ds The one-dimensional wave equation 289 satisfies the one-dimensional wave equation (9.12) of eb oF 6x? and the initial conditions (9.13) P(x, 0)= FX), — Daf, 0) = GO). Conversely, any function f with equal mixed partials which satisfies (9.12) and (9.13) neces- sari/y has the form (9.11). Proof. 
Wt is a straightforward exercise to verify that the function f given by (9.11) satisfies the wave equation and the given initial conditions. This verification is left to the reader, We shall prove the converse, Onc way to proceed is to assume thatfis a solution of the wave equation, intoduec a linear change of variables, x=du+Bo, t=Cu+ Dv, which transforms f(x, 1) into a function of u and v, say gu, ¥) = f(Au + By, Cu + Dy) , and choose the constants A, B, C, D so that g satisfies the simpler equation ®e Ou Ov Solving this equation for g we find that g(u, v) = gi(u) + gale), where gy(u) is a function of u alone and g(r) is a function of v alone. The constants 4, B, C, D can be chosen so that w= x + ef, v= x= ct, from which we obtain (9.14) Sx, = Glxt et) + glx - ct). the given functions F and G. We will obtain (9.14) by another method which makes use of Theorem 9.1 and avoids the change of variables. First we rewrite the wave equation in the form (9.15) LiLf) where L, and Ly are the first-order linear differential operators given by Letf be a solution of (9.15) and let u(x, 0) 290 Applications of differential calculus Equation (9.15) states that w satisfies the first-order equation L,(u) = 0. Hence, by Theorem 9.1 we have u(x, 1) = pe + 0 for some fimetion ¢. Let ® be any primitive of gy, say P(y) = J% g(s) ds, and let o(x, t) = x O(x + ct). We will show that Za(v) = 1). We have ae ax 2e av Ota and F 5 +Ct), so a oo Ly = Fee 5 D(x + cf) = G(x + of) = u(x, 0 = L, Jn other words, the difference f — v satisfies the first-order equation LAf — ») = 0 By Theorem 9.1 we must have /(x, 2) — v(x, 1) = ye — ct} for some function y. There- fore f(D = ve, OD + ye — c= z D(x + cf) + yx — cf) 1 This proves (9,14) with g, = Xe ® and G = y. Now we use the initial conditions (9.13) to determine the functions g, and gy in terms of the given functions F and G. The relation f(x, 0) = F(x) implies (9.16) ot) + alr) — Fl ‘The other initial condition, Dof(x, 0) = G(x), implies 17) epi(x) = p(x) = G00. Differentiating (9.16) we obtain (9.18) gis) + p(x) = F(x). Solving (9.17) and (9.18) for gi(x) and pi(x) we find vey (+ 2 Ga), HY = FP) — . Ge). 2 The one-dimensional avo equation 291 Integrating these relations we get F(x) = FO) 2 2ede glx) = 71(0) px) = pA0) = Fay Fi - £ [cw ds. In the first equation we replace x by x + cf: in the second equation we replace x by X = cl Then we add the two resulting equations and use the fact that g\(0) + 9,(0) = F(0) to obtain f(x, 0 = gilxt eb) + gx — ct) =P, Dt FO = et x | Gis) ds. j 2 Jane This completes the proof, zxawpe, Assume the initial displacement is given by the formula ft -isea', fa) 1=0 Fravaz 9.2 A solution of the wave equation shown for ¢ = 0 and t = The graph of F is shown in Figures 9.1(a) and 9.2(a). Suppose that the initial velocity G(x) = 0 for all x. Then the resulting solution of the wave equation is given by the formula 165, = Fae ven. Figures 9.1 and 9.2 show the curve y= /(x, 1) for various values of 7. The figures illustrate that the solution of the wave equation is a combination of two standing waves, one traveling to the right, the other to the left, each with speed c. ee examples iMusirating the use of the equations are given in the next set of exe wi rake int the study of partial diffecentiat 292 Applications of differential calculus 9.5 Exercises In this set of exercises you may assume differenti lity of all functions under consideration, 1. 
If & is a positive constant and g(x, 1) = 3x/,/ke, let foto ye yt)= “ds Fle = fe du 7, ye Sho see Sand Saen St, (@) Show that B= er and 5 OUT (b) Show thatfsatisfies the partial differential equation x 3 (he heat equation). 38 Consider a scalar fieldfdefined in R® such that f(x, y depends only on the distance r of (x, y) from the origin, say f(x, y) = g(r), where r= (x24 y®)'4 (a) Prove that for (x, y) # (0,0) we have ap a oy 1 =O +O. r for all (3, y) # (0,0). Use part (@) to prove that f Cx, (0, 0), where a and 8 are constants Repeat Exercise 2 for the n-dimensional case, where n > 3. That is, assume that f(x) = fOtsecesn = a bog (2 + 32) + b for (x, y) (9, whese © = [lal]. Show that for x £0. If f satisfies the n-dimensional Laplace equation, ay ef ant om for all x ¥ 0, deduce that f(x) = a |x|?" + b for x # 0, where a, b are constants. Note: ‘The linear operator V® defined by the equation wre is called the n-dimensional Laplacian Exercises 293 4. Two-dimensional Laplacian in polar coordinates. ‘The introduction of polar coordinates x = r cos 6, y= rsin 6, converts f(x. y) into g(r, 6), Verify the following formulas: at 1 (a) | Sflr cos 0, r sin 6)? 42) : x wy af By Lee Lae oe” De TT or () Ox" 5. Three-dimensional Laplacian in spherical coordinates, ‘The introduction of spherical coordinates x=pcosdsing, y=psinOsing, z= pcos@, transforms f(x, ys 2) t0 F(p, 6, 9). This exercise shows how to express the Laplacian Vf in terms of partial derivatives. of F (a) First introduce polar coordinates x — r cos 8 ,y — r sin 8 to transformf (x, ys 2) tog(r, 0, 2). Use Exercise 4 to show that (b) Now tanstorm g(r, 9, 2) 10 F(p, 6 ) by taking 2 = p cos y, r= p sin g. Note that, except for a change in notation, this transformation is the same as that used in pan (a). Deduce that ar Le Fain g op * sing 3 vy CF Lae 1 ee ap? pap © pt ag 6. This exercise shows how Legendre’s differential equation arises when we seek solutions of Laplace's equation having a special form. Letfbe a scalar field satisfying the three-dimensional Laplaceequation, T2f — 0. Introduce spherical coordinates as in Exercise S and let F(p, 9, g) = fe. J,2). (a) Suppose we seek solutions f° of Laplace’s equation such that F(o, 8, 9) is independent of and has the special form F(p, 9, 9) = p"G(g) . Show that f° satisfics Laplace's cquation if G satisfies the second-order equation eG dG we Feo ee tan + DG (b) The change of variable x = cos 9 (p = arccos x, -1 wala YD where w(x, y) = + and w(x, y) = f(x, y)- The chain mule gives us the formulas ag au us aus ag au, du, aus 2 = DF =! + DF + DsF d B= DF + DFS + DFS, Be ax Oa FO a May yoy where each partial derivative D,F is to be evaluated at (x, y,f(x, y)). Since we have Solving this for af[@x we obtain af _ _ DiFIs, y SO YI) (9.20) = (8.20) ax DoF bys fq 0] at those points at which DaF [x, y, f(x, J] #0. By a similar argument we oblain a come- sponding formula for @/dy: af. DeFlx. y.fCe vd) ay DAFIx, ys f(% y)] at those points at which DsF[x, y.J(x, y)] ¥ 0. These formulas are usually writen more briefly as follows: (92) af_ _aFjox af _ _ Flay dx OF /@z” ay OF [ez exvexe. Assume that the equation y? + xz + 2? — e? — ¢ = 0 defines z as a function of x and y, say 2= f(x,y). Find @ value of the constant ¢ such that f(Q, ©) = 2, and YN- Qe. 296 Applications of differential calculus Solution, When x = 0,y =e, and z = 2, the equation becomes & + 4 — e —c =0, and this is satisfied by c'=4. Let F(x, y, 2) = J* +x2 + 2? =e = 4. 
From (9.20) and (9.21) we have

    \frac{\partial f}{\partial x} = \frac{-z}{x + 2z - e^z},        \frac{\partial f}{\partial y} = \frac{-2y}{x + 2z - e^z}.

When x = 0, y = e, and z = 2 we find \partial f/\partial x = 2/(e^2 - 4) and \partial f/\partial y = 2e/(e^2 - 4). Note that we were able to compute the partial derivatives \partial f/\partial x and \partial f/\partial y using only the value of f(x, y) at the single point (0, e).

The foregoing discussion can be extended to functions of more than two variables.

THEOREM 9.3. Let F be a scalar field differentiable on an open set T in R^n. Assume that the equation F(x_1, \ldots, x_n) = 0 defines x_n implicitly as a differentiable function of x_1, \ldots, x_{n-1}, say

    x_n = f(x_1, \ldots, x_{n-1}),

for all points (x_1, \ldots, x_{n-1}) in some open set S in R^{n-1}. Then for each k = 1, 2, \ldots, n - 1, the partial derivative D_k f is given by the formula

(9.22)    D_k f = - \frac{D_k F}{D_n F}

at those points at which D_n F \ne 0. The partial derivatives D_k F and D_n F which appear in (9.22) are to be evaluated at the point (x_1, x_2, \ldots, x_{n-1}, f(x_1, \ldots, x_{n-1})).

The proof is a direct extension of the argument used to derive Equations (9.20) and (9.21) and is left to the reader.

The discussion can be generalized in another way. Suppose we have two surfaces with the following implicit representations:

(9.23)    F(x, y, z) = 0,        G(x, y, z) = 0.

If these surfaces intersect along a curve C, it may be possible to obtain a parametric representation of C by solving the two equations in (9.23) simultaneously for two of the variables in terms of the third, say for x and y in terms of z. Let us suppose that it is possible to solve for x and y and that the solutions are given by the equations

    x = X(z),        y = Y(z)

for all z in some open interval (a, b). Then when x and y are replaced by X(z) and Y(z), respectively, the two equations in (9.23) are identically satisfied. That is, we can write F[X(z), Y(z), z] = 0 and G[X(z), Y(z), z] = 0 for all z in (a, b). Again, by using the chain rule, we can compute the derivatives X'(z) and Y'(z) without an explicit knowledge of X(z) and Y(z). To do this we introduce new functions f and g by means of the equations

    f(z) = F[X(z), Y(z), z]        and        g(z) = G[X(z), Y(z), z].

Then f(z) = g(z) = 0 for every z in (a, b), and hence the derivatives f'(z) and g'(z) are also zero on (a, b). By the chain rule these derivatives are given by the formulas

    f'(z) = \frac{\partial F}{\partial x} X'(z) + \frac{\partial F}{\partial y} Y'(z) + \frac{\partial F}{\partial z},        g'(z) = \frac{\partial G}{\partial x} X'(z) + \frac{\partial G}{\partial y} Y'(z) + \frac{\partial G}{\partial z}.

Since f'(z) and g'(z) are both zero we can determine X'(z) and Y'(z) by solving the following pair of simultaneous linear equations:

    \frac{\partial F}{\partial x} X'(z) + \frac{\partial F}{\partial y} Y'(z) = -\frac{\partial F}{\partial z},
    \frac{\partial G}{\partial x} X'(z) + \frac{\partial G}{\partial y} Y'(z) = -\frac{\partial G}{\partial z}.

At those points at which the determinant of the system is not zero, these equations have a unique solution which can be expressed as follows, using Cramer's rule:

(9.24)    X'(z) = \frac{\begin{vmatrix} -\partial F/\partial z & \partial F/\partial y \\ -\partial G/\partial z & \partial G/\partial y \end{vmatrix}}{\begin{vmatrix} \partial F/\partial x & \partial F/\partial y \\ \partial G/\partial x & \partial G/\partial y \end{vmatrix}},        Y'(z) = \frac{\begin{vmatrix} \partial F/\partial x & -\partial F/\partial z \\ \partial G/\partial x & -\partial G/\partial z \end{vmatrix}}{\begin{vmatrix} \partial F/\partial x & \partial F/\partial y \\ \partial G/\partial x & \partial G/\partial y \end{vmatrix}}.

The determinants which appear in (9.24) are determinants of Jacobian matrices and are called Jacobian determinants. A special notation is often used to denote Jacobian determinants. We write

    \frac{\partial(f_1, \ldots, f_n)}{\partial(x_1, \ldots, x_n)} = \begin{vmatrix} \partial f_1/\partial x_1 & \cdots & \partial f_1/\partial x_n \\ \vdots & & \vdots \\ \partial f_n/\partial x_1 & \cdots & \partial f_n/\partial x_n \end{vmatrix}.

In this notation, the formulas in (9.24) can be expressed more briefly in the form

(9.25)    X'(z) = \frac{\partial(F, G)/\partial(y, z)}{\partial(F, G)/\partial(x, y)},        Y'(z) = \frac{\partial(F, G)/\partial(z, x)}{\partial(F, G)/\partial(x, y)}.

(The minus sign has been incorporated into the numerators by interchanging the columns.)

The method can be extended to treat more general situations in which m equations in n variables are given, where n > m, and we solve for m of the variables in terms of the remaining n - m variables. The partial derivatives of the new functions so defined can be expressed as quotients of Jacobian determinants, generalizing (9.25). An example with m = 2 and n = 4 is described in Exercise 3 of Section 9.8.
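As an added computational check (not part of the original text), the following SymPy sketch applies formulas (9.20) and (9.21) to the example of Section 9.6 and reproduces the values \partial f/\partial x = 2/(e^2 - 4) and \partial f/\partial y = 2e/(e^2 - 4) found there.

```python
# Added check (not in the text): formulas (9.20)-(9.21) applied to the example of
# Section 9.6, F(x, y, z) = y^2 + xz + z^2 - e^z - 4, with f(0, e) = 2.
import sympy as sp

x, y, z = sp.symbols('x y z')
F = y**2 + x*z + z**2 - sp.exp(z) - 4

df_dx = -sp.diff(F, x) / sp.diff(F, z)    # (9.20): df/dx = -(dF/dx)/(dF/dz)
df_dy = -sp.diff(F, y) / sp.diff(F, z)    # (9.21): df/dy = -(dF/dy)/(dF/dz)

pt = {x: 0, y: sp.E, z: 2}
print(sp.simplify(F.subs(pt)))            # 0, so (0, e, 2) lies on the surface
print(sp.simplify(df_dx.subs(pt)))        # 2/(e^2 - 4)
print(sp.simplify(df_dy.subs(pt)))        # 2e/(e^2 - 4)
```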
9.7 Worked examples In this section we illustrate some of the concepts of the foregoing section by solving various types of problems dealing with functions defined implicitly. nmerz 1, Assume that the equation g(x. y) = 0 determines y as a differentiable function of x, say y = Y(x) for all x in some open interval (a, b). Express the derivative Y'(x) in terms of the partial derivatives of g. Solution, Let G(x) = glx, Y(x)]| for x in (a, b). Then the equation g(x, y) = 0 implies G(x) = 0 in (a, b). By the chain rule we have ow = 14 28 yx), fiom which we obtain - 4 Ogl0x (9.26) Yw=- agiay at hose points x in (a, b) al which dg/y ¥ 0. The partial derivatives g/Ox and Og/ay are given by the formulas @g/@x = Diglx, Y(x)] and ag/éy = Diglx, Y(x)]. sme: 2. When y is eliminated from the two equations z = f(x, y) and g(x, y) = 0, the result can be expressed in the form z = h(x). Express the detivative h'G) in terms of the partial derivatives off and g. Solution, Let us assume that the equation g(x, y) = 0 may be solved for y in terms of X and that a solution is given by y = Y(x) for all x in some open interval (a, b). Then the function ht is given by the formula A(x) = fix, ¥@)] aba o, Applying the chain rule we have Ga 2... Worked examples 299 Using Equation (9.26) of Example 1 we obtain the formula ay _ a a dyox dy dx og ay Wa) = The partial derivatives on the right are to be evaluated at the point (x, Y(x)). Note that the numerator can also be expressed as a Jacobian determinant, giving us o a A(x, wi(a) = A.W ¥) gloy seswis 3. The two equations 2x =p? — u® and y = wv define w and ¢ as functions of x and y. Find formulas for @u/Ox, du/@y, dv/Ax, dv/dy. Solution. If we hold y fixed and differentiate the two equations in question with respect to x, remembering that u and v are functions of x and y, we obtain Ov Ou dv, du 2=w2— mw and = = Max ex ax * ax Solving these simultaneously for @u/8x and @v/dx we find a! ang £ ax ete ax wt On the other hand, if we hold x fixed and differentiate the two given equations with respect to y we obtain the equations bv ou 2 and Solving these simultaneously we find ewoe 4. Let w be defined as a function of x and y by means of the equation = F(x + u,yu). and Ouj@x and Gu/@y in terms of the parual denvaives of F. Solution, Suppose that «= g(x, y) for all (x, y) in some open sot S. Substituting g(X, y) for w in the original equation we must have 8 fink y)s wales Ih 300 Applications of differential caleulus » g(x,y). Now we hold y fixed and differ- using the chain rule on the right, to obtain where u(x, y) =x + g(x, y) and u(x, y) entiate both sides of (9.27) with respect to x (9.28) ag = DF du ox ax + DFM. ox But u,/x = 1 + Ag/Ax, and Au,/Ax = y Ag/@x. Hence (9.28) becomes ag ag ag 8 = pyr (1 +2) + dar (y 2). ae = PF ( * =) +P ( >) Solving this equation for g/@x (and writing @u/@x for 4g/4x) we obtain au -DF ox DF +y DF = 1" In a similar way we find Qe gp OH: pp Me _ rp 8, nel ALN By OF ay | Ph ay WO at PE (y+ wm 8) ‘This leads to the equation du _ _ —a(x, y) DoF dy DF +y DF =I" ‘The partial derivatives DF and D,F are to be evaluated at the point (x + g(%, y), y g(x, J). sxvexe 5. When w is eliminated from the two equations x = w+ v and y = w*, we get an equation of the form F(x, y. v) = 0 which defines v implicitly as a function of x and y, say v= A(x, y) . Prove that ah ___Mx.y) Ox 3h(x,y) — 2x and find a similar formula for 3h/dy. Solution. Eliminating u from the two given equations, we obtain the relation wie -ysd. 
Let F be the function defined by the equation Fa, y,) = wed The discussion in Section 9.6 is now applicable and we can write (9.29) ey ee ax OF [av ay OF ev" Worked examp les 301 But @F/ax = v', OF/@v = 2xv ~ 3v%, and @F/@y = — 1. Hence the equations in (9.29) become th_ _ Ax, y) Ox xv 3! = x —3v SHC, y) = 2e and a a Oy Oxv 30? = 2xh(x, y) — 3x, y) | ewer 6, The equation F(x, y, 2) = 0 defines z implicitly as a function of x and y, say 2 =/(x, 3). Assuming that 3°F/(@x 82) = 2F/(82 ax), show that (53) Ge) ~ Gas) Ge) Gs)» Ge) Ge). q xt ary (ee (9.30) 2 where the partial derivatives on the right are to be evaluated at (x, y. f(x, y)). Solution. By Equation (9.20) of Section 9.6 we have a OF /ex 3) oan) ax-i@%* We must remember that this quotient really means _ Dix FYI) DF[x, y. f(x y)) Let us introduce G(x,y) = D\F[x,y,/(x,y)] and H(x,y) = DsF{x,y, f(x,y]. Our object is to evaluate the partial derivative with respect to x of the quotient 4 _ Gey) ax HG)" holding y fixed. The rule for differentiating quotients gives us 7 (9.32) a Since G and 1 are composite functions, we use the chain rule to compute the partial derivatives 6G/ax and @H/@x. For 8G/@x we have a 96 = ryry 1 + DADE)“ 0+ DADE) Z ax ax 302 Applications of differential caleulus Similarly, we find oH = D(D,F). 1+ D&D F):0+ Dab x ax ax =F Prag ax as + ast ax’ Substituting these in (9.32) and replacing @f/@x by the quotient in (9.31) we obtain the formnla in (9.30) 98 Fxercises In the exercises in this section you may assume the existence and continuity of all derivatives under consideration, 1. The two equations x + y= wv and xy=u — v determine x andy implicitly as functions of wand v, say x = X(u, v) and y = Y(w, v), Show that @X]au= av — I(x ~ y) if'x# y, and find similar formulas for aX/ av, 8 Y/au, @ Y/a0, The two equations x + y = wv and xy = u = v determine x and v as functions of w and y, say x = X(u, y) and v = V(u, y). Show that @X/2u = (u + v)/(1+ yu) if 1+ yu x 0, and find similar formulas for 2X] 8y, @V] au, 9 V] ey. 3. The two equations Fx, y, uv) = 0 and G(x, y, u, v) = 0 determine x and y implicitly as functions of w and v, say x = X(u, e) and y= ¥(u, v). Show that ne at points at which the Jacobian @(F, G)/@(x, y) # 0, and find similar formulas for the partial derivatives 8X) 20, a Y/au, and @ ¥/@0, 4, The intersection of the two surfaces given by the Cartesian equations 2x% + 3)? — and x2 +)® = 2% contains a curve C passing through the point P = (J7, 3,4), These equations imay be solved for x and y in terms of z ( give a parametric representation of C with 2 as parameter. (@) Find a unit tangent vector T to C at the point P without psin, the parametric representation (b) Check the result in part (@) by determining a parametric representation of C with z as parameter. 5. The three equations F(u, v) = 0, w= ay, and v = \/x® + 2? define a surface in xyz-space. Find a normal vector 10 this surface at the point x = 1 Fi is koown that D,F(, 2) = 1 and D,F(L, 2) 6. The three equations an explicit knowledge of x? — ycos (w) +22 =0, xt+ y= sin (wy) + 22% = 2, xy -sinucosv +z =0, define x, y, and 7 a8 fimetions of wand v. Compnte the partial derivatives @x/ Qu and ax/@o at thepointx =y = 1,u= 7/2,v =0,2 =0. Maxima, minima, and saddle points 303 7. The equation /(/x, z/x) = 0 defines x implicitly as a function of x and y, say z= g(x,y). Show that % %& an Ty = sy) at those points at which Defly/x, g(x, y)/2] is not zero. 8. 
Let F be a real-valued function of two real variables and assume that the partial derivatives DF and D,F are never zero. Let u be another real-valued function of two real variables such that the partial derivatives @u/ 2x and uj ay are related by the equation FY @uj ax, 2uj 2y) = 0. Prove that a constant n exists such that siete Buy wap = (Fy) and find mn, Assume that #uj( ax 3y) = dYuj( ay ax), 9, The equation x + z + (y + z)® = 6 defines z implicitly as a function of x and y, say z= fx, y) . Compute the partial derivatives af/éx, af/ay, and #fi( ax 2) in terms of x, y, and 2. 10.The om fom sin (x 4 y) 4 sin fy + 2) = 1 defines 7 implicitly ae a fimetion of x and y, say ‘Gx, y) . Compute the second derivative Dy »fin terms of x,y, and 2. ll, The equation F(x +y +z, x°+ y?+ z*) = 0 defines z implicitly as a function of x and y, say 2 = f(x,y). Determine the partial derivatives 2] 2x and aff dy in teems of x,y, 2 and the partial derivatives D,F and D,F. 12, Letfandg be functions of one for all the partial derivatives oi off and g. Verify the relation ariable and define F(x, y) first and second onder, expr f(x + g(y)]. Find formulas -d in terns of te derivatives Fer FOF ax Ax By” ay Oxt” 99 Maxima, minima, and saddle points ‘A surface that is described explicidy by an equation of the form z = f(x, y) can be thought of as a level surface of the scalar field F defined by the equation F(x, 2) S06, Yo Iitis differentiable, the gradient of this field is given by the vector ‘A linear equation for the tangent plane at a point Py = (x;, 71, 24) can be written in the form 2 = A(x) + B= y), where A=DflXsy) and B= Deflas ya)- When both coefficients A and B are zero. the point P; is called a stationary point of the surface and the point (x1, y1) is called a stationarypoint or a criticalpoint of the function f. 304 Applications of differential calculus ‘The tangent plane is horizontal at a stationary point. ‘The stationary points of a surface are usually classified into three categories : maxima, minima, and saddle points. If the surface is thought of as a mountain landscape, these categories correspond, respectively, to mountain tops, bottoms of valleys, and mountain passes. ‘The concepts of maxima, minima, and saddle points can be intwoduced for arbitrary scalar fields defined on subsets of R”. pmmnrrion. A scalar field f is said to have an absolute maximum at a point a of a set S inR if (9.33) I) S/@ for allx in S. The number f (a) is called the absolute maximum value off on S The function f is said to hane a relative maximum at a if the inequality in (9.23) is satisjiedfor every xin some n-ball Bia) lying in S In other words, a relative maximum at a is the absolute maximum in some neighborhood! of a. The terms absolute minimum and relative minimum are defined in an analogous fashion, using the inequality opposite «© wat in (33), The adjectives global and local are sometimes used in place of absolute and relative, respectively. DERINITION. A number which is either a relative maximum or a relative minimum off is called an extremum off. Tff has an extemum at an interior point a and is differentiable there, then all first-order partial derivatives D,f(a),..., D,f(@) must be zero. In other words, Vfta) = 0. (This is easily proved by holding each component fixed and reducing the problem to the one- dimensional case.) In the case 7 = 2, this means that there is a horizontal tangent plane to the surface 2 = f(x, y) al the point (4 f (@). 
On the other hand, it is easy to find examples in which the vanishing of all partial derivatives at a does not necessarily imply an extremum at a This occurs at the so-called saddlepoints which are defined as follows. DEMINITION. Assume f is differentiable at a. If Vf (a) = 0 the point a is called a stationary point of f. A stationary point is called a saddle point if every nball Bla) contains points x such that f (o) < fla) and other points such that f (> f (a Th oy points of a function are classified as maxima, minima, and points of inflection. The following examples illustrate several types of stationary points. In each case the stationary point in question is at the origin. sano 1, Relative maximum. > = f Ge y)=2~ x2 = y%, This surface is a paraho- oid of revolution, In the vicinity of the origin it has the shape shown in Figure 9.3(a), Its Maxima, minima, and saddle points 305 (b) Level curves: Example 1, Relative maximum at the origin, oo () z= 4 Example 2, Relative minimum a the origin, Frou 9.3 Examples 1 and 2. level curves are circles, some of which are shown in Figure 9.3(b). Since f(x, y) = 2 = (xt + y*) $2 = S(O, 0) for all (, y), it follows thatfnot only has a relative maximum at (0,0) but also an absolwe maximum there. Both partial derivatives f/x and afjay vanish at the origin weawote 2. Relative minimum. 2 = f(x, y) = x* + y* This example, another paraho- loid of revolution, is essentially the same as Example 1, except that there is a minimum at the origin rather than a maximum, The appearance of the surface near the origin is fused in Figure und some oF te ievel curves ae sown in Figure 9.3(b). 306 Applications of differential caleulus exwere 8, Saddle point. z= f(x,y) = xy. This surface is a hyperbolic paraboloid, Near the origin the surface idle shaped, as shown in Figure 9.4(a), Both partial derivatives f/x and afay are zero at the origin but there is neither a relative maximum nor a relative minimum there. In fact, for points (x, Y) in the first or third quadrants, xand y have the same sign, giving us f(x, y) > 0 = /@, 0) , whereas for points in the second and fourth quadrants x and y have opposite signs, giving us f(x, y)< 0= f (0, 0). Therefore. in every neighborhood of the origin there are points at which the function less than /(O, 0) and points at which the function exceeds (0, 0), 0 the origin is a saddle (b) Level curves: xy = ¢ Foot 9.4 Example 3. Saddle point at the origin. point. The presence of the saddle point is also revealed by Figure 9.4(b), which shows some of the level curves near (0, 0), These are hyperbolas having the x- and y-axes as asymptote: xis 4 Saddle point. x= f(x,y) = x* = 3xy?. Near the origin, this surface has the appearance of a mountain pass in the vicinity of three peaks, This surface, sometimes referred to as a “monkey saddle,” is shown in Figure 9.5(a). Some of the level curves are illustrated in Figure 9.5(b). It is clear that there is a saddic point at the origin, xoveir 5, Relative minimum, 2= f(x.y) = x*y*, This surface has the appearance of a valley surrounded by four mountains, as suggested by Figure 96(a). ‘There is an absolute minimum at the origin, since f(x, y) > (0, 0) for all (x, y). The level curves [shown in Figure 9,6(0)] are hyperbolis having’ the A- and. y-axes as asymmplotes. Note that these level curves are similar 10 those in Example 3. In this case, however, the function assumes only nonnegative values on all its level curves sxauris 6 Relative maximum. z= f(x, y) = 1 ~ 24. 
In this case the surface is a cylinder with generators parallel to the y-axis, as shown in Figure 9.7(a). Cross sections cut by planes parallel to the x-axis are parabolas. There is obviously an absolute maximum at the origin because f(x, y)=1— x? < 1 = f(0, 0) for all (x, y). The level curves form a family of parallel straight fines as shown’ in Figure 9.7(b). (a) r= x" =3 ay (b) Level curves: FicuRE 9.5 Example 4. Saddle point at the origin, (b) Level curves: x'y' = € FIGURE 9.6 Example 5. Relative minimum at the origin } Tangent plane at (0,0,1) @)zelax (b) Level curves: 1 FIGURE 9.7 Example 6. Relative maximam at the origin. 307 308 Apnlications of differential calculus 9.10 Second-order Taylor formula for scalar fields Ifa differentiable scalar field f has a stationary point at a, the nature of the stationary point is determined by the algebraic sign of the difference (x) -f(a) for x near a. If Sat y) — f@) = Vila) y + ly | E@, 9) where E(a,y)—>0 as y-> 0. At a stationary point, Vf(a) = 0 and the Taylor formula becomes fat y)-f@= ly 0 E@ y). To determine the algebraic sign of f(a + y) — f(a)we need more information about the error term ly | E(a, y). The next theorem shows that if f has continuous second-order partial derivatives at a, the error term is equal to a quadratic form, SS jw ZL, Psi Wyss = Re plus a term of smaller order than jly |. The coefficients of the quadratic form are the second-order partial derivatives Dyf = Di(D,f), evaluated at a. The n x n matrix of second-order derivatives D,,f(x) is called the Hessian matrix? and is denoted by Ho. Thus, we have A(x) = [Df whenever the derivative notation as follows: exist. The quadratic form can be written more simply in matrix 3 Ed,s@nw, =yM@y’, a where y = (J1,.. » 5)',) i8 considered as a1 x n row matrix, and y* is its transpose, an nx 1 column matrix. When the partial derivatives D,,f are continuous we have Dy f = Dyf and the matrix H(u) is symmetric, Taylor’s formula, giving a quadratic approximation to f(a + y) = f(u), now takes the following form, rusomx 9,4, SECOND-ORDER TAYLOK FORMULA FOR seatan ereins. Let f be a scalar field with continuous second-order partial derivatives D,,f in an n-ball Bw. Then for all v inR" such that a+ y € B(w) we have 8.) flat y) -F@) = Wl) -¥+ Lytitas evy', where O< Oas y +0. Proof. Keep y fixed and define g(u) for real u by the equation gy) =f(at+uy) for -1 0as y—>0. From (9.37) we find that iy Ea, N=} >> (D,fle +o) - D,S@)ny, < ay > 1D,,fla+ oy) = Dy f@) hyl®- aa 310 Applications of differential calculus Dividing by |ly |? we obtain the inequality (Ea, N13 > > Ww, sao) = D.f0 et for y # 0. Since each second-order partial derivative Dy,f is continuous at a, we have Dif(a+ cy) > Di,f(a) as y+ 0,80 Ex(a, y) -> 0.as y + 0. This completes the proof. O11 ‘The nature of a ctatinnary poi Ata stationary point we have V"(a) = 0, so the Taylor formula in Equation (9.35) becomes Sa + 9) —S(@) = §yH (ay + lly |? Ea(a, y). Since the error term |lyl/? £,(a, y) tends to zero faster than |lyl|#, it seems reasonable to expect that for small y the algebraic sign of f(a + y) -f(@) is the same as that of the quadratic form yH(a)y'; hence the nature of the stationary point should be determined by the algebraic sign of the quadratic form, This section is devoted to a proof of this. fact First we give a connection between the algebraic sign of a quadratic form and its eigen- values. 
muon 9.5, Let A= [a,j] be ann xn realsymmetric matrix, and let BU) = vay'= ZF usr Then we have: (@) QQ) > 0 for all y #0 if and only if all the eigenvalues of A are positive. (b) QQ) < 0 for ally #0 if and only if all the eigenvalues of A are negative. Note: In case (a), the quadratic form is calledpositive definite; in case (b) it is called negative definite. Proof. According to Theorem 5.11 there is an orthogonal matrix C that reduces the quadratic form yAy! to a diagonal form, That is (9.38) Q0) = pay = 2 Ax where x = (xj,... 5X) is the row matrix x= yC,and 4, ..., A, are the eigenvalues of A. The eigenvalues are real since A is symmetric It all the eigenvalues are positive, Equation (¥.38) shows that Qty) > 0 whenever x #0, But since x = yCwe have y = xC, so x ¥ 0 if and only if y # 0. Therefore Oly) > 0 for all y #0 Conversely, if Q(y) > 0 for all y # 0 we can choose y so that x = yC is the kh co- ordinate vector e, . For this y, Equation (9.38) gives us Qy) = 4,, so each 4, > 0. This ptoves pat (a). The proof of (b) is entirely analogous. Nature of a stationary point determined by eigenvalues of Hessian matrix 311 The next theorem describes the nature of a stationary point in terms of the algebraic sign of the quadratic form yH(a)y*. turorem 9.6. Let £ bea scalar field with continuous second-order partial derivatives Df in an n-ball Bia), and ter 1a) denote the Hessian matrix at a stationary point a. Then we have: (a) If all the eigenvalues of H(a) are positive, f has a relative minimum at a. (b) if all the eigenvalues of H(a) are negative, f has a relative maximum at a. (c) If Hla) has both positive and negative eigenvalues, then {has a saddle point at a. Proof. Let Qy) = yH(a)y'. The Taylor formula gives us (9.39) f(a + y)— Sf @) = 400) + ly? E@,y), where £,(a, y) + 0as y + 0. We will prove that thee is a positive number y such that, if 0 < ly] 0 for all y # 0. There- fore YH (@y' > y(ully* = u (yl? for all real u Sh ly? for all p # 0. Since E,(a, y) > 0 as y + O, there is a positive number r such that |E,(a, y)| < $4 whenever 0 < jjp|| 0. Therefore f has a relative minimum at a, which proves part (a), To prove (b) we can use a similar argument, or simply apply part (@) to —f. To prove (c), let 1, and A, be two eigenvalues of H(a) of opposite signs. Let h = min {[Ay|, [dg|}. Then for each real w satistying -h 0 as above so that |Ex(@, y)| < J whenever 0 < llyl| < r. ‘Then, arguing as above, we see that for such y the algebraic sign of f(a + y) -fa) is the same as that of Q(y). Since both positive and negative values occur as » + O, f has a saddle point at a, This completes the proof. Note: If all the eigenvalues of H(@) are zero, Theorem 9.6 gives no information concerning the stationary point. Tests involving higher order derivatives can be used to treat such examples, but we shall not discuss them. here. 9.12 Second-derivative test for extrema of functions of two variables In the case m = 2 the nature of the stationary point can also be determined by the algebraic sign of the second derivative D,1f(a) and the determinant of the Hessian matrix. szonm 9.7. Let a be a stationary point of a scalar field f (x,, x2) with continuous second order partial derivatives in a 2-ball B(u). Let A Piaf@, 2B Pisf@, C= Daa f(@), and let AR A =det H(a) = det = AC- B, BC. (a) If A<0, f has a saddle point at a. (b) A> Oand A> 0, £ has a relative minimum at a. (c) fA>OandA <0, f has a relative maximum at a. @ If A = 0, the test is inconclusive. Proof. 
\n this case the characteristic equation det [AJ — H(u)] = 0 is a quadratic equation, BA (A+ OAFARO. The eigenvalues Ay, Ay are related (© the coefficients by the equations Ath = At, If A < 0 the eigenvalues have opposite signs, so f has a saddle point at a, which proves (a). fA > 0, the eigenvalues have the same sign, In this case AC > B® > 0, so A and C have the same sign, This sign must be that of 2, and A, since A + C = A, + dy. This proves (b) and ©) To prove (@) we refer to Examples 4 and 5 of Section 9.9, In both these examples we have A = 0 at the origin, In Example 4 the origin is a saddle point, and in Example 5 it is a relative minimum, Exercises 313 Even when Theorem 9.7 is applicable it may not be the simplest way to determine the nature of a stationary point. For example, when f(x, y) = ev), where g(x. y) x? + 2-4 cost y 2 cos ¥, the test is applicable, but the computations are lengthy. case we may express g(x, y) as a sum of squares by writing g(x, y) =1 +x? + (1 — cos 3). We see at once that / has relative maxima at the points at which x* = Q and (1 — cos J)? = 0. These are the points (0, 27), when n is any integer. 9.13 Exercises In Exercises 1 through 15, locate and classify the stationary points (if any) of the surfaces having the Cartesian equations given. e+ — 7.25 x= Bayt + y*. -g- IF 3.25 x6 = x= y). leafy, 9. n= x9 + y= Bay, =(xeyt I in x cosh y 2xt — xy — 3y* — 3x + Ty. NBs — bry + 39), xioay tye ty. Sx + Ty = 25)eereri9®, nxsinysin(x +), O0. = (et + yeah), 16 Let f(x, y) = 3x4 — 4xy + y®, Show that on every line y = mux the function has a minimum, at @, 0), but that there’ is rio relative minimum in any two-dimensional neighborhood of the Grigin, Make a sketch indicating the set of points (x, y) at which f(a, y) > 0 and the set at whichftx, y) < 0. 11 Let fx. y)= 3 =X = Waty = (a) Make a sketch indicating the set of points (x, y) at which f (x, y) > 0. (b) Find all points (x, y) in the plane at which Dyf (x, y) = Daf (x, y) = 0. [Hit has (3 — y) as a factor.] (©) Which of the stationary points are relative maxima? Which are relative minima? Which are neither? Give reasons for your answers. (a) Docs f have an absolute minimum or an absolute maximum on the whole plane? Give reasons for your answers 18. Determine all the relative and absolute extreme values and the saddle points for the function Oy) = xy — 27 = y*) on the squareO Sx S1,0Sy S 1. 19, Determine constants a and & such that the ititegral Dif(x, w fife b fd)" dx will he as email a possible if forse, ()fhd=O%+ 1 20. Let fx, y) = Ax? + 2Bxy + Cj + 2Dx + 2Ey + F, where A > O and BY< AC. (a) Prove that a point (x1, y,) exists at which f has a minimum, [Hint; Transform the quad- ratic part to a sum of squsires.] (b) Prove that f(x;,);) = Dxy + By, + Fat this minimum, (©) Show that ABD | {iy = eB © FI. log Fi
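As a closing computational illustration (an addition, not part of the original text), the following SymPy sketch applies the second-derivative test of Theorem 9.7 to the five examples of Section 9.9. For Examples 4 and 5 it reports \Delta = 0, so the test is inconclusive, in agreement with the remark in the proof of part (d).

```python
# Added illustration (not in the text): the test of Theorem 9.7 applied to the
# examples of Section 9.9.  A = D11 f, B = D12 f, C = D22 f, Delta = AC - B^2,
# all evaluated at the stationary point (0, 0).
import sympy as sp

x, y = sp.symbols('x y')
examples = {
    '2 - x^2 - y^2 (Example 1, max)':    2 - x**2 - y**2,
    'x^2 + y^2     (Example 2, min)':    x**2 + y**2,
    'xy            (Example 3, saddle)': x*y,
    'x^3 - 3xy^2   (Example 4, saddle)': x**3 - 3*x*y**2,
    'x^2 y^2       (Example 5, min)':    x**2 * y**2,
}
for name, f in examples.items():
    A = sp.diff(f, x, 2).subs({x: 0, y: 0})
    B = sp.diff(f, x, y).subs({x: 0, y: 0})
    C = sp.diff(f, y, 2).subs({x: 0, y: 0})
    delta = A*C - B**2
    if delta < 0:
        verdict = 'saddle point'
    elif delta > 0:
        verdict = 'relative minimum' if A > 0 else 'relative maximum'
    else:
        verdict = 'test inconclusive'
    print(f'{name:35s} Delta = {delta}, {verdict}')
```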
