
Lectures in Linear Algebra

T. Shaska
Contents

Preface 2

1 Analytic geometry 3
1.1 Cartesian system of the plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Relations and their graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Polar coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.2 Geometric interpretation of complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.3 Roots of unity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Algebraic equations, planar algebraic curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.1 Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.2 Circle and Ellipse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.3 Conics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4 Conics with mixed terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.5 Vectors in Physics and Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.5.1 The plane R2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.5.2 The space R3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2 Euclidean spaces, linear systems 27


2.1 Euclidean n-space Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2 Norm and dot product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3 Matrices and their algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4 Linear systems of equations, Gauss method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.4.1 Elementary row operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4.2 Row-echelon form of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.5 Reduced row-echelon form, Gauss-Jordan method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.5.1 A word on homogeneous systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.6 Inverses of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.6.1 Computing the inverses using the row-echelon form . . . . . . . . . . . . . . . . . . . . . . . . 42

3 Vector Spaces 47
3.1 Definition of vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.1.1 Linear independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.2 Bases and dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2.1 A basis for Matn×n (R) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.2.2 Finding a basis of a subspace in kn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.3 Nullspace and rank of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.1 Finding a basis for the row-space, column-space, and nullspace of a matrix. . . . . . . . . . . . 58
3.4 Sums, direct sums, and direct products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.4.1 Direct sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.4.2 Direct products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

4 Linear Transformations 65
4.1 Linear maps between vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2 Composition of linear maps, inverse maps, isomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.3 Matrices associated to linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.4 Change of basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.5 Linear transformation in geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.5.1 Scalings: scalar matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.5.2 Rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.5.3 Shears . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.5.4 Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.5.5 Reflections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.6 Review exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

5 Determinants, eigenvalues, eigenvectors 81


5.1 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.1.1 Computation of determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.1.2 Generalized concept of Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.2 Eigenvalues, eigenvectors, and eigenspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.3 Similar matrices, diagonalizing matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.3.1 Diagonalizing matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.4 Cramer’s rule and adjoint matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.4.1 Adjoints of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.5 Review exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

6 Canonical Forms 101


6.1 Basics on polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.1.1 Irreducibility of polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.2 Companion matrices, minimal polynomial, Smith normal form. . . . . . . . . . . . . . . . . . . . . . . 104
6.3 The rational canonical form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.4 Cayley-Hamilton theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.4.1 Computing the rational canonical form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.4.2 Computing the transformation matrix: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.5 The Jordan canonical form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.6 Review exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

7 Inner Products and Orthogonality 119


7.1 Inner products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
7.1.1 Inner products over real numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7.1.2 Hermitian products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7.2 Orthogonal bases, Gram-Schmidt orthogonalization process . . . . . . . . . . . . . . . . . . . . . . . . 123
7.2.1 Gram-Schmidt algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
7.2.2 The QR factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
7.3 Orthogonal transformations and orthogonal matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.3.1 Orthogonal projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7.4 The method of least squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
7.4.1 The method of least squares for higher degree polynomials . . . . . . . . . . . . . . . . . . . . 134
7.5 Sylvester’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
7.6 The dual space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
7.7 Review exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
8 Symmetric matrices 141
8.1 Quadratic forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
8.1.1 Binary quadratic forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
8.1.2 Quadratic forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.1.3 Polynomials versus forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
8.2 Symmetric matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
8.2.1 Diagonalizing a quadratic form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
8.2.2 Binary quadratic forms, conics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
8.2.3 Ternary forms, quadratic surfaces in R3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
8.2.4 Graphing quadratic equations in R2 and R3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
8.3 Positive definite matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
8.3.1 Principal matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
8.4 Singular values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.4.1 Singular value decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
Preface

In most universities linear algebra is taught at the sophomore level and is the first course which introduces students
to concepts such as vector spaces, homomorphisms, matrices, etc. These are important concepts in mathematics
and must be developed with care. Such concepts play a crucial role for mathematics and computer science majors.
Moreover, linear algebra is also a very important tool to students who major in engineering, sciences, finance, etc.
It is important that these students learn linear algebra from a more practical viewpoint. Responding to these needs
there are two kinds of textbooks on linear algebra.
In the first kind the emphasis is put mainly on applications and the mathematical content is hidden or lost. The
students feel overwhelmed with many applications and techniques and finish the course without the mathematical
culture that such a course should provide.
The second kind of textbooks are at a more advanced level focusing on the mathematical content and skipping
most of the computational aspects of the subject. Most texts of this kind simply lack the examples and exercises
to illustrate the material. Furthermore, they avoid most of the computational exercises in topics such as canonical
forms, diagonalization, etc.
Almost all linear algebra books lack the historical perspective of the subject, the motivation for why the subject
was developed. Students who take a course in linear algebra usually do not learn about the rich geometry of the
plane R2, or the space R3, their transformations, etc. This is mainly because most of it is ignored by recent books and
teachers of the subject.
In writing this book I simply wanted to present a more balanced approach to the mathematical and the computa-
tional content of the subject, while reminding the student throughout the book of the motivations and the historical
perspectives of the subject. My goal was that students learn the computational tools of linear algebra and at the
same time gain some level of maturity about the subject. Preparing the student for more advanced topics of linear
algebra was the main concern. Therefore, we treat some topics that probably would not be covered in a typical
sophomore-level course, such as irreducibility of polynomials over the rational numbers, complex numbers, etc.
What you will find special about this textbook is that the geometry is incorporated throughout the book. The
first chapter is a quick review of a few facts from analytic geometry that are covered in high school. However, it is
no coincidence that the chapter is closed with an example of a conic with mixed terms. The process of transforming
this conic in a standard form touches some of the very important parts of linear algebra. First, is the question of
what type of transformations will preserve the shape of the conic. Second, how is it done? Throughout the book
the student will realize that diagonalizing a matrix corresponds to changing the basis of a vector space, which in
itself corresponds to certain algebraic substitutions. Moreover, the idea that many things change when we change
the basis, but some things stay the same, leads to the very important concept of an invariant. The inertia of a binary form,
the determinant of a matrix, etc. are some of the concepts where this will be explored.
We treat canonical forms in detail in contrast to most other textbooks at this level. It is believed that this part of
linear algebra is too difficult for an introductory course on the subject. I first experimented with teaching canonical
forms at the undergraduate level at the University of California at Irvine, during the academic year 2002-2003.
To my surprise students responded very well to it. Since then I have continuously included canonical forms in
my linear algebra courses. It is true that I spend one or two lectures on polynomials and complex numbers, but
that is not lost time, since most undergraduate students lack a good understanding of these topics. While
computing canonical forms of matrices can at times be painful, it is also an excellent opportunity to introduce
some computer algebra packages in the course. I have used Maple to give examples and exercises that illustrate
the benefits of canonical forms. One can also assign programming exercises on computing canonical forms. The
success of these programming exercises will depend on the exposure that students have had to programming.
Some of them will respond very well to it, while others will have a difficult time with such exercises. Having


some knowledge of canonical forms is very helpful to engineering students who often have to normalize operators
and also to mathematics students who will use these canonical forms in later classes such as abstract algebra,
representation theory, etc.
We expect that the student taking the class already has knowledge of basic calculus or, as it is known in
most US universities, Calculus I and II. For students of mathematics, a chart of classes and their logical hierarchy is
suggested in the diagram below.

Calculus I, II
  → Linear Algebra and Introduction to Proofs
  → Calculus III and Elementary Number Theory
  → Real Analysis I and Abstract Algebra I
  → Real Analysis II and Abstract Algebra II
We have tried not to overload the book with meaningless examples and exercises; however, we give enough
exercises to entertain even the most ambitious students. There are a few exercises in the text that need some
knowledge of other areas of mathematics. We expect students taking the course to have taken the calculus
sequence and some basic course on discrete mathematics or logic. The level of the exercises
varies. Most of the exercises are at a very basic level, where we simply check the understanding of the
subject. However, some of the exercises are challenging even for the most ambitious student.

Acknowledgments

This book grew out of lectures on the subject at the University of California at Irvine, the University of Idaho, the University
of Vlora, and Oakland University. I would especially like to thank the University of California at Irvine for giving
me the opportunity to teach the subject several times in a row during the time that the first draft of this book was
written, and the enthusiastic students at all the above schools who had to put up with rough drafts of the manuscript.

Tanush Shaska
Rochester, 2018

Chapter 1

Analytic geometry

Analytic geometry is the study of geometric objects such as lines, circles, ellipses, parabolas, and hyperbolas through
the use of algebra. In this chapter we will briefly review some of the major problems of analytic geometry and
show that the study of such problems was the main motivator for the development of what we now call linear
algebra.

1.1 Cartesian system of the plane


Consider the set of real numbers R. By R2 we denote the Cartesian product R × R. A point in R2 is an
ordered pair (a, b) of real numbers a, b ∈ R. The numbers a, b are called its coordinates. The point with coordinates (0, 0) is
called the origin.
We assume that the reader is familiar with the correspondence between the ordered pairs (a, b) and the points of the plane. For any two points P1(x1, y1) and P2(x2, y2) in the plane, the distance between them is given by

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}.$$

The distance of a point P(x, y) from the origin is

$$\|OP\| = \sqrt{x^2 + y^2}.$$

Any two distinct points determine a unique line. Let L be a line in the plane. Denote by θ the angle between the x-axis and the line L, measured counterclockwise starting from the x-axis. The slope of L is defined to be tan θ.

Figure 1.1: Coordinate plane

Lemma 1.1. For any two points P1(x1, y1) and P2(x2, y2) in the plane, the slope of the line P1P2 is

$$m = \frac{y_2 - y_1}{x_2 - x_1}.$$
The reader is invited to prove the following lemma as a trigonometry problem.
Lemma 1.2. For any two lines L1 and L2 with slopes m1 and m2 respectively, the angle φ between them, measured in the counterclockwise direction, satisfies

$$\tan \varphi = \frac{m_2 - m_1}{1 + m_1 m_2},$$

where m2 is the slope of the terminal line and m1 the slope of the initial line.

Corollary 1.1. Two lines are perpendicular if and only if

$$m_2 = -\frac{1}{m_1}.$$
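
These formulas are easy to check numerically. The book uses Maple for such experiments; the following is only a minimal illustrative sketch in Python (the function names are ours) computing the distance between two points, the slope of the line through them, and the angle between two lines of given slopes.

import math

def distance(p1, p2):
    """Distance between two points in the plane."""
    return math.sqrt((p2[0] - p1[0])**2 + (p2[1] - p1[1])**2)

def slope(p1, p2):
    """Slope of the line through p1 and p2 (the line must not be vertical)."""
    return (p2[1] - p1[1]) / (p2[0] - p1[0])

def angle_between(m1, m2):
    """Counterclockwise angle from a line of slope m1 to a line of slope m2
    (undefined when the lines are perpendicular, i.e. m1*m2 = -1)."""
    return math.atan((m2 - m1) / (1 + m1 * m2))

p1, p2 = (-4, -3), (4, 3)
print(distance(p1, p2))                          # 10.0
print(slope(p1, p2))                             # 0.75
print(math.degrees(angle_between(0, 1)))         # 45.0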


1.1.1 Relations and their graphs


A relation between sets X and Y is a set of ordered pairs, hence a subset of the Cartesian product X × Y. The set of
all first elements of these ordered pairs is called the domain and the set of all second elements is called the
set of values of the relation.
The graph of a relation is the set of points in the coordinate plane which satisfy the given relation. An equation
in x and y that is satisfied by all the points of the graph is called the equation of the graph.
An equation is called an algebraic equation when it is given by a polynomial equation

F(x, y) = 0.

A particular class of relations consists of those relations for which every x in the domain corresponds to a unique value
y in the set of values. Such relations are called functions.
When the domain and the set of values of a relation ∼ are the same, say X, we say that we have a relation ∼ on
X. A relation ∼ on X is called an equivalence relation if the following properties hold:
• reflexive: ∀x ∈ X, x ∼ x

• symmetric: ∀x, y ∈ X, x ∼ y =⇒ y ∼ x
• transitive: ∀x, y, z ∈ X, x ∼ y ∧ y ∼ z =⇒ x ∼ z.
The study of geometric objects using algebraic methods is the focus of algebraic geometry. It is exactly the
correspondence between geometric objects and algebraic equations that was a breakthrough in mathematics.

René Descartes, or (in Latin) Renatus Cartesius (31 March 1596 – 11 February 1650), was a French philosopher, mathematician, and scientist. Dubbed
the father of modern Western philosophy, much of subsequent Western
philosophy is a response to his writings, which are studied closely to this
day.
One of Descartes’ most enduring legacies was his development of Carte-
sian or analytic geometry, which uses algebra to describe geometry. He
"invented the convention of representing unknowns in equations by x, y,
and z, and knowns by a, b, and c". He also "pioneered the standard nota-
tion" that uses superscripts to show the powers or exponents; for example,
the 2 used in x2 to indicate x squared.
He was first to assign a fundamental place for algebra in our system of
knowledge, using it as a method to automate or mechanize reasoning, par-
ticularly about abstract, unknown quantities. European mathematicians
had previously viewed geometry as a more fundamental form of mathe-
matics, serving as the foundation of algebra. Algebraic rules were given
geometric proofs by mathematicians such as Pacioli, Cardan, Tartaglia and
Ferrari. Equations of degree higher than the third were regarded as unreal,
because a three-dimensional form, such as a cube, occupied the largest di-
mension of reality. Descartes professed that the abstract quantity a2 could
represent length as well as an area. This was in opposition to the teach-
ings of mathematicians, such as Vieta, who argued that it could represent
only area. Although Descartes did not pursue the subject, he preceded
Gottfried Wilhelm Leibniz in envisioning a more general science of algebra
or "universal mathematics," as a precursor to symbolic logic, that could
encompass logical principles and methods symbolically, and mechanize
general reasoning.

Figure 1.2: Renatus Cartesius


1.2 Polar coordinates


Another way to represent a point in the plane is via polar coordinates. Consider the following system. Let O be a fixed point in the plane and L an oriented line through O, as in Fig. 1.3. We call the point O the pole and the line L the polar axis.
If P is any point in the plane, we denote by r the distance r = |OP| and by θ the angle from the line L to the line OP, measured in the counterclockwise direction. The ordered pair (r, θ) is called the polar coordinates of P.

Figure 1.3: Polar coordinates

If P = O, then r = 0 and we agree that (0, θ) represents the pole for every value of θ. The definition of polar coordinates requires that r > 0. We extend the definition of (r, θ) to the case r < 0 by taking the point with coordinates (−r, θ) to be the point on the line through the pole and (r, θ), on the opposite side of the pole at distance |r| from it. Notice that (−r, θ) represents the same point as the coordinates (r, θ + π).
In a coordinate system we would prefer that every point has a unique representation. Since a rotation by 2π moves the point back to its original position, the point (r, θ) is also given by

(r, θ + 2nπ) and (−r, θ + (2n + 1)π),

where n is any integer. To avoid this we restrict θ to

$$-\frac{\pi}{2} \leq \theta \leq \frac{\pi}{2}.$$
The conversion between Cartesian coordinates and polar coordinates can now be made precise. We take the pole O as the origin of the Cartesian system and the x-axis to be the line L with the same orientation.
If P has Cartesian coordinates P(x, y) and polar coordinates (r, θ), then we have

$$\cos\theta = \frac{x}{r} \quad \text{and} \quad \sin\theta = \frac{y}{r}.$$

Hence,

$$x = r\cos\theta \quad \text{and} \quad y = r\sin\theta. \tag{1.1}$$

To find the polar coordinates when we know the Cartesian coordinates we have

$$r^2 = x^2 + y^2 \quad \text{and} \quad \tan\theta = \frac{y}{x}, \tag{1.2}$$

which come directly from Eq. (1.1). It is helpful now that θ ∈ (−π/2, π/2), since tan θ is one to one on the interval (−π/2, π/2). The angle θ is called the argument of the point. Let us see some examples.

Example 1.1. Convert the point (2, π/3) from polar coordinates to Cartesian coordinates.

Solution: Since r = 2 and θ = π/3, Eq. (1.1) gives

$$x = r\cos\theta = 2\cos\frac{\pi}{3} = 2 \cdot \frac{1}{2} = 1$$

$$y = r\sin\theta = 2\sin\frac{\pi}{3} = 2 \cdot \frac{\sqrt{3}}{2} = \sqrt{3}$$


Thus the point in Cartesian coordinates is (1, √3).

Example 1.2. Find the polar coordinates of the point P(1, −1), given in Cartesian coordinates.

Solution: The argument θ satisfies

$$\tan\theta = \frac{y}{x} = -1,$$

which implies that θ = −π/4. Since (1, −1) is in the fourth quadrant, r is positive and we have

$$r = \sqrt{x^2 + y^2} = \sqrt{1^2 + (-1)^2} = \sqrt{2}.$$

Thus, the polar coordinates are (√2, −π/4).

Example 1.3. Convert the point P(4, π) from polar coordinates to Cartesian coordinates.

Solution: Substituting for r = 4 and θ = π in Eq. (1.1) we have

x = 4 cos π = 4 · (−1) = −4
y = 4 sin π = 4 · (0) = 0

The Cartesian coordinates are P(−4, 0).
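
The conversions of Eq. (1.1) and Eq. (1.2) are easy to automate. Below is a small illustrative Python sketch (not part of the text's Maple material; the function names are ours) that reproduces Examples 1.1–1.3. Note that atan2 returns an angle in (−π, π], slightly more general than the restriction used above.

import math

def polar_to_cartesian(r, theta):
    """Eq. (1.1): x = r cos(theta), y = r sin(theta)."""
    return r * math.cos(theta), r * math.sin(theta)

def cartesian_to_polar(x, y):
    """Eq. (1.2) with r >= 0; atan2 picks the angle in the correct quadrant."""
    return math.hypot(x, y), math.atan2(y, x)

print(polar_to_cartesian(2, math.pi / 3))   # approximately (1.0, 1.732...), Example 1.1
print(cartesian_to_polar(1, -1))            # approximately (1.414..., -0.785...), Example 1.2
print(polar_to_cartesian(4, math.pi))       # approximately (-4.0, 0.0), Example 1.3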




1.2.1 Complex numbers


Let us briefly review the set of complex numbers and show that it is a vector space with scalars from the real
numbers R.
We denote by i the symbol such that i^2 = −1. The set of complex numbers, C, is the set

C := {a + bi | a, b ∈ R}.

Two complex numbers z1 = a + bi and z2 = c + di are equal if and only if a = c and b = d.

Definition 1.1. We define addition and multiplication as follows:
i) (a + bi) + (c + di) = (a + c) + (b + d)i
ii) (a + bi) · (c + di) = (ac − bd) + (ad + bc)i

Scalar multiplication is defined as

r · (a + bi) = ra + (rb) i,

for any r ∈ R and a + bi ∈ C.

Figure 1.4: Addition and multiplication of complex numbers

Both addition and scalar multiplication are illustrated geometrically in Fig. 1.4. We will say more about the geometric interpretation shortly. The following are two easy but important exercises which should be done at least once by every student.

Exercise 1.1. Prove that addition and multiplication in C, when restricted to R, are simply the usual addition and multiplication of R.


For the complex number z = a + bi we call a the real part of z and b the imaginary part of z. We denote them as follows:

Re(z) = a,   Im(z) = b.

For every z ∈ R we have z = Re(z) + 0 · i. If z = a + bi and a = 0, then z is called purely imaginary.


For a given z = a + bi we define its complex conjugate as z̄ := a − bi. The mapping

C → C,  z ↦ z̄   (1.3)

is a bijection and is called the complex conjugation map. If z ∈ C, then z is real if and only if z = z̄. The absolute value or modulus of z = a + bi, denoted by |z|, is defined to be

$$|z| = \sqrt{z\bar{z}} = \sqrt{a^2 + b^2}.$$
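
Most programming languages have a built-in complex type, which makes it easy to experiment with these definitions. The short Python session below is our own illustration (not the book's Maple examples); it computes the real and imaginary parts, the conjugate, and the modulus.

z = complex(3, 4)                  # z = 3 + 4i

print(z.real, z.imag)              # real and imaginary parts: 3.0 4.0
print(z.conjugate())               # complex conjugate: (3-4j)
print(abs(z))                      # modulus |z| = sqrt(3^2 + 4^2) = 5.0
print((z * z.conjugate()).real)    # z * conj(z) = |z|^2 = 25.0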

1.2.2 Geometric interpretation of complex numbers


Next we give a geometric representation of the complex numbers and primitive roots of unity.
We can represent z = a + bi as the point in the xy-plane having coordinates (a, b). The distance of this point from the origin is ρ = √(a^2 + b^2). Thus, ρ = |z|. The angle −π/2 ≤ θ ≤ π/2 such that tan θ = b/a is called the argument of z.
Notice that a = ρ cos θ and b = ρ sin θ. Hence,

z = ρ (cos θ + i sin θ) .

This is called the polar representation of z. We can use this polar representation to get a geometric interpretation
of the multiplication of complex numbers. Let z, w be any two complex numbers such that

z = r1 (cos α + i sin α), and w = r2 (cos β + i sin β).

Then,

$$z \cdot w = r_1 r_2 \left( \cos(\alpha + \beta) + i \sin(\alpha + \beta) \right),$$
$$\frac{z}{w} = \frac{r_1}{r_2} \left( \cos(\alpha - \beta) + i \sin(\alpha - \beta) \right). \tag{1.4}$$

Moreover, the following is true:

Lemma 1.3. (De Moivre’s formula) For any integer n ≥ 1

(cos θ + i sin θ)n = cos(nθ) + i sin(nθ).

Clearly, if z = r(cos θ + i sin θ) and n an integer ≥ 1 then

zn = rn (cos nθ + i sin nθ).

We will also make use of Euler's formula

$$e^{i\theta} = \cos\theta + i\sin\theta.$$

Euler’s formula is obtained by Taylor expansions.
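
De Moivre's formula, Euler's formula, and the polar multiplication rule of Eq. (1.4) can all be checked numerically. The Python sketch below is illustrative only; cmath.rect builds a complex number from a modulus and an argument, and cmath.phase returns the argument.

import cmath, math

theta, n = 0.7, 5
lhs = complex(math.cos(theta), math.sin(theta)) ** n    # (cos t + i sin t)^n
rhs = complex(math.cos(n*theta), math.sin(n*theta))     # cos(nt) + i sin(nt), De Moivre
print(abs(lhs - rhs) < 1e-12)                           # True

# Euler's formula: e^{i t} = cos t + i sin t
print(cmath.exp(1j * theta), complex(math.cos(theta), math.sin(theta)))

# Multiplication in polar form: moduli multiply, arguments add, as in Eq. (1.4)
z = cmath.rect(2, 0.3)          # r1 = 2, alpha = 0.3
w = cmath.rect(3, 1.1)          # r2 = 3, beta = 1.1
print(abs(z * w), cmath.phase(z * w))   # 6.0 and 1.4 (= 0.3 + 1.1)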


1.2.3 Roots of unity


We will see now a geometric interpretation of the roots of the simplest type of polynomial equation for complex
numbers, namely
zn − 1 = 0,
for any given n ≥ 1.
Denote by

$$\varepsilon_n = \cos\frac{2\pi}{n} + i \sin\frac{2\pi}{n}.$$

Using De Moivre's formula we have that ε_n^n = 1. This number εn is called a primitive n-th root of unity. All powers of εn lie on the unit circle and, raised to the power n, give 1. When there is no confusion about n we simply write ε for εn. Multiplying by ε simply rotates by the angle θ = 2π/n. Hence, after n rotations we return to the point z = (1, 0). See Example 2.26 in Section 2.6 for more on rotations in C. In the next figure we show how the powers of ε11 spread out on the unit circle.
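
The powers of εn are easy to generate numerically. The following Python sketch is our own illustration (the helper roots_of_unity is not from the text); it lists the n-th roots of unity and checks that each one satisfies z^n = 1.

import cmath, math

def roots_of_unity(n):
    """The n solutions of z^n = 1: the powers of eps_n = e^{2 pi i / n}."""
    eps = cmath.exp(2j * math.pi / n)
    return [eps ** k for k in range(n)]

for z in roots_of_unity(6):
    print(round(z.real, 3), round(z.imag, 3), round(abs(z), 3))    # all lie on the unit circle

print(max(abs(z**6 - 1) for z in roots_of_unity(6)) < 1e-12)       # each one satisfies z^6 = 1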

A complex number z is said to have order n ≥ 1 if z^n = 1 and z^m ≠ 1 for all 0 < m < n. All complex numbers z of order n are called the primitive n-th roots of unity. Primitive roots of unity are quite essential in the theory of general polynomial equations of higher degree.

Example 1.4. In Fig. 1.5 we show the solution set of the inequality ||z − z0|| ≤ 3, where z0 = 1 − i, in the complex plane.

Figure 1.5: ||z − z0|| ≤ 3

The geometry of the complex plane is a wonderful part of mathematics and it is used in many areas of science. In the remainder of these notes we will occasionally inject problems and ideas from the geometry of complex numbers, complex functions, dynamical systems, etc.

Exercises:

1.1. Solve the equation (z^4 − 1)(z^3 − 1) = 0.

1.2. Let z1 = 1/2 + i√3/2. Find z1^2, z1^19, and z1^22.

1.3. Let z = r1(cos α + i sin α) and w = r2(cos β + i sin β). Prove the following:
z · w = r1 r2 (cos(α + β) + i sin(α + β)).

1.4. Let z = r(cos α + i sin α). Show that z^n = r^n (cos nα + i sin nα) for any integer n ≥ 1.

1.5. Prove that ∀ u, v ∈ C, |u · v| = |u| · |v|.

1.6. Compute the modulus of the following complex numbers:
(i√2/2 − √2/2)^10,   (1/2 + i√3/2)^12.

1.7. Let S be the set S := {z ∈ C | ‖z‖ = 1}. Draw the graph of S. Find S ∪ R, S ∩ R, R \ S.

1.8. Prove the following:
i) $\overline{z + w} = \overline{z} + \overline{w}$
ii) $\overline{zw} = \overline{z} \cdot \overline{w}$

1.9. Solve the equation z^n − 1 = 0.

1.10. Let z = 1/2 + i√3/2. Compute z^2, z^3, z^4, z^5, z^7, z^8, z^9.

1.11. Let z = r1(cos α + i sin α) and w = r2(cos β + i sin β). Prove the following:
z · w = r1 r2 (cos(α + β) + i sin(α + β)).

1.12. Express the numbers 3e^{−4πi/6}, −3e^{2πi/n} in standard form.

1.13. Solve the following:
(z + 3/z^2)(3z^2 − 2z + 5) = 0.

1.14. Factor completely the polynomial p(z) = z^7 − 1.

1.15. Factor over Q the polynomial p(z) = z^5 − 1.

1.16. Does the equation z^4 + z^3 + z^2 + z + 1 = 0 have any rational solutions?

1.17. Can f(x) = (x^n − 1)/(x − 1) be expressed as a polynomial? What is that expression?

1.18. Compute the modulus of the following complex numbers: (i + 1)^10, (1/2 + i√3/2)^12.

1.19. Prove that for any rational number r ∈ Q, (cos θ + i sin θ)^r = cos(rθ) + i sin(rθ).

1.20. Let z = 1/2 + i√3/2. Compute z^2, z^3, z^4, z^5, z^7, z^8, z^9.

1.21. Express the numbers 3e^{−4πi/6}, −3e^{2πi/n} in standard form.

1.22. Solve the following:
(z + 3/z^2)(3z^2 − 2z + 5) = 0.

1.23. Factor completely the polynomial p(z) = z^7 − 1.

1.24. Factor over Q the polynomial p(z) = z^5 − 1.

1.25. A function f(z) is given by f : C → C, z ↦ (az + b)/(cz + d), where ad − bc = 1. Given that f(1) = −1, f(i) = i, f(2) = −1/2,
a) Find f(z) and then f(2), f(2i), f(1/2).
b) Let C = {z ∈ C s.t. |z| = 1, Re(z) ≥ 0} (half of the unit circle). Find f(C).

1.26. A function f(z) is given by f : C → C, z ↦ (az + b)/(cz + d), where ad − bc = 1. Given that f(i) = −i, f(3) = 1/3, f(−1) = −1,
a) Find f(z) and then f(2), f(2i), f(1/2).
b) Let C = {z ∈ C s.t. |z| = 2} be the circle with center at the origin and radius 2. Find f(C).

1.27. Let ε5 denote the primitive 5-th root of unity. Find the Möbius transformation f(x) such that f(0) = 0, f(1) = ε5, f(ε5) = ε5^2. Prove your answer. Find f(ε5^2), f(ε5^3), f(ε5^4).

1.28. Prove De Moivre's formula: for any integer n ≥ 1, (cos θ + i sin θ)^n = cos(nθ) + i sin(nθ).

1.3 Algebraic equations, planar algebraic curves


An algebraic equation is a polynomial equation. Let us recall that the degree of a polynomial in two variables is its
total degree. For example, the degree of the polynomial

f (x, y) = 1 + x + y + xy,

is 2. The degree of a polynomial f is denoted by deg f .


Since the simplest classification of polynomials, and therefore of algebraic equations, is based on their degrees, the following question is natural:
Question: What geometric shape does the graph of an algebraic equation of degree d have?

1.3.1 Lines
Let’s start with degree one equations. Then we have the following:


Lemma 1.4. i) The graph of every algebraic equation of degree one is a line.
ii) Every non-vertical line has equation
y = mx + b,
where m is the slope of the line.
The proof is done in high school algebra. More generally, we say that the equation of a line is given by

ax + by = c

Consider now the problem of finding the intersection of two lines L1 and L2 with equations

L1 : a1 x + b1 y = c1
L2 : a2 x + b2 y = c2

Hence we are looking for points which are in both L1 and L2, or in other words which satisfy the equations of both L1 and L2. The set of such points P(x, y) is denoted by

$$\begin{cases} a_1 x + b_1 y = c_1 \\ a_2 x + b_2 y = c_2 \end{cases}$$

and call it a system of two equations with two unknowns. We will see later that systems of equations are always
intersection sets of several geometric objects.
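
Eliminating one variable from such a 2×2 system gives explicit formulas for the intersection point, a computation that linear algebra will systematize later. The following Python sketch is our own illustration (the function intersect_lines is not from the text).

def intersect_lines(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1, a2*x + b2*y = c2.
    Returns the intersection point, or None if the lines are parallel."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None                      # parallel (or identical) lines
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y

# The lines x + y = 3 and x - y = 1 meet at (2, 1).
print(intersect_lines(1, 1, 3, 1, -1, 1))    # (2.0, 1.0)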

1.3.2 Circle and Ellipse


Let a point O(a, b) be given. The set of points in the plane at distance r > 0 from the point O is called a circle with
radius r and center O. This circle has equation

(x − a)2 + (y − b)2 = r2 .

Thus, we have:
Lemma 1.5. Every circle on the plane is given by an algebraic equation of degree 2.
Naturally, we would like to know whether every degree 2 algebraic equation represents a circle. The answer is
negative as we will see below.
Corollary 1.2. A circle with center at the origin and radius r has equation

x2 + y2 = r2 .

Let P1(a, b) and P2(c, d) be two points in the plane. The set of points P(x, y) of the plane such that the sum of distances PP1 + PP2 is constant, say

PP1 + PP2 = 2k,

is called an ellipse. The points P1 and P2 are called the foci of the ellipse. The equation of the ellipse is

$$\sqrt{(x - a)^2 + (y - b)^2} + \sqrt{(x - c)^2 + (y - d)^2} = 2k.$$

Lemma 1.6. Every ellipse on the plane can be written as a degree 2 algebraic equation.
Thus, a degree 2 polynomial equation can give us a circle or an ellipse and who knows what else?
Corollary 1.3. An ellipse with foci (c, 0) and (−c, 0) has equation

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1,$$

where b^2 + c^2 = a^2.


Hence, a degree two equation can give different geometric shapes. Or are the circle and the ellipse really that different? For example, are the circle

x^2 + y^2 = 1

and the ellipse

x^2 + y^2/4 = 1

really different? After all, the above ellipse can be transformed into the unit circle by shrinking the y-axis by a factor of two. In other words, if we make the substitution Y = y/2 in the equation of the ellipse, then we get

x^2 + Y^2 = 1,

which is the unit circle. Let's investigate further.

Ellipse
First let’s study the ellipse in more detail.
Definition 1.2. An ellipse is the set of all points of the plane the sum of whose distances from two fixed points F1 and F2 is constant. These two points are called the foci of the ellipse. The midpoint of the segment F1F2 is called the center of the ellipse.
Kepler's first law says that the planets orbit the Sun in trajectories that are ellipses, with the Sun at one of the foci of the ellipse.
To simplify the algebra we can assume that the foci of the ellipse are on the x-axis and the origin is at the center of the ellipse. Say the foci have coordinates (−c, 0) and (c, 0). Denote the sum of the distances of a point P(x, y) from the foci by 2a > 0.
Then P(x, y) is on the ellipse when

|PF1| + |PF2| = 2a.

Hence,

$$\sqrt{(x + c)^2 + y^2} + \sqrt{(x - c)^2 + y^2} = 2a$$

or

$$\sqrt{(x - c)^2 + y^2} = 2a - \sqrt{(x + c)^2 + y^2}.$$

Squaring both sides we have

$$x^2 - 2cx + c^2 + y^2 = 4a^2 - 4a\sqrt{(x + c)^2 + y^2} + x^2 + 2cx + c^2 + y^2$$

and by simplifying we get

$$a\sqrt{(x + c)^2 + y^2} = a^2 + cx.$$

We square again to get

$$a^2(x^2 + 2cx + c^2 + y^2) = a^4 + 2ca^2 x + c^2 x^2$$

and the equation becomes

$$(a^2 - c^2)x^2 + a^2 y^2 = a^2(a^2 - c^2).$$

Figure 1.6: Ellipse

From the triangle △F1F2P, we see that 2c < 2a. Thus, c < a, which implies a^2 − c^2 > 0. Denote b^2 = a^2 − c^2. Then the equation of the ellipse becomes

$$b^2 x^2 + a^2 y^2 = a^2 b^2.$$

By dividing both sides by a^2 b^2, we have

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \tag{1.5}$$

Since b^2 = a^2 − c^2 < a^2, we have b < a. The intersection of the ellipse with the x-axis is found by letting y = 0. Then x^2/a^2 = 1, or x^2 = a^2, which implies x = ±a. The corresponding points (−a, 0) and (a, 0) are called the vertices of the ellipse, and the segment that connects these two points is called the major axis of the ellipse. To find the intersection with the y-axis we take x = 0, which gives

$$\frac{y^2}{b^2} = 1 \implies y^2 = b^2 \implies y = \pm b.$$

Eq. (1.5) does not change when we substitute x with −x and y with −y. Hence, the ellipse is symmetric with respect to both axes. If the foci coincide, so c = 0, then a = b and the ellipse is a circle with radius r = a = b.
Summarizing all of the above, we have that the ellipse

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1, \qquad a \geq b > 0,$$

has foci (±c, 0), where c^2 = a^2 − b^2, and vertices (±a, 0). Let's see some examples.
Example 1.5. For the ellipse x^2/6 + y^2/4 = 1, is the major axis horizontal or vertical?

Solution: Since a > b and 6 > 4, a^2 is the denominator of x^2. Hence, the major axis is horizontal.

Example 1.6. Find a, b, c for the ellipse x^2/25 + y^2/16 = 1.

Solution: From the equation we have a = 5, b = 4. Thus, c = √(a^2 − b^2) = √(25 − 16) = √9 = 3.
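
Extracting a, b, c from the standard equation is a one-line computation. The Python sketch below (illustrative only; it assumes a horizontal major axis and center at the origin, with the denominators given as a^2 and b^2) reproduces Example 1.6.

import math

def ellipse_data(A, B):
    """Foci and vertices of x^2/A + y^2/B = 1, where A = a^2, B = b^2 and A > B."""
    a, b = math.sqrt(A), math.sqrt(B)
    c = math.sqrt(A - B)
    return {"vertices": [(-a, 0), (a, 0)], "foci": [(-c, 0), (c, 0)]}

print(ellipse_data(25, 16))    # Example 1.6: a = 5, b = 4, c = 3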

Example 1.7. Find the equation of the ellipse with foci (0, ±2) and vertices (0, ±3).

Solution: Using Eq. (1.5), we have c = 2 and a = 3. Then, b2 = a2 − c2 = 5. Thus, the equation of the ellipse is

$$\frac{x^2}{5} + \frac{y^2}{9} = 1.$$


1.3.3 Conics
Consider the following problem. We are given the equation z^2 = x^2 + y^2 in space. As we will see later, the graph of this equation is the set of points in R3 which satisfy it. Since for every value of z the cross-section is a circle with radius |z|, this graph is what we call a double cone; see Fig. 1.7.
Exercise 1.2. Let the double cone be given as above and let

ax + by + cz = d

be the equation of a plane in R3. Find the intersection of the double cone and the plane.

Notice that in R3 every polynomial equation of degree one represents a plane. To solve the problem we need to solve the system

$$\begin{cases} z^2 = x^2 + y^2 \\ ax + by + cz = d \end{cases}$$
Geometrically the set of solutions is given in Fig. 1.7.
Next, we will try to determine this solution set algebraically.


Figure 1.7: Conics

Parabola

Next we see another shape of graph from a degree two algebraic equation which is not an ellipse or a circle.

Definition 1.3. A parabola is the set of all points of the plane which are equidistant from a fixed line L and a fixed point F which is not on the line. The line L is called the directrix of the parabola and the point F the focus.

Notice that the midpoint of the perpendicular from the focus to the directrix belongs to the parabola. This point is called the vertex of the parabola. The perpendicular from the focus F to the directrix L is called the axis of the parabola.
To obtain a simple equation of the parabola we put its vertex at the origin O and its directrix parallel to the x-axis. If the focus of the parabola is the point (0, p), then the directrix has equation y = −p. If P(x, y) is a point on the parabola, then the distance from P to the focus is

$$|PF| = \sqrt{x^2 + (y - p)^2}$$

and the distance of P from the directrix is |y + p|. From the definition of the parabola these distances are equal, hence

$$\sqrt{x^2 + (y - p)^2} = |y + p|.$$

Figure 1.8: Parabola

Squaring both sides we get

$$x^2 + (y - p)^2 = |y + p|^2 = (y + p)^2$$
$$x^2 + y^2 - 2py + p^2 = y^2 + 2py + p^2$$
$$x^2 = 4py$$

Thus, the equation of the parabola with focus (0, p) and directrix y = −p is

$$x^2 = 4py. \tag{1.6}$$


If we substitute a = 1/(4p), then the standard equation of the parabola, Eq. (1.6), takes the form y = ax^2. The parabola opens upward if p > 0 and downward if p < 0. The graph is symmetric with respect to the y-axis, since Eq. (1.6) does not change when we replace x with −x.
If we interchange x and y in Eq. (1.6) we have

$$y^2 = 4px \tag{1.7}$$

which gives the equation of the parabola with focus (p, 0) and directrix x = −p. The parabola opens to the right if p > 0 and to the left if p < 0. In both cases the graph is symmetric with respect to the x-axis, which is the axis of the parabola.
If the parabola has its vertex at (h, k) and a vertical axis, then the standard form is

$$(x - h)^2 = 4p(y - k) \tag{1.8}$$

where p ≠ 0. The focus is the point (h, k + p) and the directrix is y = k − p. The axis is the line x = h. When p > 0 the parabola opens upward and when p < 0 it opens downward.
If the parabola has vertex (h, k) and a horizontal axis, then its standard form is

$$(y - k)^2 = 4p(x - h) \tag{1.9}$$

where p ≠ 0. Its focus is the point (h + p, k) and its directrix is the line x = h − p. Its axis is the line y = k. When p > 0 the parabola opens to the right and when p < 0 it opens to the left.
Both Eq. (1.8) and Eq. (1.9) are obtained from Eq. (1.6) and Eq. (1.7) by shifting the parabola, that is, by substituting x and y with x − h and y − k respectively.
Example 1.8. Find the focus, directrix, vertex, and axis of the parabola x2 − 4y = 0.

Solution: We write the equation as x^2 = 4y and comparing it with Eq. (1.6), we have 4p = 4. Thus, p = 1. Hence, the
focus is (0, p) = (0, 1), the directrix is y = −p = −1, the vertex is (0, 0), and the axis is x = 0.
Example 1.9. Find the focus, directrix, vertex, and axis of the parabola y^2 + 10x = 0.

Solution: We have y^2 = −10x and comparing it with Eq. (1.7) we get 4p = −10. Thus, p = −5/2. Hence, the focus is (p, 0) = (−5/2, 0), the directrix is x = −p = 5/2, the vertex is (0, 0), and the axis is y = 0.
Example 1.10. Find the focus, directrix, vertex, and axis of the parabola y2 + 4x − 2y − 3 = 0.

Solution: Let us see if we can transform the equation in the standard form

y2 + 4x − 2y − 3 = (y − 1)2 + 4x − 4 = (y − 1)2 + 4(x − 1) = 0


(y − 1)2 = −4(x − 1)
The equation is of the form of Eq. (1.9), with 4p = −4, so p = −1. Hence, its vertex is (h, k) = (1, 1), the focus is (h + p, k) = (0, 1), the directrix is x = h − p = 2, and the axis is the line y = k = 1.
Example 1.11. Find the focus, directrix, vertex, and axis of the parabola x2 − 4x − 8y + 28 = 0.

Solution: Let us see if we can transform the equation in the standard form

x2 − 4x − 8y + 28 = (x − 2)2 − 8(y − 3) = 0
(x − 2)2 = 8(y − 3)
The equation resembles Eq. (1.8). Thus, the vertex is (h, k) = (2, 3), focus (h, k + p) = (2, 5), directrix y = k − p = 3 − 2 = 1,
and the axis x = h = 2. 


Example 1.12. Find the focus, directrix, vertex, and axis of the parabola x2 − 10x − 2y + 29 = 0.

Solution: Let us see if we can transform the equation in the standard form

x2 − 10x − 2y + 29 = (x − 5)2 − 2(y − 2) = 0


(x − 5)2 = 2(y − 2)

The equation resembles Eq. (1.8). Thus, the vertex is (h, k) = (5, 2), focus (h, k + p) = (5, 2.5), directrix y = k − p = 2 − 0.5 =
1.5, and the axis the line with equation x = h = 5. 
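
Completing the square as in the examples above can be packaged into a small routine. The Python sketch below is purely illustrative and handles only parabolas with a vertical axis written as x^2 + Dx + Ey + F = 0; it reproduces Example 1.11.

def parabola_vertical(D, E, F):
    """Vertex, focus, and directrix of x^2 + D*x + E*y + F = 0 (E != 0),
    rewritten as (x - h)^2 = 4p(y - k) as in Eq. (1.8)."""
    h = -D / 2
    four_p = -E
    k = (h * h + D * h + F) / (-E)      # solve for y at x = h
    p = four_p / 4
    return {"vertex": (h, k), "focus": (h, k + p), "directrix_y": k - p}

# Example 1.11: x^2 - 4x - 8y + 28 = 0 has vertex (2, 3), focus (2, 5), directrix y = 1.
print(parabola_vertical(-4, -8, 28))
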
We leave the reader with the following problem.
Exercise 1.3. Given a degree 2 equation, how can we determine if this equation represents a parabola? For example, can you
prove that the graph of the equation
4x2 + 12xy + 9y2 − 4x + 4y = 0
is a parabola?

Hyperbola
Next we see another shape obtained also from a degree two equation.
Definition 1.4. A hyperbola is the set of all points P(x, y) of the plane whose difference of distances from two fixed points F1 and F2 is constant; in other words, all points P(x, y) such that

|PF1| − |PF2| = ±2a

for some constant a > 0. The points F1 and F2 are called the foci of the hyperbola. The midpoint of the segment that connects the foci is called the center of the hyperbola. The line that connects the two foci is called the axis of the hyperbola.

Notice that the definition of the hyperbola is similar to the definition of the ellipse. The only difference is that the sum of the distances in the definition of the ellipse is replaced by the difference of the distances. Determining the equation of the hyperbola is similar to that of the ellipse.

Figure 1.9: Hyperbola
If the foci are on the x-axis, say at (±c, 0), and the difference of the distances is

|PF1| − |PF2| = ±2a,

then the equation of the hyperbola is

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1 \tag{1.10}$$

where c^2 = a^2 + b^2. The intersection points with the x-axis are ±a, and the points (a, 0) and (−a, 0) are the vertices of the hyperbola. If we take x = 0 in Eq. (1.10), then we have y^2 = −b^2, which is impossible. Hence, the hyperbola has no intersection points with the y-axis. The hyperbola is symmetric with respect to both axes.
We can transform Eq. (1.10) to the form

$$\frac{x^2}{a^2} = 1 + \frac{y^2}{b^2} \geq 1.$$

Thus, x^2 ≥ a^2. Hence, |x| = √(x^2) ≥ a. Therefore, x ≥ a or x ≤ −a.
So the hyperbola has two parts, or branches. The lines y = (b/a)x and y = −(b/a)x are asymptotes of these branches.
Summarizing, we have that the hyperbola

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$$


has foci (±c, 0), where c^2 = a^2 + b^2, horizontal axis, vertices (±a, 0), and asymptotes y = ±(b/a)x.
If the foci are on the y-axis, then interchanging x and y in Eq. (1.10) we have that the hyperbola

$$\frac{y^2}{a^2} - \frac{x^2}{b^2} = 1 \tag{1.11}$$

has foci (0, ±c), where c^2 = a^2 + b^2, vertical axis, vertices (0, ±a), and asymptotes y = ±(a/b)x.
b
The standard equation of the hyperbola with center (h, k) and horizontal axis is

$$\frac{(x - h)^2}{a^2} - \frac{(y - k)^2}{b^2} = 1. \tag{1.12}$$

The distance between the two vertices is 2a. The vertices are (h ± a, k), and the distance between the foci is 2c. The foci are (h ± c, k), where c^2 = a^2 + b^2. The equations of the asymptotes are

$$y = k \pm \frac{b}{a}(x - h).$$

The standard equation of the hyperbola with center (h, k) and vertical axis is

$$\frac{(y - k)^2}{a^2} - \frac{(x - h)^2}{b^2} = 1. \tag{1.13}$$

The distance between the vertices is 2a. The vertices are (h, k ± a), and the distance between the foci is 2c. The foci are (h, k ± c), where c^2 = a^2 + b^2. The equations of the asymptotes are

$$y = k \pm \frac{a}{b}(x - h).$$
Let us illustrate with a few examples.
Example 1.13. If the equation of the hyperbola is

$$\frac{y^2}{4} - \frac{x^2}{36} = 1,$$

then the axis of the hyperbola is vertical, a = 2, b = 6, and each focus is at distance c = √(a^2 + b^2) = 2√10 from the center of the hyperbola. The foci are (0, −2√10) and (0, 2√10). The asymptotes have equations y = (1/3)x and y = −(1/3)x, since a/b = 1/3.

Example 1.14. Suppose that the center of the hyperbola is at (2, −1) and its equation is

$$\frac{(x - 2)^2}{9} - \frac{(y + 1)^2}{16} = 1,$$

from which we have a = 3 and b = 4. The slopes of the asymptotes are ±b/a = ±4/3. The axis of the hyperbola is horizontal. The foci are (7, −1) and (−3, −1). The vertices of the hyperbola are (−1, −1) and (5, −1).

Example 1.15. Find the equation of the hyperbola with center at (2, −1) when a = 3, b = 4, and the axis is horizontal.

Solution: The equation of the hyperbola is

$$\frac{(x - 2)^2}{3^2} - \frac{(y + 1)^2}{4^2} = 1$$



and c = a2 + b2 = 5. The distance between two vertices is 2a = 6. Vertices are (h − a, k) = (−1, −1) and (h + a, k) = (5, −1).
The distance between to foci is 2c = 10. The foci are (h − c, k) = (−2, −1) and (h + c, k) = (7, −1). The equations of the
asymptotes are
b 4 b 4
y = k − (x − h) = −1 − (x − 2) dhe y = k + (x − h) = −1 + (x − 2)
a 3 a 3

Example 1.16. Find the foci and the equation of the hyperbola with vertices (0, ±1) and asymptote y = 2x.

Solution: From Eq. (1.11) we have a = 1 and a/b = 2. Hence, b = a/2 = 1/2 and c^2 = a^2 + b^2 = 5/4. Then the foci are (0, ±√5/2) and the equation of the hyperbola is y^2 − 4x^2 = 1.
Example 1.17. Determine the conic and find its foci

9x2 − 4y2 − 72x + 8y + 176 = 0

Solution: We first complete the squares to get the equation in standard form. Hence,

4(y^2 − 2y) − 9(x^2 − 8x) = 176

implies

4(y − 1)^2 − 9(x − 4)^2 = 36,

or in other words

$$\frac{(y - 1)^2}{9} - \frac{(x - 4)^2}{4} = 1.$$

Thus the equation is of the form of Eq. (1.13). Therefore, a^2 = 9, b^2 = 4, c^2 = 13. The foci are (4, 1 + √13) and (4, 1 − √13), while the vertices are (4, 4) and (4, −2). The equations of the asymptotes are y = 1 ± (3/2)(x − 4).
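
For conics without a mixed xy term, the completing-the-square computation of the example above can be mechanized. The Python sketch below is only illustrative; it assumes both squared terms are present (so purely parabolic cases, where A or C is zero, are not handled).

def classify_conic(A, C, D, E, F):
    """Classify A*x^2 + C*y^2 + D*x + E*y + F = 0 (no xy term, A != 0, C != 0)
    by completing the square. Returns the type, the center, and the constant
    on the right-hand side of A(x-h)^2 + C(y-k)^2 = rhs."""
    h, k = -D / (2 * A), -E / (2 * C)
    rhs = A * h * h + C * k * k - F
    kind = "ellipse/circle" if A * C > 0 else "hyperbola"
    return kind, (h, k), rhs

# Example 1.17: 9x^2 - 4y^2 - 72x + 8y + 176 = 0
# is a hyperbola centered at (4, 1), with 9(x-4)^2 - 4(y-1)^2 = -36.
print(classify_conic(9, -4, -72, 8, 176))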

Exercises:

Find the vertex, focus, and directrix of the parabola and sketch its graph.

1.29. x = (1/2) y^2

1.30. y = −4x^2

1.31. (x + 1)^2 = 4(y − 2)

1.32. x^2 − 6x + (1/2) y = 8

1.33. y^2 = 8x

1.34. x − 1 = (y + 3)^2

1.35. x^2 + 2x + 12y + 25 = 0

1.36. (1/36) x^2 − 13 = y − 13

Find the vertices and the foci of the ellipse and sketch its graph.

1.37. x^2/1 + y^2/4 = 1

1.38. x^2/16 + y^2/4 = 1

1.39. x^2/9 + y^2/25 = 1

1.40. x^2 + 4y^2 + 8x − 8y + 16 = 0

1.41. 490x^2 + 90y^2 − 4410 = 0

1.42. (x + 4)^2/25 + (y − 8)^2/81 = 1

1.43. (x + 6)^2/16 + (y + 5)^2/4 = 1

1.44. (x + 2)^2/49 + (y − 1)^2/16 = 1

1.45. (x − 1)^2/256 + (y + 7)^2/9 = 1

Find the vertices, foci, and asymptotes of the hyperbola and sketch its graph.

1.46. x^2/36 − y^2/81 = 1

1.47. y^2/4 − x^2/36 = 1

1.48. 75x^2 − 27y^2 = 675

1.49. x^2 − 4y^2 + 10x − 48y − 135 = 0

1.50. (y + 3)^2/25 − (x + 7)^2/64 = 1

1.51. (5x^2)/4 − 5y^2 = 80

Identify the conic and find its vertices and foci.

1.52. 4x^2 + y^2 + 24x − 6y + 9 = 0

1.53. y^2 − 2y + 40x + 281 = 0

1.54. 4x^2 + y^2 + 32x + 6y + 57 = 0

1.55. x^2 + y^2 − 6x + 4y + 12 = 0

1.56. y^2 − x^2 + 2y − 14x − 57 = 0

1.57. 4x^2 − y^2 − 56x − 4y + 176 = 0

1.58. x^2 = y^2 + 1

1.59. y^2 + 2y = 4x^2 + 3

1.60. 4x^2 + 4x + y^2 = 0

1.61. y^2 − 8y = 6x − 16

1.62. (1/12) x^2 + 62 = y + 62

1.4 Conics with mixed terms


Consider the following problem, which looks very innocent and elementary.
Exercise 1.4. Let a curve be given with the equation

13x2 − 18xy + 9y2 − 40x = −64 (1.14)

Graph the set of solutions of this equation.

We don't yet have a methodical approach for graphing the set of solutions of this equation. We are not even sure what shape the graph might have. If we use a computer algebra package, we get the graph in Figure 1.10.
The graph looks like an ellipse. Is it really an ellipse? Can you find
algebraic substitutions which do not change the shape of the graph
and make the equation easier to graph? After all, if this is really an
ellipse, shouldn’t we be able to move the coordinate system such that
it is right in the center of this ellipse? In other words, can we find
algebraic substitutions for x and y such that this equation becomes

$$\frac{X^2}{A^2} + \frac{Y^2}{B^2} = 1$$

for some real numbers A and B?
It can be easily verified that Equation 1.14 can be written as

$$9(x - y)^2 + 4(x - 5)^2 = 36.$$

Figure 1.10: The graph of Equation 1.14

Thus, by letting

X = x − 5 and Y = x − y

we get

$$\frac{X^2}{3^2} + \frac{Y^2}{2^2} = 1 \tag{1.15}$$
which definitely seems nicer than Equation 1.14. This "new" ellipse has axes of length 6 and 4 and they seem, at
least visually, to be close to the original ellipse.
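
We can at least verify numerically that the substitution does not change the solution set: points taken on the ellipse of Eq. (1.15), after undoing the substitution, should satisfy Eq. (1.14). The following short Python check is our own illustration.

import math

def on_original_conic(x, y):
    """Zero exactly on the curve of Eq. (1.14): 13x^2 - 18xy + 9y^2 - 40x = -64."""
    return 13*x**2 - 18*x*y + 9*y**2 - 40*x + 64

# Parametrize X^2/9 + Y^2/4 = 1 (Eq. (1.15)), then undo X = x - 5, Y = x - y.
for k in range(8):
    t = 2 * math.pi * k / 8
    X, Y = 3 * math.cos(t), 2 * math.sin(t)          # a point of the "new" ellipse
    x = X + 5
    y = x - Y
    print(round(on_original_conic(x, y), 10))        # 0.0 for every sample point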


Question 1.1. Can we find a methodological approach to solve this problem or similar problems like this one?

Stated a bit differently, given a degree 2 equation, can we determine a methodological approach so we can make
the right substitutions and that the equation is transformed into a nicer one?

Question 1.2. Can this be done for quadratic surfaces (degree 2 equations in space)? What about higher degree equations?

In the process, we will have to understand and answer the following three questions:

Question 1.3. i) Which substitutions preserve the shape of graphs?

ii) Which substitutions preserve the size of graphs?

iii) How do we determine such substitutions?

A student who learns to answer all three parts of the question above, even just for degree 2 equations, will
have learned linear algebra. In the process we will learn a beautiful theory and about the people who developed it.

1.5 Vectors in Physics and Geometry


Let's recall a few facts from high school physics or geometry. In this section we will denote by R the plane or
3-dimensional space and let n = 2, 3. Then every point in R is represented uniquely by an ordered tuple (x1, . . . , xn).
Let's also denote the set of all 'vectors' in R by S.
In this set S we define the following relation: u ∼ v if the following hold

i) u and v are parallel

ii) have the same length

iii) have the same direction

You should prove that this is indeed an equivalence relation. A vector is then defined as an equivalence class of
the above relation. Denote the set of all such equivalence classes by S/∼. Hence, this is the set of all vectors.
Moreover, the above three conditions are geometrically equivalent to moving the vector u parallel to itself onto
v. So we can assume that all vectors of S/∼ start at the origin O of the space R.
Thus, there is a one-to-one correspondence between the elements of S/∼ and the points of R, namely

u = OP ←→ P = (x, y) or P = (x, y, z).

Hence, a vector u = OP is an ordered tuple (u1, . . . , un) for n = 2, 3, and will be denoted by the column

$$\mathbf{u} = \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix},$$

in order to distinguish it from the point P(u1, . . . , un). In the next section we will generalize this concept to Rn.

1.5.1 The plane R2


We are familiar with the notion of a vector in the real plane R2 . In this section we will briefly review some of the
properties of vectors in R2 and extend these concepts to Rn. A vector in R2 is an ordered pair

$$\mathbf{v} := \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}, \quad \text{where } v_1, v_2 \in \mathbb{R}.$$

For any two vectors u = (u1, u2) and v = (v1, v2) we define addition and scalar multiplication in the usual way:

$$\mathbf{u} + \mathbf{v} := \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \end{bmatrix}, \quad \text{and} \quad r \cdot \mathbf{u} := \begin{bmatrix} r u_1 \\ r u_2 \end{bmatrix}, \tag{1.16}$$

where r ∈ R.

Figure 1.11: Multiplying by a scalar

Figure 1.12: Addition of vectors

Geometrically, scalar multiplication r·u is described as in Fig. 1.11: r·u is a new vector with the same direction as u and length r times the length of u. The addition of two vectors u and v is described geometrically in Fig. 1.12.

Example 1.18. Prove that such definitions agree with addition and scalar multiplication defined in Eq. (1.16)

1.5.2 The space R3

Next we review briefly the geometry of the space and vectors in R3 .


Recall that R3 is the Cartesian product R × R × R = {(x, y, z) | x, y, z ∈ R}, and a point P in R3 is represented by an ordered triple (x, y, z), as shown in Fig. 1.13.
From our discussion above, there is a one-to-one correspondence between points in R3 and vectors in space, namely the point P corresponds to the vector OP and vice versa.
Hence, a vector v in R3 is an ordered triple (v1, v2, v3), denoted by

$$\mathbf{v} := \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix},$$

where v1, v2, v3 ∈ R are called the coordinates of v.

Figure 1.13: Coordinates of P(x, y, z).

Notice that we will denote a point by the ordered triple P(x, y, z) and will always distinguish it from the vector OP with coordinates x, y, z.


For any two vectors u = (u1, u2, u3) and v = (v1, v2, v3) we define addition and scalar multiplication as in R2, namely

$$\mathbf{u} + \mathbf{v} := \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3 \end{bmatrix}, \qquad r \cdot \mathbf{u} := \begin{bmatrix} r u_1 \\ r u_2 \\ r u_3 \end{bmatrix},$$

where r ∈ R.
Since any two generic lines determine a plane, the geometric interpretation of addition and scalar multiplication in R2 is still valid in R3.
Let two points P1(x1, y1, z1) and P2(x2, y2, z2) in R3 be given. We will show that the distance |P1P2| between the two points is

$$|P_1 P_2| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}.$$

To verify this formula we construct a parallelepiped in which the points P1 and P2 are vertices across from each other, as in Fig. 1.14. If A(x2, y1, z1) and B(x2, y2, z1) are the other vertices as in Fig. 1.14, then

|P1A| = |x2 − x1|,   |AB| = |y2 − y1|,   |BP2| = |z2 − z1|.

Since the triangles △P1BP2 and △P1AB are right triangles, from the Pythagorean theorem we have

|P1B|^2 = |P1A|^2 + |AB|^2
|P1P2|^2 = |P1B|^2 + |BP2|^2

Combining the two equations we have

|P1P2|^2 = |P1A|^2 + |AB|^2 + |BP2|^2 = |x2 − x1|^2 + |y2 − y1|^2 + |z2 − z1|^2 = (x2 − x1)^2 + (y2 − y1)^2 + (z2 − z1)^2.

Figure 1.14: Distance between two points

Thus,

$$|P_1 P_2| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}.$$

The distance between a point P(x, y, z) and the origin is

$$|OP| = \sqrt{x^2 + y^2 + z^2}.$$

Equation of the sphere

The equation of the sphere with center at the point with coordinates (x0 , y0 , z0 ) and radius r is

(x − x0 )2 + (y − y0 )2 + (z − z0 )2 = r2
To prove this we just need the definition of the sphere, which is the set of all points P(x, y, z) equidistant from the
fixed point Q(x0 , y0 , z0 ) with a distance r from it. Thus, |PQ| = r. Squaring both sides we have |PQ|2 = r2 or

(x − x0 )2 + (y − y0 )2 + (z − z0 )2 = r2

When the center of the sphere is at the origin we have

x2 + y2 + z2 = r2
However, not every sphere has an equation as above. Consider the following example:

21
Linear Algebra Shaska T.

z
z
(x, y, z)
(x, y, z)
r

y (x0 , y0 , z0 )
0

y
0
x
x
(a) Radius r, center (0, 0, 0) (b) Radius r, center (x0 , y0 , z0 )

Figure 1.15: Sphere in R3

Example 1.19. Prove that the following equation represent a sphere and find its radius and its center

4x2 + 4y2 + 4z2 − 8x + 16y = 1

Solution: Complete squares of the binomials and we have

21
(x − 1)2 + (y + 2)2 + z2 =
4
q
Thus the equation represents a sphere with center (1, −2, 0) and radius 21
4 . 
In Chapter ?? we will show how to use methods of linear algebra and have a methodological way of doing this
for every quadratic surface.

Keep in mind that an equation in variables x and y represents a


curve in R2 and a surface in R3 . We illustrate with an example.
Example 1.20. Construct the graph of x2 + y2 = 4 in R2 and R3

Solution: In R2 this equation represents a circle with radius 2 and


center at the origin.
In R3 it is a right cylinder with radius r = 2 and exists the z-axis as in
Fig. 1.16

Properties of vector addition and multiplying by a scalar we can
Figure 1.16: x2 + y2 = 4 in R3
summarize below:
Theorem 1.1. If u, v, w are three vectors in R3 and c and d are scalar, then the following hold:

1. u + v = v + u

2. u + (v + w) = (u + v) + w

3. u + 0 = u

4. u + (−u) = 0

5. c(u + v) = cu + cv

22
Shaska T. Linear Algebra

6. (c + d)u = cu + du

7. (cd)u = c(du)
8. 1u = u
The proof is left as an exercise to the reader.

Let’s denote by V3 the set of all vectors in the 3-dimensional space R3 . Three vectors which play a special role
in V3 are  
1
 
0
 
0
i = 0 j = 1 , k = 0
     
0 0 1
     

These vectors are called vectors of the standard basis. We will explain this terminology in more detail in the coming
sections.  
a
If u = b, then we have
 
c
 

a a 0 0


         
1
   
0 0
u =   =   +   +   = a   + b 1 + c 0 = ai + bj + ck
b 0 b 0 0    
c 0 0 c 0 0 1
             

Hence, every vector in R3 is expressed in terms of vectors i, j, k.


A vector u is called a unit vector if it has length 1. For example, vectors i, j, k are unit vectors. A unit vector
which has the same direction with a given vector u is a vector

1 u
u= .
kuk kuk
In the next section we will formalize such definitions to the case of Rn . The reader should make sure to fully
understand the concepts from R2 and R3 before proceeding to Rn .

Exercises:

1.63. Find the lengths of the sides of the triangle with vertices 1.70. Find the equation of the sphere which passes through the
A(3, −2, 1), B(1, 2, −3), C(3, 4, −2). Determine if this triangle point (4, 3, −1) and has the center at (3, 8, 1)
is regular.
Prove that the following equations represent a sphere,
1.64. Finds the distance of the point (−5, 3, 4) from each coor- find its center and its radius.
dinate plane. 1.71. x2 + y2 + z2 − 6x + 4y + 2z = −17
1.65. Find the magnitude of the force which has its projections 1.72. x2 + y2 + z2 = 4x − 2y
on the coordinate axis as x = −6, y = −2, and z = 9.
1.73. x2 + y2 + z2 = x + y + z
1.66. Prove that the triangle with vertices A(1, −2, 1) 1.74. x2 + y2 + z2 + 2x + 8y − 4z = 28
B(3, −3, 1) and C(4, 0, 3) is a right triangle.
1.75. 16x2 + 16y2 + 16z2 − 96x + 32y = 5
1.67. Find the equation of √ the sphere with center at the point
1.76. (a) Prove that the middle of the segment which is deter-
(4, −2, 3) and radius r = 3.
mined by the points A(a1 , b1 , c1 ) and B(a2 , b2 , c2 ) is the point
1.68. Find the equation of with coordinates
√ the sphere with center at the point
(−1, 3, 2) and radius r = 3. a1 + a2 b1 + b2 c1 + c2
!
, ,
2 2 2
1.69. Find the equation of the sphere with center at the point
(2, 3, 4) and radius 5. Where does the sphere intersect the (b) Find the lengths of the three medians of the triangle with
coordinate planes? vertices A(4, 1, 5), B(1, 2, 3), C(−2, 0, 5)

23
Linear Algebra Shaska T.

 
Determine the inequalities which determine the follow-  2 
ing regions. 1.86. Find a + b, 3a − 2b, kak and |a − b|, if a = −4 and
 
4
 
1.77. The region between the plane xy and z = 5
 
 0 
b =  2 .
 
1.78. The region which consists of all points between spheres −1
 
of radii r and R with center at the origin, where r < R.
1.87." Find # v + w," v −# w, kvk and |v − w|, |v + w|, and −2v, if
1.79. Find the equation of the sphere with has the same center 1 −1
v= and w = .
with x2 + y2 + z2 − 6x + 4z − 36 = 0 and passes through the 3 −5
point (2, 5, −7).
1.88. Find v + w, v − w, kvk and |v − w|, |v + w|, and −2v, if
−1
 
1
 
1.80. Prove that the set of all points whose distance from
A(−1, 5, 3) is twice the distance from B(6, 2, −2), is a sphere. v = 2 and w =  2 
   
3 −3
1.81. Determine an equation for the set of points equidistant 1.89. Find v + w, v − w, kvk and |v − w|, |v + w|, and −2v, if
from A(−1, 5, 3) and B(6, 2, −2).  
1 −1
 
= =
0 −2
−→ v   and w  
1.82. Draw the vector AB, when A and B are given as below
 
1 2
 
and find its equivalent with the initial point at the origin.
1.90. Find the unit vector which has the same direction with
i) A = (0, 3, 1), B = (2, 3, −1) the vector −3 i + 7 j.
1.91. Find the unit vector which has the same direction with
ii) A = (4, 0, −2), B = (4, 2, 1) the vector 2 i − j + 3 k.
iii) A = (2, 0, 3), B = (3, 4, 5) 1.92. Find the unit vector which has the same direction with
 2 
iv) A = (0, 3, −2), B = (2, 4, −1) the vector  3 .
 
−2
 
1.83. Find the sum of the vectors and illustrate geometrically
1.93. Find
  a vector which has the same direction with the
" # " # 3
1 4
, vector 2, but has length 3
 
i)
−4 3 1
 
 
 1 
 
2 1.94. Find
  a vector which has the same direction with the
ii) −3 ,
  0 −2
vector  4 , but has length 6.
   
5 1
 
2
 
" # " #
2 3 −1
   
iii) , 3
1 −1 1.95. Let be given the vectors v =  5  and w = 1.
   
−2 1
   
 
1
 
3 1. Find the vector u such that u + v + w = i.
iv) 1 , 2
 
2. Find the vector u such that u + v + w = 2 j + k.
 
2 1
 
−→ −→ −−→
1.96. If A, B, C are vertices of a triangle, find AB + BC + CA.
" #
5
1.84. Find a + b, 2a − 3b, kak and |a − b|, if a = and
−12
" # " # " #
3 2 7
" # 1.97. Draw the vectors a = , b= and c = . De-
3 2 −1 1
b= .
6 termine graphically if there exist the scalars s and t such that
  c = sa + tb. Find the values for s and t.
 1  1.98. Let be given ~x and ~y two nonzero vectors not parallel in
1.85. Find a − b, a + 2b, kak and |a − b|, if a =  2  and
R2 . Prove that if ~z is any vector in R2 , then there exist two
−3
 
scalars s and t such that ~z = s~
x + t~y.
−2
 
 
b =  .
−1 1.99. Is the property from the previous problem true for R3 ?
5
  Explain.

24
Shaska T. Linear Algebra

x x0 
   
of all points (x, y, z) which satisfy
1.100. Let u =  y and v =  y0  be given vectors in R2 . De-
   
z z0
   
scribe the set of all points (x, y, z) which satisfy |u − v| = 1. |a − a1 | + |a − a2 | = k,
" # " # " #
x x1 x
1.101. Let a = ,a = , and a2 = 2 . Describe the set
y 1 y1 y2 where k > |a1 − a2 |

25
Linear Algebra Shaska T.

26
Chapter 2

Euclidean spaces, linear systems

We start this chapter with the familiar notion of Euclidean spaces


Rn which we know from high school. Intuition from R2 and R3 will
be used to generalize concepts for Rn including the norm, dot product
of vectors, angles among vectors, and the geometry of R2 and R3 . z
plane-yz
x
In Section 2.3, we introduce the matrices and their algebra. Using e-
n
matrices to solve linear systems of equations involves computing la
p
the row-echelon form and the reduced row-echelon form of matrices. y
These are the so-called Gauss algorithm and Gauss - Jordan algorithm
and are studied in section 4 and 5. In section 6 we study the inverses
plane-xy
of matrices and algorithms of computing such matrices.
x

Figure 2.1: Euclidean space R3 .

2.1 Euclidean n- space Rn


Let Rn be the following Cartesian product

Rn := {(x1 , . . . , xn ) | xi ∈ R}

u1 
 

A vector u in R will be defined as an ordered tuple (u1 , . . . , un ) for ui ∈ R, i = 1, . . . , n and denoted by u =  ... . For
n
 
 
un
u1  v1 
   

any u, v ∈ Rn such as u =  ...  and v =  ...  we define the vector addition and scalar multiplication as follows:
   
   
un vn

 u1 + v1  rv1 
   

u + v :=  ...  , r v :=  ...  .


   
(2.1)
   
un + vn rvn

A Euclidean n-space is the set of vectors together with vector addition and scalar multiplication defined as
0
 

above. Elements of R are called vectors and all r ∈ R are called scalars. The vector 0 =  ...  is called the zero vector.
n
 
 
0

27
Linear Algebra Shaska T.

By a vector u we usually mean a column vector unless otherwise stated. The row vector [u1 , . . . , un ] is called the
transpose of u and denoted by
ut = [u1 , . . . , un ]
For the addition and scalar multiplication we have the following properties.

Theorem 2.1. Let u, v, w be vectors in Rn and r, s scalars in R. The following are satisfied:
1) (u + v) + w = u + (v + w),
2) u + v = v + u,
3) 0 + u = u + 0 = u,
4) u + (−u) = 0,
5) r (u + v) = ru + rv,
6) (r + s) u = r u + s u,
7) (rs) u = r (s u),
8) 1 u = u.
Proof. Exercise. 
Two vectors v and u are called parallel if there exists an r ∈ R such that v = r u. Given vectors v1 , . . . , vs ∈ Rn and
r1 , . . . , rs ∈ R, the vector
r1 v1 + · · · + rs vs
is called a linear combination of vectors v1 , . . . , vs .
Definition 2.1. Let v1 , . . . , vs be vectors in Rn . The span of these vectors, denoted by Span (v1 , . . . , vs ), is the set in Rn of all
linear combinations of v1 , . . . , vs .
 
Span (v1 , . . . , vs ) = r1 v1 + · · · + rs vs | ri ∈ R, i = 1, . . . , s

Definition 2.2. Vectors u1 , . . . , un are called linearly independent if

r1 u1 + · · · + rn un = 0

implies that
r1 = · · · = rn = 0,
otherwise, we say that u1 , . . . , un are linearly dependent.
In the coming sections we will see that the concept of linear independence is one of the most important concepts
of linear algebra. Our strategy will be to try to generalize all concepts of R2 or R3 to Rn . Of course the geometric
interpretation in Rn doesn’t make sense, but this will not deter us to assign the same names to abstract concepts in
Rn as we had for R2 and R3 .

Exercises:

" # " #
2.1. Show that the formal definitions of the addition and scalar 3 5
2.4. Let v = and u = . Find scalars r, s such that
multiplication in R2 agree with the geometric interpretations. 5 6
2.2. Let u, v, w given as below #"
5
  rv+su = .
 3 
 
1
 
0 11
v =   , u =   , w = 3 .
 5  1  
−1 7 4
     
2.5. What does it mean for two vectors u, v ∈ R2 to be linearly
Compute 2u + 3v − w. dependent?
 
 1 
 
 3  " # " #
2.3. Let v =  , u =  6 . Compute 2u + 3v. 0 1
 2   
2.6. What is the span of and in R2 ?
−1 −6 1 0
   

28
Shaska T. Linear Algebra

2.7. Let u, v, and w be given vectors as below 2.10. Let c be a positive real number and O1 , O2 points in the
  xy-plane with coordinates (c, 0) and (−c, 0) respectively. Find
1
 
3
 
1 an equation that describes all points P of the xy-plane such
u = 2 , v = 4 , w = 1 . that
     
→ →
0 0 1 ||PO1 || + ||PO2 || = 2a,
     

Can w be a linear combination of u and v? What is geometri- for a > c.


cally the span of u and v? y

2.8. Find the area of the triangle determined by the vectors


 
1
 
 2 
u = 2 and v =  2  . O1 O2
   
2 −3 x
   

2.9. Use vectors to decide whether the triangle with vertices P(x, y)
A = (1, −3, −2), B = (2, 0, −4), and C = (6, −2, −5) is right
angled. D

2.2 Norm and dot product


In this section we study two very important concepts of Euclidean spaces; that of the dot product and the norm.
The concept of the dot product will be generalized in chapter 5 to any vector space.

u1 
 

Definition 2.3. Let u :=  ...  ∈ Rn . The norm of u, denoted by kuk, is defined as
 
 
un
q
kuk = u21 + · · · + u2n

The norm has the following properties:


Theorem 2.2. For any vectors u, v ∈ Rn and any scalar r ∈ R the following are true:
a) kuk ≥ 0 and kuk = 0 if and only if u = 0
b) kruk = |r| kuk
c) ku + vk ≤ kuk + kvk
Proof. The proof of i) and ii) are easy and left as exercises. The proof of iii) is completed in Lemma 2.3 
u
A unit vector is a vector with norm 1. Notice that for any nonzero vector u the vector kuk is a unit vector.
Definition 2.4. Let
u1  v1 
   

u :=  ...  , v :=  ... 


   
   
un vn
be vectors in Rn . The dot product of u and v (sometimes called the inner product) is defined as follows:

u · v := u1 v1 + · · · + un vn ,

or sometimes denoted by hu, vi. The following identity

kvk2 = v · v

is very useful.

29
Linear Algebra Shaska T.

Lemma 2.1. The dot product has the following properties:

i) u · v = v · u

ii) u · (v + w) = u · v + u · w

iii) r (u · v) = (r u) · v = u · (rv)

iv) u · u ≥ 0, and u · u = 0 if and only if u = 0

Proof. Use the definition of the dot product to check all i) through iv). 
Two vectors u, v ∈ Rn are called perpendicular if u · v = 0.

Lemma 2.2 (Cauchy-Schwartz inequality). Let u and v be any vectors in Rn . Then

|u · v| ≤ ||u|| · ||v||

Proof. If one of the vectors is the zero vector, then the inequality is obvious. So we assume that u, v are nonzero.
For any r, s ∈ Rn we have krv + suk ≥ 0. Then,

krv + suk2 = (rv + su) · (rv + su)


= r2 (v · v) + 2rs (v · u) + s2 (u · u) ≥ 0

Take r = u · u and s = −v · u. Substituting in the above we have:

krv + suk2 = (u · u)2 (v · v) − 2(u · u) (v · u)2 + (v · u)2 (u · u)


h i
= (u · u) (u · u)(v · v) − (v · u)2 ≥ 0
h i
Since (u · u) = kuk2 > 0 then (u · u)(v · v) − (v · u)2 ≥ 0. Hence,

(v · u)2 ≤ (u · u) (v · v) = kuk2 · kvk2

and
|u · v| ≤ ||u|| · ||v||.


Lemma 2.3 (Triangle inequality). For any two vectors u, v in Rn the following hold

ku + vk ≤ kuk + kvk

Proof. We have

ku + vk2 = (u + v) · (u + v)
= (u · u) + 2(u · v) + (v · v) = kuk2 + 2(u · v) + kvk2 ≤ kuk2 + 2 |u · v| + kvk2
≤ kuk2 + 2 · kuk · kvk + kvk2 = (kuk + kvk)2

Hence,
kv + uk ≤ kvk + kuk.


Example 2.1. Let u and v be two given vectors and θ the angle between them. Prove that

u · v = ||u|| · |v|| cos θ

30
Shaska T. Linear Algebra

Hence, we have the following definition. The angle between two vectors u and v is defined to be
u·v
 
θ := cos−1
kuk · kvk
From Lem. (2.2) we have that
u·v
−1 ≤ ≤1
kuk · kvk
Hence, the angle between two vectors is well defined.
Example 2.2. Find the angle between
−1
 
 2 
 
u = −1 , and v = −1
   
2 1
   

Solution: Using the above formula we have

! √ !
(2, −1, 2) · (−1, −1, 1) 3
θ = cos −1
√ √ = cos−1
.
9· 3 9

Then θ ≈ 1.377 radians or θ ≈ 78.90◦ . 

Consider vectors u and v in R2 as in the Fig. 2.2. The projection


vector of v on u, denoted by proju (v) is the vector obtained by drop- B
ping a perpendicular from the vertex of v on the line determined by
u. Thus,
v w u−v
→ u·v u·v
kproju (v)k := kAOk = ||v|| · cos (CÂB) = ||v|| · = .
||u|| · ||v|| ||u||
u
We can multiply by the unit vector kuk to get
A C
pro ju v u
u·v u u·v
proju (v) = · = 2 u (2.2)
||u|| kuk u
Figure 2.2: The projection of v onto u
If we want a vector perpendicular to u we have

u·v
w = v − proju (v) = v − u. (2.3)
u2

We will see later in the course how this idea is generalized in Rn to the process of orthogonalization and is used in
the method of least squares.
Exercise 2.1. The above discussion provides a method that for any two given vectors u and v we can determine a vector w
which is perpendicular to u. Can you devise a similar argument for three vectors u1 , u2 , u3 ? In other words, determine v and
w from u1 , u2 , u3 such that the set of vectors {u, v, w} are pairwise perpendicular.

Exercises:

2.11. Prove that the triangle with vertices A(−2, 4, 0), 2.13. Describe the region that determine the following equa-
B(1, 2, −1) and C(−1, 1, 2) is regular. tions or inequalities in R3 .

2.12. In the third octant find the point P the


√ distances√of 1. x2 + y2 = 1
which from the three coordinate axis are dx = 10, d y = 5,

dz = 13. 2. 1 ≤ x2 + y2 + z2 ≤ 25

31
Linear Algebra Shaska T.

3. x < 5  t 
 
1
 
2.20. For what values of t are the vectors u = 0 and v = −t
   
4. z = −4
 2
t t
 
perpendicular?
5. y > −3
2.21. Show that the distance d from a point P = (x0 , y0 ) to a
6. x2 =4 line
ax + by + c = 0
7. x = y
is given by
|ax0 + by0 + c|
8. x2 + y2 + z2 ≤3 d= √ .
a2 + b2
2.14. Let 4 ABC be any given triangle and θ the angle between
2.22. Let the vectors u, v, w have the same origin in R3 and
AB and AC. Prove the law of cosines in a triangle
coordinates
BC = AB + AC − 2 AB · AC · cos θ
2 2 2  
1
 
 2  −1
 
u = 2 , v =  2  , and w = −1 .
     
2.15. Show that for any two vectors u and v the following is 2
 
−3
 
−1
 
true
(v − w) · (v + w) = 0 ⇐⇒ ||v|| = ||w|| Compute the volume of the parallelepiped determined by
u, v, w.
2.16. Let a and b be the sides of a parallelogram and its diag-
onals d1 , d2 . Show that, 2.23. Let the vectors u, v ∈∈ R3 be given as below
 
1
 
d21 + d22 = 2(a2 + b2 ).  1 
u =   and v =  2  .
2  
2 −3
   
2.17. Prove that two diagonals of a parallelogram are perpen-
dicular if and only if all sides are equal.
Find the projection of u on v.
 
1
 
 2  2.24. Let u, v, w be vectors in R3 as follows:
2.18. Find the angle between the vectors u =   and v =  2 
2  
−3 −1
 
1
 
 2 
 
2
   
and the area of the triangle determined by them. u = 2 , v =  2  , w = −1 .
     
2 −3 −1
     
2.19. Let u be the unit vector tangent to the graph of y = x + 1
2

at the point (2, 5). Find a vector v perpendicular to u. Find the projection of u onto the vw-plane.

2.3 Matrices and their algebra


A matrix is a list of vectors. Consider for example vectors ui ∈ Rm , for i = 1, . . . , n. An ordered list of such vectors,
say

A = [u1 , . . . , un ]

is called a matrix. If each ui is given by

 ai,1 
 
 a 
 i,2 
ui =  . 
 .. 
 
ai,m

then A is a m by n table of scalars from R.


In general an m × n matrix A is an array of numbers which consists of m rows and n columns and is represented

32
Shaska T. Linear Algebra

as follows:
 a1,1 a1,2 a1,3 ... a1,n
 

 a
 2,1 a2,2 a2,3 ... a2,n 

 a3,1 a3,2 a3,3 ... a3,n
 

A = [ai, j ] =  ·
 

 (2.4)

 · 


 · 

am,1 am,2 am,3 ... am,n
The i-th row of A is the vector
Ri := (ai,1 , . . . , ai,n )
and the j-th column is the vector
 a1, j 
 

 · 
C j := 
 ·  .
· 
 


an,j
Let A = [ai, j ] be an m × n matrix and B = [bi, j ] be a n × s matrix. The matrix product AB is the n × s matrix C = [ci, j ]
such that ci,j is the dot product of the i-th row vector of A and the j-th column vector of B.
The matrix addition is defined as
A + B = [ai,j + bi, j ],
and the multiplication by a scalar r ∈ R is defined to be the matrix the matrix

rA := [rai,j ].

The m × n zero matrix, denoted by 0, is the m × n matrix which has zeroes in all its entries. An m by n matrix A
is called a square matrix if m = n. If A = [ai,j ] is a square matrix then all entries ai,i form the main diagonal of A.
The n by n identity matrix, denoted by In , is the matrix which has 1’s in the main diagonal and zeroes elsewhere.
A matrix that can be written as rI is called a scalar matrix. Two matrices are called equal if their corresponding
entries are equal. Notice that the arithmetic of matrices is not the same as the arithmetic of numbers. For example,
in general AB , BA, or AB = 0 does not imply that A = 0 or B = 0. We will study some of these properties in detail
in the next few sections. Next we state the main properties of the algebra of matrices.
Theorem 2.3. Let A, B, C be matrices of sizes such that the operations below are defined. Let r, s be scalars. Then the following
hold:
i) A + B = B + A
ii) (A + B) + C = A + (B + C)
iii) A + 0 = 0 + A = A
iv) r(A + B) = rA + rB
v) (r + s)A = rA + sA
vi) (rs)A = r(sA)
vii) (rA)B = A(rB) = r(AB)
viii) A(BC) = (AB)C
ix) IA = A = AI
x) A(B + C) = AB + AC
xi) (A + B)C = AC + BC
Proof. Most of the proofs are elementary and we will leave them as exercises for the reader. 
The trace of a square matrix A = [ai,j ] is the sum of its diagonal entries:

tr (A) := a11 + · · · + ann .

Lemma 2.4. The following hold:


i) tr (A + B) = tr (A) + tr (B),
ii) tr (AB) = tr (BA).

33
Linear Algebra Shaska T.

Proof. The first part is obvious. We prove only part ii). Let A = [ai,j ] and B = [bi,j ] be n × n matrices. Denote
AB = C = [ci,j ] and BA = D = [di,j ]. Then

ci,i = Ri (A) · Ci (B) = Ci (B) · Ri (A) = di,i ,

where Ri (A) is the i-th row of A and Ci (B) is the i-th column of B. This completes the proof. 
Example 2.3. For matrices A and B given below

 4 2 2 1 2 61
   
  
A =  0 3 1  , B =  3 -3 1
   

21 10 -2 31 2 1
   

compute the following tr (A), tr (B), tr (A + B), tr (AB), and tr (BA).

Solution: It is clear that tr(A) = 5, tr(B) = −1. Then, tr(A + B) = 4. We have

 74 6 248
 

AB =  41 -7 4  .
 
-13 8 1289
 

Hence, tr (AB) = tr (BA) = 1356. 


Given the matrix A = [ai,j ] its transpose is defined to be the matrix

At := [a j,i ].

A is called symmetric if A = At . Note that for a square matrix A its transpose is obtained by simply rotating the
matrix along its main diagonal.
Lemma 2.5. For any matrix A the following hold
i) (At )t = A,
ii) (A + B)t = At + Bt ,
iii) (AB)t = Bt At .
Proof. Parts i) and ii) are easy. We prove only part iii). Let A = [ai,j ] and B = [bi,j ]. Denote AB = [ci, j ]. Then,
(AB)t = [c j,i ] where
c j,i = R j (A) · Ci (B) = C j (At ) · Ri (Bt ) = Ri (Bt ) · C j (At ).
This completes the proof. 
Example 2.4. For matrices A and B given below

 4 2 2 1 2 61
   
  
A =  0 3 1  , B =  3 -3 1
   

21 10 -2 31 2 1
   

compute the following At , Bt , (A + B)t , (AB)t , and (BA)t .

Solution: We have
 4 0 21 1 3 31
   
  
At =  2 3 10  , Bt =  2 -3 2  .
   
2 1 -2 61 1 1
   

Computing (A + B)t , (AB)t , and (BA)t is left as an exercise for the reader. 
Let A be a square matrix. If there is an integer n such that = I then we say that A has finite order, otherwise
An
A has infinite order. The smallest integer n such that An = I is called the order of A.

Exercises:

34
Shaska T. Linear Algebra

2.25. Find the trace of the matrices A, B, A + B, and A − B, 2.31. Let A be a square matrix. Show that (An )t = (At )n .
where A and B are

 4

2 2 
 
 1 2 6

 2.32. Prove or disprove the identity
A =  0 3 1  , B =  3 -3 1
  
(A + B)2 = A2 + 2AB + B2 ,

21 10 -1 31 0 13
   

for any two m × n matrices A and B.


2.26. We call a matrix A idempotent if A2 = A. Find a 2
by 2 idempotent matrix A not equal to the identity matrix
I2 . Using A, give an example of two matrices B, C such that 2.33. Let A and B be two matrices such that AB = BA. Prove
BC = 0, but B , 0 and C , 0. that
(A − B)(A + B) = A2 − B2 .
2.27. Let 2.34. Let A and B be two matrices such that AB = BA. Prove
cos θ − sin θ
" #
A= that
sin θ cos θ (A − B)(A2 + AB + B2 ) = A3 − B3 .
Find A2 . What about An ? 2.35. Let Q be the following set of complex matrices:
" # " # " # " #
1 0 i 0 0 1 0 i
2.28. A square matrix A is said to be nilpotent if there is an ± , ± , ± , ±
0 1 0 -i -1 0 i 0
integer r ≥ 1 such that Ar = 0. Let A, B be matrices such that
AB = BA, A2 = 0 and B2 = 0. Show that AB and A + B are such that i2 = −1. Further, let
nilpotent.
" # " # " # " #
1 0 i 0 0 1 0 i
I= , i= , j= , k= .
2.29. Let 0 1 0 -i -1 0 i 0
 4 2 2 
 
A =  0 3 1 
  Verify the following statements
2 0 1
 
i2 = j2 = k2 = −I
If possible, find a matrix B such that AB = 2A.
and
2.30. Prove that: ij = k, jk = i, ji = −k, kj = −i, ik = −j.
i) For any matrix A, the matrix AAt is symmetric
ii) If A is a square matrix then A + At is symmetric. These matrices are sometimes called quaternions. Show that
±i, ±j, ±k have order 4.

2.4 Linear systems of equations, Gauss method


In this section we will study a classical problem, solving linear systems of equations. Recall that by e vector x ∈ Rn
we denote a column vector.
Let a linear system of m equations with n unknowns be given as follows:

a1,1 x1 + · · · + a1,n xn = b1




 a2,1 x1 + · · · + a2,n xn = b2



........................





a x +···+a x = b

m,1 1 m,n n m

We write this system in the matrix form as


A·x = b

35
Linear Algebra Shaska T.

where
 a1,1 a1,2 a1,3 ... a1,n   x1   b1 
     
...
 a 
 2,1 a2,2 a2,3 a2,n   x 
 2 
 b 
 2 
 a3,1 a3,2 a3,3 ... a3,n   x3   b3 
     
A = [ai,j ] =  ·  , x =   , b =   .
     


 · 

 
 
 
 
·
     
     
...
    
am,1 am,2 am,3 am,n xm bm

We would like to use matrices and design an algorithm which can determine if such a system has a solution and in
the case it does, find that solution. The matrix [A | b] denotes the following matrix

 a1,1 a1,2 a1,3 ... a1,n b1


 

 a a2,2 a2,3 ... a2,n b2 
 2,1 
 a3,1 a3,2 a3,3 ... a3,n b3
 

[A | b] :=  · · .
 



 · · . 

· · .
 
 

am,1 am,2 am,3 ... am,n bm

and is called the augmented matrix of the corresponding system.

2.4.1 Elementary row operations


We would like to manipulate the augmented matrix [A | b] such that the solution set of the linear system does not
change. We define as elementary row operations performed on a matrix the following operations:

1) Interchange the i-th row with the j-th row (denoted by Ri −→ R j )


2) Multiply the i-th row by a nonzero scalar r (denoted by Ri → r Ri )
3) Add the i-th row to r times the j-th row (denoted by Ri → Ri + r R j )

It is obvious that such operations on the augmented matrix do not change the solution set of the system. If the
matrix B is obtained by performing row operations on A then matrices A and B are called row equivalent .

2.4.2 Row-echelon form of a matrix


Definition 2.5. A matrix is in row echelon form if:

1) All rows containing all zeroes are below rows with nonzero entries.
2) The first nonzero entry in a row appears in a column to the right of the first nonzero entry in any preceding row.

For a matrix in row-echelon form, the first nonzero entry in a row is the pivot for that row.

Example 2.5. Using row operations find the row echelon form of the matrix

 1 2 3
 

A =  2 0 1
 

3 2 2
 

Solution: We perform the following row operations:

36
Shaska T. Linear Algebra

3 R2→ 1 R 1 2 3  1 2 3 



1 2
    
2  R2 →R1 −R2 
A = 2
2 1 5
 0 1 −→ 1 0

2
 −→ 0
 2 2


3 2 2 3 2 2 3 2 2
   

   
1 2 3  1 2 3  1 2 3 
 
R3 → 13 R3 R3 →R2 − 32 R3 
−→ 0

2 5  R3 →R
−→
1 −R3 0 2 5  −→ 0 2 5 
2  2  2 
2 2 4 7
 

1 3 3

0 3 3
0 0 −1

Row operations are fast and inexpensive operations. Below we give the algorithm of how to transform a matrix in
row-echelon form.

Algorithm 1. Input: A matrix A.


Output: The row-echelon form of A

1) Start with the first column which has nonzero entries.


2) By row interchange get a pivot p in the first row of this column. Make entries in this column below the pivot
all zeroes.
3) Continue this way with the next column.

The row-echelon form of matrices is used to solve linear systems of equations. Let A x = b, be a linear equation. We
create the augmented matrix [A | b] and find its row-echelon form, say [H | c]. Using back substitution we solve the
system
Hx = c.
We illustrate with an example.
Example 2.6. Solve the linear system
x2 − 3x3 = −5




2x1 + 3x2 − x3 = 7




 4x1 + 5x2 − 2x3 = 10

Solution: Then

 0 1 -3 -5  2 3 -1 7
   
 
[A | b] =  2 3 -1 7 [H | c] =  0 1 -3 -5
   
 
4 5 -2 10 0 0 -3 -9
   

by performing the operations R1 −→ R2 , R3 → R3 − 2R1 , R3 → R3 + R2 . Thus the linear system is equivalent with the following
system
2x1 + 3x2 − x3 = 7




x2 − 3x3 = − 5




−3x3 = − 9


By back substitution we have


 -1 
 
x =  4 
 
3
 


This method is known as the Gauss method.
Theorem 2.4. Let
Ax = b
be a linear system and [A | b] [H | c], where [H | c] is in row-echelon form. Then one of the following hold:

37
Linear Algebra Shaska T.

1. Ax = b has no solution if and only if H has a row of all zeroes and in the same row c has a nonzero entry.

2. If Ax = b has solutions then one of the following holds: i) it has a unique solution if every column of H contains a pivot
ii) it has infinitely many solutions if some column of H contains no pivot

Proof. We recall from elementary algebra that a linear equation

ax = b

has no solution if and only if a = 0 and b , 0. It has a unique solution if and only if a , 0 and b , 0 and infinitely
many solutions if and only if a = b = 0.
If H has a row of all zeroes and in the same row c has a nonzero entry cn , 0 then the equation

0 · xn = cn

has no solution and therefore the linear system Ax = b has no solution. The converse also hold from the definition
of the row-echelon form. Parts 2, i) and 2, ii) follow similarly. 
Example 2.7. Find how many solutions the following system has:

2x + 5y = 3
(

6x + 15y = 9

Solution: The augmented matrix is

" # " #
2 5 3 2 5 2
[A | b] = [H | c] =
6 15 9 0 0 0

From the above theorem the system has infinitely solutions. Of course, this is easy to see since the second equation of the system
is obtained by multiplying the first equation by 3. 
The above theorem can be interpreted geometrically in the case of a 2 by 2 or a 3 by 3 coefficient matrix. For
example in the case of a linear system of 2 equations and 2 variables we have the well known situation of two lines
on the plane. It is known from geometry that two lines intersect in one point, no points, or infinitely many points.

Exercises:

Solve the linear systems using the Gauss method with Find the row-echelon form of the following matrices
back substitution.
2.39.
2.36.  0

1 -3 -5

x + 5y = 2
( 
 0 3 0 1 
 
3x + 2y = 9 4 5 -2 10
 

2.37. 2.40.
 0 0 0 0
 
2x + y − 3z = 0
 
  1

 1 -3 -3 
6x + y − 8z = 0

  
 1 3 0 0
  
 
 2x − y + 5z = −4

2 5 -2 1
  

2.38. 2.41. Determine all values of b1 , b2 such that the following




 y − 2z = 3 system has solutions

x + 2y − 3z = 2 x1 + 11x2 = b1

 (


 5x − 3y + z = −1 3x1 + 33x2 = b2

38
Shaska T. Linear Algebra

2.42. Determine all values of b1 , b2 such that the following 2.44. Find a, b, c and d such that the quartic
system has no solutions
(
x1 + 2x2 = b1 y = ax4 + bx3 + cx2 + d
− 2x1 − 4x2 = b2 passes through the points (3, 2), (-1, 6), and (-2, 38), and (2, 6).
2.43. Find a, b, and c such that the parabola
y = ax2 + bx + c
2.45. Find a polynomial function going through the points (3,
passes through the points (1,-4), (-1,0), and (2,3). 1, -2), (1, 4, 5), and (2, 1, -4).

2.5 Reduced row-echelon form, Gauss-Jordan method


Let [A | b] be a matrix in row-echelon form. Can we manipulate [A | b] even further so that the solution of the
corresponding system is read directly from the matrix equation? This leads to the following definition
Definition 2.6. A matrix is in reduced row-echelon form if it is in row-echelon form, all pivots are 1, and all terms above
the pivots are 0.
As we will see, once the coefficient matrix is in the reduced row-echelon form then the solution of the corre-
sponding linear system is read directly in the last column of the augmented matrix. We illustrate with an example.
Example 2.8. Let [H | c] be the matrix in row-echelon form as in the Example 2.7:

 2 3 -1 7
 

[H | c] =  0 1 -3 -5  .
 
0 0 -3 -9
 

Find its reduced row-echelon form.

Solution: To find the reduced row-echelon form we perform the following row-operations

 
 2 3 -1 7  R1 → 1 R1 , R3 →− 1 R3
[H | c] =  0 1 -3
2 3
 -5 
 −→
0 0 -3 -9
 

3
- 12 7
   
 1 2 2  R1 →R1 − 3 R2  1 0 4 11  R →3R +R
2  0 1 -3 -5  2 3 2
 0 1 -3 -5  −→ −→
 
 
0 0 1 3 0 0 1 3
   

   
 1 0 4 11  R →R −4R  1 0 0 -1 
 0 1 0 4  1 1 3  0 1 0 4 
  −→  
0 0 1 3 0 0 1 3
   

Hence, we can directly conclude that the solution to the corresponding system is

 -1 
 
x =  4  ,
 
3
 

as concluded previously. 

39
Linear Algebra Shaska T.

Remark 2.1. Notice that the reduced row-echelon form of a matrix A , on contrary to the row-echelon form, is unique.
The method that transforms the augmented matrix to the reduced row-echelon form is called the Gauss-Jordan
method.
Remark 2.2. Even though the Gauss-Jordan method gives the solution in a "nicer" form, it is not necessarily better than the
Gauss method. For large linear systems the number of operations performed becomes significant. Using the Gauss-Jordan
method, it takes roughly 50% more arithmetic operations than using the Gauss method.
Example 2.9. Find the reduced row-echelon form of the matrix.

 2 1 -2 1
 

[A | b] =  -2 1 1 2
 

-2 -1 2 2
 

Show all the row operations. What are the solutions of the corresponding system Ax = b?

Solution: The reduced row-echelon form is

− 34
 
 1 0 0 
[H | c] =  0 − 12
 
1 0 

0 0 0 1

Hence, the system has no solutions.


Example 2.10. Determine values of b such that the following system has one solution, infinitely many solutions, or no
solutions
x1 + 2x2 − x3 = b




 x1 + x2 + 2x3 = 1



 2x1 − x2 + x3 = 2

Solution: The augmented matrix is


 1 2 -1 b
 

[A | b] =  1 1 2 1
 

2 -1 1 2
 

and its reduced row-echelon form is:


b+3 
1 0 0

4 
 


[H | c] = 0 1 0 b−1 
4 


 
 
 b−1 
0 0 1 4
The system has a solution for any b.

2.5.1 A word on homogenous systems


A linear system is called homogenous if it is in the form

Ax = 0.

Clearly x = 0 is a solution of such systems and is called the trivial solution. The augmented matrix for such systems
is [A | 0] and its row-echelon form will be [H | 0]. The system has nontrivial solutions if there is a row of H with no
pivots. We will see in Chapter 3 that this is equivalent with the determinant of the matrix A being nonzero.

Exercises:

40
Shaska T. Linear Algebra

2.46. Find the reduced row-echelon form of A 2.50. Solve the following system using the Gauss method

 1 2 3  5x1 + 3x2 − x3 = −2
  


A = 
 2 0 1  
2x1 + 2x2 + 2x3 = 3
 

3 2 2
  

 − x1 − x2 + x3 = 6


and solve the linear system Ax = 0.
2.51. Solve the following system using the Gauss-Jordan
method
11x1 + 12x2 − 3x3 = 2

2.47. Find the reduced row-echelon form of A 


− x1 + 3x2 + 2x3 = 3


 0 1 -3 -5 
  

 2x1 + 3x2 + x3 = −2

A = 
 0 3 0 
1 
4 5 -2 10
 
2.52. Prove that the reduced row-echelon form of a matrix is
and solve the linear system Ax = 0. unique.

2.48. Find the reduced row-echelon form of A 2.53. Let Ax = 0 be a homogenous system which has no non-
trivial solutions. What is the reduced row-echelon form of A ?
 0 0 0 0 
 
 1 1 -3 -3 
A =   2.54. Find a, b, and c such that the parabola
 1 3 0 0 
2 5 -2 1
 
y = ax2 + bx + c
and solve the linear system Ax = 0. passes through the points (1,2), (-1,1), and (2,3).

2.49. Solve the following system using the Gauss-Jordan


2.55. Find a, b, c and d such that the quartic
method
 x1 + 2x2 − x3 = 1

y = ax4 + bx3 + cx2 + d


x1 + x2 + 2x3 = 3




 2x1 − x2 + x3 = −2

 passes through the points (3,2), (-1,6), and (-2,1), and (0,0).

2.6 Inverses of matrices


In this section we study the important concept of the inverse of a matrix.

Definition 2.7. Let A = [ai, j ] be a n × n square matrix. A is called invertible if there exists an n × n matrix A−1 such that

AA−1 = A−1 A = In .

A−1 is called the inverse of A and A is called invertible. If A is not invertible then it is called singular.

Theorem 2.5 (Uniqueness of the inverse). Let A be an invertible matrix. Then, its inverse is unique.

Proof. Suppose that A has two inverses C and D. Then,

AC = I = AD and CA = I = DA

Then we have

D(AC) = DI = D
(2.5)
D(AC) = (DA)C = IC = C

Hence, C = D. 
We also have the following useful result.

41
Linear Algebra Shaska T.

Lemma 2.6. Let A, B be invertible matrices. Then AB is invertible and

(AB)−1 = B−1 A−1 .

Proof. Exercise 

Definition 2.8. Any matrix that can be obtained from the identity matrix In by one row operation is called an elementary
matrix .

Theorem 2.6. Let A be an m × n matrix and E an m × m elementary matrix. Then E A affects the same row operation on A as
the one performed in In to obtain E.

Proof. Let E be the elementary matrix obtained as

Ri ←→R j
Im −→ E.

Then the new Ri (E) = (0, . . . , 0, 1, 0, . . . 0) where 1 is in the j-th position. Hence, the entries of Ri (E A) are

Ri (E) · Cr (A), for r = 1, . . . n

and Ri (E A) = R j (A). Similarly, R j (E A) = Ri (A).


The cases in which E is obtained by a row-scaling and row-addition go similarly and are left as exercises to the
reader.


2.6.1 Computing the inverses using the row-echelon form


Let A be a given matrix. We want to find its inverse A−1 if it exists. Consider first the elementary matrices.
Let E be an elementary row matrix obtained by one row-interchange of I. Then performing the same row-
interchange to E would give us back I. Hence, EE = I and the inverse of E is E itself. If E is obtained by multiplying
a row by a scalar then we divide that row with the same scalar to get back I. If E is obtained by Ri → Ri + rR j then
performing Ri → Ri − rR j would result in I. Hence, we have the following:

Lemma 2.7. Elementary matrices are invertible

Proof. Let E1 be an elementary matrix. Then E1 is obtained by some row operation on the identity I. Since every
row-operation can be undone then we can perform a new row-operation on E1 to obtain I. The second row
operation corresponds to another elementary matrix E2 such that E2 E1 = I; see the previous theorem. Thus, E1 has
an inverse. 

Example 2.11. Let E be given as below


 1 0 0 0
 

 0 0 0 1 
E = 
 
 0 0 1 0


0 1 0 0
 

Find its inverse.

Solution: E is obtained by interchanging rows R2 ←→ R4 of the identity matrix. So E is an elementary matrix and therefore
invertible. Its inverse is E since E2 = I.


Lemma 2.8. Let A, B be n × n matrices. Then, AB = In if and only if BA = In .

42
Shaska T. Linear Algebra

Proof. It is enough to show that if AB = In then BA = In , the other direction goes by symmetry of A and B. Hence,
we assume that AB = In . Let b be any vector in Rn . Then ABb = b. Thus the system Ax = b has always a solution
(namely x = Bb). By Theorem 2.4 the reduced row-echelon form of A is In . Hence, there are E1 , . . . , Ek such that

Ek · · · E1 A = In (2.6)

Multiplying both sides on the right by B we have

Ek · · · E1 (AB) = B.

But AB = In , hence Ek · · · E1 = B. Thus, by equation (2.6) we have BA = In .



Now, we go back to the main question of this section, that of computing inverses. In general we proceed as
follows. Let A = [ai,j ] be given. To find A−1 we have the following algorithm:

Algorithm 2. Input: A square matrix A.


Output: Determines if A−1 exists and finds it in that case.

1) Form the augmented matrix [A | I]


2) Apply the Gauss-Jordan method to reduce [A | I] to [I | C]. If this is possible then C = A−1 , otherwise A−1 does
not exist.

Example 2.12. Find the inverse of the following matrix

 -1 1 0 2
 

 0 2 1 0 
A = 
 
 0 1 -2 1


0 -1 -1 0
 

Solution: Create the matrix [A | I]. Then its reduced row-echelon form is:

 1 0 0 0 -1 -5 2 -9
 

 0 1 0 0 0 1 0 1 
[I | C] = 
 
 0 0 1 0 0 -1 0 -2


0 0 0 1 0 -3 1 -5
 

Hence,
 -1 -5 2 -9
 

 0 1 0 1 
A = C = 
−1  
 0 -1 0 -2


0 -3 1 -5
 


Example 2.13. Let A be given

 1 0 0 -1
 

 1 1 1 0 
A = 
 
 -1 1 1 0


0 0 -1 -1
 

Find its inverse.

Solution: Create [A | I] as follows

43
Linear Algebra Shaska T.

 1 0 0 -1 1 0 0 0
 

 1 1 1 0 0 1 0 0 
[A | I] = 
 
 -1 1 1 0 0 0 1 0


0 0 -1 -1 0 0 0 1
 

Its reduced row-echelon form is


 1 1 
 1 0 0 0 0 2 - 2 0 
 0 1 0 0 -1 1 0 1
 
[I | A−1 ] = 

1 1 
 0 0 1 0 1 - 2 2 -1 

1
0 0 0 1 -1 - 12 0

2


Remark 2.3. We have illustrated above how to find the inverse of a matrix. However, such an inverse does not always exist.
In the next chapter we will study some necessary and sufficient conditions such that the inverse of a matrix exists.

Exercises:

2.56. a) Let A be a square matrix such that A2 = 0. Find the 2.60. Show that if B is invertible, then tr (A) = tr (BAB−1 ).
inverse of I − A.
b) Let A be a square matrix such that A2 + 2A + I = 0. Find
2.61. Let
the inverse of A.
 1 2 -1 
 
c) Let A be a square matrix such that A − A + I = 0. Find
3
A =  0 3
 1 
the inverse of A. 
2 0 1

d) Let A be a square matrix such that A = 0. Find the
n

inverse of I − A. If possible, find a matrix B such that AB = 2I.

2.62. Find the inverse of the following


2.57. Find the inverse of
 5 2 0 2
" #  
1 a 
A=
 3 2 1 0 
0 1 A =   .
 
 3 1 -2 4 
2 4 -1 2
 
Does A have an inverse for any value of a?
2.63. Let

 1 2 3  3 0 1
   
2.58. For what values of a, b, c, d does the inverse of  
A =  -2 1 2  , B =  2 0 2  ,
   
" #
3 2 1 0 2 1
   
a b
A=
c d
be given. Find the following: tr(A), tr(B), At , AB, Bt At ,
tr(BAB−1 ).
exist? Find the inverse for such values of a, b, c, d.
2.59. Solve the linear system 2.64. Show that if A is invertible then so is At .
Ax = b
2.65. Let r be a positive integer and A an invertible matrix.
when A is invertible. Is Ar necessarily invertible? Justify your answer.

Review exercises

44
Shaska T. Linear Algebra

2.66. Find the reduced row-echelon form of the matrix. Show 2.78. Find all matrices which commute with
all the row operations. " #
0 1
 4 2 3 3 
 
 -2 0 2
 1 1 2 
3 -1 2 1 2.79. Show that if A commutes with B then At commutes
 
  with Bt .
1
 
5
2.67. Find the angle between the vectors u = 2 and v = 1.
   
3 8
   
2.80. Let V be the set of all m by n matrices with entries in R.
2.68. Determine all values of b1 , b2 such that the following Show that scalar matrices commute with all matrices from V.
system has no solutions Are there any other matrices which commute with all matrices
of V?
x1 + 2x2 − x3 = b1




− 2x1 − 4x2 + 2x3 = b2



 2.81. Let a, b, c, d be real number not all zero. Show that the
 x1 − x2 + x3 = 2


following system has exactly one solution
2.69. Find the area of the triangle between the three points
ax1 + bx2 + cx3 + dx4 = 0

(1, 2), (3, 4), (5, 6).



 bx1 − ax2 + dx3 − cx4 = 0



2.70. Let the matrices
cx1 − dx2 − ax3 + bx4 = 0



 3 2 3   2 -2 1
    

dx1 + cx2 − bx3 − ax4 = 0
 

A =  -2 1 2  , B =  2 0 2  ,
   
0 1 1 0 2 2
   
2.82. For what value of λ does the following system has a
be given. Find the following: tr(A), tr(B), At , AB, Bt At , solution:
2x1 − x2 + x3 + x4 = 1

tr(BAB−1 ).



x1 + 2x2 − x3 + 4x4 = 2


2.71. Show that if AB is invertible then so are A and B.


 x1 + 7x2 − 4x3 + 11x4 = λ


2.72. A square matrix is called upper triangular if all entries
2.83. The following system has a unique solution:
below the mail diagonal are zero. What is the sum and product
of upper triangular matrices? Justify your answer. 
 ay + bx = c


2.73. A square matrix is called lower triangular if all entries cx + az = b



above the mail diagonal are zero. What is the sum and product

bz + cy = a.



of lower triangular matrices? Let V := Matn×n (R) be the set
of all n × n matrices with entries in R, W1 the set of all upper Show that abc , 0. Find the solution of the system.
triangular matrices of V, and W2 the set of all lower triangular
matrices of V. What is the intersection W1 ∩ W2 ? 2.84. Find the following:
" #n " #n " #n
2.74. Let A be a 3 by 2 matrix. Show that there is a vector b 1 1 1 0 1 1
such that the linear system , ,
0 1 1 1 1 1
Ax = b
2.85. Let " #
is unsolvable. a b
A= ,
c d
2.75. Let A be an m × n matrix with m > n. Show that there
exists a b such that the linear system Ax = b is unsolvable. such that A2 = I. Show that the following relation is satisfied
2.76. Let A be a m × n matrix and B an n × m matrix, where when x is substituted by A:
m > n. Use the above result to show that the row-echelon form x2 − (a + d)x + (ad − bc) = 0.
of the matrix AB has at least one row of all zeroes.
2.77. Find all matrices B such that 2.86. Let A be a 3 by 3 matrix. Can you generalize the above
problem in that case? What about if A is an n by n matrix?
" # " #
0 1 0 0 2.87. Find the order of the following matrices
i) B=
0 2 0 0
" # " # " # " # " # " #
0 1 0 0 1 1 -1 1 -1 -1 1 1 -1
ii) B= , , ,
0 2 0 0 2 1 0 0 1 0 1 -1 0

45
Linear Algebra Shaska T.

46
Chapter 3

Vector Spaces

In this chapter we formally define vector spaces. After discussing Euclidean spaces in the previous chapter, the
concept of the vector space here will be more intuitive. Throughout this chapter k denotes a field. For our purposes
k is always one of the following: Q, R, C.

3.1 Definition of vector spaces


In this section we generalize the concept of Euclidean spaces studied in the previous chapter to a more abstract
object, that of a vector space.
Let S be a set and
f : S×S → S
(3.1)
(a, b) → f (a, b)

a function. Such function will be called a binary operation defined on S.


The usual addition and multiplication of numbers are binary operations, as shown below.
Example 3.1. Let Z be the set of integers and "+" defined as

”+” : Z×Z → Z
(3.2)
(a, b) → a + b

Then, "+" is a binary operation defined in Z.


Let V be a given set and 00 +00 a binary operation defined as

”+” : V×V → V
(3.3)
(u, v) → u + v

Let 00 ?00 be another binary operation

”?” : k×V → V
(3.4)
(r, u) → r ? u

Definition 3.1. The set V together with the binary operations above, denoted by (V, +, ?), is a vector space over k if the
following are satisfied:

1. (u + v) + w = u + (v + w), ∀ u, v, w ∈ V

2. u + v = v + u, ∀ u, v ∈ V

3. ∃ 0 ∈ V, s.t. 0 + u = u + 0 = u, ∀u ∈ V

47
Linear Algebra Shaska T.

4. ∀ u ∈ V, there is − u ∈ V such that u − u = 0


5. ∀ r ∈ k, ∀ u, v ∈ V, r ? (u + v) = r ? u + r ? v
6. ∀ r, s ∈ k, u ∈ V, (r + s) ? u = r ? u + s ? u
7. ∀ r, s ∈ k, u ∈ V, (rs) ? u = r ? (s ? u)
8. ∃ 1 ∈ k, s.t. ∀ u ∈ V, 1?u = u
Property 1) and 2) say that addition is associative and commutative. By property 3) we have an additive identity
and by property 8) a multiplicative identity. Property 4) assures there is an additive inverse normally called the
opposite. Elements r, s ∈ k are called scalars. From now on we will suppress 0 ?0 .
Elements of a vector space are called vectors. From now on V/k will denote a vector space over some field k.
When there is no confusion we will simply use V. Next we give some examples of some classical vector spaces.
Example 3.2 (Euclidean spaces Rn ). Show that Rn is a vector space with the usual vector addition and scalar multiplication.
What is the additive and multiplicative identity?
Example 3.3 (The space of polynomials with coefficients in k). Let k[x] denote the set of polynomials

f (x) = an xn + an−1 xn−1 + · · · + a1 x + a0

where a0 , . . . , an ∈ k. We define the sum and the scalar product of two polynomials to be

( f + g)(x) := f (x) + g(x)


(3.5)
(r f )(x) := r f (x)

for any r ∈ k. Then, k[x] is a vector space over k. k[x] is also called the polynomial ring of univariate polynomials; see Chapter
4 for more details.

Example 3.4 (The space of n × n matrices). The set of n × n matrices with entries in a field k, together with matrix addition
and scalar multiplication forms a vector space. We denote this space by Matn×n (k).

Example 3.5 (The space of functions from R to R). Let L(R) denote the set of all functions

f : R −→ R

We define the sum and the scalar product of two functions to be

( f + g)(x) := f (x) + g(x)


(3.6)
(r f )(x) := r f (x)

for any r ∈ R. Show that L(R) is a vector space over R.


We can generalize the above example as follows:
Example 3.6 (Function Spaces). Let S denote a set and k a field. A function is called k-valued if

f : S −→ k

Let V denote the set of all k-valued functions. We define the sum and the scalar product of two functions in V to be

( f + g)(x) := f (x) + g(x)


(3.7)
(r f )(x) := r f (x)

for any r ∈ k. Then V is a vector space over k.


Exercise 3.1 (Complex numbers as a vector space). Prove that C together with the addition and scalar multiplication
defined Definition 1.1 form a vector space over R.

48
Shaska T. Linear Algebra

There are some subsets of a vector space V which are of special importance. A subset W ⊂ V is called a subspace
(or a linear subspace) of V if it is a vector space by itself. Next we see some examples of subspaces of a vector
space.

Example 3.7. Let V = R3 . Then every v ∈ V is a triple (x, y, z), which we have denoted by

x
 
v =  y
 
z
 

Let W be the set of vectors v ∈ V such that the last coordinate is 0,


 
x
   
  
W= = .
 
 y 
v | v ∈ V
 
 
 
 

0

   

Then W is basically R2 which is also a vector space. Hence, W is a subspace of V.

A set S of V is called closed under addition if for every u, v ∈ S we have

u + v ∈ S.

It is called closed under scalar multiplication if for every u ∈ S and r ∈ k we have

ru ∈ S.

We have the following Lemma.

Lemma 3.1. Any subset W ⊂ V is a vector space if and only if it is closed under addition, scalar multiplication, and contains
0.

Proof. Exercise

Example 3.8. Let V = R3 and P be the plane determined by the vectors u and v going through the origin. This plane is a
vector space because: it contains the zero vector, every sum of two vectors in P is again in P, and every vector in P multiplied
by a scalar is again in P.

Example 3.9 (The nullspace of a matrix:). Let A be a given matrix. Consider the set of all vectors in Rn which satisfy the
equation
Ax = 0.

We call this set the nullspace of A. It is a subspace of Rn . The proof is easy and is left as an exercise.

Definition 3.2. Let V be a vector space over k and v1 , . . . vn ∈ V. Then, v is a linear combination of v1 , . . . vn if it can be
written as
v = r1 v1 + · · · + rn vn

where r1 , . . . , rn ∈ k.

We have the following lemma:

Lemma 3.2. Let V be a vector space and v1 , . . . , vn ∈ V. The set W of all linear combinations of v1 , . . . , vn , is a subspace of V.

Proof. Exercise.

49
Linear Algebra Shaska T.

3.1.1 Linear independence


Let V be a vector space and u1 , . . . , un vectors in V.
Definition 3.3. The vectors u1 , . . . , un are called linearly independent if

r1 u1 + · · · + rn un = 0

implies that
r1 = · · · = rn = 0,
otherwise, we say that u1 , . . . , un are linearly dependent.
Hence, a set of vectors u1 , . . . , un are linearly dependent if one of them is expressed as a linear combination of
the other ones.
 
2
 
1
 
1
Example 3.10. Show that u1 =  , u2 =  , and u3 = 1 are linearly independent in R3 .
3 2  
1 1 1
     

Solution: We want to find if there exist r1 , r2 , r3 , not all zero such that

r1 u1 + r2 u2 + r3 u3 = 0.

We have
 2r1 + r2 + r3  0
   
3r1 + 2r2 + r3  = 0
r1 + r2 + r3
   
0
The augmented matrix and its reduced row-echelon form are

 2 1 1 0  1 0 0 0
   
 
[A | 0] =  3 2 1 0 [H | c] =  0 1 0 0
   
 
1 1 1 0 0 0 1 0
   

Since every row has a pivot then the system has a unique solution (r1 , r2 , r3 ) = (0, 0, 0). This concludes that u1 , u2 , u3 are
linearly independent. 
The following example should be familiar to students who have had a course in differential equations:
Example 3.11. Let L(R) be the vector space of all real-valued functions in t. Show that the following pair of functions
sin t, cos t are linearly independent.

Solution: Let r1 , r2 ∈ R such that


r1 sin t + r2 cos t = 0
π
for every t ∈ R. Take t = 0, then r2 = 0. If we take t = 2 then r1 = 0. Hence sin t and cos t are linearly independent.

Exercises:

3.1. Let U, W be subspaces of V. Define the sum of subspaces Show that Wu is a subspace of V.
of U and W by
3.3. Let S be a set and V a vector space over the field k. Show
U + W := {u + w | u ∈ U, w ∈ W}. that the set of functions
Show that U ∩ W and U + W are subspaces of V. f : S → k,
3.2. Let u ∈ V = Rn and
under function addition and multiplication by a constant is a
Wu := {v ∈ V | u · v = 0}. vector space.

50
Shaska T. Linear Algebra

3.4. Let L(R) be the vector space of all real-valued functions 3.9. Let Q be the set of rational numbers and
in t. Show that the following pairs are linearly independent. √ √
i) t, et Q( 2) := {a + b 2 | a, b ∈ Q}.
ii) sin t, cos 2t √
iii) tet , e2t Prove that Q( 2) is a vector space over Q with the usual
iv) t, sin t. addition and scalar multiplication.
3.10. We know that the set of complex numbers C is given by
3.5. An upper triangular matrix is a matrix A = [ai,j ] such
that ai, j = 0 for all i < j. Show that the space of upper triangular C := {a + bi | a, b ∈ R}
matrices is a subspace of Matn×n (R). √
where i = −1. Is C a vector space over R with the usual
3.6. Prove that k[x] is a vector space over the field k. addition and scalar multiplication?

3.7. Let k be a field and A := k[x] the polynomial ring. Denote 3.11. Let V be the set of 2 by 2 matrices of the form
by An the set of polynomials in A of degree n. Is An a subspace " #
0 x
of A? Justify your answer.
y 0
3.8. Let k be a field and A := k[x] the polynomial ring. De-
where x, y are any scalars in R. Is V a vector space over R?
note by Pn the set of polynomials in A of degree ≤ n. Is Pn a
subspace of A? Justify your answer. 3.12. Is R a vector space over Q?

3.2 Bases and dimension


In this section we will study two very fundamental concepts in the theory of vector spaces that of basis and
dimension. Let V be a vector space over k and B := {v1 , . . . , vn } ⊂ V. Denote by W the set of all linear combinations
of v1 , . . . , vn in V. We say that W is generated by v1 , . . . , vn .
Definition 3.4. Let V be a vector space over k and B := {v1 , . . . , vn } ⊂ V. Then B is a basis of V if the following hold:
i) V = Span (v1 , . . . , vn )
ii) v1 , . . . , vn are linearly independent.
Example 3.12. Let V = R2 . A basis of this vector space is

B = {i, j}

where i = (0, 1) and j = (1, 0).

Solution: Indeed, we know from calculus that every vector v ∈ R2 can be written as a linear combination of i and j as follows:

v = r1 i + r2 j

for some real numbers r1 , r2 . This is called the standard basis of R2 . 


Theorem 3.1. Let V be a vector space over k and B := {v1 , . . . , vn } be a basis of V. If

x1 v1 + · · · + xn vn = y1 v1 + · · · + yn vn ,

then
xi = yi , for i = 1, . . . , n.

Proof. From
x1 v1 + · · · + xn vn = y1 v1 + · · · + yn vn
we get that
(x1 − y1 )v1 + · · · + (xn − yn )vn = 0.

51
Linear Algebra Shaska T.

Since B := {v1 , . . . , vn } is a basis of V then v1 , . . . , vn are linearly independent. Hence

xi = yi , for i = 1, . . . , n.


The theorem motivates the following definition:
Definition 3.5. Let V be a vector space, B := {v1 , . . . , vn } a basis of V, and u ∈ V given by

u := x1 v1 + · · · + xn vn .

Then (x1 , . . . , xn ) are called the coordinates of u with respect to B.


Theorem 3.2. Let V be a vector space over the field k and B1 and B2 bases for V such that |B1 | = m and |B2 | = n. Then, m = n.

Proof. Let the bases B1 and B2 be

B1 = {v1 , . . . , vm } and B2 = {w1 , . . . , wn }

and assume that m < n.


Since {v1 , . . . , vm } is a basis then there exist x1 , . . . , xn ∈ k such that

w1 = x1 v1 + · · · + xm vm .

We know that w1 ≠ 0 since B2 is a basis, thus at least one of x1 , . . . , xm is ≠ 0. Without loss of generality we may
assume that x1 ≠ 0. Then we have
x1 v1 = w1 − x2 v2 − · · · − xm vm
Hence,
v1 = (1/x1 ) w1 − (x2 /x1 ) v2 − · · · − (xm /x1 ) vm .
The subspace W generated by {w1 , v2 , . . . , vm } contains v1 . Hence, W = V. We continue this procedure until we
replace all v2 , . . . , vm by w2 , . . . wm . Thus, we have that the set

{w1 , . . . , wm }

generates V. Then for each i > m we have wi as a linear combination of w1 , . . . , wm . This is a contradiction because
w1 , . . . , wn are linearly independent since B2 is a basis. Hence, m ≥ n. Interchanging the roles of B1 and B2 we get
m = n.

Hence, we have the following definition.
Definition 3.6. Let V be a vector space and B a basis of V. Then,

dim(V) := |B|

is called the dimension of V.


Vector spaces with a finite dimension are called finite dimensional. In this book we will primarily study finite
dimensional vector spaces.
Let V be a vector space. A subspace W of V of dimension dim(W) = 1 is called a line and a subspace of dimension
2 is called a plane.
The following theorem is very useful when checking for a basis; we leave its proof as an exercise.
Theorem 3.3. Let V be a vector space with dim(V) = n and {v1 , . . . , vn } be linearly independent. Then, {v1 , . . . , vn } is a basis
for V.

Proof. Exercise. 
Corollary 3.1. Let V be a vector space and W a subspace of V. If dim(W) = dim(V) then W = V.


Proof. Take a basis B = {w1 , . . . wn } of W. Hence, w1 , . . . , wn are linearly independent. Then from the above theorem
they generate V. 

Corollary 3.2. Let V be a vector space and W a subspace of V. Then

dim(W) ≤ dim(V).

Proof. Exercise. 
" # " #
1 2
Example 3.13. Let u = and v = be vectors in V = R2 . What is the space W = Span (u, v)?
3 7

Solution: From the previous examples we know that dim(V) = 2. Then from the previous corollary

dim(W) ≤ 2.

Since u and v are not multiples of each other then they are independent. Hence, dim(W) = 2. From Corollary 3.1 we have that
W = R2 . 

3.2.1 A basis for Matn×n (R)


So far our examples of bases are from the spaces Rn . However, the above results hold for any vector space. So what
is a basis and the dimension of Matn×n (R)?

Example 3.14. Let V = Mat2×2 (R). Find a basis for V and its dimension.

Solution: First we notice that every matrix

A =
[ a  b ]
[ c  d ]   ∈ V

can be written as

[ a  b ]       [ 1  0 ]     [ 0  1 ]     [ 0  0 ]     [ 0  0 ]
[ c  d ]  = a  [ 0  0 ] + b [ 0  0 ] + c [ 1  0 ] + d [ 0  1 ].

Hence, the set B = {M1 , M2 , M3 , M4 }, where

M1 = [ 1 0 ; 0 0 ],  M2 = [ 0 1 ; 0 0 ],  M3 = [ 0 0 ; 1 0 ],  M4 = [ 0 0 ; 0 1 ],

generates all of V. Are M1 , M2 , M3 , M4 linearly independent? If

r1 M1 + r2 M2 + r3 M3 + r4 M4 = 0,

then

[ r1  r2 ]   [ 0  0 ]
[ r3  r4 ] = [ 0  0 ]
which gives
r1 = r2 = r3 = r4 = 0.

Hence, B is a basis of V and dim(V) = 4. 

Remark 3.1. In general, one can find a basis of Matn×n (k) as above and show that the dimension is n2 .


3.2.2 Finding a basis of a subspace in kn


Let w1 , . . . , wm be vectors in Rn and W = Span (w1 , . . . , wm ). By Lemma 3.2, W is a subspace of Rn . We want to
find a basis for W. So first we need to check if w1 , . . . , wm are independent. Hence, we would like to find scalars
r1 , . . . , rm ∈ R such that
r1 w1 + · · · + rm wm = 0.
Let w1 , . . . , wm be as below
w1 = (w1,1 , . . . , w1,n )T ,   w2 = (w2,1 , . . . , w2,n )T ,   . . . ,   wm = (wm,1 , . . . , wm,n )T .   (3.8)

Then
r1 w1 + · · · + rm wm = 0
implies

w1,1 r1 + w2,1 r2 + · · · + wm,1 rm = 0
w1,2 r1 + w2,2 r2 + · · · + wm,2 rm = 0
........................
w1,n r1 + w2,n r2 + · · · + wm,n rm = 0

Hence we have the system

[ w1,1  w2,1  . . .  wm,1 ]   [ r1 ]   [ 0 ]
[ w1,2  w2,2  . . .  wm,2 ]   [ r2 ]   [ 0 ]
[  ...   ...  . . .   ... ] · [ .. ] = [ .. ]
[ w1,n  w2,n  . . .  wm,n ]   [ rm ]   [ 0 ]

which can be written as

[ w1 | w2 | · · · | wm ] · (r1 , r2 , . . . , rm )T = 0.
To solve this system we find the row-echelon form of the matrix
A = [w1 | w2 | . . . | wm ].
If the row-echelon form has a pivot in every column then w1 , . . . , wm are linearly independent, otherwise they are
linearly dependent. The vectors which form a basis in this case are the ones corresponding to columns with pivots.
So we have the following algorithm:

Algorithm 3. Input: A subspace W generated by w1 , . . . , wm in kn .


Output: A basis of W

i) Form the matrix A = [w1 | w2 | . . . | wm ]


ii) Find the row-echelon form of A
iii) The columns with pivots come from wi ’s which form a basis for W.
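A minimal computational sketch of Algorithm 3 follows; it assumes the sympy library is available and the helper name basis_of_span is ours, not part of the text. The pivot columns reported by rref() are exactly the columns described in step iii).

```python
# Sketch of Algorithm 3 with sympy: keep the generators whose columns carry pivots.
from sympy import Matrix

def basis_of_span(vectors):
    """Return the sublist of `vectors` corresponding to pivot columns of [w1 | ... | wm]."""
    A = Matrix.hstack(*[Matrix(v) for v in vectors])   # columns are the w_i
    _, pivot_cols = A.rref()                           # reduced row-echelon form
    return [vectors[j] for j in pivot_cols]

# The vectors of Example 3.15 below:
w = [[1, 2, 3, 1], [-1, 3, 1, 5], [2, 4, 2, 6], [3, 3, 1, 5]]
print(basis_of_span(w))      # the first three vectors survive, as in the example
```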

Example 3.15. Let W = Span (w1 , w2 , w3 , w4 ) ⊂ R4 such that

1 −1 2 3


       
2  3  4 3
w1 =   , w2 =   , w3 =   , w4 =   (3.9)
       
3  1  2 1
1 5 6 5
       


Find a basis for W.

Solution: We form the matrix A = [ w1 | w2 | w3 | w4 ]:

A =
[ 1  -1  2  3 ]
[ 2   3  4  3 ]
[ 3   1  2  1 ]
[ 1   5  6  5 ]

The reduced row-echelon form of A is

[ 1  0  0  -2/5 ]
[ 0  1  0  -3/5 ]
[ 0  0  1   7/5 ]
[ 0  0  0    0  ]
Thus, the basis of W is B = {w1 , w2 , w3 }. 

Theorem 3.4. dim(Rn ) = n

Proof. Take the set

B = { e1 , e2 , . . . , en }

of elementary vectors, where ei has 1 in the i-th entry and 0 everywhere else. Obviously this set generates Rn , since every vector in Rn can be written as a linear combination of elements in B.
Create the matrix A = [ e1 | e2 | . . . | en ]. Then A = I, so it is already in reduced row-echelon form. Since every column has a pivot, the elements of B are linearly independent.
The basis B is called the standard basis of Rn .

Example 3.16. Let P4 be the vector space of polynomials with real coefficients and degree ≤ 4. Determine whether
{ f1 , f2 , f3 , f4 , f5 } given as below

f1 = 2x4 − x3 + 2x2 − 1
f2 = x4 − x
f3 = x4 + x3 + x2 + x + 1
f4 = x2 − 1
f5 = x − 1

form a basis for P4 .

Solution: We take the basis B = {x4 , x3 , x2 , x, 1} for P4 . The reader should verify that this is a basis for P4 . Then the coordinates
of f1 , f2 , f3 , f4 , f5 with respect to the basis B are

 2   1  1 0 0
         
 
−1  0  1  
0   
0 
     
f1 =  2  , f2 =  0  , f3 = 1 , f4 =  1  , f5 =  0  .
       
 0  −1 1 0  1 
         
 
       
−1 0 1 −1 −1


We can determine whether the polynomials are independent by determining whether the corresponding coordinate vectors in
R5 are independent. The corresponding matrix is

 2 1 1 0 0 
 
 -1 0 1 0

0 

 2 0 1 1 0 

 0 -1 1 0 1 
 
-1 0 1 -1 -1
and its reduced row-echelon form is the identity matrix I5 . Since every column has a pivot then the vectors are independent in
R5 and therefore f1 , . . . f5 are independent in P4 . The dimension of P4 is dim P4 = 5. Hence { f1 , f2 , f3 , f4 , f5 } form a basis for
P4 . 
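The verification in Example 3.16 is easy to reproduce on a computer. The following is a short sketch (assuming sympy); the matrix below is the coordinate matrix built in the solution.

```python
# Independence of f1,...,f5 reduces to the rank of their coordinate matrix in R^5.
from sympy import Matrix

C = Matrix([[ 2, 1, 1,  0,  0],     # coefficients of x^4
            [-1, 0, 1,  0,  0],     # coefficients of x^3
            [ 2, 0, 1,  1,  0],     # coefficients of x^2
            [ 0,-1, 1,  0,  1],     # coefficients of x
            [-1, 0, 1, -1, -1]])    # constant terms
print(C.rank())                     # 5, so {f1,...,f5} is a basis of P4
```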

Exercises:

3.13. Let V be a vector space over k. If a set of vectors is linearly independent in V, prove that the set does not contain the zero vector.

3.14. Let W = Span (w1 , w2 , w3 ) ⊂ R4 such that

w1 = (1, 2, 3, 1)T ,  w2 = (−1, 3, 1, 5)T ,  w3 = (1, 4, 0, 6)T .   (3.10)

Find a basis for W.

3.15. Let W = Span (w1 , w2 ) ⊂ R6 such that

w1 = (1, 2, 3, 1, 9, 5)T   and   w2 = (2, 4, 6, 2, 18, 10)T .   (3.11)

Find a basis for W.

3.16. Prove that any set B ⊂ Rn of n non-zero vectors which are mutually perpendicular forms a basis for Rn .

3.17. Let V = Mat3×3 (R). Find a basis for V and its dimension.

3.18. Let V = k[x]. Show that

f1 = x6 + x4   and   f2 = x6 + 3x4 − x

are linearly independent.

3.19. Let k be a field and V := k[x] the vector space of polynomials in x. Denote by Pn the space of polynomials in V of degree ≤ n. Find a basis for Pn .

3.20. Let V be the vector space of functions f : R → R. Let W be the subspace of V such that

W := Span (sin2 x, cos2 x).

Show that W contains all constant functions.

3.21. Let V be the vector space of functions f : R → R. Show that the set

{1, sin x, sin 2x, . . . , sin nx}

is an independent set of vectors in V.

3.22. Let V be the vector space of functions f : R → R. Find a basis of the subspace

W = Span (3 − sin x, 2 sin 2x − sin 3x, 3 sin 2x − sin 4x, sin 5x − sin 2x).

Hint: Use the previous problem.

3.3 Nullspace and rank of a matrix


Let A be an m × n matrix over k. Consider the rows Ri of A. These are vectors in kn . The span of the row vectors of A is called the row space of A. Similarly, the column vectors of A are vectors in km and their span is called the column space of A. As before, the nullspace of A is the solution set of the homogeneous system Ax = 0.
Theorem 3.5. Let A be an m × n matrix. The dimension of the row space is the same as the dimension of the column space.
This common dimension is equal to the number of pivots in the row-echelon form of A.
Proof. We use the previous procedure to find the dimension in both cases. This dimension is the number of pivots.

This common dimension is called the rank of A and is denoted by rank (A). The dimension of the nullspace is
called the nullity of A and is denoted by null (A).


Theorem 3.6 (Rank-Nullity Theorem). Let A be an m × n matrix and H its row-echelon form
i) rank (A) = number of pivots of H
ii) null (A)= number of columns without a pivot
Moreover,
rank (A) + null (A) = n

Proof. All is left to show is that null (A) = # columns without pivots in the row-echelon form. This is obvious since
we have as many free variables for the corresponding linear system as we have columns without pivots in the
row-echelon form of A. 
Example 3.17. Find the rank, nullity, a basis for the row space, a basis for the column space, and a basis for the nullspace of
the matrix
 2 1 1 
 
A =  3 2 2 
 
1 1 1
 

Solution: We start by finding the reduced row-echelon form of A:

A =
[ 2  1  1 ]
[ 3  2  2 ]
[ 1  1  1 ]
         H =
[ 1  0  0 ]
[ 0  1  1 ]
[ 0  0  0 ]

Then
rank (A) = 2,  and  null (A) = 1.

A basis for the column space consists of the columns of A corresponding to the pivot columns of H:

B1 = { (2, 3, 1)T , (1, 2, 1)T }.

To find a basis of the row space we use the nonzero rows of H. So we have

B2 = { (1, 0, 0), (0, 1, 1) }.

To find a basis for the nullspace we have to solve the system Hx = 0. The augmented matrix is

[ H | 0 ] =
[ 1  0  0 | 0 ]
[ 0  1  1 | 0 ]
[ 0  0  0 | 0 ]

Thus, x3 is a free variable, x1 = 0, and x2 + x3 = 0. The solution is

x = (0, −x3 , x3 )T = x3 (0, −1, 1)T .

So a basis for the nullspace is

B3 = { (0, −1, 1)T }.



3.3.1 Finding a basis for the row-space, column-space, and nullspace of a matrix.
Given a m×n matrix A, we would like to find the bases of spaces associated with it. We have the following algorithm:

Algorithm 4. Input: An m × n matrix A


Output: A basis for the row-space, column-space, and nullspace of A

i) Find the reduced row-echelon form H of A


ii) The columns of A corresponding to the columns of H with pivots, form a basis for the column space.
iii) The nonzero rows of H form a basis for the row space.
iv) Use back substitution to solve Hx = 0.
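As a companion to Algorithm 4, here is a minimal computational sketch (assuming sympy); it uses the matrix of Example 3.18 below, and the variable names are ours.

```python
# Sketch of Algorithm 4: bases of the column space, row space, and nullspace.
from sympy import Matrix

A = Matrix([[1,  2, -1, 3],
            [1,  1,  2, 1],
            [2, -1,  1, 2]])

H, pivots = A.rref()                                  # step i)
col_basis  = [A.col(j) for j in pivots]               # step ii): pivot columns of A
row_basis  = [H.row(i) for i in range(len(pivots))]   # step iii): nonzero rows of H
null_basis = A.nullspace()                            # step iv): solves Hx = 0

print(A.rank(), len(null_basis))    # 3 and 1
print(null_basis[0].T)              # proportional to (-3/2, -1/2, 1/2, 1)
```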

Example 3.18. Find bases for the spaces associated with A

 1 2 -1 3
 

A =  1 1 2 1
 

2 -1 1 2
 

Solution: The reduced row-echelon form is

H =
[ 1  0  0   3/2 ]
[ 0  1  0   1/2 ]
[ 0  0  1  -1/2 ]

A basis for the column space is

{ (1, 1, 2)T , (2, 1, −1)T , (−1, 2, 1)T }.

The rank of A is rank (A) = 3 and null (A) = 1. Thus, there is one free variable, which we denote by x4 . Solving Hx = 0 we have

x = ( −(3/2)x4 , −(1/2)x4 , (1/2)x4 , x4 )T = x4 ( −3/2, −1/2, 1/2, 1 )T .

A basis of the nullspace is

B = { ( −3/2, −1/2, 1/2, 1 )T }.
For a basis of the row space we take all three rows of H. 


Example 3.19. Find the rank, nullity of a basis for the column space, row space, and the nullspace of the matrix

 4 2 3 3 
 
A = 
 -2 1 1 2 
3 -1 2 1
 

Solution: The reduced row-echelon form of A is

H =
[ 1  0  0   -6/23 ]
[ 0  1  0    9/23 ]
[ 0  0  1   25/23 ]

Then,
rank (A) = 3,   null (A) = 1.

For the basis of the column space we have

{ (4, −2, 3)T , (2, 1, −1)T , (3, 1, 2)T }.

For the basis of the row space we can take the three nonzero rows of H (or, since rank (A) = 3, the three rows of A themselves). Next we find a basis for the nullspace. Hence, we have to solve the system Hx = 0. The solution is

x = ( (6/23)x4 , −(9/23)x4 , −(25/23)x4 , x4 )T = t (6, −9, −25, 23)T ,

for some free variable t. Hence, a basis is

B = { (6, −9, −25, 23)T }.

The next theorem relates some of the previous topics to this section.
Theorem 3.7. Let A be an n × n matrix. The following are equivalent:
i) Ax = b has a unique solution for every b ∈ Rn .
ii) A is row equivalent to In
iii) A is invertible
iv) The column vectors of A form a basis for Rn
Proof. Exercise 
The following result is quite useful when checking for inverses.
Corollary 3.3. Let A be an n × n matrix. Then A is invertible if and only if
rank (A) = n.

Exercises:


3.23. Find the rank, a basis for the row space, a basis for the column space, and a basis for the nullspace of the following matrices:

[ 2  3  2   1 ]      [ 1  1  1 ]      [ 1  2  3 ]
[ 1  1  0   1 ]  ,   [ 1  2  3 ]  ,   [ 4  5  6 ]
[ 2  3  1  -1 ]      [ 3  4  5 ]      [ 7  8  9 ]

3.24. Let A be a square matrix. Show that
null (A) = null (At ).

3.25. Let A, B be matrices such that the product AB is defined. Show that
rank (AB) ≤ rank (A).

3.26. Give an example of two matrices A, B such that
rank (AB) < rank (A).

3.27. Let A be an m × n matrix. Prove that
rank (AAt ) = rank (A).

3.28. Let u and v be linearly independent column vectors in R3 and A an invertible 3 × 3 matrix. Prove that the vectors Au and Av are linearly independent.

3.29. Generalize the above problem to Rn . Let u1 , . . . , un be linearly independent column vectors in Rn and A an invertible n × n matrix. Prove that the vectors Au1 , . . . , Aun are linearly independent.

3.30. Let u and v be column vectors in R3 and A an invertible 3 × 3 matrix. Prove that if the vectors Au and Av are linearly independent then u and v are linearly independent.

3.31. Generalize the above problem to Rn . Let u1 , . . . , un be column vectors in Rn and A an invertible n × n matrix. Prove that if the vectors Au1 , . . . , Aun are linearly independent then u1 , . . . , un are linearly independent.

3.32. Let

A = [ cos θ  −sin θ ; sin θ  cos θ ]

for some angle θ. Take any vector u ∈ R2 and compare it with the vector Au. What happens geometrically?

3.33. Let A be as in the previous exercise and {u, v} a basis of R2 . Show that {Au, Av} is a basis for R2 . You might want to look at the nullspace of A.

3.4 Sums, direct sums, and direct products


Let V be a finite dimensional vector space and U, W its subspaces. We define the sum U + W of subspaces U and W
as follows
U + W := {u + w | u ∈ U, w ∈ W}
This set U + W is a subspace of V; see Exercise 3.1 at the end of this section.

Lemma 3.3. Let V be a finite dimensional vector space and U, W its subspaces. Then

dim(U + W) = dim U + dim W − dim(U ∩ W).

Proof. Exercise. 
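The dimension formula of Lemma 3.3 is easy to test on a concrete pair of subspaces. Below is a small sketch (assuming sympy): with independent generating columns, a kernel vector of [U | −W] corresponds to a nonzero vector of U ∩ W, which gives dim(U ∩ W) independently of the formula.

```python
# Sanity check of dim(U + W) = dim U + dim W - dim(U ∩ W) for two planes in R^3.
from sympy import Matrix

U = Matrix([[1, 0], [0, 1], [0, 0]])    # columns span U
W = Matrix([[1, 0], [0, 0], [0, 1]])    # columns span W

dim_U, dim_W = U.rank(), W.rank()
dim_sum = Matrix.hstack(U, W).rank()                      # dim(U + W)
dim_int = len(Matrix.hstack(U, -W).nullspace())           # dim(U ∩ W), columns independent

print(dim_sum, dim_U + dim_W - dim_int)                   # both equal 3
```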

3.4.1 Direct sums


We say that V is a direct sum of U and W, denoted by V = U ⊕ W, if every element v in V is expressed uniquely as
a sum
v = u+w
for some u ∈ U and w ∈ W.

Theorem 3.8. Let U, W be subspaces of the vector space V. If V = U + W and U ∩ W = {0}, then

V = U ⊕ W.

Proof. Let v ∈ V and v = u + w for some u ∈ U and w ∈ W. To prove that V is a direct sum we must show that u and w are uniquely determined. Assume that there exist u′ ∈ U and w′ ∈ W such that v = u′ + w′ . Then,

v − v = (u − u′ ) + (w − w′ ) = 0.

Hence, u − u′ = w′ − w. Since u − u′ ∈ U and w′ − w ∈ W, then

(u − u′ ) = (w′ − w) ∈ U ∩ W = {0}.

Therefore,
u = u′   and   w = w′ .
This completes the proof. 
Theorem 3.9. Let V be a finite dimensional vector space over k and W a subspace of V. Then, there is a subspace U ⊂ V such
that
V = U⊕W
Proof. Let dim V = n and dim W = r, where r ≤ n. Let {b1 , . . . , br } be a basis for W and extend it to a basis

B = {b1 , . . . , bn }

of V. Let

U := Span (br+1 , . . . , bn ).

Obviously V = U + W. Also U ∩ W = {0}, since otherwise b1 , . . . , bn would not be linearly independent. 


The subspace U is called the complement of W in V.
Example 3.20. Let V = R3 and
B = {i, j, k}
its standard basis. Let
U := Span (i, j)
Then, from the above theorem,
V := U ⊕ W
where W = Span (k). Thus
R3 = Span (i, j) ⊕ Span (k)

The next result is an immediate consequence of Lem. 3.3. We also provide a direct proof.
Theorem 3.10. Let V be a finite dimensional vector space over k such that V = U ⊕ W. Then,

dim(V) = dim(U) + dim(W)

Proof. Let B1 and B2 be bases for U and W respectively. Say

B1 = {u1 , . . . , ur } and B2 = {w1 , . . . , ws }

Then every element of U can be written as a unique linear combination

u = x1 u1 + · · · + xr ur

and every element of W can be written as a unique linear combination

w = y1 w1 + · · · + ys ws

Hence, every element of V can be written as a unique linear combination

v = x1 u1 + · · · + xr ur + y1 w1 + · · · + ys ws

Thus the set


{u1 , . . . , ur , w1 , . . . , ws }


forms a basis for V. 


The definition of the direct sum can be generalized to several summands. We say that
n
M
V= Vi = V1 ⊕ · · · ⊕ Vn
i=1

if every element in V can be written uniquely as a sum

v = v1 + · · · + vn , with vi ∈ Vi .

3.4.2 Direct products


The notion of direct products is based on the Cartesian products. We review some of the standard terminology. Let
U and W be vector spaces over some field k. We let U × W be the set of all ordered pairs (u, w) such that u ∈ U and
w ∈ W, i.e.,
U × W := {(u, w) | u ∈ U, w ∈ W}
We define the addition of any two pairs (u1 , w1 ) and (u2 , w2 ) as follows

(u1 , w1 ) + (u2 , w2 ) = (u1 + u2 , w1 + w2 )

The scalar multiplication is defined as follows: for every r ∈ k,

r (u, w) = (ru, rw)

Exercise: Show that U × W with this addition and scalar multiplication is a vector space over k.

Definition 3.7. The vector space U × W is called the direct product of U and W.

Lemma 3.4. Let U, W be vector spaces. Then,

dim(U × W) = dim U + dim W

Proof. The proof is left to the reader.



The definition of the direct product can be generalized to several factors. For example
n
Y
V := Vi = V1 × · · · × Vn
i=1

is the set of n-tuples where addition and scalar multiplication are defined coordinate-wise.

Exercises:

" #  
1
 
2 0
3.34. Let V = R2 and W be the subspace generated by w = .
and let U be the subspace generated by u1 =   and u2 = 1.
1
3
 
0 1
" #    
1
Let U be the subspace generated by u = . Show that V is Show that V is the direct sum of W and U.
1
the direct sum of W and U. Can you generalize this to any
two vectors u and w?
3.36. Let u and v be two nonzero vectors in R2 . If there is no
c ∈ R such that u = cv, show that {u, v} is a basis of R2 and that
 
1
3.35. Let V = R . Let W be the space generated by w = 0,
3   R2 is a direct sum of the subspaces generated by U = Span (u)
0
  and V = Span (v) respectively.

62
Shaska T. Linear Algebra

3.37. Let U and W be subspaces of V. What are U + U and U + V? Is U + W = W + U?

3.38. Let U, W be subspaces of a vector space V. Show that
dim U + dim W = dim(U + W) + dim(U ∩ W).

3.39. Let k be a field, V = Mat2×2 (k),
U := { [ a  b ; -b  a ] | a, b ∈ k }
and
W := { [ a  b ; b  -a ] | a, b ∈ k }.
Show that:
i) U and W are subspaces of V.
ii) V = U ⊕ W

3.40. Let U and W be subspaces of a vector space V.
i) Show that U ∩ W ⊂ U ∪ W ⊂ U + W.
ii) When is U ∪ W a subspace of V?
iii) What is the smallest subspace of V containing U ∪ W?

3.41. Let V be a vector space over k and S the set of all subspaces of V. Consider the operation of subspace addition in S. Show that there is a zero in S for this operation and that the operation is associative.

3.42. Let V be a vector space over k and S the set of all subspaces of V. Consider the operation of intersection in S. Show that this operation is associative. Is there an identity for this operation (i.e., is there an E ∈ S such that A ∩ E = A for all A in S)?

Review exercises
3.43. Define the following: vector space over a field k, subspace, nullspace, direct sum, direct product.

3.44. A square matrix is called upper triangular if all entries below the main diagonal are zero. Let V = Matn×n (R) and W the set of all upper triangular matrices of V. Is W a subspace of V? Justify your answer.

3.45. A square matrix is called lower triangular if all entries above the main diagonal are zero. Let V = Matn×n (R), W1 be the set of all upper triangular matrices of V, and W2 be the set of all lower triangular matrices of V. What is the intersection W1 ∩ W2 ?

3.46. Let A be an n × n invertible matrix. What are rank (A) and null (A)? What is the reduced row-echelon form of A?

3.47. Let A be an invertible 3 by 3 matrix. Prove that B := {u, v, w} is a basis for R3 if and only if {Au, Av, Aw} is a basis for R3 .

3.48. Let Ax = b be a linear system of n equations and n unknowns. How many solutions does this system have if rank (A) = n? What if rank (A) < n? Explain.

3.49. Find a basis for the subspace W = Span (w1 , w2 , w3 , w4 ) in R4 , where w1 , . . . , w4 are given as below:

w1 = (1, 0, 3, 1)T ,  w2 = (−1, 3, 1, 5)T ,  w3 = (1, 4, 2, 1)T ,  w4 = (3, 0, 1, 5)T .   (3.12)

3.50. Let V = R2 , W = Span ( (2, 3)T ), and U = Span ( (−1, 1)T ). Show that V is the direct sum of W and U.

3.51. Let B := {u, v, w} such that

u = (1, 2, 3)T ,  v = (1, −1, 1)T ,  w = (1, 3, 1)T .

Is B a basis for R3 ? Justify your answer.

3.52. Let V = Matn (R). Find the matrices that commute with every element of V.

3.53. Let GL2 (k) denote the set of matrices in Mat2 (k) which have an inverse. Is GL2 (k) a subspace of Mat2 (k)? Justify your answer.


Chapter 4

Linear Transformations

Let us consider again one of the questions that was raised in Chapter 1, and more specifically in Section 1.4. What kind of transformations of Rn preserve most (or all) of the geometric properties of objects and at the same time keep the algebraic structure of Rn ?
There are two algebraic operations in Rn , namely vector addition and scalar multiplication. What should a map look like if it is to preserve both of these operations? Do such maps preserve the geometric properties of the objects?

4.1 Linear maps between vector spaces


In this section we will study maps between vector spaces. We are interested in maps which will preserve the
operations on the vector space. Let V and V 0 be vector spaces over the same field k.
Definition 4.1. A map
T : V → V0
will be called a linear map if the following hold for any u, v ∈ V and r ∈ k:

i) T(u + v) = T(u) + T(v),


ii) T(r · u) = r · T(u)
Example 4.1. Let V = Rn and A be an n × n matrix. We define the following map:
TA : V −→ V
(4.1)
x −→ A · x
It is easily checked that this is a linear map. 
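For a quick numerical illustration of the two defining properties, here is a short sketch (assuming numpy; the random matrix and vectors are only an example).

```python
# T_A(x) = A·x respects addition and scalar multiplication.
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(3, 3))
u, v, r = rng.random(3), rng.random(3), 2.5

print(np.allclose(A @ (u + v), A @ u + A @ v))   # True: additivity
print(np.allclose(A @ (r * u), r * (A @ u)))     # True: homogeneity
```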
Example 4.2. Let U and V be vector spaces over k. We denote the set of all linear maps f : U → V by
L(U, V) := { f : U → V | f is linear }
We define an addition in L(U, V) as the usual addition of functions and the scalar multiplication will be the multiplication by
a constant from k. In other words,
( f + g)(u) = f (u) + g(u)
(r · f )(u) = r · f (u)
We leave it as an exercise for the reader to show that L(U, V) is a vector space over k. This is a very important space and will
be used again later in the course. 
Lemma 4.1. Let T : V → W be a linear map between the vector spaces V and W. Then the following hold:

i) T(0V ) = 0W .
ii) For every v ∈ V, T(−v) = −T(v).


Proof. The proof is straightforward. For i),

T(0V ) = T(0V + 0V ) = T(0V ) + T(0V ),

and subtracting T(0V ) from both sides gives T(0V ) = 0W . For ii), T(−v) = T((−1) · v) = (−1) · T(v) = −T(v). 
Definition 4.2. Let T : U → V be a linear map between the vector spaces U and V. The kernel of T, denoted by ker(T), is
defined to be the following
ker(T) := {u ∈ U | T(u) = 0W }
The image of T is defined to be
Img(T) := {v ∈ V | ∃u ∈ U, T(u) = v}

Lemma 4.2. Let


T:V→W
be a linear map. Then,

i) ker(T) is a subspace of V
ii) Img(T) is a subspace of W.
Proof. Exercise 
The following lemma is helpful in checking whether a linear map is injective or not.
Lemma 4.3. Let
T:V→W
be a linear map. Then ker(T) = {0V } if and only if T is injective.
Proof. Assume that ker(T) = {0V }. Then, for every v1 , v2 ∈ V such that T(v1 ) = T(v2 ) we have
T(v1 ) − T(v2 ) = 0 =⇒ T(v1 − v2 ) = 0 =⇒ (v1 − v2 ) ∈ ker(T)
which means that
v1 − v2 = 0 =⇒ v1 = v2
Assume that T is injective and let v ∈ ker(T). Then T(v) = T(0V ) = 0W implies that v = 0V .

Example 4.3. Let C(R) denote the vector space of all differentiable functions f : R → R. Consider the map
D : C(R) → C(R)
f (x) → D( f (x)) = f 0 (x)
where f 0 (x) is the derivative of f (x). Show that D is a linear map. 
Theorem 4.1. Let T : V → W be an injective linear map. If v1 , . . . vn are linearly independent elements in V, then
T(v1 ), . . . , T(vn ) are linearly independent elements in W.
Proof. Let
y1 T(v1 ) + · · · + yn T(vn ) = 0W
for y1 , . . . , yn scalars. Then
T(y1 v1 ) + · · · + T(yn vn ) = 0W
which implies that
T(y1 v1 + · · · + yn vn ) = 0W
Since T is injective then ker(T) = {0} and
y1 v1 + · · · + yn vn = 0V
This implies that
y1 = · · · = yn = 0
since v1 , . . . , vn are linearly independent. Thus T(v1 ), . . . , T(vn ) are linearly independent elements in W. 
We will accept the following theorems without proof.


Theorem 4.2. Let T : V → W be a linear map. Then,

dim V = dim ker(T) + dim Img(T)

Theorem 4.3. Let T : V → W be a linear map and dim V = dim W. If ker(T) = {0} or Img(T) = W, then T is bijective.

Proof. If ker(T) = {0} then T is injective and

dim Img(T) ≥ dim V = dim W

Thus Img T = W and T is surjective.


If Img(T) = W then T is surjective and
dim ker(T) = 0
Thus ker(T) = {0V } and T is also injective. 

Example 4.4. Let A be the matrix


 -1 2 3
 

 4 5 6 
 
7 8 9
 

and LA the linear map

LA : R3 −→ R3
(4.2)
x −→ A · x

Determine whether the map LA is bijective.

Solution: We determine first ker(LA ). More precisely we want to find all x ∈ R3 such that

T(x) = Ax = 0

Hence ker(LA ) is the same as the nullspace of A. To find the nullspace we proceed as before. The reduced row-echelon form is

 1 0 0
 

H =  0 1 0
 

0 0 1
 

Hence rank (A) = 3, null (A) = 0 and nullspace of A is {0}. Thus, ker(LA ) = {0} and LA is injective. From the previous theorem
we conclude that LA is bijective. 
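The same conclusion can be reached computationally; the following is a sketch assuming sympy.

```python
# Example 4.4: L_A is bijective exactly when ker(L_A) = {0}, i.e. rank(A) = 3.
from sympy import Matrix

A = Matrix([[-1, 2, 3],
            [ 4, 5, 6],
            [ 7, 8, 9]])
print(A.rank(), A.nullspace())   # 3 and [], so ker(L_A) = {0} and L_A is bijective
print(A.det())                   # nonzero, which gives the same conclusion
```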

Remark 4.1 (Notation). It is important to emphasize the notation used in the literature for linear maps. If L : Rn → Rm is a linear map, then implicitly Rn and Rm are regarded as vector spaces, so elements of Rn and Rm are (column) vectors. Therefore the notation

L( (x1 , . . . , xn )T ) = (y1 , y2 , . . . , ym )T

is used.
If L : Rn 7→ Rm is considered as a map among sets Rn , Rm , then the notation

L(x1 , . . . , xn ) = (y1 , y2 , . . . , ym )

must be used. Both notations are used in the literature. We will stick to the column vectors notation whenever possible.


4.2 Composition of linear maps, inverse maps, isomorphisms


It is natural to ask whether the composition of two linear maps is linear or if the inverse of a linear map is linear.
Theorem 4.4. Let U, V, W be vector spaces over some field k and f and g be linear maps:

f g
U −→ V −→ W

Then the map


g◦ f : U −→ W
is also linear.
Proof. Let u1 , u2 ∈ U. Then

(g ◦ f )(u1 + u2 ) = g( f (u1 + u2 )) = g( f (u1 ) + f (u2 )) = (g ◦ f )(u1 ) + (g ◦ f )(u2 ).

Also,

(g ◦ f )(r · u) = g( f (r · u)) = g(r · f (u)) = r · (g ◦ f )(u).



Example 4.5. Let A and B be matrices of dimension m × n and n × s respectively and LA , LB be the linear maps
LA LB
Rm −→ Rn −→ Rs

such that LA (x) = Ax and LB (x) = Bx. The map LB ◦ LA is given by

LB ◦ LA : Rm −→ Rs
x −→ (BA) x

and it is easily verified to be linear. 


Theorem 4.5. Let U, V be vector spaces over a field k and f : U −→ V be a linear map which has an inverse f −1 : V −→ U.
Then, f −1 is linear.
Proof. Exercise 
Let U, V be vector spaces and
L : U −→ V
a linear map which has an inverse. Then, L is called an isomorphism and U and V are called isomorphic spaces.

Exercises:

4.1. Let T : R → R such that T(x) = sin x. Is T an isomorphism? Explain.

4.2. Let A = [aij ] be an n × n matrix and tr(A) denote its trace. Show that the map

tr : Matn×n (k) −→ k,   A −→ tr(A)

is a linear map.

4.3. Let L([0, 1], R) denote the set of integrable functions on the interval [0, 1]. Check whether the map

φ : L([0, 1], R) −→ L(R),   f (x) −→ ∫0^1 f (x) dx

is a linear map.

4.4. Let T : Rn → Rn be a linear map given by T(x) = Ax for some n × n invertible matrix A. Show that T is a bijection.

4.5. Let P4 be the vector space of degree ≤ 4 polynomials with real coefficients. Show that P4 is isomorphic to R5 .

4.6. Generalize the above result. In other words, prove that Pn is isomorphic to Rn+1 .

4.7. Can you find two vector spaces of the same finite dimension which are not isomorphic? Explain.

4.8. We know that C is a vector space over R. Define the map T : C → C such that T(z) = z̄, where z̄ is the complex conjugate of z. Is T a linear map?

4.9. Let T : C → C such that T(z) = z + z0 , where z0 is a fixed complex number. Is T an isomorphism?

4.10. Let T : C → C such that

T(z) = 1/z for z ≠ 0,   and   T(0) = 0.

Is T an isomorphism? Explain.

4.3 Matrices associated to linear maps


One of the "nicest" things of linear algebra is that to every linear map we can associate a matrix and vice-versa.
Thus, we can use the properties of matrices to understand all about linear maps. In this section we will find out
how we can get the matrix when a map is given.
Let U and V be finite dimensional vector spaces over the field k and
L : U −→ V
be a linear map. Further, let
B1 := {u1 , . . . , un }
B2 := {v1 , . . . . . . , vm }
be bases for U and V respectively. Then the values L(u1 ), . . . , L(un ) are as follows:
L(u1 ) = a1,1 v1 + · · · a1,m vm
L(u2 ) = a2,1 v1 + · · · a2,m vm
·········
L(un ) = an,1 v1 + · · · an,m vm
for some scalars ai,j ∈ k. We take the n by m matrix A = [ai,j ], where ai,j are as above. Its transpose At is the m × n
matrix
 a1,1 a2,1 · · · an,1 
 
 a a2,2 · · · an,2 
At =  1,2

··· ···

 
a1,m a2,m · · · an,m
 

which is called the matrix associated with the linear map L with respect to the bases B1 and B2 and denoted by
B
MB2 (L).
1
Indeed, every vector x ∈ U is written as
x = x1 u1 + · · · + xn un
where x1 , . . . , xn are scalars in k. Hence,
L(x) = x1 L(u1 ) + · · · + xn L(un )

= x1 a1,1 v1 + · · · + a1,m vm


+ x2 a2,1 v1 + · · · + a2,m vm


·········
+ xn an,1 v1 + · · · + an,m vm


=(a1,1 x1 + a2,1 x2 + · · · + an,1 xn )v1


(a1,2 x1 + a2,2 x2 + · · · + an,2 xn )v2
·········
(a1,m x1 + a2,m x2 + · · · + an,m xn )vm


Thus, the coordinates of the vector L(x) with respect to the basis B2 of V are

 a1,1 x1 + a2,1 x2 + · · · + an,1 xn


 

 a x + a x + · · · + a x 
 1,2 1 2,2 2 n,2 n 
L(x) =  ·········
 

·········
 
 
a1,m x1 + a2,m x2 + · · · + an,m xn
 

x1
 
 a1,1 a2,1 ··· an,1
   
  x2 
 a a2,2 ··· an,2   
=  1,2  ·  ·
   
··· ···

·
   
a1,m a2,m ··· an,m
   
 
xn

B
= At x = MB2 (L) x
1

B
Thus for any linear map L : U → V there is a matrix MB2 (L) with respect to bases B1 and B2 such that
1

B
L(x) = MB2 (L) x
1

B
We normally write MB2 (L) in the following way
1

B
h i
MB2 (L) = L(u1 )B2 | · · · | L(un )B2
1

where each L(ui )B2 is the column vector L(ui ) with respect to the basis B2 of V.

Example 4.6. Let L : R2 → R3 be the linear map given by

" #!  x − y 
x
= 2x − 3y
 
L
y
x − 3y
 

Find the matrix associated with L with respect to the standard bases.

Solution: The standard basis for R2 is


(" # " #)
1 0
B1 = {~i, ~j} = ,
0 1

Then
−1
 
1
 
L(i) = 2 , L( j) = −3
   
1 −3
   

with respect to the standard basis of R3 . Hence, the associated matrix of L : R2 → R3 is

 1 -1
 

 2 -3 
 
1 -3
 

with respect to the standard bases of R2 and R3 . 
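The construction "columns of the matrix are the images of the basis vectors" is easy to automate; here is a small sketch (assuming numpy), using the map of Example 4.6.

```python
# Assemble the associated matrix of Example 4.6 from L(i) and L(j).
import numpy as np

def L(v):
    x, y = v
    return np.array([x - y, 2*x - 3*y, x - 3*y])

M = np.column_stack([L(e) for e in (np.array([1, 0]), np.array([0, 1]))])
print(M)                                                     # [[1 -1], [2 -3], [1 -3]]
print(np.array_equal(M @ np.array([2, 5]), L(np.array([2, 5]))))  # True: M x = L(x)
```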


Example 4.7. Let

f : R2 → R2
x cos θ − y sin θ
" # " #
x

y x sin θ + y cos θ

for some fixed θ.


The reader should show that f is a linear map. It is an exercise in trigonometry to show that this map rotates every point
of R2 by the angle θ. What is the matrix associated to f with respect to the standard basis of R2 ?

Solution: We have
cos θ − sin θ
" #! " # " #! " #
1 0
f = , f =
0 sin θ 1 cos θ
Then, the associated matrix is:
cos θ − sin θ
" #
A := M( f ) =
sin θ cos θ
We have seen in the exercises of Chapter 1 that
" #
cos nθ − sin nθ
A =n
sin nθ cos nθ

Indeed this is to be expected since rotating n-times by θ is the same as rotating by the angle nθ. 
We now see an example when neither of the bases B1 , B2 is a standard basis.
Example 4.8. Let

T : R3 −→ R4
(x, y, z)T −→ ( x + y, y + z, x − y, y − z )T

be a linear map. Fix bases

B1 = { (1, 1, 1)T , (2, 1, 0)T , (3, 1, 1)T } = {u1 , u2 , u3 },

B2 = { (1, 0, 0, 1)T , (1, 2, 0, 0)T , (2, 3, 2, 1)T , (0, 0, 0, 2)T } = {v1 , v2 , v3 , v4 }

of R3 and R4 respectively. Find the associated matrix of T with respect to B1 and B2 .

Solution: We first find the following

2 3 4


     
2 1 2
T(u1 ) =   =: w1 , T(u2 ) =   =: w2 , T(u3 ) =   =: w3
0 1 2
0 1 0

Now we need to express the vectors w1 , w2 , w3 with respect to the basis B2 . Each one of them must be expressed as

r1 v1 + r2 v2 + r3 v3 + r4 v4 = (r1 + r2 + 2r3 , 2r2 + 3r3 , 2r3 , r1 + r3 + 2r4 ).


Thus we have (with respect to B2 )

w1 = (1, 1, 0, −1/2)T ,   w2 = (9/4, −1/4, 1/2, −7/8)T ,   w3 = (5/2, −1/2, 1, −7/4)T .

The matrix is

M^{B2}_{B1} =
[  1     9/4    5/2  ]
[  1    -1/4   -1/2  ]
[  0     1/2     1   ]
[ -1/2  -7/8   -7/4  ]
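The coordinate vectors above are obtained by solving a small linear system for each T(ui ); the following sketch (assuming sympy) reproduces the computation.

```python
# Example 4.8: coordinates of T(u_i) with respect to B2, column by column.
from sympy import Matrix

B2 = Matrix([[1, 1, 2, 0],
             [0, 2, 3, 0],
             [0, 0, 2, 0],
             [1, 0, 1, 2]])                      # columns are v1, v2, v3, v4

def T(x, y, z):
    return Matrix([x + y, y + z, x - y, y - z])

cols = [B2.solve(T(*u)) for u in ((1, 1, 1), (2, 1, 0), (3, 1, 1))]
print(Matrix.hstack(*cols))                      # the matrix M^{B2}_{B1} above
```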

The following theorem makes precise the relation between matrices and linear maps. Let U and V be vector spaces
over k and B1 , B2 their bases respectively. From now on when there is no confusion for a linear map f : U → V we
B
will simply use M( f ) instead of MB2 ( f ).
1

Theorem 4.6. Let U and V be vector spaces over k and B1 , B2 their respective bases. For any f, g ∈ L(U, V) the following hold:

i) M( f + g) = M( f ) + M(g)

ii) M(r f ) = r M( f ), for any scalar r ∈ k.

iii) M( f ◦ g) = M( f ) · M(g)

Proof. The proof is straight forward and left as an exercise. 


The following theorem shows that not only to every linear map we can associate a matrix but that the converse
also holds.
Theorem 4.7. Let U and V be vector spaces over k of dimension n and m respectively. Fix bases B1 , B2 of U and V. Further,
let L(U, V) be the space of linear maps f : U → V. Then

Φ : L(U, V) −→ Matm×n (k)


(4.3)
f −→ M( f )

is an isomorphism.
Proof. The previous theorem shows that Φ is a linear map. First we show that φ is injective. Let f, g ∈ L(U, V) such
that Φ( f ) = Φ(g). Thus, M( f ) = M(g). Hence, for every x ∈ U we have

M( f ) x = M(g) x

which means that f (x) = g(x). Therefore, f = g and Φ is injective.


Let A ∈ Matm×n (K). Define

LA : U −→ V
(4.4)
x −→ A x

Then, LA ∈ L(U, V). Hence, Φ is surjective. 

Exercises:


4.11. Check whether the map T : R3 −→ R4 such that
T(x, y, z) = (x + 2, y − x, x + y)
is linear. If it is linear then find its associated matrix.

4.12. Find the associated matrix with respect to the standard bases of the map T : R3 −→ R4 such that
T(x, y, z) = (x, y, x + y + z).

4.13. Find the associated matrix with respect to the standard bases of the map T : R2 −→ R3 such that
T(x, y) = (x + y, 3y, 7x + 2y).

4.14. Find the associated matrix with respect to the standard bases of the map T : R5 −→ R5 such that
T(x1 , . . . , x5 ) = (x1 , x2 , x3 , x4 , x5 ).

4.15. Let L1 (R) be the vector space of differentiable functions from R to R. Let
V := Span (sin x, cos x)
and D : L1 (R) → L1 (R) the differentiation map. The restriction of this map to V gives a linear map DV : V → V. Find the matrix representation of DV for B1 = B2 = {sin x, cos x}.

4.16. Let Pn denote the vector space over R of polynomials with coefficients in R and degree ≤ n. Differentiation of polynomials is a linear map on this space. Find its matrix representation for
B1 = B2 = {1, x, . . . , xn }.

4.17. Let u = (1, 2)T ∈ R2 and T : R2 → R2 such that T(x) = u + x. Find the matrix representation of T with respect to the standard basis of R2 .

4.18. Let T : R2 → R2 be the transformation which rotates every point counterclockwise by the angle θ. Find its matrix representation with respect to the standard basis.

4.19. Let T : R2 → R2 be the transformation of the plane which sends every point to its symmetric point with respect to the x-axis (i.e., T(x, y) = (x, −y)). Find the matrix representation of T with respect to the standard basis.

4.20. Find the standard matrix representation for the reflection of the xy-plane with respect to the line y = x + 2.

4.4 Change of basis


Sometimes we have to deal with two different bases for the same vector space. The above discussion gives a way
of finding the coordinates of a vector with respect to a given basis.
Let V be a vector space and B, B′ two bases of V given by

B = {b1 , . . . , bn },   B′ = {b′1 , . . . , b′n },

and T : V → V be the linear map such that

T : V −→ V,   bi −→ b′i .   (4.5)

We denote the associated matrix of T by M^{B′}_{B} and call it the transformation matrix of B to B′ . Then M^{B′}_{B} is given by

M^{B′}_{B} = [ T(b1 ) | · · · | T(bn ) ] = [ b′1 | · · · | b′n ],

where the b′i ’s are given with respect to the basis B′ . We give the following algorithm for computing the transformation matrix.

Algorithm 5. Input: A vector space V and two bases B1 = {u1 , · · · , un } and B2 = {v1 , · · · , vn } of V
Output: The transformation matrix M^{B2}_{B1} , such that

M^{B2}_{B1} · vB1 = vB2

i) Create the matrix

A = [ v1 | · · · | vn | u1 | . . . | un ]

ii) Transform A by row operations to the matrix

[ I | M^{B2}_{B1} ]
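A short computational sketch of Algorithm 5 follows (assuming sympy; the function name is ours). It is applied to the bases of Example 4.9 below.

```python
# Row-reduce [ v1 ... vn | u1 ... un ] to [ I | M ]; M converts B1-coordinates to B2-coordinates.
from sympy import Matrix, eye

def transformation_matrix(B1, B2):
    n = B1.shape[0]
    R, _ = Matrix.hstack(B2, B1).rref()
    assert R[:, :n] == eye(n)          # B2 really is a basis
    return R[:, n:]

B1 = Matrix([[1, 1], [1, 0]])          # columns: the basis B1 of Example 4.9
B2 = Matrix([[1, -1], [2, 1]])         # columns: the basis B2
M = transformation_matrix(B1, B2)
print(M)                               # (1/3) * [[2, 1], [-1, -2]]
print(M * Matrix([3, 4]))              # (1/3) * (10, -11), as in Example 4.9
```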


Example 4.9. Let V = R2 and

B1 = { (1, 1)T , (1, 0)T },   B2 = { (1, 2)T , (−1, 1)T }

be two bases of V. Find the transformation matrix M^{B2}_{B1} . Given vectors u, v with coordinates

u = (3, 4)T   and   v = (−2, 3)T

with respect to the B1 basis, find their coordinates with respect to B2 .

Solution: We first create the matrix

A =
[ 1  -1  1  1 ]
[ 2   1  1  0 ]

By row operations we transform it to

[ 1  0   2/3   1/3 ]
[ 0  1  -1/3  -2/3 ]

Then

M^{B2}_{B1} = (1/3) · [ 2  1 ; -1  -2 ]

and

uB2 = M^{B2}_{B1} · (3, 4)T = (1/3) (10, −11)T ,   vB2 = M^{B2}_{B1} · (−2, 3)T = −(1/3) (1, 4)T .

Example 4.10. Let u ∈ R3 with coordinates in the standard basis u = (1, 2, 3)T . Find the coordinates of u with respect to the basis

B′ = { (1, 1, 1)T , (2, 0, 1)T , (3, 1, 1)T }.

Solution: We first create the matrix

A =
[ 1  2  3  1  0  0 ]
[ 1  0  1  0  1  0 ]
[ 1  1  1  0  0  1 ]

By row operations we transform it to

[ 1  0  0  -1/2   1/2    1 ]
[ 0  1  0    0    -1     1 ]
[ 0  0  1   1/2   1/2   -1 ]

Then

M =
[ -1/2   1/2    1 ]
[   0    -1     1 ]
[  1/2   1/2   -1 ]

and

M · u = (7/2, 1, −3/2)T .


Let V be a finite dimensional vector space and B and B0
be bases of V. Let L : V → V be a linear transformation
and MB and MB0 be associated matrices for L with respect to bases B and B0 respectively. Then we have the
following theorem.
0
Theorem 4.8. Let M := MB
B
be the transformation matrix from B to B0 . Then,

MB0 (L) = M−1 · MB (L) · M

Proof. Exercise. 
Example 4.11. Find the associated matrix for the linear map

T : R3 −→ R4

such that
T(x, y, z) = (x − y + 2z, y + z, 3x − 2y − z, 7y + z)
and find a basis for ker(T).

Solution: We have

T( (1, 0, 0)T ) = (1, 0, 3, 0)T ,   T( (0, 1, 0)T ) = (−1, 1, −2, 7)T ,   T( (0, 0, 1)T ) = (2, 1, −1, 1)T .   (4.6)

The associated matrix is:


 1 -1 2
 

 0 1 1 
M(T) := 
 
 3 -2 -1


0 7 1
 

and its reduced row-echelon form is:


 1 0 0
 

 0 1 0 
H(T) := 
 
 0 0 1


0 0 0
 

The system
H(T)x = 0
has as solution only x = 0 therefore ker(T) = {0}. 

Exercises:

4.21. Let B1 = {1, x, x2 , x3 } be a basis for P3 . Show that

B2 = {2x − 1, x2 − x + 1, x3 − x, −2}

is also a basis. Find the transformation matrix from B1 to B2 .

4.22. Let V := Span (ex , e−x ). Find the coordinates of

f (x) = sinh x,   g(x) = cosh x

with respect to B = {ex , e−x }.

4.23. Let V := Span (ex , xex ). Find the transformation matrix from B1 to B2 , where

B1 := {ex , xex }   and   B2 = {2xex , 4ex }.

4.24. Let B1 := {i, j} be the standard basis of R2 and u, v be the vectors obtained by rotating counterclockwise, by the angle θ, the vectors i, j respectively. Clearly B2 := {u, v} is a basis for R2 . Find M^{B2}_{B1} .

4.5 Linear transformation in geometry


In this section we look at some linear transformations arising in geometry. We will focus on the plane, but similar maps can also be studied in space.

4.5.1 Scalings: scalar matrices


A scaling is a linear transformation which multiplies every vector by a fixed scalar r > 0. In other words,

T( (x, y)T ) = (rx, ry)T ,

for some scalar r > 0. The corresponding matrix is A = [ r  0 ; 0  r ] = r I. When r > 1 it is called a dilation and when r < 1
a contraction.

4.5.2 Rotations
We already have seen what happens to a rotation with an angle θ counterclockwise around the origin. It is given by

cos θ − sin θ x
" # " #" #
x
→ ,
y sin θ cos θ y
"
#
a −b
which equivalently says that it is a matrix with a2 + b2 = 1.
b a
A rotation combined with a scaling has a matrix

cos θ − sin θ x
" # " #" #
x
→r
y sin θ cos θ y

Hence we have:
"#
a −b
Lemma 4.4. A matrix of the form represents a rotation by θ combined with a scaling r > 0, where r and θ are the
b a
" #
a
polar coordinates of the vector .
b

4.5.3 Shears
" # " #
1 r 1 0
A horizontal shear is given by the matrix and a vertical shear by the matrix .
0 1 k 1

4.5.4 Projections
Let us consider now a problem that we have already seen in Fig. 4.1, finding a projection of a vector v over a vector
u. We already know the formula for proju (v). Is this a linear map? Can we find its matrix if that’s the case?


Consider vectors u and v in R2 as in Fig. 4.1. The projection vector of v on u, denoted by proju (v), is the vector obtained by dropping a perpendicular from the vertex of v onto the line determined by u. We found its formula in Eq. (2.2):

proju (v) = (u · v / ||u||) · (u / ||u||) = (u · v / ||u||²) u.

[Figure 4.1: The projection of v onto u]

Let us express this in terms of the coordinates of x = (x1 , x2 )T when the unit vector u = (u1 , u2 )T is given. We have

proju (x) = (x · u) u = (x1 u1 + x2 u2 ) (u1 , u2 )T = ( u1² x1 + u1 u2 x2 , u1 u2 x1 + u2² x2 )T = [ u1²  u1 u2 ; u1 u2  u2² ] x.

Hence, the projection is a linear map, since it is given by multiplication by a matrix.
Consider now a line L going through the origin with equation y = ax. A directional vector for L is w = (1, a)T , which we can normalize as

u = w / ||w|| = (1 / √(a² + 1)) (1, a)T .   (4.7)

Then we have the linear transformation

T( (x, y)T ) = (1 / (a² + 1)) [ 1  a ; a  a² ] (x, y)T .

In the above discussion it was not necessary to assume that the vector u be a unit. The student should prove the
following lemma.
" #
w
Lemma 4.5. For any given vector w = 1 the projection map
w2

x → projw (x)

is a linear map with matrix


" 2 #
1 w1 w1 w2
P= 2 .
w1 + w2 1 w2
2 w w22
" #
1
Example 4.12. Find the matrix P of the projection map onto the line L generated by w = .
2

Proof. From the Lemma above we have


" 2 # " #
1 w1 w1 w2 1 1 2
P= =
w21 + w22 w1 w2 w22 5 2 4
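A quick numerical check of this matrix is easy to set up; the sketch below (assuming numpy) also confirms the two geometric properties one expects of a projection.

```python
# Projection matrix onto Span(w) for w = (1, 2), as in Example 4.12.
import numpy as np

w = np.array([1.0, 2.0])
P = np.outer(w, w) / (w @ w)             # (1/5) * [[1, 2], [2, 4]]

print(P)
print(np.allclose(P @ P, P))             # True: projecting twice changes nothing
print(P @ w, P @ np.array([-2.0, 1.0]))  # w is fixed; the perpendicular vector maps to 0
```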

Example 4.13. Given a line L with equation


L : y = ax + b.
Find the formula for the projection of a point P(x, y) onto this line. Is this map linear?


4.5.5 Reflections
We continue our discussion of the previous section but now with the goal of finding the symmetric point of B with
respect to the line AC. First we consider the case when the point A is the point (0, 0) in R2 . So the problem is the
same as before but now we want to find the vector refu v as in Fig. 4.2.
Consider vectors u and v in R2 as in Fig. 4.2. The reflection vector of v with respect to u, denoted by refu v, is the vector obtained by reflecting v with respect to the line determined by u. Hence, with w = v − proju (v),

refu v = proju (v) − w = proju (v) − (v − proju (v)) = 2 proju (v) − v = 2Pv − v = (2P − I2 ) v.

[Figure 4.2: Reflection of v with respect to u]

Let us express this in terms of the coordinates of x = (x1 , x2 )T when the unit vector u = (u1 , u2 )T is given. The matrix of the reflection is

S = 2P − I2 = [ 2u1² − 1   2u1 u2 ;  2u1 u2   2u2² − 1 ].

Consider now, as in the case of projections, the line L with equation y = ax. Then the unit vector u is given by Eq. (4.7). Thus the matrix S becomes

S = 2P − I2 = (1 / (a² + 1)) [ 1 − a²   2a ;  2a   a² − 1 ].

Lemma 4.6. The reflection with respect to a line L going through the origin with equation

L : y = ax

is a linear map given by the matrix " #


1 1 − a2 2a
S= 2
a + 1 2a a2 − 1
Can we generalize this solution to a general line? Let L be a line in R2 with equation

y = ax + b. (4.8)

Consider the map T : R2 → R2 such that it takes every point P(x, y) to its reflection P0 . Determine explicit formulas
for this map and check whether it is linear. The following is a high school problem in analytic geometry.
Lemma 4.7. The reflection map refL x with respect to a general line L : y = ax + b is given by the formula

(x, y)T → (1 / (a² + 1)) ( (1 − a²)x + 2ay − 2ab ,  2ax + (a² − 1)y + 2b )T .

Proof. Let P(x1 , y1 ) be a given point. The line L′ going through P and perpendicular to L has equation

y = −(1/a) x + ( y1 + x1 /a ).   (4.9)

The point of intersection has x-coordinate given by

(a + 1/a) x = −b + y1 + x1 /a.

So we have

x = (x1 + a y1 − ab) / (a² + 1).

If we denote by Q(x2 , y2 ) the reflection point, then x = (x1 + x2 )/2. Therefore,

x2 = 2x − x1 = 2 (x1 + a y1 − ab)/(a² + 1) − x1 = ( (1 − a²)x1 + 2a y1 − 2ab ) / (a² + 1).

Substituting x2 in Eq. (4.9) we get

y2 = ( 2a x1 + (a² − 1) y1 + 2b ) / (a² + 1).
This completes the result. 
Notice that the above map is not linear. There is a way to extend this map to a map T0 : R3 → R3 such that T0 is
linear, but we will consider that later.
Lemma 4.8. Let P be a plane in R3 going through the origin. Then P has equation

ax + by + cz = 0

Find the formulas for the reflection map refP x with respect to the plane P. Show that this is a linear map. Find its matrix.
Proof. Exercise 
Now that we know that the reflection with respect to a plane going through the origin is a linear map, maybe
we take another look at the case of the general line on the plane. Can we somehow consider equation Eq. (4.8) as a
plane in R3 ? Or can we do even better, have some kind of a space that all lines pass through the center?

Think about it!

Exercises:

4.25. Let L be a line in R3 such that it contains the unit vector u = (u1 , u2 , u3 )T . Find the matrix of the linear transformation T(x) = projL (x). What is the trace of this matrix?

4.26. Let L be a line in R3 such that it contains the unit vector u = (u1 , u2 , u3 )T . Find the matrix of the linear transformation T(x) = refL x.

4.6 Review exercises


4.27. Find the associated matrix for the linear map T : R3 −→ R4 such that

T(x, y, z) = (x − y + 2z, y + z, 3x − 2y − z, 7y + z)

and find a basis for ker(T).

4.28. Find the standard matrix representation of the rotation of the xy-plane counterclockwise about the origin with an angle:
i) 45◦
ii) 60◦
iii) 15◦

4.29. Check whether the map T : R3 −→ R3 such that

T(x, y, z) = (x − 2y, y − x, x + y)

is linear. If it is linear then find its associated matrix.

4.30. Let T : R2 → R2 be the rotation counterclockwise by the angle θ = π/3. Find

T(1, 0), T(1, 1), T(−1, 1).

4.31. Find the rank and nullity, and bases for the column space, row space, and the nullspace of the matrix

[  1  2  3  1 ]
[ -2  1  1  2 ]
[ -1  3  4  3 ]
[ -1  3  4  3 ]
 


4.32. Find the associated matrix for the linear map T : R4 −→ R4 such that

T(x, y, z, w) = (x − y + z, 2x − 2y + 2z, x + y − z − w, 2x − w)

and find a basis for ker(T).

80
Chapter 5

Determinants, eigenvalues, eigenvectors

The theory of determinants was developed in the 17-th and 18-th centuries. It started mainly with Cramer and
continued further with Bezout, Vandermonde, Laplace, Cauchy, et al. With the development of modern algebra
and new concepts that came with it as multilinear forms, permutation groups, etc, the concept of the determinant
was put in a firm foundation.

5.1 Determinants
In this section we define the determinant of matrix. The proper way to do that would be via alternating forms and
permutations but that might be a bit ambitious for this course. Instead we proceed with the more computational
approach. For the interested reader we provide a complete treatment of determinants in the appendix.

Definition 5.1. Let A = [ai j ] be an n × n matrix. For each (i, j) let Ai j be the (n − 1) × (n − 1) matrix obtained by deleting its
i-th row and j-column. Then, Ai j is called a minor of A, and

āi j = (−1)i+ j det(Ai j )

is called a cofactor of A.

Definition 5.2. Let A = [ai j ] be an n × n matrix. Then for a fixed i = 1, . . . n the determinant of A is defined to be:

n
X n
X
det(A) := (−1)i+j · ai,j · det(Ai j ) = ai, j · āi,j
j=1 j=1

and is independent on the choice of i.

The determinant of a matrix A

 a1,1 a1,2 a1,3 ... a1,n


 

 a a2,2 a2,3 ... a2,n 
 2,1 
 a3,1 a3,2 a3,3 ... a3,n
 

A :=  · ·
 



 · · 


 · · 

am,1 am,2 am,3 ... am,n


is denoted by
a1,1 a1,2 a1,3 ... a1,n
a2,1 a2,2 a2,3 ... a2,n
a3,1 a3,2 a3,3 ... a3,n
det(A) = · ·
· ·
· ·
am,1 am,2 am,3 ... am,n
Example 5.1. Let A be a 2 × 2 matrix
" #
a b
A=
c d
Then its determinant is
a b
det(A) = = ad − bc.
c d


Example 5.2. Let A be a 3 × 3 matrix


a1,1 a1,2 a1,3 
 
A = a2,1 a2,2 a2,3 

a3,1 a3,2 a3,3
 

Then its determinant is

a2,2 a2,3 a a2,3 a a2,2


det(A) =a1,1 − a1,2 2,1 + a1,3 2,1
a3,2 a3,3 a3,1 a3,3 a3,1 a3,2

= a1,1 a2,2 a3,3 + a1,2 a2,3 a3,1 + a2,1 a3,2 a1,3 − a3,1 a2,2 a1,3 − a3,2 a2,3 a1,1 − a2,1 a1,2 a3,3

In many textbooks of elementary linear algebra the following technique is given for remembering the determinant of a 3 by 3 matrix: the first two columns of A are copied to its right; the downward arrows represent products with coefficient 1 and the upward ones represent products with coefficient −1.

[Diagram: the first two columns repeated next to A, with downward and upward diagonal arrows]

Definition 5.3. The definition of the determinant as above is called the expansion by minors along the i-th row.

First we have to show what is already claimed in the definition: that the choice of the row does not change the
determinant. We skip the proof of the theorem. For a complete proof a more precise definition of the determinant
is needed, as given in the Appendix B.

Theorem 5.1. Expansion along any row or column does not change the determinant.

The above theorem allows us to pick the row or column with more zeroes when we compute the determinant
of a matrix.


Example 5.3. Compute the determinant of the matrix

1 2 0 4 0
 
 

 0 2 0 0 1 

A =  2 1 2 1 2
 

1 1 2 4 5
 
 

0 2 1 2 0

Solution: Since the second row has three zeroes we expand along that row. So we have

1 0 4 0 1 2 0 4
2 2 1 2 2 1 2 1
det(A) = 2 · −1·
1 2 4 5 1 1 2 4
0 1 2 0 0 2 1 2

We let
 1 0 4 0  1 2 0 4
   
 
 2 2 1 2   2 1 2 1 
A1 :=   , A2 = 
   
 1 2 4 5  1 1 2 4

 
0 1 2 0 0 2 1 2
   

Then
2 1 2 2 2 2
det(A1 ) = 1 · 2 4 5 +4· 1 2 5
1 2 0 0 1 0 (5.1)
= (5 + 8 − 8 − 20) + 4 (2 − 2 · 5) = −15 − 32 = −47

1 2 1 2 2 1 2 1 2
det(A2 ) = 1 2 4 −2· 1 2 4 −4· 1 1 2
2 1 2 0 1 2 0 2 1

= (4 + 16 + 1 − 4 − 4 − 4) − 2 (8 + 1 − 4 − 8) − 4 (2 + 4 − 8 − 1)
= 9 − 2 · (−3) − 4 · (−3) = 27
Hence,
det(A) = 2 · (−47) − 27 = −121
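The cofactor expansion above is easy to double-check by computer; here is a quick sketch assuming sympy (exact integer arithmetic).

```python
# Checking Example 5.3.
from sympy import Matrix

A = Matrix([[1, 2, 0, 4, 0],
            [0, 2, 0, 0, 1],
            [2, 1, 2, 1, 2],
            [1, 1, 2, 4, 5],
            [0, 2, 1, 2, 0]])
print(A.det())        # -121, agreeing with the expansion along the second row
```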

Lemma 5.1. det(A) = det(AT )

Proof. Let A = [aij ] be given. We prove the Lemma by induction. For n = 1 the proof is trivial. Assume that the
lemma holds for n < r. We want to show that it holds for n = r. The determinant of A is

det(A) = a11 |A11 | − a12 |A12 | + · · · + (−1)r+1 a1r |A1r |

Denote by B := At . Then
det(B) = b11 |B11 | − b21 |B21 | + · · · + (−1)r+1 br1 |Br1 |.
However, a1j = bj1 and Bj1 = (A1j )t . By the induction hypothesis we have |A1j | = |Bj1 |. Hence det(A) = det(B) = det(At ).

Remark 5.1. The determinant of a triangular matrix is the product of its diagonal entries.
We illustrate with an upper triangular matrix.


Example 5.4. Let A be a triangular matrix

 a1,1 a1,2 a1,3 ... a1,n


 

 0 a2,2 a2,3 ... a2,n 
 
 0 0 a3,3 ... a3,n
 

A :=  · ·
 



 · · 


 · · 

0 0 0 ... am,n

Solution: We find the determinant by expanding along the first column. It is obvious that

n
Y
det(A) = ai,i .
i=1


We now see some properties of determinants.
Lemma 5.2. Let A be an n × n matrix. The row operations have the following effect on the determinant:

i) If Ri ←→ R j is performed on a matrix A then the determinant of the resulting matrix A0 is

det(A0 ) = − det(A)

ii) If two rows of A are the same then


det(A) = 0
iii) If Ri → rRi then the determinant of the resulting matrix A0 is

det(A0 ) = r · det(A)

iv) The operation R j → rRi + R j does not change the determinant.

Proof. i) We proceed by induction. The proof for n = 2 is trivial. Assume that the property holds for all matrices of
size smaller then n. Let B denote the matrix obtained after performing the operation Ri ←→ R j on A. Compute the
determinant by expansion along the s-th row, where s , i and s , j. Then

det(A) = as1 |As1 | − as2 |As2 | + · · · + (−1)s+n asn |Asn |.

For each 1 ≤ r ≤ n we have


(−1)s+r |Asr | = −(−1)s+r |Brs |.
Thus, by induction hypothesis, |Brs | = −|Asr |. Hence, det(B) = − det(A).
ii) This is an immediate consequence of part i).
iii) Immediate consequence of the definition.
iv) Let B denote the matrix obtained after performing the operation R j → rRi + R j on A. Then,

det(B) = b_{j1} |B_{j1}| + · · · + (−1)^{j+n} b_{jn} |B_{jn}|
       = (r a_{i1} + a_{j1}) |B_{j1}| + · · · + (−1)^{j+n} (r a_{in} + a_{jn}) |B_{jn}|
       = r ( a_{i1} |B_{j1}| + · · · + (−1)^{j+n} a_{in} |B_{jn}| ) + ( a_{j1} |B_{j1}| + · · · + (−1)^{j+n} a_{jn} |B_{jn}| )
       = r det(C) + det(A),

where we used that B_{jk} = A_{jk} (B and A differ only in the j-th row, which is deleted in these minors), and where C is the
matrix obtained from A by replacing its j-th row with a copy of the i-th row. Since C has two equal rows, det(C) = 0 by
part ii), and hence det(B) = det(A).



Theorem 5.2. A matrix A is invertible if and only if det(A) ≠ 0.

Proof. Let A be given and let H be a row-echelon form of A. Then

det(A) = r · det(H)

for some constant r ≠ 0. The matrix A is invertible if and only if H has a pivot in every row. Since H is triangular,
its determinant is the product of its diagonal entries. Hence, A is invertible if and only if det(H) ≠ 0. Therefore, A is
invertible if and only if det(A) ≠ 0.

Lemma 5.3. Let A, B ∈ Matn×n (k). If det(A) = 0 then det(AB) = 0.

Proof. Exercise.


Theorem 5.3. Let A, B ∈ Matn×n (k). Then


det(A B) = det(A) det(B).

Proof. First assume that A is diagonal. Then, to obtain the matrix AB, the i-th row of B is multiplied by a_{i,i}. Hence,

det(AB) = (a11 · · · ann) · det(B) = det(A) · det(B).

Now assume that A is invertible (otherwise the theorem follows from the above Lemma).
Then A can be converted to a diagonal matrix D using only row interchanges and row additions (no multiplying of a row by a constant).
Thus, D = EA, where E is a product of elementary matrices corresponding to these operations. Hence,
det(A) = (−1)^r · det(D), where r is the number of interchanges used. Then,
E(AB) = (EA)B = DB.
Hence, we have
det(AB) = (−1)r · det(DB) = (−1)r · det(D) · det(B) = det(A) · det(B).
This completes the proof. 

Example 5.5. Find the determinant of the matrix AB when

 1 0 0 0  3 0 0 0
   
 
 2 2 0 0   2 1 0 0 
A :=   , B = 
   
 9 2 4 0  21 -7 2 0

 
12 10 2 5 13 2 31 2
   

Solution: Since both are triangular matrices and det(AB) = det(A) · det(B) we have

det(AB) = (1 · 2 · 4 · 5) · (3 · 1 · 2 · 2) = 480

5.1.1 Computation of determinants


Computing the determinant as described above is a lengthy process. However we can use the row operations to
compute the determinant faster.


Algorithm 6. Input: A square matrix A


Output: The determinant of A

1) Reduce A to row-echelon form using only row addition and row interchanges.
2) If during the procedure one of the rows becomes all zeroes then

det(A) = 0,

otherwise

det(A) = (−1)^r · p1 p2 · · · pn,

where the pi are the pivots and r is the number of row interchanges performed.
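A short sketch of Algorithm 6 in Python, using exact rational arithmetic; the function name and structure are our own illustration, not part of the text.

from fractions import Fraction

def det_by_row_reduction(rows):
    """Determinant via row reduction, using only row additions and interchanges."""
    A = [[Fraction(x) for x in row] for row in rows]
    n = len(A)
    sign = 1
    for j in range(n):
        # find a pivot in column j at or below row j
        pivot = next((i for i in range(j, n) if A[i][j] != 0), None)
        if pivot is None:
            return Fraction(0)              # no pivot: the matrix is singular
        if pivot != j:
            A[j], A[pivot] = A[pivot], A[j]
            sign = -sign                     # a row interchange flips the sign
        for i in range(j + 1, n):
            factor = A[i][j] / A[j][j]
            A[i] = [a - factor * b for a, b in zip(A[i], A[j])]   # row addition: no effect on det
    prod = Fraction(1)
    for j in range(n):
        prod *= A[j][j]                      # product of the pivots
    return sign * prod

print(det_by_row_reduction([[1,2,0,4,0],[0,2,0,0,1],[2,1,2,1,2],[1,1,2,4,5],[0,2,1,2,0]]))   # -121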

5.1.2 Generalized concept of Determinant


Let us try to give a more general definition of the determinant.
Definition 5.4. A map
φ : V1 × · · · × Vn → W
is called multi-linear if for all i = 1, . . . , n and r ∈ k the following hold:

i) φ(v1, . . . , v_{i−1}, v + u, v_{i+1}, . . . , vn) = φ(v1, . . . , v_{i−1}, v, v_{i+1}, . . . , vn) + φ(v1, . . . , v_{i−1}, u, v_{i+1}, . . . , vn)

ii) φ(v1, . . . , v_{i−1}, r · u, v_{i+1}, . . . , vn) = r · φ(v1, . . . , v_{i−1}, u, v_{i+1}, . . . , vn)

Definition 5.5. A multi-linear map


φ : V ×···×V → W
is called n-multi-linear function. If W = k the φ as above is called a multi-linear form.
Definition 5.6. Let φ : V × · · · × V → W be an n-multi-linear function. Then φ is called alternating if

φ(v1 , . . . , vi , vi+1 , . . . , vn ) = 0

whenever vi = vi+1 and is called symmetric if interchanging any two coordinates does not change the value of the function.
Exercise: Show that a 2-multi-linear map is a bilinear map.
Proposition 5.1. Let φ be an n-multi-linear alternating function on V. Then,
1) the value of φ on an n-tuple is negated if two adjacent components are interchanged.
2) for each σ ∈ Sn,
φ(v_{σ(1)}, . . . , v_{σ(n)}) = ε(σ) φ(v1, . . . , vn)
3) if vi = v j for any i , j then φ(v1 , . . . , vn ) = 0.
4) if vi is replaced by vi + αv j , in (v1 , . . . , vn ) for any i , j and α ∈ k, then the value of φ on this tuple is not changed.
Proposition 5.2. Assume that φ is an n-multi-linear alternating function on V and that for some v1 , . . . vn ∈ V, w1 , . . . wn ∈ V
we have
w1 = a_{11} v1 + · · · + a_{n1} vn
. . .
wn = a_{1n} v1 + · · · + a_{nn} vn

Then,
φ(w1, . . . , wn) = ∑_{σ∈Sn} ε(σ) a_{σ(1)1} · · · a_{σ(n)n} φ(v1, . . . , vn).


Definition 5.7. An n × n determinant function on k is any function

det : Matn×n(k) → k

that satisfies:
1) det is an n-multi-linear alternating form in the n columns (A1, . . . , An) of the matrix A, each column viewed as a vector in k^n;
2) det(I) = 1.
Theorem 5.4. There is a unique n × n determinant function on k and it can be computed for any n × n matrix A = [aij ] by
det(A) = ∑_{σ∈Sn} ε(σ) a_{σ(1)1} · · · a_{σ(n)n}.

Corollary 5.1. The determinant is an n-multi-linear function on the rows of Matn×n (k) and for any matrix A, det(A) = det(AT ).
Theorem 5.5. (Cramer’s rule) If A1, . . . , An are the columns of a matrix A and

b = b1 A1 + · · · + bn An

for bi ∈ k, then

bi · det(A) = det(A1, . . . , A_{i−1}, b, A_{i+1}, . . . , An).


Theorem 5.6. Let A, B ∈ Matn×n (k). Then
det(A B) = det(A) det(B).
Definition 5.8. Let A = [ai j ] be an n × n matrix. For each (i, j) let Ai j be the (n − 1) × (n − 1) matrix obtained by deleting its
i-th row and j-column. Then, Ai j is called a minor of A, and Ci j = (−1)i+j det(Ai j ) is called a cofactor of A.
Theorem 5.7. Let A = [ai j ] be an n × n matrix. Then for each i = 1, . . . , n, the determinant of A can be computed as:
det(A) = ∑_{j=1}^{n} (−1)^{i+j} a_{ij} det(A_{ij}).

The theorem assures that the expansion by any row or column for a matrix A gives the same determinant. Thus,
our definition of the determinant in Chapter 3 is justified.

Exercises:

5.1. Let A be an (n × n) invertible matrix. Show that
det(A^{-1}) = 1 / det(A).

5.2. Find the determinants of
A = [ 1 1 1 ; 1 1 1 ; 2 0 1 ],   B = [ 2 1 3 ; 2 -1 0 ; 4 0 3 ].

5.3. Find the determinants of
A = [ 1 0 1 ; 0 1 0 ; 2 0 1 ],   B = [ 2 1 3 ; 2 -1 0 ; -1 0 5 ].

5.4. Find the determinants of
A = [ 5 -1 0 2 ; 1 2 1 0 ; 3 1 -2 4 ; 0 4 -1 2 ],   B = [ 5 2 0 2 ; 3 2 1 0 ; 3 1 -2 4 ; 2 4 -1 2 ]
and use the result to find det(A^{-1}) and det(B^{-1}).

5.5. Let A be a matrix such that det(A) ≠ 0. Does the system Ax = b have any solutions?

5.6. Let A be given as
A = [ a b ; c d ].
What is the condition on a, b, c, d such that A has an inverse? Find the inverse.

(Matrices are written with rows separated by semicolons.)


5.7. Let C be an invertible matrix. Prove that det(A) = det(C^{-1} A C).

5.8. The determinant of an n × n matrix A is det(A) = 3. Find det(2A), det(−A), and det(A^3).

5.9. Let A be an n × n matrix. If every row of A adds to 0, prove that det(A) = 0.

5.10. Let A be an n × n matrix. If every row of A adds to 1, prove that det(A − I) = 0. Does this imply that det(A) = 0?

5.2 Eigenvalues, eigenvectors, and eigenspaces


The reader has probably noticed that it is much easier to deal with diagonal matrices. For example, if A is diagonal
then det(A) is easier to find, a linear system Ax = b is simple to solve, and An is easy to calculate. How can a matrix
be "transformed" to a diagonal matrix? In this section we will study some of the most important concepts of linear
algebra such as eigenvalues, eigenvectors, and eigenspaces. Their importance will be obvious in the next section.

Definition 5.9. Let A be an n × n matrix. A scalar λ is called an eigenvalue of A if there exists a nonzero vector v such
that
Av = λv.
The vector v is called an eigenvector corresponding to λ.

Proposition 5.3. The following are equivalent:


1) λ is an eigenvalue of A
2) det(λI − A) = 0

Proof. In order to compute such eigenvalues and eigenvectors we notice that

Av = λv =⇒ (A − λI)v = 0

Hence, an eigenvalue is a scalar λ for which the system

(A − λI)x = 0

has a non trivial solution. We know that this system has a nontrivial solution if and only if the determinant of the
coefficient matrix is zero. Thus, we want to find λ such that

det(A − λI) = 0.

Let A = [ai,j ] be a given matrix. Then the above equation can be written as

det(A − λI) =
| a1,1−λ   a1,2     a1,3    . . .  a1,n   |
| a2,1     a2,2−λ   a2,3    . . .  a2,n   |
| a3,1     a3,2     a3,3−λ  . . .  a3,n   |
|   .                                .    |
| an,1     an,2     an,3    . . .  an,n−λ |

This completes the proof. 


Computing this determinant we get a polynomial in λ of degree at most n. This is called the characteristic
polynomial of A, which we denote by char (A, λ). Finding the eigenvalues of A is equivalent to finding the roots of
the polynomial char (A, λ).

Corollary 5.2. λ is an eigenvalue if and only if it is a root of the characteristic polynomial.

Remark 5.2. Recall from algebra that a polynomial of degree n can have at most n roots. Hence an n × n matrix can have at
most n eigenvalues. See Appendix B for more details on polynomials.


The multiplicity of an eigenvalue as a root of the characteristic polynomial is called the algebraic multiplicity
of the eigenvalue. For a fixed eigenvalue λ the corresponding eigenvectors are given by the solutions of the system
(A − λI)x = 0
Equivalently we have called such a space the nullspace of the coefficient matrix (A − λI).
Definition 5.10. If λ is an eigenvalue of A, the set
Eλ := { v ∈ V | A v = λ v }
is called the eigenspace of A corresponding to λ. The dimension of the eigenspace is called the geometric multiplicity of the
eigenvalue λ.

Remark 5.3. It can be shown that the geometric multiplicity is always ≤ to the algebraic multiplicity.
Finding the eigenvalues requires solving a polynomial equation which can be difficult for high degree polyno-
mials. Once the eigenvalues are found then we use the linear system
(A − λI)x = 0
to find the eigenvectors. We illustrate below.
Example 5.6. Find the characteristic polynomial and the eigenvalues of the matrix
" #
1 2
A= .
5 4

Solution: The characteristic polynomial is

char(A, λ) = det(A − λI) = | 1−λ    2  |
                           |  5    4−λ |
           = (1 − λ)(4 − λ) − 5 · 2 = λ^2 − 5λ − 6 = (λ + 1)(λ − 6)
The eigenvalues are λ1 = −1 and λ2 = 6. Both of them have algebraic multiplicity 1.
If λ1 = −1 the system becomes:
" #
2 2
x=0
5 5
and its solution is " #
-1
v1 =
1
Its eigenspace is
Eλ1 = hv1 i.

It has dimension 1 and therefore the geometric multiplicity of λ1 = −1 is also 1.

For λ2 = 6 the system becomes:


" #
-5 2
x=0
5 -2
and its solution is " #
1
v2 = 5
2
Its eigenspace is
Eλ2 = hv2 i
This eigenspace also has dimension 1 and therefore the geometric multiplicity of λ2 = 6 is also 1. 
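The computation in Example 5.6 can be checked with sympy; the eigenvects method returns, for each eigenvalue, its algebraic multiplicity together with a basis of the eigenspace. This snippet is only an illustrative check, not part of the text.

from sympy import Matrix, symbols

x = symbols('x')
A = Matrix([[1, 2], [5, 4]])
print(A.charpoly(x).as_expr())     # x**2 - 5*x - 6
for eigenvalue, alg_mult, basis in A.eigenvects():
    # the geometric multiplicity is the number of basis vectors of the eigenspace
    print(eigenvalue, alg_mult, len(basis), [list(v) for v in basis])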


Example 5.7. Find the eigenvalues and their multiplicities for the matrix

 1 0 2 1 
 
 2 1 0 -1 
A := 
 
 0 0 2 0 

0 0 1 -2
 

Solution: The characteristic polynomial is

char (A, x) = (x − 1)2 (x − 2) (x + 2)

Hence there are three eigenvalues, namely λ1 = 1, λ2 = −2, λ3 = 2. The eigenvalue λ1 = 1 has algebraic multiplicity 2 and the
others have algebraic multiplicity 1.
To find the geometric multiplicities for λ1 , λ2 , λ3 we have to find their corresponding eigenvectors. By solving the
corresponding systems we have

 0   1   9 
     
 1   - 5   17 
v1 =  , v2 =  3  , v3 = 
     
 0   0   4 

0 1
     
-3
Thus the geometric multiplicities for λ1 , λ2 , λ3 are respectively 1, 1, 1.

Next we will see an example when the algebraic and geometric multiplicities are the same for each eigenvalue.
Example 5.8. Find the eigenvalues and their multiplicities for the matrix

 1 0 0 1 
 
 0 1 0 2 
A := 
 
 1 -1 2 3 

0 0 0 -2
 

Solution: The characteristic polynomial is

char (A, x) = (x − 1)2 (x − 2) (x + 2)

Hence there are three eigenvalues, namely λ1 = 1, λ2 = −2, λ3 = 2. The eigenvalue λ1 = 1 has algebraic multiplicity 2 and the
others have algebraic multiplicity 1.
To find the geometric multiplicities for λ1 , λ2 , λ3 we have to find their corresponding eigenvectors. By solving the
corresponding systems we have:
For λ = 1 the eigenvectors are
 1   -1 
   
 1   0 
u1 =   , u2 = 
   
 0   1 

0 0
   

Hence the geometric multiplicity of λ1 = 1 is 2.


For λ2 and λ3 the eigenvectors are respectively v2 and v3 as below:

 1  0
   
 
 2   0 
v2 =  5  , v3 = 
   
1

 2   
0
   
-3

Hence, the geometric multiplicity for λ2 and λ3 is 1.
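The difference between Example 5.7 and Example 5.8 is easy to see computationally: for each eigenvalue, the length of the eigenvector list returned by sympy's eigenvects is the geometric multiplicity. The following is an illustrative sketch only.

from sympy import Matrix

A57 = Matrix([[1, 0, 2, 1], [2, 1, 0, -1], [0, 0, 2, 0], [0, 0, 1, -2]])   # Example 5.7
A58 = Matrix([[1, 0, 0, 1], [0, 1, 0, 2], [1, -1, 2, 3], [0, 0, 0, -2]])   # Example 5.8

for name, M in [("5.7", A57), ("5.8", A58)]:
    for eigenvalue, alg_mult, basis in M.eigenvects():
        print(name, eigenvalue, "algebraic:", alg_mult, "geometric:", len(basis))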





Remark 5.4. We will see in the next chapter that the above two examples illustrate two classes of matrices. We will learn how
to deal with each of these classes separately.

Exercises:

5.11. Find the eigenvalues and their algebraic and geometric multiplicities for each of the matrices
A = [ 5 -1 0 2 ; 1 2 1 0 ; 3 1 -2 4 ; 0 4 -1 2 ],   B = [ 5 2 0 2 ; 3 2 1 0 ; 3 1 -2 4 ; 2 4 -1 2 ].

5.12. Let A be a diagonal n × n matrix given by
A = [ 1 0 0 0 ; 0 2 0 0 ; 0 0 3 0 ; 0 0 0 4 ].
What are its eigenvalues and their multiplicities?

5.13. Compute the eigenvalues and their multiplicities of the matrix A^3, where A is as in the previous example.

5.14. Let A be a diagonal n × n matrix such that det(A) ≠ 0. Assume that all entries in the diagonal are distinct. How many distinct eigenvalues has A and what are their multiplicities?

5.15. Let A be a 2 by 2 matrix with trace T and determinant D. Find a formula that gives the eigenvalues of A in terms of T and D.

5.16. Let A and B be given as below:
A = [ 5 -1 0 2 ; 1 2 1 0 ; 3 1 -2 4 ; 0 4 -1 2 ],   B = [ 5 2 0 2 ; 3 2 1 0 ; 3 1 -2 4 ; 2 4 -1 2 ].
Find their eigenvalues. In each case compute the sum and product of the eigenvalues and compare them with the trace and determinant of the matrix.

5.17. Prove that a square matrix is invertible if and only if no eigenvalue is zero.

5.18. Let A be a 3 by 3 matrix. Can you find a formula which determines the eigenvalues of A if you know the trace and determinant of A?

5.19. Find the characteristic polynomial, eigenvalues, and eigenvectors of the matrix
A = [ -1 -1 0 ; 1 1 1 ; 3 1 -2 ].

5.3 Similar matrices, diagonalizing matrices


In this section we will study the concept of similarity of matrices. We will determine necessary and sufficient
conditions for a matrix to be similar to a diagonal matrix. When this is possible we will provide an algorithm for
determining this diagonal matrix.
Definition 5.11. Two matrices A and B are called similar if there exists a matrix C such that

A = C−1 B C.

Two similar matrices A and B are denoted by A ∼ B.


Lemma 5.4. The similarity relation is an equivalence relation.
Proof. Exercise. 
The following theorem is the main result of this section. We will skip its proof.
Theorem 5.8. Let A be an n × n matrix and
λ1 , . . . , λi , . . . , λs
all distinct eigenvalues of A. If for each λi the geometric multiplicity equals the algebraic multiplicity, say

alg. mult.(λi ) = geom. mult. (λi ) = ei


then
A = CDC−1
where D is the diagonal matrix in which each eigenvalue λi appears ei times along the diagonal,

D = diag( λ1, . . . , λ1, λ2, . . . , λ2, . . . , λs, . . . , λs )

(λi repeated ei times), and

C = [ v_{1,1}, . . . , v_{1,e1}, v_{2,1}, . . . , v_{2,e2}, . . . , v_{s,1}, . . . , v_{s,es} ]

where vi,1 , . . . , vi,ei is a basis for the eigenspace Eλi .

We call the matrix C in the above theorem the transitional matrix of A associated with D. We illustrate the above
theorem with the following two examples.

Example 5.9. Let


 2 1 0 2
 

 -1 0 -1 0 
A = 
 
 2 1 0 1


1 0 -1 1
 

Solution: Its characteristic polynomial is

char (A, λ) = (λ2 − 2λ + 2)(λ2 − λ − 1)

The eigenvalues are

1 ± i   and   1/2 ± √5/2,
and their algebraic multiplicity is 1. We now find the geometric multiplicity for each one of the eigenvalues.
λ = 1 + i: we solve the system (A − (1 + i) I) v = 0. The solution is

v1 = ( 1, −1 + i, 1, 0 )^T
 

and the corresponding eigenspace has dimension 1.


Similarly, if λ = 1 − i then the eigenvector is

v2 = ( 1, −1 − i, 1, 0 )^T.
 


If λ3 = 1/2 + √5/2 and λ4 = 1/2 − √5/2, then the corresponding eigenvectors are

v3 = ( −13/2 + (5/2)√5,  1,  6 − 3√5,  15/2 − (7/2)√5 )^T,
v4 = ( −13/2 − (5/2)√5,  1,  6 + 3√5,  15/2 + (7/2)√5 )^T.
Hence, since the algebraic multiplicity of each eigenvalue equals its geometric multiplicity, A is similar to

D = diag( 1+i,  1−i,  1/2+√5/2,  1/2−√5/2 ).

The transitional matrix in this case is C = [v1 , v2 , v3 , v4 ]. 


Lemma 5.5. Similar matrices have the same eigenvalues.
Proof. Let A ∼ B, say
B = C^{-1} A C
for some invertible matrix C. Then,
char(A, λ) = det(A − λI) = det(C^{-1}) · det(A − λI) · det(C)
           = det( C^{-1} (A − λI) C ) = det( C^{-1} A C − λ C^{-1} I C )
           = det( C^{-1} A C − λI ) = det(B − λI) = char(B, λ).
Thus, the characteristic polynomial is the same. Hence, A and B have the same eigenvalues. 
Lemma 5.6. Let A be a n × n matrix and
λ1 , λ2 , . . . , λn
its eigenvalues (not necessarily distinct) such that the algebraic and geometric multiplicity are the same. Then,
tr(A) = λ1 + λ2 + · · · + λn .
Proof. Exercise 

5.3.1 Diagonalizing matrices


We want to consider the following: Given a matrix A, find a diagonal matrix D such that A is similar to D. Fur-
ther, find the matrix C which conjugates A and D. The theorem above provides an algorithm for how this can be done.

Algorithm 7. Input: An n × n matrix A.


Output: Matrices C and D such that
A = C D C−1
if A is diagonalizable, otherwise display ’A is not diagonalizable’.

i) Compute the eigenvalues of A and their algebraic multiplicities.


ii) For each eigenvalue λi, compute the geometric multiplicity of λi and the corresponding eigenvectors
v_{i,1}, . . . , v_{i,s}.
iii) Create the matrices D and C as in the previous theorem.
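In practice, steps i)-iii) are exactly what sympy's diagonalize method carries out. The snippet below is a minimal sketch of Algorithm 7 (ours, not the book's code); diagonalize raises an error when the matrix is not diagonalizable.

from sympy import Matrix

A = Matrix([[1, 2], [5, 4]])        # the matrix of Example 5.6
C, D = A.diagonalize()              # returns C, D with A = C * D * C**(-1)
print(D)                            # diagonal matrix of eigenvalues
print(C * D * C.inv() == A)         # True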


Example 5.10. Let A be the 4 × 4 matrix given below

 9 0 0 0
 

 -2 1 -3 -4 
A := 
 
 -6 0 6 0


4 4 3 11
 

Find out if this matrix is diagonalizable and in that case find a diagonal matrix D similar to A and the transitional matrix C
associated to D.

Solution: The characteristic polynomial of A is

char (A, x) = (x − 3) (x − 6) (x − 9)2 .

Thus, the eigenvalues are


λ1 = 3, λ2 = 6, λ3 = 9
with algebraic multiplicities 1, 1, and 2 respectively. The corresponding eigenvectors of λ1 , λ2 , λ3 are respectively v1 , v2 , and
w1 , w2 as below
 0   0   2   1 
       
 -2   1   1   0 
v1 :=   , v2 :=   , w1 :=   , w2 := 
       
 0   -3   -4   -2 

1 1 0 1
       

Hence, the geometric multiplicities are respectively 1,1, and 2. Therefore the matrix A is diagonalizable and C and D are

 3 0 0 0  0 0 2 1
   
 
 0 6 0 0   -2 1 1 0 
D =   , C := 
   
 0 0 9 0  0 -3 -4 -2

 
0 0 0 9 1 1 0 1
   


Example 5.11. Let A be a 3 by 3 matrix as below

 2 1 0
 

A =  0 2 0  .
 
0 0 3
 

Check if A is similar to a diagonal matrix.

Solution: Then char(A, λ) = (λ − 2)^2 (λ − 3). For the eigenvalue λ = 2, the algebraic multiplicity is 2 and the eigenspace is
given by

E2 = { t (1, 0, 0)^T | t ∈ Q }.

The geometric multiplicity is 1, hence A is not similar to the diagonal matrix of eigenvalues. 
Lemma 5.7. Let A be similar to a diagonal matrix D such that A = C−1 D C. Then

An = (C−1 ) Dn C

Proof. Exercise. 
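Lemma 5.7 is what makes diagonalization useful for computing powers of a matrix. The following hedged sketch is our own illustration (note that sympy's diagonalize returns C and D with A = C D C^{-1}).

from sympy import Matrix

A = Matrix([[1, 2], [5, 4]])
C, D = A.diagonalize()              # A = C * D * C**(-1)
n = 6
An = C * D**n * C.inv()             # D**n is just the n-th powers of the diagonal entries
print(An == A**n)                   # True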

Exercises:


5.20. Let A be an n × n matrix with characteristic polynomial
char(A, λ) = an λ^n + a_{n−1} λ^{n−1} + · · · + a1 λ + a0.
Show that tr(A) = (−1)^{n−1} · a_{n−1}.

5.21. Diagonalize (if possible) the matrix:
A = [ 3 1 4 2 ; -1 0 -1 0 ; 2 1 0 1 ; 1 0 -1 1 ].

5.22. Let
A = [ 2 1 3 2 ; -1 0 -1 0 ; 5 1 0 1 ; 1 0 -1 1 ],   B = [ 3 1 4 2 ; -1 0 -1 0 ; 2 1 0 1 ; 1 0 -1 1 ].
Determine if A and B are similar.

5.23. Let
A = [ 2 1 3 2 ; -1 0 -1 0 ; 5 1 0 1 ; 1 0 -1 3 ],   B = [ -10 -2 2 3 ; 7 -5 1 11 ; -15 -2 5 4 ; -15 -4 5 3 ].
Determine if A and B are similar.

5.24. Let
A = [ 8 2 ; 2 5 ].
Find its eigenvalues and eigenvectors. Find their geometric and algebraic multiplicities. Find the matrices C, D such that A = C^{-1}DC.

5.25. Let
A = [ 1 2 4 ; 3 5 2 ; 2 6 9 ].
Find the eigenvalues of A. Compute A^{11}.

5.26. Let A be the 4 by 4 matrix
A := [ -2 -5 -2 -1 ; 3/2 7/2 3/2 0 ; 1/2 -1/2 -3/2 -1 ; -5/2 -7/2 -1/2 1 ].
Show that A = C^{-1}DC where
C := [ 1 2 1 1 ; 1 1 -1 0 ; -1 1 1 2 ; 1 1 0 -1 ],   D := [ -1 0 0 0 ; 0 -1 0 0 ; 0 0 1 0 ; 0 0 0 2 ].
Compute A^6.

5.27. Compute A^r for
A = [ -3 5 ; -2 4 ]
where r is a positive integer.

5.4 Cramer’s rule and adjoint matrices


Until now we have solved linear systems using the Gauss method. In this section we will see a different method
which gives us a formula for solving linear systems.
Let a linear system
A·x = b
be given where
A = [a_{i,j}] =
[ a1,1  a1,2  a1,3  . . .  a1,n ]          [ x1 ]          [ b1 ]
[ a2,1  a2,2  a2,3  . . .  a2,n ]          [ x2 ]          [ b2 ]
[ a3,1  a3,2  a3,3  . . .  a3,n ] ,   x =  [ x3 ] ,   b =  [ b3 ]
[  .                         .  ]          [  . ]          [  . ]
[ an,1  an,2  an,3  . . .  an,n ]          [ xn ]          [ bn ]
For each k = 1, . . . , n, we define the matrix Bk to be the matrix obtained by replacing the k-column of A by the
vector b as below:


Bk =
[ a1,1  a1,2  a1,3  . . .  b1  . . .  a1,n ]
[ a2,1  a2,2  a2,3  . . .  b2  . . .  a2,n ]
[ a3,1  a3,2  a3,3  . . .  b3  . . .  a3,n ]
[  .                        .           .  ]
[ an,1  an,2  an,3  . . .  bn  . . .  an,n ]

(the vector b placed in the k-th column).

Theorem 5.9. (Cramer) If A is an invertible matrix then the linear system

Ax = b

has a unique solution given by

x_k = det(B_k) / det(A),    for k = 1, . . . , n.

Proof. The solution is x = A^{-1} b. Expand det(Bk) in cofactors along the k-th column. We have

det(Bk) = b1 C_{1k} + · · · + bn C_{nk},

where C_{ik} denotes the (i, k) cofactor of A. Multiplying by 1/det(A), this is exactly the k-th component of the vector x = A^{-1} b.
Example 5.12. Solve the following system using Cramer’s rule

2x + 3y = 5
5x −  y = 7

Solution: Then
" # " # " #
2 3 5 3 2 5
A= , B1 = , B2 =
5 -1 7 -1 5 7
and
det(A) = −17, det(B1 ) = −26, det(B2 ) = −11
Hence,
x1 = 26/17,   x2 = 11/17.
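Cramer's rule translates directly into code. The helper below is a small illustrative sketch with exact arithmetic (our own function, not the book's).

from sympy import Matrix

def cramer_solve(A, b):
    """Solve A x = b by Cramer's rule; A must be square with det(A) != 0."""
    d = A.det()
    if d == 0:
        raise ValueError("det(A) = 0: Cramer's rule does not apply")
    x = []
    for k in range(A.cols):
        Bk = A.copy()
        Bk[:, k] = b                 # replace the k-th column of A by b
        x.append(Bk.det() / d)
    return Matrix(x)

A = Matrix([[2, 3], [5, -1]])
b = Matrix([5, 7])
print(cramer_solve(A, b))            # Matrix([[26/17], [11/17]])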

We now illustrate with a linear system with five equations and five unknowns.
Example 5.13. Solve the linear system Ax = b, where A is as in Example 5.3, and

b = ( 1, 0, 0, −1, 0 )^T.
Solution: As shown in Example (5.3) the determinant of A is det(A) = −121. Further, we compute

det(B1 ) = −61, det(B2 ) = −14, det(B3 ) = 44, det(B4 ) = −8, det(B5 ) = 28


Then, the solution of the system is

x = ( 61/121,  14/121,  −4/11,  8/121,  −28/121 )^T.

A linear system
Ax = b
such that b = 0 is called a homogeneous system.
Theorem 5.10. A homogeneous system has a nonzero solution if and only if det(A) = 0.

Proof. Exercise. 

5.4.1 Adjoints of matrices


The existence of the inverse of a matrix depends on whether or not the determinant of the matrix is 0. Naturally
one would like to find a formula for the inverse in terms of the determinant.
Definition 5.12. Let A be a n × n matrix with entries in C given by A = [ai,j ]. For each entry ai,j the corresponding cofactor
is denoted by ci,j . Create the matrix C = [ci,j ]. Let
C̄ := [ c̄i, j ],
where C̄i, j = [c̄i,j ] contains the conjugates of elements ci,j . The matrix

adj (A) := (C̄)t

is called the adjoint of A.


Example 5.14. Find the adjoint of the matrix

A =
[ i+1   2   i-1 ]
[  0    2i   0  ]
[  i    1   -1  ]

Solution: Then

C =
[ -2i    0    2    ]
[ -1-i   0    1-i  ]
[  4     0   -2+2i ]

Hence,

C̄ =
[ 2i     0    2    ]
[ -1+i   0    1+i  ]
[  4     0   -2-2i ]

and

adj (A) =
[ -2i   -1-i    4    ]
[  0     0      0    ]
[  2     1-i  -2+2i  ]

Remark 5.5. Notice that if the matrix has entries in R then it is not necessary to take the conjugates of ci, j since the conjugates
of real numbers are the numbers themselves. That is why in most textbooks which treat only the matrices with entries from R
the definition of the adjoint does not contain taking conjugates.
Example 5.15. Let A be the following matrix.

 1 2 0 -1
 

 0 2 0 0 
A := 
 
 2 1 -1 1


1 1 2 -1
 

Solution: Then, its adjoint is

 -2 5 -4 -2
 

 0 -6 0 0 
adj (A) = 
 
 6 -3 0 -6


10 -7 -4 -2
 


Theorem 5.11. Let A be an invertible matrix and adj (A) its adjoint. Then

A · adj (A) = adj (A) · A = det(A) · In

Proof. Exercise. 
From the above theorem we conclude that for a given matrix A such that det(A) ≠ 0 we have

A^{-1} = (1 / det(A)) · adj (A)

Example 5.16. Find the adjoint of A


 1 2 3
 

A =  4 5 6  .
 
7 8 9
 

Solution: The adjoint is


 -3 6 -3
 

adj (A) =  6 -12 6
 

-3 6 -3
 

Notice that det(A) = 0 so this matrix does not have an inverse. 
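For real matrices the construction above (cofactor matrix, then transpose) is available in sympy as adjugate; since the entries are real, no conjugation is involved and the result agrees with the definition of the adjoint given earlier. A quick illustrative check of Example 5.16:

from sympy import Matrix, eye

A = Matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
adjA = A.adjugate()                   # transpose of the cofactor matrix
print(adjA)
print(A * adjA == A.det() * eye(3))   # True; both sides are the zero matrix since det(A) = 0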

Exercises:

5.28. Let the curve

A + By + Cx + Dy^2 + Exy + x^2 = 0

be given. It passes through the points (x1, y1), . . . , (x5, y5). Determine A, B, C, D, and E. This was the original problem
that Cramer was concerned with when he discovered his formula.

5.29. Using Cramer's rule solve the system Ax = b where
A = [ 5 -1 0 2 ; 1 2 1 0 ; 3 1 -2 4 ; 0 4 -1 2 ],   b = ( 5, 3, 3, 2 )^T.

5.30. Find the adjoint of
A = [ 1 0 1 ; 0 1 0 ; 2 0 1 ],   B = [ 2 1 3 ; 2 -1 0 ; -1 0 5 ]
and use the result to find A^{-1} and B^{-1}.

5.31. Find the adjoint of
A = [ 5 -1 0 2 ; 1 2 1 0 ; 3 1 -2 4 ; 0 4 -1 2 ],   B = [ 5 2 0 2 ; 3 2 1 0 ; 3 1 -2 4 ; 2 4 -1 2 ]
and use the result to find A^{-1} and B^{-1}.

5.32. Determine if the matrix
A := [ 1 0 0 -1 ; 0 1 0 0 ; 2 1 -1 1 ; 1 0 2 -1 ]
is invertible.

5.33. Let f, g be as follows:
f(x) = a_l x^l + a_{l-1} x^{l-1} + · · · + a_1 x + a_0
g(x) = b_m x^m + b_{m-1} x^{m-1} + · · · + b_1 x + b_0          (5.2)
The Sylvester matrix Syl(f, g, x) of f(x) and g(x) is the (l + m) × (l + m) matrix whose first m columns contain the coefficients a_l, a_{l-1}, . . . , a_0 of f(x) (each successive column shifted down by one row) and whose last l columns contain the coefficients b_m, b_{m-1}, . . . , b_0 of g(x) (shifted in the same way), all other entries being 0.          (5.3)
The resultant of f(x) and g(x), denoted by Res(f, g, x), is
Res(f, g, x) := det(Syl(f, g, x)).
The following is a basic fact in the algebra of polynomials: the polynomials f(x) and g(x) have a common factor in k[x] if and only if Res(f, g, x) = 0.
Let
F(t) = u(1 + t^2) − t^2
G(t) = v(1 + t^2) − t^3          (5.4)
Find Res(F, G, t).

5.34. Let
f(x) = x^5 − 3x^4 − 2x^3 + 3x^2 + 7x + 6
g(x) = x^4 + x^2 + 1          (5.5)
Find Res(f, g, x).

5.35. Let
f(x) = a_n x^n + · · · + a_1 x + a_0
and f′(x) its derivative. Define the discriminant ∆_f of f(x) with respect to x as below:
∆_f := ( (−1)^{n(n−1)/2} / a_n ) · Res(f, f′, x).
The following is a basic fact in the algebra of polynomials: the polynomial f(x) has double roots if and only if ∆_f = 0. Does
f(x) = 6x^4 − 23x^3 − 19x + 4
have any multiple roots in C?

5.36. Find b such that
f(x) = x^4 − bx + 1
has a double root in C.

5.37. Find p such that
f(x) = x^3 − px + 1
has a double root in C.

5.5 Review exercises


5.38. Let A and B be two matrices which have the same eigenvalues. Are A and B necessarily similar? Explain your answer.

5.39. Find the eigenvalues and their algebraic and geometric multiplicities for the matrix
A = [ 1 1 0 2 ; 1 2 1 0 ; 1 1 2 4 ; 0 1 -1 2 ].

5.40. Find the eigenvalues and their algebraic and geometric


multiplicities for the matrix
B = [ 2 2 0 1 ; 1 1 1 0 ; 1 1 -2 1 ; 1 4 -1 2 ].

5.41. Prove that if A is similar to a diagonal matrix, then A is similar to A^T.
Chapter 6

Canonical Forms

The main purpose of this chapter is to classify the distinct linear transformations of a vector space or, equivalently, the similarity
classes of matrices.
Let V be an n-dimensional vector space over the field k and B a basis of V. Further, let T : V → V be a linear map and
A = M_B^B(T) its associated matrix. Choosing a different basis B′ for V gives a new matrix B = M_{B′}^{B′}(T) associated
with T, namely
B = P^{-1} A P
where P = M_{B′}^{B}(id) is the change of basis matrix; see Chapter 4. Can we find B′ such that the matrix associated with T is as simple as possible?
The strategy is to pick B0 such that B is as close to a diagonal matrix as possible. We distinguish two cases:
i) k does not contain all the eigenvalues of A
ii) k contains all eigenvalues.
These cases lead respectively to the rational canonical form and the Jordan canonical form and will be studied in sections
2 and 3.

6.1 Basics on polynomials


In this section we review some of the basic properties of polynomials. For more details on this topic the interested
reader can check [Sha18] or [DF04].
As before, by a field k we mean one of the following: Q, R, or C. A polynomial f (x) with coefficients in k is the
following
f (x) = an xn + an−1 xn−1 + · · · + a1 x + a0
where a0, . . . , an ∈ k; an is called the leading coefficient of f(x). The polynomial f(x) is called monic if an = 1.
We denote by k[x] the set of all polynomials in x with coefficients from k. Let f, g ∈ k[x]. By f + g and f · g we denote
the usual addition and multiplication of polynomials. The set of all rational functions p(x)/q(x) is denoted by k(x),

k(x) := { p(x)/q(x) | p(x), q(x) ∈ k[x], q(x) ≠ 0 }

and it is a field. The next theorem shows that the well known Euclidean algorithm applies to polynomials as well.
Theorem 6.1. (Euclidean algorithm) Let f, g ∈ k[x] and assume that g , 0. Then there exists unique r, q ∈ k[x] such that

f = q· g+r

where deg r < deg g.


The polynomial r(x) is called the remainder of the division f (x) by g(x). If r(x) is the zero polynomial (i.e.,
r(x) ≡ 0) then we say that g(x) divides f (x) and denote it by g(x) | f (x). If α ∈ k such that f (α) = 0 we say that α is a
root of f (x). Then we have the following:


Corollary 6.1. Let f ∈ k[x] and α ∈ k such that f (α) = 0. Then f (x) = (x − α) · g(x).

Let f (x) be a polynomial and α a root of f (x) such that


f (x) = (x − α)e · g(x)
and (x − α) ∤ g(x). The integer e is called the multiplicity of the root α.
Corollary 6.2. Let k be a field such that every non-constant polynomial in k[x] has a root in k. Then, for each f ∈ k[x] there
exists α1 , . . . , αn ∈ k and c ∈ k such that
f (x) = c (x − α1 ) · · · (x − αn ).
Corollary 6.3. Let f ∈ k[x] such that deg ( f ) = n. There are at most n roots of f in k.
Theorem 6.2. (Fundamental Theorem of Algebra) Every degree n polynomial f (x) with coefficients in C has n roots,
counting multiplicities.

Example 6.1. Let f (x) ∈ C[x] be given by


f (x) = (x2 + 1)2 · (x − 1)3 · (x − 2)
Then, the roots of f (x) are i, −i, 1, and 2 of multiplicity 2, 2, 3, and 1, respectively. Hence, the roots of f (x) are
i, i, −i, −i, 1, 1, 1, 2.
Thus, there are 8 of them as expected since deg f = 8. 

6.1.1 Irreducibility of polynomials


As we will see in the next few sections, it is important to know whether a given polynomial can be factored over a
field k (e.g., k = Q). A polynomial f(x) ∈ k[x] is said to be irreducible if it cannot be written as a product
f(x) = g(x) · h(x)
where g(x) and h(x) are non-constant polynomials. We give a few techniques for checking whether a polynomial is
irreducible over the field of rationals Q.
Theorem 6.3. (Integral root test) Let f (x) be a polynomial with integer coefficients given by
f (x) = an xn + an−1 xn−1 + · · · + a1 x + a0
and α = b/d a rational root of f(x) with gcd(b, d) = 1. Then b | a0 and d | an.
Example 6.2. Prove that the polynomial f (x) = x3 + 2x + 2 is irreducible over Q.

Solution: Assume that f(x) factors in Q[x]. Then one of the factors is linear. Hence, f(x) has a rational root a = b/d. From the
previous theorem
b | 2, and d|1
Hence, b = ±1, ±2 and d = ±1. Then we have a = ±1, ±2. It can easily be checked that none of these values is a root of f (x).
Hence f (x) is irreducible. 
Theorem 6.4. (Eisenstein’s criterion) Let f (x) be a polynomial with integer coefficients given by
f (x) = an xn + an−1 xn−1 + · · · + a1 x + a0
and p a prime number such that:

i) p | ai for all i ≤ n − 1
ii) p^2 ∤ a0
iii) p ∤ an.

Then, f (x) is irreducible over Q.


Example 6.3. Show that


f (x) = x7 + 12x6 − 9x5 + 30x4 − 6x3 + 15x2 + 12x − 3

is irreducible over Q.

Solution: Notice that p = 3 divides all coefficients other than the leading coefficient. Further p2 = 9 does not divide a0 = −3.

Applying the previous theorem we conclude that f (x) is irreducible over Q. 
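The criteria above can also be double-checked by machine: sympy can test irreducibility over Q and factor polynomials directly. The snippet is an illustrative check, not a replacement for the proofs.

from sympy import symbols, Poly, factor

x = symbols('x')
f = x**7 + 12*x**6 - 9*x**5 + 30*x**4 - 6*x**3 + 15*x**2 + 12*x - 3
print(Poly(f, x, domain='QQ').is_irreducible)   # True, as Eisenstein's criterion predicts
print(factor(x**3 - 7*x**2 + 16*x - 12))        # (x - 2)**2*(x - 3), cf. Exercise 6.7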

Theorem 6.5. (Extension of the Eisenstein’s criterion) Let f (x) be a polynomial with integer coefficients given by

f (x) = an xn + an−1 xn−1 + · · · + a1 x + a0

and p a prime number such that:

i) there is an r (0 ≤ r ≤ n) such that p ∤ ar
ii) p | ai for all 0 ≤ i ≤ r − 1
iii) p^2 ∤ a0
iv) f(x) = h(x) · g(x), with h, g ∈ Z[x].

Then, deg(h) ≥ r or deg(g) ≥ r.

Example 6.4. Show that

f(x) = x^5 + 2x^4 + 3x^3 + 3

is irreducible in Q[x].

Solution: We use the previous theorem. Since 3 divides a0 , . . . , a3 but does not divide a4 then r = 4. Hence, if f (x) is reducible

then it is a product of polynomials of degree 4 and 1. Thus f (x) has a rational root. By the integral root test we show that this
can’t happen. 

Exercises:

6.1. Use the Euclidean algorithm to write (x^n − 1)/(x − 1) as a polynomial.

6.2. Prove that f(x) = x^3 − 3x − 1 is irreducible in Q.

6.3. For any prime p show that x^2 − p and x^3 − p are irreducible in Q.

6.4. Let α ∈ Z such that α is divisible by some prime p but p^2 ∤ α. Prove that x^n − α is irreducible.

6.5. Prove that f(x) = x^4 + 1 is irreducible over Q.

6.6. Prove that the following polynomials are irreducible over Q:
1) x^4 + 10x + 5
3) x^4 − 4x^3 + 6
4) x^6 + 30x^5 − 15x^3 + 6x − 120

6.7. Factor over Q the polynomial f(x) = x^3 − 7x^2 + 16x − 12.

6.8. Factor over Q the polynomial f(x) = x^3 + x^2 + x − 14.

6.9. We solve a quadratic equation by the well-known quadratic formula. Do you know any formulas that one can use to solve a cubic polynomial? What about polynomials of degree 4, 5?


6.2 Companion matrices, minimal polynomial, Smith normal form.


As above k denotes one of the fields Q, R, or C and Matn×n (k) denotes the vector space of all n × n matrices with
entries in k. Let A ∈ Matn×n (k) and f ∈ k[x] given by
f (x) = an xn + · · · + a0 .
We define
f (A) := an An + · · · + a1 A + a0 I.
Then f (A) is an n by n matrix with entries in k.
Theorem 6.6. Let A ∈ Matn×n (k). Then there exists a non-zero f ∈ k[x] such that
f (A) = 0.
Proof. The vector space Matn×n (k) is of dimension n2 . Hence,

I, A, A2 , . . . , As
are linearly dependent for s ≥ n^2. Thus, there exist a0, . . . , as ∈ k, not all zero, such that
as A^s + · · · + a1 A + a0 I = 0.
Take f(x) = as x^s + · · · + a1 x + a0. 
Definition 6.1. We call the minimal polynomial of A the unique monic polynomial m ∈ k[x] of minimal degree such that
m(A) = 0. The minimal polynomial of A is denoted by mA (x).
Definition 6.2. Let f (x) be a monic polynomial in k[x] given by

f (x) = xn + an−1 xn−1 + · · · + a0 .


The companion matrix of f (x) is the n × n matrix

C_f :=
[ 0  0  . . .  0  −a0       ]
[ 1  0  . . .  0  −a1       ]
[ 0  1  . . .  0  −a2       ]
[ .         .        .      ]
[ 0  0  . . .  1  −a_{n−1}  ]
and we denote it by C f .
Lemma 6.1. Let f (x) ∈ k[x] and C f its companion matrix. The characteristic polynomial of C f is
char (C f , x) = f (x).
Proof. Exercise. 
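Lemma 6.1 is easy to verify experimentally. The sketch below builds the companion matrix of a monic polynomial and compares characteristic polynomials; the helper function is our own illustration, not the book's code.

from sympy import symbols, zeros

x = symbols('x')

def companion_matrix(coeffs):
    """Companion matrix of the monic polynomial x^n + a_{n-1} x^{n-1} + ... + a_0,
    where coeffs = [a_0, a_1, ..., a_{n-1}]."""
    n = len(coeffs)
    C = zeros(n, n)
    for i in range(1, n):
        C[i, i - 1] = 1              # ones on the subdiagonal
    for i in range(n):
        C[i, n - 1] = -coeffs[i]     # last column holds -a_0, ..., -a_{n-1}
    return C

C = companion_matrix([1, 1, 0])      # f(x) = x**3 + x + 1
print(C.charpoly(x).as_expr())       # x**3 + x + 1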
For a given matrix A, the characteristic polynomial is char(A, x) = det(xI − A). The matrix xI − A can be considered
as a matrix with entries in the polynomial ring k[x] (and hence also in the field k(x)), that is, as an element of Matn×n( k[x] ).
In the next theorem we show how every matrix in Matn×n( k[x] ) can be transformed into a diagonal form by elementary
operations. These elementary operations consist of

i) Interchange of any two rows or columns (Ri ←→ R j )

ii) Adding a multiple (in k[x]) of one row or column to another (R_j −→ R_j + q(x) · R_i).

iii) Multiplying any row or column by a non-zero element in k (Ri −→ u · Ri , for u ∈ k)

Two matrices A and B, one of which can be obtained by a sequence of elementary operations on the other, are called
Gaussian equivalent. For matrices whose entries are polynomials we have the following:


Theorem 6.7. Let M ∈ Matn×n ( k[x] ). Then, using elementary operations the matrix M can be put in a diagonal form

 1
 


 · 

·
 
 

1

 
 

 e1 (x) 


 · 


 · 

·
 
 
es (x)
 

where e1 (x), . . . , en (x) are monic polynomials such that

ei (x) | ei+1 (x), for i = 1, . . . , s − 1.

Proof. We will use the elementary operations to transform M into a diagonal matrix. Among all matrices which are
Gaussian equivalent to M pick the one which has the entry of smallest degree. Let such matrix be A = [ai j (x)] and
the entry with lowest degree is ai j =: m(x).
By an interchange of rows and columns bring this entry to the (1, 1)-position. By the Euclidean algorithm, every entry
of the first column can be written as

a_{j1} = m(x) q_j(x) + r_j(x),

where deg r_j(x) < deg m(x). Performing R_j → R_j − q_j(x) · R_1 for j = 2, . . . , n, the first column of the matrix becomes

( m(x), r2(x), . . . , rn(x) )^T.
Choose the entry m′(x) of smallest degree in the first column and by a row interchange move it to the (1, 1)-position.
Repeat the same process; the degrees of the remainders decrease by at least one at each pass. Since k[x] is a Euclidean
domain this process ends after finitely many steps, and the first column becomes

( m1(x), 0, . . . , 0 )^T.
Indeed, the number of passes can be no bigger than deg m(x).
Next we perform the same procedure on the first row to get

[ m2(x), 0, . . . , 0 ; a′2,1(x), a′2,2(x), . . . , a′2,n(x) ; . . . ; a′n,1(x), a′n,2(x), . . . , a′n,n(x) ]

(rows separated by semicolons).
Continuing again with the first column and so on, we get a sequence of operations

A → A(1) → A(2) → . . .

Let mi (x) denote the entry in the (1, 1)-position after the i-th step. Then

deg m(x) > deg m1 (x) > . . .


Thus, the procedure must stop and the matrix takes the form

[ e1(x), 0, . . . , 0 ; 0, a″2,2(x), . . . , a″2,n(x) ; . . . ; 0, a″n,2(x), . . . , a″n,n(x) ]

where e1(x) has the smallest degree and divides all the entries a″i,j(x).
Now we perform the same procedure focusing on the next row and column. Finally we will have

D := diag( e1(x), e2(x), . . . , en(x) )
such that ei (x) | ei+1 (x), for i = 1, . . . , n − 1.

Remark 6.1. If some ei(x) = 0 then it occurs in the last position(s), since ei(x) | ei+1(x) forces every invariant factor after a zero one to be zero as well.
Definition 6.3. Let A ∈ Matn×n (k). Then by the above theorem the matrix xI − A can be put into the diagonal form

 1
 


 · 

·
 
 

1

 
 

 e1 (x) 


 · 


 · 

·
 
 
es (x)
 

such that ei (x) are monic and ei (x) | ei+1 (x), for i = 1, . . . , s − 1. This is called the Smith normal form for A and elements ei (x)
of nonzero degree are called invariant factors of A.
Lemma 6.2. The characteristic polynomial of A is the product of its invariant factors up to multiplication by a constant.

Proof. We have
char (A, x) = det(xI − A).
Since (xI − A) ∼ Smith (A) then
det(xI − A) = c · det(Sm(A)),
for some c ∈ k.

Lemma 6.3. Let e1(x), . . . , es(x) be the invariant factors of A, so that

ei(x) | ei+1(x),  for i = 1, . . . , s − 1.

The minimal polynomial mA(x) is the largest invariant factor of A. In other words,

es(x) = mA(x).

Proof. Exercise 


Example 6.5. Find the Smith normal form of the matrix A given as follows:

 2 -2 14 
 
A :=  0 3 -7 

0 0 2
 

Solution: We have

xI − A =
[ x−2    2   −14 ]
[  0    x−3    7 ]
[  0     0   x−2 ]

We perform the following elementary operations (each arrow is labeled with the operation used; rows are separated by semicolons):

xI − A = [ x−2, 2, −14 ; 0, x−3, 7 ; 0, 0, x−2 ]
  (C1 ←→ C2) → [ 2, x−2, −14 ; x−3, 0, 7 ; 0, 0, x−2 ]
  (R2 → (x−3)R1 − 2R2) → [ 2, x−2, −14 ; 0, (x−2)(x−3), −14(x−2) ; 0, 0, x−2 ]
  (C2 → (x−2)C1 − 2C2) → [ 2, 0, −14 ; 0, −2(x−2)(x−3), −14(x−2) ; 0, 0, x−2 ]
  (R1 → (1/2)R1,  R2 → −(1/2)R2) → [ 1, 0, −7 ; 0, (x−2)(x−3), 7(x−2) ; 0, 0, x−2 ]
  (C3 → 7C1 + C3) → [ 1, 0, 0 ; 0, (x−2)(x−3), 7(x−2) ; 0, 0, x−2 ]
  (C2 ←→ C3) → [ 1, 0, 0 ; 0, 7(x−2), (x−2)(x−3) ; 0, x−2, 0 ]
  (R3 → R2 − 7R3) → [ 1, 0, 0 ; 0, 7(x−2), (x−2)(x−3) ; 0, 0, (x−2)(x−3) ]
  (C3 → (x−3)C2 − 7C3) → [ 1, 0, 0 ; 0, 7(x−2), 0 ; 0, 0, −7(x−2)(x−3) ]
  (R2 → (1/7)R2,  R3 → −(1/7)R3) → [ 1, 0, 0 ; 0, x−2, 0 ; 0, 0, (x−2)(x−3) ]
 

which is the Smith normal form Sm(A). The reader can check that the characteristic polynomial of Smith (A) and A are the
same. 
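One consequence of the computation in Example 6.5 can be checked directly: by Lemma 6.3 the largest invariant factor (x − 2)(x − 3) is the minimal polynomial of A, so substituting A into it must give the zero matrix. A quick sympy check (ours, not the book's):

from sympy import Matrix, eye, zeros, symbols

x = symbols('x')
A = Matrix([[2, -2, 14], [0, 3, -7], [0, 0, 2]])
I = eye(3)
print((A - 2*I) * (A - 3*I) == zeros(3, 3))    # True: (x-2)(x-3) annihilates A
print(A.charpoly(x).as_expr().factor())        # (x - 3)*(x - 2)**2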

Exercises:

6.10. Find the companion matrix of f(x) = x^3 − x − 1.

6.11. Find the companion matrix of f(x) = (x − 2)^2 (x − 3).

6.12. Let A be a 2 by 2 matrix with entries in Q such that char(A, x) = x^2 + 1. Find the minimal polynomial of A.

6.13. Let f(x) be an irreducible cubic polynomial over Q, for example f(x) = ax^3 + bx^2 + cx + d. Let A be a 3 by 3 matrix with entries in Q such that char(A, x) = f(x). Find the minimal polynomial mA(x) of A. Can you generalize to a degree n polynomial?

6.14. Find the Smith normal form of the matrices in the previous two exercises.

6.15. Determine all possible minimal polynomials of a matrix A with characteristic polynomial char(A, x) = (x − 2)^2 (x − 3).

6.16. Determine all possible Smith normal forms of a matrix A with characteristic polynomial char(A, x) = (x − 2)^2 (x − 3).

6.17. Find all possible Smith normal forms of a matrix A with characteristic polynomial char(A, x) = x^3 − 1.

6.3 The rational canonical form


Let f (x) be a polynomial with coefficients in a field k. As noted in the previous section not all roots of a given
polynomial are necessarily in k. For example, not all polynomials with rational coefficients factor into linear factors
over the rationals. Let A be a given matrix with entries in k. In this section we will see how to find the "best" matrix
D similar to A and with entries still in k. The reader can assume that in this section k = Q.
Let A ∈ Matn×n (k) and D = Smith (A), its Smith normal form as in the previous section. Let e1 (x), . . . , es (x) be the
invariant factors of A and C1 , . . . , Cs the corresponding companion matrices. The block-matrix

[ C1                 ]
[     C2             ]
[         .          ]
[           .        ]
[             Cs     ]
is called the rational canonical form of A and is denoted by Rat (A). The word rational is used to indicate that this
form is calculated entirely within the field k. Notice that,

e1 (x) · · · es (x) = c · char (A, x)

implies that
deg e1 + · · · + deg es = deg char (A, x).
Hence, A and Rat (A) have the same dimensions.


Example 6.6. Find the rational canonical form of the matrix

 2 -2 14
 

A :=  0 3 -7
 

0 0 2
 

Solution: We found the invariant factors of this matrix in Example 6.5 in the last section. They are e1 (x) = x − 2 and
e2 (x) = (x − 2)(x − 3). Then the rational form of A is

 2
 

Rat (A) =  0 -6 

1 5
 

Theorem 6.8. Let k be a field and A ∈ Matn×n (k). Then the following hold:

i) Two matrices in Matn×n (k) are similar if and only if they have the same rational form.
ii) The rational form of A is unique.

Proof. i) Let A be similar to B, say B = P^{-1} A P. Then xI − B = P^{-1}(xI − A)P, so the matrices xI − A and xI − B are
Gaussian equivalent over k[x] and have the same Smith normal form, hence the same invariant factors. Thus, A and B have the same rational form.
Conversely, every matrix is similar to its rational canonical form; so if A and B have the same rational form they are similar to the same matrix, and hence to each other.
ii) The invariant factors of A are uniquely determined, hence so is the rational form. 

Example 6.7. Let A be a 10 by 10 matrix such that its invariant factors are

e1(x) = x − 2
e2(x) = (x − 2)(x^3 + x + 1)                       (6.1)
e3(x) = (x − 2)(x − 3)(x^3 + x + 1)

Find the rational canonical form of A.

Solution: By multiplying through we have

e2 (x) = x4 − 2x3 + x2 − x − 2
(6.2)
e3 (x) = x5 − 5x4 + 7x3 − 4x2 + x + 6

Hence, the rational canonical form of A is

 2
 


 0 0 0 2 

1 0 0 1
 
 
0 1 0 -1
 
 
 
0 0 1 2
Rat (A) = 
 

 0 0 0 0 -6 

1 0 0 0 -1
 
 


 0 1 0 0 4 

0 0 1 0 -7
 
 

0 0 0 1 5


Example 6.8. Let A be an 8 by 8 matrix whose invariant factors are

e1(x) = x^3 + x + 1
e2(x) = (x^2 + 2)(x^3 + x + 1) = x^5 + 3x^3 + x^2 + 2x + 2          (6.3)

Find the rational canonical form of A.

Solution: The rational canonical form is Rat (A) = diag( C_{e1}, C_{e2} ), where

C_{e1} = [ 0 0 -1 ; 1 0 -1 ; 0 1 0 ],
C_{e2} = [ 0 0 0 0 -2 ; 1 0 0 0 -2 ; 0 1 0 0 -1 ; 0 0 1 0 -3 ; 0 0 0 1 0 ].


Exercises:

6.18. Find the rational canonical form over Q of the matrix
[ 1 2 ; 3 4 ].

6.19. Let A be the 8 by 8 matrix given by
A =
[ 0 0 0 0 0 0 0 1 ]
[ 1 0 0 0 0 0 0 0 ]
[ 0 1 0 0 0 0 0 0 ]
[ 0 0 1 0 0 0 0 0 ]
[ 0 0 0 1 0 0 0 0 ]
[ 0 0 0 0 1 0 0 0 ]
[ 0 0 0 0 0 1 0 0 ]
[ 0 0 0 0 0 0 1 0 ]
Find its eigenvalues. What about the eigenvalues of A^T?

6.4 Cayley-Hamilton theorem


The Cayley-Hamilton theorem is one of the most recognized theorems of linear algebra. It can be quite useful at
times to compute the rational canonical form of matrices.
Theorem 6.9. (Cayley - Hamilton) Let A ∈ Matn×n (k), mA (x) its minimal polynomial, and char A (x) the characteristic
polynomial of A. Then,
mA (x) | char A (x).

Proof. Let e1 (x), . . . , es (x) be the invariant factors of A such that ei (x) | ei+1 (x), for i = 1, . . . s. We know that
char A (x) = e1 (x) · · · es (x)
Since es (A) = mA (A) = 0 and es (x) | char A (x), then char A (A) = 0.
Since m(x) is the minimal polynomial then
deg mA (x) ≤ deg charA (x).
By the Euclidean algorithm,
charA (x) = q(x) mA (x) + r(x)
such that deg r(x) < deg mA(x). Since charA(A) = 0 and mA(A) = 0, we get r(A) = 0. Thus r(x) must be the zero polynomial:
otherwise, after dividing by its leading coefficient, it would be a monic polynomial of degree smaller than deg mA(x) annihilating A, contradicting the minimality of mA(x). Hence mA(x) | charA(x). 


6.4.1 Computing the rational canonical form


The previous section determines an algorithm for computing the Smith normal form of a matrix A. This gives us all
the invariant factors of A. Once the invariant factors are known then it is easy to write down the rational canonical
form Rat (A) of A. However, there are techniques to directly compute the rational form of a matrix by elementary
operations or figure out the invariant factors without computing the Smith normal form. In this section we illustrate
some of these techniques through examples.
Example 6.9. Let A be the 3 by 3 matrix given below:
 23 70 20 
 3 3 3 
 
 
A =  - 43 11 4
 
- 3 - 3


 
 
-2 -7 -1

Find its rational canonical form.

Solution: The characteristic polynomial of A is

char (A, x) = (x − 1)3 .

Then, by Cayley-Hamilton theorem the minimal polynomial of A is one of the following:

mA (x) = (x − 1), (x − 1)2 , (x − 1)3

Furthermore, mA (A) = 0. We check that A − I , 0 and (A − I)2 = 0. Hence the minimal polynomial is

mA (x) = (x − 1)2

Hence the Smith normal form is


 1
 

Smith (A) =  x−1
 

(x − 1)2
 

and the rational form


 1
 

Rat (A) =  0 -1 

1 2
 

6.4.2 Computing the transformation matrix:


We know how to compute the rational form of a matrix A. Then, A is similar to its rational form Rat (A). Hence
there exists an invertible matrix C such that
A = C−1 Rat (A) C
We would like to compute C. The strategy is to keep track of all elementary operations performed in xI − A and to
perform these operations on I in order to get C as a product of elementary matrices.

Algorithm 8. Input: A n × n matrix A


Output: The matrix C such that
A = C−1 Rat (A) C
1) Create the matrix xI − A.
2) Transform it to the Smith normal form and keep track of all the elementary operations.


3) For each operation of step 2, perform the corresponding operation on the identity matrix I according to the following
rules:
a) Ri ←→ Rj =⇒ Ci ←→ Cj
b) Ri −→ q(x) · Ri + Rj =⇒ Ci −→ q(x) · Ci + Cj
c) Ri −→ u · Ri, for u ∈ k =⇒ Ci −→ u · Ci

4) The matrix obtained after performing these operations on I is the sought matrix C.

Exercises:

6.20. Find the rational form of the 3 by 3 matrix with invariant factors
e1(x) = x − 1,  e2(x) = x − 1,  e3(x) = x − 1.

6.21. Find the rational canonical form of the matrices over Q
A = [ 0 -4 85 ; 1 4 -30 ; 0 0 3 ],   B = [ 2 2 1 ; 0 2 -1 ; 0 0 3 ]
and determine if A and B are similar.

6.22. Find the invariant factors of
[ 2 2 1 ; 3 4 1 ; 1 5 1 ].

6.23. Prove that two non-scalar 2 × 2 matrices over k are similar if and only if they have the same characteristic polynomial.

6.24. Find the rational canonical form of
[ 0 -1 -1 ; 0 0 0 ; -1 0 0 ].

6.25. Determine all possible rational canonical forms for a matrix with characteristic polynomial
f(x) = x^2 (x^2 + 1)^2.

6.26. Determine all possible rational canonical forms for a matrix with characteristic polynomial
f(x) = x^p − 1
for an odd prime p.

6.27. The characteristic polynomial of a given matrix A is
char(A, x) = (x − 1)^2 · (x + 1) · (x^2 + x + 1).
What are the possible polynomials that can be minimal polynomials of A?

6.28. Find all similarity classes of 2 × 2 matrices with entries in Q and of precise order 4 (i.e., A^4 = I).

6.5 The Jordan canonical form


Let α ∈ k. Then a matrix of the form

Jα =
[ α  1           ]
[    α  1        ]
[       .  .     ]
[          α  1  ]
[             α  ]

(with α on the diagonal, 1's on the superdiagonal, and 0's elsewhere) is called a Jordan block.

Lemma 6.4. Let A be an s × s matrix with characteristic polynomial

char_A(x) = (x − α)^s

and minimal polynomial m_A(x) = (x − α)^s (for instance, the companion matrix of (x − α)^s). Then, A is similar to the s × s Jordan block matrix Jα.


Proof. Let f(x) := (x − α)^s. Then, the Cayley-Hamilton theorem implies that

f(A) = (A − αI)^s = 0.

Since m_A(x) = (x − α)^s, we have m_{A−αI}(x) = x^s. Thus, A − αI is similar to the companion matrix D of g(x) := x^s,
where

D =
[ 0  1           ]
[    0  1        ]
[       .  .     ]
[          0  1  ]
[             0  ]

Thus, there is an invertible matrix P such that

P^{-1} (A − αI) P = D,

which implies that P^{-1} A P = D + αI = Jα. In other words, A is similar to the Jordan block Jα.

A matrix is in Jordan canonical form if it is a block diagonal matrix

J = diag( J1, J2, . . . , Jn )

with Jordan blocks J1, . . . , Jn along the diagonal.
Theorem 6.10. Let A be a n × n matrix with entries in k and assume that k contains all eigenvalues of A. Then,

i) A is similar to a matrix in Jordan canonical form.


ii) The Jordan canonical form of A denoted by J(A) is unique up to a permutation of blocks.
Thus, to find the Jordan canonical form of an n by n matrix A we first find its invariant factors e1(x), . . . , es(x). Since
the field k contains all eigenvalues of A and each ei(x) | char(A, x), each invariant factor factors as

ei(x) = (x − α1)^{d1} · · · (x − αr)^{dr}.

For each factor (x − αj)^{dj} with dj > 0 we get a Jordan block of size dj with αj on the diagonal. Since the product of all the invariant
factors equals the characteristic polynomial of A (up to a constant), the combination of all the Jordan blocks along the diagonal creates an n by n matrix (the same
size as A).
Remark 6.2. The Jordan canonical form of a matrix A is diagonal if and only if A is diagonalizable.
Example 6.10. Both matrices

 0 1 1 1  5 2 -8 -8
   
 
 1 0 1 1   -6 -3 8 8 
A =   , B =   ,
   
 1 1 0 1   -3 -1 3 4 
1 1 1 0 3 1 -4 -5
   


have the same characteristic polynomial


f (x) = (x − 3)(x + 1)3 .
Determine whether these matrices are similar and find their Jordan canonical forms.

Solution: The minimal polynomial for A and B is one of the following polynomials:

m1(x) = (x − 3)(x + 1),
m2(x) = (x − 3)(x + 1)^2,                      (6.4)
m3(x) = (x − 3)(x + 1)^3.

We check that (A − 3I) (A + I) = 0. In the same way we check that (B − 3I) (B + I) = 0. Hence, the minimal polynomial of A
and B is
m(x) = (x − 3) (x + 1).
The Smith normal form of both A and B is

Smith (A) = Smith (B) = diag( 1, x+1, x+1, (x−3)(x+1) ).

Then the Jordan canonical forms are

J(A) = J(B) = diag( −1, −1, −1, 3 ).

Thus, A and B are similar. Further, A and B are diagonalizable matrices and we can diagonalize them using the techniques of
the previous chapter. 
Example 6.11. Let A be a matrix such that its invariant factors are

e1(x) = (x − 2)^2 (x^2 + 1)
e2(x) = (x − 2)^3 (x^2 + 1)^2                      (6.5)

Find the rational and Jordan canonical form of A.

Solution: Multiplying out we have

e1(x) = x^4 − 4x^3 + 5x^2 − 4x + 4
e2(x) = x^7 − 6x^6 + 14x^5 − 20x^4 + 25x^3 − 22x^2 + 12x − 8          (6.6)

The rational canonical form is Rat (A) = diag( C_{e1}, C_{e2} ), where

C_{e1} = [ 0 0 0 -4 ; 1 0 0 4 ; 0 1 0 -5 ; 0 0 1 4 ],
C_{e2} = [ 0 0 0 0 0 0 8 ; 1 0 0 0 0 0 -12 ; 0 1 0 0 0 0 22 ; 0 0 1 0 0 0 -25 ; 0 0 0 1 0 0 20 ; 0 0 0 0 1 0 -14 ; 0 0 0 0 0 1 6 ]

(rows separated by semicolons),

and the Jordan canonical form is the block diagonal matrix

J(A) = diag( [ 2 1 ; 0 2 ],  [ -i ],  [ i ],  [ 2 1 0 ; 0 2 1 ; 0 0 2 ],  [ -i 1 ; 0 -i ],  [ i 1 ; 0 i ] )

(rows of each block separated by semicolons).


Example 6.12. Let A be a 3 by 3 matrix as below

 2 1 0
 

A =  0 2 0
 

0 0 3
 

Find its Jordan canonical form.

Solution: Then char (A, λ) = (λ − 2)2 (λ − 3). For the eigenvalue λ = 2, the algebraic multiplicity is 2 and the eigenspace is
given by
 1 
 
E2 = {t 
 0 
 | t ∈ Q}
0
 

The geometric multiplicity is 1, hence A is not similar to the diagonal matrix of eigenvalues.
We have (rows separated by semicolons):

xI − A = [ x−2, 1, 0 ; 0, x−2, 0 ; 0, 0, x−3 ]
  (C1 ←→ C2) → [ 1, x−2, 0 ; x−2, 0, 0 ; 0, 0, x−3 ]
  (R2 → (x−2)R1 − R2) → [ 1, x−2, 0 ; 0, (x−2)^2, 0 ; 0, 0, x−3 ]
  (C2 → (x−2)C1 − C2) → [ 1, 0, 0 ; 0, −(x−2)^2, 0 ; 0, 0, x−3 ]
  (R2 ←→ R3, C2 ←→ C3, R3 → −R3) → [ 1, 0, 0 ; 0, x−3, 0 ; 0, 0, (x−2)^2 ]
  → · · · → [ 1, 0, 0 ; 0, 1, 0 ; 0, 0, (x−2)^2 (x−3) ]

Then its Jordan canonical form is

J(A) =
[ 2  1  0 ]
[ 0  2  0 ]
[ 0  0  3 ]

Instead we could have recognized that A was already in the Jordan canonical form. Notice that the geometric multiplicity for
each eigenvalue is 1 and there is one Jordan block for each eigenvalue. Also the algebraic multiplicities of the eigenvalues are 2
and 1 and the corresponding Jordan blocks are of sizes 2 and 1 respectively. We will see that these facts are not a coincidence. 
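sympy computes the Jordan canonical form directly; checking Example 6.12 is a one-liner. This is an illustrative check only (the order of the Jordan blocks returned may differ from ours).

from sympy import Matrix

A = Matrix([[2, 1, 0], [0, 2, 0], [0, 0, 3]])
P, J = A.jordan_form()        # A = P * J * P**(-1)
print(J)                      # Matrix([[2, 1, 0], [0, 2, 0], [0, 0, 3]])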

Exercises:


6.29. Let A be a matrix with characteristic polynomial
char(A, x) = x^3 + x^2 + x + 1.
Find the rational form of A over Q and the Jordan canonical form of A over C.

6.30. Find the rational and Jordan canonical form of
[ 2 1 1 ; 1 2 0 ; 1 1 3 ].

6.31. Compute the Jordan canonical form of the matrix with characteristic polynomial f(x) = x^n − 1, for n ≥ 2.

6.32. Show that if A^2 = A then A is similar to a diagonal matrix which has only 0's and 1's along the diagonal.

6.33. Find the Jordan canonical form of
[ 3 2 0 ; 1 2 7 ; 1 -2 3 ].

6.34. Find the Jordan canonical form of
[ 1 0 0 ; 0 0 -2 ; 0 1 3 ].

6.35. Find the Jordan canonical form of the matrices
A = [ 0 -4 85 ; 1 4 -30 ; 0 0 3 ],   B = [ 2 2 1 ; 0 2 -1 ; 0 0 3 ]
and determine if A and B are similar.

6.36. Determine the Jordan canonical form of the n × n matrix over Q whose entries are all 1.

6.37. Let A be the 2 × 2 matrix which corresponds to the rotation of the complex plane by 2π/5. Find the Jordan canonical form of A. Explain in terms of complex numbers.

6.38. Let A be the 2 × 2 matrix which corresponds to the transformation of the complex plane T(z) = 1/z. Find the Jordan canonical form of A. Explain in terms of complex numbers.

Programming exercises:

1) Write a computer program which finds the Jordan canonical form of a given matrix A.

6.6 Review exercises


6.39. Find the rational canonical form of the 5 by 5 matrix A 6.43. Diagonalize the matrix or explain why it can’t be diag-
with characteristic polynomial onalized.

 3 1 0 -1
 
char (A, x) = x5 + 2x4 − 12x3 + 4x2 − 6x + 10  4

0 0 3 
A = 
 
 -4 2 2 -3

6.40. Let A be a n by n matrix which has n distinct eigenvalues 
2 -4 0 7
 
λ1 , . . . , λn . Find the Jordan canonical form of A.
6.44. Diagonalize the matrix or explain why it can’t be diag-
onalized.
6.41. The characteristic polynomial of a 3 by 3 matrix A is
 7 -1 0 2
 
char (A, x) = (x − 1) (x − 2).
2

 -10 4 0 -4 
A = 
 
 5 -1 2 2


Find all possibilities for the rational and Jordan canonical form 
-15 3 0 -4

of A.
6.45. Let A be an n × n nilpotent matrix. Show that An = 0.

6.42. Determine if the matrices A and B are similar 6.46. Let A be a strictly upper triangular matrix (all entries on
the main diagonal and below are 0). Prove that A is nilpotent.
 -1 1 0 0   -1 1 0 0 
   
 0 -1 0 0   0 -1 0 0 
A =  , B = 
 
 0 0 -2 0   0 0 -2 1 
 6.47. Let A be the 2 × 2 matrix which corresponds to the
0 0 0 -2 0 0 0 -2 rotation of the complex plane by 2π
n . Find the Jordan canonical
   

116
Shaska T. Linear Algebra

form of A. Explain in terms of complex numbers. A, over C, which satisfy A6 = 1.

6.48. Determine the set of similarity classes of 3 × 3 matrices


6.50. Determine the set of similarity classes of 6 × 6 matrices
A, over C, which satisfy A3 = 1.
A, over C, with characteristic polynomial:

6.49. Determine the set of similarity classes of 3 × 3 matrices char (A, x) = (x4 − 1)(x2 − 1).

117
Linear Algebra Shaska T.

118
Chapter 7

Inner Products and Orthogonality

In this chapter we will study the important concept of inner product in a vector space. We give the most general
definition of the inner product and briefly look at Hermitian products. The rest of the chapter is focused on
orthogonal and orthonormal bases we will study the Gram-Schmidt orthogonalization process. In the last section
a brief introduction to dual spaces is given.

7.1 Inner products


Let V be a vector space over the field k. Recall that in this book k denotes one of the fields Q, R, or C. For α ∈ k the
complex conjugate of α is denoted by ᾱ. Let f (u, v) be a function given as below

f: V × V −→ k
(7.1)
(u, v) = f (u, v)

The function f is called an inner product (scalar product) if the following properties hold for every u, v, w ∈ V
and r ∈ k:

i) f (u, v) = f (v, u),


ii) f (u, v + w) = f (u, v) + f (u, w)
iii) f (r u, v) = r f (u, v).

We will denote inner products with hu, vi instead of f (u, v). An inner product is called non-degenerate if

hu, vi = 0, for all v ∈ V =⇒ u = 0.

A vector space V with an inner product is called an inner space. We give some examples of inner spaces.
Example 7.1. Show that hu, z vi = z̄ hu, vi.

Solution: Indeed, hu, z vi = hz v, ui = zhu, vi = z̄ hu, vi. 

Example 7.2. Let V = Rn and consider u, v ∈ Rn ,

u1   v1 
   
u2   v 
u =   and v =  2 
 
. . .  . . . 
un vn )
 

the dot product defined as


u · v = u1 v1 + · · · + un vn
We leave it as an exercise for the reader to show that this is an inner product in V. 

119
Linear Algebra Shaska T.

Example 7.3. Let V be the space of real continuous functions on [0, 1]

f : [0, 1] −→ R

For f, g ∈ V we define
Z 1
h f, gi = f (t) · g(t) dt
0
Using properties of integrals it is easy to verify that this is an inner product. 
Example 7.4. Let V be the vector space as above and

f (x) = sin x, g(x) = cos x

Compute h f, gi.

Solution: We have
Z 1 Z 1
1 1 1 − cos 2
h f, gi = sin x cos x dx = sin (2x) dx = (− cos 2 + cos 0) =
0 2 0 4 4

Definition 7.1. Let V be a vector space and h·, ·i an inner product on V. Let u ∈ V. We call v orthogonal to u if hu, vi = 0,
sometimes denoted by u ⊥ v. For a set S ⊂ V its orthogonal set S⊥ is defined as

S⊥ := {v ∈ V | ∃ s ∈ S, s ⊥ v}

If S is a subspace of V then S⊥ is called the orthogonal complement of S.


Similarly to the discussion in Chapter 2 for vectors u and v in R2 (see Fig. 2.2 and Eq. (2.3)) we can get a projection
formula for any inner product h·, ·i. The projection vector of v on u, denoted by proju (v) is the vector

u·v u u·v
 
proju (v) = · = u
||u|| kuk u·u
If we want a vector perpendicular to u we have

u·v
 
w = v − proju (v) = v − u. (7.2)
u·u

as in Eq. (2.3).
Exercise 7.1. Let u and v be vectors in an inner space with inner product h·, ·i. Take

hu, vi
 
w = v− u.
hu, ui
Prove that w is orthogonal to u.
Notice that for any inner product
hu, ui = hu, ui.
Hence hu, ui ∈ R and the following definition makes sense.
Definition 7.2. An inner product is positive definite if the following hold:
i) hu, ui ≥ 0 for all u ∈ V.
ii) hu, ui > 0 if and only if u , 0.

The norm of an element v ∈ V is defined to be p


||v|| := hv, vi.

120
Shaska T. Linear Algebra

Theorem 7.1. Let V be a finite dimensional vector space over k with a positive definite inner product. If W is a subspace of V
then
V = W ⊕ W⊥ .
Moreover,
dim V = dim W + dim W ⊥ .

Proof. Exercise.

Next we study separately vector spaces over R and those over C.

7.1.1 Inner products over real numbers


Notice that in this case the definition of the inner product is a function

hu, vi : V × V −→ R

such that the following properties hold for every u, v, w ∈ V and r ∈ R:

i) hu, vi = hv, ui,

ii) hu, v + wi = hu, vi + hu, wi

iii) hr u, vi = rhu, vi = hu, r vi.

The most common inner product of Euclidean spaces Rn is the dot product

7.1.2 Hermitian products


Let V be a vector space over C. The inner product in this case is called a Hermitian product. It is a function on V
such that
hu, vi : V × V −→ C
such that the following properties hold for every u, v, w ∈ V and r ∈ C:

i) hu, vi = hv, ui,


ii) hu, v + wi = hu, vi + hu, wi
iii) hα u, vi = αhu, vi and hu, α vi = ᾱ hu, vi.

Example 7.5. Let k ⊂ C and V = kn . For any

u1   v1 
   
u   v 
u =   and v =  2 
 2   
. . .  . . . 
un vn )
   

we define
hu, vi = u1 v̄1 + · · · + un v̄n
Show that this is a Hermitian product. This particular product we will call the Euclidean inner product. 
Notice that for the Euclidean inner product h·, ·i

hu, ui = u1 ū1 + · · · + un ūn = ||u1 ||2 + · · · + ||un ||2

The norm of u ∈ V is defined as p p


||u|| = hu, ui = ||u1 ||2 + · · · + ||un ||2

121
Linear Algebra Shaska T.

Example 7.6. Let V be the space of complex continuous functions

f : [0, 1] −→ C

For f, g ∈ V we define
Z 1
h f, gi = f (t) · g(t) dt
0
Using properties of complex integrals show that this is an inner product. 
Example 7.7. (Fourier series) Let V be the space of continuous complex-valued functions

f : [−π, π] −→ C

For f, g ∈ V we define Z π
h f, gi = f (t) · g(t) dt
−π
For any integer n define
fn (t) = en·it .
Verify that:
i) if m , n then h fn , fm i = 0
ii) h fn , fn i = 2π
h f, f i

iii) h fn , fnn i = 2π
1
−π
f (t) e−int dt.

h f, fn i
The quantity h fn , fn i is called the Fourier coefficient with respect to f . 

Exercises:

7.1. Let V = R2 and the inner product is the Euclidean prod- 7.4. Let V := Matn (R). Define the inner product of matrices
uct. As a review of Chapter 2 prove the following for any M and N as
u, v ∈ V. hM, Ni = tr(MN)
i) ||u + v||2 = ||u||2 + ||v||2 Show that this is an inner product and it is non-degenerate.
ii) ||u + v|| ≤ ||u|| + ||v||
iii) ||u|| = 0 if and only if u = 0. 7.5. Prove the Schwartz inequality
iv) |hu, vi| ≤ ||u|| · ||v|| |hu, vi| ≤ ||u|| · ||v||
7.2. Let V be the space of real continuous functions for the Hermitian product.
f : [0, 1] −→ R 7.6. Prove the following for the Hermitian product:
i) ||u|| ≥ 0
For f, g ∈ V we define ii) ||u|| = 0 if and only if u = 0.
Z 1 iii) ||αu|| = |α| ||u||
h f, gi = f (t) · g(t) dt iv) ||u + v|| ≤ ||u|| + ||v||
0
7.7. Let V := Matn (R). Let A, B be any matrices in V such
Given f (x) = x3 , find g(x) ∈ V such that g is orthogonal to f .
that
" # " #
7.3. Let V be the vector space as in the previous exercise and a1 a2 b1 b2
A := , and B :=
W the set all polynomials in V. Is W is a subspace of V? Given a3 a4 b3 b4
a polynomial
Is the following
f (x) = an x + an−1 x + · · · + a1 x + a0 ,
n n−1
hA, Bi = a1 b1 + a2 b2 + a3 b3 + a4 b4
Can you find g(x) ∈ V such that h f, gi = 0 ? an inner product on V?

122
Shaska T. Linear Algebra

7.8. Let P2 denote the space of polynomials in k[x] and degree 7.9. Let P2 be equipped with the inner product as in the above
≤ 2. Let f, g ∈ P2 such that example. Describe all the polynomials of norm 1.

f (x) = a2 x2 + a1 x + a0 , 7.10. Let V := L([0, 1], R) be the space of real continuous


functions on [0, 1] with the inner product
and
1
g(x) = b2 x2 + b1 x + b0 .
Z
h f, gi = f (t) · g(t) dt
Define 0
h f, gi = a0 b0 + a1 b1 + a2 b2 .
Describe the norm associated to this inner product and all
Verify that this is an inner product on P2 . functions of norm 1.

7.2 Orthogonal bases, Gram-Schmidt orthogonalization process


Let V be a finite dimensional vector space over k with an inner product h·, ·i. The norm of an element v ∈ V is defined
to be p
||v|| := hv, vi.
A set {v1 , . . . , vn } of vectors is called an orthogonal if for any i , j we have

hvi , v j i = 0.

If in addition, all vectors have norm one then they are called an orthonormal.
Lemma 7.1. Orthonormal vectors in Rn are linearly independent.
Proof. Exercise 
Exercise 7.2. Is the above Lemma true for any inner space?
Exercise 7.3 (Pythagorean theorem). Consider vectors u, v ∈ Rn . Prove that

||u + v||2 = ||u||2 + ||v||2

holds if and only if u and v are orthogonal. Is this true for in any inner space?
Theorem 7.2. If v1 , . . . , vn are linearly independent then there is an orthogonal set w1 , . . . , wn such that

Span (v1 , . . . , vn ) = Span (w1 , . . . wn )

Proof. Let fix an ordering on B = {v1 , . . . , vn } say

v1 < v2 < · · · < vn .

Take the following set of vectors

w1 = v1
hv2 , w1 i
 
w2 = v2 − w1
hw1 , w1 i
hv3 , w2 i hv3 , w1 i
   
w3 = v3 − w2 − w1 (7.3)
hw2 , w2 i hw1 , w1 i
......
hvi+1 , wi i hvi+1 , w1 i
   
wi+1 = vi+1 − wi − · · · − w1
hwi , wi i hw1 , w1 i
The reader can check that this is the desired orthogonal set. 
Let
B = {v1 , . . . , vn }

123
Linear Algebra Shaska T.

be a basis for V. Then B is called an orthogonal basis if for any i , j we have


hvi , v j i = 0.
If in addition, for all i = 1, . . . , n, ||vi || = 1 then B is called an orthonormal basis.
Then we have the following corollary.
Corollary 7.1. Every finite dimensional inner-product space has an orthogonal basis.
The proof of the above theorem is constructive and provides and algorithm to find an orthogonal space (and
therefore orthonormal) of any inner space.

7.2.1 Gram-Schmidt algorithm


Algorithm 9. Input: A set S = {v1 , . . . , vn } of vectors.
Output: An orthogonal set of vectors W = {w1 , . . . , wn } such that
Span (v1 , . . . , vn ) = Span (w1 , . . . wn )

i) Fix an ordering of the set S, say


v1 , v2 , . . . , vn

ii) Let
w1 := v1

iii) Compute all wi ’s using the recursive formula


w1 = v1
hv2 , w1 i
 
w2 = v2 − w1
hw1 , w1 i
hv3 , w2 i hv3 , w1 i
   
w3 = v3 − w2 − w1 (7.4)
hw2 , w2 i hw1 , w1 i
......
hvi+1 , wi i hvi+1 , w1 i
   
wi+1 = vi+1 − wi − · · · − w1
hwi , wi i hw1 , w1 i
for all i = 1, . . . , n − 1.

iii) The set {w1 , . . . , wn } is the required W

Example 7.8. Let V = R3 and the inner product on V is the dot product . Let
 
1
 
2
v1 =   , v2 = 2
2  
3 1
   

be given. Find an orthogonal basis of Span (v1 , v2 ).

Solution: Let w1 = v1 . Then


     19 
2 2 
hv2 , w1 i   9    14 
 
w2 = v2 − w1 = 2 − 2 =  57 

hw1 , w1 i   14    13 
1 1 − 14
Clearly w1 ⊥ w2 . 

124
Shaska T. Linear Algebra

Example 7.9. Let V be the space of real continuous functions

f : [0, 1] −→ R

For f, g ∈ V we define
Z 1
h f, gi = f (x) · g(x) dx
0
As shown above, this is an inner product. Let
f (x) = x, g(x) = x2
Since both are continuous then f, g ∈ V. Find an orthogonal basis of Span ( f, g).

Solution: Let w1 = f . Then


R 1 
hg, f i  x3 dx  3
0
w2 = g − f = x −  R 1
2 
 x = x2 − x

h f, f i  2
x dx
 4
0
The reader should check whether w1 ⊥ w2 . 
We consider the exercises from section 1. Using the Gram-Schmidt procedure it is quite simple to solve these
exercises.
Example 7.10. Let V be the space of real continuous functions. Given f (x) = x3 , find g(x) ∈ V such that g is orthogonal to f .

Solution: Take
S = { f, 1}
We want to find an orthogonal set W such that f ∈ W. Let w1 = f . Then
R1
h1, f i x3 dx 7
w2 = 1 − f = 1 − R 01 x3 = 1 − x3
h f, f i x6 dx 4
0

The reader can check that h f, w2 i = 0. 

7.2.2 The QR factorization


The orthogonalization process can be represented via a matrix form. Indeed we have the following theorem.
Theorem 7.3. Let M be an n × m matrix with linearly independent vectors v1 , . . . , vm . Then

r11 r12 ... r1m 


 
 0 r22 ... r2m 

M = [v1 | v2 | · · · | vm ] = [w1 | w2 | · · · | wn ]  . .. .. .. 

 .. . . . 

0 0 ... rmm
where w1 , . . . , wm are orthonormal, r11 = 1 and ri j are given by

 ||v j ||, for j = 2, . . . , m


 ⊥

ri j = 

 wi · v j for i < j

Moreover, this representation is unique.


The vectors w1 , . . . , wm are the vectors obtained by the normalization basis of col(M) by the Gram-Schmidt
process. Then each vi can be written as

vi = r1i w1 + r2j w2 + · · · + rii wi ,

for all i = 1, . . . , m.

125
Linear Algebra Shaska T.

Example 7.11. Find a QR factorization of the matrix

1 2 3
 
4 5 6
A = 
7 8 9

1 1 1

Exercises:

7.11. Find an orthogonal basis for the nullspace of the matrix be given. Find an orthogonal basis of Span (v1 , v2 , v3 , v4 , v5 ).
 2 -2 14 
 
A :=  0 3 -7  7.18. Let P2 denote the space of polynomials in k[x] and degree

0 0 2
 
≤ 2. Let f, g ∈ P2 such that
7.12. Find an orthogonal basis for the nullspace of the matrix
f (x) = a2 x2 + a1 x + a0 , and g(x) = b2 x2 + b1 x + b0 .
2 1 1
 
 
 .
 1 2 0 
Define

1 1 3
 
h f, gi = a0 b0 + a1 b1 + a2 b2 .
7.13. Find an orthogonal basis for the nullspace of the matrix
 3 1 0 -1  Let f1 , f2 , f3 , f4 be given as below
 
 4 0 0 3 
A =  
 -4 2 2 -3  f1 = x2 + 3
2 -4 0 7
 
f2 = 1 − x
7.14. Let V be the space of real continuous functions. Given (7.5)
f3 = 2x2 + x + 1
f (x) = x and g(x) = e , find an orthogonal set W = {w1 , w2 }
2 x

such that Span ( f, g) = Span (w1 , w2 ). f4 = x + 1.

7.15. In the space of real continuous functions find a function Find an orthogonal basis of Span ( f1 , f2 , f3 , f4 ).
g(x) which is orthogonal to f (x) = sin x.
7.16. Show that the following identity holds for any inner √
7.19. Find an orthogonal basis for the subspace Span (1, x, x)
product of the vector space C0,1 of continuous functions on [0, 1], where
||u + v|| + ||u − v|| = 2||u|| + 2||v|| R1
h f, gi = 0 f (x)g(x)dx.
7.17. Let V = R4 and the inner product on V is the dot product
. Let
1 2 1 1 0
         
2 0 1 2 0 7.20. Find an orthonormal basis for the plane
v1 =   , v2 =   , v3 =   , v4 =   , v5 =  
         
3 2 1 3 1
4 1 1 4 2 x + 7y − z = 0.
         

126
Shaska T. Linear Algebra

7.3 Orthogonal transformations and orthogonal matrices


Let us start with some examples from the geometry of R2 . Consider the linear map T : R2 → R2 such that
" # " #
x1 x
→ 1 .
x2 2x2
" #
1 0
The corresponding matrix for this map is A = . If we have the ellipse
0 2

x22
x21 + = 1,
4
then it is transformed to the unit circle x21 + x22 = 1. So linear transformations change shapes of objects in R2 .
How should a linear transformation be such that it preserves shapes? Obviously, it has to preserve distances.
This motivates the following:
Definition 7.3. A linear transformation T : Rn → Rn is called orthogonal if it preserves the length:

||T(x)|| = ||x||,

for all x ∈ Rn . The corresponding matrix of an orthogonal map is called an orthogonal matrix.
Proposition 7.1. Orthogonal transformations preserve orthogonality. In other words, if u is orthogonal to v then T(u) is
orthogonal to T(v).

Proof. Using Exercise 7.3 it is enough to show that

||T(u) + T(v)||2 = ||T(u)||2 + ||T(v)||2 .

We have
||T(u) + T(v)||2 = ||T(u + v)||2 = ||(u + v)||2 = ||u||2 + ||v||2 = ||T(u)||2 + ||T(v)||2

Theorem 7.4. i) A linear transformation T : Rn → Rn is if and only if the image of the standard basis is an orthonormal basis.
ii) An n × n matrix is orthogonal if and only if its columns form an orthonormal basis for Rn .

Proof. Let B = {e1 , . . . , en } be the standard basis. Then, by Proposition 7.1 the set

{T(e1 ), . . . , T(en )}

is an orthogonal basis. Since T preserves norms, then it is an orthonormal basis.


An orthogonal matrix is the matrix of an orthogonal transformation T : Rn → Rn . Let B = {v1 , . . . , vn } be an
orthonormal basis for Rn . Then the matrix is given by

[T(v1 ) | T(v2 ) | . . . , |T(vn )]

Since v1 , . . . , vn are orthonormal, then T(v1 ), . . . , T(vn ) are orthonormal. 


Theorem 7.5. A square matrix A is orthogonal if and only if

A−1 = AT .

Proof. Let A be a given orthogonal matrix. From Theorem 7.4 its columns form an orthonormal basis, say

 | | | 
 
A = v1
 v2 ... vn  .
| | |
 

127
Linear Algebra Shaska T.

Then,
T
− v1 − 
 
 v1 · v1 v1 · v2 ··· v1 · vn 
 
− vT2 −  | | |  v2 · v1 v2 · v2 ··· v2 · vn 
  
A A = 
T
..
 v v2 ... vn  =  . .. .. ..  = I,
.
  1
 |   .. . . . 

  | | 
− vTn − vn · v1 vn · v2 ··· vn · vn
since v1 , . . . , vn is an orthonormal set. 
Another property of orthogonal matrices is the following.
Proposition 7.2. Let A be an orthogonal matrix. For all u and v ∈ Rn we have that

u · v = (Au) · (Av)

Proof. Exercise. 
Summarizing all properties of orthogonal matrices we have the following.
Theorem 7.6. Let A be an n × n matrix. Then the following statements are equivalent.
i) A is an orthogonal matrix
ii) The transformation T(x) = nAx preserves length (in other words kAxk = kxk).
iii) The columns of A form an orthonormal basis
iv) AT A = In .
v) A−1 = AT .
vi) A preserves the dot product, in other words u · v = (Au) · (Av).

Proof. Combine all the results proved above to show the equivalence of these statements.

Since we will be using the transpose more often in the coming sections let’s summarize some of its properties.
Proposition 7.3. The following are true:
i) (A + B)T = AT + BT
ii) (rA)T = rAT
iii) AB)T = BT AT
iv) rank AT = rank A
 −1  T
v) AT = A−1 .

Proof. Exercise


7.3.1 Orthogonal projections


We have seen orthogonal projections before. Let us now consider how to find the matrix of an orthogonal projection.
Theorem 7.7. Let V be a subspace of Rn with orthonormal basis {v1 , . . . , vm }. The matrix of the orthogonal projection onto V
is given by
P = QQT ,
where
 | | | 
 
Q = v1
 v2 ... vm  .
| | |
 

Moreover, the matrix of an orthogonal transformation is symmetric.

Proof. Let V be a subspace of Rn with an orthonormal basis u1 , . . . , un . The projection of x onto V is given by
 T 
 −u − 
i  1 
projV (x) = (u1 · x) u1 + · · · + (um · x) um = u1 u1 x + · · · + um um x = u1 |u2 | · · · |um  ...  x = QQT x
h
T T
 T 
−um −

128
Shaska T. Linear Algebra


Let us go back to the case of the line.
Example 7.12. Consider a line L in R2 with equation
L : y = ax + b.
Find the matrix of the orthogonal projection onto L.
Proof. We have noted before that if the line doesn’t pass through the origin, the projection is not even a linear map.
However, let’s just pretend that we don’t even know that.
Notice that a directional vector for L is " # " #
−b/a b −1
u= = .
b a a

Its norm is ||u|| = ba a2 + 1. Hence, {v} is a orthonormal basis for L, where
" #
u 1 −1
v= = √ .
||u|| a2 + 1 a
Notice that there is no b anymore in this vector. That’s because this is a vector not on the original line, but on the
line parallel to L which goes through the origin.
Then from Theorem 7.7 the matrix P is
" #! ! " #
1 −1 1 h i 1 1 −a
P= √ √ −1 a = 2 .
a2 + 1 a a2 + 1 a2 + 1 −a a
Let us check how this will work with the directional vector u ∈ L. We have
−(a + 1)
" # " #! " 2 # " #
1 1 −a b −1 b b −1
Pu = 2 · = = = u,
a(a2 + 1 a(a2 + 1)
2
a + 1 −a a a a a a
as expected. 
We have already seen the above example; see Fig. 1.2.
Exercise 7.4. Find the matrix of the orthogonal projection onto the subspace V of R4 such that V = Span (u, v), where

1  1 
   
1 1 1 −1
u =   and v =  
2 1 2 −1
1 1
Notice that u and v already form an orthonormal basis for V.

Solution: Since the vectors u and v are orthonormal we have

1 1  " 1 0 0 1


   
#
1 1 −1 1 1 1 1 0 1 1 0
P = QQT =  =

4 1 −1 1 −1 −1 1 0 1 1 0
 
1 1 1 0 0 1
  


Exercise 7.5. Given a plane P in R3 going through the origin, say with equation
ax + by + cz = 0.
Find the matrix of the orthogonal map onto P.

Exercises:

129
Linear Algebra Shaska T.

7.21. Check whether the matrix is orthogonal entries in k is a vectors space; see ??. Consider the map
T : Matn×n (k) → Matn×n (k) given by
1 1 −1
 
3 2 −5
  T(A) = AT .
2 2 0
7.22. For a field k the space Matn×n (k) of n × n matrices with Is this a linear map? Prove your answer.

130
Shaska T. Linear Algebra

7.4 The method of least squares


The method of least squares was first discovered by Gauss in the early 1800’s and has been used successively since
then in many areas of mathematics and engineering. Consider the following problem:

Problem: Given a set of data

x x1 x2 x3 x4 ... xn
y y1 y2 y3 y4 ... yn

Table 7.1

Find a linear function y = f (x) that best fits this data.

Geometrically two of these points (xi , yi ) determine a line. However, we are looking for the line that is "closest"
to all the given points. Let us assume that the equation of f (x) is given by
f (x) = ax + b.
Then we have
yi = axi + b, for i = 1, . . . n.
In matrix notation we have
x1 1 ax1 + b
   
x2
 1 " # ax2 + b
 · ·  a

 · 
 

 ·  =  
 ·  b  · 
 · ·   · 
   
xn 1 axn + b
   

or we write this as
A v  ~y,
where
x1 1 ax1 + b
   
x 1 ax + b
 2 " #  2 
· ·  a  · 
   
A =  , v= , ~y =   .
 · ·  b  · 

 · ·   · 
 

axn + b

xn 1
 
# "
a
The problem now becomes to determine v = such that the error vector Av − ~y is minimal. The concept of
b
minimal depends on the type of application. The method of least squares is based on the idea that we require
that the magnitude ||Av − ~y|| is minimal. Denote by d := Av − ~y. Then, di = ax + b − yi . Minimizing ||Av − ~y|| means
minimizing ||Av − ~y||2 , which means minimizing

d21 + d22 + · · · + d2n .


Let v1 and v2 denote the column vectors of A. The vector Av = av1 + bv2 lies in the space W = Span (v1 , v2 ).
We want to find a vector v0 ∈ W such that the dot product Av · (Av0 − ~y) = 0 for all v ∈ W. Then we have

Av · (Av0 − ~y) = (Av)T (Av0 − ~y) = (Av)T Av0 − (Av)T ~y


  (7.6)
= vT AT Av0 − vT AT ~y = vT AT Av0 − AT ~y = 0

for all v ∈ W. Because the dot product is a non-degenerate inner product then

AT Av0 − AT ~y = 0

131
Linear Algebra Shaska T.

and
v0 = (AT A)−1 AT ~y

The matrix
P := (AT A)−1 AT
is sometimes called the projection matrix of A.
Next we provide some examples for different polynomial approximations.
Example 7.13. Let the following data be given

x 1 2 2 5
y 2 3 5 7

Find a linear function that best fits the data.

Solution: Then
 1 1  2 
   

 2 1   3 
A :=   , and b = 
   
 2 1  5 


5 1 7
   

We have " #
34 10
A A=T
10 4
The least squares solution is
" #
1 7
v0 = (AT A)−1 AT ~y =
6 8
Hence, the best fitting line to the above data is
7 4
y = x+
6 3

As we will see in the next example the least squares method has its limitations. As expected not everything in
applications is linear. If we approximate a given data with a linear model then this model might not fit the data
very well. In the next example we see that sometimes such an approximation is not close at all to the data.
Example 7.14. Let the following data be given

x 1 2 3 4 5
y 2 5 4 7 2

Find a linear function that best fits the data.

Solution: Then
1 1 2
   
   

 2 1 


 5 

A := 
 3 1  ,

and ~y =  4 

4 1 7
   
   
   
5 1 2
The least squares solution is
" #
1 1
v0 = (A A) T −1
A ~y =
T
5 17

132
Shaska T. Linear Algebra

Figure 7.1: Fitting of the above data by the least squares method.

Hence, the best fitting line to the above data is


x 17
y= +
5 5
The graph in Fig. Fig. 7.1 presents the graph of the data and of the function.

In the above examples we found a linear function that best fits a given set of data. However, the method of least
squares can be used not only to find linear functions. Next we see how to generalize the method.
Let A be an m × n matrix, v an n × 1 vector and ~y and m × 1 vector. Let a matrix equation

A v  ~y

be given. A least squares solution of the matrix equation A v  ~y is a vector v0 such that

||~y − Av0 || ≤ ||~y − Av||

for all v. #"


a
The problem now becomes to determine v = such that the error vector Av − ~y is minimal. The concept
b
of minimal depends on the type of application. The method of least squares is based on the idea that we require
that the magnitude ||Av − ~y|| be minimal. Denote by d := Av − ~y. Then, di = ax + b − yi . Minimizing ||Av − ~y|| means
minimizing ||Av − ~y||2 , which means minimizing

d21 + d22 + · · · + d2n .

Let v1 and v2 denote the column vectors of A. The vector Av = av1 + bv2 lies in the space W = Span (v1 , v2 ).
We want to find a vector v0 ∈ W such that the dot product Av · (Av0 − ~y) = 0 for all v ∈ W. Then we have

Av · (Av0 − ~y) = (Av)T (Av0 − ~y) = (Av)T Av0 − (Av)T ~y


  (7.7)
= vT AT Av0 − vT AT ~y = vT AT Av0 − AT ~y = 0

for all v ∈ W. Because the dot product is a non-degenerate inner product then

AT Av0 − AT ~y = 0

133
Linear Algebra Shaska T.

and
v0 = (AT A)−1 AT ~y
The matrix
P := (AT A)−1 AT
is called the projection matrix of A. The vector Av0 is called the orthogonal projection of ~y on the column space of
A.

7.4.1 The method of least squares for higher degree polynomials


We consider the same problem as in the previous subsection. However, the approximation we want to use is not
necessarily linear but a degree n polynomial. It is known that if n points are given on the plane then there is always
a degree n polynomial which passes through these points, unless the points are linearly dependent. Thus, for most
applications we have r points and want to find a polynomial of degree n that best fits the data for n < r. Consider
the problem:

Problem: Given a set of data

x x1 x2 x3 x4 ... xr
y y1 y2 y3 y4 ... yr

Table 7.2

Find a degree n polynomial


y = f (x) = an xn + an−1 xn−1 + · · · + a1 x + a0
that best fits this data.

We can write this in a matrix form as follows:

x1 . . . x1 1  an  an x1 + an−1 x1 + · · · + a1 x1 + a0 


 n     n n−1 
xn . . . x
 n−1  an x2 + an−1 x2 + · · · + a1 x2 + a0 
1 a   n n−1 
 2 2
 · · · ·   ·   ·
     
 = 

  
 · · · ·   ·   · 

 · · · ·   ·   ·
     

xn . . . xn 1
 n
an xn + an−1 xn + · · · + a1 xn + a0
  n n−1
a0
 

As previously we denote this as A v = ~y. The least squares solution is

v0 = (AT A)−1 AT ~y

Example 7.15. Let the following data be given as in the previous example.

x 1 2 3 4 5
y 2 5 4 7 2
Find a polynomial of degree 2 that best fits the data.

Solution: Then
 1 1 1 2
   
  
 4 2 1   5 
   
A :=  9
 3 1  ,

and ~y =  4 

 16 4 1 7
   
  
 
25 5 1 2
The least squares solution is

134
Shaska T. Linear Algebra

Figure 7.2: Fitting of the above data by the least squares method.

6
 − 7
 

 
 
v0 = (AT A)−1 AT ~y =  187
 
 35


 
 13 
−5
Hence, the best fitting degree 2 polynomial to the above data is
6 187 13
y = − x2 + x−
7 35 5
The graph of Fig. Fig. 7.2 presents the graph of the data and of the function. Notice how we get a better approximation than in
the linear case. 
Example 7.16. Find degree 3 and 4 polynomials that approximate the data of the previous example.

Solution: For a degree 3 polynomial we have

1 15 53
y = − x3 + x2 − x + 3
3 7 21
The graph is presented in Fig. Fig. 7.3. Compare this with degree 1 and 2 polynomials to see that we get a better fit.
Since we have four points on the plane, then by using a degree 4 polynomial we are able to find a polynomial that will pass
through the points. The least squares method will find this unique solution when it exists. In this case, the degree 4 polynomial
that fits the data is
5 29 235 2 196
y = − x4 + x3 − x + x − 33
6 3 6 3
and the graph is presented in Fig. Fig. 7.4.

The least squares method can be used for many other applications. (At A)−1 exists if A has independent column
vectors. Thus, we have a unique least squares solution if null (A) = 0.
Example 7.17. Find the least squares solution to the system

x1 − x2 = 4



+ 2 + x3 = 3



 3x 1 2x


 3x 1 + 2x 2 − 5x3 = 1
2x1 + x2 − x3 = 3


135
Linear Algebra Shaska T.

Figure 7.3: Fitting of the data by a degree 3 polynomial

Figure 7.4: Fitting of the data by a degree 4 polynomial

Solution: We have

Ax = ~y

where
 1 -1 0  4 
   

 3 2 1   3 
A = 

 ,
 ~y =  .

 3 2 -5   1 
2 1 -1 3
   

The least squares solution is


 617 

 275 

 
v0 = (A A) A ~y = 
t −1 t −21  .
 
 11 
 

 29
275

136
Shaska T. Linear Algebra

Then the orthogonal projection of ~y on the column space of A is the vector Av0 given by
 1142  
  4.152727273 


 275  
   

 179  
  3.254545455 
55

Av0 =   =   .
   
 

 331   1.203636364 
 275  
  
   
 
123 
2.236363636
55

Exercises:

7.23. Let the following data be given 7.28. Find the least squares solution to the system

x 1 2 3 4 5 x1 − 11x2 = 1



y 8 13 18 23 28 3x1 + x3 = 2




Find a linear function that best fits the data.


 x1 + 2x2 = 1
 2x1 + x2 − x3 = 31


7.24. Let the following data be given
and the corresponding orthogonal projection.
x 0 1 3 2 -2
y 2 1 -1 -5 4
7.29. Find the least squares solution to the system
Find a linear function that best fits the data.
5x1 − 12x2 = 4

7.25. Find a degree 2 polynomial that best fits the following 

x1 + 3x2 = −2


data: 
 6x1 + 2x2 = −1


x 0 1 3 2 5
y -2 -1 2 4 2 and the corresponding orthogonal projection.
7.26. Find a degree 3 polynomial that best fits the data:
7.30. Find the least squares solution to the system
x 0 1 3 5 -2
3x1 − x2 + 3x3 = 4

y 2 3 5 0 -4 

 3x1 + 7x2 + x3 = 3



7.27. Find the least squares solution to the system 

 3x1 + 2x2 − x3 = 21
2x1 + x2 − x3 = 4

x1 − x2 = 4
 



1 + 2x2 = 3

3x


 3x1 + 2x2 = 1

 and the corresponding orthogonal projection.

137
Linear Algebra Shaska T.

7.5 Sylvestre’s theorem


Let V be a finite dimensional vector space over R and h·, ·i an inner product on V. By the previous section we can
find an orthogonal basis B = {v1 , . . . , vn } of V. Since the inner product is not necessarily positive definite then hvi , vi i
could be ≤ 0. Denote

ci := hvi , vi i
for i = 1, . . . n. We can reorder the basis B such that

c1 , . . . , cp > 0, cp+1 , . . . , cp+s < 0, cp+s+1 , . . . , cp+s+r = 0,

where p + s + r = n. Sylvester’s theorem says that the numbers p, s, r don’t depend on the choice of the orthogonal
basis B. We normalize the basis as follows. Let
vi , if ci = 0




vi


 √ , if ci > 0


0

vi := 
 ci (7.8)
vi


 √−c , if ci < 0




i

Then the set B0 is a basis of V such that


hvi , vi i = ±1, or 0.
Such basis is called an orthonormal basis of V.
Let
B = {v1 , . . . , vn }
be an orthogonal basis of V such that

c1 , . . . , cp = 1, cp+1 , . . . , cp+s = −1, cp+s+1 , . . . , cp+s+r = 0

where p + s + r = n and ci := hvi , vi i.


Theorem 7.8. The number p, r, s are uniquely determined by the inner product and do not depend on the choice of the orthogonal
basis B.
The integer p (resp., s) is sometimes called the index of positivity (resp., negativity) and the pair (p, s) the
signature of the inner product.

Exercises:

What is the signature of Rn with the usual Euclidean Find the signature of the Euclidean product for W.
inner product?
7.32. Let P2 denote the space of polynomials in k[x] and degree
7.31. Let W be the space generated by ≤ 2. Let f, g ∈ P2 such that
v1 = (1, 2, 3, 4), f (x) = a2 x2 + a1 x + a0 , and g(x) = b2 x2 + b1 x + b0 .
v2 = (2, 0, 2, 1),
v3 = (1, 1, 1, 1), (7.9) Define
h f, gi = a0 b0 + a1 b1 a2 b2 .
v4 = (1, 2, 3, 4)
v5 = (0, 0, 1, 2) Find the signature of this inner product for P2 .

138
Shaska T. Linear Algebra

7.6 The dual space


Let V be a vector space over the field k.
Definition 7.4. The dual space of V is the vector space (over k)
V ? := L(V, k)
of all linear maps L : V −→ k. Elements of the dual space are called functionals.
Example 7.18. Let V = kn . The simple examples of functionals are coordinate functions
φi (x1 , . . . , xn ) = xi
We leave it to the reader to verify that these are functionals. 
Theorem 7.9. Let V be a vector space of finite dimension. Then,
dim V = dim V ?
Proof. Exercise 1. 
The following lemma constructs a basis for V ? .
Lemma 7.2. Let V be a vector space over the field k and V ? its dual space. Let
B = {v1 , . . . , vn }
be a basis for V. For each i = 1, . . . n, we define

φ(vi ) = 1
(
φi := (7.10)
φ(v j ) = 0, for j , i
The functionals {φ1 , . . . , φn } form a basis for V ? .
Proof. Exercise. 
Definition 7.5. The basis {φ1 , . . . , φn } of V ? is called the dual basis.
The dual space is a very important concept in linear algebra. Below we give a few more examples of functionals
which are important in different areas of mathematics.
Example 7.19. Let V be a vector space over k with scalar product h·, ·i. Fix an element u ∈ V. The map

V −→ k
v −→ hv, ui
is a functional. 
Example 7.20. Let V be a vector space of continuous real-valued functions on the interval [0, 1]. Define
δ : V −→ R
such that δ( f ) = f (0). Then δ is a functional called the Dirac functional. 
Theorem 7.10. Let V be a finite dimensional vector space over k with a non-degenerate scalar product. The map
Φ : V −→ V ?
(7.11)
v 7→ Lv
is an isomorphism.
Proof. See for example [Lan87, pg. 128]. 

Exercises:

139
Linear Algebra Shaska T.

7.33. Let V be a vector space of finite dimension. Prove that, 7.34. Let V = Matn×n (R). Describe V ? .
dim V = dim V ? .
7.35. Let V = R2 n. Describe V ? .

7.7 Review exercises


7.36. Let V be the space of real continuous functions be given. Find an orthogonal basis of Span (v1 , v2 , v3 ).
f : [0, 1] −→ R
7.40. Find an orthogonal basis for the nullspace of the matrix
For f, g ∈ V we define
 -1 -3 1
 
Z 1 
A :=  0 3 1
 
h f, gi = f (x) · g(x) dx 
5 2 2
 
0
i) Let n be a fixed integer and
7.41. Find an orthogonal basis for the nullspace of the matrix
f (x) = sin nx, g(x) = cos nx
 2 1 4
 
Find h f, gi. 
 .
 1 2 -1 

4 1 3
 
ii) Find a function perpendicular to f (x) = ex .
7.37. Let V = C2 and h·, ·i the Euclidean inner product in V. 7.42. Find an orthogonal basis for the nullspace of the matrix
Let u, v ∈ V such that
 3 1 7 -1
 
2+i
" # " #
i 
u= , v=  4 1 2 3 
i−1 i+3 A = 
 
 -4 2 2 -3


2 -4 3 7
 
Find hu, vi, ||u||, ||v||.
7.38. Let V := Matn (C). Let M, N be any matrices in V. Show
7.43. Find√ an orthogonal basis for the subspace
that
Span (1, x, x2 ) of the vector space C0,1 of continuous func-
hM, Ni = tr(MN)
tions on [0, 1], where
is an inner product.
Z 1
7.39. Let V = R4 and the inner product on V is the dot product h f, gi = f (x)g(x)dx
. Let 0
1 2 1
     
2 2 1 7.44. Find an orthonormal basis for the plane
v1 =   , v2 =   , v3 =  
     
3 1 1
1 2 1 4x + 3y + 2z = 0.
     

140
Chapter 8

Symmetric matrices

The theory of symmetric matrices is tied closely with the classical theory of quadratic forms. In the first section
we give a brief overview of quadratic forms and how to associate them with matrices. In the next few sections we
study in more detail the symmetric matrices. The main focus of this chapter can be summarized in the problem of
diagonalizing a quadratic form. For example, given and equation

61x2 + 104xy + 22xz + 52y2 + 40yz + 61z2 = 144

show that by a change of basis this equation can be written as

x2 y2 z2
+ + =1
4 9 16
which obviously is a lot easier to recognize its shape.

8.1 Quadratic forms


The theory of n-ary forms is one of the oldest and most beautiful parts of mathematics. In this chapter we give a
brief introduction to this theory and its relation to linear algebra. As before we take the field of scalar to be R.

8.1.1 Binary quadratic forms


A binary quadratic form is a homogenous degree 2 polynomial in two variables, in other words a polynomial of
the form
f (x, y) = ax2 + bxy + cx2
and its discriminant defined as
∆ f = b2 − 4ac
We let the matrices M and v be defined as
b
" # " #
a x
M= b
2 , and v=
2 c y

Then, a binary quadratic form is given by


f = vt Mv
Hence, there is a one-to-one correspondence between the binary quadratic forms and symmetric 2 × 2 matrices. For a given
form f its corresponding matrix will be denoted by M f .
Lemma 8.1. The discriminant ∆ f of f is zero if and only if det M f = 0. Moreover,

∆ f = −4 det M f

141
Linear Algebra Shaska T.

Remark 8.1. There are many authors who define binary forms as

f (x, y) = ax2 + 2b xy + cy2


" #
a b
so that the corresponding matrix is M f = and the discriminant ∆ f = ac − b2 instead of b2 − 4ac. We will stick with the
b c
usual conventions.

Change of coordinates

λ λ2
" #
A change of coordinates is any linear map R2 → R2 for some matrix M = 1 ∈ Mat2 (R),
λ3 λ4

λ λ2 x
" # " #" #
x
→ 1 .
y λ3 λ4 y

Notice that M is not necessarily an invertible matrix. We normally denote f λ1 x + λ2 y, λ3 x + λ4 y by f M (not to be



confused by any power of f ).
Notice that

f λ1 x + λ2 y, λ3 x + λ4 y = (aλ21 + bλ1 λ3 + cλ23 )x2 + (2aλ1 λ2 + bλ1 λ4 + bλ2 λ3 + 2cyλ3 λ4 )xy + (aλ22 + bλ2 λ4 + cλ24 )y2


which has matrix


#t
aλ21 + bλ1 λ3 + cλ23 aλ1 λ2 + cλ3 λ4 + 2b (λ1 λ4 + λ2 λ3 ) λ λ2 λ λ2
" # " " #
= 1 Mf 1
aλ1 λ2 + cλ3 λ4 + 2b (λ1 λ4 + λ2 λ3 ) aλ21 + bλ1 λ3 + cλ23 λ3 λ4 λ3 λ4

So we have  
f λ1 x + λ2 y, λ3 x + λ4 y = vt MT M f M v


Exercise 8.1. If A is a symmetric matrix, then for any matrix B, the matrix BT AB is symmetric.

Two binary quadratic forms f (x, y) and g(x, y) are called equivalent if they are related through an invertible
change of coordinates. In other words, if there exists an invertible matrix M ∈ GL2 (R) such that

f M (x, y) = g(x, y).

So we are ready to answer the natural question: if two binary forms f and g are related by a change of coordinates,
how are their matrices M f and M g related?

Lemma 8.2. Two binary quadratic forms f and g are related through a change of coordinates M ∈ Mat2 (R) if and only if their
corresponding matrices satisfy,
M f = MT M g M.
Two binary forms are equivalent over R if and only if there exists M ∈ GL2 (R) such that

M f = Mt M g M.

Proof. We already proved the first claim. To prove the second claim let us assume that M ∈ GL2 (R) such that

M f = MT M g M.


Two matrices A and B are called congruent over R if there is an invertible M ∈ GL2 (R) such that

A = MT BM.

142
Shaska T. Linear Algebra

Geometry of binary quadratic forms


Let’s see if we can determine the shape of the graph in R2 for the curve with equation

f (x, y) = ax2 + bxy + cy2 = d

for some fixed constant d ∈ R. Can we somehow use the matrix M f to determine the shape of the graph? Without
any loss of generality we can assume that d = 1 by applying the transformation
" # "√ #
x dx
→ √ .
y dy

If the quadratic form would have the shape

f (x, y) = α1 x2 + α2 y2 (8.1)

then this would be much easier to graph. We have seen such graphs from high school. We call binary quadratics as
in Eq. (8.1) diagonal, since their corresponding matrices are diagonal

α1
" #
0
0 α2

So how can we change a binary quadratic to a diagonal quadratic? We know how to diagonalize matrices. So
maybe the same procedure can be used to diagonalize quadratics? Let’s give it a try.
The characteristic polynomial of A is

b2
char (A, x) = x2 − (a + c)x − − ac
4
Its eigenvalues are
p p
a+c (a − c)2 + b2 a+c (a − c)2 + b2
λ1 = + and λ1 = −
2 2 2 2
and their corresponding eigenvectors

√1 √1
   
v1 =  (c−a)+ (c−a) +b  and v2 =  (c−a)− (c−a) +b  .
 2 2
  2 2

1 1

Hence, A is diagonalizable as A = PDP−1 where P

√1 √1
 
P =  (c−a)+ (c−a)2 +b2  .
(c−a)2 +b2 
 
(c−a)−
1 1

Since D is given by
D = P−1 AP
we can make this work if somehow P−1 = PT as in Lemma 8.2. But we know exactly about matrices with this
property, thanks to Theorem 7.5. They are the orthogonal matrices.
So our next challenge becomes to find an orthogonal matrix P and a diagonal matrix D such that A = PDP−1 ,
or in other words to choose the eigenvectors in the diagonalization process such that the transition matrix C is
orthogonal.

Positive definite forms


Let us now recall a few things from high school.

143
Linear Algebra Shaska T.

x α1 α2
f(x) a -a a

Table 8.1: The sign of the quadratic polynomial

Example 8.1. For a given binary quadratic form

f (x, y) = ax2 + bxy + cy2

the sign of f (x, 1) is determined by the following: f (x, 1) has the opposite sign of a in the interval (−α1 , α2 ) and it has the sign
of a everywhere else.

A binary quadratic is called positive definite if f (x, y) > 0 for every x ∈ R2 .

Lemma 8.3. A binary quadratic form


f (x, y) = ax2 + bxy + cy2

is a positive definite quadratic if and only if a > 0 and ∆ f < 0.

Proof. We assume that y , 0 and write


 !2 
1  x x 
f (x, y) = ax + bxy + cy = 2 a
2 2
+ b + c
y y y

let us make the substitution t = xy . Then the sign of f (x, y) is the same as the sign of

g(t) = at2 + bt + c.

From the above discussion, this is always positive if and only if a > 0 and ∆ g = ∆ f < 0.
For values y = 0 we have f (x, 0) = ax2 , so f (x, y) is not positive definite since for x = 0 it is f (0, 0) = 0. This
completes the proof. 

8.1.2 Quadratic forms


Definition 8.1. A quadratic form defined over R is called a function

q : Rn → R

such that
n
X
q(x) = ai,j xi x j , (8.2)
i=0, j=0

where ai,j are coefficients from R.


h i
As for binary quadratics, every quadratic form has an associated matrix Aq given by A = ai,j . Obviously, in
this case we also have
q(x) = xT Aq x

Exercise 8.2. Prove that for a given form q the matrix Aq is unique.

Exercise 8.3. Prove that the set Qn of all quadratic forms over R forms a subspace in the space of all functions from Rn to R.
What is the dimension of this space?

144
Shaska T. Linear Algebra

Binary forms are the simplest of all quadratic forms. Quadratic forms

q : R3 → R

are called ternary forms. A ternary form is given by

q(x) = a1,1 x21 + a1,2 x1 x2 + a1,3 x1 x3 + a2,2 x22 + a2,3 x2 x3 + a3,3 x23

and has coefficient matrix A = [ai,j ].

Consider the ternary form

x2 y2 z2 z
q(x, y, z) = + + ,
a2 b2 a2 c
which has corresponding matrix
1  y
 a2 0 0 
0 b
Aq =  0 1 a
 
b2
0 
 1
0 0 c2

From Calculus we know that the equation of x


Figure 8.1: Ellipsoid
q(x, y, z) = 1

in R3 is an ellipsoid as shown in Fig. 8.1.

8.1.3 Polynomials versus forms


Obviously quadratic forms are multivariable polynomials of total degree two. In general, a form of degree d is a
homogenous polynomial of total degree d.

Exercises:

8.1. Prove that there is a one to one correspondence between 8.2. Let
the set of positive definite quadratic forms and the upper half q(x, y) = a x2 + 2b xy + c y2
plane
be a quadratic form. From methods of multivariable calcu-
lus determine the global extrema of this function. Can you
H = {z ∈ C | <(z) > 0} accomplish this via linear algebra methods?

8.2 Symmetric matrices


From Section 7.3 we recall that a matrix S is called orthogonal if its corresponding linear map x → Sx preserves
lengths, for all x ∈ Rn . The following are true for orthogonal matrices.
Lemma 8.4. The following are equivalent:
i) S is orthogonal
ii) ST = S−1
iii) Columns of S form an orthonormal basis for Rn
The reader should check Section 7.3 for details. A matrix A is called orthogonally diagonalizable if there if
there is an orthogonal matrix S and a diagonal matrix D such that

A = ST DS

145
Linear Algebra Shaska T.

Expressing a matrix A in the above form would be beneficial for obvious reasons, not only we change the base of
the vector space such that A becomes a diagonal matrix, but we do so preserving distances. The natural question
is, which matrices are orthogonally diagonalizable? We will answer this question in the remaining of this lecture.
Lemma 8.5. If A is orthogonally diagonalizable then AT = A.
Proof. If A is orthogonally diagonalizable then it exists an orthogonal matrix S ∈ GLn (R) such that
A = SDST ,
for some diagonal matrix D. Then,
AT = (SDST )T = (ST )T · DT · ST = A.
Hence, AT = A.

" #
1 2
Example 8.2. Let A = . Find S orthogonal such that ST AS is orthogonal.
2 −2

Solution: Since for an orthogonal matrix S we have ST = S−1 , then we are looking for a matrix such that S−1 AS is
diagonal. We follow the same method as in Section 5.3. The characteristic polynomial is
char (A, λ) = (1 − λ)(−2 − λ) − 4 = λ2 + λ − 6 = (λ − 2)(λ + 3)
For λ = 2 we have " #!
2
E2 = Span
1
and for λ = −3 we have " #!
−1
E−3 = Span
2
Then the matrices S and D are " # " #
2 −1 2 0
S= , and D=
1 2 0 −3
The matrix S is not orthogonal, since its columns do not form an orthonormal basis for R2 . We can fix this by taking
" #
1 2 −1
S= √ .
5 1 2

Next, we see how to do this in general.
Theorem 8.1. Let v1 and v2 be eigenvectors of a symmetric matrix A belonging to distinct eigenvalues λ1 and λ2 . Then, v1
and v2 are orthogonal.
Proof. Consider the product vT1 Av2 . Then we have,

vT1 Av2 = vT1 (λ2 v2 ) = λ2 (v1 · v2 )


Also,
vT1 Av2 = vT1 AT v2 = (Av1 )T v2 = (λ1 v1 )T v2 = λ1 (v1 · v2 )
Hence, we have
λ1 (v1 · v2 ) = λ2 (v1 · v2 )
Thus,
(λ1 − λ2 )(v1 · v2 ) = 0,
which implies that
v1 · v2 = 0 and therefore v1 is orthogonal to v2 .

The above theorem shows that symmetric matrices are special. Indeed, it gets even better.

146
Shaska T. Linear Algebra

Theorem 8.2. If A is a symmetric matrix, then all its eigenvalues are real.

Proof. Since complex eigenvalues occur in pairs via the conjugate, consider such a pair α ± iβ and the corresponding
eigenvectors v ± iw, respectively. Note that

(v + iw)T (v − iw) = ||v||2 + ||w||2 ,

see Exercise ??. Then, we have

(v + iw)T A(v − iw) = (v + iw)T (α − iβ)(v − iw) = (α − iβ)(||v||2 + ||w||2 )

Also,
(v + iw)T A(v − iw) = (A(v + iw))T (v − iw) = (α + iβ)(v + iw)T (v − iw) = (α + iβ) (||v||2 + ||w||2 )
Hence, α + iβ = α − iβ and we are done. 
Exercise 8.4. Prove the above theorem using the definition of the dot product for vectors spaces over C.
Next, we consider the main result of this lecture, the so called spectral theorem.
Theorem 8.3 (Spectral theorem). A matrix A is orthogonally diagonalizable if and only if A is symmetric.

Proof. Let A be an n × n matrix. We will prove the theorem using induction on n.


For n = 1, we take S = D = [1] and the claim is obviously true. Now assume that the theorem is true for n.
To be completed ...


Algorithm 10. Orthogonal diagonalization of a symmetric matrix A

Input: A symmetric matrix A


Output: An orthogonal matrix S and a diagonal matrix D such that A = ST DS.
Step 1 Compute all eigenvalues
λ1 , . . . , λr ,
and their multiplicities
Step 2 For each eigenvalue λi determine an orthonormal basis

B = {vi,1 , . . . , vi,si }

Step 2 The matrix h i


S = v1,1 | . . . |v1,s1 |v2,1 | . . . |v2,s2 | . . . |vr,sr
is the desired orthogonal matrix and the matrix of the eigenvalues is the matrix D.

147
Linear Algebra Shaska T.

8.2.1 Diagonalizing a quadratic form


For any given form q(x), the corresponding matrix A is a symmetric matrix. Since A is symmetric, then it can be
orthogonally diagonalized, say
A = QT DQ,
for some orthogonal matrix S and a diagonal matrix D. Let v = Qx. Then we have the following result

Theorem 8.4 (Principal Axes Theorem). Let q(x) be a quadratic form and A its corresponding matrix with

A = QDQT

its orthogonal diagonalization. Then


q (Qx) = λ1 v21 + · · · + λn v2n ,
where λ1 , . . . , λn are eigenvalues of A.

Proof. Let q(x) = xT Ax. By the Spectral Theorem there exist matrices Q and D such that Q is orthogonal and D is
diagonal with eigenvalues of A as entries in the main diagonal and

A = QT DQ.

Then we have
 −1
D = Q−1 A QT = QT AQ.

Let us now compute q(Qx),

q (Qx) = (Qx)T A (Qx) = xT (QT AQ) x = xT D x = λ1 x21 + · · · + λn x2n .

This completes the proof.



The eigenvectors v1 , . . . , vn are called the principal axes. Let us consider a few examples.

Example 8.3. Find a change of coordinates that transforms the quadratic


form
q(x, y) = 5x2 + 4xy + 2y2 ,
into a diagonal form. Sketch the graph of the curve q(x) = 1 before and after
the diagonalizing it.
" #
5 2
Proof. The corresponding matrix is A = .
2 2
Its eigenvalues are λ1 = 6 and λ2 = 1 and the corresponding unit
eigenvectors " # " #
1 2 1 1
v1 = √ and v2 √
5 1 5 −2
So the matrix for the coordinate change is
" #
1 2 1
Q= √
5 1 −1
If we check the change of coordinates we have

2x + y x − 2y
!
q (Qx) =q √ , √ = 6x2 + y2 ,
5 5
Figure 8.2: The ellipse after the transformation
as expected. 

148
Shaska T. Linear Algebra

Example 8.4. Graph the equation

q(x, y) = −7x2 − 12xy + 2y2 = 1

Diagonalize the quadratic form q(x) and graph the equation again.
" #
−7 −6
Proof. The corresponding matrix is A = .
−6 2
Its eigenvalues are λ1 = 5 and λ2 = −10 and the corresponding unit
eigenvectors " 1# " #
−2 2
v1 = and v2
1 1
Normalizing them we have
" # " #
1 −1 1 2
v1 = √ , v2 = √ .
5 2 5 1
So the matrix for the coordinate change is
"
1 −1 2
# Figure 8.3: The hyperbola after the transfor-
Q= √ mation
5 2 1
If we check the change of coordinates we have

1
 
q (Qx) = q − x + 2y, x + y = 5x2 − 10y2 .
2
The red graph is the initial one and the blue graph is the graph of the quadratic in the diagonal form. 
Let us see another example.
Exercise 8.5. Let T : R2 → R2 be a linear map such that

T(x) = Ax,

for A a 2 × 2 invertible symmetric matrix. Show that the unit circle is mapped to an ellipse under T. Find the lengths of the
semi-major and the semi-minor axis of the ellipse in terms of the eigenvalues of A.

Solution: Since A is invertible, then its eigenvalues λ1 , λ2 are nonzero and real. Assume that |λ1 | ≥ |λ2 |. We denote
by v1 , v2 the
" corresponding
# orthonormal eigenbasis.
x
Let x = be a vector on the unit circle. Then,
y

x = v1 cos θ + v2 sin θ

Then,
T(x) = cos θ · T(v1 ) + sin θ · T(v2 ) = cos θ · (λ1 v1 ) + sin θ · (λ2 v2 )
which is on the ellipse with semi-major axis ||λ1 v1 || = |λ1 | and semi-minor axis ||λ2 v2 || = |λ2 |.

Notice that A is orthogonally decomposed as

. . 2 (5 − 17) . .
" #"1 #" #
0√
A=
. . . .
2 (5 + 17)
1
0
" #
2 2
Example 8.5. The unit circe under the transformation x → Ax, where A = is transformed to the ellipse as shown
2 3
Fig. 8.4.

149
Linear Algebra Shaska T.

Figure 8.4: Mapping the unit circle to an ellipse

8.2.2 Binary quadratic forms, conics


Let
F(x, y) = ax2 + bxy + cy2 ,
be a binary quadratic and A is associated matrix. The following lemma determines the shape of the equation
F(x, y) = d, for any constant d ∈ R∗ .
Corollary 8.1. Let C be the curve in R2 given by the equation

f (x, y) = ax2 + bxy + cy2 = 1


" #
a b/2
and A = its corresponding matrix. If both eigenvalues of A are positive then C is an ellipse, if they have different
b/2 c
signs then C is a hyperbola.

Proof. The proof is straight forward. The binary form is equivalent to

f Q = λ1 x2 + λ2 y2 .

From high school we know that the corresponding graph is an ellipse if λ1 and λ2 have the same sign and a
hyperbola if they have different signs. 
Example 8.6. Find the shape of the equation

q(x) = x2 + 18xy + 6y2 = 2

Proof. The matrix for q(x) is " #


1 9
A=
9 6
Since its eigenvalues are λ1 = 1 and λ2 = −3 and they have different signs then the shape is a hyperbola. 

8.2.3 Ternary forms, quadratic surfaces in R3


Next, we focus on quadratic forms in R3 . Consider the following problem:
Example 8.7. Given a quadratic form

F(x, y, z) = ax2 + by2 + cz2 + 2dxy + 2exz + 2gyz

Determine linear substitutions for x, y, z such that the monomials xy, xz, yz disappear.

150
Shaska T. Linear Algebra

A ternary quadratic form is a second degree polynomial equation in three variables x, y, z which has the form

F(x, y, z) = ax2 + by2 + cz2 + 2dxy + 2exz + 2 f yz

where coefficients a through f are real numbers. Consider the curve

F(x, y, z) = h.

Then this equation can be written in the form


xt Ax = f
where 
 x 
 
 a d e


x =  y  , and A =  d b f
   

z e f c
   

A is called the matrix associated with the quadratic form F(x, y, z). Sometimes it is useful to rotate the xy-axis such
that the equation of the above curve does not have the terms xy, yz, xz. Such quadratic forms are called diagonal
quadratic forms. This would be equivalent to asking that the associated matrix be diagonal.
Example 8.8. Let the quadratic form q(x, y, z) be given as below

q(x, y, z) = 2x2 + 3y2 + 3z2 − 2xy − 2xz − 2yz.

Determine its diagonal form.


Proof. The matrix corresponding to q(x, y, z) is

−1 −1

 2

A = −1 3 −1

−1 −1 3
 

The characteristic polynomial is


char (A, x) = (x − 4)(x2 − 4x + 2)
and the eigenvalues
√ √
λ1 = 4, λ2 = 2 − 2, λ3 = 2 + 2.
The diagonal form is
√ √
q(Qx) = 4x2 + (2 − 2)y2 + (2 + 2)z2 .

The inertia of A, denoted in(A), is defined as the triple

in(A) := (n1 , n2 , n3 )

where ni , i = 1, 2, 3 denotes the number of positive, negative, and zero eigenvalues of A respectively.
Lemma 8.6. Let F(x, y, z) be a ternary quadratic form and A its associated matrix. The following are true:

i) If in(A) = (3, 0, 0) then the quadratic is an ellipsoid.


ii) If in(A) = (2, 0, 1) then the quadratic is an elliptic paraboloid.
iii) If in(A) = (2, 1, 0) then the quadratic is a hyperboloid of one sheet.
iv) If in(A) = (1, 2, 0) then the quadratic is a hyperboloid of two sheets.
v) If in(A) = (1, 1, 1) then the quadratic is a hyperbolic paraboloid.
vi) If in(A) = (1, 0, 2) then the quadratic is a parabolic cylinder.

151
Linear Algebra Shaska T.

Example 8.9. Identify the quadratic surface with equation

5x2 + 16xy + 11y2 + 20xz − 4yz + 2z2 = 36.

Find its diagonal form.


Proof. The corresponding matrix is

 5 8 10 

A =  8 11 −2

10 −2 2
 

with
char (A, x) = (x2 − 81)(x − 18).
So the eigenvalues are
λ1 = 18, λ2 = 9, λ3 = −9.
From the Lemma 8.6 we already know that this is a hyperboloid with one sheet.
The normalized eigenvectors are:
 
2
 
1
 
2
1 1   1  
v1 =   , v2 = −2 , v3 = −1
2
3 1
  32 3 −2

Hence the orthogonal matrix Q is



2 1 2 

1
Q= 2 −2 −1
3 
1 2 −2

Then the change of coordinates is


2  x  2x + y + z 

2 1
   
1
Qx = −2 −1  y = 2x − 2y − z
2    

3 
1 2 −2 z
  
x + 2y − 2z

Then the diagonalized form is

q(Qx) = q(2x + y + z, 2x − 2y − z, x + 2y − 2z) = 18x2 + 9y2 − 9z2 = 36.

The equation of the surface


x2 y2 z2
+ − = 1.
2 4 4


8.2.4 Graphing quadratics equations in R2 and R3


The above techniques gave us everything about the quadratic forms q(x). But not every quadratic equation is a
quadratic form. Let us look first at the following example.
Example 8.10. Graph the equation
9x² + 4y² − 18x − 16y = 11

Solution: We notice that our previous techniques do not apply directly, since this is not a quadratic form (it contains linear terms). However, we can group all the x terms together and all the y terms together as follows:

(9x² − 18x) + (4y² − 16y) = 11,

which, after completing the square in each group, can be written as

9(x − 1)² + 4(y − 2)² = 36.

152
Shaska T. Linear Algebra

By substituting X = x − 1 and Y = y − 2 we get

X²/4 + Y²/9 = 1,

which is an ellipse that we know how to graph.
Can the above technique be used in general? Not if the equation has mixed terms. However, we can first collect all the degree 2 terms and consider that part as a quadratic form q(x). We can diagonalize that form, and the new form will not have mixed terms. Then we can separate the variables and complete the squares.
Let us see another example.
Example 8.11. Graph the quadratic equation
x² + 2xy + 2xz + y² − 2yz + z² + 2x + 6z = −2

Solution: We first consider the quadratic form

q(x) = x² + 2xy + 2xz + y² − 2yz + z².

The matrix associated to q(x) is

A = [ 1   1   1 ]
    [ 1   1  −1 ]
    [ 1  −1   1 ]

with characteristic polynomial


char(A, x) = (x + 1)(x − 2)²
and eigenvalues
λ1 = −1, and λ2 = 2.
The corresponding eigenspaces are

E1 = Span{ (−1, 1, 1)^T }   and   E2 = Span{ (1, 0, 1)^T, (1, 1, 0)^T }.

Thus, we can take orthonormal bases {v1} and {v2, v3} as follows:

v1 = (1/√3) (−1, 1, 1)^T,   v2 = (1/√2) (1, 0, 1)^T,   v3 = (1/√6) (1, 2, −1)^T.

The change of coordinates is (x, y, z)^T = Q (X, Y, Z)^T, where

Q = [v1 | v2 | v3] = [ −1/√3   1/√2    1/√6 ]
                     [  1/√3   0       2/√6 ]
                     [  1/√3   1/√2   −1/√6 ]

Applying this substitution to the initial quadratic equation (and renaming X, Y, Z back to x, y, z) we have

−x² + 2y² + 2z² + (4√3/3) x + 4√2 y − (2√6/3) z = −2.
Hence, now there are no mixed terms, and grouping by variable gives

− ( x² − (4√3/3) x ) + 2 ( y² + 2√2 y ) + 2 ( z² − (√6/3) z ) = −2,

which, after completing the squares, becomes

− ( x − 2√3/3 )² + 2 ( y + √2 )² + 2 ( z − √6/6 )² = 1.
The equation now has no mixed terms; the translation X = x − 2√3/3, Y = y + √2, Z = z − √6/6 puts it in the standard form −X² + 2Y² + 2Z² = 1, a hyperboloid of one sheet.
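The whole procedure of this example, rotate to remove the mixed terms and then transform the linear part, can be sketched numerically as follows (NumPy; the eigenvectors chosen by eigh may differ from the ones above by order and sign, so the printed linear coefficients need not match the text literally):

import numpy as np

# Quadratic part q(x) = x^2 + 2xy + 2xz + y^2 - 2yz + z^2 and linear part 2x + 6z
A = np.array([[1.0,  1.0,  1.0],
              [1.0,  1.0, -1.0],
              [1.0, -1.0,  1.0]])
b = np.array([2.0, 0.0, 6.0])

eigvals, Q = np.linalg.eigh(A)   # the substitution x = Q x' diagonalizes the quadratic part
print(eigvals)                   # approx [-1, 2, 2]: the coefficients of the squares
print(Q.T @ b)                   # the linear coefficients in the new coordinates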

Exercises:


8.3. Find an orthogonal matrix S and a diagonal matrix D such that A = S^T D S, where

A = [ 3  2 ]
    [ 2  3 ]

8.4. Find an orthogonal matrix S and a diagonal matrix D such that A = S^T D S, where

A = [ 3   3 ]
    [ 3  −5 ]

8.5. Find an orthogonal matrix S and a diagonal matrix D such that A = S^T D S, where

A = [ 0  0  3 ]
    [ 0  2  0 ]
    [ 3  0  0 ]

8.6. Find an orthogonal matrix S and a diagonal matrix D such that A = S^T D S, where

A = [  1  −2   2 ]
    [ −2   4  −4 ]
    [  2  −4   4 ]

8.7. Find an orthogonal matrix S and a diagonal matrix D such that A = S^T D S, where

A = [ 1  0  1 ]
    [ 0  1  0 ]
    [ 1  0  1 ]

8.8. Prove that the algebraic multiplicities equal the geometric multiplicities for all the eigenvalues of the following matrix

A = [ 1  1  1  1  1 ]
    [ 1  1  1  1  1 ]
    [ 1  1  1  1  1 ]
    [ 1  1  1  1  1 ]
    [ 1  1  1  1  1 ]

What is the kernel of A?

8.9. Find all the eigenvalues and their multiplicities of the following matrix

A = [ 3  1  1  1  1 ]
    [ 1  3  1  1  1 ]
    [ 1  1  3  1  1 ]
    [ 1  1  1  3  1 ]
    [ 1  1  1  1  3 ]

8.10. Let the unit sphere in R^3 with equation

x² + y² + z² = 1

be given. Using the method of the previous exercise, classify it according to the above list.

8.11. Classify the quadratic surface

2x² + 4y² − 5z² + 3xy − 2xz + 4yz = 2.

8.12. Classify the quadratic surface

x² + y² − z² + 3xy − 5xz + 4yz = 1.

8.13. Classify the quadratic surface

x² + y² + z² = 1.


8.3 Positive definite matrices


Let q(x) be a quadratic form on R^n. We say q(x) is positive definite if q(x) > 0 for all nonzero x ∈ R^n. We say that the quadratic form is positive semidefinite if q(x) ≥ 0 for all x ∈ R^n.
A symmetric matrix A is called positive definite if the quadratic form q(x) = x^T A x is positive definite, and positive semidefinite if q(x) = x^T A x is positive semidefinite.
A quadratic form q(x) is called indefinite if it takes both positive and negative values. A symmetric matrix A is called indefinite if the quadratic form q(x) = x^T A x is indefinite.
The following lemma is an easy exercise.
Lemma 8.7. Let A ∈ Matn×m (R) and

q(x) := ||Ax||²,   for x ∈ R^m.

Then q(x) is positive semidefinite. Moreover, q(x) is positive definite if and only if nullspace(A) = {0}.

Proof. Exercise. 
Theorem 8.5. A symmetric matrix A is positive definite if and only if all of its eigenvalues are positive. A is positive semidefinite if and only if all of its eigenvalues are positive or zero.

Proof. The proof is rather straightforward. If λ1 , . . . , λn are the eigenvalues of A, then in its diagonal form

q(x) = x^T A x = λ1 x1² + · · · + λn xn².

The rest follows.




8.3.1 Principal matrices


Let A be an n × n matrix. The principal matrices of A, denoted by

A(1), A(2), . . . , A(n),

are the matrices obtained by deleting all rows and columns of A with index greater than i; that is, A(i) is the upper-left i × i block of A. We have the following:
Theorem 8.6. A symmetric n × n matrix A is positive definite if and only if det A(m) > 0 for all principal submatrices A(m) ,
m = 1, . . . , n.

Proof. To come soon. 


Example 8.12. Let A be the symmetric matrix given by

A = [ 1  1  2 ]
    [ 1  4  0 ]
    [ 2  0  8 ]

Indeed we have

A(1) = [1],   A(2) = [ 1  1 ]
                     [ 1  4 ]

Then,

det A(1) = 1,   det A(2) = 3,   det A(3) = 8.

Therefore, A is positive definite.
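The criterion of Theorem 8.6 is straightforward to check by computer. The following sketch (NumPy; the helper name is ours) tests a sample symmetric matrix by computing its leading principal minors:

import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Check that every leading principal minor of the symmetric matrix A is positive."""
    n = A.shape[0]
    return all(np.linalg.det(A[:m, :m]) > tol for m in range(1, n + 1))

A = np.array([[1.0, 1.0, 2.0],
              [1.0, 4.0, 0.0],
              [2.0, 0.0, 8.0]])
print(is_positive_definite(A))   # True: the leading principal minors are 1, 3, 8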


Theorem 8.7 (Min-Max Theorem). Let q(x) = xT Ax be a quadratic form and

λ1 ≥ λ2 ≥ · · · ≥ λn

the eigenvalues of A. On the domain ||x|| = 1 the following are true:

i) λ1 ≥ q(x) ≥ λn.
ii) q(x) attains its maximum value when x is a unit eigenvector corresponding to λ1 . This maximum value is q(x) = λ1 .
iii) q(x) attains its minimum value when x is a unit eigenvector corresponding to λn . This minimum value is q(x) = λn .

Proof. Exercise 
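The theorem can be illustrated numerically by sampling random unit vectors and comparing q(x) with the extreme eigenvalues. The sketch below (NumPy; it reuses the matrix of Example 8.8) is only a check, not a proof:

import numpy as np

A = np.array([[ 2.0, -1.0, -1.0],
              [-1.0,  3.0, -1.0],
              [-1.0, -1.0,  3.0]])
w = np.linalg.eigvalsh(A)                 # ascending: w[0] = lambda_min, w[-1] = lambda_max

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 10000))
X /= np.linalg.norm(X, axis=0)            # 10000 random unit vectors, one per column
q = np.einsum('ij,ij->j', X, A @ X)       # q(x) = x^T A x for every column x
print(w[0] <= q.min(), q.max() <= w[-1])  # True True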

Exercises:


8.4 Singular values


Consider a transformation T : Rn → Rm. We saw that many linear transformations were projections, rotations, reflections, or combinations of those. Can we express every transformation as a composition of simpler transformations? If so, what would happen to the corresponding matrix of this transformation? This section barely touches on this very important topic.
Let A be an n × m matrix. Then AT A is an m × m symmetric matrix. For any symmetric matrix we can ask if it is
positive definite, positive semidefinite, or indefinite.
For our matrix AT A we have the following:

Lemma 8.8. For any matrix A, the matrix A^T A is positive semidefinite.

Proof. Exercise.

Hence all eigenvalues of A^T A are positive or zero. Let us assume they are

λ1 ≥ λ2 ≥ · · · ≥ λm ≥ 0

We denote

σi = √λi ,   i = 1, . . . , m.

The singular values of A are the square roots of the eigenvalues of the m × m matrix AT A. Usually we write the
singular values σ1 , . . . , σm of a matrix in decreasing order

σ1 ≥ · · · ≥ σm ≥ 0

Example 8.13. Let A be given by

A = [ 1   1   1 ]
    [ 1   1  −1 ]
    [ 1  −1   1 ]

Prove that its singular values are

σ1 = 2,   σ2 = 2,   σ3 = 1.
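A numerical check of this example (a NumPy sketch): the square roots of the eigenvalues of A^T A coincide with the singular values returned by np.linalg.svd.

import numpy as np

A = np.array([[1.0,  1.0,  1.0],
              [1.0,  1.0, -1.0],
              [1.0, -1.0,  1.0]])

eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]   # eigenvalues of A^T A in decreasing order
print(np.sqrt(eigvals))                        # approx [2, 2, 1]
print(np.linalg.svd(A, compute_uv=False))      # the same values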

The following theorem shows that the number of singular values equal to zero is invariant under any change of basis.

Theorem 8.8 (Singular values and rank). If A is an n × m matrix of rank r, then the singular values σ1 , . . . , σr are nonzero
and
σr+1 = · · · = σm = 0.

Proof. We start by illustrating the statement with some examples.

Example 8.14. Consider the matrix

A = [ 2  0  0  1 ]
    [ 0  1  0  0 ]
    [ 0  0  1  0 ]
    [ 1  0  0  2 ]

Find its singular values.


Solution: We have

A^T A = [ 5  0  0  4 ]
        [ 0  1  0  0 ]
        [ 0  0  1  0 ]
        [ 4  0  0  5 ]

Then,

char(A^T A, x) = (x − 9)(x − 1)³,

so the eigenvalues are λ1 = 9 and λ2 = 1 (with multiplicity 3), and the singular values are

σ1 = 3,   σ2 = σ3 = σ4 = 1.

Next we take a non-symmetric matrix.

Example 8.15. Consider the matrix

A = [ 2  1  −1  1 ]
    [ 1  1   0  3 ]
    [ 1  0   1  0 ]
    [ 1  0  −3  2 ]

Find its singular values.

Solution: We have

A^T A = [  7   3  −4   7 ]
        [  3   2  −1   4 ]
        [ −4  −1  11  −7 ]
        [  7   4  −7  14 ]

Then,

char(A^T A, x) = x⁴ − 34x³ + 253x² − 508x + 144.

This polynomial is irreducible. We can find its roots numerically; they are approximately

λ1 ≈ 24.51545253,   λ2 ≈ 6.450415601,   λ3 ≈ 2.696419097,   λ4 ≈ 0.3377127726,

and the singular values are

σ1 ≈ 4.95130816342049,   σ2 ≈ 2.53976683989367,   σ3 ≈ 1.64207767698920,   σ4 ≈ 0.581130598536912.

8.4.1 Singular value decomposition


Let us get back to our original question. Can we write our linear maps as some kind of composition of simpler linear maps? The next theorem addresses that question.
Theorem 8.9 (Singular Value Decomposition). Any n × m matrix A of rank rank (A) = r can be written as

A = UΣV T

where U is an orthogonal n × n matrix, V is an orthogonal m × m matrix, and Σ is an n × m matrix whose first r diagonal
entries are the nonzero singular values σ1 , . . . , σr of A, while all the other values are zero.


Proof. To come soon ...



From the proof of the theorem we can devise a procedure for computing such a singular value decomposition.
Let rank (A) = r and let

σ1 ≥ σ2 ≥ · · · ≥ σr

be the non-zero singular values. Choose an orthonormal basis v1, v2, . . . , vm of R^m consisting of eigenvectors of A^T A, ordered so that vi corresponds to σi. Let

u1 = (1/σ1) Av1,   u2 = (1/σ2) Av2,   . . . ,   ur = (1/σr) Avr,

and, if r < n, extend u1, . . . , ur to an orthonormal basis u1, . . . , un of R^n. Then take

V = [v1 | v2 | · · · | vm],   U = [u1 | u2 | · · · | un],

and let Σ be the n × m matrix whose first r diagonal entries are σ1, . . . , σr and whose remaining entries are zero.
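The procedure above can be carried out mechanically. The sketch below (NumPy; the function name is ours, and it is written only for a square matrix with no zero singular values, so it does not handle the rank-deficient case) builds U, Σ and V and checks the factorization on the matrix of Example 8.16 below:

import numpy as np

def svd_from_eigendecomposition(A):
    """Build A = U Sigma V^T from an orthonormal eigenbasis of A^T A (square, full-rank case)."""
    eigvals, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(eigvals)[::-1]          # put the singular values in decreasing order
    eigvals, V = eigvals[order], V[:, order]
    sigma = np.sqrt(eigvals)
    U = (A @ V) / sigma                        # u_i = (1/sigma_i) A v_i, column by column
    return U, np.diag(sigma), V

A = np.array([[ 6.0, 2.0],
              [-7.0, 6.0]])
U, S, V = svd_from_eigendecomposition(A)
print(np.round(S, 6))                          # diag(10, 5)
print(np.allclose(A, U @ S @ V.T))             # True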
Next we see some examples.
Example 8.16. Find a singular value decomposition for

A = [  6  2 ]
    [ −7  6 ]

Solution: We have

A^T A = [  85  −30 ]
        [ −30   40 ]
with characteristic polynomial

char(A^T A, x) = x² − 125x + 2500 = (x − 100)(x − 25).

The eigenvalues are

λ1 = 100 ≥ λ2 = 25

and the singular values are

σ1 = 10 ≥ σ2 = 5.

The normalized eigenvectors are

v1 = (1/√5) (2, −1)^T   and   v2 = (1/√5) (1, 2)^T,

so

V = (1/√5) [  2  1 ]
           [ −1  2 ]

Then we have

u1 = (1/σ1) Av1 = (1/√5) (1, −2)^T

and

u2 = (1/σ2) Av2 = (1/√5) (2, 1)^T.


Hence

U = (1/√5) [  1  2 ]
           [ −2  1 ]

Finally,

Σ = [ 10  0 ]
    [  0  5 ]

You should verify that A = UΣV^T.
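This verification takes one line in NumPy (a sketch using the matrices computed above):

import numpy as np

U = np.array([[ 1.0, 2.0],
              [-2.0, 1.0]]) / np.sqrt(5)
S = np.diag([10.0, 5.0])
V = np.array([[ 2.0, 1.0],
              [-1.0, 2.0]]) / np.sqrt(5)

A = np.array([[ 6.0, 2.0],
              [-7.0, 6.0]])
print(np.allclose(U @ S @ V.T, A))   # True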


Example 8.17. Find the singular value decomposition of the matrix

A = [ 1   1   1 ]
    [ 1   1  −1 ]
    [ 1  −1   1 ]

Example 8.18. Orthogonally diagonalize the matrix

A = [ 2  0  0  1 ]
    [ 0  1  0  0 ]
    [ 0  0  1  0 ]
    [ 1  0  0  2 ]

Solution: The characteristic polynomial is

char(A, x) = (x − 3)(x − 1)³.

For λ1 = 3 the corresponding normalized eigenvector is

v1 = (1/√2) (1, 0, 0, 1)^T.

For λ2 = 1 we have the following orthonormal eigenbasis:

v2 = (1/√2) (−1, 0, 0, 1)^T,   v3 = (0, 0, 1, 0)^T,   v4 = (0, 1, 0, 0)^T.

The transitional orthogonal matrix is

Q = [v1 | v2 | v3 | v4] = [ 1/√2  −1/√2  0  0 ]
                          [  0      0    0  1 ]
                          [  0      0    1  0 ]
                          [ 1/√2   1/√2  0  0 ]

Then,

A = Q D Q^T,   where D = [ 3  0  0  0 ]
                         [ 0  1  0  0 ]
                         [ 0  0  1  0 ]
                         [ 0  0  0  1 ]


Example 8.19. Determine the change of coordinates to bring the equation

x² + y² − 2z² + 4xy − 2xz + 2yz − x + y + z = 0

in the standard form. Write down the standard form and identify the surface.

Solution: We first take


q(x) = x² + y² − 2z² + 4xy − 2xz + 2yz.
The corresponding matrix for this quadratic form is

A = [  1  2  −1 ]
    [  2  1   1 ]
    [ −1  1  −2 ]

and its characteristic polynomial


char (A, x) = x(x − 3)(x + 3).
We take the eigenvalues as λ1 = 0, λ2 = −3, λ3 = 3. The corresponding normalized eigenvectors are

v1 = (1/√3) (−1, 1, 1)^T,   v2 = (1/√6) (1, −1, 2)^T,   v3 = (1/√2) (1, 1, 0)^T.

Hence, the orthogonal transitional matrix is Q = [v1 | v2 | v3]. The diagonalized form will be

q′(x) = 0 · X² − 3Y² + 3Z².

The change of coordinates is (x, y, z)^T = Q (X, Y, Z)^T, or equivalently

[ X ]         [ x ]   [ (−x + y + z)/√3 ]
[ Y ]  =  Q^T [ y ] = [ (x − y + 2z)/√6 ]
[ Z ]         [ z ]   [ (x + y)/√2      ]

Hence, in the new coordinates our equation becomes

√3 X − 3Y² + 3Z² = 0,   that is,   −X + √3 Y² − √3 Z² = 0.
This is a hyperbolic paraboloid.

Example 8.20. Let a plane in R^3 with equation

ax + by + cz = 0

be given. Consider the reflection map T : R^3 → R^3 which takes a point to its reflection with respect to the given plane. Is this map linear? If so, determine its corresponding matrix. Prove your answers.

Solution: We just give a brief outline here. We will call the plane ax + by + cz = 0 the plane P.
Take the vector u = (a, b, c)^T; this is called the normal vector to the plane P. Let A be a point and v := OA its position vector.
 
Use what we have already learned to find the projection projP v of v on the plane. Then the vector perpendicular
from A to the plane is
v − projP v
The symmetric point A′ of the point A with respect to P is represented by the vector

OA′ = v − 2 ( v − projP v ).


Show that this is

OA′ = ( I3 − 2 u u^T / (a² + b² + c²) ) v.

If OA = (x, y, z)^T and the normal is scaled so that a² + b² + c² = 1, then

OA′ = [ (1 − 2a²)x − 2aby − 2acz  ]   [ 1 − 2a²   −2ab      −2ac    ] [ x ]
      [ −2abx + (1 − 2b²)y − 2bcz ] = [ −2ab      1 − 2b²   −2bc    ] [ y ]
      [ −2acx − 2bcy + (1 − 2c²)z ]   [ −2ac      −2bc      1 − 2c² ] [ z ]

Since this map is given by multiplication by a matrix, it is linear.

This transformation is sometimes called the Householder transformation and is widely used in optics, computer vision, etc.
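The reflection matrix is easy to build and test numerically. The sketch below (NumPy; the helper name is ours) forms I − 2uu^T/(u^T u) for the plane x − y + 2z = 0 of Exercise 8.7 below and checks that the resulting matrix is orthogonal and fixes a vector lying in the plane:

import numpy as np

def reflection_matrix(a, b, c):
    """Householder reflection across the plane ax + by + cz = 0 in R^3."""
    u = np.array([a, b, c], dtype=float)                # a normal vector of the plane
    return np.eye(3) - 2.0 * np.outer(u, u) / np.dot(u, u)

R = reflection_matrix(1.0, -1.0, 2.0)
print(np.allclose(R @ R.T, np.eye(3)))                  # R is orthogonal
print(np.allclose(R @ np.array([1.0, 1.0, 0.0]),
                  np.array([1.0, 1.0, 0.0])))           # (1, 1, 0) lies on the plane, so it is fixed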
Exercise 8.6. Prove the linearity of the reflection to a plane using only the geometry of vectors in R3 .
 
 3 
Exercise 8.7. Find the orthogonal projection of v = −1 onto the plane V in R3 with equation
 
2
 

x − y + 2z = 0.

Solution: Use the formula from the previous example.




Exercises:

8.14. Find the singular values of

A = [ p  −q ]
    [ q   p ]

Explain the results geometrically.

8.15. Find the singular values of

A = [ 1  1 ]
    [ 0  1 ]

Index

absolute value , 7 Eisenstein’s criterion, 110


addition elementary matrix, 44
of vectors , 21 elementary operations, 112
additive identity, 52 elementary row operations, 37
additive inverse , 52 error vector, 138, 140
adjoint, 105 Euclidean algorithm, 109
angle, 31 Euclidean inner product, 129
argument, 7 Euclidean space, 21
augmented matrix, 37 Euler’s formula, 7
expansion by minors, 88
back substitution, 38 Extension of the Eisenstein’s criterion, 111
basis, 55
binary operation, 51 finite dimensional, 57
finite order, 35
Cauchy-Schwartz inequality, 30 Fourier series, 130
Cayley - Hamilton theorem, 119 function spaces, 52
characteristic polynomial, 95 functionals, 146
closed under addition, 53 Fundamental Theorem of Algebra , 110
closed under scalar multiplication, 53
cofactor, 87 Gauss method, 38
column space , 61 Gauss-Jordan method, 41
companion matrix, 112 Gaussian equivalent, 112
complement, 66 generated, 55
complex conjugate, 7 Gram-Schmidt Algorithm, 132
complex conjugation map , 7
complex numbers C, 6 Hermitian product, 129
coordinate functions, 146 homogenous system, 42, 104
coordinates, 56
Cramer’s rule , 103 idempotent, 35
identity matrix, 33
De Moivre’s formula , 7 imaginary part, 7
determinant, 87 index of negativity, 145
dimension, 57 index of positivity, 145
Dirac functional, 147 inertia, 159
direct product, 67 infinite order, 35
direct sum, 65 inner product, 29, 127
distance in R3 , 22 inner space, 127
dot product, 29 Integral root test, 110
dual basis, 147 invariant factors, 114
dual space, 146 inverse, 43
invertible, 43
eigenspace, 95 isomorphic spaces , 74
eigenvalue, 94 isomorphism , 74
algebraic multiplicity , 95
geometric multiplicity, 95 Jordan block, 121
eigenvector, 94 Jordan canonical form, 122


kernel , 72 primitive n-th root of unity , 7


projection matrix, 139, 141
leading coefficient, 109 projection vector, 31, 83, 128
least square solution, 140 purely imaginary, 7
line, 57
linear combination, 53 quaternions, 36
linear combination of vectors, 28
linear map, 71 rank, 61
linearly dependent, 28, 54 rational canonical form, 109, 117
linearly independent, 28, 54 real part, 7
lower triangular, 47, 68 reduced row-echelon form, 40
reflection vector, 84
main diagonal, 33 remainder, 109
matrix root, 109
m × n , 33 row echelon form, 37
addition, 33 row equivalent, 37
equal, 33 row space , 61
product, 33
scalar, 27
symmetric, 34
scalar matrix, 33
transpose, 34
scalar multiplication, 21
matrix associated to the quadratic form, 159
scalar product, 127
matrix associated with a linear map, 75
scalars, 52
minimal polynomial, 112
signature, 145
minor, 87
similar, 98
modulus , 7
singular, 43
monic, 109
Smith normal form, 114
multi-linear form, 92
solution
alternating, 92
infinitely, 38
symmetric, 92
of a system, 38
multiplication by a scalar, 33
unique, 38
multiplicative identity, 52
space of n × n matrices, 52
multiplicity, 110
space of complex continuous functions, 130
space of polynomials, 52
nilpotent, 35
space of real continuous functions, 128
non-degenerate, 127
span, 28
norm, 29, 128, 131
sphere, 22
nullity, 61
square matrix, 33
nullspace , 53, 61
standard basis, 59
opposite , 52 subspace, 53
order, 35 sum, 65
orthogonal, 128 sum of subspaces, 54
orthogonal basis, 132 Sylvester’s Theorem, 145
orthogonal complement, 128 The space of functions from R to R, 52
orthogonal projection, 141 trace, 34
orthogonal set, 128 transformation matrix, 80
orthonormal basis, 132, 145 transitional matrix, 99
triangle inequality, 30
pivot, 37
trivial solution, 42
plane, 57
polar representation , 7 Uniqueness of the inverse, 43
polynomial unit vector, 29
irreducible, 110 upper triangular, 47, 68
polynomial ring, 52
positive definite, 128 vector, 20, 21, 52


linearly independent , 28
parallel, 28
perpendicular, 30
unit, 24
vector space, 51
vectors in Rn , 28

zero matrix, 33
