Office Hours:
Tuesday and Thursday, 1:30-2:15 in Science Center 322.
Mondays 2-2:30 (longer if students are still there)
Course Assistants: (all former students in Math 23a or Math E-23a)
Goals: Math 23a is the first half of a moderately rigorous course in linear algebra
and multivariable calculus, designed for students who are serious about mathematics
and interested in being able to prove the theorems that they use but who are
as much concerned about the application of mathematics in fields like physics and
economics as about “pure mathematics” for its own sake. Trying to cover both
theory and practice makes for a challenging course with a lot of material, but it is
appropriate for the audience!
Prerequisites: This course is designed for the student who received a grade of 5
on the Math BC Advanced Placement examination or an A or A minus in Math
1b. Probably the most important prerequisite is the attitude that mathematics is
fun and exciting. Extension students should ordinarily have an A in Math E-16,
and an additional math course would be a very good idea.
Our assumption is that the typical Math 23a student knows only high-school
algebra and single-variable calculus, is currently better at formula-crunching than
at doing proofs, and likes to see examples to accompany abstractions. If, before
coming to Harvard, you took courses in both linear algebra and multivariable
calculus, Math 25 might be more appropriate. We do not assume that Math 23
students have any prior experience in either of these areas beyond solving systems
of linear equations in high school algebra.
This year, for the second time, we will devote four weeks to single-variable real
analysis. Real analysis is the study of real-valued functions and their properties,
such as continuity and differentiability, as well as sequences, series, limits, and
convergence. This means that if you are an international student whose curriculum
included calculus but not infinite series OR if you had a calculus course that
touched only lightly on topics like series, limits, and continuity, you will be OK.
Mathematics beyond AP calculus is NOT a prerequisite! Anyone who tries
to tell you otherwise is misguided. In fact, since we will be teaching sequences
and series from scratch (but rigorously), you can perhaps get away with a weaker
background in this area than is required for Math 21.
Strange as it may seem, Part I of the math placement test that freshmen have
taken is the most important. Students who do well in Math 23 have almost all
scored 26 or more out of 30 on this part.
Extension students who register for graduate credit are required to learn and
use the scripting language R. This option is also available to everyone else in the
course. You need only be an experienced software user, not a programmer.
Who takes Math 23?
When students in Math 23b were asked to list the two concentrations they were
most seriously considering, the most popular choices were mathematics, applied
math, physics, computer science, chemistry, mathematical economics, life sciences,
and humanities.
Extension students who take this course are often establishing their credentials
for a graduate program in a field like mathematical economics, mathematics, or
engineering. Programs in fields like economics like to see a course in real analysis
on your transcript. Successful Math E-23 students have usually taken more than
one course beyond single-variable calculus.
Upperclassmen who have made a belated decision to go into a quantitative PhD
program will also find this course useful.
Course Meetings:
The course ordinarily meets in Science Center A. To avoid overcrowding, the
first two lectures have been moved to Science Center C.
Lectures on Tuesdays and Thursdays run from 2:37 to 4:00. They provide
complete coverage of the week’s material, occasionally illustrated by examples done
in the R scripting language.
Problem Sessions (Section)
There are two types of weekly problem sessions led by the course staff. The
first is required; the second, though highly recommended, is optional.
• The “early” sections on Thursday and Friday will be devoted to problem
solving in small groups. These are a required course activity and will
count toward your grade. Lecture on Thursday is crucial background for
section!
• The “late” sections that meet on Monday will focus on the weekly problem
sets due on Wednesday mornings, and will also review the proofs that were
done in lecture. Attendance at these sections is optional, but most students
find them to be time well spent.
Videos will be made of all the lectures. Usually the video will be posted on
the Web site before the next lecture, and often it will appear on the same day.
The Thursday video will not be posted in time to provide preparation for the early
sections that meet on Thursdays, and we cannot guarantee that it will appear
before the Friday sections.
Even though all lectures are captured on video, Harvard rules forbid undergraduates
to register for another course that meets at the same time as Math 23,
even one with just a 30-minute overlap! Here is the official statement of this year’s
policy:
“In recent years, the Ad Board has approved petitions in which the direct
and personal compensatory instruction has been provided via video capture of
classroom presentations. In keeping with the views of the Standing Committee
on Undergraduate Educational Policy (formerly EPC), discussed with the Faculty
Council and the full faculty last April, the Ad Board will no longer approve such
petitions.”
With regard to athletic practices that occur at the same time as classes, policy
is less well defined. Here is the view of the assistant director of athletics:
“The basic answer is that our coaches should be accommodating to any academic
conflict that comes up with class scheduling. Kids should be able to take
the classes they want and still be a part of the team. Especially for classes that
would only cause a student to miss a small part of a practice.
What complicates things are the classes that would cause a student to miss an
entire practice for 2-3 days a week. Those instances make it hard for a student to
engage fully in the sport and prepare adequately for competition.
It’s hard for freshmen to ask a coach - the adult they have the closest relationship
to on campus - for practice accommodations but in my experience many of
them will work with students on their total experience”
The Math 23 policy, based on this opinion: It is OK to take Math 23a and
practice for your sport every Tuesday, but you must not miss Thursday lecture for
a practice.
Extension students may choose between attending lecture or watching videos.
However, students in Math E-23a who will not regularly attend lecture on Thursday
should sign up for a section that meets as late as possible. Then, with occasional
exceptions, they can watch the video of the Thursday lecture to prepare for section.
Sections will begin on September 10-11. Students should indicate their prefer-
ences for section time using the student information system. More details will be
revealed once the software is complete!
In order to include your name on a section list, we must obtain your permission
(on the sectioning form) to reveal on the Web site that you are a student taking
Math 23a or E-23a. If you wish to keep this information secret, we will include
your name in alphabetical order, but in the form Xxxx Xxxxxx.
Exams: There will be two quizzes and one final exam.
Quiz 1: Wednesday, October 7 (module 1, weeks 1-4)
Quiz 2: Wednesday, November 4 (module 2, weeks 5-8)
Final Exam: date and time TBA (module 3, weeks 9-12)
Quizzes are held in the Yenching Auditorium, 2 Divinity Avenue. They run
from 6 to 9 PM, but you can arrive any time before 7 PM, since 120 minutes should
be enough time for the quiz.
Keep these time slots open. Do not, for example, schedule a physics lab
or an LS 1a section on Wednesday evenings. If you know that you tend to work
slowly, it would also be unwise to schedule another obligation that leaves only part
of that time available to you!
Students who have exam accommodations, properly documented by a letter
from the Accessible Education Office, may need to take their quizzes in a separate
location. Please provide the AEO letters as early in the term as you can, since we
may need to reserve one or more extra rooms.
The last day to drop and add courses (like Math 23a and Math 21a) is Monday,
October 5. This is before the first quiz. It is important that you be aware of how
you are managing the material and performing in the course. It is not a good
idea to leave switching out of any course (not just Math 23) until the fifth Monday.
Decisions of this nature are best dealt with in as timely a manner as possible!
Quizzes will include questions that resemble the ones done in the “early” sections,
and each quiz will include two randomly-chosen proofs from among the
numbered proofs in the relevant module. There may be other short proofs similar
to ones that were done in lecture and problems that are similar to homework
problems. However if you want quizzes on which you are asked to prove difficult
theorems that you have never seen before, you will need to take Math 25a or 55a,
not Math 23a.
If you have an unexpected time conflict for one of the quizzes, contact Kate
as soon as you know about it, and special arrangements can be made. Distance
students will take their quizzes near their home but on the same dates.
The final examination will focus on material from the last five weeks of the
course. Local Extension students will take it at the same time and place as undergraduates.
The time (9 AM or 2 PM) will be revealed when the exam schedule is
posted late in September. If you have two or even three exams scheduled for that
day, don’t worry: that is a problem for the Exams Office, not you, to solve.
Except for the final examination, “local” Extension students can meet all their
course obligations after 5:30pm.
“Distance” extension students who do not live near Cambridge and cannot
come to Harvard in the evening to hand in homework, attend section and office
hours, take quizzes, and present proofs can still participate online in all course
activities. Details will be available in a separate document. Since this fully-online
option is an experiment, we plan to restrict it to two sections of 12 students each,
with absolute priority given to students who live far from Cambridge.
Textbooks:
Vector Calculus, Linear Algebra, and Differential Forms, Hubbard and Hubbard,
fourth edition, Matrix Editions, 2009. Try to get the second printing, which includes
a few significant changes to chapters 4 and 6.
This book is in stock at the Coop, or you can order it for $84 plus $10 for
priority shipping from the publisher’s Web site at
http://matrixeditions.com/UnifiedApproach4th.html. The Student Solution
Manual for the fourth edition, not in stock at the Coop, is also available from that
Web site.
We will cover Chapters 1-3 this term, Chapters 4-6 in Math 23b; so this one
textbook will last for the entire year.
Proofs:
Learning proofs can be fun, and we have put a lot of work into designing an
enjoyable way to learn high-level and challenging mathematics! Each week’s course
materials include two proofs. Often these proofs appear in the textbook and will
also be covered in lecture. They also may appear as quiz questions.
You, as students, will earn points towards your grade by presenting these proofs
to teaching staff and to each other without the aid of your course notes. Here is
how the system works:
When we first learn a proof in class, only members of the teaching staff are “qualified
listeners.” Anyone who presents a satisfactory proof to a qualified listener
also becomes qualified and may listen to proofs by other students. This process of
presenting proofs to qualified listeners occurs separately for every proof.
You are expected to present each proof before the date of the quiz on which it
might appear; so each proof has a deadline date. Distance students may refer to
the additional document, which details how to present proofs remotely to
classmates and teaching staff.
Each proof is worth 1 point. Here is the grading system:
• Listening to a fellow student’s proof: 0.1 point. Only one student can receive
credit for listening to a proof.
• After points have been tallied at the end of the term, members of the course
staff may assign the points that they have earned by listening to proofs
outside of section to any students that they feel deserve a bit of extra credit.
Students who do the proofs early and listen to lots of other students’ proofs can
get more than 100%, but there is a cap of 30 points total. You can almost reach
this cap by doing each proof before the deadline and listening twice to each proof.
Either you do a proof right and get full credit, or you give up and try again
later. There is no partial credit. It is OK for the listener to give a couple of small
hints.
You may consult the official list of proofs that has the statement of each theorem
to be proved, but you may not use notes. That will also be the case when proofs
appear on quizzes and on the final exam.
It is your responsibility to use the proof logging software on the course
Web site to keep a record of proofs that you present or listen to. You can also
use the proof logging software to announce proof parties and to find listeners for
your proofs.
Each quiz will include two questions which are proofs chosen at random from
the four weeks of relevant material. The final exam will have three proofs, all from
material after the second quiz. Students generally do well on the proof questions.
Useful software:
• R and RStudio
This is required only for Extension students who register for graduate credit,
but it is an option for everyone. Consider learning R if you
• LaTeX
This is the technology that is used to create all the course handouts. Once
you learn how to use it, you can create professional-looking mathematics on
your own computer.
The editor that is built into the Canvas course Web site is based on LaTeX.
One of the course requirements is to upload four proofs to the course Web site
in a medium of your choice. One option is to use LaTeX. Alternatively, you
can use the Canvas file editor (LaTeX based), or you can make a YouTube
video.
I learned LaTeX without a book or manual by just taking someone else’s files,
ripping out all the content, and inserting my own, and so can you. You will
need to download freeware MiKTeX version 2.9 (see http://www.miktex.org),
which includes an integrated editor named TeXworks.
From http://tug.org/mactex/ you can download a similar package for the
Mac OS X.
When in TeXworks, use the Typeset/pdfLaTeX menu item to create
a .pdf file. To learn how to create fractions, sums, vectors, etc., just find an
example in the lecture outlines and copy what I did. All the LaTeX source
for lecture outlines, assignments, and practice quizzes is on the Web site, so
you can find working models for anything that you need to do.
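For example, a minimal working file of the kind you might assemble this way (a sketch using standard LaTeX and amsmath, not taken from a particular course handout):

```latex
\documentclass{article}
\usepackage{amsmath}  % provides bmatrix and other math environments
\begin{document}
A fraction, a sum, and a vector written as a column in square brackets:
\[
  \frac{a}{b}, \qquad
  \sum_{k=1}^{n} k = \frac{n(n+1)}{2}, \qquad
  \vec{v} = \begin{bmatrix} -0.2 \\ 1.3 \\ 2.2 \end{bmatrix}
\]
\end{document}
```

Running pdfLaTeX on a file like this produces a one-page .pdf; once it compiles, you can rip out the content and insert your own.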
If you create a .pdf file for your homework, please print out the files and
hand in the paper at class. An exception can be made if you are a distance
Extension student or for some other good reason you are not in Cambridge
on the due date.
The course documents contain examples of diagrams created using TikZ,
the graphics language built into LaTeX. It is also easy to include .jpg or
.png files in LaTeX. If you want to create diagrams, use Paint or try Inkscape
at http://www.inkscape.org, an excellent freeware graphics program. Students
have found numerous other solutions to the problem of creating graphics,
so just experiment.
If you create a .pdf file for your homework, please print out the files and hand
in the paper. By default, undergraduates and “local” Extension students may
submit the assignment electronically only if they are out of town on the due
date. Individual section instructors may adopt a more liberal policy about
allowing electronic submission. Do not submit .tex files.
Use of R:
You can earn “R bonus points” in three ways:
Grades: Your course grade will be determined as follows:
• problem sets, 50 points. Your worst score will be converted to a perfect score.
For graduate students, only a “graduate” percentage score, using the R bonus
points, will be calculated. For everyone else, we will also calculate an “undergraduate”
percentage score, ignoring the R bonus points, and we will use the higher of
the two percentage scores.
The grading scheme is as follows:
Percentage Grade
94.0% A
88.0% A-
80.0% B+
75.0% B
69.0% B-
63.0% C+
57.0% C
51.0% C-
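As an illustration only (this is not the course's grading software), the cutoff table can be read as a small R function:

```r
# Hypothetical sketch: map a percentage to a letter grade using the cutoffs above.
letter_grade <- function(pct) {
  grades  <- c("A", "A-", "B+", "B", "B-", "C+", "C", "C-")
  cutoffs <- c(94, 88, 80, 75, 69, 63, 57, 51)
  for (i in seq_along(cutoffs)) {
    if (pct >= cutoffs[i]) return(grades[i])
  }
  "below C-"
}

letter_grade(90)   # "A-": at least 88 but below 94
```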
If you are conscientious about the homework, proofs, and quizzes, you will end up
with a grade between B plus and A, depending on your expertise in taking a fairly
long and challenging 3-hour final exam, and you will know that you are thoroughly
prepared for more advanced courses. For better or worse, you need to be fast as
well as knowledgeable to get an A, but an A- is a reasonable goal even if you make
occasional careless errors and are not a speed demon. Extension students who
earned a B plus have been successful at getting into PhD programs.
There is no “curve” in this course! You cannot do worse because your classmates
do better.
Switching Courses (Harvard College students only):
While transfers among Math 21a, 23a, 25a, and 55a are routine, it is important
to note that Math 21a focuses on multivariable calculus, while Math 23a and 25a
focus on linear algebra. Math 21b focuses on linear algebra, while Math 23b and
25b focus on multivariable calculus. Math 21a and b are given every semester, while
Math 23a and 25a are fall only with 23b and 25b given spring only. Ordinarily
there is a small fee if you drop a course after the third Monday of the term, but
this is waived in the case of math courses. However, the fifth Monday, October 5,
is a firm deadline after which you cannot change courses!
Switching from Math 25a to Math 23b at midyear requires you to teach
yourself about multivariable differential calculus and manifolds, but a handful
of students do it every year, and it generally works out OK.
• The Lecture Preview Videos were made by Kate. They cover the so-called
Executive Summaries in the weekly course materials, which go over all the
week’s material, but without proofs or detailed examples.
If you watch these videos (it takes about an hour per week) you will be very
well prepared for lecture, and even the most difficult material will make sense
on a first hearing.
Last year’s experiment was unsuccessful because we assumed in lecture that
everyone had watched these videos, when in fact only half the class did
so. Those who did not watch them complained, correctly, that the lectures
skipped over basic material in getting to proofs and examples. This year’s
lectures will be self-contained, so the preview videos are not required viewing.
• The R script videos were made by Paul. They provide a line-by-line explanation
of the R scripts that accompany each week’s materials.
Last year’s experiment was unsuccessful because going over these scripts in
class was not a good use of lecture time. If you are doing the “graduate”
option, these scripts are pretty much required viewing, although the scripts
are so thoroughly commented that just working through them on your own
is perhaps a viable alternative.
If you are doing just the “undergraduate” option, you can ignore the R scripts
completely.
Homework: Homework (typically 8 problems) will be assigned weekly. The
assignment will be included in the same online document as the lecture notes and
section problems.
Assignments are due on Wednesdays by 10:00 AM. There will be a locked box
on the second floor, near Room 209, with your “late” section instructor’s name.
At 10 AM Kate will place a sheet of colored paper in each box, and anything above
that paper will be late! Please include your name, the assignment number, and
your CA’s name on your assignment.
Each week’s assignment will include a couple of optional problems whose solutions
require R scripts. These scripts should be uploaded electronically to the
dropbox on the Web site for that week. Please include your name as a comment
in the script and also in the file name.
The course assistant who leads your “late” section should return your corrected
homework to you at the section after the due date. If you are not receiving graded
homework on schedule, send email to penner@math.harvard.edu and the problem
will be dealt with.
Homework that is handed in after 10 AM on the Wednesday when it is due
will not be corrected. If it arrives before the end of Reading Period and looks
fairly complete, you will receive a score of 50% for it.
It is a violation of Federal privacy law for us to return graded homework by
placing it in a publicly accessible location like an instructor’s mailbox. You will
have to collect your graded homework from your section instructor in person.
Tutoring: Several excellent students from previous years, qualified to be course
assistants but too busy, are registered with the Bureau of Study Counsel as tutors.
If you find yourself getting into difficulties, immediately contact the BSC and get
teamed up with one of them.
You will have to contact the BSC directly to arrange for a tutor, since privacy
law forbids anyone on the Math 23 staff from knowing who is receiving tutoring. A
website with more information can be found at www.bsc.harvard.edu.
Week-by-week Schedule:
Week          Date               Topic
Fortnight 1   September 3-11     Fields, vectors and matrices
Week 2        September 15-18    Dot and cross products; Euclidean geometry of Rn
Week 3        September 22-25    Row reduction, independence, basis
Week 4        Sept. 29 - Oct. 2  Eigenvectors and eigenvalues
Week 5        October 6-9        Number systems and sequences
              October 7          QUIZ 1 on weeks 1-4
Week 6        October 13-16      Series, convergence tests, power series
Week 7        October 20-23      Limits and continuity of functions
Week 8        October 27-30      Derivatives, inverse functions, Taylor series
Week 9        November 3-6       Topology, sequences in Rn, linear differential equations
              November 4         QUIZ 2 on weeks 5-8
Week 10       November 10-13     Limits and continuity in Rn; partial and directional derivatives
Week 11       November 17-20     Differentiability, Newton’s method, inverse functions
Fortnight 12  Nov. 24 - Dec. 3   Manifolds, critical points, Lagrange multipliers
              November 26        Thanksgiving
Half-week 13  December 8         Calculus on parametrized curves; div, grad, and curl
              December ?         FINAL EXAM on weeks 9-12
This schedule covers all the math that is needed for Physics 15a, 16, and 15b
with the sole exception of surface integrals, which will be done in the spring.
The real analysis in Math 23a alone will be sufficient for most PhD programs in
economics, though the most prestigious programs will want to see Math 23b also.
All the mathematics that is used in Economics 1011a will be covered by the end
of the term. The coverage of proofs is complete enough to permit prospective
Computer Science concentrators to skip CS 20.
Abstract vector spaces and multiple integration, topics of great importance to
prospective math concentrators, have all been moved to Math 23b.
MATHEMATICS 23a/E-23a, Fall 2016
Linear Algebra and Real Analysis I
Module #1, Week 1 (Fields, Vectors, and Matrices)
Reading
• 1.1 Suppose that a and b are two elements of a field F . Using only the
axioms for a field, prove the following:
– a. Suppose that the matrix [T] is invertible. Prove that the linear
transformation T is one-to-one and onto (injective and surjective),
hence invertible.
– b. Suppose that linear transformation T is invertible. Prove that its
inverse S is linear and that the matrix of S is [S] = [T]−1.
R Scripts
• Script 1.1B-PointsVectors.R
Topic 1 - Addition of vectors in R2
Topic 2 - A diagram to illustrate the point-vector relationship
Topic 3 - Subtraction and scalar multiplication
• Script 1.1C-Matrices.R
Topic 1 - Matrices and Matrix Operations in R
Topic 2 - Solving equations using matrices
Topic 3 - Linear functions and matrices
Topic 4 - Matrices that are not square
Topic 5 - Properties of the determinant
• Script 1.1D-MarkovMatrix
Topic 1 - A game of volleyball
Topic 2 - traveling around on ferryboats
• Script 1.1L-LinearMystery
Topic 1 - Define a mystery linear function fMyst : R2 → R2
1 Executive Summary
• Quantifiers and Negation Rules
The “universal quantifier” ∀ is read “for all.”
The “existential quantifier” ∃ is read “there exists.” It is usually
followed by “s.t.,” a standard abbreviation for “such that.”
The negation of “∀x, P (x) is true” is “∃x, P (x) is not true.”
The negation of “∃x, P (x) is true” is “∀x, P (x) is not true.”
The negation of “P and Q are true” is “either P or Q is not true.”
The negation of “either P or Q is true” is “both P and Q are not true.”
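For a finite universe these negation rules can be checked mechanically in R with all() and any(); a small illustrative sketch (the predicate is made up):

```r
# Check the quantifier negation rules on a finite set using all() and any().
x <- 1:10
P <- x > 3                     # an arbitrary predicate P(x)

identical(!all(P), any(!P))    # TRUE: not "forall x, P(x)" is "exists x, not P(x)"
identical(!any(P), all(!P))    # TRUE: not "exists x, P(x)" is "forall x, not P(x)"
```

The same identities are De Morgan's laws, with all() playing the role of ∀ and any() the role of ∃ over the finite set.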
• Functions
A function f needs two sets: its domain X and its codomain Y .
f is a rule that, to any element x ∈ X, assigns a specific element y ∈ Y .
We write y = f(x).
f must assign a value to every x ∈ X, but not every y ∈ Y must be of the
form f(x). The subset of the codomain consisting of elements that are of
the form y = f(x) is called the image of f . If the image of f is all of the
codomain Y , f is called surjective or onto.
f need not assign different elements of Y to different elements of X. If
x1 ≠ x2 =⇒ f(x1) ≠ f(x2), f is called injective or one-to-one.
If f is both surjective and injective, it is bijective and has an inverse f −1 .
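For finite sets these definitions can be tested mechanically; a small R sketch (the sets and the function are chosen for illustration):

```r
# f : X -> Y with X = {1, 2}, Y = {A, B, C}, f(1) = A, f(2) = B,
# stored as a named vector so that f["1"] is the value at 1.
Y <- c("A", "B", "C")
f <- c("1" = "A", "2" = "B")

image_f    <- unique(unname(f))
injective  <- !any(duplicated(f))    # distinct inputs give distinct outputs
surjective <- setequal(image_f, Y)   # image fills the whole codomain

injective    # TRUE
surjective   # FALSE: "C" is not of the form f(x)
```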
• Categories
A category C has objects (which might be sets) and arrows (which might
be functions)
An arrow f must have a specific domain object X and a specific codomain
object Y ; we write f : X → Y .
If arrows f : X → Y and g : Y → Z are in the category, then the composition
arrow g ◦ f : X → Z is in the category.
For any object X there is an identity arrow IX : X → X.
Given f : X → Y , f ◦ IX = f and IY ◦ f = f .
Associative law: given f : X → Y , g : Y → Z, and h : Z → W ,
h ◦ (g ◦ f ) = (h ◦ g) ◦ f .
Given an arrow f : X → Y , an arrow g : Y → X such that g ◦ f = IX is
called a retraction.
Given an arrow f : X → Y , an arrow g : Y → X such that f ◦ g = IY is
called a section.
If, for arrow f , arrow g is both a retraction and a section, then g is the
inverse of f , g = f −1 , and g must be unique.
Almost everything in mathematics is a special case of a category.
1.1 Fields and Field Axioms
A field F is a set of elements for which the familiar operations of addition and
multiplication are defined and behave in the usual way. Here is a set of axioms
for a field. You can use them to prove theorems that are true for any field.
1. Addition is commutative: a + b = b + a.
• Break up the set of integers into p subsets. Each subset is named after the
remainder when any of its elements is divided by p.
[a]p = {m|m = np + a, n ∈ Z}
Notice that [a + kp]p = [a]p for any k. There are only p sets, but each has
many alternate names. These p infinite sets are the elements of the field
Zp .
• Define addition by [a]p + [b]p = [a + b]p . Here a and b can be any names for
the subsets, because the answer is independent of the choice of name. The
rule is “Add a and b, then divide by p and keep the remainder.”
• Define multiplication by [a]p [b]p = [ab]p . Again a and b can be any names
for the subsets, because the answer is independent of the choice of name.
The rule is “Multiply a and b, then divide by p and keep the remainder.”
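These rules are ordinary modular arithmetic, and R's %% operator implements “divide by p and keep the remainder.” A sketch with p = 7 (the choice of 7 is ours):

```r
# Arithmetic in Z_p: compute with any representative, then reduce mod p.
p <- 7
add_p <- function(a, b) (a + b) %% p
mul_p <- function(a, b) (a * b) %% p

add_p(5, 4)                          # 9 mod 7 = 2
mul_p(5, 4)                          # 20 mod 7 = 6
add_p(3 + 2 * p, 4) == add_p(3, 4)   # TRUE: [3 + 2p]_p is another name for [3]_p
```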
1.2 Points and Vectors
F n denotes the set of ordered lists of n elements from a field F . Usually the field
is R, but it could be the field of complex numbers C or a finite field like Z5 .
A given element of F n can be regarded either as a point, which represents
“position data,” or as a vector, which represents “incremental data.”
If an element of F n is a point, we represent it by a bold letter like p and write
it as a column of elements enclosed in parentheses.
      ( 1.1)
  p = (−3.8)
      ( 2.3)
If an element of F n is a vector, we represent it by a bold letter with an arrow
like ~v and write it as a column of elements enclosed in square brackets.
       [−0.2]
  ~v = [ 1.3]
       [ 2.2]
To add a vector to a point, we add the components in identical positions together.
The result is a point: q = p + ~v. Geometrically we represent this by anchoring
the vector at the initial point p. The location of the arrowhead of the vector is
the point q that represents our sum.
(Diagram: the vector ~v anchored at the point p, with its arrowhead at the point
q = p + ~v; a second diagram shows ~v + ~w as the diagonal of the parallelogram
spanned by ~v and ~w.)
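In R (an illustrative sketch in the spirit of the course's Script 1.1B, not the script itself), the same sum is just componentwise addition:

```r
# Adding a vector to a point: add the components in identical positions.
p <- c(1.1, -3.8, 2.3)   # a point: position data
v <- c(-0.2, 1.3, 2.2)   # a vector: incremental data
q <- p + v               # the arrowhead of v when v is anchored at p
q                        # 0.9 -2.5  4.5
```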
1.3 Standard basis vectors
The standard basis vector ~ek has a 1 as its kth component, and all its other
components are 0. Since the additive identity 0 and the multiplicative identity
1 must be present in any field, there will always be n standard basis vectors in
F n . Geometrically, the standard basis vectors in R2 are usually associated with
“one unit east” and “one unit north” respectively.
(Diagram: ~e1 points one unit east; ~e2 points one unit north.)
1.6 Examples of matrix multiplication
A = [ 2  1  0 ]        B = [  0  1 ]
    [ 1 −1 −2 ]            [  2 −1 ]
                           [ −2  0 ]

AB = [ 2  1 ]          BA = [  1 −1 −2 ]
     [ 2  2 ]               [  3  3  2 ]
                            [ −4 −2  0 ]
The number of columns in the first factor must equal the number of rows in
the second factor.
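The same products can be checked in R (an illustrative sketch in the spirit of Script 1.1C; %*% is R's matrix-multiplication operator):

```r
# Matrix multiplication in R: the number of columns in the first factor
# must equal the number of rows in the second factor.
A <- matrix(c(2,  1,  0,
              1, -1, -2), nrow = 2, byrow = TRUE)   # 2 x 3
B <- matrix(c( 0,  1,
               2, -1,
              -2,  0), nrow = 3, byrow = TRUE)      # 3 x 2

A %*% B   # a 2 x 2 matrix
B %*% A   # a 3 x 3 matrix; AB and BA need not even have the same size
```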
1.10 Matrix transposes
The transpose of a given matrix A is written AT . The two are closely related.
The rows of A are the columns of AT and the columns of A are the rows of AT .
    A = [ a  b ] ,   AT = [ a  c ]
        [ c  d ]          [ b  d ]

(AB)T = BT AT
A similar rule holds for matrix inverses:
(AB)−1 = B−1 A−1
(Diagram: four islands, numbered 1-4, connected by ferry routes.)

        [ 0 0 1 1 ]
    A = [ 1 0 0 0 ]
        [ 1 0 0 0 ]
        [ 0 1 1 0 ]

The entry in row i, column j of the matrix A shows how many ways there
are to reach island i by a single ferry ride, starting from island j. The
entry in row i, column j of the matrix An shows how many ways there are
to reach island i by a sequence of n ferry rides, starting from island j.
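In R (an illustrative sketch in the spirit of Script 1.1D), powers of the ferry matrix count multi-ride routes:

```r
# Entry (i, j) of A counts one-ride trips from island j to island i;
# entry (i, j) of the n-th power of A counts n-ride trips.
A <- matrix(c(0, 0, 1, 1,
              1, 0, 0, 0,
              1, 0, 0, 0,
              0, 1, 1, 0), nrow = 4, byrow = TRUE)

A2 <- A %*% A   # two-ride trips
A2[1, 2]        # number of two-ride routes from island 2 to island 1
```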
2 Lecture Outline
1. Quantifiers and negation
Especially when you are explaining a proof to someone, it saves some writing
to use the symbols ∃ (there exists) and ∀ (for all).
Be careful when negating these.
The negation of “∀x, P (x) is true” is “∃x, P (x) is not true.”
The negation of “∃x, P (x) is true” is “∀x, P (x) is not true.”
When negating a statement, also bear in mind that
The negation of “P and Q are true” is “either P or Q is not true.”
The negation of “either P or Q is true” is “both P and Q are not true.”
For practice, let’s negate the following statements (which may or may not
be true!)
• All 11-legged alligators are orange with blue spots. (Hubbard, page 5)
Negation:
2. Set notation
Here are the standard set-theoretic symbols:
Using the integers Z and the real numbers R, let’s construct some sets. In
each case there is one way to describe the set using a restriction and another
more constructive way to describe the set.
Constructive:
• The set of coordinate pairs for points on the circle of radius 2 centered
at the origin (an example of a “smooth manifold”).
Restrictive:
Constructive:
3. Function terminology:
Here are some terms that should be familiar from your study of precalculus
and calculus:
Example a Example b Example c
domain
codomain
image
one-to-one = injective
onto = surjective
invertible = bijective
Using the sets X = {1, 2} and Y = {A, B, C}, draw diagrams to illustrate
the following functions, and fill in the table to show how the terms apply
to them:
• f : X → Y, f (1) = A, f (2) = B.
Here are those function words again, with two additions:
• domain
• natural domain (often deduced from a formula)
• codomain
• image
• one-to-one = injective
• onto = surjective
• invertible = bijective
• inverse image of a set A in the codomain: {x | f(x) ∈ A}
• f1 (x) = x2
• f2 (x) = x3
• f3 (x) = log x (natural logarithm)
• f4 (x) = ex
• Did your calculus course use “range” as a synonym for “image” or for
“codomain?”
4. Composition of functions
Sometimes people find that a statement is hard to prove because it is so
obvious. An example is the associativity of function composition, which
will turn out to be crucial for linear algebra.
Prove that (f ◦ g) ◦ h = f ◦ (g ◦ h). Hint: Two functions f1 and f2 are equal
if they have the same domain X and, ∀x ∈ X, f1 (x) = f2 (x).
Consider the set of men who have exactly one brother and at least one son.
h(x) = “father of x”, g(x) = “brother of x”, f (x) = “oldest son of x”
• f ◦ g is called
• (f ◦ g) ◦ h is
• g ◦ h is called
• f ◦ (g ◦ h) is
• Simpler name for both (f ◦ g) ◦ h and f ◦ (g ◦ h)
5. Finite sets and functions form the simplest example of a category
The objects do not have to be sets and the arrows do not have to be
functions. For example, the objects could be courses, and an arrow from
course X to course Y could mean “if you have taken course X, you will
probably do better in course Y as a result.” Check that the identity and
composition rules are satisfied.
6. Invertible functions - an example of invertible arrows
First consider the category of finite sets and functions between them.
The term “inverse” is used only for a “two-sided inverse.” Given f : X → Y ,
an inverse f −1 : Y → X must have the properties
f −1 ◦ f = IX and f ◦ f −1 = IY
Prove that the inverse is unique. This proof uses only things that are true
in any category, so it is valid in any category!
It has a “postinverse” (the official word is “retraction”). Just reverse all the
arrows to undo its effect, and define g however you like on the element of
Y that is not in the image of f . Then g ◦ f = IX but f ◦ g ≠ IY .
7. Fields
Loosely speaking, a field F is a set of elements for which the familiar oper-
ations of arithmetic are defined and behave in the usual way. Here is a set
of axioms for a field. You can use them to prove theorems that are true for
any field.
This set of axioms for a field includes properties (such as the commutativity
of addition) that can be proved as theorems by using the other axioms. It
therefore does not qualify as an “independent” set, but there is no general
requirement that axioms be independent.
Some well-known laws of arithmetic are omitted from the list of axioms
because they are easily proved as theorems. The most obvious omission is
∀a ∈ F, 0a = 0.
Here is the proof. What axiom justifies each step?
• 0 + 0 = 0 so (0 + 0)a = 0a.
• 0a + 0a = 0a.
• (0a + 0a) + (−0a) = 0a + (−0a).
• 0a + (0a + (−0a)) = 0a + (−0a).
• 0a + 0 = 0.
• 0a = 0.
8. Finite fields
Computing with real numbers by hand can be a pain, and most of linear
algebra works for an arbitrary field, not just for the real and complex num-
bers. Alas, the integers do not form a field because in general there is no
multiplicative inverse. Here is a simple way to make from the integers a
finite field in which messy fractions cannot arise.
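To make the idea concrete, here is a small sketch (in Python rather than the course's R, purely for illustration) of arithmetic in Z5: because 5 is prime, every nonzero element has a multiplicative inverse, so fractions never arise.

```python
# Arithmetic in Z_5: integers mod 5. Because 5 is prime, Z_5 is a field.
p = 5

def mul(a, b):
    return (a * b) % p

def inverse(a):
    # Every nonzero element has a multiplicative inverse; find it by search.
    return next(b for b in range(1, p) if mul(a, b) == 1)

inverses = {a: inverse(a) for a in range(1, p)}
print(inverses)  # {1: 1, 2: 3, 3: 2, 4: 4}
```

For example, dividing by 3 in Z5 is the same as multiplying by 2, since 3 · 2 = 6 ≡ 1.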
9. Rational numbers
The rational numbers Q form a field. You learned how to add and multiply
them years ago! The multiplicative inverse of a/b is b/a as long as a ≠ 0.
The rational numbers are not a “big enough” field for doing Euclidean
geometry or calculus. Here are some irrational quantities:
• √2
• π
• most values of trig functions, exponentials, or logarithms
• coordinates of most intersections of two circles
• Σ_{i=1}^∞ ai (1/10)^i for most choices of the digits ai
The rational numbers and the real numbers are both “ordered fields.” This
means that there is a subset of positive elements that is closed under both
addition and multiplication. No finite field is ordered.
In Z5 , you can name the elements [0], [1], [2], [−2], [−1], and try to call the
elements [1] and [2] “positive.” Why does this attempt to make an ordered
field fail?
11. Proof 1.1 - two theorems that are valid in any field
(a) Using nothing but the field axioms, prove that if ab = 0, then either a
or b must be 0.
(b) Using nothing but the field axioms, prove that the additive inverse
of an element a is unique. (Standard strategy for uniqueness proofs:
assume that there are two different inverses b and c, and prove that
b = c.)
12. Lists of field elements as points and vectors:
F n denotes the set of ordered lists of n elements from a field F . Usually
the field is R, but it could be the field of complex numbers C or a finite
field like Z5 .
An element of F n can be regarded either as a point, which represents “po-
sition data,” or as a vector, which represents “incremental data.” Beware:
many textbooks ignore this distinction!
If an element of F n is a point, we represent it by a bold letter like p and
write it as a column of elements enclosed in parentheses.
    ( 1.1)
p = (−3.8)
    ( 2.3)
14. Examples from coordinate geometry
Here are two points in the plane:

p = (1.4, −3.8),   q = (2.4, −4.8)

• What is q − p?
• What is p + ~v?
• What is ~v − 1.5~w?
• What, if anything, is p + q?
15. Subsets of F n
A subset of F n can be finite, countably infinite, or uncountably infinite.
The concept is especially useful when the elements of F n are points, but it
is valid also for vectors.
Examples:
(a) In Z3 × Z3 , consider the set {(0, 1), (1, 2), (2, 0)} (written as columns).
This will turn out (outline 7) to be a line in “the small affine faculty
senate.” Write it in the form {p + t~v|t ∈ Z3 }.
(b) In R2 , consider the set of points whose coordinates are both positive
integers. Is it finite, countably infinite, or uncountably infinite?
(d) In R2 , draw a diagram that might represent the set of points (x, y),
where x is family income and y is family net worth, for which a family
qualifies for free tuition.
16. Subspaces of F n
A subspace is defined only when the elements of F n are vectors. It must
be closed under vector addition and scalar multiplication. The second re-
quirement means that the zero vector must be in the subspace. The empty
set ∅ is not a subspace!
Geometrically, a subspace corresponds to a “flat subset” (line, plane, etc.)
that includes the origin.
For R3 there are four types of subspace. What is the geometric interpreta-
tion of each?
• 0-dimensional: the set {(0, 0, 0)}
• 1-dimensional: {t~u|t ∈ R}
Exception: 0-dimensional if
1-dimensional if
17. Standard basis vectors:
Any vector ~x in F n can be written as x1 e~1 + · · · + xn e~n = Σ_{i=1}^n xi e~i .
This will turn out to be true also in an abstract n-dimensional vector space,
but in that case there will be no “standard” basis.
Here are formulas for vector fields from Hubbard, exercise 1.1.6, (c) and
(e). Plot them. If you did Physics C Advanced Placement E&M, they may
look familiar.
F~ (x, y) = (x, y),    F~ (x, y) = (−y, x)
19. Matrices
An m × n matrix over a field F is a rectangular array of elements of F with
m rows and n columns. Watch the convention: the height is specified first!
As a mathematical object, any matrix can be multiplied by any element of
F . This could be meaningless in the context of an application. Suppose
you run a small hospital that has two rooms with three patients in each.
Then the patients’ temperatures form the 2 × 3 matrix

(98.6   102.4  99.7)
(103.2  98.3   99.6)
20. Matrix multiplication
Matrix multiplication is nicely explained on pp. 43-46 of Hubbard. To
illustrate the rule, we will take
A = (2   1   0)        B = ( 0   1)
    (1  −1  −2)            ( 2  −1)
                           (−2   0)

• Compute AB.

• Compute BA.
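As a check on the hand computation, here is a sketch in Python/NumPy (illustrative only; the course scripts use R) multiplying the same two matrices both ways. Note that AB is 2 × 2 while BA is 3 × 3.

```python
import numpy as np

# The matrices from this outline item: A is 2x3, B is 3x2.
A = np.array([[2, 1, 0],
              [1, -1, -2]])
B = np.array([[0, 1],
              [2, -1],
              [-2, 0]])

AB = A @ B   # 2x2
BA = B @ A   # 3x3: even the shapes of AB and BA differ
print(AB)    # entries [[2, 1], [2, 2]]
print(BA)
```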
21. Matrices as functions:
Since a column vector is also an n × 1 matrix, we can multiply an m × n
matrix by a vector in F n to get a vector in F m . The product A e~i is the
ith column of A. This is usually the best way to think of a matrix A as
representing a linear function f : the ith column of A is f (e~i ).
Example: Suppose that f ((1, 0)) = (1, 4) and f ((0, 1)) = (2, 3).
What matrix represents f ?
Since A(xi e~i + xj e~j ) is the sum of xi times column i and xj times column
j, we see that
The rule for forming the product AB can be stated in terms of the rule for
a matrix acting on a vector: to form AB, just let A act on each column of
B in turn, and put the results side by side to create the matrix AB.
What function does the matrix product AB represent? Consider (AB) e~i .
This is the ith column of the matrix AB, and it is also the result of letting
B act on e~i , then letting A act on the result. So for any standard basis
vector, the matrix AB represents the composition A ◦ B of the functions
represented by B and by A.
What about the matrices (AB)C and A(BC)? These represent the compo-
sition of three functions: say (f ◦ g) ◦ h and f ◦ (g ◦ h). But we already know
that composition of functions is associative. So we have proved, without
any messy algebra, that multiplication of matrices is associative also.
22. Proving associativity by brute force (proof 1.2)
A is an n × m matrix.
B is an m × p matrix.
C is a p × q matrix.
If ai,j represents the entry in the ith row, jth column of A, then

(AB)i,k = Σ_{j=1}^m ai,j bj,k

((AB)C)i,q = Σ_{k=1}^p (AB)i,k ck,q = Σ_{k=1}^p Σ_{j=1}^m (ai,j bj,k ) ck,q

(BC)j,q =

(A(BC))i,q =
(BC)j,q =
(A(BC))i,q =
On what basis can you now conclude that matrix multiplication is associa-
tive for matrices over any field F ?
Group problem 1.1.1c offers a more elegant version of the same proof by
exploiting the fact that matrix multiplication represents composition of
linear functions.
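A numerical spot-check of associativity (a sketch in Python/NumPy, illustrative only, with arbitrary shapes chosen to match the n × m, m × p, p × q pattern above):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(2, 3))   # n x m
B = rng.integers(-5, 5, size=(3, 4))   # m x p
C = rng.integers(-5, 5, size=(4, 2))   # p x q

lhs = (A @ B) @ C
rhs = A @ (B @ C)
print((lhs == rhs).all())  # True: both orders give the same n x q matrix
```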
23. Identity matrix:
It must be square, and the ith column is the ith basis vector. For example,
     (1  0  0)
I3 = (0  1  0)
     (0  0  1)
Now we just have to check the two rules that must hold in any category:
25. Matrix inverses:
Consider first the case of a non-square m × n matrix A.
If m > n, then A takes a vector in Rn and produces a longer vector in
Rm . In general, there will be many matrices B that can recover the original
vector in Rn . In the lingo of categories, such a matrix B is a retraction.
Here is a matrix that converts a 2-component vector (price of silver and
price of gold) into a three-component vector that specifies the price of
alloys
containing 25%, 50%, and 75% gold respectively.

    (.75  .25)
A = (.50  .50)
    (.25  .75)

Calculate ~v = A(4, 8):   ~v =
By elementary algebra you can reconstruct the price of silver and of gold
from the price of any two of the alloys, so it is no surprise to find two
different left inverses. Apply each of the following to ~v.
B1 = ( 2  −1  0),   B1~v =
     (−2   3  0)

B2 = (0   3  −2),   B2~v =
     (0  −1   2)
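A sketch in Python/NumPy (illustrative only; silver at $4 and gold at $8 are just sample prices) confirming that both B1 and B2 are left inverses of A:

```python
import numpy as np

# Alloy matrix: converts (silver price, gold price) into prices of
# 25%, 50%, 75% gold alloys.
A = np.array([[0.75, 0.25],
              [0.50, 0.50],
              [0.25, 0.75]])
v = A @ np.array([4.0, 8.0])
print(v)  # alloy prices: [5. 6. 7.]

# Two different left inverses (retractions) of A:
B1 = np.array([[2.0, -1.0, 0.0],
               [-2.0, 3.0, 0.0]])
B2 = np.array([[0.0, 3.0, -2.0],
               [0.0, -1.0, 2.0]])
print(B1 @ v, B2 @ v)  # both recover [4. 8.]
```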
26. Inverting square matrices
For a square matrix, the interesting case is where both a right inverse B
and a left inverse C exist. In this case, B and C are equal and they are
unique. We can say that “an inverse” A−1 exists.
Proof of both uniqueness and equality:
To prove uniqueness of the left inverse matrix, assume that matrix A has
two different left inverses C and C 0 and a right inverse B:
C 0 A = CA = I
C 0 (AB) = C(AB) = IB
C 0 I = CI = B
C0 = C = B
If A = (a  b)
       (c  d)
and ad − bc ≠ 0, then

A−1 = (1/(ad − bc)) ( d  −b)
                    (−c   a)
The matrix inversion recipe works in any field: try inverting

A = (3  1)
    (4  2)

where the elements are in Z5 .
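Here is a sketch of that computation in Python (illustrative only; it applies the 2 × 2 recipe with all arithmetic done mod 5):

```python
p = 5
A = [[3, 1],
     [4, 2]]

# 2x2 recipe over Z_5: A^-1 = (ad - bc)^-1 * [[d, -b], [-c, a]], mod 5.
det = (A[0][0] * A[1][1] - A[0][1] * A[1][0]) % p             # det = 2
det_inv = next(x for x in range(1, p) if (det * x) % p == 1)  # 2^-1 = 3 in Z_5
Ainv = [[(det_inv * A[1][1]) % p, (det_inv * -A[0][1]) % p],
        [(det_inv * -A[1][0]) % p, (det_inv * A[0][0]) % p]]
print(Ainv)  # [[1, 2], [3, 4]]

# Check: A times Ainv is the identity, mod 5.
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(2)) % p for j in range(2)]
        for i in range(2)]
print(prod)  # [[1, 0], [0, 1]]
```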
• transpose
• symmetric matrix
• antisymmetric matrix
• diagonal matrix
• upper or lower triangular matrix
28. Linear transformations:
T (a~v + b~w) = aT (~v) + aT... 
Example:
The components of ~v are the quantities of sugar, flour, and chocolate re-
quired to produce a batch of brownies. The components of ~w are the
quantities of these ingredients required to produce a batch of fudge. T
is the function that converts such a vector into the total cost of ingredi-
ents. T is represented by a matrix [T ] (row vector) of prices for the various
ingredients.
Write these vectors for the following data:
• Sugar costs $2 per pound, flour costs $1 per pound, chocolate costs $6
per pound.
Then a~v + b~w is the vector of ingredients required to produce a batches of
brownies and b batches of fudge, while T (~v) is the cost of ingredients for a
single batch of brownies. The statement
T (a~v + b~w) = aT (~v) + bT (~w) is sound economics.
Two ways to find the cost of 3 batches of brownies plus 2 batches of fudge.
T (3~v + 2~w) =

3T (~v) + 2T (~w) =
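The two computations can be sketched in Python/NumPy. The ingredient quantities in ~v and ~w below are hypothetical placeholders (the outline leaves them as blanks); only the prices $2, $1, $6 come from the data above.

```python
import numpy as np

v = np.array([2.0, 1.0, 1.0])   # brownies: lbs of sugar, flour, chocolate (assumed)
w = np.array([1.0, 2.0, 0.5])   # fudge: lbs of sugar, flour, chocolate (assumed)
T = np.array([2.0, 1.0, 6.0])   # price row vector [T]: $2, $1, $6 per pound

# Way 1: total the ingredients first, then price them.
cost1 = T @ (3 * v + 2 * w)
# Way 2: price each recipe, then total the costs.
cost2 = 3 * (T @ v) + 2 * (T @ w)
print(cost1, cost2)  # equal, by linearity of T
```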
29. A linear transformation interpreted geometrically.
A parallelogram has one vertex at the origin. Two other vertices are located
at points in the plane specified by ~v and ~w. Transformation T expands the
parallelogram by a factor of 2 and rotates it counterclockwise through a
right angle.
You can either locate the fourth vertex by vector addition and then apply
T to it, or you can apply T separately to the second and third vertices and
add the results. Either way,
T (~v + ~w) = T (~v) + T (~w)
30. Matrices and linear transformations
Use * to denote the mechanical operation of matrix multiplication.
Any vector can be written as ~v = x1 e~1 + ... + xn e~n .
The rule for multiplying a matrix [T ] by a vector ~v is equivalent to
32. Inversion
A function f is invertible if it is 1-to-1 (injective) and onto (surjective). If
g is the inverse of f , then both g ◦ f and f ◦ g are the identity function.
How do we reconcile this observation with the existence of matrices that
have one-sided inverses?
Here are two simple examples that identify the problem.
33. Example - constructing the matrix of a linear transformation
Here is what we know about function f :
34. Invertibility of linear functions and of matrices (proof 1.3, Hubbard, propo-
sition 1.3.14)
Since the key issue in this proof is the subtle distinction between a linear
function T and the matrix [T ] that represents it, it is a good idea to use
* to denote matrix multiplication and ◦ to denote composition of linear
transformations.
It is also a good idea to use ~x for a vector in the domain of T and ~y for a
vector in the codomain of T
Suppose that linear transformation T : F n → F m is represented by the
m × n matrix [T ].
(a) Suppose that the matrix [T ] is invertible. Prove that the linear trans-
formation T is one-to-one and onto (injective and surjective), hence
invertible.
35. Application: graph theory
This is inspired by example 1.2.22 in Hubbard (page 51), but I have ex-
tended it by allowing one-way edges and multiple edges.
A graph has n vertices: think of them as islands. Given two vertices Vi and
Vj , there may be Ai,j edges (bridges or ferryboats) that lead from Vj to Vi
and Aj,i edges that lead from Vi to Vj . If a bridge is two-way, it counts
twice, but we allow one-way bridges.
The matrix

    (0  0  1  1)
A = (1  0  0  0)
    (1  0  0  0)
    (0  1  1  0)

corresponds to the following directed graph:
Clearly A is a matrix, and it describes the graph completely. The challenge
is to associate it with a linear transformation and to interpret its columns
as vectors.
Suppose you are a travel agent and you keep a notebook with a complete list
of all the ways that you have found to reach each island. So one component,
xj , would count the number of ways that you have found to reach island j.
A standard basis vector like e~j describes a notebook that has one way of
reaching island j (land at the airport?) and no way of reaching any other
islands.
It is always worth asking what (if anything) the operations of addition and
scalar multiplication mean. Addition is tricky: in general, it would have
to correspond to two different agents combining their notebooks, with no
attempt to weed out duplicates. Multiplication by a non-integer makes no
sense.
What about A e~j ? This is the jth column of A and its ith component is
Ai,j , the number of edges leading from Vj to Vi . (Hubbard has chosen the
opposite convention in Exercises 1.2.20 and 1.2.22, but for his example the
matrix is symmetric and it makes no difference.) It is an annoying feature
of matrix notation that the row index comes first, since we choose a column
first and then consider its entries.
Now consider a vector ~v = (x1 , x2 , . . . , xn ) whose entries are arbitrary
non-negative integers. After traversing one more edge, the number of walks
that lead to vertex Vi is

Σ_{j=1}^n Ai,j xj .
This is a linear function, and we see that the vector A~v represents the
number of distinct ways of reaching each island after extending the existing
list of walks by following one extra edge wherever possible.
If you start on island Vj and make a walk of n steps, then the number of
distinct walks leading to each island is specified by the components of the
vector An e~j .
Hubbard does the example of a cube, where all edges are two-way.
For the four-island graph, with

    (0  0  1  1)
A = (1  0  0  0)
    (1  0  0  0)
    (0  1  1  0),

use matrix multiplication to find A2 = A·A and then A3 = A2 ·A.
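A sketch of the computation in Python/NumPy (illustrative only; entry (i, j) of A^n counts the n-step walks from island j to island i):

```python
import numpy as np

A = np.array([[0, 0, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [0, 1, 1, 0]])

A2 = A @ A    # two-step walks
A3 = A2 @ A   # three-step walks
print(A2)
print(A3)
```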
36. Application: Markov processes
This is inspired by example 1.2.21 in Hubbard, but in my opinion he breaks
his own excellent rule by using a “line matrix” to represent probabilities.
The formulation below uses a column vector.
Think of a graph where the vertices represent “states” of a random process.
A state could, for example, be
All edges are one way, and attached to each edge is a number in [0,1], the
“transition probability” of following that edge in one step of the process.
The sum of the probabilities on all the edges leading out of a state cannot
exceed 1, and if it is less than 1 there is some probability of remaining in
that state.
Examples: write at least one column of the matrix for each case.
(a) If you are on Oahu, the probability of flying to Maui is 0.2, and the
probability of flying to Lanai is 0.1. Otherwise you stay put.
(b) Badminton: if player 1 serves, the probability of losing the point and
the serve is 0.2. If player 2 serves, the probability of losing the point
and the serve is 0.3.
(c) If John Hubbard’s reference books are on the shelf in the order (2,1,3),
the probability that he consults book 3 and places it at the left to make
the order (3,2,1) is P3 .
(d) Roulette: after starting with 2 chips and betting a chip on red, the
probability of having 3 chips is 9/19 and the probability of having 1 chip
is 10/19. (In a fair casino, each probability would be 1/2.)
What matrix represents the transition resulting from two successive points?
(0.8  0.3) (0.8  0.3)
(0.2  0.7) (0.2  0.7) =

What matrix represents the transition resulting from four successive points?

(0.7  0.45) (0.7  0.45)
(0.3  0.55) (0.3  0.55) =
If you raise the transition matrix A to a high power, you might conjecture
that after a long time the probability that player 1 is serving is 0.6, no
matter who served first.
In support of this conjecture, show that the matrix

A∞ = (0.6  0.6)
     (0.4  0.4)

has the property that AA∞ = A∞ .
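A quick numerical check of both claims (a Python/NumPy sketch, illustrative only):

```python
import numpy as np

# Badminton transition matrix: column 1 applies when player 1 serves.
A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

A_inf = np.array([[0.6, 0.6],
                  [0.4, 0.4]])
print(np.allclose(A @ A_inf, A_inf))   # True: A * A_inf = A_inf

# Raising A to a high power: every column approaches (0.6, 0.4).
A_high = np.linalg.matrix_power(A, 50)
print(np.allclose(A_high, A_inf))      # True
```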
3 Group Problems
1. Some short proofs
Once your group has solved its problem, use a cell phone to take a picture
of your solution, and upload it to the topic box for your section on the
Week 1 page of the Web site.
(a) When we say that a matrix A is invertible, we mean that it has both
a right inverse and a left inverse. Prove that the right inverse and the
left inverse are equal, and that the inverse is unique.
If you need a hint, see page 48 of Hubbard.
Illustrate your answer by writing down the inverse B of the matrix

A = (3  2)
    (2  4),

where all the entries are in the finite field Z5 , and showing
that both AB and BA are equal to the identity matrix.
Since you are working in a finite field, there are no fractions. In Z5 ,
dividing by 3 is the same as multiplying by 2.
(b) Here are two well-known laws of arithmetic that are not on the list of
field axioms. They do not need to be listed as axioms because they are
provable theorems! In each case, the trick is to start with an identity
that is valid in any field, then apply the distributive law. You should
be able to justify each step of your proof by reference to one or more
of the field axioms.
Starting with 0 + 0 = 0, prove that 0a = 0 for any a ∈ F .
Starting with −1 + 1 = 0, prove that (−1)a = −a for any a ∈ F .
(c) Prove that composition of functions, whether linear or not, is associa-
tive. Illustrate your proof by using the functions
f (x) = x2 , g(x) = ex , h(x) = 3 log x (natural logarithms)
and computing both f ◦ (g ◦ h) and (f ◦ g) ◦ h
Then use your result to give a one-line proof that matrix multiplication
must be associative. See Hubbard, page 63.
2. Matrices and linear functions
3. Problems to be solved by writing or editing R scripts
Upload your answer immediately to the Week 1 page of the course Web
site. Then your classmates can try out your script.
4 Homework
(PROBLEM SET 1 - due on Tuesday, September 9 by 11:59 PM)
Problems 1-7 should be done on paper and placed in the locked box near
Science Center 209 that has the name of your Monday section instructor on it.
Problems 8 and 9 should be done in a single R script and uploaded to the
dropbox on the Week 1 page of the course Web site.
1. Prove the following, using only the field axioms and the results of group
problem 1(b).
2. Function composition
( Hubbard, exercise 0.4.10.)
Prove the following:
This problem asks you to prove two results that we will use again and again.
All you need to do is to use the definitions of “one-to-one” and “onto.”
Here are some strategies that may be helpful:
3. Hubbard, exercise 1.2.2, parts (a) and (e) only. Do part (a) in the field
R, and do part (e) in the field Z7 , where -1 is the same as 6. Check your
answer in (e) by doing the calculation in two different orders: according to
the associative law these should give the same answer. See Hubbard, figure
1.2.5, for a nice way to organize the calculation.
7. (a) Suppose that T is linear and that T ((3, 2)) = (6, 8) and T ((2, 1)) = (5, 5).
Use the linearity of T to determine T ((1, 0)) and T ((0, 1)), and thereby de-
termine the matrix [T ] that represents T . (This brute-force approach
works fine in the 2 × 2 case but not in the n × n case.)
(b) Express the given information about T from part (a) in the form
[T ][A] = [B], and determine the matrix [T ] that represents T by using
the matrix [A]−1 . (This approach will work in the general case once
you know how to invert an n × n matrix .)
The last two problems require R scripts. It is fine to copy and edit similar
scripts from the course Web site, but it is unacceptable to copy and edit
your classmates’ scripts!
(a) Draw a diagram to show the four regions and their monorail connec-
tions.
(b) Construct the 4 × 4 transition matrix A for this graph of four vertices.
(c) Using matrix multiplication in R, determine how many different se-
quences of four monorail rides start in Tibet and end in the Middle
Kingdom.
MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #1, Week 2 (Dot and Cross Products, Euclidean Geometry of Rn )
Reading
• 2.2 For a 3 × 3 matrix A, define det(A) in terms of the cross and dot
products of the columns of the matrix. Then, using the definition of matrix
multiplication and the linearity of the dot and cross products, prove that
det(AB) = det(A) det(B).
R Scripts Scripts labeled A, B, ... are closely tied to the Executive Summary.
Scripts labeled X, Y, ... are interesting examples. There is a narrated version on
the Web site. Scripts labeled L are library scripts that you may wish to include
in your own scripts.
• Script 1.2A-LengthDotAngle.R
Topic 1 - Length, Dot Product, Angles
Topic 2 - Components of a vector
Topic 3 - Angles in Pythagorean triangles
Topic 4 - Vector calculation using components
• Script 1.2B-RotateReflect.R
Topic 1 - Rotation matrices
Topic 2 - Reflection matrices
• Script 1.2C-ComplexConformal.R
Topic 1 - Complex numbers in R
Topic 2 - Representing complex numbers by 2x2 matrices
• Script 1.2D-CrossProduct.R
Topic 1 - Algebraic properties of the cross product
Topic 2 - Geometric properties of the cross product
Topic 3 - Using cross products to invert a 3x3 matrix
• Script 1.2E-DeterminantProduct.R
Topic 1 - Product of 2x2 matrices
Topic 2 - Product of 3x3 matrices
• Script 1.2L-VectorLibrary.R
Topic 1 - Some useful angles and basis vectors
Topic 2 - Functions for working with angles in degrees
• Script 1.2X-Triangle.R
Topic 1 - Generating and displaying a randomly generated triangle
Topic 2 - Checking some formulas of trigonometry
• Script 1.2Y-Angles3D.R
Topic 1 - Angles between vectors in R3
Topic 2 - Angles and distances in a cube
Topic 3 - Calculating the airline mileage between cities
1 Executive Summary
1.1 The dot product
The dot product of two vectors in Rn is ~x · ~y = x1 y1 + x2 y2 + · · · + xn yn = Σ_{i=1}^n xi yi .
• It requires two vectors and returns a scalar.
• It is commutative and it is distributive with respect to addition.
• In R2 or R3 , the dot product of a vector with itself (a concept of algebra)
is equal to the square of its length (a concept of geometry):
~x · ~x = |~x|²
• Taking the dot product with any standard basis vector e~i extracts the cor-
responding component:
~x · e~i = xi
• Taking the dot product with any unit vector ~a (not necessarily a basis
vector) extracts the component of ~x along ~a:
~x · ~a = xa
This means that the difference ~x − xa~a is orthogonal to ~a.
Consider the triangle whose sides lie along the vectors ~x (length a), ~y (length b),
and ~x − ~y (length c). Let α denote the angle between the vectors ~x and ~y.
By the distributive law,
(~x − ~y) · (~x − ~y) = ~x · ~x + ~y · ~y − 2~x · ~y =⇒ c² = a² + b² − 2~x · ~y
Comparing with the law of cosines, we find that angles and dot products are
related by:
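The resulting relation is ~x · ~y = |~x||~y| cos α, which can be spot-checked numerically (a Python/NumPy sketch, illustrative only):

```python
import numpy as np

# cos(alpha) = (x . y) / (|x| |y|)
x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

cos_alpha = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
alpha = np.degrees(np.arccos(cos_alpha))
print(alpha)  # ~45 degrees: the angle between e1 and the diagonal (1, 1)
```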
1.3 Cauchy-Schwarz inequality
The dot product provides a way to extend the definition of length and angle for
vectors to Rn , but now we can no longer invoke Euclidean plane geometry to
guarantee that | cos α| ≤ 1.
We need to show that for any vectors ~v and ~w in Rn ,

|~v · ~w| ≤ |~v||~w|
1.4 The triangle inequality

Consider the triangle with sides along ~x, ~y, and ~x + ~y.
We need to show that the length of ~x + ~y cannot exceed the sum of the lengths
of the other two sides:
|~x + ~y|2 = (~x + ~y) · (~x + ~y) = (~x + ~y) · ~x + (~x + ~y) · ~y
Applying Cauchy-Schwarz to each term on the right-hand side, we have:
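Both inequalities are easy to spot-check numerically; here is a sketch in Python/NumPy with arbitrary vectors in R5 (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.standard_normal(5)
w = rng.standard_normal(5)

cauchy_schwarz = abs(v @ w) <= np.linalg.norm(v) * np.linalg.norm(w)
triangle = np.linalg.norm(v + w) <= np.linalg.norm(v) + np.linalg.norm(w)
print(cauchy_schwarz, triangle)  # True True
```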
1.5 Isometries of R2
A linear transformation T : R2 → R2 is completely specified by its effect on the
basis vectors ~e1 and ~e2 . These vectors are the two columns of the matrix that
represents T . If you know what a transformation is supposed to do to each basis
vector, you can simply use this information to fill out the necessary columns of
its matrix representation.
T~a · T ~b = ~a · ~b
For the matrix associated with an isometry, both columns must be unit vectors
and their dot product is zero.
Two isometries:

• A rotation, R(θ) = (cos θ  − sin θ)
                     (sin θ    cos θ),  with det R = +1.

• A reflection, F (θ) = (cos 2θ    sin 2θ)
                        (sin 2θ  − cos 2θ),  with det F = −1.
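A sketch in Python/NumPy (illustrative only) checking both isometry criteria for each family: the columns are orthonormal, so lengths are preserved, and the determinants are +1 and −1.

```python
import numpy as np

def R(theta):
    # Rotation through angle theta.
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def F(theta):
    # Reflection in the line at angle theta to the first basis vector.
    return np.array([[np.cos(2 * theta),  np.sin(2 * theta)],
                     [np.sin(2 * theta), -np.cos(2 * theta)]])

theta = 0.6
for M, expected_det in ((R(theta), 1.0), (F(theta), -1.0)):
    assert np.allclose(M.T @ M, np.eye(2))   # orthonormal columns
    assert np.isclose(np.linalg.det(M), expected_det)
print("both matrices are isometries")
```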
1.6 Matrices and algebra: complex numbers
The same field axioms we reviewed on the first day apply here to the complex
numbers, notated C.
The real and imaginary parts of a complex number can be used as the two
components of a vector in R2 . The rule for addition of complex numbers is the
same as the rule for addition of vectors in R2 (real and imaginary parts add
separately), and the modulus of a complex number is the same as the length
of the vector that represents it. So the triangle inequality applies for complex
numbers: |z1 + z2 | ≤ |z1 | + |z2 |.
This property extends to vector spaces over complex numbers.
1.9 The cross product
          (a1)   (b1)   (a2 b3 − a3 b2)
~a × ~b = (a2) × (b2) = (a3 b1 − a1 b3)
          (a3)   (b3)   (a1 b2 − a2 b1)
Properties
1. ~a × ~b = −~b × ~a.
2. ~a × ~a = ~0.
4. For the standard basis vectors, e~i × e~j = e~k if i, j and k are in cyclic
increasing order (123, 231, or 312). Otherwise e~i × e~j = −e~k .
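These properties, together with the identity |~a × ~b|² = |~a|²|~b|² − (~a · ~b)², can be spot-checked numerically (a Python/NumPy sketch, illustrative only):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([-1.0, 0.0, 4.0])

axb = np.cross(a, b)
assert np.allclose(np.cross(b, a), -axb)                  # property 1
assert np.allclose(np.cross(a, a), 0)                     # property 2
assert np.isclose(a @ axb, 0) and np.isclose(b @ axb, 0)  # orthogonal to both
assert np.isclose(axb @ axb, (a @ a) * (b @ b) - (a @ b) ** 2)
print(axb)  # [ 8. -7.  2.]
```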
1. det(A) changes sign if you interchange any two columns. (easiest to prove
for columns 1 and 2, but true for any pair)
2 Lecture Outline
1. Introducing coordinates:
2. The dot product:
~x · ~y = x1 y1 + x2 y2 + · · · + xn yn
It has the following properties. The proof of the first four (omitted) is
brute-force computation.
• Commutative law:
~x · ~y = ~y · ~x
• Distributive law:
~x · (~y + ~z) = ~x · ~y + ~x · ~z
• Taking the dot product with a standard basis vector e~i extracts the
corresponding component:
~x · e~i = xi
• Taking the dot product with any unit vector ~a (not necessarily a basis
vector) extracts the component of ~x along ~a:
~x · ~a = xa
This means that the difference ~x − xa~a is orthogonal to ~a.
Proof: Orthogonality of two vectors means that their dot product is
zero. So to show orthogonality, evaluate
3. Dot products and angles
From elementary trigonometry we have the law of cosines, usually written
c² = a² + b² − 2ab cos α.
In this formula, c denotes the length of the side opposite angle α. Just in
case you forgot the proof, let’s review it.
Expand the dot product using the distributive law, and you can identify
one of the terms as 2ab cos α.
4. Cauchy-Schwarz inequality
The dot product provides a way to extend the definition of length and
angle for vectors to Rn , but now we can no longer invoke Euclidean plane
geometry to guarantee that | cos α| ≤ 1.
We need to show that for any vectors ~v and ~w in Rn ,

|~v · ~w| ≤ |~v||~w|

Consider the function
f (t) = |t~v − ~w|² = (t~v − ~w) · (t~v − ~w)
So we have a useful definition of angle for vectors in Rn in general:

α = arccos( (~v · ~w) / (|~v||~w|) )
6. Proof 2.1 – start to finish, done in a slightly different way
Given vectors ~v and ~w in Euclidean Rn , prove that |~v · ~w| ≤ |~v||~w| (Cauchy-
Schwarz) and that |~v + ~w| ≤ |~v| + |~w| (triangle inequality). Use the dis-
tributive law for the scalar product and the fact that no vector has negative
length.
7. Some short proofs that use the dot product:
(a) A triangle is formed by using vectors ~x and ~y, both anchored at one
vertex. The vectors are labeled so that the longer one is called ~x: i.e.
|~x| > |~y|. The vector ~x −~y then lies along the third side of the triangle.
Prove that
|~x − ~y| ≥ |~x| − |~y|.
(b) Prove that the dot product of vectors ~x and ~y can be expressed solely
in terms of lengths of vectors. It follows that an isometry, which by
definition preserves lengths of all vectors, also preserves dot products
and angles.
(c) A parallelogram has sides with lengths a and b. Its diagonals have
lengths c and d. Prove the “parallelogram law,” which states that
c² + d² = 2(a² + b²).
8. Calculating angles and areas
Let ~v1 = (−2, 2, −1), ~v2 = (−4, 1, 1).

Both these vectors happen to be perpendicular to the vector ~v3 = (1, 2, 2).
9. Isometries of R2 .
A linear transformation T : R2 → R2 is completely specified by its effect
on the basis vectors ~e1 and ~e2 . These vectors are the two columns of the
matrix that represents T .
Of special interest are “isometries:” transformations that preserve the dis-
tance between any pair of points, and hence the length of any vector.
Since
4~a · ~b = |~a + ~b|² − |~a − ~b|²,
dot products can be expressed in terms of lengths, and any isometry also
preserves dot products.
Prove this useful identity.
• A rotation,
  R(θ) = (cos θ  − sin θ)
         (sin θ    cos θ),
  which has det R = 1.
• A reflection,
  F (θ) = (cos 2θ    sin 2θ)
          (sin 2θ  − cos 2θ),
  which has det F = −1.
This represents reflection in a line through the origin that makes an
angle θ with the first basis vector.
10. Using matrices to represent rotations and reflections
11. Complex numbers as vectors and as matrices
The field axioms that you learned on the first day apply also to the complex
numbers, notated C.
The real and imaginary parts of a complex number can be used as the two
components of a vector in R2 . The rule for addition of complex numbers
is the same as the rule for addition of vectors in R2 , and the modulus of a
complex number is the same as the length of the vector that represents it.
So the triangle inequality applies for complex numbers: |z1 +z2 | ≤ |z1 |+|z2 |.
This property extends to vector spaces over complex numbers.
The geometrical interpretation of multiplication by a complex number
z = a + ib = reiθ is multiplication of the modulus by r combined with
addition of θ to the angle with the x-axis.
This is precisely the geometrical effect of the linear transformation repre-
sented by the matrix
(a  −b)   (r cos θ  −r sin θ)
(b   a) = (r sin θ   r cos θ)

Such a matrix is the product of the constant matrix

(r  0)
(0  r)

and the rotation matrix

(cos θ  − sin θ)
(sin θ    cos θ).
It is called a conformal matrix and it preserves angles even though it
does not preserve lengths.
Example: Compute the product of the complex numbers 2 + i and 3 + i by
using matrix multiplication.
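A sketch of this example in Python/NumPy (illustrative only, taking the second factor to be 3 + i). Each complex number a + bi becomes the conformal matrix with columns (a, b) and (−b, a), and the matrix product represents the complex product (2 + i)(3 + i) = 5 + 5i.

```python
import numpy as np

def conformal(a, b):
    # The 2x2 matrix that represents the complex number a + b*i.
    return np.array([[a, -b],
                     [b,  a]])

Z1 = conformal(2, 1)   # 2 + i
Z2 = conformal(3, 1)   # 3 + i
P = Z1 @ Z2
print(P)  # the conformal matrix of 5 + 5i

# Compare with ordinary complex multiplication:
assert (2 + 1j) * (3 + 1j) == 5 + 5j
assert (P == conformal(5, 5)).all()
```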
12. Complex numbers as a field of matrices
In general, matrices do not form a field because multiplication is not com-
mutative. There are two notable exceptions: n × n matrices that are mul-
tiples of the identity matrix and 2 × 2 conformal matrices. Since multiples
of the identity matrix and rotations all commute, the product of two
conformal matrices

(a  −b)     (c  −d)
(b   a) and (d   c)

is the same in either order.
13. Cross products:
We can now prove these without messy calculations involving components.
Justify each step, using properties of the dot product and properties (a)
through (f) from the preceding page.
~v · ~a × ~b = s~a · ~a × ~b + t~b · ~a × ~b
~v · ~a × ~b = s~a · ~a × ~b − t~b · ~b × ~a
~v · ~a × ~b = s~a × ~a · ~b − t~b × ~b · ~a
~v · ~a × ~b = 0 − 0 = 0.
• |~a × ~b|2 = |~a|2 |~b|2 − (~a · ~b)2
Proof:
|~a × ~b|2 = (~a × ~b) · (~a × ~b)
15. Cross products and determinants.
16. Determinants in R3
Here is our definition:
If a 3 × 3 matrix A has columns ~a1 , ~a2 , and ~a3 , then its determinant
det(A) = ~a1 × ~a2 · ~a3 .
Apply this definition to the matrix A = [1 0 1; 2 1 2; 0 1 0].
(a) det(A) changes sign if you interchange any two columns. (easiest to
prove for columns 1 and 2, but true for any pair)
(b) det(A) is a linear function of each column (easiest to prove for column
3, but true for any column)
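The definition and property (a) are easy to check numerically. A minimal Python sketch (illustrative only; the course's own scripts are in R):

```python
def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def det3(a1, a2, a3):
    # det(A) = a1 x a2 . a3 for the matrix with columns a1, a2, a3.
    return dot(cross(a1, a2), a3)

# Columns of A = [1 0 1; 2 1 2; 0 1 0]:
a1, a2, a3 = (1, 2, 0), (0, 1, 1), (1, 2, 0)
print(det3(a1, a2, a3))  # 0 -- columns 1 and 3 are equal
# Property (a): interchanging two columns changes the sign.
print(det3(a2, a1, a3) == -det3(a1, a2, a3))  # True
```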
17. Determinants, triple products, and geometry
The magnitude of ~a × ~b · ~c is equal to the volume of the parallelepiped
spanned by ~a, ~b and ~c.
Proof: |~a × ~b| is the area of the base of the parallelepiped, and |~c| cos α,
where α is the angle between ~c and the direction orthogonal to the base, is
its height.
Matrix A maps the unit cube, spanned by the three basis vectors, into a
parallelepiped whose volume is | det(A)|. You can think of | det(A)| as a
“volume stretching factor.” This interpretation will underlie much of the
theory for change of variables in multiple integrals, a major topic in the
spring term.
If three vectors in R3 all lie in the same plane, the cross product of any
two of them, which is orthogonal to that plane, is orthogonal to the third
vector, so ~v1 × ~v2 · ~v3 = 0.
Apply this test to ~v1 = [1; 0; 1], ~v2 = [1; 2; 0], ~v3 = [3; 2; 2].
If four points in R3 all lie in the same plane, the vectors that join any one
of the points to each of the other three points all lie in that plane. Apply
this test to p = [1; 1; 1], q = [2; 1; 2], r = [2; 3; 1], s = [4; 3; 3].
18. Determinants and matrix multiplication
~c1 = A~b1 = A(Σi bi,1~ei ) = Σi bi,1 A(~ei ) = Σi bi,1~ai , where each sum runs over i = 1, 2, 3.
det C = (Σi bi,1~ai ) × (Σj bj,2~aj ) · (Σk bk,3~ak )
Now use the distributive law for dot and cross products.
det C = Σi Σj Σk bi,1 bj,2 bk,3 (~ai × ~aj · ~ak )
There are 27 terms in this sum, but all but six of them involve two subscripts
that are equal, and these are zero because a triple product with two equal
vectors is zero.
The six that are not zero all involve ~a1 × ~a2 · ~a3 , three with a plus sign and
three with a minus sign. So
det C = f (B)(~a1 × ~a2 · ~a3 ) = f (B) det(A), where f (B) is some messy
function of products of all the entries of B.
This formula is valid for any A. In particular, it is valid when A is the
identity matrix, C = B, and det(A) = 1.
So det B = f (B) det(I) = f (B)
and the messy function is the determinant!
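The conclusion det(AB) = det(A) det(B) can be tested numerically with the triple-product definition. An illustrative Python sketch (the course's scripts are in R):

```python
def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def col(M, j):
    return tuple(M[i][j] for i in range(3))

def det3(M):
    # Triple-product definition: det(M) = m1 x m2 . m3 for columns m1, m2, m3.
    return dot(cross(col(M, 0), col(M, 1)), col(M, 2))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

A = [[2, 1, 3], [1, -1, 1], [1, 1, 2]]
B = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]
print(det3(matmul(A, B)), det3(A) * det3(B))  # -3 -3
```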
19. Proof 2.2 – start to finish
For a 3 × 3 matrix A, define det(A) in terms of the cross and dot prod-
ucts of the columns of the matrix. Then, using the definition of matrix
multiplication and the linearity of the dot and cross products, prove that
det(AB) = det(A) det(B).
20. Isometries of R2 .
A linear transformation T : R2 → R2 is completely specified by its effect
on the basis vectors ~e1 and ~e2 . These vectors are the two columns of the
matrix that represents T .
Of special interest are “isometries:” transformations that preserve the dis-
tance between any pair of points, and hence the length of any vector.
Since
4~a · ~b = |~a + ~b|2 − |~a − ~b|2 ,
dot products can be expressed in terms of lengths, and any isometry also
preserves dot products.
Prove this useful identity.
• A rotation,
R(θ) = [cos θ  −sin θ; sin θ  cos θ],
which has det R = 1.
• A reflection,
F (θ) = [cos 2θ  sin 2θ; sin 2θ  −cos 2θ],
which has det F = −1.
This represents reflection in a line through the origin that makes an
angle θ with the first basis vector.
21. Calculations with cross products
22. Transposes and dot products
Start by proving in general that (AB)T = B T AT . This is a statement about
matrices, and you have to prove it by brute force.
~v · ~w = ~vT ~w.
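Both claims can be confirmed by direct computation. A brief Python sketch for illustration (the matrices here are arbitrary examples, not from the text):

```python
def transpose(M):
    return [list(r) for r in zip(*M)]

def matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

A = [[1, 2, 0], [3, -1, 4]]      # 2 x 3
B = [[2, 1], [0, 5], [1, -2]]    # 3 x 2
# (AB)^T = B^T A^T:
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))  # True

# The dot product as a matrix product: v . w = v^T w, a 1 x 1 matrix.
v = [[1], [2], [3]]
w = [[4], [5], [6]]
print(matmul(transpose(v), w))  # [[32]]
```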
23. Orthogonal matrices
If a matrix R represents an isometry, then each column is a unit vector and
the columns are orthogonal. Since the columns of R are the rows of RT we
can express this property as
RT R = I
Perhaps a nicer way to express this condition for a matrix to represent an
isometry is RT = R−1 . Check that this is true for the 2 × 2 matrices that
represent rotations and reflections.
For a rotation matrix
R(θ) = [cos θ  −sin θ; sin θ  cos θ].
24. Isometries and cross products
Many vectors of physical importance (torque, angular momentum, magnetic
field) are defined as cross products, so it is useful to know what happens to
a cross product when an isometry is applied to each vector in the product.
Consider the matrix whose columns are R~u, R~v, and ~w. Its determinant is
R~u × R~v · ~w.
Multiply this matrix by RT to get a matrix whose columns are
RT R~u, RT R~v, and RT ~w. In the process you multiply the determinant by
det(RT ) = det(R).
Now, since RT R = I for an isometry, ~u × ~v · RT ~w = det(R) R~u × R~v · ~w.
Equivalently, R(~u × ~v) · ~w = det(R) R~u × R~v · ~w.
Since this is true for any ~w, in particular for any basis vector, it follows
that
R(~u × ~v) = det(R) R~u × R~v
If R is a rotation, then det(R) = 1 and R(~u × ~v) = R~u × R~v
If R is a reflection, then det(R) = −1 and R(~u × ~v) = −R~u × R~v
This is reasonable. Suppose you are watching a physicist in a mirror as she
calculates the cross product of two vectors. You see her apparently using
a left-hand rule and think that she has got the sign of the cross-product
wrong.
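Both cases of the identity can be checked with concrete matrices. An illustrative Python sketch, using a 90-degree rotation and a reflection in the xy-plane so the arithmetic stays exact (these specific matrices are our choices, not from the text):

```python
def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def matvec(M, x):
    return tuple(sum(M[i][j] * x[j] for j in range(3)) for i in range(3))

R = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]   # rotation by 90 degrees about the z-axis, det = +1
F = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]   # reflection in the xy-plane, det = -1

u, v = (1, 2, 3), (4, 5, 6)
# Rotation: R(u x v) = Ru x Rv.
print(matvec(R, cross(u, v)) == cross(matvec(R, u), matvec(R, v)))  # True
# Reflection: F(u x v) = -(Fu x Fv).
neg = tuple(-c for c in cross(matvec(F, u), matvec(F, v)))
print(matvec(F, cross(u, v)) == neg)  # True
```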
25. Using cross products to invert a 3 × 3 matrix
Thinking about transposes also leads to a formula for the inverse of a 3 × 3
matrix in terms of cross products. Suppose that matrix A has columns
~a1 , ~a2 , and ~a3 . Form the vector ~s1 = ~a2 × ~a3 .
This is orthogonal to ~a2 and ~a3 , and its dot product with ~a1 is det(A).
Similarly, the vector ~s2 = ~a3 × ~a1
is orthogonal to ~a3 and ~a1 , and its dot product with ~a2 is det(A).
Finally, the vector ~s3 = ~a1 × ~a2
is orthogonal to ~a1 and ~a2 , and its dot product with ~a3 is det(A).
So if you form these vectors into a matrix S and take its transpose,
S T A = det(A)I.
If det A = 0, A has no inverse. Otherwise
A−1 = ST /det(A).
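This formula translates directly into code. A Python sketch for illustration (the course works in R; exact fractions are used here so the check is exact):

```python
from fractions import Fraction

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def inverse3(A):
    """A^-1 = S^T / det(A), where the columns of S are a2 x a3, a3 x a1, a1 x a2."""
    a1, a2, a3 = (tuple(A[i][j] for i in range(3)) for j in range(3))
    s1, s2, s3 = cross(a2, a3), cross(a3, a1), cross(a1, a2)
    d = dot(s1, a1)                     # det(A) = a1 . (a2 x a3)
    if d == 0:
        raise ValueError("det(A) = 0: no inverse")
    # The rows of S^T are s1, s2, s3; divide each entry by det(A).
    return [[Fraction(x, d) for x in s] for s in (s1, s2, s3)]

A = [[2, 1, 3], [1, -1, 1], [1, 1, 2]]
print(inverse3(A))  # rows (3, -1, -4), (1, -1, -1), (-2, 1, 3); here det A = -1
```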
3 Group Problems
1. Dot products, angles, and isometries
(a) Making the reasonable assumption that a rotation through angle 2α can
be accomplished by making two successive rotations through angle α,
use matrix multiplication to derive the double-angle formulas for the
sine and cosine functions.
(b) Consider a parallelogram spanned by vectors ~v and w ~ . Using the dot
product, prove that it is a rhombus if and only if the diagonals are
perpendicular and that it is a rectangle if and only if the diagonals are
equal in length.
(c) A parallelogram is spanned by two vectors that meet at a 60 degree
angle, one of which is twice as long as the other. Find the ratio of the
lengths of the diagonals and the cosine of the acute angle between the
diagonals. Confirm that the parallelogram law holds in this case.
3. Problems that involve writing or editing R scripts
(a) Construct a triangle where vector AB has length 5 and is directed east,
while vector AC has length 10 and is directed 53 degrees north of east.
On side BC, construct point D that is 1/3 of the way from B to C.
Using dot products, confirm that the vector AD bisects the angle at
A.
This is a special case of Euclid’s Elements, Book VI, Proposition 3.
(b) You are playing golf, and the hole is located 350 yards from the tee in
a direction 18 degrees south of east. You hit a tee shot that travels 220
yards 14 degrees south of east, followed by an iron shot that travels
150 yards 23 degrees south of east. How far from the hole is your golf
ball now located?
(c) Generate a triangle using the function in the vector library 1.2L-
VectorLibrary.R, then apply to each vertex of this triangle the con-
formal matrix C that corresponds to the complex number −1.2 + 1.6i.
Plot the triangle before and after C is applied, and confirm that these
triangles are similar but not congruent.
4 Homework
In working on these problems, you may collaborate with classmates and consult
books and general online references. If, however, you encounter a posted solution
to one of the problems, do not look at it, and email Paul, who will try to get it
removed.
2. One vertex of a quadrilateral in R3 is located at point p. The other three
vertices, going around in order, are located at q = p + ~a, r = p + ~b, and
s = p + ~c.
(a) Prove that ~v · A~w = ~vT A~w. (You can think of the right-hand side as
the product of three matrices.)
(b) Prove that ~v · A~w = AT ~v · ~w. You can do this by brute force using
summation notation, or you can do it by using part (a) and the rule
for the transpose of a matrix product (Theorem 1.2.17 in Hubbard).
(c) Now suppose that ~v and ~w are vectors in R3 and R is a 3 × 3 isometry
matrix. Prove that R~v · R~w = ~v · ~w. If you believe that physical laws
should remain valid when you rotate your experimental apparatus, this
result shows that dot products are appropriate to use in expressing
physical laws.
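A numerical sanity check of the claim in part (c), sketched in Python with an arbitrary rotation angle (illustrative only; not a proof):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(R, x):
    return tuple(sum(R[i][j] * x[j] for j in range(3)) for i in range(3))

t = 0.7                        # any angle
R = [[math.cos(t), -math.sin(t), 0.0],
     [math.sin(t),  math.cos(t), 0.0],
     [0.0, 0.0, 1.0]]          # rotation about the z-axis: an isometry
v, w = (1.0, 2.0, 3.0), (-2.0, 0.5, 1.0)
# The dot product is unchanged, up to floating-point roundoff.
print(abs(dot(matvec(R, v), matvec(R, w)) - dot(v, w)) < 1e-12)  # True
```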
5. Let R(θ) denote the 2×2 matrix that represents a counterclockwise rotation
about the origin through angle θ. Let F (α) denote the 2 × 2 matrix that
represents a reflection in the line through the origin that makes angle α with
the x axis. Using matrix multiplication and the trigonometric identities
sin (α + β) = sin α cos β + cos α sin β
cos (α + β) = cos α cos β − sin α sin β, prove the following:
The last two problems require R scripts. Feel free to copy and edit existing
scripts, including student solutions to group problem 3b, and to use the
library script 2l, which has functions for dealing with angles in degrees.
(a) You are playing golf and have made a good tee shot. Now the hole is
located only 30 yards from your ball, in a direction 32 degrees north
of east. You hit a chip shot that travels 25 yards 22 degrees north of
east, followed by a putt that travels 8 yards 60 degrees north of east.
How far from the hole is your golf ball now located? For full credit,
include a diagram showing the relevant vectors.
(b) The three-reflections theorem, whose proof was problem 5b, states that
if you reflect successively in lines that make angle α, β, and γ with
the x−axis, the effect is simply to reflect in a line that makes angle
α + γ − β with the x-axis. Confirm this, using R, for the case where
α = 40◦ , β = 30◦ , and γ = 80◦ . Make a plot in R to show where the
point P = (1, 0) ends up after each of the three successive reflections.
(a) Construct unit vectors in R3 that represent the positions of the three
cities.
(b) By computing angles between these vectors, compare the length in
kilometers of a nonstop flight with the length of a trip that stops
in Dublin. Remember that, by the original definition of the meter,
the distance from the North Pole to the Equator along the meridian
through Paris is 10,000 kilometers. (You may treat the Earth as a
sphere of unit radius.)
(c) Any city that is on the great-circle route from Boston to Naples has a
vector that lies in the same plane as the vectors for Boston and Naples.
Invent a test for such a vector (you may use either cross products or
determinants), and apply it to Dublin.
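The distance computation in part (b) amounts to an angle between unit vectors, scaled so a quarter circle is 10,000 km. Here is an illustrative Python sketch; the city coordinates are rough values assumed for the illustration, not given in the problem:

```python
import math

def unit_vector(lat_deg, lon_deg):
    """Unit vector in R^3 for a point at the given latitude and longitude."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def arc_km(u, v):
    """Arc length: the angle between unit vectors, scaled so that a quarter
    circle (North Pole to Equator) is 10,000 km."""
    angle = math.acos(max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v)))))
    return angle * 20000.0 / math.pi

# Rough coordinates, assumed for illustration only.
boston = unit_vector(42.4, -71.1)
naples = unit_vector(40.8, 14.3)
dublin = unit_vector(53.3, -6.3)

direct = arc_km(boston, naples)
via_dublin = arc_km(boston, dublin) + arc_km(dublin, naples)
print(round(direct), round(via_dublin))  # the stop in Dublin can only lengthen the trip
```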
MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #1, Week 3 (Row Reduction, Independence, Basis)
Reading
R Scripts
• Script 1.3A-RowReduction.R
Topic 1 - Row reduction to solve two equations, two unknowns
Topic 2 - Row reduction to solve three equations, three unknowns
Topic 3 - Row reduction by elementary matrices
Topic 4 - Automating row reduction in R
Topic 5 - Row reduction to solve equations in a finite field
• Script 1.3B-RowReductionApplications.R
Topic 1 - Testing for linear independence or dependence
Topic 2 - Inverting a matrix by row reduction
Topic 3 - Showing that a given set of vectors fails to span Rn
Topic 4 - Constructing a basis for the image and kernel
• Script 1.3C-OrthonormalBasis.R
Topic 1 - Using Gram-Schmidt to construct an orthonormal basis
Topic 2 - Making a new orthonormal basis for R3
Topic 3 - Testing the cross-product rule for isometries
• Script 1.3P-RowReductionProofs.R
Topic 1 - In Rn , n + 1 vectors cannot be independent
Topic 2 - In Rn , n − 1 vectors cannot span
Topic 3 - An invertible matrix must be square
1 Executive Summary
1.1 Row reduction for solving systems of equations
When you solve the equation A~v = ~b you combine the matrix A and the vector
~b into a single matrix. Here is a simple example.
x + 2y = 7, 2x + 5y = 16
Then A = [1 2; 2 5], ~v = [x; y], ~b = [7; 16], so that A~v = ~b exactly corresponds
to our system of equations. Our matrix of interest is therefore [1 2 | 7; 2 5 | 16].
First, subtract twice row 1 from row 2, then subtract twice row 2 from row 1
to get [1 0 | 3; 0 1 | 2].
Interpret the result as a pair of equations (remember what each column
corresponded to when we first appended A and ~b together): x = 3, y = 2.
The final form we are striving for is row-reduced echelon form, in which
• The leftmost nonzero entry in every row is a “pivotal 1.”
• Pivotal 1’s move to the right as you move down the matrix.
• A column with a pivotal 1 has 0 for all its other entries.
• Any rows with all 0’s are at the bottom.
The row-reduction algorithm converts a matrix to echelon form. Briefly,
1. SWAP rows, if necessary, so that the leftmost column that is not all zeroes
has a nonzero entry in the first row.
2. DIVIDE by this entry to get a pivotal 1.
3. SUBTRACT multiples of the first row from the others to clear out the rest
of the column under the pivotal 1.
4. Repeat these steps to get a pivotal 1 in the next row, with nothing but
zeroes elsewhere in the column (including in the first row). Continue until
the matrix is in echelon form.
A pivotal 1 in the final column indicates no solutions. Otherwise, a nonpivotal
column among the unknowns means that there are infinitely many solutions.
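The SWAP, DIVIDE, SUBTRACT steps above translate directly into code. A minimal Python sketch (the course's own scripts are in R; exact fractions avoid rounding):

```python
from fractions import Fraction

def rref(M):
    """Row-reduce M to echelon form, following the steps above."""
    M = [[Fraction(x) for x in row] for row in M]
    pivot_row = 0
    for j in range(len(M[0])):
        # SWAP: find a row at or below pivot_row with a nonzero entry in column j.
        k = next((i for i in range(pivot_row, len(M)) if M[i][j] != 0), None)
        if k is None:
            continue
        M[pivot_row], M[k] = M[k], M[pivot_row]
        # DIVIDE by the entry to get a pivotal 1.
        M[pivot_row] = [x / M[pivot_row][j] for x in M[pivot_row]]
        # SUBTRACT multiples of the pivot row from every other row.
        for i in range(len(M)):
            if i != pivot_row and M[i][j] != 0:
                M[i] = [a - M[i][j] * b for a, b in zip(M[i], M[pivot_row])]
        pivot_row += 1
        if pivot_row == len(M):
            break
    return M

# The example above: x + 2y = 7, 2x + 5y = 16.
print(rref([[1, 2, 7], [2, 5, 16]]))  # [[1, 0, 3], [0, 1, 2]] -> x = 3, y = 2
```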
1.2 Row reduction by elementary matrices
Each basic operation in the row-reduction algorithm for a matrix A can be
achieved by multiplication on the left by an appropriate invertible elementary
matrix.
1.3 Row reduction for determining linear independence
Given a set of elements such as {a1 , a2 , a3 , a4 }, a linear combination is the name
given to any arbitrary sum of scalar multiples of those elements. For instance:
a1 − 2a2 + 4a3 − 5a4 is a linear combination of the above set.
Given some set of vectors, we describe the set as linearly independent if
none of the vectors can be written as a linear combination of the others. Similarly,
we describe the set as linearly dependent if one or more of the vectors can be
written as a linear combination of the others.
A subspace is a set of vectors (usually an infinite number of them) that is
closed under addition and scalar multiplication. “Closed” means that the sum
of any two vectors in the set is also in the set and any scalar multiple of a vector
in the set is also in the set. A subspace of F n is the set of all possible linear
combinations of some set of vectors. This set of vectors is said to span or to
generate the subspace.
A subspace W ⊂ F n has the following properties:
1. The element ~0 is in W .
2. For any elements ~v and ~w in W , the sum ~v + ~w is also in W .
3. For any element ~v in W and any scalar c in F , the element c~v is also in W .
A basis for a vector space V is a set of vectors with the following equivalent
properties:
a) It is a maximal linearly independent set: it is linearly independent, and
adding any other vector of V to the set makes it linearly dependent.
b) It is a minimal spanning set: it spans V , but if you remove any vector from
this set, it will no longer span V .
The number of elements in a basis for a given vector space is called the dimension
of the vector space. A subspace has at most the same dimension as the space of
which it is a subspace.
By creating a matrix whose columns are the vectors in a set and row reducing,
we can find a maximal linearly independent subset, namely the columns that
become columns with pivotal 1’s. Any column that becomes a column without a
pivotal 1 is a linear combination of the columns to its left.
1.4 Finding a vector outside the span
To show that a set of vectors {~v1 , ~v2 , · · · , ~vk } does not span F n , we must exhibit
a vector ~w that is not a linear combination of the vectors in the given set.
• Create an n × k matrix A whose columns are the given vectors.
• Row-reduce this matrix, forming the product E of the elementary matrices
that accomplish the row reduction.
• If the original set of vectors spans F n , the row-reduced matrix EA will
have n pivotal columns. Otherwise it will have fewer than n pivotal 1s, and
there will be a row of zeroes at the bottom. If that is the case, construct
the vector ~w = E −1~en .
• Now consider what happens when you row reduce the matrix A|~w. The
last column will contain a pivotal 1. Therefore the vector ~w is independent
of the columns to its left: it is not in the span of the set {~v1 , ~v2 , · · · , ~vk }.
If k < n, then matrix A has fewer than n columns, so the matrix EA has
fewer than n pivotal columns and must have a row of zeroes at the bottom. It
follows that the vector ~w = E −1~en can be constructed and that a set of fewer
than n vectors cannot span F n .
1.6 Linearly independent rows
Hubbard (page 200) gives two arguments that the number of linearly independent
rows of a matrix equals its rank. Here is yet another.
Swap rows to put a nonzero row as the top row. Then swap a row that is
linearly independent of the top row into the second position. Swap a row that is
linearly independent of the top two rows into the third position. Continue until
the top r rows are a linearly independent set, while each of the bottom m − r
rows is a linear combination of the top r rows.
Continuing with elementary row operations, subtract appropriate multiples
of the top r rows from each of the bottom rows in succession, reducing it to
zero (easy in principle but hard in practice!). The top rows, still untouched,
are linearly independent, so there is no way for row reduction to convert any of
them to a zero row. In echelon form, the matrix will have r pivotal 1s: rank r.
It follows that r is both the number of linearly independent columns and the
number of linearly independent rows: the rank of A is equal to the rank of its
transpose AT .
2 Lecture Outline
1. Row reduction
This is just an organized version of the techniques for solving simultaneous
equations that you learned in high school.
When you solve the equation A~x = ~b you combine the matrix A and the
vector ~b into a single matrix. Here is a simple example.
The equations are
x + 2y = 7
2x + 5y = 16.
Then A = [1 2; 2 5], ~b = [7; 16],
and we must row-reduce the 2 × 3 matrix [1 2 | 7; 2 5 | 16].
First, subtract twice row 1 from row 2 to get [1 2 | 7; 0 1 | 2].
You see the general strategy. First eliminate x from all but the first equa-
tion, then eliminate y from all but the second, and keep going until, with
luck, you have converted each row into an equation that involves only a
single variable with a coefficient of 1.
2. Echelon Form
Key properties:
Here are the “what’s wrong?” examples from Hubbard. Find row opera-
tions that fix them.
[1 0 0 2; 0 0 1 −1; 0 1 0 1]
[1 1 0 1; 0 0 2 0; 0 0 0 1]
[0 0 0; 1 0 0; 0 1 0]
[0 1 0 3 0 −3; 0 0 −1 1 1 1; 0 0 0 0 1 2]
3. Row reduction algorithm
The row-reduction algorithm (Hubbard, p. 166) converts a matrix to ech-
elon form. Briefly,
(a) SWAP rows so that the leftmost column that is not all zeroes has a
nonzero entry in the first row.
(b) DIVIDE by this entry to get a pivotal 1.
(c) SUBTRACT multiples of the first row from the others to clear out the
rest of the column under the pivotal 1.
(d) Repeat these steps to get a pivotal 1 in the second row, with nothing
but zeroes elsewhere in the column (including in the first row).
(e) Repeat until the matrix is in echelon form.
Carry out this procedure to row-reduce the matrix [0 3 3 6; 2 4 2 4; 3 8 4 7].
4. Solving equations
Once you have row-reduced the matrix, you can interpret it as representing
the equation Ã~x = b̃,
which has the same solutions as the equation with which you started, except
that now they can be solved by inspection.
A pivotal 1 in the last column b̃ is the kiss of death, since it is an equation
like 0x + 0y = 1. There is no solution. This happens in the second
Mathematica example, where row reduction leads to [1 0 1 | 0; 0 1 1 | 0; 0 0 0 | 1].
Otherwise, choose freely the values of the “active” unknowns in the
nonpivotal columns (excluding the last one). Then each row gives the value of
the “passive” unknown in the column that has the pivotal 1 for that row.
This happens in the third Mathematica example, where row reduction converts
[2 1 3 | 1; 1 −1 0 | 1; 1 1 2 | 1/3] to [1 0 1 | 2/3; 0 1 1 | −1/3; 0 0 0 | 0].
The only nonpivotal column (except the last one) is the third. So we can
choose the value of the active unknown z freely.
Then the first row gives x in terms of z: x = 2/3 − z.
The second row gives y in terms of z: y = −1/3 − z.
If there are as many equations as unknowns, this situation is exceptional.
If there are fewer equations than unknowns, it is the usual state of affairs.
Expressing the passive variables in terms of the active ones will be the
subject of the important implicit function theorem in outline 9.
A column that is all zeroes is nonpivotal. Such a column must have been
there from the start; it cannot come about as a result of row reduction.
It corresponds to an unknown that was never mentioned. This sounds
unlikely, but it can happen when you represent a system of equations by
an arbitrary matrix.
Example: In R3 , solve the equations x = 0, y = 0 (z not mentioned)
5. Many for the price of one
If you have several equations with the same matrix A on the left and dif-
ferent vectors on the right, you can solve them all in the process of row-
reducing A. This is Example 2.2.10, also done in Mathematica. Row re-
duction is more efficient than computing A−1 , and it works even when A is
not invertible. Here is a simple example with a non-invertible A:
x + 2y = 3
2x + 4y = 6
x + 2y = 3
2x + 4y = 7
The first pair has infinitely many solutions: choose any y and take x =
3 − 2y. The second set has none.
We must row-reduce the 2 × 4 matrix [1 2 | 3 3; 2 4 | 6 7].
The last column has a pivotal 1 – no solution for the second set.
The third column has no pivotal 1, and the second column is also nonpivotal,
so there are multiple solutions for the first set of equations. Make a free
choice of the active variable y that goes with nonpivotal column 2.
How does the first row now determine the passive unknown x?
6. When is a matrix invertible?
Our definition of the inverse A−1 of a matrix A requires it to be both a left
inverse and a right inverse: A−1 A = I and AA−1 = I.
We have also proved that the inverse of a matrix, if it exists, must be unique.
The notation I for the identity obscures the fact that one identity matrix
might be m × m, the other n × n, in which case we would have an invertible
non-square matrix. Now is the time to prove that this cannot happen: only
a square matrix can be invertible. This theorem is the key to Hubbard’s
proof of the most important theorem of linear algebra, which says that the
dimension of a vector space is well defined. The proof relies explicitly on
row reduction.
Now we must show that if A~x = ~b has a unique solution, the number
of rows m must equal the number of columns n. Consider solving
A~x = ~b by row reduction, converting A to matrix à in echelon form.
To show that m = n, show that m ≤ n and n ≤ m.
• If A has more rows than columns, there is no existence. Row reduction
must leave at least one row of zeroes at the bottom, and there exists
~b for which A~x = ~b has no solution.
• If A has more columns than rows, there is no uniqueness. Row reduc-
tion must leave at least one nonpivotal column, and the solution to
A~x = ~b is not unique.
• So if A is invertible, and A~x = ~b therefore has a unique solution, A
must be a square matrix.
7. Matrix inversion by row reduction
If A is square and you choose each standard basis vector in turn for the
right-hand side, then row reduction constructs the inverse of A if it exists.
As a simple example, we invert A = [1 2; 2 5].
Begin by appending the standard basis vectors as third and fourth columns
to get [1 2 | 1 0; 2 5 | 0 1].
The right two columns of the row-reduced matrix are the desired inverse:
check it!
For matrices larger than 2 × 2, row reduction is a more efficient way of con-
structing a matrix inverse than any techniques involving determinants that
you may have learned! Hubbard, Example 2.3.4, is done in Mathematica.
The matrix [2 1 3 | 1 0 0; 1 −1 1 | 0 1 0; 1 1 2 | 0 0 1] row reduces to
[1 0 0 | 3 −1 −4; 0 1 0 | 1 −1 −1; 0 0 1 | −2 1 3].
What are A and A−1 ?
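The append-the-identity procedure can be sketched in code. An illustrative Python version (the course's scripts are in R; exact fractions keep the arithmetic exact):

```python
from fractions import Fraction

def rref(M):
    """Row-reduce M to echelon form using exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    pivot_row = 0
    for j in range(len(M[0])):
        k = next((i for i in range(pivot_row, len(M)) if M[i][j] != 0), None)
        if k is None:
            continue
        M[pivot_row], M[k] = M[k], M[pivot_row]
        M[pivot_row] = [x / M[pivot_row][j] for x in M[pivot_row]]
        for i in range(len(M)):
            if i != pivot_row and M[i][j] != 0:
                M[i] = [a - M[i][j] * b for a, b in zip(M[i], M[pivot_row])]
        pivot_row += 1
        if pivot_row == len(M):
            break
    return M

def invert(A):
    """Append the identity, row-reduce, and read off the inverse from the right half."""
    n = len(A)
    aug = [list(map(Fraction, row)) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    red = rref(aug)
    if any(red[i][i] != 1 for i in range(n)):
        raise ValueError("matrix is not invertible")
    return [row[n:] for row in red]

A = [[2, 1, 3], [1, -1, 1], [1, 1, 2]]
print(invert(A))  # rows (3, -1, -4), (1, -1, -1), (-2, 1, 3), matching the example
```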
8. Elementary matrices:
Each basic operation in the row-reduction algorithm can be achieved by
multiplication on the left by an appropriate invertible elementary matrix.
Here are examples of the three types of elementary matrix. For each, figure
out what row operation is achieved by converting A = [2 4; −1 1; 1 0] to EA.
• Type 1: E1 = [1/2 0 0; 0 1 0; 0 0 1]
• Type 2: E2 = [1 2 0; 0 1 0; 0 0 1]
• Type 3: E3 = [0 0 1; 0 1 0; 1 0 0]
9. Row reduction and elementary matrices
Interpret the result as a pair of equations and solve them (by inspection)
for x and y.
10. Linear combinations and span
T (a1~v1 + · · · + ak ~vk ) = a1 T (~v1 ) + · · · + ak T (~vk ).
The sum a1~v1 + · · · + ak ~vk is called a linear combination of the vectors ~v1 , · · · ~vk .
The set of all the linear combinations of ~v1 , · · · ~vk is called the span of the
set ~v1 , · · · ~vk .
Prove that it is a subspace of F n .
Suppose ~v1 = [1; −2; 1], ~v2 = [0; 1; −1], ~v3 = [3; −1; −2], ~w1 = [2; −3; 1], ~w2 = [1; 0; 0].
• Show that ~w1 is a linear combination of ~v1 and ~v2 .
• Invent an easy way to describe the span of ~v1 , ~v2 , and ~v3 . (Hint:
consider the sum of the components.)
• Show that ~w2 is not in the span of ~v1 , ~v2 , and ~v3 .
• The matrix [1 0 3 2 1; −2 1 −1 −3 0; 1 −1 −2 1 0] row reduces to
[1 0 3 2 0; 0 1 5 1 0; 0 0 0 0 1].
How does this result answer the question of whether or not ~w1 or ~w2
is in the span of ~v1 , ~v2 , and ~v3 ?
11. Special cases:
• In F n , ~w is in the span of ~u if and only if it is a multiple of ~u.
• In F 2 , if ~v is not a multiple of ~u, then every vector ~w is in the span
of ~u and ~v.
Write an equivalent statement using negation, and use it to construct
an example.
• In F 3 , a vector ~w is in the span of ~u and ~v if and only if it is orthogonal
to ~u × ~v.
Give a geometrical interpretation of this statement.
12. Linear independence
~v1 , ~v2 , · · · ~vk are linearly independent if the system of equations
x1~v1 + x2~v2 + · · · + xk ~vk = w
~ has at most one solution.
To test for linear independence, make the vectors ~v1 , ~v2 , · · · ~vk into a matrix
and row-reduce it. If any column is nonpivotal, then the vectors are linearly
dependent. There is an example in the Mathematica file.
The vectors to test for independence are ~v1 = [1; 1; 2; 1], ~v2 = [2; 0; 1; 1], ~v3 = [0; 2; 3; 1].
The vector ~w is irrelevant and might as well be zero, so we just make a
matrix from the three given vectors:
[1 2 0; 1 0 2; 2 1 3; 1 1 1] reduces to [1 0 2; 0 1 −1; 0 0 0; 0 0 0]
The third column is nonpivotal; so the given vectors are linearly dependent.
How can you write the third one as a linear combination of the first two?
Change ~v3 to [0; 2; 1; 1] and test again.
Now [1 2 0; 1 0 2; 2 1 1; 1 1 1] reduces to [1 0 0; 0 1 0; 0 0 1; 0 0 0]
There is no nonpivotal column. The three vectors are linearly independent.
Setting w~ = ~0, as we have already done, leads to the standard definition of
linear independence: if
a1~v1 + a2~v2 + · · · + ak ~vk = ~0
then a1 = a2 = · · · = ak = 0.
13. Constructing a vector outside the span
The vectors are ~v1 = [4; 2; 3], ~v2 = [2; 1; 2].
A = [4 2; 2 1; 3 2] reduces to EA = [1 0; 0 1; 0 0], and the matrix that does
the job is E = [1 0 −1; −3/2 0 2; −1/2 1 0].
We want to append a third column ~b such that when we row reduce the
square matrix A|~b, the resulting matrix EA|E ~b will have a pivotal 1 in the
third column. In this case it will be in the bottom row. Since E, being a
product of elementary matrices, must be invertible, we compute
E −1 [0; 0; 1] = [0; 1; 0].
We have found a vector, [0; 1; 0], that is not in the span of ~v1 and ~v2 .
Key point: the proof relies on the fact that this procedure will always work,
because the matrix E that accomplishes row reduction is guaranteed to be
invertible!
14. Two key theorems; your proof 3.1
15. Proof 3.1 – start to finish
Prove that in Rn , n + 1 vectors are never linearly independent and n − 1
vectors never span.
16. Definition of basis
This is Hubbard, Definition 2.4.12. It is really a definition plus two theo-
rems, but it can conveniently be left ambiguous which is which!
A basis for a subspace V ⊂ Rn has the following equivalent properties:
To show that any of these three properties implies the other two would
require six proofs. Let’s do a couple. Call the basis vectors ~v1 , ~v2 , · · · ~vk .
To prove that b ≠ 0, assume the contrary, and show that the vectors
~v1 , ~v2 , · · · ~vk would be linearly dependent.
• Prove that (c) implies (a).
This is easier, since all we have to show is “maximal.” Add another
vector ~w to the linearly independent spanning set ~v1 , ~v2 , · · · ~vk . How
do we argue that this set is linearly dependent?
Now we combine this definition of basis with what we already know about
sets of vectors in Rn .
Our conclusions:
In Rn , a basis cannot have fewer than n elements, since they would not
span.
In Rn , a basis cannot have more than n elements, since they would not be
linearly independent.
So any basis must, like the standard basis, have exactly n elements.
17. Basis for a subspace
Consider any subspace E ⊂ Rn . We need to prove the following:
• E has a basis.
• Any two bases for E have the same number of elements, called the
dimension of E.
Now we proceed to the proof. First we must prove the existence of a basis
by explaining how to construct one.
How to make a basis for a non-empty subspace E in general:
Choose any ~v1 to get started. Notice that we need not specify a method for
doing this! The justification for this step is the so-called “axiom of choice.”
If ~v1 does not span E, choose ~v2 that is not in the span of ~v1 (not a multiple
of it). Again, we do not say how to do this, but it must be possible since
~v1 does not span E.
If ~v1 and ~v2 do not span E, choose ~v3 that is not in the span of ~v1 and ~v2
(not a linear combination).
Keep going until you have spanned the space. By construction, the set is
linearly independent. So it is a basis.
Second, we must prove that every basis has the same number of vectors.
Imagine that two people have done this and come up with bases of possibly
different sizes.
One is ~v1 , ~v2 , · · · ~vm .
The other is ~w1 , ~w2 , · · · ~wp .
Since each basis spans E, we can write each ~wj as a linear combination of
the ~vi . It takes m coefficients to do this for each of the p vectors, so we end
up with an m × p matrix A, each of whose columns holds the coefficients for
one of the ~wj .
We can also write each ~vi as a linear combination of the ~wj . It takes p
coefficients to do this for each of the m vectors, so we end up with a p × m
matrix B, each of whose columns holds the coefficients for one of the ~vi .
Then AB = I and BA = I. So A is invertible, hence square, and m = p.
18. Kernels and Images
19. Basis for the image
To find a basis for the image of T , we must find a linearly independent
set of vectors that span the image. Spanning the image is not a problem:
the columns of the matrix for T do that. The hard problem is to choose a
linearly independent set. The secret is to use row reduction.
Each nonpivotal column is a linear combination of the columns to its left,
hence inappropriate to include in a basis. It follows that the pivotal columns
of T form a basis for the image. Of course, you can permute the columns
and come up with a different basis: no one said that a basis is unique.
This process of finding a basis for Img T is carried out in Mathematica.
The matrix T = [1 2 1 1; 0 0 1 −1; 2 4 1 3] row reduces to [1 2 0 2; 0 0 1 −1; 0 0 0 0].
By inspecting these two matrices, find a basis for Img T . Notice that the
dimension of Img T is 2, which is less than the number of rows, and that
the two leftmost columns do not form a basis.
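The notes carry this computation out in Mathematica. As a sketch of the same idea in Python with SymPy (an outside tool, not one of the course scripts), `rref` reports exactly the pivotal column indices we need:

```python
from sympy import Matrix

# The matrix T from the notes; its pivotal columns form a basis for Img T.
T = Matrix([[1, 2, 1, 1],
            [0, 0, 1, -1],
            [2, 4, 1, 3]])

R, pivots = T.rref()                   # pivots lists the pivotal column indices
image_basis = [T.col(j) for j in pivots]

print(pivots)                          # (0, 2): the first and third columns are pivotal
print(image_basis)                     # dim Img T = 2, less than the number of rows
```

Note that the basis vectors are columns of the original T, not of the row-reduced matrix.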
20. Basis for the kernel
The matrix T = [1 2 1 1; 0 0 1 −1; 2 4 1 3] row reduces to [1 2 0 2; 0 0 1 −1; 0 0 0 0].
To find a basis for Ker T , look at the row-reduced matrix and identify
the nonpivotal columns. For each nonpivotal column i in turn, put a 1
in the position of that column, a 0 in the position of all other nonpivotal
columns, and leave blanks in the other positions. The resulting vectors must
be linearly independent, since for each of them, there is a position where
it has a 1 and where all the others have a zero. What are the resulting
(incomplete) basis vectors for Ker T ?
Now fill in the blanks: assign values in the positions of all the pivotal
columns so that T (v~i ) = 0. The vectors v~i span the kernel, since assigning a
value for each nonpivotal variable is precisely the technique for constructing
the general solution to T (~v) = 0.
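As a quick check in Python (SymPy again, an assumption of this sketch rather than a course tool), `nullspace()` implements exactly the recipe above: one basis vector per nonpivotal column, with the pivotal positions filled in so that T~v = ~0:

```python
from sympy import Matrix, zeros

T = Matrix([[1, 2, 1, 1],
            [0, 0, 1, -1],
            [2, 4, 1, 3]])

kernel_basis = T.nullspace()          # one vector per nonpivotal column
for v in kernel_basis:
    assert T * v == zeros(3, 1)       # each basis vector satisfies T v = 0
print(len(kernel_basis))              # 2: matches the two nonpivotal columns
```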
21. Rank - nullity theorem
The matrix of T : Rn → Rm has n columns. We row-reduce it and find r
pivotal columns and n − r nonpivotal columns. The integer r is called the
rank of the matrix.
Each pivotal column gives rise to a basis vector for the image; so the di-
mension of Img T is r.
Each nonpivotal column gives rise to a basis vector for the kernel; so the
dimension of Ker T is n − r.
Clearly, dim(Ker T ) + dim(Img T ) = n.
In the special case of a linear transformation T : Rn → Rn, represented by
a square n × n matrix, if the rank r = n, then the kernel is {~0}, the image
is all of Rn, and T is invertible.
22. Linearly independent rows
Hubbard (page 200) gives two arguments that the number of linearly inde-
pendent rows of a matrix equals its rank. Here is yet another.
Swap rows to put a nonzero row as the top row. Then swap a row that is
linearly independent of the top row into the second position. Swap a row
that is linearly independent of the top two rows into the third position.
Continue until the top r rows are a linearly independent set, while each of
the bottom m − r rows is a linear combination of the top r rows.
Now, continuing with elementary row operations, subtract appropriate mul-
tiples of the top r rows from each of the bottom rows in succession, reducing
it to zero. (This is easy in principle but hard in practice!). The top rows,
still untouched, are linearly independent, so there is no way for row reduc-
tion to convert any of them to a zero row. In echelon form, the matrix will
have r pivotal 1s: its rank is r.
It follows that r is both the number of linearly independent columns and
the number of linearly independent rows: the rank of A is equal to the rank
of its transpose AT .
23. Orthonormal basis:
If we have a dot product, then we can convert any spanning set of vectors
into a basis. Here is the algorithm, sometimes called the “Gram-Schmidt
process.” We will apply it to the 3-dimensional subspace of R4 for which
the components sum to zero. Details of the computation are in the Math-
ematica file.
Choose any vector w~ 1 and divide it by its length to make the first basis
vector ~v1 .
If ~w1 = (1, −1, 1, −1), what is ~v1 ?
Choose any vector w ~ 2 that is linearly independent of ~v1 and subtract off
a multiple of ~v1 to make a vector ~x that is orthogonal to ~v1 . Divide this
vector by its length to make the second basis vector ~v2 .
If ~w2 = (2, −1, −1, 0), calculate ~x = ~w2 − (~w2 · ~v1)~v1.
Choose any vector w ~ 3 that is linearly independent of ~v1 and ~v2 , and subtract
off multiples of ~v1 and ~v2 to make a vector ~x that is orthogonal to both ~v1
and ~v2 . Divide this vector by its length to make the third basis vector ~v3 .
Continue until you can no longer find any vector that is linearly independent
of your basis vectors.
Mathematica gives
~v1 = (1/2, −1/2, 1/2, −1/2),
~v2 = (3/(2√5), −1/(2√5), −3/(2√5), 1/(2√5)),
~v3 = (−1/(2√5), −3/(2√5), 1/(2√5), 3/(2√5)).
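The Gram-Schmidt loop is short enough to sketch in Python with NumPy (an outside tool; the choice of ~w3 below is a hypothetical third vector in the sum-to-zero subspace, not necessarily the one in the course's Mathematica file):

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-10):
    """Orthonormalize a list of vectors, dropping any that are dependent."""
    basis = []
    for w in vectors:
        x = w - sum((w @ v) * v for v in basis)   # subtract the projections
        norm = np.linalg.norm(x)
        if norm > tol:                            # keep only independent vectors
            basis.append(x / norm)
    return basis

w1 = np.array([1.0, -1.0,  1.0, -1.0])
w2 = np.array([2.0, -1.0, -1.0,  0.0])
w3 = np.array([3.0,  1.0, -2.0, -2.0])   # hypothetical: any independent choice works

v1, v2, v3 = gram_schmidt([w1, w2, w3])
print(v1)                                 # [ 0.5 -0.5  0.5 -0.5]
```

A different ~w3 changes ~v3 only by a sign, since the subspace is 3-dimensional.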
3 Group Problems
1. Row reduction and elementary matrices
(a) Solve the system of equations
2x + y + z = 2
x + y + 2z = 2
x + 2y + 2z = 1
where all the coefficients and constants are elements of the finite field
Z3 . If there is no solution, say so. If there is a unique solution, specify
the values of x, y, and z. If there is more than one solution, determine
all solutions by giving formulas for two of the variables, perhaps in
terms of the third one.
(b) Find the inverse of A = [1 2; −3 −7] by using row reduction by means of
elementary matrices, as was done in sample problem 2. Confirm that
the product of the three elementary matrices that you use is indeed
the inverse. Use the familiar rule method for finding a 2 × 2 inverse to
check your answer!
(c) The matrix A = [0 1 2; 1 2 3; 2 3 4]
is not invertible. Nonetheless, there is a product E of three elementary
matrices, applied as was done in sample problem 2, that will reduce it
to echelon form. Find these three matrices and their product E.
2. Some short proofs
(a) Show that type 3 elementary matrices are not strictly necessary, be-
cause it is possible to swap rows of a matrix by using only type 1 and
type 2 elementary matrices. (If you can devise a way to swap the two
rows of a 2 × 2 matrix, that is sufficient, since it is obvious how the
technique generalizes.)
(b) Prove that if a set of linearly independent vectors spans a vector space
W, it is both a maximal linearly independent set and a minimal span-
ning set.
(c) Prove that in a vector space spanned by a single vector ~v, any two
vectors are linearly dependent. Then using this result, prove that in
a space spanned by two vectors ~v1 and ~v2, any three vectors ~w1, ~w2,
and ~w3 must be linearly dependent. In the interest of simplicity, you
may assume that ~w1 = a1~v1 + a2~v2 with a1 ≠ 0.
Hint: Show how to construct a linear combination of ~w1 and ~w2 and
a linear combination of ~w1 and ~w3, neither of which involves ~v1.
3. Problems to be solved by writing or editing R scripts.
(a) The director of a budget office has to make changes to four line items
in the budget, but her boss insists that they must sum to zero. Three
of her subordinates make the following suggestions, all of which lie in
the subspace of acceptable changes:
~w1 = (1, 2, 3, −6), ~w2 = (3, −2, 2, −3), ~w3 = (3, 1, −2, −2).
The boss proposes ~y = (1, 1, −2, 0), also acceptable on the grounds that “it is
simpler.”
Express ~y as a linear combination of the ~wi. Then convert the ~wi to
an orthonormal basis ~vi and express ~y as a linear combination of the ~vi.
(b) Find a basis for the image and the kernel of the matrix
A = [3 1 1 0 4; 1 0 1 1 2; 0 1 −2 0 1; 2 0 0 1 3].
Express the columns that are not in the basis for the image as linear
combinations of the ones that are in the basis.
(c) Find two different solutions to the following set of equations in Z5 :
2x + y + 3z + w = 3
3x + 4y + 3w = 1
x + 4y + 2z + 4w = 2
4 Homework
In working on these problems, you may collaborate with classmates and consult
books and general online references. If, however, you encounter a posted solution
to one of the problems, do not look at it, and email Paul, who will try to get it
removed.
For the first three problems, do the row reduction by hand. That should give
you enough practice so that you can do row reduction by hand on exams. Then
you can use R to do subsequent row reduction.
1. Solve the system of equations
2x + 4y + z = 2
3x + y = 1
3y + 2z = 3
over the finite field Z5 . If there is no solution, say so. If there is a unique
solution, specify the values of x, y, and z and check your answers. If there
is more than one solution, express two of the variables in terms of an arbi-
trarily chosen value of the third one. For full credit you must reduce the
matrix to echelon form, even if the answer becomes obvious!
2. (a) By using elementary matrices, find a vector that is not in the span of
~v1 = (1, 1, −1), ~v2 = (0, 2, 2), and ~v3 = (2, 4, 0).
(b) In the process, you will determine that the given three vectors are
linearly dependent. Find a linear combination of them, with the coef-
ficient of ~v3 equal to 1, that equals the zero vector.
(c) Find a 1 × 3 matrix A such that A~v1 = A~v2 = A~v3 = 0, and use it to
check your answer to part (a).
3. This problem illustrates how you can use row reduction to express a specified
vector as a linear combination of basis vectors.
Your bakery uses flour, sugar, and chocolate to make cookies, cakes, and
brownies. The ingredients for a batch of each product is described by a
vector, as follows:
Suppose ~v1 = (1, 2, 3), ~v2 = (4, 2, 7), ~v3 = (7, 8, 11).
This means, for example, that a batch of cookies takes 1 pound of flour, 2
of sugar, 3 of chocolate.
You are about to shut down for vacation and want to clear out your inventory
of ingredients, described by the vector ~w = (21, 18, 38).
Use row reduction to find a combination of cookies, cakes, and brownies
that uses up the entire inventory.
4. Hubbard, exercises 2.3.8 and 2.3.11 (column operations: a few brief com-
ments about the first problem will suffice for the second. These column
operations will be used in the spring term to evaluate n × n determinants.)
6. (This result will be the key to proving the “implicit function theorem,” key
to many economic applications.)
Suppose that m × n matrix C , where n > m, has m linearly independent
columns and that these columns are placed on the left. Then we can split
off a square matrix A and write C = [A|B].
7. (Like group problem 3b, but in a finite field, so rref will not help!)
In R, the statement
A <- matrix(sample(0:4, 24, replace = TRUE), 4)
was used to create a 4 × 6 matrix A with 24 entries in Z5 . Each entry
randomly has the value 0, 1, 2, 3, or 4.
Here is the resulting matrix:
A = [3 0 4 0 2 2; 1 1 3 3 2 1; 0 2 1 1 4 2; 1 0 2 0 3 4].
Use row reduction to find a basis for the image of A and a basis for the
kernel. Please check your answer for the kernel.
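Since rref works over the reals, finite-field problems need hand-rolled arithmetic. Here is a generic sketch of row reduction over Zp in Python (`rref_mod_p` is a hypothetical helper written for this handout, not one of the course's R scripts); the same loop works for any prime p:

```python
def rref_mod_p(M, p):
    """Row reduce a matrix over the field Z_p (p prime); returns (rref, pivots)."""
    M = [[x % p for x in row] for row in M]
    rows, cols, pivots, r = len(M), len(M[0]), [], 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                              # nonpivotal column
        M[r], M[pivot] = M[pivot], M[r]           # swap the pivot row up
        inv = pow(M[r][c], p - 2, p)              # Fermat inverse, since p is prime
        M[r] = [x * inv % p for x in M[r]]        # scale the pivot entry to 1
        for i in range(rows):
            if i != r and M[i][c]:
                M[i] = [(a - M[i][c] * b) % p for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

# tiny demo over Z_3: the second column is 2 times the first, hence nonpivotal
R, piv = rref_mod_p([[2, 1], [1, 2]], 3)
print(R, piv)   # [[1, 2], [0, 0]] [0]
```

Reading off image and kernel bases from the result then works exactly as in sections 19 and 20.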
8. One of the seventeen problems on the first Math 25a problem set for 2014
was to find all the solutions of the system of equations
2x1 − 3x2 − 7x3 + 5x4 + 2x5 = −2
x1 − 2x2 − 4x3 + 3x4 + x5 = −2
2x1 − 4x3 + 2x4 + x5 = 3
x1 − 5x2 − 7x3 + 6x4 + 2x5 = −7
without the use of a computer.
Solve this problem using R (like script 3.1A).
9. (Like script 3.1C and group problem 3a) A neo-Cubist sculptor wants to use
a basis for R3 with the following properties:
• The first basis vector ~w1 = (1, 1, 1) lies along the body diagonal of the
cube.
• The second basis vector ~w2 = (1, 0, 1) lies along a face diagonal of the
cube.
• The third basis vector ~w3 = (3, 4, 12) has length 13.
MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #1, Week 4 (Eigenvectors and Eigenvalues)
Reading
• 4.2
– For real n × n matrix A, prove that if all the polynomials pi (t) have
simple real roots, then there exists a basis for Rn consisting of
eigenvectors of A.
– Prove that if there exists a basis for Rn consisting of eigenvectors of
A, then all the polynomials pi (t) have simple real roots.
R Scripts
• 1.4A-EigenvaluesCharacteristic.R
Topic 1 - Eigenvectors for a 2x2 matrix
Topic 2 - Not every 2x2 matrix has real eigenvalues
• 1.4B-EigenvectorsAxler.R
Topic 1 - Finding eigenvectors by row reduction
Topic 2 - Eigenvectors for a 3 x 3 matrix
• 1.4C-Diagonalization.R
Topic 1: Basis of real eigenvectors
Topic 2 - Raising a matrix to a power
Topic 3 - What if the eigenvalues are complex?
Topic 4 - What if there is no eigenbasis?
• 1.4X-EigenvectorApplications.R
Topic 1 - The special case of a symmetric matrix
Topic 2 - Markov Process (from script 1.1D)
Topic 3 - Eigenvectors for a reflection
Topic 4 - Sequences defined by linear recurrences
1 Executive Summary
1.1 Eigenvalues and eigenvectors
If A~v = λ~v, ~v is called an eigenvector for A, and λ is the corresponding
eigenvalue.
For example, if A = [−1 4; −2 5], we can check that ~v = (1, 1)
is an eigenvector of A with eigenvalue 3.
If A is a 2 × 2 or 3 × 3 matrix, there is a quick, well-known way to find
eigenvalues by using determinants.
Rewrite A~v = λ~v as A~v = λI~v, where I is the identity matrix.
Equivalently, (A − λI)~v = ~0
Suppose that λ is an eigenvalue of A. Then the eigenvector ~v is a nonzero
vector in the kernel of the matrix (A − λI).
It follows that the matrix (A − λI) is not invertible. But we have a formula
for the inverse of a 2 × 2 or 3 × 3 matrix, which can fail only if the determinant
is zero. Therefore a necessary condition for the existence of an eigenvalue is that
det(A − λI) = 0.
The polynomial χA (λ) = det(A − λI) is called the characteristic polyno-
mial of matrix A. It is easy to compute in the 2 × 2 or 3 × 3 case, where there
is a simple formula for the determinant. For larger matrices χA (λ) is hard to
compute efficiently, and this approach should be avoided.
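For the small cases this is easy to check numerically. A Python/NumPy sketch (an outside tool, not a course script) for the 2 × 2 example used throughout this handout:

```python
import numpy as np

A = np.array([[-1.0, 4.0],
              [-2.0, 5.0]])

# for a 2x2 matrix, det(A - tI) = t^2 - (trace A) t + det A
coeffs = [1.0, -A.trace(), np.linalg.det(A)]   # t^2 - 4t + 3
eigenvalues = np.roots(coeffs)
print(np.sort(eigenvalues))                    # approximately [1. 3.]
```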
Conversely, suppose that χA (λ) = 0 for some real number λ. It follows that
the columns of the matrix (A − λI) are linearly dependent. If we row reduce the
matrix, we will find at least one nonpivotal column, which in turn implies that
there is a nonzero vector in the kernel. This vector is an eigenvector.
This was the standard way of finding eigenvectors until 1995, but it has two
drawbacks:
• Once you have found the eigenvalues, finding the corresponding eigenvectors
is a nontrivial linear algebra problem.
• For matrices larger than 3 × 3, the characteristic polynomial is hard to
compute efficiently.
Finding the corresponding eigenvectors still requires a bit of algebra.
For λ = 1, A − λI = [−2 4; −2 4].
By inspection we see that ~v1 = (2, 1) is in the kernel of this matrix.
Check: A~v1 = [−1 4; −2 5] (2, 1) = (2, 1) – eigenvector with eigenvalue 1.
For λ = 3, A − λI = [−4 4; −2 2], and ~v2 = (1, 1) is in the kernel.
Check: A~v2 = [−1 4; −2 5] (1, 1) = (3, 3) – eigenvector with eigenvalue 3.
1.3 A better way to find eigenvectors
Given matrix A, pick an arbitrary vector ~w. Keep computing A~w, A²~w, A³~w,
etc. until you find a vector that is a linear combination of its predecessors. This
situation is easily detected by row reduction.
Now you have found a polynomial p of degree m such that p(A)~w = 0.
Furthermore, this is the nonzero polynomial of lowest degree for which p(A)~w = 0.
Over the complex numbers, this polynomial is guaranteed to have a root λ by
virtue of the “fundamental theorem of algebra” (Hubbard theorem 1.6.13). Over
the real numbers or a finite field, it will have a root in the field only if you are
lucky. Assuming that the root exists, factor it out: p(t) = (t − λ)q(t).
Now p(A)~w = (A − λI)q(A)~w = 0.
Thus q(A)~w is an eigenvector with eigenvalue λ. (It is nonzero because p is
the lowest-degree polynomial with p(A)~w = 0.)
Again, let A = [−1 4; −2 5].
As the arbitrary vector ~w choose (1, 0). Then A~w = (−1, −2) and A²~w = (−7, −8).
We need to express the third of these vectors, A²~w, as a linear combination
of the first two. This is done by row reducing the matrix
[1 −1 −7; 0 −2 −8] to [1 0 −3; 0 1 4]
to find that A²~w = 4A~w − 3~w.
Equivalently, (A² − 4A + 3I)~w = 0.
p(A) = A² − 4A + 3I, or p(t) = t² − 4t + 3 = (t − 1)(t − 3): eigenvalues 1 and 3.
To get the eigenvector for eigenvalue 1, apply the remaining factor of p(A),
A − 3I, to ~w: [−4 4; −2 2] (1, 0) = (−4, −2). Divide by −2 to get ~v1 = (2, 1).
To get the eigenvector for eigenvalue 3, apply the remaining factor of p(A),
A − I, to ~w: [−2 4; −2 4] (1, 0) = (−2, −2). Divide by −2 to get ~v2 = (1, 1).
In this case the polynomial p(t) turned out to be the same as the characteristic
polynomial, but that is not always the case.
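The whole procedure is mechanical enough to sketch in Python with NumPy (a floating-point sketch, not one of the course's R scripts; names such as `axler_eigen` are made up here):

```python
import numpy as np

def poly_of_matrix(coeffs, A):
    """Horner evaluation of a polynomial (highest power first) at a matrix."""
    R = np.zeros_like(A)
    I = np.eye(A.shape[0])
    for c in coeffs:
        R = R @ A + c * I
    return R

def axler_eigen(A, w, tol=1e-9):
    """Form w, Aw, A^2 w, ... until one is a combination of its predecessors,
    read off the polynomial p with p(A)w = 0, then for each root lam apply
    the cofactor q(A), where p(t) = (t - lam) q(t), to w."""
    n = A.shape[0]
    cols = [np.asarray(w, dtype=float)]
    while True:
        new = A @ cols[-1]
        M = np.column_stack(cols)
        c, *_ = np.linalg.lstsq(M, new, rcond=None)
        if len(cols) == n or np.linalg.norm(M @ c - new) < tol:
            break
        cols.append(new)
    p = np.concatenate(([1.0], -c[::-1]))        # monic, lowest degree, p(A)w = 0
    pairs = []
    for lam in np.roots(p):
        q, _ = np.polydiv(p, [1.0, -lam])        # factor out (t - lam)
        v = poly_of_matrix(q, A) @ cols[0]
        pairs.append((lam, v / np.linalg.norm(v)))
    return pairs

A = np.array([[-1.0, 4.0], [-2.0, 5.0]])
pairs = axler_eigen(A, [1.0, 0.0])
for lam, v in pairs:
    print(lam, v)    # eigenvalues 3 and 1; eigenvectors parallel to (1,1) and (2,1)
```

Over a finite field the same loop would need the modular arithmetic of the earlier sketch instead of floating point.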
• If we choose ~w = (1, 1), we find A~w = 3~w, p(A) = A − 3I, p(t) = t − 3. We
need to start over with a different ~w to find the other eigenvalue.
• If we choose A = [2 0; 0 2], then any vector is an eigenvector with eigenvalue
2. So p(t) = t − 2. But the characteristic polynomial is (t − 2)².
• If we choose A = [2 1; 0 2], the characteristic polynomial is (t − 2)². But now
there is only one eigenvector. If we choose ~w = ~e1 we find p(t) = t − 2
and the eigenvector (1, 0). But if we choose a different ~w = ~e2 we find
p(t) = (t − 2)² and we fail to find a second, independent eigenvector.
1.4 When is there an eigenbasis?
Choose ~w successively to equal ~e1, ~e2, · · · , ~en.
In searching for eigenvectors, we find successively polynomials p1 (t), p2 (t), · · · , pn (t).
There is a basis of real eigenvectors if and only if each of the polynomials pi (t)
has simple real roots, e.g. p(t) = t(t − 2)(t + 4)(t − 2.3). No repeated factors
are allowed!
A polynomial like p(t) = t2 + 1, although it has no repeated factors, has no
real roots: p(t) = (t + i)(t − i).
If we allow complex roots, then any polynomial can be factored into linear
factors (Fundamental Theorem of Algebra, Hubbard page 113).
There is a basis of complex eigenvectors if and only if each of the polynomials
pi (t) has simple roots, e.g. p(t) = t(t−i)(t+i). No repeated factors are allowed!
Our technique for finding eigenvectors works also for matrices over finite fields,
but in that case it is entirely possible for a polynomial to have no linear factors
whatever. In that case there are no eigenvectors and no eigenbasis. This is one
of the few cases where linear algebra over a finite field is fundamentally different
from linear algebra over the real or complex numbers.
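For instance (an illustrative example, not taken from the notes), the matrix [0 −1; 1 0] has p(t) = t² + 1, which has no root in Z3 but two roots in Z5. A brute-force check in Python:

```python
# p(t) = t^2 + 1 is the characteristic polynomial of [0 -1; 1 0] over any Z_p
roots_mod3 = [t for t in range(3) if (t * t + 1) % 3 == 0]
roots_mod5 = [t for t in range(5) if (t * t + 1) % 5 == 0]

print(roots_mod3)   # []      -> no eigenvalues, hence no eigenbasis over Z_3
print(roots_mod5)   # [2, 3]  -> two distinct eigenvalues over Z_5
```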
1.6 Properties of an eigenbasis
• Even if all the eigenvalues are distinct, an eigenbasis is not unique. Any
eigenvector in the basis can be multiplied by a nonzero scalar and remain
an eigenvector.
1.8 Applications of eigenvectors
• Markov processes
Suppose that a system can be in one of two or more states and goes through
a number of steps, in each of which it may make a transition from one state
to another in accordance with specified “transition probabilities.”
For a two-state process, vector ~vn = (pn, qn) specifies the probabilities for
the system to be in state 1 or state 2 after n steps of the process, where
0 ≤ pn, qn ≤ 1 and pn + qn = 1. The transition probabilities are specified
by a matrix A = [a b; c d], where all the entries are between 0 and 1 and
a + c = b + d = 1.
After a large number of steps, the state of the system is specified by ~vn = Aⁿ~v0.
The easy way to calculate Aⁿ is by diagonalizing A. If there is a “stationary
state” ~v into which the system settles down, it corresponds to an eigenvector
with eigenvalue 1, since ~vn+1 = A~vn and ~vn+1 = ~vn = ~v.
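A numerical illustration in Python (the transition matrix below is hypothetical, and NumPy rather than the course's R is used):

```python
import numpy as np

# hypothetical 2-state transition matrix: entries in [0, 1], columns sum to 1
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])

eigenvalues, eigenvectors = np.linalg.eig(A)
k = np.argmin(np.abs(eigenvalues - 1.0))    # locate the eigenvalue-1 eigenvector
stationary = eigenvectors[:, k]
stationary = stationary / stationary.sum()  # rescale so the entries sum to 1

print(stationary)                           # approximately [2/3, 1/3]
assert np.allclose(A @ stationary, stationary)
```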
• Reflections
If 2 × 2 matrix F represents reflection in a line through the origin with
direction vector ~v, then ~v must be an eigenvector with eigenvalue 1 and a
vector perpendicular to ~v must be an eigenvector with eigenvalue -1.
If 3 × 3 matrix F represents reflection in a plane P through the origin
with normal vector ~N, then ~N must be an eigenvector with eigenvalue −1
and there must be a two-dimensional subspace of vectors in P, all with
eigenvalue +1.
2 Lecture Outline
1. Using the characteristic polynomial to find eigenvalues and eigenvectors
If A~v = λ~v, ~v is called an eigenvector for A, and λ is the corresponding
eigenvalue.
If A is a 2 × 2 or 3 × 3 matrix, there is a quick, well-known way to find
eigenvalues by using determinants.
Rewrite A~v = λ~v as A~v = λI~v, where I is the identity matrix.
Equivalently, (A − λI)~v = ~0
Suppose that λ is an eigenvalue of A. Then the eigenvector ~v is a nonzero
vector in the kernel of the matrix (A − λI).
It follows that the matrix (A − λI) is not invertible. But we have a formula
for the inverse of a 2×2 or 3×3 matrix, which can fail only if the determinant
is zero. Therefore a necessary condition for the existence of an eigenvalue
is that det(A − λI) = 0.
The polynomial χA (λ) = det(A − λI) is called the characteristic poly-
nomial of matrix A. It is easy to compute in the 2 × 2 or 3 × 3 case, where
there is a simple formula for the determinant. For larger matrices χA (λ) is
hard to compute efficiently, and this approach should be avoided.
Conversely, suppose that χA (λ) = 0 for some real number λ. It follows
that the columns of the matrix (A − λI) are linearly dependent. If we row
reduce the matrix, we will find at least one nonpivotal column, which in
turn implies that there is a nonzero vector in the kernel. This vector is an
eigenvector.
2. A better way to find eigenvectors
Given matrix A, pick an arbitrary vector ~w. Keep computing A~w, A²~w,
A³~w, etc. until you find a vector that is a linear combination of its
predecessors. This situation is easily detected by row reduction.
Now you have found a polynomial p of degree m such that p(A)~w = 0.
Furthermore, this is the nonzero polynomial of lowest degree for which
p(A)~w = 0.
Over the complex numbers, this polynomial is guaranteed to have a root
λ by virtue of the “fundamental theorem of algebra” (Hubbard theorem
1.6.13). Over the real numbers or a finite field, it will have a root in the
field only if you are lucky.
Citing your source: This technique was brought to the world’s attention
by Sheldon Axler’s 1995 article “Down with Determinants” (see Hubbard
page 224). Unlike most of what is taught in undergraduate math, it should
probably be cited when you use it in other courses. An informal comment
like “Using Axler’s method for finding eigenvectors...” would suffice.
3. Consider the matrix A = [3 2; 3 3] with entries from the finite field Z5.
4. Concocting a 2 × 2 matrix without a basis of eigenvectors
Let D = [2 0; 0 2], N = [1 −1; 1 −1]. The matrix N is a so-called “nilpotent”
matrix: because its kernel is the same as its image, N² is the zero matrix.
(a) Show that the matrix A = D + N has the property that if we choose
any ~w that is not in the kernel of N, then p(t) = (t − 2)², so p(A) =
(A − 2I)² and there is no basis of eigenvectors.
(b) Prove by induction that A^k = D^k + k D^(k−1) N.
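Before writing the induction proof, a numerical spot check can build confidence (a Python/NumPy sketch; evidence, not a proof):

```python
import numpy as np

D = np.array([[2.0, 0.0], [0.0, 2.0]])
N = np.array([[1.0, -1.0], [1.0, -1.0]])
A = D + N

assert np.allclose(N @ N, np.zeros((2, 2)))    # N is nilpotent: N^2 = 0

# spot-check the claimed formula A^k = D^k + k D^(k-1) N for small k
for k in range(1, 6):
    lhs = np.linalg.matrix_power(A, k)
    rhs = np.linalg.matrix_power(D, k) + k * np.linalg.matrix_power(D, k - 1) @ N
    assert np.allclose(lhs, rhs)
print("formula holds for k = 1, ..., 5")
```

The formula works because D = 2I commutes with N and N² = 0 kills every higher-order term in the binomial expansion.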
5. Eigenbases
6. Finding eigenvectors
This method is guaranteed to succeed only for the field of complex num-
bers, but the algorithm is valid for any field, and it finds the eigenvectors
whenever they exist.
Given matrix A, pick an arbitrary vector ~w. If you are really lucky, A~w
is a multiple of ~w and you have stumbled across an eigenvector. If not,
keep computing A²~w, A³~w, etc. until you find a vector that is a linear
combination of its predecessors. This situation is easily detected by row
reduction.
Now you have found a polynomial p of degree m such that p(A)~w = 0.
Furthermore, this is the nonzero polynomial of lowest degree for which
p(A)~w = 0.
Over the complex numbers, this polynomial is guaranteed to have a root
λ by virtue of the “fundamental theorem of algebra” (Hubbard theorem
1.6.13). Over the real numbers or a finite field, it will have a root in the
field only if you are lucky. Assuming that the root exists, factor it out:
p(t) = (t − λ)q(t).
Now p(A)~w = (A − λI)q(A)~w = 0.
Thus q(A)~w is an eigenvector with eigenvalue λ.
Here is a 2 × 2 example where the calculation is easy.
Let A = [−1 4; −2 5].
As the arbitrary vector ~w choose (1, 0). Compute A~w and A²~w.
Use row reduction to express the third of these vectors, A²~w, as a linear
combination of the first two:
[1 −1 −7; 0 −2 −8]
Factor: p(t) =
7. Change of basis
Our “old” basis consists of the standard basis vectors ~e1 and ~e2 .
Our “new” basis consists of one eigenvector for each eigenvalue.
Let’s choose ~v1 = (2, 1) and ~v2 = (1, 1).
It would be all right to multiply either of these vectors by a constant or to
reverse their order.
Write down the change of basis matrix P whose columns express the new
basis vectors in term of the old ones.
8. Eigenvectors for a 3 × 3 matrix
For Hubbard Example 2.7.5, the calculation is best subcontracted to Math-
ematica. The matrix is
A = [1 −1 0; −1 2 −1; 0 −1 1]
Since we have help with the computation, make the choice ~w = (2, 3, 5).
The matrix to row reduce is
[2 −1 0 3; 3 −1 −3 −9; 5 2 3 6], different from the matrix in Hubbard.
The result of row reduction is the same:
[1 0 0 0; 0 1 0 −3; 0 0 1 4]
The rest of the work is easily done by hand.
Using the last column, write the polynomial p(t), and factor it.
9. When is there an eigenbasis?
This is a difficult issue in general. The simple case is where we are lucky
and find a polynomial p of degree n that has n distinct roots. In that case
we can find n eigenvectors, and it has already been proved that they are
linearly independent. They form an eigenbasis. If the roots are real, the
eigenvectors are elements of Rn . If the roots are distinct but not all real,
the eigenvectors are still a basis of Cn .
Suppose we try each standard basis vector in turn as w ~ . Using ~ei leads to
a polynomial pi . If every pi is a polynomial of degree mi < n, the situation
is more complicated. Theorem 2.7.6 in Hubbard states the result:
There exists an eigenbasis of Cn if and only if all the roots of all the pi are
simple.
Before doing the difficult proof, look at the simplest examples of matrices that
do not have n distinct eigenvalues.
• Let A = [2 0; 0 2]. In this case every vector in R2 is an eigenvector
with eigenvalue 2. There is only one eigenvalue, but any basis is an
eigenbasis.
If we choose ~w = ~e1 and form the matrix whose columns are ~w and
A~w, we get [1 2; 0 0].
What eigenvector do we find if we choose ~w = ~e2 ?
• Let A = [2 0; 1 2]. In this case there is only one eigenvalue and there is
no eigenbasis.
What happens if we choose ~w = ~e2 ?
If we choose ~w = ~e1, confirm that
[1 2 4; 0 1 4] row reduces to [1 0 −4; 0 1 4].
What is p1 ?
What happens when we carry out the procedure that usually gives an
eigenvector?
Key point: There was only one eigenvalue, the polynomial (t − 2)²
showed up, and we were unable to find a basis of eigenvectors.
10. An instructive 3 × 3 example
The surprising case, and the one that makes the proof difficult, is the one
where there exists a basis of eigenvectors but there are fewer than n distinct
eigenvalues. A simple example is A = [1 0 0; 0 2 0; 0 0 2].
Here each standard basis vector is an eigenvector. For the first one the
eigenvalue is 1; for the second and third, it is 2.
A less obvious example is
A = [2 1 −1; 0 2 0; 0 1 1]
The procedure for finding eigenvectors is carried out in the Mathematica
file, with the following results:
Using ~w = ~e1, we get p1 (t) = t − 2 and find an eigenvector
(1, 0, 0) with eigenvalue 2.
Using ~w = ~e2, we get p2 (t) = (t − 1)(t − 2) and find two eigenvectors:
(1, 0, 1) with eigenvalue 1, (1, 1, 1) with eigenvalue 2.
At this point we have found three linearly independent eigenvectors and we
have a basis.
If we use ~w = ~e3, we get p3 (t) = (t − 1)(t − 2) and find two eigenvectors:
(1, 0, 1) with eigenvalue 1, (1, 0, 0) with eigenvalue 2.
In general, if we use some arbitrary ~w, we will get p(t) = (t − 1)(t − 2)
and we will find the eigenvector with eigenvalue 1 along with some linear
combination of the eigenvectors with eigenvalue 2.
Key points about this case:
• The polynomial pi (t), in order to have simple roots, must have degree
less than n.
• We need to use more than one standard basis vector in order to find
a basis of eigenvectors.
11. Proof that if all roots are simple there is an eigenbasis
Assume that whenever we choose ~w = ~ei, the polynomial pi of degree mi
has simple roots. The columns of the matrix that we row reduce are
~ei, A~ei, · · · , A^(mi)~ei. The image of this matrix has three properties.
• It is a subspace Ei of Rn .
• It includes mi eigenvectors. Since these correspond to distinct eigen-
values, they are linearly independent, and therefore they span Ei .
• It includes ~ei .
Now take the union of all the Ei . This union has the following properties:
13. Proof 4.2, first half
Assume that whenever we choose ~w = ~ei, the polynomial pi of degree mi
has simple roots. Consider the subspace E that is the image of the matrix
whose columns are
~e1, A~e1, · · · , A^(m1)~e1, ~e2, A~e2, · · · , A^(m2)~e2, · · · , ~en, A~en, · · · , A^(mn)~en.
Prove that E = Rn (easy) and that there exists a basis for E that consists
entirely of eigenvectors (harder).
14. Proof 4.2, second half
Assume that there is a basis of Rn consisting of eigenvectors of n × n matrix
A, but that A has only k ≤ n distinct eigenvalues. Prove that for any basis
vector ~w = ~ei, the polynomial pi (t) has simple roots.
15. Conformal matrices and complex numbers
(a) Show that the polynomial p(t) for the matrix A = [7 −10; 2 −1] has roots
3 ± 2i.
(b) Show that ((A − 3I)/2)² = −I.
(c) Choose a new basis with ~v1 = ~e1, ~v2 = ((A − 3I)/2)~e1.
Use these basis vectors as the columns of matrix P .
Confirm that A = P CP −1 , where C is conformal and P is real.
16. Change of basis - nice neat case
Let A = [1 1; −2 4], and find an eigenvector, starting with ~e1 = (1, 0).
Then A~e1 = (1, −2) and A²~e1 = (−1, −10).
We row-reduce [1 1 −1; 0 −2 −10] to [1 0 −6; 0 1 5]
and conclude that A²~e1 = −6~e1 + 5A~e1, or A²~e1 − 5A~e1 + 6~e1 = 0.
Complete the process of finding two eigenvalues and show that (1, 1) and
(1, 2) are a pair of eigenvectors that form a basis for R².
p(t) =
For λ = 2,
For λ = 3,
17. Fibonacci numbers by matrices
Determine a6 and a7 by using the square of the matrix that was just
constructed.
[1 1; 1 2]   [1 1; 1 2]
18. Powers of a diagonal matrix.
For a 2 × 2 diagonal matrix,
[c1 0; 0 c2]^n = [c1^n 0; 0 c2^n].
(P^(−1)AP)^n = P^(−1)A^nP.
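Combining the two facts gives the standard way to raise a matrix to a power. A Python/NumPy sketch using the matrix and eigenvectors from item 16 (NumPy rather than the course's R):

```python
import numpy as np

A = np.array([[1.0, 1.0], [-2.0, 4.0]])   # from item 16: eigenvalues 2 and 3
P = np.array([[1.0, 1.0], [1.0, 2.0]])    # columns are the eigenvectors (1,1), (1,2)
D = np.linalg.inv(P) @ A @ P              # numerically diagonal: diag(2, 3)

n = 5
A_n = P @ np.diag(np.diag(D) ** n) @ np.linalg.inv(P)   # P D^n P^(-1)
assert np.allclose(A_n, np.linalg.matrix_power(A, n))
print(np.round(np.diag(D)))               # [2. 3.]
```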
3 Group Problems
1. Some interesting examples with 2 × 2 matrices
(a) Since a polynomial equation with real (or complex) coefficients always
has a root (the “fundamental theorem of algebra”), a real matrix is
guaranteed to have at least one complex eigenvalue. No such theorem
holds for polynomial equations with coefficients in a finite field, so a matrix
with no eigenvalues at all is a possibility. This is one of the few results in
linear algebra that depends on the underlying field.
Consider the matrix A = [3 1; n 3] with entries from the finite field Z5.
By considering the characteristic equation, find values of n that lead
to 2, 1, or 0 distinct eigenvalues. For the case of 1 eigenvalue, find an
eigenvector.
Hint: After writing the characteristic equation with n isolated on the
right side of the equals sign, make a table of the value of t^2 + 4t + 4
for each of the five possible eigenvalues. That table lets you determine
how many solutions there are for each of the five possible values of
n. When the characteristic polynomial is the square of a linear factor,
there is only one eigenvector and it is easy to construct.
(b) The matrix A = [1 −1; 4 −3] has only a single eigenvalue and only one
independent eigenvector.
Find the eigenvalue and eigenvector, show that A = D + N where D is
diagonal and N is nilpotent, and use this analysis to calculate A³ without
ever multiplying A by itself (unless you want to check your answer).
(c) Extracting square roots by diagonalization.
The matrix A = [2 1; 2 3]
conveniently has two eigenvalues that are perfect squares. Find a
basis of eigenvectors and construct a matrix P such that P −1 AP is a
diagonal matrix.
Thereby find two independent square roots of A, i.e. find matrices B1
and B2 such that B1² = B2² = A, with B2 ≠ ±B1. Hint: use the
negative square root of one of the eigenvalues, the positive square
root of the other.
If you take Physics 15c next year, you may encounter this technique
when you study “coupled oscillators.”
2. Some proofs. In doing these, you may use the fact that an eigenbasis exists
if and only if all the pi (t) have simple roots.
(a) Suppose that a 5 × 5 matrix has a basis of eigenvectors, but that its
only eigenvalues are 1 and 2. Using Hubbard Theorem 2.7.6, convince
yourself that you must make at least three different choices of ~ei in
order to find all the eigenvectors.
(b) An alternative approach to proof 4.1 – use induction.
Identify a base case (easy). Then show that if a set of k−1 eigenvectors
with distinct eigenvalues is linearly independent and you add to the
set an eigenvector ~vk with an eigenvalue λk that is different from any
of the preceding eigenvalues, the resulting set of k eigenvectors with
distinct eigenvalues is linearly independent.
(c) In general, the square matrix A that represents a Markov process has
the property that all the entries are between 0 and 1 and each column
sums to 1. Prove that such a matrix A has an eigenvalue of 1 and
that there is a “stationary vector” that is transformed into itself by A.
You may use the fact, which we have proved so far only for 2 × 2 and
3 × 3 matrices, that if a matrix has a nonzero vector in its kernel, its
determinant is zero.
3. Problems with 3 × 3 matrices, to be solved by writing or editing R scripts
(a) Sometimes you don’t find all the eigenvectors on the first try.
The matrix A = [1 2 0; 2 1 0; 0 0 1]
has three real, distinct eigenvalues, and there is a basis of eigenvectors.
Find what polynomial equation for the eigenvalues arises from each of
the following choices, and use it to construct as many eigenvectors as
possible:
• ~w = ~e1 .
• ~w = ~e3 .
• ~w = ~e1 + ~e3 .
(b) Find two eigenvectors for the matrix A = [1 −1 1; −1 1 1; −2 2 0], and
confirm that using each of the three standard basis vectors will not produce
a third independent eigenvector.
Clearly the columns of A are not independent; so 0 is an eigenvalue.
This property makes the algebra really easy.
(c) Use the technique of example 2.7.5 in Hubbard to find the eigenvalues
and eigenvectors of the matrix A = [3 4 −4; 1 3 −1; 3 6 −4].
4 Homework
1. Consider the sequence of numbers described, in a manner similar to the
Fibonacci numbers, by
b3 = 2b1 + b2
b4 = 2b2 + b3
(a) Write a matrix B to generate this sequence in the same way that
Hubbard generates the Fibonacci numbers.
(b) By considering the case b1 = 1, b2 = 2 and the case b1 = −1, b2 = 1,
find the eigenvectors and eigenvalues of B.
(c) Express the vector (1, 1) as a linear combination of the two eigenvectors,
and thereby find a formula for bn if b1 = 1, b2 = 1.
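The whole computation can be simulated in a few lines (Python here, though the course's scripts are in R). The closed form below is an assumption to be checked: it is what parts (b) and (c) predict if the eigenvalues turn out to be 2 and −1, and the loop confirms it matches the recurrence term by term.

```python
def sequence(b1, b2, count):
    """Iterate b_{n+2} = 2 b_n + b_{n+1}, returning the first `count` terms."""
    terms = [b1, b2]
    while len(terms) < count:
        terms.append(2 * terms[-2] + terms[-1])
    return terms

terms = sequence(1, 1, 10)

# Candidate closed form for b1 = b2 = 1, assuming eigenvalues 2 and -1:
closed = [(2**n - (-1)**n) // 3 for n in range(1, 11)]
```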
3. (a) Prove that if ~v1 and ~v2 are eigenvectors of matrix A, both with the
same eigenvalue λ, then any linear combination of ~v1 and ~v2 is also
an eigenvector.
(b) Suppose that A is a 3 × 3 matrix with a basis of eigenvectors but
with only two distinct eigenvalues. Prove that for any ~w, the vectors
~w, A~w, and A²~w are linearly dependent. (This is another way to
understand why all the polynomials pi (t) are simple when A has a
basis of eigenvectors but a repeated eigenvalue.)
4. Harvard graduate Ivana Markov, who concentrated in English and math-
ematics with economics as a secondary field, just cannot decide whether
she wants to be a poet or an investment banker, and so her career path is
described by the following Markov process:
5. (a) Prove by induction (no “· · · ” allowed!) that if F = P CP −1 , then
F n = P C n P −1 for all positive integers n.
(b) Suppose that 2 × 2 real matrix F has complex eigenvalues re±iθ . Show
that, for integer n, F n is a multiple of the identity matrix if and only
if nθ = mπ for some integer m. Hint: write F = P CP −1 where C is
conformal. This hint also helps with the rest of the problem.
(c) If F = [3 7; −1 −1], find the smallest n for which F^n is a multiple of
the identity. Check your answer by matrix multiplication.
(d) If G = [−2 −15; 3 10], use half-angle formulas to find a matrix A
for which A² = G. Check your answer by matrix multiplication.
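The "check your answer by matrix multiplication" step for part (c) is easy to automate. A minimal sketch (Python; the course scripts use R) that multiplies F by itself until some power is a multiple of the identity:

```python
def mat_mul(X, Y):
    """Multiply two 2x2 matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def is_multiple_of_identity(M):
    return M[0][1] == 0 and M[1][0] == 0 and M[0][0] == M[1][1]

F = [[3, 7], [-1, -1]]
power = [[1, 0], [0, 1]]          # F^0 = I
smallest_n = None
for n in range(1, 13):
    power = mat_mul(power, F)     # now power = F^n
    if is_multiple_of_identity(power):
        smallest_n = n
        break
```

Comparing smallest_n with the angle θ read off from the eigenvalues re^(±iθ) is exactly the nθ = mπ criterion of part (b).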
Problems that require writing or editing R scripts
8. Here is a symmetric matrix, which is guaranteed to have an orthonormal
basis of eigenvectors. For once, the numbers have not been rigged to make
the eigenvalues be integers.
A = [4 −1 1; −1 3 2; 1 2 −3]
Express A in the form P D P⁻¹, where D is diagonal and P is an isometry
matrix whose columns are orthogonal unit vectors.
A similar example is in script 1.4X.
MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #2, Week 1 (Number Systems and Sequences)
Authors: Paul Bamberg and Kate Penner (based on their course MATH S-322)
R scripts by Paul Bamberg
Last modified: July 24, 2015 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-7 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.
Reading
• (Subsection 1.2) Look at the axioms for an ordered field (Ross, p. 14).
Identify one of the axioms that is not satisfied by the complex numbers,
which form a field but not an ordered field.
• (Subsection 1.2) You are given an unlimited budget to build a podium, one
foot in height, for the gold medal winner in your school’s track meet. Your
only available construction material is squares of gold foil, which are very
thin. Show that the Archimedean property of the real numbers guarantees
that you can succeed.
• (Subsection 1.2) Find a way to express √2 (which is irrational) as the least
upper bound of a set of rational numbers. Hint: you can write √2 in
decimal notation.
• (Subsection 1.4) After a careful reading of example 1 in section 8, write out
a “Formal Proof” that
lim 1/√n = 0.
• (Subsection 1.5) Invent sequences (sn ) and (tn ) such that lim(sn ) = 0 but
lim(sn tn ) = 2. Hint: look at theorem 9.4. You need to invent a (tn ) that
does not satisfy the hypotheses of this theorem.
• 5.1 Define “countably infinite.” Prove that the set of positive rational
numbers is countably infinite, but that the set of real numbers in the interval
[0,1], as represented by infinite decimals, is not countable.
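The diagonal enumeration behind the countability proof can be acted out directly; here is a minimal sketch (Python here; script 2.1A does the analogous experiment in R). Walking the diagonals p + q = 2, 3, 4, . . . visits every pair of natural numbers, and skipping unreduced fractions such as 2/4 turns the walk into a bijection between N and the positive rationals.

```python
from math import gcd

def first_rationals(count):
    """List the first `count` positive rationals p/q in diagonal order,
    skipping unreduced duplicates."""
    found = []
    s = 2                          # current diagonal: p + q = s
    while len(found) < count:
        for p in range(1, s):
            q = s - p
            if gcd(p, q) == 1:     # keep only fractions in lowest terms
                found.append((p, q))
                if len(found) == count:
                    break
        s += 1
    return found

listing = first_rationals(8)       # 1/1, 1/2, 2/1, 1/3, 3/1, 1/4, 2/3, 3/2
```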
R Scripts
• Script 2.1A-Countability.R
Topic 1 - The set of ordered pairs of natural numbers is countable
Topic 2 - The set of positive rational numbers is countable
• Script 2.1B-Uncountability.R
Topic 1 - Cantor’s proof of uncountability
Topic 2 - A different-looking version of the same argument
• Script 2.1C-Denseness.R
Topic 1 - Placing rational numbers between any two real numbers
• Script 2.1D-Sequences.R
Topic 1 - Limit of an infinite sequence
Topic 2 - Limit of sum = sum of limits
Topic 3 - Convergence of sequence of inverses (proof 5.2)
1 Executive Summary
1.1 Natural Numbers and Rational Numbers
• The natural numbers N are 1, 2, 3, · · · . They have the following rather
obvious properties. What is not obvious is that these five properties (the
“Peano axioms”) are sufficient to prove any other property of the natural
numbers.
– N1. 1 belongs to N.
– N2. If n ∈ N, then n + 1 ∈ N.
– N3. 1 is not the successor of any element of N.
– N4. If n and m ∈ N have the same successor, then n = m.
– N5. A subset S ⊆ N which contains 1, and which contains n + 1
whenever it contains n, must equal N.
• The “least number principle” states that any nonempty subset of N has a
least element. This statement, along with the assumption that any natural
number except 1 has a predecessor, can be used to replace N5.
Practical application: instead of doing a proof by induction, you can assert
that k > 1 is the smallest integer for which Pk is false, then get a contra-
diction by showing that Pk−1 is also false, thereby proving that the set for
which Pk is false must be empty.
• The rational numbers form a “countably infinite set,” which means that
there is a bijection between them and the natural numbers. Many proofs
rely on the fact that the rational numbers, or a subset of them, can be
enumerated as q1 , q2 , · · · .
1.2 Rational Numbers and Real Numbers
• The rational numbers and the real numbers each form an ordered field,
which means that there is a relation ≤ with properties
O1. Given a and b, either a ≤ b or b ≤ a.
O2. If a ≤ b and b ≤ a, then a = b.
O3. If a ≤ b and b ≤ c then a ≤ c.
O4. If a ≤ b, then a + c ≤ b + c.
O5. If a ≤ b and 0 ≤ c, then ac ≤ bc.
Many important properties of infinite sequences of real numbers can be
proved on the basis of ordering.
• Many well-known rules of algebra are not included on the list of field axioms.
Usually, as for (−a)(−b) = ab, this is because they are easily provable
theorems. However, there are properties of the real numbers that cannot
be proved from the field axioms alone because they rely on the axiom that
the real numbers are complete. The Completeness Axiom states that
Every nonempty subset S of R that is bounded above has a least upper
bound.
This least upper bound sup S is not necessarily a member of the set S.
• The rational numbers are a “dense subset” of the real numbers. This means
if a, b ∈ R and a < b, there exists r ∈ Q such that a < r < b.
Again the proof relies on the completeness of the real numbers.
• The real numbers form an uncountable set. This means that there is no bi-
jection between them and the natural numbers: they cannot be enumerated
as r1 , r2 , · · · .
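Cantor's diagonal argument (script 2.1B) can likewise be played out on any finite stretch of a purported enumeration. The sketch below (Python) represents each listed real in [0, 1] by a string of decimal digits and builds a number that differs from the nth entry in the nth place, so it appears nowhere in the list; the four sample expansions are arbitrary.

```python
def diagonal_escape(listing):
    """Given decimal digit strings for numbers in [0, 1], return a number
    that differs from the n-th entry in its n-th decimal place.
    Using only the digits 5 and 6 avoids the 0.4999... = 0.5000... ambiguity."""
    digits = []
    for n, expansion in enumerate(listing):
        digits.append('5' if expansion[n] != '5' else '6')
    return '0.' + ''.join(digits)

listing = ['3141', '1414', '7182', '5772']   # leading digits of 4 "listed" reals
new_number = diagonal_escape(listing)        # differs from every entry
```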
1.3 Quantifiers and Negation
• Quantifiers are not used by Ross, but they are conventional in mathematics
and save space when you are writing proofs.
∃ is read “there exists.” It is usually followed by “such that” or “s.t.”
Example: the proposition “∃x s.t. x2 = 4” is true since either 2 or -2 has
the desired property.
∀ is read “for all” or “for each” or “for every.” It is used to specify that some
proposition is true for every member of a possibly infinite set or sequence.
Example: ∀x ∈ R, x2 ≥ 0 is true, but ∀x ∈ R, x2 > 0 is false.
– The negation of “∃x such that P (x) is true” is “∀x, P (x) is false.”
– The negation of “∀x, P (x) is true” is “∃x such that P (x) is false.”
1.5 Theorems about sequences and their limits
• Theorems about limits, all provable from the definition. These will be
especially useful for us after we define continuity in terms of sequences.
• Using the limit theorems above is usually a much more efficient way to find
the limit of a sequence than doing a brute-force calculation of N in terms
of ε. Ross has six diverse examples.
2 Lecture Outline
1. Peano axioms for the natural numbers N = 1, 2, 3, · · ·
• N1. 1 belongs to N.
• N2. If n ∈ N, then n + 1 ∈ N.
• N3. 1 is not the successor of any element of N.
• N4. If n and m ∈ N have the same successor, then n = m.
• N5. A subset S ⊆ N which contains 1, and which contains n + 1
whenever it contains n, must equal N.
A surprising replacement for axiom N5:
• Every nonempty subset of N has a smallest element.
• Any element of N except 1 has a predecessor.
Use these two statements (plus N1 through N4) to prove N5.
There is less to this approach than meets the eye. Instead of proving that
Pk implies Pk+1 for k ≥ 1, we showed that NOT Pk implies NOT Pk−1 for
k ≥ 2.
But these two statements are logically equivalent: quite generally, for
propositions p and q, p =⇒ q if and only if ¬q =⇒ ¬p (the principle of
contraposition).
A practical rule of thumb:
• If it is easier to prove that Pk =⇒ Pk+1 , use induction.
• If it is easier to prove that ¬Pk =⇒ ¬Pk−1 , use the least-number
principle.
2. Proof by induction and least number principle
Students of algebra are aware that for any positive integer n, xn − y n is
divisible by x − y.
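The divisibility comes from the identity xⁿ − yⁿ = (x − y)(x^(n−1) + x^(n−2)y + · · · + y^(n−1)), which is also the engine of the induction step. A quick numerical spot-check of the identity (Python; the integer triples are arbitrary samples):

```python
def cofactor_sum(x, y, n):
    """The second factor: x^(n-1) + x^(n-2) y + ... + y^(n-1)."""
    return sum(x**(n - 1 - k) * y**k for k in range(n))

# Verify x^n - y^n = (x - y) * cofactor_sum(x, y, n) on sample inputs.
checks = [x**n - y**n == (x - y) * cofactor_sum(x, y, n)
          for (x, y, n) in [(7, 2, 5), (10, 3, 4), (5, -2, 6)]]
```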
3. (Ross, page 16; consequences of the ordered field axioms)
Using the fact that a set of numbers F (could be Q or R) satisfies the
ordered field axioms
O1. Given a and b, either a ≤ b or b ≤ a.
O2. If a ≤ b and b ≤ a, then a = b.
O3. If a ≤ b and b ≤ c then a ≤ c.
O4. If a ≤ b, then a + c ≤ b + c.
O5. If a ≤ b and 0 ≤ c, then ac ≤ bc.
prove the following:
• If a ≤ b then −b ≤ −a.
• ∀a ∈ F , a2 ≥ 0.
4. (Countability of the rational numbers - first part of proof 5.1 - script 2.1A)
Use the “diagonal trick” to prove that the positive rational numbers form
a countably infinite set.
6. (Uncountability of the real numbers - second part of proof 5.1 - script 2.1B)
Prove that the real numbers between 0 and 1, as represented by infinite
decimals, form an uncountably infinite set.
12. (Ross, page 52 - to be done in LaTeX)
Suppose that lim sn = +∞ and lim tn > 0. Prove that lim sn tn = +∞.
13. Proofs based on nothing but the ordered field axioms
O1. Given a and b, either a ≤ b or b ≤ a.
O2. If a ≤ b and b ≤ a, then a = b.
O3. If a ≤ b and b ≤ c then a ≤ c.
O4. If a ≤ b, then a + c ≤ b + c.
O5. If a ≤ b and 0 ≤ c, then ac ≤ bc.
(a) Using the axioms for an ordered field, prove that the sum of two pos-
itive numbers is a positive number.
(b) Using the axioms for an ordered field, prove that the product of two
positive numbers is a positive number.
(c) Prove that Z5 is not an ordered field.
14. Least upper bound principle works for R but not for Q.
Your students at Springfield North are competing with a rival team from
Springfield South to draw up a business plan for a company with m scientists
and n other employees. Entries with m2 > 2n2 get rejected. The entry with
the highest possible ratio of scientists to other employees wins the contest.
Will this competition necessarily have a winner?
15. Use quantifiers to express the following concepts:
(a) “No matter how large a positive number M you choose, the sequence
(sn ) has infinitely many elements that are greater than M .”
Does this statement imply that lim sn = +∞?
(b) “No matter how small a positive number ε you choose, the sequence
(sn ) has only finitely many elements that lie outside the interval
(a − ε, a + ε).”
Does this statement imply that lim sn = a?
16. Proving limits by brute force
Prove by brute force that the sequence
1/3, 2/5, 3/7, 4/9, · · ·
converges to 1/2.
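Here sn = n/(2n + 1) and |sn − 1/2| = 1/(2(2n + 1)), which decreases with n. A brute-force proof produces, for each ε, an explicit N beyond which the distance stays below ε; the sketch below (Python; a sanity check rather than a proof) just searches for that N numerically.

```python
def distance(n):
    """|s_n - 1/2| for s_n = n / (2n + 1); equals 1 / (2(2n + 1))."""
    return abs(n / (2 * n + 1) - 0.5)

def find_N(eps):
    """Smallest N with distance(N) < eps; since distance(n) decreases,
    the inequality then holds for every n >= N."""
    n = 1
    while distance(n) >= eps:
        n += 1
    return n
```

For example, find_N(0.01) reproduces the N you would get by solving 1/(2(2N + 1)) < 0.01 by hand.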
17. Using limit theorems and trickery to prove limits
(a) Evaluate
lim 1/(n(√(n² + 1) − √(n² − 1))).
Note: 1/(100(√10001 − √9999)) = 0.99999999874999999...
(b) Evaluate
lim((n + 1)^(4/3) − n^(4/3)).
Note: 101^(4/3) − 100^(4/3) = 6.19907769....; 100^(1/3) = 4.6415....
3 Group Problems
1. Proofs that use induction
2. Properties of sequences (to be done in LaTeX)
3. Some slightly computational problems
4 Homework
1. Ross, exercise 1.1. Do the proof both by induction (with “base case” and
“inductive step”) and by the least number principle (show that the assump-
tion that there is a nonempty set of positive integers for which the formula
is not true leads to a contradiction).
(a) He will shoot only finitely many arrows more than 200 meters.
(b) The negation of (a): he will shoot infinitely many arrows more than
200 meters. (You can do this mechanically by using the rules for
negation of statements with quantifiers.)
(c) No matter how small a positive number ε Artemis chooses, all the rest
of his shots will travel more than 200 − ε meters. (Off the record –
this idea can be expressed as lim inf sn = 200.)
(d) He will become so consistent that eventually any two of his subsequent
shots will differ in distance by less than 1 meter. (This idea will
resurface next week as the concept of “Cauchy sequence.”)
3. Denseness of Q
This problem is closely related to group problem 1c.
(a) Find a rational number x such that 355/113 < x < 22/7.
(b) Find a rational number x such that π < x < 355/113.
Hint: π = 4 arctan 1, which any decent calculator can evaluate.
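Exact rational arithmetic makes both parts easy to experiment with. The sketch below (Python's fractions module; the candidates shown are illustrations, not the only answers) tries the mediant of the two bounds for (a) — the mediant of two unequal fractions always lies strictly between them — and a truncated decimal approximation of π for (b).

```python
from fractions import Fraction

lo = Fraction(355, 113)            # 3.14159292...
hi = Fraction(22, 7)               # 3.14285714...

# (a) The mediant: sum of numerators over sum of denominators.
mediant = Fraction(355 + 22, 113 + 7)
between_a = lo < mediant < hi

# (b) pi = 3.14159265..., so a suitable decimal truncation of pi works.
candidate_b = Fraction(31415928, 10**7)
below_bound = candidate_b < lo     # candidate_b also exceeds pi
```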
5. Ross, exercise 4.8. If you like this problem, you might enjoy reading enrich-
ment section 6 in Ross, which explains how to construct the real numbers
using “Dedekind cuts.”
6. Ross, Exercise 8.2(c) and 8.2(e). You might want to use the limit theorems
from section 9 to determine the limit, but then do a “Formal Proof” in the
style of the examples from section 8, working directly from the definition
of limit.
The last three problems must be done in LaTeX. Print the pdf file and
attach it to your handwritten solutions.
7. Ross, Exercise 8.9. The star on the exercise means that it is “referred to in
many places.”
8. Ross, Exercise 9.12. This “ratio test” may be familiar from a calculus
course. There is a similar, better known test for infinite series that is
slightly more difficult to prove.
9. Ross, Exercises 9.15 and 9.16(a). The first of these results is invoked fre-
quently in calculus courses, especially in conjunction with Taylor series, but
surprisingly few students can prove it. If you are working the problems in
order, both should be easy.
MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #2, Week 2 (Series, Convergence, Power Series)
Authors: Paul Bamberg and Kate Penner (based on their course MATH S-322)
R scripts by Paul Bamberg
Last modified: July 24, 2014 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-8 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.
• Review the convergence tests you can remember and any specific criteria
for their applications. Use one to show that the series
Σ_{n=0}^∞ e^(−n²)
converges.
• 6.1 Bolzano-Weierstrass
– Prove that any Cauchy sequence of real numbers is convergent. You
will need to use something that follows from the completeness of the
real numbers. This could be the Bolzano-Weierstrass theorem, or it
could be the fact that, for a sequence of real numbers, if lim inf sn =
lim sup sn = s, then lim sn is defined and lim sn = s.
R Scripts
• Script 2.2A-MoreSequences.R
Topic 1 – Cauchy Sequences
Topic 2 – Lim sup and lim inf of a sequence
• Script 2.2B-Series.R
Topic 1 – Series and partial sums
Topic 2 – Passing and failing the root test
Topic 3 – Why the harmonic series diverges
1 Executive Summary
1.1 Monotone sequences
A sequence (sn ) is increasing if sn ≤ sn+1 ∀n.
A sequence (sn ) is strictly increasing if sn < sn+1 ∀n.
A sequence (sn ) is decreasing if sn ≥ sn+1 ∀n.
A sequence (sn ) is strictly decreasing if sn > sn+1 ∀n.
A sequence that is either increasing or decreasing is called a monotone sequence.
All bounded monotone sequences converge.
For an unbounded increasing sequence, limn→∞ sn = +∞.
For an unbounded decreasing sequence, limn→∞ sn = −∞.
1.4 lim inf and lim sup
Given any bounded sequence, the “tail” of the sequence, which consists of the
infinite number of elements beyond the N th element, has a well-defined supremum
and infimum.
Let us combine the notion of limit with the definitions of supremum and
infimum. The “limit inferior” and “limit superior” are written and defined as
follows:
lim inf sn = lim_{N→∞} inf{sn : n > N}
lim sup sn = lim_{N→∞} sup{sn : n > N}
Now that we know the concepts of lim inf and lim sup, we find the following
properties hold:
• For a Cauchy sequence of real numbers, lim inf sn = lim sup sn , and so the
sequence converges.
1.6 Infinite series, partial sums, and convergence
Given an infinite series Σ an we define the partial sum
sn = Σ_{k=m}^n ak
1.9 Convergence tests
• Limit of the terms. If a series converges, the limit of its terms is 0.
• Comparison Test. Consider the series Σ an of all positive terms.
If Σ an converges and |bn| < an for all n, then Σ bn also converges.
If Σ an diverges to +∞ and bn ≥ an for all n, then Σ bn also diverges to
+∞.
• Ratio Test. Consider the series Σ an of nonzero terms.
This series converges if lim sup |an+1/an| < 1.
This series diverges if lim inf |an+1/an| > 1.
If lim inf |an+1/an| ≤ 1 ≤ lim sup |an+1/an|, then we have no information and
need to perform another test to determine convergence.
• Root Test. Consider the series Σ an , and evaluate lim sup |an|^(1/n).
If lim sup |an|^(1/n) < 1, the series Σ an converges absolutely.
If lim sup |an|^(1/n) > 1, the series Σ an diverges.
If lim sup |an|^(1/n) = 1, the test gives no information.
• Integral Test. Consider a series of nonnegative terms for which the other
tests seem to be failing. In the event that we can find a function f (x), such
that f (n) = an ∀n, we may look at the behavior of this function’s integral
to tell us whether the series converges.
If lim_{n→∞} ∫_1^n f (x)dx = +∞, then the series will diverge.
If lim_{n→∞} ∫_1^n f (x)dx < +∞, then the series will converge.
• Power Series. A power series is a series of the form
Σ_{n=0}^∞ an x^n,
where the sequence (an ) is a sequence of real numbers. A power series defines a
function of x whose domain is the set of values of x for which the series converges.
That, of course, depends on the coefficients (an ). There are three possibilities:
– Converges ∀x ∈ R.
– Converges only for x = 0.
– Converges ∀x in some interval, centered at 0. The interval may be open (−R, R),
closed [−R, R], or half-open like [−R, R). The number R is called the radius
of convergence. Frequently the series converges absolutely in the interior of the
interval, but the convergence at an endpoint is only conditional.
Lecture Outline
Let (sn ) be a sequence in R. Prove that if lim inf sn = lim sup sn = s, then
lim sn is defined and
lim sn = s
Using the result of the preceding proof, which relies on the completeness
axiom for the real numbers, prove that any Cauchy sequence of real num-
bers is convergent.
4. (Convergent subsequences, Bolzano-Weierstrass)
Given a sequence (sn )n∈N , a subsequence of this sequence is a sequence
(tk )k∈N , where for each k there is a positive integer nk , with n1 < n2 <
n3 < · · · , such that tk = snk . So (tk ) is just a sampling of some, or all, of
the (sn ) terms, with order preserved.
A term sn is called dominant if it is greater than any term that follows it.
(a) Use the concept of dominant term to prove that every sequence (sn )
has a monotonic subsequence.
(b) Prove that any bounded increasing sequence converges to its least
upper bound.
(c) Prove the Bolzano-Weierstrass Theorem: every bounded sequence has
a convergent subsequence.
6. (Ross, p.99-100, The Root Test)
Consider the infinite series Σ an and the lim sup |an|^(1/n), referred to as α.
Prove the following statements about Σ an :
(you may assume the Comparison Test as proven)
7. (Ross, pp. 99-100, The Ratio Test)
Let Σ an be an infinite series of nonzero terms. Prove the following (you
may assume the Root Test as proven). You may also use without proof the
following result from Ross (theorem 12.2):
lim inf |sn+1/sn| ≤ lim inf |sn|^(1/n) ≤ lim sup |sn|^(1/n) ≤ lim sup |sn+1/sn|
• If lim sup |an+1 /an | < 1, then the series converges absolutely.
• If lim inf |an+1 /an | > 1, then the series diverges.
• If lim inf |an+1 /an | ≤ 1 ≤ lim sup |an+1 /an |, then the test gives no
information.
9. Defining a sequence recursively (model for group problems, set 1)
John’s rich parents hope that a track record of annual gifts to Harvard will
enhance his chance of admission. On the day of his birth they set up a trust
fund with a balance s0 = 1 million dollars. On each birthday they add
another million dollars to the fund, and the trustee immediately donates
1/3 of the fund to Harvard in John’s name. After the donation, the balance
is therefore
sn+1 = (2/3)(sn + 1).
• Use R to find the annual fund balance up through s18 .
• Use induction to show sn < 2 for all n.
• Show that (sn ) is an increasing sequence.
• Show that lim sn exists and find lim sn .
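These bullets translate almost verbatim into a loop; the sketch below is in Python, but the same few lines work in R as the problem intends. It computes the balances through s18 and records the two facts (increasing, bounded by 2) that force convergence; the fixed point of s = (2/3)(s + 1) is the natural guess for the limit.

```python
# s_0 = 1 (million dollars); s_{n+1} = (2/3)(s_n + 1) after each birthday.
balances = [1.0]
for _ in range(18):
    balances.append((2.0 / 3.0) * (balances[-1] + 1.0))

increasing = all(a < b for a, b in zip(balances, balances[1:]))
bounded = all(b < 2 for b in balances)
# Increasing and bounded above => lim s_n exists; the fixed point of
# s = (2/3)(s + 1), namely s = 2, is the candidate limit.
```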
10. What is the fallacy in the following argument?
• loge 2 = 1 − 1/2 + 1/3 − 1/4 + 1/5 − 1/6 + 1/7 − 1/8 + · · ·
• (1/2) loge 2 = 1/2 − 1/4 + 1/6 − 1/8 + · · ·
• (3/2) loge 2 = 1 − 1/4 − 1/4 + 1/3 − 1/8 − 1/8 + 1/5 + · · · = loge 2
• 3/2 = 1; 3 = 2; 1 = 0.
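The resolution is that only absolutely convergent series may be rearranged at will. You can watch a genuine rearrangement change the sum: the sketch below (Python) sums the alternating harmonic series taken two positive terms for each negative one, 1 + 1/3 − 1/2 + 1/5 + 1/7 − 1/4 + · · · , and the partial sums settle near (3/2) loge 2 instead of loge 2.

```python
from math import log

def rearranged_sum(blocks):
    """Partial sums of 1 + 1/3 - 1/2 + 1/5 + 1/7 - 1/4 + ...:
    block k contributes 1/(4k-3) + 1/(4k-1) - 1/(2k)."""
    total = 0.0
    for k in range(1, blocks + 1):
        total += 1.0 / (4 * k - 3) + 1.0 / (4 * k - 1) - 1.0 / (2 * k)
    return total

approx = rearranged_sum(20000)     # close to 1.5 * log(2), not log(2)
target = 1.5 * log(2)
```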
11. Clever proofs for p-series.
(a) Prove that Σ 1/n = +∞ by showing that the sequence of partial sums
is not a Cauchy sequence.
(b) Evaluate
Σ_{n=2}^∞ 1/(n(n − 1))
by exploiting the fact that this is a “telescoping series.”
(c) Prove that
Σ_{n=2}^∞ 1/n²
is convergent.
12. For the sequence
sn = ((n + 2)/(n + 1)) sin(nπ/4),
give three examples of a subsequence, find the lim sup and the lim inf, and
determine whether it converges.
13. A case where the root test outperforms the ratio test
(Ross, Example 8 on page 103)
Σ_{n=0}^∞ 2^((−1)^n − n) = 2 + 1/4 + 1/2 + 1/16 + 1/8 + 1/64 + · · ·
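Both tests are easy to watch numerically on this series (Python here; script 2.2B runs similar experiments in R). The ratios |a_{n+1}/a_n| bounce between 2 and 1/8, so lim inf < 1 < lim sup and the ratio test is silent, while |a_n|^(1/n) settles toward 1/2 < 1 and the root test certifies convergence.

```python
def a(n):
    """a_n = 2^((-1)^n - n), the terms of Ross's Example 8."""
    return 2.0 ** ((-1) ** n - n)

ratios = [a(n + 1) / a(n) for n in range(1, 30)]   # oscillates: 2, 1/8, 2, ...
roots = [a(n) ** (1.0 / n) for n in range(1, 30)]  # settles toward 1/2
```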
14. (Model for group problems, set 3) Find the radius of convergence and the
exact interval of convergence for the series
Σ_{n=0}^∞ (n/2^n) x^(3n).
2 Group Problems
1. Subsequences, monotone sequences, lim sup and lim inf
π/4 = 1 − 1/3 + 1/5 − 1/7 + 1/9 − · · ·
i. For the sequence of partial sums (sn ), find an increasing subse-
quence and a decreasing subsequence.
ii. Prove that lim sup sn = lim inf sn .
iii. Prove that the series is not absolutely convergent by showing that
it fails the Cauchy test with ε = 1/2.
2. Sequences, defined recursively
Feel free to use R to calculate the first few terms of the sequence instead
of doing it by hand. Using a for loop, you can easily calculate as many
terms as you like. By modifying script 2.2C, you can easily plot the first
20 or so terms. If you come up with a good R script, please upload it to
the solutions page.
(a) (Ross, 10.9) Let s1 = 1 and sn+1 = (n/(n + 1)) sn² for n ≥ 1.
• Find s2 , s3 , s4 if working by hand. If using R, use a for loop to go
at least as far as s20 .
• Show that lim sn exists.
• Prove that lim sn = 0.
(b) (Ross, 10.10) Let s1 = 1 and sn+1 = (1/3)(sn + 1) for n ≥ 1.
• Find s2 , s3 , s4 if working by hand. If using R, use a for loop to go
at least as far as s20 .
• Use induction to show sn > 1/2 for all n.
• Show that (sn ) is a decreasing sequence.
• Show that lim sn exists and find lim sn .
(c) (Ross, 10.12) Let t1 = 1 and tn+1 = [1 − 1/(n + 1)²] tn for n ≥ 1.
• Find t2 , t3 , t4 if working by hand. If using R, use a for loop to go
at least as far as t20 .
• Show that lim tn exists.
• Use induction to show tn = (n + 1)/(2n) for all n.
• Find lim tn .
This last set of problems should be done using LaTeX. They provide good
practice with summations, fractions, and exponents.
(a) Σ (2^n/n!) x^n and Σ x^(n!).
(b) Σ (3^n/(n · 4^n)) x^n and Σ √n x^n.
(c) Σ ((−1)^n/(n² 4^n)) x^n and Σ (3^n/√n) x^n.
3 Homework
1. Ross, 10.2 (Prove all bounded decreasing sequences converge.)
2. Ross, 10.6
3. Ross, 11.8.
4. Suppose that (sn ) is a Cauchy sequence and that the subsequence (s1 , s2 , s4 , s8 , s16 , · · · )
converges to s. Prove that lim sn = s. Hint– use the standard bag of tricks:
the triangle inequality, epsilon-over-2, etc.
5. Sample problem 2 shows that in general, the order of terms in a series must
be respected when calculating the sum. However, addition is commutative
and associative, which makes it surprising that order should matter.
• Prove that if a series (an ) has only positive terms, then its sum is
equal to the least upper bound of the numbers that can be obtained
by summing over any finite subset of the terms.
Hint: Call this least upper bound S′. Call the sum as defined by Ross
S. Prove that S′ ≤ S and that S ≤ S′.
• Suppose that a series includes both positive and negative terms and
its sum is S. It looks as though you can split it into a series of non-
negative terms and a series of negative terms, sum each separately,
then combine the results. Will this approach work for the series in
sample problem 2?
7. Ross, 14.8.
8. Ross, 15.6
9. Ross, 23.4. You might find it useful to have R generate some terms of the
series.
MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #2, Week 3 (Limits and continuity of functions)
Authors: Paul Bamberg and Kate Penner (based on their course MATH S-322)
R scripts by Paul Bamberg
Last modified: July 24, 2014 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-8 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.
• Study example 1 on page 125, then invent a similar argument for the func-
tion f (x) = x2 − 2x + 1. It is important to realize that a proof can be done
“for all sequences.”
• The function g(x) = sin(1/x) for x ≠ 0, g(0) = 0, is discontinuous at
x = 0. Show that the sequence xn = 1/(nπ) can be used as a “bad
sequence” to prove this assertion.
• Suppose that a function f (x) has the property that the image of the interval
I = [0, 2] is the set J = [0, 1] ∪ [2, 3]. Invent a discontinuous function f
with this property and convince yourself that no continuous function can
have this property.
• When you define the arc sine function in a calculus course, you begin by
restricting the domain of the sine function to the interval [− π2 , π2 ]. Convince
yourself that this restriction makes Theorems 18.4 and 18.5 apply, while
restricting the domain to [0, π] would not work. Which restricted domain
works for defining the arc cosine function?
• Read through examples 1-3 in section 19.1 of Ross. You can skip over the
computational details. The key issue is this:
On the interval (0, ∞) the function f (x) = 1/x² is continuous for any specified
x0 . However, when x0 is very small, the δ that is needed to prove continuity
must be proportional to x0³. There is no “one size fits all” δ that is indepen-
dent of x0 . Example 3 shows that even with ε = 1, it is impossible to meet
the requirement for uniform continuity. When you draw the graph of f (x),
you see what the problem is: the derivative of f (x), which is essentially the
ratio of ε to δ, is unbounded.
• Now you have seen two ways to select a function and an interval so that the
function is continuous but not uniformly continuous on the interval. Read
through the rest of section 19.1 to see how to avoid this situation. There
are four ways:
• Think hard about definition 20.1. This is not the definition of limit that
is found in most calculus texts, but it is in some ways better because it
incorporates the ideas of “limit at infinity” and “increases without limit.”
• Look at theorems 20.4 and 20.5, and convince yourself that they are crucial
for proving the well-known formulas for derivatives that are in every calculus
course. If you are fond of entertaining counterexamples, look at example 7
on page 158.
• 7.1 Suppose that a < b, f is continuous on [a, b], and f (a) < y < f (b).
Prove that there exists at least one x ∈ [a, b] such that f (x) = y.
Use Ross’s “no bad sequence” definition of continuity, not the epsilon-delta
definition.
Additional proofs (may appear on quiz; students will post pdfs or
videos)
• 7.3 Prove that if f and g are real-valued functions that are continuous at
x0 ∈ R, then f + g is continuous at x0 . Do the proof twice: once using the
“no bad sequence” definition of continuity and once using the epsilon-delta
definition of continuity.
R Scripts
• Script 2.3A-Continuity.R
Topic 1 - Two definitions of continuity
Topic 2 – Uniform continuity
• Script 2.3B-IntermediateValue.R
Topic 1 - Proving the intermediate value theorem
Topic 2 - Corollaries of the IVT
1 Executive Summary
1.1 Two equivalent definitions of continuity
• Continuity in terms of sequences
This definition is not standard: Ross uses it, but many authors use the
equivalent epsilon-delta definition. Here is some terminology that students
find useful when discussing the concept:
1.2 Useful properties of continuous functions
• New continuous functions from old ones.
Once you know that the identity function and elementary functions like nth
root, sine, cosine, exponential, and logarithm are continuous (Ross has not
yet defined most of these functions!), you can state the casual rule
“If you can write a formula for a function that does not involve
division by zero, that function is continuous everywhere.”
• If f is continuous on a closed interval [a, b] (or, in the later items, on an
interval I with image J):
– f is a bounded function.
– f achieves its maximum and minimum values on the interval (i.e. they
are not just approached as limiting values).
– IVT: If a < b and y lies between f (a) and f (b), there exists at least
one x in (a, b) for which f (x) = y.
– The image of an interval I is either a single point or an interval J.
– If f is a strictly increasing function on I, there is a continuous strictly
increasing inverse function f −1 : J → I.
– If f is a strictly decreasing function on I, there is a continuous strictly
decreasing inverse function f −1 : J → I.
– If f is one-to-one on I, it is either strictly increasing or strictly de-
creasing.
1.3 Continuity versus uniform continuity
It’s all a matter of the order of quantifiers. For continuity, y is agreed upon
before the epsilon-delta game is played. For uniform continuity, a challenge is
made using some ε > 0, then a δ has to be chosen that meets the challenge
independent of y.
For function f whose domain is a set S:
• Continuity: ∀y ∈ S, ∀ε > 0,
∃δ > 0 such that ∀x ∈ S, |x − y| < δ implies |f (x) − f (y)| < ε.
• Uniform continuity: ∀ε > 0,
∃δ > 0 such that ∀x, y ∈ S, |x − y| < δ implies |f (x) − f (y)| < ε.
• On [0, ∞) (not a bounded set), the squaring function is continuous but not
uniformly continuous.
• On (0, 1) (not closed) the function f (x) = 1/x is continuous but not uniformly
continuous.
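For f(x) = 1/x with ε = 1 you can even compute the best possible δ at each point y: the binding case is x just below y, and solving 1/(y − δ) − 1/y = ε gives δ = εy²/(1 + εy). The sketch below (Python) tabulates this; δ shrinks to 0 with y, which is exactly why no single δ serves all of (0, 1).

```python
def best_delta(y, eps=1.0):
    """Largest delta with |1/x - 1/y| <= eps whenever |x - y| < delta,
    for f(x) = 1/x on (0, 1): delta = eps * y^2 / (1 + eps * y)."""
    return eps * y * y / (1.0 + eps * y)

deltas = [best_delta(y) for y in (0.5, 0.1, 0.01, 0.001)]
# deltas shrink roughly like y^2; no uniform choice of delta exists.
```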
1.4 Limits of functions
1. Definitions of “limit”
4. One-sided limits
We can modify either definition to provide a definition for L = limx→a+ f (x).
• With Ross’s definition, choose the set S to include only values that
are greater than a.
• With the conventional definition, consider only x > a: i.e.
a < x < a + δ implies |f (x) − L| < .
Lecture outline
7. (Ross, page 156)
Use Ross’s non-standard but excellent definition of limit.
S is a subset of R, f is a function defined on S, and a and L are real
numbers, ∞ or −∞.
Then limx→aS f (x) = L means
for every sequence (xn ) in S with limit a, we have lim(f (xn )) = L.
Suppose that L1 = limx→aS f1 (x) and L2 = limx→aS f2 (x) exist and are
finite.
Prove that limx→aS (f1 + f2 )(x) = L1 + L2 and
limx→aS (f1 f2 )(x) = L1 L2 .
9. Using the “bad sequence” criterion to show that a function is discontinuous.
The “signum function” sgn(x) is defined as x/|x| for x ≠ 0, and 0 for x = 0.
Invent a “bad sequence,” none of whose elements is zero, to prove that
sgn(x) is discontinuous at 0, then show that for any positive x, no such bad
sequence can be constructed.
Restate this proof that sgn(x) is discontinuous at x = 0 and continuous for
positive x in terms of the epsilon-delta definition.
10. Prove that the function
C(x) = 1 − x²/2 + x⁴/24
is equal to zero for one and only one value x ∈ [1, 2].
This result will be useful when we define π without trigonometry.
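A quick numerical companion (a Python sketch, not part of the course materials): since C(1) > 0 and C(2) < 0, bisection traps the unique root.

```python
def C(x):
    # Truncated cosine series: 1 - x^2/2! + x^4/4!
    return 1 - x**2 / 2 + x**4 / 24

lo, hi = 1.0, 2.0          # C(1) > 0 and C(2) < 0: a sign change in [1, 2]
for _ in range(60):        # bisection: halve the bracket 60 times
    mid = (lo + hi) / 2
    if C(lo) * C(mid) <= 0:
        hi = mid
    else:
        lo = mid
print(lo)  # about 1.5925
```

The root of the quartic (about 1.5925) is close to, but not equal to, π/2 ≈ 1.5708, since C is only the fourth-degree Taylor polynomial of cosine.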
11. Uniform continuity (or lack thereof)
Let f (x) = x² + 1/x².
Determine whether f is or is not uniformly continuous on each of the fol-
lowing intervals:
(a) [1, 2]
(b) (0, 1]
(c) [2, ∞)
(d) (1, 2)
12. Uniform continuity
Show that on the open interval (0, π) the function
f (x) = (1 − cos x)/x²
is uniformly continuous by using the “extension” approach.
13. Limits by brute force
(a) Use the epsilon-delta definition of limit to prove that limx→0 √|x| = 0.
(b) Use the sequence definition of limit to show that limx→0 x/|x| does not
exist.
14. Limits that involve roots
Use the sum and product rules for limits to evaluate
limx→1 (x^{1/3} − 1)/(x − 1)
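Writing x = u³ turns the quotient into 1/(u² + u + 1), so the limit is 1/3; a numerical check along a sequence approaching 1 (a Python sketch, mine, not part of the course files) agrees.

```python
def f(x):
    # The quotient (x^(1/3) - 1)/(x - 1), defined for x != 1
    return (x ** (1 / 3) - 1) / (x - 1)

# Approach x = 1 along the sequence x_n = 1 + 10^(-n).
vals = [f(1 + 10 ** (-n)) for n in range(1, 8)]
print(vals[-1])  # approaches 1/3
```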
2 Group Problems
1. Proofs about continuity
For (a) and (b), do two different versions of the proof:
• Use the “no bad sequence definition” and invoke a result for sequences
from week 1.
• Use the epsilon-delta definition and mimic the proof for sequences from
week 1.
(a) Prove that if f and g are real-valued functions that are continuous at
x0 ∈ R, then f g is continuous at x0 . (Hint: on any closed interval [x0 −
a, x0 + b] in the domain of f , the continuous function f is bounded.)
(b) Prove that if f is continuous at x0 ∈ R, and g is continuous at f (x0 ),
then the composite function g ◦ f is continuous at x0 .
(c) • The Heaviside function H(x) is defined by H(x) = 0 for x < 0,
H(x) = 1 for x ≥ 0. Using the “no bad sequence” definition,
prove that H is discontinuous at x = 0.
• Using the epsilon-delta definition of continuity, prove that f (x) =
x³ is continuous for arbitrary x0 . (Hint: first deal with the special
case x0 = 0, then notice that for small enough δ, |x| < 2|x0 |.)
2. Uniform continuity; intermediate-value theorem
3. Calculation of limits (do these in LaTeX to get practice with fractions and
functions)
(c) Limits that involve trig functions; use the sum and product rules for
limits and the fact that limx→0 (sin x)/x = 1.
• Evaluate
limx→0 (cos 2x − 1)/x²
• Evaluate
limx→0 (tan x − sin x)/x³
3 Homework
Special offer – if you do the entire problem set, with one problem omitted, in
LaTeX and hand in a printout of the PDF file, you will receive full credit for the
omitted problem.
1. Ross, exercises 19.2(b) and 19.2(c). Be sure that you prove uniform conti-
nuity, not just continuity!
4. Ross, exercise 20.18. Be sure to indicate where you are using various limit
theorems.
6. Ross, exercises 17-13a and 17-14. These functions will be of interest when
we come to the topic of integration in the spring term.
8. Ross, exercise 18-10. You may use the intermediate-value theorem to prove
the result.
MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #2, Week 4 (Derivatives, Inverse functions, Taylor series)
Authors: Paul Bamberg and Kate Penner (based on their course MATH S-322)
R scripts by Paul Bamberg
Last modified: July 24, 2015 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-8 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.
• Review the derivative rules, and the limit definition of the derivative.
• Read the last paragraph of section 29.8, which begins “We next show how
to ...” Apply the argument to the case f (x) = sin x, I = (−π/2, π/2) to do the
standard derivation of the derivative of the arc sine function. Then be sure
that you understand what else needs to be proved.
• Read the statement of L’Hospital’s rule at the start of section 30.2. Then
look at examples 2 through 5 and identify the values of s and L.
• Read Example 3 on page 257, which describes a function that does not
equal the sum of its Taylor series! Once you are aware of the existence of
such functions, you will appreciate why it is necessary to prove “Taylor’s
theorem with remainder.” Only by showing that the remainder approaches
a limit of zero can you prove that the Taylor series converges to the function.
• Look at example 1 of section 31.4, where the familiar Taylor series for the
exponential function and the sine function are derived. By looking at the
corollary at the start of the section and the theorem that precedes it, figure
out the importance of the statement “the derivatives are bounded.”
• Skim the proof of the binomial theorem in Section 31.7. Notice that it is
not sufficient just to crank out derivatives and get the Taylor series. We
will need to prove that, for any |x| < 1, the series for (1 + x)α converges
to the function, and this requires a different form of the remainder. Look
at Corollary 31.6 and Corollary 31.4 and figure out which relies on the
mean-value theorem and which relies on integration by parts.
lim_{y→y0} (g(y) − g(y0))/(y − y0) = 1/f ′(x0).
• 8.2 Taylor’s Theorem with remainder: Let f be defined on (a, b) with a <
0 < b. Suppose that the nth derivative f (n) exists on (a, b).
Define the remainder
Rn(x) = f (x) − Σ_{k=0}^{n−1} (f^(k)(0)/k!) x^k.
Prove, by repeated use of Rolle’s theorem, that for each x ≠ 0 in (a, b),
there is some y between 0 and x for which
Rn(x) = (f^(n)(y)/n!) x^n.
Additional proofs (may appear on quiz; students will post PDFs or videos)
• 8.3 (Ross, pp.233-234, Rolle’s Theorem and the Mean Value Theorem)
f ′(x) = (f (b) − f (a))/(b − a)
• 8.4 (Ross, pp. 228, The Chain Rule – easy special case) Assume the follow-
ing:
– Function f is differentiable at a.
– Function g is differentiable at f (a).
– There is an open interval J containing a on which f is defined and
f (x) ≠ f (a) (without this restriction, you need the messy Case 2 on
page 229).
– Function g is defined on the open interval I = f (J), which contains
f (a).
Using the sequential definition of a limit, prove that the composite function
g ◦ f is defined on J and differentiable at a and that
(g ◦ f )′(a) = g′(f (a)) f ′(a).
R Scripts
• Script 2.4C-SampleProblems.R
1 Executive Summary
1.1 The Derivative - Definition and Properties
• A function f is differentiable at some point a if the limit
lim_{x→a} (f (x) − f (a))/(x − a)
exists and is finite. It is referred to as f 0 (a). If a function is differentiable
at a point a, then it is continuous at a as well.
• The most memorable derivative rule is the Chain Rule, which states that
if f is differentiable at some point a, and g is differentiable at f (a), then
their composite function g ◦ f is also differentiable at a, and
(g ◦ f )′(a) = g′(f (a)) f ′(a).
1.3 Behavior of differentiable functions
These justify our procedures when we are searching for the critical points of a
given function. They are the main properties we draw on when reasoning about
a function’s behavior.
• If f is defined on an open interval, achieves its maximum or minimum at
some x0 , and is differentiable there, then f 0 (x0 ) = 0.
• Not quite a proof: Since (f ◦ f⁻¹)(y) = y, the chain rule states that
f ′(f⁻¹(y)) (f⁻¹)′(y) = 1 and so, if f ′(f⁻¹(y)) ≠ 0,
(f⁻¹)′(y) = 1/f ′(f⁻¹(y)).
For example,
(arctan)′(y) = 1/(tan)′(arctan y) = 1/sec²(arctan y) = 1/(1 + tan²(arctan y)) = 1/(1 + y²).
1.5 Defining the logarithm and exponential functions
Define the natural logarithm as an antiderivative:
L(y) = ∫₁^y (1/t) dt, and define e so that ∫₁^e (1/t) dt = 1.
From this definition it is easy to prove that L′(y) = 1/y and not hard to prove that
L(xy) = L(x) + L(y).
Now the exponential function can be defined as the inverse function, so that
E(L(y)) = y. From this definition it follows that E(x + y) = E(x)E(y) and
that E ′(x) = E(x).
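The additivity L(xy) = L(x) + L(y) can be watched numerically. Below is a Python sketch (the course uses R scripts; `L` here is my midpoint-rule approximation of the defining integral, not the course's code):

```python
def L(y, n=100000):
    # Midpoint-rule approximation of the integral of 1/t from 1 to y.
    h = (y - 1.0) / n
    return sum(h / (1.0 + (k + 0.5) * h) for k in range(n))

# The additive property of the logarithm, checked numerically:
print(L(2.0) + L(3.0), L(6.0))  # both are approximately ln 6 = 1.7918
```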
1.6 L’Hospital’s Rule (a special case)
Suppose that f and g are differentiable and that
lim_{x→a+} f ′(x)/g′(x) = L;  lim_{x→a+} f (x) = lim_{x→a+} g(x) = 0;  g′(a) < 0.
Then
lim_{x→a+} f (x)/g(x) = L.
• Once you understand the proof in one special case, the proof in all the other
cases is essentially the same.
lim_{x→a} f ′(x)/g′(x) = L,
|f (x)/g(x) − L| < ε.
1.7 Taylor series
• If a function f is defined by a convergent power series, i.e.
f (x) = Σ_{k=0}^∞ a_k x^k for |x| < R,
then
f (x) = Σ_{k=0}^∞ (f^(k)(0)/k!) x^k for |x| < R.
Lecture Outline
2. (Ross, pp. 228, The Chain Rule – easy special case) Assume the following:
• Function f is differentiable at a.
• Function g is differentiable at f (a).
• There is an open interval J containing a on which f is defined and
f (x) ≠ f (a) (without this restriction, you need the messy Case 2 on
page 229).
• Function g is defined on the open interval I = f (J), which contains
f (a).
Using the sequential definition of a limit, prove that the composite function
g ◦ f is defined on J and differentiable at a and that
(g ◦ f )′(a) = g′(f (a)) f ′(a).
5. (Ross, theorem 29.9 on pages 237-238, with the algebra done in reverse
order)
Suppose that f is a one-to-one continuous function on an open interval I (either
strictly increasing or strictly decreasing). Let open interval J = f (I), and
define the inverse function f⁻¹ : J → I for which
(f⁻¹ ◦ f )(x) = x for x ∈ I;  (f ◦ f⁻¹)(y) = y for y ∈ J.
6. (L’Hospital’s Rule; based on Ross, 30.2, but simplified to one special case)
Suppose that f and g are differentiable functions and that
lim_{z→a+} f ′(z)/g′(z) = L;  f (a) = 0, g(a) = 0;  g′(a) > 0.
Choose x > a so that for a < z ≤ x, g(z) > 0 and g 0 (z) > 0.
(You do not have to prove that this can always be done!)
By applying Rolle’s Theorem to h(z) = f (z)g(x) − g(z)f (x),
prove that
lim_{x→a+} f (x)/g(x) = L.
7. (Ross, page 250; version 1 of Taylor’s Theorem with remainder, setting
c = 0)
Let f be defined on (a, b) with a < 0 < b. Suppose that the nth derivative
f (n) exists on (a, b).
Define the remainder
Rn(x) = f (x) − Σ_{k=0}^{n−1} (f^(k)(0)/k!) x^k.
Prove, by repeated use of Rolle’s theorem, that for each x ≠ 0 in (a, b),
there is some y between 0 and x for which
Rn(x) = (f^(n)(y)/n!) x^n.
8. (Ross, pp. 342-343; defining the natural logarithm)
Define
L(y) = ∫₁^y (1/t) dt.
Prove from this definition the following properties of the natural logarithm:
• L′(y) = 1/y for y ∈ (0, ∞).
• L(yz) = L(y) + L(z) for y, z ∈ (0, ∞).
• limy→∞ L(y) = +∞.
9. Calculating derivatives
Let f (x) = ∛x.
10. Using the Mean Value Theorem
11. Using L’Hospital’s rule – tricks of the trade
(a) Evaluate limx→0+ x loge x².
(b) Evaluate
limx→0 (x e^x − sin x)/x²
both by using L’Hospital’s rule and by expansion in a Taylor series.
12. Applying the inverse-function rule
The function g(y) = arctan y 2 , y ≥ 0 is continuous and strictly increasing,
hence invertible.
Calculate its derivative by finding a formula for the inverse function f (x),
which is easy to differentiate, then using the rule for the derivative of an in-
verse function. You can confirm your answer by using the known derivative
of the arctan function.
13. Definition and properties of the exponential function
Denote the function inverse to L by E, i.e. E(L(y)) = y for all y ∈ (0, ∞).
Prove from this definition the following properties of the exponential func-
tion E:
14. Hyperbolic functions, defined by their Taylor series
sinh x = x + x³/3! + x⁵/5! + ··· ;  cosh x = 1 + x²/2! + x⁴/4! + ···
2 Group Problems
1. Proving differentiation rules
• Let f (x) = csc x, so that sin x · f (x) = 1. Use the product rule to
prove that
(csc x)0 = − csc x cot x.
(b) Integer exponents
• Positive: use induction and the product rule to prove that for all
positive integers n
(xn )0 = nxn−1
Hint: start with a base case of n = 1.
• Negative: let f (x) = x−n so that xn f (x) = 1. Use the product
rule to prove that for all positive integers n
(x−n )0 = −nx−n−1 .
2. MVT, L’Hospital, inverse functions
3. Taylor series
S(x) = x − x³/3! + x⁵/5! − ··· ;  C(x) = 1 − x²/2! + x⁴/4! − ···
• Calculate S′(x) and C′(x), and prove that S²(x) + C²(x) = 1.
• Use Taylor’s theorem to prove that
C(a + x) = C(a)C(x) − S(a)S(x).
(b) Using the remainder to prove convergence
Define f (x) = loge (1 + x) for x ∈ (−1, ∞).
Using the remainder formula
Rn(x) = (f^(n)(y)/n!) x^n
prove that
loge 2 = 1 − 1/2 + 1/3 − 1/4 + 1/5 − ··· .
Show that the remainder does not go to zero if you set x = −1.
(c) Derive the Taylor series for the function f (x) = cos x. Prove that the
series converges for all x. Then use an appropriate form of remainder
to prove that it converges to the cosine function.
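For part (c), the key fact is that every derivative of cosine is bounded by 1, so Taylor's theorem bounds the remainder after N terms by |x|^N/N!, which tends to zero. This is easy to watch numerically; the following Python sketch (helper name mine) compares partial sums of the series with the library cosine:

```python
import math

def cos_partial(x, N):
    # Sum of the first N terms of 1 - x^2/2! + x^4/4! - ...
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k) for k in range(N))

x = 3.0
errors = [abs(cos_partial(x, N) - math.cos(x)) for N in (2, 5, 10, 15)]
print(errors)  # decreases rapidly toward 0
```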
3 Homework
Again, if you do the entire assignment in TeX, you may omit one problem and
receive full credit for it.
1. Ross, 28.2
2. Ross, 28.8
3. Ross, 29.12
4. Ross, 29.18
5. Ross, exercises 30-1(d) and 30-2(d). Do each of these two ways: once by
using L’Hospital’s rule, once by replacing each function by the first two or
three terms of its Taylor series.
6. Ross, 30-4. Use the result to convert exercise 30-5(a) into a problem that
involves a limit as y → ∞.
7. One way to define the exponential function is as the sum of its Taylor series:
e^x = 1 + x + x²/2! + x³/3! + ··· .
Using this definition and Taylor’s theorem, prove that ea+x = ea ex .
8. Ross, exercise 31.5. For part (a), just combine the result of example 3
(whose messy proof you need not study) with the chain rule.
MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #3, Week 1
Reading
• Hubbard, Section 1.5. The only topology that is treated is the “open-ball
topology.”
Alas, Hubbard does not mention either finite topology or differential equa-
tions. I have included a set of notes on these topics that I wrote for Math 121.
• Go to the page of the Math 23 Web site called “Finite topology example.”
Roam around the six pages by clicking links, and convince yourself that the
site is represented by the graph and the matrix T on page 3.
• Look at the three axioms for topology on page 4, and decide whether or
not open intervals on the line and open disks in R2 appear to satisfy them.
In each case, invent an infinite intersection of open sets that consists of a
single point, which is a closed set.
Proofs to present in section or to a classmate who has done them.
Proofs:
• 9.1
• 9.2 Starting from the triangle inequality for two vectors, prove the triangle
inequality for n vectors, then prove the “infinite triangle inequality” for Rn
| Σ_{i=1}^∞ ~a_i | ≤ Σ_{i=1}^∞ |~a_i|
i=1 i=1
under the assumption that the infinite series on the right is convergent,
which in turn implies that the infinite series of vectors on the left is con-
vergent.
R Scripts
• Script 3.1A-FiniteTopology.R
Topic 1 - The ”standard” Web site graph, used in notes and examples
Topic 2 - Drawing a random graph to create a different topology on the
same set
• Script 3.1B-SequencesSeriesRn.R
Topic 1 - A convergent sequence of points in R2
Topic 2 - A convergent infinite series of vectors
Topic 3 - A convergent geometric series of matrices
• Script 3.1C-DiffEquations.R
Topic 1 - Two real eigenvalues
Topic 2 - A repeated real eigenvalue
Topic 3 - Complex conjugate eigenvalues
1 Executive Summary
1.1 Axioms of Topology
In topology, we start with a set X and single out some of its subsets as “open
sets.” The only requirement on a topology is that the collection of open sets
satisfies the following rules (axioms):
• The empty set and the set X are both open.
• The union of any finite or infinite collection of open sets is open.
• The intersection of two open sets is open. It follows by induction that the
intersection of n open sets is open, but the intersection of infinitely many
open sets is not necessarily open.
In this model, an “open set” is defined by the property that no page in the
set can be reached by a link from outside the set. We need to show that this
definition is consistent with the axioms for open sets.
– The empty set is open. Since it contains no pages, it contains no page that can
be reached by an outside link.
– The set X of all six pages is open, because there is no other page on the site
from which an outside link could come.
– If sets A and B are open, no page in either can be reached by an outside link,
and so their union is also open.
– If sets A and B are open, so is their intersection A ∩ B. Proof by contraposition:
Suppose that A ∩ B is not open. Then it contains a page that can be reached
by an outside link. If that link comes from A, then B is not open. If that link
comes from B, then A is not open. If that link comes from outside both A and
B, then both A and B are not open.
1.3 Topology in R and Rn
The usual way to introduce a topology for the set R is to decree that any open
interval is an open set and so is the empty set. Equivalently, we can decree that
the set of points for which |x − x0 | < ε, with ε > 0, is an open set. Notice that the
infinite intersection of the open sets (−1/n, 1/n) is the single point 0, a closed
set!
The usual way to introduce a topology for the set Rn is to decree that any
“open ball,” the set of points for which |x − x0 | < ε, with ε > 0, is an open set.
• Closed sets
A closed set A is one whose complement Ac = X − A is open. Careful:
this is different from “one that is not open.” There are lots of sets that are
neither open nor closed, and there are sets that are both open and closed.
• The interior of a set A ⊂ Rn , denoted Å, is “the largest open set that is
contained in A,” i.e. the union of all the open subsets of A.
• The boundary of A, denoted ∂A, is the set of all points x with the property
that any neighborhood of x includes points of A and also includes points
of the complement Ac .
The boundary of A is the difference between the closure of A and its interior.
1.6 Something special about the open ball topology
– |A~b| ≤ |A||~b|
– |AB| ≤ |A||B|
1.8 Calculating the exponential of a matrix
– If D = [[b, 0], [0, c]], then Dt = [[bt, 0], [0, ct]] and
exp(Dt) = [[1, 0], [0, 1]] + [[bt, 0], [0, ct]] + (1/2)[[(bt)², 0], [0, (ct)²]] + ··· = [[e^{bt}, 0], [0, e^{ct}]].
– If there is a basis of eigenvectors for A,
then A = P DP⁻¹, A^r = P D^r P⁻¹, and exp(At) = P exp(Dt)P⁻¹.
– Replace D by a conformal matrix C = aI + bJ where J² = −I, and
exp(Ct) = exp(aIt) exp(bJt) can be expressed in terms of sin t and cos t.
– If A = bI + N and N² = 0, then exp(At) = e^{bt} exp(N t) = e^{bt}(I + N t).
exp At = Σ_{r=0}^∞ A^r t^r / r!.
d/dt exp At = Σ_{r=1}^∞ r A^r t^{r−1} / r!.
Set s = r − 1.
d/dt exp At = Σ_{s=0}^∞ A^{s+1} t^s / s! = A Σ_{s=0}^∞ A^s t^s / s! = A exp At.
So if ~v = exp(At) ~v0, then
~v̇ = (d/dt) exp(At) ~v0 = A exp(At) ~v0 = A~v.
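For a 2 × 2 matrix the defining series can be summed directly. The following sketch is pure Python (the course's own scripts are in R, and the function names here are mine); it checks the diagonal case exp(Dt) = diag(e^{bt}, e^{ct}) from the summary above:

```python
import math

def mat_mul(X, Y):
    # Product of two 2x2 matrices given as nested lists.
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(A, t, terms=30):
    # Sum I + At + (At)^2/2! + ..., truncated after `terms` terms.
    S = [[1.0, 0.0], [0.0, 1.0]]      # running total, starts at I
    P = [[1.0, 0.0], [0.0, 1.0]]      # current term (At)^r / r!
    for r in range(1, terms):
        P = mat_mul(P, [[A[i][j] * t / r for j in range(2)] for i in range(2)])
        S = [[S[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    return S

D = [[1.0, 0.0], [0.0, 2.0]]          # diagonal case with b = 1, c = 2
E = mat_exp(D, 1.0)
print(E[0][0], E[1][1])  # approximately e and e^2
```

The update `P = P · (At)/r` keeps `P` equal to (At)^r/r!, so the loop literally sums the exponential series.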
2 Lecture outline
1. Proof 9.1
2. Convergent sequences in Rn :
3. Proof 9.2
Starting from the triangle inequality for two vectors, prove the triangle
inequality for n vectors, then prove the “infinite triangle inequality” for Rn
∞
X ∞
X
| a~i | ≤ |~
ai |
i=1 i=1
under the assumption that the infinite series on the right is convergent,
which in turn implies that the infinite series of vectors on the left is con-
vergent.
4. Prove that if every element of the convergent sequence (xn ) is in the closed
subset C ⊂ Rn , then the limit x0 of the sequence is also in C.
6. Constructing a finite topology
Axioms for general topology
– The empty set and the set X are both open.
– The union of any finite or infinite collection of open sets is open.
– The intersection of two open sets is open.
Suppose that we start with X = {123456} and choose a “subbasis” consisting
of {123}, {245}, and {456}.
• Find all the other sets that must be open because of the intersection
axiom and the empty-set axiom.
• Find all the other sets that must be open because of the union axiom
and the axiom that set X is open.
• We now have the smallest collection of open sets that satisfies the ax-
ioms and includes the subbasis. A closed set is one whose complement
is open. List all the closed sets.
• What is the smallest legal collection of open sets in the general case?
• What is the largest legal collection of open sets in the general case?
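The smallest topology containing a subbasis can be brute-forced on a six-element set. This Python sketch (my own construction, not one of the course's R scripts) closes the subbasis {123}, {245}, {456} under finite intersections and then arbitrary unions:

```python
from itertools import combinations

X = frozenset("123456")
subbasis = [frozenset("123"), frozenset("245"), frozenset("456")]

# Basis: all finite intersections of subbasis sets (the empty
# intersection is X itself).
basis = {X}
for r in range(1, len(subbasis) + 1):
    for combo in combinations(subbasis, r):
        s = X
        for c in combo:
            s = s & c
        basis.add(s)

# Topology: all unions of basis sets (the empty union is the empty set).
topology = {frozenset()}
changed = True
while changed:
    changed = False
    for a in list(topology):
        for b in basis:
            u = a | b
            if u not in topology:
                topology.add(u)
                changed = True

closed_sets = {X - u for u in topology}
print(sorted("".join(sorted(u)) for u in topology))
```

For this subbasis the computation yields nine open sets, which is what the hand computation in the bullets above should reproduce.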
7. Web site topology. A set of pages is “open” if there are no incoming links
from elsewhere on the site. A set of pages is closed if no outgoing link
leads to a page outside the set (i.e. if the complement is an open set.)
• Of {26}?
• Of {23456}?
8. The “open ball” definition of an open set satisfies the axioms of topology.
A set U ⊂ Rn is open if ∀x ∈ U, ∃r > 0 such that the open ball Br (x) ⊂ U .
9. A geometric series of matrices
The geometric series formula for a square matrix A is
(I − A)−1 = I + A + A2 + ....
Let A = [[0, 1/2], [−1/2, 0]], so that A² = [[−1/4, 0], [0, −1/4]].
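A numerical companion in Python (pure 2 × 2 arithmetic; the names are mine): the partial sums I + A + ··· + A^N settle on the inverse of I − A.

```python
def mul(X, Y):
    # Product of two 2x2 matrices given as nested lists.
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[0.0, 0.5], [-0.5, 0.0]]

# Partial sums of the geometric series I + A + A^2 + ...
S = [[1.0, 0.0], [0.0, 1.0]]
P = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(100):
    P = mul(P, A)
    S = [[S[i][j] + P[i][j] for j in range(2)] for i in range(2)]

# Direct inverse of I - A = [[1, -1/2], [1/2, 1]], determinant 5/4.
inv = [[1 / 1.25, 0.5 / 1.25], [-0.5 / 1.25, 1 / 1.25]]
print(S)  # matches inv: [[0.8, 0.4], [-0.4, 0.8]]
```

Convergence here is fast because A² = −(1/4)I, so the powers of A shrink geometrically.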
10. Calculating and using the exponential of a matrix
The matrix A = [[1, 1], [−2, 4]] has eigenvector (1, 1) with eigenvalue 2 and
eigenvector (1, 2) with eigenvalue 3.
11. Solving a differential equation when there is no eigenbasis.
The system of differential equations
ẋ = 3x − y
ẏ = x + y
can be written ~v̇ = A~v, where A = [[3, −1], [1, 1]].
Our standard technique leads to p(t) = t² − 4t + 4 = (t − 2)², so there is
only one eigenvalue.
Let N = A − 2I = [[1, −1], [1, −1]].
We have found that p(A) = A² − 4A + 4I = (A − 2I)² = 0, so N² = 0.
Since the matrices 2I and N commute, exp(At) = exp(2It) exp(N t).
Show that exp At = e^{2t}(I + N t), and confirm that (exp At)~e1 is a solution
to the differential equation.
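The claimed closed form can be checked numerically before proving it. This Python sketch (names mine, not course code) takes the first column exp(At)~e1 = e^{2t}(~e1 + tN~e1) and verifies, by a centered difference, that it satisfies ~v̇ = A~v:

```python
import math

# Closed-form candidate: exp(At) e1 = e^{2t} (e1 + t * N e1),
# where N = A - 2I = [[1, -1], [1, -1]] sends e1 to (1, 1).
def sol(t):
    return (math.exp(2 * t) * (1 + t), math.exp(2 * t) * t)

A = [[3.0, -1.0], [1.0, 1.0]]
t, h = 0.7, 1e-6
x, y = sol(t)
dx = (sol(t + h)[0] - sol(t - h)[0]) / (2 * h)   # numerical derivative
dy = (sol(t + h)[1] - sol(t - h)[1]) / (2 * h)
print(dx, A[0][0] * x + A[0][1] * y)  # the two sides of v' = Av agree
```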
12. Solving the “harmonic oscillator” differential equation (if time permits)
Applying Newton’s second law of motion to a mass of 1 attached to a spring
with “spring constant” 4 leads to the differential equation
ẍ = −4x.
Solve this equation by using matrices for the case where x(0) = 1, v(0) = 0.
The trick is to consider a vector
~w = (x(t), v(t)), where v = ẋ.
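With ~w = (x, v) the system becomes ~ẇ = A~w with A = [[0, 1], [−4, 0]], and for these initial values the matrix method should produce x(t) = cos 2t. That answer can be confirmed by integrating the first-order system numerically; below is a Python sketch with small Euler steps (the step size and names are my choices):

```python
import math

# w' = A w with A = [[0, 1], [-4, 0]] encodes x' = v, v' = -4x.
x, v = 1.0, 0.0          # release from rest: x(0) = 1, v(0) = 0
dt = 1e-5
for _ in range(100000):  # integrate up to t = 1
    x, v = x + dt * v, v - dt * 4 * x
print(x, math.cos(2.0))  # both about -0.4161
```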
3 Group Problems
1. Topology
(a) We can use the same conventions as for the ferryboat graph of week
1. Column j shows the links going out of page j. If Ti,j = 1, there is
a link from page j to page i. If Ti,j = 0, there is no link from page j
to page i.
T =
[ 0 1 0 0 0 0
  1 0 0 0 0 0
  0 1 0 1 0 0
  0 0 0 0 0 0
  0 0 0 1 0 0
  0 1 0 1 0 0 ]
Draw the Web site graph that this matrix represents.
i. Open sets include {12} and {4}. List all the other open sets and
all the closed sets.
ii. Determine the interior, closure, and boundary of {123}.
iii. Determine to what point or points (if any) the sequence
(1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 4, 6, 4, 6, 4, 6 · · · ) converges.
(b) Recall the axioms of topology, which refer only to open sets:
• The empty set and the set X are both open.
• The union of any collection of open sets is open.
• The intersection of two open sets is open.
A closed set C is defined as a set whose complement C c is open.
You may use the following well-known properties of set complements,
sometimes called “De Morgan’s Laws”:
(A ∪ B)c = Ac ∩ B c , (A ∩ B)c = Ac ∪ B c .
i. Prove directly from the axioms of topology that the union of two
closed sets is closed.
ii. In the Web site topology, a closed set of pages is one that has
no outgoing links to other pages on the site. Prove that in this
model, the union of two closed sets is closed.
iii. Prove that if A and B are closed subsets of R2 (with the topology
specified by open balls), their union is also closed.
(c) Subsets of R
i. Let A = {0} ∪ (1, 2]. Determine Ac , Å, the closure Ā, and ∂A.
ii. What interval is equal to ⋃_{n=2}^∞ [−1 + 1/n, 1 − 1/n]? Is it a problem
that this union of closed sets is not a closed set?
iii. Let Q1 denote the set of rational numbers in the interval (−1, 1).
Determine the closure, interior, and boundary of this set.
2. Convergence in Rn
3. Differential equations
4 Homework
1. Suppose that you want to construct a Web site of six pages numbered 1
through 6, where the open sets of pages, defined as in lecture, include {126},
{124}, and {56}.
(a) Prove that in the Web site model of finite topology, the intersection
of two open sets is open.
(b) What other sets must be open in order for the family of open sets to
satisfy the intersection axiom?
(c) What other sets must be open in order for the family of open sets to
satisfy the union axiom?
(d) List the smallest family of open sets that includes the three given sets
and satisfies all three axioms. (You have already found all these sets!)
(e) Draw a diagram showing how six Web pages can be linked together so
that only the sets in this family are open. This is tricky. First deal with
5 and 6. Then deal with 1 and 2. Then incorporate 4 into the network,
and finally 3. There are many correct answers since, for example, if
page 1 links to page 2 and page 2 links to page 3, then adding a direct
link from page 1 to page 3 does not change the topology.
3. More theorems about limits of sequences
The sequence a~1 , a~2 , ... in Rn converges to ~a.
The sequence b~1 , b~2 , ... in Rn converges to ~b.
(a) Prove that the sequence of lengths |b~1 |, |b~2 |, ... in R is bounded:
∃K such that ∀n, |b~n | < K. Hint: write b~m = b~m − ~b + ~b, then use
the triangle inequality.
(b) Define the sequence of dot products: cn = a~n · b~n .
Prove that c1 , c2 , · · · converges to ~a · ~b.
Hint: Subtract and add ~a · b~n , then use the triangle inequality and the
Cauchy-Schwarz inequality.
4. Let A = [[1/3, 1/3], [1/3, 1/3]]. Find
lim_{n→∞} A^n
and evaluate
(I − A)⁻¹ = I + A + A² + ···
for this choice of A. As was the case for sample problem 4, you can
evaluate the infinite sum on the right by summing a geometric series,
but you should split off the first term and start the geometric series
with the second term.
5. The differential equation ẍ = −3ẋ − 2x describes the motion of an “over-
damped oscillator.” The acceleration ẍ is the result of the sum of a force
proportional to ẋ, supplied by a shock absorber, and a force proportional
to x, supplied by a spring.
(a) Introduce v = ẋ as a new variable, and define the vector ~w = (x, v).
Find a matrix A such that ~ẇ = A~w.
(b) Calculate the matrix exp(At).
(c) Graph x(t) for the following three sets of initial values that specify
position and velocity when t = 0:
Release from rest: ~w0 = (1, 0).
Quick shove: ~w0 = (0, 1).
Push toward the origin: ~w0 = (1, −3).
6. Suppose that S is a matrix of the form S = [[a, b], [b, a]]. Prove that
exp(St) = e^{at} [[cosh(bt), sinh(bt)], [sinh(bt), cosh(bt)]].
Then use this result to solve
ẋ = x + 2y
ẏ = 2x + y
without having to diagonalize the matrix S.
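Before writing the proof, the formula can be spot-checked by summing the exponential series in Python (a sketch with my own helper names) for a = 1, b = 2, t = 0.3:

```python
import math

def mul(X, Y):
    # Product of two 2x2 matrices given as nested lists.
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def series_exp(A, t, terms=50):
    # Truncated series I + At + (At)^2/2! + ...
    S = [[1.0, 0.0], [0.0, 1.0]]
    P = [[1.0, 0.0], [0.0, 1.0]]
    for r in range(1, terms):
        P = mul(P, [[A[i][j] * t / r for j in range(2)] for i in range(2)])
        S = [[S[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    return S

a, b, t = 1.0, 2.0, 0.3
S = [[a, b], [b, a]]
direct = series_exp(S, t)
closed = [[math.exp(a * t) * math.cosh(b * t), math.exp(a * t) * math.sinh(b * t)],
          [math.exp(a * t) * math.sinh(b * t), math.exp(a * t) * math.cosh(b * t)]]
print(direct[0][0], closed[0][0])  # the two computations agree
```

The proof itself mirrors the conformal case: S = aI + bK with K = [[0, 1], [1, 0]] and K² = +I, which produces cosh and sinh instead of cos and sin.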
7. Let B = [[−1, 9], [−1, 5]]. Show that there is only one eigenvalue λ and find an
eigenvector for it. Then show that N = B − λI is nilpotent.
MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #3, Week 2
Reading
You may wish to feature Ötzi the Iceman as the protagonist of your proof.
R Scripts
• Script 3.2A-LimitFunctionR2.R
Topic 1 - Sequences that converge to the origin
Topic 2 - Evaluating functions along these sequences
• Script 3.2B-AffineApproximation.R
Topic 1 - The tangent-line approximation for a single variable
Topic 2 - Displaying a contour plot for a function
Topic 3 - The gradient as a vector field
Topic 4 - Plotting some pathological functions
1 Executive Summary
1.1 Limits in Rn
• To define limx→x0 f (x), we need not require that x0 is in the domain of f . We
require only that x0 is in the closure of the domain of f . This requirement
guarantees that for any δ > 0 we can find an open ball of radius δ around
x0 that includes points in the domain of f . There is no requirement that
all points in that ball be in the domain.
• Limit of a function f from Rn to Rm :
We assume that the domain is a subset X ⊂ Rn .
Definition: Function f : X → Rm has the limit a at x0 :
lim_{x→x0} f (x) = a
1.2 Continuous functions in topology and in Rn
• Function f is continuous at x0 if, for any open set U in the codomain that
contains f (x0 ), the preimage (inverse image) of U , i.e. the set of points x
in the domain for which f (x) ∈ U , is also an open set.
• Here is the definition that lets us extend real analysis to n dimensions.
f : Rn → Rm is continuous at x0 if, for any open “codomain ball” of radius
ε centered on f (x0 ), we can find an open “domain ball” of radius δ centered
on x0 such that if x is in the domain ball, f (x) is in the codomain ball.
• An equivalent condition (your proof 10.1):
f is continuous at x0 if and only if every sequence that converges to x0 is a
good sequence. We will need to prove this for f : Rn → Rm , but the proof
is almost identical to the proof for f : R → R, which we have already done.
• As was the case in R, sums, products, compositions, etc. of continuous
functions are continuous. If you can write a formula for a function of
several variables that does not appear to involve division by zero, the theorems
on pages 98 and 99 will show that it is continuous.
• To show that a function is discontinuous, construct a bad sequence!
1.4 The nested compact set theorem
Xk ⊂ Rn is a decreasing sequence of nonempty compact sets: X1 ⊃ X2 ⊃ · · · .
For example, in R, Xn = [−1/n, 1/n]. In R², we can use nested squares.
The theorem states that ⋂_{k=1}^∞ Xk ≠ ∅.
If Xk = (0, 1/k) (not compact!), the infinite intersection is the empty set.
The proof (Hubbard, Appendix A.3) starts by choosing a point xk from each
set Xk , then invokes the Bolzano-Weierstrass theorem to select a convergent
subsequence yi that converges to a point a that is contained in each of the Xk
and so is also an element of their intersection ⋂_{m=1}^∞ Xm .
The proof (Hubbard, Appendix A.3) uses the nested compact set theorem.
In general topology, where the sets that are considered are not necessarily
subsets of Rn , the statement “every open cover contains a finite subcover” is
used as the definition of “compact set.”
1.7 Directional derivative, Jacobian matrix, gradient
Let ~v be the direction vector of a line through a. Imagine a moving particle
whose position as a function of time t is given by a + t~v on some open interval
that includes t = 0. Then f (a + t~v) is a function of the single variable t. The
derivative of this function with respect to t is the directional derivative.
More generally, we use h instead of t and define the directional derivative as
∇_~v f (a) = lim_{h→0} (f (a + h~v) − f (a))/h.
If the directional derivative is a linear function of ~v, in which case f is said
to be differentiable at a, then the directional derivative can be calculated if we
know its value for each of the standard basis vectors. Since
∇_{~ei} f (a) = lim_{h→0} (f (a + h~ei ) − f (a))/h = Di f (a),
we can collect the partial derivatives into the Jacobian matrix
[Df (a)] = [D1 f (a) · · · Dn f (a)], so that ∇_~v f (a) = [Df (a)]~v.
2 Lecture outline
1. Given that function f : Rk → Rm is continuous at x0 , prove that every
sequence such that xn → x0 is a “good sequence” in the sense that f (xn )
converges to f (x0 ). (This is half of proof 10.1.)
2. Given that function f : Rk → Rm is discontinuous at x0 , show how to
construct a “bad sequence” such that xi → x0 but f (xi ) does not converge
to f (x0 ). (This is the other half of proof 10.1).
3. A fanciful version of proof 10.2: a continuous real-valued function f defined
on a compact subset C ⊂ Rn has a supremum M and there is a point a ∈ C
(a maximum) where f (a) = M .
Ötzi the Iceman, whose mummy is the featured exhibit at the archaeological
museum in Bolzano, Italy, has a goal of camping at the greatest altitude
M on the Tyrol, a compact subset of the earth’s surface on which altitude
is a continuous function f of latitude and longitude.
(a) Assume that there is no supremum M. Then Ötzi can select a sequence
of campsites in C such that
f (x1 ) > 1, f (x2 ) > 2,... f (xn ) > n, · · · . Show how to use Bolzano-
Weierstrass to construct a “bad sequence,” in contradiction to the
assumption that f is continuous.
(b) On night n, Ötzi chooses a campsite whose altitude exceeds M − 1/n.
From this sequence, extract a convergent subsequence, and call its
limit a. Show that f (a) = M , so a is a maximum, and M is not
merely a supremum but a maximum value.
4. Nested compact sets
You have purchased a nice chunk of Carrara marble from which to carve
the term project for your GenEd course on Italian Renaissance sculpture.
On day 1 the marble occupies a compact subset X1 of the space in your
room. You chip away a bit every evening, hoping to reveal the masterpiece
that is hidden in the marble, and you thereby create a decreasing sequence
of nonempty compact sets: X1 ⊃ X2 ⊃ · · · .
Your understanding instructor gives you an infinite extension of time on the
project. Prove that there is a point a that forever remains in the marble,
no matter how much you chip away; i.e. that
    ⋂_{k=1}^{∞} Xk ≠ ∅.
5. Heine-Borel theorem (proved in R2 , but the proof is the same for Rn .)
Suppose that you need security guards to guard a compact subset X ⊂ R2.
Heine-Borel Security, LLC proposes that you should hire an infinite number
of their guards, each of whom will patrol an open subset Ui of R2 . These
guards protect all of X: the union of their patrol zones is an “open cover.”
Prove that you can fire all but a finite number m of the security guards
(not necessarily the first m) and your property will still be protected:
    X ⊂ ⋃_{i=1}^{m} Ui.
Break up the part of the city where your property lies into closed squares,
each 1 kilometer on a side. There will exist a square B0 that needs infinitely
many guards (the “infinite pigeonhole principle”).
Break up this square into 4 closed subsquares: again, at least one will need
infinitely many guards. Choose one subsquare and call it B1 . Continue this
procedure to get a decreasing sequence Bi of nested compact sets, whose
intersection includes a point a.
Now show that any guard whose open patrol zone includes a can replace
all but a finite number of other guards.
6. Cauchy sequences in Rn
7. Using sequences to show that a limit does not exist.
    f(x, y) = (x^2 − y^2)/(x^2 + y^2)
Construct sequences (xn ), all of which converge to the origin, with the
following properties:
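Although the course's scripts are in R, a short Python sketch (purely illustrative) makes the sequence criterion concrete: different sequences converging to the origin give different limiting values, so no single limit can exist there.

```python
# f(x, y) = (x^2 - y^2)/(x^2 + y^2), away from the origin
def f(x, y):
    return (x**2 - y**2) / (x**2 + y**2)

# Three sequences converging to the origin:
along_x = [f(1/n, 0) for n in range(1, 50)]      # values are all 1
along_y = [f(0, 1/n) for n in range(1, 50)]      # values are all -1
diagonal = [f(1/n, 1/n) for n in range(1, 50)]   # values are all 0
```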
8. A challenging bad sequence construction, from Hubbard pp. 96-97.
    f(x, y) = (|y|/x^2) e^{−|y|/x^2}
9. Continuity and discontinuity in R3
(a) Define
    F(x, y, z) = xyz/(x^2 + y^2 + z^2), F(0, 0, 0) = 0.
Prove that F is continuous at the origin.
(b) Define
    g(x, y, z) = (xy + xz + yz)/(x^2 + y^2 + z^2), g(0, 0, 0) = 0.
Prove that g is discontinuous at the origin.
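A numeric sanity check (a Python sketch; the limits themselves are what the proofs must establish): along the diagonal sequence (1/n, 1/n, 1/n) the values of F tend to 0, while the values of g are identically 1, which is why F can be continuous at the origin and g cannot.

```python
def F(x, y, z):
    return x * y * z / (x**2 + y**2 + z**2)

def g(x, y, z):
    return (x*y + x*z + y*z) / (x**2 + y**2 + z**2)

# along the diagonal (1/n, 1/n, 1/n):
F_vals = [F(1/n, 1/n, 1/n) for n in range(1, 100)]  # equals 1/(3n), tends to 0
g_vals = [g(1/n, 1/n, 1/n) for n in range(1, 100)]  # equals 1 for every n
```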
10. Converse of Heine-Borel in R
The converse of Heine-Borel says that if the U.S. government is hiring Heine-
Borel Security to guard a subset X of the road from Mosul to Damascus
and wants to be sure that they do not have to pay an infinite number of
guards, then X has to be closed and bounded.
(a) What happens if Heine-Borel assigns guard k to patrol the open inter-
val (−k, k)?
(b) What happens if Heine-Borel selects a point x0 that is not in X and
assigns guard k to patrol the interval (x0 − 1/k, x0 + 1/k)?
11. Let f(x, y) = √(xy^3).
Evaluate the Jacobian matrix of f at (4, 1) and use it to find the best affine
approximation to f((4, 1) + t(2, 1)) for small t.
By defining g(t) = f((4, 1) + t(2, 1)), you can convert this problem to one
in single-variable calculus. Show that using the tangent-line approximation
near t = 0 leads to exactly the same answer.
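A hedged numeric check (Python, using finite differences rather than the hand computation): the Jacobian of f at (4, 1) works out to [1/4  3], so both the affine approximation and the single-variable tangent line give 2 + 3.5t.

```python
import math

def f(x, y):
    return math.sqrt(x * y**3)

def g(t):                       # the single-variable version of the problem
    return f(4 + 2*t, 1 + t)

def affine(t):                  # 2 + [1/4  3] . (2t, t) = 2 + 3.5 t
    return 2 + 3.5 * t

# check the Jacobian entries by central differences
h = 1e-6
d1 = (f(4 + h, 1) - f(4 - h, 1)) / (2 * h)   # should be ~ 1/4
d2 = (f(4, 1 + h) - f(4, 1 - h)) / (2 * h)   # should be ~ 3
```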
12. A clever application of the gradient vector
The Cauchy-Schwarz inequality says that
    grad f · ~v ≤ |grad f| |~v|, with equality when grad f and ~v are proportional.
If ~v is a unit vector, the maximum value of the directional derivative occurs
when ~v is a multiple of grad f.
Suppose that the temperature T in an open subset of the plane is given by
T(x, y) = 25 + 0.1 x^2 y^3. If you are at x = 1, y = 2, along what direction
should you walk to have temperature increase most rapidly?
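The gradient at (1, 2) is (0.2xy^3, 0.3x^2y^2) = (1.6, 1.2), so the answer is the unit vector (0.8, 0.6). A small Python check (illustrative only; the course would use R):

```python
import math

def T(x, y):
    return 25 + 0.1 * x**2 * y**3

gx, gy = 1.6, 1.2                   # grad T at (1, 2), computed by hand
norm = math.hypot(gx, gy)           # = 2.0
direction = (gx / norm, gy / norm)  # (0.8, 0.6): walk along this unit vector

# confirm the hand-computed gradient with central differences
h = 1e-6
fd_x = (T(1 + h, 2) - T(1 - h, 2)) / (2 * h)
fd_y = (T(1, 2 + h) - T(1, 2 - h)) / (2 * h)
```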
3 Group Problems
1. Theorems related to Bolzano-Weierstrass and Heine-Borel
(a) You are working for Heine-Borel Security and are bidding on a project
to guard the interior of one mile of Pennsylvania Avenue between the
Capitol and the White House, modeled as the open interval I = (0, 1).
Show that you can create a countably infinite set of disjoint open patrol
zones which cover only a subset of I, so that no “finite subcover” will be
possible. Then show that you cannot do the same with an uncountably
infinite set of disjoint open patrol zones. (Hint: each zone includes a
different rational number.)
(b) A school playground is a compact subset C ⊂ R2 . Two aspiring quar-
terbacks are playing catch with a football, and they want to get as far
apart as possible. Show that if D = sup |x − y| over all pairs of points in
C, they can find a pair of points x0 and y0 such that |x0 − y0| = D.
Then invent simple examples to show that this cannot be done if the
playground is unbounded or is not closed.
(c) The converse of the Heine-Borel theorem states that if every open
cover of a set X ⊂ Rn contains a finite subcover, then X must be closed
and bounded.
i. By choosing as the open cover a set of open balls of radius 1, 2, · · · ,
prove that X must be bounded.
ii. To show that X is closed, show that its complement X c must be
open. Hint: choose any x0 ∈ X c and choose an open cover of X
in which the kth set consists of points whose distance from x0 is
greater than 1/k. This open cover of X must have a finite subcover.
If you need a further hint, look on pages 90 and 91 of Chapter 2 of
Ross.
2. Limits and continuity in R2
(a) Define
    f(x, y) = xy^3/(x^2 + y^6), f(0, 0) = 0.
Show that the sequence (1/i, 1/i) is “good” but that (1/i^3, 1/i) is “bad.”
(b) Let
    f(x, y) = xy(x^2 − y^2)/(x^2 + y^2)^2, f(0, 0) = 0.
Invent a “bad sequence” of points (a1, a2, · · · ) that converges to (0, 0)
for which
    lim_{i→∞} f(ai) ≠ 0.
This bad sequence proves that f is discontinuous at (0, 0).
(c) Let
    g(x, y) = xy(x^2 − y^2)/(x^2 + y^2), g(0, 0) = 0.
By introducing polar coordinates, prove that g is continuous at (0, 0).
3. Using partial derivatives to find approximate function values
(a) Let f(x, y) = x^2 y. Evaluate the Jacobian matrix of f at (2, 0.5) and use
it to find the best affine approximation to f(1.98, 0.51) and to f(1.998, 0.501).
Using a calculator or R, find the “remainder” (the difference between
the actual function value and the best affine approximation) in each
case. You should find that the remainder decreases by a factor that is
much greater than 10.
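A worked check (Python in place of R; the numbers below follow from [Jf(2, 0.5)] = [2  4] and f(2, 0.5) = 2): when the increment shrinks by a factor of 10, the remainder shrinks by a factor of roughly 100.

```python
def f(x, y):
    return x**2 * y

def affine(x, y):               # best affine approximation at (2, 0.5)
    return 2 + 2 * (x - 2) + 4 * (y - 0.5)

r1 = f(1.98, 0.51) - affine(1.98, 0.51)      # about -6.0e-4
r2 = f(1.998, 0.501) - affine(1.998, 0.501)  # about -6.0e-6
ratio = abs(r1 / r2)                          # roughly 100, not 10
```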
(b) Let
    f(x, y) = x^2 y/(x^4 + y^2).
f is defined to be 0 at (0, 0). Show that both partial derivatives are
zero at (0, 0) but that the function is not continuous there.
(c) Let f(x, y) = y + log(xy) (natural logarithm) for x, y > 0. Evaluate
the Jacobian matrix of f at (0.5, 2) and use it to find the best affine
approximation (constant plus linear approximation) to f(0.51, 2.02).
4 Homework
1. A rewrite of Ötzi the Iceman, with lots of sign changes.
Joe the Plumber, who became a minor celebrity in the 2008 presidential
campaign, has hit the jackpot. Barack Obama enrolls him in a health plan,
formerly available only to members of Congress, that makes him immortal,
and gives him a special 401(k) that delivers $10K per month of tax-free
income. Joe retires to pursue his lifelong dream of camping at the lowest
spot in Death Valley.
Assume that Death Valley National Park is a closed set and that altitude
f (x) in the Park is a continuous function. Prove that the altitude in Death
Valley has a greatest lower bound (even though that is obvious on geograph-
ical grounds) and that there is a place where that lower bound is achieved,
so that Joe can achieve his goal.
2. You are the mayor of El Dorado. Not all the streets are paved with gold –
only the interval [0,1] on Main Street – but you still have a serious security
problem, and you ask Heine-Borel Security LLC to submit a proposal for
keeping the street safe at night. Knowing that the city coffers are full, they
come up with the following pricey plan for meeting your requirements by
using a countable infinity of guards:
• Guard 0 patrols the interval (− N1 , N1 ), where you may choose any value
greater than 100 for the integer N . She is paid 200 dollars.
• Guard 1 patrols the interval (0.4, 1.2) and is paid 100 dollars.
• Guard 2 patrols the interval (0.2, 0.6) and is paid 90 dollars.
• Guard 3 patrols the interval (0.1, 0.3) and is paid 81 dollars.
• Guard k patrols the interval (0.8/2^k, 2.4/2^k) and is paid 100(0.9)^{k−1} dollars.
(a) Calculate the total cost of hiring this infinite set of guards (sum a
geometric series).
(b) Show that the patrol regions of the guards form an “open cover” of
the interval [0,1].
(c) According to the Heine-Borel theorem, this infinite cover has a finite
subcover. Explain clearly how to construct it. (Hint: look at the proof
of the Heine-Borel theorem)
(d) Suppose that you want to protect only the open interval (0,1), which
is not a compact subset of Main Street. In what very simple way can
Heine-Borel Security modify their proposal so that you are forced to
hire infinitely many guards?
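A Python sketch of parts (a) and (c) (the course would use R; N = 128 is an arbitrary admissible choice greater than 100): the total cost is 200 + 100/(1 − 0.9) = 1200 dollars, and a greedy pass from left to right extracts a finite subcover.

```python
# (a) total cost: 200 for guard 0 plus a geometric series for the rest
cost = 200 + sum(100 * 0.9**(k - 1) for k in range(1, 10000))  # ~ 1200

# (c) greedy finite subcover of [0, 1]
N = 128                                   # any integer > 100 works
zones = {0: (-1/N, 1/N)}
zones.update({k: (0.8 / 2**k, 2.4 / 2**k) for k in range(1, 60)})

covered_to = 1/N                          # guard 0 covers [0, 1/N)
chosen = [0]
while covered_to < 1:
    # among zones containing the frontier, keep the one reaching furthest right
    k, (a, b) = max(((k, z) for k, z in zones.items() if z[0] < covered_to < z[1]),
                    key=lambda item: item[1][1])
    chosen.append(k)
    covered_to = b
```

Only about eight of the infinitely many guards survive the greedy pass.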
3. Prove the Heine-Borel theorem in R2 by contraposition. Assume that you
have been given a countably infinite collection of open sets Ui that cover
a compact set X, and assume that no finite subcollection covers X. Show
(for a contradiction) that you can identify a single U that replaces all but
finitely many of the Ui .
4. Hubbard, Exercise 1.6.6. You might want to work parts (b) and (c) before
attempting part (a). The function f (x) is defined for all of R, which is not
a compact set, so you will have to do some work before applying theorem
1.6.9. Notice that “a maximum” does not have to be unique: a function
could achieve the same maximum value at more than one point.
5. Singular Point, California is a spot in the desert near Death Valley that is
reputed to have been the site of an alien visit to Earth. In response to a
campaign contribution from AVSIG, the Alien Visitation Special Interest
Group, the government has agreed to survey the region around the site.
In the vicinity, the altitude is given by the function
    f(x, y) = 2x^2 y/(x^4 + y^2).
A survey team that traveled through the Point going west to east declares
that the altitude at the Point itself is zero. A survey team that went
south to north would comment only that zero was perhaps a reasonable
interpolation.
(a) Suppose you travel through the Point along the line y = mx, passing
through the point at time t = 0 and moving with a constant velocity
such that x = t: in other words, (x, y) = (t, mt). Find a function
g(m, t) that gives your altitude as a function of time on this journey.
Sketch graphs of g as a function of t for m = 1 and for m = 3. Is what
happens for large m consistent with what happens on the y axis?
(b) Find a sequence of points that converges to (0, 0), for which xn = 1/n
and f(xn, yn) = 1 for every point in the sequence. Do the same for
f(xn, yn) = −1.
(c) Is altitude a continuous function at Singular Point? Explain.
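A numeric illustration of parts (b) and (c) (Python sketch): along the parabolas y = ±x^2 the altitude is constantly ±1, while along the x axis it is 0, so the altitude cannot be continuous at the Point.

```python
def f(x, y):
    return 2 * x**2 * y / (x**4 + y**2)

ridge = [f(1/n, 1/n**2) for n in range(1, 100)]    # along y = x^2: always 1
valley = [f(1/n, -1/n**2) for n in range(1, 100)]  # along y = -x^2: always -1
axis = [f(1/n, 0) for n in range(1, 100)]          # west to east: always 0
```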
6. (a) Hubbard, exercise 1.7.12. This is good practice in approximating a
function by using its derivative and seeing how fast the “remainder”
goes to zero.
(b) Hubbard, exercise 1.7.4. These are all problems in single-variable cal-
culus, but they cannot be solved by using standard differentiation for-
mulas. You have to use the definition of the derivative as a limit.
9. (a) Hubbard, Exercise 1.7.22. This is a slight generalization of a topic that
was presented in lecture. The statement is in terms of derivatives, but
it is equivalent to the version that uses gradients.
(b) An application: suppose that you are skiing on a mountain where
the height above sea level is described by the function f(x, y) = 1 −
0.2x^2 − 0.4y^2 (with the kilometer as the unit of distance, this is not
unreasonable). You are located at the point (x, y) = (1, 1). Find a
unit vector ~v along the direction in which you should head if you want
to head straight down the mountain and two unit vectors ~w1 and ~w2
that specify directions for which your rate of descent is only 3/5 of the
maximum rate.
(c) Prove that in general, the unit vector for which the directional deriva-
tive is greatest is orthogonal to the direction along which the direc-
tional derivative is zero, and use this result to find a unit vector ~u
appropriate for a timid but lazy skier who wants to head neither down
nor up.
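A hedged numeric sketch of part (b) (Python; the 3/5 directions come from the angle whose cosine is 3/5): grad f(1, 1) = (−0.4, −0.8), so the steepest-descent unit vector is (0.4, 0.8)/√0.8.

```python
import math

gx, gy = -0.4, -0.8                 # grad f at (1, 1), computed by hand
gnorm = math.hypot(gx, gy)          # sqrt(0.8)
v = (-gx / gnorm, -gy / gnorm)      # steepest descent; rate of descent = gnorm

# unit vectors w whose rate of descent is (3/5) of the maximum:
# the angle between w and -grad f must have cosine 3/5
base = math.atan2(-gy, -gx)
delta = math.acos(3 / 5)
w1 = (math.cos(base + delta), math.sin(base + delta))
w2 = (math.cos(base - delta), math.sin(base - delta))

rate1 = gx * w1[0] + gy * w1[1]     # = -(3/5) * gnorm
rate2 = gx * w2[0] + gy * w2[1]
```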
MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #3, Week 3
Differentiability, Newton’s method, inverse functions
Reading
• Hubbard, section 2.8 page 233-235 and page 246. (Newton’s method)
• 11.2 Using the mean value theorem, prove that if a function f : R2 → R has
partial derivatives D1 f and D2 f that are continuous at a, it is differentiable
at a and its derivative is the Jacobian matrix [D1 f(a) D2 f(a)].
R Scripts
• Script 3.3A-ComputingDerivatives.R
Topic 1 - Testing for differentiability
Topic 2 - Illustrating the derivative rules
• Script 3.3B-NewtonsMethod.R
Topic 1 - Single variable
Topic 2 - 2 equations, 2 unknowns
Topic 3 - Three equations in three unknowns
• Script 3.3C-InverseFunction.R
Topic 1 - A parametrization function and its inverse
Topic 2 - Visualizing coordinates by means of a contour plot
Topic 3 - An example that is economic, not geometric
1 Executive Summary
1.1 Definition of the derivative
• Converting the derivative to a matrix
The linear function f (h) = mh is represented by the 1 × 1 matrix [m].
When we say that f 0 (a) = m, what we mean is that the function
f (a + h) − f (a) is well approximated, for small h, by the linear function
mh. The error made by using the approximation is a “remainder” r(h) =
f (a + h) − f (a) − mh. If f is differentiable, this remainder approaches 0
faster than h, i.e.
    lim_{h→0} r(h)/h = lim_{h→0} (f(a + h) − f(a) − mh)/h = 0.
This definition leads to the standard rule for calculating the number m,
    m = lim_{h→0} (f(a + h) − f(a))/h.
• Extending this definition to f : Rn → Rm
A linear function L(~h) is represented by an m × n matrix.
When we say that f is differentiable at a, we mean that the function
f (a + ~h) − f (a) is well approximated, for any ~h whose length is small, by a
linear function L, called the derivative [Df (a)].
The error made by using the approximation is a “remainder”
r(~h) = f (a + ~h) − f (a) − [Df (a)](~h).
f is called differentiable if this remainder approaches 0 faster than |~h|, i.e.
    lim_{~h→~0} (1/|~h|) r(~h) = lim_{~h→~0} (1/|~h|) (f(a + ~h) − f(a) − [Df(a)](~h)) = 0.
In that case, [Df (a)] is represented by the Jacobian matrix [Jf (a)].
Proof: Since L exists and is linear, it is sufficient to consider its action on
each standard basis vector. We choose ~h = t~ei so that |~h| = |t|. Knowing
that the limit exists, we can use any sequence that converges to the origin
to evaluate it, and so
    lim_{t→0} (1/t)(f(a + t~ei) − f(a) − tL(~ei)) = 0 and L(~ei) = lim_{t→0} (1/t)(f(a + t~ei) − f(a)).
1.2 Proving differentiability and calculating derivatives
In every case f is a function from U to Rm , where U is an open subset of Rn .
• f is constant: f = c. Then [Df (a)] is the zero linear transformation, since
    lim_{~h→~0} (1/|~h|)(f(a + ~h) − f(a) − [Df(a)]~h) = lim_{~h→~0} (1/|~h|)(c − c − ~0) = ~0.
• f has differentiable components: if f = (f1, . . . , fn)^T, then [Df(a)] is the
matrix whose rows are [Df1(a)], . . . , [Dfn(a)].
1.3 Connection between Jacobian matrix and derivative
• If f : Rn → Rm is defined on an open set U ⊂ Rn, and
    f(x) = f(x1, . . . , xn) = (f1(x), . . . , fm(x)),
the Jacobian matrix [Jf(x)] is made up of all the partial derivatives of f:
    [Jf(a)] = [ D1 f1(a) · · · Dn f1(a) ; . . . ; D1 fm(a) · · · Dn fm(a) ]
[Figure: a rectangle with corners (a1, a2), (a1 + h1, a2), (a1, a2 + h2), and (a1 + h1, a2 + h2).]
1.5 Newton’s method – more than one variable
Example: we are trying to solve a system of n nonlinear equations in n unknowns,
e.g.
x2 ey − sin(y) − 0.3 = 0
tan x + x2 y 2 − 1 = 0.
Ordinary algebra is no help – there is no nonlinear counterpart to row reduction.
U is an open subset of Rn, and we have a differentiable function ~f : U → Rn.
In the example, ~f(x, y) = (x^2 e^y − sin(y) − 0.3, tan x + x^2 y^2 − 1), which is differentiable.
We are trying to solve the equation ~f (x) = ~0.
Suppose we have found a value a0 that is close to the desired x.
Again we use the best affine approximation
    ~f(x) ≈ ~f(a0) + [D~f(a0)](x − a0).
We set out to find a value a1 for which this affine approximation equals zero:
    ~f(a0) + [D~f(a0)](a1 − a0) = ~0
Iterating this procedure is the best-known method for solving systems of nonlinear equa-
tions. Hubbard has a detailed discussion (which you are free to ignore) of how
to use Kantorovich's theorem to assess convergence.
• Inverse functions: if ~f is differentiable with invertible derivative and ~f(x0) = y0,
there is locally an inverse function g with g(y0) = x0.
2 Lecture outline
1. (Proof 11.1)
Let U ⊂ Rn be an open set, and let f and g be functions from U to R.
Prove that if f and g are differentiable at a then so is f g, and that
    [D(f g)(a)] = f(a)[Dg(a)] + g(a)[Df(a)].
3. (Proof 11.2) Using the mean value theorem, prove that if a function f :
R2 → R has partial derivatives D1 f and D2 f that are continuous at a, it is
differentiable at a and its derivative is the Jacobian matrix [D1 f(a) D2 f(a)].
4. Newton’s method
(a) One variable: Function f is differentiable. You are trying to solve the
equation f (x) = 0, and you have found a value a0 , close to the desired
x, for which f (a0 ) is small. Derive the formula a1 = a0 − f (a0 )/f 0 (a0 )
for an improved estimate.
(b) n variables: U is an open subset of Rn , and function ~f (x) : U → Rn is
differentiable. You are trying to solve the equation ~f (x) = ~0,
and you have found a value a0 , close to the desired x, for which ~f (a0 )
is small. Derive the formula
    a1 = a0 − [D~f(a0)]^{−1} ~f(a0).
6. Jacobian matrix for a parametrization function
Here is the function that converts the latitude u and longitude v of a point
on the unit sphere to the Cartesian coordinates of that point.
    f(u, v) = (cos u cos v, cos u sin v, sin u)
Work out the Cartesian coordinates of the point with sin u = 3/5 (37 degrees
North latitude) and sin v = 1 (90 degrees East longitude), and calculate the
Jacobian matrix at that point. Then find the best affine approximation to
the Cartesian coordinates of the nearby point where u is 0.01 radians less
(going south) and v is 0.02 radians greater (going east).
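A numeric check (Python sketch; the Jacobian columns below are the hand-computed partials at this point): with sin u = 3/5 and v = π/2 the point is (0, 4/5, 3/5), and the affine approximation tracks the exact coordinates to within about 10^-4.

```python
import math

def f(u, v):
    return (math.cos(u) * math.cos(v), math.cos(u) * math.sin(v), math.sin(u))

u0 = math.asin(3 / 5)            # cos u0 = 4/5
v0 = math.pi / 2                 # sin v0 = 1
p0 = f(u0, v0)                   # (0, 4/5, 3/5)

# Jacobian columns at (u0, v0):
#   d/du = (-sin u cos v, -sin u sin v, cos u) = (0, -3/5, 4/5)
#   d/dv = (-cos u sin v,  cos u cos v, 0)     = (-4/5, 0, 0)
du, dv = -0.01, 0.02             # going south and east
approx = (p0[0] - 4/5 * dv,
          p0[1] - 3/5 * du,
          p0[2] + 4/5 * du)
exact = f(u0 + du, v0 + dv)
err = max(abs(a - e) for a, e in zip(approx, exact))
```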
7. Derivative of a function of a matrix (Example 1.7.17 in Hubbard):
A matrix is also a vector. When we square an n × n matrix A, the entries of
S(A) = A2 are functions of all the entries of A. If we change A by adding
to it a matrix H of small length, we will make a change in the function
value A2 that is a linear function of H plus a small “remainder.”
We could in principle represent A by a column vector with n^2 components
and the derivative of S by a very large matrix, but it is more efficient to
leave H in matrix form and use matrix multiplication to find the effect of
the derivative on a small increment matrix H. The derivative is still a linear
function, but it is represented by matrix multiplication in a different way.
(a) Using the definition of the derivative, show that the linear function
that we want is DS(H) = AH + HA.
(b) Confirm that DS is a linear function of H
(c) Check that DS(H) is a good approximation to S(A+H)−S(A) for the
following simple case, where the matrices A and H do not commute.
    A = [ 1 1 ; 0 1 ], H = [ 0 h ; k 0 ]
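For this A and H the remainder S(A + H) − S(A) − (AH + HA) is exactly H^2, whose entries are quadratic in h and k. A small Python check (illustrative, with 2 × 2 matrices as nested lists):

```python
def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]

h, k = 1e-4, 2e-4
A = [[1, 1], [0, 1]]
H = [[0, h], [k, 0]]

increment = sub(mul(add(A, H), add(A, H)), mul(A, A))  # S(A+H) - S(A)
DS = add(mul(A, H), mul(H, A))                          # AH + HA
remainder = sub(increment, DS)                          # equals H squared
```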
8. Two easy chain rule examples
9. Chain rule for functions of matrices
In sample problem 2 we showed that the derivative of the squaring function
S(A) = A^2 is DS(H) = AH + HA.
Proposition 1.7.19 (tedious proof on pp. 136-137) establishes the similar
rule that for T(A) = A^{−1}, the derivative is DT(H) = −A^{−1} H A^{−1}.
Now the function U (A) = A−2 can be expressed as the composition U =
S ◦ T.
Find the derivative DU (H) by using the chain rule.
The chain rule says “the derivative of a composition is the composition of
the derivatives,” even in a case like this where composition is not repre-
sented by matrix multiplication.
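Composing DS at A^{−1} with DT at A gives DU(H) = −A^{−2}HA^{−1} − A^{−1}HA^{−2} (this is the chain-rule answer the problem asks for); a numeric spot check in Python with an arbitrary 2 × 2 example:

```python
def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def scale(t, A):
    return [[t * x for x in row] for row in A]

def inv(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1], [1, 3]]
eps = 1e-5
H = scale(eps, [[0.3, -0.1], [0.2, 0.4]])

Ai = inv(A)
Ai2 = mul(Ai, Ai)
AHi = inv(add(A, H))
increment = add(mul(AHi, AHi), scale(-1, Ai2))      # U(A+H) - U(A)
DU = add(scale(-1, mul(mul(Ai2, H), Ai)),
         scale(-1, mul(mul(Ai, H), Ai2)))           # chain-rule answer
err = max(abs(increment[i][j] - DU[i][j]) for i in range(2) for j in range(2))
```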
10. A non-differentiable function
Consider a surface where the height z is given by the function
    f(x, y) = (3x^2 y − y^3)/(x^2 + y^2); f(0, 0) = 0.
This function is not differentiable at the origin, and so you cannot calculate
its directional derivatives there by using the Jacobian matrix!
(a) Along the first standard basis vector, the directional derivative at the
origin is zero. Find two unit vectors along other directions that also
have this property.
(b) Along the second standard basis vector, the directional derivative at
the origin is -1.
Find two unit vectors along other directions that also have this prop-
erty. (This surface is sometimes called a “monkey saddle,” because a
monkey could sit comfortably on it with its two legs and its tail placed
along these three downward-sloping directions.)
(c) Calculate the directional derivative along an arbitrary unit vector
~eθ = (cos θ, sin θ). Using the trig identity sin 3θ = 3 sin θ cos^2 θ − sin^3 θ,
quickly rederive the special cases of parts (a) and (b).
(d) Using the definition of the derivative, give a convincing argument that
this function is not differentiable at the origin.
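A numeric version of part (c) (Python sketch): the slope at the origin along ~eθ comes out to sin 3θ, which vanishes at θ = 0 and equals −1 at θ = π/2.

```python
import math

def f(x, y):
    return (3 * x**2 * y - y**3) / (x**2 + y**2) if (x, y) != (0.0, 0.0) else 0.0

def directional(theta, t=1e-6):
    # slope of the surface at the origin along (cos theta, sin theta)
    return f(t * math.cos(theta), t * math.sin(theta)) / t
```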
11. Newton’s method
We want an approximate solution to the equations
log x + log y = 3
x2 − y = 1
i.e. f(x, y) = (log x + log y − 3, x^2 − y − 1) = (0, 0).
Knowing that log 3 ≈ 1.1, show that x0 = (3, 9) is an approximate solution
to this equation, then use Newton's method to improve the approximation.
Here is a check:
log 2.81 + log 6.87 = 2.98
2.81^2 − 6.87 = 1.02
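A Python sketch of the Newton step (the course script 3.3B does this in R); the Jacobian is [[1/x, 1/y], [2x, −1]], and one step from (3, 9) already lands near the check values.

```python
import math

def F(x, y):
    return (math.log(x) + math.log(y) - 3, x**2 - y - 1)

def newton_step(x, y):
    f1, f2 = F(x, y)
    a, b, c, d = 1/x, 1/y, 2*x, -1.0     # Jacobian entries at (x, y)
    det = a*d - b*c
    dx = (-d*f1 + b*f2) / det            # solve J (dx, dy) = -(f1, f2)
    dy = (c*f1 - a*f2) / det
    return x + dx, y + dy

x1, y1 = newton_step(3.0, 9.0)           # about (2.815, 6.892)
x, y = x1, y1
for _ in range(5):                       # a few more steps converge fully
    x, y = newton_step(x, y)
```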
12. An economic example of the inverse-function theorem:
Your model: Providing x in health benefits and y in educational benefits
leads to happiness H and cost C according to the equation
    (H, C) = f(x, y) = (x + x^{0.5} y, x^{1.5} + y^{0.5}).
Currently, x = 4, y = 9, H = 22, C = 11. Your budget is cut, and you are
told to adjust x and y to reduce C to 10 and H to 19. Find an approximate
solution by using the inverse-function theorem.
We cannot find formulas for the inverse function g(H, C) that would solve
the problem exactly, but we can calculate the derivative of g.
(a) Check that [Df] = [ 1 + y/(2√x)  √x ; (3/2)√x  1/(2√y) ] = [ 13/4  2 ; 3  1/6 ] is invertible.
(b) Use the derivative [Dg] = [ −0.03  0.36 ; 0.55  −0.6 ] to approximate g(19, 10).
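A numeric sketch (Python; [Dg] is obtained by inverting the hand-computed [Df]): the approximation gives g(19, 10) ≈ (3.73, 7.95), and plugging back into f nearly hits the new targets.

```python
import math

def f(x, y):
    return (x + math.sqrt(x) * y, x**1.5 + math.sqrt(y))

a, b, c, d = 13/4, 2.0, 3.0, 1/6         # [Df] at (4, 9)
det = a*d - b*c
Dg = [[d/det, -b/det], [-c/det, a/det]]  # ~ [[-0.03, 0.37], [0.55, -0.60]]

dH, dC = 19 - 22, 10 - 11
x1 = 4 + Dg[0][0]*dH + Dg[0][1]*dC       # ~ 3.73
y1 = 9 + Dg[1][0]*dH + Dg[1][1]*dC       # ~ 7.95
H1, C1 = f(x1, y1)                       # ~ (19.06, 10.01)
```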
3 Group Problems
1. Chain rule
xD1 f + yD2 f = 0.
(c) Chain rule with 2 × 2 matrices
Start with a pair of polar coordinates (r, θ).
Function g converts them to Cartesian coordinates (x, y).
Function f then converts (x, y) to (2xy, x^2 − y^2).
Confirm that [D(f ◦ g)(r, θ)] = [Df(g(r, θ))] ◦ [Dg(r, θ)]
2. Issues of differentiability
(a) Let
    f(x, y) = x^2 y^2/(x^2 + y^2).
f is defined to be 0 at (0, 0). State, in terms of limits, what it means
to say that f is differentiable at (0, 0) and prove that its derivative
[Df(0, 0)] is the zero linear transformation.
(b) Suppose that A is a matrix and S is the cubing function given by the
formula S(A) = A3 . Prove that the derivative of S(A) is
[DS(A)](H) = A2 H + AHA + HA2 .
The proof consists in showing that the length of the “remainder” goes
to zero faster than the length of the matrix H.
(c) A continuous but non-differentiable function
    f(x, y) = x^2 y/(x^2 + y^2), f(0, 0) = 0.
i. Show that both partial derivatives vanish at the origin, so that
the Jacobian matrix at the origin is the zero matrix [0 0], but
that the directional derivative along (1, 1) is not zero. How does
this calculation show that the function is not differentiable at the
origin?
ii. For all points except the origin, the partial derivatives are given
by the formulas
    D1 f(x, y) = 2xy^3/(x^2 + y^2)^2, D2 f(x, y) = (x^4 − x^2 y^2)/(x^2 + y^2)^2
Construct a “bad sequence” of points approaching the origin to
show that D1 f is discontinuous at the origin.
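A numeric illustration of part ii (Python sketch): along the sequence (1/n, 1/n) the formula for D1 f gives 1/2 at every point, although D1 f(0, 0) = 0.

```python
def D1f(x, y):
    return 2 * x * y**3 / (x**2 + y**2)**2

bad = [D1f(1/n, 1/n) for n in range(1, 100)]   # always 1/2, never near 0
```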
3. Inverse functions and Newton’s method
(to be done in R, by modifying R script 3.3B)
(a) An approximate solution to the system of equations
    x^3 + y^2 − xy = 1.08
    x^2 y + y^2 = 2.04
is x0 = 1, y0 = 1.
Use one step of Newton’s method to improve this approximation.
(b) You are in charge of building the parking lots for a new airport. You
have ordered from amazon.com enough asphalt to pave 1 square kilo-
meter, plus 5.6 kilometers of chain-link fencing. Your plan is to build
two square, fenced lots. The short-term lot is a square of side x=0.6
kilometers; the long-term lot is a square of side y=0.8 kilometers. The
amount of asphalt A and the amount C of chain-link fencing required
are then specified by the function
    (A, C) = F(x, y) = (x^2 + y^2, 4x + 4y).
Alas, Amazon makes a small shipping error. They deliver enough
asphalt to pave 1.03 square kilometers but only 5.4 kilometers of fence.
i. Use the inverse-function theorem to find approximate new values
for x and y that use exactly what was shipped to you.
In this simple case you can check your answer by solving alge-
braically for x and y.
ii. Find a case where A = 1 but the value of C is such that this
approach will fail because [DF ] is not onto. (This case corresponds
to the maximum amount of fencing.)
(c) Saving Delos
The ancient citizens of Delos, threatened with a plague, consulted the
oracle of Delphi, who told them to construct a new cubical altar to
Apollo whose volume was double the size of the original cubical altar.
(For details, look up “Doubling the cube” on Wikipedia.)
If the side of the original altar was 1, the side of the new altar had to
be the real solution to f (x) = x3 − 2 = 0.
Numerous solutions to this problem have been invented. One uses a
“marked ruler” or “neusis”; another uses origami.
Your job is to use multiple iterations of Newton’s method to find an
approximate solution for which x3 − 2 is less than 10−8 in magnitude.
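A minimal Python sketch of the iteration (script 3.3B does this in R): starting from x = 1, Newton's method for x^3 − 2 = 0 reaches the required accuracy in a handful of steps.

```python
x = 1.0
iterations = 0
while abs(x**3 - 2) >= 1e-8:
    x = x - (x**3 - 2) / (3 * x**2)   # Newton step for f(x) = x^3 - 2
    iterations += 1
```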
4 Homework
1. (similar to group problem 1a)
We know the derivatives of the matrix-squaring function S and the matrix-
inversion function T :
If S(A) = A2 then [DS(A)](H) = AH + HA.
If T (A) = A−1 then [DT (A)](H) = −A−1 HA−1 .
(a) Use the chain rule to find a formula for the derivative of the function
U(A) = A^4.
(b) Use the chain rule to find a formula for the derivative of the function
W(A) = A^{−4}.
3. Hubbard, Exercise 1.8.6, part (b) only. In the case where f and g are
functions of time t, this formula finds frequent use in physics. You can
either do the proof as suggested in part (a) or model your proof on the one
for the dot product on page 143.
5. (similar to group problem 2c)
As a summer intern, you are given the job of reconciling the Democratic and
Republican proposals for tax reform. Both parties agree on the following
model:
    f(x, y) = x(x^2 − y^2)/(x^2 + y^2), f(0, 0) = 0.
The Republican proposal is y = −x, while the Democratic proposal is
y = x.
6. Chain rule: an example with 2 × 2 matrices
A similar example with a 3 × 3 matrix is on page 151 of Hubbard.
The function
    f(x, y) = ((x + y)/2, √(xy))
was invented by Gauss about 200 years ago to deal with integrals of the form
    ∫_{−∞}^{∞} dt/√((t^2 + x^2)(t^2 + y^2)).
It was revived in the late 20th century as the basis of the AGM (arithmetic-
geometric mean) method for calculating π. You can get 1 million digits with
a dozen or so iterations.
The function is meant to be composed with itself; so it will be appropriate
to compute the derivative of f ◦ f by the chain rule.
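A quick Python sketch of the self-composition (just the AGM iteration, not the full π algorithm): the two components merge to a common limit in only a few steps, which is the rapid convergence that makes the method so effective.

```python
import math

def f(x, y):
    return ((x + y) / 2, math.sqrt(x * y))

a, b = 24.0, 6.0
steps = 0
while abs(a - b) > 1e-13:
    a, b = f(a, b)        # compose f with itself
    steps += 1
# a and b now agree to ~14 digits; the common value is AGM(24, 6)
```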
8. (Related to group problem 3b, but involves extra iterations)
The CEO of a chain of retail stores will get a big bonus if she hits her volume
and profit targets for December exactly. Her microeconomics consultant,
fresh out of Harvard, tells her that both her target figures are functions
of two variables, investment x in Internet advertising and investment y in
television advertising. The former attracts savvier customers and so tends
to contribute to volume more than to profit.
The function that determines volume V and profit P is
    (V, P) = (x^{3/4} y^{1/3} + x, x^{1/4} y^{2/3} + y).
With x = 16, y = 8, V = 32, P = 16, our CEO figures she is set for a
big bonus. Suddenly, the board of directors, feeling that Wall Street is
looking as much for profit as for volume this year, changes her targets to
V = 24, P = 24. She needs to modify x and y to meet these new targets.
(a) Near V = 32, P = 16, there is an inverse function such that
(x, y) = g(V, P). Find its derivative [Dg], and use the derivative to find
values of x and y that are an approximate solution to the problem.
Because the increments to V and P are large, you should not expect
the approximate solution to be very good, but it will be better than
doing nothing.
(b) Use multiple iterations of Newton’s method in R to find accurate values
of x and y that meet the revised targets. Feel free to modify Script
3.3C.
9. (a) Hubbard, problem 2.10.2. Make a sketch to show how this mapping
defines an alternative coordinate system for the plane, in which a point
is defined by the intersection of two hyperbolas.
(b) The point x = 3, y = 2 is specified in this new coordinate system
by the coordinates u = 6, v = 5. Use the derivative of the inverse
function to find approximate values of x and y for a nearby point
where u = 6.5, v = 4.5. (This is essentially one iteration of Newton’s
method.)
(c) Find h such that the point u = 6 + h, v = 5.1 has nearly the same
x-coordinate as u = 6, v = 5.
(d) Find k such that the point x = 3 + k, y = 2.1 has nearly the same
u-coordinate as x = 3, y = 2.
(e) For this mapping, you can actually find a formula for the inverse func-
tion that works in the region of the plane where x, y, u, and v are all
positive. Find the rather messy formulas for x and y as functions of
u and v, and use them to answer the earlier questions. Once you cal-
culate the Jacobian matrix and plug in appropriate numerical values,
you will be back on familiar ground.
I could get Mathematica Solve[] to find the inverse function only after
I eliminated y by hand. At this point the quadratic formula does the
job anyway!
MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #3, Week 4
Implicit functions, manifolds, tangent spaces, critical points
Reading
R Scripts
• Script 3.4A-ImplicitFunction.R
Topic 1 - Three variables, one constraint
Topic 2 - Three variables, two constraints
• Script 3.4B-Manifolds2D.R
Topic 1 - A one-dimensional submanifold of R2 – the unit circle
Topic 2 - Interesting examples from the textbook
Topic 3 - Parametrized curves in R2
Topic 4 - A two-dimensional manifold in R2
Topic 5 - A zero-dimensional manifold in R2
• Script 3.4C-Manifolds3D.R
Topic 1 - A manifold as a function graph
Topic 2 - Graphing a parametrized manifold
Topic 3 - Graphing a manifold that is specified as a locus
• Script 3.4D-CriticalPoints
Topic 1 - Behavior near a maximum or minimum
Topic 2 - Behavior near a saddle point
• Script 3.5A-LagrangeMultiplier.R
Topic 1 - Constrained critical points in R2
1 Executive Summary
1.1 Implicit functions – review of the linear case.
We have n unknowns, n − k equations, e.g. for n = 3, k = 1:
2x + 3y − z = 0, 4x − 2y + 3z = 0
Create an (n − k) × n matrix: T = [ 2 3 −1 ; 4 −2 3 ]
If the matrix T is not onto, its rows (the equations) are linearly dependent.
Otherwise, when we row reduce, we will find n − k = 2 pivotal columns and
k = 1 nonpivotal column. We assign values arbitrarily to the “active” variables
that correspond to the nonpivotal columns, and then the values of the “passive”
variables that correspond to the pivotal columns are determined.
Suppose that we reorder the unknowns so that the “active” variables come last.
Then, after we row reduce the matrix, the first n − k columns will be pivotal. So
the first n − k columns will be linearly independent, and they form an invertible
square matrix. The matrix is now of the form T = [A|B], where A is invertible.
The solution vector is of the form ~v = (~x, ~y), where the passive variables ~x come
first, and the active variables ~y come second.
A solution to T ~v = ~0 is obtained by choosing ~y arbitrarily and setting
~x = −A−1 B~y. Our system of equations determines ~x “implicitly” in terms of ~y.
1.3 Curves, Surfaces, Graphs, and Manifolds
Manifolds are a generalization of smooth curves and surfaces.
The simplest sort of manifold is a flat one, described by linear equations. An
example is the line of slope 2 that passes through the point x = 0, y = −2: a
one-dimensional submanifold of R2
There are three equivalent ways to describe such a manifold.
• (The definition) As the graph of a function that expresses the passive vari-
ables in terms of the active variables: either y = f (x) = −2 + 2x or
x = g(y) = 12 (y + 2).
• As a “locus” defined by a constraint equation F(x, y) = 2x − y − 2 = 0.
• By a parametrization function g(t) = (1, 0) + t(1, 2).
Definition: A subset M ⊂ Rn is a smooth manifold if locally it is the graph
of a C 1 function (the partial derivatives are continuous). “Locally” means that
for any point x ∈ M we can find a neighborhood U of x such that within M ∩ U ,
there is a C 1 function that expresses n − k passive variables in terms of the other
k active variables. The number k is the dimension of the manifold. In R3 there
are four possibilities:
• k = 3. Any open subset M ⊂ R3 is a smooth 3-dimensional manifold. In
this case k = 3, and the manifold is the graph of a function
f : R3 → {~0}, whose codomain is the trivial vector space {~0} that contains
just a single point. Such a function is necessarily constant, and its derivative
is zero.
• k = 2. The graph of z = f(x, y) = x² + y² is a paraboloid.
• k = 1. The graph of the function (x, y) = ~f(z) = (cos 2πz, sin 2πz) is a helix.
• k = 0. In this case the manifold consists of one or more isolated points.
Near any of these points x0 , it is the graph of a function ~f : {~0} → R3
whose domain is a zero-dimensional vector space and whose image is the
point x0 ∈ R3. This function is differentiable because, since its domain
contains only one point (the zero vector), you cannot find nearby points to
show that it is not differentiable.
There is no requirement that a manifold be the graph of a single function, or
that the “active” variables be the same at every point on the manifold. The unit
circle, the locus of x² + y² − 1 = 0, is the union of four function graphs, two of
which have x as the active variable, two of which have y. By using a parameter t
that is not one of the variables, we can represent it by the parametrization
(x, y) = g(t) = (cos t, sin t).
1.4 Using the implicit function theorem
Start with an open subset U ⊂ Rn and a C 1 function F : U → Rn−k . Consider
the “locus,” M ∩ U , the set of solutions of the equation F(z) = 0.
If [DF(z)] is onto (surjective) for every z ∈ M ∩ U , then M ∩ U is a smooth
k-dimensional manifold embedded in Rn .
Proof: the implicit function theorem says precisely this. The statement that
[DF(z)] is onto guarantees the differentiability of the implicitly defined function.
If [DF(z)] does not exist or fails to be onto, perhaps even just at a single point,
the locus may fail to be a manifold. We use the notation M ∩ U because F may define
just part of a larger manifold M that cannot be described as the locus of a single
function. To say that M itself is a manifold, we have to find an appropriate U
and F for every point z in the manifold.
1.7 Critical points
Suppose that the function f : Rn → R is differentiable at the point x0 and that
the derivative [Df(x0)] is not zero. Then there exists a vector ~v for which the
directional derivative is not zero, the function g(t) = f(x0 + t~v) − f(x0) has a
nonzero derivative at t = 0, and, even if we just consider points that lie on a line
through x0 with direction vector ~v, the function f cannot have a maximum or
minimum at x0. So in searching for a maximum or minimum of f at points where
it is differentiable, we need to consider only “critical points” where [Df(x0)] = 0.
A critical point is not necessarily a maximum or minimum, but for f : Rn → R
there is a useful test that generalizes the second-derivative test of single-variable
calculus. The proof relies on sections 3.3-3.5 of Hubbard, which we are skipping.
Form the “Hessian matrix” of second partial derivatives (Hubbard, p. 348),
evaluated at the critical point x of interest.
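A minimal sketch of the test, in Python with a sample function of my own (f(x, y) = x² + 3y² − 2xy, which has a minimum at the origin): build the Hessian by finite differences and classify the critical point by the signs of the eigenvalues.

```python
# Second-derivative test sketch: finite-difference Hessian of a sample
# function, classified by the signs of the eigenvalues of [[fxx, fxy], [fxy, fyy]].
import math

def f(x, y):
    return x * x + 3 * y * y - 2 * x * y   # Hessian [[2, -2], [-2, 6]]

def hessian(f, x, y, h=1e-4):
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

def classify(fxx, fxy, fyy):
    # eigenvalues from the characteristic equation of the symmetric 2x2 matrix
    tr, det = fxx + fyy, fxx * fyy - fxy * fxy
    disc = math.sqrt(tr * tr - 4 * det)
    lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2
    if lam1 > 0 and lam2 > 0:
        return "minimum"
    if lam1 < 0 and lam2 < 0:
        return "maximum"
    return "saddle"

kind = classify(*hessian(f, 0.0, 0.0))
```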
1.9 Constrained critical points - three approaches
We have proved the following:
If M ⊂ Rn is a k-dimensional manifold, and c ∈ M ∩ U is a local extremum of f
restricted to M , then Tc M ⊂ ker[Df (c)].
Corresponding to each of the three ways that we can “know” the manifold
M, there is a technique for finding the critical points of f restricted to M .
• Manifold as a graph
Near the critical point, the passive variables x are a function g(y) of the
active variables y. Define the graph-making function g̃(y) = (g(y), y).
Now f(g̃(y)) specifies values of f only at points on the manifold. Just search
for unconstrained critical points of this function by setting [D(f ◦ g̃)(y)] = 0.
This approach works well if you can represent the entire manifold as a single
function graph.
• Parametrized manifold
Points on the manifold are specified by a parametrization γ(u).
Now f (γ(u)) specifies values of f only at points on the manifold. Just search
for unconstrained critical points of this function by setting [D(f ◦ γ)(u)] = 0.
This approach works well if you can parametrize the entire manifold.
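To illustrate the parametrization approach with an example of my own: restrict f(x, y) = x + y to the unit circle γ(t) = (cos t, sin t). Critical points of f ◦ γ occur where the t-derivative vanishes, at t = π/4 (maximum) and t = 5π/4 (minimum); the Python sketch below locates them numerically.

```python
# Parametrization approach on a sample problem: f(x, y) = x + y restricted
# to the unit circle gamma(t) = (cos t, sin t).  Find zeros of d/dt f(gamma(t)).
import math

def f_on_circle(t):
    return math.cos(t) + math.sin(t)

def dfdt(t, h=1e-6):
    # central-difference derivative of f . gamma
    return (f_on_circle(t + h) - f_on_circle(t - h)) / (2 * h)

# scan [0, 2 pi) for sign changes of the derivative, then bisect;
# 997 subintervals so no grid point lands exactly on a root
crit = []
n = 997
for i in range(n):
    a, b = 2 * math.pi * i / n, 2 * math.pi * (i + 1) / n
    if dfdt(a) * dfdt(b) < 0:
        for _ in range(60):              # bisection refinement
            m = (a + b) / 2
            if dfdt(a) * dfdt(m) <= 0:
                b = m
            else:
                a = m
        crit.append((a + b) / 2)
```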
1.10 Equality of crossed partial derivatives
Let U ⊂ Rn be open. Suppose that f : U → R is differentiable at a and has the
property that each of its partial derivatives Di f is also differentiable at a. Then
Dj (Di f )(a) = Di (Dj f )(a).
The proof consists in using the mean value theorem to show that
Dj(Di f)(a) = Di(Dj f)(a) = lim_{t→0} (1/t²)(f(a + t~ei + t~ej) − f(a + t~ei) − f(a + t~ej) + f(a)).
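The difference quotient in this limit is easy to test numerically; the sample function below is my own choice.

```python
# Symmetric second-difference quotient for the mixed partial, compared with
# the analytic mixed partial of f(x, y) = x^2 y + sin(x y) at a = (1, 2).
import math

def f(x, y):
    return x * x * y + math.sin(x * y)

def crossed(f, x, y, t=1e-5):
    # (f(a + t e1 + t e2) - f(a + t e1) - f(a + t e2) + f(a)) / t^2
    return (f(x + t, y + t) - f(x + t, y) - f(x, y + t) + f(x, y)) / t**2

# analytic mixed partial: D2(D1 f) = 2x + cos(xy) - xy sin(xy)
x, y = 1.0, 2.0
analytic = 2 * x + math.cos(x * y) - x * y * math.sin(x * y)
numeric = crossed(f, x, y)
```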
2 Proofs
1. Let W be an open subset of Rn , and let F : W → Rn−k be a C 1 mapping
such that F(c) = 0. Assume that [DF(c)] is onto.
Prove that the n variables can be ordered so that the first n − k columns
of [DF(c)] are linearly independent, and that [DF(c)] = [A|B] where A is
an invertible (n − k) × (n − k) matrix.
Set c = (a, b), where a are the n − k passive variables and b are the k active
variables.
Let g be the “implicit function” from a neighborhood of b to a neighborhood
of a such that g(b) = a and F(g(y), y) = 0.
Prove that [Dg(b)] = −A−1 B.
2. (Proof 12.1 - Hubbard Theorem 3.2.4)
Suppose that U ⊂ Rn is an open subset, F : U → Rn−k is a C 1 mapping,
and manifold M can be described as the set of points that satisfy F(z) = 0.
Use the implicit function theorem to show that if [DF(c)] is onto for c ∈ M ,
then the tangent space Tc M is the kernel of [DF(c)]. You may assume that
the variables have been numbered so that when you row-reduce [DF(c)],
the first n − k columns are pivotal.
3. (Hubbard, Proposition 3.2.7) Let U ⊂ Rk be open, and let γ : U → Rn be
a parametrization of manifold M . Show that
Tγ(u) M = img[Dγ(u)].
You may take it as proved that if subspaces V and W both have dimension
k and V ⊂ W, then V = W (for the simple reason that k basis vectors for
V are k independent vectors in W and therefore also form a basis for W ).
4. (Proof 12.2 – Hubbard, theorems 3.6.3 and 3.7.1)
Let U ⊂ Rn be an open subset and let f : U → R be a C 1 (continuously
differentiable) function.
First prove, using a familiar theorem from single-variable calculus, that if
x0 ∈ U is an extremum, then [Df (x0 )] = [0].
Then prove that if M ⊂ Rn is a k-dimensional manifold, and c ∈ M ∩ U is
a local extremum of f restricted to M , then Tc M ⊂ ker[Df (c)].
3 Sample Problems
1. A cometary-exploration robot is fortunate enough to land on an ellipsoidal
comet whose surface is described by the equation
x² + y²/4 + z²/9 = 9.
Its landing point is x = 2, y = 4, z = 3.
2. The plane x + 2y − 3z + 4 = 0 and the cone x² + y² − z² = 0 intersect in a
curve that includes the point c = (3, 4, 5). Near that point this curve is the
graph of a function (x, y) = g(z).
Use the implicit function theorem to determine g′(5), then find the approx-
imate coordinates of a point on the curve with z = 5.01.
Check: 2.99 + 2(4.02) − 3(5.01) = −4; 2.99² + 4.02² = 25.1005 ≈ 5.01².
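A sketch of the computation (in Python; the course would use R). With F1 = x + 2y − 3z + 4 and F2 = x² + y² − z², the Jacobian at c is [1 2 −3; 6 8 −10]; A is its left 2 × 2 block, B the last column, and g′(5) = −A⁻¹B:

```python
# Implicit-derivative sketch for the plane/cone intersection at c = (3, 4, 5).

def solve2(a, b, c, d, p, q):
    """Cramer's rule for [[a, b], [c, d]] (u, v) = (p, q)."""
    det = a * d - b * c
    return ((p * d - b * q) / det, (a * q - p * c) / det)

# g'(5) = -A^{-1} B: solve A w = -B with A = [[1, 2], [6, 8]], B = (-3, -10)
dx, dy = solve2(1, 2, 6, 8, 3, 10)

# first-order estimate of the point on the curve with z = 5.01
z = 5.01
x, y = 3 + dx * 0.01, 4 + dy * 0.01
plane_residual = x + 2 * y - 3 * z + 4   # exact to first order
cone_residual = x * x + y * y - z * z    # small, of second order
```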
3. Assume that, at the top level, there are nine categories x1 , x2 , · · · , x9 in the
Federal budget. They must satisfy four constraints:
(a) A function g that specifies the passive variables in terms of the active
variables.
(b) The function F that specifies the constraints.
(c) A parametrization function γ that generates a valid budget from a set
of parameters.
For each alternative, specify the shape of the matrix that represents the
derivative of the relevant function and explain how, given a valid budget c,
it could be used to find a basis for the tangent space Tc M.
4. (Hubbard, exercise 3.1.17) Consider the situation described by Example
3.1.8 in Hubbard, where four linked rigid rods form a quadrilateral in the
plane. The distance from vertex x1 to x2 is l1 , the distance from vertex x2
to x3 is l2 , the distance from vertex x3 to x4 is l3 , and the distance from
vertex x4 to x1 is l4 .
Show that knowing the positions x1 and x3 of two opposite vertices deter-
mines exactly four possible positions of the linkage if the distance from x1
to x3 is less than both l1 + l2 and l3 + l4 but greater than both |l1 − l2 | and
|l3 − l4|. Draw diagrams to illustrate what can happen if these conditions
are not satisfied.
5. Critical points
f(x, y) = (1/2)x² + (1/3)y³ − xy
Calculate the partial derivatives as functions of x and y, and show that the
only critical points are (0, 0) and (1, 1).
Find the eigenvalues of H0 and classify the critical point at (0, 0).
Find the eigenvalues of H1 and classify the critical point at (1, 1).
4 Group Problems
1. Implicitly defined functions
(a) The nonlinear equation F(x, y, z) = (x² + y² + z² − 3, x² + z² − 2) = 0
implicitly determines x and y as a function of z. The first equation
describes a sphere of radius √3, the second describes a cylinder of
radius √2 whose axis is the y-axis. The intersection includes a circle
in the plane y = 1.
Near the point x = 1, y = 1, z = 1, there is a function that expresses
the two passive variables x and y in terms of the active variable z:
g(z) = (√(2 − z²), 1).
Calculate g′(z) and determine the numerical value of g′(1).
Then get the same answer without using the function g by forming
the Jacobian matrix [DF], evaluating it at x = y = z = 1, and using
the implicit function theorem to determine g′(1) = −A⁻¹B.
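Both routes in part (a) can be checked with a few lines of Python (stdlib only):

```python
# Two computations of g'(1) for g(z) = (sqrt(2 - z^2), 1): the analytic
# derivative, and -A^{-1} B built from [DF] at (1, 1, 1).
import math

# analytic: g1(z) = sqrt(2 - z^2), so g1'(z) = -z / sqrt(2 - z^2); g2' = 0
z = 1.0
analytic = (-z / math.sqrt(2 - z * z), 0.0)

# [DF] at (1, 1, 1) for F = (x^2 + y^2 + z^2 - 3, x^2 + z^2 - 2):
# rows [2x, 2y, 2z] and [2x, 0, 2z] -> [[2, 2, 2], [2, 0, 2]];
# A = [[2, 2], [2, 0]] (columns x, y), B = (2, 2) (column z)
a11, a12, a21, a22 = 2.0, 2.0, 2.0, 0.0
b1, b2 = 2.0, 2.0
det = a11 * a22 - a12 * a21
# solve A w = -B by Cramer's rule
w1 = (-b1 * a22 - a12 * -b2) / det
w2 = (a11 * -b2 - -b1 * a21) / det
implicit = (w1, w2)
```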
(b) Dean Smith is working on a budget in which he will allocate x to the
library, y to pay raises, and z to the Houses. He is constrained.
The Library Committee, happy to see anyone get more funds as long
as the library does even better, insists that x2 − y 2 − z 2 = 1.
The Faculty Council, content to see the Houses do well as long as other
areas benefit equally, recommends that x + y − 2z = 1.
To comply with these constraints, the dean tries x = 3, y = 2, z = 2.
Given the constraints, x and y are determined by an implicitly defined
function (x, y) = g(z).
Use the implicit function theorem to calculate g′(2), and use it to find
approximate values of x and y if z is increased to 2.1.
(c) The nonlinear equation F(x, y, z) = x² − 4z² − 4y² − 1 = 0 implicitly
determines x as a function of y and z, but we need to know whether x
is positive or negative to choose the right square root in the function.
Find the appropriate function g(y, z) near the point
x = 3, y = 1, z = 1, and calculate [Dg(1, 1)].
Then get the same answer by calculating the Jacobian matrix [DF]
at x = 3, y = 1, z = 1, splitting off a square matrix A on the left, and
computing [Dg] = −A⁻¹B.
2. Manifolds and tangent spaces, investigated with help from R
(a) Manifold M is known by the equation F(x, y, z) = xz − y² = 0 near
the point c = (4, 2, 1).
It can also be described parametrically by γ(s, t) = (s², st², t⁴) near
s = 2, t = 1.
i. Use the parametrization to find a basis for the tangent space Tc M.
ii. Use the function F to confirm that your basis vectors are indeed
in the tangent space Tc M.
iii. Use the parametrization to do a wireframe plot of the parametrized
manifold near s = 2, t = 1. See script 3.4C, topic 2.
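Parts i and ii can be checked numerically. The sketch below (Python, stdlib only) assumes the formulas F(x, y, z) = xz − y² and γ(s, t) = (s², st², t⁴) as read from part (a): the columns of [Dγ(2, 1)] span the tangent plane, and each should lie in the kernel of [DF(c)].

```python
# Tangent-space check: finite-difference columns of [D gamma(2, 1)] dotted
# against the gradient of F = xz - y^2 at c = gamma(2, 1) = (4, 2, 1).

def gamma(s, t):
    return (s * s, s * t * t, t ** 4)

def dgamma_columns(s, t, h=1e-6):
    """Partial-derivative columns of gamma by central differences."""
    cols = []
    for ds, dt in [(h, 0), (0, h)]:
        p = gamma(s + ds, t + dt)
        m = gamma(s - ds, t - dt)
        cols.append(tuple((pi - mi) / (2 * h) for pi, mi in zip(p, m)))
    return cols

def dF(x, y, z):
    return (z, -2 * y, x)     # gradient of F = xz - y^2

v1, v2 = dgamma_columns(2.0, 1.0)     # expect about (4, 1, 0) and (0, 4, 4)
grad = dF(*gamma(2.0, 1.0))           # (1, -4, 4) at c = (4, 2, 1)
dot1 = sum(a * b for a, b in zip(grad, v1))
dot2 = sum(a * b for a, b in zip(grad, v2))
```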
(b) Manifold M is known by the equation F(x, y, z) = x²y + xy² − z² + 3 = 0
near the point c = (2, 1, 3).
i. Find a basis for the tangent space Tc M.
ii. Locally, M is the graph of a function x = g(y, z). Determine
[Dg(1, 3)] by using the implicit function theorem.
iii. Solve for z in terms of x and y, and use R to do a wireframe plot
of the manifold. See script 3.4C, topic 1.
(c) (Hubbard, Example 3.1.14) F(z1, z2, z3) = (z3, z3 − z1z2).
Construct [DF]. It has two rows.
Find the point for which [DF] is not onto. Use R to find points on
the manifold near this point, and try to figure out what is going on.
See the end of script 3.4C for an example of how to find points on a
1-dimensional manifold in R3 .
3. Critical points (rigged to make the algebra work, but you should also plot
contour lines in R and use them to find the critical points)
Calculate the Jacobian matrix and the Hessian both by using R and with
pencil and paper.
x
(a) i. Find the one and only critical point of f = 4x2 + 12 y 2 + x82 y
y
on the square 14 ≤ x ≤ 4, 41 ≤ y ≤ 4.
ii. Use second derivatives (the Hessian matrix) to determine whether
this critical point is a maximum, minimum, or neither.
(b) The domain of the function F(x, y) = y² + (x² − 3x) log y is the upper
half-plane y > 0. Find all the critical points of F, and use the Hessian
matrix to classify each as maximum, minimum, or saddle point.
(c) The function F(x, y) = x²y − 3xy + (1/2)x² + y² has three critical points,
two of which lie on the line x = y. Find each and use the Hessian
matrix to classify it as maximum, minimum, or saddle point.
5 Homework - due on December 2
Although all of these problems except the last one were designed so that they
could be done with pencil and paper, it makes sense to do a lot of them in R,
and the Week 12 scripts provide good models. For each problem that you choose
to do in R, include a “see my script” reference in the paper version. Put all your
R solutions into a single script, and upload it to the homework dropbox on the
week 12 page.
When you use R, you will probably want to include some graphs that are not
required by the statement of the problem.
Do appreciate that problems 3 and 4, which use only androgynous names, are
sexual-orientation neutral as well as gender-neutral and avoid the use of third-
person singular pronouns.
3. Pat and Terry are in charge of properties for the world premiere of the
student-written opera “Goldfinger” at Dunster House. In the climactic
scene the anti-hero takes the large gold brick that he has made by melting
down chalices that he stole from the Vatican Museum and places it in a
safety deposit box in a Swiss bank while singing the aria “Papal gold, now
rest in peace.”
The gold brick is supposed to have length x = 8, height y = 2, and width
z = 4. With these dimensions in mind, Pat and Terry have spent their
entire budget on 112 square inches of gold foil and 64 cubic inches of an
alloy that melts at 70 degrees Celsius. They plan to fabricate the brick by
melting the alloy in a microwave oven and casting it in a sand mold.
Alas, the student mailboxes that they have borrowed to simulate safety-
deposit boxes turn out to be not quite 4 inches wide. Fortunately, the
equation
F(x, y, z) = (xyz − 64, xy + xz + yz − 56) = 0
specifies x and y implicitly in terms of z.
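A sketch of the implicit-derivative computation (Python stdlib; the squeeze of z down to 3.9 below is my own illustrative choice). With F = (xyz − 64, xy + xz + yz − 56), the Jacobian at (8, 2, 4) is [[yz, xz, xy], [y + z, x + z, x + y]] = [[8, 32, 16], [6, 12, 10]]; A is the left 2 × 2 block and B the z column.

```python
# Implicit derivatives (dx/dz, dy/dz) = -A^{-1} B for the gold brick.

def solve2(a, b, c, d, p, q):
    """Cramer's rule for [[a, b], [c, d]] (u, v) = (p, q)."""
    det = a * d - b * c
    return ((p * d - b * q) / det, (a * q - p * c) / det)

dxdz, dydz = solve2(8, 32, 6, 12, -16, -10)   # solve A w = -B

# if the narrow box forces z down to 3.9, first-order estimates of the new
# length and height keep the volume and foil area approximately correct
x, y, z = 8 + dxdz * (-0.1), 2 + dydz * (-0.1), 3.9
volume_error = x * y * z - 64
area_error = x * y + x * z + y * z - 56
```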
4. This problem is an example of a two-dimensional submanifold of R4 .
For their term project in the freshman seminar “Nuclear Terrorism and the
Third World,” Casey and Chris decide to investigate whether plutonium
can be handled safely using only bronze-age technology. They acquire two
bronze spears, each 5 meters long, and design a system where the plutonium
container is connected to the origin by one spear and to the operator by
the other. Everything is in a plane. Now the coordinates x1 and y1 of
the plutonium and the coordinates x2 and y2 of the operator satisfy the
equation
F(x1, y1, x2, y2) = (x1² + y1² − 25, (x1 − x2)² + (y1 − y2)² − 25) = 0.
One solution to this equation is x1 = 3, y1 = 4, x2 = 0, y2 = 8.
(You can build a model with a couple of ball-point pens and some Scotch
tape).
(a) Show that near the given solution, the constraint equation specifies x1
and y1 as a function of x2 and y2 , but not vice-versa.
(b) Calculate the derivative of the implicit function and show that it is not
onto. Determine in what direction the plutonium container will move
if x2 and y2 are both increased by equal small amounts (or changed
in any other way.) This system is not really satisfactory, because the
plutonium container can move only along a circle.
(c) Casey and Chris come up with a new design in which one spear has its
end confined to the x-axis (coordinate x2 can be changed, but y2 = 0).
The other spear has its end confined to the y-axis (coordinate y3 can
be changed, but x3 = 0). For this new setup, one solution is x1 = 3,
y1 = 4, x2 = 6, y3 = 0. Show that x1 and y1 are now specified locally
by a function ~g(x2, y3). Calculate [Dg] and show that it is onto.
(d) Are x2 and y3, near the same solution, now specified locally by a
function ~f(x1, y1)? If so, what is [Df]?
(e) For the new setup, another solution is x1 = 3, y1 = 4, x2 = 6, y3 = 8.
Show that in this case, although [DF] is onto, the choice of x1 and y1
as passive variables is not possible, and there is no implicitly defined
function ~g(x2, y3) as there was in part (c). Draw a diagram to illustrate
the problem.
5. (Physics version) In four-dimensional spacetime, a surface is specified as the
intersection of the hypersphere x² + y² + z² = t² − 2 and the hyperplane
3x + 2y + z − 2t = 2.
(Economics version) A resource is consumed at rate t to manufacture goods
at rates x, y, and z, and production is constrained by the equation
x² + y² + z² = t² − 2.
Furthermore, the expense of extracting the resource is met by selling the
goods, so that 2t = 3x + 2y + z − 2.
In either case, we have a manifold that is the locus of
F(x, y, z, t) = (x² + y² + z² − t² + 2, 3x + 2y + z − 2t − 2) = 0.
6. Consider the manifold specified by the parametrization
(x, y) = g(t) = (t + e^t, t + e^(2t)), −∞ < t < ∞.
Find where it intersects the line 2x+y = 10. You can get an initial estimate
by using the graph in script 3.4B, then use Newton’s method to improve
the estimate.
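Substituting the parametrization into 2x + y = 10 gives the scalar equation h(t) = 3t + 2e^t + e^(2t) − 10 = 0, so Newton's method in one variable suffices. A Python sketch (the course scripts use R; the starting value t = 1 is an assumed graph reading):

```python
# Newton's method for the intersection of the parametrized curve with
# the line 2x + y = 10.
import math

def h(t):
    return 3 * t + 2 * math.exp(t) + math.exp(2 * t) - 10

def hprime(t):
    return 3 + 2 * math.exp(t) + 2 * math.exp(2 * t)

t = 1.0                      # initial estimate, e.g. read off the graph
for _ in range(20):
    t = t - h(t) / hprime(t)

x, y = t + math.exp(t), t + math.exp(2 * t)   # the intersection point
```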
7. Manifold X, a hyperboloid, can be parametrized as
(x, y, z) = γ(u, v) = (sec u, tan u cos v, tan u sin v).
If you use R, you can do a wireframe plot the same way that the sphere
was plotted in script 3.4C, topic 2.
(a) Find the coordinates of the point c on this manifold for which
u = π/4, v = π/2.
(b) Find the equation of the tangent plane Tc X as the image of [Dγ(π/4, π/2)].
(c) Find an equation F(x, y, z) = 0 that describes the same manifold near
c, and find the equation of the tangent plane Tc X as the kernel of
[DF(c)].
(d) Find an equation x = g(y, z) that describes the same manifold near
c, and find the equation of the tangent plane Tc X as the graph of
[Dg(0, 1)].
8. Hubbard, Exercise 3.6.2. This is the only problem of this genre on the
homework that can be done with pencil and paper, but you must be pre-
pared to do one like it on the final exam!
9. Here is another function that has one maximum, one minimum, and two
saddle points, for all of which x and y are less than 3 in magnitude:
f(x, y) = x³ − y³ + 2xy − 5x + 6y.
Locate and classify all four critical points using R, in the manner of script
3.4D. A good first step is to plot contour lines with x and y ranging from
-3 to 3. If you do
contour(x,y,z, nlevels = 20)
you will learn enough to start zooming in on all four critical points.
An alternative, more traditional, approach is to take advantage of the fact
that the function f is a polynomial. If you set both partial derivatives equal
to zero, you can eliminate either x or y from the resulting equations, then
find approximate solutions by plotting a graph of the resulting fourth-degree
polynomial in x or y.
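The elimination approach in the last paragraph can be carried out numerically in a few lines (Python here rather than R). Setting f_x = 3x² + 2y − 5 = 0 gives y = (5 − 3x²)/2; substituting into f_y = −3y² + 2x + 6 = 0 leaves a quartic in x, whose real roots are found by scanning for sign changes and bisecting; each critical point is then classified with the Hessian [[6x, 2], [2, −6y]].

```python
# Critical points of f = x^3 - y^3 + 2xy - 5x + 6y by elimination:
# y = (5 - 3x^2)/2 from f_x = 0, then find roots of f_y along that curve.

def q(x):
    y = (5 - 3 * x * x) / 2
    return -3 * y * y + 2 * x + 6       # f_y with y eliminated (quartic in x)

def bisect(f, a, b, steps=80):
    for _ in range(steps):
        m = (a + b) / 2
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return (a + b) / 2

def classify(x, y):
    det = 6 * x * (-6 * y) - 2 * 2      # Hessian determinant
    if det < 0:
        return "saddle"
    return "minimum" if 6 * x > 0 else "maximum"

points = []
n = 600
for i in range(n):                      # scan [-3, 3] for sign changes of q
    a, b = -3 + 6 * i / n, -3 + 6 * (i + 1) / n
    if q(a) * q(b) < 0:
        x = bisect(q, a, b)
        points.append((x, (5 - 3 * x * x) / 2))

kinds = sorted(classify(x, y) for x, y in points)
```

Running this finds all four critical points inside the square, with one maximum, one minimum, and two saddles, matching the count promised in the problem.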