
MATHEMATICS 23a/E23a, FALL 2015

Linear Algebra and Real Analysis I


Syllabus for undergraduates and local Extension students
(Distance Extension students will also need to consult a special syllabus)
Last revised: July 22, 2015
Course Website: https://canvas.harvard.edu/courses/4524

Instructor: Paul Bamberg (to be addressed as “Paul,” please)


Paul graduated from Harvard in 1963 with a degree in physics and received his
doctorate in theoretical physics at Oxford in 1967. He taught in the Harvard
physics department from 1967 to 1995 and joined the math department in 2001.
From 1982 to 2000 he was one of the principals of the speech recognition company
Dragon Systems. If you count Extension School and Summer School, he has probably
taught more courses, in mathematics, physics, and computer science, than
anyone else in the history of Harvard. He was the first recipient of the White Prize
for excellence in teaching introductory physics.
This term, Paul is also teaching Math 152, “Discrete Mathematics,” and Math
116, “Real Analysis, Convexity, and Optimization.”
Email: bamberg@tiac.net
Office: Science Center 322, (617) 495-9560

Office Hours:
Tuesday and Thursday, 1:30-2:15 in Science Center 322.
Mondays 2-2:30 (longer if students are still there)

Head Teaching Assistant: Kate Penner (to be addressed as “Kate,” please)


Kate is the course head for Math E-23a, responsible for making it possible for
students from around the nation and the world to participate as fully as possible
in course activities.
Kate’s Harvard undergraduate degree is in government, but her interests have
moved to political economy and mathematics. After taking Math E-23 in the
Extension School, she became the head teaching assistant and is starting her sixth
year in that position. She has been course head for linear algebra and real analysis
courses in the Summer School. She may have set a Harvard record in Spring
2013 by teaching in four courses (Math M, Math 21b, Math 23b, and Math 117).
To date, she has received over a dozen teaching awards from the Bok Center for
Teaching and Learning for her work teaching undergraduate math.
This term, Kate is also teaching Math 1a.
Email: penner@math.harvard.edu
Office: Science Center 424

Office Hours: TBD

Week 1: Regular office hours TBA

Course Assistants (all former students in Math 23a or Math E-23a):

• Nicolas Campos, ncampos@college.harvard.edu

• Jennifer Hu, jenniferhu@college.harvard.edu

• Ju Hyun Lee, juhyunlee@college.harvard.edu

• Elaine Reichert, reichertelaine@gmail.com

• Ben Sorscher, bsorscher@college.harvard.edu

• Sebastian Wagner-Carena, swagnercarena@college.harvard.edu

• Kenneth Wang, kwang02@college.harvard.edu

Goals: Math 23a is the first half of a moderately rigorous course in linear algebra
and multivariable calculus, designed for students who are serious about mathemat-
ics and interested in being able to prove the theorems that they use but who are
as much concerned about the application of mathematics in fields like physics and
economics as about “pure mathematics” for its own sake. Trying to cover both
theory and practice makes for a challenging course with a lot of material, but it is
appropriate for the audience!
Prerequisites: This course is designed for the student who received a grade of 5
on the Math BC Advanced Placement examination or an A or A minus in Math
1b. Probably the most important prerequisite is the attitude that mathematics is
fun and exciting. Extension students should ordinarily have an A in Math E-16,
and an additional math course would be a very good idea.
Our assumption is that the typical Math 23a student knows only high-school
algebra and single-variable calculus, is currently better at formula-crunching than
at doing proofs, and likes to see examples to accompany abstractions. If, before
coming to Harvard, you took courses in both linear algebra and multivariable
calculus, Math 25 might be more appropriate. We do not assume that Math 23
students have any prior experience in either of these areas beyond solving systems
of linear equations in high school algebra.
This year, for the second time, we will devote four weeks to single-variable real
analysis. Real analysis is the study of real-valued functions and their properties,
such as continuity and differentiability, as well as sequences, series, limits, and
convergence. This means that if you are an international student whose curriculum
included calculus but not infinite series OR if you had a calculus course that
touched only lightly on topics like series, limits, and continuity, you will be OK.
Mathematics beyond AP calculus is NOT a prerequisite! Anyone who tries
to tell you otherwise is misguided. In fact, since we will be teaching sequences
and series from scratch (but rigorously), you can perhaps get away with a weaker
background in this area than is required for Math 21.

Strange as it may seem, Part I of the math placement test that freshmen have
taken is the most important. Students who do well in Math 23 have almost all
scored 26 or more out of 30 on this part.
Extension students who register for graduate credit are required to learn and
use the scripting language R. This option is also available to everyone else in the
course. You need to be only an experienced software user, not a programmer.

Who takes Math 23?
When students in Math 23b were asked to list the two concentrations they were
most seriously considering, the most popular choices were mathematics, applied
math, physics, computer science, chemistry, mathematical economics, life sciences,
and humanities.
Extension students who take this course are often establishing their credentials
for a graduate program in a field like mathematical economics, mathematics, or
engineering. Programs in fields like economics like to see a course in real analysis
on your transcript. Successful Math E-23 students have usually taken more than
one course beyond single-variable calculus.
Upperclassmen who have made a belated decision to go into a quantitative PhD
program will also find this course useful.
Course Meetings:
The course ordinarily meets in Science Center A. To avoid overcrowding, the
first two lectures have been moved to Science Center C.
Lectures on Tuesdays and Thursdays run from 2:37 to 4:00. They provide
complete coverage of the week’s material, occasionally illustrated by examples done
in the R scripting language.
Problem Sessions (Section)
There are two types of weekly problem sessions led by the course staff. The
first is required; the second, though highly recommended, is optional.
• The “early” sections on Thursday and Friday will be devoted to problem
solving in small groups. These are a required course activity and will
count toward your grade. Lecture on Thursday is crucial background for
section!

• The “late” sections that meet on Monday will focus on the weekly problem
sets due on Wednesday mornings, and will also review the proofs that were
done in lecture. Attendance at these sections is optional, but most students
find them to be time well spent.
Videos will be made of all the lectures. Usually the video will be posted on
the Web site before the next lecture, and often it will appear on the same day.
The Thursday video will not be posted in time to provide preparation for the early
sections that meet on Thursdays, and we cannot guarantee that it will appear
before the Friday sections.
Even though all lectures are captured on video, Harvard rules forbid under-
graduates to register for another course that meets at the same time as Math 23,
even one with just a 30-minute overlap! Here is the official statement of this year’s
policy:
“In recent years, the Ad Board has approved petitions in which the direct
and personal compensatory instruction has been provided via video capture of
classroom presentations. In keeping with the views of the Standing Committee
on Undergraduate Educational Policy (formerly EPC), discussed with the Faculty

Council and the full faculty last April, the Ad Board will no longer approve such
petitions.”
With regard to athletic practices that occur at the same time as classes, policy
is less well defined. Here is the view of the assistant director of athletics:
“The basic answer is that our coaches should be accommodating to any academic
conflict that comes up with class scheduling. Kids should be able to take
the classes they want and still be a part of the team. Especially for classes that
would only cause a student to miss a small part of a practice.
What complicates things are the classes that would cause a student to miss an
entire practice for 2-3 days a week. Those instances make it hard for a student to
engage fully in the sport and prepare adequately for competition.
It’s hard for freshmen to ask a coach - the adult they have the closest relationship
to on campus - for practice accommodations, but in my experience many of
them will work with students on their total experience.”
The Math 23 policy, based on this opinion: It is OK to take Math 23a and
practice for your sport every Tuesday, but you must not miss Thursday lecture for
a practice.
Extension students may choose between attending lecture or watching videos.
However, students in Math E-23a who will not regularly attend lecture on Thursday
should sign up for a section that meets as late as possible. Then, with occasional
exceptions, they can watch the video of the Thursday lecture to prepare for section.
Sections will begin on September 10-11. Students should indicate their prefer-
ences for section time using the student information system. More details will be
revealed once the software is complete!
In order to include your name on a section list, we must obtain your permission
(on the sectioning form) to reveal on the Web site that you are a student taking
Math 23a or E-23a. If you wish to keep this information secret, we will include
your name in alphabetical order, but in the form Xxxx Xxxxxx.

Exams: There will be two quizzes and one final exam.
Quiz 1: Wednesday, October 7 (module 1, weeks 1-4)
Quiz 2: Wednesday, November 4 (module 2, weeks 5-8)
Final Exam: date and time TBA (module 3, weeks 9-12)
Quizzes are held in the Yenching Auditorium, 2 Divinity Avenue. They run
from 6 to 9 PM, but you can arrive any time before 7 PM, since 120 minutes should
be enough time for the quiz.
Keep these time slots open. Do not, for example, schedule a physics lab
or an LS 1a section on Wednesday evenings. If you know that you tend to work
slowly, it would also be unwise to schedule another obligation that leaves only part
of that time available to you!
Students who have exam accommodations, properly documented by a letter
from the Accessible Education Office, may need to take their quizzes in a separate
location. Please provide the AEO letters as early in the term as you can, since we
may need to reserve one or more extra rooms.

The last day to drop and add courses (like Math 23a and Math 21a) is Monday,
October 5. This is before the first quiz. It is important that you be aware of how
you are managing the material and performing in the course. It is not a good
idea to leave switching out of any course (not just Math 23) until the fifth Mon-
day. Decisions of this nature are best dealt with in as timely a manner as possible!!

Quizzes will include questions that resemble the ones done in the “early” sec-
tions, and each quiz will include two randomly-chosen proofs from among the
numbered proofs in the relevant module. There may be other short proofs simi-
lar to ones that were done in lecture and problems that are similar to homework
problems. However if you want quizzes on which you are asked to prove difficult
theorems that you have never seen before, you will need to take Math 25a or 55a,
not Math 23a.
If you have an unexpected time conflict for one of the quizzes, contact Kate
as soon as you know about it, and special arrangements can be made. Distance
students will take their quizzes near their home but on the same dates.
The final examination will focus on material from the last five weeks of the
course. Local Extension students will take it at the same time and place as under-
graduates. The time (9AM or 2PM) will be revealed when the exam schedule is
posted late in September. If you have two or even three exams scheduled for that
day, don’t worry: that is a problem for the Exams Office, not you, to solve.
Except for the final examination, “local” Extension students can meet all their
course obligations after 5:30pm.
“Distance” extension students who do not live near Cambridge and cannot
come to Harvard in the evening to hand in homework, attend section and office
hours, take quizzes, and present proofs can still participate online in all course
activities. Details will be available in a separate document. Since this fully-online

option is an experiment, we plan to restrict it to two sections of 12 students each,
with absolute priority given to students who live far from Cambridge.

Textbooks:
Vector Calculus, Linear Algebra, and Differential Forms, Hubbard and Hubbard,
fourth edition, Matrix Editions, 2009. Try to get the second printing, which
includes a few significant changes to chapters 4 and 6.
This book is in stock at the Coop, or you can order it for $84 plus $10 for
priority shipping from the publisher’s Web site at
http://matrixeditions.com/UnifiedApproach4th.html. The Student Solution
Manual for the fourth edition, not in stock at the Coop, is also available from that
Web site.
We will cover Chapters 1-3 this term, Chapters 4-6 in Math 23b; so this one
textbook will last for the entire year.

Ross, Elementary Analysis: The Theory of Calculus, 2nd Edition, 2013.


This will be the primary text for the module on single-variable real analysis.
It is available electronically through the Harvard library system (use HOLLIS and
search for the author and title). If you like to own bound volumes, used copies can
be found on amazon.com for as little as $25, but be sure to get the correct edition!

Lawvere, Conceptual Mathematics: A First Introduction to Categories, 2nd Edition, 2009.
We will only be using the first chapter, and the book is available for free
download through the Harvard library system.

Proofs:
Learning proofs can be fun, and we have put a lot of work into designing an
enjoyable way to learn high-level and challenging mathematics! Each week’s course
materials include two proofs. Often these proofs appear in the textbook and will
also be covered in lecture. They also may appear as quiz questions.
You, as students, will earn points towards your grade by presenting these proofs
to teaching staff and to each other without the aid of your course notes. Here is
how the system works:
When we first learn a proof in class, only members of the teaching staff are “qual-
ified listeners.” Anyone who presents a satisfactory proof to a qualified listener
also becomes qualified and may listen to proofs by other students. This process of
presenting proofs to qualified listeners occurs separately for every proof.
You are expected to present each proof before the date of the quiz on which it
might appear; so each proof has a deadline date. Distance students may reference
the additional document which details how to go about remotely presenting proofs
to classmates and teaching staff.
Each proof is worth 1 point. Here is the grading system:

• Presenting a proof to Paul, Kate, one of the course assistants, or a fellow
student who has become a qualified listener: 0.95 points before the deadline,
0.8 points after the deadline. You may only present each proof once.

• Listening to a fellow student’s proof: 0.1 point. Only one student can receive
credit for listening to a proof.

• After points have been tallied at the end of the term, members of the course
staff may assign the points that they have earned by listening to proofs
outside of section to any students that they feel deserve a bit of extra credit.

Students who do the proofs early and listen to lots of other students’ proofs can
get more than 100%, but there is a cap of 30 points total. You can almost reach
this cap by doing each proof before the deadline and listening twice to each proof.
Either you do a proof right and get full credit, or you give up and try again
later. There is no partial credit. It is OK for the listener to give a couple of small
hints.
You may consult the official list of proofs that has the statement of each theorem
to be proved, but you may not use notes. That will also be the case when proofs
appear on quizzes and on the final exam.
It is your responsibility to use the proof logging software on the course
Web site to keep a record of proofs that you present or listen to. You can also
use the proof logging software to announce proof parties and to find listeners for
your proofs.
Each quiz will include two questions which are proofs chosen at random from
the four weeks of relevant material. The final exam will have three proofs, all from
material after the second quiz. Students generally do well on the proof questions.

Useful software:

• R and RStudio
This is required only for Extension students who register for graduate credit,
but it is an option for everyone. Consider learning R if you

– are interested in computer science and want practice in using software
to do things that are more mathematical than can be dealt with in CS
50 or 51.
– are thinking of taking a statistics course, which is likely to use R.
– are hoping to get an interesting summer job or summer internship that
uses mathematics or deals with lots of data.
– want to be able to work with large data files in research projects in any
field (life sciences, economics and finance, government, etc.)

R is free, open-source software. Instructions for download and installation
are on the Web site. You will have the chance to use R at the first section
on Thursday, September 10 or Friday, September 11; so install it right away,
preferably on a laptop computer that you can bring to section.
On the course Web site is a set of R scripts, with accompanying YouTube
videos, that explain how to do almost every topic in the course by using
R. These scripts are optional for undergraduates, but they will enhance your
understanding both of mathematics and of R.

• LaTeX
This is the technology that is used to create all the course handouts. Once
you learn how to use it, you can create professional-looking mathematics on
your own computer.
The editor that is built into the Canvas course Web site is based on LaTeX.
One of the course requirements is to upload four proofs to the course Web site
in a medium of your choice. One option is to use LaTeX. Alternatively, you
can use the Canvas file editor (LaTeX based), or you can make a YouTube
video.
I learned LaTeX without a book or manual by just taking someone else’s files,
ripping out all the content, and inserting my own, and so can you. You will
need to download freeware MiKTeX version 2.9 (see http://www.miktex.org),
which includes an integrated editor named TeXworks.
From http://tug.org/mactex/ you can download a similar package for the
Mac OS X.
When in TeXworks, use the Typeset/pdfLaTeX menu item button to create
a .pdf file. To learn how to create fractions, sums, vectors, etc., just find an
example in the lecture outlines and copy what I did. All the LaTeX source
for lecture outlines, assignments, and practice quizzes is on the Web site, so
you can find working models for anything that you need to do.
If you create a .pdf file for your homework, please print out the files and
hand in the paper at class. An exception can be made if you are a distance
Extension student or if for some other good reason you are not in Cambridge
on the due date.
The course documents contain examples of diagrams created using TikZ,
LaTeX’s built-in graphics language. It is also easy to include .jpg or .png files
in LaTeX. If you want to create diagrams, use Paint or try Inkscape at
http://www.inkscape.org, an excellent freeware graphics program. Stu-
dents have found numerous other solutions to the problem of creating graph-
ics, so just experiment.
By default, undergraduates and “local” Extension students may submit the
assignment electronically only if you are out of town on the due date. Individual
section instructors may adopt a more liberal policy about allowing
electronic submission. Do not submit .tex files.

Use of R:
You can earn “R bonus points” in three ways:

• By being a member of a group that uploads solutions to section problems
that require creation of R scripts. These will be available most, but not all,
weeks. (about 10 points)

• By submitting R scripts that solve the optional R homework problems (again
available most, but not all, weeks). (about 20 points)

• By doing a term project in R. (about 20 points)

To do the “graduate credit” grade calculation, we will add your R bonus
points to the numerator of your score. To the denominator, we will add 95%
of your bonus points or 50% of the possible bonus points, whichever is greater.
Earning a lot of R points is essential if you are registered for graduate credit.
Otherwise, earning more than half the bonus points is certain to raise your
percentage score a bit, and it can make a big difference if you have a bad day
on a quiz or on the final exam.
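To make the rule concrete, here is a worked example with invented numbers:
suppose you earn 200 of the roughly 230 regular points and 30 of the roughly 50
possible R bonus points. Since 95% of your bonus points (28.5) exceeds 50% of
the possible bonus points (25), your “graduate” percentage would be
(200 + 30)/(230 + 28.5) ≈ 89%.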

Grades: Your course grade will be determined as follows:

• problem sets, 50 points. Your worst score will be converted to a perfect score.

• presenting and listening to proofs, 26 points.

• uploading proofs to the Web site, 4 points.

• participation in the “early” sections, based on attendance, preparation,
contributions to problem solving, and posting solutions to the Web site, 10
points.

• two quizzes, 40 points each.

• final exam, slightly more than 60 points.

• R bonus points, about 50 points in numerator, 25-45 points in denominator.

For graduate students, only a “graduate” percentage score, using the R bonus
points, will be calculated. For everyone else, we will also calculate an “undergrad-
uate” percentage score, ignoring the R bonus points, and we will use the higher of
the two percentage scores.
The grading scheme is as follows:

Minimum percentage Grade
94.0% A
88.0% A-
80.0% B+
75.0% B
69.0% B-
63.0% C+
57.0% C
51.0% C-

If you are conscientious about the homework, proofs, and quizzes, you will end up
with a grade between B plus and A, depending on your expertise in taking a fairly
long and challenging 3-hour final exam, and you will know that you are thoroughly
prepared for more advanced courses. For better or worse, you need to be fast as
well as knowledgeable to get an A, but an A- is a reasonable goal even if you make
occasional careless errors and are not a speed demon. Extension students who
earned a B plus have been successful at getting into PhD programs.
There is no “curve” in this course! You cannot do worse because your classmates
do better.

Switching Courses (Harvard College students only):
While transfers among Math 21a, 23a, 25a, and 55a are routine, it is important
to note that Math 21a focuses on multivariable calculus, while Math 23a and 25a
focus on linear algebra. Math 21b focuses on linear algebra, while Math 23b and
25b focus on multivariable calculus. Math 21a and b are given every semester, while
Math 23a and 25a are fall only with 23b and 25b given spring only. Ordinarily
there is a small fee if you drop a course after the third Monday of the term, but
this is waived in the case of math courses. However, the fifth Monday, October 5,
is a firm deadline after which you cannot change courses!

• Math 23a to Math 21a or b
If you decide to transfer out of Math 23a within 3 weeks of the start of the
semester, then either Math 21a or 21b is a reasonable choice. If more than 3
weeks have elapsed, Math 21b will be a better place for you to go. You will
want to take Math 21a in the spring. You should avoid waiting until the last
minute to switch.
Switching to Math 21 at midyear (either to 21b or to 21a) does not make
sense except in desperate situations. You will have seen some of the topics in
Math 25b, since Math 25a does almost no real analysis. In addition, you will
have done about 60% of Math 112, which you should skip after taking Math 23.

• Math 25a to Math 23a
Math 23a and Math 25a cover similar material during the first three weeks.
If you have taken a course in which you learned to multiply matrices and use
dot and cross products, you can probably attend only Math 25 lectures for
three weeks and still have only a little catching up to do if you add Math
23a during the week of the first quiz. However, if you are trying to decide
between 25a and 23a and have not taken a college-level linear algebra course,
it might be prudent to attend the lectures in both courses until you make up
your mind. Math 23a Weeks 2 and 4 will be new material!
In the case of transfers, graded Math 25a problem sets will be accepted in
lieu of missed Math 23a problem sets. It is imperative that you review the
problem sets and material that you have missed upon joining the course as
soon as possible.
For those who make the decision to change courses at the last minute, there
will be special office hours in Science Center 322 on Monday, October 5 from
3 to 4 PM at which study card changes can be approved and arrangements
for missed homework and quizzes can be discussed.
Switching from Math 23a to Math 25b at midyear has worked well for a few
students over the past several years, although you end up seeing a lot of real
analysis twice.

Switching from Math 25a to Math 23b at midyear requires you to teach
yourself about multivariable differential calculus and manifolds, but a handful
of students do it every year, and it generally works out OK.

Special material for Physics 15b and Physics 153
Math 23b gives an excellent treatment of “vector calculus” (div, grad, and curl)
and its relation to differential form fields and the exterior derivative. Alas, this
material is needed in Physics 15b and Physics 153 before we can reach it in Math
23.
Week 13 covers these topics in a manner that relies only on Math 23a, never
mentioning multiple integrals. This will be covered in a special lecture during
reading period, and there will be an optional ungraded problem set. If you choose
to do this topic, which physics students last year said was extremely useful, there
will be one question about it on the final exam, which you can use to replace your
lowest score on one of the other questions.
If you are not taking Physics 15b or Physics 153, just wait to see this material
in Math 23b.
YouTube videos
These were made as part of a rather unsuccessful pedagogical experiment last
year. They are quite good, but you will need some extra time to watch them.

• The Lecture Preview Videos were made by Kate. They cover the so-called
Executive Summaries in the weekly course materials, which go over all of the
week’s material, but without proofs or detailed examples.
If you watch these videos (it takes about an hour per week) you will be very
well prepared for lecture, and even the most difficult material will make sense
on a first hearing.
Last year’s experiment was unsuccessful because we assumed in lecture that
everyone had watched these videos, when in fact only half the class did
so. Those who did not watch them complained, correctly, that the lectures
skipped over basic material in getting to proofs and examples. This year’s
lectures will be self-contained, so the preview videos are not required viewing.

• The R script videos were made by Paul. They provide a line-by-line expla-
nation of the R scripts that accompany each week’s materials.
Last year’s experiment was unsuccessful because going over these scripts in
class was not a good use of lecture time. If you are doing the “graduate”
option, these scripts are pretty much required viewing, although the scripts
are so thoroughly commented that just working through them on your own
is perhaps a viable alternative.
If you are doing just the “undergraduate” option, you can ignore the R scripts
completely.

Homework: Homework (typically 8 problems) will be assigned weekly. The
assignment will be included in the same online document as the lecture notes and
section problems.
Assignments are due on Wednesdays by 10:00 AM. There will be a locked box
on the second floor, near Room 209, with your “late” section instructor’s name.
At 10 AM Kate will place a sheet of colored paper in each box, and anything above
that paper will be late! Please include your name, the assignment number, and
your CA’s name on your assignment.
Each week’s assignment will include a couple of optional problems whose so-
lutions require R scripts. These scripts should be uploaded electronically to the
dropbox on the Web site for that week. Please include your name as a comment
in the script and also in the file name.
The course assistant who leads your “late” section should return your corrected
homework to you at the section after the due date. If you are not receiving graded
homework on schedule, send email to penner@math.harvard.edu and the problem
will be dealt with.
Homework that is handed in after 10AM on the Wednesday when it is due
will not be graded. If it arrives before the end of Reading Period and looks fairly
complete, you will get a grade of 50% for it.
It is a violation of Federal privacy law for us to return graded homework by
placing it in a publicly accessible location like an instructor’s mailbox. You will
have to collect your graded homework from your section instructor in person.

Collaboration and Academic Integrity policy:


You are encouraged to discuss the course with other students and with the
course staff, but you must always write your homework solutions out yourself in
your own words. You must write the names of those you’ve collaborated
with at the top of your assignment.
If you collaborate with classmates to solve problems that call for R scripts, create
your own file after your study group has figured out how to do it.
Proofs that you submit to the course Web site must be done without consulting
files that other students have posted!
If you have the opportunity to see a complete solution to an assigned problem,
please refrain from doing so. If you cannot resist the temptation, you must cite
the source, even if all that you do is check that your own answer is correct.
You are forbidden to upload solutions to homework problems, whether your
own or ones that are posted on the course Web site, to any publicly available
location on the Internet.
Anything that you learn from lecture, from the textbook, or from working
homework problems can be regarded as “general knowledge” for purposes of this
course, and the source need not be cited. Anything learned in prerequisite courses
falls into the same category. Do not assume that other courses use such an
expansive definition of “general knowledge”!

Tutoring: Several excellent students from previous years, qualified to be course
assistants but too busy, are registered with the Bureau of Study Counsel as tutors.
If you find yourself getting into difficulties, immediately contact the BSC and get
teamed up with one of them.
You will have to contact the BSC directly to arrange for a tutor, since privacy
law forbids anyone on the Math 23 staff to know who is receiving tutoring. A
website with more information can be found at www.bsc.harvard.edu.

Week-by-week Schedule:
Week Date Topic
Fortnight 1 September 3-11 Fields, vectors and matrices
Week 2 September 15-18 Dot and cross products; Euclidean geometry of R^n
Week 3 September 22-25 Row reduction, independence, basis
Week 4 Sept. 29 - Oct. 2 Eigenvectors and eigenvalues
Week 5 October 6-9 Number systems and sequences
October 7 QUIZ 1 on weeks 1-4
Week 6 October 13-16 Series, convergence tests, power series
Week 7 October 20-23 Limits and continuity of functions
Week 8 October 27-30 Derivatives, inverse functions, Taylor series
Week 9 November 3-6 Topology, sequences in R^n, linear differential equations
November 4 QUIZ 2 on weeks 5-8
Week 10 November 10-13 Limits and continuity in R^n; partial and directional derivatives
Week 11 November 17-20 Differentiability, Newton’s method, inverse functions
Fortnight 12 Nov. 24-Dec. 3 Manifolds, critical points, Lagrange multipliers
November 26 Thanksgiving
Half-week 13 December 8 Calculus on parametrized curves; div, grad, and curl
December (date TBA) FINAL EXAM on weeks 9-12
This schedule covers all the math that is needed for Physics 15a, 16, and 15b
with the sole exception of surface integrals, which will be done in the spring.
The real analysis in Math 23a alone will be sufficient for most PhD programs in
economics, though the most prestigious programs will want to see Math 23b also.
All the mathematics that is used in Economics 1011a will be covered by the end
of the term. The coverage of proofs is complete enough to permit prospective
Computer Science concentrators to skip CS 20.
Abstract vector spaces and multiple integration, topics of great importance to
prospective math concentrators, have all been moved to Math 23b.

MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #1, Week 1 (Fields, Vectors, and Matrices)

Authors: Paul Bamberg and Kate Penner


R scripts by Paul Bamberg
Last modified: June 13, 2015 by Paul Bamberg

Reading

• Hubbard, Sections 0.1 through 0.4

• Hubbard, Sections 1.1, 1.2, and 1.3

• Lawvere and Schanuel, Conceptual Mathematics


Search the Internet for “Harvard HOLLIS” and type “Conceptual Mathematics”
into the Search box.
Choose View Online. You will have to log in with your Harvard PIN.
At a minimum, read the following:
Article I (Sets, maps, composition – definition of a category)
Session 2
This is very easy reading.

Proofs to present in section or to a classmate who has done them.

• 1.1 Suppose that a and b are two elements of a field F. Using only the
axioms for a field, prove the following:
– If ab = 0, then either a or b must be 0.
– The additive inverse of a is unique.
• 1.2 (Generalization of Hubbard, proposition 1.2.9) A is an n × m matrix;
the entry in row i, column j is a_{i,j}. B is an m × p matrix. C is a p × q
matrix. The entries in these matrices are all from the same field F. Using
summation notation, prove that matrix multiplication is associative:
that (AB)C = A(BC). Include a diagram showing how you would lay out
the calculation in each case so the intermediate results do not have to be
recopied.

• 1.3 (Hubbard, proposition 1.3.14) Suppose that linear transformation
T : F^n → F^m is represented by the m × n matrix [T].
– a. Suppose that the matrix [T] is invertible. Prove that the linear
transformation T is one-to-one and onto (injective and surjective),
hence invertible.
– b. Suppose that linear transformation T is invertible. Prove that its
inverse S is linear and that the matrix of S is [S] = [T]^{-1}.
Note: Use * to denote matrix multiplication and ◦ to denote composition
of linear transformations. You may take it as already proved that matrix
multiplication represents composition of linear transformations. Do not
assume that m = n. That is true, but we are far from being able to prove
it, and you do not need it for the proof.

R Scripts

• Script 1.1A-Finite Fields.R
Topic 1 - Why the real numbers form a field
Topic 2 - Making a finite field, with only five elements
Topic 3 - A useful rule for finding multiplicative inverses

• Script 1.1B-PointsVectors.R
Topic 1 - Addition of vectors in R2
Topic 2 - A diagram to illustrate the point-vector relationship
Topic 3 - Subtraction and scalar multiplication

• Script 1.1C-Matrices.R
Topic 1 - Matrices and Matrix Operations in R
Topic 2 - Solving equations using matrices
Topic 3 - Linear functions and matrices
Topic 4 - Matrices that are not square
Topic 5 - Properties of the determinant

• Script 1.1D-MarkovMatrix
Topic 1 - A game of volleyball
Topic 2 - Traveling around on ferryboats

• Script 1.1L-LinearMystery
Topic 1 - Define a mystery linear function fMyst : R^2 → R^2

1 Executive Summary
• Quantifiers and Negation Rules
The “universal quantifier” ∀ is read “for all.”
The “existential quantifier” ∃ is read “there exists.” It is usually
followed by “s.t.,” a standard abbreviation for “such that.”
The negation of “∀x, P (x) is true” is “∃x, P (x) is not true.”
The negation of “∃x, P (x) is true” is “∀x, P (x) is not true.”
The negation of “P and Q are true” is “either P or Q is not true.”
The negation of “either P or Q is true” is “both P and Q are not true.”

• Functions
A function f needs two sets: its domain X and its codomain Y .
f is a rule that, to any element x ∈ X, assigns a specific element y ∈ Y .
We write y = f(x).
f must assign a value to every x ∈ X, but not every y ∈ Y must be of the
form f(x). The subset of the codomain consisting of elements that are of
the form y = f(x) is called the image of f. If the image of f is all of the
codomain Y, f is called surjective or onto.
f need not assign different elements of Y to different elements of X. If
x_1 ≠ x_2 =⇒ f(x_1) ≠ f(x_2), f is called injective or one-to-one.
If f is both surjective and injective, it is bijective and has an inverse f^{-1}.

• Categories
A category C has objects (which might be sets) and arrows (which might
be functions)
An arrow f must have a specific domain object X and a specific codomain
object Y; we write f : X → Y.
If arrows f : X → Y and g : Y → Z are in the category, then the composition
arrow g ◦ f : X → Z is in the category.
For any object X there is an identity arrow I_X : X → X.
Given f : X → Y, f ◦ I_X = f and I_Y ◦ f = f.
Associative law: given f : X → Y, g : Y → Z, and h : Z → W,
h ◦ (g ◦ f) = (h ◦ g) ◦ f.
Given an arrow f : X → Y , an arrow g : Y → X such that g ◦ f = IX is
called a retraction.
Given an arrow f : X → Y , an arrow g : Y → X such that f ◦ g = IY is
called a section.
If, for arrow f, arrow g is both a retraction and a section, then g is the
inverse of f, g = f^{-1}, and g must be unique.
Almost everything in mathematics is a special case of a category.

1.1 Fields and Field Axioms
A field F is a set of elements for which the familiar operations of addition and
multiplication are defined and behave in the usual way. Here is a set of axioms
for a field. You can use them to prove theorems that are true for any field.

1. Addition is commutative: a + b = b + a.

2. Addition is associative: (a + b) + c = a + (b + c).

3. Additive identity: ∃0 such that ∀a ∈ F, 0 + a = a + 0 = a.

4. Additive inverse: ∀a ∈ F, ∃ − a such that −a + a = a + (−a) = 0.

5. Multiplication is associative: (ab)c = a(bc).

6. Multiplication is commutative: ab = ba.

7. Multiplicative identity: ∃1 such that ∀a ∈ F, 1a = a.

8. Multiplicative inverse: ∀a ∈ F − {0}, ∃a−1 such that a−1 a = 1.

9. Distributive law: a(b + c) = ab + ac.

Examples of fields include:
The rational numbers Q.
The real numbers R.
The complex numbers C.
The finite field Zp , constructed for any prime number p as follows:

• Break up the set of integers into p subsets. Each subset is named after the
remainder when any of its elements is divided by p.
[a]p = {m|m = np + a, n ∈ Z}
Notice that [a + kp]p = [a]p for any k. There are only p sets, but each has
many alternate names. These p infinite sets are the elements of the field
Zp .

• Define addition by [a]p + [b]p = [a + b]p . Here a and b can be any names for
the subsets, because the answer is independent of the choice of name. The
rule is “Add a and b, then divide by p and keep the remainder.”

• Define multiplication by [a]p [b]p = [ab]p . Again a and b can be any names
for the subsets, because the answer is independent of the choice of name.
The rule is “Multiply a and b, then divide by p and keep the remainder.”
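Here is a minimal R sketch of this arithmetic (not one of the official course
scripts): the %% operator computes remainders, which is exactly the “divide by
p and keep the remainder” rule.

p <- 5                 # the prime modulus
(3 + 4) %% p           # [3] + [4] = [2] in Z_5, since 7 = 1*5 + 2
(3 * 4) %% p           # [3][4]   = [2] in Z_5, since 12 = 2*5 + 2
(-3) %% p              # an alternate name: [-3] = [2] in Z_5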

1.2 Points and Vectors
F^n denotes the set of ordered lists of n elements from a field F. Usually the field
is R, but it could be the field of complex numbers C or a finite field like Z_5.
A given element of F^n can be regarded either as a point, which represents
“position data,” or as a vector, which represents “incremental data.”
If an element of F^n is a point, we represent it by a bold letter like p and write
it as a column of elements enclosed in parentheses:
p = \begin{pmatrix} 1.1 \\ -3.8 \\ 2.3 \end{pmatrix}
If an element of F^n is a vector, we represent it by a bold letter with an arrow
like \vec{v} and write it as a column of elements enclosed in square brackets:
\vec{v} = \begin{bmatrix} -0.2 \\ 1.3 \\ 2.2 \end{bmatrix}
To add a vector to a point, we add the components in identical positions together.
The result is a point: q = p + \vec{v}. Geometrically we represent this by anchoring
the vector at the initial point p. The location of the arrowhead of the vector is
the point q that represents our sum.
[Diagram: vector \vec{v} anchored at point p, with its arrowhead at the point q = p + \vec{v}.]

To add a vector to a vector, we again add component by component. The
result is a vector. Geometrically, if \vec{w} is anchored at the arrowhead of \vec{v},
the vector from the initial point of \vec{v} to the arrowhead of \vec{w} represents our sum.
[Diagram: \vec{v} followed by \vec{w}, with \vec{v} + \vec{w} as the third side of the triangle.]

To form a scalar multiple of a vector, we multiply each component by the
scalar. In R^n, the geometrical effect is to multiply the length of the vector by the
scalar. If the scalar is a negative number, we switch the position of the arrow to
the other end of the vector.
[Diagram: \vec{v}, 2\vec{v}, and -2\vec{v}.]
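These operations are easy to try in R (a minimal sketch; R stores both points
and vectors as ordinary numeric vectors, so the point/vector distinction is one
you must maintain yourself):

p <- c(1.1, -3.8, 2.3)     # a point in R^3
v <- c(-0.2, 1.3, 2.2)     # a vector in R^3
q <- p + v                 # point + vector = point
q - p                      # point - point = vector; recovers v
-2 * v                     # scalar multiple: doubles the length, reverses direction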

1.3 Standard basis vectors
The standard basis vector \vec{e}_k has a 1 as its kth component, and all its other
components are 0. Since the additive identity 0 and the multiplicative identity
1 must be present in any field, there will always be n standard basis vectors in
F^n. Geometrically, the standard basis vectors in R^2 are usually associated with
“one unit east” and “one unit north” respectively.
[Diagram: \vec{e}_1 pointing east, \vec{e}_2 pointing north.]

1.4 Matrices and linear transformations


An m × n matrix over a field F has m rows and n columns.
Matrices represent linear functions, also known as linear transformations:
a function g : F^n → F^m is called linear if
g(a\vec{v} + b\vec{w}) = a g(\vec{v}) + b g(\vec{w}).
For a linear function g, if we know the value of g(\vec{e}_i) for each standard basis
vector \vec{e}_i, the value of g(\vec{v}) for any vector \vec{v} follows by linearity:
g(v_1\vec{e}_1 + v_2\vec{e}_2 + \cdots + v_n\vec{e}_n) = v_1 g(\vec{e}_1) + v_2 g(\vec{e}_2) + \cdots + v_n g(\vec{e}_n)
The matrix G that represents the linear function g is formed by using g(\vec{e}_k)
as the kth column. Then, if g_{i,j} denotes the entry in the ith row and jth column
of matrix G, the function value \vec{w} = g(\vec{v}) can be computed by the rule
w_i = \sum_{j=1}^{n} g_{i,j} v_j
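As a small illustration in R (the column values here are invented): build G from
the images of the basis vectors and check that G %*% v agrees with the linearity
formula.

g_e1 <- c(1, 0)                # g(e_1), chosen for illustration
g_e2 <- c(2, 1)                # g(e_2)
g_e3 <- c(0, -1)               # g(e_3)
G <- cbind(g_e1, g_e2, g_e3)   # the columns of G are the images of the basis vectors
v <- c(3, 1, 2)
G %*% v                        # equals 3*g_e1 + 1*g_e2 + 2*g_e3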

1.5 Matrix multiplication


If m × n matrix G represents linear function g : F^n → F^m and n × p matrix H
represents linear function h : F^p → F^n, then the matrix product GH is defined
so that it represents their composition: the linear function g ◦ h : F^p → F^m.
Start with standard basis vector \vec{e}_j. Function h converts this to the jth
column \vec{h}_j of matrix H. Then function g converts this column to g(\vec{h}_j), which
must therefore be the jth column of matrix GH.
The rule for forming the product GH can be stated in terms of the rule for a
matrix acting on a vector: to form GH, just multiply G by each column of H in
turn, and put the results side by side to create the matrix GH. If C = GH,
c_{i,j} = \sum_{k=1}^{n} g_{i,k} h_{k,j}.
While matrix multiplication is associative, it is not commutative. Order matters!

1.6 Examples of matrix multiplication
A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & -1 & -2 \end{pmatrix} (2 × 3),  B = \begin{pmatrix} 0 & 1 \\ 2 & -1 \\ -2 & 0 \end{pmatrix} (3 × 2)
AB = \begin{pmatrix} 2 & 1 \\ 2 & 2 \end{pmatrix} (2 × 2),  BA = \begin{pmatrix} 1 & -1 & -2 \\ 3 & 3 & 2 \\ -4 & -2 & 0 \end{pmatrix} (3 × 3)
The number of columns in the first factor must equal the number of rows in
the second factor.
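A quick check of this example in R, where %*% is the matrix product:

A <- matrix(c(2, 1, 0,
              1, -1, -2), nrow = 2, byrow = TRUE)
B <- matrix(c(0, 1,
              2, -1,
              -2, 0), nrow = 3, byrow = TRUE)
A %*% B    # a 2 x 2 matrix
B %*% A    # a 3 x 3 matrix -- order matters!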

1.7 Function inverses


A function f : X → Y is invertible if it has the following two properties:
• It is injective (one-to-one): if f(x_1) = f(x_2), then x_1 = x_2.
• It is surjective (onto): ∀y ∈ Y, ∃x ∈ X such that f(x) = y.
The inverse function g = f^{-1} has the property that if f(x) = y then g(y) = x.
So g(f(x)) = x and f(g(y)) = y. Both f ◦ g and g ◦ f are the identity function.

1.8 The determinant of a 2 × 2 matrix
For matrix A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, det A = ad − bc. If you fix one column, det A is a linear
function of the other column, and it changes sign if you swap the two columns.

1.9 Matrix inverses
A non-square m × n matrix A can have a “one-sided inverse.”
If m > n, then A takes a vector in R^n and produces a longer vector in R^m.
In general, there will be many matrices B that can recover the original vector in
R^n, so that BA = I_n. In this case there is no right inverse.
If m < n, then A takes a vector in R^n and produces a shorter vector in R^m.
In general, there will be no left inverse matrix B that can recover the original
vector in R^n, but there may be many different right inverses for which AB = I_m.
For a square matrix, it is possible for both a right inverse B and a left inverse
C to exist. In this case, we can prove that B and C are equal and they are
unique. We can say that “an inverse” A^{-1} exists, and it represents the inverse of
the linear function represented by matrix A.
You can find the inverse of a 2 × 2 matrix A whose determinant is not zero
by using the formula
A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
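In R you can compare the 2 × 2 formula with the built-in inverse, solve(); the
sample matrix is invented for illustration:

A <- matrix(c(1, 2,
              3, 4), nrow = 2, byrow = TRUE)
det(A)                       # ad - bc = -2
solve(A)                     # R's built-in matrix inverse
(1 / det(A)) * matrix(c(4, -2,
                        -3, 1), nrow = 2, byrow = TRUE)   # same result from the formula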

1.10 Matrix transposes
The transpose of a given matrix A is written A^T. The two are closely related:
the rows of A are the columns of A^T, and the columns of A are the rows of A^T.
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},  A^T = \begin{pmatrix} a & c \\ b & d \end{pmatrix}
The transpose of a matrix product is the product of the transposes, but in
the opposite order:
(AB)^T = B^T A^T
A similar rule holds for matrix inverses:
(AB)^{-1} = B^{-1} A^{-1}
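A quick R check of both rules, with two invertible matrices invented for
illustration (t() is transpose, solve() is inverse):

A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(0, -1, 1, 2), nrow = 2)
t(A %*% B) - t(B) %*% t(A)              # all zeros
solve(A %*% B) - solve(B) %*% solve(A)  # all zeros (up to rounding)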

1.11 Applications of matrix multiplication


In these examples, the “sum of products” rule for matrix multiplication arises
naturally, and so it is efficient to use matrix techniques.
• Counting paths: Suppose we have four islands, numbered 1 through 4,
connected by ferry routes.
[Diagram: four islands joined by one-way ferry routes.]
The entry in row i, column j of the matrix
A = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix}
shows how many ways there are to reach island i by a single ferry ride,
starting from island j. The entry in row i, column j of the matrix A^n shows
how many ways there are to reach island i by a sequence of n ferry rides,
starting from island j.

• Markov processes: A game of beach volleyball has two “states”: in state
1, team 1 is serving; in state 2, team 2 is serving. With each point that
is played there is a “state transition” governed by probabilities: for example,
from state 1, there is a probability of 0.8 of remaining in state 1 and a
probability of 0.2 of moving to state 2. The transition probabilities can be
collected into a matrix like
A = \begin{pmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{pmatrix}.
Then the matrix A^n specifies the transition probabilities that result from
playing n consecutive points.
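Base R has no built-in matrix power, so a short loop (a sketch, not one of the
official course scripts) computes A^n by repeated multiplication:

A <- matrix(c(0.8, 0.3,
              0.2, 0.7), nrow = 2, byrow = TRUE)
An <- diag(2)                    # start from the 2 x 2 identity matrix
for (i in 1:10) An <- An %*% A
An                               # transition probabilities after 10 points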

2 Lecture Outline
1. Quantifiers and negation

Especially when you are explaining a proof to someone, it saves some writing
to use the symbols ∃ (there exists) and ∀ (for all).
Be careful when negating these.
The negation of “∀x, P (x) is true” is “∃x, P (x) is not true.”
The negation of “∃x, P (x) is true” is “∀x, P (x) is not true.”
When negating a statement, also bear in mind that
The negation of “P and Q are true” is “either P or Q is not true.”
The negation of “either P or Q is true” is “both P and Q are not true.”
For practice, let’s negate the following statements (which may or may not
be true!)

• There exists an even prime number.


Negation:

• All 11-legged alligators are orange with blue spots. (Hubbard, page 5)
Negation:

• The function f(x) is continuous on the open interval (0,1), which
means that ∀x ∈ (0, 1), ∀ε > 0, ∃δ > 0 such that ∀y ∈ (0, 1),
|y − x| < δ implies |f(y) − f(x)| < ε.

Negation: f (x) is discontinuous on the open interval (0,1) means that

2. Set notation
Here are the standard set-theoretic symbols:

• ∈ (is an element of)


• {a|p(a)} (set of a for which p(a) is true)
• ⊂ (is a subset of)
• ∩ (intersection)
• ∪ (union)
• × (Cartesian product)
• - or \ (set difference)

Using the integers Z and the real numbers R, let’s construct some sets. In
each case there is one way to describe the set using a restriction and another
more constructive way to describe the set.

• The set of real numbers whose cube is greater than 8 in magnitude.


Restrictive:

Constructive:

• The set of coordinate pairs for points on the circle of radius 2 centered
at the origin (an example of a “smooth manifold”).
Restrictive:

Constructive:

3. Function terminology:

Here are some terms that should be familiar from your study of precalculus
and calculus:
Example a Example b Example c
domain
codomain
image
one-to-one = injective
onto = surjective
invertible = bijective
Using the sets X = {1, 2} and Y = {A, B, C}, draw diagrams to illustrate
the following functions, and fill in the table to show how the terms apply
to them:

• f : X → Y, f (1) = A, f (2) = B.

• g : Y → X, g(A) = 1, g(B) = 2, g(C) = 1.

• h : Y → Y, h(A) = B, h(B) = C, h(C) = A. (a permutation)

Here are those function words again, with two additions:

• domain
• natural domain (often deduced from a formula)
• codomain
• image
• one-to-one = injective
• onto = surjective
• invertible = bijective
• inverse image = {x|f (x) ∈ A}

Here are functions from R to R, defined by formulas.

• f_1(x) = x^2
• f_2(x) = x^3
• f_3(x) = log x (natural logarithm)
• f_4(x) = e^x

• Find one that is not injective (not one-to-one)

• For f1 , what is the inverse image of (1, 4)?

• Which function is invertible as a function from R to R?

• What is the natural domain of f3 ?

• What is the image of f4 ?

• Specify domain and codomain so that f3 and f4 are inverses of one


another.

• Did your calculus course use “range” as a synonym for “image” or for
“codomain?”

4. Composition of functions
Sometimes people find that a statement is hard to prove because it is so
obvious. An example is the associativity of function composition, which
will turn out to be crucial for linear algebra.
Prove that (f ◦ g) ◦ h = f ◦ (g ◦ h). Hint: Two functions f1 and f2 are equal
if they have the same domain X and, ∀x ∈ X, f1 (x) = f2 (x).

Consider the set of men who have exactly one brother and at least one son.
h(x) = “father of x”, g(x) = “brother of x”, f (x) = “oldest son of x”

• f ◦ g is called
• (f ◦ g) ◦ h is
• g ◦ h is called
• f ◦ (g ◦ h) is
• Simpler name for both (f ◦ g) ◦ h and f ◦ (g ◦ h)

Consider the real-valued functions
g(x) = e^x, h(x) = 3 log x, f(x) = x^2

• f ◦ g has the formula


• (f ◦ g) ◦ h has the formula
• g ◦ h has the formula
• f ◦ (g ◦ h) has the formula
• Simpler formula for both (f ◦ g) ◦ h and f ◦ (g ◦ h)

5. Finite sets and functions form the simplest example of a category

• The objects of the category are finite sets.


• The arrows of the category are functions from one finite set to another.
The definition of a function involves quantifiers.
Requirements for a function f : X → Y

∀x ∈ X, ∃!y ∈ Y such that f (x) = y


What is wrong with the following?
[Diagram: an assignment from X to Y that gives no value to some element of X.]
What is wrong with the following?
[Diagram: an assignment from X to Y that gives two values to some element of X.]

• If arrows f : X → Y and g : Y → Z are in the category, then the
composition arrow g ◦ f : X → Z is in the category.
• For any object X there is an identity arrow I_X : X → X.
• Given f : X → Y, f ◦ I_X = f and I_Y ◦ f = f.
• Composition of arrows is associative:
given f : X → Y, g : Y → Z, and h : Z → W, h ◦ (g ◦ f) = (h ◦ g) ◦ f.

The objects do not have to be sets and the arrows do not have to be
functions. For example, the objects could be courses, and an arrow from
course X to course Y could mean “if you have taken course X, you will
probably do better in course Y as a result.” Check that the identity and
composition rules are satisfied.

6. Invertible functions - an example of invertible arrows
First consider the category of finite sets and functions between them.
The term “inverse” is used only for a “two-sided inverse.” Given f : X → Y ,
an inverse f −1 : Y → X must have the properties
f −1 ◦ f = IX and f ◦ f −1 = IY
Prove that the inverse is unique. This proof uses only things that are true
in any category, so it is valid in any category!

This function is not invertible because it is not injective, but it is surjective.
[Diagram: a surjective but not injective function from X to Y.]

However, it has a “preinverse” (my terminology – the official word is
“section”). Starting at an element of Y, choose any element of X from which
there is an arrow to that element. Call that function g. Then f ◦ g = I_Y
but g ◦ f ≠ I_X. Furthermore, g is not unique.
Prove the cancellation law: if f has a section and h ◦ f = k ◦ f, then
h = k (another proof that is valid in any category!)

This function f is not invertible because it is not surjective, but it is
injective.
[Diagram: an injective but not surjective function from X to Y.]

It has a “postinverse” (the official word is “retraction”). Just reverse all the
arrows to undo its effect, and define g however you like on the element of
Y that is not in the image of f. Then g ◦ f = I_X but f ◦ g ≠ I_Y.

7. Fields
Loosely speaking, a field F is a set of elements for which the familiar oper-
ations of arithmetic are defined and behave in the usual way. Here is a set
of axioms for a field. You can use them to prove theorems that are true for
any field.

(a) Addition is commutative: a + b = b + a.


(b) Addition is associative: (a + b) + c = a + (b + c).
(c) Additive identity: ∃0 such that ∀a ∈ F, 0 + a = a + 0 = a.
(d) Additive inverse: ∀a ∈ F, ∃ − a such that −a + a = a + (−a) = 0.
(e) Multiplication is associative: (ab)c = a(bc).
(f) Multiplication is commutative: ab = ba.
(g) Multiplicative identity: ∃1 such that ∀a ∈ F, 1a = a.
(h) Multiplicative inverse: ∀a ∈ F − {0}, ∃a−1 such that a−1 a = 1.
(i) Distributive law: a(b + c) = ab + ac.

This set of axioms for a field includes properties (such as the commutativity
of addition) that can be proved as theorems by using the other axioms. It
therefore does not qualify as an “independent” set, but there is no general
requirement that axioms be independent.
Some well-known laws of arithmetic are omitted from the list of axioms
because they are easily proved as theorems. The most obvious omission is
∀a ∈ F, 0a = 0.
Here is the proof. What axiom justifies each step?

• 0 + 0 = 0 so (0 + 0)a = 0a.
• 0a + 0a = 0a.
• (0a + 0a) + (−0a) = 0a + (−0a).
• 0a + (0a + (−0a)) = 0a + (−0a).
• 0a + 0 = 0.
• 0a = 0.

8. Finite fields
Computing with real numbers by hand can be a pain, and most of linear
algebra works for an arbitrary field, not just for the real and complex num-
bers. Alas, the integers do not form a field because in general there is no
multiplicative inverse. Here is a simple way to make from the integers a
finite field in which messy fractions cannot arise.

• Choose a prime number p.


• Break up the set of integers into p subsets. Each subset is named after
the remainder when any of its elements is divided by p.
[0]p = {m|m = np, n ∈ Z}
[1]p = {m|m = np + 1, n ∈ Z}
[a]p = {m|m = np + a, n ∈ Z}
Notice that [a + kp]p = [a]p for any k. There are only p sets, but each
has many alternate names.
These p infinite sets are the elements of the field Zp .
• Define addition by [a]p + [b]p = [a + b]p . Here a and b can be any
names for the subsets, because the answer is independent of the choice
of name. The rule is “Add a and b, then divide by p and keep the
remainder.”
• What is the simplest name for [5]7 + [4]7 ?

• What is the simplest name for the additive inverse of [3]7 ?

• Define multiplication by [a]p [b]p = [ab]p . Again a and b can be any


names for the subsets, because the answer is independent of the choice
of name. The rule is “Multiply a and b, then divide by p and keep the
remainder.”
• What is the simplest name for [5]7 [4]7 ?

• Find the multiplicative inverse for each nonzero element of Z7
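A brute-force search in R will find these inverses (a minimal sketch; compare
Script 1.1A, Topic 3, which covers a rule for finding multiplicative inverses):

p <- 7
for (a in 1:(p - 1)) {
  b <- which((a * (1:(p - 1))) %% p == 1)   # the b with [a][b] = [1]
  cat("[", a, "] has multiplicative inverse [", b, "] in Z_", p, "\n")
}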

9. Rational numbers
The rational numbers Q form a field. You learned how to add and multiply
them years ago! The multiplicative inverse of a/b is b/a as long as a ≠ 0.
The rational numbers are not a “big enough” field for doing Euclidean
geometry or calculus. Here are some irrational quantities:

• √2.
• π.
• most values of trig functions, exponentials, or logarithms.
• coordinates of most intersections of two circles.

10. Real numbers


The real numbers R constitute a field that is large enough so that any
characterization of a number in terms of an infinite sequence of real numbers
still leads to a real number.
A positive real number is an expression like 3.141592... where there is no
limit to the number of decimal places that can be provided if requested.
To get a negative number, put a minus sign in front. This is Hubbard’s
definition.
An equivalent viewpoint is that a positive real number is the sum of an
integer and an infinite series of the form
\sum_{i=1}^{\infty} a_i \left(\frac{1}{10}\right)^i
where each a_i is one of the decimal digits 0...9.
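For example, taking a_i = 3 for every i gives
0.333\ldots = \sum_{i=1}^{\infty} 3\left(\frac{1}{10}\right)^i = 3 \cdot \frac{1/10}{1 - 1/10} = \frac{1}{3},
a geometric series that converges to the rational number 1/3.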


Write the first three terms of an infinite series that converges to π.

The rational numbers and the real numbers are both “ordered fields.” This
means that there is a subset of positive elements that is closed under both
addition and multiplication. No finite field is ordered.
In Z5 , you can name the elements [0], [1], [2], [−2], [−1], and try to call the
elements [1] and [2] “positive.” Why does this attempt to make an ordered
field fail?

11. Proof 1.1 - two theorems that are valid in any field

(a) Using nothing but the field axioms, prove that if ab = 0, then either a
or b must be 0.

(b) Using nothing but the field axioms, prove that the additive inverse
of an element a is unique. (Standard strategy for uniqueness proofs:
assume that there are two different inverses b and c, and prove that
b = c.)

12. Lists of field elements as points and vectors:
F n denotes the set of ordered lists of n elements from a field F . Usually
the field is R, but it could be the field of complex numbers C or a finite
field like Z5 .
An element of F n can be regarded either as a point, which represents “po-
sition data,” or as a vector, which represents “incremental data.” Beware:
many textbooks ignore this distinction!
If an element of F n is a point, we represent it by a bold letter like p and
write it as a column of elements enclosed in parentheses.

p = \begin{pmatrix} 1.1 \\ -3.8 \\ 2.3 \end{pmatrix}

If an element of F n is a vector, we represent it by a bold letter with an arrow like ~v and write it as a column of elements enclosed in square brackets.

~v = \begin{bmatrix} -0.2 \\ 1.3 \\ 2.2 \end{bmatrix}

13. Relation between points and vectors, inspired by geometry:

• Add vector ~v component by component to point A to get point B.


• Subtract point A component by component from point B to get vector
~v.
• Vector addition: if adding ~v to point A gives point B and adding ~w to point B gives point C, then adding ~v + ~w to point A gives point C.
• A vector in F n can be multiplied by any element of F to get another
vector.

Draw a diagram to illustrate these operations without use of coordinates,


as is typically done in a physics course.

14. Examples from coordinate geometry
Here are two points in the plane.

p = \begin{pmatrix} 1.4 \\ -3.8 \end{pmatrix}, q = \begin{pmatrix} 2.4 \\ -4.8 \end{pmatrix}

Here are two vectors.

~v = \begin{bmatrix} -0.2 \\ 1.3 \end{bmatrix}, ~w = \begin{bmatrix} 0.6 \\ -0.2 \end{bmatrix}

• What is q − p?

• What is p + ~v?

• What is ~v − 1.5~w?

• What, if anything, is p + q?

• What is 0.5p + 0.5q? Why is this apparently illegal operation OK?

15. Subsets of F n
A subset of F n can be finite, countably infinite, or uncountably infinite.
The concept is especially useful when the elements of F n are points, but it
is valid also for vectors.
Examples:

(a) In (Z3)², consider the set { \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ 0 \end{pmatrix} }.
This will turn out (outline 7) to be a line in the small affine plane. Write it in the form {p + t~v | t ∈ Z3 }.

(b) In R2 , consider the set of points whose coordinates are both positive
integers. Is it finite, countably infinite, or uncountably infinite?

(c) In R2 , consider the set of points on the unit circle, a "one-dimensional manifold." Is it finite, countably infinite, or uncountably infinite?

(d) In R2 , draw a diagram that might represent the set of points \begin{pmatrix} x \\ y \end{pmatrix}, where x is family income and y is family net worth, for which a family qualifies for free tuition.

16. Subspaces of F n
A subspace is defined only when the elements of F n are vectors. It must
be closed under vector addition and scalar multiplication. The second re-
quirement means that the zero vector must be in the subspace. The empty
set ∅ is not a subspace!
Geometrically, a subspace corresponds to a “flat subset” (line, plane, etc.)
that includes the origin.
For R3 there are four types of subspace. What is the geometric interpretation of each?

• 0-dimensional: the set { \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} }

• 1-dimensional: {t~u | t ∈ R}
Exception: 0-dimensional if

• 2-dimensional: {s~u + t~v | s, t ∈ R}
Exception: 1-dimensional if

• 3-dimensional: {r~u + s~v + t~w | r, s, t ∈ R}
Exceptions: 2-dimensional if

1-dimensional if

A special type of subset is obtained by adding all the vectors in a subspace


to a fixed point. It is in general not a subspace, but it has special properties.
Lines and planes that do not contain the origin fall into this category.
We call such a subset an “affine subset.” This terminology is not standard:
the Math 116 textbook uses “linear variety.”

17. Standard basis vectors:

These are useful when we want to think of F n more abstractly.


The standard basis vector e~i has a 1 in position i, a 0 everywhere else. Since
0 and 1 are in every field, these vectors are defined for any F .
The nice thing about standard basis vectors is that in F n , any vector can
be represented uniquely in the form

\sum_{i=1}^{n} x_i e~i

This will turn out to be true also in an abstract n-dimensional vector space,
but in that case there will be no “standard” basis.

18. Another meaning for “field”


Physicists long ago started using the term “field” to mean “a function
that assigns a vector to every point.” Examples are the gravitational field,
electric field, and magnetic field.
Another example: in a smoothly flowing stream or in a blood vessel, there
is a function that assigns to each point the velocity vector of the fluid at
that point: a “velocity field.”
If \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} is the point whose coordinates are the interest rate x1 and the unemployment rate x2 , then the Fed chairman probably has in mind the function that assigns to this point a vector: the expected change in these quantities over the next month.
A function ~F that assigns to this point a vector of rates of change:

\begin{pmatrix} dx_1/dt \\ dx_2/dt \end{pmatrix} = \vec{F} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}

specifies a linear differential equation involving two variables. In November


you will learn to solve such equations by matrix methods.
Here is a formula for a vector field from Hubbard, exercise 1.1.6 (b). Plot it.

\vec{F} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ 0 \end{pmatrix}.
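Here is a minimal base-R sketch for plotting such a field on a grid (not one of the course scripts; the 0.2 scale factor is an arbitrary choice to keep the arrows short):

grid <- expand.grid(x = seq(-2, 2, by = 0.5), y = seq(-2, 2, by = 0.5))
Fx <- grid$x                          # first component of the field
Fy <- 0 * grid$y                      # second component is identically zero
plot(grid$x, grid$y, pch = 20, cex = 0.3, xlab = "x", ylab = "y")
ok <- Fx != 0 | Fy != 0               # arrows() warns about zero-length arrows
arrows(grid$x[ok], grid$y[ok],
       grid$x[ok] + 0.2 * Fx[ok], grid$y[ok] + 0.2 * Fy[ok], length = 0.05)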

Here are formulas for vector fields from Hubbard, exercise 1.1.6, (c) and (e). Plot them. If you did Physics C Advanced Placement E&M, they may look familiar.

\vec{F} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad \vec{F} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -y \\ x \end{pmatrix}

19. Matrices
An m × n matrix over a field F is a rectangular array of elements of F with
m rows and n columns. Watch the convention: the height is specified first!
As a mathematical object, any matrix can be multiplied by any element of
F . This could be meaningless in the context of an application. Suppose
you run a small hospital that has two rooms with three patients in each.
Then
 
98.6 102.4 99.7
103.2 98.3 99.6

is a perfectly reasonable way to keep track of the body temperatures of the


patients, but multiplying it by 2.7 seems unreasonable. This matrix, viewed
as an element of R6 , is a point, not a vector, but we always use braces for
matrices.
Matrices with the same size and shape can be added component by com-
ponent. What would you get if you add
 
0.2 −1.4 0.0
0.6 −0.9 2.35

to the matrix above to update the temperature data by one day?
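A sketch of this bookkeeping in R (matrix() fills column by column unless byrow = TRUE is given):

temps <- matrix(c(98.6, 102.4, 99.7,
                  103.2, 98.3, 99.6), nrow = 2, byrow = TRUE)
delta <- matrix(c(0.2, -1.4, 0.0,
                  0.6, -0.9, 2.35), nrow = 2, byrow = TRUE)
temps + delta                         # componentwise sum: the updated temperatures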

20. Matrix multiplication
Matrix multiplication is nicely explained on pp. 43-46 of Hubbard. To
illustrate the rule, we will take

A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & -1 & -2 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 \\ 2 & -1 \\ -2 & 0 \end{pmatrix}

• Compute AB.

• Compute BA.

(Lay out each product as in Hubbard's figure 1.2.5.)

In a set of n×n square matrices, addition and multiplication of matrices are


always defined. Multiplication is distributive with respect to addition, too.
But because matrix multiplication is noncommutative, the n × n matrices
do not form a field if n > 1. (They are said to form a ring.) Let

A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 \\ 2 & 1 \end{pmatrix}

Find AB.

Find BA.

21. Matrices as functions:
Since a column vector is also an n × 1 matrix, we can multiply an m × n matrix by a vector in F n to get a vector in F m . The product A e~i is the ith column of A. This is usually the best way to think of a matrix A as representing a linear function f : the ith column of A is f (e~i ).

Example: Suppose that f(\begin{pmatrix} 1 \\ 0 \end{pmatrix}) = \begin{pmatrix} 1 \\ 4 \end{pmatrix}, f(\begin{pmatrix} 0 \\ 1 \end{pmatrix}) = \begin{pmatrix} 2 \\ 3 \end{pmatrix}.
What matrix represents f ?

Since A(xi e~i + xj e~j ) is the sum of xi times column i and xj times column j, we see that

f (xi e~i + xj e~j ) = xi f (e~i ) + xj f (e~j )

This is a requirement if f is to be a linear function.

Use matrix multiplication to calculate f(\begin{pmatrix} 2 \\ -1 \end{pmatrix}).

The rule for forming the product AB can be stated in terms of the rule for
a matrix acting on a vector: to form AB, just let A act on each column of
B in turn, and put the results side by side to create the matrix AB.
What function does the matrix product AB represent? Consider (AB) e~i .
This is the ith column of the matrix AB, and it is also the result of letting
B act on e~i , then letting A act on the result. So for any standard basis
vector, the matrix AB represents the composition A ◦ B of the functions
represented by B and by A.
What about the matrices (AB)C and A(BC)? These represent the compo-
sition of three functions: say (f ◦ g) ◦ h and f ◦ (g ◦ h). But we already know
that composition of functions is associative. So we have proved, without
any messy algebra, that multiplication of matrices is associative also.

22. Proving associativity by brute force (proof 1.2)
A is an n × m matrix.

B is an m × p matrix.

C is a p × q matrix.

What is the shape of the matrix ABC?


Show how you would lay out the calculation of (AB)C.

If a_{i,j} represents the entry in the ith row, jth column of A, then

(AB)_{i,k} = \sum_{j=1}^{m} a_{i,j} b_{j,k}

((AB)C)_{i,q} = \sum_{k=1}^{p} (AB)_{i,k} c_{k,q} = \sum_{j=1}^{m} \sum_{k=1}^{p} (a_{i,j} b_{j,k}) c_{k,q}

Show how you would lay out the calculation of A(BC).

(BC)j,q =

(A(BC))i,q =

On what basis can you now conclude that matrix multiplication is associa-
tive for matrices over any field F ?

Group problem 1.1.1c offers a more elegant version of the same proof by
exploiting the fact that matrix multiplication represents composition of
linear functions.
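Before writing out the proof, here is a numerical spot check in R (not a proof, of course), using random matrices with compatible shapes:

set.seed(1)
A <- matrix(rnorm(6), nrow = 2)      # 2 x 3
B <- matrix(rnorm(12), nrow = 3)     # 3 x 4
C <- matrix(rnorm(8), nrow = 4)      # 4 x 2
all.equal((A %*% B) %*% C, A %*% (B %*% C))   # TRUE, up to rounding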

23. Identity matrix:
It must be square, and the ith column is the ith basis vector. For example,

I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}

24. Matrices as the arrows for a category C


Choose a field F , perhaps the real numbers R.

• An object of C is a vector space F n .


• An arrow of C is an n × m matrix A, with domain F m and codomain
F n.
• Given F^p \xrightarrow{B} F^m \xrightarrow{A} F^n , the composition of arrows A and B is the matrix product AB. Show that the "shape" of the matrices is right for multiplication.

• The identity arrow for object F n is the n × n identity matrix.

Now we just have to check the two rules that must hold in any category:

• The associative law for composition of arrows holds because, as we


just proved, matrix multiplication is associative.

• Verify the two identity rules for the case where A = \begin{pmatrix} 2 & 3 & 4 \\ 1 & 2 & 3 \end{pmatrix}.

25. Matrix inverses:
Consider first the case of a non-square m × n matrix A.
If m > n, then A takes a vector in Rn and produces a longer vector in
Rm . In general, there will be many matrices B that can recover the original
vector in Rn . In the lingo of categories, such a matrix B is a retraction.
Here is a matrix that converts a 2-component vector (price of silver and price of gold) into a three-component vector that specifies the price of alloys containing 25%, 50%, and 75% gold respectively. Calculate ~v = A \begin{pmatrix} 4 \\ 8 \end{pmatrix}.

A = \begin{pmatrix} .75 & .25 \\ .5 & .5 \\ .25 & .75 \end{pmatrix}, \quad ~v = A \begin{pmatrix} 4 \\ 8 \end{pmatrix} =
By elementary algebra you can reconstruct the price of silver and of gold
from the price of any two of the alloys, so it is no surprise to find two
different left inverses. Apply each of the following to ~v.

B_1 = \begin{pmatrix} 2 & -1 & 0 \\ -2 & 3 & 0 \end{pmatrix}, \quad B_1 ~v =

B_2 = \begin{pmatrix} 0 & 3 & -2 \\ 0 & -1 & 2 \end{pmatrix}, \quad B_2 ~v =

However, in this case there is no right inverse.


If m < n, then A takes a vector in Rn and produces a shorter vector in Rm . In general, there will be no left inverse matrix B that can recover the original vector in Rn , but there may be many different right inverses. Let A = (1  −1) and find two different right inverses. In the lingo of categories, each such right inverse B is a section.

26. Inverting square matrices
For a square matrix, the interesting case is where both a right inverse B
and a left inverse C exist. In this case, B and C are equal and they are
unique. We can say that “an inverse” A−1 exists.
Proof of both uniqueness and equality:
To prove uniqueness of the left inverse matrix, assume that matrix A has two different left inverses C and C′ and a right inverse B:

C′A = CA = I
C′(AB) = C(AB) = IB
C′I = CI = B
C′ = C = B

In general, inversion of matrices is best done by “row reduction,” discussed


in Chapter 2 of Hubbard. For 2 × 2 matrices there is a simple formula that
is worth memorizing:
If

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}

then

A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}

If ad − bc = 0 then no inverse exists.

Write down the inverse of \begin{pmatrix} 3 & 1 \\ 4 & 2 \end{pmatrix}, where the elements are in R.

The matrix inversion recipe works in any field: try inverting A = \begin{pmatrix} 3 & 1 \\ 4 & 2 \end{pmatrix} where the elements are in Z5 .
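A sketch in R of the recipe over Z5 (%% is R's mod operator; dividing by the determinant becomes multiplying by its inverse mod 5):

p <- 5
A <- matrix(c(3, 4, 1, 2), nrow = 2)              # columns (3,4) and (1,2)
detA <- (A[1,1]*A[2,2] - A[1,2]*A[2,1]) %% p      # here det = 2
detInv <- which((detA * (1:(p-1))) %% p == 1)     # brute-force inverse of det: 3
adj <- matrix(c(A[2,2], -A[2,1], -A[1,2], A[1,1]), nrow = 2)
Ainv <- (detInv * adj) %% p
(A %*% Ainv) %% p                                 # the identity matrix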

27. Other matrix terminology:


All these terms are nicely explained on pp 49-50 of Hubbard.

• transpose
• symmetric matrix
• antisymmetric matrix
• diagonal matrix
• upper or lower triangular matrix

Try applying them to some 3 × 3 matrices:

A = \begin{pmatrix} 3 & 1 & 2 \\ 1 & 2 & 3 \\ 2 & 3 & 4 \end{pmatrix}, B = \begin{pmatrix} 3 & 0 & 0 \\ 1 & 2 & 0 \\ 2 & 3 & 4 \end{pmatrix}, C = \begin{pmatrix} 3 & 1 & 2 \\ 0 & 2 & 3 \\ 0 & 0 & 4 \end{pmatrix}, D = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}, E = \begin{pmatrix} 0 & -1 & -2 \\ 1 & 0 & -3 \\ 2 & 3 & 0 \end{pmatrix}

28. Linear transformations:

A function T : F n → F m is called linear if, for any vectors ~v, ~w ∈ F n and any scalars a, b ∈ F ,

T (a~v + b~w) = aT (~v) + bT (~w)

Example:
The components of ~v are the quantities of sugar, flour, and chocolate re-
quired to produce a batch of brownies. The components of w ~ are the
quantities of these ingredients required to produce a batch of fudge. T
is the function that converts such a vector into the total cost of ingredi-
ents. T is represented by a matrix [T ] (row vector) of prices for the various
ingredients.
Write these vectors for the following data:

• A batch of brownies takes 3 pounds of sugar, 6 of flour, 1 of chocolate,


while a batch of fudge takes 4 pounds of sugar, 0 of flour, 2 of chocolate.

• Sugar costs $2 per pound, flour costs $1 per pound, chocolate costs $6
per pound.

Then a~v + b~w is the vector of ingredients required to produce a batches of brownies and b batches of fudge, while T (~v) is the cost of parts for a single batch of brownies. The statement
T (a~v + b~w) = aT (~v) + bT (~w) is sound economics.
Two ways to find the cost of 3 batches of brownies plus 2 batches of fudge:

T (3~v + 2~w) =

3T (~v) + 2T (~w) =

Suppose that T produces a 2-component vector of costs from two competing


grocers. In that case [T ] is a 2 × 3 matrix.
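Here is the example as a quick R sketch (the second grocer's prices are invented for illustration):

v <- c(3, 6, 1)                       # brownies: pounds of sugar, flour, chocolate
w <- c(4, 0, 2)                       # fudge
T1 <- matrix(c(2, 1, 6), nrow = 1)    # prices per pound: $2, $1, $6
T1 %*% (3*v + 2*w)                    # cost of 3 batches of brownies + 2 of fudge
3*(T1 %*% v) + 2*(T1 %*% w)           # the same number, by linearity
T2 <- rbind(c(2, 1, 6),               # a hypothetical second grocer
            c(1.8, 1.2, 6.5))         # makes [T] a 2 x 3 matrix
T2 %*% v                              # cost of one batch of brownies at each grocer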

29. A linear transformation interpreted geometrically.
A parallelogram has one vertex at the origin. Two other vertices are located
at points in the plane specified by ~v and w
~ . Transformation T expands the
parallelogram by a factor of 2 and rotates it counterclockwise through a
right angle.
You can either locate the fourth vertex by vector addition and then apply T to it, or you can apply T separately to the second and third vertices and then add the results. So

T (~v + ~w) = T (~v) + T (~w)

Draw diagrams to illustrate both approaches.

The matrix that represents T is

[T ] = \begin{pmatrix} 0 & -2 \\ 2 & 0 \end{pmatrix}

By letting [T ] multiply an arbitrary vector \begin{pmatrix} a \\ b \end{pmatrix} you can determine the effect of T on any point in the plane. Do this for the vector \begin{pmatrix} 2 \\ 1 \end{pmatrix}.

30. Matrices and linear transformations
Use * to denote the mechanical operation of matrix multiplication.
Any vector can be written as ~v = x1 e~1 + ... + xn e~n .
The rule for multiplying a matrix [T ] by a vector ~v is equivalent to

[T ] ∗ ~v = x1 [T ] ∗ e~1 + ... + xn [T ] ∗ e~n = [T ] ∗ (x1 e~1 + ... + xn e~n ).
So multiplication by [T ] specifies a linear transformation of F n .
The matrix [T ] has columns [T ] ∗ (e~1 ), ...[T ] ∗ (e~n ).
The distinction is subtle. T is a function, a rule. [T ] is just a collection
of numbers, but the general rule for matrix multiplication turns it into a
function.

31. Composition and multiplication:

Suppose S : F n → F m and T : F m → F p are both linear transformations.


Then the codomain of S equals the domain of T and we can define the
composition U = T ◦ S.
Prove that U is linear.

To find the matrix of U , we need only determine its action on a standard


basis vector.

U (e~i ) = T (S(e~i )) = T ([S] ∗ e~i ) = [T ] ∗ ([S] ∗ e~i ) = ([T ] ∗ [S]) ∗ e~i

So the matrix of T ◦ S is [T ] ∗ [S].

32. Inversion
A function f is invertible if it is 1-to-1 (injective) and onto (surjective). If
g is the inverse of f , then both g ◦ f and f ◦ g are the identity function.
How do we reconcile this observation with the existence of matrices that
have one-sided inverses?
Here are two simple examples that identify the problem.

(a) Define f by the formula f (x) = 2x. Then


f : R → R is invertible.
f : Z3 → Z3 is invertible.
f : Z → Z is not invertible.
f : Z → 2Z is invertible. (2Z is the set of even integers)
In the last case, we have made f invertible by redefining its codomain
to equal its image.

(b) If we want to say that the inverse of f (x) = x² is g(x) = √x, we have to redefine f (x) so that its codomain is the nonnegative reals (makes it onto) and its domain is the nonnegative reals (makes it one-to-one).

The codomain of the function that an m × n matrix represents is all of Rm .


Hubbard p. 64 talks about the invertibility of a linear transformation T :
F n → F m and ends up commenting that m and n must be equal. Here is
the problem, whose proof will have to wait:
If m > n, T cannot be onto, because its image is just a subspace of F m .

Show how the case where [T ] = \begin{pmatrix} 1 \\ 2 \end{pmatrix} illustrates the problem.

If m < n, T cannot be one-to-one, because there is always a subspace of F n that gets mapped to the zero vector.

Show how the case where [T ] = (1  −1) illustrates the problem.

33. Example - constructing the matrix of a linear transformation
Here is what we know about function f :

• Its domain and codomain are both R2 .


• It is linear.

• f(\begin{pmatrix} 1 \\ 2 \end{pmatrix}) = \begin{pmatrix} 7 \\ 5 \end{pmatrix}.

• f(\begin{pmatrix} 1 \\ 4 \end{pmatrix}) = \begin{pmatrix} 11 \\ 9 \end{pmatrix}.

Find the matrix T that represents f by using linearity to determine what


f does to the standard basis vectors.
Then automate the calculation by writing down a matrix equation and
solving it for T.
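A sketch of the automated calculation in R: the given data say [T ]A = B, where the columns of A are the inputs and the columns of B are the corresponding outputs, so [T ] = BA^{-1}.

A <- matrix(c(1, 2, 1, 4), nrow = 2)     # columns: the two input vectors
B <- matrix(c(7, 5, 11, 9), nrow = 2)    # columns: the two output vectors
Tmat <- B %*% solve(A)                   # [T] A = B  =>  [T] = B A^(-1)
Tmat %*% c(1, 2)                         # check: reproduces (7, 5)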

34. Invertibility of linear functions and of matrices (proof 1.3, Hubbard, propo-
sition 1.3.14)
Since the key issue in this proof is the subtle distinction between a linear
function T and the matrix [T ] that represents it, it is a good idea to use
* to denote matrix multiplication and ◦ to denote composition of linear
transformations.
It is also a good idea to use ~x for a vector in the domain of T and ~y for a
vector in the codomain of T .
Suppose that linear transformation T : F n → F m is represented by the
m × n matrix [T ].

(a) Suppose that the matrix [T ] is invertible. Prove that the linear trans-
formation T is one-to-one and onto (injective and surjective), hence
invertible.

(b) Suppose that linear transformation T is invertible. Prove that its


inverse S is linear and that the matrix of S is [S] = [T ]^{-1} .
The shortest version of this proof starts by exploiting the linearity of
T when it is applied to a cleverly-chosen sum of vectors.

T (aS(y~1 ) + bS(y~2 )) = aT ◦ S(y~1 ) + bT ◦ S(y~2 ).

35. Application: graph theory
This is inspired by example 1.2.22 in Hubbard (page 51), but I have ex-
tended it by allowing one-way edges and multiple edges.
A graph has n vertices: think of them as islands. Given two vertices Vi and
Vj , there may be Ai,j edges (bridges or ferryboats) that lead from Vj to Vi
and Aj,i edges that lead from Vi to Vj . If a bridge is two-way, it counts
twice, but we allow one-way bridges.
The matrix

A = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix}

corresponds to the following directed graph:

Clearly A is a matrix, and it describes the graph completely. The challenge
is to associate it with a linear transformation and to interpret its columns
as vectors.
Suppose you are a travel agent and you keep a notebook with a complete list
of all the ways that you have found to reach each island. So one component,
xj , would count the number of ways that you have found to reach island j.
A standard basis vector like e~j describes a notebook that has one way of
reaching island j (land at the airport?) and no way of reaching any other
islands.
It is always worth asking what (if anything) the operations of addition and
scalar multiplication mean. Addition is tricky: in general, it would have
to correspond to two different agents combining their notebooks, with no
attempt to weed out duplicates. Multiplication by a non-integer makes no
sense.
What about A e~j ? This is the jth column of A and its ith component is A_{i,j} , the number of edges leading from Vj to Vi . (Hubbard has chosen the opposite convention in Exercises 1.2.20 and 1.2.22, but for his example the matrix is symmetric and it makes no difference.) It is an annoying feature of matrix notation that the row index comes first, since we choose a column first and then consider its entries.
 
Now consider a vector ~v = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} whose entries are arbitrary non-negative
integers. After traversing one more edge, the number of walks that lead to
vertex Vi is

\sum_{j=1}^{n} A_{i,j} x_j .

This is a linear function, and we see that the vector A~v represents the
number of distinct ways of reaching each island after extending the existing
list of walks by following one extra edge wherever possible.
If you start on island Vj and make a walk of n steps, then the number of
distinct walks leading to each island is specified by the components of the
vector An e~j .
Hubbard does the example of a cube, where all edges are two-way.

 
For the four-island graph, with A = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix},
use matrix multiplication to find

(a) the number of two-step paths from island 1 to island 4.


(b) the number of three-step paths from island 1 to island 2.
(c) the number of four-step paths from island 3 to island 1.

   
(Space to lay out the repeated products of A, as in Hubbard's figure 1.2.5.)
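A sketch in R, using the outline's convention that A[i, j] counts edges from island j to island i:

A <- matrix(c(0, 0, 1, 1,
              1, 0, 0, 0,
              1, 0, 0, 0,
              0, 1, 1, 0), nrow = 4, byrow = TRUE)
A2 <- A %*% A                  # A2[i, j] counts two-step walks from j to i
A2[4, 1]                       # (a) two-step paths from island 1 to island 4
A3 <- A2 %*% A
A3[2, 1]                       # (b) three-step paths from island 1 to island 2
A4 <- A3 %*% A
A4[1, 3]                       # (c) four-step paths from island 3 to island 1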

36. Application: Markov processes
This is inspired by example 1.2.21 in Hubbard, but in my opinion he breaks
his own excellent rule by using a “line matrix” to represent probabilities.
The formulation below uses a column vector.
Think of a graph where the vertices represent “states” of a random process.
A state could, for example, be

(a) A travel agent is on a specific island.


(b) Player 1 is serving in a game of badminton.
(c) Hubbard’s reference books are on the shelf in the order (2,1,3).
(d) A roulette player has two chips.
(e) During an inning of baseball, there is one man out and runners on first
base and third base.

All edges are one way, and attached to each edge is a number in [0,1], the
“transition probability” of following that edge in one step of the process.
The sum of the probabilities on all the edges leading out of a state cannot
exceed 1, and if it is less than 1 there is some probability of remaining in
that state.
Examples: write at least one column of the matrix for each case.

(a) If you are on Oahu, the probability of flying to Maui is 0.2, and the
probability of flying to Lanai is 0.1. Otherwise you stay put.

(b) Badminton: if player 1 serves, the probability of losing the point and
the serve is 0.2. If player 2 serves, the probability of losing the point
and the serve is 0.3.

(c) If John Hubbard’s reference books are on the shelf in the order (2,1,3),
the probability that he consults book 3 and places it at the left to make
the order (3,2,1) is P3 .

(d) Roulette: after starting with 2 chips and betting a chip on red, the probability of having 3 chips is 9/19 and the probability of having 1 chip is 10/19. (In a fair casino, each probability would be 1/2.)

For the badminton example, the transition matrix is

A = \begin{pmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{pmatrix}.

What matrix represents the transition resulting from two successive points?

\begin{pmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{pmatrix} \begin{pmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{pmatrix}

What matrix represents the transition resulting from four successive points?

\begin{pmatrix} 0.7 & 0.45 \\ 0.3 & 0.55 \end{pmatrix} \begin{pmatrix} 0.7 & 0.45 \\ 0.3 & 0.55 \end{pmatrix}
If you raise the transition matrix A to a high power, you might conjecture that after a long time the probability that player 1 is serving is 0.6, no matter who served first.
In support of this conjecture, show that the matrix A^∞ = \begin{pmatrix} 0.6 & 0.6 \\ 0.4 & 0.4 \end{pmatrix} has the property that A A^∞ = A^∞ .
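A sketch in R of the conjecture: square the transition matrix repeatedly and watch both columns approach (0.6, 0.4).

A <- matrix(c(0.8, 0.2, 0.3, 0.7), nrow = 2)   # columns (0.8, 0.2) and (0.3, 0.7)
P <- A
for (i in 1:6) P <- P %*% P                    # A^2, A^4, ..., A^64
P                                              # both columns are nearly (0.6, 0.4)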

3 Group Problems
1. Some short proofs
Once your group has solved its problem, use a cell phone to take a picture
of your solution, and upload it to the topic box for your section on the
Week 1 page of the Web site.

(a) When we say that a matrix A is invertible, we mean that it has both
a right inverse and a left inverse. Prove that the right inverse and the
left inverse are equal, and that the inverse is unique.
If you need a hint, see page 48 of Hubbard.
Illustrate your answer by writing down the inverse B of the matrix A = \begin{pmatrix} 3 & 2 \\ 2 & 4 \end{pmatrix}, where all the entries are in the finite field Z5 , and showing that both AB and BA are equal to the identity matrix.
Since you are working in a finite field, there are no fractions. In Z5 ,
dividing by 3 is the same as multiplying by 2.
(b) Here are two well-known laws of arithmetic that are not on the list of
field axioms. They do not need to be listed as axioms because they are
provable theorems! In each case, the trick is to start with an identity
that is valid in any field, then apply the distributive law. You should
be able to justify each step of your proof by reference to one or more
of the field axioms.
Starting with 0 + 0 = 0, prove that 0a = 0 for any a ∈ F .
Starting with −1 + 1 = 0, prove that (−1)a = −a for any a ∈ F .
(c) Prove that composition of functions, whether linear or not, is associa-
tive. Illustrate your proof by using the functions
f (x) = x², g(x) = e^x , h(x) = 3 log x (natural logarithms)
and computing both f ◦ (g ◦ h) and (f ◦ g) ◦ h
Then use your result to give a one-line proof that matrix multiplication
must be associative. See Hubbard, page 63.

2. Matrices and linear functions

(a) Here is what we know about the function f :


• The space it maps from and the space it maps to (the domain and
codomain, respectively) are both R2 .
• It is linear.

• f(\begin{pmatrix} 1 \\ 1 \end{pmatrix}) = \begin{pmatrix} 4 \\ 2 \end{pmatrix}

• f(\begin{pmatrix} 1 \\ 3 \end{pmatrix}) = \begin{pmatrix} 6 \\ 4 \end{pmatrix}
i. Find the matrix T that represents f by using linearity to deter-
mine what f does to the standard basis vectors.
ii. Automate the calculation of T by writing down a matrix equation
and solving it for T .
(b) Suppose that T : (Z5 )² → (Z5 )² is a linear transformation for which

T(\begin{pmatrix} 1 \\ -1 \end{pmatrix}) = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad T(\begin{pmatrix} 1 \\ 1 \end{pmatrix}) = \begin{pmatrix} 3 \\ 0 \end{pmatrix}.
Construct the matrix [T ] that represents T and the matrix [S] that
represents T −1 .
Since you are working in a finite field, there are no fractions. Dividing
by 2 is the same as multiplying by 3.
(c) You are a precious metals dealer. Every day you check the Internet
and download a vector whose first component is the price per ounce
of gold and whose second component is the price per ounce of silver.
You then calculate a vector in R3 whose components are respectively
• the price per ounce of 18-carat gold (75% gold, 25% silver)
• the price per ounce of 12-carat gold (50% gold, 50% silver)
• the price per ounce of 6-carat gold (25% gold, 75% silver)
Write down the matrix F that represents the linear function
f : R2 → R3 which converts the prices of pure metals to the prices of
alloys.
Invent two different left inverses G1 and G2 for F .
Show that no right inverse for F exists. Explain in economic terms
what is going on here. (The alloys may be inconsistently priced.)

3. Problems to be solved by writing or editing R scripts
Upload your answer immediately to the Week 1 page of the course Web
site. Then your classmates can try out your script.

(a) Use the outer() function of R to make a table of the multiplication


facts for Z17 and use it to find the multiplicative inverse of each nonzero
element. Then use these inverses to find the result of dividing 11 by 5
and the result of dividing 5 by 11 in this field.
(b) You are playing roulette in an American casino, and for any play you
may have 0, 1, 2, or 3 chips. When you bet a chip on “odd” you
have only an 18/38 chance of winning, because the wheel has 18 odd
numbers, 18 even numbers, plus 0 and 00 which count as neither even
nor odd.
• If you have 0 chips you cannot bet and continue to have 0 chips.
• If you have 1 chip you have probability 9/19 of moving up to 2
chips, probability 10/19 of moving down to 0 chips.
• If you have 2 chips you have probability 9/19 of moving up to 3
chips, probability 10/19 of moving down to 1 chip.
• If you have 3 chips you declare victory, do not bet, and continue
to have 3 chips.
Create the 4 × 4 matrix that represents the effect of one play. Assume
that before the first play you are certain to have 2 chips. Use matrix
multiplication to determine the probability of your having 0, 1, 2, or 3
chips after 1, 2, 4 and 8 plays. Make a conjecture about the situation
after a very large number of plays.
(c) If you include in your R script the line
source("1.1L-LinearMystery.R")
it will define a function fMyst : R² → R² that is linear and invertible. Every time you execute the source() line the function changes!
Write an R script that shows how to construct the matrix F for this function
• by evaluating fMyst on the standard basis vectors.
• by evaluating fMyst only on the vectors \begin{pmatrix} 1 \\ 1 \end{pmatrix} and \begin{pmatrix} 1 \\ -1 \end{pmatrix}.
• by evaluating fMyst only on the vectors \begin{pmatrix} 6 \\ 2 \end{pmatrix} and \begin{pmatrix} 2 \\ 1 \end{pmatrix}.
This script can solve problem a in set 2 on the preceding page!

4 Homework
(PROBLEM SET 1 - due on Tuesday, September 9 by 11:59 PM)
Problems 1-7 should be done on paper and placed in the locked box near
Science Center 209 that has the name of your Monday section instructor on it.
Problems 8 and 9 should be done in a single R script and uploaded to the
dropbox on the Week 1 page of the course Web site.

1. Prove the following, using only the field axioms and the results of group
problem 1(b).

(a) The multiplicative inverse a−1 of a nonzero element a of a field is


unique.
(b) (−a)(−b) = ab.

2. Function composition
( Hubbard, exercise 0.4.10.)
Prove the following:

(a) Let the functions f : B → C and g : A → B be onto. Then the


composition (f ◦ g) is onto.
(b) Let the functions f : B → C and g : A → B be one-to-one. Then the
composition (f ◦ g) is one-to-one.

This problem asks you to prove two results that we will use again and again.
All you need to do is to use the definitions of “one-to-one” and “onto.”
Here are some strategies that may be helpful:

• Exploit the definition:


If you are told that f (x) is onto, then, for any y in the codomain Y ,
you can assert the existence of an x such that f (x) = y.
If you are told that f (x) is one-to-one, then, for any a and b such that
f (a) = f (b), you can assert that a = b.
• Construct what the definition requires by a procedure that cannot fail:
To prove that h(x) is onto, describe a procedure for constructing an x
such that h(x) = y. The proof consists in showing that this procedure
works for all y in the codomain Y .
• Prove uniqueness by introducing two names for the same thing:
To prove that h(x) is one-to-one, give two different names to the same
thing: assume that h(a) = h(b), and prove that a = b.

3. Hubbard, exercise 1.2.2, parts (a) and (e) only. Do part (a) in the field
R, and do part (e) in the field Z7 , where -1 is the same as 6. Check your
answer in (e) by doing the calculation in two different orders: according to
the associative law these should give the same answer. See Hubbard, figure
1.2.5, for a nice way to organize the calculation.

4. (a) Prove theorem 1.2.17 in Hubbard: that the transpose of a matrix


product is the product of the matrices in the opposite order: (AB)^T = B^T A^T .
   
(b) Let A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}, B = \begin{pmatrix} 2 & -1 \\ -1 & 3 \end{pmatrix}. Calculate AB. Then, using the
theorem you just proved, write down the matrix BA without doing any
matrix multiplication. (Notice that A and B are symmetric matrices.)
(c) Prove that if A is any matrix, then AT A is symmetric.

5. (a) Here is a matrix whose entries are in the finite field Z5 .

A = \begin{pmatrix} [1]_5 & [2]_5 \\ [3]_5 & [3]_5 \end{pmatrix}
Write down the inverse of A, using the names [0]5 · · · [4]5 for the entries
in the matrix. Check your answer by matrix multiplication.
(b) Count the number of different 2 × 2 matrices with entries in the finite
field Z5 . Of these, how many are invertible? Hint: for invertibility, the
left column cannot be zero, and the right column cannot be a multiple
of the left column.

6. (a) Hubbard, Exercise 1.3.19, which reads:


“If A and B are n × n matrices, their Jordan product is (AB + BA)/2. Show that this product is commutative but not associative.”
Since this problem has an odd number, it is solved in the solutions
manual for the textbook. If you want to consult this manual, OK, but
remember to cite your source!
(b) Denote the Jordan product of A and B by A∗B. Prove that it satisfies
the distributive law A ∗ (B + C) = A ∗ B + A ∗ C.
(c) Prove that the Jordan product satisfies the special associative law
A ∗ (B ∗ A²) = (A ∗ B) ∗ A².

       
7. (a) Suppose that T is linear and that T(\begin{pmatrix} 3 \\ 2 \end{pmatrix}) = \begin{pmatrix} 6 \\ 8 \end{pmatrix}, T(\begin{pmatrix} 2 \\ 1 \end{pmatrix}) = \begin{pmatrix} 5 \\ 5 \end{pmatrix}.
Use the linearity of T to determine T(\begin{pmatrix} 1 \\ 0 \end{pmatrix}) and T(\begin{pmatrix} 0 \\ 1 \end{pmatrix}), and thereby determine the matrix [T ] that represents T . (This brute-force approach works fine in the 2 × 2 case but not in the n × n case.)
(b) Express the given information about T from part (a) in the form
[T ][A] = [B], and determine the matrix [T ] that represents T by using
the matrix [A]−1 . (This approach will work in the general case once
you know how to invert an n × n matrix .)
The last two problems require R scripts. It is fine to copy and edit similar
scripts from the course Web site, but it is unacceptable to copy and edit
your classmates’ scripts!

8. (similar to script 1.1C, topic 5)


Let ~v1 and ~v2 denote the columns of a 2 × 2 matrix M . Write an R script that draws a diagram to illustrate the rule for the sign of det M , namely

• If you have to rotate ~v1 counterclockwise (through less than 180°) to make it line up with ~v2, then det M > 0.
• If you have to rotate ~v1 clockwise (through less than 180°) to make it line up with ~v2, then det M < 0.
• If ~v1 and ~v2 lie on the same line through the origin, then det M = 0.

9. (similar to script 1.1D, topic 2)


Busch Gardens proposes to open a theme park in Beijing, with four regions
connected by monorail. From region 1 (the Middle Kingdom), a guest can ride on a two-way monorail to region 2 (Tibet), region 3 (Shanghai), or region 4 (Hunan) and back. Regions 2, 3, and 4 are connected by a one-way monorail that goes from 2 to 3 to 4 and back to 2.

(a) Draw a diagram to show the four regions and their monorail connec-
tions.
(b) Construct the 4 × 4 transition matrix A for this graph of four vertices.
(c) Using matrix multiplication in R, determine how many different se-
quences of four monorail rides start in Tibet and end in the Middle
Kingdom.

MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #1, Week 2 (Dot and Cross Products, Euclidean Geometry of Rn )

Authors: Paul Bamberg and Kate Penner


R scripts by Paul Bamberg
Last modified: June 16, 2015 by Paul Bamberg

Reading

• Hubbard, section 1.4

Proofs to present in section or to a classmate who has done them.

• 2.1 Given vectors ~v and ~w in Euclidean Rn , prove that |~v · ~w| ≤ |~v||~w| (Cauchy-Schwarz) and that |~v + ~w| ≤ |~v| + |~w| (triangle inequality). Use the distributive law for the scalar product and the fact that no vector has negative length.
(The standard version of this proof is in the textbook. An alternative is in sections 1.3 and 1.4 of the Executive Summary.)

• 2.2 For a 3 × 3 matrix A, define det(A) in terms of the cross and dot
products of the columns of the matrix. Then, using the definition of matrix
multiplication and the linearity of the dot and cross products, prove that
det(AB) = det(A) det(B).

R Scripts
Scripts labeled A, B, ... are closely tied to the Executive Summary. Scripts labeled X, Y, ... are interesting examples. There is a narrated version on the Web site. Scripts labeled L are library scripts that you may wish to include in your own scripts.

• Script 1.2A-LengthDotAngle.R
Topic 1 - Length, Dot Product, Angles
Topic 2 - Components of a vector
Topic 3 - Angles in Pythagorean triangles
Topic 4 - Vector calculation using components

• Script 1.2B-RotateReflect.R
Topic 1 - Rotation matrices
Topic 2 - Reflection matrices

• Script 1.2C-ComplexConformal.R
Topic 1 - Complex numbers in R
Topic 2 - Representing complex numbers by 2x2 matrices

• Script 1.2D-CrossProduct.R
Topic 1 - Algebraic properties of the cross product
Topic 2 - Geometric properties of the cross product
Topic 3 - Using cross products to invert a 3x3 matrix

• Script 1.2E-DeterminantProduct.R
Topic 1 - Product of 2x2 matrices
Topic 2 - Product of 3x3 matrices

• Script 1.2L-VectorLibrary.R
Topic 1 - Some useful angles and basis vectors
Topic 2 - Functions for working with angles in degrees

• Script 1.2X-Triangle.R
Topic 1 - Generating and displaying a randomly generated triangle
Topic 2 - Checking some formulas of trigonometry

• Script 1.2Y-Angles3D.R
Topic 1 - Angles between vectors in R3
Topic 2 - Angles and distances in a cube
Topic 3 - Calculating the airline mileage between cities

1 Executive Summary
1.1 The dot product
The dot product of two vectors in Rn is ~x · ~y = x1 y1 + x2 y2 + . . . + xn yn = \sum_{i=1}^{n} x_i y_i
• It requires two vectors and returns a scalar.
• It is commutative and it is distributive with respect to addition.
• In R2 or R3 , the dot product of a vector with itself (a concept of algebra)
is equal to the square of its length (a concept of geometry):
~x · ~x = |~x|²

• Taking the dot product with any standard basis vector e~i extracts the cor-
responding component:
~x · e~i = xi
• Taking the dot product with any unit vector ~a (not necessarily a basis
vector) extracts the component of ~x along ~a:
~x · ~a = xa
This means that the difference ~x − xa~a is orthogonal to ~a.

1.2 Dot products and angles


We have the law of cosines, usually written c² = a² + b² − 2ab cos α.

[Figure: a triangle with sides along ~x (length a), ~y (length b), and ~x − ~y (length c); α is the angle between ~x and ~y.]

Consider the triangle whose sides lie along the vectors ~x(length a), ~y (length b),
and ~x − ~y (length c). Let α denote the angle between the vectors ~x and ~y.
By the distributive law,
(~x − ~y) · (~x − ~y) = ~x · ~x + ~y · ~y − 2~x · ~y =⇒ c² = a² + b² − 2~x · ~y
Comparing with the law of cosines, we find that angles and dot products are
related by:

~x · ~y = ab cos α = |~x||~y| cos α

1.3 Cauchy-Schwarz inequality
The dot product provides a way to extend the definition of length and angle for
vectors to Rn , but now we can no longer invoke Euclidean plane geometry to
guarantee that | cos α| ≤ 1.
We need to show that for any vectors ~v and ~w in Rn

|~v · ~w| ≤ |~v||~w|

This is generally known as the “Cauchy-Schwarz inequality.”

For a short proof of the Cauchy-Schwarz inequality, make ~v and ~w into unit vectors and form their sum and difference.

\left( \frac{\vec{v}}{|\vec{v}|} \pm \frac{\vec{w}}{|\vec{w}|} \right) \cdot \left( \frac{\vec{v}}{|\vec{v}|} \pm \frac{\vec{w}}{|\vec{w}|} \right) \geq 0

1 + 1 \pm 2\,\frac{\vec{v} \cdot \vec{w}}{|\vec{v}||\vec{w}|} \geq 0, and by algebra \left| \frac{\vec{v} \cdot \vec{w}}{|\vec{v}||\vec{w}|} \right| \leq 1
We now have a useful definition of angle for vectors in Rn in general:

α = \arccos \frac{\vec{v} \cdot \vec{w}}{|\vec{v}||\vec{w}|}

1.4 The triangle inequality


If ~x and ~y, placed head-to-tail, determine two sides of a triangle, the third side
coincides with the vector ~x + ~y.

[Figure: ~x and ~y placed head-to-tail; the third side of the triangle is ~x + ~y.]

We need to show that its length cannot exceed the sum of the lengths of the
other two sides:

|~x + ~y| ≤ |~x| + |~y|


The proof uses the distributive law for the dot product.

|~x + ~y|² = (~x + ~y) · (~x + ~y) = (~x + ~y) · ~x + (~x + ~y) · ~y

Applying Cauchy-Schwarz to each term on the right-hand side, we have:

|~x + ~y|² ≤ |~x + ~y||~x| + |~x + ~y||~y|


In the special case where |~x + ~y| = 0 the inequality is clearly true. Otherwise
we can divide by the common factor of |~x + ~y| to complete the proof.

1.5 Isometries of R2
A linear transformation T : R2 → R2 is completely specified by its effect on the
basis vectors ~e1 and ~e2 . These vectors are the two columns of the matrix that
represents T . If you know what a transformation is supposed to do to each basis
vector, you can simply use this information to fill out the necessary columns of
its matrix representation.

Of special interest are isometries: transformations that preserve the distance


between any pair of points, and hence the length of any vector.
Since dot products can be expressed in terms of lengths, it follows that any
isometry also preserves dot products.
So the transformation T is an isometry if and only if for any pair of vectors:

T~a · T ~b = ~a · ~b
For the matrix associated with an isometry, both columns must be unit vectors
and their dot product is zero.
Two isometries:

• A rotation, R(θ) = \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix}, with det R = +1.

• A reflection, F (θ) = \begin{pmatrix} \cos 2θ & \sin 2θ \\ \sin 2θ & -\cos 2θ \end{pmatrix}, with det F = −1.

Matrix R represents a counterclockwise rotation through angle θ about the


origin. Matrix F represents reflection in a line through the origin that makes an
angle θ with the first standard basis vector.
There are many other isometries of Euclidean geometry: translations, or rotations about points other than the origin. However, these do not hold the origin fixed, and so they are not linear transformations and cannot be represented by 2 × 2 matrices.
Since the composition of isometries is an isometry, the product of any number
of matrices of this type is another rotation or reflection. Remember that com-
position is a series of transformations acting on a vector in a specific order that
must be preserved during multiplication.

1.6 Matrices and algebra: complex numbers
The same field axioms we reviewed on the first day apply here to the complex
numbers, notated C.
The real and imaginary parts of a complex number can be used as the two
components of a vector in R2 . The rule for addition of complex numbers is the
same as the rule for addition of vectors in R2 (in that they are to be kept separate
from each other), and the modulus of a complex number is the same as the length
of the vector that represents it. So the triangle inequality applies for complex
numbers: |z1 + z2 | ≤ |z1 | + |z2 |.
This property extends to vector spaces over complex numbers.

1.7 What about complex multiplication?


The geometrical interpretation of multiplication by a complex number
z = a + ib = reiθ is multiplication of the modulus by r combined with addition
of θ to the angle with the x-axis.
This is precisely the geometrical effect of the linear transformation represented by the matrix

\begin{pmatrix} a & -b \\ b & a \end{pmatrix} = \begin{pmatrix} r\cos θ & -r\sin θ \\ r\sin θ & r\cos θ \end{pmatrix}

Such a matrix is the product of the constant matrix \begin{pmatrix} r & 0 \\ 0 & r \end{pmatrix} and the rotation matrix \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix}.
It is called a conformal matrix and it preserves angles even though it does not
preserve lengths.

1.8 Complex numbers as a field of matrices


In general, matrices do not form a field because multiplication is not commutative. There are two notable exceptions: n × n matrices that are multiples of the identity matrix and 2 × 2 conformal matrices. Since multiples of the identity matrix and rotations all commute, the product of two conformal matrices \begin{pmatrix} a & -b \\ b & a \end{pmatrix} and \begin{pmatrix} c & -d \\ d & c \end{pmatrix} is the same in either order.

1.9 The cross product

\vec{a} \times \vec{b} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \times \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{pmatrix}
Properties

1. ~a × ~b = −~b × ~a.

2. ~a × ~a = 0.

3. For fixed ~a, ~a × ~b is a linear function of ~b, and vice versa.

4. For the standard basis vectors, e~i × e~j = e~k if i, j and k are in cyclic
increasing order (123, 231, or 312). Otherwise e~i × e~j = −e~k .

5. ~a × ~b · ~c = ~a · ~b × ~c. This quantity is also the determinant of the matrix


whose columns are ~a, ~b, and ~c.

6. (~a × ~b) × ~c = (~a · ~c)~b − (~b · ~c)~a

7. ~a × ~b is orthogonal to the plane spanned by ~a and ~b.

8. |~a × ~b|² = |~a|²|~b|² − (~a · ~b)²

9. The length of ~a × ~b is |~a||~b| sin α.

10. The length of ~a × ~b is equal to the area of the parallelogram spanned by ~a


and ~b.

1.10 Cross product and determinants


If a 3 × 3 matrix A has columns ~a1 , ~a2 , and ~a3 , then its determinant det(A) =
~a1 × ~a2 · ~a3 .

1. det(A) changes sign if you interchange any two columns. (easiest to prove
for columns 1 and 2, but true for any pair)

2. det(A) is a linear function of each column (easiest to prove for column 3,


but true for any column)

3. For the identity matrix I, det(I) = 1.

The magnitude of ~a × ~b · ~c is equal to the volume of the parallelepiped spanned


by ~a, ~b and ~c.
If C = AB, then det(C) = det(A) det(B)

2 Lecture Outline
1. Introducing coordinates:

For three-dimensional geometry, we choose aspecific


 point, “the origin”,
0
to correspond to the element of R3 , O = 0 . We also choose three
0
orthogonal, oriented, coordinate axes and a unit of length, which determine
the standard basis vectors. These are a “right-handed” basis: if you hold
your right hand so that the thumb points along ~e3 , then the fingers of your
right hand carry ~e1 into ~e2 the “short way around,” through 90 rather than
270 degrees. Now any point pa of Euclidean geometry can be represented
by a vector in R3 ,
 
a1
~a = a1~e1 + a2~e2 + a3~e3 = a2  .

a3
p
The length of ~a is a21 + a22 + a23 . All the basis vectors have unit length.
For two-dimensional geometry, there are two alternatives. The simpler is to make the origin correspond to an element O = \begin{pmatrix} 0 \\ 0 \end{pmatrix} of R², and to choose two coordinate axes. Then a point pa of Euclidean plane geometry can be represented by a vector in R²,
~a = a1 e~1 + a2 e~2 = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}. The length of ~a is \sqrt{a_1^2 + a_2^2}.
Another way to do plane geometry is to use the plane x3 = 1. This is not a subspace of R³, since it does not include the zero vector. The origin of the plane corresponds to a non-zero element p0 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} of R³, and an arbitrary point of the plane is the element pa = \begin{pmatrix} a_1 \\ a_2 \\ 1 \end{pmatrix}. Two points determine a vector, whose third component is always 0. The length of the vector determined by pa and p0 is \sqrt{a_1^2 + a_2^2}. Now any transformation of Euclidean plane geometry that preserves distance, even one like a translation that moves the origin, can be represented by a linear transformation of R³. However, only a transformation A that carries the plane x3 = 1 into itself has geometrical significance. What does this imply about the bottom row of the matrix A?

2. The dot product:

This is defined for vectors in Rn as

~x · ~y = x1 y1 + x2 y2 + · · · + xn yn
It has the following properties. The proof of the first four (omitted) is
brute-force computation.

• Commutative law:
~x · ~y = ~y · ~x
• Distributive law:

~x · (y~1 + y~2 ) = ~x · y~1 + ~x · y~2

• For Euclidean geometry, in R2 or R3 , the dot product of a vector


with itself (defined by algebra) is equal to the square of its length (a
physically meaningful quantity).
• Taking the dot product with any standard basis vector e~i extracts the
corresponding component:

~x · e~i = xi
• Taking the dot product with any unit vector ~a (not necessarily a basis
vector) extracts the component of ~x along ~a:

~x · ~a = xa
This means that the difference ~x − xa~a is orthogonal to ~a.
Proof: Orthogonality of two vectors means that their dot product is
zero. So to show orthogonality, evaluate

(~x − (~x · ~a)~a) · ~a.

3. Dot products and angles
From elementary trigonometry we have the law of cosines, usually written
c² = a² + b² − 2ab cos α.
In this formula, c denotes the length of the side opposite angle α. Just in
case you forgot the proof, let’s review it.

Angles and dot products are related by the formula


~x · ~y = |~x||~y| cos α
Proof (Hubbard, page 69):
Consider the triangle whose sides lie along the vectors ~x, ~y, and ~x − ~y, and
let α denote the angle between the vectors ~x and ~y.

c² = (~x − ~y) · (~x − ~y).

Expand the dot product using the distributive law, and you can identify
one of the terms as 2ab cos α.

4. Cauchy-Schwarz inequality
The dot product provides a way to extend the definition of length and
angle for vectors to Rn , but now we can no longer invoke Euclidean plane
geometry to guarantee that | cos α| ≤ 1.
We need to show that for any vectors ~v and ~w in Rn ,

|~v · ~w| ≤ |~v||~w|

This is generally known as the “Cauchy-Schwarz inequality.” Hubbard


points out that it was first published by Bunyakovsky. This fact illustrates
Stigler’s Law of Eponymy:
“No law, theorem, or discovery is named after its originator.”
The law applies to itself, since long before Stigler formulated it, A. N.
Whitehead noted that,
“Everything of importance has been said before, by someone who did not
discover it.”
The best-known proof of the Cauchy-Schwarz inequality incorporates two
useful strategies.

• No vector has negative length.


• Discriminant of quadratic equation.

Define a quadratic function of the real variable t by

f (t) = |t~v − ~w|² = (t~v − ~w) · (t~v − ~w)

Since f (t) is the square of a length of a vector, it cannot be negative, so


the quadratic equation f (t) = 0 does not have two real roots.
But by the quadratic formula, if the equation at² + bt + c = 0 does not have two real roots, its discriminant b² − 4ac is not positive.
Complete the proof by writing out b² − 4ac ≤ 0 for quadratic function f (t).

So we have a useful definition of angle for vectors in Rn in general:

α = \arccos \frac{\vec{v} \cdot \vec{w}}{|\vec{v}||\vec{w}|}

The function arccos(x) can be computed on your electronic calculator by


summing an infinite series. It is guaranteed to return a value between 0
and π.

Example: In R⁴, what is the angle between vectors \begin{pmatrix} 1 \\ 2 \\ 1 \\ 0 \end{pmatrix} and \begin{pmatrix} 0 \\ 1 \\ 1 \\ 2 \end{pmatrix}?
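For reference, this kind of computation can be sketched in a couple of lines of R (acos returns radians):

v <- c(1, 2, 1, 0); w <- c(0, 1, 1, 2)
acos(sum(v * w) / (sqrt(sum(v^2)) * sqrt(sum(w^2))))   # the angle in radians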

5. The triangle inequality (second part of proof 2.1)


If ~x and ~y, placed head-to-tail, determine two sides of a triangle, the third
side coincides with the vector ~x +~y. We need to show that its length cannot
exceed the sum of the lengths of the other two sides:
|~x + ~y| ≤ |~x| + |~y|
The proof uses the distributive law for the dot product and the Cauchy-
Schwarz inequality.
Express |~x + ~y|2 as a dot product:

Apply the distributive law:

Use Cauchy-Schwarz to get an inequality for lengths:

Take the square root of both sides:

6. Proof 2.1 – start to finish, done in a slightly different way
Given vectors ~v and ~w in Euclidean Rn , prove that |~v · ~w| ≤ |~v||~w| (Cauchy-Schwarz) and that |~v + ~w| ≤ |~v| + |~w| (triangle inequality). Use the distributive law for the scalar product and the fact that no vector has negative length.

7. Some short proofs that use the dot product:

(a) A triangle is formed by using vectors ~x and ~y, both anchored at one
vertex. The vectors are labeled so that the longer one is called ~x: i.e.
|~x| > |~y|. The vector ~x −~y then lies along the third side of the triangle.
Prove that
|~x − ~y| ≥ |~x| − |~y|.

[Figure: the triangle with ~x and ~y from one vertex and ~x − ~y along the third side.]

(b) Prove that the dot product of vectors ~x and ~y can be expressed solely
in terms of lengths of vectors. It follows that an isometry, which by
definition preserves lengths of all vectors, also preserves dot products
and angles.

(c) A parallelogram has sides with lengths a and b. Its diagonals have lengths c and d. Prove the “parallelogram law,” which states that

c² + d² = 2(a² + b²).

8. Calculating angles and areas
   
Let ~v1 = \begin{pmatrix} -2 \\ 2 \\ -1 \end{pmatrix}, ~v2 = \begin{pmatrix} -4 \\ 1 \\ 1 \end{pmatrix}.

Both these vectors happen to be perpendicular to the vector ~v3 = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}.

(a) Determine the angle between ~v1 and ~v2 .


(b) Determine the volume of the parallelepiped spanned by ~v1 , ~v2 , and ~v3 ,
and thereby determine the area of the parallelogram spanned by ~v1 and ~v2 .

9. Isometries of R2 .
A linear transformation T : R2 → R2 is completely specified by its effect
on the basis vectors ~e1 and ~e2 . These vectors are the two columns of the
matrix that represents T .
Of special interest are “isometries:” transformations that preserve the dis-
tance between any pair of points, and hence the length of any vector.
Since
4~a · ~b = |~a + ~b|² − |~a − ~b|²,

dot products can be expressed in terms of lengths, and any isometry also
preserves dot products.
Prove this useful identity.

So T is an isometry if and only if


T~a · T ~b = ~a · ~b for any pair of vectors.
This means that the first column of T must be a unit vector, which can be written without any loss of generality as \begin{pmatrix} \cos θ \\ \sin θ \end{pmatrix}.
The second column must also be a unit vector, and its dot product with the first column must be zero. So there are only two possibilities:

• A rotation,
R(θ) = \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix},
which has det R = 1.
• A reflection,
F (θ) = \begin{pmatrix} \cos 2θ & \sin 2θ \\ \sin 2θ & -\cos 2θ \end{pmatrix},
which has det F = −1.
This represents reflection in a line through the origin that makes an angle θ with the first basis vector.

Since the composition of isometries is an isometry, the product of any num-


ber of matrices of this type is another rotation or reflection.

10. Using matrices to represent rotations and reflections

(a) Use matrix multiplication to show that if a counterclockwise rotation through angle β is followed by a counterclockwise rotation through angle α, the net effect is a counterclockwise rotation through angle α + β. (The proof requires some trig identities that you can rederive, if you ever forget them, by doing this calculation.)
(b) Confirm, both by geometry and by matrix multiplication, that if you
reflect a point P first in the line y = 0, then in the line y = x, the net
effect is to rotate the point counterclockwise through 90◦ .
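A numerical spot check of (b) in R, using the rotation and reflection matrices from the outline above:

Rot <- function(theta) matrix(c(cos(theta), sin(theta),
                                -sin(theta), cos(theta)), nrow = 2)
Ref <- function(theta) matrix(c(cos(2*theta), sin(2*theta),
                                sin(2*theta), -cos(2*theta)), nrow = 2)
Ref(pi/4) %*% Ref(0)    # reflect in y = 0 first, then in y = x
Rot(pi/2)               # the same matrix: a 90-degree rotation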

11. Complex numbers as vectors and as matrices
The field axioms that you learned on the first day apply also to the complex
numbers, notated C.
The real and imaginary parts of a complex number can be used as the two
components of a vector in R2 . The rule for addition of complex numbers
is the same as the rule for addition of vectors in R2 , and the modulus of a
complex number is the same as the length of the vector that represents it.
So the triangle inequality applies for complex numbers: |z1 +z2 | ≤ |z1 |+|z2 |.
This property extends to vector spaces over complex numbers.
The geometrical interpretation of multiplication by a complex number
z = a + ib = reiθ is multiplication of the modulus by r combined with
addition of θ to the angle with the x-axis.
This is precisely the geometrical effect of the linear transformation represented by the matrix

\begin{pmatrix} a & -b \\ b & a \end{pmatrix} = \begin{pmatrix} r\cos θ & -r\sin θ \\ r\sin θ & r\cos θ \end{pmatrix}

Such a matrix is the product of the constant matrix \begin{pmatrix} r & 0 \\ 0 & r \end{pmatrix} and the rotation matrix \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix}.
It is called a conformal matrix and it preserves angles even though it
does not preserve lengths.
Example: Compute the product of the complex numbers 2 + i and 3 + i by using matrix multiplication.
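A sketch in R comparing the conformal-matrix product with R's built-in complex arithmetic (assuming the intended second factor is 3 + i):

toMat <- function(z) matrix(c(Re(z), Im(z), -Im(z), Re(z)), nrow = 2)
toMat(2 + 1i) %*% toMat(3 + 1i)   # the conformal matrix representing the product
(2 + 1i) * (3 + 1i)               # built-in check: 5 + 5i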

12. Complex numbers as a field of matrices
In general, matrices do not form a field because multiplication is not commutative. There are two notable exceptions: n × n matrices that are multiples of the identity matrix and 2 × 2 conformal matrices. Since multiples of the identity matrix and rotations all commute, the product of two conformal matrices \begin{pmatrix} a & -b \\ b & a \end{pmatrix} and \begin{pmatrix} c & -d \\ d & c \end{pmatrix} is the same in either order.

13. Cross products:

At this point it is inappropriate to try to define the determinant of an n × n


matrix. For n = 3, however, anything that can be done with determinants
can also be done with cross products, which are peculiar to R3 . So we will
start with cross products:
Definition:

\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \times \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{pmatrix}

(A short R sketch of this definition appears after the list of properties below.)

Since this is a computational definition, the way to prove the following


properties is by brute-force computation.

(a) ~a × ~b = −~b × ~a.


(b) ~a × ~a = ~0.
(c) For fixed ~a, ~a × ~b is a linear function of ~b, and vice versa.
(d) For the standard basis vectors, e~i × e~j = e~k if i, j and k are in cyclic
increasing order (123, 231, or 312). Otherwise e~i × e~j = −e~k .
You may find it easiest to calculate cross products in general as
(a1 e~1 + a2 e~2 + a3 e~3 ) × (b1 e~1 + b2 e~2 + b3 e~3 ),
using the formula for the cross products of basis vectors. Try this
approach for

~a = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, ~b = \begin{pmatrix} 0 \\ 1 \\ 3 \end{pmatrix}.

(e) ~a×~b·~c = ~a·~b×~c. No parentheses are necessary, because the operations


only make sense if the cross product is done first. This quantity is also
the determinant of the matrix whose columns are ~a, ~b, and ~c.
(f) (~a × ~b) × ~c = (~a · ~c)~b − (~b · ~c)~a
Physicists, memorize this formula ! The vector in the middle gets the
plus sign.
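Here is the R sketch promised above: base R has no built-in cross product, so define one from the componentwise formula.

cross <- function(a, b) {
  c(a[2]*b[3] - a[3]*b[2],
    a[3]*b[1] - a[1]*b[3],
    a[1]*b[2] - a[2]*b[1])
}
a <- c(2, 1, 0); b <- c(0, 1, 3)
cross(a, b)             # the example above: (3, -6, 2)
sum(cross(a, b) * a)    # orthogonal to a: 0
sum(cross(a, b) * b)    # orthogonal to b: 0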

14. Geometric properties of the cross product:

We can now prove these without messy calculations involving components.
Justify each step, using properties of the dot product and properties (a)
through (f ) from the preceding page.

• ~a × ~b is orthogonal to the plane spanned by ~a and ~b.


Proof: Let ~v = s~a + t~b be a vector in this plane. Then

~v · ~a × ~b = s~a · ~a × ~b + t~b · ~a × ~b

~v · ~a × ~b = s~a · ~a × ~b − t~b · ~b × ~a

~v · ~a × ~b = s~a × ~a · ~b − t~b × ~b · ~a

~v · ~a × ~b = 0 − 0 = 0.
• |~a × ~b|² = |~a|²|~b|² − (~a · ~b)²
Proof:
|~a × ~b|² = (~a × ~b) · (~a × ~b)
|~a × ~b|² = ((~a × ~b) × ~a) · ~b
|~a × ~b|² = ((~a · ~a)~b − (~a · ~b)~a) · ~b
|~a × ~b|² = (~a · ~a)(~b · ~b) − (~a · ~b)(~a · ~b)
|~a × ~b|² = |~a|²|~b|² − (~a · ~b)²

• The length of ~a × ~b is |~a||~b| sin α, where α is the angle between ~a and ~b.


Proof:
|~a × ~b|² = |~a|²|~b|²(1 − cos²α) = |~a|²|~b|² sin²α

• The length of ~a × ~b is equal to the area of the parallelogram spanned


by ~a and ~b.
Proof: |~a| is the base of the parallelogram and |~b| sin α is its height.
Draw a diagram to illustrate this property.

15. Cross products and determinants.

You should be familiar with 2 × 2 and 3 × 3 determinants from high-school


algebra. The general definition of the determinant, to be introduced in the
spring term, underlies the general technique for calculating volumes in Rn
and will be used to define differential forms.
   
If a 2 × 2 matrix A has columns $\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$ and $\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$, then its determinant
det(A) = a1 b2 − a2 b1 .
Equivalently,
$$\begin{pmatrix} a_1 \\ a_2 \\ 0 \end{pmatrix} \times \begin{pmatrix} b_1 \\ b_2 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \det\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} \end{pmatrix}$$

You can think of the determinant as a function of the entire matrix A or


as a function of its two columns.
Matrix A maps the unit square, spanned by the two standard basis vectors,
into a parallelogram whose area is | det(A)|.
Let’s prove this for the case where all the entries of A are positive and
det(A) > 0. The area of the parallelogram formed by the columns of A is
twice the area of the triangle that has these columns as two of its sides.
The area of this triangle can be calculated in terms of elementary formulas
for areas of rectangles and right triangles.
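As a numeric illustration (a sketch in R with an arbitrarily chosen positive matrix):

  A <- matrix(c(3, 1, 1, 2), nrow = 2)    # columns (3,1) and (1,2)
  lens <- sqrt(colSums(A^2))              # lengths of the two columns
  cosang <- sum(A[, 1] * A[, 2]) / (lens[1] * lens[2])
  lens[1] * lens[2] * sqrt(1 - cosang^2)  # base times height: 5
  det(A)                                  # 3*2 - 1*1 = 5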

16. Determinants in R3
Here is our definition:
If a 3 × 3 matrix A has columns ~a1 , ~a2 , and ~a3 , then its determinant
det(A) = ~a1 × ~a2 · ~a3 .
Apply this definition to the matrix
$$A = \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 2 \\ 0 & 1 & 0 \end{pmatrix}.$$
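To check your hand computation, here is a quick test in R using the cross() helper from the cross-product section:

  A <- matrix(c(1, 2, 0,  0, 1, 1,  1, 2, 0), nrow = 3)  # the matrix above, by columns
  sum(cross(A[, 1], A[, 2]) * A[, 3])  # 0: columns 1 and 3 are equal
  det(A)                               # base R agrees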

Check the following properties of the definition.

(a) det(A) changes sign if you interchange any two columns. (easiest to
prove for columns 1 and 2, but true for any pair)

(b) det(A) is a linear function of each column (easiest to prove for column
3, but true for any column)

(c) For the identity matrix I, det(I) = 1.

17. Determinants, triple products, and geometry
The magnitude of ~a × ~b · ~c is equal to the volume of the parallelepiped
spanned by ~a, ~b and ~c.
Proof: |~a × ~b| is the area of the base of the parallelepiped, and |~c| cos α,
where α is the angle between ~c and the direction orthogonal to the base, is
its height.

Matrix A maps the unit cube, spanned by the three basis vectors, into a
parallelepiped whose volume is | det(A)|. You can think of | det(A)| as a
“volume stretching factor.” This interpretation will underlie much of the
theory for change of variables in multiple integrals, a major topic in the
spring term.
If three vectors in R3 all lie in the same plane, the cross product of any
two of them, which is orthogonal to that plane, is orthogonal to the third
vector, so ~v1 × ~v2 · ~v3 = 0.
Apply this test to $\vec v_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \ \vec v_2 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, \ \vec v_3 = \begin{pmatrix} 3 \\ 2 \\ 2 \end{pmatrix}.$

If four points in R3 all lie in the same plane, the vectors that join any one
of the points to each
 of theother
 three
points
 allliein that plane. Apply
1 2 2 4
this test to p = 1 , q = 1 , r = 3 , s = 3 .
1 2 1 3
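Both tests are easy to run in R with the cross() helper defined earlier (a sketch, not a course script):

  v1 <- c(1, 0, 1); v2 <- c(1, 2, 0); v3 <- c(3, 2, 2)
  sum(cross(v1, v2) * v3)             # 0: the three vectors are coplanar
  p <- c(1, 1, 1); q <- c(2, 1, 2); r <- c(2, 3, 1); s <- c(4, 3, 3)
  sum(cross(q - p, r - p) * (s - p))  # 0: the four points are coplanar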

18. Determinants and matrix multiplication

If C = AB, then det(C) = det(A) det(B)


This useful result is easily proved by brute force for 2 × 2 matrices, and a
brute-force proof in Mathematica would be valid for 3 × 3 matrices. Here
is a proof that relies on properties of the cross product.
Recall that each column of a matrix is the image of a standard basis vector.
Consider the first column of the matrix C = AB, and exploit the fact that
A is linear.

$$\vec c_1 = A\vec b_1 = A\Big(\sum_{i=1}^{3} b_{i,1}\vec e_i\Big) = \sum_{i=1}^{3} b_{i,1}\,A(\vec e_i) = \sum_{i=1}^{3} b_{i,1}\vec a_i.$$

The same is true of the second and third columns.


Now consider det C = c~1 × c~2 · c~3 .

$$\det C = \Big(\sum_{i=1}^{3} b_{i,1}\vec a_i\Big) \times \Big(\sum_{j=1}^{3} b_{j,2}\vec a_j\Big) \cdot \Big(\sum_{k=1}^{3} b_{k,3}\vec a_k\Big)$$

Now use the distributive law for dot and cross products.

$$\det C = \sum_{i=1}^{3}\sum_{j=1}^{3}\sum_{k=1}^{3} b_{i,1}\, b_{j,2}\, b_{k,3}\,(\vec a_i \times \vec a_j \cdot \vec a_k)$$

There are 27 terms in this sum, but all but six of them involve two subscripts
that are equal, and these are zero because a triple product with two equal
vectors is zero.
The six that are not zero all involve ~a1 × ~a2 · ~a3 , three with a plus sign and
three with a minus sign. So
det C = f (B)(~a1 × ~a2 · ~a3 ) = f (B) det(A), where f (B) is some messy
function of products of all the entries of B.
This formula is valid for any A. In particular, it is valid when A is the
identity matrix, C = B, and det(A) = 1.
So det B = f (B) det(I) = f (B)
and the messy function is the determinant!
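A random spot check of the theorem in R (base R's det() stands in for our cross-product definition):

  set.seed(1)
  A <- matrix(rnorm(9), 3, 3); B <- matrix(rnorm(9), 3, 3)
  all.equal(det(A %*% B), det(A) * det(B))  # TRUE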

19. Proof 2.2 – start to finish
For a 3 × 3 matrix A, define det(A) in terms of the cross and dot prod-
ucts of the columns of the matrix. Then, using the definition of matrix
multiplication and the linearity of the dot and cross products, prove that
det(AB) = det(A) det(B).

20. Isometries of R2 .
A linear transformation T : R2 → R2 is completely specified by its effect
on the basis vectors ~e1 and ~e2 . These vectors are the two columns of the
matrix that represents T .
Of special interest are “isometries:” transformations that preserve the dis-
tance between any pair of points, and hence the length of any vector.
Since
4~a · ~b = |~a + ~b|² − |~a − ~b|²,

dot products can be expressed in terms of lengths, and any isometry also
preserves dot products.
Prove this useful identity.

So T is an isometry if and only if


T~a · T ~b = ~a · ~b for any pair of vectors.
This means that the first column of T must be a unit vector, which can be
written without any loss of generality as
$$\begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}.$$
The second column must also be a unit vector, and its dot product with
the first column must be zero. So there are only two possibilities:

• A rotation,
$$R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},$$
which has det R = 1.
• A reflection,
$$F(\theta) = \begin{pmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{pmatrix},$$
which has det F = −1.
This represents reflection in a line through the origin that makes an
angle θ with the first basis vector.

Since the composition of isometries is an isometry, the product of any num-


ber of matrices of this type is another rotation or reflection.
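A numeric illustration in R (the helper names Rot and Ref are our own; they reappear below):

  Rot <- function(th) matrix(c(cos(th), sin(th), -sin(th), cos(th)), 2, 2)
  Ref <- function(th) matrix(c(cos(2*th), sin(2*th), sin(2*th), -cos(2*th)), 2, 2)
  # The product of two reflections is a rotation:
  all.equal(Ref(pi/3) %*% Ref(pi/7), Rot(2 * (pi/3 - pi/7)))  # TRUE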

21. Calculations with cross products

(a) Prove the identity

|~a × ~b|² = |~a|²|~b|² − (~a · ~b)²

(b) Prove that

|~a × ~b| = |~a||~b| sin α,

where α is the angle between vectors ~a and ~b.

22. Transposes and dot products
Start by proving in general that (AB)ᵀ = BᵀAᵀ. This is a statement about
matrices, and you have to prove it by brute force.

The dot product of vectors ~v and ~w can also be written in terms of matrix multiplication as
~v · ~w = ~vᵀ~w,
where we think of ~vᵀ as a 1 × m matrix and think of ~w as an m × 1 matrix.
The product is a 1 × 1 matrix, so it equals its own transpose.
Prove that ~v · A~w = Aᵀ~v · ~w. This theorem lets you move a matrix from one factor in a dot product to the other, as long as you replace it by its transpose.
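A numeric spot check in R of the theorem to be proved (a sketch with random data):

  set.seed(2)
  A <- matrix(rnorm(9), 3, 3); v <- rnorm(3); w <- rnorm(3)
  all.equal(sum(v * (A %*% w)), sum((t(A) %*% v) * w))  # TRUE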

23. Orthogonal matrices
If a matrix R represents an isometry, then each column is a unit vector and
the columns are orthogonal. Since the columns of R are the rows of Rᵀ, we
can express this property as
RᵀR = I.
Perhaps a nicer way to express this condition for a matrix to represent an
isometry is Rᵀ = R⁻¹. Check that this is true for the 2 × 2 matrices that
represent rotations and reflections.
For a rotation matrix
$$R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$

For a reflection matrix
$$F(\theta) = \begin{pmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{pmatrix}.$$
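Reusing the Rot() and Ref() helpers sketched in the isometries section:

  round(t(Rot(0.7)) %*% Rot(0.7), 10)  # the 2 x 2 identity
  round(t(Ref(0.7)) %*% Ref(0.7), 10)  # the 2 x 2 identity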

24. Isometries and cross products
Many vectors of physical importance (torque, angular momentum, magnetic
field) are defined as cross products, so it is useful to know what happens to
a cross product when an isometry is applied to each vector in the product.
Consider the matrix whose columns are R~u, R~v, and ~w.
Multiply this matrix by Rᵀ to get a matrix whose columns are RᵀR~u, RᵀR~v, and Rᵀ~w. In the process you multiply the determinant by det(Rᵀ) = det(R).
Now, since RᵀR = I for an isometry, ~u × ~v · Rᵀ~w = det(R)(R~u × R~v · ~w).
Equivalently, R(~u × ~v) · ~w = det(R)(R~u × R~v · ~w).
Since this is true for any ~w, in particular for any basis vector, it follows
that
R(~u × ~v) = det(R)R~u × R~v
If R is a rotation, then det(R) = 1 and R(~u × ~v) = R~u × R~v
If R is a reflection, then det(R) = −1 and R(~u × ~v) = −R~u × R~v
This is reasonable. Suppose you are watching a physicist in a mirror as she
calculates the cross product of two vectors. You see her apparently using
a left-hand rule and think that she has got the sign of the cross-product
wrong.
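This is the rule tested in script 1.3C, topic 3 (Week 3). A minimal version in R, using our cross() helper and a reflection in the xy-plane:

  R <- diag(c(1, 1, -1))  # a reflection: det(R) = -1
  u <- c(1, 2, 3); v <- c(4, 5, 6)
  all.equal(as.vector(R %*% cross(u, v)),
            det(R) * cross(as.vector(R %*% u), as.vector(R %*% v)))  # TRUE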

25. Using cross products to invert a 3 × 3 matrix
Thinking about transposes also leads to a formula for the inverse of a 3 × 3
matrix in terms of cross products. Suppose that matrix A has columns
~a1 , ~a2 , and ~a3 . Form the vector ~s1 = ~a2 × ~a3 .
This is orthogonal to ~a2 and ~a3 , and its dot product with ~a1 is det(A).
Similarly, the vector ~s2 = ~a3 × ~a1
is orthogonal to ~a3 and ~a1 , and its dot product with ~a2 is det(A).
Finally, the vector ~s3 = ~a1 × ~a2
is orthogonal to ~a1 and ~a2 , and its dot product with ~a3 is det(A).
So if you form these vectors into a matrix S and take its transpose,
SᵀA = det(A) I.
If det A = 0, A has no inverse. Otherwise
$$A^{-1} = \frac{S^{\mathsf T}}{\det(A)}.$$

You may have learned this rule in high-school algebra in terms of 2 × 2


determinants.
Summarize the proof that this recipe is correct.
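Here is a sketch of the recipe in R (inv3() is our own name; cross() is the helper from earlier):

  inv3 <- function(A) {
    S <- cbind(cross(A[, 2], A[, 3]),   # s1
               cross(A[, 3], A[, 1]),   # s2
               cross(A[, 1], A[, 2]))   # s3
    t(S) / det(A)
  }
  A <- matrix(c(2, 1, 1,  1, -1, 1,  3, 1, 2), nrow = 3)
  all.equal(inv3(A), solve(A))  # TRUE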

3 Group Problems
1. Dot products, angles, and isometries

(a) Making the reasonable assumption that a rotation through angle 2α can
be accomplished by making two successive rotations through angle α,
use matrix multiplication to derive the double-angle formulas for the
sine and cosine functions.
(b) Consider a parallelogram spanned by vectors ~v and w ~ . Using the dot
product, prove that it is a rhombus if and only if the diagonals are
perpendicular and that it is a rectangle if and only if the diagonals are
equal in length.
(c) A parallelogram is spanned by two vectors that meet at a 60 degree
angle, one of which is twice as long as the other. Find the ratio of the
lengths of the diagonals and the cosine of the acute angle between the
diagonals. Confirm that the parallelogram law holds in this case.

2. Proofs that involve cross products

(a) Consider a parallelepiped whose base is a parallelogram spanned by


two unit vectors, anchored at the origin, with a 60 degree angle be-
tween them. The third side leaving the origin, also a unit vector,
makes a 60 degree angle with each of the other two sides, so that each
face is made of a pair of equilateral triangles. Using dot and cross
products, show that the angle α between the third side and a line that
bisects the angle between the other two sides satisfies cos α = 1/√3,
and that the volume of this parallelepiped is 1/√2.
(b) Using the definition det(A) = ~a1 ×~a2 ·~a3 and properties of the dot and
cross products, prove that the determinant of a 3 × 3 matrix changes
sign if you swap the first column with the third column.
(c) Prove that the cross product, although not associative, satisfies the
“Jacobi identity”

(~a × ~b) × ~c + (~b × ~c) × ~a + (~c × ~a) × ~b = 0.

3. Problems that involve writing or editing R scripts

(a) Construct a triangle where vector AB has length 5 and is directed east,
while vector AC has length 10 and is directed 53 degrees north of east.
On side BC, construct point D that is 1/3 of the way from B to C.
Using dot products, confirm that the vector AD bisects the angle at
A.
This is a special case of Euclid’s Elements, Book VI, Proposition 3.
(b) You are playing golf, and the hole is located 350 yards from the tee in
a direction 18 degrees south of east. You hit a tee shot that travels 220
yards 14 degrees south of east, followed by an iron shot that travels
150 yards 23 degrees south of east. How far from the hole is your golf
ball now located?
(c) Generate a triangle using the function in the vector library 1.2L-
VectorLibrary.R, then apply to each vertex of this triangle the con-
formal matrix C that corresponds to the complex number −1.2 + 1.6i.
Plot the triangle before and after C is applied, and confirm that these
triangles are similar but not congruent.

4 Homework
In working on these problems, you may collaborate with classmates and consult
books and general online references. If, however, you encounter a posted solution
to one of the problems, do not look at it, and email Paul, who will try to get it
removed.

1. One way to construct a regular pentagon

[Diagram: five unit segments arranged symmetrically, with labeled points A, B, C, D and apex O]

Take five ball-point pens or other objects of equal length (call it 1) and


arrange them symmetrically, as shown in the diagram above, so that O, A, C
and O, B, D are collinear and |OC| = |OD|. Let AO = ~v, |BO| = |~v|,
CD = ~w, CA = x~v, |DB| = x|~v|.

(a) Express vectors AD and OB in terms of x, ~v, and ~w. By using the


fact that these vectors have the same length 1 as ~v and ~w, get two
equations relating x and ~v · ~w. (Use the distributive law for the dot
product.)
(b) Eliminate x to find a quadratic equation satisfied by ~v · ~w. Show that
the angle α between ~v and ~w satisfies the equation
sin 3α = −sin 2α and that therefore α = 2π/5. (In case you have
forgotten, sin 3α = sin α(4cos²α − 1).)
(c) Explain how, given five identical ball-point pens, you can construct a
regular pentagon. (Amazingly, the obvious generalization with seven
pens lets you construct a regular heptagon. Crockett Johnson claims
to have discovered this fact while dining with friends in a restaurant
in Italy in 1975, using a menu, a wine list, and seven toothpicks)

2. One vertex of a quadrilateral in R3 is located at point p. The other three
vertices, going around in order, are located at q = p + ~a, r = p + ~b, and
s = p + ~c.

(a) Invent an expression involving cross products that is equal to zero if


and only if the four vertices of the quadrilateral lie in a plane. (See
section problem 2 for a special case).
(b) Prove that the midpoints of the four sides pq, qr, rs, and sp are the
vertices of a parallelogram.

3. Isometries and dot products


The transpose of a (column) vector ~v is a “row vector” ~vT , which is also a
1 × n matrix.
Suppose that ~v and ~w are vectors in Rⁿ and A is an n × n matrix.

(a) Prove that ~v · A~w = ~vᵀA~w. (You can think of the right-hand side as
the product of three matrices.)
(b) Prove that ~v · A~w = Aᵀ~v · ~w. You can do this by brute force using
summation notation, or you can do it by using part (a) and the rule
for the transpose of a matrix product (Theorem 1.2.17 in Hubbard).
(c) Now suppose that ~v and ~w are vectors in R³ and R is a 3 × 3 isometry
matrix. Prove that R~v · R~w = ~v · ~w. If you believe that physical laws
should remain valid when you rotate your experimental apparatus, this
result shows that dot products are appropriate to use in expressing
physical laws.

4. Using vectors to prove theorems of trigonometry.

(a) For vectors ~a and ~b,


|~a × ~b| = |~a||~b| sin α, where α is the angle between the vectors.
By applying this formula to a triangle whose sides are ~v, ~w, and ~v − ~w,
prove the Law of Sines.
(b) Consider a parallelogram spanned by vectors ~v and ~w. Its diagonal is ~v + ~w.
Let α denote the angle between ~v and the diagonal; let β denote the
angle between ~w and the diagonal. By expressing sines and cosines in
terms of cross products, dot products, and lengths of vectors, prove
the addition formula
sin(α + β) = sin α cos β + cos α sin β.

5. Let R(θ) denote the 2×2 matrix that represents a counterclockwise rotation
about the origin through angle θ. Let F (α) denote the 2 × 2 matrix that
represents a reflection in the line through the origin that makes angle α with
the x axis. Using matrix multiplication and the trigonometric identities
sin (α + β) = sin α cos β + cos α sin β
cos (α + β) = cos α cos β − sin α sin β, prove the following:

(a) F (β)F (α) = R(2(β − α)).


(b) F (γ)F (β)F (α) = F (γ + α − β). (You might want to work problem 7
first.)
(c) The product of any even number of reflections in lines through the
origin is a rotation about the origin and the product of any odd number
of reflections in lines through the origin is a reflection in a line through
the origin. (Hint: use induction. First establish the base cases n = 1
and n = 2. Then do the ”inductive step:” show that if the result is
true for the product of n reflections, it is true for n + 2 reflections.)

6. Matrices that represent complex numbers

(a) Confirm that i² = −1 using conformal matrices.


(b) Represent 4 + 2i as a matrix. Square it and interpret its result as
a complex number. Confirm your answer by checking what you get
when expanding algebraically.
(c) Show that using matrices to represent complex numbers still preserves
addition as we would expect.
That is, write two complex numbers as matrices. Then add the matri-
ces, and interpret the sum as a complex number. Confirm your answer
is correct algebraically.

The last two problems require R scripts. Feel free to copy and edit existing
scripts, including student solutions to group problem 3b, and to use the
library script 2l, which has functions for dealing with angles in degrees.

7. Vectors in two dimensions

(a) You are playing golf and have made a good tee shot. Now the hole is
located only 30 yards from your ball, in a direction 32 degrees north
of east. You hit a chip shot that travels 25 yards 22 degrees north of
east, followed by a putt that travels 8 yards 60 degrees north of east.
How far from the hole is your golf ball now located? For full credit,
include a diagram showing the relevant vectors.
(b) The three-reflections theorem, whose proof was problem 5b, states that
if you reflect successively in lines that make angle α, β, and γ with
the x−axis, the effect is simply to reflect in a line that makes angle
α + γ − β with the x-axis. Confirm this, using R, for the case where
α = 40◦ , β = 30◦ , and γ = 80◦ . Make a plot in R to show where the
point P = (1, 0) ends up after each of the three successive reflections.

8. Vectors in three dimensions (see script 2Y, topic 3)


The least expensive way to fly from Boston (latitude 42.36◦ N, longitude
71.06◦ W) to Naples (latitude 40.84◦ N, longitude 14.26◦ E) is to buy a ticket
on Aer Lingus and change planes in Dublin (latitude 53.35◦ N, longitude
6.26◦ W). Since Dublin is more than 10 degrees further north than either
Boston or Naples, it is possible that the stop in Dublin might lengthen the
journey substantially.

(a) Construct unit vectors in R3 that represent the positions of the three
cities.
(b) By computing angles between these vectors, compare the length in
kilometers of a nonstop flight with the length of a trip that stops
in Dublin. Remember that, by the original definition of the meter,
the distance from the North Pole to the Equator along the meridian
through Paris is 10,000 kilometers. (You may treat the Earth as a
sphere of unit radius.)
(c) Any city that is on the great-circle route from Boston to Naples has a
vector that lies in the same plane as the vectors for Boston and Naples.
Invent a test for such a vector (you may use either cross products or
determinants), and apply it to Dublin.

MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #1, Week 3 (Row Reduction, Independence, Basis)

Authors: Paul Bamberg and Kate Penner


R scripts by Paul Bamberg
Last modified: June 18, 2015 by Paul Bamberg

Reading

• Hubbard, Sections 2.1 through 2.5

Proofs to present in section or to a classmate who has done them.

• 3.1. Prove that in Rn , n + 1 vectors are never linearly independent and


n − 1 vectors never span. Explain how these results show that a matrix
that is not square cannot be invertible.
You may use illustrations with row reduction for a specific value of n, but
your argument must be independent of the value of n.
You may use the fact that any matrix can be row reduced by multiplying
it on the left by a product of invertible elementary matrices.

• 3.2. Equivalent descriptions of a basis:


Prove that a maximal set of linearly independent vectors for a subspace of
Rn is also a minimal spanning set for that subspace.

R Scripts

• Script 1.3A-RowReduction.R
Topic 1 - Row reduction to solve two equations, two unknowns
Topic 2 - Row reduction to solve three equations, three unknowns
Topic 3 - Row reduction by elementary matrices
Topic 4 - Automating row reduction in R
Topic 5 - Row reduction to solve equations in a finite field

• Script 1.3B-RowReductionApplications.R
Topic 1 - Testing for linear independence or dependence
Topic 2 - Inverting a matrix by row reduction
Topic 3 - Showing that a given set of vectors fails to span Rn
Topic 4 - Constructing a basis for the image and kernel

• Script 1.3C-OrthonormalBasis.R
Topic 1 - Using Gram-Schmidt to construct an orthonormal basis
Topic 2 - Making a new orthonormal basis for R3
Topic 3 - Testing the cross-product rule for isometries

• Script 1.3P-RowReductionProofs.R
Topic 1 - In Rn , n + 1 vectors cannot be independent
Topic 2 - In Rn , n − 1 vectors cannot span
Topic 3 - An invertible matrix must be square

1 Executive Summary
1.1 Row reduction for solving systems of equations
When you solve the equation A~v = ~b you combine the matrix A and the vector
~b into a single matrix. Here is a simple example.

x + 2y = 7, 2x + 5y = 16
Then $A = \begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix}$, $\vec v = \begin{pmatrix} x \\ y \end{pmatrix}$, $\vec b = \begin{pmatrix} 7 \\ 16 \end{pmatrix}$, so that A~v = ~b exactly corresponds to our system of equations. Our matrix of interest is therefore $\begin{pmatrix} 1 & 2 & 7 \\ 2 & 5 & 16 \end{pmatrix}$.
First, subtract twice row 1 from row 2, then subtract twice row 2 from row 1 to get $\begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \end{pmatrix}$.
Interpret the result as a pair of equations (remember what each column corresponded to when we first appended A and ~b together): x = 3, y = 2.
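Base R has no built-in echelon-form function, so here is the same example done with explicit row operations (a sketch, not a course script):

  M <- matrix(c(1, 2,  2, 5,  7, 16), nrow = 2)  # the augmented matrix (A | b)
  M[2, ] <- M[2, ] - 2 * M[1, ]  # subtract twice row 1 from row 2
  M[1, ] <- M[1, ] - 2 * M[2, ]  # subtract twice row 2 from row 1
  M                              # (1 0 3; 0 1 2): x = 3, y = 2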
The final form we are striving for is row-reduced echelon form, in which
• The leftmost nonzero entry in every row is a “pivotal 1.”
• Pivotal 1’s move to the right as you move down the matrix.
• A column with a pivotal 1 has 0 for all its other entries.
• Any rows with all 0’s are at the bottom.
The row-reduction algorithm converts a matrix to echelon form. Briefly,
1. SWAP rows, if necessary, so that the leftmost column that is not all zeroes
has a nonzero entry in the first row.
2. DIVIDE by this entry to get a pivotal 1.
3. SUBTRACT multiples of the first row from the others to clear out the rest
of the column under the pivotal 1.
4. Repeat these steps to get a pivotal 1 in the next row, with nothing but
zeroes elsewhere in the column (including in the first row). Continue until
the matrix is in echelon form.
A pivotal 1 in the final column indicates that there are no solutions. Otherwise,
a nonpivotal column (other than the last) means that there are infinitely many solutions.

Row reduction can be used to find the inverse of a matrix: append the
appropriately sized identity matrix and row reduce; the appended columns then
contain the inverse (when it exists).

1.2 Row reduction by elementary matrices
Each basic operation in the row-reduction algorithm for a matrix A can be
achieved by multiplication on the left by an appropriate invertible elementary
matrix.

• Type 1: Multiplying the kth row by a scalar m is accomplished by an ele-


mentary matrix formed by starting with the identity matrix and replacing
the kth element of the diagonal by the scalar m.
Example: $E_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ multiplies the second row of matrix A by 3.
• Type 2: Adding b times the jth row to the kth row is accomplished by an
elementary matrix formed by starting with the identity matrix and changing
the jth element in the kth row from 0 to the scalar b.
Example: $E_2 = \begin{pmatrix} 1 & 3 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ adds three times the second row of matrix A to the first row.
You want to use three times the second row of A, so the 3 must be in the
second column of E2. Since the 3 is in the first row of E2, it will affect the
first row of E2A.

• Type 3: Swapping row j with row k is accomplished by an elementary


matrix formed by starting with the identity matrix, changing the jth and
kth elements on the diagonal to 0, and changing the entries in row j, column
k and in row k, column j from 0 to 1.
Example: $E_3 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$ swaps the first and third rows of matrix A.

Suppose that A|I row-reduces to A′|B. Then EA = A′ and EI = B, where


E = Ek · · · E2 E1 is a product of elementary matrices. Since each elementary
matrix is invertible, so is E. Clearly E = B, which means that we can construct
E during the row-reduction process by appending the identity matrix I to the
matrix A that we are row reducing.
If matrix A is invertible, then A′ = I and E = A⁻¹. However, the matrix
E is invertible even when the matrix A is not invertible. Remarkably, E is also
unique: it comes out the same even if you carry out the steps of the row-reduction
algorithm in a non-standard order.
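A sketch in R of row reduction by elementary matrices (using the system 3x + 6y = 21, 2x + 5y = 16 that reappears in the lecture outline):

  A  <- matrix(c(3, 2,  6, 5,  21, 16), nrow = 2)  # rows (3 6 21) and (2 5 16)
  E1 <- diag(2); E1[1, 1] <- 1/3  # type 1: divide row 1 by 3
  E2 <- diag(2); E2[2, 1] <- -2   # type 2: row2 <- row2 - 2*row1
  E3 <- diag(2); E3[1, 2] <- -2   # type 2: row1 <- row1 - 2*row2
  E3 %*% E2 %*% E1 %*% A          # echelon form (1 0 3; 0 1 2)
  E3 %*% E2 %*% E1                # the product E; here A is invertible, so E is its inverse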

1.3 Row reduction for determining linear independence
Given a set of elements such as {a1 , a2 , a3 , a4 }, a linear combination is the name
given to any arbitrary sum of scalar multiples of those elements. For instance:
a1 − 2a2 + 4a3 − 5a4 is a linear combination of the above set.
Given some set of vectors, we describe the set as linearly independent if
none of the vectors can be written as a linear combination of the others. Similarly,
we describe the set as linearly dependent if one or more of the vectors can be
written as a linear combination of the others.
A subspace is a set of vectors (usually an infinite number of them) that is
closed under addition and scalar multiplication. “Closed” means that the sum
of any two vectors in the set is also in the set and any scalar multiple of a vector
in the set is also in the set. A subspace of F n is the set of all possible linear
combinations of some set of vectors. This set is said to span or to generate the
subspace.
A subspace W ⊂ Fⁿ has the following properties:

1. The element ~0 is in W .

2. For any two elements ~u, ~v in W, the sum ~u + ~v is also in W.

3. For any element ~v in W and any scalar c in F , the element c~v is also in W .

A basis of a vector space or subspace is a linearly independent set that spans


that space.
The definition of a basis can be stated in three equivalent ways, each of which
implies the other two:

a) It is a maximal set of linearly independent vectors in V : if you add any


other vector in V to this set, it will no longer be linearly independent.

b) It is a minimal spanning set: it spans V , but if you remove any vector from
this set, it will no longer span V .

c) It is a set of linearly independent vectors that spans V .

The number of elements in a basis for a given vector space is called the dimension
of the vector space. A subspace has at most the same dimension as the space of
which it is a subspace.
By creating a matrix whose columns are the vectors in a set and row reducing,
we can find a maximal linearly independent subset, namely the columns that
become columns with pivotal 1’s. Any column that becomes a column without a
pivotal 1 is a linear combination of the columns to its left.

1.4 Finding a vector outside the span
To show that a set of vectors {~v1 , ~v2 , · · · , ~vk } does not span F n , we must exhibit
~ that is not a linear combination of the vectors in the given set.
a vector w
• Create an n × k matrix A whose columns are the given vectors.
• Row-reduce this matrix, forming the product E of the elementary matrices
that accomplish the row reduction.
• If the original set of vectors spans F n , the row-reduced matrix EA will
have n pivotal columns. Otherwise it will have fewer than n pivotal 1s, and
there will be a row of zeroes at the bottom. If that is the case, construct
the vector w~ = E −1~en .
• Now consider what happens when you row reduce the matrix A|~ w. The
last column will contain a pivotal 1. Therefore the vector w ~ is independent
of the columns to its left: it is not in the span of the set {~v1 , ~v2 , · · · , ~vk } .
If k < n, then matrix A has fewer than n columns, so the matrix EA has
fewer than n pivotal columns and must have a row of zeroes at the bottom. It
~ = E −1~en can be constructed and that a set of fewer
follows that the vector w
than n vectors cannot span F n .

1.5 Image, kernel, and the dimension formula


Consider linear transformation T : Rn → Rm , represented by matrix [T ].
• The image of T , Img T , is the set of vectors that lie in the subspace spanned
by the columns of [T ].
• Img T is a subspace of Rm . Its dimension is r, the rank of matrix [T ].
• A solution to the system of equations T (~x) = ~b is guaranteed to exist
(though it may not be unique) if and only if Img T is m-dimensional.
• To find a basis for Img T , use the columns of the matrix [T ] that become
pivotal columns as a result of row reduction.
• The kernel of T , Ker T , is the set of vectors ~x for which T (~x) = 0.
• Ker T is a subspace of Rn .
• A system of equations T (~x) = ~b has a unique solution (though perhaps no
solution exists) if and only if Ker T is zero-dimensional.
• There is an algorithm (Hubbard pp 196-197) for constructing an indepen-
dent vector in Ker T from each of the n − r nonpivotal columns of [T ].
• Since dim Img T = r and dim Ker T = n − r,
dim Img T + dim Ker T = n (the “rank-nullity theorem.”)

1.6 Linearly independent rows
Hubbard (page 200) gives two arguments that the number of linearly independent
rows of a matrix equals its rank. Here is yet another.
Swap rows to put a nonzero row as the top row. Then swap a row that is
linearly independent of the top row into the second position. Swap a row that is
linearly independent of the top two rows into the third position. Continue until
the top r rows are a linearly independent set, while each of the bottom m − r
rows is a linear combination of the top r rows.
Continuing with elementary row operations, subtract appropriate multiples
of the top r rows from each of the bottom rows in succession, reducing it to
zero. (Easy in principle but hard in practice!). The top rows, still untouched,
are linearly independent, so there is no way for row reduction to convert any of
them to a zero row. In echelon form, the matrix will have r pivotal 1s: rank r.
It follows that r is both the number of linearly independent columns and the
number of linearly independent rows: the rank of A is equal to the rank of its
transpose AT .

1.7 Orthonormal basis


A basis is called orthogonal if any two distinct vectors in the basis have a dot
product of zero. If, in addition, each basis vector is a unit vector, then the basis
is called orthonormal.
Given any basis {~v1 , ~v2 , · · · , ~vk } of a subspace W and any vector ~x ∈ W , we
can express ~x as a linear combination of the basis vectors:

~x = c1~v1 + c2~v2 + · · · + ck ~vk ,

but determining the coefficients requires row reducing a matrix.


If the basis {~v1 , ~v2 , · · · , ~vk } is orthonormal, just take the dot product with ~vi
to determine that ~x · ~vi = ci .
We can convert any spanning set of vectors into an orthonormal basis. Here is the algorithm,
sometimes called the “Gram-Schmidt process.”
Choose any vector w ~ 1 : divide it by its length to make the first basis vector ~v1 .
Choose any vector w ~ 2 that is linearly independent of ~v1 and subtract off a multiple
of ~v1 to make a vector ~x that is orthogonal to ~v1 .
~x = ~w2 − (~w2 · ~v1)~v1
Divide this vector by its length to make the second basis vector ~v2 .
Choose any vector w ~ 3 that is linearly independent of ~v1 and ~v2 , and subtract off
multiples of ~v1 and ~v2 to make a vector ~x that is orthogonal to both ~v1 and ~v2 .
~x = ~w3 − (~w3 · ~v1)~v1 − (~w3 · ~v2)~v2
Divide this vector by its length to make the third basis vector ~v3 .
Continue until you can no longer find any vector that is linearly independent of
your basis vectors.
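A minimal Gram-Schmidt sketch in R (the function name and the third spanning vector are our own choices; note that R's crossprod(V, x) means t(V) %*% x, not a cross product):

  gram_schmidt <- function(W) {
    V <- NULL
    for (j in seq_len(ncol(W))) {
      x <- W[, j]
      if (!is.null(V)) x <- x - V %*% crossprod(V, x)  # subtract projections onto earlier v's
      if (sqrt(sum(x^2)) > 1e-10) V <- cbind(V, x / sqrt(sum(x^2)))
    }
    V
  }
  W <- cbind(c(1, -1, 1, -1), c(2, -1, -1, 0), c(0, 1, 0, -1))
  round(crossprod(gram_schmidt(W)), 10)  # the identity: the columns are orthonormal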

2 Lecture Outline
1. Row reduction
This is just an organized version of the techniques for solving simultaneous
equations that you learned in high school.
When you solve the equation A~x = ~b you combine the matrix A and the
vector ~b into a single matrix. Here is a simple example.
The equations are
x + 2y = 7
2x + 5y = 16.
Then $A = \begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix}$, $\vec b = \begin{pmatrix} 7 \\ 16 \end{pmatrix}$, and we must row-reduce the 2 × 3 matrix $\begin{pmatrix} 1 & 2 & 7 \\ 2 & 5 & 16 \end{pmatrix}$.
First, subtract twice row 1 from row 2 to get

Then subtract twice row 2 from row 1 to get

Interpret the result as a pair of equations:

Solve these equations (by inspection) for x and y

You see the general strategy. First eliminate x from all but the first equa-
tion, then eliminate y from all but the second, and keep going until, with
luck, you have converted each row into an equation that involves only a
single variable with a coefficient of 1.

2. Echelon Form

The result of row reduction is a matrix in echelon form, whose properties


are carefully described on p. 165 of Hubbard (definition 2.1.5). Here is
Hubbard’s messiest example:
 
0 1 3 0 0 3 0 4
0 0 0 1 −2 1 0 1 .
0 0 0 0 0 0 1 2

Key properties:

• The leftmost nonzero entry in every row is a “pivotal 1.”


• Pivotal 1’s move to the right as you move down the matrix.
• A column with a pivotal 1 has 0 for all its other entries.
• Any rows with all 0’s are at the bottom.

If a matrix is not in echelon form, you can convert it to echelon form by


applying one or more of the following row operations.

(a) Multiply a row by a nonzero number.


(b) Add (or subtract) a multiple of one row from another row.
(c) Swap two rows.

Here are the “what’s wrong?” examples from Hubbard. Find row opera-
tions that fix them.  
1 0 0 2
0 0 1 −1 .
0 1 0 1
 
1 1 0 1
0 0 2 0 .
0 0 0 1
 
0 0 0
1 0 0 .
0 1 0
 
0 1 0 3 0 −3
0 0 −1 1 1 1  .
0 0 0 0 1 2

3. Row reduction algorithm
The row-reduction algorithm (Hubbard, p. 166) converts a matrix to ech-
elon form. Briefly,

(a) SWAP rows so that the leftmost column that is not all zeroes has a
nonzero entry in the first row.
(b) DIVIDE by this entry to get a pivotal 1.
(c) SUBTRACT multiples of the first row from the others to clear out the
rest of the column under the pivotal 1.
(d) Repeat these steps to get a pivotal 1 in the second row, with nothing
but zeroes elsewhere in the column (including in the first row).
(e) Repeat until the matrix is in echelon form.
 
0 3 3 6
Carry out this procedure to row-reduce the matrix 2 4 2 4 .
3 8 4 7

4. Solving equations
Once you have row-reduced the matrix, you can interpret it as representing
the equation Ã~x = b̃,
which has the same solutions as the equation with which you started, except
that now they can be solved by inspection.
A pivotal 1 in the last column b̃ is the kiss of death, since it is an equa-
tion like 0x + 0y = 1. There is no solution. This happens in the second
Mathematica example, where row reduction leads to
$$\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
Otherwise, choose freely the values of the “active” unknowns in the non-
pivotal columns (excluding the last one). Then each row gives the value of
the “passive” unknown in the column that has the pivotal 1 for that row.
This happens in the third Mathematica example, where row reduction converts
$$\begin{pmatrix} 2 & 1 & 3 & 1 \\ 1 & -1 & 0 & 1 \\ 1 & 1 & 2 & \frac{1}{3} \end{pmatrix} \text{ to } \begin{pmatrix} 1 & 0 & 1 & \frac{2}{3} \\ 0 & 1 & 1 & -\frac{1}{3} \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
The only nonpivotal column (except the last one) is the third. So we can
choose the value of the active unknown z freely.
Then the first row gives x in terms of z: x = 2/3 − z.
The second row gives y in terms of z: y = −1/3 − z.
If there are as many equations as unknowns, this situation is exceptional.
If there are fewer equations than unknowns, it is the usual state of affairs.
Expressing the passive variables in terms of the active ones will be the
subject of the important implicit function theorem in outline 9.
A column that is all zeroes is nonpivotal. Such a column must have been
there from the start; it cannot come about as a result of row reduction.
It corresponds to an unknown that was never mentioned. This sounds
unlikely, but it can happen when you represent a system of equations by
an arbitrary matrix.
Example: In R3 , solve the equations x = 0, y = 0 (z not mentioned)

5. Many for the price of one
If you have several equations with the same matrix A on the left and dif-
ferent vectors on the right, you can solve them all in the process of row-
reducing A. This is Example 2.2.10, also done in Mathematica. Row re-
duction is more efficient than computing A−1 , and it works even when A is
not invertible. Here is a simple example with a non-invertible A:
x + 2y = 3
2x + 4y = 6

x + 2y = 3
2x + 4y = 7

The first pair has infinitely many solutions: choose any y and take x =
3 − 2y. The second set has none.
We must row-reduce the 2 × 4 matrix
$$\begin{pmatrix} 1 & 2 & 3 & 3 \\ 2 & 4 & 6 & 7 \end{pmatrix}.$$

This quickly gives
$$\begin{pmatrix} 1 & 2 & 3 & 3 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
and then
$$\begin{pmatrix} 1 & 2 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

The last column has a pivotal 1 – no solution for the second set.
The third column has no pivotal 1, and the second column is also nonpivotal,
so there are multiple solutions for the first set of equations. Make a free
choice of the active variable y that goes with nonpivotal column 2.
How does the first row now determine the passive unknown x?

6. When is a matrix invertible?
Our definition of the inverse A−1 of a matrix A requires it to be both a left
inverse and a right inverse: A−1 A = I and AA−1 = I.
We have also proved that the inverse of a matrix, if it exists, must be unique.
The notation I for the identity obscures the fact that one identity matrix
might be m × m, the other n × n, in which case we would have an invertible
non-square matrix. Now is the time to prove that this cannot happen: only
a square matrix can be invertible. This theorem is the key to Hubbard’s
proof of the most important theorem of linear algebra, which says that the
dimension of a vector space is well defined. The proof relies explicitly on
row reduction.

• If A is invertible, a unique solution to A~x = ~b exists.


Existence: Prove that ~x = A−1 ~b is a solution.

Uniqueness: Argue that the uniqueness of the solution follows from


the uniqueness of A−1 .

Now we must show that if A~x = ~b has a unique solution, the number
of rows m must equal the number of columns n. Consider solving
A~x = ~b by row reduction, converting A to matrix à in echelon form.
To show that m = n, show that m ≤ n and n ≤ m.
• If A has more rows than columns, there is no existence. Row reduction
must leave at least one row of zeroes at the bottom, and there exists
~b for which A~x = ~b has no solution.
• If A has more columns than rows, there is no uniqueness. Row reduc-
tion must leave at least one nonpivotal column, and the solution to
A~x = ~b is not unique.
• So if A is invertible, and A~x = ~b therefore has a unique solution, A
must be a square matrix.

7. Matrix inversion by row reduction
If A is square and you choose each standard basis vector in turn for the
right-hand side, then row reduction constructs the inverse of A if it exists.
 
As a simple example, we invert $A = \begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix}$.
Begin by appending the standard basis vectors as third and fourth columns
to get
$$\begin{pmatrix} 1 & 2 & 1 & 0 \\ 2 & 5 & 0 & 1 \end{pmatrix}.$$

Now row-reduce this in two easy steps:

The right two columns of the row-reduced matrix are the desired inverse:
check it!

For matrices larger than 2 × 2, row reduction is a more efficient way of con-
structing a matrix inverse than any techniques involving determinants that
you may have learned! Hubbard,  Example 2.3.4, is done in Mathematica. 
The matrix $\begin{pmatrix} 2 & 1 & 3 & 1 & 0 & 0 \\ 1 & -1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 2 & 0 & 0 & 1 \end{pmatrix}$ row reduces to $\begin{pmatrix} 1 & 0 & 0 & 3 & -1 & -4 \\ 0 & 1 & 0 & 1 & -1 & -1 \\ 0 & 0 & 1 & -2 & 1 & 3 \end{pmatrix}$.
What are A and A⁻¹?
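For comparison, base R computes the same inverse directly (a quick check, not a substitute for the row reduction):

  A <- matrix(c(2, 1, 1,  1, -1, 1,  3, 1, 2), nrow = 3)  # rows (2 1 3), (1 -1 1), (1 1 2)
  solve(A)  # (3 -1 -4; 1 -1 -1; -2 1 3), matching the right half above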

8. Elementary matrices:
Each basic operation in the row-reduction algorithm can be achieved by
multiplication on the left by an appropriate invertible elementary matrix.
Here are examples of the three types of elementary matrix.
 For each, figure
2 4
out what row operation is achieved by converting A = −1 1 to EA.

1 0

1 
0 0
2
• Type 1: E1 =  0 1 0
0 0 1
 
1 2 0
• Type 2: E2 = 0 1 0
0 0 1
 
0 0 1
• Type 3: E3 = 0 1 0
1 0 0

In practice, use of elementary matrices does not speed up computation, but


it provides a nice way to think about row reduction for purposes of doing
proofs.
For example, as on page 180 of Hubbard, suppose that A|I row-reduces to
I|B.
Then EA = I and EI = B, where
E = Ek · · · E2 E1 is a product of elementary matrices. Since each elementary
matrix is invertible, so is E. Clearly E = B, which means that we can
construct E during the row-reduction process.
It is by no means obvious that E is unique, and in fact the general proof is
left as an exercise (2.4.12) in Hubbard. But in the case where A row-reduces
to the identity there is an easy proof.
Start with EA = I.
Multiply by E −1 on the left, E on the right, to get
E −1 EAE = E −1 E,
from which it follows that AE = I. So E is also a right inverse of A. But
we earlier proved that if a matrix A has a right inverse and a left inverse,
both are unique.

9. Row reduction and elementary matrices

We want to solve the equations


3x + 6y = 21
2x + 5y = 16.
Then $A = \begin{pmatrix} 3 & 6 \\ 2 & 5 \end{pmatrix}$, $\vec b = \begin{pmatrix} 21 \\ 16 \end{pmatrix}$, and we must row-reduce the 2 × 3 matrix $\begin{pmatrix} 3 & 6 & 21 \\ 2 & 5 & 16 \end{pmatrix}$.
Use an elementary matrix to accomplish each of the three steps needed to
accomplish row reduction.
Matrix E1 divides the top row by 3.
Matrix E2 subtracts twice row 1 from row 2.
Matrix E3 subtracts twice row 2 from row 1.

Interpret the result as a pair of equations and solve them (by inspection)
for x and y.

Show that the product E3 E2 E1 is the inverse of A.

10. Linear combinations and span

The defining property of a linear function T : for any collection of k vectors


in F n , ~v1 , · · · ~vk , and any collection of coefficients a1 · · · ak in field F ,

$$T\Big(\sum_{i=1}^{k} a_i\vec v_i\Big) = \sum_{i=1}^{k} a_i\,T(\vec v_i).$$

The sum $\sum_{i=1}^{k} a_i\vec v_i$ is called a linear combination of the vectors ~v1 , · · · ~vk .
The set of all the linear combinations of ~v1 , · · · v~k is called the span of the
set ~v1 , · · · ~vk .
Prove that it is a subspace of F n .

        
Suppose $\vec v_1 = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}, \ \vec v_2 = \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}, \ \vec v_3 = \begin{pmatrix} 3 \\ -1 \\ -2 \end{pmatrix}, \ \vec w_1 = \begin{pmatrix} 2 \\ -3 \\ 1 \end{pmatrix}, \ \vec w_2 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$

• Show that w
~ 1 is a linear combination of ~v1 and ~v2 .

• Invent an easy way to describe the span of ~v1 , ~v2 , and ~v3 . (Hint:
consider the sum of the components.)

• Show that w
~ 2 is not in the span of ~v1 , ~v2 , and ~v3 .

   
• The matrix $\begin{pmatrix} 1 & 0 & 3 & 2 & 1 \\ -2 & 1 & -1 & -3 & 0 \\ 1 & -1 & -2 & 1 & 0 \end{pmatrix}$ row reduces to $\begin{pmatrix} 1 & 0 & 3 & 2 & 0 \\ 0 & 1 & 5 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$.
How does this result answer the question of whether or not ~w1 or ~w2
is in the span of ~v1, ~v2, and ~v3?
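A sketch of the same test in R, using matrix rank instead of an explicit row reduction (qr()$rank counts the pivotal columns):

  in_span <- function(V, w) qr(V)$rank == qr(cbind(V, w))$rank
  V <- cbind(c(1, -2, 1), c(0, 1, -1), c(3, -1, -2))
  in_span(V, c(2, -3, 1))  # TRUE:  w1 is in the span
  in_span(V, c(1, 0, 0))   # FALSE: w2 is not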

11. Special cases:

• In F n , w
~ is in the span of ~u if and only if it is a multiple of ~u.
• In F 2 , if ~v is not a multiple of ~u, then every vector w
~ is in the span
of ~u and ~v.
Write an equivalent statement using negation, and use it to construct
an example.

• In F 3 , a vector w
~ is in the span of ~u and ~v if and only if it is orthogonal
to ~u × ~v.
Give a geometrical interpretation of this statement.

Prove algebraically that if w ~ is in the span of ~u and ~v, then it is


orthogonal to ~u × ~v. (Proof strategy: interchange dot and cross)

• If matrix [T ] represents linear transformation T , the image of T is the


span of the columns of [T ].

• In general, a vector w ~ is in the span of ~v1 , ~v2 , · · · ~vk if the system of


equations
x1~v1 + x2~v2 + · · · xk ~vk = w~ has at least one solution. To check this,
make all the vectors into a matrix and row-reduce it. If the last column
(corresponding to w ~ ) has a pivotal 1, then w ~ is not in the span of the
others. You have already seen one example, and there is another in
the Mathematica file.

12. Linear independence
~v1 , ~v2 , · · · ~vk are linearly independent if the system of equations
x1~v1 + x2~v2 + · · · + xk ~vk = w
~ has at most one solution.
To test for linear independence, make the vectors ~v1 , ~v2 , · · · ~vk into a matrix
and row-reduce it. If any column is nonpivotal, then the vectors are linearly
dependent. There is an example in the Mathematica file.
     
The vectors to test for independence are $\vec v_1 = \begin{pmatrix} 1 \\ 1 \\ 2 \\ 1 \end{pmatrix}, \ \vec v_2 = \begin{pmatrix} 2 \\ 0 \\ 1 \\ 1 \end{pmatrix}, \ \vec v_3 = \begin{pmatrix} 0 \\ 2 \\ 3 \\ 1 \end{pmatrix}.$
~ is irrelevant and might as well be zero, so we just make a
The vector w
matrix from the three given vectors:
   
$$\begin{pmatrix} 1 & 2 & 0 \\ 1 & 0 & 2 \\ 2 & 1 & 3 \\ 1 & 1 & 1 \end{pmatrix} \text{ reduces to } \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
The third column is nonpivotal; so the given vectors are linearly dependent.
How can you write the third one as a linear combination of the first two?

 
Change ~v3 to $\begin{pmatrix} 0 \\ 2 \\ 1 \\ 1 \end{pmatrix}$ and test again.
Now $\begin{pmatrix} 1 & 2 & 0 \\ 1 & 0 & 2 \\ 2 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$ reduces to $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$
There is no nonpivotal column. The three vectors are linearly independent.
Setting w~ = ~0, as we have already done, leads to the standard definition of
linear independence: if
a1~v1 + a2~v2 + · · · ak ~vk = ~0
then a1 = a2 = · · · = ak = 0.
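The same test is quick in R (a sketch; qr()$rank counts the pivotal columns):

  V <- cbind(c(1, 1, 2, 1), c(2, 0, 1, 1), c(0, 2, 3, 1))
  qr(V)$rank == ncol(V)  # FALSE: dependent (in fact v3 = 2*v1 - v2)
  V[, 3] <- c(0, 2, 1, 1)
  qr(V)$rank == ncol(V)  # TRUE: independent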

13. Constructing a vector outside the span
The vectors are
$$\vec v_1 = \begin{pmatrix} 4 \\ 2 \\ 3 \end{pmatrix}, \qquad \vec v_2 = \begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix}$$
$A = \begin{pmatrix} 4 & 2 \\ 2 & 1 \\ 3 & 2 \end{pmatrix}$ reduces to $EA = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}$, and the matrix that does the job is
$$E = \begin{pmatrix} 1 & 0 & -1 \\ -\frac{3}{2} & 0 & 2 \\ -\frac{1}{2} & 1 & 0 \end{pmatrix}.$$
We want to append a third column ~b such that when we row reduce the
square matrix A|~b, the resulting matrix EA|E ~b will have a pivotal 1 in the
third column. In this case it will be in the bottom row. Since E, being a
product of elementary matrices, must be invertible, we compute
$$E^{-1}\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$$
We have found a vector, $\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$, that is not in the span of ~v1 and ~v2.
Key point: the proof relies on the fact that this procedure will always work,
because the matrix E that accomplishes row reduction is guaranteed to be
invertible!
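A sketch of the whole construction in R (E is entered by hand from the computation above):

  A <- cbind(c(4, 2, 3), c(2, 1, 2))
  E <- matrix(c(1, -3/2, -1/2,  0, 0, 1,  -1, 2, 0), nrow = 3)  # the E above, by columns
  E %*% A                    # echelon form (1 0; 0 1; 0 0)
  w <- solve(E, c(0, 0, 1))  # w = E^{-1} e3 = (0, 1, 0)
  qr(A)$rank; qr(cbind(A, w))$rank  # 2, then 3: w is outside the span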

14. Two key theorems; your proof 3.1

• In Rn , a set of n + 1 vectors cannot be linearly independent.


If we start with n + 1 vectors in Rn , make a matrix that has these
vectors as its columns, and row-reduce, the best we can hope for is to
get a pivotal 1 in each of n columns. There must be at least one non-
pivotal column (not necessarily the last column), and the n + 1 vectors
must be linearly dependent: they cannot be linearly independent.
Show what the row-reduced matrix looks like and how it is possible
for the non-pivotal column not to be the last column.

• In Rn , a set of n − 1 vectors cannot span.


Remember that “span” means
∀~w, x1~v1 + x2~v2 + · · · xk ~vk = w
~ has at least one solution.
Since “exists” is easier to work with than “for all”, convert this into a
definition of “does not span.” A set of k vectors does not span if
∃~w such that x1~v1 + x2~v2 + · · · xk ~vk = w ~ has no solution.
We invent a method for constructing w ~ , using elementary matrices.
Make a matrix A whose columns are ~v1 , ~v2 , · · · ~vk , and row-reduce it
by elementary matrices whose product can be called E. Then EA is
in echelon form.
If A has only n − 1 columns, it cannot have more than n − 1 pivotal
1’s, and there cannot be a pivotal 1 in the bottom row. That means
that if we had chosen a w ~ that row-reduced to a pivotal 1 in the last
row, the set of equations
x1~v1 + x2~v2 + · · · xk ~vk = w~
would have had no solution.
Now E is the product of invertible elementary matrices, hence invert-
ible. Just construct w ~ = E −1 e~n as an example of a vector that is not
in the span of the given n − 1 vectors.

15. Proof 3.1 – start to finish
Prove that in Rn , n + 1 vectors are never linearly independent and n − 1
vectors never span.

16. Definition of basis
This is Hubbard, Definition 2.4.12. It is really a definition plus two theo-
rems, but it can conveniently be left ambiguous which is which!
A basis for a subspace V ⊂ Rn has the following equivalent properties:

(a) It is a maximal set of linearly independent vectors in V : if you add any


other vector in V to the set, it will no longer be linearly independent.
(b) It is a minimal spanning set: it spans V , but if you remove any vector
from the set, it will no longer span.
(c) It is a set of linearly independent vectors that spans V .

To show that any of these three properties implies the other two would
require six proofs. Let’s do a couple. Call the basis vectors ~v1 , ~v2 , · · · ~vk .

• Prove that (a) implies (b) (this is your proof 3.2).


When we add any other vector w ~ to the basis set, the resulting set
is linearly dependent. Express this statement as an equation that
includes the term b~
w.

Show that if b ≠ 0, we can express ~w as a linear combination of the


basis set. This will prove “spanning set”.

To prove that b ≠ 0, assume the contrary, and show that the vectors
~v1 , ~v2 , · · · ~vk would be linearly dependent.

To prove “minimal spanning set,” just exhibit a vector that is not in


the span of ~v1 , ~v2 , · · · ~vk−1 .

• Prove that (c) implies (a).
This is easier, since all we have to show is “maximal.” Add another
vector ~w to the linearly independent spanning set ~v1, ~v2, · · · ~vk. How
do we argue that this set is linearly dependent?

• Prove that (c) implies (b).


All we have to show is “minimal.” Imagine removing the last vector.
To show that the set ~v1 , ~v2 , · · · ~vk−1 is not a spanning set, we need to
find one vector that cannot be a linear combination of these.

Now we combine this definition of basis with what we already know about
sets of vectors in Rn .
Our conclusions:
In Rn , a basis cannot have fewer than n elements, since they would not
span.
In Rn , a basis cannot have more than n elements, since they would not be
linearly independent.
So any basis must, like the standard basis, have exactly n elements.

17. Basis for a subspace
Consider any subspace E ⊂ Rn . We need to prove the following:

• E has a basis.
• Any two bases for E have the same number of elements, called the
dimension of E.

Before the proof, consider an example.


E ⊂ R3 is the set of vectors for which x1 + x2 + x3 = 0.
   
One basis is $\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}$.
Another basis is $\begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}$.
It’s obvious that either basis is linearly independent, since neither basis
vector is zero, and one is not a multiple of the other.
How could we establish linear independence by using row reduction?

To show that each spans is less trivial. Fortunately, in this simple case we can write an expression for the general element of E as $\begin{pmatrix} a \\ b \\ -a-b \end{pmatrix}$.
How would we express this general element as a linear combination of basis
vectors?

Now we proceed to the proof. First we must prove the existence of a basis
by explaining how to construct one.
How to make a basis for a non-empty subspace E in general:
Choose any ~v1 to get started. Notice that we need not specify a method for
doing this! The justification for this step is the so-called “axiom of choice.”
If ~v1 does not span E, choose ~v2 that is not in the span of ~v1 (not a multiple
of it). Again, we do not say how to do this, but it must be possible since
~v1 does not span E.
If ~v1 and ~v2 do not span E, choose ~v3 that is not in the span of ~v1 and ~v2
(not a linear combination).
Keep going until you have spanned the space. By construction, the set is
linearly independent. So it is a basis.
Second, we must prove that every basis has the same number of vectors.
Imagine that two people have done this and come up with bases of possibly
different sizes.
One is ~v1 , ~v2 , · · · ~vm .
The other is w ~ 2, · · · w
~ 1, w ~ p.
Since each basis spans E, we can write each w ~ j as a linear combination of
the ~v. It takes m coefficients to do this for each of the p vectors, so we end
up with an m × p matrix A, each of whose columns is one of the w ~ j.
We can also write each ~vi as a linear combination of the w ~ j . It takes p
coefficients to do this for each of the m vectors, so we end up with a p × m
matrix B, each of whose columns is one of the ~vi .
Clearly AB = I and BA = I. So A is invertible, hence square, and m = p.

18. Kernels and Images

Consider linear transformation T : Rn → Rm . This can be represented by


a matrix, but we want to stay abstract for the moment.

• The kernel of T , Ker T , is the set of vectors ~x for which T (~x) = 0.


• A system of equations T (~x) = ~b has a unique solution if and only if
Ker T is zero-dimensional.
Assume that T (~x1 ) = ~b and T (~x2 ) = ~b.
Since T is linear,
T (~x1 − ~x2 ) = ~b − ~b = 0.
If the kernel is zero-dimensional, it contains only the zero vector, and
~x1 = ~x2 .
Conversely, if the solution is unique, so that the only way ~x1 and ~x2 can
both be solutions is ~x1 = ~x2, then the kernel is zero-dimensional.
• Ker T is a subspace of Rn .
Proof:
If ~x and ~y are elements of Ker T , then, because T is linear,
T (a~x + b~y) = aT (~x) + bT (~y) = 0.
• The image of T , Img T , is the set of vectors w
~ for which ∃~v such that
~ = T (~v).
w
• Img T is a subspace of Rm .
Proof:
If ~w1 and ~w2 are elements of Img T, then
∃~v1 such that ~w1 = T(~v1) and
∃~v2 such that ~w2 = T(~v2).
T(a~v1 + b~v2) = aT(~v1) + bT(~v2) = a~w1 + b~w2.
We have shown that any linear combination of elements of Img T is
also an element of Img T .

19. Basis for the image
To find a basis for the image of T , we must find a linearly independent
set of vectors that span the image. Spanning the image is not a problem:
the columns of the matrix for T do that. The hard problem is to choose a
linearly independent set. The secret is to use row reduction.
Each nonpivotal column is a linear combination of the columns to its left,
hence inappropriate to include in a basis. It follows that the pivotal columns
of T form a basis for the image. Of course, you can permute the columns
and come up with a different basis: no one said that a basis is unique.
This process of finding a basis for Img T is carried out in Mathematica.

   
The matrix $T = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & 0 & 1 & -1 \\ 2 & 4 & 1 & 3 \end{pmatrix}$ row reduces to $\begin{pmatrix} 1 & 2 & 0 & 2 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$.
By inspecting these two matrices, find a basis for Img T . Notice that the
dimension of Img T is 2, which is less than the number of rows, and that
the two leftmost columns do not form a basis.

20. Basis for the kernel
   
The matrix $T = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & 0 & 1 & -1 \\ 2 & 4 & 1 & 3 \end{pmatrix}$ row reduces to $\begin{pmatrix} 1 & 2 & 0 & 2 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$.
To find a basis for Ker T , look at the row-reduced matrix and identify
the nonpivotal columns. For each nonpivotal column i in turn, put a 1
in the position of that column, a 0 in the position of all other nonpivotal
columns, and leave blanks in the other positions. The resulting vectors must
be linearly independent, since for each of them, there is a position where
it has a 1 and where all the others have a zero. What are the resulting
(incomplete) basis vectors for Ker T ?

Now fill in the blanks: assign values in the positions of all the pivotal
columns so that T (v~i ) = 0. The vectors v~i span the kernel, since assigning a
value for each nonpivotal variable is precisely the technique for constructing
the general solution to T (~v) = 0.
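A quick check of these kernel vectors in R (M stands for the matrix of T; avoid naming a variable T in R, since T abbreviates TRUE):

  M  <- matrix(c(1, 0, 2,  2, 0, 4,  1, 1, 1,  1, -1, 3), nrow = 3)  # the matrix above
  v1 <- c(-2, 1, 0, 0)  # 1 in nonpivotal column 2, pivotal slots filled in
  v2 <- c(-2, 0, 1, 1)  # 1 in nonpivotal column 4
  M %*% v1; M %*% v2    # both give the zero vector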

21. Rank - nullity theorem
The matrix of T : Rn → Rm has n columns. We row-reduce it and find r
pivotal columns and n − r nonpivotal columns. The integer r is called the
rank of the matrix.
Each pivotal column gives rise to a basis vector for the image; so the di-
mension of Img T is r.
Each nonpivotal column gives rise to a basis vector for the kernel; so the
dimension of Ker T is n − r.
Clearly, dim(Ker T ) + dim(Img T ) = n.
In the special case of a linear transformation T : Rn → Rn , represented by
a square n × n matrix, if the rank r = n then

• any equation T (~v) = ~b has a solution, since the image is n-dimensional.


• any equation T (~v) = ~b has a unique solution, since the kernel is 0-
dimensional.
• T is invertible.

22. Linearly independent rows
Hubbard (page 200) gives two arguments that the number of linearly inde-
pendent rows of a matrix equals its rank. Here is yet another.
Swap rows to put a nonzero row as the top row. Then swap a row that is
linearly independent of the top row into the second position. Swap a row
that is linearly independent of the top two rows into the third position.
Continue until the top r rows are a linearly independent set, while each of
the bottom m − r rows is a linear combination of the top r rows.
Now, continuing with elementary row operations, subtract appropriate mul-
tiples of the top r rows from each of the bottom rows in succession, reducing
it to zero. (This is easy in principle but hard in practice!). The top rows,
still untouched, are linearly independent, so there is no way for row reduc-
tion to convert any of them to a zero row. In echelon form, the matrix will
have r pivotal 1s: its rank is r.
It follows that r is both the number of linearly independent columns and
the number of linearly independent rows: the rank of A is equal to the rank
of its transpose AT .
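This is easy to check numerically. A quick R sketch, using base R's qr() to compute the rank of the week's example matrix:

A <- matrix(c(1, 2, 1, 1,
              0, 0, 1, -1,
              2, 4, 1, 3), nrow = 3, byrow = TRUE)
qr(A)$rank      # 2
qr(t(A))$rank   # also 2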

23. Orthonormal basis
If we have a dot product, then we can convert any spanning set of vectors into an orthonormal basis. Here is the algorithm, sometimes called the “Gram-Schmidt
process.” We will apply it to the 3-dimensional subspace of R4 for which
the components sum to zero. Details of the computation are in the Math-
ematica file.
Choose any vector w~ 1 and divide it by its length to make the first basis
vector ~v1 .
If $\vec{w}_1 = \begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix}$, what is $\vec{v}_1$?
Choose any vector w ~ 2 that is linearly independent of ~v1 and subtract off
a multiple of ~v1 to make a vector ~x that is orthogonal to ~v1 . Divide this
vector by its length to make the second basis vector ~v2 .
If $\vec{w}_2 = \begin{pmatrix} 2 \\ -1 \\ -1 \\ 0 \end{pmatrix}$, calculate $\vec{x} = \vec{w}_2 - (\vec{w}_2 \cdot \vec{v}_1)\vec{v}_1$.

Choose any vector w ~ 3 that is linearly independent of ~v1 and ~v2 , and subtract
off multiples of ~v1 and ~v2 to make a vector ~x that is orthogonal to both ~v1
and ~v2 . Divide this vector by its length to make the third basis vector ~v3 .
Continue until you can no longer find any vector that is linearly independent
of your basis vectors.
 3   1 
√ − 2√5
 1 
2 2 5
− 1  − √1  − √3 
2 , ~  2 5  2 5
Mathematica gives ~v1 =  v =
 1  2 − √3  3  , ~
v = .
 2√1 5 
 
2 2 5
− 12 1
√ 3

2 5 2 5

A nice feature of an orthogonal basis (no need for it to be orthonormal) is that any set of nonzero orthogonal vectors is linearly independent.
Proof: assume $a_1\vec{v}_1 + a_2\vec{v}_2 + \cdots + a_k\vec{v}_k = \vec{0}$.
Choose any $\vec{v}_i$ and take the dot product with both sides of this equation. By orthogonality every term vanishes except $a_i(\vec{v}_i \cdot \vec{v}_i)$, and since $\vec{v}_i \neq \vec{0}$ this forces $a_i = 0$. This holds for all i, which establishes independence.
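Here is a short R sketch of the algorithm for this subspace. The third vector w3 below is an illustrative choice of a third independent sum-zero vector, not necessarily the one used in the Mathematica file:

w1 <- c(1, -1, 1, -1)
w2 <- c(2, -1, -1, 0)
w3 <- c(1, 3, -1, -3)            # illustrative third vector (an assumption)
v1 <- w1 / sqrt(sum(w1^2))       # normalize w1
x  <- w2 - sum(w2 * v1) * v1     # remove the component along v1
v2 <- x / sqrt(sum(x^2))
x  <- w3 - sum(w3 * v1) * v1 - sum(w3 * v2) * v2
v3 <- x / sqrt(sum(x^2))
round(c(sum(v1 * v2), sum(v1 * v3), sum(v2 * v3)), 10)   # all dot products are 0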

3 Group Problems
1. Row reduction and elementary matrices

(a) By row reducing an appropriate matrix to echelon form, solve the system of equations

2x + y + z = 2
x + y + 2z = 2
x + 2y + 2z = 1
where all the coefficients and constants are elements of the finite field
Z3 . If there is no solution, say so. If there is a unique solution, specify
the values of x, y, and z. If there is more than one solution, determine
all solutions by giving formulas for two of the variables, perhaps in
terms of the third one.
(b) Find the inverse of $A = \begin{pmatrix} 1 & 2 \\ -3 & -7 \end{pmatrix}$ by using row reduction by means of
elementary matrices, as was done in sample problem 2. Confirm that
the product of the three elementary matrices that you use is indeed
the inverse. Use the familiar rule for finding a 2 × 2 inverse to
check your answer!
(c) The matrix
$A = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 2 & 3 \\ 2 & 3 & 4 \end{pmatrix}$
is not invertible. Nonetheless, there is a product E of three elementary
matrices, applied as was done in sample problem 2, that will reduce it
to echelon form. Find these three matrices and their product E.

2. Some short proofs

(a) Show that type 3 elementary matrices are not strictly necessary, be-
cause it is possible to swap rows of a matrix by using only type 1 and
type 2 elementary matrices. (If you can devise a way to swap the two rows of a 2 × 2 matrix, that is sufficient, since it is obvious how the technique generalizes.)
(b) Prove that if a set of linearly independent vectors spans a vector space
W, it is both a maximal linearly independent set and a minimal span-
ning set.
(c) Prove that in a vector space spanned by a single vector $\vec{v}$, any two vectors are linearly dependent. Then, using this result, prove that in a space spanned by two vectors $\vec{v}_1$ and $\vec{v}_2$, any three vectors $\vec{w}_1$, $\vec{w}_2$, and $\vec{w}_3$ must be linearly dependent. In the interest of simplicity, you may assume that $\vec{w}_1 = a_1\vec{v}_1 + a_2\vec{v}_2$ with $a_1 \neq 0$.
Hint: Show how to construct a linear combination of $\vec{w}_1$ and $\vec{w}_2$ and a linear combination of $\vec{w}_1$ and $\vec{w}_3$, neither of which involves $\vec{v}_1$.

3. Problems to be solved by writing or editing R scripts.

(a) The director of a budget office has to make changes to four line items
in the budget, but her boss insists that they must sum to zero. Three
of her subordinates make the following suggestions, all of which lie in
the subspace of acceptable changes:
$\vec{w}_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \\ -6 \end{pmatrix}$, $\vec{w}_2 = \begin{pmatrix} 3 \\ -2 \\ 2 \\ -3 \end{pmatrix}$, $\vec{w}_3 = \begin{pmatrix} 3 \\ 1 \\ -2 \\ -2 \end{pmatrix}$.
The boss proposes $\vec{y} = \begin{pmatrix} 1 \\ 1 \\ -2 \\ 0 \end{pmatrix}$, also acceptable on the grounds that “it is simpler.”
Express ~y as a linear combination of the w~ i . Then convert the w
~ i to
an orthonormal basis ~vi and express ~y as a linear combination of the
~vi .
(b) Find a basis for the image
 and the kernel of the matrix
3 1 1 0 4
1 0 1 1 2
A= 0 1 −2 0 1,

2 0 0 1 3
Express the columns that are not in the basis for the image as linear
combinations of the ones that are in the basis.
(c) Find two different solutions to the following set of equations in Z5 :
2x + y + 3z + w = 3
3x + 4y + 3w = 1
x + 4y + 2z + 4w = 2

(d) The R function


sample(0:2, n, replace=TRUE)
generates n random numbers, each equally likely to be 0, 1, or 2. Use
it to generate three equations of the form ax + by + cz + dw = e with
coefficients in Z3 , and solve them by row reduction. If the solution is
not unique, find two different solutions.

4 Homework
In working on these problems, you may collaborate with classmates and consult
books and general online references. If, however, you encounter a posted solution
to one of the problems, do not look at it, and email Paul, who will try to get it
removed.
For the first three problems, do the row reduction by hand. That should give
you enough practice so that you can do row reduction by hand on exams. Then
you can use R to do subsequent row reduction.

1. By row reducing an appropriate matrix to echelon form, solve the system of equations

2x + 4y + z = 2
3x + y = 1
3y + 2z = 3
over the finite field Z5 . If there is no solution, say so. If there is a unique
solution, specify the values of x, y, and z and check your answers. If there
is more than one solution, express two of the variables in terms of an arbi-
trarily chosen value of the third one. For full credit you must reduce the
matrix to echelon form, even if the answer becomes obvious!

2. (a) By using elementary matrices, find a vector that is not in the span of
$\vec{v}_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}$, $\vec{v}_2 = \begin{pmatrix} 0 \\ 2 \\ 2 \end{pmatrix}$, and $\vec{v}_3 = \begin{pmatrix} 2 \\ 4 \\ 0 \end{pmatrix}$
(b) In the process, you will determine that the given three vectors are
linearly dependent. Find a linear combination of them, with the coef-
ficient of ~v3 equal to 1, that equals the zero vector.
(c) Find a 1 × 3 matrix A such that A~v1 = A~v2 = A~v3 = 0, and use it to
check your answer to part (a).

3. This problem illustrates how you can use row reduction to express a specified
vector as a linear combination of basis vectors.
Your bakery uses flour, sugar, and chocolate to make cookies, cakes, and
brownies. The ingredients for a batch of each product is described by a
vector, as follows:
Suppose $\vec{v}_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$, $\vec{v}_2 = \begin{pmatrix} 4 \\ 2 \\ 7 \end{pmatrix}$, $\vec{v}_3 = \begin{pmatrix} 7 \\ 8 \\ 11 \end{pmatrix}$.
This means, for example, that a batch of cookies takes 1 pound of flour, 2
of sugar, 3 of chocolate.
You are about to shut down for vacation and want to clear out your inventory of ingredients, described by the vector $\vec{w} = \begin{pmatrix} 21 \\ 18 \\ 38 \end{pmatrix}$.
Use row reduction to find a combination of cookies, cakes, and brownies
that uses up the entire inventory.

4. Hubbard, exercises 2.3.8 and 2.3.11 (column operations: a few brief com-
ments about the first problem will suffice for the second. These column
operations will be used in the spring term to evaluate n × n determinants.)

5. (This result will be needed in Math 23b)


Suppose that a 2n × 2n matrix T has the following properties:

• The first n columns are a linearly independent set.


• The last n columns are a linearly independent set.
• Each of the first n columns is orthogonal to each of the last n columns.

Prove that T is invertible.


Hint: Write $\vec{w} = \vec{u} + \vec{v}$, where $\vec{u}$ is a linear combination of the first n columns and $\vec{v}$ is a linear combination of the last n columns. Start by showing that $\vec{u}$ is orthogonal to $\vec{v}$. Then exploit the fact that if $\vec{w} = \vec{0}$, $\vec{w} \cdot \vec{w} = 0$.

6. (This result will be the key to proving the “implicit function theorem,” which is central to many economic applications.)
Suppose that the m × n matrix C, where n > m, has m linearly independent columns and that these columns are placed on the left. Then we can split off a square matrix A and write C = [A|B].

(a) Let $\vec{y}$ be the (n−m)-component vector of the “active variables,” and let $\vec{x}$ be the m-component vector of passive variables such that $C\begin{pmatrix} \vec{x} \\ \vec{y} \end{pmatrix} = \vec{0}$.
Prove that $\vec{x} = -A^{-1}B\vec{y}$.
(b) Use this approach to solve the system of equations
5x + 2y + 3z + w = 0
7x + 3y + z − 2w = 0
by inverting a 2 × 2 matrix, without using row reduction or any other
elimination technique. The solution will express the “passive” vari-
ables x and y in terms of the “active” variables z and w.

The remaining problems are to be solved by writing R scripts. You may


use the rref() function whenever it works.

7. (Like group problem 3b, but in a finite field, so rref will not help!)
In R, the statement
A<-matrix(sample(0:4, 24, replace = TRUE),4)
was used to create a 4 × 6 matrix A with 24 entries in Z5 . Each entry
randomly has the value 0, 1, 2, 3, or 4.
Here is the resulting matrix:
$A = \begin{pmatrix} 3 & 0 & 4 & 0 & 2 & 2 \\ 1 & 1 & 3 & 3 & 2 & 1 \\ 0 & 2 & 1 & 1 & 4 & 2 \\ 1 & 0 & 2 & 0 & 3 & 4 \end{pmatrix}$

Use row reduction to find a basis for the image of A and a basis for the
kernel. Please check your answer for the kernel.

8. One of the seventeen problems on the first Math 25a problem set for 2014
was to find all the solutions of the system of equations
2x1 − 3x2 − 7x3 + 5x4 + 2x5 = −2
x1 − 2x2 − 4x3 + 3x4 + x5 = −2
2x1 − 4x3 + 2x4 + x5 = 3
x1 − 5x2 − 7x3 + 6x4 + 2x5 = −7
without the use of a computer.
Solve this problem using R (like script 3.1A).

9. (Like script 3.1C and group problem 3a) A neo-Cubist sculptor wants to use a basis for R3 with the following properties:

• The first basis vector $\vec{w}_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ lies along the body diagonal of the cube.

• The second basis vector $\vec{w}_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$ lies along a face diagonal of the cube.

• The third basis vector $\vec{w}_3 = \begin{pmatrix} 3 \\ 4 \\ 12 \end{pmatrix}$ has length 13.

Convert these three basis vectors to an orthonormal basis. Then make a


3×3 reflection matrix F by using this basis, and confirm that the transpose
of F is equal to its inverse.

MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #1, Week 4 (Eigenvectors and Eigenvalues)

Author: Paul Bamberg


R scripts by Paul Bamberg
Last modified: June 18, 2015 by Paul Bamberg

Reading

• Hubbard, Section 2.7

• Hubbard, pages 474-475

Proofs to present in section or to a classmate who has done them.

• 4.1 Prove that if ~v1 , · · · , ~vn are eigenvectors of A : Rn → Rn with distinct


eigenvalues λ1 · · · λn , they are linearly independent. Conclude that an n×n
matrix cannot have more than n distinct eigenvalues.

• 4.2

– For real n × n matrix A, prove that if all the polynomials pi(t) have simple real roots, then there exists a basis for Rn consisting of eigenvectors of A.
– Prove that if there exists a basis for Rn consisting of eigenvectors of A, then all the polynomials pi(t) have simple real roots.

Note - Theorem 2.7.6 in Hubbard is more powerful, because it applies to


the complex case. The proof is the same. Our proof is restricted to the real
case only because we are not doing examples with complex eigenvectors.

R Scripts

• 1.4A-EigenvaluesCharacteristic.R
Topic 1 - Eigenvectors for a 2x2 matrix
Topic 2 - Not every 2x2 matrix has real eigenvalues

• 1.4B-EigenvectorsAxler.R
Topic 1 - Finding eigenvectors by row reduction
Topic 2 - Eigenvectors for a 3 x 3 matrix

• 1.4C-Diagonalization.R
Topic 1: Basis of real eigenvectors
Topic 2 - Raising a matrix to a power
Topic 3 - What if the eigenvalues are complex?
Topic 4 - What if there is no eigenbasis?

• 1.4X-EigenvectorApplications.R
Topic 1 - The special case of a symmetric matrix
Topic 2 - Markov Process (from script 1.1D)
Topic 3 - Eigenvectors for a reflection
Topic 4 - Sequences defined by linear recurrences

1 Executive Summary
1.1 Eigenvalues and eigenvectors
If $A\vec{v} = \lambda\vec{v}$, $\vec{v}$ is called an eigenvector for A, and $\lambda$ is the corresponding eigenvalue.
For example, if $A = \begin{pmatrix} -1 & 4 \\ -2 & 5 \end{pmatrix}$, we can check that $\vec{v} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ is an eigenvector of A with eigenvalue 3.
If A is a 2 × 2 or 3 × 3 matrix, there is a quick, well-known way to find
eigenvalues by using determinants.
Rewrite A~v = λ~v as A~v = λI~v, where I is the identity matrix.
Equivalently, (A − λI)~v = ~0
Suppose that λ is an eigenvalue of A. Then the eigenvector ~v is a nonzero
vector in the kernel of the matrix (A − λI).
It follows that the matrix (A − λI) is not invertible. But we have a formula
for the inverse of a 2 × 2 or 3 × 3 matrix, which can fail only if the determinant
is zero. Therefore a necessary condition for the existence of an eigenvalue is that
det(A − λI) = 0.
The polynomial χA (λ) = det(A − λI) is called the characteristic polyno-
mial of matrix A. It is easy to compute in the 2 × 2 or 3 × 3 case, where there
is a simple formula for the determinant. For larger matrices χA (λ) is hard to
compute efficiently, and this approach should be avoided.
Conversely, suppose that χA (λ) = 0 for some real number λ. It follows that
the columns of the matrix (A − λI) are linearly dependent. If we row reduce the
matrix, we will find at least one nonpivotal column, which in turn implies that
there is a nonzero vector in the kernel. This vector is an eigenvector.
This was the standard way of finding eigenvectors until 1995, but it has two
drawbacks:

• It requires computation of the determinant of a matrix whose entries are


polynomials. Efficient algorithms for calculating the determinant of large square matrices use row-reduction techniques, which might require division by a pivotal element that is a polynomial in λ.

• Once you have found the eigenvalues, finding the corresponding eigenvectors
is a nontrivial linear algebra problem.

1.2 Finding eigenvalues - a simple example


Let $A = \begin{pmatrix} -1 & 4 \\ -2 & 5 \end{pmatrix}$. Then $A - \lambda I = \begin{pmatrix} -1-\lambda & 4 \\ -2 & 5-\lambda \end{pmatrix}$
and $\chi_A(\lambda) = \det(A - \lambda I) = (-1-\lambda)(5-\lambda) + 8 = \lambda^2 - 4\lambda + 3$.
Setting λ2 − 4λ + 3 = (λ − 1)(λ − 3) = 0, we find two eigenvalues, 1 and 3.

Finding the corresponding eigenvectors still requires a bit of algebra.
For λ = 1, $A - \lambda I = \begin{pmatrix} -2 & 4 \\ -2 & 4 \end{pmatrix}$.
By inspection we see that $\vec{v}_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$ is in the kernel of this matrix.
Check: $A\vec{v}_1 = \begin{pmatrix} -1 & 4 \\ -2 & 5 \end{pmatrix}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$ – eigenvector with eigenvalue 1.
For λ = 3, $A - \lambda I = \begin{pmatrix} -4 & 4 \\ -2 & 2 \end{pmatrix}$, and $\vec{v}_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ is in the kernel.
Check: $A\vec{v}_2 = \begin{pmatrix} -1 & 4 \\ -2 & 5 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \end{pmatrix}$ – eigenvector with eigenvalue 3.

1.3 A better way to find eigenvectors
Given matrix A, pick an arbitrary vector $\vec{w}$. Keep computing $A\vec{w}$, $A^2\vec{w}$, $A^3\vec{w}$, etc. until you find a vector that is a linear combination of its predecessors. This situation is easily detected by row reduction.
Now you have found a polynomial p of degree m such that $p(A)\vec{w} = 0$. Furthermore, this is the nonzero polynomial of lowest degree for which $p(A)\vec{w} = 0$.
Over the complex numbers, this polynomial is guaranteed to have a root λ by
virtue of the “fundamental theorem of algebra” (Hubbard theorem 1.6.13). Over
the real numbers or a finite field, it will have a root in the field only if you are lucky. Assuming that the root exists, factor it out: $p(t) = (t - \lambda)q(t)$.
Now $p(A)\vec{w} = (A - \lambda I)q(A)\vec{w} = 0$.
Thus $q(A)\vec{w}$ is an eigenvector with eigenvalue λ.
Again, let $A = \begin{pmatrix} -1 & 4 \\ -2 & 5 \end{pmatrix}$.
As the arbitrary vector $\vec{w}$ choose $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$. Then $A\vec{w} = \begin{pmatrix} -1 \\ -2 \end{pmatrix}$ and $A^2\vec{w} = \begin{pmatrix} -7 \\ -8 \end{pmatrix}$.
We need to express the third of these vectors, $A^2\vec{w}$, as a linear combination of the first two. This is done by row reducing the matrix
$\begin{pmatrix} 1 & -1 & -7 \\ 0 & -2 & -8 \end{pmatrix}$ to $\begin{pmatrix} 1 & 0 & -3 \\ 0 & 1 & 4 \end{pmatrix}$ to find that $A^2\vec{w} = 4A\vec{w} - 3\vec{w}$.
Equivalently, $(A^2 - 4A + 3I)\vec{w} = 0$.
$p(A) = A^2 - 4A + 3I$, or $p(t) = t^2 - 4t + 3 = (t-1)(t-3)$: eigenvalues 1 and 3.
To get the eigenvector for eigenvalue 1, apply the remaining factor of p(A), A − 3I, to $\vec{w}$: $\begin{pmatrix} -4 & 4 \\ -2 & 2 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -4 \\ -2 \end{pmatrix}$. Divide by −2 to get $\vec{v}_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$.
To get the eigenvector for eigenvalue 3, apply the remaining factor of p(A), A − I, to $\vec{w}$: $\begin{pmatrix} -2 & 4 \\ -2 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -2 \\ -2 \end{pmatrix}$. Divide by −2 to get $\vec{v}_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
In this case the polynomial p(t) turned out to be the same as the characteristic
polynomial, but that is not always the case.
• If we choose $\vec{w} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$, we find $A\vec{w} = 3\vec{w}$, p(A) = A − 3I, p(t) = t − 3. We need to start over with a different $\vec{w}$ to find the other eigenvalue.

• If we choose $A = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$, then any vector is an eigenvector with eigenvalue 2. So p(t) = t − 2. But the characteristic polynomial is $(t-2)^2$.

• If we choose $A = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}$, the characteristic polynomial is $(t-2)^2$. But now there is only one eigenvector. If we choose $\vec{w} = \vec{e}_1$ we find p(t) = t − 2 and the eigenvector $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$. But if we choose a different $\vec{w} = \vec{e}_2$ we find $p(t) = (t-2)^2$ and we fail to find a second, independent eigenvector.
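The whole procedure for the 2 × 2 example takes only a few lines of R (a sketch, again assuming pracma's rref()):

library(pracma)
A <- matrix(c(-1, 4,
              -2, 5), nrow = 2, byrow = TRUE)
w <- c(1, 0)
M <- cbind(w, A %*% w, A %*% A %*% w)
rref(M)                    # last column (-3, 4): A^2 w = -3 w + 4 A w
# so p(t) = t^2 - 4t + 3 = (t - 1)(t - 3); apply the remaining factors
(A - 3 * diag(2)) %*% w    # a multiple of the eigenvector for eigenvalue 1
(A - 1 * diag(2)) %*% w    # a multiple of the eigenvector for eigenvalue 3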

1.4 When is there an eigenbasis?
Choose $\vec{w}$ successively to equal $\vec{e}_1, \vec{e}_2, \cdots, \vec{e}_n$.
In searching for eigenvectors, we find successively polynomials p1 (t), p2 (t), · · · , pn (t).
There is a basis of real eigenvectors if and only if each of the polynomials pi (t)
has simple real roots, e.g. p(t) = t(t − 2)(t + 4)(t − 2.3). No repeated factors
are allowed!
A polynomial like p(t) = t2 + 1, although it has no repeated factors, has no
real roots: p(t) = (t + i)(t − i).
If we allow complex roots, then any polynomial can be factored into linear
factors (Fundamental Theorem of Algebra, Hubbard page 113).
There is a basis of complex eigenvectors if and only if each of the polynomials
pi (t) has simple roots, e.g. p(t) = t(t−i)(t+i). No repeated factors are allowed!
Our technique for finding eigenvectors works also for matrices over finite fields,
but in that case it is entirely possible for a polynomial to have no linear factors
whatever. In that case there are no eigenvectors and no eigenbasis. This is one
of the few cases where linear algebra over a finite field is fundamentally different
from linear algebra over the real or complex numbers.

1.5 Matrix Diagonalization


In the best case we can find a basis of n eigenvectors {~v1 , ~v2 , · · · , ~vn } with asso-
ciated eigenvalues {λ1 , λ2 , · · · , λn }. Although the eigenvectors must be indepen-
dent, some of the eigenvalues may repeat.
Create a matrix P whose columns are the eigenvectors. Since the eigenvectors
form a basis, they are independent and the matrix P has an inverse $P^{-1}$.
The matrix D = P −1 AP is a diagonal matrix.
Proof: D~ek = P −1 A(P~ek ) = P −1 A~vk = P −1 λk ~vk = λk P −1~vk = λk~ek .
The matrix A can be expressed as A = P DP −1 .
Proof: A~vk = P D(P −1~vk ) = P D~ek = P (λk~ek ) = λk P~ek = λk ~vk .
A diagonal matrix Dis easy to raise  to an integer
 kpower. 
λ1 0 0 λ1 0 0
For example, if D = 0 λ2 0 , then D = 0 λk2 0 
  k 
0 0 λ3 0 0 λk3
But now A = P DP −1 is also easy to rasie to a power, because Ak = P Dk P −1
(will be proved by induction)
The same result extends to kth roots of matrices, where $B = A^{1/k}$ means that $B^k = A$.
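In R, the whole story for the running example is a few lines of base R (a sketch):

A <- matrix(c(-1, 4,
              -2, 5), nrow = 2, byrow = TRUE)
P <- cbind(c(2, 1), c(1, 1))          # columns: eigenvectors for eigenvalues 1 and 3
D <- diag(c(1, 3))
P %*% D %*% solve(P)                  # recovers A
P %*% diag(c(1, 3)^5) %*% solve(P)    # A^5 without repeated multiplication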

1.6 Properties of an eigenbasis
• Even if all the eigenvalues are distinct, an eigenbasis is not unique. Any
eigenvector in the basis can be multiplied by a nonzero scalar and remain
an eigenvector.

• Eigenvectors that correspond to distinct eigenvalues are linearly indepen-


dent (your proof 4.1)

• If the matrix A is symmetric, eigenvectors that correspond to distinct eigen-


values are orthogonal.

1.7 What if there is no eigenbasis?


We consider only the case where A is a 2 × 2 matrix. If a real polynomial p(t)
does not have two distinct real roots, then it either has a repeated real root or it
has a pair of conjugate complex roots.
Case 1: Repeated root: $p(t) = (t - \lambda)^2$.
So $p(A) = (A - \lambda I)^2 = 0$.
Set $N = A - \lambda I$, so that $N^2 = 0$. The matrix N is called nilpotent.
Now $A = \lambda I + N$, and $A^2 = (\lambda I + N)^2 = \lambda^2 I + 2\lambda N$.
It is easy to prove by induction that $A^k = (\lambda I + N)^k = \lambda^k I + k\lambda^{k-1} N$.
Case 2: Conjugate complex roots:
If a 2 × 2 real matrix A has eigenvalues a ± ib, then it can be expressed in the form $A = PCP^{-1}$, where C is the conformal matrix $\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$ and P is a change of basis matrix. Since a conformal matrix is almost as easy as a diagonal matrix to raise to the nth power by virtue of De Moivre's theorem, $(r(\cos\theta + i\sin\theta))^n = r^n(\cos n\theta + i\sin n\theta)$, this representation is often useful.
Here is an algorithm for constructing the matrices C and P :
Suppose that the eigenvalues of A are a ± ib. Then A has no real eigenvectors,
and for any real w ~ we will find the polynomial
$p(t) = (t - a - ib)(t - a + ib) = (t - a)^2 + b^2$
So $p(A) = (A - aI)^2 + b^2 I = 0$, or $\left(\frac{A - aI}{b}\right)^2 = -I$.
Now we need to construct a new basis, which will not be a basis of eigenvectors
but which will still be useful.
Set $\vec{v}_1 = \vec{e}_1$, $\vec{v}_2 = \left(\frac{A - aI}{b}\right)\vec{e}_1$.
Then $(A - aI)\vec{v}_1 = b\vec{v}_2$ and $A\vec{v}_1 = a\vec{v}_1 + b\vec{v}_2$.
Also, $\left(\frac{A - aI}{b}\right)\vec{v}_2 = \left(\frac{A - aI}{b}\right)^2\vec{v}_1 = -\vec{v}_1$, so
$(A - aI)\vec{v}_2 = -b\vec{v}_1$ and $A\vec{v}_2 = a\vec{v}_2 - b\vec{v}_1$.
With respect to the new basis, the matrix that represents A is the conformal matrix $C = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}$.
b a
If we define P in the usual way with columns ~v1 and ~v2 , then A = P CP −1 ,
and the matrices P and C are real.
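A quick R check of the Case 1 power formula, using the same nilpotent matrix that appears in lecture outline problem 4 below:

lambda <- 2
N <- matrix(c(1, -1,
              1, -1), nrow = 2, byrow = TRUE)   # N %*% N is the zero matrix
A <- lambda * diag(2) + N
A %*% A %*% A %*% A                             # direct computation of A^4
lambda^4 * diag(2) + 4 * lambda^3 * N           # the formula gives the same matrix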

1.8 Applications of eigenvectors
• Markov processes
Suppose that a system can be in one of two or more states and goes through
a number of steps, in each of which it may make a transition from one state
to another in accordance with specified “transition probabilities.”
For a two-state process, the vector $\vec{v}_n = \begin{pmatrix} p_n \\ q_n \end{pmatrix}$ specifies the probabilities for the system to be in state 1 or state 2 after n steps of the process, where $0 \le p_n, q_n \le 1$ and $p_n + q_n = 1$. The transition probabilities are specified by a matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, where all the entries are between 0 and 1 and $a + c = b + d = 1$.
After a large number of steps, the state of the system is specified by $\vec{v}_n = A^n\vec{v}_0$.
The easy way to calculate $A^n$ is by diagonalizing A. If there is a “stationary state” $\vec{v}$ into which the system settles down, it corresponds to an eigenvector with eigenvalue 1, since $\vec{v}_{n+1} = A\vec{v}_n$ and $\vec{v}_{n+1} = \vec{v}_n = \vec{v}$. (A short R sketch of this appears after this list.)

• Reflections
If 2 × 2 matrix F represents reflection in a line through the origin with
direction vector ~v, then ~v must be an eigenvector with eigenvalue 1 and a
vector perpendicular to ~v must be an eigenvector with eigenvalue -1.
If 3 × 3 matrix F represents reflection in a plane P through the origin
~ then N
with normal vector N, ~ must be an eigenvector with eigenvalue -1
and there must be a two-dimensional subspace of vectors in P , all with
eigenvalue +1.

• Linear recurrences and Fibonacci-like sequences.


In computer science, it is frequently the case that the first two terms of a
sequence, a0 and a1 , are specified, and subsequent terms are specified by a
“linear recurrence” of the form $a_{n+1} = ba_{n-1} + ca_n$. The best-known example is the Fibonacci sequence (Hubbard, pages 223-225) where $a_0 = a_1 = 1$ and b = c = 1.
Then $\begin{pmatrix} a_n \\ a_{n+1} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ b & c \end{pmatrix}\begin{pmatrix} a_{n-1} \\ a_n \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ b & c \end{pmatrix}^n\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}$.
The easy way to raise the matrix $A = \begin{pmatrix} 0 & 1 \\ b & c \end{pmatrix}$ to the nth power is to diagonalize it.

• Solving systems of linear differential equations


This topic, of crucial importance to physics, will be covered after we have
done some calculus and infinite series.
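Here is the R sketch promised in the Markov bullet above. The transition matrix is made up purely for illustration (each column sums to 1):

A <- matrix(c(0.7, 0.2,
              0.3, 0.8), nrow = 2, byrow = TRUE)   # illustrative transition matrix
e <- eigen(A)
v <- e$vectors[, which.min(abs(e$values - 1))]     # eigenvector for eigenvalue 1
v / sum(v)                                         # stationary probabilities: 0.4, 0.6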

2 Lecture Outline
1. Using the characteristic polynomial to find eigenvalues and eigenvectors
If A~v = λ~v, ~v is called an eigenvector for A, and λ is the corresponding
eigenvalue.
If A is a 2 × 2 or 3 × 3 matrix, there is a quick, well-known way to find
eigenvalues by using determinants.
Rewrite A~v = λ~v as A~v = λI~v, where I is the identity matrix.
Equivalently, (A − λI)~v = ~0
Suppose that λ is an eigenvalue of A. Then the eigenvector ~v is a nonzero
vector in the kernel of the matrix (A − λI).
It follows that the matrix (A − λI) is not invertible. But we have a formula
for the inverse of a 2×2 or 3×3 matrix, which can fail only if the determinant
is zero. Therefore a necessary condition for the existence of an eigenvalue
is that det(A − λI) = 0.
The polynomial χA (λ) = det(A − λI) is called the characteristic poly-
nomial of matrix A. It is easy to compute in the 2 × 2 or 3 × 3 case, where
there is a simple formula for the determinant. For larger matrices χA (λ) is
hard to compute efficiently, and this approach should be avoided.
Conversely, suppose that χA (λ) = 0 for some real number λ. It follows
that the columns of the matrix (A − λI) are linearly dependent. If we row
reduce the matrix, we will find at least one nonpivotal column, which in
turn implies that there is a nonzero vector in the kernel. This vector is an
eigenvector.

2. A better way to find eigenvectors
Given matrix A, pick an arbitrary vector $\vec{w}$. Keep computing $A\vec{w}$, $A^2\vec{w}$, $A^3\vec{w}$, etc. until you find a vector that is a linear combination of its predecessors. This situation is easily detected by row reduction.
Now you have found a polynomial p of degree m such that p(A)~ w = 0.
Furthermore, this is the nonzero polynomial of lowest degree for which
p(A)~ w = 0.
Over the complex numbers, this polynomial is guaranteed to have a root
λ by virtue of the “fundamental theorem of algebra” (Hubbard theorem
1.6.13). Over the real numbers or a finite field, it will have a root in the
field only if you are lucky.
Citing your source: This technique was brought to the world’s attention
by Sheldon Axler’s 1995 article “Down with Determinants” (see Hubbard
page 224). Unlike most of what is taught in undergraduate math, it should
probably be cited when you use it in other courses. An informal comment
like “Using Axler’s method for finding eigenvectors...” would suffice.

3. Consider the matrix $A = \begin{pmatrix} 3 & 2 \\ 3 & 3 \end{pmatrix}$ with entries from the finite field Z5.

(a) Find the eigenvalues of A by solving the characteristic equation


det(A − λI) = 0, then find the corresponding eigenvectors. Solving a
quadratic equation over Z5 is easy – in a pinch, just try all five possible
roots!
(b) Find the eigenvalues of A by using the technique of example 2.7.5
of Hubbard. You will get the same equation for the eigenvalues, of
course, but it will be more straightforward to find the eigenvectors.
(c) Write down the matrix P whose columns are the basis of eigenvectors,
and check your answer by showing that P −1 AP is a diagonal matrix.

4. Concocting a 2 × 2 matrix without a basis of eigenvectors
Let $D = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$, $N = \begin{pmatrix} 1 & -1 \\ 1 & -1 \end{pmatrix}$. The matrix N is a so-called “nilpotent” matrix: because its kernel is the same as its image, $N^2$ is the zero matrix.

(a) Show that the matrix A = D + N has the property that if we choose
~ that is not in the kernel of N , then the polynomial p(A) is
any w
(A − 2I)2 and so there is no basis of eigenvectors.
(b) Prove by induction that Ak = Dk + kDk−1 N.

5. Eigenbases

To construct the matrix P , we need a basis of eigenvectors. A sufficient,


but not necessary, condition is that the matrix A has n distinct eigenvalues.
In examples, these will be real numbers, but the result is valid also in Cn .
Here is your proof 4.1.
If ~v1 , · · · , ~vn are eigenvectors of A : Rn → Rn with distinct eigenvalues
λ1 · · · λn , they are linearly independent.
Suppose, for a contradiction, that the eigenvectors are linearly dependent.
There exists a first eigenvector (the jth one) that is a linear combination
of its predecessors:
$\vec{v}_j = a_1\vec{v}_1 + \cdots + a_{j-1}\vec{v}_{j-1}$.
Multiply both sides by $A - \lambda_j I$. You get zero on the left, and on the right you get $a_1(\lambda_1 - \lambda_j)\vec{v}_1 + \cdots + a_{j-1}(\lambda_{j-1} - \lambda_j)\vec{v}_{j-1}$. Since $\vec{v}_1, \ldots, \vec{v}_{j-1}$ are linearly independent and each $\lambda_i - \lambda_j \neq 0$, every $a_i$ must be zero. But then $\vec{v}_j = \vec{0}$, contradicting the fact that an eigenvector is nonzero.
Since in Rn there cannot be more than n linearly independent vectors, there
are at most n distinct eigenvalues.
Proof 4.1, start to finish:

6. Finding eigenvectors

This method is guaranteed to succeed only for the field of complex num-
bers, but the algorithm is valid for any field, and it finds the eigenvectors
whenever they exist.
Given matrix A, pick an arbitrary vector $\vec{w}$. If you are really lucky, $A\vec{w}$ is a multiple of $\vec{w}$ and you have stumbled across an eigenvector. If not, keep computing $A^2\vec{w}$, $A^3\vec{w}$, etc. until you find a vector that is a linear combination of its predecessors. This situation is easily detected by row reduction.
Now you have found a polynomial p of degree m such that $p(A)\vec{w} = 0$. Furthermore, this is the nonzero polynomial of lowest degree for which $p(A)\vec{w} = 0$.
Over the complex numbers, this polynomial is guaranteed to have a root
λ by virtue of the “fundamental theorem of algebra” (Hubbard theorem
1.6.13). Over the real numbers or a finite field, it will have a root in the field only if you are lucky. Assuming that the root exists, factor it out:
$p(t) = (t - \lambda)q(t)$.
Now $p(A)\vec{w} = (A - \lambda I)q(A)\vec{w} = 0$.
Thus $q(A)\vec{w}$ is an eigenvector with eigenvalue λ.
Here is a 2 × 2 example where the calculation is easy.
Let $A = \begin{pmatrix} -1 & 4 \\ -2 & 5 \end{pmatrix}$.
As the arbitrary vector $\vec{w}$ choose $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$. Compute $A\vec{w}$ and $A^2\vec{w}$.

Use row reduction to express the third of these vectors, $A^2\vec{w}$, as a linear combination of the first two.
$\begin{pmatrix} 1 & -1 & -7 \\ 0 & -2 & -8 \end{pmatrix}$
Write the result in the form $p(A)\vec{w} = 0$.

Factor: p(t)=

To get the eigenvector for eigenvalue 1, apply the remaining factor of p(A), A − 3I, to $\vec{w}$.

To get the eigenvector for eigenvalue 3, apply the remaining factor of p(A), A − I, to $\vec{w}$.

7. Change of basis
Our “old” basis consists of the standard basis vectors ~e1 and ~e2 .
Our “new” basis consists of one eigenvector for each eigenvalue.
Let's choose $\vec{v}_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$ and $\vec{v}_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
It would be all right to multiply either of these vectors by a constant or to
reverse their order.
Write down the change of basis matrix P whose columns express the new
basis vectors in term of the old ones.

Calculate the inverse change of basis matrix P −1 whose columns express


the old basis vectors in terms of the new ones.

We are considering a linear transformation that is represented, relative to


the standard basis, by the matrix A. What diagonal matrix D represents
this linear transformation relative to the new basis of eigenvectors?

Confirm that A = P DP −1 . We have “diagonalized” the matrix A.


(For the computation: $D = \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}$, $P^{-1} = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}$, $P = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$.)

8. Eigenvectors for a 3 × 3 matrix
For Hubbard Example 2.7.5, the calculation is best subcontracted to Math-
ematica. The matrix is
$A = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}$
Since we have help with the computation, make the choice $\vec{w} = \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix}$.
The matrix to row reduce, whose columns are $\vec{w}, A\vec{w}, A^2\vec{w}, A^3\vec{w}$, is
$\begin{pmatrix} 2 & -1 & 0 & 3 \\ 3 & -1 & -3 & -9 \\ 5 & 2 & 3 & 6 \end{pmatrix}$, different from the matrix in Hubbard.
The result of row reduction is the same:
$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -3 \\ 0 & 0 & 1 & 4 \end{pmatrix}$
The rest of the work is easily done by hand.
Using the last column, write the polynomial p(t), and factor it.

Find an eigenvector that corresponds to the smallest positive eigenvalue. It


~ ; any vector will do, as long as it is not
is not necessary to use the same w
in the subspace spanned by the other eigenvectors. Hubbard uses ~e1 . Use
~e3 instead.

9. When is there an eigenbasis?
This is a difficult issue in general. The simple case is where we are lucky
and find a polynomial p of degree n that has n distinct roots. In that case
we can find n eigenvectors, and it has already been proved that they are
linearly independent. They form an eigenbasis. If the roots are real, the
eigenvectors are elements of Rn . If the roots are distinct but not all real,
the eigenvectors are still a basis of Cn .
Suppose we try each standard basis vector in turn as w ~ . Using ~ei leads to
a polynomial pi . If every pi is a polynomial of degree mi < n, the situation
is more complicated. Theorem 2.7.6 in Hubbard states the result:
There exists an eigenbasis of Cn if and only if all the roots of all the pi are
simple.
Before doing the difficult proof, look at the simplest examples of matrices that
do not have n distinct eigenvalues.
• Let $A = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$. In this case every vector in R2 is an eigenvector with eigenvalue 2. There is only one eigenvalue, but any basis is an eigenbasis.
If we choose $\vec{w} = \vec{e}_1$ and form the matrix whose columns are $\vec{w}$ and $A\vec{w}$,
$\begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix}$,

the matrix is already in echelon form.


What is p1 ?

What eigenvector do we find?

~ = ~e2 ?
What eigenvector do we find if we choose w

Key point: we found a basis of eigenvectors, even though there was


only one eigenvalue, and the polynomial (t − 2)2 never showed up.

• Let $A = \begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix}$. In this case there is only one eigenvalue and there is no eigenbasis.
What happens if we choose $\vec{w} = \vec{e}_2$?

If we choose $\vec{w} = \vec{e}_1$,
confirm that $\begin{pmatrix} 1 & 2 & 4 \\ 0 & 1 & 4 \end{pmatrix}$
row reduces to $\begin{pmatrix} 1 & 0 & -4 \\ 0 & 1 & 4 \end{pmatrix}$.

What is p1 ?

What happens when we carry out the procedure that usually gives an
eigenvector?

Key point: There was only one eigenvalue, the polynomial (t − 2)2
showed up, and we were unable to find a basis of eigenvectors.

10. An instructive 3 × 3 example
The surprising case, and the one that makes the proof difficult, is the one where there exists a basis of eigenvectors but there are fewer than n distinct eigenvalues. A simple example is $A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}$.
Here each standard basis vector is an eigenvector. For the first one the
eigenvalue is 1; for the second and third, it is 2.
A less obvious example is
$A = \begin{pmatrix} 2 & 1 & -1 \\ 0 & 2 & 0 \\ 0 & 1 & 1 \end{pmatrix}$
The procedure for finding eigenvectors is carried out in the Mathematica
file, with the following results:
Using $\vec{w} = \vec{e}_1$, we get $p_1(t) = t - 2$ and find an eigenvector $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$ with eigenvalue 2.
Using $\vec{w} = \vec{e}_2$, we get $p_2(t) = (t-1)(t-2)$ and find two eigenvectors:
$\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$ with eigenvalue 1, $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ with eigenvalue 2.
At this point we have found three linearly independent eigenvectors and we have a basis.
If we use $\vec{w} = \vec{e}_3$, we get $p_3(t) = (t-1)(t-2)$ and find two eigenvectors:
$\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$ with eigenvalue 1, $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$ with eigenvalue 2.
~ , we will get p(t) = (t − 1)(t − 2)
In general, if we use some arbitrary w
and we will find the eigenvector with eigenvalue 1 along with some linear
combination of the eigenvectors with eigenvalue 2.
Key points about this case:

• The polynomial pi (t), in order to be simple, must have degree less than
n.
• We need to use more than one standard basis vector in order to find
a basis of eigenvectors.

11. Proof that if all roots are simple there is an eigenbasis
Assume that whenever we choose w ~ = ~ei , the polynomial pi of degree mi
has simple roots. The columns of the matrix that we row reduce are
$\vec{e}_i, A\vec{e}_i, \cdots, A^{m_i}\vec{e}_i$. The image of this matrix has three properties.

• It is a subspace Ei of Rn .
• It includes mi eigenvectors. Since these correspond to distinct eigen-
values, they are linearly independent, and therefore they span Ei .
• It includes ~ei .

Now take the union of all the Ei . This union has the following properties:

• It includes each standard basis vector ~ei , so it is all of Rn .


• It is spanned by the union of the sets of eigenvectors. In general there
will be more than n vectors in this set. Use them as columns of a
matrix. The image of this matrix is all of Rn . We can find a basis for
the image consisting of n columns, which are all eigenvectors.

12. Proof that if there is an eigenbasis, each pi has simple roots.


There are k distinct eigenvalues, λ1 , · · · , λk . It is entirely possible that
k < n, since different eigenvectors may have the same eigenvalue.
Since there is a basis of eigenvectors, we can express each ~ei as a linear
combination of eigenvectors.
Define $p_i(t) = \prod_j (t - \lambda_j)$. The product extends just over the set of eigenvalues that are associated with the eigenvectors needed to express $\vec{e}_i$ as a linear combination, so there may be fewer than k factors.
Form $p_i(A) = \prod_j (A - \lambda_j I)$. The factors can be in any order. If $\vec{w}$ is any eigenvector whose eigenvalue $\lambda_j$ is included in the product, then $(A - \lambda_j I)\vec{w} = 0$ and so $p_i(A)\vec{w} = \vec{0}$. Since those eigenvectors form a basis for a subspace that includes $\vec{e}_i$, it follows that $p_i(A)\vec{e}_i = \vec{0}$.
If we form a nonzero polynomial $p_i'(t)$ of lower degree by omitting one factor from the product, then $p_i'(A)\vec{e}_i \neq \vec{0}$, since the eigenvectors that correspond to the omitted eigenvalue do not get killed off.
So $p_i(t)$ is the nonzero polynomial of lowest degree for which $p_i(A)\vec{e}_i = \vec{0}$, and by construction it has simple roots.

13. Proof 4.2, first half
Assume that whenever we choose w ~ = ~ei , the polynomial pi of degree mi
has simple roots. Consider the subspace E that is the image of the matrix
whose columns are
$\vec{e}_1, A\vec{e}_1, \cdots, A^{m_1}\vec{e}_1, \vec{e}_2, A\vec{e}_2, \cdots, A^{m_2}\vec{e}_2, \cdots, \vec{e}_n, A\vec{e}_n, \cdots, A^{m_n}\vec{e}_n$.
Prove that E = Rn (easy) and that there exists a basis for E that consists entirely of eigenvectors (harder).

14. Proof 4.2, second half
Assume that there is a basis of Rn consisting of eigenvectors of n × n matrix
A, but that A has only k ≤ n distinct eigenvalues. Prove that for any basis
~ = ~ei , the polynomial pi (t) has simple roots.
vector w

15. Conformal matrices and complex numbers
(a) Show that the polynomial p(t) for the matrix $A = \begin{pmatrix} 7 & -10 \\ 2 & -1 \end{pmatrix}$ has roots 3 ± 2i.
(b) Show that $\left(\frac{A - 3I}{2}\right)^2 = -I$.
(c) Choose a new basis with $\vec{v}_1 = \vec{e}_1$, $\vec{v}_2 = \left(\frac{A - 3I}{2}\right)\vec{e}_1$.
Use these basis vectors as the columns of matrix P.
Confirm that $A = PCP^{-1}$, where C is conformal and P is real.

16. Change of basis - nice neat case
Let $A = \begin{pmatrix} 1 & 1 \\ -2 & 4 \end{pmatrix}$, and find an eigenvector, starting with $\vec{e}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$.
Then $A\vec{e}_1 = \begin{pmatrix} 1 \\ -2 \end{pmatrix}$ and $A^2\vec{e}_1 = \begin{pmatrix} -1 \\ -10 \end{pmatrix}$.
We row-reduce $\begin{pmatrix} 1 & 1 & -1 \\ 0 & -2 & -10 \end{pmatrix}$ to $\begin{pmatrix} 1 & 0 & -6 \\ 0 & 1 & 5 \end{pmatrix}$
and conclude that $A^2\vec{e}_1 = -6\vec{e}_1 + 5A\vec{e}_1$, or $A^2\vec{e}_1 - 5A\vec{e}_1 + 6\vec{e}_1 = 0$.
Complete the process of finding two eigenvalues and show that $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 2 \end{pmatrix}$ are a pair of eigenvectors that form a basis for R2.
p(t) =

For λ = 2,

For λ = 3,

The change of basis matrix P expresses the new basis (eigenvectors) in


terms of the old (standard); so its columns are the eigenvectors. Write
down P and calculate its inverse.

Now we can check the formula

$[T]_{\{v'\},\{v'\}} = [P_{\{v'\}\to\{v\}}]^{-1}[T]_{\{v\},\{v\}}[P_{\{v'\}\to\{v\}}]$.

Calculate P −1 AP to get the diagonal matrix D relative to the new basis.


(For the computation: $A = \begin{pmatrix} 1 & 1 \\ -2 & 4 \end{pmatrix}$, $P = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$, $P^{-1} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}$.)

17. Fibonacci numbers by matrices

The usual way to generate the Fibonacci sequence is to set a0 = 1, a1 = 1,


then calculate a2 = a0 + a1 = 2, a3 = a1 + a2 = 3, etc.
In matrix notation this can be written
$\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix}$
and more generally
$\begin{pmatrix} a_n \\ a_{n+1} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^n \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.

Use this approach to determine $a_2$ and $a_3$, doing the matrix multiplication first.
$\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$

Determine $a_6$ and $a_7$ by using the square of the matrix that was just constructed.
$\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$

We have found a slight computational speedup, but it would be nicer to have a general formula for $\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^n$.

18. Powers of a diagonal matrix.
For a 2 × 2 diagonal matrix,
$\begin{pmatrix} c_1 & 0 \\ 0 & c_2 \end{pmatrix}^n = \begin{pmatrix} c_1^n & 0 \\ 0 & c_2^n \end{pmatrix}$.

The generalization to a diagonal matrix of any size is obvious.


Now suppose that we want to compute $A^n$ and can find P such that
$P^{-1}AP = \begin{pmatrix} c_1 & 0 \\ 0 & c_2 \end{pmatrix}$.

Prove by induction that

(P −1 AP )n = P −1 An P.

Now show that
$A^n = P\begin{pmatrix} c_1^n & 0 \\ 0 & c_2^n \end{pmatrix}P^{-1}$.

For the Fibonacci example, this approach works with
$c_1 = \frac{1+\sqrt{5}}{2}, \quad c_2 = \frac{1-\sqrt{5}}{2}$,
and
$P = \begin{pmatrix} 2 & 2 \\ 1+\sqrt{5} & 1-\sqrt{5} \end{pmatrix}$.
The accompanying Mathematica notebook file Outline8.nb confirms this.
We need to find a systematic way to construct the matrix P .
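A quick R check (a sketch) that these eigenvalues and this P really do the job:

M <- matrix(c(0, 1,
              1, 1), nrow = 2, byrow = TRUE)
c1 <- (1 + sqrt(5)) / 2
c2 <- (1 - sqrt(5)) / 2
P <- matrix(c(2, 2,
              1 + sqrt(5), 1 - sqrt(5)), nrow = 2, byrow = TRUE)
P %*% diag(c(c1^10, c2^10)) %*% solve(P)            # P D^10 P^{-1}
Reduce(`%*%`, replicate(10, M, simplify = FALSE))   # M^10 by brute force: the same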

3 Group Problems
1. Some interesting examples with 2 × 2 matrices

(a) Since a polynomial equation with real (or complex) coefficients always
has a root (the “fundamental theorem of algebra”), a real matrix is
guaranteed to have at least one complex eigenvalue. No such theorem
holds for polynomial equations with coefficients in a finite field, so
having no eigenvalues at all is a possibility. This is one of the few results in linear algebra that depends on the underlying field.
Consider the matrix $A = \begin{pmatrix} 3 & 1 \\ n & 3 \end{pmatrix}$ with entries from the finite field Z5.
By considering the characteristic equation, find values of n that lead
to 2, 1, or 0 distinct eigenvalues. For the case of 1 eigenvalue, find an
eigenvector.
Hint: After writing the characteristic equation with n isolated on the
right side of the equals sign, make a table of the value of t2 + 4t + 4
for each of the five possible eigenvalues. That table lets you determine
how many solutions there are for each of the five possible values of
n. When the characteristic polynomial is the square of a linear factor,
there is only one eigenvector and it is easy to construct.
(b) The matrix $A = \begin{pmatrix} 1 & -1 \\ 4 & -3 \end{pmatrix}$ has only a single eigenvalue and only one independent eigenvector.
Find the eigenvalue and eigenvector, show that A = D + N where D is diagonal and N is nilpotent, and use this analysis to calculate $A^3$ without ever multiplying A by itself (unless you want to check your answer).
(c) Extracting square roots by diagonalization.
The matrix $A = \begin{pmatrix} 2 & 1 \\ 2 & 3 \end{pmatrix}$
conveniently has two eigenvalues that are perfect squares. Find a
basis of eigenvectors and construct a matrix P such that P −1 AP is a
diagonal matrix.
Thereby find two independent square roots of A, i.e. find matrices B1
and $B_2$ such that $B_1^2 = B_2^2 = A$, with $B_2 \neq \pm B_1$. Hint: use the negative square root of one of the eigenvalues, the positive square root of the other.
If you take Physics 15c next year, you may encounter this technique
when you study “coupled oscillators.”

2. Some proofs. In doing these, you may use the fact that an eigenbasis exists
if and only if all the pi (t) have simple roots.

(a) Suppose that a 5 × 5 matrix has a basis of eigenvectors, but that its
only eigenvalues are 1 and 2. Using Hubbard Theorem 2.7.6, convince
yourself that you must make at least three different choices of ~ei in
order to find all the eigenvectors.
(b) An alternative approach to proof 4.1 – use induction.
Identify a base case (easy). Then show that if a set of k−1 eigenvectors
with distinct eigenvalues is linearly independent and you add to the
set an eigenvector ~vk with an eigenvalue λk that is different from any
of the preceding eigenvalues, the resulting set of k eigenvectors with
distinct eigenvalues is linearly independent.
(c) In general, the square matrix A that represents a Markov process has
the property that all the entries are between 0 and 1 and each column
sums to 1. Prove that such a matrix A has an eigenvalue of 1 and
that there is a “stationary vector” that is transformed into itself by A.
You may use the fact, which we have proved so far only for 2×2 and
3 × 3 matrices, that if a matrix has a nonzero vector in its kernel, its
determinant is zero.

3. Problems with 3 × 3 matrices, to be solved by writing or editing R scripts

(a) Sometimes you don’t find all the eigenvectors on the first try.
The matrix $A = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
has three real, distinct eigenvalues, and there is a basis of eigenvectors.
Find what polynomial equation for the eigenvalues arises from each of the following choices, and use it to construct as many eigenvectors as possible:
• $\vec{w} = \vec{e}_1$
• $\vec{w} = \vec{e}_3$
• $\vec{w} = \vec{e}_1 + \vec{e}_3$
(b) Find two eigenvectors for the matrix $A = \begin{pmatrix} 1 & -1 & 1 \\ -1 & 1 & 1 \\ -2 & 2 & 0 \end{pmatrix}$, and confirm that using each of the three standard basis vectors will not produce a third independent eigenvector.
Clearly the columns of A are not independent; so 0 is an eigenvalue.
This property makes the algebra really easy.
(c) Use the technique of example 2.7.5 in Hubbard to find the eigenvalues and eigenvectors of the matrix $A = \begin{pmatrix} 3 & 4 & -4 \\ 1 & 3 & -1 \\ 3 & 6 & -4 \end{pmatrix}$

4 Homework
1. Consider the sequence of numbers described, in a manner similar to the
Fibonacci numbers, by
b3 = 2b1 + b2
b4 = 2b2 + b3

bn+2 = 2bn + bn+1

(a) Write a matrix B to generate this sequence in the same way that
Hubbard generates the Fibonacci numbers.
(b) By considering the case b1 = 1, b2 = 2 and the case b1 = −1, b2 = 1,
find the eigenvectors and eigenvalues of B.
(c) Express the vector $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ as a linear combination of the two eigenvectors,
and thereby find a formula for bn if b1 = 1, b2 = 1.

2. (This is similar to group problem 1c.)


Consider the matrix $A = \begin{pmatrix} -10 & 9 \\ -18 & 17 \end{pmatrix}$.

(a) By using a basis of eigenvectors, find a matrix P such that P −1 AP is


a diagonal matrix.
(b) Find a cube root of A, i.e. find a matrix B such that B 3 = A.

3. (a) Prove that if ~v1 and ~v2 are eigenvectors of matrix A, both with the
same eigenvalue λ, then any linear combination of ~v1 and ~v2 is also
an eigenvector.
(b) Suppose that A is a 3 × 3 matrix with a basis of eigenvectors but with only two distinct eigenvalues. Prove that for any $\vec{w}$, the vectors $\vec{w}$, $A\vec{w}$, and $A^2\vec{w}$ are linearly dependent. (This is another way to understand why all the polynomials $p_i(t)$ are simple when A has a basis of eigenvectors but a repeated eigenvalue.)

4. Harvard graduate Ivana Markov, who concentrated in English and math-
ematics with economics as a secondary field, just cannot decide whether
she wants to be a poet or an investment banker, and so her career path is
described by the following Markov process:

• If Ivana works as a poet in year n, there is a probability of 0.9 that she


will feel poor at the end of the year and take a job as an investment
banker for year n + 1. Otherwise she remains a poet.
• If Ivana works as an investment banker in year n, there is a probability
of 0.7 that she will feel overworked and unfulfilled at the end of the
year and take a job as a poet for year n + 1. Otherwise she remains
an investment banker.
Thus, if $\begin{pmatrix} p_n \\ q_n \end{pmatrix}$ describes the probabilities that Ivana works as a poet or a banker respectively in year n, the corresponding probabilities for year n + 1 are given by $\begin{pmatrix} p_{n+1} \\ q_{n+1} \end{pmatrix} = A\begin{pmatrix} p_n \\ q_n \end{pmatrix}$, where $A = \begin{pmatrix} 0.1 & 0.7 \\ 0.9 & 0.3 \end{pmatrix}$.

(a) Find the eigenvalues and eigenvectors of A.


(b) Construct the matrix P whose columns are the eigenvectors, invert it, and thereby express the vector $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ as a linear combination of the eigenvectors.
(c) Suppose that in year 0 Ivana works as a poet, so that $\begin{pmatrix} p_0 \\ q_0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$.
Find an explicit formula for $\begin{pmatrix} p_n \\ q_n \end{pmatrix}$ and use it to determine $\begin{pmatrix} p_{10} \\ q_{10} \end{pmatrix}$. What happens in the limit of large n?

5. (a) Prove by induction (no “· · · ” allowed!) that if F = P CP −1 , then
F n = P C n P −1 for all positive integers n.
(b) Suppose that 2 × 2 real matrix F has complex eigenvalues re±iθ . Show
that, for integer n, F n is a multiple of the identity matrix if and only
if nθ = mπ for some integer m. Hint: write F = P CP −1 where C is
conformal. This hint also helps with the rest of the problem.
(c) If $F = \begin{pmatrix} 3 & 7 \\ -1 & -1 \end{pmatrix}$, find the smallest n for which $F^n$ is a multiple of the identity. Check your answer by matrix multiplication.
(d) If $G = \begin{pmatrix} -2 & -15 \\ 3 & 10 \end{pmatrix}$, use half-angle formulas to find a matrix A for which $A^2 = G$. Check your answer by matrix multiplication.
Problems that require writing or editing R scripts

6. (This is similar to group problem 3b.)


Use the technique of example 2.7.5 in Hubbard to find the eigenvalues and
eigenvectors of the following two matrices. One has a repeated eigenvalue
and will require you to use the technique with two different basis vectors.
(a) $A = \begin{pmatrix} 3 & 4 & -4 \\ 1 & 3 & -1 \\ 3 & 6 & -4 \end{pmatrix}$
(b) $B = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 3 & -1 \\ 1 & 2 & 0 \end{pmatrix}$
7. The matrix $A = \begin{pmatrix} 5 & 1 & 1 \\ -1 & 3 & -1 \\ 0 & 0 & 4 \end{pmatrix}$ has only one eigenvalue, 4, and so its characteristic polynomial must be $(t-4)^3$.

(a) Show that A has a two-dimensional subspace of eigenvectors but that


there is no other eigenvector.
(b) Write A = D + N where D is diagonal and N is nilpotent, and confirm
that N 2 is the zero matrix.

8. Here is a symmetric matrix, which is guaranteed to have an orthonormal
basis of eigenvectors. For once, the numbers have not been rigged to make
the eigenvalues be integers.
$A = \begin{pmatrix} 4 & -1 & 1 \\ -1 & 3 & 2 \\ 1 & 2 & -3 \end{pmatrix}$
Express A in the form P DP −1 , where D is diagonal and P is an isometry
matrix whose columns are orthogonal unit vectors.
A similar example is in script 1.4X.

MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #2, Week 1 (Number Systems and Sequences)

Authors: Paul Bamberg and Kate Penner (based on their course MATH S-322)
R scripts by Paul Bamberg
Last modified: July 24, 2015 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-7 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.

Reading

• Ross, Chapter 1, sections 1 through 5 (number systems)

• Ross, Chapter 2, sections 7 through 9 (sequences)

• Hubbard, section 0.2 (quantifiers and negation)

• Hubbard, section 0.6 (infinite sets)

Warmups (to be done before lecture)

• (Subsection 1.1) Prove by induction (review how to do this if necessary) that
$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$.
Then do the proof differently by assuming that there are positive integers
n for which the given formula is not true, letting m be the smallest such
value, and showing that your assumption led to a contradiction because the
formula would also have to be false for m − 1.

• (Subsection 1.2) Look at the axioms for an ordered field (Ross, p. 14).
Identify one of the axioms that is not satisfied by the complex numbers,
which form a field but not an ordered field.

• (Subsection 1.2) You are given an unlimited budget to build a podium, one
foot in height, for the gold medal winner in your school’s track meet. Your
only available construction material is squares of gold foil, which are very
thin. Show that the Archimedean property of the real numbers guarantees
that you can succeed.

• (Subsection 1.2) Find a way to express $\sqrt{2}$ (which is irrational) as the least upper bound of a set of rational numbers. Hint: you can write $\sqrt{2}$ in decimal notation.

• (Subsection 1.4) After a careful reading of example 1 in section 8, write out a “Formal Proof” that
$\lim \frac{1}{\sqrt{n}} = 0$.

• (Subsection 1.4) Invent an example of a sequence (sn ) of positive numbers


that is strictly decreasing (sn+1 < sn for all n) but whose limit is not zero.

• (Subsection 1.5) Students of calculus readily accept the statement


“if |a| < 1, then limn→∞ an = 0” on the basis that
“when I add 1 to n, an gets smaller in magnitude.”
However, the preceding example shows that this observation is not good
enough! Look at page 48 to see how Ross does the proof.

• (Subsection 1.5) Invent sequences (sn) and (tn) such that lim(sn) = 0 but
lim(sn tn ) = 2. Hint: look at theorem 9.4. You need to invent a (tn ) that
does not satisfy the hypotheses of this theorem.

Proofs to present in section or to a classmate who has done them.

• 5.1 Define “countably infinite.” Prove that the set of positive rational
numbers is countably infinite, but that the set of real numbers in the interval
[0,1], as represented by infinite decimals, is not countable.

• 5.2 Suppose that $s_n \neq 0$ for all n and that $s = \lim s_n > 0$.
Prove that $\exists N$ such that $\forall n > N$, $s_n > s/2$, and that $\frac{1}{s_n}$ converges to $\frac{1}{s}$.

Additional proofs (may appear on quiz; students will post pdfs or videos)

• 5.3 (Ross, p. 25; the Archimedean Property of R)


The completeness axiom for the real numbers states that every nonempty subset S ⊆ R that is bounded above has a least upper bound sup S. Use
it to prove that for any two positive real numbers a and b, there exists a
positive integer n such that na > b.

• 5.4 (Ross, page 52)


Suppose that lim sn = +∞ and lim tn > 0. Prove that lim sn tn = +∞.

R Scripts

• Script 2.1A-Countability.R
Topic 1 - The set of ordered pairs of natural numbers is countable
Topic 2 - The set of positive rational numbers is countable

• Script 2.1B-Uncountability.R
Topic 1 - Cantor’s proof of uncountability
Topic 2 - A different-looking version of the same argument

• Script 2.1C-Denseness.R
Topic 1 - Placing rational numbers between any two real numbers

• Script 2.1D-Sequences.R
Topic 1 - Limit of an infinite sequence
Topic 2 - Limit of sum = sum of limits
Topic 3 - Convergence of sequence of inverses (proof 5.2)

1 Executive Summary
1.1 Natural Numbers and Rational Numbers
• The natural numbers N are 1, 2, 3, · · · . They have the following rather
obvious properties. What is not obvious is that these five properties (the
“Peano axioms”) are sufficient to prove any other property of the natural
numbers.

– N1. 1 belongs to N.
– N2. If n ∈ N, then n + 1 ∈ N.
– N3. 1 is not the successor of any element of N.
– N4. If n and m ∈ N have the same successor, then n = m.
– N5. A subset S ⊆ N which contains 1, and which contains n + 1 whenever it contains n, must equal N.

• Axiom N5 is related to “proof by induction,” where you want to prove an


infinite set of propositions P1 , P2 , P3 , · · · .
You do this by proving P1 (the “base case”) and then proving that Pn
implies Pn+1 (the “inductive step”).

• The “least number principle” states that any nonempty subset of N has a
least element. This statement, along with the assumption that any natural
number except 1 has a predecessor, can be used to replace N5.
Practical application: instead of doing a proof by induction, you can assert
that k > 1 is the smallest integer for which Pk is false, then get a contra-
diction by showing that Pk−1 is also false, thereby proving that the set for
which Pk is false must be empty.

• The familiar rational numbers can be regarded as fractions in lowest terms: e.g. $\frac{m}{n}$ and $\frac{2m}{2n}$ represent the same rational number. The rational number $r = \frac{m}{n}$ satisfies the first-degree polynomial equation $nx - m = 0$. More generally, a number that satisfies a polynomial equation of any (finite) degree, like $x^2 - 2 = 0$ or $x^5 + x - 1 = 0$, is called an algebraic number.

• The rational numbers form a “countably infinite set,” which means that
there is a bijection between them and the natural numbers. Many proofs
rely on the fact that the rational numbers, or a subset of them, can be
enumerated as q1 , q2 , · · · .

1.2 Rational Numbers and Real Numbers
• The rational numbers and the real numbers each form an ordered field,
which means that there is a relation ≤ with properties
O1. Given a and b, either a ≤ b or b ≤ a.
O2. If a ≤ b and b ≤ a, then a = b.
O3. If a ≤ b and b ≤ c then a ≤ c.
O4. If a ≤ b, then a + c ≤ b + c.
O5. If a ≤ b and 0 ≤ c, then ac ≤ bc.
Many important properties of infinite sequences of real numbers can be
proved on the basis of ordering.

• If we think of the rational numbers or the real numbers as lying on a number


line, we can interpret the absolute value |a − b| as the distance between
point a and point b: dist(a, b) = |a − b|. In two dimensions the statement
dist(a, b) ≤ dist(a, c) + dist(c, b) means that the length of one side of a
triangle cannot exceed the sum of the lengths of the other two sides. The
name “triangle inequality” is also applied to the one-dimensional special
case where c = 0; i.e. |a + b| ≤ |a| + |b|.

• Many well-known rules of algebra are not included on the list of field axioms.
Usually, as for (−a)(−b) = ab, this is because they are easily provable
theorems. However, there are properties of the real numbers that cannot
be proved from the field axioms alone because they rely on the axiom that
the real numbers are complete. The Completeness Axiom states that
Every nonempty subset S of R that is bounded above has a least upper
bound.
This least upper bound sup S is not necessarily a member of the set S.

• The Archimedean property of the real numbers states that


for any two positive real numbers a and b, there exists a positive integer n
such that na > b. Its proof requires the Completeness Axiom.

• The rational numbers are a “dense subset” of the real numbers. This means
if a, b ∈ R and a < b, there exists r ∈ Q such that a < r < b.
Again the proof relies on the completeness of the real numbers.

• It is not unreasonable to think of real numbers as infinite decimals (though


there are complications). In this view, π (which is not even algebraic) is
the least upper bound of the set
S = {3, 3.1, 3.14, 3.141, 3.1415, 3.14159, · · · }

• The real numbers form an uncountable set. This means that there is no bi-
jection between them and the natural numbers: they cannot be enumerated
as r1 , r2 , · · · .

1.3 Quantifiers and Negation
• Quantifiers are not used by Ross, but they are conventional in mathematics
and save space when you are writing proofs.
∃ is read “there exists.” It is usually followed by “such that” or “s.t.”
Example: the proposition “∃x s.t. x2 = 4” is true since either 2 or -2 has
the desired property.
∀ is read “for all” or “for each” or “for every.” It is used to specify that some
proposition is true for every member of a possibly infinite set or sequence.
Example: ∀x ∈ R, x2 ≥ 0 is true, but ∀x ∈ R, x2 > 0 is false.

• Quantifiers and negation: useful in doing proofs by contradiction.

– The negation of “∃x such that P (x) is true” is “∀x, P (x) is false.”
– The negation of “∀x, P (x) is true” is “∃x such that P (x) is false.”

1.4 Sequences and their limits


• A sequence is really a function whose domain is the set of integers n ≥ m, usually starting with m = 0 or 1, and whose codomain (in this
module) is R. Later we will consider sequences of vectors in Rn .
A specific element is denoted sn . The entire sequence can be denoted
(s1 , s2 , · · · ) or (sn )n∈N or even just (sn ).
Although a sequence is infinite, the set of values in the sequence may be
finite; e.g. for sn = cos nπ the set of values is just {−1, 1}.

• “Limit of a sequence” always refers to the limit as n becomes very large; so


it is unambiguous to write it lim sn instead of limn→∞ sn .
Sequence (sn ) is said to converge to the limit s if
∀ε > 0, ∃N ∈ N such that ∀n > N, |sn − s| < ε.
To prove that a sequence (sn ) converges by using this definition, we have
to know or guess the value of the limit s. The rest is algebra, frequently
rather messy algebra.
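A quick numerical illustration in R, in the spirit of the course's R scripts (a sketch, not a proof; the sequence sn = 1/n and limit 0 are an assumed example): given ε, any N ≥ 1/ε does the job.

eps <- 0.01
N <- ceiling(1 / eps)          # here N = 100; n > N forces 1/n < eps
n <- (N + 1):(N + 10)          # spot-check a few indices beyond N
all(abs(1 / n - 0) < eps)      # TRUE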

• If the limit of a sequence exists, it is unique. The proof is a classic applica-


tion of the triangle inequality.

• A “formal proof” should be as concise as possible while omitting nothing


that is essential. Sometimes it obscures the chain of thought that led to the
invention of the proof. Formal proofs are nice, and you should learn how
to write them (Ross has six examples in section 8 and six more in section
9), but if your goal is to convince or instruct the reader, a longer version of
the proof may be preferable.

6
1.5 Theorems about sequences and their limits
• Theorems about limits, all provable from the definition. These will be
especially useful for us after we define continuity in terms of sequences.

– If lim sn = s then lim(ksn ) = ks.


– If lim sn = s and lim tn = t, then lim(sn + tn ) = s + t.
– Any convergent sequence is bounded:
if lim sn = s, ∃M such that ∀n, |sn | < M.
– If lim sn = s and lim tn = t, then lim(sn tn ) = st.
– If lim sn = 0 and (tn ) is bounded, then lim(sn tn ) = 0.
– If sn ≠ 0 for all n and s = lim sn ≠ 0, then inf |sn | > 0 and (1/sn )
converges to 1/s.

• Using the limit theorems above is usually a much more efficient way to find
the limit of the sequence than doing a brute-force calculation of N in terms
of ε. Ross has six diverse examples.

• The symbol +∞ has a precise meaning when used to specify a limit. We


say that “the sequence sn diverges to +∞” if
∀M > 0, ∃N such that ∀n > N, sn > M .
Similarly, we say that “the sequence sn diverges to −∞” if
∀M < 0, ∃N such that ∀n > N, sn < M .

• Theorems about infinite limits:

– If lim sn = +∞ and lim tn > 0 (could be +∞), then lim sn tn = +∞.


– If (sn ) is a sequence of positive real numbers, then lim sn = +∞ if and
only if lim(1/sn ) = 0.
– If lim sn = +∞, then lim sn + tn = +∞ if tn has any of the following
properties:
∗ lim tn > −∞
∗ tn is bounded (but does not necessarily converge).
∗ inf tn > −∞ (who cares whether tn is bounded above?).

7
2 Lecture Outline
1. Peano axioms for the natural numbers N = {1, 2, 3, · · · }

• N1. 1 belongs to N.
• N2. If n ∈ N, then n + 1 ∈ N.
• N3. 1 is not the successor of any element of N.
• N4. If n and m ∈ N have the same successor, then n = m.
• N5. A subset S ⊆ N which contains 1, and which contains n + 1
whenever it contains n, must equal N.

Axiom N5 is related to “proof by induction,” where you want to prove an


infinite set of propositions P1 , P2 , P3 , · · · .
You do this by proving P1 (the “base case”) and then proving that Pn
implies Pn+1 (the “inductive step”).
A well known example: the formula 1 + 2 + 3 + · · · + n = (1/2)n(n + 1).
For proposition P1 simply set n = 1: it is true that 1 = (1/2)(1)(1 + 1).
Write down proposition Pn , and use a little algebra to show that if Pn is in
the sequence of true propositions, then so is Pn+1
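In outline, the inductive step is the one-line computation
1 + 2 + · · · + n + (n + 1) = (1/2)n(n + 1) + (n + 1) = (1/2)(n + 1)(n + 2),
which is exactly the statement Pn+1 .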

8
A surprising replacement for axiom N5:
• Every nonempty subset of N has a smallest element.
• Any element of N except 1 has a predecessor.
Use these two statements (plus N1 through N4) to prove N5.

Practical application: instead of doing a proof by induction, you can denote


by k the smallest integer for which Pk is false, then get a contradiction by
showing that Pk−1 is also false, thereby proving that the set for which Pk
is false must be empty.
How this works in our example:
Suppose that 1 + 2 + 3 + · · · + n = (1/2)n(n + 1) is not always true. Then there
is a nonempty subset of natural numbers for which it is false. This subset
includes a smallest number k.
Using our analysis from the previous page:
How do we know that k cannot be 1?

Given that k cannot be 1, how do we know that k cannot in fact be the


smallest element for which Pk is false?

There is less to this approach than meets the eye. Instead of proving that
Pk implies Pk+1 for k ≥ 1, we showed that NOT Pk implies NOT Pk−1 for
k ≥ 2.
But these two statements are logically equivalent: quite generally, for propositions p and q, p =⇒ q if and only if ¬q =⇒ ¬p (the principle of
contraposition).
A practical rule of thumb:
• If it is easier to prove that Pk =⇒ Pk+1 , use induction.
• If it is easier to prove that ¬Pk =⇒ ¬Pk−1 , use the least-number
principle.

9
2. Proof by induction and least number principle
Students of algebra are aware that for any positive integer n, xn − y n is
divisible by x − y.

• Give a formal proof of this theorem by induction


(“formal” means no use of · · · ).
• Give an alternative proof using the fact that any nonempty set of
positive integers contains a smallest element.

10
3. (Ross, page 16; consequences of the ordered field axioms)
Using the fact that a set of numbers F (could be Q or R) satisfies the
ordered field axioms
O1. Given a and b, either a ≤ b or b ≤ a.
O2. If a ≤ b and b ≤ a, then a = b.
O3. If a ≤ b and b ≤ c then a ≤ c.
O4. If a ≤ b, then a + c ≤ b + c.
O5. If a ≤ b and 0 ≤ c, then ac ≤ bc.
prove the following:

• If a ≤ b then −b ≤ −a.
• ∀a ∈ F , a2 ≥ 0.

4. (Countability of the rational numbers - first part of proof 5.1 - script 2.1A)
Use the “diagonal trick” to prove that the positive rational numbers form
a countably infinite set.

5. (Ross, p. 25; the Archimedean Property of R and the denseness of Q -


corollary in script 2.1C)
The completeness axiom for the real numbers states that every nonempty
subset S ⊆ R that is bounded above has a least upper bound sup S. Use
it to prove that for any two positive real numbers a and b, there exists a
positive integer n such that na > b.

6. (Uncountability of the real numbers - second part of proof 5.1 - script 2.1B)
Prove that the real numbers between 0 and 1, as represented by infinite
decimals, form an uncountably infinite set.

7. (Ross, page 37 - to be done in LaTeX)


Prove that if lim sn = s and lim sn = t, then s = t.

8. (Ross, page 46 - script 2.1D)


Prove that if lim sn = s and lim tn = t, then lim(sn + tn ) = s + t.

9. (Ross, pages 45 and 47)


Prove that any convergent sequence is bounded, then use this result to show
that if lim sn = s and lim tn = t, then lim(sn tn ) = st.

10. (Ross, pages 43 and 47 - script 2.1D)


Suppose that sn ≠ 0 for all n and that s = lim sn > 0.
Prove that ∃N such that ∀n > N, sn > s/2 and that (1/sn ) converges to 1/s.

11. (Ross, page 48)


Using the binomial expansion, prove that lim(n^(1/n)) = 1.

11
12. (Ross, page 52 - to be done in LaTeX)
Suppose that lim sn = +∞ and lim tn > 0. Prove that lim sn tn = +∞.

12
13. Proofs based on nothing but the ordered field axioms
O1. Given a and b, either a ≤ b or b ≤ a.
O2. If a ≤ b and b ≤ a, then a = b.
O3. If a ≤ b and b ≤ c then a ≤ c.
O4. If a ≤ b, then a + c ≤ b + c.
O5. If a ≤ b and 0 ≤ c, then ac ≤ bc.

(a) Using the axioms for an ordered field, prove that the sum of two pos-
itive numbers is a positive number.
(b) Using the axioms for an ordered field, prove that the product of two
positive numbers is a positive number.
(c) Prove that Z5 is not an ordered field.

13
14. Least upper bound principle works for R but not for Q.
Your students at Springfield North are competing with a rival team from
Springfield South to draw up a business plan for a company with m scientists
and n other employees. Entries with m2 > 2n2 get rejected. The entry with
the highest possible ratio of scientists to other employees wins the contest.
Will this competition necessarily have a winner?

14
15. Use quantifiers to express the following concepts:

(a) “No matter how large a positive number M you choose, the sequence
(sn ) has infinitely many elements that are greater than M .”
Does this statement imply that lim sn = +∞?
(b) “No matter how small a positive number ε you choose, the sequence
(sn ) has only finitely many elements that lie outside the interval
(a − ε, a + ε).”
Does this statement imply that lim sn = a?

15
16. Proving limits by brute force
Prove by brute force that the sequence
1/3, 2/5, 3/7, 4/9, · · ·
converges to 1/2.
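A numerical sanity check in R (script-style; the brute-force proof itself must still produce N in terms of ε): here sn = n/(2n + 1), and |sn − 1/2| = 1/(2(2n + 1)), which shrinks like 1/(4n).

n <- c(1, 10, 100, 1000)
s <- n / (2 * n + 1)
abs(s - 1/2)    # about 0.16667, 0.02381, 0.00249, 0.00025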

16
17. Using limit theorems and trickery to prove limits

(a) Evaluate
lim 1/(n(√(n² + 1) − √(n² − 1))).
Note: 1/(100(√10001 − √9999)) = 0.99999999874999999...
(b) Evaluate
lim((n + 1)^(4/3) − n^(4/3)).
Note: 101^(4/3) − 100^(4/3) = 6.19907769....; 100^(1/3) = 4.6415....

17
3 Group Problems
1. Proofs that use induction

(a) Prove that for all nonnegative integers n,
Σ_{i=1}^n i³ = (Σ_{i=1}^n i)².

Hint: the following identity from warmup #1 may be useful:
Σ_{i=1}^n i = n(n + 1)/2.

(b) • Starting from xy ≤ |xy|, which looks like Cauchy-Schwarz, prove
the triangle inequality |a + b| ≤ |a| + |b| for an ordered field.
• Starting from the triangle inequality, prove that for n numbers
a1 , a2 , · · · , an
|a1 + a2 + · · · + an | ≤ |a1 | + |a2 | + · · · + |an |.
(c) • Use the Archimedean property of the real numbers to prove
if a and b are positive real numbers and a < b, there exists r ∈ Q
such that a < r < b.
If you need a hint, look at section 4.8 in Ross or run script 2.1C.
The fact that a and b are positive makes the proof easier than the
one in Ross.
• By induction, prove that in any open interval (a, b) there are in-
finitely many rational numbers.

18
2. Properties of sequences (to be done in LaTeX)

(a) The “squeeze lemma”


Consider three sequences (an ), (sn ), (bn ) such that an ≤ sn ≤ bn for all
n and lim an = lim bn = s. Prove that lim sn = s.
(b) Using quantifiers to describe sequences
Let sn denote the number of inches of snowfall in Cambridge in year n,
e.g. s2013 = 90. Using the quantifiers ∃ (there exists) and ∀ (for all),
convert the following English sentences into mathematical notation.
i. There will be infinitely many years in which the Cambridge snow-
fall exceeds 100 inches.
ii. If you wait long enough, there will come a year after which Cam-
bridge never again gets more than 20 inches of snow.
iii. The snowfall in Cambridge will approach a limit of zero.
(c) Prove that if sequence (tn ) is bounded and lim(sn ) = 0, then lim(tn sn ) =
0.

19
3. Some slightly computational problems

(a) Proving limits by brute force


Let
sn = (6n − 4)/(2n + 8).
Determine lim sn and prove your answer by brute force, directly from
the definition of limit. (For a model, see Ross, Example 2 on page 39.)
Then get the same answer more easily by using limit theorems.
(b) Finding limits by using limit theorems
Determine lim(√(n(n + 2)) − n), stating what limit theorems you are
using in each step.
Hint: Use the same trick of “irrationalizing the denominator” as in
Ross, section 8, example 5. However, that example requires using the
definition of limit. You can invoke limit theorems, which makes things
much easier.

(c) (Ross, 9.4) Let s1 = 1 and for n ≥ 1 let sn+1 = √(sn + 1).
• List the first four terms of (sn ).
• It turns out that (sn ) converges to a limit s. Assuming this fact,
prove that s = (1/2)(1 + √5).

20
4 Homework
1. Ross, exercise 1.1. Do the proof both by induction (with “base case” and
“inductive step”) and by the least number principle (show that the assump-
tion that there is a nonempty set of positive integers for which the formula
is not true leads to a contradiction)

2. Using quantifiers to describe infinite sequences


A Greek hero enters the afterlife and is pleased to learn that the goddess
Artemis is going to be training him for eternity. He will be shooting an
infinite sequence of arrows. The distance that the nth arrow travels is sn .
Use quantifiers ∃ and ∀ to convert the following to mathematical notation.

(a) He will shoot only finitely many arrows more than 200 meters.
(b) The negation of (a): he will shoot infinitely many arrows more than
200 meters. (You can do this mechanically by using the rules for
negation of statements with quantifiers.)
(c) No matter how small a positive number ε Artemis chooses, all the rest
of his shots will travel more than 200 − ε meters. (Off the record –
this idea can be expressed as lim inf sn = 200.)
(d) He will become so consistent that eventually any two of his subsequent
shots will differ in distance by less than 1 meter. (This idea will
resurface next week as the concept of “Cauchy sequence.”)

3. Denseness of Q
This problem is closely related to group problem 1c.
(a) Find a rational number x such that 355/113 < x < 22/7.
(b) Find a rational number x such that π < x < 355/113.
Hint: π = 4 arctan 1, which any decent calculator can evaluate.

4. Ross, exercise 3.6.

5. Ross, exercise 4.8. If you like this problem, you might enjoy reading enrich-
ment section 6 in Ross, which explains how to construct the real numbers
using “Dedekind cuts.”

6. Ross, Exercise 8.2(c) and 8.2(e). You might want to use the limit theorems
from section 9 to determine the limit, but then do a “Formal Proof” in the
style of the examples from section 8, working directly from the definition
of limit.

21
The last three problems must be done in LaTeX. Print the pdf file and
attach it to your handwritten solutions.

7. Ross, Exercise 8.9. The star on the exercise means that it is “referred to in
many places.”

8. Ross, Exercise 9.12. This “ratio test” may be familiar from a calculus
course. There is a similar, better known test for infinite series that is
slightly more difficult to prove.

9. Ross, Exercises 9.15 and 9.16(a). The first of these results is invoked fre-
quently in calculus courses, especially in conjunction with Taylor series, but
surprisingly few students can prove it. If you are working the problems in
order, both should be easy.

22
MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #2, Week 2 (Series, Convergence, Power Series)

Authors: Paul Bamberg and Kate Penner (based on their course MATH S-322)
R scripts by Paul Bamberg
Last modified: July 24, 2014 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-8 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.

Reading from Ross

• Chapter 2, sections 10 and 11(pp. 56-77) (monotone and Cauchy sequences,


subsequences, introduction to lim sup and lim inf)

• Chapter 2, sections 14 and 15 (pp. 95-109) (series and convergence tests)

• Chapter 4, section 23 (pp.187-192) (convergence of power series)

Warmups (to be done before lecture)

• Give an example of:

– a set that contains its supremum and infimum.


– a set that contains only its supremum.
– a set that contains neither its supremum nor infimum.

• From an analytic perspective, if given a series or sequence, how do you


show convergence? (If I gave you a sequence or series and asked whether
it converged, what would you need to compute or demonstrate to conclude
that it was convergent?) How would you show divergence?

• Review chapter 2, section 16 on decimal expansion of real numbers.


Given a repeating decimal, you can write this number as a geometric series.
Write the repeating decimal 0.363636363 · · · as a geometric series, and use
the formula
a/(1 − r)
to show that it is equal to 4/11.
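One way to organize the computation: 0.363636 · · · = 36/100 + 36/10000 + · · · is geometric with a = 36/100 and r = 1/100, so its sum is (36/100)/(1 − 1/100) = 36/99 = 4/11.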
• Give a brief explanation as to why the harmonic series Σ 1/n diverges. It
need not be rigorous - we will be exploring this in full in this class.

1
• Review the convergence tests you can remember and any specific criteria
for their applications. Use one to show that

Σ_{n=0}^∞ e^(−n²)
is convergent and another to show that
Σ_{n=1}^∞ (−1)^n/n
is convergent. Try applying the Root Test (which may be unfamiliar) to a
geometric series.
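For instance, for the geometric series Σ ar^n with a ≠ 0, the quantity in the Root Test is |ar^n|^(1/n) = |a|^(1/n) |r| → |r|, so the test reproduces exactly the familiar criterion: absolute convergence for |r| < 1, divergence for |r| > 1.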

Proofs to present in section or to a classmate who has done them.

• 6.1 Bolzano-Weierstrass

– Prove that any bounded increasing sequence converges. (You can


assume without additional proof the corresponding result, that any
bounded decreasing sequence converges.)
– Prove that every sequence (sn ) has a monotonic subsequence.
– Prove the Bolzano-Weierstrass Theorem: every bounded sequence has
a convergent subsequence.

• 6.2 The Root Test
Consider the infinite series Σ an and the lim sup |an |^(1/n), referred to as α.
Prove the following statements about Σ an :

– The series converges absolutely if α < 1.


– The series diverges if α > 1.
– If α = 1, then nothing can be deduced conclusively about the behavior
of the series.

Additional proofs (may appear on quiz; students will post pdfs or videos)

• 6.3 (Cauchy sequences) A Cauchy sequence is defined as a sequence where
∀ε > 0, ∃N s.t. ∀m, n > N, |sn − sm | < ε

– Prove that any Cauchy sequence is bounded.


– Prove that any convergent sequence is Cauchy.

2
– Prove that any Cauchy sequence of real numbers is convergent. You
will need to use something that follows from the completeness of the
real numbers. This could be the Bolzano-Weierstrass theorem, or it
could be the fact that, for a sequence of real numbers, if lim inf sn =
lim sup sn = s, then lim sn is defined and

lim sn = s

• 6.4 (Ross, p.188, Radius of Convergence)
Consider the power series Σ an x^n. Let us refer to lim sup |an |^(1/n) as β and
1/β as R. If β = 0, R = +∞ and if β = +∞, R = 0.
Prove the following:
Prove the following:

– If |x| < R, the power series converges.


– If |x| > R, the power series diverges.

3
R Scripts

• Script 2.2A-MoreSequences.R
Topic 1 – Cauchy Sequences
Topic 2 – Lim sup and lim inf of a sequence

• Script 2.2B-Series.R
Topic 1 – Series and partial sums
Topic 2 – Passing and failing the root test
Topic 3 – Why the harmonic series diverges

4
1 Executive Summary
1.1 Monotone sequences
A sequence (sn ) is increasing if sn ≤ sn+1 ∀n.
A sequence (sn ) is strictly increasing if sn < sn+1 ∀n.
A sequence (sn ) is decreasing if sn ≥ sn+1 ∀n.
A sequence (sn ) is strictly decreasing if sn > sn+1 ∀n.
A sequence that is either increasing or decreasing is called a monotone sequence.
All bounded monotone sequences converge.
For an unbounded increasing sequence, limn→∞ sn = +∞.
For an unbounded decreasing sequence, limn→∞ sn = −∞.

1.2 Supremum, infimum, maximum, minimum


The supremum of a subset S (which is a subset of some set T ) is the least
element of T that is greater than or equal to all of the elements that are in the
subset S. The supremum of the subset S definitely lives in the set T . It may also
be in S, but that is not a requirement.
The supremum of a sequence is the least upper bound of its set of elements.
The maximum is the largest value attained within a set or sequence.
It is easy to find examples of sets or sequences for which no supremum exists, or
for which a supremum exists but a maximum does not.
The infimum of a sequence is the “greatest lower bound,” or the greatest
element of T that is less than or equal to all of the elements that are in the
subset S. It is not the same as a minimum, because the minimum must be
achieved in S, while the infimum need not belong to S (it is only guaranteed
to lie in T ).

1.3 Cauchy sequences


A sequence is a Cauchy sequence if

∀ε > 0, ∃N s.t. ∀m, n > N, |sn − sm | < ε

Both convergent and Cauchy sequences must be bounded.


A convergent sequence of real numbers or of rational numbers is Cauchy.
A Cauchy sequence of real numbers is convergent.
It is easy to invent a Cauchy sequence of rational numbers whose limit is an
irrational number: the decimal truncations 1, 1.4, 1.41, 1.414, · · · of √2 are one
example.
Off the record: quantum mechanics is done in a “Hilbert space,” one of the
requirements for which is that every Cauchy sequence is convergent. Optimization
problems in economics are frequently formulated in a “Banach space,” which has
the same requirement.
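A concrete illustration in R (a sketch; Newton's iteration for x² = 2 is an assumed example, not from Ross): starting from a rational number, the recursion below produces only rational numbers, yet the sequence is Cauchy with the irrational limit √2.

x <- 1
for (i in 1:6) {
  x <- (x + 2 / x) / 2   # stays rational if x is rational
  print(x, digits = 15)
}
x - sqrt(2)              # essentially 0 at machine precision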

5
1.4 lim inf and lim sup
Given any bounded sequence, the “tail” of the sequence, which consists of the
infinite number of elements beyond the N th element, has a well-defined supremum
and infimum.
Let us combine the notion of limit with the definitions of supremum and
infimum. The “limit infimum” and “limit supremum” are written and defined as
follows:
lim inf sn = lim_{N→∞} inf{sn : n > N }
lim sup sn = lim_{N→∞} sup{sn : n > N }
The limit supremum is thus defined in the parallel manner, considering the
supremum of the tail instead of the infimum.

Now that we know the concepts of lim inf and lim sup, we find the following
properties hold:

• If lim sn is defined as a real number or ±∞, then

lim inf sn = lim sn = lim sup sn

• If lim inf sn = lim sup sn , then lim sn is defined and

lim sn = lim inf sn = lim sup sn

• For a Cauchy sequence of real numbers, lim inf sn = lim sup sn , and so the
sequence converges.
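A numerical sketch in R (in the spirit of script 2.2A, with an assumed example sequence): for sn = (−1)^n (1 + 1/n), the sup and inf of the tail {sn : n > N } approach 1 and −1, so lim sup sn = 1 and lim inf sn = −1 even though lim sn does not exist.

s <- function(n) (-1)^n * (1 + 1/n)
for (N in c(10, 100, 1000)) {
  tail_vals <- s((N + 1):(N + 100000))                  # a long stretch of the tail
  cat("N =", N, " sup ~", max(tail_vals), " inf ~", min(tail_vals), "\n")
}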

1.5 Subsequences and the Bolzano-Weierstrass theorem


A subsequence is a sequence obtained by selecting an infinite number of terms
from the “parent” sequence in order.
If (sn ) converges to s, then any subsequence selected from it also converges to s.
Given any sequence, we can construct from it a monotonic subsequence, ei-
ther an increasing sequence whose limit is lim sup sn , a decreasing sequence whose
limit is lim inf sn , or both. If the original sequence is bounded, such a monotonic
sequence must converge, even if the original sequence does not.
This construction proves one of the most useful results in all of mathematics, the
Bolzano-Weierstrass theorem:
Every bounded sequence has a convergent subsequence.

6
1.6 Infinite series, partial sums, and convergence
Given an infinite series Σ an we define the partial sum
sn = Σ_{k=m}^n ak .
The lower limit m is usually either 0 or 1.
The series Σ_{k=m}^∞ ak is said to converge when the limit of its partial sums
as n → ∞ equals some number S. If a series does not converge, it is said to
diverge. The sum Σ an has no meaning unless its sequence of partial sums
either converges to a limit S or diverges to either +∞ or −∞.
A series with all positive terms will either converge or diverge to +∞.
A series with all negative terms will either converge or diverge to −∞.
For a series with both positive and negative terms, the sum Σ an may have no
meaning.
A series is called absolutely convergent if the series Σ |an | converges.
Absolutely convergent series are also convergent.

1.7 Familiar examples


A geometric series is of the form
a + ar + ar² + ar³ + . . .
If |r| < 1, then
Σ_{n=0}^∞ ar^n = a/(1 − r).
A p-series is of the form
Σ_{n=1}^∞ 1/n^p
for some positive real number p. It converges if p > 1, diverges if p ≤ 1.

1.8 Cauchy criterion


We say that a series satisfies the Cauchy criterion if the sequence of its partial
sums is a Cauchy sequence. Writing this out with quantifiers, we have
∀ε > 0, ∃N s.t. ∀m, n > N, |sn − sm | < ε.
Here is a restatement of the Cauchy criterion, which proves more useful for
some proofs:
∀ε > 0, ∃N s.t. ∀n ≥ m > N, |Σ_{k=m}^n ak | < ε.
A series converges if and only if it satisfies the Cauchy criterion.

7
1.9 Convergence tests
• Limit of the terms. If a series converges, the limit of its terms is 0.
• Comparison Test. Consider the series Σ an of all positive terms.
If Σ an converges and |bn | < an for all n, then Σ bn also converges.
If Σ an diverges to +∞ and bn > an for all n, then Σ bn also diverges to
+∞.
• Ratio Test. Consider the series Σ an of nonzero terms.
This series converges if lim sup |an+1 /an | < 1.
This series diverges if lim inf |an+1 /an | > 1.
If lim inf |an+1 /an | ≤ 1 ≤ lim sup |an+1 /an |, then we have no information
and need to perform another test to determine convergence.
• Root Test. Consider the series Σ an , and evaluate lim sup |an |^(1/n).
If lim sup |an |^(1/n) < 1, the series Σ an converges absolutely.
If lim sup |an |^(1/n) > 1, the series Σ an diverges.
If lim sup |an |^(1/n) = 1, the test gives no information.
• Integral Test. Consider a series of nonnegative terms for which the other
tests seem to be failing. In the event that we can find a function f (x), such
that f (n) = an ∀n, we may look at the behavior of this function’s integral
to tell us whether the series converges.
If lim_{n→∞} ∫_1^n f (x)dx = +∞, then the series will diverge.
If lim_{n→∞} ∫_1^n f (x)dx < +∞, then the series will converge.
• Alternating Series Test. If the absolute value of each term in an
alternating series is decreasing and has a limit of zero, then the series con-
verges.
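A numerical sketch in R (an assumed example, not from Ross): estimating the Root Test quantity |an |^(1/n) for an = n/2^n on the log scale. The values drift down toward 1/2 < 1, consistent with absolute convergence of Σ n/2^n.

n <- c(10, 100, 1000, 10000)
alpha_n <- exp((log(n) - n * log(2)) / n)   # |a_n|^(1/n) for a_n = n/2^n
alpha_n                                     # approaches 1/2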

1.10 Convergence tests for power series


Power series are series of the form
Σ_{n=0}^∞ an x^n

where the sequence (an ) is a sequence of real numbers. A power series defines a
function of x whose domain is the set of values of x for which the series converges.
That, of course, depends on the coefficients (an ). There are three possibilities:
– Converges ∀x ∈ R.
– Converges only for x = 0.
– Converges ∀x in some interval, centered at 0. The interval may be open (−R, R),
closed [−R, R], or a mix of the two like [−R, R). The number R is called the radius
of convergence. Frequently the series converges absolutely in the interior of the
interval, but the convergence at an endpoint is only conditional.

8
Lecture Outline

1. (Ross, p. 62, convergent & Cauchy sequences) A Cauchy sequence is de-
fined as a sequence where ∀ε > 0, ∃N s.t. ∀m, n > N, |sn − sm | < ε.

(a) Prove that any Cauchy sequence is bounded.


(b) Prove that any convergent sequence is Cauchy.

2. (Ross, pp. 60-62, limits of the supremum and infimum)


The limit of the supremum, written “lim sup,” is defined as follows:
lim sup sn = lim_{N→∞} sup{sn : n > N }
The limit of the infimum, written “lim inf,” is defined as follows:
lim inf sn = lim_{N→∞} inf{sn : n > N }

(We do not restrict sn to be a bounded sequence, so if it is not bounded


above, lim sup sn = +∞, and if it is not bounded below, lim inf sn = −∞)

Let (sn ) be a sequence in R. Prove that if lim inf sn = lim sup sn = s, then
lim sn is defined and
lim sn = s

3. (Ross, p. 64, convergent & Cauchy sequences)

Using the result of the preceding proof, which relies on the completeness
axiom for the real numbers, prove that any Cauchy sequence of real num-
bers is convergent.

9
4. (Convergent subsequences, Bolzano Weierstrass)
Given a sequence (sn )n∈N , a subsequence of this sequence is a sequence
(tk )k∈N , where for each k, there is a positive integer nk such that

n1 < n2 < . . . < nk < nk+1 . . .

and tk = snk . So (tk ) is just a sampling of some, or all, of the (sn ) terms,
with order preserved.
A term sn is called dominant if it is greater than any term that follows it.

(a) Use the concept of dominant term to prove that every sequence (sn )
has a monotonic subsequence.
(b) Prove that any bounded increasing sequence converges to its least
upper bound.
(c) Prove the Bolzano-Weierstrass Theorem: every bounded sequence has
a convergent subsequence.

5. (Ross, p. 96, Example 1, geometric series (refers also to p. 98))
Prove that
Σ_{k=0}^∞ ar^k = a/(1 − r) if |r| < 1,
and that the series diverges if |r| ≥ 1.
For the sake of novelty, do the first part of the proof by using the least-
number principle instead of by induction.

10
6. (Ross, p.99-100, The Root Test)
Consider the infinite series Σ an and the lim sup |an |^(1/n), referred to as α.
Prove the following statements about Σ an :
(you may assume the Comparison Test as proven)

• The series converges absolutely if α < 1.


• The series diverges if α > 1.
• If α = 1, then nothing can be deduced conclusively about the behavior
of the series.

7. (Ross, pp. 99-100, The Ratio Test)
Let Σ an be an infinite series of nonzero terms. Prove the following (you
may assume the Root Test as proven). You may also use without proof the
following result from Ross (theorem 12.2):
lim inf |sn+1 /sn | ≤ lim inf |sn |^(1/n) ≤ lim sup |sn |^(1/n) ≤ lim sup |sn+1 /sn |
• If lim sup |an+1 /an | < 1, then the series converges absolutely.
• If lim inf |an+1 /an | > 1, then the series diverges.
• If lim inf |an+1 /an | ≤ 1 ≤ lim sup |an+1 /an |, then the test gives no
information.

8. (Ross, p.188, Radius of Convergence)
Consider the power series Σ an x^n. Let us refer to lim sup |an |^(1/n) as β
and 1/β as R. (Logically, it follows that if β = 0, R = +∞ and if β = +∞,
R = 0.)
Prove the following:
Prove the following:

• If |x| < R, the power series converges.


• If |x| > R, the power series diverges.

(You may recognize R here as the radius of convergence.)

11
9. Defining a sequence recursively (model for group problems, set 1)
John’s rich parents hope that a track record of annual gifts to Harvard will
enhance his chance of admission. On the day of his birth they set up a trust
fund with a balance s0 = 1 million dollars. On each birthday they add
another million dollars to the fund, and the trustee immediately donates
1/3 of the fund to Harvard in John’s name. After the donation, the balance
is therefore
sn+1 = (2/3)(sn + 1).
• Use R to find the annual fund balance up through s18 (a sketch appears below).
• Use induction to show sn < 2 for all n.
• Show that (sn ) is an increasing sequence.
• Show that lim sn exists and find lim sn .
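A minimal R sketch for the first bullet (one possible loop; the variable names are arbitrary):

s <- numeric(19)                 # s[k] holds s_(k-1), since R indexes from 1
s[1] <- 1                        # s_0 = 1 (million dollars)
for (k in 1:18) s[k + 1] <- (2/3) * (s[k] + 1)
round(s, 4)                      # the balances appear to increase toward 2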

12
10. What is the fallacy in the following argument?
loge 2 = 1 − 1/2 + 1/3 − 1/4 + 1/5 − 1/6 + 1/7 − 1/8 + · · · .
(1/2) loge 2 = 1/2 − 1/4 + 1/6 − 1/8 + · · · .
(3/2) loge 2 = 1 − 1/4 − 1/4 + 1/3 − 1/8 − 1/8 + 1/5 + · · · = loge 2.
3/2 = 1; 3 = 2; 1 = 0.
2

13
11. Clever proofs for p-series.
(a) Prove that Σ 1/n = +∞ by showing that the sequence of partial sums
is not a Cauchy sequence.
(b) Evaluate
Σ_{n=2}^∞ 1/(n(n − 1))
by exploiting the fact that this is a “telescoping series.”
(c) Prove that
Σ_{n=2}^∞ 1/n²
is convergent.
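Hint for (b): the key identity is 1/(n(n − 1)) = 1/(n − 1) − 1/n, which makes the partial sums telescope; (c) then follows from (b) by comparison, since 1/n² ≤ 1/(n(n − 1)) for n ≥ 2.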

14
12. For the sequence
sn = ((n + 2)/(n + 1)) sin(nπ/4),
give three examples of a subsequence, find the lim sup and the lim inf, and
determine whether it converges.

15
13. A case where the root test outperforms the ratio test
(Ross, Example 8 on page 103)
Σ_{n=0}^∞ 2^((−1)^n − n) = 2 + 1/4 + 1/2 + 1/16 + 1/8 + 1/64 + · · · .
(a) Show that the ratio test fails totally.
(b) Show that the root test correctly concludes that the series is convergent.
(c) Find a simpler argument using the comparison test.

16
14. (Model for group problems, set 3) Find the radius of convergence and the
exact interval of convergence for the series
Σ_{n=0}^∞ (n/2^n) x^(3n) ,

(a) by using the Root Test.


(b) by using the Ratio Test.

17
2 Group Problems
1. Subsequences, monotone sequences, lim sup and lim inf

(a) (Ross, 11.4) Here are four sequences:
an = (−2)^n , xn = 5^((−1)^n) , yn = 1 + (−1)^n , dn = n cos(nπ/4).
i. For each sequence, give an example of a monotone subsequence.
ii. For each sequence, give its set of subsequential limits.
iii. For each sequence, give its lim sup and lim inf.
iv. Which of the sequences is bounded? converges? diverges to +∞?
diverges to −∞?
(b) (Ross, 12.4)
Show that lim sup(sn + tn ) ≤ lim sup sn + lim sup tn for bounded se-
quences (sn ) and (tn ), and invent an example where lim sup(sn + tn ) <
lim sup sn + lim sup tn . There is a hint on page 82 of Ross.
(c) The following famous series, known as Gregory’s series but discovered
by the priest-mathematicians of southwest India long before James
Gregory (1638-1675) was born, converges to π/4.
π/4 = 1 − 1/3 + 1/5 − 1/7 + 1/9 − · · · .
i. For the sequence of partial sums (sn ), find an increasing subse-
quence and a decreasing subsequence.
ii. Prove that lim sup sn = lim inf sn
iii. Prove that the series is not absolutely convergent by showing that
it fails the Cauchy test with ε = 1/2.

18
2. Sequences, defined recursively
Feel free to use R to calculate the first few terms of the sequence instead
of doing it by hand. Using a for loop, you can easily calculate as many
terms as you like. By modifying script 2.2C, you can easily plot the first
20 or so terms. If you come up with a good R script, please upload it to
the solutions page. (A skeleton loop is sketched below.)
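A reusable skeleton (the update line uses the recursion from part (b) as a placeholder — swap in whichever recursion you are studying; the plot mirrors the idea of script 2.2C):

s <- numeric(20)
s[1] <- 1                                        # the given starting value
for (n in 1:19) s[n + 1] <- (1/3) * (s[n] + 1)   # placeholder update rule
s                                                # the first 20 terms
plot(1:20, s, type = "b", xlab = "n", ylab = "s_n")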
(a) (Ross, 10.9) Let s1 = 1 and sn+1 = (n/(n + 1)) sn² for n > 1.
• Find s2 , s3 , s4 if working by hand. If using R, use a for loop to go
at least as far as s20 .
• Show that lim sn exists.
• Prove that lim sn = 0.
(b) (Ross, 10.10) Let s1 = 1 and sn+1 = (1/3)(sn + 1) for n > 1.
• Find s2 , s3 , s4 if working by hand. If using R, use a for loop to go
at least as far as s20 .
• Use induction to show sn > 1/2 for all n.
• Show that (sn ) is a decreasing sequence.
• Show that lim sn exists and find lim sn .
(c) (Ross, 10.12) Let t1 = 1 and tn+1 = [1 − 1/(n + 1)²] tn for n > 1.
• Find t2 , t3 , t4 if working by hand. If using R, use a for loop to go
at least as far as t20 .
• Show that lim tn exists.
• Use induction to show tn = (n + 1)/(2n) for all n.
• Find lim tn .

19
This last set of problems should be done using LaTeX. They provide good
practice with summations, fractions, and exponents.

3. Applying convergence tests to power series (Ross, 23.1 and 23.2)


Find the radius of convergence R and the exact interval of convergence.
In each case, you can apply the root test (works well with powers) or the
ratio test (works well with factorials) to get an equation that can be solved
for x to get the radius of convergence R. Since you have an x^n , the root test,
which you may not have encountered in AP calculus, is especially useful. At
the endpoints you may need to apply something like the alternating series
test or the integral test.
Remember that lim n^(1/n) = 1.

(a)
Σ (2^n/n!) x^n and Σ x^(n!) .
(b)
Σ (3^n/(n · 4^n)) x^n and Σ √n x^n .
(c)
Σ ((−1)^n/(n² 4^n)) x^n and Σ (3^n/√n) x^n .

20
3 Homework
1. Ross, 10.2 (Prove all bounded decreasing sequences converge.)

2. Ross, 10.6

3. Ross, 11.8.

4. Suppose that (sn ) is a Cauchy sequence and that the subsequence (s1 , s2 , s4 , s8 , s16 , · · · )
converges to s. Prove that lim sn = s. Hint: use the standard bag of tricks:
the triangle inequality, epsilon-over-2, etc.

5. Sample problem 2 shows that in general, the order of terms in a series must
be respected when calculating the sum. However, addition is commutative
and associative, which makes it surprising that order should matter.

• Prove that if a series (an ) has only positive terms, then its sum is
equal to the least upper bound of the numbers that can be obtained
by summing over any finite subset of the terms.
Hint: Call this least upper bound S ′ . Call the sum as defined by Ross
S. Prove that S ′ ≤ S and that S ≤ S ′ .
• Suppose that a series includes both positive and negative terms and
its sum is S. It looks as though you can split it into a series of non-
negative terms and a series of negative terms, sum each separately,
then combine the results. Will this approach work for the series in
sample problem 2?

6. Ross, 14.3 (Determining whether a series converges. Apologies to those who


have already done hundreds of these in a high-school course.)

7. Ross, 14.8.

8. Ross, 15.6

9. Ross, 23.4. You might find it useful to have R generate some terms of the
series.

10. Ross, 23.5

21
MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #2, Week 3 (Limits and continuity of functions)

Authors: Paul Bamberg and Kate Penner (based on their course MATH S-322)
R scripts by Paul Bamberg
Last modified: July 24, 2014 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-8 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.

Reading from Ross

• Chapter 3, sections 17 and 18. (continuity)

• Chapter 3, sections 19 and 20 (uniform continuity and limits of functions)

Warmups (to be done before lecture)

• Study example 1 on page 125, then invent a similar argument for the func-
tion f (x) = x2 − 2x + 1. It is important to realize that a proof can be done
“for all sequences.”


• The function g(x) = sin(1/x) for x ≠ 0, g(0) = 0
is discontinuous at x = 0. Show that the sequence xn = 2/((4n + 1)π), for
which sin(1/xn ) = 1 for every n, can be used as a “bad sequence” to prove
this assertion.

• Suppose that a function f (x) has the property that the image of the interval
I = [0, 2] is the interval J = [0, 1] ∪ [2, 3]. Invent a discontinuous function f
with this property and convince yourself that no continuous function can
have this property.

• When you define the arc sine function in a calculus course, you begin by
restricting the domain of the sine function to the interval [− π2 , π2 ]. Convince
yourself that this restriction makes Theorems 18.4 and 18.5 apply, while
restricting the domain to [0, π] would not work. Which restricted domain
works for defining the arc cosine function?

• Read through examples 1-3 in section 19.1 of Ross. You can skip over the
computational details. The key issue is this:
On the interval (0, ∞) the function f (x) = 1/x² is continuous for any specified
x0 . However, when x0 is very small, the δ that is needed to prove continuity
must be proportional to x0³. There is no “one size fits all” δ that is indepen-
dent of x0 . Example 3 shows that even with ε = 1, it is impossible to meet
the requirement for uniform continuity. When you draw the graph of f (x),
you see what the problem is: the derivative of f (x), which is essentially the
ratio of ε to δ, is unbounded.

• The squaring function f (x) = x2 is continuous. However, its derivative is


unbounded on [0, ∞), and the function is not uniformly continuous. Con-
vince yourself that no matter how small you require |y − x| to be, you can
always make |f (y) − f (x)| be as large as you like simply by making y and
x be large.

• Now you have seen two ways to select a function and an interval so that the
function is continuous but not uniformly continuous on the interval. Read
through the rest of section 19.1 to see how to avoid this situation. There
are four ways:

– Make the interval be closed and bounded.


– If the interval is not closed, make it closed by including its endpoints,
and “extend” the function so that it remains continuous.
– The problem is related to an unbounded derivative: if f 0 (x) is bounded,
it goes away.
– If f turns a Cauchy sequence (xn ) into a Cauchy sequence (f (xn )),
there is no problem.

• Think hard about definition 20.1. This is not the definition of limit that
is found in most calculus texts, but it is in some ways better because it
incorporates the ideas of “limit at infinity” and “increases without limit.”

• Look at theorems 20.4 and 20.5, and convince yourself that they are crucial
for proving the well-known formulas for derivatives that are in every calculus
course. If you are fond of entertaining counterexamples, look at example 7
on page 158.

Proofs to present in section or to a classmate who has done them.

• 7.1 Suppose that a < b, f is continuous on [a, b], and f (a) < y < f (b).
Prove that there exists at least one x ∈ [a, b] such that f (x) = y.
Use Ross’s “no bad sequence” definition of continuity, not the epsilon-delta
definition.

• 7.2 Using the Bolzano-Weierstrass theorem, prove that if function f is con-


tinuous on the closed interval [a, b], then f is uniformly continuous on [a, b].

2
Additional proofs (may appear on quiz; students will post pdfs or videos)

• 7.3 Prove that if f and g are real-valued functions that are continuous at
x0 ∈ R, then f + g is continuous at x0 . Do the proof twice: once using the
“no bad sequence” definition of continuity and once using the epsilon-delta
definition of continuity.

• 7.4 (Ross, page 146; uniform continuity and Cauchy sequences)


Prove that if f is uniformly continuous on a set S and (sn ) is a Cauchy
sequence in S, then (f (sn )) is a Cauchy sequence. Invent an example where
f is continuous but not uniformly continuous on S and (f (sn )) is not a
Cauchy sequence.

3
R Scripts

• Script 2.3A-Continuity.R
Topic 1 - Two definitions of continuity
Topic 2 – Uniform continuity

• Script 2.3B-IntermediateValue.R
Topic 1 - Proving the intermediate value theorem
Topic 2 - Corollaries of the IVT

4
1 Executive Summary
1.1 Two equivalent definitions of continuity
• Continuity in terms of sequences
This definition is not standard: Ross uses it, but many authors use the
equivalent epsilon-delta definition. Here is some terminology that students
find useful when discussing the concept:

– If lim xn = x0 and lim f (xn ) = f (x0 ), we call xn a “good sequence.”


– If lim xn = x0 but lim f (xn ) ≠ f (x0 ), we call xn a “bad sequence.”

Then “function f is continuous at x0 ” means “every sequence is a good


sequence”; i.e. “there are no bad sequences.”

• The more conventional definition:


Let f be a real-valued function with domain U ⊂ R. Then f is continuous
at x0 ∈ U if and only if
∀ε > 0, ∃δ > 0 such that if x ∈ U and |x − x0 | < δ, |f (x) − f (x0 )| < ε.

• Which definition to use?


To prove that a function is continuous, it is often easier to use the second
version of the definition. Start with a specified ε, and find a δ (not “the
δ”) that does the job. However, as Ross Example 1a on page 125 shows,
the first definition, combined with the limit theorems that we have already
proved, can let us prove that an arbitrary sequence is good.
To prove that a function is discontinuous, the first definition is generally
more useful. All you have to do is to construct one bad sequence.

5
1.2 Useful properties of continuous functions
• New continuous functions from old ones.

– If f is continuous at x0 , then |f | is continuous at x0 .


– If f is continuous at x0 , then kf is continuous at x0 .
– If f and g are continuous at x0 , then f + g is continuous at x0 .
– If f and g are continuous at x0 , then f g is continuous at x0 .
– If f and g are continuous at x0 and g(x0 ) ≠ 0, then f /g is continuous
at x0 .
– If f is continuous at x0 and g is continuous at f (x0 ), then the composite
function g ◦ f is continuous at x0 .

Once you know that the identity function and elementary functions like nth
root, sine, cosine, exponential, and logarithm are continuous (Ross has not
yet defined most of these functions!), you can state the casual rule
“If you can write a formula for a function that does not involve
division by zero, that function is continuous everywhere.”

• Theorems about a continuous function on a closed interval [a, b] (an exam-


ple of a “compact set”), easy to prove by using the Bolzano-Weierstrass
theorem.

– f is a bounded function.
– f achieves its maximum and minimum values on the interval (i.e. they
are not just approached as limiting values).

• The Intermediate Value Theorem and some of its corollaries.


It is impossible to do calculus without either proving these theorems or
stating that they are obvious!
Now f is assumed continuous on an interval I that is not necessarily closed
(e.g. 1/x on (0, 1]).

– IVT: If a < b and y lies between f (a) and f (b), there exists at least
one x in (a, b) for which f (x) = y.
– The image of an interval I is either a single point or an interval J.
– If f is a strictly increasing function on I, there is a continuous strictly
increasing inverse function f −1 : J → I.
– If f is a strictly decreasing function on I, there is a continuous strictly
decreasing inverse function f −1 : J → I.
– If f is one-to-one on I, it is either strictly increasing or strictly de-
creasing.

6
1.3 Continuity versus uniform continuity
It’s all a matter of the order of quantifiers. For continuity, y is agreed upon
before the epsilon-delta game is played. For uniform continuity, a challenge is
made using some ε > 0, then a δ has to be chosen that meets the challenge
independent of y.
For function f whose domain is a set S:
• Continuity: ∀y ∈ S, ∀ε > 0,
∃δ > 0 such that ∀x ∈ S, |x − y| < δ implies |f (x) − f (y)| < ε.
• Uniform continuity: ∀ε > 0,
∃δ > 0 such that ∀x, y ∈ S, |x − y| < δ implies |f (x) − f (y)| < ε.

• On [0, ∞) (not a bounded set), the squaring function is continuous but not
uniformly continuous.
• On (0, 1) (not closed) the function f (x) = 1/x is continuous but not uniformly
continuous.

• On a closed, bounded interval [a, b], continuity implies uniform continuity.


The proof uses the Bolzano-Weierstrass theorem.

• By definition, if a function is continuous at s ∈ S and (sn ) converges to s,


then (f (sn )) converges to f (s). If (sn ) is merely Cauchy, we know that it
converges, but not what it converges to. To guarantee that (f (sn )) is also
Cauchy, we must require f to be uniformly continuous.

• On an open interval (a, b) a function can be continuous without being uni-
formly continuous. However, if we can extend f to a function f̄, defined so
that f̄ is continuous at a and b, then f̄ is uniformly continuous on [a, b] and f
is uniformly continuous on (a, b). The most familiar example is f (x) = (sin x)/x
on (0, ∞), extended by defining f̄ (0) = 1.

• Alternative criterion for uniform continuity (sufficient but not necessary):
f is differentiable on (a, b), with f ′ bounded on (a, b).

7
1.4 Limits of functions
1. Definitions of “limit”

• Ross’s definition of limit, consistent with the definition of continuity:


S is a subset of R, f is a function defined on S, and a and L are real
numbers, ∞ or −∞. Then limx→aS f (x) = L means
for every sequence (xn ) in S with limit a, we have lim(f (xn )) = L.
• The conventional epsilon-delta definition:
f is a function defined on S ⊂ R, a is a real number in the closure of S
(not ±∞) and L is a real number (not ±∞). limx→a f (x) = L means
∀ε > 0, ∃δ > 0 such that if x ∈ S and |x − a| < δ, then |f (x) − L| < ε.

2. Useful theorems about limits, useful for proving differentiation rules.


Note: a can be ±∞ but L has to be finite.
Suppose that L1 = limx→aS f1 (x) and L2 = limx→aS f2 (x) exist and are
finite.
Then

• limx→aS (f1 + f2 )(x) = L1 + L2 .
• limx→aS (f1 f2 )(x) = L1 L2 .
• limx→aS (f1 /f2 )(x) = L1 /L2 , provided L2 ≠ 0 and f2 (x) ≠ 0 for x ∈ S.

3. Limit of the composition of functions


Suppose that L = limx→aS f (x) exists and is finite.
Then limx→aS (g ◦ f )(x) = g(L) provided

• g is defined on the set {f (x) : x ∈ S}.


• g is defined at L
(which may just be a limit point of the set {f (x) : x ∈ S}.)
• g is continuous at L.

4. One-sided limits
We can modify either definition to provide a definition for L = limx→a+ f (x).

• With Ross’s definition, choose the set S to include only values that
are greater than a.
• With the conventional definition, consider only x > a: i.e.
a < x < a + δ implies |f (x) − L| < ε.

It is easy to prove that


limx→a f (x) = L if and only if limx→a+ f (x) = limx→a− f (x) = L.

8
Lecture outline

1. (Ross, page 124)


For specified x0 and function f , define the following terminology:

• If lim xn = x0 and lim f (xn ) = f (x0 ), we call xn a “good sequence.”


• If lim xn = x0 but lim f (xn ) ≠ f (x0 ), we call xn a “bad sequence.”

Then Ross’s definition of continuity is “every sequence is a good sequence.”


Prove the following, which is the more conventional definition:
Let f be a real-valued function with domain U ⊂ R. Then f is continuous
at x0 ∈ U if and only if
∀ε > 0, ∃δ > 0 such that if x ∈ U and |x − x0 | < δ, |f (x) − f (x0 )| < ε.

2. (Ross, page 128)


Prove that if f and g are real-valued functions that are continuous at x0 ∈ R,
then f + g is continuous at x0 .

3. (Ross, page 133)


Let f be a continuous real-valued function on a closed interval [a, b]. Using
the Bolzano-Weierstrass theorem, prove that f is bounded and that f achieves
its maximum value: i.e. ∃y0 ∈ [a, b] such that f (x) ≤ f (y0 ) for all x ∈ [a, b].

4. (Ross, page 134: the intermediate value theorem)


Suppose that a < b, f is continuous on [a, b], and f (a) < y < f (b). Prove
that there exists at least one x ∈ [a, b] such that f (x) = y.
Use Ross’s “no bad sequence” definition of continuity, not the epsilon-delta
definition.

5. (Ross, page 143)


Using the Bolzano-Weierstrass theorem, prove that if function f is contin-
uous on the closed interval [a, b], then f is uniformly continuous on [a, b].

6. (Ross, page 146)


Prove that if f is uniformly continuous on a set S and (sn ) is a Cauchy
sequence in S, then (f (sn )) is a Cauchy sequence. Invent an example where
f is continuous but not uniformly continuous on S and (f (sn )) is not a
Cauchy sequence.

9
7. (Ross, page 156)
Use Ross’s non-standard but excellent definition of limit.
S is a subset of R, f is a function defined on S, and a and L are real
numbers, ∞ or −∞.
Then limx→aS f (x) = L means
for every sequence (xn ) in S with limit a, we have lim(f (xn )) = L.
Suppose that L1 = limx→aS f1 (x) and L2 = limx→aS f2 (x) exist and are
finite.
Prove that limx→aS (f1 + f2 )(x) = L1 + L2 and
limx→aS (f1 f2 )(x) = L1 L2 .

8. (Ross, page 159; conventional definition of limit)


Let f be a function defined on S ⊂ R, let a be in the closure of S, and let
L be a real number.
Prove that limx→a f (x) = L if and only if
∀ε > 0, ∃δ > 0 such that
if x ∈ S and |x − a| < δ, then |f (x) − L| < ε.

10
9. Using the “bad sequence” criterion to show that a function is discontinuous.
The “signum function” sgn(x) is defined as x/|x| for x ≠ 0, 0 for x = 0.
Invent a “bad sequence,” none of whose elements is zero, to prove that
sgn(x) is discontinuous at 0, then show that for any positive x, no such bad
sequence can be constructed.
Restate this proof that sgn(x) is discontinuous at x = 0, continuous for
positive x, in terms of the epsilon-delta definition.

11
10. Prove that the function
C(x) = 1 − x²/2 + x⁴/24
is equal to zero for one and only one value x ∈ [1, 2].
This result will be useful when we define π without trigonometry.
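A numerical preview in R (script-style; this is evidence, not the proof): C changes sign on [1, 2], and uniroot locates the zero there.

C <- function(x) 1 - x^2 / 2 + x^4 / 24
c(C(1), C(2))               # 0.5417 and -0.3333: opposite signs
uniroot(C, c(1, 2))$root    # approximately 1.5924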

12
11. Uniform continuity (or lack thereof)
Let f (x) = x² + 1/x².
Determine whether f is or is not uniformly continuous on each of the fol-
lowing intervals:

(a) [1, 2]
(b) (0, 1]
(c) [2, ∞)
(d) (1, 2)

13
12. Uniform continuity
Show that on the open interval (0, π) the function
f (x) = (1 − cos x)/x²
is uniformly continuous by using the “extension” approach.

14
13. Limits by brute force
(a) Use the epsilon-delta definition of limit to prove that limx→0 √|x| = 0.
(b) Use the sequence definition of limit to show that limx→0 x/|x| does not
exist.

15
14. Limits that involve roots
Use the sum and product rules for limits to evaluate
lim_{x→1} (x^(1/3) − 1)/(x − 1).

16
2 Group Problems
1. Proofs about continuity
For (a) and (b), do two different versions of the proof:

• Use the “no bad sequence definition” and invoke a result for sequences
from week 1.
• Use the epsilon-delta definition and mimic the proof for sequences from
week 1.

(a) Prove that if f and g are real-valued functions that are continuous at
x0 ∈ R, then f g is continuous at x0 . (Hint: on any closed interval [x0 −
a, x0 + b] in the domain of f , the continuous function f is bounded.)
(b) Prove that if f is continuous at x0 ∈ R, and g is continuous at f (x0 ),
then the composite function g ◦ f is continuous at x0 .
(c) • The Heaviside function H(x) is defined by H(x) = 0 for x <
0, H(x) = 1 for x ≥ 0 Using the “no bad sequence” definition,
prove that H is discontinuous at x = 0.
• Using the epsilon-delta definition of continuity, prove that f (x) =
x³ is continuous for arbitrary x0 . (Hint: first deal with the special
case x0 = 0, then notice that for small enough δ, |x| < 2|x0 |.)

17
2. Uniform continuity; intermediate-value theorem

(a) Uniform continuity, or lack thereof
• Show that f (x) = x² is not uniformly continuous on the interval
[0, ∞).
• Show that f (x) = 1/(1 − x) is not uniformly continuous on [0, 1).
• Show that f (x) = sin x is uniformly continuous on the open inter-
val (0, π).
(b) Using the intermediate-value theorem
As a congressional intern, you are asked to propose a tax structure for
families with incomes in the range 2 to 4 million dollars inclusive. Your
boss, who feels that proposing a tax rate of exactly 50% for anyone
would be political suicide, wants a function T (x) with the following
properties:
• It is continuous.
• Its domain is [2,4].
• Its codomain is [1,2].
• There is no x for which 2T (x) = x.
Prove that this set of requirements cannot be met by applying the
intermediate-value theorem to the function x−2T (x), which is negative
if the tax rate exceeds 50%.
Then prove “from scratch” that this set of requirements cannot be
met, essentially repeating the proof of the IVT. Hint: Consider the
least upper bound of the set of incomes S ⊆ [2, 4] for which the tax
rate is less than 50%, and construct a pair of good sequences.
(c) Continuous functions on an interval that is not closed
Let S = [0, 1). Invent a sequence xn ∈ S that converges to a number
x0 ∉ S. Hint: try x1 = 1/2, x2 = 3/4. Then, using this sequence, invent an
unbounded continuous function on S and invent a bounded continuous
function on S that has no maximum.

18
3. Calculation of limits (do these in LaTeX to get practice with fractions and
functions)

(a) Limits by brute force
i. Use the epsilon-delta definition of limit to prove that limx→0 x sin(1/x) = 0.
ii. Use the sequence definition of limit to show that limx→0 sin(1/x) does
not exist.
(b) Limits that involve square roots; use the sum and product rules for
limits
• Evaluate
lim_{h→0} ((x + h)^(3/2) − x^(3/2))/h
• Evaluate
lim_{x→∞} (√(x + 1) − √x)

(c) Limits that involve trig functions; use the sum and product rules for
limits and the fact that limx→0 (sin x)/x = 1.
• Evaluate
lim_{x→0} (cos 2x − 1)/x²
• Evaluate
lim_{x→0} (tan x − sin x)/x³

19
3 Homework
Special offer – if you do the entire problem set, with one problem omitted, in
LaTeX and hand in a printout of the PDF file, you will receive full credit for the
omitted problem.

1. Ross, exercises 19.2(b) and 19.2(c). Be sure that you prove uniform conti-
nuity, not just continuity!

2. Ross, exercise 19.4.

3. Ross, exercises 20.16 and 20.17. This squeeze lemma is a cornerstone of


elementary calculus, and it is nice to be able to prove it!

4. Ross, exercise 20.18. Be sure to indicate where you are using various limit
theorems.

5. Ross, exercise 17.4. It is crucial that the value of δ is allowed to depend on


x.

6. Ross, exercises 17.13(a) and 17.14. These functions will be of interest when
we come to the topic of integration in the spring term.

7. Ross, exercise 18.4. To show that something exists, describe a way to


construct it.

8. Ross, exercise 18.10. You may use the intermediate-value theorem to prove
the result.

20
MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #2, Week 4 (Derivatives, Inverse functions, Taylor series)

Authors: Paul Bamberg and Kate Penner (based on their course MATH S-322)
R scripts by Paul Bamberg
Last modified: July 24, 2015 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-8 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.

Reading from Ross

• Chapter 5, sections 28 and 29 (pp.223-240)

• Chapter 5, sections 30 and 31, but only up through section 31.7.

• Chapter 7, section 37 (logarithms and exponentials)

Warmups (to be done before lecture)

• Review the derivative rules, and the limit definition of the derivative.

• Be able to compute the derivative of a polynomial such as x² − 2x directly
from the limit definition of the derivative.

• Read the last paragraph of section 29.8, which begins “We next show how
to ...” Apply the argument to the case f (x) = sin x, I = (−π/2, π/2) to do the
standard derivation of the derivative of the arc sine function. Then be sure
that you understand what else needs to be proved.

• Read the statement of L’Hospital’s rule at the start of section 30.2. Then
look at examples 2 through 5 and identify the values of s and L.

• Look through examples 6 through 9 of section 30.2. Don’t worry about


the details: just notice that there are tricks that can be used to convert a
limit into a form to which L’Hospital’s rule applies. Which example uses
the “common denominator” trick? Which uses the “exponential” trick?

• Read Example 3 on page 257, which describes a function that does not
equal the sum of its Taylor series! Once you are aware of the existence of
such functions, you will appreciate why it is necessary to prove “Taylor’s
theorem with remainder.” Only by showing that the remainder approaches
a limit of zero can you prove that the Taylor series converges to the function.

1
• Look at example 1 of section 31.4, where the familiar Taylor series for the
exponential function and the sine function are derived. By looking at the
corollary at the start of the section and the theorem that precedes it, figure
out the importance of the statement “the derivatives are bounded.”

• Skim the proof of the binomial theorem in Section 31.7. Notice that it is
not sufficient just to crank out derivatives and get the Taylor series. We
will need to prove that, for any |x| < 1, the series for (1 + x)^α converges
to the function, and this requires a different form of the remainder. Look
at Corollary 31.6 and Corollary 31.4 and figure out which relies on the
mean-value theorem and which relies on integration by parts.

Proofs to present in section or to a classmate who has done them.

• 8.1 Suppose that f is a one-to-one continuous function on an open interval I (either strictly increasing or strictly decreasing). Let the open interval J = f(I), and define the inverse function f⁻¹ : J → I for which

    (f⁻¹ ◦ f)(x) = x for x ∈ I;  (f ◦ f⁻¹)(y) = y for y ∈ J.

  – Use the chain rule to prove that if f⁻¹ is differentiable at y₀ = f(x₀), then

      (f⁻¹)′(y₀) = 1/f′(x₀).

  – Let g = f⁻¹; it has already been shown that g is continuous at y₀. Prove that, if f is differentiable at x₀ with f′(x₀) ≠ 0, then

      lim_{y→y₀} (g(y) − g(y₀))/(y − y₀) = 1/f′(x₀).

• 8.2 Taylor’s Theorem with remainder: Let f be defined on (a, b) with a < 0 < b. Suppose that the nth derivative f^(n) exists on (a, b). Define the remainder

    Rₙ(x) = f(x) − Σ_{k=0}^{n−1} (f^(k)(0)/k!) x^k.

Prove, by repeated use of Rolle’s theorem, that for each x ≠ 0 in (a, b), there is some y between 0 and x for which

    Rₙ(x) = (f^(n)(y)/n!) x^n.

2
Additional proofs (may appear on a quiz; students will post pdfs or videos)

• 8.3 (Ross, pp.233-234, Rolle’s Theorem and the Mean Value Theorem)

  – Prove Rolle’s Theorem: if f is a continuous function on [a, b] that is differentiable on (a, b) and satisfies f(a) = f(b), then there exists at least one x in (a, b) such that f′(x) = 0.

  – Using Rolle’s Theorem, prove the Mean Value Theorem: if f is a continuous function on [a, b] that is differentiable on (a, b), then there exists at least one x in (a, b) such that

      f′(x) = (f(b) − f(a))/(b − a)

• 8.4 (Ross, p. 228, The Chain Rule – easy special case) Assume the following:

– Function f is differentiable at a.
– Function g is differentiable at f (a).
  – There is an open interval J containing a on which f is defined and f(x) ≠ f(a) for x ≠ a (without this restriction, you need the messy Case 2 on page 229).
– Function g is defined on the open interval I = f (J), which contains
f (a).

Using the sequential definition of a limit, prove that the composite function g ◦ f is defined on J and differentiable at a, and that

    (g ◦ f)′(a) = g′(f(a)) · f′(a).

3
R Scripts

• Script 2.4A-Taylor Series.R


Topic 1 - Convergence of the Taylor series for the cosine function
Topic 2 - A function that is not the sum of its Taylor series
Topic 3 - Illustrating Ross’s proof of Taylor series with remainder.

• Script 2.4B-LHospital.R
  Topic 1 - Illustration of proof 6 from Week 8

• Script 2.4C-SampleProblems.R

4
1 Executive Summary
1.1 The Derivative - Definition and Properties
• A function f is differentiable at some point a if the limit

    lim_{x→a} (f(x) − f(a))/(x − a)

exists and is finite. The limit is referred to as f′(a). If a function is differentiable at a point a, then it is continuous at a as well. (The definition is illustrated numerically at the end of this subsection.)

• Derivatives, being defined in terms of limits, share many properties with limits. Given two functions f and g, both differentiable at some point a, the following properties hold:

  – scalar multiples: (cf)′(a) = c · f′(a)
  – sums of functions: (f + g)′(a) = f′(a) + g′(a)
  – Product Rule: (fg)′(a) = f(a)g′(a) + f′(a)g(a)
  – Quotient Rule: (f/g)′(a) = [g(a)f′(a) − f(a)g′(a)]/g(a)² if g(a) ≠ 0

• The most memorable derivative rule is the Chain Rule, which states that if f is differentiable at some point a, and g is differentiable at f(a), then their composite function g ◦ f is also differentiable at a, and

    (g ◦ f)′(a) = g′(f(a)) · f′(a)
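
The limit definition is easy to probe numerically. The sketch below (base R, my own illustration in the spirit of the course R scripts, not one of them) watches the difference quotient of x² − 2x approach the derivative.

  # Approximate f'(a) by the difference quotient (f(a + h) - f(a))/h.
  f <- function(x) x^2 - 2*x          # the warmup polynomial
  a <- 3                              # an arbitrarily chosen point
  for (h in c(0.1, 0.01, 0.001)) {
    print((f(a + h) - f(a)) / h)      # approaches f'(3) = 2*3 - 2 = 4
  }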

1.2 Increasing and decreasing functions
The terminology is the same as what we used for sequences. It applies to functions whether or not they are differentiable or even continuous.

• A function f is strictly increasing on an interval I if x₁, x₂ ∈ I and
    x₁ < x₂ =⇒ f(x₁) < f(x₂)

• A function f is strictly decreasing on an interval I if x₁, x₂ ∈ I and
    x₁ < x₂ =⇒ f(x₁) > f(x₂)

• A function f is increasing on an interval I if x₁, x₂ ∈ I and
    x₁ < x₂ =⇒ f(x₁) ≤ f(x₂)

• A function f is decreasing on an interval I if x₁, x₂ ∈ I and
    x₁ < x₂ =⇒ f(x₁) ≥ f(x₂)

5
1.3 Behavior of differentiable functions
These justify our procedures when we are searching for the critical points of a
given function. They are the main properties we draw on when reasoning about
a function’s behavior.
• If f is defined on an open interval, achieves its maximum or minimum at some x₀, and is differentiable there, then f′(x₀) = 0.

• Rolle’s Theorem. If f is continuous on some interval [a, b] and differentiable on (a, b) with f(a) = f(b), then there exists at least one x ∈ (a, b) such that f′(x) = 0.

• Mean Value Theorem. If f is continuous on some interval [a, b] and differentiable on (a, b), then there exists at least one x ∈ (a, b) such that

    f′(x) = (f(b) − f(a))/(b − a)

• If f is differentiable on (a, b) and f′(x) = 0 ∀x ∈ (a, b), then f is a constant function on (a, b).

• If f and g are differentiable functions on (a, b) such that f′ = g′ on (a, b), then there exists a constant c such that

    ∀x ∈ (a, b), f(x) = g(x) + c

1.4 Inverse functions and their derivatives

• Review of a corollary of the intermediate value theorem: If function f is continuous and one-to-one on an interval I (which means it must be either strictly increasing or strictly decreasing), then there is a continuous inverse function f⁻¹, whose domain is the interval J = f(I), such that f ◦ f⁻¹ and f⁻¹ ◦ f are both the identity function.

• Not quite a proof: Since (f ◦ f⁻¹)(y) = y, differentiating both sides with the chain rule gives f′(f⁻¹(y)) · (f⁻¹)′(y) = 1 and, if f′(f⁻¹(y)) ≠ 0,

    (f⁻¹)′(y) = 1/f′(f⁻¹(y)).

• Example: if f(x) = tan x with I = (−π/2, π/2), then f⁻¹(y) = arctan y and

    (arctan)′(y) = 1/(tan)′(arctan y) = 1/sec²(arctan y) = 1/(1 + tan²(arctan y)) = 1/(1 + y²)

• The problem: we still need to prove that f⁻¹ is differentiable.

6
1.5 Defining the logarithm and exponential functions
Define the natural logarithm as an antiderivative:

    L(y) = ∫₁^y (1/t) dt,  and define e so that ∫₁^e (1/t) dt = 1.

From this definition it is easy to prove that L′(y) = 1/y and not hard to prove that L(xy) = L(x) + L(y).
Now the exponential function can be defined as the inverse function, so that E(L(y)) = y. From this definition it follows that E(x + y) = E(x)E(y) and that E′(x) = E(x).
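
These definitions can be explored in base R. The sketch below (my own, not one of the course scripts) realizes L as a numerical integral and locates e by solving L(e) = 1; the bracketing interval (2, 3) is an assumption made only for this illustration.

  # L(y) = integral of 1/t from 1 to y, computed numerically.
  L <- function(y) integrate(function(t) 1/t, lower = 1, upper = y)$value
  L(4) + L(5) - L(20)                        # approximately 0, since L(xy) = L(x) + L(y)
  e <- uniroot(function(y) L(y) - 1, c(2, 3))$root
  e                                          # approximately 2.718282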

1.6 L’Hospital’s rule

• Suppose that f and g are differentiable functions and that

    lim_{x→a⁺} f′(x)/g′(x) = L;  lim_{x→a⁺} f(x) = lim_{x→a⁺} g(x) = 0;  g′(a) > 0.

Then

    lim_{x→a⁺} f(x)/g(x) = L.

(A numerical check appears at the end of this subsection.)

• Replace x → a⁺ by x → a⁻ or x → a or x → ±∞ and the result is still valid. It is also possible to have lim_{x→a⁺} f(x) = lim_{x→a⁺} g(x) = ∞. The restriction to g′(a) > 0 is just to make the proof easier; the result is also true if g′(a) < 0.

• Once you understand the proof in one special case, the proof in all the other
cases is essentially the same.

• Here is the basic strategy: given that

    lim_{x→a} f′(x)/g′(x) = L,

use the mean value theorem to construct an interval (a, α) on which

    |f(x)/g(x) − L| < ε.
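
Here is a small numerical check of the rule in base R (a sketch only; the pair f(x) = 1 − cos x, g(x) = x², for which L = 1/2, is an arbitrary choice, and this is not one of the course scripts).

  f <- function(x) 1 - cos(x);  fp <- function(x) sin(x)    # f and its derivative
  g <- function(x) x^2;         gp <- function(x) 2*x       # g and its derivative
  x <- 10^(-(1:4))                                          # points approaching 0 from the right
  cbind(f(x)/g(x), fp(x)/gp(x))                             # both columns approach L = 1/2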

7
1.7 Taylor series
• If a function f is defined by a convergent power series, i.e.

    f(x) = Σ_{k=0}^∞ a_k x^k for |x| < R,

then it is easy to show that

    f(x) = Σ_{k=0}^∞ (f^(k)(0)/k!) x^k for |x| < R.

The challenge is to extend this formula to functions that are differentiable many times but that are not defined by power series, like trig functions defined geometrically, or the function √(1 + x).
• Taylor’s theorem with remainder – version 1
By the mean value theorem, f(x) − f(0) = f′(y)x for some y between 0 and x. The generalization is that

    f(x) − f(0) − f′(0)x − (f″(0)/2!)x² − ··· − (f^(n−1)(0)/(n−1)!)x^(n−1) = (f^(n)(y)/n!)x^n

for some y between 0 and x. It is proved by induction, using Rolle’s theorem n times.
• If the right-hand side approaches zero in the limit of large n, then the Taylor series converges to the function. This is true if all the derivatives f^(n) are bounded by a single constant C. This criterion is sufficient to establish familiar Taylor expansions like

    e^x = 1 + x + x²/2! + x³/3! + ···
    cos x = 1 − x²/2! + x⁴/4! − ···

• Taylor’s theorem with remainder – version 2
The fundamental theorem of calculus says that f(x) − f(0) = ∫₀^x f′(t) dt. The generalization is that

    f(x) − f(0) − f′(0)x − (f″(0)/2!)x² − ··· − (f^(n−1)(0)/(n−1)!)x^(n−1) = ∫₀^x ((x − t)^(n−1)/(n−1)!) f^(n)(t) dt.

It is proved by induction, using integration by parts, but not by us!


• A famous counterexample.
The function f(x) = e^(−1/x) for x > 0 and f(x) = 0 for x ≤ 0 has the property that the remainder does not approach a limit of zero. It does not equal the sum of its Taylor series.
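
A base-R sketch of the counterexample (my own illustration; Script 2.4A, Topic 2, treats the same phenomenon):

  # f(x) = exp(-1/x) for x > 0, f(x) = 0 for x <= 0.
  f <- function(x) ifelse(x > 0, exp(-1/x), 0)
  f(c(0.1, 0.5, 1))            # clearly nonzero for x > 0
  # Yet f(h)/h^n -> 0 as h -> 0+ for every n, so every derivative at 0
  # vanishes and the Taylor series at 0 is identically zero:
  h <- 10^(-(1:3))
  f(h) / h^5                   # tends to 0 even after dividing by h^5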

8
Lecture Outline

1. (Ross, p.226, Sum and Product Rule for Derivatives)
Consider two functions f and g. Prove that if both functions are differentiable at some point a, then both (f + g) and fg are differentiable at a as well, and:

  • (f + g)′(a) = f′(a) + g′(a)
  • (fg)′(a) = f(a)g′(a) + f′(a)g(a)

2. (Ross, p. 228, The Chain Rule – easy special case) Assume the following:

  • Function f is differentiable at a.
  • Function g is differentiable at f(a).
  • There is an open interval J containing a on which f is defined and f(x) ≠ f(a) for x ≠ a (without this restriction, you need the messy Case 2 on page 229).
  • Function g is defined on the open interval I = f(J), which contains f(a).

Using the sequential definition of a limit, prove that the composite function g ◦ f is defined on J and differentiable at a, and that

    (g ◦ f)′(a) = g′(f(a)) · f′(a).

3. The derivative at a maximum or minimum (Ross, page 232)
Prove that if f is defined on an open interval containing x₀, if f has its maximum or minimum at x₀, and if f is differentiable at x₀, then f′(x₀) = 0.

4. (Ross, pp.233-234, Rolle’s Theorem and the Mean Value Theorem)
Prove Rolle’s Theorem: if f is a continuous function on [a, b] that is differentiable on (a, b) and satisfies f(a) = f(b), then there exists at least one x in (a, b) such that f′(x) = 0.

Using Rolle’s Theorem, prove the Mean Value Theorem: if f is a continuous function on [a, b] that is differentiable on (a, b), then there exists at least one x in (a, b) such that

    f′(x) = (f(b) − f(a))/(b − a)

9
5. (Ross, theorem 29.9 on pages 237-238, with the algebra done in reverse order)
Suppose that f is a one-to-one continuous function on an open interval I (either strictly increasing or strictly decreasing). Let the open interval J = f(I), and define the inverse function f⁻¹ : J → I for which

    (f⁻¹ ◦ f)(x) = x for x ∈ I;  (f ◦ f⁻¹)(y) = y for y ∈ J.

  • Use the chain rule to prove that if f⁻¹ is differentiable at y₀ = f(x₀), then

      (f⁻¹)′(y₀) = 1/f′(x₀).

  • Let g = f⁻¹; it has already been shown that g is continuous at y₀. Prove that, if f′(x₀) ≠ 0,

      lim_{y→y₀} (g(y) − g(y₀))/(y − y₀) = 1/f′(x₀).

6. (L’Hospital’s Rule; based on Ross, 30.2, but simplified to one special case)
Suppose that f and g are differentiable functions and that

    lim_{z→a⁺} f′(z)/g′(z) = L;  f(a) = 0, g(a) = 0;  g′(a) > 0.

Choose x > a so that for a < z ≤ x, g(z) > 0 and g′(z) > 0.
(You do not have to prove that this can always be done!)
By applying Rolle’s Theorem to h(z) = f(z)g(x) − g(z)f(x), prove that

    lim_{x→a⁺} f(x)/g(x) = L.

10
7. (Ross, page 250; version 1 of Taylor’s Theorem with remainder, setting c = 0)
Let f be defined on (a, b) with a < 0 < b. Suppose that the nth derivative f^(n) exists on (a, b). Define the remainder

    Rₙ(x) = f(x) − Σ_{k=0}^{n−1} (f^(k)(0)/k!) x^k.

Prove, by repeated use of Rolle’s theorem, that for each x ≠ 0 in (a, b), there is some y between 0 and x for which

    Rₙ(x) = (f^(n)(y)/n!) x^n.
8. (Ross, pp. 342-343; defining the natural logarithm)
Define

    L(y) = ∫₁^y (1/t) dt.

Prove from this definition the following properties of the natural logarithm:

  • L′(y) = 1/y for y ∈ (0, ∞).
  • L(yz) = L(y) + L(z) for y, z ∈ (0, ∞).
  • lim_{y→∞} L(y) = +∞.

11
9. Calculating derivatives
Let f(x) = ∛x.

(a) Calculate f′(x) using the definition of the derivative.
(b) Calculate f′(x) by applying the chain rule to (f(x))³ = x.

12
10. Using the Mean Value Theorem

(a) Suppose f is differentiable on R and f(0) = 0, f(1) = 1, and f(2) = 1.
Show that f′(x) = 1/2 for some x ∈ (0, 2).
Then, by applying the Intermediate Value Theorem and Rolle’s Theorem to g(x) = f(x) − (1/4)x, show that f′(x) = 1/4 for some x ∈ (0, 2).
(b) Prove that if f is a differentiable function on an interval (a, b) and f′(x) > 0 ∀x ∈ (a, b), then f is strictly increasing.

13
11. Using L’Hospital’s rule – tricks of the trade

(a) Conversion to a quotient – evaluate

    lim_{x→0⁺} x log_e x².

(b) Evaluate

    lim_{x→0} (xe^x − sin x)/x²

both by using L’Hospital’s rule and by expansion in a Taylor series.

14
12. Applying the inverse-function rule
The function g(y) = arctan y², y ≥ 0, is continuous and strictly increasing, hence invertible.
Calculate its derivative by finding a formula for the inverse function f(x), which is easy to differentiate, then using the rule for the derivative of an inverse function. You can confirm your answer by using the known derivative of the arctan function.

15
13. Definition and properties of the exponential function
Denote the function inverse to L by E, i.e.

    E(L(y)) = y for y ∈ (0, ∞)
    L(E(x)) = x for x ∈ R

Prove from this definition the following properties of the exponential function E:

  • E′(x) = E(x) for x ∈ R.
  • E(u + v) = E(u)E(v) for u, v ∈ R.

16
14. Hyperbolic functions, defined by their Taylor series

    sinh x = x + x³/3! + x⁵/5! + ···;  cosh x = 1 + x²/2! + x⁴/4! + ···

  • Calculate sinh′ x and cosh′ x, and prove that cosh² x − sinh² x = 1.
  • Use Taylor’s theorem to prove that sinh(a + x) = sinh a cosh x + cosh a sinh x.

17
2 Group Problems
1. Proving differentiation rules

(a) Trig functions

  • Prove that (sin x)′ = cos x from scratch using the fact that

      lim_{x→0} (sin x)/x = 1

  • Let f(x) = csc x so that sin x · f(x) = 1. Use the product rule to prove that

      (csc x)′ = − csc x cot x.
(b) Integer exponents
  • Positive: use induction and the product rule to prove that for all positive integers n

      (x^n)′ = nx^(n−1)

    Hint: start with a base case of n = 1.
  • Negative: let f(x) = x^(−n) so that x^n f(x) = 1. Use the product rule to prove that for all positive integers n

      (x^(−n))′ = −nx^(−n−1).

(c) Non-integer exponents

  • Rational exponent: Let f(x) = x^(m/n), so that (f(x))^n = x^m. Prove that

      f′(x) = (m/n) x^(m/n − 1).

  • Irrational exponent: Let p be any real number and define f(x) = x^p = E(pL(x)). Prove that f′(x) = px^(p−1).

18
2. MVT, L’Hospital, inverse functions

(a) When a local minimum is also a global minimum
Suppose that f is twice differentiable on (a, b), with f″ > 0, and that there exists x ∈ (a, b) for which f′(x) = 0, so that x is a local minimum of f. Consider y ∈ (x, b). By using the mean value theorem twice, prove that f(y) > f(x). This, along with a similar result for y ∈ (a, x), establishes that x is also the global minimum of f on (a, b).
(b) Using L’Hospital’s rule
  i. Evaluate the limit

      lim_{x→0} (1 − cos x)/(e^x − x − 1)

  by using L’Hospital’s rule, then confirm your answer by expanding both numerator and denominator in a Taylor series.
  ii. Evaluate the limit

      lim_{x→0} (csc x − cot x)/x.

  It takes a little bit of algebraic work to rewrite this in a form to which L’Hospital’s rule can be applied.
(c) Applying the inverse-function rule
The function g(y) = arcsin √y, 0 < y < 1, is important in the theory of random walks.
Calculate its derivative by finding a formula for the inverse function f(x), which is easy to differentiate, then using the rule for the derivative of an inverse function. You can confirm your answer by using the known derivative of the arcsin function.

19
3. Taylor series

(a) Using the Taylor series for the trig functions
Define functions S(x) and C(x) by the power series

    S(x) = x − x³/3! + x⁵/5! − ···;  C(x) = 1 − x²/2! + x⁴/4! − ···

  • Calculate S′(x) and C′(x), and prove that S²(x) + C²(x) = 1.
  • Use Taylor’s theorem to prove that C(a + x) = C(a)C(x) − S(a)S(x).
(b) Using the remainder to prove convergence
Define f(x) = log_e(1 + x) for x ∈ (−1, ∞). Using the remainder formula

    Rₙ(x) = (f^(n)(y)/n!) x^n

prove that

    log_e 2 = 1 − 1/2 + 1/3 − 1/4 + 1/5 − ···.

Show that the remainder does not go to zero if you set x = −1.
(c) Derive the Taylor series for the function f (x) = cos x. Prove that the
series converges for all x. Then use an appropriate form of remainder
to prove that it converges to the cosine function.

20
3 Homework
Again, if you do the entire assignment in TeX, you may omit one problem and
receive full credit for it.

1. Ross, 28.2

2. Ross, 28.8

3. Ross, 29.12

4. Ross, 29.18

5. Ross, exercises 30-1(d) and 30-2(d). Do each in two ways: once by using L’Hospital’s rule, once by replacing each function by the first two or three terms of its Taylor series.

6. Ross, 30-4. Use the result to convert exercise 30-5(a) into a problem that
involves a limit as y → ∞.

7. One way to define the exponential function is as the sum of its Taylor series:

    e^x = 1 + x + x²/2! + x³/3! + ···.

Using this definition and Taylor’s theorem, prove that e^(a+x) = e^a e^x.

8. Ross, exercise 31.5. For part (a), just combine the result of example 3 (whose messy proof you need not study) with the chain rule.

9. Ross, exercise 37.9.

21
MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #3, Week 1

Author: Paul Bamberg


R scripts by Paul Bamberg
Last modified: July 27, 2015 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-7 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.

Reading

• Hubbard, Section 1.5. The only topology that is treated is the “open-ball
topology.”

Alas, Hubbard does not mention either finite topology or differential equa-
tions. I have included a set of notes on these topics that I wrote for Math 121.

Warmups (intended to be done before lecture)

• Go to the page of the Math 23 Web site called “Finite topology example.”
Roam around the six pages by clicking links, and convince yourself that the
site is represented by the graph and the matrix T on page 3.

• Look at the three axioms for topology on page 4, and decide whether or
not open intervals on the line and open disks in R2 appear to satisfy them.
In each case, invent an infinite intersection of open sets that consists of a
single point, which is a closed set.

• Review matrix diagonalization and its generalizations. In order to solve differential equations, you will need to be able to express a 2 × 2 matrix A in one of three ways:

  – A = PDP⁻¹ where D is diagonal (for real distinct eigenvalues)
  – A = bI + N where N is nilpotent (if p(t) = (t − b)²)
  – A = PCP⁻¹ where C is conformal (for complex conjugate eigenvalues)

1
Proofs to present in section or to a classmate who has done them.

• 9.1

  – Define “Hausdorff space,” and prove that in a Hausdorff space the limit of a sequence is unique.
  – Prove that Rⁿ, with the topology defined by open balls, is a Hausdorff space.

• 9.2 Starting from the triangle inequality for two vectors, prove the triangle inequality for n vectors, then prove the “infinite triangle inequality” for Rⁿ:

    |Σ_{i=1}^∞ ~aᵢ| ≤ Σ_{i=1}^∞ |~aᵢ|

under the assumption that the infinite series on the right is convergent, which in turn implies that the infinite series of vectors on the left is convergent.

2
R Scripts

• Script 3.1A-FiniteTopology.R
Topic 1 - The “standard” Web site graph, used in notes and examples
Topic 2 - Drawing a random graph to create a different topology on the
same set

• Script 3.1B-SequencesSeriesRn.R
Topic 1 - A convergent sequence of points in R2
Topic 2 - A convergent infinite series of vectors
Topic 3 - A convergent geometric series of matrices

• Script 3.1C-DiffEquations.R
Topic 1 - Two real eigenvalues
Topic 2 - A repeated real eigenvalue
Topic 3 - Complex conjugate eigenvalues

3
1 Executive Summary
1.1 Axioms of Topology
In topology, we start with a set X and single out some of its subsets as “open sets.” The only requirement on a topology is that the collection of open sets satisfies the following rules (axioms):

• The empty set and the set X are both open.

• The union of any finite or infinite collection of open sets is open.

• The intersection of two open sets is open. It follows by induction that the intersection of n open sets is open, but the intersection of infinitely many open sets is not necessarily open.

1.2 A Web-site model for finite topology


A model for a set of axioms is a set of real-world objects that satisfy the axioms.
Consider a Web site of six pages, linked together as follows:

[Diagram: graph of the six-page Web site]
In this model, an “open set” is defined by the property that no page in the
set can be reached by a link from outside the set. We need to show that this
definition is consistent with the axioms for open sets.
–The empty set is open. Since it contains no pages, it contains no page that can
be reached by an outside link.
–The set X of all six pages is open, because there is no other page on the site
from which an outside link could come.
–If sets A and B are open, no page in either can be reached by an outside link,
and so their union is also open.
–If sets A and B are open, so is their intersection A ∩ B. Proof by contraposition:
Suppose that A ∩ B is not open. Then it contains a page that can be reached
by an outside link. If that link comes from A, then B is not open. If that link
comes from B, then A is not open. If that link comes from outside both A and
B, then both A and B are not open.
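
This check is easy to mechanize. The base-R sketch below (my own, not one of the course scripts) tests the definition against a link matrix, using the convention of group problem 1 later in this handout: T[i, j] = 1 means page j links to page i. The matrix is the one from that group problem.

  # A set S of pages is open if no link from outside S reaches a page in S.
  is.open <- function(S, Tmat) {
    outside <- setdiff(seq_len(ncol(Tmat)), S)
    all(Tmat[S, outside, drop = FALSE] == 0)
  }
  Tmat <- matrix(c(0,1,0,0,0,0,
                   1,0,0,0,0,0,
                   0,1,0,1,0,0,
                   0,0,0,0,0,0,
                   0,0,0,1,0,0,
                   0,1,0,1,0,0), 6, 6, byrow = TRUE)
  is.open(c(1, 2), Tmat)   # TRUE: pages 1 and 2 link only to each other
  is.open(c(2, 3), Tmat)   # FALSE: page 1, outside the set, links to page 2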

4
1.3 Topology in R and Rn
The usual way to introduce a topology for the set R is to decree that any open interval is an open set and so is the empty set. Equivalently, we can decree that any set of points for which |x − x₀| < ε, with ε > 0, is an open set. Notice that the infinite intersection of the open sets (−1/n, 1/n) is the single point 0, a closed set!
The usual way to introduce a topology for the set Rⁿ is to decree that any “open ball,” the set of points for which |x − x₀| < ε, with ε > 0, is an open set.

1.4 More concepts of general topology


These definitions are intuitively reasonable for R and Rⁿ, but they also apply to the Web-site finite topology.

• Closed sets
A closed set A is one whose complement Ac = X − A is open. Careful:
this is different from “one that is not open.” There are lots of sets that are
neither open nor closed, and there are sets that are both open and closed.

• A neighborhood of a point is any set that has as a subset an open set


containing the point. A neighborhood does not have to be open.

• The closure of a set A ⊂ Rⁿ, denoted Ā, is “the smallest closed set that contains A,” i.e. the intersection of all the closed sets that contain A.

• The interior of a set A ⊂ Rn , denoted Å, is “the largest open set that is
contained in A,” i.e. the union of all the open subsets of A.

• The boundary of A, denoted ∂A, is the set of all points x with the property
that any neighborhood of x includes points of A and also includes points
of the complement Ac .
The boundary of A is the difference between the closure of A and its interior.

1.5 A topological definition of convergence


A sequence sₙ converges to a limit s if for every open set A containing s, ∃N such that ∀n > N, sₙ ∈ A. In other words, the points of the sequence eventually get inside A and stay there.
Specialize to R and Rⁿ:
A sequence aₙ of real numbers converges to a limit a if ∀ε > 0, ∃N such that ∀n > N, |a − aₙ| < ε. (open sets defined as open intervals)
A sequence a₁, a₂, ... in Rⁿ converges to the limit a if ∀ε > 0, ∃M such that if m > M, |aₘ − a| < ε. (open sets defined by open balls)
The sequence converges if and only if the sequences of coordinates all converge.

5
1.6 Something special about the open ball topology

For the Web diagram above, the sequence (6,5,4,6,5,4,5,4,5,4,...) converges both to 4 and to 5. Both {456} and {45} are open sets (no incoming links), but {4}, {5}, {46}, and {56} are not.
This cannot happen in Rⁿ. If the sequence a₁, a₂, ... in Rⁿ converges to a and the same sequence also converges to the limit b, we can prove that a = b.
Why? The open ball topological space is Hausdorff. Given any two distinct points a and b, we can find open sets A and B with a ∈ A, b ∈ B, and A ∩ B = ∅. In a Hausdorff space, the limit of a sequence is unique.

1.7 Infinite sequences and series of vectors and matrices


• We need something that can be made “less than ε.” For vectors the familiar length is just fine. The “infinite triangle inequality” (proof 9.2) states that

    |Σ_{i=1}^∞ ~aᵢ| ≤ Σ_{i=1}^∞ |~aᵢ|

• We define the “length of a matrix” by viewing the matrix as a vector. Since an m × n matrix A is an element of R^(mn), we can view it as a vector and define its length |A| as the square root of the sum of the squares of all its entries. This definition has the following useful properties (checked numerically at the end of this subsection):

  – |A~b| ≤ |A||~b|
  – |AB| ≤ |A||B|

Let A be a square matrix, and define its exponential by

    exp(At) = Σ_{r=0}^∞ A^r t^r / r!.

Denoting the length of matrix A by |A|, we have

    |exp(At)| ≤ Σ_{r=0}^∞ |A^r t^r| / r! ≤ exp(|A|t) + √n − 1

(the r = 0 term contributes |I| = √n rather than 1), so the series is convergent for all t.
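
A base-R sketch checking these properties (my own illustration; the matrix B is random and serves only as a test case):

  len <- function(A) sqrt(sum(A^2))    # the length |A|, treating A as a vector
  A <- matrix(c(1, -2, 1, 4), 2, 2)
  B <- matrix(rnorm(4), 2, 2)          # an arbitrary random matrix
  len(A %*% B) <= len(A) * len(B)      # TRUE
  len(diag(3))                         # |I| = sqrt(3) for the 3 x 3 identity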

6
1.8 Calculating the exponential of a matrix
– If D = [b 0; 0 c], then Dt = [bt 0; 0 ct] and

    exp(Dt) = I + Dt + (Dt)²/2 + ··· = [e^(bt) 0; 0 e^(ct)].

– If there is a basis of eigenvectors for A, then A = PDP⁻¹, A^r = PD^rP⁻¹, and exp(At) = P exp(Dt)P⁻¹.
– Replace D by a conformal matrix C = aI + bJ, where J² = −I, and exp(Ct) = exp(aIt) exp(bJt) can be expressed in terms of sin bt and cos bt.
– If A = bI + N and N² = 0, then exp(At) = e^(bt) exp(Nt) = e^(bt)(I + Nt).
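
A base-R sketch of the diagonalizable case (my own, distinct from Script 3.1C), using the matrix of section 1.9 below, whose eigenvalues are 2 and 3 with eigenvectors (1, 1) and (1, 2):

  A <- matrix(c(1, -2, 1, 4), 2, 2)    # entries listed column by column
  P <- matrix(c(1, 1, 1, 2), 2, 2)     # columns are the eigenvectors
  expAt <- function(t) P %*% diag(exp(c(2, 3) * t)) %*% solve(P)
  expAt(0)                             # recovers the identity matrix
  expAt(1) %*% c(0, 1)                 # the solution at t = 1 for v0 = (0, 1)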

1.9 Solving systems of linear differential equations


We put a dot over a quantity to denote its time derivative.
The solution to the differential equation ẋ = kx is x = exp(kt)x₀.
Suppose that there is more than one variable, for example
    ẋ = x + y
    ẏ = −2x + 4y.
If we set ~v = (x, y)ᵀ, then this pair of equations becomes

    ~v̇ = A~v, where A = [1 1; −2 4].
The solution is the same as in the single-variable case: ~v = exp(At)~v₀.
Proof:

    exp At = Σ_{r=0}^∞ A^r t^r / r!.
    (d/dt) exp At = Σ_{r=1}^∞ r A^r t^(r−1) / r!.

Set s = r − 1. Then

    (d/dt) exp At = Σ_{s=0}^∞ A^(s+1) t^s / s! = A Σ_{s=0}^∞ A^s t^s / s! = A exp At.

So

    ~v̇ = (d/dt) exp(At)~v₀ = A exp(At)~v₀ = A~v.

7
2 Lecture outline
1. Proof 9.1

  • Define “Hausdorff space,” and prove that in a Hausdorff space the limit of a sequence is unique.
  • Prove that Rⁿ, with the topology defined by open balls, is a Hausdorff space.

2. Convergent sequences in Rⁿ:
A sequence a₁, a₂, ... in Rⁿ converges to the limit a if

    ∀ε > 0, ∃M such that if m > M, |aₘ − a| < ε.

Prove that the sequence converges if and only if the sequences of coordinates all converge.
Then state and prove the corresponding result for infinite series of vectors in Rⁿ.

3. Proof 9.2
Starting from the triangle inequality for two vectors, prove the triangle inequality for n vectors, then prove the “infinite triangle inequality” for Rⁿ:

    |Σ_{i=1}^∞ ~aᵢ| ≤ Σ_{i=1}^∞ |~aᵢ|

under the assumption that the infinite series on the right is convergent, which in turn implies that the infinite series of vectors on the left is convergent.

4. Prove that if every element of the convergent sequence (xn ) is in the closed
subset C ⊂ Rn , then the limit x0 of the sequence is also in C.

5. Proof of inequalities involving matrix length
The length of a matrix is calculated by treating it as a vector: take the square root of the sum of the squares of all the entries.
If matrix A consists of a single row, then |A~b| ≤ |A||~b| is just the Cauchy-Schwarz inequality.
Prove the following:

  • |A~b| ≤ |A||~b| when A is an m × n matrix.
  • |AB| ≤ |A||B|
  • |I| = √n for the n × n identity matrix.

8
6. Constructing a finite topology
Axioms for general topology:
– The empty set and the set X are both open.
– The union of any finite or infinite collection of open sets is open.
– The intersection of two open sets is open.
Suppose that we start with X = {123456} and choose a “subbasis” consisting of {123}, {245}, and {456}.

• Find all the other sets that must be open because of the intersection
axiom and the empty-set axiom.

• Find all the other sets that must be open because of the union axiom
and the axiom that set X is open.

• We now have the smallest collection of open sets that satisfies the ax-
ioms and includes the subbasis. A closed set is one whose complement
is open. List all the closed sets.

• What is the smallest legal collection of open sets in the general case?

• What is the largest legal collection of open sets in the general case?

9
7. Web site topology. A set of pages is “open” if there are no incoming links
from elsewhere on the site. A set of pages is closed if no outgoing link
leads to a page outside the set (i.e. if the complement is an open set.)

Open:{2}, {45}, {123}, {456}, {245}, {12345}, {2456}


Closed:{13456}, {1236}, {456}, {123}, {136}, {6}, {13}
Both: Empty set and {123456}

• Is {345} a neighborhood of page 4?

• What is the closure of {23}?

• Of {26}?

• What is the interior of {23}?

• Of {23456}?

• What is the boundary of {23}?


• A sequence sn converges to page a if, for any open set S that contains
page a,
∃N such that ∀n > N, sn ∈ S.

To which page or pages does sequence (1, 2, 3, 2, 1, 2, 2, 2, 2, 2, · · · ) converge?

To which page or pages does sequence (4, 5, 6, 4, 5, 6, 4, 5, 4, 5, · · · ) converge?

10
8. The “open ball” definition of an open set satisfies the axioms of topology.
A set U ⊂ Rⁿ is open if ∀x ∈ U, ∃r > 0 such that the open ball Bᵣ(x) ⊂ U.

• Prove that the empty set is open.

• Prove that all of Rn is open.

• Prove that the union of any collection of open sets is open.

• Prove that the intersection of two open sets is open.

• Prove that in R², the boundary of the open disc x² + y² < 1 is the circle x² + y² = 1.

• Find the infinite intersection of open balls of radius 1/n around the origin, for all positive integers n. Is it open, closed, or neither?

11
9. A geometric series of matrices
The geometric series formula for a square matrix A is

    (I − A)⁻¹ = I + A + A² + ···.

Let A = [0 1/2; −1/2 0], so that A² = [−1/4 0; 0 −1/4].

(a) Evaluate I + A² + A⁴ + ···.
(b) Evaluate A + A³ + A⁵ + ··· = A(I + A² + A⁴ + ···).
(c) Evaluate I + A + A² + ···.
(d) Evaluate (I − A)⁻¹ and compare.
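
For part (d), a base-R numerical check (a sketch only, not a substitute for the hand calculation): truncate the series after fifty terms and compare with (I − A)⁻¹.

  A <- matrix(c(0, -1/2, 1/2, 0), 2, 2)
  S <- diag(2); term <- diag(2)
  for (r in 1:50) { term <- term %*% A; S <- S + term }   # I + A + ... + A^50
  round(S - solve(diag(2) - A), 10)                       # essentially the zero matrix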

12
10. Calculating and using the exponential of a matrix
The matrix A = [1 1; −2 4] has eigenvector (1, 1)ᵀ with eigenvalue 2 and eigenvector (1, 2)ᵀ with eigenvalue 3.

(a) Write A in the form A = PDP⁻¹, and work out exp(At) = P exp(Dt)P⁻¹.
(b) As “initial conditions,” take ~v₀ = (0, 1)ᵀ. Calculate exp(At)~v₀.
(c) Differentiate the answer with respect to t and check that
    ẋ = x + y
    ẏ = −2x + 4y.

13
11. Solving a differential equation when there is no eigenbasis.
The system of differential equations
    ẋ = 3x − y
    ẏ = x + y
can be written ~v̇ = A~v, where A = [3 −1; 1 1].
Our standard technique leads to p(t) = t² − 4t + 4 = (t − 2)², so there is only one eigenvalue.
Let N = A − 2I = [1 −1; 1 −1].
We have found that p(A) = A² − 4A + 4I = (A − 2I)² = 0, so N² = 0.
Since the matrices 2I and N commute, exp(At) = exp(2It) exp(Nt).
Show that exp At = e^(2t)(I + Nt), and confirm that (exp At)~e₁ is a solution to the differential equation.

14
12. Solving the “harmonic oscillator” differential equation (if time permits)
Applying Newton’s second law of motion to a mass of 1 attached to a spring with “spring constant” 4 leads to the differential equation

    ẍ = −4x.

Solve this equation by using matrices for the case where x(0) = 1, v(0) = 0. The trick is to consider a vector

    ~w = (x(t), v(t))ᵀ, where v = ẋ.

15
3 Group Problems
1. Topology
(a) We can use the same conventions as for the ferryboat graph of week 1. Column j shows the links going out of page j. If T_{i,j} = 1, there is a link from page j to page i. If T_{i,j} = 0, there is no link from page j to page i.

    T =
      0 1 0 0 0 0
      1 0 0 0 0 0
      0 1 0 1 0 0
      0 0 0 0 0 0
      0 0 0 1 0 0
      0 1 0 1 0 0

Draw the Web site graph that this matrix represents.
i. Open sets include {12} and {4}. List all the other open sets and
all the closed sets.
ii. Determine the interior, closure, and boundary of {123}.
iii. Determine to what point or points (if any) the sequence
(1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 4, 6, 4, 6, 4, 6 · · · ) converges.
(b) Recall the axioms of topology, which refer only to open sets:
• The empty set and the set X are both open.
• The union of any collection of open sets is open.
• The intersection of two open sets is open.
A closed set C is defined as a set whose complement C c is open.
You may use the following well-known properties of set complements,
sometimes called “De Morgan’s Laws”:
(A ∪ B)c = Ac ∩ B c , (A ∩ B)c = Ac ∪ B c .
i. Prove directly from the axioms of topology that the union of two
closed sets is closed.
ii. In the Web site topology, a closed set of pages is one that has
no outgoing links to other pages on the site. Prove that in this
model, the union of two closed sets is closed.
iii. Prove that if A and B are closed subsets of R2 (with the topology
specified by open balls), their union is also closed.
(c) Subsets of R
  i. Let A = {0} ∪ (1, 2]. Determine Aᶜ, Å, Ā, and ∂A.
  ii. What interval is equal to ⋃_{n=2}^∞ [−1 + 1/n, 1 − 1/n]? Is it a problem that this union of closed sets is not a closed set?
  iii. Let Q₁ denote the set of rational numbers in the interval (−1, 1). Determine the closure, interior, and boundary of this set.

16
2. Convergence in Rn

(a) The sequence a₁, a₂, ... in Rⁿ converges to a. The sequence b₁, b₂, ... in Rⁿ converges to b. Define cₙ = aₙ + bₙ, c = a + b.
Prove that the sequence c₁, c₂, ... in Rⁿ converges to c. Use the triangle inequality for vectors: the proof strategy is similar to the one that you learned for sequences of real numbers.
(b) Suppose that the sequence a1 , a2 , ... in Rn converges to 0, and the se-
quence of real numbers k1 , k2 , · · · , although not necessarily convergent,
is bounded: ∃K > 0 such that ∀n ∈ N, |kn | < K.
Prove that the sequence k1 a1 , k2 a2 , ... in Rn converges to 0.
 
(c) Prove that if J = [0 −1; 1 0], then exp(Jt) = I cos t + J sin t. Show that this is consistent with the Taylor series for e^(it).

17
3. Differential equations

(a) The original patriarchal differential equation problem


Isaac has established large flocks of sheep for his sons Jacob and Esau.
Anticipating sibling rivalry, he has arranged that the majority of the
growth of each son’s flock will come from lambs born to the other son.
So, if x(t) denotes the total weight of all of Jacob’s sheep and y(t)
denotes the total weight of all of Esau’s sheep, the time evolution of
the weight of the flocks is given by the differential equations
ẋ = x + 2y
ẏ = 2x + y
 
  i. Calculate exp(At), where A = [1 2; 2 1].
ii. Show that if the flocks are equal in size, they will remain that way.
What has this got to do with the eigenvectors of A?
iii. Suppose that when t = 0, the weight of Jacob’s flock is S while the
weight of Esau’s flock is 2S. Find formulas for the sizes as func-
tions of time, and show that the flocks will become more nearly
equal in weight as time passes.
 
(b) Suppose that ~v̇ = A~v, where A = [3 1; −1 1]. Since p(t) = (t − 2)², there is no basis of eigenvectors. By writing A as the sum of a multiple of the identity matrix and a nilpotent matrix, calculate exp(At).
(c) Convert ẍ + 4ẋ + 5x = 0 to a first-order equation of the form ~ẇ = A~w, and show that A = PCP⁻¹, where the first column of P is (1, 0)ᵀ and C is conformal. Thereby determine x(t) for initial position x₀ = 5 and initial velocity v₀ = −10. Don’t multiply out the matrices – let each in turn act on the vector of initial conditions.

18
4 Homework
1. Suppose that you want to construct a Web site of six pages numbered 1
through 6, where the open sets of pages, defined as in lecture, include {126},
{124}, and {56}.

(a) Prove that in the Web site model of finite topology, the intersection
of two open sets is open.
(b) What other sets must be open in order for the family of open sets to
satisfy the intersection axiom?
(c) What other sets must be open in order for the family of open sets to
satisfy the union axiom?
(d) List the smallest family of open sets that includes the three given sets
and satisfies all three axioms. (You have already found all these sets!)
(e) Draw a diagram showing how six Web pages can be linked together so
that only the sets in this family are open. This is tricky. First deal with
5 and 6. Then deal with 1 and 2. Then incorporate 4 into the network,
and finally 3. There are many correct answers since, for example, if
page 1 links to page 2 and page 2 links to page 3, then adding a direct
link from page 1 to page 3 does not change the topology.

2. In R2 , in addition to defining an open ball Br around x, we can define an


“open diamond” Dr around x by

Dr (x) = {y ∈ R2 such that |x1 − y1 | + |x2 − y2 | < r}

and we can define an “open square” Sr around x by

Sr (x) = {y ∈ R2 such that max(|x1 − y1 |, |x2 − y2 |) < r}.


 
(a) For x = (3, 2)ᵀ, r = 1, make a sketch showing B₁(x), D₁(x), and S₁(x).
(b) Suppose that, in Hubbard definition 1.5.2, you replace “open ball” by
“open diamond” or “open square.” Prove that the topology remains
the same: i.e. that an open set according to one definition is an open
set according to either of the others.
(c) (Optional) Show that if, instead of two-component vectors, you use
infinite sequences, there is an open square of radius 1 centered on the
zero vector that is not contained in any open ball and an open ball of
radius 1 that is not contained in any open diamond. You can learn
more about infinite-dimensional vector spaces by taking Math 110,
Math 116, or Physics 143.

19
3. More theorems about limits of sequences
The sequence a~1 , a~2 , ... in Rn converges to ~a.
The sequence b~1 , b~2 , ... in Rn converges to ~b.

(a) Prove that the sequence of lengths |b~1 |, |b~2 |, ... in R is bounded:
∃K such that ∀n, |b~n | < K. Hint: write b~m = b~m − ~b + ~b, then use
the triangle inequality.
(b) Define the sequence of dot products: cn = a~n · b~n .
Prove that c1 , c2 , · · · converges to ~a · ~b.
Hint: Subtract and add ~a · b~n , then use the triangle inequality and the
Cauchy-Schwarz inequality.
4. Let A = [1/3 1/3; 1/3 1/3].

(a) By considering the length of A, show that

    lim_{n→∞} Aⁿ

must be the zero matrix.


(b) Find a formula for An when n ≥ 1, and prove it by induction. Note
that the formula is not valid for n = 0.
(c) Verify the formula

    (I − A)⁻¹ = I + A + A² + ···

for this choice of A. As was the case for sample problem 4, you can evaluate the infinite sum on the right by summing a geometric series, but you should split off the first term and start the geometric series with the second term.

20
5. The differential equation ẍ = −3ẋ − 2x describes the motion of an “over-
damped oscillator.” The acceleration ẍ is the result of the sum of a force
proportional to ẋ, supplied by a shock absorber, and a force proportional
to x, supplied by a spring.
 
(a) Introduce v = ẋ as a new variable, and define the vector ~w = (x, v)ᵀ. Find a matrix A such that ~ẇ = A~w.
(b) Calculate the matrix exp(At).
(c) Graph x(t) for the following three sets of initial values that specify position and velocity when t = 0:

    Release from rest: ~w₀ = (1, 0)ᵀ.
    Quick shove: ~w₀ = (0, 1)ᵀ.
    Push toward the origin: ~w₀ = (1, −3)ᵀ.

21
 
6. Suppose that S is a matrix of the form S = [a b; b a]. Prove that

    exp(St) = e^(at) [cosh(bt) sinh(bt); sinh(bt) cosh(bt)].

Then use this result to solve
    ẋ = x + 2y
    ẏ = 2x + y
without having to diagonalize the matrix S.
 
7. Let B = [−1 9; −1 5]. Show that there is only one eigenvalue λ and find an eigenvector for it. Then show that N = B − λI is nilpotent.

(a) By writing B = λI + N, calculate B².
(b) By writing B = λI + N, solve the system of equations
    ẋ = −x + 9y
    ẏ = −x + 5y
for arbitrary initial conditions ~v₀ = (x₀, y₀)ᵀ.
 
8. Week 4, sample problem 6, showed how to write A = [7 −10; 2 −1] in the form A = PCP⁻¹, where C = [3 −2; 2 3] is conformal and P = [1 2; 0 1].
Follow up on this analysis to solve the differential equation ~v̇ = A~v for initial conditions ~v₀ = (1, 0)ᵀ.
9. Let A be a 2 × 2 matrix which has two distinct real eigenvalues λ1 and λ2 ,
with associated eigenvectors ~v1 and ~v2 .

(a) Show that the matrix P₁ = (A − λ₂I)/(λ₁ − λ₂) is a projection onto the subspace spanned by eigenvector ~v₁. Find its image and kernel, and show that P₁² = P₁.
(b) Similarly, the matrix P₂ = (A − λ₁I)/(λ₂ − λ₁) is a projection onto the subspace spanned by eigenvector ~v₂. Show that P₁P₂ = P₂P₁ = 0, that P₁ + P₂ = I, and that λ₁P₁ + λ₂P₂ = A.
(c) Show that exp(tλ₁P₁ + tλ₂P₂) = exp(λ₁t)P₁ + exp(λ₂t)P₂, and use this result to solve the equations
    ẋ = −4x + 5y
    ẏ = −2x + 3y
for arbitrary initial conditions ~v₀ = (x₀, y₀)ᵀ.

22
MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #3, Week 2

Author: Paul Bamberg


R scripts by Paul Bamberg
Last modified: July 27, 2015 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-6 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.

Reading

• Hubbard, section 1.5, pages 92 through 99 (limits and continuity)

• Hubbard, section 1.6 up through page 112.

• Hubbard, Appendix A.3 (Heine-Borel)

• Hubbard, section 1.7 up through page 133.

Proofs to present in section or to a classmate who has done them.

• 10.1 Let X ⊂ R² be an open set, and consider f : X → R². Let x₀ be a point in X. Prove that f is continuous at x₀ if and only if for every sequence xᵢ converging to x₀,

    lim_{i→∞} f(xᵢ) = f(x₀).

• 10.2 Using the Bolzano-Weierstrass theorem, prove that a continuous real-valued function f defined on a compact subset C ⊂ Rⁿ has a supremum M and that there is a point a ∈ C (a maximum) where f(a) = M.

You may wish to feature Ötzi the Iceman as the protagonist of your proof.

1
R Scripts

• Script 3.2A-LimitFunctionR2.R
Topic 1 - Sequences that converge to the origin
Topic 2 - Evaluating functions along these sequences

• Script 3.2B-AffineApproximation.R
Topic 1 - The tangent-line approximation for a single variable
Topic 2 - Displaying a contour plot for a function
Topic 3 - The gradient as a vector field
Topic 4 - Plotting some pathological functions

2
1 Executive Summary
1.1 Limits in Rn
• To define lim_{x→x₀} f(x), we need not require that x₀ is in the domain of f. We require only that x₀ is in the closure of the domain of f. This requirement guarantees that for any δ > 0 we can find an open ball of radius δ around x₀ that includes points in the domain of f. There is no requirement that all points in that ball be in the domain.
• Limit of a function f from Rⁿ to Rᵐ:
We assume that the domain is a subset X ⊂ Rⁿ.
Definition: Function f : X → Rᵐ has the limit a at x₀:

    lim_{x→x₀} f(x) = a

if x₀ is in the closure of X and ∀ε > 0, ∃δ > 0 such that

    ∀x ∈ X that satisfy |x − x₀| < δ, |f(x) − a| < ε.
• lim_{x→x₀} f(x) = a if and only if for all sequences with lim xₙ = x₀, lim f(xₙ) = a. To show that a function f does not have a limit as x → x₀, invent two different sequences, both of which converge to x₀, for which the sequences of function values do not approach the same limit. Or just invent one sequence for which the sequence f(xₙ) does not converge!
• If lim_{x→x₀} f(x) = a and lim_{x→x₀} f(x) = b, then a = b.
• Suppose f(x) = (f₁(x), f₂(x))ᵀ. Then lim_{x→x₀} f(x) = a if and only if lim_{x→x₀} f₁(x) = a₁ and lim_{x→x₀} f₂(x) = a₂.
• Properties of limits
These are listed on p. 95 of Hubbard. The proofs are almost the same as
for functions of one variable
–Limit of sum = sum of limits.
–Limit of product = product of limits.
–Limit of quotient = quotient of limits if you do not have zero in the de-
nominator.
–Limit of dot product = dot product of limits. (proved on pages 95-96.)
These last two useful properties involve a vector-valued function f (x) and
a scalar-valued function h(x), both with domain U .
–If f is bounded and h has a limit of zero, then hf also has a limit of zero.
–If h is bounded and f has a limit of zero, then hf also has a limit of zero.

3
1.2 Continuous functions in topology and in Rn
• Function f is continuous at x0 if, for any open set U in the codomain that
contains f (x0 ), the preimage (inverse image) of U , i.e. the set of points x
in the domain for which f (x) ∈ U , is also an open set.
• Here is the definition that lets us extend real analysis to n dimensions. f : Rⁿ → Rᵐ is continuous at x₀ if, for any open “codomain ball” of radius ε centered on f(x₀), we can find an open “domain ball” of radius δ centered on x₀ such that if x is in the domain ball, f(x) is in the codomain ball.
• An equivalent condition (your proof 10.1):
f is continuous at x0 if and only if every sequence that converges to x0 is a
good sequence. We will need to prove this for f : Rn → Rm , but the proof
is almost identical to the proof for f : R → R, which we have already done.
• As was the case in R, sums, products, compositions, etc. of continuous functions are continuous. If you can write a formula for a function of several variables that does not involve division by zero, the theorems on pages 98 and 99 will show that it is continuous.
• To show that a function is discontinuous, construct a bad sequence!
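
A base-R sketch of this strategy (in the spirit of Script 3.2A, though my own illustration), applied to the function f(x, y) = (x² − y²)/(x² + y²) that reappears in the lecture outline:

  f <- function(x, y) (x^2 - y^2) / (x^2 + y^2)
  n <- 1:6
  f(1/n, 0*n)    # along the x-axis the values are constantly  1
  f(0*n, 1/n)    # along the y-axis the values are constantly -1
  # Two sequences converge to the origin with different limiting values,
  # so f has no limit at the origin.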

1.3 Compact subsets and Bolzano-Weierstrass


• A subset X ⊂ Rⁿ is bounded if there is some ball, centered on the origin, of which it is a subset. If a nonempty subset C ⊂ Rⁿ is closed as well as bounded, it is called compact.
• Bolzano-Weierstrass theorem in Rn
The theorem says that given any sequence of points x1 , x2 , ... from a com-
pact set C, we can extract a convergent subsequence whose limit is in C.
Easy proof (Ross, section 13.5):
In Rn , using the theorem that we have proved for R, extract a subsequence
where the first components converge. Then extract a subsequence where
the second components converge, continuing for n steps.
Hubbard, theorem 1.6.3, offers an alternative but nonconstructive proof.
• Existence of a maximum
The supremum M of function f on set C is the least upper bound of the
values of f . The maximum, if it exists, is a point of evaluation: a point
a ∈ C such that f (a) = M . Infimum and minimum are defined similarly.
A continuous real-valued function f defined on a compact subset C ⊂ Rⁿ has a supremum M, and there is a point a ∈ C (a maximum) where f(a) = M. The proof (your proof 10.2) is similar to the proof in R.

4
1.4 The nested compact set theorem
Suppose Xₖ ⊂ Rⁿ is a decreasing sequence of nonempty compact sets: X₁ ⊃ X₂ ⊃ ···. For example, in R, Xₙ = [−1/n, 1/n]. In R², we can use nested squares.

The theorem states that ⋂_{k=1}^∞ Xₖ ≠ ∅.

If Xₖ = (0, 1/k) (not compact!), the infinite intersection is the empty set.
The proof (Hubbard, Appendix A.3) starts by choosing a point xₖ from each set Xₖ, then invokes the Bolzano-Weierstrass theorem to select a convergent subsequence yᵢ that converges to a point a that is contained in each of the Xₖ and so is also an element of their intersection ⋂_{m=1}^∞ Xₘ.

1.5 The Heine-Borel theorem


The Heine-Borel theorem states that for a compact subset X ⊂ Rⁿ, any open cover contains a finite subcover. In other words, if someone gives you a possibly infinite collection of open sets Uᵢ whose union includes every point in X, you can select a finite number of them whose union still includes every point in X:

    X ⊂ ⋃_{i=1}^m Uᵢ.

The proof (Hubbard, Appendix A.3) uses the nested compact set theorem.
In general topology, where the sets that are considered are not necessarily
subsets of Rn , the statement “every open cover contains a finite subcover” is
used as the definition of “compact set.”

1.6 Partial derivatives


If U is an open subset of Rⁿ and function f : U → R is defined by a formula f(x₁, x₂, ..., xₙ), then its partial derivative with respect to the ith variable is

    ∂f/∂xᵢ = Dᵢf(a) = lim_{h→0} (1/h) [f(a₁, ..., aᵢ + h, ..., aₙ) − f(a₁, ..., aᵢ, ..., aₙ)].

This does not give the generalization we want. It specifies a good approximation to f only along a line through a, whereas we would like an approximation that is good in a ball around a.

5
1.7 Directional derivative, Jacobian matrix, gradient
Let ~v be the direction vector of a line through a. Imagine a moving particle whose position as a function of time t is given by a + t~v on some open interval that includes t = 0. Then f(a + t~v) is a function of the single variable t. The derivative of this function with respect to t is the directional derivative.
More generally, we use h instead of t and define the directional derivative as

    ∇~v f(a) = lim_{h→0} (f(a + h~v) − f(a))/h

If the directional derivative is a linear function of ~v, in which case f is said to be differentiable at a, then the directional derivative can be calculated if we know its value for each of the standard basis vectors. Since

    ∇~eᵢ f(a) = lim_{h→0} (f(a + h~eᵢ) − f(a))/h = Dᵢf(a),

we can write

    ∇~v f(a) = D₁f(a)v₁ + D₂f(a)v₂ + ··· + Dₙf(a)vₙ.


For a more compact notation, we can make the partial derivatives into a 1 × n matrix, called the Jacobian matrix

    [Jf(a)] = [D₁f(a) D₂f(a) ··· Dₙf(a)],

whereupon

    ∇~v f(a) = [Jf(a)]~v.

Alternatively, we can make the partial derivatives into a column vector, the gradient vector

    grad f(a) = (D₁f(a), D₂f(a), ..., Dₙf(a))ᵀ,

so that

    ∇~v f(a) = grad f(a) · ~v.


We now have, for differentiable functions (and we will soon prove that if the partial derivatives of f are continuous, then f is differentiable), a useful generalization of the tangent-line approximation of single variable calculus:

    f(a + h~v) ≈ f(a) + [Jf(a)](h~v)

This sort of approximation (a constant plus a linear approximation) is called an “affine approximation.”
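
A base-R sketch of this approximation (my own, in the spirit of Script 3.2B), using the function f(x, y) = √(xy³) of sample problem 11 below; the forward-difference step h is an arbitrary choice.

  f <- function(x, y) sqrt(x * y^3)
  h <- 1e-6                                # forward-difference step
  J <- c((f(4 + h, 1) - f(4, 1)) / h,      # D1 f(4, 1), approximately 1/4
         (f(4, 1 + h) - f(4, 1)) / h)      # D2 f(4, 1), approximately 3
  tt <- 0.01
  f(4, 1) + sum(J * tt * c(2, 1))          # affine approximation along (2, 1)
  f(4 + 2*tt, 1 + tt)                      # true value; the two agree closely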

6
2 Lecture outline
1. Given that function f : Rk → Rm is continuous at x0 , prove that every
sequence such that xn → x0 is a “good sequence” in the sense that f (xn )
converges to f (x0 ). (This is half of proof 10.1.)

7
2. Given that function f : Rk → Rm is discontinuous at x0 , show how to
construct a “bad sequence” such that xi → x0 but f (xi ) does not converge
to f (x0 ). (This is the other half of proof 10.1).

8
3. A fanciful version of proof 10.2: a continuous real-valued function f defined
on a compact subset C ⊂ Rn has a supremum M and there is a point a ∈ C
(a maximum) where f (a) = M .
Ötzi the Iceman, whose mummy is the featured exhibit at the archaeological
museum in Bolzano, Italy, has a goal of camping at the greatest altitude
M on the Tyrol, a compact subset of the earth’s surface on which altitude
is a continuous function f of latitude and longitude.

(a) Assume that f has no upper bound, so that there is no supremum M. Then Ötzi can select a sequence of campsites in C such that
    f(x₁) > 1, f(x₂) > 2, ..., f(xₙ) > n, ···.
Show how to use Bolzano-Weierstrass to construct a “bad sequence,” in contradiction to the assumption that f is continuous.
(b) On night n, Ötzi chooses a campsite whose altitude exceeds M − 1/n.
From this sequence, extract a convergent subsequence, and call its
limit a. Show that f (a) = M , so a is a maximum, and M is not
merely a supremum but a maximum value.

9
4. Nested compact sets
You have purchased a nice chunk of Carrara marble from which to carve
the term project for your GenEd course on Italian Renaissance sculpture.
On day 1 the marble occupies a compact subset X1 of the space in your
room. You chip away a bit every evening, hoping to reveal the masterpiece
that is hidden in the marble, and you thereby create a decreasing sequence
of nonempty compact sets: X1 ⊃ X2 ⊃ · · · .
Your understanding instructor gives you an infinite extension of time on the
project. Prove that there is a point a that forever remains in the marble,
no matter how much you chip away; i.e. that

    ⋂_{k=1}^∞ Xₖ ≠ ∅.

10
5. Heine-Borel theorem (proved in R², but the proof is the same for Rⁿ.)
Suppose that you need security guards to guard a compact subset X ⊂ R². Heine-Borel Security, LLC proposes that you should hire an infinite number of their guards, each of whom will patrol an open subset Uᵢ of R². These guards protect all of X: the union of their patrol zones is an “open cover.”
Prove that you can fire all but a finite number m of the security guards (not necessarily the first m) and your property will still be protected:

    X ⊂ ⋃_{i=1}^m Uᵢ.

Break up the part of the city where your property lies into closed squares,
each 1 kilometer on a side. There will exist a square B0 that needs infinitely
many guards (the “infinite pigeonhole principle”).
Break up this square into 4 closed subsquares: again, at least one will need
infinitely many guards. Choose one subsquare and call it B1 . Continue this
procedure to get a decreasing sequence Bi of nested compact sets, whose
intersection includes a point a.
Now show that any guard whose open patrol zone includes a can replace
all but a finite number of other guards.

11
6. Cauchy sequences in Rn

• Prove that every Cauchy sequence of vectors ~a1 , ~a2 , · · · ∈ Rn is bounded:


i.e. ∃M such that ∀n, |~an | < M .
Hint: ~an = ~an − ~am + ~am . When showing that a sequence is bounded,
you can ignore the first N terms.
• Prove that if a sequence a1 , a2 , · · · ∈ Rn converges to a, it is a Cauchy
sequence. Hint: am − an = am − a + a − an . Use the triangle inequality.
• Prove that every convergent sequence of vectors ~a1 , ~a2 , · · · ∈ Rn is
bounded (very easy, given the preceding results.)

12
7. Using sequences to show that a limit does not exist.

    f(x, y) = (x² − y²)/(x² + y²)

Construct sequences (xₙ), all of which converge to the origin, with the following properties:

(a) lim f(xₙ) = 1.
(b) lim f(xₙ) = 0.
(c) lim f(xₙ) = −3/5.
(d) lim f(xₙ) does not exist.

Express f in terms of polar coordinates to make it clear what is going on.

13
8. A challenging bad sequence construction, from Hubbard pp. 96-97.

    f(x, y) = |y| e^(−|y|/x²) / x²

(a) Evaluate f on the sequence xₙ = 1/n, yₙ = m/n for arbitrary m.
(b) Evaluate f on the sequence xₙ = 1/n, yₙ = 1/n².

14
9. Continuity and discontinuity in R³

(a) Define

    F(x, y, z) = xyz/(x² + y² + z²),  F(0, 0, 0) = 0.

Prove that F is continuous at the origin.
(b) Define

    g(x, y, z) = (xy + xz + yz)/(x² + y² + z²),  g(0, 0, 0) = 0.

Prove that g is discontinuous at the origin.

15
10. Converse of Heine-Borel in R
The converse of Heine-Borel says that if the U.S. government is hiring Heine-Borel security to guard a subset X of the road from Mosul to Damascus and wants to be sure that they do not have to pay an infinite number of guards, then X has to be closed and bounded.

(a) What happens if Heine-Borel assigns guard k to patrol the open interval (−k, k)?
(b) What happens if Heine-Borel selects a point x₀ that is not in X and assigns guard k to patrol the interval (x₀ − 1/k, x₀ + 1/k)?

16
 
11. Let f(x, y) = √(xy³).
Evaluate the Jacobian matrix of f at (4, 1)ᵀ and use it to find the best affine approximation to f((4, 1)ᵀ + t(2, 1)ᵀ) for small t.
By defining g(t) = f((4, 1)ᵀ + t(2, 1)ᵀ), you can convert this problem to one in single-variable calculus. Show that using the tangent-line approximation near t = 0 leads to exactly the same answer.

17
12. A clever application of the gradient vector
The Cauchy-Schwarz inequality says that grad f · ~v ≤ |grad f||~v|, with equality when grad f and ~v are proportional. If ~v is a unit vector, the maximum value of the directional derivative occurs when ~v is a multiple of grad f.
Suppose that the temperature T in an open subset of the plane is given by T(x, y) = 25 + 0.1x²y³. If you are at x = 1, y = 2, along what direction should you walk to have temperature increase most rapidly?

18
3 Group Problems
1. Theorems related to Bolzano-Weierstrass and Heine-Borel

(a) You are working for Heine-Borel Security and are bidding on a project to guard the interior of one mile of Pennsylvania Avenue between the Capitol and the White House, modeled as the open interval I = (0, 1). Show that you can create a countably infinite set of disjoint open patrol zones which cover only a subset of I, so that no “finite subcover” will be possible. Then show that you cannot do the same with an uncountably infinite set of disjoint open patrol zones. (Hint: each zone includes a different rational number.)
(b) A school playground is a compact subset C ⊂ R². Two aspiring quarterbacks are playing catch with a football, and they want to get as far apart as possible. Show that if sup |x − y| = D, the supremum being taken over pairs of points in C, they can find a pair of points x₀ and y₀ such that |x₀ − y₀| = D. Then invent simple examples to show that this cannot be done if the playground is unbounded or is not closed.
(c) The converse of the Heine-Borel theorem states that if every open cover of a set X ⊂ Rⁿ contains a finite subcover, then X must be closed and bounded.
  i. By choosing as the open cover a set of open balls of radius 1, 2, ···, prove that X must be bounded.
  ii. To show that X is closed, show that its complement Xᶜ must be open. Hint: choose any x₀ ∈ Xᶜ and choose an open cover of X in which the kth set consists of points whose distance from x₀ is greater than 1/k. This open cover of X must have a finite subcover. If you need a further hint, look on pages 90 and 91 of Chapter 2 of Ross.

2. Limits and continuity in R^2

   (a) Define
       f\begin{pmatrix} x \\ y \end{pmatrix} = \frac{xy^3}{x^2 + y^6}, \quad f\begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0.
       Show that the sequence \begin{pmatrix} 1/i \\ 1/i \end{pmatrix} is "good" but that \begin{pmatrix} 1/i^3 \\ 1/i \end{pmatrix} is "bad."
   (b) Let
       f\begin{pmatrix} x \\ y \end{pmatrix} = \frac{xy(x^2 - y^2)}{(x^2 + y^2)^2}, \quad f\begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0.
       Invent a "bad sequence" of points (a_1, a_2, \ldots) that converges to \begin{pmatrix} 0 \\ 0 \end{pmatrix}
       for which
       \lim_{i \to \infty} f(a_i) \ne 0.
       This bad sequence proves that f is discontinuous at \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
   (c) Let
       g\begin{pmatrix} x \\ y \end{pmatrix} = \frac{xy(x^2 - y^2)}{x^2 + y^2}, \quad g\begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0.
       By introducing polar coordinates, prove that g is continuous at \begin{pmatrix} 0 \\ 0 \end{pmatrix}.

3. Using partial derivatives to find approximate function values

   (a) Let f\begin{pmatrix} x \\ y \end{pmatrix} = x^2 y. Evaluate the Jacobian matrix of f at \begin{pmatrix} 2 \\ 0.5 \end{pmatrix} and use
       it to find the best affine approximation to f\begin{pmatrix} 1.98 \\ 0.51 \end{pmatrix} and to f\begin{pmatrix} 1.998 \\ 0.501 \end{pmatrix}.
       Using a calculator or R, find the "remainder" (the difference between
       the actual function value and the best affine approximation) in each
       case. You should find that the remainder decreases by a factor that is
       much greater than 10. (An R sketch follows part (c).)
   (b) Let
       f\begin{pmatrix} x \\ y \end{pmatrix} = \frac{x^2 y}{x^4 + y^2}.
       f is defined to be 0 at \begin{pmatrix} 0 \\ 0 \end{pmatrix}. Show that both partial derivatives are
       zero at \begin{pmatrix} 0 \\ 0 \end{pmatrix} but that the function is not continuous there.
   (c) Let f\begin{pmatrix} x \\ y \end{pmatrix} = y + \log(xy) (natural logarithm) for x, y > 0. Evaluate
       the Jacobian matrix of f at \begin{pmatrix} 0.5 \\ 2 \end{pmatrix} and use it to find the best affine
       approximation (constant plus linear approximation) to f\begin{pmatrix} 0.51 \\ 2.02 \end{pmatrix}.
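For part (a), a minimal R sketch (mine, not part of an official script) of the remainder computation:

    f <- function(x, y) x^2 * y
    a <- c(2, 0.5)
    J <- c(2*a[1]*a[2], a[1]^2)      # Jacobian [2xy  x^2] at (2, 0.5) is [2  4]
    affine <- function(p) f(a[1], a[2]) + sum(J * (p - a))
    r1 <- f(1.98, 0.51)   - affine(c(1.98, 0.51))     # about -6.0e-04
    r2 <- f(1.998, 0.501) - affine(c(1.998, 0.501))   # about -6.0e-06
    c(r1, r2, ratio = r1/r2)         # the ratio is roughly 100, not just 10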

4 Homework
1. A rewrite of Oetzi the Iceman, with lots of sign changes.
   Joe the Plumber, who became a minor celebrity in the 2008 presidential
   campaign, has hit the jackpot. Barack Obama enrolls him in a health plan,
formerly available only to members of Congress, that makes him immortal,
and gives him a special 401(k) that delivers $10K per month of tax-free
income. Joe retires to pursue his lifelong dream of camping at the lowest
spot in Death Valley.
Assume that Death Valley National Park is a closed set and that altitude
f (x) in the Park is a continuous function. Prove that the altitude in Death
Valley has a greatest lower bound (even though that is obvious on geograph-
ical grounds) and that there is a place where that lower bound is achieved,
so that Joe can achieve his goal.

2. You are the mayor of El Dorado. Not all the streets are paved with gold –
only the interval [0,1] on Main Street – but you still have a serious security
problem, and you ask Heine-Borel Security LLC to submit a proposal for
keeping the street safe at night. Knowing that the city coffers are full, they
come up with the following pricey plan for meeting your requirements by
using a countable infinity of guards:

• Guard 0 patrols the interval (−1/N, 1/N), where you may choose any value
  greater than 100 for the integer N. She is paid 200 dollars.
• Guard 1 patrols the interval (0.4, 1.2) and is paid 100 dollars.
• Guard 2 patrols the interval (0.2, 0.6) and is paid 90 dollars.
• Guard 3 patrols the interval (0.1, 0.3) and is paid 81 dollars.
• Guard k patrols the interval (0.8/2^k, 2.4/2^k) and is paid 100(0.9)^{k-1} dollars.

(a) Calculate the total cost of hiring this infinite set of guards (sum a
geometric series).
(b) Show that the patrol regions of the guards form an “open cover” of
the interval [0,1].
(c) According to the Heine-Borel theorem, this infinite cover has a finite
subcover. Explain clearly how to construct it. (Hint: look at the proof
of the Heine-Borel theorem)
(d) Suppose that you want to protect only the open interval (0,1), which
is not a compact subset of Main Street. In what very simple way can
Heine-Borel Security modify their proposal so that you are forced to
hire infinitely many guards?
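For part (a), a quick R check (a sketch; analytically the total is 200 + 100/(1 − 0.9) = 1200 dollars):

    200 + sum(100 * 0.9^(0:999))   # 1200, to machine precision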

3. Prove the Heine-Borel theorem in R2 by contraposition. Assume that you
have been given a countably infinite collection of open sets Ui that cover
a compact set X, and assume that no finite subcollection covers X. Show
(for a contradiction) that you can identify a single U that replaces all but
finitely many of the Ui .

4. Hubbard, Exercise 1.6.6. You might want to work parts (b) and (c) before
attempting part (a). The function f (x) is defined for all of R, which is not
a compact set, so you will have to do some work before applying theorem
1.6.9. Notice that “a maximum” does not have to be unique: a function
could achieve the same maximum value at more than one point.

5. Singular Point, California is a spot in the desert near Death Valley that is
reputed to have been the site of an alien visit to Earth. In response to a
campaign contribution from AVSIG, the Alien Visitation Special Interest
Group, the government has agreed to survey the region around the site.
In the vicinity, the altitude is given by the function

f\begin{pmatrix} x \\ y \end{pmatrix} = \frac{2x^2 y}{x^4 + y^2}.
A survey team that traveled through the Point going west to east declares
that the altitude at the Point itself is zero. A survey team that went
south to north would comment only that zero was perhaps a reasonable
interpolation.

(a) Suppose you travel through the Point along the line y = mx, passing
    through the point at time t = 0 and moving with a constant velocity
    such that x = t: in other words, \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} t \\ mt \end{pmatrix}. Find a function
    g(m, t) that gives your altitude as a function of time on this journey.
    Sketch graphs of g as a function of t for m = 1 and for m = 3. Is what
    happens for large m consistent with what happens on the y axis?
 
(b) Find a sequence of points that converges to \begin{pmatrix} 0 \\ 0 \end{pmatrix}, for which x_n = 1/n
    and f\begin{pmatrix} x_n \\ y_n \end{pmatrix} = 1 for every point in the sequence. Do the same for
    f\begin{pmatrix} x_n \\ y_n \end{pmatrix} = -1.
(c) Is altitude a continuous function at Singular Point? Explain.

6. (a) Hubbard, exercise 1.7.12. This is good practice in approximating a
       function by using its derivative and seeing how fast the "remainder"
       goes to zero.
   (b) Hubbard, exercise 1.7.4. These are all problems in single-variable
       calculus, but they cannot be solved by using standard differentiation
       formulas. You have to use the definition of the derivative as a limit.

7. Linearity of the directional derivative.


 
Suppose that, near the point a = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, the Celsius temperature is specified
by the function f\begin{pmatrix} x \\ y \end{pmatrix} = 20 + xy^2.
 
(a) Suppose that you drive with a constant velocity vector \vec{v}_1 = \begin{pmatrix} 1 \\ 3 \end{pmatrix}, passing
    through the point \begin{pmatrix} 2 \\ 1 \end{pmatrix} at time t = 0. Express the temperature
    outside your car as a function g(t) and use single-variable calculus to
    calculate g'(0), the rate at which the reading on your car's thermometer
    is changing. You have calculated the directional derivative of f
    along the vector \vec{v}_1 by using single-variable calculus.
 
(b) Do the same for the velocity vector \vec{v}_2 = \begin{pmatrix} -1 \\ -1 \end{pmatrix}.
(c) As it turns out, the given function f is differentiable, and the directional
    derivative is therefore a linear function of velocity. Use this fact
    to determine the directional derivative of f along the standard basis
    vector \vec{e}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix} from your earlier answers, and confirm that your
    answer agrees with the partial derivative D_2 f(a).
(d) Remove all the mystery from this problem by recalculating the directional
    derivatives using the formula [Df(a)]\vec{v}.
   
8. Let f\begin{pmatrix} x \\ y \end{pmatrix} = x\sqrt{y}. Evaluate the Jacobian matrix of f at \begin{pmatrix} 2 \\ 4 \end{pmatrix} and use it
   to find the best affine approximation to f\begin{pmatrix} 1.98 \\ 4.06 \end{pmatrix}.

   As you can confirm by using a calculator, 1.98\sqrt{4.06} = 3.989589452...

9. (a) Hubbard, Exercise 1.7.22. This is a slight generalization of a topic that
was presented in lecture. The statement is in terms of derivatives, but
it is equivalent to the version that uses gradients.
(b) An application: suppose that you are skiing on a mountain where
    the height above sea level is described by the function f\begin{pmatrix} x \\ y \end{pmatrix} = 1 −
    0.2x^2 − 0.4y^2 (with the kilometer as the unit of distance, this is not
    unreasonable). You are located at the point \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. Find a
    unit vector \vec{v} along the direction in which you should head if you want
    to head straight down the mountain and two unit vectors \vec{w}_1 and \vec{w}_2
    that specify directions for which your rate of descent is only 3/5 of the
    maximum rate.
(c) Prove that in general, the unit vector for which the directional deriva-
tive is greatest is orthogonal to the direction along which the direc-
tional derivative is zero, and use this result to find a unit vector ~u
appropriate for a timid but lazy skier who wants to head neither down
nor up.

MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #3, Week 3
Differentiability, Newton’s method, inverse functions

Author: Paul Bamberg


R scripts by Paul Bamberg
Last modified: July 26, 2015 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-6 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.

Reading

• Hubbard, section 1.7 (you have already read most of this)

• Hubbard, sections 1.8 and 1.9 (computing derivatives and differentiability)

• Hubbard, section 2.8, pages 233-235 and page 246 (Newton's method)

• Hubbard, section 2.10 up through page 264. (inverse function theorem)

Proofs to present in section or to a classmate who has done them.

• 11.1 Let U ⊂ Rn be an open set, and let f and g be functions from U to


R. Prove that if f and g are differentiable at a then so is f g, and that

[D(f g)(a)] = f (a)[Dg(a)] + g(a)[Df (a)].

• 11.2 Using the mean value theorem, prove that if a function f : R^2 → R has
  partial derivatives D_1 f and D_2 f that are continuous at a, it is differentiable
  at a and its derivative is the Jacobian matrix [D_1 f(a) \; D_2 f(a)].

R Scripts

• Script 3.3A-ComputingDerivatives.R
Topic 1 - Testing for differentiability
Topic 2 - Illustrating the derivative rules

• Script 3.3B-NewtonsMethod.R
Topic 1 - Single variable
Topic 2 - 2 equations, 2 unknowns
Topic 3 - Three equations in three unknowns

• Script 3.3C-InverseFunction.R
Topic 1 - A parametrization function and its inverse
Topic 2 - Visualizing coordinates by means of a contour plot
Topic 3 - An example that is economic, not geometric

1 Executive Summary
1.1 Definition of the derivative
• Converting the derivative to a matrix
The linear function f (h) = mh is represented by the 1 × 1 matrix [m].
When we say that f 0 (a) = m, what we mean is that the function
f (a + h) − f (a) is well approximated, for small h, by the linear function
mh. The error made by using the approximation is a “remainder” r(h) =
f (a + h) − f (a) − mh. If f is differentiable, this remainder approaches 0
faster than h, i.e.
\lim_{h \to 0} \frac{r(h)}{h} = \lim_{h \to 0} \frac{f(a + h) - f(a) - mh}{h} = 0.
This definition leads to the standard rule for calculating the number m,
m = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}.
• Extending this definition to f : Rn → Rm
A linear function L(~h) is represented by an m × n matrix.
When we say that f is differentiable at a, we mean that the function
f (a + ~h) − f (a) is well approximated, for any ~h whose length is small, by a
linear function L, called the derivative [Df (a)].
The error made by using the approximation is a “remainder”
r(~h) = f (a + ~h) − f (a) − [Df (a)](~h).
f is called differentiable if this remainder approaches 0 faster than |~h|, i.e.

\lim_{\vec{h} \to \vec{0}} \frac{1}{|\vec{h}|}\, r(\vec{h}) = \lim_{\vec{h} \to \vec{0}} \frac{1}{|\vec{h}|}\left(f(a + \vec{h}) - f(a) - [Df(a)](\vec{h})\right) = 0.
In that case, [Df (a)] is represented by the Jacobian matrix [Jf (a)].
Proof: Since L exists and is linear, it is sufficient to consider its action on
each standard basis vector. We choose \vec{h} = t\vec{e}_i so that |\vec{h}| = t. Knowing
that the limit exists, we can use any sequence that converges to the origin
to evaluate it, and so
\lim_{t \to 0} \frac{1}{t}\left(f(a + t\vec{e}_i) - f(a) - tL\vec{e}_i\right) = 0, \quad \text{and so} \quad L(\vec{e}_i) = \lim_{t \to 0} \frac{1}{t}\left(f(a + t\vec{e}_i) - f(a)\right).

What is hard is proving that f is differentiable – that L exists – since that


requires evaluating a limit where ~h → ~0. Eventually we will prove that f is
differentiable at a if all its partial derivatives are continuous there.

1.2 Proving differentiability and calculating derivatives
In every case f is a function from U to Rm , where U is an open subset of Rn .
• f is constant: f = c. Then [Df(a)] is the zero linear transformation, since
  \lim_{\vec{h} \to \vec{0}} \frac{1}{|\vec{h}|}\left(f(a + \vec{h}) - f(a) - [Df(a)]\vec{h}\right) = \lim_{\vec{h} \to \vec{0}} \frac{1}{|\vec{h}|}\left(c - c - \vec{0}\right) = \vec{0}.

• f is affine: a constant plus a linear function, f = c + L. Then [Df(a)] = L, since
  \lim_{\vec{h} \to \vec{0}} \frac{1}{|\vec{h}|}\left(f(a + \vec{h}) - f(a) - [Df(a)]\vec{h}\right) = \lim_{\vec{h} \to \vec{0}} \frac{1}{|\vec{h}|}\left(c + L(a + \vec{h}) - (c + L(a)) - L(\vec{h})\right) = 0.

   
• f has differentiable components: if f = \begin{pmatrix} f_1 \\ \vdots \\ f_n \end{pmatrix}, then [Df(a)] = \begin{pmatrix} [Df_1(a)] \\ \vdots \\ [Df_n(a)] \end{pmatrix}.

• f + g is the sum of two functions f and g, both differentiable at a.


The derivative of f + g is the sum of the derivatives of f and g. (easy to
prove)

• f g is the product of scalar-valued function f and vector-valued g, both


differentiable. Then
[D(f g)(a)]~v = f (a)([Dg(a)]~v) + ([Df (a)]~v)g(a).

• g/f is the quotient of vector-valued function g and scalar-valued f, both
  differentiable, and f(a) ≠ 0. Then
  \left[D\left(\frac{g}{f}\right)(a)\right]\vec{v} = \frac{[Dg(a)]\vec{v}}{f(a)} - \frac{([Df(a)]\vec{v})\, g(a)}{(f(a))^2}.
• U ⊂ Rn and V ⊂ Rm are open sets, and a is a point in U at which we want
to evaluate a derivative.
g : U → V is differentiable at a, and [Dg(a)] is an m × n Jacobian matrix.
f : V → Rp is differentiable at g(a), and [Df (g(a))] is a p × m Jacobian
matrix.
The chain rule states that [D(f ◦ g)(a)] = [Df(g(a))] ◦ [Dg(a)].

• The combined effect of all these rules is that if a function is
  defined by well-behaved formulas (no division by zero), it is differentiable,
  and its derivative is represented by its Jacobian matrix.

1.3 Connection between Jacobian matrix and derivative
• If f : R^n → R^m is defined on an open set U ⊂ R^n, and
  f(x) = f\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{pmatrix},
  the Jacobian matrix [Jf(x)] is made up of all the partial derivatives of f:
  [Jf(a)] = \begin{pmatrix} D_1 f_1(a) & \cdots & D_n f_1(a) \\ \vdots & & \vdots \\ D_1 f_m(a) & \cdots & D_n f_m(a) \end{pmatrix}

• We can invent pathological cases where the Jacobian matrix of f exists


(because all the partial derivatives exist), but the function f is not differ-
entiable. In such a case, using the formula
∇~v f (a) = [Jf (a)]~v
generally gives the wrong answer for the directional derivative! You are
trying to use a linear approximation where none exists.
• Using the Jacobian matrix of partial derivatives to get a good affine ap-
proximation for f (a + ~h) is tantamount to assuming that you can reach the
point a + ~h by moving along lines that are parallel to the coordinate axes
and that the change in the function value along the solid horizontal line is
well approximated by the change along the dotted horizontal line. With
the aid of the mean value theorem, you can show that this is the case if
(proof 11.2) the partial derivatives of f at a are continuous.

[Figure: a rectangle with corners (a_1, a_2), (a_1 + h_1, a_2), (a_1, a_2 + h_2), and (a_1 + h_1, a_2 + h_2); the path from a to a + \vec{h} runs along segments parallel to the coordinate axes.]

1.4 Newton’s method – one variable


Newton's method is based on the tangent-line approximation. Function f is
differentiable. We are trying to solve the equation f(x) = 0, and we have found
a value a_0 that is close to the desired x. So we use the best affine approximation
f(x) ≈ f(a_0) + f'(a_0)(x − a_0).
Then we find a value a_1 for which this tangent-line approximation equals zero:
f(a_0) + f'(a_0)(a_1 − a_0) = 0, and a_1 = a_0 − f(a_0)/f'(a_0).
When f(a_0) is small, f'(a_0) is large, and f'(a_0) does not change too rapidly, a_1
is a much improved approximation to the desired solution x. Details, for which
Kantorovich won the Nobel prize in economics, are in Hubbard.
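A minimal R sketch of the iteration (mine, not Script 3.3B), applied to the sample equation cos x − x = 0, chosen here purely for illustration:

    newton1 <- function(f, fprime, a0, steps = 5) {
      a <- a0
      for (i in 1:steps) a <- a - f(a)/fprime(a)   # a1 = a0 - f(a0)/f'(a0), iterated
      a
    }
    newton1(function(x) cos(x) - x, function(x) -sin(x) - 1, 1)   # about 0.7390851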

1.5 Newton’s method – more than one variable
Example: we are trying to solve a system of n nonlinear equations in n unknowns,
e.g.
x2 ey − sin(y) − 0.3 = 0
tan x + x2 y 2 − 1 = 0.
Ordinary algebra is no help – there is no nonlinear counterpart to row reduction.
U is an open subset of R^n, and we have a differentiable function \vec{f}(x) : U → R^n.
In the example, \vec{f}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x^2 e^y - \sin(y) - 0.3 \\ \tan x + x^2 y^2 - 1 \end{pmatrix}, which is differentiable.
We are trying to solve the equation ~f (x) = ~0.
Suppose we have found a value a0 that is close to the desired x.
Again we use the best affine approximation
\vec{f}(x) ≈ \vec{f}(a_0) + [D\vec{f}(a_0)](x − a_0).

We set out to find a value a1 for which this affine approximation equals zero.
\vec{f}(a_0) + [D\vec{f}(a_0)](a_1 − a_0) = \vec{0}.

This is a linear equation, which we know how to solve!


If [D\vec{f}(a_0)] is invertible (and if it is not, we look for a better a_0), then
a_1 = a_0 - [D\vec{f}(a_0)]^{-1}\, \vec{f}(a_0).

Iterating this procedure is the best-known method for solving systems of nonlinear
equations. Hubbard has a detailed discussion (which you are free to ignore) of how
to use Kantorovich's theorem to assess convergence.
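A hedged R sketch of the iteration for the example system (mine, not Script 3.3B; the starting point is an assumption made for illustration). solve(Df(a), f(a)) computes [Df(a_0)]^{-1} f(a_0) without forming the inverse explicitly:

    f <- function(v) { x <- v[1]; y <- v[2]
      c(x^2*exp(y) - sin(y) - 0.3, tan(x) + x^2*y^2 - 1) }
    Df <- function(v) { x <- v[1]; y <- v[2]      # Jacobian, row by row
      matrix(c(2*x*exp(y),           x^2*exp(y) - cos(y),
               1/cos(x)^2 + 2*x*y^2, 2*x^2*y), 2, 2, byrow = TRUE) }
    a <- c(0.5, 0.5)                  # assumed starting guess
    for (i in 1:6) a <- a - solve(Df(a), f(a))
    a; f(a)                           # f(a) should now be very near (0, 0)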

1.6 The inverse function theorem – short version


For function f : [a, b] → [c, d], we know that if f is strictly increasing or strictly
decreasing on interval [a, b], there is an inverse function g for which g ◦ f and
f ◦ g are both the identity function. We can find g(y) for a specific y by solving
f (x) − y = 0, perhaps by Newton’s method. If f (x0 ) = y0 and f 0 (x0 ) 6= 0, we
can prove that g is differentiable at y0 and that g 0 (y0 ) = 1/f 0 (x0 ).
“Strictly monotone” does not generalize, but “nonzero f 0 (x0 )” generalizes to
“invertible [Df (x0 )].” Start with a function f : Rn → Rn whose partial derivatives
are all continuous, so that we know that it is differentiable everywhere. Choose
a point x0 where the derivative [Df (x0 )] is an invertible matrix. Set y0 = f (x0 ).
Then there is a differentiable local inverse function g = f −1 such that

• g(y0 ) = x0 .

• f (g(y)) = y if y is close enough to y0 .

• [Dg(y)] = [Df(g(y))]^{-1} (follows from the chain rule)

2 Lecture outline
1. (Proof 11.1)
Let U ⊂ Rn be an open set, and let f and g be functions from U to R.
Prove that if f and g are differentiable at a then so is f g, and that

[D(f g)(a)] = f (a)[Dg(a)] + g(a)[Df (a)].


(This is simpler than the version in Hubbard because both f and g are
scalar-valued functions)

2. (Chain rule in R2 – not a proof, but still pretty convincing)


U ⊂ R2 and V ⊂ R2 are open sets, and a is a point in U at which we want
to evaluate a derivative.
g : U → V is differentiable at a, and [Dg(a)] is a 2 × 2 Jacobian matrix.
f : V → R2 is differentiable at g(a), and [Df (g(a))] is a 2 × 2 Jacobian
matrix.
The chain rule states that [D(f ◦ g)(a)] = [Df(g(a))] ◦ [Dg(a)].
Draw a diagram to illustrate what happens when you use derivatives to find
a linear approximation to (f ◦ g)(a). This can be done in a single step or
in two steps.

3. (Proof 11.2) Using the mean value theorem, prove that if a function f :
   R^2 → R has partial derivatives D_1 f and D_2 f that are continuous at a, it is
   differentiable at a and its derivative is the Jacobian matrix [D_1 f(a) \; D_2 f(a)].

4. Newton’s method

(a) One variable: Function f is differentiable. You are trying to solve the
equation f (x) = 0, and you have found a value a0 , close to the desired
x, for which f (a0 ) is small. Derive the formula a1 = a0 − f (a0 )/f 0 (a0 )
for an improved estimate.
(b) n variables: U is an open subset of Rn , and function ~f (x) : U → Rn is
differentiable. You are trying to solve the equation ~f (x) = ~0,
and you have found a value a0 , close to the desired x, for which ~f (a0 )
is small. Derive the formula
a_1 = a_0 - [D\vec{f}(a_0)]^{-1}\, \vec{f}(a_0)

for an improved estimate.

5. Derivative of inverse function


Suppose that f : Rn → Rn is a continuously differentiable function. Choose
a point x0 where the derivative [Df (x0 )] is an invertible matrix. Set y0 =
f (x0 ). Let g be the differentiable local inverse function g = f −1 such that
g(y0 ) = x0 and f (g(y)) = y if y is close enough to y0 .
Prove that [Dg(y_0)] = [Df(x_0)]^{-1}.

6. Jacobian matrix for a parametrization function
Here is the function that converts the latitude u and longitude v of a point
on the unit sphere to the Cartesian coordinates of that point:
f\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} \cos u \cos v \\ \cos u \sin v \\ \sin u \end{pmatrix}

Work out the Cartesian coordinates of the point with sin u = 3/5 (37 degrees
North latitude) and sin v = 1 (90 degrees East longitude), and calculate the
Jacobian matrix at that point. Then find the best affine approximation to
the Cartesian coordinates of the nearby point where u is 0.01 radians less
(going south) and v is 0.02 radians greater (going east).
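A hedged R check of this computation (my sketch, not an official script):

    f <- function(u, v) c(cos(u)*cos(v), cos(u)*sin(v), sin(u))
    u0 <- asin(3/5); v0 <- pi/2           # sin u = 3/5, sin v = 1
    Df <- cbind(c(-sin(u0)*cos(v0), -sin(u0)*sin(v0), cos(u0)),  # d/du column
                c(-cos(u0)*sin(v0),  cos(u0)*cos(v0), 0))        # d/dv column
    f(u0, v0) + Df %*% c(-0.01, 0.02)     # affine approximation to the nearby point
    f(u0 - 0.01, v0 + 0.02)               # exact value, for comparison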

7. Derivative of a function of a matrix (Example 1.7.17 in Hubbard):
A matrix is also a vector. When we square an n × n matrix A, the entries of
S(A) = A2 are functions of all the entries of A. If we change A by adding
to it a matrix H of small length, we will make a change in the function
value A2 that is a linear function of H plus a small “remainder.”
We could in principle represent A by a column vector with n^2 components
and the derivative of S by a very large matrix, but it is more efficient to
leave H in matrix form and use matrix multiplication to find the effect of
the derivative on a small increment matrix H. The derivative is still a linear
function, but it is represented by matrix multiplication in a different way.

(a) Using the definition of the derivative, show that the linear function
that we want is DS(H) = AH + HA.
(b) Confirm that DS is a linear function of H.
(c) Check that DS(H) is a good approximation to S(A+H)−S(A) for the
following simple case, where the matrices A and H do not commute.
   
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad H = \begin{pmatrix} 0 & h \\ k & 0 \end{pmatrix}
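A minimal R sketch for part (c) (mine; the numerical values of h and k are illustrative assumptions):

    A <- matrix(c(1, 0, 1, 1), 2, 2)   # the matrix A above (R fills column by column)
    h <- 1e-4; k <- 2e-4
    H <- matrix(c(0, k, h, 0), 2, 2)   # the increment matrix H
    S <- function(M) M %*% M
    S(A + H) - S(A)                    # actual change in A^2
    A %*% H + H %*% A                  # DS(H); the two differ only by the tiny H %*% H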

8. Two easy chain rule examples

(a) g : R → R^2 maps time into the position of a particle moving around
    the unit circle:
    g(t) = \begin{pmatrix} \cos t \\ \sin t \end{pmatrix}.
    f : R^2 → R maps a point into the temperature at that point:
    f\begin{pmatrix} x \\ y \end{pmatrix} = x^2 - y^2.
    The composition f ◦ g maps time directly into temperature.
    Confirm that [D(f ◦ g)(t)] = [Df(g(t))] ◦ [Dg(t)].
(b) Let φ : R → R be any differentiable function. You can make a function
    f : R^2 → R that is constant on any circle centered at the origin by
    forming the composition f\begin{pmatrix} x \\ y \end{pmatrix} = φ(x^2 + y^2).
    Show that f satisfies the partial differential equation yD_1 f − xD_2 f = 0.

9. Chain rule for functions of matrices
In sample problem 2 we showed that the derivative of the squaring function
S(A) = A^2 is DS(H) = AH + HA.
Proposition 1.7.19 (tedious proof on pp. 136-137) establishes the similar
rule that for T(A) = A^{-1}, the derivative is DT(H) = -A^{-1}HA^{-1}.
Now the function U(A) = A^{-2} can be expressed as the composition U =
S ◦ T.
Find the derivative DU (H) by using the chain rule.
The chain rule says “the derivative of a composition is the composition of
the derivatives,” even in a case like this where composition is not repre-
sented by matrix multiplication.

10. A non-differentiable function
    Consider a surface where the height z is given by the function
    f\begin{pmatrix} x \\ y \end{pmatrix} = \frac{3x^2 y - y^3}{x^2 + y^2}; \quad f\begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0.

This function is not differentiable at the origin, and so you cannot calculate
its directional derivatives there by using the Jacobian matrix!

(a) Along the first standard basis vector, the directional derivative at the
origin is zero. Find two unit vectors along other directions that also
have this property.
(b) Along the second standard basis vector, the directional derivative at
the origin is -1.
    Find two unit vectors along other directions that also have this property.
    (This surface is sometimes called a "monkey saddle," because a
    monkey could sit comfortably on it with its two legs and its tail placed
    along these three downward-sloping directions.)
(c) Calculate the directional derivative along an arbitrary unit vector
    \vec{e}_θ = \begin{pmatrix} \cos θ \\ \sin θ \end{pmatrix}. Using the trig identity \sin 3θ = 3\sin θ \cos^2 θ - \sin^3 θ,
    quickly rederive the special cases of parts (a) and (b).
(d) Using the definition of the derivative, give a convincing argument that
this function is not differentiable at the origin.

11. Newton's method
    We want an approximate solution to the equations
    \log x + \log y = 3
    x^2 - y = 1,
    i.e. f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \log x + \log y - 3 \\ x^2 - y - 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
    Knowing that log 3 ≈ 1.1, show that x_0 = \begin{pmatrix} 3 \\ 9 \end{pmatrix} is an approximate solution
to this equation, then use Newton’s method to improve the approximation.
Here is a check:
log 2.81 + log 6.87 = 2.98
2.81^2 − 6.87 = 1.02
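A one-step R sketch of this computation (mine, not an official script):

    f  <- function(v) c(log(v[1]) + log(v[2]) - 3, v[1]^2 - v[2] - 1)
    Df <- function(v) matrix(c(1/v[1], 1/v[2], 2*v[1], -1), 2, 2, byrow = TRUE)
    a0 <- c(3, 9)
    a1 <- a0 - solve(Df(a0), f(a0))
    a1; f(a1)       # compare a1 with the check values above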

12. An economic example of the inverse-function theorem:
    Your model: Providing x in health benefits and y in educational benefits
    leads to happiness H and cost C according to the equation
    \begin{pmatrix} H \\ C \end{pmatrix} = f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + x^{0.5} y \\ x^{1.5} + y^{0.5} \end{pmatrix}.
Currently, x = 4, y = 9, H = 22, C = 11. Your budget is cut, and you are
told to adjust x and y to reduce C to 10 and H to 19. Find an approximate
solution by using the inverse-function theorem.
 
We cannot find formulas for the inverse function g\begin{pmatrix} H \\ C \end{pmatrix} that would solve
the problem exactly, but we can calculate the derivative of g.
" √ # 
(a) Check that [Df] = \begin{pmatrix} 1 + \frac{y}{2\sqrt{x}} & \sqrt{x} \\ \frac{3}{2}\sqrt{x} & \frac{1}{2\sqrt{y}} \end{pmatrix} = \begin{pmatrix} 13/4 & 2 \\ 3 & 1/6 \end{pmatrix} is invertible.
(b) Use the derivative [Dg] = \begin{pmatrix} -0.03 & 0.36 \\ 0.55 & -0.6 \end{pmatrix} to approximate g\begin{pmatrix} 19 \\ 10 \end{pmatrix}.
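A hedged R sketch of the computation in parts (a) and (b) (mine):

    Df <- matrix(c(13/4, 2, 3, 1/6), 2, 2, byrow = TRUE)
    Dg <- solve(Df)                        # derivative of the inverse function
    round(Dg, 2)                           # close to the matrix quoted in part (b)
    c(4, 9) + Dg %*% c(19 - 22, 10 - 11)   # approximate new (x, y): about (3.73, 7.95)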

3 Group Problems
1. Chain rule

(a) Chain rule for matrix functions


In sample problem 4, we obtained the differentiation formula for U(A) =
A^{-2} by writing U = S ◦ T with S(A) = A^2, T(A) = A^{-1}. Prove
the same formula from the chain rule in a different way, by writing
U = T ◦ S. You may reuse the formulas for the derivatives of S and
T:
If S(A) = A2 then [DS(A)](H) = AH + HA.
If T (A) = A−1 then [DT (A)](H) = −A−1 HA−1 .
(b) Let U ⊂ R^2 be the set of points whose coordinates are both positive.
    Suppose that f : U → R can be written f\begin{pmatrix} x \\ y \end{pmatrix} = φ(y/x), for some
    differentiable φ : R → R.
Show that f satisfies the partial differential equation

xD1 f + yD2 f = 0.
(c) Chain rule with 2 × 2 matrices
    Start with a pair of polar coordinates \begin{pmatrix} r \\ θ \end{pmatrix}.
    Function g converts them to Cartesian coordinates \begin{pmatrix} x \\ y \end{pmatrix}.
    Function f then converts \begin{pmatrix} x \\ y \end{pmatrix} to \begin{pmatrix} 2xy \\ x^2 - y^2 \end{pmatrix}.
    Confirm that [D(f ◦ g)\begin{pmatrix} r \\ θ \end{pmatrix}] = [Df(g\begin{pmatrix} r \\ θ \end{pmatrix})] ◦ [Dg\begin{pmatrix} r \\ θ \end{pmatrix}].

2. Issues of differentiability

(a) Let
    f\begin{pmatrix} x \\ y \end{pmatrix} = \frac{x^2 y^2}{x^2 + y^2}.
    f is defined to be 0 at \begin{pmatrix} 0 \\ 0 \end{pmatrix}. State, in terms of limits, what it means
    to say that f is differentiable at \begin{pmatrix} 0 \\ 0 \end{pmatrix}, and prove that its derivative
    [Df\begin{pmatrix} 0 \\ 0 \end{pmatrix}] is the zero linear transformation.
(b) Suppose that A is a matrix and S is the cubing function given by the
formula S(A) = A3 . Prove that the derivative of S(A) is
[DS(A)](H) = A2 H + AHA + HA2 .
The proof consists in showing that the length of the “remainder” goes
to zero faster than the length of the matrix H.
(c) A continuous but non-differentiable function

    f\begin{pmatrix} x \\ y \end{pmatrix} = \frac{x^2 y}{x^2 + y^2}, \quad f\begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0.
    i. Show that both partial derivatives vanish at the origin, so that
       the Jacobian matrix at the origin is the zero matrix [0 0], but
       that the directional derivative along \begin{pmatrix} 1 \\ 1 \end{pmatrix} is not zero. How does
       this calculation show that the function is not differentiable at the
       origin?
    ii. For all points except the origin, the partial derivatives are given
        by the formulas
        D_1 f\begin{pmatrix} x \\ y \end{pmatrix} = \frac{2xy^3}{(x^2 + y^2)^2}, \quad D_2 f\begin{pmatrix} x \\ y \end{pmatrix} = \frac{x^4 - x^2 y^2}{(x^2 + y^2)^2}.
Construct a “bad sequence” of points approaching the origin to
show that D1 f is discontinuous at the origin.

3. Inverse functions and Newton’s method
(to be done in R, by modifying R script 3.3B)

(a) An approximate solution to the equations

x^3 + y^2 − xy = 1.08

x^2 y + y^2 = 2.04
is x0 = 1, y0 = 1.
Use one step of Newton’s method to improve this approximation.
(b) You are in charge of building the parking lots for a new airport. You
have ordered from amazon.com enough asphalt to pave 1 square kilo-
meter, plus 5.6 kilometers of chain-link fencing. Your plan is to build
two square, fenced lots. The short-term lot is a square of side x=0.6
kilometers; the long-term lot is a square of side y=0.8 kilometers. The
amount of asphalt A and the amount C of chain-link fencing required
are then specified by the function
     2 
\begin{pmatrix} A \\ C \end{pmatrix} = F\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x^2 + y^2 \\ 4x + 4y \end{pmatrix},
Alas, Amazon makes a small shipping error. They deliver enough
asphalt to pave 1.03 square kilometers but only 5.4 kilometers of fence.
i. Use the inverse-function theorem to find approximate new values
for x and y that use exactly what was shipped to you.
In this simple case you can check your answer by solving alge-
braically for x and y.
ii. Find a case where A = 1 but the value of C is such that this
approach will fail because [DF ] is not onto. (This case corresponds
to the maximum amount of fencing.)
(c) Saving Delos
The ancient citizens of Delos, threatened with a plague, consulted the
oracle of Delphi, who told them to construct a new cubical altar to
Apollo whose volume was double the size of the original cubical altar.
(For details, look up “Doubling the cube” on Wikipedia.)
If the side of the original altar was 1, the side of the new altar had to
be the real solution to f (x) = x3 − 2 = 0.
Numerous solutions to this problem have been invented. One uses a
“marked ruler” or “neusis”; another uses origami.
Your job is to use multiple iterations of Newton's method to find an
approximate solution for which x^3 − 2 is less than 10^{-8} in magnitude.
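A minimal R sketch of the iteration (mine, not Script 3.3B):

    x <- 1                                  # a convenient starting guess
    while (abs(x^3 - 2) >= 1e-8) {
      x <- x - (x^3 - 2)/(3*x^2)            # Newton step for f(x) = x^3 - 2
    }
    x; x^3 - 2                              # x is near 2^(1/3) = 1.259921...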

4 Homework
1. (similar to group problem 1a)
We know the derivatives of the matrix-squaring function S and the matrix-
inversion function T :
If S(A) = A2 then [DS(A)](H) = AH + HA.
If T (A) = A−1 then [DT (A)](H) = −A−1 HA−1 .

(a) Use the chain rule to find a formula for the derivative of the function
    U(A) = A^4.
(b) Use the chain rule to find a formula for the derivative of the function
    W(A) = A^{-4}.

2. (a) Hubbard, Exercise 1.7.21 (derivative of the determinant function).


This is really easy if you work directly from the definition of the deriva-
tive.
(b) Generalize this result to the 3 × 3 case. Hint: consider a matrix whose
columns are ~e1 + h~a1 , ~e2 + h~a2 , ~e3 + h~a3 , and use the definition of the
determinant as a triple product.

3. Hubbard, Exercise 1.8.6, part (b) only. In the case where f and g are
functions of time t, this formula finds frequent use in physics. You can
either do the proof as suggested in part (a) or model your proof on the one
for the dot product on page 143.

4. (similar to group problem 1b)


Hubbard, Exercise 1.8.9. The equation that you prove can be called a
“first-order partial differential equation.”

5. (similar to group problem 2c)
As a summer intern, you are given the job of reconciling the Democratic and
Republican proposals for tax reform. Both parties agree on the following
model:

• x is the change in the tax rate for the middle class.


• y is the change in the tax rate for the well-off.
• The net impact on revenue is given by the function
  f\begin{pmatrix} x \\ y \end{pmatrix} = \frac{x(x^2 - y^2)}{x^2 + y^2}, \quad f\begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0.
The Republican proposal is y = −x, while the Democratic proposal is
y = x.

(a) Show that f is continuous at the origin.


(b) Show that both proposals are revenue neutral by calculating two ap-
propriate directional derivatives. You will have to use the definition
of the directional derivative, not the Jacobian matrix.
(c) At the request of the White House, you investigate a 50-50 mix of the
two proposals, the compromise case where y = 0, and you discover
that it is not revenue neutral! Confirm this surprising conclusion by
showing that the directional derivatives at the origin cannot be given
by a linear function; i.e. that f is not differentiable.
(d) Your final task is to explain the issue in terms that legislators can un-
derstand: the function is not differentiable because its partial deriva-
tives are not continuous. Demonstrate that one of the partial deriva-
tives of f is discontinuous at the origin. (D2 f is less messy.)

6. Chain rule: an example with 2 × 2 matrices
   A similar example with a 3 × 3 matrix is on page 151 of Hubbard.
The function
f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \frac{1}{2}(x + y) \\ \sqrt{xy} \end{pmatrix}
was invented by Gauss about 200 years ago to deal with integrals of the form
\int_{-\infty}^{\infty} \frac{dt}{\sqrt{(t^2 + x^2)(t^2 + y^2)}}.

It was revived in the late 20th century as the basis of the AGM (arithmetic-
geometric mean) method for calculating π. You can get 1 million digits with
a dozen or so iterations.
The function is meant to be composed with itself; so it will be appropriate
to compute the derivative of f ◦ f by the chain rule.

(a) f is differentiable whenever x and y are positive; so its derivative is


given by its Jacobian matrix. Calculate this matrix.
 
    We choose to evaluate the derivative of f ◦ f at the point \begin{pmatrix} 8 \\ 2 \end{pmatrix}.
    Conveniently, f\begin{pmatrix} 8 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 \\ 4 \end{pmatrix}. The chain rule says that
    [D(f ◦ f)\begin{pmatrix} 8 \\ 2 \end{pmatrix}] = [Df\begin{pmatrix} 5 \\ 4 \end{pmatrix}]\,[Df\begin{pmatrix} 8 \\ 2 \end{pmatrix}].
Evaluate the two numerical Jacobian matrices. Because the derivative
of f is evaluated at two different points, they will not be the same.
(b) Write the formula for f ◦ f , compute and evaluate the lower left-hand
entry in its Jacobian matrix, and check that it agrees with the value
given by the chain rule.

7. (Related to group problem 3c)


The quintic equation x(x2 − 1)(x2 − 4) = 0 clearly has five real roots that
are all integers. So does the equation x(x2 − 1)(x2 − 4) − 1 = 0, but you
have to find them numerically. Get all five roots using Newton’s method,
carrying out enough iterations to get an error of less than .001. Use R to do
Newton’s method and to check your answers. If you have R plot a graph,
it will be easy to find an initial guess for each of the five roots.

8. (Related to group problem 3b, but involves extra iterations)
The CEO of a chain of retail stores will get a big bonus if she hits her volume
and profit targets for December exactly. Her microeconomics consultant,
fresh out of Harvard, tells her that both her target figures are functions
of two variables, investment x in Internet advertising and investment y in
television advertising. The former attracts savvier customers and so tends
to contribute to volume more than to profit.
The function that determines volume V and profit P is
\begin{pmatrix} V \\ P \end{pmatrix} = \begin{pmatrix} x^{3/4} y^{1/3} + x \\ x^{1/4} y^{2/3} + y \end{pmatrix}.

With x = 16, y = 8, V = 32, P = 16, our CEO figures she is set for a
big bonus. Suddenly, the board of directors, feeling that Wall Street is
looking as much for profit as for volume this year, changes her targets to
V = 24, P = 24. She needs to modify x and y to meet these new targets.

(a) Near V = 32, P = 16, there is an inverse function such that
    \begin{pmatrix} x \\ y \end{pmatrix} = g\begin{pmatrix} V \\ P \end{pmatrix}. Find its derivative [Dg], and use the derivative to find
values of x and y that are an approximate solution to the problem.
Because the increments to V and P are large, you should not expect
the approximate solution to be very good, but it will be better than
doing nothing.
(b) Use multiple iterations of Newton’s method in R to find accurate values
of x and y that meet the revised targets. Feel free to modify Script
3.3C.

9. (a) Hubbard, problem 2.10.2. Make a sketch to show how this mapping
defines an alternative coordinate system for the plane, in which a point
is defined by the intersection of two hyperbolas.
(b) The point x = 3, y = 2 is specified in this new coordinate system
by the coordinates u = 6, v = 5. Use the derivative of the inverse
function to find approximate values of x and y for a nearby point
where u = 6.5, v = 4.5. (This is essentially one iteration of Newton’s
method.)
(c) Find h such that the point u = 6 + h, v = 5.1 has nearly the same
x-coordinate as u = 6, v = 5.
(d) Find k such that the point x = 3 + k, y = 2.1 has nearly the same
u-coordinate as x = 3, y = 2.
(e) For this mapping, you can actually find a formula for the inverse func-
tion that works in the region of the plane where x, y, u, and v are all
positive. Find the rather messy formulas for x and y as functions of
u and v, and use them to answer the earlier questions. Once you cal-
culate the Jacobian matrix and plug in appropriate numerical values,
you will be back on familiar ground.
I could get Mathematica Solve[] to find the inverse function only after
I eliminated y by hand. At this point the quadratic formula does the
job anyway!

MATHEMATICS 23a/E-23a, Fall 2015
Linear Algebra and Real Analysis I
Module #3, Week 4
Implicit functions, manifolds, tangent spaces, critical points

Author: Paul Bamberg


R scripts by Paul Bamberg
Last modified: July 27, 2015 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-8 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.

Reading

• Hubbard, Section 3.1 (Implicit functions and manifolds)

• Hubbard, Section 3.2 (Tangent spaces)

• Hubbard, Section 3.6 (Critical points)

• Hubbard, Section 3.7 through page 354 (constrained critical points)

Proofs to present in section or to a classmate who has done them.

• 12.1 (Hubbard Theorem 3.2.4) Suppose that U ⊂ Rn is an open subset,


F : U → Rn−k is a C 1 mapping, and manifold M can be described as the
set of points that satisfy F(z) = 0. Use the implicit function theorem to
show that if [DF(c)] is onto for c ∈ M , then the tangent space Tc M is the
kernel of [DF(c)]. You may assume that the variables have been numbered
so that when you row-reduce [DF(c)], the first n − k columns are pivotal.

• 12.2 (Hubbard, theorems 3.6.3 and 3.7.1) Let U ⊂ R^n be an open subset
  and let f : U → R be a C^1 (continuously differentiable) function.
First prove, using a familiar theorem from single-variable calculus, that if
x0 ∈ U is an extremum, then [Df (x0 )] = [0].
Then prove that if M ⊂ Rn is a k-dimensional manifold, and c ∈ M ∩ U is
a local extremum of f restricted to M , then Tc M ⊂ ker[Df (c)].

R Scripts

• Script 3.4A-ImplicitFunction.R
Topic 1 - Three variables, one constraint
Topic 2 - Three variables, two constraints

• Script 3.4B-Manifolds2D.R
Topic 1 - A one-dimensional submanifold of R2 – the unit circle
Topic 2 - Interesting examples from the textbook
Topic 3 - Parametrized curves in R2
Topic 4 - A two-dimensional manifold in R2
Topic 5 - A zero-dimensional manifold in R2

• Script 3.4C-Manifolds3D.R
Topic 1 - A manifold as a function graph
Topic 2 - Graphing a parametrized manifold
Topic 3 - Graphing a manifold that is specified as a locus

• Script 3.4D-CriticalPoints
Topic 1 - Behavior near a maximum or minimum
Topic 2 - Behavior near a saddle point

• Script 3.5A-LagrangeMultiplier.R
Topic 1 - Constrained critical points in R2

1 Executive Summary
1.1 Implicit functions – review of the linear case.
We have n unknowns, n − k equations, e.g. for n = 3, k = 1:
2x + 3y − z = 0, 4x − 2y + 3z = 0.
Create an (n − k) × n matrix: T = \begin{pmatrix} 2 & 3 & -1 \\ 4 & -2 & 3 \end{pmatrix}.
If the matrix T is not onto, its rows (the equations) are linearly dependent.
Otherwise, when we row reduce, we will find n − k = 2 pivotal columns and
k = 1 nonpivotal columns. We assign values arbitrarily to the "active" variables
that correspond to the nonpivotal columns, and then the values of the "passive"
variables that correspond to the pivotal columns are determined.
Suppose that we reorder the unknowns so that the “active” variables come last.
Then, after we row reduce the matrix, the first n − k columns will be pivotal. So
the first n − k columns will be linearly independent, and they form an invertible
square matrix. The matrix is now of the form T = [A|B], where A is invertible.
The solution vector is of the form \vec{v} = \begin{pmatrix} \vec{x} \\ \vec{y} \end{pmatrix}, where the passive variables \vec{x} come
first, the active variables \vec{y} come second.
A solution to T ~v = ~0 is obtained by choosing ~y arbitrarily and setting
~x = −A−1 B~y. Our system of equations determines ~x “implicitly” in terms of ~y.

1.2 Implicit function theorem – the nonlinear case.


We have a point c ∈ Rn , a neighborhood W of c, and a function F : W → Rn−k
for which F(c) = 0 and [DF(c)] is onto. F imposes constraints.
The variables are ordered so that the n − k pivotal columns in the Jacobian
matrix, which correspond to the passive variables, come first. Let a denote the
passive variables at c; let b denote the active variables at c.
The implicit function g expresses the passive variables in terms of the active
variables, and g(b) = a. For y near b, x = g(y) determines passive variables
such that F\begin{pmatrix} x \\ y \end{pmatrix} = 0. Tweak y, and g specifies how to tweak x so that the
constraints are still satisfied.
Although we usually cannot find a formula for g, we can find its derivative at
b by the same recipe that worked in simple cases.
Evaluate the Jacobian matrix [DF(c)].
Extract the first n − k columns to get an invertible square matrix A.
Let the inverse of this matrix act on the remaining k columns (matrix B) and
change the sign to get the (n − k) × k Jacobian matrix for g.
That is, [Dg(b)] = −A^{-1}B.
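A small R sketch of this recipe (mine), applied to the linear example of section 1.1, where the implicit function is exact:

    T <- matrix(c(2,  3, -1,
                  4, -2,  3), 2, 3, byrow = TRUE)
    A <- T[, 1:2]                 # pivotal columns (passive variables)
    B <- T[, 3, drop = FALSE]     # nonpivotal column (active variable)
    -solve(A) %*% B               # [Dg] = -A^{-1}B, here c(-7/16, 5/8)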

1.3 Curves, Surfaces, Graphs, and Manifolds
Manifolds are a generalization of smooth curves and surfaces.
The simplest sort of manifold is a flat one, described by linear equations. An
example is the line of slope 2 that passes through the point x = 0, y = −2: a
one-dimensional submanifold of R2
There are three equivalent ways to describe such a manifold.
• (The definition) As the graph of a function that expresses the passive vari-
  ables in terms of the active variables: either y = f(x) = −2 + 2x or
  x = g(y) = \frac{1}{2}(y + 2).
 
• As a "locus" defined by a constraint equation F\begin{pmatrix} x \\ y \end{pmatrix} = 2x − y − 2 = 0.

• By a parametrization function g(t) = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \end{pmatrix}.
Definition: A subset M ⊂ Rn is a smooth manifold if locally it is the graph
of a C 1 function (the partial derivatives are continuous). “Locally” means that
for any point x ∈ M we can find a neighborhood U of x such that within M ∩ U ,
there is a C 1 function that expresses n − k passive variables in terms of the other
k active variables. The number k is the dimension of the manifold. In R3 there
are four possibilities:
• k = 3. Any open subset M ⊂ R3 is a smooth 3-dimensional manifold. In
this case k = 3, and the manifold is the graph of a function
f : R3 → {~0}, whose codomain is the trivial vector space {~0} that contains
just a single point. Such a function is necessarily constant, and its derivative
is zero.
 
• k = 2. The graph of z = f\begin{pmatrix} x \\ y \end{pmatrix} = x^2 + y^2 is a paraboloid.

• k = 1. The graph of the function \begin{pmatrix} x \\ y \end{pmatrix} = \vec{f}(z) = \begin{pmatrix} \cos 2\pi z \\ \sin 2\pi z \end{pmatrix} is a helix.
• k = 0. In this case the manifold consists of one or more isolated points.
  Near any of these points x_0, it is the graph of a function \vec{f} : \{\vec{0}\} → R^3
  whose domain is a zero-dimensional vector space and whose image is the
  point x_0 ∈ R^3. This function is differentiable because, since its domain
  contains only one point (the zero vector), you cannot find nearby points to
  show that it is not differentiable.
There is no requirement that a manifold be the graph of a single function, or
that the "active" variables be the same at every point on the manifold. The unit
circle, the locus of x^2 + y^2 − 1 = 0, is the union of four function graphs, two of
which have x as the active variable, two of which have y. By using a parameter t
that is not one of the variables, we can represent it by the parametrization
\begin{pmatrix} x \\ y \end{pmatrix} = g(t) = \begin{pmatrix} \cos t \\ \sin t \end{pmatrix}.

1.4 Using the implicit function theorem
Start with an open subset U ⊂ Rn and a C 1 function F : U → Rn−k . Consider
the “locus,” M ∩ U , the set of solutions of the equation F(z) = 0.
If [DF(z)] is onto (surjective) for every z ∈ M ∩ U , then M ∩ U is a smooth
k-dimensional manifold embedded in Rn .
Proof: the implicit function theorem says precisely this. The statement that
[DF(z)] is onto guarantees the differentiability of the implicitly defined function.
If [DF(z)] does not exist or fails to be onto, perhaps even just at a single point,
the locus may fail to be a manifold. We use the notation M ∩ U because F may define
just part of a larger manifold M that cannot be described as the locus of a single
function. To say that M itself is a manifold, we have to find an appropriate U
and F for every point z in the manifold.

1.5 Parametrizing a manifold


For a k-dimensional submanifold of Rn , the parametrization function is γ : U →
M , where U ⊂ Rk is an open set. The variables in Rk are called “parameters.”
The parametrization function γ must be C 1 , one-to-one, and onto M . In other
words, we want γ to give us the entire manifold. Finding a local parametrization
that gives part of the manifold is of no particular interest, because there is, by
definition, a function graph that does that.
An additional requirement: the derivative of the parametrization function
is one-to-one for all parameter values. This requirement guarantees that the
columns of the Jacobian matrix [Dγ] are linearly independent.

1.6 Tangent space as graph, kernel, or image


Locally, a k-dimensional submanifold M of Rn is the graph of a function
g : Rk → Rn−k . The derivative of g, [Dg(b)], is an (n − k) × k matrix that
converts a vector of increments to the k active variables, ẏ, into a vector of
increments to the n − k passive variables, ẋ. That is, ẋ = [Dg(b)](ẏ)
A point c of M is specified by the active variables b and the accompanying
passive variables a. The tangent space T_c M is the graph of this derivative. It
is a k-dimensional subspace of R^n.
The k-dimensional manifold M can also be specified as the locus of the equation
F(z) = 0, for F : R^n → R^{n−k}. The tangent space T_c M is the kernel of the linear
transformation [DF(c)].
Finally, the manifold M can also be described as the image of a parametrization
function γ : U ⊂ R^k → R^n.
In this case any point of M is the image of some point u in the parameter
space, and the tangent space is T_{γ(u)} M = Img [Dγ(u)]. Whether specified as
graph, kernel, or image, the tangent space T_c M is the same! It contains the
increment vectors that lead from c to nearby points that are "almost on the
manifold."

1.7 Critical points
Suppose that function f : R^n → R is differentiable at point x_0 and that the
derivative [Df(x_0)] is not zero. Then there exists a vector \vec{v} for which the direc-
tional derivative is not zero, the function g(t) = f(x_0 + t\vec{v}) − f(x_0) has a nonzero
derivative at t = 0, and, even if we just consider points that lie on a line through
x_0 with direction vector \vec{v}, the function f cannot have a maximum or minimum
at x_0. So in searching for a maximum or minimum of f at points where it is
differentiable, we need to consider only "critical points" where [Df(x_0)] = 0.
A critical point is not necessarily a maximum or minimum, but for f : Rn → R
there is a useful test that generalizes the second-derivative test of single-variable
calculus. The proof relies on sections 3.3-3.5 of Hubbard, which we are skipping.
Form the “Hessian matrix” of second partial derivatives (Hubbard, p. 348),
evaluated at the critical point x of interest.

Hi,j (x) = Di Dj f (x).

H is a symmetric matrix. If it has a basis of eigenvectors and none of the


eigenvalues are zero, we can classify the critical point.
If H has a basis of eigenvectors, all with positive eigenvalues, the critical point
is a minimum.
If H has a basis of eigenvectors, all with negative eigenvalues, the critical
point is a maximum.
If H has a basis of eigenvectors, some with positive eigenvalues and some with
negative eigenvalues, the critical point is a saddle: it is neither a maximum nor a
minimum.

1.8 Constrained critical points


These are of great importance in physics, economics, and other areas to which
mathematics is applied.
Consider a point c on manifold M where the function f : R^n → R is differen-
tiable. Perhaps f has a maximum or minimum at c when its value is compared to
the value at nearby points on M, even though there are points not on M where f
is larger or smaller. In that case we should not consider all increment vectors,
but only those increment vectors \vec{v} that lie in the tangent space to the manifold.
The derivative [Df(c)] does not have to be the zero linear transformation, but
it has to give zero when applied to any increment that lies in the tangent space
T_c M, or
T_c M ⊂ Ker[Df(c)].
When manifold M is specified as the locus where some function F = 0, there
is an ingenious way of finding constrained critical points by using "Lagrange
multipliers" – but not this week!

1.9 Constrained critical points - three approaches
We have proved the following:
If M ⊂ Rn is a k-dimensional manifold, and c ∈ M ∩ U is a local extremum of f
restricted to M , then Tc M ⊂ ker[Df (c)].
Corresponding to each of the three ways that we can “know” the manifold
M, there is a technique for finding the critical points of f restricted to M .

• Manifold as a graph
  Near the critical point, the passive variables x are a function g(y) of the
  active variables y. Define the graph-making function
  g̃(y) = \begin{pmatrix} g(y) \\ y \end{pmatrix}.
  Now f(g̃(y)) specifies values of f only at points on the manifold. Just search
  for unconstrained critical points of this function by setting [D(f ◦ g̃)(y)] = 0.
  This approach works well if you can represent the entire manifold as a single
  function graph.

• Parametrized manifold
Points on the manifold are specified by a parametrization γ(u).
  Now f(γ(u)) specifies values of f only at points on the manifold. Just search
  for unconstrained critical points of this function by setting [D(f ◦ γ)(u)] = 0.
This approach works well if you can parametrize the entire manifold.

• Manifold specified by constraints


Points on the manifold all satisfy the constraints F(x) = 0.
In this case we know that
Tc M = Ker[DF(c)], so the rule for a critical point becomes
Ker[DF(c)] ⊂ Ker[Df (c)].
If there is just a single constraint F (x) = 0, both derivative matrices consist
of just a single row, and we can represent the condition for a critical point
as Ker α ⊂ Ker β.
Suppose that ~v ∈ ker α and that β = λα. The quantity λ is called a
Lagrange multiplier. Then by linearity, [Df (c)]~v = β~v = λα~v = 0.
So [Df (c)]~v = 0 for any vector in the tangent space of F = 0, and we have
a constrained critical point.
It is not quite so obvious that the condition β = λα is necessary as well as
sufficient. We will need to do a proof by contradiction (proof 13.1).

1.10 Equality of crossed partial derivatives
Let U ⊂ Rn be open. Suppose that f : Rn → R is differentiable at a and has the
property that each of its partial derivatives Di f is also differentiable at a. Then
Dj (Di f )(a) = Di (Dj f )(a).
The proof consists in using the mean value theorem to show that
D_j(D_i f)(a) = D_i(D_j f)(a) = \lim_{t \to 0} \frac{1}{t^2}\left(f(a + t\vec{e}_i + t\vec{e}_j) - f(a + t\vec{e}_i) - f(a + t\vec{e}_j) + f(a)\right).

2 Proofs
1. Let W be an open subset of Rn , and let F : W → Rn−k be a C 1 mapping
such that F(c) = 0. Assume that [DF(c)] is onto.
Prove that the n variables can be ordered so that the first n − k columns
of [DF(c)] are linearly independent, and that [DF(c)] = [A|B] where A is
an invertible (n − k) × (n − k) matrix.
 
Set c = \begin{pmatrix} a \\ b \end{pmatrix}, where a are the n − k passive variables and b are the k active
variables.
Let g be the "implicit function" from a neighborhood of b to a neighborhood
of a such that g(b) = a and F\begin{pmatrix} g(y) \\ y \end{pmatrix} = 0.
y
Prove that [Dg(b)] = −A−1 B.

2. (Proof 12.1 - Hubbard Theorem 3.2.4)
Suppose that U ⊂ Rn is an open subset, F : U → Rn−k is a C 1 mapping,
and manifold M can be described as the set of points that satisfy F(z) = 0.
Use the implicit function theorem to show that if [DF(c)] is onto for c ∈ M ,
then the tangent space Tc M is the kernel of [DF(c)]. You may assume that
the variables have been numbered so that when you row-reduce [DF(c)],
the first n − k columns are pivotal.

3. (Hubbard, Proposition 3.2.7) Let U ⊂ Rk be open, and let γ : U → Rn be
a parametrization of manifold M . Show that

Tγ(u) M = img[Dγ(u)].

You may take it as proved that if subspaces V and W both have dimension
k and V ⊂ W, then V = W (for the simple reason that k basis vectors for
V are k independent vectors in W and therefore also form a basis for W ).

4. (Proof 12.2 – Hubbard, theorems 3.6.3 and 3.7.1)
Let U ⊂ R^n be an open subset and let f : U → R be a C^1 (continuously
differentiable) function.
First prove, using a familiar theorem from single-variable calculus, that if
x0 ∈ U is an extremum, then [Df (x0 )] = [0].
Then prove that if M ⊂ Rn is a k-dimensional manifold, and c ∈ M ∩ U is
a local extremum of f restricted to M , then Tc M ⊂ ker[Df (c)].

3 Sample Problems
1. A cometary-exploration robot is fortunate enough to land on an ellipsoidal
comet whose surface is described by the equation

x^2 + \frac{y^2}{4} + \frac{z^2}{9} = 9.
Its landing point is x = 2, y = 4, z = 3.

• Prove that the surface of the comet is a smooth manifold.


• The controllers of the robot want it to move to a nearby point on the
surface where y = 4.02, z = 3.06. Use the implicit function theorem to
determine the approximate x coordinate of this point.
(Check: 1.98^2 + 4.02^2/4 + 3.06^2/9 = 9.0009.)
• Find a basis for the tangent space at the landing point.
• Find the equation of the tangent plane at the landing point.
(Check: 4(1.98) + 2(4.02) + (2/3)(3.06) = 18.)
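A minimal R sketch (mine) of the implicit-function step in the second part:

    # F = x^2 + y^2/4 + z^2/9 - 9; gradient at the landing point (2, 4, 3)
    DF <- c(2*2, 4/2, 2*3/9)       # (4, 2, 2/3); x is taken as the passive variable
    A <- DF[1]; B <- DF[2:3]
    dx <- -(1/A) * sum(B * c(0.02, 0.06))   # about -0.02
    2 + dx                                  # approximate x at y = 4.02, z = 3.06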

2. The plane x + 2y − 3z + 4 = 0 and the cone x^2 + y^2 − z^2 = 0 intersect in a
   curve that includes the point c = \begin{pmatrix} 3 \\ 4 \\ 5 \end{pmatrix}. Near that point this curve is the
   graph of a function \begin{pmatrix} x \\ y \end{pmatrix} = g(z).
   Use the implicit function theorem to determine g'(5), then find the approximate
   coordinates of a point on the curve with z = 5.01.
   Check: 2.99 + 2(4.02) − 3(5.01) = −4; 2.99^2 + 4.02^2 = 25.1005 ≈ 5.01^2.

3. Assume that, at the top level, there are nine categories x1 , x2 , · · · , x9 in the
Federal budget. They must satisfy four constraints:

• One simply fixes the total dollar amount.


• One comes from your political advisors – it makes the budget look
  good to likely voters in swing states.
• One comes from Congress – it guarantees that everyone can have his
  or her earmarks.
• One comes from the Justice Department – it guarantees compliance
  with all laws.

These four constraints together define a function F whose derivative is onto


for budgets that satisfy the constraints. The acceptable budgets, for which
F(x) = 0, form a k-dimensional submanifold M of Rn .
Specify the dimension of the domain and codomain for:

(a) A function g that specifies the passive variables in terms of the active
    variables.
(b) The function F that specifies the constraints.
(c) A parametrization function γ that generates a valid budget from a set
of parameters.

For each alternative, specify the shape of the matrix that represents the
derivative of the relevant function and explain how, given a valid budget c,
it could be used to find a basis for the tangent space Tc M.

4. (Hubbard, exercise 3.1.17) Consider the situation described by Example
3.1.8 in Hubbard, where four linked rigid rods form a quadrilateral in the
plane. The distance from vertex x1 to x2 is l1 , the distance from vertex x2
to x3 is l2 , the distance from vertex x3 to x4 is l3 , and the distance from
vertex x4 to x1 is l4 .
Show that knowing the positions x1 and x3 of two opposite vertices deter-
mines exactly four possible positions of the linkage if the distance from x1
to x_3 is less than both l_1 + l_2 and l_3 + l_4 but greater than both |l_1 − l_2| and
|l_3 − l_4|. Draw diagrams to illustrate what can happen if these conditions
are not satisfied.

5. Critical points

   f\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{2}x^2 + \frac{1}{3}y^3 - xy

   Calculate the partial derivatives as functions of x and y, and show that the
   only critical points are \begin{pmatrix} 0 \\ 0 \end{pmatrix} and \begin{pmatrix} 1 \\ 1 \end{pmatrix}.

   Calculate the Hessian matrix H and evaluate it numerically at each critical
   point to get matrices H_0 and H_1.

   Find the eigenvalues of H_0 and classify the critical point at \begin{pmatrix} 0 \\ 0 \end{pmatrix}.

   Find the eigenvalues of H_1 and classify the critical point at \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
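A hedged R check of the eigenvalue computations (my sketch), using eigen():

    H <- function(x, y) matrix(c(1, -1, -1, 2*y), 2, 2)  # Hessian of (1/2)x^2 + (1/3)y^3 - xy
    eigen(H(0, 0))$values   # one positive, one negative eigenvalue: a saddle
    eigen(H(1, 1))$values   # both eigenvalues positive: a minimum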

4 Group Problems
1. Implicitly defined functions
 
(a) The nonlinear equation F\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x^2 + y^2 + z^2 - 3 \\ x^2 + z^2 - 2 \end{pmatrix} = 0 implicitly
    determines x and y as a function of z. The first equation describes a
    sphere of radius \sqrt{3}, the second describes a cylinder of radius \sqrt{2} whose
    axis is the y-axis. The intersection is a circle in the plane y = 1.
    Near the point x = 1, y = 1, z = 1, there is a function that expresses
    the two passive variables x and y in terms of the active variable z:
    g(z) = \begin{pmatrix} \sqrt{2 - z^2} \\ 1 \end{pmatrix}.
    Calculate g'(z) and determine the numerical value of g'(1).
    Then get the same answer without using the function g, by forming
    the Jacobian matrix [DF], evaluating it at x = y = z = 1, and using
    the implicit function theorem to determine g'(z) = −A^{-1}B.
(b) Dean Smith is working on a budget in which he will allocate x to the
library, y to pay raises, and z to the Houses. He is constrained.
The Library Committee, happy to see anyone get more funds as long
as the library does even better, insists that x2 − y 2 − z 2 = 1.
The Faculty Council, content to see the Houses do well as long as other
areas benefit equally, recommends that x + y − 2z = 1.
To comply with these constraints, the dean tries x = 3, y = 2, z = 2.
    Given the constraints, x and y are determined by an implicitly defined
    function \begin{pmatrix} x \\ y \end{pmatrix} = g(z).
    Use the implicit function theorem to calculate g'(2), and use it to find
    approximate values of x and y if z is increased to 2.1.
 
(c) The nonlinear equation F(x, y, z) = x^2 − 4z^2 − 4y^2 − 1 = 0 implicitly determines x as a function of y and z, but we need to know whether x is positive or negative to choose the right square root in the function.
Find the appropriate function g(y, z) near the point x = 3, y = 1, z = 1, and calculate [Dg(1, 1)].
Then get the same answer by calculating the Jacobian matrix [DF] at x = 3, y = 1, z = 1, splitting off a square matrix A on the left, and computing [Dg] = −A^(−1)B.
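For part (a), the −A^(−1)[B] computation can be checked numerically. A minimal sketch in base R, assuming the hand-computed Jacobian entries below:

DF <- rbind(c(2, 2, 2),        # derivative of x^2 + y^2 + z^2 - 3 at (1, 1, 1)
            c(2, 0, 2))        # derivative of x^2 + z^2 - 2 at (1, 1, 1)
A <- DF[, 1:2]                 # square block for the passive variables x and y
B <- DF[, 3, drop = FALSE]     # column for the active variable z
-solve(A) %*% B                # g'(1); it should match your direct differentiation of g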
2. Manifolds and tangent spaces, investigated with help from R
(a) Manifold M is known by the equation F(x, y, z) = xz − y^2 = 0 near the point c = (4, 2, 1). It can also be described parametrically by γ(s, t) = (s^2, st^2, t^4) near s = 2, t = 1.
i. Use the parametrization to find a basis for the tangent space T_c M.
ii. Use the function F to confirm that your basis vectors are indeed in the tangent space T_c M.
iii. Use the parametrization to do a wireframe plot of the parametrized manifold near s = 2, t = 1. See script 3.4C, topic 2.
(b) Manifold M is known by the equation F(x, y, z) = x^2 y + xy^2 − z^2 + 3 = 0 near the point c = (2, 1, 3).
i. Find a basis for the tangent space T_c M.
ii. Locally, M is the graph of a function x = g(y, z). Determine [Dg(1, 3)] by using the implicit function theorem.
iii. Solve for z in terms of x and y, and use R to do a wireframe plot of the manifold. See script 3.4C, topic 1.
(c) (Hubbard, Example 3.1.14) F(z_1, z_2, z_3) = (z_3, z_3 − z_1 z_2).
Construct [DF]. It has two rows.
Find the point for which [DF] is not onto. Use R to find points on the manifold near this point, and try to figure out what is going on. See the end of script 3.4C for an example of how to find points on a 1-dimensional manifold in R^3.
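For part (a) of this problem, here is a minimal base-R sketch of steps i and ii together. The numeric entries are hand-computed partial derivatives of γ and F at the given point, so verify them before trusting the output.

Dgamma <- cbind(c(4, 1, 0),          # d(gamma)/ds at s = 2, t = 1
                c(0, 4, 4))          # d(gamma)/dt at s = 2, t = 1
DF <- matrix(c(1, -4, 4), nrow = 1)  # [DF] = [z, -2y, x] evaluated at c = (4, 2, 1)
DF %*% Dgamma                        # both entries are 0: the columns of Dgamma lie in Tc M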
3. Critical points (rigged to make the algebra work, but you should also plot
contour lines in R and use them to find the critical points)
Calculate the Jacobian matrix and the Hessian both by using R and with pencil and paper.
(a) i. Find the one and only critical point of f(x, y) = 4x^2 + (1/2) y^2 + 8/(x^2 y) on the square 1/4 ≤ x ≤ 4, 1/4 ≤ y ≤ 4.
ii. Use second derivatives (the Hessian matrix) to determine whether
this critical point is a maximum, minimum, or neither.
(b) The domain of the function F(x, y) = y^2 + (x^2 − 3x) log y is the upper half-plane y > 0. Find all the critical points of F, and use the Hessian matrix to classify each as maximum, minimum, or saddle point.
(c) The function F(x, y) = x^2 y − 3xy + (1/2) x^2 + y^2 has three critical points, two of which lie on the line x = y. Find each and use the Hessian matrix to classify it as maximum, minimum, or saddle point.
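As suggested above, contour lines are a quick way to locate a critical point before doing any algebra. A minimal base-R sketch for part (a); the grid covers the square given in the problem.

x <- seq(1/4, 4, length.out = 200)
y <- seq(1/4, 4, length.out = 200)
f <- function(x, y) 4 * x^2 + y^2 / 2 + 8 / (x^2 * y)
z <- outer(x, y, f)
contour(x, y, z, nlevels = 30)   # closed contours shrink toward the critical point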
5 Homework - due on December 2
Although all of these problems except the last one were designed so that they
could be done with pencil and paper, it makes sense to do a lot of them in R,
and the Week 12 scripts provide good models. For each problem that you choose
to do in R, include a “see my script” reference in the paper version. Put all your
R solutions into a single script, and upload it to the homework dropbox on the
week 12 page.
When you use R, you will probably want to include some graphs that are not
required by the statement of the problem.
Do appreciate that problems 3 and 4, which use only androgynous names, are
sexual-orientation neutral as well as gender-neutral and avoid the use of third-
person singular pronouns.
1. (Hubbard, exercise 3.12)
Let X ⊂ R^3 be the set of midpoints of segments joining a point of the curve C_1 of equation y = x^2, z = 0 to a point of the curve C_2 of equation z = y^2, x = 0.
(a) Parametrize C_1 and C_2.
(b) Parametrize X.
(c) Find an equation for X (i.e., describe X as a locus).
(d) Show that X is a smooth surface.
2. Manifold M is known by the equation F(x, y, z) = x^2 + y^4 − 2z^2 − 2 = 0 near the point c = (3, 1, 2).
(a) Locally, near c, M is the graph of a function x = g(y, z). Determine [Dg(c)] by using the implicit function theorem.
(b) Use [Dg(c)] to find the approximate value of x for a point of M near c for which y = 1.1, z = 1.8.
(c) Check your answers by finding an explicit formula for g and taking its derivative.
3. Pat and Terry are in charge of properties for the world premiere of the
student-written opera “Goldfinger” at Dunster House. In the climactic
scene the anti-hero takes the large gold brick that he has made by melting
down chalices that he stole from the Vatican Museum and places it in a
safety deposit box in a Swiss bank while singing the aria “Papal gold, now
rest in peace.”
The gold brick is supposed to have length x = 8, height y = 2, and width
z = 4. With these dimensions in mind, Pat and Terry have spent their
entire budget on 112 square inches of gold foil and 64 cubic inches of an
alloy that melts at 70 degrees Celsius. They plan to fabricate the brick by
melting the alloy in a microwave oven and casting it in a sand mold.
Alas, the student mailboxes that they have borrowed to simulate safety-
deposit boxes turn out to be not quite 4 inches wide. Fortunately, the
equation
F(x, y, z) = (xyz − 64, xy + xz + yz − 56) = 0
specifies x and y implicitly in terms of z.
(a) Use the implicit function theorem to find [Dg(4)], where g is the function that specifies (x, y) in terms of z, and find the approximate dimensions of a brick with the same volume and surface area as the original but with a width of only 3.9 inches.
(b) Show that if the original dimensions had been x = 2, y = 2, z = 16,
then the constraints of volume 64, surface area 136 specify y and z in
terms of x but fail to specify x and y in terms of z.
(c) Show that if the original brick had been a cube with x = y = z = 4,
then, with the constraints of volume 64, surface area 96, we cannot
show the existence of any implicit function. In fact there is no implicit
function, but our theorem does not prove that fact. This happens
because this cube has minimum surface area for the given volume.
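Before differentiating anything, it is worth confirming that the intended dimensions satisfy both constraints. A minimal sketch in base R:

F <- function(x, y, z) c(x * y * z - 64,              # volume constraint
                         x * y + x * z + y * z - 56)  # surface-area constraint (112/2 = 56)
F(8, 2, 4)   # returns c(0, 0): the original 8 by 2 by 4 brick satisfies both constraints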
4. This problem is an example of a two-dimensional submanifold of R4 .
For their term project in the freshman seminar “Nuclear Terrorism and the
Third World,” Casey and Chris decide to investigate whether plutonium
can be handled safely using only bronze-age technology. They acquire two
bronze spears, each 5 meters long, and design a system where the plutonium
container is connected to the origin by one spear and to the operator by
the other. Everything is in a plane. Now the coordinates x_1 and y_1 of the plutonium and the coordinates x_2 and y_2 of the operator satisfy the equation
F(x_1, y_1, x_2, y_2) = (x_1^2 + y_1^2 − 25, (x_1 − x_2)^2 + (y_1 − y_2)^2 − 25) = 0.
One solution to this equation is x_1 = 3, y_1 = 4, x_2 = 0, y_2 = 8.
(You can build a model with a couple of ball-point pens and some Scotch
tape).
(a) Show that near the given solution, the constraint equation specifies x_1 and y_1 as a function of x_2 and y_2, but not vice versa.
(b) Calculate the derivative of the implicit function and show that it is not onto. Determine in what direction the plutonium container will move if x_2 and y_2 are both increased by equal small amounts (or changed in any other way). This system is not really satisfactory, because the plutonium container can move only along a circle.
(c) Casey and Chris come up with a new design in which one spear has its end confined to the x-axis (coordinate x_2 can be changed, but y_2 = 0). The other spear has its end confined to the y-axis (coordinate y_3 can be changed, but x_3 = 0). For this new setup, one solution is x_1 = 3, y_1 = 4, x_2 = 6, y_3 = 0. Show that x_1 and y_1 are now specified locally by a function g(x_2, y_3). Calculate [Dg] and show that it is onto.
(d) Are x_2 and y_3, near the same solution, now specified locally by a function f(x_1, y_1)? If so, what is [Df]?
(e) For the new setup, another solution is x_1 = 3, y_1 = 4, x_2 = 6, y_3 = 8. Show that in this case, although [DF] is onto, the choice of x_1 and y_1 as passive variables is not possible, and there is no implicitly defined function g(x_2, y_3) as there was in part (c). Draw a diagram to illustrate the problem.
5. (Physics version) In four-dimensional spacetime, a surface is specified as the intersection of the hypersphere x^2 + y^2 + z^2 = t^2 − 2 and the hyperplane 3x + 2y + z − 2t = 2.
(Economics version) A resource is consumed at rate t to manufacture goods at rates x, y, and z, and production is constrained by the equation x^2 + y^2 + z^2 = t^2 − 2. Furthermore, the expense of extracting the resource is met by selling the goods, so that 2t = 3x + 2y + z − 2.
In either case, we have a manifold that is the locus of
F(x, y, z, t) = (x^2 + y^2 + z^2 − t^2 + 2, 3x + 2y + z − 2t − 2) = 0.
(a) Show that this surface is a smooth 2-dimensional manifold.
(b) One point on the manifold is x = 1, y = 2, z = 3, t = 4. Near this
point the manifold is the graph of a function g that expresses x and y
as functions of z and t. Using the implicit function theorem, determine
[Dg] at the point z = 3, t = 4.
6. Consider the manifold specified by the parametrization
g(t) = (x, y) = (t + e^t, t + e^(2t)), −∞ < t < ∞.
Find where it intersects the line 2x + y = 10. You can get an initial estimate by using the graph in script 3.4B, then use Newton's method to improve the estimate.
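Here is a minimal sketch of the Newton iteration in base R. The starting value t = 1 is only a placeholder; replace it with the estimate you read off the graph.

# Substituting x = t + exp(t) and y = t + exp(2*t) into 2x + y = 10 gives h(t) = 0.
h  <- function(t) 2 * (t + exp(t)) + t + exp(2 * t) - 10
dh <- function(t) 3 + 2 * exp(t) + 2 * exp(2 * t)   # h'(t), computed by hand
t <- 1                                  # rough initial estimate
for (i in 1:6) t <- t - h(t) / dh(t)    # Newton's method
c(t, h(t))                              # h(t) should now be essentially 0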
7. Manifold X, a hyperboloid, can be parametrized as
(x, y, z) = γ(u, v) = (sec u, tan u cos v, tan u sin v).
If you use R, you can do a wireframe plot the same way that the sphere was plotted in script 3.4C, topic 2.
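Because this surface is parametrized rather than given as a graph z = f(x, y), base R's persp() does not apply directly. One possibility, assuming the rgl package is installed, is a sketch along these lines:

library(rgl)
u <- seq(0.2, 1.3, length.out = 30)   # keep u away from pi/2, where sec u and tan u blow up
v <- seq(0, 2 * pi, length.out = 60)
x <- outer(u, v, function(u, v) 1 / cos(u))   # sec u
y <- outer(u, v, function(u, v) tan(u) * cos(v))
z <- outer(u, v, function(u, v) tan(u) * sin(v))
surface3d(x, y, z, front = "lines", back = "lines")   # wireframe-style rendering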
(a) Find the coordinates of the point c on this manifold for which u = π/4, v = π/2.
(b) Find the equation of the tangent plane T_c X as the image of [Dγ(π/4, π/2)].
(c) Find an equation F(x, y, z) = 0 that describes the same manifold near c, and find the equation of the tangent plane T_c X as the kernel of [DF(c)].
(d) Find an equation x = g(y, z) that describes the same manifold near c, and find the equation of the tangent plane T_c X as the graph of [Dg(0, 1)].
8. Hubbard, Exercise 3.6.2. This is the only problem of this genre on the homework that can be done with pencil and paper, but you must be prepared to do one like it on the final exam!
9. Here is another function that has one maximum, one minimum, and two saddle points, for all of which x and y are less than 3 in magnitude.
f(x, y) = x^3 − y^3 + 2xy − 5x + 6y.
Locate and classify all four critical points using R, in the manner of script
3.4D. A good first step is to plot contour lines with x and y ranging from
-3 to 3. If you do
contour(x,y,z, nlevels = 20)
you will learn enough to start zooming in on all four critical points.
An alternative, more traditional, approach is to take advantage of the fact
that the function f is a polynomial. If you set both partial derivatives equal
to zero, you can eliminate either x or y from the resulting equations, then
find approximate solutions by plotting a graph of the resulting fourth-degree
polynomial in x or y.
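The contour call above assumes that x, y, and z have already been built; a minimal sketch of that setup in base R:

x <- seq(-3, 3, length.out = 200)
y <- seq(-3, 3, length.out = 200)
f <- function(x, y) x^3 - y^3 + 2 * x * y - 5 * x + 6 * y
z <- outer(x, y, f)
contour(x, y, z, nlevels = 20)   # zoom in by shrinking the seq() ranges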